RSS propagation as a method for automated content distribution management

Jacob Briggs
Computing with Artificial Intelligence
Session 2005/2006

The candidate confirms that the work submitted is their own and that appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of student)

Summary

For the modern computer user, there are often many video, audio, and application files that are acquired periodically or frequently updated. In this environment it is difficult to keep track of the latest versions, and then to find and acquire them when they become available. As such, it is proposed that an application capable of utilising existing technologies could be designed to alleviate this problem. This report covers the issue of content distribution in the 21st century, and presents the research, design and development of a prototype client for automatic content distribution based on RSS management and propagation.

The software artefacts of this project can be found at http://autofeed.jacobbriggs.com, where you will find the Rational Rose model and the developed program.

Acknowledgements

I would like to thank Eric Atwell, my project supervisor, for all the help and advice he has given me throughout the course of this project. I would also like to acknowledge the users who were interviewed and who filled out a questionnaire during this project (too numerous to list). Finally, I would like to take this opportunity to thank anyone who takes the time to read this report or to use the application. For those who are interested, it can be downloaded from http://autofeed.jacobbriggs.com.

Contents

1 Project Overview
  1.1 Project Aim
  1.2 Introduction: Unified Process - Vision Statement
  1.3 Problem Domain
  1.4 Objectives
  1.5 Requirements
  1.6 Extensions and enhancements
  1.7 Deliverables
2 Project Management
  2.1 General Management
  2.2 Feasibility Summary
  2.3 Design and Development Tools
    2.3.1 Design Methodology
      2.3.1.1 Waterfall Model
      2.3.1.2 RAD and Component-Based Development
      2.3.1.3 Prototyping
      2.3.1.4 Boehm's Spiral Model
      2.3.1.5 Unified Process
    2.3.2 Management and Evaluation Tools
      2.3.2.1 Version Control
      2.3.2.2 Testing
      2.3.2.3 Refactoring
  2.4 Project Schedule and Schedule of Documents
3 Requirements Gathering and Analysis Capture
  3.1 Users, Stakeholders and Problem Domain
    3.1.1 Current System
    3.1.2 Users
  3.2 Methods of Information Gathering
    3.2.1 SQIRO
      3.2.1.1 Sampling of Documents
      3.2.1.2 Questionnaires
      3.2.1.3 Interviews
      3.2.1.4 Research
      3.2.1.5 Observation
      3.2.1.6 Conclusion
    3.2.2 Professionalism and Techniques
  3.3 Evaluation of Existing Products
    3.3.1 Introduction
    3.3.2 Types of Products
      BitsCast
      RSS Bandit
      Steam
      Windows Update
    3.3.3 Comparison of Products
    3.3.4 Conclusion and Future Systems
  3.4 Questionnaires
  3.5 Interviews and Feedback
      Automatic and Manual Downloading
      Filtering
      Storing of Local Database
      Searching of Local Database
      System Integration
      Security
  3.6 Summary
    3.6.1 Business Actors and Subsystems
      RSS Publishing System
      RSS Aggregator
      Content Publisher
      RSS Feed
      Website System
      Consumer
      Item
      Item Source
    3.6.2 Scope
    3.6.3 Initial Use Case Modelling
    3.6.4 Requirements
      3.6.4.1 Functional
      3.6.4.2 Non-Functional
4 Background Research
  4.1 Content Distribution
    4.1.1 Client - Server Models
      4.1.1.1 HTTP and the WWW
      4.1.1.2 FTP - File Transfer Protocol
    4.1.2 Peer-to-Peer Filesharing Networks
  4.2 Syndication Technologies
    4.2.1 Push
    4.2.2 News
    4.2.3 Email
    4.2.4 RSS
  4.3 RSS - RDF Site Summary
    4.3.1 Metadata Mark-up
    4.3.2 Standards
    4.3.3 Original Use and Project Extension
  4.4 Content Management
  4.5 Client Language
    4.5.1 C++
    4.5.2 Python
    4.5.3 Java
    4.5.4 Conclusion
5 Design and Implementation using the Unified Process
  5.1 Development with the Unified Process
    5.1.1 Addressing Requirements Risks
    5.1.2 Addressing Architectural Risks
      5.1.2.1 Redundancy
      5.1.2.2 Three Tier Architecture
    5.1.3 Addressing Design Risks
      5.1.3.1 Presentation - User Interface Design
      5.1.3.2 Application - Refactoring and Central Classes
      5.1.3.3 Data - Logical and Conceptual Design via ERD
    5.1.4 Addressing Implementation Risks
      5.1.4.1 Presentation - User Interface Implementation
      5.1.4.2 Data - Physical Implementation via XML Flat File System
    5.1.5 Simplifying the Domain
    5.1.6 Polling History and Learning Responsible Request Times
6 Testing and Evaluation
  6.1 Testing
    6.1.1 Input Validation
    6.1.2 Security Testing
    6.1.3 Functionality Testing
      6.1.3.1 Algorithm Testing
    6.1.4 Deployment Testing
    6.1.5 Performance Testing
    6.1.6 J-Unit Tests and Results
  6.2 Product Evaluation
    6.2.1 User Evaluation
    6.2.2 Conclusion
  6.3 Project Extensions
    6.3.1 Features
Bibliography

Chapter 1
Project Overview

1.1 Project Aim

The aim of this project is to study the feasibility of a prototype client for RSS propagation as a solution to tracking and acquiring updated content automatically to a user's specification. The project therefore entails identifying the problem domain of automated content delivery and management, discussing the options, and designing, producing and evaluating a prototype client for tackling these problems.

1.2 Introduction: Unified Process - Vision Statement

There exists a large base of users who actively subscribe to and periodically acquire distributed content¹, which is updated or superseded by subsequent releases. Within this lie two problems: firstly notification, which relies on repeated manual searches on the user's part to find out whether a newer release is available; and secondly acquisition, that is, locating and downloading this content when it is released. The identified problems are manual searching, location and procurement of content (all of which place responsibility on the consumer side), together with the consumer's ability to successfully complete each stage when a new release is available. Whilst some applications have successfully integrated the ability to check for and acquire updates automatically on program launch or at user request, this is isolated and restricted to the application platform being updated itself. Such mechanisms are often designed for the purposes of the developers, with the added disadvantage that they are largely only attainable for application content. As such, there is a need for a client capable of automatically acquiring various types of content on the user's behalf, eliminating the need to rely on periodic manual searching, location and downloading.
In recent years, many have speculated that RSS could facilitate the serving of content metadata, and that an RSS client could be designed to propagate and extract this data, subsequently initiating automatic transfers to a user's specification [37][26][29]. Despite this speculation, no real implementations of the idea have come about, although several RSS news clients have attempted to implement basic download abilities [37]. File enclosures have been a part of RSS since version 2.0, but have only recently come to wide attention with the emergence of podcasting. Most people now associate RSS attachments with MP3 files, but there is no reason to restrict enclosures to audio files.

¹ In the sense of generic non-'standard text' based static files, for example video, audio and executables, which from here on will be the overriding meaning of the term.

For the purposes of this report, despite the intent to create a generic and site-independent solution, the following problem domain has been defined in areas where prototype RSS feeds have already been established but lack a client to facilitate automatic retrieval.

1.3 Problem Domain

Legal Torrents is a large site that offers various public domain movies, documents, audio and documentaries. Updates of content are sporadic, and as such it is desirable to create a means of allowing users to be notified of releases automatically and to acquire torrent files for later execution.

The BBC has started to facilitate the distribution of radio content via its website. Many of its popular shows are released on a regular basis and posted to the website for licence payers to download. However, navigation on the site is difficult and download links are often restricted to the latest episodes. It is therefore desirable to simplify the retrieval of content within this domain.
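To make the enclosure mechanism concrete, the following minimal Java sketch shows how a client might extract enclosure metadata from an RSS 2.0 document. It is illustrative only: the class name, the feed content and the URLs are hypothetical, and it is not taken from the project's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class EnclosureExtractor {

    // Collect the url attribute of every <enclosure> element in an
    // RSS 2.0 document; these URLs are what a download client would fetch.
    public static List<String> extractEnclosureUrls(String rssXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(rssXml.getBytes("UTF-8")));
        NodeList enclosures = doc.getElementsByTagName("enclosure");
        List<String> urls = new ArrayList<String>();
        for (int i = 0; i < enclosures.getLength(); i++) {
            urls.add(((Element) enclosures.item(i)).getAttribute("url"));
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // A hypothetical feed item carrying a non-audio enclosure, to
        // illustrate that enclosures are not restricted to MP3 files.
        String feed = "<rss version=\"2.0\"><channel><item>"
                + "<title>Weekly release</title>"
                + "<enclosure url=\"http://example.org/release.torrent\""
                + " length=\"34103\" type=\"application/x-bittorrent\"/>"
                + "</item></channel></rss>";
        System.out.println(extractEnclosureUrls(feed));
    }
}
```

A full client would poll the feed on a schedule, compare item identifiers or publication dates against a local history, and download only the enclosures it has not yet acquired.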
1.4 Objectives

The objectives can be extracted from the Unified Process in the form of the 'vision' statement [28] and as defined in the problem domain. One way of distributing news content that has emerged and become a prominent feature of large web communities is RSS (RDF Site Summary), an XML-based web service that relies on client applications periodically checking websites for metadata updates. This project will therefore focus on the feasibility of this technology as a solution, and on the design of a client capable of parsing and acquiring content from such feeds. The project's objectives are as follows.

• To research and analyse the problem domain, existing solutions, central issues and RSS-based technology as a solution.
• To design and implement a product that is capable of utilising RSS feeds to solve the issue of content distribution.
• Ultimately, to reduce human interaction without losing control over the process of content acquisition.

1.5 Requirements

There are two distinct deliverables, each with a subset of requirements. The first is an analysis and feasibility study of RSS as a problem solution, realised using the Unified Process, with the requirement to justify the suitability of RSS for solving the problem. The second is the design of a prototype of a robust client capable of meeting the following requirements.

• The product should allow users to get material from many locations.
• The product should allow users to filter and select content.
• The product should allow users to download material automatically.

1.6 Extensions and enhancements

Given the potentially large scope of the project, there are many possible extensions beyond the base requirements. Supplementary work for the project report can be summarised in two areas. Firstly, the analysis can easily be extended to include an analysis of security and efficiency.
As such, the following areas present opportunities for enhancement in terms of background study, research and conclusions.

• Analysis of security issues and of solutions to make the system of content distribution more secure.
• Analysis of content validation and issues of trust in a multiple-source network.
• Analysis of issues of bandwidth consumption, their impact on the efficiency of the product, and ways of reducing this impact.
• Issues of server-side implementation and the design of a product to help in the extraction and mark-up of feeds to distribute content.
• Discussion of multiple sources, load balancing and client content distribution.

Secondly, extensions to the design and implementation of a prototype client can be summarised as follows.

• The product could be extended to allow users to attain material of different file types.
• The product could be extended to store a local history of what material it has attained/read from the RSS sources.
• The product could be extended to allow the user to search through its local database of material read from RSS files (i.e. search history).
• The product could be extended to address some of the inefficiency in the 'user polling' network model.
• Extended issues of HCI, including a customisable interface and design.

1.7 Deliverables

Upon completing the project there will be two deliverables.

1. A report covering the analysis, feasibility, design and evaluation of a prototype client.
2. Software artefacts for a working prototype product that meets the 'Must Have' requirements of the project.

Chapter 2
Project Management

This chapter outlines the approach taken during project development and highlights the importance of clear structure and management in achieving a successful outcome.

2.1 General Management

Thorough management and scheduling are important concerns when embarking upon any development project.
Without a clear plan of execution and detailed scheduling, a project can effectively stall, with resources wasted on trivial tasks whilst more important aspects of development go unfulfilled. Moreover, project failure becomes increasingly likely if risk is not managed and tackled as early as possible [28]. The computing sector is plagued by reports of project failures due to mismanagement and a lack of understanding of core requirements. To this end, the evaluation and selection of a clear design methodology, of the tools and techniques to be used during development, and of a clear plan of action was the first aspect of the project to be carried out. A sound design methodology (the method of documenting a project and assuring that it meets all of its aims and implementation goals), a plan of action and tight scheduling are essential to successful project management [34].

The project, although tackling a specific problem domain as outlined in 1.3, can be considered mainly an academic project, and to that end allows for a free range of methodology options. There is no live system in use that would need consideration, and ultimately the project is to design an application to fill a current market gap rather than to replace or build upon an existing system. To this end, the cost and risks involved in this project are minimal; indeed, even project failure could yield usable information regarding feasibility and usefulness, as well as architectural design. However, a successful outcome is of course more desirable and would allow for further extension and user adoption. Development risk is therefore an important issue that needs to be reduced to a minimum, which can be done through the adoption of a sound design methodology.

2.2 Feasibility Summary

Before developing a complete system, it was first necessary to address the feasibility of the proposed project, and to establish a clear need for, and list of benefits from, its development.
Through adopting the Unified Process methodology, described in detail in 2.3.1.5, issues of feasibility were addressed in the early Inception and Elaboration iterations, which focused respectively on business needs and requirements, and on refining those requirements and mitigating technical risks. Whilst the Inception stages defined the scope and organisational feasibility, presented in detail in Chapters 3 and 5, the Elaboration iteration allowed a clear picture of the system's architecture, and of the technology it was based upon, to be formed. This work is largely presented in Chapter 4 as background reading, but also in Chapter 5, where an overview of the system architecture is given.

[4] summarises the requirements of a feasibility assessment as the stages of establishing whether a system is technically feasible, economically feasible, and operationally and organisationally feasible. When developing a prototype application in an area with no clear existing overall structure or standard, it is hard to produce an economic feasibility summary as required by more rigid methodologies, or indeed as required as an initial stage before development with the Unified Process on large-scale or business/cost-oriented projects, as recommended by IBM [19]. As an academic project the cost of development is very low, and as such costing methods such as Return on Investment (ROI) are of little value in establishing cost. Cost can therefore only really be quantified as the manpower used in development, set against the gain from successful development of the project and the quality of the overall system. This is shown in more detail in Chapter 3, where the SQIRO techniques show a real desire amongst sampled users for the system to be developed, and where the analysis of the domain presents further need for such a system. The detailed plan presented in 2.4 shows the cost of the project in terms of resource usage as a development schedule.
2.3 Design and Development Tools

2.3.1 Design Methodology

In terms of design methodology it was appropriate to analyse several tested methods and to conclude which best suited the needs of the project. Several academically taught methods were reviewed first, followed by several researched design methodologies.

2.3.1.1 Waterfall Model

The Waterfall model offers a clear, structured methodology for systems design. The model is constructed of distinct stages, which are completed sequentially as development progresses through to implementation and maintenance cycles [27]. The methodology meticulously covers each stage, which must be judged complete before moving on to the next. This wealth of coverage ensures that the system is well documented and largely designed from the initial requirements capture; however, this also proves to be the model's weakness, as many sources argue that requirements are rarely static or fully captured in a single stage [34]. Whilst providing a well-documented framework, there is a large time cost associated with its usage, largely due to having to ensure that the analysis, design and implementation stages are fully realised before progressing [4]. As such, the model has a large bias towards analysis and design, and also testing, as presented in [34]. Whilst it offers clear structure, it is also extremely rigid and methodical: each stage must be signed off as completed before moving on to the next, so before coding begins a design must be judged complete and all-encompassing. For example, before the design stage is started the analysis stage must be completed in its entirety. This leads to the question of how to judge whether a solution is fully analysed and when a design is fully matured, as the model restricts returning to a previous stage of development.

Figure 2.1: Traditional Waterfall Model and Modern Approach
As such, this rigid structure is very slow and detailed, ensuring each stage is completed to a level where it is deemed fully researched and concluded. The Waterfall model is still used in expensive, large-scale projects, such as government information systems, where there is an abundance of time and resources. Modern incarnations of the Waterfall model have broken some of the rigid stages and allow for some re-evaluation of previous stages, as displayed in Figure 2.1 [4]. Similarly, the model has been adapted to allow some level of iterative development (a concept covered under later methodologies), with the aid of internal review stages to decide whether milestones within the project and minimum requirements have been achieved. Despite this, the model quickly begins to break down in time-limited projects or in a dynamic domain with changing requirements. With regard to this particular project, where time is limited and the requirements are hard to pinpoint without continued feedback and analysis, the Waterfall model is unsuited to rapid development. Whilst the clear structure and the tools associated with the design and requirements capture phases provide some level of clarity, with clear guidelines and a development framework, the rigidity of the model makes it unsuited to a project of this nature.

2.3.1.2 RAD and Component-Based Development

Component-Based Development is the integration of new software components into a growing implementation, building upon already developed components or integrating large parts of their functionality into a replacement system. The methodology focuses on designing new aspects of an existing system or building upon an already designed implementation, and as such requires some existing framework and analysis to build upon.
In essence, the process has a bias towards customisation and component integration, or towards planning a large system that is slowly rolled out in usable components, rather than designing and building an entirely new architecture and implementing it in one development stage. The Rapid Application Development (RAD) methodology takes this component development approach and aims to roll out the functionality of a system by implementing key features, then slowly building on this base through further development cycles. As part of its component-based nature, RAD aims to achieve maximum code reuse [4]. In terms of development bias, by comparison with the Waterfall model more time is spent in development of the software artefacts and in systems integration than in analysis and design, given that a large part of the system already exists; the process largely assumes that the requirements have to a great extent been predetermined [34]. Whilst this is a dangerous assumption to make when developing new systems, it allows for rapid development of new software components working on top of existing technology.

The methodology also supports object-oriented design principles through this approach. The overall system can be represented as the interaction between objects, with each system component being the manipulation of those objects and their interactions. With this approach, the first stages are to map out the entity objects and then to build a system of control classes utilising the standard interface of each object of the system. As the underlying objects largely do not change with further iterations, rolling out more control class components will not affect the underlying functionality of the system.
Whilst parts of the project are already implemented and the client is to be designed to utilise existing technology, the lessened emphasis on requirements capture and analysis works against RAD's adoption as a methodology for what is largely a new client system. The rapid development of software suits the timeframe of the project; however, without existing software systems in place, and with the need to gather detailed requirements, RAD does not suit the project's development as well as other methods. A key problem with RAD is the lack of a clear methodology and development tools, that is, a solid framework and methods of expression; as a result, RAD projects can often overlook critical documentation and, as pointed out in [4], the approach often leads to a casual development style. The aim of RAD is of course the rapid production of applications, tackling some of the deficiencies of older methodologies such as the absence of continued user feedback and long development times, but the lack of clear guidelines for documenting development means that the methodology presents some difficulties in adoption and can lead to some aspects of development being overlooked [4].

2.3.1.3 Prototyping

Prototyping can be split into two rough categories: Revolutionary Prototyping and Evolutionary Prototyping. The former uses prototypes largely as disposable 'proof-of-concepts' to test and validate working functionality, whereas the latter focuses more on the reuse of code and the development of a program representative of the final product, built on top of each generation. Evolutionary prototyping aims first to complete a 'formal' design and analysis phase before moving on to a prototyping phase, in which a solution is developed based on continual feedback. In this phase a prototype is developed and reviewed, and the criticisms of the review are then amalgamated into the prototype.
This process is repeated until the review concludes that the development requirements have been met, and to this extent the methodology represents an iterative implementation phase, in contrast to the Waterfall's purely sequential nature. [38] outlines the benefits of this method of systems production: since the system is made tangible, it is easier to pursue a dialogue concerning the system, and thereby to make better decisions on the design regarding usability and function. However, despite the iterative implementation allowing functionality to be refined to meet the requirements of the project, the evolutionary prototyping methodology fails to handle problems should they appear in the overall design, and requires building around central flaws, a notion omitted in [38]. It can also lead to fundamental flaws being built on top of, rather than allowing the prototype to be discarded and replaced by one based on a superior design [34]. The related concept of refactoring is covered under the Unified Process later in this chapter. As well as building on top of design flaws, evolutionary prototyping gives way to diminished overall structure, with redundant code artefacts being built upon with each iteration of the prototype. Alongside the diminished architecture, it is difficult to avoid writing large sections of undocumented code with each iteration; when adopting this methodology it is therefore essential to set clear goals and to solve subsets of the problem, as this allows for thoroughly tested and documented code. [14] summarises the issue, pinning it to the need for prototypes "to be developed rapidly", such that producing "documentation would slow down the process".
[14] also points to prototyping being a method of realising risks and a way "to foster clarification [of] requirements, and to develop and try out solution concepts", and to this end prototypes are summarised as best serving as "the centrepiece of a hyperstructured information base" rather than a direct route to a fully documented solution. Ideally, prototyping is a design tool that provides a channel for the expression of ideas with a user base, and is most useful in areas such as user interface design. In terms of this project, the deliverable of a prototype client is descriptive of the level of implementation the software artefact will achieve, that is, a proof of concept for a content distribution client.

2.3.1.4 Boehm's Spiral Model

The spiral model is an iterative design methodology aimed at managing risk by repeatedly carrying out risk analysis. [4] summarises the stages as planning, risk analysis, engineering and customer evaluation, which are repeated until completion. There are similarities with RAD in this regard, given the iterative development and review stages, and thus a contrast with the classical Waterfall methodology. Unlike both the Waterfall and RAD methodologies, however, the spiral model starts with the production of a requirements plan but does not have detailed initialisation and analysis phases at the start of the project; instead, heavy focus is placed on design and construction, as pointed out by [4]. There are also elements of requirements and design validation which are not present in RAD. Whilst both RAD and the spiral model offer great advantages over the Waterfall model, it is the absence of some key stages and the lack of specific tools for documenting and expressing the system design that are their downfall. To this end, a similar iterative methodology will be reviewed next.
2.3.1.5 Unified Process

The Unified Process, developed by Rational, offers an iterative approach to software design in which requirements gathering, design and programming stages are repeated, aimed at rapidly creating reviewable results at the end of incremental stages. The core concept of the Unified Process is that of iterative stages of development rather than sequentially completed stages, such as those of the Waterfall model [20]. The focus is on quickly establishing feasibility and identifying design flaws through continuous feedback, as in the evolutionary prototyping methodology reviewed in this report, and it essentially aims at reducing the risks involved in project development. The methodology attacks risk by constantly evaluating progress, mitigating risks identified through feedback and user engagement early in the development process. There are notable similarities with the RAD approach, in that the system goes through development cycles that are themselves structured like the Waterfall model, and in that the core functionality is extended as the system develops (defined as Must Have, Should Have and Could Have Use Cases). Where the methodology differs is that the Unified Process utilises many existing object-oriented development tools, such as UML, that aim to model the requirements of the system and to present its required functionality in a clear, universally understandable way. The Unified Process is not a universal process, a rigid framework of stages that need completing, but a development process designed for 'flexibility and extensibility' [17]. As [17] demonstrates, the Unified Process allows for a completely flexible lifecycle strategy and provides the tools and language to express the project, whilst at the same time allowing the developers choice over the artefacts and concepts to model.
The Unified Process provides a means to efficiently realise high quality projects by providing the means to document the project's development based on object-oriented design principles. The basic structure of the Unified Process is split into four phases: Inception, Elaboration, Construction and Transition. During the Inception stage, business and requirement risks are addressed. This is particularly significant for new development projects such as this one, and ensures the system is feasible and achievable. To this extent, the methodology suits the needs of the project appropriately, given the requirement to justify that the project is capable of attaining a suitable conclusion. The Inception stage is iterated if necessary, producing a statement of the problem in the form of the 'vision' statement, identifying the scope, formalising a plan of action and producing an evaluation of the risks [28]. This wealth of documentation presents the project in a variety of diagrams and written forms at different levels of abstraction, aimed at demonstrating the project's worth and the fact that it is achievable, both in terms of cost and in terms of technical feasibility. The adoption of the Unified Process was largely based on this notion, the ability to realise the project's requirements clearly and efficiently. Like the Waterfall model, the methodology begins with a phase covering aspects of feasibility, but its iterative nature also provides the opportunity for refinements in requirements, building upon the project's problem definition to better reflect changes in requirements. A major advantage of this methodology over others is the Use Case driven approach: by modelling based upon specific requirements, the documents produced all have a consistent link between them.
The Use Cases are categorised into levels of importance, the highest of which are realised and elaborated upon in earlier iterations, presenting the system via the Use Cases at increasingly lower levels of abstraction as the project progresses towards implementation in later iterations. The Use Cases are then linked together to show the overall view of the system by the end of the Elaboration phase.

Figure 2.2: Use Case driven analysis

As emphasised, the Unified Process provides the tools for documenting and realising the development of a system clearly and efficiently. The methodology provides methods of identifying and attacking risk early in development by working closely with users to develop the system and redesigning it if necessary. To contrast with the Waterfall model: if new requirements emerge later in the project, the Waterfall model has no framework for re-evaluating the system's design. Similarly, evolutionary prototyping results in flaws being built upon, and the system becomes more architecturally complicated than is necessary. In contrast, the Unified Process aims to keep the system's design in its simplest form. Users' functional requirements are mapped as individual Use Cases, which are ordered by importance and then realised using UML modelling techniques, designed to represent an object-oriented view of the system ready for implementation. Quality assurance is provided through a sound design of the system at the end of the Elaboration stage, but also through testing, version control, refactoring, and other methods such as simple coding standards and planning. Transition is the final phase of the Unified Process, with the purpose of implementing the realised system in a working environment. With regard to this project, this final stage is not a requirement for success, given the objective of a prototype client. The Unified Process ultimately proved the most useful methodology for this project.
2.3.2 Management and Evaluation Tools

2.3.2.1 Version Control

The adoption of version control tools provides numerous advantages during the development of the project. Whilst in this specific project only a single developer will be analysing, designing and programming, there is still a need for tight control over the documentation and software implementation, especially given the iterative nature of production. To this end, incremental versions of the documentation were stored under separate, incrementally named files to ensure that any loss or problem occurring on the latest iteration could be rolled back to an earlier version. Under a similar notion, CVS was adopted to track code changes and share the code base between several development platforms. This allowed for a full listing of documented code changes and the tracking of implementation progress.

2.3.2.2 Testing

The policy of test-driven coding is often overlooked as a systems development choice, or becomes an afterthought, yet it is of great importance in establishing a high quality product that meets the requirements of the project and realises the design architecture. Establishing a rigid set of test cases for the project provides quality assurance, ensuring not only that the project meets the design model but also providing a means of evaluating the success of each iteration of an iterative design methodology.

2.3.2.3 Refactoring

Refactoring is the process of redesigning a system's architecture to better encompass the required functionality and to ensure it is in its simplest form. Refactoring is central to iterative methodologies, embodying the concept that continual study of the Problem Domain will lead to refinements in requirements, which give way to a redefining of the system, which in turn yields a superior, simpler design.
The ability to utilise design patterns, or to simplify and increase the amount of code reuse through simple refinements in the system's architecture, is an important part of development with the Unified Process, and throughout the project the architecture of the system was repeatedly redesigned.

2.4 Project Schedule and Schedule of Documents

A project schedule initially drafted for this project, and the final schedule with the differences in dates marked, can be found in Appendix B. In addition to these Gantt charts, a list of documents produced by following the Unified Process and their location within this report can be found there.

Stage  Date        Task
1      30.09.2005  Submit Project preference form
2      21.10.2005  Complete Minimum Aims and Requirements Form
3      31.10.2005  Iteration 1
4      09.12.2005  Complete the Mid-Project report
5      19.12.2005  Iteration 2
6      06.02.2006  Iteration 3
7      10.03.2006  Submit table of contents and draft chapter
8      17.03.2006  Completion of Progress Meeting
9      02.05.2006  Submit Project Report
10     03.05.2006  Submit Project Report Electronically

Table 2.1: Schedule

Chapter 3 Requirements Gathering and Analysis

This chapter reflects on the requirements capture stages of the project. Iterative comparisons can be found in more detail in Section 5.1.1.

3.1 Users, Stakeholders and Problem Domain

The overall aim of this project is to research and implement a content delivery and management system based upon existing technologies. For this project, the main focus of development is on the client application side, and to this end the users of the system are defined as a generic set of computer users with a wide range of technical expertise and computing experience.

3.1.1 Current System

The current environment for application and media content distribution can be described as an ad-hoc structure crossing various protocols. Content updates are tracked using standard browsing techniques, via search engines or direct navigation of known sites.
In terms of update notification, this is largely left to the consumer, or is provided as a feature built exclusively into a particular application, relying on some element of manual notification followed by manual acquisition. There is no general model or platform of distribution over any singular technology.

Figure 3.1: Problem Domain

The business model of the project-specific domain under development is presented as a Business Object Diagram, shown later in this chapter in Figure 3.2, as the conclusion of the initial analysis of the Problem Domain.

3.1.2 Users

In terms of the global system of content distribution, there are two distinct stakeholders. The first is the content publisher, who releases the content and posts notification of release; the second is the consumer, who acquires the content. There exist a great number of individual subsystems and business actors within the current environment, but in the analysis model this can be greatly simplified to the two stakeholders in question.

3.2 Methods of Information Gathering

3.2.1 SQIRO

SQIRO is an acronym for various techniques for gathering requirements from a wealth of sources in the initial analysis stages of a project. To elaborate, the areas of requirements gathering can be summarised as the sampling of documents, the use of questionnaires, the interviewing of stakeholders, and further research and observation of the problem domain [2]. However, given the size and scope of this project, as well as the lack of current implementations, not every stage needs undertaking in great detail, nor indeed is every stage particularly applicable to this project. To this end, a brief description and suitability analysis of each of the stages was carried out.

3.2.1.1 Sampling of Documents

The stage of sampling documents does not offer any great suitability for this project.
Given that this is proposed as a new product development, there is in fact no documentation for an existing system available for analysis that would yield any great wealth of useful information, and given the time involved in reading through the documentation of other projects, with their varied aims and functionality, it is not a feasible technique for this project.

3.2.1.2 Questionnaires

The next technique in SQIRO is the use of questionnaires to gather requirements and feedback for project development. Questionnaires are a double-edged sword in terms of usefulness. In the method's favour, questionnaires allow for useful levels of feedback if the questions are well formed and concisely answerable, for example evaluating existing products or features with a fixed rating scale. The method also has the advantage of being relatively cheap in terms of overall time expense compared to the results yielded. To this end, the use of questionnaires to empirically gather feedback on the system during the evaluation stages, at each iteration, allows for a guided review of the system's successes and the areas in which the development is failing. Similarly, questionnaires offer a simple solution for gathering areas of required functionality. However, it is important to avoid allowing open-ended answers and opinions to be expressed in this format, and to this end questionnaires were used simply to aid the gathering of statistical feedback and product requirements information. In terms of this project, questionnaires were conducted in the form of brief interviews prompting users for their responses to various questions gauging their preference for certain features.

3.2.1.3 Interviews

The main body of requirements was to be gathered through a combination of interviews, research and observation.
The interviewing process allows for a diversity of opinions to be recorded, and for a formal channel of dialogue with the stakeholders given the documentation of the interviews. A structured approach allows opinions to be gathered in a systematic and useful way. Three candidate users of the future system were chosen and consulted throughout the entire lifecycle of the project. These users were consulted to gain continued feedback, and were able to voice their opinions on the evolution of the product. This was a central part of development using the Unified Process methodology, as it allowed development risks, such as failure to identify required functionality, to be continually attacked before they could impact upon the success of the project.

3.2.1.4 Research

The process of domain-specific research allowed for the gathering of information relevant to the problem domain, such as the current methods of distribution, allowing for an understanding of the core transactions. The process also helped in gathering the information required to justify development choices taken later in the project, such as system component and software choices. The majority of research went into the background issues and the technologies available. For the research stage of requirements gathering it was not possible to observe a single existing system or visit an organisation using such technologies, which is usually the process taken at this stage [2], given that no such system existed. Instead the general overall concepts of the problem domain were researched, using a variety of resources such as academic journals, standards and white papers, as well as a review of several candidate systems in existence.

3.2.1.5 Observation

The final technique of requirements gathering in SQIRO is the use of observation to gain a detailed understanding of an existing system.
Observation played a key role in the product evaluation, but in terms of initial analysis, observation took the form of an extended feature of the candidate user interviews. Several existing systems were shown to the users, which allowed the users to engage more openly in dialogue and to identify requirements through demonstrations during the interview. The browsing behaviours of users were reviewed during this process, and as such the central tasks in content management were observed and can be identified as three iterative stages: update notification, search and acquisition.

3.2.1.6 Conclusion

Successful use of the SQIRO requirements gathering techniques allowed for a clear understanding of the business needs early in the project. Whilst questionnaires were not originally intended, and were discouraged to some extent as a device for requirements gathering, as the results show (Appendix D) their use quickly highlighted key development areas that needed addressing and that would otherwise have been overlooked. As expressed, several techniques from SQIRO were applied, but the body of useful requirements gathering came through interviews, which were combined with demonstration and observation to yield a more successful output, and through background reading and research, which identified the areas of development that needed undertaking early in the project. The sampling of existing documents was not possible for this project and was omitted as a stage. Likewise, observation was used as a device to gain feedback, but was not useful in discovering an understanding of the problem itself and was largely not applicable to this project as an analysis stage. Existing programs were reviewed, however, and used as baseline comparators during evaluation.

3.2.2 Professionalism and Techniques

Part of any successful project is the way in which it is structured and executed. A key part of the Unified Process is the high level of user interaction during development.
Here it is important to maintain a level of professionalism, as this interaction reflects not only on the quality and volume of useful feedback that can be obtained, but also on the University of Leeds as the parent organisation of this project. To this extent, measures were taken to ensure all external discourse was conducted in a manner befitting the level this project represents. Whilst carrying out interviews and questionnaire sessions, it was important to structure and prepare appropriately to ensure that a useful level of feedback was obtained, but also that the process reflected favourably on the University and the interviewee in terms of professionalism.

3.3 Evaluation of Existing Products

An important aspect of project development is an analysis of what products are available and why exactly a new product would be needed. Even in areas of little development, a review of what is available, what works and what does not, and a gauging of how successful other applications have been helps to define exactly how a solution can be devised to take advantage of or extend upon existing products. In regard to this project, there are a great number of applications that can be reviewed to gain insight into exactly how the problem domain has been tackled and how this project can build upon the success of other applications.

3.3.1 Introduction

Since the introduction of RSS as a web technology there have been many applications supporting its usage, both standalone clients and features integrated into web applications, which can be classified into two general areas: standalone RSS applications and integrated RSS functionality. In addition to RSS-utilising applications, proprietary update management software represents a third area in which the general principle of the project is being tackled.
Whilst this project is concerned with the suitability of RSS parsing and feed management in solving the problem domain, and with achieving a general, loosely coupled but highly effective system, there are several systems tackling the same domain using different means. In justifying the feasibility of this project, it is a requirement that these applications also be studied.

3.3.2 Types of Products

As previously mentioned, there are both RSS-utilising applications and proprietary systems. In regard to RSS clients, the ability to acquire non-text content attachments to feeds was a base requirement for their selection for review. To this end, the following applications were reviewed.

BitsCast
A popular RSS aggregator designed to support podcasting as well as other forms of 'casting' (that being the concept of attaching content to an RSS feed, such as videocasting). The application is designed to show only recent views of RSS feeds and has an inbuilt Internet Explorer tab to support the viewing of news feeds.

RSS Bandit
A highly popular news client with some support for file attachment downloading.

Steam
A games content delivery system developed by Valve to publish and distribute their own game titles directly to their customers. This system offers a fixed content system, where customers can purchase and gain automatic updates to their software purchases.

Windows Update
An operating system update delivery platform for Microsoft products.

The BitsCast and RSS Bandit applications offer RSS utilisation and the ability to acquire content other than standard news. It should be said that the primary purpose of these applications is to retrieve and display news data rather than to act as a generic content distribution system, and as such they do not prove to be ideal solutions taken out of context.
Although BitsCast claims to be for casting, that being the distribution of non-text data, its design is still primarily around the delivery of news. In addition to these applications, the Windows Update system and Steam were reviewed, being applications that exist to serve and manage content using proprietary technology.

3.3.3 Comparison of Products

The existing products were evaluated based upon functionality useful for this project. A complete comparison table reviewing all four applications can be found in Appendix C.

3.3.4 Conclusion and Future Systems

From Steam's example in particular, it seems very apparent that the future of distribution technologies for commercial games and software is likely to be redefined by the introduction of disruptive distribution technologies that will change the existing landscape of consumer-to-publisher-to-developer channels into a more direct form of communication between consumer and developer [13]. The runaway success of Steam is a prime indicator of the need for an internet content distribution system to replace the existing published content distribution model. In terms of the delivery of patches and updates, applications increasingly deliver updated content over the web directly to the application, and it would seem logical not only that this trend will increase in frequency and become a standard part of most applications, but also that a generic form of distributing new content directly to consumers is likely to develop. Next-generation video gaming consoles are already an indicator of this trend: all three major manufacturers (Nintendo, Sony and Microsoft) have expressed plans for online distribution of content under similar models.
3.4 Questionnaires

The use of questionnaires in this project was merely to gauge feedback on the success of features within the systems, and to a large extent they can be regarded as quick interviews of users to gain an insight into areas of success and weakness in the sampled applications and the developing system. The use of questionnaires does not reflect the technique in its traditional sense, where sheets are distributed and later collated, as the interviewer observed and gauged responses directly. Sample applications were demonstrated to the user, and the questions were elaborated upon where a user required it. A total of 30 questionnaires were distributed in the initial requirements gathering stage of the project, to both experienced and casual computer users (classified as consumers rather than content providers). Of this number, 20 were carried out with the interviewer supervising, and ten were distributed for users to fill out in their free time; of these ten, three were returned completed, bringing the overall completion rate to 23 of the 30 sampled (76.7%). As such, the questionnaires could be regarded as interviews, given the presence of the interviewer to help answer questions and clarify the aim and scope of the process, but were classified as questionnaires given the rigidity of the questions, the speed at which the sessions were carried out, and the simple aim of gauging success without detailed elaboration. The results of the questionnaires were used as a point of reference in more detailed interviews. The results were tallied and split by technical ability classification to clearly represent the viewpoints of both technically capable users and casual computer users. The results showed clear areas where technically experienced users favoured functionality, whilst less experienced users favoured clarity and structure more heavily. There were also areas of clear overlap, where both demographics expressed similar sentiments.
The following table, Table 3.1, shows the demographic samples.

                    Sampled   Returned   Response
Experienced Users   20        16         80%
Casual Users        10        7          70%

Table 3.1: Questionnaire Response

Similarly, the tallied results were used as the basis for deriving the functional and non-functional requirements of this project. Results of the questionnaires can be found in Appendix D.

3.5 Interviews and Feedback

For the purposes of development, and as an aid in achieving the requirements of development using the Unified Process methodology, three prospective users were selected to give continued feedback on prototypes of the product and to help gather requirements throughout the project lifecycle. The candidates were chosen based on their suitability as users of the product, their technical ability and their availability throughout the system development. The following candidate users were interviewed at the initial requirements gathering stage and were later involved in the design, development and feedback of the prototype product developed using the Unified Process methodology:

Thomas Bradshaw
Third Year BSc Computing Student, Leeds University. Technically experienced in the design and development of software systems.

Samuel Thiessen
GCSE Student, Rastrick High School, Brighouse. Representative of the casual computer user demographic.

Dale Smith
Third Year BSc Computing Student, Leeds University. Uses RSS feeds daily for news and site updates, and has some knowledge of existing uses of the technology such as podcasting and news syndication.

For the initial requirements gathering stage these users were shown a variety of applications and interviewed for their thoughts on the usefulness of core functions and the success of each application in achieving its goals, as well as for their own thoughts on application requirements. Full summaries of these interviews can be found in Appendix E.
A summary of the identified areas of development, which were adapted into minimum and extended requirements, follows:

Automatic and Manual Downloading
A key part of the functionality was of course to download and manage content. As part of the interviewing process, all three candidates stressed that an important aspect was the ability to have content automatically retrieved with no interaction with the system. Comments were raised on how Windows Update and similar platforms interfered with computer operation and flashed prompts and reminders to restart. In contrast to this, all three candidates favoured transparent downloading with optional download notification. Another feature, expressed as an extended requirement, was the ability to manually download content that was not flagged for automatic retrieval.

Filtering
The ability to mark what content was to be acquired was raised as a requirement. One user requested that regular expressions should be available to achieve this; however, for casual users a simpler method seemed more appropriate.

Storing of a Local Database
Several opinions were voiced on this matter; whilst there was agreement on the need to store this information, the candidates all expressed different ideas on how the items retrieved should be stored and displayed.

Searching of the Local Database
One of the candidates desired the ability to search through all of his acquired material to see where it had been downloaded to, and indeed whether it had been downloaded at all.

System Integration
All three candidates spoke favourably of how other applications integrated into the desktop through tray icons and popups, without an overall window ever being present to clutter up their desktops.
Security
A concern of one of the candidates was the retrieval of unsafe material, and the candidate expressed the need for an ability to blacklist content types and particular sources of download, to prevent malicious material being brought past his firewall and executed.

3.6 Summary

After the completion of the first iteration of the Unified Process, a clear picture of the business domain emerged, this being the entire content distribution channel from content producer to end user. From here the domain can be expressed through UML in the form of a Business Object Model. Due to the time constraints of this project, it was decided that only the client-side subsystem should be considered in great detail (highlighted in Figure 3.2). This involves the RSS Aggregator business actor, the Consumer and the Item business objects. These were the key concepts for the further design of the system.

3.6.1 Business Actors and Subsystems

RSS Publishing System
Handles the extraction and mark-up of the latest item entries from the source website system database.

RSS Aggregator
Propagation subsystem that allows for the processing of an RSS Feed into one or more Items.

Content Publisher
The website host and content originator. In this project this business actor plays a secondary role, as the project focuses on the client side.

RSS Feed
The source of content information.

Website System
The hosting site for the content.

Consumer
The main user of the client-side subsystem.

Item
A subcomponent of an RSS Feed which identifies a particular content/download entry in the feed.

Item Source
The source URL reference for content stored on a web server.

Figure 3.2: Business Object Diagram

3.6.2 Scope

The scope of this project was decided based upon the size of the domain.
Whilst the project proposal initially described the entire process of content distribution and the possible areas that this project could entail, initial planning, discussion and the results of the Inception stage indicated that this was not achievable within the time constraints allocated for a project of this size. To this end, after modelling the business domain and analysing the problem, the scope of the project was reduced from a system providing a complete proprietary content-publisher-to-content-consumer distribution channel, to utilising existing RSS feeds and designing a client to serve the consumer's requirements of being notified of releases, searching for a download location, and manually downloading content. This also allowed for a suitable computing-oriented development project to be carried out. This domain is highlighted in the Object Diagram, Figure 3.2, and from here on the primary concern of the project is in designing an application to use two specific sites, http://www.bbc.co.uk/ and http://www.legaltorrents.com/, as two examples of sites that provide a server-side implementation but no particular client-side implementation of content acquisition systems. These two sites were used as a point of reference for all sampled users, and in evaluation and testing (in addition to many other sites using a variety of server software and hardware configurations that extend beyond the project's minimum requirements).

3.6.3 Initial Use Case Modelling

With the environment modelled as a Business Object Model, shown in Figure 3.2, and the scope of the project visualised, the next stage was to model the requirements gathered through interviews and questionnaires in the form of Use Case diagrams. Use Case diagrams offer a simple representation of the user's needs and base requirements for the system, and support the communication of ideas directly with the users [1]. The Use Cases for this project can be found in Appendix G.
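To make the Business Object Model's RSS Aggregator and Item concepts concrete, the following minimal sketch shows how an RSS 2.0 feed might be processed into Item records (each carrying a title and an Item Source URL from the feed's enclosure element) and how a user-defined filter might then select Items for acquisition. It is written in Python purely for illustration; the report does not fix an implementation language at this point, and all names here (Item, items_from_feed, apply_filter, the example URLs) are hypothetical, not part of the developed system.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Item:
    """A content/download entry extracted from an RSS feed (cf. Figure 3.2)."""
    title: str
    source_url: str  # the Item Source: the enclosure URL on the hosting site

def items_from_feed(rss_xml: str) -> list:
    """Process an RSS 2.0 document into Items, keeping only entries that
    carry an <enclosure>, i.e. an attached piece of downloadable content."""
    channel = ET.fromstring(rss_xml).find("channel")
    items = []
    for entry in channel.findall("item"):
        enclosure = entry.find("enclosure")
        if enclosure is None:
            continue  # plain news entry with no content attachment
        items.append(Item(title=entry.findtext("title", ""),
                          source_url=enclosure.get("url", "")))
    return items

def apply_filter(items, pattern: str) -> list:
    """Apply a filter (here a regular expression on the title, as one
    interviewee requested) to select Items for automatic download."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [item for item in items if rx.search(item.title)]

sample_feed = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Episode 1</title>
<enclosure url="http://example.org/ep1.mp3" type="audio/mpeg" length="1"/></item>
<item><title>Episode 2</title>
<enclosure url="http://example.org/ep2.mp3" type="audio/mpeg" length="1"/></item>
<item><title>Site news</title></item>
</channel></rss>"""

items = items_from_feed(sample_feed)
wanted = apply_filter(items, r"episode \d+")
print([item.source_url for item in wanted])
# prints ['http://example.org/ep1.mp3', 'http://example.org/ep2.mp3']
```

The same feed-to-Item separation underpins the functional requirements listed in the next section: the aggregator produces Items, the filter decides which to acquire, and the downloader and local database then act on the selected Items independently of any particular source site.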
3.6.4 Requirements

From the requirements gathering and analysis stages of the Inception and Elaboration iterations, the following functional and non-functional requirements were arrived at. At this point, one of the project deliverables should be clarified, that being the 'design of a prototype of a robust client'. This is partially covered in 2.3.1.3, where the issues of prototyping are raised. Through the adoption of the Unified Process methodology, the prototype software artefact for this project can be described as meeting the Must Have Use Cases to some satisfactory level of implementation, where 'satisfactory' is defined based upon user feedback and evaluation. Similarly, extensions beyond the Must Have cases, namely the Could Have and Should Have Use Cases, can be classified as project extensions beyond the base requirements. In [21] it was proposed that the following criteria, defined by the acronym 'SMART', should be met when defining requirements. Firstly, the requirements should be Specific: clearly defined in appropriate detail and free from any ambiguity. This is difficult for non-functional requirements, as it is hard to quantify aspects of Human Computer Interaction (HCI) in clear language; however, it is sound advice for functional requirements. Secondly, the requirements should be Measurable, in that it is clear at what level of implementation each requirement has been met. Attainable is the next criterion: the requirement must be feasible in regards to the proposed system and technology. Next is the advice that listed requirements be Realizable, feasible in the sense that they can be completed within the available resources and project limitations. Finally, requirements should be Traceable, able to be modelled and realised within the project's development cycle. With these guidelines in mind, the following requirements were chosen.
3.6.4.1 Functional

These will later be classified into Use Cases and prioritised, as covered further under Addressing Requirements Risks, Section 5.1.1.

1. Ability to add and remove input sources: The ability to add new sources of input, i.e. to subscribe to an RSS feed, as well as to remove an existing one.
2. Download material from many locations: The need for the application to be able to take source input from any site and to acquire content from any location, rather than being tied to a specific site.
3. Add filter criteria: The ability for the user to add specific filtering criteria describing what to match and acquire.
4. Apply a created Filter to an RSS Feed: The ability to filter an RSS feed to select content.
5. Download an Item (automatically): The need for the application to support automatic acquisition of filtered items.
6. Import a source Item manually: Whilst the application's purpose is an update platform, another functional requirement is the ability to manually add download items that have not come directly from a feed.
7. Download an Item (manually): The ability to allow users to manually start a download of an item that exists within the system.
8. Store a local database of Items: The ability to store a database of all items that have been filtered from an RSS source, along with all the information associated with those items.
9. Search the local database of Items: The ability to search through the local database of acquired material, at a rudimentary level via the item's name, to select a specific item.

3.6.4.2 Non-Functional

1. Platform Independence: The ability for the system to work on both Windows and Linux.
2. Faster Update Tracking than Manual Methods: The system must ultimately perform faster and require less effort to gain updates than standard browsing techniques.
3. Logical and helpful layout: The application should make use of good design and HCI practices such as tool tips and graphics.
4. Integration into the operating system: The application should look and 'feel' like a standard application.
5. Filter creation help: The application should offer some help for users who aren't particularly knowledgeable about regular expressions.

Chapter 4 Background Research

The aim of this chapter is to discuss and conclude upon development choices, justified by research and by the suitability of each option in fulfilling the project's aims. This chapter discusses the technologies most suited for use in the development phases of this project. The first stage was to extract the project's aims and requirements and consider the underlying issues in each case:

• The product should allow users to get content from many locations. This requires a generic and site-independent solution, capable of processing supplied input from many sites on request. It calls for an evaluation of syndication techniques and information delivery channels to gain insight into the suitability of existing models and methods; furthermore, the functionality of existing techniques should be analysed to ensure that they adequately support the distribution of general content.
• The product should allow users to filter and select content. This implies the application of simple rules on a generic list of content to attain a specific subset to a user's specification: the client must support a general input, and allow users to create and apply rules that filter the wanted content.
• The product should allow users to download material automatically. This requirement compels research into content distribution and how to facilitate downloading in the client application.

4.1 Content Distribution

In order to understand the problem, it was necessary to clarify the project's tier of abstraction, that is, how its aims differ from other models of content distribution management in existence, particularly in contrast to the totalitarian approach of other methods (footnote 1).
Footnote 1: This was covered under the analysis of existing technologies, Section 3.3. A particular problem of distribution channels is the limitation in range of content, in acquisition abilities and in compatibility when utilising solely internal transfer mechanisms.

Ultimately this involves clarifying what the project aims to do, and likewise what it does not. In terms of avenues of distribution, there exist a great many applications that aim to facilitate transfers over various protocols, with varying levels of success and adoption. At this point, it should be clarified that this project's ultimate goal is to simplify the domain and reduce the human interactions required to acquire content, rather than to replace established avenues of distribution. With this in mind, electronic distribution can be split into several application/protocol categories.

4.1.1 Client - Server Models

Client-server models are the predominant network model in existence to date. In this model an end-user client communicates with one or more servers directly, which facilitate the hosting of services and content, most often in a connection-oriented environment [35]. Communication can be client initiated, where the client requests service from the server and receives a response (polling). There are inherent problems here: in a large, scalable system, many clients all send requests to central servers, creating a potential bottleneck and increased demands on the servers [33]. In regards to creating a large generic content distribution system, these increased demands on web servers are far from ideal. Likewise, clients have no ideal way of determining whether service is available or whether data has changed, and as such there is potentially a great deal of redundancy in this model. An alternative approach within the client-server model is a server-initiated connection, in which a server polls a list of clients periodically.
This alternative, in regards to this project, is poor in comparison. The client loses all control over the connection and relies on fair play by the remote server. If the client wishes to stop receiving content, trust and fair play become an issue, given that the server has total control of connection initiation. This problem can be seen prominently in areas of syndication such as email, where spam has been the direct result of server-side control over subscription and management. Furthermore, the client list on the server side has no guarantee of integrity, given that it is held remotely, and the availability of clients in a distributed system is difficult to determine.

4.1.1.1 HTTP and the WWW

The Hyper-Text Transfer Protocol (HTTP) has achieved near-universal network protocol adoption, facilitating the World Wide Web application and the most notable bulk of network communications over and between external networks [15]. HTTP was designed to facilitate the serving of interlinked text documents and the mark-up of basic information, based on request and response functionality [35]. As such, its operation is stateless, and the clients and servers themselves are relatively lightweight implementations [33]. Despite its original purpose and design, numerous innovations have brought more secure and varied communication abilities, such as the extension into state-driven and secure communication. HTTP is a text-based protocol which facilitates the transmission of requests such as GET and POST via a web browser, and sits on top of the Transmission Control Protocol (TCP) for actual transmission along a network. In response to client requests, the web server returns a file along with a document header providing further information on that file [3].
The protocol allows for a wealth of functionality; the original specification, for example, allows for the transfer of just document headers via the HEAD command, which helps reduce the redundancy issues identified in 4.1.1 for client-initiated client-server models. Using a conditional GET with the If-Modified-Since request field, for instance, it is easy to determine whether content has changed server side, greatly reducing potential redundancy. The returned item may be a document or a server error response denoting restricted access, missing or moved files, or other serving errors. An application using HTTP should correctly interpret and handle these events to be fully standards compliant [3].

During the implementation of the project, further research of the HTTP standards was required in regards to connecting to a page and being redirected to the actual hosted content. Initially, a simple download of the URL listed within the item source was utilised; however, this caused the application to obtain the content but with an invalid filename. Upon inspecting the HTTP headers, the Content-Disposition field could be seen as the source for the filename, and the program was modified to utilise this aspect of the HTTP standard, allowing file attachments to be used with dynamically created pages. Once this part of the specification was identified, it was a simple matter of parsing out the required attachment filename, allowing filename extraction from dynamic pages and redirects [3].

HTTP Header 1: HTTP response header showing use of Content-Disposition

{X-Powered-By=[PHP/5.1.2],
 Date=[Mon, 6 Feb 2006 21:10:03 GMT],
 Content-Type=[application/octet-stream],
 Content-Disposition=[attachment; filename=application.exe],
 Server=[lighttpd/1.4.11]}

As can be seen from HTTP Header 1, the content's filename is hosted elsewhere on the server and is linked to as an attachment by the dynamic page script.
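The filename extraction described above can be sketched as a small helper using only the Java standard library (Java being the language ultimately selected in Section 4.5). The class and regular expression below are illustrative assumptions, not the project's actual code; real-world Content-Disposition values can be more varied than this pattern covers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DispositionParser {
    // Matches filename=value or filename="value" inside a
    // Content-Disposition header, stopping at a quote or semicolon.
    private static final Pattern FILENAME =
        Pattern.compile("filename\\s*=\\s*\"?([^\";]+)\"?");

    // Returns the attachment filename, or null when the header carries none.
    public static String filenameOf(String contentDisposition) {
        if (contentDisposition == null) return null;
        Matcher m = FILENAME.matcher(contentDisposition);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Applied to the header above, `filenameOf("attachment; filename=application.exe")` yields "application.exe", recovering the correct name for content served by a dynamic page script.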
In this case, application executable data is linked to with a filename of application.exe.

4.1.1.2 FTP - File Transfer Protocol

The File Transfer Protocol is another widely utilised method of distributing content. Whilst HTTP's original purpose can be pinpointed to serving documents and marked-up linking text, FTP is a simpler method which predates it. FTP's design was solely to facilitate the transfer of files to and from a remote server, and even between remote servers [35]. Given the prominence of the protocol pre-World Wide Web, and the currently high utilisation of FTP [15], it is logical that a content retrieval application should support this protocol.

4.1.2 Peer-to-Peer Filesharing Networks

Since the advent of peer-to-peer technology, this avenue of distribution has grown rapidly in utilisation. The first major application to facilitate the distribution of content was Napster in the late 1990s, which allowed for the distribution and acquisition of audio files [35]. Following its demise due to lawsuits over content copyright issues, several other networks rose to prominence, and to date many large-scale networks dominate the environment. A filesharing application's peer-to-peer structure is by nature more complex and varied than traditional client-server networks, and its network traffic is likewise either search based or index based, with almost all mature third-generation applications capable of multiple-source downloads natively, in contrast to the client-server applications above. It is therefore difficult to implement existing peer-to-peer network functionality natively, and as such the utilisation of filesharing networks will be of supplementary concern for this project. However, many prominent networks such as eMule and BitTorrent facilitate the distribution of file references that point to a content footprint on the network via various hashing models (footnote 2) [6].
Footnote 2: eMule clients utilise a proprietary MD4-based hashing mechanism to summarise network content as a URL, whereas BitTorrent uses a supplementary .torrent file that summarises the content and location in more detail utilising a SHA-1 hash.

This method has been put into use by distributors of large software artefacts, for example Linux distributions, to ease the load on any central server previously used to distribute content. As well as utilising third-party networks to acquire content, it is also possible to integrate peer-to-peer functionality and create a fat client for managing content acquisition. In such a system, content source lists can be distributed amongst clients, taking some of the burden away from server polling and transfers. However, with this complexity come great issues of security and trust: how a source can be determined as safe, and how the content's integrity can be guaranteed, pose huge problems, since without an appropriately secure hash function a unique reference is not always guaranteed to avoid file collisions.

4.2 Syndication Technologies

Syndication is the terminology associated with information distribution and management: the distribution of news content and information to a list of recipients to promote new releases and products, or to promote the company itself through promotional content. In a formal sense, syndication is most often linked with business-to-business information sharing and collaboration, where there is a long history of sharing content via processes such as EDI. In regards to business-to-customer (content publisher to consumer) information sharing, in particular reference to computing, content publishers utilise a great number of avenues of distribution. Primarily these are email, the web, and news syndication avenues [5].
4.2.1 Push

The concept of push can most easily be defined in a business sense: pushing content such as news and press releases out 'beyond the firewall', where the recipient has no control over what is received, only whether it is received [10]. Traditionally the predominant forms of pushing content are email, newsgroups and publication; however, since its introduction RSS has offered an increasingly popular method for distributing content centrally [18].

4.2.2 News

The use of newsgroups for communication has been a part of network computing for decades, and still boasts large coverage even in the face of rival methods. The basic premise is that a user accesses and subscribes to newsgroups on public or private servers, and from there synchronises subscribed groups with that server [35]. The server itself is synchronised with further servers in a network of newsgroup servers. Depending on access rights and configuration, users can post new topics and replies, and retrieve files from the server. The model inherently has many associated issues; for example, there is no guarantee of retention of content, nor that a post to a particular newsgroup server will spread to other news servers. As an avenue of distributing content, although the technology supports attaching data to news posts, the means of distribution are far from ideal. Restrictions on post and data sizes have ultimately meant that the service has largely only been utilised for driver and small-file distribution, and for illegal activities [31]. To this end, the utilisation of news as a source for content is infeasible. The lack of retention on the majority of newsgroup servers, and the low spread of the technology to home users, mean that it is unsuitable for a generic distribution model, both in terms of notification of content releases and of distribution of the material itself.
Predominantly, newsgroups are an area for discussion and support rather than a means of publishing and announcing the release of content. Recent trends of internet usage have pointed to an overall decline in the usage of newsgroups, largely attributed to the robustness of electronic communication in other forms, such as the WWW [8]. This is understandable given the age and rigidity of news as a discussion platform; the expressiveness and freedom of information is much more evident elsewhere, and the ability to communicate beyond a plain-text environment is one such reason for this.

4.2.3 Email

Email is now one of the most common electronic forms of communication in use to date [35]. However, with this huge utilisation come problems of control and of usefulness as a critical or formal channel of communication. A major problem with email in its current state is the huge amount of unsolicited mail (spam) in circulation. The problem exists because of total publisher control over the sending and the recipients of a mail message, leading to the abuse of these responsibilities. As such, email is considered a lesser form of formal communication. Mail messages are unverified and untrustworthy, and consumers lose all control over what they wish to receive and from whom: once an email address is obtained, it can be sent any message a sender desires. Utilising email as a source of program update notification is therefore a far from ideal method of syndicating release information; mail can be wrongly identified as spam, spoofed by a malicious third party, or otherwise invalidated with relative ease. However, many companies still actively release information via this avenue because of its low cost and wide reach.
4.2.4 RSS

RSS (RDF Site Summary) is an XML technology introduced by Netscape in 1998 as a means to push content to and from a central web portal, which parsed this content into usable, marked-up information that was finally displayed on the site. RSS works on the notion of breaking a site's content down into base metadata, and then using XML to mark this metadata back up into a single XML file that is made publicly available over the World Wide Web. Subscribing parties may then use this feed and parse the required data back into meaningful information for their own purposes. The advantage over server-controlled syndication such as email is that information is request oriented, retrieved to the end client's specification, not the publisher's. This allows a subscribing party to retain full control over the content: upon publisher abuse, the subscriber can remove the feed and no longer receive the content. The favourability of RSS over email is expressed clearly in [18], where it is stated as being largely down to the fact that, 'to retrieve a feed you don't need to provide you[r] e-mail address. That is why people prefer RSS distribution to e-mail newsletters.' RSS is most notably present in the news and blogging communities for syndicating headlines, and because of its lightweight and simple syntax it is being adopted rapidly [5].

4.3 RSS - RDF Site Summary

In its original design by Netscape, the purpose of RSS feeds was the instant sharing of news between affiliate sites, and the syndication of static news content to subscribed parties. In its original form the aim was to alleviate bandwidth costs by sending simplified text summaries of a site's content, without HTML mark-up tags, and delivering this stripped-down summary to a requesting party's RSS reader or web portal on the client side.
The benefits of syndication via RSS are numerous; in comparison to distributing via a standard webpage, the use of RSS provides a way of standardising the structure and syntax of the content [18]. Originally, the syntax of RSS was RDF tag based, a more complex standard of XML mark-up. This was later simplified to a basic XML entity tag structure, and the format ultimately adopted the name Really Simple Syndication to denote the standard change. From this point the standards fragmented further, as rival companies developed their own standards on the two branches of RSS. As well as structure, and the ability to defeat spam thanks to the consumer-controlled connection, RSS provides direct support for attaching non-text content through the extensible nature of the technology [30]. Despite what RSS offers, there have been numerous criticisms of the technology. The criticism largely focuses on its XML base and its relation to bandwidth costs: whilst XML results in a coherent and easily readable format, this comes at the cost of increased file size, and in terms of network transfers XML is far from optimal. [12] claims that RSS is not ideal for syndication, and that the bandwidth costs associated with it are too high. However, other sources claim that issues of bandwidth can be addressed via proper client-side and server-side implementations; for example, a headers-only request will reduce the redundancy problems linked to the user-initiated polling model [23].

4.3.1 Metadata Mark-up

RSS mark-up relies on relatively basic XML entity tags which encapsulate core metadata into item entries, composed of a number of required and optional tags such as 'title', 'link' and 'description' within an 'item' tag. This is shown more precisely in XML Fragment 1, with a sample item and its primary tags. Whilst title and link are required features of an item, description can be omitted to present an item in its simplest form.
XML Fragment 1: RSS feed item

<item>
  <title>Title</title>
  <link>http://domain/valid/url</link>
  <description>Sample Item</description>
</item>

As the RSS standards have matured, more functionality has been attached, and the standards offer a host of useful data-tagging options. However, with increased amounts of mark-up come greater bandwidth costs associated with the feeds.

4.3.2 Standards

Like the acronym itself, there is no single official standard defining an RSS feed's syntax. Whilst the encompassing document structure is XML, the specific definition language varies, and counting each revision there are in fact nine standards of RSS in active use. This was the result of Netscape abandoning the format in the early stages of development, only to have other companies take over the responsibilities of continued development. There has been continued criticism of the standards for this reason [25]. Given the incompatibilities between formats (and in fact RSS 2 being incompatible with an earlier version of itself), the lack of a clear single standard has resulted in lax implementations of RSS feeds and varying methods of attaching content to feeds, as well as of marking up the content itself. These lax implementations in turn place greater requirements on RSS client developers to create a client robust enough to support all the standards, and the variations that the standards allow. As [30] summarises:

As a result authors typically test their feeds in a small number of ... aggregators, and [developers] ... are force[d] to reverse-engineer market leaders' behaviours with little guidance from the specification.

The most prominent versions of RSS in use are the semi-official RSS 2.0 and the Atom standard [30].
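To illustrate how an item such as XML Fragment 1 is reduced back to its metadata, the sketch below pulls a single tag's text out of an item using the JDK's built-in DOM parser. This is a simplified stand-in for the ROME/JDOM approach discussed later in Section 4.5.3, with an invented helper class; a real aggregator must be far more defensive, given the lax feed implementations described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ItemParser {
    // Returns the text of the first <tag> element in the given XML,
    // or null when the tag is absent (description is optional) or
    // the XML is malformed.
    public static String tagText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {
            return null; // treat unparsable feed content as missing
        }
    }
}
```

Parsing the fragment above with `tagText(item, "title")` yields "Title", while asking for an omitted "description" yields null, mirroring the required/optional split between the item's tags.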
4.3.3 Original Use and Project Extension

Given the original purpose of RSS feeds, syndicating basic text summaries, the developing standards have provided a means to move beyond this pure-text nature into an avenue for distributing any content, with support for attachments on a particular feed item. In 2004, this was first utilised to attach audio clips to a feed for syndication, but despite this ability in the RSS standards, only basic applications have arisen to exploit file attachments. As demonstrated in the evaluation of existing clients in Section 3.3, existing applications fail to take a content-distribution-centric stance and instead utilise feeds as sources of plaintext news. The technology can serve this purpose, but it has yet to be proven through the development of an application designed solely to do so, which is the aim of this project.

4.4 Content Management

The utilisation of a Database Management System (DBMS) for local data storage, such as Microsoft SQL Server or Access, has a host of advantages in terms of performance enhancements, optimisations, speed, native query languages and integrity checks. As a managed solution, utilisation is dependent on a local installation of a third-party management system, but because of this the offering is usually feature rich [9]. The database schema can be guaranteed, data types can be enforced and checked, and the overall structure of the database, particularly if the database is relational, is managed. In addition to this, transactional interaction is supported, allowing mistakes to be rolled back. The alternative is to utilise a flat-file storage system, where all of the advantages outlined above are lost, but the system is no longer dependent on third-party management, has reduced amounts of code and allows for a totally portable solution.
Given that one of the minimum requirements of the project is operation on both Windows and Linux based systems, flat storage seems the most appropriate option despite the loss of functionality and efficiency. Some of the enforced features of a DBMS can be achieved in a flat-file system by utilising XML to enforce basic data types and structure.

4.5 Client Language

There are a large number of possible implementation languages for the client application. From these, three main languages immediately stand out as the best-suited candidates, with high-level functionality, support for object-oriented implementations and native networking abilities. From these, one was chosen that best reflected the needs of the application.

4.5.1 C++

C++ is a widely utilised high-level language that extends C to add object-oriented traits. As a compiled language, C++ provides quick execution times which interpreted languages cannot match. The language also provides low-level functionality if required, as well as mature libraries offering a wealth of functionality. However, a drawback of being a compiled language is that the compiled binaries lose portability, requiring the source to be recompiled on different operating platforms, with possible code ports for low-level function calls that cannot be replicated in different environments. Similarly, the portability of libraries also poses a problem for its use. User interface design is most often supported by libraries such as GTK+ (http://www.gtk.org/), which offer portability amongst the most popular operating platforms and provide a means for developing high-quality user interfaces.

4.5.2 Python

Python is a modern interpreted language, implemented in C, which provides a syntactically simple method of developing applications with great support for networking.
Whilst utilising simple syntax, Python provides a wealth of high-level functionality and is increasingly used for network applications (such as the aforementioned BitTorrent client), as well as middleware and server scripts. The language also provides support for basic user interface implementation, supported by external libraries such as a Python port of GTK+. As a language, Python is becoming increasingly popular, most notably in academic circles, and despite its simple syntax it provides a means for realising object-oriented designs and can easily meet the functional requirements of this project.

4.5.3 Java

Java is a popular high-level language and offers a strong choice for the development of this project. Whilst Java requires compilation, it is in fact classified as an interpreted language: the Java Virtual Machine (JVM) reads the compiled byte code and translates it into platform-dependent instructions, allowing the execution of Java applications on any platform with a Virtual Machine available. Part of Java's strength as a language is the large number of native and third-party libraries available, providing a wealth of functionality and supporting rapid development. ROME (https://rome.dev.java.net/), a particular third-party library, provides a means of acquiring and parsing RSS feeds of different standards with the aid of an XML parsing library, JDOM (http://www.jdom.org/). The utilisation of these two libraries vastly reduces the amount of work required in programming the network communications and in reading and parsing the local database stated as an extended requirement of this project. Natively, Java provides great support for GUI design through its Swing and AWT libraries, and with the support of JDIC (https://jdic.dev.java.net/), the HCI non-functional requirement of desktop integration can be achieved.
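As a small illustration of how the filtering requirements (Section 3.6.4.1, items 3 and 4) map onto Java's standard library, item titles can be matched against a user-supplied regular expression with java.util.regex alone, no third-party library needed. The helper class below is a hypothetical sketch for illustration, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ItemFilter {
    // Returns only the titles matched (case-insensitively) by the
    // user's filter pattern, preserving feed order.
    public static List<String> apply(String userPattern, List<String> titles) {
        Pattern p = Pattern.compile(userPattern, Pattern.CASE_INSENSITIVE);
        List<String> matched = new ArrayList<>();
        for (String t : titles) {
            if (p.matcher(t).find()) matched.add(t);
        }
        return matched;
    }
}
```

For example, applying the filter "episode \d+" to a feed's item titles keeps release announcements such as "Episode 12 released" while discarding unrelated entries, which is exactly the select-and-acquire behaviour the functional requirements describe.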
There are drawbacks to Java, largely relating to the requirement of having the Java Virtual Machine on the host computer and the performance impact of being an interpreted language. These disadvantages aside, however, the language offers a strong implementation choice.

4.5.4 Conclusion

Table 4.1: Comparison of Languages

                       C++                       Python                  Java
Type                   Compiled                  Interpreted             Interpreted
Portability            Requires re-compilation   Requires                Portable byte code
                       and porting               re-compilation
Object-Orientated      Yes                       Yes                     Yes
GUI                    External support          Some external support   Native
Speed                  Fast                      Fast / Medium           Slow on first run, Medium
Networking Support     Yes                       Yes                     Yes
Supporting Libraries   Lots                      Many                    Lots
Testing                CppUnit (JUnit port)      -                       JUnit

All three languages are capable of realising a product that would reach all of the functional requirements of this project. However, the ease of achieving this differs greatly between languages. C++ would provide the smallest and least resource-intensive solution; however, in contrast to Java it lacks the portability expressed as a non-functional requirement, and would require more coding and implementation time compared with the library-supported functionality available for a Java solution. With strong native GUI-building support, and the ability to design and create test cases using the JUnit libraries, Java provides the most sensible means of achieving the project's implementation goals.

Chapter 5 Design and Implementation using the Unified Process

5.1 Development with the Unified Process

The development of the project using the Unified Process allowed for rapid realisation of the project. Given the size and wealth of documentation produced by development with the Unified Process, detailed records of design can be found in the project's appendices. The following sections indicate the key risk-mitigation methods utilised, along with development summaries.

5.1.1 Addressing Requirements Risks

The bulk of this project's requirements gathering techniques are covered in Chapter 3.
From here, the requirements were formalised into a specification which defined the focus of development and the goals and aims of the system. This Requirements Specification document is a formal presentation of the requirements, problem domain and other aspects of development discussed in this report; due to its length and its repetition of material found in Chapters 1 - 4, it has been omitted from this report but is detailed in Appendix F. The specification itself is based on a Requirements Specification template provided by the Atlantic Systems Guild (www.systemsguild.com).

From this formal document, it was then a matter of modelling each requirement as a Use Case, and then realising these Use Cases to create architecturally significant diagrams. From the UML model of the Use Cases, it is easy to classify the importance of particular cases and thus define the core 'Must Have' functionality.

An important reason for formalising the user requirements was to avoid function creep, where more and more requirements are requested as the project develops. Whilst such feedback is important to capture, it is undesirable to have the focus of the project compromised by an increasing number of requirements in later iterations. As such, by presenting the core needs of the application in a Requirements Specification, the required functionality to be delivered can be documented and finalised.

From here, realisation was achieved through utilisation of a Use Case Description Form (UCDF) (see Appendix I) which defined the data flows and steps required for each Use Case.

Figure 5.1: Use Cases: (a) Must Haves, (b) Should Haves and (c) Could Haves

From the UCDF, an Activity Diagram, which mapped the data flows into a sequence of steps taken by the user, and then finally a Sequence Diagram, which mapped the system calls needed to carry out a task, were created (see Appendix K).
By first formalising the gathered requirements in a Requirements Specification, which could be shown to users to gain agreement, and then utilising Use Case development, which expressed this document clearly to the user base, the requirements gathering risks were correctly mitigated.

5.1.2 Addressing Architectural Risks

Based on the domain presented in 3.1.1, the architecture of the system can be formalised with a Deployment Diagram, which addresses the architectural and technical requirements of the system. [1] clarifies the need for its production: 'to model hardware that will be used to implement the system and the links between different items of hardware'. In this case, the Deployment and Component stages have been combined to show a picture of the dependencies between components of the system.

Figure 5.2: Deployment Diagram

The client application is dependent on the availability of the RSS feed sources as well as the Item data sources, which are hosted either on the same server or on two separate servers. As shown above, there are two distinct network communication transactions. Firstly, the network is utilised to request the RSS feed from the server, which is then retrieved and undergoes local processing. The second transaction is the receiving of the item. In terms of minimum requirements, acquisition is required from either the BBC fileservers or those of legal torrents, which are provided over HTTP. However, an extended feature is to utilise external networks to acquire content. In later implementation stages, this was achieved by saving downloads of a particular file type to a set location, which initialised retrieval in an external application (in the case of torrent files), or by executing applications by MIME type (ed2k protocol downloads, for example). HTTP and FTP downloads were handled natively by the application.
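The dispatch described above (HTTP/FTP handled natively, torrent files dropped into a location watched by an external client, other protocols handed off by MIME type) can be sketched as follows. The class, method and handler names are illustrative assumptions, not the project's actual code.

```java
// Illustrative sketch of file-type-based dispatch for acquired items.
// All names here are hypothetical, not taken from the project source.
public class AcquisitionDispatcher {

    // Decide how an item should be acquired, based on its location string.
    public static String handlerFor(String location) {
        String lower = location.toLowerCase();
        if (lower.endsWith(".torrent")) {
            return "external-watch-dir";   // saved where a torrent client polls
        } else if (lower.startsWith("ed2k://")) {
            return "mime-launch";          // handed to the OS by MIME/protocol type
        } else {
            return "native-download";      // HTTP/FTP handled by the application
        }
    }

    public static void main(String[] args) {
        System.out.println(handlerFor("episode.torrent"));
        System.out.println(handlerFor("http://example.org/file.mp3"));
    }
}
```

The key design point, as in the report, is that the application only implements the common protocols itself and delegates everything else to existing tools.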
5.1.2.1 Redundancy

As seen in the Deployment Diagram, there are two network transactions that ideally should be kept to a minimum to save resources. The requesting of a file download is not so much of a concern, as a download is merely a single connection that is utilised until the download is complete, with no repeat requests. On the other hand, the RSS request connection poses some redundancy, given that updates are not assured. As previously covered in the Background Reading chapter surrounding the HTTP standards, the serving of these feeds over HTTP allows for the utilisation of a conditional GET (via the If-Modified-Since request header), which reduces the amount of redundancy. Although this means extra overhead in receiving the headers when the feed is updated, its benefits are realised when the feed is not updated, as substantially less data will be transferred. The adoption of the ROME RSS libraries offers an implementation of this feature and helps reduce the central RSS server load by ensuring clients obey the HTTP standards.

5.1.2.2 Three Tier Architecture

Figure 5.3: Presentation, Application, and Data layers (PAD)

Good practice in designing high quality systems architecture is to split applications into three tiers of operation, those being the Presentation, the Application and the Data layers (PAD). The Presentation layer focuses on user input design and HCI issues, and is supported by several techniques, such as prototyping and agile design sessions, within the Unified Process. The Application layer focuses on programming the functional requirements of the system and is the focus of Use Case realisation and object-orientated design principles. The Data layer is the implementation of persistent data storage, through the utilisation of flat files, database management systems and other database concepts.
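The conditional GET mechanism rests on the client remembering when it last saw the feed change and sending that timestamp back to the server, which may answer 304 Not Modified with no body. ROME's fetcher module provides this behaviour ready-made; the standalone sketch below only shows the HTTP date formatting involved, and its class name is illustrative.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch of the client side of a conditional GET: format the time of the
// last successful fetch as an RFC 1123 HTTP date, suitable for use as the
// If-Modified-Since request header. Illustrative only; the project relied
// on the ROME libraries for this.
public class ConditionalGet {

    // HTTP dates use the RFC 1123 format, always expressed in GMT.
    public static String httpDate(Date lastFetched) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.UK);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(lastFetched);
    }

    public static void main(String[] args) {
        // A client would set this as the If-Modified-Since request header;
        // a 304 Not Modified response means the cached copy is still current.
        System.out.println("If-Modified-Since: " + httpDate(new Date(0L)));
    }
}
```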
The aim of this concept is to move functionality away from the immediate presentation layer to the application layer, and the handling of data from the application layer to the data layer, which has numerous advantages in achieving a high quality system. PAD ensures that the system is loosely coupled: for example, interfaces on the presentation layer can be modified in response to feedback from users, but this redesign is independent of application layer development and should not alter the overall behaviour of the application. In such a case, the Presentation layer provides a means for users to interact with the system but not to directly manipulate data or the system. This also makes for a sound security policy of mistrusting all forms of user input. Whilst method invocations from the Presentation layer should undergo heavy validation, the Application to Data layer boundary can be trusted to a certain degree because of the inability of users to directly manipulate the data.

5.1.3 Addressing Design Risks

The design of the system using the Unified Process led to its rapid development. The completeness of the requirements gathering stages meant that Use Cases could be quickly created and translated into a Design Class Diagram, through the realisation of the Use Cases with Sequence Diagrams. This showed the objects and methods required to create the system. The final Design-Level Object Diagram follows (earlier versions can be found in Appendix L).

5.1.3.1 Presentation - User Interface Design

The design of the user interface was modelled with continued feedback from users during agile design sessions and the utilisation of prototyping to quickly bring about viewable designs. Given that all functionality was moved from the presentation to the application layer, this allowed for free review and redesign of the presentation layer.
Agile design is the process of robust and open review sessions where developers and users discuss functionality and the look and feel of the system directly, with tools and guidelines for changing aspects of the design without wasting time on overly encapsulating procedures. Lightweight and highly interactive, agile modelling yields high levels of productivity given the small time costs of the technique and the direct interaction with users. The result is that the technique greatly helps in mitigating design risks by getting a favourable design realised with great speed. User interface prototypes can be found in Appendix N along with other agile development artefacts.

5.1.3.2 Application - Refactoring and Central Classes

When designing the core application, the system underwent a series of refactoring stages, with the aim of simplifying its design and thus reducing its complexity.

Figure 5.4: Design-Level Object Diagram

At the centre of the product's design was the concept of a centralised Scheduler and Retrieval architecture, which scheduled and retrieved downloads respectively. Originally the system's design was based around a single Retrieval class which handled all network communications, and a single Scheduler which handled all the polling and download scheduling. As development progressed, this proved inefficient and led to concurrency problems and class type collisions, that is, poor and unintuitive casting. To combat this, the central architecture was refactored to introduce higher levels of inheritance and to decouple acquisition from the management of acquisition. The central Retrieval class became an abstract class, with RSS and Item specialisations that better handled the requirements of managing the two types of downloads and then processing them further. Similarly, the acquisition itself was moved into a Download class which handled network communications and file creation. This was likewise the case for the Scheduler class.
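The refactored shape described above can be sketched as follows. The class names Retrieval and Download follow the report; the method signatures and bodies are illustrative guesses at the design, not the project's actual code.

```java
// Sketch of the refactored central classes: an abstract Retrieval with RSS
// and Item specialisations, delegating raw acquisition to a Download class.
// Method names and bodies are hypothetical.

// Handles raw network communication and file creation for one transfer.
class Download {
    private final String url;
    Download(String url) { this.url = url; }
    String fetch() { return "bytes from " + url; } // real code would open a connection
}

// Abstract base: shared acquisition management, with per-type post-processing.
abstract class Retrieval {
    protected final Download download;
    Retrieval(String url) { this.download = new Download(url); }

    // Template method: acquisition is common, processing is specialised.
    final String acquire() { return process(download.fetch()); }
    protected abstract String process(String data);
}

// Specialisation for RSS feeds: parse and filter the retrieved feed.
class RssRetrieval extends Retrieval {
    RssRetrieval(String url) { super(url); }
    protected String process(String data) { return "parsed feed: " + data; }
}

// Specialisation for content items: save the retrieved payload.
class ItemRetrieval extends Retrieval {
    ItemRetrieval(String url) { super(url); }
    protected String process(String data) { return "saved item: " + data; }
}

public class RetrievalDemo {
    public static void main(String[] args) {
        Retrieval r = new RssRetrieval("http://example.org/rss.xml");
        System.out.println(r.acquire());
    }
}
```

The decoupling means the Scheduler never needs to cast between download types: it works against the abstract Retrieval interface alone.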
The use of refactoring is important in achieving simple, efficient design and is sought after when developing with the Unified Process [28], both in terms of high quality design and in terms of future expansion, given the Object-Orientated preference of the methodology. Figure 5.5 represents the refactored classes.

Figure 5.5: Design-Level Central Classes

5.1.3.3 Data - Logical and Conceptual Design via ERD

With regards to data storage there are numerous problems, such as duplication, redundancy and inconsistency, that can affect a system's performance. These problems, however, can be overcome by adopting sound database design techniques. The Relational database model is the de facto standard design paradigm in use today. In this model, the principle is the storage of each fact once in a single location, and the interlinking of related tables of data via unique primary key record identifiers. Design with the Relational Model is supported through the utilisation of an Entity Relationship Diagram (ERD), which attempts to prevent the aforementioned data storage problems by meticulously remodelling the entities of the system. In an ERD, entity objects are drawn and the multiplicity between entities is used as the basis of determining how a database table needs modelling to represent all the required data whilst storing this data only once. This model differs from the basic structure of the entity classes within the Design Class Diagram, namely because ERD modelling is concerned primarily with ensuring data is stored once. This is largely due to the limitations of traditional database management systems, which cannot store one or more records directly in a field of a table and must instead index primary and foreign keys, whilst object-orientated languages like Java offer the ability to store one or more object references directly in a field of another object.
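This mismatch between the two models can be sketched in code: a relational join table of (filterId, feedId) pairs is folded back into per-object reference lists. The table layout and names below are illustrative assumptions, not the project's schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of rebuilding an object-level many-to-many relationship (a Filters
// object holding an ArrayList of RSS feeds) from a hypothetical relational
// join table of (filterId, feedId) rows. Names are illustrative only.
public class JoinTableMerge {

    // Each int[] is one row of the hypothetical join table.
    public static Map<Integer, List<Integer>> feedsPerFilter(int[][] joinRows) {
        Map<Integer, List<Integer>> feeds = new HashMap<Integer, List<Integer>>();
        for (int[] row : joinRows) {
            int filterId = row[0], feedId = row[1];
            if (!feeds.containsKey(filterId)) {
                feeds.put(filterId, new ArrayList<Integer>());
            }
            feeds.get(filterId).add(feedId); // becomes the object's 'feeds' list
        }
        return feeds;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 10}, {1, 11}, {2, 10} };
        System.out.println(feedsPerFilter(rows)); // filter 1 relates to feeds 10 and 11
    }
}
```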
An example of this is seen in Figure 5.6, where the feeds attribute of Filters is an ArrayList of RSS. In the Relational Model a separate table of record indexes is required to hold this many-to-many relationship. In translating from the data layer to usable data objects matching the Design Class Diagram, the two tables that make up the many-to-many relationship are merged, and an ArrayList of RSS is constructed and set as the feeds attribute of Filters. The model contains several many-to-many relationships, where one record in one entity relates to many records of another, and those records in turn relate to many records in the original table.

Figure 5.6: Conceptual Level ERD and Design Class Diagram equivalent

Before normalisation (the method of removing repetition in the model), records are stored several times, which leads to consistency problems. The Conceptual Model (Figure 5.6) is translated into a logical model which solves this problem by normalising the entities, splitting them into several tables by carrying out vertical decomposition. The logical model can be found in Appendix M. Given that none of the entities has a unique identifying field, it was necessary to create an artificial key for each entity to uniquely identify records. Figure 5.7 represents the final database schema of the data store, which is in Boyce-Codd Normal Form (BCNF).

5.1.4 Addressing Implementation Risks

The final distribution of the application is available for download from http://autofeed.jacobbriggs.com.

5.1.4.1 Presentation - User Interface Implementation

When implementing the UI, issues of HCI were taken into account. An important aspect of UI implementation is the logical layout of functionality. Users rely heavily on short-term memory to hold the details of the various applications and functionality involved in computer operation.
Most sources, such as [7], claim that a user can only remember a certain number of steps, and that application developers should ensure that tasks require no more than 7 ± 2 steps to be remembered. To this end, GUI and menu depths were kept below this limit to combat any confusion. Similarly, relevant information was restricted to a single tab, and tool tips were used to prevent users having to rely solely on memory to complete tasks.

Figure 5.7: Database Schema in BCNF

User interfaces for the most part integrated validation on all sources of input, to prevent erroneous data from entering the application layer. In cases where a user commits an action, a visible response is given to ensure the state is known and the action has either succeeded or failed. UI design authors promote this implementation feature, in that users' expectations should be visibly confirmed [7] [32].

Figure 5.8: Example HCI: (a) Idle Tray Icon and (b) Critical tray icon and error message

Part of the non-functional requirements for the system was the ability to integrate the application into an existing operating environment. This was achieved through a number of methods. A third party library, JDIC, was utilised to achieve a number of extended features, such as the ability to hide menus and keep a tray icon present in the operating system tray. This allowed the application to continue to function transparently without a continual on-screen window, yet still allow users to view the application at any time. Critical feedback was given via 'balloon messages'. These features proved popular in feedback interviews. Similarly, other aspects of the interface were considered with regard to standard HCI techniques. Menu systems were created to ensure that all features within the application were available within a maximum of seven stages, to ensure that tasks could be performed simply and were easily remembered.
This accounts for the higher times for carrying out tasks in the initial tests during the testing of the application (evidenced in Section 6.1.3).

5.1.4.2 Data - Physical Implementation via XML Flat File System

The implementation of the database was via a flat file system, marked up in standard XML format and parsed and converted by the program through the utilisation of two classes, SystemData and FileParser. This structure allowed for dynamic input, where standard calls in SystemData populated HashMaps which contained lists of objects constructed through the parsing of input from an XML file. The reason for choosing XML over a database management system implementation was portability. An early non-functional requirement expressed in early interviews was the ability to make an operating system independent and portable application. Whilst database management systems boast highly optimised database management functionality, they must be installed on the host operating system to be used, and thus utilising an external system would compromise this non-functional requirement. To this end, the decision was made to take the XML parser that the application already used for parsing RSS feeds and to extend its usage to handle local input. This allowed for a database of flat files that let the application work in any operating system environment without being dependent on any third party utility. Examples of the XML flat file database can be found in Appendix M. There were drawbacks to this system, not only in terms of the technology as previously covered, but also in the differences in data representation between Java and standards-compliant XML, which were witnessed in the execution of the JUnit testing (see Section 6.1.6).
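The flat-file idea can be sketched with the JDK's built-in DOM parser reading a settings fragment into a HashMap. The element names and the single-class structure below are simplifications for illustration; they are not the project's SystemData/FileParser code.

```java
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch of the XML flat-file approach: a standard XML parser reads a
// settings file into a HashMap, with no database server required.
// Element names and structure are hypothetical.
public class FlatFileStore {

    public static Map<String, String> parseFeeds(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Map<String, String> feeds = new HashMap<String, String>();
        NodeList nodes = doc.getElementsByTagName("feed");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element feed = (Element) nodes.item(i);
            feeds.put(feed.getAttribute("name"), feed.getAttribute("url"));
        }
        return feeds;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feeds>"
            + "<feed name='news' url='http://example.org/rss.xml'/>"
            + "</feeds>";
        System.out.println(parseFeeds(xml));
    }
}
```

Because the parser ships with the Java platform, the data layer stays as portable as the rest of the application, which is the trade-off the report describes.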
5.1.5 Simplifying the Domain

Even though the process of feed traversal to receive release notifications and to obtain material is simplified by the use of RSS feed management, the issue was raised in feedback meetings that it is often hard to browse through sites searching for appropriate RSS feeds. It is true that feeds are often hard to locate within a site. The effort required to find repeated updates, which is tackled as a minimum requirement, is the same as that of initially locating the feed within a site's structure (which can be seen in Appendix P as the manual acquisition times), and after adding the RSS feed to the product, there was still the requirement to add a filter to identify the material required from it. This adds a level of manual interaction that can be largely eliminated to reduce the workload for the consumer.

To this end, a solution was devised to combat this annoyance: introducing a custom file format that would summarise the location of the download (that being a list of source RSS feeds), and also the content's filename aliases within the feeds as Filter objects, summarised as a Filters class (see Design Class Diagram). In addition, the notion of advanced scheduling for the expected time of updates was added as an optional extension to this file format, to give the content publisher a means to quickly make a file that would provide all the information required for a client to receive updates with one click. The idea was to significantly reduce the workload required of the consumer by adding a hosted feature that could be made once by the content publisher with very little effort and provided on the content publisher's site, or shared directly amongst users. To facilitate the easy production of this summarising file, RSS, Filter, Filters and Schedule objects could be bundled and exported from the client application.
The payback for the host came with the ability to have ideal retrieval settings set up for users by default, and to optimise the bandwidth costs associated with user polling network models. For the user, the ability to transport settings between clients, or to quickly configure their client from a trusted source with one click, proved hugely beneficial. With the added schedule data, a host could easily include data on how frequent updates to the feed are likely to be, and the client could intensify queries of the RSS feed during that period, but leave infrequent requests for the rest of the time, when releases are unlikely. This extended product feature allowed for an even higher level of automation than was set out initially in the project's aims: when adding a specific classification of content to the product, all that was required to add and begin filtering an RSS feed for specific content was a single file associated with the product at runtime. After its implementation, review groups indicated this was a positive, time saving feature, and as such it helped boost the overall quality and professionalism of the product.

5.1.6 Polling History and Learning Responsible Request Times

The ability to 'intelligently' learn an appropriate delay between subsequent RSS feed retrievals, and moreover to learn a sense of when a feed is likely to update, would serve as a great optimisation technique, reducing both network demands and server requirements. There are two methods of achieving this simple optimisation: either by putting an extra requirement on the server or by putting an extra feature into the client, both of which have advantages and disadvantages. In regards to providing an extra detail with the RSS feed on the server side, given the flexibility of XML it is entirely feasible as an implementation detail that a 'repeat request' tag be added to the feed's header, and indeed this is something supported in some RSS feed standards.
RSS 2.0 supports the notion of skipHours and skipDays tags, which numerically represent request periods that should be skipped by an RSS client; however, there is no field in the standard that natively expresses an ideal request query delay [36]. As an unfortunate consequence of the split standards, Atom provides no support for these tags, and thus their use would mean that servers using Atom would have to implement non-standard elements, which is impractical [22]. This means that, practically, server side optimised query request times are not provided by any RSS standard in existence, and as such the client itself is required to calculate an optimum request time rather than expecting server side implementations to adopt a non-standard feature. This can be achieved by utilising the time-of-publication header fields that are present in all standards. In the RSS variants of the standards, this is provided by the header field lastBuildDate, which stores the date and time of the last update to the feed. Similarly, in the Atom standards, this is provided by the updated field.

From here, it is a matter of storing the actual request value n_k, the calculated time between updates from the headers r_k in hours, and the time of the last header update t_k, where k is the most recent request in a window of size x. This can be better expressed in the following algorithm:

1. Initialise default values:
      n_0 = r_0 = 12 hrs
      t_0 = time of RSS build in initial request

2. If the RSS feed was updated at request n_(k-1):
      if (k > 1): t_0 = t_1
      t_1 = time of RSS build
      Replace the first element in the window if (k > x):
         r_(0 ... k-1) = r_(1 ... k)
         n_(0 ... k-1) = n_(1 ... k)
      r_k = t_1 - t_0
      if (r_k > r_(k-1)): r_k = r_(k-1) - r_k
      if (r_k < r_(k-1)): r_k = r_k - r_(k-1)
      n_k = (r_0 + ... + r_k) / k

3. Repeat step 2.

By implementing this algorithm, it is possible to tackle some of the inefficiency of simply choosing an arbitrary request interval.
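The core of the algorithm, a sliding window of observed inter-update intervals whose mean becomes the next request delay, can be sketched in Java as follows. The outlier-damping conditionals are omitted for clarity, and the class name is illustrative; this is a simplified sketch, not the project's implementation.

```java
// Simplified sketch of the polling-interval learner: keep a sliding window of
// the last x observed intervals between feed updates (in hours) and use their
// mean as the next request delay. Damping of outliers is omitted; names are
// illustrative, not the project's code.
public class PollingWindow {
    private final double[] intervals;  // the r values, a window of size x
    private int count = 0;

    public PollingWindow(int windowSize) {
        this.intervals = new double[windowSize];
    }

    // Record the interval between the two most recent feed builds (t_1 - t_0).
    public void recordInterval(double hours) {
        if (count < intervals.length) {
            intervals[count++] = hours;
        } else {
            // window full: shift left, dropping the oldest observation
            System.arraycopy(intervals, 1, intervals, 0, intervals.length - 1);
            intervals[intervals.length - 1] = hours;
        }
    }

    // The next request delay n_k: mean of the recorded intervals,
    // or the 12-hour default while no updates have yet been seen.
    public double nextDelayHours() {
        if (count == 0) return 12.0;
        double sum = 0;
        for (int i = 0; i < count; i++) sum += intervals[i];
        return sum / count;
    }

    public static void main(String[] args) {
        PollingWindow w = new PollingWindow(10);
        w.recordInterval(12);  // the first estimate is poor...
        w.recordInterval(8);
        w.recordInterval(8);
        w.recordInterval(8);
        System.out.println(w.nextDelayHours()); // ...but converges towards 8
    }
}
```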
With this algorithm, the request times converge on an ideal value over time and can correct, or rather dampen, some of the noise between request times (see 6.1 for evaluation). There are limitations to its success. In cases where the RSS feed is dynamically created its use is impossible, as the value would eventually converge on a request time of zero. To this end its use must be carefully considered, and must be limited to implementations where feeds are constructed as static files on the server side rather than upon client request.

Chapter 6

Testing and Evaluation

6.1 Testing

The first stage when carrying out any evaluation is to perform a variety of tests to ensure that the requirements have been met. With this project, it was important to test that the client application performed as it should, that human interaction was handled and the application maintained a stable state, and that the system was accepted by the prospective users.

6.1.1 Input Validation

Part of testing a system that is intended to be used by a large set of users is ensuring that the system handles and stores any user input appropriately. For this, a series of test cases was created and performed to ensure that every field and point of user interaction was correctly validated, so that erroneous data could not be entered into the system. For these tests, standard boundary tests were performed to ensure that erroneous, empty or excessively large input was correctly handled and prevented from affecting the application's state. To simplify execution, these were automated with the help of JUnit Test Cases (see 6.1.6). In cases where user input was rejected, as recommended by the HCI studies covered previously, users were notified of the errors.
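The boundary-testing idea can be sketched as plain assertions against a validation routine exercised with empty, oversized and erroneous input. The validator below is hypothetical; the project ran equivalent checks as JUnit test cases.

```java
// Sketch of boundary testing: a validation routine is exercised with empty,
// oversized and erroneous input. The rules here are illustrative assumptions,
// not the project's actual validation logic.
public class InputValidation {

    // Accept a feed URL only if it is non-empty, not absurdly long,
    // and uses a scheme the application can handle.
    public static boolean isValidFeedUrl(String input) {
        if (input == null || input.trim().isEmpty()) return false;  // empty input
        if (input.length() > 2048) return false;                    // oversized input
        return input.startsWith("http://") || input.startsWith("ftp://");
    }

    public static void main(String[] args) {
        System.out.println(isValidFeedUrl("http://example.org/rss.xml")); // true
        System.out.println(isValidFeedUrl(""));                           // false
    }
}
```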
Figure 6.1: Notification of invalid input

6.1.2 Security Testing

The next series of tests was to ensure that the application responded appropriately to security attacks, those being attacks aiming to gain access to areas of the system for which a user does not have authorisation, and attacks on the application and the surrounding operating system to affect the integrity of operation. The product produced was a relatively open system, with no need for authorisation. However, the application does download content directly to a user's computer and as such needed testing against a variety of attacks to ensure it responded appropriately. Modification of local files and settings after run time, for example, had no effect on the data held within the application; in fact, changes made manually to the database are undone when the application next saves. In terms of network code, the utilisation of existing mature Java libraries reduced some of the risks in network communication, given the level of testing these libraries undergo.

6.1.3 Functionality Testing

With this set of tests, the speed with which users were able to carry out tasks was measured and compared to manual browsing methods as a baseline. These tests were carried out to ensure that the application functioned, and did so at a level more efficient than manual methods. The times can be found in Appendix Q and show the application's advantage over manual methods.

6.1.3.1 Algorithm Testing

In regards to the simple polling algorithm summarised in 5.1.6, it was appropriate to carry out a single test comparing its improvement over a baseline of a fixed request time. For this test, a test RSS feed was set up and entered into the application (http://www.jacobbriggs.com/forums/rss/rss.xml), which was updated every eight hours for a series of twelve days.
A default request time of twelve hours was entered as the initial value for both the algorithm and a default baseline comparator (without the algorithm). For the algorithm, a window of ten was used. Increasing the window increases the chances of suppressing noise, for example a rare occasion where an update occurs at a non-periodic time.

          Actual   Baseline   Baseline Accuracy   Algorithm   Algorithm Accuracy
Update 1  0        0          n/a                 0           n/a
Update 2  8        12         66%                 12          66%
Update 3  16       24         66%                 20          80%
Update 4  24       36         66%                 24          100%
Update 5  32       48         66%                 32          100%
Update 6  40       60         66%                 40          100%
Update 7  48       72         66%                 48          100%

Table 6.1: Algorithm vs. Baseline

This is an artificial example, where a set eight hour period between updates was used. In reality it is unlikely that updates would be released at an exact set period; rather, they would be released within a similar range of times in most cases (for example, the BBC feeds were updated at the same time on the same days of the week, within a few hours of deviation). The algorithm successfully dampens noise and reduces the overheads with time. As the table shows, the baseline comparator missed Update 6 altogether. Whilst the algorithm does not ensure that an update will never be missed in this way, it greatly reduces the chances of it happening. The algorithm is simplistic and could be improved with more time and features. For example, in its current state, if the next update were at nine hours, that update would not be downloaded until 8 hrs after the request. However, the increased update time would be added to memory and the next request time would be increased, with the learned request times converging on the optimum nine hours.

6.1.4 Deployment Testing

With this series of tests the aim was to ensure that the application could be deployed on a series of working environments, that the application performed appropriately, and that the program maintained a consistent state of operation between them.
For this test, the application was distributed and installed onto several different operating systems, namely Linux (Fedora), Windows XP (SP2), and Windows 98 (SE). There were slight discrepancies in the interface of the application; for example, the tray icon on Linux systems using KDE looked different to that on Windows systems, where primary development had been done, but for the most part these were a simple matter of different tray icon sizes and positions of tool tips within the user interface. Operational differences were negligible. Whilst the differing file systems of Windows and Linux posed a problem, the differences were tackled early on in development and as such the program operated correctly, saving files to the appropriate locations in both environments.

6.1.5 Performance Testing

Later iterations yielded more successful results in the performance category. Optimisations were made to the threaded downloading code, along with further modifications to the network code, such as forcing a disconnect on opened connections rather than waiting for unused connections to be dropped by the server, which freed up sockets on the client application. These improvements were evidenced by drops in CPU usage and open network sockets respectively.

6.1.6 JUnit Tests and Results

As stated earlier in the Project Management chapter (Section 2.3.2.2), a test driven approach to development was taken, utilising JUnit testing to evaluate the success of the developing system. This provided a standard set of tests to perform to ensure that the system remained functional despite ongoing changes and refactoring. Early JUnit tests highlighted problems in entity integrity and file input/output. For example, after the first iteration of programming it was found in the range tests that characters such as '&' caused input to become unreadable: '&' is a reserved character in XML and must be escaped, so when the XML parser encountered it the records would fail to be read in.
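The '&' failure stems from XML's reserved characters: user data must be escaped before being written into the flat files. A minimal sketch follows; the project's actual fix may have differed (for instance, by relying on the XML library's own writer).

```java
// Minimal sketch of escaping XML's reserved characters before writing records
// to a flat-file store, avoiding the '&' parse failures described above.
// Illustrative only; a real implementation might use the XML library's writer.
public class XmlEscape {

    public static String escape(String text) {
        return text.replace("&", "&amp;")   // must be first, or it re-escapes
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(escape("Tom & Jerry <S01>")); // Tom &amp; Jerry &lt;S01&gt;
    }
}
```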
The JUnit tests made it easier to pinpoint points of failure.

6.2 Product Evaluation

Feature                                                Requirement   Product            Steam   Bitscast
Ability to add/remove sources of input                 Minimum       Yes                No      Yes
Ability to add/remove filtering of specific content    Minimum       Yes                Yes     No
Ability to automatically acquire content               Minimum       Yes                Yes     Yes
Ability to automatically acquire filtered content      Extended      Yes                Yes     No
Ability to automatically acquire content to
  specific locations                                   Extended      Yes                No      No
Download material from many locations                  Extended      Yes                No      Yes
  - From p2p                                           Extended      Yes (Externally)   No      Yes (Internally)
Download manually                                      Extended      Yes                Yes     Yes
Local History                                          Extended      Yes                No      Some
Search Local History                                   Extended      Yes                No      No
Platform Independence                                  Extended      Yes                No      No
Transparent Operation                                  Extended      Yes                Yes     Yes

Table 6.2: Feature comparison with existing products

All of the minimum requirements stated for this project in Chapters 1 - 3 were exceeded.

6.2.1 User Evaluation

Throughout the project's development, users were asked for feedback on the product and to evaluate both its successes and its points of failure. As such, the interviewed users had an active stake in the development and gave active feedback on areas they liked and areas where they felt more work was needed. The final evaluation was favourable, with users expressing a preference for the product over the alternatives. All users gave positive feedback on the performance and the available features of the prototype product. Criticism mainly focused on the user interface, where more work was required to better capture user input, and on the extended functionality of the program. This feedback largely concerned extended features of the project.

6.2.2 Conclusion

Ultimately the value of the project is justified by the quality of the product made: the features available within the developed solution, how it compares to existing products, the quality of the solution in terms of implementation and performance, and its acceptance by users.
To this end, the product can be considered successful, justified by the results of the testing as well as its acceptance by the interviewed users. As can be seen from the testing, the project reached all of the required levels of functionality, and to this extent the project can be considered a success.

The approach taken had a large effect on the quality of the software produced. The adoption of the Unified Process allowed for rapid development which involved users directly, and to this end it can be said to have been a great help in achieving the project's aim. The wealth of documents produced and the complexities in development led to project slippage, however. As can be seen in Appendix B, several alterations were made to the original plan to accommodate problems that arose during the development of the system. The methodology allowed a large number of accurate requirements to be captured and met, but to the same degree it can be identified in part as the reason the project began to suffer from function creep, as continued user evaluation yielded preferences for ever more functionality. Whilst reaching a sound level of requirements analysis is a necessity during development, it can be a mixed blessing, as users often deviate from giving feedback on core functionality into issues of lesser importance from a development standpoint. Despite these issues, the Unified Process offered a great advantage in the development of the project, and whilst complexities in implementation and research pushed back its delivery, the methodology is largely responsible for the successful outcome of this project.

In terms of the area covered by this project, the content is both current and under ongoing development. As expressed in [13], for example, content distribution channels are providing a disruptive technology for the gaming and software industries, and this trend is likely to continue.
6.3 Project Extensions

There are many aspects of the application that would benefit from further study and development, most notably in improving efficiency, compliance and security. Whilst the system successfully exceeded the base requirements, there are many additional features that would also warrant development.

During testing it was found that using more sites exposed a wider range of server-side implementations using different technologies and techniques of mark-up. This extended testing ultimately resulted in increased implementation complexity, and therefore compatibility with more sources of input seems a logical choice as an extension to this project. This will require steps to increase robustness, and significant modifications to the Download class implementation, which handles most of the network code. This issue was covered in the background reading, which identified the problem of lax standards.

In addition to extending the project to better handle more sources of input, it would also seem logical to improve the efficiency of the downloading mechanism. This was an issue shown in the testing, where rewriting segments of code, particularly regarding buffering and the Java technique utilised, made the application perform better.

One element that has not been considered throughout this project is the idea of premium content delivery, that is, content for which the consumer is a new or existing paying recipient. This option is available in an existing application tackling a similar domain, Steam, which was reviewed in Section 3.3. In that application, an external web-based checkout system utilising Secure Socket Layer (SSL) is used for the purchasing of software. Whilst this is encapsulated in the update platform, it is provided via a standard HTML portal.
Encrypted content is then received (data which is also available to non-paying consumers), and upon completion it can be decrypted by the distribution platform, provided the content access rights are associated with the user's account. This is achievable due to Steam's proprietary, closed-source network model. On an open network that utilises existing RSS feeds, a similar system is difficult to achieve. However, it is possible for the content delivery system to be merely the transport mechanism for such content, with the content decrypted externally following acquisition. In this sense the content type can be regarded as incidental: purchasing and encryption are handled outside the application on the publisher's site, and the user is given a means of decrypting or authorising themselves for their content later. It is worth noting that this does not give a content publisher assurances against counterfeiting or duplication of their content; it does, on the other hand, provide some assurance that interactions with the update system are largely authorised, due to the ability to authorise and track login details. With the current product, the system is completely open to all users and there is no form of authorisation. This is due to the nature of RSS provided over the web. To implement a successful premium content distribution system, a basic level of security and authorisation is required to mitigate the risk of fraud. As it stands this is absent, and adding it would require both client and server compliance, which is beyond the scope of this project given its focus on client-side content tracking. As a project extension, however, this could prove of interest.

6.3.1 Features

In terms of improved functionality there are many extensions that could be made:

• Scheduling of exact times to query the server. The framework was designed for this, but it was not implemented due to time constraints.
This feature could allow content publishers to put a schedule of when content is likely to be published on their site, which would configure the client to intensify its requests for updates during that time.

• Mirror searching and multiple-source downloading, to optimise the download and remove some of the bottlenecks in the current problem domain.

• Peer-to-peer sharing of item downloads from RSS feeds, optimising further so that no server is needed to download an item. This would overcome problems such as a congested server with no mirror data.

• Automatic execution or installation of downloaded material.

• Advanced version tracking, so that if an item is posted twice in a feed, only the latest version will be acquired, not merely the latest item posted to the feed.

Bibliography

[1] Bennet, S., McRobb, S., and Farmer, R. Schaum's Outline of UML, chapter 3, page 25. McGraw Hill, 2001.

[2] Bennet, S., McRobb, S., and Farmer, R. Object-Oriented Systems Analysis and Design using UML, chapter 3, pages 100 - 200. McGraw Hill, 2002.

[3] Berners-Lee, T., Fielding, R., and Frystyk, H. RFC 1945: Hypertext Transfer Protocol - HTTP/1.0, May 1996. Status: INFORMATIONAL.

[4] Bocij, P., Chaffey, D., Greasley, A., and Hickie, S. Business Information Systems: Technology, Development and Management for the e-business, chapter 8, pages 293 - 415. Prentice Hall, 2003.

[5] Byrne, Tony. Content syndication: Ready for the masses? EContent, 26(6):30 - 35, June 2003.

[6] Cohen, Bram. Incentives Build Robustness in BitTorrent. Technical report on BT technology, http://www.bittorrent.com/bittorrentecon.pdf [Accessed: 30/11/2005], 22 May 2003.

[7] Dix, A., Finlay, J., and Abowd, G. Human-Computer Interaction. Prentice-Hall, 1998.

[8] Drèze, Xavier and Zufryden, Fred. The Measurement of Online Visibility and its Impact on Internet Traffic.
Available online, http://www.cin.ufpe.br/~fabio/Negocios%20Virtuais/Leituras/measurement%20of%20online%20visibility.pdf [Accessed: 30/11/2005], October 2001.

[9] Elmasri, R., Navathe, S. B., and Farmer, R. Fundamentals of Database Systems, chapter 1, page 23. Addison-Wesley, 2003.

[10] ETV. The ETV Cookbook: Glossary of Terms. Glossary of Electronic Transmission Terms, http://etvcookbook.org/glossary/ [Accessed: 30/11/2005], 2005.

[11] Hart, Peter E., Piersol, Kurt, and Hull, Jonathon J. Refocusing Multimedia Research on Short Clips. IEEE Multimedia, 12(3):8 - 13, July 2005.

[12] Hicks, Matthew. RSS Comes with Bandwidth Price Tag, 21 September 2004. News article on RSS bandwidth problems, http://www.eweek.com/article2/0,1895,1648625,00.asp [Accessed: 30/11/2005].

[13] IGN. The Future of Downloadable Content Podcast, 21 April 2006. Roundtable about how distribution is affecting the software industry, http://uk.games.ign.com/articles/702/702731p1.html [Accessed: 21/04/2006].

[14] Institute of Electrical and Electronics Engineers, Inc. Prototypes as Assets, not Toys: Why and How to Extract Knowledge from Prototypes. Proceedings of ISCE-18, 1996.

[15] Institute of Electrical and Electronics Engineers, Inc. Time Series Models for Internet Data Traffic. Proceedings of the 24th Conference on Local Computer Networks, 1999.

[16] Institute of Electrical and Electronics Engineers, Inc. Enabling Context-Aware Agents to Understand Semantic Resources on The WWW and The Semantic Web. Proceedings of the International Conference on Web Intelligence, 20 September 2004.

[17] Jacobson, Ivar. Applying UML in the Unified Process. Lecture on the role of UML within the Unified Process, http://www.jeckle.de/files/uniproc.pdf [Accessed: 30/10/2005].

[18] Joyce, John. RSS and Syndication. Scientific Computing and Instrumentation, 21(6):12, May 2004.

[19] Kruchten, Philippe. Going Over the Waterfall with the RUP, 26 April 2004.
Introductory advice for RUP and a comparison to the stages in the Waterfall model, http://www-128.ibm.com/developerworks/rational/library/4626.html [Accessed: 30/10/2005].

[20] Larman, C. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process, chapter 4, page 35. Prentice Hall, 2001.

[21] Mannion, M. and Keepence, B. Smart requirements. ACM Software Engineering Notes, pages 42 - 47, 1995.

[22] Marumoto, Toru. RSS and Atom. RSS 2.0 vs Atom comparison, http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared [Accessed: 29/12/2005].

[23] Miller, Charles. HTTP Conditional Get for RSS Hackers, 21 October 2002. RSS and HTTP GET utilisation, http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers [Accessed: 30/11/2005].

[24] Miller, Ron. Can RSS Relieve Information Overload? EContent, 27(3):20 - 24, May 2004.

[25] Pilgrim, Mark. The myth of RSS compatibility, 4 February 2004. Critical review of RSS revisions, http://diveintomark.org/archives/2004/02/04/incompatible-rss [Accessed: 30/11/2005].

[26] Pival, Paul R. Using RSS Enclosures for document delivery? Web log on uses of RSS in e-learning, http://distlib.blogs.com/distlib/2005/03/using_rss_enclo.html [Accessed: 30/11/2005], 01 March 2005.

[27] Pressman, Roger. Software Engineering: A Practitioner's Approach, Fourth Edition, chapter 3, pages 100 - 200. McGraw-Hill, 1997.

[28] Rational. Using the Rational Unified Process for Small Projects: Expanding Upon eXtreme Programming, 2001. Available online at http://www.rational.com/worldwide/ [Accessed: 17/11/2005].

[29] Richardson, Will. Using RSS Enclosures in Schools. Web log on RSS use in schools, http://www.weblogg-ed.com/discuss/msgReader$3196?y=2005&m=3&d=1 [Accessed: 30/11/2005], 01 March 2005.

[30] Sayre, Robert. Atom: The Standard in Syndication. IEEE Internet Computing, 9(2):71 - 78, 2005.

[31] Schneider, J. Hiding in plain sight: An exploration of the illegal(?)
activities of a drugs newsgroup. The Howard Journal of Criminal Justice, page 374, 2003.

[32] Shneiderman, B. and Plaisant, C. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley, 2005.

[33] Sebesta, Robert W. Programming the World Wide Web, chapter 1. Pearson Education Inc., New Jersey, 2005.

[34] Sommerville, Ian. Software Engineering, 7th Edition, chapter 3, pages 100 - 200. Pearson Education Inc., New Jersey, 2004.

[35] Tanenbaum, Andrew. Computer Networks, chapter 1, pages 3 - 79. Pearson Education Inc., New Jersey, 2003.

[36] UserLand Software. RSS 2.0 Specification. RSS 2.0 standard review, http://www.feedvalidator.org/docs/rss2.html [Accessed: 28/12/2005].

[37] Wilson, Tim. Utilizing RSS enclosures. Web log on further uses of RSS, http://www.eschoolnews.com/eti/2005/02/000705.php [Accessed: 30/11/2005], 26 February 2005.

[38] WM-data AB. Delta Method Handbook, 2001. Available online at http://www.deltamethod.net/9PrototypeDesign_index.htm [Accessed: 17/11/2005].

Appendix A - Reflection

I am very happy with the outcome of the project, both in terms of the developed software and the experience gained. I found working with the Unified Process very helpful, and I am confident that its use helped deliver the project on time and at a high quality.

Project Management Evaluation

However frequently this point is made in project development, the issue of time management is paramount. Delays will happen; they are an inevitable part of all projects. With this in mind, I cannot stress enough how important it is to manage your time appropriately. Start work well in advance of any deadlines, hard or soft, as the work will have to be carried out either way, and with preparation it will be of far higher quality. Continuing this theme is the choice of methodology.

Unified Process

Applying the Unified Process proved highly beneficial, and I would favour this approach in future projects.
The framework in place allows for reams of high-quality design to be developed very quickly, and for a development project this is of course the ultimate goal: to develop the project in the fastest possible time without compromising quality. I was pleased with how well the interaction with users went and with how the expansion of the system is directly supported by the methodology. In smaller projects I have previously followed a sequential, waterfall-type structure, and this has often meant there was little room to manoeuvre when producing the implementation. With the Unified Process you can remodel your work as desired.

Use of Testing

In the first iteration of the system development process, the emphasis defined in the Unified Process is on solidifying requirements and the basic system structure, ultimately creating a design-level class diagram and beginning implementation. To ensure that the prototype client was built to this design specification, a series of JUnit test cases were initially devised. This allowed tests to be carried out easily during development by simply running the JUnit tests after each modification. The policy was to first ensure the integrity of the entity classes (i.e. RSS, Item and other base classes), then move on to the control and use case functionality of the classes that utilise these base classes. Lastly, brief UI tests were devised. Throughout the development process, the success of the implementation was quantitatively assessed by carrying out these JUnit test cases and monitoring that the program was functioning correctly. Of course, this represented supplementary evaluation only, rating the implementation of the design, given that passing the test cases did not guarantee that either the test cases or the system design matched the base requirements.
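The entity-integrity checks described above can be sketched as a round-trip test: serialise an item, parse it back with the standard Java XML parser, and compare. This is an illustration only, not the project's test code; the real tests extended JUnit's TestCase, and the element names shown here are assumptions.

```java
import java.io.ByteArrayInputStream;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

// Round-trip sketch in the spirit of the project's entity-integrity
// JUnit tests (illustrative only; the element names are assumptions).
public class ItemRoundTrip {

    // Serialise a title into a minimal record, escaping reserved characters.
    static String toXml(String title) {
        String safe = title.replace("&", "&amp;").replace("<", "&lt;");
        return "<?xml version=\"1.0\"?><item><title>" + safe + "</title></item>";
    }

    // Parse the record back and recover the title text.
    static String fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String title = "Best of Moyles & Friends";
        // The check passes only if the title survives the round trip intact.
        if (!title.equals(fromXml(toXml(title))))
            throw new AssertionError("entity round trip failed");
        System.out.println("round trip ok");
    }
}
```

A test of this shape catches exactly the class of fault described earlier, where an unescaped '&' made a record unreadable.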
However, the test-driven approach was used to help promote a sound and accurate implementation of the system that matched the developed design, and ensured that the level of quality and functionality was tested periodically. Testing is often overlooked for many reasons, but for this project it provided great value: whilst implementing the test cases all at once seemed time-consuming, time was saved when the program reached a more mature implementation, as all classes were thoroughly checked. In terms of evaluation, this allowed a quantitative value to be assigned to the success of the program as an implementation of the system design, which, when combined with qualitative evaluation, allowed for a justifiable summary of the project's success and quality. The practice of testing is something that I will take away from this project and strive to apply to future development projects. Whilst it may seem off-putting having to write numerous test cases at once, in the end the value of doing so is greater than the effort required.

CVS and Version Control

To ensure that the progress of the project was efficiently and accurately recorded during the coding stages, CVS was used to hold a record of project development. This provided a record of what had previously been changed and also allowed for portability of code between systems. Whilst this may seem like a simple matter of project management, it proved invaluable when dividing work between workstations and when rolling back any errors made while coding. For future students doing a software development project, I would highly recommend the adoption of version control (detailed in SE20).

Appendix B - Plans

Given the number of diagrams produced, several sections document examples of the work produced. For a complete listing, see the Rational Rose Model at http://autofeed.jacobbriggs.com which is a comprehensive model of the design.
Iteration     Document                           Section
Inception     Define Scope                       3.6.2
              Existing Architecture              3.1.1
              Project and Acceptance Plan        2.4
              Initial Use-Case Diagrams          3.6.3
              Business Object Diagram            3.1.1
              Requirements Specification         Appendix F
Elaboration   Validate Architecture              5.1.2
              Refine Vision Statement            1.2
              SAD - Use Case Diagram             Appendix G
                  - Use Case Realisation         Appendix H
                    - Use Case Description Form  Appendix I
                    - Activity Diagram           Appendix J
                    - Sequence Diagram           Appendix K
                  - Design Level Diagram         Appendix L
Construction  Finalise Design Class Diagram      Appendix L
              Testing                            6.1
              Final Evaluation                   6.2

Table 6.3: Table of Documents

(a) Initial Plan (b) Modified Plan

Figure 6.2: Project Plans: (a) The Original Plan and (b) Plan with delays

Appendix C - Existing Technologies

Figure 6.3: A table of comparative features present in existing technologies

Appendix D - Questionnaire Results

Figure 6.4: Most significant questionnaire results

Appendix E - Interview Summaries

Tom

• Thomas expressed a need to be able to use regular expressions to decide what to filter out of an RSS feed.
• He wanted to be able to override settings so that paths could be set by file type, filter parameters, etc.
• He also wanted a database of all the material he had acquired, sorted alphabetically or by date.
• He said that he strongly disliked the Windows Update system and how it forced updates on the user and then constantly prompted for restarts.
• He also strongly disliked Steam, but this was down to the EULA and Valve's control over the network (it was not open enough).

Dale

• Dale expressed a liking for the way that Steam and Windows Update sat in the tray and how messages were communicated by tooltips.
• Dale wanted a local history stored of what he had filtered.
• Dale also wanted to be able to search it.
• He wanted to be able to filter content but was not confident with regular expressions and would like some help.
• He wanted the UI to be clean and free from clutter.
Sam

• Sam once again expressed a liking for desktop integration and a clean UI.
• He wanted to be able to use the application for podcasting and saving MP3s to his iPod device.
• He liked the idea that a local database could be stored and that he would be able to search it, stating he would like to store items by name.

Appendix F - The Requirements Specification

Due to its length and its repetition of most of Chapters 1 - 3, the actual document has been omitted; however, the following is a list of the areas covered in the specification, which was shown to the users in interviews.

Figure 6.5: Contents of the Requirements Specification

Appendix G - Use Cases

Figure 6.6: The System Use Cases

Appendix H - Use Case Realisation

Figure 6.7: Use Case driven analysis - A sampled Use Case from this project and its realisation into a design level diagram

Appendix I - UCDF

The following are two examples of UCDF used during this project.

Appendix J - Activity Diagrams

Given the number of diagrams produced, the following are two samples to demonstrate their purpose. For a complete listing, see the Rational Rose Model at http://autofeed.jacobbriggs.com.

(a) Subscribe to Feed (b) Add Filter

Appendix K - Sequence Diagrams

Given the number of diagrams produced, the following are two samples to demonstrate their purpose. For a complete listing, see the Rational Rose Model at http://autofeed.jacobbriggs.com.
Appendix L - Design Class Diagrams

Figure 6.8: Control Classes
Figure 6.9: Entity Classes
Figure 6.10: Network Classes
Figure 6.11: FileIO Classes

Appendix M - Database Files and ERD

XML Fragment 2 - rss.xml

<?xml version="1.0"?>
<rssfile>
  <stats>
    <program>AutoFeed</program>
    <version>0.0.0.1</version>
    <count>7</count>
  </stats>
  <rss>
    <id>2</id>
    <name>BBC Radio - Best of Chris Moyles</name>
    <url>http://downloads.bbc.co.uk/rmhttp/downloadtrial/radio1/bestofmoyles</url>
    <reqfrequency>1</reqfrequency>
    <lastrequest>April 30, 2006 10:14:20 AM BST</lastrequest>
  </rss>
</rssfile>

XML Fragment 3 - filtersfilter.xml

<?xml version="1.0"?>
<filtersfilterfile>
  <stats>
    <program>AutoFeed</program>
    <version>0.0.0.1</version>
  </stats>
  <filtersfilter>
    <filtersid>2</filtersid>
    <filterid>3</filterid>
  </filtersfilter>
</filtersfilterfile>

Figure 6.12: ERD

Appendix N - Agile GUI

Figure 6.13: AddRSS UI
Figure 6.14: ViewDatabase UI
Figure 6.15: Download UI

Appendix O - Actual GUI

Appendix P - Testing Results

Task                               Tom      Dale     Sam
Add RSS                            15.43s   28.32s   18.87s
Add Filter                         50.32s   80.80s   61.01s
Add Filters                        20.11s   29.04s   21.82s
Search for 'Moyles'                05.09s   10.09s   5.23s
Manually download 'Moyles' item    08.00s   12.05s   09.47s
Program Acquisition                n/a      n/a      n/a
Comparator - Manual Acquisition    100.34s  140.23s  110.98s

Table 6.4: Task times - Tasks are performed one time only and are then persistent