RSS propagation as a method for automated content distribution management
Jacob Briggs
Computing with Artificial Intelligence
Session 2005/2006
The candidate confirms that the work submitted is their own and the appropriate credit has been given
where reference has been made to the work of others.
I understand that failure to attribute material which is obtained from another source may be considered
as plagiarism.
(Signature of student)
Summary
For the modern computer user there is often a large number of video, audio, and application files that are acquired periodically or updated frequently. In this environment it is difficult to keep track of the latest version of each, and then to find and acquire these files when they become available. It is therefore proposed that an application utilising existing technologies could be designed to alleviate this problem.
This report covers the issue of content distribution in the 21st century, and presents the research, design and development of a prototype client for automatic content distribution based on RSS management and propagation.
The software artefacts of this project can be found at:
http://autofeed.jacobbriggs.com
Here you will find the Rational Rose Model and the developed program.
Acknowledgements
I would like to thank Eric Atwell, my project supervisor for all the help and advice he has given me
throughout the course of this project.
Similarly, I would like to acknowledge the users, too numerous to list, who were interviewed and who filled out questionnaires during this project.
I would also like to take this opportunity to thank anyone who takes the time to read this report, or to use the application. For those that are interested, it can be downloaded from http://autofeed.jacobbriggs.com.
Contents

1  Project Overview  1
   1.1  Project Aim  1
   1.2  Introduction: Unified Process - Vision Statement  1
   1.3  Problem Domain  2
   1.4  Objectives  2
   1.5  Requirements  2
   1.6  Extensions and enhancements  3
   1.7  Deliverables  3

2  Project Management  4
   2.1  General Management  4
   2.2  Feasibility Summary  5
   2.3  Design and Development Tools  5
        2.3.1  Design Methodology  5
               2.3.1.1  Waterfall Model  5
               2.3.1.2  RAD and Component-Based Development  7
               2.3.1.3  Prototyping  8
               2.3.1.4  Boehm's Spiral Model  9
               2.3.1.5  Unified Process  9
        2.3.2  Management and Evaluation Tools  11
               2.3.2.1  Version Control  11
               2.3.2.2  Testing  11
               2.3.2.3  Refactoring  12
   2.4  Project Schedule and Schedule of Documents  12

3  Requirements Gathering and Analysis Capture  13
   3.1  Users, Stakeholders and Problem Domain  13
        3.1.1  Current System  13
        3.1.2  Users  14
   3.2  Methods of Information Gathering  14
        3.2.1  SQIRO  14
               3.2.1.1  Sampling of Documents  14
               3.2.1.2  Questionnaires  14
               3.2.1.3  Interviews  15
               3.2.1.4  Research  15
               3.2.1.5  Observation  15
               3.2.1.6  Conclusion  16
        3.2.2  Professionalism and Techniques  16
   3.3  Evaluation of Existing Products  16
        3.3.1  Introduction  17
        3.3.2  Types of Products  17
               BitsCast  17
               RSS Bandit  17
               Steam  17
               Windows Update  17
        3.3.3  Comparison of Products  18
        3.3.4  Conclusion and Future Systems  18
   3.4  Questionnaires  18
   3.5  Interviews and Feedback  19
        Automatic and Manual Downloading  20
        Filtering  20
        Storing of Local Database  20
        Searching of Local Database  20
        System Integration  20
        Security  20
   3.6  Summary  20
        3.6.1  Business Actors and Subsystems  20
               RSS Publishing System  20
               RSS Aggregator  20
               Content Publisher  21
               RSS Feed  21
               Website System  21
               Consumer  21
               Item  21
               Item Source  21
        3.6.2  Scope  21
        3.6.3  Initial Use Case Modelling  22
        3.6.4  Requirements  22
               3.6.4.1  Functional  22
               3.6.4.2  Non-Functional  23

4  Background Research  24
   4.1  Content Distribution  24
        4.1.1  Client - Server Models  25
               4.1.1.1  HTTP and the WWW  25
               4.1.1.2  FTP - File Transfer Protocol  26
        4.1.2  Peer-to-Peer Filesharing Networks  27
   4.2  Syndication Technologies  27
        4.2.1  Push  28
        4.2.2  News  28
        4.2.3  Email  29
        4.2.4  RSS  29
   4.3  RSS - RDF Site Summary  29
        4.3.1  Metadata Mark-up  30
        4.3.2  Standards  31
        4.3.3  Original Use and Project Extension  31
   4.4  Content Management  31
   4.5  Client Language  32
        4.5.1  C++  32
        4.5.2  Python  32
        4.5.3  Java  33
        4.5.4  Conclusion  33

5  Design and Implementation using the Unified Process  34
   5.1  Development with the Unified Process  34
        5.1.1  Addressing Requirements Risks  34
        5.1.2  Addressing Architectural Risks  35
               5.1.2.1  Redundancy  36
               5.1.2.2  Three Tier Architecture  36
        5.1.3  Addressing Design Risks  37
               5.1.3.1  Presentation - User Interface Design  37
               5.1.3.2  Application - Refactoring and Central Classes  37
               5.1.3.3  Data - Logical and Conceptual Design via ERD  38
        5.1.4  Addressing Implementation Risks  40
               5.1.4.1  Presentation - User Interface Implementation  40
               5.1.4.2  Data - Physical Implementation via XML Flat File System  42
        5.1.5  Simplifying the Domain  42
        5.1.6  Polling History and Learning Responsible Request Times  43

6  Testing and Evaluation  45
   6.1  Testing  45
        6.1.1  Input Validation  45
        6.1.2  Security Testing  45
        6.1.3  Functionality Testing  46
               6.1.3.1  Algorithm Testing  46
        6.1.4  Deployment Testing  47
        6.1.5  Performance Testing  47
        6.1.6  J-Unit Tests and Results  47
   6.2  Product Evaluation  48
        6.2.1  User Evaluation  48
        6.2.2  Conclusion  48
   6.3  Project Extensions  49
        6.3.1  Features  50

Bibliography  51
Chapter 1
Project Overview
1.1 Project Aim
This project's aim is to study the feasibility of a prototype client for RSS propagation as a solution for automatically tracking and updating content to a user's specification. As such, the project entails identifying the problem domain of automated content delivery and management, discussing the options, and designing, producing and evaluating a prototype client for tackling these problems.
1.2 Introduction: Unified Process - Vision Statement
There exists a large base of users that actively subscribe to and periodically acquire distributed content1, which is updated or superseded by subsequent releases. Within this lies the problem of notification, which relies on repeated manual searches on the user's part to discover whether a newer release is available, and secondly of acquisition: locating and downloading this content once it is released. The identified problems are therefore manual searching, location and procurement of content, all of which place the responsibility on the consumer, and the consumer's ability to complete each stage successfully whenever a new release appears. Whilst some applications have successfully integrated the ability to check for and acquire updates automatically on program launch or user request, this facility is isolated and restricted to the application platform being updated. It is often designed for the purposes of the developers, with the added disadvantage that it is largely only attainable for application content.
As such there is a need for a client capable of automatically acquiring various types of content on the user's behalf, eliminating the need to rely periodically on manual searching, location and downloading.
In recent years, many have speculated that RSS could facilitate the serving of content metadata, and that an RSS client could be designed to propagate and extract this data, subsequently initialising automatic transfers to a user's specification [37][26][29]. Whilst there has been speculation, no real implementations of the idea have come about, although several RSS news clients have attempted to implement basic download abilities [37].

1 In the sense of generic non-’standard text’ based static files, for example video, audio and executables, which from here on will be the overriding meaning of the term
File enclosures have been a part of RSS since version 2.0, but have only recently come to everyone’s attention with the emergence of podcasting. Most people now associate RSS attachments
with MP3 files, but there’s no reason to restrict the attachments to audio files.
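The enclosure mechanism described in the quotation above can be sketched briefly. The snippet below is not taken from the project's codebase: it is a minimal, hypothetical illustration (written in Python purely for brevity) of how an RSS 2.0 `<enclosure>` element carries a file URL and MIME type for any content type, not just MP3 audio. All feed URLs and titles are invented.

```python
# Hypothetical sketch: extracting <enclosure> attachments from an RSS 2.0
# feed using only the Python standard library. The feed below is invented.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Content Feed</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Public Domain Film</title>
      <enclosure url="http://example.com/film.torrent"
                 length="4096" type="application/x-bittorrent"/>
    </item>
  </channel>
</rss>"""

def extract_enclosures(feed_xml):
    """Return (title, url, mime_type) for every item carrying an enclosure."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enc = item.find("enclosure")
        if enc is not None:
            results.append((item.findtext("title"),
                            enc.get("url"), enc.get("type")))
    return results
```

Note that the second item advertises a BitTorrent file rather than audio, underlining the quotation's point that enclosures are format-agnostic.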
For the purposes of this report, despite the intent to create a generic and site-independent solution, the following problem domain has been defined around areas where prototype RSS feeds have already been established but lack a client to facilitate automatic retrieval.
1.3 Problem Domain
Legal Torrents is a large site that offers various public domain movies, documents, audio and documentaries. Updates of content are sporadic, and as such it is desirable to create a means of allowing users to be notified of releases automatically and to acquire torrent files for later execution.
The BBC has started to facilitate the distribution of radio content via its website. Many of its popular
shows are released on a regular basis, which are posted to their website for license payers to download.
However, navigation on the site is difficult and download links are often restricted to the latest episodes.
Therefore it is desirable to simplify the retrieval of content within this domain.
1.4 Objectives
The objectives can be extracted from the Unified Process in the form of the ‘vision’ statement [28] and as defined in the problem domain. One way of distributing news content that has emerged and become a prominent feature of large web communities is RSS (RDF Site Summary), an XML-based web service that relies on client applications to periodically check for metadata updates for websites. As such this project will focus on the feasibility of this technology as a solution and on the design of a client capable of parsing and acquiring content from such feeds. Therefore the project's objectives are as follows.
• To research and analyse the problem domain, existing solutions, central issues and RSS based technology
as a solution.
• To design and implement a product that is capable of utilising RSS feeds to solve the issue of content
distribution.
• Ultimately, to reduce human interaction without losing control over the process of content acquisition.
1.5 Requirements
There are two distinct deliverables, each with a subset of requirements. The first will be an analysis and feasibility study of RSS as a solution to the problem, realised using the Unified Process, with the requirement to justify the suitability of RSS to solve the problem. The second will be the design of a prototype of a robust client capable of meeting the following requirements.
• The product should allow users to get material from many locations.
• The product should allow users to filter and select content.
• The product should allow users to be able to download material automatically.
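As a rough illustration of how these three requirements combine (the feed sources, item titles and filter keyword below are all invented, and Python is used only for sketching), they can be read as a single pipeline: aggregate items from many locations, apply a user-supplied filter, and queue whatever survives for automatic download.

```python
# Hypothetical sketch of the three base requirements as one pipeline:
# many sources -> user filtering -> automatic download queue.

def select_for_download(feeds, keyword):
    """feeds maps a source URL to a list of (title, file_url) items;
    any item whose title contains the keyword is queued for download."""
    queue = []
    for source_url, items in feeds.items():
        for title, file_url in items:
            if keyword.lower() in title.lower():
                queue.append((source_url, title, file_url))
    return queue

# Invented sample data standing in for already-parsed RSS feeds.
feeds = {
    "http://legaltorrents.example/rss": [
        ("Public Domain Film", "http://legaltorrents.example/film.torrent"),
        ("Audio Compilation", "http://legaltorrents.example/audio.torrent"),
    ],
    "http://bbc.example/radio/rss": [
        ("Radio Show Episode 12", "http://bbc.example/ep12.mp3"),
    ],
}

queue = select_for_download(feeds, "film")
```

In a real client the final step would hand the queue to a downloader rather than simply returning it, but the shape of the pipeline is the same.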
1.6 Extensions and enhancements
Given the potentially large scope of the project, there are a large number of possible extensions beyond
the base requirements. Supplementary work for the project report can be summarised in two areas.
Firstly, the analysis can easily be extended to include an analysis of security and efficiency. As such
the following areas present opportunities for enhancement in terms of background study, research and
conclusion.
• Analysis of security issues and solutions to make the system of content distribution more secure.
• Analysis of content validation and issues of trust in a multiple source network.
• Analysis of issues of bandwidth consumption, the impact on the efficiency of the product and ways of
reducing this impact.
• Issues of server side implementation and the design of a product to help in the extraction and mark-up of
feeds to distribute content.
• Discussion of multiple sources, load balancing and client content distribution.
Secondly, extensions to the design and implementation of a prototype client can be summarised as
follows.
• The product could be extended to allow users to attain material of different file types.
• The product could be extended to store a local history of what material it has attained/read from the RSS sources.
• The product could be extended to allow the user to search through its local database of what material it has read from RSS files (i.e. search history).
• The product could be extended to address some of the inefficiency in the ‘user polling’ network model.
• Extended issues of HCI, including a customisable interface and design.
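The second and third extensions above, the local history and its search, rest on one mechanism: remembering which items have already been seen, so that repeated polls of the same feed do not trigger repeated downloads. A hypothetical sketch follows; the GUID-based identifiers and the API are assumptions for illustration, not the project's actual design.

```python
class SeenItemStore:
    """Hypothetical local history: remembers item GUIDs already acquired
    so that only genuinely new items are passed on for download."""

    def __init__(self):
        self._seen = set()

    def new_items(self, items):
        """Given (guid, title) pairs from one poll, return only the unseen
        ones and record them as seen."""
        fresh = [(guid, title) for guid, title in items
                 if guid not in self._seen]
        self._seen.update(guid for guid, _ in fresh)
        return fresh

store = SeenItemStore()
# First poll: everything is new.
first_poll = store.new_items([("guid-1", "Episode 1"),
                              ("guid-2", "Episode 2")])
# Second poll: guid-1 repeats, only guid-3 is new.
second_poll = store.new_items([("guid-1", "Episode 1"),
                               ("guid-3", "Episode 3")])
```

A persistent version of the same store would also support the search-history extension, since the recorded titles can simply be queried.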
1.7 Deliverables
Upon completing the project there will be two deliverables.
1. A report covering the analysis, feasibility, design and evaluation of a prototype client.
2. Software artefacts for a working prototype product that meet the ‘Must Have’ requirements of the project.
Chapter 2
Project Management
This section outlines the approach taken during the project development and highlights the importance
of clear structure and management in achieving a successful outcome.
2.1 General Management
Thorough management and scheduling are important concerns when embarking upon any development project. Without a clear plan of execution and detailed scheduling, a project can effectively stall, with resources wasted on trivial tasks whilst more important aspects of development go unfulfilled. Moreover, project failure becomes increasingly likely if risk is not managed and tackled as early as possible [28]. The computing sector is plagued by reports of project failures due to mismanagement and a lack of understanding of core requirements. To this end, the evaluation and selection of a clear design methodology, of the tools and techniques to be used during development, and of a clear plan of action was the first aspect of the project to be carried out.
The notion of a design methodology (the method of documenting a project and assuring it meets all of its aims and implementation goals), a plan of action and tight scheduling are essential to successful project management [34].
The project, although tackling a specific problem domain as outlined in 1.2, can be considered mainly an academic project, and to that end allows a free range of methodology options. There is no live system in use that would need consideration, and ultimately the project is to design an application to fill a current market gap rather than to replace or build upon an existing system. To this end, the cost and the risks involved in this project are minimal; indeed, even project failure could yield usable information regarding feasibility and usefulness, as well as architectural design. However, a successful project outcome is of course more desirable, and would allow for further extension and user adoption. Development risk is therefore an important issue that needs to be reduced to a minimum, which can be achieved through the adoption of a sound design methodology.
2.2 Feasibility Summary
Before developing a complete system, it was first necessary to address the feasibility of the proposed project, and to establish a clear need for, and list of benefits from, its development. Through adopting the Unified Process methodology, described in detail in 2.3.1.5, issues of feasibility were addressed in the early Inception and Elaboration iterations, which focused respectively on business needs and requirements, and on refining those requirements and mitigating technical risks. Whilst the Inception stages defined the scope and organisational feasibility, presented in detail in Chapter 3 and Chapter 5, the Elaboration iteration allowed a clear picture to be formed of the system's architecture and the technology it was based upon. This work is largely presented in Chapter 4 as background reading, but also in Chapter 5, where an overview of the system architecture is given.
[4] summarises the requirements of a feasibility assessment as the stages of establishing whether a system is technically feasible, economically feasible, and operationally and organisationally feasible. When developing a prototype application in an area with no clear existing overall structure or standard, it is hard to establish the economic feasibility summary required by more rigid methodologies, or indeed required as an initial stage before development with the Unified Process on large-scale or business/cost-oriented projects, as recommended by IBM [19]. As an academic project the cost of development is very low, and as such costing methods such as Return on Investment (ROI) are of little value in establishing cost. To this end, cost can only really be quantified as the manpower used in development, compared against the gain from the successful development of the project and the quality of the overall system. This is shown in more detail in Chapter 3, where the SQIRO techniques show a real desire amongst sampled users for the system to be developed, and where the analysis of the domain presents further need for the system. A detailed plan presented in 2.4 shows the cost of the project in terms of resource usage as a development schedule.
2.3 Design and Development Tools
2.3.1 Design Methodology
In terms of design methodology it was appropriate to analyse several tested methods and conclude upon
which best suited the needs of the project. For this, several academically taught methods were first
reviewed, followed by several researched design methodologies.
2.3.1.1 Waterfall Model
The Waterfall model offers a clear, structured methodology for systems design. The model is constructed of distinct stages, which are completed sequentially as development progresses through to implementation and maintenance cycles [27]. The methodology meticulously covers each stage, which is first judged to be complete before moving on to the next. This wealth of coverage ensures that the system is well documented and largely designed on the basis of the initial requirements capture; however, this also proves to be the model's weakness, as many sources argue that requirements are rarely static or fully captured in a single stage [34]. Whilst the model provides a well documented framework, there is a large time cost associated with its usage, largely due to having to ensure that the analysis, design and implementation stages are fully realised before progressing [4]. As such, the model has a large bias towards analysis and design, and also testing, as presented in [34].
Whilst it offers clear structure, it is also extremely rigid and methodical: each stage must be signed off as completed before moving on to the next, and as such, before coding begins, a design must be judged to be complete and all-encompassing. For example, before the design stage is started the analysis stage must be completed in its entirety. This leads to the question of how to judge whether a solution is fully analysed and when a design is fully matured, as the model restricts returning to a previous stage of development. This rigid structure is therefore very slow and detailed, ensuring each stage is completed to a level where it is deemed completely researched and concluded.

[Figure 2.1: Traditional Waterfall Model and Modern Approach]

The Waterfall model is still used in expensive, large-scale projects, such as government information systems, where there is an abundance of time and resources. Modern incarnations of the Waterfall model have broken up some of the rigid stages and allow for some re-evaluation of previous stages, as displayed in Figure 2.1 [4]. Similarly, the model has also been adapted to allow for some level of iterative development (a concept covered under later methodologies), with the aid of internal review stages to decide whether milestones within the project and minimum requirements have been achieved. Despite this, the model quickly begins to break down in time-limited projects or in a dynamic domain with changing requirements.
With regard to this particular project, where time is limited and the requirements are hard to pinpoint without continued feedback and analysis, the Waterfall model is unsuited to rapid development. Whilst the clear structure and the tools associated with its design and requirements capture phases provide some clarity, through clear guidelines and a development framework, the rigidity of the model is unsuited to a project of this nature.
2.3.1.2 RAD and Component-Based Development
Component-Based Development is the integration of new software components into a growing implementation, building upon already developed components or integrating large parts of their functionality
into a replacement system. The methodology focuses on designing new aspects of an existing system or
building upon an already designed implementation, and as such requires some existing framework and
analysis to be built upon. In essence the process involves a bias towards customisation and component
integration or in the planning of a large system that is slowly rolled out in usable components, rather
than designing and building an entirely new architecture and implementing in one development stage.
The Rapid Application Development (RAD) methodology takes this component-development approach and aims to roll out the functionality of a system by first implementing key features, then slowly building on this base through further development cycles. As part of its component-based nature, RAD aims to achieve maximum code reuse [4].
In terms of development bias, compared with the Waterfall model more time is spent on the development of the software artefacts and on systems integration than on analysis and design, given that a large part of the system already exists; the process largely assumes that the requirements have to a great extent been predetermined [34]. Whilst this is a dangerous assumption to make when developing new systems, it allows for rapid development of new software components working on top of existing technology. Because of this approach, the methodology also supports object-oriented design principles. The overall system can be represented as the interaction between objects, with each system component being the manipulation of those objects and their interactions. With this approach, the first stages are to map out the entity objects and then to build a system of control classes utilising the standard interface of each object in the system. As the underlying objects largely do not change between iterations, rolling out further control-class components will not affect the underlying functionality of the system.
Whilst there are parts of the project that are already implemented, and a client is to be designed to utilise existing technology, the lessened emphasis on requirements capture and analysis works against RAD's adoption as a methodology for what is largely a new client system. The rapid development of software suits the timeframe of the project; however, without existing software systems in place, and with the need to gather detailed requirements, RAD does not suit project development as well as other methods.
A key problem with RAD is the lack of a clear methodology and development tools, that is, of a solid framework and methods of expression; this means RAD projects can often overlook critical documentation and, as pointed out in [4], it often leads to a casual development approach.
The aim of RAD is of course the rapid production of applications, tackling some of the deficiencies of older methodologies such as the absence of continued user feedback and long development times, but the lack of clear guidelines for documenting development means that the methodology presents some difficulties in adoption and can lead to aspects of development being overlooked [4].
2.3.1.3 Prototyping
Prototyping can be split into two rough categories: revolutionary prototyping and evolutionary prototyping. The former uses prototypes largely as disposable proofs-of-concept to test and validate working functionality, whereas the latter focuses more on the reuse of code and on developing a program representative of the final product, built on top of each generation.
Evolutionary prototyping aims first to complete a ‘formal’ design and analysis phase before moving on to a prototyping phase, in which a solution is developed based on continual feedback. In this stage a prototype is developed and reviewed, and the criticisms of the review are then amalgamated into the prototype. This process is repeated until the review concludes that the development requirements have been met; to this extent the methodology represents an iterative implementation phase, in contrast to the Waterfall's purely sequential nature. [38] outlines the benefits of this method of systems production.
Since the system is made tangible it is easier to pursue a dialogue concerning the system,
and thereby making better decisions on the design regarding usability and function.
However, despite the iterative implementation allowing function to be refined to meet the requirements of the project, the evolutionary prototyping methodology fails to handle problems should they appear in the overall design, and requires building around central flaws, a notion omitted in [38]. It can also lead to fundamental flaws being built on top of, rather than the prototype being discarded and replaced by one based on a superior design [34]. This concept of refactoring is covered under the Unified Process later in this chapter.
As well as building on top of design flaws, the evolutionary prototyping methodology gives way to diminished overall structure, with redundant code artefacts being built upon with each iteration of the prototype. Alongside the diminished architecture, it is difficult to avoid writing large sections of undocumented code with each iteration; as such, when adopting this methodology it is essential to set clear goals and to solve subsets of the problem, as this allows for thoroughly tested and documented code. [14] summarises the issue, pinning it to the need for prototypes “to be developed rapidly”, such that the production of “documentation would slow down the process”. [14] also points to prototyping being a method of realising risks and a way “to foster clarification [of] requirements, and to develop and try out solution concepts”, and to this end prototypes are summarised as best serving as “the centrepiece of a hyperstructured information base” rather than a direct route to a fully documented solution.
Ideally, prototyping is a design tool that provides a channel for expressing ideas with a user base, and is most useful in areas such as user interface design. In terms of this project, the deliverable of a prototype client is descriptive of the level of implementation the software artefact will achieve: a proof of concept for a content distribution client.
2.3.1.4 Boehm’s Spiral Model
The spiral model is an iterative design methodology aimed at managing risk by repeatedly carrying out risk analysis. [4] summarises the stages as planning, risk analysis, engineering and customer evaluation, which are repeated until completion. There are similarities with RAD in this regard, given the iterative development and review stages, and thus a contrast with the classical Waterfall methodology. Unlike both the Waterfall and RAD methodologies, however, the spiral model starts with the production of a requirements plan but does not have detailed initialisation and analysis phases at the start of the project; instead, as [4] points out, heavy focus is placed on design and construction. There are also elements of requirements and design validation which are not present in RAD.
Whilst both RAD and the spiral model offer great advantages over the Waterfall model, it is the absence of some key stages and the lack of specific tools for documenting and expressing the system design that are their downfall. To this end, a similar iterative methodology is reviewed next.
2.3.1.5 Unified Process
The Unified Process, developed by Rational, offers an iterative approach to software design in which requirements gathering, design and programming stages are repeated, aiming to rapidly create reviewable results at the end of incremental stages. The core concept of the Unified Process is iterative stages of development rather than sequentially completed stages, such as those of the Waterfall Model [20]. The focus is on quickly establishing feasibility and identifying design flaws through continuous feedback, as in the evolutionary prototyping reviewed earlier in this report, and the methodology essentially aims at reducing the risks involved in project development. It attacks risk by constantly evaluating progress, mitigating risks identified through feedback and user engagement early in the development process. There are notable similarities with the RAD approach, in that the system goes through development cycles that are themselves structured like the Waterfall model, and that the core functionality is extended as the system develops (defined as Must Have, Should Have and Could Have Use Cases). Where the methodology differs is that the Unified Process utilises many existing object-oriented development tools, such as UML, that aim to model the requirements of the system and present its required functionality in a clear, universally understandable way.
The Unified Process is not a universal process, a rigid framework of stages that need completing, but instead a development process designed for ‘flexibility and extensibility’ [17]. As [17] demonstrates, the Unified Process allows for a completely flexible lifecycle strategy and provides the tools and language to express the project, while at the same time allowing the developers choice over the artefacts and concepts to model. The Unified Process enables high-quality projects to be realised efficiently by providing the means to document the project’s development based on object-oriented design principles.
The basic structure of the Unified Process is split into four phases: Inception, Elaboration, Construction and Transition. During the Inception stage, business and requirement risks are addressed. This is particularly significant for new development projects such as this one, and ensures the system is feasible and achievable. To this extent the methodology suits the needs of the project, given the requirement to first justify that the project is capable of attaining a suitable conclusion. The Inception stage is iterated if necessary, producing a statement of the problem in the form of the ’vision’ statement, identifying the scope, formalising a plan of action and producing an evaluation of the risks [28]. This wealth of documentation presents the project in a variety of diagrams and written forms at different levels of abstraction, aimed at demonstrating the project’s worth and achievability, both in terms of cost and in terms of technical feasibility. The adoption of the Unified Process was largely based on this notion: the ability to realise the project’s requirements clearly and efficiently. Like the Waterfall model, which also starts with an inception phase covering aspects of feasibility, this methodology allows for refinements in requirements; thanks to its iterative nature, it also builds upon the project’s problem definition to better reflect changes in requirements.
A major advantage of this methodology over others is the Use Case driven approach. By modelling based upon specific requirements, the documents produced all have a consistent link between them. The Use Cases are categorised into levels of importance, the highest of which are realised and elaborated upon in earlier iterations, presenting the system via the use cases at increasingly lower levels of abstraction as the project progresses towards implementation in later iterations. The Use Cases are then linked together to show the overall view of the system by the end of the Elaboration phase.

Figure 2.2: Use Case driven analysis
As emphasised, the Unified Process provides the tools for documenting and realising the development of a system clearly and efficiently. The methodology provides methods of identifying and attacking risk early in development by working closely with users to develop the system and redesigning if necessary. In contrast, if new requirements emerge late in the project, the Waterfall model has no framework for re-evaluating the system’s design. Similarly, evolutionary prototyping results in flaws being built upon, and the system becomes more architecturally complicated than necessary. The Unified Process, by contrast, aims to keep the system’s design in its simplest form. Users’ functional requirements are mapped as individual Use Cases, which are ordered by importance and then realised using UML modelling techniques, designed to represent an object-oriented view of the system ready for implementation. Quality assurance is provided through a sound design of the system at the end of the Elaboration stage, but also through testing, version control, refactoring, and other methods such as simple coding standards and planning.
Transition is the final phase of the Unified Process, with the purpose of deploying the realised system into a working environment. With regard to this project, this final stage is not a requirement for success, given the objective of a prototype client. The Unified Process ultimately proved the most useful methodology for this project.
2.3.2 Management and Evaluation Tools
2.3.2.1 Version Control
The adoption of version control tools provides numerous advantages during the development of the project. Whilst only a single developer analysed, designed and programmed this specific project, there was still a need for tight control over the documentation and software implementation, especially given the iterative nature of production. To this end, incremental versions of the documentation were stored under separate, incrementally named files to ensure that any loss or problems occurring on the latest iteration could be rolled back to an earlier version. Under a similar notion, CVS was adopted to track code changes and share the code base between several development platforms. This allowed for a full listing of documented code changes and the tracking of implementation progress.
2.3.2.2 Testing
The policy of test-driven coding is often overlooked as a systems development choice, or becomes an afterthought, yet it is of great importance in establishing a high-quality product that meets the requirements of the project and realises the design architecture.
By establishing a rigid set of test cases for the project, quality assurance is provided, ensuring not only that the project meets the design model but also that the success of each iteration of an iterative design methodology can be evaluated.
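As a minimal illustration of the test-driven style adopted here, the following Python sketch writes test cases against a required behaviour before judging the implementation by whether they pass. The `matches_filter` helper is hypothetical and not taken from the project’s actual code base:

```python
import unittest

def matches_filter(title, keywords):
    """Hypothetical helper: flag a feed item whose title contains any keyword."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

class MatchesFilterTest(unittest.TestCase):
    """The test cases double as a statement of the requirement."""

    def test_keyword_present(self):
        self.assertTrue(matches_filter("Weekly Podcast, Episode 12", ["podcast"]))

    def test_keyword_absent(self):
        self.assertFalse(matches_filter("Site news roundup", ["podcast"]))

# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MatchesFilterTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

A failing test here would signal that an iteration has not met its requirement, giving an objective gate between iterations.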
2.3.2.3 Refactoring
Refactoring is the process of redesigning a system’s architecture to better encompass the required functionality and to ensure it is in its simplest form. Refactoring is central to iterative methodologies, following the concept that continual study of the Problem Domain leads to refinements in requirements, which give way to a redefinition of the system, which in turn yields a simpler, superior design. The ability to utilise design patterns, or to simplify and increase the amount of code reuse through simple refinements in the system’s architecture, is an important part of development with the Unified Process, and throughout the project the architecture of the system was repeatedly redesigned.
2.4 Project Schedule and Schedule of Documents
The project schedule initially drafted for this project, and the final schedule with the differences in dates marked, can be found in Appendix B. In addition to these Gantt charts, a list of the documents produced by following the Unified Process, and their locations within this report, can be found there.
Stage  Date        Task
1      30.09.2005  Submit Project preference form
2      21.10.2005  Complete Minimum Aims and Requirements Form
3      31.10.2005  Iteration 1
4      09.12.2005  Complete the Mid-Project report
5      19.12.2005  Iteration 2
6      06.02.2006  Iteration 3
7      10.03.2006  Submit table of contents and draft chapter
8      17.03.2006  Completion of Progress Meeting
9      02.05.2006  Submit Project Report
10     03.05.2006  Submit Project Report Electronically

Table 2.1: Schedule
Chapter 3
Requirements Gathering and Analysis Capture
This chapter reflects on the requirements capture stages of the project. Iterative comparisons can be
found in more detail in Section 5.1.1.
3.1 Users, Stakeholders and Problem Domain
The overall aim of this project is to research and implement a content delivery and management system
based upon existing technologies. For this project, the main focus of development is on the client
application side and to this end the users of the system are defined as a generic set of computer users
with a wide range of technical expertise and computing experience.
3.1.1 Current System
The current environment for application and media content distribution can be described as an ad-hoc structure crossing various protocols. Content updates are tracked using standard browsing techniques, via search engines or direct navigation of known sites. Update notification is largely left to the consumer, or is provided as a feature built into a particular application exclusively for that system, relying on some element of manual notification followed by manual acquisition. There is no general model or platform of distribution over any single technology. The business model of the project-specific domain under development is presented as a Business Object Diagram, shown later in this chapter in Figure 3.2 as the conclusion of the initial analysis of the Problem Domain.

Figure 3.1: Problem Domain
3.1.2 Users
In terms of the global system of content distribution, there are two distinct stakeholders: first, the content publisher, who releases the content and posts notification of release; and second, the consumer, who acquires the content. A great number of individual subsystems and business actors exist within the current environment, but in the analysis model this can be greatly simplified to the two stakeholders in question.
3.2 Methods of Information Gathering
3.2.1 SQIRO
SQIRO is an acronym for various techniques for gathering requirements from a wealth of sources in the initial analysis stages of a project. The areas of requirements gathering can be summarised as the sampling of documents, the use of questionnaires, the interviewing of stakeholders, and further research and observation of the problem domain [2]. However, given the size and scope of this project, as well as the lack of current implementations, not every stage needs undertaking in great detail, nor is every stage particularly applicable to this project. To this end, a brief description and suitability analysis of each stage was carried out.
3.2.1.1 Sampling of Documents
The sampling of documents offers little for this project. Given that this is proposed as a new product development, there is no documentation for an existing system available for analysis that would yield any great wealth of useful information; and given the time involved in reading through the documentation of other projects, and the varied aims and functionality of those programs, it is not a feasible technique for this project.
3.2.1.2 Questionnaires
The next technique in SQIRO is the use of questionnaires to gather requirements and feedback for project development. Questionnaires are a double-edged sword in terms of usefulness. In the method’s favour, they allow for useful levels of feedback if the questions are well formed and concisely answerable, for example evaluating existing products or features with a fixed-scale rating. The method also has the advantage of being relatively cheap in overall time compared to the results yielded. To this end, using questionnaires to empirically gather feedback on the system at each iteration of the evaluation stages allows for a guided review of the system’s successes and the areas in which development is failing. Questionnaires similarly offer a simple means of gathering areas of required functionality. However, it is important to avoid open-ended answers and opinions being expressed in this format, and to this end questionnaires were used simply to aid in gathering statistical feedback and product requirements information. In terms of this project, questionnaires were conducted in the form of brief interviews prompting users for their responses to various questions gauging their preference for certain features.
3.2.1.3 Interviews
The main body of requirements was gathered through a combination of interviews, research and observation. The interviewing process allows a diversity of opinions to be recorded and, given the documentation of the interviews, a formal channel of dialogue with the stakeholders. A structured approach allows opinion to be gathered in a systematic and useful way. Three candidate users of the future system were chosen and consulted throughout the entire lifecycle of the project. These users provided continued feedback and were able to voice their opinions on the evolution of the product. This was a central part of development using the Unified Process methodology, as it allowed development risks, such as a failure to identify the required functionality, to be attacked continually before they could impact upon the success of the project.
3.2.1.4 Research
The process of domain-specific research allowed for the gathering of information relevant to the problem domain, such as the current methods of distribution, allowing for an understanding of the core transactions. The process also helped in gathering the information required to justify development choices taken later in the project, such as system component and software choices. The majority of research went into the background issues and the technologies available. For the research stage of requirements gathering it was not possible to observe a single existing system or visit an organisation using such technologies, as is usually done at this stage [2], given that no such system existed. Instead the general concepts of the problem domain were researched using a variety of resources, such as academic journals, standards and white papers, as well as a review of several candidate systems in existence.
3.2.1.5 Observation
The final technique of requirements gathering in SQIRO is the use of observation to gain a detailed understanding of an existing system. Observation played a key role in the product evaluation, but in terms of initial analysis, observation took the form of an extended feature of the candidate user interviews. Several existing systems were shown to the users, which allowed them to engage more openly in dialogue and to identify requirements through demonstrations during the interview. The browsing behaviours of users were reviewed during this process, and as such the central tasks in content management were observed and can be identified as three iterative stages: update notification, search and acquisition.
3.2.1.6 Conclusion
Successful use of the SQIRO requirements gathering techniques allowed for a clear understanding of the business needs early in the project. Whilst questionnaires were not originally intended, and were to some extent discouraged as a device for requirements gathering, their use quickly highlighted key development areas that needed addressing and would otherwise have been overlooked, as the results show (Appendix D). As expressed, several techniques from SQIRO were applied, but the body of useful requirements came through interviews, which were combined with demonstration and observation to yield a more successful output, and through background reading and research, which identified the areas of development that needed undertaking early in the project. The sampling of existing documents was not possible for this project and was omitted as a stage. Likewise, observation was used as a device to gain feedback, but was not useful in discovering an understanding of the problem itself and was largely not applicable to this project as an analysis stage. Existing programs were, however, reviewed and used as baseline comparators during evaluation.
3.2.2 Professionalism and Techniques
Part of any successful project is the way in which it is structured and executed. A key part of the Unified Process is the high level of user interaction during development. Here it is important to maintain a level of professionalism, as this interaction reflects not only on the quality and volume of useful feedback that can be obtained, but also on the University of Leeds as the parent organisation of this project. To this extent, measures were taken to ensure all external discourse was conducted in a manner befitting the level this project represents. Whilst carrying out interviews and questionnaire sessions, it was important to structure and prepare appropriately to ensure that a useful level of feedback was obtained, and also that the process reflected favourably on the University and the interviewee in terms of professionalism.
3.3 Evaluation of Existing Products
An important aspect of project development is an analysis of what products are available and why exactly a new product is needed. Even in areas of little development, a review of what is available, what works and what does not, and a gauging of how successful other applications have been helps to define exactly how a solution can be devised to take advantage of, or extend upon, existing products. In regard to this project, there are a great number of applications that can be reviewed to gain insight into exactly how the problem domain has been tackled and how this project can build upon the success of other applications.
3.3.1 Introduction
Since the introduction of RSS as a web technology there have been many applications supporting its usage, which can be classified into two general areas: standalone RSS applications and RSS functionality integrated into web applications. In addition to RSS-utilising applications, proprietary update management software represents a third area in which the general problem of the project is being tackled. Whilst this project is concerned with the suitability of RSS parsing and feed management in solving the problem domain, and with achieving a general, loosely coupled but highly effective system, there are several systems tackling the same domain by different means. In justifying the feasibility of this project, it is a requirement that these applications also be studied.
3.3.2 Types of Products
As previously mentioned, there are both RSS-utilising applications and proprietary systems. In regard to RSS clients, the ability to acquire non-text content attachments to feeds was a necessary requirement for their selection for review. To this end, the following applications were reviewed.
BitsCast
BitsCast is a popular RSS aggregator designed to support podcasting as well as other forms of ‘casting’ (the concept of attaching content to an RSS feed, such as VideoCasting). The application is designed to show only recent views of RSS feeds and has an inbuilt Internet Explorer tab to support the viewing of news feeds.
RSS Bandit
RSS Bandit is a highly popular news client with some support for file attachment downloading.
Steam
Steam is a games content delivery system developed by Valve to publish and distribute their own games titles directly to their customers. This system offers a fixed content catalogue, where customers can purchase and gain automatic updates to their software purchases.
Windows Update
Windows Update is an operating system update delivery platform for Microsoft products.
BitsCast and RSS Bandit offer RSS utilisation and the ability to acquire content other than standard news. It should be noted that the primary purpose of these applications is to retrieve and display news data rather than to act as a generic content distribution system, and as such they do not prove to be ideal solutions taken out of context. Although BitsCast claims to support casting, that being the distribution of non-text data, its design still centres primarily on the delivery of news. In addition to these applications, the Windows Update system and Steam were reviewed, being applications that exist to serve and manage content using proprietary technology.
3.3.3 Comparison of Products
The existing products were evaluated based upon useful functionality for this project. A complete
comparison table reviewing all four applications can be found in Appendix C.
3.3.4 Conclusion and Future Systems
From Steam’s example in particular, it seems very apparent that the future of distribution technologies
for commercial games and software is likely to be redefined by the introduction of disruptive distribution
technologies that will change the existing landscape of consumer-to-publisher-to-developer channels
into a more direct form of communication between consumer and developer [13]. The runaway success
of Steam is a prime indicator of the need for an internet content distribution system to replace the existing
published content distribution model. In terms of the delivery of patches and updates, applications increasingly deliver updated content over the web directly to the application, and it seems logical not only that this trend will increase in frequency and become a standard part of most applications, but also that a generic form of distributing new content directly to consumers is likely to develop. Next-generation video gaming consoles are already an indicator of this trend: all three major manufacturers (Nintendo, Sony and Microsoft) have expressed plans for online distribution of content in similar models.
3.4 Questionnaires
The use of questionnaires in this project was merely to gauge feedback on the success of features within the systems, and to a large extent they can be regarded as quick interviews of users to gain insight into areas of success and weakness in the sampled applications and the developing system. The use of questionnaires did not reflect the technique in its traditional sense, where sheets are distributed and later collated, as the interviewer observed and gauged responses. Sample applications were demonstrated to the user and the questions were elaborated upon if a user required. A total of 30 questionnaires were distributed in the initial requirements gathering stage of the project, to both experienced and casual computer users (classified as consumers rather than content providers). Of this number, 20 were carried out with the interviewer supervising, and ten were distributed for users to fill out in their free time; of these ten, three were returned completed, giving 23 completed responses overall (76.7% of the sample). As such, the questionnaires could be regarded as interviews, given the presence of the interviewer to help answer questions and clarify the aim and scope of the process, but they were classified as questionnaires given the rigidity of the questions, the speed at which they were carried out, and the simple aim of gauging success without detailed elaboration. The results of the questionnaires were used as a point of reference in more detailed interviews. The results were tallied and split by technical ability classification to represent clearly the viewpoints of both technically capable users and casual computer users. The results showed clear areas where technically experienced users favoured functionality, whilst less experienced users favoured clarity and structure more heavily. There were also areas of clear overlap, where both demographics expressed similar sentiments. Table 3.1 shows the demographic samples.
Group              Sampled  Returned  Response
Experienced Users  20       16        80%
Casual Users       10       7         70%

Table 3.1: Questionnaire Response
Similarly, the tallied results were used as the basis for deriving the functional and non-functional requirements of this project. The results of the questionnaires can be found in Appendix D.
3.5 Interviews and Feedback
For the purposes of development, and as an aid in meeting the requirements of development using the Unified Process methodology, three prospective users were selected to give continued feedback on prototypes of the product and to help gather requirements throughout the project lifecycle. The candidates were chosen based on their suitability as users of the product, their technical ability, and their availability throughout the system’s development. The following candidate users were interviewed at the initial requirements gathering stage and were later involved in the design, development and feedback of the prototype product developed using the Unified Process methodology:
Thomas Bradshaw
Third Year BSc Computing Student, Leeds University. Technically experienced in the design and development of software systems.
Samuel Thiessen
GCSE Student, Rastrick High School, Brighouse. Representative of the casual computer user demographic.
Dale Smith
Third Year BSc Computing Student, Leeds University. Uses RSS feeds daily for news and site updates, and has some knowledge of existing uses of the technology such as podcasting and news syndication.
At the initial requirements gathering stage these users were shown a variety of applications and interviewed for their thoughts on the usefulness of core functions and on the success of each application in achieving its goals, as well as their own thoughts on application requirements. Full summaries of these interviews can be found in Appendix E. The identified areas of development, which were adapted into minimum and extended requirements, are summarised as follows:
Automatic and Manual Downloading
A key part of the functionality was of course to download and manage content. As part of the interviewing process, all three candidates stressed that an important aspect was the ability to have content automatically retrieved with no interaction with the system. Comments were raised on how Windows Update and similar platforms interfered with computer operation and flashed prompts and reminders to restart. In contrast, all three candidates favoured transparent downloading with optional download notification. Another feature, expressed as an extended requirement, was the ability to also download manually content that was not flagged for automatic retrieval.
Filtering
The ability to mark which content was to be acquired was raised as a requirement. One user requested that regular expressions should be available to achieve this; however, for casual users a simpler method seemed more appropriate.
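A hedged sketch of how the regular-expression variant of this requirement might look (the function name and patterns below are illustrative only, not the project’s actual design):

```python
import re

# An item is flagged for automatic download when its title matches a
# user-supplied regular expression; casual users could instead be offered
# plain keyword matching built on the same underlying mechanism.
def should_download(title, pattern):
    return re.search(pattern, title, re.IGNORECASE) is not None

print(should_download("Weekly Podcast - Episode 12", r"podcast"))  # True
print(should_download("Site news roundup", r"episode \d+"))        # False
```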
Storing of Local Database
Several opinions were voiced on this matter; whilst there was agreement on the need to store this information, the candidates all expressed different ideas on how the retrieved items should be stored and displayed.
Searching of Local Database
One of the candidates desired the ability to search through all of his acquired material to see where it had been downloaded to, and indeed whether it had been downloaded at all.
System Integration
All three candidates spoke favourably of how other applications integrated into the desktop through tray icons and popups, without an overall window ever being present to clutter up their desktops.
Security
One of the candidates was concerned about the retrieval of unsafe material, and expressed the need for the ability to blacklist content types and particular download sources, to prevent malicious material being brought past his firewall and executed.
3.6 Summary
After the completion of the first iteration of the Unified Process, a clear picture of the business domain emerged, this being the entire content distribution channel from content producer to end user. From here the domain can be expressed through UML in the form of a Business Object Model.
Due to the time constraints of this project, it was decided that only the client-side subsystem should be considered in great detail (highlighted in Figure 3.2). This involves the RSS Aggregator business actor, the Consumer and the Item business objects. These were the key concepts for the further design of the system.
3.6.1 Business Actors and Subsystems
RSS Publishing System
Handles the extraction and mark-up of the latest item entries from the source website system database.
RSS Aggregator
Propagation subsystem that allows for the processing of an RSS Feed into one or more Items.

Figure 3.2: Business Object Diagram

Content Publisher
The website host and content originator. In this project this business actor plays a secondary role, as the project focuses on the client side.
RSS Feed
The source of content information.
Website System
The hosting site for the content.
Consumer
The main user of the client-side subsystem.
Item
A subcomponent of an RSS Feed which identifies a particular content/download entry in the feed.
Item Source
The source URL reference or content stored on a web server.
3.6.2 Scope
The scope of this project was decided based upon the size of the domain. Whilst the project proposal
initially described the entire process of content distribution, and possible areas that this project could
entail, initial planning, discussion and the results of the Inception stage indicated that this was not
achievable within the time constraints allocated for a project of this size. To this end, after modelling
the Business domain and analysing the problem, the scope of the project was limited from a system
providing a complete proprietary content publisher to content consumer distribution, to utilising existing
RSS feeds and designing a client to aid the consumer’s requirements of being notified of releases,
searching for a download location, and manually downloading content. This also allowed for a suitable
computing oriented development project to be carried out. This domain is highlighted on the Object
Diagram in Figure 3.2 and, from here on, the primary concern of the project is in designing an application
to use two specific sites, http://www.bbc.co.uk/ and http://www.legaltorrents.com/, as two examples of
sites that provide server side implementation but no particular client side implementation of content
acquisition systems. These two sites were used as a point of reference for all sampled users, and in
evaluation and testing (in addition to many other sites using a variety of server software and hardware
configurations that extend beyond the project’s minimum requirements).
3.6.3
Initial Use Case Modelling
With the environment modelled as a Business Object Model shown in Figure 3.2 and the scope of
the project visualised, the next stage was to model the requirements gathered through interviews and
questionnaires in the form of Use Case diagrams. Use Case diagrams offer a simple representation
of the user’s needs and base requirements from the system, and support the communication of ideas
directly with the users [1]. The Use Cases for this project can be found in Appendix G.
3.6.4
Requirements
From the requirements gathering and analysis stages of the Inception and Elaboration iterations the following functional and non-functional requirements were arrived at. At this point, one of the project
deliverables should be clarified, that being the ‘design of a prototype of a robust client’. This is partially covered in 2.3.1.3, where the issues of prototyping are raised. Through the adoption of the Unified
Process methodology, the prototype software artefact for this project can be described as meeting the
Must Have Use Cases to some satisfactory level of implementation, where ‘satisfactory’ is defined by
user feedback and evaluation. Similarly, extensions beyond the Must Have Use Cases, namely
the Could Have and Should Have Use Cases, can be classified as project extensions beyond the base
requirements. In [21], it was proposed that requirements should meet the criteria defined by the acronym
‘SMART’. Firstly, requirements should be Specific: clearly defined in appropriate detail and free from
ambiguity. This is difficult for non-functional requirements, as it is hard to quantify aspects of Human
Computer Interaction (HCI) in clear language; however, it is sound advice for functional requirements.
Secondly, requirements should be Measurable, in that it is clear to what level of implementation each
requirement has been met. The next criterion is Attainable: the requirement is feasible in regards to the
proposed system and technology. Next is the advice that listed requirements are Realizable, feasible in
the sense that they can be completed with the available resources and within the project’s limitations.
Finally, requirements should be Traceable, in that they can be modelled and realised within the project’s
development cycle. With these guidelines in mind, the following requirements were chosen.
3.6.4.1
Functional
These will later be classified into Use Cases and prioritised, as covered further under Addressing Requirements Risks, Section 5.1.1.
1. Ability to add and remove input sources
The ability to add new sources of input, to subscribe to an RSS feed as well as remove an existing
one.
2. Download material from many locations
The need for the application to be able to take source input from any site and to acquire content
from any location, rather than being tied down to a specific site.
3. Add filter criteria
The ability for the user to add specific filtering criteria of what to match and acquire.
4. Apply a created Filter to an RSS Feed
The ability to filter an RSS feed to select content.
5. Download an Item (Automatically)
The need for the application to support automatic acquisition of filtered items.
6. Import Source Item manually
Whilst the application’s purpose is an update platform, another functional requirement is the ability
to add download items manually that have not come directly from a feed.
7. Download an Item (Manually)
The ability to allow users to manually start a download of an item that exists within the system.
8. Store a local database of Items
The ability to store a database of all items that have been filtered from an RSS source, together with
all the information associated with those items.
9. Search of local database of Items
The ability to search through a local database of acquired material to a rudimentary level via the
item’s name to select a specific item.
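Requirements 3 and 4 amount to matching item metadata against user-supplied criteria. A minimal sketch of such a filter in Java follows (assuming, hypothetically, that a filter is a regular expression applied to item titles; the class and method names are illustrative, not the project’s actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch: filter a list of item titles with a user-supplied
// regular expression, as per functional requirements 3 and 4.
public class ItemFilter {
    private final Pattern pattern;

    public ItemFilter(String regex) {
        // Case-insensitive matching is friendlier for non-expert users.
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    // Returns the subset of titles matching the filter criteria.
    public List<String> apply(List<String> titles) {
        List<String> matched = new ArrayList<>();
        for (String title : titles) {
            if (pattern.matcher(title).find()) {
                matched.add(title);
            }
        }
        return matched;
    }
}
```

For instance, applying a filter built from the expression `episode \d+` to a feed’s titles would select only the numbered episode entries, which could then be queued for automatic download (requirement 5).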
3.6.4.2
Non-Functional
1. Platform Independence
The ability for the system to work on both Windows and Linux.
2. Faster Update Tracking than Manual Methods
The system must ultimately perform faster and require less effort to gain updates than standard
browsing techniques.
3. Logical and helpful layout
The application should make use of good design and HCI practices such as tool tips and graphics.
4. Integration into operating system
The application should look and ‘feel’ like a standard application.
5. Filter creation help
The application should offer some help for users who aren’t particularly knowledgeable about
regular expressions.
Chapter 4
Background Research
The aim of this chapter is to discuss and conclude upon development choices, justified by research and
the suitability of each option in fulfilling the project’s aims. This section discusses the technologies most
suited for use in the development phases of this project. The first stage was to extract the project’s aims
and requirements and consider the underlying issues in each case:
• The product should allow users to get content from many locations. This requires a generic and
site-independent solution, capable of processing a supplied input from many sites on request.
Syndication techniques and information delivery channels must therefore be evaluated to obtain
insight into the suitability of existing models and methods; furthermore, the functionality of existing
techniques should be analysed to ensure that they offer the ability to distribute general content
adequately.
• The product should allow users to filter and select content implies the application of simple rules
on a generic list of content to attain a specific subset to a user’s specification. The implication
is the client must support a general input, and allow users to create and apply rules to filter what
content is wanted.
• The product should allow users to download material automatically, a requirement which compels
research into content distribution and how to facilitate downloading in the client application.
4.1
Content Distribution
In order to understand the problem, it was necessary to clarify the project’s tier of abstraction, which
is how the aims differ from other models of content distribution management in existence, particularly
in contrast to the monolithic, all-in-one approach of other methods1 . Ultimately this involves clarifying what the
1 This was covered under analysis of existing technologies, Section 3.3. A particular problem of such distribution channels is
their limited range of content, acquisition abilities and compatibility, utilising solely internal transfer mechanisms.
project aims to do, and likewise what it does not. In terms of avenues of distribution, there exist a great
many applications that aim to facilitate transfers over various protocols, with varying levels of success
and adoption. At this point, it should be clarified that this project’s ultimate goal is to simplify the
domain and reduce the human interaction required to acquire content, rather than to replace established
avenues of distribution. With this in mind, electronic distribution can be split into several
application/protocol categories.
4.1.1
Client - Server Models
Client-server models are the predominant network model in existence to date. In this model an end
user client communicates directly with one or more servers, which host services and content, most
often in a connection-oriented environment [35]. Communication can be client initiated, where the
client requests service from the server and receives a response (polling). There are inherent problems
with this: in a large, scalable system, many clients all send requests to central servers, creating a
potential bottleneck and increased demands on the servers [33]. In regards to creating a large generic
content distribution system, this increased demand on web servers is far from ideal. Likewise, clients
have no ideal way of determining whether a service is available or whether data has changed, so there is
potentially a great deal of redundancy in this model. An alternative approach within the client-server
model is a server-initiated connection, in which a server polls a list of clients periodically. In regards to
this project, this alternative is poor in comparison. The client loses all control over the connection and
relies on fair play by the remote server: if the client wishes to stop receiving content, the server
nevertheless has total control of connection initiation, so trust becomes an issue. This problem can be
seen prominently in areas of syndication such as email, where spam has been the direct result of
server-side control over subscription and management. Furthermore, the integrity of the client list held
on the server side cannot be guaranteed, given that it is held remotely, and the availability of clients in
a distributed system is difficult to determine.
4.1.1.1
HTTP and the WWW
The Hyper-Text Transfer Protocol (HTTP) has achieved near-universal adoption as a network protocol, facilitating the World Wide Web application and the most notable bulk of network communications over and
between external networks [15]. HTTP was introduced to facilitate the serving of interlinking text documents and the mark-up of basic information, based on request and response functionality
[35]. As such, its operation is stateless, and clients and servers themselves are relatively lightweight
implementations [33]. Despite its original purpose and design, numerous innovations have brought
more secure and varied communication abilities, such as the extension into state-driven and secure
communication. HTTP is a text-based protocol which facilitates the transmission of requests such as
GET and POST via a web browser, and sits on top of the Transmission Control Protocol (TCP) for actual
transmission along a network. In response to client requests, the server returns a file along
with a document header providing further information on the file, facilitating the
serving of documents [3]. The protocol allows for a wealth of functionality; the original specification,
for example, allows for the transfer of just document headers via the HEAD command, which helps reduce
the issues of redundancy identified in 4.1.1 for client-initiated client-server models. For example, using this
in conjunction with the If-Modified-Since header field, it is easy to determine whether
content has changed server side, which greatly protects against potential redundancy. The returned
item may be a document or a server error response denoting restricted access, missing or moved files or
other serving errors. An application using HTTP should correctly interpret and handle these events to
be fully standards compliant [3].
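As a sketch of how a client can exploit this to avoid re-downloading unchanged feeds, the following assumes a simple HttpURLConnection-based poll: it sends an If-Modified-Since header and treats a 304 Not Modified response as ‘no change’. The feed URL is a placeholder and the helper for formatting the RFC 1123 HTTP date is shown explicitly; this is illustrative, not the project’s actual polling code.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Sketch of redundancy-avoiding polling with If-Modified-Since.
public class ConditionalPoll {

    // HTTP dates use the RFC 1123 format, e.g. "Mon, 6 Feb 2006 21:10:03 GMT".
    public static String httpDate(long epochMillis) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(
                ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC));
    }

    // Returns true if the feed changed since lastFetchMillis (feedUrl is a placeholder).
    public static boolean feedChanged(String feedUrl, long lastFetchMillis) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(feedUrl).openConnection();
        conn.setRequestProperty("If-Modified-Since", httpDate(lastFetchMillis));
        int code = conn.getResponseCode();
        conn.disconnect();
        // 304 Not Modified: nothing new server side, so skip the full download.
        return code != HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

Sending the header turns the poll into a conditional request: only when the server reports a change need the client fetch and parse the full feed, directly reducing the redundancy described above.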
During the implementation of the project, further research of the HTTP standards was required in
regards to connecting to a page and being redirected to the actual hosted content. Initially, a simple
download of the URL listed within the item source was used; however, this caused the application
to obtain the content but with an invalid filename. Upon inspecting the HTTP headers, the Content-Disposition field could be seen to be the source of the filename, and the program was modified to utilise this
aspect of the HTTP standard, allowing for file attachments to be used with dynamically created pages.
Once this part of the specification was identified, it was a simple matter of parsing out the required
attachment filename, allowing for filename extraction from dynamic pages and redirects [3].
HTTP Header 1 HTTP response header showing use of Content-Disposition
{X-Powered-By=[PHP/5.1.2], Date=[Mon, 6 Feb 2006 21:10:03 GMT],
Content-Type=[application/octet-stream],
Content-Disposition=[attachment; filename=application.exe],
Server=[lighttpd/1.4.11]}
As can be seen from HTTP Header 1, the content required is hosted elsewhere on the
server and is linked to as an attachment by the dynamic page script; in this case, application executable
data is served with a filename of application.exe.
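The filename extraction described above can be sketched as a small parser over the Content-Disposition header value. This is a simplified sketch: it handles the plain and quoted `filename=` forms seen in practice, not the full grammar of the header.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of extracting the attachment filename from a
// Content-Disposition header value such as:
//   attachment; filename=application.exe
//   attachment; filename="application.exe"
public class ContentDisposition {

    private static final Pattern FILENAME =
            Pattern.compile("filename=\"?([^\";]+)\"?");

    // Returns the filename, or null if the header carries none.
    public static String filenameFrom(String headerValue) {
        if (headerValue == null) return null;
        Matcher m = FILENAME.matcher(headerValue);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

For the header shown in HTTP Header 1, `filenameFrom("attachment; filename=application.exe")` yields `application.exe`, which the client can then use when saving content served by a dynamic page or redirect.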
4.1.1.2
FTP - File Transfer Protocol
The File Transfer Protocol (FTP) is another widely utilised method of distributing content. Whilst
HTTP’s original purpose can be pinpointed to serving documents and marked-up linking text, FTP is a
simpler method which predates it. FTP’s design was solely to facilitate the transfer of files to and from
a remote server, and even between remote servers [35]. Given the prominence of the protocol before the
World Wide Web, and the currently high utilisation of FTP [15], it is logical that a content retrieval
application should support this protocol.
4.1.2
Peer-to-Peer Filesharing Networks
Since the advent of peer-to-peer technology, this avenue of distribution has grown rapidly in utilisation.
The first major application to facilitate the distribution of content was Napster in the late 1990s, which
allowed for the distribution and acquisition of audio files [35]. Following its demise, due to lawsuits
over content copyright issues, several other networks rose to prominence, and to date many large scale
networks dominate the environment. A filesharing application’s peer-to-peer structure is by nature more
complex and varied than traditional client-server networks, and network traffic is likewise either search
based or index based, with almost all mature third generation applications natively capable of multiple-source
downloads, in contrast to the client-server applications above. It is therefore difficult to implement
existing peer-to-peer network functionality natively, and so utilisation of filesharing networks will be of
supplementary concern for this project. However, many prominent networks such as eMule and
BitTorrent facilitate the distribution of file references that point to a content footprint on the network via
various hashing models 2 [6]. This method has been put into use by distributors of large software
artefacts, for example Linux distributions, to ease the load on any central server previously used to
distribute content. As well as utilising third party networks to acquire content, it is also possible to
integrate peer-to-peer functionality and create a fat client for managing content acquisition. In such a
system, content source lists can be distributed amongst clients, taking some of the requirement away
from server polling and transfers. However, with this complexity come great issues of security and trust:
how a source can be determined as safe, and how the content’s integrity can be guaranteed, pose huge
problems, as without an appropriately secure hash function a unique reference is not always guaranteed
and file collisions cannot be avoided.
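The integrity checking alluded to here rests on cryptographic hashing: downloaded content is accepted only if its digest matches the reference distributed with the file pointer. A minimal sketch using the JDK’s MessageDigest follows (SHA-1, as used by BitTorrent; the acceptance policy itself is illustrative, not any particular network’s protocol):

```java
import java.security.MessageDigest;

// Sketch of content-integrity checking via a cryptographic hash,
// as filesharing networks do when matching content to a reference.
public class IntegrityCheck {

    // Hex-encoded SHA-1 digest of the given bytes.
    public static String sha1Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Accept downloaded content only when its digest matches the
    // reference hash published alongside the item.
    public static boolean matchesReference(byte[] downloaded, String referenceHex) throws Exception {
        return sha1Hex(downloaded).equalsIgnoreCase(referenceHex);
    }
}
```

The weakness discussed above follows directly: if the hash function admits collisions, two different files can share a reference, so the strength of the hash bounds the trust one can place in the content.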
4.2
Syndication Technologies
Syndication is the terminology associated with information distribution and management: the
distribution of news content and information to a list of recipients to promote new releases and products,
or to promote a company by syndicating promotional content. In a formal sense, syndication is most
often linked with business-to-business information sharing and collaboration, businesses having a long
history of sharing content via processes such as EDI. Speaking of syndication in regards
2 emule clients utilise a proprietary MD4 based hashing mechanism to summarise network content as a URL, whereas BT
uses a supplementary .torrent file that summarises the content and location in more detail utilising a SHA1 hash.
to business-to-customer (content publisher to consumer) information sharing, in particular reference to
computing, content publishers utilise a great number of avenues of distribution. Primarily these are
email, the web and news syndication avenues [5].
4.2.1
Push
The concept of push can most easily be defined in a business sense: pushing content such as news and
press releases out ‘beyond the firewall’, where the recipient has no control over what is received, only
whether it is received [10]. Traditionally the predominant forms of pushing content are email,
newsgroups and publication; however, since its introduction RSS has offered an increasingly popular
method for distributing content centrally [18].
4.2.2
News
The use of newsgroups for communication has been a part of network computing for decades, and still
boasts large coverage even in the face of rival methods. The basic premise is that a user accesses and
subscribes to newsgroups on public or private servers, and from there synchronises subscribed groups
with the server [35]. The server itself is synchronised with further servers in a network of newsgroup
servers. Depending on access rights and configuration, users can post new topics and replies, and retrieve
files from the server. The model inherently has many associated issues; for example, there is no
guarantee of retention of content, nor that a post to a particular newsgroup server will spread to other
news servers. As an avenue of distributing content, although the technology supports the attaching
of data to news posts, the means of distribution are far from ideal. Restrictions on post and data sizes
have ultimately meant that the service has largely only been utilised for driver and small file distribution,
and for illegal activities [31]. To this end the utilisation of news as a source for content is infeasible.
Lack of retention on the majority of newsgroup servers, and the low spread of the technology to home
users, mean that the technology is unsuitable for a generic distribution model, both in terms of
notification of content release and of distribution of the material itself. Predominantly, newsgroups are
an area for discussion and support rather than a means of publishing and announcing the release of
content.
Recent trends of internet usage have pointed to an overall decline in the usage of newsgroups,
largely attributed to the robustness of electronic communication in other forms, such as the WWW [8].
This is understandable given the age and rigidity of news as a discussion platform; the expressiveness
and freedom of information is much more evident elsewhere, and the ability to communicate beyond a
plain text environment is one such reason.
4.2.3
Email
Email is now one of the most common electronic forms of communication in use to date [35]. However,
with this huge utilisation come problems of control and of usefulness as a critical or formal channel of
communication. A huge problem with email in its current state is the vast amount of unsolicited mail
(spam) in circulation. The problem exists because the publisher has total control over the sending
and the recipients of a mail message, leading to abuse of these responsibilities.
As such, email is considered a lesser form of formal communication. Mail messages are unverified
and untrustworthy, and consumers lose all control over what they wish to receive and from whom: once
an email address is obtained, it can be sent any message a sender desires. Utilising email as a source of
program update notification is therefore a far from ideal method of syndicating release information.
Mail can be incorrectly identified as spam, spoofed by a malicious third party, or otherwise invalidated
with relative ease. Nevertheless, many companies actively release information via this avenue because
of its low cost and wide reach.
4.2.4
RSS
RSS (RDF Site Summary) is an XML technology introduced by Netscape in 1998 as a means to push
content to and from a central web portal, which parsed this content into usable, marked-up information
that was finally displayed on the site. RSS works on the notion of breaking down a site’s content
into base metadata, and then using XML to mark this metadata back up into a singular XML file that is
made publicly available over the World Wide Web. Subscribing parties may then use this feed and parse
out the required data back into meaningful information for their own purposes.
The advantage over server-controlled syndication such as email is that information is request oriented:
retrieved to an end client’s specification, not the publisher’s. This allows a subscribing party to retain
full control over the content, and upon publisher abuse the subscriber can remove the feed and no longer
receive the content. The favourability of RSS over email is expressed clearly in [18], where it is stated
as being largely down to the fact that, ‘to retrieve a feed you don’t need to provide your e-mail address.
That is why people prefer RSS distribution to e-mail newsletters.’
RSS is most notably present in news and blogging communities to syndicate headlines, and because
of its lightweight and simple syntax is being adopted rapidly [5].
4.3
RSS - RDF Site Summary
RSS feeds were originally designed by Netscape for the instant sharing of news between affiliate sites,
and to syndicate static news content to subscribed parties. In its original form the aim was to alleviate
bandwidth costs by sending simplified text summaries of a site’s content, without HTML mark-up tags,
and delivering this stripped-down summary to a requesting party’s RSS reader or web portal on the
client side.
The benefits of syndication via RSS are numerous: in comparison to distributing via a standard
webpage, the use of RSS provides a way of standardising the structure and syntax of the content [18].
Originally, the syntax of RSS was RDF tag based, a more complex standard of XML mark-up.
This was later simplified to a basic XML entity tag structure, and the format ultimately adopted the name
Really Simple Syndication to denote the standard change. From this point the standards fragmented
further as rival companies developed their own standards on the two branches of RSS. As well as
structure, and the ability to defeat spam due to the consumer-controlled connection, RSS provides direct
support for the attaching of non-text content through the extensible nature of the technology [30].
Despite what RSS offers, there have been numerous criticisms of the technology. The criticism largely
focuses on its XML base and its relation to bandwidth costs. Whilst XML results in a coherent and
easily readable format, this comes at the cost of increased file size; in terms of network transfers, XML
is far from optimal. [12] claims that RSS is not ideal for syndication, and that the bandwidth costs
associated with it are too high. However, despite these criticisms, other sources claim that issues of
bandwidth can be addressed via proper client-side and server-side implementations. For example,
implementation of a headers-only HEAD request will reduce the redundancy problems linked to the
user-initiated polling model [23].
4.3.1
Metadata Mark-up
RSS mark-up relies on relatively basic XML entity tags which encapsulate core metadata into item
entries, composed of a number of required and optional tags such as ‘title’, ‘link’ and ‘description’
within an ‘item’ tag. This is shown more precisely in XML Fragment 1, with a sample item and its
primary tags. Whilst title and link are required features of an item, description can be omitted to present
an item in its simplest form.
XML Fragment 1 RSS feed item
<item>
<title>Title</title>
<link>http://domain/valid/url</link>
<description>Sample Item</description>
</item>
As the RSS standards have matured more functionality has been attached and the standards offer
a host of useful data tagging options. However with increased amounts of mark-up comes greater
bandwidth costs associated with the feeds.
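To illustrate, the item metadata in XML Fragment 1 can be recovered with the JDK’s standard DOM parser alone. This is a minimal sketch for a single item; the project itself relies on the ROME and JDOM libraries for full multi-standard feed handling.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Minimal sketch: parse a single RSS item and read its core metadata tags.
public class ItemParser {

    // Returns the text content of the first element with the given tag name.
    public static String tagText(String itemXml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(itemXml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }
}
```

For the fragment above, extracting the ‘link’ tag yields `http://domain/valid/url`, the download reference the aggregator needs for a content-centric use of the feed.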
4.3.2
Standards
Like the acronym itself, there is no official standard defining an RSS feed’s syntax. Whilst the
encompassing document structure is XML, the specific definition language varies, and considering each
revision there are in fact nine standards of RSS in active use. This was the result of Netscape abandoning
the format in the early stages of development, only for other companies to take over responsibility for
continued development; the standards have been continually criticised for this reason [25]. Given the
incompatibilities between formats (RSS 2 being in fact incompatible with an earlier version of itself),
the lack of a clear singular standard has resulted in lax implementations of RSS feeds, and varying
methods of attaching content to feeds as well as of marking up the content itself. These lax
implementations in turn place greater requirements on RSS client developers to create a client robust
enough to support all the standards, and the variations that the standards allow. As [30] summarises:
As a result authors typically test their feeds in a small number of ... aggregators, and [developers] ... are forced to reverse-engineer market leaders’ behaviours with little guidance from the
specification.
The most prominent versions of RSS in use are the semi-official RSS 2.0 and the Atom standard [30].
4.3.3
Original Use and Project Extension
Given the original purpose of RSS feeds to syndicate basic text summaries, the developing standards
have provided a means to move beyond this purely textual nature into an avenue for distributing any
content, through support for attachments to a particular feed item. In 2004, this was first utilised to
attach audio clips to a feed for syndication, but despite this ability in the RSS standards only basic
applications have arisen to exploit file attachments. As demonstrated in the evaluation of existing clients
in Section 3.3, existing applications fail to take a content distribution centric stance and instead utilise
feeds as sources of plaintext news. The technology can serve this purpose, but this has yet to be proven
through the development of an application designed solely to do so, which is the aim of this project.
4.4
Content Management
The utilisation of a Database Management System (DBMS) for local data storage, such as Microsoft
SQL Server or Access, has a host of advantages in terms of performance enhancements, optimisations,
speed and native query languages, as well as integrity checks. As a managed solution, utilisation is
dependent on a local installation of a third party management system, but because of this the usage is
usually feature rich [9]. The database schema can be guaranteed, data types can be enforced and checked,
and the overall structure of the database, particularly if the database is relational, is managed. In addition,
transactional interaction is supported, allowing for mistakes to be rolled back.
The alternative is to utilise a flat file storage system, where all of the advantages outlined above are
lost, but the system is no longer dependent on third party management, requires less code, and allows
for a totally portable solution. Given that one of the minimum requirements of the project is operation
on both Windows and Linux based systems, flat storage seems the most appropriate option despite the
loss of functionality and efficiency. Some of the enforced features of a DBMS can be achieved in a flat
file system by utilising XML to enforce basic data types and structure.
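A flat XML item store of this kind can be sketched with the JDK’s DOM and serialisation APIs alone. This is illustrative only; the element names below are assumptions, not the project’s actual schema.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch of a flat-file XML store: items are serialised to a single
// portable XML document instead of relying on a third-party DBMS.
public class FlatStore {

    // Serialises (title, link) pairs as <items><item>...</item></items>.
    public static String toXml(String[][] items) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("items");
        doc.appendChild(root);
        for (String[] it : items) {
            Element item = doc.createElement("item");
            Element title = doc.createElement("title");
            title.setTextContent(it[0]);
            Element link = doc.createElement("link");
            link.setTextContent(it[1]);
            item.appendChild(title);
            item.appendChild(link);
            root.appendChild(item);
        }
        StringWriter out = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Writing the returned string to disk gives a store that any platform with an XML parser can read back, which is precisely what makes the flat-file option portable across Windows and Linux.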
4.5
Client Language
There are a large number of possible implementation languages for the client application. From these,
three main languages immediately stand out as the best suited candidates with high level functionality,
support for object-orientated implementations and their native networking abilities. From these, one
was chosen that best reflected the needs of the application.
4.5.1
C++
C++ is a widely utilised high level language that extends C with object-orientated traits. As a compiled
language, C++ provides quick execution times which interpreted languages cannot match. The language
also provides low level functionality if required, as well as mature libraries providing a wealth of
functionality. However, a drawback of being a compiled language is that compiled binaries lose
portability, requiring the source to be recompiled on different operating platforms, with possible code
ports for low level function calls that cannot be replicated in different environments. Similarly, the
portability of libraries also poses a problem. User interface design is most often supported by the
utilisation of libraries such as GTK+ (http://www.gtk.org/ ), which offer portability amongst the most
popular operating platforms and provide a means for developing high quality user interfaces.
4.5.2
Python
Python is a modern interpreted language, implemented in C, which provides a syntactically simple
method of developing applications with great support for networking. Whilst utilising simple syntax,
Python provides a wealth of high level functionality and is increasingly used for network applications
(such as the aforementioned BitTorrent client), as well as middleware and server scripts. The language
also provides support for basic user interface implementation, supported by external libraries such as a
Python port of GTK+. As a language, Python is becoming increasingly popular, most notably in
academic circles, and despite its simple syntax provides a means for realising object-orientated design
and could easily meet the functional requirements of this project.
4.5.3
Java
Java is a popular high level language and offers a strong choice for the development of this project.
Whilst Java requires compilation, it is in fact classified as an interpreted language: the Java Virtual
Machine (JVM) reads the compiled byte code and translates it to platform dependent instructions,
allowing for the execution of Java applications on any platform with a Virtual Machine available. Part
of Java’s strength as a language is the large number of native and third party libraries available, providing
a wealth of functionality and supporting rapid development. ROME (https://rome.dev.java.net/ ), one
particular third party library, provides a means of acquiring and parsing RSS feeds of different standards
with the aid of an XML parsing library, JDOM (http://www.jdom.org/ ). The utilisation of these two
libraries vastly reduces the amount of work required in programming the network communications
and the reading and parsing of the local database stated as an extended requirement of this project.
Natively, Java provides great support for GUI design through its Swing and AWT libraries, and with the
support of JDIC (https://jdic.dev.java.net/ ), the HCI non-functional requirements of desktop integration
can be achieved. There are drawbacks to Java, largely relating to the requirement of having the Java
Virtual Machine on the host computer and the performance impact of being an interpreted language.
However, these disadvantages aside, the language offers a strong implementation choice.
4.5.4
Conclusion
                      C++                                   Python                   Java
Type                  Compiled                              Interpreted              Interpreted
Portability           Requires re-compilation and porting   Requires interpreter     Portable byte code
Object-Orientated     Yes                                   Yes                      Yes
GUI                   External Support                      Some External Support    Native
Speed                 Fast                                  Fast / Medium            Slow on first run, Medium
Networking Support    Yes                                   Yes                      Yes
Supporting Libraries  Lots                                  Many                     Lots
Testing               CppUnit (port of JUnit)               –                        JUnit

Table 4.1: Comparison of Languages
All three languages are capable of realising a product that meets all of the functional requirements
of this project. However, the ease of achieving this differs greatly between them. C++ would provide the smallest and least resource-intensive solution but, in contrast to Java, it lacks the portability
expressed as a non-functional requirement, and would require more coding and implementation time than the library-supported functionality available to a Java solution. With strong
native GUI-building support, and the ability to design and create test cases using the JUnit libraries, Java
provides the most sensible means of achieving the project's implementation goals.
Chapter 5
Design and Implementation using the Unified Process
5.1
Development with the Unified Process
Developing the project with the Unified Process allowed for its rapid realisation. Given the size and
wealth of documentation produced by development with the Unified Process, detailed design records
can be found in the project's appendices. The following sections summarise development and the key
risk-mitigation methods employed.
5.1.1
Addressing Requirements Risks
The bulk of this project's requirements-gathering techniques are covered in Chapter 3. From there,
the requirements were formalised into a specification which defined the focus of development and the
goals and aims of the system. This Requirements Specification document is a formal presentation of the
requirements, problem domain and other aspects of development discussed in this report; due to
its length and its repetition of material found in Chapters 1 - 4, it has been omitted from this report but
is detailed in Appendix F. The specification itself is based on a Requirements Specification template
provided by the Atlantic Systems Guild (www.systemsguild.com). From this formal document, it was then
a matter of modelling each requirement as a Use Case, and then realising these Use Cases to create
architecturally significant diagrams. From the UML model of the Use Cases, it is easy to classify the
importance of particular cases and thus define the core 'Must Have' functionality.
An important reason for formalising the user requirements was to avoid function creep, where
more and more requirements are requested as the project develops. Whilst such requests are important to capture,
it is undesirable to have the focus of the project compromised by growing numbers of requirements in
later iterations. By presenting the core needs of the application in a Requirements Specification,
the functionality that should be delivered can be documented and finalised.
From here, realisation was achieved through a Use Case Description Form (UCDF)
(see Appendix I) which defined the data flows and steps required for each Use Case. From the UCDF,
(a) Must Haves
(b) Should Haves
(c) Could Haves
Figure 5.1: Use Cases: (a) Must Haves, (b) Should Haves and (c) Could Haves
an Activity Diagram, which mapped the data flows into a sequence of steps taken by the user, and
finally a Sequence Diagram, which mapped the system calls needed to carry out a task, were created
(see Appendix K). By first formalising the gathered requirements in a Requirements Specification, which could be shown to users to gain agreement, and then using Use Case development
to express this document clearly to the user base, the requirements-gathering risks
were properly mitigated.
5.1.2
Addressing Architectural Risks
Based on the domain presented in 3.1.1, the architecture of the system can be formalised with a Deployment Diagram, which addresses the architectural and technical requirements of the system. [1] clarifies
the need for its production: 'to model hardware that will be used to implement the system and the links
between different items of hardware'. In this case, the Deployment and Component stages have been
combined to show a picture of the dependencies between components of the system.
Figure 5.2: Deployment Diagram
The client application is dependent on the availability of the RSS feed sources as well as the Item data
sources, which are hosted either on the same server or on two separate servers. As shown above, there
are two distinct network communication transactions. Firstly the network is utilised to request the RSS
feed from the server which is then retrieved and undergoes local processing. The second transaction is
in receiving the item. In terms of minimum requirements, acquisition is required from either the BBC
fileservers or those of legal torrents, which are provided over HTTP. However, an extended feature
is the utilisation of external networks to acquire content. In later implementation stages, this was achieved
by directing downloads of particular file types to a particular location, which initiated retrieval in an external
application (in the case of torrent files), or by executing applications by MIME type (ed2k protocol
downloads, for example). HTTP and FTP downloads were handled natively by the application.
5.1.2.1
Redundancy
As seen in the Deployment Diagram, there are two network transactions that should ideally be kept to
a minimum to save resources. The requesting of a file download is not a great concern,
as a download is a single connection that is used until the download is complete, with no
repeat requests. The RSS request connection, on the other hand, poses some redundancy, given that
updates are not assured. As covered in the Background Reading chapter on the
HTTP standards, serving these feeds over HTTP allows for a conditional GET (using the
If-Modified-Since request header), which reduces the amount of redundancy. Although this means extra overhead in receiving the headers when the
feed has been updated, its benefits are realised when the feed has not been updated, as substantially less data will be
transferred. The adoption of the ROME RSS libraries offers an implementation of this feature and helps
reduce the central RSS server load by ensuring clients obey the HTTP standards.
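The conditional GET described above can be sketched with the JDK's own HttpURLConnection; ROME's fetcher performs equivalent bookkeeping internally. The class and method names here are illustrative, not taken from the project source.

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of a conditional GET for an RSS feed using only the JDK.
// If the feed has not changed since lastFetchMillis, the server replies
// 304 Not Modified and no feed body is transferred, saving bandwidth.
public class ConditionalFetch {
    public static boolean feedModified(URL feedUrl, long lastFetchMillis)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection) feedUrl.openConnection();
        conn.setRequestMethod("GET");
        // Sends an If-Modified-Since header with the given timestamp.
        conn.setIfModifiedSince(lastFetchMillis);
        int status = conn.getResponseCode();
        return status != HttpURLConnection.HTTP_NOT_MODIFIED; // 304
    }
}
```

A client would record the time of each successful fetch and pass it into the next call, only downloading and re-parsing the feed body when this method returns true.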
5.1.2.2
Three Tier Architecture
Figure 5.3: Presentation, Application, and Data layers (PAD)
Good practice in designing high-quality systems architecture is to split applications into three tiers
of operation: the Presentation, Application and Data layers (PAD). The Presentation
layer focuses on user input design and HCI issues, and is supported by several techniques within the
Unified Process, such as prototyping and agile design sessions. The Application layer focuses on
programming the functional requirements of the system and is the focus of Use Case realisation and
object-orientated design principles. The Data layer is the implementation of persistent data storage,
through the utilisation of flat files, database management systems and other database concepts. The aim
of this approach is to move functionality away from the immediate presentation layer into the application
layer, and the handling of data from the application layer into the data layer, which has numerous advantages in achieving a high-quality system. PAD ensures that the system is loosely coupled: interfaces on the presentation layer, for
example, can be modified in response to user feedback, but this redesign is independent of application layer development and should not alter the overall behaviour
of the application. In such a case, the Presentation layer provides a means for users to interact with the
system but not to manipulate data or the system directly. This also makes for a sound security policy of
mistrusting all forms of user input. Whilst method invocations from the Presentation layer should
undergo heavy validation, the Application to Data layer boundaries can be trusted to a certain degree because
of the inability of users to manipulate the data directly.
5.1.3
Addressing Design Risks
Designing the system using the Unified Process led to its rapid development.
The completeness of the requirements-gathering stages meant that Use Cases could be quickly created
and translated into a Design Class Diagram, through the realisation of the Use Cases with Sequence
Diagrams. This showed the objects and methods required to create the system. The final
Design-Level Object Diagram follows (earlier versions can be found in Appendix L).
5.1.3.1
Presentation - User Interface Design
The design of the user interface was modelled with continued feedback from users during agile design
sessions, using prototyping to quickly bring about viewable designs. Given that all
functionality was moved from the presentation to the application layer, the presentation layer could be freely
reviewed and redesigned. Agile design is a process of robust and open review sessions
in which developers and users discuss the functionality and the look and feel of the system directly, with tools
and guidelines for changing aspects of the design without wasting time on overly formal
procedures. This lightweight and highly interactive agile modelling yields high levels of productivity,
given the small time costs of the technique and the direct interaction with users. As a result,
the technique greatly helps in mitigating design risks by realising a favourable design with
great speed. User interface prototypes can be found in Appendix N along with other agile development
artefacts.
5.1.3.2
Application - Refactoring and Central Classes
When designing the core application, the system underwent a series of refactoring stages, with the aim
of simplifying its design and thus reducing its complexity. At the centre of the product's design was the
Figure 5.4: Design-Level Object Diagram
concept of a centralised Scheduler and Retrieval architecture, which scheduled and retrieved downloads
respectively. Originally the system's design was based around a single Retrieval class which handled all network communications, and a single Scheduler which handled all the polling and download scheduling.
As development progressed, this proved inefficient and led to concurrency problems and class-type collisions, in the form of poor, unintuitive casting. To combat this, the central architecture was refactored to introduce higher levels of inheritance and to decouple acquisition from its management. The
central Retrieval class became an abstract class, with RSS and Item specialisations that better handled
the requirements of managing the two types of download and their further processing. Similarly,
the acquisition itself was moved into a Download class which handled network communications and
file creation, and the same approach was taken with the Scheduler class. Refactoring is important in achieving simple, efficient design and is sought after when developing with the Unified Process
[28], both for high-quality design and for future expansion, given the Object-Orientated
preference of the methodology. Figure 5.5 represents the refactored classes.
5.1.3.3
Data - Logical and Conceptual Design via ERD
With regard to data storage there are numerous problems, such as duplication, redundancy and consistency, that can affect a system's performance. These problems, however, can be overcome by adopting sound database design techniques.
Figure 5.5: Design-Level Central Classes
The Relational database model is the de facto standard design
paradigm in use today. In this model, the principle is to store each fact once in a single location
and to interlink related tables of data via unique primary key record identifiers. Design
with the Relational Model is supported through an Entity Relationship Diagram (ERD),
which attempts to prevent the aforementioned data storage problems by meticulously remodelling the
entities of the system. In an ERD, entity objects are drawn and the multiplicity between entities is
used as the basis for determining how a database table needs modelling to represent all the required data
while storing each datum only once. This model differs from the basic structure of the entity classes
within the Design Class Diagram, as ERD modelling is concerned primarily with ensuring data
is stored once. This is largely due to the limitations of traditional database management systems: they
cannot store single or multiple records directly in a field of a table, and must instead index primary
and foreign keys, whereas object-orientated languages like Java can store one or more object
references directly in a field of another object. An example of this is seen in Figure 5.6, where the feeds
attribute of Filters is an ArrayList of RSS. In the Relational Model a separate table of record indexes is
required to hold this many-to-many relationship. In translating from the data layer into usable data
objects matching the Design Class Diagram, the two tables that make up the many-to-many relationship
are merged and an ArrayList of RSS is constructed and set as the feeds attribute of Filters.
The model contains several many-to-many relationships, where one record in one entity relates to
many records of another, and those records in turn relate to many records in the original table. Before
Figure 5.6: Conceptual Level ERD and Design Class Diagram equivalent
normalisation (the process of removing repetition from the model), records are stored several times, which
leads to consistency problems. The Conceptual Model (Figure 5.6) is translated into a logical model
which solves this problem by normalising the entities, splitting them into several tables by carrying out
vertical decomposition. The logical model can be found in Appendix M. Given that none of the entities
has a unique identifying field, an artificial key was created for each entity to uniquely
identify records. Figure 5.7 represents the final database schema of the data store, which is in Boyce-Codd Normal Form (BCNF).
5.1.4
Addressing Implementation Risks
The final distribution of the application is available for download from http://autofeed.jacobbriggs.com.
5.1.4.1
Presentation - User Interface Implementation
When implementing the UI, issues of HCI were taken into account. An important aspect of UI implementation is the logical layout of functionality. Human short-term memory is used heavily
to store the details of various applications and their functionality during computer operation. Most
sources, such as [7], claim that a user can only remember a limited number of steps, and that application
developers should ensure tasks require no more than 7 ± 2 steps to be remembered. To this end, GUIs
and menu depths were kept below this limit to combat any confusion. Similarly, relevant information
Figure 5.7: Database Schema in BCNF
was restricted to a single tab and tool tips were used to prevent users having to rely solely on memory
to complete tasks.
User interfaces for the most part integrated validation on all sources of input, to prevent erroneous data
from entering the application layer. Where a user commits an action, a visible response is
given so that the state is known and the action has visibly succeeded or failed. UI design authors
promote this implementation feature: users' expectations should be visibly confirmed [7] [32].
(a) Tray Icon
(b) Balloon Message
Figure 5.8: Example HCI: (a) Idle Tray Icon and (b) Critical tray icon and error message
Part of the non-functional requirements for the system was the ability to integrate the application
into an existing operating environment. This was achieved through a number of methods. A third-party
library, JDIC, was utilised to achieve a number of extended features, such as the ability to hide
menus and keep a tray icon present in the operating system tray. This allowed the application to continue to
function transparently without a continual on-screen window, yet still allow users to view
the application at any time. Critical feedback was given via 'balloon messages'. These features proved popular in
feedback interviews. Similarly, other aspects of the interface were considered with regard to standard
HCI techniques. Menu systems were created to ensure that all features within the application were
available within a maximum of seven stages, so that tasks could be performed simply and were
easily remembered. This accounts for the higher times for carrying out tasks in initial tests during the
testing of the application (evidenced in Section 6.1.3).
5.1.4.2
Data - Physical Implementation via XML Flat File System
The database was implemented as a flat file system, marked up in standard XML format and
parsed and converted by the program through two classes, SystemData and FileParser.
This structure allowed for dynamic input, where standard calls in SystemData populated HashMaps
containing lists of objects constructed by parsing input from an XML file. The reason
for choosing XML over a database management system was portability.
An early non-functional requirement expressed in early interviews was the ability to make an operating-system-independent and portable application. Whilst database management systems boast highly
optimised database management functionality, they must be installed on
the host operating system to be utilised, and relying on such an external system would compromise this
non-functional requirement. To this end, the decision was made to extend the XML parser
already used by the application for parsing RSS feeds to handle local input as well. This allowed
for a database of flat files that let the application work in any operating system
environment without depending on any third-party utility. Examples of the XML flat file database
can be found in Appendix M.
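A self-contained sketch of the flat-file approach follows. The project's SystemData/FileParser classes used the JDOM-based parser; here the JDK's built-in DOM parser stands in so the example needs no third-party library, and the `<feed url="..."/>` schema is hypothetical, not the project's actual file format.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch of reading a flat-file XML store into in-memory objects, in the
// spirit of SystemData/FileParser. Uses only JDK classes; the element and
// attribute names are illustrative.
public class FlatFileStore {
    public static List<String> loadFeedUrls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        List<String> urls = new ArrayList<>();
        NodeList feeds = doc.getElementsByTagName("feed");
        for (int i = 0; i < feeds.getLength(); i++) {
            urls.add(((Element) feeds.item(i)).getAttribute("url"));
        }
        return urls;
    }
}
```

Because the store is plain XML parsed at startup, the same file works unchanged on any operating system, which is the portability property the section describes.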
There were drawbacks to this system, not only in terms of the technology as previously covered, but
also in the differences in data representation between Java and standards-compliant XML, which were
witnessed in the execution of the JUnit testing (see Section 6.1.6).
5.1.5
Simplifying the Domain
Even though the use of RSS feed management simplifies the process of traversing feeds to receive release
notifications and obtain material, the issue was raised in feedback meetings that it is often
hard to browse through sites searching for appropriate RSS feeds. It is true that feeds are often hard
to locate within a site. The effort required to find repeated updates, which is tackled as a minimum
requirement, is the same as that for initially locating the feed within a site's structure (which can be seen in
Appendix P as the manual acquisition times); and after adding the RSS feed to the product, there
was still the requirement to add a filter to identify the material required from it. This adds a level of
manual interaction that can be largely eliminated to reduce the workload for the consumer.
To this end, a solution was devised to combat this annoyance: introducing
a custom file format that would summarise the location of the download (that being a list of source RSS feeds),
along with the content's filename aliases within the feeds as Filter objects, summarised as a Filters class (see the Design Class Diagram). In addition, the notion of advanced scheduling for the expected
time of updates was added as an optional extension to this file format, to give the content publisher a means to quickly make a file providing all the information required for a client to receive
updates with one click. The idea was to significantly reduce the workload required of the consumer by
adding a hosted feature that could be made once by the content publisher with very little effort and provided on the publisher's site, or shared directly amongst users. To facilitate the easy production
of this summarising file, RSS, Filter, Filters and Schedule objects could be bundled and exported from
the client application. The payback for the host came with the ability to have ideal retrieval settings
set up for users as defaults, and to optimise the bandwidth costs associated with user-polling network
models. For the user, the ability to transport settings between clients, or to quickly configure their client
from a trusted source with one click, proved hugely beneficial. With the added schedule data, a host
could easily include data on how frequent updates to the feed are likely to be, and so
intensify queries of the RSS feed during that period, but leave requests infrequent for the rest of the time, when
releases are unlikely.
This extended product feature allowed for an even higher level of automation than set out in the
project's initial aims: when adding a specific classification of content to the product, all that was required
to add and begin filtering an RSS feed for specific content was a single file associated with
the product at runtime. After its implementation, review groups indicated this was a positive,
time-saving feature, and as such it helped boost the overall quality and professionalism of the product.
5.1.6
Polling History and Learning Responsible Request Times
The ability to 'intelligently' learn an appropriate delay between subsequent RSS feed retrievals, and
moreover to learn when a feed is likely to update, would serve as a great optimisation in reducing both network demands and server requirements. There are two methods of
achieving this simple optimisation: putting an extra requirement on the server, or putting an
extra feature into the client, both of which have advantages and disadvantages. With regard to providing
extra detail with the RSS feed on the server side, given the flexibility of XML it is entirely feasible
as an implementation detail that a 'repeat request' tag be added to the feed's header, and indeed
something similar is supported in some RSS feed standards. RSS 2.0 supports the notion of skipHours and
skipDays tags, which numerically represent request periods that should be skipped by an RSS client,
along with an optional ttl element stating how many minutes a feed may be cached, although this is little used in practice
[36]. As an unfortunate consequence of the split standards, Atom provides no support for these tags, and
their use would mean that servers using Atom would have to implement non-standard elements,
which is impractical [22]. In practice, then, server-side optimised query request times are
not a dependable part of the deployed feed standards, and so the client
itself is required to calculate an optimum request time rather than expecting adoption of server-side hints. This can be achieved by utilising the time-of-publication
header fields that are present in all standards. In RSS variants of the standards, this is provided by the
header field lastBuildDate, which stores the date and time of the last update to the feed. Similarly, in
the Atom standard, this is provided by the updated field. From here, it is a matter of storing the actual
request value n_k, the calculated time between updates from the headers r_k in hours, and the time of the
last header update t_k, where k is the most recent request in a window of size x. This can be better expressed
in the following algorithm.
1. Initialise default values:
   n_0 = r_0 = 12 hrs
   t_0 = time of RSS build in initial request
2. If the RSS was updated at request n_(k-1):
   if (k > 1): t_0 = t_1
   t_1 = time of RSS build
   Replace the first element in the window if it is full:
   if (k > x):
     r_(0..k-1) = r_(1..k)
     n_(0..k-1) = n_(1..k)
   r_k = t_1 - t_0
   if (r_k > r_(k-1)): r_k = r_(k-1) - r_k
   if (r_k < r_(k-1)): r_k = r_k - r_(k-1)
   n_k = (r_0 + ... + r_k) / k
3. Repeat step 2.
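The averaging core of the algorithm can be sketched in Java as follows. This is a simplified illustration: it keeps a sliding window of the last x observed intervals between feed updates (derived from the lastBuildDate/updated headers) and uses their mean as the next request delay. The damping step is omitted for clarity, and the class and method names are illustrative, not the project's actual Scheduler code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of the polling-interval learner described above.
public class PollScheduler {
    private static final double DEFAULT_DELAY_HOURS = 12.0; // n_0 = r_0
    private final int windowSize;                           // x in the report
    private final Deque<Double> intervals = new ArrayDeque<>(); // r_k values
    private double lastBuildTime = Double.NaN;              // t_1, in hours

    public PollScheduler(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Call whenever a request finds the feed's build time has changed. */
    public void recordUpdate(double buildTimeHours) {
        if (!Double.isNaN(lastBuildTime)) {
            double rk = buildTimeHours - lastBuildTime;     // r_k = t_1 - t_0
            if (intervals.size() == windowSize) {
                intervals.removeFirst();                    // slide the window
            }
            intervals.addLast(rk);
        }
        lastBuildTime = buildTimeHours;
    }

    /** n_k: mean of the windowed intervals, or the 12 hr default. */
    public double nextDelayHours() {
        if (intervals.isEmpty()) {
            return DEFAULT_DELAY_HOURS;
        }
        double sum = 0;
        for (double r : intervals) sum += r;
        return sum / intervals.size();
    }
}
```

With a feed updating every eight hours, the returned delay converges from the twelve-hour default towards eight as the window fills, matching the behaviour evaluated in Section 6.1.3.1.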
By implementing this algorithm, it is possible to tackle some of the inefficiency of simply choosing
an arbitrary request interval. With this algorithm, the request times converge on an ideal
value over time, and it can correct, or rather dampen, some of the noise between request times (see 6.1 for
evaluation). There are limitations to its success: where the RSS feed is dynamically created its
use is impossible, as the value would eventually converge on a request time of zero. Its use
must therefore be carefully considered, and limited to feeds constructed as static files
on the server side rather than upon client request.
Chapter 6
Testing and Evaluation
6.1
Testing
The first stage in carrying out any evaluation is to perform a variety of tests to ensure that the requirements have been met. For this project, it was important to test that the client application performed as
it should, that human interaction was handled and the application maintained a stable state, and that the
system was accepted by the prospective users.
6.1.1
Input Validation
Part of testing a system intended for a large set of users is ensuring that
the system handles and stores user input appropriately. For this, a series of test cases were
created and performed to ensure every field and point of user interaction was correctly validated, so
that erroneous data could not enter the system. Standard boundary tests were
performed to ensure that erroneous, empty or excessively large inputs were correctly handled and prevented
from affecting the application's state. To simplify execution, these were automated with the help of
JUnit Test Cases (see 6.1.6). Where user input was rejected, users were prompted about the errors, as
recommended by the HCI studies covered previously.
Figure 6.1: Notification of invalid input
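The kind of boundary validation these test cases exercised can be sketched as follows. The length limit, method name and acceptance rules are illustrative, not taken from the project source; the idea is simply that empty, oversized or malformed input is rejected before it reaches the application layer.

```java
// Sketch of boundary validation for a feed-URL input field.
public class InputValidator {
    static final int MAX_LENGTH = 2048; // illustrative upper bound

    public static boolean isValidFeedUrl(String input) {
        if (input == null || input.trim().isEmpty()) return false; // empty
        if (input.length() > MAX_LENGTH) return false;             // too large
        try {
            new java.net.URL(input); // malformed URLs throw here
            return input.startsWith("http://") || input.startsWith("https://");
        } catch (java.net.MalformedURLException e) {
            return false;                                          // erroneous
        }
    }
}
```

Each JUnit boundary case then reduces to asserting the expected boolean for an empty, oversized, malformed, or well-formed input.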
6.1.2
Security Testing
The next series of tests ensured that the application responded appropriately to security attacks:
attacks aiming to gain access to areas of the system for which a user does not have authorisation, and
attacks on the application and surrounding operating system to affect the integrity of
operation. The product produced was a relatively open system, with no need for authorisation. However, the application does download content directly to a user's computer, and as such needed testing against
a variety of attacks to ensure it responded appropriately. Modification of local files and
settings after run time, for example, had no effect on the data held within the application; in fact,
changes made manually to the database are undone when the application next saves. In terms of network
code, the utilisation of existing mature Java libraries reduced some of the risks in network communication,
given the level of testing that these libraries undergo.
6.1.3
Functionality Testing
In this set of tests, the speed with which users were able to carry out tasks was measured and compared
to manual browsing methods as a baseline. These tests were carried out to ensure that the application
both functioned and did so more efficiently than manual methods. The times can be found
in Appendix Q and show the application's advantage over manual methods.
6.1.3.1
Algorithm Testing
For the simple polling algorithm summarised in 5.1.6, it was appropriate to carry out a single
test comparing the improvement over a baseline of a fixed request interval. For this test, a test
RSS feed was set up and entered into the application (http://www.jacobbriggs.com/forums/rss/rss.xml),
updated every eight hours for a series of twelve days. A default request time of twelve hours
was entered as the initial value for both the algorithm and a default baseline comparator
(without the algorithm). For the algorithm, a window of ten was used. Increasing the
window increases the chance of suppressing noise, for example the rare occasion where an update
occurs at a non-periodic time.
Update   Actual   Baseline   Baseline Accuracy   Algorithm   Algorithm Accuracy
1        0        0          n/a                 0           n/a
2        8        12         66%                 12          66%
3        16       24         66%                 20          80%
4        24       36         66%                 24          100%
5        32       48         66%                 32          100%
6        40       60         66%                 40          100%
7        48       72         66%                 48          100%

Table 6.1: Algorithm vs. Baseline (request times in hours)
This is an artificial example, where a fixed eight-hour period between updates was used. In reality
it is unlikely that updates would be released at an exact set period, but they would be released within a
similar range of times in most cases (for example, the BBC feeds were updated at the same time on
the same days of the week, within a few hours of deviation). The algorithm successfully dampens noise
and reduces the overheads with time. As can be seen above, the baseline comparator
missed Update 6 altogether. Whilst the algorithm does not ensure an update will never be missed in this
way, it greatly reduces the chance of this happening.
The algorithm is simplistic and could be improved with more time and features. For example, in its
current state, if the next update came at nine hours, it would not be downloaded until another 8 hrs
after the request. However, this increased update time would be added to memory, and subsequent
request times would increase, converging on the optimum nine hours.
6.1.4
Deployment Testing
This series of tests aimed to ensure that the application could be deployed in a range of
working environments, performed appropriately, and maintained a
consistent state of operation between them. For this test, the application was distributed and installed
onto several different operating systems, namely Linux (Fedora), Windows XP (SP2), and Windows 98
(SE).
There were slight discrepancies in the interface of the application; for example, the tray
icon on Linux systems using KDE looked different from that on Windows systems, where primary development had been done, but for the most part these were simple matters of different tray icon sizes and
positions of tool tips within the user interface. Operational differences were negligible. Whilst the file
systems of Windows and Linux posed a problem, the differences were tackled early in development, and
the program operated correctly, saving files to the appropriate locations in each environment.
6.1.5
Performance Testing
Later iterations yielded more successful results in the performance category. Optimisations were made
to the threaded downloading code, along with further modifications to the network code, such as forcing a
disconnect on opened connections rather than waiting for unused connections to be dropped by the server,
which freed up sockets on the client application. These improvements were evidenced by drops in CPU
usage and open network sockets respectively.
6.1.6
JUnit Tests and Results
As stated earlier in the Project Management chapter (Section 2.3.2.2), a test-driven approach to development was taken, using JUnit testing to evaluate the success of the developing system. This provided a standard set of tests to ensure that the system remained functional despite ongoing changes and refactoring. Early JUnit tests highlighted problems in entity integrity and file input/output. For example, after the first iteration of programming it was found in the range test that characters such as '&' caused input to become unreadable: a bare '&' is not valid in XML content, so when the XML parser encountered it the records failed to be read in. The JUnit tests made it easier to pinpoint such points of failure.
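A minimal sketch of the kind of escaping fix this implies is shown below, assuming the correction is applied to raw text before parsing; the class and method names are illustrative, not the project's actual code.

```java
// Escape bare '&' characters so the text becomes valid XML content,
// without double-escaping entities that are already correct.
public class XmlEscape {
    public static String escapeAmpersands(String s) {
        // Replace '&' only when it does not already begin a character entity.
        return s.replaceAll("&(?!amp;|lt;|gt;|quot;|apos;|#\\d+;)", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("Moyles & Friends"));
    }
}
```

An item title such as "Moyles & Friends" becomes "Moyles &amp; Friends", while text that is already escaped is left untouched.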
6.2
Product Evaluation
Feature                                                  Requirement   Product            Steam   Bitscast
Ability to add/remove sources of input                   Minimum       Yes                No      Yes
Ability to add/remove filtering of specific content      Minimum       Yes                Yes     No
Ability to automatically acquire content                 Minimum       Yes                Yes     Yes
Ability to automatically acquire filtered content        Extended      Yes                Yes     No
Ability to automatically acquire content to specific
  locations                                              Extended      Yes                No      No
Download material from many locations                    Extended      Yes                No      Yes
  - From p2p                                             Extended      Yes (Externally)   No      Yes (Internally)
Download manually                                        Extended      Yes                Yes     Yes
Local History                                            Extended      Yes                No      Some
Search Local History                                     Extended      Yes                No      No
Platform Independence                                    Extended      Yes                No      No
Transparent Operation                                    Extended      Yes                Yes     Yes

Table 6.2: Feature comparison with existing products
All of the minimum requirements stated for this project in Chapters 1 - 3 were exceeded.
6.2.1
User Evaluation
Throughout the project development, users were asked for feedback on the product and to evaluate both its successes and its points of failure. As such, the interviewed users had an active stake in the development, giving feedback on the areas they liked and the areas where they felt more work was needed. The final evaluation was favourable, with users expressing a preference for the product over alternatives. All users gave positive feedback on the performance and the available features of the prototype. Criticism mainly focused on the user interface, where more work was required to better capture user input, and on the extended functionality of the program.
6.2.2
Conclusion
Ultimately the value of the project is justified by the quality of the product made: the features available within the developed solution, how it compares to existing products, the quality of the implementation and its performance, and its acceptance by users. On these measures the product can be considered successful, as shown by the results of the testing and the acceptance by the interviewed users. As the testing demonstrates, the project reached all of the required levels of functionality, and to this extent it can be considered a success.
The approach taken had a large effect on the quality of the software produced. The adoption of the Unified Process allowed for rapid development that involved users directly, and to this end it was a great help in achieving the project's aim. The wealth of documents produced and the complexities of development did, however, lead to project slippage. As can be seen in Appendix B, several alterations were made to the original plan to accommodate problems that arose during the development of the system. The methodology led to a large number of accurate requirements being captured, but it can equally be identified as part of the reason the project began to suffer from function creep, as continued user evaluation yielded a preference for ever more functionality. Whilst reaching a sound level of requirements analysis is a necessity during development, it can be a mixed blessing, as users often drift from giving feedback on core functionality into issues of lesser importance from a development standpoint. Despite these issues, the Unified Process offered a great advantage, and whilst complexities in implementation and research pushed back the delivery of the project, the methodology is largely responsible for its successful outcome.
In terms of the area covered by this project, the subject matter is both current and under ongoing development. As expressed in [13], for example, content distribution channels are providing a disruptive technology for the gaming and software industries, and this trend is likely to continue.
6.3
Project Extensions
There are many aspects of the application that would benefit from further study and development, most notably improving efficiency, improving compliance, and introducing tighter security. Whilst the system successfully exceeded the base requirements, there are many additional features that would also warrant development. During testing it was found that using more sites exposed a wider range of server-side implementations using different technologies and techniques of mark-up. This extended testing increased implementation complexity, and therefore compatibility with more sources of input seems a logical extension to this project. It would require steps to increase robustness, and significant modifications to the Download class, which handles most of the network code. This issue was covered in the background reading, which identified the problem of lax standards. In addition to handling more sources of input, it would also be logical to improve the efficiency of the downloading mechanism. This was shown in testing, where rewriting segments of code, particularly the buffering technique used within Java, made the application perform better.
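The buffering improvement mentioned above can be sketched as follows; the class name and the 8 KB buffer size are illustrative choices, not the project's actual figures. Reading in fixed-size chunks rather than a byte at a time greatly reduces per-call overhead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Buffered stream copy: pulls data from the source in 8 KB chunks and
// writes each chunk to the sink, returning the total bytes transferred.
public class BufferedCopy {
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        java.io.ByteArrayInputStream in =
                new java.io.ByteArrayInputStream("sample item data".getBytes());
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        System.out.println("copied " + copy(in, out) + " bytes");
    }
}
```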
One element that has not been considered throughout this project is the idea of premium content delivery, that is, content for which the consumer is a new or existing paying recipient. This option is available in an existing application tackling a similar domain, Steam, which was reviewed in 3.3. In that application, an external web-based checkout system utilising Secure Socket Layer (SSL) handles the purchasing of software. Whilst this is encapsulated in the update platform, it is provided via a standard HTML portal. Encrypted content is then received (data which is also available to non-paying consumers) and upon completion can be decrypted by the distribution platform, provided the content access rights are associated with the user's account. This is achievable due to Steam's proprietary closed-source network model. On an open network that utilises existing RSS feeds, a similar system is difficult to achieve. However, it is possible for the content delivery system to act merely as the transport mechanism for such content, with the content decrypted externally after acquisition. In this sense the content type can be regarded as incidental: purchasing and encryption are handled outside the application on the publisher's site, and the user is given a means of decrypting or authorising themselves for their content later. It is worth noting that this does not give a content publisher assurances against counterfeiting or duplication of their content; it does, on the other hand, provide some assurance that interactions with the update system are largely authorised, through the ability to authorise and track login details. With the current product in development, the system is completely open to all users and there is no form of authorisation, owing to the nature of RSS provided over the web. To implement a successful premium content distribution system, a basic level of security and authorisation is required to mitigate the risk of fraud. As it stands this is absent, and would require both client and server compliance, which is beyond the scope of this project given its focus on client content tracking. As a project extension, however, this could prove of interest.
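The transport-only premium model described above can be illustrated with a small sketch: content travels encrypted through the open RSS pipeline and is decrypted only after acquisition by a consumer holding the key. The use of AES here, and all names, are assumptions for illustration; the project prescribes no particular scheme.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// The feed client never needs to understand the payload: it merely moves
// opaque encrypted bytes, which the paying user decrypts afterwards.
public class PremiumContent {
    public static byte[] apply(int mode, SecretKey key, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(mode, key);
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        // The publisher encrypts; the ciphertext is what the feed delivers.
        byte[] delivered = apply(Cipher.ENCRYPT_MODE, key, "episode data".getBytes());
        // After acquisition, only a consumer holding the key can recover it.
        byte[] recovered = apply(Cipher.DECRYPT_MODE, key, delivered);
        System.out.println(new String(recovered));
    }
}
```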
6.3.1
Features
In terms of improved functionality there are many extensions that could be made:
• Scheduling of exact times to query the server. The framework was designed for this, but it was not implemented due to time constraints. This feature would allow content publishers to publish a schedule of when content is likely to appear on their site, which would configure the client to intensify requests for updates during that time.
• Mirror searching and multiple-source downloading, to optimise the download and remove some of the bottlenecks in the current problem domain.
• Peer-to-peer sharing of item downloads from RSS feeds, optimising further so that no server is needed to download an item. This would overcome problems such as a congested server with no mirror data.
• Automatic execution or installation of downloaded material.
• Advanced version tracking, so that if an item is posted twice in a feed, only the latest version is acquired rather than the latest item posted to the feed.
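The version-tracking extension in the last bullet could take roughly the following shape; the class and method names are hypothetical, and a simple integer version stands in for whatever versioning a real feed would carry.

```java
import java.util.HashMap;
import java.util.Map;

// Keep the highest version seen for each item name, so a repost of an
// older version never triggers a redundant download.
public class VersionTracker {
    private final Map<String, Integer> latestSeen = new HashMap<>();

    // Returns true only if this (name, version) is strictly newer than
    // anything seen so far and should therefore be acquired.
    public boolean shouldAcquire(String name, int version) {
        Integer seen = latestSeen.get(name);
        if (seen != null && seen >= version) {
            return false; // already have this version or a newer one
        }
        latestSeen.put(name, version);
        return true;
    }

    public static void main(String[] args) {
        VersionTracker tracker = new VersionTracker();
        System.out.println(tracker.shouldAcquire("episode-12", 2)); // new item
        System.out.println(tracker.shouldAcquire("episode-12", 1)); // older repost
        System.out.println(tracker.shouldAcquire("episode-12", 3)); // newer version
    }
}
```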
Bibliography
[1] Bennett, S., McRobb, S., and Farmer, R. Schaum's Outline of UML, chapter 3, page 25. McGraw Hill, 2001.
[2] Bennett, S., McRobb, S., and Farmer, R. Object-Oriented Systems Analysis and Design Using UML, chapter 3, pages 100 – 200. McGraw Hill, 2002.
[3] Berners-Lee, T., Fielding, R., and Frystyk, H. RFC 1945: Hypertext Transfer Protocol — HTTP/1.0, May 1996. Status: INFORMATIONAL.
[4] Bocij, P., Chaffey, D., Greasley, A., and Hickie, S. Business Information Systems: Technology,
Development and Management for the e-business, chapter 8, pages 293 – 415. Prentice Hall,
2003.
[5] Byrne, Tony. Content syndication: Ready for the masses? EContent, 26(6):30 – 35, June 2003.
[6] Cohen, Bram. Incentives Build Robustness in BitTorrent. Technical report on BT technology, http://www.bittorrent.com/bittorrentecon.pdf [Accessed: 30/11/2005],
22 May 2003.
[7] Dix, A., Finlay, J., and Abowd, G. Human-Computer Interaction. Prentice-Hall, 1998.
[8] Drèze, Xavier and Zufryden, Fred. The Measurement of Online Visibility and its Impact on Internet Traffic. Available online, http://www.cin.ufpe.br/˜fabio/Negocios%20Virtuais/Leituras/measurement%20of%20online%20visibility.pdf [Accessed: 30/11/2005], October 2001.
[9] Elmasri, R., Navathe, S. B., and Farmer, R. Fundamentals of Database Systems, chapter 1, page 23.
Addison-Wesley, 2003.
[10] ETV. The ETV Cookbook: Glossary of Terms. Glossary of Electronic Transmission Terms, http://etvcookbook.org/glossary/ [Accessed: 30/11/2005], 2005.
[11] Hart, Peter E., Piersol, Kurt, and Hull, Jonathon J. Refocusing Multimedia Research on Short
Clips. IEEE Multimedia, 12(3):8 – 13, July 2005.
[12] Hicks, Matthew. RSS Comes with Bandwidth Price Tag, 21 September 2004. News article on RSS bandwidth problems, http://www.eweek.com/article2/0,1895,1648625,00.asp [Accessed: 30/11/2005].
[13] IGN. The Future of Downloadable Content Podcast, 21 April 2006. Roundtable about how distribution is affecting the software industry, http://uk.games.ign.com/articles/702/702731p1.html [Accessed: 21/04/2006].
[14] Institute of Electrical and Electronics Engineers, Inc. Prototypes as Assets, not Toys: Why and How to Extract Knowledge from Prototypes. Proceedings of ISCE-18, 1996.
[15] Institute of Electrical and Electronics Engineers, Inc. Time Series Models for Internet Data Traffic.
Proceedings of the 24th Conference on Local Computer Networks, 1999.
[16] Institute of Electrical and Electronics Engineers, Inc. Enabling Context-Aware Agents to Understand Semantic Resources on The WWW and The Semantic Web. Proceedings of the International
Conference on Web Intelligence, 20 September 2004.
[17] Jacobson, Ivar. Applying UML in the Unified Process. Lecture on the role of UML within the Unified Process, http://www.jeckle.de/files/uniproc.pdf [Accessed: 30/10/2005].
[18] Joyce, John. RSS and Syndication. Scientific Computing and Instrumentation, 21(6):12, May
2004.
[19] Kruchten, Philippe. Going Over the Waterfall with the RUP, 26 April 2004. Introductory advice
for RUP and comparison to the stages in the Waterfall model, http://www-128.ibm.com/
developerworks/rational/library/4626.html [Accessed: 30/10/2005].
[20] Larman, C. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design
and the Unified Process, chapter 4, page 35. Prentice Hall, 2001.
[21] Mannion, M. and Keepence, B. Smart requirements. ACM Software Engineering Notes, pages 42
– 47, 1995.
[22] Marumoto, Toru. RSS and Atom. RSS 2.0 vs Atom comparison, http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared [Accessed: 29/12/2005].
[23] Miller, Charles. HTTP Conditional Get for RSS Hackers, 21 October 2002. RSS and HTTP GET utilisation, http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers [Accessed: 30/11/2005].
[24] Miller, Ron. Can RSS Relieve Information Overload? EContent, 27(3):20 – 24, May 2004.
[25] Pilgrim, Mark. The myth of RSS compatability, 4 February 2004. Critical review of RSS revisions, http://diveintomark.org/archives/2004/02/04/incompatible-rss [Accessed: 30/11/2005].
[26] Pival, Paul R. Using RSS Enclosures for document delivery? Web blog on uses of RSS in e-learning, http://distlib.blogs.com/distlib/2005/03/using_rss_enclo.html [Accessed: 30/11/2005], 01 March 2005.
[27] Pressman, Roger. Software Engineering: A Practitioner's Approach, Fourth Edition, chapter 3, pages 100 – 200. McGraw-Hill, 1997.
[28] Rational. Using the Rational Unified Process for Small Projects: Expanding Upon eXtreme Programming, 2001. Available online at http://www.rational.com/worldwide/ [Accessed: 17/11/2005].
[29] Richardson, Will. Using RSS Enclosures in Schools. Web blog on RSS use in schools, http://www.weblogg-ed.com/discuss/msgReader$3196?y=2005&m=3&d=1 [Accessed: 30/11/2005], 01 March 2005.
[30] Sayre, Robert. Atom: The Standard in Syndication. IEEE Internet Computing, 9(2):71 – 78, 2005.
[31] Schneider, J. Hiding in plain sight: An exploration of the illegal(?) activities of a drugs newsgroup.
The Howard Journal of Criminal Justice, page 374, 2003.
[32] Shneiderman, B. and Plaisant, C. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley, 2005.
[33] Sebesta, Robert W. Programming the World Wide Web, chapter 1. Pearson Education Inc., New Jersey, 2005.
[34] Sommerville, Ian. Software Engineering: 7th Edition, chapter 3, pages 100 – 200. Pearson Education Inc., New Jersey, 2004.
[35] Tanenbaum, Andrew. Computer Networks, chapter 1, pages 3 – 79. Pearson Education Inc., New Jersey, 2003.
[36] UserLand Software. RSS 2.0 Specification. RSS 2.0 Standard review, http://www.feedvalidator.org/docs/rss2.html [Accessed: 28/12/2005].
[37] Wilson, Tim. Utilizing RSS enclosures. Web blog on further uses of RSS, http://www.eschoolnews.com/eti/2005/02/000705.php [Accessed: 30/11/2005], 26 February 2005.
[38] WM-data AB. Delta Method Handbook, 2001. Available online at http://www.deltamethod.net/9PrototypeDesign_index.htm [Accessed: 17/11/2005].
Appendix A - Reflection
I am very happy with the outcome of the project, both in terms of the developed software and the
experience gained. I found working with the Unified Process very helpful, and I am confident that its
use helped deliver the project on time and at a high quality.
Project Management Evaluation
However frequently this point is made in project development, the issue of time management is paramount. Delays will happen; they are an inevitable part of all projects. With this in mind I cannot stress enough how important it is to manage your time appropriately. Start work well in advance of any deadlines, hard or soft, as the work will have to be carried out either way and, with preparation, it will be of a far higher quality. Continuing this theme is the choice of methodology.
Unified Process
Applying the Unified Process proved highly beneficial, and I would favour this approach for future projects. The framework in place allows reams of high-quality design to be developed very quickly, and for a development project this is of course the ultimate goal: to develop the project in the fastest possible time without compromising quality. I was pleased with how well the interaction with users went and with how the expansion of the system is directly supported by the methodology. In smaller projects I have previously followed a sequential, waterfall-type structure, and this has often meant there was less room to manoeuvre when producing the implementation. With the Unified Process you can remodel your work as desired.
Use of Testing
In the first iteration of the system development process, the emphasis defined in the Unified Process is to solidify requirements and the basic system structure, ultimately creating a design-level class diagram and beginning implementation. To ensure that the prototype client was built to this design specification, a series of JUnit test cases was devised at the outset. This allowed easy tests to be carried out during development by simply re-running the JUnit tests after each modification. The policy was first to ensure the integrity of the entity classes (i.e. RSS, Item and other base classes), then to move on to the control and use case functionality of the classes that utilise them. Lastly, brief UI tests were devised. Throughout the development process, the success of the implementation was quantitatively assessed by carrying out these JUnit test cases and monitoring that the program was functioning correctly. Of course, this was supplementary evaluation only, allowing the implementation of the design to be rated, since the ability to pass the test cases did not show that either the test cases or the system design matched the base requirements. However, the test-driven approach helped promote a sound and accurate implementation of the system that matched the developed design, and ensured that the level of quality and functionality was tested periodically. Testing is often overlooked for many reasons, but for this project it provided great value: whilst implementing the test cases all at once seemed time consuming, time was saved once the program reached a more mature implementation, as all classes were thoroughly checked.
In terms of evaluation, this allowed a quantitative value to be assigned to the success of the program as an implementation of the system design, which, when combined with qualitative evaluation, allowed for a justifiable summary of the project's success and quality.
The aspect of testing is something I will take away from this project and strive to apply to future development projects. Whilst it may seem off-putting having to write numerous test cases at once, in the end the value of doing so is greater than the effort required.
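The entity-integrity checks described above were written with JUnit; the dependency-free sketch below shows their general shape. The Item class here is a stand-in for illustration, not the project's actual entity class.

```java
// Shape of an entity-integrity check: an entity is intact if every field
// survives construction unchanged and can be read back.
public class EntityIntegritySketch {
    static class Item {
        private final String title;
        private final String url;

        Item(String title, String url) {
            this.title = title;
            this.url = url;
        }

        String getTitle() { return title; }
        String getUrl() { return url; }
    }

    public static boolean roundTripOk(String title, String url) {
        Item item = new Item(title, url);
        return item.getTitle().equals(title) && item.getUrl().equals(url);
    }

    public static void main(String[] args) {
        System.out.println(roundTripOk("Best of Moyles", "http://example.org/feed"));
    }
}
```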
CVS and Version Control
To ensure that the progress of the project was efficiently and accurately recorded during the coding stages, CVS was used to hold a record of project development. This allowed a record of what had previously been changed and also allowed for portability of code between systems. Whilst this may seem a simple matter of project management, it proved invaluable when dividing work between workstations and when rolling back any errors made while coding. For future students undertaking a software development project, I would highly recommend the adoption of version control (detailed in SE20).
Appendix B - Plans
Given the number of diagrams produced, the following sections present examples of the work. For a complete listing, see the Rational Rose Model at http://autofeed.jacobbriggs.com, which is a comprehensive model of the design.
Iteration      Document                          Section
Inception      Define Scope                      3.6.2
               Existing Architecture             3.1.1
               Project and Acceptance Plan       2.4
               Initial Use-Case Diagrams         3.6.3
               Business Object Diagram           3.1.1
Elaboration    Requirements Specification        Appendix F
               Validate Architecture             5.1.2
               Refine Vision Statement           1.2
               SAD
               - Use Case Diagram                Appendix G
               - Use Case Realisation            Appendix H
               -- Use Case Description Form      Appendix I
               -- Activity Diagram               Appendix J
               -- Sequence Diagram               Appendix K
               - Design Level Diagram            Appendix L
Construction   Finalise Design Class Diagram     Appendix L
               Testing                           6.1
               Final Evaluation                  6.2

Table 6.3: Table of Documents
Figure 6.2: Project Plans: (a) The Original Plan and (b) Plan with delays
Appendix C - Existing Technologies
Figure 6.3: A table of comparative features present in existing technologies
Appendix D - Questionnaire Results
Figure 6.4: Most significant questionnaire results
Appendix E - Interview Summaries
Tom
• Thomas expressed a need to be able to use regular expressions to decide what to filter out of an
RSS feed.
• He wanted to be able to override settings so that paths could be set by file type, filter parameters,
etc.
• He also wanted a database of all the material he had acquired, sorted alphabetically or by date.
• He said that he strongly disliked the Windows update system, how it forced updates on the user and then constantly prompted for restarts.
• He also strongly disliked Steam, but this was down to the EULA and Valve's control over the network (it was not open enough).
Dale
• Dale expressed a liking for the way that Steam and Windows Update sit in the tray and communicate messages via tooltips
• Dale wanted a local history stored by what he had filtered
• Dale also wanted to be able to search this history
• He wanted to be able to filter content but was not confident with regular expressions and would like some help
• He wanted the UI to be clean and free from clutter.
Sam
• Sam once again expressed a liking for desktop integration and a clean UI
• He wanted to be able to use the application for podcasting and saving MP3s to his iPod device.
• He liked the idea that a local database could be stored and he would be able to search it, stating
he would like to store items by name.
Appendix F - The Requirements Specification
Due to its length and its repetition of most of Chapters 1 - 3, the actual document has been omitted; however, the following is a list of the areas covered in the specification that was shown to users in interviews.
Figure 6.5: Contents of the Requirements Specification
Appendix G - Use Cases
Figure 6.6: The System Use Cases
Appendix H - Use Case Realisation
Figure 6.7: Use Case driven analysis - A sampled Use Case from this project and its realisation into a
design level diagram
Appendix I - UCDF
The following are two examples of UCDF used during this project.
Appendix J - Activity Diagrams
Given the number of diagrams produced, the following are two samples to demonstrate their purpose.
For a complete listing, see the Rational Rose Model at http://autofeed.jacobbriggs.com.
(a) Subscribe to Feed
(b) Add Filter
Appendix K - Sequence Diagrams
Given the number of diagrams produced, the following are two samples to demonstrate their purpose.
For a complete listing, see the Rational Rose Model at http://autofeed.jacobbriggs.com.
Appendix L - Design Class Diagrams
Figure 6.8: Control Classes
Figure 6.9: Entity Classes
Figure 6.10: Network Classes
Figure 6.11: FileIO Classes
Appendix M - Database Files and ERD
XML Fragment 2 rss.xml
<?xml version="1.0"?>
<rssfile>
<stats>
<program>AutoFeed</program>
<version>0.0.0.1</version>
<count>7</count>
</stats>
<rss>
<id>2</id>
<name>BBC Radio - Best of Chris Moyles</name>
<url>http://downloads.bbc.co.uk/rmhttp/downloadtrial/radio1/bestofmoyles
<reqfrequency>1</reqfrequency>
<lastrequest>April 30, 2006 10:14:20 AM BST</lastrequest>
</rss>
</rssfile>
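Records like the rss.xml fragment above could be read back with the JDK's standard DOM parser, as in the sketch below. The class and method names are illustrative, and the embedded fragment is abbreviated rather than the full file.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Parse an rss.xml-style document and extract the first <name> element.
public class RssFileReader {
    public static String readFirstName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><rssfile><rss><id>2</id>"
                + "<name>BBC Radio - Best of Chris Moyles</name>"
                + "<reqfrequency>1</reqfrequency></rss></rssfile>";
        System.out.println(readFirstName(xml));
    }
}
```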
XML Fragment 3 filtersfilter.xml
<?xml version="1.0"?>
<filtersfilterfile>
<stats>
<program>AutoFeed</program>
<version>0.0.0.1</version>
</stats>
<filtersfilter>
<filtersid>2</filtersid>
<filterid>3</filterid>
</filtersfilter>
</filtersfilterfile>
Figure 6.12: ERD
Appendix N - Agile GUI
Figure 6.13: AddRSS UI
Figure 6.14: ViewDatabase UI
Figure 6.15: Download UI
Appendix O - Actual GUI
[Screenshots of the implemented user interface]
Appendix P - Testing Results
Task                               Tom       Dale      Sam
Add RSS                            15.43s    28.32s    18.87s
Add Filter                         50.32s    80.80s    61.01s
Add Filters                        20.11s    29.04s    21.82s
Search for 'Moyles'                05.09s    10.09s    05.23s
Manually download 'Moyles' item    08.00s    12.05s    09.47s
Program Acquisition                n/a       n/a       n/a
Comparator Manual Acquisition      100.34s   140.23s   110.98s

Table 6.4: Task times - Tasks are performed one time only and are then persistent