Improving a Collaborative Travel Planning Application

Jan Hrnčíř
Master of Science
School of Informatics
University of Edinburgh
2011
Abstract
In this thesis, we present a novel multi-agent travel planning algorithm which finds
shared journeys for multiple travellers. The algorithm uses existing planners and
works in three phases: finding initial single-agent plans, their optimisation by the
best-response approach and matching the plans to the timetable.
The algorithm has been evaluated using real-world public transportation data
of the United Kingdom in five different scenarios of increasing complexity. The
results show linear scalability both with the scenario size and the number of agents,
confirming empirically that the proposed algorithm has overcome the scalability
issues experienced in the previous work.
The algorithm can be used in practice as a part of a travel planning system for
real passengers. However, there is a trade-off between the amount of improvement
in cost, the percentage of found timetables and the prolongation of journeys.
Acknowledgements
First, I would like to thank my supervisor Michael Rovatsos for his guidance
and support. I also wish to thank Matt Crosby, Radomír Černoch and Dominik
Głodzik for their feedback. Last but not least, I would like to thank Gerhard Wickler
for his consultation on the PDDL specifications.
Declaration
I declare that this thesis was composed by myself, that the work contained herein
is my own except where explicitly stated otherwise in the text, and that this work
has not been submitted for any other degree or professional qualification except as
specified.
(Jan Hrnčíř)
Table of contents
Abstract
Acknowledgements
Declaration
Table of contents
1 Introduction
  1.1 Aim and hypothesis
  1.2 Structure of the thesis
2 Background
  2.1 Multi-agent systems
  2.2 Multi-agent planning
    2.2.1 A best-response approach
  2.3 Planners used
  2.4 Travel planning
  2.5 Related work
    2.5.1 Data-related problems
    2.5.2 Planning-related problems
  2.6 Summary
3 Design of the algorithm
  3.1 Description of the travel domain
  3.2 Design decisions
  3.3 Domain definitions
  3.4 Description of the algorithm
    3.4.1 The initial and the BR phase
    3.4.2 The timetabling phase
    3.4.3 Cost functions
  3.5 Summary
4 Data
  4.1 Source data
  4.2 Data transformation
    4.2.1 Database system
    4.2.2 PostGIS extension
    4.2.3 NaPTAN data transformation
    4.2.4 NPTDR data transformation
  4.3 Data processing
    4.3.1 Mistakes and stops merging
    4.3.2 Walking connections
    4.3.3 Regions and the timetable
    4.3.4 Relaxed domain
  4.4 Summary
5 Implementation
  5.1 The algorithm
  5.2 PDDL specifications
    5.2.1 The relaxed domain
    5.2.2 The full domain
  5.3 Visualisation
  5.4 Summary
6 Testing and evaluation
  6.1 Scenarios
    6.1.1 Experiment generation
  6.2 Evaluation
    6.2.1 Scalability
    6.2.2 Plan quality
    6.2.3 Improvement in cost
    6.2.4 Prolongation of journeys
  6.3 Summary
7 Discussion
  7.1 Using the algorithm in practice
  7.2 Parallel computing
  7.3 Problems with timetabling
  7.4 Domain-independent and domain-specific solutions
  7.5 Summary
8 Conclusion
  8.1 Outcomes
  8.2 Future avenues of research
Bibliography
1 Introduction
Travelling is an important and frequent activity, yet people willing to travel have
to face problems with rising fuel prices, carbon footprint and traffic jams. These
problems can be ameliorated by travel sharing, i.e., a group of people travelling together
in one vehicle for the whole or a part of the journey.
Members of a travel group can benefit from travel sharing in several ways.
First, their public transport journey can be cheaper if they buy a group ticket.
Second, their carbon footprint can be reduced when they share a car. Finally, they
can enjoy the company of the group on a long journey.
There are many services¹ already in existence that can find a travel plan with
a timetable for a single passenger using the public transport of the United Kingdom.
There are also services² which help with the manual negotiation of a shared journey
for multiple passengers. However, a system that is able to automatically find a fully
or partially grouped route with a timetable for several people using public transport
has not yet been developed.
This project builds on the bachelor thesis of Eamonn McCafferty [20]. McCafferty tried to use a single-agent off-the-shelf AI planner with real-world transportation data to plan shared travel routes for multiple agents. However, the experiments
showed that this approach has limited scalability in terms of the size of the domain
and the number of agents. To overcome the scalability issues, a novel multi-agent
travel planning algorithm is designed in this project. The algorithm combines existing components to solve a real-world problem.
1.1 Aim and hypothesis
The aim of this project is to design, implement and evaluate a multi-agent planning
algorithm which finds shared journeys for agents using real-world public transportation data of the United Kingdom. Building an efficient multi-agent planning algorithm is a challenging task for several reasons. First, the public transport domain
is complex in structure and very large: there are 240 590 timetable connections
for trains and coaches in the UK. Second, it is very hard to plan for multiple self-interested agents that are willing to cooperate only if it is profitable for them. In
this case, planning can become exponentially harder than planning for each agent
individually [8]. Third, in order for the planning algorithm to be applicable to other
domains, we cannot use domain-specific knowledge about the travel domain. We
hope to overcome the mentioned difficulties and verify the following hypothesis:
“A multi-agent planning algorithm is able to plan meaningful shared routes for
all agents in a feasible time in a real-world travel domain.”
¹ For example http://www.nationalrail.co.uk/ for trains or http://www.traveline.info/ and http://www.google.com/transit/ for a combination of means of transport.
² For instance http://www.companions2travel.co.uk/ or http://www.travbuddy.com/.
To plan meaningful shared routes means that an agent shares a part of its
journey only if it is profitable for the agent. All agents in the scenario will receive
a single, partially or fully shared journey after the run of the algorithm. Our requirement for planning in a feasible time is that the algorithm is able to calculate the
results for a train scenario with 6 agents in the central UK region, cf. section 4.3.3, in
less than 15 minutes. From a research perspective, we will observe the quality
of the produced plans and the scalability of the algorithm in terms of the scenario size and
the number of agents in it. A real-world travel domain is described in section 3.1.
1.2 Structure of the thesis
To begin with, the background of multi-agent planning together with the related
work is given in Chapter 2. Then we proceed to the description of the domain and
the algorithm for finding shared journeys in Chapter 3. The explanation of the public
transportation data processing in Chapter 4 is followed by the implementation of the
algorithm in Chapter 5. The performance of the algorithm is evaluated in Chapter 6
and then discussed in Chapter 7. Finally, Chapter 8 presents the conclusion deduced
from the obtained results.
2 Background
This chapter contains background information about multi-agent systems and multi-agent planning which is needed for planning journeys of agents in the travel domain.
Then, this chapter describes the planners LAMA, SGPlan6 and POPF2 that are used
for computing plans for the agents. Finally, existing systems for public transport
travel planning are introduced and work related to this project is discussed.
2.1 Multi-agent systems
There is no universally accepted definition of the term agent. However, the following
definition by Wooldridge [33] is convenient for our purposes: “An agent is a computer
system that is situated in some environment, and that is capable of autonomous
action in this environment in order to meet its design objectives.” The important
characteristics of an agent are autonomy (the ability to operate without external
guidance), social ability (the ability to interact meaningfully with other agents),
reactivity (the ability to respond to changes in the environment) and pro-activeness
(the ability to take the initiative to achieve own goals) [32].
A multi-agent system can be defined as an environment where there is more
than one agent and where the agents interact with each other [25]. There are usually
constraints such that no agent can, at any time, know everything about the world
and the other agents. A cooperative multi-agent system is a multi-agent system where
agents are cooperating to fulfil their objectives or to maximise utility [25].
In this project, we ignore perception, belief revision, deliberation, plan execution and communication which are needed to build “full” agents. To solve the travel
sharing problem, we focus on multi-agent planning.
2.2 Multi-agent planning
Efficient planning and deliberative decision making is an important tool of current
Artificial Intelligence (AI). Furthermore, planning can be used as a general problem
solving method [16]. The main outcome of the planning procedure is a plan, i.e.,
a sequence of actions, to achieve specified goals. A planner is a software tool which
is able to compute plans given the description of the environment, possible actions,
an initial state and goals to achieve.
The Planning Domain Definition Language (PDDL) [14, 15, 21] is commonly
used to describe the planning problem. Originally, it was used in the International
Planning Competition³ (IPC) to enable the comparison of participating planners.
The description of the environment is divided into two parts. Firstly, the domain
definition contains a list of actions, predicates and functions. It is possible to assign
a cost to every action. Secondly, the problem definition includes a list of objects, an
initial state of the environment and goals to achieve.
Just as there is no common definition of an agent, there is no common
definition of a multi-agent planner. In this project, by a centralised multi-agent planner, we mean a single-agent planner (e.g., Metric-FF [17]) which uses a multi-agent
specification of the domain. The planner tries to optimise the value of the joint cost
function, which in our implementation is the sum of the values of the cost functions
of the agents in the environment. However, the centralised multi-agent planner does not
have any notion of self-interested agents, i.e., it ignores the individual preferences of
agents. By a strategic centralised multi-agent planner, we mean a centralised multi-agent planner which has the notion of self-interested agents, e.g., a best-response
approach which is explained in the following section.
The main problem when planning for multiple agents with a centralised multi-agent planner is the exponential blowup in the action space which is caused by using
concurrent, independent actions [19]. This is induced by a centralised multi-agent
planner producing a plan that is totally ordered. However, every permutation of
concurrent (or independent) actions is a valid plan with the same meaning. As an
example, imagine there are two agents, one travelling from Glasgow to Edinburgh
and the second from London to Birmingham. Their journeys are totally independent
and so are the sequences of each agent’s actions. Yet, every totally ordered plan
obtained by interleaving the two ordered sets of actions is considered different to
a centralised multi-agent planner.
Therefore, a centralised multi-agent planner explores a huge number of equivalent
plans, which significantly increases the complexity of planning. Consequently, it is
very hard to use a centralised multi-agent planner to solve a multi-agent planning
problem in practice.
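The size of this blowup can be illustrated with a short calculation. The following Python sketch is only an illustration, not part of the thesis implementation: it counts the totally ordered plans that arise from interleaving independent action sequences while preserving each agent's own order. The function name `interleavings` is hypothetical.

```python
from math import comb


def interleavings(*plan_lengths: int) -> int:
    """Count the totally ordered plans obtained by interleaving independent
    single-agent action sequences of the given lengths, keeping the order
    of each agent's own actions."""
    total = 0
    count = 1
    for n in plan_lengths:
        total += n
        # Choose which of the positions filled so far belong to this agent.
        count *= comb(total, n)
    return count


two_agents = interleavings(3, 3)       # two agents, three actions each
three_agents = interleavings(5, 5, 5)  # three agents, five actions each
print(two_agents, three_agents)
```

Even for two fully independent three-step journeys, such as the Glasgow to Edinburgh and London to Birmingham example, a planner that distinguishes total orders would face 20 equivalent plans; with three agents and five actions each the count already exceeds 750 000.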
³ http://icaps-conference.org/index.php/Main/Competitions
2.2.1 A best-response approach
Jonsson et al. [19] propose a strategic centralised multi-agent planner which uses
the best-response approach. The planner works in two phases. In the first phase,
an initial plan for each agent is computed (e.g., each agent plans asynchronously
or a centralised multi-agent planner is used). In the second phase, the planner
solves simpler best-response planning (BRP) problems from the point of view of
each individual agent. The goal of the planner in a BRP problem is to minimise the
cost of the agent’s plan without changing the plans of the others. Consequently, it optimises
the plan of each agent with respect to the current joint plan.
The best-response approach takes into account that every agent is self-interested, i.e., an agent tries to achieve its goal with a plan that has minimal cost,
independently of the others. However, if it is strategically meaningful for the agent,
it will be willing to participate in a joint plan.
On the one hand, this approach has several advantages. It supports full concurrency of actions and the BRP phase overcomes the exponential blowup in the
action space resulting in a very good time complexity. For the class of potential
games [23], it guarantees to converge to a Nash equilibrium. The convergence is
also ensured for the travel domain which belongs to the class of potential games.
On the other hand, it does not guarantee the optimality of a solution. However,
experiments have shown that it can be successfully used for improving general
multi-agent plans [19].
2.3 Planners used
All three single-agent planners used in this project were taken from recent International Planning Competitions (IPC) from 2008 and 2011. LAMA is a sequential
satisficing planner whereas SGPlan6 and POPF2 are temporal satisficing planners.
A satisficing planner searches for a plan which solves a given problem. It does not
guarantee the optimality of the found plan (as opposed to an optimal planner).
A temporal satisficing planner takes into account durations of actions. It tries to
minimise makespan (i.e., total duration) of a plan but it does not guarantee the
optimality of the found plan.
The LAMA planner by Silvia Richter and Matthias Westphal is a propositional
planning system based on heuristic state space search [27]. It is the winner of the
sequential satisficing track at the IPC 2008. The core feature of the planner is the
usage of landmarks [26], i.e., propositions that must be true in every solution of
a planning problem. LAMA consists of three separate parts that are executed in
a sequence: the translator, the knowledge compilation module and the search engine.
The SGPlan6 planner by Chih-Wei Hsu and Benjamin W. Wah is designed to
solve both temporal and non-temporal planning problems [18]. It is the winner of the
temporal satisficing track at the IPC 2008. It consists of three inter-related steps:
parallel decomposition, constraint resolution and subproblem solver [10, 17, 22, 29].
The POPF2 planner by Amanda Coles, Andrew Coles, Maria Fox and Derek
Long is a temporal forward-chaining partial-order planner [13]. It performed very
well in the temporal satisficing track at the IPC 2011. Details about the extended
grounded search of POPF2 are available in the articles [11, 12].
2.4 Travel planning
In this section, existing systems for public transport travel planning are presented.
In the single-agent case, there are many web applications that return a travel plan
with a timetable given an origin, a destination and a leaving time. Three significant
web applications for the United Kingdom are provided by National Rail Enquiries⁴,
Traveline⁵ and Google Transit⁶. National Rail Enquiries offers a train-only journey
planner with the possibility to find the cheapest fare and to buy the tickets in
advance. Traveline is a partnership of local authorities and transport operators
that makes available comprehensive information about public transport in Scotland,
England and Wales. It provides a journey planner covering buses, trains, ferries and
underground. The public transport data of Traveline is used by Google Transit for
a journey planner which is based on the user interface of Google Maps⁷. A found
journey is displayed visually on the map and supplemented with a journey itinerary.
In the multi-agent case, there are services and forums that allow travellers to find companions for shared travel and to negotiate a journey plan. These services include
Companions2Travel⁸, Travel Buddy⁹ and Travel Together¹⁰. Nonetheless, they do
not offer a built-in journey planner for a negotiated travel group. In order to find
a travel plan, the members of the group need to use one of the single-agent travel
planners described in the previous paragraph.
⁴ http://www.nationalrail.co.uk/
⁵ http://www.traveline.info/
⁶ http://www.google.com/transit/
⁷ http://maps.google.com/
In summary, there are single-agent services for finding a travel plan with
a timetable and multi-agent services for manual negotiation of travel companions
and a travel plan. However, a journey planner that combines these two types of services and is able to find shared journeys for several people using public
transport has not yet been developed.
2.5 Related work
In his project, Eamonn McCafferty [20] tried to find shared journeys by public
transport using a centralised multi-agent planner. McCafferty used a multi-agent
description of the domain for a single-agent planner Metric-FF [17] with the full
timetable information. This approach was tested in 13 small experiments with 1–3
agents. However, McCafferty encountered several problems which were not solved
in the project. They can be divided into problems related to the data and problems
related to the planning approach.
2.5.1 Data-related problems
To start with, the data-related problems are discussed. In order to get the PDDL
specification of the domain for the planner, McCafferty transformed the source XML
data (see section 4.1 for the description of the data) directly to a PDDL specification.
After the direct transformation, several problems arose.
It was very difficult to deal automatically with changes of means of transport
which involved walks (only some walking connections were created manually and
this is infeasible to do for the whole UK public transportation data). Also, it was
not possible to use the information about which bus bays are at the same bus station
and which parts of a railway station belong together.
Furthermore, ATCO codes¹¹ were not used as identifiers for the stops in the
domain. Instead, the concatenation of a stop’s common name, locality and area
was used (special characters were filtered out, spaces were replaced by underscores).
With such identifiers, it is difficult to relate the stops to the NaPTAN data (cf. section 4.1) without creating any additional memory structure. The NaPTAN data
contains all available information about the stops (name, address, latitude, longitude, etc.) that can be used for example for visualising the stops on the map.
⁸ http://www.companions2travel.co.uk/
⁹ http://www.travbuddy.com/
¹⁰ http://www.traveltogether.com/
¹¹ A unique identifier for all points of access to public transport in the United Kingdom.
Overall, the direct transformation of the source XML data to a PDDL specification prevented automatic processing of the data in a comprehensive and accurate
way.
2.5.2 Planning-related problems
Re-running the experiments has shown that a direct application of a centralised
multi-agent planner to this problem does not scale well (as discussed in section 2.2).
For example, a simple scenario with two agents, ferries to the Orkney Islands and trains
in the area between Edinburgh and Aberdeen resulted in a one-day computation
time. Therefore, this approach is not very suitable for use in practice.
The planner was tested only in 13 small experiments with 1–3 agents. The
biggest tested experiment contained only 10 % of the train timetable data of Scotland. Despite the very small size of experiments, a problem with the quantity of
the data was identified. In some scenarios, the planner was not able to plan with all
given timetable data. In order to get at least some solution, a part of the timetable
data was excluded from the scenario. However, this led to suboptimal plans for the
agents.
In conclusion, the experiments have shown that a direct application of a centralised multi-agent planner to the problem of finding shared journeys for agents does
not scale well. Also, planning with the full timetable data in some cases exceeded
the memory limits of the Metric-FF planner.
2.6 Summary
This chapter has explained the relevant background information and approaches
of multi-agent planning. Most importantly, related work that aimed to solve the
problem of finding shared journeys for agents has been described. In the related
work, problems with the computation time and memory limits were encountered.
As a result, the approach was not very suitable for use in practice. To conclude,
we can now take advantage of this information to build a better algorithm which is
described in the next chapter.
3 Design of the algorithm
In this chapter, the introduction of the travel domain is followed by the description
of the algorithm for finding shared journeys together with the relaxed and the full
domain. Next, cost functions in the travel domain are discussed. The data used by
the algorithm are then presented in Chapter 4, the implementation of the algorithm
is described in Chapter 5 and its evaluation is done in Chapter 6.
3.1 Description of the travel domain
The real-world travel domain used in this project is based on the public transport
network in the United Kingdom. An agent representing a passenger is able to use
different means of transport during its journey: trains, coaches, local buses and
ferries. The domain is supplemented by complete timetable information. The aim
of each agent is to get from its starting location to its final destination at the lowest
possible cost, where the cost of the journey is based on the duration and the price
of the journey. Importantly, sharing a part of a journey with other agents is cheaper
than travelling alone.
This travel domain is very large and complex. It contains large numbers of
railway stations, bus and ferry stops, as well as information about the timetable in
each station. To be able to deal with the domain, certain restrictions are imposed.
First, Tuesday is chosen as a single working day for the scenarios. Therefore, all
agents are travelling on the same day and all journeys must be completed within
24 hours. Second, local buses are filtered out. They are not worth sharing because
people usually use them only for short parts of their journey. Furthermore, people
often take local buses without planning their journey in advance (e.g., they just see
a convenient bus on the street and use it to bring them nearer to their destination).
This behaviour is different from sharing a journey by public transport which must
be planned in advance. As we will show later in section 7.1, these restrictions are
not crucial, so the introduced algorithm can be used in practice.
3.2 Design decisions
Based on the previous work of McCafferty [20] discussed in section 2.5, I made
two important decisions in the design of the algorithm in order to achieve scalability, which is very important for scenarios involving real-world data and real-world
situations.
First, to overcome the problem with quantity of data, a plan found in the
relaxed domain (defined in the next section) is matched to the timetable instead
of working with the full timetable information from the very beginning. Second, in
order to speed up the planning process, a single-agent planner is used for computing
an initial plan for each agent separately. Then, the plans of agents are further
optimised using the best-response approach.
In the next two sections, the domain definition is followed by a detailed description of the algorithm.
3.3 Domain definitions
Two different types of the travel domain are needed for the algorithm: the relaxed
domain and the full domain. They are defined in this section and their specification
in PDDL is then described in section 5.2.
On the one hand, the relaxed domain is a single-agent specification represented
as a directed graph where the nodes are the stops and the edges are the connections
provided by a service. The graph must be directed because there exist stops that
are used in one direction only. There is an edge from the stop A to B if there is
at least one connection from A to B in the timetable. Its cost is the minimal time
needed for travelling from A to B. A plan $P_i$ found in the relaxed domain for the
agent $i$ is a sequence of connections to travel from its origin to its destination.
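The construction of the relaxed domain can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data pipeline of the thesis: the timetable is assumed to be a list of `(origin, destination, duration_in_minutes)` tuples, and the function name `build_relaxed_domain` and the sample durations are hypothetical.

```python
from collections import defaultdict


def build_relaxed_domain(timetable):
    """Collapse timetable connections into a directed graph.

    `timetable` is an iterable of (origin, destination, duration_min) tuples
    (an assumed input format). There is an edge A -> B if at least one
    connection from A to B exists; its cost is the minimal travel time.
    """
    relaxed = defaultdict(dict)
    for origin, destination, duration in timetable:
        best = relaxed[origin].get(destination)
        if best is None or duration < best:
            relaxed[origin][destination] = duration
    return {stop: dict(edges) for stop, edges in relaxed.items()}


# Three connections from A to B and one from B to A (made-up durations).
timetable = [("A", "B", 55), ("A", "B", 50), ("A", "B", 60), ("B", "A", 52)]
relaxed = build_relaxed_domain(timetable)
print(relaxed)
```

The four timetable connections collapse into two directed edges, which mirrors on a small scale how the relaxed domain is much smaller than the full timetable.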
Figure 3.1: An example of the relaxed domain (e.g., it takes 50 minutes to travel from
the stop A to B).
A small example of the relaxed domain is shown in Figure 3.1. An example
plan for an agent travelling from C to F is $P_1 = \langle C \to D, D \to E, E \to F \rangle$. To
illustrate the difference between the relaxed domain and the full timetable, there are
5 454 connections in the relaxed domain for trains in the UK compared to 227 668
timetable connections.
On the other hand, the full domain is a multi-agent specification based on
the joint plan $P$. Assume that there are $N$ agents in the full domain (each agent $i$
has the plan $P_i$ from the relaxed domain). Then, the joint plan $P$ is a merge of
single-agent plans defined by formula (3.1).
$$P = \bigcup_{i=1}^{N} P_i \tag{3.1}$$
Given a set of single-agent plans, the plan merging operator $\bigcup$ computes its result in
three steps. First, it transforms every single-agent plan $P_i$ to a directed graph $G_i$
where the nodes are the stops from the single-agent plan $P_i$ and the edges represent
the actions of $P_i$ (for instance, a plan $P_1 = \langle C \to D, D \to E, E \to F \rangle$ is transformed
to a directed graph $G_1 = \{C \to D \to E \to F\}$). Second, it performs a graph union
operation over the directed graphs $G_i$. Finally, it labels every edge in the joint plan
with the numbers of the agents that are using the edge.
As an example, the joint plan $P$ for two agents is shown in equation (3.2).
Agent 1, which is travelling from the stop C to F, shares a part of its journey from
D to E with agent 2.
$$P = \bigcup_{i=1}^{2} P_i = \langle C \to D, D \to E, E \to F \rangle \cup \langle D \to E \rangle = \{ C \xrightarrow{(1)} D \xrightarrow{(1,2)} E \xrightarrow{(1)} F \} \tag{3.2}$$
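The three steps of the plan merging operator can be sketched in Python as follows. This is a minimal illustration, assuming that a single-agent plan is a list of `(from_stop, to_stop)` pairs keyed by an integer agent number; the name `merge_plans` and this representation are assumptions, not the data structures of the thesis implementation.

```python
def merge_plans(plans):
    """Merge single-agent plans into a joint plan.

    `plans` maps an agent number to its plan, a list of (from_stop, to_stop)
    connections. The result maps every edge of the union graph to the sorted
    tuple of agents that use it.
    """
    joint = {}
    for agent, plan in plans.items():
        # Steps 1 and 2: treat each plan as a set of directed edges and
        # take the union over all agents.
        for edge in plan:
            # Step 3: label the edge with the agents using it.
            joint.setdefault(edge, set()).add(agent)
    return {edge: tuple(sorted(agents)) for edge, agents in joint.items()}


# The example of equation (3.2): agent 1 travels C -> F, agent 2 travels D -> E.
plans = {
    1: [("C", "D"), ("D", "E"), ("E", "F")],
    2: [("D", "E")],
}
joint = merge_plans(plans)
for (origin, destination), agents in joint.items():
    print(f"{origin} -> {destination} used by agents {agents}")
```

Only the edge from D to E carries the label (1, 2), which matches the shared segment in the example.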
Then, the full domain is represented as a directed multigraph where the nodes
are the stops that are present in the joint plan P . Edges of the multigraph are the
service journeys from the timetable. Every service is identified by a unique service
name and is assigned a departure time from each stop and a duration of its journey
between two stops. In the example of the full domain in Figure 3.2, the agents can
travel by five different services S1 to S5. In order to travel from C to D by the
service S1, an agent must be present at the stop C before its departure.
Figure 3.2: An example of the full domain with stops C, D, E and F for the joint
plan $P = \{ C \xrightarrow{(1)} D \xrightarrow{(1,2)} E \xrightarrow{(1)} F \}$.
3.4 Description of the algorithm
As already discussed, it is very time-consuming and memory-demanding to find
shared journeys in one step in this complex domain with full location data and fine-grained timetable data. Hence the algorithm is designed to work in three distinct
phases. The pseudocode of the whole algorithm is shown in Figure 3.3.
3.4.1 The initial and the BR phase
First, in the initial phase, an initial journey is found for each agent using the relaxed
domain. A journey for each agent is calculated independently of other agents in the
scenario using a single-agent planner. As a result, each agent is assigned a single-agent plan which will be further optimised in the next phase. This approach makes
sense in this domain because the agents do not need each other to achieve their
goals and they cannot destroy each other’s plans. This approach would not give
good results in other domains where cooperation is needed to achieve the goals.
Second, in the BR phase (best-response phase) which is also based on the
relaxed domain, the algorithm uses the best-response approach [19]. It iteratively
creates and solves simpler best-response planning (BRP) problems from the point of
view of each individual agent. In the case of the relaxed domain, the BRP problem
looks almost the same as a problem of finding a single-agent initial journey. The
difference is that the cost of travelling is smaller when an agent uses a connection
which is used by one or more other agents, cf. formula (3.3).
Iterations over agents continue until there is no change in the cost of the joint
plan between two successive iterations. That means that the joint plan cannot be
further improved using the best-response approach. The output of the BR phase is
the joint plan P in the relaxed domain (defined in section 3.3) that specifies which
connections the agents use for their journeys and which segments of their journeys
are shared. The joint plan P will be matched to the timetable in the final phase of
the algorithm.
Input
  • a relaxed domain
  • a set of N agents A = {a1, ..., aN}
  • an origin and a destination for each agent

1. The initial phase
   For i = 1, ..., N do
      Find an initial journey for agent ai using a single-agent planner.

2. The BR phase
   Do until no change in the cost of the joint plan
      For i = 1, ..., N do
         1) Create a simpler best-response planning (BRP)
            problem from the point of view of agent ai.
         2) Minimise the cost of ai’s plan without changing
            the plans of others.
   End

3. The timetabling phase
   Identify independent groups of agents G = {g1, ..., gM}.
   For i = 1, ..., M do
      1) Find the relevant timetable for group gi.
      2) Match the joint plan of gi to timetable by a temporal single-agent
         planner in the full domain with the relevant timetable.
   End

Figure 3.3: Pseudocode of the algorithm for finding shared journeys for agents.
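The BR phase of the pseudocode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `agent_cost`, `joint_cost`, `best_response` and the `replan` callback are hypothetical, and `replan` stands in for the call to a single-agent planner that the thesis performs with an off-the-shelf planner. The discount applied per connection follows formula (3.3).

```python
from typing import Callable, Dict, List, Tuple

Connection = Tuple[str, str]   # (origin stop, destination stop)
Plan = List[Connection]        # one agent's journey in the relaxed domain


def agent_cost(plan: Plan, others: Dict[str, Plan],
               duration: Dict[Connection, float]) -> float:
    """Cost of one plan when each connection is discounted by formula (3.3)."""
    cost = 0.0
    for conn in plan:
        # Group size on this connection: the agent itself plus every sharer.
        n = 1 + sum(conn in other for other in others.values())
        cost += 0.8 * duration[conn] / n + 0.2 * duration[conn]
    return cost


def joint_cost(plans: Dict[str, Plan], duration: Dict[Connection, float]) -> float:
    """Sum of the agents' costs, i.e. the cost of the joint plan."""
    return sum(
        agent_cost(plan, {a: p for a, p in plans.items() if a != agent}, duration)
        for agent, plan in plans.items()
    )


def best_response(plans: Dict[str, Plan], duration: Dict[Connection, float],
                  replan: Callable[[str, Dict[str, Plan]], Plan],
                  tolerance: float = 1e-9) -> Dict[str, Plan]:
    """Iterate over agents until the joint cost no longer changes."""
    plans = dict(plans)                      # do not modify the caller's dict
    previous = joint_cost(plans, duration)
    while True:
        for agent in list(plans):
            others = {a: p for a, p in plans.items() if a != agent}
            plans[agent] = replan(agent, others)   # hypothetical planner call
        current = joint_cost(plans, duration)
        if abs(current - previous) <= tolerance:
            return plans
        previous = current
```

The loop stops when two successive iterations give the same joint cost, which mirrors the stopping rule in Figure 3.3. The sketch does not guard against a `replan` callback that keeps oscillating between plans.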
3.4.2 The timetabling phase
Finally, in the timetabling phase, the optimised shared journeys are matched to the
relevant timetable by a temporal single-agent planner which is using the full domain.
First, independent groups of agents with respect to journey sharing are
identified. An independent group of agents is defined as a connected component of
the joint plan P. Then, in every independent group, parts of the group journey are
found. A part of the group journey is defined as a maximal continuous segment of
the group journey which is travelled by the same set of agents. As an example, consider
a group of two agents sharing a segment of their journeys, cf. Figure 3.4. Agent 1
is travelling from the stop A to G and agent 2 is travelling from B to H. Their
group journey has five parts; the shared part of their journey is between the stops C
and F (part 3).
Figure 3.4: Parts of the group journey of two agents.
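The two definitions above (independent groups as connected components, parts as maximal segments travelled by the same set of agents) can be illustrated with a short Python sketch. The function names and the representation of a plan as a list of `(origin, destination)` pairs are assumptions made for this example, not the data structures of the actual implementation.

```python
from collections import defaultdict


def connection_users(plans):
    """Map each connection to the set of agents whose plan contains it."""
    users = defaultdict(set)
    for agent, plan in plans.items():
        for conn in plan:
            users[conn].add(agent)
    return users


def independent_groups(plans):
    """Connected components of agents linked by at least one shared connection."""
    users = connection_users(plans)
    neighbours = {agent: set() for agent in plans}
    for agents in users.values():
        for agent in agents:
            neighbours[agent] |= agents - {agent}
    groups, seen = [], set()
    for start in plans:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                          # depth-first search over sharing links
            agent = stack.pop()
            if agent not in group:
                group.add(agent)
                stack.extend(neighbours[agent] - group)
        seen |= group
        groups.append(group)
    return groups


def journey_parts(plan, users):
    """Split one agent's plan into maximal runs travelled by the same agent set."""
    parts = []
    for conn in plan:
        agents = frozenset(users[conn])
        if parts and parts[-1][0] == agents:
            parts[-1][1].append(conn)         # same travellers: extend the current part
        else:
            parts.append((agents, [conn]))    # travellers changed: start a new part
    return parts
```

For the two agents of Figure 3.4, `journey_parts` applied to agent 1's plan yields three parts (alone, shared, alone); together with the two solo parts of agent 2 this gives the five parts of the group journey.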
As described later in section 4.3.4, direct trains, i.e., trains that do not stop
at every stop, are filtered from the relaxed domain for the following reason. In
Figure 3.4, assume that there is only one agent travelling from the stop A to the stop
G and that its plan at the end of the BR phase is one direct train from the stop A
to G. Then it is possible to match its plan only to a direct train from the
stop A to G. It is not possible to match it to a train which stops at all the
stops A, C, . . . , G. Therefore, the agent’s plan cannot be matched to all possible trains
going from the stop A to G, which is problematic especially in the case where the
majority of trains stop at every stop and only a few trains are direct. On the other
hand, it is possible to match a plan with a train stopping at every stop to a direct
train, as is explained in the next paragraph.
In order to use direct trains when the group journey is matched to the timetable,
the relevant timetable for a group journey is composed in the following way: for every
part of the group journey, return all timetable services in the direction of agents’
journeys which connect the stops in that part. An example of the relevant timetable
for the group of agents from the previous example is shown in Figure 3.5. Now, the
agents can travel by the direct train T1 or by the stopping train T2.
Figure 3.5: The full domain with services from the relevant timetable. There are five
different trains T1 to T5, the train T1 is a direct train.
The relevant timetable for the group journey is used with the aim of cutting down
the amount of data that will be given to a temporal single-agent planner. For
instance, there are 23 994 train timetable connections in Scotland. For an example
journey of two agents, there are only 885 services in the relevant timetable, which is
approximately 4 % of the Scottish timetable data. As a result, the temporal single-agent
planner gets only the necessary amount of data, which prevents the time-consuming
exploration of irrelevant state-space.
To conclude, the timetable matching problem is solved using a temporal single-agent
planner which is using the full domain with the relevant timetable.
3.4.3 Cost functions
The timetable data used in this project (cf. section 4.1) contains neither the information about ticket prices nor the distances between adjacent stops. The data
contains only the durations of journeys from one stop to another which significantly
restricts the design of cost functions used for the planning problems. Therefore,
the cost functions used in the three phases of the algorithm are based solely on the
duration of journeys.
In the initial phase, every agent tries to get to its destination in the shortest
possible time. The cost of travelling between adjacent stops A and B is simply the
duration of the journey between stops A and B.
In the BR phase, the cost function has to favour shared journeys of agents.
Assume that the duration of the journey between adjacent stops A and B is t. The
cost tn for one agent travelling from A to B in a group of size n is then defined by
formula (3.3). The cost tn is designed to model approximately the discount for the
passengers if they buy a group ticket. In reality, pricing for group tickets could vary.
$$t_n = \frac{1}{n}\, 0.8\, t + 0.2\, t \qquad (3.3)$$
Figure 3.6 shows how the cost of a journey for a single agent changes with the
number of agents travelling in the same group. The more agents travel together,
the cheaper it is for each agent in the group. Also, an agent cannot travel cheaper
than 20 % of the single-agent cost.
Figure 3.6: The dependence of the cost of a journey for one agent on the size of a group
of agents (x-axis: number of agents in the group, 1 to 8; y-axis: cost for one agent as
a percentage of the single-agent cost, 0 to 100).
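As a worked illustration of formula (3.3), the per-agent cost for a few group sizes follows by direct substitution; the lower bound of 20 % appears as the limit for large groups.

```latex
\begin{align*}
t_1 &= \tfrac{1}{1}\, 0.8\, t + 0.2\, t = t \\
t_2 &= \tfrac{1}{2}\, 0.8\, t + 0.2\, t = 0.6\, t \\
t_4 &= \tfrac{1}{4}\, 0.8\, t + 0.2\, t = 0.4\, t \\
t_8 &= \tfrac{1}{8}\, 0.8\, t + 0.2\, t = 0.3\, t \\
\lim_{n \to \infty} t_n &= 0.2\, t
\end{align*}
```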
In the timetabling phase, every agent in a group of agents tries to spend the
shortest possible time on its journey. When matching the plan to the timetable,
the temporal planner tries to minimise the sum of durations of agents’ journeys
including waiting times between services.
3.5 Summary
In this chapter, the algorithm for finding shared journeys in the travel domain has
been presented. The algorithm works in three phases: initial single-agent plans
are found in the initial phase, they are optimised by the best-response approach
in the BR phase and finally, the joint plan is matched to the timetable in the
timetabling phase. The relaxed and the full domain used by the algorithm have
been defined. In the next two chapters, the processing of the transportation data
and the implementation of the algorithm are described.
4 DATA
4 Data
This chapter discusses the processing of the source data. To begin with, publicly
available data about the public transportation system in the United Kingdom is
introduced. Next, the transformation of the data from XML files to a database
system is described. Finally, the data processing in the database is explained step
by step.
4.1 Source data
The project uses the National Public Transport Data Repository (NPTDR)¹² which
is publicly available from the Department for Transport of the British Government.
It contains data about the public transport system in the United Kingdom. The data
is compiled from eleven regions of the United Kingdom; national coach services and
National Rail services are compiled centrally [7]. Every year since 2004, a snapshot
of route and timetable data has been gathered in the first or second complete week of
October. The data is available in the ATCO-CIF¹³ and TransXChange XML¹⁴ data
formats. In this project, TransXChange XML data from the year 2010¹⁵ is used.
National Public Transport Access Nodes (NaPTAN)¹⁶ is a UK national system
for uniquely identifying all the points of access to public transport. Every point of
access is identified by an ATCO code, e.g. 9100HAYMRKT for Haymarket Rail
Station. Each stop in NaPTAN XML data is also supplemented by common name,
latitude, longitude, address and other pieces of information. This data also contains
information about how the stops are grouped together (e.g., several bus bays at
a bus station).
4.2 Data transformation
Recent AI planners use the Planning Domain Definition Language (PDDL) for describing the problem domain. Therefore, a transformation of the data from XML
to PDDL is needed. The algorithm described in the previous chapter uses two different
PDDL specifications (a relaxed domain and a domain with timetable information).
Furthermore, as discussed in section 2.5, there is duplicate and erroneous information in the data which makes it hard to transform directly to PDDL. Therefore,
I processed the data in three stages. First, I transformed the XML data to a PostgreSQL database. Second, I manually corrected and automatically processed and
optimised the data in the database. Finally, I created a script for generating PDDL
specifications based on the data in the database.
¹² http://data.gov.uk/dataset/nptdr
¹³ ATCO file format for interchange of route and timetable data.
¹⁴ A UK national XML-based data standard for interchange of route and timetable data.
¹⁵ http://www.nptdr.org.uk/snapshot/2010/nptdr2010txc.zip
¹⁶ http://data.gov.uk/dataset/naptan
4.2.1 Database system
For processing the data in the database, we needed to decide which database management system to use. We chose the PostgreSQL¹⁷ database management system for
several reasons. Firstly, PostgreSQL is an object-relational database system which
is free and open source. Secondly, it provides a loadable procedural PL/pgSQL
language which is executed inside the database system. As a result, a function
written in PL/pgSQL increases performance by avoiding client-server communication overhead and by eliminating transfer of intermediate results the client does not
need [4]. Moreover, the PostGIS extension adds spatial capabilities to the PostgreSQL database system.
4.2.2 PostGIS extension
The PostGIS¹⁸ extension allows us to work with geographical data by including
support for GiST-based R-Tree spatial indexes and functions for processing geographical objects [5]. There are two base data types for representing geographic
coordinates, the geometry type and the geography type. On the one hand, the geometry type
is based on a plane where the shortest path between two points is a straight line.
Therefore, all calculations on geometries (e.g., distances, areas) can be computed
using simple Cartesian mathematics. The geometry type is a very good approximation for shorter distances within one country. For this reason, it is used to calculate
distances between stops that are close to each other. On the other hand, the geography
type is based on a sphere where the shortest path is a great circle arc.
Consequently, the calculations on geographies use more complicated mathematics
and are computationally harder than calculations on geometries.
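To make the difference between planar and spherical calculations concrete, the following Python sketch compares a great-circle (haversine) distance with a flat-plane approximation. It is not how PostGIS computes distances (PostGIS works on projected geometries or on its own geography routines); the function names and the mean Earth radius constant are assumptions of this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # assumed mean Earth radius in metres


def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine distance along a great circle arc on a sphere, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def planar_m(lat1, lon1, lat2, lon2):
    """Flat-plane (equirectangular) approximation using Cartesian mathematics."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)   # east-west offset
    y = math.radians(lat2 - lat1)                        # north-south offset
    return EARTH_RADIUS_M * math.hypot(x, y)
```

For two stations a couple of kilometres apart, such as Haymarket (55.9458, -3.2184) and Edinburgh (55.9524, -3.1882), the two functions agree to well under a metre, which is consistent with the remark that the planar model is adequate for stops that are close to each other.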
¹⁷ PostgreSQL 9.0.4, http://www.postgresql.org/
¹⁸ PostGIS 1.5.2, http://postgis.refractions.net/
Geographical locations of all stops in the NaPTAN data are represented as their
longitude and latitude values using the World Geodetic System (version WGS 84).
WGS 84 is a geographic coordinate system identified by SRID 4326 (Spatial Reference System Identifier). In order to use the simpler geometry type in the database,
a projected coordinate system is needed. For locations in the United Kingdom, the
spatial reference system British National Grid with SRID 27700 is used [2]. This
system is regional and does not have any meaning for locations outside the United
Kingdom. The PostGIS extension provides the function ST_Transform for transforming geometries between different spatial reference systems.
During data processing, a query in the form “return all stops that are closer
than d metres from a given stop” is often required. The function ST_DWithin combines
a distance test with a bounding box test. First, it finds a relevant subset of data
by a bounding box test which uses the spatial index. Next, the distance test is
applied to the subset of the data to generate the final result. The following query
returns the ATCO code and the stop type of all stops within a radius of 1000 metres of the
bus station in Aberdeen. It also returns the distance from this bus station using
the function ST_Distance. The record AberdeenBus contains the ATCO code and
geographical coordinates of the bus station.
SELECT
    atcocode, stoptype,
    ST_Distance(lonlat, AberdeenBus.lonlat) AS distance
FROM
    naptanused
WHERE
    (atcocode NOT LIKE AberdeenBus.atcocode)
    AND (ST_DWithin(lonlat, AberdeenBus.lonlat, 1000));
As a result, the database system returns one ferry stop, one railway station
and two other bus stops with computed distance.
atcocode    | stoptype | distance
------------|----------|-----------------
9300ABA     | FER      | 270.155795933715
9100ABRDEEN | RLY      | 178.243680342881
639006436   | BST      | 277.469632365982
639006371   | BST      | 561.171919346201
4.2.3 NaPTAN data transformation
To be able to use spatial data about stops (their geographical coordinates) and to
identify groups of stops, I processed the NaPTAN data in the version from 4 January
2011. The NaPTAN data is stored in XML which uses the NaPTAN Schema, version 2.1. The NaPTAN data is large (547 MB in one XML file) so the Java StAX¹⁹
API is used to parse the XML data in an efficient way.
The NaPTAN data consists of two main parts, StopPoints and StopAreas,
cf. Figure 4.1. Every stop point is assigned a type [1] (e.g., a rail station RLY, a bus
stop BCT and a ferry stop FER). Stop areas represent groups of stop points, e.g.,
a bus station with several bus bays. They are also assigned to types [1] (e.g., a rail
stop area GRLS, a bus stop area GBCS).
Figure 4.1: Overview of the XML Schema of the NaPTAN data.
stopareas: stop CHARACTER VARYING(16), stoparea
naptan: atcocode, commonname, lat DOUBLE PRECISION, lon DOUBLE PRECISION, stoptype, lonlat USER-DEFINED
Figure 4.2: ER diagram of the naptan and stopareas tables.
¹⁹ Event-based Streaming API for XML.
I processed the NaPTAN data during one pass of the Java StAX API in application naptan2sql that I developed. For every stop and every stop area, the
application saves the important information (ATCO code, common name, latitude,
longitude and type) in a file in the form of a SQL insert query for the naptan table,
cf. Figure 4.2. Every stop that is a member of a stop area is saved as a SQL insert
query for the stopareas table. The following example shows train stations in Edinburgh as they were gathered from the NaPTAN data. In the header of the table,
column name stoptype is abbreviated to “st”.
atcocode    | commonname                  | lat     | lon     | st
------------|-----------------------------|---------|---------|----
9100EDINPRK | Edinburgh Park Rail Station | 55.9275 | -3.3077 | RLY
9100EDINBUR | Edinburgh Rail Station      | 55.9524 | -3.1882 | RLY
9100HAYMRKT | Haymarket Rail Station      | 55.9458 | -3.2184 | RLY
9100SLATEFD | Slateford Rail Station      | 55.9267 | -3.2435 | RLY
9100STHGYLE | South Gyle Rail Station     | 55.9363 | -3.2995 | RLY
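The one-pass transformation described above was written in Java with StAX. As an illustration of the same streaming idea, the following Python sketch reads stop points incrementally and emits SQL insert statements. It is a simplified stand-in, not the naptan2sql application: the flat element layout (`StopPoint` with direct `AtcoCode`, `CommonName`, `Latitude`, `Longitude` and `StopType` children, no XML namespace) is an assumption for this example, and the real NaPTAN schema is more deeply nested.

```python
import io
import xml.etree.ElementTree as ET


def sql_literal(text: str) -> str:
    """Quote a string for SQL by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def stop_inserts(xml_stream):
    """Yield one INSERT statement per <StopPoint>, reading the stream incrementally."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "StopPoint":          # hypothetical, namespace-free tag name
            continue
        atco = elem.findtext("AtcoCode")
        name = elem.findtext("CommonName")
        lat = float(elem.findtext("Latitude"))
        lon = float(elem.findtext("Longitude"))
        stoptype = elem.findtext("StopType")
        yield (
            "INSERT INTO naptan (atcocode, commonname, lat, lon, stoptype) VALUES "
            f"({sql_literal(atco)}, {sql_literal(name)}, {lat}, {lon}, {sql_literal(stoptype)});"
        )
        elem.clear()                         # drop the processed subtree's contents


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<NaPTAN><StopPoints><StopPoint>"
        b"<AtcoCode>9100HAYMRKT</AtcoCode><CommonName>Haymarket Rail Station</CommonName>"
        b"<Latitude>55.9458</Latitude><Longitude>-3.2184</Longitude><StopType>RLY</StopType>"
        b"</StopPoint></StopPoints></NaPTAN>"
    )
    for statement in stop_inserts(sample):
        print(statement)
```

Processing elements on their `end` event and clearing them afterwards keeps memory use low for a file of several hundred megabytes, which is the same motivation the text gives for choosing a streaming API.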
4.2.4 NPTDR data transformation
The NPTDR data (National Public Transport Data Repository) contains timetable
information about all services in the UK. The data is organised by means
of transport and local transport authority, resulting in 291 XML files for trains,
coaches and ferries (525 MB). I transformed the NPTDR data from October 2010
which uses the TransXChange XML Schema in version 2.1. The overview of the
TransXChange XML Schema is shown in Figure 4.3.
The NPTDR XML data is transformed to the form of SQL insert queries by
a Java batch application nptdr2sql. For developing this application, I reused and
improved the source code created by McCafferty [20]. The Java StAX API is used
again for effective processing of large XML files.
The timetable information in the NPTDR data is represented by a list of
vehicle journeys. Each vehicle journey is assigned a departure from its first stop,
a service name, stops on its journey and runtimes between them. It is transformed
to a group of database records in the data table, cf. Figure 4.4. The origin and the
destination of the vehicle journey are represented by their ATCO codes which are
related to the naptan table. The service type distinguishes between different means
of transport, i.e., a train “T”, a coach “C” and a ferry “F”. The departure is expressed
as an integer value of minutes from midnight, the runtime is in minutes.
Figure 4.3: Overview of the TransXChange XML Schema of the NPTDR data.
Two issues need to be resolved during the transformation. First,
there are vehicle journeys in the NPTDR data which have impossible runtimes of
zero minutes. When I compared the timetable of a train in the XML file with the
online train journey planner provided by National Rail²⁰, I discovered that if there
is a zero runtime between two stops, the train does not usually stop at the second
stop. To fix the problem, these stops were removed from the vehicle journeys.
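One way to remove such stops is to merge each zero-runtime leg into the leg that follows it. The Python sketch below illustrates this under assumptions: a vehicle journey is a list of `(origin, destination, runtime)` legs in travel order, the function name is hypothetical, and a trailing zero-runtime leg is simply dropped. The thesis does not spell out these details.

```python
def drop_zero_runtime_stops(legs):
    """Remove stops that are reached with a runtime of zero minutes.

    legs: list of (origin, destination, runtime_minutes) in travel order.
    """
    cleaned = []
    pending_origin = None                 # origin carried across a skipped stop
    for origin, destination, runtime in legs:
        start = pending_origin if pending_origin is not None else origin
        if runtime == 0:
            # The train is assumed to pass `destination` without stopping,
            # so the next leg will start from `start` instead.
            pending_origin = start
            continue
        cleaned.append((start, destination, runtime))
        pending_origin = None
    return cleaned
```

For example, the legs A to B (5 min), B to C (0 min), C to D (7 min) become A to B (5 min) and B to D (7 min), so the stop C no longer appears in the journey.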
²⁰ http://ojp.nationalrail.co.uk/service/planjourney/search
areas: area SMALLINT, region, authority CHARACTER VARYING(32)
data: area SMALLINT, origin, destination, servicetype CHARACTER(1), service, servicebase, departure INTEGER, runtime INTEGER
naptan: atcocode, commonname, lat DOUBLE PRECISION, lon DOUBLE PRECISION, stoptype, lonlat USER-DEFINED
Figure 4.4: ER diagram of the tables data, naptan and areas. The origin and the destination in the data table are foreign keys for ATCO code in the naptan table. The area is
a foreign key for the number of the area in the areas table.
Second, a unique service number for each vehicle journey is needed for planning
with a timetable. However, vehicle journeys at different times during a day on the
same route usually have the same service number. Therefore, every vehicle journey is
assigned a unique identifier which is the concatenation of a service number, a service
type and a unique number. This service identifier is stored in the column service
and the original service number is stored in the column servicebase.
In the following example of transformed data, there is a train with a service
number 2Y04 which departs from Edinburgh Rail Station at 13:37 and arrives in
North Berwick at 14:10. In the header of the table, column name servicetype is
abbreviated to “t”, servicebase to “sbase”, departure to “dep” and runtime to “rt”.
area | origin      | destination | t | service     | sbase | dep | rt
-----|-------------|-------------|---|-------------|-------|-----|---
627  | 9100EDINBUR | 9100MSELBGH | T | 2Y04-T14619 | 2Y04  | 817 | 3
627  | 9100MSELBGH | 9100WALLYFD | T | 2Y04-T14619 | 2Y04  | 820 | 4
627  | 9100WALLYFD | 9100PPAN    | T | 2Y04-T14619 | 2Y04  | 824 | 3
627  | 9100PPAN    | 9100LNGNDRY | T | 2Y04-T14619 | 2Y04  | 827 | 5
627  | 9100LNGNDRY | 9100DREMJ   | T | 2Y04-T14619 | 2Y04  | 832 | 7
627  | 9100DREMJ   | 9100NBERWCK | T | 2Y04-T14619 | 2Y04  | 839 | 11
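The encoding used in the table can be checked with a small Python sketch. The helper names are hypothetical, and the identifier format (service number, hyphen, service type, unique number) is inferred from the example value 2Y04-T14619 rather than stated in the text.

```python
def to_clock(minutes_from_midnight: int) -> str:
    """Convert minutes from midnight to an HH:MM string."""
    hours, minutes = divmod(minutes_from_midnight, 60)
    return f"{hours % 24:02d}:{minutes:02d}"


def service_id(servicebase: str, servicetype: str, number: int) -> str:
    """Build a unique service identifier in the assumed format, e.g. 2Y04-T14619."""
    return f"{servicebase}-{servicetype}{number}"


# (departure, runtime) of the six legs of the example train, in minutes.
legs = [(817, 3), (820, 4), (824, 3), (827, 5), (832, 7), (839, 11)]
first_departure = legs[0][0]
last_arrival = legs[-1][0] + legs[-1][1]
print(to_clock(first_departure), to_clock(last_arrival))
```

The first departure of 817 minutes corresponds to 13:37 and the last arrival of 839 + 11 = 850 minutes to 14:10, matching the times quoted for the train to North Berwick. Each leg's departure also equals the previous departure plus the previous runtime.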
The transformation of the NPTDR data is a computationally intensive task.
The student computation server (student.compute) at the School of Informatics is
able to transform the NPTDR data for trains, coaches and ferries in 24 hours.
4.3 Data processing
Once the transformation of the XML data is finished, we proceed to a step-by-step
description of the data processing. To start with, SQL insert queries produced by the
applications naptan2sql and nptdr2sql are inserted into the database called transport,
cf. Figure 4.5. After the import, the naptan table contains the information about
stops and stop areas, the stopareas table relates stops to stop areas and the data
table holds the information about the connections between the stops and about the
timetable. Then, the data is processed in several successive steps by SQL functions
that are written in the procedural PL/pgSQL language.
Figure 4.5: Overview of the data transformation and processing.
The data processing functions are written in the PL/pgSQL language because
they run inside the database system, which avoids client-server communication
overhead (cf. section 4.2.1), and because they are easier to write and more concise
than an external application for processing the data (written for example in Java
or C++). Also, all functions are stored in the database schema and the database
system is able to precompile them to speed up their execution.
4.3.1 Mistakes and stops merging
To begin with, mistakes in the data need to be corrected. In the NaPTAN data,
five ferry stops and six coach stops used in NPTDR data were missing. They were
added to the naptan table manually. In NPTDR data, several connections used
stops with an incorrect stop type (e.g., a train using a bus stop). In addition, some
ferry connections went from a ferry stop to a bus stop in the same port. These errors
were also corrected manually in the data table. After correcting the mistakes, the
naptan table is spatially enabled by computing the coordinates of each stop in the
British National Grid spatial reference system as discussed in section 4.2.2.
In order to treat bus bays at a bus station and parts of a train station as a single
stop, bus and train stops are merged according to the NaPTAN data available in
the stopareas table. ATCO codes of stops are simply replaced by ATCO codes of
stop areas. Then, no walking connections are needed for getting from one bus bay to
another within a bus station, which helps to reduce the complexity of the transport
network.
Then, the naptanused table is created. It contains only the stops that are
present in the data table. The naptan table contains 421 309 stops and stop areas
from the NaPTAN data whereas the naptanused table contains a significantly lower
number of 4 229 stops. The difference is caused by removing the stops of local buses.
Consequently, the naptanused table speeds up the rest of the processing.
However, the NaPTAN data does not contain all groups of stops that are very
close to each other (e.g., several bus bays at a bus station). Therefore, additional
groups of stops are inferred and then merged. Stops that are closer to each other
than 120 metres are considered as one stop. Inference of groups of stops is done
efficiently using the PostGIS extension.
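The inference of additional stop groups can be pictured with the following Python sketch. It is not the PostGIS-based implementation: it assumes stop coordinates are already in a planar system measured in metres (such as British National Grid), compares all pairs instead of using a spatial index, and assumes that closeness is applied transitively, so a chain of stops each under 120 metres from the next ends up in one group. The function name is hypothetical.

```python
import math
from itertools import combinations


def infer_stop_groups(stops, threshold_m=120.0):
    """Group stops that are closer than threshold_m, directly or through a chain.

    stops: dict mapping a stop code to planar (x, y) coordinates in metres.
    Returns a list of sets; stops without a close neighbour are omitted.
    """
    parent = {code: code for code in stops}

    def find(code):
        while parent[code] != code:
            parent[code] = parent[parent[code]]   # path halving
            code = parent[code]
        return code

    for a, b in combinations(stops, 2):
        if math.dist(stops[a], stops[b]) < threshold_m:
            parent[find(a)] = find(b)             # union the two groups

    groups = {}
    for code in stops:
        groups.setdefault(find(code), set()).add(code)
    return [group for group in groups.values() if len(group) > 1]
```

The pairwise comparison is quadratic in the number of stops; with a few thousand stops this is manageable, but a spatial index such as the one behind ST_DWithin avoids most of these comparisons.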
4.3.2 Walking connections
In order to use different means of transport during one journey, it is necessary to
introduce walking connections to connect stops of different types. Every ferry stop
is connected with the nearest train station and the nearest bus stop within a radius of 2 000
metres. Every train station is connected to the nearest bus stop within a radius of 1 000
metres. Additionally, train stations that are closer than 600 metres to each other and have no
direct train connection are also connected. Also, bus stops with only one connection
to another bus stop (i.e., dead-end bus stops) are connected to the nearest bus stop or
train station closer than 1 000 metres.
Finally, all the walking connections that were found are checked by the Google
Distance Matrix API [6]. In some cases, the straight distance between two stops is
a very poor approximation, e.g., each stop is on one side of the river and there is
no bridge available. All walking connections that were longer than 3 000 metres are
deleted. The walking time t for each connection is set by equation (4.1)
tAP I
t=8+
60
27
(4.1)
4 DATA
where tAP I is the time in seconds returned by the Google Distance Matrix API. An
additional 8 minutes time is added for exiting the first stop and then to find a bus
or train at the second stop.
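Equation (4.1) and the 3 000 metre cut-off can be combined in a short Python sketch. The function names and the tuple layout are hypothetical, and the sketch assumes that the distance being checked is the walking distance reported alongside the walking time; no call to the actual API is made.

```python
MAX_WALK_M = 3000.0     # walking connections longer than this are deleted
OVERHEAD_MIN = 8.0      # fixed allowance from equation (4.1)


def walking_time_minutes(t_api_seconds: float) -> float:
    """Walking time in minutes according to equation (4.1)."""
    return OVERHEAD_MIN + t_api_seconds / 60.0


def checked_walks(candidates):
    """Keep walks within the distance limit and attach their walking time.

    candidates: iterable of (origin, destination, distance_m, t_api_seconds).
    """
    return [
        (origin, destination, walking_time_minutes(t_api))
        for origin, destination, distance_m, t_api in candidates
        if distance_m <= MAX_WALK_M
    ]
```

For instance, a connection for which the API reports 600 seconds of walking gets a walking time of 8 + 10 = 18 minutes.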
The two following figures 4.6 and 4.7 show the merging of coach stops and the
addition of walking connections in the centre of Aberdeen. In Figure 4.7, the bus
bays at Aberdeen bus station are merged and the disjoint networks of trains, coaches
and ferries are interconnected with walking connections. Therefore, an agent can
use different means of transport during its journey.
Figure 4.6: The centre of Aberdeen before merging stops and adding walking connections
(map source: [3]). Black lines denote the connections by public transport to other stops.
Figure 4.7: The centre of Aberdeen after merging stops and adding walking connections
(map source: [3]). Three red lines represent walking connections (from the train station to
the bus station, from the bus station to the ferry terminal and from the ferry terminal to
the train station).
4.3.3 Regions and the timetable
In order to test the algorithm in scenarios of increasing complexity, the stops from the
naptanused table are divided into three regions (Scotland, central UK and southern
UK). For generating testing scenarios in a specified region, a relation between stops
and regions is needed. This relation is created using the PostGIS extension, which
allows a polygon to be used as the boundary of a region and returns all stops
that lie within it. The regions created are shown in Figure 4.8.
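The idea of selecting all stops inside a region polygon can be illustrated with a plain ray-casting test in Python. This is a stand-in for the spatial containment test that PostGIS performs; the text does not name the PostGIS function used, and the function names and planar coordinates below are assumptions of this example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon: list of (x, y) vertices in order; the last vertex connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside                    # each crossing flips the state
    return inside


def stops_in_region(stops, polygon):
    """Return the codes of all stops whose coordinates fall inside the polygon."""
    return [code for code, (x, y) in stops.items() if point_in_polygon(x, y, polygon)]
```

Points that lie exactly on the polygon boundary are not treated specially here, which is acceptable for a sketch but would need care if stops could sit on a region border.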
Figure 4.8: Train network in Scotland, central UK and southern UK (map source: [3]).
timetable: origin, destination, servicetype CHARACTER(1), service, servicebase, departure INTEGER, runtime INTEGER
naptan: atcocode, commonname, lat DOUBLE PRECISION, lon DOUBLE PRECISION, stoptype, lonlat USER-DEFINED
Figure 4.9: ER diagram of the timetable and naptan tables. The origin and the destination
in the timetable table are foreign keys for ATCO code in the naptan table.
Then, the timetable table with timetable information is created. When the
data from all NPTDR XML files is combined, there are duplicates in train and coach
vehicle journeys. The duplicates are eliminated so the timetable table contains only
unique vehicle journeys, cf. Figure 4.9.
4.3.4 Relaxed domain
Finally, the simpledomain table for a fast generation of the relaxed domain (cf. section 3.3) is created. The following SQL query creates the relaxed domain for trains.
At first, the SELECT query on lines 2–5 returns the connections in the relaxed domain (the cost of a connection is the minimal time in seconds needed for travelling
from its origin to its destination). Then, the INSERT query on line 1 inserts the
whole result of the SELECT query into the simpledomain table.
1 INSERT INTO simpledomain(origin, destination, servicetype, runtime)
2   SELECT origin, destination, servicetype, (min(runtime) * 60)
3   FROM timetable
4   WHERE (servicetype LIKE 'T')
5   GROUP BY origin, destination, servicetype;
When the relaxed domain is created, cf. Figure 4.10 on the left, there is a large
number of direct trains that are not stopping at every stop on their journey. With
the direct trains, the relaxed domain does not represent accurately the underlying
network of rail tracks (there are edges in the relaxed domain with no underlying rail
tracks). In addition, it causes problems when a plan from the relaxed domain with
direct trains is matched to the timetable, cf. section 3.4.2. Therefore, the direct
trains need to be eliminated from the relaxed domain.
The simplest way to eliminate the direct trains would be to get the information
which train is direct from the NPTDR data. The TransXChange XML Schema
supports this idea by element <Route> which includes all stops on the journey of the
train. However, this element is not used in the NPTDR data at all. Consequently,
the direct trains must be eliminated in another way.
For eliminating the direct trains, I used the following approach. For every edge
A to B in the relaxed domain, check the following: the edge belongs to a direct train if and only if there is at least
one train going from A to B without using the edge A to B, i.e., via intermediate stops. In
effect, a transitive reduction of the graph is performed [30]. The relaxed domain after
removing the direct trains from the simpledomain table is shown in Figure 4.10 on
the right. As a result, the edges represent the underlying network of rail tracks more
accurately and the relaxed domain is simplified.
Figure 4.10: Relaxed domain with direct trains in the left map. In the right map, the
direct trains are filtered out (map source: [3]).
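One possible reading of the elimination rule is sketched below in Python. It assumes each train is available as an ordered list of the stops it calls at, and it removes an edge A to B whenever some train travels from A to B through at least one intermediate stop. The function name and input format are hypothetical, and the actual implementation works on the database tables instead.

```python
def non_direct_edges(services):
    """Return relaxed-domain edges after removing those created by direct trains.

    services: iterable of ordered stop lists, one list per train.
    """
    edges = set()      # consecutive stop pairs of any train
    skipped = set()    # pairs (a, b) travelled with at least one stop in between
    for stops in services:
        for i in range(len(stops) - 1):
            edges.add((stops[i], stops[i + 1]))
            for j in range(i + 2, len(stops)):
                skipped.add((stops[i], stops[j]))
    return edges - skipped
```

With a stopping train A, C, D, G and a direct train A, G, the edge A to G is removed and the edges A to C, C to D and D to G remain, which corresponds to the situation discussed in section 3.4.2. Trains that visit the same stop twice are not handled by this sketch.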
To conclude the data processing, an overview of the tables in the database is
shown in Table 4.1. The size of the whole database is approximately 550 MB.
4.4 Summary
This chapter has described the data processing of the transportation data stored in
the XML format. Importing all the data into the PostgreSQL database enabled us to
automatically and effectively process the data. Mistakes in the data were corrected,
duplicate services were deleted, bus bays were merged into one bus station and
walking connections allowing changes between different means of transport were
introduced. Also, tables for fast generation of the relaxed and the full domain
were created. As a result, the data is ready to be used by the algorithm whose
implementation is described in the next chapter.
Table 4.1: Tables in the database and their record counts and sizes.
table             |   records | size (MB)
------------------|-----------|----------
data              |   746 889 |    287.78
naptan            |   421 309 |    137.27
timetable         |   242 671 |    102.13
stopareas         |   168 123 |     20.60
simpledomain      |    10 235 |      2.33
naptanused        |     4 191 |      1.15
regions           |     4 191 |      0.91
directroutes      |     3 300 |      0.52
walks             |       781 |      0.20
stopareasinferred |       682 |      0.17
areas             |       148 |      0.05
total             | 1 602 520 |    553.10
5 IMPLEMENTATION
5 Implementation
After the data is processed, we can proceed to the implementation of the algorithm
which is described in the first part of this chapter. In the second part, PDDL
specifications of the relaxed and the full domain are presented. In the last part, the
visualisation of the travel domain is presented.
5.1 The algorithm
Following the description of the algorithm (cf. section 3.4), the algorithm is implemented as a system of bash scripts, one Java application, three off-the-shelf planners
and the program psql for fetching data from the database. The communication between the parts of the system is performed by input and output files. An overview
of the whole system is shown in Figure 5.1. The implementation of the algorithm is
described in more detail in the rest of this section.
Diagram labels: ./run.sh S (run scenario S); 1. Initial phase: ./phase1.sh (single-agent plans, 1...N), ./plan.sh (plan with LAMA); 2. BR phase: ./phase2.sh (execute BR phase), javabrp.jar (BR planning, iteration 1...I, agents 1...N), ./plan.sh (solve BRP problem with LAMA); 3. Timetabling phase: fetch timetable, ./fetch.sh (execute psql), create PDDL, ./phase3.sh (timetabling with SGPlan6, POPF2); ./results.sh (save results).
Figure 5.1: An overview of the implementation of the algorithm (S is the number of
scenario, N the number of agents and I the number of iterations in the BR phase).
In the initial phase, PDDL problem files for all agents in the relaxed domain
are generated. Then, an initial plan for each agent is computed by the LAMA
planner. The BR (best-response) and timetabling phases exceed the capabilities of
bash scripts. Therefore, these two phases of the algorithm were developed as a Java
batch application called javabrp.
In the BR phase, javabrp is executed. It loads the single-agent plans from the
initial phase and then runs the iterations of the best-response approach. For each
agent in each iteration, a simpler best-response planning problem is created from
the point of view of the individual agent and then solved by the LAMA planner.
Figure 5.2: An example of a group journey of two agents with three parts.
In the timetabling phase, the joint plan of every independent group of agents
is matched to the relevant timetable in the full domain. According to the definition
of the relevant timetable in section 3.4.2, the SQL query for fetching the relevant
timetable from the database is composed in the following way: for every part of the
group journey, return all services from the timetable which connect the stops in that
part. The SQL query for the example in Figure 5.2 is therefore:
SELECT origin, destination, servicetype || service, departure, runtime
FROM timetable WHERE
  ((origin IN ('A')) AND (destination IN ('C'))) OR
  ((origin IN ('B')) AND (destination IN ('C'))) OR
  ((origin IN ('C', 'D', 'E')) AND (destination IN ('D', 'E', 'F')))
ORDER BY origin, destination, departure;
Inevitably, the result of the SQL query contains backward services (e.g., from
the stop E to D in the two-agent example in Figure 5.2). Therefore, the backward
services are filtered out when the returned services are assigned to the connections
in the joint plan.
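As a hedged illustration of the query composition and the filtering of backward services, the following sketch builds the query text from the parts of a group journey. The representation of a part as an ordered list of stops, the function names and the filtering rule (a service is kept only if its origin precedes its destination on some part) are assumptions of this sketch, not a description of the javabrp code. The stop identifiers are assumed to come from the project's own database; for untrusted input, query parameters should be used instead of string concatenation.

```python
def sql_list(stops):
    """Render stop identifiers as a quoted, comma-separated SQL list."""
    return ", ".join(f"'{stop}'" for stop in stops)

def relevant_timetable_query(parts):
    """parts: one ordered list of stops per part of the group journey."""
    clauses = []
    for path in parts:
        # Every stop but the last can be an origin,
        # every stop but the first can be a destination.
        origins, destinations = path[:-1], path[1:]
        clauses.append(
            f"  ((origin IN ({sql_list(origins)})) AND "
            f"(destination IN ({sql_list(destinations)})))"
        )
    return (
        "SELECT origin, destination, servicetype || service, departure, runtime\n"
        "FROM timetable WHERE\n"
        + " OR\n".join(clauses)
        + "\nORDER BY origin, destination, departure;"
    )

def drop_backward(rows, parts):
    """Keep only rows (origin, destination, ...) that go forward along a part."""
    kept = []
    for row in rows:
        origin, destination = row[0], row[1]
        for path in parts:
            if (origin in path and destination in path
                    and path.index(origin) < path.index(destination)):
                kept.append(row)
                break
    return kept

# The two-agent example of Figure 5.2.
parts = [["A", "C"], ["B", "C"], ["C", "D", "E", "F"]]
query = relevant_timetable_query(parts)
print(query)
```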
Next, a PDDL specification of the full domain with the relevant timetable for
each group is generated (see next section for details). Then, two different planners
SGPlan6 and POPF2 are executed to match the group journeys to the timetable.
The planners use different strategies for finding a plan (cf. section 2.3). Therefore,
the planners produce different results and we can pick the plan with the shortest
duration. It is not known beforehand which planner will return a better plan.
Finally, single-agent plans are matched to the timetable for evaluation purposes
(cf. section 6.2.4).
5.2 PDDL specifications
In this section, the PDDL specifications used in this project are described. Following
the domain definitions in section 3.3, two PDDL specifications are required: the
relaxed domain, which is used in the first two phases of the algorithm, and the
full domain, which is used in the timetabling phase.
In both versions of the domain, every train, coach and ferry stop has to have
a unique identifier. The ATCO code from the NaPTAN data is best suited for this
purpose. The prefix “AC” is used because names of objects starting with a number
are not supported in PDDL [21]. For example, Edinburgh Rail Station with the
ATCO code 9100EDINBUR has the identifier AC9100EDINBUR.
5.2.1 The relaxed domain
In the relaxed domain, a single agent aims to travel from its origin to its destination.
The domain file contains two predicates, two functions and only one action. The
domain file is shown in Figure 5.3. The predicate connection is true when there is an
edge from ?origin to ?destination; the predicate at denotes the current location
of the agent. The function time returns the cost of travelling from the location
?origin to ?destination. The action go moves the agent from the location ?o
to ?d and it increases the total cost of the plan which is stored by the total-cost
function.
(define (domain travelplanner)
  (:requirements :typing :action-costs)
  (:types location)
  (:predicates
    (connection ?origin - location ?destination - location)
    (at ?loc - location)
  )
  (:functions
    (time ?origin - location ?destination - location)
    (total-cost)
  )
  (:action go
    :parameters (?o ?d - location)
    :precondition (and (at ?o) (connection ?o ?d))
    :effect (and
      (at ?d) (not (at ?o))
      (increase (total-cost) (time ?o ?d)))
  )
)
Figure 5.3: The domain file for the relaxed domain.
The problem file contains a list of stops, a list of connections between the
stops and their costs. The abbreviated example for trains in Scotland is shown in
Figure 5.4. There are 344 different train stops and 744 connections, e.g., the cost
of travelling from AC9100ABDO to AC9100BISLND is 240 (time in seconds needed
to travel). In the initial state, the agent’s origin is set to Dalmuir Rail Station and
the total cost of the agent’s plan is set to zero. The agent travels to Fort William
Rail Station which is specified in the goal state. The planner is told to minimise the
total cost of the plan.
(define (problem practice-problem)
  (:domain travelplanner)
  (:objects
    AC9100CDND - location
    AC9100HAYMRKT - location
    << 342 lines omitted >>
  )
  (:init
    (connection AC9100ABDO AC9100BISLND)
    (connection AC9100ABDO AC9100DALGETY)
    (= (time AC9100ABDO AC9100BISLND) 240)
    (= (time AC9100ABDO AC9100DALGETY) 240)
    (= (total-cost) 0)
    (at AC9100DALMUIR)
  )
  (:goal (at AC9100FRTWLM))
  (:metric minimize (total-cost))
)
Figure 5.4: The abbreviated problem file for the relaxed domain in the scenario S1 (trains
in Scotland, cf. section 6.1).
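A problem file of this shape can be generated mechanically from the list of stops and connections. The following is a minimal sketch under the assumption that the stops are given as ATCO codes and the connections as a dictionary mapping (origin, destination) to the travel time in seconds; the function names are hypothetical and the sketch does not reproduce the scripts used in the project.

```python
def identifier(atco_code):
    """PDDL object name: the prefix "AC" followed by the ATCO code."""
    return "AC" + atco_code

def relaxed_problem(stops, connections, origin, destination,
                    name="practice-problem"):
    """Return the text of a problem file for the relaxed domain."""
    lines = [f"(define (problem {name})",
             "  (:domain travelplanner)",
             "  (:objects"]
    lines += [f"    {identifier(stop)} - location" for stop in stops]
    lines += ["  )", "  (:init"]
    for (a, b), seconds in connections.items():
        lines.append(f"    (connection {identifier(a)} {identifier(b)})")
        lines.append(f"    (= (time {identifier(a)} {identifier(b)}) {seconds})")
    lines += ["    (= (total-cost) 0)",
              f"    (at {identifier(origin)})",
              "  )",
              f"  (:goal (at {identifier(destination)}))",
              "  (:metric minimize (total-cost))",
              ")"]
    return "\n".join(lines)

text = relaxed_problem(
    stops=["9100ABDO", "9100BISLND"],
    connections={("9100ABDO", "9100BISLND"): 240},
    origin="9100ABDO",
    destination="9100BISLND",
)
print(text)
```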
5.2.2 The full domain
In the full domain, multiple agents aim to travel from their origins to their destinations. The domain is based on the plan P from the relaxed domain. Therefore, it
contains only the stops that are present in the plan P and the shared parts of the
journeys are already specified.
In order to represent temporal constraints (e.g., an agent must be at a stop
before the departure of a service), it is necessary to know the current time of every
agent. The function (agent-time ?a - agent) is used to store the current time of
the agent ?a.
The domain file contains a list of partially instantiated durative actions for
travelling from one stop to another. The origin and the destination as well as the
agents using this action are instantiated. The only free variable is a service name of
the service the agents are going to use. An example of a durative action is shown in
Figure 5.5. The durative action go-agent-1-2_A-B enables agent 1 and 2 to travel
together from the stop A to B.
Let $N$ be the number of agents travelling together, $\mathit{at}_i$ the current time of
agent $i$, $d_{AB}(s)$ the departure of the service $s$ from the stop A to B and
$r_{AB}(s)$ its runtime. Then, the duration $D_{AB}$ of the action to travel from the
stop A to B is computed by the formula (5.1).
$$D_{AB} = \sum_{i=1}^{N} \left( d_{AB}(s) + r_{AB}(s) - \mathit{at}_i \right) \qquad (5.1)$$
The temporal planner tries to minimise the sum of durations of agents’ journeys. In other words, it tries to find a journey with minimal waiting times in between
services.
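Formula (5.1) can be checked with a small numerical example. The values below (two agents whose current times are 380 and 350 minutes, and a service departing at minute 386 with a runtime of 10 minutes) are made up for illustration, and the function name is hypothetical.

```python
def action_duration(departure, runtime, agent_times):
    """Duration of a shared travel action according to formula (5.1)."""
    arrival = departure + runtime
    # Each agent contributes its waiting time plus the runtime of the service.
    return sum(arrival - agent_time for agent_time in agent_times)

duration = action_duration(departure=386, runtime=10, agent_times=[380, 350])
print(duration)  # (396 - 380) + (396 - 350) = 62
```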
The conditions of the action are the following: there must be a connection by
the service s between the stops A, B and the agents must be present at the stop A
before the departure of the service s. Once the action is executed, the agents are
located in the stop B and their current time is set to the arrival of the service s at
the stop B.
(:durative-action go-agent-1-2_A-B
  :parameters (?s - service)
  :duration (= ?duration (+
    (- (+ (departure A B ?s) (runtime A B ?s)) (agent-time agent1))
    (- (+ (departure A B ?s) (runtime A B ?s)) (agent-time agent2))
  ))
  :condition (and
    (at start (connection A B ?s))
    (at start (at agent1 A))
    (at start (<= (agent-time agent1) (departure A B ?s)))
    (at start (at agent2 A))
    (at start (<= (agent-time agent2) (departure A B ?s)))
  )
  :effect (and
    (at end (at agent1 B))
    (at start (not (at agent1 A)))
    (at end (assign (agent-time agent1)
      (+ (departure A B ?s) (runtime A B ?s))))
    (at end (at agent2 B))
    (at start (not (at agent2 A)))
    (at end (assign (agent-time agent2)
      (+ (departure A B ?s) (runtime A B ?s))))
  ))
Figure 5.5: A durative action go-agent-1-2_A-B in the domain file for the full domain.
The problem file contains a list of services and their departures and runtimes.
The abbreviated example for two agents travelling by train in Scotland is shown
in Figure 5.6. There are 201 different services and 885 timetable connections, e.g.,
the train T2G24-T36728 from AC9100STHGYLE to AC9100HAYMRKT departs at
06:26 (386 minutes after midnight) and its runtime is 10 minutes. In the initial
state, origins of the agents are set to South Gyle Rail Station and Haymarket Rail
Station. In the goal state, destinations are set to Glasgow Central Rail Station and
Uddingston Rail Station. The planner is required to minimise the makespan (the
total duration) of the plan.
(define (problem practice-problem)
  (:domain timetabletravelplanner)
  (:objects
    T2Y51-T18976 - service
    T2Y55-T18986 - service
  )
  (:init
    (connection AC9100STHGYLE AC9100HAYMRKT T2G24-T36728)
    (= (departure AC9100STHGYLE AC9100HAYMRKT T2G24-T36728) 386)
    (= (runtime AC9100STHGYLE AC9100HAYMRKT T2G24-T36728) 10)
    (at agent1 AC9100STHGYLE)
    (= (agent-time agent1) 0)
    (at agent2 AC9100HAYMRKT)
    (= (agent-time agent2) 0)
  )
  (:goal (and (at agent1 AC910GGLGC) (at agent2 AC9100UDNGSTN)))
  (:metric minimize (total-time))
)
Figure 5.6: The abbreviated problem file for the full domain.
5.3 Visualisation
In order to get a better idea of what the travel domain in the United Kingdom looks
like, I decided to create a map visualisation. The visualisation is accessible through
a web browser and is programmed in PHP together with the Google Maps JavaScript
API (footnote 21).
In Figure 5.7, the user interface of the visualisation is shown. On the left hand
side, there is a map showing the stops and the connections in the relaxed domain.
The map can be moved around and zoomed in and out thanks to the Google Maps
API. On the right hand side, there is a control panel for changing the scenario and
displaying information about it. In addition, the scenario generation tool allows fast
choosing of agents’ origins and destinations for an experiment designed by hand. The
origins and destinations are added to the grey box simply by clicking on the stops
in the map.
Furthermore, the visualisation is able to show the plans of agents that were
found in the initial and the BR phase, cf. Figure 5.8. On the left hand side, there
is a map showing the journeys of agents. The control panel on the right hand side
now enables navigation between the initial plans and the plans in each step of the
BR phase through the buttons “Previous” and “Next”.
Footnote 21: http://code.google.com/intl/cs/apis/maps/documentation/javascript/
Figure 5.7: A visualisation of the train network in Scotland (map source: [3]).
Figure 5.8: A visualisation of the plans of six agents travelling in Scotland in the E–W
direction (map source: [3]). The origins are denoted by the light blue markers whereas the
destinations are denoted by the dark blue ones. The thicker the dark blue line is, the more
agents are travelling together.
To summarise, the map visualisation is a useful tool to display the travel
domain and the shared journeys of agents. It provides a better idea of the domain
together with the possibility to check visually if the shared journeys of agents are
reasonable.
5.4 Summary
In this chapter, the implementation of the three phases of the algorithm has been
described. Then, the PDDL specifications for the relaxed and the full domain have
been presented together with the examples of domain and problem files. Finally,
the visualisation of the travel domain and the agents’ journeys has been introduced.
Once the implementation of the algorithm has been explained, we can proceed to
the evaluation of the algorithm which follows in the next chapter.
6 Testing and evaluation
This chapter contains the evaluation of the algorithm over public transportation
data of the United Kingdom. At first, it describes the testing scenarios and how the
experiments were generated. Then, it presents the results of the algorithm in the
generated experiments.
6.1 Scenarios
Five different scenarios of increasing complexity were created for testing the algorithm. They are based on the regions of the United Kingdom which are defined in
section 4.3.3. An overview of the scenarios is shown in Table 6.1, their parameters
are shown in Table 6.2. Also, the scenarios are visualised on the map in Figure 6.1.
In every scenario, there are either trains or trains and coaches as means of
transport. Ferries are not used in the testing scenarios because their network is very
sparse. When the testing scenario is generated (cf. section 6.1.1), there are many
cases in which it is not possible to find a shared timetable because of the infrequent
timetable of ferries.
Table 6.1: An overview of the testing scenarios.
scenario code   regions                   means of transport
S1              Scotland                  trains
S2              Scotland                  trains and coaches
S3              central UK                trains
S4              central UK                trains and coaches
S5              central and southern UK   trains
In order to observe the behaviour of the algorithm with different numbers of
agents, every scenario is tested with 2, 4, 6, . . . , 14 agents in it. To ensure that sharing of journeys is possible, all agents in the scenario travel in the same direction.
There are four possible directions of travel (N–S, S–N, W–E, E–W). For every scenario and for every number of agents in the scenario, 40 different experiments are
generated (10 experiments for each direction of travel). Consequently, there are
1 400 experiments for testing the algorithm. All experiments are generated partially
randomly as defined in the next section.
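The number of experiments follows directly from the design described above. The short enumeration below is only a sanity check of the arithmetic; the variable names are illustrative.

```python
from itertools import product

scenarios = ["S1", "S2", "S3", "S4", "S5"]
agent_counts = range(2, 15, 2)           # 2, 4, ..., 14 agents
directions = ["N-S", "S-N", "W-E", "E-W"]
repetitions = range(10)                  # 10 experiments per direction

experiments = list(product(scenarios, agent_counts, directions, repetitions))
total_agents = sum(agents for _, agents, _, _ in experiments)
print(len(experiments), total_agents)
```

The total number of agents over all experiments obtained this way agrees with the figure of 11 200 agents quoted in section 6.2.2.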
Table 6.2: Parameters of the testing scenarios: number of stops, connections in the relaxed
domain and connections in the timetable.
scenario code           S1       S2       S3       S4       S5
stops                   344      721      1 044    1 670    2 176
connections             744      1 520    2 275    4 001    4 794
timetable connections   23 994   26 702   68 597   72 937   203 590
Figure 6.1: Five testing scenarios visualised on the map; train stops are red, coach stops
are green (map source: [3]).
6.1.1 Experiment generation
Every generated experiment is defined by the scenario (S1, . . . , S5), the number of
agents in it and the direction of travel. The experiment is generated in three steps
based on the data in the database.
Firstly, the stops of the region specified by the scenario are fetched from the
regions table, cf. section 4.3.3. Secondly, the connections for the region in the
relaxed domain are fetched from the simpledomain table, cf. section 4.3.4. Lastly,
every agent in the scenario is assigned an origin and a destination based on the
direction of travel. For the rest of this section, assume that the agents travel in the
north–south direction.
To compute the origin–destination pairs, two axes x and y are placed over
the region dividing the stops in the scenario into four quadrants I, II, III and IV,
cf. Figure 6.2. Then, the set O of possible origin–destination pairs is computed
according to definition (6.1).
$$O := \{(A, B) \mid ((A \in \mathrm{I} \land B \in \mathrm{IV}) \lor (A \in \mathrm{II} \land B \in \mathrm{III})) \land |AB| \in [20, 160]\} \qquad (6.1)$$
Every agent travels from the origin A to the destination B either from the
quadrant I to IV or from the quadrant II to III. The direct distance |AB| between
the origin and the destination is restricted to the interval from 20 to 160 km
(measured along roads or rail tracks, this corresponds approximately to the interval
from 30 to 250 km). This interval is chosen to prevent overly long journeys, which
can be problematic to complete in 24 hours.
Afterwards, the required number of origin–destination pairs is randomly selected from the set O. An example of six selected origin–destination pairs is shown
in Figure 6.2. For the other directions of travel, the origin–destination pairs are
computed accordingly.
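Definition (6.1) and the random selection can be sketched as follows. The sketch makes several assumptions that are not stated in the text: stops are given as (latitude, longitude) pairs, the direct distance is computed as the great-circle distance, the intersection of the axes is passed in as a parameter, and the quadrants are numbered as in Figure 6.2 (I north-east, II north-west, III south-west, IV south-east). All names and coordinates are hypothetical.

```python
import math
import random

def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def quadrant(point, centre):
    """Quadrant of a stop relative to the intersection of the two axes."""
    north = point[0] >= centre[0]
    east = point[1] >= centre[1]
    if north:
        return "I" if east else "II"
    return "IV" if east else "III"

def north_south_pairs(stops, centre, min_km=20, max_km=160):
    """The set O of definition (6.1) for the north-south direction."""
    allowed = {("I", "IV"), ("II", "III")}
    return [(a, b) for a in stops for b in stops
            if (quadrant(stops[a], centre), quadrant(stops[b], centre)) in allowed
            and min_km <= distance_km(stops[a], stops[b]) <= max_km]

# Made-up coordinates: N2 and S1 are less than 20 km apart.
stops = {"N1": (57.0, -3.0), "N2": (56.55, -3.0), "S1": (56.4, -3.0)}
pairs = north_south_pairs(stops, centre=(56.5, -3.5))
selected = random.sample(pairs, k=1)  # one agent in this tiny example
print(pairs, selected)
```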
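[Map of the region divided by the axes x and y into four quadrants: II (top left),
I (top right), III (bottom left) and IV (bottom right).]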
Figure 6.2: Six randomly generated origin–destination pairs for agents travelling in the
north–south direction.
6.2 Evaluation
In this section, the performance of the algorithm is evaluated according to the hypothesis of the project: “A multi-agent planning algorithm is able to plan meaningful
shared routes for all agents in a feasible time in a real-world travel domain.”
Three different metrics of the algorithm are evaluated. First, the amount of
time the algorithm needs to plan shared journeys for all agents in a testing scenario
is measured. It shows the scalability of the algorithm in terms of scenario size and
number of agents in a scenario. Second, the success rate of finding a plan in each
phase of the algorithm is observed. Third, the quality of plans found is measured.
The cost of a shared travel plan is compared to the sum of costs of individual single-agent travel plans and the prolongation of a journey duration caused by sharing is
measured.
The evaluation of the algorithm according to the metrics is presented in the
next four sections. If not stated otherwise, the values in graphs are averaged over
40 experiments that were performed for every scenario and every number of agents in
it. All 1 400 experiments were performed in two days using 30 Linux desktop
computers, each with a 2.66 GHz Intel Core 2 Duo processor and 4 GB of memory.
6.2.1 Scalability
In order to observe the scalability of the algorithm, the amount of time the algorithm
needs to plan shared journeys for all agents in a testing scenario is measured.
In a large part of the experiments, the temporal planners SGPlan6 and POPF2
in the timetabling phase returned some plans in the first few minutes but then they
continued exploration of the search space without returning any better plan. Because
of this behaviour, a time limit is set for the temporal planners according to Table 6.3.
Table 6.3: Time limits for the temporal planners in the timetabling phase.
agents in a group   time limit [min]
[1, 5]              5
[6, 10]             10
> 10                15
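Expressed as code, the lookup in Table 6.3 is a simple step function of the group size. The function name is hypothetical.

```python
def time_limit_minutes(group_size):
    """Time limit for the temporal planners according to Table 6.3."""
    if group_size < 1:
        raise ValueError("a group has at least one agent")
    if group_size <= 5:
        return 5
    if group_size <= 10:
        return 10
    return 15

print([time_limit_minutes(n) for n in (1, 5, 6, 10, 11, 14)])
```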
In Figure 6.3, the computation times of the algorithm are plotted. The graph
shows that the computation time grows approximately linearly with the number of
agents in the scenario. This confirms empirically that the algorithm avoids the
exponential blow-up in the action space characteristic of a centralised multi-agent
planner. Consequently, the algorithm is scalable in terms of the number of agents
in the scenario.
[Plot; x-axis: agents in scenario (2 to 14); y-axis: computation time [min] (0 to 70);
series: S1: Scotland (trains), S2: Scotland (trains, coaches), S3: Central UK (trains),
S4: Central UK (trains, coaches), S5: South and central UK (trains).]
Figure 6.3: The dependence of the computation time of the algorithm on the number of
agents in the scenario.
The computation time also increases approximately linearly with the size of the
scenario: Figure 6.4 plots the computation times for 4, 8 and 12 agents against
the size of the scenarios S1, . . . , S5. Therefore, the algorithm is also scalable
in terms of the domain size.
As was stated in section 1.1, our requirement for planning in a feasible time is
that the algorithm is able to calculate the results for a train scenario with 6 agents
in the central UK region in less than 15 minutes. According to Figure 6.3, the
average computation time of the algorithm in the scenario S3 with 6 agents is
13 minutes, so this requirement was fulfilled.
[Plot; x-axis: scenario size [connections in the relaxed domain] (0 to 5000); y-axis
ticks from 0 to 60; series: 4 agents, 8 agents, 12 agents; the data points are
labelled S1 to S5.]
Figure 6.4: The dependence of the computation time of the algorithm on the size of the
scenario.
In addition, the computation time is measured separately for each phase of
the algorithm. In Figure 6.5, the computation times of the three phases of the
algorithm in the scenario S3 are shown. It can be observed that the algorithm
spends the largest part of the computation time in the timetabling phase even when
it is restricted by the time limits for matching a group of agents to the timetable (defined
in Table 6.3).
[Plot; x-axis: agents in scenario (2 to 14); y-axis ticks from 0 to 30; series:
1. Initial phase, 2. BR phase, 3. Timetabling phase.]
Figure 6.5: Computation times of the phases of the algorithm (scenario S3).
To summarise, the algorithm scales linearly both with the scenario size and
the number of agents. The requirement of the hypothesis for planning in a feasible
time was fulfilled.
6.2.2 Plan quality
In this section, the success rate of finding a plan in each phase of the algorithm is
discussed. On the one hand, in the initial phase with the relaxed domain, 99.1 %
of initial single-agent plans are found (there are 11 200 agents in total, 100 of them
without a plan). In the BR phase with the relaxed domain, an additional 37 single-agent
plans are found. That means that the best-response approach (a strategic
multi-agent planner) solved the problem in 37 % of the cases where the single-agent
planner failed. After the BR phase, 99.4 % of agents have a journey plan. The remaining
0.6 % of agents (63 agents) with no plan in the relaxed domain are not matched to
the timetable in the timetabling phase. In brief, planning in the relaxed domain in the
initial and the BR phase of the algorithm is very successful.
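The percentages quoted above can be re-derived from the absolute numbers given in the text. The following lines only repeat that arithmetic; the variable names are illustrative.

```python
total_agents = 11_200
without_initial_plan = 100
recovered_in_br_phase = 37

without_plan_after_br = without_initial_plan - recovered_in_br_phase
initial_rate = 100 * (total_agents - without_initial_plan) / total_agents
rate_after_br = 100 * (total_agents - without_plan_after_br) / total_agents
recovered_share = 100 * recovered_in_br_phase / without_initial_plan

print(round(initial_rate, 1), round(rate_after_br, 1),
      without_plan_after_br, recovered_share)
```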
[Plot; x-axis: group size [number of agents] (1 to 8); y-axis: groups with
timetable [%] (0 to 100).]
Figure 6.6: The dependence of the percentage of groups for which a timetable was found
on the group size.
On the other hand, the timetabling phase proved to be the most problematic
part of the algorithm. In Figure 6.6, the percentage of groups for which a timetable
was found in dependency on the group size is shown. In order to create this graph,
groups of agents in every scenario were statistically analysed. There are several
things to point out.
The percentage of groups for which a timetable was found depends on the
size of the group. The bigger the group, the harder it is to find a timetable and the
lower the success rate. When a group of agents sharing parts of their journeys is
big (5 or more agents), the percentage of groups with a timetable drops below 50 %.
With a group of 8 agents, almost no timetable is found.
Next, the success rate is much better in the scenarios that contain trains only
(S1, S3 and S5) than in the scenarios combining trains and coaches (S2 and S4).
I assume that this is mainly caused by different service densities in the rail and
coach network. Service density is calculated as a ratio of timetable connections
over connections in the relaxed domain. In Scotland, the service density is 33 train
services a day per one connection in the relaxed domain compared to only 4 coach
services, cf. Table 6.4. As a consequence, it is much harder to find a timetable in
a scenario with both trains and coaches because the timetable of coaches is much
less regular than the timetable of trains.
Table 6.4: Service densities in train and coach networks in Scotland and central UK.
region       service type   timetable connections   connections   service density
Scotland     trains         23 980                  730           33
Scotland     coaches        2 586                   654           4
central UK   trains         68 575                  2 253         30
central UK   coaches        3 892                   1 278         3
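The service densities in Table 6.4 follow from the ratio defined above, rounded to whole services per connection. The short computation below reproduces them from the table values; the dictionary layout is only an illustration.

```python
# (region, service type): (timetable connections, connections in the relaxed domain)
table = {
    ("Scotland", "trains"): (23_980, 730),
    ("Scotland", "coaches"): (2_586, 654),
    ("central UK", "trains"): (68_575, 2_253),
    ("central UK", "coaches"): (3_892, 1_278),
}

density = {key: round(timetable / connections)
           for key, (timetable, connections) in table.items()}
print(density)
```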
As described in section 1.1, by being able to plan shared routes for all agents
we mean that every agent in the scenario will receive a single or shared journey.
After the BR phase, 99.4 % of all agents in the experiments receive a single or
shared journey in the relaxed domain. In the timetabling phase, the percentage of
groups assigned a timetable depends on the size of the group, cf. Figure 6.6. When
a group is not assigned a timetable, the joint shared plan in the relaxed domain can
serve as a guide for the shared journey and be matched to the timetable manually.
To conclude, nearly all agents receive a plan in the relaxed domain whereas only
some of the groups receive a detailed timetable.
In summary, the algorithm is able to find single or shared journeys in the relaxed
domain for nearly all agents in the scenario (99.4 %). However, only some of the
groups of agents travelling together are assigned a timetable. Several possibilities
for how to improve the timetabling phase of the algorithm are discussed in section 8.2.
6.2.3 Improvement in cost
This section describes the improvement in cost of agents’ journeys. At first, the
improvement in cost is defined. Assume that $c_{individual}$ is the sum of costs of individual
initial single-agent plans and $c_{shared}$ is the sum of costs at the end of the BR
phase of the algorithm when the cost of a shared journey is computed according to
formula (3.3). Then, the improvement $I$ in the cost is defined by formula (6.2).
$$I = \frac{c_{individual} - c_{shared}}{c_{individual}} \cdot 100\,\% \qquad (6.2)$$
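As a small worked example of formula (6.2), assume hypothetical costs of 300 for the individual plans and 240 for the shared plans; these numbers are not taken from the experiments.

```python
def improvement(c_individual, c_shared):
    """Improvement in cost in per cent, formula (6.2)."""
    return 100 * (c_individual - c_shared) / c_individual

print(improvement(300, 240))  # 100 * 60 / 300 = 20.0
```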
The improvement in cost of journeys is plotted in Figure 6.7. The more agents
there are in the scenario, the better the improvement. However, there is a trade-off
between the improvement in cost and the percentage of groups assigned a timetable,
cf. Figure 6.6.
[Plot; x-axis: agents in scenario (2 to 14); y-axis: improvement in cost [%] (0 to 50).]
Figure 6.7: The average improvement in cost of journeys.
In addition, the improvement in cost reached in the BR phase is observed.
In all scenarios except S1, the improvement in cost reached in the BR phase is approximately
25 % of the total improvement. This means that the initial single-agent
plans are already very good in terms of journey sharing and there is not much room for
improvement by the best-response approach. In the scenario S1 (trains in Scotland),
the improvement reached in the BR phase is only around 10 %. This is caused by
the low density of rail tracks in the Highlands: the agents share the journey simply
because there is no other way to get from an origin to a destination.
To summarise, improvement in cost of agents’ journeys was reached by sharing
parts of the journeys. Nevertheless, there is a trade-off between the amount of
improvement (the bigger the group, the better the improvement) and the percentage
of found timetables, cf. Figure 6.6.
6.2.4 Prolongation of journeys
On the one hand, travel sharing is beneficial in terms of cost. On the other hand,
a shared journey has in most cases longer duration than a single one. In order to
evaluate this trade-off, prolongation of the journeys is measured.
Assume that $t_{individual}$ is the sum of durations of individual initial single-agent
plans and $t_{shared}$ is the duration of the shared joint plan at the end of the timetabling
phase. Then, the prolongation $L$ of a journey is defined by formula (6.3).
$$L = \frac{t_{shared} - t_{individual}}{t_{individual}} \cdot 100\,\% \qquad (6.3)$$
The prolongation of a journey can be calculated only when a group is assigned
a timetable and also each member of the group is assigned a single-agent timetable.
In every experiment, the single-agent timetables are computed once the timetabling
phase of the algorithm is finished.
A graph of the percentage of groups that have a timetable with prolongation
less than 30 % in dependency on the group size is shown in Figure 6.8. In other
words, groups that benefit from travel sharing and whose journeys are not prolonged
excessively by travelling together are shown in the graph. For this graph, groups
from all 1 400 experiments were used.
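The criterion used for this graph can be written down directly from formula (6.3). The durations in the example are invented for illustration and the function names are hypothetical.

```python
def prolongation(t_individual, t_shared):
    """Prolongation of a journey in per cent, formula (6.3)."""
    return 100 * (t_shared - t_individual) / t_individual

def acceptable(t_individual, t_shared, limit=30):
    """True if sharing prolongs the journey by less than `limit` per cent."""
    return prolongation(t_individual, t_shared) < limit

print(prolongation(200, 250), acceptable(200, 250))  # 25.0 True
print(prolongation(200, 280), acceptable(200, 280))  # 40.0 False
```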
Approximately 15 % of groups with 3–4 agents are assigned a timetable with
prolongation less than 30 %. Such a low percentage of groups can be explained by
the algorithm trying to optimise the price of the journey by sharing in the BR phase.
However, there is a trade-off between the price and the duration of the journey. The
more agents are sharing a journey, the longer the journey duration is likely to be.
[Plot; x-axis: group size [number of agents] (2 to 8); y-axis: groups with
prolongation < 30 % [%] (0 to 40).]
Figure 6.8: The percentage of groups that have a timetable with less than 30 % prolongation.
In summary, only a minority of groups is assigned a timetable with prolongation less than 30 %. As discussed in section 8.2, this behaviour can be improved,
for example, by splitting the groups where the prolongation is too high and re-timetabling.
6.3 Summary
The first part of the chapter has explained how the testing scenarios were created and
how the experiments were generated. The second part of the chapter has evaluated
the results of the algorithm in terms of scalability, plan quality, improvement in cost
and prolongation of journeys.
7 Discussion
This chapter discusses how the algorithm can be used in practice as a part of a travel
planning system for real passengers. Then, speeding up the algorithm by parallel
computing, problems with timetabling and domain-independent and domain-specific
solutions are explored.
7.1 Using the algorithm in practice
In a real-world travel planning system, every user submits in advance his or her origin, destination and travelling time. The users submit their preferences at different
times. The system is continuously computing shared journeys for the users. It is
necessary that the users agree on the shared journey at least one day in advance so
they have enough time to arrange meeting points and to buy tickets. Therefore, it
is entirely sufficient if the users get an e-mail with a planned journey within one hour after
the last member of the travel group submits his or her journey details. As shown
in Figure 6.3, the computation time ranges from 7 minutes for the smallest scenario
with 4 agents up to 53 minutes for the largest one with 12 agents.
Next, a reasonable size of a group for travel sharing is discussed. In my opinion,
a reasonable size is two to four persons. Such a group can be easily coordinated and
meeting points can be agreed quickly; no explicit leader of the group is needed. With the
price model used in this project, cf. formula (3.3), every member of a three-person
group can save 53 % of the single-agent price. The success rate of the algorithm for
three-person groups in the scenario S3 (trains in the central UK) is 70 %.
To deal with thousands of users that could be in a real-world travel planning
system, a preprocessing step is needed. The agents are divided into small groups of
approximately five agents by clustering according to their departure time, direction
of travel, origin, destination, length of journey and preferences. Then, the algorithm
is used to find a shared travel plan with a timetable. As mentioned above, such
groups are easy to coordinate and it is very likely that they receive a timetable. The
computation time for such groups is linear both in the number of agents and the
size of the scenario, cf. section 6.2.1. As a consequence, the algorithm can be used
in practice in a travel planning system for real passengers.
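The preprocessing step is only outlined in the text, so the following is a deliberately simplified sketch of one possible grouping rather than a clustering method used in this project. It assumes that each request carries only a direction of travel and a departure time in minutes after midnight, puts requests with the same direction and the same one-hour departure slot into one bucket, and cuts each bucket into groups of at most five agents. Origin, destination, journey length and preferences are ignored here, and all names are hypothetical.

```python
from collections import defaultdict

def preprocess(requests, max_group=5, slot_minutes=60):
    """Split requests into small groups with similar direction and departure."""
    buckets = defaultdict(list)
    for request in requests:
        key = (request["direction"], request["departure"] // slot_minutes)
        buckets[key].append(request["id"])
    groups = []
    for key in sorted(buckets):
        members = buckets[key]
        # Cut every bucket into chunks of at most max_group agents.
        for start in range(0, len(members), max_group):
            groups.append(members[start:start + max_group])
    return groups

# Seven made-up requests travelling north-south shortly after 08:00.
requests = [{"id": i, "direction": "N-S", "departure": 480 + i} for i in range(7)]
groups = preprocess(requests)
print(groups)
```

Each resulting group would then be passed to the three-phase algorithm as a separate planning problem.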
7.2 Parallel computing
The computation time of the algorithm can be improved by parallel computing.
The single-agent plans in the initial phase are independent problems. Therefore, they
can be solved in parallel using multiple CPUs, e.g., with the program parallel
(footnote 22). Parallel computing can also be used in the timetabling phase because
matching different groups of agents to the timetable is a set of independent problems.
As an example, assume that there are $N$ agents in the scenario and $t_1, \ldots, t_N$
are the computation times for the respective single-agent initial plans. Then with parallel
computing, the computation time is reduced from $t = \sum_{i=1}^{N} t_i$ to
$t' = \max_{i=1}^{N}(t_i)$.
The computation time of the timetabling phase is reduced accordingly. The sequential and parallel computation times are compared in Figure 7.1. For example, in
the experiments with 10 agents, the computation time can be reduced on average
to 52 % in the scenario S1 and to 56 % in the scenario S5.
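The two expressions for the computation time, and one way of running independent planning tasks concurrently, can be sketched as follows. The sketch uses Python's standard ThreadPoolExecutor rather than the program parallel mentioned above, the function plan_one is a placeholder for a call to an external planner, and the maximum only applies if there are at least as many free CPUs as tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential_time(times):
    """Total time when the plans are computed one after another."""
    return sum(times)

def parallel_time(times):
    """Total time when every plan gets its own CPU."""
    return max(times)

def plan_all(agents, plan_one, workers=None):
    """Run plan_one for every agent concurrently and keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(plan_one, agents))

times = [3, 5, 4]  # made-up computation times in minutes
print(sequential_time(times), parallel_time(times))
print(plan_all([1, 2, 3], lambda agent: f"plan for agent {agent}"))
```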
[Plot; x-axis: agents in scenario (2 to 14); y-axis ticks from 0 to 70; series:
S1: sequential computation, S1: parallel computation, S5: sequential computation,
S5: parallel computation.]
Figure 7.1: A comparison of sequential and parallel computing in scenarios S1 and S5.
Footnote 22: http://www.gnu.org/software/parallel/
7.3 Problems with timetabling
In this section, the reasons why in some cases the timetable is not found are discussed. The main reason why some groups are not assigned a timetable is the usage
of the relaxed and the full domain. On the one hand, a plan in the relaxed domain is
found quickly without any problems. On the other hand, a planner fails to match it
to the timetable in a percentage of cases dependent on the group size. This implies
that the relaxed domain is too simplified. A journey found in the relaxed domain
does not correspond to a journey that would be found if it were planned in the full
domain with a timetable.
An obvious solution to this problem is to plan with the full domain from the
very beginning. However, this has proven to be infeasible in practice (cf. section 2.5).
Another solution could be to replace the relaxed domain with an intermediate representation with the complexity in between the current relaxed and full domain.
A second reason prevents finding a timetable especially in bigger groups
(5 or more agents): there are so many time constraints (e.g., agents need to meet
at the stop before the departure of the service) that the timetable matching problem
becomes unsolvable given the 24-hour timetable. Lastly, a timetable may not be found
because a specific part of the public transport network has a very irregular timetable
(e.g., one coach in the morning and another one in the evening).
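The effect of the meeting constraint can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the planner used in this project: the function name, the representation of times as minutes of the day, and the example timetable are all hypothetical.

```python
from bisect import bisect_left


def first_shared_departure(arrivals, departures):
    """Return the earliest departure every agent can catch, or None.

    arrivals   -- minute of the day at which each agent reaches the stop
    departures -- sorted departure minutes of the service within one day
    """
    ready = max(arrivals)                  # the group waits for its last member
    i = bisect_left(departures, ready)     # first departure at or after `ready`
    return departures[i] if i < len(departures) else None


# Irregular service: one coach in the morning, another one in the evening.
coach = [7 * 60 + 30, 19 * 60]

print(first_shared_departure([420, 440], coach))         # both arrive by 07:20
print(first_shared_departure([420, 440, 1150], coach))   # third agent arrives at 19:10
```

Every additional agent can only move the group's ready time later, so larger groups are more likely to miss the last departure of a sparse 24-hour timetable.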
7.4 Domain-independent and domain-specific solutions
This section discusses domain-independent and domain-specific solutions for the
travel sharing problem. A domain-independent solution can be used for solving
a similar problem in other domains, whereas a domain-specific solution uses some
special knowledge about the domain, which restricts its use to a particular problem
domain.
On the one hand, the initial and the BR phase of the developed algorithm are
domain-independent, so they can be used in other problem domains such as logistics,
network routing or service allocation. In the traffic domain, the algorithm can be
used to plan routes that avoid traffic jams or to control traffic lights. What is more,
additional constraints such as staying in one city for some time or travelling together
with a specific person can easily be added. On the other hand, the timetabling
phase of the algorithm is domain-specific.
The problem of finding shared routes using public transport could also be
solved by a domain-specific graph algorithm. There exist algorithms [9, 24, 28, 31]
that are able to find a journey with a timetable for a single agent. They are faster
than a domain-independent solution because they can use domain-specific heuristics
and data structures. However, the algorithms work only for single-agent problems
and they cannot be used in other domains. Furthermore, adding constraints to the
problem leads to changes in the graph algorithm (as opposed to a relatively simple
change of PDDL for the domain-independent solution).
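To give an idea of what such a domain-specific algorithm looks like, the following Python sketch computes the earliest arrival time of a single agent with a Dijkstra-style search over timetabled connections. It is a generic textbook-style illustration and not a reimplementation of any of the cited algorithms; the data layout and stop names are hypothetical, and it assumes that no connection arrives before it departs.

```python
import heapq


def earliest_arrival(connections, origin, destination, start):
    """Earliest arrival time at `destination`, or None if unreachable.

    connections -- dict: stop -> list of (departure, arrival, next_stop)
    start       -- time (in minutes) at which the agent is at `origin`
    """
    best = {origin: start}
    queue = [(start, origin)]
    while queue:
        time, stop = heapq.heappop(queue)
        if stop == destination:
            return time
        if time > best.get(stop, float("inf")):
            continue  # stale queue entry, a better time is already known
        for dep, arr, nxt in connections.get(stop, []):
            # A service can be boarded only if it departs no earlier than `time`.
            if dep >= time and arr < best.get(nxt, float("inf")):
                best[nxt] = arr
                heapq.heappush(queue, (arr, nxt))
    return None


# Hypothetical timetable: times are minutes after midnight.
timetable = {
    "A": [(480, 540, "B"), (500, 620, "C")],
    "B": [(550, 600, "C")],
}
print(earliest_arrival(timetable, "A", "C", 470))
```

Adding a constraint such as "travel together with a specific person" would require changing the search itself, which is the drawback discussed above.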
7.5 Summary
In this chapter, the algorithm has been discussed in terms of usage in practice and
in other problem domains. The possibility of speeding up the algorithm by parallel computing has been explored and the problems with matching a joint plan to
a timetable have been critically evaluated.
8 Conclusion
This chapter concludes the dissertation by summarising the results of the evaluation and describing the achieved improvements with respect to the related work of
McCafferty. The outcomes of the project and the future avenues of research are
described afterwards.
To begin with, the algorithm has very good scalability: it scales linearly both
with the scenario size and the number of agents. The average computation time for
12 agents in the scenario with 90 % of trains in the UK is less than one hour. Experiments have shown that the algorithm avoids the exponential blowup in the action
space characteristic of a centralised multi-agent planner. Furthermore, the computation time can be roughly halved by parallel computation in the initial and timetabling
phases.
This is a very significant improvement compared to McCafferty’s project,
where the computation time for a scenario with 3 agents and 1 % of trains in the UK
was approximately one day. Also, the problem with the quantity of data is solved
by using the relaxed and the full domain. The algorithm has not failed because of
a memory limit in any of its phases.
The algorithm is able to find single or shared journeys in the relaxed domain
for all agents in the scenario. However, only some of the groups of agents travelling
together are assigned a timetable. There are several reasons for this behaviour. First,
the relaxed domain is too simplified. Second, there are too many time constraints
in bigger groups (5 or more agents), so the timetable matching becomes unsolvable
given the 24-hour timetable. Finally, some parts of the public transportation network
have very irregular timetables.
The cost of agents’ journeys is improved by sharing parts of the journeys.
Nevertheless, there is a trade-off between the amount of improvement, the percentage
of found timetables and the prolongation of journeys. On the one hand, the bigger
the group, the better the improvement is. On the other hand, the more agents
share a journey, the harder it is to match their joint plan to a timetable. Also, the
prolongation is likely to be higher with more agents travelling together.
Changing means of transport is enabled by introducing the walking connections. This process is carried out fully automatically with the help of the PostGIS
extension for the PostgreSQL database. In comparison, in McCafferty’s project only
some walking connections were created manually, which would be infeasible for the
whole UK public transportation data.
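The idea of generating walking connections automatically can be sketched in Python as follows. This is only an illustration of the principle, not the PostGIS-based processing used in the project: it assumes that stop coordinates are given as easting and northing in metres, and the stop names, coordinates and the 500 m limit are hypothetical.

```python
from itertools import combinations
from math import hypot


def walking_connections(stops, max_distance):
    """Return (stop_a, stop_b, distance) for all pairs within max_distance.

    stops -- dict: stop name -> (easting, northing) in metres
    """
    links = []
    for (a, (ax, ay)), (b, (bx, by)) in combinations(stops.items(), 2):
        distance = hypot(ax - bx, ay - by)   # straight-line distance in metres
        if distance <= max_distance:
            links.append((a, b, distance))
    return links


# Hypothetical stops and a hypothetical 500 m walking limit.
stops = {
    "rail_station": (325000.0, 673000.0),
    "coach_stop": (325300.0, 673400.0),
    "far_stop": (330000.0, 680000.0),
}
print(walking_connections(stops, 500.0))
```

The pairwise loop is quadratic in the number of stops; for country-wide data a spatial index, such as the one a spatial database provides, is the more practical choice.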
To conclude, the results indicate that the multi-agent planning algorithm is
able to plan meaningful shared routes for all agents in a feasible time in a real-world
travel domain, as stated in the hypothesis. In this project, a scalable multi-agent
planning algorithm has been successfully designed, implemented and evaluated. The
problems in McCafferty’s approach have been overcome by performing automated data processing and using the best-response approach combined with the
relaxed and the full domain.
8.1 Outcomes
The main outcome of the project is a multi-agent planning algorithm for finding
shared routes in the travel domain. The algorithm generates partially shared travel
plans to reduce the travel costs and carbon footprint. The first two phases of the
algorithm are domain-independent and therefore, they can be reused in other problem domains. The algorithm as a whole can be easily extended to other countries
or areas.
Further outcomes of the project are the naptan2sql and nptdr2sql applications for transforming the public transportation XML data to an SQL database. The
database is supplemented with SQL functions for automated data processing. After
their application, the processed data is optimised for querying and can be reused in
projects that need usable public transportation data.
Finally, a web-based visualisation of the travel domain has been created using
the Google Maps API. It provides a better idea of the travel domain together with
the possibility of displaying shared journeys found by the algorithm.
8.2 Future avenues of research
In this section, possible improvements of the algorithm are described to provide the
avenues of further research in the travel sharing domain. They can be divided into
four distinct categories: speeding up the algorithm; improving the phases of the
algorithm; creating a shared travel planning system for use in practice; combining
public transport with car sharing.
First, the computation time can be improved by parallel computation using
several CPUs. The initial single-agent plans of the individual agents can be found independently of each other, and so can the timetables of the individual groups. Therefore, the implementation
of parallel computation is straightforward in the initial and the timetabling phase.
The potential speedup is shown in Figure 7.1.
Second, as a preprocessing step, the agents can be clustered according to their
departure time, direction of travel, origin, destination, length of journey and preferences (e.g., travel by trains only, in the shortest time or at the lowest price of the
journey). To prevent groups that are too large to be matched to
the timetable, a limit can be set on the size of the group. Alternatively, if a group plan is not
matched to the timetable, the group can be split into smaller parts which are more
likely to be assigned a timetable. Finally, the relaxed domain can be replaced by
an intermediate representation whose complexity lies between that of the current relaxed
and full domains.
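The proposed splitting of unmatched groups could be sketched in Python as follows. The sketch is one possible reading of the idea, not an implemented part of the algorithm: halving the group is an assumption (any other split could be used), `try_match` is a hypothetical stand-in for the timetabling phase, and the group is assumed to be non-empty.

```python
def match_with_splitting(group, try_match):
    """Match a group to the timetable, halving it whenever matching fails.

    group     -- non-empty list of agent identifiers
    try_match -- callable returning a timetable for a group, or None
    Returns a list of (subgroup, timetable) pairs; the timetable is None
    only for a single agent that cannot be matched at all.
    """
    timetable = try_match(group)
    if timetable is not None or len(group) == 1:
        return [(group, timetable)]
    middle = len(group) // 2
    return (match_with_splitting(group[:middle], try_match)
            + match_with_splitting(group[middle:], try_match))


def toy_matcher(group):
    """Hypothetical matcher that succeeds only for groups of at most 2 agents."""
    return f"timetable for {group}" if len(group) <= 2 else None


result = match_with_splitting(["a1", "a2", "a3", "a4", "a5"], toy_matcher)
print([subgroup for subgroup, _ in result])
```

Splitting trades some of the cost improvement for a higher chance of obtaining a timetable, in line with the trade-off between group size and timetable matching described in the conclusion.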
Third, a shared travel planning system for use in practice needs several additional features. Timetable data needs to contain timetable information for every day of
the week so the users can choose the date of their journey. A simple web-based user
interface should be created for entering the parameters of a journey and for displaying the shared journeys together with a timetable itinerary. In addition, local buses
can be added for single-agent parts of journeys so the origins and destinations are
not restricted only to train and coach stops.
Lastly, the price of travel can be significantly reduced by sharing a car (e.g.,
reduction to one fourth of the price with four people in a car). Therefore, it would
be very interesting to explore the problem of planning shared journeys when the
public transport is combined with car sharing. Then, in order to have a feasible
number of nodes in the travel domain, train and bus stops can be used as meeting
points where it is possible to change from a car to public transport or vice versa.
Bibliography
[1] NaPTAN – Stop & Stop Area Types, Department for Transport, UK, May 2009.
[Online]. Available at http://www.dft.gov.uk/naptan/stopTypes.htm. [Accessed:
June 11, 2011].
[2] EPSG Projection 27700 – OSGB 1936/British National Grid, Spatial Reference,
2011. [Online]. Available at http://spatialreference.org/ref/epsg/27700/.
[Accessed: June 21, 2011].
[3] Google Maps, 2011. [Online]. Available at http://maps.google.com/. ©2011
Google – Data map ©2011 Europa Technologies, Google, Tele Atlas. [Accessed:
July 7, 2011].
[4] PL/pgSQL – SQL Procedural Language – Overview, Chapter 39.1., PostgreSQL
Global Development Group, 2011. [Online]. Available at
http://www.postgresql.org/docs/9.0/static/plpgsql-overview.html. [Accessed: June 10, 2011].
[5] PostGIS 1.5.2 Manual, Refractions Research Inc., 2011. [Online]. Available at
http://postgis.refractions.net/download/postgis-1.5.2.pdf. [Accessed: June 10,
2011].
[6] The Google Distance Matrix API, Google Inc., 2011. [Online]. Available at
http://code.google.com/intl/cs/apis/maps/documentation/distancematrix/. [Accessed: June 11, 2011].
[7] What is the National Public Transport Data Repository (NPTDR), Department
for Transport, UK, 2011. [Online]. Available at http://tinyurl.com/nptdr-info.
[Accessed: March 8, 2011].
[8] R. I. Brafman and C. Domshlak. From One to Many: Planning for Loosely Coupled
Multi-Agent Systems. In J. Rintanen, B. Nebel, J. C. Beck, and E. Hansen, editors,
Proceedings of the Eighteenth International Conference on Automated Planning and
Scheduling (ICAPS-08), pages 28–35, Menlo Park, California USA, 2008. AAAI
Press.
[9] G. S. Brodal and R. Jacob. Time-dependent Networks as Models to Achieve Fast
Exact Time-table Queries. Electronic Notes in Theoretical Computer Science, 92:3–
15, 2004. Proceedings of ATMOS Workshop 2003.
[10] Y. Chen, B. W. Wah, and C.-W. Hsu. Temporal planning using subgoal partitioning
and resolution in SGPlan. Journal of Artificial Intelligence Research, 26:323–369,
Aug. 2006.
[11] A. J. Coles, A. I. Coles, A. Clark, and S. T. Gilmore. Cost-Sensitive Concurrent
Planning under Duration Uncertainty for Service Level Agreements. In F. Bacchus,
C. Domshlak, S. Edelkamp, and M. Helmert, editors, Proceedings of the Twenty-First International Conference on Automated Planning and Scheduling (ICAPS-11),
pages 34–41, Menlo Park, California USA, June 2011. AAAI Press.
[12] A. J. Coles, A. I. Coles, M. Fox, and D. Long. Forward-Chaining Partial-Order Planning. In R. Brafman, H. Geffner, J. Hoffmann, and H. Kautz, editors, Proceedings
of the Twentieth International Conference on Automated Planning and Scheduling
(ICAPS-10), pages 42–49, Menlo Park, California USA, May 2010. AAAI Press.
[13] A. J. Coles, A. I. Coles, M. Fox, and D. Long. POPF2: a Forward-Chaining Partial
Order Planner. In Proceedings of the International Planning Competition (IPC-7),
2011.
[14] M. Fox and D. Long. PDDL2.1: An Extension to PDDL for Expressing Temporal
Planning Domains. Journal of Artificial Intelligence Research, 20:61–124, 2003.
[15] A. Gerevini and D. Long. Plan Constraints and Preferences in PDDL3. Technical
report, Dipartimento di Elettronica per l’Automazione, Università degli Studi di
Brescia, 2005.
[16] M. Ghallab, D. Nau, and P. Traverso. Automated Planning: Theory and Practice.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.
[17] J. Hoffmann and B. Nebel. The FF planning system: Fast plan generation through
heuristic search. Journal of Artificial Intelligence Research, 14:253–302, 2001.
[18] C.-W. Hsu and B. W. Wah. The SGPlan Planning System in IPC-6. In Proceedings
of the International Planning Competition (IPC-6), 2008.
[19] A. Jonsson and M. Rovatsos. Scaling Up Multiagent Planning: A Best-Response
Approach. In F. Bacchus, C. Domshlak, S. Edelkamp, and M. Helmert, editors,
Proceedings of the Twenty-First International Conference on Automated Planning
and Scheduling (ICAPS-11), pages 114–121, Menlo Park, California USA, June
2011. AAAI Press.
[20] E. McCafferty. A Travel Sharing Application based on Multiagent Planning. Bachelor’s thesis, The University of Edinburgh, School of Informatics, Apr. 2011.
[21] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld,
and D. Wilkins. PDDL – The Planning Domain Definition Language – Version 1.2.
Technical report, Yale Center for Computational Vision and Control, 1998.
[22] N. Meuleau, M. Hauskrecht, K.-E. Kim, L. Peshkin, L. P. Kaelbling, T. Dean,
and C. Boutilier. Solving very large weakly coupled Markov decision processes. In
Proceedings of the Fifteenth National Conference on Artificial Intelligence, AAAI
’98, pages 165–172, Menlo Park, CA, USA, 1998. AAAI.
[23] D. Monderer and L. S. Shapley. Potential Games. Games and Economic Behavior,
14(1):124–143, 1996.
[24] A. Orda and R. Rom. Shortest-path and minimum-delay algorithms in networks
with time-dependent edge-length. J. ACM, 37:607–625, July 1990.
[25] L. Panait and S. Luke. Cooperative Multi-Agent Learning: The State of the Art.
Autonomous Agents and Multi-Agent Systems, 11(3):387–434, Nov. 2005.
[26] S. Richter, M. Helmert, and M. Westphal. Landmarks Revisited. In D. Fox and C. P.
Gomes, editors, Proceedings of the Twenty-Third AAAI Conference on Artificial
Intelligence, pages 975–982. AAAI Press, July 2008.
[27] S. Richter and M. Westphal. The LAMA planner. Using landmark counting in
heuristic search. In Proceedings of the International Planning Competition (IPC-6),
2008.
[28] F. Schulz, D. Wagner, and K. Weihe. Dijkstra’s Algorithm On-Line: An Empirical
Case Study From Public Railroad Transport. J. Exp. Algorithmics, 5, Dec. 2000.
[29] B. W. Wah and Y. Chen. Constraint partitioning in penalty formulations for solving
temporal planning problems. Artificial Intelligence, 170:187–231, Mar. 2006.
[30] E. W. Weisstein. Transitive Reduction, from MathWorld – A Wolfram
Web Resource, 2011. [Online]. Available at
http://mathworld.wolfram.com/TransitiveReduction.html. [Accessed: July 10, 2011].
[31] M. P. Wellman, K. Larson, M. Ford, and P. R. Wurman. Path Planning under Time-Dependent Uncertainty. In Proceedings of the Eleventh Conference on Uncertainty
in Artificial Intelligence, pages 532–539. Morgan Kaufmann, 1995.
[32] M. Wooldridge and N. R. Jennings. Intelligent Agents: Theory and Practice. Knowledge Engineering Review, 10(2):115–152, 1995.
[33] M. Wooldridge. Intelligent Agents, chapter 2 in Introduction to Multiagent Systems,
pages 15–46. John Wiley & Sons, Inc., New York, USA, 2001.