Efficient Robot Navigation with Omnidirectional Vision

Transcription

Christopher James Charles Burbridge
Faculty of Computing and Engineering
A thesis submitted for the degree of
Doctor of Philosophy
July, 2010
Contents

Acknowledgements
Abstract
Declaration

1 Introduction
  1.1 Low-Cost and Computationally Efficient
  1.2 Sensor Choice
  1.3 Research Tasks and Objectives
  1.4 Contributions
  1.5 Thesis Outline

2 Algorithms and Tools
  2.1 Introduction
  2.2 Omnidirectional Vision
  2.3 Omnidirectional Binocular Vision with Conical Mirrors
  2.4 Appearance-Based Vision
  2.5 Self-Organising Maps
  2.6 Summary

3 Literature Review: Robot Navigation
  3.1 Introduction
  3.2 Mapless Robot Navigation
  3.3 Map-based Robot Navigation
  3.4 Summary

4 Ego-Motion Estimation with Binocular Omnidirectional Vision
  4.1 Introduction
  4.2 Visual Compass
  4.3 Limitations of the Methods
  4.4 Experiments and Results
  4.5 Calculating the Translation of the Robot
  4.6 Omnidirectional Binocular Vision Without Alignment
  4.7 Summary

5 Metric SLAM with Monocular Vision
  5.1 Introduction
  5.2 Extended Kalman Filter SLAM Algorithm
  5.3 Experiments with EKF SLAM
  5.4 Metric Visual SLAM in Practice
  5.5 Summary

6 Topological Map-Based Navigation
  6.1 Introduction
  6.2 Dense Metric-Topological Mapping
  6.3 Using a SOM for Efficient Topological Mapping
  6.4 Experiments in Discrete Navigation
  6.5 Experiments in Continuous Appearance-Based Navigation
  6.6 Limitations of the Method
  6.7 Summary

7 Accurate Visual Appearance-Based Local Navigation
  7.1 Introduction
  7.2 Appearance-Based Docking Algorithm
  7.3 Experiments and Results
  7.4 Discussion
  7.5 Summary

8 Conclusion
  8.1 Thesis Summary
  8.2 Summary of Research Contributions
  8.3 Directions for Future Research

References
Acknowledgements
The first half of this work was undertaken at the University of Essex under the
supervision of Dr Libor Spacek.
The University of Essex provided an excellent
environment for research, and awarded me an EPSRC scholarship to support my
PhD. I would like to express my sincerest gratitude to Libor for introducing me to
omnidirectional imaging, to the EPSRC for funding me for the first two years of
study, and to all the numerous staff and students at the University of Essex who
assisted me.
Upon the retirement of Dr Libor Spacek, my PhD studies transferred to the University of Ulster, where Dr Joan Condell, Professor Ulrich Nehmzow and later Professor
Martin McGinnity became my supervisors. I wish to express my utmost gratitude
to Joan, Ulrich and Martin for their guidance, kindness and insight throughout my
study at Ulster. I wish to express particular thanks to Ulrich for inspiring my interest in robotics, and encouraging me to undertake further study after supervising
my undergraduate work. Without Ulrich, this thesis would not exist.
Additionally, I thank the University of Ulster Intelligent Systems Research Centre
(ISRC), directed by Professor Martin McGinnity, for funding the second half of my
PhD and providing an outstanding atmosphere for research. Special thanks are also
due to my colleagues in the ISRC robotics group for their continual support and
friendship.
I also thank my family for their constant support and encouragement. And finally,
conducting this research and writing this thesis would not have been possible without
the love, understanding and support of Emily.
Abstract
This thesis investigates vision-based robot navigation for low-cost commercially viable robots. As robots become more ubiquitous in everyday life, cheap and efficient
methods of navigation that can work on consumer level hardware with low-cost
sensors will need to be developed.
The first half of the thesis deals with sensor choice, and looks at the application of
binocular and monocular omnidirectional vision to robot navigation. A navigation
strategy that utilises binocular omnidirectional vision with conical mirrors is considered, and the problems and limitations of such a stereo system in the context
of low-cost robots with limited computational power are discussed.
Contributions are
made in the form of two new visual compass approaches alongside the formulation
of generic image projections with conical mirrors. A single monocular omnidirectional camera is then investigated as a sensor for conventional metric localisation
and mapping, and is discussed in the context of this thesis. Experimental results
are given that quantify the benefits of omnidirectional vision.
In the second half of this thesis, contributions are made in the form of new computationally efficient navigation algorithms that operate with a single low-cost monocular omnidirectional camera. In order to realise effective, efficient and comprehensive navigation, the task is divided into two aspects: approximate navigation, where an accuracy of approximately 0.5 m is achieved, and a more accurate local homing navigation, with an accuracy of 6 cm.
The approximate navigation is achieved using an appearance-based approach combined with topological map construction. The novel use of self-organising feature
maps provides efficient topological map indexing and storage. For higher accuracy
navigation, an accurate robot homing algorithm is proposed which uses the raw image perception, with minimal processing necessary to navigate the robot to a desired
location. Experimental results show the advantages of the newly proposed method
over existing image-based homing strategies.
Declaration
I hereby declare that with effect from the date on which the thesis is deposited in
the Library of the University of Ulster, I permit the Librarian of the University to
allow the thesis to be copied in whole or in part without reference to me on the
understanding that such authority applies to the provision of single copies made for
study purposes or for inclusion within the stock of another library. This restriction
does not apply to the British Library Thesis Service (which is permitted to copy
the thesis on demand for loan or sale under the terms of a separate agreement)
nor to the copying or publication of the title and abstract of the thesis. IT IS A
CONDITION OF USE OF THIS THESIS THAT ANYONE WHO CONSULTS IT
MUST RECOGNISE THAT THE COPYRIGHT RESTS WITH THE AUTHOR
AND THAT NO QUOTATION FROM THE THESIS AND NO INFORMATION
DERIVED FROM IT MAY BE PUBLISHED UNLESS THE SOURCE IS PROPERLY ACKNOWLEDGED.
1
Introduction
It is essential that all autonomous mobile robots are equipped with the ability to
navigate. Doing so without requiring expensive sensors or vast computational resources is necessary if widespread deployment of mobile robots is to be realised.
The cost of a robot and its capabilities are usually linked. In many fields this could
be considered a marketing or business strategy, but in robotics the choice of hardware
and software has a genuine impact on both cost and capabilities. Expensive high
precision sensors and large amounts of computing power are essential for state of the
art navigation algorithms. While these algorithms could potentially allow a robot
to achieve more elaborate tasks, the costs involved prevent widespread deployment.
It could be argued that algorithms developed now which operate on high end PCs
will be applicable to small low-cost robots in the future, and therefore any effort to create computationally efficient algorithms is essentially pointless. This, however, is a
very weak argument. Companies such as iRobot and Friendly Robotics have shown
that consumers are ready for robots now, and in recent years there have been an
increasing number of robots targeted at consumers. For some time robots such as
iRobot's Roomba vacuum cleaner and Friendly Robotics' RoboMow lawn mower have
been available, and increasing in popularity [41]. However, major factors limiting
their more widespread use are (a) their cost and (b) their limited capabilities. New
navigation algorithms that deliver accurate navigation using fewer resources and with
a single low-cost sensor will allow these limiting factors to be addressed.
Indeed, the very recent release of a new vacuum cleaning robot by Samsung in April 2010¹ illustrates the potential for improved navigation algorithms.
The new consumer
robot, named NaviBot, is claimed to make use of visual mapping to allow more
advanced cleaning trajectories to be taken.
In this thesis, new navigation algorithms are developed that utilise a low-cost sensor
and impose limited requirements on the processing power available. If robots are
to be deployed in everyday environments, efficient algorithms such as these are
essential.
1.1
Low-Cost and Computationally Efficient
Throughout this thesis, the terms "low-cost" and "computationally efficient" are used.
When referring to something as low-cost, a comparison to another more
expensive item or to a pre-established price that is considered expensive is implied.
In this thesis, low-cost is used to refer to the total cost of a hypothetical robot, and is considered in comparison to typical low-cost commercial robots such as the Roomba².
In order for a robot to be low-cost and still be capable of performing navigation, it
must be computationally efficient. This means that it must make use of the limited processing power available to it in an efficient way. In this thesis, the term computationally efficient is used to indicate that an algorithm requires little computation, and that a current low powered embedded computer such as the Asus Eee-Box³ would be sufficient. Such computers are currently available to consumers at less than £200.

¹ Samsung's Navibot robot vacuum charting European living rooms in April. Engadget Online: http://www.engadget.com/2010/02/23/samsungs-navibot-robot-vacuum-charting-european-living-rooms-in/, 2010. Accessed 29th June 2010.
² A Roomba robot vacuum cleaner currently costs from £250 – £400.
³ The Asus Eee-Box is a 1.6GHz x86 PC with 1GB RAM and 160GB HDD, with dimensions 22.2cm (W) x 2.69cm (D) x 17.8cm (H) and weighing 1.0kg.
1.2
Sensor Choice
There are a number of sensors available to mobile robots, each with advantages
and disadvantages. A review of sensors for navigation is provided by Borenstein et
al. [12], with a lot of emphasis given to inertial motion units (IMUs) and the global
positioning system (GPS). Whilst these sensors are capable of aiding localisation,
they have limited applicability for tasks other than navigation.
The overall choice of a sensor for low-cost robots should not only be driven by the
task of navigation. A versatile sensor will provide enough information not just for
navigation, but for other tasks such as recognition. This then allows the creation
of a single sensor robot, which in turn can result in a lower priced robot.
On
the other hand, this will not be the case if the cost of a single universal sensor is
more than an array of specialised sensors. This suggests that two properties need to
be considered when selecting a sensor: physical cost and general purpose sensing ability.
The possible choices of sensor, and their advantages and disadvantages are as follows:
• Range Sensors: Laser, sonar and infrared are popular general purpose sensors for robotics. A laser range finder typically costs about £2000 and delivers accurate information in non-reflective environments. Sonar and infrared sensors have lower accuracy, but are generally a lot cheaper, ranging from £10 – £20. In order to obtain a truly versatile range sensor capable of sensing more
than distance on a plane, a 3D laser scanner can be chosen. However, the cost
of such a sensor will prevent its use as a sensor for a cheap robot.
• Navigation Sensors: Compasses, shaft encoders and motion sensors are all examples of non-versatile sensors which are designed and used only for the purpose of navigation. A single sensor robot equipped with such a sensor
would be unable to perform additional tasks such as object recognition or
obstacle avoidance.
• Vision: The cost of vision sensors has fallen dramatically in recent years, and now a reasonable webcam can be purchased for £30. Such cameras provide adequate sensing for most robotic tasks. However, the interpretation of
the sensor information conventionally requires computationally expensive algorithms, leading to the requirement of more advanced, and therefore more
expensive, on-board computing.
Vision sensors are low-cost and yet the most versatile of all sensors. Omnidirectional
vision is even more versatile in that it provides information about the entire surroundings of the robot, in more than one direction. Furthermore, omnidirectional
imaging no longer requires expensive equipment; miniature fisheye lenses that can be used with low-cost webcams can now be bought for less than £50.
This thesis demonstrates that omnidirectional vision can be used effectively with
little computation, thus providing a low-cost navigation system. Whilst the scope of
this thesis is limited to navigation, elsewhere in the literature omnidirectional vision
has been shown to be effective not only for navigation, but for other tasks such as
obstacle avoidance [77] and people tracking or following [65], further demonstrating
the versatility of this sensing modality. Therefore, omnidirectional vision is the best
choice of sensor for an economical single sensor robot.
1.3
Research Tasks and Objectives
The overall aim of this thesis is to develop an indoor mobile robot navigation system
that works with a single low-cost omnidirectional camera and without large computational requirements. In order to achieve this, the following tasks and objectives
were formed:
1. To investigate omnidirectional vision as a sensor for low-cost mobile robot
navigation;
2. To achieve approximate navigation (1 metre accuracy) with a low-cost camera
and an economical algorithm suitable for low-cost mobile robots;
3. To achieve efficient visual docking to be carried out after the approximate
navigation;
4. To quantitatively test the constructed navigation systems using external ground
truth data.
The first objective is met in Chapters 4 and 5, with an evaluation of the use of
omnidirectional stereo, and a comparison and testing of conventional metric SLAM
using both omnidirectional and classical vision.
The second and third objectives
are the subject of Chapters 6 and 7 respectively. Objective 4 is carried out in each
chapter using an external state of the art motion tracking system.
1.4
Contributions
The novel contributions of this thesis are as follows.
• An algorithm for instantaneous ego-motion estimation between two
consecutive video frames.
Two new rotation estimation algorithms are proposed and compared to a well
known existing method.
• The formulae for the generic omnidirectional projection with a conical mirror.
Previously all formulations of the projection with a conical mirror assumed
perfect alignment [112], or other limiting restrictions. Generic projection formulae are presented in Chapter 4 that allow arbitrary mirror and camera
placement.
• A quantitative comparison of monocular omnidirectional vision and
classical monocular vision for metric visual simultaneous localisation
and mapping.
Although there are many papers in the literature that present implementations of well known monocular vision SLAM algorithms (see Chapter 3.3.1),
a quantitative look at how the accuracy of such systems can be improved by
using omnidirectional vision had not previously been presented.
• A novel algorithm for approximate navigation that efficiently uses
visual appearance-based topological mapping.
The algorithm proposed here runs in real-time on standard hardware and is
ideally suited for use on low-cost robots. Quantitative testing of the algorithm
using external ground truth shows good performance (less than 1 metre error).
• A novel algorithm for accurate visual homing that exploits the properties of images taken from an indoor robot.
Like existing algorithms, the snapshot model by Cartwright and Collett [18,19]
is used. However, unlike existing algorithms such as that by Franz et al. [40],
the properties of an indoor mobile robot have been exploited to create a novel
algorithm that is fast, ecient and accurate. Quantitative testing of the proposed algorithm using external ground truth shows good performance (approximately 6 centimetre error).
• A quantitative evaluation of the algorithms developed.

1.5
Thesis Outline
The remainder of this thesis is organised as follows: related work and background
material are discussed in the following two chapters.
Key image processing and
machine learning algorithms are detailed in Chapter 2, and existing approaches to
visual navigation are described in Chapter 3.
In Chapter 4 the use of binocular
omnidirectional vision with conical mirrors is investigated.
Two new rotation estimation algorithms are proposed and compared to a well known existing method,
and the limitations and problems with omnidirectional stereo vision are addressed.
Chapter 4 shows that catadioptric omnidirectional stereo is unsuitable as a cheap
sensor. Chapter 5 subsequently introduces the use of monocular omnidirectional vision. Davison's monocular vision based SLAM algorithm is investigated as a solution
to navigation.
The advantages brought about through the use of omnidirectional
vision are quantified and demonstrated, and the potential application in the context
of this thesis is discussed.
A new navigation strategy that utilises a low-cost fisheye camera to achieve an efficient appearance-based, navigation-driven representation of the robot's working
environment is presented in Chapter 6.
Experiments are performed to show the
effectiveness of the approach, and its limitations are discussed. A complementary
navigation algorithm is developed in Chapter 7 that provides higher accuracy at the
expense of the range of operation, and is thus more suitable for application in robot
docking scenarios.
Finally, this thesis is concluded in Chapter 8 with a summary of the work presented
and suggestions for future research.
2
Algorithms and Tools
2.1
Introduction
This chapter provides details of existing algorithms and methods that are directly
used in this thesis.
Specific topics of interest are catadioptric and dioptric omnidirectional vision, omnidirectional stereo vision, appearance-based vision, and self-organising maps.
2.2
Omnidirectional Vision
Omnidirectional vision, often referred to as panoramic vision, is vision with a wide
field of view. In general the field of view is not truly omnidirectional, but rather a 360 degree horizontal view with a more limited vertical field of view [91].
The
key advantage of omnidirectional vision over classical narrow field of view vision
is the increased area covered in an image. In robotic applications, this allows the
robot to see in all directions, which has been shown to improve the ability to
localise and navigate [125], and to be able to track objects [115].
Omnidirectional
vision is versatile and thus ideally suited as a sole sensor on an economical robot and
is therefore the sensor of choice in the methods developed in this thesis. However,
the increased field of view acquired in omnidirectional images comes at a cost in the form of spatial resolution. When using a single camera to capture a wide field of view, the spatial resolution is directly affected by the field of view that is captured. The captured scene is spread over a fixed-size image sensor, and when a large area
is captured, fewer pixels are used to capture the same area. A full investigation of
spatial resolution in omnidirectional images is provided in [91].
There are many ways of capturing an omnidirectional image.
These can be split
into catadioptric [47] and dioptric. In a dioptric (refracting) system, the scene is
imaged directly through a lens without any reflection (catoptric). In a catadioptric arrangement the scene is first reflected in a mirror before being captured by a dioptric
camera.
Examples of dioptric approaches include using multiple cameras [116], a
spinning camera [53] and a fisheye lens [85]. Catadioptric approaches include using a
pyramid of planar mirrors [117], rotating planar mirrors, and rotationally symmetric
mirrors viewed along the axis. A full review of catadioptric approaches is provided
in [92]. The most important differences between the approaches are:
1. The viewpoint geometry
A single effective viewpoint is generally considered desirable because, as explained in [43], it allows many classical computer vision approaches built
around a pinhole model to be directly applied with omnidirectional vision.
As shown in Figure 2.1, this occurs when all pixels in the image are the result
of a projection ray that passes through a common 3D point. Most imaging arrangements provide a single effective viewpoint, with the exception of multiple
camera setups, and catadioptric vision using a conical or spherical mirror [8].
2. The field and direction of view
As mentioned above, omnidirectional vision does not usually provide a 360◦ view in all directions, but just along the horizontal. The vertical resolution varies between approaches, but the highest resolution is offered by multiple camera arrangements such as Immersive Media's Telemmersion¹ system, which combines 11 cameras to obtain a 360◦ omnidirectional image.

¹ Dodeca 2360 Camera System. http://www.immersivemedia.com/. Accessed 29th June 2010.
Figure 2.1: A catadioptric imaging system has a single effective viewpoint when all projection rays prior to reflection intersect at a common point. As shown in this diagram, omnidirectional imaging with a hyperbolic mirror exhibits this property.

3. The cost of the system
In the context of this thesis, both the physical and computational cost of the
approach are important. The physical cost varies between approaches, with
home built solutions such as that by Francis et al. [38] costing less than £50 and
high resolution commercial solutions such as Immersive Media's Telemmersion
system costing over £5000.
In this thesis two forms of omnidirectional imaging are used: conical mirrors, and a
fisheye lens.
2.2.1 Conical Mirrors
There are several types of rotationally symmetric mirrors that can be used in a
catadioptric setup.
Parabolic, hyperbolic and ellipsoidal mirrors can be used to
achieve a single effective viewpoint. However, as pointed out in [78], using a cone
mirror is preferable as there is no radial distortion and the camera is not reflected in the mirror. In this thesis a conical mirror with a 90◦ apex is used, as suggested by Spacek [112].
Figure 2.2: Omnidirectional vision with a conical mirror. (a) The image projection
with a conical mirror viewed along the axis.
(b) An example catadioptric image
taken using a conical mirror. (c) The corresponding catadioptric image unrolled.
Using a 90◦ apex conical mirror as shown in Figure 2.2 results in a cylindrical
projection of the surroundings of the robot. Figure 2.2a shows the geometry of the
projection, demonstrating the multiple effective viewpoints that result. A circle of
viewpoints is formed, where each vertical line in the unrolled image (Figure 2.2c) is
the result of a different viewpoint. Each vertical line of pixels in the unrolled image
is equivalent to a radial line in the catadioptric image (Figure 2.2b).
Although the projection does not provide a single effective viewpoint, Lin and Bajcsy [78] suggested that a single effective viewpoint can be obtained by placing the camera at the tip of the cone. Unfortunately imaging the cone from the tip leads to difficulties focusing the lens and capturing enough light. Moreover, a single effective
viewpoint was shown to be unnecessary by Spacek [112]. The projection formulae
of a conical mirror were derived for the case where the camera is positioned on the
axis of the cone at an arbitrary distance from the tip.
A cost-effective robot can be constructed that uses a conical mirror to achieve
omnidirectional vision, as demonstrated by Francis et al. [38]. A good quality conical
mirror was economically produced using a lathe and a block of aluminium, and
the advantages relating to the construction of cones over alternative mirror shapes
were discussed. The conical mirror was then used in an optical flow based obstacle
avoidance algorithm.
2.2.2 Fisheye Lenses
A fisheye lens (developed in 1964 by Miyamoto [86]) is a special type of camera
lens that can be used to obtain up to a hemispherical image in a purely dioptric
setup as shown in Figure 2.3. A true fisheye lens captures a 180◦ hemisphere as a circle on the image plane, which may be shown full frame when coupled with a small image sensor resulting in a field of view of 180◦ only on the diagonal direction.
In such an arrangement there will be a lower view angle vertically and horizontally.
Conventionally fisheye lenses are considered expensive, large, heavy, complicated and unsuitable for mobile robot applications, with lenses only designed for expensive
photographic equipment and with little interest in the robotics community [124].
Figure 2.3: Omnidirectional vision with a fisheye lens. (a) The Unibrain Fire-i webcam [121] used to capture images. (b) The miniature M12 mount fisheye lens.
(c) An image captured with the camera and lens. (d) An undistorted version of (c)
using the calibration algorithm in [107].
However, in the last 5 years miniature fisheye lenses have been developed that operate with inexpensive webcams like that shown in Figure 2.3 and used in Chapters 6 and 7. These miniature fisheye lenses can be bought for less than £50, and are
gaining popularity in robotics applications.
A fisheye lens does not produce a single viewpoint projection, and the image distortions introduced are impossible to correct exactly. However, it does very closely
approximate a single viewpoint and omnidirectional camera calibration algorithms
have been developed that allow a near-perspective image to be created from the
warped image. One such algorithm is that proposed by Scaramuzza et al. [107], in
which the camera projections are modelled with polynomials fitted using checkerboard point correspondences². The general form of the projection model allows
for the fact that the fisheye lens does not produce images with a perfectly single viewpoint. In Chapter 7 this approach is used to rectify a fisheye image, creating a
perspective image like that shown in Figure 2.3d.
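To make the polynomial model concrete, the sketch below shows how a Scaramuzza-style calibration can back-project pixels to viewing rays and, by numerically inverting the polynomial, map rays back to fisheye pixels when building a perspective view. The coefficient values, principal point and search range are illustrative placeholders only, not the calibration of the camera in Figure 2.3.

    import numpy as np

    # Placeholder Scaramuzza-style model: f(rho) = a0 + a1*rho + a2*rho^2 + a3*rho^3 + a4*rho^4.
    # These coefficients and the image centre are illustrative only.
    COEFFS = np.array([-180.0, 0.0, 1.2e-3, 0.0, 1.1e-7])   # a0 .. a4
    CENTRE = np.array([320.0, 240.0])                        # principal point (pixels)

    def pixel_to_ray(u, v):
        """Back-project a fisheye pixel to an (unnormalised) 3D viewing ray."""
        x, y = u - CENTRE[0], v - CENTRE[1]
        rho = np.hypot(x, y)
        z = np.polyval(COEFFS[::-1], rho)                    # polyval wants highest power first
        return np.array([x, y, z])

    def ray_to_pixel(ray, rho_max=400.0, n_steps=4000):
        """Invert the model numerically: find rho where f(rho)/rho matches the ray slope."""
        x, y, z = ray
        r_xy = np.hypot(x, y)
        rhos = np.linspace(1e-3, rho_max, n_steps)
        errs = np.abs(np.polyval(COEFFS[::-1], rhos) / rhos - z / r_xy)
        rho = rhos[np.argmin(errs)]
        return CENTRE + (rho / r_xy) * np.array([x, y])

    # Round trip on one pixel; building a rectified image repeats ray_to_pixel for
    # every pixel of the desired perspective view and samples the fisheye image there.
    print(ray_to_pixel(pixel_to_ray(400.0, 300.0)))          # approx. [400, 300]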
2.3
Omnidirectional Binocular Vision with Conical Mirrors
Placing two of the conical mirrors described in Section 2.2.1 in a co-axial arrangement as shown in Figure 2.4 allows stereopsis to be achieved. The range, r, to an object can then be calculated using the pair of omnidirectional images in much the
same way as in classical forward looking stereo vision. This elegant arrangement of
conical mirrors was proposed by [112], who provided the formulae for stereopsis. A
method of simulating omnidirectional images from a conical mirror was presented
by [17], and the co-axial omnidirectional stereopsis with conical mirrors was verified
using the simulation.
Although alternative approaches to achieving omnidirectional stereo vision exist
(for a review see [130]), this configuration leads to simple projections and epipolar
constraints. The formulae given in [112] for calculating the range are:
r = v·s / (h_i1 − h_i2) − d,    (2.1)

where d is the distance from the camera lens to the tip of the mirror, s is the distance between the mirrors, v is the distance of the image behind the lens of the camera and h_i1, h_i2 are the image distances in the upper and lower images between the tip of the cone in the image and the object in the image. The difference (h_i1 − h_i2) is the radial disparity between the imaged object positions as shown in Figure 2.4.
² Note that the fisheye calibration in [107] differs from regular camera calibration in that the projection function is being estimated, not only the distortion parameters as in [120].
Figure 2.4: The vertically aligned coaxial omnistereo configuration. (a) The arrangement of the mirrors and the symbols used in the formulae. (b) A robot equipped
with two coaxial arranged mirrors.
To convert between pixels and millimetres, v is calculated as

v = (d/R + 1)·r_m,    (2.2)

where R is the radius of the mirrors in millimetres, and r_m is the radius of the mirror in the image.
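As an illustration of how Equations (2.1) and (2.2) are applied once the radial image positions of an object have been measured, the sketch below computes the range directly from the formulae. The numeric values in the example call are placeholders, not measurements from the rig in Figure 2.4.

    def coaxial_stereo_range(h_i1, h_i2, d, s, R, r_m):
        """Range to an object from coaxial conical-mirror stereo, Eqs. (2.1) and (2.2).

        h_i1, h_i2 : radial image distances (pixels) from the imaged cone tip to the
                     object in the upper and lower images.
        d          : distance from the camera lens to the mirror tip (mm).
        s          : distance between the two mirrors (mm).
        R          : mirror radius (mm);  r_m : mirror radius in the image (pixels).
        """
        v = (d / R + 1.0) * r_m            # Eq. (2.2): image distance behind the lens, in pixels
        disparity = h_i1 - h_i2            # radial disparity between the two views
        if disparity <= 0:
            raise ValueError("object must lie in the common region seen by both mirrors")
        return v * s / disparity - d       # Eq. (2.1): range in mm

    # Placeholder values only (not measurements from the rig in Figure 2.4).
    print(coaxial_stereo_range(h_i1=95.0, h_i2=80.0, d=300.0, s=150.0, R=100.0, r_m=200.0))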
The distance to an object can only be computed if the object lies in the common
region viewed by both mirrors. A full comparison of mirror arrangements is provided
in [15], where the primary reasons for preferring a co-axial arrangement over a
horizontal side-by-side setup are stated as:
1. Arranging co-axially allows a 360◦ view to be unoccluded by the second mirror.
2. The epipolar constraint in a stereo vision system refers to the relationship
between corresponding feature positions in the two images. When using a coaxial conical mirror arrangement the epipolar constraint can be formulated as
corresponding radial lines, allowing well constrained correspondence matching.
Equations (2.1) and (2.2) apply for conical mirrors with a 90◦ angle at their apex,
and are critically dependent upon the precise alignment of the mirrors and cameras
as discussed in Chapter 4.6.
2.4
Appearance-Based Vision
Appearance-based vision is a subcategory of machine vision in which the geometry
of the viewed scene is not considered, or indeed reconstructed. Only the appearance
of the scene in the form of the pixel intensities returned from the imaging device
is used when interpreting the image. A common application of appearance-based
vision methods is image retrieval, in which a database of images is queried with a
sample image and any images similar to the sample image are retrieved [55].
Approaches can be split into two categories: local appearance-based methods and
global appearance-based methods.
Local methods use the appearance of local patches in the image, either by splitting
the image into specied regions or into regions around detected features. An example
is the bag-of-words paradigm [110], in which a collection of previously seen image
features form a "vocabulary". The presence of the "words" in the image becomes
equivalent to words in a string, and well known text retrieval methods are applied
to search an image database for similar images.
Global methods use the entire image as a whole, either using the intensity values of
the pixels directly or by building a global image descriptor such as an edge intensity
histogram.
An example is the work of Gaspar et al. [42], in which raw greyscale
images are compressed using principal component analysis, and the projection of
each image into the lower dimensional subspace is used as a global image descriptor
for image comparison.
In this thesis both global and local appearance-based methods are used. In Chapters
4, 6 and 7 global appearance methods are used in the form of gradient magnitude
images, while in the second half of Chapter 6 local appearance methods are used in
the form of simple coloured blobs. The coloured blobs are detected as explained in
the following section.
2.4.1 Blob detection with Opposing Colour Channel Coding
As part of a larger visual attention mechanism, Itti et al. [54] proposed emphasising
the channels of an RGB image.
By emphasising the colour channels of an RGB
image using opposing colour channel coding, red, green and blue objects can be located more easily [95]. The formulae used in [54] were as follows:
R′ = r − (b + g)/2    (2.3)
G′ = g − (r + b)/2    (2.4)
B′ = b − (r + g)/2    (2.5)

where R′, G′ and B′ are the emphasised red, green and blue channels and r, g and b are the red, green and blue channels of the original image. In Chapter 6 the R′, G′ and B′ channels are thresholded resulting in pure red, green and blue objects being detectable as blobs.
This is not a generic blob detection strategy suitable for an
end product, and indeed the necessary threshold depends upon the ambient lighting
conditions and must be manually selected and changed regularly. However, it works
well for situations where the way in which the blobs are detected is unimportant.
In Chapter 6 coloured blobs are detected and used for navigation with a self-organising map.
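A minimal sketch of this detection step is given below, assuming an RGB image held as a NumPy array: the opposing colour channel coding of Equations (2.3)–(2.5) followed by a manually chosen threshold. The threshold value and the random test image are placeholders; as noted above, the threshold has to be tuned to the ambient lighting.

    import numpy as np

    def opposing_colour_channels(image):
        """Emphasise R, G and B using opposing colour channel coding, Eqs. (2.3)-(2.5)."""
        rgb = image.astype(np.float32)
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        r_p = r - (b + g) / 2.0
        g_p = g - (r + b) / 2.0
        b_p = b - (r + g) / 2.0
        return r_p, g_p, b_p

    def detect_blobs(image, threshold=60.0):
        """Threshold the emphasised channels; the threshold is scene dependent."""
        return [(c > threshold) for c in opposing_colour_channels(image)]

    # Placeholder usage on a random image; a real image would come from the camera.
    red_mask, green_mask, blue_mask = detect_blobs(
        np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8))

In practice the resulting binary masks would then typically be grouped into connected components to obtain blob positions for the navigation experiments.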
2.5
Self-Organising Maps

Figure 2.5: The architecture of a self-organising map.
The Self-Organising Map (SOM) [66–68] is a vector quantisation and clustering algorithm that creates a topology preserving mapping of an n dimensional vector space onto a p dimensional discrete map space.
The map space usually has one
or two dimensions, resulting in SOMs that can be displayed and used for data
visualisation [126], and is formed by a number of nodes (often also referred to as
neurons or units) evenly spaced on a grid as shown in Figure 2.5.
Each node has a weight vector representing an n dimensional cluster centroid. When
a new input vector is presented to the SOM, the weight vectors of each node are
compared to the input vector. The output of each node is calculated as the Euclidean
distance between the weight vector and the input vector, and the node that best
matches the input pattern is considered as the output of the SOM.
The topology preserving property of the SOM algorithm ensures that close vectors in the n dimensional input vector space are mapped to close nodes within the p
dimensional SOM. It is the fundamental contribution of the algorithm over alternative methods of unsupervised clustering such as k-means. As such, the algorithm
has been used extensively since its creation in 1981, with applications ranging from
medical diagnosis [23] to the prediction of bankruptcy [63]. A bibliography of papers
using or analysing SOMs between 1981 and 2001 cites over 5000 papers [57, 97].
Training of a SOM involves a form of competitive learning.
The weight vectors
of each node are initialised randomly, and then for every epoch each vector in the
training set is in turn classied with the SOM. The node whose weight vector best
represents the training vector and its neighbouring nodes are adapted so that their
weight vectors are brought closer to the input vector. Adapting not only the best
matching node but also the neighbours of that node ensures that the topology of
the training vectors is preserved. Figure 2.5 shows an update neighbourhood of size √2, where the map is wrapped around so that the edges of the map are neighbours (forming a torus shape).
The weight vector w of each node selected for updating is changed according to Equation (2.6):

w_{t+1} = w_t + η_t (x − w_t),    (2.6)

where x is the training vector, η is the learning rate and t is the epoch number.
The learning rate and the neighbourhood function for selecting nodes for updating
are reduced at each epoch exponentially according to Equations (2.7) and (2.8):
η_{t+1} = η_0 e^(−r²/R²);    (2.7)
R_{t+1} = R_0 e^(−10t²/λ²),    (2.8)

where nodes that fall within a radius r ≤ R of the best matching node are selected for updating, λ is the number of epochs that will be taken (500 in the experiments reported in this thesis), t is the current epoch number, R_0 is initialised to the radius of the SOM and η_0 is initialised to 0.9. Thus the learning rate starts at 0.9, and the amount that the node weight vectors are changed is reduced as the distance from the winning node increases and the training progresses.
Figure 2.6 shows a SOM algorithm used to cluster 1000 random RGB triplets generated with a uniform distribution. The training was carried out as above with a 30 × 30 map, and the final SOM was saved as an image with each node represented as a grid location. The squares were coloured with the RGB value corresponding to the node's weight vector. It can be seen that close colours are mapped to close locations on the SOM, preserving the topology of the higher dimensional input space on the lower dimensional map.

Figure 2.6: A 30 × 30 SOM trained with 1000 randomly generated RGB triplets. The topology preserving property of the SOM algorithm is evident in the grouping of similar colours.
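The sketch below reproduces this experiment under the training scheme described above (Equations (2.6)–(2.8)). Taking R_0 as half the map width, the toroidal wrap-around and the random seeds are assumptions made for this illustration rather than details taken from the thesis experiments.

    import numpy as np

    def train_som(data, width=30, height=30, epochs=500, eta0=0.9, seed=0):
        """Train a 2D SOM on n-dimensional vectors, following Eqs. (2.6)-(2.8)."""
        rng = np.random.default_rng(seed)
        n_dim = data.shape[1]
        weights = rng.random((height, width, n_dim))       # random initialisation
        grid_y, grid_x = np.mgrid[0:height, 0:width]
        R0 = max(width, height) / 2.0                      # assumed "radius of the SOM"
        R = R0
        lam = float(epochs)                                # lambda: total number of epochs

        for t in range(epochs):
            for x in data:
                # Best matching unit: node with the smallest Euclidean distance.
                dists = np.linalg.norm(weights - x, axis=2)
                by, bx = np.unravel_index(np.argmin(dists), dists.shape)
                # Toroidal grid distance so that the map edges are neighbours.
                dy = np.minimum(np.abs(grid_y - by), height - np.abs(grid_y - by))
                dx = np.minimum(np.abs(grid_x - bx), width - np.abs(grid_x - bx))
                r = np.sqrt(dx ** 2 + dy ** 2)
                # Eq. (2.7): the update strength falls off with distance from the winner.
                eta = eta0 * np.exp(-r ** 2 / R ** 2)
                mask = (r <= R)[..., None]                 # only nodes within radius R
                # Eq. (2.6): move the selected weight vectors towards the input.
                weights += mask * eta[..., None] * (x - weights)
            # Eq. (2.8): shrink the neighbourhood radius as training progresses.
            R = R0 * np.exp(-10.0 * t ** 2 / lam ** 2)
        return weights

    # The Figure 2.6 example: 1000 uniformly random RGB triplets on a 30 x 30 map
    # (fewer epochs than the 500 used in the thesis, purely to keep the demo quick).
    som = train_som(np.random.default_rng(1).random((1000, 3)), epochs=20)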
2.6
Summary
In this chapter omnidirectional vision, appearance-based vision and self-organising
maps have been introduced.
The key tools and algorithms that are used in this
thesis are as follows:
• Omnidirectional imaging is versatile and ideally suited as a sole sensor on an
economical robot. It is therefore used throughout this thesis.
• Conical mirrors are an appealing way to obtain an omnidirectional image catadioptrically because the direction of view is over a useful area and there is no
radial distortion. In Chapter 4 conical mirrors are used in a visual compass
algorithm.
• Research suggests that omnidirectional stereo with conical mirrors is reasonably straightforward. In Chapter 4 omnidirectional stereo with conical mirrors
is considered and new formulae are derived to remove the strict alignment constraint in Spacek's formulation [112].
• Fisheye lenses have recently become available for small inexpensive cameras,
and methods of rectifying the distortion have been developed. In Chapters 6
and 7 a fisheye lens is used as part of an alternative, low-cost omnidirectional
camera.
• Appearance-based vision does not try to reconstruct the geometry of the environment, and relies only on the captured appearance of the environment. In
Chapters 6 and 7 appearance based vision is carried out to utilise raw sensor
information in an efficient way.
• SOMs are popular tools for vector quantisation and clustering, with a useful
topology preserving property. In Chapter 6 the topology preserving property
is exploited to index a topological environment map.
Having now introduced the tools utilised in the work presented in this thesis, in
the next chapter related work in visual robot navigation is considered and the work
presented in Chapters 4 to 7 is placed in context.
3
Literature Review: Robot Navigation
3.1
Introduction
Mobile robot navigation can be divided into three intertwined competences [94]:
• Self-localisation: the ability of the robot to calculate its location in an environment,
• Path planning: the process of choosing a route from the current location to
a desired location,
• Map building and map interpretation: the construction of some form of map
of the environment, and the interpretation of the map to allow path planning
and localisation.
Review papers that cover the full range of approaches to visual navigation have been
provided by DeSouza and Kak in 2002 [28] and by Bonin-Font et al. in 2008 [11]. In
both reviews, self-localisation and path planning are categorised as either map-based
or mapless; in this chapter the same classification is adopted.
As stated in Chapter 1, the overall aim of this thesis is to develop an indoor mobile
robot navigation system that works with a single low-cost omnidirectional camera
and without large computational requirements.
In light of this, key approaches
to mobile robot navigation that are of interest are identified and discussed in this
chapter.
Firstly mapless navigation strategies are discussed, in particular visual
odometry estimation algorithms and image based homing methods. Secondly map-based navigation approaches are considered, specifically simultaneous localisation
and mapping, and topological map-based navigation. The chapter is concluded in
Section 3.4 with a summary of the navigation approaches considered in this thesis.
3.2
Mapless Robot Navigation
Mapless navigation involves using the perception of the robot to navigate to a goal,
without relying on known landmarks or complicated online map building algorithms.
Mapless navigation strategies are appealing for low-cost robots where advantages
are gained by not requiring map storage, look-up, creation or maintenance.
Two
forms of mapless navigation investigated in this thesis are image based homing, and
dead-reckoning. Dead-reckoning (discussed in Section 3.2.1) attempts to maintain
a metric localisation without resorting to a map, by integrating robot ego-motion
estimates over time. Image based homing (detailed in Section 3.2.2) aims to navigate
a robot to a target image based solely on comparisons between the current and target
images.
3.2.1 Robot Ego-Motion and Visual Odometry
In the context of robotics, ego-motion is the motion of a robot with respect to a
static environment.
When a map of the environment is provided the ego-motion
and absolute position of the robot can be calculated. When no a priori knowledge
of the environment is available and no map is constructed, global positioning is not
possible; in this case the location estimation becomes incremental and prone to drift
error.
Despite this drift error, mapless visual odometry strategies are still useful,
particularly for low-cost robots where advantages are gained by not requiring a map.
The robot's ego-motion is frequently calculated by using wheel encoders and knowledge of the mechanics of the robot. However, wheel slippage causes inaccuracies in
the estimates. An alternative solution is to use visual odometry, where the wheel
encoders are supplemented or replaced by a camera. A successful example of the
use of visual odometry is demonstrated on the Mars Rovers, where it has been used
successfully for over five years [81] to cope with extreme wheel slippage on soft 3D
terrain.
It is common for the ego-motion of a robot to be computed by separately calculating
the rotation and translation estimates.
Several researchers have developed visual
compass algorithms, as detailed in Section 3.2.1.1. Methods of translation estimation
are covered in Section 3.2.1.2.
3.2.1.1
Visual Compasses
It has been shown that the polarisation pattern of sunlight is used by animals to
provide a compass for navigation [105].
Taking inspiration from this discovery,
Lambrinos et al. [75] developed a visual compass using an array of photo-diodes to
find the polarisation pattern of sunlight. A similar approach was taken by Usher et
al. [122], who developed a visual compass that uses a colour video camera combined
with a polarising filter to detect the polarisation of the sun. Although interesting
from a biological point of view, application is limited to outdoor robotics and requires
a dedicated sensor to detect the light polarisation.
Albert and Conell [3] demonstrated an algorithm using a forward looking camera
to estimate rotation.
In their approach vertical and horizontal lines are matched
between the current video frame and a key frame, which is captured at the start
of the traversal and replaced after every 30 degrees of rotation.
A brute force
search is used to find the rotation of the robot that results in the maximum feature
correspondence between frames. The best rotation is then taken as the estimate of
the robot rotation. While they show that the approach is computationally efficient,
the method relies on the presence of vertical and horizontal features, and requires
an accurately calibrated camera so that straight lines can be detected.
Other approaches that also require local image features were proposed by Sturm
and Visser [113] and by Montiel and Davison [90]. In [90] a Bayesian simultaneous
localisation and mapping solution based on the extended Kalman filter is used to
provide a three dimensional visual compass. A greyscale monocular camera with a
90 degree lens is used to map distant point features, which are matched to provide
an incremental estimate of the camera orientation and angular velocity. Unlike the
SLAM solutions detailed in Section 3.3.1, the point features are assumed to be at
an infinite distance, allowing translation to be ignored and 3D reconstruction to be
avoided.
Sturm and Visser [113] also employed a Bayesian filter to produce a visual compass.
Their appearance-based approach uses a colour monocular forward looking camera,
with which a cylindrical map of colour transition frequencies is constructed. The
cylindrical map is constructed by rotating the robot on the spot and recording
the changes in pixel colours along the horizontal lines in the images.
The pixel
colour transitions are assumed to result from infinitely distant objects, allowing
robot translation to be neglected.
Labrosse [73] proposed an incremental, purely mapless appearance-based method
of calculating a robot's rotation using omnidirectional images. The rotation of the
robot between frames is determined by nding the image patch corresponding to
the expansion and contraction fields of the current image within the previous image.
This is achieved by minimising the Euclidean distance between the previous image
and a rotated copy of the current image with respect to the rotation. Efficiency is
maintained by combining a local and global search when minimising the distance.
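As a rough illustration of the idea behind such appearance-based compasses (and not Labrosse's implementation), the sketch below estimates the rotation between two unrolled panoramic images as the column shift that minimises the Euclidean image distance. The panorama size and the synthetic test images are placeholders.

    import numpy as np

    def estimate_rotation(prev, curr):
        """Rotation between two unrolled (cylindrical) panoramas, in degrees.

        A pure rotation of the robot shifts the panorama horizontally, so the
        rotation is taken as the column shift that minimises the Euclidean
        distance between the two images.  The brute-force search below is the
        simplest possible version; Labrosse's method keeps the cost down by
        combining a local search with an occasional global one.
        """
        prev = prev.astype(float)
        curr = curr.astype(float)
        h, w = prev.shape
        best_shift, best_dist = 0, np.inf
        for shift in range(w):
            rotated = np.roll(curr, -shift, axis=1)    # undo a rotation of `shift` columns
            dist = np.linalg.norm(prev - rotated)      # Euclidean image distance
            if dist < best_dist:
                best_shift, best_dist = shift, dist
        return best_shift * 360.0 / w

    # Placeholder usage with a synthetic panorama (a real system would unroll the
    # omnidirectional image first): a 12 column shift on a 180 column panorama.
    pano = np.random.default_rng(0).random((32, 180))
    print(estimate_rotation(pano, np.roll(pano, 12, axis=1)))   # approx. 24 degrees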
Of all of these approaches, the Labrosse visual compass is the most appealing with
the widest array of experiments presented and a full exploitation of omnidirectional
imaging. The conceptual simplicity and applicability of the approach fits well within
the context of efficient economical robots.
In Chapter 4 two new visual compass
approaches inspired by that of Labrosse are detailed, and direct performance comparisons are made.
3.2.1.2
Feature-Based Translation
The visual compass approaches detailed in Section 3.2.1.1 did not attempt to geometrically reconstruct the scene in order to find the rotation of the robot, and as such can be classified as appearance-based. However, there are many non appearance-based methods that aim to calculate a metrically accurate estimate of translation by
using feature correspondences and scene geometry. When feature correspondences
between frames and the intrinsic calibration of the camera are known, a geometric reconstruction can be performed that allows the ego-motion of the robot to be
found. In computer vision this is referred to as structure from motion [46], in which
both the structure of the scene and the motion of a camera observing the scene are
recovered simultaneously.
Matsuhisa et al. [83] presented an application of the factorisation approach to structure from motion.
Omnidirectional images were captured from a moving vehicle,
and a three dimensional model of the scene was reconstructed.
The factorisation
approach, proposed in 1992 by Tomasi and Kanade [119], collects tracked features
over a stream of images and batch computes the scene structure and camera trajectory. It is computationally demanding and better suited for scene reconstruction
than for motion estimation. Furthermore, the motion and structure are only estimated up to a scale factor. An alternative omnidirectional vision based structure
from motion algorithm that works by self calibration of the intrinsic and extrinsic camera parameters has been proposed by [85].
In the proposed solution the
extrinsic parameters give the motion between frames. However, the computationally demanding calibration process requires at least 9 point correspondences.
A
full three dimensional path of the camera is reconstructed, which is unnecessary for
small robots designed to work on planar surfaces.
Several visual odometry methods have been developed that constrain the motion
of the robot to a plane.
Scaramuzza et al. [108] developed a monocular omnidirectional visual odometry algorithm that calculates motion on a plane by tracking features on the ground. As in earlier work with perspective cameras the strict
mathematical relationship between views of a planar surface (a homography) is used
to calculate the motion. However, unlike previous homography based approaches,
the rotation estimation is provided by the Labrosse appearance-based visual compass [73]. The translation is found using ground plane feature correspondences, with
outliers removed using RANSAC (RANdom SAmple Consensus) [36] in order to ensure that only ground plane features are used. It was observed that the addition of
an appearance-based compass improves ego-motion estimation.
RANSAC is an algorithm for robustly fitting a known model to a data set that contains outliers. It is frequently used in multiple view computer vision for estimating
the relationship between views [46]. However, as an iterative process with no upper
bound on the time taken to fit the model it can be computationally expensive and
unsuitable for small robots with limited processing power. As observed by Scaramuzza et al. in a later paper [106], the number of model parameters that RANSAC
uses has an exponential effect on the number of iterations required. To overcome this
they presented a monocular omnidirectional vision based visual odometry system
that estimates the robot motion from a single image correspondence. Multiple correspondences between frames each provide dierent estimates of the motion, and the
median estimate is used as the overall estimate. This avoids the iterative RANSAC
algorithm altogether.
However, in order to be able to estimate the robot motion
from only a single point correspondence, the approach imposes strong limits upon
the possible motion of the vehicle.
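To make the cost argument concrete, the sketch below shows a generic RANSAC loop fitting a planar rigid motion (rotation and translation) to point correspondences. The iteration budget, tolerance and two-point minimal sample are illustrative assumptions, and this is not the estimator used in [108] or [106].

    import numpy as np

    def fit_rigid_2d(p, q):
        """Least-squares 2D rotation + translation mapping the points p onto q."""
        pc, qc = p - p.mean(axis=0), q - q.mean(axis=0)
        u, _, vt = np.linalg.svd(pc.T @ qc)                 # Kabsch / Procrustes solution
        d = np.sign(np.linalg.det(vt.T @ u.T))
        rot = vt.T @ np.diag([1.0, d]) @ u.T
        return rot, q.mean(axis=0) - rot @ p.mean(axis=0)

    def ransac_rigid_2d(p, q, iters=200, inlier_tol=0.05, seed=0):
        """Fit the motion robustly despite outlying correspondences."""
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(p), dtype=bool)
        for _ in range(iters):                              # fixed budget; no guarantee of success
            idx = rng.choice(len(p), size=2, replace=False) # minimal sample for the 3-DOF model
            rot, t = fit_rigid_2d(p[idx], q[idx])
            err = np.linalg.norm(p @ rot.T + t - q, axis=1)
            inliers = err < inlier_tol
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        if best_inliers.sum() < 2:
            raise RuntimeError("no consensus set found")
        return fit_rigid_2d(p[best_inliers], q[best_inliers])  # refit on the consensus set

    # Synthetic usage: 30 correspondences under a 20 degree rotation, 10 of them corrupted.
    rng = np.random.default_rng(1)
    theta = np.radians(20.0)
    R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    p = rng.random((30, 2))
    q = p @ R_true.T + np.array([0.3, -0.1])
    q[:10] += rng.random((10, 2))                           # outlying matches
    rot, t = ransac_rigid_2d(p, q)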
An alternative way of achieving ego-motion estimation from a single point is to use
stereo vision. When range information is available and the robot is constrained to
motion on a plane, one correspondence is sufficient. In Chapter 4 omnidirectional
stereo vision is considered for translation estimation.
3.2.2 Image-based Homing
The pioneering work of Cartwright and Collett [18, 19] inspired the field of image-based homing. Their work examined how honeybees navigate to food sources using
the surrounding landmarks. After experimenting with real bees and recording the
ight paths under varying landmark positions, a computer model was created that
simulated the navigation of the bees. In the model, named the snap-shot model,
a comparison is made between the positions of landmarks in the current view and
a snapshot image recorded at the target location. The comparison yields a homing
vector that can be used to steer towards the target image. The work was undertaken purely from a biological viewpoint, with the aim of better understanding the
behaviour of bees.
Therefore unrealistic simulated binary landmark images were
used.
Two real-world (not simulated) robotic implementations of image-based homing directly inspired from the work of Cartwright and Collett were presented by Röfer [104]
and Franz et al. [39, 40]. In [104] a rotating photoreceptor was used that captured 64
intensity readings as a one dimensional panoramic perception. Self-organising map
based clustering was used to calculate the correspondence between images, which
was then used to compute the homing vector. An ending position accuracy of 3cm
from the target was claimed, but limited experiments were performed and the experimental environment was not detailed. Furthermore, the robot was required to
have a constant orientation meaning that a compass was necessary.
In Franz et al. [39, 40] features are extracted from a one dimensional panoramic
image and shifted to calculate the displacement between views. Experiments were
performed using a miniature robot (a Khepera) in an arena approximately 1 metre
squared. Homing was shown to be successful over a distance of up to 25cm, where a
successful homing navigation was dened as stopping less than 1cm from the target
location. Stürzl et al. [114] developed an extended version of the method of Franz et
al. [39, 40], where increased robustness and speed were achieved by switching to the
Fourier domain. The algorithm is similar to that by Hong et al. [48], but importantly
a constant robot heading is not required and more complete experimental results
were presented.
Labrosse [74] presented a global appearance-based homing strategy that uses the
entire omnidirectional image as is, without the need for costly feature extraction
or matching. Moving away from the one dimensional image approaches in [19, 39,
48, 104, 114], the proposed algorithm directly uses a two dimensional colour image.
Zeil et al. [127] observed that the Euclidean image distance between physically close
omnidirectional images with the same orientation reduces smoothly as the image
locations get closer. This fact is exploited by Labrosse in his approach. The current
image is rotated to be orientated in the same direction as the target image, and
then warped into two additional images, one that resembles an image taken from
the right of the current position and one that resembles an image taken from in
front of the current position. The current and warped images are then compared
to the target image, and the robot drives in the direction of the descending image
distance gradient to get home. Robot rotation was permitted by using the Labrosse
visual compass described in Section 3.2.1, and the use of full images increased the
robustness of the algorithm. Experiments operating over ranges of up to 5 m to the
target were reported to have an accuracy of approximately 40 cm.
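The sketch below illustrates the descent-on-image-distance idea that underlies this family of methods: compare the current panorama with the goal snapshot and with predictions of the image after a small forward or sideways move, then drive towards decreasing image distance. The warping functions are deliberately left as stubs, since their construction is the substance of Labrosse's method; everything here illustrates the control loop only, under placeholder inputs.

    import numpy as np

    def image_distance(a, b):
        """Euclidean distance between two equally sized images (as used by Zeil et al. [127])."""
        return np.linalg.norm(a.astype(float) - b.astype(float))

    def homing_step(current, target, warp_forward, warp_right):
        """Pick a drive direction by descending the image distance to the goal snapshot.

        warp_forward / warp_right must predict the current panorama after a small
        forward or rightward move; those warpings are the heart of Labrosse's
        method and are only stubbed out in the usage example below.
        """
        d0 = image_distance(current, target)
        d_fwd = image_distance(warp_forward(current), target)
        d_right = image_distance(warp_right(current), target)
        # Finite-difference approximation of the distance gradient w.r.t. position;
        # drive against it, towards decreasing image distance.
        grad = np.array([d_fwd - d0, d_right - d0])
        return -grad / (np.linalg.norm(grad) + 1e-9), d0

    # Placeholder usage: identity "warps" simply demonstrate the call pattern.
    rng = np.random.default_rng(0)
    current, target = rng.random((32, 180)), rng.random((32, 180))
    direction, dist = homing_step(current, target, lambda im: im, lambda im: im)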
An approach very similar to that of Labrosse was developed independently by Möller
and Vardy [87], in which the optic flow templates of forward and sideways motions
are applied to the image.
The core difference of their algorithm, called matched filter descent image distance (MFDID), to that of Labrosse is that in MFDID the
image warpings are implicitly calculated as part of the gradient descent, whereas in
Labrosse [74] the image warpings are explicitly created. This provides a theoretical
advantage in that a rigorous mathematical derivation is given, although the practical
advantages are not clear with comparatively few results presented.
Although promising results have been presented by the above homing approaches,
the methods make limiting assumptions regarding the distribution of landmarks
around the robot.
In particular, landmarks are either assumed to be at a constant distance from the robot, or evenly (isotropically) distributed around the robot.
Möller et al. [88] extended the MFDID algorithm to deal with the problem of a nonisotropic landmark distribution, creating an algorithm known as Newton-MFDID.
Homing vector directions are improved by combining a Newton algorithm gradient
descent with the earlier developed MFDID algorithm. However, the assumption of
constant distance landmarks is still required.
Alternative approaches to homing without such assumptions have been created by not only employing appearance-based vision, but also by considering the geometry of the scene. A key example of this is the work by Goedemé et al. [44], in which a visual servoing approach to image-based homing is presented that relies on the tracking of features with an extended Kalman filter combined with 3D reconstruction from wide-baseline stereo. Although the method is claimed to be robust and accurate, no detailed experimental results are provided. Furthermore, in order to be able to track and match image features between frames a computationally demanding image descriptor was used. The application domain of the algorithm is that of autonomous wheelchairs, where limits on the available computational power are not as restrictive as those for low-cost robots.
Image-based homing can be considered as a form of image registration. As stated by Zitova and Flusser [131], image registration is the process of overlaying images (two or more) of the same scene taken at different times, from different viewpoints, and/or by different sensors (Zitova and Flusser [131], page 1, paragraph 1). The calculation of a robot homing vector from a current image and a target image is essentially image registration, in which the homing vector is given by the transformation between the image viewpoints.
When viewing a planar surface from two viewpoints, the transformation of the imaged surface between the different views is a planar projective transformation. Such transformations are referred to as homographies, and well known feature-based algorithms can be used to find them (see Hartley and Zisserman [46]). Image homographies have been exploited by López-Nicolás et al. [79] for robot homing, where multiple image homographies are calculated and used. The use of omnidirectional vision meant that several planar surfaces were visible in the image, and by calculating the transformations of the surfaces between the current image and the target image, steering commands were generated to drive the robot between locations.

However, methods of calculating image homographies rely on image feature extraction and matching, and incorporate computationally expensive statistical data fitting algorithms. This is unsuitable for a robot with limited computational power, and Sim and Dudek [109] showed that global appearance-based navigation is more robust to noise and occlusions than a local feature-based approach.
Chapter 7 demonstrates that a full feature-based homography calculation is not necessary in order to navigate an indoor robot to a target image. An appearance-based approach to image-based homing that utilises the ceiling plane is developed, thereby avoiding assumptions about feature distributions (isotropic or constant distance).
3.3 Map-based Robot Navigation
Map-based localisation involves using some form of model of the environment to calculate the location of the robot based on the current perception. The model can be supplied beforehand, for example as a geometric CAD model of the environment [69], a simple list of landmark locations [15], or a trained artificial neural network [98]. If no map is supplied prior to navigation, then a map can be constructed as the robot moves. This is often referred to as SLAM (Simultaneous Localisation and Mapping), and is a very active research area in robotics and computer vision.
Maps can be split into two main categories: metric and topological, as depicted in Figure 3.1.

Figure 3.1: Examples of (a) a dense metric map created by an occupancy grid algorithm with a laser range finder and (b) a topological map, in this case a rail network.

Metric maps can be either sparse, like those created in visual SLAM algorithms, or dense, like those created in occupancy grid based algorithms. Dense metric maps like that shown in Figure 3.1a are computationally expensive to create and maintain, and often contain information that is not actually needed by the robot. They are therefore not appropriate for use in the algorithms developed in this thesis. On the other hand, the sparse maps produced by visual SLAM are more suitable for low-cost navigation.
Maps that do not provide any metric information about the environment, but rather only show the connections between places, are called topological maps. Topological maps are generally considered cheaper to create and maintain than metric maps [50], and are therefore of key interest in this work. They are usually sparse and contain few nodes, with each node representing a different location such as a different room [22] or a sufficiently different perception [129]. Navigation using a topological map consists of finding a route through nodes in the map that reaches the desired location, and requires the ability to navigate between nodes. The two main approaches to achieve this are dead-reckoning with local metric information [99], and image-based homing.

Two key areas of interest in map-based navigation, visual SLAM and topological maps, are discussed further in the following two sections.
3.3.1 Visual Simultaneous Localisation and Mapping with Metric Maps
The goal of simultaneous localisation and mapping (SLAM) is to have a robot construct a consistent map of its initially unknown environment while simultaneously keeping track of its position. No prior knowledge of the environment or robot trajectory is provided; instead the trajectory is calculated online using only the available sensor measurements. An introduction to the SLAM problem can be found in [30], in which it is pointed out that SLAM is a solved problem and a notable achievement of the robotics community in the past 15 years.
The solution comes by incorporating the probabilistic uncertainties of landmark and robot positions, and relies on the correlation between measured landmarks. The first demonstration of the importance of the correlations between landmarks was in 1995 by Durrant-Whyte et al. [32], who demonstrated the convergence of the map when incorporating all uncertainties. The SLAM solution is formulated in a probabilistic manner, with the current estimate of the robot position and map being maintained as a probability distribution. Critically, the probability distribution is maintained over the complete state comprising both the robot pose and the landmark positions.

The two main approaches used for maintaining the state estimate are the extended Kalman filter (EKF) and the particle filter. The extended Kalman filter represents the probability distribution as Gaussian, allowing simple parameterisation as a mean and covariance matrix. This approach is computationally appealing, but can lead to inaccurate estimation when the probability distribution is non-Gaussian. Particle filters attempt to overcome this by representing the distribution with a discrete set of samples referred to as particles, allowing multi-modal non-Gaussian distributions to be modelled. However, this approach generally requires more computational resources in order to maintain the set of samples [6]. The current literature is split between particle filter based algorithms such as FastSLAM [89] and EKF algorithms such as Divide and Conquer SLAM [101]. A comparison between an EKF and a particle filter is provided by [1].
Until recently the most common sensor used for SLAM was a laser range finder. The high cost and limited sensing capabilities of laser range finders have led many researchers to investigate the use of vision as a primary sensor for SLAM [102]. However, unlike laser range finders, standard monocular vision does not directly give range information. This can be overcome by using stereo vision, which allows range information to be calculated so that landmarks can be initialised directly into the map as in [2, 33].
Methods of SLAM using a single camera have also been developed, perhaps the best known of which is the pioneering work of Davison [26]. The proposed approach, named MonoSLAM, is a 3D extended Kalman filter based SLAM system in which three-dimensional points are mapped. The points are found as image corner features and matched using fixed region size template matching. New features are only initialised in the map after having been seen in a few consecutive frames so that their depth can be estimated. A similar system that replaces the Kalman filter with a particle filter was presented in [72].
Omnidirectional vision has received relatively little attention as a sensor for metric single camera SLAM, despite being ideally suited to the process of localisation. Features remain in view in omnidirectional images for longer than in regular images, allowing increased robustness, less frequent loop closing and the ability to cope with fewer features in the world.
Kim et al. [60] proposed a particle filter based SLAM system making use of coaxial stereo omnidirectional vision. They claim that omnidirectional stereo vision increases robustness and allows faster operation, as fewer particles are required to model the probability distribution. In an earlier paper [61], the same authors used side-by-side omnidirectional stereo vision in an extended Kalman filter SLAM system. However, although limited results were presented, the advantages gained through the use of omnidirectional vision were not quantified. Furthermore, the applicability of the approaches to real robots is limited by the use of a restrictive motion model and a computationally demanding feature detector that operates offline.
Moving away from stereo vision, a method that combines the use of monocular omnidirectional vision with a laser range finder has been developed by Kim et al. [62]. In their approach omnidirectional vision is used to provide the bearing to vertical edge features, while a laser range finder provides both horizontal line features and the range to the vertical features. The use of a laser range finder assists feature initialisation and improves system accuracy, but makes the method only applicable to robots equipped with range finders.
A comparison between bundle adjustment optimisation based and Kalman filter based methods for SLAM in the context of omnidirectional vision was presented by Deans and Hebert [27]. Experiments were performed on both simulated and real data, and a method of combining the bundle adjustment with a Kalman filter was developed. Although the approach is interesting from a theoretical point of view, there is a lack of quantitative results and the method incurs a high computational cost.
The most notable implementation of monocular omnidirectional SLAM is that by Lemaire et al. [76]. An EKF based 3D SLAM system using monocular panoramic vision was demonstrated that maps Harris corner feature points and uses an appearance-based method for loop closing in large scale outdoor environments. The underlying SLAM algorithm is the same as that of Davison [26] and similar to that implemented in this thesis in Chapter 5.
3.3.2 Topological Map Navigation
There are many different methods of using visual information in a topological map. The two main issues that differentiate the methods are how the topological map nodes are represented and matched, and what the node creation criterion is. The natural choice when using vision is for each map node to be represented by an image taken at a specific location. The node then corresponds to the location at which the image was taken.
Winters [42, 124, 125] proposed the use of principal component analysis (PCA) to create visual appearance-based topological maps. In a map construction phase omnidirectional images were recorded at 50 cm intervals along a corridor and a lower dimensional eigenspace was calculated directly from the recorded images. A topological map was then constructed from the image sequence, with the projection of each image into the lower dimensional eigenspace stored as a map node. Localisation was achieved by comparing the projection of the current image into the eigenspace with every node in the map, and taking the nearest neighbour as the location of the robot.
Jogan and Leonardis [56] also proposed the use of omnidirectional image eigenspaces for image-based localisation. Unlike Winters, they considered rotation invariance. It was observed that two physically close locations only have images that project to similar vectors in the eigenspace if the orientations of the images are the same. To overcome this, they introduced the idea of spinning images, in which all rotations of an omnidirectional image are considered.
An alternative node representation was proposed by Ishiguro and Tsuji [52]. They represented locations by the coefficients of a Fourier transformed panoramic image captured at the location. Although the coefficients provided a heavily compressed representation of the image, they are highly susceptible to local illumination changes [45]. More recently, the bag-of-words paradigm discussed in Chapter 2.4 has been employed for topological mapping by [4, 25, 34]. Nodes are represented by the visibility of image features, which was demonstrated by Cummins and Newman [25] to result in a compact node representation that is both descriptive and robust to image noise. However, extracting image features and descriptors to represent local appearance can be computationally expensive and relies on the presence of features.
Neal and Labrosse [93] and Kosecka et al. [70] propose methods of selecting a canonical set of characteristic images from a dense sequence. The set of canonical images is then used as the topological map nodes. Neal and Labrosse [93] achieve this using an artificial immune system algorithm that both selects key images from a set of panoramic images and simultaneously reconstructs their topology. A global appearance-based approach is used for matching images, in which the image is reduced in size to 50 × 50 pixels and converted to the CIE L*a*b* colour space. The Euclidean distance between reduced images then provides a comparison metric both for the artificial immune system algorithm and for localising the robot in the map. Localisation is achieved by taking the node with the minimum Euclidean distance to the current image as the current location, and image-based homing is suggested for navigation between nodes.
Kosecka et al. [70] use Learning Vector Quantisation (LVQ) to determine characteristic images in a video sequence. A histogram of edge gradient orientations is calculated for every frame and used as a global appearance descriptor. After selecting regions to be mapped by splitting the image sequence based on temporal information, LVQ is used to determine a reduced set of representative histograms that can be used to describe the regions. Localisation is achieved by comparing the current image histogram to the representative histograms of mapped spatial regions, followed by a nearest neighbour approach to determine the most likely location. As only temporal information is used when constructing the topology, no cycles are determined and each location is assumed to have been visited only once.
The use of self-organisation for topological mapping has been developed by Pisokas and Nehmzow [103]. In their approach map learning is split into two phases: firstly, a self-organising map is created from sonar perceptions logged from a robot driven around an environment. Each node in the SOM is then used to represent a topological map node. In the second phase the robot is driven around the same environment using four discrete actions (forward 50 cm, backward 50 cm, left turn 40°, right turn 40°), and metric links are recorded between the topological map nodes. This is achieved by linking the SOM node that is activated by the current sonar reading and the SOM node that was activated prior to taking the last action. The new link is labelled with the action taken. The topological map can then be used for navigation by finding a route between the current node and the target node, and executing the actions associated with the links in the route. Severe problems with perceptual aliasing were encountered because large physical spaces were represented by single map nodes, and therefore few successful navigations were demonstrated. Although the approach does not consider the use of visual information, it is closely related to the work presented in Chapter 6.
3.4 Summary
Robot navigation strategies can be classified as either mapless or map-based. Map-based strategies can be split into topological map-based navigation and metric map-based navigation, while mapless navigation can be split into ego-motion estimation methods and image-based homing methods. The following table summarises the reviewed work in each of these four areas:
Work | Advantages | Disadvantages

Mapless navigation — Ego-motion estimation:
Lambrinos et al. [75] | Computationally efficient | Dedicated sensor, only outdoors
Usher et al. [122] | Computationally efficient | Dedicated sensor, only outdoors
Albert & Conell [3] | Single camera and low computation | Specific features required
Sturm & Visser [113] | No specific features needed | Computationally demanding
Labrosse [73] | Computationally efficient |
Matsuhisa et al. [83] | High accuracy | Computationally demanding
Mikusik et al. [85] | High accuracy | Computationally demanding
Scaramuzza et al. [108] | High accuracy | Full homography calculation
Scaramuzza et al. [106] | Computationally efficient | Vehicle motion heavily limited

Mapless navigation — Image-based homing:
Cartwright & Collett [18] | Field inspiring | Unrealistic simulation
Röfer [104] | Computationally efficient | Compass required, little experimentation
Franz et al. [39] | Effective | Homing only over short range (25 cm)
Stürzl et al. [114] | Robust and computationally efficient | Homing only over short range (25 cm)
Hong et al. [48] | Inspired Franz et al. [39] | Compass required, little experimentation
Labrosse [74] | Long range operation (5 m) | Compass required, feature distribution important
Möller & Vardy [87] | Formally defined algorithm, effective | Computationally demanding, feature distribution important
Möller & Vardy [88] | Feature distribution constraints lifted | Computationally expensive
Goedemé et al. [44] | Robust and accurate | Computationally expensive 3D feature tracking
López-Nicolás et al. [79] | Robust and accurate | Computationally expensive homography calculation

Map-based navigation — Metric maps:
Montemerlo et al. [89] | Multi-modal probability model | Computationally expensive
Paz et al. [101] | Increased efficiency | Laser scanner based
Davison [26] | Single monocular camera | Image features required, large maps are computationally expensive
Kwok et al. [72] | Multi-modal probability model, single camera | Computationally expensive
Kim et al. [60] | Accurate localisation | Requires precise omnidirectional stereo, offline operation
Kim et al. [61] | Accurate localisation | Requires precise omnidirectional stereo, offline operation
Kim et al. [62] | Monocular omnidirectional vision | Requires laser range finder
Deans & Hebert [27] | High accuracy | Computationally expensive
Lemaire et al. [76] | Davison [26] approach improved with omnidirectional vision | Improvement not quantified

Map-based navigation — Topological maps:
Winters [42, 124, 125] | Low disk storage space for the map | Not rotation invariant, computationally expensive
Jogan & Leonardis [56] | Rotation invariant perception | Computationally expensive
Ishiguro & Tsuji [52] | Highly compressed perception, efficient | Limited accuracy, highly illumination sensitive
Cummins & Newman [25] | Accurate operation over large environments (10 km) | Computationally expensive, offline
Neal & Labrosse [93] | No image features required | Robot navigation not considered
Kosecka et al. [70] | No image features required | Only temporal information for mapping (no loops)
Pisokas & Nehmzow [103] | No image features, computationally efficient | Uses only a limiting sonar perception
Each of the four areas is considered in this thesis. The most important related works in these areas are:
Mapless navigation:

1. Dead reckoning: (see Section 3.2.1)
Dead reckoning using wheel encoders is fast and efficient, but very prone to drift from erroneous ego-motion estimates stemming from wheel slippage. Visual odometry aims to eradicate this problem by supplementing or replacing the wheel encoders with a camera. Doing so efficiently is important if visual odometry is to be of use on economical hardware. The Labrosse visual compass [73] provides the best visual compass information in terms of accuracy and efficiency. In Chapter 4 two efficient visual compass approaches inspired by [73] are developed with the aim of being even more accurate, and omnidirectional stereo is considered for finding the robot translation.
2. Image-based homing: (see Section 3.2.2)
As a form of mapless navigation that does not use metric information, the problems of drift error encountered in dead reckoning are avoided. López-Nicolás et al. [79] demonstrated that image homographies allow accurate visual homing. However, using them is computationally expensive, requiring feature extraction and statistical data fitting algorithms. Labrosse [74] presented an appearance-based homing strategy that uses the raw image directly, resulting in an efficient algorithm suitable for robots with low computational power. In Chapter 7 a new image-based homing algorithm is presented that benefits from the use of image homographies as in [79], while employing efficient appearance-based navigation strategies as in [74].
Map-based navigation:

3. Metric simultaneous localisation and mapping: (see Section 3.3.1)
Davison's MonoSLAM approach [26] was the catalyst for intense interest in monocular vision based SLAM, and is the topic of interest in Chapter 5. Existing implementations of monocular omnidirectional vision based SLAM have not quantitatively assessed the benefits gained from omnidirectional sensing. In Chapter 5 this assessment is provided through an implementation of monocular camera SLAM.
4. Topological map navigation: (see Section 3.3.2)
Topological maps are generally considered cheaper to create and maintain than metric maps [50], and are therefore of key interest in this work. In Chapter 6 a topological mapping and navigation algorithm is developed that is closely related to the SOM based approach of Pisokas and Nehmzow [103]. The important difference is that rather than limiting the number of topological map nodes to be equal to the number of nodes in the SOM, in the algorithm presented in Chapter 6 the SOM is used to efficiently index the topological map nodes. This permits the number of map nodes to be as large as required and avoids the problems of perceptual aliasing encountered in [103]. Furthermore, unlike [103] the approach detailed in Chapter 6 uses a camera sensor rather than a sonar sensor and works in a continuous action space.
This thesis now continues by considering the use of omnidirectional stereo vision for
robot ego-motion estimation.
4 Ego-Motion Estimation with Binocular Omnidirectional Vision

4.1 Introduction
As discussed in Chapter 3.2, calculating the ego-motion of a robot between frames allows a metric localisation to be formed without the need for a map. Not requiring a map is desirable when considering low-cost robots, as no storage or look-up is required, and no map maintenance or construction is necessary. In this chapter the application of binocular omnidirectional vision for efficient ego-motion estimation with low-cost robots is considered.

Using binocular vision allows the distance to objects to be calculated. If the robot's rotation is known, then the distance to a single image feature tracked between consecutive frames provides enough information to calculate the translation between the frames. A simple way of achieving omnidirectional stereo is to use two co-axial conical mirrors as presented by Spacek [112] and summarised in this thesis in Chapter 2.3. However, the formulation provided by Spacek does not work in real-world situations where perfect alignment between mirror and camera cannot be assumed. A contribution of this thesis is the full derivation of formulae for omnidirectional stereo vision using two arbitrarily placed conical mirrors.
In this chapter the estimation of the robot's ego-motion is considered in two parts: firstly the calculation of the rotation, and secondly the calculation of the translation. Rotation calculation is carried out using a single monocular omnidirectional camera, providing another contribution of this thesis: the proposal of two new methods of calculating a robot's rotation. This chapter is organised as follows: firstly the rotation estimation methods are detailed, and experiments that were carried out using simulated and real images are discussed. Binocular omnidirectional stereo vision is then considered for calculating the robot's translation, and formulae for achieving stereo with arbitrary camera and mirror placement are derived. Finally the stereo formulation is discussed in the context of low-cost robots.
4.2 Visual Compass

The Labrosse visual compass approach was introduced in Chapter 3. Two alternative approaches to calculating the rotation ω of the robot between two video frames f1 and f2 by using omnidirectional vision are (a) to calculate the optic flow in the front and back regions of the image directly from two edge points, and (b) to calculate the optic flows of 1D representations of the front and back regions. As pointed out in [96], the front and back regions represent the expansion and contraction fields of an omnidirectional image, and are less affected by translation. Only looking at the flow in these regions, as shown in Figure 4.1, helps to separate rotation from translation.
Figure 4.1: The image sectors used to calculate the optic flow. The expansion and contraction fields exhibit less translation-induced optic flow, making them more suitable for rotation estimation.

The two rotation estimation methods developed calculate the average optic flow in the expansion and contraction fields. Each method calculates this flow in a different way.
4.2.1 Method 1: Rotation estimation from optic flow calculated using point pairs
Horn and Schunck [49] compute optic flow (Equation 4.2) from the assumption that the grey level values of image points do not change as the points move across an image:

    f(x, y, t) = f(x + dx, y + dy, t + dt)    (4.1)

where f(x, y, t) is the temporal image function giving the grey level value of pixel (x, y) at time t, and (dx, dy) is how far the image point has moved in the image over a time interval dt.

Using only the first order terms of the Taylor series gives the optic flow equation:

    −∂f/∂t ≃ v · g    (4.2)
4.2. Visual Compass
where
47
, dy )
v = (vx , vy ) = ( dx
dt dt
df
, df
dx dy
is the optic ow vector to be found,
g = ∇f (x, y) =
is the rst derivative of the image found by convolution with an edge
detection kernel, and
·
denotes the scalar product of two vectors. This is a single
equation in two unknowns, and therefore cannot be solved directly, leading to the
development of many iterative methods for calculating the ow [9].
However, by dividing both sides of Equation 4.2 by |g| (the magnitude of g), the magnitude vg of the flow in the direction of the unit vector ĝ can be calculated:

    vg = v · ĝ ≃ −(∂f/∂t) / |g|.    (4.3)
The overall optic flow can be calculated directly (non-iteratively) by combining the flow in the edge gradient directions of two adjacent but differently oriented edge pixels in the image.¹ Let two edge elements (edgels) with different orientations have unit gradient vectors û = (ux, uy) and ŵ = (wx, wy) respectively. Such edgels are typically found on opposite sides of a point c of high curvature, as shown in Figure 4.2. Substituting these known unit gradients for ĝ into Equation 4.3 gives:

    vu = v · û,    vw = v · ŵ    (4.4)

where vu and vw are the magnitudes of the optic flow in the direction of the edge gradients, obtained by evaluating the right hand side of Equation 4.3.

By applying the flow magnitude in the û and ŵ directions at c, the optic flow vector v at point c can be estimated directly by solving Equations 4.4 for vx and vy:

    vx = (vu wy − vw uy) / (ux wy − uy wx),    vy = (vw ux − vu wx) / (ux wy − uy wx)    (4.5)

provided that ux wy − uy wx ≠ 0. This condition will always be met provided that the two edge gradients ŵ and û are not in the same or exactly opposite directions.

¹ This assumes that the two adjacent edge pixels are moving in the image in the same way. This is a reasonable assumption provided that the edge pixels are sufficiently close together.
Figure 4.2: (a) The optic flow at point c can be found by combining the optic flow in the direction of the image gradients at two points either side of point c. (b) The calculation of the θ component of the optic flow vector.

The robot's rotation ω can be estimated by averaging the angular flow components in the front and back regions of the omnidirectional image:

    ω = mean(vθ)    (4.6)

where vθ is the angle between the line from the centre of the image to c and the line from the centre of the image to c + (vx, vy)ᵀ, as shown in Figure 4.2. Performing this conversion, rather than unwarping the front and back regions of the image prior to calculating the flow, avoids noise that would be introduced into the edge detection process by the image interpolations.
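To make Equations 4.4 to 4.6 concrete, the following Python sketch combines the gradient-direction flow magnitudes of one edgel pair into a full flow vector and converts it to its angular component about the image centre. It is a minimal illustration under the stated assumptions, not the implementation used in this thesis.

import numpy as np

def flow_from_edgel_pair(u_hat, w_hat, v_u, v_w):
    # Recover the full flow v = (vx, vy) from the flow magnitudes v_u, v_w measured
    # along the unit gradient directions u_hat and w_hat (Equation 4.5).
    ux, uy = u_hat
    wx, wy = w_hat
    det = ux * wy - uy * wx
    if abs(det) < 1e-9:
        return None                 # gradients (nearly) parallel: Equation 4.5 undefined
    vx = (v_u * wy - v_w * uy) / det
    vy = (v_w * ux - v_u * wx) / det
    return np.array([vx, vy])

def theta_component(c, v, centre):
    # Angle between the rays from the image centre to c and to c + v (Figure 4.2b);
    # averaging this quantity over all accepted edgel pairs gives omega (Equation 4.6).
    a = np.asarray(c, float) - np.asarray(centre, float)
    b = a + v
    return np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])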
The two-point optic flow method of rotation estimation is summarised in Algorithm 4.1. The parameters of the algorithm are:

• A gradient magnitude threshold (τ1). This dictates how strong the edge pixels in the front and back regions have to be in order to be used to calculate the optic flow. Experimentation showed that the algorithm is not sensitive to this value; in the experiments reported here the threshold value is set at 5, which proved sufficient to filter out very weak gradient vectors.

• The front and back region size. The region size dictates how much of the image is used for rotation estimation. The wider the region size, the more the rotation estimate will be corrupted by translation. Labrosse [73] reported a region size of 60° to be the best compromise between losing information and being influenced by translation. The same region size is used in all experiments here.

• The size of the window to look for adjacent gradient vectors û and ŵ (s × s). This dictates how far apart the adjacent gradient vectors are permitted to be. Points that are far apart are less likely to be part of the same object, meaning that the overall optic flow cannot be computed from their individual flows. However, restricting the search to very close point pairs will reduce the number of candidate pairs. A window size of 5 × 5 was found to provide the best results, and is therefore used in the experiments reported here.

• The minimum angle between the adjacent gradient vectors (anglemin). In order to calculate the optic flow from two adjacent edge pixels, the gradient vectors at the adjacent pixels must be sufficiently different. A value of 40° was used in all experiments reported here. This proved to provide the best compromise between not finding enough adjacent edges and the increased sensitivity to noise when using edge pixels with too similar orientations.
Algorithm 4.1 Calculate the rotation of the robot between two frames using the optic flow in the front and back regions, calculated using the two-point optic flow method outlined.
Input: Frame 1 (f1), Frame 2 (f2)
Parameters: Gradient magnitude threshold τ1. Front and back region size. Adjacent pair window size s × s. Minimum angle between the gradient vectors of adjacent point pairs, anglemin.
Output: The rotation of the robot between f1 and f2, ω.
1:  Extract the front and back regions from image f1.
2:  Perform first derivative edge detection over the front and back regions of image f1, returning the partial derivatives ∇f1 = (fx, fy).
3:  Compute the gradient magnitudes of each front and back region pixel in f1 as G(pixel) = |∇f1(pixel)| = √(fx² + fy²).
4:  for all pixels u in the front and back regions do
5:    if G(u) > τ1 then
6:      for all pixels w in an s × s window around u do
7:        if G(w) > τ1 and the angle between u and w > anglemin then
8:          Add (u, w, c) to the set of flow points P, where c = (u + w)/2.
9:        end if
10:     end for
11:   end if
12: end for
13: if P contains no members then
14:   return 0 {FAIL}
15: end if
16: totalflow ← 0
17: for all (u, w, c) in P do
18:   Calculate the flow in the direction of the gradient for pixels u and w using Equation 4.4.
19:   Combine the gradient direction flows of u and w to find the flow at point c using Equation 4.5.
20:   totalflow ← totalflow + θ component of flow at point c.
21: end for
22: ω ← totalflow ÷ number of members in P.
23: return ω.
4.2.2 Method 2: Rotation estimation from optic flow calculated using 1D images
An alternative to calculating the optic flow of the image directly is to reduce the image to a one-dimensional representation, and to calculate the flow of the reduced image. The one-dimensional representation can be found by averaging the radial lines of the image in the same way as in Briggs et al. [13], as shown in Figure 4.3. Again only the front and back regions are used.

Figure 4.3: (a) The regions of the image that are converted into 1D representations. (b) Example of two 1D representations at time t1 and t2, and the disparity between them.
The front and back 1D representations of a W × H image I(w, h) are calculated as:

    F(i) = Σ_{r=0}^{H/2} I(W/2 + r·cos(i), H/2 + r·sin(i)),    i = −s/2 ... s/2    (4.7)

    B(i) = Σ_{r=0}^{H/2} I(W/2 + r·cos(i), H/2 + r·sin(i)),    i = −s/2 − 180 ... s/2 − 180    (4.8)

where s is the region width in degrees, r is the distance along the radial line, F is the front region 1D representation, B is the back region 1D representation, and the values of the image I are interpolated using bilinear interpolation. Alternative interpolation strategies could also be employed, for example nearest-neighbour or bicubic, but bilinear interpolation was used here as a compromise between the higher computational cost and accuracy of bicubic interpolation and the lower computational cost and accuracy of nearest-neighbour interpolation.
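As an illustration, Equations 4.7 and 4.8 could be transcribed into Python as below. The use of scipy's map_coordinates for bilinear sampling, the one-degree angular step and the assumption of a single-channel image are implementation choices for this sketch rather than anything prescribed by the thesis.

import numpy as np
from scipy.ndimage import map_coordinates

def radial_sum(image, angles_deg):
    # Accumulate the image along radial lines from the image centre, one value per
    # angle, with bilinear interpolation (order=1), following Equations 4.7 and 4.8.
    H, W = image.shape
    radii = np.arange(H // 2, dtype=float)
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    cols = W / 2.0 + radii[None, :] * np.cos(angles[:, None])
    rows = H / 2.0 + radii[None, :] * np.sin(angles[:, None])
    samples = map_coordinates(image.astype(float), [rows, cols], order=1, mode='nearest')
    return samples.sum(axis=1)          # Equations 4.7/4.8 sum rather than average

def front_back_signatures(image, s=60):
    # 1D signatures F and B of the front and back regions of width s degrees.
    front_angles = np.arange(-s / 2.0, s / 2.0)
    back_angles = front_angles - 180.0
    return radial_sum(image, front_angles), radial_sum(image, back_angles)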
When the 1D representations have been found, the optic flow between them (shown in Figure 4.3b) can be computed efficiently by using the single scale 1D case of the Lucas-Kanade iterative image registration technique [80]:

    h_0 = 0    (4.9)

    h_{k+1} = h_k + Σ_x w(x) F′(x + h_k)[G(x) − F(x + h_k)] / Σ_x w(x) F′(x + h_k)²    (4.10)

where w(x) = 1/|G′(x) − F′(x)|, h_k is the optic flow estimate at iteration k, G(x) and F(x) are the front and back regions of the current and previous 1D image, and F′ and G′ are the first derivatives approximated by F′(x) ≈ F(x + 1) − F(x) and G′(x) ≈ G(x + 1) − G(x). The iteration continues until ∆h is less than 0.001, i.e. the estimate converges.
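A compact numpy sketch of the 1D registration of Equations 4.9 and 4.10 is given below. Sub-pixel values of F at x + h are obtained with np.interp, and the convergence threshold of 0.001 follows the text; the small constant added to the weights and the iteration cap are assumptions made to keep the sketch self-contained.

import numpy as np

def register_1d(F, G, max_iter=50, tol=1e-3):
    # Estimate the shift h between two 1D signatures using the single scale 1D
    # Lucas-Kanade iteration (Equations 4.9 and 4.10).
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    x = np.arange(F.size - 1, dtype=float)
    dF = F[1:] - F[:-1]                        # F'(x) ~= F(x+1) - F(x)
    dG = G[1:] - G[:-1]                        # G'(x) ~= G(x+1) - G(x)
    w = 1.0 / (np.abs(dG - dF) + 1e-9)         # weighting w(x); constant avoids division by zero
    h = 0.0
    for _ in range(max_iter):
        F_shift = np.interp(x + h, np.arange(F.size, dtype=float), F)   # F(x + h)
        dF_shift = np.interp(x + h, x, dF)                              # F'(x + h)
        denom = np.sum(w * dF_shift ** 2)
        if denom == 0.0:
            break
        dh = np.sum(w * dF_shift * (G[:-1] - F_shift)) / denom          # Equation 4.10
        h += dh
        if abs(dh) < tol:                                               # until dh < 0.001
            break
    return h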
An alternative strategy for finding the correspondence between the 1D image representations is to use the Euclidean distance, as in [73]. However, the shift h between images can be very small. Using the Lucas-Kanade image registration technique allows the subpixel shift to be found directly, with the interpolation built into the process. Using the Euclidean distance method presented in [73] requires additional computation if subpixel accuracy is required.
As in method 1 (Section 4.2.1), the average of the front 1D representation flow and the back 1D representation flow gives ω. The full procedure for the calculation of ω is detailed in Algorithm 4.2. As with Algorithm 4.1, a region size needs to be decided in advance. In all experiments reported here the region size was the same as for experiments with Algorithm 4.1 (60°).
Algorithm 4.2 Calculate the rotation of the robot between two frames by reducing the front and back regions to 1D representations and calculating the optic flow in 1D.
Input: Frame 1 (f1), Frame 2 (f2), dimensions W × H.
Parameters: Front and back region width, w degrees. w must be an even integer.
Output: The rotation of the robot between f1 and f2, ω.
1:  F ← zero vector of size w {The average radial line image for image f1}
2:  G ← zero vector of size w {The average radial line image for image f2}
3:  for i = −w/2 to w/2 do
4:    for r = 1 to H do
5:      F(i) ← F(i) + f1(W/2 + r·cos(i), H/2 + r·sin(i))
6:      G(i) ← G(i) + f2(W/2 + r·cos(i), H/2 + r·sin(i))
7:    end for
8:  end for
9:  {Compute the disparity h between the signatures in the front and back regions by using the Newton-Raphson like iterative image registration technique proposed by Lucas-Kanade in [80]}
10: h ← 0
11: F′ ← zero vector of size w − 1 {The first derivative of F}
12: G′ ← zero vector of size w − 1 {The first derivative of G}
13: W ← zero vector of size w − 1 {The weighting vector}
14: for i = 0 to w − 1 do
15:   F′(i) = F(i + 1) − F(i)
16:   G′(i) = G(i + 1) − G(i)
17:   W(i) ← 1/|G′(i) − F′(i)|
18: end for
19: repeat
20:   ∆h ← Σ_x w(x)F′(x + h)[G(x) − F(x + h)] / Σ_x w(x)F′(x + h)²
21:   h ← h + ∆h
22: until ∆h < 0.001
23: return h

4.3 Limitations of the Methods
Before detailing experiments with the visual compass approaches, it is important to point out the limitations that can be expected of the approaches. These limitations are inherent to many vision systems, and similar to biological vision limitations.

Both methods require visible features in the environment. Although the 1D case does not explicitly use features, an image of a single non-varying intensity will produce the same 1D representation regardless of rotation, and the method will fail. The 2D optic flow method depends even more heavily on the presence of edge features. For example, if there is a smoothly varying gradient in the image then the 1D method will succeed, whereas the 2D approach will be unable to calculate the optic flow.

A further limitation of the 1D approach is that in environments of uniformly distributed colour, averaging the radial lines in the image will produce non-varying 1D representations which cannot be used to find rotation. However, this limitation is pathological and unlikely to occur in realistic scenarios. Non-uniform varying intensity within an image can almost always be assumed, as differing objects or non-uniform lighting will cause varying intensity.

As with the Labrosse approach [73], self-repeating chess-board type environments could potentially cause problems with the 1D approach. When the temporally separated 1D images are registered using Equation 4.10, the rotation estimation will be found as the closest signature shift to 0. However, aliasing will occur, meaning that the closest rotation to 0 may be the incorrect rotation. Again this limitation is pathological and unlikely to occur.

An underlying assumption of both methods, and indeed of many of the appearance-based visual navigation methods discussed in Chapter 3 including the Labrosse approach, is that of an isotropic distribution of features in the environment. This assumption is used to allow the separation of rotation from translation by considering only the image movement in the front and back regions of the image. However, if this assumption is false then the translation-induced optic flow in the left and right sides of the image will not balance out. The rotation estimation will therefore be corrupted.
4.4 Experiments and Results

In this section the two rotation estimation algorithms proposed in Section 4.2 are quantitatively evaluated. Each method was tested with both simulated and real images, and compared to the Labrosse visual compass introduced in Chapter 3.
4.4.1 Simulated image experiments
The omnidirectional image simulation method proposed by Burbridge and Spacek [17] was used to generate 250 artificial images at 20 frames per second. Using this approach, a ray tracing package called POV-Ray was used to generate ray traced artificial images of an environment reflected in a conical mirror. The position of the simulated camera was changed by linear and angular velocity commands, which were executed using the kinematic equations of a differential drive robot, thus simulating a robot that is equipped with omnidirectional vision. The exact ground truth position of the robot in the simulated environment was available from the simulator.

Figure 4.4: The output of the omnidirectional vision simulator used. (a) The simulated environment. (b) An example of a simulated omnidirectional image.
The simulated robot was moved with a random rotational velocity between −20° per second and 20° per second, and a fixed linear velocity of 0.5 m/sec. The environment model is shown in Figure 4.4a, and an example of the omnidirectional image generated is shown in Figure 4.4b. The mean absolute rotation between frames was 0.0051 radians.
A graph of the computed heading of the robot using the two estimation algorithms described in Section 4.2 and the Labrosse visual compass [73] (discussed in Chapter 3.2.1.1) is shown in Figure 4.5a. Although all methods show a drift error, the Pearson product-moment correlation coefficients (PMCC) between the actual change in heading (ω) and the estimated change in heading (ω̃) for each method show very strong correlations. The error in the estimated change in heading for each method (ωe = ω − ω̃) followed an approximately normal distribution, characterised by the means ω̄e and standard deviations σ shown in Table 4.1.
Figure 4.5: (a) Accumulated heading estimates for the two proposed methods and the Labrosse method alongside the ground truth over 250 simulated frames. (b) The error in the accumulated heading estimates (the drift error) for the two proposed methods and the Labrosse method over the 250 simulated frames.
Method | Mean error ω̄e (radians) | Standard deviation σ | PMCC between ω and ω̃
2-point Optic Flow | 3.50 × 10⁻⁴ | 0.0021 | 0.93
1D Image Flow | −3.23 × 10⁻⁴ | 0.0026 | 0.87
Labrosse | −1.55 × 10⁻⁴ | 0.0016 | 0.96

Table 4.1: Performance statistics for the two proposed methods and the Labrosse method for 250 simulated image pairs.
Figure 4.5b shows a plot of the absolute heading error against frame number for each visual compass approach. As pointed out by Labrosse in [73], the slope of the robust linear regression of the error curve for each method provides a good approximate measure of the drift speed of the method. The robust linear regression of each method was found using the iteratively re-weighted least squares routine provided by the MATLAB robustfit function, and the slopes were found to be as follows:

Method | Drift error (radians / second)
2-point Optic Flow | 0.011
1D Image Flow | 0.014
Labrosse | 0.0061

Table 4.2: Drift rate for each method when applied with artificial images.
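For readers without MATLAB, the drift slope could be approximated with a small iteratively re-weighted least squares routine such as the numpy sketch below; the Huber-style weights and the fixed iteration count are assumptions and do not reproduce robustfit's exact settings.

import numpy as np

def robust_drift_slope(t, error, iterations=20, k=1.345):
    # Slope of a robust straight-line fit error ~ a + b*t, computed by iteratively
    # re-weighted least squares with Huber-style weights (a rough stand-in for the
    # MATLAB robustfit routine used in the text).
    t = np.asarray(t, dtype=float)
    y = np.asarray(error, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(iterations):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        r = y - X @ beta
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12    # robust scale estimate
        u = np.abs(r) / (k * scale)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)             # Huber weights
    return beta[1]      # drift in radians per second when t is in seconds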
4.4.2 Real image experiments
A MetraLabs Scitos G5 mobile robot equipped with a conical mirror and a webcam (as shown in Figure 4.6a) was driven around the robot arena at the University of Essex at a fixed linear velocity of 0.3 m/s and a varying rotational velocity. The ground truth pose of the robot was provided by a Vicon tracking system, and was logged along with 250 frames like that shown in Figure 4.6b. Frames were logged at approximately 20 frames per second. The mean absolute rotation between frames was 0.0093 radians.

Figure 4.6: (a) The robot equipped with omnidirectional cameras. (b) An example of a real omnidirectional image captured by the robot.
A graph of the accumulated calculated heading of the robot using the two estimation algorithms described in Section 4.2 and the Labrosse visual compass is shown in Figure 4.7a. As with the simulated image experiment, all methods show a drift error, and the Pearson product-moment correlation coefficients (PMCC) between the actual change in heading (ω) and the estimated change in heading (ω̃) for the Labrosse and the signature flow methods again show very strong correlations. The PMCC for the two-point optic flow method shows a weaker correlation, due in part to a higher failure rate caused by a lack of clear point pairs. For the initial part of the trajectory, between frames 1 and 70, there was texture available in the front and back regions of the image. However, from frame 70 onwards the regions contained little texture and the two-point approach for calculating optic flow was ineffective. For each method the error in the estimated change in heading (ωe = ω − ω̃) followed an approximately normal distribution, characterised by the means ω̄e and standard deviations σ shown in Table 4.3.
Figure 4.7: (a) Accumulated heading estimates for the two proposed methods and the Labrosse method alongside ground truth over 250 real frames. (b) The error in the accumulated heading estimates (the drift error) for the two proposed methods and the Labrosse method over the 250 real frames.
Method | Mean error ω̄e (radians) | Standard deviation σ | PMCC between ω and ω̃
2-point Optic Flow | −4.1 × 10⁻³ | 0.053 | 0.24
1D Image Flow | 1.1 × 10⁻³ | 0.0082 | 0.89
Labrosse | 1.2 × 10⁻³ | 0.0058 | 0.94

Table 4.3: Performance statistics for the two proposed methods and the Labrosse method for 250 real image pairs.
Figure 4.7b shows a plot of the absolute heading error against frame number for each visual compass approach. Again the robust linear regression of each method was found using the MATLAB robustfit routine. The drift speeds were found to be as follows:

Method | Drift error (radians / second)
2-point Optic Flow | 0.064
1D Image Flow | 0.031
Labrosse | 0.022

Table 4.4: Drift rate for each method when applied with real images.
4.4.3 Discussion
It is important to note that only a partial implementation of the Labrosse method has been used here. In particular, in order to find the rotation between two consecutive frames, where the rotation is very small, a high level of interpolation has been used. A complete implementation of the Labrosse visual compass includes criteria for deciding between which frames to calculate the rotation. For this reason, the Labrosse method did not perform as well in this work as that published in [73]. However, this partial implementation still allows conclusions to be drawn from comparisons between the methods.
Of the two new algorithms proposed for calculating the rotation of the robot, the 1D image flow method (Method 2) produced the best results when applied to real images and is comparable to the method proposed by Labrosse [73]. The two-point optic flow method (Method 1) was ineffective in real world situations, despite having shown promising results in simulation. This is because of the strong limitation of explicitly requiring image features from which to calculate the optic flow.

The 1D approach does not require such explicit image features. This makes it more robust, as shown in the experiment with real images, where the 2D method began to fail from frame 70 onwards whereas the 1D approach succeeded. This suggests that, although the Labrosse method performed best, future research could investigate the use of 1D images for finding the rotation between temporally more distant frames, as in a complete implementation of the Labrosse approach.
The motion of a robot in a 20th of a second (i.e. between consecutive frames) is very small, and calculating it as attempted by the two newly proposed rotation estimation algorithms is prone to error. Even a small error in the calculation, for example the error resulting from noisy measurements, can be vastly larger than the actual motion. For this reason it can be concluded that finding the rotation of the robot is best achieved at a reduced frame rate.
4.5 Calculating the Translation of the Robot
The previous section showed that the rotation between frames can be found by applying a visual compass algorithm. If the range to a single point feature can be measured, then a complete ego-motion estimate can be formed by finding the translation of the robot between those frames. This becomes an easily tractable application of trigonometry:

1. At time step t, choose an image feature in the lower catadioptric image.

2. Calculate the range r0 and the bearing angle θ0 to the point by finding its location in the upper mirror using coaxial stereo as described in Chapter 2.3.

3. At time step t + 1 calculate the new range to the point, r1, and the new bearing angle, θ1 (see Figure 4.8).
4.5. Calculating the Translation of the Robot
4. Compute the translation
where
ω
(tx , ty )
62
of the robot:
tx = x0 − cos(ω) x1 + sin(ω) y1
(4.11)
ty = y0 − sin(ω) x1 − cos(ω) y1
(4.12)
x0 = r0 .cos(θ0 )
(4.13)
y0 = r0 .sin(θ0 )
(4.14)
x1 = r1 .cos(θ1 )
(4.15)
y1 = r1 .sin(θ1 )
(4.16)
is the rotation between the frames.
Figure 4.8: The translation of a robot between two frames can be calculated from the distance to a single landmark if the rotation ω between the frames is known.
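The four steps above amount to a few lines of trigonometry. The sketch below assumes that the ranges and bearings have already been obtained from the coaxial stereo and that ω comes from one of the visual compass methods; it simply transcribes Equations 4.11 to 4.16.

import numpy as np

def translation_from_landmark(r0, theta0, r1, theta1, omega):
    # Translation (tx, ty) of the robot between two frames, given the range and
    # bearing to one landmark in each frame and the rotation omega between the
    # frames (Equations 4.11 to 4.16).
    x0, y0 = r0 * np.cos(theta0), r0 * np.sin(theta0)
    x1, y1 = r1 * np.cos(theta1), r1 * np.sin(theta1)
    tx = x0 - np.cos(omega) * x1 + np.sin(omega) * y1
    ty = y0 - np.sin(omega) * x1 - np.cos(omega) * y1
    return tx, ty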
Omnidirectional stereo vision could be used to find the range information, and the conceptual simplicity of the formulae proposed by Spacek [112] (see Chapter 2.3) suggests that application on a low-cost robot may be feasible. However, although the omnidirectional stereo vision with conical mirrors proposed by Spacek [112] has been shown to work successfully in simulation (see Burbridge and Spacek [17]), perfect mirror and camera alignments are essential for it to work reliably in real world applications. When using the real mirrors and cameras shown in Figure 4.6a, poor alignment between the cones and cameras prevents stereo reconstruction.

In the next section the image projections through a conical mirror are explored in greater detail, allowing stereopsis to be achieved regardless of the alignment of the cameras and cones. Previously, conical mirrors could only be used for geometric reconstructions when perfect alignment could be guaranteed.
4.6 Omnidirectional Binocular Vision Without Alignment
Taking into account the true alignment of the mirrors and cameras allows the derivation of a more general set of formulae for catadioptric omnidirectional projections. The back projection of a pixel provides the ray line in three-dimensional space on which the point imaged by the pixel must lie. By imaging a landmark from two viewpoints, the two back projected rays can be intersected to reconstruct the landmark's 3D coordinates.

Forward projection formulae for projection through a non-aligned cone have been proposed by Cauchois et al. [20, 21]. However, the formulae proposed will only work if the vertex of the cone, the imaged world point, the camera and the normal to the cone are all in the same plane. New forward and backward projection formulae for catadioptric omnidirectional vision using an arbitrarily placed cone mirror and perspective camera are presented here.
4.6.1 Backward Projection Formulae
The back projected ray line is described in vector form as (D, d̂), where D is the start point of the ray and d̂ is a unit vector giving the direction. Tracing the ray backwards from the camera can be split into three stages:

• determining the direction of the ray as it leaves the camera;

• determining the point on the surface of the mirror at which it was reflected;

• determining the direction of the ray from the point on the mirror to the imaged point.

These stages are solved as follows.
Ray leaving the camera
The direction of the reversed ray leaving the camera can be specified with the standard pinhole camera model using the intrinsic parameters of the camera, namely the focal length f, scale factor s and the coordinates of the principal point (px, py). In a camera centred coordinate system with the camera looking along the negative z axis and the horizontal image direction parallel to the x axis, as shown in Figure 4.9, the direction of a ray leaving the camera associated with an image location (u, v) is

    d = (px − u, py − v, f/s)ᵀ.    (4.17)
Figure 4.9: The perspective projection camera model.
Converting the ray from camera centred coordinates to cone centred coordinates, as shown in Figure 4.10, gives the new ray (C, ĉ), where

    ĉ = R d̂.    (4.18)

R is a rotation matrix constructed from the Euler angles for the rotation of the camera, d̂ is the normalised form of the direction ray d, and C is the location of the camera in cone centred coordinates.

Figure 4.10: The three coordinate systems: world, cone and camera. Each coordinate system is unique, with a rotation and translation between systems.
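The first stage of the back projection (Equations 4.17 and 4.18) can be written directly, as in the sketch below; the intrinsic parameters f, s, (px, py), the camera rotation matrix R_cam and the camera position C are assumed to be known from calibration.

import numpy as np

def camera_ray(u, v, f, s, px, py):
    # Direction of the reversed ray leaving the camera for pixel (u, v), in camera
    # centred coordinates (Equation 4.17), returned as a unit vector.
    d = np.array([px - u, py - v, f / s], dtype=float)
    return d / np.linalg.norm(d)

def ray_in_cone_frame(d_hat, R_cam, C):
    # Express the ray in cone centred coordinates (Equation 4.18): it starts at the
    # camera position C and has direction c_hat = R_cam @ d_hat.
    c_hat = R_cam @ d_hat
    return np.asarray(C, dtype=float), c_hat / np.linalg.norm(c_hat)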
Intersection with the cone
The point of intersection I between the ray and the cone can be found algebraically by equating the formula for the cone with the formula for the ray:

    (Ix, Iy, Iz)ᵀ = I = C + p ĉ,    (4.19)

    Iz + (h/r) √(Ix² + Iy²) = 0.    (4.20)

Substituting Equation (4.19) into Equation (4.20) and solving for p gives two solutions:

    p = −((−Ax − Ay)a² + Az − (2 Ax a² Ay − 2 Ax Az − 2 Ay Az + cx² Cz² − cx² a² Cy² − cy² a² Cx² + cy² Cz² + cz² Cx² + cz² Cy²)^(1/2) a) / (−cx² a² − cy² a² + cz²)    (4.21)

and

    p = −((−Ax − Ay)a² + Az + (2 Ax a² Ay − 2 Ax Az − 2 Ay Az + cx² Cz² − cx² a² Cy² − cy² a² Cx² + cy² Cz² + cz² Cx² + cz² Cy²)^(1/2) a) / (−cx² a² − cy² a² + cz²)    (4.22)

where a = h/r is the ratio between the height h and radius r of the cone, Ax = Cx ĉx, Ay = Cy ĉy and Az = Cz ĉz. The correct value of p is selected as the smallest of the solutions that satisfy Cz + p cz < 0.
Direction of the reflected ray

The back projected ray intersects the mirror at I with a direction vector ĉ. Assuming that the mirror is a perfectly specular reflective surface and that there are no manufacturing imperfections in the shape of the cone, the reflected ray direction can then be found as

    r̂ = ĉ − 2(ĉ · n̂)n̂    (4.23)

where n̂ is the normalised vector normal to the cone at I, found by differentiating the formula for the cone, Equation (4.20), at I:

    n = ∇f = (∂f/∂Ix, ∂f/∂Iy, ∂f/∂Iz)ᵀ = ( h Ix / ((Ix² + Iy²)^(1/2) r),  h Iy / ((Ix² + Iy²)^(1/2) r),  1 )ᵀ.    (4.24)

Multiplying by √(Ix² + Iy²) and observing that h/r is the same as Iz (Ix² + Iy²)^(−1/2) allows Equation (4.24) to be simplified to:

    n = ( Ix Iz,  Iy Iz,  −Ix² − Iy² )ᵀ.    (4.25)

The resulting ray line that the point imaged at (u, v) must lie on is (I, r̂).
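The remaining two stages can be sketched as follows. Rather than transcribing the closed-form roots of Equations 4.21 and 4.22, this version substitutes the ray into the cone equation and solves the resulting quadratic with np.roots, which is an implementation choice; the selection rule and the normal of Equation 4.25 follow the text.

import numpy as np

def intersect_cone(C, c_hat, h, r):
    # Intersection I of the ray (C, c_hat) with the cone z + (h/r)*sqrt(x^2 + y^2) = 0
    # (Equation 4.20). Squaring the cone equation after substituting I = C + p*c_hat
    # gives a quadratic A p^2 + B p + D = 0, solved here numerically.
    a = h / r
    Cx, Cy, Cz = C
    cx, cy, cz = c_hat
    A = a * a * (cx * cx + cy * cy) - cz * cz
    B = 2.0 * (a * a * (Cx * cx + Cy * cy) - Cz * cz)
    D = a * a * (Cx * Cx + Cy * Cy) - Cz * Cz
    roots = np.roots([A, B, D])
    valid = [p.real for p in roots
             if abs(p.imag) < 1e-9 and Cz + p.real * cz < 0]   # selection rule from the text
    if not valid:
        return None
    return np.asarray(C, dtype=float) + min(valid) * np.asarray(c_hat, dtype=float)

def reflect_at_cone(I, c_hat):
    # Reflect the incoming ray about the cone normal at I (Equations 4.23 and 4.25).
    Ix, Iy, Iz = I
    n = np.array([Ix * Iz, Iy * Iz, -(Ix * Ix + Iy * Iy)])
    n_hat = n / np.linalg.norm(n)
    r_hat = c_hat - 2.0 * np.dot(c_hat, n_hat) * n_hat
    return I, r_hat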
4.6.2 Forward Projection Formulae
The aim of the forward projection model is to map a point from world coordinates P = (x, y, z)ᵀ to the pixel coordinates (u, v) at which it appears in the image. This involves two steps:

• finding the point on the surface of the mirror that reflects the point into the camera (the reflection point);

• projecting this point into the camera.

Finding the reflection point can be achieved by placing constraints upon its location, and then solving these constraints as a system of equations.
Reflection point

Assuming that the world point has been transformed into cone centred coordinates, with the rotation and translation between the cone and world known, the following constraints upon the reflection point R = (Rx, Ry, Rz)ᵀ exist:

1. The point must be on the surface of the cone:

    Rz + (h/r) √(Rx² + Ry²) = 0.    (4.26)

2. The angle between the normal to the cone at the reflection point and the vector from the world point to the reflection point must equal the angle between the normal to the cone at the reflection point and the vector from the projection into the camera to the reflection point:

    ((C − R)/‖C − R‖) · n − ((P − R)/‖P − R‖) · n = 0    (4.27)

expanding to

    ((Cx − Rx) Rz Rx + (Cy − Ry) Ry Rz + (Cz − Rz)(−Rx² − Ry²))² / ((Cx − Rx)² + (Cy − Ry)² + (Cz − Rz)²)
    − ((x − Rx) Rz Rx + (y − Ry) Ry Rz + (z − Rz)(−Rx² − Ry²))² / ((x − Rx)² + (y − Ry)² + (z − Rz)²) = 0.    (4.28)

3. The normal to the cone at the reflection point must be in the same plane as the world point, the camera, and the reflection point:

    ((P − C) × (P − R)) · n = 0    (4.29)

expanding to

    ((y − Cy)(z − Rz) − (z − Cz)(y − Ry)) Rz Rx + ((z − Cz)(x − Rx) − (x − Cx)(z − Rz)) Rz Ry + ((x − Cx)(y − Ry) − (y − Cy)(x − Rx)) (−Rx² − Ry²) = 0.    (4.30)
A direct analytical solution for R cannot be obtained from the above constraints because Equation (4.28) is highly non-linear, containing fifth powers. Figure 4.11 shows the complexity of these constraints graphically.
Figure 4.11: A graphical representation of the forward projection constraints (4.26), (4.28) and (4.30). All parameters other than R are fixed.
Iterative solution

A solution can be obtained iteratively using Newton's method:

$$R_{n+1} = R_n - \Delta F(R_n)^{-1} F(R_n) \tag{4.31}$$

where F = 0 is a multidimensional function comprising Equations (4.26), (4.28) and (4.30), and ∆F is the Jacobian of F with respect to the reflection point R_n at iteration n. With a good initial estimate of R_n few iterations are generally required; however, when a poor initial estimate is used convergence is not guaranteed. Adaptations of Newton's method such as the Newton-Armijo method, which addresses this issue by introducing a step size, and the Newton-GMRES iteration, which offers better global convergence, are discussed in [58].
When the camera is nearly aligned with the axis in the usually required manner, the projection equations detailed by Spacek [112] can be used to provide an initial estimate. If the camera is intentionally set off axis then a method of searching for an initial point is required.
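To make Equation (4.31) concrete, the following Python sketch iterates Newton's method on the three reflection-point constraints. It is only an illustration under stated assumptions: the cone ratio a = h/r, the camera centre C and the world point P are already in cone centred coordinates, a finite-difference Jacobian stands in for the analytic ∆F, and all function names are placeholders rather than the thesis implementation.

```python
import numpy as np

def constraints(R, C, P, a):
    """Evaluate F(R) built from Equations (4.26), (4.27) and (4.29)."""
    Rx, Ry, Rz = R
    n = np.array([Rx * Rz, Ry * Rz, -Rx**2 - Ry**2])   # normal direction at R (cf. Eq. 4.25)
    f1 = Rz + a * np.hypot(Rx, Ry)                      # on the cone surface (Eq. 4.26)
    v_cam = (C - R) / np.linalg.norm(C - R)
    v_wld = (P - R) / np.linalg.norm(P - R)
    f2 = v_cam @ n - v_wld @ n                          # equal reflection angles (Eq. 4.27)
    f3 = np.cross(P - C, P - R) @ n                     # coplanarity (Eq. 4.29)
    return np.array([f1, f2, f3])

def numerical_jacobian(F, R, eps=1e-7):
    """Finite-difference Jacobian of F at R (stands in for the analytic Delta-F)."""
    J = np.zeros((3, 3))
    f0 = F(R)
    for i in range(3):
        dR = np.zeros(3)
        dR[i] = eps
        J[:, i] = (F(R + dR) - f0) / eps
    return J

def find_reflection_point(C, P, a, R0, iterations=20, tol=1e-10):
    """Newton iteration R_{n+1} = R_n - Delta-F(R_n)^-1 F(R_n); needs a good initial R0."""
    R = np.asarray(R0, dtype=float)
    F = lambda X: constraints(X, C, P, a)
    for _ in range(iterations):
        step = np.linalg.solve(numerical_jacobian(F, R), F(R))
        R = R - step
        if np.linalg.norm(step) < tol:
            break
    return R
```

As the surrounding text notes, the iteration is only reliable when R0 is close to the true reflection point, for example taken from the aligned-case equations.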
Projection into the camera

The reflection point R in cone centred coordinates can be projected into the camera using the pinhole camera model:

$$\begin{pmatrix} Su \\ Sv \\ S \end{pmatrix} = M \begin{pmatrix} R_x \\ R_y \\ R_z \\ 1 \end{pmatrix} \tag{4.32}$$

where S is the homogeneous coordinate scale factor and M is the 3 × 4 camera projection matrix comprising of both the intrinsic and extrinsic parameters of the camera as in Equations (4.17) and (4.18):

$$M = \begin{pmatrix} sf & 0 & p_x & 0 \\ 0 & -sf & p_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R^{-1} & -C \\ \mathbf{0} & 1 \end{pmatrix}. \tag{4.33}$$
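As a small illustration of Equations (4.32) and (4.33), the Python sketch below projects a reflection point, already expressed in cone centred coordinates, into pixel coordinates. The intrinsic values (sf, p_x, p_y) and the camera rotation and translation used in the usage line are placeholder assumptions, not calibrated values from the thesis.

```python
import numpy as np

def project_to_pixel(R_point, sf, px, py, R_cam, C_cam):
    """Pinhole projection of Equation (4.32): [Su, Sv, S]^T = M [R, 1]^T."""
    # Intrinsic 3x4 matrix with the negated y focal length, as in Equation (4.33).
    K = np.array([[sf, 0.0, px, 0.0],
                  [0.0, -sf, py, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    # Extrinsic 4x4 matrix built from the inverse rotation and the camera centre.
    E = np.eye(4)
    E[:3, :3] = np.linalg.inv(R_cam)
    E[:3, 3] = -np.asarray(C_cam)
    M = K @ E
    Su, Sv, S = M @ np.append(R_point, 1.0)
    return Su / S, Sv / S        # (u, v) pixel coordinates

# Hypothetical usage with placeholder calibration values.
u, v = project_to_pixel(np.array([0.05, 0.02, -0.10]),
                        sf=600.0, px=320.0, py=240.0,
                        R_cam=np.eye(3), C_cam=np.array([0.0, 0.0, 0.2]))
```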
These newly derived formulae can now be used to achieve stereo vision using arbitrarily aligned mirrors and cameras, as demonstrated in Burbridge et al. [16].
However, they are far more complicated than the attractively simple formulae by
Spacek [112], and their application in robotics needs to be more carefully considered.
4.6.3 Application in Robotics
Having formulated a more generic set of projections with a conical mirror, it is now
possible to draw conclusions regarding the applicability of omnidirectional stereo
vision to small low-cost robots. When perfect alignment is not assumed, there are many parameters involved in achieving stereo with two mirrors. To summarise, these parameters are as follows:
• the transformation between the first cone and the first camera (six parameters), and the transformation between the second cone and second camera (six parameters),

• the transformation between the first and second cone (six parameters),

• the ratio between the height and radius of the cone (one parameter),

• the intrinsic parameters of each camera (eight parameters),

• the distortion models for the cameras (at least two parameters for radial and two for tangential distortions), and

• any manufacturing imperfections in the shape of the mirror, requiring further calibration.
In total there are at least 31 parameters, all of which are critical for accurate geometric reconstruction. In mobile robot applications, many of these parameters are
subject to change when the robot bumps into an object or wobbles over a dip in
the ground for example, and the appealing simplicity of the formulae presented by
Spacek [112] is lost. Using a catadioptric solution to omnidirectional stereo is only
possible with expensive well-manufactured precision apparatus.
Furthermore, even with well manufactured equipment the computational cost of the
approach is high. Feature matches between the two stereo images are required in
all stereo approaches (the correspondence problem), which leads to computationally
demanding feature extraction and matching.
This problem becomes even more
demanding when considering the new stereo formulation presented here.
This is
because when perfect alignment is not assumed, no convenient epipolar constraint
exists to allow guided feature matching as is the case in Spacek's formulation (see
Chapter 2). Widespread application to small, cheap robots is therefore infeasible.
4.7 Summary
In this chapter omnidirectional stereo vision has been considered for ego-motion
estimation. Two new approaches to finding the rotation of the robot between frames
have been proposed and tested.
The results for the rotation estimation showed
that both methods are able to estimate the rotation, but that an existing method
marginally outperformed them both. As a purely appearance-based method, the 1D
image flow algorithm performed better in real images than the 2-point optic flow
method as no local image features were required.
It was observed that if the range to image features can be measured, then calculating the translation of the robot is trivial. However, calculating the range using the
simplistic projection formulation of omnidirectional stereo described in Section 2.3 is
not possible with poorly aligned apparatus. In order to rectify this situation, a new
set of equations for image projections with a conical mirror has been formulated
for the more general case of non-exact alignment. However, the formulation encompassed a very large number of parameters, suggesting that a method that makes
use of omnidirectional stereo with conical mirrors is not ideally suited for low-cost
mobile robots.
Although the results achieved by the methods presented in this chapter have not
been satisfactory, conclusions can be drawn as to what the best method of achieving
reliable navigation on a low-cost low-powered commercially viable robot is:
1. Attempting to calculate the inter frame translation of the robot with no form of
filter to integrate a dynamic model with past sensory inputs is highly prone to
error. Although not relying on measurement history or a known model of the
robot kinematics theoretically allows instantaneous acceleration detection, the
magnitude of the motion being estimated is vastly dwarfed by the magnitude
of the error in the sensor readings.
2. The use of stereo omnidirectional vision achieved with catadioptric sensors requires many parameters to be known and complex projections to be evaluated.
When a geometric reconstruction is required, the combination of cost, weight
and complexity of catadioptric setups seriously limits their scope in robotic
applications.
Having discovered that binocular omnidirectional vision is unsuitable for cheap robots, the natural question to ask is "What is the feasibility of navigation with only a single omnidirectional camera?". In the next chapter navigation with a single
camera is considered. Rather than the approach of ego-motion estimation taken in
this chapter, conventional SLAM algorithms are considered. An implementation is
constructed and tested in the context of low-cost navigation.
5 Metric Simultaneous Localisation and Mapping with Monocular Vision
5.1 Introduction
Chapter 4 concluded that neither stereo vision nor inter frame localisation were
effective approaches to robot localisation for low powered robots.
An alternative
approach can be found in metric simultaneous localisation and mapping (SLAM)
algorithms that make use of a motion model and past readings when estimating the
position of the robot, effectively smoothing the location estimation.
By building
a map and performing loop closing, the kind of drift error that was apparent in
Section 4.2 can be prevented.
Although the difficulties with conical stereo are not prohibitive, as shown in Chapter
4, the added complexity brought about by having two cameras dramatically limits
the usefulness of omnidirectional stereo for low-cost robot navigation. Davison [26]
showed that stereo is not necessary to achieve visual SLAM with an approach called
MonoSLAM which makes use of a single camera to localise and map point features
in 3D.
Monocular omnidirectional vision using a fish-eye lens, as described in Chapter 2, avoids both the complexities of stereo vision, and the complexities of projections in
a non-central catadioptric system. As detailed in Chapter 3, omnidirectional vision
has had relatively little attention as a sensor for metric single camera SLAM, despite
being ideally suited to the process of localisation. Earlier research that has made
use of omnidirectional vision for SLAM has not fully quantified the improvements
gained by omnidirectional sensing.
In this chapter the use of such a metric SLAM is considered in the context of low-cost robots. In Section 5.2 the well known extended Kalman filter based SLAM
approach is detailed, and in Section 5.3 simulation experiments are presented that
quantitatively show the advantages gained by the use of a wider angle lens.
The
suitability of metric SLAM to low-cost, low powered robots is discussed in Section
5.4.
5.2 Extended Kalman Filter SLAM Algorithm
In Kalman Filter based SLAM algorithms the system state is stored as a vector. The state is represented by the robot position (x, y, θ), its angular and linear velocities (ℓ_v, ω_v), and the world positions (x_f, y_f) of all features in the map m:

$$x = \begin{bmatrix} x & y & \theta & \ell_v & \omega_v & m \end{bmatrix}^T \tag{5.1}$$

$$m = \begin{bmatrix} x_{f1} & y_{f1} & x_{f2} & y_{f2} & \ldots & x_{fn} & y_{fn} \end{bmatrix} \tag{5.2}$$

where f_n is the nth feature in the map.

The aim of the algorithm is to maintain an estimate of the state x in the form of a probability distribution P(x_t | x_{t−1}, Z_t), where Z_t is the measurement vector at time t.

The Kalman filter models the probability distribution as Gaussian, allowing it to be represented by a covariance matrix P and the mean state vector x. Any linear transformation of the state estimation will produce an altered Gaussian distribution, meaning that the Kalman filter is an optimal state filter for linear systems with Gaussian probability distributions.

In an extended Kalman filter (EKF), the non-linear measurement and motion models are linearised about the current state estimate as Jacobian matrices. Therefore for non-linear systems the EKF is only a rough non-optimal approximation. However, it is still widely used when the state transition and measurement models are near-linear. An alternative Bayesian filter for highly non-linear systems is the particle filter, in which the state estimate is maintained as a discrete set of samples.

The algorithm follows the usual predict-refine cycle, whereby the state x is predicted at time step t + 1, and measurements are used to refine the prediction. A robot motion model is used to predict the state x at the subsequent time step.
5.2.1 Robot Motion Prediction Model
A motion model is employed in order to predict the position of the robot at the next
time step. Many SLAM algorithms use wheel odometry information as the motion
model. This is an appealingly simplistic model, although problems could potentially
occur when the accuracy of the odometry varies significantly throughout the robot
traversal, requiring additional methods such as that presented in the last chapter
in order to estimate the accuracy of the odometry. In addition to this, some small
robots [38] have no wheel odometry.
The use of odometry can be avoided by using a constant velocity motion model and the kinematics of a differential drive robot as follows:

$$x_{t+1} = f(x_t, n) = x_t + \begin{pmatrix} -\dfrac{\ell_v + \ell_{vn}}{\omega_v + \omega_{vn}}\sin\theta + \dfrac{\ell_v + \ell_{vn}}{\omega_v + \omega_{vn}}\sin\!\big(\theta + (\omega_v + \omega_{vn})\Delta t\big) \\[6pt] \dfrac{\ell_v + \ell_{vn}}{\omega_v + \omega_{vn}}\cos\theta - \dfrac{\ell_v + \ell_{vn}}{\omega_v + \omega_{vn}}\cos\!\big(\theta + (\omega_v + \omega_{vn})\Delta t\big) \\[6pt] (\omega_v + \omega_{vn})\Delta t \\ \ell_{vn} \\ \omega_{vn} \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{5.3}$$

where $n = \begin{bmatrix} \ell_{vn} & \omega_{vn} \end{bmatrix}^T$ is the zero mean Gaussian process noise and ∆t is the amount of time ahead that the model is being used to predict. A complete derivation of this model is presented by Thrun et al. [118].

The covariance of the state estimate is updated using the Jacobian of the motion model with respect to x, ∆f_x, and with respect to n, ∆f_n:

$$P = \Delta f_x P \Delta f_x^T + \Delta f_n Q \Delta f_n^T \tag{5.4}$$

where Q is the covariance of the process noise. The Jacobian matrix ∆f_x does not depend upon the map or velocity elements in the state vector, so state augmentation [123] can be employed to decrease the computational complexity of this operation.
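As a rough sketch of the prediction step of Equations (5.3) and (5.4), the Python fragment below propagates a simplified robot-only state (x, y, θ, ℓ_v, ω_v) under the constant velocity model. The function names, the numerical Jacobian and the collapsing of the ∆f_n Q ∆f_n^T term into a single additive Q are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def motion_model(state, dt):
    """Constant velocity differential-drive prediction (robot part of Eq. 5.3, zero noise)."""
    x, y, theta, lv, wv = state
    if abs(wv) < 1e-9:                      # straight-line limit avoids division by zero
        dx, dy = lv * dt * np.cos(theta), lv * dt * np.sin(theta)
    else:
        dx = -lv / wv * np.sin(theta) + lv / wv * np.sin(theta + wv * dt)
        dy = lv / wv * np.cos(theta) - lv / wv * np.cos(theta + wv * dt)
    return np.array([x + dx, y + dy, theta + wv * dt, lv, wv])

def predict(state, P, Q, dt, eps=1e-6):
    """EKF prediction: propagate the mean and update the covariance (Eq. 5.4 form)."""
    new_state = motion_model(state, dt)
    # Numerical Jacobian of the motion model with respect to the state.
    F = np.zeros((5, 5))
    for i in range(5):
        d = np.zeros(5)
        d[i] = eps
        F[:, i] = (motion_model(state + d, dt) - new_state) / eps
    return new_state, F @ P @ F.T + Q
```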
5.2.2 Measurements to Refine Motion Prediction

The measurement model predicts what the measurement will be from the current state estimate. In the case of a single monocular camera mapping and localising in 2D, the predicted measurement of each mapped feature i is a single scalar ẑ_i giving the relative bearing to the feature:

$$\hat{z}_i = h(x, v) = \arctan\!\left(\frac{y_{fi} - y}{x_{fi} - x}\right) - \theta + v \tag{5.5}$$

where v is the measurement noise and θ is the heading angle of the robot.

The state estimate and covariance are refined after each feature measurement z_i using the standard equations of the EKF as follows:

$$S = (\Delta h_x P \Delta h_x^T + \Delta h_v R \Delta h_v^T)^{-1} \tag{5.6}$$

$$x = x + P \Delta h_x^T S (z_i - \hat{z}_i) \tag{5.7}$$

$$P = (I - P \Delta h_x^T S \Delta h_x) P \tag{5.8}$$

where ∆h_x and ∆h_v are the Jacobians of the measurement model (5.5) with respect to the state and the noise, and R is the covariance of the measurement noise, in this case simply a scalar representing the variance. S represents the certainty of the predicted measurement, which can be used to provide search bounds for the feature.
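Continuing the illustrative Python fragments above, the sketch below applies the bearing-only update of Equations (5.5) to (5.8) for a single feature. The variable names, the numerical Jacobian and the simplified robot-only state are assumptions for the purpose of illustration.

```python
import numpy as np

def update_with_bearing(state, P, z, feature_xy, R_var, eps=1e-6):
    """Refine the state with one measured bearing z to a known feature (Eqs. 5.5 to 5.8)."""
    def h(s):
        # Predicted relative bearing from robot pose (s[0], s[1], s[2]) to the feature.
        return np.arctan2(feature_xy[1] - s[1], feature_xy[0] - s[0]) - s[2]

    z_hat = h(state)
    # Numerical Jacobian of the measurement model with respect to the state.
    H = np.zeros((1, state.size))
    for i in range(state.size):
        d = np.zeros(state.size)
        d[i] = eps
        H[0, i] = (h(state + d) - z_hat) / eps

    S = np.linalg.inv(H @ P @ H.T + R_var)                          # Eq. 5.6
    innovation = np.arctan2(np.sin(z - z_hat), np.cos(z - z_hat))   # wrapped angle difference
    state = state + (P @ H.T @ S).flatten() * innovation            # Eq. 5.7
    P = (np.eye(state.size) - P @ H.T @ S @ H) @ P                  # Eq. 5.8
    return state, P
```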
5.2.3 Placing Features in the Map
In monocular vision based SLAM, only the bearing to a feature can be measured, so measurements from two different views are required in order to estimate the distance to the feature and to initialise it in the map. The typical solution in monocular vision based SLAM is to initialise several known features into the map before the algorithm starts. Without this or any odometry, the robot would be unable to know how far it had travelled in order to initialise features, and scale would have to be left as an unknown as in [24]. In 2D at least two pre-initialised known features are needed to constrain the robot location.

In the implementation described in this chapter, as soon as the robot starts moving new features are tracked for potential mapping. Bailey's frozen state approach [7] is employed to initialise the features. This occurs by freezing a copy of the robot state at the time a first measurement of a potential feature is made, and using the frozen copy when the feature is measured again with a sufficient baseline. Using Bailey's approach allows the consistency of the map covariance to be maintained.
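To make the initialisation step concrete, the sketch below triangulates a feature from two bearings taken at two robot poses. It illustrates only the geometry behind delayed (frozen state) initialisation, not Bailey's covariance bookkeeping, and all names are assumptions.

```python
import numpy as np

def triangulate_feature(pose_a, bearing_a, pose_b, bearing_b):
    """Intersect two bearing rays taken from (x, y, theta) poses; bearings are relative to heading.

    Returns the estimated (x, y) of the feature, or None when the rays are nearly
    parallel (insufficient baseline or disparity), in which case initialisation waits.
    """
    xa, ya, tha = pose_a
    xb, yb, thb = pose_b
    da = np.array([np.cos(tha + bearing_a), np.sin(tha + bearing_a)])  # ray direction A
    db = np.array([np.cos(thb + bearing_b), np.sin(thb + bearing_b)])  # ray direction B
    A = np.column_stack((da, -db))
    if abs(np.linalg.det(A)) < 1e-6:
        return None
    t, _ = np.linalg.solve(A, np.array([xb - xa, yb - ya]))
    return np.array([xa, ya]) + t * da      # point along ray A at the intersection
```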
5.2.4 Algorithm Overview

An outline of the well known extended Kalman filter based algorithm for bearings only 2D SLAM with Bailey's frozen state feature initialisation [7] is presented in Algorithm 5.1.

Algorithm 5.1 Extended Kalman filter SLAM with a monocular camera.
Require: A state vector with two pre-mapped features.
1:  while running do
2:    Predict state using motion model.
3:    for all features in the map do
4:      if feature visible then
5:        Refine state and covariance estimates based on feature measurement.
6:      end if
7:    end for
8:    for all partially initialised features do
9:      if feature not found then
10:       Delete frozen copy of state.
11:     else if baseline and disparity above threshold then
12:       Insert feature estimate into state.
13:       Augment covariance matrix with covariance estimate for new feature based on frozen copy.
14:       Remove corresponding frozen copy of state.
15:     end if
16:   end for
17:   for all remaining visible features do
18:     Freeze a copy of the robot state.
19:   end for
20: end while
5.3 Experiments with EKF SLAM
In this section the SLAM algorithm detailed in Section 5.2.4 is tested in simulation
to evaluate the accuracy of the localisation and its suitability to low-cost robots.
5.3.1 Experimental Method
Experiments were carried out in simulation so that repeatable testing could be
performed. Doing so makes it possible to ensure that only the factor that is being
investigated changes between tests.
The simulation was made realistic by closely
linking it to a real physical robot.
A Hybrid Simulation
In order to avoid using an over-simplistic simulation of robot kinematics, the trajectory of a real physical robot being driven around an arena was logged using a
Vicon tracking system. This logged trajectory was then used as the trajectory of
the robot in the simulation (see Figure 5.1) avoiding any bias towards the simulator
caused by over-simplified perfect trajectories.
Figure 5.1: The hybrid simulation method. Only the sensor perception is simulated.
This hybrid simulation method means that only the sensor is being simulated.
Throughout the experiments detailed below, robot refers to the real physical robot,
while sensor, feature and world refer to software simulations.
Figure 5.2: The linear and angular velocity of the robot over its journey. (Two panels: angular velocity of the robot in radians/second and linear velocity in m/s, plotted against frame number at 30 fps.)
Figure 5.3: The ground truth trajectory of the robot and the trajectory reported by the on-board odometry. (Plot of Y (metres) against X (metres) showing the ground truth and odometry traces.)
Robot Trajectories
A trajectory was acquired by driving a robot around the robot lab at the University of Essex while capturing the exact position at 30 Hz. The linear and angular velocities were varied throughout the journey as shown in Figures 5.2 and 5.3. This trajectory realistically violates the constant velocity model described in Section 5.2.1, maintaining the applicability of these experiments to real robot applications.

The same trajectory was then used for each experiment. The visible range of the simulated sensor was fixed at 10 metres, which covered all features placed within the direction of view. In all experiments, Gaussian noise was added to the measurements with σ = 0.0025. This was added to the measurement as a percentage of the field of view of the sensor, simulating the idea that the sensor has a fixed resolution that is spread across the field of view. For a field of view of 360° this equates to a standard deviation of 0.9°.
The performance of the localisation in each experiment was assessed by calculating
the root mean squared error (RMSE) of the position of the robot over the whole
journey. In this work no drift error occurs because data association is known and
four pre-initialised features are provided.
5.3.2 Experiment 1: How many features are needed?
In order to evaluate the effect the number of features in the environment has on the localisation error, worlds were generated with a number of features uniformly distributed in a 6 m × 6 m grid centred at the robot start position. The number of features was varied, and the accuracy of the localisation for each was compared. The results are shown in Figure 5.4.
Figure 5.4: The effect of the number of features available in the world (in addition to the four pre-initialised features) on the accuracy of the localisation. (Plot of R.M.S.E. (metres) against the number of features.)
As can be seen from the graph, even with only the four initially known features the algorithm performs well, achieving an average error of approximately 5 cm. As more features are added to the world the accuracy increases. However, as the number of features increases, the amount of improvement in accuracy decreases. This would suggest that maintaining a map of only 40 well spaced features will perform almost as well as maintaining a similar map of 50 features. The marginally lower accuracy achieved using the smaller map may be compensated for by the increased speed of the computation.
5.3.3 Experiment 2: How important is an isotropic feature distribution?
The distribution of features in the environment is an important factor in the accuracy
of the localisation of the robot. To demonstrate this the SLAM algorithm was run
in a world of 25 features randomly distributed about the origin, with all 25 features
visible in every frame.
The random distribution was Gaussian, and the standard deviation was varied between 0 m and 2 m. The results are shown in Figure 5.5.
Figure 5.5: The effect of the spacing of world features on the accuracy of the localisation. (Plot of R.M.S.E. (metres) against the standard deviation of the feature distribution (metres).)
When the features are not well spread across the environment (i.e. have a small standard deviation) they are mostly at the same bearing as the robot moves away from the origin. This non-uniform distribution of features about the robot causes a decrease in accuracy coming from poor triangulation and the features being more distant and thus more affected by noisy measurements. As the standard deviation
of the feature distribution is increased, the features become more spread across the
whole of the robot trajectory and the overall accuracy increases.
5.3.4 Experiment 3: Does omnidirectional vision improve SLAM?
The accuracy of localisation achieved when using cameras of different fields of view was compared by running the algorithm in a world of 25 evenly spaced features with fields of view ranging from 120° to 360°. The results are shown in Figure 5.6.
Figure 5.6: The effect of the field of view of the camera on the accuracy of the localisation. (Plot of R.M.S.E. (metres) against camera field of view (degrees).)
The experiments show that the larger the field of view the more accurate the localisation. This is due to a combination of more features being visible when the angle of view is greater, and those features being more spread out allowing better triangulation. When both the field of view and the number of features in the world are small, there are locations in the trajectory at which no features are visible. At these times the error in the estimated position of the robot will increase until features are visible again, at which point the filter trusts the measurement model more than the motion model and the robot is re-localised. Since no odometry information is used, the motion model can quickly become inaccurate and result in large errors in the predicted position. This is shown by the very low accuracy when the field of view was 120°: large areas of the trajectory contained few visible features, resulting in poor localisation and new feature initialisation.
Although a larger field of view provides a more accurate localisation, the improvement in accuracy gained by increasing the field of view increased slowly from about 180°. This was because a large number of features were always visible when the field of view was greater than 180°.
5.3.5 Experiment 4: What is the effect of measurement noise?
To find out what effect the noise present in the measurements has on the accuracy of the localisation, the algorithm was tested in a world of 25 evenly spaced features, adding zero mean Gaussian noise levels of σ = 0°, 0.9°, 1.8°, 2.7°, 3.6°, 7.2° and 10.8° to the measurements. The results are shown in Figure 5.7.
Figure 5.7: The effect of measurement noise on the accuracy of the localisation. (Plot of R.M.S.E. (metres) against noise σ (degrees).)
The results show that localisation is very sensitive to noisy measurements.
This
was found to be mostly due to poor initial estimates of newly initialised features,
subsequently resulting in detrimental refinement of the robot location estimate.
5.4 Metric Visual SLAM in Practice
The metric formulation of SLAM tested in this chapter is not suitable for small low
powered robots. The reasons for this are detailed in this section.
5.4.1 Computational Complexity
Experiments 1 and 3 show that in order to achieve an accurate localisation, about
10 constantly visible features are required. This is without considering features that
go out of sight, and is assuming a low level of noise in the measurements of the
bearing to the features. In feature-based metric SLAM solutions with the map as
part of the state, the number of features in the map has an important impact on
the computational cost.
To run in a practical environment, the required number
of tracked features would lead to a large covariance matrix and a computationally
expensive matrix inversion as shown in Equation 5.6.
Approaches to reduce the complexity have been developed, with a key example being Divide and Conquer SLAM [101] in which the mapping is divided into sub maps
that are later merged. This results in a SLAM system that has a mapping complexity linear in the number of mapped features. However, the overall computational
complexity of the system remains high due in part to the unavoidable necessity to
match features.
5.4.2 Features in the Real World
In the experiments above, and in all feature-based metric visual SLAM systems,
there is an underlying assumption that features are available in the environment,
and that those features are trackable and distinguishable.
As shown by experi-
ments 1 and 2, the number of features and their distribution in the environment are
important factors in the performance of the system.
However, features are not always available in the real world, and even when they
are they can be costly to reliably locate, extract and match.
5.4.3 Accuracy
Experiment 4 showed that despite having four perfectly initialised features in the map before beginning, and always having exact data association guaranteed, when the measurements exhibited zero mean noise levels with a standard deviation of 7° the accuracy of the system was only approximately 35 cm. Adding noisy data association to this would further reduce the accuracy.
In Chapter 4 the difficulties involved in obtaining an accurate geometric reconstruction with binocular omnidirectional vision were discovered.
Although with
monocular omnidirectional vision there are fewer parameters involved, obtaining
very accurate measurements still requires a good calibration of the camera. On a
low-cost robot with a low-cost sensor, this adds an additional complexity that would
be best avoided.
5.4.4 Alternative Metric SLAM Solutions
There are a number of alternative solutions to the SLAM problem. An outline of
the SLAM solutions in the literature is given in Chapter 3, with solutions split into
metric approaches such as that detailed in this chapter, and topological approaches
such as [25].
Particle filter based metric solutions such as FastSLAM [89] also suffer from similar problems to those described for the Kalman filter based solution outlined in Section 5.2. The main advantage of a particle filter based solution over the Kalman filter based solution to SLAM is its ability to cope with non-linear dynamics and non-linear measurement functions and to track multiple hypotheses [51]. However, the need for feature matching, the difficulties of data association and the restrictions of state complexity still limit the use of metric SLAM solutions on small low-cost robots.
Topological SLAM solutions that do not attempt to create a metric representation
of the environment are more realistically applicable. An appearance-based approach
to the use of the image information avoids costly and restrictive feature matching
and extraction, and by taking into account an entire image rather than individually localised features the problems with noisy measurements can be counteracted.
However, a key problem is that data association and map consistency must still be
considered for an accurate solution.
5.5 Summary
The conventional approach to SLAM is to construct a map in a probabilistic framework [111]. Doing so allows the uncertainties of landmark positions and the robot's
location to be combined and a consistent map to be constructed. However, maintaining a probability distribution over the map is a costly process. The metric example
discussed in this chapter may not be the best solution with low-cost robots with limited computational power. The key reasons for this centre around the requirement
of detectable, matchable image features.
The data association problem, in which feature position measurements must be associated with feature position predictions, is non-trivial. As pointed out by Durrant-Whyte et al. [30, 31], data association is normally the main source of fragility in
SLAM systems, with erroneous data association causing catastrophic localisation
failure. In the experiments presented here data association has not been considered.
Therefore, the accuracy achievable in real world situations cannot be presumed to be
the same as that achieved in simulation. Furthermore, the additional computational
cost brought about by requiring data association deters use on low-cost robots with
limited computational power.
Distinguishable and trackable features must be available in the image, and their
extraction and matching adds additional computational cost. State of the art image
feature extraction approaches can reliably locate and match image features with illumination, scale and affine invariance [59]. However, such approaches are computationally demanding and real-time performance is not achievable. Furthermore,
although the experiments presented here suggest that a high level of accuracy can
be achieved with just four features, in real world situations features may go out of
sight or become occluded.
Chapter 4 concluded that omnidirectional stereo vision was not a satisfactory tool for navigation with low-cost robots. This chapter has shown that metric SLAM with a single omnidirectional camera is also not a satisfactory solution for navigation with computationally low powered robots. The next chapter presents a new efficient approach to topological mapping-based navigation using visual appearance that is well suited for low-cost robot applications where the robots have limited computational power.
6 Topological Map-Based Navigation
6.1 Introduction
Metric map construction was shown to be an unsuitable approach to navigation with
a low-cost robot in Chapter 5. An alternative to the metric SLAM approach can be
found in topological mapping, in which the world is represented by a collection of
nodes that are linked together to map the connectivity of locations.
Topological maps are usually sparse, containing few nodes, with each node representing a different location such as a different room [22] or sufficiently different perceptions [129]. Navigation using a topological map consists of finding a route
through nodes in the map that gets to the desired node, and requires the ability to
navigate from node to node. As detailed in Chapter 3, various methods of navigating
between topological map nodes have been suggested.
One such method is to create a hybrid metric-topological map, placing metric information within a topological representation. The metric information aids navigation
between nodes by providing a metric coordinate system.
Two important factors
influencing the efficiency of the navigation system are how the metric information
is found, and how it is used. The previous two chapters showed that metric reconstructions rely on image features and accurate camera calibrations and are therefore
not suitable for low-cost robots.
The use of wheel odometry alone to incorporate metric information and to navigate
between topological map nodes is not usually a sensible approach because the drift
in the odometry information will be large between sparse map nodes. An alternative
is to create a dense topological map with the distance between nodes considerably
shorter as depicted in Figure 6.1. Since distances between nodes in a dense topological map are short, odometry information can be used reliably to navigate between
them.
Figure 6.1: Left: A sparse topological map with only a single node per room. Right: A dense topological map where a single room is covered by many densely packed map nodes.
In both Chapter 4 and Chapter 5 geometric approaches have been shown to be
unsuitable for low-cost robot navigation. This chapter focuses on the use of visual
appearance and avoids the use of image features for metric scene understanding,
firstly by using a local appearance-based approach and secondly with a global approach. An efficient mobile robot navigation system is developed that works by creating a dense topological map combined with metric information from the robot's wheel odometry. In the following section the advantages and difficulties of dense topological map-based navigation are discussed. In Section 6.3 a self-organising map-based solution to the difficulties of dense topological mapping is presented.
Two imple-
mentations are then detailed and experimental results presented.
6.2 Dense Metric-Topological Mapping
In this thesis a dense topological map is defined as a map with nodes spaced closely together, with no lower bound on how close two nodes can be. This leads to a large number of nodes covering an environment. For example, in a 4 m × 4 m dense topological map of nodes stored every 2.5 cm the number of nodes in the map will be approximately 25000. Although there are a number of advantages associated with a dense topological map, there are also several disadvantages. The disadvantages stem from the large number of map nodes, and need to be addressed in order to gain from the advantages. The advantages and disadvantages are summarised in the following two sections.
As detailed in Chapter 3, most existing research in topological mapping focuses on
the creation of sparse maps with relatively few nodes. Little work has been done
creating dense topological maps.
The notable exceptions to this are discussed in
Section 6.2.3. Methods that may be used to overcome the disadvantages of dense
topological maps are considered.
6.2.1 Advantages of Dense Mapping
There are numerous advantages gained by using a dense topological map rather than
a sparse map:
1. As mentioned above, by having nodes that are closely spaced in the environment, basic wheel odometry can be more reliably used for navigation between
nodes. Over long distances wheel odometry is unreliable, so the error incurred at each traversal between nodes would be large in a sparse topological map.
2. Routes closer to the optimum can be taken through the environment when
using a very dense topological map. Arbitrary routes through the environment
can be taken because the robot is not forced to navigate towards a specific
node as a way point, but instead can navigate in any part of the environment
knowing that a node will exist close to the current location.
When a route
through the topological map has been found, the metric information between
the nodes can be exploited without having to visit the nodes that form the
route.
3. Using a dense rather than a sparse topological map makes map consistency less significant, as demonstrated in Section 6.5. An inexact mapping of the topology as shown in Figure 6.2 can still be used for navigation, as the robot is able to constantly re-assess its position and route through the environment. The map becomes a virtual sensor that tells the robot how far to travel, and inconsistencies within the map become noise which can be filtered. This avoids maintaining a complete probability distribution over the map, as was shown to be inappropriate in the last chapter.
4. When describing a node using the raw perception in an appearance-based
way, a dense topological map is favourable as more of the perceptual space
is captured.
In a sparse map the perception will vary significantly between nodes, whereas in a dense map the perception is sampled frequently. Since the perception can vary non-smoothly over large distances, localising can become difficult and prone to error when map nodes are far apart.
Figure 6.2: An example of an inconsistent topological map. Nodes are shown as circles with true (x, y) world position and links are shown between the nodes that specify the difference between node positions. When using the map to navigate from node A to node F by following link distances, the inconsistent link between L and F could result in failed navigation.
6.2.2 Disadvantages of Dense Mapping
Alongside the advantages of dense mapping are a number of serious disadvantages.
Particularly in the context of low-cost robots, the main disadvantages result from
the increased number of map nodes leading to the following difficulties:
1. With a large number of map nodes, localising becomes a computationally
demanding process of comparing the current perception to that stored for
each node, and taking the best matching node as the robot location.
This
approach is not very efficient, requiring a large number of comparisons, and
if the stored perception for each node is large then the computational power
required for localising can be too great.
2. Storing a topological map with many nodes can take a large amount of disk
space when an image is stored for each node.
3. Finding a route through a dense topological map requires searching many more
links than in a sparse map.
4. If map consistency is important then it has to be obtained by incorporating
the uncertainty of links between nodes.
Mapping a large number of nodes
increases the computational expense, causing problems similar to those found
when a large number of features are mapped in metric SLAM implementations
(see Chapter 5).
6.2.3 Methods of Coping with Disadvantages
Dense topological mapping was considered in [35], with a 50 m corridor mapped using approximately 250 nodes spaced 25 cm apart.
The focus of the approach was on map
construction and localisation, with the importance of map consistency considered in
detail. This led to a computationally intensive algorithm. Furthermore, in order to
cope with the disadvantages of dense mapping, a one dimensional panoramic image
was used as the robot perception. Although this kept the map storage requirements
low, it increased perceptual aliasing.
One dimensional panoramic images were also used for robot navigation in [14], where
scale-space features were extracted and matched to allow navigation between views.
A dense topological map with nodes spaced 25 cm apart was used for navigation, but
the map had to be metrically consistent with the global pose of each node known.
No details of how this was acquired were given.
The PCA based methods discussed in Chapter 3 lend themselves well to dense
topological mapping as the image data becomes compressed, allowing storage of a
large database of images and faster matching within the database as shown by Winters [124]. However, as the number of images in the database increases, the number
of principal components necessary to capture their variance increases. Furthermore,
updating the principal components requires a computationally expensive matrix decomposition (the Singular Value Decomposition), even when using incremental PCA
approaches [5].
Alternative methods of reducing the image data include image edge magnitude histograms [128], Fourier series descriptors [52] and bag-of-words type approaches [25].
Image histogram-based descriptors can reduce the amount of data stored for each
topological map node. However, they are not well suited to dense topological mapping because near locations will have almost identical histograms.
Bag-of-words approaches to appearance-based image description (see Chapter 2.4)
suffer from a similar problem to image edge magnitude histograms, particularly
when using omnidirectional vision where features remain in view over long distances.
Fourier series descriptors of images as used by Ishiguro et al. [52] allow compact
storage of image data in the form of several Fourier coefficients, and as shown in
[84] a hierarchical approach to localisation can be achieved by varying the number
of coefficients used when comparing descriptors.
However, Hafed and Levine [45]
showed that Fourier signatures are very susceptible to illumination change.
6.3 Using a Self Organising Map for Efficient Topological Map Indexing and Construction
A novel contribution of this thesis is the use of a self-organising feature map (SOM)
to overcome the complexities of dense topological mapping.
This is achieved by
using a SOM for both node compression and node indexing.
Construction of the
SOM indexed dense topological map is split into two distinct learning phases:
• Phase I: The robot learns what kind of perception to expect. In this phase, the self-organising map is trained to cluster the perceptual space into regions.
• Phase II: The robot learns how the perception will change when a specific motion is taken. A dense topological map is created in association with the self-organising map learnt in the first phase, exploiting the topology preserving property of the SOM as a key to the robot's perceptual space. Every node in the topological map corresponds to a physical location, which is represented by the SOM classified perception at that location.
6.3.1 Phase I: Learning the Perceptual Space
Firstly the robot is manually driven in the area that is to be mapped.
The area
is covered thoroughly, and the robot perception is logged. A SOM is then trained
offline using the standard training procedure given in Chapter 2. The objective of
the SOM training is both to cluster the high dimensional perceptual space and map
those clusters to a two dimensional SOM.
After training the SOM coarsely splits the perceptual space into clusters, each cluster
corresponding to a class of perceptions.
Each class of perceptions equates to an
area of perceptual space which, provided the perception varies smoothly over space,
will equate to a well defined region in physical space. Importantly, the topology-preserving property of the SOM means that close nodes will relate to close physical
locations. This fact can later be exploited when localising.
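For readers unfamiliar with the standard SOM training procedure referenced here (given in Chapter 2 of this thesis), the minimal Python sketch below shows the usual best-matching-unit and neighbourhood update. The learning rate, neighbourhood width and their decay schedules are illustrative assumptions, not the parameters used in the experiments.

```python
import numpy as np

def train_som(data, n, m, epochs=50, lr0=0.5, sigma0=3.0):
    """Train an n x m SOM on row-vector perceptions with the standard update rule."""
    q = data.shape[1]
    weights = np.random.rand(n, m, q)                         # random initial node weights
    grid = np.dstack(np.meshgrid(np.arange(n), np.arange(m), indexing="ij"))
    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch / epochs)                    # decaying learning rate
        sigma = sigma0 * np.exp(-epoch / epochs)              # decaying neighbourhood width
        for p in np.random.permutation(len(data)):
            x = data[p]
            dists = np.linalg.norm(weights - x, axis=2)       # distance of x to every node
            bmu = np.unravel_index(np.argmin(dists), (n, m))  # best matching unit
            # Gaussian neighbourhood centred on the BMU, applied to all node weights.
            g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights
```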
6.3.2 Phase II: Creating a Topological Map

The trained SOM can be used to classify new perception vectors. Rather than using only the winning node as the output of the SOM classification (as is usually the case), the q-dimensional perception vector P⃗ is transformed to an (N × M)-dimensional classified vector P̃, with each element containing the Euclidean distance between the current perception and the weights of a node as given in Equation 6.1:

$$\tilde{P}_i = \sqrt{\sum_{j=1}^{q} (\vec{P}_j - \vec{w}_{ji})^2}, \qquad i = 1 \ldots N \times M, \tag{6.1}$$

where w⃗_ji is component j of the ith SOM node, and N × M is the dimension of the SOM.
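A compact sketch of the classification of Equation 6.1, reusing the SOM weights from the training sketch above; flattening the node responses into a single (N × M)-dimensional vector is an implementation assumption.

```python
import numpy as np

def classify_perception(weights, perception):
    """Equation 6.1: Euclidean distance from the perception to every SOM node weight."""
    # weights has shape (N, M, q); the result is flattened to an (N*M)-dimensional vector.
    return np.linalg.norm(weights - perception, axis=2).flatten()

def best_matching_unit(weights, perception):
    """Index (row, column) of the SOM node whose weights are closest to the perception."""
    dists = np.linalg.norm(weights - perception, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)
```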
Mapping is achieved by using the classified perception vectors P̃ in place of the
initial perception vector to represent topological map nodes. Each topological map
node is stored underneath the SOM as a sub-node of the best matching unit in the
SOM, as shown in Figure 6.3. This process addresses two of the complications of
dense topological maps detailed above:
1. This method of storage allows fast and efficient localisation when searching the topological map nodes to find which node best matches the current perception. The nodes that represent the same or near-by physical locations also have similar classified perception vectors that are stored as sub-nodes of the same or near-by SOM node. When localising, the current perception vector is classified using the SOM and Equation 6.1. Only the sub-nodes of the best matching unit in the SOM are then compared to the current classified vector P̃, and the closest sub-node is taken as the current topological map node. This avoids comparing all map nodes, and ideally will reduce the number of comparisons necessary by a factor of the number of nodes in the SOM.
2. By using only the classified perception vector, the data that needs to be stored for each topological map node can be reduced. For example, using a 10 × 10 SOM will mean that every node in the topological map is described by a 100D vector. This means that for each node in the topological map a perception vector does not need to be stored, but instead only a classified perception vector. This way all perceptions are stored in a compressed form.
Weighted links are placed between nodes within the topological map (the sub-nodes in the SOM), with information stored in the link that specifies the relationship between the nodes and the confidence that the link is correct. In the first set of experiments reported in this chapter, the nodes are related by discrete physical actions, while in the second set of experiments the node relationships can be continuous physical distances. By storing this metric information in the links between nodes, a form of hybrid metric-topological map is created.
Figure 6.3: The use of a self-organising map to efficiently store and index a dense topological map. Perceptions are classified by the self-organising map and stored as topological map nodes in association with the best matching unit in the self-organising map.
As the map is constructed without explicit consideration for metric consistency, the same physical location may potentially be mapped multiple times by different nodes. For this reason links are also stored between perceptually similar nodes, with weights that reflect the certainty of the link. This closes the loops in the map.
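One possible in-memory layout for the SOM-indexed dense topological map described above is sketched below; the class names and fields are assumptions made for illustration, not the thesis data structures.

```python
from dataclasses import dataclass, field

@dataclass
class MapNode:
    """A dense topological map node: a classified perception plus weighted links."""
    classified: tuple                            # (N*M)-dimensional classified perception vector
    links: dict = field(default_factory=dict)    # neighbour id -> (metric difference, cost)

class SomIndexedMap:
    """Dense topological map whose nodes are indexed by their SOM best matching unit."""
    def __init__(self):
        self.nodes = {}        # node id -> MapNode
        self.by_bmu = {}       # (row, col) of the BMU -> list of node ids stored under it

    def add_node(self, node_id, bmu, classified):
        self.nodes[node_id] = MapNode(tuple(classified))
        self.by_bmu.setdefault(bmu, []).append(node_id)

    def add_link(self, a, b, metric, cost):
        # Store the link in both directions, negating the metric difference for the reverse.
        self.nodes[a].links[b] = (metric, cost)
        self.nodes[b].links[a] = (tuple(-m for m in metric), cost)
```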
6.3.3 Navigation with the Topological Map
In order to navigate from the current location to a target location, a route through
the topological map needs to be found, starting at the node that represents the
current location and ending at the node that represents the target location.
The
local metric information stored for each of the links can then be used to navigate the
robot from node to node and ultimately to the target location. When the current
and target nodes in the map are known, Dijkstra's shortest path algorithm is used
to find a route between them.
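A minimal sketch of the route search: Dijkstra's algorithm over the weighted link costs is standard, and the graph representation below follows the illustrative SomIndexedMap structure sketched earlier rather than the thesis implementation.

```python
import heapq, itertools

def shortest_path(nodes, start, goal):
    """Dijkstra's algorithm over MapNode.links, returning the list of node ids to visit."""
    counter = itertools.count()        # tie-breaker so heap entries never compare node ids
    queue = [(0.0, next(counter), start, [start])]
    visited = set()
    while queue:
        cost, _, current, path = heapq.heappop(queue)
        if current == goal:
            return path
        if current in visited:
            continue
        visited.add(current)
        for neighbour, (_, link_cost) in nodes[current].links.items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, next(counter),
                                       neighbour, path + [neighbour]))
    return None                        # no route found between the two nodes
```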
6.3.4 Map Parameters and Computational Cost
There are several parameters involved in the SOM based topological mapping and
navigation algorithm proposed. These are summarised as follows:
• Size of the SOM:
The size of the SOM used influences the efficiency of the method. Assuming that the SOM is perfect, it will segment the mapped area equally with each SOM node indexing an equal number of topological map nodes. When localising within the map, the number of comparisons that need to be made will then be n full size perception vector comparisons, where n is the number of nodes in the SOM, and V/n classified perception vector comparisons, where V is the number of topological map nodes. A SOM size should be selected to balance the number of comparisons necessary to the minimum, taking into account the size difference between the perception vector and its classified counterpart.
•
Perception size and properties:
The efficiency and effectiveness of the approach is influenced by the way in
which the camera information is used.
It is important that the perception
vector varies smoothly over the physical space, with close physical locations
having similar perceptions. This ensures that the topology of the SOM relates
to the topology of the physical space.
It is equally important that the size of the perception vector be kept small.
When localising and mapping, the current perception vector needs to be compared to each node in the SOM. If a large perception is used then this will
require more computational power.
•
Type of metric information:
The metric information that links the topological map nodes together can be
discrete, in the form of a limited number of actions performable, or continuous
allowing any change in pose. Limiting the action space of the robot to specific motions, such as forward 10 cm and right turn 90°, has the advantage of restricting the physical pose space that the robot can occupy, but subsequently limits the usefulness of the algorithm. Conversely, allowing the robot to take arbitrary motions between nodes creates a more general and useful navigation algorithm, but increases the difficulty of navigation.
•
How links are weighted and a path found:
The weighting of links in the map governs which route will be found by the
breadth first search when navigating from a start node to a target node. The link weights need to be selected so that they reflect the certainty that the link
is true. Thus the route that is most likely to succeed in navigating the robot
will be selected.
•
Map density:
The density of the map is controlled by the map node creation criteria, which in the implementations detailed here is fixed at regular intervals throughout the mapping process. In the first set of experiments (detailed in Section 6.4) map nodes were created before and after execution of a discrete action. In the second set of experiments (detailed in Section 6.5) map nodes were created at regular 1 second intervals as the robot moved at a speed of 0.2 m/s.

•
Map building termination criteria:
In the experiments detailed here, the decision of when to terminate the map
building stage was made manually by an expert user.
When the robot was
deemed to have driven sufficiently around the environment, the process was
stopped. Automatic stopping criteria could be the focus of future work.
The computational cost of the algorithm is an important factor dictating its applicability to low-cost robots. The algorithm has very low complexity with the most
demanding component being the initial training of the self-organising map. This is a
task that only needs to be carried out once, and has no real-time constraints. On the
whole, the algorithm is highly parallelisable, and as such is well suited for hardware
implementation, with hardware implementations of SOMs in development [71].
The complexity of the mapping and localisation depends on the number of topological map nodes (V) and the number of self-organising map nodes (n). The best case complexity occurs when all V topological map nodes are evenly indexed by the self-organising map nodes. In this case, n self-organising map node comparisons are required and V/n topological map node comparisons are required. The complexity of the mapping and localisation algorithm is O(V/n + n). In the worst case all topological map nodes are indexed by a single self-organising map node; in this case all topological map nodes and self-organising map nodes need to be compared, and the complexity is O(V + n).

There are three possible scenarios: V is vastly larger than n, V is approximately equal to n, or V is vastly smaller than n. If V ≫ n then n can be considered as a constant, and therefore removed from the calculations:

best case:
$$O\left(\frac{V}{n} + n\right) = O\left(\frac{V + n^2}{n}\right) \approx O(V + n^2) \approx O(V) \tag{6.2}$$

worst case:
$$O(V + n) \approx O(V) \tag{6.3}$$

If V ≈ n,

best case:
$$O\left(\frac{V}{n} + n\right) \approx O\left(\frac{n}{n} + n\right) \approx O(1 + n) \approx O(V) \tag{6.4}$$

worst case:
$$O(V + n) \approx O(2V) \approx O(V) \tag{6.5}$$

The case of V ≪ n is specular to V ≫ n. This establishes that both the lower and upper bounds to the algorithm complexity are O(V), that is, linear in the number of topological map nodes.
The use of a self-organising map to compress the perceptions results in low storage cost. For each node in the topological map, a vector of size equal to the number of nodes in the SOM is stored. For a 10 × 10 SOM, this results in a memory requirement of 100 floating point values for each node, approximately 400,000 bytes (390 Kbytes) per 1000 nodes when using 4 byte floating points. The storage of the SOM requires storing a floating point vector of the size of the raw perception for each node in the SOM.
In the next two sections, separate experiments are carried out with different perceptions and metric information. In the first set of experiments discrete actions are used with a blob detection based perception. In the second set of experiments the algorithm is applied in a more general case of continuous actions, and the weighting of links in the map is developed to allow inconsistent maps to be acceptable. In all experiments the proposed algorithm operated on the robot, with processing carried out by the robot's on-board computer. The robot used in experiments was a MetraLabs Scitos G5 mobile robot equipped with a 1.6 GHz PC similar to the Asus Eee-Box discussed in Chapter 1.1. Navigation was carried out in real-time.
6.4 Experiments in Discrete Navigation with a Simplified Perception
To verify that the use of a SOM can assist in dense topological mapping, and to
demonstrate a real implementation of the approach outlined in Section 6.3, experiments were performed with a simplified perception and discrete motions between the map nodes. This section details the implementation and presents the results of two experiments.

Figure 6.4: Left: The lab setup with four blue tiles (A, B, C, D) on the floor of the 2.7 m × 2.4 m arena. Right: An example omnidirectional image taken on the robot, with the four blue tiles visible.
6.4.1 Experimental Environment
In the experiments in this section a 640 × 480 RGB webcam was coupled with the 190° field of view miniature fish-eye lens described in Chapter 2, with a total cost of less than £100. The camera was positioned on the robot at the centre of rotation, facing the ground. A 2.7 × 2.4 m environment was constructed as shown in Figure 6.4, with no obstacles present. An example image taken with the camera is shown in Figure 6.4.
6.4.2 Local Appearance-Based Perception
Four blue tiles were placed in the environment as shown in Figure 6.4. The bearing θ to each of these tiles from the robot was used to form a 4D vector Θ⃗ given in Equation 6.6. The 4D vector was then used as the robot's perception:

$$\vec{\Theta} = (\theta_A \; \theta_B \; \theta_C \; \theta_D)^T \tag{6.6}$$
To turn the camera image into a 4D perception vector, the blue tiles were detected in the image using opposing colour channel coding (see Section 2.4.1) and the bearing to each tile was calculated as

$$\theta = \arctan\!\left(\frac{C_u - R_u}{C_v - R_v}\right) \tag{6.7}$$

where (C_u, C_v) is the centre of the image and (R_u, R_v) is the centre of the imaged tile.
The tiles were tracked throughout the traversal of the robot in order to ensure that the bearings vector Θ⃗ remained in the fixed order of Equation 6.6.
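A short sketch of the bearing computation of Equation 6.7; the blob detection itself (the opposing colour channel coding of Section 2.4.1) is not reproduced here, and the use of atan2 to keep the angle over the full circle is an implementation assumption.

```python
import numpy as np

def tile_bearing(image_centre, tile_centre):
    """Bearing to a detected tile from the image centre (Equation 6.7)."""
    cu, cv = image_centre
    ru, rv = tile_centre
    return np.arctan2(cu - ru, cv - rv)   # atan2 variant of arctan((Cu - Ru) / (Cv - Rv))

def perception_vector(image_centre, tiles_in_order):
    """4D perception: bearings to tiles A, B, C, D kept in a fixed order (Equation 6.6)."""
    return np.array([tile_bearing(image_centre, t) for t in tiles_in_order])
```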
6.4.3 Training a SOM

In order for the robot to learn the perceptual space (Phase I, Section 6.3.1), it was manually driven around the entire arena shown in Figure 6.4, and 1500 4D bearing vectors as described in Section 6.4.2 were logged at a rate of 2 Hz. The bearing vectors were then used offline to train a 7 × 7 SOM. It has been shown in [82] that the Euclidean distance is not as effective as the Mahalanobis distance when training SOMs with vectors of landmark bearings. However, the Euclidean distance was used in these experiments. The implication of this decision is that the SOM will not optimally index the space, with some nodes indexing a larger number of topological map nodes than others.
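The thesis does not give the SOM training procedure at code level; the following from-scratch Python sketch shows the kind of Euclidean-distance SOM training described above, with an illustrative decaying learning rate and neighbourhood radius (the actual parameters used in the experiments are not stated).

    import numpy as np

    def train_som(data, rows=7, cols=7, iters=10000, lr0=0.5, sigma0=3.0, seed=0):
        """Minimal SOM trained with the Euclidean distance, as in Section 6.4.3."""
        rng = np.random.default_rng(seed)
        weights = rng.random((rows * cols, data.shape[1]))
        # Grid coordinates of each node, used by the neighbourhood function.
        grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
        for t in range(iters):
            lr = lr0 * np.exp(-t / iters)          # decaying learning rate
            sigma = sigma0 * np.exp(-t / iters)    # shrinking neighbourhood radius
            x = data[rng.integers(len(data))]
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood around the winning node on the grid.
            d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
        return weights.reshape(rows, cols, -1)

    # e.g. som = train_som(bearing_vectors)  # bearing_vectors: a (1500, 4) array of logged bearings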
6.4.4 Mapping Nodes with Discrete Action Links

In the second learning stage, the robot explored the environment using only four discrete actions: forward 15 cm, backward 15 cm, left turn 45° and right turn 45°. Note that these four actions are reciprocal: they can undo each other. In total, approximately 5000 actions were performed.
Before and after each action, the perception was classified and mapped as detailed in Section 6.3.2. The links between topological map nodes were labelled with the action that was taken, and links in the opposite direction, labelled with the inverse of the actions, were created as shown in Figure 6.5. The explored links were assigned a cost value of 1.
Figure 6.5: An illustration of the mapping of actions between sub-nodes of the SOM. Sub-nodes are shown as small circles, SOM nodes as large circles and action links as directional arrows. The actions and links are represented by their initials: F (forward), R (right), L (left), B (back) and S (similarity link).
In addition to the links between nodes created as the robot explored, similarity links were created between sub-nodes of similar classified perception vectors. These links were assigned a cost value of 3 plus the Euclidean distance between the perception vectors. The additional cost assigned to similarity links reflects the fact that they are not as certain as links that have been previously traversed. The result is that known links will be preferred when navigating: if three actions achieve the same result as a single similarity link, the robot will always choose the three already tried-and-tested actions.
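As a sketch of how such a map could be held (not the thesis data structure), the weighted directed graph below uses the networkx library, with action links of cost 1 and similarity links of cost 3 plus the perception distance; node names and perception values are invented for illustration.

    import numpy as np
    import networkx as nx

    INVERSE = {"F": "B", "B": "F", "L": "R", "R": "L"}  # reciprocal discrete actions

    def add_action_link(G, a, b, action):
        """Link created when the robot moves from sub-node a to b; cost 1 each way."""
        G.add_edge(a, b, action=action, weight=1.0)
        G.add_edge(b, a, action=INVERSE[action], weight=1.0)

    def add_similarity_link(G, a, b, pa, pb):
        """Similarity link between sub-nodes with close perceptions; cost 3 + distance."""
        cost = 3.0 + float(np.linalg.norm(np.asarray(pa) - np.asarray(pb)))
        G.add_edge(a, b, action="S", weight=cost)
        G.add_edge(b, a, action="S", weight=cost)

    G = nx.DiGraph()
    add_action_link(G, "n0", "n1", "F")
    add_action_link(G, "n1", "n2", "R")
    add_similarity_link(G, "n2", "n0", [0.1, 1.2, 2.0, 2.9], [0.12, 1.18, 2.05, 2.88])
    print(nx.shortest_path(G, "n0", "n2", weight="weight"))
    # ['n0', 'n1', 'n2']: two tried-and-tested action links are cheaper than one similarity link.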
The choice of how heavily the similarity links are weighted influences how likely they are to be selected when finding the shortest path between nodes in the topological map. In these experiments the choice was made by manually inspecting the data being captured. However, this is not possible when a robot is deployed in everyday situations. In the experiments presented in Section 6.5 a more realistic method of weighting links is used.
6.4.5 Navigation with Discrete Actions

After completion of the two learning stages, the SOM-indexed topological map was ready to be used to navigate from the current location to the desired perception. This was achieved as follows (a code outline of the loop is given after the list):

1. Take the current 4D bearing vector and use the SOM to calculate the 49D vector of node responses.

2. Find the closest matching sub-node from the SOM by comparing the 49D vector to all the sub-nodes of the firing node and the nodes surrounding it.

3. Find the closest matching sub-node to the target perception in the same way as for the current perception.

4. Find the shortest path between the current sub-node and the target sub-node, taking into account the cost of traversing each link.

5. Execute the first action in the path.

6. If the destination has not been reached then go back to step 4.
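The Python outline below is a sketch of this loop rather than the thesis code: the robot- and map-specific parts (obtaining the current perception, executing an action, locating the closest sub-node, and the pre-computed target sub-node of step 3) are left as placeholder callables, the node response is taken simply as the Euclidean distance to each SOM weight vector, and networkx's weighted shortest-path routine stands in for the cost-aware search of step 4.

    import numpy as np
    import networkx as nx

    def som_response(som_weights, perception):
        """49D vector of node responses for a 7 x 7 SOM (step 1), here taken as distances."""
        flat = som_weights.reshape(-1, som_weights.shape[-1])
        return np.linalg.norm(flat - perception, axis=1)

    def navigate(G, som_weights, get_perception, execute, closest_subnode, target_subnode):
        """Outline of the discrete navigation loop of Section 6.4.5."""
        while True:
            response = som_response(som_weights, get_perception())
            current = closest_subnode(G, response)          # steps 1-2
            if current == target_subnode:                   # step 6 (simplified check)
                return
            path = nx.shortest_path(G, current, target_subnode, weight="weight")  # step 4
            action = G.edges[path[0], path[1]]["action"]    # step 5: first action on the path
            execute(action)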
6.4.6 Results
In this section results of experiments testing the navigation performance are presented. A MetraLabs Scitos G5 mobile robot was used in all experiments, and a
Vicon tracking system was used to track the robot in order to assess the accuracy
and reliability of the algorithm.
Navigations to a Goal

In order to assess the navigation method, the robot was placed at a goal location and the perception at that location was stored. The robot was then moved to a different location and the navigation algorithm described in Section 6.4.5 was applied. This was repeated with 10 different starting positions.

The results are shown in Figure 6.6 and in tabular form in Table 6.1. The goal location is shown as a large diamond, and the start and end positions of each run are shown with filled circles and black dots respectively.

Figure 6.6: Ten navigations to a goal location from different starting positions. Each run is numbered at the starting position.
The results show that in all tests the robot was able to move closer to the target perception. The absolute mean error in the final position of the robot was 0.132 ± 0.024 metres, with a mean heading error of 2.98° ± 0.57°. The maximum and minimum position errors were 0.25 m and 0.009 m respectively.
            Start Position                 End Position
Run    X [m]    Y [m]   θ [rad]      X [m]    Y [m]   θ [rad]    Actions
 1     0.968   -0.898    1.569      -0.628    0.595    1.674        53
 2     0.676   -0.796    1.583      -0.715    0.715    1.591        61
 3    -1.052   -0.844    1.622      -0.865    0.810    1.491        83
 4     1.039    0.922    0.003      -0.762    0.822    1.514        65
 5     0.006    0.006    2.369      -0.583    0.574    1.511        30
 6     0.015    0.694   -0.008      -0.632    0.806    1.612        31
 7     0.392    1.038    1.577      -0.768    0.729    1.587        46
 8     0.016   -0.470   -1.549      -0.618    0.724    1.621        36
 9    -0.032    0.633    1.570      -0.595    0.653    1.483        31
10     1.148   -0.746    1.593      -0.713    0.746    1.570        68

Table 6.1: Ten navigations to a goal location from different starting positions. Run numbers correspond to the run numbers shown in Figure 6.6.
The route taken to the goal by the robot is often inefficient, as can be seen in Figure 6.6, with 83 actions being taken in run 3. This is due to the method favouring locations which the robot has previously been able to navigate to. Traversing the similarity links in the map is considered more expensive than regular action links, so paths that contain more actions can have a lower cost than paths containing similarity links. If more training were carried out in the second stage, more action links would be created and shorter paths would be found.
Real-time operation was comfortably achieved, with the map only consulted in between discrete actions while the robot was stationary. The robot speed for carrying out the actions proved to be the limiting factor of the algorithm, with a linear velocity of 0.1 m/s used to ensure repeatable actions.
Repeatability
The repeatability of the proposed navigation method was tested by performing five
navigations to the same goal position, starting from approximately the same location
each time. The results are shown in Figure 6.7.
Figure 6.7: The repeatability of the navigation. Five navigations from approximately the same starting position to the same goal position (shown as a diamond).

On each run the same path was taken, with differences in the end position caused only by slight variations in the starting position and by actions not always resulting in exactly the same motion of the robot. The fact that the same path was taken on each run shows that close positions in pose space produce similar perceptions.
The example of route inefficiency marked in Figure 6.7 occurred for the same reasons as the inefficient routes in the first experiment: the method favours locations which the robot has previously been able to navigate to, so paths containing more action links are preferred over paths containing the more expensive similarity links.
6.4.7 Discussion of Discrete Navigation Experiments

The experiments in this section have demonstrated that the proposed environment learning and navigation strategy is both effective and cheap to implement. The experiments presented made use of a restricted sensor perception by relying on the presence of four blue objects as landmarks. Although this restriction is not completely unrealistic, as in many environments there are such well-defined objects, the use of visual features was shown to be undesirable in Chapter 4. Furthermore, it was shown in Chapter 5 that with four easily identifiable and constantly visible landmarks, a metric SLAM approach would be as effective as this approach.

The use of the blue tiles simply served to demonstrate the viability of the approach, which is applicable to more general image appearance descriptors. The implementation presented in this section was further simplified by limiting the action space to just four carefully repeatable actions. In the next section the method is developed further to remove these limitations.
6.5 Experiments in Continuous Appearance-Based Navigation

The experiments detailed in Section 6.4 showed that the approach to navigation with a SOM is possible, albeit in a specially constructed environment with the presence of detectable blue tiles presumed. In the experiments detailed in this section the image data is minimally processed before being used in an appearance-based manner. Using this approach image features need not be detected, matched or tracked, and no assumptions need to be made about the presence of specific types of features. In contrast to the previous experiments, continuous motion is used in this section and a more complete navigation algorithm is developed.
6.5.1 Experimental Environment

The same camera as used in the experiments in the previous section was used in these experiments. In this experiment it was positioned facing the ceiling so that people walking around and furniture being moved would not interfere with the perception.

Figure 6.8: (a) The Scitos G5 robot used in the experiments. (b) An example omnidirectional image taken on the robot with the camera facing the ceiling.

The robot was restricted to a larger 4 × 4 metre area, again with no obstacles. The robot and an example distorted wide field of view image captured by the camera are shown in Figure 6.8.
6.5.2 Global Appearance-Based Perception
The image received from the camera is minimally processed before being used directly as the perception vector. The aims of the processing are to achieve rotation
and illumination invariance and to reduce the size of the image.
Rotation Invariance

A compass is desirable as it reduces the number of degrees of freedom of the robot and, as shown by Zeil et al. [127], when images are aligned to the same direction the difference between them varies smoothly over x, y space. Rotation invariance and bearing calculation can be achieved in a number of ways, for example magnetic compasses, visual compasses such as those detailed in Chapter 3.2.1.1, or Fourier space representations such as [100]. In these experiments a colour blob detection based algorithm for finding and removing the rotation from the images was used.
Two square tiles were placed on the ceiling of the laboratory, one red and one blue. These tiles were detectable as the largest red and blue regions in the undistorted 640 × 480 RGB image using opposing colour channel coding (see Section 2.4.1). When the image locations (u_r, v_r) and (u_b, v_b) of the two tiles are known, the image is rotated θ degrees so that the image locations are aligned vertically, that is u_r = u_b and v_b < v_r. The rotation is calculated as

    θ = atan2(v_r − v_b, u_r − u_b)   (6.8)

Having a global compass bearing allows all odometry values of the robot to be expressed in the same x, y axis directions.
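A possible OpenCV sketch of this de-rotation step is given below, assuming the red and blue tile centroids have already been found by the blob detector; the sign convention of the angle passed to warpAffine depends on the image axes and may need negating in a particular implementation.

    import math
    import cv2

    def derotate(image, red_centre, blue_centre):
        """Rotate the image so the red/blue ceiling tiles are vertically aligned (Eq. 6.8)."""
        ur, vr = red_centre
        ub, vb = blue_centre
        theta = math.atan2(vr - vb, ur - ub)                 # rotation in radians
        h, w = image.shape[:2]
        # OpenCV expects the angle in degrees, positive being counter-clockwise.
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), math.degrees(theta), 1.0)
        return cv2.warpAffine(image, M, (w, h)), theta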
Illumination Invariance

The intensity of light varies over time, and a pixel-wise comparison of two images taken at exactly the same location at different times can be misleading. In order to remove some of the effects of varying illumination, the gradient magnitude image is computed by convolving with x and y Sobel gradient operators. The partial derivatives f_x and f_y for each pixel are combined to give a new pixel value:

    G = √(f_x² + f_y²).   (6.9)
Image Size

The initial image supplied by the camera is RGB colour with a resolution of 640 × 480. This is reduced to a 160 × 120 greyscale image, from which a circular region of radius 50 pixels is taken from the centre. The circular region is mapped to a 7986-dimensional vector in a spiralling concentric circles pattern as shown in Figure 6.9. Although a column-wise or row-wise mapping would also be possible, a concentric circles mapping improves the rotational invariance by considering only the circular region and avoids discontinuities in the vector at the row boundaries.

The process of taking an image and turning it into the appearance descriptor is summarised in Figure 6.9.
Figure 6.9: The raw image is turned into an appearance-based descriptor.
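The Python sketch below (using OpenCV and NumPy) illustrates the descriptor pipeline of this section: downsampling, Sobel gradient magnitude (Equation 6.9) and a concentric-circle sampling of the central disc. The exact sampling pattern that yields the 7986-dimensional vector is not specified, so the roughly one-pixel ring spacing used here is an assumption and gives a vector of similar, but not identical, length.

    import cv2
    import numpy as np

    def appearance_descriptor(bgr, radius=50):
        """Greyscale -> Sobel gradient magnitude -> concentric-circle 1D descriptor."""
        small = cv2.resize(bgr, (160, 120))
        grey = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY).astype(np.float32)
        fx = cv2.Sobel(grey, cv2.CV_32F, 1, 0)
        fy = cv2.Sobel(grey, cv2.CV_32F, 0, 1)
        grad = np.sqrt(fx * fx + fy * fy)                      # Equation 6.9
        cx, cy = grey.shape[1] / 2.0, grey.shape[0] / 2.0
        samples = []
        for r in range(1, radius + 1):                         # spiral outwards, ring by ring
            n = max(1, int(round(2 * np.pi * r)))              # ~1-pixel spacing on each ring
            for a in np.linspace(0.0, 2 * np.pi, n, endpoint=False):
                u = int(round(cx + r * np.cos(a)))
                v = int(round(cy + r * np.sin(a)))
                samples.append(grad[v, u])
        return np.array(samples, dtype=np.float32)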
6.5.3 Training a SOM

In order for the robot to learn the perceptual space (Phase I, Section 6.3.1), it was manually driven around covering the entire environment, and 1000 images were logged at a rate of 2 Hz. These images were then turned into 7986-dimensional perception vectors as described in Section 6.5.2 and used to train a 10 × 10 SOM. A graphical representation of the trained SOM is shown in Figure 6.10.

Figure 6.10: A graphical representation of the trained SOM using the perception vectors described in Section 6.5.2.
6.5.4 Mapping Nodes with Continuous Action Links

In the second learning stage, the robot was again driven around the environment. Once per second, the current classified perception vector P̃ and the corresponding robot heading θ were stored as a sub-node under the winning node in the SOM as detailed in Section 6.3.2. The links between topological map nodes were labelled with the difference in the robot's odometry between the perceptions. As the robot explored the environment a dense topological map of 4000 nodes was constructed.

As new topological map nodes were created, the nodes assigned to the same SOM node were compared. A similarity link was created between sibling nodes with similar perception vectors, and these links were labelled with the difference in the robot heading (θ) between the nodes and the Euclidean distance between the nodes' perception vectors. This information was then used when navigating with the map.
6.5.5 Navigation with Continuous Action Links

The local odometry offsets stored for each of the links in the route between the target node and the current node need to be combined to find the total distance the robot needs to travel. In the experiments detailed in Section 6.4 the robot was driven by the actions specified in the route without explicitly considering the uncertainties involved in the localisation or the route found. However, a better approach is to formulate the problem in a probabilistic way, encompassing all of these uncertainties.

To efficiently achieve this, the framework of a linear Kalman filter with a one-dimensional state space was used. The topological map can be considered as a noisy sensor that returns how far in metric terms the robot must move to get to the goal location. The measurements of the x and y offsets from the current location to the goal location are corrupted by noise caused by poor localisation and inconsistencies in the map. Acknowledging this at the point of navigation avoids the need to create or maintain a consistent map.
Figure 6.11: A flow diagram of the navigation algorithm, encompassing the inaccuracies in the map by employing a linear Kalman filter.
Figure 6.11 shows a flow chart of the navigation algorithm. Although the errors in the x and y goal distances are not independent, using two one-dimensional filters results in lower computational cost while still remaining operational. To filter the measured distance to the goal location, the usual predict-update cycle of the Kalman filter was followed:
1. The current x and y offset is predicted as

       x̂_k = x̂_{k−1} − u_{x,k}   (6.10)
       ŷ_k = ŷ_{k−1} − u_{y,k}   (6.11)
       σ_{x,k} = σ_{x,k−1} + q_{x,k}   (6.12)
       σ_{y,k} = σ_{y,k−1} + q_{y,k}   (6.13)

   where (x̂, ŷ) is the offset to the goal location and (u_x, u_y) is the odometry distance travelled since the last cycle. The subscript k is the time step, (σ_x, σ_y) is the variance of the filtered goal offset and (q_x, q_y) is the variance of the odometry distance travelled since the last filter cycle (2% of the distance covered in the experiments reported here). The initial estimate (x̂_0, ŷ_0) is set as the first measurement of the offset made using the topological map.
2. A measurement of the offset is made using the topological map. This is achieved using a weighted breadth-first search to find the shortest path between the current node and the target node, with the cost of each link set as the variance of the measurements involved in the link. In these experiments odometry offset links were weighted with a variance of 2% of the distance the link covers, equating to the confidence in the odometry system. The similarity links were weighted in relation to the distance between the perception vectors, up to a maximum of 5 cm.

   The variance (r_x, r_y) of the measured offset from the current location to the goal location is computed as the sum of all the variances in the route. This measured offset is then used to update the offset estimate (x̂, ŷ) and its certainty measure (σ_x, σ_y):

       x̂_k = x̂_k + (σ_{x,k} / (σ_{x,k} + r_{x,k})) · (z_{x,k} − x̂_k)   (6.14)
       σ_{x,k} = σ_{x,k} − σ_{x,k}² / (σ_{x,k} + r_{x,k})   (6.15)
       ŷ_k = ŷ_k + (σ_{y,k} / (σ_{y,k} + r_{y,k})) · (z_{y,k} − ŷ_k)   (6.16)
       σ_{y,k} = σ_{y,k} − σ_{y,k}² / (σ_{y,k} + r_{y,k})   (6.17)

   where (z_x, z_y) is the offset to the goal location as measured by re-localising within the topological map and finding a route to the goal node from the current node.
3. The Kalman filtered (x̂, ŷ) offset is then used as the goal position in a robot-centred coordinate frame, and a controller based on the dynamic window approach [37] is used in order to drive the robot towards the goal. When the offset either lies within one standard deviation of 0 or is less than 0.15 m, the navigation process is terminated and the goal is assumed to be reached.
This method of navigation allows odometry drift errors and localisation errors to be smoothed and ignored without having to employ computationally costly topology checks such as those carried out in [29]. As the robot approaches a target, the errors in the map will get smaller. Provided the robot continually re-evaluates the necessary route, these errors will not prevent navigation.
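A minimal sketch of one axis of this filter, written directly from Equations 6.10-6.17, is given below; the numeric offsets and variances in the usage lines are illustrative only.

    class OffsetFilter:
        """1D Kalman filter for one axis of the goal offset (Equations 6.10-6.17)."""

        def __init__(self, z0, r0):
            self.x = z0        # initial offset: first measurement from the topological map
            self.sigma = r0    # and its variance

        def predict(self, u, q):
            """Odometry moved the robot by u along this axis since the last cycle."""
            self.x -= u
            self.sigma += q

        def update(self, z, r):
            """New measurement z of the offset from the map, with variance r."""
            gain = self.sigma / (self.sigma + r)
            self.x += gain * (z - self.x)
            self.sigma -= gain * self.sigma    # equivalent to sigma - sigma^2 / (sigma + r)
            return self.x, self.sigma

    # One filter per axis, as in the thesis; the values below are made up:
    fx, fy = OffsetFilter(1.8, 0.05), OffsetFilter(-0.4, 0.05)
    fx.predict(u=0.03, q=0.02 * 0.03)          # q: 2% of the distance covered
    print(fx.update(z=1.75, r=0.04))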
6.5.6 Results
In this section the results of experiments which tested the navigation performance
of the method presented in Section 6.5.5 are presented. The MetraLabs Scitos G5
mobile robot shown in Figure 6.8 was used in all experiments, and a Vicon tracking
system was used to track the robot in order to assess the accuracy of the algorithm.
Navigations to a Goal
The robot was placed at a goal location and the perception at that location was stored. The robot was then moved to a different location and the navigation algorithm described in Section 6.5.5 was applied. This was repeated with 15 different starting positions.

The results are shown in Figure 6.12. The goal location is shown as a large diamond, and the start and end positions of each run are shown with open circles and stars respectively. Figure 6.12b shows the filtering of the offset estimation during one of the navigations.
The results show that in all tests the robot was able to move closer to the target perception. The absolute mean error in the final position of the robot was 0.283 ± 0.027 metres. The maximum and minimum position errors were 0.456 m and 0.137 m respectively.
The route taken to the goal by the robot was often not completely direct, as can be seen in runs 1 and 13. This is because the offset to the goal location, as measured using the topological map, fluctuates around the true offset, and a combination of the Kalman filter and the physical robot causes a smooth trajectory to be taken. In run 1 the robot can be seen to initially be driving away from the target.
Figure 6.12: (a) Fifteen navigations to a goal location from different starting positions. The goal location is shown as a large diamond, the finishing positions are shown as stars and the starting positions as empty circles. Each run is numbered at the starting position. (b) The Kalman filtered x and y offsets during one of the 15 goal navigation runs.
However, this is not a result of errors in the topological map, but is rather caused by the use of the dynamic window approach to control the robot. In this case the robot was initially facing away from the goal, and in order to achieve a continuous smooth trajectory the linear velocity was kept at a minimum of 0.2 m/s.
Again real-time operation was comfortably achieved. In this experiment the speed of localising and finding a route through the map is critical as, unlike in the previous experiments, the robot is moving during the calculation. The filter-based navigation algorithm was run at 10 Hz, with a single cycle of the navigation algorithm (localising and finding the remaining distance to travel) taking approximately 85 ms. This imposes a limit on the velocity at which the robot can be driven because of the lag between taking an image and knowing the position. For example, at a velocity of 1 m/s the robot will have moved 10 cm during the time taken to localise. At the typical speeds of 0.3 m/s used here the lag is negligible.
Random Navigations

The navigation algorithm was further tested by performing 300 navigations to a range of map nodes. The destination map nodes were automatically selected at random, and each trial took place sequentially. The difference between the finishing location of the robot and the target node location in external world coordinates was recorded. A two-dimensional scatter of these errors is shown in Figure 6.13.
Figure 6.13 shows that the errors in the ending x and y positions approximately follow normal distributions. For the error in the y direction, the mean error is 0.045 metres with a standard deviation of 0.6012 metres. For the x direction the mean error was −0.0186 metres with a standard deviation of 0.5313 metres.
On some of the navigations the finishing position error was large, and indeed the navigation was manually terminated outside of the mapped area. These are examples of complete failure of the system, and occur whenever the robot drives outside of the mapped region. When outside of the mapped region the localisation cannot succeed, and getting back to the known area is only possible if the mapping phase is still active.

Figure 6.13: A scatter plot of the errors for 300 navigations.
Overall, of the 300 navigations, 88% finished within 1 metre of the target and 67% finished within 0.5 metres.
Route Following

As a further qualitative demonstration of the algorithm, a route following experiment was performed. The robot was manually driven to five waypoint locations, and at each location the topological map node was recorded. The robot then autonomously navigated along the route through these points using the navigation algorithm described in Section 6.5.5. Fifteen loops of the route are shown in Figure 6.14.

Figure 6.14: Fifteen navigations of a route described by five waypoints indicated by crossed circles. Red circles indicate when the robot believed the waypoint had been reached.

The navigation algorithm was able to correctly navigate the robot around the 5-waypoint course. The paths taken were all correct with the notable exception of the circular loop indicated. The robot performed this loop because it overshot the target location and was then facing in the wrong direction. At this point the dynamic window controller used to drive the robot maintained a linear velocity of 0.2 m/s while trying to get back to the goal.
6.5.7 Discussion of Continuous Navigation Experiments
The experiments presented in Section 6.5 have extended those presented in Section
6.4 by allowing an arbitrary motion between nodes and by using the image directly
as the perception. The algorithm has been shown to be applicable in a more general
setting, and the accuracy has been quantitatively and qualitatively evaluated.
The proposed approach remains partially limited by the requirement for the robot heading in an external reference frame to be known, as provided here by two coloured ceiling tiles. However, this limitation is not completely restrictive. A number of visual compass approaches have been developed (see Chapter 3.2) that could be used to provide such information without the need for artificial landmarks.
6.6 Limitations of the Method

There are a number of limitations to the new navigation method proposed, each of which could be investigated further in future research. Firstly, perceptual aliasing has not been explicitly considered. In particular, if two locations in the environment produce the same perception then the localisation will fail. This in turn means that the constructed map will be incorrect. This is partially solved here by using a Kalman filter to combine goal distance estimates with predictions based on the odometry distance travelled when navigating with the map. However, the algorithm could potentially fail if there are many instances of perceptual aliasing.
If the robot navigates outside of the mapped area (as was the case in some tests in the random navigations experiment in Section 6.5.6) then navigation will fail. Solutions to this problem have not been considered here, but could involve keeping the mapping phase active during the navigation. However, this in turn highlights an additional limitation: although the complexity of the algorithm is low (linear in the number of map nodes), if the number of map nodes increases without bound then localising will eventually become computationally intractable. A potential solution to this would be to introduce a node pruning strategy, although in the algorithm detailed here this has not been considered.
A further limitation of the proposed algorithm is that it assumes that the environment is static. If this is not the case, then localisation within the map may fail. A potential solution to this problem is to only map parts of the environment that are static. This suggests adding a component to the algorithm that differentiates between static and dynamic parts of the image, and could form an interesting direction for future research.
6.7 Summary
In this chapter a new environment learning and navigation strategy has been proposed. The method makes use of dense topological mapping to allow odometry information to be successfully used for navigation, while at the same time maintaining efficiency through the use of a SOM to index the perceptual space. Experiments with both a simplified blob detection based perception and a general appearance-based approach have shown the effectiveness of the algorithm.

The self-organising map has been shown to be a suitable tool for clustering the perceptual space, and the use of a SOM within the algorithm not only allowed greater efficiency by compressing the vectors, but also increased speed by reducing the number of comparisons required during localisation. This allowed the advantages of dense topological mapping detailed in Section 6.2.1 to be realised, and the disadvantages detailed in Section 6.2.2 to be avoided.
The proposed method and experiments allow the following conclusions to be drawn regarding the best method of achieving reliable navigation on a low-cost, low-powered, commercially viable robot:

• Topological mapping combined with local odometry-based metric information is a feasible approach for navigation with low-cost robots. Specifically, a dense topological mapping approach to navigation is desirable when the map is compressed and indexed by a SOM, as it allows the navigation of arbitrary routes through the environment.

• Metric consistency is an expensive goal, and is not necessary for effective robot navigation. Ignoring inconsistencies in the map until the point of navigation reduces the computational cost.

• Appearance-based approaches to the use of the camera data are effective, general and efficient, and are therefore well suited for use on low-cost robots.
The accuracy of the method presented was shown to be in the range of 0 to 1 metre. When a higher accuracy than this is needed, for example when docking the robot, an alternative method of navigation needs to be employed. In the next chapter an efficient method for accurate local navigation is developed that does not make use of a map, but rather works directly with a snapshot of a target location in order to manoeuvre into a more accurate position.
7 Accurate Visual Appearance-Based Local Navigation
7.1 Introduction
In Chapter 6 a system was developed that was capable of navigating the robot to within 1 m of the goal location. The system demonstrated that visual appearance-based algorithms are suitable for low-cost robot navigation, and that computationally demanding geometric methods such as those developed and investigated in Chapters 4 and 5 are not necessary for an approximate navigation system. However, when more accurate navigation is needed, an algorithm such as the one developed in this chapter is required.
A higher level of accuracy is required when docking with a charging station, for example, where the required level of accuracy depends on the size and type of the docking station. A MetraLabs Scitos G5 mobile robot like that used in the experiments in this thesis has two pickups next to the wheels that are lowered when the robot stops. For charging to be possible the robot must be stopped so that each contact is above a different floor tile as shown in Figure 7.1. Due to the spacing between the contacts and the size of the floor tiles the required accuracy is about 20 cm. A similar arrangement is employed by MobileRobots Inc with their Pioneer 3 DX charging mat, with a required accuracy of about 15 cm.

Figure 7.1: The powered floor arrangement at the University of Ulster. The robots have two contacts on the base that are lowered when the robot stops moving. For charging to be possible each contact must be placed over a different floor tile, one positive and one negative. This requires the robot to stop with an accuracy of about 20 cm.
Combining an approximate navigation algorithm such as that developed in the previous chapter with an accurate local docking procedure yields a two-tiered navigation strategy. Such an approach is common in navigation systems where a high level of accuracy is not required for all operations but may incur a higher computational cost. The aim of this chapter is to develop a complementary navigation algorithm to that presented in Chapter 6, with a shorter range but higher accuracy, as a process to be applied after navigating to within 1 m of the goal location. In many situations it is not necessary for a docking procedure to exhibit smooth motion, although a computationally efficient algorithm is still required.
The accurate local navigation algorithm developed in this chapter employs the snapshot model discussed in Chapter 3, and performs computationally efficient image processing of the current location and the target location images. As in the previous chapter, only the global visual appearance is used, avoiding the need to extract and match image features. The proposed algorithm differs from the existing snapshot model based approaches described in Chapter 3 in that no costly feature extraction is necessary, and an isotropic distribution of objects in the environment is not assumed. As in existing image-based homing methods such as Franz et al. [40], 1D images are used for efficiency. However, rather than a single radial 1D image, two 1D representations are created from the input 2D image to allow fast translation estimation.

In the following section the details of the proposed algorithm are given, and in Section 7.3 experiments are detailed. A comparison of the proposed method to those discussed in Chapter 3 is given in Section 7.4, and the computational efficiency and suitability to low-cost robots is discussed in Section 7.4.1.
7.2 Appearance-Based Docking Algorithm

In the algorithm developed in this chapter the camera is placed at the centre of rotation, with the optical axis perpendicular to the robot and the direction of view pointing towards the ceiling, in exactly the same way as in the second set of experiments in Chapter 6. The omnidirectional lens used in Chapter 6 is used again to provide a wide field of view, ensuring substantial overlap between the current and the target image. The complete field of view is not used, but rather just a central 90° area, which is undistorted, as described in Chapter 2, to create a perspective image.
In order to navigate the robot to the target image, the translation and rotation between the current image and the target image are calculated. As the ceiling is approximately planar with only small variations in height, all pixels in the current image will be related to pixels in the target image by an image homography. Restricting the motion of the robot to the floor plane, which is approximately parallel to the ceiling plane, further reduces the global transformation between images to a single rotation and translation (as can be seen in Figure 7.2).¹ This is similar to opening an image in a graphics program and rotating and translating it.

Figure 7.2: Example current and target images: (a) current image, (b) target image. The transformation between them can be described by one rotation followed by one translation. In this example if the current image is rotated clockwise by 93° and shifted up and right then it matches the target image.
The image transformation between the current view and the target view directly
relates to the physical transformation between the current location and the target
location.
Since the camera is placed at the robot's centre of rotation, the image
rotation is equal to the physical rotation, and the image translation is linearly proportional to the physical translation.
The algorithm first calculates the rotation between the current and target images, and then finds the translation by aligning the images to face the same direction. After the images are aligned they are reduced to two 1D representations. The goal of the method is similar to the ego-motion estimation algorithm presented in Chapter 4, but here the distance between images separated by up to 1 m is calculated, whereas in Chapter 4 only a small rotation and translation between consecutive video frames was considered. Furthermore, here a metric estimate of distance is not sought. The following two sections detail the approach.

1 Note that assuming that the ceiling plane is parallel to the floor plane results in a weak perspective projection. Although this is only an approximation, it allows the transformation between images to be considered as only a rotation and translation, with no perspective shear.
7.2.1 Calculating Rotation

In Chapter 4 three methods of visual compass were compared, namely the Labrosse compass [73], a 2D optic flow approach and a 1D optic flow approach. Each method aimed to calculate the rotation of the robot between two video frames. However, when the robot has travelled a large distance between the two video frames, neither the 1D nor the 2D optic flow approaches are applicable, and the Labrosse compass can be adversely affected by strong visual features such as ceiling lights.
An alternative approach is taken here that exploits the fact that the ceiling is planar. The rotation Δθ between the target and current image is efficiently found without any additional sensor information by building histograms of the edge orientations in each image as follows:
1. Convolve the x and y Sobel edge templates with the images to find the approximate δ_x and δ_y partial derivatives of each image. The edge orientation φ of each pixel is then defined as atan2(δ_y, δ_x), and the edge magnitude G is defined as √(δ_x² + δ_y²).
2. For each image build a histogram of the edge orientations of all pixels with an edge magnitude over a threshold µ. A threshold is needed to remove the high frequency noise present in the images. In the experiments reported here, µ = 150 and the histograms contained one bin per degree. The choice of value for µ was made by manually inspecting histograms built with differing values of µ, and selecting a value that resulted in histograms that were not influenced by high frequency noise. Values of µ greater than 100 were found to be sufficient to remove noise.
   Two example histograms are shown in Figure 7.3. The histogram for the current image is similar to the histogram for the target, but shifted horizontally by the rotation between the two images. This is because the edge element (edgel) orientations are only affected by the rotation of the robot, and not by the translation. Although some pixels will be recorded in the histogram of the current image that are not in the histogram of the target image, the majority of edges are visible in both images and so are recorded in both histograms. This is due to the use of a wide angle lens and the assumption that the robot is no more than 1 m away from the target.
3. The rotation between the two images is then calculated by finding the horizontal shift between the histograms. Let H_c(φ) and H_t(φ) equal the frequency of edge orientation φ in the current and target image histograms respectively. Note that H_c and H_t are periodic, in the range 0...2π. The rotation between the images is then found as the rotation that gives the minimum sum of squared errors between the histograms:

       Δθ = arg min_θ Σ_{φ=0}^{2π} ( H_c(φ + θ) − H_t(φ) )²   (7.1)

   The minimum is found by testing all values of θ from 0...2π in increments of one degree.
The accuracy of the rotation calculation is partially dictated by the resolution of the edge orientation histograms. In the experiments reported here histograms with one bin per degree were built and no interpolation between bins was carried out, giving a theoretical maximum accuracy of ±0.5°.
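A compact Python sketch of this rotation estimation, assuming OpenCV and NumPy are available, is given below; whether the returned angle is clockwise or counter-clockwise depends on the image axis convention, and the threshold follows the µ = 150 used in the experiments.

    import cv2
    import numpy as np

    def edge_orientation_histogram(grey, mu=150.0, bins=360):
        """Histogram of edge orientations for pixels with gradient magnitude over mu."""
        gx = cv2.Sobel(grey.astype(np.float32), cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(grey.astype(np.float32), cv2.CV_32F, 0, 1)
        mag = np.hypot(gx, gy)
        phi = np.degrees(np.arctan2(gy, gx)) % 360.0
        hist, _ = np.histogram(phi[mag > mu], bins=bins, range=(0.0, 360.0))
        return hist.astype(np.float32)

    def rotation_between(current_grey, target_grey):
        """Rotation (degrees) minimising the sum of squared histogram differences (Eq. 7.1)."""
        hc = edge_orientation_histogram(current_grey)
        ht = edge_orientation_histogram(target_grey)
        costs = [np.sum((np.roll(hc, -shift) - ht) ** 2) for shift in range(360)]
        return int(np.argmin(costs))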
Figure 7.3: The rotation between the target image and the current image can be seen as a shift in the edgel orientation histograms of the images (top: current image edgel orientation histogram; bottom: target image edgel orientation histogram; the horizontal shift equals the robot rotation).
7.2.2 Calculating Translation

After finding the rotation between the current and target location, it is necessary to find the image translation. This is done efficiently by utilising average intensity row and column images. The current image is first rotated by Δθ to align it with the target image, and then for both the target location image and the current location image, an average row intensity image and an average column intensity image are created, as shown in Figure 7.4.

The one dimensional shift between the target row image and the current row image, and between the target column image and the current column image, are then found. These shifts (shown in Figure 7.5) are equivalent to the Δx and Δy image translations, which equate to how far in pixels the robot needs to move to get to the target location.
The process of creating the one dimensional images and calculating the necessary
translation is as follows:
Figure 7.4: The 1D average row and column images are constructed from the 2D image by averaging the scanlines.
Figure 7.5: Example average row and average column 1D images for the target and current images shown in Figure 7.2 after the current image has been aligned with the target image. The shift between the images gives the translation.
1. Rotate the current image about the centre by the value of Δθ that was found using Equation (7.1) so that the target image and the current image are aligned in the same direction.
2. Create one dimensional vertical and horizontal representations of the aligned current and target images C(u, v) and T(u, v):

       V_c(i) = (1/U) Σ_{u=0}^{U} C(u, i),   i = 1...V   (7.2)
       H_c(i) = (1/V) Σ_{v=0}^{V} C(i, v),   i = 1...U   (7.3)

   and similarly for the target image, where V_c and V_t are the vertical average intensity images of the current and target images, H_c and H_t are the horizontal average intensity images of the current and target images, and U, V are the width and height of the images.
3. Approximate the gradient of the one dimensional images:

       V_c′(i) = V_c(i + 1) − V_c(i),   i = 1...V − 1   (7.4)

   and similarly for V_t, H_c and H_t. By doing so, some of the effects of varying illumination can be eliminated.
4. Extract the central region of H_c′ and V_c′ as templates, and locate the best matching location in H_t′ and V_t′ using template matching and normalised cross-correlation. Only the central region is considered because the sides of the one dimensional representation may not be visible in both images. In the experiments reported here a 40% central region was used as a compromise between the higher computational cost caused by a bigger template region and the susceptibility to incorrect matches when using a small region. A 40% region size when using 480 × 480 pixel images allows for an image difference of up to 144 pixels, which the experiments show is ample for a 1 m range.

The result of the translation estimation is a vertical shift v and a horizontal shift u. These shifts can be expressed as a 2D vector (Δx, Δy)^T that gives the direction and distance (in pixels) that the robot needs to move to get to the target location. This is done by rotating the shifts by Δθ to move into the coordinate frame of the current robot position:

    [Δx]   [cos Δθ   −sin Δθ] [u]
    [Δy] = [sin Δθ    cos Δθ] [v]   (7.5)

This vector is often referred to as the homing vector in image-based homing methods [18].
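The translation step can be sketched in NumPy as follows. This is an illustration of Equations 7.2-7.5 rather than the thesis code: the normalised cross-correlation is written out explicitly instead of calling a library routine, and the sign of the recovered shifts depends on the chosen image axis convention.

    import numpy as np

    def ncc_shift(signal, template, template_start):
        """Best alignment of a 1D template within a longer 1D signal by normalised
        cross-correlation; returns the shift relative to the template's original position."""
        t = (template - template.mean()) / (template.std() + 1e-9)
        best_score, best_pos = -np.inf, 0
        for pos in range(len(signal) - len(template) + 1):
            w = signal[pos:pos + len(template)]
            w = (w - w.mean()) / (w.std() + 1e-9)
            score = float(np.dot(w, t))
            if score > best_score:
                best_score, best_pos = score, pos
        return best_pos - template_start

    def translation_1d(current_aligned, target):
        """Pixel translation (u, v) between two aligned greyscale images, estimated from
        their average row and column intensity profiles (Section 7.2.2)."""
        row_c, row_t = current_aligned.mean(axis=1), target.mean(axis=1)   # Eq. 7.2
        col_c, col_t = current_aligned.mean(axis=0), target.mean(axis=0)   # Eq. 7.3
        drow_c, drow_t = np.diff(row_c), np.diff(row_t)                    # Eq. 7.4
        dcol_c, dcol_t = np.diff(col_c), np.diff(col_t)
        # Central 40% of the current profiles used as templates.
        s_r, e_r = int(0.3 * len(drow_c)), int(0.7 * len(drow_c))
        s_c, e_c = int(0.3 * len(dcol_c)), int(0.7 * len(dcol_c))
        v = ncc_shift(drow_t, drow_c[s_r:e_r], s_r)   # vertical shift (rows)
        u = ncc_shift(dcol_t, dcol_c[s_c:e_c], s_c)   # horizontal shift (columns)
        return u, v

    def homing_vector(u, v, dtheta):
        """Rotate the pixel shift into the current robot frame (Eq. 7.5)."""
        c, s = np.cos(dtheta), np.sin(dtheta)
        return c * u - s * v, s * u + c * v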
7.2.3 Controlling the Robot
In order to drive the robot to the target image, the following simple control algorithm
was employed:
1. Estimate (Δx, Δy)^T and Δθ using Equations (7.1) and (7.5).

2. Rotate the robot on the spot so as to be facing in the direction of the goal, where the direction to the goal is given by arctan2(Δy, Δx). This is performed using the odometry information provided by the wheel encoders.

3. Using the wheel encoders for feedback, drive n cm forward, where n is calculated using Equation (7.6):

       n = k √(Δx² + Δy²).   (7.6)

   In the experiments k = 1 cm.

4. Repeat the procedure until √(Δx² + Δy²) < τ, where τ is the distance in pixels that is accepted as home.

5. Rotate the robot on the spot by Δθ so as to be facing in the same direction as the goal.
Whilst this approach results in non-smooth trajectories, the application of this algorithm is as an additional procedure to be run after navigation to the vicinity of
a goal position has been carried out using a method like that developed in Chapter
6. Under these circumstances a smooth trajectory is not usually required.
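A short Python outline of this controller is given below; estimate_offset, rotate_by and drive_forward are placeholder callables standing in for the image processing of Sections 7.2.1-7.2.2 and the robot's odometry-based motion commands.

    import math

    def dock(estimate_offset, rotate_by, drive_forward, k=1.0, tau=1.0):
        """Outline of the docking controller of Section 7.2.3. estimate_offset() is
        assumed to return (dx, dy, dtheta) from Equations (7.1) and (7.5)."""
        while True:
            dx, dy, dtheta = estimate_offset()
            dist = math.hypot(dx, dy)             # image distance in pixels
            if dist < tau:                        # within tau pixels of the target view
                rotate_by(dtheta)                 # final rotation to the goal heading
                return
            rotate_by(math.atan2(dy, dx))         # turn on the spot to face the goal
            drive_forward(k * dist)               # drive k * distance centimetres (k = 1 cm)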
7.3 Experiments and Results

In order to assess the performance of the algorithm, two sets of experiments were carried out using a Scitos G5 mobile robot. The environment used was the University of Ulster robotics arena shown in Figure 7.1, with no obstacles present. A Vicon tracking system was used to record sub-millimetre accurate ground truth.
7.3.1 Experiment 1: Navigations to a Goal

The performance of the proposed navigation algorithm was tested by performing 50 navigations to a goal location from randomly selected positions within a distance of 1 m to the goal. The control algorithm outlined in Section 7.2.3 was used with a tolerance of τ = 1 pixel, stopping the robot when the image distance between the current view and the target view was found to be 1 pixel or less.

Histograms of the errors in the stopping position are shown in Figure 7.6. The mean heading error was 4.45° (standard deviation 2.0685°) and the mean distance from the goal after homing was 0.0648 metres (standard deviation 0.0265 metres).

Although on every run the robot was able to navigate to the home position, on some runs it did not move directly towards the goal. This occurred when a poor heading estimate was found, as is quantified in the following experiment. On average 16 steps were required by the robot to navigate to the home image.
Figure 7.6: Histograms of the heading and (x, y) position errors in the stopping position for 50 runs from random start positions up to 1 m from the target.
7.3.2 Experiment 2: Evaluation of the rotation and translation estimation

In order to better evaluate the performance of the (Δx, Δy, Δθ) estimation, the robot was placed in an empty environment and an image was acquired. The image was saved as the target image, and the robot was manually driven in a random pattern around the target. The translation and rotation estimation was executed during the robot traversal, and the calculated (Δx, Δy, Δθ) values were recorded along with the ground truth position of the current and target images over 800 frames.
Figure 7.7: (a) Histogram of the Δθ estimate for 800 tests over a range of 0 to 1.8 metres. (b) The errors plotted against the physical distance, showing no correlation.
Figure 7.7a shows a histogram of the Δθ estimate error for each frame and a plot of the error against the physical distance for each frame. As can be seen in the histogram, other than 8 estimates that were incorrect by 180°, the error distribution is approximately Gaussian with a mean of 0.1375° and a standard deviation of 8.4569°. Figure 7.7b shows that the angular error does not depend on distance within the range tested. A Spearman rank correlation test was performed in order to test for correlation between angular error and distance. The null hypothesis that no significant correlation is present was accepted at a 5% significance level.
The method of estimating the difference in orientation between images proved to work well even up to a distance of 1.8 m. The 8 orientation estimates that were incorrect by 180° were the result of symmetries in the environment, and only represented 1% of the trials. The effect of the error is that the robot navigates in the wrong direction for one navigation step, and then corrects this in subsequent navigation steps.
Figure 7.8: The image distance against the physical distance. The boxed area shows the target region of the algorithm, i.e. navigations within a range of 1 m.

Figure 7.8 shows a plot of the image distance √(Δx² + Δy²) against the physical distance for each frame. The boxed region highlights the values in a range up to 1 m, which is the target range for the algorithm. Outside of this region it can be seen that estimates become increasingly noisy.

Figure 7.9: The average homing component of the estimated displacement vectors for all 800 frames, computed as the dot product of the normalised homing vector with the true homing vector. (a) A plot of the homing component against physical distance, showing that the homing component is weakly correlated with the distance to the goal. (b) A histogram of the homing components.
In ideal circumstances the relationship between physical distance and image distance should be linear, and a robust estimate of the linear relationship is shown by a line in the plot.² By using the parameters of the fitted line to convert the image distances to physical distances, the physical error between the estimated distance and the actual distance can be found. The mean error of the estimated distance for ranges of up to 1 m was −3.34 × 10⁻⁴ metres and the standard deviation was 0.1004 metres.
As an evaluation of the direction of the homing vector, [40] suggested the use of the Average Homing Component (AHC). This is the projection of the normalised estimated homing vector onto the normalised true homing vector, computed as the dot product of the two normalised vectors. The AHC ranges from -1 to 1, where a value of 1 indicates a perfect direction and -1 indicates the opposite direction. Any value above zero will drive the robot towards the goal, but the higher the number the more direct the route. The AHC for all 800 samples is shown in Figure 7.9. The median value was 0.9614, which is equivalent to a direction error of 16 degrees.
2 The robust estimate of the linear relationship between physical distance and image distance was found using the iteratively re-weighted least squares routine provided by the MATLAB robustfit function.
The Spearman rank correlation coefficient between the AHC and the distance to the target was calculated as 0.1371. The null hypothesis that there is no correlation was rejected at the 5% significance level, indicating that there is a weak correlation. This is expected because the translation is estimated to the nearest pixel. As the distance decreases, rounding errors in the translation estimate have a larger effect on the AHC.
The experiments suggest that the homing algorithm can be expected to achieve a final position accuracy of at worst 10 cm 68% of the time (1 standard deviation), and 20 cm 95% of the time (2 standard deviations). However, in practice the final position accuracy was higher in the results for experiment 1, as the robot did not stop until an image distance of 1 pixel was found. This meant that the robot occasionally oscillated about its final position before deciding it was home.
7.4 Discussion
Table 7.1: Published accuracy of several methods discussed in Chapter 3 compared to the accuracy of the method presented in this chapter. The methods compared are Bekris et al. [10], Franz et al. [40], Labrosse [74], Röfer [104] and Stürzl and Mallot [114]; their reported accuracies range from the order of 1-3 cm to approximately 60 cm (for [74], whose target distance is about 4 m), while the accuracy of the proposed method is approximately 6 cm.
As a comparison of the accuracy of the method developed in this chapter to the state of the art, Table 7.1 shows the published accuracy of related methods cited in Chapter 3. It is important to note that this only provides a general comparison, as the cited methods have not been tested with the same data or in the same environment.
The accuracy of the new ceiling-plane-based method is similar to that of [10, 40, 104, 114], and has a number of important advantages. Firstly, unlike [40, 104, 114], an isotropic distribution of landmarks is not assumed. Assuming that landmarks are at equal distances in all directions limits applicability, and can cause problems when obstacles are present in the environment [40].

In contrast to [10, 40, 104, 114], the approach proposed in this chapter is limited by the requirement that a ceiling plane is visible, and that the robot's motion is parallel to the ceiling plane. However, in most indoor environments these conditions will be met. Furthermore, the ceiling in the test environment is slanted and uneven, with variation in height up to 1.5 m. Despite this violation of the underlying assumptions, the results have still been promising. Furthermore, artificial or repeatably identifiable landmarks (Bekris et al. [10]) are not required. This is an advantage in situations where it is undesirable to have artificial beacons, for example in living rooms, or where features in the environment may change. The use of the ceiling plane for navigation means that the approach is applicable in dynamic environments.
The algorithm does not exhibit the local minima problems [104, 114] that result in a limited range of operation. Although the 1 m range used here is lower than the 4 m range reported in [74], the method developed in this chapter is an order of magnitude more accurate. It is therefore well suited as a complementary navigation strategy for application after approaching a docking location using longer distance navigation, such as that in Chapter 6.
7.4.1 Computational Cost

The computational cost of the algorithm is linear in the number of pixels in the input image. This is similar to the complexity of many appearance-based approaches, and by reducing the image to 1D the number of operations per pixel is kept low, allowing 10 frames per second to be evaluated on a standard laptop computer.³
A breakdown of the typical processing time required for each component of the algorithm is shown in Table 7.2:

Component                        Time / ms
Pre-processing                        8
Rotation Estimation
  Edge Histogram Creation            11
  Edge Histogram Matching            14
Translation Estimation
  Image Rotation                     15
  1D Image Creation                   9
  1D Matching                        22
Total                                79

Table 7.2: The typical processing time required for each component of the algorithm.
7.5 Summary

In this chapter a new algorithm for image-based homing has been presented that exploits the properties of a camera pointing at a ceiling. The new algorithm is fast and efficient and has linear complexity in the number of pixels in the input image. It is ideally suited for use on small robots where sensing and computational power are limited. Unlike other approaches, the algorithm does not require an isotropic distribution of landmarks and does not require explicit feature extraction and matching. The use of 1D horizontal and vertical image averages for the translation estimation has allowed the creation of a computationally efficient algorithm.

3 The computer used had a 1.6 GHz processor with 1 GB of RAM, running Fedora Linux.
Experiments have shown that the algorithm is capable of reducing the distance of the robot from a goal from 1 m to 6 cm. This is accurate enough to successfully dock a Scitos G5 or a Pioneer 3-DX mobile robot with a charging station, and to serve as an additional navigation phase to be executed after the approximate navigation algorithm presented in Chapter 6. The navigation methods presented in Chapter 6 and Chapter 7 are fully complementary. They could be combined by introducing high accuracy nodes to the topological map created in the previous chapter, with such nodes containing extended perception vectors storing a full snapshot image.
The applicability of the algorithm is limited to cases where a ceiling plane is available. Furthermore, the algorithm assumes that the ceiling plane is parallel to the floor plane. However, this assumption has been shown not to be overly restrictive, as in the test environment the ceiling plane was not a single plane and not parallel to the floor.
A more limiting assumption of the proposed approach is that the ceiling plane is not homogeneous. If the ceiling is homogeneous then the approach will fail to find the correct rotation or translation, in a similar way to the potential problems discussed in Chapter 4. However, as with the visual compass algorithms in Chapter 4, any variation in the ceiling intensity will allow the method to work. For example, if a ceiling light is present then the intensity will vary across the image and the algorithm will succeed.
Conclusions to be drawn from this chapter regarding the best method of achieving reliable navigation on a low-cost, low-powered, commercially viable robot are:

1. Appearance-based methods are effective not only for approximate navigation, but also for high-accuracy navigation. In order to achieve accurate docking, an accurate metric reconstruction is not required. In particular, by reducing an image to a 1D representation the number of operations per pixel is kept low.
2. Using the ceiling plane allows simple and efficient navigation. The transformation between image pixels becomes homographic, reducing the required complexity of the method. Furthermore, dynamic objects in the environment will not be imaged.
3. Separating navigation into two levels (an approximate navigation algorithm such as that in Chapter 6, and a more accurate docking procedure such as that proposed in this chapter) avoids the need for one holistic but complex navigation system.
This thesis is concluded in the next chapter, with a summary of the work presented
and indications of directions for future research.
8
Conclusion
This chapter concludes the thesis with a summary of the work presented and suggestions for future research directions.
8.1 Thesis Summary
As detailed in Chapter 2, recent advancements in camera and lens technology have provided omnidirectional cameras at consumer prices. This thesis has proposed novel navigation algorithms that utilise such cameras for perceiving the environment while imposing limited requirements on the processing power available. Efficient navigation algorithms that require only low-cost hardware are essential if robots are to become widespread consumer products. Four key navigation strategies have been addressed in this work: dead reckoning, image-based homing, metric SLAM and dense topological mapping. Each of these strategies has been considered for efficient low-cost robot navigation.
Dead reckoning. Chapter 4 investigated ego-motion estimation using omnidirectional vision with conical mirrors. The Labrosse visual compass algorithm was identified as an efficient tool for aiding robot ego-motion estimation, and it inspired two new visual compass algorithms. Direct comparison with the Labrosse approach showed the performance of both proposed algorithms to be poor. This was because, unlike the two newly proposed algorithms, the Labrosse compass algorithm uses two-dimensional image patches directly, which reduces its susceptibility to noise.
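The core idea of such an appearance-based compass can be sketched as follows. This is a generic illustration of shift matching on unwrapped panoramic images, with assumed names and conventions, and without the refinements of the Labrosse algorithm [73] or of the two proposed variants.

    import numpy as np

    def compass_update(prev_pano, curr_pano):
        """Estimate the heading change between two unwrapped panoramic images.
        Columns correspond to bearing, so a pure rotation appears as a circular
        column shift; the whole 2D image is compared for each candidate shift.
        Sign convention and sampling are illustrative only."""
        prev = prev_pano.astype(float)
        curr = curr_pano.astype(float)
        h, w = prev.shape[:2]
        errors = np.empty(w)
        for s in range(w):
            shifted = np.roll(curr, s, axis=1)            # candidate rotation
            errors[s] = np.mean((shifted - prev) ** 2)    # 2D image distance
        best = int(np.argmin(errors))
        if best > w // 2:                                 # wrap to a signed shift
            best -= w
        return best * 360.0 / w                           # heading change in degrees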
Translation calculation using two conical mirrors to achieve omnidirectional stereopsis was then considered, but problems were encountered with poor alignment resulting in incorrect stereopsis. To rectify this situation, a new set of equations for image projections with a conical mirror was formulated for the more general case of non-exact alignment. However, the formulation encompassed a very large number of parameters. Finding the values of these parameters is costly, and the parameters are liable to change during navigation. This suggests that omnidirectional stereo vision with conical mirrors is not ideally suited to low-cost mobile robots.
Metric SLAM. The investigation of computationally efficient navigation continued in Chapter 5 with a look at omnidirectional monocular vision-based metric SLAM. A system based on Davison's MonoSLAM was implemented, and quantitative experiments were performed to assess the benefits gained by omnidirectional sensing and the applicability within the context of this thesis. It was concluded that conventional visual metric SLAM systems are not suitable, and that creating a metrically consistent map is an expensive goal.
The overall conclusion from Chapters 4 and 5 was that a low-cost robot may not need complex sensors and algorithms in order to navigate. In the second half of the work, attention turned to dense topological mapping and image-based homing as alternative strategies, aiming to achieve efficient navigation using a low-cost fisheye omnidirectional camera.
Dense topological map. Chapter 6 presented a new environment learning and navigation strategy that is both effective and computationally cheap. The method makes use of dense topological mapping to allow odometry information to be successfully used for navigation, while at the same time maintaining efficiency through the use of a SOM to index the perceptual space. Experiments with both a simplified blob-detection-based perception and a general appearance-based approach showed the effectiveness of the algorithm.
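The indexing role of the SOM can be illustrated with the following sketch; the data structures and function names here are assumptions made for illustration, not the implementation described in Chapter 6.

    import numpy as np

    def best_matching_unit(som_weights, perception):
        """som_weights: (n_units, d) codebook vectors; perception: (d,) appearance vector.
        Returns the index of the SOM unit whose weight vector is closest."""
        return int(np.argmin(np.linalg.norm(som_weights - perception, axis=1)))

    def candidate_map_nodes(som_weights, unit_to_nodes, perception):
        """Use the SOM as an index into the perceptual space: only the topological-map
        nodes associated with the winning unit need to be compared in detail.
        `unit_to_nodes` is an assumed mapping such as {unit_index: [node_id, ...]}."""
        return unit_to_nodes.get(best_matching_unit(som_weights, perception), [])

Because only the nodes attached to the winning unit are examined further, the cost of localisation grows with the number of SOM units rather than with the full number of map nodes.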
From Chapter 6 it can be concluded that topological mapping combined with local odometry-based metric information is an effective approach for navigation. Furthermore, appearance-based approaches to the use of the camera data are effective, general and efficient, and are therefore well suited for use on low-cost robots.
Image-based homing. Finally, the investigation looked at image-based homing in Chapter 7. The accuracy of the navigation algorithm proposed in Chapter 6 was shown to be in the range of 0 to 1 metre. When a higher accuracy than this is needed, for example when docking the robot, an alternative method of navigation needs to be employed. Chapter 7 proposed an efficient method for accurate local navigation that uses image-based homing in order to manoeuvre into position, yielding an algorithm with a mean accuracy of 6cm.
It was shown that in order to achieve accurate docking, an accurate metric reconstruction is not required. Appearance-based methods were found to be effective not only for approximate navigation, but also for high-accuracy navigation. This suggests that when developing a navigation system for a low-powered, low-cost robot, appearance-based approaches are both efficient and effective.
8.2 Summary of Research Contributions
The following research contributions have been made in this thesis:
• A thorough literature review and analysis of robot navigation strategies for economical robots.
• Two new rotation estimation algorithms.
• The general formulation of omnidirectional projections with conical mirrors.
• A quantitative comparison of monocular omnidirectional vision and classical monocular vision for metric visual simultaneous localisation and mapping.
• A novel use of self-organising maps for approximate navigation, efficiently achieving visual appearance-based topological mapping.
• A novel algorithm for accurate visual homing that exploits the properties of images taken from an indoor robot.
8.3 Directions for Future Research
There are a number of directions that future research could take. The work presented
in Chapters 4 to 7 suggests the following:
• Ego-motion with the fisheye camera. The work presented in Chapter 4 made use of conical mirrors to achieve omnidirectional vision. Having moved from the conical mirror to a fisheye lens in later chapters, it would be worth returning to Chapter 4 to investigate the use of the fisheye lens for ego-motion estimation. Stereo vision can be avoided by exploiting pre-programmed knowledge about the environment, such as in Chapter 7 where the planarity of the ceiling was utilised. Future work could also focus on improving the visual compass methods presented in Chapter 4, in particular the one-dimensional image flow method presented in Section 4.2. Improvements may be possible by employing edge information rather than intensity information.
• Alternative metric SLAM solutions. Although Chapter 5 showed that conventional metric monocular visual SLAM is not suitable for low-cost navigation, SLAM is a very active research area and advances are continuously being made. Future work could follow newly developed SLAM algorithms such as the pioneering SLAM implementation on a mobile phone by Klein and Murray [64], which has recently provided promising results.
• Topological mapping. The successful topological mapping and navigation proposed in Chapter 6 has a number of possible extensions that could form the basis of future research. A major extension could be to allow entirely online operation by performing SOM training online, thus reducing the algorithm from two stages to a single stage. The main difficulty that will need to be overcome is the need to regularly change the associations between topological map nodes and SOM nodes. Future research could look at ways of achieving this efficiently and effectively, and may involve investigations of alternative perception clustering and compression algorithms. Shorter term future research could involve looking at alternative image representations, with a particular focus on rotation invariance in order to remove the need to supplement the navigation algorithm with man-made landmarks. Finally, node management strategies need to be considered. In the current implementation map nodes are created at fixed time intervals; this is certainly not optimal, and future research needs to investigate this. Node deletion approaches could also be considered, allowing the map to evolve to fit the areas that are regularly navigated to.
• Image-based docking. The image-based docking algorithm proposed in Chapter 7 also suggests a number of viable directions for future research. In particular, the way in which the image is used strongly affects the performance of the algorithm. Reducing the size of the image will increase the computational efficiency, but may potentially degrade the performance. This could be considered in conjunction with alternative image representations, for example edge magnitudes rather than image intensities.
References
[1] Ababsa, F., Mallem, M., and Roussel, D. Comparison Between Particle Filter Approach and Kalman Filter-Based Technique For Head Tracking in Augmented Reality Systems. In Proceedings of the 2004 International Conference on Robotics and Automation (ICRA 2004) (2004), IEEE, pp. 1021–1026.
[2] Agrawal, M., and Konolige, K. Real-time Localization in Outdoor Environments using Stereo Vision and Inexpensive GPS. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006) (2006), IEEE, pp. 1063–1068.
[3] Albert, M., and Connell, J. H. Visual rotation detection and estimation for mobile robot navigation. In Proceedings of the 2004 International Conference on Robotics and Automation (ICRA 2004) (2004), vol. 5, IEEE, pp. 4247–4252.
[4] Angeli, A., Doncieux, S., Meyer, J.-A., and Filliat, D. Visual topological SLAM and global localization. In Proceedings of the 2009 International Conference on Robotics and Automation (ICRA 2009) (2009), IEEE, pp. 2029–2034.
[5] Artac, M., Jogan, M., and Leonardis, A. Incremental PCA for on-line visual learning and recognition. In Proceedings of the 2002 International Conference on Pattern Recognition (ICPR 2002) (2002), vol. 16, IEEE, pp. 781–784.
[6] Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50, 2 (2002), 174–188.
[7] Bailey, T. Constrained initialisation for bearing-only SLAM. In Proceedings of the 2003 International Conference on Robotics and Automation (ICRA 2003) (2003), IEEE, pp. 1966–1971.
[8] Baker, S., and Nayar, S. A theory of single-viewpoint catadioptric image formation. International Journal of Computer Vision 35, 2 (1999), 175–196.
[9] Beauchemin, S. S., and Barron, J. L. The computation of optical flow. ACM Computing Surveys (CSUR) 27, 3 (1995), 433–466.
[10] Bekris, K., Argyros, A., and Kavraki, L. Imaging Beyond the Pinhole Camera, vol. 33. Springer Netherlands, 2006, ch. Exploiting Panoramic Vision for Bearing-Only Robot Homing, pp. 229–251.
[11] Bonin-Font, F., Ortiz, A., and Oliver, G. Visual Navigation for Mobile Robots: A Survey. Journal of Intelligent and Robotic Systems 53, 3 (Nov. 2008), 263–296.
[12] Borenstein, J., Everett, H., and Feng, L. Navigating Mobile Robots: Sensors and Techniques. A. K. Peters, Ltd., Wellesley, MA, 1997.
[13] Briggs, A., Detweiler, C., Mullen, P., and Scharstein, D. Scale-Space Features in 1D Omnidirectional Images. In Proceedings of the Fifth Workshop on Omnidirectional Vision (OmniVis 2004) (May 2004), no. 5, pp. 115–126.
[14] Briggs, A., Li, Y., Scharstein, D., and Wilder, M. Robot navigation using 1D panoramic images. In Proceedings of the 2006 International Conference on Robotics and Automation (ICRA 2006) (May 2006), IEEE, pp. 2679–2685.
[15] Burbridge, C. Omnidirectional Vision Simulation and Stereo Range Finding. MSc Thesis, University of Essex, Sept. 2006.
[16] Burbridge, C., Nehmzow, U., and Condell, J. Omnidirectional projections with a cone mirror and single mirror stereo. In Proceedings of OMNIVIS 2008 (2008).
[17] Burbridge, C., and Spacek, L. Omnidirectional Vision Simulation and Robot Localisation. In Proceedings of Towards Autonomous Robotic Systems (TAROS 2006) (2006), pp. 32–39.
[18] Cartwright, B., and Collett, T. Landmark maps for honeybees. Biological Cybernetics 57, 1 (1987), 85–93.
[19] Cartwright, B. A., and Collett, T. S. Landmark learning in bees. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology 151, 4 (Dec. 1983), 521–543.
[20] Cauchois, C., Brassart, E., Drocourt, C., and Vasseur, P. Calibration of the omnidirectional vision sensor: SYCLOP. In Proceedings of the 1999 International Conference on Robotics and Automation (ICRA 1999) (1999), IEEE, pp. 1287–1292.
[21] Cauchois, C., Brassart, E., Pegard, C., and Clerentin, A. Technique for calibrating an omnidirectional sensor. In Proceedings of the 1999 International Conference on Intelligent Robots and Systems (IROS 1999) (1999), IEEE/RSJ, pp. 166–171.
[22] Chella, A., Macaluso, I., and Riano, L. Automatic place detection and localization in autonomous robotics. In Proceedings of the 2007 International Conference on Intelligent Robots and Systems (IROS 2007) (2007), IEEE/RSJ, pp. 741–746.
[23] Chen, D., Chang, R., and Huang, Y. Breast cancer diagnosis using self-organizing map for sonography. Ultrasound in Medicine & Biology 26, 3 (2000), 405–411.
[24] Civera, J., Davison, A., and Montiel, J. Pattern Recognition and Image Analysis (Lecture Notes in Computer Science). Springer Berlin, June 2007, ch. Dimensionless Monocular SLAM, pp. 412–419.
[25] Cummins, M., and Newman, P. Probabilistic Appearance Based Navigation and Loop Closing. In Proceedings of the 2007 International Conference on Robotics and Automation (ICRA 2007) (Apr. 2007), IEEE, pp. 2042–2048.
[26] Davison, A. J., Reid, I., Molton, N., and Stasse, O. MonoSLAM: Real-Time Single Camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 6 (2007), 1052–1067.
[27] Deans, M., and Hebert, M. Experimental Comparison of Techniques for Localization and Mapping using a Bearings Only Sensor. In Lecture Notes in Control and Information Sciences, Vol. 271: Experimental Robotics VI (Dec. 2000), Springer-Verlag, pp. 395–404.
[28] DeSouza, G. N., and Kak, A. C. Vision for Mobile Robot Navigation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 2 (2002), 237–267.
[29] Duckett, T., Marsland, S., and Shapiro, J. Fast, on-line learning of globally consistent maps. Autonomous Robots 12, 3 (2002), 287–300.
[30] Durrant-Whyte, H., and Bailey, T. Simultaneous localization and mapping: part I. IEEE Robotics & Automation Magazine 13, 2 (2006), 99–110.
[31] Durrant-Whyte, H., Majumder, S., Thrun, S., de Battista, M., and Scheding, S. A Bayesian algorithm for simultaneous localisation and map building. In Robotics Research, vol. 6 of Springer Tracts in Advanced Robotics. Springer Berlin / Heidelberg, 2003, pp. 49–60.
[32] Durrant-Whyte, H., Rye, D., and Nebot, E. Localisation of automatic guided vehicles. In Proceedings of the 7th International Symposium on Robotics Research (ISRR 95) (1996), Springer Verlag, pp. 613–625.
[33] Elinas, P., Sim, R., and Little, J. J. σSLAM: Stereo Vision SLAM using the Rao-Blackwellised Particle Filter and a Novel Mixture Proposal Distribution. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2006) (May 2006), IEEE, pp. 1564–1570.
[34] Filliat, D. Interactive learning of visual topological navigation. In Proceedings of the 2008 International Conference on Intelligent Robots and Systems (IROS 2008) (2008), IEEE/RSJ, pp. 248–254.
[35] Filliat, D., and Meyer, J.-A. Global localization and topological map-learning for robot navigation. In Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior (SAB 2002) (2002), MIT Press, pp. 131–140.
[36] Fischler, M. A., and Bolles, R. C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM 24, 6 (June 1981), 381–395.
[37] Fox, D., Burgard, W., and Thrun, S. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine 4, 1 (1997), 23–33.
[38] Francis, G., and Spacek, L. Linux Robot with Omnidirectional Vision. In Proceedings of Towards Autonomous Robotic Systems (TAROS 2006) (2006), pp. 56–63.
[39] Franz, M., Schölkopf, B., and Bülthoff, H. Homing by Parameterized Scene Matching. Experimental Brain Research 114 (1997), 235–245.
[40] Franz, M., Schölkopf, B., Mallot, H., and Bülthoff, H. Where did I take that snapshot? Scene-based homing by image matching. Biological Cybernetics 79, 3 (1998), 191–202.
[41] Garreau, J. Robots That Fill an Emotional Vacuum. Washington Post, available online: http://www.washingtonpost.com/wp-dyn/content/article/2006/12/28/AR2006122801309.html. Accessed 29th June 2010.
[42] Gaspar, J., Winters, N., and Santos-Victor, J. Vision-based navigation and environmental representations with an omnidirectional camera. IEEE Transactions on Robotics and Automation 16, 6 (2000), 890–898.
[43] Geyer, C., and Daniilidis, K. Catadioptric projective geometry. International Journal of Computer Vision 45, 3 (2001), 223–243.
[44] Goedemé, T., Tuytelaars, T., Van Gool, L., Vanacker, G., and Nuttin, M. Feature based omnidirectional sparse visual path following. In Proceedings of the 2005 International Conference on Intelligent Robots and Systems (IROS 2005) (2005), IEEE/RSJ, pp. 1806–1811.
[45] Hafed, Z., and Levine, M. Face recognition using the discrete cosine transform. International Journal of Computer Vision 43, 3 (2001), 167–188.
[46] Hartley, R., and Zisserman, A. Multiple View Geometry. Cambridge University Press, 2000.
[47] Hecht, E., and Zajac, A. Optics. 1974.
[48] Hong, J., Tan, X., Pinette, B., Weiss, R., and Riseman, E. Image-based homing. IEEE Control Systems Magazine 12, 1 (1992), 38–45.
[49] Horn, B. K., and Schunck, B. G. Determining optical flow. Artificial Intelligence 17, 1–3 (Aug. 1981), 185–203.
[50] Huang, W. H., and Beevers, K. R. Algorithmic Foundations of Robotics VI. 2005, ch. Topological Mapping with Sensing-Limited Robots, pp. 235–250.
[51] Isard, M., and Blake, A. CONDENSATION – Conditional Density Propagation for Visual Tracking. International Journal of Computer Vision 29, 1 (Aug. 1998), 5–28.
[52] Ishiguro, H., and Tsuji, S. Image-based memory of environment. In Proceedings of the 1996 International Conference on Intelligent Robots and Systems (IROS 1996) (1996), pp. 634–639.
[53] Ishiguro, H., Yamamoto, M., and Tsuji, S. Omni-Directional Stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2 (1992), 257–262.
[54] Itti, L., Koch, C., and Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11 (1998), 1254–1259.
[55] Jain, A., and Vailaya, A. Image retrieval using color and shape. Pattern Recognition 29, 8 (1996), 1233–1244.
[56] Jogan, M., and Leonardis, A. Robust localization using an omnidirectional appearance-based subspace model of environment. Robotics and Autonomous Systems 45, 1 (2003), 51–72.
[57] Kaski, S., Kangas, J., and Kohonen, T. Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Computing Surveys 1, 3 (1998), 1–176.
[58] Kelley, C. T. Solving Nonlinear Equations with Newton's Method. SIAM Press, 2003.
[59] Kerr, D. Autonomous scale invariant feature extraction. PhD Thesis, University of Ulster, 2008.
[60] Kim, J., Yoon, K.-J., Kim, J.-S., and Kweon, I. Visual SLAM by Single-Camera Catadioptric Stereo. In Proceedings of the SICE-ICCAS Joint Conference (Oct. 2006), pp. 2005–2009.
[61] Kim, J.-H., and Chung, M. J. SLAM with omni-directional stereo vision sensor. In Proceedings of the 2003 International Conference on Intelligent Robots and Systems (IROS 2003) (Oct. 2003), IEEE/RSJ, pp. 442–447.
[62] Kim, S., and Oh, S.-Y. SLAM in Indoor Environments using Omni-directional Vertical and Horizontal Line Features. Journal of Intelligent and Robotic Systems 51, 1 (2008), 31–43.
[63] Kiviluoto, K. Predicting bankruptcies with the self-organizing map. Neurocomputing 21, 1–3 (1998), 191–201.
[64] Klein, G., and Murray, D. Parallel Tracking and Mapping on a Camera Phone. In Proceedings of the Eighth International Symposium on Mixed and Augmented Reality (ISMAR'09) (Oct. 2009), IEEE/ACM, pp. 83–86.
[65] Kobilarov, M., Sukhatme, G., Hyams, J., and Batavia, P. People tracking and following with mobile robot using an omnidirectional camera and a laser. In Proceedings of the 2006 International Conference on Robotics and Automation (ICRA 2006) (2006), IEEE, pp. 557–562.
[66] Kohonen, T. Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 1 (1982), 59–69.
[67] Kohonen, T. Self-organization and associative memory. Harvard Press, 1988.
[68] Kohonen, T. The self-organizing map. Neurocomputing 21, 1–3 (1998), 1–6.
[69] Kosaka, A., and Pan, J. Purdue Experiments in Model-based Vision for Hallway Navigation. In Proceedings of the Workshop on Vision for Robots in IROS'95 (1995), IEEE, pp. 87–96.
[70] Kosecka, J., Zhou, L., Barber, P., and Duric, Z. Qualitative image based localization in indoors environments. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2003), vol. 2, IEEE, pp. 3–10.
[71] Kurdthongmee, W. A novel hardware-oriented Kohonen SOM image compression algorithm and its FPGA implementation. Journal of Systems Architecture 54, 10 (Oct. 2008), 983–994.
[72] Kwok, N. M., and Dissanayake, G. Bearing-only SLAM in indoor environments using a modified particle filter. In Proceedings of the 2003 Australasian Conference on Robotics and Automation (2003), pp. 1–8.
[73] Labrosse, F. The visual compass: performance and limitations of an appearance-based method. Journal of Field Robotics 23, 10 (2006), 913–941.
[74] Labrosse, F. Short and long-range visual navigation using warped panoramic images. Robotics and Autonomous Systems 55, 9 (Sept. 2007), 675–684.
[75] Lambrinos, D., Moller, R., Labhart, T., and Pfeifer, R. A mobile robot employing insect strategies for navigation. Robotics and Autonomous Systems 30, 1 (2000), 39–64.
[76] Lemaire, T., and Lacroix, S. SLAM with Panoramic Vision. Journal of Field Robotics 24, 1–2 (2007), 91–111.
[77] Lenser, S., and Veloso, M. Visual sonar: fast obstacle avoidance using monocular vision. In Proceedings of the 2003 International Conference on Intelligent Robots and Systems (IROS 2003) (2003), vol. 1, IEEE/RSJ, pp. 886–891.
[78] Lin, S.-S., and Bajcsy, R. True Single View Point Cone Mirror Omni-Directional Catadioptric System. In Proceedings of the 2001 International Conference on Computer Vision (2001), pp. 102–107.
[79] López-Nicolás, G., Guerrero, J., and Sagüés, C. Multiple homographies with omnidirectional vision for robot homing. Robotics and Autonomous Systems 58, 6 (2010), 773–783.
[80] Lucas, B., and Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Conference on Artificial Intelligence (IJCAI) (Aug. 1981), pp. 674–679.
[81] Maimone, M., Cheng, Y., and Matthies, L. Two years of Visual Odometry on the Mars Exploration Rovers. Journal of Field Robotics 24, 3 (2007), 169–186.
[82] Maire, F., and Keeratipranon, N. Bearing Similarity Measures for Self-Organizing Feature Maps. In Proceedings of the Sixth International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'05) (2005), pp. 286–293.
[83] Matsuhisa, R., Kawasaki, H., Ono, S., Hanno, T., and Ikeuchi, K. Structure from Motion for Omni-directional Images by using Factorization and Bundle Adjustment Method (in Japanese). In Proceedings of the Meeting on Image Recognition and Understanding (2008).
[84] Menegatti, E., Ishiguro, H., and Maeda, T. Image-based memory for robot navigation using properties of omnidirectional images. Robotics and Autonomous Systems 47, 4 (July 2004), 251–267.
[85] Micusik, B., and Pajdla, T. Structure from Motion with Wide Circular Field of View Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 7 (July 2006), 1135–1149.
[86] Miyamoto, K. Fish Eye Lens. Journal of the Optical Society of America 54, 8 (Aug. 1964), 1060.
[87] Möller, R., and Vardy, A. Local visual homing by matched-filter descent in image distances. Biological Cybernetics 95, 5 (2006), 412–430.
[88] Möller, R., Vardy, A., Kreft, S., and Ruwisch, S. Visual homing in environments with anisotropic landmark distribution. Autonomous Robots 23, 3 (2007), 231–245.
[89] Montemerlo, M., Thrun, S., Koller, D., and Wegbreit, B. FastSLAM: a factored solution to the simultaneous localization and mapping problem. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (Menlo Park, CA, USA, 2002), American Association for Artificial Intelligence, pp. 593–598.
[90] Montiel, J., and Davison, A. A visual compass based on SLAM. In Proceedings of the 2006 International Conference on Robotics and Automation (ICRA 2006) (2006), pp. 1917–1922.
[91] Nayar, S. Omnidirectional vision. In Proceedings of the 1998 British Machine Vision Conference (1998).
[92] Nayar, S. K., and Baker, S. Catadioptric Image Formation. In Proceedings of the 1997 DARPA Image Understanding Workshop (1997), pp. 1347–1431.
[93] Neal, M., and Labrosse, F. Rotation-invariant appearance based maps for robot navigation using an artificial immune network algorithm. In Proceedings of the 2004 Congress on Evolutionary Computation (CEC 2004) (2004), pp. 863–870.
[94] Nehmzow, U. Mobile Robotics: A Practical Introduction. Springer-Verlag London, 2000.
[95] Nehmzow, U., and Vieira Neto, H. Locating Objects Visually using Opposing-Colour-Channel Coding. In Proceedings of Towards Intelligent Mobile Robots (TIMR 2003) (2003).
[96] Nelson, R. C., and Aloimonos, J. Finding motion parameters from spherical motion fields (or the advantages of having eyes in the back of your head). Biological Cybernetics 58, 4 (Mar. 1988), 261–273.
[97] Oja, M., Kaski, S., and Kohonen, T. Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Computing Surveys 3, 1 (2003), 1–156.
[98] Owen, C., and Nehmzow, U. Route Learning in Mobile Robots through Self-Organisation. In Proceedings of the 1st Euromicro Workshop on Advanced Mobile Robots (EUROBOT) (1996), IEEE Computer Society, ISBN 0-8186-7695-7, p. 126.
[99] Owen, C., and Nehmzow, U. Landmark-based navigation for a mobile robot. In Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (SAB 1998) (1998), pp. 240–245.
[100] Pajdla, T., and Hlavac, V. Zero Phase Representation of Panoramic Images for Image Based Localization. In Proceedings of the International Conference on Computer Analysis of Images and Patterns (1999), pp. 550–557.
[101] Paz, L. M., Jensfelt, P., Tardós, J. D., and Neira, J. EKF SLAM updates in O(n) with Divide and Conquer SLAM. In Proceedings of the 2007 International Conference on Robotics and Automation (ICRA 2007) (Apr. 2007), pp. 1653–1657.
[102] Perez, J. A., Castellanos, J. A., Montiel, J. M. M., Neira, J., and Tardos, J. D. Continuous Mobile Robot Localization: Vision vs. Laser. In Proceedings of the 1999 International Conference on Robotics and Automation (ICRA 1999) (May 1999), IEEE, pp. 2917–2923.
[103] Pisokas, J., and Nehmzow, U. Adaptive Agents and Multi-Agent Systems III. Springer Berlin, 2005, ch. Experiments in Subsymbolic Action Planning with Mobile Robots, pp. 216–229.
[104] Röfer, T. Image based homing using a self-organizing feature map. In Proceedings of the 1995 International Conference on Artificial Neural Networks (ICANN-95) (1995), vol. 1, pp. 475–480.
[105] Rössel, S., and Wehner, R. Polarization vision in bees. Nature 323 (1986), 128–131.
[106] Scaramuzza, D., Fraundorfer, F., and Siegwart, R. Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC. In Proceedings of the 2009 International Conference on Robotics and Automation (ICRA 2009) (2009), pp. 4293–4299.
[107] Scaramuzza, D., Martinelli, A., and Siegwart, R. A Flexible Technique for Accurate Omnidirectional Camera Calibration and Structure from Motion. In Proceedings of the Fourth IEEE International Conference on Computer Vision Systems (ICVS '06) (2006), IEEE, p. 45.
[108] Scaramuzza, D., and Siegwart, R. Appearance-Guided Monocular Omnidirectional Visual Odometry for Outdoor Ground Vehicles. IEEE Transactions on Robotics 24, 5 (2008), 1015–1026.
[109] Sim, R., and Dudek, G. Comparing image-based localization methods. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (2003), pp. 1560–1562.
[110] Sivic, J., and Zisserman, A. Toward Category-Level Object Recognition. Springer Berlin, 2006, ch. Video Google: Efficient visual search of videos.
[111] Smith, R. C., and Cheeseman, P. On the Representation and Estimation of Spatial Uncertainty. The International Journal of Robotics Research 5, 4 (1986), 56–68.
[112] Spacek, L. A catadioptric sensor with multiple viewpoints. Robotics and Autonomous Systems 51, 1 (2005), 3–15.
[113] Sturm, J., and Visser, A. An appearance-based visual compass for mobile robots. Robotics and Autonomous Systems 57, 5 (May 2009), 536–545.
[114] Stürzl, W., and Mallot, H. Efficient visual homing based on Fourier transformed panoramic images. Robotics and Autonomous Systems 54, 4 (2006), 300–313.
[115] Sun, Y., Cao, Q., and Chen, W. An object tracking and global localization method using omnidirectional vision system. In Proceedings of the 5th World Congress on Intelligent Control and Automation (2004).
[116] Swaminathan, R., and Nayar, S. Polycameras: Camera clusters for wide angle imaging. Tech. Rep. CUCS-013-99, Columbia University, New York, NY, 2005.
[117] Tan, K., Hua, H., and Ahuja, N. Multiview Panoramic Cameras Using Mirror Pyramids. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 7 (2004), 941–946.
[118] Thrun, S., Burgard, W., and Fox, D. Probabilistic Robotics. MIT Press, 2005.
[119] Tomasi, C., and Kanade, T. Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision 9, 2 (Nov. 1992), 137–154.
[120] Tsai, R. Y. An efficient and accurate camera calibration technique for 3D machine vision. In Proceedings of the 1986 IEEE Conference on Computer Vision (1986), pp. 364–374.
[121] Unibrain Inc. Fire-i firewire board camera. http://www.unibrain.com/Products/VisionImg/Fire_i_BC.htm. Accessed 29th June 2010.
[122] Usher, K., Ridley, P., and Corke, P. A Camera as a Polarized Light Compass: Preliminary Experiments. In Proceedings of the 2001 Australian Conference on Robotics and Automation (Nov. 2001), pp. 116–120.
[123] Williams, S. B. Efficient solutions to autonomous mapping and navigation problems. PhD Thesis, University of Sydney, Australia, 2001.
[124] Winters, N. A holistic approach to mobile robot navigation using omnidirectional vision. PhD Thesis, Trinity College, Dublin, 2001.
[125] Winters, N., Gaspar, J., Lacey, G., and Santos-Victor, J. Omnidirectional vision for robot navigation. In Proceedings of the 2000 Workshop on Omnidirectional Vision (OMNIVIS 2000) (2000), p. 21.
[126] Yin, H. Data visualisation and manifold mapping using the ViSOM. Neural Networks 15, 8 (2002), 1005–1016.
[127] Zeil, J., Hofmann, M. I., and Chahl, J. S. Catchment areas of panoramic snapshots in outdoor scenes. Journal of the Optical Society of America 20, 3 (Mar. 2003), 450.
[128] Zhou, C., Wei, Y., and Tan, T. Mobile robot self-localization based on global visual appearance features. In Proceedings of the 2003 International Conference on Robotics and Automation (ICRA 2003) (2003), pp. 1271–1276.
[129] Zhou, L., Barber, P., and Duric, Z. Qualitative image based localization in indoors environments. In Proceedings of the 2003 Conference on Computer Vision and Pattern Recognition (CVPR'03) (2003), pp. 3–10.
[130] Zhu, Z. Omnidirectional stereo vision. In Proceedings of the 10th International Conference on Advanced Robotics 2001 (2001), IEEE.
[131] Zitova, B., and Flusser, J. Image registration methods: a survey. Image and Vision Computing 21, 11 (2003), 977–1000.