Efficient Robot Navigation with Omnidirectional Vision
Efficient Robot Navigation with Omnidirectional Vision

Christopher James Charles Burbridge
Faculty of Computing and Engineering
A thesis submitted for the degree of Doctor of Philosophy
July, 2010

Contents

Acknowledgements
Abstract
Declaration
1 Introduction
  1.1 Low-Cost and Computationally Efficient
  1.2 Sensor Choice
  1.3 Research Tasks and Objectives
  1.4 Contributions
  1.5 Thesis Outline
2 Algorithms and Tools
  2.1 Introduction
  2.2 Omnidirectional Vision
  2.3 Omnidirectional Binocular Vision with Conical Mirrors
  2.4 Appearance-Based Vision
  2.5 Self-Organising Maps
  2.6 Summary
3 Literature Review: Robot Navigation
  3.1 Introduction
  3.2 Mapless Robot Navigation
  3.3 Map-based Robot Navigation
  3.4 Summary
4 Ego-Motion Estimation with Binocular Omnidirectional Vision
  4.1 Introduction
  4.2 Visual Compass
  4.3 Limitations of the Methods
  4.4 Experiments and Results
  4.5 Calculating the Translation of the Robot
  4.6 Omnidirectional Binocular Vision Without Alignment
  4.7 Summary
5 Metric SLAM with Monocular Vision
  5.1 Introduction
  5.2 Extended Kalman Filter SLAM Algorithm
  5.3 Experiments with EKF SLAM
  5.4 Metric Visual SLAM in Practice
  5.5 Summary
6 Topological Map-Based Navigation
  6.1 Introduction
  6.2 Dense Metric-Topological Mapping
  6.3 Using a SOM for Efficient Topological Mapping
  6.4 Experiments in Discrete Navigation
  6.5 Experiments in Continuous Appearance-Based Navigation
  6.6 Limitations of the Method
  6.7 Summary
7 Accurate Visual Appearance-Based Local Navigation
  7.1 Introduction
  7.2 Appearance-Based Docking Algorithm
  7.3 Experiments and Results
  7.4 Discussion
  7.5 Summary
8 Conclusion
  8.1 Thesis Summary
  8.2 Summary of Research Contributions
  8.3 Directions for Future Research
References

Acknowledgements

The first half of this work was undertaken at the University of Essex under the supervision of Dr Libor Spacek. The University of Essex provided an excellent environment for research, and awarded me an EPSRC scholarship to support my PhD. I would like to express my sincerest gratitude to Libor for introducing me to omnidirectional imaging, to the EPSRC for funding me for the first two years of study, and to all the numerous staff and students at the University of Essex who assisted me. Upon the retirement of Dr Libor Spacek, my PhD studies transferred to the University of Ulster, where Dr Joan Condell, Professor Ulrich Nehmzow and later Professor Martin McGinnity became my supervisors. I wish to express my utmost gratitude to Joan, Ulrich and Martin for their guidance, kindness and insight throughout my study at Ulster. I wish to express particular thanks to Ulrich for inspiring my interest in robotics, and for encouraging me to undertake further study after supervising my undergraduate work. Without Ulrich, this thesis would not exist. Additionally, I thank the University of Ulster Intelligent Systems Research Centre (ISRC), directed by Professor Martin McGinnity, for funding the second half of my PhD and providing an outstanding atmosphere for research. Special thanks are also due to my colleagues in the ISRC robotics group for their continual support and friendship. I also thank my family for their constant support and encouragement. And finally, conducting this research and writing this thesis would not have been possible without the love, understanding and support of Emily.

Abstract

This thesis investigates vision based robot navigation for low-cost commercially viable robots. As robots become more ubiquitous in everyday life, cheap and efficient methods of navigation that can work on consumer level hardware with low-cost sensors will need to be developed. The first half of the thesis deals with sensor choice, and looks at the application of binocular and monocular omnidirectional vision to robot navigation. A navigation strategy that utilises binocular omnidirectional vision with conical mirrors is considered, and the problems and limitations of such a stereo system in the context of low-cost, low-computationally-powered robots are discussed. Contributions are made in the form of two new visual compass approaches alongside the formulation of generic image projections with conical mirrors. A single monocular omnidirectional camera is then investigated as a sensor for conventional metric localisation and mapping, and is discussed in the context of this thesis. Experimental results are given that quantify the benefits of omnidirectional vision. In the second half of this thesis, contributions are made in the form of new computationally efficient navigation algorithms that operate with a single low-cost monocular omnidirectional camera. In order to realise effective, efficient and comprehensive navigation, the task is divided into two aspects: approximate navigation, where an accuracy of approximately 0.5 m is achieved, and a more accurate local homing navigation, with an accuracy of 6 cm.
The approximate navigation is achieved using an appearance-based approach combined with topological map construction. The novel use of self-organising feature maps provides efficient topological map indexing and storage. For higher accuracy navigation, an accurate robot homing algorithm is proposed which uses the raw image perception, with minimal processing necessary to navigate the robot to a desired location. Experimental results show the advantages of the newly proposed method over existing image-based homing strategies.

Declaration

I hereby declare that with effect from the date on which the thesis is deposited in the Library of the University of Ulster, I permit the Librarian of the University to allow the thesis to be copied in whole or in part without reference to me on the understanding that such authority applies to the provision of single copies made for study purposes or for inclusion within the stock of another library. This restriction does not apply to the British Library Thesis Service (which is permitted to copy the thesis on demand for loan or sale under the terms of a separate agreement) nor to the copying or publication of the title and abstract of the thesis. IT IS A CONDITION OF USE OF THIS THESIS THAT ANYONE WHO CONSULTS IT MUST RECOGNISE THAT THE COPYRIGHT RESTS WITH THE AUTHOR AND THAT NO QUOTATION FROM THE THESIS AND NO INFORMATION DERIVED FROM IT MAY BE PUBLISHED UNLESS THE SOURCE IS PROPERLY ACKNOWLEDGED

1 Introduction

It is essential that all autonomous mobile robots are equipped with the ability to navigate. Doing so without requiring expensive sensors or vast computational resources is necessary if wide-spread deployment of mobile robots is to be realised. The cost of a robot and its capabilities are usually linked. In many fields this could be considered a marketing or business strategy, but in robotics the choice of hardware and software has a genuine impact on both cost and capabilities. Expensive high precision sensors and large amounts of computing power are essential for state of the art navigation algorithms. While these algorithms could potentially allow a robot to achieve more elaborate tasks, the costs involved prevent widespread deployment. It could be argued that algorithms developed now which operate on high end PCs will be applicable to small low-cost robots in the future, and therefore any effort to create computationally efficient algorithms is essentially pointless. This, however, is a very weak argument. Companies such as iRobot and Friendly Robotics have shown that consumers are ready for robots now, and in recent years there have been an increasing number of robots targeted at consumers. For some time robots such as iRobot's Roomba vacuum cleaner and Friendly Robotics' RoboMow lawn mower have been available, and increasing in popularity [41]. However, major factors limiting their more widespread use are (a) their cost and (b) their limited capabilities. New navigation algorithms that deliver accurate navigation using fewer resources and with a single low-cost sensor will allow these limiting factors to be addressed. Indeed, the very recent release of a new vacuum cleaning robot by Samsung in April 2010 [1] illustrates the potential for improved navigation algorithms. The new consumer robot, named NaviBot, is claimed to make use of visual mapping to allow more advanced cleaning trajectories to be taken.
In this thesis, new navigation algorithms are developed that utilise a low-cost sensor and impose limited requirements on the processing power available. If robots are to be deployed in everyday environments, efficient algorithms such as these are essential.

1.1 Low-Cost and Computationally Efficient

Throughout this thesis, the terms low-cost and computationally efficient are used. When referring to something as low-cost, a comparison to another more expensive item or to a pre-established price that is considered expensive is implied. In this thesis, low-cost is used to refer to the total cost of a hypothetical robot, and is considered in comparison to typical low-cost commercial robots such as the Roomba [2]. In order for a robot to be low-cost and still be capable of performing navigation, it must be computationally efficient. This means that it must make use of the limited processing power available to it in an efficient way. In this thesis, the term computationally efficient is used to indicate that an algorithm requires little computation, and that a current low powered embedded computer such as the Asus Eee-Box [3] would be sufficient. Such computers are currently available to consumers at less than £200.

[1] Samsung's Navibot robot vacuum charting European living rooms in April. Engadget Online: http://www.engadget.com/2010/02/23/samsungs-navibot-robot-vacuumcharting-europeanliving-rooms-in/, 2010. Accessed 29th June 2010.
[2] A Roomba robot vacuum cleaner currently costs from £250 to £400.
[3] The Asus Eee-Box is a 1.6 GHz x86 PC with 1 GB RAM and 160 GB HDD, with dimensions 22.2 cm (W) x 2.69 cm (D) x 17.8 cm (H), weighing 1.0 kg.

1.2 Sensor Choice

There are a number of sensors available to mobile robots, each with advantages and disadvantages. A review of sensors for navigation is provided by Borenstein et al. [12], with a lot of emphasis given to inertial motion units (IMUs) and the global positioning system (GPS). Whilst these sensors are capable of aiding localisation, they have limited applicability for tasks other than navigation. The overall choice of a sensor for low-cost robots should not only be driven by the task of navigation. A versatile sensor will provide enough information not just for navigation, but for other tasks such as recognition. This then allows the creation of a single sensor robot, which in turn can result in a lower priced robot. On the other hand, this will not be the case if the cost of a single universal sensor is more than an array of specialised sensors. This suggests that two properties need to be considered when selecting a sensor: physical cost and general purpose sensing ability. The possible choices of sensor, and their advantages and disadvantages, are as follows:

• Range Sensors: Laser, sonar and infrared are popular general purpose sensors for robotics. A laser range finder typically costs about £2000 and delivers accurate information in non-reflective environments. Sonar and infrared sensors have lower accuracy, but are generally a lot cheaper, ranging from £10 to £20. In order to obtain a truly versatile range sensor capable of sensing more than distance on a plane, a 3D laser scanner can be chosen. However, the cost of such a sensor will prevent its use as a sensor for a cheap robot.

• Navigation Sensors: Compasses, shaft encoders and motion sensors are all examples of non-versatile sensors which are designed and used only for the purpose of navigation.
A single sensor robot equipped with such a sensor would be unable to perform additional tasks such as object recognition or obstacle avoidance.

• Vision: The cost of vision sensors has fallen dramatically in recent years, and now a reasonable webcam can be purchased for £30. Such cameras provide adequate sensing for most robotic tasks. However, the interpretation of the sensor information conventionally requires computationally expensive algorithms, leading to the requirement of more advanced, and therefore more expensive, on-board computing.

Vision sensors are low-cost and yet the most versatile of all sensors. Omnidirectional vision is even more versatile in that it provides information about the entire surroundings of the robot, in more than one direction. Furthermore, omnidirectional imaging no longer requires expensive equipment; miniature fisheye lenses that can be used with low-cost webcams can now be bought for less than £50. This thesis demonstrates that omnidirectional vision can be used effectively with little computation, thus providing a low-cost navigation system. Whilst the scope of this thesis is limited to navigation, elsewhere in the literature omnidirectional vision has been shown to be effective not only for navigation, but for other tasks such as obstacle avoidance [77] and people tracking or following [65], further demonstrating the versatility of this sensing modality. Therefore, omnidirectional vision is the best choice of sensor for an economical single sensor robot.

1.3 Research Tasks and Objectives

The overall aim of this thesis is to develop an indoor mobile robot navigation system that works with a single low-cost omnidirectional camera and without large computational requirements. In order to achieve this, the following tasks and objectives were formed:

1. To investigate omnidirectional vision as a sensor for low-cost mobile robot navigation;
2. To achieve approximate navigation (1 metre accuracy) with a low-cost camera and an economical algorithm suitable for low-cost mobile robots;
3. To achieve efficient visual docking to be carried out after the approximate navigation;
4. To quantitatively test the constructed navigation systems using external ground truth data.

The first objective is met in Chapters 4 and 5, with an evaluation of the use of omnidirectional stereo, and a comparison and testing of conventional metric SLAM using both omnidirectional and classical vision. The second and third objectives are the subject of Chapters 6 and 7 respectively. Objective 4 is carried out in each chapter using an external state of the art motion tracking system.

1.4 Contributions

The novel contributions of this thesis are as follows.

• An algorithm for instantaneous ego-motion estimation between two consecutive video frames. Two new rotation estimation algorithms are proposed and compared to a well known existing method.

• The formulae for the generic omnidirectional projection with a conical mirror. Previously all formulations of the projection with a conical mirror assumed perfect alignment [112], or other limiting restrictions. Generic projection formulae are presented in Chapter 4 that allow arbitrary mirror and camera placement.

• A quantitative comparison of monocular omnidirectional vision and classical monocular vision for metric visual simultaneous localisation and mapping.
Although there are many papers in the literature that present implementations of well known monocular vision SLAM algorithms (see Chapter 3.3.1), a quantitative look at how the accuracy of such systems can be improved by using omnidirectional vision had not previously been presented.

• A novel algorithm for approximate navigation that efficiently uses visual appearance-based topological mapping. The algorithm proposed here runs in real-time on standard hardware and is ideally suited for use on low-cost robots. Quantitative testing of the algorithm using external ground truth shows good performance (less than 1 metre error).

• A novel algorithm for accurate visual homing that exploits the properties of images taken from an indoor robot. Like existing algorithms, the snapshot model by Cartwright and Collett [18, 19] is used. However, unlike existing algorithms such as that by Franz et al. [40], the properties of an indoor mobile robot have been exploited to create a novel algorithm that is fast, efficient and accurate. Quantitative testing of the proposed algorithm using external ground truth shows good performance (approximately 6 centimetre error).

• A quantitative evaluation of the algorithms developed.

1.5 Thesis Outline

The remainder of this thesis is organised as follows: related work and background material are discussed in the following two chapters. Key image processing and machine learning algorithms are detailed in Chapter 2, and existing approaches to visual navigation are described in Chapter 3. In Chapter 4 the use of binocular omnidirectional vision with conical mirrors is investigated. Two new rotation estimation algorithms are proposed and compared to a well known existing method, and the limitations and problems with omnidirectional stereo vision are addressed. Chapter 4 shows that catadioptric omnidirectional stereo is unsuitable as a cheap sensor. Chapter 5 subsequently introduces the use of monocular omnidirectional vision. Davison's monocular vision based SLAM algorithm is investigated as a solution to navigation. The advantages brought about through the use of omnidirectional vision are quantified and demonstrated, and the potential application in the context of this thesis is discussed. A new navigation strategy that utilises a low-cost fish-eye camera to achieve an efficient appearance-based, navigation-driven representation of the robot's working environment is presented in Chapter 6. Experiments are performed to show the effectiveness of the approach, and its limitations are discussed. A complementary navigation algorithm is developed in Chapter 7 that provides higher accuracy at the expense of the range of operation, and is thus more suitable for application in robot docking scenarios. Finally, this thesis is concluded in Chapter 8 with a summary of the work presented and suggestions for future research.

2 Algorithms and Tools

2.1 Introduction

This chapter provides details of existing algorithms and methods that are directly used in this thesis. Specific topics of interest are catadioptric and dioptric omnidirectional vision, omnidirectional stereo vision, appearance-based vision, and self-organising maps.

2.2 Omnidirectional Vision

Omnidirectional vision, often referred to as panoramic vision, is vision with a wide field of view. In general the field of view is not truly omnidirectional, but rather a 360 degree horizontal view with a more limited vertical field of view [91].
The key advantage of omnidirectional vision over classical narrow field of view vision is the increased area covered in an image. In robotic applications, this allows the robot to see in all directions, which has been shown to improve the ability to localise and navigate [125], and to track objects [115]. Omnidirectional vision is versatile, and thus ideally suited as a sole sensor on an economical robot; it is therefore the sensor of choice in the methods developed in this thesis. However, the increased field of view acquired in omnidirectional images comes at a cost in the form of spatial resolution. When using a single camera to capture a wide field of view, the spatial resolution is directly affected by the field of view that is captured. The captured scene is spread over a fixed size image sensor, and when a large area is captured, fewer pixels are used to capture the same area. A full investigation of spatial resolution in omnidirectional images is provided in [91].

There are many ways of capturing an omnidirectional image. These can be split into catadioptric [47] and dioptric. In a dioptric (refracting) system, the scene is imaged directly through a lens without any reflection (catoptric). In a catadioptric arrangement the scene is first reflected in a mirror before being captured by a dioptric camera. Examples of dioptric approaches include using multiple cameras [116], a spinning camera [53] and a fisheye lens [85]. Catadioptric approaches include using a pyramid of planar mirrors [117], rotating planar mirrors, and rotationally symmetric mirrors viewed along the axis. A full review of catadioptric approaches is provided in [92]. The most important differences between the approaches are:

1. The viewpoint geometry. A single effective viewpoint is generally considered desirable because, as explained in [43], it allows many classical computer vision approaches built around a pinhole model to be directly applied with omnidirectional vision. As shown in Figure 2.1, this occurs when all pixels in the image are the result of a projection ray that passes through a common 3D point. Most imaging arrangements provide a single effective viewpoint, with the exception of multiple camera setups, and catadioptric vision using a conical or spherical mirror [8].

Figure 2.1: A catadioptric imaging system has a single effective viewpoint when all projection rays prior to reflection intersect at a common point. As shown in this diagram, omnidirectional imaging with a hyperbolic mirror exhibits this property.

2. The field and direction of view. As mentioned above, omnidirectional vision does not usually provide a 360° view in all directions, but just along the horizontal. The vertical resolution varies between approaches, but the highest resolution is offered by multiple camera arrangements such as Immersive Media's Telemmersion system [1], which combines 11 cameras to obtain a 360° omnidirectional image.

[1] Dodeca 2360 Camera System. http://www.immersivemedia.com/. Accessed 29th June 2010.

3. The cost of the system. In the context of this thesis, both the physical and computational cost of the approach are important. The physical cost varies between approaches, with home built solutions such as that by Francis et al. [38] costing less than £50 and high resolution commercial solutions such as Immersive Media's Telemmersion system costing over £5000.
In this thesis two forms of omnidirectional imaging are used: conical mirrors, and a fisheye lens.

2.2.1 Conical Mirrors

There are several types of rotationally symmetric mirrors that can be used in a catadioptric setup. Parabolic, hyperbolic and ellipsoidal mirrors can be used to achieve a single effective viewpoint. However, as pointed out in [78], using a cone mirror is preferable as there is no radial distortion and the camera is not reflected in the mirror. In this thesis a conical mirror with a 90° apex is used, as suggested by Spacek [112].

Figure 2.2: Omnidirectional vision with a conical mirror. (a) The image projection with a conical mirror viewed along the axis. (b) An example catadioptric image taken using a conical mirror. (c) The corresponding catadioptric image unrolled.

Using a 90° apex conical mirror as shown in Figure 2.2 results in a cylindrical projection of the surroundings of the robot. Figure 2.2a shows the geometry of the projection, demonstrating the multiple effective viewpoints that result. A circle of viewpoints is formed, where each vertical line in the unrolled image (Figure 2.2c) is the result of a different view point. Each vertical line of pixels in the unrolled image is equivalent to a radial line in the catadioptric image (Figure 2.2b). Although the projection does not provide a single effective viewpoint, Lin and Bajcsy [78] suggested that a single effective viewpoint can be obtained by placing the camera at the tip of the cone. Unfortunately imaging the cone from the tip leads to difficulties focusing the lens and capturing enough light. Moreover, a single effective viewpoint was shown to be unnecessary by Spacek [112]. The projection formulae of a conical mirror were derived for the case where the camera is positioned on the axis of the cone at an arbitrary distance from the tip.

A cost effective robot can be constructed that uses a conical mirror to achieve omnidirectional vision, as demonstrated by Francis et al. [38]. A good quality conical mirror was economically produced using a lathe and a block of aluminium, and the advantages relating to the construction of cones over alternative mirror shapes were discussed. The conical mirror was then used in an optical flow based obstacle avoidance algorithm.
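The unrolling from Figure 2.2b to Figure 2.2c maps each radial line of the catadioptric image to one column of a panoramic image. The following is a minimal sketch of that resampling, assuming the image centre coincides with the projection of the cone tip and using nearest-neighbour sampling; the function name and parameters are illustrative, not taken from the thesis.

```python
import numpy as np

def unroll_cone_image(img, centre, r_min, r_max, width=720):
    """Unroll a catadioptric (conical-mirror) image into a panoramic strip.
    Each output column samples one radial line between radii r_min and r_max
    from the image centre (cx, cy)."""
    cx, cy = centre
    height = int(r_max - r_min)
    out = np.zeros((height, width) + img.shape[2:], dtype=img.dtype)
    for col in range(width):
        theta = 2.0 * np.pi * col / width          # bearing of this radial line
        for row in range(height):
            r = r_max - row                        # top of the strip = outer radius
            x = int(round(cx + r * np.cos(theta)))
            y = int(round(cy + r * np.sin(theta)))
            if 0 <= x < img.shape[1] and 0 <= y < img.shape[0]:
                out[row, col] = img[y, x]
    return out
```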
2.2.2 Fisheye Lenses

A fisheye lens (developed in 1964 by Miyamoto [86]) is a special type of camera lens that can be used to obtain up to a hemispherical image in a purely dioptric setup, as shown in Figure 2.3. A true fisheye lens captures a 180° hemisphere as a circle on the image plane, which may be shown full frame when coupled with a small image sensor, resulting in a field of view of 180° only in the diagonal direction. In such an arrangement there will be a lower view angle vertically and horizontally. Conventionally fisheye lenses are considered expensive, large, heavy, complicated and unsuitable for mobile robot applications, with lenses only designed for expensive photographic equipment and with little interest in the robotics community [124].

Figure 2.3: Omnidirectional vision with a fisheye lens. (a) The Unibrain Fire-i webcam [121] used to capture images. (b) The miniature M12 mount fisheye lens. (c) An image captured with the camera and lens. (d) An undistorted version of (c) using the calibration algorithm in [107].

However, in the last 5 years miniature fisheye lenses have been developed that operate with inexpensive webcams like that shown in Figure 2.3 and used in Chapters 6 and 7. These miniature fisheye lenses can be bought for less than £50, and are gaining popularity in robotics applications. A fisheye lens does not produce a single viewpoint projection, and the image distortions introduced are impossible to correct exactly. However, it does very closely approximate a single viewpoint, and omnidirectional camera calibration algorithms have been developed that allow a near-perspective image to be created from the warped image. One such algorithm is that proposed by Scaramuzza et al. [107], in which the camera projections are modelled with polynomials fitted using checkerboard point correspondences [2]. The general form of the projection model allows for the fact that the fisheye lens does not produce images with a perfectly single viewpoint. In Chapter 7 this approach is used to rectify a fisheye image, creating a perspective image like that shown in Figure 2.3d.

[2] Note that the fisheye calibration in [107] differs from regular camera calibration in that the projection function is being estimated, not only the distortion parameters as in [120].

2.3 Omnidirectional Binocular Vision with Conical Mirrors

Placing two of the conical mirrors described in Section 2.2.1 in a co-axial arrangement as shown in Figure 2.4 allows stereopsis to be achieved. The range, r, to an object can then be calculated using the pair of omnidirectional images in much the same way as in classical forward looking stereo vision. This elegant arrangement of conical mirrors was proposed by [112], who provided the formulae for stereopsis. A method of simulating omnidirectional images from a conical mirror was presented by [17], and the co-axial omnidirectional stereopsis with conical mirrors was verified using the simulation. Although alternative approaches to achieving omnidirectional stereo vision exist (for a review see [130]), this configuration leads to simple projections and epipolar constraints.

Figure 2.4: The vertically aligned coaxial omnistereo configuration. (a) The arrangement of the mirrors and the symbols used in the formulae. (b) A robot equipped with two coaxially arranged mirrors.

The formulae given in [112] for calculating the range are:

    r = vs / (h_i1 - h_i2) - d,    (2.1)

where d is the distance from the camera lens to the tip of the mirror, s is the distance between the mirrors, v is the distance of the image behind the lens of the camera, and h_i1, h_i2 are the image distances in the upper and lower images between the tip of the cone in the image and the object in the image. The difference (h_i1 - h_i2) is the radial disparity between the imaged object positions, as shown in Figure 2.4. To convert between pixels and millimetres, v is calculated as

    v = (d/R + 1) r_m,    (2.2)

where R is the radius of the mirrors in millimetres and r_m is the radius of the mirror in the image. The distance to an object can only be computed if the object lies in the common region viewed by both mirrors.
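A minimal numerical sketch of Equations (2.1) and (2.2); the function and the geometry values in the example call are illustrative assumptions, not measurements from the thesis.

```python
def cone_stereo_range(h_i1, h_i2, d, s, R, r_m):
    """Range to an object from coaxial conical-mirror stereo.
    h_i1, h_i2: radial image distances (pixels) from the cone tip to the object
                in the upper and lower images.
    d: lens-to-mirror-tip distance (mm); s: distance between mirrors (mm);
    R: mirror radius (mm); r_m: mirror radius in the image (pixels)."""
    v = (d / R + 1.0) * r_m           # Equation (2.2): v in pixels
    disparity = h_i1 - h_i2           # radial disparity
    if disparity <= 0:
        raise ValueError("object must lie in the common region of both mirrors")
    return v * s / disparity - d      # Equation (2.1): range in mm

# Illustrative values only:
print(cone_stereo_range(h_i1=400.0, h_i2=200.0, d=150.0, s=400.0, R=40.0, r_m=240.0))
```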
A full comparison of mirror arrangements is provided in [15], where the primary reasons for preferring a co-axial arrangement over a horizontal side-by-side setup are stated as:

1. Arranging co-axially allows a 360° view to be unoccluded by the second mirror.

2. The epipolar constraint in a stereo vision system refers to the relationship between corresponding feature positions in the two images. When using a coaxial conical mirror arrangement the epipolar constraint can be formulated as corresponding radial lines, allowing well constrained correspondence matching.

Equations (2.1) and (2.2) apply for conical mirrors with a 90° angle at their apex, and are critically dependent upon the precise alignment of the mirrors and cameras, as discussed in Chapter 4.6.

2.4 Appearance-Based Vision

Appearance-based vision is a subcategory of machine vision in which the geometry of the viewed scene is not considered, or indeed reconstructed. Only the appearance of the scene in the form of the pixel intensities returned from the imaging device is used when interpreting the image. A common application of appearance-based vision methods is image retrieval, in which a database of images is queried with a sample image and any images similar to the sample image are retrieved [55]. Approaches can be split into two categories: local appearance-based methods and global appearance-based methods. Local methods use the appearance of local patches in the image, either by splitting the image into specified regions or into regions around detected features. An example is the bag-of-words paradigm [110], in which a collection of previously seen image features form a vocabulary. The presence of the words in the image becomes equivalent to words in a string, and well known text retrieval methods are applied to search an image database for similar images. Global methods use the entire image as a whole, either using the intensity values of the pixels directly or by building a global image descriptor such as an edge intensity histogram. An example is the work of Gaspar et al. [42], in which raw greyscale images are compressed using principal component analysis, and the projection of each image into the lower dimensional subspace is used as a global image descriptor for image comparison.

In this thesis both global and local appearance-based methods are used. In Chapters 4, 6 and 7 global appearance methods are used in the form of gradient magnitude images, while in the second half of Chapter 6 local appearance methods are used in the form of simple coloured blobs. The coloured blobs are detected as explained in the following section.

2.4.1 Blob detection with Opposing Colour Channel Coding

As part of a larger visual attention mechanism, Itti et al. [54] proposed emphasising the channels of an RGB image. By emphasising the colour channels of an RGB image using opposing colour channel coding, red, green and blue objects can be located more easily [95]. The formulae used in [54] were as follows:

    R' = r - (b + g)/2,    (2.3)
    G' = g - (r + b)/2,    (2.4)
    B' = b - (r + g)/2,    (2.5)

where R', G' and B' are the emphasised red, green and blue channels and r, g and b are the red, green and blue channels of the original image. In Chapter 6 the R', G' and B' channels are thresholded, resulting in pure red, green and blue objects being detectable as blobs. This is not a generic blob detection strategy suitable for an end product, and indeed the necessary threshold depends upon the ambient lighting conditions and must be manually selected and changed regularly. However, it works well for situations where the way in which the blobs are detected is unimportant.

In Chapter 6 coloured blobs are detected and used for navigation with a self-organising map.

2.5 Self-Organising Maps

Figure 2.5: The architecture of a self-organising map.

The Self-Organising Map (SOM) [66-68] is a vector quantisation and clustering algorithm that creates a topology preserving mapping of an n dimensional vector space onto a p dimensional discrete map space. The map space usually has one or two dimensions, resulting in SOMs that can be displayed and used for data visualisation [126], and is formed by a number of nodes (often also referred to as neurons or units) evenly spaced on a grid as shown in Figure 2.5. Each node has a weight vector representing an n dimensional cluster centroid. When a new input vector is presented to the SOM, the weight vectors of each node are compared to the input vector. The output of each node is calculated as the Euclidean distance between the weight vector and the input vector, and the node that best matches the input pattern is considered as the output of the SOM. The topology preserving property of the SOM algorithm ensures that close vectors in the n dimensional input vector space are mapped to close nodes within the p dimensional SOM. It is the fundamental contribution of the algorithm over alternative methods of unsupervised clustering such as k-means. As such, the algorithm has been used extensively since its creation in 1981, with applications ranging from medical diagnosis [23] to the prediction of bankruptcy [63]. A bibliography of papers using or analysing SOMs between 1981 and 2001 cites over 5000 papers [57, 97].

Training of a SOM involves a form of competitive learning. The weight vectors of each node are initialised randomly, and then for every epoch each vector in the training set is in turn classified with the SOM. The node whose weight vector best represents the training vector, and its neighbouring nodes, are adapted so that their weight vectors are brought closer to the input vector. Adapting not only the best matching node but also the neighbours of that node ensures that the topology of the training vectors is preserved. Figure 2.5 shows an update neighbourhood of size √2, where the map is wrapped around so that the edges of the map are neighbours (forming a torus shape). The weight vector w of each node selected for updating is changed according to Equation (2.6):

    w_{t+1} = w_t + η_t (x - w_t),    (2.6)

where x is the training vector, η is the learning rate and t is the epoch number. The learning rate and the neighbourhood function for selecting nodes for updating are reduced at each epoch exponentially according to Equations (2.7) and (2.8):

    η_{t+1} = η_0 e^(-r²/R_t²) e^(-10t²/λ²),    (2.7)
    R_{t+1} = R_0 e^(-10t²/λ²),    (2.8)

where nodes that fall within a radius r ≤ R_t of the best matching node are selected for updating, λ is the number of epochs that will be taken (500 in the experiments reported in this thesis), t is the current epoch number, R_0 is initialised to the radius of the SOM and η_0 is initialised to 0.9. Thus the learning rate starts at 0.9, and the amount that the node weight vectors are changed is reduced as the distance from the winning node increases and as the training progresses.

Figure 2.6 shows a SOM used to cluster 1000 random RGB triplets generated with a uniform distribution. The training was carried out as above with a 30 × 30 map, and the final SOM was saved as an image with each node represented as a grid location. The squares were coloured with the RGB value corresponding to the node's weight vector. It can be seen that close colours are mapped to close locations on the SOM, preserving the topology of the higher dimensional input space on the lower dimensional map.

Figure 2.6: A 30 × 30 SOM trained with 1000 randomly generated RGB triplets. The topology preserving property of the SOM algorithm is evident in the grouping of similar colours.
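As a concrete illustration of Sections 2.4.1 and 2.5, the sketch below trains a small SOM on random RGB triplets, mirroring the Figure 2.6 experiment. It is a simplified reconstruction, not the thesis implementation: the decay terms follow the reconstructed Equations (2.7) and (2.8), the map is not wrapped into a torus, and the colour-opponent helper implements Equations (2.3)-(2.5).

```python
import numpy as np

def opponent_channels(rgb):
    """Colour-opponent coding of Equations (2.3)-(2.5)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([r - (b + g) / 2, g - (r + b) / 2, b - (r + g) / 2], axis=-1)

def train_som(data, rows=30, cols=30, epochs=500, eta0=0.9):
    """Minimal SOM with a shrinking update neighbourhood (no torus wrap-around)."""
    rng = np.random.default_rng(0)
    weights = rng.random((rows, cols, data.shape[1]))            # random initialisation
    grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    R0 = max(rows, cols) / 2.0
    for t in range(epochs):
        decay = np.exp(-10.0 * t**2 / epochs**2)                 # e^(-10 t^2 / lambda^2)
        Rt = max(R0 * decay, 1.0)                                # neighbourhood radius, Eq. (2.8)
        for x in rng.permutation(data):
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=2)), (rows, cols))
            node_dist = np.linalg.norm(grid - np.array(bmu), axis=2)
            mask = node_dist <= Rt                               # nodes with r <= R_t
            eta = eta0 * np.exp(-node_dist[mask]**2 / Rt**2) * decay   # Eq. (2.7)
            weights[mask] += eta[:, None] * (x - weights[mask])
    return weights

print(opponent_channels(np.array([[0.9, 0.1, 0.1]])))  # a red pixel stands out in R'
# As in Figure 2.6: 1000 uniformly random RGB triplets on a 30 x 30 map
# (fewer epochs than the thesis's 500, purely to keep the example quick).
som = train_som(np.random.default_rng(1).random((1000, 3)), epochs=20)
```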
2.6 Summary

In this chapter omnidirectional vision, appearance-based vision and self-organising maps have been introduced. The key tools and algorithms that are used in this thesis are as follows:

• Omnidirectional imaging is versatile and ideally suited as a sole sensor on an economical robot. It is therefore used throughout this thesis.

• Conical mirrors are an appealing way to obtain an omnidirectional image catadioptrically, because the direction of view is over a useful area and there is no radial distortion. In Chapter 4 conical mirrors are used in a visual compass algorithm.

• Research suggests that omnidirectional stereo with conical mirrors is reasonably straightforward. In Chapter 4 omnidirectional stereo with conical mirrors is considered and new formulae are derived to remove the strict alignment constraint in Spacek's formulation [112].

• Fisheye lenses have recently become available for small inexpensive cameras, and methods of rectifying the distortion have been developed. In Chapters 6 and 7 a fisheye lens is used as part of an alternative, low-cost omnidirectional camera.

• Appearance-based vision does not try to reconstruct the geometry of the environment, and relies only on the captured appearance of the environment. In Chapters 6 and 7 appearance-based vision is carried out to utilise raw sensor information in an efficient way.

• SOMs are popular tools for vector quantisation and clustering, with a useful topology preserving property. In Chapter 6 the topology preserving property is exploited to index a topological environment map.

Having now introduced the tools utilised in the work presented in this thesis, in the next chapter related work in visual robot navigation is considered and the work presented in Chapters 4 to 7 is placed in context.

3 Literature Review: Robot Navigation

3.1 Introduction

Mobile robot navigation can be divided into three intertwined competences [94]:

• Self-localisation: the ability of the robot to calculate its location in an environment;

• Path planning: the process of choosing a route from the current location to a desired location;

• Map building and map interpretation: the construction of some form of map of the environment, and the interpretation of the map to allow path planning and localisation.

Review papers that cover the full range of approaches to visual navigation have been provided by DeSouza and Kak in 2002 [28] and by Bonin-Font et al. in 2008 [11]. In both reviews, self-localisation and path planning are categorised as either map-based or mapless; in this chapter the same classification is adopted. As stated in Chapter 1, the overall aim of this thesis is to develop an indoor mobile robot navigation system that works with a single low-cost omnidirectional camera and without large computational requirements. In light of this, key approaches to mobile robot navigation that are of interest are identified and discussed in this chapter.
Firstly, mapless navigation strategies are discussed, in particular visual odometry estimation algorithms and image based homing methods. Secondly, map-based navigation approaches are considered, specifically simultaneous localisation and mapping, and topological map-based navigation. The chapter is concluded in Section 3.4 with a summary of the navigation approaches considered in this thesis.

3.2 Mapless Robot Navigation

Mapless navigation involves using the perception of the robot to navigate to a goal, without relying on known landmarks or complicated online map building algorithms. Mapless navigation strategies are appealing for low-cost robots, where advantages are gained by not requiring map storage, look-up, creation or maintenance. Two forms of mapless navigation investigated in this thesis are image based homing, and dead-reckoning. Dead-reckoning (discussed in Section 3.2.1) attempts to maintain a metric localisation without resorting to a map, by integrating robot ego-motion estimates over time. Image based homing (detailed in Section 3.2.2) aims to navigate a robot to a target image based solely on comparisons between the current and target images.

3.2.1 Robot Ego-Motion and Visual Odometry

In the context of robotics, ego-motion is the motion of a robot with respect to a static environment. When a map of the environment is provided, the ego-motion and absolute position of the robot can be calculated. When no a priori knowledge of the environment is available and no map is constructed, global positioning is not possible; in this case the location estimation becomes incremental and prone to drift error. Despite this drift error, mapless visual odometry strategies are still useful, particularly for low-cost robots where advantages are gained by not requiring a map. The robot's ego-motion is frequently calculated by using wheel encoders and knowledge of the mechanics of the robot. However, wheel slippage causes inaccuracies in the estimates. An alternative solution is to use visual odometry, where the wheel encoders are supplemented or replaced by a camera. A successful example of the use of visual odometry is demonstrated on the Mars Rovers, where it has been used successfully for over five years [81] to cope with extreme wheel slippage on soft 3D terrain. It is common for the ego-motion of a robot to be computed by separately calculating the rotation and translation estimates. Several researchers have developed visual compass algorithms, as detailed in Section 3.2.1.1. Methods of translation estimation are covered in Section 3.2.1.2.

3.2.1.1 Visual Compasses

It has been shown that the polarisation pattern of sunlight is used by animals to provide a compass for navigation [105]. Taking inspiration from this discovery, Lambrinos et al. [75] developed a visual compass using an array of photo-diodes to find the polarisation pattern of sunlight. A similar approach was taken by Usher et al. [122], who developed a visual compass that uses a colour video camera combined with a polarising filter to detect the polarisation of the sun. Although interesting from a biological point of view, application is limited to outdoor robotics and requires a dedicated sensor to detect the light polarisation. Albert and Conell [3] demonstrated an algorithm using a forward looking camera to estimate rotation.
In their approach vertical and horizontal lines are matched between the current video frame and a key frame, which is captured at the start of the traversal and replaced after every 30 degrees of rotation. A brute force search is used to find the rotation of the robot that results in the maximum feature correspondence between frames. The best rotation is then taken as the estimate of the robot rotation. While they show that the approach is computationally efficient, the method relies on the presence of vertical and horizontal features, and requires an accurately calibrated camera so that straight lines can be detected.

Other approaches that also require local image features were proposed by Sturm and Visser [113] and by Montiel and Davison [90]. In [90] a Bayesian simultaneous localisation and mapping solution based on the extended Kalman filter is used to provide a three dimensional visual compass. A greyscale monocular camera with a 90 degree lens is used to map distant point features, which are matched to provide an incremental estimate of the camera orientation and angular velocity. Unlike the SLAM solutions detailed in Section 3.3.1, the point features are assumed to be at an infinite distance, allowing translation to be ignored and 3D reconstruction to be avoided. Sturm and Visser [113] also employed a Bayesian filter to produce a visual compass. Their appearance-based approach uses a colour monocular forward looking camera, with which a cylindrical map of colour transition frequencies is constructed. The cylindrical map is constructed by rotating the robot on the spot and recording the changes in pixel colours along the horizontal lines in the images. The pixel colour transitions are assumed to result from infinitely distant objects, allowing robot translation to be neglected.

Labrosse [73] proposed an incremental, purely mapless appearance-based method of calculating a robot's rotation using omnidirectional images. The rotation of the robot between frames is determined by finding the image patch corresponding to the expansion and contraction fields of the current image within the previous image. This is achieved by minimising the Euclidean distance between the previous image and a rotated copy of the current image with respect to the rotation. Efficiency is maintained by combining a local and global search when minimising the distance. Of all of these approaches, the Labrosse visual compass is the most appealing, with the widest array of experiments presented and a full exploitation of omnidirectional imaging. The conceptual simplicity and applicability of the approach fits well within the context of efficient economical robots. In Chapter 4 two new visual compass approaches inspired by that of Labrosse are detailed, and direct performance comparisons are made.
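The appearance-based compasses above share one core operation: finding the horizontal (column) shift of a panoramic image that minimises its distance to a reference image. The sketch below illustrates that idea with a brute-force search over shifts; it is a generic illustration working on unrolled panoramic images, not a reimplementation of any of the cited methods.

```python
import numpy as np

def rotation_between(prev_img, curr_img):
    """Estimate the rotation (degrees) between two unrolled panoramic images
    as the column shift of curr_img that minimises the Euclidean distance to
    prev_img (brute-force search over all shifts)."""
    prev = prev_img.astype(float)
    curr = curr_img.astype(float)
    width = curr.shape[1]
    dists = [np.linalg.norm(np.roll(curr, shift, axis=1) - prev)
             for shift in range(width)]
    best = int(np.argmin(dists))
    if best > width // 2:
        best -= width                 # signed shift in (-width/2, width/2]
    return 360.0 * best / width       # one column corresponds to 360/width degrees
```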
3.2.1.2 Feature-Based Translation

The visual compass approaches detailed in Section 3.2.1.1 did not attempt to geometrically reconstruct the scene in order to find the rotation of the robot, and as such can be classified as appearance-based. However, there are many non appearance-based methods that aim to calculate a metrically accurate estimate of translation by using feature correspondences and scene geometry. When feature correspondences between frames and the intrinsic calibration of the camera are known, a geometric reconstruction can be performed that allows the ego-motion of the robot to be found. In computer vision this is referred to as structure from motion [46], in which both the structure of the scene and the motion of a camera observing the scene are recovered simultaneously.

Matsuhisa et al. [83] presented an application of the factorisation approach to structure from motion. Omnidirectional images were captured from a moving vehicle, and a three dimensional model of the scene was reconstructed. The factorisation approach, proposed in 1992 by Tomasi and Kanade [119], collects tracked features over a stream of images and batch computes the scene structure and camera trajectory. It is computationally demanding and better suited for scene reconstruction than for motion estimation. Furthermore, the motion and structure are only estimated up to a scale factor. An alternative omnidirectional vision based structure from motion algorithm that works by self calibration of the intrinsic and extrinsic camera parameters has been proposed by [85]. In the proposed solution the extrinsic parameters give the motion between frames. However, the computationally demanding calibration process requires at least 9 point correspondences. A full three dimensional path of the camera is reconstructed, which is unnecessary for small robots designed to work on planar surfaces.

Several visual odometry methods have been developed that constrain the motion of the robot to a plane. Scaramuzza et al. [108] developed a monocular omnidirectional visual odometry algorithm that calculates motion on a plane by tracking features on the ground. As in earlier work with perspective cameras, the strict mathematical relationship between views of a planar surface (a homography) is used to calculate the motion. However, unlike previous homography based approaches, the rotation estimation is provided by the Labrosse appearance-based visual compass [73]. The translation is found using ground plane feature correspondences, with outliers removed using RANSAC (RANdom SAmple Consensus) [36] in order to ensure that only ground plane features are used. It was observed that the addition of an appearance-based compass improves ego-motion estimation.

RANSAC is an algorithm for robustly fitting a known model to a data set that contains outliers. It is frequently used in multiple view computer vision for estimating the relationship between views [46]. However, as an iterative process with no upper bound on the time taken to fit the model, it can be computationally expensive and unsuitable for small robots with limited processing power. As observed by Scaramuzza et al. in a later paper [106], the number of model parameters that RANSAC uses has an exponential effect on the number of iterations required. To overcome this they presented a monocular omnidirectional vision based visual odometry system that estimates the robot motion from a single image correspondence. Multiple correspondences between frames each provide different estimates of the motion, and the median estimate is used as the overall estimate. This avoids the iterative RANSAC algorithm altogether. However, in order to be able to estimate the robot motion from only a single point correspondence, the approach imposes strong limits upon the possible motion of the vehicle.

An alternative way of achieving ego-motion estimation from a single point is to use stereo vision. When range information is available and the robot is constrained to motion on a plane, one correspondence is sufficient. In Chapter 4 omnidirectional stereo vision is considered for translation estimation.
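To make the last point concrete, the sketch below shows why a single ranged correspondence suffices under planar motion: given the rotation between frames (for example from a visual compass) and the bearing and range of one static point in both frames, the translation follows directly. The function and symbols are illustrative, not the thesis's notation.

```python
import numpy as np

def planar_translation(bearing1, range1, bearing2, range2, theta):
    """Robot translation on the plane from one tracked point observed in two
    frames (bearings in radians, ranges in the same unit as the result),
    given the inter-frame rotation theta (radians)."""
    p1 = range1 * np.array([np.cos(bearing1), np.sin(bearing1)])   # point in frame 1
    p2 = range2 * np.array([np.cos(bearing2), np.sin(bearing2)])   # same point in frame 2
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return p1 - rot @ p2        # translation of the robot expressed in frame 1
```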
3.2.2 Image-based Homing

The pioneering work of Cartwright and Collett [18, 19] inspired the field of image-based homing. Their work examined how honeybees navigate to food sources using the surrounding landmarks. After experimenting with real bees and recording the flight paths under varying landmark positions, a computer model was created that simulated the navigation of the bees. In the model, named the snapshot model, a comparison is made between the positions of landmarks in the current view and a snapshot image recorded at the target location. The comparison yields a homing vector that can be used to steer towards the target image. The work was undertaken purely from a biological viewpoint, with the aim of better understanding the behaviour of bees. Therefore unrealistic simulated binary landmark images were used.

Two real-world (not simulated) robotic implementations of image-based homing directly inspired by the work of Cartwright and Collett were presented by Röfer [104] and Franz et al. [39, 40]. In [104] a rotating photoreceptor was used that captured 64 intensity readings as a one dimensional panoramic perception. Self-organising map based clustering was used to calculate the correspondence between images, which was then used to compute the homing vector. An ending position accuracy of 3 cm from the target was claimed, but limited experiments were performed and the experimental environment was not detailed. Furthermore, the robot was required to have a constant orientation, meaning that a compass was necessary. In Franz et al. [39, 40] features are extracted from a one dimensional panoramic image and shifted to calculate the displacement between views. Experiments were performed using a miniature robot (a Khepera) in an arena approximately 1 metre squared. Homing was shown to be successful over a distance of up to 25 cm, where a successful homing navigation was defined as stopping less than 1 cm from the target location.

Stürzl et al. [114] developed an extended version of the method of Franz et al. [39, 40], where increased robustness and speed were achieved by switching to the Fourier domain. The algorithm is similar to that by Hong et al. [48], but importantly a constant robot heading is not required and more complete experimental results were presented.

Labrosse [74] presented a global appearance-based homing strategy that uses the entire omnidirectional image as is, without the need for costly feature extraction or matching. Moving away from the one dimensional image approaches in [19, 39, 48, 104, 114], the proposed algorithm directly uses a two dimensional colour image. Zeil et al. [127] observed that the Euclidean image distance between physically close omnidirectional images with the same orientation reduces smoothly as the image locations get closer. This fact is exploited by Labrosse in his approach. The current image is rotated to be orientated in the same direction as the target image, and then warped into two additional images: one that resembles an image taken from the right of the current position, and one that resembles an image taken from in front of the current position. The current and warped images are then compared to the target image, and the robot drives in the direction of the descending image distance gradient to get home. Robot rotation was permitted by using the Labrosse visual compass described in Section 3.2.1, and the use of full images increased the robustness of the algorithm. Experiments operating over ranges of up to 5 m to the target were reported to have an accuracy of approximately 40 cm.
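The following sketch illustrates this descent on the image distance surface. The warp_forward and warp_right arguments stand in for the image warpings described above and are assumed helpers supplied by the caller; this is a schematic illustration, not the implementation of [74] or [87].

```python
import numpy as np

def homing_direction(current, target, warp_forward, warp_right):
    """Pick a drive direction by comparing the target image with the current
    image and with warped predictions of a small forward and rightward step.
    `current` is assumed already rotated to the target's orientation."""
    def dist(img):
        return np.linalg.norm(img.astype(float) - target.astype(float))
    d0 = dist(current)
    # Approximate the image-distance gradient from the two warped test images.
    grad = np.array([dist(warp_forward(current)) - d0,
                     dist(warp_right(current)) - d0])
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros(2)            # indistinguishable from the goal image
    return -grad / norm               # (forward, right) direction of descent
```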
An approach very similar to that of Labrosse was developed independently by Möller and Vardy [87], in which the optic flow templates of forward and sideways motions are applied to the image. The core difference of their algorithm, called matched filter descent image distance (MFDID), to that of Labrosse is that in MFDID the image warpings are implicitly calculated as part of the gradient descent, whereas in Labrosse [74] the image warpings are explicitly created. This provides a theoretical advantage in that a rigorous mathematical derivation is given, although the practical advantages are not clear, with comparatively few results presented.

Although promising results have been presented by the above homing approaches, the methods make limiting assumptions regarding the distribution of landmarks around the robot. In particular, landmarks are either assumed to be at a constant distance from the robot, or evenly (isotropically) distributed around the robot. Möller et al. [88] extended the MFDID algorithm to deal with the problem of a non-isotropic landmark distribution, creating an algorithm known as Newton-MFDID. Homing vector directions are improved by combining a Newton algorithm gradient descent with the earlier developed MFDID algorithm. However, the assumption of constant distance landmarks is still required.

Alternative approaches to homing without such assumptions have been created by not only employing appearance-based vision, but by considering the geometry of the scene. A key example of this is the work by Goedeme et al. [44], in which a visual servoing approach to image-based homing is presented which relies on the tracking of features with an extended Kalman filter combined with 3D reconstruction from wide-baseline stereo. Although the method is claimed to be robust and accurate, no detailed experimental results are provided. Furthermore, in order to be able to track and match image features between frames, a computationally demanding image descriptor was used. The application domain of the algorithm is that of autonomous wheelchairs, where limits on the computational power available are not as restrictive as those for low-cost robots.

Image-based homing can be considered as a form of image registration. As stated by Zitova and Flusser [131], image registration is "the process of overlaying images (two or more) of the same scene taken at different times, from different viewpoints, and/or by different sensors" (Zitova and Flusser [131], page 1, paragraph 1). The calculation of a robot homing vector from a current image and a target image is essentially image registration, in which the homing vector is given by the transformation between the image viewpoints. When viewing a planar surface from two viewpoints, the image transformation of the imaged surface between different views is purely affine. Such transformations are referred to as homographies, and well known feature-based algorithms can be used to find the transformations (see Hartley and Zisserman [46]).
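For reference, the standard feature-based pipeline that the previous paragraph alludes to looks roughly as follows in OpenCV (ORB features, brute-force matching, RANSAC fitting). This is a generic sketch, not the method of [79]; it also makes visible where the feature extraction, matching and iterative robust-fitting costs arise.

```python
import cv2
import numpy as np

def estimate_homography(img1, img2):
    """Estimate the planar homography mapping img1 to img2 (greyscale images)."""
    orb = cv2.ORB_create(1000)                        # feature extraction
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)               # feature matching
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # robust fit
    return H
```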
The use of omnidirectional vision meant that several planar surfaces were visible in the image, and by calculating the transformations of the surfaces between the current image and the target image, steering commands were generated to drive the robot between locations. However, methods of calculating image homographies rely on image feature extraction and matching, and incorporate computationally expensive statistical data fitting algorithms. This is unsuitable for a robot with limited computational power, and Sim and Dudek [109] showed that global appearance-based navigation is more robust to noise and occlusions than a local feature-based approach. Chapter 7 demonstrates that a full feature-based homography calculation is not necessary in order to navigate an indoor robot to a target image. An appearance-based approach to image-based homing that utilises the ceiling plane is developed, thereby avoiding assumptions about feature distributions (isotropic or constant distance).

3.3 Map-based Robot Navigation

Map-based localisation involves using some form of model of the environment to calculate the localisation of the robot based on the current perception. The model can be supplied beforehand, for example as a geometric CAD model of the environment [69], a simple list of landmark locations [15], or a trained artificial neural network [98]. If no map is supplied prior to navigation, then a map can be constructed as the robot moves. This is often referred to as SLAM (Simultaneous Localisation and Mapping), and is a very active research area in robotics and computer vision.

Maps can be split into two main categories: metric and topological, as depicted in Figure 3.1. Metric maps can be either sparse, like those created in visual SLAM algorithms, or dense, like those created in occupancy grid based algorithms.

Figure 3.1: Examples of (a) a dense metric map created by an occupancy grid algorithm with a laser range finder and (b) a topological map, in this case a rail network.

Dense metric maps like that shown in Figure 3.1a are computationally expensive to create and maintain, and often contain information that is not actually needed by the robot. They are therefore not appropriate for use in the algorithms developed in this thesis. On the other hand, the sparse maps produced by visual SLAM are more suitable for low-cost navigation.

Maps that do not provide any metric information about the environment, but rather only show the connections between places, are called topological maps. Topological maps are generally considered cheaper to create and maintain than metric maps [50], and are therefore of key interest in this work. They are usually sparse and contain few nodes, with each node representing a different location such as a different room [22] or sufficiently different perceptions [129]. Navigation using a topological map consists of finding a route through nodes in the map that gets to the desired location, and requires the ability to navigate between nodes. The two main approaches to achieve this are dead-reckoning with local metric information [99], and image-based homing.

Two key areas of interest in map-based navigation, visual SLAM and topological maps, are discussed further in the following two sections.
3.3.1 Visual Simultaneous Localisation and Mapping with Metric Maps

The goal of simultaneous localisation and mapping (SLAM) is to have a robot construct a consistent map of its initially unknown environment at the same time as keeping track of its position. No prior knowledge of the environment or robot trajectory is provided; instead the trajectory is calculated online using only the sensor measurements available. An introduction to the SLAM problem can be found in [30], in which it is pointed out that SLAM is a solved problem and a notable achievement of the robotics community in the past 15 years. The solution comes from incorporating the probabilistic uncertainties of landmark and robot positions, and relies on the correlation between measured landmarks. The first demonstration of the importance of the correlations between landmarks was in 1995 by Durrant-Whyte et al. [32], who demonstrated the convergence of the map when incorporating all uncertainties.

The SLAM solution is formulated in a probabilistic manner, with the current estimate of the robot position and map being maintained as a probability distribution. Critically, the probability distribution is maintained over the complete state, comprising both the robot position and the landmark positions. The two main approaches used for maintaining the state estimate are the extended Kalman filter (EKF) and the particle filter. The extended Kalman filter represents the probability distribution as Gaussian, allowing simple parameterisation as a mean and covariance matrix. This approach is computationally appealing, but can lead to inaccurate estimation when the probability is non-Gaussian. Particle filters attempt to overcome this by representing the distribution with a discrete set of samples referred to as particles, allowing multi-modal non-Gaussian distributions to be modelled. However, this approach generally requires more computational resources in order to maintain the set of samples [6]. The current literature is split between particle filter based algorithms such as FastSLAM [89], and EKF algorithms such as Divide and Conquer SLAM [101]. A comparison between an EKF and a particle filter is provided by [1].

Until recently the most common sensor used for SLAM was a laser range finder. The high cost and limited sensing capabilities of laser range finders have led many researchers to investigate the use of vision as a primary sensor for SLAM [102]. However, unlike laser range finders, standard monocular vision does not directly give range information. This can be overcome by using stereo vision, which allows range information to be calculated so that landmarks can be initialised directly into the map as in [2, 33]. Methods of SLAM using a single camera have also been developed, perhaps the best known of which is the pioneering work of Davison [26]. The proposed approach, named MonoSLAM, is a 3D extended Kalman filter based SLAM system in which three-dimensional points are mapped. The points are found as image corner features and matched using fixed region size template matching. New features are only initialised in the map after having been seen in a few consecutive frames so that their depth can be estimated. A similar system that replaces the Kalman filter with a particle filter was presented in [72].

Omnidirectional vision has received relatively little attention as a sensor for metric single camera SLAM, despite being ideally suited to the process of localisation.
Features remain in view in omnidirectional images for longer than in regular images, allowing increased robustness, less frequent loop closing and the ability to cope with fewer features in the world. Kim et al. [60] proposed a particle filter based SLAM system making use of coaxial stereo omnidirectional vision. They claim that omnidirectional stereo vision increases robustness and allows faster operation as fewer particles are required to model the probability distribution. In an earlier paper [61], the same authors used side-by-side omnidirectional stereo vision in an extended Kalman filter SLAM system. However, although limited results were presented, the advantages gained through the use of omnidirectional vision were not quantified. Furthermore, the applicability of the approaches to real robots is limited by the use of a restrictive motion model and a computationally demanding feature detector that operates offline.

Moving away from stereo vision, a method that combines the use of monocular omnidirectional vision with a laser range finder has been developed by Kim et al. [62]. In their approach omnidirectional vision is used to provide the bearing to vertical edge features, while a laser range finder provides both horizontal line features and the range to the vertical features. The use of a laser range finder assists feature initialisation and improves system accuracy, but makes the method only applicable to robots equipped with range finders.

A comparison between bundle adjustment optimisation based and Kalman filter based methods for SLAM in the context of omnidirectional vision was presented by Deans and Hebert [27]. Experiments were performed on both simulated and real data, and a method of combining the bundle adjustment with a Kalman filter was developed. Although the approach is interesting from a theoretical point of view, there is a lack of quantitative results and the method incurs a high computational cost.

The most notable implementation of monocular omnidirectional SLAM is that by Lemaire et al. [76]. An EKF based 3D SLAM system using monocular panoramic vision was demonstrated that maps Harris corner feature points and uses an appearance-based method for loop closing in large scale outdoor environments. The underlying SLAM algorithm was the same as that of Davison [26] and similar to that implemented in this thesis in Chapter 5.

3.3.2 Topological Map Navigation

There are many different methods of using visual information in a topological map. The two main issues that differentiate the methods are how the topological map nodes are represented and matched, and what the node creation criteria are.

The natural choice when using vision is for each map node to be represented by an image taken at a specific location. The node then corresponds to the location at which the image was taken. Winters [42, 124, 125] proposed the use of principal component analysis (PCA) to create visual appearance-based topological maps. In a map construction phase omnidirectional images were recorded at 50 cm intervals along a corridor and a lower dimensional eigenspace was calculated directly from the recorded images. A topological map was then constructed from the image sequence, with the projection of each image into the lower dimensional eigenspace stored as a map node.
Localisation was achieved by comparing the projection of the current image into the eigenspace with every node in the map, and using the nearest neighbour as the localisation of the robot. Jogan and Leonardis [56] also proposed the use of omnidirectional image eigenspaces for image-based localisation. Unlike Winters, they considered rotation invariance. It was observed that two physically close locations only have images that project to similar vectors in the eigenspace if the orientations of the images are the same. To overcome this, they introduced the idea of spinning images, in which all rotations of an omnidirectional image are considered.

An alternative node representation was proposed by Ishiguro and Tsuji [52]. They represented locations by the coefficients of a Fourier transformed panoramic image captured at the location. Although the coefficients provided a heavily compressed representation of the image, they are highly susceptible to local illumination changes [45]. More recently, the bag-of-words paradigm discussed in Chapter 2.4 has been employed for topological mapping by [4, 25, 34]. Nodes are represented by the visibility of image features, which Cummins and Newman [25] demonstrated results in a compact node representation that is both descriptive and robust to image noise. However, extracting image features and descriptors to represent local appearance can be computationally expensive and relies on the presence of features.

Neal and Labrosse [93] and Kosecka et al. [70] propose methods of selecting a canonical set of characteristic images from a dense sequence. The set of canonical images are then used as topological map nodes. Neal and Labrosse [93] achieve this using an artificial immune system algorithm that both selects key images from a set of panoramic images and simultaneously reconstructs their topology. A global appearance-based approach is used for matching images, in which the image is reduced in size to 50 × 50 pixels and converted to CIE L*a*b* colour space. The Euclidean distance between reduced images then provides a comparison metric both for the artificial immune system algorithm and for localising the robot in the map. Localisation is achieved by taking the node with the minimum Euclidean distance to the current image as the current location, and image-based homing is suggested for navigation between nodes.

Kosecka et al. [70] use Learning Vector Quantisation (LVQ) to determine characteristic images in a video sequence. A histogram of edge gradient orientations is calculated for every frame and used as a global appearance descriptor. After selecting regions to be mapped by splitting the image sequence based on temporal information, LVQ is used to determine a reduced set of representative histograms that can be used to describe the regions. Localisation is achieved by comparing the current image histogram to the representative histograms of mapped spatial regions, followed by a nearest neighbour approach to determine the most likely location. As only temporal information is used when constructing the topology, no cycles are determined and each location is assumed to have been visited only once.

The use of self-organisation for topological mapping has been developed by Pisokas and Nehmzow [103]. In their approach map learning is split into two phases: firstly, a self-organising map is created from sonar perceptions logged from a robot driven around an environment.
Each node in the SOM is then used to represent a topological map node. In the second phase the robot is driven around the same environment using four discrete actions (forward 50 cm, backward 50 cm, left turn 40°, right turn 40°), and metric links are recorded between the topological map nodes. This is achieved by linking the SOM node that is activated by the current sonar reading and the SOM node that was activated prior to taking the last action. The new link is labelled with the action taken. The topological map can then be used for navigation by finding a route between the current node and the target node, and executing the actions associated with the links in the route. Severe problems with perceptual aliasing were encountered because large physical spaces were represented by single map nodes, and therefore few successful navigations were demonstrated. Although the approach does not consider the use of visual information, it is closely related to the work presented in Chapter 6.

3.4 Summary

Robot navigation strategies can be classified as either mapless or map-based. Map-based strategies can be split into topological map-based navigation and metric map-based navigation, while mapless navigation can be split into ego-motion estimation methods and image-based homing methods. The following table summarises the reviewed work in each of these four areas (reference | advantages | disadvantages):

Mapless: ego-motion estimation
Lambrinos et al. [75] | Computationally efficient | Dedicated sensor, only outdoors
Usher et al. [122] | Computationally efficient | Dedicated sensor, only outdoors
Albert & Conell [3] | Single camera and low computation | Specific features required
Sturm & Visser [113] | No specific features needed | Computationally demanding
Labrosse [73] | Computationally efficient |
Matsuhisa et al. [83] | High accuracy | Computationally demanding
Mikusik et al. [85] | High accuracy | Computationally demanding
Scaramuzza et al. [108] | High accuracy | Full homography calculation
Scaramuzza et al. [106] | Computationally efficient | Vehicle motion heavily limited

Mapless: image-based homing
Cartwright & Collett [18] | Field inspiring | Unrealistic simulation
Röfer [104] | Computationally efficient | Compass required, little experimentation
Franz et al. [39] | Effective | Homing only over short range (25 cm)
Stürzl et al. [114] | Robust and computationally efficient | Homing only over short range (25 cm)
Hong et al. [48] | Inspired Franz et al. [39] | Compass required, little experimentation
Labrosse [74] | Long range operation (5 m) | Compass required, feature distribution important
Möller & Vardy [87] | Formally defined algorithm, effective | Computationally demanding, feature distribution important
Möller & Vardy [88] | Feature distribution constraints lifted | Computationally expensive
Goedemé et al. [44] | Robust and accurate | Computationally expensive 3D feature tracking
López-Nicolás et al. [79] | Robust and accurate | Computationally expensive homography calculation

Map-based: metric maps
Montemerlo et al. [89] | Multi-modal probability model | Computationally expensive
Paz et al. [101] | Increased efficiency | Laser scanner based
Davison [26] | Single monocular camera | Image features required, large maps are computationally expensive
Kwok et al. [72] | Multi-modal probability model, single camera | Computationally expensive
Kim et al. [60] | Accurate localisation | Requires precise omnidirectional stereo, offline operation
Kim et al. [61] | Accurate localisation | Requires precise omnidirectional stereo, offline operation
Kim et al. [62] | Monocular omnidirectional vision | Requires laser range finder
Deans & Hebert [27] | High accuracy | Computationally expensive
Lemaire et al. [76] | Davison [26] approach improved with omnidirectional vision | Improvement not quantified

Map-based: topological maps
Winters [42, 124, 125] | Low disk storage space for map | Not rotation invariant, computationally expensive
Jogan & Leonardis [56] | Rotation invariant perception | Computationally expensive
Ishiguro & Tsuji [52] | Highly compressed perception, efficient | Limited accuracy, highly illumination sensitive
Cummins & Newman [25] | Accurate operation over large environments (10 km) | Computationally expensive, offline
Neal & Labrosse [93] | No image features required | Robot navigation not considered
Kosecka et al. [70] | No image features required | Only temporal information for mapping (no loops)
Pisokas & Nehmzow [103] | No image features, computationally efficient | Uses only a limiting sonar perception

Each of the four areas is considered in this thesis. The most important related works in these areas are:

Mapless navigation:

1. Dead reckoning: (see Section 3.2.1) Dead reckoning using wheel encoders is fast and efficient, but very prone to drift from erroneous ego-motion estimates stemming from wheel slippage. Visual odometry aims to eradicate this problem by supplementing or replacing the wheel encoders with a camera. Doing so efficiently is important if visual odometry is to be of use on economical hardware. The Labrosse visual compass [73] provides the best visual compass information in terms of accuracy and efficiency. In Chapter 4 two efficient visual compass approaches inspired by [73] are developed with the aim of being even more accurate, and omnidirectional stereo is considered for finding the robot translation.

2. Image-based homing: (see Section 3.2.2) As a form of mapless navigation that does not use metric information, the problems of drift error encountered in dead reckoning are avoided. López-Nicolás et al. [79] demonstrated that image homographies allow accurate visual homing. However, using them is computationally expensive, requiring feature extraction and statistical data fitting algorithms. Labrosse [74] presented an appearance-based homing strategy that uses the raw image directly, resulting in an efficient algorithm suitable for robots with low computational power. In Chapter 7 a new image-based homing algorithm is presented that benefits from the use of image homographies as in [79], while employing efficient appearance-based navigation strategies as in [74].

Map-based navigation:

3. Metric simultaneous localisation and mapping: (see Section 3.3.1) Davison's MonoSLAM approach [26] was the catalyst for intense interest in monocular vision based SLAM, and is the topic of interest in Chapter 5. Existing implementations of monocular omnidirectional vision based SLAM have not quantitatively assessed the benefits gained from omnidirectional sensing. In Chapter 5 this is provided through an implementation of monocular camera SLAM.

4. Topological map navigation: (see Section 3.3.2) Topological maps are generally considered cheaper to create and maintain than metric maps [50], and are therefore of key interest in this work. In Chapter 6 a topological mapping and navigation algorithm is developed that is closely related to the SOM based approach by Pisokas and Nehmzow [103].
The important difference is that rather than limiting the number of topological map nodes to be equal to the number of nodes in the SOM, in the algorithm presented in Chapter 6 the SOM is used to efficiently index the topological map nodes. This permits the number of map nodes to be as large as required and avoids the problems of perceptual aliasing encountered in [103]. Furthermore, unlike [103], the approach detailed in Chapter 6 uses a camera sensor rather than a sonar sensor and works in a continuous action space.

This thesis now continues by considering the use of omnidirectional stereo vision for robot ego-motion estimation.

4 Ego-Motion Estimation with Binocular Omnidirectional Vision

4.1 Introduction

As discussed in Chapter 3.2, calculating the ego-motion of a robot between frames allows a metric localisation to be formed without the need for a map. Not requiring a map is desirable when considering low-cost robots, as no storage or look-up is required, and no map maintenance or construction is necessary. In this chapter the application of binocular omnidirectional vision for efficient ego-motion estimation with low-cost robots is considered.

Using binocular vision allows the distance to objects to be calculated. If the robot's rotation is known then the distance to a single image feature tracked between consecutive frames provides enough information to calculate the translation between the frames. A simple way of achieving omnidirectional stereo is to use two co-axial conical mirrors, as presented by Spacek [112] and summarised in this thesis in Chapter 2.3. However, the formulation provided by Spacek does not work in real-world situations where perfect alignment between mirror and camera cannot be assumed. A contribution of this thesis is the full derivation of formulae for omnidirectional stereo vision using two arbitrarily placed conical mirrors.

In this chapter the estimation of the robot's ego-motion is considered in two parts: firstly the calculation of the rotation, and secondly the calculation of the translation. Rotation calculation is carried out using a single monocular omnidirectional camera, providing another contribution of this thesis: the proposal of two new methods of calculating a robot's rotation. This chapter is organised as follows: firstly the rotation estimation methods are detailed, and experiments that were carried out using simulated and real images are discussed. Binocular omnidirectional stereo vision is then considered for calculating the robot's translation, and formulae for achieving stereo with arbitrary camera and mirror placement are derived. Finally the stereo formulation is discussed in the context of low-cost robots.

4.2 Visual Compass

The Labrosse visual compass approach was introduced in Chapter 3. Two alternative approaches to calculating the rotation ω of the robot between two video frames f1 and f2 using omnidirectional vision are (a) to calculate the optic flow in the front and back regions of the image directly from two edge points, and (b) to calculate the optic flows of 1D representations of the front and back regions. As pointed out in [96], the front and back regions represent the expansion and contraction fields of an omnidirectional image, and are less affected by translation. Only looking at the flow in these regions, as shown in Figure 4.1, helps to separate rotation from translation.

Figure 4.1: The image sectors used to calculate the optic flow.
The expansion and contraction fields exhibit less translation-induced optic flow, making them more suitable for rotation estimation.

The two rotation estimation methods developed calculate the average optic flow in the expansion and contraction fields. Each method calculates this flow in a different way.

4.2.1 Method 1: Rotation estimation from optic flow calculated using point pairs

Horn and Schunck [49] compute optic flow (Equation 4.2) from the assumption that the grey level values of image points do not change as the points move across an image:

f(x, y, t) = f(x + dx, y + dy, t + dt)    (4.1)

where f(x, y, t) is the temporal image function giving the grey level value of pixel (x, y) at time t, and (dx, dy) is how far the image point has moved in the image over a time interval dt. Using only the first order terms of the Taylor series gives the optic flow equation:

−∂f/∂t ≃ v · g    (4.2)

where v = (vx, vy) = (dx/dt, dy/dt) is the optic flow vector to be found, g = ∇f(x, y) = (df/dx, df/dy) is the first derivative of the image, found by convolution with an edge detection kernel, and · denotes the scalar product of two vectors. This is a single equation in two unknowns, and therefore cannot be solved directly, leading to the development of many iterative methods for calculating the flow [9]. However, by dividing both sides of Equation 4.2 by |g| (the magnitude of g), the magnitude vg of the flow in the direction of the unit vector ĝ can be calculated:

vg = v · ĝ ≃ −(∂f/∂t) / |g|.    (4.3)

The overall optic flow can be calculated directly (non-iteratively) by combining the flow in the edge gradient directions of two adjacent but differently oriented edge pixels in the image. (This assumes that the two adjacent edge pixels are moving in the image in the same way, which is a reasonable assumption provided that the edge pixels are sufficiently close together.) Let two edge elements (edgels) with different orientations have unit gradient vectors û = (ux, uy) and ŵ = (wx, wy) respectively. Such edgels are typically found on opposite sides of a point c of high curvature, as shown in Figure 4.2. Substituting these known unit gradients for ĝ into Equation 4.3 gives:

vu = v · û,    vw = v · ŵ    (4.4)

where vu and vw are the magnitudes of the optic flow in the direction of the edge gradients, obtained by evaluating the right hand side of Equation 4.3. By applying the flow magnitudes in the û and ŵ directions at c, the optic flow vector v at point c can be estimated directly by solving Equations 4.4 for vx and vy:

vx = (vu wy − vw uy) / (ux wy − uy wx),    vy = (vw ux − vu wx) / (ux wy − uy wx)    (4.5)

provided that ux wy − uy wx ≠ 0. This condition will always be met provided that the two edge gradients û and ŵ are not in the same or exactly opposite directions.

Figure 4.2: (a) The optic flow at point c can be found by combining the optic flow in the direction of the image gradients at two points either side of point c. (b) The calculation of the θ component of the optic flow vector.

The robot's rotation ω can be estimated by averaging the angular flow components in the front and back regions of the omnidirectional image:

ω = mean(vθ)    (4.6)

where vθ is the angle between the line from the centre of the image to c and the line from the centre of the image to c + (vx, vy)ᵀ, as shown in Figure 4.2. Performing this conversion, rather than unwarping the front and back regions of the image prior to calculating the flow, avoids noise that would be introduced into the edge detection process by the image interpolations.
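To make the two-point construction concrete, the following Python sketch combines the normal flows of two adjacent edgels into a full flow vector and extracts its angular component about the image centre, following Equations 4.4 to 4.6. It is a minimal illustration of the formulae rather than the exact implementation used in the experiments.

    import numpy as np

    def flow_from_point_pair(u_hat, w_hat, v_u, v_w):
        # u_hat, w_hat: unit gradient vectors of the two adjacent edge pixels.
        # v_u, v_w: flow magnitudes along those gradients, from Equation 4.3.
        ux, uy = u_hat
        wx, wy = w_hat
        det = ux * wy - uy * wx
        if abs(det) < 1e-9:
            return None   # gradients (anti)parallel: Equation 4.5 is degenerate
        vx = (v_u * wy - v_w * uy) / det
        vy = (v_w * ux - v_u * wx) / det
        return np.array([vx, vy])

    def angular_component(c, v, centre):
        # Theta component of the flow at point c, measured about the image centre.
        a0 = np.arctan2(c[1] - centre[1], c[0] - centre[0])
        a1 = np.arctan2(c[1] + v[1] - centre[1], c[0] + v[0] - centre[0])
        return np.arctan2(np.sin(a1 - a0), np.cos(a1 - a0))   # wrapped difference

The rotation estimate ω is then the mean of the angular components collected over all point pairs in the front and back regions, as in Equation 4.6.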
The two-point optic flow method of rotation estimation is summarised in Algorithm 4.1. The parameters of the algorithm are:

• A gradient magnitude threshold (τ1). This dictates how strong the edge pixels in the front and back regions have to be in order to be used to calculate the optic flow. Experimentation showed that the algorithm is not sensitive to this value; in the experiments reported here the threshold is set at 5, which proved sufficient to filter out very weak gradient vectors.

• The front and back region size. The region size dictates how much of the image is used for rotation estimation. The wider the region, the more the rotation estimate will be corrupted by translation. Labrosse [73] reported a region size of 60° to be the best compromise between losing information and being influenced by translation. The same region size is used in all experiments here.

• The size of the window in which to look for adjacent gradient vectors û and ŵ (s × s). This dictates how far apart the adjacent gradient vectors are permitted to be. Points that are far apart are less likely to be part of the same object, meaning that the overall optic flow cannot be computed from their individual flows. However, restricting the search to very close point pairs will reduce the number of candidate pairs. A window size of 5 × 5 was found to provide the best results, and is therefore used in the experiments reported here.

• The minimum angle between the adjacent gradient vectors (angle_min). In order to calculate the optic flow from two adjacent edge pixels, the gradient vectors at the adjacent pixels must be sufficiently different. A value of 40° was used in all experiments reported here. This proved to provide the best compromise between not finding enough adjacent edges and the increased sensitivity to noise when using edge pixels with too similar orientations.

Algorithm 4.1 Calculate the rotation of the robot between two frames using the optic flow in the front and back regions, calculated using the two-point optic flow method outlined.
Input: Frame 1 (f1), Frame 2 (f2)
Parameters: Gradient magnitude threshold τ1. Front and back region size. Adjacent pair window size s × s. Minimum angle between the gradient vectors of adjacent point pairs, angle_min.
Output: The rotation of the robot between f1 and f2, ω.
1: Extract the front and back regions from image f1.
2: Perform first derivative edge detection over the front and back regions of image f1, returning the partial derivatives ∇f1 = (fx, fy).
3: Compute the gradient magnitude of each front and back region pixel in f1 as G(pixel) = |∇f1(pixel)| = √(fx² + fy²).
4: for all pixels u in the front and back regions do
5:   if G(u) > τ1 then
6:     for all pixels w in an s × s window around u do
7:       if G(w) > τ1 and the angle between u and w > angle_min then
8:         Add (u, w, c) to the set of flow points P, where c = (u + w)/2.
9:       end if
10:     end for
11:   end if
12: end for
13: if P contains no members then
14:   return 0 {FAIL}
15: end if
16: totalflow ← 0
17: for all (u, w, c) in P do
18:   Calculate the flow in the direction of the gradient for pixels u and w using Equation 4.4.
19:   Combine the gradient direction flows of u and w to find the flow at point c using Equation 4.5.
20:   totalflow ← totalflow + θ component of flow at point c
21: end for
22: ω ← totalflow ÷ number of members in P
23: return ω
4.2.2 Method 2: Rotation estimation from optic flow calculated using 1D images

An alternative to calculating the optic flow of the image directly is to reduce the image to a one-dimensional representation, and to calculate the flow of the reduced image. The one-dimensional representation can be found by averaging the radial lines of the image in the same way as in Briggs et al. [13], as shown in Figure 4.3. Again only the front and back regions are used.

Figure 4.3: (a) The regions of the image that are converted into 1D representations. (b) Example of two 1D representations at time t1 and t2, and the disparity between them.

The front and back 1D representations of a W × H image I(w, h) are calculated as:

F(i) = Σ_{r=0}^{H/2} I(W/2 + r·cos(i), H/2 + r·sin(i)),    i = −s/2 ... s/2    (4.7)

B(i) = Σ_{r=0}^{H/2} I(W/2 + r·cos(i), H/2 + r·sin(i)),    i = −s/2 − 180 ... s/2 − 180    (4.8)

where s is the region width in degrees, r is the distance along the radial line, F is the front region 1D representation, B is the back region 1D representation and the values of the image I are interpolated using bilinear interpolation. Alternative interpolation strategies could also be employed, for example nearest-neighbour or bicubic, but bilinear interpolation was used here as a compromise between the higher computational cost and accuracy of bicubic interpolation and the lower computational cost and accuracy of nearest-neighbour interpolation.

When the 1D representations have been found, the optic flow between them (shown in Figure 4.3b) can be computed efficiently by using the single scale 1D case of the Lucas-Kanade iterative image registration technique [80]:

h_0 = 0    (4.9)

h_{k+1} = h_k + [ Σ_x w(x) F′(x + h_k) (G(x) − F(x + h_k)) ] / [ Σ_x w(x) F′(x + h_k)² ]    (4.10)

where h_k is the optic flow estimate at iteration k, F(x) and G(x) are the 1D representations of the previous and current images (the front and back regions are registered separately), w(x) = 1/|G′(x) − F′(x)|, and F′ and G′ are the first derivatives approximated by F′(x) ≈ F(x + 1) − F(x) and G′(x) ≈ G(x + 1) − G(x). The iteration continues until ∆h is less than 0.001, i.e. the estimate converges.

An alternative strategy for finding the correspondence between the 1D image representations is to use the Euclidean distance, as in [73]. However, the shift h between images can be very small. Using the Lucas-Kanade image registration technique allows the subpixel shift to be found directly, with the interpolation built into the process. Using the Euclidean distance method presented in [73] requires additional computation if subpixel accuracy is required.

As in Method 1 (Section 4.2.1), the average of the front 1D representation flow and the back 1D representation flow gives ω. The full procedure for the calculation of ω is detailed in Algorithm 4.2. As with Algorithm 4.1 a region size needs to be decided in advance. In all experiments reported here the region size was the same as for experiments with Algorithm 4.1 (60°).

Algorithm 4.2 Calculate the rotation of the robot between two frames by reducing the front and back regions to 1D representations and calculating the optic flow in 1D.
Input: Frame 1 (f1), Frame 2 (f2), dimensions W × H
Parameters: Front and back region width, w degrees. w must be an even integer.
Output: The rotation of the robot between f1 and f2, ω.
1: F ← zero vector size w {The average radial line image for image f1}
2: G ← zero vector size w {The average radial line image for image f2}
3: for i = −w/2 to w/2 do
4:   for r = 1 to H/2 do
5:     F(i) ← F(i) + f1(W/2 + r·cos(i), H/2 + r·sin(i))
6:     G(i) ← G(i) + f2(W/2 + r·cos(i), H/2 + r·sin(i))
7:   end for
8: end for
9: {Compute the disparity h between the signatures in the front and back regions by using the Newton-Raphson-like iterative image registration technique proposed by Lucas and Kanade in [80]}
10: h ← 0
11: F′ ← zero vector size w − 1 {The first derivative of F}
12: G′ ← zero vector size w − 1 {The first derivative of G}
13: W ← zero vector size w − 1 {The weighting vector}
14: for i = 0 to w − 1 do
15:   F′(i) ← F(i + 1) − F(i)
16:   G′(i) ← G(i + 1) − G(i)
17:   W(i) ← 1/|G′(i) − F′(i)|
18: end for
19: repeat
20:   ∆h ← [ Σ_x W(x) F′(x + h) (G(x) − F(x + h)) ] / [ Σ_x W(x) F′(x + h)² ]
21:   h ← h + ∆h
22: until ∆h < 0.001
23: return h

4.3 Limitations of the Methods

Before detailing experiments with the visual compass approaches, it is important to point out the limitations that can be expected of the approaches. These limitations are inherent to many vision systems, and similar to biological vision limitations.

Both methods require visible features in the environment. Although the 1D case does not explicitly use features, an image of a single non-varying intensity will produce the same 1D representation regardless of rotation and the method will fail. The 2D optic flow method depends even more heavily on the presence of edge features. For example, if there is a smoothly varying gradient in the image then the 1D method will succeed, whereas the 2D approach will be unable to calculate the optic flow.

A further limitation of the 1D approach is that in environments of uniformly distributed colour, averaging the radial lines in the image will produce non-varying 1D representations which cannot be used to find rotation. However, this limitation is pathological and unlikely to occur in realistic scenarios. Non-uniform varying intensity within an image can almost always be assumed, as differing objects or non-uniform lighting will cause varying intensity.

As with the Labrosse approach [73], self-repeating chess-board type environments could potentially cause problems with the 1D approach. When the temporally separated 1D images are registered using Equation 4.10 the rotation estimate will be found as the closest signature shift to 0. However, aliasing will occur, meaning that the closest rotation to 0 may be the incorrect rotation. Again this limitation is pathological and unlikely to occur.

An underlying assumption of both methods, and indeed of many of the appearance-based visual navigation methods discussed in Chapter 3 including the Labrosse approach, is that of an isotropic distribution of features in the environment. This assumption is used to allow the separation of rotation from translation by considering only the image movement in the front and back regions of the image. However, if this assumption is false then the translation-induced optic flow in the left and right sides of the image will not balance each other out. The rotation estimation will therefore be corrupted.

4.4 Experiments and Results

In this section the two rotation estimation algorithms proposed in Section 4.2 are quantitatively evaluated. Each method was tested with both simulated and real images, and compared to the Labrosse visual compass introduced in Chapter 3.
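Before turning to the experiments, the 1D registration step at the core of Method 2 (Equations 4.9 and 4.10) is illustrated by the following Python sketch. It is a simplified stand-in that uses linear interpolation for the shifted samples and ignores the failure cases handled in the full system.

    import numpy as np

    def register_1d(F, G, max_iter=50, tol=1e-3):
        # Estimate the sub-pixel shift h between two 1D signatures F (previous)
        # and G (current) with the iterative scheme of Equations 4.9-4.10.
        F = np.asarray(F, float)
        G = np.asarray(G, float)
        dF = F[1:] - F[:-1]                      # F'(x) ~ F(x+1) - F(x)
        dG = G[1:] - G[:-1]
        w = 1.0 / (np.abs(dG - dF) + 1e-6)       # weighting w(x)
        x = np.arange(len(dF))
        h = 0.0
        for _ in range(max_iter):
            Fs = np.interp(x + h, np.arange(len(F)), F)   # F(x + h)
            dFs = np.interp(x + h, x, dF)                 # F'(x + h)
            den = np.sum(w * dFs ** 2)
            if den == 0:
                break
            dh = np.sum(w * dFs * (G[:-1] - Fs)) / den
            h += dh
            if abs(dh) < tol:                    # same convergence test as Algorithm 4.2
                break
        return h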
4.4.1 Simulated image experiments

The omnidirectional image simulation method proposed by Burbridge and Spacek [17] was used to generate 250 artificial images at 20 frames per second. Using this approach, a ray tracing package called POV-Ray was used to generate ray traced artificial images of an environment reflected in a conical mirror. The position of the simulated camera was changed by linear and angular velocity commands, which were executed using the kinematic equations of a differential drive robot, thus simulating a robot that is equipped with omnidirectional vision. The exact ground truth position of the robot in the simulated environment was available from the simulator.

Figure 4.4: The output of the omnidirectional vision simulator used. (a) The simulated environment. (b) An example of a simulated omnidirectional image.

The simulated robot was moved with a random rotational velocity between −20° per second and 20° per second, and a fixed linear velocity of 0.5 m/sec. The environment model is shown in Figure 4.4a, and an example of the omnidirectional image generated is shown in Figure 4.4b. The mean absolute rotation between frames was 0.0051 radians.

A graph of the computed heading of the robot using the two estimation algorithms described in Section 4.2 and the Labrosse visual compass [73] (discussed in Chapter 3.2.1.1) is shown in Figure 4.5a. Although all methods show a drift error, the Pearson product-moment correlation coefficients (PMCC) between the actual change in heading (ω) and the estimated change in heading (ω̃) for each method show very strong correlations. The error in the estimated change in heading for each method (ωe = ω − ω̃) followed an approximately normal distribution, characterised by the means ω̄e and standard deviations σ shown in Table 4.1.

Figure 4.5: (a) Accumulated heading estimates for the two proposed methods and the Labrosse method alongside the ground truth over 250 simulated frames. (b) The error in the accumulated heading estimates (the drift error) for the two proposed methods and the Labrosse method over the 250 simulated frames.

Method | Mean error ω̄e (radians) | Standard deviation σ | PMCC between ω and ω̃
2-point Optic Flow | 3.50 × 10⁻⁴ | 0.0021 | 0.93
1D Image Flow | −3.23 × 10⁻⁴ | 0.0026 | 0.87
Labrosse | −1.55 × 10⁻⁴ | 0.0016 | 0.96

Table 4.1: Performance statistics for the two proposed methods and the Labrosse method for 250 simulated image pairs.

Figure 4.5b shows a plot of the absolute heading error against frame number for each visual compass approach. As pointed out by Labrosse in [73], the slope of the robust linear regression of the error curve for each method provides a good approximate measure of the drift speed of the method. The robust linear regression of each method was found using the iteratively re-weighted least squares routine provided by the MATLAB robustfit function, and the slopes were found to be as follows:

Method | Drift error (radians / second)
2-point Optic Flow | 0.011
1D Image Flow | 0.014
Labrosse | 0.0061

Table 4.2: Drift rate for each method when applied with artificial images.
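The drift rates above were obtained with MATLAB's robustfit. For reference, the following Python sketch shows one possible iteratively re-weighted least squares line fit with Huber-style weights; it is a rough stand-in rather than a reimplementation of robustfit, so the slopes it produces may differ slightly.

    import numpy as np

    def robust_drift_rate(t, err, iters=20):
        # Fit err ~ slope * t + intercept robustly; the slope approximates the
        # drift rate (radians per second) of a visual compass error curve.
        t = np.asarray(t, float)
        err = np.asarray(err, float)
        A = np.column_stack([t, np.ones_like(t)])
        w = np.ones_like(t)
        slope, intercept = 0.0, 0.0
        for _ in range(iters):
            sw = np.sqrt(w)[:, None]
            slope, intercept = np.linalg.lstsq(A * sw, err * np.sqrt(w), rcond=None)[0]
            r = err - (slope * t + intercept)
            s = 1.4826 * np.median(np.abs(r)) + 1e-12            # robust scale (MAD)
            w = 1.0 / np.maximum(np.abs(r) / (1.345 * s), 1.0)   # Huber-style weights
        return slope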
4.4.2 Real image experiments

A MetraLabs Scitos G5 mobile robot equipped with a conical mirror and a webcam (as shown in Figure 4.6a) was driven around the robot arena at the University of Essex at a fixed linear velocity of 0.3 m/s and varying rotational velocity. The ground truth pose of the robot was provided by a Vicon tracking system, and was logged along with 250 frames like that shown in Figure 4.6b. Frames were logged at approximately 20 frames per second. The mean absolute rotation between frames was 0.0093 radians.

Figure 4.6: (a) The robot equipped with omnidirectional cameras. (b) An example of a real omnidirectional image captured by the robot.

A graph of the accumulated calculated heading of the robot using the two estimation algorithms described in Section 4.2 and the Labrosse visual compass is shown in Figure 4.7a. As with the simulated image experiment, all methods show a drift error, and the Pearson product-moment correlation coefficients (PMCC) between the actual change in heading (ω) and the estimated change in heading (ω̃) for the Labrosse and the signature flow methods again show very strong correlations. The PMCC for the two-point optic flow method shows a weaker correlation, due in part to a higher failure rate caused by a lack of clear point pairs. For the initial part of the trajectory, between frames 1 and 70, there was texture available in the front and back regions of the image. However, from frame 70 onwards the regions contained little texture and the two-point approach for calculating optic flow was ineffective.

For each method the error in the estimated change in heading (ωe = ω − ω̃) followed an approximately normal distribution, characterised by the means ω̄e and standard deviations σ shown in Table 4.3.

Figure 4.7: (a) Accumulated heading estimates for the two proposed methods and the Labrosse method alongside ground truth over 250 real frames. (b) The error in the accumulated heading estimates (the drift error) for the two proposed methods and the Labrosse method over the 250 real frames.

Method | Mean error ω̄e (radians) | Standard deviation σ | PMCC between ω and ω̃
2-point Optic Flow | −4.1 × 10⁻³ | 0.053 | 0.24
1D Image Flow | 1.1 × 10⁻³ | 0.0082 | 0.89
Labrosse | 1.2 × 10⁻³ | 0.0058 | 0.94

Table 4.3: Performance statistics for the two proposed methods and the Labrosse method for 250 real image pairs.

Figure 4.7b shows a plot of the absolute heading error against frame number for each visual compass approach. Again the robust linear regression of each method was found using the MATLAB robustfit routine. The drift speeds were found to be as follows:

Method | Drift error (radians / second)
2-point Optic Flow | 0.064
1D Image Flow | 0.031
Labrosse | 0.022

Table 4.4: Drift rate for each method when applied with real images.

4.4.3 Discussion

It is important to note that only a partial implementation of the Labrosse method was used here. In particular, in order to find the rotation between two consecutive frames, where the rotation is very small, a high level of interpolation has been used.
A complete implementation of the Labrosse visual compass includes criteria for deciding which frames to calculate rotation between. For this reason, the Labrosse method did not perform as well in this work as that published in [73]. However, this partial implementation allows conclusions to be drawn from comparisons between the methods.

Of the two new algorithms proposed for calculating the rotation of the robot, the 1D image flow method (Method 2) produced the best results when applied to real images and is comparable to the method proposed by Labrosse [73]. The two-point optic flow method (Method 1) was ineffective in real world situations, despite having shown promising results in simulation. This is because of the strong limitation of explicitly requiring image features from which to calculate the optic flow. The 1D approach does not require such explicit image features. This makes it more robust, as shown in the experiment with real images where the 2D method began to fail from frame 70 onwards whereas the 1D approach succeeded. This suggests that although the Labrosse method performed best, future research could investigate the use of 1D images for finding the rotation between temporally more distant frames, as in a complete implementation of the Labrosse approach.

The motion of a robot in a 20th of a second (i.e. between consecutive frames) is very small, and calculating it as attempted by the two newly proposed rotation estimation algorithms is prone to error. Even a small error in the calculation, for example the error resulting from noisy measurements, can be vastly larger than the actual motion. For this reason it can be concluded that finding the rotation of the robot is best achieved at a reduced frame rate.

4.5 Calculating the Translation of the Robot

The previous section showed that the rotation between frames can be found by applying a visual compass algorithm. If the range to a single point feature can be measured, then a complete ego-motion estimate can be formed by finding the translation of the robot between those frames. This becomes an easily tractable application of trigonometry:

1. At time step t, choose an image feature in the lower catadioptric image.

2. Calculate the range r0 and the bearing angle θ0 to the point by finding its location in the upper mirror using coaxial stereo as described in Chapter 2.3.

3. At time step t + 1 calculate the new range to the point r1 and the new bearing angle θ1 (see Figure 4.8).

4. Compute the translation (tx, ty) of the robot (a small numerical sketch of this step is given at the end of this section):

tx = x0 − cos(ω) x1 + sin(ω) y1    (4.11)
ty = y0 − sin(ω) x1 − cos(ω) y1    (4.12)

where

x0 = r0·cos(θ0)    (4.13)
y0 = r0·sin(θ0)    (4.14)
x1 = r1·cos(θ1)    (4.15)
y1 = r1·sin(θ1)    (4.16)

and ω is the rotation between the frames.

Figure 4.8: The translation of a robot between two frames can be calculated from the distance to a single landmark if the rotation ω between the frames is known.

Omnidirectional stereo vision could be used to find the range information, and the conceptual simplicity of the formulae proposed by Spacek [112] (see Chapter 2.3) suggests application on a low-cost robot may be feasible.
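A minimal Python sketch of the translation computation in step 4 (Equations 4.11 to 4.16) is given below; it assumes the range and bearing measurements and the rotation ω are already available, for example from the stereo formulation and the visual compass respectively.

    import numpy as np

    def translation_from_feature(r0, theta0, r1, theta1, omega):
        # Translation (tx, ty) of the robot between two frames, given the range and
        # bearing to a single tracked feature at both times and the rotation omega.
        x0, y0 = r0 * np.cos(theta0), r0 * np.sin(theta0)
        x1, y1 = r1 * np.cos(theta1), r1 * np.sin(theta1)
        tx = x0 - np.cos(omega) * x1 + np.sin(omega) * y1    # Equation 4.11
        ty = y0 - np.sin(omega) * x1 - np.cos(omega) * y1    # Equation 4.12
        return tx, ty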
However, although the omnidirectional stereo vision with conical mirrors proposed by Spacek [112] has been shown to work successfully in simulation (see Burbridge and Spacek [17]), perfect mirror and camera alignments are essential for it to work reliably in real-world applications. When using the real mirrors and cameras shown in Figure 4.6a, poor alignment between the cones and cameras prevents stereo reconstruction. In the next section the image projections through a conical mirror are explored in greater detail, allowing stereopsis to be achieved regardless of the alignment of the cameras and cones. Previously conical mirrors could only be used for geometric reconstructions when perfect alignment could be guaranteed.

4.6 Omnidirectional Binocular Vision Without Alignment

Taking into account the true alignment of the mirrors and cameras allows the derivation of a more general set of formulae for catadioptric omnidirectional projections. The back projection of a pixel provides the ray line in three-dimensional space on which the point imaged by the pixel must lie. By imaging a landmark from two viewpoints, the two back projected rays can be intersected to reconstruct the landmark's 3D coordinates. Forward projection formulae for projection through a non-aligned cone have been proposed by Cauchois et al. [20, 21]. However, the formulae proposed will only work if the vertex of the cone, the imaged world point, the camera and the normal to the cone are all in the same plane. New forward and backward projection formulae for catadioptric omnidirectional vision using an arbitrarily placed cone mirror and perspective camera are presented here.

4.6.1 Backward Projection Formulae

The back projected ray line is described in vector form as (D, d̂), where D is the start point of the ray and d̂ is a unit vector giving the direction. Tracing the ray backwards from the camera can be split into three stages:

• determining the direction of the ray as it leaves the camera;
• determining the point on the surface of the mirror at which it was reflected;
• determining the direction of the ray from the point on the mirror to the imaged point.

These stages are solved as follows.

Ray leaving the camera

The direction of the reversed ray leaving the camera can be specified with the standard pinhole camera model using the intrinsic parameters of the camera, namely the focal length f, the scale factor s and the coordinates of the principal point (px, py). In a camera centred coordinate system with the camera looking along the negative z axis and the horizontal image direction parallel to the x axis, as shown in Figure 4.9, the direction of a ray leaving the camera associated with an image location (u, v) is

d = ( px − u,  py − v,  f/s )ᵀ.    (4.17)

Figure 4.9: The perspective projection camera model.

Converting the ray from camera centred coordinates to cone centred coordinates, as shown in Figure 4.10, gives the new ray (C, ĉ), where

ĉ = R d̂.    (4.18)

R is a rotation matrix constructed from the Euler angles for the rotation of the camera, d̂ is the normalised form of the direction ray d, and C is the location of the camera in cone centred coordinates.

Figure 4.10: The three coordinate systems: world, cone and camera. Each coordinate system is unique, with a rotation and translation between systems.
Intersection with the cone

The point of intersection I between the ray and the cone can be found algebraically by equating the formula for the cone with the formula for the ray:

I = ( Ix,  Iy,  Iz )ᵀ = C + p ĉ,    (4.19)

Iz + (h/r) √(Ix² + Iy²) = 0.    (4.20)

Substituting Equation (4.19) into Equation (4.20) and solving for p gives two solutions:

p = −( (−Ax − Ay) a² + Az − (2 Ax Ay a² − 2 Ax Az − 2 Ay Az + ĉx² Cz² − ĉx² a² Cy² − ĉy² a² Cx² + ĉy² Cz² + ĉz² Cx² + ĉz² Cy²)^(1/2) a ) / (−ĉx² a² − ĉy² a² + ĉz²)    (4.21)

and

p = −( (−Ax − Ay) a² + Az + (2 Ax Ay a² − 2 Ax Az − 2 Ay Az + ĉx² Cz² − ĉx² a² Cy² − ĉy² a² Cx² + ĉy² Cz² + ĉz² Cx² + ĉz² Cy²)^(1/2) a ) / (−ĉx² a² − ĉy² a² + ĉz²)    (4.22)

where a = h/r is the ratio between the height h and radius r of the cone, Ax = Cx ĉx, Ay = Cy ĉy and Az = Cz ĉz. The correct value of p is selected as the smallest of the solutions that satisfy Cz + p ĉz < 0.

Direction of the reflected ray

The back projected ray intersects the mirror at I with a direction vector ĉ. Assuming that the mirror is a perfectly specular reflective surface and that there are no manufacturing imperfections in the shape of the cone, the reflected ray direction can then be found as

r̂ = ĉ − 2 (ĉ · n̂) n̂    (4.23)

where n̂ is the normalised vector normal to the cone at I, found by differentiating the formula for the cone, Equation (4.20), at I:

n = ∇f = ( ∂f/∂Ix,  ∂f/∂Iy,  ∂f/∂Iz )ᵀ = ( h Ix / ((Ix² + Iy²)^(1/2) r),  h Iy / ((Ix² + Iy²)^(1/2) r),  1 )ᵀ.    (4.24)

Scaling by (Ix² + Iy²)^(1/2) and observing from Equation (4.20) that h/r is the same as −Iz (Ix² + Iy²)^(−1/2) allows Equation (4.24) to be simplified, up to a scale factor that does not affect the direction of the normal, to:

n = ( Ix Iz,  Iy Iz,  −Ix² − Iy² )ᵀ.    (4.25)

The resulting ray line that the point imaged at (u, v) must lie on is (I, r̂).
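The reflection step of the backward projection can be written compactly in code. The following Python sketch evaluates Equations 4.23 and 4.25 for a ray striking the cone at point I; it is an illustrative fragment only and assumes I is not the cone apex (where the normal is undefined).

    import numpy as np

    def reflect_at_cone(c_hat, I):
        # c_hat: incoming unit ray direction in cone-centred coordinates.
        # I: intersection point with the cone surface, from Equations 4.19-4.22.
        Ix, Iy, Iz = I
        n = np.array([Ix * Iz, Iy * Iz, -(Ix ** 2 + Iy ** 2)])   # Equation 4.25
        n_hat = n / np.linalg.norm(n)
        return c_hat - 2.0 * np.dot(c_hat, n_hat) * n_hat        # Equation 4.23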
4.6.2 Forward Projection Formulae

The aim of the forward projection model is to map a point from world coordinates P = (x y z)ᵀ to the pixel coordinates (u, v) at which it appears in the image. This involves two steps:

• finding the point on the surface of the mirror that reflects the point into the camera (the reflection point);
• projecting this point into the camera.

Finding the reflection point can be achieved by placing constraints upon the location, and then solving these constraints as a system of equations.

Reflection point

Assuming that the world point has been transformed into cone centred coordinates, with the rotation and translation between the cone and world known, the following constraints upon the reflection point R = (Rx Ry Rz)ᵀ exist:

1. The point must be on the surface of the cone:

Rz + (h/r) √(Rx² + Ry²) = 0.    (4.26)

2. The angle between the normal to the cone at the reflection point and the vector from the world point to the reflection point must equal the angle between the normal to the cone at the reflection point and the vector from the projection into the camera to the reflection point:

((C − R) / ‖C − R‖) · n − ((P − R) / ‖P − R‖) · n = 0    (4.27)

which, after squaring the two terms to remove the square roots, expands to

((Cx − Rx) Rz Rx + (Cy − Ry) Ry Rz + (Cz − Rz)(−Rx² − Ry²))² / ((Cx − Rx)² + (Cy − Ry)² + (Cz − Rz)²)
− ((x − Rx) Rz Rx + (y − Ry) Ry Rz + (z − Rz)(−Rx² − Ry²))² / ((x − Rx)² + (y − Ry)² + (z − Rz)²) = 0.    (4.28)

3. The normal to the cone at the reflection point must be in the same plane as the world point, the camera, and the reflection point:

((P − C) × (P − R)) · n = 0    (4.29)

expanding to

((y − Cy)(z − Rz) − (z − Cz)(y − Ry)) Rz Rx + ((z − Cz)(x − Rx) − (x − Cx)(z − Rz)) Rz Ry + ((x − Cx)(y − Ry) − (y − Cy)(x − Rx))(−Rx² − Ry²) = 0.    (4.30)

A direct analytical solution for R cannot be obtained from the above constraints because Equation (4.28) is highly non-linear, containing fifth powers. Figure 4.11 shows the complexity of these constraints graphically.

Figure 4.11: A graphical representation of the forward projection constraints (4.26), (4.28) and (4.30). All parameters other than R are fixed.

Iterative solution

A solution can be obtained iteratively using Newton's method:

R_{n+1} = R_n − ∆F(R_n)⁻¹ F(R_n)    (4.31)

where F = 0 is a multidimensional function comprising Equations (4.26), (4.28) and (4.30), and ∆F is the Jacobian of F with respect to the reflection point R_n at iteration n. With a good initial estimate of R_n few iterations are generally required; however, when a poor initial estimate is used convergence is not guaranteed. Adaptations of Newton's method such as the Newton-Armijo method, which addresses this issue by introducing a step size, and the Newton-GMRES iteration, which offers better global convergence, are discussed in [58].

When the camera is nearly aligned with the axis in the usually required manner, the projection equations detailed by Spacek [112] can be used to provide an initial estimate. If the camera is intentionally set off axis then a method of searching for an initial point is required.

Projection into the camera

The reflection point R in cone centred coordinates can be projected into the camera using the pinhole camera model:

( Su,  Sv,  S )ᵀ = M ( Rx,  Ry,  Rz,  1 )ᵀ    (4.32)

where S is the homogeneous coordinate scale factor and M is the 3 × 4 camera projection matrix comprising both the intrinsic and extrinsic parameters of the camera as in Equations (4.17) and (4.18):

M = [ sf  0  px  0 ;  0  −sf  py  0 ;  0  0  1  0 ] [ R⁻¹  −C ;  0 0 0  1 ].    (4.33)

These newly derived formulae can now be used to achieve stereo vision using arbitrarily aligned mirrors and cameras, as demonstrated in Burbridge et al. [16]. However, they are far more complicated than the attractively simple formulae by Spacek [112], and their application in robotics needs to be more carefully considered.
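As an alternative to hand-coding the Newton iteration of Equation 4.31, the reflection point can be obtained with a generic numerical root finder. The Python sketch below solves constraints (4.26), (4.28) and (4.30) with SciPy's fsolve; it is a sketch under the assumption that SciPy is available and that a reasonable initial estimate R0 (for example from the aligned-case formulae of Spacek [112]) is supplied.

    import numpy as np
    from scipy.optimize import fsolve

    def reflection_point(P, C, a, R0):
        # P: world point and C: camera centre, both in cone-centred coordinates.
        # a = h / r is the cone height-to-radius ratio; R0 is the initial estimate.
        P = np.asarray(P, float)
        C = np.asarray(C, float)

        def constraints(R):
            Rx, Ry, Rz = R
            n = np.array([Rx * Rz, Ry * Rz, -(Rx ** 2 + Ry ** 2)])  # cone normal
            f1 = Rz + a * np.hypot(Rx, Ry)                           # on the cone (4.26)
            vc, vp = C - R, P - R
            f2 = np.dot(vc, n) ** 2 / np.dot(vc, vc) \
               - np.dot(vp, n) ** 2 / np.dot(vp, vp)                 # equal angles (4.28)
            f3 = np.dot(np.cross(P - C, P - R), n)                   # coplanarity (4.30)
            return [f1, f2, f3]

        return fsolve(constraints, R0)

As with Newton's method, convergence depends on the quality of the initial estimate, so the aligned-case solution remains the natural starting point when the misalignment is small.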
Using a catadioptric solution to omnidirectional stereo is therefore only practical with expensive, well-manufactured precision apparatus. Furthermore, even with well-manufactured equipment the computational cost of the approach is high. Feature matches between the two stereo images are required in all stereo approaches (the correspondence problem), which leads to computationally demanding feature extraction and matching. This problem becomes even more demanding when considering the new stereo formulation presented here, because when perfect alignment is not assumed, no convenient epipolar constraint exists to allow guided feature matching as is the case in Spacek's formulation (see Chapter 2). The application to widespread, small, cheap robots is therefore infeasible.

4.7 Summary

In this chapter omnidirectional stereo vision has been considered for ego-motion estimation. Two new approaches to finding the rotation of the robot between frames have been proposed and tested. The results for the rotation estimation showed that both methods are able to estimate the rotation, but that an existing method marginally outperformed them both. As a purely appearance-based method, the 1D image flow algorithm performed better on real images than the 2-point optic flow method, as no local image features were required.

It was observed that if the range to image features can be measured, then calculating the translation of the robot is trivial. However, calculating the range using the simplistic projection formulation of omnidirectional stereo described in Section 2.3 is not possible with poorly aligned apparatus. In order to rectify this situation, a new set of equations for image projection with a conical mirror has been formulated for the more general case of non-exact alignment. However, the formulation encompasses a very large number of parameters, suggesting that a method that makes use of omnidirectional stereo with conical mirrors is not ideally suited to low-cost mobile robots.

Although the results achieved by the methods presented in this chapter have not been satisfactory, conclusions can be drawn as to what the best method of achieving reliable navigation on a low-cost, low-powered, commercially viable robot is:

1. Attempting to calculate the inter-frame translation of the robot with no form of filter to integrate a dynamic model with past sensory inputs is highly prone to error. Although not relying on measurement history or a known model of the robot kinematics theoretically allows instantaneous acceleration detection, the magnitude of the motion being estimated is vastly dwarfed by the magnitude of the error in the sensor readings.

2. The use of stereo omnidirectional vision achieved with catadioptric sensors requires many parameters to be known and complex projections to be evaluated. When a geometric reconstruction is required, the combination of cost, weight and complexity of catadioptric setups seriously limits their scope in robotic applications.

Having discovered that binocular omnidirectional vision is unsuitable for cheap robots, the natural question to ask is "What is the feasibility of navigation with only a single omnidirectional camera?". In the next chapter navigation with a single camera is considered. Rather than the approach of ego-motion estimation taken in this chapter, conventional SLAM algorithms are considered. An implementation is constructed and tested in the context of low-cost navigation.
5 Metric Simultaneous Localisation and Mapping with Monocular Vision

5.1 Introduction

Chapter 4 concluded that neither stereo vision nor inter-frame localisation was an effective approach to robot localisation for low-powered robots. An alternative approach can be found in metric simultaneous localisation and mapping (SLAM) algorithms, which make use of a motion model and past readings when estimating the position of the robot, effectively smoothing the location estimate. By building a map and performing loop closing, the kind of drift error that was apparent in Section 4.2 can be prevented.

Although the difficulties with conical stereo are not prohibitive, as shown in Chapter 4, the added complexity brought about by having two cameras dramatically limits the usefulness of omnidirectional stereo for low-cost robot navigation. Davison [26] showed that stereo is not necessary to achieve visual SLAM with an approach called MonoSLAM, which makes use of a single camera to localise and map point features in 3D.

Monocular omnidirectional vision using a fish-eye lens, as described in Chapter 2, avoids both the complexities of stereo vision and the complexities of projection in a non-central catadioptric system. As detailed in Chapter 3, omnidirectional vision has had relatively little attention as a sensor for metric single-camera SLAM, despite being ideally suited to the process of localisation. Earlier research that has made use of omnidirectional vision for SLAM has not fully quantified the improvements gained by omnidirectional sensing.

In this chapter the use of such a metric SLAM is considered in the context of low-cost robots. In Section 5.2 the well known extended Kalman filter based SLAM approach is detailed, and in Section 5.3 simulation experiments are presented that quantitatively show the advantages gained by the use of a wider angle lens. The suitability of metric SLAM to low-cost, low-powered robots is discussed in Section 5.4.

5.2 Extended Kalman Filter SLAM Algorithm

In Kalman filter based SLAM algorithms the system state is stored as a vector. The state is represented by the robot position (x, y, θ), its linear and angular velocities (ℓv, ωv), and the world positions (xf, yf) of all features in the map m:

x = [ x  y  θ  ℓv  ωv  m ]^T   (5.1)

m = [ xf1  yf1  xf2  yf2  ...  xfn  yfn ]   (5.2)

where fn is the nth feature in the map.

The aim of the algorithm is to maintain an estimate of the state x in the form of a probability distribution P(x_t | x_{t−1}, Z_t), where Z_t is the measurement vector at time t. The Kalman filter models the probability distribution as Gaussian, allowing it to be represented by a covariance matrix P and the mean state vector x. Any linear transformation of the state estimate will produce another Gaussian distribution, meaning that the Kalman filter is an optimal state filter for linear systems with Gaussian probability distributions.

In an extended Kalman filter (EKF), the non-linear measurement and motion models are linearised about the current state estimate as Jacobian matrices. Therefore for non-linear systems the EKF is only a rough, non-optimal approximation. However, it is still widely used when the state transition and measurement models are near-linear. An alternative Bayesian filter for highly non-linear systems is the particle filter, in which the state estimate is maintained as a discrete set of samples.
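For concreteness, the state layout of Equations (5.1) and (5.2) can be sketched as follows (Python with NumPy; an illustration written for this summary rather than the thesis implementation, with placeholder uncertainty values).

import numpy as np

# Robot part of the state: x, y, theta, linear velocity, angular velocity.
x = np.array([0.0, 0.0, 0.0, 0.0, 0.0])
P = np.diag([1e-4, 1e-4, 1e-4, 1e-2, 1e-2])   # small initial uncertainty (placeholder values)

def add_feature(x, P, feat_xy, feat_cov):
    """Append a feature (x_f, y_f) to the state and grow the covariance.
    The cross-covariance with the existing state is left at zero here;
    a full implementation would fill it in during feature initialisation."""
    x = np.concatenate([x, feat_xy])
    n = P.shape[0]
    P_new = np.zeros((n + 2, n + 2))
    P_new[:n, :n] = P
    P_new[n:, n:] = feat_cov
    return x, P_new

# Example: one pre-mapped feature with (near) perfectly known position.
x, P = add_feature(x, P, np.array([2.0, 1.0]), np.eye(2) * 1e-6)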
The algorithm follows the usual predict-refine cycle, whereby the state x at time step t + 1 is predicted and measurements are used to refine the prediction. A robot motion model is used to predict the state x at the subsequent time step.

5.2.1 Robot Motion Prediction Model

A motion model is employed in order to predict the position of the robot at the next time step. Many SLAM algorithms use wheel odometry information as the motion model. This is an appealingly simple model, although problems can occur when the accuracy of the odometry varies significantly throughout the robot's traversal, requiring additional methods, such as that presented in the last chapter, to estimate the accuracy of the odometry. In addition to this, some small robots [38] have no wheel odometry.

The use of odometry can be avoided by using a constant velocity motion model and the kinematics of a differential drive robot as follows:

x_{t+1} = f(x_t, n) = x_t + [ −((ℓv + ℓvn)/(ωv + ωvn)) sin θ + ((ℓv + ℓvn)/(ωv + ωvn)) sin(θ + (ωv + ωvn)Δt)
                              ((ℓv + ℓvn)/(ωv + ωvn)) cos θ − ((ℓv + ℓvn)/(ωv + ωvn)) cos(θ + (ωv + ωvn)Δt)
                              (ωv + ωvn)Δt
                              ℓvn
                              ωvn
                              0
                              ...
                              0 ]   (5.3)

where n = [ℓvn  ωvn]^T is the zero-mean Gaussian process noise (the prediction itself is evaluated at n = 0, that is, as f(x_t, 0)) and Δt is the amount of time ahead that the model is being used to predict. A complete derivation of this model is presented by Thrun et al. [118].

The covariance of the state estimate is updated using the Jacobian of the motion model with respect to x, Δfx, and with respect to n, Δfn:

P = Δfx P Δfx^T + Δfn Q Δfn^T   (5.4)

where Q is the covariance of the process noise. The Jacobian matrix Δfx does not depend upon the map or velocity elements in the state vector, so state augmentation [123] can be employed to decrease the computational complexity of this operation.

5.2.2 Measurements to Refine Motion Prediction

The measurement model predicts what the measurement will be from the current state estimate. In the case of a single monocular camera mapping and localising in 2D, the predicted measurement of each mapped feature i is a single scalar ẑi giving the relative bearing to the feature:

ẑi = h(x, v) = arctan((yfi − y)/(xfi − x)) − θ + v   (5.5)

where v is the measurement noise and θ is the heading angle of the robot.

The state estimate and covariance are refined after each feature measurement zi using the standard equations of the EKF as follows:

S = (Δhx P Δhx^T + Δhv R Δhv^T)^(−1)   (5.6)

x = x + P Δhx^T S (zi − ẑi)   (5.7)

P = (I − P Δhx^T S Δhx) P   (5.8)

where Δhx and Δhv are the Jacobians of the measurement model (5.5) with respect to the state and the noise, and R is the covariance of the measurement noise, in this case simply a scalar representing the variance. S represents the certainty of the predicted measurement, which can be used to provide search bounds for the feature.

5.2.3 Placing Features in the Map

In monocular vision based SLAM, only the bearing to a feature can be measured, so measurements from two different views are required in order to estimate the distance to the feature and to initialise it in the map. The typical solution in monocular vision based SLAM is to initialise several known features into the map before the algorithm starts. Without this or any odometry, the robot would be unable to know how far it had travelled in order to initialise features, and scale would have to be left as an unknown as in [24]. In 2D at least two pre-initialised known features are needed to constrain the robot location.
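A compact sketch of the two steps is given below (Python with NumPy; written for this summary and not taken from the thesis). It uses forward-difference Jacobians rather than the analytic ones, assumes ωv ≠ 0, and keeps the measurement as the single relative bearing of Equation (5.5).

import numpy as np

def f(x, dt):
    """Constant velocity motion model (Equation 5.3) applied to the full state
    [x, y, theta, lv, wv, xf1, yf1, ...] with the noise set to zero; map elements
    are unchanged. Assumes wv != 0."""
    x_new = x.copy()
    px, py, th, lv, wv = x[:5]
    x_new[0] = px - lv / wv * np.sin(th) + lv / wv * np.sin(th + wv * dt)
    x_new[1] = py + lv / wv * np.cos(th) - lv / wv * np.cos(th + wv * dt)
    x_new[2] = th + wv * dt
    return x_new

def jacobian(func, x, eps=1e-6):
    """Forward-difference Jacobian of func at x (used in place of the analytic Jacobians)."""
    fx = func(x)
    J = np.zeros((len(fx), len(x)))
    for j in range(len(x)):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (func(xp) - fx) / eps
    return J

def ekf_predict(x, P, Q, dt):
    """Prediction step: mean through Equation (5.3), covariance through (5.4)."""
    Fx = jacobian(lambda s: f(s, dt), x)
    # The process noise enters exactly where the velocities do, so the noise
    # Jacobian is simply the two velocity columns of Fx.
    Fn = Fx[:, 3:5]
    return f(x, dt), Fx @ P @ Fx.T + Fn @ Q @ Fn.T

def ekf_update_bearing(x, P, z, feat_idx, R):
    """Bearing-only refinement (Equations 5.5-5.8) for the feature whose x
    coordinate sits at index feat_idx in the state; R is the scalar variance."""
    def h(s):
        return np.array([np.arctan2(s[feat_idx + 1] - s[1], s[feat_idx] - s[0]) - s[2]])
    H = jacobian(h, x)
    S = 1.0 / (H @ P @ H.T + R)                                   # Equation (5.6), a 1x1 matrix
    innov = np.arctan2(np.sin(z - h(x)[0]), np.cos(z - h(x)[0]))  # angle-wrapped innovation
    K = P @ H.T * S                                               # n x 1 gain
    x_new = x + K.ravel() * innov                                 # Equation (5.7)
    P_new = (np.eye(len(x)) - K @ H) @ P                          # Equation (5.8)
    return x_new, P_new

Because S is defined here as the inverse innovation covariance, as in Equations (5.6) to (5.8), the gain is simply P Δhx^T S.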
In the implementation described in this chapter, as soon as the robot starts moving new features are tracked for potential mapping. Bailey's frozen state approach [7] is employed to initialise the features. This works by freezing a copy of the robot state at the time a first measurement of a potential feature is made, and using the frozen copy when the feature is measured again with a sufficient baseline. Using Bailey's approach allows the consistency of the map covariance to be maintained.

5.2.4 Algorithm Overview

An outline of the well known extended Kalman filter based algorithm for bearings-only 2D SLAM with Bailey's frozen state feature initialisation [7] is presented in Algorithm 5.1.

Algorithm 5.1 Extended Kalman filter SLAM with a monocular camera.
Require: A state vector with two pre-mapped features.
1: while running do
2:   Predict state using motion model.
3:   for all features in the map do
4:     if feature visible then
5:       Refine state and covariance estimates based on feature measurement.
6:     end if
7:   end for
8:   for all partially initialised features do
9:     if feature not found then
10:      Delete frozen copy of state.
11:    else if baseline and disparity above threshold then
12:      Insert feature estimate into state.
13:      Augment covariance matrix with covariance estimate for new feature based on frozen copy.
14:      Remove corresponding frozen copy of state.
15:    end if
16:  end for
17:  for all remaining visible features do
18:    Freeze a copy of the robot state.
19:  end for
20: end while

5.3 Experiments with EKF SLAM

In this section the SLAM algorithm detailed in Section 5.2.4 is tested in simulation to evaluate the accuracy of the localisation and its suitability to low-cost robots.

5.3.1 Experimental Method

Experiments were carried out in simulation so that repeatable testing could be performed. Doing so makes it possible to ensure that only the factor that is being investigated changes between tests. The simulation was made realistic by closely linking it to a real physical robot.

A Hybrid Simulation

In order to avoid using an over-simplistic simulation of robot kinematics, the trajectory of a real physical robot being driven around an arena was logged using a Vicon tracking system. This logged trajectory was then used as the trajectory of the robot in the simulation (see Figure 5.1), avoiding any bias towards the simulator caused by over-simplified perfect trajectories.

Figure 5.1: The hybrid simulation method. Only the sensor perception is simulated: the true robot pose from the Vicon system drives the simulated perception, and the SLAM system's estimated pose is compared with the true pose to assess SLAM performance.

This hybrid simulation method means that only the sensor is being simulated. Throughout the experiments detailed below, "robot" refers to the real physical robot, while "sensor", "feature" and "world" refer to software simulations.

Figure 5.2: The linear and angular velocity of the robot over its journey.

Figure 5.3: The ground truth trajectory of the robot and the trajectory reported by the on-board odometry.
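At its core, the frozen-state initialisation step of Algorithm 5.1 amounts to triangulating a feature from two bearings once the baseline is sufficient. A minimal sketch of that step is given below (Python with NumPy; illustrative only, with the thresholds and the covariance augmentation of Bailey [7] omitted).

import numpy as np

def triangulate_feature(pose_frozen, bearing_frozen, pose_now, bearing_now):
    """Intersect the two bearing rays (robot heading theta + relative bearing) from
    the frozen pose and the current pose to obtain an initial (xf, yf) estimate.
    Returns None when the rays are near-parallel (insufficient baseline/disparity)."""
    p1, p2 = np.asarray(pose_frozen[:2]), np.asarray(pose_now[:2])
    a1 = pose_frozen[2] + bearing_frozen     # absolute ray angles in the world frame
    a2 = pose_now[2] + bearing_now
    d1 = np.array([np.cos(a1), np.sin(a1)])
    d2 = np.array([np.cos(a2), np.sin(a2)])
    A = np.column_stack([d1, -d2])
    if abs(np.linalg.det(A)) < 1e-3:         # poor disparity: keep waiting
        return None
    t, s = np.linalg.solve(A, p2 - p1)
    if t <= 0.0 or s <= 0.0:
        return None                          # intersection behind one of the poses
    return p1 + t * d1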
Robot Trajectories

A trajectory was acquired by driving a robot around the robot lab at the University of Essex while capturing the exact position at 30 Hz. The linear and angular velocities were varied throughout the journey, as shown in Figures 5.2 and 5.3. This trajectory realistically violates the constant velocity model described in Section 5.2.1, maintaining the applicability of these experiments to real robot applications. The same trajectory was then used for each experiment.

The visible range of the simulated sensor was fixed at 10 metres, which covered all features placed within the direction of view. In all experiments, Gaussian noise was added to the measurements with σ = 0.0025. This was added to the measurement as a fraction of the field of view of the sensor, simulating the idea that the sensor has a fixed resolution that is spread across the field of view. For a field of view of 360° this equates to a standard deviation of 0.9°.

The performance of the localisation in each experiment was assessed by calculating the root mean squared error (RMSE) of the position of the robot over the whole journey. In this work no drift error occurs because data association is known and four pre-initialised features are provided.

5.3.2 Experiment 1: How many features are needed?

In order to evaluate the effect the number of features in the environment has on the localisation error, worlds were generated with a number of features uniformly distributed in a 6m × 6m grid centred at the robot start position. The number of features was varied, and the accuracy of the localisation for each was compared. The results are shown in Figure 5.4.

Figure 5.4: The effect of the number of features available in the world (in addition to the four pre-initialised features) on the accuracy of the localisation.

As can be seen from the graph, even with only the four initially known features the algorithm performs well, achieving an average error of approximately 5 cm. As more features are added to the world the accuracy increases. However, as the number of features increases, the amount of improvement in accuracy decreases. This would suggest that maintaining a map of only 40 well spaced features will perform almost as well as maintaining a similar map of 50 features. The marginally lower accuracy achieved using the smaller map may be compensated for by the increased speed of the computation.

5.3.3 Experiment 2: How important is an isotropic feature distribution?

The distribution of features in the environment is an important factor in the accuracy of the localisation of the robot. To demonstrate this, the SLAM algorithm was run in a world of 25 features randomly distributed about the origin, with all 25 features visible in every frame. The random distribution was Gaussian, and the standard deviation was varied between 0 m and 2 m. The results are shown in Figure 5.5.

Figure 5.5: The effect of the spacing of world features on the accuracy of the localisation.

When the features are not well spread across the environment (i.e. have a small standard deviation) they are mostly at the same bearing as the robot moves away from the origin.
This non-uniform distribution of features about the robot causes a decrease in accuracy, coming from poor triangulation and from the features being more distant and thus more affected by noisy measurements. As the standard deviation of the feature distribution is increased, the features become more spread across the whole of the robot trajectory and the overall accuracy increases.

5.3.4 Experiment 3: Does omnidirectional vision improve SLAM?

The accuracy of localisation achieved when using cameras of different fields of view was compared by running the algorithm in a world of 25 evenly spaced features, with fields of view ranging from 120° to 360°. The results are shown in Figure 5.6.

Figure 5.6: The effect of the field of view of the camera on the accuracy of the localisation.

The experiments show that the larger the field of view, the more accurate the localisation. This is due to a combination of more features being visible when the angle of view is greater, and those features being more spread out, allowing better triangulation. When both the field of view and the number of features in the world are small, there are locations in the trajectory at which no features are visible. At these times the error in the estimated position of the robot will increase until features are visible again, at which point the filter trusts the measurement model more than the motion model and the robot is re-localised. Since no odometry information is used, the motion model can quickly become inaccurate and result in large errors in the predicted position. This is shown by the very low accuracy when the field of view was 120°: large areas of the trajectory contained few visible features, resulting in poor localisation and poor new feature initialisation.

Although a larger field of view provides a more accurate localisation, the improvement in accuracy gained by increasing the field of view increased slowly from about 180°. This was because a large number of features were always visible when the field of view was greater than 180°.

5.3.5 Experiment 4: What is the effect of measurement noise?

To find out what effect the noise present in the measurements has on the accuracy of the localisation, the algorithm was tested in a world of 25 evenly spaced features, adding zero-mean Gaussian noise levels of σ = 0°, 0.9°, 1.8°, 2.7°, 3.6°, 7.2° and 10.8° to the measurements. The results are shown in Figure 5.7.

Figure 5.7: The effect of measurement noise on the accuracy of the localisation.

The results show that localisation is very sensitive to noisy measurements. This was found to be mostly due to poor initial estimates of newly initialised features, subsequently resulting in detrimental refinement of the robot location estimate.

5.4 Metric Visual SLAM in Practice

The metric formulation of SLAM tested in this chapter is not suitable for small, low-powered robots. The reasons for this are detailed in this section.

5.4.1 Computational Complexity

Experiments 1 and 3 show that in order to achieve an accurate localisation, about 10 constantly visible features are required.
This is without considering features that go out of sight, and is assuming a low level of noise in the measurements of the bearing to the features. In feature-based metric SLAM solutions with the map as part of the state, the number of features in the map has an important impact on the computational cost. To run in a practical environment, the required number of tracked features would lead to a large covariance matrix and a computationally expensive matrix inversion as shown in Equation 5.6. Approaches to reduce the complexity have been developed, with a key example being Divide and Conquer SLAM [101] in which the mapping is divided into sub maps that are later merged. This results in a SLAM system that has a mapping complexity linear in the number of mapped features. However, the overall computational complexity of the system remains high due in part to the unavoidable necessity to match features. 5.4.2 Features in the Real World In the experiments above, and in all feature-based metric visual SLAM systems, there is an underlying assumption that features are available in the environment, and that those features are trackable and distinguishable. As shown by experi- ments 1 and 2, the number of features and their distribution in the environment are important factors in the performance of the system. However, features are not always available in the real world, and even when they are they can be costly to reliably locate, extract and match. 5.4. Metric Visual SLAM in Practice 87 5.4.3 Accuracy Experiment 4 showed that despite having four perfectly initialised features in the map before beginning, and always having exact data association guaranteed, when the measurements exhibited zero mean noise levels with a standard deviation of 7◦ the accuracy of the system was only approximately 35 cm. Adding noisy data association to this would further reduce the accuracy. In Chapter 4 the diculties involved in obtaining an accurate geometric reconstruction with binocular omnidirectional vision were discovered. Although with monocular omnidirectional vision there are fewer parameters involved, obtaining very accurate measurements still requires a good calibration of the camera. On a low-cost robot with a low-cost sensor, this adds an additional complexity that would be best avoided. 5.4.4 Alternative Metric SLAM Solutions There are a number of alternative solutions to the SLAM problem. An outline of the SLAM solutions in the literature is given in Chapter 3, with solutions split into metric approaches such as that detailed in this chapter, and topological approaches such as [25]. Particle lter based metric solutions such as FastSLAM [89] also suer from similar problems to those described for the Kalman lter based solution outlined in Section 5.2. The main advantage of a particle lter based solution over the Kalman lter based solution to SLAM is its ability to cope with non-linear dynamics and nonlinear measurement functions and to be able to track multiple hypothesises [51]. However, the need for feature matching, the diculties of data association and the restrictions of state complexity still limit the use of metric SLAM solutions on small low-cost robots. 5.5. Summary 88 Topological SLAM solutions that do not attempt to create a metric representation of the environment are more realistically applicable. 
An appearance-based approach to the use of the image information avoids costly and restrictive feature matching and extraction, and by taking into account an entire image rather than individually localised features the problems with noisy measurements can be counteracted. However, a key problem is that data association and map consistency must still be considered for an accurate solution. 5.5 Summary The conventional approach to SLAM is to construct a map in a probabilistic framework [111]. Doing so allows the uncertainties of landmark positions and the robot's location to be combined and a consistent map to be constructed. However, maintaining a probability distribution over the map is a costly process. The metric example discussed in this chapter may not be the best solution with low-cost robots with limited computational power. The key reasons for this centre around the requirement of detectable, matchable image features. The data association problem, in which feature position measurements must be associated with feature position predictions, is non trivial. As pointed out by DurrantWhyte et al. [30, 31], data association is normally the main source of fragility in SLAM systems, with erroneous data association causing catastrophic localisation failure. In the experiments presented here data association has not been considered. Therefore, the accuracy achievable in real world situations cannot be presumed to be the same as that achieved in simulation. Furthermore, the additional computational cost brought about by requiring data association deters use on low-cost robots with limited computational power. Distinguishable and trackable features must be available in the image, and their extraction and matching adds additional computational cost. State of the art image feature extraction approaches can reliably locate and match image features with 5.5. Summary illumination, scale and ane invariance [59]. 89 However, such approaches are com- putationally demanding and real time performance is not achievable. Furthermore, although the experiments presented here suggest that a high level of accuracy can be achieved with just four features, in real world situations features may go out of sight or become occluded. Chapter 4 concluded that omnidirectional stereo vision was not a satisfactory tool for navigation with low-cost cheap robots. This chapter has shown that metric SLAM with a single omnidirectional camera is also not a satisfactory solution for navigation with computationally low powered robots. The next chapter presents a new ecient approach to topological mapping-based navigation using visual appearance that is well suited for low-cost robot applications where the robots have limited computational power. 6 Topological Map-Based Navigation 6.1 Introduction Metric map construction was shown to be an unsuitable approach to navigation with a low-cost robot in Chapter 5. An alternative to the metric SLAM approach can be found in topological mapping, in which the world is represented by a collection of nodes that are linked together to map the connectivity of locations. Topological maps are usually sparse, containing few nodes, with each node representing a dierent location such as a dierent room [22] or suciently dierent perceptions [129]. Navigation using a topological map consists of nding a route through nodes in the map that gets to the desired node, and requires the ability to navigate from node to node. 
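Finding such a route is a standard shortest-path search over the weighted links of the map. The sketch below (Python; illustrative only, with an assumed adjacency-list representation and example costs) shows the idea used later in Section 6.3.3, where Dijkstra's algorithm is applied to the SOM-indexed map.

import heapq

def shortest_route(links, start, goal):
    """Dijkstra's shortest path over a topological map.
    links: dict mapping node -> list of (neighbour, cost) pairs."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, cost
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in links.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None, float('inf')

# Example: traversed action links cost 1; a similarity link costs 3 plus a perception distance.
links = {'A': [('B', 1.0)], 'B': [('F', 1.0), ('L', 3.4)], 'L': [('F', 1.0)], 'F': []}
print(shortest_route(links, 'A', 'F'))   # -> (['A', 'B', 'F'], 2.0)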
As detailed in Chapter 3, various methods of navigating between topological map nodes have been suggested. One such method is to create a hybrid metric-topological map, placing metric information within a topological representation. The metric information aids navigation between nodes by providing a metric coordinate system.

Two important factors influencing the efficiency of the navigation system are how the metric information is found, and how it is used. The previous two chapters showed that metric reconstructions rely on image features and accurate camera calibrations, and are therefore not suitable for low-cost robots. The use of wheel odometry alone to incorporate metric information and to navigate between topological map nodes is not usually a sensible approach, because the drift in the odometry information will be large between sparse map nodes. An alternative is to create a dense topological map with the distance between nodes considerably shorter, as depicted in Figure 6.1. Since distances between nodes in a dense topological map are short, odometry information can be used reliably to navigate between them.

Figure 6.1: Left: A sparse topological map with only a single node per room. Right: A dense topological map where a single room is covered by many densely packed map nodes.

In both Chapter 4 and Chapter 5 geometric approaches have been shown to be unsuitable for low-cost robot navigation. This chapter focuses on the use of visual appearance and avoids the use of image features for metric scene understanding, firstly by using a local appearance-based approach and secondly with a global approach. An efficient mobile robot navigation system is developed that works by creating a dense topological map combined with metric information from the robot's wheel odometry.

In the following section the advantages and difficulties of dense topological map-based navigation are discussed. In Section 6.3 a self-organising map-based solution to the difficulties of dense topological mapping is presented. Two implementations are then detailed and experimental results presented.

6.2 Dense Metric-Topological Mapping

In this thesis a dense topological map is defined as a map with nodes spaced closely together, with no lower bound on how close two nodes can be. This leads to a large number of nodes covering an environment. For example, in a 4m × 4m dense topological map with nodes stored every 2.5 cm, the number of nodes in the map will be approximately 25000. Although there are a number of advantages associated with a dense topological map, there are also several disadvantages. The disadvantages stem from the large number of map nodes, and need to be addressed in order to gain from the advantages. The advantages and disadvantages are summarised in the following two sections.

As detailed in Chapter 3, most existing research in topological mapping focuses on the creation of sparse maps with relatively few nodes. Little work has been done on creating dense topological maps. The notable exceptions to this are discussed in Section 6.2.3, together with methods that may be used to overcome the disadvantages of dense topological maps.

6.2.1 Advantages of Dense Mapping

There are numerous advantages gained by using a dense topological map rather than a sparse map:

1. As mentioned above, by having nodes that are closely spaced in the environment, basic wheel odometry can be more reliably used for navigation between nodes.
Over long distances wheel odometry is unreliable, so the error incurred at each traversal between nodes would be large in a sparse topological map.

2. Routes closer to the optimum can be taken through the environment when using a very dense topological map. Arbitrary routes through the environment can be taken because the robot is not forced to navigate towards a specific node as a waypoint, but instead can navigate in any part of the environment knowing that a node will exist close to the current location. When a route through the topological map has been found, the metric information between the nodes can be exploited without having to visit the nodes that form the route.

3. Using a dense rather than a sparse topological map makes map consistency less significant, as demonstrated in Section 6.5. An inexact mapping of the topology, as shown in Figure 6.2, can still be used for navigation, as the robot is able to constantly re-assess its position and route through the environment. The map becomes a virtual sensor that tells the robot how far to travel, and inconsistencies within the map become noise which can be filtered. This avoids maintaining a complete probability distribution over the map, which was shown to be inappropriate in the last chapter.

4. When describing a node using the raw perception in an appearance-based way, a dense topological map is favourable as more of the perceptual space is captured. In a sparse map the perception will vary significantly between nodes, whereas in a dense map the perception is sampled frequently. Since the perception can vary non-smoothly over large distances, localising can become difficult and prone to error when map nodes are far apart.

Figure 6.2: An example of an inconsistent topological map. Nodes are shown as circles with true (x, y) world position, and links are shown between the nodes that specify the difference between node positions. When using the map to navigate from node A to node F by following link distances, the inconsistent link between L and F could result in failed navigation.

6.2.2 Disadvantages of Dense Mapping

Alongside the advantages of dense mapping are a number of serious disadvantages. Particularly in the context of low-cost robots, the main disadvantages result from the increased number of map nodes leading to the following difficulties:

1. With a large number of map nodes, localising becomes a computationally demanding process of comparing the current perception to that stored for each node, and taking the best matching node as the robot location. This approach is not very efficient, requiring a large number of comparisons, and if the stored perception for each node is large then the computational power required for localising can be too great.

2. Storing a topological map with many nodes can take a large amount of disk space when an image is stored for each node.

3. Finding a route through a dense topological map requires searching many more links than in a sparse map.

4.
If map consistency is important then it has to be obtained by incorporating the uncertainty of links between nodes. Mapping a large number of nodes increases the computational expense, causing problems similar to those found when a large number of features are mapped in metric SLAM implementations (see Chapter 5). 6.2.3 Methods of Coping with Disadvantages Dense topological mapping was considered in [35], with a approximately 250 nodes spaced 25cm apart. 50m corridor mapped using The focus of the approach was on map construction and localisation, with the importance of map consistency considered in detail. This led to a computationally intensive algorithm. Furthermore, in order to cope with the disadvantages of dense mapping, a one dimensional panoramic image was used as the robot perception. Although this kept the map storage requirements low, it increased perceptual aliasing. One dimensional panoramic images were also used for robot navigation in [14], where scale-space features were extracted and matched to allow navigation between views. A dense topological map with nodes spaced 25cm apart was used for navigation, but the map had to be metrically consistent with the global pose of each node known. No details of how this was acquired were given. The PCA based methods discussed in Chapter 3 lend themselves well to dense topological mapping as the image data becomes compressed, allowing storage of a large database of images and faster matching within the database as shown by Winters [124]. However, as the number of images in the database increases, the number of principal components necessary to capture their variance increases. Furthermore, updating the principal components requires a computationally expensive matrix decomposition (the Singular Value Decomposition), even when using incremental PCA approaches [5]. 6.3. Using a SOM for Ecient Topological Mapping 96 Alternative methods of reducing the image data include image edge magnitude histograms [128], Fourier series descriptors [52] and bag-of words type approaches [25]. Image histogram-based descriptors can reduce the amount of data stored for each topological map node. However, they are not well suited to dense topological mapping because near locations will have almost identical histograms. Bag-of-words approaches to appearance-based image description (see Chapter 2.4) suer from a similar problem to image edge magnitude histograms, particularly when using omnidirectional vision where features remain in view over long distances. Fourier series descriptors of images as used by Ishiguro et al. [52] allow compact storage of image data in the form of several Fourier coecients, and as shown in [84] a hierarchical approach to localisation can be achieved by varying the number of coecients used when comparing descriptors. However, Hafed and Levine [45] showed that Fourier signatures are very susceptible to illumination change. 6.3 Using a Self Organising Map for Ecient Topological Map Indexing and Construction A novel contribution of this thesis is the use of a self-organising feature map (SOM) to overcome the complexities of dense topological mapping. This is achieved by using a SOM for both node compression and node indexing. Construction of the SOM indexed dense topological map is split into two distinct learning phases: • Phase I: The robot learns what kind of perception to expect. In this phase, the self-organising map is trained to cluster the perceptual space into regions. 
• Phase II: The robot learns how the perception will change when a specific motion is taken. A dense topological map is created in association with the self-organising map learnt in the first phase, exploiting the topology-preserving property of the SOM as a key to the robot's perceptual space. Every node in the topological map corresponds to a physical location, which is represented by the SOM-classified perception at that location.

6.3.1 Phase I: Learning the Perceptual Space

Firstly the robot is manually driven in the area that is to be mapped. The area is covered thoroughly, and the robot perception is logged. A SOM is then trained offline using the standard training procedure given in Chapter 2. The objective of the SOM training is both to cluster the high-dimensional perceptual space and to map those clusters to a two-dimensional SOM.

After training, the SOM coarsely splits the perceptual space into clusters, each cluster corresponding to a class of perceptions. Each class of perceptions equates to an area of perceptual space which, provided the perception varies smoothly over space, will equate to a well defined region in physical space. Importantly, the topology-preserving property of the SOM means that close nodes will relate to close physical locations. This fact can later be exploited when localising.

6.3.2 Phase II: Creating a Topological Map

The trained SOM can be used to classify new perception vectors. Rather than using only the winning node as the output of the SOM classification (as is usually the case), the q-dimensional perception vector P⃗ is transformed to an (N × M)-dimensional classified perception vector P̃, with each element containing the Euclidean distance between the current perception and the weights of a node, as given in Equation 6.1:

P̃i = √( Σ_{j=1}^{q} (P⃗j − w⃗ji)² ),   i = 1 ... N × M   (6.1)

where w⃗ji is component j of the weight vector of the ith SOM node, and N × M is the dimension of the SOM.

Mapping is achieved by using the classified perception vectors P̃ in place of the initial perception vector to represent topological map nodes. Each topological map node is stored underneath the SOM as a sub-node of the best matching unit in the SOM, as shown in Figure 6.3. This process addresses two of the complications of dense topological maps detailed above:

1. This method of storage allows fast and efficient localisation when searching the topological map nodes to find which node best matches the current perception. The nodes that represent the same or nearby physical locations also have similar classified perception vectors, which are stored as sub-nodes of the same or nearby SOM node. When localising, the current perception vector is classified using the SOM and Equation 6.1. Only the sub-nodes of the best matching unit in the SOM are then compared to the current classified vector P̃, and the closest sub-node is taken as the current topological map node. This avoids comparing all map nodes, and ideally will reduce the number of comparisons necessary by a factor of the number of nodes in the SOM.

2. By using only the classified perception vector, the data that needs to be stored for each topological map node can be reduced. For example, using a 10 × 10 SOM will mean that every node in the topological map is described by a 100D vector. This means that for each node in the topological map a perception vector does not need to be stored, but instead only a classified perception vector.
This way all perceptions are stored in a compressed form. Weighted links are placed between nodes within the topological map (the sub-nodes in the SOM), with information stored in the link that species the relationship between the nodes and the condence that the link is correct. In the rst set of experiments reported in this chapter, the nodes are related by discrete physical actions, while in the second set of experiments the node relationships can be continuous physical distances. By storing this metric information in the links between nodes, a form of hybrid metric-topological map is created. 6.3. Using a SOM for Ecient Topological Mapping 99 N x M Self Organising Map M perception vector N N x M dimensional classified perception vector of SOM node responses best matching unit classified perception stored within the SOM as a topological map node store vector under winning unit Figure 6.3: The use of a self-organising map to eciently store and index a dense topological map. Perceptions are classied by the self-organising map and stored as topological map nodes in association with the best matching unit in the selforganising map. As the map is constructed without explicit consideration for metric consistency, the same physical location may potentially be mapped multiple times by dierent nodes. For this reason links are also stored between perceptually similar nodes, with weights that reect the certainty of the link. This closes the loops in the map. 6.3.3 Navigation with the Topological Map In order to navigate from the current location to a target location, a route through the topological map needs to be found, starting at the node that represents the current location and ending at the node that represents the target location. The local metric information stored for each of the links can then be used to navigate the robot from node to node and ultimately to the target location. When the current and target nodes in the map are known, Dijkstra's shortest path algorithm is used to nd a route between them. 6.3. Using a SOM for Ecient Topological Mapping 100 6.3.4 Map Parameters and Computational Cost There are several parameters involved in the SOM based topological mapping and navigation algorithm proposed. These are summarised as follows: • Size of the SOM: The size of the SOM used inuences the eciency of the method. Assuming that the SOM is perfect, it will segment the mapped area equally with each SOM node indexing an equal number of topological map nodes. When localising within the map, the number of comparisons that need to be made will then be n full size perception vector comparisons, where in the SOM, and n is the number of nodes V /n classied perception vectors comparisons where V is the number of topological map nodes. A SOM size should be selected to balance the number of comparisons necessary to the minimum, taking into account the size dierence between the perception vector and its classied counterpart. • Perception size and properties: The eciency and eectiveness of the approach is inuenced by the way in which the camera information is used. It is important that the perception vector varies smoothly over the physical space, with close physical locations having similar perceptions. This ensures that the topology of the SOM relates to the topology of the physical space. It is equally important that the size of the perception vector be kept small. When localising and mapping, the current perception vector needs to be compared to each node in the SOM. 
If a large perception is used then this will require more computational power.

• Type of metric information: The metric information that links the topological map nodes together can be discrete, in the form of a limited number of performable actions, or continuous, allowing any change in pose. Limiting the action space of the robot to specific motions, such as "forward 10 cm" and "right turn 90°", has the advantage of restricting the physical pose space that the robot can occupy, but subsequently limits the usefulness of the algorithm. Conversely, allowing the robot to take arbitrary motions between nodes creates a more general and useful navigation algorithm, but increases the difficulty of navigation.

• How links are weighted and a path found: The weighting of links in the map governs which route will be found by the breadth-first search when navigating from a start node to a target node. The link weights need to be selected so that they reflect the certainty that the link is true. Thus the route that is most likely to succeed in navigating the robot will be selected.

• Map density: The density of the map is controlled by the map node creation criteria, which in the implementations detailed here is fixed at regular intervals throughout the mapping process. In the first set of experiments (detailed in Section 6.4) map nodes were created before and after execution of a discrete action. In the second set of experiments (detailed in Section 6.5) map nodes were created at regular 1 second intervals as the robot moved at a speed of 0.2 m/s.

• Map building termination criteria: In the experiments detailed here, the decision of when to terminate the map building stage was made manually by an expert user. When the robot was deemed to have driven sufficiently around the environment, the process was stopped. Automatic stopping criteria could be the focus of future work.

The computational cost of the algorithm is an important factor dictating its applicability to low-cost robots. The algorithm has very low complexity, with the most demanding component being the initial training of the self-organising map. This is a task that only needs to be carried out once, and has no real-time constraints. On the whole, the algorithm is highly parallelisable, and as such is well suited to hardware implementation, with hardware implementations of SOMs in development [71].

The complexity of the mapping and localisation depends on the number of topological map nodes (V) and the number of self-organising map nodes (n). The best case complexity occurs when all V topological map nodes are evenly indexed by the self-organising map nodes. In this case, n self-organising map node comparisons are required and V/n topological map node comparisons are required. The complexity of the mapping and localisation algorithm is therefore O(V/n + n). In the worst case all topological map nodes are indexed by a single self-organising map node; in this case all topological map nodes and self-organising map nodes need to be compared, and the complexity is O(V + n).

There are three possible scenarios: V is vastly larger than n, V is approximately equal to n, or V is vastly smaller than n. If V ≫ n, then n can be considered as a constant, and therefore removed from the calculations:

best case: O(V/n + n) = O((V + n²)/n) ≈ O(V + n²) ≈ O(V)   (6.2)

worst case: O(V + n) ≈ O(V)   (6.3)

If V ≈ n,

best case: O(V/n + n) ≈ O(n/n + n) ≈ O(1 + n) ≈ O(V)   (6.4)
worst case: O(V + n) ≈ O(2V) ≈ O(V)   (6.5)

The case of V ≪ n is specular to V ≫ n. This establishes that both the lower and upper bounds on the algorithm complexity are O(V), that is, linear in the number of topological map nodes.

The use of a self-organising map to compress the perceptions results in low storage cost. For each node in the topological map, a vector of size equal to the number of nodes in the SOM is stored. For a 10 × 10 SOM, this results in a memory requirement of 100 floating point values for each node, approximately 400,000 bytes (390 Kbytes) per 1000 nodes when using 4-byte floating point values. The storage of the SOM itself requires storing a floating point vector of the size of the raw perception for each node in the SOM.

In the next two sections, separate experiments are carried out with different perceptions and metric information. In the first set of experiments discrete actions are used with a blob-detection based perception. In the second set of experiments the algorithm is applied in the more general case of continuous actions, and the weighting of links in the map is developed to allow inconsistent maps to be acceptable. In all experiments the proposed algorithm operated on the robot, with processing carried out by the robot's on-board computer. The robot used in the experiments was a MetraLabs Scitos G5 mobile robot equipped with a 1.6 GHz PC similar to the Asus Eee-Box discussed in Chapter 1.1. Navigation was carried out in real time.

6.4 Experiments in Discrete Navigation with a Simplified Perception

To verify that the use of a SOM can assist in dense topological mapping, and to demonstrate a real implementation of the approach outlined in Section 6.3, experiments were performed with a simplified perception and discrete motions between the map nodes. This section details the implementation and presents the results of two experiments.

Figure 6.4: Left: The lab setup with four blue tiles on the floor. Right: An example omnidirectional image taken on the robot, with the four blue tiles visible.

6.4.1 Experimental Environment

In the experiments in this section a 640 × 480 RGB webcam was coupled with the 190° field of view miniature fish-eye lens described in Chapter 2, with a total cost of less than £100. The camera was positioned on the robot at the centre of rotation, facing the ground. A 2.7 × 2.4 m environment was constructed as shown in Figure 6.4, with no obstacles present. An example image taken with the camera is shown in Figure 6.4.

6.4.2 Local Appearance-Based Perception

Four blue tiles were placed in the environment as shown in Figure 6.4. The bearing θ to each of these tiles from the robot was used to form a 4D vector Θ⃗ given in Equation 6.6. The 4D vector was then used as the robot's perception:

Θ⃗ = (θA  θB  θC  θD)^T   (6.6)

To turn the camera image into a 4D perception vector, the blue tiles were detected in the image using opposing colour channel coding (see Section 2.4.1) and the bearing to each tile was calculated as

θ = arctan((Cu − Ru)/(Cv − Rv))   (6.7)

where (Cu, Cv) is the centre of the image and (Ru, Rv) is the centre of the imaged tile. The tiles were tracked throughout the traversal of the robot in order to ensure that the bearings vector Θ⃗ remained in the fixed order of Equation 6.6.
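The following sketch (Python with NumPy; an illustration rather than the thesis implementation, and assuming the blob detection has already produced the tile centres) shows how the 4D perception of Equation (6.6) and the SOM response of Equation (6.1) might be computed for this experiment.

import numpy as np

def tile_bearing(tile_centre, image_centre):
    """Bearing to one tile (Equation 6.7), using the quadrant-aware arctan."""
    Ru, Rv = tile_centre
    Cu, Cv = image_centre
    return np.arctan2(Cu - Ru, Cv - Rv)

def perception_vector(tile_centres, image_centre=(320, 240)):
    """4D bearing vector (Equation 6.6): bearings to tiles A, B, C, D in a fixed order."""
    return np.array([tile_bearing(c, image_centre) for c in tile_centres])

def classify_with_som(perception, som_weights):
    """Classified perception vector (Equation 6.1): Euclidean distance from the
    perception to every SOM node. som_weights has shape (N*M, q)."""
    return np.linalg.norm(som_weights - perception, axis=1)

# Example with a 7x7 SOM (49 nodes) over 4D bearing vectors; random weights
# stand in for the trained ones.
som_weights = np.random.rand(49, 4) * 2 * np.pi - np.pi
P = perception_vector([(400, 250), (300, 100), (150, 300), (330, 400)])
P_tilde = classify_with_som(P, som_weights)
best_matching_unit = int(np.argmin(P_tilde))

Only the sub-nodes stored under best_matching_unit (and its neighbours) then need to be compared against P_tilde when localising, which is the source of the efficiency analysed above.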
6.4.3 Training a SOM

In order for the robot to learn the perceptual space (Phase I, Section 6.3.1), it was manually driven around the entire arena shown in Figure 6.4, and 1500 4D bearing vectors as described in Section 6.4.2 were logged at a rate of 2 Hz. The bearing vectors were then used offline to train a 7 × 7 SOM.

It has been shown in [82] that the Euclidean distance is not as effective as the Mahalanobis distance when training SOMs with vectors of landmark bearings. However, the Euclidean distance was used in these experiments. The implication of this decision is that the SOM will not optimally index the space, with some nodes indexing a larger number of topological map nodes than others.

6.4.4 Mapping Nodes with Discrete Action Links

In the second learning stage, the robot explored the environment using only four discrete actions: forward 15 cm, backward 15 cm, left turn 45° and right turn 45°. Note that these four actions are reciprocal: they can undo each other. In total, approximately 5000 actions were performed.

Before and after each action, the perception was classified and mapped as detailed in Section 6.3.2. The links between topological map nodes were labelled with the action that was taken, and links in the opposite direction, labelled with the inverse of the actions, were created as shown in Figure 6.5. The explored links were assigned a cost value of 1.

Figure 6.5: An illustration of the mapping of actions between sub-nodes of the SOM. Sub-nodes are shown as small circles, SOM nodes as large circles and action links as directional arrows. The actions and links are represented by their initials: F (forward), R (right), L (left), B (back) and S (similarity link).

In addition to the links between nodes created as the robot explored, similarity links were created between sub-nodes with similar classified perception vectors. These links were assigned a cost value of 3 plus the Euclidean distance between the perception vectors. The additional cost assigned to similarity links reflects the fact that they are not as certain as links that have been previously traversed. The result is that known links will be preferred when navigating, and if three actions will achieve the same result as a similarity link then the robot will always choose to take the three already tried-and-tested actions.

The choice of how heavily the similarity links are weighted influences how likely they are to be selected when finding the shortest path between nodes in the topological map. In these experiments the choice was made by manually looking at the data that was being captured. However, this is not possible when a robot is deployed in everyday situations. In the experiments presented in Section 6.5 a more realistic method of weighting links is used.

6.4.5 Navigation with Discrete Actions

After completion of the two learning stages, the SOM-indexed topological map was ready to be used to navigate from the current location to the desired perception. This was achieved as follows:

1. Take the current 4D bearing vector and use the SOM to calculate the 49D vector of node responses.

2. Find the closest matching sub-node from the SOM by comparing the 49D vector to all the sub-nodes of the firing node and the nodes surrounding it.

3. Find the closest matching sub-node to the target perception in the same way as for the current perception.
4. Find the shortest path between the current sub-node and the target sub-node, taking into account the cost of traversing each link.

5. Execute the first action in the path.

6. If the destination has not been reached then go back to step 4.

6.4.6 Results

In this section results of experiments testing the navigation performance are presented. A MetraLabs Scitos G5 mobile robot was used in all experiments, and a Vicon tracking system was used to track the robot in order to assess the accuracy and reliability of the algorithm.

Navigations to a Goal

In order to assess the navigation method, the robot was placed at a goal location and the perception at that location was stored. The robot was then moved to a different location and the navigation algorithm described in Section 6.4.5 was applied. This was repeated with 10 different starting positions. The results are shown in Figure 6.6 and in tabular form in Table 6.1. The goal location is shown as a large diamond, and the start and end positions of each run are shown with filled circles and black dots respectively.

Figure 6.6: Ten navigations to a goal location from different starting positions. Each run is numbered at the starting position.

The results show that in all tests the robot was able to move closer to the target perception. The absolute mean error in the final position of the robot was 0.132 ± 0.024 metres, with a mean heading error of 2.98° ± 0.57°. The maximum and minimum position errors were 0.25 m and 0.009 m respectively.

            Start Position                  End Position
Run     X [m]    Y [m]   θ [rad]     X [m]    Y [m]   θ [rad]    Actions
 1      0.968   -0.898    1.569     -0.628    0.595    1.674       53
 2      0.676   -0.796    1.583     -0.715    0.715    1.591       61
 3     -1.052   -0.844    1.622     -0.865    0.810    1.491       83
 4      1.039    0.922    0.003     -0.762    0.822    1.514       65
 5      0.006    0.006    2.369     -0.583    0.574    1.511       30
 6      0.015    0.694   -0.008     -0.632    0.806    1.612       31
 7      0.392    1.038    1.577     -0.768    0.729    1.587       46
 8      0.016   -0.470   -1.549     -0.618    0.724    1.621       36
 9     -0.032    0.633    1.570     -0.595    0.653    1.483       31
10      1.148   -0.746    1.593     -0.713    0.746    1.570       68

Table 6.1: Ten navigations to a goal location from different starting positions. Run numbers correspond to the run numbers shown in Figure 6.6.

The route taken to the goal by the robot is often inefficient, as can be seen in Figure 6.6, with 83 actions being taken in run 3. This is due to the method favouring locations which the robot has previously been able to navigate to. Traversing the similarity links in the map is considered more expensive than regular action links, resulting in paths that contain more actions being "shorter" than paths containing similarity links. If more training were carried out in the second stage, more action links would be created and shorter paths would be found.

Real-time operation was comfortably achieved, with the map only consulted in between discrete actions while the robot was stationary. The robot speed for carrying out the actions proved to be the limiting factor of the algorithm, with a linear velocity of 0.1 m/s used to ensure repeatable actions.

Repeatability

The repeatability of the proposed navigation method was tested by performing five navigations to the same goal position, starting from approximately the same location each time. The results are shown in Figure 6.7. On each run the same path was taken, with differences in the end position only
Five navigations from approximately the same starting position to the same goal position (shown as a diamond). caused by slight variations in the starting position and actions not always resulting in the exact same motion of the robot. The fact that the same path was taken on each run shows that close positions in pose space are producing similar perceptions. The example of route ineciency marked in Figure 6.7 occurred for the same reasons as inecient routes in the rst experiment. The method favours locations which the robot has previously been able to navigate to. Traversing the similarity links in the map is considered more expensive than regular action links, resulting in paths that contain more actions being shorter than paths containing similarity links. 6.4.7 Discussion of Discrete Navigation Experiments The experiments in this section have demonstrated how the environment learning and navigation strategy proposed is both eective and cheap to implement. The experiments presented made use of a restricted sensor perception by relying on the presence of four blue objects as landmarks. Although this restriction is not 6.5. Experiments in Continuous Appearance-Based Navigation 111 completely unrealistic, as in many environments there are such well dened objects, the use of visual features was shown to be undesirable in Chapter 4. Furthermore, it was shown in Chapter 5 that with four easily identiable and constantly visible landmarks, a metric SLAM approach would be as eective as this approach. The use of the blue tiles simply served to demonstrate the viability of the approach, which is applicable to more general image appearance descriptors. The implementation presented in this section was further simplied by limiting the action space to just four carefully repeatable actions. In the next section the method is developed further to remove these limitations. 6.5 Experiments in Continuous Appearance-Based Navigation The experiments detailed in Section 6.4 showed that the approach to navigation with a SOM is possible, albeit in a specially constructed environment with the presence of detectable blue tiles presumed. In the experiments detailed in this section the image data is minimally processed before being used in an appearance-based manner. Using this approach image features need not be detected, matched or tracked, and no assumptions need to be made about the presence of specic types of features. In contrast to the previous experiments, continuous motion is used in this section and a more complete navigation algorithm is developed. 6.5.1 Experimental Environment The same camera as used in the experiments in the previous section was used in these experiments. In this experiment it was positioned facing the ceiling so that people walking around and furniture being moved would not interfere with the perception. 6.5. Experiments in Continuous Appearance-Based Navigation Figure 6.8: (a) The Scitos G5 robot used in the experiments. 112 (b) An example omnidirectional image taken on the robot with the camera facing the ceiling. The robot was restricted to a larger 4×4 metre area, again with no obstacles. The robot and an example distorted wide eld of view image captured by the camera are shown in Figure 6.8. 6.5.2 Global Appearance-Based Perception The image received from the camera is minimally processed before being used directly as the perception vector. The aims of the processing are to achieve rotation and illumination invariance and to reduce the size of the image. 
Rotation Invariance

A compass is desirable as it reduces the number of degrees of freedom of the robot and, as shown by Zeil et al. [127], when images are aligned to the same direction the difference between them varies smoothly over x, y space. Rotation invariance and bearing calculation can be achieved in a number of ways, for example magnetic compasses, visual compasses such as those detailed in Chapter 3.2.1.1, or Fourier space representations such as [100]. In these experiments a colour blob detection based algorithm for finding and removing the rotation from the images was used.

Two square tiles were placed on the ceiling of the laboratory, one red and one blue. These tiles were detectable as the largest red and blue regions in the undistorted 640 × 480 RGB image using opposing colour channel coding (see Section 2.4.1). When the image locations (u_r, v_r) and (u_b, v_b) of the two tiles are known, the image is rotated θ degrees so that the image locations are aligned vertically, that is u_r = u_b and v_b < v_r. The rotation is calculated as

θ = atan2(v_r − v_b, u_r − u_b).    (6.8)

Having a global compass bearing allows all odometry values of the robot to be expressed in the same x, y axis directions.

Illumination Invariance

The intensity of light varies over time, and a pixel-wise comparison of two images taken at exactly the same location at different times can be misleading. In order to remove some of the effects of varying illumination, the gradient magnitude image is computed by convolving with x and y Sobel gradient operators. The partial derivatives f_x and f_y for each pixel are combined to give a new pixel value:

G = √(f_x² + f_y²).    (6.9)

Image Size

The initial image supplied by the camera is RGB colour with a resolution of 640 × 480. This is reduced to a 160 × 120 greyscale image, from which a circular region of radius 50 pixels is taken from the centre. The circular region is mapped to a 7986-dimensional vector in a spiralling concentric circles pattern as shown in Figure 6.9. Although a column-wise or row-wise mapping would also be possible, a concentric circles mapping improves the rotational invariance by considering only the circular region and avoids discontinuities in the vector at the row boundaries.

The process of taking an image and turning it into the appearance descriptor is summarised in Figure 6.9.

Figure 6.9: The raw image is turned into an appearance-based descriptor.
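A hedged sketch of this pipeline is given below using OpenCV and NumPy. The blob threshold, the centroid-based tile detector, the rotation sign convention and the exact sampling pattern are illustrative assumptions rather than the processing used in the thesis; it is intended only to make the three stages (rotation removal, gradient magnitude, concentric sampling) concrete.

```python
# Sketch of the appearance descriptor of Section 6.5.2: derotation from two
# coloured ceiling tiles (Eq. 6.8), gradient magnitude (Eq. 6.9) and a
# concentric-circle vectorisation of the central disc.
import cv2
import numpy as np

def tile_centre(bgr, channel, opponent, threshold=60):
    """Centroid of the strongest opposing-colour response (e.g. red minus blue).
    Assumes the tile is visible; the threshold value is an assumption."""
    diff = bgr[:, :, channel].astype(int) - bgr[:, :, opponent].astype(int)
    v, u = np.nonzero(diff > threshold)
    return u.mean(), v.mean()

def derotate(bgr):
    """Rotate the image so the red and blue tiles are vertically aligned.
    The sign convention may differ from the implementation in the thesis."""
    ur, vr = tile_centre(bgr, 2, 0)      # red tile (OpenCV images are BGR)
    ub, vb = tile_centre(bgr, 0, 2)      # blue tile
    theta = np.degrees(np.arctan2(vr - vb, ur - ub))        # Eq. 6.8
    h, w = bgr.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), theta, 1.0)
    return cv2.warpAffine(bgr, M, (w, h)), theta

def descriptor(bgr):
    aligned, _ = derotate(bgr)
    grey = cv2.cvtColor(aligned, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(grey, (160, 120)).astype(float)
    gx = cv2.Sobel(small, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(small, cv2.CV_64F, 0, 1)
    grad = np.sqrt(gx ** 2 + gy ** 2)                       # Eq. 6.9
    # Sample the central disc of radius 50 pixels along concentric circles.
    cx, cy, samples = 80.0, 60.0, []
    for r in range(1, 51):
        angles = np.linspace(0, 2 * np.pi, int(2 * np.pi * r), endpoint=False)
        u = np.clip((cx + r * np.cos(angles)).astype(int), 0, 159)
        v = np.clip((cy + r * np.sin(angles)).astype(int), 0, 119)
        samples.append(grad[v, u])
    return np.concatenate(samples)   # roughly 8,000 elements, comparable to
                                     # the 7986-D descriptor described above
```

The exact dimensionality depends on how densely each circle is sampled; the spiral mapping in the thesis produces the 7986-element vector quoted above.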
6.5.3 Training a SOM

In order for the robot to learn the perceptual space (Phase I, Section 6.3.1), it was manually driven around covering the entire environment, and 1000 images were logged at a rate of 2 Hz. These images were then turned into 7986-dimensional perception vectors as described in Section 6.5.2 and used to train a 10 × 10 SOM. A graphical representation of the trained SOM is shown in Figure 6.10.

Figure 6.10: A graphical representation of the trained SOM using the perception vectors described in Section 6.5.2.

6.5.4 Mapping Nodes with Continuous Action Links

In the second learning stage, the robot was again driven around the environment. Once per second, the current classified perception vector P̃ and the corresponding robot heading θ were stored as a sub-node under the winning node in the SOM as detailed in Section 6.3.2. The links between topological map nodes were labelled with the difference in the robot's odometry between the perceptions. As the robot explored the environment a dense topological map of 4000 nodes was constructed.

As new topological map nodes were created, the nodes assigned to the same SOM node were compared. A similarity link was created between sibling nodes with similar perception vectors, and these links were labelled with the difference in the robot heading (θ) between the nodes and the Euclidean distance between the nodes' perception vectors. This information was then used when navigating with the map.

6.5.5 Navigation with Continuous Action Links

The local odometry offsets stored for each of the links in the route between the target node and the current node need to be combined to find the total distance the robot needs to travel. In the experiments detailed in Section 6.4 the robot was driven by the actions specified in the route without explicitly considering the uncertainties involved in the localisation or the route found. However, a better approach is to formulate the problem in a probabilistic way, encompassing all of these uncertainties. To efficiently achieve this the framework of a linear Kalman filter with a one dimensional state space was used.

The topological map can be considered as a noisy sensor that returns how far in metric terms the robot must move to get to the goal location. The measurement of the x and y offsets from the current location to the goal location are corrupted by noise caused by poor localisation and inconsistencies in the map. Acknowledging this at the point of navigation avoids the need to create or maintain a consistent map.

Figure 6.11: A flow diagram of the navigation algorithm encompassing the inaccuracies in the map by employing a linear Kalman filter.

Figure 6.11 shows a flow chart of the navigation algorithm. Although the errors in the x and y goal distances are not independent, using two one dimensional filters results in lower computational cost while still remaining operational. To filter the measured distance to the goal location, the usual predict-update cycle of the Kalman filter was followed:

1. The current x and y offset is predicted as

x̂_k = x̂_{k−1} − u_{x,k}    (6.10)
ŷ_k = ŷ_{k−1} − u_{y,k}    (6.11)
σ_{x,k} = σ_{x,k−1} + q_{x,k}    (6.12)
σ_{y,k} = σ_{y,k−1} + q_{y,k}    (6.13)

where (x̂, ŷ) is the offset to the goal location, and (u_x, u_y) is the odometry distance travelled since the last cycle. The subscript k is the time step, (σ_x, σ_y) is the variance of the filtered goal offset and (q_x, q_y) is the variance of the odometry distance travelled since the last filter cycle (2% of the distance covered in experiments reported here). The initial estimate (x̂_0, ŷ_0) is set as the first measurement of the offset made using the topological map.

2. A measurement of the offset is made using the topological map. This is achieved using a weighted breadth first search to find the shortest path between the current node and the target node, with the cost of each link set as the variance of the measurements involved in the link. In these experiments odometry offset links were weighted with a variance of 2% of the distance the link covers, equating to the confidence in the odometry system. The similarity links were weighted in relation to the distance between the perception vectors, up to a maximum of 5 cm. The variance (r_x, r_y) of the measured offset from the current location to the goal location is computed as the sum of all the variances in the route. This measured offset is then used to update the offset estimate (x̂, ŷ) and its certainty measure (σ_x, σ_y):

x̂_k = x̂_k + (σ_{x,k} / (σ_{x,k} + r_{x,k})) (z_{x,k} − x̂_k)    (6.14)
σ_{x,k} = σ_{x,k} − σ_{x,k}² / (σ_{x,k} + r_{x,k})    (6.15)
ŷ_k = ŷ_k + (σ_{y,k} / (σ_{y,k} + r_{y,k})) (z_{y,k} − ŷ_k)    (6.16)
σ_{y,k} = σ_{y,k} − σ_{y,k}² / (σ_{y,k} + r_{y,k})    (6.17)

where (z_x, z_y) is the offset to the goal location as measured by re-localising within the topological map and finding a route to the goal node from the current node.

3. The Kalman filtered (x̂, ŷ) offset is then used as the goal position in a robot-centred coordinate frame, and a controller based on the dynamic window approach [37] is used in order to drive the robot towards the goal. When either the offset lies within one standard deviation of 0, or is less than 0.15 m, then the navigation process is terminated and the goal is assumed to be reached.
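A minimal sketch of the two independent one-dimensional filters is given below. It is an illustration under stated assumptions, not the thesis implementation: the topological-map measurement (topo_map.offset_to), the robot and controller interfaces are stubs, the 2% variance figures follow the values quoted above, and the per-axis reading of the "within one standard deviation of 0" termination test is an assumption.

```python
# Sketch of the per-axis linear Kalman filter of Equations 6.10-6.17.
import math

class OffsetFilter:
    """Scalar filter for one axis of the offset to the goal."""
    def __init__(self, z0, r0):
        self.x, self.var = z0, r0        # initialised from the first map measurement

    def predict(self, u, q):
        """u: odometry distance moved along this axis since the last cycle,
        q: variance of that odometry distance."""
        self.x -= u                      # Eq. 6.10 / 6.11
        self.var += q                    # Eq. 6.12 / 6.13

    def update(self, z, r):
        """z: offset measured from the topological map, r: summed link variance."""
        gain = self.var / (self.var + r)
        self.x += gain * (z - self.x)    # Eq. 6.14 / 6.16
        self.var -= gain * self.var      # Eq. 6.15 / 6.17

def navigate(robot, topo_map, goal_node):
    # topo_map.offset_to() stands in for the weighted breadth first search of
    # step 2, returning the measured (x, y) offset and its summed variances.
    zx, zy, rx, ry = topo_map.offset_to(goal_node)
    fx, fy = OffsetFilter(zx, rx), OffsetFilter(zy, ry)
    while True:
        ux, uy = robot.odometry_delta()              # motion since the last cycle
        fx.predict(ux, 0.02 * abs(ux))               # 2% odometry variance
        fy.predict(uy, 0.02 * abs(uy))
        zx, zy, rx, ry = topo_map.offset_to(goal_node)
        fx.update(zx, rx)
        fy.update(zy, ry)
        near = abs(fx.x) < math.sqrt(fx.var) and abs(fy.x) < math.sqrt(fy.var)
        if near or math.hypot(fx.x, fy.x) < 0.15:    # termination test of step 3
            return
        robot.drive_towards(fx.x, fy.x)              # dynamic window controller stub
```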
This method of navigation allows odometry drift errors and localisation errors to be smoothed and ignored without having to employ computationally costly topology checks such as those carried out in [29]. As the robot approaches a target, the errors in the map will get smaller. Provided the robot continually re-evaluates the necessary route, then these errors will not prevent navigation.

6.5.6 Results

In this section the results of experiments which tested the navigation performance of the method presented in Section 6.5.5 are presented. The MetraLabs Scitos G5 mobile robot shown in Figure 6.8 was used in all experiments, and a Vicon tracking system was used to track the robot in order to assess the accuracy of the algorithm.

Navigations to a Goal

The robot was placed at a goal location and the perception at that location was stored. The robot was then moved to a different location and the navigation algorithm described in Section 6.5.5 was applied. This was repeated with 15 different starting positions. The results are shown in Figure 6.12. The goal location is shown as a large diamond, and the start and end positions of each run are shown with open circles and stars respectively. Figure 6.12b shows the filtering of the offset estimation during one of the navigations.

Figure 6.12: (a) Fifteen navigations to a goal location from different starting positions. The goal location is shown as a large diamond, the finishing positions are shown as stars and the starting positions as empty circles. Each run is numbered at the starting position. (b) The Kalman filtered x and y offsets during one of the 15 goal navigation runs.

The results show that in all tests the robot was able to move closer to the target perception. The absolute mean error in the final position of the robot was 0.283 ± 0.027 metres. The maximum and minimum position errors were 0.456 m and 0.137 m respectively.

The route taken to the goal by the robot was often not completely direct, as can be seen in runs 1 and 13. This is because the offset to the goal location, as measured using the topological map, fluctuates around the true offset, and a combination of the Kalman filter and the physical robot causes a smooth trajectory to be taken. In run 1 the robot can be seen to initially be driving away from the target.
However, this is not a result of errors in the topological map, but is rather caused by the use of the dynamic window approach to control the robot. In this case the robot was initially facing away from the goal, and in order to achieve a continuous smooth trajectory the linear velocity was kept at a minimum of 0.2 m/s.

Again real-time operation was comfortably achieved. In this experiment the speed of localising and finding a route through the map is critical, as unlike in the previous experiments, the robot is moving during the calculation. The filter-based navigation algorithm was run at 10 Hz, with a single cycle of the navigation algorithm (localising and finding the remaining distance to travel) taking approximately 85 ms. This imposes a limit on the velocity that the robot can be driven at because of the lag between taking an image and knowing the position. For example, at a velocity of 1 m/s the robot will have moved 10 cm during the time taken to localise. At the typical speeds of 0.3 m/s used here the lag is negligible.

Random Navigations

The navigation algorithm was further tested by performing 300 navigations to a range of map nodes. The destination map nodes were automatically selected at random, and each trial took place sequentially. The difference between the finishing location of the robot and the target node location in external world coordinates was recorded. A two dimensional scatter of these errors is shown in Figure 6.13.

Figure 6.13: A scatter plot of the errors for 300 navigations.

Figure 6.13 shows that the errors in the ending x and y positions approximately follow normal distributions. For the error in the y direction, the mean error is 0.045 metres with a standard deviation of 0.6012 metres. For the x direction the mean error was −0.0186 metres with a standard deviation of 0.5313 metres.

On some of the navigations the finishing position error was large, and indeed the navigation was manually terminated outside of the mapped area. These are examples of complete failure of the system, and occur whenever the robot drives outside of the mapped region. When outside of the mapped region the localisation cannot succeed, and getting back to the known area is only possible if the mapping phase is still active. Overall, of the 300 navigations, 88% finished within 1 metre of the target and 67% finished within 0.5 metres.

Route Following

As a further qualitative demonstration of the algorithm, a route following experiment was performed. The robot was manually driven to five waypoint locations, and the topological map node was recorded.
The robot then autonomously navigated along the route through these points using the navigation algorithm described in Section 6.5.5. Fifteen loops of the route are shown in Figure 6.14.

Figure 6.14: Fifteen navigations of a route described by five waypoints indicated by crossed circles. Red circles indicate when the robot believed the waypoint had been reached.

The navigation algorithm was able to correctly navigate the robot around the five waypoint course. The paths taken were all correct with the notable exception of the circular loop indicated. The robot performed this loop because it overshot the target location and then was facing in the wrong direction. At this point the dynamic window controller used to drive the robot maintained a linear velocity of 0.2 m/s while trying to get back to the goal.

6.5.7 Discussion of Continuous Navigation Experiments

The experiments presented in Section 6.5 have extended those presented in Section 6.4 by allowing an arbitrary motion between nodes and by using the image directly as the perception. The algorithm has been shown to be applicable in a more general setting, and the accuracy has been quantitatively and qualitatively evaluated.

The proposed approach remains partially limited by the requirement for the robot heading in an external reference frame to be known, as provided here by two coloured ceiling tiles. However, this limitation is not completely restrictive. A number of visual compass approaches have been developed (see Chapter 3.2) that could be used to provide such information without the need for artificial landmarks.

6.6 Limitations of the Method

There are a number of limitations to the new navigation method proposed, each of which could be investigated further in future research.

Firstly, perceptual aliasing has not been explicitly considered. In particular, if two locations in the environment produce the same perception then the localisation will fail. This in turn means that the constructed map will be incorrect. This is partially solved here by using a Kalman filter to combine goal distance estimates with predictions based on odometry distance travelled when navigating with the map. However, the algorithm could potentially fail if there are many instances of perceptual aliasing.

If the robot navigates outside of the mapped area (as was the case in some tests in the random navigations experiment in Section 6.5.6) then navigation will fail. Solutions to this problem have not been considered here, but could involve keeping the mapping phase active during the navigation. However, this in turn highlights an additional limitation; although the complexity of the algorithm is low (linear in the number of map nodes), if the number of map nodes increases unbounded then localising will eventually become computationally intractable. A potential solution to this would be to introduce a node pruning strategy, although in the algorithm detailed here this has not been considered.

A further limitation of the proposed algorithm is that it assumes that the environment is static. If this is not the case, then localisation within the map may fail. A potential solution to this problem is to only map parts of the environment that are static.
This suggests adding a component to the algorithm that differentiates between static and dynamic parts of the image, and could form an interesting direction for future research.

6.7 Summary

In this chapter a new environment learning and navigation strategy has been proposed. The method makes use of dense topological mapping to allow odometry information to be successfully used for navigation, while at the same time maintaining efficiency through the use of a SOM to index the perceptual space. Experiments with both a simplified blob detection based perception and a general appearance-based approach have shown the effectiveness of the algorithm.

The self-organising map has been shown to be a suitable tool for clustering the perceptual space, and the use of a SOM within the algorithm not only allowed greater efficiency by compressing the vectors, it also allowed increased speed by reducing the number of comparisons that were required during localisation. This allowed the advantages of dense topological mapping detailed in Section 6.2.1 to be realised, and the disadvantages detailed in Section 6.2.2 to be avoided.

The proposed method and experiments allow the following conclusions to be drawn regarding the best method of achieving reliable navigation on a low-cost low-powered commercially viable robot:

• Topological mapping combined with local odometry based metric information is a feasible approach for navigation with low-cost robots. Specifically, a dense topological mapping approach to navigation is desirable when the map is compressed and indexed by a SOM as it allows the navigation of arbitrary routes through the environment.

• Metric consistency is an expensive goal, and is not necessary for effective robot navigation. Ignoring inconsistencies in the map until the point of navigation reduces the computational cost.

• Appearance-based approaches to the use of the camera data are effective, general and efficient, and are therefore well suited for use on low-cost robots.

The accuracy of the method presented was shown to be in the range of 0 to 1 metre. When a higher accuracy than this is needed, for example when docking the robot, an alternative method of navigation needs to be employed. In the next chapter an efficient method for accurate local navigation is developed that does not make use of a map, but rather works directly with a snapshot of a target location in order to manoeuvre into a more accurate position.

7 Accurate Visual Appearance-Based Local Navigation

7.1 Introduction

In Chapter 6 a system was developed that was capable of navigating the robot to within 1 m of the goal location. The system demonstrated that visual appearance-based algorithms are suitable for low-cost robot navigation, and that computationally demanding geometric methods such as those developed and investigated in Chapters 4 and 5 are not necessary for an approximate navigation system. However, when a more accurate navigation is required an algorithm such as the one developed in this chapter is needed.

A higher level of accuracy is required when docking with a charging station, for example, where the required level of accuracy depends on the size and type of the docking station. A MetraLabs Scitos G5 mobile robot like that used in the experiments in this thesis has two pickups next to the wheels that are lowered when the robot stops. For charging to be possible the robot must be stopped so that each contact is above a different floor tile as shown in Figure 7.1.
Due to the spacing between the contacts and the size of the floor tiles the required accuracy is about 20 cm.

Figure 7.1: The powered floor arrangement at the University of Ulster. The robots have two contacts on the base that are lowered when the robot stops moving. For charging to be possible each contact must be placed over a different floor tile, one positive and one negative. This requires the robot to stop with an accuracy of about 20 cm.

A similar arrangement is employed by MobileRobots Inc with their Pioneer 3 DX charging mat, with a required accuracy of about 15 cm.

Combining an approximate navigation algorithm such as that developed in the previous chapter with an accurate local docking procedure yields a two-tiered navigation strategy. Such an approach is common in navigation systems where a high level of accuracy is not required for all operations, but may incur a higher computational cost. The aim of this chapter is to develop a complementary navigation algorithm to that presented in Chapter 6, with a shorter range but higher accuracy, as a process to be applied after navigating to within 1 m of the goal location. In many situations it is not necessary for a docking procedure to exhibit smooth motion, although a computationally efficient algorithm is still required.

The accurate local navigation algorithm developed in this chapter employs the snapshot model discussed in Chapter 3, and performs computationally efficient image processing of the current location and the target location images. Like the previous chapter, only the global visual appearance is used, avoiding the need to extract and match image features. The proposed algorithm differs from the existing snapshot model based approaches described in Chapter 3 in that no costly feature extraction is necessary, and an isotropic distribution of objects in the environment is not assumed. As in existing image-based homing methods such as Franz et al. [40], 1D images are used for efficiency. However, rather than a single radial 1D image, two 1D representations are created from the input 2D image to allow fast translation estimation.

In the following section the details of the proposed algorithm are given, and in Section 7.3 experiments are detailed. A comparison of the proposed method to those discussed in Chapter 3 is given in Section 7.4, and the computational efficiency and suitability to low-cost robots is discussed in Section 7.4.1.

7.2 Appearance-Based Docking Algorithm

In the algorithm developed in this chapter the camera is placed at the centre of rotation, with the optical axis perpendicular to the robot and the direction of view pointing towards the ceiling in exactly the same way as in the second set of experiments in Chapter 6. The omnidirectional lens used in Chapter 6 is used again to provide a wide field of view, ensuring substantial overlap between the current and the target image. The complete field of view is not used, but rather just a central 90° area which is undistorted, as described in Chapter 2, to create a perspective image.

In order to navigate the robot to the target image, the translation and rotation between the current image and the target image are calculated. As the ceiling is approximately planar with only small variations in height, all pixels in the current image will be related to pixels in the target image by an image homography.
Restricting the motion of the robot to the floor plane, which is approximately parallel to the ceiling plane, further reduces the global transformation between images to a single rotation and translation (as can be seen in Figure 7.2).¹ This is similar to opening an image in a graphics program and rotating and translating it.

Figure 7.2: Example current and target images. The transformation between them can be described by one rotation followed by one translation. In this example if the current image is rotated clockwise by 93° and shifted up and right then it matches the target image.

The image transformation between the current view and the target view directly relates to the physical transformation between the current location and the target location. Since the camera is placed at the robot's centre of rotation, the image rotation is equal to the physical rotation, and the image translation is linearly proportional to the physical translation. The algorithm first calculates the rotation between the current and target images, and then finds the translation by aligning the images to face the same direction. After the images are aligned they are reduced to two 1D representations.

The goal of the method is similar to the ego-motion estimation algorithm presented in Chapter 4, but here the distance between images separated by up to 1 m is calculated whereas in Chapter 4 only a small rotation and translation between consecutive video frames was considered. Furthermore, here a metric estimate of distance is not sought. The following two sections detail the approach.

¹ Note that assuming that the ceiling plane is parallel to the floor plane results in a weak perspective projection. Although this is only an approximation, it allows the transformation between images to be considered as only a rotation and translation, with no perspective shear.
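To make the reduction explicit, the following worked equation is a hedged illustration; the pixel symbols (u, v), (u′, v′) and (t_u, t_v) are introduced here and do not appear in the thesis. A general planar homography has eight degrees of freedom, but with the image plane kept parallel to the ceiling, the camera height fixed and the ceiling height roughly constant, there is no perspective shear and essentially no scale change, so corresponding pixels are related by a 2D Euclidean transform:

(u′, v′)ᵀ = [cos ∆θ  −sin ∆θ; sin ∆θ  cos ∆θ] (u, v)ᵀ + (t_u, t_v)ᵀ

where ∆θ equals the robot rotation and the pixel translation (t_u, t_v) is proportional to the robot translation on the floor plane. This is exactly the rotation-then-translation model used in the rest of this chapter.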
7.2.1 Calculating Rotation

In Chapter 4 three methods of visual compass were compared, namely the Labrosse compass [73], a 2D optic flow approach and a 1D optic flow approach. Each method aimed to calculate the rotation of the robot between two video frames. However, when the robot has travelled a large distance between the two video frames, neither the 1D nor the 2D optic flow approaches are applicable, and the Labrosse compass can be adversely affected by strong visual features such as ceiling lights.

An alternative approach is taken here that exploits the fact that the ceiling is planar. The rotation ∆θ between the target and current image is efficiently found without any additional sensor information by building histograms of the edge orientations in each image as follows:

1. Convolve the x and y Sobel edge templates with the images to find the approximate partial derivatives δ_x and δ_y of each image. The edge orientation φ of each pixel is then defined as atan2(δ_y, δ_x), and the edge magnitude G is defined as √(δ_x² + δ_y²).

2. For each image build a histogram of the edge orientations of all pixels with an edge magnitude over a threshold µ. A threshold is needed to remove the high frequency noise present in the images. In the experiments reported here, µ = 150 and the histograms contained one bin per degree. The choice of value for µ was made by manually looking at histograms with differing values of µ, and selecting a value that resulted in histograms that were not influenced by high frequency noise. Values of µ greater than 100 were found to be sufficient to remove noise.

Two example histograms are shown in Figure 7.3. The histogram for the current image is similar to the histogram for the target, but shifted horizontally by the rotation between the two images. This is because the edge element (edgel) orientations are only affected by the rotation of the robot, and not by the translation. Although some pixels will be recorded in the histogram of the current image that are not in the histogram of the target image, the majority of edges are visible in both images so are recorded in both histograms. This is due to the use of a wide angle lens and the assumption that the robot is no more than 1 m away from the target.

3. The rotation between the two images is then calculated by finding the horizontal shift between the histograms. Let H_c(φ) and H_t(φ) equal the frequency of edge orientation φ in the current and target image histograms respectively. Note that H_c and H_t are periodic, in the range 0…2π. The rotation between the images is then found as the rotation that gives the minimum sum of squared errors between the histograms:

∆θ = argmin_θ Σ_{φ=0}^{2π} (H_c(φ + θ) − H_t(φ))².    (7.1)

The minimum is found by testing all values of θ from 0…2π in increments of one degree.

The accuracy of the rotation calculation is partially dictated by the resolution of the edge orientation histograms. In the experiments reported here histograms with one bin per degree were built and no interpolation between bins was carried out, giving a theoretical maximum accuracy of ±0.5°.

Figure 7.3: The rotation between the target image and the current image can be seen as a shift in the edgel orientation histograms of the images.
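The steps above map directly onto a few NumPy and OpenCV calls. The sketch below is a hedged illustration under the stated thresholds; function names are illustrative and the magnitude scale of the Sobel responses may differ from the implementation used in the thesis.

```python
# Sketch of the edge-orientation-histogram compass of Section 7.2.1.
import cv2
import numpy as np

def orientation_histogram(grey, magnitude_threshold=150.0):
    """One-degree histogram of edge orientations for pixels whose Sobel edge
    magnitude exceeds the threshold (mu = 150 in the text)."""
    dx = cv2.Sobel(grey.astype(float), cv2.CV_64F, 1, 0)
    dy = cv2.Sobel(grey.astype(float), cv2.CV_64F, 0, 1)
    magnitude = np.hypot(dx, dy)
    orientation = np.degrees(np.arctan2(dy, dx)) % 360.0
    strong = orientation[magnitude > magnitude_threshold]
    hist, _ = np.histogram(strong, bins=360, range=(0.0, 360.0))
    return hist.astype(float)

def rotation_between(current_grey, target_grey):
    """Shift (in degrees) minimising the sum of squared differences between
    the two periodic histograms (Equation 7.1)."""
    hc = orientation_histogram(current_grey)
    ht = orientation_histogram(target_grey)
    errors = [np.sum((np.roll(hc, -shift) - ht) ** 2) for shift in range(360)]
    return int(np.argmin(errors))   # accuracy limited to the one-degree bin width
```

The exhaustive scan over 360 candidate shifts mirrors the one-degree search described in step 3.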
7.2.2 Calculating Translation

After finding the rotation between the current and target location, it is necessary to find the image translation. This is done efficiently by utilising average intensity row and column images. The current image is first rotated by ∆θ to align it with the target image, and then for both the target location image and the current location image, an average row intensity image and an average column intensity image are created, as shown in Figure 7.4.

Figure 7.4: The 1D average row and column images are constructed from the 2D image by averaging the scanlines.

The one dimensional shift between the target row image and the current row image, and between the target column image and the current column image are then found. These shifts (shown in Figure 7.5) are equivalent to the ∆x and ∆y image translations, which equate to how far in pixels the robot needs to move to get to the target location.

Figure 7.5: Example average row and average column 1D images for the target and current images shown in Figure 7.2 after the current image has been aligned with the target image. The shift between the images gives the translation.

The process of creating the one dimensional images and calculating the necessary translation is as follows:

1. Rotate the current image about the centre by the value of ∆θ that was found using Equation (7.1) so that the target image and the current image are aligned in the same direction.

2. Create one dimensional vertical and horizontal representations of the aligned current and target images C(u, v) and T(u, v):

V_c(i) = (1/U) Σ_{u=0}^{U} C(u, i),    i = 1…V    (7.2)
H_c(i) = (1/V) Σ_{v=0}^{V} C(i, v),    i = 1…U    (7.3)

and similarly for the target image, where V_c and V_t are the vertical average intensity images of the current and target images, H_c and H_t are the horizontal average intensity images of the current and target images, and U, V are the width and height of the images.

3. Approximate the gradient of the one dimensional images:

V_c′(i) = V_c(i + 1) − V_c(i),    i = 1…V − 1    (7.4)

and similarly for V_t, H_c and H_t. By doing so, some of the effects of varying illumination can be eliminated.

4. Extract the central region of H_c′ and V_c′ as templates, and locate the best matching location in H_t′ and V_t′ using template matching and normalised cross-correlation. Only the central region is considered because the sides of the one dimensional representation may not be visible in both images. In the experiments reported here a 40% central region was used as a compromise between the higher computational cost caused by a bigger template region and the susceptibility to incorrect matches caused by using a small region. A 40% region size when using 480 × 480 pixel images allows for an image difference of up to 144 pixels, which the experiments show is ample for a 1 m range.

The result of the translation estimation is a vertical shift v and a horizontal shift u. These shifts can be expressed as a 2D vector (∆x, ∆y)ᵀ that gives the direction and distance (in pixels) that the robot needs to move to get to the target location. This is done by rotating the shifts by ∆θ to move into the coordinate frame of the current robot position:

(∆x, ∆y)ᵀ = [cos ∆θ  −sin ∆θ; sin ∆θ  cos ∆θ] (u, v)ᵀ.    (7.5)

This vector is often referred to as the homing vector in image-based homing methods [18].
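The following sketch restates the translation estimation in Python. It assumes the alignment of step 1 has already been applied, follows the 40% central region and normalised cross-correlation described above, and uses illustrative sign conventions that may differ from the thesis implementation.

```python
# Sketch of the 1-D translation estimation of Section 7.2.2 and Equation (7.5).
import numpy as np

def one_d_profiles(image):
    """Average column (vertical) and average row (horizontal) 1-D images,
    differenced to reduce illumination effects (Equations 7.2-7.4)."""
    vertical = image.mean(axis=1)        # one value per image row
    horizontal = image.mean(axis=0)      # one value per image column
    return np.diff(vertical), np.diff(horizontal)

def best_shift(template_signal, target_signal, central_fraction=0.4):
    """Shift of the central region of `template_signal` that best matches
    `target_signal` under normalised cross-correlation."""
    n = len(template_signal)
    start = int(n * (1 - central_fraction) / 2)
    template = template_signal[start:start + int(n * central_fraction)]
    template = (template - template.mean()) / (template.std() + 1e-9)
    best, best_score = 0, -np.inf
    for offset in range(n - len(template) + 1):
        window = target_signal[offset:offset + len(template)]
        window = (window - window.mean()) / (window.std() + 1e-9)
        score = float(np.dot(template, window)) / len(template)
        if score > best_score:
            best, best_score = offset, score
    return best - start                  # displacement relative to the centre

def homing_vector(current_aligned, target, delta_theta_rad):
    v_c, h_c = one_d_profiles(current_aligned)
    v_t, h_t = one_d_profiles(target)
    v = best_shift(v_c, v_t)             # vertical shift (rows)
    u = best_shift(h_c, h_t)             # horizontal shift (columns)
    c, s = np.cos(delta_theta_rad), np.sin(delta_theta_rad)
    dx = c * u - s * v                   # Equation (7.5)
    dy = s * u + c * v
    return dx, dy
```

In the controller described in the next section, the magnitude √(∆x² + ∆y²) of this vector sets the drive distance and its direction sets the turn.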
7.2.3 Controlling the Robot

In order to drive the robot to the target image, the following simple control algorithm was employed:

1. Estimate (∆x, ∆y)ᵀ and ∆θ using Equations (7.1) and (7.5).

2. Rotate the robot on the spot so as to be facing in the direction of the goal, where the direction to the goal is given by arctan2(∆y, ∆x). This is performed using the odometry information provided by wheel encoders.

3. Using the wheel encoders for feedback, drive n cm forward, where n is calculated using Equation (7.6):

n = k √(∆x² + ∆y²).    (7.6)

In experiments k = 1 cm.

4. Repeat the procedure until √(∆x² + ∆y²) < τ, where τ is the distance in pixels that is accepted as home.

5. Rotate the robot on the spot by ∆θ so as to be facing in the same direction as the goal.

Whilst this approach results in non-smooth trajectories, the application of this algorithm is as an additional procedure to be run after navigation to the vicinity of a goal position has been carried out using a method like that developed in Chapter 6. Under these circumstances a smooth trajectory is not usually required.

7.3 Experiments and Results

In order to assess the performance of the algorithm, two sets of experiments were carried out using a Scitos G5 mobile robot. The environment used was the University of Ulster robotics arena shown in Figure 7.1, with no obstacles present. A Vicon tracking system was used to record sub-millimetre accurate ground truth.

7.3.1 Experiment 1: Navigations to a Goal

The performance of the proposed navigation algorithm was tested by performing 50 navigations to a goal location from randomly selected positions within a distance of 1 m to the goal. The control algorithm outlined in Section 7.2.3 was used with a tolerance of τ = 1 pixel, stopping the robot when the image distance between the current view and the target view was found to be 1 pixel or less.

Histograms of the errors in the stopping position are shown in Figure 7.6. The mean heading error was 4.45° (standard deviation 2.0685°) and the mean distance from the goal after homing was 0.0648 metres (standard deviation 0.0265 metres).

Figure 7.6: Histograms of the heading and (x, y) position errors in the stopping position for 50 runs from random start positions up to 1 m from the target.

Although on every run the robot was able to navigate to the home position, on some runs it did not move directly towards the goal. This occurred when a poor heading estimate was found, as is quantified in the following experiment. On average 16 steps were required by the robot to navigate to the home image.

7.3.2 Experiment 2: Evaluation of the rotation and translation estimation

In order to better evaluate the performance of the (∆x, ∆y, ∆θ) estimation, the robot was placed in an empty environment and an image was acquired. The image was saved as the target image, and the robot was manually driven in a random pattern around the target. The translation and rotation estimation was executed during the robot traversal, and the calculated (∆x, ∆y, ∆θ) values were recorded along with the ground truth position of the current and target images over 800 frames.

Figure 7.7: (a) Histogram of the ∆θ estimate error for 800 tests over a range of 0 to 1.8 metres. (b) The errors plotted against the physical distance, showing no correlation.

Figure 7.7a shows a histogram of the ∆θ estimate error for each frame and a plot of the error against the physical distance for each frame. As can be seen in the histogram, other than 8 estimates that were incorrect by 180°, the error distribution is approximately Gaussian with a mean of 0.1375° and a standard deviation of 8.4569°. Figure 7.7b shows that the angular error does not depend on distance within the range tested.
A Spearman rank correlation test was performed in order to test for correlation between angular error and distance. The null hypothesis that no significant correlation is present was accepted at a 5% significance level. The method of estimating the difference in orientation between images proved to work well even up to a distance of 1.8 m. The 8 orientation estimates that were incorrect by 180° were the result of symmetries in the environment, and only represented 1% of the trials. The effect of the error is that the robot navigates in the wrong direction for one navigation step, and in subsequent navigation steps then corrects this problem.

Figure 7.8: The image distance against the physical distance. The boxed area shows the target region of the algorithm, i.e. navigations within a range of 1 m.

Figure 7.8 shows a plot of the image distance √(∆x² + ∆y²) against the physical distance for each frame. The boxed region highlights the values in a range up to 1 m, which is the target range for the algorithm. Outside of this region it can be seen that estimates become increasingly noisy.

In ideal circumstances the relationship between physical distance and image distance should be linear, and a robust estimate of the linear relationship is shown by a line in the plot.² By using the parameters of the fitted line to convert the image distances to physical distances, the physical error between estimated distance and actual distance can be found. The mean error of the estimated distance for ranges of up to 1 m was −3.34 × 10⁻⁴ metres and the standard deviation was 0.1004 metres.

² The robust estimate of the linear relationship between physical distance and image distance was found using the iteratively re-weighted least squares routine provided by the MATLAB robustfit function.

As an evaluation of the direction of the homing vector, [40] suggested the use of the Average Homing Component (AHC). This is the projection of the normalised estimated homing vector onto the normalised true homing vector, computed as the dot product of the two normalised vectors. The AHC ranges from -1 to 1, where a value of 1 indicates a perfect direction and -1 indicates the opposite direction. Any value above zero will drive the robot towards the goal, but the higher the number the more direct the route. The AHC for all 800 samples is shown in Figure 7.9. The median value was 0.9614, which is equivalent to a direction error of 16 degrees.

Figure 7.9: The average homing component of the estimated displacement vectors for all 800 frames, computed as the dot product of the normalised homing vector with the true homing vector. (a) A plot of the homing component against physical distance, showing that the homing component is weakly correlated with the distance to the goal. (b) A histogram of the homing components.

The Spearman rank correlation coefficient between the AHC and the distance to the target was calculated as 0.1371. The null hypothesis that there is no correlation was rejected at the 5% significance level, indicating that there is a weak correlation. This is expected because the translation is estimated to the nearest pixel. As the distance decreases, rounding errors in the translation estimate have a larger effect on the AHC.
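For reference, the AHC reduces to a two-line computation. The sketch below simply restates the definition above; the example vectors are hypothetical.

```python
# The Average Homing Component: the normalised estimated homing vector
# projected onto the normalised true homing vector.
import numpy as np

def average_homing_component(estimated, true):
    e = np.asarray(estimated, dtype=float)
    t = np.asarray(true, dtype=float)
    return float(np.dot(e / np.linalg.norm(e), t / np.linalg.norm(t)))

# Example: an estimate 20 degrees off the true direction gives cos(20 deg) ~ 0.94.
print(average_homing_component([np.cos(np.radians(20)), np.sin(np.radians(20))],
                                [1.0, 0.0]))
```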
The experiments suggest that the homing algorithm can be expected to achieve a final position accuracy of at worst 10 cm 68% of the time (1 standard deviation), and 20 cm 95% of the time (2 standard deviations). However, in practice the final position accuracy was higher in the results for experiment 1 as the robot did not stop until an image distance of 1 pixel was found. This meant that the robot occasionally oscillated about its final position before deciding it was home.

7.4 Discussion

As a comparison of the accuracy of the method developed in this chapter to the state of the art, Table 7.1 shows the published accuracy of related methods cited in Chapter 3. It is important to note that this only provides a general comparison as the cited methods have not been tested with the same data, or in the same environment.

Method | Accuracy | Comments
Bekris et al. [10] | Mean 2 cm, std 3 cm | Using artificial landmarks
Franz et al. [40] | ≈ 1 cm | From distances of up to 25 cm
Labrosse [74] | ≈ 60 cm | The target distance is about 4 m
Röfer [104] | Always less than 1.5 cm | Number of experiments and environment not mentioned
Stürzl and Mallot [114] | ≈ 3 cm | Evaluated offline with a recorded image set
Proposed method | ≈ 6 cm |

Table 7.1: Published accuracy of several methods discussed in Chapter 3 compared to the accuracy of the method presented in this chapter.

The accuracy of the new ceiling plane based method is similar to that of [10, 40, 104, 114], and has a number of important advantages. Firstly, unlike [40, 104, 114], an isotropic distribution of landmarks is not assumed. Assuming that landmarks are at equal distances in all directions limits applicability, and can cause problems when obstacles are present in the environment [40].

In contrast to [10, 40, 104, 114], the approach proposed in this chapter is limited by the requirement that a ceiling plane is visible, and that the robot's motion is parallel to the ceiling plane. However, in most indoor environments these conditions will be met. Furthermore, the ceiling in the test environment is slanted and uneven, with variation in height up to 1.5 m. Despite this violation of the underlying assumptions, the results have still been promising. Furthermore, artificial or repeatably identifiable landmarks (Bekris et al. [10]) are not required. This is an advantage in situations where it is undesirable to have artificial beacons, for example in living rooms, or where features in the environment may change. The use of the ceiling plane for navigation means that the approach is applicable in dynamic environments.

The algorithm does not exhibit local minima problems [104, 114] that result in a limited range of operation. Although the 1 m range used here is lower than the 4 m range reported in [74], the method developed in this chapter is an order of magnitude more accurate. It is therefore well suited as a complementary navigation strategy for application after approaching a docking location using longer distance navigation, such as that in Chapter 6.

7.4.1 Computational Cost

The computational cost of the algorithm is linear in the number of pixels in the input image.
This is similar to the complexity of many appearance-based approaches, and by reducing the image to 1D the number of operations per pixel is kept low, allowing 10 frames per second to be evaluated on a standard laptop computer.³ A breakdown of the typical processing time required for each component of the algorithm is shown in Table 7.2.

Component | Time / ms
Pre-processing | 8
Rotation Estimation
  Edge Histogram Creation | 11
  Edge Histogram Matching | 14
Translation Estimation
  Image Rotation | 15
  1D Image Creation | 9
  1D Matching | 22
Total | 79

Table 7.2: The typical processing time required for each component of the algorithm.

³ The computer used had a 1.6 GHz processor with 1 GB of RAM, running Fedora Linux.

7.5 Summary

In this chapter a new algorithm for image-based homing has been presented that exploits the properties of a camera pointing at a ceiling. The new algorithm is fast and efficient and has linear complexity in the number of pixels in the input image. It is ideally suited for use on small robots where sensing and computational power are limited. Unlike other approaches, the algorithm does not require an isotropic distribution of landmarks and does not require explicit feature extraction and matching. The use of 1D horizontal and vertical image averages for the translation estimation has allowed the creation of a computationally efficient algorithm.

Experiments have shown that the algorithm is capable of reducing the distance that the robot is from a goal from 1 m to 6 cm. This is accurate enough to successfully dock a Scitos G5 or a Pioneer 3-DX mobile robot with a charging station, and serve as an additional navigation phase to be executed after the approximate navigation algorithm presented in Chapter 6. The navigation methods presented in Chapter 6 and Chapter 7 are fully complementary. They could be combined by introducing high accuracy nodes to the topological map created in the previous chapter, with such nodes containing extended perception vectors storing a full snapshot image.

The applicability of the algorithm is limited to cases where a ceiling plane is available. Furthermore, the algorithm assumes that the ceiling plane is parallel to the floor plane. However, this assumption has been shown not to be overly restrictive, as in the test environment the ceiling plane was not a single plane and not parallel to the floor. A more limiting assumption of the proposed approach is that the ceiling plane is not homogeneous. If the ceiling is homogeneous then the approach will fail to find the correct rotation or translation, in a similar way to the potential problems discussed in Chapter 4. However, as with the visual compass algorithms in Chapter 4, any variations in the ceiling intensity will allow the method to work. For example, if a ceiling light is present then the intensity will vary across the image and the algorithm will succeed.

Conclusions to be drawn from this chapter regarding the best method of achieving reliable navigation on a low-cost low-powered commercially viable robot are:

1. Appearance-based methods are not only effective for approximate navigation, but also for high accuracy navigation methods. In order to achieve accurate docking, an accurate metric reconstruction is not required. In particular, by reducing an image to a 1D representation the number of operations per pixel is kept low.

2. Using the ceiling plane allows simple and efficient navigation. The transformation between image pixels becomes homographic, reducing the required complexity of the method.
Furthermore, dynamic objects in the environment will not be imaged.

3. Separating navigation into two levels, an approximate navigation algorithm such as that in Chapter 6 and a more accurate docking procedure such as that proposed in this chapter, avoids the need for one holistic but complex navigation system.

This thesis is concluded in the next chapter, with a summary of the work presented and indications of directions for future research.

8 Conclusion

This chapter concludes the thesis with a summary of the work presented and suggestions of future research directions.

8.1 Thesis Summary

As detailed in Chapter 2, recent advancements in camera and lens technology have provided omnidirectional cameras at consumer prices. This thesis has proposed novel navigation algorithms that utilise such cameras for perceiving the environment while imposing limited requirements on the processing power available. Efficient navigation algorithms that require only low-cost hardware are essential if robots are to become widespread consumer products.

Four key navigation strategies have been addressed in this work: dead reckoning, image-based homing, metric SLAM and dense topological mapping. Each of these strategies has been considered for efficient low-cost robot navigation.

Dead reckoning. Chapter 4 investigated ego-motion estimation using omnidirectional vision with conical mirrors. The Labrosse visual compass algorithm was identified as an efficient tool for aiding robot ego-motion estimation, and it inspired two new visual compass algorithms. Direct comparison with the Labrosse approach showed the performance of both proposed algorithms to be poor. This was because, unlike the two newly proposed algorithms, the Labrosse compass algorithm uses two dimensional image patches directly, thus reducing the susceptibility to noise.

Translation calculation using two conical mirrors to achieve omnidirectional stereopsis was then considered, but problems were encountered with poor alignment resulting in incorrect stereopsis. To rectify this situation, a new set of equations for image projections with a conical mirror were formulated for the more general case of non-exact alignment. However, the formulation encompassed a very large number of parameters. Finding the values of the parameters is costly, and the parameters are liable to change during navigation. This suggests that omnidirectional stereo vision with conical mirrors is not ideally suited for low-cost mobile robots.

Metric SLAM. The investigation of computationally efficient navigation continued in Chapter 5 with a look at omnidirectional monocular vision based metric SLAM. A system based on Davison's MonoSLAM was implemented, and quantitative experiments were performed to assess the benefits gained by omnidirectional sensing and the applicability within the context of this thesis. It was concluded that conventional visual metric SLAM systems are not suitable, and creating a metrically consistent map is an expensive goal.

The overall conclusion from Chapters 4 and 5 was that a low-cost robot may not need complex sensors and algorithms in order to navigate. In the second half of the work, attention turned to dense topological mapping and image-based homing as alternative strategies, and attempted to achieve efficient navigation using a low-cost fisheye omnidirectional camera.

Dense topological map. Chapter 6 presented a new environment learning and navigation strategy that is both effective and computationally cheap.
The method makes use of dense topological mapping to allow odometry information to be successfully used for navigation, while at the same time maintaining efficiency through the use of a SOM to index the perceptual space. Experiments with both a simplified blob detection based perception and a general appearance-based approach showed the effectiveness of the algorithm.

From Chapter 6 it can be concluded that topological mapping combined with local odometry based metric information is an effective approach for navigation. Furthermore, appearance-based approaches to the use of the camera data are effective, general and efficient, and are therefore well suited for use on low-cost robots.

Image-based homing. Finally the investigation looked at image-based homing in Chapter 7. The accuracy of the navigation algorithm proposed in Chapter 6 was shown to be in the range of 0 to 1 metre. When a higher accuracy than this is needed, for example when docking the robot, an alternative method of navigation needs to be employed. Chapter 7 proposed an efficient method for accurate local navigation that uses image-based homing in order to manoeuvre into position, yielding an algorithm with a mean accuracy of 6 cm. It was shown that in order to achieve an accurate docking, an accurate metric reconstruction is not required. Appearance-based methods were found to not only be effective for approximate navigation, but also for high accuracy navigation. This suggests that when developing a navigation system for a low-powered low-cost robot, appearance-based approaches are both efficient and effective.

8.2 Summary of Research Contributions

The following research contributions have been made in this thesis:

• A thorough literature review and analysis of robot navigation strategies for economical robots.

• Two new rotation estimation algorithms.

• The general formulation of omnidirectional projections with conical mirrors.

• A quantitative comparison of monocular omnidirectional vision and classical monocular vision for metric visual simultaneous localisation and mapping.

• A novel use of self-organising maps for approximate navigation, efficiently achieving visual appearance-based topological mapping.

• A novel algorithm for accurate visual homing that exploits the properties of images taken from an indoor robot.

8.3 Directions for Future Research

There are a number of directions that future research could take. The work presented in Chapters 4 to 7 suggests the following:

• Ego-motion with the fisheye camera. The work presented in Chapter 4 made use of conical mirrors to achieve omnidirectional vision. Having moved from the conical mirror to a fisheye lens in later chapters, it would be worth returning to Chapter 4 to investigate the use of the fisheye lens for ego-motion estimation. Stereo vision can be avoided by exploiting pre-programmed knowledge about the environment, such as in Chapter 7 where the planarity of the ceiling was utilised. Future work could also focus on improving the visual compass methods presented in Chapter 4, in particular the one dimensional image flow method presented in Section 4.2. Improvements may be possible by employing edge information rather than intensity information.

• Alternative metric SLAM solutions. Although Chapter 5 showed that conventional metric monocular visual SLAM is not suitable for low-cost navigation, SLAM is a very active research area and advances are continuously being made.
Future work could follow newly developed SLAM algorithms such as the pioneering SLAM implementation on a mobile phone by Klein and Murray [64], which has recently provided promising results.

• Topological mapping. The successful topological mapping and navigation proposed in Chapter 6 has a number of possible extensions that could form the basis of future research. A major extension could be to allow entirely online operation by performing SOM training online, thus reducing the algorithm from two stages to a single stage. The main difficulty that will need to be overcome is the need to regularly change the associations between topological map nodes and SOM nodes. Future research could look at ways of achieving this efficiently and effectively, and may involve investigations of alternative perception clustering and compression algorithms. Shorter-term future research could involve looking at alternative image representations, with a particular focus on rotation invariance in order to remove the need to supplement the navigation algorithm with man-made landmarks. Finally, node management strategies need to be considered. In the current implementation, map nodes are created at fixed time intervals; this is certainly not optimal, and future research needs to investigate this. Node deletion approaches could also be considered, allowing the map to evolve to fit the areas regularly navigated to.

• Image-based docking. The image-based docking algorithm proposed in Chapter 7 also suggests a number of viable directions for future research. In particular, the way in which the image is used has an important effect on the performance of the algorithm. Reducing the size of the image will increase the computational efficiency, but may potentially degrade the performance. This could be considered in conjunction with alternative image representations, for example edge magnitudes rather than image intensities.

References

[1] Ababsa, F., Mallem, M., and Roussel, D. Comparison Between Particle Filter Approach and Kalman Filter-Based Technique For Head Tracking in Augmented Reality Systems. In Proceedings of the 2004 International Conference on Robotics and Automation (ICRA 2004) (2004), IEEE, pp. 1021-1026.

[2] Agrawal, M., and Konolige, K. Real-time Localization in Outdoor Environments using Stereo Vision and Inexpensive GPS. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006) (2006), IEEE, pp. 1063-1068.

[3] Albert, M., and Connell, J. H. Visual rotation detection and estimation for mobile robot navigation. In Proceedings of the 2004 International Conference on Robotics and Automation (ICRA 2004) (2004), vol. 5, IEEE, pp. 4247-4252.

[4] Angeli, A., Doncieux, S., Meyer, J.-A., and Filliat, D. Visual topological SLAM and global localization. In Proceedings of the 2009 International Conference on Robotics and Automation (ICRA 2009) (2009), IEEE, pp. 2029-2034.

[5] Artac, M., Jogan, M., and Leonardis, A. Incremental PCA for on-line visual learning and recognition. In Proceedings of the 2002 International Conference on Pattern Recognition (ICPR 2002) (2002), vol. 16, IEEE, pp. 781-784.

[6] Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50, 2 (2002), 174-188.

[7] Bailey, T. Constrained initialisation for bearing-only SLAM.
In Proceedings of the 2003 International Conference on Robotics and Automation (ICRA 2003) (2003), IEEE, pp. 1966-1971.

[8] Baker, S., and Nayar, S. A theory of single-viewpoint catadioptric image formation. International Journal of Computer Vision 35, 2 (1999), 175-196.

[9] Beauchemin, S. S., and Barron, J. L. The computation of optical flow. ACM Computing Surveys (CSUR) 27, 3 (1995), 433-466.

[10] Bekris, K., Argyros, A., and Kavraki, L. Imaging Beyond the Pinhole Camera, vol. 33. Springer Netherlands, 2006, ch. Exploiting Panoramic Vision for Bearing-Only Robot Homing, pp. 229-251.

[11] Bonin-Font, F., Ortiz, A., and Oliver, G. Visual Navigation for Mobile Robots: A Survey. Journal of Intelligent and Robotic Systems 53, 3 (Nov. 2008), 263-296.

[12] Borenstein, J., Everett, H., and Feng, L. Navigating Mobile Robots: Sensors and Techniques. A. K. Peters, Ltd., Wellesley, MA, 1997.

[13] Briggs, A., Detweiler, C., Mullen, P., and Scharstein, D. Scale-Space Features in 1D Omnidirectional Images. In Proceedings of the Fifth Workshop on Omnidirectional Vision (OmniVis 2004) (May 2004), no. 5, pp. 115-126.

[14] Briggs, A., Li, Y., Scharstein, D., and Wilder, M. Robot navigation using 1D panoramic images. In Proceedings of the 2006 International Conference on Robotics and Automation (ICRA 2006) (May 2006), IEEE, pp. 2679-2685.

[15] Burbridge, C. Omnidirectional Vision Simulation and Stereo Range Finding. MSc Thesis, University of Essex, Sept. 2006.

[16] Burbridge, C., Nehmzow, U., and Condell, J. Omnidirectional projections with a cone mirror and single mirror stereo. In Proceedings of OMNIVIS 2008 (2008).

[17] Burbridge, C., and Spacek, L. Omnidirectional Vision Simulation and Robot Localisation. In Proceedings of Towards Autonomous Robotic Systems (TAROS 2006) (2006), pp. 32-39.

[18] Cartwright, B., and Collett, T. Landmark maps for honeybees. Biological Cybernetics 57, 1 (1987), 85-93.

[19] Cartwright, B. A., and Collett, T. S. Landmark learning in bees. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology 151, 4 (Dec. 1983), 521-543.

[20] Cauchois, C., Brassart, E., Drocourt, C., and Vasseur, P. Calibration of the omnidirectional vision sensor: SYCLOP. In Proceedings of the 1999 International Conference on Robotics and Automation (ICRA 1999) (1999), IEEE, pp. 1287-1292.

[21] Cauchois, C., Brassart, E., Pegard, C., and Clerentin, A. Technique for calibrating an omnidirectional sensor. In Proceedings of the 1999 International Conference on Intelligent Robots and Systems (IROS 1999) (1999), IEEE/RSJ, pp. 166-171.

[22] Chella, A., Macaluso, I., and Riano, L. Automatic place detection and localization in autonomous robotics. In Proceedings of the 2007 International Conference on Intelligent Robots and Systems (IROS 2007) (2007), IEEE/RSJ, pp. 741-746.

[23] Chen, D., Chang, R., and Huang, Y. Breast cancer diagnosis using self-organizing map for sonography. Ultrasound in Medicine & Biology 26, 3 (2000), 405-411.

[24] Civera, J., Davison, A., and Montiel, J. Pattern Recognition and Image Analysis (Lecture Notes in Computer Science). Springer Berlin, June 2007, ch. Dimensionless Monocular SLAM, pp. 412-419.

[25] Cummins, M., and Newman, P. Probabilistic Appearance Based Navigation and Loop Closing. In Proceedings of the 2007 International Conference on Robotics and Automation (ICRA 2007) (Apr. 2007), IEEE, pp. 2042-2048.

[26] Davison, A. J., Reid, I., Molton, N., and Stasse, O. MonoSLAM: Real-Time Single Camera SLAM.
IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 6 (2007), 1052-1067.

[27] Deans, M., and Hebert, M. Experimental Comparison of Techniques for Localization and Mapping using a Bearings Only Sensor. In Lecture Notes in Control and Information Sciences, Vol. 271: Experimental Robotics VI (Dec. 2000), Springer-Verlag, pp. 395-404.

[28] DeSouza, G. N., and Kak, A. C. Vision for Mobile Robot Navigation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 2 (2002), 237-267.

[29] Duckett, T., Marsland, S., and Shapiro, J. Fast, on-line learning of globally consistent maps. Autonomous Robots 12, 3 (2002), 287-300.

[30] Durrant-Whyte, H., and Bailey, T. Simultaneous localization and mapping: part I. IEEE Robotics & Automation Magazine 13, 2 (2006), 99-110.

[31] Durrant-Whyte, H., Majumder, S., Thrun, S., de Battista, M., and Scheding, S. A Bayesian algorithm for simultaneous localisation and map building. In Robotics Research, vol. 6 of Springer Tracts in Advanced Robotics. Springer Berlin / Heidelberg, 2003, pp. 49-60.

[32] Durrant-Whyte, H., Rye, D., and Nebot, E. Localisation of automatic guided vehicles. In Proceedings of the 7th International Symposium on Robotics Research (ISRR 95) (1996), Springer Verlag, pp. 613-625.

[33] Elinas, P., Sim, R., and Little, J. J. σSLAM: Stereo Vision SLAM using the Rao-Blackwellised Particle Filter and a Novel Mixture Proposal Distribution. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2006) (May 2006), IEEE, pp. 1564-1570.

[34] Filliat, D. Interactive learning of visual topological navigation. In Proceedings of the 2008 International Conference on Intelligent Robots and Systems (IROS 2008) (2008), IEEE/RSJ, pp. 248-254.

[35] Filliat, D., and Meyer, J.-A. Global localization and topological map-learning for robot navigation. In Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior (SAB 2002) (2002), MIT Press, pp. 131-140.

[36] Fischler, M. A., and Bolles, R. C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM 24, 6 (June 1981), 381-395.

[37] Fox, D., Burgard, W., and Thrun, S. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine 4, 1 (1997), 23-33.

[38] Francis, G., and Spacek, L. Linux Robot with Omnidirectional Vision. In Proceedings of Towards Autonomous Robotic Systems (TAROS 2006) (2006), pp. 56-63.

[39] Franz, M., Schölkopf, B., and Bülthoff, H. Homing by Parameterized Scene Matching. Experimental Brain Research 114 (1997), 235-245.

[40] Franz, M., Schölkopf, B., Mallot, H., and Bülthoff, H. Where did I take that snapshot? Scene-based homing by image matching. Biological Cybernetics 79, 3 (1998), 191-202.

[41] Garreau, J. Robots That Fill an Emotional Vacuum. Washington Post, available online: http://www.washingtonpost.com/wp-dyn/content/article/2006/12/28/AR2006122801309.html. Accessed 29th June 2010.

[42] Gaspar, J., Winters, N., and Santos-Victor, J. Vision-based navigation and environmental representations with an omnidirectional camera. IEEE Transactions on Robotics and Automation 16, 6 (2000), 890-898.

[43] Geyer, C., and Daniilidis, K. Catadioptric projective geometry. International Journal of Computer Vision 45, 3 (2001), 223-243.

[44] Goedemé, T., Tuytelaars, T., Van Gool, L., Vanacker, G., and Nuttin, M. Feature based omnidirectional sparse visual path following.
In Proceedings of the 2005 International Conference on Intelligent Robots and Systems (IROS 2005) (2005), IEEE/RSJ, pp. 1806-1811.

[45] Hafed, Z., and Levine, M. Face recognition using the discrete cosine transform. International Journal of Computer Vision 43, 3 (2001), 167-188.

[46] Hartley, R., and Zisserman, A. Multiple View Geometry. Cambridge University Press, 2000.

[47] Hecht, E., and Zajac, A. Optics. 1974.

[48] Hong, J., Tan, X., Pinette, B., Weiss, R., and Riseman, E. Image-based homing. IEEE Control Systems Magazine 12, 1 (1992), 38-45.

[49] Horn, B. K., and Schunck, B. G. Determining optical flow. Artificial Intelligence 17, 1-3 (Aug. 1981), 185-203.

[50] Huang, W. H., and Beevers, K. R. Algorithmic Foundations of Robotics VI. 2005, ch. Topological Mapping with Sensing-Limited Robots, pp. 235-250.

[51] Isard, M., and Blake, A. CONDENSATION - Conditional Density Propagation for Visual Tracking. International Journal of Computer Vision 29, 1 (Aug. 1998), 5-28.

[52] Ishiguro, H., and Tsuji, S. Image-based memory of environment. In Proceedings of the 1996 International Conference on Intelligent Robots and Systems (IROS 1996) (1996), pp. 634-639.

[53] Ishiguro, H., Yamamoto, M., and Tsuji, S. Omni-Directional Stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2 (1992), 257-262.

[54] Itti, L., Koch, C., and Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11 (1998), 1254-1259.

[55] Jain, A., and Vailaya, A. Image retrieval using color and shape. Pattern Recognition 29, 8 (1996), 1233-1244.

[56] Jogan, M., and Leonardis, A. Robust localization using an omnidirectional appearance-based subspace model of environment. Robotics and Autonomous Systems 45, 1 (2003), 51-72.

[57] Kaski, S., Kangas, J., and Kohonen, T. Bibliography of self-organizing map (SOM) papers: 1981-1997. Neural Computing Surveys 1, 3 (1998), 1-176.

[58] Kelley, C. T. Solving Nonlinear Equations with Newton's Method. SIAM Press, 2003.

[59] Kerr, D. Autonomous scale invariant feature extraction. PhD Thesis, University of Ulster, 2008.

[60] Kim, J., Yoon, K.-J., Kim, J.-S., and Kweon, I. Visual SLAM by Single-Camera Catadioptric Stereo. In Proceedings of the SICE-ICCAS Joint Conference (Oct. 2006), pp. 2005-2009.

[61] Kim, J.-H., and Chung, M. J. SLAM with omni-directional stereo vision sensor. In Proceedings of the 2003 International Conference on Intelligent Robots and Systems (IROS 2003) (Oct. 2003), IEEE/RSJ, pp. 442-447.

[62] Kim, S., and Oh, S.-Y. SLAM in Indoor Environments using Omni-directional Vertical and Horizontal Line Features. Journal of Intelligent and Robotic Systems 51, 1 (2008), 31-43.

[63] Kiviluoto, K. Predicting bankruptcies with the self-organizing map. Neurocomputing 21, 1-3 (1998), 191-201.

[64] Klein, G., and Murray, D. Parallel Tracking and Mapping on a Camera Phone. In Proceedings of the Eighth International Symposium on Mixed and Augmented Reality (ISMAR'09) (Oct. 2009), IEEE/ACM, pp. 83-86.

[65] Kobilarov, M., Sukhatme, G., Hyams, J., and Batavia, P. People tracking and following with mobile robot using an omnidirectional camera and a laser. In Proceedings of the 2006 International Conference on Robotics and Automation (ICRA 2006) (2006), IEEE, pp. 557-562.

[66] Kohonen, T. Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 1 (1982), 59-69.

[67] Kohonen, T. Self-organization and associative memory. Harvard Press, 1988.
[68] Kohonen, T. The self-organizing map. Neurocomputing 21, 1-3 (1998), 1-6.

[69] Kosaka, A., and Pan, J. Purdue Experiments in Model-based Vision for Hallway Navigation. In Proceedings of the Workshop on Vision for Robots at IROS'95 (1995), IEEE, pp. 87-96.

[70] Kosecka, J., Zhou, L., Barber, P., and Duric, Z. Qualitative image based localization in indoors environments. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2003), vol. 2, IEEE, pp. 3-10.

[71] Kurdthongmee, W. A novel hardware-oriented Kohonen SOM image compression algorithm and its FPGA implementation. Journal of Systems Architecture 54, 10 (Oct. 2008), 983-994.

[72] Kwok, N. M., and Dissanayake, G. Bearing-only SLAM in indoor environments using a modified particle filter. In Proceedings of the 2003 Australasian Conference on Robotics and Automation (2003), pp. 1-8.

[73] Labrosse, F. The visual compass: performance and limitations of an appearance-based method. Journal of Field Robotics 23, 10 (2006), 913-941.

[74] Labrosse, F. Short and long-range visual navigation using warped panoramic images. Robotics and Autonomous Systems 55, 9 (Sept. 2007), 675-684.

[75] Lambrinos, D., Moller, R., Labhart, T., and Pfeifer, R. A mobile robot employing insect strategies for navigation. Robotics and Autonomous Systems 30, 1 (2000), 39-64.

[76] Lemaire, T., and Lacroix, S. SLAM with Panoramic Vision. Journal of Field Robotics 24, 1-2 (2007), 91-111.

[77] Lenser, S., and Veloso, M. Visual sonar: fast obstacle avoidance using monocular vision. In Proceedings of the 2003 International Conference on Intelligent Robots and Systems (IROS 2003) (2003), vol. 1, IEEE/RSJ, pp. 886-891.

[78] Lin, S.-S., and Bajcsy, R. True Single View Point Cone Mirror Omni-Directional Catadioptric System. In Proceedings of the 2001 International Conference on Computer Vision (2001), pp. 102-107.

[79] López-Nicolás, G., Guerrero, J., and Sagüés, C. Multiple homographies with omnidirectional vision for robot homing. Robotics and Autonomous Systems 58, 6 (2010), 773-783.

[80] Lucas, B., and Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Conference on Artificial Intelligence (IJCAI) (Aug. 1981), pp. 674-679.

[81] Maimone, M., Cheng, Y., and Matthies, L. Two years of Visual Odometry on the Mars Exploration Rovers. Journal of Field Robotics 24, 3 (2007), 169-186.

[82] Maire, F., and Keeratipranon, N. Bearing Similarity Measures for Self-Organizing Feature Maps. In Proceedings of the Sixth International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'05) (2005), pp. 286-293.

[83] Matsuhisa, R., Kawasaki, H., Ono, S., Hanno, T., and Ikeuchi, K. Structure from Motion for Omni-directional Images by using Factorization and Bundle Adjustment Method (in Japanese). In Proceedings of the Meeting on Image Recognition and Understanding (2008).

[84] Menegatti, E., Ishiguro, H., and Maeda, T. Image-based memory for robot navigation using properties of omnidirectional images. Robotics and Autonomous Systems 47, 4 (July 2004), 251-267.

[85] Micusik, B., and Pajdla, T. Structure from Motion with Wide Circular Field of View Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 7 (July 2006), 1135-1149.

[86] Miyamoto, K. Fish Eye Lens. Journal of the Optical Society of America 54, 8 (Aug. 1964), 1060.

[87] Möller, R., and Vardy, A. Local visual homing by matched-filter descent in image distances.
Biological Cybernetics 95, 5 (2006), 412-430.

[88] Möller, R., Vardy, A., Kreft, S., and Ruwisch, S. Visual homing in environments with anisotropic landmark distribution. Autonomous Robots 23, 3 (2007), 231-245.

[89] Montemerlo, M., Thrun, S., Koller, D., and Wegbreit, B. FastSLAM: a factored solution to the simultaneous localization and mapping problem. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (Menlo Park, CA, USA, 2002), American Association for Artificial Intelligence, pp. 593-598.

[90] Montiel, J., and Davison, A. A visual compass based on SLAM. In Proceedings of the 2006 International Conference on Robotics and Automation (ICRA 2006) (2006), pp. 1917-1922.

[91] Nayar, S. Omnidirectional vision. In Proceedings of the 1998 British Machine Vision Conference (1998).

[92] Nayar, S. K., and Baker, S. Catadioptric Image Formation. In Proceedings of the 1997 DARPA Image Understanding Workshop (1997), pp. 1347-1431.

[93] Neal, M., and Labrosse, F. Rotation-invariant appearance based maps for robot navigation using an artificial immune network algorithm. In Proceedings of the 2004 Congress on Evolutionary Computation (CEC 2004) (2004), pp. 863-870.

[94] Nehmzow, U. Mobile Robotics: A Practical Introduction. Springer-Verlag London, 2000.

[95] Nehmzow, U., and Vieira Neto, H. Locating Objects Visually using Opposing-Colour-Channel Coding. In Proceedings of Towards Intelligent Mobile Robots (TIMR 2003) (2003).

[96] Nelson, R. C., and Aloimonos, J. Finding motion parameters from spherical motion fields (or the advantages of having eyes in the back of your head). Biological Cybernetics 58, 4 (Mar. 1988), 261-273.

[97] Oja, M., Kaski, S., and Kohonen, T. Bibliography of self-organizing map (SOM) papers: 1998-2001 addendum. Neural Computing Surveys 3, 1 (2003), 1-156.

[98] Owen, C., and Nehmzow, U. Route Learning in Mobile Robots through Self-Organisation. In Proceedings of the 1st Euromicro Workshop on Advanced Mobile Robots (EUROBOT) (1996), IEEE Computer Society, ISBN 0-8186-7695-7, p. 126.

[99] Owen, C., and Nehmzow, U. Landmark-based navigation for a mobile robot. In Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (SAB 1998) (1998), pp. 240-245.

[100] Pajdla, T., and Hlavac, V. Zero Phase Representation of Panoramic Images for Image Based Localization. In Proceedings of the International Conference on Computer Analysis of Images and Patterns (1999), pp. 550-557.

[101] Paz, L. M., Jensfelt, P., Tardós, J. D., and Neira, J. EKF SLAM updates in O(n) with Divide and Conquer SLAM. In Proceedings of the 2007 International Conference on Robotics and Automation (ICRA 2007) (Apr. 2007), pp. 1653-1657.

[102] Perez, J. A., Castellanos, J. A., Montiel, J. M. M., Neira, J., and Tardos, J. D. Continuous Mobile Robot Localization: Vision vs. Laser. In Proceedings of the 1999 International Conference on Robotics and Automation (ICRA 1999) (May 1999), IEEE, pp. 2917-2923.

[103] Pisokas, J., and Nehmzow, U. Adaptive Agents and Multi-Agent Systems III. Springer Berlin, 2005, ch. Experiments in Subsymbolic Action Planning with Mobile Robots, pp. 216-229.

[104] Röfer, T. Image based homing using a self-organizing feature map. In Proceedings of the 1995 International Conference on Artificial Neural Networks (ICANN-95) (1995), vol. 1, pp. 475-480.

[105] Rössel, S., and Wehner, R. Polarization vision in bees. Nature 323 (1986), 128-131.

[106] Scaramuzza, D., Fraundorfer, F., and Siegwart, R.
Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC. In Proceedings of the 2009 International Conference on Robotics and Automation (ICRA 2009) (2009), pp. 4293-4299.

[107] Scaramuzza, D., Martinelli, A., and Siegwart, R. A Flexible Technique for Accurate Omnidirectional Camera Calibration and Structure from Motion. In Proceedings of the Fourth IEEE International Conference on Computer Vision Systems (ICVS '06) (2006), IEEE, p. 45.

[108] Scaramuzza, D., and Siegwart, R. Appearance-Guided Monocular Omnidirectional Visual Odometry for Outdoor Ground Vehicles. IEEE Transactions on Robotics 24, 5 (2008), 1015-1026.

[109] Sim, R., and Dudek, G. Comparing image-based localization methods. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (2003), pp. 1560-1562.

[110] Sivic, J., and Zisserman, A. Toward Category-Level Object Recognition. Springer Berlin, 2006, ch. Video Google: Efficient visual search of videos.

[111] Smith, R. C., and Cheeseman, P. On the Representation and Estimation of Spatial Uncertainty. The International Journal of Robotics Research 5, 4 (1986), 56-68.

[112] Spacek, L. A catadioptric sensor with multiple viewpoints. Robotics and Autonomous Systems 51, 1 (2005), 3-15.

[113] Sturm, J., and Visser, A. An appearance-based visual compass for mobile robots. Robotics and Autonomous Systems 57, 5 (May 2009), 536-545.

[114] Stürzl, W., and Mallot, H. Efficient visual homing based on Fourier transformed panoramic images. Robotics and Autonomous Systems 54, 4 (2006), 300-313.

[115] Sun, Y., Cao, Q., and Chen, W. An object tracking and global localization method using omnidirectional vision system. In Proceedings of the 5th World Congress on Intelligent Control and Automation (2004).

[116] Swaminathan, R., and Nayar, S. Polycameras: Camera clusters for wide angle imaging. Tech. Rep. CUCS-013-99, Columbia University, New York, NY, 2005.

[117] Tan, K., Hua, H., and Ahuja, N. Multiview Panoramic Cameras Using Mirror Pyramids. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 7 (2004), 941-946.

[118] Thrun, S., Burgard, W., and Fox, D. Probabilistic Robotics. MIT Press, 2005.

[119] Tomasi, C., and Kanade, T. Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision 9, 2 (Nov. 1992), 137-154.

[120] Tsai, R. Y. An efficient and accurate camera calibration technique for 3D machine vision. In Proceedings of the 1986 IEEE Conference on Computer Vision (1986), pp. 364-374.

[121] Unibrain Inc. Fire-i firewire board camera. http://www.unibrain.com/Products/VisionImg/Fire_i_BC.htm. Accessed 29th June 2010.

[122] Usher, K., Ridley, P., and Corke, P. A Camera as a Polarized Light Compass: Preliminary Experiments. In Proceedings of the 2001 Australian Conference on Robotics and Automation (Nov. 2001), pp. 116-120.

[123] Williams, S. B. Efficient solutions to autonomous mapping and navigation problems. PhD Thesis, University of Sydney, Australia, 2001.

[124] Winters, N. A holistic approach to mobile robot navigation using omnidirectional vision. PhD Thesis, Trinity College, Dublin, 2001.

[125] Winters, N., Gaspar, J., Lacey, G., and Santos-Victor, J. Omnidirectional vision for robot navigation. In Proceedings of the 2000 Workshop on Omnidirectional Vision (OMNIVIS 2000) (2000), p. 21.

[126] Yin, H. Data visualisation and manifold mapping using the ViSOM. Neural Networks 15, 8 (2002), 1005-1016.

[127] Zeil, J., Hofmann, M.
I., and Chahl, J. S. Catchment areas of panoramic snapshots in outdoor scenes. Journal of the Optical Society of America 20, 3 (Mar. 2003), 450.

[128] Zhou, C., Wei, Y., and Tan, T. Mobile robot self-localization based on global visual appearance features. In Proceedings of the 2003 International Conference on Robotics and Automation (ICRA 2003) (2003), pp. 1271-1276.

[129] Zhou, L., Barber, P., and Duric, Z. Qualitative image based localization in indoors environments. In Proceedings of the 2003 Conference on Computer Vision and Pattern Recognition (CVPR'03) (2003), pp. 3-10.

[130] Zhu, Z. Omnidirectional stereo vision. In Proceedings of the 10th International Conference on Advanced Robotics (2001), IEEE.

[131] Zitova, B., and Flusser, J. Image registration methods: a survey. Image and Vision Computing 21, 11 (2003), 977-1000.