A real time multi-agent visual tracking system for modelling complex behaviours on mobile robots

Luke Robertson

MSc in Artificial Intelligence
Division of Informatics
University of Edinburgh
2000

Abstract

An overhead camera system capable of tracking a single 'Khepera' robot around a large arena in real time is in common use by the Mobile Robot Group at the Division of Informatics, Edinburgh University. This system, while very successful at tracking one robot, could not be used to track more than one object successfully. This report documents an extension to this system capable of locating, identifying and tracking a number of mobile objects within a set of images in real time. The system is generic and can be applied to any type of robot and vision hardware. The system presented in this report is shown to be able to track up to seven Khepera robots using a 75 MHz processor. An example application of the system is demonstrated by implementing a robot following task, using a number of LEGO MINDSTORM robots, with only the positional information obtained from the tracking system. This application is demonstrated successfully using a relatively slow frame grabbing device.

Acknowledgements

I would like to thank: my supervisor Chris Malcolm for his advice in presenting this MSc report and his help during the scope of the project. My second supervisor, Yuval Marom, provided patient and useful guidance in obtaining and using resources, and shared his knowledge of the existing single Khepera tracking system. Sunarto Quek collaborated in the design of the tracking system, providing some useful tips and ideas for constructing the indicators, extracting regions from the image and efficiently processing the image. Louise Bowden, Nico Kampchen, Emanuele Menegatti, Marietta Scott, Jan Wessnitzer and Paul Wilson all provided valuable help with the twin joys that are MATLAB and LaTeX. The EPSRC have supported my work over the year of the MSc course.

Contents

1 Introduction and background
  1.1 Introduction
    1.1.1 Project outline
    1.1.2 Report Outline
    1.1.3 Collaborative work
  1.2 Social robotics
  1.3 Object Tracking
    1.3.1 Segmentation
    1.3.2 Classification
    1.3.3 Object tracking
  1.4 Robot tracking
    1.4.1 The Cognachrome vision system
    1.4.2 Roboroos robot soccer team
  1.5 Robot Tracking summary
2 Single Khepera tracking system and the vision hardware
  2.1 The single Khepera tracking system
    2.1.1 The Khepera arena
    2.1.2 Meteor Driver
    2.1.3 Operation
  2.2 Evaluation system
    2.2.1 Layout of the test system
    2.2.2 Video for Linux driver
  2.3 Hardware and single Khepera tracking summary
3 Object segmentation and classification
  3.1 Vision techniques
    3.1.1 Moments
    3.1.2 Grouping pixels to regions - Connected Components
    3.1.3 Boundary tracking
  3.2 Efficiency of the vision algorithms
    3.2.1 Efficient connected component algorithm
    3.2.2 One pass orientation calculation
  3.3 Region finding experiments
    3.3.1 Indicator classification experiments
  3.4 Extending the classification system
    3.4.1 Processing regions
  3.5 Detection tests and indicator design
    3.5.1 Design of the indicators
    3.5.2 Single frame classification tests
    3.5.3 Execution time
  3.6 Detection and identification summary
4 Object Tracking
  4.1 Method of operation
    4.1.1 Tracking and windowing
    4.1.2 Scanning and detecting
    4.1.3 Indicator identification
    4.1.4 System output
    4.1.5 System debugger
    4.1.6 Basic performance evaluation
  4.2 Extensions to the basic system
    4.2.1 Calculating the heading of an indicator
  4.3 Tracking experiments
    4.3.1 Execution time
    4.3.2 Static positions
    4.3.3 Tracking objects moving in a straight line
    4.3.4 Tracking simple robot tasks
    4.3.5 Tracking Kheperas
  4.4 Tracking summary
5 Multiple robot tracking application: The waggle dance
  5.1 Bee foraging and communication
  5.2 The round dance
  5.3 The waggle dance
    5.3.1 Dance description
    5.3.2 Orientation of the food source
    5.3.3 Distance
  5.4 Bee communication summary
6 Multiple robot tracking application: Robot implementation
  6.1 The dancing bee
    6.1.1 Robot construction
    6.1.2 Line following method
    6.1.3 Extracting orientation information from the robot dance path
  6.2 The dance following bee
    6.2.1 Communicating with the MINDSTORM bees
  6.3 The robot bee controller
    6.3.1 The bee controls
    6.3.2 Bee control simulator
    6.3.3 The real robot
  6.4 Results
    6.4.1 Copying the heading of the leading bee
    6.4.2 Following the dance path with a single bee
  6.5 Summary of the multiple agent tracking application
7 Conclusion
  7.1 Achievement of the goals of the project
  7.2 Assumptions and limitations
  7.3 Advice for extending this work
    7.3.1 Automatic threshold selection
    7.3.2 Simple prediction of object positions
    7.3.3 Using statistical methods to derive estimates of position
  7.4 What was learnt during the scope of the project
A Machine vision algorithms
  A.1 Component labelling
  A.2 Boundary tracking
  A.3 Locating Holes
  A.4 Orientation of an object's axis of elongation
    A.4.1 Efficient method of calculating orientation
  A.5 The heading of an object-indicator
B Robot implementation algorithms
  B.1 Line following
  B.2 Sending a byte command via the MINDSTORM transceiver
List of Figures

1.1 Example of edge detection
1.2 View of football arena
1.3 Identifying Robo-soccer players
2.1 Layout of the Khepera arena
2.2 The YUV planar arrangement
2.3 Memory layout of a UNIX system
2.4 A Khepera in the arena, and a pixel-level view
2.5 Two Kheperas interacting
2.6 The Khepera Arena
2.7 The layout of the test system
2.8 Video for Linux - Capturing process
2.9 A (covered) Khepera in the test arena (and a pixel-level view)
3.1 Differently parameterised indicators
3.2 Pixel labelling conventions
3.3 The orientation of an object
3.4 Pixel neighbours
3.5 Describing thresholded image with regions
3.6 Object representation after one pass of the connected component algorithm
3.7 Extracted moment information from an image
3.8 An A4 sized indicator in the Khepera arena
3.9 The indicator marking system
3.10 Identifying an indicator by holes
3.11 An example of the system being unable to find a hole in an indicator
3.12 Scene containing 7 object indicators before and after classification
3.13 Detecting a Khepera sized indicator
3.14 Example of a partially obscured indicator being incorrectly classified
3.15 Indicators spread over the Khepera arena
3.16 Testing the classification of indicators
4.1 Tracking by applying the detection system in successive frames
4.2 Searching for indicators using windows
4.3 System debugger: The scanning band sweeping the image
4.4 Sharing object positions with other processes
4.5 Listening for object positions
4.6 View of the system debugger
4.7 The heading of an indicator
4.8 Detecting the headings of indicators
4.9 The positions of two static indicators
4.10 The observed COM positions of an object on a straight line path
4.11 The straight line path (y-magnified)
4.12 Paths of four robots being tracked
4.13 The respective size of the Khepera and indicator used in Khepera tracking experiments
4.14 Paths of two Kheperas being tracked
5.1 The Round Dance
5.2 The Waggle Dance
5.3 Orientation of food source from hive
6.1 Robot bees
6.2 The dancing bee
6.3 The dance path, with overlayed positions obtained from the tracking system
6.4 The layout of the gears in the robot bee
6.5 Overhead view of the robot bee
6.6 Extracting the orientation of a food source from the positions of a dancing bee
6.7 Communication between the RCX and a PC
6.8 The start of the simulation
6.9 The simulated paths of the bees after one circuit
6.10 Simulation of the following bee
6.11 Three robot bees matching the heading of a leading bee
6.12 The heading of the four bees at the start of the following process
6.13 The heading of the four bees over the following process
6.14 The dance path of the leading bee, the path of the following bee, and the orientation of the two paths
A.1 Identifying an indicator by holes
A.2 The area scanned when looking for holes
A.3 Indication of object heading from the center of holes
A.4 Deriving the indicator heading
B.1 Line following: Scanning for the line

List of Tables

3.1 Moments derived from an A4 indicator in the Khepera arena
3.2 Moment invariance of an indicator in the test arena
3.3 Design of the object indicators in the two arenas
3.4 Execution time of the detection process
4.1 Execution time of the tracking process
4.2 Evaluations of the tracked positions obtained from static indicators
4.3 Heading calculations from static indicators
4.4 The number of instances of the system losing a robot
A.1 Connectedness ambiguity
B.1 Contents of a transmit-message packet sent to the RCX IR tower
Chapter 1

Introduction and background

1.1 Introduction

An overhead camera system capable of tracking a single 'Khepera' robot around a large arena in real time is in common use by the Mobile Robot Group at the Division of Informatics, Edinburgh University. This system, while very successful at tracking one robot, could not be used to track more than one object.

1.1.1 Project outline

The outline for this project was to:

• Refine the tracking system to allow for several robots and static objects to be tracked using the existing hardware.
• Construct a system capable of distinguishing between different objects.
• Be able to match the positional information of the detected objects with built-in positional information of critical regions in the environment.
• Maintain the efficiency required for real-time control of the robots from the vision system.

This report describes a visual system capable of locating, identifying and tracking a number of mobile objects within a set of images. The system has been designed to follow a number of generic robots around a controlled environment. The system was required to extract the real time positions of the objects to allow this information to be used to control the robots, and to accurately describe the path of the robot.

The required execution time of the system depended on the speed of the robot (and the robotics application). If a robot could not move far between each time-step then the system could process frames at a slow rate. If the tracked robot could move across the frame, or the application required fast sampling, then the system would need to process frames at a faster rate.

The system presented in this report is shown to be able to track up to (a theoretical) seven Khepera robots (in real time) using a 75 MHz CPU. Demonstrations of the system are performed showing its performance on different processors, frame grabbing hardware and robots, applied to different applications.

In the second half of the report an example application is demonstrated by implementing a robot following task using a number of LEGO MINDSTORM robots, with only the positional information obtained from the tracking system as input. This application is demonstrated successfully using a relatively slow (15 fps), but cheap, frame grabbing device that was being evaluated for use in future tracking applications. This device was shown to be acceptable, allowing reasonable control of the following robots, providing the tracked robots were sufficiently slowed down.

1.1.2 Report Outline

The report is broken into the following chapters:

1: Describes some of the traditional machine vision methods associated with tracking objects in images and explains the problems associated with each method. It is shown that the main problem occurs when separating the object pixels from the image. This is a non-trivial problem which affects other processes if done poorly.
2: Briefly explains the vision hardware available during the scope of this project. The existing single robot tracking system is described. This fails to track more than one Khepera principally because it makes no attempt to identify the robots, so it cannot recognise a robot in successive frames.

3: The third and fourth chapters explain the extensions made to the robot tracking system to allow it to track more than one robot. This chapter introduces a method of identifying individual 'object-indicators' used to mark the physical objects. These indicators can be seen more clearly by the system, which can extract the position and heading of each indicator and hence information about the real object.

4: Provides an outline of the system capabilities. This describes the operation of the system and the method of communicating positional information to other processes, and introduces an easy to use system debugger/user interface. Some tests of the system are also performed on trivial tracking problems.

5: A more detailed test of the system is made in the fifth and sixth chapters. The fifth chapter gives a brief description of the waggle dance which, it is postulated, bees perform when communicating the location of distant food sources to other bees. A simple model of this dance is demonstrated using the tracking system in the sixth chapter.

6: A number of LEGO MINDSTORM robots follow a leading robot by using only positional information obtained from the tracking system. A simulator used to test the method used to control the following robots is introduced. A single bee robot is shown to successfully copy the dance of a leading bee.

1.1.3 Collaborative work

The project was implemented by both the writer and Sunarto Quek [15]. The generic design of the tracking system was worked on as a team, with each member then independently implementing, revising and testing the system (for separate MSc project considerations).

• This collaboration involved the design of the object segmentation and classification part of the project (objects are effectively tracked by applying this process in each frame). This involved sharing the design of the object-indicators, the extraction of the indicators from the image, the classification of the indicators and also the extraction of the position and orientation information of these segmented indicators.
• The final system (performing the actual tracking in each sequential frame) was implemented independently.
• All of the code used in the system was constructed independently, although the basic vision algorithms were shared (with some of the code being discussed).
• The writing of the report and the design and implementation of the main tracking application were performed independently.

1.2 Social robotics

The need for such a system for experimentation with the Khepera robots arises as a result of their limited sensing capabilities, and the limitations of the existing tracking system (which cannot reliably track more than one Khepera). Information from the tracking system would be used to enhance the perceptual repertoire of one or more agents to enable group interactions.

1.3 Object Tracking

The core of this report is concerned with the traditional vision problem of object tracking. Conventionally the vision system is broken into three main components: segmentation, object recognition and object tracking. This section describes some of the more popular techniques applied to these elements.
It should be noted that:

• There is no properly defined method to complete any of these stages.
• All of the techniques have disadvantages (and all offer some application-dependent advantages).

1.3.1 Segmentation

Segmentation separates an image into its component regions. While this is relatively easy for humans it is far more difficult for machines. Segmentation can be defined as a method to partition a gray-level image F[i, j] into regions P_1, ..., P_k. This can be visualised as determining how the foreground of an image is separated from the background. The segmentation process should extract interesting objects (items that the system is required to identify) from the image. The technique applied determines the level of detail to which this is possible. This section demonstrates the three chief segmentation methods: Thresholding, Edge Detection, and Token Grouping.

Thresholding

Thresholding is accomplished by defining a range of values in the image. Pixels within this range are said to be part of the 'foreground'; all other pixels are defined as the 'background' [9].

A typical black and white image is quantised into 256 gray levels during the digitisation process, whereas a binary image contains only two levels (on and off). Most basic machine vision algorithms operate on a binary image, although they can be extended to work with gray images. A binary image B[i, j] is created by thresholding a gray image. If it is known that the object intensity values lie in a range [T_1, T_2] then we can obtain the thresholded binary image B[i, j] from the gray level image F[i, j]:

B[i, j] = \begin{cases} 1 & \text{if } T_1 \le F[i, j] \le T_2 \\ 0 & \text{otherwise.} \end{cases}

• The user must select the thresholds [T_1, T_2], either by setting a fixed value by trial and error, or by using an adaptive threshold derived from a histogram of the image [9].
• Some images may not have clear-cut histogram peaks corresponding to distinct image structure, making automatic thresholds complex to derive.
• Automatic thresholding techniques are discussed as part of future extensions to this project in section 7.3.1 on page 101.

Edge Detection

Thresholding selects pixels by brightness — there are no requirements that the segmented regions are continuous. An alternative definition of a region can be based on its boundary [6]. An example of an image segmented into edges is shown in figure 1.1. The second image shows the edges found in the first object.

Figure 1.1: Example of edge detection

• Edge definitions are generally local — there may be places on boundaries where the measure of 'edgeness' drops, resulting in poor segmentation [16].
• A further problem occurs when there are two edges touching. Two different starting points can result in different segmentations.

Token group segmentation

This technique represents aspects of the image by tokens. These can represent several features in the image such as specific pixel, line, or surface features. Segmentation is achieved by grouping tokens into more abstract groups until new tokens are formed, representing instances of objects of interest. Global regularities are found by combining local and non-local information [20].

• The vision system decides whether grouping occurs depending on some predefined grouping criteria as specified by a user.
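Of these methods, simple thresholding is the one relied on later in this report. As a concrete illustration of the thresholding rule above, a minimal sketch (not taken from the thesis code, and assuming an 8-bit gray image stored row-major) might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Threshold a gray-level image F into a binary image B: a pixel is
 * foreground (1) if its intensity lies in [t1, t2], background (0)
 * otherwise.  Both images are stored row-major, 'cols' pixels per row. */
void threshold_image(const uint8_t *F, uint8_t *B,
                     size_t rows, size_t cols,
                     uint8_t t1, uint8_t t2)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            uint8_t v = F[i * cols + j];
            B[i * cols + j] = (v >= t1 && v <= t2) ? 1 : 0;
        }
    }
}
```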
1.3.2 Classification

After regions have been found within the image a classification process needs to be applied to identify these objects. The three most popular methods are Region Growing, Template Matching and Connectionist Methods.

Region Growing

Given starting points in the image, neighbouring pixels are examined to see if they meet the criteria of properties of a particular region. If they do then they are added to the growing region, otherwise they are assigned to a new region [9].

• The final solution is dependent on the initial definitions as defined by human input.
• Different starting conditions may grow into different regions.

Template matching

A target pattern (the template) is shifted to every location in an image and used as a mask. An image is formed showing where regions similar to the template are located [7].

• This is the preferred technique for tracking objects within an image, due to its relative simplicity.
→ The computation required can be large, depending on the size of the image.
→ This can be reduced by limiting the size of the window that the template is applied to within the image.

Neural networks

Neural networks (NNs) can be used to partition the feature space using non-linear class boundaries. These boundaries are set during the training phase by showing the network a carefully selected training set representing the objects that will be encountered during the recognition phase. After learning these classification boundaries the network behaves like any other classifier [3]. More recent research tends towards unsupervised learning — with a Self-Organising Map (SOM) forming its own categories through the correlation of the data, without relying on a person teaching the network which group an object belongs to.

• The design of the NN is important and is dependent on the application.
• NNs require a large training set, with a human evaluating each network classification.
• The SOM can categorise objects but not label them.

1.3.3 Object tracking

Segmentation techniques can lead to large inconsistencies when tracking an object across several images. Traditionally, segmentation and classification techniques fail to uniquely identify objects across frames, despite being able to successfully classify objects in static frames. The two most successful tracking methods are Kalman Filtering and Differencing techniques.

Differencing

While not a dedicated tracking mechanism, this technique can be used to predict where an object should be in the current frame based on its location and velocity from the preceding frames. By looking for the closest match to that area a record can be kept of which object is which [17].

• However the method does not uniquely identify the objects and can lose track of them if they pass close to each other.

Kalman filtering

The Kalman filter is a set of mathematical equations which provide recursive estimates of past, present, and future states even if the precise nature of the system is unknown. Its predictive algorithm considerably reduces the search space required and offers an accurate method of tracking objects [21].

• The mathematics used by the Kalman filter is very intensive and can result in large computational costs.
• There is also a large variety of Kalman filters available, with varying complexity and accuracy — it is not obvious which filter is best suited to individual applications.
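The prediction step behind differencing-style tracking is simple enough to sketch. The following fragment is illustrative only (the structure and names are not taken from any of the systems surveyed): it extrapolates an object's next centre-of-mass position from its two previous positions, so that the search in the new frame can be restricted to a small window around the prediction.

```c
/* Constant-velocity prediction of the kind used by differencing-style
 * trackers: estimate the per-frame velocity from the last two observed
 * positions and extrapolate one frame ahead. */
typedef struct {
    double x_prev, y_prev;   /* centre of mass two frames ago  */
    double x_curr, y_curr;   /* centre of mass in the last frame */
} track_state;

static void predict_next(const track_state *t, double *px, double *py)
{
    double vx = t->x_curr - t->x_prev;   /* pixels per frame */
    double vy = t->y_curr - t->y_prev;
    *px = t->x_curr + vx;
    *py = t->y_curr + vy;
}
```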
1.4 Robot tracking

This section provides a rough outline of some research applying object tracking methods specifically to tracking mobile robots. The most common field of research combining both robotics and vision comes from robot soccer competitions. Several teams use their own vision systems, with the robot players 'coached' by a PC watching the action on an overhead camera.

1.4.1 The Cognachrome vision system

A popular commercial system, used to win the 1996 International Micro Robot World Cup Soccer Tournament, is the Cognachrome Vision system built by Newton Research Labs [10].

Figure 1.2: View of football arena

The system is chiefly implemented in hardware, allowing it to track 5 objects in a frame of 200x250 at a rate of 60 frames a second. The system can track a maximum of 25 objects (although the speed drops below 60 fps as the number of objects increases beyond 5).

• Each object is distinguished by colour. There are 3 colour tracking channels, each of which can track multiple objects of the given colour. Each robot is identified by two coloured circles: the larger identifies the team and the smaller identifies the player in that team.

Figure 1.3: Identifying Robo-soccer players

• The light source and light gradient determine how the camera interprets the colour balance — which can change over the pitch.
• The curvature of the lens distorts the pitch so careful calibration is required.
• The system has been successfully applied to robot catching, soccer and docking applications.

1.4.2 Roboroos robot soccer team

A team of researchers from the Department of Computer Science at the University of Queensland produced their own tracking method [5], reaching second place in the 1998 RoboCup. This system used template matching to match both robots and the ball, successfully distinguishing them from foreign objects and noise. The tracking was conducted by a method of differencing between two frames.

The system was unable to uniquely identify each individual object and track it across images — leading to inconsistent results in varying environments. The researchers concluded that refinements needed to be made to the tracking component by applying a Kalman Filtering technique.

1.5 Robot Tracking summary

• The tracking process can be split into three distinctive elements: extracting important features from the image, classifying features found in the image into objects, and tracking these objects in successive frames.
• All of the later tracking elements require a successful segmentation to have been performed.
• With carefully designed objects, in controlled environments, simple thresholding techniques allow the object pixels to be extracted from the image. In changing conditions, thresholding may segment these objects poorly, causing problems for the classification element.
• Using a probabilistic technique, badly segmented and classed objects can still be accurately tracked. These techniques are computationally (and mathematically) complex.
• A number of moderately successful robot tracking applications are available. These either track objects by colour (which requires extra calibration to find the colour threshold used when segmenting the images) or by following robot 'indicators' placed on top of the robot.

Chapter 2

Single Khepera tracking system and the vision hardware

This chapter principally introduces the vision hardware used in the scope of this project. The single Khepera tracking system (which this project extends) is also described.
As well as using the existing Khepera tracking hardware, a 'test' arena was made to allow simple tests to be conducted on an independent system. This arena used newer and cheaper vision hardware, so the tests were also partly to evaluate this hardware as an alternative method of conducting robot tracking work in the future.

The main problems with the existing system will be detailed. To outline, these are:

• The system does not attempt to identify the object. If the system is extended to track multiple objects, it is not clear which object is which in each frame.
• The system tracks object(s) by following two LEDs placed on top of the robot. These are generally not part of the robot, so an external circuit board is required.
• It is difficult for a basic vision system to group together LEDs if there is more than one robot in the frame. This is especially true when the robots are near to each other, which makes robot interaction tasks impossible.
• The system makes no attempt to track an object if it is partially obscured by the overhead umbilical cord of the Khepera.
• Segmenting the LEDs from the image relies on a fixed, user defined threshold.

2.1 The single Khepera tracking system

The tracking solution presented by Lund, Cuenca and Hallam [12] simplifies the tracking problem by attaching an external circuit board with two mounted LEDs onto the Khepera. This allows a very simple tracking system to accurately follow the position of the two LEDs within the image. The LEDs are identified as the two brightest pixels (above a set threshold) within a window in the image. If the LEDs are found in the window then the robot has been detected, and simple pixel moment calculations are conducted to extract its center of mass position and orientation. If the LEDs are not found (for example, if the Khepera is hidden under its umbilical cord) then the robot is lost and the system waits for it to become visible again.

By simplifying the system in this way, the system bypasses some of the complexity involved in the object segmentation and classification issues by never needing to explicitly identify the robot. Only the two LEDs (which are easily distinguished from the rest of the image) need to be segmented from the image.

2.1.1 The Khepera arena

The Khepera arena consists of a number of pieces of chipboard arranged into a 2.4m x 2.4m square. A black and white camera is located 2m above the central position of the arena. A Matrox Meteor frame grabber samples the camera at a maximum resolution of 640 x 480 at a maximum rate of 50 frames per second (every 0.02s). The resolution varies over the image; at the center one pixel represents a real distance of ∼7mm. The actual arena is contained in a 370 x 370 pixel sub-window in the center of the image.
Figure 2.1: Layout of the Khepera arena

The Meteor card is attached to a Linux box with 16 MB of memory running at 75 MHz. This frame grabbing hardware is obsolete and no longer supported by Matrox. A third party, free device driver is used to control the frame grabber [1].

2.1.2 Meteor Driver

The frame grabber captures a colour image and stores it in a directly referenced, contiguous block of PC memory. The Linux box has 16 MB of RAM, 1.2 MB of which is allocated for frame storage purposes. This allows enough room for two 640x480 frames to be stored in memory (the frames are captured in colour, with two bytes required to represent each pixel). Frames can either be stored in an RGB format, or in a packed or planar YUV420 format. The planar YUV format stores the luminance component (Y) and the chrominance components (U and V) in separate blocks (see figure 2.2). Because only the intensity component is required (the image is only in black and white), this format is convenient as it allows quick, efficient access to the Y component.

Figure 2.2: The YUV planar arrangement

The driver can operate in a number of modes. A user can request a single frame, or a continuous (unsynchronised) stream of images. The most common is an 'asynchronous' mode which allows the images to be placed into different buffers in the memory area. This allows one frame to be processed by the user while the capture card places image data into another. The capture card sends a signal to the user process when a frame is ready to be processed.

Reading from the device

Frames are transferred from the capture card to a contiguous block of RAM on the PC by a DMA process. To guarantee enough contiguous space, the Unix kernel can be prevented from using a segment at the 'top' of RAM by hiding it during boot time. The kernel will not build page tables to reference this area, so Unix processes will not use this segment. Providing no other devices are configured to use it, this segment of physical memory will be written to by the frame grabbing device only. Bypassing the Unix kernel in this way is the only way to ensure a large contiguous block (several hundred kilobytes) of RAM. A simple method of 'hiding' RAM from the kernel at boot time has been obtained by applying a "big physical area" patch [8] to the kernel.

Figure 2.3: Memory layout of a UNIX system

The device driver allows a user to transfer this frame data into memory allocated for their own use by a read() call. This has obvious speed and storage disadvantages (the image exists twice in RAM). The driver offers a more efficient method of memory-mapping the hidden segment to the user via a mmap() call.
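In outline, access through the memory-mapped route might look like the following sketch. The device path, the mapping size and the assumption that the driver exposes the planar frame directly through mmap() are illustrative rather than taken from the thesis code or the Meteor driver documentation:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define WIDTH   640
#define HEIGHT  480

int main(void)
{
    /* Hypothetical device node; the real driver's node may differ. */
    int fd = open("/dev/meteor0", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Map one frame bank (two bytes per captured pixel, as described
     * above).  In the planar layout the Y plane occupies the first
     * WIDTH*HEIGHT bytes, followed by the subsampled U and V planes. */
    size_t bank_size = (size_t)WIDTH * HEIGHT * 2;
    uint8_t *bank = mmap(NULL, bank_size, PROT_READ, MAP_SHARED, fd, 0);
    if (bank == MAP_FAILED) { perror("mmap"); return 1; }

    const uint8_t *y_plane = bank;          /* intensity component only */
    printf("intensity at (0,0): %d\n", y_plane[0]);

    munmap(bank, bank_size);
    close(fd);
    return 0;
}
```

The point of the mapping is that the tracker reads pixel intensities in place, so the image never has to be copied out of the hidden segment.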
Problems with the driver

• Due to the experimental nature of the device driver, frames were often out of sync when using the single frame capture mode and during early frames in the synchronous capture mode. A basic workaround was to ignore the early frames.
• A more problematic fault with the driver was that during synchronous capture the card is free to dump a frame into any bank. If the user is slow processing a frame bank, the capture card may write another frame into that bank. The user must either assume that the frames will not have altered much in a 20 ms time slice (in which time a Khepera can move a maximum of 2 cm) or copy each frame into another section of memory.

2.1.3 Operation

Tracking the two LEDs allowed the orientation of the robot to be calculated — which allowed the system to predict where the robot was going to be next (although this feature was not used by the tracking system itself). However, it was not clear which way along this line of orientation the robot was heading.

Kheperas can move at a maximum speed of 1 m/s. In one time slice (20 ms) a Khepera can move a maximum of 2 cm (less than 3 pixels). Figure 2.4 shows a Khepera in the bottom left of the arena. The ∼58 mm diameter of the Khepera occupies a box of ∼8x8 pixels.

Figure 2.4: A Khepera in the arena, and a pixel-level view

There were two operational modes of the system, 'tracking' and 'scanning and detection'. During the detection phase the arena is scanned, breaking the image into bands to speed up processing (only one band is processed in each frame). The pixel moments in the band are calculated to determine whether the robot is inside the band, and to derive its orientation and the position of the two LEDs. When tracking, the LEDs are assumed to be in a window of 8x8 pixels (equivalent to 5.8 x 5.8 cm) centred at the previous position of the robot. If the robot is not found in this window then the system reverts to its detection phase.

Problems with the system

• The system can be expanded to track several objects, but the objects cannot be identified. When Kheperas get too close together it is very difficult (for a simple vision system) to determine which LEDs are grouped. An example is shown in figure 2.5.
→ As the objects are not identified, there is no control over how the objects are arranged in the output from the system. The first robot seen in a scan is generally classed as robot 'one', but during another scan this might be the second robot found, and classed as robot 'two'.

Figure 2.5: Two Kheperas interacting

• The relatively poor resolution of the Khepera in the arena (∼8 pixels in diameter) makes a more complicated vision technique difficult.
• The system has to be carefully calibrated (to obtain the real 'arena' coordinates of the Khepera). Any change in the setup (camera or arena position) requires the lengthy calibration process to be re-conducted.
• The intensity threshold used to find the LEDs has to be set by hand.
→ Depending on the lighting conditions this can vary.
→ It can also vary depending on the position of the Khepera in the image.
• However the system offers some advantages. The system has been used to track two objects by placing the two LEDs on the two objects and tracking the lightest pixels around the screen.
This works well for objects close to each other (e.g. in box pushing experiments). The orientation of the robot is lost though (obviously this cannot be found from one pixel). Post-processing of the positions is used to assign each recorded position to the correct robot.

2.2 Evaluation system

The Khepera arena presented a complex environment for a vision system:

• The arena consists of a number of unevenly surfaced pieces of chipboard arranged together.
• It has a surrounding wall of foam about a metre high (for an acoustic experiment) which casts uneven shadows around the board.
• The overhead lighting is unevenly spaced around two sides of the arena, making patches of light and dark regions around the arena.
• To allow the Khepera to run online, an umbilical cable with a large feeder crane is placed overhead, which can occlude objects in the arena.
• The Kheperas themselves appear as very small objects within the image.

These features are shown in figure 2.6. A Khepera can (just) be seen in the bottom right of the arena. The overhead crane, wires and counter-balance weight (used to run the Khepera online) can be seen in the center of the image.

Figure 2.6: The Khepera Arena

To allow for a more controllable environment, and a clearer robot resolution, a purpose built arena was constructed for the testing phase of this project. A cheap USB web camera was purchased. This plugged directly into a Unix box and could be controlled by generic video driver software. The more expensive camera and frame grabber used in the Khepera arena were obsolete, so the purpose built arena also acted as an evaluation of using this cheap alternative as part of future tracking systems.

2.2.1 Layout of the test system

The system did not require any expensive frame grabbing hardware, just a USB port. The processor used was a 500 MHz Pentium III, allowing faster processing than the 75 MHz system used in the Khepera arena. However the system had a maximum transfer rate of fifteen frames (of 640x480) per second (∼0.067 s per frame), because USB can only transfer data at a maximum rate of 12 Mbps — much slower than the Meteor frame grabber, which could operate at 50 fps (0.02 s per frame).

Experiments performed on the system used direct pixel values for positional information — the experiments were only for the evaluation of the vision system so calibration was not needed. This allowed the system to be moved around to suit the application.

Figure 2.7: The layout of the test system

2.2.2 Video for Linux driver

To access the USB camera with Linux, a kernel with USB support was required [19]. This was only available (at the time) in development kernels, so a back-port was obtained which included the USB code in a stable Linux kernel. The main advantage of this setup was that Video for Linux [11] could be used as the driver. This is built into modern Linux kernels, and allows generic code to work with any compatible video device. Using this driver the tracking system could be used with several hardware setups. A Phillips PCVC680K camera was used; this allowed:

• A maximum resolution of 640x480 in colour at a maximum rate of 15 fps.
• Numerous pixel formats (including planar YUV420 as used in the Khepera arena system).
• Several vision controls (brightness, contrast, colour) which could be controlled by software. The camera could also auto-adjust its aperture size.
The Video for Linux driver allowed an array of frame-banks to be used as storage space for the incoming frames. As with the Meteor driver, the area used for frame storage could be memory-mapped by the user. The user selected a particular bank and requested the driver to place an image into that bank with a non-blocking call. This allowed the user to process another frame while the driver acted on the request. When the user was ready to process the incoming frame, a blocking synchronise call was issued which waited for the driver to finish writing to the bank.

Figure 2.8: Video for Linux - Capturing process (request a frame into bank A, process the frame in bank B, wait for frame A to complete, then swap banks)
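Using the Video for Linux (V4L1) interface of the period, the capture loop of figure 2.8 might be written roughly as follows. This is a sketch rather than the thesis code: the device path, resolution and number of iterations are illustrative, and error handling is minimal.

```c
#include <fcntl.h>
#include <linux/videodev.h>   /* V4L1 interface, as available circa 2000 */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Ask the driver how its frame banks are laid out, then map them. */
    struct video_mbuf mbuf;
    if (ioctl(fd, VIDIOCGMBUF, &mbuf) < 0) { perror("VIDIOCGMBUF"); return 1; }
    unsigned char *banks = mmap(NULL, mbuf.size, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (banks == MAP_FAILED) { perror("mmap"); return 1; }

    struct video_mmap req;
    req.frame  = 0;
    req.width  = 640;
    req.height = 480;
    req.format = VIDEO_PALETTE_YUV420P;   /* planar: Y plane comes first */

    /* Prime the first capture, then alternate between two banks:
     * request the next frame into one bank while the frame just
     * completed in the other bank is processed. */
    ioctl(fd, VIDIOCMCAPTURE, &req);
    for (int n = 0; n < 100; n++) {
        int ready = req.frame;

        req.frame = (ready + 1) % 2;
        ioctl(fd, VIDIOCMCAPTURE, &req);          /* non-blocking request */

        if (ioctl(fd, VIDIOCSYNC, &ready) < 0) {  /* block until written  */
            perror("VIDIOCSYNC");
            break;
        }

        unsigned char *y_plane = banks + mbuf.offsets[ready];
        printf("frame %d: intensity at (0,0) = %d\n", n, y_plane[0]);
    }

    munmap(banks, mbuf.size);
    close(fd);
    return 0;
}
```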
Test arena issues

• The main advantages of this test system were that it was portable, customisable and cheap. Any system that used the Video for Linux driver would conceivably work with any Video for Linux compliant hardware.
• The small Khepera robot could be represented with better definition. In the Khepera arena the ∼60 mm diameter robot covered 8 pixels. With the setup used in the test arena, the same diameter covered 30 pixels.
• The camera produced a colour image. This offered an extra feature parameter which a detection system could use to distinguish between objects. However the system presented in this report was principally built to work on the black and white system available with the Khepera arena, so this colour information is not used.
• The potential problem with the system is the relatively slow frame rate offered by the USB. Only a maximum of 15 frames per second is offered (compared with the 50 frames offered by the Matrox Meteor hardware).

Figure 2.9: A (covered) Khepera in the test arena (and a pixel-level view)

2.3 Hardware and single Khepera tracking summary

• In this chapter the vision hardware used during the scope of this project was introduced. The existing Khepera arena system was detailed, and the system used to evaluate a cheap USB web camera as an effective tracking camera was also described.
• The single Khepera tracking system was introduced, and the problems and limitations of the system were described.
→ An extraneous piece of hardware (a LED board) was required so each object could be tracked.
→ Only the orientation of the tracked object was found. The actual heading of the robot was unknown (whether the robot was facing up or down the line of orientation).
→ A prediction of the object's next position, using orientation and previous positions, was not used when looking for the object in a new frame.
→ The system had a built-in conversion system to convert pixels to world coordinates. This required a lengthy calibration procedure for each change to the setup.
→ The system relied on a fixed threshold value to determine whether the LEDs had been found. This threshold was fixed by hand.
→ Crucially, tracking more than one robot was not feasible (especially interacting robots).

Chapter 3

Object segmentation and classification

This chapter describes the methodology used to segment and identify objects within an image, as employed by the extension to the tracking system presented in this report. Ideally the new system would be sufficiently simple to run on the existing hardware (with a relatively slow processor). A more sophisticated method of segmenting and classifying the objects in an image was required, so that the robots could be identified, preventing confusion when making classifications.

A system of using 'object-indicators' to mark the identity of each object was introduced. Rather than tracking LEDs, the system would track (and identify) objects by an indicator placed on top of the robot. These indicators were easy to see by a vision process, compared to the less well defined robots. The indicator method allowed a user to construct their own indicator from any material, without relying on many software or hardware constraints (unlike the purpose built LED boards required for the existing system). The indicators would be recognised by a set of feature parameters (which would vary for each indicator). This would allow the system to recognise each object in each frame, thus avoiding the principal problem of the single Khepera tracking system (not being able to recognise robots in successive frames). The only criteria the user would have to consider when making the indicators would be to:

• Design indicators large enough to provide sufficient resolution to perform the relevant vision tasks.
• Construct enough sufficiently differently parameterised indicators so that each object could be distinguished by the system.

Figure 3.1: Differently parameterised indicators

By making the indicators sufficiently small compared to a robot, and placing a boundary around each indicator, the indicators could be prevented from overlapping and adding extra complexity to a classification system.

3.1 Vision techniques

Each indicator needed to be designed to allow it to be separated from the image background by a simple thresholding process. Simple machine vision techniques were used to extract thresholded pixels from the image, and to group connected pixels into regions. These regions were processed further to extract more detailed information (the size, position, and orientation), allowing the 'un-interesting' regions to be ignored so that only the regions which possibly matched important objects were studied.

The majority of techniques demonstrated in this chapter are basic vision techniques, discussed in many sources [3], [9], and [16]. Machine vision algorithms generally follow these conventions. The image is described as a matrix with row and column indices i and j respectively. The pixel in row i and column j is described by the element [i, j] in the image matrix. This matrix is of size m x n, corresponding to the height and width of the image. By convention the pixel [0, 0] is visualised as being at the top left of the image. Therefore the index i points 'downwards' and index j points to the 'right'. This is illustrated in figure 3.2.

Figure 3.2: Pixel labelling conventions

3.1.1 Moments

Information about an object represented by a group of pixels can be extracted from the intensity and position information of each pixel. This allows a relatively detailed knowledge of the object's position, size, and orientation to be found [3]. Objects are represented by 'on' pixels in a binary pixel set describing the pixels in an image after a segmentation process has been applied to the gray level image (see section 1.3.1, on page 5).

Gray level moments

Using the gray-level intensity information about a pixel provides more accurate moment calculations [9]. The intensity information is used as an indicator of the 'mass' (or some other weighting factor) of the pixel.
This provides a more accurate calculation as the more important pixels (with intensities far away from the threshold boundaries) are given a higher mass weighting, to allow for errors in the segmentation process. Pixels that are mistakenly classed as part of an object (or missed from the object) are generally close to the threshold boundaries and are given less weight during calculations.

Zeroth order moments - Area

The area of an object contained in a binary pixel set B[i, j] is simply the number of 'on' pixels contained in the set:

A = \sum_{i=1}^{n} \sum_{j=1}^{m} B[i, j]

First order moments - Position

The mid-position of an object can be found by using the first order moments. If we consider the intensity of a point as being an indicator of the 'mass' of this point, this can be used to calculate the center of mass coordinates (\bar{x}, \bar{y}) of the object represented in the binary set (equation 3.1).

\bar{x} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} i \, B[i, j]}{A}, \qquad
\bar{y} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} j \, B[i, j]}{A}    (3.1)

The calculation requires the position and intensity value of each pixel making up the object to be examined. The result is the weighted mean of all the pixels forming the object.

Second order moments - Orientation

To be able to define a unique orientation, the object must be elongated along one axis [9]. The orientation of this axis defines the orientation of the object. The axis of least second moment, which is equivalent to the axis of least inertia (in 2D), is used to find this axis of elongation. The orientation is the angle θ of this axis from the x axis.

Figure 3.3: The orientation of an object

Parameters which solve the least squares fit of the axis through the object pixels are the second order moments shown in equations 3.2. A proof of this is shown in appendix A.4.

a = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_{ij} - \bar{x})^2 B[i, j]

b = 2 \sum_{i=1}^{n} \sum_{j=1}^{m} (x_{ij} - \bar{x})(y_{ij} - \bar{y}) B[i, j]    (3.2)

c = \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \bar{y})^2 B[i, j]

These coefficients can be used to find the orientation θ of the axis of elongation from the x axis, as shown in equation 3.3.

\tan 2\theta = \frac{b}{a - c}    (3.3)

Notice, however, that an algorithm must already know the centre of mass position of the object (\bar{x}, \bar{y}) before starting to calculate these coefficients. The position of each pixel has to be subtracted from the mean position, so each object has to be examined twice: first to enable the extraction of the mean position of the object, and then to calculate the orientation coefficients. A more efficient algorithm is presented in section 3.2.2 on page 34. Notice that for clarity the intensity weighting factor of each pixel is not included in the coefficient calculations in equations 3.2.

Compactness

The compactness of a continuous geometric figure is measured by the isoperimetric inequality

\frac{P^2}{A} \ge 4\pi

where P and A are the perimeter and area respectively of the figure. This enables geometric figures to be classified regardless of their size [9]. The shape with the smallest compactness (4π) is a circle. The compactness is an invariant parameter, so its value for each indicator should stay constant wherever the indicator is in the arena (because of lens curvature, objects further away from the center of the image are smaller). This parameter should have allowed carefully designed indicators to be identified by their shape.
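A direct (two-pass) implementation of these moment calculations might look like the sketch below, for a binary image stored row-major (intensity weighting omitted, as in equations 3.2; function and variable names are illustrative, not the thesis code):

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Area, centre of mass and orientation of the 'on' pixels in a binary
 * image, following equations 3.1-3.3 directly: one pass for the
 * centroid, a second pass for the central second-order moments.
 * The row index i is used as the x coordinate and the column index j
 * as the y coordinate, matching the conventions of figure 3.2. */
void binary_moments(const uint8_t *B, size_t rows, size_t cols,
                    double *area, double *cx, double *cy, double *theta)
{
    double A = 0.0, sx = 0.0, sy = 0.0;

    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            if (B[i * cols + j]) {
                A += 1.0;
                sx += (double)i;
                sy += (double)j;
            }

    *area = A;
    if (A == 0.0) { *cx = *cy = *theta = 0.0; return; }
    *cx = sx / A;
    *cy = sy / A;

    double a = 0.0, b = 0.0, c = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            if (B[i * cols + j]) {
                double dx = (double)i - *cx;
                double dy = (double)j - *cy;
                a += dx * dx;
                b += 2.0 * dx * dy;
                c += dy * dy;
            }

    /* Equation 3.3: tan(2*theta) = b / (a - c). */
    *theta = 0.5 * atan2(b, a - c);
}
```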
3.1.2 Grouping pixels to regions - Connected Components

When an image has been segmented and represented by a binary image, pixels that obviously make up a region can be grouped together using the 'connected component' algorithm [3]. In the common image representation, a pixel has a common boundary with four other pixels (its 4-neighbours), and shares a corner with four additional pixels (its 8-neighbours).

Figure 3.4: Pixel neighbours

A path from a pixel at [i_0, j_0] to [i_n, j_n] is a sequence of pixel indices [i_k, j_k] such that the pixel at [i_k, j_k] is a neighbour of the pixel at [i_{k+1}, j_{k+1}] for all k in 0 ≤ k ≤ n − 1. The set of all 'on' pixels in a binary image, S, is the foreground of the image. A pixel p ∈ S is connected to q ∈ S if there is a path from p to q consisting of pixels of S. A set of pixels in which each pixel is connected to all other pixels is a connected component [9]. This defines regions within the segmented image.

Component labelling

The standard component labelling method finds all of the connected components in an image and assigns a unique label to each component. All of the pixels forming a component are assigned that label. This technique forms a label map which describes the image in terms of the object that each pixel represents (pixels that are not part of an object are defined as the background).

Figure 3.5: Describing thresholded image with regions

The sequential form of the algorithm requires two passes of the image, and is described more fully in appendix A.1.

Connected component summary

• The major disadvantage of this approach is that ALL of the pixels inside the processed window need to be scanned twice. A slight disadvantage is that a second copy of the image is produced when constructing the label map (doubling the memory requirements).
• However this label map allows the regions to be examined quickly by further algorithms. The pixel moments of each region can be calculated by using the knowledge of which pixels belong to which region (as contained in the label map).
• The project used a more efficient component labelling strategy which only required one complete scan of the pixels — although a label map was not constructed. This is presented in section 3.2.1.

3.1.3 Boundary tracking

A simple algorithm (described in appendix A.2) allowed the boundary of a connected component to be found. This algorithm required a starting index on the edge of an object to be provided, and traced around the boundary of the object.

3.2 Efficiency of the vision algorithms

The obvious key to real-time image processing is to process each image quickly. Because of the relatively slow processor in the Khepera arena, the system needed fast, simple algorithms. The traditional vision algorithms presented above are slow — especially the connected component algorithm, which scans the whole image window twice. These algorithms can be sped up so that the bulk of each image is only processed once — however this results in the loss of some information, so careful use is required.
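For reference, the standard two-pass labelling of section 3.1.2 (the approach the one-pass variant of section 3.2.1 improves on) can be sketched as follows for 4-connectivity; the equivalence handling and array sizes are illustrative rather than taken from appendix A.1:

```c
#include <stdint.h>
#include <stdlib.h>

/* Classical two-pass connected component labelling (4-connectivity).
 * Pass one assigns provisional labels and records equivalences between
 * touching labels; pass two rewrites each pixel with the representative
 * of its equivalence class, producing a label map. */

static int find_root(const int *parent, int x)
{
    while (parent[x] != x) x = parent[x];
    return x;
}

static void merge(int *parent, int a, int b)
{
    a = find_root(parent, a);
    b = find_root(parent, b);
    if (a != b) parent[b] = a;
}

/* B: binary image (0/1); labels: output label map (0 = background). */
int label_components(const uint8_t *B, int *labels, int rows, int cols)
{
    int *parent = malloc(((size_t)rows * cols / 2 + 2) * sizeof *parent);
    int next = 1;

    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            int idx = i * cols + j;
            if (!B[idx]) { labels[idx] = 0; continue; }
            int up   = (i > 0) ? labels[idx - cols] : 0;
            int left = (j > 0) ? labels[idx - 1]    : 0;
            if (!up && !left) {            /* start a new provisional label */
                parent[next] = next;
                labels[idx] = next++;
            } else if (up && left) {       /* both neighbours labelled      */
                labels[idx] = up;
                merge(parent, up, left);   /* record the equivalence        */
            } else {
                labels[idx] = up ? up : left;
            }
        }
    }

    /* Second pass: replace provisional labels by their representatives. */
    for (int k = 0; k < rows * cols; k++)
        if (labels[k]) labels[k] = find_root(parent, labels[k]);

    free(parent);
    return next - 1;     /* number of provisional labels allocated */
}
```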
3.2.1 Efficient connected component algorithm

Rather than constructing a map of the pixel labels during the connected component algorithm, only the index of the top-left pixel and the area of each region were stored by the efficient algorithm. This reduced the two-pass algorithm to one pass. If a region looked interesting (that is, if its area fell within a particular range), the actual information about the region could be extracted by re-applying the algorithm to this much smaller region. In this way, rather than sweeping the whole 640x480 image a second time (as in the original algorithm), only the important-looking regions were scanned twice (a much smaller section of the image).

A problem with this method was that a label map of the regions was not produced. All of the moment calculations rely on summations of intensity and pixel positions, so these pixel summations had to be conducted during the second sweep. This presented a problem for the orientation calculation.

Figure 3.6: Object representation after one pass of the connected component algorithm

3.2.2 One pass orientation calculation

As shown in section 3.1.1 (on page 29), the calculation of the orientation of an object required the first-order center of mass positions to be known while completing the summation calculations over each pixel in the object. This would require these regions to be processed a third time. A more efficient calculation of the second order moments can be shown (see appendix A.4.1) to be:

    a = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij}^2
        - 2\bar{x} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij}
        + \bar{x}^2 \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}

    b = 2 \left[ \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij} y_{ij}
        - \bar{x} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} y_{ij}
        - \bar{y} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij}
        + \bar{x}\bar{y} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} \right]        (3.4)

    c = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} y_{ij}^2
        - 2\bar{y} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} y_{ij}
        + \bar{y}^2 \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}

This change allowed the pixel values to be summed during the sweeping phase without having to know the first order position moments during the calculations. Only the positions x_{ij}, y_{ij} and grey level intensity F_{ij} of the current point were required during these calculations. The first order values (x̄, ȳ) were factored in after each pixel had been examined (a sketch of this accumulation is given after the summary below). The orientation of the axis can then be found from these coefficients as shown in equation 3.3 (page 30).

Region finding summary

• Objects (indicators) within a specific threshold could be extracted from the image.
• The pixels forming each object could be grouped together.
• The size, position, perimeter and orientation of each indicator could be extracted from these pixel groups.
• Efficient algorithms were created to extract the relevant features from the image by scanning the whole image only once, rather than the traditional method of scanning the whole image twice.
• Indicators had to be large and clear enough to see, and had to contain an obvious axis of elongation to allow the orientation of the object to be extracted.
• The orientation of the object only gave the orientation of the axis of elongation. There was no indication as to the heading of the robot along this line. (This is rectified in section 4.2.1, on page 60.)
• Indicators could be classified by their relative size (although this would change slightly depending on the position), compactness, or some other invariant feature parameter.
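As promised above, a minimal sketch of the one-pass accumulation is given here. The struct and function names are invented for illustration; the point is that only running sums of F, Fx, Fy, Fx², Fxy and Fy² need to be kept while sweeping a region, after which equations 3.4 and 3.3 yield the center of mass and orientation.

    #include <math.h>

    /* Accumulators for the one-pass orientation calculation (equation 3.4). */
    typedef struct {
        double sF, sFx, sFy, sFxx, sFxy, sFyy;
    } moment_acc;

    /* Call once per pixel belonging to the region (F = grey level weight). */
    void acc_pixel(moment_acc *m, double x, double y, double F)
    {
        m->sF   += F;
        m->sFx  += F * x;   m->sFy  += F * y;
        m->sFxx += F * x * x;
        m->sFxy += F * x * y;
        m->sFyy += F * y * y;
    }

    /* After the sweep: centre of mass, then a, b, c and theta (equation 3.3). */
    void acc_finish(const moment_acc *m, double *xbar, double *ybar, double *theta)
    {
        *xbar = m->sFx / m->sF;
        *ybar = m->sFy / m->sF;
        double a = m->sFxx - 2.0 * (*xbar) * m->sFx + (*xbar) * (*xbar) * m->sF;
        double b = 2.0 * (m->sFxy - (*xbar) * m->sFy - (*ybar) * m->sFx
                          + (*xbar) * (*ybar) * m->sF);
        double c = m->sFyy - 2.0 * (*ybar) * m->sFy + (*ybar) * (*ybar) * m->sF;
        *theta = 0.5 * atan2(b, a - c);
    }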
An example of some regions found in an image, with their centers, orientations and perimeters marked out, is shown in figure 3.7.

• Three regions have passed the intensity and area thresholds: the indicator, a lamp (at the bottom of the image) and a section of wall (on the left of the image).

Figure 3.7: Extracted moment information from an image

• A black mark shows the detected center of mass position of each region, with the orientation displayed above this marking.

3.3 Region finding experiments

It was hoped that objects could be sufficiently identified by the simple feature parameters shown (size, area, perimeter and compactness) to enable simple indicators to be built. Experiments were performed to evaluate such a classification method.

3.3.1 Indicator classification experiments

Tests were performed comparing the parameter information obtained from a large A4 indicator positioned in different parts of the arena. The tests used the same threshold and similar environmental conditions (the arena in the same state, and similar lighting conditions with a small time between tests). However, in this complex environment the parameters extracted from the object varied by a large amount. The lighting could not be controlled sufficiently, because of outside lighting effects. The lighting conditions also altered in different parts of the arena, resulting in the fixed threshold selecting radically different pixels depending on where the indicator was. Even a static indicator, processed several times, produced a significant parameter change (due to slight changes in the overall lighting conditions).

Figure 3.8: An A4 sized indicator in the Khepera arena

Table 3.1 shows examples of the parameters extracted from an A4 sized indicator located in different parts of the Khepera arena. The threshold used to segment the indicator from the image was the same in each position (130).

    Parameter      Value 1   Value 2   Value 3   Error (%)
    Area            1300      1154       857       51
    Perimeter        143       137       144        5.1
    Compactness       15.7      16.3      24.2     54.1

    Table 3.1: Moments derived from an A4 indicator in the Khepera arena

As the indicator size reduced, a larger fraction of its total area was misclassified, resulting in even poorer results. Even controlling the lighting in the more easily controlled test arena resulted in poor results (as shown in table 3.2). The parameters extracted from static indicators fluctuated less, as the lighting conditions could be kept moderately constant.

    Parameter      Value 1   Value 2   Value 3   Error (%)
    Area            2911      1754      2099      66
    Perimeter        215       208       207       3.9
    Compactness       15.88     24.95     20.41   57.1

    Table 3.2: Moment invariance of an indicator in the test arena

Clearly the area and compactness values changed by too large a margin to allow the successful classification of different objects. To solve this problem either another method of classification, or a method of selecting a threshold depending on position, was required.

Indicator classification summary

• In the complex vision environment of the Khepera arena, simple feature parameters such as the indicator size proved to vary too much to allow object identification to be made.
• This was due to:
  → Fluctuating lighting conditions.
  → Non-uniform lighting and background conditions over different parts of the arena.
  → Poor resolution of the indicators.

3.4 Extending the classification system

To allow for successful identification of each object, a stronger method of identification was introduced.
A number of black markers were placed on the indicators. These markers would stand-out from the indicators, such 3.4 Extending the classification system 39 that they would not be part of the indicator after segmentation, effectively appearing as ‘holes’3 in the indicator. Each object was identified by the number of holes on the indicator. OBJECT 1 OBJECT 2 Figure 3.9: The indicator marking system This invariant parameter depended only on: • A secondary thresholding process being able to locate all of the holes on an indicator. • The process not mistaking dirt or shadows on the indicator for extra holes. Again this allowed very simple, user defined indicators to be made. These were very simple in that one design could be used, with each indicator being identified by a particular number of holes, rather than by using different shaped and sized indicators to classify each object. The only extra constraint introduced was for the user to make the holes large enough to be visible and far enough apart to distinguish. Each indicator had to be large enough to allow sufficient spacing between the holes and the edge of the indicator (to prevent the merging of holes). Locating and counting holes required extra computation. Again the bulk of the computational time came when scanning the entire image. Once again the areas actually scanned for holes would be much smaller than the overall image size. 3 During this report the markings are referred to as holes (which is how vision algorithms see them). It should be understood that rather than being physical holes in the indicators they are black marks. 40 3. Object segmentation and classification A sketch of the algorithm used for locating holes can be seen in appendix A.3. IMAGE ON PIXEL OFF PIXEL BACKGROUND HOLES Figure 3.10: Identifying an indicator by holes 3.4.1 Processing regions To extract the information about the region: • The single pass, connected component algorithm was initially applied. The number of components, their size and the index of the top-left pixel were stored. • Objects too small or too large were ignored. • The perimeter of each object was then scanned. Objects touching the edge of the image were ignored, and objects too large or too small could also be ignored. • The object was then scanned for holes (see appendix A.3). The connected component algorithm was applied searching for pixels inside the indicator less than a ‘hole’ threshold. • Care had to be made to not confuse jagged edges (which could appear to be holes) with actual holes. The system needed to check that each hole was completely surrounded by the indicator. • The summation information found during the hole processing was added 3.5 Detection tests and indicator design 41 to the overall moment calculations to achieve more accurate positional and orientational information about the indicator. Detection summary • An image was scanned for ‘interesting’ looking areas, these were identified as specific indicators which were matched to objects. • After scanning the following was known about the object: position, orientation, area, perimeter and number of holes it contained. • This information (particularly the number of holes) could be used to identify the object. 3.5 Detection tests and indicator design Some sample indicators (using the hole marking scheme) were created to demonstrate the detection process. 
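Before moving on to the indicator designs, the region-processing flow of section 3.4.1 can be summarised in code. The sketch below is illustrative only: the helper functions and the region structure are assumed names standing in for the routines outlined in appendices A.1-A.3, and the area limits and thresholds are placeholders for the user-set values discussed above.

    /* Illustrative sketch of the region-processing steps of section 3.4.1.
     * find_regions (the one-pass algorithm of section 3.2.1),
     * touches_image_edge and count_holes are hypothetical stand-ins. */
    #define MAX_REGIONS 32

    typedef struct {
        int top_left;        /* index of the top-left pixel found in the first pass */
        int area;            /* number of pixels above the indicator threshold      */
        double x, y, theta;  /* filled in during the second, local scan             */
        int holes;           /* number of hole markers found inside the region      */
    } region;

    int find_regions(const unsigned char *img, int ind_thresh, region *out, int max);
    int touches_image_edge(const unsigned char *img, const region *r);
    int count_holes(const unsigned char *img, region *r, int hole_thresh);

    /* Returns the number of identified indicators written into `out`. */
    int identify_indicators(const unsigned char *img, region *out,
                            int ind_thresh, int hole_thresh,
                            int min_area, int max_area)
    {
        region found[MAX_REGIONS];
        int n = find_regions(img, ind_thresh, found, MAX_REGIONS);
        int kept = 0;

        for (int i = 0; i < n; i++) {
            region *r = &found[i];
            if (r->area < min_area || r->area > max_area) continue;  /* wrong size     */
            if (touches_image_edge(img, r)) continue;                /* partly outside */
            /* Second, local scan of this region only: moment sums plus hole
             * counting (pixels below hole_thresh, completely surrounded by
             * indicator pixels, so that jagged edges are not counted). */
            r->holes = count_holes(img, r, hole_thresh);
            if (r->holes > 0) out[kept++] = *r;  /* hole count acts as the object id */
        }
        return kept;
    }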
When making the indicators the end user needed to consider:

• The definition of the indicator, such that the system had the ability to extract indicators from the image by thresholding (e.g. making large, bright indicators).
• The definition of each hole, allowing the system to 'see' all of the holes clearly and not mistake other markings (dirt) for holes.
• The main consideration was the size of the holes: making them large enough to see, but not so large that huge indicators would be required.
• The separation of the holes was important, to prevent holes merging with the background or with other holes. If the holes were positioned too closely together, or too close to the boundary, they could blend, making the section of indicator separating them darker than the threshold. Either two holes would merge, or a hole would be classed as part of the background.

Figure 3.11 shows a pixel-level section of a captured image showing an indicator with three holes.

Figure 3.11: An example of the system being unable to find a hole in an indicator

• The white pixels show areas above a threshold (the indicator).
• The black pixels show areas below another threshold (the holes).
• The gray pixels are 'uninteresting' pixels.
• It can be seen that the bottom left hole is located too near to the boundary, so that in this particular lighting condition the pixels forming the region of indicator between it and the boundary have become darker (because of pixels blending), resulting in those pixels not being grouped as part of the indicator.
• This results in the hole being classed as part of the background, so only two holes are found in the indicator.

3.5.1 Design of the indicators

By experimentation and observation, a reasonable size for the holes and the spacing between them (and the indicator edge) was found. These sizes for the two arenas are shown in table 3.3. The size of the indicator depended on the number of holes placed on it.

    System             Hole size (mm)   Hole separation (mm)
    Khepera arena            20                 15
    Evaluation arena          5                  5

    Table 3.3: Design of the object indicators in the two arenas

The material chosen to make the indicators was basic printer paper, which reflected a large proportion of light. Slightly more 'shiny' paper could have been used to make the indicators easier to segment from the image. As the Khepera arena was not uniformly lit from above, reflective paper could not be used (it would reflect a large proportion of the light to the camera in some positions, and none at all in others).

3.5.2 Single frame classification tests

Figure 3.12 shows a cluttered scene from the evaluation arena. Seven indicators are spread over the image, as well as other bright objects (an unmarked indicator, reflective paper, and the base of a lamp). The second image shows the output after applying the segmentation and classification process. An intensity threshold of 120 has been used to segment the indicators from the image, and a threshold of 30 is used to find the holes. All of the marked indicators have been found by the detection process, with relatively good accuracy in both the position and orientation of the objects.

Figure 3.12: Scene containing 7 object indicators before and after classification

Khepera detection

Figure 3.13 shows an indicator placed on a Khepera robot in a clear section of the Khepera arena, which has been successfully segmented and classified. Using an indicator of this size made it impossible to place more than one (visible) hole-marker onto the indicator (at this resolution).

Figure 3.13: Detecting a Khepera sized indicator
To segment indicators in this clear area of the Khepera arena a gray level threshold of 130 is used to find the indicators. Within the indicators, holes are marked by pixels with intensity less than 110.

Object occlusion in the Khepera arena

If an indicator is partially or totally obscured then it is unlikely to be found correctly by the system. An example is shown in figure 3.14. Two indicators are in the image, but the second is partially obscured by the overhead crane, hiding one of its holes. This 'second' indicator has been mis-classified as the 'first' indicator.

Figure 3.14: Example of a partially obscured indicator being incorrectly classified

Segmenting the image in different parts of the Khepera arena

The varying lighting and background conditions in different parts of the Khepera arena required different thresholds to segment objects from the background. Kheperas grouped in the same (or similar) parts of the arena could be segmented using the same fixed threshold. Figure 3.15 shows three indicators spaced over the arena.

The left-most indicator (in a clear part of the arena) can be segmented with a threshold of 130 (and hole threshold 110), as shown in the left image. With this threshold the other indicators cannot be segmented well enough to be identified by the classification process. The right image shows the system detecting these indicators after segmenting with a threshold of 125 (and a hole threshold of 70). With these settings the first indicator is lost.

Orientation of the indicators

It can be seen by observation that the orientations of the indicators found in figures 3.12, 3.13, 3.14 and 3.15 have been found fairly accurately. Any error comes from mis-segmentation of the image, where more of one side of the indicator is segmented than the other, resulting in the indicator appearing to tilt more than it actually does.

Indicator identification

Figure 3.16 shows the detection process applied to the cluttered image shown in figure 3.12, but only having searched for the first two indicators. It can be seen that these objects (marked with one and two hole markers respectively) have been selected correctly.

3.5.3 Execution time

The bulk of the execution time when processing an image comes from the initial scan of every pixel within the image window. The more complex pixel calculations for the moment calculations and hole finding operations are performed for relatively few pixels. The size of the window and the number of pixels above the threshold affect the runtime. The fairly complicated cluttered scene shown in figure 3.12 (page 44) was captured, and the detection process was applied and timed. The processing times are shown in table 3.4. Each time was obtained by processing the entire scene 100 times and averaging. The times for both the 500MHz evaluation system and the 75MHz Khepera system are shown. These times are also compared with the time taken for the simpler (non-hole finding) method shown in section 3.3 (page 36).
Figure 3.15: Indicators spread over the Khepera arena
Figure 3.16: Testing the classification of indicators

    System             Method         Time (s)
    Khepera arena      Basic           0.165
    Khepera arena      Hole finding    0.249
    Evaluation arena   Basic           0.014
    Evaluation arena   Hole finding    0.023

    Table 3.4: Execution time of the detection process

• The scene shown in figure 3.12 is fairly complex. It is unlikely that seven objects would be tracked, and the objects cover a large proportion of the image. Usually two or three robots covering a small area would be followed.
• As expected, the test arena system (500MHz) processed each frame roughly ten times faster than the Khepera arena system (75MHz).
• The test arena system received a frame every 0.06s. It took 0.023s to process the scene as shown, much less than the frame time (almost at the speed of the Meteor hardware in the Khepera system).
• The Khepera arena hardware received a frame every 0.02s, whereas it took 0.249s to process the scene as shown. This was obviously not quick enough for real-time applications.
• The execution time depended on the number of 'interesting' pixels in the image.
• A scene must be processed quickly enough that a tracked robot does not move too far before the next frame. This is dependent on the type of robot used. A Khepera can move a maximum of 2cm in the 0.02s between each frame. Processing the scene shown would miss ~12 frames, in which time a Khepera could move a maximum of 20cm!
• Even the simple feature detection method (not looking for holes) would miss ~8 frames, allowing the Khepera to move 16cm.
• As 12 frames are missed when processing the image on the Khepera system, care must be taken to prevent the Meteor card from over-writing the frame being processed (see section 2.1.2 on page 17). The frame would need to be copied to another memory location (slowing the system further).
• These times are for scanning the entire image. The detection process could instead be applied to a small sub-window of the image (explained more fully in section 4.1 on page 52).
• It is ironic that the testing system is limited to receiving 15 frames a second, yet can process frames fast enough to keep up with the Matrox hardware in the Khepera arena. A trivial solution would be to update the CPU used in the Khepera system to process each frame at a faster rate. During the scope of this project this was not practical, as the Khepera system was in heavy use and was relatively complicated to set up.

3.6 Detection and identification summary

• The indicator system allowed simple, unique, user-defined indicators to be built, allowing objects to be identified and information about their position and orientation to be extracted.
• When tracking, objects could be lost and the system would find them again without the user needing to re-identify the objects.
• The user need never identify the objects. The system simply matched robots and indicators by the number of holes on the indicator.
• The detection system operated with a set of fast, efficient algorithms.
• Objects could be lost by being partially obscured (for example by the overhead crane).
• The system required an accurate threshold to segment the indicators from the image.
• One complete (complex) frame could be processed sufficiently quickly using the processor in the test arena for real-time processing (of the incoming frames).
• The 75MHz processor used in the Khepera arena was not quick enough to apply this process to each frame to achieve any kind of real-time processing.
Rather than replace the machine (which was in heavy use) a method of reducing the processing time was required. Chapter 4 Object Tracking As demonstrated in the previous chapter the tracking system introduced in this report follows a number of object-indicators across successive images. This identification process provides a simple solution of the traditional tracking problem which can mis-classify objects in successive frames. Each indicator is identified within each frame, which allows the segmentation and classification procedure to operate on a sequence of frames to provide the tracking information1 . A A A B FRAME 1 B B FRAME 2 FRAME N. Figure 4.1: Tracking by appling the detection system in successive frames This chapter of the report documents the tracking element of the presented system, following the object indicators in successive image frames. • Rather than implementing a computationally expensive probabilistic modelling method, the system relied on the basic identification procedure coupled with simple positional prediction based on the objects trajectory. 1 This is similar to differencing methods — see section 1.3.3 on page 9. 51 52 4. Object Tracking • The indicator was searched for in a particular location (based on its previous location). • If the indicator was not found, or was occluded, it was ‘lost’ by the system, until it became visible again. • Providing objects were only lost for a small enough time, this provided a very simple solution to the tracking problem. • Because the system relied on user set, constant thresholds to stay reliable over the whole tracking area for the tracking run, the system could occasionally mis-classify objects for short amounts of time. This occurred when the identification features of an indicator were obscured — or if shadows across the indicator appeared as hole markings. 4.1 Method of operation The system borrowed heavily from the tracking method applied in the single Khepera tracking system [12] by placing a ‘window’ around the predicted robot location and applying the detection process (documented in section 3.4 on page 38) in that window. If the indicator was not found then the system would revert to a scanning phase looking for that object in the whole frame. For this early design, each object had its own unique ID number, corresponding to the number of markings on the indicator. Rather than detecting all of the objects by itself the user had to specify the number of objects for the system. In future the identification code could be updated such that: • Robots might be identified by their starting location. • The system might always scan for new robots being introduced to the system. → Without knowing the number of robots the system might miss robots when making an initial count, whereas by specifying the number and searching this is prevented. 4.1 Method of operation 4.1.1 53 Tracking and windowing The detection algorithm (as detailed in the previous chapter) could be applied to a small window within the image. The main bottleneck in the execution time during the processing of each frame, came from examining each pixel of the image in the first pass of the detection process. By only examining small windows in the image, processing could be performed much quicker. If a prediction could made about the location of an object then only a small section of the image needed to be searched when looking for that object. This tracking phase was applied to each known object in each frame. 
The image shown in figure 4.2 shows the tracking process applied to one image. The previous locations of the indicators in the scene were known, and the system searched for the new positions in the windows shown. Figure 4.2: Searching for indicators using windows Rather than using the position, speed and orientation information to make an accurate prediction of the new position, the initial system assumed the object was in a small window centred on its previous location. The detection and identification algorithm was applied to each window for each known object to extract the new position information and to check the identification of 54 4. Object Tracking the object. Checking the identification reduced the chances of the system mis-recognising an object (the most serious error). The window size could be changed depending on: • The size of the indicator and its resolution in the image. • The distance that the robot could move in the space of one frame. → In the test arena a window of 70x70 pixels was used, allowing for fairly fast robots and the relatively slow frame rate. This corresponded to a distance of ∼13.5 cm across the window, allowing for robot speeds of less than 1 ms−1 . → The single Khepera tracking system used a window of 8x8 pixels, the approximate size of the Khepera, which could move approximately 4 pixels (2cm) in one time-slice. • This parameter could be easily changed by the user depending on the application. 4.1.2 Scanning and detecting The scanning phase applied when the system did not know the location of one or more of the objects. The system also started up in this mode (as the positions of the indicators were not yet known). A scanning window would sweep ‘down’ the picture moving a specific2 number of pixels in each frame. Only the pixels in this window would be examined for objects. If a missing object was found and identified in that window then that object could be flagged as ‘found’. An example is shown in figure 4.3. A band scanning the entire width of the image but only 70 pixels in height is shown in two positions in the image. In the second figure the scanning window has found the identifier with two hole markers. 2 User defined—as required by the application. 4.1 Method of operation 55 Figure 4.3: System debugger: The scanning band sweeping the image By only scanning this small band the system had time to process the tracking windows for objects that had been found in the previous image frame. The testing system running on the 500MHz processor could process the entire image much faster than the rate that frames could be transferred over the USB connection (as shown in section 3.5.3 on page 46). An option allowed the user the choice of using the movable scanning window or scanning the entire image, depending on the system it was running on. Scanning the whole image would find a lost object in one frame (if it was visible) rather than waiting until the scanning band covered that object. This was obviously desirable and practical on a fast system. 4.1.3 Indicator identification Due to poor image segmentation, some objects could be mis-classified. Checking the identity of each object in each frame limited the mis-classified cases resulting in more lost objects rather than mis-labelled objects. The tracking 56 4. Object Tracking code could be easily modified to allow multiple objects of the same type to be tracked. This might be applied to static objects (for example to mark danger, or refuelling areas). 
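A sketch of the per-frame control flow of sections 4.1.1 and 4.1.2 is given below. It is illustrative only: the object structure, the detect_in_window helper and the window and band sizes are assumptions standing in for the project's own code, which followed the same pattern of a per-object tracking window plus a scanning band for lost objects.

    #define WIN   70     /* tracking window size used in the test arena */
    #define BAND  70     /* height of the scanning band                 */
    #define IMG_W 640
    #define IMG_H 480

    typedef struct {
        int id;          /* number of hole markers on the indicator       */
        int found;       /* was the object located in the previous frame? */
        double x, y;     /* last known center of mass (pixels)            */
        double heading;  /* degrees                                       */
    } object;

    /* Hypothetical detection call: run the segmentation/classification of
     * chapter 3 inside the given sub-window; returns 1 and updates obj if
     * an indicator with obj->id holes is found there. */
    int detect_in_window(const unsigned char *frame, object *obj,
                         int x0, int y0, int w, int h);

    void process_frame(const unsigned char *frame, object *objs, int n, int *band_y)
    {
        /* Tracking phase: look for each known object around its last position.
         * (Clipping of the window to the image borders is omitted here.) */
        for (int i = 0; i < n; i++) {
            if (!objs[i].found) continue;
            int x0 = (int)objs[i].x - WIN / 2, y0 = (int)objs[i].y - WIN / 2;
            objs[i].found = detect_in_window(frame, &objs[i], x0, y0, WIN, WIN);
        }

        /* Scanning phase: sweep a band down the image looking for lost objects. */
        for (int i = 0; i < n; i++) {
            if (objs[i].found) continue;
            objs[i].found = detect_in_window(frame, &objs[i], 0, *band_y, IMG_W, BAND);
        }
        *band_y = (*band_y + BAND) % IMG_H;   /* advance the band for the next frame */
    }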
4.1.4 System output • The position of each robot could be dumped to a file describing the position of the robots over time. • This information was also shared with a second process which could send the data to other processes running over the network. This allowed complex robot controller software to run on external machines without slowing down the tracking machine, or allowed the information to be processed or stored, easily by external processes. • The system could also dump each frame processed to disk as a ppm image. Information on the objects and regions found in an image could be overlayed onto the image to provide an easy method of debugging the system. • A further option allowed the final positions of the robots to be overlayed at the end of the tracking process to allow the user to verify that the system was performing sensibly (without having to store the image of each frame). Position storage To simply the scanning and identifying process, the user informed the system of the number of objects it was required to track. The system would build a list of object descriptor elements describing each object. Each object element contained the following information about the object: its id number, a flag showing whether the object was found in the previous frame, and its position, orientation and speed. A good approximation of the current speed of the object could be obtained by examining the object’s displacement over the preceding frames. 4.1 Method of operation 57 Position communication This object information was held in shared memory which allowed the tracking process and a separate communication process to both access it. When the communication process received a request from another machine, the current position information was sent back. N. OBJECTS OBJECT 1. INFORMATION OBJECT 2. INFORMATION OBJECT N. INFORMATION TRACKER SHARED MEMORY COMMUNICATOR Figure 4.4: Sharing object positions with other processes This method allowed a very simple communication process to be built, and was reliable enough to allow communication between two machines on the same network (eg. a separate tracker and robot controlling machine)3 . This avoided the synchronisation problems involved with storing positional information in a file4 . REQUEST POSITION ACTION READ POSITION TRACKER LISTENER Figure 4.5: Listening for object positions 4.1.5 System debugger The system proved troublesome to debug. Unusual conditions could lead to memory access errors, the rarity of these conditions making traditional 3 This communication aspect is of no interest in the scope of object tracking — it used common Unix facilities. Interesting parties should refer to the code used in the project. 4 Trying to write information to a file while another process trys to access that file. 58 4. Object Tracking debugging techniques difficult. To aid debugging a simple GUI interface was created using the QT toolkit [18] which allowed the current output from the camera to be overlayed with information from the tracking system. The tracker itself could be stepped through allowing a simple visual aid to the debugging. The QT toolkit allows nice looking, user interfaces to be created easily. Because of this the ‘debugger’ also offered an easy to use introduction to the system. The debugging application required a large chunk of system resources (especially a problem with the Khepera hardware) so it was not designed as the ‘real’ interface to the system. 
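Returning briefly to the position storage and communication of section 4.1.4, the object descriptor list and shared-memory arrangement could be laid out roughly as follows. This is a sketch using standard System V shared memory (the thesis only says that common Unix facilities were used); the struct fields mirror the items listed above, but their names and the key value are invented for the example.

    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define MAX_OBJECTS 8
    #define SHM_KEY 0x4b455050     /* arbitrary key, chosen for the example */

    /* One entry per tracked object, as listed in section 4.1.4. */
    typedef struct {
        int    id;        /* number of hole markers on the indicator        */
        int    found;     /* located in the previous frame?                 */
        double x, y;      /* center of mass position (pixels)               */
        double heading;   /* degrees                                        */
        double speed;     /* estimated from displacement over recent frames */
    } object_info;

    typedef struct {
        int         n_objects;
        object_info objects[MAX_OBJECTS];
    } tracker_state;

    /* Both the tracker and the communicator attach the same segment; the
     * tracker writes the latest positions, the communicator only reads them.
     * (Error checking of shmat, which returns (void *)-1 on failure, is
     * omitted for brevity.) */
    tracker_state *attach_state(void)
    {
        int shmid = shmget(SHM_KEY, sizeof(tracker_state), IPC_CREAT | 0666);
        if (shmid < 0) return 0;
        return (tracker_state *)shmat(shmid, 0, 0);
    }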
Figure 4.6: View of the system debugger However, the tracking code consisted of modular code which could be easily ‘wrapped’ by any interface, allowing the main command line interface and GUI debugger to share the same ‘tracking’ code. This also allowed the interface to be modified in future. The debugger provided: • An easy to use interface. • Allowed a frame to be examined at pixel-level, thresholded and saved to disk. 4.2 Extensions to the basic system 59 • The individual pixels values could be examined (and areas could be enlarged) allowing thresholding values to be easily selected. • The tracking information could be overlayed onto the original frame. • The tracking system could be stepped through frame by frame. 4.1.6 Basic performance evaluation Because of the nature of the vision system, an end user could test the output from the system by comparing it with what was actually in the arena. The debugging/interface system displayed each frame for the user to inspect and overlayed the tracked information onto it. It also allowed the system to be ‘stepped’, allowing the user to conveniently inspect dumped (text) output from the system and compare it with the graphical image and the real system. Displaying each frame was computationally costly (it could not be performed quickly enough on the 75Mhz to be able to process frames at a fast enough rate). The code used to overlay object information onto the real frame allowed the ‘real’ tracking interface to dump frames to disk at intervals. This allowed the user to check the output at various instances. To prevent any deterioration to the system speed an option was introduced to allow only the final positions to be output when the system was stopped. 4.2 Extensions to the basic system The detection/tracking system as it has been demonstrated (section 3.5.3 on page 46), worked sufficiently well for simple robot applications. A number of features that were simplified for the initial detection system could be improved: • Derive the indicator heading, not just the orientation of the axis of elongation. 60 4. Object Tracking • More effective tracking window placement — using the heading and speed of the indicator to predict a new position. • Ability to predict positions of occluded objects. • Ability to find partially occluded indicators. • Automatic intensity (and area) thresholding. Except for calculating the heading of the indicator these extensions were not implemented. Suggestions for making these improvements are discussed in section 7.3 (on page 101), and left for future extensions to this project. 4.2.1 Calculating the heading of an indicator As described in section 3.2.1 (page 33) the orientation of the axis of elongation of a indicator could be found by using equation 3.3 with the pixel moment coefficients found in equations 3.4. The orientation of this axis, gives no indication as to which way a robot is facing along the line. A simple way to rectify this was achieved by grouping the indicator identification markers at one side of the indicator’s elongated axis. This side defined the front of the indicator — the heading of the indicator was toward the markings (along the elongated axis). The method used to calculate the heading from these markings and the calculated orientation is shown in appendix A.5. KEY CENTER MASS OF LABEL CENTER OF MASS OF HOLES HEADING LABEL Figure 4.7: The heading of an indicator An example of the heading calculation is shown in figure 4.8. Four indicators are placed on four LEGO robots. 
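The heading calculation itself is given in appendix A.5; one plausible way to implement the disambiguation described in section 4.2.1 is sketched here. The axis orientation θ from equation 3.3 is only defined modulo 180°, so the vector from the indicator's center of mass to the center of mass of its hole markers is used to pick which of the two opposite directions is the front. Function and variable names are illustrative, not taken from the project code.

    #include <math.h>
    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Resolve the 180-degree ambiguity in the axis orientation (radians).
     * (ix, iy): center of mass of the whole indicator.
     * (hx, hy): center of mass of the hole markers, which sit at the front. */
    double indicator_heading(double theta, double ix, double iy,
                             double hx, double hy)
    {
        double ux = cos(theta), uy = sin(theta);   /* one direction along the axis */
        double dx = hx - ix,    dy = hy - iy;      /* points towards the markings  */

        /* If the axis direction points away from the markings, flip it. */
        if (ux * dx + uy * dy < 0.0)
            theta += M_PI;

        /* Normalise to [0, 2*pi). */
        while (theta < 0.0)          theta += 2.0 * M_PI;
        while (theta >= 2.0 * M_PI)  theta -= 2.0 * M_PI;
        return theta;
    }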
It can be seen in figure 4.8 that the headings of the indicators have been found accurately. Providing the user knows how the indicator is positioned on the robot, this heading can be transformed into that of the robot.

Figure 4.8: Detecting the headings of indicators

4.3 Tracking experiments

This section presents some experiments used to time the process used to detect objects in a frame, given the previous locations of the objects. Some simple experiments are also conducted showing the accuracy of the tracking system as a whole.

4.3.1 Execution time

The cluttered frame processed in the experiments in section 3.5.3 (see figure 3.12 on page 44) was again processed to test the speed of the tracking process (using a window). The image was loaded and scanned for the seven objects. The tracking phase was then applied 100 times to the seven objects found, to calculate an average time for the tracking phase.

    System          Time for detection frame (s)   Time for tracking frame (s)
    Test arena                0.023                          0.0056
    Khepera arena             0.249                          0.056

    Table 4.1: Execution time of the tracking process

During the tracking phase a window of 70x70 pixels was placed around the previous position of each object found. Seven objects were found, so only 70x70x7 pixels were processed in each frame. This is approximately a tenth of the number of pixels processed when looking at the whole 640x480 image. It can be seen that:

• The frame grabbing hardware (used in the Khepera arena) could deliver a frame every 0.02s, so the system could track objects in every third frame (in which time the Kheperas could move a maximum of 6cm), which is much more acceptable. When objects are lost, the small scanning window used to find them would slow the system down, but not to the level of scanning the whole image.
• This execution time was from tracking seven robots. Usually fewer than this (two or three Kheperas) would be tracked.
• Transferring frames via the USB was obviously a bottleneck in the test arena. The system could process 10 frames while waiting for the next frame to be transferred.
• The main question was whether one frame in 0.06s was too slow, which is dependent on the type of robot used and the application.

4.3.2 Static positions

Two indicators were placed in the test arena. The tracking system was run for approximately 8 minutes, during which time the indicators were not moved and the environment was kept moderately static. This allowed a measure of the accuracy of the tracking/detection process over time to be found (evaluating how much the position and orientation information changed due to slightly changing environmental conditions).

Figure 4.9: The positions of two static indicators

Figure 4.9 shows the two indicators in the test arena. Indicator 'one' has its center of mass position at approximately (413, 199); object 'two' has its center of mass at approximately (346, 374).

    Object   Total frames   Lost indicator   Found indicator            Mean    Std. dev.
    1            6823             4               6819        X         412.9      6.2
                                                              Y         198.9      3.9
                                                              Heading   132.4      4.6
    2            6823             6               6817        X         345.9      2.7
                                                              Y         373.9      4.4
                                                              Heading    94.02     3.2

    Table 4.2: Evaluations of the tracked positions obtained from static indicators

The results obtained from the system are shown in table 4.2, which shows:

• The extracted heading and position of the indicators were approximately accurate.
• The 'lost indicator' entry shows the number of instances where the indicator could not be found by the system. This does not represent the instances where an indicator was incorrectly found.
Mis-classified objects and heading calculations

By visual inspection of a plot of the static positions, it was seen that the first indicator had been found in a completely wrong area in four frames. This had occurred once for the second indicator. The system occasionally classified the small section of silver tape at the top-left of the image as an indicator. This occurred when:

• The real indicator had been lost (and the system was searching for it).
• The lighting conditions were such that an area on the silver tape matched the matching criteria (so one or two 'holes' appeared to be on the tape).
• When the lighting conditions changed so that this area no longer matched the identification criteria, the system reverted to the scanning phase and found the real indicator.

By removing these mis-classified cases (by hand), all of the data points were centred on the correct center of mass position. The change in heading between frames could then be examined more clearly, as shown in table 4.3.

    Object     Max      Min      Mean     Std. dev.
    1         133.70   131.07   132.28      0.42
    2          95.98    91.96    93.99      0.72

    Table 4.3: Heading calculations from static indicators

• Clearly the heading calculation is affected by noise more than the position calculations (principally because each position is rounded to the nearest pixel).
• The robot heading information is only used as a guide as to what the robot is doing. Usually an error of 10° is acceptable.

4.3.3 Tracking objects moving in a straight line

A straight-line 'runner' was built from LEGO, which allowed an indicator to be pushed smoothly in a straight line across the image. The y position of the indicator was kept nominally constant as the indicator was pushed across the arena with a heading of ~180°. The y position actually changed by approximately 10 pixels over this distance, so the indicator was not moving exactly horizontally across the image.

Figure 4.10: The observed COM positions of an object on a straight line path
Figure 4.11: The straight line path (y-magnified)

The path is shown in figure 4.10. The system examined 349 frames, and lost the indicator in 18 of those.

• It can be seen that there were no mis-classified points, and the path of the positions is fairly smooth. It can be seen in figure 4.11 that there is a slight mis-calculation of the center of mass y pixel (at approximately x = 150, 300 and 550), where the y position fluctuates by +/- 1 pixel (possibly due to jerky movement of the indicator).
• The mean orientation of the indicator was found to be 179.7° with a standard deviation of 2.09°. Any inaccuracy was partly due to the indicator not being accurately positioned on the base on which it was being pushed.

4.3.4 Tracking simple robot tasks

Four LEGO robots conducting very simple tasks (or moving randomly) were placed in the test arena and tracked for a minute using a tracking window of 70x70 pixels. This window was 135mm across, which limited the movement of each robot to less than half the window width in each 0.06s time-slice (a speed of less than 1 ms−1).

Figure 4.12: Paths of four robots being tracked (Robot-1 to Robot-4)

• The thresholds used to segment the image were: 110 (to separate the indicators) and 30 (to find the holes).
• Figure 4.12 shows the center of mass positions obtained from the system while tracking the four robots.
• It can be seen that the first two robots are followed successfully (the plot of their paths describes a more or less continuous, sensible path).
• The robots were tracked for 1217 frames (1.5 minutes). The number of instances where each robot was lost is shown in table 4.4.

    Object        1    2    3     4
    Lost points   14   93   13   420

    Table 4.4: The number of instances of the system losing a robot

• The third robot seems to have been lost at low y and low x values (where its path seems spread out). However, the system only lost this robot on 13 occasions, as shown in table 4.4, so the robot seems to have sped up at these points.
• The second robot was lost on several occasions. A few positions can be seen some distance away from the main path described by this robot. The intervening points have been lost.
• The path of the fourth robot is very disjointed because the system lost it so frequently.
• The lost points are partially due to poor threshold selection. All of the robot tracking and detection experiments shown were conducted with the same thresholds; they should have been carefully selected before each application. The second and fourth indicators had markings close to each other and to the edge, which may have resulted in the holes 'blurring' together, causing mis-identification.
• By visual analysis of the individual frames, it was seen that the system had segmented some bright yellow LEGO bricks around the fourth robot as part of the indicator. Often the 'nodules' on the LEGO bricks would cast shadows, causing the system to see extra holes in the indicator.

4.3.5 Tracking Kheperas

Two Khepera robots were tracked in the Khepera arena. To prevent thresholding problems they were contained in a small, clear area in the bottom left of the arena. The umbilical cord and crane were still present, causing occlusion problems. To be able to track the robots, massive indicators (in relation to the size of the Khepera) were required so that the hole markers could be located (as shown in figure 4.13).

Figure 4.13: The respective sizes of the Khepera and the indicator used in the Khepera tracking experiments

• The thresholds used were: 150 to extract the indicator from the image, and 100 to find the holes in the indicator.
• The robots were kept in a uniform section of the arena where these thresholds applied.
• Figure 4.14 shows the tracked paths of the two Kheperas.
• 448 frames were processed. The first Khepera was lost in 62 instances, and the second Khepera in 110 instances.
• These lost instances are mainly due to objects being occluded by the overhead crane.
• The system could process each frame in 0.03s (on average).

Figure 4.14: Paths of two Kheperas being tracked (Khepera-1 and Khepera-2)

Summary of results

• Using a tracking window improved the processing time. This made the system suitable for use with the Khepera system (a Khepera could move a maximum of 6cm between each frame being processed).
• The position information extracted was accurate.
• The heading calculation was accurate (to 10°), and fairly robust to noise.

4.4 Tracking summary

• Object tracking was achieved by applying the object detection process in a number of small tracking windows, using the previous positions of the objects.
  → The frame was scanned if an object couldn't be found.
  → Objects were identified in each frame to reduce the number of mis-classified objects.
• The heading of the robot was be calculated (as opposed to its orientation). • Each frame could be processed quickly (in the tracking phase). A demonstration was timed at 0.03s using the Khepera hardware and 0.003s using the test hardware. The Khepera hardware could process the frame approximately fast enough so a Khepera could move a maximum of 6cm (∼ 8 pixels) between processed frames. • The extracted position and orientation of the robots where shown to be sufficiently accurate. • The system could be used to track objects sufficiently well providing fixed thresholds were selected which allowed the indicators to be segmented during the tracking period. • The system was application dependent. The size of the tracking and scanning windows (and the threshold settings) needed to be set by the user depending on the speed of the system, the robot type, and end application. • Objects were frequently lost due to occlusion, and also because the system checked the objects identity in each frame. Slight changes to the environment could cause a correct indicator to fail these tests. • Instances of incorrectly matching areas in the image to indicators were rare. Chapter 5 Multiple robot tracking application: The waggle dance The next two chapters provide an example of using the tracking system as a robot sensor to aid in a multi agent task. This chapter describes a postulated method of communication between bees (describing a path to distant food sources to other hive mates). The following chapter demonstrates a simple model of this method of communication, conducted as a robotics task using only positional information from the overhead camera tracking system as input to the robot agents. 5.1 Bee foraging and communication To achieve efficiency while foraging, it is postulated that honey bees share information about potential sources of food. When a scout bee locates an important source it attempts to recruit other bees to ensure that the colony’s foraging force is focused on the richest available area. Studies have shown that there is a good correlation between the ‘dances’ that a bee performs in the hive, and the area consequently searched by other bees who have followed the dance [4]. This form of communication is unique in 71 72 5. Multiple robot tracking application: The waggle dance the animal kingdom and offers several interesting research possibilities: • The understanding of the information held in the dance is by no means complete, each dance could contain many biophysical signals and it is not clear which are critical. • Several biologists are sceptical of the amount of information that the dances contain (some doubt that they contain any information at all). 5.2 The round dance It is postulated that for food sources close to the colony (< 80m) a ‘round dance’ is performed (as shown in figure 5.1). This elicits flight and searching behaviour1 for flowers close to the nest but without respect to a specified direction. Figure 5.1: The Round Dance 1 By olfactory and visual clues. 5.3 The waggle dance 5.3 73 The waggle dance A more interesting method of recruitment is by the waggle dance [4]. A returning forager bee performs a miniaturised re-enactment of its journey. Neighbouring bees appear to learn the distance, direction, and maybe even the odour and ‘importance’ of the flower patch by following the dance. A following bee would seem to translate the information contained in the dance into flight information. It can be said that the bees are sent and not led to the goal. 
This seems unique in that it is a truly symbolic message that guides a complex response after the message is given, unlike other examples which are effective only while the signals are in existence (or soon after in the case of chemical communication). 5.3.1 Dance description The waggle dance consists of a figure of eight, looping dance with a straight ‘middle run’ (figure 5.2). During this straight run [13] the bee waggles her abdomen laterally and emits strong vibrations2 . While the forager performs this dance other recruits gather behind it and follow it through the dance, keeping their antennae in contact with the leading bee. There is a strong correlation between the orientation and speed (and the rate of ‘waggle’) of the straight run and the direction and distance of the food source from the hive [4]. There are two main lines of research regarding the waggle dance. One involves the efficiency of recruitment of sister bees and the other with the mechanisms involved in the communication. 2 Both audible, and in the hive substrate. 74 5. Multiple robot tracking application: The waggle dance Figure 5.2: The Waggle Dance 5.3.2 Orientation of the food source Flowers directed in line with the azimuth of the sun are represented by straight ‘runs’ in an upward direction of the vertical combs of the hive. The direction to the food source is coded inside the hive by the angle of the straight runs from the vertical. This angle corresponds the angle between the azimuth of the sun and food source from the hive (figure 5.3). Dances directed downwards indicate the opposite direction. Figure 5.3: Orientation of food source from hive 5.4 Bee communication summary 5.3.3 75 Distance The distance of the food source from the hive is signalled by the duration of waggle run. Duration increases non-linearly with distance. Early researchers [4] supposed the estimate of distance was derived by the energy expended during flight. Modern research [2] has demonstrated that bees monitor physical distance visually by optical flow methods. 5.4 Bee communication summary • The Waggle dance is a symbolic message that guides a complex response from a following bee. • The dance is a unique form of communication in the animal kingdom. • It is postulated that the orientation, distance and ‘importance’ (size and type) of the food source is encoded into the dance. • Not all biologists believe in every aspect of the dance language. • Very little is known about which combination of biophysical signals are transmitted and how this information is used outside the hive. • The bees often follow several dances and their recruitment is not precise. Many bees fail to find the source or may stumble on it by following floral aromas or by following other bees. Chapter 6 Robot implementation of the waggle dance To demonstrate the multi agent tracking system, the positional information of a number of tracked robots was used implement a simple model of the waggle dance. The waggle dance was implemented as a robot following task with a single (or several) robot bee(s) following a dance performed by a leading bee. For this demonstration, only a simple robot control system coupled with positional information from the tracking system was used to copy the dance on the following ‘bees’. For this demonstration, only the most basic signal contained in the dance (the orientation of the food source) was modelled. It should be noted that this model is only for the purposes of demonstrating the tracking system. 
The model is a gross simplification:

• The waggle dance takes place in a dark hive. Visual signals can be discounted as stimuli to the following bees.
• A bee learns the dance by following close behind the leading bee (such that the bees touch), and the orientation of the food source is given with respect to gravity.

Figure 6.1: Robot bees

For the scope of this report the simplifications are unimportant, and this example can be thought of as any basic multi-robot following example, using only the positional information from the tracking system (i.e. using no light or touch sensors). To reduce the number of mis-classified or lost occurrences (due to occlusion, changing lighting conditions, etc.), the experiments were performed in the test arena. This allowed much more accurate tracking to be performed in a more easily controlled environment.

A selection of LEGO MINDSTORM robots were constructed. The MINDSTORM microcomputer (the RCX unit) is able to communicate with the world via an IR transceiver. This provided a useful mechanism for communicating with the robot bees from the controlling system, without the need for a system of cables (which would cause more occlusion and shadows, deteriorating the performance of the tracking system).

Figure 6.2: The dancing bee

6.1 The dancing bee

To reliably and repeatedly follow a figure of eight dance pattern, the dancing bee robot followed a silver line marked on the arena. This allowed any path to be marked out for the bee to follow. A straight path connected by two loops was constructed; this straight line part of the path indicated the orientation of the food source. An example of this is shown in figure 6.3. A path has been marked out in silver tape on the floor of the test arena. Overlayed onto the silver tape are the tracked positions of a line following robot after following the path for a number of minutes.

Figure 6.3: The dance path, with overlayed positions obtained from the tracking system

6.1.1 Robot construction

A robot 'bee' was constructed from LEGO MINDSTORM to follow the line marked on the arena. A gear ratio of 25:1 was used (on top of the built-in LEGO motor gearing), as shown in figure 6.4, which allowed the bee to travel at a top straight line speed (as measured by the tracking system) of 15cms−1. In each 0.06s time-slice the robot could move a maximum of 1cm (~5 pixels).

Figure 6.4: The layout of the gears in the robot bee (key: motor, 8-tooth gears, 40-tooth gears)

A tracking window of 70x70 pixels was used to contain the ~220 pixel-sized indicator and allow for movement of the robot when tracking a particular robot. When scanning for an indicator the system would search the whole image in each frame (the extra processing requirement could be afforded using the 500MHz processor, and the system gained in that lost objects would be found as soon as they became visible).

The robot was built on two large wheels ~114mm apart. A small wheel located at the rear of the robot prevented it from tipping over, although this also reduced its rate of turn, due to friction. The turning rate at maximum motor output (with the wheels turning in opposite directions) was approximately 43°s−1. The robot could turn a maximum of 3° in one time-slice.
Figure 6.5: Overhead view of the robot bee, showing the RCX unit, the light sensor (on the leading bee), the indicator and the IR transceiver at the front

A MINDSTORM light-sensor brick was placed at the front of the robot, near to the center of its wheel base and approximately 5mm above the ground. The dance path was marked on the arena with silver tape. At this height the light sensor brick detected a (raw) value of more than 60 when it was over the tape. As the sensor moved away from the tape this would drop to around 30.

6.1.2 Line following method

An object indicator was placed in the center of the robot, above its turning point. Several of the maneuvers performed by the robot would be sweeping turns looking for the line, so the indicator was placed on the turning axis so that its displacement would be minimised while the robot turned. The indicator was placed horizontally across the robot, such that an indicator heading of 0° corresponded to a real heading of 270° (heading vertically 'up' the image). The basic line following algorithm can be seen in appendix B.1 (a hedged sketch is also given at the end of this section).

6.1.3 Extracting orientation information from the robot dance path

The tracking system could dump the position of each robot found in each frame into a data file. Simple MATLAB routines were written to extract and analyse the positions of a specified robot from this file. By examining plots of the positions, any obvious instances of object mis-classification could be removed. If enough points were given to describe a robot's motion around the dance path, a straight line could be fitted through the data points to provide an indication of the orientation of the path (and hence the orientation of the food source).

Fitting a straight line to the dance path

Simple line fitting algorithms produce a basic least squares regression line of y on x (which assumes the uncertainty lies in the 'y' data points). This fits a line through the data such that an equal number of points lie on either side of the line 'boundary', and there is no guarantee that this would match the straight line part of the path (a regression line of x on y can also be calculated and coupled with the y-on-x line such that the resultant line gives an accurate indication of the orientation). However, given that the detection system already calculates the orientation of the axis of elongation of a set of points (as in section 3.1.1 on page 29), it made more sense to re-apply this algorithm to find the orientation of the set of positional points obtained from the robot following the path. It should be clear that the axis of elongation of this path describes the orientation of the dance path. An example of the calculation of the orientation from the positions of a dancing bee is shown in figure 6.6, which shows the orientation (as calculated by a MATLAB routine) overlayed, with the original path, onto an image of the arena.

Figure 6.6: Extracting the orientation of a food source from the positions of a dancing bee

The next task was to design another bee (or a set of bees) which could follow this dance, and maybe communicate this information to further bees.

6.2 The dance following bee

While the line following bee was executing its version of the waggle dance, the tracking system relayed positional information to a second program which controlled the following robot(s). This program used the positions of the bees to execute a following behaviour on the dance copying bee(s).
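Before describing the follower's controller, the leading bee's line-following loop (section 6.1.2 above; appendix B.1 is the authoritative version) can be reconstructed roughly as follows. The sensor thresholds come from section 6.1.1 (readings above 60 over the tape, around 30 off it), while read_light(), drive() and the sweep logic are hypothetical stand-ins rather than the project's code.

    /* Hedged sketch of a simple line follower for the dancing bee.
     * read_light() and drive() are hypothetical: they stand for the RCX
     * light-sensor read and motor commands used by the real program. */

    #define ON_TAPE  60     /* raw light reading over the silver tape          */
    #define STEP_MS  100    /* duration of one movement step in milliseconds   */

    int  read_light(void);                                         /* hypothetical */
    void drive(int left_power, int right_power, int duration_ms);  /* hypothetical */

    void follow_line(void)
    {
        int dir = 1;                        /* current search direction */
        for (;;) {
            if (read_light() > ON_TAPE) {
                drive(5, 5, STEP_MS);       /* on the tape: drive forward */
                continue;
            }
            /* Lost the tape: pivot increasingly far to one side, then the
             * other, until the sensor sees the tape again.  A real
             * implementation would poll the sensor during each sweep. */
            for (int t = STEP_MS; read_light() <= ON_TAPE; t *= 2) {
                drive(5 * dir, -5 * dir, t);   /* pivot on the spot           */
                dir = -dir;                    /* widen the sweep on the other side */
            }
        }
    }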
The next task was to design another bee (or a set of bees) which could follow this dance and perhaps communicate this information to further bees.

6.2 The dance following bee

While the line following bee was executing its version of the waggle dance, the tracking system relayed positional information to a second program which controlled the following robot(s). This program used the positions of the bees to execute a following behaviour on the dance copying bee(s).

6.2.1 Communicating with the MINDSTORM bees

The MINDSTORM RCX units can transmit or receive an infrared message as a series of unsigned byte-characters. The MINDSTORM IR tower, which connects to the PC, downloads programs to the RCX unit in this way.

Figure 6.7: Communication between the RCX and a PC

The serial protocol used to send single byte messages via the IR tower is simple [14]. This allowed the following-control program to send a simple instruction (turn left, go back, etc.) as an IR message to the desired robot. The range of the IR transmitter (set to long-range) exceeded the ~2m test arena size (in good IR lighting conditions). The IR messages could be 'bounced' off walls and clutter, making it easy to communicate with the RCX unit whichever way it was facing. A brief description of this serial protocol is given in appendix B.2.

6.3 The robot bee controller

The controller sent commands to the following bees depending on the state of the leading bee. The only input into the controller was the set of positions and headings obtained from the multi-agent tracking system. The controller attempted to copy the movement of the leading bee on the following bees:

• A movement threshold and a turning threshold were set.
• If the leading robot moved more than the movement threshold between frames, the following bee was instructed to move forward.
• If the difference between the headings of the two bees was greater than the turning threshold, the following bee was instructed to turn such that it was heading in the same direction.
• The thresholds were set to allow for slight errors in the tracking information (rather than trying to copy a small movement that had not actually happened). Errors in the segmentation process could cause apparent changes in the position of the robot that were not real.

6.3.1 The bee controls

A following bee could be sent four commands: LEFT, RIGHT, FORWARD and REVERSE. Each command, for each bee, had a different corresponding byte value, so a command could be sent to only one bee (without confusing the others).

6.3.2 Bee control simulator

To evaluate the control process without the need for continuously online robots, a simple simulator was built. An image could be loaded representing the line for the leading bee to follow. The leading bee, shown in the left pane of the simulator (in figure 6.8), performed the basic line following strategy. Whenever it moved or turned more than a set threshold, an instruction was sent to the following bee (in the right pane) to copy that movement. Figure 6.9 shows the path followed by the controlled bee after the leading bee had completed one circuit.

Figure 6.8: The start of the simulation
Figure 6.9: The simulated paths of the bees after one circuit

• The center of each bee is shown with a cross. The direction the bee is facing is shown with a small white line. The previous 100 positions of each bee are shown with small white dots, indicating the path of the bee. This can be seen more clearly in figure 6.10, which shows a pixel-level view of the following bee.
• The bee could only move forward in the direction it was currently facing.
• To simplify the control process, the following bee could either turn or move in each time step.
• Various delays and limitations were coded into the system to allow for inaccuracies in the real system.
The movement of the following bee would be delayed for one frame. It would not move until the leading bee had either moved more than 2 pixels in one frame, or the leader had turned such that there was a difference of over 5° between the headings of the two robots.
• Even with these limits placed on the system, the following bee was able to trace out a fairly accurate description of the path, as shown in figure 6.9.
• The controller would occasionally miss small turns by the leading bee, so the follower was not properly lined up when it moved. This caused the path to gradually drift across the arena when the simulation was run for several 'circuits' of the dance.

Figure 6.10: Simulation of the following bee

Matching the heading of the robots

If the difference between the headings of the two bees was over a specified threshold, commands were sent to match the heading of the following bee with that of the leading bee. Simple calculations were performed to determine the direction in which the following bee had to turn to match its heading with that of the leader, such that it turned through the shortest angle (a sketch of this decision logic is given at the end of this section).

6.3.3 The real robot

The RCX unit has seven motor power settings. It can time in increments of 0.01s. Following bees identical to the leading dancing bee were built from LEGO (but without the light sensor brick).

• To prevent the robot turning too far when trying to match the heading of the leading bee, a delay was set which waited for 6 counts (0.06s), allowing the turn to be executed, after which the motors were switched off. This ensured that the robot would not move too far in one turn. The controller would send a TURN signal each frame until the heading of the following robot matched that of the leader, so the turn was executed as a series of steps.
• The turning delay had to be implemented because each robot could turn a maximum of 3° in each time slice, which, coupled with the delay in processing the position, sending a command, and the RCX stopping the robot, meant the robot could turn further than required before receiving a STOP signal. If the robot tried to achieve an accuracy of 5° (as obtainable by the tracking system) it would continuously sweep backwards and forwards, continually correcting itself. By inserting this stepped movement the robot could be positioned more accurately.
• The leading bee would not move forward for long periods. Most of its movements were short bursts which would not be translated into movements in the following bee; this prevented tracking errors being mistaken for real movement. Movements by the following bee therefore had to be exaggerated to compensate.
• The leading, line-following bee was slowed down further to allow more time for the following bee to match its heading. After each turn, before it started to move, the leader would wait for a number of cycles.
• For the same reason, the turning rate of the leading bee was also reduced by achieving each turn in a 'stepping' motion.

While experimenting with the real robots, it was found that yellow LEGO bricks (and the yellow RCX units) had an intensity similar to the paper used to make the indicators. Some positions of the robots (or darker bricks surrounded by yellow bricks) could cause the tracking system to match part of a LEGO structure with an indicator. This was a serious design flaw (the mis-classification of indicators is the most serious problem in the tracking system). To overcome this, the trivial solution was either not to use these bricks, or to shield any bright areas with darker bricks.
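The following is a minimal sketch of the per-frame decision logic described in this section, assuming the turn-or-move simplification and illustrative threshold values (5° and 3 pixels, taken from the figures quoted in the text). Which physical turn direction corresponds to a positive heading difference depends on the heading convention, so the LEFT/RIGHT assignment here is an assumption, not the project's actual controller.

```python
def shortest_turn(follower_heading, leader_heading):
    """Return 'LEFT' or 'RIGHT' so the follower turns the shorter way,
    or None if the headings already agree to within the turning threshold."""
    TURN_THRESHOLD = 5.0               # degrees; illustrative value
    diff = (leader_heading - follower_heading) % 360.0
    if diff > 180.0:
        diff -= 360.0                  # signed difference in (-180, 180]
    if abs(diff) <= TURN_THRESHOLD:
        return None
    # Sign-to-direction mapping is an assumption about the heading convention.
    return 'LEFT' if diff > 0 else 'RIGHT'

def follow_command(leader_moved_pixels, follower_heading, leader_heading):
    """One control decision per processed frame (turn or move, never both)."""
    MOVE_THRESHOLD = 3                 # pixels; illustrative value
    turn = shortest_turn(follower_heading, leader_heading)
    if turn is not None:
        return turn                    # match the heading before copying movement
    if leader_moved_pixels > MOVE_THRESHOLD:
        return 'FORWARD'
    return None                        # ignore small, possibly spurious, movements
```

Sending the returned command as the per-bee byte value (section 6.3.1) once per processed frame reproduces the stepped turning behaviour described above.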
6.4 Results

The arena was constructed on a desktop with length less than 2m. Because of time limitations all of the experiments were performed in this arena. It would have been more sensible to relocate the equipment to allow more arena space (and to prevent damage to a robot if the controller mis-positioned it at the edge of the table). The 'heading-matching' part of the dance following method was implemented first.

6.4.1 Copying the heading of the leading bee

Four identical bee robots were used. One followed the silver line placed onto the floor of the arena; the others attempted to match the orientation of this robot (by executing the commands sent by the controller).

Figure 6.11: Three robot bees matching the heading of a leading bee

After each turn the leading bee was delayed by at least 0.4s (depending on the angle turned) to allow the other robots to catch up with it. A MATLAB plot of the heading of each robot in each frame is shown in figure 6.12.

Figure 6.12: The heading of the four bees at the start of the following process

This plot shows the first 500 frames. It can be seen that there is a slight delay between the leading robot changing its heading and the others copying it.

• The controlling process was only started at frame 90; from this point the robots all start to turn to face in the same direction as the leader (which is itself turning, in the opposite direction).
• The robots are all aligned by frame 150.
• Between frames 350–420 the IR control signal was not read by the third robot, resulting in it not changing its heading. The IR tower was repositioned at this point so that the robot could read the turn signal.
• Around frame 240 the tracking system seems to have confused the heading of the fourth robot, making it turn in the wrong direction. This was either because the wrong object was identified as the robot, because the heading calculation or the controller failed, or perhaps because the robot turned slightly too far and had to adjust.

Figure 6.13: The heading of the four bees over the following process

Figure 6.13 shows the plot of the headings of the four robots for 4000 frames. In this time the leading robot made two circuits of the path.

• It can be seen that the following robots match the heading of the leading robot successfully over this period.
• The tracking system has problems with headings of around 0°. The heading of indicators in this position seems to fluctuate between 0° and 180° in successive frames.

6.4.2 Following the dance path with a single bee

A single robot bee was set up to test the dance following controller. More than one bee would not fit in the arena (time constraints prevented the arena being moved to a more sensible location).
The controller instructed the following bee to move forward if the leading bee moved more than 3 pixels between frames. This prevented the bee moving because of poor indicator segmentation. The following robot moved for 0.15s, to ensure that it moved a fairly large amount to compensate for the small movements made by the leading bee that were not followed. The paths of both the leading bee and the following bee are shown in figure 6.14, with the orientations of the two paths overlaid.

Figure 6.14: The dance path of the leading bee, the path of the following bee, and the orientation of the two paths

• It can be seen that the orientations are approximately the same. The calculated orientation of the dance path performed by the leading bee was 76.62°, whereas the orientation of the path formed by the following bee was 78.95°.
• The orientation of the followed path should be greater than this. The orientation of each complete circuit made by the robot seems to reduce as the robot drifts (because the headings of the robots are not completely matched when the following robot moves).
• The orientation of the first circuit made by the bee was 84.44°. This is still fairly accurate considering the delay and the other limitations of the system.

6.5 Summary of the multiple agent tracking application

• Provided the LEGO robots were slowed down sufficiently, the 0.06s frame time obtainable by the vision hardware allowed a robot controller to accurately implement a robot following behaviour using the information obtained from the tracking system. This included the extra delay in sending commands to the following robots and having them executed.
• This allowed the heading of a leading robot to be matched successfully by a number of robots.
• The actual movements of the leading robot were harder to detect and hence follow. Uncertainties in the position of the robot meant that the system had to ignore apparent movements of less than 3 pixels (which could occur due to poor segmentation of the indicator). This meant that real small movements made by the robot were lost.

Chapter 7

Conclusion

This chapter restates the initial goals of the project, discusses how effectively they were met, and describes the limitations inherent in the system. Each chapter of the report demonstrated the following:

1: Described the machine vision problems associated with tracking objects in successive frames. Principally these involved making a correct classification of an object given that a segmentation process had successfully extracted the object pixels from the image. This segmentation process was non-trivial.

2: The vision hardware used during the scope of this project was introduced. A point was made concerning whether a tracking system could process each frame sufficiently quickly for real time operation; this depended on the application and the type of robot. A cheap frame grabber was also introduced for the purposes of evaluating its use in future tracking applications (its reduced frame rate caused concerns as to how useful it would be). The existing single Khepera tracking system was explained. This system could not track more than one Khepera, principally because it made no attempt to identify different Khepera within the image, causing confusion as to which robot was which over time.

3: A method of segmenting and classifying objects within frames using object-indicators was introduced.
This allowed an end user to design the indicators using simple materials to suit the end application. The indicators were uniquely classified by the number of markers placed upon them. This method allowed the system to recognise different objects and extract position and heading information from them. Experiments were performed testing the classification system in different environments. The system would work in the complex Khepera environment, but only if the robots were kept in a small section of the arena where the conditions were static. The system could not find all of the robots if they were in parts of the arena with different lighting or background conditions.

4: The basic tracking procedure was demonstrated. This performed the classification process in each frame. To increase efficiency, only small sections of the image were processed when looking for the objects: windows centred on the previous positions of the robots were used to find individual robots, and small sections of the arena were scanned to find any missing objects. Experiments were performed showing that the system could be used to obtain real time positions of Kheperas (using the existing Khepera hardware). Approximately 1 frame in 3 could be processed (depending on the amount of clutter in a frame), in which time a Khepera could move 8 pixels. This could be reduced by limiting the number of objects, Kheperas and clutter in the frame and by processing smaller windows. The detection process was improved such that the heading of an object could be extracted from the object-indicator (as opposed to just its orientation). A GUI debugger/system-interface was shown, and the method used when storing and communicating the locations of the robots was explained.

5: The fifth and sixth chapters conducted a more elaborate test of the system. The fifth chapter gave a brief description of the waggle dance, a method used by bees to communicate the location of distant food sources.

6: The waggle dance was implemented using only positional information from the tracking system following two 'bee' robots. The system modelled a method by which a bee learns the orientation of the food source by copying the dance of another bee. The system was demonstrated using two LEGO MINDSTORM robots and the slower, evaluation vision hardware. By slowing the robot bees, the system could control the second bee accurately enough that the dance could be reasonably followed. A simulator of the control software was also shown.

7.1 Achievement of the goals of the project

The principal goal of the project was to refine the existing Khepera tracking system to enable it to track several Khepera in real time.

• The classification part of the system could detect any object by matching it with a unique object indicator found in the image. These indicators had to be sufficiently well defined to allow the system to accurately segment them from the image and to identify the result.
• It was shown (section 3.5.2, page 43) that this was possible in the Khepera arena. However:
→ To be identifiable, the indicators had to be larger than the actual Khepera, making the system cumbersome.
→ The Khepera arena was not uniformly lit and the conditions could not be well controlled, so a single static threshold was not sufficient to segment the indicators across the whole arena.
→ The Khepera system contained an overhead crane (supporting the umbilical cables allowing online running of the robots).
This cast shadows around the arena and physically occluded parts of it, such that all of the pixels making up an indicator could often not be grouped together.
• The indicators allowed the position and heading of the corresponding robots to be extracted and uniquely identified, thus solving the identification problem of the single Khepera tracking system (section 3.5.2, page 45).
• Experiments demonstrated that this system could process a frame in 0.06s, in which time a Khepera could move a maximum of 8 pixels (section 3.5.3, page 46).

The system had to be capable of distinguishing between several objects and be able to match positional information with real-life positional knowledge of regions in the environment.

• The principal tracking problem of identifying the objects was solved by introducing simple invariant markers to the indicators. The number of these on each indicator identified the object. By identifying each indicator, in each frame, the probability of mis-classifying the objects was massively reduced (section 3.4, page 38).
→ However, by making this identification the objects were 'lost' more frequently. This was because the system relied on a fixed threshold that would occasionally result in poor segmentation of the image (or would not allow the markings to be extracted from the indicator). This was less serious than mis-classifying an object, so the trade-off was probably worth it, provided that the indicators were designed well enough and the thresholds selected such that these instances were minimised.
→ Serious mis-classification problems could occur when an object had been lost and the system was performing a search. Stray shadows could cause random areas of the image to match the identification criteria, such that the system matched an incorrect region with an object. By making identification checks in each frame, the system would eventually decide the incorrect region was not the object and resume the search.
• Positional information was at the pixel level and could be communicated to external processes to use as they desired (section 4.1.4, page 57).

A subsidiary goal of the project was to evaluate the vision equipment used in the test arena setup as a cheap method of conducting tracking work in the future.

• The indicator method was very general, allowing the end user to build their own application-dependent indicators. This allowed the system to work on a number of different setups, as demonstrated with the large LEGO robots in the test arena (section 6.2, page 83).
• The parameters of the system could be changed to cope with application-specific factors (section 4.1.1, page 53). Principally this involved scanning different parts of the image for each object, depending on the speed of the robot and hence its displacement between frames. The faster the robot, the less accurately its position could be predicted, and the larger the areas that had to be searched (increasing execution time).

7.2 Assumptions and limitations

The end user had to select thresholds that would apply over the whole tracking space during the whole tracking period. A single fixed threshold could never provide perfect segmentation. It was shown that only part of the Khepera arena could be used with a single threshold (section 3.5.2, page 44). Changes in the environment would cause objects to be frequently lost, even in controlled environments.
Very occasionally these changes would cause an object to be lost (section 4.3.2, page 62) and also cause the system to identify an incorrect part of the scene as the object (while searching for the missing object).

The system identified each object in each frame, even when a prediction of the location of an object had been made and a likely looking object was found. This prevented the most serious problem, mis-classification of objects, but increased the number of lost points.

The indicators had to be carefully designed by the end user such that they were easily separable from the image and had well defined, separable hole markers (section 3.5.1, page 42). The demonstration application showed that even brightly coloured (gray or yellow) LEGO bricks could confuse the system (section 6.3.3, page 87). Shadows cast by the 'nodules' on the LEGO bricks, or regions of dark bricks inside light boundaries, could cause the identification process to class that part of the LEGO as an indicator.

The tracking system ultimately tracks the object indicators around the image, and not the robots. Poor or incorrect positioning of the indicator would cause obvious confusion to the system. Statistical methods, coupled with accurate segmentation, could perhaps track the actual robots in the image without resorting to object-indicator methods, provided the shape of the robot can be well defined.

7.3 Advice for extending this work

The object indicators are identified in each frame to compensate for poor segmentation. However, this increases the time required to process the frame and causes several points to be lost when an object fails the identification test. With better object segmentation, or with a different method of invariant classification of the individual objects, this identification would not be required. A better study of the materials used to build indicators could be conducted to select a reflective or shiny material. More uniformly placed overhead lights and a darker background could be used to allow a threshold selection process to be more accurate.

The main problems are:
• Selecting a threshold that stays constant during the tracking run (and which applies over the whole arena).
• Making predictions of the location of the object (to reduce the searching time) and having enough confidence in its classification that its identification does not need to be checked. Having confidence in the predictions would also allow the system to accurately guess the position of an object if it was lost.

7.3.1 Automatic threshold selection

If the environment was carefully set up such that it was uniformly illuminated and had a uniform background, the user could select a threshold that segmented the object indicators from the image approximately correctly. Problems arose when the lighting conditions changed, either because of a change in lighting or because external objects created shadows. In these cases a fixed threshold would no longer apply, resulting in some of the indicators not being segmented correctly.

No environment could be controlled such that one fixed threshold would successfully segment all of the objects for the entire tracking period. In the Khepera arena this was made worse in that the scene is never uniformly illuminated (because of the lighting arrangement) and consists of slightly differently textured and coloured pieces of chipboard.
One threshold could not apply across the whole arena, so experiments had to be limited to a small section of the arena (section 4.3.5, page 68). Segmentation errors are the principal cause of poor tracking: all of the tracking elements require an accurate segmentation to have been made. With sufficient knowledge, automatic thresholding techniques can select an approximately accurate threshold.

Selecting thresholds in evenly illuminated scenes

Methods to automatically select a threshold require some general knowledge of the objects, environment and application (such as the size, number, or intensity characteristics of the objects). An image can be assumed to contain $n$ objects $O_1, \ldots, O_n$ (including the background), with gray values drawn from different populations $\pi_1, \ldots, \pi_n$ with probability distributions $p_1(z), \ldots, p_n(z)$. In many applications the probabilities $P_1, \ldots, P_n$ of the objects appearing in the image are known. The illumination of the scene controls the gray values, so the $p_i(z)$ cannot be known beforehand; most simple auto-thresholding methods therefore estimate them using knowledge of the application and a histogram of the gray level intensities contained in a specific frame.

Methods exist to select a threshold such that a certain percentage of the gray level histogram is partitioned, so that the same percentage of pixels is selected [3]. Another method assumes the object intensities are affected by Gaussian noise. The peaks and troughs in the gray level histogram can then be partitioned by computationally searching for the troughs. However, the peaks are generally not well separated, making this a non-trivial problem [9].

Selecting thresholds in unevenly illuminated scenes

With scenes that are unevenly lit, different thresholds are required in different sub-images. Adaptive methods divide the image into small sub-images; a threshold is selected for each sub-image and applied to that smaller region. The final segmentation of the image is obtained from the union of all the thresholded sub-images. Variable methods approximate the intensity across the image by a simple function (a plane or biquadratic). The function fit is determined by the gray level of the background (termed background normalisation). The gray level histogram and threshold selection are then performed relative to the base level determined by the fitted function [3]. As the complexity of the image increases the performance decreases; often a fixed threshold selected by a user will perform better. Automatic threshold selection involves analysis of the gray level histogram of the image, which discards the spatial information of the intensities [9].

7.3.2 Simple prediction of object positions

If the previous location of an indicator was known, the system would scan a small window based on this position when checking the new position of the indicator (section 4.1 on page 52). The smaller this window, the less computational power was required. As the number of indicators in a frame built up this became a requirement, especially when using the slower Khepera arena hardware (section 3.5.3 on page 46). The smaller the window, the more effectively a threshold could also be selected by applying auto-threshold methods in the sub-window [9]; a sketch of this kind of sub-image (adaptive) thresholding is given below.
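As a concrete illustration of the adaptive (sub-image) approach described above, the sketch below thresholds each block of the image separately and takes the union of the results. The block size and the per-block rule (block mean minus an offset) are assumptions made for the example, not methods prescribed by the project or its references.

```python
def adaptive_threshold(image, block_size=32, offset=10):
    """Segment an image by choosing a separate threshold in each sub-image.

    `image` is a list of rows of gray values (0-255). Each block is thresholded
    at (block mean - offset); the final mask is the union of the block results.
    The block size and offset are illustrative parameters only.
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            block = [image[y][x]
                     for y in range(by, min(by + block_size, height))
                     for x in range(bx, min(bx + block_size, width))]
            threshold = sum(block) / len(block) - offset
            for y in range(by, min(by + block_size, height)):
                for x in range(bx, min(bx + block_size, width)):
                    mask[y][x] = 1 if image[y][x] > threshold else 0
    return mask
```

Applied only inside the small search window around an indicator's predicted position, this kind of local thresholding is much less sensitive to the uneven illumination of the arena than a single global threshold.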
Rather than using the maximum distance a robot could travel to select the size of the window (and centring it at the robot's previous position), a simple extension could be made to use the maximum speed and acceleration of the robot, coupled with its current speed and heading, to position the window more accurately. The size of the window would then only need to allow for error and uncertainty in the predicted position.

7.3.3 Using statistical methods to derive estimates of position

Kalman filter techniques have been successfully applied to object tracking problems [21]. These techniques, though computationally complex, are reliable, and generally the gain in efficiency outweighs the expense. The filter is primarily used to predict the locations of the objects (using basic 2D models of the mechanics, with error and prediction parameters learnt from the real positions of the object). These accurate predictions allow a smaller sub-image to be searched for the object, which helps both the processing time and the threshold selection (when selected by an automatic process). The accuracy of the position also allows the system to confidently predict the location of an object that has been occluded by other objects in the scene.

7.4 What was learnt during the scope of the project

All vision problems boil down to making a good segmentation of the interesting objects from an image. The work performed on top of this process in this project was relatively simple, so more time could have been spent experimenting with different segmentation methods to achieve a better system (although this could still be accomplished without affecting the rest of the system).

Away from the actual subject of the report, the writer also learnt valuable time management, resource sharing and collaboration skills, as well as learning useful skills in both MATLAB and LATEX.

Bibliography

[1] Rodrigo Carceroni. Matrox Meteor device driver. http://www.cs.rochester.edu/users/grads/carceron/meteor_int_1.4.1/meteor_int_1.4.1.html, 1994.
[2] H. Esch and J. A. Bastian. How do newly recruited honeybees approach a food site? Physiology, volume 68, pages 175–181, 1970.
[3] Fisher, Perkins, Walker, and Wolfart. Hypermedia image processing reference. http://www.dai.ed.ac.uk/daidb/people/homes/rbf/HIPR/HIPRsrc/html/hipr_top.htm, 2000.
[4] K. von Frisch. The Dance Language and Orientation of Honeybees. Belknap/Harvard University Press, 1967.
[5] Carlos Gomez Gallego. Object Recognition and Tracking in a Robot Soccer Environment. PhD thesis, The University of Queensland, 1998.
[6] J. P. Gambotto. A new approach to combining region growing and edge detection. Pattern Recognition Letters, 14:869–875, 1993.
[7] Gose and Johnsonbaugh. Pattern Recognition and Image Analysis. Prentice Hall PTR, 1996.
[8] A. G. Heiss. Big physical area patch. http://www.uni-paderborn.de/fachbereich/AG/heiss/linux/bigphysarea.html, 1994.
[9] Jain, Kasturi, and Schunck. Machine Vision. McGraw-Hill International Editions, 1995.
[10] Newton Research Labs. The Cognachrome vision system. http://www.newtonlabs.com/cognachrome/index.html.
[11] RHAD Labs. Video for Linux. http://roadrunner.swansea.linux.org.uk/v4l.shtml, 2000.
[12] Henrik Hautop Lund, Esther de Ves Cuenca, and John Hallam. A simple real time mobile robot tracking system. Technical Report 41, Department of Artificial Intelligence, University of Edinburgh, 1996.
[13] A. Michelsen et al.
How honeybees perceive communication dances studied by means of a mechanical model. Behavioral Ecology and Sociobiology, volume 30, pages 143–150, 1992.
[14] R. Nelson. LEGO Mindstorms internals. http://www.crynwr.com/lego-robotics/, 2000.
[15] Sunarto Quek. A real time multi-agent visual tracking system for modelling complex behaviours on mobile robots. Master's thesis, Artificial Intelligence, Division of Informatics, University of Edinburgh, 2000.
[16] J. Russ. The Image Processing Handbook. CRC Press, London, 1995.
[17] A. Saighi. Image Processing and Image Analysis for Computer Vision. University of Keele Press, Keele, 1987.
[18] Trolltech. Qt Free Edition. http://www.trolltech.com/products/download/freelicense/qtfreedl.html, 2000.
[19] Nemosoft Unv. Linux support for USB webcams. http://www.smcc.demon.nl/webcam/, 2000.
[20] Vernazza, Venetsanopoulos, and Braccini. Image Processing: Theory and Applications. Elsevier, Amsterdam, 1993.
[21] Greg Welch and Gary Bishop. SCAAT: incremental tracking with incomplete information. In T. Whitted, editor, Computer Graphics, pages 333–344. ACM Press, Addison-Wesley, August 3–8, 1997.

Appendix A

Machine vision algorithms

A.1 Component labelling

The standard component labelling method finds all of the connected components in an image and assigns a unique label to each component. All of the pixels forming a component are assigned that label. This technique forms a label map which describes the image in terms of the object that each pixel represents (pixels that are not part of an object are defined as the background).

The sequential form of the algorithm requires two passes over the image. The first sweep scans the image from left to right and top to bottom. If a pixel is part of the foreground (the pixel is 'on'), the two neighbouring pixels immediately above it and to its left (the two of its 4-neighbours that have already been examined by the algorithm) are examined. The three possible cases are:

1: Neither of the previously examined neighbour pixels has been labelled. ⇒ The current pixel is assigned a new label: it appears to belong to a new object.
2: Only one of the neighbour pixels is labelled, or both of the neighbours are assigned the same label. ⇒ The current pixel is assigned the same label as the connected neighbour pixel: this pixel belongs to the same object as its labelled neighbour.
3: The two neighbouring pixels have been assigned different labels. ⇒ Two different labels have been assigned to the same component. As well as adding the current pixel to the component, the two labels assigned to the two neighbour pixels must be recorded as equivalent in an equivalence table.

The equivalence table contains the information used to assign unique labels to each connected component. During the first scan, all of the labels used to define a component are declared equivalent. At the end of the pass the table is renumbered so there are no gaps in the labels. Then, during the second sweep, the label assigned to each pixel is renumbered using the equivalence table as a lookup. This results in an m × n image (the label map) with each pixel's value identifying the region that the corresponding pixel in the original image belonged to. To summarise:

Connected components algorithm using 4-connectivity (A.1.1)
1: Scan the image left to right, top to bottom.
2: If the pixel is on, then
→ If only one of the upper or left neighbours has a label, then copy that label.
→ If both neighbours have the same label, then copy that label.
→ If both neighbours have different labels, then copy the upper's label and enter the labels into the equivalence table.
→ Otherwise assign a new label to this pixel.
3: If there are more pixels, go to step 2.
4: Find the lowest label for each equivalence set.
5: Renumber the labels in each equivalence set, so there are no missing labels.
6: Scan the picture, replacing each label by the lowest label in its equivalence set.

A sketch of this two-pass procedure is given below.
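The following is a compact sketch of the two-pass labelling procedure summarised above. A union-find structure is used here as one common way of implementing the equivalence table; the function and variable names are illustrative.

```python
def label_components(binary):
    """Two-pass connected component labelling (4-connectivity).

    `binary` is a list of rows of 0/1 pixels. Returns a label map of the same
    size, with 0 for background and 1..k for the k connected components.
    """
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    parent = {}                               # equivalence table (union-find)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]     # path compression
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)             # declare the two labels equivalent

    next_label = 1
    for y in range(height):                   # first pass: provisional labels
        for x in range(width):
            if not binary[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:         # case 1: start a new object
                labels[y][x] = next_label
                parent[next_label] = next_label
                next_label += 1
            elif up and left and up != left:  # case 3: merge two labels
                labels[y][x] = up
                union(up, left)
            else:                             # case 2: copy the existing label
                labels[y][x] = up or left

    remap, compact = {}, 0                    # renumber so there are no gaps
    for y in range(height):                   # second pass: resolve equivalences
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                if root not in remap:
                    compact += 1
                    remap[root] = compact
                labels[y][x] = remap[root]
    return labels
```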
A.2 Boundary tracking

The boundary of a connected component S is the set of pixels of S that are adjacent to the background S̄. A common approach is to track the pixels on the boundary in a clockwise sequence. A common boundary tracking algorithm [9] selects a starting pixel s ∈ S known to be on the boundary and tracks the boundary until it returns to the starting pixel s, assuming that the boundary is not on the edge of the image.

Boundary following algorithm (A.1.2)
1: Find the starting pixel s ∈ S for the region.
2: Let the current pixel be c. Set c = s and let the 4-neighbour to the 'west' of c be b ∈ S̄, i.e. a background pixel.
3: Let the eight 8-neighbours of c, starting with b and proceeding in clockwise order, be n1, n2, ..., n8. Find ni for the first i that is in S.
4: Set c = ni and b = ni−1.
5: Repeat steps 3 and 4 until c = s.

A.3 Locating Holes

The identification scheme used to classify object-indicators in this report (section 3.4 on page 38) uses dark markers placed on the indicator. These markers appear as 'holes' in the indicator to the vision system. The connected components of S̄ (the complement of S) that have points on the border of the image are background regions; all other components of S̄ are holes [9].

Figure A.1: Identifying an indicator by holes

When searching for holes, to avoid ambiguity, a pixel's 4-neighbours are examined when connecting foreground pixels and its 8-neighbours are used when finding background pixels. If the same 'connectedness' is used for both, awkward situations can be encountered. A simple case is shown in table A.1: if the off pixels (0s) are considered connected, then the on pixels (1s) should not be. If the foreground pixels are joined using an 8-connectedness mask, then a 4-connectedness mask should be used to search for background pixels (to prevent this ambiguity) [9].

1 0
0 1

Table A.1: Connectedness ambiguity

The algorithm searches for the perimeter of each object; these perimeter pixel coordinates are sorted, so there is a list of column boundaries of the object for each row. Using this information the object is scanned using a connected-8 algorithm (similar to connected-4, but using the north-west neighbour as well). This allows the number of holes (and their moments) to be calculated in a similar manner to the moments of the objects.

Figure A.2: The area scanned when looking for holes

A.4 Orientation of an object's axis of elongation

The orientation of an object can be represented by the orientation of its axis of elongation (provided it has one), as in section 3.1.1 on page 29. Usually the axis of least second moment (the axis of least inertia in 2D) is used as the axis of elongation. This axis is the line for which the sum of the squared perpendicular distances between the object points and the line is minimised:
\[
\chi^2 = \sum_{i=1}^{n}\sum_{j=1}^{m} r_{ij}^2\, B[i,j] \tag{A.1}
\]

where $r_{ij}^2$ is the squared perpendicular distance from the object point $[i,j]$ to the line. The line is represented in polar coordinates to avoid numerical problems when the line is vertical:

\[
\rho = x\cos\theta + y\sin\theta \tag{A.2}
\]

where $\theta$ is the orientation of the normal to the line with the $x$ axis, and $\rho$ is the perpendicular distance of the line from the origin.

The distance $r$ of a point $(x_i, y_i)$ from the line is obtained by substituting its coordinates into the equation of the line and finding the error from the real line:

\[
r^2 = (x_i\cos\theta + y_i\sin\theta - \rho)^2 \tag{A.3}
\]

Substituting this distance representation into the minimisation criterion (equation A.1), setting the derivative of $\chi^2$ with respect to $\rho$ to zero and solving for $\rho$ gives

\[
\rho = \bar{x}\cos\theta + \bar{y}\sin\theta \tag{A.4}
\]

which shows that the regression line passes through the center of the object $(\bar{x},\bar{y})$. By setting $x' = x - \bar{x}$ and $y' = y - \bar{y}$ and substituting the value of $\rho$ from A.4 into A.3 and then A.1, the minimisation problem becomes

\[
\chi^2 = a\cos^2\theta + b\sin\theta\cos\theta + c\sin^2\theta \tag{A.5}
\]

with the parameters (the second order moments) shown in equation 3.2 on page 29:

\[
a = \sum_{i=1}^{n}\sum_{j=1}^{m}(x_{ij}-\bar{x})^2 B[i,j], \qquad
b = 2\sum_{i=1}^{n}\sum_{j=1}^{m}(x_{ij}-\bar{x})(y_{ij}-\bar{y})\, B[i,j], \qquad
c = \sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij}-\bar{y})^2 B[i,j]
\]

By inserting these coefficients into A.5, differentiating $\chi^2$ and solving for $\theta$, the orientation of the axis of elongation from the $x$ axis is obtained, as shown in equation 3.3 on page 30:

\[
\tan 2\theta = \frac{b}{a-c} \tag{A.6}
\]

A.4.1 Efficient method of calculating orientation

Section 3.2.2 on page 34 demonstrated the use of more efficient algorithms to process the image. This required a change in the calculation of the second order moments so that the first order moments $(\bar{x},\bar{y})$ were not needed during the processing of each individual pixel. Without setting $x' = x - \bar{x}$ and $y' = y - \bar{y}$, $r^2$ can be expanded as

\[
r^2 = (x-\bar{x})^2\cos^2\theta + 2(x-\bar{x})(y-\bar{y})\sin\theta\cos\theta + (y-\bar{y})^2\sin^2\theta \tag{A.7}
\]

Substituting this into the minimisation equation A.1 and using the gray level intensities $F_{ij}$ results in the minimisation equation shown in A.5, with the second order moments demonstrated in equation 3.4 on page 34 (each term involves only running sums over the pixels, to which the first order moments are applied after the scan):

\[
a = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}x_{ij}^2 \;-\; 2\bar{x}\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}x_{ij} \;+\; \bar{x}^2\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}
\]
\[
b = 2\left(\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}x_{ij}y_{ij} \;-\; \bar{x}\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}y_{ij} \;-\; \bar{y}\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}x_{ij} \;+\; \bar{x}\bar{y}\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}\right)
\]
\[
c = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}y_{ij}^2 \;-\; 2\bar{y}\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}y_{ij} \;+\; \bar{y}^2\sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}
\]

These could be used as before to calculate the orientation using equation A.6.

A.5 The heading of an object-indicator

The orientation of the object gives no indication as to which way along the axis the object is facing. Section 4.2.1 on page 60 explained the method used to mark the front of the object with the hole markers. The center of the hole markers gave an indication of which way along the axis of elongation the object was facing. As described in section 3.4 (page 38), the pixel moments of the hole markers are taken into consideration when calculating the position and orientation of the indicator. This provides a more accurate measure of the moments. After calculating each individual hole's moments, the center of mass of the holes could be found by averaging the positions of the holes (every pixel that was part of a hole had equal weighting). The position of this center in relation to the center of mass of the whole indicator gave a clue as to the heading of the indicator.
Figure A.3: Indication of object heading from the center of the holes

This calculation was not used as the indicator's heading, because the mid position of the holes could not be easily determined when placing the markers. Instead, this rough heading determined whether the more accurate heading from the orientation calculations, in the range [0°, 180°], needed to be shifted by 180° into the range [180°, 360°].

Figure A.4: Deriving the indicator heading

Appendix B

Robot implementation algorithms

B.1 Line following

The leading bee robot described in section 6.1.2 (page 81) completed a repeatable waggle dance by following a dance path traced on the arena floor with silver tape. The algorithm used to follow the silver tape was:

1: Wait for <delay> seconds. This allowed the user to position the robot near to the line, and to place the indicator, before the line following process started.
2: Move forward until the robot found the line. This occurred when the value from the light sensor was greater than the threshold determined for when the sensor was on the line.
3: While the light sensor value was greater than <threshold>, move forward.
4: If the light sensor value was less than <threshold>, pivot around the turning axis scanning for the line:
A: Set <turn counter> to limit the amount that the robot turned. The robot turned for a limited number of processor cycles.
B: Clear <counter>. This counted up to <turn counter>, at which point the robot stopped turning in that direction.
C: If the light sensor value was greater than <threshold>, then the robot was over the line. The sweep process could stop. Go to stage 3.
D: While the light-sensor value was less than <threshold> AND <counter> was less than <turn counter>:
• Turn in the direction of the last turn.
E: If the light-sensor value was less than <threshold> AND <counter> was greater than <turn counter>:
• Reverse the turn. The robot started to sweep in the opposite direction.
• Increment <turn counter> to allow the robot to sweep for a longer period.
• Repeat the sweep process. Go to stage B.
5: Repeat from step 3.

• This algorithm allowed the robot to scan for the line when it had moved off it. The robot would sweep in one direction for a small amount of time; then, if the line had not been found, it would sweep in the other direction for a slightly longer period. This was repeated until the line was found, which ensured the robot eventually turned the shortest amount to find the line (a sketch of this sweep-search logic is given at the end of this section).
• After losing the line the robot would initially turn in the direction it had last turned. Having found the line (after performing a sweeping operation) the robot would be just touching the line, but not aligned with it. After moving, it would move off the line again, and would most likely find the line a short distance away in the direction it had last turned.

Figure B.1: Line following: scanning for the line
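The sweep-search part of this algorithm (stages A–E) can be sketched as follows. Here `light()` and `turn(direction)` stand in for the RCX light-sensor read and a single small pivot step; they are hypothetical placeholders rather than a real MINDSTORM API, and the threshold value is illustrative, chosen between the on-tape (around 60) and off-tape (around 30) readings reported in section 6.1.1.

```python
THRESHOLD = 45                      # illustrative: between on-tape (~60) and off-tape (~30)

def scan_for_line(light, turn, initial_turn_count=10, direction=+1):
    """Sweep back and forth with a growing turn budget until the line is found.

    `light()` returns the current raw sensor value; `turn(direction)` performs
    one small pivot step in the given direction. Both are assumed placeholders.
    """
    turn_count = initial_turn_count
    while light() < THRESHOLD:      # keep sweeping until the tape is seen again
        counter = 0
        # sweep in the current direction for at most `turn_count` steps
        while light() < THRESHOLD and counter < turn_count:
            turn(direction)
            counter += 1
        if light() < THRESHOLD:
            direction = -direction              # reverse the sweep direction...
            turn_count += initial_turn_count    # ...and allow a longer sweep
    return direction                # remember the last turn direction for next time
```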
B.2 Sending a byte command via the MINDSTORM transceiver

All commands to the RCX unit are sent byte by byte. A simple message can be sent via the IR tower by filling in a 9-byte transmit-message packet (as shown in table B.1). The IR tower can operate in two modes; this section only documents the 'send-slow' mode. Notice that the complement of each byte, and a final checksum, need to be included as part of the message.

• The first three bytes of the message represent the RCX sync pattern.
• Byte 3 is the RCX Message flag; byte 4 contains its complement.
• Byte 5 contains the actual message byte sent to the RCX; byte 6 contains its complement.
• Byte 7 contains the message checksum, which is the sum of byte 3 (the RCX Message flag) and byte 5 (the actual message). Byte 8 contains the complement of the checksum.

Byte  Contents       Description
0     0x55           RCX sync pattern
1     0xFF           End of sync pattern
2     0x00           Message packet indicator
3     RCX Message    RCX message flag
4     ~RCX Message   Complement of byte 3
5     Message        Actual message sent
6     ~Message       Complement of byte 5
7     Checksum       byte(3) + byte(5)
8     ~Checksum      Complement of byte 7

Table B.1: Contents of a transmit-message packet sent to the RCX IR tower
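A minimal sketch of building the packet in table B.1 follows. It assumes each 'complement' is the bitwise inverse of the corresponding byte, and leaves the value of the RCX Message flag as a parameter rather than asserting it; the serial-port write in the usage comment is hypothetical.

```python
def build_rcx_message_packet(message, rcx_message_flag):
    """Build the 9-byte 'send-slow' packet described in table B.1.

    `message` is the single command byte for the robot; `rcx_message_flag` is
    the RCX message flag from the table (its numeric value is not asserted here).
    Complements are assumed to be bitwise inversions truncated to one byte.
    """
    inv = lambda b: ~b & 0xFF
    checksum = (rcx_message_flag + message) & 0xFF       # byte(3) + byte(5)
    return bytes([
        0x55, 0xFF, 0x00,                                # sync pattern / packet header
        rcx_message_flag, inv(rcx_message_flag),         # bytes 3-4
        message, inv(message),                           # bytes 5-6
        checksum, inv(checksum),                         # bytes 7-8
    ])

# Example (illustrative command byte and hypothetical serial handle):
# packet = build_rcx_message_packet(0x03, RCX_MESSAGE_FLAG)
# serial_port.write(packet)
```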