A real time multi-agent visual tracking system for modelling complex behaviours on mobile robots
Luke Robertson
MSc in Artificial Intelligence
Division of Informatics
University of Edinburgh
2000
Abstract
An overhead camera system capable of tracking a single ‘Khepera’ robot
around a large arena in real time is in common use by the Mobile Robot
Group at the Division of Informatics, Edinburgh University. This system,
while very successful at tracking one robot, could not be used to track more
than one object successfully.
This report documents an extension to this system capable of locating, identifying and tracking a number of mobile objects within a set of images in
real-time. The system is generic and can be applied to any type of robot and
vision hardware.
The system presented in this report is shown to be able to track up to seven
Khepera robots using a 75 MHz processor.
An example application of the system is demonstrated by implementing a robot following task with a number of LEGO MINDSTORM robots, using only the positional information obtained from the tracking system. This application is demonstrated successfully using a relatively slow frame grabbing device.
Acknowledgements
I would like to thank:
My supervisor, Chris Malcolm, for his advice on presenting this MSc report and his help during the scope of the project.
My second supervisor, Yuval Marom, for his patient and useful guidance in obtaining and using resources, and for sharing his knowledge of the existing single Khepera tracking system.
Sunarto Quek, for collaborating in the design of the tracking system and providing some useful tips and ideas for constructing the indicators, extracting regions from the image and efficiently processing the image.
Louise Bowden, Nico Kampchen, Emanuele Menegatti, Marietta Scott, Jan Wessnitzer and Paul Wilson, for valuable help with the twin joys that are MATLAB and LaTeX.
The EPSRC, for supporting my work over the year of the MSc course.
Contents

1 Introduction and background
   1.1 Introduction
      1.1.1 Project outline
      1.1.2 Report Outline
      1.1.3 Collaborative work
   1.2 Social robotics
   1.3 Object Tracking
      1.3.1 Segmentation
      1.3.2 Classification
      1.3.3 Object tracking
   1.4 Robot tracking
      1.4.1 The Cognachrome vision system
      1.4.2 Roboroos robot soccer team
   1.5 Robot Tracking summary

2 Single Khepera tracking system and the vision hardware
   2.1 The single Khepera tracking system
      2.1.1 The Khepera arena
      2.1.2 Meteor Driver
      2.1.3 Operation
   2.2 Evaluation system
      2.2.1 Layout of the test system
      2.2.2 Video for Linux driver
   2.3 Hardware and single Khepera tracking summary

3 Object segmentation and classification
   3.1 Vision techniques
      3.1.1 Moments
      3.1.2 Grouping pixels to regions - Connected Components
      3.1.3 Boundary tracking
   3.2 Efficiency of the vision algorithms
      3.2.1 Efficient connected component algorithm
      3.2.2 One pass orientation calculation
   3.3 Region finding experiments
      3.3.1 Indicator classification experiments
   3.4 Extending the classification system
      3.4.1 Processing regions
   3.5 Detection tests and indicator design
      3.5.1 Design of the indicators
      3.5.2 Single frame classification tests
      3.5.3 Execution time
   3.6 Detection and identification summary

4 Object Tracking
   4.1 Method of operation
      4.1.1 Tracking and windowing
      4.1.2 Scanning and detecting
      4.1.3 Indicator identification
      4.1.4 System output
      4.1.5 System debugger
      4.1.6 Basic performance evaluation
   4.2 Extensions to the basic system
      4.2.1 Calculating the heading of an indicator
   4.3 Tracking experiments
      4.3.1 Execution time
      4.3.2 Static positions
      4.3.3 Tracking objects moving in a straight line
      4.3.4 Tracking simple robot tasks
      4.3.5 Tracking Kheperas
   4.4 Tracking summary

5 Multiple robot tracking application: The waggle dance
   5.1 Bee foraging and communication
   5.2 The round dance
   5.3 The waggle dance
      5.3.1 Dance description
      5.3.2 Orientation of the food source
      5.3.3 Distance
   5.4 Bee communication summary

6 Multiple robot tracking application: Robot implementation
   6.1 The dancing bee
      6.1.1 Robot construction
      6.1.2 Line following method
      6.1.3 Extracting orientation information from the robot dance path
   6.2 The dance following bee
      6.2.1 Communicating with the MINDSTORM bees
   6.3 The robot bee controller
      6.3.1 The bee controls
      6.3.2 Bee control simulator
      6.3.3 The real robot
   6.4 Results
      6.4.1 Copying the heading of the leading bee
      6.4.2 Following the dance path with a single bee
   6.5 Summary of the multiple agent tracking application

7 Conclusion
   7.1 Achievement of the goals of the project
   7.2 Assumptions and limitations
   7.3 Advice for extending this work
      7.3.1 Automatic threshold selection
      7.3.2 Simple prediction of object positions
      7.3.3 Using statistical methods to derive estimates of position
   7.4 What was learnt during the scope of the project

Appendices

A Machine vision algorithms
   A.1 Component labelling
   A.2 Boundary tracking
   A.3 Locating Holes
   A.4 Orientation of an objects axis of elongation
      A.4.1 Efficient method of calculating orientation
   A.5 The heading of an object-indicator

B Robot implementation algorithms
   B.1 Line following
   B.2 Sending a byte command via the MINDSTORM transceiver
List of Figures

1.1 Example of edge detection
1.2 View of football arena
1.3 Identifying Robo-soccer players
2.1 Layout of the Khepera arena
2.2 The YUV planar arrangement
2.3 Memory layout of a UNIX system
2.4 A Khepera in the arena, and a pixel-level view
2.5 Two Kheperas interacting
2.6 The Khepera Arena
2.7 The layout of the test system
2.8 Video for Linux - Capturing process
2.9 A (covered) Khepera in the test arena (and a pixel-level view)
3.1 Differently parameterised indicators
3.2 Pixel labelling conventions
3.3 The orientation of an object
3.4 Pixel neighbours
3.5 Describing thresholded image with regions
3.6 Object representation after one pass of the connected component algorithm
3.7 Extracted moment information from an image
3.8 An A4 sized indicator in the Khepera arena
3.9 The indicator marking system
3.10 Identifying an indicator by holes
3.11 An example of the system being unable to find a hole in an indicator
3.12 Scene containing 7 object indicators before and after classification
3.13 Detecting a Khepera sized indicator
3.14 Example of a partially obscured indicator being incorrectly classified
3.15 Indicators spread over the Khepera arena
3.16 Testing the classification of indicators
4.1 Tracking by applying the detection system in successive frames
4.2 Searching for indicators using windows
4.3 System debugger: The scanning band sweeping the image
4.4 Sharing object positions with other processes
4.5 Listening for object positions
4.6 View of the system debugger
4.7 The heading of an indicator
4.8 Detecting the headings of indicators
4.9 The positions of two static indicators
4.10 The observed COM positions of an object on a straight line path
4.11 The straight line path (y-magnified)
4.12 Paths of four robots being tracked
4.13 The respective size of the Khepera and indicator used in Khepera tracking experiments
4.14 Paths of two Kheperas being tracked
5.1 The Round Dance
5.2 The Waggle Dance
5.3 Orientation of food source from hive
6.1 Robot bees
6.2 The dancing bee
6.3 The dance path, with overlayed positions obtained from the tracking system
6.4 The layout of the gears in the robot bee
6.5 Overhead view of the robot bee
6.6 Extracting the orientation of a food source from the positions of a dancing bee
6.7 Communication between the RCX and a PC
6.8 The start of the simulation
6.9 The simulated paths of the bees after one circuit
6.10 Simulation of the following bee
6.11 Three robot bees matching the heading of a leading bee
6.12 The heading of the four bees at the start of the following process
6.13 The heading of the four bees over the following process
6.14 The dance path of the leading bee, the path of the following bee, and the orientation of the two paths
A.1 Identifying an indicator by holes
A.2 The area scanned when looking for holes
A.3 Indication of object heading from the center of holes
A.4 Deriving the indicator heading
B.1 Line following: Scanning for the line
List of Tables

3.1 Moments derived from an A4 indicator in the Khepera arena
3.2 Moment invariance of an indicator in the test arena
3.3 Design of the object indicators in the two arenas
3.4 Execution time of the detection process
4.1 Execution time of the tracking process
4.2 Evaluations of the tracked positions obtained from static indicators
4.3 Heading calculations from static indicators
4.4 The number of instances of the system losing a robot
A.1 Connectedness ambiguity
B.1 Contents of a transmit-message packet sent to the RCX IR tower
Chapter 1

Introduction and background

1.1 Introduction
An overhead camera system capable of tracking a single ‘Khepera’ robot
around a large arena in real time is in common use by the Mobile Robot
Group at the Division of Informatics, Edinburgh University. This system,
while very successful at tracking one robot, could not be used to track more
than one object.
1.1.1 Project outline
The outline for this project was to:
• Refine the tracking system to allow for several robots and static objects to
be tracked using the existing hardware.
• Construct a system capable of distinguishing between different objects.
• Be able to match the positional information of the detected objects with
built-in positional information of critical regions in the environment.
• Maintain the efficiency required for real-time control of the robots from
the vision system.
This report describes a visual system capable of locating, identifying and
tracking a number of mobile objects within a set of images. The system
has been designed to follow a number of generic robots around a controlled
environment.
The system was required to extract the real time positions of the objects to
allow this information to be used to control the robots, and to accurately
describe the path of the robot. The required execution time of the system
depended on the speed of the robot (and the robotics application). If a robot
could not move far between each time-step then the system could process
frames at a slow rate. If the tracked robot could move across the frame, or
the application required fast sampling, then the system would need to process
frames at a faster rate. The system presented in this report is shown
to be able to track up to (a theoretical) seven Khepera robots (in real time)
using a 75 MHz CPU. Demonstrations of the system are performed showing
its performance on different processors, frame grabbing hardware and robots,
applied to different applications.
In the second half of the report an example application is demonstrated by
implementing a robot following task using a number of LEGO MINDSTORM
robots using only the positional information obtained from the tracking system as input. This application is demonstrated successfully using a relatively
slow (15fps), but cheap, frame grabbing device that was being evaluated for
use in future tracking applications. This device was shown to be acceptable,
allowing reasonable control of the following robots, providing the tracked
robots were sufficiently slowed down.
1.1.2 Report Outline
The report is broken into the following chapters:
1: Describes some of the traditional machine vision methods associated with
tracking objects in images and explains the problems associated with
each method. It is shown that the main problem occurs when separating
the object pixels from the image. This is a non-trivial problem which
affects other processes if done poorly.
2: Briefly explains the vision hardware available during the scope of this
project. The existing single robot tracking system is described. This fails
to track more than one Khepera principally because it makes no attempt
to identify the robots, so it cannot recognise a robot in successive frames.
3: The third and fourth chapters explain the extensions made to the robot
tracking system to allow it to track more than one robot. This chapter
introduces a method of identifying individual ‘object-indicators’ used to
mark the physical objects. These indicators can be seen more clearly by
the system, which can extract the position and heading of each indicator
and hence information about the real object.
4: Provides an outline of the system capabilities. This describes the operation of the system, the method of communicating positional information
to other processes, and introduces an easy to use system debugger/user
interface. Some tests of the system are also performed on trivial tracking
problems.
5: A more detailed test of the system is made in the fifth and sixth chapters.
The fifth chapter gives a brief description of the waggle dance which, it is postulated, bees perform when communicating the location of distant
food sources to other bees. A simple model of this dance is demonstrated
using the tracking system in the sixth chapter.
6: A number of LEGO MINDSTORM robots follow a leading robot by
only using positional information obtained from the tracking system. A
simulator used to test the method used to control the following robots is
introduced. A single bee robot is shown to successfully copy the dance
of a leading bee.
1.1.3 Collaborative work
The project was implemented by both the writer and Sunarto Quek [15].
The generic design of the tracking system was worked on as a team with each
member then independently implementing, revising and testing the system
(for separate MSc project considerations).
• This collaboration involved the design of the object segmentation and classification part of the project (objects are effectively tracked by applying
this process in each frame). This involved sharing the design of the object-indicators, the extraction of the indicators from the image, the classification of the indicators and also the extraction of the position and orientation
information of these segmented indicators.
• The final system (performing the actual tracking in each sequential frame)
was implemented independently.
• All of the code used in the system was constructed independently, although the basic vision algorithms were shared (with some of the code
being discussed).
• The writing of the report and the design and implementation of the main
tracking application were performed independently.
1.2 Social robotics
The need for such a system for experimentation with the Khepera robots
arises as a result of their limited sensing capabilities, and the limitations
of the existing tracking system (which cannot reliably track more than one
Khepera). Information from the tracking system would be used to enhance
the perceptual repertoire of one or more agents to enable group interactions.
1.3 Object Tracking
The core of this report is concerned with the traditional vision problem of
object tracking. Conventionally the vision system is broken into three main
components: segmentation, object recognition and object tracking. This section describes some of the more popular techniques applied to these elements.
It should be noted that:
• There is no properly defined method to complete any of these stages.
• All of the techniques have disadvantages (and all offer some application-dependent advantages).
1.3.1 Segmentation
Segmentation separates an image into its component regions. While this is
relatively easy for humans, it is far more difficult for machines. Segmentation can be defined as a method to partition a gray-level image F[i, j] into regions P1, ..., Pk. This can be visualised as determining how the foreground
of an image is separated from the background. The segmentation process
should extract interesting objects (an interesting object being an item that the system is required to identify) from the image. The technique applied
determines the level and detail to which this is possible. This section demonstrates the three chief segmentation methods: Thresholding, Edge Detection,
and Token Grouping.
Thresholding
Thresholding is accomplished by defining a range of values in the image.
Pixels within this range are said to be part of the ‘foreground’; all the other
pixels are defined as the ‘background’ [9].
A typical black and white image is quantised into 256 gray levels during the digitisation process, whereas a binary image contains only two levels (on
and off). Most basic machine vision algorithms operate on a binary image,
although they can be extended to work with gray images. A binary image
B[i, j] is created by thresholding a gray image. If it is known that the object
intensity values are in a range [T1, T2] then we can obtain the thresholded binary image B[i, j] from the gray level image F[i, j]:
B[i, j] = \begin{cases} 1 & \text{if } T_1 \leq F[i, j] \leq T_2 \\ 0 & \text{otherwise.} \end{cases}
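As an illustration of this thresholding step, a minimal sketch follows (not the thesis code; the row-major 8-bit buffer layout and the function name are assumptions made for the example):

    /* Thresholding sketch: pixels whose gray value lies in [t1, t2]
     * become foreground (1); everything else becomes background (0). */
    void threshold_image(const unsigned char *gray, unsigned char *binary,
                         int height, int width,
                         unsigned char t1, unsigned char t2)
    {
        for (int i = 0; i < height; i++)
            for (int j = 0; j < width; j++) {
                unsigned char v = gray[i * width + j];
                binary[i * width + j] = (v >= t1 && v <= t2) ? 1 : 0;
            }
    }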
• The user must select the thresholds [T1 , T2 ], either by setting a fixed value
by trial and error, or by using an adaptive threshold derived from a histogram of the image [9].
• Some images may not have clear-cut histogram peaks corresponding to
distinct image structure, making automatic thresholds complex to derive.
• Automatic thresholding techniques are discussed as part of future extensions to this project in section 7.3.1 on page 101.
Edge Detection
Thresholding selects pixels by brightness — there are no requirements that
the segmented regions are continuous. An alternative definition of a region
can be based on its boundary [6]. An example of an image segmented into
edges is shown in figure 1.1. The second image shows the edges found in the
first object.
• Edge definitions are generally local — there may be places on boundaries
where the measure of ‘edgeness’ drops, resulting in poor segmentation [16].
• A further problem occurs when there are two edges touching. Two different
starting points can result in different segmentations.
Figure 1.1: Example of edge detection
Token group segmentation
This technique represents aspects of the image by tokens. These can represent
several features in the image such as specific pixel, line, or surface features.
Segmentation is achieved by grouping tokens into more abstract groups until
new tokens are formed, representing instances of objects of interest. Global
regularities are found by combining local and non local information [20].
• The vision system decides whether grouping occurs depending on some
predefined grouping criteria as specified by a user.
1.3.2
Classification
After regions have been found within the image a classification process needs
to be applied to identify these objects. The three most popular methods are
Region Growing, Template Matching and Connectionist Methods.
Region Growing
Given starting points in the image, neighbouring pixels are examined to see if they meet the membership criteria of a particular region. If they do then
they are added to the growing region, otherwise they are assigned to a new
region [9].
• The final solution is dependent on the initial definitions as defined by
human input.
• Different starting conditions may grow into different regions.
Template matching
A target pattern (the template) is shifted to every location in an image and
used as a mask. An image is formed showing where regions similar to the
template are located [7].
• This is the preferred technique for tracking an object within an image, due to
its relative simplicity.
→ The computation required can be large, depending on the size of the
image.
→ This can be reduced by limiting the size of the window that the template
is applied to within the image.
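A minimal sketch of such a windowed template match follows (using a sum-of-squared-differences score rather than any particular similarity measure; all names and the row-major buffer layout are illustrative, not taken from the thesis):

    #include <limits.h>

    typedef struct { int row; int col; long score; } Match;

    /* Slide a tpl_h x tpl_w template over a search window inside a
     * grayscale image and return the best (lowest SSD) position.
     * The window is assumed to lie entirely inside the image. */
    Match ssd_match(const unsigned char *image, int img_w,
                    const unsigned char *tmpl, int tpl_h, int tpl_w,
                    int win_top, int win_left, int win_h, int win_w)
    {
        Match best = { -1, -1, LONG_MAX };

        for (int i = win_top; i <= win_top + win_h - tpl_h; i++)
            for (int j = win_left; j <= win_left + win_w - tpl_w; j++) {
                long score = 0;
                for (int y = 0; y < tpl_h; y++)
                    for (int x = 0; x < tpl_w; x++) {
                        int d = image[(i + y) * img_w + (j + x)]
                              - tmpl[y * tpl_w + x];
                        score += (long)d * d;
                    }
                if (score < best.score) {
                    best.row = i; best.col = j; best.score = score;
                }
            }
        return best;
    }

Restricting win_h and win_w to a small area around the previously known position is exactly the cost reduction mentioned in the point above.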
Neural networks
Neural networks (NNs) can be used to partition the feature space using
non-linear class boundaries. These boundaries are set during the training
phase by showing a carefully selected training set representing the objects
that will be encountered during the recognition phase. After learning these
classification boundaries the network behaves like any other classifier [3]. More recent research tends towards unsupervised learning — with a Self Organising Map (SOM) forming its own categories through the correlation
of the data, without relying on a person teaching the network which group
an object belongs to.
• The design of the NN is important and is dependent on the application.
• NNs require a large training set, with a human evaluating each network
classification.
• The SOM can categorise objects but not label them.
1.3.3 Object tracking
Segmentation techniques can lead to large inconsistencies when tracking an
object across several images. Traditionally, segmentation and classification
techniques fail to uniquely identify objects across frames, despite being able
to successfully classify objects in static frames. The two most successful
tracking methods are Kalman Filtering and Differencing techniques.
Differencing
While not a dedicated tracking mechanism this technique can be used to
predict where an object should be in the current frame based on its location
and velocity from the preceding frames. By looking for the closest match to
that area a record can be kept of which object is which [17].
• However the method does not uniquely identify the objects and can lose
track of them if they pass close to each other.
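The prediction step itself is tiny; a sketch under a constant-velocity-between-frames assumption (names are illustrative, not from the thesis):

    typedef struct { double x, y; } Point2D;

    /* Extrapolate the next position from the last two observed
     * positions, assuming roughly constant velocity over one frame. */
    Point2D predict_next(Point2D prev, Point2D curr)
    {
        Point2D next;
        next.x = curr.x + (curr.x - prev.x);
        next.y = curr.y + (curr.y - prev.y);
        return next;
    }

The segmented region closest to the predicted point is then taken to be the same object as in the previous frame.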
Kalman filtering
The Kalman filter is a set of mathematical equations which provide recursive estimates of past, present, and future states even if the precise nature of the
system is unknown. Its predictive algorithm considerably reduces the search
space required and offers an accurate method of tracking objects [21].
• The mathematics used by the Kalman filter are very intensive and can
result in large computational costs.
• There are also a large variety of Kalman filters available, with varying
complexity and accuracy — it is not obvious which filter is best suited to
individual applications.
1.4 Robot tracking
This section provides a rough outline of some research into applying object
tracking methods specifically for tracking mobile robots. The most common
field of research combining both robotics and vision comes from robot soccer
competitions. Several teams use their own vision systems with the robot
players ‘coached’ by a PC watching the action on an overhead camera.
1.4.1 The Cognachrome vision system
A popular commercial system, used to win the 1996 International Micro
Robot World Cup Soccer Tournament, is the Cognachrome Vision system
built by Newton Research Labs [10].
Figure 1.2: View of football arena
The system is chiefly implemented in hardware allowing it to track 5 objects
in a frame of 200x250 at a rate of 60 frames a second. The system can track
a maximum of 25 objects (although the speed drops below 60 fps as the
number of objects increases beyond 5).
• Each object is distinguished by colour. There are 3 colour tracking channels each of which can track multiple objects of the given colour. Each
robot is identified by two coloured circles; the larger identifies the team and the smaller identifies the player in that team.
Figure 1.3: Identifying Robo-soccer players
• The light source and light gradient determine how the camera interprets
the colour balance — which can change over the pitch.
• The curvature of the lens distorts the pitch so careful calibration is required.
• The system has been successfully applied to robot catching, soccer and
docking applications.
1.4.2 Roboroos robot soccer team
A team of researchers from the Department of Computer Science at the
University of Queensland produced their own tracking method [5] reaching
second place in the 1998 RoboCup. This system used template matching to
match both robots and the ball, successfully distinguishing them from foreign
objects and noise. The tracking was conducted by a method of differencing
between two frames.
The system was unable to uniquely identify each individual object and track
it across images — leading to inconsistent results in varying environments.
The researchers concluded that refinements needed to be made to the tracking
component by applying a Kalman Filtering technique.
1.5 Robot Tracking summary
• The tracking process can be split into three distinctive elements: extracting important features from the image, classifying the features found in the image into objects, and tracking these objects in successive frames.
• All of the later tracking elements require a successful segmentation to have
been performed.
• With carefully designed objects, in controlled environments, simple thresholding techniques allow the object pixels to be extracted from the image.
In changing conditions, thresholding may segment these objects poorly
causing problems for the classification element.
• Using a probabilistic technique, badly segmented and classed objects can
still be accurately tracked. These techniques are computationally (and
mathematically) complex.
• A number of moderately successful robot tracking applications are available. These either track objects by colour (which requires extra calibration
to find the colour threshold used when segmenting the images) or by following robot ‘indicators’ placed on top of the robot.
Chapter 2

Single Khepera tracking system and the vision hardware
This chapter principally introduces the vision hardware used in the scope of
this project. The single Khepera tracking system (which this project extends)
is also described.
As well as using the existing Khepera tracking hardware, a ‘test’ arena was
made to allow simple tests to be conducted on an independent system. This
arena used newer and cheaper vision hardware so the tests were also partly to
evaluate this hardware as an alternative method of conducting robot tracking
work in the future.
The main problems with the existing system will be detailed. To outline,
these are:
• The system does not attempt to identify the object. If the system is
extended to track multiple objects, it is not clear which object is which in
each frame.
• The system tracks object(s) by following two LEDs placed on top of the
robot. These are generally not part of the robot so an external circuit
board is required.
• It is difficult for a basic vision system to group together LEDs if there is
more than one robot in the frame. This is especially true when the robots
are near to each other, which makes robot interaction tasks impossible.
• The system makes no attempt to track an object if it is partially obscured
by the overhead umbilical cord of the Khepera.
• Segmenting the LEDs from the image relies on a fixed, user defined threshold.
2.1 The single Khepera tracking system
The tracking solution presented by Lund, Cuenca and Hallam [12] simplifies the tracking problem by attaching an external circuit board with two
mounted LEDs onto the Khepera. This allows a very simple tracking system
to accurately follow the position of the two LEDs within the image.
The LEDs are identified as the two brightest pixels (above a set threshold) within a window in the
image. If the LEDs are found in the window then the robot has been detected
and simple pixel moment calculations are conducted to extract its center of
mass position and orientation. If the LEDs are not found (for example if the Khepera is hidden under its umbilical cord) then the robot is
lost and the system waits for it to become visible again.
By simplifying the problem in this way, the system bypasses some of the
complexity involved in the object segmentation and classification issues by
never needing to explicitly identify the robot. Only the two LEDs (which are
easily distinguished from the rest of the image) need to be segmented from
the image.
2.1.1 The Khepera arena
The Khepera arena consists of a number of pieces of chipboard arranged
into a 2.4m x 2.4m square. A black and white camera is located 2m above
the central position of the arena. A Matrox Meteor frame grabber samples
the camera at a maximum resolution of 640 x 480 at a maximum rate of 50
frames per second (every 0.02s). The resolution varies over the image; at
the center one pixel represents a real distance of ∼7mm. The actual arena
is contained in a 370 x 370 pixel sub-window in the center of the image.
Figure 2.1: Layout of the Khepera arena
The Meteor card is attached to a Linux box with 16 MB of memory running at
75MHz. This frame grabbing hardware is obsolete and no longer supported
by Matrox. A third party, free device driver is used to control the frame
grabber [1].
2.1.2 Meteor Driver
The frame grabber captures a colour image and stores it in a directly referenced, contiguous block of PC memory.
The Linux box has 16 MB of RAM, 1.2 MB of which is allocated for frame storage purposes. This allows enough for two 640x480 frames to be stored in memory (the frames are captured in colour, with two bytes required to represent each pixel). Frames can either be stored in an RGB, or packed or planar YUV420 format. The packed YUV format stores the luminance component (Y) and the chrominance components (U and V) in separate blocks (see figure 2.2). Because only the intensity component is required (the image is black and white), this format is convenient as it allows quick, efficient access to the Y component.
Figure 2.2: The YUV planar arrangement
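To illustrate why this layout suits an intensity-only tracker, here is a small sketch assuming the standard YUV420 planar arrangement of figure 2.2 (the exact offsets used by the driver are an assumption, and the names are illustrative):

    /* The Y (intensity) plane is a contiguous width*height block at the
     * start of the frame, so it can be treated directly as a grayscale
     * image; the quarter-resolution U and V planes follow it. */
    static inline unsigned char luma_at(const unsigned char *frame,
                                        int width, int i, int j)
    {
        return frame[i * width + j];
    }

    static inline unsigned char chroma_u_at(const unsigned char *frame,
                                            int width, int height,
                                            int i, int j)
    {
        const unsigned char *u = frame + width * height;   /* after Y plane */
        return u[(i / 2) * (width / 2) + (j / 2)];          /* 2x2 subsampled */
    }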
The driver can operate in a number of modes. A user can request a single
frame, or a continuous (unsynchronised) selection of images. The most common is an ‘asynchronous’ mode which allows the images to be placed into
different buffers in the memory area. This allows one frame to be processed
by the user while the capture card places image data into another. The
capture card sends a signal to the user process when a frame is ready to be
processed.
Reading from the device
Frames are transferred from the capture card to a contiguous block of RAM
on the PC by a DMA process. To guarantee enough contiguous space, the
Unix kernel can be prevented from using a segment at the ‘top’ of RAM by
hiding it during boot time. The kernel will not build page tables to reference
this area, so Unix processes will not use this segment. Providing no other
devices are configured to use it, this segment of physical memory will be
written to by the frame grabbing device only. Bypassing the Unix kernel in
this way is the only way to ensure a large contiguous block (several hundred
kilobytes) of RAM. A simple method of ‘hiding’ RAM from the kernel at
boot time has been obtained by applying a “big physical area” patch [8] to
the kernel.
The device driver allows a user to transfer this frame data into memory
allocated for their own use by a read() call. This has obvious speed and storage disadvantages (the image then exists twice in RAM). The driver offers a more efficient method of memory-mapping the hidden segment to the user via a mmap() call.

Figure 2.3: Memory layout of a UNIX system
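A minimal sketch of the memory-mapping approach follows; the device path, buffer sizes and flags here are assumptions made for illustration, not the thesis code or the actual Meteor device name:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t frame_bytes = 640 * 480 * 2;    /* two bytes per pixel */
        const size_t buf_bytes   = 2 * frame_bytes;  /* two frame banks     */

        int fd = open("/dev/video0", O_RDONLY);      /* hypothetical device */
        if (fd < 0) { perror("open"); return 1; }

        /* Map the driver's frame storage into this process; no copy is
         * made when a new frame arrives, unlike the read() approach. */
        unsigned char *frames = mmap(NULL, buf_bytes, PROT_READ,
                                     MAP_SHARED, fd, 0);
        if (frames == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        unsigned char *bank0 = frames;
        unsigned char *bank1 = frames + frame_bytes;
        /* ... process frames directly from bank0 / bank1 ... */
        (void)bank0; (void)bank1;

        munmap(frames, buf_bytes);
        close(fd);
        return 0;
    }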
Problems with the driver
• Due to the experimental nature of the device driver, frames were often out
of sync when using the single frame capture mode and during early frames
in the synchronous capture mode. A basic workaround was to ignore the
early frames.
• A more problematic fault with the driver was that during synchronous
capture the card is free to dump a frame into any bank. If the user is
slow processing a frame bank, the capture card may write another frame
into that bank! The user must either assume that the frames will not have
altered much in a 20 ms time slice (in which time a Khepera can move a
maximum of 2cm) or copy each frame into another section of memory.
2.1.3 Operation
Tracking the two LEDs allowed the orientation of the robot to be calculated
— which allowed the system to predict where the robot was going to be
next (although this feature was not used by the tracking system itself). However, it was not clear which way along this line of orientation the robot was heading.
Kheperas can move at a maximum speed of 1 m/s. In one time slice (20 ms) a Khepera can move a maximum of 2 cm (less than 3 pixels). Figure 2.4 shows a Khepera in the bottom left of the arena. The ∼58mm diameter of the Khepera occupies a box of ∼8x8 pixels.
Figure 2.4: A Khepera in the arena, and a pixel-level view
There were two operational modes of the system, ‘tracking’ and ‘scanning
and detection’. During the detection phase the arena is scanned, breaking
the image into bands to speed up processing (only one band is processed in
each frame). The pixel moments in the band are calculated to determine
whether the robot is inside the band, and to derive its orientation and the
position of the two LEDs. When tracking, the LEDs are assumed to be in
a window of 8x8 pixels (equivalent to 5.8 x 5.8 cm) centred at the previous
position of the robot. If the robot is not found in this window then the
system reverts to its detection phase.
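A sketch of this 'tracking' mode follows: it finds the two brightest above-threshold pixels in an 8x8 window around the previous position and reports their mid-point. This is illustrative only, not the thesis code; the real system falls back to the scanning phase when the search fails and must also keep the two LEDs distinct.

    typedef struct { double i, j; int found; } LedFix;

    LedFix track_leds(const unsigned char *Y, int height, int width,
                      int prev_i, int prev_j, unsigned char threshold)
    {
        int bi[2] = { -1, -1 }, bj[2] = { -1, -1 };
        unsigned char bv[2] = { 0, 0 };

        for (int i = prev_i - 4; i < prev_i + 4; i++)
            for (int j = prev_j - 4; j < prev_j + 4; j++) {
                if (i < 0 || i >= height || j < 0 || j >= width) continue;
                unsigned char v = Y[i * width + j];
                if (v < threshold) continue;
                if (v > bv[0]) {                /* new brightest pixel  */
                    bv[1] = bv[0]; bi[1] = bi[0]; bj[1] = bj[0];
                    bv[0] = v; bi[0] = i; bj[0] = j;
                } else if (v > bv[1]) {         /* new second brightest */
                    bv[1] = v; bi[1] = i; bj[1] = j;
                }
            }

        LedFix fix = { 0.0, 0.0, bi[1] >= 0 };
        if (fix.found) {                        /* mid-point of the LEDs */
            fix.i = (bi[0] + bi[1]) / 2.0;
            fix.j = (bj[0] + bj[1]) / 2.0;
        }
        return fix;
    }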
Problems with the system
• The system can be expanded to track several objects, but the objects cannot be identified. When Kheperas get too close together it is very difficult
(for a simple vision system) to determine which LEDs are grouped. An
example is shown in figure 2.5.
→ As the objects are not identified there is no control as to how the objects
are arranged in output from the system. The first robot seen in a scan
is generally classed as robot ‘one’. But during another scan this might
be the second robot found, and classed as robot ‘two’.
Figure 2.5: Two Kheperas interacting
• The relatively poor resolution of the Khepera in the arena (∼8 pixels in
diameter) makes a more complicated vision technique difficult.
• The system has to be carefully calibrated (to obtain the real ‘arena’ coordinates of the Khepera). Any changes in the setup (camera or arena
position) require the lengthy calibration process to be re-conducted.
• The intensity threshold used to find the LEDs has to be set by hand.
→ Depending on the lighting conditions this can vary.
→ It can also vary depending on the position of the Khepera in the image.
• However the system offers some advantages. The system has been used to
track two objects by placing the two LEDs on the two objects and tracking
the lightest pixels around the screen. This works well for objects close to
each other (eg in box pushing experiments).
The orientation of the robot is lost though (obviously this can't be found
from one pixel).
Post processing of the positions is used to class each recorded position to
the correct robot.
2.2 Evaluation system
The Khepera arena presented a complex environment for a vision system:
• The arena consists of a number of unevenly surfaced pieces of chipboard
arranged together.
• It has a surrounding wall of foam about a meter high (for an acoustic
experiment) which cast uneven shadows around the board.
• The overhead lighting is unevenly spaced around two sides of the arena
making patches of light and dark regions around the arena.
• To allow the Khepera to run online, an umbilical cable with a large feeder
crane is placed overhead, which can occlude objects in the arena.
• The Kheperas themselves appear as very small objects within the image.
• These features are shown in figure 2.6. A Khepera can (just) be seen in the
bottom right of the arena. The overhead crane, wires and counter-balance
weight (used to run the Khepera online) can be seen in the center of the
image.
To allow for a more controllable environment, and a clearer robot resolution,
a purpose built arena was built for the testing phase of this project. A cheap
USB web camera was purchased. This plugged directly into a Unix box and
could be controlled by generic video driver software. The more expensive camera and frame grabber used in the Khepera arena were obsolete, so the purpose built arena also acted as an evaluation of using this cheap alternative as part of future tracking systems.

Figure 2.6: The Khepera Arena
2.2.1 Layout of the test system
The system did not require any expensive frame grabbing hardware, just a
USB port. The processor used was a 500 MHz Pentium III, allowing faster
processing power than the 75MHz system used in the Khepera arena. However the system had a maximum transfer rate (USB can only transfer data at a maximum rate of 12 Mbps) of fifteen frames (of 640x480) per second (~0.067 s per frame), much slower than the Meteor frame grabber
which could operate at 50 fps (0.02s per frame).
Experiments performed on the system used direct pixel values for positional
information — the experiments were only for the evaluation of the vision
system so calibration was not needed. This allowed the system to be moved
around to suit the application.
Figure 2.7: The layout of the test system
2.2.2 Video for Linux driver
To access the USB camera with Linux, a kernel with USB support was required [19]. This was only available (at the time) in development kernels,
so a back-port was obtained which included the USB code in a stable Linux
kernel. The main advantage of this setup was that Video for Linux [11] could
be used as the driver. This is built into modern Linux kernels, and allows
generic code to work with any compatible video device. Using this driver the
tracking system could be used with several hardware setups.
A Philips PCVC680K camera was used; this allowed:
• A maximum resolution of 640x480 in colour at a maximum rate of 15 fps.
• Numerous pixel format methods (including packed YUV420 as used in the
Khepera arena system).
• Several vision controls (brightness, contrast, colour) which could be controlled by software. The camera could also auto-adjust its aperture size.
The Video for Linux driver allowed an array of frame-banks to be used as
storage space for the incoming frames. As with the meteor driver, the area
used for frame storage could be memory mapped by the user. The user
selected a particular bank and requested the driver to place an image into
that bank with a non-blocking call. This allowed the user to process another
frame while the driver acted on the request. When the user was ready to
process this incoming frame a blocking synchronise call was issued which
waited for the driver to finish writing to the bank.
Figure 2.8: Video for Linux - Capturing process
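The capture loop of figure 2.8 therefore has a simple ping-pong structure. In the sketch below, request_frame(), wait_for_frame() and process_frame() are hypothetical wrappers around the Video for Linux calls described above, named only to show the double-buffered structure; they are not real driver functions:

    void request_frame(int bank);    /* non-blocking capture request */
    void wait_for_frame(int bank);   /* blocking synchronise call    */
    void process_frame(int bank);    /* run the tracker on one frame */

    void capture_loop(void)
    {
        int current = 0;                 /* bank holding the frame to use  */
        request_frame(current);
        wait_for_frame(current);         /* first frame must be waited on  */

        for (;;) {
            int next = 1 - current;
            request_frame(next);         /* driver fills the other bank... */
            process_frame(current);      /* ...while this one is processed */
            wait_for_frame(next);        /* block until the new frame is in */
            current = next;
        }
    }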
Test arena issues
• The main advantages of this test system were that it was portable, customisable and cheap. Any system that used the Video for Linux driver would conceivably work with any Video for Linux compliant hardware.
• The small Khepera robot could be represented with better definition. In
the Khepera arena the ∼60mm diameter robot covered 8 pixels. With the
setup used in the test arena, the same diameter covered 30 pixels.
• The camera produced a colour image. This offered an extra feature parameter which a detection system could use to distinguish between objects.
However the system presented in this report was principally built to work
on the black and white system available with the Khepera arena, so this
colour information is not used.
• The potential problem with the system is the relatively slow frame rate
offered by the USB. Only a maximum of 15 frames per second is offered
(compared with the 50 frames offered by the Matrox Meteor hardware).
Figure 2.9: A (covered) Khepera in the test arena (and a pixel-level view)
2.3 Hardware and single Khepera tracking summary
• In this chapter the vision hardware used during the scope of this project
was introduced. The existing Khepera arena system was detailed and the
system used to evaluate a cheap USB web camera as an effective tracking
camera was also described.
• The single Khepera tracking system was introduced. The problems and
limitations with the system were described.
→ An extraneous piece of hardware (an LED board) was required so each
object could be tracked.
→ Only the orientation of the tracked object was found. The actual heading
of the robot was unknown (whether the robot was facing up or down
the line of orientation).
→ A prediction of the object's next position using orientation and previous
positions was not used when looking for the object in a new frame.
→ The system had a built in conversion system to convert pixels to world
coordinates. This required a lengthy calibration procedure for each
change to the setup.
→ The system relied on a fixed threshold value to determine whether the
LEDs had been found. This threshold was fixed by hand.
→ Crucially tracking more than one robot was not feasible (especially
interacting robots).
Chapter 3

Object segmentation and classification
This chapter describes the methodology used to segment and identify objects within an image, employed by the extension to the tracking system (as
presented in this report).
Ideally the new system would be sufficiently simple to run on the existing
hardware (with a relatively slow processor). A more sophisticated method of
segmenting and classifying the objects in an image was required, so that the
robots could be identified, preventing confusion when making classifications.
A system of using ‘object-indicators’ to mark the identity of each object was
introduced. Rather than tracking LEDs the system would track (and identify) objects by an indicator placed on top of the robot. These indicators
were easy to see by a vision process, compared to the less well defined robots.
The indicator method allowed a user to construct their own indicator from
any material, without relying on many software or hardware constraints (unlike the purpose built LED boards required for the existing system). The
indicators would be recognised by a set of feature parameters (which would
vary for each indicator). This would allow the system to recognise each object in each frame, thus avoiding the principal problem of the single Khepera
tracking system (not being able to recognise robots in successive frames).
The only criteria the user would have to consider when making the indicators
would be to:
• Design indicators large enough to provide sufficient resolution to perform
the relevant vision tasks.
• Construct enough sufficiently differently parameterised indicators so that
each object could be distinguished by the system.
Figure 3.1: Differently parameterised indicators
By making the indicators sufficiently small compared to a robot, and placing
a boundary around each indicator, the indicators could be prevented from
overlapping and adding extra complexity to a classification system.
3.1 Vision techniques
Each indicator needed to be designed to allow it to be separated from the
image background by a simple thresholding process. Simple machine vision
techniques were used to extract thresholded pixels from the image, and to
group connected pixels into regions. These regions were processed further
to extract more detailed information (the size, position, and orientation),
allowing the ‘un-interesting’ regions to be ignored so that only the regions
which possibly matched important objects were studied.
The majority of techniques demonstrated in this chapter are basic vision
techniques, and are discussed in many sources [3], [9], and [16].
Machine vision algorithms generally use the following conventions. The
image is described as a matrix with row and column indices i and j respectively. The pixel in row i and column j is described by the element [i, j] in
the image matrix. This matrix is of size m x n corresponding to the height
and width of the image. By convention the pixel [0, 0] is visualised as being
at the top left of the image. Therefore the index i points ‘downwards’ and
index j points to the ‘right’. This is illustrated in figure 3.2.
Figure 3.2: Pixel labelling conventions
3.1.1 Moments
Information about an object represented by a group of pixels can be extracted
from the intensity and position information of each pixel. This allows a
relatively detailed knowledge of the object's position, size, and orientation to
be found [3].
Objects are represented by ‘on’ pixels in a binary pixel set describing the
pixels in an image after a segmentation process has been applied to the gray
level image (see section 1.3.1, on page 5).
Gray level moments
Using the gray-level intensity information about a pixel provides more accurate moment calculations [9]. The intensity information is used as an
indicator of the ‘mass’ (or some weighting factor) of the pixel. This provides a more accurate calculation as the more important pixels (with intensities far away from the threshold boundaries) are given a higher mass weighting to allow for errors in the segmentation process.
Pixels that are mistakenly classed as part of an object (or missed from the
object) are generally close to the threshold boundaries and are given less
weight during calculations.
Zeroth order moments - Area
The area of an object contained in a binary pixel set B[i, j], is simply the
number of ‘on’ pixels contained in the set.
A = \sum_{i=1}^{n} \sum_{j=1}^{m} B[i, j]
First order moments - Position
The mid-position of an object can be found by using the first order moments.
If we consider the intensity of a point as being an indicator of the ‘mass’ of
this point, this can be used to calculate the center of mass coordinates (x̄, ȳ)
of the object represented in the binary set (equation 3.1).
\bar{x} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} i \, B[i, j]}{A}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} j \, B[i, j]}{A} \qquad (3.1)
The calculation requires the position and intensity value of each pixel making
up the object to be examined. The result is the weighted mean of all the
pixels forming the object.
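A sketch of these two moment calculations on a binary image follows (intensity weighting omitted, as in the equations above; names and buffer layout are illustrative, not the thesis code):

    typedef struct { long area; double ibar, jbar; } Moments01;

    /* Area (zeroth order) and centre of mass (first order) of the 'on'
     * pixels of a row-major binary image, using the conventions of
     * figure 3.2 (i = row, j = column). */
    Moments01 binary_moments(const unsigned char *B, int height, int width)
    {
        Moments01 m = { 0, 0.0, 0.0 };
        long sum_i = 0, sum_j = 0;

        for (int i = 0; i < height; i++)
            for (int j = 0; j < width; j++)
                if (B[i * width + j]) {
                    m.area += 1;
                    sum_i  += i;
                    sum_j  += j;
                }

        if (m.area > 0) {
            m.ibar = (double)sum_i / m.area;   /* x-bar in equation 3.1 */
            m.jbar = (double)sum_j / m.area;   /* y-bar in equation 3.1 */
        }
        return m;
    }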
Second order moments - Orientation
To be able to define a unique orientation, the object must be elongated along
one axis [9]. The orientation of this axis defines the orientation of the object.
The axis of least second moment, which is equivalent to the axis of least
inertia (in 2D), is used to find this axis of elongation. The orientation is the
angle θ, of this axis from the x axis.
Figure 3.3: The orientation of an object
Parameters which solve the least squares fit of the axis through the object
pixels are the second order moments shown in equations 3.2. A proof of this
is shown in appendix A.4.
a = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_{ij} - \bar{x})^2 B[i, j]

b = 2 \sum_{i=1}^{n} \sum_{j=1}^{m} (x_{ij} - \bar{x})(y_{ij} - \bar{y}) B[i, j] \qquad (3.2)

c = \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \bar{y})^2 B[i, j]
These coefficients can be used to find the orientation, θ, of the axis of elongation from the x axis as shown in equation 3.3.
\tan 2\theta = \frac{b}{a - c} \qquad (3.3)
Notice, however, that an algorithm must already know the centre of mass
position of the object (x̄, ȳ) before starting to calculate these coefficients.
The position of each pixel has to be subtracted from the mean position, so
each object has to be examined twice, first to enable the extraction of the
mean position of the object and then to calculate the orientation coefficients.
A more efficient algorithm is presented in section 3.2.2 on page 34.
Notice that for clarity the intensity weighting factor of each pixel is not
included in the coefficient calculations in equations 3.2.
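Putting equations 3.2 and 3.3 together, a two-pass sketch of the orientation calculation might look like the following (again without the intensity weights, and treating i as x and j as y to match equation 3.1; atan2 is used so that the sign of b resolves the quadrant that a plain arctangent of b/(a − c) would leave ambiguous). This is illustrative only, not the thesis code:

    #include <math.h>

    /* Orientation of the axis of elongation of the 'on' pixels, given
     * the centroid (ibar, jbar) already computed in a first pass. */
    double region_orientation(const unsigned char *B, int height, int width,
                              double ibar, double jbar)
    {
        double a = 0.0, b = 0.0, c = 0.0;

        for (int i = 0; i < height; i++)
            for (int j = 0; j < width; j++)
                if (B[i * width + j]) {
                    double dx = i - ibar;      /* x_ij - x-bar */
                    double dy = j - jbar;      /* y_ij - y-bar */
                    a += dx * dx;
                    b += 2.0 * dx * dy;
                    c += dy * dy;
                }

        return 0.5 * atan2(b, a - c);   /* theta from equation 3.3 */
    }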
Compactness
The compactness of a continuous geometric figure is measured by the isoperimetric inequality:
\frac{P^2}{A} \geq 4\pi
where P and A are the perimeter and area respectively, of the figure.
This enables geometric figures to be classified regardless of their size [9]. The
shape with the smallest compactness (4π) is a circle. The compactness is
an invariant parameter so its value for each indicator should stay constant
wherever the indicator is in the arena (because of lens curvature objects
further away from the center of the image are smaller). This parameter
should have allowed carefully designed indicators to be identified by their
shape.
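As a quick worked check of the 4\pi lower bound (an added example, not from the thesis):

\frac{P^2}{A}\bigg|_{\text{circle}} = \frac{(2\pi r)^2}{\pi r^2} = 4\pi \approx 12.57, \qquad \frac{P^2}{A}\bigg|_{\text{square}} = \frac{(4s)^2}{s^2} = 16,

so a circular indicator and a square indicator of any size could, in principle, be told apart by this single number.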
3.1.2 Grouping pixels to regions - Connected Components
When an image has been segmented and represented by a binary image,
pixels that obviously make up a region can be grouped together using the
‘connected component’ algorithm [3].
In the common image representation, a pixel has a common boundary with
four other pixels (its 4-neighbours), and shares a corner with four additional
pixels (its 8-neighbours).
Figure 3.4: Pixel neighbours
A path from a pixel at [i0, j0] to [in, jn] is a sequence of pixel indices [ik, jk] such that the pixel at [ik, jk] is a neighbour of the pixel at [ik+1, jk+1] for all k with 0 ≤ k ≤ n − 1.
The set of all ‘on’ pixels in a binary image, S, is the foreground of the image.
A pixel, p ∈ S is connected to q ∈ S if there is a path from p to q consisting of
pixels of S. A set of pixels in which each pixel is connected to all other pixels
is a connected component [9]. This defines regions within the segmented
image.
Component labelling
The standard component labelling method finds all of the connected components in an image and assigns a unique label to each component. All of
the pixels forming a component are assigned with that label. This technique
forms a label map which describes the image in terms of the object that
each pixel represents (pixels that are not part of an object are defined as the
background).
Figure 3.5: Describing a thresholded image with regions
The sequential form of the algorithm requires two passes of the image, and
is described more fully in appendix A.1.
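As an illustration of this two-pass approach (the full algorithm is given in appendix A.1), the Python sketch below labels the 4-connected foreground of a small binary image. It is a textbook reconstruction of the technique under that description, not the project's implementation.

    def label_components(image):
        """Two-pass 4-connected component labelling of a binary image.

        Pass 1 assigns provisional labels and records label equivalences;
        pass 2 rewrites each pixel with the representative of its label,
        producing the label map described in the text.
        """
        rows, cols = len(image), len(image[0])
        labels = [[0] * cols for _ in range(rows)]
        parent = {}                          # equivalences between provisional labels

        def find(l):
            while parent[l] != l:
                l = parent[l]
            return l

        next_label = 1
        for i in range(rows):                # first pass
            for j in range(cols):
                if not image[i][j]:
                    continue
                up = labels[i - 1][j] if i > 0 else 0
                left = labels[i][j - 1] if j > 0 else 0
                neighbours = [l for l in (up, left) if l]
                if not neighbours:
                    parent[next_label] = next_label
                    labels[i][j] = next_label
                    next_label += 1
                else:
                    smallest = min(find(l) for l in neighbours)
                    labels[i][j] = smallest
                    for l in neighbours:     # record equivalences
                        parent[find(l)] = smallest
        for i in range(rows):                # second pass
            for j in range(cols):
                if labels[i][j]:
                    labels[i][j] = find(labels[i][j])
        return labels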
Connected component summary
• The major disadvantage of this approach is that ALL of the pixels inside
the processed window need to be scanned twice. A slight disadvantage is
that a second copy of the image is produced when constructing the label
map (doubling the memory requirements).
• However this label map allows the regions to be examined quickly by further algorithms. The pixel moments of each region can be calculated by
using the knowledge of which pixels belong to which region (as contained
in the label map).
• The project used a more efficient component labelling strategy which only
required one complete scan of the pixels — although a label map was not
constructed. This is presented in section 3.2.1.
3.1.3 Boundary tracking
A simple algorithm (described in appendix A.2) allowed the boundary of a
connected component to be found. This algorithm required a starting index
on the edge of an object to be provided, and traced around the boundary of
the object.
3.2 Efficiency of the vision algorithms
The key to real-time image processing is to process each image quickly
enough to keep up with the frame rate. Because of the relatively slow
processor in the Khepera arena the system needed fast, simple algorithms.
The traditional vision algorithms presented above are slow — especially the
connected component algorithm, which scans the whole image window twice.
These algorithms can be sped up so that the bulk of each image is only
processed once; however, this results in the loss of some information, so
careful use is required.
3.2.1 Efficient connected component algorithm
Rather than constructing a map of the pixel labels during the connected
component algorithm, only the index of the top left pixel and the area of
the region were stored by the efficient algorithm. This reduced the two-pass
algorithm to one pass. If the region looked interesting2 then the actual
information about the region could be extracted by re-applying the algorithm
to this much smaller region. In this way, rather than sweeping the whole
640x480 image a second time (as in the original algorithm) only the important
looking regions were scanned twice (a much smaller section of the image).
A problem with this method was that a label map of the regions was not
produced. All of the moment calculations rely on summations of intensity
2 If the area of the region matched a particular range.
Figure 3.6: Object representation after one pass of the connected component algorithm
and pixel positions, so these pixel summations had to be conducted during
the second sweep. This presented a problem for the orientation calculation.
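A Python sketch of the idea (illustrative only, not the project's code): a single sweep records, for each provisional region, only its area and the index of its first pixel, merging the records whenever two provisional regions meet; interesting regions can then be re-scanned individually. In practice only the previous row of labels needs to be held, rather than the full grid used here for clarity.

    def find_candidate_regions(image, min_area, max_area):
        """Single sweep of a binary image recording, per region, only its area
        and the index of its first (top-left in raster order) pixel.

        No label map is returned; regions whose area falls inside the given
        range are reported so that only those small areas need a second,
        detailed scan.
        """
        rows, cols = len(image), len(image[0])
        labels = [[0] * cols for _ in range(rows)]
        parent, area, first = {}, {}, {}
        next_label = 1

        def find(l):
            while parent[l] != l:
                l = parent[l]
            return l

        for i in range(rows):
            for j in range(cols):
                if not image[i][j]:
                    continue
                up = find(labels[i - 1][j]) if i > 0 and labels[i - 1][j] else 0
                left = find(labels[i][j - 1]) if j > 0 and labels[i][j - 1] else 0
                if not up and not left:
                    l = next_label
                    parent[l], area[l], first[l] = l, 0, (i, j)
                    next_label += 1
                else:
                    l = min(x for x in (up, left) if x)
                    if up and left and up != left:      # two provisional regions meet
                        other = max(up, left)
                        parent[other] = l
                        area[l] += area.pop(other)
                        first[l] = min(first[l], first.pop(other))
                labels[i][j] = l
                area[l] += 1
        return [(first[l], area[l]) for l in area if min_area <= area[l] <= max_area]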
3.2.2 One pass orientation calculation
As shown in section 3.1.1 (on page 29) the calculation of the orientation of
an object required the first-order center of mass positions to be known, while
completing the summation calculations of each pixel in the object. This
would require these regions to be processed a third time. A more efficient
calculation of the second order moments can be shown (see appendix A.4.1)
to be:
a = \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} x_{ij}^{2} \;-\; 2\bar{x} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} x_{ij} \;+\; \bar{x}^{2} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij}

b = 2 \Big( \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} x_{ij} y_{ij} \;-\; \bar{x} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} y_{ij} \;-\; \bar{y} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} x_{ij} \;+\; \bar{x}\bar{y} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} \Big)        (3.4)

c = \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} y_{ij}^{2} \;-\; 2\bar{y} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} y_{ij} \;+\; \bar{y}^{2} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij}
This change allowed the pixel values to be summed during the sweeping
phase without having to know the first-order position moments during the
calculations. Only the positions xij, yij and gray level intensity Fij of the
current pixel were required during these calculations. The first-order values
(x̄, ȳ) were factored in after each pixel had been examined.
The orientation of the axis can be found with these coefficients as shown in
equation 3.3 (page 30).
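A Python sketch of the same idea (illustrative only): the raw intensity-weighted sums are accumulated during the single sweep, and x̄ and ȳ are folded in afterwards exactly as in equations 3.4.

    import math

    def one_pass_orientation(points):
        """Accumulate raw weighted sums in a single sweep, then fold in the
        center of mass afterwards to obtain a, b, c and the orientation.

        `points` yields (x, y, F) tuples: pixel position and gray level intensity.
        """
        s = s_x = s_y = s_xx = s_yy = s_xy = 0.0
        for x, y, F in points:               # single sweep: no means needed yet
            s += F
            s_x += F * x
            s_y += F * y
            s_xx += F * x * x
            s_yy += F * y * y
            s_xy += F * x * y

        x_bar, y_bar = s_x / s, s_y / s      # first-order moments, applied afterwards
        a = s_xx - 2 * x_bar * s_x + x_bar ** 2 * s
        b = 2 * (s_xy - x_bar * s_y - y_bar * s_x + x_bar * y_bar * s)
        c = s_yy - 2 * y_bar * s_y + y_bar ** 2 * s
        return 0.5 * math.atan2(b, a - c)    # equation 3.3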
Region finding summary
• Objects (indicators) within a specific threshold could be extracted from
the image.
• The pixels forming each object could be grouped together.
• The size, position, perimeter and orientation of each indicator could be
extracted from these pixel groups.
• Efficient algorithms were created to extract the relevant features from the
image, by only scanning the whole image once, rather than the traditional
method of scanning the whole image twice.
• Indicators had to be large and clear enough to see and contain an obvious
axis of elongation to allow the orientation of the object to be extracted.
• The orientation of the object only gave the orientation of the axis of elongation. There was no indication as to the heading of the robot along this
line. (This is rectified in section 4.2.1, on page 60).
• Indicators could be classified by their relative size (although this would
change slightly depending on the position), compactness, or some other
invariant feature parameter.
An example of some regions found in an image, with their centers, orientations and perimeters marked out, is shown in figure 3.7.
• Three regions have passed the intensity and area thresholds: the indicator,
a lamp (at the bottom of the image) and a section of wall (on the left of
the image).
Figure 3.7: Extracted moment information from an image
• A black mark shows the detected center of mass position of each region
with the orientation displayed above this marking.
3.3 Region finding experiments
It was hoped that objects could be sufficiently identified by the simple feature parameters shown (size, area, perimeter and compactness) to enable
simple indicators to be built. Experiments were performed to evaluate such
a classification method.
3.3.1 Indicator classification experiments
Tests were performed comparing the parameter information obtained from a
large A4 indicator positioned in different parts of the arena. The tests used
the same threshold and similar environmental conditions (the arena in the
same state, similar lighting conditions with a small time between tests).
However in this complex environment the parameters extracted from the
object varied by a large amount. The lighting could not be controlled
sufficiently, because of outside lighting effects. The lighting conditions also
altered in different parts of the arena, resulting in the fixed threshold selecting
radically different pixels depending on where the indicator was. Even a
static indicator, processed several times, produced significant changes in the
extracted parameters (due to slight changes in the overall lighting conditions).
Table 3.1 shows examples of the parameters extracted from an A4 sized
indicator located in different parts of the Khepera arena. The threshold used
to segment the indicator from the image was the same in each position (130).

Figure 3.8: An A4 sized indicator in the Khepera arena
Parameter     Position 1   Position 2   Position 3   Error (%)
Area          1300         1154         857          51
Perimeter     143          137          144          5.1
Compactness   15.7         16.3         24.2         54.1

Table 3.1: Moments derived from an A4 indicator in the Khepera arena
As the indicator size was reduced, a larger fraction of its total area was
misclassified, giving even poorer results. Even in the more easily controlled
test arena, where the lighting could be regulated, the results were poor (as
shown in table 3.2). The parameters extracted from static indicators fluctuated
less, as the lighting conditions could be kept moderately constant.
Parameter     Position 1   Position 2   Position 3   Error (%)
Area          2911         1754         2099         66
Perimeter     215          208          207          3.9
Compactness   15.88        24.95        20.41        57.1

Table 3.2: Moment invariance of an indicator in the test arena
Clearly the area and compactness values changed by too large a margin
to allow for the successful classification of different objects. To solve this
problem either another method of classification, or a method for selecting a
threshold depending on position was required.
Indicator classification summary
• In the complex vision environment of the Khepera arena, simple feature
parameters such as the indicator size proved to vary too much to allow
object identification to be made.
• This was due to:
→ Fluctuating lighting conditions.
→ Non-uniform lighting and background conditions over different parts of
the arena.
→ Poor resolution of the indicators.
3.4 Extending the classification system
To allow for successful identification of each object, a stronger method of
identification was introduced. A number of black markers were placed on
the indicators. These markers would stand out from the indicators, such
that they would not be part of the indicator after segmentation, effectively
appearing as ‘holes’3 in the indicator. Each object was identified by the
number of holes on the indicator.
Figure 3.9: The indicator marking system
This invariant parameter depended only on:
• A secondary thresholding process being able to locate all of the holes on
an indicator.
• The process not mistaking dirt or shadows on the indicator for extra holes.
Again this allowed very simple, user-defined indicators to be made: one
basic design could be used, with each indicator identified by a particular
number of holes, rather than differently shaped and sized indicators being
used to classify each object. The only extra constraint
introduced was for the user to make the holes large enough to be visible and
far enough apart to distinguish. Each indicator had to be large enough to
allow sufficient spacing between the holes and the edge of the indicator (to
prevent the merging of holes).
Locating and counting holes required extra computation, but as before the
bulk of the computational time came from scanning the entire image; the
areas actually scanned for holes would be much smaller than the overall
image size.
3 Throughout this report the markings are referred to as holes (which is how the vision algorithms see them). It should be understood that rather than being physical holes in the indicators they are black marks.
A sketch of the algorithm used for locating holes can be seen in appendix
A.3.
Figure 3.10: Identifying an indicator by holes
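The Python sketch below illustrates the kind of check involved (the project's own hole-locating routine is the one sketched in appendix A.3). Dark pixel groups inside the region's bounding box are only counted as holes if they are completely surrounded by indicator pixels, so jagged edges and background are not mistaken for holes.

    def count_holes(grey, region_pixels, hole_threshold):
        """Count 'holes' (dark markings) enclosed by an indicator region.

        `grey` maps (i, j) to intensity and `region_pixels` is the set of
        pixels already assigned to the indicator.  Dark pixels inside the
        region's bounding box are grouped with a flood fill; a group counts
        as a hole only if every pixel bordering it belongs to the indicator.
        """
        rows = [i for i, _ in region_pixels]
        cols = [j for _, j in region_pixels]
        box = [(i, j) for i in range(min(rows), max(rows) + 1)
                      for j in range(min(cols), max(cols) + 1)]
        dark = {p for p in box if p not in region_pixels
                and grey[p] < hole_threshold}

        holes, seen = 0, set()
        for start in dark:
            if start in seen:
                continue
            group, stack, enclosed = set(), [start], True
            while stack:                       # flood fill one dark group
                i, j = stack.pop()
                if (i, j) in group:
                    continue
                group.add((i, j))
                for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if n in dark:
                        stack.append(n)
                    elif n not in region_pixels:
                        enclosed = False       # touches background: not a hole
            seen |= group
            holes += 1 if enclosed else 0
        return holes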
3.4.1 Processing regions
To extract the information about the region, the following steps were applied (a sketch of this pipeline is given after the list):
• The single pass, connected component algorithm was initially applied. The
number of components, their size and the index of the top-left pixel were
stored.
• Objects too small or too large were ignored.
• The perimeter of each object was then scanned. Objects touching the edge
of the image were ignored, and objects too large or too small could also be
ignored.
• The object was then scanned for holes (see appendix A.3). The connected
component algorithm was applied again, searching for pixels inside the
indicator with intensity below a ‘hole’ threshold.
• Care had to be taken not to confuse jagged edges (which could appear to
be holes) with actual holes. The system needed to check that each hole
was completely surrounded by the indicator.
• The summation information found during the hole processing was added
to the overall moment calculations to achieve more accurate positional and
orientational information about the indicator.
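The following Python sketch ties these steps together. It reuses the earlier illustrative sketches (find_candidate_regions, count_holes, one_pass_orientation) and assumes two hypothetical helpers, collect_region_pixels and touches_image_edge, standing in for the region re-scan and the perimeter check; it is an outline of the processing order, not the project's code.

    def process_frame(grey, binary, min_area, max_area, hole_threshold):
        """Outline of the region-processing steps listed above."""
        indicators = []
        for top_left, area in find_candidate_regions(binary, min_area, max_area):
            # Re-scan just this small region to collect its pixel list.
            pixels = collect_region_pixels(binary, top_left)      # hypothetical helper
            if touches_image_edge(pixels, binary):                # hypothetical helper
                continue
            holes = count_holes(grey, set(pixels), hole_threshold)
            theta = one_pass_orientation((i, j, grey[i, j]) for (i, j) in pixels)
            x_bar = sum(i for i, _ in pixels) / len(pixels)       # unweighted for brevity
            y_bar = sum(j for _, j in pixels) / len(pixels)
            indicators.append({"id": holes, "pos": (x_bar, y_bar), "theta": theta})
        return indicators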
Detection summary
• An image was scanned for ‘interesting’ looking areas; these were identified
as specific indicators, which were matched to objects.
• After scanning the following was known about the object: position, orientation, area, perimeter and number of holes it contained.
• This information (particularly the number of holes) could be used to identify the object.
3.5 Detection tests and indicator design
Some sample indicators (using the hole marking scheme) were created to
demonstrate the detection process. When making the indicators the end
user needed to consider:
• The definition of the indicator, such that the system had the ability to
extract indicators from the image by thresholding. (eg making large, bright
indicators).
• The definition of each hole — Allowing the system to ‘see’ all of the holes
clearly and not mistake other markings (dirt) for holes.
• The main consideration was the size of the holes, making them large enough
to see, but not so large that huge indicators would be required.
• The separation of the holes was important, preventing holes being merged
with the background, or with other holes.
If the holes were positioned too closely together, or too close to the boundary
they could blend, making the section of indicator separating them darker than
the threshold. Either two holes would merge, or the hole would be classed as
part of the background. Figure 3.11 shows a pixel-level section of a captured
image containing an indicator with three holes.
Figure 3.11: An example of the system being unable to find a hole in an
indicator
• The white pixels show areas above a threshold (the indicator).
• The black pixels show areas below another threshold (the holes).
• The gray pixels are ‘uninteresting’ pixels.
• It can be seen that the bottom left hole is located too near to the boundary.
In this particular lighting condition the pixels forming the strip of indicator
between the hole and the boundary have become darker (because of pixel
blending), so those pixels are not grouped as part of the indicator.
• This results in the hole being classed as part of the background, so only
two holes are found in the indicator.
3.5.1 Design of the indicators
By experimentation and observation, reasonable sizes for the holes and the
spacing between them (and from the indicator edge) were found. These sizes for
the two arenas are shown in table 3.3. The size of the indicator depended on
the number of holes placed on it.
System             Hole size (mm)   Hole separation (mm)
Khepera arena      20               15
Evaluation arena   5                5

Table 3.3: Design of the object indicators in the two arenas
The material chosen to make the indicators was basic printer paper which
reflected a large proportion of light. Slightly more ‘shiny’ paper could have
been used to make the indicators easier to segment from the image. As the
Khepera arena was not uniformly lit from above, reflective paper could not be
used (it would reflect a large proportion of the light to the camera in some
positions, and none at all in others).
3.5.2 Single frame classification tests
Figure 3.12 shows a cluttered scene from the evaluation arena. Seven indicators are spread over the image, as well as other bright objects (an unmarked
indicator, reflective paper, and the base of a lamp).
The second image shows the output after applying the segmentation and
classification process. An intensity threshold of 120 has been used to segment
the indicators from the image, and a threshold of 30 is used to find the holes.
All of the marked indicators have been found by the detection process with
relatively good accuracy in both the position and orientation of the objects.
Khepera detection
Figure 3.13 shows an indicator on a Khepera robot placed in a clear
section of the Khepera arena which has been successively segmented and
classified. Using an indicator of this size made it impossible to place more
than one (visible) hole-marker onto the indicator (at this resolution).
Figure 3.12: Scene containing 7 object indicators before and after classification
To segment indicators in this clear area of the Khepera arena a gray level
threshold of 130 is used to find the indicators. Within the indicators, holes
are marked by pixels with intensity less than 110.
Object occlusion in the Khepera arena
If the indicator is partially or totally obscured then it is unlikely to be found
correctly by the system. An example is shown in figure 3.14. Two indicators
are in the image, but the second is partially obscured by the overhead crane,
hiding one of the holes. This ‘second’ indicator has been mis-classified as the
‘first’ indicator.
Segmenting the image in different parts of the Khepera arena
The varying lighting and background conditions in different parts of the
Khepera arena required different thresholds to segment objects from the
background. Kheperas grouped in the same (or similar) parts of the arena
could be segmented by using the same fixed threshold. Figure 3.15 shows
three indicators spaced over the arena.
Figure 3.13: Detecting a Khepera sized indicator
The leftmost indicator (in a clear part of the arena) can be segmented with
a threshold of 130 (and hole threshold 110), as shown in the left image. With
this threshold the other indicators cannot be segmented well enough to be
identified by the classification process. The right image shows the system
detecting these indicators after segmenting with a threshold of 125 (and hole
threshold of 70). With these settings the first indicator is lost.
Orientation of the indicators
It can be seen by observation that the orientations of the indicators in
figures 3.12, 3.13, 3.14 and 3.15 have been found fairly accurately. Any error
comes from mis-segmentation of the image where more of one side of the
indicator is segmented than the other, resulting in the indicator appearing
to tilt more than it actually does.
Indicator identification
Figure 3.16 shows the detection process applied to the
cluttered image of figure 3.12, but searching only for the first
Figure 3.14: Example of a partially obscured indicator being incorrectly
classified
two indicators. It can be seen that these objects (marked with one and two
hole markers respectively) have been selected correctly.
3.5.3 Execution time
The bulk of execution time when processing an image comes from the initial
scan of every pixel within the image window. The more complex per-pixel
calculations for the moments and the hole finding operations are performed
for relatively few pixels. The size of the window and the number of pixels
above the threshold affect the runtime.
The fairly complicated cluttered scene shown in figure 3.12 (page 44) was
captured and the detection process was applied and timed. The processing
times are shown in table 3.4. This time was obtained by processing the entire
scene 100 times and averaging. The times for both the 500MHz evaluation
system and 75MHz Khepera system are shown. These times are also compared with the time taken for the simpler (non-hole finding) method shown
in section 3.3 (page 36).
Figure 3.15: Indicators spread over the Khepera arena
System             Method         Time (s)
Khepera arena      Basic          0.165
Khepera arena      Hole finding   0.249
Evaluation arena   Basic          0.014
Evaluation arena   Hole finding   0.023

Table 3.4: Execution time of the detection process
• The scene shown in figure 3.12 is fairly complex. It is unlikely that 7
objects would be tracked, and the objects cover a large proportion of the
image. Usually two or three robots covering a small area would be followed.
• As expected the test arena (500MHz) processed each frame ∼10 times
faster than the Khepera arena system (∼75MHz).
• The test arena system received a frame every 0.06s. It took 0.023s to
process the scene as shown, much less than the frame time (almost at the
speed of the Meteor hardware in the Khepera system).
• The Khepera arena hardware received a frame every 0.02s, whereas it took
0.249s to process the scene as shown. This was obviously not quick enough
for real time applications.
Figure 3.16: Testing the classification of indicators
• The execution time depended on the number of ‘interesting’ pixels in the
image.
• A scene must be processed quickly enough so that the tracked robot does
not move too far before the next frame. This is dependent on the type
of robot used. A Khepera can move at a maximum of 2cm in the 0.02s
between each frame. Processing the scene shown would miss ∼12 frames,
in which time a Khepera could move a maximum of ∼24cm!
• Even the simple feature detection method (not looking for holes) would
miss ∼8 frames, allowing the Khepera to move 16cm.
• As 12 frames are missed when processing the image on the Khepera system,
care must be taken to prevent the Meteor card from over-writing the frame
being processed (see section 2.1.2 on page 17). It would need to be copied
to another memory location (slowing the system further).
• These times are to scan the entire image. The detection process could
be applied to a small sub-window of the image (explained more fully in
section 4.1 on page 52).
• It is ironic that the testing system is limited to receiving 15 frames a
second, yet can process frames fast enough to keep up with the Matrox hardware. A trivial solution would be to upgrade the CPU used in the Khepera
system so that each frame could be processed at a faster rate. During the scope of this
project this was not practical, as the Khepera system was in heavy use
and was relatively complicated to set up.
3.6 Detection and identification summary
• The indicator system allowed simple, unique, user defined indicators to be
built allowing objects to be identified and information about their position
and orientation to be extracted.
• When tracking, objects could be lost and the system would find them again
without the user needing to re-identify them.
• The user need never identify the objects. The system simply matched
robots and indicators by the number of holes on the indicator.
• The detection system operated with a set of fast, efficient algorithms.
• Objects could be lost by being partially obscured (for example by the
overhead crane).
• The system required an accurate threshold to segment the indicators from
the image.
• One complete (complex) frame could be processed by the test arena
processor quickly enough for real-time processing of the incoming
frames.
• The 75MHz processor used in the Khepera arena was not quick enough to
apply this process for each frame to achieve any kind of real time processing. Rather than replace the machine (which was in heavy use) a method
of reducing the processing time was required.
Chapter 4
Object Tracking
As demonstrated in the previous chapter the tracking system introduced in
this report follows a number of object-indicators across successive images.
This identification process provides a simple solution to the traditional tracking problem, in which objects can be mis-classified in successive frames. Each indicator is identified within each frame, which allows the segmentation and
classification procedure to operate on a sequence of frames to provide the
tracking information1.
Figure 4.1: Tracking by applying the detection system in successive frames
This chapter of the report documents the tracking element of the presented
system, following the object indicators in successive image frames.
• Rather than implementing a computationally expensive probabilistic modelling method, the system relied on the basic identification procedure coupled with simple positional prediction based on the object's trajectory.
1 This is similar to differencing methods — see section 1.3.3 on page 9.
• The indicator was searched for in a particular location (based on its previous location).
• If the indicator was not found, or was occluded, it was ‘lost’ by the system,
until it became visible again.
• Providing objects were only lost for a small enough time, this provided a
very simple solution to the tracking problem.
• Because the system relied on user-set, constant thresholds remaining reliable
over the whole tracking area for the duration of a run, it could occasionally mis-classify objects for short periods. This occurred when
the identification features of an indicator were obscured, or when shadows
across the indicator appeared as hole markings.
4.1 Method of operation
The system borrowed heavily from the tracking method applied in the single
Khepera tracking system [12] by placing a ‘window’ around the predicted
robot location and applying the detection process (documented in section
3.4 on page 38) in that window. If the indicator was not found then the
system would revert to a scanning phase looking for that object in the whole
frame.
For this early design, each object had its own unique ID number, corresponding to the number of markings on the indicator. Rather than the system detecting all
of the objects by itself, the user had to specify the number of objects to the
system. In future the identification code could be updated such that:
• Robots might be identified by their starting location.
• The system might always scan for new robots being introduced to the
system.
→ Without knowing the number of robots the system might miss robots
when making an initial count, whereas by specifying the number and
searching this is prevented.
4.1.1 Tracking and windowing
The detection algorithm (as detailed in the previous chapter) could be applied
to a small window within the image. The main bottleneck in the execution
time during the processing of each frame came from examining each pixel
of the image in the first pass of the detection process. By only examining
small windows in the image, processing could be performed much more quickly. If
a prediction could be made about the location of an object then only a small
section of the image needed to be searched when looking for that object.
This tracking phase was applied to each known object in each frame.
The image shown in figure 4.2 shows the tracking process applied to one
image. The previous locations of the indicators in the scene were known,
and the system searched for the new positions in the windows shown.
Figure 4.2: Searching for indicators using windows
Rather than using the position, speed and orientation information to make an
accurate prediction of the new position, the initial system assumed the object
was in a small window centred on its previous location. The detection and
identification algorithm was applied to each window for each known object
to extract the new position information and to check the identification of
the object. Checking the identification reduced the chances of the system
mis-recognising an object (the most serious error).
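A Python sketch of this windowed tracking step (illustrative only; `crop` is a hypothetical helper returning the sub-images and the window's offset into the full frame, and process_frame is the earlier detection sketch):

    def track_object(grey, binary, obj, window=70, **detect_args):
        """Windowed tracking step for one known object.

        A window (default 70x70 pixels) is centred on the object's previous
        position and the detection process is applied only inside it; if the
        object's hole-count identity is not found there, the object is
        flagged as lost so the scanning phase can look for it later.
        """
        x, y = obj["pos"]
        half = window // 2
        sub_grey, sub_binary, offset = crop(grey, binary,
                                            x - half, y - half, window)  # hypothetical helper
        for candidate in process_frame(sub_grey, sub_binary, **detect_args):
            if candidate["id"] == obj["id"]:          # identity re-checked every frame
                obj["pos"] = (candidate["pos"][0] + offset[0],
                              candidate["pos"][1] + offset[1])
                obj["theta"] = candidate["theta"]
                obj["found"] = True
                return obj
        obj["found"] = False                          # revert to the scanning phase
        return obj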
The window size could be changed depending on:
• The size of the indicator and its resolution in the image.
• The distance that the robot could move in the space of one frame.
→ In the test arena a window of 70x70 pixels was used, allowing for fairly
fast robots and the relatively slow frame rate. This corresponded to a
distance of ∼13.5 cm across the window, allowing for robot speeds of
less than 1 ms−1 .
→ The single Khepera tracking system used a window of 8x8 pixels, the
approximate size of the Khepera, which could move approximately 4
pixels (2cm) in one time-slice.
• This parameter could be easily changed by the user depending on the
application.
4.1.2 Scanning and detecting
The scanning phase was applied when the system did not know the location of
one or more of the objects. The system also started up in this mode (as the
positions of the indicators were not yet known). A scanning window would
sweep ‘down’ the picture moving a specific2 number of pixels in each frame.
Only the pixels in this window would be examined for objects. If a missing
object was found and identified in that window then that object could be
flagged as ‘found’. An example is shown in figure 4.3. A band scanning
the entire width of the image but only 70 pixels in height is shown in two
positions in the image. In the second figure the scanning window has found
the identifier with two hole markers.
2 User defined—as required by the application.
Figure 4.3: System debugger: The scanning band sweeping the image
By only scanning this small band the system had time to process the tracking
windows for objects that had been found in the previous image frame.
The testing system running on the 500MHz processor could process the entire
image much faster than the rate that frames could be transferred over the
USB connection (as shown in section 3.5.3 on page 46). An option allowed
the user the choice of using the movable scanning window or scanning the
entire image, depending on the system it was running on. Scanning the whole
image would find a lost object in one frame (if it was visible) rather than
waiting until the scanning band covered that object. This was obviously
desirable and practical on a fast system.
4.1.3 Indicator identification
Due to poor image segmentation, some objects could be mis-classified. Checking the identity of each object in each frame limited these cases,
resulting in more lost objects rather than mis-labelled objects. The tracking
code could be easily modified to allow multiple objects of the same type to
be tracked. This might be applied to static objects (for example to mark
danger, or refuelling areas).
4.1.4 System output
• The position of each robot could be dumped to a file describing the position
of the robots over time.
• This information was also shared with a second process which could send
the data to other processes running over the network. This allowed complex robot controller software to run on external machines without slowing
down the tracking machine, or allowed the information to be processed or
stored easily by external processes.
• The system could also dump each frame processed to disk as a ppm image.
Information on the objects and regions found in an image could be overlayed onto the image to provide an easy method of debugging the system.
• A further option allowed the final positions of the robots to be overlayed at
the end of the tracking process to allow the user to verify that the system
was performing sensibly (without having to store the image of each frame).
Position storage
To simplify the scanning and identifying process, the user informed the system
of the number of objects it was required to track. The system would build a
list of object descriptor elements describing each object. Each object element
contained the following information about the object: its id number, a flag
showing whether the object was found in the previous frame, and its position,
orientation and speed. A good approximation of the current speed of the
object could be obtained by examining the object’s displacement over the
preceding frames.
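A sketch of such an object descriptor in Python (the field names are illustrative; the project's own record layout is not reproduced here):

    from dataclasses import dataclass, field

    @dataclass
    class ObjectDescriptor:
        """Per-object record: id, found flag, position, orientation and speed."""
        obj_id: int                    # number of hole markings on the indicator
        found: bool = False            # was the object located in the previous frame?
        x: float = 0.0
        y: float = 0.0
        heading: float = 0.0
        speed: float = 0.0             # estimated from displacement over recent frames
        history: list = field(default_factory=list)   # recent (x, y) positions

    def estimate_speed(desc, frame_time):
        """Approximate speed from displacement over the stored recent positions."""
        if len(desc.history) < 2:
            return 0.0
        (x0, y0), (x1, y1) = desc.history[0], desc.history[-1]
        distance = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        return distance / (frame_time * (len(desc.history) - 1))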
Position communication
This object information was held in shared memory which allowed the tracking process and a separate communication process to both access it. When
the communication process received a request from another machine, the
current position information was sent back.
Figure 4.4: Sharing object positions with other processes
This method allowed a very simple communication process to be built, and
was reliable enough to allow communication between two machines on the
same network (eg. a separate tracker and robot controlling machine)3 . This
avoided the synchronisation problems involved with storing positional information in a file4 .
Figure 4.5: Listening for object positions
4.1.5 System debugger
The system proved troublesome to debug. Unusual conditions could lead
to memory access errors, the rarity of these conditions making traditional
3 This communication aspect is of no interest in the scope of object tracking — it used common Unix facilities. Interested parties should refer to the code used in the project.
4 Trying to write information to a file while another process tries to access that file.
debugging techniques difficult. To aid debugging a simple GUI interface was
created using the QT toolkit [18] which allowed the current output from
the camera to be overlayed with information from the tracking system. The
tracker itself could be stepped through allowing a simple visual aid to the
debugging.
The QT toolkit allows nice-looking user interfaces to be created easily. Because of this the ‘debugger’ also offered an easy to use introduction to the
system. The debugging application required a large amount of system resources
(this was especially a problem with the Khepera hardware) so it was not designed as
the ‘real’ interface to the system.
Figure 4.6: View of the system debugger
However, the tracking code was modular and could be easily ‘wrapped’ by any interface, allowing the main command line interface
and GUI debugger to share the same ‘tracking’ code. This also allowed the
interface to be modified in future.
The debugger provided:
• An easy to use interface.
• Allowed a frame to be examined at pixel-level, thresholded and saved to
disk.
• The individual pixel values could be examined (and areas could be enlarged) allowing thresholding values to be easily selected.
• The tracking information could be overlayed onto the original frame.
• The tracking system could be stepped through frame by frame.
4.1.6 Basic performance evaluation
Because of the nature of the vision system, an end user could test the output
from the system by comparing it with what was actually in the arena. The
debugging/interface system displayed each frame for the user to inspect and
overlayed the tracked information onto it. It also allowed the system to
be ‘stepped’, allowing the user to conveniently inspect dumped (text) output
from the system and compare it with the graphical image and the real system.
Displaying each frame was computationally costly (it could not be performed
quickly enough on the 75MHz machine to be able to process frames at a fast enough
rate). The code used to overlay object information onto the real frame allowed the ‘real’ tracking interface to dump frames to disk at intervals. This
allowed the user to check the output at various instances. To prevent any
deterioration to the system speed an option was introduced to allow only the
final positions to be output when the system was stopped.
4.2 Extensions to the basic system
The detection/tracking system as it has been demonstrated (section 3.5.3 on
page 46), worked sufficiently well for simple robot applications. A number
of features that were simplified for the initial detection system could be
improved:
• Derive the indicator heading, not just the orientation of the axis of elongation.
• More effective tracking window placement — using the heading and speed
of the indicator to predict a new position.
• Ability to predict positions of occluded objects.
• Ability to find partially occluded indicators.
• Automatic intensity (and area) thresholding.
Except for calculating the heading of the indicator these extensions were not
implemented. Suggestions for making these improvements are discussed in
section 7.3 (on page 101), and left for future extensions to this project.
4.2.1 Calculating the heading of an indicator
As described in section 3.2.1 (page 33) the orientation of the axis of elongation
of an indicator could be found by using equation 3.3 with the pixel moment
coefficients found in equations 3.4.
The orientation of this axis gives no indication as to which way a robot is
facing along the line. This was rectified simply by grouping
the indicator identification markers at one side of the indicator’s elongated
axis. This side defined the front of the indicator — the heading of the indicator was toward the markings (along the elongated axis). The method used
to calculate the heading from these markings and the calculated orientation
is shown in appendix A.5.
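The idea can be sketched as follows in Python (illustrative; the actual method is the one in appendix A.5): the axis orientation θ is kept if the vector from the indicator's center of mass to the center of mass of its holes points forwards along the axis, and flipped by 180° otherwise.

    import math

    def indicator_heading(theta, centre, hole_centre):
        """Resolve the 180-degree ambiguity of the axis of elongation.

        `theta` is the axis orientation from equation 3.3, `centre` the
        indicator's center of mass and `hole_centre` the center of mass of
        its hole markings; the heading is the sense of the axis pointing
        towards the markings.
        """
        dx = hole_centre[0] - centre[0]
        dy = hole_centre[1] - centre[1]
        axis = (math.cos(theta), math.sin(theta))
        if dx * axis[0] + dy * axis[1] < 0:     # markings lie 'behind': flip
            theta += math.pi
        return theta % (2 * math.pi)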
Figure 4.7: The heading of an indicator
An example of the heading calculation is shown in figure 4.8. Four indicators
are placed on four LEGO robots. It can be seen that the heading of the
indicators has been accurately found. Providing the user knows how the
indicator is positioned on the robot, this heading can be transformed into
that of the robot.
Figure 4.8: Detecting the headings of indicators
4.3 Tracking experiments
This section presents some experiments used to time the process of detecting
objects in a frame, given the previous locations of the objects. Some
simple experiments are also conducted showing the accuracy of the tracking
system as a whole.
4.3.1 Execution time
The cluttered frame processed in the experiments in section 3.5.3 (see figure
3.12 on page 44) was again processed to test the speed of the tracking process
(using a window). The image was loaded and scanned for the seven objects.
The tracking phase was then applied 100 times on the seven objects found
to calculate an average time to apply the tracking phase.
System          Time for detection frame (s)   Time for tracking frame (s)
Test arena      0.023                          0.0056
Khepera arena   0.249                          0.056

Table 4.1: Execution time of the tracking process
During the tracking phase a window of 70x70 pixels was placed around the
previous position of each object found. Seven objects were found so only
70x70x7 pixels were processed in each frame. This is approximately a tenth
of the number of pixels processed when looking at the whole 640x480 image.
It can be seen that:
• The frame grabbing hardware (used in the Khepera arena) could deliver a
frame every 0.02s so the system could track objects in every third frame
(in which time the Kheperas could move a maximum 6cm) which is much
more acceptable. When objects are lost, the small scanning window used
to find them would slow the system down but not to the level of scanning
the whole image.
• This execution time was from tracking 7 robots. Usually less than this
(two or three Kheperas) would be tracked.
• Transferring frames via the USB was obviously a bottleneck in the test
arena. The system could process 10 frames while waiting for the next
frame to be transferred.
• The main question was whether one frame in 0.06s was too slow — which
is dependent on the type of robot used and the application.
4.3.2 Static positions
Two indicators were placed in the test arena. The tracking system was run
for approximately 8 minutes during which time the indicators were not moved
and the environment kept moderately static. This allowed a measure of the
accuracy of the tracking/detection process over time to be found (evaluating
how much the position and orientation information changed due to slightly
changing environmental conditions).
Figure 4.9: The positions of two static indicators
Figure 4.9 shows the two indicators in the test arena. Indicator ‘one’ has
center of mass position at approximately (413, 199), object ‘two’ has its center
of mass at ∼ (346, 374).
Object   Total frames   Lost indicator   Found indicator   Mean X   Mean Y   Mean heading   Std. dev. X   Std. dev. Y   Std. dev. heading
1        6823           4                6819              412.9    198.9    132.4          6.2           3.9           4.6
2        6823           6                6817              345.9    373.9    94.02          2.7           4.4           3.2

Table 4.2: Evaluations of the tracked positions obtained from static indicators
The results obtained from the system are shown in table 4.2, which shows:
• The extracted heading and position of the indicators were approximately
accurate.
• The ‘lost frames’ entry shows the number of instances where the indicator
could not be found by the system. This does not represent the instances
where an indicator was incorrectly found.
Mis-classified objects and heading calculations
By visual inspection of a plot of the static positions, it was seen that the
first indicator had been found in the completely wrong area in four frames.
This had occurred once for the second indicator. The system occasionally
classified the small section of silver tape at the top-left of the image as an
indicator. This occurred when:
• The real indicator had been lost (and the system was searching for it).
• The lighting conditions were such that an area on the silver tape matched
the matching criteria (so 1 or 2 ‘holes’ appeared to be on the tape).
• When the lighting conditions changed so that this area no longer matched
the identification criteria, the system reverted to the scanning phase and
found the real indicator.
By removing these mis-classified cases (by hand) all of the data points were
centred on the correct center of mass position. The change in heading between the frames could be examined more clearly as shown in table 4.3.
Object     1        2
Max        133.70   95.98
Min        131.07   91.96
Mean       132.28   93.99
Std. dev   0.42     0.72

Table 4.3: Heading calculations from static indicators
• Clearly the heading calculation is affected by noise more than the position calculations (principally because the position is rounded to the nearest pixel).
• The robot heading information is only used as a guide as to what the robot
is doing. Usually an error of 10◦ is acceptable.
4.3.3 Tracking objects moving in a straight line
A straight-line ‘runner’ was built from LEGO, which allowed an indicator to
be smoothly pushed in a straight line across the image.
The y position of the indicator was kept constant as the indicator was pushed
across the arena with a heading of ∼180◦ . The y position actually changed
by approximately 10 pixels during this distance, so the indicator was not
moving exactly horizontally across the image.
Figure 4.10: The observed COM positions of an object on a straight line path
Figure 4.11: The straight line path (y-magnified)
The path is shown in figure 4.10. The system examined 349 frames, and lost
the indicator in 18 of those.
• It can be seen that there were no mis-classified points, and the path of
the positions is fairly smooth. It can be seen in figure 4.11 that there is a
slight mis-calculation of the center of mass y pixel (at approximately x =
150, 300 and 550), where the y-position fluctuates +/− 1 pixel (possibly
due to jerky movement of the indicator).
• The mean orientation of the indicator was found to be 179.7◦ with a
standard deviation of 2.09. Any inaccuracy was partly due to the indicator
not being accurately positioned on the base on which it was being pushed.
4.3.4 Tracking simple robot tasks
Four LEGO robots conducting very simple tasks (or random movement) were
placed in the test arena and tracked for a minute using a tracking window of
70x70 pixels. This window was 135mm across, which limited the movement
of each robot to less than ∼7cm in 0.06s (less than 1 ms−1).
Figure 4.12: Paths of four robots being tracked
• The thresholds used to segment the image were: 110 (to separate the
indicators) and 30 (to find the holes).
• Figure 4.12 shows the center of mass positions obtained from the system
while tracking the four robots.
• It can be seen that the first two robots are followed successfully (the plot
of their paths describes a more or less continuous, sensible path).
• The robots were tracked for 1217 frames (1.5 minutes). The number of
instances where the robots were lost are shown in table 4.4.
Object        1    2    3    4
Lost points   14   93   13   420

Table 4.4: The number of instances of the system losing a robot
• The third robot seems to have been lost at low y values and low x values
(where its path seems spread out). However, the system only lost this
robot on 13 occasions as shown in table 4.4, so the robot seems to have
sped up at these points.
• The second robot was lost on several occasions. A few positions can be
seen a distance away from the main path described by this robot. The
intervening points have been lost.
• The path of the fourth robot is very disjointed because of the system losing
it so frequently.
• The lost points are partially due to poor threshold selection. All of the
robot tracking and detection experiments shown were conducted with the
same thresholds. They should have been carefully selected before each
application. The second and fourth indicators had markings close to each
other and the edge, which may have resulted in the holes ‘blurring’ together
causing mis-identification.
• By visual analysis of the individual frames, it was seen that the system
had segmented some bright yellow LEGO bricks around the fourth robot
as part of the indicator. Often the ‘nodules’ on the LEGO bricks would
cast shadows, causing the system to see extra holes in the indicator.
4.3.5 Tracking Kheperas
Two Khepera robots were tracked in the Khepera arena. To prevent thresholding problems they were contained in a small, clear area in the bottom
left of the arena. The umbilical cord and crane were still present, causing
occlusion problems.
To be able to track the robots, massive indicators (in relation to the size of
the Khepera) were required so that the hole markers could be located (as
shown in figure 4.13).
Figure 4.13: The respective size of the Khepera and indicator used in Khepera
tracking experiments
• The thresholds used were: 150 to extract the indicator from the image,
and 100 to find the holes in the indicator.
• The robots were kept in a uniform section of the arena where these thresholds applied.
• Figure 4.14 shows the tracked paths of the two Kheperas.
• 448 frames were processed. The first Khepera was lost in 62 instances, and
the second Khepera lost in 110 instances.
• These lost instances are mainly due to objects being occluded by the overhead crane.
• The system could process each frame in 0.03s (on average).
Figure 4.14: Paths of two Kheperas being tracked
Summary of results
• Using a tracking window improved the processing time. This made the
system suitable for use with the Khepera system (a Khepera could move
a maximum of 6cm between each frame being processed).
• The position information extracted was accurate.
• The heading calculation was accurate (to 10◦ ), and fairly robust to noise.
4.4 Tracking summary
• Object tracking was achieved by applying the object detection process
in a number of small tracking windows, using the previous position of
objects.
→ The frame was scanned if an object couldn’t be found.
→ Objects were identified in each frame to reduce the number of misclassified objects.
• The heading of the robot could be calculated (as opposed to just its orientation).
• Each frame could be processed quickly (in the tracking phase). A demonstration was timed at 0.03s using the Khepera hardware and 0.003s using
the test hardware. The Khepera hardware could process frames approximately fast enough that a Khepera could move a maximum of 6cm (∼
8 pixels) between processed frames.
• The extracted position and orientation of the robots were shown to be
sufficiently accurate.
• The system could be used to track objects sufficiently well providing fixed
thresholds were selected which allowed the indicators to be segmented
during the tracking period.
• The system was application dependent. The size of the tracking and scanning windows (and the threshold settings) needed to be set by the user
depending on the speed of the system, the robot type, and end application.
• Objects were frequently lost due to occlusion, and also because the system
checked each object's identity in each frame. Slight changes to the environment could cause a correct indicator to fail these tests.
• Instances of incorrectly matching areas in the image to indicators were
rare.
Chapter 5
Multiple robot tracking application: The waggle dance
The next two chapters provide an example of using the tracking system as a
robot sensor to aid in a multi agent task. This chapter describes a postulated
method of communication between bees (describing a path to distant food
sources to other hive mates). The following chapter demonstrates a simple
model of this method of communication, conducted as a robotics task using
only positional information from the overhead camera tracking system as
input to the robot agents.
5.1 Bee foraging and communication
To achieve efficiency while foraging, it is postulated that honey bees share
information about potential sources of food. When a scout bee locates an
important source it attempts to recruit other bees to ensure that the colony’s
foraging force is focused on the richest available area.
Studies have shown that there is a good correlation between the ‘dances’ that
a bee performs in the hive, and the area consequently searched by other bees
who have followed the dance [4]. This form of communication is unique in
the animal kingdom and offers several interesting research possibilities:
• The understanding of the information held in the dance is by no means
complete; each dance could contain many biophysical signals and it is not
clear which are critical.
• Several biologists are sceptical of the amount of information that the dances
contain (some doubt that they contain any information at all).
5.2 The round dance
It is postulated that for food sources close to the colony (< 80m) a ‘round
dance’ is performed (as shown in figure 5.1). This elicits flight and searching
behaviour1 for flowers close to the nest but without respect to a specified
direction.
Figure 5.1: The Round Dance
1 By olfactory and visual clues.
5.3 The waggle dance
A more interesting method of recruitment is by the waggle dance [4]. A
returning forager bee performs a miniaturised re-enactment of its journey.
Neighbouring bees appear to learn the distance, direction, and maybe even
the odour and ‘importance’ of the flower patch by following the dance. A
following bee would seem to translate the information contained in the dance
into flight information.
It can be said that the bees are sent and not led to the goal. This seems unique
in that it is a truly symbolic message that guides a complex response after
the message is given, unlike other examples which are effective only while the
signals are in existence (or soon after in the case of chemical communication).
5.3.1 Dance description
The waggle dance consists of a figure of eight, looping dance with a straight
‘middle run’ (figure 5.2). During this straight run [13] the bee waggles her
abdomen laterally and emits strong vibrations2 .
While the forager performs this dance other recruits gather behind it and follow it through the dance, keeping their antennae in contact with the leading
bee.
There is a strong correlation between the orientation and speed (and the rate
of ‘waggle’) of the straight run and the direction and distance of the food
source from the hive [4].
There are two main lines of research regarding the waggle dance. One involves
the efficiency of recruitment of sister bees and the other with the mechanisms
involved in the communication.
2 Both audible, and in the hive substrate.
Figure 5.2: The Waggle Dance
5.3.2 Orientation of the food source
Flowers lying in line with the azimuth of the sun are represented by
straight ‘runs’ directed upwards on the vertical combs of the hive. The
direction to the food source is coded inside the hive by the angle of the
straight runs from the vertical. This angle corresponds to the angle between
the azimuth of the sun and the direction of the food source from the hive
(figure 5.3). Dances directed downwards indicate the opposite direction.
Figure 5.3: Orientation of food source from hive
5.3.3 Distance
The distance of the food source from the hive is signalled by the duration of
the waggle run. Duration increases non-linearly with distance. Early researchers
[4] supposed the estimate of distance was derived from the energy expended
during flight. Modern research [2] has demonstrated that bees monitor physical distance visually by optical flow methods.
5.4 Bee communication summary
• The Waggle dance is a symbolic message that guides a complex response
from a following bee.
• The dance is a unique form of communication in the animal kingdom.
• It is postulated that the orientation, distance and ‘importance’ (size and
type) of the food source is encoded into the dance.
• Not all biologists believe in every aspect of the dance language.
• Very little is known about which combination of biophysical signals are
transmitted and how this information is used outside the hive.
• The bees often follow several dances and their recruitment is not precise.
Many bees fail to find the source or may stumble on it by following floral
aromas or by following other bees.
Chapter 6
Robot implementation of the waggle dance
To demonstrate the multi agent tracking system, the positional information
of a number of tracked robots was used to implement a simple model of the
waggle dance.
The waggle dance was implemented as a robot following task with a single (or
several) robot bee(s) following a dance performed by a leading bee. For this
demonstration, only a simple robot control system coupled with positional
information from the tracking system was used to copy the dance on the
following ‘bees’.
For this demonstration, only the most basic signal contained in the dance
(the orientation of the food source) was modelled.
It should be noted that this model is only for the purposes of demonstrating
the tracking system. The model is a gross simplification:
• The waggle dance takes place in a dark hive. Visual signals can be discounted as stimuli to the following bees.
• A bee learns the dance by following close behind the leading bee (such
that the bees touch), and the orientation of the food source is in respect
to gravity.
Figure 6.1: Robot bees
For the scope of this report the simplifications are unimportant and this
example can be thought of as any basic multi-robot following example, using
only the positional information from the tracking system (eg using no light
or touch sensors).
To reduce the number of mis-classified or lost occurrences (due to occlusion,
changing lighting conditions, etc), the experiments were performed in the
test arena. This allowed much more accurate tracking to be performed in a
more easily controlled environment.
A selection of LEGO MINDSTORM robots were constructed. The MINDSTORM microcomputer (the RCX unit) is able to communicate with the
world via an IR transceiver. This provided a useful mechanism for the controlling system to communicate with the robot bees, without the need for
a system of cables (which would cause more occlusion and shadows, deteriorating the performance of the tracking system).
Figure 6.2: The dancing bee
6.1 The dancing bee
To reliably and repeatedly follow a figure of eight dance pattern, the dancing
bee robot followed a silver line marked on the arena. This allowed any path
to be marked out for the bee to follow. A straight path connected by two
loops was constructed.
This straight line part of the path indicated the orientation of the food source.
An example of this is shown in figure 6.3. A path has been marked out in
silver tape on the floor of the test arena. Overlayed onto the silver tape are
the tracked positions of a line following robot after following the path for a
number of minutes.
6.1.1 Robot construction
A robot ‘bee’ was constructed from LEGO MINDSTORM to follow the line
marked on the arena. A gear ratio of 25:1 (on top of the built-in LEGO
motor gearing) was used, as shown in figure 6.4,
Figure 6.3: The dance path, with overlayed positions obtained from the
tracking system
which allowed the bee to travel at a top straight line speed2 of 15cms−1 . In
each 0.06̇s time-slice the robot could move a maximum of 1cm (∼5 pixels).
Figure 6.4: The layout of the gears in the robot bee
A tracking window of 70×70 pixels was used to contain the ∼220 pixel-sized indicator and allow for movement of the robot when tracking a particular robot. When scanning for an indicator the system would search the whole image in each frame (the extra processing requirement could be afforded using the 500 MHz processor, and meant that lost objects would be found as soon as they became visible).
The robot was built on two large wheels ∼114mm apart. A small wheel
located at the rear of the robot prevented it from tipping over, although this
also reduced its rate of turn, due to friction. The turning rate at maximum
motor output (with wheels turning in opposite directions) was approximately
43◦ s−1 . The robot could turn a maximum of 3◦ in one time-slice.
Figure 6.5: Overhead view of the robot bee
A MINDSTORM light-sensor brick was placed at the front of the robot near
to the center of its wheel base, approximately 5mm above the ground. The
dance path was marked on the arena with silver tape. At this height the
light sensor brick detected a (raw) value of > 60 when it was over the tape.
As the sensor moved away from the tape this would drop to around 30.
6.1.2 Line following method
An object indicator was placed in the center of the robot above its turning point. Several maneuvers performed by the robot would be sweeping turns looking for the line, so the indicator was placed on the turning axis to minimise its displacement while the robot turned. The indicator was
placed horizontally across the robot, such that an indicator heading of 0◦
corresponded to a real heading of 270◦ (heading vertically ‘up’ the image).
The basic line following system algorithm can be seen in appendix B.1.
6.1.3 Extracting orientation information from the robot dance path
The tracking system could dump the position of each robot found in each
frame into a data file. Simple MATLAB routines were written to extract and
analyse the positions of a specified robot from this file. By examining plots
of the positions, any obvious instances of object mis-classification could be
removed.
If enough points were given to describe a robot's motion around the dance
path, a straight line could be fitted through the data points to provide an
indication of the orientation of the path (and hence the orientation of the
food source).
Fitting a straight line to the dance path
Simple line fitting algorithms produce a basic least squares regression line
of y on x (which relies on uncertainty in the ‘y’ data points). This will fit
a line through the data such that an equal number of points are on either
side of the line ‘boundary’. There is no guarantee that this would match the
straight line part of the path (a regression line of x on y could also be calculated and coupled with the y-on-x line such that the resultant line gives an accurate indication of the orientation).
However, given that the detection system calculates the orientation of the
axis of elongation of a number of points (as in section 3.1.1 on page 29) it
made more sense to re-apply this algorithm to find the orientation of the set
of positional points obtained from the robot following this path. It should
be clear that the axis of elongation of this path describes the orientation of
the dance path.
An example of the calculation of the orientation from the positions of a
dancing bee is shown in figure 6.6 which shows the orientation (as calculated
by a MATLAB routine) overlayed with the original path onto an image of
the arena.
Figure 6.6: Extracting the orientation of a food source from the positions of
a dancing bee
The next task was to design another bee (or a set of bees) which could follow
this dance and maybe communicate this information to further bees.
6.2 The dance following bee
While the line following bee was executing its version of the waggle dance,
the tracking system relayed positional information to a second program which
controlled the following robot(s). This program used the position of the bees
to execute a following behaviour on the dance copying bee(s).
6.2.1 Communicating with the MINDSTORM bees
The MINDSTORM RCX units can transmit or receive an infrared message
as a series of unsigned byte-characters. The MINDSTORM IR tower which
connects to the PC, downloads programs to the RCX unit in this way.
Figure 6.7: Communication between the RCX and a PC
The serial protocol used to send single byte messages via the IR tower is
simple [14]. This allowed the following-control program to send a simple instruction (turn left, go back, etc) as an IR message to the desired robot. The
range of the IR transmitter (set to long-range) exceeded the ∼2m test arena
size (in good IR lighting conditions). The IR messages could be ‘bounced’
off walls and clutter making it easy to communicate with the RCX unit
whichever way it was facing. A brief description of this serial protocol is
given in appendix B.2.
6.3 The robot bee controller
The controller sent commands to the following bees depending on the state
of the leading bee. The only inputs into the controller were the positions and headings obtained from the multi-agent tracking system. The controller
attempted to copy the movement of the leading bee on the following bees:
• A movement and a turning threshold were set.
• If the leading robot moved more than the movement threshold between
each frame, the following bee was instructed to move forward.
• If the difference between the headings of the two bees was greater than the
turning threshold, then the following bee was instructed to turn such that
it was heading in the same direction.
6.3 The robot bee controller
85
• The thresholds were set to allow for slight errors in the tracking information (rather than trying to copy a small movement that hadn't happened). Errors in the segmentation process could cause apparent changes in the position of the robot that were not real. A sketch of this control loop is given below.
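As an illustration only, the following minimal sketch shows the shape of such a control loop (written here in Python, which is not the language of the real controller). The threshold values, the (x, y, heading) tuple format and the helper send_ir_command are assumptions of the sketch rather than details of the original implementation.

```python
import math

# Illustrative thresholds; the real values were tuned to the tracking noise.
MOVE_THRESHOLD_PIXELS = 3
TURN_THRESHOLD_DEGREES = 5

def control_step(leader_prev, leader_now, follower_now, send_ir_command):
    """One control decision per processed frame.

    leader_prev, leader_now and follower_now are assumed (x, y, heading)
    tuples taken from the tracking system; send_ir_command(cmd) stands in
    for the routine that transmits a single-byte instruction via the IR
    tower to the addressed follower.
    """
    # Match the heading first: if the two headings differ by more than the
    # turning threshold, instruct the follower to turn the shorter way
    # (the LEFT/RIGHT labelling depends on the heading convention used).
    diff = (leader_now[2] - follower_now[2]) % 360.0
    if min(diff, 360.0 - diff) > TURN_THRESHOLD_DEGREES:
        send_ir_command('LEFT' if diff <= 180.0 else 'RIGHT')
        return

    # Otherwise copy forward movement, but only if the leader really moved
    # (small apparent movements are treated as tracking noise).
    dx = leader_now[0] - leader_prev[0]
    dy = leader_now[1] - leader_prev[1]
    if math.hypot(dx, dy) > MOVE_THRESHOLD_PIXELS:
        send_ir_command('FORWARD')
```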
6.3.1 The bee controls
A following bee could be sent four commands: LEFT, RIGHT, FORWARD
and REVERSE. Each command, for each bee, had a different corresponding
byte value so a command could be sent to only one bee (without confusing
the others).
6.3.2 Bee control simulator
To evaluate the control process without the need for continuously online
robots, a simple simulator was built. An image could be loaded representing
the line for the leading bee to follow. The leading bee, shown in the left pane
of the simulator (in figure 6.8) performed the basic line following strategy.
Whenever it moved or turned more than a set threshold an instruction was
sent to the following bee (in the right pane) to copy that movement. Figure
6.9 shows the path followed by the controlled bee after the leading bee had
completed one circuit.
• The center of each bee is shown with a cross. The direction the bee is
facing is shown with a small white line. The previous 100 positions of each
bee are shown with small white dots — this indicates the path of the bee.
This can be more clearly seen in figure 6.10 which shows a pixel-level view
of the following bee.
• The bee could only move forward in the direction it was currently facing.
• To simplify the control process, the following bee could either turn or move
in each time step.
Figure 6.8: The start of the simulation
Figure 6.9: The simulated paths of the bees after one circuit
• Various delays and limitations were coded into the system to allow for
inaccuracies in the real system. The movement of the following bee would
be delayed for one frame. It would not move until the leading bee had
either moved more than 2 pixels in one frame, or the leader had turned
such that there was a difference of over 5◦ between the headings of the two
robots.
• Even with these limits placed on the system, the following bee was able to
trace out a fairly accurate description of the path as shown in figure 6.9.
• The controller would occasionally miss small turns by the leading bee so
the follower was not properly lined up when it moved. This caused the
path to gradually drift across the arena when the simulation was run for
several ‘circuits’ of the dance.
Figure 6.10: Simulation of the following bee
Matching the heading of the robots
If the difference between the headings of the two bees was over a specified
threshold, commands were sent to match the heading of the following bee
with that of the leading bee. Simple calculations were performed to calculate
the direction that the following bee had to turn to match its heading with
that of the leader, such that it would turn the shortest distance.
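A possible sketch of that calculation is given below, assuming headings in degrees; the mapping of the result onto LEFT/RIGHT commands depends on the heading convention and is illustrative only.

```python
def shortest_turn(target_heading, current_heading):
    """Return (command, degrees) for the smallest turn that brings
    current_heading onto target_heading (both in degrees, [0, 360)).

    'LEFT' is taken to mean an anticlockwise turn here; with a
    clockwise (image-style) heading convention the labels would swap.
    """
    diff = (target_heading - current_heading) % 360.0
    if diff <= 180.0:
        return 'LEFT', diff
    return 'RIGHT', 360.0 - diff

# A follower at 350 degrees matching a leader at 10 degrees turns
# 20 degrees LEFT rather than 340 degrees RIGHT:
# shortest_turn(10.0, 350.0)  ->  ('LEFT', 20.0)
```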
6.3.3 The real robot
The RCX unit has seven motor power settings. It can time in increments of
0.01s. Following bees identical to the leading dancing bee were built from
LEGO (but without the light sensor brick).
• To prevent the robot turning too far when trying to match the heading
of the leading bee, a delay was set which waited for 6 counts (0.06s),
allowing the turn to be executed, then the motors would be switched off.
This ensured that the robot would not move too far in one turn. The
controller would send a TURN signal each frame until the heading of the
following robots matched that of the leader so the turn would be executed
in a series of steps.
• The turning delay had to be implemented because each robot could turn a maximum of 3◦ in each time-slice, which, coupled with the delay in processing the position, sending a command, and the RCX stopping the robot, meant the robot could turn further than required before receiving a STOP signal. If the robot tried to achieve an accuracy of 5◦ (as obtainable by the tracking system) it would sweep backwards and forwards, continually correcting itself. By inserting this stepped movement the robot could be positioned more accurately.
• The leading bee would not move forward for long periods. Most of its
movements would be in short bursts which would not be translated into
movements in the following bee. This prevented tracking errors being
mistaken for real movement. Movements by the following bee had to be
enhanced to compensate for this.
• The leading, line-following bee was slowed down further to allow more
time for the following bee to match its heading. After each turn, before it
started to move, the leader would wait for a number of cycles.
• For the same reason, the turning rate of the leading bee was also reduced
by achieving each turn in a ‘stepping’ motion.
While experimenting with real robots, it was found that yellow LEGO bricks
(and the yellow RCX units) would have an intensity similar to the paper
used to make the indicators. Some positions of the robots (or darker bricks
surrounded by yellow bricks) could cause the tracking system to match part
of a LEGO structure with an indicator. This was obviously a serious design
flaw (the mis-classification of indicators is the most serious problem in the
tracking system). To overcome this, the trivial solution was to either not use
these bricks, or to shield any bright areas with darker bricks.
6.4 Results
The arena was constructed on a desktop with length less than 2m. Because of
time limitations all of the experiments were performed in this arena. It would
have been more sensible to relocate the equipment to allow more arena-space
(and to prevent damage to a robot if the controller mis-positioned it at the
edge of the table).
The ‘heading-matching’ part of the dance following method was implemented
first.
6.4.1 Copying the heading of the leading bee
Four identical bee robots were used. One followed the silver line placed onto
the floor of the arena, the others attempted to match the orientation of this
robot (by executing the commands sent by the controller).
Figure 6.11: Three robot bees matching the heading of a leading bee
After each turn the leading bee was delayed by at least 0.4 s (depending on the angle turned) to allow the other robots to catch up with it. A MATLAB plot of the heading of each robot in each frame is shown in figure 6.12.
Figure 6.12: The heading of the four bees at the start of the following process
This plot shows the first 500 frames. It can be seen that there is a slight
delay between the leading robot changing its heading and the others copying
it.
• The controlling process was only started in frame 90; from this point the robots all start to turn to face in the same direction as the leader (which is also turning, in the opposite direction).
• The robots are all aligned by frame 150.
• Between frames 350–420 the IR control signal was not read by the third
robot, resulting in it not changing its heading. The IR tower was repositioned at this point so that the robot could read the turn signal.
• Around frame 240 the tracking system seems to have confused the heading of the fourth robot, making it turn in the wrong direction. This was either because the wrong object was identified as the robot, because the heading calculation or the controller failed, or because the robot turned slightly too far and had to adjust.
Figure 6.13: The heading of the four bees over the following process
Figure 6.13 shows the plot of the headings of the four robots for 4000 frames.
In this time the leading robot made two circuits of the path.
• It can be seen that the following robots match the heading of the leading
robot successfully over this period.
• The tracking system has problems with headings of around 0◦ . The heading
of indicators in this position seems to fluctuate between 0◦ and 180◦ in
successive frames.
6.4.2 Following the dance path with a single bee
A single robot bee was set up to test the dance following controller. More
than one bee would not fit in the arena (time constraints prevented the arena
being moved to a more sensible location).
The controller instructed the following bee to move forward if the leading bee
moved more than 3 pixels between frames. This prevented the bee moving
because of poor indicator segmentation. The following robot moved for 0.15s,
to ensure it moved a fairly large amount to compensate for small movements
made by the leading bee that were not followed.
The paths of both the leading bee and following bee are shown in figure 6.14.
The orientation of the two paths are also overlayed.
Figure 6.14: The dance path of the leading bee, the path of the following
bee, and the orientation of the two paths
• It can be seen that the orientations are approximately the same. The
calculated orientation of the dance path performed by the leading bee was
76.62◦ whereas the orientation of the path formed by the following bee was
78.95◦.
• The orientation of the followed path should be greater than this. The
orientation of each complete circuit made by the robot seems to reduce
as the robot drifts (because the headings of the robots are not completely
matched when the following robot moves).
• The orientation of the first circuit made by the bee was 84.44◦. This is still
fairly accurate considering the delay and other limitations of the system.
6.5 Summary of the multiple agent tracking application
• Providing the LEGO robots were slowed down sufficiently, the 0.06 s frame time obtainable by the vision hardware allowed a robot controller to accurately implement a robot following behaviour using the information obtained from the tracking system. This included the extra delay
in sending and executing commands to the following robots.
• This allowed the heading of a leading robot to be matched successfully by
a number of robots.
• The actual movements of the leading robot were harder to detect and hence
follow. Uncertainties in the position of the robot meant that the system
had to ignore apparent movements of less than 3 pixels (which could occur
due to poor segmentation of the indicator). This meant that real small
movements made by the robot were lost.
Chapter 7
Conclusion
This chapter restates the initial goals of the project, discusses the effectiveness with which these were met, and describes the inherent limitations built into the system.
Each chapter of the report demonstrated the following:
1: Described the machine vision problems associated with tracking objects
in successive frames. Principally these involved making a correct classification of an object given that a segmentation process had extracted
the object pixels from the image successfully. This segmentation process
was non-trivial.
2: The vision hardware used during the scope of this project was introduced.
A point was made concerning whether a tracking system could process
each frame sufficiently quickly for real time processing of the problem
— this depended on the application and the type of robot. A cheap
frame grabber was also introduced for the purposes of evaluating its use
in future tracking applications (its reduced frame rate caused concerns
as to how useful it would be).
The existing single Khepera tracking system was explained. This system
could not track more than one Khepera, principally because it made no
attempt to identify different Khepera within the image, causing confusion
as to which robot was which over time.
3: A method of segmenting and classifying objects within frames using
object-indicators was introduced. This allowed an end user to design
the indicators using simple materials to suit the end application. The
indicators were uniquely classified by the number of markers placed upon
them. This method allowed the system to recognise different objects and
extract position and heading information from them.
Experiments were performed testing the classification system in different environments. The system would work in the complex Khepera environment, but only if the robots were kept in a small section of the
arena where the conditions were static. The system could not find all
of the robots if they were in parts of the arena with different lighting/background conditions.
4: The basic tracking procedure was demonstrated. This performed the
classification process in each frame. To increase the efficiency small sections of the image would be processed when looking for the objects.
Windows centred on the previous positions of the robots were used to
find individual robots and small sections of the arena were scanned to
find any missing objects. Experiments were performed showing that the
system could be used to obtain real time positions of Kheperas (using
the existing Khepera hardware). Approximately 1 frame in 3 could be
processed (depending on the amount of clutter in a frame), in which time
a Khepera could move 8 pixels. This could be reduced by limiting the
number of objects/Kheperas/clutter in the frame and processing smaller
windows.
The detection process was improved such that the heading of an object
could be extracted from the object-indicator (as opposed to its orientation).
A GUI debugger/system-interface was shown and the method used when
storing and communicating the locations of the robots was explained.
5: The fifth and sixth chapters conducted a more elaborate test of the system. The fifth chapter gave a brief description of the waggle dance, a
method used by bees to communicate the location of distant food sources.
6: The waggle dance was implemented using only positional information from the tracking system, which followed two ‘bee’ robots. The system modelled a method of a bee learning the orientation of the food source by
copying the dance of another bee. The system was demonstrated using
two LEGO MINDSTORM robots and the slower, evaluation vision hardware. By slowing the robot bees the system could accurately control the
second bee such that the dance could be reasonably followed.
A simulator of the control software was also shown.
7.1 Achievement of the goals of the project
The principal goal of the project was to refine the existing Khepera tracking
system to enable it to track several Khepera in real time.
• The classification part of the system could detect any object by matching
it with a unique object indicator found in the image. These indicators had
to be sufficiently well defined to allow the system to accurately segment
them from the image, and to identify the result.
• It was shown (section 3.5.2, page 43) that this was possible in the Khepera
arena. Although:
→ To be identified, the indicators had to be larger than the actual Khepera, making the system cumbersome.
→ The Khepera arena was not uniformly lit and the conditions could not be
well controlled so a single static threshold was not sufficient to segment
the indicators from the whole arena.
→ The Khepera system contained an overhead crane (supporting the umbilical cables allowing online running of the robots). These cast shadows around the arena and physically occluded parts of the arena, such that all of the pixels making up an indicator could often not be grouped together.
• The indicators allowed the position and heading of the corresponding
robots to be extracted and uniquely identified thus solving the identification problem of the single Khepera tracking system (section 3.5.2, page
45).
• Experiments demonstrated that this system could process a frame in 0.06s,
in which time the Khepera could move a maximum of 8 pixels (section
3.5.3, page 46).
The system had to be capable of distinguishing between several objects and
be able to match positional information with real life positional knowledge
of regions in the environment.
• The principal tracking problem of identifying the objects was solved by
introducing simple invariant markers to the indicators. The number of
these on each indicator identified the object. By identifying each indicator,
in each frame, the probability of mis-classifying the objects was massively
reduced (section 3.4, page 38).
→ However, by making this identification the objects were ‘lost’ more frequently. This was because the system relied on a fixed threshold that
occasionally would result in poor segmentation of the image (or would
not allow the markings to be extracted from the indicator).
This was less serious than mis-classifying an object so the payoff was
probably worth it, providing that the indicators were designed well
enough and the thresholds selected such that these instances were minimised.
→ Serious mis-classification problems could occur when an object had been
lost and the system was performing a search. Stray shadows could cause
random areas of the image to match the identification criteria such that
the system matched an incorrect region with an object. By making
identification checks in each frame, the system would eventually decide
the incorrect region was not the object and resume the search.
• Positional information was at the pixel level and could be communicated
to external processes to use as they desired (section 4.1.4, page 57).
A secondary goal of the project was to evaluate the vision equipment used in
the test arena setup as a cheap method of conducting tracking work in the
future.
• The indicator method was very general allowing the end user to build their
own application dependent indicators. This allowed the system to be very
general and work on a number of systems, as demonstrated with large
LEGO robots in the test arena (section 6.2, page 83).
• The parameters of the system could be changed to cope with application
specific factors (section 4.1.1, page 53). Principally this involved scanning
different parts of the image for the object in frames, depending on the speed
of the robot and hence its displacement between each frame. The faster the robot, the less accurately predictions could be made about its position, and the larger the areas that had to be searched (increasing execution time).
7.2 Assumptions and limitations
The end user had to select thresholds that would apply over the whole tracking space during the whole tracking period. A single fixed threshold could
never provide perfect segmentation. It was shown that only part of the
Khepera arena could be used with a single threshold (section 3.5.2, page 44).
Changes in the environment would cause objects to be frequently lost, even in controlled environments (section 4.3.2, page 62). Very occasionally these changes would also cause the system to identify an incorrect part of the scene as the object (while searching for the missing object).
The system identified each object in each frame, even when a prediction of the
location of an object had been made and a likely looking object was found.
This prevented the most serious problem, mis-classification of objects, but
increased the number of lost points.
The indicators had to be carefully designed by the end user such that they
were easily separable from the image and with well defined, separable hole
markers (section 3.5.1, page 42). The demonstration application showed that
even brightly coloured (gray or yellow) LEGO bricks could confuse the system
(section 6.3.3, page 87). Shadows cast by the ‘nodules’ on the LEGO bricks,
or regions of dark bricks inside light boundaries, could cause the identification process to class that part of the LEGO structure as an indicator.
The tracking system ultimately tracks the object indicators around the image, and not the robots. Poor or incorrect positioning of the indicator would
cause obvious confusion to the system. Statistical methods, coupled with
accurate segmentation, could maybe track the actual robots in the image
without resorting to object-indicator methods, providing the shape of the
robot can be well defined.
7.3 Advice for extending this work
The object indicators are identified in each frame to compensate for poor
segmentation. However this increases the time required to process the frame
and causes several points to be lost where an object fails the identification
test. With better object segmentation, or with a different method of invariant classification of the individual objects, this identification would not be
required.
A better study of the materials used to build indicators could be conducted to
select a reflective or shiny material. More uniformly placed overhead lights,
and a darker background could be used to allow the threshold selection process to be more accurate.
The main problems are:
• Selecting a threshold that stays constant during the tracking run (and
which applies over the whole arena).
• Making predictions of the location of the object (to reduce the searching
time) and having confidence in its classification so its identification doesn't
need to be checked. Having confidence in the predictions would also allow
the system to accurately guess the position of an object if it was lost.
7.3.1 Automatic threshold selection
If the environment was carefully set up such that it was uniformly illuminated and had a uniform background, the user could select a threshold that segmented the object indicators from the image reasonably successfully. Problems arose when the lighting conditions changed, either because the lighting itself changed or because external objects created shadows. In these cases a fixed threshold would no longer apply, resulting in some of the indicators not being segmented correctly.
No environment could be controlled such that one fixed threshold would
successfully segment all of the objects for the entire tracking period. In
the Khepera arena this is made worse in that the scene is never uniformly
illuminated (because of the lighting arrangement) and consists of slightly
differently textured and coloured pieces of chipboard. One threshold could
not apply across the whole arena, so experiments had to be limited to a small
section of the arena (section 4.3.5, page 68).
Segmentation errors are the principal cause of poor tracking. All of the tracking elements require an accurate segmentation to have been made. With
sufficient knowledge, automatic thresholding techniques can select an approximately accurate threshold.
Selecting thresholds in evenly illuminated scenes
Methods to automatically select a threshold require some general knowledge
of the objects, environment and application (such as the size, number, or
intensity characteristics of the objects).
An image can be assumed to contain n objects O1...On (including the background) and gray values from different populations π1...πn with probability distributions p1(z)...pn(z). In many applications the probabilities P1...Pn of the objects appearing in the image are known.
The illumination of the scene controls the gray values, so pi(z) cannot be known beforehand; most simple auto-thresholding methods therefore make a prediction of it, using knowledge of the application and a histogram of the gray level intensities contained in a specific frame.
Methods exist to select a threshold that partitions the gray level histogram so that a chosen percentage of the pixels are selected (sometimes called the p-tile method) [3].
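A minimal sketch of such a selection is shown below, assuming the indicators are the brightest objects in the frame and that their approximate total area, as a fraction of the image, is known; the function name and the 0–255 gray range are assumptions of the sketch.

```python
def p_tile_threshold(image, object_fraction):
    """Pick a gray-level threshold so that roughly `object_fraction` of
    the pixels lie above it (assumes the objects of interest are the
    brightest regions and that their total area is approximately known).

    `image` is a list of rows of integer gray levels in 0..255.
    """
    # Build a gray-level histogram.
    histogram = [0] * 256
    total = 0
    for row in image:
        for value in row:
            histogram[value] += 1
            total += 1

    # Walk down from the bright end until the requested fraction of
    # pixels has been accumulated.
    wanted = object_fraction * total
    accumulated = 0
    for level in range(255, -1, -1):
        accumulated += histogram[level]
        if accumulated >= wanted:
            return level
    return 0

# Example: indicators cover roughly 2% of the frame.
# threshold = p_tile_threshold(frame, 0.02)
```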
Another method assumes the object intensities are affected by Gaussian noise. The peaks and troughs in the gray level histogram can be partitioned by computationally searching for the troughs. However, the peaks are generally not well separated, making this a non-trivial problem [9].
Selecting thresholds in unevenly illuminated scenes
With scenes that are unevenly lit, different thresholds are required in different
sub-images.
Adaptive methods divide the image into small sub-images. Thresholds in
each sub-image are selected and applied to this smaller region. The final
segmentation of the image is obtained from the union of all the thresholded
sub-images.
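A sketch of that adaptive scheme is given below; pick_threshold stands for any per-block selector (for example the p-tile sketch above), and the block size is an assumed parameter.

```python
def adaptive_threshold(image, block, pick_threshold):
    """Segment an unevenly lit image by choosing a separate threshold in
    each block x block sub-image and taking the union of the results.

    `image` is a list of rows of gray levels; `pick_threshold(sub)` is any
    threshold selector applied to a sub-image (e.g. a p-tile or valley
    method); returns a 0/1 binary image of the same shape.
    """
    height, width = len(image), len(image[0])
    binary = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Cut out one sub-image and select a threshold just for it.
            sub = [row[left:left + block] for row in image[top:top + block]]
            t = pick_threshold(sub)
            for i, row in enumerate(sub):
                for j, value in enumerate(row):
                    if value > t:
                        binary[top + i][left + j] = 1
    return binary
```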
Variable methods approximate the intensity across the image by a simple
function (plane or biquadratic). The function fit is determined by the gray
level of the background (termed background normalisation). The gray level
histogram and threshold selection are performed in relation to the base level
determined by the fitted function [3].
As the complexity of the image increases the performance decreases — often a
fixed threshold selected by a user will perform better. Automatic threshold
selection involves analysis of the gray level histogram of the image, which
removes the spatial information of the intensity [9].
7.3.2 Simple prediction of object positions
If the previous location of an indicator was known, the system would scan
a small window based on this position when checking the new position of
the indicator (section 4.1 on page 52). The smaller this window the less
computational power was required. As the number of indicators in a frame
built up this became a requirement, especially when using the slower Khepera
arena hardware (section 3.5.3 on page 46).
Also, the smaller the window, the more effectively a threshold could be selected by applying auto-threshold methods in the sub-window [9].
Rather than using the maximum distance a robot could travel to select a size
of window (and center it at its previous position) a simple extension could
be made to use the maximum speed and acceleration of the robot coupled
with its current speed and heading information to position a window more
accurately. The size of the window would allow for error and uncertainty in
the predicted position.
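A sketch of such a prediction is given below; the units, parameter names and the way the acceleration bound is turned into a window size are illustrative assumptions rather than a worked-out design.

```python
import math

def predict_window(x, y, heading_deg, speed, max_accel, frame_time, margin):
    """Centre the next search window on a dead-reckoned position rather
    than on the previous position (a sketch; units are pixels,
    pixels per second and degrees, and `margin` absorbs tracking error).

    Returns (cx, cy, half_size): the window centre and its half-width.
    """
    heading = math.radians(heading_deg)
    # Dead-reckon the centre one frame ahead at the current speed.
    cx = x + speed * frame_time * math.cos(heading)
    cy = y + speed * frame_time * math.sin(heading)
    # The window then only needs to cover the uncertainty in that
    # prediction: what acceleration could add or remove, plus a margin.
    half_size = 0.5 * max_accel * frame_time ** 2 + margin
    return cx, cy, half_size
```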
7.3.3 Using statistical methods to derive estimates of position
Kalman Filter techniques have been successfully applied to object tracking
problems [21]. These techniques, though computationally complex, are reliable and generally the gain in efficiency outweighs the expense. The filter is
primarily used to predict the locations of the objects (using basic 2D models
of the mechanics, and error and prediction parameters learnt from the real
positions of the object).
These accurate predictions allow a smaller sub-image to be searched for the
object which aids both the processing time and threshold selection (when
selected by an automatic process). The accuracy of the position also allows
the system to confidently predict the location of an object that has been
occluded by other objects in the scene.
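For illustration, the sketch below shows a simple one-dimensional constant-velocity Kalman filter (one instance per image axis) with a simplified process-noise model; it is not the formulation of [21], only an indication of how predicted positions could be produced for the window placement described above.

```python
class ConstantVelocityFilter:
    """1-D constant-velocity Kalman filter; run one instance for x and
    one for y. State is (position, velocity); only position is measured.
    A sketch with a simplified (diagonal) process-noise model."""

    def __init__(self, position, process_noise, measurement_noise):
        self.p = position           # estimated position (pixels)
        self.v = 0.0                # estimated velocity (pixels/s)
        self.P = [[100.0, 0.0], [0.0, 100.0]]   # error covariance
        self.q = process_noise      # process noise variance
        self.r = measurement_noise  # position measurement variance

    def predict(self, dt):
        """Project the state one frame ahead; returns the predicted
        position, which can be used to centre the search window."""
        self.p += self.v * dt
        P = self.P
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]
        return self.p

    def update(self, measured_position):
        """Correct the prediction with the position reported by the
        segmentation/classification stage."""
        P = self.P
        innovation = measured_position - self.p
        s = P[0][0] + self.r
        k0, k1 = P[0][0] / s, P[1][0] / s
        self.p += k0 * innovation
        self.v += k1 * innovation
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
```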
7.4 What was learnt during the scope of the project
All vision problems boil down to making a good segmentation of the interesting objects from an image. The work performed on top of this process in this project was relatively simple, so more time could have been spent experimenting with different segmentation methods to achieve a better system (although this could still be accomplished without affecting the rest of the system).
Away from the actual subject of the report, the writer also learnt valuable
time management, resource sharing and collaboration skills, as well as learning useful skills in both MATLAB and LaTeX.
Bibliography
[1] Rodrigo Carceroni. Matrox meteor device driver. http://www.cs.rochester.edu/users/grads/carceron/meteor_int_1.4.1/meteor_int_1.4.1.html, 1994.
[2] H Esch and J.A. Bastian. How do newly recruited honeybees approach
a food site? In Physiology, volume 68, pages 175–181. 1970.
[3] Fisher, Perkins, Walker, and Wolfart. Hypermedia image processing reference. http://www.dai.ed.ac.uk/daidb/people/homes/rbf/HIPR/HIPRsrc/html/hipr_top.htm, 2000.
[4] K Von Frisch. The dance language and orientation of honeybees. Belknap/Harvard University Press, 1967.
[5] Carlos Gomez Gallego. Object Recognition and tracking in a robot soccer
environment. PhD thesis, The University of Queensland, 1998.
[6] JP Gambotto. A new approach to combining region growing and edge
detection. Pattern Recognition Letters, 14:869–875, 1993.
[7] Gose and Johnsonbaugh. Pattern Recognition and Image Analysis. Prentice Hall PTR, 1996.
[8] AG Heiss. Big physical area patch. http://www.uni-paderborn.de/fachbereich/AG/heiss/linux/bigphysarea.html, 1994.
[9] Jain, Kasturi, and Schunck. Machine Vision. McGraw-Hill International Editions, 1995.
[10] Newton Research Labs. The cognachrome vision system. http://www.newtonlabs.com/cognachrome/index.html.
[11] RHAD Labs. Video for linux. http://roadrunner.swansea.linux.org.uk/v4l.shtml, 2000.
[12] Henrik Hautop Lund, Esther de Ves Cuenca, and John Hallam. A simple
real time mobile robot tracking system. Technical Report 41, Department of Artificial Intelligence, University of Edinburgh, 1996.
[13] A. Michelsen et al. How honeybees perceive communication dances studied by means of a mechanical model. In Behavioral Ecology and Sociobiology, volume 30, pages 143–150, 1992.
[14] R. Nelson. Lego mindstorms internals. http://www.crynwr.com/lego-robotics/, 2000.
[15] Sunarto Quek. A real time multi-agent visual tracking system for modelling complex behaviours on mobile robots. Master's thesis, Artificial Intelligence, Division of Informatics, University of Edinburgh, 2000.
[16] J Russ. The Image Processing Handbook. London: CRC Press, 1995.
[17] A Saighi. Image processing and image analysis for computer vision.
Keele: University of Keele Press, 1987.
[18] Trolltech. Qt free edition. http://www.trolltech.com/products/download/freelicense/qtfreedl.html, 2000.
[19] Nemosoft Unv. Linux support for usb webcams. http://www.smcc.demon.nl/webcam/, 2000.
[20] Vernazza, Venetsanopolous, and Braccini. Image Processing: Theory
and Applications. Elsevier, Amsterdam, 1993.
[21] Greg Welch and Gary Bishop. Scaat: Incremental tracking with incomplete information. In T. Whitted, editor, Computer Graphics, pages
333–344. ACM Press, Addison-Wesley, August 3 - 8 1997.
Appendix A
Machine vision algorithms
A.1 Component labelling
The standard component labelling method finds all of the connected components in an image and assigns a unique label to each component. All of
the pixels forming a component are assigned with that label. This technique
forms a label map which describes the image in terms of the object that
each pixel represents (pixels that are not part of an object are defined as the
background).
The sequential form of the algorithm requires two passes of the image.
The first sweep scans the image from left to right and top to bottom. If a pixel is part of the foreground (the pixel is ‘on’), the two neighbouring pixels immediately above it and to its left are examined (these are the two of its 4-neighbours that have already been visited by the algorithm). The three possible cases are:
1: Neither of the previously examined neighbour pixels has been labelled.
⇒ The current pixel is assigned a new label - it seems to belong to a new object.
2: Only one of the neighbour pixels is labelled, or both of the neighbours
are assigned the same label.
⇒ The current pixel is assigned the same label as the connected neighbour pixel: this pixel belongs to the same object as its labelled neighbour.
3: The two neighbouring pixels have been assigned different labels.
⇒ Two different labels have been assigned to the same component. As
well as adding the current pixel to the component, the two labels assigned
to the two neighbour pixels must be combined by using an equivalence
table.
The equivalence table contains the information used to assign unique labels
to each connected component. During the first scan, all of the labels used to
define a component are declared equivalent. At the end of the pass the table
is renumbered so there are no gaps in the labels. Then during the second
sweep the label assigned to each pixel is renumbered using the equivalence
table as a lookup. This results in an m × n image (the label map) with each pixel's value identifying the region that the corresponding pixel in the original image belonged to. To summarise:
Connected components algorithm using 4-connectivity (A.1.1)
1: Scan the image left to right, top to bottom.
2: If the pixel is on, then
→ If only one of the upper or left neighbour has a label, then copy that
label.
→ If both neighbours have the same label, then copy that label.
→ If both neighbours have different labels, then copy the upper’s label
and enter the labels into the equivalence table.
→ Otherwise assign a new label to this pixel.
3: If there are more pixels goto step 2.
4: Find the lowest label for each equivalent set.
5: Renumber the labels in each equivalence set, so there are no missing labels.
6: Scan the picture. Replace each label by the lowest label in the equivalence
set.
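A sketch of this two-pass procedure in Python is given below; a union-find structure stands in for the equivalence table, and 4-connectivity is used as in algorithm A.1.1.

```python
def label_components(binary):
    """Two-pass connected component labelling with 4-connectivity.
    `binary` is a list of rows of 0/1 values; returns a label map of the
    same shape with 0 for background and 1..k for the components."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    parent = {}                        # equivalence table (union-find)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    next_label = 1
    # First pass: provisional labels and recorded equivalences.
    for i in range(height):
        for j in range(width):
            if not binary[i][j]:
                continue
            up = labels[i - 1][j] if i > 0 else 0
            left = labels[i][j - 1] if j > 0 else 0
            if up == 0 and left == 0:
                labels[i][j] = next_label        # a new object starts here
                parent[next_label] = next_label
                next_label += 1
            elif up and left and up != left:
                labels[i][j] = up
                union(up, left)                  # the two labels are equivalent
            else:
                labels[i][j] = up or left        # copy the single known label

    # Renumber the equivalence classes so the final labels are contiguous.
    renumber = {}
    for label in range(1, next_label):
        root = find(label)
        if root not in renumber:
            renumber[root] = len(renumber) + 1

    # Second pass: replace each provisional label by its class number.
    for i in range(height):
        for j in range(width):
            if labels[i][j]:
                labels[i][j] = renumber[find(labels[i][j])]
    return labels
```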
A.2 Boundary tracking
The boundary of a connected component S is the set of pixels of S that are
adjacent to the background S̄. A common approach is to track the pixels on
the boundary in a clockwise sequence.
A common boundary tracking algorithm [9] selects a starting pixel s ∈ S
known to be on the boundary and tracks the boundary until it reaches the
starting pixel s, assuming that the boundary is not on the edge of the image.
Boundary following algorithm (A.1.2)
1: Find the starting pixel s ∈ S for the region.
2: Let the current pixel be c. Set c = s and let the 4-neighbour to the ‘west’ of c be b ∈ S̄, i.e. a background pixel.
3: Let the eight 8-neighbours of c starting with b in a clockwise order be
n1 , n2 , ..., n8 . Find ni for the first i that is in S.
4: Set c = ni and b = ni−1 .
5: Repeat steps 3 and 4 until c = s.
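A sketch of this boundary follower is shown below, using the same clockwise 8-neighbour scan and the stopping condition of step 5; the region is assumed not to touch the image border.

```python
def trace_boundary(binary, start):
    """Clockwise boundary tracking (algorithm A.1.2). `binary` is a 0/1
    image and `start` a known boundary pixel (row, col) of the region;
    returns the boundary pixels in order."""
    # 8-neighbour offsets in clockwise order, starting to the 'west'
    # (rows increase downwards, as in image coordinates).
    offsets = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
               (0, 1), (1, 1), (1, 0), (1, -1)]

    boundary = [start]
    current = start
    backtrack = (start[0], start[1] - 1)      # background pixel west of s

    while True:
        # Position of the backtrack pixel among current's neighbours.
        b_index = offsets.index((backtrack[0] - current[0],
                                 backtrack[1] - current[1]))
        # Scan the 8-neighbours clockwise, starting just after the backtrack,
        # and move to the first foreground pixel found.
        for step in range(1, 9):
            k = (b_index + step) % 8
            ni = current[0] + offsets[k][0]
            nj = current[1] + offsets[k][1]
            if binary[ni][nj]:
                backtrack = (current[0] + offsets[(k - 1) % 8][0],
                             current[1] + offsets[(k - 1) % 8][1])
                current = (ni, nj)
                break
        if current == start:
            return boundary
        boundary.append(current)
```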
A.3 Locating Holes
The identification scheme used to classify object-indicators in this report
(section 3.4 on page 38) uses dark markers placed on the indicator. These
markers appear as ‘holes’ in the indicator to the vision system.
The set of all connected components of S̄ (the complement of S) that have points on the border of the image are background regions. All other components of S̄ are holes [9].
Figure A.1: Identifying an indicator by holes
To avoid ambiguity when searching for holes, a pixel's 4-neighbours are examined when connecting foreground pixels and its 8-neighbours are used when finding background pixels. If the same ‘connectedness’ is used then awkward situations can be encountered. A simple case is shown in table A.1: if the off pixels (0s) are connected then the on pixels should not be. If the foreground pixels were joined using an 8-connectedness mask then a 4-connectedness mask should be used to search for background pixels (to prevent this ambiguity) [9].
1 0
0 1

Table A.1: Connectedness ambiguity
The algorithm searches for the perimeter of each object. These perimeter pixel coordinates are sorted, so there is a list of column boundaries of the object for each row. Using this information the object is scanned using a connected-8 algorithm (similar to connected-4, but using the north-west neighbour as well). This allows the number of holes (and their moments) to be calculated in a similar way to the moments of objects.
Figure A.2: The area scanned when looking for holes
A.4 Orientation of an object's axis of elongation
The orientation of an object can be represented by the orientation of its axis
of elongation (provided it has one), as in section 3.1.1 on page 29.
Usually the axis of least second moment (the axis of least inertia in 2D) is
used as the axis of elongation.
This axis is the line for which the sum of the squared, perpendicular distances
between the object points and the line is minimised.
\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij}^2 B[i,j] \qquad (A.1)

where r_{ij} is the perpendicular distance from the object point [i, j] to the line.
The line is represented in polar coordinates to avoid numerical problems
when the line is vertical:
\rho = x\cos\theta + y\sin\theta \qquad (A.2)
where θ is the orientation of the normal to the line with the x axis, and ρ is
the perpendicular distance of the line from the origin.
The distance r of a point (x_i, y_i) from the line is obtained by plugging these coordinates into the equation of the line and finding the error from the real line:

r^2 = (x_i\cos\theta + y_i\sin\theta - \rho)^2 \qquad (A.3)
Plugging this distance representation into the minimisation criterion (equation A.1), setting the derivative of \chi^2 with respect to \rho to zero and solving for \rho gives:

\rho = \bar{x}\cos\theta + \bar{y}\sin\theta \qquad (A.4)
which shows that the regression line passes through the center of the object (\bar{x}, \bar{y}).

By setting x' = x - \bar{x} and y' = y - \bar{y} and substituting the value of \rho from A.4 into A.3 and then A.1, the minimisation problem becomes:

\chi^2 = a\cos^2\theta + b\sin\theta\cos\theta + c\sin^2\theta \qquad (A.5)
with the parameters (the second order moments) shown in equation 3.2 on
page 29:
a = \sum_{i=1}^{n}\sum_{j=1}^{m} (x_{ij} - \bar{x})^2 B[i,j]

b = 2 \sum_{i=1}^{n}\sum_{j=1}^{m} (x_{ij} - \bar{x})(y_{ij} - \bar{y}) B[i,j]

c = \sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - \bar{y})^2 B[i,j]
By inserting these coefficients into A.5, differentiating \chi^2 and solving for \theta yields the orientation of the axis of elongation from the x axis, as shown in equation 3.3 on page 30:

\tan 2\theta = \frac{b}{a - c} \qquad (A.6)

A.4.1 Efficient method of calculating orientation
Section 3.2.2 on page 34 demonstrated the use of more efficient algorithms to process the image. This required a change in the calculation of the second order moments, so that the first order moments (\bar{x}, \bar{y}) were not needed during the processing of each individual pixel.

Without setting x' = x - \bar{x} and y' = y - \bar{y}, r^2 can be shown to be:

r^2 = (x - \bar{x})^2\cos^2\theta + (2xy + 2\bar{x}\bar{y} - \bar{x}y - x\bar{y})\sin\theta\cos\theta + (y - \bar{y})^2\sin^2\theta \qquad (A.7)

Substituting this into the minimisation equation A.1 and using the gray level intensities F_{ij} results in the minimisation equation shown in A.5, with the second order moments demonstrated in equation 3.4 on page 34:
a = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij}^2 - 2\bar{x} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij} + \bar{x}^2 \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}

b = 2\left[ \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij} y_{ij} - \bar{x} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} y_{ij} - \bar{y} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} x_{ij} + \bar{x}\bar{y} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} \right]

c = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} y_{ij}^2 - 2\bar{y} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij} y_{ij} + \bar{y}^2 \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}
These could be used as before to calculate the orientation using equation
A.6.
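As an illustration, the sketch below computes the axis of elongation directly from a list of object pixels using equation A.6 (the simple binary-region form of the moments rather than the gray-level form above).

```python
import math

def axis_of_elongation(region_pixels):
    """Orientation (degrees, in [0, 180)) of the axis of elongation of a
    set of object pixels, using tan(2*theta) = b / (a - c) as in A.6.

    `region_pixels` is a list of (x, y) coordinates of the object's
    pixels, every pixel weighted equally (a binary region)."""
    n = len(region_pixels)
    x_bar = sum(x for x, _ in region_pixels) / n
    y_bar = sum(y for _, y in region_pixels) / n

    a = sum((x - x_bar) ** 2 for x, _ in region_pixels)
    b = 2 * sum((x - x_bar) * (y - y_bar) for x, y in region_pixels)
    c = sum((y - y_bar) ** 2 for _, y in region_pixels)

    # atan2 keeps the calculation defined when a == c.
    theta = 0.5 * math.atan2(b, a - c)
    return math.degrees(theta) % 180.0
```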
A.5 The heading of an object-indicator
The orientation of the object gave no indication as to which way along the
axis the object was facing. Section 4.2.1 on page 60 explained the method
used to mark the front of the object by the hole markers. The center of the
hole markers gave an indication as to which way along the axis of elongation
the object was facing.
As described in section 3.4 (page 38) the pixel moments of the hole markers
are taken into consideration when calculating the position and orientation of
the indicator. This provides a more accurate measure of the moments.
After calculating each individual hole's moments, the center of mass position of the holes could be found by finding the average position of each of the holes (every pixel that was part of a hole had equal weighting). The position of this in relation to the center of mass position of the whole indicator gave a clue as to the heading of the indicator.
Figure A.3: Indication of object heading from the center of holes
This calculation was not used as the indicator’s heading, because the mid
position of the holes could not be easily determined when placing the markers.
Instead this heading determined whether the more accurate heading from the
orientation calculations in the range [0◦, 180◦] needed to be shifted by 180◦ into the range [180◦, 360◦].
Figure A.4: Deriving the indicator heading
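A sketch of this disambiguation step is shown below; it assumes, as in section 4.2.1, that the hole markers lie towards the front of the indicator, and the coordinate convention is an assumption of the sketch.

```python
import math

def indicator_heading(orientation_deg, indicator_centre, holes_centre):
    """Turn the [0, 180) orientation of an indicator into a full [0, 360)
    heading by checking which side of the indicator the centre of the
    hole markers lies on (a sketch of the idea in figure A.4;
    coordinates are image pixels and the holes are assumed to mark the
    front of the indicator)."""
    # Unit vector along the candidate heading.
    hx = math.cos(math.radians(orientation_deg))
    hy = math.sin(math.radians(orientation_deg))
    # Vector from the indicator centre towards the centre of the holes.
    dx = holes_centre[0] - indicator_centre[0]
    dy = holes_centre[1] - indicator_centre[1]
    # If the holes lie behind the candidate heading, the true heading is
    # the opposite direction along the axis of elongation.
    if hx * dx + hy * dy < 0:
        return (orientation_deg + 180.0) % 360.0
    return orientation_deg
```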
Appendix B
Robot implementation algorithms
B.1 Line following
The leading bee robot shown in section 6.1.2 (page 81) completed a repeatable
waggle dance by following a dance path traced on the arena floor with silver
tape. The algorithm used to follow the silver tape was:
1: Wait for <delay> seconds. This allowed the user to position the robot
near to the line, and place the indicator before the line following process
started.
2: Move forward until the robot had found the line. This had occurred when
the value from the light sensor was greater than the threshold determined
for when the sensor was on the line.
3: While the light sensor value was greater than <threshold> move forward.
4: If the light sensor value was less than <threshold> then pivot around
the turning axis scanning for the line:
A: Set <turn counter> to limit the amount that the robot turned. The
robot turned for a limited number of processor cycles.
B: Clear <counter>. This counted up to <turn counter> then the robot
stopped turning in that direction.
C: If the light sensor value was greater than <threshold>, then the robot
was over the line. The sweep process could stop. Goto stage 3.
D: While the light-sensor value was less than <threshold> AND
<counter> was less than <turn counter>:
• Turn in direction of the last turn.
E: If light-sensor value was less than <threshold> AND <counter> was
greater than <turn counter>:
• Reverse turn. The robot started to sweep in the opposite direction.
• Increment <turn counter> to allow the robot to sweep for a longer
period.
• Repeat the sweep process. Goto stage B.
5: Repeat from step 3.
• This algorithm allowed the robot to scan for the line when it had moved
off it. The robot would sweep in one direction for a small amount of time,
then if the line had not been found, it would sweep in the other direction
for a slightly longer period. This would be repeated until the line was
found, which ensured the robot eventually turned the shortest amount to
find the line.
• After losing the line the robot would initially turn in the direction it had
last turned. Having found the line (after performing a sweeping operation)
the robot would be just touching the line, but not aligned with the line.
After moving it would move off the line again, and would most likely find
the line a short distance in the direction it had last turned in.
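A sketch of this sweeping line-follower is given below; light_sensor, drive_forward, pivot and stop are stand-ins for the real MINDSTORM calls, and the sweep is counted in control-loop iterations rather than RCX timer ticks.

```python
LINE_THRESHOLD = 60          # raw light-sensor value over the silver tape

def follow_line(light_sensor, drive_forward, pivot, stop, initial_sweep=10):
    """A sketch of the sweeping line-follower of appendix B.1.

    light_sensor() returns the raw sensor reading; drive_forward(),
    pivot(direction) and stop() drive the motors (all assumed helpers)."""
    direction = 1                         # last turn direction (+1 / -1)
    while True:
        if light_sensor() > LINE_THRESHOLD:
            drive_forward()               # on the line: keep going
            continue

        # Lost the line: sweep back and forth with a growing limit so the
        # shortest turn back onto the line tends to be found first.
        turn_limit = initial_sweep
        while light_sensor() <= LINE_THRESHOLD:
            counter = 0
            while (light_sensor() <= LINE_THRESHOLD
                   and counter < turn_limit):
                pivot(direction)          # keep turning the same way
                counter += 1
            if light_sensor() <= LINE_THRESHOLD:
                direction = -direction    # reverse the sweep direction
                turn_limit += initial_sweep   # and sweep for longer
        stop()                            # line found: stop the sweep
```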
B.2 Sending a byte command via the MINDSTORM transceiver
All commands to the RCX unit are sent byte by byte. A simple message can
be sent via the IR tower by filling in a 9-byte transmission message packet
Figure B.1: Line following: Scanning for the line
(as shown in table B.1). The IR tower can operate in two modes; this section only documents the ‘send-slow’ mode. Also notice that the complement of each byte, and a final checksum, need to be included as part of the message.
• The first three bytes of the message (bytes 0–2) form the RCX sync pattern.
• Byte 3 is the RCX Message flag; byte 4 contains its complement.
• Byte 5 contains the actual byte message sent to the RCX; byte 6 contains its complement.
• Byte 7 contains the message checksum, which is the sum of byte 3 (the RCX Message flag) and byte 5 (the actual message); byte 8 contains the complement of the checksum.
Byte   Contents        Description
0      0x55            RCX sync pattern
1      0xFF            RCX sync pattern
2      0x00            End of sync pattern
3      RCX Message     Message packet indicator
4      ∼RCX Message    Complement of byte 3
5      Message         Actual message sent
6      ∼Message        Complement of byte 5
7      Checksum        byte(3) + byte(5)
8      ∼Checksum       Complement of byte 7

Table B.1: Contents of a transmit-message packet sent to the RCX IR tower
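A sketch of building such a packet is shown below; the value used for the RCX Message flag is an assumption that should be checked against [14], and writing the packet to the serial port driving the IR tower is left to the caller.

```python
def build_rcx_message_packet(message_byte):
    """Build the 9-byte 'send-slow' packet of table B.1 for a single
    byte message to the RCX (a sketch based on the layout above)."""
    RCX_MESSAGE = 0xF7                     # assumed 'RCX Message' opcode value
    checksum = (RCX_MESSAGE + message_byte) & 0xFF
    packet = bytes([
        0x55, 0xFF, 0x00,                  # sync pattern
        RCX_MESSAGE, RCX_MESSAGE ^ 0xFF,   # message flag and its complement
        message_byte, message_byte ^ 0xFF, # payload and its complement
        checksum, checksum ^ 0xFF,         # checksum and its complement
    ])
    return packet

# The packet would then be written to the serial port driving the IR tower,
# e.g. with a serial library (illustrative only):
#   port.write(build_rcx_message_packet(0x01))   # send command byte 0x01
```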