Transcription

Course content from "Bildanalyse und Computergrafik" (Image Analysis and Computer Graphics)
Institute for Computer Graphics

Franz Leberl
28 January 2002
Contents

0 Introduction
   0.1 Using Cyber-Cities as an Introduction
   0.2 Introducing the Lecturer
   0.3 From Images to Geometric Models
   0.4 Early Experiences in Vienna
   0.5 Geometric Detail
   0.6 Automation
   0.7 Modeling Denver
   0.8 The Inside of Buildings
   0.9 As-Built Documentation: Modeling the Inside of Things in Industry
   0.10 Modeling Rapidly
   0.11 Vegetation
   0.12 Coping with Large Datasets
   0.13 Non-Optical Sensing
   0.14 The Role of the Internet
   0.15 Two Systems for Smart Imaging
   0.16 International Center of Excellence for City Modeling
   0.17 Applications
   0.18 Telecom Applications of City Models

1 Characterization of Images
   1.1 The Digital Image
   1.2 The Image as a Raster Data Set
   1.3 System Concepts
   1.4 Displaying Images on a Monitor
   1.5 Images as Raster Data
   1.6 Operations on Binary Raster Images
   1.7 Algebraic Operations on Images

2 Sensing
   2.1 The Most Important Sensors: The Eye and the Camera
   2.2 What is a Sensor Model?
   2.3 Image Scanning
   2.4 The Quality of Scanning
   2.5 Non-Perspective Cameras
   2.6 Heat Images or Thermal Images
   2.7 Multispectral Images
   2.8 Sensors to Image the Inside of Humans
   2.9 Panoramic Imaging
   2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images
   2.11 Making Images with Sound
   2.12 Passive Radiometry
   2.13 Microscope and Endoscope Imaging
   2.14 Object Scanners
   2.15 Photometry
   2.16 Data Garments
   2.17 Sensors for Augmented Reality
   2.18 Outlook

3 Raster-Vector-Raster Convergence
   3.1 Drawing a Straight Line
   3.2 Filling of Polygons
   3.3 Thick Lines
   3.4 The Transition from Thick Lines to Skeletons

4 Morphology
   4.1 What is Morphology
   4.2 Dilation and Erosion
   4.3 Opening and Closing
   4.4 Morphological Filters
   4.5 Shape Recognition by a Hit-or-Miss Operator
   4.6 Some Additional Morphological Algorithms

5 Color
   5.1 Gray Value Images
   5.2 Color Images
   5.3 Tri-Stimulus Theory, Color Definitions, CIE Model
   5.4 Color Representation on Monitors and Films
   5.5 The 3-Dimensional Models
   5.6 CMY Model
   5.7 Using CMYK
   5.8 HSI Model
   5.9 YIQ Model
   5.10 HSV and HLS Models
   5.11 Image Processing with RGB versus HSI Color Models
   5.12 Setting Colors
   5.13 Encoding in Color
   5.14 Negative Photography
   5.15 Printing in Color
   5.16 Ratio Processing of Color Images and Hyperspectral Images

6 Image Quality
   6.1 Introduction
   6.2 Definitions
   6.3 Gray Value and Gray Value Resolutions
   6.4 Geometric Resolution
   6.5 Geometric Accuracy
   6.6 Histograms as a Result of Point Processing or Pixel Processing

7 Filtering
   7.1 Images in the Spatial Domain
   7.2 Low-Pass Filtering
   7.3 The Frequency Domain
   7.4 High-Pass Filters - Sharpening Filters
   7.5 The Derivative Filter
   7.6 Filtering in the Spectral Domain / Frequency Domain
   7.7 Improving Noisy Images
   7.8 The Ideal and the Butterworth High-Pass Filter
   7.9 Anti-Aliasing
      7.9.1 What is Aliasing?
      7.9.2 Aliasing by Cutting off High Frequencies
      7.9.3 Overcoming Aliasing with an Unweighted Area Approach
      7.9.4 Overcoming Aliasing with a Weighted Area Approach

8 Texture
   8.1 Description
   8.2 A Statistical Description of Texture
   8.3 Structural Methods of Describing Texture
   8.4 Spectral Representation of Texture
   8.5 Texture Applied to Visualisation
   8.6 Bump Mapping
   8.7 3D Texture
   8.8 A Review of Texture Concepts by Example
   8.9 Modeling Texture: Procedural Approach

9 Transformations
   9.1 About Geometric Transformations
   9.2 Problem of a Geometric Transformation
   9.3 Analysis of a Geometric Transformation
   9.4 Discussing the Rotation Matrix in Two Dimensions
   9.5 The Affine Transformation in 2 Dimensions
   9.6 A General 2-Dimensional Transformation
   9.7 Image Rectification and Resampling
   9.8 Clipping
      9.8.1 Half Space Codes
      9.8.2 Trivial Acceptance and Rejection
      9.8.3 Is the Line Vertical?
      9.8.4 Computing the Slope
      9.8.5 Computing the Intersection A in the Window Boundary
      9.8.6 The Result of the Cohen-Sutherland Algorithm
   9.9 Homogeneous Coordinates
   9.10 A Three-Dimensional Conformal Transformation
   9.11 Three-Dimensional Affine Transformations
   9.12 Projections
   9.13 Vanishing Points in Perspective Projections
   9.14 A Classification of Projections
   9.15 The Central Projection
   9.16 The Synthetic Camera
   9.17 Stereopsis
   9.18 Interpolation versus Transformation
   9.19 Transforming a Representation
      9.19.1 Presenting a Curve by Samples and an Interpolation Scheme
      9.19.2 Parametric Representations of Curves
      9.19.3 Introducing Piecewise Curves
      9.19.4 Rearranging Entities of the Vector Function Q
      9.19.5 Showing Examples: Three Methods of Defining Curves
      9.19.6 Hermite's Approach
   9.20 Bezier's Approach
   9.21 Subdividing Curves and Using Spline Functions
   9.22 Generalization to 3 Dimensions
   9.23 Graz and Geometric Algorithms

10 Data Structures
   10.1 Two-Dimensional Chain-Coding
   10.2 Two-Dimensional Polygonal Representations
   10.3 A Special Data Structure for 2-D Morphing
   10.4 Basic Concepts of Data Structures
   10.5 Quadtree
   10.6 Data Structures for Images
   10.7 Three-Dimensional Data
   10.8 The Wire-Frame Structure
   10.9 Operations on 3-D Bodies
   10.10 Sweep-Representations
   10.11 Boundary-Representations
   10.12 A B-Rep Data Structure
   10.13 Spatial Partitioning
   10.14 Binary Space Partitioning BSP
   10.15 Constructive Solid Geometry, CSG
   10.16 Mixing Vectors and Raster Data
   10.17 Summary

11 3-D Objects and Surfaces
   11.1 Geometric and Radiometric 3-D Effects
   11.2 Measuring the Surface of an Object (Shape from X)
   11.3 Surface Modeling
   11.4 Representing 3-D Objects
   11.5 The z-Buffer
   11.6 Ray-Tracing
   11.7 Other Methods of Providing Depth Perception

12 Interaction of Light and Objects
   12.1 Illumination Models
   12.2 Reflections from Polygon Facets
   12.3 Shadows
   12.4 Physically Inspired Illumination Models
   12.5 Regressive Ray-Tracing
   12.6 Radiosity

13 Stereopsis
   13.1 Binocular Vision
   13.2 Stereoscopic Vision
   13.3 Stereo Imaging
   13.4 Stereo-Visualization
   13.5 Non-Optical Stereo
   13.6 Interactive Stereo-Measurements
   13.7 Automated Stereo-Measurements

14 Classification
   14.1 Introduction
   14.2 Object Properties
   14.3 Features, Patterns, and a Feature Space
   14.4 Principle of Decisions
   14.5 Bayes Theorem
   14.6 Supervised Classification
   14.7 Real-Life Example
   14.8 Outlook

15 Resampling
   15.1 The Problem in Examples of Resampling
   15.2 A Two-Step Process
      15.2.1 Manipulation of Coordinates
      15.2.2 Gray Value Processing
   15.3 Geometric Processing Step
   15.4 Radiometric Computation Step
   15.5 Special Case: Rotating an Image by Pixel Shifts

16 About Simulation in Virtual and Augmented Reality
   16.1 Various Realisms
   16.2 Why Simulation?
   16.3 Geometry, Texture, Illumination
   16.4 Augmented Reality
   16.5 Virtual Environments

17 Motion
   17.1 Image Sequence Analysis
   17.2 Motion Blur
   17.3 Detecting Change
   17.4 Optical Flow

18 Man-Machine-Interfacing
   18.1 Visualization of Abstract Information
   18.2 Immersive Man-Machine Interactions

19 Pipelines
   19.1 The Concept of an Image Analysis System
   19.2 Systems of Image Generation
   19.3 Revisiting Image Analysis versus Computer Graphics

20 Image Representation
   20.1 Definition of Terms
      20.1.1 Transparency
      20.1.2 Compression
      20.1.3 Progressive Coding
      20.1.4 Animation
      20.1.5 Digital Watermarking
   20.2 Common Image File Formats
      20.2.1 BMP: Microsoft Windows Bitmap
      20.2.2 GIF: Graphics Interchange Format
      20.2.3 PICT: Picture File Format
      20.2.4 PNG: Portable Network Graphics
      20.2.5 RAS: Sun Raster File
      20.2.6 EPS: Encapsulated PostScript
      20.2.7 TIFF: Tagged Image File Format
      20.2.8 JPEG: Joint Photographic Experts Group
   20.3 Video File Formats: MPEG
   20.4 New Image File Formats: Scalable Vector Graphics - SVG

A Algorithms and Definitions (Algorithmen und Definitionen)

B Overview of Questions (Fragenübersicht)
   B.1 Group 1
   B.2 Group 2
   B.3 Group 3
Chapter 0
Introduction
0.1 Using Cyber-Cities as an Introduction
We introduce the subject of "digital processing of visual information", also denoted as "digital image processing" and "computer graphics", by means of one particular application, namely the 3D computer modelling of our cities. This is part of the wider topic of the so-called "virtual habitat". "Modelling cities": what do we mean by that? The example in Slide 0.7 shows a traditional representation of a city, in this particular example the "Eisernes Tor" in Graz. In two dimensions we see the streetcar tracks, the Mariensäule, buildings and vegetation. This is the status quo of current urban 2-D computer graphics.
The new approach is to represent this in three dimensions, as shown in Slide 0.8. The two-dimensional map of the city is augmented to include the third dimension, thus the elevations, and in order to render, represent or visualise the city we add photographic texture to create as realistic a model of the city as possible. Once we have that we can stroll through the city, we can inspect the buildings, we can read the signs and derive from them what is inside the buildings.
The creation of the model for this city is a subject of "image processing". The rendering of the model is the subject of "computer graphics". These two belong together and constitute a field denoted as "digital processing of visual information".
The most sophisticated recent modelling of a city was achieved for a section of Philadelphia. This employed a software package called "MicroStation" and was done by hand in great detail. In this case the detail includes vegetation, virtual trees, water fountains and people. I am attempting here to illustrate the concepts of "computer graphics" and "image processing" by talking about Cyber-Cities, namely how to create them from sensor data and how to visualise them. And this is the subject of this introduction.
0.2 Introducing the Lecturer
Before we go into the material, permit me to introduce myself. I have roots both in Graz and in Boulder (Colorado, USA). Since 1992 I have been affiliated with the Technische Universität Graz, where I am a Professor of Computer Vision and Graphics. Since 1985 I have also been affiliated with a company in the United States called Vexcel Corporation. In both places, the Vexcel Corporation and the University, cyber-cities play a role in the daily work. Vexcel Corporation in the US operates in four technical fields:
1. It builds systems to process radar images
2. It deals with satellite receiving stations, to receive large quantities of images that are transmitted from satellites
3. It deals with close-range photogrammetry for "as-built" documentation, and
4. It deals with images from the air
Slide 0.19 is an example showing a remote sensing satellite ground receiving station installed in Hiroshima (Japan), carried on a truck to be movable. Slide 0.20 shows a product of the Corporation, namely a software package to process certain radar images interferometrically. Towards the end of this class we will talk briefly about this interferometry. What you see in Slide 0.20 are interferometric "fringes" obtained from images, using the phase differences between the two images. The fringes indicate the elevation of the terrain, in this particular case Mt. Fuji in Japan.
Another software package models the terrain and renders realistic-looking images by superimposing the satellite images over the shape of the terrain with its mountains and valleys. Slide 0.22 shows another software package to convert aerial photography to so-called "ortho-photos", a concept we will explain later in this class. Then we have an application, a software package called Foto-G, which supports the modelling of existing plants, performing a task called "as-built documentation". You take images of a facility or plant, extract from the image geometry the location and dimensions of pipes and valves, and obtain in a "reverse engineering mode" so-called CAD (computer-aided design) drawings of the facility.
0.3 From Images to Geometric Models
We proceed to a series of sub-topics to discuss the ideas of city modeling. I would like to convey an idea of what the essence of "digital processing of visual information" is. What we see in Slide 0.25 is, on the left, part of an aerial photograph of a new housing development, and on the right, information extracted from the image on the left using a process called "stereoscopy", representing the small area that is marked in red on the right side. We are observing here a transition from images of an object to a model of that object.
Such images as in Slide 0.26 show so-called "human-scale objects" like buildings, fences, trees and roads. But images may also show our entire planet. There have been various projects in Graz to address the extraction of information from images, and there is a bundle of problems available as topics for a Diplomarbeit or a Dissertation, for example to address the optimum geometric scale and geometric resolution needed for a specific task at hand. If I want to model a building, what is the required optimum image resolution? We review in Slide 0.29 the downtown of Denver at 30 cm per pixel. Slide 0.30 is the same downtown at 1.20 m per pixel. Finally, in Slide 0.31 we have 4 meters per pixel. Can we map the buildings, and what accuracy can we achieve in mapping them?
0.4 Early Experiences in Vienna
Our Institute at the Technical University in Graz got involved in city modelling in 1994, when we were invited by the Magistrat of Vienna to model a city block consisting of 29 buildings inside the block and another 25 buildings surrounding the block. The block is bounded by four streets in the 7th district of Vienna. The work was performed by two students in two diploma theses, and the initial results were of course a LEGO-type representation of each building. The building itself cannot be recognised, as seen in the example of a generic building. It can be recognised only if we apply the photographic texture. We can take this either from photographs taken at street level or from aerial photography taken from an airplane. The entire city block was modelled, but some photographic texture was missing. In particular, the photographic texture was missing in the courtyards, and so they show as black or grey here. When this occurs, the representation is without photographic texture and is instead in the form of a flat-shaded representation.
Slide 0.37 looks at the roofscape, and we see that perhaps we should model the chimneys as shown here. However, the skylights were not modeled. What can we do with these data? We can walk or fly through the cities. We can assess changes, for example by removing a building and replacing it by a new one. We call this "virtual reality", but scientists often prefer the expression "virtual environment", since "virtual" and "reality" represent a contradiction in terms. This differs of course from photographic reality, which is more detailed and more realistic by showing great geometric detail: wires, dirt on the road, cars, the effect of weather. There is yet another type of reality, namely "physical reality", when we are out there in a city and we feel the wetness in our shoes, we feel the cold in the air, we hear the noise of birds and the screeching of cars. So we see various levels of reality: physical, photographic and virtual reality.
0.5 Geometric Detail
What geometric detail do we need when we model a city? Let's take the example of a roof. Slide 0.44 is a roof shape extracted for the Vienna example. We have not applied photographic texture to the roof, but instead some generic computer texture. We will of course talk later about texture, and I will try to explain different types of texture for use in rendering for computer graphics. If we apply this kind of generic texture we lose all information about the specific characteristics of this roof. What we would like to have is the roof shown with its chimneys. Maybe we need the skylights as well, for the fire brigade, in order to direct people to an exit through the roof in the case of a catastrophe. There is a topic here for a Diplomarbeit or Dissertation to study the amount of geometric detail needed in the presence of photographic texture: the trade-off between photographic texture and geometric detail. To illustrate this further, let us take a look at the same roof with its skylights and chimneys and now use photographic texture to illustrate how this roof looks. If we take photographic texture, and if we have some chimneys, and if we render this roof from another perspective than that from which the photograph was taken, the chimneys will look very unnatural. So we need to do some work and create the geometric model of the chimneys.
If we employ that model and now superimpose the photographic texture over it, we see that we have sunshine casting shadows, and certain areas of the roof are covered by pixels from the shadows left by the chimneys. If the sunshine is from another side, say in the morning, but the picture was taken in the afternoon, we have wrong shadows. So we need to fix this by eliminating the shadows in the texture. We introduce the shadow in a proper rendering by a computation. We also need to fill in those pixels that are covered by the perspective distortion of the chimneys, and use generic pixels of the roof to fill in the areas where no picture exists. Slide 0.50 is the final result: we have removed the shadow and we have filled in the pixels. We now have the best representation of that roof with its chimneys, and we can render it correctly in the morning and in the afternoon, with rain or with sunshine.
0.6 Automation
All of this modeling of cities is expensive, because it is based on manual work. In order to reduce the cost of creating such models one needs to automate their creation. Automation is a large topic and is available for many Diplomarbeiten and many Dissertationen. Let me illustrate automation using our city models of Graz. There already exist two-dimensional descriptions, so the task of automation here is to achieve the transition from two to three dimensions. Slide 0.52 is a two-dimensional, so-called geographic information system (GIS) of a certain area around the Schlossberg in Graz. Let's take a look at this particular building in Slide 0.53. We have a total of five aerial photographs; three of them, showing that particular building, are presented in Slide 0.54. The five photographs can be converted into so-called edge images, a classical component of image processing. There are topics hidden here for more Diplomarbeiten and Dissertationen.
We also convert the input GIS data into an output edge image. This edge image from the GIS vectors can now be the basis for a match between the five edge images and the two-dimensional GIS image. They will not fit, because the edges of the roof as shown here are elevated and therefore perspectively distorted, whereas the other polygon is the representation of the footprint of the building.
Algorithm 1 Affine matching
1: Read in and organize one or more digital photos with their camera information
2: Compute an edge image for each of the photos
3: Read in and organize the polygons of each building footprint
4: Project the polygon into each photo's edge image
5: Vector-raster convert the polygon in each edge image, creating a polygon image
6: Compute a distance transform for each polygon image
7: repeat
8:    Compute the distance between each edge image and its polygon image using the distance transform
9:    Change the geometry of the polygon image
10: until the distance no longer gets reduced
There is a process called "affine matching" which allows us to match the edge images computed from the aerial photos with the representation that was originally a vector data structure. Affine matching is a Graz innovation: its purpose is to match two different data structures, namely raster and vector, which in addition are geometrically different; the footprint of the house is in an orthographic projection, while the roofline of the house is in a central perspective projection. Affine matching overcomes these differences and finds the best possible matches between the data structures. The result in Slide 0.58 shows how the footprint was used to match the roofline of the building using this affine matching technique. The algorithm itself is described rather simply (see Algorithm 1). Now, the same idea of matching vectors with images is shown in the illustration of Slide 0.59, where we see in yellow the primary position of a geometric shape, typically the footprint, and in red the roofline. We need to match the roofline with the footprint. Slide 0.60 is another example of these matches, and Slide 0.61 is the graphic representation of the roofline.
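To make the loop of Algorithm 1 concrete, the following sketch shows one way the distance-transform step could be implemented. It assumes NumPy/SciPy and a very crude vector-to-raster conversion; the function names, the random-perturbation update of the affine parameters and all numeric settings are illustrative assumptions, not the original Graz implementation.

    # A sketch of the distance-transform matching loop of Algorithm 1 (assumed, not original code).
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def rasterize_polygon(poly_xy, shape):
        # Crude stand-in for step 5: mark the polygon outline in a binary image.
        img = np.zeros(shape, dtype=bool)
        pts = np.asarray(poly_xy, dtype=float)
        for (x0, y0), (x1, y1) in zip(pts, np.roll(pts, -1, axis=0)):
            n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
            xs = np.linspace(x0, x1, n).round().astype(int)
            ys = np.linspace(y0, y1, n).round().astype(int)
            ok = (xs >= 0) & (xs < shape[1]) & (ys >= 0) & (ys < shape[0])
            img[ys[ok], xs[ok]] = True
        return img

    def match_cost(dist_to_edges, poly_xy, shape):
        # Mean distance from the projected polygon outline to the nearest photo edge.
        outline = rasterize_polygon(poly_xy, shape)
        return dist_to_edges[outline].mean() if outline.any() else np.inf

    def affine_match(edge_img, poly_xy, steps=500, scale=0.5):
        # Steps 6-10: compute the distance transform once, then greedily perturb
        # an affine transform (matrix A, shift t) as long as the cost keeps dropping.
        dist_to_edges = distance_transform_edt(~edge_img)   # zero on edge pixels
        A, t = np.eye(2), np.zeros(2)
        pts = np.asarray(poly_xy, dtype=float)
        best = match_cost(dist_to_edges, pts @ A.T + t, edge_img.shape)
        for _ in range(steps):
            dA = np.random.randn(2, 2) * 0.01 * scale
            dt = np.random.randn(2) * scale
            cost = match_cost(dist_to_edges, pts @ (A + dA).T + (t + dt), edge_img.shape)
            if cost < best:                                  # keep improving perturbations only
                best, A, t = cost, A + dA, t + dt
        return A, t, best

In the real system the footprint polygon is first projected from the orthographic GIS geometry into each photo's central perspective (step 4), a step omitted in this sketch.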
0.7 Modeling Denver
We now talk about a method to model all the buildings of a city like Denver (Colorado, USA). We have an aerial photographic coverage of the entire city. Slide 0.63 is the downtown area of Denver. From overlapping aerial photographs we can automatically create a digital elevation model (DEM) by a process called stereo matching. A DEM is a representation that assigns a z-elevation to each (x, y) location of a regular grid mesh of points. So we have a set of regularly spaced (x, y) locations where we know the z-value of the terrain. We invite everybody to look into a Diplomarbeit or Dissertation topic of taking this kind of digital elevation model and creating from it what is called the "Bald Earth". One needs to create a filter which will take the elevation model and erase all the trees and all the buildings, so that the only thing that is left is the Bald Earth. What is being "erased" are towers, trees and buildings. That process needs an intelligent low-pass filter. We will talk about low-pass filters later in this class. Slide 0.67 is the result, a so-called Bald Earth DEM (das DEM der kahlen Erde). The difference between the two DEMs, namely the Bald Earth DEM and the full DEM, is of course the elevation of the vertical objects that exist on top of the Bald Earth. These are the buildings, the cars and the vegetation. This is another topic one could study. Now we need to look at the difference DEM and automatically extract the footprints of buildings. We can do that with some morphological operations, where we close the gaps and straighten the edges of buildings, and then compute the contours of the buildings. Finally we obtain the buildings and place them on top of the Bald Earth.
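A minimal sketch of this idea, assuming the DEM is a NumPy array of elevations on a regular grid; the window size and the 3 m height threshold are illustrative assumptions, not values from the lecture:

    from scipy.ndimage import grey_opening, binary_closing

    def bald_earth(dem, window=31):
        # A grey-value morphological opening acts as an "intelligent low-pass filter":
        # objects narrower than the window (buildings, trees, towers) are erased.
        return grey_opening(dem, size=(window, window))

    def above_ground_mask(dem, window=31, min_height=3.0):
        # Threshold the difference DEM, then close gaps and straighten edges.
        diff = dem - bald_earth(dem, window)
        mask = diff > min_height
        return binary_closing(mask, iterations=2)

The contours of the resulting mask are then the candidate building footprints that get placed back on top of the Bald Earth, as described above.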
When we have done that, we can superimpose the photographic texture over the geometric shapes of the building "boxes" (the box models). We get a photorealistic model of all of Denver, generated entirely automatically from aerial photographs. There exist multiple views of the same area of Denver.
0.8 The Inside of Buildings
City models are a subject not only of the outside of buildings, but also of their inside. Slide 0.74 is the Nationalbibliothek in Vienna, in which there is a Representation Hall (Prunksaal). If one takes the architect's drawings of that building, one can create a wire-mesh representation as illustrated in Slide 0.75, consisting of arcs and nodes. We can render this without removal of the hidden surfaces and hidden lines to obtain this example.
We can go inside this structure, take pictures and use photographic texture to photo-realistically render the inside of the Prunksaal in a manner that a visitor to the Prunksaal will never see; we cannot fly through the Prunksaal like a bird. We can also see the Prunksaal in the light that computer rendering permits us to create. We can even go back a hundred years and show the Prunksaal as it was a hundred years ago, before certain areas were converted into additional shelf space for books. There is a Diploma and Dissertation topic hidden in developments to produce images effectively and efficiently inside a building. An example is shown in Slide 0.80 and Slide 0.81 of the ceiling, imaging it efficiently in all its detail and colorful glory.
Yet another subject is how to model objects inside a room, like this statue of Emperor Charles VI. He is shown in Slide 0.82 as a triangulated mesh created from a point cloud. We will talk a little bit about triangulated meshes later. Slide 0.82 is based on 20,000 points that are triangulated in a non-trivial process. Slide 0.83 is a photo-realistic rendering of the triangulated point cloud, with each triangle superimposed with the photographic texture that was created from photographs. A good scientific topic for Diplomarbeiten or Dissertationen is the transition from point clouds to surfaces. A non-trivial problem becomes apparent when we look at the hand of the emperor. We need to make sure to connect points into triangles that should topologically be connected. And we do not want the emperor to have hands like the feet of a duck.
0.9 As-Built Documentation: Modeling the Inside of Things in Industry
There exist not only cultural monuments, but also industrial plants. This goes back to the idea of "inverse" or "reverse engineering": to create drawings of a facility or a building, for example of a refinery. The refinery may have been built 30 or 40 years ago and the drawings are no longer available, since there was no CAD at that time. We take pictures of the inside of a building, using perhaps thousands of pictures. We re-establish relationships between the pictures. We need to know from where they were taken. One picture overlaps with another picture. Which pictures show the same objects and which do not? That is done by developing the graph in Slide 0.89. Each node of the graph is a "postage stamp" of the picture, and the arcs between these nodes describe the relationships. If there is no arc, then there is no relationship. Any image can be called up on a monitor. Also, pairs of images can be set up. We can point to a point in one image, and a process will look for the corresponding point in the other overlapping image or images. The three-dimensional location of the point we have pointed at in only one image will be shown in the three-dimensional rendering of the object. So again, "from images to objects" means in this case "reverse engineering" or "as-built documentation". Again there are plenty of opportunities for research and study in the area of automation of all these processes.
A classical topic is the use of two pictures of some industrial structure to find correspondences of the same object in both images without any knowledge about the camera or the object. By eye we can point to the same feature in two images, but this is not trivial to do by machine if we have no geometric relationships established between the two images that would limit the search areas. One idea is to find many candidate features in both images and then determine by some logic which of those features might be identical. So we find one group of features in one image and another group in the other image. Then we decide which points or objects belong together. The result is shown as highlighted circles.
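A minimal sketch of this "find candidates, then decide what belongs together" idea, using OpenCV's ORB detector and a brute-force matcher as modern stand-ins; the lecture does not prescribe these particular operators, and the parameters are illustrative:

    import cv2

    def candidate_correspondences(img1, img2, max_matches=50):
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(img1, None)     # feature group in image 1
        kp2, des2 = orb.detectAndCompute(img2, None)     # feature group in image 2
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
        return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches[:max_matches]]

In practice such candidate pairs would still be screened by a geometric consistency test before being accepted as true correspondences.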
A similar situation is illustrated in Slide 0.95, however with test targets to calibrate a camera system for as-built documentation. We automatically extract all the test objects (circles) from the images. We can see a three-dimensional pattern of these calibration targets in Slide 0.96 and Slide 0.97.
Now the same approach can also be applied to the outside of buildings, as shown in Slide 0.98 with three photographs of a railroad station. The three images are input to an automatic algorithm to find edges; the edges get pruned and reduced so that we are only left with significant edges that represent windows, doors, awnings and the roofline of the building. This of course can also be converted into three dimensions. There is yet another research topic here, namely "automated mapping of geometric details of facades". Slide 0.100 and Slide 0.101 are the three-dimensional renderings of those edges that are found automatically in 3-D.
0.10 Modeling Rapidly
We not only want to create these data at a low cost, we also want to get them rapidly. Slide 0.103 is an example: a village has been imaged from a helicopter with a handheld camera; looking out to the horizon we appreciate an oblique, panoramic image. "Give us a model of that village tomorrow" may be the task, particularly when it concerns catastrophes, disasters, military or anti-terror operations and so forth. The topic hidden here is that these photos were not taken with a well-controlled camera but accidentally and hastily from a helicopter, with an average amateur camera. The research topic here is the "use of uncalibrated cameras". A wire-mesh representation of the geometry can be created by a stereo process. We can then place the buildings on top of the surface, much like in the Denver example discussed earlier, and we can render it in a so-called flat-shaded representation. We can now look at it and navigate in the data set, but this is not visually as easy to interpret as it would be if we had photography superimposed, which is the case in Slide 0.109 and Slide 0.110. Now we can rehearse an action needed because of a catastrophe or because of a terrorist attack in one of those buildings. We can fly around, move around and so forth.
0.11 Vegetation
"Vegetation" is a big and important topic in this field. Vegetation is difficult to map, difficult to render and difficult to remove. Vegetation, as in the Graz example, may obscure facades. If we take pictures to map the buildings and to get the photographic texture, then these trees, pedestrians and cars are a nuisance. What can we do? We need to eliminate the vegetation, and this is an interesting research topic. The vegetation is eliminated with a lot of manual work. How can we automate that? There are ways and ideas to automate this kind of separation of objects that are at a different depth from the viewer, using multiple images.
Using vegetation for rendering, like in the picture of the Schloßberg in Slide 0.115, is not trivial either. How do we model vegetation in this virtual habitat? The Schloßberg example is based on vegetation that is photographically collected and then pasted onto flat surfaces that are mounted on tree trunks. This is acceptable for a still image like Slide 0.117, but if we have some motion, then the vegetation produces a very irritating effect, because the trees move as we walk by. Another way, of course, is to really have a three-dimensional rendering of a tree, but these are typically either very expensive or they look somewhat artificial, like the tree in the example of Slide 0.118. Vegetation rendering is thus also an important research topic.
0.12 Coping with Large Datasets
We need to cope with large data sets in the administration, rendering and visualization of city data. The example of modeling Vienna with its 220,000 buildings in real time illustrates the magnitude of the challenge. Even if one compresses the 220,000 individual buildings into 20,000 "blocks", thus on average combining 10 buildings into a single building block, one still has to cope with a time-consuming rendering effort that cannot be achieved in real time. A recent doctoral thesis by M. Kofler (1998) reported on algorithms to accelerate the rendering on an unaided computer by a factor of 100, simply by using an intelligent data structure.
If the geometric data are augmented by photographic texture, then the quantity of data gets even more voluminous. Just assume that one has 220,000 individual buildings consisting of 10 facades each, each facade representing roughly 10 m × 10 m, with photographic texture at a resolution of 5 cm × 5 cm per pixel. You are invited to compute the quantity of data that results from this consideration.
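A rough back-of-the-envelope estimate, assuming one byte per pixel and ignoring overlaps and compression (these assumptions are mine, not from the lecture):

    220,000 buildings × 10 facades × (10 m / 0.05 m)² pixels per facade
      = 220,000 × 10 × 200 × 200 pixels
      = 8.8 × 10^10 pixels, i.e. roughly 88 gigabytes (about three times that for RGB texture)

This makes clear why a data structure that loads only the currently needed portions of the model is essential.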
Kofler’s thesis proposed a clever data structure called “LOD/R-tree”. “LOD” stands for level
of detail, and R-tree stands for rectangular tree. The author took the entire city of Vienna and
defined for each building a rectangle. These are permitted to overlap. In addition, separate
rectangles represent a group of buildings, even the districts are represented by one rectangle each.
Actually, the structure was generalized to 3D, thus we are not dealing with rectangles but with
cubes.
Now, as this is being augmented by photographic texture, one needs to select the appropriate data structure for the texture to be superimposed over the geometry. As one uses the data, one defines the so-called "frustum" as the instantaneous cone of view. At the front of the viewing cone one has high resolution, whereas in the back one employs low resolution. The idea is to store the photographic texture and the geometry at various levels of detail and then call up those levels of detail that are relevant at a certain distance to the viewer. This area of research is still rapidly evolving, and "fast visualization" is therefore another subject of on-going research for Diplomarbeiten and Dissertationen. The actual fly-over of Vienna using the 20,000 building blocks in real time is now feasible on a regular personal computer, producing about 10 to 20 frames per second as opposed to 10 seconds per frame prior to the LOD/R-tree data structure. Slide 0.129 and Slide 0.130 are two views computed with the LOD/R-tree. The same LOD/R-tree data structure can also be used to fly over regular DEMs; recall that these are regular grids in (x, y) to which a z-value is attached at each grid intersection to represent terrain elevations. These meshes are then associated with photographic texture, as shown in three views. We generally call this "photorealistic rendering of outdoor environments".
Another view of a Digital Elevation Model (DEM), superimposed with a higher-resolution aerial photograph, is shown in Slide 0.135 and Slide 0.136.
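The following sketch illustrates the flavor of such a structure: a tree of nested 3-D boxes, each carrying several levels of detail, traversed with frustum culling and a distance-based choice of level of detail. It is an illustration under my own simplifying assumptions (axis-aligned boxes, purely distance-based LOD selection), not Kofler's actual implementation.

    import math
    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    @dataclass
    class Node:
        bbox: Tuple[float, float, float, float, float, float]   # xmin, ymin, zmin, xmax, ymax, zmax
        lods: List[object] = field(default_factory=list)        # models, finest first
        children: List["Node"] = field(default_factory=list)

        def center(self):
            x0, y0, z0, x1, y1, z1 = self.bbox
            return ((x0 + x1) / 2, (y0 + y1) / 2, (z0 + z1) / 2)

    def collect_visible(node: Node, eye, in_frustum: Callable, lod_ranges, out=None):
        # Depth-first walk: cull boxes outside the view frustum, descend into the
        # children of nearby boxes, and emit a coarser model for distant boxes.
        if out is None:
            out = []
        if not in_frustum(node.bbox):
            return out                                   # whole subtree culled
        level = sum(math.dist(eye, node.center()) > r for r in lod_ranges)
        if node.children and level == 0:
            for child in node.children:                  # near the viewer: recurse
                collect_visible(child, eye, in_frustum, lod_ranges, out)
        elif node.lods:
            out.append(node.lods[min(level, len(node.lods) - 1)])
        return out

With lod_ranges of, say, (100.0, 500.0) meters, boxes closer than 100 m are expanded or drawn at the finest level, while more distant boxes are drawn from progressively coarser models; this is what makes the real-time fly-over feasible.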
0.13 Non-Optical Sensing
Non-photographic, and therefore non-optical, sensors can also be used for city modeling. Recall that we model cities from sensor data and then render the cities using the models as input, potentially augmented by photographic texture. Which non-optical sensors can we typically consider? A first example is radar imagery. We can use imagery taken with microwaves at wavelengths between 1 mm and 25 cm or so. That radiation penetrates fog, rain and clouds and is thus capable of "all-weather" operation. The terrain is illuminated actively, like with a flashlight, supporting "day & night" operation. An antenna transmits microwave radiation, this gets reflected on the ground, and echoes come back to the antenna, which is now switched to receive. We will discuss radar imaging in a later section of this class. Let's take a look at two images. One image of Slide 0.138 has the illumination from the top, the other has the illumination from the bottom. Each image point or pixel covers 30 cm × 30 cm on the ground, representing a geometric resolution of 30 cm. Note that the illumination causes shadows, and note how the shadows fall differently in the two images.
The radar images can be associated with a direct observation of the digital elevation of the terrain. Slide 0.139 is an example associated with the previous two images of the area of the Sandia National Laboratories in Albuquerque (New Mexico, USA). About 6,000 people work at Sandia. The individual buildings are shown in this dataset, which is in itself rather noisy. But it becomes a very powerful dataset when it is combined with the actual images. We have found here a non-stereo way of directly mapping the shape of the Earth in three dimensions.
Another example with 30 cm × 30 cm pixels is a small village, the so-called MOUT site (Military Operations in Urban Terrain). Four looks from the four cardinal directions show shadows and other image phenomena that are difficult to understand and are the subject of later courses. We will not discuss those phenomena much further in this course. Note simply that we have four images of one and the same village, and those phenomena look very different in the four images. Just study those images in detail and consider how shadows fall and how roofs are being imaged, and note in particular one object, namely a church, as marked. This church can be reconstructed using eleven measurements. There are about 47 measurements one can take from those four images, so that we have a set of redundant observations of these dimensions to describe the church. The model of the church is shown in Slide 0.141 and is compared to an actual photograph of the same church in Slide 0.142. This demonstrates that one can model a building not only from optical photography, but from various types of sensor data. We have seen radar images in combination with interferometry. There is ample opportunity to study "building reconstruction from radar images" in the form of Diploma and Doctoral theses.
Another sensor is the laser scanner. Slide 0.144 is an example of a laser scanner result from downtown Denver. How does a laser scanner operate? An airplane carries a laser device. It shoots a laser ray to the ground. The ray gets reflected, and the time it takes to do the round trip is measured. If there is an elevation, the round-trip time is shorter than if there is a depression. The direction into which the laser "pencil" looks changes rapidly from left to right to create a "scanline". Scanlines are added up by the forward motion of the plane. The scanlines accrue into an elevation map of the ground.
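A sketch of the geometry of a single laser pulse, assuming a simple across-track scan and ignoring the aircraft's attitude (roll, pitch, yaw), which a real system must model as well; the names and the flat-scan geometry are my own simplifications:

    import math

    C = 299_792_458.0                                    # speed of light in m/s

    def pulse_to_ground_point(t_roundtrip, scan_angle, aircraft_xyz, heading=0.0):
        # One-way range from the measured round-trip time.
        r = C * t_roundtrip / 2.0
        # Split the range into a vertical drop and an across-track offset
        # (scan_angle is measured from the vertical, in radians).
        drop = r * math.cos(scan_angle)
        across = r * math.sin(scan_angle)
        # The across-track direction is perpendicular to the flight heading.
        x = aircraft_xyz[0] + across * math.cos(heading + math.pi / 2)
        y = aircraft_xyz[1] + across * math.sin(heading + math.pi / 2)
        z = aircraft_xyz[2] - drop
        return (x, y, z)

The aircraft position aircraft_xyz itself comes from GPS, whose systematic error is reduced as described next.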
The position of the airplane itself is determined using a Global Positioning System (GPS) receiver carried on the airplane. The position might have a systematic error. But by employing a second, simultaneously observed GPS position on the ground, one really observes the relative motion between the airplane GPS and the stationary GPS platform on the ground. This leads to a position error in the cm range for the airplane and to a very small error, also in the cm range, for the distance between the airplane and the ground. Laser measurements are a very hot topic in city modeling, and there are advantages as well as disadvantages vis-a-vis building models from images. To study this issue could be a subject of Diploma and Doctoral theses.
Note that as the airplane flies along, only a narrow strip of the ground gets mapped. In order to
cover a large area of the ground one has to combine individual strips. Slide 0.147 illustrates how
the strips need to be merged and how any discrepancies between those strips, particularly in their
overlaps, need to be removed by some computational measure. In addition, one needs to know
points on the ground with their true coordinates in order to remove any uncertainties that may exist
from the airplane observations. So finally we have a matched, merged, cleaned-up data set and we
now can do the same thing that we did with the DEM from aerial photography, namely we merge
the elevation data obtained from the laser scanner with potentially simultaneously collected video
imagery, also taken from that same airplane: We obtain a laser scan and phototexture product.
0.14 The Role of the Internet
It is of increasing interest to look at a model of a city from remote locations. An example is the so-called "armchair tourism", vacation planning and such. Slide 0.152 is an example of work done for a regional Styrian tourism group. They contracted to have a mountain-biking trail advertised on the Internet using a VRML model of the terrain. Slide 0.153 shows a map near Bad Mitterndorf in Styria and a vertical view of a mountain-biking trail. Slide 0.154 is a perspective view of that mountain-bike trail superimposed onto a digital elevation model that is augmented by photographic texture obtained from a satellite. This is actually available today via the Internet. The challenge is to compress the data without significant loss of information and to offer that information via the Internet at attractive real-time rates. Again, Diploma and Doctoral thesis topics could address the Internet and how it can help to transport more information faster, in more detail and of course in all three dimensions.
Another example of the same idea is an advertisement for the Grazer Congress on the Internet. The inside of the Grazer Congress was to be viewable by potential conference organizers far away. They obtain a VRML view of the various inside spaces. Because of the need to compress those spaces, the data are geometrically very simple, but they carry the actual photographic texture that is available through photographs taken inside the Grazer Congress.
The Internet is a source of a great variety of image information. An interesting variation of the city models relates to the so-called "orthophoto", namely photographs taken from the air or from space that are geometrically corrected to take on the geometry of a map. The example of Slide 0.158 shows downtown Washington, D.C. with the U.S. Capitol (where the parliament resides). This particular web site is called "City Scenes".
0.15 Two Systems for Smart Imaging
We have already talked about imaging with regular cameras, with radar and other non-optical sensing, and with lasers. Let's go a step further: specific smart sensing developed for city mapping. As part of a doctoral thesis in Graz, a system was developed to be carried on the roof of a car, with a number of cameras that allow one to reconstruct the facades of buildings in the city. Images are produced by driving with this system along those buildings. At the core of the system is a so-called linear detector array consisting of 6,000 CCD elements in color. These elements are combined with two or three optical systems, so that 3,000 elements are exposed through one lens and another 3,000 elements through another lens. By properly arranging the lenses and the CCDs one obtains a system whereby one lens collects a straight line of the facade looking forward, and the other lens collects a straight line either looking backwards or looking perpendicularly at the building.
In Slide 0.163 we see the car with the camera rig driving by a few buildings in the Kopernikusgasse in Graz. Slide 0.164 shows two images, with various details from those images in Slide 0.165, in particular images collected of the Krones-Hauptschule. Simultaneously with the linear detector array collecting images line by line as the car moves forward (this is also called "push-broom imaging"), one can take images with a square array camera. So we have the lower-resolution square array camera with maybe 700 × 500 pixels, augmented by the linear detector array images with 3,000 pixels in one line and an unlimited number of lines as the car drives by. The opportunity exists here as well to perform work for Diploma or Doctoral theses to develop the advantages and disadvantages of square array versus line array cameras.
A look at an image from a linear array shows its poor geometry, because as the car drives there are
lots of motions going on. In the particular doctoral thesis, the candidate developed software and
algorithms to fix the geometric deformations in the images. The approach exploits the fact that many
of the features are rectilinear, for example edges of windows and details on the wall. This can help to
automatically produce good images. If two images are produced, one can produce a stereo
rendering of the cityscape. The human observer can obtain a 3-dimensional impression using stereo
glasses, as we will discuss later.
That linear detector array approach carried in a car as a rigid arrangement without any moving
camera points was also used by the same author to create a panoramic camera. What is a panorama
camera? This is a camera that sweeps (rotates) across the area of interest with an open shutter,
producing a very wide angle of view, in this case of 360 degrees in the horizontal dimension and
maybe 90 degrees in the vertical direction. We can use two such images for stereoscopy by taking
photos from two different positions. The example shown in Slide 0.172 has two images taken of
an office space to combine into a stereo pair which can be used to recreate a complete digital 3-D
model of the office space. These are the two raw images in which the “panoramic sweep” across
360° is presented as a flat image.
What is the geometry of such a panoramic camera? It is rather complex. We have a
projection center O that is located on a rotation axis, which in turn defines a z-coordinate axis.
The rotation axis passes through the center of an imaging lens. The CCD elements are arranged
vertically at location zCCD. An object point pObj is imaged onto the imaging surface at location
zCCD. The distance between O and the vertical line through the CCD is called the “focal distance”
fCCD. An image is created by rotating the entire arrangement around the z-axis and collecting
vertical rows of pixels of the object space, and as we move we assemble many rows into a
continuous image. One interesting topic about this type of imaging would be to find out what
the most efficient and smartest ways would be to image indoor spaces (more potential topics for
Diploma and Doctoral research). To conclude, Slide 0.175 is an image of an office space with a door,
umbrella and a bookshelf that is created from the panoramic view in Slide 0.172 by geometrically
“fixing” it to make it look like a photo from a conventional camera. The Congress Center in
Graz has also been imaged in Slide 0.176 with a panoramic sweep; a separate sweep was made
(Figure x) to see how the ceiling looks when swept with a panoramic camera.
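As a small illustration of this geometry, the following Python sketch (not the Graz system itself; the values of f_ccd, the pixel spacing and the number of columns per sweep are invented for illustration) maps an object point, given relative to the projection center O with the rotation axis as z-axis, onto a column and row of a cylindrical panoramic image: the column follows from the azimuth of the point around the rotation axis, the row from the vertical image coordinate scaled by the focal distance.

import math

def panorama_pixel(X, Y, Z, f_ccd=0.05, pixel_size=1e-5, columns_per_sweep=36000):
    """Map an object point (X, Y, Z) onto (column, row) of a cylindrical
    panoramic image.  All parameters are illustrative assumptions."""
    azimuth = math.atan2(Y, X)                    # rotation angle at which the CCD row sees the point
    column = (azimuth % (2 * math.pi)) / (2 * math.pi) * columns_per_sweep
    horizontal_dist = math.hypot(X, Y)            # distance of the point from the rotation axis
    z_img = f_ccd * Z / horizontal_dist           # vertical image coordinate on the CCD line
    row = z_img / pixel_size                      # convert to a pixel row (row 0 at the horizon)
    return column, row

print(panorama_pixel(3.0, 4.0, 1.5))              # approx. (5313.0, 1500.0)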
0.16 International Center of Excellence for City Modeling
Who is interested in research on city models in the world? What are the “centers of excellence”?
In any endeavour that is new and “hot” you always want to know who is doing what and where.
In Europe there were several Conferences in recent years on this subject. One of these was in
Graz, one was in Ascona in Switzerland, one in Bonn. Ascona was organized by the ETH-Zürich,
Bonn by the University of Bonn, the Graz meeting by our Institute.
The ETH-Zürich is home to considerable work in this area, so much so that some university people
even started a company, Cybercity AG. The work in Zurich addresses details of residential homes and
led to the organisation of two workshops in Ascona, for which books have been published by the
Birkhäuser-Verlag. One can see in the examples of Slides 0.182 through 0.186
that they find edges and use those to segment the roof into its parts. They use multiple
images of the same building to verify that the segmentation is correct and improve it if errors
are found. The typical example from which they work is aerial photography at large scales (large
scales are at 1:1,500; small scales are at 1:20,000). Large models have been made, for example of
Zurich as shown in Slide 0.186.
The most significant amount of work in this area of city modeling has probably been performed at
the University in Bonn. The image in Slide 0.188 is an example of an area in Munich. The method
used in Bonn is fairly complex and encompasses an entire range of procedures that typically would
be found in many chapters of books on image processing or pattern recognition. One calls the
diagram shown in Slide 0.189 an “image processing pipeline”.
The data processed in Bonn are the same as used in Zurich. There exists an international data set
for research, so that various institutions have the ability to practice their skills and compare the
results. We will later go through the individual work steps that are listed in the pipeline.
One result from Bonn using the international images shows edges, and from the edges it finds match
points and corners in separate images of the same object; these indicate the top of a roof. The
illustration in Slide 0.190 explains the principle of the work done in Bonn. Another Bonn
approach is to first create corners and then topologically connect the corners so that roof segments
come into existence. These roof segments are then merged into the largest possible areas that might
represent roofs, as shown in this example.
Another approach is to start the modelling of a building not from the image itself nor from its
edges and corners, but to create point clouds by stereo measurements. This represents a dense
digital elevation model as we have explained earlier in the Denver-example. Digital elevations are
illustrated here by encoding the elevation by brightness values with dark being low, white being
high. One can now try to fit planes to the elements of the digital elevation model. Slide 0.193 is
an intermediate result, where it looks as if one has found some roofs. The digital elevation model
here invites one to compute planes to define roofs and the sides of buildings.
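To make the idea of fitting planes to a digital elevation model concrete, here is a minimal least-squares sketch in Python/NumPy (the synthetic roof patch and its noise level are made up for illustration; this is not the software of the groups described above). A plane z = a·x + b·y + c is fitted to the elevation values of a small DEM window.

import numpy as np

# Synthetic 10 x 10 DEM patch of a tilted roof plane with a little noise
x, y = np.meshgrid(np.arange(10), np.arange(10))
z = 0.4 * x - 0.2 * y + 12.0 + np.random.normal(0, 0.05, x.shape)

# Least-squares fit of z = a*x + b*y + c over the patch
A = np.column_stack([x.ravel(), y.ravel(), np.ones(x.size)])
(a, b, c), *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)

residuals = z.ravel() - A @ np.array([a, b, c])
print(a, b, c, residuals.std())   # a ~ 0.4, b ~ -0.2, c ~ 12, small residual spread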
In North America the work on city modeling is typically sponsored by the Defense Advanced
Research Projects Agency (DARPA). Their motivation is the military application, for example
fighting urban wars, having robots move through cities, or confronting terrorists. DARPA programs
typically address university research labs. The most visible ones were the University of Massachusetts at Amherst, the University of Southern California, Carnegie Mellon University and
the Stanford Research Institute (SRI), a spin-out from Stanford University. SRI
is a well-known research lab that is separately organised as a foundation.
In the US there are other avenues towards modeling of cities which are not defense oriented. One
is architecture. In Los Angeles there is the architecture department of the University of California
at Los Angeles. They are building a model of the entire city of Los Angeles using students and
manual work.
0.17 Applications
Let me come to a conclusion of city modeling. Why do people create such models? The
development of an answer presents another opportunity to do application studies for Diploma
and Doctoral theses. Let me illustrate some of those applications of city models. These certainly
include city planning, architectural design, (car) navigation, engineering reconstruction
of buildings that have been damaged and need to be repaired, infotainment (entertainment),
and simulation and training for fire brigades and for disaster preparedness. Applications can also
be found in Telecom or in the military. A military issue is the guidance of robot soldiers, and the
targeting and guiding of weapons. In Telecom we may need to transmit data from roof to roof as one
form of broadband wireless access. In infotainment we might soon have 3-dimensional
phonebooks.
0.18 Telecom Applications of City Models
A particular computer graphics and image processing issue which should be of specific interest to
Telematics-people is “the use of building models for Telecom and how these building models are
made”. Slide 0.202 shows a three-dimensional model of downtown Montreal. The purpose
of this model is the plan to set up antennas on top of the roofs of high buildings. Those antennas would serve
as hubs to illuminate other buildings and to receive data from other buildings, in a system that
is called Local Multipoint Distribution System (LMDS). This is a broadband wireless access
technology that competes with fibre optics in the ground and with satellite communication. We
will see how the technologies shake out, but LMDS is evolving everywhere; it is very scalable,
since one can build up the system sequentially hub-by-hub, and one can increase the performance
sequentially as more and more users in buildings sign up.
Slide 0.204 is a model of a large section of Vancouver, where the buildings are modeled in support
of an LMDS project. In order to decide where to place a hub, one can go into software that
automatically selects the best location for a hub. For example, if we place an antenna on a high
building, we can then determine which buildings are illuminated from that antenna and which are not.
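The core of such a hub-placement tool is an intervisibility test on the raster surface model. The sketch below (a simplified Python illustration with an invented toy height grid, not the software referred to above) samples the straight line between an antenna and a candidate receiver and reports whether any intermediate building or terrain cell rises above the sight line.

import numpy as np

def line_of_sight(heights, p_from, h_from, p_to, h_to, samples=200):
    """heights: 2-D array of surface elevations (terrain + buildings).
    p_from/p_to: (row, col) of antenna and target cell; h_from/h_to are the
    antenna and receiver heights above those cells.  Returns True if clear."""
    z0 = heights[p_from] + h_from
    z1 = heights[p_to] + h_to
    for t in np.linspace(0.0, 1.0, samples)[1:-1]:
        r = p_from[0] + t * (p_to[0] - p_from[0])
        c = p_from[1] + t * (p_to[1] - p_from[1])
        sight_z = z0 + t * (z1 - z0)                     # height of the sight line at this sample
        if heights[int(round(r)), int(round(c))] > sight_z:
            return False                                 # blocked by a building or the terrain
    return True

surface = np.zeros((50, 50)); surface[20:25, 20:25] = 40.0    # a 40 m block in between
print(line_of_sight(surface, (10, 10), 30.0, (40, 40), 5.0))  # False: the block is in the way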
We use examples from a Canadian project to map more than 60 cities. One delivers to the
Telecom company so called “raster data”, but also so called “vector data”, and also non-graphic
data, namely addresses. We will talk later about raster and vector data structures, and we will
discuss how they are converted into one another.
The geometric accuracy of the shape of these buildings should be in the range of ±1 meter in x,
y, and z in order to be useful for the optimum location of antennas.
How many buildings are in a square km? In Montreal this was about 1000 buildings per sqkm
in the downtown. Because the data need to be delivered quickly (Telecom-companies need them
“now”), one can not always have perfect images to extract buildings from. So one must be able
to mix pre-existing photography and new aerial sources and work from what is there. For this
reason one needs to be robust in one’s procedures vis-à-vis the type of photography. The question
often is: from what altitude is that photography taken and therefore what is the scale of the
photographs?
Some Telecom-companies want all buildings (commercial and residential), while others only need
the commercial buildings. Most of the companies want all addresses. Even multiple addresses
must be provided in the case of an apartment building. There is always a need to be quick and
inexpensive. Companies expect that a hundred sqkm can be modeled per week, which is a hundred
thousand buildings per week. One cannot achieve this by hand. One has to do this by machine.
One challenge might be that one is faced with aerial photography that is flown at too large a
scale. The high-rise building in Slide 0.207 looks different in one view than in the other stereoscopic view
in Slide 0.208. For a high-rise building we may not even see a certain side of the building in one
photograph, but we see that side in the other. Our procedure must cope with these dissimilarities.
Slide 0.209 shows a set of polygons extracted from an image, and one can already see that some
polygons are not visible from that particular photograph. Clearly those data were extracted from
another photograph, as shown in Slide 0.210. The same situation is illustrated again in the second
example of Slide 0.211. Finally we have a raster representation of the buildings in Slide 0.212. So
we have an (x, y)-grid on the ground, and for each (x, y)-grid cell we have a z-elevation. The images
shown before were the source of the building in the center of this particular raster representation.
But we also want a vector-representation of the building footprints and of the details of the roofs
as in the example of downtown Montreal.
These vectors are needed, because the addresses can
be associated with polygons describing a building, but one has a harder time associating addresses
with a raster representation. However, the signal propagation computation needs raster data as
shown here.
The entire area of central Montreal has 400,000 buildings as shown in Slide 0.217. Zooming in
on the green segment permits one to see city-blocks. Zooming in further produces individual
buildings. A very complex building is the cathedral, which on an aerial photograph looks like
Slide 0.220.
0.18. TELECOM APPLICATIONS OF CITY MODELS
23
Let us summarize: the data sets being used for this Telecom wave-propagation modeling in the
LMDS application consist first of all of vector data of the buildings (Slide 0.222), but also of the
vegetation, because the vegetation may block the intervisibility of antennas; we also show the
combination of both. Of course the same data are needed in a raster format of the building
data, and finally a combination of raster and vector data to include the trees. And we must
not forget the addresses. Again, there may be one address per building, or multiple addresses for
each building. The addresses are locked to the geometric data via address locators that are placed
inside the polygons. As a result the addresses are associated with the polygons and thus with the
buildings.
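The association of an address locator with a building polygon boils down to a point-in-polygon test. Below is a minimal ray-casting sketch in Python (the square footprint and the address point are invented examples, not data from the Montreal project).

def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a horizontal ray from (x, y) crosses
    the polygon edges; an odd count means the point lies inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        crosses = (y1 > y) != (y2 > y)
        if crosses and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

footprint = [(0, 0), (10, 0), (10, 8), (0, 8)]     # building footprint polygon (meters)
print(point_in_polygon(4, 3, footprint))           # True: the address locator lies inside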
What do such Telecom data sets go for in terms of price? A building may cost between $1 and
$25. A square km may go for $100 to $600. However, if there are 1000 buildings per sqkm then
obviously an individual building may cost less than one dollar. A metropolis such as Montreal
may cover 4000 square km, but the interest is focused on 800 sqkm. On average, of course, there
are fewer than 1000 buildings per sqkm; one finds more typically 200 or so buildings per sqkm
over larger metropolitan regions.
...
[Slides 0.1 through 0.259 of Chapter 0 appear here.]
Chapter 1
Characterization of Images
1.1 The Digital Image
Images can be generated from at least two sources. The first is creation of the image from the
measurements taken by a sensor. We would call this a “natural image”. In contrast, an image may
also be generated by a computer describing an object or a situation that may or may not exist
in the real world. Such images are “computer generated” (CGI, computer-generated images).
All digital images have a coordinate system associated with them. Slide 1.5 is an original and
typical image with two dimensions and has a rectangular (Cartesian) coordinate system with
axes x and y. Therefore a location in the image can be defined by its coordinates x and y.
Properties of the image can now be associated with that location. In that sense the image is an
algebraic function f (x, y). When we deal with digital images then we discretize this continuous
function and we replace the continuous image by rows and columns of image elements or pixels.
A pixel is typically a square or rectangular entity. More realistically, of course, the sensor
that may have created an image may have an instantaneous field of view that is not rectangular
or square. It is oftentimes a circle. We are presenting an image digitally as an arrangement of
square pixels, although the machinery which creates the digital image may not produce square
pixels.
Digital images are fairly simple arrangements of numbers that are associated with gray values as
illustrated in Slide 1.7. It shows four different gray values between 0 and 30, with 0 being white
and 30 being black. A very simple type of image is a so-called “binary image” or binary mask.
That is an image whose pixels have gray values of either 0 (white) or 1 (black). Such a
binary image may be obtained by thresholding a gray value image. We may have a threshold
Algorithm 2 Threshold image
create a binary output image with the same dimensions as the input image
for all pixels p of the input image do
    retrieve gray value v of pixel p from the image
    find the pixel p' of the output image corresponding to p
    if v ≥ vt then    {compare gray value v with threshold vt}
        set p' to white
    else
        set p' to black
    end if
end for
that takes all pixel values between 15 and 25 to be black (or 1), while all other gray values are
set to white (or 0).
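As a minimal sketch of this band threshold (the gray-value range 15–25 is taken from the text; the random test image is made up), the operation can be written in a few lines of Python/NumPy:

import numpy as np

gray = np.random.randint(0, 31, size=(8, 8))          # toy gray-value image, values 0..30
binary = np.where((gray >= 15) & (gray <= 25), 1, 0)  # 1 (black) inside the band, 0 (white) outside
print(binary)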
An immediate question to ask is why this technology has been developed to take
continuous gray values and convert them into digital pixel arrays. Let's discuss a few advantages;
a very significant one is “quantification”. In a digital environment we are not subject to judging
an image with our opinions but have actual measurements. This can be illustrated by an
example of a gray area embedded either in a dark or a white background. Subjectively our eye
will tell us that the gray area is brighter when embedded in a dark environment or darker when
embedded in a brighter environment. But in reality the two gray values are identical. An
eye can objectively differentiate a limited number of gray values. In a chaotic image we may be
able to separate only 16 to 64 gray values. Relatively, though, namely in situations where we
have two areas adjacent to one another, our eyes become very sensitive to the differences. But
we cannot compare a gray-tone in one corner of an image to a gray-tone in another corner of the
same image and be certain which one is brighter or darker. That can be easily accomplished in a
digital environment.
There is a whole host of other advantages that will not be discussed at the same level of detail.
First, a very important one is the automation of the visual sense. We can give the computer eyes
and can process the visual information by machine, and thereby taking the work of interpreting
various visual inputs away from the human. Examples are quality control in a factory environment
or in inaccessible, dangerous areas.
Second, an advantage is “flexibility”. We have options that we do not have in an analog environment or with the natural visual sense in configuring very flexible sensing systems for very specific
tasks. Third, the ability to store, retrieve, transfer and publish visual information at very little
cost is another advantage if the information is digital. We all have of course experience now with
multimedia information on the web and we all know that duplication and transfer is available at
almost no cost. Fourth is the advantage of enhancing the visual sense of the human with an array of
sensors, for example underwater imaging, sound imaging, x-ray imaging, microwave imaging. We
will address sensors in more detail.
Fifth, digital processing of sensor data is essentially independent of the specifics of the sensor. We
may have algorithms and software that are applicable to a variety of sensors. That is an advantage
in a digital environment. Sixth is cost: digital images are inexpensive. This was mentioned already
in the context of storage, transfer and publication. Expensive looking color images can be rendered
on a computer monitor, and yet we have no direct costs for those images. This is quite a difference
from going to a photo lab and getting quality paper prints or diapositives.
The seventh advantage of digital images needs an example to explain. There exist numerous
satellites orbiting the Earth and carrying Earth-observing sensors. One such system is from the
US-NASA and is called “Landsat”. Slide ?? is an example of a Landsat image of the Ennstal
with its rows and columns. What makes this image interesting is the color presentation of
what the sensor in orbit “sees”. The presentation is made from 7 separate spectral channels, not
from simple red/green/blue color photography. Something that is very typical of the flexibility
and versatility of digital sensors and digital image processing is this ability to extend the visual
capabilities of humans and operate with many more images than a human can “see” or cope with.
Prüfungsfragen:
• What is meant by a “threshold image”, and for what purpose is it used?
• What advantages do digital images have over analog images?
• What is meant by a multiple or multispectral image, and what is it used for?
1.2 The Image as a Raster Data Set
A digital image is an array of pixels. It was already mentioned that in principle the images are
continuous functions f (x, y). A very simple “image model” states that f (x, y) is the product of
two separate functions. One function is the illumination I and the other function describes the
properties of the object that is being illuminated, namely the reflection R. The reflection function
may vary between 0 and 1 whereas the illumination function may vary between 0 and ∞.
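Written as a formula (using the notation of the text; this is the standard image model found in the image-processing literature): f(x, y) = I(x, y) · R(x, y), with 0 < I(x, y) < ∞ and 0 ≤ R(x, y) ≤ 1.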
We now need to discretize this continuous function in order to end up with a digital image.
We might create 800 by 1000 pixels, a very typical arrangement of pixels for the digital sensing
environment. So we sample our continuous function f(x, y) into an N × M matrix with N rows
and M columns. Typically our image dimensions are powers of two, 2^n; so our number of rows may
be 64, 128, 512, 1024, etc. We not only discretize or sample the image (x, y)-locations; we also have
to take the gray value at each location and discretize it. We do that with 2^b gray levels, with b
typically being small, producing 2, 4, 8, 12, or 16 bits per pixel.
Definition 1 Amount of data in an image
To calculate the amount of data of an image, the geometric and the radiometric resolution of the
image must be given. Let us say we have an image with N columns and M rows (geometric
resolution) and a radiometric resolution of R bits per pixel. The amount of data b of the image is
then calculated using the formula
b = N · M · R
A very simple question is shown in Slide 1.20. If we create an image of an object and we need to
recognize from the image a certain detail in the object, say a speck of dirt on a piece of wood of
60 cm by 60 cm, and if that dirt can be as small as 0.08 mm², what is the size of the image needed
to be sure that we recognize all the dirt specks?
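One plausible way to work this out (an illustrative calculation, not necessarily the answer intended on the slide): a dirt speck of 0.08 mm², if roughly square, has a side length of about √0.08 ≈ 0.28 mm. To be sure such a speck covers at least one pixel, the pixel footprint should be at most that size, and with the usual factor of two for safe sampling about 0.14 mm. Over a 600 mm by 600 mm piece of wood this means roughly 600 / 0.14 ≈ 4300 pixels per side, i.e. an image on the order of 4000 × 4000 pixels; at 8 bits per pixel that is about 16 MB by the formula b = N · M · R.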
The resolution of an image is a widely discussed issue. When we talk about the geometric resolution
of an image, then we typically associate with this the size of the pixel on the object and the number
of pixels in the image. When we talk about radiometric resolution, then we mean the number
of bits we have per pixel. Let us take the example of geometric resolution. We have in Slide 1.22
and Slide 1.23 a sequence of images of a rose that begins with a resolution of 1000 by 1000 pixels.
We go down from there to ultimately 64 by 64 or even 32 by 32 pixels. Clearly at 32 by 32 pixels
we cannot recognize the rose any more.
Let's take a look at the radiometric resolution. We have in Slide 1.24 a black-and-white image of
that rose at 8 bits per pixel. We reduce the number of bits, and in the extreme case we have
one bit only, resulting in a binary image (either black or white). In the end we may have a hard
time interpreting what we are looking at, unless we already know what to expect. As we will see
later, image processing at 8 bits for black & white images is very common. A radiometric resolution
with more bits per black & white pixel is needed, for example, in radiology. In medicine it is not
uncommon to use 16 bits per pixel. With 8 bits we obviously get 256 gray values, if we have 12
bits we have 4096 gray values.
The color representation is more complex, we will talk about that extensively. In that case we
do not have one 8-bit number per color pixel, but we typically have three numbers, one each for
red/green/blue, thus 24 bits in total per each color pixel.
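As a quick worked example of Definition 1 applied to color: a 1000 × 1000 pixel true-color image at R = 24 bits per pixel amounts to b = 1000 · 1000 · 24 = 24,000,000 bits, i.e. 3,000,000 bytes or about 3 MB.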
Prüfungsfragen:
• In image processing there is the idea of a so-called “image model”. What is meant by this, and
which formula is used to represent the image model?
• Describe the process of discretization in the transition from an analog to a digital image.
• What is meant by sampling, and which problems occur in the process? You are invited to use
formulas in your answer.
• What do the terms “geometric” and “radiometric” resolution of an image mean? Try to clarify
your answer with a sketch.
1.3 System Concepts
We talk about image-analysis, image-processing or pattern recognition and about computer graphics. What are their various basic ideas? Image processing goes from the image to a model of an
object, and from there to an understanding of the object. In [GW92] an image analysis system
is described in the first introduction chapter. One always begins with (a) sensors, thus with the
image acquisition step, the creation of an image by a camera, radar system, by sound. Once the
image is acquired it is, so to speak, “in the can”. We now can (b) improve the image, this is called
“pre-processing”. Improving means fixing errors in the image, making the image look good for
the eye if a human needs to inspect it. Preprocessing produces a new, improved image.
We now want to decompose the image into its primitives. We would like to (c) segment it into areas
or fields, edges, lines, regions. This creates from the pre-processed image as it has been seen visually
a new image in which the original pixels are substituted by the image regions, contours, edges. We
denote this as “segmentation”. After segmentation we need to create a (d) representation and a
description of the image contents. And finally we want to use the image contents and (e) interpret
their meaning. What do objects look like? This phase is called recognition and interpretation.
All of this is based on (f) knowledge about a problem domain, about the sensor, about the object,
about the application of the information. So once the object information has been interpreted we
now can use the information extracted from the image for action. We may make a decision to e.g.
move a robot, or to dispose of a defective part or to place an urban waste dump and so forth.
The typical ideas at the basis of computer graphics are slightly different. We start out from the
computer in which we store data about objects and create an image as a basis for actions. So
we have a database and an application model. We have a program to take the data from the
database and to feed the data into a graphic system for display. The object of computer graphics
is the visual impression of a human user. However, what may seem like two different worlds, image
processing versus computer graphics, are really largely one and the same world. Image processing
creates from images of the real world a model of that real world. Computer graphics takes a model
of objects and creates from it an image of those objects. So in terms of a real world, computer
graphics and image processing are entirely complementary. Image processing is going from real
world to a model of the real world, and computer graphics takes the object of the real world and
creates an image of it.
Where those two areas do diverge is in the non-real world. There is no sensing and no image
analysis of a non-real world. What is computer graphics of a non-real world? Just look at
cartoons and the movies. So there is a point of view that says that image processing and computer
graphics belong together. A slightly different point of view is to say that image processing and
computer graphics overlap in areas addressing the real world, and that there are areas that are
separate.
Prüfungsfragen:
• Sketch the process of image recognition as a chain of processes from the scene all the way to the
scene description.
1.4 Displaying Images on a Monitor
The customary situation today is a refresh buffer in which we store numbers that represent
the image. We use a display controller that manages this buffer based on data and software
residing on a host computer. And we have a video controller that takes what's in the buffer and
presents this information on a computer monitor. In the buffer we might have a binary image
at 1 bit per pixel. Or we may have a color image at 24 bits per pixel. These are the typical
arrangements for refresh buffers. The refresh buffer typically is larger than the information on
a computer monitor. The computer monitor may display 800 by 1000 pixels. The refresh-buffer
might hold 2000 by 2000 pixels. An image is displayed on the monitor using a cathode-ray tube
or an LCD arrangement. On a cathode-ray tube the image is painted line by line on the
phosphor surface, going from top to bottom.
Then the ray gets turned off. So it moves from left to right with the beam-on, right to left
with the beam-off, top down with beam-on, down-to-top at beam-off. An image like the one in
Slide “Wiedergabe bildhafter Information” is a line drawing. How could this be represented on
a monitor? In the early days this was by a vector scan, so the cathode ray was used to actually
paint vectors on the monitor. Very expensive vector display monitors were originally built, maybe
as long as into the mid-80's. Television monitors became very inexpensive,
but vector monitors remained expensive, and so a transition took place from vector monitors to
raster monitors, and today everything is represented in this raster form. We could have
a raster display present only the contours of an object, but we can also fill the object in the raster
data format.
Not all representations on a monitor are always dealing with the 3-dimensional world. Many
representations in image form can be of an artificial world or of technical data, thus of non-image
information. This is typically denoted by the concept of “visualization”. Slide “Polyline” is a
visualization of data in one dimension. Associated with this very simple idea are concepts such as
polylines (representing a bow tie) and we have a table of points 0 to 6 representing this polyline.
There are concepts such as “markers”, which are symbols that represent particular values in a two-
dimensional array. This was once a significant element in the computer graphics literature but
today no longer represents a big issue.
Prüfungsfragen:
• Describe the components needed in a computer for the output and interactive manipulation of a
digital raster image.
• Using a sketch, describe how a digital raster image is built up on the phosphor surface of a
cathode-ray screen.
• What is the difference between the vector and the raster representation of a digital image?
Illustrate your answer with a simple example and describe the advantages and disadvantages of
both approaches.
• Using a sketch, explain the temporal sequence in which an image is built up on a cathode-ray
screen.
Algorithm 3 Simple raster image scaling by pixel replication
widthratio ⇐ newimagewidth / oldimagewidth
heightratio ⇐ newimageheight / oldimageheight
for all y such that 0 ≤ y < newimageheight do
    for all x such that 0 ≤ x < newimagewidth do
        newimage[x, y] ⇐ oldimage[round(x / widthratio), round(y / heightratio)]
    end for
end for
Algorithm 4 Image resizing
widthratio ⇐ newgraphicwidth / oldgraphicwidth
heightratio ⇐ newgraphicheight / oldgraphicheight
for all points p in the graphic do
    p.x ⇐ p.x × widthratio
    p.y ⇐ p.y × heightratio
end for
1.5 Images as Raster Data
We deal with a continuous world of objects, such as curves or areas and we have to convert them
into pixel arrays. Slide “Rasterkonvertiertes Objekt” shows the representation of a certain figure
in a raster image. If we want to enlarge this, we obtain a larger figure with the exact same shape
but a larger size of the object's elements. If we enlarge the image by a factor of two, what was
one pixel before now takes up four pixels. The same shape that we had before would look
identical but smaller if we had smaller pixels. We make a transition to pixels that are only a
quarter as large as before. If we now enlarge the image, starting from the smaller pixels we get
back the same shape we had before. However, if we reconvert from the vector to a raster format,
then the original figure really will produce a different result at a higher resolution. So we need to
understand what pixel size and geometric resolution do in the transition from a vector world to a
raster world.
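A minimal Python/NumPy sketch of the pixel-replication idea (the tiny binary shape is made up): enlarging by a factor of two turns every pixel into a 2 × 2 block of identical pixels, much as Algorithm 3 does with its nearest-neighbor lookup.

import numpy as np

shape = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])                                  # small binary raster object
enlarged = np.repeat(np.repeat(shape, 2, axis=0), 2, axis=1)   # each pixel becomes a 2 x 2 block
print(enlarged.shape)                                          # (6, 6): same shape, larger pixels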
Prüfungsfragen:
• What is meant by “raster conversion”, and which problems can occur in the process?
1.6 Operations on Binary Raster Images
There is an entire world of interesting mathematics dealing with binary images and operations on
such binary images. These ideas have to do with neighborhoods, connectivity, edges, lines, and
regions. This type of mathematics was developed in the 1970’s. A very important contributor was
Prof. Azriel Rosenfeld, who with Prof. Avi Kak wrote the original book on pattern recognition
and image processing.
What is a neighborhood?
Remember that a pixel at location (x, y) has a neighborhood of
four pixels, that are up and down, left and right of the pixel in the middle. We call this an N4
neighbourhood or 4-neighbors. We can also have diagonal neighbors ND with the lower left, lower
right, upper right, upper left neighbors. We add these ND and the N4 neighbors to obtain the N8
neighbors. This is further illustrated in the way Prof. Rosenfeld did in 1970. Slide 1.56 presents
the N4-neighbors and the N8-neighbors and associates this with a chess game's movements of the
king. We may also have the oblique neighbors Nv and the “Springer” neighbors Nsp, which are
analogous to the chess movements of the knight (Springer), etc. Another diagonal neighborhood
would derive from the game of checkers (“Dame”).
We have neighborhoods of the first order, which are the neighbors of a pixel-x. The neighbors of
the neighbors are “neighbors of second order” with respect to a pixel at x. We could increase the
order by having neighbors of the neighbors of the neighbors.
Definition 2 Connectivity
Two pixels are connected if they are each other's neighbors and have the same connectivity
property V.

4-connectivity:
if q is an N4-neighbor of p then    {Def. 5}
    pixels p and q are connected
else
    pixels p and q are not connected
end if

m-connectivity:
if (N4(p) ∩ N4(q)) = ∅ then    {N4(x): set of N4-neighbors of x}
    if (q is an N4-neighbor of p) or (q is an ND-neighbor of p) then    {Def. 5}
        pixels p and q are connected
    else
        pixels p and q are not connected
    end if
else
    pixels p and q are not connected
end if
Connectivity is defined by two pixels belonging together: They are “connected” if they are one
another’s neighbors. So we need to have a neighbor-relationship to define connectivity. Depending
on a 4-neighborhood, an 8-neighborhood, or a springer-neighborhood we can define various types of
connectivity. We therefore say that two pixels p and q are connected if they are one another's
neighbors under a neighborhood relationship.
This becomes pretty interesting and useful once we start to do character-recognition and we need
to figure out which pixels belong together and create certain shapes. We may have an example
of three-by-three pixels of which four pixels are black and five pixels are white. We now can
have connections established between those four black pixels under various connectivity rules. A
connectivity with eight neighbors creates a more complex shape than a connectivity via so-called m-neighbors, where m-neighbors have been defined previously in Slide “Zusammenhaengende Pixel”.
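How much the choice of connectivity matters can be tried out directly; the following sketch (an illustration using SciPy's generic connected-component labeling, not the character-recognition software of the lecture) labels the same binary image once with 4-connectivity and once with 8-connectivity.

import numpy as np
from scipy import ndimage

mask = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])                                   # three diagonally touching pixels

labels4, n4 = ndimage.label(mask)                              # default structure: 4-connectivity
labels8, n8 = ndimage.label(mask, structure=np.ones((3, 3)))   # 8-connectivity
print(n4, n8)   # 3 separate components under N4, 1 component under N8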
Definition 3 Distance
Given: points p = (x, y) and q = (s, t)
1. De(p, q) = √((x − s)² + (y − t)²)   (Euclidean distance)
2. D4 distance (city block distance)
3. D8 distance (chessboard distance)
The neighborhood and connectivity relationships can be used to establish distances between
pixels, to define edges, lines and regions in images, to define contours of objects, to find a path
between any two locations in an image, and perhaps to eliminate pixels as noise if they are not
connected to any other pixels. A quick example of a distance addresses two pixels P and Q with
a distance depending on the neighborhood relationships that we have defined. The Euclidean
distance of course is simply obtained as the Pythagorean sum of the coordinate differences. But
if we take a 4-neighborhood as the base for distance measurements, then we have a “city block
distance”: two blocks up, two blocks over. Or if we use the 8-neighborhood, then we have a
“chessboard type of distance”.
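The three distance measures of Definition 3 can be written out explicitly; a small Python sketch (the pixel coordinates are chosen arbitrarily):

import math

def distances(p, q):
    (x, y), (s, t) = p, q
    d_e = math.hypot(x - s, y - t)           # Euclidean distance
    d_4 = abs(x - s) + abs(y - t)            # D4, "city block" distance
    d_8 = max(abs(x - s), abs(y - t))        # D8, "chessboard" distance
    return d_e, d_4, d_8

print(distances((0, 0), (2, 2)))             # (2.828..., 4, 2)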
Let's define an “edge”. This is important because there is a mathematical definition that is a little
different from what one would define an edge to be in a casual way. An edge e in an image
is a property of a pair of pixels which are neighbors of one another. That is thus a property of
a pair of pixels and one needs to consider two pixels to define this. It is important that the two
pixels are neighbors under a neighborhood relationship. Any pair of pixels that are neighbors of
one another represent an edge. The edge has a “direction” and a “strength”. Clearly the strength
of the edge is what is important to us. The edge is defined on an image B and an edge image is
obtained by taking each edge value at each pixel. We can apply a threshold to the weight and the
direction of the edge. All edges with a weight beyond a certain value become 1 and all edges less
than a certain value become 0. In that case now we have converted our image into a binary edge
image.
What is a line? A line is a finite sequence of edges e_i, i = 1, ..., n. A line is
a sequence of edges where the edges need to be one another's neighbors under a neighborhood
relationship. The edges must be connected. A line has a length; the length is the number of
edges that form that line.
What’s a region in the image? A region is a connected set R of pixels from an image B. A region
has a contour. A contour is a line composed of edges and the edges are defined with the property
of two neighboring pixels P and Q. P must be part of the region R, Q must not be. This sounds
all pretty intuitive, but gets pretty complicated once one starts doing operations.
Prüfungsfragen:
• When we have to specify a “distance” between two pixels in a digital image, several distance
measures are available. Please list the distance measures you know. You are invited to use
formulas in your answer.
• When considering pixels, there are “neighborhoods” of pixels. List all types of neighborhoods
treated in the lecture and describe each of these neighborhoods with a sketch.
• What possibilities are there to define pixels in a digital raster image as connected? Explain each
definition with a sketch.
• For what purposes does one define neighborhood and connectivity relations between pixels in
digital raster images?
• Give the definitions of the terms “edge”, “line” and “region” in a digital raster image.
1.7 Algebraic Operations on Images
We can add two images, subtract, multiply, divide them, we can compare images by some logical
operations and we can look at one image using a second image as a mask. Suppose we have a
source image, an operator and a destination image. Now, depending on the operator we obtain
a resulting image. We take a particular source and destination image and make our operator
the function “replace”, the function “or”, the function “xor” or the function “and”, to then
obtain different results. We may have mask operations. In this case we take an image A to
Algorithm 5 Logical mask operations
This is an example of a mask operation. Two images are linked with the Boolean OR operator,
pixel by pixel.
for i = 0; i < width; i++ do
    for j = 0; j < height; j++ do
        x1 = source-image.value(i, j)
        x2 = operate-image.value(i, j)
        target-image.value(i, j) = x1 OR x2
    end for
end for
obtain a resulting image “not A”. We may produce from images A and B a logical combination
“and”. Slide “Maskenoperationen 2” is an example of the “or” and the “xor” operation; slide
“Maskenoperationen 3” shows the “not and” operation. Operating on raster images with the
help of a second image was mentioned earlier as a masking operation. This is a filter operator, also
called a window operation. Let us assume that we have an image with gray values z and we have a
second image with gray values w. We can now take the second image, a small 3 × 3 pixel image,
place it over the first image and define an operation that multiplies each pixel of the second
image with the underlying pixel of the first image, adding up all the multiplied values, resulting in
a new z-value at the pixel in the middle of that mask. All we have done here is apply a filter
operation. We will address filter operations later in a separate chapter of this class on filters.
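A brief Python sketch of the 3 × 3 window operation just described (the example image and the averaging weights are made up; this is the per-pixel formulation, not the fast frame-buffer variant of Algorithm 6 below):

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)  # toy gray-value image z(x, y)
w = np.full((3, 3), 1.0 / 9.0)                    # second "image": 3 x 3 mask of weights w

filtered = np.zeros_like(image)
for i in range(1, 4):                             # skip the border for simplicity
    for j in range(1, 4):
        window = image[i - 1:i + 2, j - 1:j + 2]
        filtered[i, j] = np.sum(window * w)       # multiply mask and pixels, sum up
print(filtered[2, 2])                             # 12.0, the mean of the central 3 x 3 block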
Algorithm 6 Fast mask operations
Frame buffers A, B    {A ... source image, B ... destination image}
Mask = ( w1 w2 w3
         w4 w5 w6
         w7 w8 w9 )    {defines the filter mask}
B = A    {initialize frame buffers}
Multiply B by w5    {multiplication is performed on all pixels of B}
Shift A right by one pixel    {shifts the whole frame buffer to the right}
Add w4 · A to B
Shift A down by one pixel
Add w1 · A to B
Shift A left by one pixel
Add w2 · A to B
Shift A left by one pixel
Add w3 · A to B
Shift A up by one pixel
Add w6 · A to B
Shift A up by one pixel
Add w9 · A to B
Shift A right by one pixel
Add w8 · A to B
Shift A right by one pixel
Add w7 · A to B
Shift A left by one pixel
Shift A down by one pixel    {now A is in its original position again}
At this point let us close out with the basic idea of a parallel operation on an image using a
second image as a mask or filter or window. What is interesting is that these operations can
run very quickly. We do not need to necessarily go sequentially through the image and do these
multiplications and additions on one pixel only. We can instead do a fast operation on an entire
mask. For this we may have an input frame buffer A and an output frame buffer B. We may
be able to process everything that is in these two buffers in 1/30 of a second, so we can do an
operation on N times M pixels in 1/30 of a second per frame-buffer pass, as illustrated in Slide
“Operationen”.
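The shift-and-add idea of Algorithm 6 can be imitated on whole arrays; the sketch below (a NumPy illustration of the principle, not the frame-buffer hardware itself; note that np.roll wraps around at the image border, unlike a real frame-buffer shift) computes the same 3 × 3 weighted sum by shifting the complete source image and accumulating, one shift per mask element.

import numpy as np

A = np.arange(25, dtype=float).reshape(5, 5)      # source "frame buffer"
w = np.full((3, 3), 1.0 / 9.0)                    # 3 x 3 filter mask w1..w9

B = np.zeros_like(A)
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        # shift the whole array, like shifting a frame buffer by one pixel, then accumulate
        B += w[dy + 1, dx + 1] * np.roll(np.roll(A, -dy, axis=0), -dx, axis=1)
print(B[2, 2])                                    # 12.0, same as the per-pixel window operation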
Prüfungsfragen:
• Given the two binary images in Figure ??. What result is obtained by combining the two images
with a logical “xor” operation? Please use a sketch.
• Explain, using a few examples, what is meant by algebraic operations on two images.
• Explain the terms “mask”, “filter” and “window” in the context of algebraic operations on two
images. Illustrate your answer with a sketch.
[Slides 1.1 through 1.75 of Chapter 1 appear here.]
Chapter 2
Sensing
2.1 The Most Important Sensors: The Eye and the Camera
The eye is the primary sensor of a human. It is certainly important to understand how it operates
to understand how a computer can mimic the eye and how certain new ideas in computer vision
and also in computer graphics have developed taking advantage of the specificities of the eye.
In Slide 2.5 we show an eye and define an optical axis of the eye's lens. This optical axis
intersects the retina at a place called the fovea, which is the area of highest geometric and radiometric resolution. The lens can change its focal length using muscles that pull on the lens and
change its shape. As a result the human can focus on objects that are nearby, for example at a
25 cm distance, which is typically used in reading a newspaper or book. Or it can focus at infinity,
looking out into the world.
The light that is projected from the world through the lens onto the retina gets converted into
signals that are then fed by nerves into the brain. The place where the nerve leaves the eye is
called the blind spot. That is a location where no image can be sensed. The optical system of
the eye consists, apart from the lens, of the so called vitreous humor 1 , in front of the lens is a
protective layer called the cornea 2 and between the lens and the cornea is a space filled with liquid
called the anterior chamber . Therefore the optical system of the eye consists of essentially four
optically active bodies: 1. the cornea, 2. the anterior chamber, 3. the lens and 4. the vitreous
humor.
The conversion of light into nerve signals is accomplished by means of rods and cones that are
embedded in the retina. The rods 3 are black-and-white sensors. The eye has about 75 million of
them, and they are distributed widely over the retina.
If there is very little light, the rods will still be able to receive photons and convert them into
recognizable nerve signals. If we see color, we need the cones 4. We have only 6 million of those,
and they are not as evenly distributed as the rods are. They are concentrated at the fovea, so
that the fovea has about 150,000 of those cones per square millimeter. That number is important
to remember for a discussion of resolution later on.
We take a look at the camera as an analogon of the eye. A camera may produce black-and-white
or color images, or even false-color images. The slide shows a typical color image taken from an airplane of
a set of buildings (see these images also in the previous Chapter 0). This color photograph is built
1 in German: Glaskörper
2 in German: Hornhaut
3 in German: Stäbchen
4 in German: Zäpfchen
from three component images. First is the red channel, second is the green channel, followed by
the blue channel. We can combine those red/green/blue channels into a true-color image.
In terms of technical imaging, a camera is capable of producing a single image or an entire image
sequence. When we have multiple images or image sequences, we typically denote them as multi-images.
A first case may be in the form of multi-spectral images, where we break up the entire range of electromagnetic radiation from ultraviolet to infrared into individual bands and produce a separate image
for each band. We call the sum of those images multi-spectral. If we have many of those bands we
might call the images hyper-spectral. Typical hyper-spectral cameras produce 256 separate
images simultaneously, not just red/green/blue!
A second case is to have the camera sit somewhere and make images over and over, always in the
same color but observing changes in the scene. We call that multi-temporal . A third case is to
observe a scene or an object from various positions. A satellite may fly over Graz and take images
once as the satellite barely arrives over Graz, a moment later as the satellite already leaves Graz.
We call this multi-position images.
And then finally, a fourth case might have images taken not only by one sensor but by multiple
sensors, not just by a regular optical camera, but perhaps also by radar or other sensors as we will
discuss them later. That approach will produce some multi-sensor images.
This multiplicity of images presents a very interesting challenge in image processing. Particularly
when we have a need to merge images that are taken at separate times from separate positions
and with different sensors, and if we want to automatically extract information about an object
from many images of that object, we have a good challenge. Multiple digital images of a particular
object location result in multiple pixels per given location.
Those pixels can be stacked on top of one another and then represent “a vector”, with the actual
gray values in each individual image being the “elements” of that vector. We can now apply the
machinery of vector algebra to these multi-image pixels. Such a vector may be called a feature vector,
with the features being the color values of the pixel to which the vector belongs.
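A small Python sketch of this stacking (the three toy "bands" are invented): the gray values of the same pixel in several registered images are collected into one feature vector per pixel.

import numpy as np

red   = np.random.randint(0, 256, (4, 4))
green = np.random.randint(0, 256, (4, 4))
blue  = np.random.randint(0, 256, (4, 4))         # three co-registered images of the same scene

stack = np.stack([red, green, blue], axis=-1)     # shape (4, 4, 3): one vector per pixel location
feature_vector = stack[1, 2]                      # the 3-element feature vector of pixel (1, 2)
print(feature_vector)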
Prüfungsfragen:
• In the context of sensing, what is meant by single and multiple images? Give some examples of
multiple images!
2.2 What is a Sensor Model?
So far we have only talked about one particular sensor, the camera as an analogon to the eye. We
describe in image processing each sensor by a so-called sensor model. What does a sensor model
do? It replaces the physical image and the process of its creation by a geometric description of
the image’s creation. We stay with the camera: this is designed to reconstruct the geometric ray
passing through the perspective center of the camera, from there through the image plane and out
into the world.
Slide 2.11 illustrates that in a camera's sensor model we have a perspective center O, we have an
image plane P, we have image coordinates x and h, and we have an image of the perspective center,
H, at the location obtained by dropping a line perpendicular from the perspective center
onto the image plane. We find that our image coordinate system x, h, with its origin M, does not
necessarily have to coincide with location H.
So what is a sensor model? It is a set of techniques and mathematical equations that allows
us to take an image point P′ as shown in Slide 2.11 and define a geometric ray going from location
Definition 4 Perspective camera
(Modeling a perspective camera, see Section 2.2)
Goal: to establish a relationship between the perspective center and the world. Tool: the perspective
transformation, which projects 3-D points onto a plane; it is a non-linear transformation.
Description of Slide 2.12: We work with two coordinate systems: 1. the image coordinate system
(x, y, z) and 2. the world coordinate system (X, Y, Z). A ray from a point w in 3-D object space
hits the image plane (x, y) at the image point c. The center of this image plane is the coordinate
origin, from which an additional z-axis runs normal to that plane; it is identical to the optical axis
of the camera lens. Where the ray intersects this z-axis lies the so-called lens center with the
coordinates (0, 0, L); for a focused camera, L is comparable to the focal length. Condition: Z > L,
i.e. all points of interest lie behind the lens.
The vector w0 gives the position of the rotation axes in 3-D space, from the origin of the world
coordinate system to the center of the camera mount. The vector r = (r1, r2, r3)^T defines where
the image origin is, taking into account the rotation axes (X0, Y0, Z0) that can rotate the camera
up and down; it runs from the center of the mount to the center of the image plane.
Perspective transformation: the relationship between (x, y) and (X, Y, Z), derived with similar
triangles:
x : L = (−X) : (Z − L) = X : (L − Z)
y : L = (−Y) : (Z − L) = Y : (L − Z)
The terms −X and −Y express that the image points appear inverted (geometry). Hence
x = L · X / (L − Z)
y = L · Y / (L − Z)
Homogeneous coordinates of a point given in Cartesian coordinates:
w_kar = (X, Y, Z)^T
w_hom = (k·X, k·Y, k·Z, k)^T = (w_hom1, w_hom2, w_hom3, w_hom4)^T, with a constant k ≠ 0.
Conversion back into Cartesian coordinates:
w_kar = (w_hom1 / w_hom4, w_hom2 / w_hom4, w_hom3 / w_hom4)^T
Perspective transformation matrix:
P = ( 1   0   0     0
      0   1   0     0
      0   0   1     0
      0   0  −1/L   1 )
O (the perspective center) through P′ into the world. What the sensor model does not tell us is
where the camera is and how this camera is oriented in space. So we do not, from the sensor
model, find the world point P in three-dimensional space (x, y, z). We only take a camera and an
image with its image point P′ and from that can project a ray back into the world, but where that
ray intersects the object point in the world needs something that goes beyond the sensor model.
We need to know where the camera is in a World system and how it is oriented in 3D-space.
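To make the perspective transformation of Definition 4 concrete, here is a small numerical sketch in Python (the world point and the value of L are arbitrary illustration values): the direct formulas and the homogeneous matrix give the same image coordinates.

import numpy as np

L = 0.05                      # assumed "focal" distance of the lens center, in meters
X, Y, Z = 2.0, 1.0, 10.0      # an arbitrary world point

# Direct formulas from Definition 4
x = L * X / (L - Z)
y = L * Y / (L - Z)

# Same result via the homogeneous perspective transformation matrix
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, -1.0 / L, 1]])
c_hom = P @ np.array([X, Y, Z, 1.0])
x_h, y_h = c_hom[0] / c_hom[3], c_hom[1] / c_hom[3]

print(x, y)        # approx. -0.01005, -0.00503
print(x_h, y_h)    # identical values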
In computer vision and in computer graphics we do not always deal with cameras that are carried
in aircraft looking vertically down and having therefore a horizontal image plane. Commonly, we
have cameras that are in a factory environment or similar situation and they look horizontally or
obliquely at something that is nearby.
Slide 2.12 illustrates the relationship between a perspective center and the world. We have an image plane defined by the image coordinate axes x and y (they were ξ and η before), and a ray from the object space, denoted as W, will hit the image plane at location C. The center of the image plane is the coordinate origin. Perpendicular to the image plane (which is defined by x and y) is the z-axis, which may in this case be identical to the optical axis of the lens. In this particular case there is no difference between the image point of the perspective center (which was H before) and the origin of the coordinate system (which was M before).
Now, in this robotics case we have two more vectors that define this particular camera. We have a
vector r that defines where the image origin is with respect to our rotation axis that would rotate
the camera. And we have a vector W0 that gives us the position of that particular rotation axis
in 3D-space. We still need to define for that particular camera its rotation axis that will rotate
the camera up and down and that is oriented in a horizontal plane. We will talk about angles
and positions of cameras later in the context of transformations. Let us therefore not pursue this
subject here. All we need to say at this point is that a sensor model relates to the sensor itself
and in robotics one might understand the sensor model to include some or all of the exterior
paraphernalia that position and orient the camera in 3D space (the pose). In photogrammetry, it is exactly these latter data that form the so-called exterior orientation of the camera.
Exam questions:
• Explain the term "sensor model"!
2.3 Image Scanning
Images on film need to be stored in a computer, but before they can be stored they need to be scanned. On film, an image is captured in an emulsion. The emulsion contains chemistry, and as light falls onto the emulsion the material gets changed under the effect of photons. Those changes are very volatile; they need to be preserved by developing the film. The emulsion is protected from the environment by supercoats. The emulsion itself is applied to a film base, so the word "film" really applies to just the material on which the emulsion is fixed. There is a substrate that holds the emulsion onto the film base, and the film base on its back often has a backing layer. That is the structure of a black-and-white film.
With color film we have more than one emulsion: we have three of those layers on top of one another. We are dealing mostly with digital images, so analog film, photo labs and chemical film development are not of great interest to us. But we need to understand a few basic facts about film and the appearance of objects on film.
Slide 2.15 illustrates that appearance. The ordinate of the diagram records the density that results from the reflections of the world onto the emulsion. The density is 0 when there is no light and the film is totally transparent (negative film!). As more and more light falls onto the film, the film gets more exposed and the density gets higher until
the negative is totally black. Now this negative film is exposed by the light that is emitted from
the object through a lens onto the film. Typically, the relationship between the density recorded
on film and light emitted from an object is a logarithmic one. As the logarithm of the emitted
light increases along the abscissa, the density will typically increase linearly, and that is the most
basic relationship between the light falling onto a camera and the light recorded on film, except
in the very bright and the very dark areas. When there is almost no light falling on the film, the
film will still show what is called a gross fog. So film typically will never be completely unexposed.
There will always appear to be an effect as if a little bit of light had fallen onto the film. When a lot of light comes in, we lose the linear relationship again and we come to the "shoulder" of the gradation curve. As additional light comes in, the density of the negative does not increase
any more.
Note that the slope of the linear region is denoted here by tan(α) and is called the gamma of the
film. This defines more or less sensitive films and the sensitivity has to do with the slope of that
linear region. If a lot of light is needed to change the density, we call this a slow or "low sensitivity" film. If a small change in light causes a large change in density, then we call this a "very sensitive" film, and the linear region is steeper. The density range that we can record on film often lies between 0 and 2. However, in some technical applications, in the graphic arts, and in the printing industry, densities may go up to 3.6. And in medicine, X-ray film density goes up as
high as 4.0. Again, we will talk more about density later so keep in mind those numbers: Note
that they are dimensionless numbers. We will interpret them later.
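As a brief aside, anticipating that later interpretation: photographic density is commonly defined as the base-10 logarithm of the opacity, i.e. D = log10(incident light / transmitted light), which is why it is dimensionless. A minimal Python sketch of what the quoted numbers mean in terms of transmitted light (the values simply repeat the numbers mentioned above):

    for D in (0.0, 2.0, 3.6, 4.0):          # density values quoted in the text
        T = 10.0 ** (-D)                    # transmitted fraction, since D = log10(1/T)
        print(f"D = {D:3.1f}  ->  {T * 100:.3g} % of the light gets through")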
We need to convert film to digital images. This is based on one of three basic technologies.
First, so-called drum scanners have the transparent film mounted on a drum; inside the drum is a fixed light source, the drum rotates, the light source illuminates the film, and the light coming through the film is collected by a lens and put onto a photodetector (a photomultiplier, in German: Sekundärelektronenvervielfacher). The detector sends electric signals which get A/D converted and produce, at rapid intervals, a series of numbers per rotation of the drum: we get one row of pixels per drum rotation. This technology has been very popular but has recently become obsolete because the device relies on sensitive and rapid mechanical movements, and it is difficult to keep these systems calibrated.
Second, a much simpler way of scanning is to use not a single dot but a whole array of dots, namely a CCD (charge-coupled device). We put it in a scan head and collect light that, for example, comes from below the table, shines through the film, gets collected by a lens and gets projected onto a series of detectors. There may be 6000, 8000, 10,000 or even 14,000 detectors, and these detectors collect the information about one single line of film. The detector charges are read out, and an A/D converter produces one number for each detector element. Again, the entire row of detectors creates in one instant a row of pixels. How do we get a continuous image? Of course by moving the scan head and accumulating the charges built up row by row into an image (push-broom technology).
Third, we can have a square array detector field. The square CCD is mounted in the scan head, and the scan head "grabs" a square. How do we get a complete image that is much larger than a single square?
By stepping the camera, stopping it, staring at the object, collecting 1000 by 1000 pixels, reading them out, storing them in the computer, moving the scan head, stopping it again, taking the next square, and so on. That technology is called step and stare. An array CCD is used to cover a large document by individual tiles, and the tiles are then assembled into a seamless image.
We get the push-broom single-path linear CCD array scanner typically in desktop-, household-,
H.P.-, Microtec-, Mostec-, UMAX-type products.
Those create an image in a single swath and are limited by the length of the CCD array. If we want to create an image larger than the length of the CCD array permits, then we need to assemble image segments.
So to create a swath by one movement of the scan head, we step the scan head over and repeat
this swath in the new location. This is called the multiple path linear CCD scanner. Another
name for this is xy-stitching. The scan head moves in x and y, individual segments are collected, and these are then "stitched" together.
Exam questions:
• Sketch three different methods for scanning two-dimensional originals (e.g., photographs)!
2.4 The Quality of Scanning
People are interested in how accurate scanning is geometrically. The assessment is typically based
on scanning a grid and comparing the grid intersections in a digital image with the known grid
intersection coordinates of the film document. A second issue is the geometric resolution. We
check that by imaging a pattern.
Slide 2.22 is called a US Air Force Resolution Target; each of its patterns has a very distinct width of the black lines and of the intervals between those black lines. As the black lines get smaller and closer together, we challenge the imaging system more and more.
If we take a look at an area that is beyond the resolution of the camera, then we will see that we cannot resolve the individual bars anymore. The limiting case that we can just resolve is used to
describe the resolution capability of the imaging system. That may describe the performance of a
scanner but it may just as well describe the resolution of a digital camera.
These resolution targets come with tables that describe what each element resolves. For example, the patterns are organized into groups (called Group 1, 2, 3, 4, 5, 6), and within each group we find six elements.
In the example shown in Slide 2.24 one sees how the resolution is designated in line pairs per millimeter. However, we have pixels, and pixels have a side length. How do we relate the line pairs per millimeter to the pixel diameter? We will discuss this later.
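A common rule of thumb, anticipating that later discussion, is that one resolved line pair needs at least two pixels: one for the bright bar and one for the dark bar. Under that idealized assumption (ignoring lens and sampling effects), the relation can be sketched as follows:

    def lp_per_mm(pixel_size_um: float) -> float:
        """Upper bound on resolvable line pairs per millimetre, assuming two pixels per line pair."""
        pixel_size_mm = pixel_size_um / 1000.0
        return 1.0 / (2.0 * pixel_size_mm)

    # example: a scanner with 12.5 micrometre pixels resolves at most about 40 lp/mm
    print(lp_per_mm(12.5))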
The next subject for evaluating a digital image and developing a scanner is the gray value performance. We have a Kodak gray wedge that has been scanned. On the bright end the density is 0, on the dark end the density is 3.4. We now have individual steps of 0.1 and can judge whether those steps get resolved both in the bright and in the dark area. On a monitor like this we cannot really see all thirty-four individual steps in intervals of 0.1 D from 0 to 3.4. We can use Photoshop and apply a function called histogram equalization, whatever that means, to each segment of this gray wedge. As a result we see that all the elements have been resolved in this particular case.
Exam questions:
• How is the geometric resolution of a film scanner specified, and with which method can it be determined?
2.5 Non-Perspective Cameras
Cameras per se have been described as having a lens projecting light onto film and then we scan
the film. We might also have instead of film a digital square array CCD in the film plane to get
the direct digital image. In that case we do not go through a film scanner. We can also have a
camera on a tripod with a linear array, moving the linear array while the light is falling on the
image plane collecting the pixels in a sequential motion much like a scanner would. There also
are even stranger cameras which do not have a square array in the film plane and avoid a regular perspective lens. These are non-perspective cameras.
First let us look at a linear CCD array, shown here with 6000 elements arranged side by side, each element having a surface of roughly 12 µm × 12 µm. These are read out very rapidly so that a new line can be exposed as the array moves forward. An interesting arrangement with two lenses is shown in Slide 2.28: the two lenses expose one single array in the back. Half of the array looks in one direction, half in the other direction. By moving the whole scan head we can now assemble two digital strip images. A project to build this camera was completed as part of a PhD thesis in Graz: the student built a rig on top of his car, mounted the camera on it, and drove through the city collecting images of building facades, as we have seen earlier (see Chapter 0).
Exam questions:
• What advantages and disadvantages do non-perspective cameras (optical, e.g., line, thermal or panoramic cameras) have compared to conventional (perspective) cameras?
2.6 Heat Images or Thermal Images
Heat images collect electromagnetic radiation in the middle to far infrared, not in the near infrared, so it is not next to visible light in the electromagnetic spectrum. That type of sensing can be accomplished by a mirror that looks at essentially one small instantaneous field-of-view (IFOV), in the form of a circular area on the ground, collects the light from there, projects it onto a detector, and makes sure that in rapid sequence one can collect a series of those circular areas on the ground.
What we have here is an instantaneous angle of view α. We have a cone that relates the sensor to the ground, and the axis of the cone is at an angle from the vertical called "A". In the old days, say in the sixties and seventies, the recording was often not digital but on film.
Slide 2.35 illustrates the old-fashioned approach. We have infrared light coming from the ground. It is reflected off a mirror and goes through an optical system that focuses it onto the IR detector, which converts the incoming photons into an electric signal. That signal is then used to modulate the intensity of a light source, which is projected via another lens and a mirror onto a piece of curved film.
Slide 2.36 was collected in 1971 or 1972 in Holland. These thermal images were taken from an
airplane over regularly patterned Dutch landscapes. What we see here is the geometrical distortion
of fields, as a result of the airplane wobbling in the air as the individual image lines are collected
in each row. Each image line is accrued to its previous one by a sequential motion of the airplane.
A closer look shows that there are areas that are bright and others that are dark. If it is a positive
then the bright things are warm, the dark things are cold.
Exam questions:
• What advantages and disadvantages do non-perspective cameras (optical, e.g., line, thermal or panoramic cameras) have compared to conventional (perspective) cameras?
2.7 Multispectral Images
We already saw the concept of multi-spectral images. In principle they get, or in the past have
been, collected by a rotating mirror that reflects the light from the ground off a mirror onto a
refraction prism. The refraction prism splits the white light coming from the ground into its
color-components. We have for each color a detector. This could be three for red/green/blue or
226 for hyper-spectral-systems. Detectors convert the incoming light into an electric signal and
they get either A/D converted or directly recorded. In the old days recording was onto a magnetic
tape unit, today we record everything on a digital disc with a so-called direct capture system DCS.
When one does these measurements with sensors, one really gets into a lot of open-air physics. One needs to understand what light, i.e. electromagnetic radiation, is. When energy comes from the sun, a lot of it is in the visible range, somewhat less in the ultraviolet, and somewhat less in the infrared.
The sun’s energy is augmented by energy that the Earth itself radiates off as an active body.
However, its energy is in the longer wavelengths. The visible light goes, of course, from blue via
green to red. The infrared goes from the near infrared to the middle and far infrared. As our
wavelengths get longer we go away from infrared and we go into the short waves, microwaves, long
microwaves and radiowaves.
When we observe in a sensor the radiation that comes in from the surface of the Earth, we do not get an even distribution of the energy as the sun has sent it to the Earth; we get the reflection of the surface, and those reflections depend on what is on the ground, but also on what the atmosphere does to the radiation. A lot of that radiation gets blocked by the atmosphere, in particular from the infrared on. There are a few windows, at 10 micrometers and at 14 micrometers wavelength, where the energy gets blocked less and we can obtain infrared radiation. In the visible and near infrared the atmosphere lets this radiation through unless, of course, the atmosphere contains a lot of water in the form of clouds, rain or snow: that will block the visible light just as it blocks a lot of the longer wavelengths. The blocking of the light in the atmosphere is also a measure of the quality of the atmosphere.
In imaging the Earth’s surface, the atmosphere is a “nuisance”. It reduces the ability to observe
the Earth’s surface. However, the extent to which we have blockage by the atmosphere tells us
something about pollution, moisture etc. So something that can be a nuisance to one application
can also be useful in another.
We are really talking here about ideas that are at the base of a field called remote sensing. A typical image of the Earth's surface is shown in Slide 2.42.
A color photograph has no problem with the atmosphere: we have the energy from the sun illuminating the ground, we have the red/green/blue colors of a film image, it can be scanned and put into the computer, and the computer can use the colors to assess what is on the ground.
Exam questions:
• Sketch the working principle of a "multispectral scanner". You are invited to use a graphical sketch in your answer.
2.8 Sensors to Image the Inside of Humans
Sensors cover a very wide field, and imaging is a subset of sensing (think also of acoustics, temperature, salinity and things like that). Very well known are so-called CAT scans (computer-aided tomography). The technique was invented in 1973, and in 1979 the two inventors, Hounsfield and Cormack, received the Nobel prize. It was one of the fastest recognitions of a breakthrough ever. It revolutionized medicine because it allowed medical people to look at the inside
2.9. PANORAMIC IMAGING
59
of humans at a resolution and accuracy that was previously unavailable without having to open
up that human.
Slide 2.44 illustrates the idea of the CAT scan that represents the transmissivity of little cubes of
tissue inside the human. While a pixel is represented in two dimensions, here each gray value
represents how much radiation was transmitted through a volume element. So therefore those gray
values do not associate well with a 2D pixel but with a 3D voxel or volume element. A typical
CAT image that may appear in 2D really reflects in x and y a 1 mm × 1 mm base, but in z it
may reflect a 4 mm depth.
Exam questions:
• Explain how computed tomography is used to obtain a three-dimensional volume model of the inside of the human body.
2.9 Panoramic Imaging
We talked in Chapter 0 about the increasingly popular panoramic images.
They used to be produced by spy satellites, spy airplanes, and spacecraft imaging other planets or the Earth. The reason why we are interested in these images is that we would like to have a high geometric resolution and a very wide swath, thus a wide field of view at high resolution. Those two things are in conflict: a wide-angle lens gives an overview image, while a telephoto lens gives a very detailed image, but only of a small part of the object. How can we have both the very high resolution of a telephoto lens and still have the coverage of a wide-angle lens? That is obtained by moving the telephoto lens, by sweeping it to produce a panoramic image (compare the material from Chapter 0).
Exam questions:
• What advantages and disadvantages do non-perspective cameras (optical, e.g., line, thermal or panoramic cameras) have compared to conventional (perspective) cameras?
2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images
Slide 2.49 is an image taken from a European Space Agency (ESA) satellite called ERS-1, of an
area in Tirol’s Ötztal. There exists a second image so that the two together permit us to see a
three dimensional model in stereo. We will talk about this topic of stereo later. How is a radar
image being produced? Let’s assume we deal with an aircraft sensor.
We are making images with radiation that is far beyond the infrared, namely in the microwaves (wavelengths range from one millimeter to two meters, typically 3 to 5 cm). We cannot use glass lenses to focus that radiation; we need to use something else, namely antennas. So a wave gets generated and travels through a waveguide to an antenna. The antenna transmits a small burst of energy, a pulse. That pulse travels through the atmosphere to the ground. It illuminates an area on the ground with a footprint that is a function of the shape of the antenna. The ground reflects it back, the antenna goes into listening mode and "hears" the echo. The echo comes from the nearby objects first and from the far-away objects last. It gets amplified, A/D converted and sampled, and produces a row of pixels, in this case radar image pixels. The aircraft moves forward, and the same process repeats itself 3000 times per second. One
60
CHAPTER 2. SENSING
obtains a continuous image of the ground. Since we illuminate the ground by means of the sensor,
we can image day-and-night. Since we use microwaves, we can image through clouds, snow and
rain (all weather).
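The timing geometry behind the echo can be sketched in a few lines: the pulse travels to the ground and back at the speed of light, so half the round-trip time gives the slant range to a reflector. The numbers below are purely illustrative assumptions, not parameters of ERS-1:

    C = 299_792_458.0                        # speed of light in m/s

    def slant_range(echo_delay_s: float) -> float:
        """Distance from sensor to reflector, from the round-trip delay of one echo."""
        return C * echo_delay_s / 2.0

    # an echo arriving 50 microseconds after the pulse corresponds to about 7.5 km slant range
    print(slant_range(50e-6))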
Exam questions:
• Describe the principle of image acquisition by radar! What advantages and disadvantages does this method offer?
• With the help of radar waves, digital images can be produced from aircraft and satellites, from which a topographic model of the terrain (an elevation model) can be created from a single image acquisition. Describe the physical effects of electromagnetic radiation that are exploited for this purpose!
2.11 Making Images with Sound
There is a very common technique to map the floor of the oceans; really only one technique is widely applicable right now: under-water SONAR. SONAR stands for sound navigation and ranging. It is a complete analogy to radar, except that we do not use antennas and electromagnetic energy; instead we use vibrating membranes that emit sound pulses, and we need water for the sound to travel. The sound pulse travels through the water, hits the ground and gets reflected; the membrane goes into a listening mode for the echoes. These get processed and create one line of pixels. As the ship moves forward, line by line gets accrued into a continuous image.
The medical ultrasound technology is similar to under-water imaging, but there are various different approaches. Some methods of sound imaging employ the Doppler-effect. We will not discuss
medical ultrasound in this class, but defer to later classes in the “image processing track”.
Exam questions:
• Name applications of sound waves in digital imaging!
2.12 Passive Radiometry
We mentioned earlier that the Earth is active, is transmitting radio-waves without being illuminated by the sun. This can be measured by passive radiometry. We have an antenna, not a lens.
It “listens” to the ground. The antenna receives energy which comes from a small circular area on
the ground. That radiation is collected by the antenna, is processed and creates an image point.
By moving the antenna we can move that point on the ground and thereby have a scanning motion
producing an image scan that gets converted into a row of pixels. By moving the aircraft forward
we accumulate rows of pixels for a continuous image. Passive radiometry is the basis of weather
observations from space where large areas are being observed, for example the arctic regions.
Exam questions:
• What is meant by "passive radiometry"?
2.13 Microscopes and Endoscopes Imaging
The most popular microscopes for digital imaging are so called scanning electron-microscopes
(SEM) or X-ray-microscopes. Endoscopes are optical devices using light to look “inside things”.
Most users are in medicine to look into humans. There is a lens-system and light to illuminate the
inside of the human. The lens collects the light, brings it back out, goes in the computer and on
the monitor the medical staff can see the inside of the human, the inside of the heart, the inside
of arteries and so forth. Endoscopes often take the shape of thick "needles" that can be inserted into a human.
The same approach is used in mechanical engineering to inspect the inside of engines, for example
to find out what happens while an explosion takes place inside a cylinder chamber in an engine.
Exam questions:
• Describe at least two methods or devices that are used in medicine to acquire digital raster images!
2.14 Object Scanners
The task is to model a 3D object, a head, a face, an engine, a chair. We would like to have a
representation of that object in the computer. This could already be a result of a complete image
processing system, of which the sensor is only a component, as is suggested in Slide 2.58. The
sensor produces a 3D model from images of the entire object. This could be done in various ways.
One way is to do it by a linear array camera that is being moved over the object and obtains a
strip-image. This is set up properly in the scanner, to produce a surface patch. Multiple patches
must be assembled. This is done automatically by making various sweeps of the camera over the
object as it gets rotated.
We can also have a person sit down on a rotating chair and a device will optically (by means of
an infrared laser) scan the head and produce a 3D replica of the head. Or the object is fixed and
the IR-laser is rotating.
The next technique would be to scan an object by projecting a light pattern onto the surface. That is called structured light (in German: Lichtschnitte). Finally, we can scan an object by touching it with a touch-sensitive pointer, where the pointer is under a force that keeps its tip on the object as it moves; another approach is to have a pointer move along the surface and track the pointer by one of many tracking technologies (optical, magnetic, sound; see also Augmented Reality later on).
Exam questions:
• What purpose does a so-called "object scanner" serve? Name three different methods by which an object scanner can work without touching the object!
2.15 Photometry
We are now already at the borderline between sensors and image processing/image analysis. In
photometry we do not only talk about sensors. However, photometry is a particular type of sensor
arrangement. We image a 3D object with one camera taking multiple images, as in a time series, but each image is taken with a different illumination. So we may have four or ten lamps at different positions. We take one image with lamp 1, a second image with lamp 2, a third image with lamp 3, and so on. We collect these multiple images, thereby producing a multi-illumination image dataset. The shape reconstruction is based on a model of the surface reflection properties: using those properties, the radiometry of the images yields the object shape.
2.16 Data Garments
Developments attributed to computer graphics concern so-called data-garments. We need to
sense not only properties of the objects of interest, but also need to sense where an observer is
because we may want to present him or her with a view of an object in the computer from specific
places and directions. The computer must know in these cases where we are. This is achieved
with data-gloves and head-mounted displays (HMD). For tracking the display’s pose, we may have
magnetic tracking devices to track where our head is, in which direction we are looking. There
is also optical tracking which is more accurate and less sensitive to electric noise, there may be
acoustic tracking of the position and attitude of the head using ultrasound.
Exam questions:
• What is meant by "data garments"? Name at least two devices in this category!
2.17 Sensors for Augmented Reality
In order to understand what the sensor needs for augmented reality, we need first to understand
what augmented reality is. Let us take a simple view. Augmented reality is a simultaneous visual
perception by a human being of the real environment, of course by looking at it, and superimposing
onto that real environment virtual objects and visual data that are not physically present in the
real environment.
How do we do this? We provide the human with transparent glasses which double as computer
monitors. So we use one monitor for the left eye, another monitor for the right eye. The monitors
show a computer generated image, but they are transparent (or better semitransparent). We not
only see what is on the monitor, we also see the real world. The technology is called head mounted
displays or HMDs. Now, for an HMD to make any sense, the computer needs to know where the
eyes are and in what direction they are looking. Therefore we need to combine this HMD with a
way of detecting the exterior orientation or pose.
That is usually accomplished by means of magnetic positioning. Magnetic positioning, however,
is fairly inaccurate and heavily affected by magnetic fields that might exist in a facility with
computers. Therefore we tend to augment magnetic positioning by optical positioning as suggested
in Slide 2.63. A camera is looking at the world, mounted rigidly with the HMDs. Grabbing an
image, one derives from the image where the camera is and in which direction it is pointed and
one also detects where the eyes are and in which direction they are looking. Now we have the basis
for the computer to feed into the glasses the proper object in the proper position and attitude
so that the objects are where they should be. As we review augmented reality, we immediately
can see an option of viewing the real world via the cameras and feeding the eyes not with the
direct view of reality, but indirectly with the camera’s views. This reduces the calibration effort
in optical tracking.
Exam questions:
• Explain the working principle of two tracking methods frequently used in augmented reality, and discuss their advantages and disadvantages!
Answer:

    Tracking   | Advantages       | Disadvantages
    magnetic   | robust           | short range, inaccurate
    optical    | fast, accurate   | demands on the environment, elaborate

2.18 Outlook
The topic of imaging sensors is wide. Naturally we have to skip a number of items. However, some
of these topics will be visited in other classes for those interested in image processing or computer
graphics. They also appear in other courses of our school. Two examples might illustrate this
matter. The first is Interferometry, a sensing technology combined with a processing technology
that allows one to make very accurate reconstructions of 3D shapes by making two images and
measuring the phase of the radiation that gave rise to each pixel. We will deal with this off and
on throughout “image processing”.
Second, there is the large area of medical imaging, with a dedicated course. This is a rapidly
growing area where today there are ultrafast CAT scanners producing thousands of images of a
patient in a very short time. It becomes a real challenge for the doctor to take advantage of
these images and reconstruct what the objects are of which those images are taken. This very
clearly needs a sophisticated level of image processing and computer graphics to help human analysts understand what is in the images and to reconstruct the relevant objects in 3D. A clear separation of the field into Image Processing/Computer Vision and Computer Graphics/Visualization is neither really useful nor feasible.
[Slides 2.1 through 2.64 are reproduced here in the original document.]
Chapter 3
Raster-Vector-Raster Convergence
Algorithm 7 Digital differential analyzer

    dy = y2 − y1
    dx = x2 − x1
    m = dy/dx
    y = y1
    for x = x1 to x2 do
      draw(x, round(y))
      y = y + m        {Step y by slope m}
    end for

3.1 Drawing a straight line
We introduce the well-known Bresenham Algorithm from 1965. The task is to draw a straight
line on a computer monitor and to replace a vector representation of a straight line that goes from
the beginning point to an end point by a raster representation in the form of pixels. Obviously, as
we zoom in on a straight line that is shown on a computer monitor, we notice that we are really looking at the irregular edge of an area that represents the straight line. The closer we look, the more we see that the edge of that straight line is not straight at all. Conceptually, we need to find those pixels in a raster representation that will represent the straight line, as shown in Slide 3.4. The simplest method of assigning pixels to the straight line is the so-called DDA Algorithm (Digital Differential Analyzer). Conceptually, we intersect the straight line with the columns that pass through the centers of the pixels. The intersection coordinates are (xi, yi), and at the next column of pixels they are (xi + 1, yi + m). The DDA algorithm (see Algorithm 7) uses rounding operations to find the nearest pixel, simply by rounding the y-coordinates. Slide ?? illustrates graphically the operations of the DDA algorithm; Slide 3.6 is a conventional procedure doing what was just described graphically. The straight line's beginning point is defined by (x0, y0) and the end point by (x1, y1); for simplicity's sake we say that x is an integer value, we define auxiliary values dx, dy, y and m as real numbers, and we then go through a loop, column by column of pixels, doing rounding operations to find those pixels that will represent the straight line.
The DDA Algorithm is slow because it uses rounding operations. In 1965 Bresenham proposed his algorithm, which was exceedingly fast and outperformed the DDA Algorithm by far, and Pitteway in 1967 proposed the Midpoint-Line Algorithm (see Algorithm ??). These algorithms avoid rounding operations and operate with decision variables only. For a long time, the vector
to raster conversion implemented by Bresenham and Pitteway was only applicable to straight lines. It was as late as 1985 that this methodology of very fast vector-to-raster conversion was extended to circles and ellipses. In this class we will not go beyond straight lines.
The next six illustrations address the Bresenham Algorithm. We begin by defining the sequence
of pixels that are being visited by the algorithm as East, and North-East of the previous pixel and
we find an auxiliary position M which is halfway between the North-East and the East pixel. The actual intersection of the straight line with a line through the column of pixels to be visited is denoted by Q. Essentially Bresenham now says: "Given that we know the previous pixel, we must make a decision whether we should assign to the straight line the pixel NE or the pixel E." You can of course immediately see that the approach is applicable to straight lines that progress between the angles of 0 and 45 degrees. However, for directions between 45 and 90 degrees and so forth, the same ideas apply with minimal modifications. Slide 3.10 and Slide 3.11 actually
describe the procedure used for the Midpoint Line Algorithm with the beginning point (x0 , y0 )
and end point (x1 , y1 ) and will come back from the procedure with the set of raster pixels that
describe the straight line. Again we have to have a dx and a dy with increments E and increments
N E, we have an auxiliary variable b and we have variables x and y. Now the algorithm itself
is self-explanatory, we really do not need much text to explain it. The reader is invited to work
through the algorithm.
The next two slides, Slide 3.12 and Slide 3.13, explain the basic idea behind the midpoint line algorithm. Note that we have introduced an auxiliary point M into the approach, with coordinates (xp + 1, yp + 1/2). The equation of a straight line clearly is

    a·xm + b·ym + c = 0;

a point that is not on the straight line will produce, with this equation, a value either larger or smaller than zero; values larger than zero lie on one side of the straight line, values smaller than zero on the other side.
Now we can also write the equation of a straight line as

    y = (dy/dx)·x + b.

This can be rearranged as shown in Slide 3.13; we can ultimately write down a decision variable d, which can be larger than zero, equal to zero or smaller than zero:

    d = dy·(xp + 1) − dx·(yp + 1/2) + c.

If d is larger than zero, then the pixel of interest is NE; otherwise the pixel of interest is E. If E is selected as the next pixel, then we have to compute a new value dnew by inserting into the equation of the straight line the coordinates of the new midpoint M, which is now (xp + 2, yp + 1/2); if we look at it, this is really nothing else but the old value of d plus dy. But if we select NE as the next pixel, then our new midpoint has the coordinates (xp + 2, yp + 3/2), and dnew is nothing else but the old value of d plus (dy − dx). Once we realize that, we see that the start value of d at the first midpoint comes out as a + b/2 = dy − dx/2; since we do not want to divide anything by two, we simply multiply everything by a factor of 2 and end up with a start value of

    2·dy − dx.

So Bresenham's trick was to avoid multiplications and divisions and simply to decide whether a value is larger or smaller than zero: if it is larger than zero we add one number to it, if it is smaller than zero we add another number, and in this way we work our way along the straight line from pixel to pixel. This was a pretty creative way to make the algorithm fast.
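A minimal Python sketch of this integer-only idea for slopes between 0 and 45 degrees, using the decision variable d with the start value 2·dy − dx and the two increments 2·dy and 2·(dy − dx) derived above (only additions and sign tests inside the loop, no rounding, multiplication or division):

    def midpoint_line(x0: int, y0: int, x1: int, y1: int):
        """Pixels approximating the line from (x0, y0) to (x1, y1); assumes 0 <= slope <= 1 and x0 < x1."""
        dx = x1 - x0
        dy = y1 - y0
        d = 2 * dy - dx          # decision variable, everything already multiplied by 2
        inc_e = 2 * dy           # increment when the East pixel is chosen
        inc_ne = 2 * (dy - dx)   # increment when the North-East pixel is chosen
        x, y = x0, y0
        pixels = [(x, y)]
        while x < x1:
            if d > 0:            # choose the North-East pixel
                d += inc_ne
                y += 1
            else:                # choose the East pixel
                d += inc_e
            x += 1
            pixels.append((x, y))
        return pixels

    print(midpoint_line(0, 0, 8, 3))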
There is a problem. A horizontal line has a sequence of pixels that are spaced one pixel
diameter apart; see in Slide 3.15 that line a would appear as a dark line. However, if we incline that line by 45 degrees, then the pixels assigned to that line have a spacing of one pixel diameter times the square root of 2. Therefore we have fewer pixels across the entire length of the straight line, and the same line would appear less dark. We will address this and related subjects later in Section 3.3.
Exam questions:
• Describe in words the essential improvement of the Bresenham algorithm over the DDA algorithm.
• Draw into Figure B.9 the pixels that the Bresenham algorithm generates when the two marked pixels are connected by an (approximated) straight line. Also give the computation steps that lead to the pixels you chose.
• The square Q in normalized screen coordinates from Example B.2 is transformed into a rectangle R with dimensions 10 × 8 in screen coordinates. Draw the connection of the two points p′1 and p′2 into Figure B.20 and determine graphically the pixels the Bresenham algorithm would choose to approximate the connection discretely!
3.2 Filling of Polygons
Another issue when converting from the vector world to the raster world is dealing with areas that
have boundaries in the form of polygons. Such polygons could be convex, concave, they could
intersect themselves, they could have islands. It is very quickly a non-trivial problem to take a
polygon from the vector world, create from it a raster representation and fill the area inside the
polygon. Slide 3.17 illustrates the issue. Instead of finding pixels along the polygon, we now have the task of finding the pixels that are inside the polygon represented by a sequence of vectors. We define a scan line as a row of pixels going from left to right. The illustrations in Slide 3.17 show that the first pixel is assigned when the scan line intersects the first vector, and every time we find along the scan line an intersection with a vector of the polygon, we switch from assigning pixels to not assigning pixels and vice versa.
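A minimal Python sketch of this parity idea for a simple polygon given as a list of vertices (the triangle at the end is an arbitrary example; a half-open rule is used so that each edge crossing is counted exactly once):

    def scanline_fill(vertices):
        """Pixels inside a simple polygon given by (x, y) vertex tuples."""
        ys = [y for _, y in vertices]
        filled = set()
        n = len(vertices)
        for y in range(int(min(ys)), int(max(ys)) + 1):
            xs = []
            for i in range(n):
                (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
                if (y1 <= y < y2) or (y2 <= y < y1):        # edge crosses this scan line
                    xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
            xs.sort()
            for left, right in zip(xs[0::2], xs[1::2]):     # fill between alternate crossings
                for x in range(round(left), round(right) + 1):
                    filled.add((x, y))
        return filled

    print(sorted(scanline_fill([(0, 0), (8, 0), (4, 5)])))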
A second approach, shown in Slide 3.18, is the idea of using the Bresenham algorithm to rasterize all the vectors defining the polygon and then going along the scan lines, taking the pairs of pixels produced by the Bresenham algorithm and filling the intermediate spaces with additional pixels. As we can see in this example, that approach may produce pixels whose centers lie outside of the actual polygon. There is yet another algorithm we could use, which takes only the pixels at the inside of the polygon; that is different from the previous application of the Bresenham algorithm.
Slide 3.21 illustrates for the first time a concept, which we will address in a moment, and that is
if we have a very narrow polygon, a triangle, we might get a very irregular pattern of pixels, and
when we look at this kind of pattern, we notice that we have a severe case of aliasing. Aliasing is
a topic of interest in computer graphics.
Exam questions:
• Given a polygon as the list of its vertices: how can the polygon be drawn filled (i.e., together with its interior) on a raster screen? Which problems arise when the polygon has very "pointed" corners (i.e., interior angles close to zero)?
3.3 Thick lines
A separate subject is the various ways one can plot thick lines: not simply applying a Bresenham algorithm to a mathematically infinitely thin line, but to, say, a fat line. One way of doing that is to apply a Bresenham algorithm and then replicate the pixels along the columns, saying that where Bresenham found one pixel, we now make five pixels out of it. If we do that, the thickness of the line becomes a function of the slope of the straight line. A second way of plotting a thick line is to take the Bresenham pixels and apply at each pixel location a rectangular pen, for example, as in Slide 3.23, a pen size of 5 × 5, i.e., 25 pixels per location (see Algorithm 8).
Algorithm 8 Thick lines using a rectangular pen

    procedure drawThickLine2(x1, y1, x2, y2, thickness, color);
    var x: integer;
        p1x, p1y, p2x, p2y: integer;
        dx, dy, y, m: real;
    begin
      dy := y2 - y1;
      dx := x2 - x1;
      m := dy / dx;
      y := y1;
      for x := x1 to x2 do
      begin
        p1x := x - (thickness div 2);                    {upper left point}
        p1y := Round(y) + (thickness div 2);
        p2x := x + (thickness div 2);                    {lower right point}
        p2y := Round(y) - (thickness div 2);
        drawFilledRectangle(p1x, p1y, p2x, p2y, color);  {rectangle with p1 and p2}
        y := y + m;
      end;
    end; {drawThickLine2}

    {Note: drawFilledRectangle draws a rectangle given by the upper left and the lower right point.
     If you want to use a circular pen, simply replace the rectangle with
     drawFilledCircle(x, y, (thickness div 2), color). Syntax: drawFilledCircle(mx, my, radius, color)}
The difficulty of fat lines becomes evident if we have circles. Let us assume in Slide 3.25 that we
have pixel replication as the method, we use Bresenham to assign pixels to the circle and then
we add one pixel at top and one pixel below at each pixel. What we can very quickly see is that
the thickness of the line describing the circle is good at zero and ninety degrees, but is narrower
at 45 degrees where the same thickness, which was t at 0 and 90 degrees reduces to t divided by
square root of two. This problem goes away if we think of using a moving pen with 3 × 3 pixels.
In that case the variation in thickness goes away. Yet another approach is to apply a vector-to-raster conversion algorithm to two contours obtained by changing the radius of the circle, and then to fill the area described by the two contours with pixels; again we see that we avoid the change in thickness of the line.
Exam questions:
• Name different techniques for drawing "thick" lines (e.g., line segments or circular arcs).
Definition 5 Skeleton
The skeleton of a region R contains all points p which have more than one nearest neighbour on the border line of R. The points p are the centers of discs which touch the border line b in two or more points.
The detection of skeletons is useful for shape recognition and runs in O(n²) for concave polygons and O(n log n) for convex polygons.
3.4 The Transition from Thick Lines to Skeletons
The best known algorithm for the transition from a thick line or an area to a representation by the area's skeleton is by Blum from the year 1967. We define a region R and its border line B. The basic idea of the medial axis transform (see Definition 5) is to take a region as shown in Slide 3.30 and to replace this region by those pixels (a string of individual pixels) which have more than one single nearest neighbor along the boundary of the area. When we look at the area in example (a) in the slide, we can very quickly recognize that every point along the dashed lines has two nearest points along the border, either on the left and right border or on the left and top border, etc. When we create a pattern like that and we have a disturbance, as in image (b) of that slide, we immediately get a spur from the center line leading towards the disturbance. Example (c) shows how this basic idea of finding pixels that have two nearest neighbors along the border line creates a pattern when the area itself is not rectangular but has an L-shape. Slide 3.31 summarizes in words how we go from a region to a boundary line b, and from the boundary line to pixels p which have more than a single nearest neighbor on the boundary line b. As a result, the pixels p form the so-called medial axis of region R.
Finding the medial axis this way is expensive, because distances need to be computed between all the pixels within the region R and all the pixels on the boundary line B, and a lot of sorting would go on. For this reason, Blum considered a different approach: the transition from the region to the skeleton, or medial axis, is better achieved by means of a thinning algorithm. We therefore start from the edge of a region and delete contour pixels. What is a pixel on the contour? A pixel on the contour is part of the region R, has a value of 1 in a binary representation, and has at least one zero among its eight neighbors, i.e., at least one neighbor that does not belong to region R. Slide 3.32 explains the basic idea of a thinning algorithm. We have a pixel p1 and its eight neighbors p2 through p9. We can now associate with pixel p1 the number N(p1) of non-zero neighbors by simply adding up the values of the eight neighbors. We compute a second auxiliary number S(p1), which is the number of transitions from zero to one in the ordered sequence of the values of pixels p2 through p9 (and back to p2). The decision whether a pixel p1 gets deleted or not depends on the outcome of four computations (also shown in Slide 3.34).
Pixel p1 is deleted if:
  (1) 2 ≤ N(p1) ≤ 6
  (2) S(p1) = 1
  (3) p2 · p4 · p6 = 0
  (4) p4 · p6 · p8 = 0
Pixel p1 is also deleted if conditions (1) and (2) hold as above and:
  (3') p2 · p4 · p8 = 0
  (4') p2 · p6 · p8 = 0
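A minimal Python sketch of one such deletion pass over a binary image stored as a nested list, using the counts N(p1) and S(p1) described above; this covers the first set of conditions, and the second pass simply uses the alternative pair of products (the clockwise neighbor ordering p2…p9 starting above p1 is an assumption about the slide's labeling):

    def thinning_pass(img):
        """One deletion pass of the thinning algorithm; modifies and returns the binary image."""
        rows, cols = len(img), len(img[0])
        to_delete = []
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if img[r][c] != 1:
                    continue
                # neighbors p2..p9, clockwise, starting with the pixel above p1
                p = [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
                     img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]
                N = sum(p)                                         # number of non-zero neighbors
                S = sum(p[i] == 0 and p[(i + 1) % 8] == 1          # 0 -> 1 transitions around p1
                        for i in range(8))
                if 2 <= N <= 6 and S == 1 and p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                    to_delete.append((r, c))
        for r, c in to_delete:                                      # delete only after the full scan
            img[r][c] = 0
        return img

Iterating this pass (alternating with the second set of conditions) until no pixel changes any more yields the skeleton.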
The workings of the algorithm are further illustrated in Slide 3.35 for a particular pattern of pixels. We compute N(p1) and S(p1) there to document the interpretation of these two quantities. Slide 3.36 illustrates the workings of the iterative algorithm. Going from a letter "H" to its skeleton,
we can see which pixels have been deleted after the initial iteration through all pixels. After five iterations the result shown in Slide 3.36 is obtained.
We have now dealt with the issue of converting a given vector into a set of binary pixels and have called that vector-raster conversion; it is also called scan conversion, and it occurs in the representation of vector data in a raster monitor environment. What we have not yet talked about is the inverse issue: given a raster and a pattern, we would like to obtain vectors from it. We have touched upon taking a raster pattern and replacing it by a medial axis or skeleton, but we have not yet really come out of that conversion with a set of vectors. Yet raster-vector conversion is an important element in dealing with object recognition. A particular example has been hinted at in Slide 3.36, because it clearly represents an example from character recognition. The letter H in a binary raster image is described by many pixels. Recognizing the raster H might be based on a conversion to a skeleton, a replacement of the skeleton by a set of vectors, and then a submission of those vectors to a set of rules that tell us which letter we are dealing with.
Exam questions:
• Apply Blum's "medial axis" transformation to the object on the left of Figure B.39! You may enter the result directly into Figure B.39 on the right.
[Slides 3.1 through 3.38 are reproduced here in the original document.]
Chapter 4
Morphology
Exam questions:
• Given the pixel arrangement shown in Figure B.56. Describe, graphically, by a formula, or in words, an algorithm for determining the centroid of this pixel arrangement.
4.1 What is Morphology
This is an interesting subject. It is not very difficult, yet it should also not be underestimated. We talk about the shape and structure of objects in images. It is a topic that has to do with binary image processing. Recall that binary images have pixels that are only either black or white. Objects are typically described by a group of black pixels, and the background consists of all white pixels. So one has a two-dimensional space of integer numbers, to which morphology applies set theory.
Let us take an object - we call it A - and that object is hinged at a location designated in Slide 4.5
by a little round symbol. Morphology now says that A is a set of pixels in this two-dimensional
space. A separate object B is also defined by a set of pixels. We now translate A by distance x
and obtain a new set called Ax. The translation is described by two numbers, x1 and x2 , for the
two dimensions of the translation.
We can write the expression in Slide 4.6 to define the result A after the translation: Ax consists of
all pixels c, so that c is equal to a + x, where a are all the pixels from pixel set A. Geometrically
and graphically we can illustrate the translation very simply by the two distances x1 and x2 of
Slide 4.7. Instead of A we have (A)x. A very simple concept for humans becomes a somewhat
complex equation in the computer.
Morphology also talks about “reflection”.
We have a pixel set B and reflect it into a set B̂, which is the set of all pixels x such that x is
−b, where b is each pixel from pixel set B. The interpretation of −b is needed. Geometrically, B̂
is the mirror reflection of B, and we have mirrored B over the hinge point (point of reflection).
The next concept we look at is "complementing" a set A into a set AC. AC is the set of all pixels x that do not belong to set A. If an object is composed of all the pixels inside a contour and is called A, then AC is the background.
Next we can take two objects A and B and build a difference A − B. The difference is the set of all pixels x such that x belongs to set A but not to set B. We can describe this with a new symbol and say that it is the intersection of two sets, namely of set A and the complement BC of B.
Definition 6 Difference
Given two objects A and B as sets of pixels (points of the 2D integer space), the difference of the two sets A and B is defined as

    A − B = {x | x ∈ A, x ∉ B} = A ∩ B^C.
Slide 4.14 shows A, B, and A − B; the difference is A reduced by that part of A that is covered by B.
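The set operations introduced so far can be written out directly if an object is represented as a Python set of (x, y) pixel coordinates. A minimal sketch (the pixel sets A and B are arbitrary examples; the infinite complement AC is not materialized, which is why the difference is written as plain set subtraction):

    A = {(0, 0), (1, 0), (0, 1), (1, 1)}                 # object A as a set of pixels
    B = {(0, 0), (1, 0)}                                 # object B

    def translate(S, x):                                 # (S)x = {c | c = s + x, s in S}
        return {(sx + x[0], sy + x[1]) for sx, sy in S}

    def reflect(S):                                      # S-hat = {-s | s in S}
        return {(-sx, -sy) for sx, sy in S}

    def difference(A, B):                                # A - B = A intersected with the complement of B
        return A - B

    print(translate(A, (2, 3)))
    print(reflect(B))
    print(difference(A, B))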
Exam questions:
• What is "morphology"?
Answer: the application of non-linear operators to the shape of an object.
4.2 Dilation and Erosion
"Dilation" means that we make something bigger (in German: Blähung). The symbol used to describe the dilation of a set A by a "structure element" B is shown in Slide ??: a little circle with a plus in it, ⊕. A dilated by B is the collection of all pixels x for which the reflected structure element B, translated to the location x, has a non-empty intersection with A.
This sounds pretty difficult, but when we look at it geometrically it is very simple.
Let A be a square with side length d and let B be another square with side length d/4. If we reflect B about a reflection point in the center of the square, the reflection is the same as the original structure element. Thus we reflect (with no effect) and shift B to the location of a pixel x.
As we go to each pixel x of set A, we place the (reflected) structure element there - we translate
B̂ to the location x - and we now have the union of the pixels in set A and the structure element
B̂. We add up all pixels that are in the union of A and B̂. What we do is to add a little fringe
around area A that is obtained by moving the pixels of set B over A and through all pixels of
A. B̂ will extend along the fringe of A, so we make A a little larger. If our structure element is
not a square but a rectangle of dimension d in one direction and d/4 in the other, then we obtain
an enlargement of our area A that is significant in one direction and less significant in the other
direction.
Algorithm 9 Dilation

    for all x do
      Y = Translate(Reflect(B), x)
      for all y ∈ Y do
        if (y ∈ A) and (x ∉ X) then
          Insert(X, x)
        end if
      end for
    end for
    return X
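A direct, self-contained Python transcription of Algorithm 9, with A and B again as sets of pixel coordinates; since the dilation can only contain positions of the form a + b, the loop is restricted to those candidates instead of all of Z² (that restriction is the only liberty taken with the pseudocode):

    def dilate(A, B):
        """A dilated by B: every x at which the reflected B, translated to x, meets A."""
        B_hat = {(-bx, -by) for bx, by in B}                               # Reflect(B)
        candidates = {(ax + bx, ay + by) for ax, ay in A for bx, by in B}  # the only possible results
        X = set()
        for x in candidates:
            Y = {(bx + x[0], by + x[1]) for bx, by in B_hat}               # Translate(Reflect(B), x)
            if Y & A:                                                      # some y in Y lies in A
                X.add(x)
        return X

    A = {(0, 0), (1, 0), (0, 1), (1, 1)}
    print(sorted(dilate(A, {(-1, 0), (0, 0), (1, 0)})))                    # A grows by one pixel left and right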
Dilation has a sister operation called "erosion" (in German: Abmagerung). Erosion is the opposite of dilation, and the symbol designating an erosion is a little circle with a minus in it, shown in Slide 4.18.
The result of an erosion consists of all those pixels x such that the structure element B, placed at location x, lies completely within set A. What does this look like geometrically?
Definition 7 Erosion

    X ⊖ B = {d ∈ E² : B_d ⊆ X}

where B is the binary structure element (erosion matrix), B_d is B translated by d, and X is the binary image matrix. Starting from this equation, we arrive at the following equivalent expression:

    X ⊖ B = ⋂_{b ∈ B} X_{−b}
In Slide 4.19 we have subtracted from set A a fringe that has been deleted like an eraser of the
size of B. Doing this with a non-square but rectangular structure element we receive a result
that in the particular case of Slide 4.19 reduces set A to merely a linear element because there
is only one row of pixels that satisfies the erosion condition using this type of structure element
with dimensions d and d/4.
There is a duality between erosion and dilation: we can express the erosion of a set A by a structure element B through a dilation, by taking the complement AC of A, dilating it with the reflection B̂ of B, and complementing the result, i.e., (A ⊖ B)^C = AC ⊕ B̂.
This is demonstrated in Slide 4.21, where we start from the definition of the erosion of A by the structure element B and consider its complement: the complement of the eroded object equals the complement of the set of all pixels x such that B, placed over x, lies entirely within A. Going through our previous definitions, we can show in Slide 4.21 that we end up with a dilation of the complement AC of set A with the reflection B̂ of the structure element B.
Exam questions:
• Explain the morphological "erosion" using a sketch and a formula.
• The morphological operation "erosion" is to be applied to the binary image shown at the top left of Figure B.65. Show how the duality between erosion and dilation can be used to reduce an erosion to a dilation. (In other words: instead of the erosion, other morphological operations are to be used which, executed one after another in a suitable order, deliver the same result as an erosion.) Enter your result (and your intermediate results) into Figure B.65 and name the operations marked with the numbers 1, 2 and 3! The structure element to be used is also shown in Figure B.65.
Hint: note that the binary image shown is only a small section of the domain Z²!
Answer: The morphological erosion can be replaced by the following sequence of operations (see Figure 4.1):
1. complement
2. dilation
3. complement
Figure 4.1: Morphological erosion as the sequence complement → dilation → complement (the figure shows the steps 1, 2, 3 and the structure element)
• The duality of erosion and dilation with respect to complementation and reflection can be formulated by the equation

(A ⊖ B)^C = A^C ⊕ B̂

Why is the reflection (B̂) of importance in this equation?

• Suppose you had to apply the morphological operations “erosion” or “dilation” to a binary image, but you only have a conventional image-processing package at your disposal that does not support these operations directly. Show how erosion and dilation can be emulated by a convolution followed by thresholding!
Hint: the convolution operation we are looking for is best compared to a low-pass filter.
Answer: One treats the desired kernel for the morphological operations as a filter mask (with “1” for every set pixel in the kernel, “0” otherwise) and convolves the binary image with this mask. The result image then contains values g(x, y), where
– g(x, y) ≥ 1 if at least one pixel of the mask coincided with the input image (dilation), and
– g(x, y) ≥ K if all pixels of the mask coincided with the input image (erosion), where K is the number of set mask pixels.
4.3 Opening and Closing
We now come to more complex operations that are sequences of the previously defined ones. We call them “opening” and “closing”. Let us take first the question of opening. We may have two objects, one to the left and the other one to the right, connected by a thin bridge, perhaps because of a mistake in sensing and preprocessing of the data.
We can separate those two objects by an operation called “opening”. Opening a set A by means of a structure element B is denoted by the symbol shown in Slide 4.24, namely an open little circle. It begins with the erosion of A using structure element B and subsequently dilates the result again by the same structure element B. So we first shrink, then we enlarge again. But in shrinking we get rid of certain things that are no longer there when we enlarge.

Definition 8 Open
A ◦ B = (A ⊖ B) ⊕ B
◦ . . . opening, ⊖ . . . erosion, ⊕ . . . dilation; B is a circular structure element

Slide 4.25 shows the circular structure element B and the original object A. As we now erode object A and obtain a shrunk situation, object A is certainly broken up into two eroded smaller objects.
The bridge between the two parts of the original set A is narrower than the size of the structure element, so the structure element will, like an eraser, erase that bridge. Now we want to go back to the original size. So we dilate with the structure element B again, and what we obtain is the separation of the thinly connected objects.
Slide 4.27 and Slide 4.28 are a summary of the opening operation.
We proceed to the “closing” operation. Closing set A with the help of structure element B is denoted by a little filled circle.
We first dilate A by B and then we erode the result by structure element B. We do the opposite of opening. The process will remove little holes in things. It will not break things up, but connect them; it will fill in and remove noise.

Definition 9 Closing
A • B = (A ⊕ B) ⊖ B
⊕ Dilation: fill all holes and gaps smaller than the structure element B
⊖ Erosion: restore the original size except for the filled structures

Closing set A with structure element B means to first dilate A by B and afterwards erode the result by structure element B.
Slides 4.30 to 4.33 feature a complex shape. The shape seems to break apart when it really should not. We take the original figure and dilate it (make it larger). As it grows, this will reduce small details. The resulting object is less sophisticated, less detailed than before.
Closing an object A using the structure element B can again be shown to be dual to opening with respect to complementation and reflection. Closing an object A with structure element B and taking the complement of the result is the same as opening the complement of A with the mirror reflection of structure element B.
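As a small illustrative sketch (reusing the dilate and erode_via_duality helpers assumed above), opening and closing are simply the two compositions of erosion and dilation:

    def opening(A, offsets):
        """A opened by B: erode first, then dilate; removes thin bridges and small specks."""
        return dilate(erode_via_duality(A, offsets), offsets)

    def closing(A, offsets):
        """A closed by B: dilate first, then erode; fills small holes and gaps."""
        return erode_via_duality(dilate(A, offsets), offsets)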
Exam questions:
• Explain “morphological opening” using a sketch and a formula.
Figure 4.2: Morphological opening (the structure element is shown in the figure)
• To strengthen the effect of morphological opening (A ◦ B) one can¹ execute the underlying operations (erosion and dilation) repeatedly. Which of the following two procedures leads to the desired result:
1. The erosion is performed n times first and afterwards the dilation n times, i.e.
(((A ⊖ B) ⊖ . . . ⊖ B) ⊕ B) ⊕ . . . ⊕ B   (n times ⊖, then n times ⊕)
2. The erosion is performed and then the dilation, and this process is repeated n times, i.e.
(((A ⊖ B) ⊕ B) . . . ⊖ B) ⊕ B   (n times alternating ⊖/⊕)
Justify your answer and explain why the other procedure fails!
Answer: (a) is correct; with (b) the object remains unchanged after the first ⊖/⊕ iteration.

• Apply the morphological operation “opening” with the given structure element to the binary image on the left of Figure B.31! Which effect typical of morphological opening also occurs in this example?
White pixels count as logical “0”, gray pixels as logical “1”. You can enter the result on the right in Figure B.31.
Answer: see Figure 4.2; typical effect: separation of regions that are connected by a narrow “bridge”.
4.4 Morphological Filter
Definition 10 Morphological filter
A morphological filter consists of one or more morphological operations such as dilation, erosion, opening, closing, hit-or-miss, that are applied sequentially to an input image.
A very simple application is morphological filtering. Say we have an object such as an ice floe on
the ocean and we have several little things floating around it. We would like to recognize and map
the large ice floe.
¹ apart from enlarging the structure element B
We would like to isolate this object, measure its surface, its contour, see where it is. In an
automated process we need to remove all the trash around it. We need to fill the holes and get
rid of the extraneous details on the open water.
Morphological filtering is illustrated in Slide 4.38 and Slide 4.39.
We find a structure element which has to be a little larger than the elements that we would like to remove. We first let that structure element run over the image and perform an erosion. When we erode with the structure element, every object that is smaller than the structure element will disappear, but the holes inside the remaining object will get bigger. We follow the erosion with a dilation; that combination is what we call the opening operation. We have removed all the small items outside the object, but the holes inside the object are still there.
We now do the opposite operation, namely the closing: we take the opening result and do a dilation, which increases the size of the object in such a way that it also closes up all the holes; then we have to shrink it again by an erosion with our structure element B. That combination of dilation followed by erosion is the “closing”. The sequence of an opening followed by a closing produces a clean object without extraneous detail. We have applied morphological filtering.
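A compact sketch of such a filter, assuming the opening and closing helpers from the earlier examples and a structure element chosen slightly larger than the clutter to be removed:

    def morphological_filter(A, offsets):
        """Remove small objects outside the region of interest (opening),
        then fill the small holes inside it (closing)."""
        return closing(opening(A, offsets), offsets)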
Exam questions:
• Figure B.55 shows a rectangular object together with some smaller disturbing objects. Explain a morphological filtering procedure that eliminates the disturbing objects. Use formulas and show the steps of the procedure with graphical sketches. Also show the resulting image.
• Explain the process of morphological filtering using an example!
4.5 Shape Recognition by a Hit or Miss Operator
Morphology can recognize shapes in an image with the hit-or-miss operator. Assume we have
three small objects X, Y and Z and we would like to find object X as shown in Slide 4.41
The union of X, Y , and Z is denoted as the auxiliary object A. Now we define a structure element
W , and from that structure element a second structure element as the difference of W and shape
X that we are looking for. That gives an interesting structure element which in this case looks
like the frame of a window. We build the complement AC of A, which is the background without
the objects X, Y , and Z.
If we erode A with X then the object that is smaller than X gets wiped out, the object that is
larger than X will be showing as an area which results from the erosion by object X. For X we
obtain a single pixel in Slide 4.42. The automated process has produced pixels that are candidates
for the object of interest, X. We need to know which pixel to choose.
We go through this operation again, but use A^C as the object and W − X as structure element. The erosion of A^C by the structure element W − X produces the background with an enlarged hole for the 3 objects X, Y and Z, and two auxiliary objects, namely the single pixel where our X is located and a pattern consisting of several pixels for the small objects in Slide 4.43. We intersect the two erosion results we had obtained, once eroding A with object X, the other time eroding A^C by W − X. The intersection produces a single pixel at the location of our object X.
This is the so-called Hit-or-Miss-Method of finding an instance where object X exists.
All other objects that are either bigger or smaller will disappear. The process and the formula are shown in Slide 4.46, which summarizes the Hit-or-Miss process illustrated in the previous paragraphs. The process uses a symbol with a circle and a little asterisk in it. Again: A is eroded by X and the complement of A is eroded by W − X. The two results get intersected. We have two structure elements, X and W − X.

Definition 11 Hit or Miss Operator

A ⊛ W = (A ⊖ W₁) ∩ (A^C ⊖ W₂),   where W₁ = X is the shape we are looking for and W₂ = W − X.
Slide 4.46 shows that the equation can be rewritten in various forms.
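A minimal sketch of this idea with the helpers assumed above; hit_offsets corresponds to X and miss_offsets to W − X (both given as coordinate offsets, a simplifying assumption rather than the lecture's exact notation):

    def hit_or_miss(A, hit_offsets, miss_offsets):
        """Keep the pixels where the 'hit' element fits the foreground of A
        and the 'miss' element fits the background (the complement of A)."""
        fg = erode_via_duality(A, hit_offsets)
        bg = erode_via_duality(1 - A, miss_offsets)
        return fg & bg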
Exam questions:
• How is the “hit-or-miss” operator A ⊛ B defined? Explain how it works for recognizing structures in binary images!
Answer: It holds that
A ⊛ B = (A ⊖ B) ∩ (A^C ⊖ (W − B)),
where W is a structure element larger than B. When eroding A with B, all parts of A that are smaller than B disappear; a part with the shape of B remains as an isolated pixel. When eroding A^C with W − B, all holes of A^C that are larger than B are widened, while parts with the shape B again yield a single pixel. The set intersection therefore yields a set pixel exactly where a part of A is identical to B.
4.6 Some Additional Morphological Algorithms
Morphological algorithms that are commonly used deal with finding the contour of an object, finding the skeleton of an object, filling regions, and cutting off branches from skeletons. The whole world of morphological algorithms is clearly applicable in character recognition, particularly in dealing with handwriting. It is applied in those cases where the object of interest can be described in a binary image, where we need neither color nor gray values but simply object or non-object.
Given an object A in Slide 4.48 and Slide 4.49, we are looking for the contour of A, denoted b(A). We use a structure element B to find the contour. The contour of region A is obtained by subtracting from A an eroded version of A. The erosion should be by just one pixel, so structure element B is a 3 × 3 window.

Definition 12 Contour
We present the formal definition of a contour. It is the digital counterpart of the boundary of an analog set. We are looking for the contour of A as b(A):

b(A) = A − (A ⊖ B)          (4.1)

where the structure element B is a 3 × 3 window:

        ( a11  a12  a13 )
    B = ( a21  a22  a23 )          (4.2)
        ( a31  a32  a33 )

The contour of a connected set of points R is defined as the points of R having at least one neighbor not in R. The contour is the outline or visible edge of a mass, form or object.

Slide 4.49 shows the erosion of region A and the difference from region A that yields the contour pixels.
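A hedged sketch under the same conventions as above, with a 3 × 3 square as B:

    def contour(A):
        """b(A) = A - (A eroded by a 3x3 square): keeps the foreground pixels
        that have at least one background neighbor."""
        square3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
        eroded = erode_via_duality(A, square3)
        return A & (1 - eroded)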
Region filling is the opposite operation, starting from a binary representation of a contour. We want to fill the interior of the contour. This particular contour is continuous, non-interrupted, under an 8-neighborhood relationship (recall: up, down, left, right plus all oblique relationships). We build the complement A^C of contour A. The structure element B is again a 3 × 3 matrix, but only using the 4-neighbors. Region filling is an iterative process according to Slide 4.51. With a running index k that increases as we go through the iterations, each intermediate result is obtained by dilating the previous iteration with the structure element B and intersecting with the complement A^C of A,

X_k = (X_{k−1} ⊕ B) ∩ A^C,

and we repeat this step by step until no new pixels are added. The issue is the starting point X₀, which is an arbitrary pixel inside the contour from which we start the process.
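A small sketch of this iteration (reusing the dilate helper assumed earlier; A is the binary contour image and seed a pixel position inside it):

    import numpy as np

    def fill_region(A, seed):
        """Iterative region filling: X_k = (X_{k-1} dilated) intersected with A^C,
        started from a seed pixel inside the contour, until convergence."""
        cross = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]   # 4-neighborhood element
        X = np.zeros_like(A)
        X[seed] = 1
        complement = 1 - A
        while True:
            nxt = dilate(X, cross) & complement
            if np.array_equal(nxt, X):
                break
            X = nxt
        return X | A   # the filled region together with its contour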
A final illustration of the usefulness of morphology deals with the automated recognition of zip
codes that are hand-written.
Slide 4.52 and Slide 4.53 present a hand-written address that is being imaged. Through some pre-processing that hand-writing has been converted into a binary image. The first step might be to threshold the gray-tone image to convert it to a binary image. After having thresholded the address we need to find the area with the zip code.
Let us address the task of extracting all connected components in the area that comprises the address field. From a segmentation into components, one finds rectangular boxes, each containing a connected object. One would assume that each digit is now separate from all the other digits. However, if two digits are connected, as in this example with a digit 3 and a digit 7, then we misread them as one single digit. We can help ourselves by considering the shape of the rectangular box, plus using knowledge about how many digits a zip code has. In the United States that is basically five, so one needs to have five digits, and one can look for joined characters by measuring the relative widths of the boxes that enclose the characters. We must expect certain dimensions of the box surrounding a digit. Opening and closing operations can separate digits that should be separate, or merge broken elements that should describe a single digit. Actual character recognition (OCR for “Optical Character Recognition”) then takes each binary image
window with one digit and seeks to find which value between 0 and 9 this could be. This can be
based on a skeleton of each segment, and a count of the structure with nodes and arcs. We will
address this topic later in this class.
As a short outlook beyond morphology of binary images, let’s just state that there is a variation
of morphology applied to gray value images.
Gray-tone images can be filtered with morphology, and an example is presented in Slide 4.55.
Exam questions:
• Given the pixel arrangement shown in Figure ??. Describe, graphically and with a formula, the procedure for the morphological determination of the outline of the depicted object, using a structure element of your own choice.
• Describe, with the help of morphological operations, a procedure for determining the border of a region. Apply this procedure to the region drawn in Figure B.23 and state the 3 × 3 structure element you used. Figure B.23 provides space for the final result as well as for intermediate results.
Chapter 5
Color
5.1 Gray Value Images
A precursor to color images is of course a black & white image. Some basic issues can be studied
with black & white images before we proceed to color. A regular gray value image is shown in
Slide 5.3. We need to characterize a gray value image by its densities, the way it may challenge
our eyes, the manner in which it captures the physics of illumination and reflection, and how it
is presented to the human viewer. We have discussed such concepts as the density of film, the
intensity of the light that is reflected from objects, and the quality of an image in terms of its
histogram.
Intensity describes the energy, light or brightness. When an intensity value is zero, we are talking
about darkness, no light. If the intensity is bright, then we should have a large value describing
it. The opposite is true for film. A film with a density zero is completely transparent, whereas a
film at a density 4 is totally opaque and will not let any light go through.
A negative film that is totally transparent is representing an object that does not send any light
through the optical system. A negative that is totally opaque had been brightly illuminated. The
opposite is true for a positive film. The darker the positive film, the less light it represents.
In Chapter 2 we already talked about the eye, but we did not address sensitivity of the eye to
brightness, energy and light. A notable characteristic of the eye is that it is very sensitive to ratios.
If we present to the eye two different brightnesses, say densities of 0.11 and 0.10, the eye perceives this difference much like the difference between densities of 0.55 and 0.50, both pairs differing by 10% from one another. The sensitivity of the eye to differences ∆I of the intensity of light I is expressed by the Weber ratio ∆I/I.
What now is the interval, expressed as a ratio r, when presenting an image with n discrete gray values? Let us define n intensity steps:

I_n = r^n · I_0

If intensity I_n is the maximum and intensity I_0 the minimum, then we have to compute the value of r that breaks the interval from I_0 to I_n into n steps. Slide 5.5 illustrates the issue:

r = (I_n / I_0)^(1/n)

If n = 3, we have 4 different levels of intensity, namely 1/8, 1/4, 1/2 and 1. The eye needs intensity steps of about 1% (a Weber ratio of roughly 0.01) before a difference between two intensities becomes recognizable, conceptually giving it the capability of resolving about 100 different gray values.
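As a short worked check of the n = 3 example just mentioned: r = (I_3 / I_0)^(1/3) = (1 / (1/8))^(1/3) = 8^(1/3) = 2, which indeed generates the levels I_0 = 1/8, I_1 = 1/4, I_2 = 1/2, I_3 = 1.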
A monitor presents an intensity I that is a function of N, the number of electrons creating the intensity on the monitor. Slide 5.6 presents the relationship. Film has a density that relates linearly to the logarithm of the energy of light that falls onto the film. The dynamic range is the ratio of the highest and lowest intensity that a medium can represent. For a monitor that value might be 200, for film it might be 1000, for paper it might be 100. Note that the dynamic range d is the power of base 10 that the medium can support, i.e. a ratio of 10^d. For film the ratio of brightest and darkest intensity is 1000 and therefore film typically has a density range d = 3, whereas paper lies at d < 2.
Continuous tone photography cannot be printed directly. Instead one needs to create so-called half-tone images by means of a raster pattern. These images make use of the spatial integration that human eyes perform. A half tone is a representation of a gray tone. The image is resolved into discrete points, and each point is associated with an area on paper. At each point one places a small dot proportional in size to the density of the object: if the object is bright the dots are small, if it is dark the dots are large. One also denotes this as screening of a gray tone image. Note that the screen typically is arranged at an angle of 45°. Slide 5.7 shows a so-called half-tone image. Slide 5.8 makes the transition to the digital world. Gray tones can be obtained in a digital environment by substituting for each pixel a matrix of subpixels. If we have 2 × 2 subpixels we can represent five gray values, as shown in Slide 5.9. Similarly, a 3 × 3 pattern permits one to represent 10 different gray values. We call the matrix into which we subdivide pixels a dither matrix: a D2 dither matrix means that 2 × 2 pixels are used to represent one gray value of a digital image.
The basic principle is demonstrated in Algorithm 10. An example of a 3 × 3 dither matrix would be:

        ( 6  8  4 )
    D = ( 1  0  3 )
        ( 5  2  7 )

An image gray value is checked against each element of the dither matrix, and only those pixels are set where the gray value is larger than the value in the dither matrix. For a gray value of 5 the given matrix D would produce the following pattern:

        ( 0  0  1 )
    P = ( 1  1  1 )
        ( 0  1  0 )
A dither matrix of n × n defines n² + 1 different patterns. It should be created wisely in order not to define patterns that produce artefacts. For instance, the following pattern (for a gray value of 3) would create horizontal lines if applied to larger areas:

        ( 5  3  6 )    v = 3          ( 0  0  0 )
    D = ( 1  0  2 )   ------->    P = ( 1  1  1 )
        ( 8  4  7 )                   ( 0  0  0 )
Exam questions:
• What is meant by the “dynamic range” of a medium for reproducing pictorial information, and how is it related to the quality of the reproduction? Rank some common media by increasing size of their dynamic range!
Algorithm 10 Halftone-Image (by means of a dither matrix)
dm = createDitherMatrix(n, n)        {create a dither matrix n × n}
for all pixels (x, y) of the image do
    v = getGrayValueOfPixel(x, y)
    for all elements (i, j) of dm do        {checking the gray value against the matrix}
        if v > dm(i, j) then
            setPixel(OutputImage, x · n + i, y · n + j, black)        {applying the pattern}
        else
            setPixel(OutputImage, x · n + i, y · n + j, white)
        end if
    end for
end for
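A hedged Python sketch of the same idea, assuming a gray-value image scaled to the range of the dither-matrix entries (the function name halftone is illustrative, not taken from the lecture):

    import numpy as np

    def halftone(img, D):
        """Expand every pixel into an n x n block of subpixels: a subpixel is
        set (black = 1) where the gray value exceeds the dither-matrix entry."""
        n = D.shape[0]
        h, w = img.shape
        out = np.zeros((h * n, w * n), dtype=np.uint8)
        for y in range(h):
            for x in range(w):
                out[y * n:(y + 1) * n, x * n:(x + 1) * n] = (img[y, x] > D).astype(np.uint8)
        return out

    D = np.array([[6, 8, 4],
                  [1, 0, 3],
                  [5, 2, 7]])
    # halftone(np.full((2, 2), 5), D) reproduces the pattern P shown above in every block.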
Figure 5.1: Histogram of Figure B.29 (axes: gray value vs. frequency)
• Given a printing process that represents a gray dot by means of a pixel raster, as shown in Figure B.5. How many gray values can be represented with this raster? Which gray value is represented in Figure B.5?
• Sketch the histogram of the digital gray value image in Figure B.29 and comment on your sketch!
Answer: The histogram is bimodal; the peak in the white region is somewhat flatter than in the black region, since the image shows more structure in the bright region than in the dark region (see Figure 5.1).
5.2 Color Images
Of course computer graphics and digital image processing are significantly defined by color. Color
has been a mysterious phenomenon through the history of mankind and there are numerous models
that explain color and how color works.
Slide 5.12 does this with a triangle: the three corners of the triangle represent white, black and
color, so that the arcs of the triangle represent values of gray, tints between white and pure color
or shades between pure color and black. The concept of tones fills the area of the triangle. A color
is being judged against existing color tables. A very widely used system is by Munsell. This is
organized along 3 ordering schemes: hue (color), value (lightness) and saturation. These 3 entities
can be the coordinate axes of a 3D space. We will visit the 3-dimensional idea later in subtopic
5.5.
96
CHAPTER 5. COLOR
Color in image processing presents us with many interesting phenomena. The example in Slide
5.16 is a technical image, a so-called false color image. In this case film is being used that is not
sensitive to blue, but is instead sensitive to green, red and infrared. In this particular film, the
infrared light falling onto the emulsion will activate the red layer in the film. The red light will
activate the green layer, the green light will activate the blue layer. As a result, an image will
show infrared as red. Slide 5.16 is a vegetated area. We recognize that vegetation is reflecting a
considerable amount of infrared light, much more so than red or green light. Healthy vegetation
will look red, sick vegetation will reflect less infrared light and will therefore look whitish.
Color images not only serve to represent the natural colors of our environment, or the electromagnetic radiation as we receive it with our eyes or by means of sensors, but color may also be used
to visualize things that are totally invisible to humans.
Slide 5.18 is an example of a terrain elevation in the form of color, looking at the entire world.
Similarly, Slide 5.19 illustrates the rings of planet Saturn and uses color to highlight certain
segments of those rings to draw the human observer’s attention. The colors can be used to mark
or make more clearly visible to a human interpreter a physical phenomenon or particular data
that one wants the human to pay attention to. This is called pseudo-color .
Exam questions:
• What is meant by a false color image and by a pseudo-color image? Give one typical application for each!
5.3 Tri-Stimulus Theory, Color Definitions, CIE-Model
The eye has color sensitive cones around the fovea, the area of highest color sensitivity in the eye.
It turns out that these cones are not equally sensitive to red, green and blue. Slide 5.22 shows that
we have much less sensitivity to blue light than we have to green and red. The eye’s cones can see
the electromagnetic spectrum from 0.4 to 0.7 µm wavelength (or 400 to 700 nanometers). We find
that the eye’s rods are most sensitive in the yellow-green area; luminance sensitivity is best in that color range. Slide 5.23 illustrates the concept of the tri-stimulus idea. The tri-stimulus theory is attractive since it explains that all colors can be made from only 3 basic colors. If one were to create all spectral colors from red, green and blue, our cones in the eye would have to respond at the levels shown in Slide 5.13. The problem is that one would have to allow for negative values of red, which is not feasible. So those colors cannot be created; they are falsified by too much red.
The physics of color is explained in Slide 5.25.
White light from the sun is falling onto an optical prism, breaking up the white light into the
rainbow colors from ultraviolet via blue, green, yellow, orange to red and on to infrared. These
are the spectral colors first scientifically explained by Sir Isaac Newton in 1666. We all recall from
elementary physics that the electromagnetic spectrum is ordered by wavelength or frequency and
goes from cosmic rays via gamma rays and X rays to ultraviolet, then on to the visible light, from
there to near infrared, far infrared, microwaves, television and radio frequencies. Wavelengths of
visible light range from 0.35 µm to 0.7 µm. Ultraviolet has shorter wavelengths, in the range of 0.3 µm; infrared extends from 0.7 µm up to perhaps several hundred µm.
We would like to create color independent of natural light. We have two major ways of doing this.
One is based on light, the other on pigments. We can take primary colors of light and mix them
up.
These primary colors would be green, blue and red, spectrally clean colors. As we mix equal
portions of those three, we produce a white color. If we mix just two of them each we get yellow,
cyan and magenta.
In contrast to additive mixing of light there exist subtractive primaries of pigments. If we want to
print something we have colors to mix. Primary colors in that case are magenta, yellow and cyan.
As we mix equal parts we get black. If we mix pairs of them, we get red, green and blue. We call
yellow, magenta and cyan primary colors, green, red and blue secondary colors of pigment. To
differentiate between subtractive and additive primaries, we talk about pigments and light. An
important difference between additive and subtractive colors is the manner in which they are being
generated. A pigment absorbs a primary color of light and reflects the other two. Naturally then,
if blue and green get reflected but red is absorbed, that pigment appears cyan, and represents
the primary pigment “cyan”. The primary colors of light are perceived by the eye’s cones on the
retina as red, green and blue, and combinations are perceived as secondary colors.
The Commission Internationale de l’Éclairage (CIE) has been responsible for an entire world of standards and definitions. As early as 1931, the CIE fixed the spectral wavelengths for red at 700 nm, green at 546.1 nm and blue at 435.8 nm.
So far we have not yet been concerned about the dimensions of the color issue. But Munsell
defined concepts such as hue1 , intensity (value or lightness), and saturation or chroma2 . We
can build from such concepts a three dimensional space and define chromaticity, thus color, as a
2-dimensional subspace.
The necessity of coping with negative color values as one builds spectral colors from RGB has led the Commission Internationale de l’Éclairage (CIE) to define 3 primary colors X, Y and Z. CIE defined their values to form the spectral colors as shown in Slide 5.27.
The Y -curve was chosen to be identical to the luminous efficiency function of the eye.
The auxiliary values X, Y and Z are denoted as tri-stimulus values, defining tri-chromatic coefficients x, y, z as follows:

x = X / (X + Y + Z)
y = Y / (X + Y + Z)
z = Z / (X + Y + Z)

and x + y + z = 1.
A 3-dimensional space is defined by X, Y, Z and by x, y, z. X, Y, Z are the amounts of red,
green, and blue to obtain a specific color; whereas x, y, z are normalized tri-chromatic coefficients.
One way of specifying color with the help of the tri-chromatic coefficients is by means of a CIE
chromaticity diagram.
A two dimensional space is defined by the plane x + y + z = 1 with an x-and a y-axis, whereby
the values along the x-axis represent red, and y is green. The values vary between 0 and 1. The
z-value (blue) results from z = 1 − x − y.
There are several observations to be made about the CIE chromaticity diagram:
1. A point is marked as “green”; it is composed of 62% green, 25% red and, from z = 1 − x − y, 13% blue.
2. Pure spectral colors from a prism or rainbow are found along the edge of the diagram, with their wavelength in nm.
3. Any point inside the tongue-shaped area represents a color that can be composed not only from x, y and z, but also from the spectral colors along the edge of the tongue.
4. There is a point marked that has 33% of x, 33% of y and 33% of z; it is the CIE value for white light.
5. Any point along the boundary of the chromaticity chart represents a saturated color.
6. As a point moves away from the boundary of the diagram, the color is desaturated by adding more white light. Saturation at the point of equal energy is 0.
7. A straight line connecting any 2 colors defines all the colors that can be mixed additively from the end points.
8. From the white point to the edge of the diagram, one obtains all the shades of a particular spectral color.
9. Any three colors I, J, K define all other colors that can be mixed from them, namely those inside the triangle spanned by I, J, K.

¹ in German: Farbton
² in German: Sättigung
Definition 13 Conversion from CIE to RGB
To transform device-specifically between different monitor RGB spaces we can use transformations from a particular RGB_monitor space to the CIE XYZ space. The general transformation can be written as:

X = X_r · R_m + X_g · G_m + X_b · B_m
Y = Y_r · R_m + Y_g · G_m + Y_b · B_m
Z = Z_r · R_m + Z_g · G_m + Z_b · B_m

Under the assumption that equal RGB voltages (1, 1, 1) should lead to the color white, and specifying chromaticity coordinates for a monitor consisting of long-persistence phosphors like this:

             x        y
    red     0.620    0.330
    green   0.210    0.685
    blue    0.150    0.063

we have for example:

X = 0.584 · R_m + 0.188 · G_m + 0.179 · B_m
Y = 0.311 · R_m + 0.614 · G_m + 0.075 · B_m
Z = 0.047 · R_m + 0.103 · G_m + 0.939 · B_m

The inverse transformation is:

R_m = 2.043 · X − 0.568 · Y − 0.344 · Z
G_m = −1.036 · X + 1.939 · Y + 0.043 · Z
B_m = 0.011 · X − 0.184 · Y + 1.078 · Z
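A small NumPy sketch using exactly the example matrix above (the phosphor values are the ones given in this definition, not universal constants):

    import numpy as np

    # Example RGB_monitor -> CIE XYZ matrix from Definition 13
    M = np.array([[0.584, 0.188, 0.179],
                  [0.311, 0.614, 0.075],
                  [0.047, 0.103, 0.939]])

    def rgb_to_xyz(rgb):
        return M @ np.asarray(rgb)

    def xyz_to_rgb(xyz):
        # numerical inverse; agrees with the inverse matrix quoted above up to rounding
        return np.linalg.inv(M) @ np.asarray(xyz)

    # Equal RGB voltages (1, 1, 1) map to the monitor's white point:
    white = rgb_to_xyz([1.0, 1.0, 1.0])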
Exam questions:
• Consider the CIE color space. Draw a sketch of this color space with a description of the axes and mark two points A, B in it. Which color properties are associated with points lying on the line segment between A and B, and which with the intersections of the line through A and B with the boundary of the CIE color space?
• Can an RGB monitor display all colors perceivable by the human eye? Justify your answer with a sketch!
5.4 Color Representation on Monitors and Films
The CIE chromaticity diagram describes more colors than the subset that is displayable on film, on a monitor, or on a printer.
The subset of colors that is displayable on a medium can be represented from its primary colors in an additive system. A monitor uses the RGB model. In order for the same color to appear on a printer that was perceived on a monitor, and that might come from scanning color film, the proper mix of that color from the respective triangles can be assessed via the CIE chromaticity diagram.
Exam questions:
• Compare the method of color generation on a cathode-ray-tube monitor with that of offset printing. Which color models are used in each case?
5.5 The 3-Dimensional Models
The tri-stimulus values x, y, z define a 3D space as shown in Slide 5.33 with the plane x + y + z = 1
marked. If a color monitor builds its colors from 3 primaries RGB, then it will be able to display
a subset of the CIE-colors.
The xyz-space is shown in Slide 5.35 in 3 views.
We extend our model to a three dimensional coordinate system with the red, green and blue color
axes, the origin at black, a diagonal extending away from the origin under 45 degrees with each
axis giving us gray values until we hit the white point. The red-blue plane defines the magenta
color, the red-green plane defines yellow and the green-blue plane defines cyan. That resulting
color model is shown in Slide 5.36 and is illustrated in Slide 5.37.
The RGB values range between 0 and 1. The RGB model is the basis of remote sensing and
displaying color images on various media such as monitors.
How does one modify the histogram of an RGB-image? Clearly changing the intensity of each
component image separately will change a resulting color. This needs to be avoided. We will
discuss other color models that will help here.
Exam questions:
• What is meant by a three-dimensional color space (or color model)? Name at least three examples!
5.6 CMY-Model
Exam questions:
• Given a color value C_RGB = (0.8, 0.5, 0.1)^T in the RGB color model (the two parts of this question continue after Definition 14 below).
Definition 14 CMY color model
CMY stands for:
C . . . Cyan
M . . . Magenta
Y . . . Yellow

The three-dimensional geometric representation of the CMY model can be done in the same way as that of the RGB model, i.e. as a cube. In contrast to the RGB model, the CMY model uses the principle of subtractive colors. Subtractive colors are seen when pigments in an object absorb certain wavelengths of white light while reflecting the rest. We see examples of this all around us. Any colored object, whether natural or man-made, absorbs some wavelengths of light and reflects or transmits others; the wavelengths left in the reflected or transmitted light make up the color we see. Some examples:
• White light falling onto a cyan pigment will be reflected as a mix of blue and green, since red gets absorbed.
• White light falling onto a magenta pigment will be reflected as a mix of red and blue, since green gets absorbed.
• White light falling onto a yellow pigment will be reflected as a mix of red and green, since blue gets absorbed.

Therefore the conversion between RGB and CMY is supported by the physics of light and pigments. This leads to the following conversion formulas:

C = 1 − R        R = 1 − C
M = 1 − G        G = 1 − M
Y = 1 − B        B = 1 − Y

The CMY model is not used on monitors but in printing.
1. Which spectral color corresponds most closely to the hue defined by C_RGB?
2. Find the corresponding representation of C_RGB in the CMY and in the CMYK color model!
Answer:
C_CMY = (1, 1, 1)^T − C_RGB = (0.2, 0.5, 0.9)^T
K = min(C, M, Y) = 0.2
C_CMYK = (0, 0.3, 0.7, 0.2)^T
The given hue corresponds roughly to orange.
5.7 Using CMYK
Definition 15 CMYK color model
CMYK is a scheme for combining primary pigments. The C stands for cyan (aqua), M stands for magenta (pink), Y is yellow, and K stands for black. The CMYK pigment model works like an “upside-down” version of the RGB (red, green, and blue) color model. The RGB scheme is used mainly for computer displays, while the CMYK model is used for printed color illustrations (hard copy).
K is defined as the minimum of C′, M′ and Y′, so that C is redefined as C′ − K, M as M′ − K, and Y as Y′ − K.
Conversion from RGB to CMYK:

C′ = 1 − R
M′ = 1 − G
Y′ = 1 − B
K = min(C′, M′, Y′)
C = C′ − K
M = M′ − K
Y = Y′ − K

Defining K (black) from CMY is called undercolor removal. Images become darker than they would with CMY alone, and there is less need for the expensive printing colors C, M and Y, which also need time to dry on paper.
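A hedged sketch of these formulas in Python (plain tuples, values in [0, 1]; function names are illustrative):

    def rgb_to_cmyk(r, g, b):
        """RGB -> CMYK with undercolor removal: K takes over the gray component."""
        c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
        k = min(c, m, y)
        return c - k, m - k, y - k, k

    def cmyk_to_rgb(c, m, y, k):
        """Inverse: add the black share back, then take the complement."""
        return 1.0 - (c + k), 1.0 - (m + k), 1.0 - (y + k)

    # Example from the exam question above: (0.8, 0.5, 0.1) -> (0.0, 0.3, 0.7, 0.2)
    print(rgb_to_cmyk(0.8, 0.5, 0.1))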
Exam questions:
• According to which formula is a CMYK color representation converted into an RGB representation?
• State the conversion rule for an RGB color value into the CMY model and into the CMYK model and explain the meaning of the individual color components! What is the CMYK model used for?
• Compare the method of color generation on a cathode-ray-tube monitor with that of offset printing. Which color models are used in each case?
• In four-color printing, a color value is given by 70% cyan, 0% magenta, 50% yellow and 30% black. Convert the color value into the RGB color model and describe the hue in words!
Answer: we have
C_CMYK = (0.7, 0.0, 0.5, 0.3)^T
C_CMY = (1, 0.3, 0.8)^T
C_RGB = (0, 0.7, 0.2)^T
The color corresponds to a slightly bluish green.
5.8 HSI-Model
The hue-saturation-intensity color model derives from a rather complicated transformation of the RGB color space. The HSI model is useful when analyzing images where color and intensity are important by themselves. One may also improve an image in its HSI version rather than in the natural RGB representation.
Slide 5.44 introduces the transition from RGB to HSI. A color located at P in the RGB triangle has its hue H described by the angle with respect to the red axis. Saturation S is the distance from the white point, thus from the point of equal RGB at the center of the triangle. Intensity is not within the triangle of Slide 5.44 but perpendicular to the plane of the triangle, as Slide 5.45 explains. The HSI model is thus a pyramid-like shape. It is visualized in Slide 5.46.
Conversion from RGB to HSI has been explained in concept, but it is based on an elaborate algorithm. The easiest element is the intensity I, which simply is I = (R + G + B)/3. We do not detail H and S, nor do we address the inverse conversion from HSI to RGB.
5.9 YIQ-Model
Exam questions:
• On the YIQ color model:
1. What is the meaning of the Y component in the YIQ color model?
2. Where is the YIQ color model used?
• A color value C_RGB = (R, G, B)^T in the RGB color model is converted into the corresponding value C_YIQ = (Y, I, Q)^T in the YIQ color model according to the following rule:

            ( 0.299   0.587   0.114 )
    C_YIQ = ( 0.596  −0.275  −0.321 ) · C_RGB
            ( 0.212  −0.523   0.311 )

Which biological fact is expressed by the first row of this matrix? (Hint: consider where the YIQ color model is used and what the meaning of the Y component is in this context.)
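A brief NumPy sketch of this conversion (a plain restatement of the matrix above, not a broadcast-grade implementation):

    import numpy as np

    M_YIQ = np.array([[0.299,  0.587,  0.114],
                      [0.596, -0.275, -0.321],
                      [0.212, -0.523,  0.311]])

    def rgb_to_yiq(rgb):
        return M_YIQ @ np.asarray(rgb)

    def yiq_to_rgb(yiq):
        # numerical inverse; a black & white receiver would use only Y
        return np.linalg.inv(M_YIQ) @ np.asarray(yiq)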
5.10 HSV and HLS Models
Variations on the HSI-Models are available. The HSV model (Hue-Saturation-Value) is also called
the HSB model with B for brightness. This responds to the intuition of an artist, who thinks
Definition 16 YIQ color model
This model is used in U.S. TV broadcasting. The RGB to YIQ transformation is based on a well-known matrix M:

        ( 0.299   0.587   0.114 )
    M = ( 0.596  −0.275  −0.321 )
        ( 0.212  −0.523   0.311 )

The Y component is all one needs for black & white TV. Y gets the highest bandwidth, I and Q get less. Transmission of I and Q is separate from Y; I and Q are encoded in a complex signal.
RGB to YIQ conversion:
Y = 0.299 · R + 0.587 · G + 0.114 · B
I = 0.596 · R − 0.275 · G − 0.321 · B
Q = 0.212 · R − 0.523 · G + 0.311 · B
YIQ to RGB conversion:
R = 1 · Y + 0.956 · I + 0.621 · Q
G = 1 · Y − 0.272 · I − 0.647 · Q
B = 1 · Y − 1.105 · I + 1.702 · Q
Again, simple image processing such as histogram changes can take place with only Y. Color does not get affected since that is encoded in I and Q.
in terms of tint, shade and tone. We introduce a cylindrical coordinate system, and the model defines a hexcone with a hexagonal cross-section.
In the coordinates of Slide 5.44, the hue is again measured as an angle around the vertical axis, with intervals of 120 degrees from one primary color to the next (red at 0 degrees, green at 120, blue at 240 degrees; the intermediate angles then give yellow, cyan and magenta). The value of saturation S is a ratio going from 0 at the central axis to 1 at the side of the hexagon. The values of V vary between 0 for black and 1 for white. Note that the top of the hexcone can be obtained by looking at the RGB cube along the diagonal axis from white to black. This is illustrated in Slide 5.45. It also provides the basic idea of converting an RGB input into the HSV color model.
The HLS (hue-lightness-saturation) model of color is defined by a double hexcone shown in Slide 5.49. The HLS model is essentially obtained as a deformation of the HSV model by pulling up the center of the base of the hexcone (the V = 1 plane). Therefore a transformation of RGB into the HLS color model is similar to the RGB to HSV transformation.
The HSV color space is visualized in Slide 5.52. Similarly, Slide 5.53 illustrates an entire range of color models in the form of cones and hexcones.
Exam questions:
• Given a color value C_RGB = (0.8, 0.4, 0.2)^T in the RGB color model. Estimate graphically the position of the color value C_HSV in Figure B.32 (i.e. the equivalent of C_RGB in the HSV model). Also sketch the position of a color value C′_HSV that has the same hue and the same brightness as C_HSV but only half the saturation!
Algorithm 11 Conversion from RGB to HSI
Input: real R, G, B, the RGB color coordinates to be converted.
Output: real H, S, I, the corresponding HSI color coordinates.
float Z, n, Hf, Sf, delta
Z = ((R - G) + (R - B)) * 0.5
n = sqrt((R - G) * (R - G) + (R - B) * (G - B))
if n != 0 then
    delta = acos(Z / n)
else
    delta = 0.0        {default assignment}
end if
if B <= G then
    Hf = delta
else
    Hf = 2.0 * PI - delta
end if
H = (int) ((Hf * 255.0) / (2.0 * PI))        {assignment to H and normalization to values between 0 and 255}
SUM = R + G + B
MIN = min(R, G, B)        {calculate minimum}
if SUM != 0 then
    Sf = 1.0 - 3.0 * (MIN / SUM)
else
    Sf = 255.0
end if
S = (int) (Sf * 255.0)
I = (int) (SUM / 3.0)
if !(0 <= H, S, I <= 255) then
    set H, S, I between 0 and 255        {prevent artifacts}
end if
Algorithm 12 Conversion from HSI to RGB
Input: real H, S, I, the HSI color coordinates to be converted.
Output: real R, G, B, the corresponding RGB color coordinates.
float rt3, h
if S = 0 then
    R = I, G = I, B = I
else
    rt3 = 1 / sqrt(3.0)
    if 0.0 <= H < 120.0 then
        B = (1.0 - S) * I
        h = rt3 * tan((H - 60.0) * PI / 180)
        G = (1.5 + 1.5 * h) * I - (0.5 + 1.5 * h) * B
        R = 3.0 * I - G - B
    else if 120.0 <= H < 240.0 then
        R = (1.0 - S) * I
        h = rt3 * tan((H - 180.0) * PI / 180)
        B = (1.5 + 1.5 * h) * I - (0.5 + 1.5 * h) * R
        G = 3.0 * I - B - R
    else
        G = (1.0 - S) * I
        h = rt3 * tan((H - 300.0) * PI / 180)
        R = (1.5 + 1.5 * h) * I - (0.5 + 1.5 * h) * G
        B = 3.0 * I - R - G
    end if
end if
if !(0 <= R, G, B <= 255) then
    set R, G, B between 0 and 255        {prevent artifacts}
end if
Algorithm 13 Conversion from RGB to HSV
Input: real R, G, B, the RGB color coordinates to be converted.
Output: real H, S, V, the corresponding HSV color coordinates.
real rc, gc, bc, rgbmax, rgbmin
rgbmax = max(R, G, B)
rgbmin = min(R, G, B)
V = rgbmax
if rgbmax != 0.0 then        {compute the saturation}
    S = (rgbmax - rgbmin) / rgbmax
else
    S = 0.0
end if
if S = 0.0 then        {compute the hue}
    H = 0.0
else
    rc = (rgbmax - R) / (rgbmax - rgbmin)
    gc = (rgbmax - G) / (rgbmax - rgbmin)
    bc = (rgbmax - B) / (rgbmax - rgbmin)
    if R = rgbmax then
        H = bc - gc
    else if G = rgbmax then
        H = 2.0 + rc - bc
    else
        H = 4.0 + gc - rc
    end if
    H = H * 60.0
    H = rmodp(H, 360.0)        {make sure H lies between 0 and 360.0}
end if
Algorithm 14 Conversion from HSV to RGB
Input: real H, S, V, the HSV color coordinates to be converted.
Output: real R, G, B, the corresponding RGB color coordinates.
real f, hue, i, p, q, t
if S = 0.0 then
    R = V, G = V, B = V
else
    hue = rmodp(H, 360.0)        {make sure the hue lies between 0 and 360.0}
    hue = hue / 60.0
    i = int(hue)
    f = hue - real(i)
    p = V * (1.0 - S)
    q = V * (1.0 - S * f)
    t = V * (1.0 - S + S * f)
    if i = 0 then
        R = V, G = t, B = p
    else if i = 1 then
        R = q, G = V, B = p
    else if i = 2 then
        R = p, G = V, B = t
    else if i = 3 then
        R = p, G = q, B = V
    else if i = 4 then
        R = t, G = p, B = V
    else if i = 5 then
        R = V, G = p, B = q
    end if
end if
Algorithm 15 Conversion from RGB to HLS
Input: real R, G, B, the RGB color coordinates to be converted.
Output: real H, L, S, the corresponding HLS color coordinates.
real rc, gc, bc, rgbmax, rgbmin
rgbmax = max(R, G, B)        {compute lightness}
rgbmin = min(R, G, B)
L = (rgbmax + rgbmin) / 2.0
if rgbmax = rgbmin then        {compute saturation}
    S = 0.0
else
    if L <= 0.5 then
        S = (rgbmax - rgbmin) / (rgbmax + rgbmin)
    else
        S = (rgbmax - rgbmin) / (2.0 - rgbmax - rgbmin)
    end if
end if
if rgbmax = rgbmin then        {compute the hue}
    H = 0.0
else
    rc = (rgbmax - R) / (rgbmax - rgbmin)
    gc = (rgbmax - G) / (rgbmax - rgbmin)
    bc = (rgbmax - B) / (rgbmax - rgbmin)
    if R = rgbmax then
        H = bc - gc
    else if G = rgbmax then
        H = 2.0 + rc - bc
    else
        H = 4.0 + gc - rc
    end if
    H = H * 60.0
    H = rmodp(H, 360.0)        {make sure H lies between 0 and 360.0}
end if
Algorithm 16 Conversion from HLS to RGB
Input: real H, L, S, the HLS color coordinates to be converted.
Output: real R, G, B, the corresponding RGB color coordinates.
real m1, m2
if L <= 0.5 then
    m2 = L + L * S
else
    m2 = L + S - L * S
end if
m1 = 2.0 * L - m2
if S = 0.0 then
    R = L, G = L, B = L
else
    R = hlsvalue(m1, m2, H + 120.0)
    G = hlsvalue(m1, m2, H)
    B = hlsvalue(m1, m2, H - 120.0)
end if
Algorithm 17 hlsvalue(N1, N2, H)
Input: real N1, N2, H.
Output: real HLSVALUE.
real hue
hue = rmodp(H, 360.0)        {make sure the hue lies between 0 and 360}
if hue < 60.0 then
    HLSVALUE = N1 + (N2 - N1) * hue / 60.0
else if hue < 180.0 then
    HLSVALUE = N2
else if hue < 240.0 then
    HLSVALUE = N1 + (N2 - N1) * (240.0 - hue) / 60.0
else
    HLSVALUE = N1
end if
Figure 5.2: a plane in the HSV color model (hexagon labels: green, yellow, cyan, red, white, blue, magenta)
Answer: We have (see Figure 5.2):
C_HSV = (20°, 75%, 0.8)
C′_HSV = (20°, 37.5%, 0.8)
C′_RGB = (0.8, 0.6, 0.5)
Halving the saturation in the HSV model means halving the distance from the center. The components of the corresponding point in the RGB model lie closer together, but their ordering is preserved.

• Which color lies “in the middle” when one interpolates linearly between yellow and blue in the RGB color space? Which color space would be better suited for such an interpolation, and which color would lie between yellow and blue in that color space?
5.11 Image Processing with RGB versus HSI Color Models
An RGB color test pattern is shown in Slide 5.51. This test pattern is used to calibrate printers, monitors, scanners, and image color throughout a production system that is based on color. This particular test pattern is digital and offers 8 bits each of red, green and blue. The pattern is symmetric from top to bottom, consisting of one black band on top; bands two, three and four are the primary colors; for the RGB model bands 5, 6 and 7 are the secondary colors; band 8 should be white; band 9 is a continuous variation from blue to red; and the final band is a gray wedge.
The band of rainbow colors shown in Slide 5.52 is obtained by continuously varying, from left to right, the intensity of blue through values of 1 to 0 and of red from 0 to full intensity, while green goes from 0 to full and back to 0 across the band. Using the process we have conceptually hinted at for the HSI model, this RGB image is converted into an HSI image. The easy part is the computation of the intensity I; the complex part is the computation of hue and saturation. In Slide 5.46 we are looking at the same pattern in terms of hue: we see that we have lost all sense of color and essentially have a bright image on the left and a dark image on the right in the color band. Looking at the saturation, the variation of the various colors has also disappeared, and what remains is the variation of saturation from left to center to right in the color band. Most of the information is in the intensity band, although some differences in colors have disappeared there.
The advantage of the HSI model is that we can optimize an image by just optimizing the intensity
segment of the HSI presentation. It is not uncommon that one goes from the RGB into the
HSI color model, modifies the intensity band only and then does the transformation back into
RGB. This typically will apply for histogram modifications of color images. As stated earlier
this optimization will preserve the color and saturation and it will only change the contrast as
we perceive it through the intensity of the image. Doing the optimization on each color band
separately will give us unpredictable color results.
Slide 5.53 illustrates the approach by means of an underexposed RGB original of a cockatoo. The result obtained by an HSI transformation and histogram equalization of just the intensity band is shown next. We obtain a much improved and satisfactory image.
A similar ideology is used when one creates color products from multiple input sources: an example
might be a high resolution black and white satellite image at one meter pixel size that is being
combined with a lower resolution color image in RGB at 4 meter resolution. A process to combine
those two image sources takes the RGB low resolution image and converts it into an HSI-model.
The I component is then removed and for it one inserts the higher resolution black and white
satellite image. The result is transformed back into RGB space. The entire operation requires of
course that all images have the same pixel size and are a perfect geometric match.
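A hedged sketch of this fusion idea (an HSI-style substitution; rgb_to_hsi and hsi_to_rgb are assumed array-based helpers in the spirit of Algorithms 11 and 12, and both inputs are assumed co-registered with identical pixel size):

    def fuse_pan_with_color(pan, rgb_low_upsampled):
        """Replace the intensity component of the resampled low-resolution
        color image by the high-resolution panchromatic band, then go back to RGB."""
        h, s, _ = rgb_to_hsi(rgb_low_upsampled)   # assumed helpers, cf. Algorithms 11 and 12
        return hsi_to_rgb(h, s, pan)              # substitute the intensity by the pan band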
5.12 Setting Colors
We have now found that a great number of different color models exist that allow us to define
colors in various ways. Slide 5.60 is a pictorial summary of the various color models. The models
shown are those that are common in the image processing and technical arena. The most popular
color model in the very large printing and graphic arts industry is not shown here, and that is
the CMYK model. Setting a color on a monitor or printer requires that the color be selected on
a model that the output device uses.
Let us assume that we choose the red, green, blue model for presentation of an image on a color
monitor. It is customary to change the red, green and blue channels in order to obtain a desired
output color. Inversely, an output color could be selected and the RGB components from which
that output color is created are being set automatically.
If we were to choose the HSV color model, we would create a sample color by selecting an angle for the hue, shifting the saturation on a slider between 0 and 1, and setting the value also between 0 and 1, obtaining the resulting color in the process. Inversely, a chosen color could be converted into its HSV components.
Finally, the HSI and RGB models can be looked at simultaneously: as we change the HSI values, the system instantaneously computes the RGB output and vice versa. In the process, the
corresponding colors are being shown as illustrated in Slide 5.63.
Optical illusions are possible in comparing colors: Slide 5.64 shows the same color appearing
differently when embedded in various backgrounds.
5.13 Encoding in Color
This is the topic of pseudo-color in image processing, where we assign color to gray values in order
to highlight certain phenomena and make them more easily visible to a human observer. Slide 5.67
illustrates a medical X-ray image, initially in a monochrome representation. The gray values can
be “sliced” into 8 different gray value regions, which can then be encoded in color. The concept of
this segmentation into gray value regions is denoted as intensity slicing, sometimes density slicing,
and is illustrated in Slide 5.66. The medical image may be represented as in Slide 5.66.
There the gray values are encoded as f (x, y). A plane is defined that intersects the gray values at a
certain level li; one can now assign all pixels with a value greater than li to one color. All pixels
below the slicing plane can be assigned to another color, and by moving the slicing plane we can see
very clearly on a monitor which pixels are higher and which are lower than the slicing plane. This is
a much more easily interpretable situation than one in which we see only the original gray values.
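A minimal Python sketch of intensity slicing with a single slicing plane (the 2-D list layout and the two colors are assumptions chosen for illustration):

    def intensity_slice(gray, level, above=(255, 0, 0), below=(0, 0, 255)):
        # gray: 2-D list of gray values; returns an RGB image with one color
        # for pixels above the slicing plane and another for those below
        return [[above if g > level else below for g in row] for row in gray]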
Another way of assigning color to a black and white image is illustrated in Slide 5.68. The
idea is to take an input value f (x, y) and apply to it three different transformations, one into a red
image, one into a green image and one into a blue image, so that the three resulting images are
assigned to the red, green and blue guns of a monitor. A variety of transformations is available
to obtain a colorful output from a black and white image. Of course, the scheme of Slide 5.68 is
nothing but a more general version of the specialized slicing plane applied in the previous Slide 5.66.
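A sketch of this more general scheme in Python; the three sine-based transfer functions below are arbitrary illustrative choices, not the transformations of the slide:

    import math

    def pseudo_color(gray):
        # gray in [0, 255]; three independent transfer functions drive the
        # red, green and blue guns
        t = gray / 255.0
        r = int(255 * abs(math.sin(math.pi * t)))
        g = int(255 * abs(math.sin(math.pi * t + math.pi / 3)))
        b = int(255 * abs(math.sin(math.pi * t + 2 * math.pi / 3)))
        return (r, g, b)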
Slide 5.69 illustrates the general transformation from gray levels to color with the example of an
X-ray image obtained from a luggage checking system at an airport. We can see in the example
how various color transformations enhance luggage with and without explosives such that even a
casual observer would notice the explosives in the luggage very quickly. We skip the discussion of
the details of a very complex color transformation but refer to [GW92, chapter 4.6].
Exam questions:
• What is meant by a false color image and by a pseudo color image, respectively? Name a typical application for each.
5.14 Negative Photography
The negative black and white photograph of Slide 5.71 is usually converted to a positive by inverting
the gray values, as in Slide 5.72. This is demonstrated in Algorithm ??: we take a transformation
that simply inverts the values 0 to 255 into 255 to 0. This trivial approach will not work with
color photography. As shown in Slide 5.73, color negatives typically are masked with a protective
layer that has a brown-reddish color. If one were to take an RGB scan of that color negative and
convert it into a positive by inverting the red, green and blue components directly, one would obtain
a fairly unattractive result, as shown in Slide 5.74. One first has to eliminate the protective layer:
one goes to the edge of the photograph, finds an area that is not part of the image, determines the
RGB components that represent the protective layer, and then subtracts that R value from all pixel
R-values, and similarly for the G and B components. As a result we obtain a clean negative, as
shown in Slide 5.75. If we now invert that negative we obtain a good color positive, as shown in
Slide 5.76. One calls this type of negative a masked negative (compare Algorithm 18). There have
been in the past developments of color negative film that is not masked. However, that film is for
special purposes only and is not usually available.
Algorithm 18 Masked negative of a color image
1: locate a pixel p whose color is known in all planes {e.g. the black film border}
2: for all planes plane do
3:   diff = grayvalue(p, plane) - known grayvalue(p, plane) {calculate the “masking layer”}
4:   for all pixels pixel of the picture do
5:     grayvalue(pixel, plane) = grayvalue(pixel, plane) - diff {correct the color}
6:     Invert(pixel) {invert the corrected negative pixel to get the positive}
7:   end for
8: end for
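A rough Python rendering of Algorithm 18, under assumptions chosen here for illustration: the image is stored as a dictionary of planes 'r', 'g', 'b', each a flat list of 0..255 values, and border_pixel is the index of a pixel known to belong to the unexposed black film border.

    def masked_negative_to_positive(image, border_pixel):
        positive = {}
        for plane, values in image.items():
            # the border pixel should be black (known value 0), so its measured
            # value is the contribution of the protective masking layer
            diff = values[border_pixel]
            corrected = [max(0, v - diff) for v in values]    # remove the masking layer
            positive[plane] = [255 - v for v in corrected]    # invert negative to positive
        return positive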
Exam questions:
• Figure B.62 shows a scanned color film negative. Which steps are necessary to obtain a correct positive image from it by means of digital image processing? Take into account that the optical density of the film is greater than zero even in unexposed areas. Give the mathematical relationship between the pixel values of the negative and the positive image.
5.15 Printing in Color
As we observe advertisement spaces with their posters, we see colorful photographs and drawings
which, when we inspect them from a short distance, are really the sum of four separate screened
images.
We have said earlier that for printing, continuous tone images are converted into halftones, and we
also specified that in a digital environment each pixel is further decomposed by a dithering matrix
into subpixels.
When printing a color original, one typically uses the four-color approach based on the cyan,
magenta, yellow and black pigments, which are the primary colors from which the printed color
images are produced. Each of the four color separations is screened, and each screen has an angle
with respect to the horizontal or vertical. In order to avoid a moiré effect caused by interference of
the different screens with one another, the screens are slightly rotated with respect to one another.
This type of printing is used in the traditional offset printing industry.
If printing is done directly from a computer onto plotter paper, the dithering approach is used
instead. If we look at a poster that is printed directly with a digital output device and not via an
offset press, we can see how the dithering matrix is responsible for each of the dots on the poster.
Again, each dot is encoded in one of the four basic pigment colors: cyan, magenta, yellow
or black.
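For orientation, a naive RGB-to-CMYK conversion can be sketched as follows; this is only an illustration of the relation between the two models, real presses use calibrated color profiles instead:

    def rgb_to_cmyk(r, g, b):
        # r, g, b in [0, 1]
        k = 1.0 - max(r, g, b)
        if k >= 1.0:
            return (0.0, 0.0, 0.0, 1.0)
        c = (1.0 - r - k) / (1.0 - k)
        m = (1.0 - g - k) / (1.0 - k)
        y = (1.0 - b - k) / (1.0 - k)
        return (c, m, y, k)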
Exam questions:
• Describe how color is produced in classical offset printing. Which color model is used, and how is the occurrence of the moiré effect prevented?
Answer: Four separate images (one each for the cyan, magenta, yellow and black components) are printed on top of each other (CMYK color model). Each layer is a halftone image, and the layers are rotated slightly against one another to prevent the moiré effect.
5.16 Ratio Processing of Color Images and Hyperspectral Images
We start out from a color image and for simplicity we make the assumption that we only have two
color bands, R, G, so that we can explain the basic idea of ratio imaging. Suppose a satellite is
imaging the terrain in those two colors. As the sun shines onto the terrain, we will have a stronger
illumination on terrain slopes facing the sun than on slopes that face away from the sun. Yet,
the trees may have the exact same color on both sides of the mountain. When we look now at
the image of the terrain, we will see differences between the slopes facing the sun and the slopes
facing away from the sun.
In Slide 5.81 let us take three particular pixels, one from the front slope, one from the back slope,
and perhaps a third pixel from flat terrain, all showing the same type of object, namely a tree. We
now enter these three pixels into a feature space that is defined by the green and red color axes.
Not surprisingly, the three locations of the pixels that we have chosen lie on a straight line through
the origin. Clearly, the color of all three pixels is the same, but the intensity is different. We are
back again at the idea of the HSI model.
We can now create two images from the one color input image. Both of those images are black
and white. In one case, we place at each pixel its ratio R/G, which corresponds to the angle that
the pixel's vector forms with the abscissa. In the other image we place at each pixel the distance of
the pixel from the origin of the feature space. As a result, we obtain one black and white image, in
Slide 5.82, that is clean of color and shows us essentially the variations in intensity as a function of
slope. The other image, in Slide 5.83, shows us the image clean of variations in intensity, as if the
terrain were all flat, so that only the variations of color are shown there. Conceptually, one image
is the I component of an HSI transformation and the other one is the H component. Such ratio
images have in the past been used to take satellite images and estimate the slope of the terrain,
assuming that the terrain cover is fairly uniform. That clearly is the case on glaciers, in the Arctic
or Antarctic, or in heavily wooded areas.
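A small Python sketch of the two derived images, assuming the two bands are given as equally sized lists of values for the same pixels:

    import math

    def ratio_and_intensity(red, green):
        # atan2(r, g) is the angle whose tangent is the ratio R/G (hue-like)
        ratio = [math.atan2(r, g) for r, g in zip(red, green)]
        # distance from the origin of the R-G feature space (intensity-like)
        intensity = [math.hypot(r, g) for r, g in zip(red, green)]
        return ratio, intensity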
Exam questions:
• What is a “ratio image”?
• For what purpose would a user produce a so-called “ratio image”? Please use a sketch in your answer to explain a ratio image.
Chapter 6
Image Quality
6.1 Introduction
By image quality we generally denote an overall impression of the crispness, the color, the detail
and the composition of an image. Slide 6.2 is an example of an exciting image with a lot of detail,
crispness and color. Slide 6.3 adds the excitement of motion and a sentiment of activity and cold.
Generally in engineering we do not deal with these concepts, which are more artistic and aesthetic;
we deal with exact definitions.
6.2 Definitions
In images we define quality by various components. Slide 6.5 illustrates radiometric concepts of
quality that relate to density and dynamic range. Density 0 means that the light can go through
the image unhindered, density 4 means that the image blocks the light. Intensity is the concept
associated with the object. Greater intensity means that more light is coming from the object.
The dynamic range of an image is the greatest density value divided by the least density value
in the image, the darkest value divided by the brightest value. The dynamic range is typically
encoded logarithmically.
Exam questions:
• What is meant by the “dynamic range” of a medium for the reproduction of pictorial information, and how is it related to the quality of the reproduction? Rank some common media in ascending order of their dynamic range.
6.3 Gray Value and Gray Value Resolutions
We have already described in earlier presentations the idea of resolving gray values. Chapter 3
introduced the concept of a gray wedge and how a gray wedge gets scanned to assess the quality of
a scanning process. Similarly, we can assess the quality of an image by describing how many
different gray values the image can contain. Slide ?? illustrates the resolution of a gray value image.
Note again that in this case we talk about the gray values in an image whereas in the previous
chapter we talked about the quality of the conversion of a given continuous tone image into a
digital rendition in a computer in the process of scanning.
Resolving great radiometric detail means that we can recognize objects in the shadows while we
can also read writing on a bright roof. Resolution of the gray values in the low-density bright
areas does not compromise the resolution in the high-density dark areas. Slide ?? is a well resolved
image.
Exam questions:
• What is meant by the gray value resolution of a digital raster image?
Answer: The number of different gray values that can be represented in the image.
6.4 Geometric Resolution
Again, just as in the process of scanning an image, we can judge the image itself independent of
its digital or analog format. I refer to an earlier illustration which describes, by means of the US
Air Force (USAF) resolution target, how the quality of an image can be characterized by how well
it shows small objects on the ground or in the scene. We recall that the USAF target, when
photographed, presents to the camera groups of line patterns and, within each group, elements. In
the particular case of Slide 6.10, group 6, element 1 is the element still resolved. We know from an
accompanying table that that particular element in group 6 represents a resolution of 64 line pairs
per mm. We can see in the lower portion of the slide that element 6 in group 4 represents 28 line
pairs per mm.
We have in Slide 6.11 a set of numbers typical of the geometric resolution of a digital image.
One measure we typically deal with is dots per inch, for example when something is printed: a
high resolution is 3000 dots per inch, a low resolution is 100 dots per inch. Note that at 3000 dots
per inch each point is about 8 micrometers; recall that 1000 dots per inch corresponds to 25
micrometers per pixel. This leads us to the second measure of geometric resolution: the size of
a pixel. When we go to a computer screen, we have a third measure and we say the screen can
resolve 1024 by 1024 pixels, irrespective of the size of the screen.
Recall the observations about the eye and the fovea. We said that we had about 150 000 cone
elements per mm on the fovea. So when we focus our attention on the computer monitor, those
1000 by 1000 pixels would represent the resolution of about 3 by 3 mm on the retina. We may
really not have any use for a screen with more resolution, because we wouldn’t be able to digest
the information on the screen in one glance, because it would overwhelm our retina.
The next measure of resolution is line pairs per mm, mentioned earlier. 25 line pairs per mm is a
good average resolution for photography on a paper print, and 50 line pairs per mm is a very good
resolution on film. The best resolutions can be obtained with spy photography, which uses very
slow film that needs long exposure times but is capable of resolving great detail. In that case the
resolution may be in excess of 75 line pairs per mm.
It might be of interest to define the geometric resolution of an unaided eye: it is 3 to 8 pixels
per mm at a distance of 25 cm. Again, a person sitting in front of a monitor starts seeing the image
as a continuous pattern, no longer recognizing individual pixels, at 3 pixels per mm, so a screen of
1024 by 1024 pixels could have a dimension of 300 by 300 mm. For an eagle-eyed person resolving
8 pixels per mm, the same surface would have to shrink to about a 12 by 12 cm square.
It is of interest to relate these resolutions to one another; this is shown in Slide 6.15.
Film may have n line pairs per mm. This corresponds to 2.8 × n pixels per mm (see below). If we
had film with 25 line pairs per mm, we would have to represent this image at 14 micrometers per
pixel under this relationship. On a monitor with a side length of 250 mm and 1024 pixels, one pixel
has a dimension of about 0.25 mm.
We can again confirm that if on a monitor each pixel occupies 250 micrometers (equal to 0.25 mm),
we have 4 pixels per mm; people with normal vision then typically perceive this as a continuous
tone image. The actual range is 125 to 300 micrometers per pixel.
The Kell factor, proposed during World War II in the context of television, suggests that resolving
a single line pair of a black and a white line by 2 pixels is insufficient, because statistically we
cannot be certain that those pixels fall directly on each dark line and on each bright line; they may
fall halfway in between. If they do, the line pairs will not be resolved. Therefore Kell proposed
that the proper number of pixels per mm needed to resolve each line pair under all circumstances
is 2√2 times the number of line pairs per mm.
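A small worked example in Python of this relationship (the function name is chosen here for illustration):

    import math

    def pixel_size_um(line_pairs_per_mm):
        # Kell criterion: about 2 * sqrt(2) pixels per line pair
        pixels_per_mm = 2 * math.sqrt(2) * line_pairs_per_mm
        return 1000.0 / pixels_per_mm

    pixel_size_um(25)   # about 14 micrometers per pixel, as stated above
    pixel_size_um(70)   # about 5 micrometers, cf. the exam question below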
Exam questions:
• A very high-resolution infrared film is advertised with a geometric resolution of 70 line pairs per millimeter. With what maximum pixel size would this film have to be scanned in order to avoid any loss of information compared with the film original?
• Which measure is used to describe the geometric resolution of an image, and with which procedure is this resolution tested and quantified? Please provide a sketch.
6.5 Geometric Accuracy
An image always represents the object with a certain geometric accuracy. We have already
taken a look at the basic idea when we talked in an earlier chapter about the conversion of a given
analog picture into a digital form. The geometric accuracy of an image is described by the sensor
model, a concept mentioned in the chapter on sensors. We have deviations between the geometric
locations of object points in a perfect camera and the geometric locations in our real camera.
Those discrepancies can be described in a calibration procedure. Calibrating imaging systems is
a big issue and has earned many diploma engineers and doctoral students their degrees in vision.
The basic idea is illustrated in Slide 6.17.
Exam questions:
• What is meant by the geometric accuracy of a digital raster image?
6.6 Histograms as a Result of Point Processing or Pixel Processing
The basic element of analyzing the quality of any image is a look at its histogram. Slide 6.19
illustrates a semi-dark color input image for which we want to build the histogram:
we find many pixels in the darker range and fewer in the brighter range. We can now change this
image by redistributing the histogram in a process called histogram equalization. We see, however,
in Slide 6.20 that we have a histogram for each of the color component images, while we are only
showing a composite of the colors denoted as luminosity. The summary of this manipulation is
shown in Slide 6.22.
A very common improvement of an image's quality is a change of the assignment of gray values to
the pixels of an image. This is based on the histogram.
Algorithm 19 Histogram equalization
1: For an N × M image of G gray-levels (often 256), create an array H of length G initialized with 0 values.
2: Form the image histogram: scan every pixel and increment the relevant member of H; if pixel p has intensity gp, perform H[gp] = H[gp] + 1.
3: Form the cumulative image histogram Hc: Hc[0] = H[0], Hc[p] = Hc[p − 1] + H[p] for p = 1, 2, ..., G − 1.
4: Set T[p] = round((G − 1)/(N·M) · Hc[p]).
5: Rescan the image and write an output image with gray-levels gq, setting gq = T[gp].
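A compact Python rendering of Algorithm 19, assuming the image is given as a flat list of integer gray levels 0..G−1:

    def equalize(img, G=256):
        H = [0] * G
        for g in img:                     # step 2: image histogram
            H[g] += 1
        Hc, running = [], 0
        for h in H:                       # step 3: cumulative histogram
            running += h
            Hc.append(running)
        NM = float(len(img))
        T = [int(round((G - 1) * hc / NM)) for hc in Hc]   # step 4: transfer table
        return [T[g] for g in img]        # step 5: rescan and remap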
Definition 17 Histogram stretching
Stretching or spreading of a histogram means mapping the gray value of each pixel of an image, or
of part of an image, through a piecewise continuous function T (r).
Normally the gradation curve T (r) is monotonically increasing and spreads a small range of gray
values of the input image over the entire range of available values, so that the resulting image looks
as if it had a lot more contrast.
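A small Python sketch of such a piecewise-linear stretching function T (r); the range limits low and high and the 8-bit output range are illustrative assumptions:

    def stretch(gray, low, high, g_max=255):
        # maps the input range [low, high] linearly onto [0, g_max];
        # values outside the range are clipped
        out = []
        for g in gray:
            t = (g - low) * g_max / float(high - low)
            out.append(int(min(max(t, 0), g_max)))
        return out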
Let us consider the histogram of an 8-bit input image. We may have individual gray values, say
gray value 67, with 10 000 pixels, while gray values 68 and 69 have none, but gray value 70 again
has 7 000 pixels. We can change this image by allocating the input gray values to new output
values depending on their frequency as seen in the histogram.
We aim for a histogram that is as uniform as possible. Slide 6.23 shows a detail of the previous
slide in one case with the input histogram and in a second case with the equalized histogram: we
have attempted to distribute the gray values belonging to the input pixels such that the histogram
is as uniform as possible. Slide 6.24 shows how we can change the gray values of an input image
B into an output image C. Geometrically we describe the operation by a 2-D diagram with the
abscissa for the input gray values and the ordinate for the output gray values. The relationship
between input and output is shown by the curve in the slide, representing a look-up table.
Slide 6.25 illustrates again how an input image can be changed and how a certain area of the
input image can be highlighted in the output image. We simply set all input pixels below a
certain threshold A and above a certain threshold B to zero, and spread the intermediate range
over the output range, or set it to a specific value in the output image. Another method of
highlighting is to take an input image and convert it one-to-one into an output image, with the
exception of a gray value range from a lower gray value A to an upper gray value B, which is set
to one output gray value, thereby accentuating this part of the input image.
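Both highlighting variants can be sketched in a few lines of Python (flat gray value list and output value are assumptions for illustration):

    def highlight_range(gray, a, b, value=255):
        # variant 1: keep only the range [a, b], everything else set to zero
        slice_only = [value if a <= g <= b else 0 for g in gray]
        # variant 2: pass the image through unchanged, except that the range
        # [a, b] is set to one output gray value
        accentuate = [value if a <= g <= b else g for g in gray]
        return slice_only, accentuate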
Another analysis is shown in Slide 6.26, where we represent the 8 bits of a gray value image as
8 separate images; in each bit plane we see whether that bit is set in the byte of the image. Bit
plane 7 is the most significant, bit plane 0 the least significant. We obtain information about
the contents of an image as shown in Slide 6.27, where we see the 8 levels of an image and note
that we have a thresholded type of image at level 7 and basically no information at level 0.
There is very little information in the lower three bits of that image. In reality we may then not
be dealing with an 8 bit image but with a 5 bit image.
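Extracting a single bit plane is a one-line point operation; a Python sketch:

    def bit_plane(gray, k):
        # returns a binary image showing bit k (0 = least, 7 = most significant)
        return [(g >> k) & 1 for g in gray]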
Histograms let us see where the pixels are aggregated. Pixels clustered at low digital numbers
represent a dark image, pixels clustered at high digital numbers a bright image. A narrow histogram
with a single peak indicates a low contrast image, because it does not have many different gray
values. However, if an image has all of its gray values occupied with pixels, and if those are equally
distributed, we obtain a high contrast, high quality image.
How do we change the histogram of an image and spread or equalize it? We think of the input
gray values on the abscissa of a 2-D diagram and translate them to output gray values on the
ordinate. We use a curve that relates the input to the output pixels; the curve is denoted the
gradation curve, or T (r). Let us take the example of an image with very low contrast, as signified
by a histogram that has values only in a range around 64, with some 10 gray values to the left and
to the right. We now spread this histogram by a curve T (r) that takes the input values where
we have many pixels and spreads them over many values in the output image. As a result the
pixel values are spread over the entire range of available values, so that the image looks as if it had
a lot more contrast. We may not really be able to change the basic shape of the histogram, but
we can certainly stretch it, as shown in Slide ??. Equalization is illustrated in Slide ??.
We may want to define a desired histogram and try to approach this histogram given an input
image which may have a totally different histogram. How does this work? Slide 6.36 explains.
Let us take a thermal image of an indoor scene. We show the histogram of this input image and,
for comparison, we also illustrate the result of equalization. We would like, however, to have a
histogram as shown in the center of the histogram display of the slide. We change the input
histogram to approach the desired histogram as best we can, obtaining the third image.
The resulting image permits one to see the chairs in the room.
Slide 6.36 summarizes that enhancement is the improvement of the image by locally processing
each pixel separately from the other pixels. This may not only concern contrast but could address
noise as well. A noisy input image can become worse if we improve the histogram, since we may
increase the noise. If we do some type of histogram equalization that changes locally, we might
get an improvement of the image structure and increased ease of interpretability.
An example is shown in Slide 6.38. We have a very noisy image, and embedded in the image
are 5 targets of interest. With a global histogram process we may not be able to resolve the detail
within those 5 targets. We might enhance the noise that already exists in the image and still not
see what is inside the targets. However, when we go through the image, look at individual
segments via a small window and improve the image locally, moving the window from place
to place with new parameters at each location, we might obtain the result shown in
the next component of the slide. We find that the detail within each target consists of a point and a
square around that point.
Algorithm 20 Local image improvement
g(x, y) ... result image
f(x, y) ... input image

g(x, y) = A(x, y) · [f(x, y) − m(x, y)] + m(x, y)

with

A(x, y) = k · M / σ(x, y)

where
k ... a constant
M ... global mean gray value
m(x, y) ... local mean of the gray values in the window around (x, y)
σ(x, y) ... local standard deviation of the gray values
We have taken an input image f (x, y) and created a resulting image g(x, y) by the formula shown
in Slide 6.39. There is a coefficient A(x, y) involved and a mean m(x, y). In a window we
compute the mean gray value m(x, y) as an average gray value, we subtract it from each gray value
of the image f , we multiply the difference by the factor A(x, y), and then add back
the mean m(x, y).
What is this A(x, y)? It is itself a function of (x, y): in each window we compute the mean m(x, y)
and a standard deviation σ(x, y) of the gray values. We also compute a global average M , separate
from the average of each small window, and we have some constant k. Improvements of
images according to this formula, and similar approaches, are heavily used in medical imaging
and many other areas where images are presented to the eye for interactive analysis. We are
processing images here before we analyze them; therefore, we call this preprocessing. A
particular example of preprocessing is shown in the medical image of Slide 6.40, illustrating how a
bland image with no details reveals its detail after some local processing.
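A Python sketch of this local enhancement; the flat row-major image layout, the window size and the constant k are assumptions chosen for illustration:

    import math

    def local_enhance(img, w, h, k=0.4, win=7):
        # img: flat list of gray values, row-major, size w*h
        M = sum(img) / float(len(img))          # global mean
        half = win // 2
        out = []
        for y in range(h):
            for x in range(w):
                # gather the window around (x, y), clipped at the borders
                vals = [img[yy * w + xx]
                        for yy in range(max(0, y - half), min(h, y + half + 1))
                        for xx in range(max(0, x - half), min(w, x + half + 1))]
                m = sum(vals) / float(len(vals))
                var = sum((v - m) ** 2 for v in vals) / float(len(vals))
                sigma = math.sqrt(var)
                if sigma == 0:
                    sigma = 1.0
                A = k * M / sigma
                out.append(A * (img[y * w + x] - m) + m)
        return out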
Another idea is the creation of difference images, for example an X-ray image of a brain taken
before and after some injection is given as a contrast agent. We then have an image of the brain
before and after the contrast agent has entered into the blood stream. The two images can then
be subtracted and will highlight the vessels that contain the contrast material.
How else can we improve images? We can take several noisy images and average them. For
example, we can take a microscopic image of some cells; a single image may be very noisy, but
by repeating the image and computing the average of the gray values of each pixel we reduce
the noise and obtain a better signal. Slide 6.44 shows the effect of averaging 128 images.
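Averaging a stack of repeated images is a one-liner in Python (the stack is assumed to be a list of equally sized flat gray value lists):

    def average_images(stack):
        n = float(len(stack))
        return [sum(vals) / n for vals in zip(*stack)]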
Exam questions:
• Given the gray value image in Figure B.59, determine the histogram of this image. Using the histogram, a threshold is to be found that is suitable for segmenting the image into background (small value, dark) and foreground (large value, bright). State the threshold as well as the result of the segmentation in the form of a binary image (with 0 for the background and 1 for the foreground).
Figure 6.1: Histogram of a gray wedge: a constant value of 50 over the gray values 0 to 255
• Figure B.33 shows a gray wedge in which all gray values from 0 to 255 occur in ascending order; the width is 50 pixels. Draw the histogram of this image and pay attention to the correct numerical values. The black border in Figure B.33 only serves to outline the image and is not part of the image itself.
Answer: see Figure 6.1
• Figure B.74(a) shows the castle in Budmerice (Slovakia), where a student seminar¹ and the Spring Conference on Computer Graphics take place every year. Figure B.74(b) was generated from it by an automatic process, with some details (e.g. the clouds in the sky) clearly enhanced. Name an operation that could have been applied here, and comment on how it works.
• Sketch the histogram of a
1. dark,
2. bright,
3. low-contrast,
4. high-contrast
monochrome digital raster image.
Answer: See Figure 6.2; note that the area under the curve is always the same.
¹ Interested students in the computer graphics specialization track have the opportunity to take part in this seminar free of charge and to present their seminar/project or diploma thesis there.
Figure 6.2: Histograms of (a) a dark, (b) a bright, (c) a low-contrast and (d) a high-contrast image
Chapter 7
Filtering
7.1 Images in the Spatial Domain
We revisit the definition of an image space with its cartesian coordinates x and y to denote the
columns and rows of pixels. We define a pixel at location (x, y) and denote its gray value with
f (x, y). Filtering changes the gray value f of an input image into an output gray value g(x, y) in
accordance with Slide 7.3. The transformation
g(x, y) = T [f (x, y)]
is represented by an operator T which acts on the pixel at location (x, y) and on its neighbourhood.
The neighbourhood is defined by a mask which may also be denoted as template, window or filter
mask . We can therefore state in general terms that: a filter is an operation that produces from
an input image and its pixels f (x, y) an output image with pixels g(x, y) by a filter operator T .
This operator uses in the transformation the input pixel and its neighbourhood to produce a value
in the output pixel. We will see later that filtering is a concept encompassing many different
types of operations to which this basic definition applies in common. It may be of interest to note
that some of the operations we have previously discussed can be classified as filter operations,
namely transformations of the image where the operation addresses a neighbourhood of size
1 × 1. Those transformations produce an output pixel from an input pixel via the transfer function
T ; one calls them “point operations” or transformations of individual pixels. We have the special
case of contrast enhancement in Slide 7.4, and of “thresholding”. Similarly, these operations on
single pixels include the inversion of a negative to a positive as shown in Slide 7.5. The same
type of operation is shown in Slide 7.6. The astronomic imaging sensors at times produce a very
high density range that challenges the capabilities of film and certainly of monitors. On an 8-bit
image we may not really appreciate the detail that a star may provide through a high resolution
telescope. To do better justice to a high density range image a single pixel operation is applied
that non-linearly transforms the input gray values into the output gray values. Again, in the
narrow sense of our definition of filtering, this is a “filter operation”. However, we have previously
discussed the same transformation under the name of contrast stretching. In this particular case,
the contrast stretch is logarithmic.
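A logarithmic contrast stretch of this kind is a simple point operation; a Python sketch for an 8-bit gray value list:

    import math

    def log_stretch(gray, g_max=255):
        # non-linear point operation: compresses a high dynamic range into 8 bits
        c = g_max / math.log(1 + g_max)
        return [int(c * math.log(1 + g)) for g in gray]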
Exam questions:
• In the lecture, the operations “threshold” and “median”, applied to digital raster images, were discussed. What is the relationship between these two operations in the context of filtering?
7.2 Low-Pass Filtering
Let us define a mask of 3 by 3 pixels in Slide 7.8. We enter into that mask values w1 , w2 , . . . , w9 .
We call those values w “weights”. We now place the 3 by 3 pixel mask on top of the input image
which has gray values denoted as zi . Let us assume that we center the 3 by 3 mask over the pixel
z5 so that w5 is on top of z5 . We can now compute a new gray value g5 as the sum of the products
of the values wi and zi , namely g5 = w1·z1 + w2·z2 + ... + w9·z9, in accordance with the slide. This
describes an operation on an input image without specifying the values in the filter mask. We need
to assign such values to the 3 by 3 mask: a low-pass filter is filled with the set of values shown in
the slide; in this example we assign the value 1/9 to each weight. The sum of all values is 1.
Similarly, a larger mask of 5 × 5 values may be filled with 1/25, a 7 × 7 filter mask with 1/49.
The three examples are typical low-pass filters.
Slide 7.10 illustrates the effect of low-pass filter masks filled with weights of 1/k, with k being
the number of pixels in the filter mask. Slide 7.10 shows the image of a light bulb and how the blur
increases as the size of the filter mask grows from 25 via 225 to 625 values, representing windows
with side lengths of 5, 15 and 25 pixels.
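A small Python sketch of this weighted-sum filtering with a 3 × 3 mask; the flat row-major image layout is an assumption, and border pixels are simply left unchanged:

    def convolve3x3(img, w, h, mask):
        # mask: 9 weights w1..w9, row-major
        out = img[:]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        acc += mask[(dy + 1) * 3 + (dx + 1)] * img[(y + dy) * w + (x + dx)]
                out[y * w + x] = acc
        return out

    box3 = [1.0 / 9] * 9   # the 3 x 3 low-pass mask from the text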
We will next consider the analogy between “filtering” and “sampling”. Slide 7.11 shows an image
and the gray value profile along a horizontal line (row of pixels). The continuous gray value trace
needs to be “sampled” into discrete pixels. We show in Slide 7.12 the basic concept of the transition
from the continuous gray value trace to a set of pixels. If we reconstruct the original trace from
the discrete pixels, we obtain a new version of the continuous gray value trace. It turns out
that the reconstruction is nothing else but a filtered version of the original. Sampling and signal
reconstruction are thus an analogy to filtering, and sampling theory is related to filter theory.
Slide 7.13 illustrates one particular and important low-pass filter: the sinc filter. A sinc function
is

sinc(f) = sin(πf) / (πf)

and represents the Fourier transform of a rectangular “pulse” in Fourier space (see below).
Slide 7.13 illustrates how a filtered value of the input function is obtained from the large filter mask
representing the sinc function. By shifting the sinc function along the abscissa and computing a
filter value at each location, we obtain a smoothed version of the input signal. This is analogous
to sampling the input signal and reconstructing it from the samples.
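For reference, the sinc function itself in Python:

    import math

    def sinc(f):
        return 1.0 if f == 0 else math.sin(math.pi * f) / (math.pi * f)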
Next we consider the median filter . This is a popular and frequently used operator. It inspects
each gray value under the filter window and picks that gray value under that window which has
half of the pixels with larger and the other half with smaller gray values. Essentially the gray
values under the filter window are being sorted and the median value is chosen. Where would
this be superior to an arithmetic mean? Clearly, the median filter does suppress high frequency
information or rapid changes in the image. Thus it suppresses salt and pepper noise. Salt and
pepper noise results from irregularities where individual pixels are corrupted. They might be
either totally black or totally white.
By applying a median filter one will throw out these individual pixels and replace them by one
mid-range pixel from the neighbourhood. The effect can sometimes be amazing. Slide 7.16 illustrates
this with a highly corrupted image of a woman, where about 20% of the pixels are corrupted.
Computing the arithmetic mean will produce a smoother image but will not do away
with the effect of noise. Clusters of corrupted pixels will result in persistent corruptions of the
image. However, the median filter will work a miracle. An image, almost as good as the input
image, without many corruptions, will result.
A median filter also has a limitation: If we have fine details in an image, say individual narrow
linear features (an example would be telegraph wires in an aerial photo) then those pixels marking
such a narrow object will typically get suppressed and replaced by the median value in their
environment. As a result the fine linear detail would no longer show in the image.
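A minimal Python sketch of a 3 × 3 median filter, again assuming a flat, row-major list of gray values and leaving border pixels unchanged:

    def median3x3(img, w, h):
        out = img[:]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                vals = sorted(img[(y + dy) * w + (x + dx)]
                              for dy in (-1, 0, 1) for dx in (-1, 0, 1))
                out[y * w + x] = vals[4]    # the middle of the 9 sorted values
        return out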
Figure 7.1: Application of a median filter (numeric gray value example)
Exam questions:
• Given Figure B.57 with the indicated line-shaped white disturbances, which method of correction do you propose to remove these disturbances? Please describe the method and justify why this method will remove the disturbances.
• What is a median filter, what are its properties, and in which situations is it used?
• Apply a 3 × 3 median filter to the pixels inside the bold-framed region of the gray value image shown in Figure B.14. You may enter the result directly in Figure B.14.
Answer: See Figure 7.1
• Sketch the shape of the filter kernel of a Gaussian low-pass filter. What must one pay attention to when choosing the filter parameters and the size of the filter kernel?
• Enter filter coefficients into the empty filter masks of Figure B.30 such that
1. Figure B.30(a) becomes a low-pass filter that leaves the DC component of the image signal unchanged,
2. Figure B.30(b) becomes a high-pass filter that completely suppresses the DC component of the image signal.
Answer: see Figure 7.3
Figure 7.2: Low-pass and high-pass filter masks: (a) low-pass, (b) high-pass
7.3 The Frequency Domain
We have so far looked at images represented as gray values in an (x, y) Cartesian coordinate system.
We call this the spatial-domain representation. There is another representation of images using
sine and cosine functions, called the spectral representation. The transformation of the spatial-domain
image f (x, y) into the spectral-domain representation F (u, v) is via a Fourier transform:

F{f (x, y)} = F (u, v) = ∫∫ f (x, y) e^(−2jπ(ux+vy)) dx dy

The spectral representation uses the independent variables u, v, which are the frequencies in the
two coordinate directions. The spectral representation can be converted back into the spatial
representation by the inverse transform:

f (x, y) = ∫∫ F (u, v) e^(+2jπ(ux+vy)) du dv

In the discrete world of pixels, the double integral ∫∫ is replaced by a double summation ΣΣ.
A filter operation can be seen as a convolution (Faltung) in accordance with Slide 7.18. The
convolution is defined and illustrated graphically in the slides that follow.
In this case the two functions f (x) and g(x) are one-dimensional functions for simplicity. They
are convolved using an operation denoted by the symbol ∗:

f (x) ∗ g(x) = ∫ f (t) g(x − t) dt, with the integral taken over t from −∞ to +∞

We define the function f (t) as a simple rectangle on the interval 0 ≤ t ≤ 1. The second function
g(t) is also defined as a box on the interval 0 ≤ t ≤ 1. To evaluate the convolution at a location x,
we mirror g to g(−t), shift it to g(x − t), and form the product f (t) · g(x − t), shown in Slide 7.24
as the shaded area. We illustrate this at x = x1 and x = x2. The convolution is the integral of
all these areas as we move g(x − t) into the various positions along the axis x. Where there is no
overlap between the two functions, the product f · g is zero. As a result, the integral produces
values that increase monotonically from 0 to c and then decrease from c to 0 as the coordinate x
goes from 0 through 1 to the value 2. This produces a “smoothed” version of the input function
f.
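The discrete analogue can be written in a few lines of Python; convolving two sampled unit boxes indeed produces the triangular, "smoothed" shape described above:

    def convolve(f, g):
        # discrete 1-D convolution of two sampled signals (full output length)
        n, m = len(f), len(g)
        out = [0.0] * (n + m - 1)
        for i in range(n):
            for j in range(m):
                out[i + j] += f[i] * g[j]
        return out

    convolve([1, 1, 1, 1], [1, 1, 1, 1])   # [1, 2, 3, 4, 3, 2, 1]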
It is now of interest to appreciate that a convolution in the spatial domain is a multiplication in
the spectral domain. This was previously explained in Slide 7.18. We can thus execute a filter
operation by transforming the input image f and the filter function h into the spectral domain,
resulting in F and H. We multiply the two spectral representations and obtain G as the spectral
representation of the output image. After an inverse Fourier transform of G we have the viewable
output image g.
This would be the appropriate point in this course to interrupt the discussion of filtering and
inserting a “tour d’horizon” of the Fourier-transform. We will not do this in this class, and
reserve that discussion for a later course as part of the specialization track in “image processing”.
However, a Fourier-transform of an image is but one of several transforms in image processing.
There are others, such as the Hadamard-transform, a Cosine-transformation, Walsh-transforms
and similar. Of interest now is the question of filtering in the spatial domain, representing a
convolution, or in the spectral domain representing a multiplication. At this point we only state
that with large filter masks at sizes greater than 15 x 15 pixels, it may be more efficient to use
the spectral representation. We do have the cost of 3 Fourier transforms (note: f ??? F, h ??? H,
G ??? G), but the actual convolution is being replaced by a simple multiplication of F???H.
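A sketch of this route through the frequency domain, assuming numpy is available (it is not part of the lecture materials); img and mask are equally sized 2-D float arrays, the filter kernel being zero-padded to the image size, so the multiplication corresponds to a circular convolution:

    import numpy as np

    def filter_in_frequency_domain(img, mask):
        F = np.fft.fft2(img)               # f -> F
        H = np.fft.fft2(mask)              # h -> H
        G = F * H                          # convolution becomes a multiplication
        return np.real(np.fft.ifft2(G))    # G -> g, the viewable output image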
Slide 7.25 now introduces certain filter windows and their representation both in the spatial and
spectral domains. It presents one-dimensional filter functions. We are, therefore, looking at a row
of pixels in the spatial domain or a row of pixels in the spectral domain through the center of the
2D function. The 2D functions themselves are rotationally symmetric.
A typical low-pass filter in the spatial domain will have a Gaussian shape. Its representation in
the spatial domain is similar to its representation in the spectral domain. In the spectral domain it
is evident that the filter rapidly approaches a zero value, therefore suppressing higher frequencies.
In the spectral domain a high-pass filter has a large value as frequencies increase and is zero at
low frequencies. Presented in the spatial domain, such a high-pass filter looks like the so-called
Mexican hat. A band-pass filter in two dimensions is a ring with a “donut shape”, and in the
one-dimensional case it is a Gaussian curve that is displaced with respect to the origin. In
the spatial domain the band-pass filter shape is also similar to a “Mexican hat”. However, in the
high-pass filter the values are negative outside the central area, whereas in the band-pass filter
the shape first goes negative and then positive again.
Exam questions:
• Describe, with the help of a sketch, the “appearance” of the following filter types in the frequency domain:
1. low-pass filter
2. high-pass filter
3. band-pass filter
7.4 High-Pass Filter: Sharpening Filters
We now are ready to visit the effect of a high-pass filter. In the spatial domain, the shape of the
high pass-filter was presented in Slide 7.26. In actual numerical values such a filter is shown in
Slide 7.28. The filter window is normalized such that the sum of all values equals zero. Note that
we have a high positive value in the center and negative values at the edge of the window. The
pixel at the center of the window in the image will be emphasized and the effect of neighbouring
pixels reduced. Therefore small details will be accentuated. Background will be suppressed. It is
as if we only had the high-frequency detail left and the low frequency variations disappear. The
reason is obvious: in areas where pixel values do not change very much, the output gray values
will become 0, because there are no differences among the gray values. The input pixels are
replaced by the value 0 because we subtract from each gray value the average value of the
surrounding pixels.
This high-pass filter can be used to emphasize (highlight) the geometric detail. But if we do
not want to suppress the background, as we have seen in the pure high pass-filter, we need to
re-introduce it. This leads to a particular type of filter that is popular in the graphic arts: the
unsharp masking or USM. The high pass-filtered image really is the difference between an original
image and a low-pass version of the image, so that only the high frequency content survives. In
the USM we would like to have the high-pass-version of the image augmented with the original
image. We obtain this by means of a high pass-filter version of the image and adding to it the
original image, however multiplied by a factor A − 1, where A > 1. If A = 1 we have a standard
high pass-filter. As we increase A we add more and more of the original image back. The effect
is shown in Slide 7.30. In that slide we have a 3 by 3 filter window and the factor A is shown
variably as being 1.1, 1.15, and 1.2. As A increases, the original image gets more and more added
back in, to a point where we get overwhelmed by the amount of very noisy detail.
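A compact Python sketch of unsharp masking, self-contained under the usual assumptions (flat row-major gray value list, border pixels left unchanged, a simple 3 × 3 box mean as the low-pass part):

    def unsharp_mask(img, w, h, A=1.2):
        # USM output = high-pass part + (A - 1) * original = A * original - low-pass
        out = img[:]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                low = sum(img[(y + dy) * w + (x + dx)]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
                out[y * w + x] = A * img[y * w + x] - low
        return out

With A = 1 this reduces to a plain high-pass filter, as stated above; increasing A adds more of the original image back in.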
Exam questions:
• Given a filter mask according to Figure ??, what kind of filter is this?
• Given an image according to Figure ??, what are the resulting pixels in the output image at the three marked locations after applying the filter mask of Figure ???
• One of the most popular filters is called “unsharp masking” (USM). How does it work? Please give a simple explanation in terms of a formula.
• Figure B.61 shows a digital raster image that, because of a superimposed disturbance, is brighter in the center than at the edges. State a procedure that removes this disturbance.
• The photograph shown in Figure B.66 has low contrast and therefore looks somewhat “flat”.
1. State a procedure that improves the contrast of the image.
2. What other possibilities are there to improve the quality of the image as perceived by a human?
Is the information content of the image also increased by these methods? Justify your answer.
• Enter filter coefficients into the empty filter masks of Figure B.30 such that
1. Figure B.30(a) becomes a low-pass filter that leaves the DC component of the image signal unchanged,
2. Figure B.30(b) becomes a high-pass filter that completely suppresses the DC component of the image signal.
Answer: see Figure 7.3
Figure 7.3: Low-pass and high-pass filter masks: (a) low-pass, (b) high-pass
7.5 The Derivative Filter
A very basic image processing function is the creation of a so called edge-image. Recall that we
had one definition of “edge” that related to the binary image early on in this class (Chapter 1).
That definition of an edge will now be revisited and we will learn about a second definition of an
edge.
Let us first define what a gradient image is. We apply a gradient operator to the image function
f (x, y). The gradient of f (x, y) is shown in Slide 7.32, denoted as ∇ (Nabla). A gradient is thus a
multidimensional entity; in a two dimensional image we obtain a two dimensional gradient vector
with a length and a direction. We now have to associate with each location x, y in the image these
two entities. The length of the gradient vector is of course the Pythagorean sum of its elements,
namely of the derivatives of the gray-value function with respect to x and y. We typically use
the magnitude of the gradient vector and ignore its direction. However, this is not true in every
instance.
We are not dealing with continuous tone images but with discrete renditions in the form of pixels
and discrete matrices of numbers. We can approximate the computation of the gradient by means
of a three by three matrix, as the slide explains. The 3 × 3 matrix has nine values z1 , z2 , . . . , z9 .
We approximate the derivative by means of a first difference, namely z5 − z8 , z5 − z6 , and so forth.
The magnitude of the gradient function is approximated by the expression shown in Slide 7.34.
We thus define a way of computing the gradient in a discrete, sampled digital image avoiding
squares and square roots. We can even further simplify the approximation as shown in Slide 7.34,
namely as the sum of the absolute values of the differences between pixel gray values. We can
also use gradient approximations by means of cross-differences, thus not by horizontal and vertical
differences along rows and columns of pixels in the window.
Some of these approximations are associated with their inventors. The gradient operator
∇f ≈ |z5 − z9 | + |z6 − z8 |
is named after Roberts. Prewitt's approximation is a little more complicated:
∇f ≈ |(z7 + z8 + z9 ) − (z1 + z2 + z3 )| + |(z3 + z6 + z9 ) − (z1 + z4 + z7 )|
Slide 7.36 shows how the computation is implemented. Two filter functions are sequentially
applied to the input image, and the two resulting output images are added up. The Roberts
operator works with two windows of dimensions 2 × 2. Prewitt uses two windows of
dimensions 3 × 3, and a third gradient approximation by Sobel also uses two 3 × 3 windows.
Let's take a look at an example: Slide 7.37 shows a military fighter plane and the gradient image
derived from it using the Prewitt operator. These gradient images can then be post-processed
by e.g. removing the background details, simply by reassigning gray values above a certain level
to zero or one, or by assigning gradients of a certain value to a particular colour such as white or
black. This will produce from an original image the contours of its objects, as seen in Slide 7.37.
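A minimal Python sketch of the Roberts gradient approximation, assuming a flat, row-major list of gray values:

    def roberts(img, w, h):
        # |grad f| ~ |z5 - z9| + |z6 - z8|, with z5 at (x, y) and z9 at (x+1, y+1)
        out = [0] * (w * h)
        for y in range(h - 1):
            for x in range(w - 1):
                z5 = img[y * w + x]
                z6 = img[y * w + x + 1]
                z8 = img[(y + 1) * w + x]
                z9 = img[(y + 1) * w + x + 1]
                out[y * w + x] = abs(z5 - z9) + abs(z6 - z8)
        return out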
We call the resulting image, after a gradient operator has been applied, an edge image. However,
in reality we do not have any edges yet. We still have a gray-tone image that visually appears like
an image of edges and contours. To convert this truly into an edge image we need to threshold the
gradient image so that only the highest-valued pixels get the value one, while all lower-valued pixels
are set to 0 (black) and are called “background”.
This means that we have produced a binary image where the contours and edgy objects are marked
as binary elements. We now need to remove the noise, for example in the form of single pixels,
using a morphological filter. We also have to link up the individual edge pixels along the
contours so that we obtain contour lines. Linking up these edges is an operation that has to do
with “neighbourhoods” (Chapter 1); we also need to obtain skeletons and connected sequences of
pixels as discussed previously (Chapter 3).
Exam questions:
• Define the Sobel operator and apply it to the pixels inside the bold-framed region of the gray value image shown in Figure B.13. You may enter the result directly in Figure B.13.
Figure 7.4: Application of the Roberts operator (numeric gray value example)
• For the digital raster image in Figure B.21, the gradient image is to be found. State a suitable operator and apply it to the pixels inside the bold-framed rectangle. You may enter the result directly in Figure B.21. In addition, show the calculation for one of the pixels.
• Apply the Roberts operator for edge detection to the bold-framed region in Figure B.34. You may enter the result directly in Figure B.34.
Answer: See Figure 7.4
7.6 Filtering in the Spectral Domain / Frequency Domain
We define a filter function H(u, v) in the spectral domain as a rectangular function, a so-called
box function. Multiplying the Fourier transform of an image by H(u, v) produces the spectral
representation of the final image G(u, v) as the product of H and F. The transfer function H shown
in Slide 7.39 has the value 1 from the origin out to a value D0, and we assume that H is rotationally
symmetric. In the frequency domain the value D0 is denoted the cutoff frequency. Any frequency
beyond D0 will not be permitted through the filter function.
Let us take a look at how this works. In Slide 7.40 we have the image of the head of a bee; of
course, in the spectral domain we would not be able to judge what the image shows. We create a
spectral representation by applying a Fourier transform to the image, and we can now define circles
in the spectral representation, centered at the origin of the spectral domain, with radii that contain
90%, 93% or more of the image frequencies, also denoted as the “energies”. If we apply a filter
function H as shown before, which lets only the frequencies within 90% of the energy pass through,
and then transform the resulting function G from the spectral back into the spatial domain to
obtain an image g, we obtain a blurred version of the original image. As we let more frequencies go
through, the blur becomes less and less. What we have obtained is a series of low-pass filtered
images of the head of the bee, and we have also indicated how much of the image content we have
filtered out and how much we have let go through the low-pass filter.
If we transform the function H from the spectral domain into the spatial domain, we obtain Slide
7.41. If we apply this filter function to an image that contains nothing but 2 white points, we
will obtain an image g that appears corrupted, presenting us with a ghost image. We should
therefore be careful with that type of box filter (in the spectral domain). The ghost images of
high-contrast content in our input image will be disturbing. It is advisable not to use such a box
filter, which is sometimes also called an ideal filter, and to use an approximation instead. We
introduce the Butterworth filter, which is represented in the spectral domain by the curve shown
in the slide:

H(u, v) = 1 / (1 + (D(u, v)/D0)^(2n))
In two dimensions this is a volcano-like shape. Applying that type of filter to the bee produces
a series of low-pass filtered images without ghost images. A direct example of the difference between
applying the box filter and the Butterworth filter is shown in the slides.
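For comparison, both transfer functions can be written down directly in Python; D is the distance of a frequency point (u, v) from the origin of the spectral plane:

    def ideal_lowpass(D, D0):
        # the box ("ideal") filter: 1 inside the cutoff frequency, 0 outside
        return 1.0 if D <= D0 else 0.0

    def butterworth_lowpass(D, D0, n=2):
        return 1.0 / (1.0 + (D / D0) ** (2 * n))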
Of course this entire discussion of spectral and spatial domains and of convolutions and filters
requires space, time and effort and is related to a discussion of the Fourier transform, and
the effect of these transforms and of the suppression of certain frequencies on the appearance of
functions. Typically throughout an engineering program the signals are mostly one-dimensional,
whereas in image processing the typical signals are two-dimensional. A quick view of Fourier
transforms of certain functions illustrates some of what one needs to be aware of. Slide 7.47
presents one function F (u) in the spectral domain in the form of a rectangular pulse. Its transform
into the spatial domain gives us the sinc-function, as previously discussed as f (x) = sin(πx)/(πx).
Now if we cut off the extremities of f (x) and then transform that function back into the spectral
space we obtain a so-called ringing of the function. Giving up therefore certain frequencies in the
spectral domain can lead to a certain noisiness of the signal in the other domain.
Exam questions:
• Give the transfer function H(u, v) in the frequency domain of an ideal low-pass filter with cutoff frequency D0. Sketch the transfer function.
7.7 Improving Noisy Images
There are many uses of filters. We have already seen the use of filters to enhance edges, and
pointed out that filters transform individual pixels. We may use filters also to remove problems
in images. Let us assume that we have compressed an image from 8 bits to 4 bits and therefore
have reduced the number of available gray values to 16. We have an example in Slide 7.49 where
the low number of gray values creates artefacts in the image in the form of gray value contours.
By applying a low-pass filter we can suppress the unpleasant appearance of these false density
contours. Another example, also in Slide 7.49, is an image with some corruption by noise. A
low-pass filter will produce a new image that is smoother and therefore more pleasant to look at.
Finally, we want to revisit the relationship between “filter” and “sampling”. Slide 7.51 illustrates
again the monkey face: smoothing an image by a low-pass filter may be equivalent to sampling the
image and then reconstructing it from the samples.
Prüfungsfragen:
• There is an analogy between the application of a filter and the reconstruction of a
discretized image function. Explain this statement!
7.8 The Ideal and the Butterworth High-Pass Filter
For visual inspection we may want to use high-pass filters, because our eye likes crisp, sharp edges
and a high level of energy in an image. Slide 7.53 introduces such a high-pass filter in the
spectral domain. The ideal high-pass filter lets all high frequencies go through and suppresses
all low frequencies. This "ideal" filter has the same problems as we have seen in the low-pass
case. Therefore we may prefer the Butterworth high-pass filter, which is not a box but a monotonic
function. Of course, in the 2-dimensional domain the ideal and Butterworth high-pass filters
appear like a brick with a hole in it. The high-pass filter is applied to enhance the contrast
and bring out the fine detail of the object, as shown in the corresponding slide. The high-pass filter improves the
appearance of the image and suppresses the background. If we add the original image to its high-pass filtered version, we again arrive at a type of "emphasis filter" that we have seen earlier under
the name "unsharp masking" (USM). The resulting image can be further processed with histogram
equalization for optimum visual inspection.
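A minimal sketch of the emphasis filtering ("unsharp masking") just described, reusing the mean_filter helper from the earlier sketch; the weighting parameter is an assumption for illustration:

    def unsharp_mask(image, amount=1.0, size=3):
        """Add the high-pass component of the image back to the original
        ("unsharp masking" / emphasis filtering)."""
        lowpass = mean_filter(image, size)    # smoothed (low-pass) version
        highpass = image - lowpass            # fine detail and edges
        return image + amount * highpass      # sharpened result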
Again, high-pass filters can be studied in both the spatial and the spectral domains. The sinc
function in the spectral domain corresponds in the spatial domain to a box function, a
pulse. The sinc² function corresponds to a triangular function in the spatial domain. And a Gaussian function
will remain a Gaussian function both in the spectral and in the spatial domains.
Prüfungsfragen:
• Sketch the transfer functions of an ideal and of a Butterworth high-pass filter and compare the advantages and disadvantages of both filter types!
7.9 Anti-Aliasing

7.9.1 What is Aliasing?
Recall the rasterization or scan-conversion of straight lines and curves, and the resulting aliasing.
Suppose we work with a trigonometric function of some sort. This function is being sampled
at certain widely spaced intervals. Reconstruction of the function from the samples will produce
a particular function that is not really there. What is shown in Slide 7.58 is a high-frequency
function, whereas the samples describe a low-frequency sine curve. We denote this falsification of
the original function into one of a different frequency as "aliasing". This type of aliasing is a
widely studied subject of sampling theory and signal processing and is not particular to image
processing or graphics.
Aliasing is a result of our need to sample continuous functions, both in the creation of images and
in the creation of visualizations of objects in computer graphics.
7.9.2 Aliasing by Cutting off High Frequencies
The issue is explained further with an excursion into sampling theory. We have an input image f(x) in
the spatial domain that needs to be sampled. As we go into the spectral domain we cannot use all
frequencies: we cut them off at w and lose all frequencies outside the interval −w ≤ u ≤ w.
Let us now define a sampling function in the spatial domain as s(x), consisting of a series of Dirac
functions at an interval ∆x. The multiplication of f(x) with s(x) produces the sampled function in
the spatial domain. As we go into the spectral domain we also obtain a set of discrete frequencies
S(u) at 1/∆x, 2/∆x, . . .
If we now convolve (in the spectral domain) the sampling function S(u) with the original function
F(u), we get the spectral view of the sampled function f(x) · s(x). We see the original function
F(u) repeated at locations −1/∆x, +1/∆x, . . . Transforming this back into the spatial domain
produces samples from which the original function f(x) can only be incompletely reconstructed.
What now is the effect of changing ∆x? If we make it smaller, we get a more accurate sampling
of the input function. We see in the spectral domain that the repetitions
of the original function F(u) in F(u) ∗ S(u) are spaced apart at wider intervals 1/∆x as ∆x
gets smaller. Slide 7.62 illustrates that we could isolate the spectrum of our function f(x) by
multiplying F(u) ∗ S(u) by a box filter G(u), producing F(u), so that we can fully reconstruct f(x)
from the samples. If w is the highest frequency in our function f(x) or F(u), then we have no
loss if the sampling interval ∆x is no larger than 1/(2w):

\[ \Delta x \le \frac{1}{2w} \qquad \text{(Whittaker-Shannon theorem)} \]

Conversely, given a sampling interval ∆x, we call the cut-off frequency w = 1/(2∆x) that is still fully represented by this sampling the Nyquist frequency.
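A minimal numerical sketch (in Python, assuming NumPy) of the aliasing effect: a 9 Hz sine sampled at only 10 samples per second yields exactly the same samples as a 1 Hz sine, because 9 Hz lies above the Nyquist frequency of 5 Hz. The numbers are chosen only for illustration.

    import numpy as np

    fs = 10.0                           # sampling rate: 10 samples per second
    t = np.arange(0.0, 2.0, 1.0 / fs)   # sample positions over two seconds
    f_true = 9.0                        # true frequency, above the Nyquist frequency fs/2 = 5 Hz
    samples = np.sin(2 * np.pi * f_true * t)
    # The identical samples are produced by the alias at f_true - fs = -1 Hz:
    alias = np.sin(2 * np.pi * (f_true - fs) * t)
    print(np.allclose(samples, alias))  # True: a reconstruction would show a 1 Hz sine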
7.9.3 Overcoming Aliasing with an Unweighted Area Approach
Of course the implementation is again made as efficient as possible, avoiding multiplications and divisions
and replacing them by simpler operations. The approach is shown in Slide 7.63. Aliasing occurs if ∆x
violates the Whittaker-Shannon theorem. Anti-Aliasing by means of a low-pass filter occurs
in the rasterization or scan conversion of geometric elements in computer graphics. We have
discussed this effect in the context of scan conversion by means of the Bresenham-approach.
Slide 7.63 explains another view of the issue, using the scan-conversion of a straight line. We can
assign grey values to those pixels that are being touched by the area representing the “thin-line”.
This would produce a different approach from Bresenham because we are not starting out from
a binary decision that certain pixels are in, all others are out: We instead select pixels that are
“touched” by the straight line, and assign a brightness proportional to the area that the overlap
takes up.
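A minimal sketch (in Python, assuming NumPy) of unweighted area sampling for a thick line; the overlap area of the line with each pixel is approximated here by sub-pixel sampling rather than computed exactly, and the efficient incremental formulation mentioned above is deliberately not attempted:

    import numpy as np

    def unweighted_area_line(width, height, p0, p1, thickness=1.0, sub=4):
        """Render a line from p0 to p1 so that each pixel's gray value is
        proportional to the fraction of its area covered by the line,
        estimated with sub x sub sample points per pixel."""
        img = np.zeros((height, width))
        (x0, y0), (x1, y1) = p0, p1
        dx, dy = x1 - x0, y1 - y0
        length2 = dx * dx + dy * dy
        offsets = (np.arange(sub) + 0.5) / sub          # sub-pixel sample positions
        for py in range(height):
            for px in range(width):
                hits = 0
                for oy in offsets:
                    for ox in offsets:
                        sx, sy = px + ox, py + oy
                        t = ((sx - x0) * dx + (sy - y0) * dy) / length2
                        t = min(max(t, 0.0), 1.0)       # clamp to the segment
                        d = np.hypot(sx - (x0 + t * dx), sy - (y0 + t * dy))
                        if d <= thickness / 2:
                            hits += 1
                img[py, px] = hits / (sub * sub)        # coverage fraction in [0, 1]
        return img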
7.9.4 Overcoming Aliasing with a Weighted Area Approach
Algorithm 21 Weighted Antialiasing

set currentX to the x-value of the start of the line
while currentX smaller than the x-value of the end of the line do
    apply Bresenham's line algorithm to get the appropriate currentY value
    consider three cones (each with a diameter of 2 pixels and volume normalized to 1)
        erected over the grid positions (currentX, currentY + 1), (currentX, currentY)
        and (currentX, currentY − 1)
    for all cones do
        determine the intersection of the cone's base with the line
        calculate the volume above the intersection
        multiply the obtained volume with the desired gray value
        take the result and set it as the pixel's gray value
    end for
    increase currentX
end while
In weighted area sampling we also decrease a pixel's brightness as it has less overlap with the area
of the "thin line". But not all overlap areas are treated equally: we introduce a "distance" of the
overlap area from the center of a pixel. With this basic idea in mind we can revisit the unweighted
area sampling, which treats all overlap areas equally and thereby implements a "box filter": each
overlap area is multiplied with the same value, represented as the height of the box, which is normalized to 1.
In a weighted area sampling approach, by contrast, the "base" of the filter (its support) is circular
and larger than a pixel, typically with a diameter of twice the pixel's side length. The height of the
filter is chosen such that its volume is 1.
A further slide illustrates the effect that a small triangle would have on pixels as it moves across an image.
The triangle is smaller than a pixel.
Getting Antialiased Lines by Means of the Gupta-Sproull approach.
Algorithm 22 Gupta-Sproull Antialiasing

dx := x2 − x1;
dy := y2 − y1;
d := 2 ∗ dy − dx;
incrE := 2 ∗ dy;
incrNE := 2 ∗ (dy − dx);
two_v_dx := 0;
invDenom := 1 / (2 ∗ Sqrt(dx ∗ dx + dy ∗ dy));
two_dx_invDenom := 2 ∗ dx ∗ invDenom;
x := x1;
y := y1;
IntensifyPixel(x, y, 0);
IntensifyPixel(x, y + 1, two_dx_invDenom);
IntensifyPixel(x, y − 1, two_dx_invDenom);
while x < x2 do
    if d < 0 then
        two_v_dx := d + dx;
        d := d + incrE;
        x := x + 1;
    else
        two_v_dx := d − dx;
        d := d + incrNE;
        x := x + 1;
        y := y + 1;
    end if
    IntensifyPixel(x, y, two_v_dx ∗ invDenom);
    IntensifyPixel(x, y + 1, two_dx_invDenom − two_v_dx ∗ invDenom);
    IntensifyPixel(x, y − 1, two_dx_invDenom + two_v_dx ∗ invDenom);
end while

Here IntensifyPixel(x, y, distance) looks up the filter value for the given distance and writes the pixel:

intensity := Filter(Round(Abs(distance)));
WritePixel(x, y, intensity);
Using the weighted area method, we can pre-compute a table of filter values for lines at different distances from
a pixel's center. A line will typically intersect those cones centered on three pixels, as shown in
Slide 7.69, but it may also intersect only 2 or at most 5 such cones. The look-up table is filled
with values computed as a function F(D, t) of two variables: t, the line's thickness, and D, the
distance from a pixel center. Gupta and Sproull, two early
pioneers of computer graphics, introduced the table look-up for a 4-bit display device. Only 16
values of D are needed, since a 4-bit display only has 16 different gray values. The Bresenham
method (the midpoint line algorithm) now needs to be modified to not only decide on the E or
NE pixel, but also to assign a grey value. Moreover, we set a grey value not only for the
single pixel at E or NE, but also for its two neighbours above and below.
Slide 7.70 illustrates how the distance D is computed using simple trigonometry:

\[ D = \frac{v \, dx}{\sqrt{dx^2 + dy^2}} \]

And we need two additional distances, D_above and D_below:

\[ D_{\text{above}} = \frac{(1 - v)\, dx}{\sqrt{dx^2 + dy^2}} \qquad (7.1) \]

\[ D_{\text{below}} = \frac{(1 + v)\, dx}{\sqrt{dx^2 + dy^2}} \qquad (7.2) \]
Prüfungsfragen:
• Explain under which circumstances "aliasing" occurs and what can be done against it!
• In Figure B.72 you see a perspectively distorted, checkerboard-like pattern. Explain how the artefacts at the upper edge of the image come about, and describe one way to prevent their occurrence!
Chapter 8
Texture
8.1 Description
Texture is an important subject in the analysis of natural images of our environment and in the
computer generation of images if we want to achieve photo-realism. Slide 8.3 illustrates three
different sets of textures: the first may be of pebbles on the ground, the second of a quarry for
mining stone, and the third is a texture of fabric. We can describe texture (a) pictorially by means
of a photograph of the surface, or (b) by a set of mathematical methods, which may be statistical,
structural or spectral. And finally we will present a procedural approach to modeling and using
texture.
Prüfungsfragen:
• Name three kinds of texture description and give an example for each.
8.2 A Statistical Description of Texture
Recall the image function z = f(x, y) with the image gray values z. We can compute so-called
moments of the image gray values as shown in Slide 8.7. The moments are denoted as µn(z).
The first moment µ1(z) is the mean of the gray values. The second moment µ2(z) represents
the variance of the gray values with respect to the mean (see Definition 8.2). The moments involve the
probability of the gray value, p(z). In a discrete context this probability is represented by the histogram
of the gray values. Obviously, if a gray value is very unlikely to occur, its column in the histogram
will be very low or empty.
A measure of texture can be a function of these moments. A very simple one is the value R.
If there is no variation in gray value, then the variance σ², the standard deviation σ and the second
moment µ2(z) are 0 or close to 0. In that case the value of R is 0 as well. R therefore represents a
measure of the smoothness of the image. We can associate a separate value R with each pixel i
by computing it for a window around that pixel i.
There are other statistical measures of texture, for example associated with the “edginess” of
an area. In this case we would produce an edge value associated with each pixel, for example
representing the number, direction and strength of the edges in small windows surrounding a
pixel. Nominally we obtain a different texture parameter at each pixel. However, we are looking
to describe an extended image by regions of similar texture. Therefore we will classify the texture
parameters into a few groups. We may create an equidensity image as discussed previously.
\[ \mu_n(z) = \sum_i \big[ (z_i - m)^n \cdot p(z_i) \big] \]

z ... the gray-value image, zi the gray value of the i-th pixel in the image
m ... mean value of z (average intensity)
µn ... n-th moment of z about the mean

\[ R = 1 - \frac{1}{1 + \sigma^2(z)} \]

R ... a measure of the relative smoothness
σ² ... variance
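As an illustration only, a minimal sketch (in Python, assuming NumPy and an 8-bit gray-value window) of computing the histogram-based moments and the smoothness measure R defined above; function and variable names are our own:

    import numpy as np

    def texture_moments(window, levels=256):
        """Compute the gray-value histogram p(z), the mean m, the central
        moments mu_n and the smoothness measure R for an image window."""
        hist, _ = np.histogram(window, bins=levels, range=(0, levels))
        p = hist / hist.sum()                  # p(z_i): relative frequency of gray value z_i
        z = np.arange(levels)
        m = float(np.sum(z * p))               # mean gray value (average intensity)
        mu = {n: float(np.sum((z - m) ** n * p)) for n in (2, 3, 4)}
        R = 1.0 - 1.0 / (1.0 + mu[2])          # R = 1 - 1/(1 + sigma^2): 0 for a smooth window
        return m, mu, R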
If we do that, we might be able to describe a quarry with only two texture parameters, as shown
in Slide 8.8. While the quarry itself has been delineated manually by a human operator, a
texture parameter is computed within this delineated polygon, and the equidensity method applied
to that texture parameter will define two different textures in this quarry.
A very frequently used texture measure is the so-called co-occurrence matrix, which seeks to
describe the occurrence of similar patterns in an image. We do not discuss it here beyond
mentioning the name.
Prüfungsfragen:
• Which statistical properties can be used to describe texture? Explain the meaning of these properties in the context of texture images!
8.3 Structural Methods of Describing Texture
In order to understand the concept of a structural texture description we refer to Slide 8.11. We
define a rule that replaces a small window in an image by a pattern; for example, we replace a
window by a pattern "aS", where "a" may represent a circle. If we now apply the same operation
multiple times, we get an arrangement of repetitive patterns located adjacent to one another
in a row-and-column pattern, "a a a S". We might denote the neighbourhood relationships between
adjacent areas by different symbols; in Slide 8.12, for example, one symbol places a pattern below the current location and "c" places it to the left. We can
now describe a certain pattern by a certain sequence of "a", "b" and "c" operations. A texture
primitive, shown here as a circle, could be any other kind of pattern. We set up our
texture by repeating the pattern. Note again that at this point we are concerned with describing
texture as we find it in a natural image. We are not, at this time, generating a texture for an
object that we want to visualise.
Prüfungsfragen:
• Explain the structural method of texture description!
8.4 Spectral Representation of Texture
We have previously discussed how an image can be represented in a computer: in
the spatial domain we use the rows and columns of pixels, in the spectral domain we use
the frequencies. The description of texture using the spectral representation of an image is
therefore described next. Slide 8.14 illustrates a typical texture pattern. Its spectral representation
illustrates that there are distinct patterns that are repeated in the image. These are illustrated
by dominant frequencies in the image.
We call the two-dimensional function in the spectral domain the spectral function s(r, j), where r
is the radius of a spectral location from the origin and j is the angle from the x-axis in a
counter-clockwise direction. Any location in the spectral representation of the image therefore has
the coordinates (r, j). Slide 8.15 explains this further. We can simplify the spectral representation
of the image into two functions: one is a plot of s along the angle j for a given radius r, and the
other is a plot of s along the radius r for a given value of j. Slide 8.16 illustrates two different patterns
of textures and the manifestation of those patterns in the j-curve. A texture parameter can now
be extracted from the spectral representation, for example by counting the number of peaks or
the average distance between the peaks in the spectral domain.
We can also set up a texture vector with several values that consider the number of peaks as a
function of the radius r. The aim is to associate with a pixel or a window in the image a simple
number or vector that is indicative of the type of texture one finds there. We therefore have
here a case of classification, where we could take a set of known textures and create from those a
feature space in two or more dimensions (see Chapter 14). If we now have an unknown texture,
we might try to describe it in terms of the known textures, using the feature space and looking
for the nearest known texture given the texture numbers of the unknown texture. In
this manner we can replace an input image by a texture image which indicates at each location
the kind of texture which exists there. In classifying areas of similar texture as one area, we will
replace a large number of pixels by a small number of textures and a description of the contour
of each area of uniform texture.
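A minimal sketch (in Python, assuming NumPy) of deriving the two one-dimensional curves from the centered magnitude spectrum of an image window, summed over angles and over radii respectively; peak counts or peak spacings along these curves could then serve as texture parameters. The binning choices are assumptions for illustration.

    import numpy as np

    def spectral_texture_curves(window, n_r=32, n_phi=36):
        """Return s(r), summed over all angles, and s(phi), summed over all
        radii, from the centered Fourier magnitude spectrum of a window."""
        S = np.abs(np.fft.fftshift(np.fft.fft2(window)))
        h, w = S.shape
        y, x = np.indices(S.shape)
        cy, cx = h // 2, w // 2
        r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2)
        phi = np.degrees(np.arctan2(y - cy, x - cx)) % 180     # direction, 0..180 degrees
        r_bins = np.minimum((r / r.max() * n_r).astype(int), n_r - 1)
        phi_bins = np.minimum((phi / 180 * n_phi).astype(int), n_phi - 1)
        s_r = np.bincount(r_bins.ravel(), weights=S.ravel(), minlength=n_r)
        s_phi = np.bincount(phi_bins.ravel(), weights=S.ravel(), minlength=n_phi)
        return s_r, s_phi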
Prüfungsfragen:
• What properties does a (regularly repeating) texture exhibit in the spectral domain? What statements can be made about a texture on the basis of its spectrum?
• The digital raster image of Figure B.71 is to be segmented, with the two buildings forming the foreground and the sky the background. Since the histograms of foreground and background overlap strongly, a simple gray-value segmentation cannot succeed here. Which other image properties can be used to nevertheless distinguish foreground and background in Figure B.71?
8.5 Texture Applied to Visualisation
To achieve photorealism in the visualisation of two- or three-dimensional objects we employ descriptions of texture rather than texture itself. We may apply artificial texture, also denoted as
synthetic texture and place this on the geometric polygons describing the surface shape of an
object. The texture itself may consist of texture elements which are also denoted as texels. Slide
8.19 is an example of some simple objects showing a wire-frame rendering of an indoor scene and
illustrates how unrealistic this type of representation appears. Slide 8.20 is a result of placing
photographic texture on top of the objects. We obtain a photorealistic representation of those
objects. Basic concepts in Slide 8.21 are illustrated by various examples of a two-dimensional flag,
a three-dimensional indoor-scene, a two dimensional representation of symbols, phototexture of
wood, texture of some hand-written material.
How is this photographic texture applied to a geometrically complex object? This is illustrated
in Slide 8.22; see also Algorithm 23. We deal with three different coordinate systems. First, we have the
representation on a monitor or display medium, and a window on this display contains a segment
of a three-dimensional object which is represented in a world coordinate system.
Algorithm 23 Texture mapping

surround the object with a virtual cylinder
for all pixels of the texture do
    make a coordinate transformation from Cartesian to cylindrical coordinates
        {to wrap the texture onto the cylinder's surface}
end for
for all points of the object do
    project the point perpendicularly from the midpoint of the cylinder to the cylinder's surface
    where the projection cuts the edge of the object, assign the object point the color of the
        corresponding cylinder point
end for
The surface of this object needs to be photo-textured, and it receives that photo-texture from a texture map with
its own, third coordinate system. Essentially we are projecting the texture map onto the curved surface
of the object and then render the curved surface on the display medium, using a transformation
that results from a synthetic camera with a pose consisting of position and angular
orientation.
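A minimal sketch (in Python, assuming NumPy) of the cylindrical mapping idea behind Algorithm 23: a 3D surface point is assigned texture coordinates (u, v) by radial projection onto a virtual cylinder around the object's vertical axis. The parametrization chosen here is an assumption for illustration.

    import numpy as np

    def cylindrical_uv(point, center, height):
        """Map a 3D surface point to texture coordinates (u, v) in [0, 1]^2
        by radial projection onto a vertical cylinder of the given height
        centered at 'center'."""
        x, y, z = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
        u = (np.arctan2(y, x) + np.pi) / (2 * np.pi)   # angle around the axis -> u
        v = np.clip(z / height + 0.5, 0.0, 1.0)        # position along the axis -> v
        return u, v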
Prüfungsfragen:
• Explain how, in visualisation, the quality of a computer-generated image can be improved by the use of textures. Name some surface properties (in particular geometric ones) that are not suitable for representation by means of a texture.
• Exercise B.1 asked for geometric surface properties that are not suitable for visualisation by means of texture. Assume that a texture were improperly used to represent such properties. Which artefacts are typical in such cases?
8.6 Bump Mapping
In order to provide a realistic appearance to a surface which is not smooth but bumpy, there exists
a concept called bump mapping. It applies a two-dimensional texture to a three-dimensional
object and makes the two-dimensional texture appear as if it were three-dimensional. Slide 8.24
and a further slide explain the concept with a donut and a strawberry. Note that the texture really is
two-dimensional: the impression of a third dimension is introduced by a 2D picture of shadow and detail
that is not available in the third dimension. This is visible at the contours of the object, where
the bumps of the texture are not reflected in the geometry of the object. In this case we do not
apply the photographic texture we used in the previous section, but we deal with a computed
texture.
Prüfungsfragen:
• What does the term "bump mapping" describe?
• Figure B.77 shows a torus with a structured surface, with the light source once to the left (Figure B.77(a)) and once to the right (Figure B.77(b)) of the object. For clarity, Figures B.77(c) and B.77(d) show enlarged details. Which technique was used to visualise the surface structure, and what are the typical properties by which the method can be recognized here?
8.7 3D Texture
Another concept of texture is three-dimensional. In this case we do not texture a surface but an
entire three-dimensional body. An example is shown in Slide 8.27, where the surface appearance results from
the intersection of the three-dimensional texture body with the surface geometry.
Prüfungsfragen:
• What is a "3D texture"?
8.8 A Review of Texture Concepts by Example
Slide 8.29 illustrates, with an example from an animated movie, the complexities of applying photorealistic textures to three-dimensional objects. We begin with the basic shapes of the geometric entity
and apply to them some basic colours. We superimpose on these colours an environment map. This
is again modified by a bump map and the appropriate illumination effect. The intermediate result
is shown in Slide 8.31, with dirt specks added for additional realism. Slide 8.32 adds further details:
we create a near-photographic texture by adding more colour, the effect
of water droplets, and mirror-like and specular reflections.
We should not be surprised that the creation of such animated scenes consumes considerable computing
power and therefore takes time to complete. The final result is shown in Slide 8.33.
8.9 Modeling Texture: Procedural Approach
As previously discussed, we process natural images to find a model of texture, and we use those
models to create images. Slide 8.35 details the method of analysing existing texture. We have a
real scene of an environment, and we understand from the image model that the intensity in
the image is a function of the material property fr and the illumination Ei. The material property is
unknown and needs to be determined from the raster image; the illumination is known. We estimate
the model parameters for the material property and use them to approximate objects. The photo-texture
synthesis then uses this model together with a virtual scene and its material and illumination properties to
compute the intensity per pixel and thereby obtain a synthetic image. An issue now is to find a
method of modeling an unknown texture by simple curves.
Slide 8.36 explains how a reference surface, a light source, a camera and a texture to be analysed
can be set up into a sensor system. The resulting image is illustrated in Slide 8.37 with the known
reference surface and the unknown texture.
We need to have the reference texture so that we can calibrate the differences in illumination. As
seen in the previous slide, we have an image of texture and an effect of illumination; in particular,
we may have mirror-like or specular reflection. We do not discuss models for reflection at this time but
just show a given model in Slide 8.38 for illustration purposes. We have for each pixel a known
gray value f, and we know the angle θi under which a pixel is being illuminated and how the
reflection occurs. We will discuss the parameters of the illumination model in a later chapter. We
need to compute the parameters of the reflections that are marked.
In Slide 8.39 we study a particular column of pixels that represent a gray value curve of an
unknown photo texture. The question is: what is "texture" here? Slide 8.40 explains. We do
have the actual brightness along the row of pixels plotted and we model the change of brightness
as a function of the illumination with an average that we can calibrate with our reference pattern.
The deviation from the average is then the actual texture in the form of an irregular signal. We
now need to describe that signal statistically by means of a few simple numbers. How to do this
is a topic of “statistical signal analysis”, for example in a spectral representation of the signal as
previously discussed in section 8.4.
Let us review the basic idea in a different way. We have an image of a surface and we can
take a little window for analysis. We can create a texture surface by projecting that window
multiple times onto the surface and we may obtain in the process some type of “tiling effect”. The
procedural texture discussed before will model the surface texture mathematically and avoid the
seams between the individual tiles. We can create any kind of shape in our synthetic surface, as
shown in Slide 8.43, and Slide 8.44 illustrates that those shapes can be fairly complex even
in three dimensions.
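A minimal sketch (in Python, assuming NumPy) of a procedural texture: a ring-like gray-value pattern computed directly from a mathematical function of the pixel coordinates, so no stored image tiles and hence no visible seams are involved. The specific formula is only illustrative.

    import numpy as np

    def procedural_rings(width, height, scale=0.05, rings=10.0):
        """Generate a wood-ring-like gray-value texture purely from a
        function of (x, y); no stored image and no tiling is required."""
        y, x = np.mgrid[0:height, 0:width]
        cx, cy = width / 2.0, height / 2.0
        r = np.hypot(x - cx, y - cy) * scale                       # distance from the center
        wobble = 0.3 * np.sin(6.0 * np.arctan2(y - cy, x - cx))    # makes the rings irregular
        return 0.5 + 0.5 * np.sin(rings * (r + wobble))            # gray values in [0, 1]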
Prüfungsfragen:
• What is meant by "procedural textures", how are they generated, and what advantages does their use bring?
Chapter 9
Transformations
9.1 About Geometric Transformations
In this chapter we will discuss the transformation of objects in a fixed coordinate system; the
change of coordinate systems with a fixed object; the deformation of objects, so that a
geometrically changed output object results from an input object; projections of the
3D world into a 2D display plane; and finally, under the heading of "transformations",
the change in the representation of an object when we approximate it by simple functions, which we denote
as approximation and interpolation.
Geometric transformations apply when objects move in a fixed world coordinate system, but they
also apply when we need to look at objects and have to create images or use images of objects to
reconstruct them. In that case we need to understand the projection of the object into an image
or display medium. A very important application of geometric transformation is in robotics. This
can be unrelated to the processing of digital visual information, but it employs the same sets of
formulae and ideas of "transformation". A simple robot may have associated with it numerous
coordinate systems which are attached to its rigid elements. Each coordinate system is related
to each other coordinate system by a coordinate transformation. Slide 9.3 and Slide 9.4 explain
how a world coordinate system is home to the robot’s body which in turn is the reference for the
robot arm. The arm holds the hand, the hand holds the fingers and the fingers seek to relate to an
object or box which itself is presented in the world coordinate system. Slide 9.4 illustrates these
six coordinate systems in a simplified presentation in two dimensions.
Our interest is in geometric transformations concerning the use of imagery. Slide 9.5 illustrates
an early video image of the surface of planet Mercury, from NASA’s Mariner mission in the mid
1960’s. We do need to relate each image to images taken from other orbits, and we need to place
each image into a coordinate reference frame that is defined by meridians, the equator and poles of
the planet. Slide 9.6 represents a geometric rectification of the previous image. The transformation
is into a Mercator or Stereographic projection. We can see the geometric correction of the image
if we note that the craters which were of elliptical shape in the original image, now approximate
circles, as they would appear from an overhead-view straight down. Such a view is also denoted
as an orthographic projection.
9.2 Problem of a Geometric Transformation
A geometric transformation typically applies to 2-dimensional space in the plane, to 3-dimensional space as in the natural human environment, and more generally to n-dimensional space.
In the processing of digital visual information, most of our geometric transformations address the
3-dimensional space of our environment and the 2-dimensional space of a display medium. Slide
9.8 illustrates the transformation of objects in a rigid coordinate system (x, y) in a 2-dimensional
space. We have in this example 2 objects, 1 and 2, before the transformation, and 1’, 2’ after
the transformation. The general model of a rigid-body transformation in 2D space is shown
in Slide 9.9: the equation takes the input (x, y) coordinates and produces from them the output
(x', y') coordinates using transformation parameters a0, a1, a2 and b0, b1, b2. Slide 9.10 illustrates
the usefulness of this formulation if we have given objects before and after the transformation
and we need to determine ("estimate") the unknown parameters of the transformation. Given are
therefore x1, y1, x2, y2, x'1, y'1, x'2, y'2, and we seek to compute a0, a1, a2, b0, b1, b2.
We may also know the transformation parameters and need to compute for each given input
coordinate pair (x, y) its associated output coordinate pair (x', y'), as illustrated in Slide 9.11.
This concludes the introduction of the basic ideas of transformations using the example of 2
dimensional space.
9.3 Analysis of a Geometric Transformation
We will use the example of a 2-dimensional object that is transformed in 2D space under a so-called conformal transformation, which does not change the angles of the object. Slide 9.13
through Slide 9.16 illustrate the elements from which a
geometric transformation in 2D space is assembled. A very basic element of a transformation
is always the translation: we add to each pair (x, y) of an object the translational components tx and
ty to produce the output coordinates (x', y').
Definition 18 Conformal transformation

x' = s · cos(α) · x − s · sin(α) · y + tx
y' = s · sin(α) · x + s · cos(α) · y + ty
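A minimal sketch (in Python, assuming NumPy) of applying the conformal transformation of Definition 18 to a set of 2D points; the helper name and the row-vector convention are our own:

    import numpy as np

    def conformal_transform(points, s, alpha, tx, ty):
        """Apply x' = s*cos(a)*x - s*sin(a)*y + tx, y' = s*sin(a)*x + s*cos(a)*y + ty
        to an array of 2D points of shape (N, 2)."""
        R = np.array([[np.cos(alpha), -np.sin(alpha)],
                      [np.sin(alpha),  np.cos(alpha)]])
        return s * points @ R.T + np.array([tx, ty])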
A second important transformational element is scaling. An object gets reduced or enlarged by a
scale factor s (see Definition 18); more generally, we might use 2 different scale factors, denoted sx
in the x coordinate direction and sy in the y coordinate direction. As a result we
may obtain a squished and thus deformed object. We call a deformation by means of 2 different scale
factors an affine deformation and will discuss it later.
Finally we have rotations, which rotate an object by an angle α. The transformation equation
representing the rotation is shown in Slide 9.15. For a rotation we need a point around which we
rotate the object; normally this is the origin of the coordinate system. The general expression for
a rotation by a rotation angle α produces the output (x', y') coordinates from the input (x, y)
coordinates by multiplying those coordinates with cos α and sin α, in accordance with Slide 9.16.
This can also be presented in matrix notation, resulting in the expression p' = R · p, and we call
R the rotation matrix.
What makes now a transformation in 2D-space specifically a conformal transformation? We
already stated that this does not change any angles. Obviously this requires that our body not
be changed in shape. Instead it may be enlarged or reduced, it may be translated and it may be
rotated, but right angles before the transformation will be right angles after the transformation
as well. Slide 9.17 explains that we combine the three elements of the 2D-transformation that we
denoted as scaling by factor s, rotating by angle α and translating by the translation elements tx
and ty. We call this a four-parameter transformation since we have four independent elements of
the transformation: s, α, tx, ty. In matrix notation this transformation is

x' = s · R · x + t,

and s · R can be replaced by the transformation matrix M.
We have described a transformation by means of Cartesian coordinates (x, y). One could use
polar coordinates (r, φ). A point with coordinates (x, y) receives the coordinates (r, φ). A rotation
becomes a very simple operation, changing the angle φ by the rotation angle ω. The relationships
between (x, y) and (r, φ) are fairly obvious:
x = r cos φ,
y = r sin φ.
A rotated point p' will have the coordinates r cos(φ + ω) and r sin(φ + ω).
When performing a transformation we may have a fixed coordinate system and rotate the object
or we may have a fixed object and rotate the coordinate system. In Slide 9.20 we explain how a
point p with coordinates (x, y) obtains coordinates (X, Y ) as a result of rotating the coordinate
system by an angle α. Note that the angle α is the angle subtended between the input and output
axes. We can therefore interpret the rotation matrix not only as being filled with the elements cos α
and sin α, but also as being filled with the cosines of the angles subtended between the
coordinate axes before and after the rotation: we have the angles xX, xY, yX, yY, and they
all enter the rotation matrix as cos(xX), cos(xY), etc.
We have thus found in Slide 9.20 a second definition for the contents of the rotation matrix: the first
was the interpretation of R in terms of cos α and sin α of the rotation angle α; the second is
that the elements of the rotation matrix are the cosines of the angles subtended by the input and
output coordinate axes.
Prüfungsfragen:
• Figure B.12 shows an object A that is transformed into an object B by a linear transformation M. Give (for homogeneous coordinates) the 3 × 3 matrix M that describes this transformation (two different solutions)!
Answer: Two different solutions exist because the object is symmetric and can be mirrored about the y-axis without being changed.
\[ M_1 = \begin{pmatrix} 2 & 0 & 4 \\ 0 & 0.5 & 3 \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} -2 & 0 & 12 \\ 0 & 0.5 & 3 \\ 0 & 0 & 1 \end{pmatrix} \]
• Compute the transformation matrix M that performs a rotation by 45° counter-clockwise about the point R = (3, 2)^T and at the same time a scaling by the factor √2 (as illustrated in Figure B.27). Give M for homogeneous coordinates in two dimensions (i.e. a 3 × 3 matrix), such that a point p is transformed into the point p' according to p' = Mp.
Hint: You save a lot of computation and writing if you apply the associativity of matrix multiplication appropriately.
Answer:
\[ M = T(3, 2) \cdot S(\sqrt{2}) \cdot R(45^\circ) \cdot T(-3, -2) \]
\[ = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \cos 45^\circ & -\sin 45^\circ & 0 \\ \sin 45^\circ & \cos 45^\circ & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} \]
\[ = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} \]
\[ = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & -1 & -1 \\ 1 & 1 & -5 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 1 & -3 \\ 0 & 0 & 1 \end{pmatrix} \]
• The practical part of the examination (Exercise B.2) asks for a transformation matrix (in two dimensions) that is composed of a scaling and a rotation about an arbitrary center of rotation. How many degrees of freedom does such a transformation have? Justify your answer!
Answer: The center of rotation (rx, ry), the rotation angle (ϕ) and the scaling factor (s) give four degrees of freedom.
• Consider a two-dimensional object whose centroid lies at the coordinate origin. A translation T and a scaling S are now to be applied "simultaneously", where
\[ T = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}, \qquad S = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
After the transformation the object should appear enlarged according to S, and the centroid should have been shifted according to T. We now seek a matrix M that transforms a point p of the object into a point p' = M · p of the transformed object according to the above prescription. Which is the correct solution:
1. M = T · S
2. M = S · T
Justify your answer and give M!
Answer: Answer 1 is correct, since the scaling leaves the centroid unchanged exactly when it lies at the coordinate origin. The subsequent translation moves the object (and with it the centroid) to the desired position. We therefore have
\[ M = T \cdot S = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{pmatrix} \]
• Given a 3 × 3 transformation matrix
\[ M = \begin{pmatrix} 3 & 4 & 2 \\ -4 & 3 & 1 \\ 0 & 0 & 1 \end{pmatrix} \]
and three points a = (2, 0)^T, b = (0, 1)^T, c = (0, 0)^T in two-dimensional space. In homogeneous coordinates, the matrix M describes a conformal transformation, where a point p is transformed into a point p' according to p' = Mp. The points a, b and c form a right-angled triangle, i.e. the segments ac and bc are perpendicular to each other.
1. Compute a', b' and c' by applying the transformation described by M to the points a, b and c!
2. Since M describes a conformal transformation, the points a', b' and c' must also form a right-angled triangle. Show that this is indeed the case here! (Hint: it suffices to show that the segments a'c' and b'c' are perpendicular to each other.)
Answer:
1. a' = (8, −7)^T, b' = (6, 4)^T, c' = (2, 1)^T
2. a' − c' = (6, −8)^T, b' − c' = (4, 3)^T, (a' − c') · (b' − c') = 6 · 4 + (−8) · 3 = 0
9.4 Discussing the Rotation Matrix in Two Dimensions
A rotation matrix R is filled with four elements if it concerns rotations in two dimensions.
Definition 19 Rotation in 2D

x' = x · cos θ − y · sin θ
y' = x · sin θ + y · cos θ

Written in matrix form:
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = R \cdot \begin{pmatrix} x \\ y \end{pmatrix}, \qquad R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
As shown in Figure 9.1, these elements can be combined into unit vectors, namely the unit vectors i and j.
The rotation matrix R consists of i and j, which are the unit vectors in the direction of the rotated
coordinate system. We can show that the rotation matrix has some interesting properties, namely
that the dot product of each unit vector with itself is 1 and that the dot product of the two
different unit vectors is zero (see also Slide 9.22).
Definition 20 2D rotation matrix

A point of an object is rotated about the origin by multiplying it with a so-called rotation matrix.
When dealing with rotations in two dimensions, the rotation matrix R consists of four elements.
These elements can be combined into two unit vectors i and j:
\[ \mathbf{i} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad \mathbf{j} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}, \qquad R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = (\mathbf{i}, \mathbf{j}), \qquad \begin{pmatrix} x' \\ y' \end{pmatrix} = R \cdot \begin{pmatrix} x \\ y \end{pmatrix} \]
Starting from a given coordinate system with axes X and Y, the vectors i and j correspond to the
unit vectors in the direction of the rotated coordinate system (see Figure 9.1).
Figure 9.1: rotated coordinate system
We have now found a third definition of the rotation matrix element, namely the unit vectors
along the axes of the rotated coordinate system as expressed in the input coordinate system. Slide
9.23 summarizes the 3 interpretations of the elements of a rotation matrix.
Let’s take a look at the inverse of a rotation matrix. Note that if we premultiply a rotation
matrix by its inverse we get the unit matrix (obviously). But we also learn very quickly, that
premultiplying the rotation matrix with the transposed of the rotation matrix also produces the
unit vector, which very quickly proves to us in accordance with Slide 9.24 that the inverse of a
rotation matrix is nothing else but the transposed rotation matrix.
We now take a look at the forward and backward rotation. Suppose we have rotated a coordinate
system denoted by x into a new coordinate system of X. If we now premultiply the new coordinate
system with the transposed rotation matrix, we obtain the inverse relationship and see that we
obtain, in accordance with Slide 9.25, the original input coordinates. Therefore we know that the
transposed of a rotation matrix serves to rotate back the rotated coordinate system into its input
state.
Let’s now take a look at multiple sequential rotations. We first rotate input coordinates x into
output coordinates x1 and then we rotate the output coordinates x1 further into coordinates x2 .
We see very quickly that x2 is obtained from the product of two rotation matrices R1 and R2.
However, it is also quickly evident that multiplying two rotation matrices produces nothing
else but a third rotation matrix.
Definition 21 Sequenced rotations

x1 = R1 x
x2 = R2 x1
x2 = R2 R1 x = R x
R = R2 R1
It is important, however, to realize that matrix multiplications are not commutative: R2 · R1 is
not necessarily identical to R1 · R2 !
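A minimal numerical sketch (in Python, assuming NumPy) verifying the properties discussed above for 2D rotation matrices: the transpose acts as the inverse, and two successive rotations about the origin combine into a single rotation by the summed angle, which is why they commute (see the exam question below). The angles are arbitrary test values.

    import numpy as np

    def rot2d(theta):
        """2D rotation matrix for an angle theta given in radians."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    a, b = 0.3, 1.1
    R1, R2 = rot2d(a), rot2d(b)
    print(np.allclose(R1.T @ R1, np.eye(2)))    # transpose equals inverse
    print(np.allclose(R2 @ R1, rot2d(a + b)))   # sequenced rotations add their angles
    print(np.allclose(R2 @ R1, R1 @ R2))        # hence 2D rotations about the origin commute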
Prüfungsfragen:
• In the lecture it was pointed out that matrix multiplication is in general not commutative, i.e. for two transformation matrices M1 and M2 we have M1 · M2 ≠ M2 · M1. If, however, we consider two 2 × 2 rotation matrices R1 and R2 in the two-dimensional case, then R1 · R2 = R2 · R1 does hold. Give a geometric or mathematical justification for this fact!
Hint: Note that the center of rotation lies at the coordinate origin!
Answer: When rotating about a fixed rotation axis the rotation angles simply add up; the order of the rotations therefore does not matter.
9.5 The Affine Transformation in 2 Dimensions
Slide 9.28 is an example of an Affine Transformation created with the help of a letter “F”. We
see a shearing effect as a characteristic feature of an Affine Transformation. Similarly, Slide 9.29
illustrates how a unit square will be deformed for example by squishing it only along the axis x
but not along the axis y, or by shearing the square in one direction or in the other direction. All
these are effects of an Affine Transformation.
Slide 9.30 provides us with the equation for a general Affine Transformation in 2 dimensions. We
see that this is a six parameter transformation, defined by transformation parameters a, b, c, d,
tx, ty. We may again ask the question of estimating the unknown transformation parameters
if we have given a number of points both before and after the transformation. Question: how
many points do we need at a minimum to be able to solve for the six unknown transformation
parameters? Obviously we need three points, because each point provides us with two equations,
so that three points provide us with six equations, suitable for solving for the six unknown transformation
parameters. But be aware: those three points must not be collinear!
Let us now analyze the elements of an affine transformation and take a look at Slide 9.32 and
Slide 9.33, recalling however what we saw in Slide 9.13 to Slide 9.15. First, we see a scaling of
the input coordinates, in this case denoted as px and py, independently by scaling factors sx and
sy to obtain output coordinates qx and qy. We can denote the scaling operation by means of a
2 × 2 scaling matrix Msc (written as a 3 × 3 matrix in homogeneous coordinates, see Definition 22).
Secondly, we have a shearing deformation which adds to each coordinate x an increment that is
proportional to y and we add in y an augmentation that is proportional to the x coordinate using a
proportionality factor g. That shearing transformation can be described by a shearing matrix Msh
(see Definition 22). Thirdly, we can introduce a translation, adding to each x and y coordinate
the translational elements tx and ty (see Definition 22).
Finally, we can rotate the entire object, identically to the rotation we saw earlier, using a rotation
angle α and a rotation matrix MR (see Section 9.4). An affine transformation is now
the combination of these transformations, i.e. the product of the three matrices Msc for scaling,
Msh for shearing and MR for rotation, with the translation added on as discussed previously.
Slide 9.34 further explains how the transformation of the input coordinate vector p into an output
coordinate vector q is identical to the earlier two equations, converting the input coordinate pair (x, y)
into an output coordinate pair (x', y') via a six-parameter affine transformation.
Definition 22 Affine transformation with 2D homogeneous coordinates

\[ \begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ w \end{pmatrix} = M_{sc} \begin{pmatrix} x \\ y \\ w \end{pmatrix} \]

\[ \begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} 1 & h_x & 0 \\ h_y & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ w \end{pmatrix} = M_{sh} \begin{pmatrix} x \\ y \\ w \end{pmatrix} \]

\[ \begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ w \end{pmatrix} = M_{tr} \begin{pmatrix} x \\ y \\ w \end{pmatrix} \]

\[ \begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ 0 & 0 & s \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ w \end{pmatrix} \]
Definition 22 shows an example of how to construct an affine transformation that rotates, translates
and scales in one step. The transformation is done in 2D using homogeneous coordinates (see
Section 9.9). The parameters rij specify the rotation, tx and ty specify the translational elements, and s
is a scaling factor (which in this case scales equally in both directions x and y).
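A minimal sketch (in Python, assuming NumPy) of composing the elementary 3 x 3 matrices of Definition 22 into one affine transformation and applying it to a point in homogeneous coordinates; the parameter values are arbitrary examples:

    import numpy as np

    def scale(sx, sy):
        return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

    def shear(hx, hy):
        return np.array([[1, hx, 0], [hy, 1, 0], [0, 0, 1]], dtype=float)

    def translate(tx, ty):
        return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

    def rotate(alpha):
        c, s = np.cos(alpha), np.sin(alpha)
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

    # Compose scaling, shearing, rotation and translation (applied right to left):
    M = translate(3, 2) @ rotate(np.pi / 4) @ shear(0.2, 0.0) @ scale(2, 0.5)
    p = np.array([1.0, 1.0, 1.0])    # the point (1, 1) in homogeneous coordinates
    print(M @ p)                      # the transformed homogeneous point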
Prüfungsfragen:
• Two "point clouds" are given as in Figure ??. First set up the appropriate transformation of the one point group onto the second point group, using the required formulas (without using the given coordinates), such that the three marked points in the left image (the three marked as filled circles) are brought into coincidence, after the transformation, with the three points in the right image (which are also marked as filled circles).
• For the computation of the unknown transformation parameters sought in Question ??, set up the coefficient matrix, using only integer values of the coordinates from Figure ??.
9.6 A General 2-Dimensional Transformation
We begin the consideration of a more general 2-dimensional transformation with a look at the
bilinear transformation (see Definition 23), which takes the input coordinates (x, y) and converts
them into an output coordinate pair (X, Y) via a bilinear expression that has a term with the
product xy of the input x and y coordinates. This transformation is called bilinear
because if we freeze either the coordinate x or the coordinate y, we obtain a linear expression for
the transformation. Such a transformation has 8 parameters, as we can see from Slide 9.36. Each
input point (x, y) produces 2 equations as shown in that slide. We need four points to compute
the transformation parameters a, b, c, e, f, g and the translational parameters d and h. By means
of a bilinear transformation we can match any group of four input points onto any group of four
output points and thereby achieve a perfect fit.
Definition 23 Bilinear transformation

x' = a · x + b · y + c · xy + d
y' = e · x + f · y + g · xy + h
A more general transformation would be capable of taking a group of input points as shown in
Slide 9.37, in this example with an arrangement of 16 points, into a desired output geometry
as shown in Slide 9.38. We suggest that the randomly deformed arrangements of that slide be
converted into a rigidly rectangular pattern: How can we achieve this?
Obviously, we need to define a transformation with 16 × 2 = 32 parameters for all 32 coordinate
values. Slide 9.39 illustrates the basic concept. We are setting up a polynomial transformation to
take the input coordinate pair (x, y) and translate it into an output coordinate pair (X, Y) by means of
two 16-parameter polynomials. These polynomial coefficients a0 , a1 , a2 , . . . and b0 , b1 , b2 , . . . may
initially be unknown, but if we have 16 input points with their input locations (x, y) and we
know their output locations (X, Y ), then we can set up an equation system to solve the unknown
transformation parameters a0 , a1 , . . . , a15 , and b0 , b1 , . . . , b15 .
Slide 9.40 illustrates the type of computation we have to perform. Suppose we had given in the
input coordinate system 1 the input coordinates (xi , yi ) and we have n such points. We also have
given in the output coordinate system 2 the output coordinates (Xj , Yj ) and we have the same
number of output points n. We can now set up the equation system that translates the input
coordinates (xi , yi ) into output coordinates (Xj , Yj ). What we ultimately obtain is an equation
system:
x = K · u
In this equation, x are the known output coordinates, u is the vector of unknown transformation
parameters, and this may be 4 in the conformal transformation, 6 in the affine, 8 in the bilinear or,
as we discussed before, 32 for a polynomial transformation that must fit 16 points from an input
to 16 output locations. What is in the matrix K? It is the coefficient matrix for the equations and
is filled with the input coordinates as shown in the polynomial or other transformation equations.
How large is the coefficient matrix K? Obviously, for an affine transformation determined from three
points the coefficient matrix K has 6 by 6 elements, and in the polynomial case discussed here
(16 points, 32 parameters) the coefficient matrix K has 32 by 32 elements.
What happens if we have more points given in system 1 with their transformed coordinates in
system 2 than we need to solve for the unknowns? Suppose we had ten input and ten output
points to compute the unknown coefficients of a conformal transformations where we would only
need 2 points producing 4 equations to allow us to solve for the 4 unknowns? We have an overdetermined equation system and our matrix K is rectangular. We can not invert a rectangular
matrix. So what do we do?
There is a theory in statistics and estimation theory which is called the least-squares method. Slide
9.41 explains: we can solve an over-determined equation system, which has a rectangular and not
a square coefficient matrix, by premultiplying the left and the right side of the equation system by
the transpose of the coefficient matrix, K^T. We obtain in this manner a square matrix K^T · K on
the right-hand side, and we call this the normal equation matrix. It is square in the shorter of the
two dimensions of the rectangular matrix K and it can be inverted. So the unknown coefficient vector u
results from multiplying the inverse of the product K^T · K with K^T · X, as shown in Slide 9.41.
This is but a very simple glimpse at the matters of “Least Squares”. In reality, this is a concept
that can fill many hundreds of pages of textbooks, but the basic idea is that we estimate the
unknown parameters u using observations that are often erroneous, and to be robust against such
errors, we provide more points (xi , yi ) in the input system and (Xi , Yi ) in the output system than
needed as a minimum. Because of these errors the equations will not be entirely consistent and
we will have to compute transformation parameters that will provide a best approximation of the
transformation.
“Least squares” solutions have optimality properties if the errors in the coordinates are statistically
normally distributed.
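A minimal sketch (in Python, assuming NumPy) of estimating the six affine parameters from n >= 3 point correspondences via the normal equations u = (K^T K)^(-1) K^T X described above; the parameter ordering is an assumption of this sketch:

    import numpy as np

    def estimate_affine(src, dst):
        """Estimate (a, b, c, d, tx, ty) of x' = a*x + b*y + tx, y' = c*x + d*y + ty
        from corresponding 2D points src -> dst, given as arrays of shape (n, 2), n >= 3."""
        n = src.shape[0]
        K = np.zeros((2 * n, 6))
        X = dst.reshape(-1)                      # known output coordinates x'1, y'1, x'2, y'2, ...
        K[0::2, 0:2] = src                       # rows for x': [x, y, 0, 0, 1, 0]
        K[0::2, 4] = 1
        K[1::2, 2:4] = src                       # rows for y': [0, 0, x, y, 0, 1]
        K[1::2, 5] = 1
        u = np.linalg.solve(K.T @ K, K.T @ X)    # normal equations (K^T K) u = K^T X
        return u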
Prüfungsfragen:
• In 2D space a bilinear transformation is sought, and the unknown transformation parameters are to be computed. For this purpose N points are known with their coordinates before and after the transformation, where N > 4. Which solution approach is applied here?
Answer: The method of least squares:
\[ X = K \cdot u, \qquad K^T \cdot X = K^T \cdot K \cdot u, \qquad u = (K^T \cdot K)^{-1} \cdot K^T \cdot X \]
• In the lecture, two procedures for determining the eight parameters of a bilinear transformation in two dimensions were explained:
1. exact determination of the parameter vector u when exactly four input/output point pairs are given
2. approximate determination of the parameter vector u when more than four input/output point pairs are given ("least squares method")
The method of least squares can, however, also be applied when exactly four input/output point pairs are given. Show that in this case one obtains the same result as with the first procedure. What geometric meaning does this observation have?
Hint: Consider why the method of least squares bears this name.
Answer:
\[ u = (K^T K)^{-1} K^T X = K^{-1} (K^T)^{-1} K^T X = K^{-1} X \]
These manipulations are possible because K is a square matrix here. Since the equation system is not over-determined, an exact solution exists (error ε = 0). This solution is also found by the method of least squares, which minimizes the error (ε ≥ 0).
• Describe a bilinear transformation by means of its defining equation!
9.7 Image Rectification and Resampling
We change the geometry of an input image as illustrated in Slide 9.43, Slide 9.44, Slide 9.45
showing a mesh or a grid superimposed over the input image. We similarly show a differently shaped
mesh in the output image. The task is to match the input image onto the output geometry so
that the meshes fit one another. We have to establish a geometric relationship between the image
in the input and the output using a transformation equation from the input to the output.
If we now do a geometric transformation of an image we have essentially two tasks to perform.
First we need to describe the geometric transformation between the input image and the output
image by assigning to every input image location the corresponding location in the output image.
This is a geometric operation with coordinates. Second, we need to produce an output gray level
for the resulting image based on the input gray levels. We call this second process a process of
resampling as shown in Slide 9.46, and use operations on gray values.
Again, what we do conceptually is to take an input image pixel at location (x, y) and to compute
by a spatial transform the location in the output image at which this input pixel would fall;
this location has the coordinates (x', y') in accordance with Slide 9.46. However, that location
may not perfectly coincide with the center of a pixel in the output image. Now we have a second
problem, which is to compute the gray value at the center of the output pixel by looking at the
area of the input image that corresponds to that output location. One method is to assign the
gray value we find at the nearest input image location to the specific location in the output image. If we use this
method, we have used the so-called nearest-neighbor method.
An application of this matter of resampling and rectification of images is illustrated in Slide 9.47.
We have a distorted input image which would show an otherwise perfectly regular grid with some
distortions. In the output image that same grid is reconstructed with reasonably perfect vertical
and horizontal grid lines. The transition is obtained by means of a geometric rectification and
this rectification includes as an important element the function of resampling. Slide 9.113 is
again the image of planet Mercury before the rectification and Slide 9.49 after the rectification
performing a process as illustrated earlier. Let us hold right at this point and delay a further
discussion of resampling to a separate later chapter (Chapter 15). Resampling and image rectification were
only mentioned at this point to establish the relationship of this task to the idea of 2-dimensional
transformations from an input image to an output image.
Exam questions:
• If a real scene is captured by a camera with non-ideal optics, a distorted image results. Explain
the two stages of resampling that are required to rectify such a distorted image.
Answer:
1. geometric resampling: finding corresponding positions in both images
2. radiometric resampling: finding a suitable gray value in the output image
9.8 Clipping
As part of the process of transforming an object from a world coordinate system into a display
coordinate system on a monitor or on a hardcopy output device we are faced with an interesting
problem: We need to take objects represented by vectors and figure out which element of each
vector is visible on the display device. This task is called clipping. An algorithm to achieve
clipping very efficiently is named after Cohen-Sutherland. Slide 9.51 illustrates the problem:
a number of objects are given in world coordinates and a display window will only show part of those
objects. On the monitor the objects will be clipped.
Slide 9.52 outlines the algorithm. The task is to receive on the input side a vector defined by the end points
p1 and p2 and to compute auxiliary points C, D where this vector intersects the display window,
which is defined by a rectangle.
9.8.1 Half-Space Codes
In order to solve the clipping problem, Cohen and Sutherland have defined so-called half-space
codes (Slide 9.54) which relate to the half spaces defined by the straight lines delineating the display
window. These half-space codes designate spaces to the right, to the left, to the top and to the
bottom of the boundaries of the display window, say with subscripts cr, cl, ct, and cb. For example,
if a point is to the right of the vertical boundary, the point's half-space code cr is set to "true", but if
it is to the left the code is set to "false". Similarly, a location above the window gets a half-space
code ct of "true" and below gets "false".
We now need to define a procedure called "Encode" in Slide 9.55 which takes an input point p and
produces for it the associated four half-space codes, assigning them to a variable c that holds four
Boolean values. We obtain 2 values for each of the 2 coordinates px and py of point p: true or false
depending on where px falls with respect to the vertical boundaries of the
display window, and on where py falls with respect to the horizontal boundaries.
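A possible sketch of such an Encode procedure; the window boundary names wl, wr, wb, wt anticipate the notation used later in this section and are otherwise an assumption.

def encode(px, py, wl, wr, wb, wt):
    """Half-space codes of point p for the window [wl, wr] x [wb, wt]:
    one Boolean value per window boundary (left, right, bottom, top)."""
    return {"cl": px < wl, "cr": px > wr, "cb": py < wb, "ct": py > wt}

print(encode(1.5, 0.5, 0.0, 1.0, 0.0, 1.0))   # {'cl': False, 'cr': True, 'cb': False, 'ct': False}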
9.8.2 Trivial Acceptance and Rejection
Slide 9.56 is a picture of the first part of the procedure clip, as it is presented in [FvDFH90,
Section 3.12.3]. Procedure "Encode" is called for the beginning and end points of a straight
line, denoted as P1 and P2, and the resulting half-space codes are denoted as C1 and C2. We
now have to take a few decisions about the straight line depending on where P1 and P2 fall. We
compute 2 auxiliary Boolean variables, In1 and In2. We can easily show that the straight line
is entirely within the display window if In1 and In2 are "true". This is called trivial acceptance,
shown in Slide 9.57 for points A, B. Trivial rejection is also shown in Slide 9.57 for a straight line
connecting points C and D.
9.8.3 Is the Line Vertical?
We need to proceed with the "Clipping Algorithm" in Slide 9.58 if we have neither a trivial acceptance
nor a trivial rejection. We differentiate among cases where at least one point is outside the display
window. The first possibility is that the line is vertical. That is considered first.
9.8.4 Computing the Slope
If the line is not vertical we compute its slope. This is illustrated in Slide 9.59.
9.8.5 Computing the Intersection A in the Window Boundary
With this slope we compute the intersection of the straight line with the relevant boundary lines of
the display window at wl , wr , wt and wb . We work our way through a few decisions to make sure
that we do find the intersections of our straight line with the boundaries of the display window.
9.8.6 The Result of the Cohen-Sutherland Algorithm
The algorithm will produce a value stating that the straight line is entirely outside of the
window, or it returns the end points of the straight line. These are the end points from the
input if the entire line segment is within the window; they are the intersection points of the input
line with the window boundaries if the line intersects them.
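The following sketch puts the pieces together. It represents the four half-space codes as one four-bit mask, which is equivalent to the four Boolean variables described above but not necessarily the formulation of [FvDFH90]; the test segment is invented.

def clip(p1, p2, wl, wr, wb, wt):
    """Cohen-Sutherland line clipping against the window [wl, wr] x [wb, wt].
    Returns the clipped segment, or None if the line is entirely outside."""
    def code(x, y):
        c = 0
        if x < wl: c |= 1      # left
        if x > wr: c |= 2      # right
        if y < wb: c |= 4      # bottom
        if y > wt: c |= 8      # top
        return c

    (x1, y1), (x2, y2) = p1, p2
    while True:
        c1, c2 = code(x1, y1), code(x2, y2)
        if c1 == 0 and c2 == 0:          # trivial acceptance
            return (x1, y1), (x2, y2)
        if c1 & c2:                      # trivial rejection: both ends beyond one boundary
            return None
        c = c1 or c2                     # pick an endpoint that lies outside
        if c & 8:   x, y = x1 + (x2 - x1) * (wt - y1) / (y2 - y1), wt
        elif c & 4: x, y = x1 + (x2 - x1) * (wb - y1) / (y2 - y1), wb
        elif c & 2: x, y = wr, y1 + (y2 - y1) * (wr - x1) / (x2 - x1)
        else:       x, y = wl, y1 + (y2 - y1) * (wl - x1) / (x2 - x1)
        if c == c1: x1, y1 = x, y        # replace the outside endpoint by the intersection
        else:       x2, y2 = x, y

print(clip((-0.5, 0.5), (1.5, 0.5), 0.0, 1.0, 0.0, 1.0))   # -> ((0.0, 0.5), (1.0, 0.5))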
Exam questions:
• Which "half-space codes" are used in clipping, and what role do they play?
• Explain the individual steps of the Cohen-Sutherland clipping algorithm using the example
in Figure B.18. The intermediate results with the half-space codes are to be shown. The part of
the segment AB that lies inside the rectangle R is to be determined. The numerical values needed
for this (including those of the intersection points) can be read directly from Figure B.18.
• Apply the clipping algorithm of Cohen-Sutherland (in two dimensions) to the points p'1 and
p'2 found in Example B.2 in order to find the part of the connecting segment between p'1 and p'2
that lies inside the square Q = {(0, 0)^T, (0, 1)^T, (1, 1)^T, (1, 0)^T}. You may enter the result
directly in Figure B.19 and solve the intersection computations graphically.
Answer:

        cl      cr      ct      cb
p'1     true    false   false   false
p'2     false   true    false   true

9.9 Homogeneous Coordinates
A lot of use of homogeneous coordinates is made in the world of computer graphics. The attraction
of homogeneous coordinates is that in a 2- or 3-dimensional transformation of an input coordinate
system or object described by x into an output coordinate system or changed object described by x', we do not
have to split our operation into a multiplication by the rotation matrix and scale
factor and, separately, an addition of the translation vector t. Instead we simply employ
a single matrix multiplication, with a homogeneous coordinate X for a point and output
coordinates X' for the same point after the transformation.
Slide 9.62 explains the basic idea of homogeneous coordinates. Instead of working in 2 dimensions
in a 2-dimensional Cartesian coordinate system (x, y), we augment the coordinate system by a
third coordinate w, and any point in 2D space with location (x, y) receives a third coordinate
and therefore is at location (x, y, w). If we define w = 1 we have defined a horizontal plane
for the location of a point. Again, Slide 9.63 states that Cartesian coordinates in 2 dimensions
represent a point p as (x, y), and homogeneous coordinates in 2 dimensions have that same point
represented by the three-element vector (x, y, 1). Let us try to explain how we use homogeneous
coordinates, staying with 2 dimensions only. In Slide 9.64 we have another view of a translation
in Cartesian coordinates. Slide 9.65 describes scaling; in this particular case an affine scaling
occurs with separate scale factors in the two different coordinate directions (Slide 9.66 illustrates
a rotation). Slide 9.67 illustrates the translation by means of a translation vector and scaling
by means of a scaling matrix. Slide 9.68 introduces the relationship between a Cartesian and
a homogeneous coordinate system. Slide 9.69 uses homogeneous coordinates for a translation by
means of a multiplication of the input coordinates into an output coordinate system. The same
operation is used for scaling in Slide 9.70 and for rotation in Slide ??; Slide ?? summarizes.
As Slide 9.73 reiterates, translation and scaling are now described by matrix multiplication, and
of course rotation and scaling have previously also been matrix multiplications in the Cartesian
coordinate system. If we now combine these three transformations of translation, scaling and
rotation, we obtain a single transformation matrix M which describes all three transformations
without the separation into multiplications and additions that is necessary in the Cartesian case.
The simplicity of doing everything in matrix form is the appeal that leads computer graphics
software to rely heavily on homogeneous coordinates. In image analysis homogeneous coordinates
are not as prevalent. One may assume that this is because in image processing we often have
to estimate transformation parameters from over-determined equation systems using the
method of least squares. That approach typically is better applicable with Cartesian geometry
than with the homogeneous system.
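A small sketch of the appeal just described: translation, rotation and scaling in 2D all become 3 × 3 matrix multiplications on homogeneous coordinates and can be combined into a single matrix (the numeric values are arbitrary).

import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# One combined matrix M replaces "rotate and scale, then add t".
M = translation(4.0, -1.0) @ rotation(np.pi / 6) @ scaling(2.0, 2.0)
p = np.array([1.0, 1.0, 1.0])          # point (1, 1) in homogeneous coordinates
print(M @ p)                            # transformed point; the third component stays 1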
Exam questions:
• Explain the significance of homogeneous coordinates for computer graphics. Which
properties do homogeneous coordinates have?
• For homogeneous coordinates, state a 3 × 3 matrix M with as many degrees of freedom as
possible that is suitable for transforming the points p of a rigid body (e.g. a wooden block)
according to q = M p (a so-called "rigid body transformation").
Hint: simple geometric relationships are contained in the question in "encoded" form. If they
were formulated explicitly instead, the answer would really be material of "Group I".
• Given the transformation matrix

M = [  0  2  0   0 ]
    [  0  0  2   0 ]
    [  1  0  0  -5 ]
    [ -2  0  0   8 ]

and two points

p1 = (3, -1, 1)^T ,   p2 = (2, 4, -1)^T

in object coordinates. Transform the two points p1 and p2, with the help of the matrix M, into
the points p'1 and p'2 in (normalized) screen coordinates (pay attention to the conversions
between three-dimensional and homogeneous coordinates).

Answer:

M · (3, -1, 1, 1)^T = (-2, 2, -2, 2)^T  ⇒  p'1 = (-1, 1, -1)^T
M · (2, 4, -1, 1)^T = (8, -2, -3, 4)^T  ⇒  p'2 = (2, -0.5, -0.75)^T

9.10 A Three-Dimensional Conformal Transformation
In three dimensions things become considerably more complex and more difficult to describe. Slide
9.75 shows that a 3-dimensional conformal transformation rotates objects or coordinate axes, scales
Definition 24 Rotation in 3D
The three-dimensional rotation transforms an input point P with coordinates (x,y,z) into an output
coordinate system (X,Y,Z) by means of a rotation matrix R.
The elements of this rotation matrix can be interpreted as follows:
- as the cosines of the angles subtended by the coordinate axes xX, yX, zX, ..., zZ
- as the assembly of the three unit vectors directed along the axes of the rotated coordinate system,
but described in terms of the input coordinate system.
R = [ cos(xX)  cos(yX)  cos(zX) ]
    [ cos(xY)  cos(yY)  cos(zY) ]
    [ cos(xZ)  cos(yZ)  cos(zZ) ]

or

R = [ r11  r12  r13 ]
    [ r21  r22  r23 ]
    [ r31  r32  r33 ]

P' = R · P
A 3D rotation can be considered as a composition of three individual 2-D rotations around the
coordinate axes x, y, z. It is easy to see that rotating around one axis will also affect the two other
axes. Therefore the sequence of the rotations is very important. Changing the sequence of the
rotations may result in a different output image.
them and translates, just as we had in 2 dimensions. However, the rotation matrix now needs to
cope with three coordinate axes.
In analogy to the 2-dimensional case we now know that the transformation takes an input point
P with coordinates (x, y, z) into an output coordinate system (X, Y, Z) by means of a rotation
matrix R. The elements of this rotation matrix are again, first, the cosines of the angles subtended
by the coordinate axes xX, yX, zX, . . . , zZ; second, the assembly of three unit vectors directed
along the axes of the rotated coordinate system but described in terms of the input coordinate
system (Slide 9.76). The rotation matrix can be built as the multiplication of three 2-D rotation matrices, as shown in Slide 9.77. The
composition of the rotation matrix from three individual 2-D rotations around the three coordinate
axes x, y and z is the most commonly used approach. Each rotation around an axis needs to
consider that that particular axis may already have been rotated by a previous rotation. Note that as
we rotate around a particular axis first, this will move the other two coordinate axes. We then
rotate around the rotated second axis, affecting the third one again, and then we rotate around the
third axis. The sequence of rotations is of importance and will change the ultimate outcome if we
change the sequence.
change the sequence. Slide 9.79 illustrates how we might define a three-dimensional rotation and
translation by means of three points P1, P2, P3 which represent two straight line segments P1 P2
and P1 P3. We begin by translating P1 into the origin of the coordinate system. We proceed by
rotating P2 into the z axis and complete the rotation by rotating P3 into the yz plane. We thereby
obtain the final position. If we track this operation we see that we have applied several rotations.
We have first rotated P1 P2 into the xz plane. Then we have rotated the result around the y-axis
into the z-axis. Finally we have rotated P1 P3 around the z-axis into the yz plane. Slide 9.80
and Slide 9.81 explain in detail the sequence of three rotations of three angles which are denoted
in this case first as angle Θ, second angle φ, and third angle α. Generally, a three dimensional
conformal transformation will be described by a scaling l, a rotation matrix R and a translation
vector t. Note that l is a scalar value, the rotation matrix is a 3 by 3 matrix containing three
angles and translation vector t has three elements with translations along the directions x, y, and
z. This type of transformation contains seven parameters for the three dimensions as opposed to
four parameters in the 2D case. Note that the rotation matrix has 3 angles, the scale factor is a
fourth value and the translation vector has three values, resulting in a total of seven parameters
to define this transformation.
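A sketch of the seven-parameter conformal transformation and of the order dependence of the rotations; the particular sequence R = Rz · Ry · Rx is only one possible convention, and the numbers are invented.

import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def conformal_3d(p, scale, angles, t):
    """Seven parameters: 1 scale, 3 rotation angles, 3 translations: P' = scale * R * P + t."""
    R = rot_z(angles[2]) @ rot_y(angles[1]) @ rot_x(angles[0])   # one possible sequence
    return scale * (R @ p) + t

p = np.array([1.0, 0.0, 0.0])
print(conformal_3d(p, 2.0, (0.1, 0.2, 0.3), np.array([5.0, 0.0, -1.0])))

# The sequence matters: rotating first around x and then around z differs from the reverse order.
print(np.allclose(rot_x(0.1) @ rot_z(0.3), rot_z(0.3) @ rot_x(0.1)))   # prints False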
Exam questions:
• What is meant by a "conformal transformation"?

9.11 Three-Dimensional Affine Transformations
Definition 25 Affine transformation with 3D homogeneous coordinates

case 'translation':

tr matrix = [ 1  0  0  tx ]
            [ 0  1  0  ty ]
            [ 0  0  1  tz ]
            [ 0  0  0  1  ]

case 'rotation x':

rotationx = [ 1  0       0       0 ]
            [ 0  cos φ  -sin φ   0 ]
            [ 0  sin φ   cos φ   0 ]
            [ 0  0       0       1 ]

case 'rotation y':

rotationy = [  cos φ  0  sin φ  0 ]
            [  0      1  0      0 ]
            [ -sin φ  0  cos φ  0 ]
            [  0      0  0      1 ]

case 'rotation z':

rotationz = [ cos φ  -sin φ  0  0 ]
            [ sin φ   cos φ  0  0 ]
            [ 0       0      1  0 ]
            [ 0       0      0  1 ]

case 'scale':

scale matrix = [ sx  0   0   0 ]
               [ 0   sy  0   0 ]
               [ 0   0   sz  0 ]
               [ 0   0   0   1 ]
By using homogeneous coordinates, all transformations are 4 × 4 matrices. So the transformations
can easily be combined by multiplying the matrices. This results in a speedup because every point
is only multiplied with one matrix and not with all transformation matrices.
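A sketch of this composition with 4 × 4 homogeneous matrices; the concrete parameter values and points are arbitrary.

import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def scale(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

# Combine once; afterwards every point is multiplied with a single matrix only.
M = translation(1.0, 2.0, 3.0) @ rotation_z(np.pi / 4) @ scale(2.0, 2.0, 2.0)

points = np.array([[0.0, 0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, 1.0]]).T    # homogeneous points as columns
print(M @ points)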
The three-dimensional transformation may change the object shape. A simple change results from
shearing or squishing, and produces an affine transformation. Generally, the affine transformation
does not have a single scale factor, but we may have up to three different scale factors along the
x, y, and z axes as illustrated in Slide 9.83. Another interpretation of this effect is to state
that a coordinate X is obtained from the input coordinates (x, y, z) by means of shearing
elements h_yx and h_zy which are really part of the scaling matrix M_sc. Ultimately, a cube will be
deformed into a fairly irregular shape, as shown in Slide 9.84 with the example of a building shape.
A three-dimensional affine transformation now has 12 parameters, so that transformations of the
x, y, and z coordinates are independent of one another. The transformation will still
maintain straight lines as straight lines; however, right angles will not remain right angles.
9.12 Projections
From a higher dimensional space, projections produce images in a lower dimensional space. We
have projection lines or projectors that connect input to output points, we have projection centers
and we have a projection surface onto which the higher-dimensional space is projected. In the real
world we basically project 3-dimensional spaces onto 2-dimensional projection planes. The most
common projections are the perspective projections as used by the human eye and by optical
cameras.
We differentiate among a multitude of projections. The perspective projections model what happens in a camera or the human eye. However, engineers have long used parallel projections. These
are historically used also in the arts and in cartography and have projection rays (also called
projectors or projection lines) that are parallel. If they are perpendicular onto the projection
plane we talk about an orthographic projection. If they are not perpendicular but oblique to the
projection plane, we talk about an oblique projection (see Slide 9.86).
A special case of the orthographic projection results in the commonly used presentations of three-dimensional
space in a top view, front view and side view (Slide 9.87). Heavy use in architecture
and civil engineering of top views, front views and side views of a 3-D space is easy to justify:
from these 3 views we can reconstruct the 3 dimensions of that space. Another special case is the
axonometric projection, where the projection plane is not in one of the three coordinate planes of
a three-dimensional space. Yet another special case is the isometric projection, which occurs if
the projection plane is chosen such that all three coordinate axes are changed equally much in the
projection (the projection rays are directed along the vector with elements (1, 1, 1)). We highlight
particular oblique projections which are the cavalier and the cabinet projection. The cavalier
projection produces no scale reduction along the coordinate axes because it projects perfectly
under 45°. In the cabinet projection we project under an angle α = 63.4° (since tan α = 2);
this projection shrinks an object in one direction by a factor of 1/2.
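A sketch of an oblique projection onto the xy plane. Drawing the receding z axis under 45° in the image is a common convention and an assumption here; the angles 45° and 63.4° reproduce the cavalier and cabinet cases described above.

import numpy as np

def oblique_projection(alpha_deg):
    """2 x 3 projection matrix for oblique projectors under angle alpha.
    alpha = 45 deg gives the cavalier, alpha = 63.4 deg the cabinet projection."""
    f = 1.0 / np.tan(np.radians(alpha_deg))     # foreshortening factor for the z axis
    # x' = x + f*cos(45 deg)*z, y' = y + f*sin(45 deg)*z  (receding axis drawn at 45 deg)
    return np.array([[1.0, 0.0, f * np.cos(np.radians(45.0))],
                     [0.0, 1.0, f * np.sin(np.radians(45.0))]])

cube = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                 [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float).T
print(oblique_projection(45.0) @ cube)     # cavalier: z edges keep their full length
print(oblique_projection(63.4) @ cube)     # cabinet: z edges shrink to about one half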
9.13 Vanishing Points in Perspective Projections
In order to construct a perspective projection we can take advantage of parallel lines. In the
natural world they meet of course at infinity. In the projection they meet at a so-called vanishing
point (in German: Fluchtpunkt). This is a concept of descriptive geometry, a branch of mathematics. Slide 9.91 is the
example of a perspective projection as produced by a synthetic camera. Note how parallel lines
converge at a point which typically is outside the display area. The vanishing point is the image
of the object point at infinity.
Because there exists an infinity of directions for bundles of parallel lines in 3D space, there exists
an infinity of vanishing points. However, special vanishing points are associated with bundles
of lines that are parallel with the coordinate axes. Such vanishing points are called principal.
We may have only one axis producing a finite vanishing point, since the other two axes are
themselves parallel to the projection plane and their vanishing points are at infinity. Such a
perspective projection is therefore called a one-point perspective, in which a cube aligned with the
coordinate axes will only have one vanishing point. Analogously, Slide 9.93 and Slide 9.94 present
a 2-point and a general 3-point perspective.
9.14 A Classification of Projections
Slide 9.96 presents the customary hierarchy of projections as they are commonly presented in
books about architecture, art and engineering. In all cases, these projections are onto a plane
and are thus planar projections. The differentiation between perspective and parallel projections
is somewhat artificial if one considers that with a perspective center at infinity, one obtains the
parallel projection. However, the projections are grouped into parallel and perspective projections;
the perspective ones are then subdivided into single-point, two-point and three-point perspective
projections, and the parallel projections are classified into orthographic and oblique ones, the
oblique have the cavalier and cabinet projection as special cases. The orthographic projections
have the axonometry on one hand and the multi-view orthographic on the other hand and within
the axonometric projection we have one special case we discussed, the isometric projection.
We do not discuss the world of more complex projections, for example to convert the surface of a
sphere into a plane: this is the classical problem of cartography with its need to present a picture
of the Earth on a flat sheet of paper.
Exam questions:
• In the lecture a "tree" for the hierarchy of the various projections into the plane (planar
projections) was presented. Please sketch this tree with all projections that occur in it.
9.15 The Central Projection
This is the most important projection of all the ones we discuss in this class. The simple reason
for this is that it is the geometric model of a classical camera. Slide 9.98 explains the geometry of
a camera and defines three coordinate systems. The first is the world coordinate system with X,
Y and Z. In this world coordinate system we have a projection center O at location (X0 , Y0 , Z0 ).
The projection center is the geometric model of a lens. All projection lines are straight lines
going from the object space, where there is an object point P at the location (x, y, z) through the
projection center O and intersecting the image plane.
We know at this point that the central projection is similar to the perspective projection. There
is a small difference, though. We define the projection center with respect to an image plane and
insist on some additional parameters that describe the central projection that we do not typically
use in the perspective projection.
Note that we have a second coordinate system that is in the image plane that is denoted in Slide
9.98 by ξ and η. This is a rectangular 2-dimensional Cartesian coordinate system with an origin
at point M. The point P in object space is projected onto the image location P' = (x, y). Third,
we have the location of the perspective center O defined in a sensor coordinate system.
The sensor coordinate system has its origin at the perspective center O, is a three-dimensional
coordinate system, its x and y axes are nominally parallel to the image coordinate system (ξ, η), and
the z-axis is perpendicular to the image plane.
We do have an additional point H defined in the central projection which is the intersection of
the line perpendicular to the image plane and passing through the projection center. Note that
this does not necessarily have to be identical to point M . M simply is the origin of the image
coordinate system and is typically the point of symmetry with respect to some fiducial marks as
shown in Slide 9.98.
In order to describe a central projection we need to know the image coordinate system with
its origin M , we need to know the sensor coordinate system with its origin O and we need to
understand the relationship between the sensor coordinate system and the world coordinate system
(X, Y, Z).
Let us take another look at the same situation in Slide 9.99 where the coordinate systems are again
illustrated. We have the projection center O as in the previous slide and we have two projection
rays going from object point P1 to image point P10 or object point P2 to image point P20 . We also
do have an optical axis or the direction of the camera axis which passes through point O and is
perpendicular to the image plane. In Slide 9.99 there are two image planes suggested. One is
between the perspective center and the object area and that suggests the creation of a positive
image. In a camera, however, the projection center is typically between the object and the image
plane and that leads geometrically to a negative image. Slide 9.99 also defines again the idea of
an image coordinate system. In this case it is suggested that the image is rectangular and the
definition of the image coordinates is by some artificial marks that are placed in the image plane.
The marks are connected and define the origin M. We also have again the point H, which is
the intersection of the line perpendicular to the image plane and passing through the projection
center. We also have some arbitrary location for a point P' that is projected into the image. We
will from here on out ignore that M and H may be 2 locations. Typically the distance between
M and H is small, and it is considered an error of a camera if M and H don't coincide.
Normal cameras that we use as amateurs do not have those fiducial marks and therefore they
are called non-metric cameras because they do not define an image coordinate system. Users of
non-metric cameras who want to measure need to help themselves by some auxiliary definition of
an image coordinate system and they must make sure that the image coordinate system is the
same from picture to picture if multiple pictures show the same object. Professional cameras that
are used for making measurements and reconstructing 3D objects typically will have those fiducial
marks as fixed features of a camera. In digital cameras the rows and columns of a CCD array will
provide an inherent coordinate system because of the numbering of the pixels.
Slide 9.100 is revisiting the issue of a 3-dimensional rotation. We have mentioned before that there
are three coordinate systems in the camera: Two of those are 3-dimensional and the third one is
2-dimensional. The sensor coordinate system with its origin at projection center O and the world
coordinate system (X, Y, Z) need to be related via a 3-dimensional transformation. Slide 9.100
suggests we have 3 angles that define the relationship between the 2 coordinate systems. Each of
those angles represents a 2-dimensional rotation around 1 of the 3 world coordinate axes. Those
are angles in 3D space.
Recall that we have several definitions of rotation matrices and that we can define a rotation
matrix by various geometric entities. These can be rotations around axes that rotate themselves,
or they can be angles in 3-D space subtended by the original axes and the rotated axes. Slide
9.100 describes the first case. Θ is the angle of rotation around the axis Z, but in the process we
will rotate axes X and Y. φ is a rotation around the axis X and will obviously take with it the
axes Z and Y. And A then is a rotation around the rotated axis Z. Conceptually, everything
we said earlier about 3-dimensional transformations, rotations and so forth applies here as well.
Our earlier discussion of the 3D conformal transformation applies to the central projection, and the
central projection really is mathematically modeled by the 3-dimensional conformal transformation, which elevates
that particular projection to a particularly important role.
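A minimal sketch of projecting a world point through a projection center into image coordinates (xi, eta); the sign convention, the camera constant value and the perfectly aligned rotation matrix are assumptions chosen for illustration.

import numpy as np

def central_projection(P, O, R, c):
    """Project world point P through projection center O into image coordinates (xi, eta).
    R rotates the world axes into the sensor coordinate system, c is the camera constant."""
    p = R @ (P - O)                  # point expressed in the sensor coordinate system
    xi = -c * p[0] / p[2]            # collinearity idea: scale by camera constant over depth
    eta = -c * p[1] / p[2]
    return xi, eta

R = np.eye(3)                         # camera axes aligned with the world axes (assumption)
print(central_projection(np.array([10.0, 5.0, 100.0]), np.zeros(3), R, c=0.05))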
9.16 The Synthetic Camera
We have various places in our class in which we suggest the use of a synthetic camera. We
have applications in computer graphics in order to create a picture on a display medium, on a
monitor, for augmented or virtual reality. We have it in image processing and photogrammetry
to reconstruct the world from images and we have terminology that has developed separately as
follows. What is called a projection plane or image plane is in computer graphics called a View
Plane. What in image processing is a projection center is in computer graphics a View Reference
Point VRP. And what in image processing is the optical axis or the camera axis is in computer
graphics the View Plane Normal VPN. Slide 9.102 and Slide 9.103 explain this further. We do
have again a lens center and an image plane and an optical axis Z that is perpendicular to the
image plane, which itself is defined by coordinate axes X and Y. An arbitrary point in object
space (X, Y, Z) is projected through the lens center onto the image plane. Note that in a synthetic
camera we do not worry much about fine points such as an image coordinate system defined by
fiducial marks or the difference between the points M and H (M being the origin of the image
coordinate system and H being the intersection point of the line normal to the image plane and
passing through the lens center).
In robotics we typically use cameras that might use rotations around very particular axes. Slide
9.103 defines the world coordinate system with (X, Y, Z) and defines a point or axis of rotation
in the world coordinate system at the end of vector w0 at location (X0 , Y0 , Z0 ). That location of
an axis of rotation then defines the angle under which the camera itself is looking at the world.
The camera has coordinate axes (x, y, z) and an optical axis in the direction of coordinate axis z.
The image coordinates are 2-dimensional with an origin at the center of the image and that point
itself is defined by an auxiliary vector r with respect to the point of rotation. So we see that we
have various definitions of angles and coordinate systems and we always need to understand these
coordinate systems and convert them into one another.
Slide 9.104 explains this further: We do have a camera looking at the world, again we have an image
coordinate system (x, y), and a sensor system (x, y, z) that are defined in the world coordinate
system (X, Y, Z). As we want to define where a camera is in the world coordinate system and in
which direction its optical axis is pointing we have to build up a transformation just as we did
previously with the 3-dimensional conformal transformation.
Let us assume that we start out with a perfect alignment of our camera in the world coordinate
system so that the sensor coordinate axes x, y, z and the world coordinate axis X, Y , Z are
coinciding. We now move the camera into an arbitrary position which represents the translation
in 3-D space defined, if you recall, by the translational vector t. Then we orient the camera by
rotating it essentially around 3 axes into an arbitrary position. A first rotation may be, as suggested
in Slide 9.105, around the z axis, which represents the angle A in Slide ??. In this slide it is
suggested that the angle is 135°. Next we roll the camera around the x axis, again by an
angle of 135°, and instead of having the camera looking up into the sky we now have it look down at
the object. Obviously we can apply a third rotation around the rotated axis y to give our camera
attitude complete freedom. We now have a rotation matrix that will be defined by those angles of
rotation that we just described, we have a translation vector as described earlier. Implied in all of
this is also a scale factor. We have not discussed yet the perspective center and the image plane.
Obviously, as the distance grows, we go from a wide-angle through a normal-angle to a tele-lens
and that will affect the scale. So the scale of the image is affected by the distance of the camera
from the object and also by the distance of the projection center from the image plane.
Note that we need 7 elements to describe the transformation that we have seen in Slide 9.105. We
need 3 elements of translation, we have 3 angles of rotation, and we have one scale factor that is
defined by the distance of the projection center from the image plane. Those are exactly the same 7
transformation parameters that we had earlier in the 3-dimensional conformal transformation.
Exam questions:
• Given the 4 × 4 matrix

M = [ 8  0  8  -24 ]
    [ 0  8  8   8  ]
    [ 0  0  0   24 ]
    [ 0  0  1   1  ]

and the four points

p1 = (3, 0, 1)^T
p2 = (2, 0, 7)^T
p3 = (4, 0, 5)^T
p4 = (1, 0, 3)^T

in three-dimensional space. The matrix M combines all transformations required to convert a
point p in world coordinates into the corresponding point p' = M · p in device coordinates (see
also Figure B.36; the screen plane and therefore the y axis are normal to the drawing plane).
Applying the transformation matrix M maps the points p1 and p2 onto the points

p'1 = (4, 8, 12)^T
p'2 = (6, 8, 3)^T

in device coordinates. Compute p'3 and p'4 in the same way.

Answer: we have

p̃'1 = (8, 16, 24, 2)^T   ⇒  p'1 = (4, 8, 12)^T
p̃'2 = (48, 64, 24, 8)^T  ⇒  p'2 = (6, 8, 3)^T
p̃'3 = (48, 48, 24, 6)^T  ⇒  p'3 = (8, 8, 4)^T
p̃'4 = (8, 32, 24, 4)^T   ⇒  p'4 = (2, 8, 6)^T

9.17 Stereopsis
This is a good time to introduce the idea of stereopsis, although we will have a separate chapter
on it later in this class. The synthetic camera produces an image that we can look at with one eye,
and if we produce a second image and show it to the other eye we will be able to "trick" the eye
into a 3-dimensional perception of the object that was imaged (Slide 9.107). We model binocular
vision by two images: we compute, or present to the eyes, two existing natural images of the object,
separately one image to one eye and the other image to the other eye. Those images can be taken
by one camera placed in two locations or there can be synthetic images computed with a synthetic
camera. Slide 9.108 explains further that our left eye is seeing point Pleft , the right eye is seeing
point Pright , and in the brain those 2 observations are merged in a 3-dimensional location P .
Slide 9.109 illustrates that a few rules need to be considered when creating images for stereoscopic
viewing. Image planes and the optical axis for the 2 images should be parallel. Therefore, one
should not create two images with converging optical axes. This would be inconsistent with natural
human viewing. Only people who squint (in German: schielen) will have converging optical axes. Normal stereoscopic
viewing would create a headache if the images were taken with converging optical axes.
We call the distance between the two lens centers for the two stereoscopic images the stereo base
B. Slide 9.110 shows the same situation in a top view. We have the distance from the lens center
to the image plane which is typically noted as the camera constant or focal length and an object
point W which is projected into image locations (X1 , Y1 ) and (X2 , Y2 ), and we have the two optical
axes Z parallel to one another and perpendicular to XY .
Note that we call the ratio of B to the distance to W also the base/height ratio, this being a measure
of quality for the stereo view. If we compute a synthetic image from a 3-dimensional object for
the left and the right eye we might get a result as shown in Slide 9.111, which indeed can be viewed
stereoscopically under a stereoscope.
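A small sketch of the standard normal-case relation between stereo base, camera constant and x-parallax; this formula is not derived in the text above and is added here as a hedged illustration, with invented numbers.

def depth_from_parallax(base, camera_constant, x_left, x_right):
    """Normal-case stereo (parallel optical axes, stereo base B):
    the x-parallax p = x_left - x_right gives the object distance Z = c * B / p."""
    p = x_left - x_right
    return camera_constant * base / p

# Example: B = 0.6 m, c = 0.15 m, image coordinates in metres.
print(depth_from_parallax(0.6, 0.15, 0.0120, 0.0105))   # -> 60.0 (metres)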
To make matters a little more complicated yet, it turns out that a human can view stereoscopically
two images that do not necessarily have to be made by a camera under a central perspective
projection. As long as the two images are similar enough in radiometry and if the geometric
differences are not excessive, the human will be able to merge the two images into a 3-dimensional
impression. This factor has been used in the past to represent measurements in 3 dimensions,
for example, temperature. We could encode temperature as a geometric difference in 2 otherwise
identical images and we would see a 2-dimensional scene and temperature would be shown as
height. This and similar applications have in the past been implemented by various researchers.
9.18 Interpolation versus Transformation
One may want to transfer an object such as a distorted photo in Slide 9.113 into an output
geometry. This can be accomplished by a simplified transformation based for example on 4 points.
This will reveal errors (distortions) in other known points (see Slide 9.114 and Slide 9.115). These
errors can be used to interpolate a continuous error function dx (x, y), dy (x, y) which must be
applied to each (x, y) location:
x' = x + dx(x, y)
y' = y + dy(x, y)
We have replaced a complicated transformation by a much simpler transformation plus an interpolation. Question: What is the definition of interpolation?
9.19 Transforming a Representation

9.19.1 Presenting a Curve by Samples and an Interpolation Scheme
We may want to represent an object in various ways. We may have a continuous representation
of an object or we might sample that object and represent the intervals between samples by some
kind of interpolation and approximation technique. So we have conceptually something similar
to a transformation because we have two different ways of representing an object. Slide 9.117
introduces the basic idea of an object that is described by a set of points p1, p2, . . . , pn. If we are in a 2-dimensional space we may want to represent that object not by n points but by a mathematical
curve. In 3-dimensional space it may be a surface that represents a set of points: we transform from
one representation into another.
A second item is that an object may not be given by points, but by a set of curves x = fx (t),
y = fy (t), and z = fz (t). We would like to replace this representation by another mathematical
representation which may be more useful for certain tasks.
Again while we are going to look at this basically in 2 dimensions or for curves, a generalization
into 3 dimensions and to surfaces always applies.
9.19.2 Parametric Representations of Curves
We introduce the parametric representation of a curve. We suggest in Slide 9.120 that the 2-dimensional
curve Q in an (x, y) Cartesian coordinate system can be represented by two functions,
Q = (x(t), y(t)). We note this as a parametric representation. The parameter t typically can be
the length of the curve: as we proceed along the curve, the coordinate x and the coordinate y
change as functions of the curve length t. More typically, t may be "time" for a point to
move along the curve. The advantage of a parametric representation is described in Slide 9.120.
The tangent is replaced by a tangent vector Q'(t) = (dx(t)/dt, dy(t)/dt). That vector has a direction and
a length.
9.19.3 Introducing Piecewise Curves
We may also not use a representation of the function x(t) or y(t) with a high order polynomial
but instead we might break up the curve into individual parts, each part being a polynomial of
third order (a cubic polynomial). We connect those polynomials at joints by forcing continuity at
the joints.
If a curve is represented in 3D-space by the equations x(t), y(t), and z(t) as shown in Slide 9.121,
we can request that at the joints those polynomial pieces be continuous in the function but are
also continuous in the first derivative or tangent. We may even want to make it continuous in
the curvature or second derivative. Note that this kind of geometric continuity is less strict than
continuity in "speed" and acceleration, if t is interpreted as time.
One represents such a curve by a function Q(t) which is really a vector function (x(t), y(t), z(t)).
9.19.4 Rearranging Entities of the Vector Function Q
In accordance with the equation of Slide 9.121, Q(t) can be represented as a multiplication of a
(row) vector T and a coefficient matrix C where T contains the independent parameter t as the
coefficient of the unknowns ax , bx , and cx . Matrix C can now be decomposed into M · G. As a
result we can write that Q(t) = T · M · G and we call now G a geometry vector and M is called
a basis matrix . We can introduce a new entity, a function B = T · M and those are the cubic
polynomials or the so-called blending functions.
Exam questions:
• What are the "geometry vector", the "basis function", and the "blending functions" of a
parametric curve representation?
Answer:

x(t) = a_x t^3 + b_x t^2 + c_x t + d_x
y(t) = a_y t^3 + b_y t^2 + c_y t + d_y
z(t) = a_z t^3 + b_z t^2 + c_z t + d_z
Q(t) = (x(t), y(t), z(t))^T = T · C
T    = (t^3, t^2, t, 1)

One decomposes C into C = M · G, so that

Q(t) = T · C = T · M · G

with G as the geometry vector and M as the basis matrix. Furthermore,

B = T · M

are cubic polynomials, the blending functions. (A small numeric sketch follows below.)
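As a small numeric sketch of Q(t) = T · M · G, the following uses the basis matrix of the cubic Bezier form (a standard choice, related to the Bezier approach of Section 9.20; it is only one possible basis matrix, and the control points are invented).

import numpy as np

# Basis matrix M of the cubic Bezier form (standard values, used here only as an example).
M_basis = np.array([[-1.0,  3.0, -3.0, 1.0],
                    [ 3.0, -6.0,  3.0, 0.0],
                    [-3.0,  3.0,  0.0, 0.0],
                    [ 1.0,  0.0,  0.0, 0.0]])

# Geometry vector G: the four defining points, one row per point (x, y).
G = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])

def curve_point(t):
    T = np.array([t**3, t**2, t, 1.0])   # parameter row vector
    B = T @ M_basis                       # blending functions B = T * M
    return B @ G                          # Q(t) = T * M * G

print(curve_point(0.0), curve_point(0.5), curve_point(1.0))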
9.19.5 Showing Examples: Three Methods of Defining Curves
Slide 9.122 introduces three definitions of curves that are frequently used in engineering. Let’s
take a look at an example in Slide 9.123. In that slide we have a continuous curve represented
by 2 segments S and C. They are connected at a joint. Depending on the tangent vector at the
joint we may have different curves. Illustrated in Slide 9.123 are 3 examples C0 , C1 , and C2 . C0
is obtained if we simply enforce at the joint that the function be continuous, but we don't worry
about the tangent vectors having the same direction. C1 results if we say that the function
has to have the same derivative. C2 further defines that also the length of the tangent must be identical at the
joint. So we have 3 different types of continuity at the joint: function, velocity, acceleration. This
type of continuity is narrower than mere geometric continuity with function, slope and curvature.
In computer graphics one describes the type of continuity by the direction and the length of the
tangent vector. Slide 9.124 again illustrates how a point P2 is the joint between curve segments
Q1 and Q2 , two curves passing through P1 , P2 , Pi and P3 . Defining two different lengths for the
tangent (representing velocity) leads to two different curve segments Q2 , Q3 .
Slide 9.125 describes a curve with two segments joining at point P . We indicate equal time
intervals, showing a “velocity” that reduces as we approach point P . At point P we change
direction and accelerate. In this case of course, the function is continuous but as shown in that
example, the tangent is not continuous. We have a discontinuity in the first derivative at point P .
9.19.6 Hermite's Approach
There is a concept in the representation of curves by means of cubic parametric equations called
Hermite curves. We start out with the beginning and end point of a curve and the beginning
and end tangent vector of that curve, and with those elements we can define a geometry vector G
as discussed earlier in Slide 9.121. Slide 9.127 explains several cases where we have a beginning
and end point of a curve defined, associated with a tangent vector and as a result we can now
describe a curve. Two points and two tangent vectors define four elements of a curve. In 2D space
this is a third order or cubic curve with coefficients a, b, c and d. Slide 9.128’s curves are basically
defined by 2 points and 2 tangent vectors. Since the end point of one curve is identical to the
beginning point of the next, we obtain a continuous curve. The tangent vectors are parallel but
point into opposite directions. Geometrically we are continuous in the shape, but the vertices are
opposing one another. This lends itself to describing curves by placing points interactively on a
monitor with a tangent vector. This is being done in constructing complex shapes, say in the car
industry where car bodies need to be designed. A particular approach to accomplishing this has
been proposed by Bezier.
9.20 Bezier's Approach
Pierre Bezier worked for a French car manufacturer and invented an approach to designing 3-dimensional shapes, but we will discuss this in 2 dimensions only. He wanted to represent a smooth
curve by means of 2 auxiliary points which are not on the curve. Note that so far we have had
curves go through our points, and Bezier wanted a different approach. So he defined 2 auxiliary
points for a curve and the directions of the tangent vectors. Slide 9.130 defines the beginning and
end points, P1 and P4 and the tangent at P1 using an auxiliary point P2 and the tangent at P4 by
using an auxiliary point P3 . By moving P2 and P3 one can obtain various shapes as one pleases,
passing through P1 and P4 .
Definition 26 Bezier curves in 2D
Given defined points P0 through Pn that are to be approximated by a curve, the corresponding
Bezier curve is:

P(t) = sum_{i=0}^{n} B_i^n(t) · P_i ,   0 ≤ t ≤ 1

The basis functions, called Bernstein polynomials, are:

B_i^n(t) = C(n, i) · t^i · (1 - t)^(n-i)   with   C(n, i) = n! / (i! (n - i)!)

They can also be computed recursively. Bezier curves have the following properties:
• they are polynomials (in t) of degree n when n+1 points are given,
• they lie within the convex hull of the defining points,
• they start at the first point P0 and end at the last point Pn, and
• all points P0 through Pn influence the course of the curve.
Slide 9.131 illustrates the mathematics behind it. Obviously, we have a tangent at P1 denoted as
R1 , which is according to Bezier 3 · (P2 − P1 ). The analogous applies to tangent R4 . If we define
tangents in that way, we then obtain a third order parametric curve Q(t) as shown in Slide 9.131.
Slide 9.132 recalls what we have discussed before, how these cubic polynomials for a parametric
representation of a curve or surface can be decomposed into a geometric vector and a basis matrix
and how we define a blending function. Slide 9.133 illustrates geometrically some of those blending
functions for Bezier. Those particular ones are called Bernstein-curves.
Now let’s proceed in Slide 9.134 to the construction of a complicated curve that consists of 2
polynomial parts. We therefore need the beginning and end point for the first part, P1 and P4 ,
and the beginning and end point for the second part which is P4 and P7 . We then need to have
auxiliary points P2 , P3 , P5 and P6 to define the tangent vectors at P1 , P4 , P7 . P3 defines the
tangent at P4 for the first curve segment and P5 defines the tangent at point P4 for the second
segment. We are operating here with piecewise functions. If P3, P4, and P5 are collinear, then
the curve is geometrically continuous. Study Slide 9.134 for details.
Exam questions:
• What is the basic idea behind the construction of 2-dimensional "Bezier curves"?
• Describe the difference between the interpolation and the approximation of curves, and
explain an approximation method of your choice using a sketch.
9.21 Subdividing Curves and Using Spline Functions
We can generalize the ideas of Bezier and other people and basically define spline functions (in German: Biegefunktionen)
as functions that are defined by a set of data points P1 , P2 , . . . , Pn to describe an object and
we approximate the object by piecewise polynomial functions that are valid on certain intervals.
In the general case of splines the curve does not necessarily have to go through P1 , P2 , . . . , Pn .
Algorithm 24 Casteljau
{Input: array p[0:n] of n+1 points and real number u}
{Output: point on curve, p(u)}
{Working: point array q[0:n]}

for i := 0 to n do
    q[i] := p[i]        {save input}
end for
for k := 1 to n do
    for i := 0 to n - k do
        q[i] := (1 - u) q[i] + u q[i + 1]
    end for
end for
return q[0]
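A runnable version of Algorithm 24 might look as follows; the control points and the parameter value are arbitrary examples (u = 1/3 echoes the exam question at the end of this section).

def casteljau(points, u):
    """Evaluate the Bezier curve defined by the control points at parameter u (Algorithm 24)."""
    q = [p for p in points]                        # save input
    n = len(points) - 1
    for k in range(1, n + 1):
        for i in range(0, n - k + 1):
            q[i] = tuple((1 - u) * a + u * b for a, b in zip(q[i], q[i + 1]))
    return q[0]

# Cubic example with four (hypothetical) control points.
print(casteljau([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)], 1.0 / 3.0))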
We need to define the locations of the joints, and the type of continuity we want. Note that we
abandon here the use of a parametric representation.
Let us examine the idea that our points describing an object may be in error, for example those
points may be reconstructions from photographs taken of an object using a stereo reconstruction
process. Because the points may be in error and therefore be noisy, we do not want the curve
or surface to go through the points. We want an approximation of the shape. In that case we
need to have more points than we have unknown parameters of our function. In the Least Squares
approach discussed earlier, we would get a smooth spline going nearly through the points. Slide
9.136 illustrates the idea of a broken-up curve and defines a definition area for each curve between
joints P2 , P3 and P3 , P4 . We enforce continuity of the curves at the joints, for example by saying
that the tangent has to be identical. A spline that goes exactly through the data points is different
from the spline that approximates the data points only. Note that the data points are called control
points (in German: Pass-Punkte).
Of course the general idea of a spline function can be combined with Bezier curves as suggested in
Slide 9.137. For added flexibility we want to replace a single Bezier curve by two Bezier
curves which are defined on a first and a second part of the original Bezier curve. We solve this
problem by finding auxiliary points and tangents such that the conditions apply, by proportionally
segmenting distances as shown.
Slide 9.139 illustrates the process. The technique is named after the French engineer Casteljau.
The single curve defined by P1, P4 (and auxiliary points P2, P3) is broken into two smaller curves,
one defined by L1, . . . , L4 and another curve defined by R1, . . . , R4.
Spline functions of a special kind exist if we enforce that the tangents at the joint are parallel to
the line going through adjacent neighboring joints. Slide 9.140 explains. The technique is named
after Catmull-Rom.
Exam questions:
• In Figure B.26 you see four points P1, P2, P3 and P4 that are used as control points for
a third-order Bezier curve x(t). Using the Casteljau procedure, construct the curve point for
the parameter value t = 1/3, i.e. x(1/3), and explain the construction process. You may enter
the result directly in Figure B.26; a sketch-like representation is sufficient.
Hint: the algorithm that is used here is the same one that is also used for subdividing a
Bezier curve (for the purpose of more flexible editing).
Answer: The segments are to be divided recursively in the ratio 1/3 : 2/3 (see Figure 9.2).
Figure 9.2: Construction of a Bezier curve following Casteljau
9.22 Generalization to 3 Dimensions
Slide 9.142 suggests a general idea of taking the 2-dimensional discussions we just had and transporting them into 3 dimensions. Bezier, splines and so forth, all exist in 3-D as well. That in
effect is where the applications are. Instead of having coordinates (x, y) or parameters t we now
have coordinates (x, y, z) or parameters t1 , t2 . Instead of having points define a curve we now have
a 3-dimensional arrangement of auxiliary points that serve to approximate a smooth 3D-surface.
9.23 Graz and Geometric Algorithms
On a passing note, a disproportional number of people who have been educated at the TU Graz
have become well-known and respected scientists in the discussion of geometric algorithms. Obviously, Graz has been a hot bed of geometric algorithms. Look out for classes on “Geometric
Algorithms”. Note that these geometric algorithms we have discussed are very closely related to
mathematics and really are associated with theoretical computer science and less so with computer
graphics and image processing. The discussion of curves and surfaces also is a topic of descriptive
geometry. In that context one speaks of "free-form curves and surfaces". Look out for classes on
that subject as well!
[Slides 9.1 through 9.143 are collected at the end of this chapter.]
Chapter 10
Data Structures
10.1 Two-Dimensional Chain-Coding
Algorithm 25 Chain coding

resample boundary by selecting larger grid spacing
starting from top left, search the image rightwards until a pixel P[0] belonging to the region is found
initialize orientation d with 1 to select northeast as the direction of the previous move
initialize isLooping with true
initialize i with 1
while isLooping do
    search the neighbourhood of the current pixel for another unvisited pixel P[i] in a clockwise
    direction beginning from (d + 7) mod 8, increasing d at every search step
    if no unvisited pixel found then
        set isLooping false
    else
        print d
    end if
    increase i
end while
We start from a raster image of a linear object. We are looking for a compact and economical
representation by means of vectors. Slide 10.3 illustrates the 2-dimensional raster of a contour
image, which is to be encoded by means of a chain-code. We have to make a decision about the
level of generalization or elimination of detail. Slide 10.4 describes the 4 and 8 neighborhood for
each pixel and indicates by a sequence of numbers how each neighbor is labeled as 1, 2, 3, 4, . . . , 8.
Using this approach, we can replace the actual object by a series of pixels and in the process
obtain a different resolution. We have resampled the contour of the object. Slide 10.6 shows how
a 4-neighborhood and an 8-neighborhood will serve to describe the object by a series of vectors,
beginning at an initial point. The encoding itself is represented by a string of integer numbers.
Obviously we obtain a very compact representation of that contour.
Next we can think of a number of normalizations of that coding scheme. We may demand that
the sum of all codes be minimized. Instead of recording the codes themselves to indicate in which
direction each vector points, we can look at code differences only, which would have the advantage
that they are invariant under rotations.
Obviously the object will look different if we change the direction of the grid at which we resample
195
196
CHAPTER 10. DATA STRUCTURES
the contour. An extensive theory of chain codes has been introduced by H. Freeman and one of
the best-known coding schemes is therefore also called the Freeman-Chain-Code.
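A small sketch of chain coding and of the rotation-invariant code differences mentioned above; the direction numbering 0–7 (starting east, proceeding counter-clockwise in image coordinates) is an assumed convention and differs from the labels 1–8 used on the slide, and the contour is invented.

# Moves of the 8-neighborhood, numbered 0..7 starting east, counter-clockwise
# (image coordinates: the row index grows downward).
DIRECTIONS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
              (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code(boundary):
    """Chain code of a boundary given as an ordered list of (col, row) pixels."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]

def rotation_invariant(code):
    """First differences of the chain code, invariant under rotation of the object."""
    return [(b - a) % 8 for a, b in zip(code, code[1:] + code[:1])]

contour = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0)]
c = chain_code(contour)
print(c, rotation_invariant(c))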
Exam questions:
• Given a sequence of points as in Figure ?? and a pixel raster as shown in
Figure ??. Give, both graphically and numerically, the compact chain coding of this point
sequence in the pixel raster that is obtained with an 8-code.
10.2 Two-Dimensional Polygonal Representations
Algorithm 26 Splitting

1. Splitting methods work by first drawing a line from one point on the boundary to another.
2. Then, we compute the perpendicular distance from each point along the segment to the line.
3. If this exceeds some threshold, we break the line at the point of greatest error.
4. We then repeat the process recursively for each of the two new lines until we don't need to break any more.
5. For a closed contour, we can find the two points that lie farthest apart and fit two lines between them, one for one side and one for the other.
6. Then, we can apply the recursive splitting procedure to each side.
Let us assume that we do have an object with an irregular contour as shown in Slide 10.9 on the
left side. We describe that object by a series of pixels and the transition from the actual detailed
contour to the simplification of a representation by pixels must follow some rules. One of those
is a minimum-perimeter rule, which takes the idea of a rubber band that is fit along the contour
pixels as shown on the right-hand side of Slide 10.9.
At issue is many times the simplification of a shape in order to save space, while maintaining the
essence of the object. Slide 10.10 explains how one may replace a polygonal representation of an
object by a simplified minimum quadrangle. One will look for the longest distance that can be
defined from points along the contour of the object. This produces a line segment ab. We then
further subdivide that shape by looking for the longest line that is perpendicular to the axis that
we just found. This produces a quadrangle. We can now continue on and further refine this shape
by a simplifying polygon defining a maximum deviation between the actual object contour and its
simplification. If the threshold value is set at 0.25 then we obtain the result shown in Slide 10.10.
The process is also denoted as splitting (algorithm 26).
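A minimal Python sketch of this recursive splitting idea (essentially the Douglas-Peucker scheme; the sample contour and the threshold are illustrative assumptions, not the data of Slide 10.10):

    import math

    def point_line_distance(p, a, b):
        """Perpendicular distance of point p from the line through a and b."""
        (px, py), (ax, ay), (bx, by) = p, a, b
        num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
        den = math.hypot(bx - ax, by - ay)
        return num / den if den else math.hypot(px - ax, py - ay)

    def split(points, threshold):
        """Recursive splitting of an open polyline between its first and last point."""
        if len(points) <= 2:
            return points
        # find the point with the greatest perpendicular distance from the chord
        dists = [point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
        i = max(range(len(dists)), key=dists.__getitem__) + 1
        if dists[i - 1] <= threshold:
            return [points[0], points[-1]]          # no further splitting needed
        left = split(points[:i + 1], threshold)     # recurse on both halves
        right = split(points[i:], threshold)
        return left[:-1] + right                    # avoid duplicating the split point

    if __name__ == "__main__":
        contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 2), (4, 2.1), (5, 0)]
        print(split(contour, threshold=0.25))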
Prüfungsfragen:
• Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie
einen Schritt des Algorithmus im Detail anhand Ihrer Zeichnung! Wählen Sie den Schwellwert so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann
vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung B.35 einzeichnen.
Definition 27 2D morphing for lines
Problems with other kinds of representation can be taken care of by the parametric representation. In the parametric representation a single parameter t can represent the complete straight line once the starting and ending points are given:
x = X(t), y = Y(t)
For a starting point (x1, y1) and an ending point (x2, y2),
(x, y) = (x1, y1) if t = 0
(x, y) = (x2, y2) if t = 1
Thus any point (x, y) on the straight line joining the two points (x1, y1) and (x2, y2) is given by
x = x1 + t(x2 − x1)
y = y1 + t(y2 − y1)
10.3 A Special Data Structure for 2-D Morphing
Suppose the task is defined as in Slide 10.13 where an input figure, in this particular case a cartoon
of President Bush, needs to be transformed into an output figure, namely the cartoon of President
Clinton. The approach establishes a relationship between the object contour points of the input
and output cartoons. Each point on the input cartoon will correspond to one or no point on the
output cartoon. In order to morph the input into the output, one takes the vectors which link these corresponding points. We introduce a parametric representation x = fx(t), y = fy(t) and gradually increase the value of the parameter t from 0 to 1. At t = 0 one has the Bush cartoon, at t = 1 one has the Clinton cartoon. The transition can be illustrated in as many steps as one likes. The basic concept is shown in Slide ?? and Slide 10.14
and the result is shown in Slide 10.15.
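A small Python sketch of this interpolation (it assumes both contours are already given as vector data with the same number of corresponding points; the sample coordinates are purely illustrative):

    def morph(src, dst, t):
        """Interpolate each corresponding point pair for a parameter t in [0, 1]."""
        return [(x1 + t * (x2 - x1), y1 + t * (y2 - y1))
                for (x1, y1), (x2, y2) in zip(src, dst)]

    if __name__ == "__main__":
        src = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0)]   # input figure (t = 0)
        dst = [(1.0, 1.0), (3.0, 1.0), (2.5, 4.0)]   # output figure (t = 1)
        for t in (0.0, 0.25, 0.5, 0.75, 1.0):        # as many in-between steps as desired
            print(t, morph(src, dst, t))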
Prüfungsfragen:
• In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in
eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder
als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen werden benötigt? Erläutern Sie Ihre Antwort anhand einer beliebigen Strecke
aus Abbildung B.3!
10.4 Basic Concepts of Data Structures
For a successful data structure we would like to have direct access to the data, independent of how big the data base is. We may use simple arrays with sequentially stored data, or pointer-based structures, thus pointers, chains, trees, and rings. All of this is applicable to geometric data represented by coordinates. Slide 10.17 illustrates how we can build a directed graph of geometric entities: at the base are points in 3-dimensional space with coordinates x, y, z. From those points we produce lists of edges, each combining two points into an edge. From the edges one builds regions or areas, which combine edges into contours of areas.
Slide 10.18 shows that we also request ease of dynamic changes in the data, so that we can insert or delete points, objects or areas. We would also like to be able to change a visualization dynamically: if we delete an object we should not be required to recompute everything. We would like to have support for a hierarchical approach, so that we can look at an overview as well as at detail. We would like to be able to group objects into hyper-objects, and we need random access to arbitrary objects, independent of the number of objects in the data base. Let us now examine a few data structures.
Prüfungsfragen:
• Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch
seine (polygonale) Oberfläche genutzt werden kann!
10.5 Quadtree
Algorithm 27 Quadtree
{define data structure quadtree}
quadtree = (SW, SE, NW, NE: pointer to quadtree, value)
{SW south-western son, SE south-eastern son}
{NW north-western son, NE north-eastern son}
{value holds e.g. brightness}
init quadtree = (NULL, NULL, NULL, NULL, 0)
while the entire image has not been segmented do
  segment the currently processed area into 4 squares
  if there is no element of the object left in a subdivided square then
    link a leaf to the quadtree according to the currently processed square
    {leaf = quadtree(NULL, NULL, NULL, NULL, value)}
  else
    link a new node to (SW or SE or NW or NE) of the former quadtree according to the currently processed square
    {node = quadtree(SW, SE, NW, NE, 0)}
    if the node holds four leaves containing the same value then
      replace the node with a leaf containing that value
    end if
  end if
end while
A quadtree is a tree data structure for 2-dimensional graphical data, where we subdivide the root,
the 2-dimensional space, into squares of equal size, so we subdivide an entire area into 4 squares,
we subdivide those 4 squares further into 4 squares and so forth. We number each quadrant as
shown in Slide 10.20. Now if we have an object in an image or in a plane we describe the object
by a quadtree by breaking up the area sequentially, until such time that there is no element of the
object left in a subdivided square. In this case we call this a leaf of the tree structure, an empty
leaf. So we have as a node a quadrant, and each quadrant has four pointers to its sons. The sons
will be further subdivided until each quadrant is either entirely filled with the object or entirely empty.
A slight variation of the quadtree is the bin-tree, in which each node has only two sons instead of four. Slide 10.21 explains.
For a mechanical part such as the one shown in Slide 10.22, the pixel representation may be shown on the left and the quadtree representation at right; the quadtree is more efficient. There is an entire literature on geometric operations in quadtrees, such as geometric transformations, scale changes, editing, visualization, Boolean operations and so forth. Slide 10.23 represents the mechanical part of Slide 10.24 in a quadtree representation.
A quadtree obviously has “levels of subdivision”, and its root is at the highest level, with a single node. The next level is shown in Slide 10.24 and has one empty and three full nodes, which are further subdivided into a third level with some empty and some full leaves and some nodes that are further subdivided into a fourth level. The leaves are numbered sequentially from north-west to south-east. Slide 10.25 again illustrates how a raster image with pixels of equal area is converted into a quadtree representation. It is more efficient since there are fewer leaves in a quadtree than there are pixels in an image, except when the image is totally chaotic.
One may store all leaves, whether they are empty or full, or one may store only the full leaves, thereby saving storage space. Typically this may save 60 percent, as in the example of Slide 10.26.
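A minimal Python sketch of the quadtree construction for a square binary raster (the leaf/node encoding as Python values and dictionaries, as well as the sample image, are my own assumptions):

    def build_quadtree(img, x=0, y=0, size=None):
        """Return 0/1 for a uniform square, otherwise a node with four sons."""
        size = size or len(img)
        values = {img[y + j][x + i] for j in range(size) for i in range(size)}
        if len(values) == 1:                       # square entirely empty or full: a leaf
            return values.pop()
        half = size // 2
        return {"NW": build_quadtree(img, x, y, half),
                "NE": build_quadtree(img, x + half, y, half),
                "SW": build_quadtree(img, x, y + half, half),
                "SE": build_quadtree(img, x + half, y + half, half)}

    if __name__ == "__main__":
        img = [[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 1, 1, 1],
               [1, 1, 1, 1]]
        print(build_quadtree(img))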
Prüfungsfragen:
• Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung
dieses Bildes. Ich bitte Sie, einen sogenannten „traditionellen“ Quadtree der Abbildung B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen.
• Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält
sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen?
10.6 Data Structures for Images
So far we have looked at data structures for binary data, showing objects by means of their
contours, or as binary objects in a raster image. In this chapter, we are looking at data structures
for color and black and white gray value images. A fairly complete list of such data structures can
be seen in PhotoShop (Slide 10.28 and Slide 10.29). Let us review a few structures as shown in
Slide 10.30.
We can store an image pixel by pixel, where all information that belongs to a pixel is stored sequentially; or row by row, where we repeat, say, red, green and blue for each image row; or band sequential, which means we store one complete image for the red, one for the green and one for the blue channel. Those forms are called BSSF or BIFF (Band Sequential File Format or similar). The next category is the TIFF format, a tagged image file format; another option is to store images in tiles, in little 32 by 32 or 128 by 128 pixel windows.
The idea of hexagonal pixels has been proposed. An important idea is that of pyramids, where a
single image is reproduced at different resolutions, and finally representations of images by fractals
or wavelets and so forth exist. Slide 10.31 illustrates the idea of an image pyramid. The purpose
of pyramids is to start an image analysis process on a much reduced version of an image, e.g. to
segment it into its major parts and then guide a process which refines the preliminary segmentation
from resolution level to resolution level. This increases the robustness of an approach and also
reduces computing times. At issue is how one takes a full resolution image and creates from it
reduced versions. This may be by simple averaging or by some higher level processes and filters
that create low resolutions from neighborhoods of higher resolution pixels.
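A minimal Python sketch of such a pyramid, built here by simple 2x2 averaging (the image data is an illustrative assumption; real systems may use better reduction filters):

    def reduce_by_two(img):
        """Halve a square gray-value image by averaging 2x2 pixel blocks."""
        n = len(img) // 2
        return [[(img[2*r][2*c] + img[2*r][2*c+1] +
                  img[2*r+1][2*c] + img[2*r+1][2*c+1]) // 4
                 for c in range(n)] for r in range(n)]

    def pyramid(img):
        levels = [img]
        while len(levels[-1]) > 1:
            levels.append(reduce_by_two(levels[-1]))
        return levels                      # full resolution first, 1x1 image last

    if __name__ == "__main__":
        img = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
        for level in pyramid(img):
            print(level)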
Slide 10.32 suggests that data structures for images are important in the context of image compression and we will address that subject under the title “Compression” towards the end of this
class.
Prüfungsfragen:
• In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das
erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht
200
CHAPTER 10. DATA STRUCTURES
nur mehr aus einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo
wird sie eingesetzt (nennen Sie mindestens ein Beispiel)?
• In Aufgabe B.1 wurde nach einer Bildrepräsentation gefragt, bei der ein Bild wiederholt
gespeichert wird, wobei die Seitenlänge jedes Bildes genau halb so groß ist wie die Seitenlänge
des vorhergehenden Bildes. Leiten Sie eine möglichst gute obere Schranke für den gesamten
Speicherbedarf einer solchen Repräsentation her, wobei
– das erste (größte) Bild aus N × N Pixeln besteht,
– alle Bilder als Grauwertbilder mit 8 Bit pro Pixel betrachtet werden,
– eine mögliche Komprimierung nicht berücksichtigt werden soll!
Hinweis: Benutzen Sie die Gleichung Σ_{i=0}^{∞} q^i = 1/(1 − q) für q ∈ R, 0 < q < 1.
Antwort:
S(N) < N² · Σ_{i=0}^{∞} (1/4)^i = N² · 1/(1 − 1/4) = N² · 4/3 = (4/3) · N²
10.7 Three-Dimensional Data
The requirements for a successful data structure are listed in Slide 10.34. Little needs to be added
to the contents of that slide.
Prüfungsfragen:
• Nennen Sie allgemeine Anforderungen an eine Datenstruktur zur Repräsentation dreidimensionaler Objekte!
10.8 The Wire-Frame Structure
Definition 28 Wireframe structure
The simplest three-dimensional data structure is the wire-frame. A wireframe model captures the shape of a 3D object in two lists, a vertex list and an edge list. The vertex list specifies geometric information: where each corner is located. The edge list provides connectivity information, specifying (in arbitrary order) the two vertices that form the endpoints of each edge.
The vertex lists are used to build edges, the edges build edge lists which then build faces or facets, and facets may build objects. In a wire-frame there are no real facets; we simply go from edges to objects directly.
The simplest three-dimensional data structure is the wire-frame. At the lowest level we have a list
of three-dimensional coordinates. The point-lists are used to build edges, the edges build edge-lists
which then build faces or facets and facets may build objects. In a wire-frame, there are no real
facets, we simply go from edges to objects directly. Slide 10.36 shows the example of a cube with
the object, the edge-lists and the point-lists. The edges or lines and the points or vertices are
again listed in Slide 10.37 for a cube. In Slide 10.38 the cube is augmented by an extra-plane and
represented by two extra vertices and three extra lines.
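A small Python sketch of such a wire-frame for a unit cube (the concrete numbering of vertices and edges is my own assumption, not copied from the slides):

    vertices = [                       # point list: x, y, z
        (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),   # bottom face
        (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),   # top face
    ]

    edges = [                          # edge list: pairs of vertex indices
        (0, 1), (1, 2), (2, 3), (3, 0),               # bottom square
        (4, 5), (5, 6), (6, 7), (7, 4),               # top square
        (0, 4), (1, 5), (2, 6), (3, 7),               # vertical edges
    ]

    if __name__ == "__main__":
        for a, b in edges:
            print("edge from", vertices[a], "to", vertices[b])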
Prüfungsfragen:
• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken!
10.9 Operations on 3-D Bodies
Assume that we have 2 cubes, A and B, and we need to intersect them. A number of Boolean
operations can be defined as an intersection or a union of 2 bodies, subtracting B from A or A
from B leading to different results.
10.10 Sweep-Representations
A sweep-representation creates a 3-D object by means of a 2-D shape. The object is created by moving the 2-D shape through 3-D space; this movement is denoted as the sweep. We may
have a translatory or a rotational sweep as shown in Slide ?? and Slide 10.43. A translatory sweep
can be obtained by a cutting tool. A rotational sweep obviously will be obtained by a rotational
tool. We have in Slide 10.43 the cutting tool, the model of a part and the image of an actual part
as produced in a machine.
Prüfungsfragen:
• Was versteht man unter einer „Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese Art der Objektrepräsentation?
• In Abbildung B.70 ist ein Zylinder mit einer koaxialen Bohrung gezeigt. Geben Sie zwei verschiedene Möglichkeiten an, dieses Objekt mit Hilfe einer Sweep-Repräsentation zu beschreiben!
10.11 Boundary-Representations
A very popular representation of objects is by means of their boundaries. Generally, these representations are denoted as B-reps. They are built from faces with vertices and edges. Slide 10.45
illustrates an object and asks the question of how many objects are we facing here, how many
faces, how many edges and so forth? A B-rep system makes certain assumptions about the topology of an object. In Slide 10.46 we show a prism that is formed from 5 faces, 6 vertices, 9 edges.
A basic assumption is that differentially small pieces on the surface of the object can be represented by a plane, as shown in the left and central elements of Slide 10.46. On the right-hand side of Slide 10.46 is a body that does not satisfy the demands of a 2-manifold topology; that is the type of body we may have difficulties with in a B-rep system.
A boundary representation takes advantage of Euler's Formula, which relates the number of vertices V, edges E and faces F to one another as shown in Slide 10.47: V − E + F = 2. A simple polyhedron is a body that can be deformed into a sphere and therefore has no holes; in this case Euler's Formula applies. Slide 10.48 shows three examples that confirm the validity of Euler's Formula. Slide 10.49 illustrates a body with holes; in that case Euler's Formula needs to be modified.
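As a quick numerical check (the cube is my own example; the prism counts are the ones quoted above for Slide 10.46):
for the prism, V − E + F = 6 − 9 + 5 = 2;
for a cube, V − E + F = 8 − 12 + 6 = 2.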
Prüfungsfragen:
• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“!
• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken!
10.12 A B-Rep Data Structure
Definition 29 Boundary representation
A B-Rep structure describes the boundary of an object with the help of 3-dimensional polygon surfaces. The B-Rep model consists of three different object types: vertices, edges and surfaces. The B-Rep structure is often organized in:
• V: a set of vertices (points in 3D space)
• E: a set of edges; each edge is defined by 2 points referenced from V
• S: a set of surfaces; each surface is defined by a sequence of edges from E (at least 3 edges define a surface)
The direction of the normal vector of a surface is usually given by the order of its edges (clockwise or counterclockwise). Due to the referencing, the B-Rep permits a redundancy-free management of the geometric information.
A B-rep structure is not unlike a wire-frame representation, but it represents an object with pointers to polygons, and lists of polygons with pointers to edges; one differentiates between space that is outside and inside the object by taking advantage of the sequence of edges. Slide 10.52 illustrates a body that is represented by 2 faces in 3-D. We show the point list, the list of edges and the list of faces.
Slide 10.53 illustrates a B-rep representation of a cube with the list of faces, the list of edges,
the point-lists, and the respective pointers. Slide 10.54 explains the idea of inside and outside
directions for each face. The direction of the edges defines the direction of the normal vector onto
a face. As shown in Slide 10.54, A would be inside of B in one case, and outside of B in the other
depending on the direction of the normal onto face B.
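A small Python sketch in the spirit of Definition 29, with vertex, edge and face lists that reference one another by index (the object, a unit square split into two triangles, and its numbering are illustrative assumptions):

    V = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]       # vertices in 3-D

    E = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]           # edges as vertex index pairs

    S = [                                                   # faces as edge index sequences
        [0, 1, 2],      # triangle over vertices 0-1-2
        [2, 3, 4],      # triangle over vertices 0-2-3 (shares edge 2, so no geometry is duplicated)
    ]

    def face_vertices(face):
        """Collect the vertex coordinates a face references via its edges."""
        verts = []
        for e in face:
            for v in E[e]:
                if v not in verts:
                    verts.append(v)
        return [V[v] for v in verts]

    if __name__ == "__main__":
        for f in S:
            print(face_vertices(f))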
10.13 Spatial Partitioning
An entirely different approach to 3-dimensional data structures is the idea of a spatial partitioning
approach.
In Slide 10.56 we choose the primitives to be prisms and cubes. They build the basic cells for a
decomposition. From those basic elements we can now build up various shapes as shown in that
slide. A special case occurs if the primitive is a cube of given size as shown in Slide 10.57. Slide
10.58 introduces the idea of the oct-tree, which is the 3-dimensional analogue of the quadtree. Slide
10.59 explains how the 3 dimensional space as a root is decomposed into 8 sons, which then are
further decomposed until there is no further decomposition necessary because each son is either
empty or full. The example of Slide 10.59 has 2 levels and therefore the object can be created
from 2 types of cubes. Slide 10.60 illustrates the resulting representation in a computer that takes
the root, subdivides it into 8 sons, calls them either white or black and if it needs to be further
subdivided then substitutes for the element another expression with 8 sons and so forth.
Slide 10.61 illustrates an oct-tree representation of a coffee cup. We can see how the surface,
because of its curvature, requires many small cubes to be represented whereas on the inside of
the cup the size of the elements increases. The data structure is very popular in medical imaging
because there exist various sensor systems that produce voxels, and those voxels can be generalized
into oct-trees, similar to pixels that can be generalized into quadtrees in 2 dimensions.
Prüfungsfragen:
• Erklären Sie den Begriff „spatial partitioning“ und nennen Sie drei räumliche Datenstrukturen aus dieser Gruppe!
10.14 Binary Space Partitioning BSP
Definition 30 Cell-structure
An example of a 3-dimensional data structure is the idea of spatial partitioning. For this, some primitives like prisms or cubes are chosen. These primitives build the “cells” for a decomposition of an object. Every geometrical object can be built from these cells.
A special case occurs if the primitive is an object of a given size.
A very common data structure to find the decomposition is the oct-tree. The root (the 3-dimensional space) of the oct-tree is subdivided into 8 cubes of equal size, and these resulting cubes are subdivided themselves again until no further decomposition is necessary. A son in the tree is marked as black or white (occupied or not), or it is marked as gray; in that case a further decomposition is needed.
This type of data structure is very popular in medical imaging. The various sensor systems, like computer-aided tomography, produce voxels. These voxels can be generalized into oct-trees.
A more specific space partitioning approach is the Binary Space Partitioning or BSP. We subdivide
space by means of planes that can be arbitrarily arranged. The Binary Space Partition is a tree
in which the nodes are represented by the planes. Each node has two sons which are the spaces
which result on the two sides of a plane, we have the inner and the outer half space.
Slide 10.63 illustrates the basic idea in 2 dimensions, where the planes degenerate into straight
lines. The figure on the left side of Slide 10.63 needs to be represented by a BSP structure. The
root is the straight line a, subdividing a 2-D space into half spaces, defining an outside and an
inside by means of a vector shown on the left side of the slide. There are two sons and we take
the line b and the line j as the two sons. We further subdivide the half-spaces. We go on until
the entire figure is represented in this manner.
A similar illustration representing the same idea is shown in Slide 10.64. At the root is line 1,
the outside half-space is empty, the inside half-space contains line 2, again with an outside space
empty and the inside space containing line 3 and we repeat the structure.
If we start out with line 3 at the root, we obtain a different description of the same object. We then have line 4 in the outside half-space and line 2 in the inside half-space. The half-space defined by line 4 within the half-space defined by line 3 contains only the line segment 1b, and the other half-space as seen from line 3, further subdivided by line 2, contains line segment 1a. The straight line 1 thus appears twice, once in the form of 1a and another time in the form of 1b.
Algorithm 28 Creation of a BSP tree
polygon root;                     {current root polygon}
polygon *backList, *frontList;    {polygons in the current half-spaces}
polygon p, backPart, frontPart;   {temporary variables}
if (polyList == NULL) then
  return NULL;                    {no more polygons in this half-space}
else
  root = selectAndRemovePolygon(&polyList);   {prefer polygons defining planes that do not intersect other polygons}
  backList = NULL;
  frontList = NULL;
  for (each remaining polygon p in polyList) do
    if (polygon p in front of root) then
      addToList(p, &frontList);
    else
      if (polygon p in back of root) then
        addToList(p, &backList);
      else
        {polygon p must be split}
        splitPoly(p, root, &frontPart, &backPart);
        addToList(frontPart, &frontList);
        addToList(backPart, &backList);
      end if
    end if
  end for
  return new BSPTREE(root, makeTree(frontList), makeTree(backList));
end if
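A minimal 2-D Python sketch of the same construction, where the “polygons” are line segments and the splitting plane of a node is the infinite line carrying its segment (the data and the sign convention for “in front” are assumptions):

    def side(line, p, eps=1e-9):
        (ax, ay), (bx, by) = line
        d = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        return 0 if abs(d) < eps else (1 if d > 0 else -1)

    def split_segment(line, seg):
        """Split a straddling segment at its intersection with the infinite line."""
        (ax, ay), (bx, by) = line
        (px, py), (qx, qy) = seg
        nx, ny = -(by - ay), (bx - ax)                      # line normal
        t = ((ax - px) * nx + (ay - py) * ny) / ((qx - px) * nx + (qy - py) * ny)
        m = (px + t * (qx - px), py + t * (qy - py))        # intersection point
        first, second = (seg[0], m), (m, seg[1])
        return (first, second) if side(line, seg[0]) >= 0 else (second, first)

    def make_tree(segments):
        if not segments:
            return None
        root, rest = segments[0], segments[1:]
        front, back = [], []
        for s in rest:
            s0, s1 = side(root, s[0]), side(root, s[1])
            if s0 >= 0 and s1 >= 0:
                front.append(s)
            elif s0 <= 0 and s1 <= 0:
                back.append(s)
            else:                                           # segment straddles the line
                f, b = split_segment(root, s)
                front.append(f)
                back.append(b)
        return {"plane": root, "front": make_tree(front), "back": make_tree(back)}

    if __name__ == "__main__":
        segs = [((0, 0), (4, 0)), ((4, 0), (4, 3)), ((4, 3), (0, 3)), ((0, 3), (0, 0))]
        print(make_tree(segs))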
Prüfungsfragen:
• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein!
10.15 Constructive Solid Geometry, CSG
This data structure takes 3-D primitives as input and applies Boolean operations together with translation, scaling and rotation operators to construct 3-dimensional objects from the primitives. Slide 10.66 and Slide 10.67 explain. A complex object as shown in Slide 10.67 may be composed of a cylinder with an indentation and a rectangular body of which a corner is cut off. The cylinder itself is obtained by subtracting a smaller cylinder from a larger cylinder. The cut-off is obtained by subtracting another rectangular shape from a fully rectangular shape. So we have 2 subtractions and one union to produce our object. In Slide 10.67 we have again 2 primitives, a block and a cylinder; we can scale them, so we start out with two types of blocks and two types of cylinders. By the operations of intersection, union and difference we obtain a complicated object from those primitives.
Slide 10.68 explains how Constructive Solid Geometry can produce a result in two different ways.
We can take two blocks and subtract them from one another or we can take two blocks and form
the union of them to obtain a particular shape. We cannot say generally that those two operations
are equivalent, because if we change the shapes of the two blocks, the same two operations may
not result in the same object shown in Slide 10.68.
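A small Python sketch of a CSG tree evaluated by point membership, with primitives given as predicates and inner nodes combining them by union, intersection and difference (the concrete object, a block with a drilled hole, is an illustrative assumption, not the part of Slide 10.67):

    def block(x0, y0, z0, x1, y1, z1):
        return lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1 and z0 <= p[2] <= z1

    def cylinder_z(cx, cy, r, z0, z1):
        return lambda p: (p[0]-cx)**2 + (p[1]-cy)**2 <= r*r and z0 <= p[2] <= z1

    def union(a, b):        return lambda p: a(p) or b(p)
    def intersection(a, b): return lambda p: a(p) and b(p)
    def difference(a, b):   return lambda p: a(p) and not b(p)

    if __name__ == "__main__":
        # a block with a vertical cylindrical hole drilled through its centre
        part = difference(block(0, 0, 0, 4, 4, 2), cylinder_z(2, 2, 1, 0, 2))
        print(part((0.5, 0.5, 1)))   # True: inside the block, outside the hole
        print(part((2.0, 2.0, 1)))   # False: inside the drilled-out cylinder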
Prüfungsfragen:
• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der
Konstruktion des Objektes (ohne Lampe).
10.16 Mixing Vectors and Raster Data
When we have photo-realistic representations of 3-D objects, we may need to mix data structures,
e.g. vector data or three-dimensional data structures representing 3-D objects and raster data
coming from images. The example of city models has illustrated this issue. A particular hierarchical structure for the geometric data is introduced here; it is called the LoD/R-tree data structure, for Level of Detail and Rectangular Tree structure. The idea is that objects are approximated by boxes
in 3D generalized from rectangles in 2 dimensions. These blocks can overlap and so we have the
entire city being at the root of a tree, represented by one block. Each district now is a son of that
root and is represented by blocks. Within each district we may have city blocks, within the city
blocks we may have buildings, and one particular building may therefore be the leaf of this data
structure.
We also have the problem of a level of detail for the photographic texture. We create an image
pyramid by image processing and then store the pyramids and create links to the geometric
elements in terms of level of detail, so that if we want an overview of an object we get very few pixels to process.
If we take a vantage point to look at the city, we have in the foreground a high resolution for the
texture and in the background low resolution. So we precompute per vantage point a hierarchy
of resolutions that may fall within the so-called View-Frustum. As we change our vantage point
by rotating our eyes, we have to call up from a data base a related element. If we move, thus
change our position, we have to call up from the data base different elements at high resolution
and elements at low resolution.
Slide 10.72 illustrates how the vector data structure describes nothing but the geometry whereas
the raster data describes the character of the object in Slide 10.73.
We may also use a raster data structure for geometric detail as shown in Slide 10.74. In that
case we have an (x, y) pattern of pixels and we associate with each pixel not the gray value but
an elevation representing therefore a geometry in the form of a raster which we otherwise have
typically used for images only.
10.17 Summary
We summarize the various ideas for data structures of spatial objects, be they in 2D or in 3D.
Slide 10.76 addresses 3D.
Prüfungsfragen:
• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken!
Chapter 11
3-D Objects and Surfaces
11.1 Geometric and Radiometric 3-D Effects
We are reviewing various effects we can use to model and perceive the 3-dimensional properties of objects. These could be radiometric or geometric effects used in reconstructing and representing objects.
When we look at a photograph of a landscape as in Slide 11.3, we notice various depth cues. Slide 11.4 summarizes these and other depth cues; a total of eight different cues are described. For example, colors tend to become bluer as objects are farther away. Obviously, objects that are nearby will cover and hide objects that are farther away. Familiar objects, such as buildings, will appear smaller as the distance grows. Our own motion will make nearby things move faster. We have spatial viewing by stereoscopy. We have brightness that reduces as the distance grows. Focus set for one distance will have to change for other distances. The texture of a nearby object will become simple shading on a far-away object.
Slide 11.5 shows that one oftentimes differentiates between so-called two-dimensional, two-and-a-half-dimensional and three-dimensional objects. When we deal with two-and-a-half-dimensional objects, we deal with one surface of that object, essentially a function z(x, y) that is single-valued. In contrast, a three-dimensional object may have multiple values of z for a given x and y. Slide 11.5 is a typical example of a two-and-a-half-dimensional object, Slide 11.7 of a 3-D object.
Prüfungsfragen:
• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2½D- oder 3D-Modellen. Definieren Sie die Objektbeschreibung durch 2½D- bzw. 3D-Modelle mittels Gleichungen und erläutern Sie in Worten den wesentlichen Unterschied!
11.2 Measuring the Surface of An Object (Shape from X)
”Computer Vision“ is an expression that is particularly used when dealing with 3-D objects.
Methods that determine the surface of an object are numerous. One generally denotes methods that create a model of one side of an object (a two-and-a-half-dimensional model) as shape-from-X. One typically includes the techniques which use images as the source of information.
In Slide 11.9 we may have sources of shape information that are not images. Slide 11.10 highlights
the one technique that is mostly used for small objects that can be placed inside a measuring
device. This may or may not use images to support the shape reconstruction. A laser may scan a
profile across the object, measuring the echo-time, and creating the profile sequentially across the
object thereby building up the shape of the object. The object may rotate under a laser scanner,
or the laser scanner may rotate around the object. In that case we obtain a complete three-D
model of the object. Such devices are commercially available. For larger objects airborne laser
scanners exist such as shown in Slide 11.11 and previously discussed in the Chapter 2. A typical
product of an airborne laser scanner is shown in Slide 11.12.
The next technique is so-called Shape-from-Shading. In this technique, an illuminated object’s
gray tones are used to estimate a slope of the surface at each pixel. Integration of the slopes
to a continuous surface will lead to a model of the surface’s shape. This technique is inherently
unstable and under-constrained. There is not a unique slope associated with a pixel’s brightness.
The same gray value may be obtained from various illumination directions and therefore slopes.
In addition, the complication with this technique is that we must know the reflectance properties of the surface. We have such knowledge in an industrial environment, where parts of known surface properties are kept in a box and a robot needs to recognize the shape. In natural terrain, shading
alone is an insufficient source of information to model the surface shape. Slide 11.14 suggests an
example where a picture of a sculpture of Mozart is used to recreate the surface shape. With
perfectly known surface properties and with a known light source, we can cope with the variables
and constrain the problem sufficiently to find a solution.
An analogue of Shape-from-Shading is Photometric Stereo, where multiple images are taken of a single surface from the same, known vantage point, so that the geometry of the individual images is identical and only the illumination differs. This can be used in microscopy as
shown in the example of Slide 11.16.
Shape-from-Focus is usable in microscopes, but also in a natural environment with small
objects. A Shape-from-Focus imaging system finds the portion of an object that is in focus, thereby
producing a contour of the object. By changing the focal distance we obtain a moving contour
and can reconstruct the object. Slide 11.18 illustrates a system that can do a shape reconstruction
in real time using the changing focus. Slide 11.19 illustrates two real-time reconstructions by
Shape-from-Focus. Slide 11.20 has additional examples.
The method of Structured Light projects a pattern onto an object and makes one or more images
of the surface with the pattern. Depending on the type of pattern, we can reconstruct the shape from a single image, or we can use the pattern as a surface texture to make it easy for an algorithm to find corresponding image points in the stereo method we will discuss in a moment.
Slide 11.22 through Slide 11.25 illustrate the use of structured light. In case of Slide 11.22 and
Slide 11.23 a stereo-pair is created and matching is made very simple. Slide 11.24 illustrates the
shape that is being reconstructed. Slide 11.25 suggests that by using a smart pattern, we can
reconstruct the shape from the gray-code that is being projected.
Slide 11.27 illustrates a fairly new technique for mapping terrain using interferometric radar. A
single radar pulse is being transmitted from an antenna in an aircraft or satellite and this is
reflected off the surface of the Earth and is being received by the transmitting antenna and an
auxiliary second antenna that is placed in the vicinity of the first one, say at the two wings of an
airplane. The difference in arrival time of the echoes at the two antennas is indicative of the angle
under which the pulse has traveled to the terrain and back. The method is inherently accurate to
within the wavelength of the used radiation. This technique is available even for satellites, with
two antennas on the space shuttle (NASA mission SRTM for Shuttle Radar Topography Mission,
1999), or is applicable to systems with a single antenna on a satellite, where the satellite repeats an
orbit very close, to within a few hundred meters of the original orbit, and in the process produces
a signal as if the two antennas had been carried along simultaneously.
The most popular and most widely used technique of Shape-from-X is the stereo-method . Slide
11.29 suggests a non-traditional arrangement, where two cameras take one image each of a scene
where the camera’s stereo-base b is the distance from one another. Two objects Pk and Pt are
at different depths as seen from the stereo-base and we can from the two images determine a
parallactic angle γ which allows us to determine the depth difference between the two points.
Obviously, a scene as shown in Slide 11.29 will produce a 2-D representation on a single image in
which the depth between Pt and Pk is lost. However, given two images, we can determine the angle γ and the distance to point Pk, and we can also determine the angle dγ and obtain the position of point Pt at a depth different from Pk's. Slide 11.30 illustrates two images of a building. The
two images are illuminated in the same manner by the sunlight. The difference between the two
images is strictly geometrical. We have lines in the left image and corresponding lines in the right
images that are called “epi-polar lines”. Those are intersections of a special plane in 3-d space
with each of the two images. These planes are formed by the two projection centers and a point
on the object. If we have a point on the line of the left image, we know that its corresponding
matching point must be on the corresponding epi-polar line in the right image. Epi-polar lines
help in reducing the searching for match points for automated stereo. Slide 11.31 is a stereo
representation from an electron microscope. Structures are very small, pixels may have the size
of a few nanometers in object-space. We do not have a center-perspective camera model as the
basis for this type of stereo. However, the electron microscopic mode of imaging can be modeled
and we can reconstruct the surface by a method similar to classical camera stereo.
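To make the parallax idea numerically concrete, here is a hedged sketch; the normal-case relation depth = base × focal length / parallax used below, as well as the numbers, are not taken from the text and serve only as an illustration.

    def depth(base_m, focal_length_mm, parallax_mm):
        """Depth from the horizontal parallax of a point seen in a normal-case stereo pair."""
        return base_m * focal_length_mm / parallax_mm

    if __name__ == "__main__":
        # e.g. an aerial stereo pair: base 600 m, focal length 150 mm
        print(depth(600.0, 150.0, 75.0))   # 1200.0 m above the point
        print(depth(600.0, 150.0, 80.0))   # 1125.0 m: larger parallax, nearer point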
Slide 11.32 addresses a last technique of Shape-from-X, tomography. Slide ?? and Slide ?? illustrate, from medical imaging, a so-called computer-aided tomographic (CAT) scan of a human skull.
Individual images represent a slice through the object. By stacking up a number of those images
we obtain a replica of the entire original space. Automated methods exist that collect all the
voxels that belong to a particular object and in the process determine the surface of that object.
The result is shown in Slide 11.34.
Prüfungsfragen:
• Erstellen Sie bitte eine Liste aller Ihnen bekannten Verfahren, welche man als „Shape-from-X“ bezeichnet.
• Wozu dient das sogenannte „photometrische Stereo“? Und was ist die Grundidee, die diesem Verfahren dient?
• In der Vorlesung wurden Tiefenwahrnehmungshilfen („depth cues“) besprochen, die es dem menschlichen visuellen System gestatten, die bei der Projektion auf die Netzhaut verlorengegangene dritte Dimension einer betrachteten Szene zu rekonstruieren. Diese Aufgabe wird in der digitalen Bildverarbeitung von verschiedenen „shape from X“-Verfahren gelöst. Welche „depth cues“ stehen in unmittelbarem Zusammenhang mit einem entsprechenden „shape from X“-Verfahren, und für welche Methoden der natürlichen bzw. künstlichen Tiefenabschätzung kann kein solcher Zusammenhang hergestellt werden?
11.3 Surface Modeling
There is an entire field of study devoted to optimally modeling a surface from the data primitives one may have obtained from stereo or other Shape-from-X techniques. We are dealing with point clouds, connecting the point clouds to triangles, building polygonal faces from triangles, and then replacing the faces by continuous functions such as bi-cubic or quadric functions. Slide 11.36 illustrates a successfully constructed network of triangles, using as input a set of points created from stereo. Slide 11.37 illustrates the triangles formed from all the photogrammetrically obtained points of Emperor Charles in the National Library in Vienna. Also shown is a rendering of that surface using photographic texture. This calls to mind that these problems of creating a surface from measured points, triangulating points etc. have been previously discussed in Chapters 9 and 10.
11.4 Representing 3-D Objects
In representing 3-D objects we have to cope with 2 important subjects:
• hidden edges and hidden surfaces
• the interaction of light and material
In dealing with hidden edges and surfaces, we essentially differentiate between two classes of procedures. The first is an image space method, where we go through all the pixels of an image and find the associated object point that is closest to the image. This method is very susceptible to aliasing effects. The object space method searches among all object elements and checks what can be seen from the vantage point of the user. These techniques are less prone to suffer from aliasing.
The issue of hidden lines or surfaces is illustrated in Slide 11.40 with a single-valued function
y = f (x, z). We might represent this surface by drawing profiles from the left edge to the right
edge of the 2-D surface. The resulting image in Slide 11.40 is not very easily interpreted. Slide
11.42 illustrates the effect of removing hidden lines. Hidden lines are being removed by going from
profile to profile through the data set and plotting them into a 2-D form as shown in Slide 11.43.
Each profile is compared with the background and we can find by a method of clipping which
surface elements are hidden by previous profiles. This can be done in one dimension as shown in
Slide 11.43 and then in a second dimension (Slide 11.44). When we look at Slide 11.44 we might
see slight differences between two methods of hidden line removal in case (c) and case (d).
Many tricks are being applied to speed up the computation of hidden lines and surfaces. One
employs the use of neighborhoods or some geometric auxiliary transformations, some accelerations
using bounding boxes around objects or finding surfaces that are facing away from the view position
(back-face culling), a subdivision of the view frustum and the use of hierarchies. Slide 11.46
illustrates the usefulness of enclosing rectangles or bounding boxes. Four objects exist in a 3-D
space and it is necessary to decide which ones cover the other ones up. Slide 11.47 illustrates
that the bounding box approach, while helpful in many cases, may also mislead one into suspecting overlaps where there are none.
Prüfungsfragen:
• Bei der Erstellung eines Bildes mittels „recursive raytracing“ trifft der Primärstrahl für ein
bestimmtes Pixel auf ein Objekt A und wird gemäß Abbildung B.11 in mehrere Strahlen
aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die
Objekte B, C, D und E treffen. Die Zahlen in den Kreisen sind die lokalen Intensitäten
jedes einzelnen Objekts (bzgl. des sie treffenden Strahles), die Zahlen neben den Verbindungen geben die Gewichtung der Teilstrahlen an. Bestimmen Sie die dem betrachteten Pixel
zugeordnete Intensität, wenn
1. die Rekursionstiefe nicht beschränkt ist,
2. der Strahl nur genau einmal aufgeteilt wird,
3. die Rekursion abgebrochen wird, sobald die Gewichtung des Teilstrahls unter 15% fällt!
Kennzeichnen Sie bitte für die letzten beiden Fälle in zwei Skizzen diejenigen Teile des
Baumes, die zur Berechnung der Gesamtintensität durchlaufen werden!
Antwort:
1. ohne Beschränkung:
I = 2.7 + 0.1 · 2 + 0.5 · (3 + 0.4 · 2 + 0.1 · 4)
  = 2.7 + 0.2 + 0.5 · (3 + 0.8 + 0.4)
  = 2.9 + 0.5 · 4.2
  = 2.9 + 2.1
  = 5
2. Rekursionstiefe beschränkt:
I = 2.7 + 0.1 · 2 + 0.5 · 3
  = 2.7 + 0.2 + 1.5
  = 4.4
3. Abbruch nach Gewichtung:
I = 2.7 + 0.5 · (3 + 0.4 · 2)
  = 2.7 + 0.5 · 3.8
  = 2.7 + 1.9
  = 4.6

11.5 The z-Buffer
Algorithm 29 z-buffer
1: set zBuffer[p] to infinity for every pixel p
2: for all polygons plg that have to be drawn do
3:   for all scanlines scl of that polygon plg do
4:     for all pixels pxl of that scanline scl do
5:       if the z-value of pixel pxl is nearer than zBuffer[pxl] then
6:         set zBuffer[pxl] to the z-value of pixel pxl
7:         draw pixel pxl
8:       end if
9:     end for
10:   end for
11: end for
The most popular approach to hidden line and surface removal is the well-known z-buffer method (algorithm 29). It was introduced in 1974; it transforms an object's surface facets into the image plane and keeps track, at each pixel, of the distance between the camera and the corresponding element on an object facet. One keeps in each pixel the gray value that comes from the object point closest to the image plane.
Another procedure is illustrated in Slide 11.50 with an oct-tree. The view reference point V as shown in that slide leads to a labeling of the oct-tree space and shows that element 7 will be seen most.
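A minimal Python sketch of the z-buffer principle for a single scanline, in the spirit of algorithm 29 (each “fragment” supplies a pixel position, a depth and a gray value; the data are illustrative assumptions):

    def zbuffer_scanline(width, fragments, background=0):
        z = [float("inf")] * width          # depth buffer, initialised to infinity
        frame = [background] * width        # frame buffer
        for x, depth, gray in fragments:    # fragments: (pixel, z-value, gray value)
            if depth < z[x]:                # nearer than what is stored so far?
                z[x] = depth
                frame[x] = gray
        return frame

    if __name__ == "__main__":
        frags = [(2, 5.0, 100), (3, 5.0, 100),                 # object A
                 (2, 3.0, 200), (3, 7.0, 200), (4, 7.0, 200)]  # object B
        print(zbuffer_scanline(6, frags))   # [0, 0, 200, 100, 200, 0]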
Prüfungsfragen:
• Die vier Punkte aus Aufgabe B.2 bilden zwei Strecken A = p1p2 und B = p3p4, deren Projektionen in Gerätekoordinaten in der Bildschirmebene in die gleiche Scanline fallen. Bestimmen Sie grafisch durch Anwendung des z-Buffer-Algorithmus, welches Objekt (A, B oder keines von beiden) an den Pixelpositionen 0 bis 10 dieser Scanline sichtbar ist!
Hinweis: Zeichnen Sie p1p2 und p3p4 in die xz-Ebene des Gerätekoordinatensystems ein!
Antwort: siehe Abbildung 11.1 (grafische Auswertung des z-Buffer-Algorithmus); sichtbar an den Pixelpositionen 0 bis 10: − − B B A A B B B − −
11.6 Ray-tracing
The most popular method to find hidden surfaces (but also used in other contexts) is the so-called
ray-tracing method. Slide 11.52 illustrates the basic idea that we have a projection center, an
image window and the object space. We cast a ray from the projection center through a pixel
into the object space and check to see where it hits the objects. To accelerate the ray-tracing we
subdivide the space and instead of intersecting the ray with each actual object we do a search
through the bounding boxes surrounding the objects. In this case we can dismiss many objects
because they are not along the path of the ray that is cast for a particular pixel. Pseudocode can
be seen in algorithm 30.
Prüfungsfragen:
• Beschreiben Sie das „ray-tracing“-Verfahren zur Ermittlung sichtbarer Flächen! Welche Optimierungen können helfen, den Rechenaufwand zu verringern?
Antwort: Vom Projektionszentrum aus wird durch jedes Pixel der Bildebene ein Strahl
in die Szene geschickt und mit allen Objekten geschnitten. Von allen getroffenen Objekten
bestimmt jenes, dessen Schnittpunkt mit dem Strahl dem Projektionszentrum am nächsten
liegt, den Farbwert des Pixels.
– Die Zahl der benötigten Schnittberechnungen kann durch Verwendung von hierarchischen bounding-Volumina stark reduziert werden.
– Das getroffene Objekt (bei recursive ray-tracing nur im ersten Schnitt) kann auch mit
Hilfe des z-buffer Algorithmus ermittelt werden.
Algorithm 30 Raytracing for Octrees
Ray-tracing algorithm
for each row of the image
  for each pixel of the row
    determine the ray from the eye through the pixel;
    pixel color = Raytrace(ray);

Raytrace(ray)
  for all objects of the scene
    if the ray intersects the object and the intersection is the nearest so far
      record the intersection;
  if no intersection then result := background color else
    result := Raytrace(reflected ray) + Raytrace(refracted ray);
    for all light sources
      for all objects of the scene
        if the ray towards the light source intersects the object
          abort the loop, next light source
      if no intersection found
        result += local illumination

Octree implementation
Construction
  place a bounding box q around the scene
  for all objects o
    Insert(o, q)

Insert(object o, box q)
  for all eight sub-boxes t of q
    if o fits entirely into t
      create t if necessary
      Insert(o, t)
      return
  assign object o to box q

Intersection
Intersect(box q, ray s)
  if q is empty return NULL
  if IntersectTest(q, s)
    for all eight sub-boxes t of q
      res += Intersect(t, s)
    for all objects o assigned to q
      res += IntersectTest(o, s)
  return nearest intersection in res
11.7 Other Methods of Providing Depth Perception
Numerous methods exist to help us create the impression of depth in the rendering of a 3-D model.
These may include coding by brightness or coding in color. Slide 11.55 illustrates depth encoding
by means of the brightness of lines. The closer an object is to the viewer, the brighter it is. In Slide
11.56 we even add color to help obtain a depth perception. Of course the depth perception
improves dramatically if we use the removal of edges as shown in Slide 11.57. We now can take
advantage of our knowledge that nearby objects cover up objects that are farther away. Slide
11.60 indicates that the transition to illumination methods for rendering 3-D objects is relevant
for depth perception.
Slide 11.58 introduces the idea of halos to represent 3-D objects, and Slide 11.59 is an example.
At first we see a wire-frame model of a human head and we see the same model after removing
the hidden lines but also interrupting some of the lines when they intersect with other lines. The
little interruption is denoted as a halo.
Chapter 12
Interaction of Light and Objects
Radiation and the natural environment have a complex interaction. If we assume, as in Slide 12.2, that the sun illuminates the Earth, we have atmospheric scattering as the radiation approaches the surface. We have atmospheric absorption that reduces the power of the light coming from the sun. Then we have reflection off the top surface, which can be picked up by a sensor and used in image formation. The light will go through an object and will be absorbed, but at the same time an object might emit radiation, for example in the infrared wavelengths. Finally the radiation will hit the ground and might again be absorbed, reflected or emitted. As the light returns from the Earth's surface to the sensor we again have atmospheric absorption and emission. In remote sensing many of those factors will be used to describe and analyze objects based on sensed images. In computer graphics we use a much simplified approach.
12.1 Illumination Models
Definition 31 Ambient light
In the ambient illumination model the light intensity I after reflection from an object's surface is given by the equation
I = Ia · ka
Ia is the intensity of the ambient light, assumed to be constant for all objects. ka is a constant between 0 and 1, called the ambient-reflection coefficient. ka is a material property and must be defined for every object.
Ambient light alone creates unnatural images, because every point on an object's surface is assigned the same intensity. Shading is not possible with this kind of light. Ambient light is used mainly as an additional term in more complex illumination models, to illuminate parts of an object that are visible to the viewer but invisible to the light source. The resulting image then becomes more realistic.
The simplest case is illumination by ambient light (definition 31). The existing light will be
multiplied with the properties of an object to produce the intensity of an object point in an
image. Slide 12.4 illustrates this with the previously used indoor scene.
Slide 12.5 goes one step further and introduces the diffuse Lambert reflection. There is a light
source which illuminates the surface under an angle Θ from the surface normal. The reflected intensity I is the amount of incident light × the surface property k × the cosine of the angle under which the light falls onto the surface. Slide 12.6 illustrates the effect of various
Definition 32 Lambert model
The Lambert model describes the reflection of the light of a point light source on a matte surface like chalk or fabrics.
Light falling on a matte surface is diffusely reflected. This means that the light is reflected uniformly in all directions. Because of the uniform reflection, the amount of light seen from any angle in front of the surface is the same. As the point of view does not influence the amount of reflected light seen, the position of the light source does. This relationship is described in the Lambertian law:
Lambertian law:
Assume that a surface facet is directly illuminated by light so that the normal vector of the surface is parallel to the vector from the light source to the surface facet. If you now tilt the surface facet by an angle θ, the amount of light falling on the surface facet is reduced by the factor cos θ.
A tilted surface is illuminated by less light than a surface normal to the light direction, so it reflects less light. This is called the diffuse Lambertian reflection:
I = Ip · kd · cos θ
where I is the amount of reflected light, Ip is the intensity of the point light source, kd is the material's diffuse reflection coefficient and θ is the angle between the surface normal and the light vector.
values of the parameter k. The term cos Θ can also be expressed as the inner product of two unit vectors, namely the vector towards the light and the surface normal. Considering this diffuse Lambert reflection, our original image becomes Slide 12.7.
The next level of complexity is to add the two brightnesses together, the ambient and the Lambert illumination. A further sophistication is introduced if we add an atmospheric attenuation of the light as a function of the distance to the object, as shown in Slide 12.9. So far we have not talked about mirror reflection. For this we need to introduce a new vector. We have the light source L, the surface normal N, the mirror reflection vector R and the direction to a camera or viewer V. We have a mirror reflection component in the system that is illustrated in Slide 12.10 with a term W cos^n α. α is the angle between the viewing direction and the direction of mirror reflection. W is a value that the user can choose to indicate how mirror-like the surface is. Phong introduced this model of mirror reflection in 1975 and explained the effect of the power n of cos^n α: the larger the power, the more focused and smaller the area of mirror reflection will be. Not only the power n defines the type of mirror reflection, but also the parameter W, as shown in Slide 12.12, where the same amount of mirror reflection produces different appearances when the value of the parameter W is varied. W describes the blending of the mirror reflection into the background, whereas the value n indicates how small or large the area is that is affected by the mirror reflection. Slide 12.13 introduces the idea of a light source that is not a point. In that case we introduce a point light source and a reflector, which will reflect light onto the scene. The reflector represents the extended light source.
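A hedged Python sketch of the illumination terms discussed above (ambient + diffuse Lambert + Phong-style specular term W · cos^n α); the vector and parameter names, the sample values and the exact way the terms are combined are illustrative assumptions:

    import math

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return tuple(x / n for x in v)

    def intensity(Ia, ka, Ip, kd, W, n, N, L, R, V):
        ambient = Ia * ka
        diffuse = Ip * kd * max(dot(N, L), 0.0)           # Lambert: cos(theta) = N . L
        specular = Ip * W * max(dot(R, V), 0.0) ** n      # Phong:   cos(alpha) = R . V
        return ambient + diffuse + specular

    if __name__ == "__main__":
        N = (0.0, 0.0, 1.0)                               # surface normal
        L = normalize((1.0, 0.0, 1.0))                    # direction towards the light
        R = normalize((-1.0, 0.0, 1.0))                   # mirror direction of L about N
        V = normalize((-0.5, 0.0, 1.0))                   # direction towards the viewer
        print(intensity(Ia=0.2, ka=0.4, Ip=1.0, kd=0.6, W=0.5, n=20, N=N, L=L, R=R, V=V))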
Prüfungsfragen:
• Was ist eine einfache Realisierung der „Spiegelreflektion“ (engl.: specular reflection) bei
der Darstellung dreidimensionaler Objekte? Ich bitte um eine Skizze, eine Formel und den
Namen eines Verfahrens nach seinem Erfinder.
• In Abbildung B.15 ist ein Objekt gezeigt, dessen Oberflächeneigenschaften nach dem Beleuchtungsmodell von Phong beschrieben werden. Tabelle B.2 enthält alle relevanten Parame-
ter der Szene. Bestimmen Sie für den eingezeichneten Objektpunkt p die vom Beobachter
wahrgenommene Intensität I dieses Punktes!
Hinweis: Der Einfachheit halber wird nur in zwei Dimensionen und nur für eine Wellenlänge gerechnet. Zur Ermittlung der Potenz einer Zahl nahe 1 beachten Sie bitte, dass die Näherung (1 − x)^k ≈ 1 − kx für kleine x verwendbar ist.
12.2 Reflections from Polygon Facets
Gouraud introduced the idea of interpolated shading. Each pixel on a surface will have a brightness in an image that is interpolated using the three surrounding corners of the triangular facet.
The computation is made along a scan line as shown in Slide 12.15 with auxiliary brightness values
Ia and Ib . Note that the brightnesses are computed with a sophisticated illumination model at
positions I1 , I2 and I3 of the triangle and then a simple interpolation scheme is used to obtain
the brightness in Ip . Gouraud does not consider specular reflection, while Phong does.
Gouraud interpolates brightnesses (Algorithm 31); Phong interpolates surface normals from
the corners of a triangle (Algorithm 32). Slide 12.16 explains. Slide 12.17 illustrates the appearance of a Gouraud illumination model. Note how smoothly the illumination changes along the
surface, whereas the geometry of the object is not smoothly interpolated. Slide 12.18 adds specular
reflection to Gouraud. Phong, as shown in Slide 12.19, creates a smoother appearance of the
surface because of its interpolation of the surface normal. Of course it includes specular reflection.
In order to not only have smoothness in the surface illumination but also in the surface geometry,
facets of the object must be replaced by curved surfaces. Slide 12.20 illustrates the idea: the
model’s appearance is improved, also due to the specular reflection of the Phong model.
Slide 12.21 finally introduces additional light sources. Slide 12.22 summarizes the various types
of reflection. We have the law of reflection, stating that the angle of incidence equals the angle of
reflection, both measured with respect to the surface normal. A mirror or specular
reflection is very directed: the incoming ray is reflected into a single outgoing direction. The
opposite of specular reflection is "diffuse" reflection. If it is near perfect, it will radiate into
all directions almost equally. Lambert reflection is a perfectly diffuse reflection, as shown on
the right-hand side.
Prüfungsfragen:
• Gegeben sei die Rasterdarstellung eines Objektes in Abbildung B.58, wobei das Objekt
nur durch seine drei Eckpunkte A, B und C dargestellt ist. Die Helligkeit der Eckpunkte
ist IA = 100, IB = 50 und IC = 0. Berechne die Beleuchtungswerte nach dem GouraudVerfahren in zumindest fünf der zur Gänze innerhalb des Dreieckes zu liegenden kommenden
Pixeln.
• Beschreiben Sie zwei Verfahren zur Interpolation der Farbwerte innerhalb eines Dreiecks,
das zu einer beleuchteten polygonalen Szene gehört.
12.3
Shadows
Typically, shadows are computed in two steps or phases. The computations for shadows are related
to the computation of hidden surfaces, because areas in shadows are areas that are not seen from
the illuminating sun or light source. Slide 12.24 explains the two types of transformation. We first
have to transform a 3-D object into a fictitious viewing situation with the view point at the light
source. That produces the visible surfaces in that view. A transform into the model coordinates
produces shadow edges. We now have to merge the 3-D viewing and the auxiliary lines from
Algorithm 31 Gouraud shading
Procedure ScanLine(xa, Ia, xb, Ib, y)
  grad = (Ib − Ia)/(xb − xa)   {compute the step size}
  if xb > xa then
    xc = (int)xa + 1   {set xc and xd to pixel centers}
    xd = (int)xb
  else
    xc = (int)xb + 1
    xd = (int)xa
  end if
  I = Ia + (xc − xa) ∗ grad   {compute the start value for the first pixel}
  while xc ≤ xd do
    apply I to pixel (xc, y)
    xc = xc + 1   {advance one step}
    I = I + grad
  end while

Function Triangle(x1, y1, I1, x2, y2, I2, x3, y3, I3)
  sort the points in ascending order of their y-coordinate
  ∆xa = (x2 − x1)/(y2 − y1)   {compute the step sizes for the left edge}
  ∆Ia = (I2 − I1)/(y2 − y1)
  ∆xb = (x3 − x1)/(y3 − y1)   {compute the step sizes for the right edge}
  ∆Ib = (I3 − I1)/(y3 − y1)
  y = (int)y1 + 1   {compute the start row}
  yend = (int)(y2 + 0.5)   {compute the end row of the upper sub-triangle}
  xa = x1 + (y − y1) ∗ ∆xa   {compute the start values}
  xb = x1 + (y − y1) ∗ ∆xb
  Ia = I1 + (y − y1) ∗ ∆Ia
  Ib = I1 + (y − y1) ∗ ∆Ib
  while y < yend do
    compute one row with ScanLine(xa, Ia, xb, Ib, y)
    xa = xa + ∆xa   {advance one step}
    xb = xb + ∆xb
    Ia = Ia + ∆Ia
    Ib = Ib + ∆Ib
    y = y + 1
  end while   {upper sub-triangle done}
  ∆xa = (x3 − x2)/(y3 − y2)   {compute the step sizes for the new edge}
  ∆Ia = (I3 − I2)/(y3 − y2)
  yend = (int)(y3 + 0.5)   {compute the end row of the lower sub-triangle}
  xa = x2 + (y − y2) ∗ ∆xa   {compute the start value}
  while y < yend do
    compute one row with ScanLine(xa, Ia, xb, Ib, y)
    xa = xa + ∆xa   {advance one step}
    xb = xb + ∆xb
    Ia = Ia + ∆Ia
    Ib = Ib + ∆Ib
    y = y + 1
  end while   {lower sub-triangle done}
Algorithm 32 Phong shading
for all polygons do
  compute the surface normal in the corners of the polygon
  project the corners of the polygon into the image plane
  for all scanlines that are overlapped by the polygon do
    compute the linearly interpolated surface normals on the left and right edge of the polygon
    for all pixels of the polygon on the scanline do
      compute the linearly interpolated surface normal
      normalize the surface normal
      evaluate the illumination model and set the color of the pixel to the computed value
    end for
  end for
end for
Algorithm 33 Shadow map
make the light source the center of projection
render the object using the z-buffer
assign the z-buffer to the shadow z-buffer
make the camera the center of projection
render the object using the z-buffer
for all visible pixels do
  map the coordinate from camera space into light space
  project the transformed coordinate to 2D (x', y')
  if the transformed z-coordinate > shadowzbuffer[x', y'] then
    shadow the pixel   {a surface is nearer to the light source than this point}
  end if
end for
Algorithm 34 Implementation of the Atherton-Weiler-Greenberg algorithm
make the light point the center of projection
determine the visible parts of the polygons
split partially lit polygons into visible and invisible parts
transform to the modelling database
merge the original database with the lit polygons   {results in an object split into lit and unlit polygons}
make (any) eye point the center of projection
for all polygons do   {render the scene}
  if the polygon is in shadow then
    set the shading model to the ambient model
  else
    set the shading model to the default model
  end if
  draw the polygon
end for
the shadow boundaries into a combined polygon database. The method of computing hidden
surfaces from the viewer's perspective is repeated in Slide 12.25. Slide 12.26 illustrates the use of
the z-buffer method for the computation of shadow boundaries (Algorithm 33). L is the direction
of the light, V is the position of the viewer. We first have to compute a z-buffer from the light source,
and then we compute a z-buffer from the viewer's perspective. The view without shadows and the view
with them give a dramatically different impression of realism for the scene with two objects.
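As a rough sketch of the two-pass z-buffer shadow test of Algorithm 33 (function and variable names are assumptions; real implementations also add a small depth bias, as done here):

import numpy as np

def is_in_shadow(point_world, light_view, light_proj, shadow_zbuffer, bias=1e-3):
    # Transform the world point into the light's clip space (4x4 matrices assumed).
    p = np.append(point_world, 1.0)
    p_light = light_proj @ light_view @ p
    p_light /= p_light[3]                      # perspective divide
    # Map x, y from [-1, 1] to shadow-map pixel indices.
    h, w = shadow_zbuffer.shape
    x = int((p_light[0] * 0.5 + 0.5) * (w - 1))
    y = int((p_light[1] * 0.5 + 0.5) * (h - 1))
    depth_from_light = p_light[2]
    # Shadowed if some surface is nearer to the light at that shadow-map pixel.
    return depth_from_light > shadow_zbuffer[y, x] + bias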
Prüfungsfragen:
• Explain the process of shadow computation with the two-phase method using a z-buffer!
Describe two variants as well as their advantages and disadvantages.
12.4
Physically Inspired Illumination Models
There is a complex world of illumination computations concerned with the bidirectional
reflectance distribution function, the BRDF. In addition we can use ray-tracing for illumination and a very particular method called radiosity. We will spend a few thoughts on each of those three subjects.
A BRDF in Slide 12.28 describes the properties of a surface as a function of illumination. A 3-D
shape indicates how the incoming light from a light source is being reflected from a particular
surface. Many of the mathematical models used to describe those complex shapes bear their
inventors’ names.
12.5
Regressive Ray-Tracing
As discussed before, we have to cast a ray from the light source onto the object and find points
in shadow or illuminated. Similarly, rays cast from the observer’s position will give us the hidden
lines from the viewer’s reference point. Slide 12.30 illustrates again the geometry of ray-tracing to
obtain complex patterns in an image from an object and from light cast from other objects onto
that surface. Transparent object reflections may be obtained from the interface of the object with
the air at the back, away from the viewer.
12.6
Radiosity
A very interesting illumination concept that has been studied extensively during the last ten years
is called radiosity. It is a method that derives from modeling the distribution of temperature in
bodies in mechanical engineering (see Algorithm 35).
We subdivide the surface of our 3-D space into small facets. We have a light source, illuminating all
the facets, but the facets illuminate one another, and they become a form of secondary illumination
source. Each surface facet has associated with it the differential surface area dA. We can set up
an equation that relates the incoming light of each facet to the light leaving all other facets. Very
large systems of equations come about. They can, however, be efficiently reduced in the number of
unknowns and therefore be solved efficiently.
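The system of equations mentioned here is commonly written as the radiosity equation; with B_i the radiosity of facet i, E_i its own emission, ρ_i its reflectivity and F_ij the form factor between facets i and j (notation assumed, not taken from the slides):

B_i = E_i + ρ_i · Σ_{j=1..n} F_ij · B_j ,   i = 1, . . . , n,

or, in matrix form, (I − ρF) · B = E, which is the large linear system referred to above.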
Let's have a look at a few examples of this technique. In Slide 12.38 we see radiosity
used in the representation of a classroom, Slide 12.39 is an artificial set of cubes, and Slide 12.39
also illustrates one table at two levels of resolution. In the first case the facets used for radiosity
are fairly large, in the second the facets are made much smaller. We see how the realism of this
illumination model increases.
Algorithm 35 Radiosity
load scene
divide surfaces into patches
for all patches do   {initialize the patches}
  if the patch is a light then
    patch.emission := amount of light
    patch.available_emission := amount of light
  else
    patch.emission := 0
    patch.available_emission := 0
  end if
end for
repeat   {render the scene}
  for all patches i, starting at the patch with the highest available emission do
    place a hemicube on top of patch i   {needed to calculate the form factors}
    for all patches j do
      calculate the form factor between patch i and patch j   {needed to calculate the amount of light}
    end for
    for all patches j do
      ∆R := amount of light from patch i to patch j   {using the form factor and the properties of the patches}
      j.available_emission := j.available_emission + ∆R
      j.emission := j.emission + ∆R
    end for
    i.available_emission := 0   {all available light has been distributed to the other patches}
  end for
until good enough
Similarly, we have radiosity in the modeling of a computer room in Slide 12.39. We have internal illumination, and in one case, on the lower right of Slide 12.40, we have illumination from the outside of
the room. In a further slide we see a radiosity-based computation of an indoor scene, again at two levels of
detail in the mesh sizes for the radiosity computation.
Slides 12.1 through 12.41 accompany this chapter.
Chapter 13
Stereopsis
13.1
Binocular Vision
The 3-dimensional impression of our environment as received by our two eyes is called binocular
vision. Slide 13.10 explains that the human perceives two separate images via the two eyes, merges
the two images in the brain and reconstructs a depth model of the perceived scene there.
The two images obtained by the two eyes differ slightly because of the two different vantage
points. The stereo base for natural binocular vision is typically six and a half centimeters, i.e.,
the distance between the eyes.
Recall that natural depth perception is defined by many depth cues other than binocular vision.
We talked about depth cues by color, by size, by motion, by objects covering up one another,
etc. (see Chapter 11).
Slide 13.4 explains geometrically the binocular stereo effect. On the retina, two points P and Q
will be imaged on top of one another in one eye, but will be imaged side by side, subtending a small
angle dγ, in the other eye. We call γ the parallactic angle or parallax, and dγ a parallactic difference.
It is the measure of disparity that is sensed and used in the brain for shape reconstruction.
The angle γ itself gives us the absolute distance to a point P and is usually computed from the
stereo base b_A. Note that our eyes are sensitive to within a parallactic angle of 15 seconds of arc
(15"), and may be limited to perceiving a parallactic angle no larger than 7 minutes of arc (7').
Slide 13.5, Slide 13.6, and Slide 13.7 illustrate two cases of stereo-images taken from space and one
from microscopy. Note the difference between binocular viewing and stereo-viewing as discussed
in a moment.
What is of interest in a natural binocular viewing environment is the sensitivity of the eyes to
depth. Slide 13.8 explains that the smallest perceivable difference in depth between two points, d, is
determined by our sensitivity to the parallactic angle, dγ. Since this is typically no smaller than 17 seconds of
arc, we have a depth differentiation ability d as shown in Slide 13.8. At a distance of 25 cm we
may be able to perceive depth differences as small as a few micrometers. At a meter it may be a
tenth of a millimeter, but at ten meters distance it may already be about a meter. At a distance
of about 900 meters, we may not see any depth at all from our binocular vision.
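These statements can be put into a small sketch, assuming the usual small-angle relation d ≈ y²·dγ/b for the just-perceivable depth difference d at viewing distance y, eye base b and parallactic sensitivity dγ (the parameter values below are assumptions):

import math

def depth_resolution(distance_m, eye_base_m=0.065, dgamma_arcsec=15.0):
    # Smallest perceivable depth difference at a given viewing distance,
    # using the small-angle relation  d ≈ distance^2 * dγ / eye_base.
    dgamma_rad = math.radians(dgamma_arcsec / 3600.0)
    return distance_m ** 2 * dgamma_rad / eye_base_m

def max_stereo_distance(eye_base_m=0.065, dgamma_arcsec=15.0):
    # Beyond roughly eye_base / dγ the parallactic angle drops below the
    # perception threshold and binocular depth vanishes.
    return eye_base_m / math.radians(dgamma_arcsec / 3600.0)

print(max_stereo_distance())   # on the order of several hundred meters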
Prüfungsfragen:
• Given a distance yA = 3 meters from the eye of a sharp-sighted observer with typical eye
separation to an object point A. How much farther from the eye may a second object point B
be located so that the observer can just no longer perceive the depth difference between the two
object points A and B? The corresponding formula, the insertion of numerical values, and the
evaluation of the formula are requested.
• At the currently running Styrian provincial exhibition "comm.gr2000az" in Schloss Eggenberg
in Graz, a robot is installed that is supposed to catch a ball thrown to it by visitors. In order
to close the robot's gripper at the right time and at the right place, the position of the ball must
be determined as accurately as possible during its flight. For this purpose, two cameras are
installed that observe the playing field; a simplified sketch of the arrangement is shown in
Figure B.63.
Determine the accuracy in the x-, y- and z-directions with which the ball position marked in
Figure B.63 can be determined in space! For simplicity, assume the following camera parameters:
– focal length: 10 millimeters
– geometric resolution of the sensor chip: 100 pixels/millimeter
You may omit methods for determining the ball position with subpixel accuracy. For the
uncertainty in the x- and y-directions you may neglect one of the two cameras; for the z-direction
you can use the considerations on the uncertainty of binocular depth perception.
13.2
Stereoscopic Vision
We can now trick our two eyes into thinking they see the natural environment, when in fact
they look at two images presented separately to the left and right eye. Since those images will
not be at an infinite distance, but perhaps at 25 cm, our eyes are forced to focus at 25 cm, while
the brain must behave as if it were looking at a much larger distance, where the eyes' optical axes
are parallel. Many people have difficulties focusing at 25 cm and simultaneously obtaining a
stereoscopic impression.
To help, there is an auxiliary tool called a mirror stereoscope. Two images are placed on a table; an
assembly of two mirrors and a lens presents each image separately to each eye, whereby the eye is
permitted to focus at infinity and not at 25 cm.
Slide 13.12 lists alternative modes of stereo viewing. We mentioned the mirror stereoscope with
separate optical axes. A second approach is anaglyphs, implemented in the form of glasses,
where one eye receives only the red and the other only the green component of an image. A
third approach is polarization, where the images presented to the eyes are polarized differently
for the left and right eye. A further approach is the presentation of images by means of
shutter glasses, with projection or display on a monitor. All four approaches have
been implemented on computer monitors.
You can think of a mirror stereoscope looking at two images by putting two optical systems in front of
a monitor and having the left half present one image and the right half the other. Anaglyphs are a
classical case of looking at stereo on a monitor by presenting two images in the green and red channels
and wearing glasses to perceive a stereo impression. The most popular way of presenting
soft-copy images on a monitor is by polarization, wearing simple glasses that look at two polarized
images on a monitor, or by active glasses that are controlled from the monitor, which presents
120 images per second, 60 to one eye and 60 to the other, ensuring that
the proper image reaches the proper eye. This is called image flickering, here combined with polarization.
Slide 13.13 explains how stereoscopic viewing by means of two images increases the ability of the
human to perceive depth way beyond the ability available from binocular vision. The reason is very
simple. Binocular vision is limited by the six-and-a-half cm distance between the two eyes, whereas
Definition 33 (total plastic)
Let p = n · v be the total plastic, whereby
n ... image magnification
v ... eye-base magnification.
The synthetic eye base dA, typically 6.5 cm, can be magnified to the stereo base dK from
which the images are taken. That implies v = dK/dA.
stereoscopic vision can employ images taken from a much larger stereo base. Take the example of
aerial photography, where two images may be taken from an airplane with the perspective centers
600 meters apart. We obtain an increase in our stereo perception that is called total plastic (see
Definition 33). We look at the ratio between the stereo base from which the images are taken and
the eye base, which gives us a factor v. In addition, we look at the images under a magnification n,
and the total plastic increases our stereo ability by n · v, thus by a factor of tens of thousands. As
a result, even though the object may be, say, a thousand meters away, we still may have a depth
acuity of three cm.
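A minimal sketch of Definition 33, assuming an eye base of 6.5 cm and an (assumed) viewing magnification of 8x for the aerial example in the text:

def total_plastic(stereo_base_m, image_magnification, eye_base_m=0.065):
    # v: eye-base magnification, n: image magnification, p = n * v (Definition 33)
    v = stereo_base_m / eye_base_m
    return image_magnification * v

# Aerial example: perspective centers 600 m apart, viewed under an assumed 8x magnification.
print(total_plastic(stereo_base_m=600.0, image_magnification=8.0))   # tens of thousands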
Prüfungsfragen:
• Please quantify, using a numerical example of your choice, the "secret" that makes it possible
to achieve a much better depth perception in stereo viewing of overlapping photographic images
than is possible with natural binocular vision.
• Name different technical methods for stereoscopically conveying a "real" (three-dimensional)
spatial impression of a scene rendered by the computer!
13.3
Stereo Imaging
We need to create two natural images, either with one camera at two positions, taking images sequentially,
or with a camera pair, taking images simultaneously. Simultaneous imaging is preferred when
the object moves. Slide 13.15 illustrates the two camera positions looking at a two-dimensional
scene, explaining again the concept of a stereo base b, of the angle γ of convergence, which gives
us the distance to the object PK, and of the parallactic difference angle dγ, which is a measure of the
depth between two points PK and PT. Slide 13.16 repeats the same idea for the case of aerial
photography, where an airplane takes one image at position O1 and a second image at position O2.
The distance between O1 and O2 is the aerial stereo base b, the distance to the ground is the flying
height H, and the ratio b/H, called the base-to-height ratio, is a measure of the stereo acuity of an
image pair. Slide 13.17 repeats again the case of two images taken from an overhead position. Note
that the two images look identical to the casual observer. What makes the stereo process work
are the minute geometric differences between the two images, which occur in the direction
of flight. There are no geometric differences in the direction perpendicular to the flight direction.
Going back to Slide 13.16, we may appreciate the necessity of recreating in the computer the
relative position and orientation of the two images in space. An airplane or satellite may make
unintended motions, so the user does not get an accurate measure of the positions O1 and
O2 and of the imaging directions for the two camera positions. A stereo process will therefore
typically require that sets of points representing the same object on the ground are extracted from
the overlapping images. These are called homologue points. Slide 13.16 suggests that a
rectangular pattern of six points has been observed in image 1 and the same six points have been
observed in image 2. What now needs to happen mathematically is that two bundles of rays
are created from the image coordinates and the knowledge of the perspective centers O1 and O2
in the camera system. Then the two bundles of rays need to be arranged such that the
corresponding rays (homologue rays) intersect in the three-dimensional space of the object world.
We call the reconstruction of a bundle of rays from image coordinates the inner orientation. We
call the process by which we arrange the two images such that all corresponding rays intersect in
object space the relative orientation. And we call the process by which we take the final geometric
arrangement and make it fit into the world coordinate system by a three-dimensional conformal
transformation the absolute orientation.
Prüfungsfragen:
• How are two images of the same scene acquired in stereo imaging? Describe typical
applications of both methods!
13.4
Stereo-Visualization
The counterpart to using natural images to stereoscopically view the natural environment is stereo visualization: creating artificial
images that are presented to the eyes to obtain a three-dimensional impression of an artificial world.
We visit Slide 13.19 to explain that we need to create two images, for the left and the right eye,
of a geometric scene, represented in the slide by a cube and its point P. Slide 13.20 shows that
we compute two images of each world point W, assuming that we have two cameras, side by side,
at a stereo base b and with their optical axes parallel. Recall that in computer graphics the
optical axes are called view-point normals, and the lens center is the view point VP. Slide 13.21 is
the ground view of the geometric arrangement. We have used Slide 13.22 previously to illustrate
the result obtained by creating two images of a three-dimensional scene; in this particular case
it is a wire-frame representation for the left and right eye. If we present those two images at a
distance of about six and a half cm on a piece of paper on a flat table, look vertically down, and
pretend we are looking at infinity (so that our eye axes are parallel), we will be able to merge the
two images into a three-dimensional model of that object. However, we will notice that we will
not have a focused image, because our eyes will tend to focus at infinity when we force our eye
axes to be parallel.
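As a minimal sketch of the two-camera construction described above (parallel optical axes, view points separated by the stereo base b; names and values are assumptions):

import numpy as np

def project(point, viewpoint, focal_length=1.0):
    # Simple perspective projection onto an image plane perpendicular to z,
    # with the view-point normal along +z (parallel optical axes assumed).
    x, y, z = np.asarray(point, float) - np.asarray(viewpoint, float)
    return focal_length * x / z, focal_length * y / z

def stereo_pair(point, stereo_base=0.065):
    # Two view points side by side, separated by the stereo base b.
    vp_left  = np.array([-stereo_base / 2.0, 0.0, 0.0])
    vp_right = np.array([+stereo_base / 2.0, 0.0, 0.0])
    return project(point, vp_left), project(point, vp_right)

left, right = stereo_pair([0.1, 0.2, 2.0])
print(left, right)   # the x-parallax between the two images encodes depth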
Computer-generated stereo images are the essence of virtual and augmented environments. Slide 13.23 illustrates how a person looks at artificial images and receives a
three-dimensional impression, using motion detectors that feed the head's position and orientation into the computer, so that as the head moves, a new image is projected to
the eyes and the motion of the head is consistent with the experience of the natural environment. In contrast, Slide 13.24 illustrates augmented reality, where the monitors are
semi-transparent: the human observer does not only see the artificial virtual impression of computed images, but has superimposed on them the natural environment, which is visible
binocularly. Augmented reality thus uses both binocular and stereo vision.
13.5
Non-Optical Stereo
Eyes are very forgiving, and the images we observe stereoscopically need not necessarily be taken
by a camera; they therefore need not be centrally perspective. Slide 13.26 explains how the NASA Space
Shuttle has created radar images in sequential orbits. Those images overlap with one another and
show the same terrain. Slide 13.27 illustrates a mountain range in Arizona imaged by radar. Note
that the two images look more different than our previous optical images did: shadows are longer
in one image than in the other. Yet a stereo impression can be obtained in the same way as we
obtained it with optical imagery. The quality of the stereo measurement will be lower because
of the added complexity that the two images are less similar in gray tones.
The basic idea of this type of stereo is repeated in Slide 13.28. We have two antennas illuminating
the ground and receiving echoes as in a traditional radar image, and the overlap area can be
presented to the eyes as if it consisted of two optical images. The basic idea is also explained in Slide
13.29. Note that in each radar image, point P is projected into position P' or P'' and we get a
parallactic distance dp. The camera positions that would produce from a point P the
same positions P' and P'' and the same parallax distance dp are the camera positions 1 and 2 shown
in Slide 13.29.
Prüfungsfragen:
• Give an example and a concrete application of a non-optical sensor in stereo imaging!
13.6
Interactive Stereo-Measurements
If we want to make measurements using the stereo impression from two images, we need to add
something to our visual impression: a measuring mark. Slide 13.31 shows the two stereo images
and our eyes viewing the same point M in the two images, where it appears as M1 and
M2. If we add a measuring mark as shown there, we will perceive the measuring mark (M) to float
above or below the ground. If we now move the measuring mark in the two images such that
it superimposes the points M1 and M2, the measuring mark will coincide with the object point
M. We can now measure the elevation difference between two points by tracking the motion that
we have to apply to the measuring mark in image space. Slide 13.32 explains the object point M,
the measuring mark (M) and their positions in image space at M1, M2.
Slide 13.33 is an attempt at illustrating the position of the measuring mark above the ground, on
the ground and below the ground. In this particular case, the stereo perception is anaglyphic.
13.7
Automated Stereo-Measurements
See Algorithm ??. The measuring mark for stereo measurements needs to be placed on a pair
of homologue points. Knowing the location of the stereo measuring mark permits us to measure
the coordinates of the 3D point in the world coordinate system. A systematic description of the
terrain shape or, more generally, the shape of 3D objects requires many surface measurements
to be made by hand. This can be automated if the location of homologue points can be found
without manual interference. Slide 13.35 and Slide 13.36 explain.
Two images exist, forming a stereo pair, and a window is taken out of each image to indicate a
homologue area. The task, as shown in Slide 13.37, is to automatically find the corresponding
locations in such windows. For this purpose, we define a master and a slave image. We take a
window of the master image and move it over the slave image, and at each location we compute a
value describing the similarity between the two image windows. At the maximum value of similarity,
we have found a point of correspondence. We have as a result a point 1' in image (') and a point
1'' in image (''). These two points define two perspective rays from a perspective center through
the image plane into the world coordinate system, and these intersect at a surface point 1. We need to
verify that the surface point 1 makes sense: we will not accept that point if it is totally inconsistent
with its neighborhood; we call this a gross error. We will accept the point if it is consistent
with its neighborhood.
Slide 13.38 explains the process of matching with the master-and-slave image windows. Note that
the window may be of size K × J and we are looking in the master window of size N × M ,
obtaining many measures of similarity. Slide 13.39 defines individual pixels within the sub-image
and is the basis for one particular measure of similarity shown in Slide 13.40: the normalized
correlation, defined by a value R_N^2(m, n) at location (m, n). The
values in this formula are the gray values W in the master and S in the slave image. A double
summation occurs, because of the two-dimensional nature of the windows of size M × N . Slide
13.41 illustrates two additional image correlation measures. The normalized correlation produces
a value R, which typically assumes numbers between 0 and 1. Full similarity is expressed with
a value 1, total dissimilarity results in a value 0. A non-normalized correlation will not have a
range between 0 and 1, but will assume much larger ranges. However, whether the correlation is
normalized or not, one will likely find the same extrema and therefore the same match points.
A much different measure of similarity is the sum of absolute differences in gray values. We
essentially sum up the absolute differences in gray between the master and slave images at a
particular location (m, n) of the window. The computation is much faster than the computation
of a correlation, since we avoid the squaring of values; also, as soon as the running sum exceeds a
previously found value, we can stop the double summation, since we have already found a lower
value of absolute differences and therefore a more likely place at which maximum similarity is
achieved. Slide 13.42 explains how the many computations of correlation values result in a window
of such correlation values, and we need to find the extremum, the highest correlation within the
window, as marked by a star in Slide 13.42.
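A compact sketch of this window matching, using the normalized correlation of Slide 13.40 as the similarity measure (array handling and names are assumptions):

import numpy as np

def normalized_correlation(W, S):
    # R_N^2 for a master window W and an equally sized slave sub-window S.
    W = np.asarray(W, dtype=float)
    S = np.asarray(S, dtype=float)
    num = float(np.sum(W * S)) ** 2
    den = float(np.sum(W * W)) * float(np.sum(S * S))
    return num / den if den > 0 else 0.0

def match(master_window, slave_image):
    # Slide the master window over the slave image and keep the best position.
    K, J = master_window.shape
    best, best_pos = -1.0, None
    for m in range(slave_image.shape[0] - K + 1):
        for n in range(slave_image.shape[1] - J + 1):
            r = normalized_correlation(master_window, slave_image[m:m+K, n:n+J])
            if r > best:
                best, best_pos = r, (m, n)
    return best_pos, best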
Problems occur if we have multiple extrema and we do not know which one to choose.
Slide 13.43 suggests that various techniques exist to accelerate the matching process. Slide 13.44
indicates how an image pyramid allows us to do a preliminary match with reduced
versions of the two images, then limit the size of the search windows dramatically and thereby
increase the speed of finding successful matches. We call this a hierarchical matching approach.
Another trick is shown in Slide 13.45, where an input image is converted into a gradient image or
an image of interesting features. Instead of matching two gray value images, we match two edge
images. A whole theory exists on how to optimize the search for edges in images in preparation
for a stereo matching approach. Slide 13.46 explains that a high-pass filter that suppresses noise
and computes edges is preferable. Such a filter is the so-called LoG filter, or Laplacian-of-Gaussian
transformation of an image, with which we get two lines for each edge, since we are looking for
zero-crossings (in German: Nulldurchgänge). That subject is an extension of the topic of filtering.
Prüfungsfragen:
• Using the normalized correlation R_N^2(m, n), determine the image patch within the bold-framed
region in Figure B.25 that best matches the given mask M. State your computed values and mark
the region you find in Figure B.25!
Answer:
R_N^2(m, n) = [ Σ_{j=1..M} Σ_{k=1..N} W(j, k) · S_{m,n}(j, k) ]² / ( Σ_{j=1..M} Σ_{k=1..N} [W(j, k)]² · Σ_{j=1..M} Σ_{k=1..N} [S_{m,n}(j, k)]² )
With the abbreviations
c_WS := [ Σ_{j=1..M} Σ_{k=1..N} W(j, k) · S_{m,n}(j, k) ]²
c_WW := Σ_{j=1..M} Σ_{k=1..N} [W(j, k)]²
c_SS := Σ_{j=1..M} Σ_{k=1..N} [S_{m,n}(j, k)]²
the four candidate positions evaluate to:

Position        c_WS   c_WW   c_SS   R_N^2(m, n)
top left         25      6      5     0.833
top right        25      6      6     0.694
bottom left      16      6      6     0.444
bottom right      4      6      6     0.111

The best match is at the top left.
• What is the basic principle of methods that can reconstruct the surface of a body visible in
both images of a stereo pair?
Slides 13.1 through 13.47 accompany this chapter.
Chapter 14
Classification
14.1
Introduction
Concepts of classification can be used not just in image analysis and computer vision but also in
many other fields where one has to make decisions.
First we want to define the problem, then see some examples. We then review a heuristic approach called the minimum distance classifier. We finally go through the Bayes theorem as the basis
of statistical classification. We round out this chapter with a sketch of a simple implementation
based on the Bayes theorem.
Classification is a topic based to a considerable extent on the field of statistics, dealing with
probabilities, errors and estimation. We will stay away from deep statistics here and only take a short
look.
What is the definition of classification?
We have object classes Ci , i = 1, . . . , n, and we search for the class Ci to which a set of
observations belongs. The first question is which observations to make; the second is the classification
itself, namely the decision to which class the observations belong.
14.2
Object Properties
Let us review object features. Objects have colors, texture, height, whatever one can imagine. If
we classify the types of land use in Austria, as suggested in Slide 14.5, a set of terrain surface
properties will be needed, perhaps from satellite images and public records. Slide 14.6 enumerates
the 7 properties of electromagnetic radiation one can sense remotely, say by camera, thermal
images, radiometry, radar and interferometric sensors. As a sensor collects image data about a
scene from a distance, up to 7 characteristics are accessible.
However, the properties of the sensed signal may be used to "invert" it into a physical parameter
of the object. Examples may be the object point's moisture or roughness, possibly its geometric
shape. Slide ?? illustrates a camera image of a small segment of skin with a growth called a lesion
that could be cancer. One can extract from the physically observed color image some geometric
properties of the lesion such as length, width, roughness of the edge, etc.
Slide ?? is a fingerprint, Slide 14.9 a set of derived numbers describing the fingerprint. Each
number is associated with a pixel, for a feature vector per pixel, or with a larger object such as
the lesion or the fingerprint. The feature vector x is the input to a classification.
Prüfungsfragen:
• Which physical properties of the radiation emitted or reflected by a body are suitable for
determining its surface properties (e.g., for the purpose of classification)?
14.3
Features, Patterns, and a Feature Space
Algorithm 36 Feature space
FeatureSpace = CreateHyperCube(n-dimensional)   {create an n-dimensional hypercube}
for all pixels in the image do
  FeatureSpace[Pixel[Plane-1], Pixel[Plane-2], ..., Pixel[Plane-n]] += 1   {increment the corresponding point in the feature space by 1}
end for
{This algorithm creates a feature space represented by an n-dimensional hypercube.}
If we have to do a color classification, then our features will be "color". In a color image we represent
color via the red-green-blue (RGB) planes. Recall the eight-bit gray value images representing the
R channel, the G channel and the B channel, i.e., red, green, blue.
Slide x suggests a color classification, but has 4 images or channels, for instance infrared
(IR) in addition to RGB, or temperature, or whatever we can find as an object feature.
We now build up a feature space. In the case of RGB we would have three dimensions. Slide 14.12
presents just two dimensions for simplicity, for example R and G. If we add more features (B, IR,
temperature, ...) we end up with hyperspaces which are hard to visualize.
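A minimal sketch of building such a feature space as an n-dimensional histogram, in the spirit of Algorithm 36 (8-bit channels and all names are assumptions):

import numpy as np

def build_feature_space(image, bins=16):
    # image: H x W x n array, one plane per feature (e.g. R, G, B, IR), 8-bit values assumed.
    # Returns an n-dimensional histogram ("hypercube"), one cell per feature combination.
    h, w, n = image.shape
    vectors = image.reshape(-1, n)
    hist, _ = np.histogramdd(vectors, bins=[bins] * n, range=[(0, 256)] * n)
    return hist

rgb = np.random.randint(0, 256, size=(64, 64, 3))
space = build_feature_space(rgb)
print(space.shape, int(space.sum()))   # (16, 16, 16), 64*64 entries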
14.4
Principle of Decisions
Slide 14.14 illustrates what we would like to get from the classifier's decisions: each object, in this
case each pixel, is to be assigned to a class, here denoted by O1 , O2 , O3 . . .
The simplest method of classification is the so-called minimum distance classifier. Slide 14.19
presents a 2-dimensional feature space. Each entry into this 2D space is a vector x = (x1 , x2 )T or
(g1 , g2 )T , with the observations x1 , x2 or g1 , g2 , for example representing the amount of red (R)
or green (G) as an 8-bit digital number DN from an image.
These observations describe one pixel each, and we may find that the value for R is
50 and for G 90. This determines a unique entry in the feature space. As we make observations
of known objects we define a so-called learning phase, in which we find feature pairs defining
a specific class. R = 50, G = 90 might be one type of object.
We now calculate the mean value of a distribution, which is nothing else than the expected value of
a set of observations. The arithmetic mean in this case is obtained by summing up all the values
and dividing by their number. We connect the means of two classes by a straight line and define a
line halfway between the means, perpendicular to the connecting line. This boundary between the two
classes is called the discriminating function.
If we now make an observation of a new, unknown object (pixel), we simply determine the distances
to the various means. In Slide 14.16 the new object belongs to class O3 . This is the minimum distance
classifier.
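A minimal sketch of this minimum distance decision (the class means and the sample values are assumed for illustration only):

import numpy as np

def minimum_distance_classify(x, class_means):
    # x: feature vector; class_means: dict mapping class name -> mean feature vector.
    # Assign x to the class whose mean is closest (Euclidean distance).
    distances = {name: np.linalg.norm(np.asarray(x, float) - np.asarray(m, float))
                 for name, m in class_means.items()}
    return min(distances, key=distances.get)

means = {"O1": [50, 90], "O2": [200, 40], "O3": [120, 180]}
print(minimum_distance_classify([115, 170], means))   # -> "O3"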
What could be a problem with the minimum distance classifier? Suppose that in the learning
phase one makes an error and for the class O3 we make an “odd” observation. This will affect
Algorithm 37 Classification without rejection
TYPE pattern =
  feature: ARRAY [1 .. NbOfFeatures] of Integer;
  classIdentifier: Integer;

Classify-by-MinimumDistance (input: pattern)
{this method sets the "classIdentifier" of "input" to the class represented by the nearest sample pattern}
  for i := 1 to NbOfSamples do
    Distance := 0   {initial value}
    {sum all differences between "input" and "SamplePattern[i]"}
    for j := 1 to NbOfFeatures do
      Difference := input.feature[j] − SamplePattern[i].feature[j]
      Distance := Distance + |Difference|
    end for
    if i = 1 then
      minDistance := Distance   {initial value}
    end if
    {set the class}
    if Distance ≤ minDistance then
      minDistance := Distance
      input.classIdentifier := SamplePattern[i].classIdentifier
    end if
  end for

Classify-by-DiscriminationFunction (input: pattern)
{this method sets the "classIdentifier" of "input" to the class with the maximum function result}
  for i := 1 to NbOfClasses do
    Sum := 0   {initial value}
    {sum all function results of the input features}
    for j := 1 to NbOfFeatures do
      functionResult := DiscriminationFunction[i](input.feature[j])   {DiscriminationFunction[] represents the actual function set}
      Sum := Sum + functionResult
    end for
    if i = 1 then
      maxSum := Sum   {initial value}
    end if
    {set the class}
    if Sum ≥ maxSum then
      maxSum := Sum
      input.classIdentifier := i
    end if
  end for
the expected value for the entire data set. One problem is then that we have not considered
the "uncertainty" of the observations in defining the various classes. This "uncertainty" would
be represented by the "variance" of our observations. If the observations are clustered together
closely, then their variance is small. If they are spread out widely, then their variance is larger.
Variance is not considered in a minimum distance classifier. Figure x illustrates that each pixel gets
classified and assigned to a class. There are no rejections, where the classifier is unable to make a
decision and rejects a pixel/object/feature vector as belonging to none of the classes.
Prüfungsfragen:
• Given training pixels with the gray values listed in the attached Table ??. Also given is a new
pixel x_neu = (13, 7).
1. Graphically set up a two-dimensional feature space and plot the positions of the training
pixels.
2. Describe a simple computational procedure (algorithm) for deciding to which "object class"
this new pixel most probably belongs.
3. Carry out the numerical computation of this decision and thus justify numerically the
assignment of the new pixel to one of the object classes represented by the training pixels.
14.5
Bayes Theorem
Algorithm 38 Classification with rejection
Pmin := 0.6   {chosen threshold for what to classify}
while there is a pixel to classify do
  pick and remove the pixel from the to-do list
  Pmax := −1   {reset for each pixel}
  x := f(pixel)   {n-dimensional feature vector, represents the information about the pixel}
  for all existing classes Ci do
    with x calculate the a-posteriori probability P(Ci | x) for the pixel
    if P(Ci | x) > Pmax then
      Pmax := P(Ci | x)
      k := i   {store the currently most probable class k for the pixel}
    end if
  end for
  if Pmax > Pmin then
    add the pixel to the corresponding class k   {classification}
  else
    leave the pixel unclassified   {rejection}
  end if
end while
The Bayes theorem looks complicated, but is not. We define the probability that an observation x
belongs to a class Ci . We call it an a-posteriori probability because it is the probability of a result,
after the classification. This resulting probability is computed from three other probabilities. The
first is the result of the learning phase, namely the probability that, given a class Ci , we make the
observation x. Second, we have the a-priori knowledge of the expert, providing the probability that
a class Ci occurs at all. The third is the probability of the observation x itself, which can be
computed from the joint probabilities of observation x and each class Ci .
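Written out (the standard statement of the theorem; the symbols are assumptions chosen to match the description above):

P(Ci | x) = P(x | Ci) · P(Ci) / P(x),   with   P(x) = Σ_{j=1..n} P(x | Cj) · P(Cj).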
This formula will not by itself give us an implementation in software, but the expression in
Slide 14.18 serves to explain the relationships. A sketch of a possible implementation follows. First
we make a very common assumption, shown in Slide 14.18. This assumption is called the closed-world
assumption over all the classes, stating that there is no unknown class and that an observation will
belong to one of the n classes. In statistics this expresses itself as the sum of all a-posteriori
probabilities being 1. Take colors as an example: there is no pixel in the image for which we do not know
a color. The Bayes theorem simplifies under this assumption, since the probability of the observation
is the same for all classes and enters only as a constant factor 1/a.
The problem with all classifiers is the need to model expert knowledge and then to train one's system.
The hard part is to find a correct computational model. One simple implementation would thus
be to calculate the variances of our observations in the learning phase. We compute
not only the means, as we did before, but also the variance or standard deviation. We need to
learn our pixels, our color triplets; we need to assign certain triplets to certain colors,
and this will give us our means and our variances as in Slide 14.23. Note that the slide shows 2,
not 3, dimensions. The mean value and the variance define a Gaussian function representing the
so-called distribution of the observations.
In Slide 14.23 the ellipse may for instance define the 1-sigma border: "sigma" or σ is the standard
deviation, σ² is the variance. The probability is represented by a curve, or in 2D a surface, that is
called a Gaussian curve or surface.
This means that the probability that an observation falls within the ellipse of O3 is 66%. If the ellipse
is drawn at 3σ (3 times the standard deviation), then the probability rises to 99%.
By calculating the variance and the sigma border for each class Ci or Oi we produce n Gaussian
functions. In Slide ?? we have two dimensions, red and green. We make an observation which we
want to classify. We do not calculate the minimum distance, but we check in which ellipse the
vector of a new observation will come to lie.
To summarize, we have performed two steps: we calculate the mean and variance of each class in
the learning phase and then “intersect” the unknown observation with the result of the learning
phase. A simple Bayes classifier requires no more than to determine the Gaussian function
discussed above.
The Gaussian function in a single dimension for classifying is

d_j(x) = 1 / sqrt(2π σ_j²) · exp( −(x − m_j)² / (2 σ_j²) ),

with x being the feature value, σ_j the standard deviation, m_j the mean and j the index
associated with a specific class. In more than one dimension, m and x are replaced by vectors
and σ becomes a matrix.
This algorithm is summarized in Slide 14.23: m is the mean of each class, C is the variance. In
a multi-dimensional context, C is a matrix of numbers, the so-called covariance matrix. It is
computed using the coordinates of the mean m. The expression E{·} in Slide 14.23 denotes the
expected value and can be estimated by

C = (1/N) Σ_{k=1..N} x_k x_k^T − m m^T

or equivalently

c_ij = (1/N) Σ_{k=1..N} (x_{k,i} − m_i)(x_{k,j} − m_j),   i, j = 1 . . . M,

where M is the dimension of the feature space and N is the number of feature vectors or pixels per
class in the learning phase.
As shown in Slide 14.23, each class of objects gets defined by an ellipse.
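A minimal sketch of this two-step procedure, learning a mean and covariance per class and then evaluating the Gaussian density for a new feature vector (all names are assumptions; a practical implementation would also guard against singular covariance matrices):

import numpy as np

def learn_class(samples):
    # samples: N x M array of feature vectors of one class (learning phase).
    m = samples.mean(axis=0)
    C = np.cov(samples, rowvar=False, bias=True)   # bias=True: divide by N, as in the text
    return m, C

def gaussian_density(x, m, C):
    # Multivariate normal density with mean m and covariance C.
    d = np.asarray(x, float) - m
    k = len(m)
    norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(C))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(C) @ d)

def classify(x, learned):
    # learned: dict class name -> (mean, covariance); pick the most probable class.
    return max(learned, key=lambda c: gaussian_density(x, *learned[c]))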
Prüfungsfragen:
• In image classification one often tries to approximate the unknown probability density function
of the N known feature vectors in m-dimensional space by a Gaussian normal distribution. For
this, the m × m covariance matrix C of the N vectors is needed. Figure B.28 shows three feature
vectors p1 , p2 and p3 in two dimensions (thus N = 3 and m = 2). Compute the corresponding
covariance matrix C!
Answer:
First compute the mean m:
m = (1/3) · [ (1, −1)^T + (3, 3)^T + (2, 4)^T ] = (2, 2)^T
then determine the terms (p_i − m) · (p_i − m)^T:
(p1 − m) · (p1 − m)^T = (−1, −3)^T · (−1, −3) = [ 1 3 ; 3 9 ]
(p2 − m) · (p2 − m)^T = (1, 1)^T · (1, 1) = [ 1 1 ; 1 1 ]
(p3 − m) · (p3 − m)^T = (0, 2)^T · (0, 2) = [ 0 0 ; 0 4 ]
The covariance matrix is
C = (1/3) Σ_{i=1..3} (p_i − m) · (p_i − m)^T = (1/3) · [ 2 4 ; 4 14 ]
• Let p(x), x ∈ R², be the probability density function of the Gaussian normal distribution whose
parameters were estimated from the three feature vectors p1 , p2 and p3 of exercise B.2. Further,
let two points x1 = (0, 3)^T and x2 = (3, 6)^T be given in the feature space. Which of the following
two statements is correct (justify your answer):
1. p(x1 ) < p(x2 )
2. p(x1 ) > p(x2 )
Hint: Plot the two points x1 and x2 in Figure B.28 and consider in which direction the eigenvectors
of the covariance matrix C from exercise B.2 point.
Answer: p(x1 ) < p(x2 ), since x2 lies in the direction of the largest eigenvector of C
(measured from the class center m) and therefore the probability of x2 is greater than
that of x1 .
14.6
Supervised Classification
The approach where training/learning data exist is called supervised classification. Unsupervised
classification is a method where pixels (or objects) get entered into the feature space without knowing
what they are. In that case a search is started to detect clusters in the data. The search comes up with
aggregations of pixels/objects and simply defines each aggregate to be a class.
In contrast to this approach, common classification starts out from known training pixels or objects.
A real-life case is shown in Slide 14.22. A clustering algorithm may find 3 clusters here. In fact,
Slide ?? is the actual segmentation of these training pixels into 6 object classes (compare with
Slide 14.23).
The computation in the learning or training phase, which leads to Slide ??, is the basis for receiving
new pixels. If they fall within the agreed-upon range of a class, the pixel is assigned to that class.
Otherwise it is not assigned to any class: it gets rejected.
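The text does not name a particular cluster-search method; as one common choice, a k-means sketch could look like this (names and the iteration count are assumptions):

import numpy as np

def kmeans(vectors, k, iterations=20, seed=0):
    # Cluster search: each resulting aggregate is then simply declared to be a class.
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iterations):
        # assign every feature vector to its nearest center
        d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return labels, centers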
14.7
Real Life Example
Slide 14.26 to Slide 14.31 illustrate a classification of the territory of Austria on behalf of a cell-phone project, where the surface cover was needed for wave propagation and signal strength assessment.
It is suggested that the classification was unsupervised, thus without training pixels, simply
looking for groups of similar pixels (clusters). A rather "noisy" result is obtained in Slide 14.28;
Slide 14.29 presents the forest pixels, where many pixels get assigned to different classes although
they are adjacent to one another. This is the result of not considering "neighborhoods". One can
fix this by means of a filter that aggregates adjacent pixels into one class if this does not totally
contradict the feature space. The surface cover and land-use result for the city of Vienna is shown in
Slide 14.31.
14.8
Outlook
In the specialization class on "Image Processing and Pattern Recognition" we will discuss more
details of this important and central topic:
• Multi-variable probabilities
• Neural network classification
• Dependencies between features
• Non-statistical classification (shape, chain codes)
• Transition to Artificial Intelligence (AI)
Slides 14.1 through 14.32 accompany this chapter.
Chapter 15
Resampling
We have previously discussed the idea of resampling under the heading of Transformation (Chapter
9). It was a side-topic in that chapter, essentially an application. We will focus on the topic here,
using many of the illustrations from previous chapters.
Prüfungsfragen:
• What is meant by (geometric) "resampling", and what possibilities exist for computing the
intensities of the pixels in the output image? Describe different methods using a sketch and,
where appropriate, a formula!
15.1
The Problem in Examples of Resampling
Slide 15.3 recalls an input image that is distorted and illustrates, in connection with Slide 15.4,
the rectification of the image, a geometric transformation from the input geometry to an output
geometry. The basic idea is illustrated in Slide 15.5. On the left, we have an input image geometry,
representing a distorted image. On the right, we have the output geometry, representing a
corrected or rectified image. The suggestion here is that we take a grid mesh of lines to cut up the
input image and stretch each quadrilateral of the input image to fit into a perfect square of the
output image. This casual illustration of geometric transformation actually presents reasonably
well what happens in geometric transformation and resampling in digital image processing.
Resampling is also applicable in a context where we have individual images taken at different times
from different vantage points and we need to merge them into a continuous large image. We call
this process mosaicing. The images might overlap, and the overlap is used to achieve a match
between the images, finding homologue points. Those are the basis for a geometric transformation
and resampling process, to achieve the mosaic.
Finally, resampling is also an issue in computer graphics when dealing with texture. We may
have an input image showing a particular pattern, and as we geometrically transform or change
the scale of that pattern, we will have to resample the texture. The illustration shows so-called
MIP-maps, which are prefiltered copies of a texture at a series of reduced resolutions.
15.2
A Two-Step Process
Geometric transformation and resampling are typically performed in a two-step process.
The first step is the establishment of a geometric relationship between the input and the output
images, essentially a coordinate processing issue. We typically have a regular pattern of pixels
in the input image, and conceptually we need to find the geometric location in the output image
representing the center of each pixel from the input image. Vice versa, we may have a regular
image matrix on the output side (the ground), and for each center of an output pixel we need to
find the location in the input image from which to pick a gray value. Slide 15.9 explains; Slide
15.10 and the following slide augment that explanation. We have an input image that is geometrically distorted.
The object might be a stick figure, as suggested in Slide 15.10. The output or target image is a
transformed stick figure. We have regular pixels in the target or output image that need to be
assigned gray values as a function of the input image. Slide 15.12 explains the idea of the two-step
process: On the one hand we have step 1, a manipulation of coordinates, mapping the input
(x, y) coordinates into output (x̂, ŷ) coordinates. On the other hand we have step 2, a search for a gray
value for each output pixel, starting from the output location of a pixel and looking in the input
image for that gray value.
15.2.1
Manipulation of Coordinates
We have correspondence points between the image space and the target or output space. These correspondence points serve to establish a geometric transformation that converts the input (x, y)
coordinates of an arbitrary image location into output (i, j) coordinates in the target space.
This particular transformation has unknown transformation parameters which have to be computed in a separate process called spatial transformation. We will discuss in a moment how this
is done efficiently.
15.2.2
Gray Value Processing
Once this spatial transformation is known, we need to go through the output image and for each
pixel center (i, j) we need to find an input coordinate location (x, y) and we need to grab that
gray value and place that value at the pixel location of the output or target image.
Algorithm 39 Calculation with a node file
while there is another quadrangle quad_in in the input node file do
  if there is a corresponding quadrangle quad_out in the output node file then
    read the four mesh points of the quadrangle quad_in
    read the four mesh points of the quadrangle quad_out
    calculate the (eight) parameters params of the (bilinear) transformation
    save the parameters params
  else
    error   {no corresponding quadrangle quad_out for quadrangle quad_in}
  end if
end while
for all pixels p_out of the output image do
  get the quadrangle quad_out in which pixel p_out lies
  get the parameters params corresponding to the quadrangle quad_out
  calculate the input image position p_in of p_out with the parameters params
  calculate the grey value grey of pixel p_out according to the position of p_in
  assign the grey value grey to p_out
end for
15.3
Geometric Processing Step
See Algorithm 39. We go back to the idea that we cut up the input image into irregular meshes, where
each corner of the mesh pattern corresponds to a corner of a regular mesh pattern in the output image.
We also call these mesh points nodes; we obtain a node file in the input image that corresponds
to the node file in the output image. Slide 15.15 suggests that the geometric transformation that
relates the irregular meshes of the input image to the rectangular meshes of the output image
could be a polynomial transformation, as previously discussed. More generally, we use a simple
transformation that takes four input points into four output points, as suggested in Slide 15.16.
That is a bi-linear transformation with 8 coefficients. The relationships between the mesh points
of the input and output image are obtained as a function of control points (in German: Pass-Punkte).
Suggested in Slide 15.16 and Slide 15.17 are control points at the locations marked by little stars. It
is those stars that define the parameters of a complex transformation function. The transformation
function is applied to the individual mesh points in the input and output images. For each location
in the output image, we compute the corresponding input mesh point. Slide 15.18 summarizes the
result of these operations. Recall that we had given control points, which we use to compute the
transformation function. With the transformation function we establish image coordinates that
belong to mesh points in the image representing regularly spaced mesh points in the output image.
With this process we have established the geometric relationship between input and output image,
using the ideas of transformations and resulting in a node file in the input and output images.
Algorithm 40 Nearest neighbor
1:
2:
3:
4:
read float-coordinates x0 and y0 of the input-point
x1 := round(x0 )
y1 := round(y0 )
return grayvalue of the new point (x1 , y1 )
15.4
{result is an integer}
{result is an integer}
Radiometric Computation Step
After the geometric relationships have been resolved, we now go to an arbitrary output pixel and
using its position within a square mesh, we compute the location in the input image, using the
bi-linear relationship within the mesh to find the location in the input image as suggested in Slide
15.20. That location will be an arbitrary point (x, y) that is not at the center of any pixel.
We now can select among various techniques to find a gray value for that location to be put into
the output pixel. Suggested in Slide 15.20 are 3 different techniques. If we take the gray value of
the pixel onto which location (x, y) falls, we call this the nearest neighbor (see Algorithm 40). If we
take four pixels that are nearest to the location (x, y), we can compute a bi-linear interpolation (see
Algorithm ??). If we use the 9 closest pixels, we can use a bi-cubic interpolation. We differentiate
between nearest neighbor, bi-linear and bi-cubic resampling in accordance with the technique
for gray value assignment. Slide 15.21 specifically illustrates the bi-linear interpolation: which
gray value do we assign to the output pixel as shown in Slide 15.21? We take the 4 gray values
nearest the location (x, y), those gray values are g1 , g2 , g3 , g4 , and by a simple interpolation, using
auxiliary values a and b, we obtain a gray value bi-linearly interpolated from the four gray values
g1 , g2 , g3 , g4 .
Prüfungsfragen:
1 in
German: Pass-Punkte
258
CHAPTER 15. RESAMPLING
• Gegeben sei ein Inputbild mit den darin mitgeteilten Grauwerten (Abbildung B.8). Das
Inputbild umfasst 5 Zeilen und 7 Spalten. Durch eine geometrische Transformation des
Bildes gilt es nun, einigen bestimmten Pixeln im Ergebnisbild nach der Transformation
einen Grauwert zuzuweisen, wobei der Entsprechungspunkt im Inputbild die in Tabelle B.1
angegebenen Zeilen- und Spaltenkoordinaten aufweist. Berechnen Sie (oder ermitteln Sie mit
grafischen Mitteln) den Grauwert zu jedem der Ergebnispixel, wenn eine bilineare Grauwertzuweisung erfolgt.
15.5
Special Case: Rotating an Image by Pixel Shifts
We show in Slide 15.23 an aerial oblique image of an urban scene. We want to rotate that image
by 45o . We achieve this by simply shifting rows and columns of pixels (see Algorithm ??).. In
a first step, we shift each column of the image, going from right to left and increasingly shifting
the rows down. In a second step, we now take the rows of the resulting image and shift them
horizontally. As a result, we obtain a rotated version of the original image.
15.5. SPECIAL CASE: ROTATING AN IMAGE BY PIXEL SHIFTS
259
260
CHAPTER 15. RESAMPLING
Slide 15.1
Slide 15.2
Slide 15.3
Slide 15.4
Slide 15.5
Slide 15.6
Slide 15.7
Slide 15.8
Slide 15.9
Slide 15.10
Slide 15.11
Slide 15.12
Slide 15.13
Slide 15.14
Slide 15.15
Slide 15.16
Slide 15.17
Slide 15.18
Slide 15.19
Slide 15.20
Slide 15.21
Slide 15.22
Slide 15.23
Slide 15.24
Slide 15.25
Slide 15.26
Slide 15.27
Chapter 16
About Simulation in Virtual and
Augmented Reality
16.1
Various Realisms
Recall that we have earlier defined various types of reality. We talked about virtual reality, that
presents objects to the viewer that are modeled in a computer. Different from that is photographic
reality that we experience by an actual photograph of the natural environment. It differs from the
experience we have when we go in the real world and experience physical reality. You may recall
that we also talked about emotions and therefore talked about psychological reality, different from
the physical one. Simulation is now an attempt at creating a virtual environment that provides
essential aspects of the physical or psychological reality in a human being without the presence of
the full physical reality.
16.2
Why simulation?
To save money when training pilots, bus drivers, ship captains, soldiers, etc.
Simulation servers may be used for disaster preparedness training. Simulation is big business.
How realistic does a simulation have to be? Sufficiently realistic to serve the training purpose.
Therefore not under all circumstances do we need photorealism in simulation. We just need to
have enough visual support to challenge the human in a training situation.
16.3
Geometry, Texture, Illumination
Simulation needs information about the geometry of a situation, the illumination and the surface
properties. These are three factors, illustrated in Slide 16.8, Slide 16.9, Slide 16.10. The geometry
will not suffice if we need to recognize a particular scene. We will have difficulties with depth
queues as a function of size. We have a much reduced quality of data if we ignore texture. Texture
provides a greatly enhanced sense of realism and helps us better to estimate depth. In a disasterpreparedness scenario, the knowledge of windows and doors may be crucial and it may only be
available through texture and not through geometry.
Illumination is a third factor that creates shadows and light, again to help better understand the
context of a scene, estimate distances and intervisibility.
261
262
CHAPTER 16. ABOUT SIMULATION IN VIRTUAL AND AUGMENTED REALITY
16.4
Augmented Reality
We combine the real world and the computer generated representation of a modeled world that
does not need to be in existence in reality. A challenge is the calibration of a system. We need
to see the real world and what is superimposed on it is shown on the two monitors. This needs
to match geometrically and in scale with the real environment that we see. Therefore we need to
define a world coordinate system and communicate that to the computer.
We also need sufficient speed, so if we turn our head, the two stereo-images computed for visual
consumption are recomputed instantly as a function of the changed angle. We need also to be
accurate to assess any rotations or change of position.
Magnetic positioning often is too slow and too inaccurate to serve the purpose well. For that
reason, an optical auxiliary system may be included in an augmented reality environment, so that
the world is observed through the camera and any change in attitude or position of the viewer
is more accurately tracked than the magnetic position could achieve. However, a camera-based
optical tracking system may be slow, too slow to act in real time at a rate of about thirty positioning computations per second. Therefore the magnetic positioning may provide an approximate
solution that is only refined by the optical tracking.
Slide 16.13 illustrates an application with a game played by two people seeing the same chess
board. An outside observer seeing the two players will see nothing. It is the two chess players who
will see one another and the game board.
Prüfungsfragen:
• Beschreiben Sie den Unterschied zwischen Virtual Reality“ und Augmented Reality“.
”
”
Welche Hardware wird in beiden Fällen benötigt?
16.5
Virtual Environments
If we exclude the real world from being experienced, then we talk about the virtual environment
or, more customarily, virtual reality. We immerse ourselves in the world of data. However, we
still have our own position and direction of viewing. As we move or turn our head we would like
to have in a virtual environment a resulting effect of looking at a new situation. Therefore, much
as in augmented reality, do we have a need to recompute very rapidly the stereo-impression of the
data world. However, virtual reality is simpler than augmented reality, because we don’t have the
accuracy requirement to superimpose the virtual over the real, as we have in augmented reality.
In a virtual reality environment, we would like to interact with the computer using our hands
and as a result we need some data garments that allow us to provide inputs to the compute, for
example by motions of our hands and fingers.
Prüfungsfragen:
• Erklären Sie das Funktionsprinzip zweier in der Augmented Reality häufig verwendeter
Trackingverfahren und erläutern Sie deren Vor- und Nachteile!
Antwort:
Tracking
magnetisch
optisch
Vorteile
robust
schnell
genau
Nachteile
kurze Reichweite
ungenau
Anforderung an Umgebung
aufwändig
16.5. VIRTUAL ENVIRONMENTS
263
Slide 16.1
Slide 16.2
Slide 16.3
Slide 16.4
Slide 16.5
Slide 16.6
Slide 16.7
Slide 16.8
Slide 16.9
Slide 16.10
Slide 16.11
Slide 16.12
Slide 16.13
Slide 16.14
Slide 16.15
Slide 16.16
264
CHAPTER 16. ABOUT SIMULATION IN VIRTUAL AND AUGMENTED REALITY
Chapter 17
Motion
17.1
Image Sequence Analysis
A fixed sensor may observe a moving object, as suggested in Slide 17.3, where a series of images
is taken of moving ice in the arctic ocean. There is not only a motion of the ice, there is also
a change of the ice over time. Slide 17.4 presents a product obtained from an image sequence
analysis, representing a vector diagram of ice flows in the arctic ocean. The source of the results
was a satellite radar system of NASA, called Seasat that flew in 1978. This is now available also
from recent systems such as Canada’s Radarsat, currently orbiting the globe.
17.2
Motion Blur
Slide 17.6 illustrates a blurred image that is a result of an exposure taken while an object moved.
If the motion is known, then its effect can be removed and we can restore an image as if no motion
had happened.
The inverse occurs in Slide 17.7, where the object was stable but the camera moved during the
exposure. The same applies: if we can model the motion of the camera we will obtain a successful
reconstruction of the object by removal of the motion blur of the camera Slide 17.7 suggests that
simple filtering will not remove that blur. We need to model the effect of the motion. Yet the
process itself is called an Anti-Blur filter .
Prüfungsfragen:
• Was versteht man unter motion blur“, und unter welcher Voraussetzung kann dieser Effekt
”
aus einem Bild wieder entfernt werden?
Antwort: Durch Bewegung des aufgenommenen Objekts relativ zur Kamera während der
endlichen Öffnungszeit der Blende wird das Bild verwischt“. Eine Entfernung dieses Effekts
”
setzt voraus, dass diese Bewegung genau bekannt ist.
17.3
Detecting Change
Change may occur because of motion. Slide 17.9 explains the situation in which a group of people
is imaged while a person is moving out of the field-of-view of the camera. An algorithm can be
constructed that will detect the change between each image and its predecessors and in the process
265
266
CHAPTER 17. MOTION
allows one to map just changes. The inverse idea is to find what is constant and eliminate changes
form a sequence of images. An example is to compute texture of a building’s facade covered by
trees.
17.4
Optical Flow
A rapid sequence of images may be obtained of a changing situation. An example is the observation
of traffic. Optical flow is the analysis of the sequence of images, and the assessment of the motion
that is evident from the image stream. A typical representation of optical flow is by vectors
representing moving objects. Slide 17.12 explains.
17.4. OPTICAL FLOW
267
Slide 17.1
Slide 17.2
Slide 17.3
Slide 17.4
Slide 17.5
Slide 17.6
Slide 17.7
Slide 17.8
Slide 17.9
Slide 17.10
Slide 17.11
Slide 17.12
Slide 17.13
Slide 17.14
Slide 17.15
Slide 17.16
Slide 17.17
268
CHAPTER 17. MOTION
Chapter 18
Man-Machine-Interfacing
Our University offers a separate class on Man-Machine Interaction or Human-Computer-Interfaces
(HCI) as part of the multi-media program and as part of the computer graphics program. This
topic relates to elements of Computer-Graphics and Image Analysis since visual information in
created and manipulated.
18.1
Visualization of Abstract Information
Use of color and shape are a widely applicable tool in converging information. We have seen
examples in the Chapter on Color, encoding terrain elevation or temperature in color, or marking
contours of objects in color.
A very central element in the man-machine interaction is the use of the human visual sense to
present non-visual information for communication and interaction. An example is shown in Slide
18.3 where a diagram is presented that has on one axis the calendar time and on the other axis
a measure of popularity of movies. The interface serves to find movies on a computer monitor
by popularity and by age. Simultaneously, we can switch between various types of movies, like
drama, mystery, comedy and so forth.
Slide 18.4 is a so-called table-lens. This is a particular type of Excel sheet which shows the entire
complexity of the sheet in the background and provides a magnifying class that can be moved over
the spread sheet.
Another idea is shown in Slide 18.5 with the so-called cone-tree, representing a file structure. It
is a tree, which at its root has an entire directory, this is broken up into folders or subdirectories
which are then further broken up until each leaf is reached representing an individual file. A
similar idea is shown in Slide 18.6 called information slices. We have a very large inventory of
files, organized in subdirectories and directories. We can take subgroups of these subdirectories
and magnify them, until we can recognize each individual file.
18.2
Immersive Man-Machine Interactions
The subject of man-machine interaction also is involved in an immersion of the human in the world
of data, as we previously discussed in virtual reality which is sometimes denoted as immersive
visualization. Of particular interest is the input to the computer by means other than a keyboard
and mouse. This of course is increasingly by speech, but also by motions of the hands and
fingers, or by the recognition of facial expressions. This represents a hot subject in man-machine
interaction and ties in with computer graphics and image analysis.
269
270
CHAPTER 18. MAN-MACHINE-INTERFACING
Slide 18.1
Slide 18.2
Slide 18.3
Slide 18.4
Slide 18.5
Slide 18.6
Slide 18.7
Slide 18.8
Slide 18.9
Slide 18.10
Chapter 19
Pipelines
19.1
The Concept of an Image Analysis System
Various ideas exist in the literature about a system for image analysis. The idea of a pipeline
comes about if we consider that we have many components and algorithms in a repository of an
image possessing library. In order to set up an entire image analysis process, we plug the individual
processing steps together, much like a plumber will put a plumbing system in a building together
from standard components. In computer graphics and image processing we call this plumbing also
creation of a pipeline.
As shown in Slide 19.3 an image analysis system always begins with image acquision and sensing. We build up a system by going through preprocessing and segmentation to representation,
recognition and final use of the results of the image analysis system. All of this is built around
knowledge.
A somewhat different view combines the role of image analysis with the role of computer graphics
and separates the role into half worlds, one of reality and one of computer models. In the simplest
case, we have the world, within it a scene from which we obtain an image which goes into the
computer. The image will be replaced by an image description which then leads to a scene
description, which ultimately ends up with a description of the world.
We can close the loop from the description of the world, go back to the world, make the transition
from computer to reality by computer graphics.
The idea of active vision is going from the world to a description of the world, closing the loop
from an incomplete description of the world to a new second loop through the selection of a scene,
selection of images and so forth as shown in Slide 19.12. If as in analogy to the previous model, we
assign a central control element with expert knowledge, we have a similar idea as shown before.
Prüfungsfragen:
• Skizzieren Sie die Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen
”
Szene mittels z-buffering und Gouraud-shading!
19.2
Systems of Image Generation
Prüfungsfragen:
• Was wird in der Bildanalyse mit dem Begriff Active Vision“ bezeichnet?
”
271
272
CHAPTER 19. PIPELINES
19.3
Revisiting Image Analysis versus Computer Graphics
Slide 19.18 suggests that the transition from an image to a model of a scene is the subject of
image understanding or image processing. The inverse, the transition from a scene model to an
image is the subject of computer graphics. We do have a great overlap between image analysis
and computer graphics when it concerns the real world. Image analysis will always address the
real world, whereas computer graphics may deal with a virtual world that does not exist in reality.
In cases where one goes from a model of a non-existing world to an image, we are not dealing with
the inverse of image analysis.
Prüfungsfragen:
• Welche ist die wesentliche Abgrenzung zwischen Computergrafik und Bildanalyse, welches
ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung erwünscht.
Algorithm 41 z-buffer pipeline
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
for y = 0 to YMAX do
for x = 0 to XMAX do
WritePixel(x, y, backgroundcolor)
Z[x, y] := 0
end for
end for
for all Polygons polygon do
for all pixel in the projection of the polygon do
pz :=GetZValue(polygon,x,y)
if pz ≥ Z[x, y] then
Z[x, y] := pz
WritePixel(x, y, Color of polygon at (x,y) )
end if
end for
end for
{new point is in front}
Algorithm 42 Phong pipeline
1:
2:
3:
4:
5:
set value ai
set value il
diff:=diffuse()
reflect:=reflection()
result:= ai + il * (diff + reflect)
{ai is the ambient intensity.}
{il is the intensity of the light source.}
{calculates the amount of light which directly fall in.}
{calculates the amount of light which reflect.}
{formula developed by Phong.}
19.3. REVISITING IMAGE ANALYSIS VERSUS COMPUTER GRAPHICS
273
Slide 19.1
Slide 19.2
Slide 19.3
Slide 19.4
Slide 19.5
Slide 19.6
Slide 19.7
Slide 19.8
Slide 19.9
Slide 19.10
Slide 19.11
Slide 19.12
Slide 19.13
Slide 19.14
Slide 19.15
Slide 19.16
Slide 19.17
Slide 19.18
Slide 19.19
Slide 19.20
274
CHAPTER 19. PIPELINES
Chapter 20
Image Representation
The main goal of this chapter is to briefly describe some of the most common graphic file formats
for image files, as well as how to determine which file format to use for certain applications.
When an image is saved to a specific file format, one tells the application how to write the image’s
information to disk. The specific file format which is chosen depends on the graphics software
application one is using (e.g., Illustrator, Freehand, Photoshop) and how and where the image will
be used (e.g., the Web or a print publication).
There are three different categories of file formats: bitmap, vector and metafiles. When an image
is stored as a bitmap file, its information is stored as a pattern of pixels, or tiny, colored or black
and white dots. When an image is stored as a vector file, its information is stored as mathematical
data. The metafile format can store an image’s information as pixels (i.e. bitmap), mathematical
data (i.e., vector), or both.
20.1
Definition of Terms
20.1.1
Transparency
Transparency is the degree of visibility of a pixel against a fixed background. A totally transparent
pixel is invisible. Normal images are opaque, in the sense that no provision is made to allow the
manipulation and display of multiple overlaid images. To allow image overlay, some mechanism
must exist for the specification of transparency on a per-image, per-strip, per-tile, or per-pixel
bases. In practice, transparency is usually controlled through the addition of information to each
element of the pixel data.
The simplest way to allow image overlay is the addition of an overlay bit to each pixel value.
Setting the overlay bit in an area of an image allows the rendering application or output device
to selectively ignore those pixel values with the bit sample.
Another simple way is to reserve one unique color as transparency color, e.g. the background color
of a homogenous background. As all images are usually rectangular - regardless of the contours of
whatever have been drawn within the image - this property of background transparency is useful
for concealing image-backgrounds and making it appear that they are non rectangular. This
feature is widely used e.g., for logos on Web pages.
A more elaborate mechanism for specifying image overlays allows variations in transparency between bottom and overlaid images. Instead of having a single bit of overlay information, each pixel
value has more (usually eight bits). The eight transparency bits are sometimes called the alpha
channel. The degree of pixel transparency for an 8-bit alpha channel ranges from 0 (the pixel is
completely invisible or transparent) to 255 (the pixel is completely visible or opaque).
275
276
20.1.2
CHAPTER 20. IMAGE REPRESENTATION
Compression
This is a new concept not previously discussed in this class, except in the context of encoding
contours of objects. The amount of image data produced from all kinds of sensor, like digital
cameras, remote sensing satellites medical imaging devices, video cameras, increases steadily with
increasing number of sensors, resolution and color capabilities. Especially for transmission and
storage of this large amount of image data compression is a big issue.
We separate data compression into two classes, lossless and lossy compression. Lossless compression preserves all information present in the original data, the information is only stored in an
optimized way. Examples for lossless compression are run-length-encoding, where subsequent pixels of the same color are replaced by one color information and the number of following identical
pixels, Huffman coding uses codewords of different size instead of the usual strictly 8 or 24 bits,
shorter codewords are assignd to symbols which occur more often, this usually reduces the total
number of bits used to code an image. Compression rates between 2:1 and maximum 5:1 can be
achieved using lossless compression.
Lossy compression on the other hand removes invisible or only slightly visible information from
the image, e.g. only a reduced set of colors is used or high spatial frequencies in the image are
removed. The amount of compression which can be achieved by lossy compression is superior to
lossless compression schemes, at compression rates of 10:1 with no visible difference is feasible, the
quality for photographs is usually sufficient after a 20:1 compression. However, the information
content is changed by such an operation, therefore lossy compressed images are not suitable for
further image processing stages. We will see exampels of JPEG compressed images further on in
this lecture.
Algorithms 43 and ?? illustrate the principles.
Algorithm 43 Pipeline for lossless compression
load image;
// find redundancy and eliminate redundancy
for i = 0 to number of image columns do
for j = 0 to number of image rows do
// find out how often each pixel value appears
// (needed for the variable-length coding)
for pixel value = 0 to 2b do
histogram[pixel value]++;
end for
huffman (histogram, image);
// instead of Huffman other procedures can be used that
// produce variable-length code but Huffman leads to
// best compression results
end for
end for
save image;
20.1.3
Progressive Coding
Progressive image transmision is based on the fact that transmitting all image data may not be
necessary under some circumstances. Imagine a situation in which an operator is searching an
image database looking for a particular image. If the transmission is based on a raster scanning
order, all the data must be transmitted to view the whole image, but often it is not necessary to
have the highest possible image quality to find the image for which the operator is looking. Images
20.1. DEFINITION OF TERMS
277
Algorithm 44 Pipeline for lossy compression
load image;
// find irrelevancy like high frequencies and
// eliminate them
split image in nxn subimages;
// a common value for n is 8 or 16
transform in frequency domain;
cut off high frequencies;
// find redundancy and eliminate redundancy
for i = 0 to number of image columns do
for j = 0 to number of image rows do
// find out how often each pixel value appears
// (needed for the variable-length coding)
for pixel value = 0 to 2b do
histogram[pixel value]++;
end for
huffman (histogram, image);
// instead of Huffman other procedures can be used that
// produce variable-length code but Huffman leads to
// best compression results
end for
end for
save image;
do not have to be displayed with the hightest available resolution, and lower resolution may be
sufficient to reject an image and to begin displaying another one. This approach is also commonly
used to decrease the waiting time needed for the image to start appearing after transmission and
is used by WWW image transmission.
In progressive transmissions, the images are represented in a pyramid structure, the higher pyramid levels (lower resolution) being transmitted first. The number of pixels representing a lowerresolution image is substantially smaller and thus the user can decide from lower resolution images
whether further image refinement is needed.
20.1.4
Animation
A sequence of two or more images displayed in a rapid sequence so as to provide the illusion of
continuous motion. Animations are typically played back at a rate of 12 to 15 frames per second.
20.1.5
Digital Watermarking
A digital watermark is a digital signal or pattern inserted into a digital image. Since this signal
or pattern is present in each unaltered copy of the original image, the digital watermark may
also serve as a digital signature for the copies. A given watermark may be unique to each copy
(e.g., to identify the intended recipient), or be common to multiple copies (e.g., to identify the
document source). In either case, the watermarking of the document involves the transformation
of the original into another form.
Unlike encryption, digital watermarking leaves the original image or (or file) basically intact and
recognizable. In addition, digital watermarks, as signatures, may not be validated without special
software. Further, decrypted documents are free of any residual effects of encryption, whereas
278
CHAPTER 20. IMAGE REPRESENTATION
digital watermarks are designed to be persistent in viewing, printing, or subsequent re-transmission
or dissemination.
Two types of digital watermarks may be distinguished, depending upon whether the watermark
appears visible or invisible to the casual viewer. Visible watermarks Slide ?? are used in much
the same way as their bond paper ancestors. One might view digitally watermarked documents
and images as digitally ”stamped”.
Invisible watermarks Slide ??, on the other hand, are potentially useful as a means of identifying
the source, author, creator, owner, distributor or authorized consumer of a document or image.
For this purpose, the objective is to permanently and unalterably mark the image so that the credit
or assignment is beyond dispute. In the event of illicit usage, the watermark would facilitate the
claim of ownership, or the receipt of copyright revenues.
20.2
Common Image File Formats
Following are descriptions of some commonly used file formats:
20.2.1
BMP: Microsoft Windows Bitmap
The bitmap file format is used for bitmap graphics on the Windows platform only. Unlike other
file formats, which store image data from top to bottom and pixels in red/green/blue order, the
BMP format stores image data from bottom to top and pixels in blue/green/red order. This
means that if memory is tight, BMP graphics will sometimes appear drawn from bottom to top.
Compression of BMP files is not supported, so they are usually very large.
20.2.2
GIF: Graphics Interchange Format
The Graphics Interchange Format was originally developed by CompuServe in 1987. It is one of
the most popular file formats for Web graphics for exchanging graphics files between computers.
It is most commonly used for bitmap images composed of line drawings or blocks of a few distinct
colors. The GIF format supports 8 bits of color information or less. Therefore it is not suitiable
for photographs. In addition, the GIF89a file format supports transparency, allowing you to
make a color in your image transparent. (Please note: CompuServe Gif(87) does not support
transparency). This feature makes GIF a particularly popular format for Web images.
When to use GIF Use the GIF file format for images with only a few distinct colors, such
as illustrations, cartoons, and images with blocks of color, such as icons, buttons, and horizontal
rules.
GIF, like JPEG, is a “lossy” file format! It reduces an image’s file size by removing bits of
color information during the conversion process. The GIF format supports 256 colors or less.
When creating images for the Web, be aware that only 216 colors are shared between Macintosh
and Windows monitors. These colors, called the “Web palette,” should be used when creating
GIFs for the Web because colors that are not in this palette display differently on Macintosh and
Windows monitors. The restriction to only 256 colors is the reason why GIF is not siutable for
color photographs.
20.2. COMMON IMAGE FILE FORMATS
20.2.3
279
PICT: Picture File Format
The Picture file format is for use primarily on the Macintosh platform; it is the default format
for Macintosh image files. The PICT format is most commonly used for bitmap images, but can
be used for vector images was well. Avoid using PICT images for print publishing. The PICT
format is “lossless,” meaning it does not remove information from the original image during the
file format conversion process. Because the PICT format supports only limited compression on
Macintoshes with QuickTime installed, PICT files are usually large. When saving an image as a
PICT, add the extension “.pct” to the end of its file name. Use the PICT format for images used
in video editing, animations, desktop computer presentations, and multimedia authoring.
20.2.4
PNG: Portable Network Graphics
The Portable Network Graphics format was developed to be the successor to the GIF file format.
PNG is not yet widely supported by most Web browsers; Netscape versions 4.04 and later and
Internet Explorer version 4.0b1 and later currently support this file format. However, PNG is
expected to become a mainstream format for Web images and could replace GIF entirely. It is
platform independent and should be used for single images only (not animations). Compared
with GIF, PNG offers greater color support, better compression, gamma correction for brightness
control across platforms, better support for transparency (alpha channel), and a better method
for displaying progressive images.
20.2.5
RAS: Sun Raster File
The Sun Raster image file format is the native bitmap format of the SUN Microsystems UNIX
platforms using the SunOS operating system. This format is capable of storing black-and-white,
gray-scale, and color bitmapped data of any pixel depth. The use of color maps and a simple
Run-Length data compression are supported. Typically, most images found on a SunOS system
are Sun Raster images, and this format is supported by most UNIX imaging applications.
20.2.6
EPS: Encapsulated PostScript
The Encapsulated PostScript file format is a metafile format; it can be used for vector images or
bitmap images. The EPS file format can be used on a variety of platforms, including Macintosh
and Windows. When you place an EPS image into a document, you can scale it up or down
without information loss. This format contains PostScript information and should be used when
printing to a PostScript output device. The PostScript language , which was developed by Adobe,
is the industry standard for desktop publishing software and hardware. EPS files can be graphics
or images of whole pages that include text, font, graphic, and page layout information.
20.2.7
TIFF: Tag Interchange File Format
The Tag Interchange File Format is a tag-based format that was developed and maintained by
Aldus (now Adobe). TIFF, which is used for bitmap images, is compatible with a wide range of
software applications and can be used across platforms such as Macintosh, Windows, and UNIX.
The TIFF format is complex, so TIFF files are generally larger than GIF or JPEG files. TIFF
supports lossless LZW (Lempel-Ziv-Welch) compression ; however, compressed TIFFs take longer
to open. When saving a file to the TIFF format, add the file extension “.tif” to the end of its file
name.
280
20.2.8
CHAPTER 20. IMAGE REPRESENTATION
JPEG: Joint Photographic Expert Group
Like GIF, the Joint Photographic Experts Group format is one of the most popular formats for
Web graphcis. It supports 24 bits of color information, and is most commonly used for photographs
and similar continous-tone bitmap images. The JPEG file format stores all of the color information
in an RGB image, then reduces the file size by compressing it, or saving only the color information
that is essential to the image. Most imaging applications and plug-ins let you determine the
amount of compression used when saving a graphic in JPEG format. Unlike GIF, JPEG does not
support transparency.
When to use JPEG? JPEG uses a “lossy” compression technique, which changes the original
image by removing information during the conversion process. In theory, JPEG was designed
especially for photographs so that changes made to the orginal image during conversion to JPEG
would not be visible to the human eye. Most imaging applications let you control the amount of
lossy compression performed on an image, so you can tade off image quality for smaller file size
and vice versa. Be aware that the chances of degrading our image when converting it to JPEG
increase proportionally with the amount of compression you use.
JPEG is superior to GIF for storing full-color or grayscale images of “realistic” scenes, or images
with continouos variation in color. For example, use JPEG for scanned photographs and naturalistic artwork with hightlights, shaded areas, and shadows. The more complex and subtly rendered
the image is, the more likeley it is that the image should be converted to JPEG.
Do not use JPEG for illustrations, cartoons, lettering, or any images that have very sharp edges
(e.g., a row of black pixels adjacent to a row of white pixels). Sharp edges in images tend to
blur in JPEG unless you use only a small amount of compression when converting the image.
The JPEG data compression is being illustrated with an original image shown in Slide ??. We
have an input parameter into a JPEG compression scheme that indicates how many coefficients
one is carrying along. This is expressed by a percentage. Slide ?? shows 75% of the coefficients,
leading to a 15:1 compression of that particular image. We go on to 50% of the coefficients in
Slide ?? and 20% in Slide ??. We can appreciate the effect of the compression on the image
by comparing a enlarged segment of the original image with a similarly enlarged segment of the
de-compressed JPEG-image. Note how the decompression reveals that we have contaminated the
image, because objects radiate out under the effect of the forward transform that cannot fully be
undone by an inverse transform using a reduced set of coefficients. The effect of the compression
and the resulting contamination of the image is larger as we use fewer and fewer coefficients of
the transform as shown in Slide ?? and Slide ??. The effect of the compression can be shown
by computing a difference image of just the intensity component (black and white component) as
shown in Slide ??, Slide ??, and Slide ??.
The basic principle of JPEG compression is illustrated in Algorithm 45.
Prüfungsfragen:
• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern?
20.3
Video File Formats: MPEG
Slide ?? illustrates the basic idea of the MPEG-1 standard for the compression of movies. MPEG
stands for Motion Picture Expert Group. Note that the MPEG approach takes key frames and
compresses them individually as shown as image frames I in Slide ??. Slides P get interpolated between frames I. Frames are then further interpolated using the frames P . Fairly large compression
rates can be achieved of 200:1. This leads to the ability of showing movies on laptop computers at
20.4. NEW IMAGE FILE FORMATS: SCALABLE VECTOR GRAPHIC - SVG
281
Algorithm 45 JPEG image compression
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
divide the picture into blocks of 8x8 pixels
for all blocks do
transform the block by DCT-II methode
for all values in the block do
quantize the value dependent from the position in the block
{high frequencies are less
important}
end for
reorder the values in a zic-zac way {DC value of block is replaced by difference to DC value
of previous block}
perform a run-length encoding of the quantized values
compress the resulting bytes with Huffmann coding
end for
this time. Slide ?? explains that the requirements for the standard, as they are defined, includes
the need to have the ability to play backwards and forwards, to compress time, to support fast
motions and rapid changes of scenes, and to randomly access any part of the movie.
The basic principle of MPEG compression is illustrated in Algorithm 46.
Prüfungsfragen:
• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche
Kompressionsraten können erzielt werden?
20.4
New Image File Formats: Scalable Vector Graphic SVG
A Vector graphic differs from a raster graphic in that its content is described by mathematical
statements. The statements instruct a computer’s drawing engine what to display on screen i.e.
pixel information for a bitmap is not stored in the file and loaded into the display device as it is
in the case of JPEG and GIF. Instead shapes and lines, their position and direction, colours and
gradients are drawn. Vector graphics files contain instructions for the rasterisation of graphics
as the statements arrive at the viewer’s browser - ’on the fly’. Vector graphics are resolution
independent. That is, they can be enlarged as much as required with no loss of quality as there
is no raster type image to enlarge and pixelate. A vector graphic will always display at the best
quality that the output device is set to. When printing out a vector graphic from a Web page it
will print at the printer’s optimum resolution i.e. without ’jaggies’.
Until recently only proprietary formats such as Macromedia Flash or Apple’s QuickTime have
allowed Web designers to create and animate vector graphics for the Web. That is going to
change with the implementation of SVG (Scalable Vector Graphics).
SVG is the standard, based on XML (Extensible Mark-up Language), which is currently undergoing development by the W3C consortium.
An SVG file is itself comprised of text, that is the drawing engine instructions within it are
written in ordinary text and not the binary symbols 1 and 0. The file can therefore be edited in an
application no more complicated than a plain text editor, unlike raster graphics which have to be
opened in image editing applications where pixel values are changed with the use of the program’s
tools. If the appearance of a vector graphic is required to change in the Web browser, then the
text file is edited via:
282
CHAPTER 20. IMAGE REPRESENTATION
Algorithm 46 MPEG compression pipeline
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
Open MPEG stream {Encoder, not specified as part of MPEG standard. Subject to various
implementation dependant enhancements.}
Close MPEG stream
Open MPEG stream
{Decoder}
for all PictureGroups in MPEG stream do
for all Pictures in PictureGroup do
for all Slices in Picture do
for all MacroBlock in Slice do
for all Blocks in MacroBlock do
{all I,P,B pictures}
Variable Length Decoder
{Huffman with fixed DC Tables}
Inverse Quantizer
Inverse ZigZag
Inverse Diskrete Cosine Transformation
{IDCT}
end for
end for
end for
if Picture != I then
{interpolated pictures P and B}
average +1/2 interpolation
new-Picture = IDCT-Picture + interpolated-Picture
else
new-Picture is ready
end if
Dither new-Picture for display
display new-Picture
end for
end for
Close MPEG stream
20.4. NEW IMAGE FILE FORMATS: SCALABLE VECTOR GRAPHIC - SVG
283
• Editing the graphic in an SVG compliant drawing application (e.g. Adobe Illustrator 9)
• Editing the text of which the file is comprised in a text editor
• The actions of the viewer in the Web browser - clicking the mouse which triggers a script
which changes the text in the vector file
As the files are comprised of text the images themselves can be dynamic. For instance CGI and
PERL can generate images and animation based on user choices made in the browser. SVG
graphics can be used to dynamically (in real time) render database information, change their
appearance, and respond to user input and subsequent database queries.
As the SVG standard is based on XML it is fully compatible with existing Web standards such as
HTML (HyperText Mark Up Language), CSS (Cascading Style Sheets), DOM (Document Object
Model), JavaScript and CGI (Common Gateway Interface) etc.
The SVG format supports 24-bit colour, ICC color profiles for colour management, pan, zoom,
gradients and masking and other features. Type rendered as SVG will look smoother and attributes
such as kerning (spacing between characters), paths (paths along which type is run) and ligatures
(where characters are joined together) are as controllable as in DTP and drawing applications.
Positioning of SVG graphics in the Web browser window will be achieved with the use of CCS
(Cascading Style Sheets) which are part of the HTML 4 standard.
284
CHAPTER 20. IMAGE REPRESENTATION
Appendix A
Algorithmen und Definitionen
Algorithmus 1: Affines Matching (siehe Abschnitt 0.6)
Definition 2: Modellieren einer Panoramakamera (siehe Abschnitt 0.15)
Definition 3: Berechnung der Datenmenge eines Bildes (siehe Abschnitt 1.2)
Algorithmus 4: Bildvergrößerung (Raster vs. Vektor) (siehe Abschnitt 1.5)
Definition 5: Berechnung der Nachbarschaftspixel (siehe Abschnitt 1.6)
Definition 6: Berechnung des Zusammenhanges (siehe Abschnitt 1.6)
Definition 7: Berechnung der Distanz zwischen zwei Pixeln (siehe Abschnitt 1.6)
Algorithmus 8: Berechnung logischer Maskenoperationen (siehe Abschnitt 1.7)
Algorithmus 9: Berechnung schneller Maskenoperationen (siehe Abschnitt 1.7)
Definition 10: Modellierung einer perspektiven Kamera (siehe Abschnitt 2.2)
Algorithmus 11: DDA einer Geraden (siehe Abschnitt 3.1)
Algorithmus 12: Bresenham einer Geraden (siehe Abschnitt 3.1)
Algorithmus 13: Füllen eines Polygons (siehe Abschnitt 3.2)
Algorithmus 14: Zeichnen dicker Linien (siehe Abschnitt 3.3)
Definition 15: Skelettberechnung via MAT (siehe Abschnitt 3.4)
Definition 16: Translation (siehe Abschnitt 4.1)
Definition 17: Reflektion (siehe Abschnitt 4.1)
Definition 18: Komplement (siehe Abschnitt 4.1)
Definition 19: Differenz (siehe Abschnitt 4.1)
Algorithmus 20: Dilation (siehe Abschnitt 4.2)
Definition 21: Erosion (siehe Abschnitt 4.2)
Definition 22: Öffnen (siehe Abschnitt 4.3)
Definition 23: Schließen (siehe Abschnitt 4.3)
Definition 24: Filtern (siehe Abschnitt 4.4)
Definition 25: Hit oder Miss (siehe Abschnitt 4.5)
285
286
APPENDIX A. ALGORITHMEN UND DEFINITIONEN
Definition 26: Umriss (siehe Abschnitt 4.6)
Definition 27: Regionenfüllung (siehe Abschnitt 4.6)
Algorithmus 28: Herstellung von Halbtonbildern (siehe Abschnitt 5.1)
Definition 29: Farbtransformation in CIE (siehe Abschnitt 5.3)
Definition 30: Farbtransformation in CMY (siehe Abschnitt 5.6)
Definition 31: Farbtransformation in CMYK (siehe Abschnitt 5.7)
Algorithmus 32: HSV-HSI-HLS-RGB (siehe Abschnitt 5.8)
Definition 33: YIK-RGB (siehe Abschnitt 5.9)
Algorithmus 34: Umwandlung von Negativ- in Positivbild (siehe Abschnitt 5.14)
Algorithmus 35: Bearbeitung eines Masked Negative (siehe Abschnitt 5.14)
Algorithmus 36: Berechnung eines Ratiobildes (siehe Abschnitt 5.16)
Definition 37: Umrechnung lp/mm in Pixelgröße (siehe Abschnitt 6.4)
Algorithmus 38: Berechnung eines Histogrammes (siehe Abschnitt 6.6)
Algorithmus 39: Äquidistanzberechnung (siehe Abschnitt 6.6)
Definition 40: Spreizen des Histogrammes (siehe Abschnitt 6.6)
Algorithmus 41: Örtliche Histogrammäqualisierung (siehe Abschnitt 6.6)
Algorithmus 42: Differenzbild (siehe Abschnitt 6.6)
Algorithmus 43: Schwellwertbildung (siehe Abschnitt 7)
Definition 44: Kontrastspreitzung (siehe Abschnitt 7)
Definition 45: Tiefpassfilter mit 3 × 3 Fenster (siehe Abschnitt 7.2)
Algorithmus 46: Medianfilter (siehe Abschnitt 7.2)
Algorithmus 47: Faltungsberechnung (siehe Abschnitt 7.3)
Definition 48: USM Filter (siehe Abschnitt 7.4)
Definition 49: Allgemeines 3 × 3 Gradientenfilter (siehe Abschnitt 7.5)
Definition 50: Roberts-Filter (siehe Abschnitt 7.5)
Definition 51: Prewitt-Filter (siehe Abschnitt 7.5)
Definition 52: Sobel-Filter (siehe Abschnitt 7.5)
Algorithmus 53: Berechnung eines gefilterten Bildes im Spektralbereich (siehe Abschnitt 7.6)
Algorithmus 54: Ungewichtetes Antialiasing (siehe Abschnitt 7.9)
Algorithmus 55: Gewichtetes Antialiasing (siehe Abschnitt 7.9)
Algorithmus 56: Gupte-Sproull-Antialiasing (siehe Abschnitt 7.9)
Definition 57: Statistische Texturberechnung (siehe Abschnitt 8.2)
Definition 58: Berechnung eines spektralen Texturmasses (siehe Abschnitt 8.4)
Algorithmus 59: Aufbringen einer Textur (siehe Abschnitt 8.5)
Definition 60: Berechnung einer linearen Transformation in 2D (siehe Abschnitt 9.2)
Definition 61: Konforme Transformation (siehe Abschnitt 9.3)
Definition 62: Modellierung einer Drehung in 2D (siehe Abschnitt 9.4)
287
Definition 63: Aufbau einer 2D Drehmatrix bei gegebenen Koordinatenachsen (siehe Abschnitt
9.4)
Definition 64: Rückdrehung in 2D (siehe Abschnitt 9.4)
Definition 65: Aufeinanderfolgende Drehungen (siehe Abschnitt 9.4)
Definition 66: Affine Transformation in 2D in homogenen Koordinaten (siehe Abschnitt 9.5)
Definition 67: Affine Transformation in 2D in kartesischen Koordinaten (siehe Abschnitt 9.5)
Definition 68: Allgemeine Transformation in 2D (siehe Abschnitt 9.6)
Algorithmus 69: Berechnung unbekannter Transformationsparameter (siehe Abschnitt 9.6)
Algorithmus 70: Cohen Sutherland (siehe Abschnitt 9.8)
Definition 71: Aufbau einer homogenen Transformationsmatrix in 2D (siehe Abschnitt 9.9)
Definition 72: 3D Drehung (siehe Abschnitt 9.10)
Definition 73: 3D affine Transformation in homogenen Koordinaten (siehe Abschnitt 9.11)
Definition 74: Bezier-Kurven in 2D (siehe Abschnitt 9.20)
Algorithmus 75: Casteljau (siehe Abschnitt 9.21)
Algorithmus 76: Berechnung einer Kettenkodierung (siehe Abschnitt 10.1)
Algorithmus 77: Splitting (siehe Abschnitt 10.2)
Definition 78: Parameterdarstellung einer Geraden für 2D Morphing (siehe Abschnitt 10.3)
Algorithmus 79: Aufbau eines Quadtrees (siehe Abschnitt 10.5)
Definition 80: Aufbau einer Wireframestruktur (siehe Abschnitt 10.8)
Definition 81: Aufbau einer B-Rep-Struktur (siehe Abschnitt 10.12)
Definition 82: Aufbau einer Cell“-Struktur (siehe Abschnitt 10.14)
”
Algorithmus 83: Aufbau einer BSP-Struktur (siehe Abschnitt 10.14)
Algorithmus 84: z-Buffering für eine Octree-Struktur (siehe Abschnitt 11.5)
Algorithmus 85: Raytracing für eine Octree-Struktur (siehe Abschnitt 11.6)
Definition 86: Ambient Beleuchtung (siehe Abschnitt 12.1)
Definition 87: Lambert Modell (siehe Abschnitt 12.1)
Algorithmus 88: Gouraud (siehe Abschnitt 12.2)
Algorithmus 89: Phong (siehe Abschnitt 12.2)
Algorithmus 90: Objektgenaue Schattenberechnung (siehe Abschnitt 12.3)
Algorithmus 91: Bildgenaue Schattenberechnung (siehe Abschnitt 12.3)
Algorithmus 92: Radiosity (siehe Abschnitt 12.6)
Definition 93: Berechnung der Binokularen Tiefenschärfe (siehe Abschnitt 13.1)
Definition 94: Berechnung der totalen Plastik (siehe Abschnitt 13.2)
Algorithmus 95: Berechnung eines Stereomatches (siehe Abschnitt 13.7)
Definition 96: LoG Filter als Vorbereitung auf Stereomatches (siehe Abschnitt 13.7)
Algorithmus 97: Aufbau eines Merkmalsraums (siehe Abschnitt 14.3)
Algorithmus 98: Pixelzuteilung zu einer Klasse ohne Rückweisung (siehe Abschnitt 14.4)
288
APPENDIX A. ALGORITHMEN UND DEFINITIONEN
Algorithmus 99: Pixelzuteilung zu einer Klasse mit Rückweisung (siehe Abschnitt 14.4)
Algorithmus 100: Zuteilung eines Merkmalsraumes mittels Trainingspixeln (siehe Abschnitt 14.6)
Algorithmus 101: Berechnung einer Knotendatei (siehe Abschnitt 15.3)
Algorithmus 102: Berechnung eines nächsten Nachbars (siehe Abschnitt 15.4)
Algorithmus 103: Berechnung eines bilinear interpolierten Grauwerts (siehe Abschnitt 15.4)
Algorithmus 104: Bilddrehung (siehe Abschnitt 15.5)
Algorithmus 105: z-Buffer Pipeline (siehe Abschnitt 19.2)
Algorithmus 106: Phong-Pipeline (siehe Abschnitt 19.2)
Algorithmus 107: Kompressionspipeline (siehe Abschnitt 20.1.2)
Algorithmus 108: JPEG Pipeline (siehe Abschnitt 20.2.8)
Algorithmus 109: MPEG Pipeline (siehe Abschnitt 20.3)
Appendix B
Fragenübersicht
B.1
Gruppe 1
• Es besteht in der Bildverarbeitung die Idee eines sogenannten Bildmodelles“. Was ist
”
darunter zu verstehen, und welche Formel dient der Darstellung des Bildmodells? [#0001]
(Frage I/8 14. April 2000)
• Bei der Betrachtung von Pixeln bestehen Nachbarschaften“ von Pixeln. Zählen Sie alle
”
Arten von Nachbarschaften auf, die in der Vorlesung behandelt wurden, und beschreiben Sie
diese Nachbarschaften mittels je einer Skizze.
[#0003]
(Frage I/9 14. April 2000, Frage I/1 9. November 2001)
• Beschreiben Sie in Worten die wesentliche Verbesserungsidee im Bresenham-Algorithmus
gegenüber dem DDA-Algorithmus.
[#0006]
(Frage I/5 11. Mai 2001, Frage 7 20. November 2001)
• Erläutern Sie die morphologische Erosion“ unter Verwendung einer Skizze und eines Forme”
lausdruckes.
[#0007]
(Frage I/2 14. April 2000)
• Gegeben sei der CIE Farbraum. Erstellen Sie eine Skizze dieses Farbraumes mit einer
Beschreibung der Achsen und markieren Sie in diesem Raum zwei Punkte A, B. Welche Farbeigenschaften sind Punkten, welche auf der Strecke zwischen A und B liegen, zuzuordnen,
und welche den Schnittpunkten der Geraden durch A, B mit dem Rand des CIE-Farbraumes?
[#0012]
(Frage I/3 14. April 2000)
• Zu welchem Zweck würde man als Anwender ein sogenanntes Ratio-Bild“ herstellen? Ver”
wenden Sie bitte in der Antwort die Hilfe einer Skizze zur Erläuterung eines Ratiobildes.
[#0015]
(Frage I/4 14. April 2000)
• Welches Maß dient der Beschreibung der geometrischen Auflösung eines Bildes, und mit
welchem Verfahren wird diese Auflösung geprüft und quantifiziert? Ich bitte Sie um eine
Skizze.
[#0017]
(Frage I/10 14. April 2000)
289
290
APPENDIX B. FRAGENÜBERSICHT
• Eines der populärsten Filter heißt Unsharp Masking“ (USM). Wie funktioniert es? Ich bitte
”
um eine einfache formelmäßige Erläuterung.
[#0021]
(Frage I/11 14. April 2000)
• In der Vorlesung wurde ein Baum“ für die Hierarchie diverser Projektionen in die Ebene
”
dargestellt (Planar Projections). Skizzieren Sie bitte diesen Baum mit allen darin vorkommenden Projektionen.
[#0026]
(Frage I/12 14. April 2000)
• Wozu dient das sogenannte photometrische Stereo“? Und was ist die Grundidee, die diesem
”
Verfahren dient?
[#0033]
(Frage I/5 14. April 2000, Frage I/1 28. September 2001)
• Was ist eine einfache Realisierung der Spiegelreflektion“ (engl.: specular reflection) bei
”
der Darstellung dreidimensionaler Objekte? Ich bitte um eine Skizze, eine Formel und den
Namen eines Verfahrens nach seinem Erfinder.
[#0034]
(Frage I/6 14. April 2000, Frage I/6 28. September 2001, Frage I/6 1. Februar 2002)
• Welche ist die wesentliche Abgrenzung zwischen Computergrafik und Bildanalyse, welches
ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung erwünscht.
[#0041]
(Frage I/1 14. April 2000)
• Was bedeuten die Begriffe geometrische“ bzw. radiometrische“ Auflösung eines Bildes?
”
”
Versuchen Sie, Ihre Antwort durch eine Skizze zu verdeutlichen.
[#0047]
(Frage I/1 14. Dezember 2001)
• Was versteht man unter Rasterkonversion“, und welche Probleme können dabei auftreten?
”
[#0058]
(Frage I/1 26. Mai 2000, Frage I/8 15. März 2002)
• Erläutern Sie das morphologische Öffnen“ unter Verwendung einer Skizze und eines Formel”
ausdruckes.
[#0059]
(Frage I/2 26. Mai 2000, Frage I/4 10. November 2000)
• Erklären Sie das Problem, das bei der Verwendung von einem Pixel breiten“ Linien auftritt,
”
wenn eine korrekte Intensitätswiedergabe gefordert ist. Welche Lösungsmöglichkeiten gibt
es für dieses Problem? Bitte verdeutlichen Sie Ihre Antwort anhand einer Skizze! (Hinweis:
betrachten Sie Linien unterschiedlicher Orientierung!)
[#0060]
(Frage I/3 26. Mai 2000)
• Was versteht man unter dem dynamischen Bereich“ eines Mediums zur Wiedergabe bild”
hafter Informationen, und im welchem Zusammenhang steht er mit der Qualität der Darstellung? Reihen Sie einige gebräuchliche Medien nach aufsteigender Größe ihres dynamischen
Bereiches!
[#0061]
(Frage I/5 30. Juni 2000, Frage 1 20. November 2001, Frage I/5 15. März 2002)
• Können von einem RGB-Monitor alle vom menschlichen Auge wahrnehmbaren Farben dargestellt werden? Begründen Sie Ihre Antwort anhand einer Skizze!
[#0062]
(Frage I/4 26. Mai 2000, Frage I/5 10. November 2000, Frage I/2 9. November 2001, Frage
4 20. November 2001)
B.1. GRUPPE 1
291
• Was ist ein Medianfilter, was sind seine Eigenschaften, und in welchen Situationen wird er
eingesetzt?
[#0063]
(Frage I/5 26. Mai 2000, Frage I/7 10. November 2000, Frage I/11 30. März 2001, Frage I/5
28. September 2001, Frage 3 20. November 2001)
• Erklären Sie die Bedeutung von homogenen Koordinaten für die Computergrafik! Welche
Eigenschaften weisen homogene Koordinaten auf?
[#0066]
(Frage I/6 26. Mai 2000, Frage 1 15. Jänner 2002)
• Was versteht man unter (geometrischem) Resampling“, und welche Möglichkeiten gibt es,
”
die Intensitäten der Pixel im Ausgabebild zu berechnen? Beschreiben sie verschiedene Verfahren anhand einer Skizze und ggf. eines Formelausdrucks!
[#0067]
(Frage I/7 26. Mai 2000, Frage I/6 10. November 2000, Frage I/3 28. September 2001, Frage
I/9 9. November 2001, Frage 6 20. November 2001, Frage 6 15. Jänner 2002)
• Beschreiben Sie mindestens zwei Verfahren, bei denen allein durch Modulation der Oberflächenparameter (ohne Definition zusätzlicher geometrischer Details) eine realistischere Darstellung eines vom Computer gezeichneten Objekts möglich ist!
[#0068]
(Frage I/8 26. Mai 2000)
• Ein dreidimensionaler Körper kann mit Hilfe von Zellen einheitlicher Größe (Würfeln), die in
einem gleichmäßigen Gitter angeordnet sind, dargestellt werden. Beschreiben Sie Vor- und
Nachteile dieser Repräsentationsform! Begründen Sie Ihre Antwort ggf. mit einer Skizze!
[#0070]
(Frage I/9 26. Mai 2000)
• Erklären Sie (ohne Verwendung von Formeln) das Prinzip des Radiosity“-Verfahrens zur
”
Herstellung realistischer Bilder mit dem Computer. Welche Art der Lichtinteraktion kann
mit diesem Modell beschrieben werden, und welche kann nicht beschrieben werden? [#0073]
(Frage I/10 26. Mai 2000)
• In der Einführungsvorlesung wurde der Begriff Affine Matching“ verwendet. Wozu dient
”
das Verfahren, welches dieser Begriff bezeichnet?
[#0079]
(Frage I/7 14. April 2000)
• Skizzieren Sie die Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen
”
Szene mittels z-buffering und Gouraud-shading!
[#0082]
(Frage I/10 30. Juni 2000, Frage I/9 10. November 2000)
• Beschreiben Sie den Unterschied zwischen Virtual Reality“ und Augmented Reality“.
”
”
Welche Hardware wird in beiden Fällen benötigt?
[#0083]
(Frage I/9 30. Juni 2000, Frage I/8 28. September 2001, Frage I/8 14. Dezember 2001, Frage
I/2 15. März 2002)
• Wie werden in der Stereo-Bildgebung zwei Bilder der selben Szene aufgenommen? Beschreiben
Sie typische Anwendungsfälle beider Methoden!
[#0084]
(Frage I/8 30. Juni 2000, Frage I/8 10. November 2000)
• Erklären Sie den Vorgang der Schattenberechnung nach dem 2-Phasen-Verfahren mittels
z-Buffer! Beschreiben Sie zwei Varianten sowie deren Vor- und Nachteile.
[#0086]
(Frage I/7 30. Juni 2000)
292
APPENDIX B. FRAGENÜBERSICHT
• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2 12 D- oder 3DModellen. Definieren Sie die Objektbeschreibung durch 2 12 D- bzw. 3D-Modelle mittels Gleichungen und erläutern Sie in Worten den wesentlichen Unterschied!
[#0087]
(Frage I/6 30. Juni 2000, Frage I/6 9. November 2001, Frage I/6 14. Dezember 2001, Frage
5 15. Jänner 2002)
• Welche Eigenschaften weist eine (sich regelmäßig wiederholende) Textur im Spektralraum
auf? Welche Aussagen können über eine Textur anhand ihres Spektrums gemacht werden?
[#0093]
(Frage I/4 30. Juni 2000)
• Erklären Sie, unter welchen Umständen „Aliasing“ auftritt und was man dagegen unternehmen kann!
[#0094]
(Frage I/3 30. Juni 2000)
• Geben Sie die Umrechnungsvorschrift für einen RGB-Farbwert in das CMY-Modell und in
das CMYK-Modell an und erklären Sie die Bedeutung der einzelnen Farbanteile! Wofür wird
das CMYK-Modell verwendet?
[#0095]
(Frage I/2 30. Juni 2000, Frage 2 20. November 2001)
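Eine mögliche Python-Skizze der angesprochenen Umrechnung (Werte im Bereich 0–1; hier die einfache Variante des Schwarzauszugs mit K = min(C, M, Y), andere Normierungen sind üblich; Namen frei gewählt):

    def rgb_nach_cmy(r, g, b):
        # CMY ist das Komplement von RGB (subtraktive Farbmischung)
        return 1.0 - r, 1.0 - g, 1.0 - b

    def rgb_nach_cmyk(r, g, b):
        c, m, y = rgb_nach_cmy(r, g, b)
        k = min(c, m, y)          # Schwarzanteil (Unbuntauszug)
        return c - k, m - k, y - k, k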
• Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärme- oder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras?
[#0097]
(Frage I/1 30. Juni 2000)
• Definieren Sie den Begriff „Kante“.
[#0105]
(Frage I/1 13. Oktober 2000)
• Erklären Sie anhand einer Skizze den zeitlichen Ablauf des Bildaufbaus auf einem Elektronenstrahlschirm!
[#0109]
(Frage I/2 13. Oktober 2000, Frage I/2 1. Februar 2002, Frage I/10 15. März 2002)
• Erklären Sie, wie man mit Hilfe der Computertomografie ein dreidimensionales Volumenmodell vom Inneren des menschlichen Körpers gewinnt.
[#0110]
(Frage I/3 13. Oktober 2000)
• Nennen Sie verschiedene Techniken, um „dicke“ Linien (z.B. Geradenstücke oder Kreisbögen) zu zeichnen.
[#0111]
(Frage I/4 13. Oktober 2000, Frage I/1 10. November 2000, Frage I/10 9. November 2001)
• Zum YIQ-Farbmodell:
1. Welche Bedeutung hat die Y -Komponente im YIQ-Farbmodell?
2. Wo wird das YIQ-Farbmodell eingesetzt?
[#0112]
(Frage I/5 13. Oktober 2000)
• Skizzieren Sie die Form des Filterkerns eines Gaussschen Tiefpassfilters. Worauf muss man
bei der Wahl der Filterparameter bzw. der Größe des Filterkerns achten?
[#0115]
(Frage I/6 13. Oktober 2000, Frage I/3 10. November 2000)
• Nennen Sie drei Arten der Texturbeschreibung und führen Sie zu jeder ein Beispiel an.
[#0116]
(Frage I/7 13. Oktober 2000, Frage I/10 10. November 2000)
• Was versteht man unter einer „Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese Art der Objektrepräsentation?
[#0117]
(Frage I/8 13. Oktober 2000, Frage I/2 10. November 2000, Frage 4 15. Jänner 2002)
• Welche physikalischen Merkmale der von einem Körper ausgesandten oder reflektierten
Strahlung eignen sich zur Ermittlung der Oberflächeneigenschaften (z.B. zwecks Klassifikation)?
[#0118]
(Frage I/9 13. Oktober 2000, Frage I/5 14. Dezember 2001)
• Beschreiben Sie zwei Verfahren zur Interpolation der Farbwerte innerhalb eines Dreiecks,
das zu einer beleuchteten polygonalen Szene gehört.
[#0119]
(Frage I/10 13. Oktober 2000)
• Was versteht man in der Sensorik unter Einzel- bzw. Mehrfachbildern? Nennen Sie einige
Beispiele für Mehrfachbilder!
[#0121]
(Frage I/1 15. Dezember 2000, Frage I/5 9. November 2001, Frage I/3 14. Dezember 2001)
• Skizzieren Sie drei verschiedene Verfahren zum Scannen von zweidimensionalen Vorlagen
(z.B. Fotografien)!
[#0122]
(Frage I/2 15. Dezember 2000)
• Beschreiben Sie das Prinzip der Bilderfassung mittels Radar! Welche Vor- und Nachteile
bietet dieses Verfahren?
[#0123]
(Frage I/3 15. Dezember 2000)
• Erklären Sie das Funktionsprinzip zweier in der Augmented Reality häufig verwendeter
Trackingverfahren und erläutern Sie deren Vor- und Nachteile!
[#0124]
(Frage I/4 15. Dezember 2000, Frage I/4 1. Februar 2002)
• Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von
Kurven, und erläutern Sie anhand einer Skizze ein Approximationsverfahren Ihrer Wahl!
[#0125]
(Frage I/5 15. Dezember 2000, Frage 2 15. Jänner 2002)
• Geben Sie die Transferfunktion $H(u,v)$ im Frequenzbereich eines idealen Tiefpassfilters mit der „cutoff“-Frequenz $D_0$ an! Skizzieren Sie die Transferfunktion!
[#0127]
(Frage I/6 15. Dezember 2000, Frage I/7 14. Dezember 2001)
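Zur Orientierung die gängige Form der gefragten Transferfunktion, mit $D(u,v)$ als Abstand vom Ursprung des Frequenzraums (Skizze, nicht notwendigerweise die Notation der Vorlesung):

$$H(u,v) = \begin{cases} 1 & \text{falls } D(u,v) \le D_0 \\ 0 & \text{falls } D(u,v) > D_0 \end{cases}$$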
• Erklären Sie, wie in der Visualisierung die Qualität eines vom Computer erzeugten Bildes
durch den Einsatz von Texturen verbessert werden kann. Nennen Sie einige Oberflächeneigenschaften (insbesondere geometrische), die sich nicht zur Repräsentation mit Hilfe einer
Textur eignen.
[#0128]
(Frage I/7 15. Dezember 2000)
• Erklären Sie, warum bei der Entzerrung von digitalen Rasterbildern meist „Resampling“ erforderlich ist. Nennen Sie zwei Verfahren zur Grauwertzuweisung für das Ausgabebild!
[#0130]
(Frage I/8 15. Dezember 2000)
• Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch
seine (polygonale) Oberfläche genutzt werden kann!
[#0131]
(Frage I/9 15. Dezember 2000, Frage I/2 28. September 2001)
Figure B.1: wiederholte Speicherung eines Bildes in verschiedenen Größen (128×128, 256×256, 512×512, …)
• Erklären Sie den Begriff „Überwachen beim Klassifizieren“. Wann kann man dieses Verfahren einsetzen?
[#0133]
(Frage I/10 15. Dezember 2000)
• Im praktischen Teil der Prüfung wird bei Aufgabe B.2 nach einer Transformationsmatrix (in
zwei Dimensionen) gefragt, die sich aus einer Skalierung und einer Rotation um ein beliebiges
Rotationszentrum zusammensetzt. Wie viele Freiheitsgrade hat eine solche Transformation?
Begründen Sie Ihre Antwort!
[#0167]
(Frage I/1 2. Februar 2001)
• Mit Hilfe von Radarwellen kann man von Flugzeugen und Satelliten aus digitale Bilder
erzeugen, aus welchen ein topografisches Modell des Geländes (ein Höhenmodell) aus einer
einzigen Bildaufnahme erstellt werden kann. Beschreiben Sie jene physikalischen Effekte der
elektromagnetischen Strahlung, die für diese Zwecke genutzt werden!
[#0169]
(Frage I/2 2. Februar 2001)
• In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das
erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht
nur mehr aus einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo
wird sie eingesetzt (nennen Sie mindestens ein Beispiel)?
[#0170]
(Frage I/6 2. Februar 2001, Frage I/1 1. Februar 2002)
• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken!
[#0175]
(Frage I/3 2. Februar 2001)
• In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in
eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder
als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen werden benötigt? Erläutern Sie Ihre Antwort anhand einer beliebigen Strecke
aus Abbildung B.3!
[#0177]
(Frage I/5 2. Februar 2001)
• Was ist eine „3D Textur“?
[#0178]
(Frage I/9 2. Februar 2001, Frage I/4 28. September 2001)
• Welche Rolle spielen die sogenannten „Passpunkte“ (engl. Control Points) bei der Interpolation und bei der Approximation von Kurven? Erläutern Sie Ihre Antwort anhand einer Skizze!
[#0179]
(Frage I/7 2. Februar 2001)
• Beschreiben Sie eine bilineare Transformation anhand ihrer Definitionsgleichung!
[#0180]
(Frage I/11 2. Februar 2001)
• Zählen Sie Fälle auf, wo in der Bildanalyse die Fourier-Transformation verwendet wird!
[#0184]
(Frage I/8 2. Februar 2001)
• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern? [#0185]
(Frage I/10 2. Februar 2001, Frage I/9 19. Oktober 2001)
• Geben Sie zu jedem der Darstellungsverfahren aus Abbildung B.2 an, welche Informationen
über das Objekt gespeichert werden müssen!
[#0187]
(Frage I/4 2. Februar 2001)
• Erläutern Sie den Begriff „Sensor-Modell“!
[#0193]
(Frage I/1 30. März 2001, Frage I/7 19. Oktober 2001)
• Wie wird die geometrische Auflösung eines Filmscanners angegeben, und mit welchem Verfahren kann man sie ermitteln?
[#0194]
(Frage I/2 30. März 2001)
• Was versteht man unter „passiver Radiometrie“?
[#0195]
(Frage I/3 30. März 2001, Frage I/9 1. Februar 2002)
• Gegeben sei ein Polygon durch die Liste seiner Eckpunkte. Wie kann das Polygon ausgefüllt
(also mitsamt seinem Inneren) auf einem Rasterbildschirm dargestellt werden? Welche Probleme treten auf, wenn das Polygon sehr „spitze“ Ecken hat (d.h. Innenwinkel nahe bei Null)?
[#0196]
(Frage I/4 30. März 2001, Frage I/2 14. Dezember 2001)
• Wie ist der „Hit-or-Miss“-Operator $A \circledast B$ definiert? Erläutern Sie seine Funktionsweise zur Erkennung von Strukturen in Binärbildern!
[#0199]
(Frage I/5 30. März 2001)
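Als Gedächtnisstütze die verbreitete Definition (etwa nach Gonzalez/Woods) mit einem Strukturelementpaar $B = (B_1, B_2)$ für Objekt bzw. Hintergrund – eine Skizze, nicht zwingend die Schreibweise der Vorlesung:

$$A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2)$$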
• Was versteht man unter einem Falschfarbenbild (false color image) bzw. einem Pseudofarbbild (pseudo color image)? Nennen Sie je einen typischen Anwendungsfall!
[#0200]
(Frage I/6 30. März 2001)
• Vergleichen Sie die Methode der Farberzeugung bei einem Elektronenstrahlbildschirm mit
der beim Offset-Druck. Welche Farbmodelle kommen dabei zum Einsatz?
[#0202]
(Frage I/7 30. März 2001, Frage I/10 19. Oktober 2001, Frage I/4 14. Dezember 2001)
• Was versteht man unter „prozeduralen Texturen“, wie werden sie erzeugt und welche Vorteile bringt ihr Einsatz?
[#0206]
(Frage I/8 30. März 2001)
• Erklären Sie den Begriff „spatial partitioning“ und nennen Sie drei räumliche Datenstrukturen aus dieser Gruppe!
[#0208]
(Frage I/9 30. März 2001)
• Erklären Sie die Begriffe „feature“ (Merkmal), „feature space“ (Merkmalsraum) und „cluster“ im Zusammenhang mit Klassifikationsproblemen und verdeutlichen Sie Ihre Antwort anhand einer Skizze!
[#0209]
(Frage I/10 30. März 2001, Frage I/9 28. September 2001, Frage I/7 1. Februar 2002)
• Im Folgenden sehen Sie drei 3 × 3-Transformationsmatrizen, wobei jede der Matrizen einen bestimmten Transformationstyp für homogene Koordinaten in 2D beschreibt:
$$\mathbf{A} = \begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad a_{11}, a_{22} \text{ beliebig}$$
$$\mathbf{B} = \begin{pmatrix} b_{11} & b_{12} & 0 \\ -b_{12} & b_{11} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad b_{11}^2 + b_{12}^2 = 1$$
$$\mathbf{C} = \begin{pmatrix} 1 & 0 & c_{13} \\ 0 & 1 & c_{23} \\ 0 & 0 & 1 \end{pmatrix}, \quad c_{13}, c_{23} \text{ beliebig}$$
Um welche Transformationen handelt es sich bei A, B und C?
[#0213]
(Frage I/1 11. Mai 2001)
• In der Computergrafik ist die Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche ein mehrstufiger Prozess (Abbildung B.4), an dem verschiedene Transformationen und Koordinatensysteme beteiligt sind. Benennen Sie die Koordinatensysteme A,
B und C in Abbildung B.4!
[#0215]
(Frage I/1 26. Juni 2001)
• Gegeben sei ein verrauschtes monochromes digitales Rasterbild. Gesucht sei ein Filter, das
zur Bereinigung eines solchen Bildes geeignet ist, wobei folgende Anforderungen gestellt
werden:
– Kanten müssen erhalten bleiben und dürfen nicht „verwischt“ werden.
– Im Ausgabebild dürfen nur solche Grauwerte enthalten sein, die auch im Eingabebild
vorkommen.
Schlagen Sie einen Filtertyp vor, der dafür geeignet ist, und begründen Sie Ihre Antwort!
[#0216]
(Frage I/2 11. Mai 2001, Frage I/5 19. Oktober 2001)
• In der Computergrafik kennt man die Begriffe „Phong-shading“ und „Phong-illumination“. Erklären Sie diese beiden Begriffe!
[#0219]
(Frage I/3 11. Mai 2001)
• Bei der Erstellung realistischer Szenen werden in der Computergrafik u.a. die zwei Konzepte „shading“ und „shadow“ verwendet, um die Helligkeit der darzustellenden Bildpunkte zu ermitteln. Was ist der Unterschied zwischen diesen beiden Begriffen?
[#0220]
(Frage I/3 26. Juni 2001, Frage I/2 19. Oktober 2001, Frage I/4 15. März 2002)
• Nennen Sie Anwendungen von Schallwellen in der digitalen Bildgebung!
[#0225]
(Frage I/4 11. Mai 2001)
• Nennen Sie allgemeine Anforderungen an eine Datenstruktur zur Repräsentation dreidimensionaler Objekte!
[#0230]
(Frage I/7 11. Mai 2001)
• Beschreiben Sie das „ray-tracing“-Verfahren zur Ermittlung sichtbarer Flächen! Welche Optimierungen können helfen, den Rechenaufwand zu verringern?
[#0231]
(Frage I/9 11. Mai 2001, Frage I/8 19. Oktober 2001, Frage 8 15. Jänner 2002)
• Beschreiben Sie Anwendungen von „Resampling“ und erläutern Sie den Prozess, seine Varianten und mögliche Fehlerquellen!
[#0232]
(Frage I/10 11. Mai 2001)
• Nennen Sie verschiedene technische Verfahren der stereoskopischen Vermittlung eines „echten“ (dreidimensionalen) Raumeindrucks einer vom Computer dargestellten Szene!
[#0233]
(Frage I/11 11. Mai 2001)
• Erklären Sie den Unterschied zwischen „supervised classification“ und „unsupervised classification“! Welche Rollen spielen diese Verfahren bei der automatischen Klassifikation der Bodennutzung anhand von Luftbildern?
[#0234]
(Frage I/8 11. Mai 2001)
• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche
Kompressionsraten können erzielt werden?
[#0235]
(Frage I/6 11. Mai 2001, Frage I/9 14. Dezember 2001, Frage I/1 15. März 2002)
• Was versteht man unter „motion blur“, und unter welcher Voraussetzung kann dieser Effekt aus einem Bild wieder entfernt werden?
[#0238]
(Frage I/2 26. Juni 2001, Frage I/10 14. Dezember 2001)
• Welchem Zweck dient ein sogenannter „Objektscanner“? Nennen Sie drei verschiedene Verfahren, nach denen ein Objektscanner berührungslos arbeiten kann!
[#0239]
(Frage I/4 26. Juni 2001)
• Erklären Sie anhand eines Beispiels den Vorgang des morphologischen Filterns!
[#0240]
(Frage I/6 26. Juni 2001)
• Was versteht man unter der geometrischen Genauigkeit (geometric accuracy) eines digitalen
Rasterbildes?
[#0243]
(Frage I/5 1. Februar 2002)
• Beschreiben Sie anhand einer Skizze das „Aussehen“ folgender Filtertypen im Frequenzbereich:
1. Tiefpassfilter
2. Hochpassfilter
3. Bandpassfilter
[#0245]
(Frage I/8 26. Juni 2001)
• Welche statistischen Eigenschaften können zur Beschreibung von Textur herangezogen werden? Erläutern Sie die Bedeutung dieser Eigenschaften im Zusammenhang mit Texturbildern!
[#0246]
(Frage I/5 26. Juni 2001)
• Wird eine reale Szene durch eine Kamera mit nichtidealer Optik aufgenommen, entsteht ein
verzerrtes Bild. Erläutern Sie die zwei Stufen des Resampling, die erforderlich sind, um ein
solches verzerrtes Bild zu rektifizieren!
[#0249]
(Frage I/10 26. Juni 2001)
• In der Computergrafik gibt es zwei grundlegend verschiedene Verfahren, um ein möglichst
(photo-)realistisches Bild einer dreidimensionalen Szene zu erstellen. Verfahren A kommt
zum Einsatz, wenn Spiegelreflexion, Lichtbrechung und Punktlichtquellen simuliert werden
sollen. Verfahren B ist besser geeignet, um diffuse Reflexion, gegenseitige Lichtabstrahlung
und Flächenlichtquellen darzustellen und die Szene interaktiv zu durchwandern. Benennen
Sie diese beiden Verfahren und erläutern Sie kurz deren jeweilige Grundidee!
[#0253]
(Frage I/7 26. Juni 2001)
• Was versteht man unter einem „LoD/R-Tree“?
[#0254]
(Frage I/9 26. Juni 2001)
• Was versteht man unter „immersiver Visualisierung“?
[#0256]
(Frage I/11 26. Juni 2001)
• Beschreiben Sie die Farberzeugung beim klassischen Offsetdruck! Welches Farbmodell wird
verwendet, und wie wird das Auftreten des Moiré-Effekts verhindert?
[#0265]
(Frage I/10 28. September 2001)
• Nennen Sie ein Beispiel und eine konkrete Anwendung eines nicht-optischen Sensors in der
Stereo-Bildgebung!
[#0266]
(Frage I/7 28. September 2001)
• Was versteht man unter „data garments“ (Datenkleidung)? Nennen Sie mindestens zwei Geräte dieser Kategorie!
[#0273]
(Frage I/4 19. Oktober 2001)
• Skizzieren Sie die Übertragungsfunktion eines idealen und eines Butterworth-Hochpassfilters und vergleichen Sie die Vor- und Nachteile beider Filtertypen!
[#0274]
(Frage I/1 19. Oktober 2001)
• Was versteht man unter einer „konformen Transformation“?
[#0275]
(Frage I/6 19. Oktober 2001)
• Nach welchem Grundprinzip arbeiten Verfahren, die aus einem Stereobildpaar die Oberfläche
eines in beiden Bildern sichtbaren Körpers rekonstruieren können?
[#0276]
(Frage I/3 19. Oktober 2001)
• Beschreiben Sie mindestens zwei Verfahren oder Geräte, die in der Medizin zur Gewinnung
digitaler Rasterbilder verwendet werden!
[#0278]
(Frage I/3 9. November 2001, Frage I/7 15. März 2002)
• Was ist „Morphologie“?
[#0279]
(Frage I/7 9. November 2001)
• Was versteht man unter einem dreidimensionalen Farbraum (bzw. Farbmodell)? Nennen Sie
mindestens drei Beispiele davon!
[#0280]
(Frage I/4 9. November 2001)
• Erläutern Sie die strukturelle Methode der Texturbeschreibung!
[#0281]
(Frage I/8 9. November 2001)
• Nennen Sie ein Verfahren zur Verbesserung verrauschter Bilder, und erläutern Sie dessen
Auswirkungen auf die Qualität des Bildes! Bei welcher Art von Rauschen kann das von
Ihnen genannte Verfahren eingesetzt werden?
[#0296]
(Frage I/3 1. Februar 2002)
• Erläutern Sie die Octree-Datenstruktur und nennen Sie mindestens zwei verschiedene Anwendungen davon!
[#0298]
(Frage I/10 1. Februar 2002)
• Erklären Sie den z-buffer-Algorithmus zur Ermittlung sichtbarer Flächen!
[#0299]
(Frage I/8 1. Februar 2002)
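Eine stark vereinfachte Python-Skizze des Grundprinzips (Dreiecke bereits gerastert und in Bildschirmkoordinaten; kleineres z bedeutet hier näher am Beobachter – alles frei gewählte Annahmen, keine Musterlösung):

    import numpy as np

    def zbuffer_rendern(dreiecke, breite, hoehe):
        """dreiecke: Liste von (pixel_liste, farbe); pixel_liste enthält Tripel (x, y, z)."""
        tiefen = np.full((hoehe, breite), np.inf)   # z-Buffer: zunächst "unendlich weit weg"
        bild = np.zeros((hoehe, breite, 3))
        for pixel_liste, farbe in dreiecke:
            for x, y, z in pixel_liste:
                if z < tiefen[y, x]:                # näher als das bisher Gespeicherte?
                    tiefen[y, x] = z                # Tiefe aktualisieren
                    bild[y, x] = farbe              # Farbe des sichtbaren Objekts übernehmen
        return bild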
• Beschreiben Sie die Arbeitsweise des Marr-Hildreth-Operators¹!
[#0311]
(Frage I/9 15. März 2002)
• Nennen Sie vier dreidimensionale Farbmodelle, benennen Sie die einzelnen Komponenten
und skizzieren Sie die Geometrie des Farbmodells!
[#0313]
(Frage I/6 15. März 2002)
• Versuchen Sie eine Definition des Histogramms eines digitalen Grauwertbildes!
[#0314]
(Frage I/3 15. März 2002)
¹ Dieser Operator wurde in der Vorlesung zur Vorbearbeitung von Stereobildern besprochen und erstmals im Wintersemester 2001/02 namentlich genannt.
Figure B.2: dreidimensionales Objekt mit verschiedenen Darstellungstechniken gezeigt (Teilbilder (a)–(d): Verfahren 1–4)
Figure B.3: Überführung einer Vektorgrafik in eine andere
Figure B.4: Prozesskette der Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche (Koordinatensystem A → Modellierungs-Transformation → B → Projektion → C)
Figure B.5: Pixelraster
Figure B.6: binäres Rasterbild
B.2 Gruppe 2
• Gegeben sei ein Druckverfahren, welches einen Graupunkt mittels eines Pixelrasters darstellt,
wie dies in Abbildung B.5 dargestellt wird. Wieviele Grauwerte können mit diesem Raster
dargestellt werden? Welcher Grauwert wird in Abbildung B.5 dargestellt?
[#0011]
(Frage II/13 14. Dezember 2001)
• Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung
dieses Bildes. Ich bitte Sie, einen sogenannten „traditionellen“ Quadtree der Abbildung B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes
grafisch mitzuteilen.
[#0029]
(Frage II/14 14. April 2000)
• Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält
sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen?
[#0030]
(Frage II/15 14. April 2000)
• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der
Konstruktion des Objektes (ohne Lampe).
[#0031]
(Frage II/17 14. April 2000)
Figure B.7: Tisch
• Quantifizieren Sie bitte an einem rechnerischen Beispiel Ihrer Wahl das „Geheimnis“, welches
es gestattet, in der Stereobetrachtung mittels überlappender photographischer Bilder eine
wesentlich bessere Tiefenwahrnehmung zu erzielen, als dies bei natürlichem binokularem
Sehen möglich ist.
[#0037]
(Frage II/13 14. April 2000)
• Gegeben sei ein Inputbild mit den darin mitgeteilten Grauwerten (Abbildung B.8). Das
Inputbild umfasst 5 Zeilen und 7 Spalten. Durch eine geometrische Transformation des
Bildes gilt es nun, einigen bestimmten Pixeln im Ergebnisbild nach der Transformation
einen Grauwert zuzuweisen, wobei der Entsprechungspunkt im Inputbild die in Tabelle B.1
angegebenen Zeilen- und Spaltenkoordinaten aufweist. Berechnen Sie (oder ermitteln Sie mit
grafischen Mitteln) den Grauwert zu jedem der Ergebnispixel, wenn eine bilineare Grauwertzuweisung erfolgt.
[#0039]
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11

Figure B.8: Inputbild (5 Zeilen × 7 Spalten)
Zeile | Spalte
2.5   | 1.5
2.5   | 2.5
4.75  | 5.25

Table B.1: Entsprechungspunkte im Inputbild
(Frage II/16 14. April 2000, Frage II/17 30. März 2001, Frage II/14 14. Dezember 2001)
• Zeichnen Sie in Abbildung B.9 jene Pixel ein, die vom Bresenham-Algorithmus erzeugt
werden, wenn die beiden markierten Pixel durch eine (angenäherte) Gerade verbunden werden. Geben Sie außerdem die Rechenschritte an, die zu den von Ihnen gewählten Pixeln
führen.
[#0057]
Figure B.9: Die Verbindung zweier Pixel soll angenähert werden (Pixelraster, x: 1–12, y: 1–10)
Figure B.10: Objekt bestehend aus zwei Flächen
(Frage II/11 26. Mai 2000)
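Als Gedächtnisstütze eine kompakte Python-Skizze des Bresenham-Linienalgorithmus (ganzzahlige Variante mit Fehlerterm für alle Oktanten; Namen frei gewählt):

    def bresenham_linie(x0, y0, x1, y1):
        """Liefert die Pixel der diskret angenäherten Strecke von (x0, y0) nach (x1, y1)."""
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        fehler = dx + dy                 # Entscheidungsvariable
        pixel = []
        while True:
            pixel.append((x0, y0))
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * fehler
            if e2 >= dy:                 # Schritt in x-Richtung
                fehler += dy
                x0 += sx
            if e2 <= dx:                 # Schritt in y-Richtung
                fehler += dx
                y0 += sy
        return pixel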
• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die
Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die
Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“!
[#0069]
(Frage II/12 26. Mai 2000, Frage II/12 10. November 2000, Frage II/15 11. Mai 2001, Frage
II/11 14. Dezember 2001)
• Bei der Erstellung eines Bildes mittels „recursive raytracing“ trifft der Primärstrahl für ein bestimmtes Pixel auf ein Objekt A und wird gemäß Abbildung B.11 in mehrere Strahlen aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die
aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die
Objekte B, C, D und E treffen. Die Zahlen in den Kreisen sind die lokalen Intensitäten
jedes einzelnen Objekts (bzgl. des sie treffenden Strahles), die Zahlen neben den Verbindungen geben die Gewichtung der Teilstrahlen an. Bestimmen Sie die dem betrachteten Pixel
zugeordnete Intensität, wenn
1. die Rekursionstiefe nicht beschränkt ist,
2. der Strahl nur genau einmal aufgeteilt wird,
3. die Rekursion abgebrochen wird, sobald die Gewichtung des Teilstrahls unter 15% fällt!
Kennzeichnen Sie bitte für die letzten beiden Fälle in zwei Skizzen diejenigen Teile des
Baumes, die zur Berechnung der Gesamtintensität durchlaufen werden!
[#0072]
(Frage II/15 26. Mai 2000)
Figure B.11: Aufteilung des Primärstrahls bei „recursive raytracing“ (Baum über den Objekten A bis E; lokale Intensitäten 2,7, 2, 3, 2, 4; Kantengewichte 0,1, 0,5, 0,4, 0,1)
Figure B.12: Lineare Transformation M eines Objekts A in ein Objekt B (Koordinatenraster x: 0–11, y: 0–7)
• In Abbildung B.12 ist ein Objekt A gezeigt, das durch eine lineare Transformation M in das
Objekt B übergeführt wird. Geben Sie (für homogene Koordinaten) die 3 × 3-Matrix M an,
die diese Transformation beschreibt (zwei verschiedene Lösungen)!
[#0074]
(Frage II/13 26. Mai 2000, Frage II/13 10. November 2000)
• Definieren Sie den Sobel-Operator und wenden Sie ihn auf die Pixel innerhalb des fett
umrandeten Bereiches des in Abbildung B.13 gezeigten Grauwertbildes an! Sie können das
Ergebnis direkt in Abbildung B.13 eintragen.
[#0075]
(Frage II/14 26. Mai 2000)
• Wenden Sie ein 3 × 3-Median-Filter auf die Pixel innerhalb des fett umrandeten Bereiches
des in Abbildung B.14 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in
Abbildung B.14 eintragen.
[#0080]
(Frage II/11 30. Juni 2000, Frage II/14 10. November 2000)
• In Abbildung B.15 ist ein Objekt gezeigt, dessen Oberflächeneigenschaften nach dem Beleuchtungsmodell von Phong beschrieben werden. Tabelle B.2 enthält alle relevanten Parameter der Szene. Bestimmen Sie für den eingezeichneten Objektpunkt p die vom Beobachter
wahrgenommene Intensität I dieses Punktes!
Hinweis: Der Einfachheit halber wird nur in zwei Dimensionen und nur für eine Wellenlänge gerechnet. Zur Ermittlung der Potenz einer Zahl nahe 1 beachten Sie bitte, dass die Näherung $(1-x)^k \approx 1 - kx$ für kleine $x$ verwendbar ist.
[#0085]
(Frage II/12 30. Juni 2000, Frage II/15 15. Dezember 2000)
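Eine minimale Python-Skizze des Phong-Beleuchtungsmodells in der Form, die zu den Größen aus Tabelle B.2 passt (eine Wellenlänge, normierte Vektoren vorausgesetzt; der ambiente Koeffizient ka ist eine zusätzliche Annahme und hier standardmäßig 0; keine Musterlösung):

    import numpy as np

    def phong_intensitaet(N, L, V, kd, ks, n, Ia, Ip, ka=0.0):
        """I = ka*Ia + Ip*(kd*(N·L) + ks*(R·V)^n) mit R = an N gespiegelter Lichtvektor."""
        N, L, V = (np.asarray(v, float) for v in (N, L, V))
        R = 2 * np.dot(N, L) * N - L                  # Reflexionsrichtung
        diffus = kd * max(np.dot(N, L), 0.0)
        spiegelnd = ks * max(np.dot(R, V), 0.0) ** n
        return ka * Ia + Ip * (diffus + spiegelnd)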
Figure B.13: Anwendung des Sobel-Operators auf ein Grauwertbild (links das Grauwertraster mit fett umrandetem Bereich, rechts das leere Ergebnisraster)
0 0 5 0 0 0 0 0 0
0 0 5 0 0 4 0 0 0
0 0 1 5 0 0 1 2 4
0 0 0 5 2 4 5 5 5
0 0 1 3 5 5 5 5 5
0 1 3 5 5 5 2 5 5
0 2 5 5 3 5 5 5 5
Figure B.14: Anwendung eines Median-Filters auf ein Grauwertbild
• Ermitteln Sie zu dem Grauwertbild aus Abbildung B.16 eine Bildpyramide, wobei jedem
Pixel einer Ebene der Mittelwert der entsprechenden vier Pixel aus der übergeordneten
(höher aufgelösten) Ebene zugewiesen wird!
[#0088]
(Frage II/13 30. Juni 2000, Frage II/11 10. November 2000, Frage II/12 15. Dezember 2000,
Frage II/14 28. September 2001)
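Eine kurze Python-Skizze der gefragten Mittelwert-Pyramide (quadratische Bilder mit Zweierpotenz-Seitenlänge vorausgesetzt; Namen frei gewählt):

    import numpy as np

    def bildpyramide(bild):
        """Jede Ebene ist der Mittelwert nicht überlappender 2x2-Blöcke der darüberliegenden Ebene."""
        ebenen = [np.asarray(bild, float)]
        while ebenen[-1].shape[0] > 1:
            oben = ebenen[-1]
            unten = (oben[0::2, 0::2] + oben[1::2, 0::2] +
                     oben[0::2, 1::2] + oben[1::2, 1::2]) / 4.0
            ebenen.append(unten)
        return ebenen

    # Beispielaufruf mit dem 4x4-Grauwertbild aus Abbildung B.16:
    # bildpyramide([[3, 8, 9, 9], [2, 7, 6, 8], [0, 3, 6, 9], [0, 1, 2, 7]])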
• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein!
[#0089]
(Frage II/14 30. Juni 2000, Frage II/15 10. November 2000, Frage II/15 1. Februar 2002)
• Erklären Sie die einzelnen Schritte des Clipping-Algorithmus nach Cohen-Sutherland
anhand des Beispiels in Abbildung B.18. Die Zwischenergebnisse mit den half-space Codes
sind darzustellen. Es ist jener Teil der Strecke AB zu bestimmen, der innerhalb des Rechtecks
R liegt. Die dazu benötigten Zahlenwerte (auch die der Schnittpunkte) können Sie direkt
aus Abbildung B.18 ablesen.
[#0092]
(Frage II/15 30. Juni 2000)
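Der Kern des Cohen-Sutherland-Verfahrens sind die half-space Codes der Endpunkte; eine minimale Python-Skizze (die Bitbelegung ist frei gewählt):

    OBEN, UNTEN, RECHTS, LINKS = 8, 4, 2, 1

    def outcode(x, y, xmin, ymin, xmax, ymax):
        """4-Bit-Code: in welchen Halbebenen außerhalb des Rechtecks liegt der Punkt?"""
        code = 0
        if x < xmin:
            code |= LINKS
        elif x > xmax:
            code |= RECHTS
        if y < ymin:
            code |= UNTEN
        elif y > ymax:
            code |= OBEN
        return code

    # Trivialfälle: beide Codes 0 -> Strecke ganz innen (akzeptieren);
    # bitweises UND beider Codes != 0 -> Strecke ganz außen (verwerfen);
    # sonst an einer Rechteckkante mit gesetztem Bit schneiden und wiederholen.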
• Gegeben seien die Transformationsmatrix
$$\mathbf{M} = \begin{pmatrix} 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 1 & 0 & 0 & -5 \\ -2 & 0 & 0 & 8 \end{pmatrix}$$
und zwei Punkte
$$\mathbf{p}_1 = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}, \qquad \mathbf{p}_2 = \begin{pmatrix} 2 \\ 4 \\ -1 \end{pmatrix}$$
in Objektkoordinaten. Führen Sie die beiden Punkte p1 und p2 mit Hilfe der Matrix M in
die Punkte p01 bzw. p02 in (normalisierten) Bildschirmkoordinaten über (beachten Sie dabei
die Umwandlungen zwischen dreidimensionalen und homogenen Koordinaten)!
[#0099]
Figure B.15: Beleuchtetes Objekt mit spiegelnder Oberfläche nach dem Phong-Modell (eingezeichnet: Lichtquelle, Beobachter, die Vektoren N, L, V und der Objektpunkt p)
Parameter | Formelzeichen | Wert
diffuser Reflexionskoeffizient | kd | 0.2
Spiegelreflexionskoeffizient | W(θ) = ks | 0.5
Spiegelreflexionsexponent | n | 3
Richtung zur Lichtquelle | L | (−0.6, 0.8)^T
Richtung zum Beobachter | V | (0.8, 0.6)^T
Oberflächennormalvektor | N | (0, 1)^T
Intensität des ambienten Umgebungslichtes | Ia | 0
Intensität der Lichtquelle | Ip | 2

Table B.2: Parameter für das Phongsche Beleuchtungsmodell in Abbildung B.15
(Frage II/11 13. Oktober 2000)
• Wenden Sie den Clipping-Algorithmus von Cohen-Sutherland (in zwei Dimensionen) auf die in Beispiel B.2 gefundenen Punkte $p'_1$ und $p'_2$ an, um den innerhalb des Quadrats $Q = \{(0,0)^T, (0,1)^T, (1,1)^T, (1,0)^T\}$ liegenden Teil der Verbindungsstrecke zwischen $p'_1$ und $p'_2$ zu finden! Sie können das Ergebnis direkt in Abbildung B.19 eintragen und Schnittberechnungen grafisch lösen.
[#0100]
(Frage II/12 13. Oktober 2000)
• Das Quadrat Q in normalisierten Bildschirmkoordinaten aus Beispiel B.2 wird in ein Rechteck
R mit den Abmessungen 10 × 8 in Bildschirmkoordinaten transformiert. Zeichnen Sie die
Verbindung der zwei Punkte $p'_1$ und $p'_2$ in Abbildung B.20 ein und bestimmen Sie grafisch
jene Pixel, die der Bresenham-Algorithmus wählen würde, um die Verbindung diskret zu
approximieren!
[#0102]
(Frage II/13 13. Oktober 2000)
• Zu dem digitalen Rasterbild in Abbildung B.21 soll das Gradientenbild gefunden werden.
Geben Sie einen dazu geeigneten Operator an und wenden Sie ihn auf die Pixel innerhalb des
fett umrandeten Rechtecks an. Sie können das Ergebnis direkt in Abbildung B.21 eintragen.
Führen Sie außerdem für eines der Pixel den Rechengang vor.
[#0103]
(Frage II/14 13. Oktober 2000)
3 8 9 9
2 7 6 8
0 3 6 9
0 1 2 7
Figure B.16: Grauwertbild als höchstauflösende Ebene einer Bildpyramide
Figure B.17: Polygon für BSP-Darstellung (Kanten mit 1–4 nummeriert)
• Nehmen Sie an, der Gradientenoperator in Aufgabe B.2 hätte das Ergebnis in Abbildung
B.22 ermittelt. Zeichnen Sie das Histogramm dieses Gradientenbildes und finden Sie einen
geeigneten Schwellwert, um „Kantenpixel“ zu identifizieren. Markieren Sie in Abbildung
B.22 rechts alle jene Pixel (Kantenpixel), die mit diesem Schwellwert gefunden werden.
[#0104]
(Frage II/15 13. Oktober 2000)
• Beschreiben Sie mit Hilfe morphologischer Operationen ein Verfahren zur Bestimmung des Randes einer Region. Wenden Sie dieses Verfahren auf die in Abbildung B.23 eingezeichnete
Region an und geben Sie das von Ihnen verwendete 3 × 3-Formelement an. In Abbildung
B.23 ist Platz für das Endergebnis sowie für Zwischenergebnisse.
[#0106]
(Frage II/16 13. Oktober 2000)
• In Abbildung B.24 sind zwei Binärbilder A und B gezeigt, wobei schwarze Pixel logisch „1“ und weiße Pixel logisch „0“ entsprechen. Führen Sie die Boolschen Operationen
1. A and B,
2. A xor B,
3. A minus B
aus und tragen Sie die Ergebnisse in Abbildung B.24 ein!
[#0132]
(Frage II/11 15. Dezember 2000, Frage II/15 15. März 2002)
• Gegeben sei ein Farbwert CRGB = (0.8, 0.5, 0.1)T im RGB-Farbmodell.
1. Welche Spektralfarbe entspricht am ehesten dem durch CRGB definierten Farbton?
2. Finden Sie die entsprechende Repräsentation von CRGB im CMY- und im CMYKFarbmodell!
[#0134]
(Frage II/13 15. Dezember 2000, Frage II/14 19. Oktober 2001)
Figure B.18: Anwendung des Clipping-Algorithmus von Cohen-Sutherland (Strecke AB und Rechteck R in einem Koordinatenraster x: 0–14, y: 0–11)
Figure B.19: Clipping nach Cohen-Sutherland (Quadrat Q in einem Koordinatenraster x, y: −2 bis 3)
• Bestimmen Sie mit Hilfe der normalisierten Korrelation $R_N^2(m,n)$ jenen Bildausschnitt innerhalb des fett umrandeten Bereichs in Abbildung B.25, der mit der ebenfalls angegebenen Maske M am besten übereinstimmt. Geben Sie Ihre Rechenergebnisse an und markieren Sie den gefundenen Bereich in Abbildung B.25!
[#0135]
(Frage II/14 15. Dezember 2000)
• In Abbildung B.26 sehen Sie vier Punkte P1 , P2 , P3 und P4 , die als Kontrollpunkte für
eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des
Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert $t = \frac{1}{3}$, also $x(\frac{1}{3})$, und
erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26
eintragen, eine skizzenhafte Darstellung ist ausreichend.
Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der
Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird. [#0164]
(Frage II/13 2. Februar 2001, Frage II/12 9. November 2001, Frage II/15 14. Dezember 2001,
Frage II/14 15. März 2002)
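Eine kleine Python-Skizze des Casteljau-Verfahrens – wiederholte lineare Interpolation benachbarter Kontrollpunkte im Verhältnis t : (1 − t); Namen frei gewählt:

    def casteljau(kontrollpunkte, t):
        """Wertet die Bezier-Kurve zu den Kontrollpunkten [(x, y), ...] an der Stelle t aus."""
        punkte = [tuple(p) for p in kontrollpunkte]
        while len(punkte) > 1:
            # jede Stufe: benachbarte Punkte im Verhältnis t : (1 - t) teilen
            punkte = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
                      for (x0, y0), (x1, y1) in zip(punkte, punkte[1:])]
        return punkte[0]

    # Beispiel (hypothetische Kontrollpunkte): Kurvenpunkt x(1/3) einer kubischen Bezier-Kurve
    # casteljau([(0, 0), (1, 2), (3, 2), (4, 0)], 1/3)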
• Berechnen Sie jene Transformationsmatrix M, die eine Rotation um 45° im Gegenuhrzeigersinn um den Punkt $R = (3, 2)^T$ und zugleich eine Skalierung mit dem Faktor $\sqrt{2}$ bewirkt
Figure B.20: Verbindung zweier Punkte nach Bresenham (Pixelraster des Rechtecks R)
0 0 0 1 2 2 3
0 1 2 3 3 3 3
1 2 3 7 7 6 3
1 2 7 8 9 8 4
2 2 8 8 8 9 5
Figure B.21: Anwendung eines Gradientenoperators
(wie in Abbildung B.27 veranschaulicht). Geben Sie M für homogene Koordinaten in zwei
Dimensionen an (also eine 3 × 3-Matrix), sodass ein Punkt p gemäß $p' = \mathbf{M}p$ in den Punkt $p'$ übergeführt wird.
Hinweis: Sie ersparen sich viel Rechen- und Schreibarbeit, wenn Sie das Assoziativgesetz für
die Matrixmultiplikation geeignet anwenden.
[#0166]
(Frage II/15 2. Februar 2001)
• In der Bildklassifikation wird oft versucht, die unbekannte Wahrscheinlichkeitsdichtefunktion
der N bekannten Merkmalsvektoren im m-dimensionalen Raum durch eine Gausssche Normalverteilung zu approximieren. Hierfür wird die m×m-Kovarianzmatrix C der N Vektoren
benötigt. Abbildung B.28 zeigt drei Merkmalsvektoren p1 , p2 und p3 in zwei Dimensionen
(also N = 3 und m = 2). Berechnen Sie die dazugehörige Kovarianzmatrix C!
[#0173]
(Frage II/17 2. Februar 2001)
• Skizzieren Sie das Histogramm des digitalen Grauwertbildes aus Abbildung B.29, und kommentieren Sie Ihre Skizze!
[#0176]
(Frage II/12 2. Februar 2001, Frage II/13 19. Oktober 2001, Frage II/12 14. Dezember 2001)
0 1 2 2 0 0 0
2 3 5 7 6 4 2
1 4 8 7 7 7 4
0 6 8 3 2 6 3
0 8 8 1 0 5 0
Figure B.22: Auffinden der Kantenpixel
Figure B.23: Rand einer Region
• Tragen Sie in die leeren Filtermasken in Abbildung B.30 jene Filterkoeffizienten ein, sodass
1. in Abbildung B.30(a) ein Tiefpassfilter entsteht, das den Gleichanteil des Bildsignals
unverändert lässt,
2. in Abbildung B.30(b) ein Hochpassfilter entsteht, das den Gleichanteil des Bildsignals
vollständig unterdrückt!
[#0182]
(Frage II/14 2. Februar 2001, Frage II/15 19. Oktober 2001)
• Wenden Sie auf das Binärbild in Abbildung B.31 links die morphologische Operation „Öffnen“ mit dem angegebenen Formelement an! Welcher für das morphologische Öffnen typische Effekt tritt auch in diesem Beispiel auf?
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.31 eintragen.
[#0186]
(Frage II/16 2. Februar 2001)
• Gegeben sei ein Farbwert $C_{RGB} = (0.8, 0.4, 0.2)^T$ im RGB-Farbmodell. Schätzen Sie grafisch die Lage des Farbwertes $C_{HSV}$ in Abbildung B.32 (also die Entsprechung von $C_{RGB}$ im HSV-Modell). Skizzieren Sie ebenso die Lage eines Farbwertes $C'_{HSV}$, der den gleichen Farbton und die gleiche Helligkeit aufweist wie $C_{HSV}$, jedoch nur die halbe Farbsättigung! [#0201]
(Frage II/13 11. Mai 2001, Frage II/13 1. Februar 2002)
Figure B.24: Boolsche Operationen auf Binärbildern (Eingabebilder A und B, Ergebnisfelder für and, xor und minus)
0 1 1 1 2 2 2
0 0 1 0 1 1 2
1 1 1 0 0 1 2
1 2 2 1 2 1 1
2 2 1 0 1 0 0
0 1 0 0 1 1 0

Maske M:
0 1
1 2

Figure B.25: Ermittlung der normalisierten Korrelation
• Abbildung B.33 zeigt einen Graukeil, in dem alle Grauwerte von 0 bis 255 in aufsteigender
Reihenfolge vorkommen, die Breite beträgt 50 Pixel. Zeichnen Sie das Histogramm dieses
Bildes und achten Sie dabei auf die korrekten Zahlenwerte! Der schwarze Rand in Abbildung
B.33 dient nur zur Verdeutlichung des Umrisses und gehört nicht zum Bild selbst. [#0203]
(Frage II/12 30. März 2001)
• Wenden Sie auf den fett umrandeten Bereich in Abbildung B.34 den Roberts-Operator zur
Kantendetektion an! Sie können das Ergebnis direkt in Abbildung B.34 eintragen. [#0204]
(Frage II/14 30. März 2001)
• Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie
einen Schritt des Algorithmus im Detail anhand Ihrer Zeichnung! Wählen Sie den Schwellwert so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann
vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung B.35 einzeichnen.
[#0207]
(Frage II/13 30. März 2001)
• Gegeben seien eine 4 × 4-Matrix
$$\mathbf{M} = \begin{pmatrix} 8 & 0 & 8 & -24 \\ 0 & 8 & 8 & 8 \\ 0 & 0 & 0 & 24 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$
sowie vier Punkte
$$\mathbf{p}_1 = (3, 0, 1)^T, \quad \mathbf{p}_2 = (2, 0, 7)^T, \quad \mathbf{p}_3 = (4, 0, 5)^T, \quad \mathbf{p}_4 = (1, 0, 3)^T$$
im dreidimensionalen Raum. Die Matrix M fasst alle Transformationen zusammen, die zur
Überführung eines Punktes p in Weltkoordinaten in den entsprechenden Punkt p0 = M · p
Figure B.26: Konstruktion eines Kurvenpunktes auf einer Bezier-Kurve nach Casteljau
Figure B.27: allgemeine Rotation mit Skalierung (zwei Koordinatenraster x: 0–6, y: 0–5 mit eingezeichnetem Punkt R)
in Gerätekoordinaten erforderlich sind (siehe auch Abbildung B.36, die Bildschirmebene und
daher die y-Achse stehen normal auf die Zeichenebene). Durch Anwendung der Transformationsmatrix M werden die Punkte p1 und p2 auf die Punkte
$$\mathbf{p}'_1 = (4, 8, 12)^T, \qquad \mathbf{p}'_2 = (6, 8, 3)^T$$
in Gerätekoordinaten abgebildet. Berechnen Sie in gleicher Weise $\mathbf{p}'_3$ und $\mathbf{p}'_4$!
[#0210]
(Frage II/15 30. März 2001)
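Der Rechenweg (homogene Koordinaten, Matrixprodukt, Division durch die vierte Komponente) als kurze Python-Skizze mit der oben angegebenen Matrix; die Kontrollausgabe muss die in der Angabe genannten Punkte p′1 und p′2 liefern (Namen frei gewählt):

    import numpy as np

    M = np.array([[8, 0, 8, -24],
                  [0, 8, 8,   8],
                  [0, 0, 0,  24],
                  [0, 0, 1,   1]], float)

    def transformiere(p, M):
        """Wendet die 4x4-Matrix auf einen 3D-Punkt an (homogene Koordinaten, perspektivische Division)."""
        ph = np.append(np.asarray(p, float), 1.0)   # (x, y, z, 1)
        qh = M @ ph
        return qh[:3] / qh[3]                       # Division durch die w-Komponente

    # Kontrolle: p1 -> (4, 8, 12), p2 -> (6, 8, 3) wie in der Angabe
    print(transformiere([3, 0, 1], M), transformiere([2, 0, 7], M))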
• Die vier Punkte aus Aufgabe B.2 bilden zwei Strecken
A = p1 p2 ,
B = p3 p4 ,
deren Projektionen in Gerätekoordinaten in der Bildschirmebene in die gleiche Scanline
fallen. Bestimmen Sie grafisch durch Anwendung des z-Buffer-Algorithmus, welches Objekt
(A, B oder keines von beiden) an den Pixelpositionen 0 bis 10 dieser Scanline sichtbar ist!
Hinweis: Zeichnen Sie p1 p2 und p3 p4 in die xz-Ebene des Gerätekoordinatensystems ein!
[#0211]
(Frage II/16 30. März 2001)
• In Abbildung B.37 ist ein Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f(x) = x im angegebenen Koordinatensystem, zur
Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil
1. ein lineares Tiefpassfilter F1 ,
Figure B.28: drei Merkmalsvektoren im zweidimensionalen Raum (Koordinatenraster x: −1 bis 6, y: −2 bis 6)
Figure B.29: digitales Grauwertbild (Histogramm gesucht)
2. ein lineares Hochpassfilter F2
mit 3×3-Filterkernen Ihrer Wahl an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts
wie in Abbildung B.37 oder als Funktionen f1 (x) und f2 (x) an! Zeichnen Sie außerdem
die von Ihnen verwendeten Filterkerne. Randpixel müssen nicht gesondert berücksichtigt
werden.
[#0214]
(Frage II/12 11. Mai 2001, Frage II/11 9. November 2001)
• In Abbildung B.38(a) ist ein digitales Grauwertbild gezeigt, in dem mittels normalisierter
Kreuzkorrelation das Strukturelement aus Abbildung B.38(b) gesucht werden soll. Markieren
Sie in Abbildung B.38(a) die Position, an der der Wert der normalisierten Kreuzkorrelation
maximal ist! Die Aufgabe ist grafisch zu lösen, es sind keine Berechnungen erforderlich.
[#0223]
(Frage II/14 11. Mai 2001)
• Wenden Sie die „medial axis“-Transformation von Blum auf das Objekt in Abbildung B.39 links an! Sie können das Ergebnis direkt in Abbildung B.39 rechts eintragen.
[#0226]
(Frage II/16 11. Mai 2001)
(a) Tiefpass
(b) Hochpass
Figure B.30: leere Filtermasken
Formelement
Figure B.31: morphologisches Öffnen
• Gegeben seien eine 3 × 3-Transformationsmatrix
$$\mathbf{M} = \begin{pmatrix} 3 & 4 & 2 \\ -4 & 3 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$
sowie drei Punkte
$$\mathbf{a} = (2, 0)^T, \quad \mathbf{b} = (0, 1)^T, \quad \mathbf{c} = (0, 0)^T$$
im zweidimensionalen Raum. Die Matrix M beschreibt in homogenen Koordinaten eine
konforme Transformation, wobei ein Punkt p gemäß $p' = \mathbf{M}p$ in einen Punkt $p'$ übergeführt
wird. Die Punkte a, b und c bilden ein rechtwinkeliges Dreieck, d.h. die Strecken ac und
bc stehen normal aufeinander.
1. Berechnen Sie $a'$, $b'$ und $c'$ durch Anwendung der durch M beschriebenen Transformation auf die Punkte a, b und c!
2. Da M eine konforme Transformation beschreibt, müssen auch die Punkte $a'$, $b'$ und $c'$ ein rechtwinkeliges Dreieck bilden. Zeigen Sie, dass dies hier tatsächlich der Fall ist! (Hinweis: es genügt zu zeigen, dass die Strecken $a'c'$ und $b'c'$ normal aufeinander stehen.)
[#0229]
Figure B.32: eine Ebene im HSV-Farbmodell (Sechseck mit den Farbtönen Rot, Gelb, Grün, Cyan, Blau und Magenta an den Ecken und Weiß im Zentrum)
Figure B.33: Graukeil (Grauwerte 0 bis 255)
(Frage II/17 11. Mai 2001)
• Geben Sie je eine 3 × 3-Filtermaske zur Detektion
1. horizontaler
2. vertikaler
Kanten in einem digitalen Rasterbild an!
[#0247]
(Frage II/12 26. Juni 2001, Frage II/11 15. März 2002)
• In Abbildung B.40 ist ein Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f(x) = x im angegebenen Koordinatensystem, zur
Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil die
in Aufgabe B.2 gefragten Filterkerne an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.40 oder als Funktionen f1 (x) und f2 (x) an! Randpixel müssen
nicht gesondert berücksichtigt werden.
[#0248]
(Frage II/14 26. Juni 2001)
• Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.41 links an. Verwenden Sie das angegebene Strukturelement X (Zentrumspixel ist markiert) und definieren
Sie ein geeignetes Fenster W ! Sie können das Ergebnis direkt in Abbildung B.41 rechts
eintragen.
[#0255]
(Frage II/16 26. Juni 2001, Frage II/14 1. Februar 2002)
• Gegeben seien eine Kugel mit Mittelpunkt mS , ein Punkt pS auf der Kugeloberfläche und
eine Lichtquelle an der Position pL mit der Intensität IL . Die Intensität soll physikalisch
korrekt mit dem Quadrat der Entfernung abnehmen. Die Oberfläche der Kugel ist durch das
9 9 8 8 6 7 6 6
7 8 9 8 7 2 3 1
6 8 7 8 3 2 0 1
8 7 8 2 3 1 1 2
7 6 7 1 0 2 3 1
7 6 8 2 2 1 2 0
Figure B.34: Roberts-Operator
Lambert’sche Beleuchtungsmodell beschrieben, der diffuse Reflexionskoeffizient ist kd . Die
Szene wird von einer synthetischen Kamera an der Position pC betrachtet. Berechnen Sie
die dem Punkt pS zugeordnete Intensität IS unter Verwendung der Angaben aus Tabelle
B.3!
Hinweis: der Punkt pS ist von der Kameraposition pC aus sichtbar, diese Bedingung muss
nicht überprüft werden.
[#0257]
Parameter | Formelzeichen | Wert
Kugelmittelpunkt | mS | (−2, 1, −4)^T
Oberflächenpunkt | pS | (−4, 5, −8)^T
Position der Lichtquelle | pL | (2, 7, −11)^T
Intensität der Lichtquelle | IL | 343
diffuser Reflexionskoeffizient | kd | 1
Position der Kamera | pC | (−e², 13.7603, −4π)^T

Table B.3: Geometrie und Beleuchtungsparameter der Szene
(Frage II/13 26. Juni 2001)
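Eine mögliche Python-Skizze des Rechenwegs (Lambert-Modell mit quadratischer Abnahme der Intensität; beim rein diffusen Modell geht die Kameraposition nicht in die Helligkeit ein, sie wird nur für die Sichtbarkeitsprüfung benötigt; Namen frei gewählt, keine Musterlösung):

    import numpy as np

    def lambert_intensitaet(mS, pS, pL, IL, kd):
        """I = kd * IL / r^2 * max(N·L, 0) mit N = Kugelnormale in pS, L = Richtung zur Lichtquelle."""
        mS, pS, pL = (np.asarray(v, float) for v in (mS, pS, pL))
        N = (pS - mS) / np.linalg.norm(pS - mS)   # Normale: vom Mittelpunkt zum Oberflächenpunkt
        r = np.linalg.norm(pL - pS)               # Abstand Lichtquelle - Oberflächenpunkt
        L = (pL - pS) / r                         # Einheitsvektor zur Lichtquelle
        return kd * IL / r**2 * max(np.dot(N, L), 0.0)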
• In Abbildung B.42(a) ist eine diskret approximierte Linie eingezeichnet. Erzeugen Sie daraus
auf zwei verschiedene Arten eine drei Pixel „dicke“ Linie und beschreiben Sie die von Ihnen
verwendeten Algorithmen! Sie können die Ergebnisse direkt in die Abbildungen B.42(b) und
B.42(c) einzeichnen.
[#0258]
(Frage II/15 26. Juni 2001, Frage II/11 28. September 2001, Frage 10 20. November 2001)
• Geben Sie eine 4 × 4-Matrix für homogene Koordinaten in drei Dimensionen an, die eine
perspektivische Projektion mit dem Projektionszentrum p0 = (2, 3, −1)T beschreibt!
Hinweis: das Projektionszentrum wird in homogenen Koordinaten auf den Punkt (0, 0, 0, 0)T
abgebildet.
[#0260]
(Frage II/17 26. Juni 2001)
• Gegeben seien eine Kugel S (durch Mittelpunkt M und Radius r), ein Punkt pS auf der
Kugeloberfläche und ein Dreieck T (durch die drei Eckpunkte p1 , p2 und p3 ). Berechnen
Sie unter Verwendung der Angaben aus Tabelle B.4
1. den Oberflächennormalvektor nS der Kugel im Punkt pS ,
2. den Oberflächennormalvektor nT des Dreiecks!
Eine Normierung der Normalvektoren auf Einheitslänge ist nicht erforderlich.
(Frage II/13 28. September 2001)
[#0262]
Figure B.35: zweidimensionale Polygonrepräsentation
• Zeichnen Sie in Abbildung B.43 die zweidimensionale Figur ein, die durch den dort angeführten Kettencode definiert ist. Beginnen Sie bei dem mit „ד markierten Pixel. Um welche Art von Kettencode handelt es sich hier (bzgl. der verwendeten Nachbarschaftsbeziehungen)?
[#0268]
(Frage II/15 28. September 2001)
• Ein Laserdrucker hat eine Auflösung von 600dpi. Wie viele Linienpaare pro Millimeter sind
mit diesem Gerät einwandfrei darstellbar (es genügen die Formel und eine grobe Abschätzung)?
[#0269]
(Frage II/12 28. September 2001, Frage II/15 9. November 2001, Frage 8 20. November 2001)
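Der Umrechnungsansatz als Formel (1 Zoll = 25,4 mm; ein Linienpaar benötigt mindestens zwei Druckpunkte – eine grobe Abschätzung, wie in der Frage verlangt):

$$\frac{600\ \text{dpi}}{25{,}4\ \text{mm/Zoll}} \approx 23{,}6\ \frac{\text{Punkte}}{\text{mm}} \quad\Rightarrow\quad \frac{23{,}6}{2} \approx 12\ \frac{\text{Linienpaare}}{\text{mm}}$$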
• Gegeben seien ein Punkt $\mathbf{p}_O = (3, -2, -1)^T$ in Objektkoordinaten sowie die Matrizen
$$\mathbf{M} = \begin{pmatrix} 4 & 0 & 0 & -3 \\ 0 & 2 & 0 & 4 \\ 0 & 0 & 3 & 6 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \mathbf{P} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$
wobei M die Modellierungs- und P die Projektionsmatrix beschreiben. Berechnen Sie
1. den Punkt $\mathbf{p}_W = \mathbf{M} \cdot \mathbf{p}_O$ in Weltkoordinaten,
2. den Punkt $\mathbf{p}_S = \mathbf{P} \cdot \mathbf{p}_W$ in Bildschirmkoordinaten,
3. die Matrix $\mathbf{M}' = \mathbf{P} \cdot \mathbf{M}$!
Figure B.36: Objekt und Kamera im Weltkoordinatensystem (xz-Ebene, x: −1 bis 7, z: −2 bis 8)
Figure B.37: Graukeil (Grauwerte 0 bis 255 entlang der x-Achse; vergrößerter Ausschnitt mit den Pixelwerten 0 1 2 3 4 5 in jeder Zeile)
Hinweis zu 3: die Multiplikation mit P entspricht hier lediglich einer Zeilenvertauschung.
[#0272]
(Frage II/12 19. Oktober 2001)
• In Abbildung B.44 sind vier Punkte A, B, C und D eingezeichnet. Transformieren Sie diese
Punkte nach der Vorschrift
$$x' = 2x - 3y + xy + 4, \qquad y' = 4x + y - 2xy + 2$$
und zeichnen Sie Ihr Ergebnis (A', B', C' und D') direkt in Abbildung B.44 rechts ein! Um
welche Art von Transformation handelt es sich hier?
[#0277]
(Frage II/11 19. Oktober 2001)
• Abbildung B.45 zeigt ein digitales Rasterbild, das als Textur verwendet wird. Durch die große
Entfernung von der virtuellen Kamera erscheint die Fläche im Verhältnis 1:3 verkleinert,
wobei aus Gründen der Effizienz der einfache Sub-Sampling-Algorithmus für die Verkleinerung
verwendet wird. Zeichnen Sie in Abbildung B.45 rechts das Bild ein, wie es am Ausgabegerät
(a)
(b)
Figure B.38: Anwendung der normalisierten Kreuzkorrelation
Figure B.39: Anwendung der medial axis Transformation
erscheint, und markieren Sie links die verwendeten Pixel. Welchen Effekt können Sie hier
beobachten, und warum tritt er auf?
[#0284]
(Frage II/13 9. November 2001)
• Abbildung B.46 zeigt drei digitale Grauwertbilder und deren Histogramme. Geben Sie für
jedes der Bilder B.46(a), B.46(c) und B.46(e) an, welches das dazugehörige Histogramm ist
(B.46(b), B.46(d) oder B.46(f)), und begründen Sie Ihre jeweilige Antwort!
[#0285]
(Frage II/14 9. November 2001)
• Zeichnen Sie in Abbildung B.47 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der
Veranschaulichung des Halbtonverfahrens übliche Konvention, dass „on“-Pixel durch einen dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0289]
(Frage II/11 1. Februar 2002)
• Zeichnen Sie in Abbildung B.48 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der
Veranschaulichung des Halbtonverfahrens übliche Konvention, dass „on“-Pixel durch einen dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0294]
(Frage II/13 1. Februar 2002)
• Wenden Sie auf das Binärbild in Abbildung B.49 links die morphologische Operation „Schließen“
mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische
Effekt tritt auch in diesem Beispiel auf?
Figure B.40: Graukeil (Grauwerte 0 bis 255 entlang der x-Achse; vergrößerter Ausschnitt mit den Pixelwerten 0 1 2 3 4 5 in jeder Zeile)
X
Figure B.41: Anwendung des Hit-or-Miss-Operators auf ein Binärbild
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.49 eintragen.
[#0297]
(Frage II/12 1. Februar 2002)
• Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.50 links an. Verwenden Sie das angegebene Strukturelement X (Zentrumspixel ist markiert) und definieren
Sie ein geeignetes Fenster W ! Sie können das Ergebnis direkt in Abbildung B.50 rechts
eintragen.
[#0301]
(Frage II/12 1. Februar 2002)
• Wenden Sie auf das Binärbild in Abbildung B.51 links die morphologische Operation „Schließen“ mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf?
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.51 eintragen.
[#0303]
(Frage II/14 1. Februar 2002)
• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten
(a) „dünne“ Linie   (b) „dicke“ Linie (Variante 1)   (c) „dicke“ Linie (Variante 2)
Figure B.42: Erstellen dicker Linien
Parameter | Formelzeichen | Wert
Kugelmittelpunkt | MS | (−2, 1, −4)^T
Kugelradius | r | 6
Punkt auf Kugeloberfläche | pS | (−4, 5, −8)^T
Dreieckseckpunkt | p1 | (2, 1, 3)^T
Dreieckseckpunkt | p2 | (3, 5, 3)^T
Dreieckseckpunkt | p3 | (5, 2, 3)^T

Table B.4: Geometrie der Objekte
für das Polygon aus Abbildung B.52 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein!
[#0305]
(Frage II/15 1. Februar 2002)
• Gegeben seien das Farbbildnegativ in Abbildung B.53 sowie die durch die Kreise markierten Farbwerte A, B und C laut folgender Tabelle:

Farbe | Farbwert (RGB)
A | (0.6, 0.4, 0.3)^T
B | (0.3, 0.2, 0.1)^T
C | (0.5, 0.3, 0.1)^T
Berechnen Sie die Farbwerte A', B' und C', die das entsprechende Positivbild an den gleichen
markierten Stellen wie in Abbildung B.53 aufweist!
[#0312]
(Frage II/12 15. März 2002)
• In Abbildung B.54 soll eine überwachte Klassifikation („supervised classification“) anhand gegebener Trainingsdaten durchgeführt und auf ebenfalls gegebene neue Daten angewandt werden. Der Merkmalsraum („feature space“) ist eindimensional, d.h. es ist nur ein skalares Merkmal („feature“) zu berücksichtigen. Die Werte dieses Merkmals sind in Abbildung
B.54(a) für ein 3 × 3 Pixel großes digitales Grauwertbild eingetragen, Abbildung B.54(b)
zeigt die dazugehörigen Zuordnungen zu den Klassen A und B.
Die Klassifikation soll unter der Annahme einer Normalverteilung (Gauss’sche Wahrscheinlichkeitsdichte) der Daten erfolgen. Bestimmen Sie die Klassenzuordnung der Pixel in Abbildung B.54(c) (tragen Sie Ihr Ergebnis in Abbildung B.54(d) ein) und geben Sie ebenso
Ihre Zwischenergebnisse an!
Hinweis: die Standardabweichung σ beider Klassen ist gleich und muss nicht berechnet werden.
[#0315]
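Eine mögliche Python-Skizze des Vorgehens: bei Normalverteilungsannahme und gleicher Standardabweichung beider Klassen läuft die Zuordnung auf den Vergleich mit den Klassenmittelwerten hinaus (Datenstruktur und Namen frei gewählt, keine Musterlösung):

    import numpy as np

    def klassifiziere(trainingswerte, trainingsklassen, neue_werte):
        """Ordnet jeden neuen Merkmalswert der Klasse mit dem nächstliegenden Mittelwert zu."""
        klassen = sorted(set(trainingsklassen))
        mittel = {k: np.mean([w for w, kl in zip(trainingswerte, trainingsklassen) if kl == k])
                  for k in klassen}                 # Klassenmittelwerte aus den Trainingsdaten
        return [min(klassen, key=lambda k: abs(w - mittel[k])) for w in neue_werte]

    # Beispielaufruf mit den Daten aus Abbildung B.54 (a)/(b)/(c):
    # klassifiziere([2, 1, 5, 9, 8, 12, 8, 6, 14], list("AAAAABBBB"), [3, 8, 7, 11])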
Figure B.43: Definition eines zweidimensionalen Objekts durch die Kettencode-Sequenz „221000110077666434544345“
Figure B.44: Transformation von vier Punkten (links: Punkte A, B, C und D in einem Koordinatenraster x, y: 0–10; rechts: leeres Raster für A', B', C' und D')
(Frage II/13 15. März 2002)
Figure B.45: Sub-Sampling
(a) Vancouver
(b) Histogramm 1
(c) Kluane
(d) Histogramm 2
(e) Steiermark
(f) Histogramm 3
Figure B.46: drei digitale Grauwertbilder und ihre Histogramme
Figure B.47: Halbtonverfahren (Rasterzellen für die Grauwerte 0 bis 9)
Figure B.48: Halbtonverfahren (Rasterzellen für die Grauwerte 3 bis 9)
Formelement
Figure B.49: morphologisches Schließen
X
Figure B.50: Anwendung des Hit-or-Miss-Operators auf ein Binärbild
Formelement
Figure B.51: morphologisches Schließen
Figure B.52: Polygon für BSP-Darstellung (Kanten mit 1–4 nummeriert)
Figure B.53: Farbbildnegativ (markierte Stellen A, B und C)
(a) Trainingsdaten:    (b) Klassifikation:    (c) neue Daten:    (d) Ergebnis: (leer)
2 1 5                  A A A                  3 8
9 8 12                 A A B                  7 11
8 6 14                 B B B

Figure B.54: überwachte Klassifikation
Figure B.55: Rechteck mit Störobjekten (das Rechteck ist mit A bezeichnet)
Figure B.56: Pixelanordnung
B.3 Gruppe 3
• Abbildung B.55 zeigt ein rechteckiges Objekt und dazu einige kleinere Störobjekte. Erläutern
Sie bitte ein Verfahren des morphologischen Filterns, welches die Störobjekte eliminiert.
Verwenden Sie bitte dazu Formelausdrücke und zeigen Sie mit grafischen Skizzen den Verfahrensablauf. Stellen Sie auch das Ergebnisbild dar.
[#0008]
(Frage III/20 14. April 2000)
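Ein dafür geeignetes Verfahren ist das morphologische Öffnen (Erosion gefolgt von Dilation); eine minimale Python-Skizze für Binärbilder mit ungerade großem Formelement (nur NumPy, Namen frei gewählt, keine Musterlösung):

    import numpy as np

    def erosion(bild, form):
        """Binäre Erosion: Pixel bleibt 1, wenn das Formelement vollständig ins Objekt passt."""
        rz, rs = form.shape[0] // 2, form.shape[1] // 2
        ergebnis = np.zeros_like(bild)
        for z in range(rz, bild.shape[0] - rz):
            for s in range(rs, bild.shape[1] - rs):
                fenster = bild[z - rz:z + rz + 1, s - rs:s + rs + 1]
                ergebnis[z, s] = np.all(fenster[form == 1] == 1)
        return ergebnis

    def dilation(bild, form):
        """Binäre Dilation: Pixel wird 1, wenn das gespiegelte Formelement das Objekt berührt."""
        form_gespiegelt = form[::-1, ::-1]
        rz, rs = form.shape[0] // 2, form.shape[1] // 2
        ergebnis = np.zeros_like(bild)
        for z in range(rz, bild.shape[0] - rz):
            for s in range(rs, bild.shape[1] - rs):
                fenster = bild[z - rz:z + rz + 1, s - rs:s + rs + 1]
                ergebnis[z, s] = np.any(fenster[form_gespiegelt == 1] == 1)
        return ergebnis

    def oeffnen(bild, form):
        """Öffnen = Erosion, dann Dilation; entfernt Objekte, die kleiner als das Formelement sind."""
        return dilation(erosion(bild, form), form)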
• Gegeben sei die in Abbildung B.56 dargestellte Pixelanordnung. Beschreiben Sie grafisch,
mittels Formel oder in Worten einen Algorithmus zur Bestimmung des Schwerpunktes dieser
Pixelanordnung.
[#0010]
(Frage III/18 14. April 2000)
• Gegeben sei Abbildung B.57 mit den angegebenen linienhaften weißen Störungen. Welche
Methode der Korrektur schlagen Sie vor, um diese Störungen zu entfernen? Ich bitte um
Figure B.57: Bild mit Störungen
die Darstellung der Methode und die Begründung, warum diese Methode die Störungen
entfernen wird.
[#0018]
(Frage III/19 14. April 2000)
• Gegeben sei die Rasterdarstellung eines Objektes in Abbildung B.58, wobei das Objekt nur durch seine drei Eckpunkte A, B und C dargestellt ist. Die Helligkeit der Eckpunkte ist $I_A = 100$, $I_B = 50$ und $I_C = 0$. Berechnen Sie die Beleuchtungswerte nach dem Gouraud-Verfahren in zumindest fünf der zur Gänze innerhalb des Dreieckes zu liegen kommenden Pixeln.
[#0035]
(Frage III/21 14. April 2000)
• Gegeben sei das Grauwertbild in Abbildung B.59. Bestimmen Sie das Histogramm dieses
Bildes! Mit Hilfe des Histogramms soll ein Schwellwert gesucht werden, der geeignet ist,
das Bild in Hintergrund (kleiner Wert, dunkel) und Vordergrund (großer Wert, hell) zu
segmentieren. Geben Sie den Schwellwert an sowie das Ergebnis der Segmentierung in Form
eines Binärbildes (mit 0 für den Hintergrund und 1 für den Vordergrund)!
[#0064]
(Frage III/16 26. Mai 2000, Frage 9 20. November 2001)
• Die Transformationsmatrix M aus Abbildung B.60 ist aus einer Translation T und einer
Skalierung S zusammengesetzt, also M = T · S (ein Punkt p wird gemäß q = M · p in den
Punkt q übergeführt). Bestimmen Sie T, S und M−1 (die Inverse von M)!
[#0065]
(Frage III/17 26. Mai 2000, Frage III/16 10. November 2000, Frage III/19 28. September
2001)
• In der Vorlesung wurden Tiefenwahrnehmungshilfen („depth cues“) besprochen, die es dem menschlichen visuellen System gestatten, die bei der Projektion auf die Netzhaut verlorengegangene dritte Dimension einer betrachteten Szene zu rekonstruieren. Diese Aufgabe wird in der digitalen Bildverarbeitung von verschiedenen „shape from X“-Verfahren gelöst. Welche „depth cues“ stehen in unmittelbarem Zusammenhang mit einem entsprechenden „shape from X“-Verfahren, und für welche Methoden der natürlichen bzw. künstlichen Tiefenabschätzung kann kein solcher Zusammenhang hergestellt werden?
(Frage III/18 26. Mai 2000)
Figure B.58: Rasterdarstellung eines Objekts (Dreieck mit den Eckpunkten A, B und C)
Figure B.59: Grauwertbild (4 × 4 Pixel; Grauwerte: 1, 5, 6, 6, 3, 6, 7, 4, 6, 6, 5, 1, 2, 1, 2, 0)
• In Abbildung B.61 ist ein digitales Rasterbild gezeigt, das durch eine überlagerte Störung in
der Mitte heller ist als am Rand. Geben Sie ein Verfahren an, das diese Störung entfernt!
[#0076]
(Frage III/19 26. Mai 2000)
• Abbildung B.62 zeigt ein eingescanntes Farbfilmnegativ. Welche Schritte sind notwendig,
um daraus mittels digitaler Bildverarbeitung ein korrektes Positivbild zu erhalten? Berücksichtigen Sie dabei, dass die optische Dichte des Filmes auch an unbelichteten Stellen größer
als Null ist. Geben Sie die mathematische Beziehung zwischen den Pixelwerten des Negativund des Positivbildes an!
[#0077]
(Frage III/20 26. Mai 2000)
• Auf der derzeit laufenden steirischen Landesausstellung „comm.gr2000az“ im Schloss Eggenberg in Graz ist ein Roboter installiert, der einen ihm von Besuchern zugeworfenen Ball fangen soll. Um den Greifer des Roboters zur richtigen Zeit an der richtigen Stelle schließen zu
können, muss die Position des Balles während des Fluges möglichst genau bestimmt werden.
Zu diesem Zweck sind zwei Kameras installiert, die das Spielfeld beobachten, eine vereinfachte Skizze der Anordnung ist in Abbildung B.63 dargestellt.
Bestimmen Sie nun die Genauigkeit in x-, y- und z-Richtung, mit der die in Abbildung B.63
markierte Position des Balles im Raum ermittelt werden kann! Nehmen Sie der Einfachkeit
halber folgende Kameraparameter an:
– Brennweite: 10 Millimeter
– geometrische Auflösung des Sensorchips: 100 Pixel/Millimeter
$$\mathbf{M} = \begin{pmatrix} 1 & 0 & 0 & 4 \\ 0 & 2.5 & 0 & -3 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Figure B.60: Transformationsmatrix
Figure B.61: Digitales Rasterbild mit zum Rand hin abfallender Intensität
Sie können auf die Anwendung von Methoden zur subpixelgenauen Bestimmung der Ballposition verzichten. Bei der Berechnung der Unsicherheit in x- und y-Richtung können Sie
eine der beiden Kameras vernachlässigen, für die z-Richtung können Sie die Überlegungen
zur Unschärfe der binokularen Tiefenwahrnehmung verwenden.
[#0078]
(Frage III/17 30. Juni 2000)
• Ein Koordinatensystem K1 wird durch Rotation in ein anderes Koordinatensystem K2
übergeführt, sodass ein Punkt mit den Koordinaten p in K1 in den Punkt q = M p in
K2 transformiert wird. In Tabelle B.5 sind vier Entsprechungspunkte zwischen den beiden
Koordinatensystemen gegeben. Bestimmen Sie die 3 × 3-Matrix² M!
Hinweis: Beachten Sie, dass (da es sich um eine Rotation handelt) $\|a\| = \|b\| = \|c\| = 1$ und weiters $a \cdot b = a \cdot c = b \cdot c = 0$, wobei „·“ das Skalarprodukt bezeichnet.
[#0081]
Punkt in K1 | Punkt in K2
(0, 0, 0)^T | (0, 0, 0)^T
a = (a1, a2, a3)^T | (1, 0, 0)^T
b = (b1, b2, b3)^T | (0, 1, 0)^T
c = (c1, c2, c3)^T | (0, 0, 1)^T

Table B.5: Entsprechungspunkte zwischen den zwei Koordinatensystemen K1 und K2
(Frage III/16 30. Juni 2000)
• Geben Sie für homogene Koordinaten eine 3 × 3-Matrix M mit möglichst vielen Freiheitsgraden an, die geeignet ist, die Punkte p eines starren Körpers (z.B. eines Holzblocks) gemäß q = M p zu transformieren (sog. „rigid body transformation“)!
² Homogene Koordinaten bringen hier keinen Vorteil, da keine Translation vorliegt.
Figure B.62: Farbfilmnegativ
Hinweis: In der Fragestellung sind einfache geometrische Zusammenhänge „verschlüsselt“ enthalten. Wären sie hingegen explizit formuliert, wäre die Antwort eigentlich Material der „Gruppe I“.
[#0090]
(Frage III/18 30. Juni 2000)
• Dem digitalen Rasterbild in Abbildung B.64 ist eine regelmäßige Störung überlagert (kohärentes Rauschen). Beschreiben Sie ein Verfahren, das diese Störung entfernt!
[#0091]
(Frage III/19 30. Juni 2000)
• Auf das in Abbildung B.65 links oben gezeigte Binärbild soll die morphologische Operation
„Erosion“ angewandt werden. Zeigen Sie, wie die Dualität zwischen Erosion und Dilation genutzt werden kann, um eine Erosion auf eine Dilation zurückzuführen. (In anderen
Worten: statt der Erosion sollen andere morphologische Operationen eingesetzt werden, die
in geeigneter Reihenfolge nacheinander ausgeführt das gleiche Ergebnis liefern wie eine Erosion.) Tragen Sie Ihr Ergebnis (und Ihre Zwischenergebnisse) in Abbildung B.65 ein und
benennen Sie die mit den Zahlen 1, 2 und 3 gekennzeichneten Operationen! Das zu verwendende Formelement ist ebenfalls in Abbildung B.65 dargestellt.
Hinweis: Beachten Sie, dass das gezeigte Binärbild nur einen kleinen Ausschnitt aus der
Definitionsmenge Z2 zeigt!
[#0096]
(Frage III/20 30. Juni 2000, Frage III/17 10. November 2000, Frage III/17 14. Dezember
2001)
• Die Dualität von Erosion und Dilation betreffend Komplementarität und Reflexion lässt sich
durch die Gleichung
$$(A \ominus B)^c = A^c \oplus \hat{B}$$
formulieren. Warum ist in dieser Gleichung die Reflexion ($\hat{B}$) von Bedeutung?
[#0107]
(Frage III/18 13. Oktober 2000)
• Das in Abbildung B.66 gezeigte Foto ist kontrastarm und wirkt daher etwas „flau“.
1. Geben Sie ein Verfahren an, das den Kontrast des Bildes verbessert.
2. Welche Möglichkeiten gibt es noch, die vom Menschen empfundene Qualität des Bildes
zu verbessern?
Wird durch diese Methoden auch der Informationsgehalt des Bildes vergrößert? Begründen
Sie Ihre Antwort.
[#0108]
(Frage III/20 13. Oktober 2000, Frage III/19 10. November 2000)
Figure B.63: Vereinfachter Aufbau des bällefangenden Roboters auf der Landesausstellung comm.gr2000az (Kamera 1, Kamera 2, Roboter, Wurfbahn des Balles, aktuelle Ballposition; eingezeichnete Abmessungen 4 m, 2 m, 2 m; Achsen x, y, z)
• Wie äußern sich für das menschliche Auge
1. eine zu geringe geometrische Auflösung
2. eine zu geringe Grauwerteauflösung
eines digitalen Rasterbildes?
[#0113]
(Frage III/19 13. Oktober 2000, Frage III/20 10. November 2000, Frage III/20 28. September
2001)
• Welche Aussagen kann man über die Summen der Maskenkomponenten eines („vernünftigen“) Tief- bzw. Hochpassfilters treffen? Begründen Sie Ihre Antwort.
[#0114]
(Frage III/17 13. Oktober 2000, Frage III/18 10. November 2000)
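A quick numerical illustration of the point of this question (a 3×3 box filter versus a Laplacian-like mask, applied to a constant image; the masks are standard textbook examples, not taken from the lecture notes):

import numpy as np
from scipy.ndimage import correlate

lowpass = np.full((3, 3), 1 / 9)                      # coefficients sum to 1
highpass = np.array([[0, -1,  0],
                     [-1, 4, -1],
                     [0, -1,  0]], dtype=float)       # coefficients sum to 0

flat = np.full((5, 5), 100.0)                         # image without any detail
print(correlate(flat, lowpass))    # stays at 100: the mean gray value is preserved
print(correlate(flat, highpass))   # becomes 0: constant regions are suppressed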
• Ein Farbwert CRGB = (R, G, B)T im RGB-Farbmodell wird in den entsprechenden Wert
CYIQ = (Y, I, Q)T im YIQ-Farbmodell gemäß folgender Vorschrift umgerechnet:
          ( 0.299   0.587   0.114 )
   CYIQ = ( 0.596  −0.275  −0.321 ) · CRGB
          ( 0.212  −0.528   0.311 )
Welcher biologische Sachverhalt wird durch die erste Zeile dieser Matrix ausgedrückt? (Hinweis: Überlegen Sie, wo das YIQ-Farbmodell eingesetzt wird und welche Bedeutung in diesem
Zusammenhang die Y-Komponente hat.)
[#0120]
Figure B.64: Bild mit überlagertem kohärentem Rauschen
(Frage III/19 14. Dezember 2001)
• Um den Effekt des morphologischen Öffnens (A ◦ B) zu verstärken, kann man³ die zugrundeliegenden Operationen (Erosion und Dilation) wiederholt ausführen. Welches der folgenden beiden Verfahren führt zum gewünschten Ergebnis:
1. Es wird zuerst die Erosion n-mal ausgeführt und anschließend n-mal die Dilation, also
   (((A ⊖ B) ⊖ . . . ⊖ B) ⊕ B) . . . ⊕ B        (n-mal ⊖, anschließend n-mal ⊕)
2. Es wird die Erosion ausgeführt und anschließend die Dilation, und der Vorgang wird n-mal wiederholt, also
   (((A ⊖ B) ⊕ B) . . . ⊖ B) ⊕ B        (n-mal abwechselnd ⊖/⊕)
Begründen Sie Ihre Antwort und erklären Sie, warum das andere Verfahren versagt! [#0126]
(Frage III/16 15. Dezember 2000, Frage III/20 11. Mai 2001, Frage III/16 14. Dezember
2001, Frage III/17 15. März 2002)
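A short scipy.ndimage experiment contrasting the two procedures; the binary test image and the structuring element are illustrative. Since the opening is idempotent, only one of the two variants actually strengthens the effect.

import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

rng = np.random.default_rng(0)
A = rng.random((64, 64)) > 0.4           # illustrative binary image
B = np.ones((3, 3), dtype=bool)          # symmetric structuring element
n = 3

# Variant 1: n erosions first, then n dilations
v1 = binary_dilation(binary_erosion(A, B, iterations=n), B, iterations=n)

# Variant 2: the opening (one erosion followed by one dilation) repeated n times
v2 = A
for _ in range(n):
    v2 = binary_dilation(binary_erosion(v2, B), B)

opening = binary_dilation(binary_erosion(A, B), B)
print(v1.sum(), v2.sum(), opening.sum())
# v2 equals the single opening (idempotence); v1 removes additional structures.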
• In Aufgabe B.1 wurde nach geometrischen Oberflächeneigenschaften gefragt, die sich nicht
zur Visualisierung mittels Textur eignen. Nehmen Sie an, man würde für die Darstellung
solcher Eigenschaften eine Textur unsachgemäß einsetzen. Welche Artefakte sind für solche
Fälle typisch?
[#0129]
(Frage III/17 15. Dezember 2000)
• In Abbildung B.67 sehen Sie die aus der Vorlesung bekannte Skizze zur Auswirkung des
morphologischen Öffnens auf ein Objekt (Abbildung B.67(a) wird durch Öffnen mit dem
gezeigten Strukturelement in Abbildung B.67(b) übergeführt). Wie kommen die Rundungen
in Abbildung B.67(b) zustande, und wie könnte man deren Auftreten verhindern? [#0149]
³ abgesehen von einer Vergrößerung des Maskenelements B
Figure B.65: Alternative Berechnung der morphologischen Erosion (Operationen 1, 2, 3 und Formelement)
(Frage III/18 15. Dezember 2000, Frage III/23 30. März 2001)
• In Abbildung B.68 sind ein Geradenstück g zwischen den Punkten A und B sowie zwei
weitere Punkte C und D gezeigt. Berechnen Sie den Abstand (kürzeste Euklidische Distanz)
zwischen g und den Punkten C bzw. D.
[#0150]
(Frage III/19 15. Dezember 2000)
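The computation asked for here can be sketched as follows; the coordinates of A, B, C and D have to be read off Figure B.68, so the values below are placeholders only.

import numpy as np

def point_segment_distance(p, a, b):
    # Shortest Euclidean distance between point p and the segment a-b.
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)   # clamp foot point to the segment
    return np.linalg.norm(p - (a + t * ab))

A, B = np.array([3.0, 2.0]), np.array([4.0, 9.0])   # placeholder endpoints of g
C, D = np.array([9.0, 4.0]), np.array([6.0, 1.0])   # placeholder points
print(point_segment_distance(C, A, B), point_segment_distance(D, A, B))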
• In Abbildung B.69 sind ein digitales Rasterbild sowie die Resultate der Anwendung von
drei verschiedenen Filteroperationen gezeigt. Finden Sie die Operationen, die auf Abbildung
B.69(a) angewandt zu den Abbildungen B.69(b), B.69(c) bzw. B.69(d) geführt haben, und
beschreiben Sie jene Eigenschaften der Ergebnisbilder, an denen Sie die Filter erkannt haben.
[#0151]
(Frage III/20 15. Dezember 2000, Frage III/19 19. Oktober 2001)
• Es besteht eine Analogie zwischen der Anwendung eines Filters und der Rekonstruktion einer
diskretisierten Bildfunktion. Erklären Sie diese Behauptung!
[#0158]
(Frage 4 16. Jänner 2001, Frage III/18 14. Dezember 2001)
• In der Vorlesung wurden zwei Verfahren zur Ermittlung der acht Parameter einer bilinearen
Transformation in zwei Dimensionen erläutert:
1. exakte Ermittlung des Parametervektors u, wenn genau vier Input/Output-Punktpaare gegeben sind
2. approximierte Ermittlung des Parametervektors u, wenn mehr als vier Input/Output-Punktpaare gegeben sind („Least squares method“)
Die Methode der kleinsten Quadrate kann jedoch auch dann angewandt werden, wenn genau
vier Input/Output-Punktpaare gegeben sind. Zeigen Sie, dass man in diesem Fall das gleiche Ergebnis erhält wie beim ersten Verfahren. Welche geometrische Bedeutung hat diese
Feststellung?
Figure B.66: Foto mit geringem Kontrast
Figure B.67: Morphologisches Öffnen ((a) Ausgangsobjekt, (b) Ergebnis des Öffnens mit dem gezeigten Strukturelement)
Hinweis: Bedenken Sie, warum die Methode der kleinsten Quadrate diesen Namen hat.
[#0163]
(Frage III/23 2. Februar 2001)
• In Abbildung B.70 ist ein Zylinder mit einer koaxialen Bohrung gezeigt. Geben Sie zwei verschiedene Möglichkeiten an, dieses Objekt mit Hilfe einer Sweep-Repräsentation zu beschreiben!
[#0165]
(Frage III/19 2. Februar 2001, Frage III/18 19. Oktober 2001)
• In Aufgabe B.1 wurde nach einer Bildrepräsentation gefragt, bei der ein Bild wiederholt
gespeichert wird, wobei die Seitenlänge jedes Bildes genau halb so groß ist wie die Seitenlänge
des vorhergehenden Bildes. Leiten Sie eine möglichst gute obere Schranke für den gesamten
Speicherbedarf einer solchen Repräsentation her, wobei
– das erste (größte) Bild aus N × N Pixeln besteht,
– alle Bilder als Grauwertbilder mit 8 Bit pro Pixel betrachtet werden,
– eine mögliche Komprimierung nicht berücksichtigt werden soll!
Hinweis: Benutzen Sie die Gleichung  Σ_{i=0}^{∞} q^i = 1/(1 − q)  für q ∈ R, 0 < q < 1.
[#0171]
(Frage III/18 2. Februar 2001, Frage III/20 1. Februar 2002)
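A short numerical check of the bound this hint leads to: with 1 byte per pixel the levels contain N², N²/4, N²/16, ... pixels, so the geometric series gives an upper bound of 4/3 · N² bytes.

N = 1024                                   # illustrative size of the largest level
levels = []
size = N
while size >= 1:
    levels.append(size * size)             # 8 bit = 1 byte per pixel
    size //= 2
print(sum(levels), 4 * N * N / 3)          # the finite sum stays below the 4/3 * N^2 bound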
Figure B.68: Abstandsberechnung (Geradenstück g zwischen den Punkten A und B sowie die Punkte C und D in einem Koordinatenraster)
• Gegeben seien eine Ebene ε und ein beliebiger (zusammenhängender) Polyeder P im dreidimensionalen Raum. Wie kann man einfach feststellen, ob die Ebene den Polyeder schneidet (also P ∩ ε ≠ {})?
[#0172]
(Frage III/21 2. Februar 2001)
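A compact sketch of the usual test: evaluate the plane equation n·x = d at every vertex of P; the plane intersects the polyhedron exactly if the vertices do not all lie strictly on the same side (the cube and the plane below are illustrative).

import numpy as np

def plane_intersects_polyhedron(vertices, n, d):
    # True if the plane n.x = d has vertices on both sides or touches a vertex.
    s = vertices @ n - d              # signed side of every vertex
    return s.min() <= 0 <= s.max()

cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
print(plane_intersects_polyhedron(cube, np.array([0.0, 0.0, 1.0]), 0.5))   # True
print(plane_intersects_polyhedron(cube, np.array([0.0, 0.0, 1.0]), 2.0))   # False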
• Es sei p(x), x ∈ R2 die Wahrscheinlichkeitsdichtefunktion gemäß Gaussscher Normalverteilung, deren Parameter aufgrund der drei Merkmalsvektoren p1 , p2 und p3 aus Aufgabe
B.2 geschätzt wurden. Weiters seien zwei Punkte x1 = (0, 3)T und x2 = (3, 6)T im Merkmalsraum gegeben. Welche der folgenden beiden Aussagen ist richtig (begründen Sie Ihre
Antwort):
1. p(x1 ) < p(x2 )
2. p(x1 ) > p(x2 )
Hinweis: Zeichnen Sie die beiden Punkte x1 und x2 in Abbildung B.28 ein und überlegen Sie
sich, in welche Richtung die Eigenvektoren der Kovarianzmatrix C aus Aufgabe B.2 weisen.
[#0174]
(Frage III/22 2. Februar 2001)
• Das digitale Rasterbild aus Abbildung B.71 soll segmentiert werden, wobei die beiden
Gebäude den Vordergrund und der Himmel den Hintergrund bilden. Da sich die Histogramme von Vorder- und Hintergrund stark überlappen, kann eine einfache Grauwertsegmentierung hier nicht erfolgreich sein. Welche anderen Bildeigenschaften kann man verwenden, um dennoch Vorder- und Hintergrund in Abbildung B.71 unterscheiden zu können?
[#0181]
(Frage III/20 2. Februar 2001, Frage III/20 14. Dezember 2001)
• In der Vorlesung wurde darauf hingewiesen, dass die Matrixmultiplikation im Allgemeinen
nicht kommutativ ist, d.h. für zwei Transformationsmatrizen M1 und M2 gilt M1 · M2 ≠ M2 ·
M1 . Betrachtet man hingegen im zweidimensionalen Fall zwei 2 × 2-Rotationsmatrizen R1
und R2 , so gilt sehr wohl R1 ·R2 = R2 ·R1 . Geben Sie eine geometrische oder mathematische
Begründung für diesen Sachverhalt an!
Hinweis: Beachten Sie, dass das Rotationszentrum im Koordinatenursprung liegt! [#0192]
(Frage III/18 30. März 2001, Frage III/16 9. November 2001)
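A quick numerical check of the statement; the underlying reason is the angle-addition identity R(α)·R(β) = R(α+β) = R(β)·R(α) for rotations about the origin (the angles below are arbitrary examples).

import numpy as np

def R(angle_deg):
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

R1, R2 = R(25.0), R(70.0)                       # illustrative angles
print(np.allclose(R1 @ R2, R2 @ R1))            # True
print(np.allclose(R1 @ R2, R(25.0 + 70.0)))     # True: both equal R(95 deg)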
Figure B.69: verschiedene Filteroperationen ((a) Originalbild, (b) Filter 1, (c) Filter 2, (d) Filter 3)
• Nehmen Sie an, Sie müssten auf ein Binärbild die morphologischen Operationen „Erosion“ bzw. „Dilation“ anwenden, haben aber nur ein herkömmliches Bildbearbeitungspaket zur Verfügung, das diese Operationen nicht direkt unterstützt. Zeigen Sie, wie die Erosion bzw. Dilation durch eine Faltung mit anschließender Schwellwertbildung umschrieben werden kann!
Hinweis: Die gesuchte Faltungsoperation ist am ehesten mit einem Tiefpassfilter zu vergleichen.
[#0197]
(Frage III/19 30. März 2001)
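A sketch of the idea with a symmetric 3×3 mask (so the flip performed by a true convolution does not matter): correlate the binary image with the mask and threshold the resulting count; the test image is illustrative.

import numpy as np
from scipy.ndimage import correlate

A = np.zeros((9, 9), dtype=int)
A[2:7, 2:7] = 1                                # illustrative binary object
B = np.ones((3, 3), dtype=int)                 # symmetric structuring element

counts = correlate(A, B, mode='constant', cval=0)    # number of object pixels under the mask
erosion = (counts == B.sum()).astype(int)      # mask lies completely inside the object
dilation = (counts >= 1).astype(int)           # mask touches the object somewhere
print(erosion)
print(dilation)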
• Gegeben sei ein zweidimensionales Objekt, dessen Schwerpunkt im Koordinatenursprung liegt. Es sollen nun „gleichzeitig“ eine Translation T und eine Skalierung S angewandt werden, wobei
       ( 1  0  tx )          ( s  0  0 )
   T = ( 0  1  ty ) ,    S = ( 0  s  0 ) .
       ( 0  0  1  )          ( 0  0  1 )
Nach der Transformation soll das Objekt gemäß S vergrößert erscheinen, und der Schwerpunkt soll gemäß T verschoben worden sein. Gesucht ist nun eine Matrix M, die einen Punkt p des Objekts gemäß obiger Vorschrift in einen Punkt p′ = M · p des transformierten Objekts überführt. Welche ist die richtige Lösung:
Figure B.70: Zylinder mit koaxialer Bohrung
1. M = T · S
2. M = S · T
Begründen Sie Ihre Antwort und geben Sie M an!
[#0198]
(Frage III/22 30. März 2001)
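A two-line check in homogeneous coordinates makes the difference visible: apply both candidate matrices to the centroid (the origin) and see where it ends up (the values of s, tx, ty are only illustrative).

import numpy as np

s, tx, ty = 2.0, 5.0, -3.0
S = np.array([[s, 0, 0], [0, s, 0], [0, 0, 1]], dtype=float)
T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

centroid = np.array([0.0, 0.0, 1.0])            # centre of gravity at the origin
print(T @ S @ centroid)   # (tx, ty, 1): scaled about the origin, then moved by t
print(S @ T @ centroid)   # (s*tx, s*ty, 1): the translation itself gets scaled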
• In Abbildung B.72 sehen Sie ein perspektivisch verzerrtes schachbrettartiges Muster. Erklären Sie, wie die Artefakte am oberen Bildrand zustandekommen, und beschreiben Sie eine
Möglichkeit, deren Auftreten zu verhindern!
[#0205]
(Frage III/21 30. März 2001, Frage III/20 19. Oktober 2001)
• Warum ist die Summe der Maskenelemente bei einem reinen Hochpassfilter immer gleich
null und bei einem reinen Tiefpassfilter immer gleich eins?
[#0212]
(Frage III/20 30. März 2001)
• In Aufgabe B.1 wurde nach den Begriffen „Phong-shading“ und „Phong-illumination“ gefragt. Beschreiben Sie eine Situation, in der beide Konzepte sinnvoll zum Einsatz kommen!
[#0218]
(Frage III/19 11. Mai 2001)
• Wendet man in einem digitalen (RGB-)Farbbild auf jeden der drei Farbkanäle einen Median-Filter an, erhält man ein Ergebnis, das vom visuellen Eindruck ähnlich einem Median-gefilterten Grauwertbild ist. Welche Eigenschaft des Median-Filters geht bei einer solchen
Anwendung auf Farbbilder jedoch verloren? Begründen Sie Ihre Antwort!
[#0221]
(Frage III/18 26. Juni 2001)
• Wenden Sie wie bei Frage B.2 ein 3 × 3-Medianfilter F3 auf den Graukeil in Abbildung B.37
an und begründen Sie Ihre Antwort!
[#0222]
(Frage III/18 11. Mai 2001)
•
1. Kommentieren Sie die Wirkung des hohen Rauschanteils von Abbildung B.38(a) (aus
Aufgabe B.2) auf die normalisierte Kreuzkorrelation!
Figure B.71: Segmentierung eines Grauwertbildes
Figure B.72: Artefakte bei einem schachbrettartigen Muster
2. Welches Ergebnis würde man bei Anwendung der normalisierten Kreuzkorrelation mit
dem selben Strukturelement (Abbildung B.38(b)) auf das rotierte Bild in Abbildung
B.73 erhalten? Begründen Sie Ihre Antwort!
[#0224]
(Frage III/21 11. Mai 2001)
• Welche Farbe liegt „in der Mitte“, wenn man im RGB-Farbraum zwischen den Farben gelb und blau linear interpoliert? Welcher Farbraum wäre für eine solche Interpolation besser geeignet, und welche Farbe läge in diesem Farbraum zwischen gelb und blau?
[#0227]
(Frage III/23 11. Mai 2001)
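A small check with Python's colorsys module: the RGB midpoint of yellow and blue is mid gray, whereas interpolating the hue channel in HSV keeps a saturated colour. Note that hue is periodic and yellow and blue are 180 degrees apart, so the path of the interpolation (here the one through green) has to be chosen.

import colorsys

yellow, blue = (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)

rgb_mid = tuple((a + b) / 2 for a, b in zip(yellow, blue))
print(rgb_mid)                                   # (0.5, 0.5, 0.5): mid gray

h1, s1, v1 = colorsys.rgb_to_hsv(*yellow)        # hue 1/6  (60 deg)
h2, s2, v2 = colorsys.rgb_to_hsv(*blue)          # hue 2/3  (240 deg)
hsv_mid = ((h1 + h2) / 2, (s1 + s2) / 2, (v1 + v2) / 2)
print(colorsys.hsv_to_rgb(*hsv_mid))             # a saturated green tone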
Figure B.73: Anwendung der normalisierten Kreuzkorrelation auf ein gedrehtes Bild
⁴ Für interessierte Studenten aus der Vertiefungsrichtung Computergrafik besteht die Möglichkeit, kostenlos an diesem Seminar teilzunehmen und dort das Seminar/Projekt oder die Diplomarbeit zu präsentieren.
• Abbildung B.74(a) zeigt das Schloss in Budmerice (Slowakei), in dem alljährlich ein Studentenseminar⁴ und die Spring Conference on Computer Graphics stattfinden. Durch einen
automatischen Prozess wurde daraus Abbildung B.74(b) erzeugt, wobei einige Details (z.B.
die Wolken am Himmel) deutlich verstärkt wurden. Nennen Sie eine Operation, die hier zur
Anwendung gekommen sein könnte, und kommentieren Sie deren Arbeitsweise!
[#0228]
Figure B.74: automatische Kontrastverbesserung ((a) Originalbild, (b) verbesserte Version)
(Frage III/22 11. Mai 2001)
• In Frage B.1 wurde festgestellt, dass die Abbildung eines dreidimensionalen Objekts auf
die zweidimensionale Bildfläche durch eine Kette von Transformationen beschrieben werden
kann. Erläutern Sie mathematisch, wie dieser Vorgang durch Verwendung des Assoziativgesetzes für die Matrixmultiplikation optimiert werden kann!
[#0237]
(Frage III/19 26. Juni 2001)
• In der Vorlesung wurden die Operationen „Schwellwert“ und „Median“, anzuwenden auf digitale Rasterbilder, besprochen. Welcher Zusammenhang besteht zwischen diesen beiden Operationen im Kontext der Filterung?
[#0244]
1 1 1 1 3 7 7 7 7
1 1 1 1 3 7 7 7 7
1 1 1 1 3 7 7 7 7
1 1 1 1 3 7 7 7 7
Figure B.75: unscharfe Kante in einem digitalen Grauwertbild
(Frage III/20 26. Juni 2001)
• Um einem Punkt p auf der Oberfläche eines dreidimensionalen Objekts die korrekte Helligkeit zuweisen zu können, benötigen alle realistischen Beleuchtungsmodelle den Oberflächennormalvektor n an diesem Punkt p. Wird nun das Objekt einer geometrischen Transformation unterzogen, sodass der Punkt p in den Punkt p′ = M p übergeführt wird⁵, ändert sich auch der Normalvektor, und zwar gemäß n′ = (M⁻¹)ᵀ n. Geben Sie eine mathematische Begründung für diese Behauptung!
Hinweis: Die durch p und n definierten Tangentialebenen vor bzw. nach der Transformation sind in Matrixschreibweise durch die Gleichungen nᵀx = nᵀp bzw. n′ᵀx′ = n′ᵀp′ gegeben.
[#0250]
(Frage III/21 26. Juni 2001)
• In Abbildung B.75 sehen Sie einen vergrößerten Ausschnitt aus einem digitalen Grauwertbild,
der eine unscharfe Kante darstellt. Beschreiben Sie, wie diese Kante aussieht, wenn
1. ein lineares Tiefpassfilter
2. ein Medianfilter
mit Maskengröße 3 × 3 mehrfach hintereinander auf das Bild angewendet wird. Begründen
Sie Ihre Antwort!
[#0251]
(Frage III/22 26. Juni 2001, Frage III/19 9. November 2001, Frage III/16 1. Februar 2002)
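The behaviour can be reproduced directly on the pixel values of Figure B.75 (every row is 1 1 1 1 3 7 7 7 7): repeated 3×3 averaging keeps widening the transition, while the 3×3 median leaves this monotone edge unchanged. A small sketch using scipy.ndimage:

import numpy as np
from scipy.ndimage import uniform_filter, median_filter

row = np.array([1, 1, 1, 1, 3, 7, 7, 7, 7], dtype=float)
img = np.tile(row, (4, 1))                     # the image from Figure B.75

mean_img, med_img = img.copy(), img.copy()
for _ in range(3):                             # apply each 3x3 filter three times
    mean_img = uniform_filter(mean_img, size=3, mode='nearest')
    med_img = median_filter(med_img, size=3, mode='nearest')

print(np.round(mean_img[0], 2))   # the edge is smeared over more and more pixels
print(med_img[0])                 # unchanged: 1 1 1 1 3 7 7 7 7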
• Im Vierfarbdruck sei ein Farbwert durch 70% cyan, 20% magenta, 50% gelb und 30% schwarz
gegeben. Rechnen Sie den Farbwert in das RGB-Farbmodell um und beschreiben Sie den
Farbton in Worten!
[#0252]
(Frage III/23 26. Juni 2001)
• Im Vierfarbdruck sei ein Farbwert durch 70% cyan, 0% magenta, 50% gelb und 30% schwarz
gegeben. Rechnen Sie den Farbwert in das RGB-Farbmodell um und beschreiben Sie den
Farbton in Worten!
[#0261]
(Frage III/20 9. November 2001)
• Skizzieren Sie das Histogramm eines
1. dunklen,
2. hellen,
3. kontrastarmen,
4. kontrastreichen
⁵ Dieses konkrete Beispiel ist in kartesischen Koordinaten leichter zu lösen als in homogenen Koordinaten. Wir betrachten daher nur 3 × 3-Matrizen (ohne Translationsanteil).
monochromen digitalen Rasterbildes!
[#0263]
(Frage III/16 28. September 2001)
• Bei vielen Algorithmen in der Computergrafik ist eine Unterscheidung zwischen der „Vorder- und Rückseite“ eines Dreiecks notwendig (z.B. BSP-Baum, back face culling etc.). Wie
kann der Oberflächennormalvektor eines Dreiecks genutzt werden, um diese Unterscheidung
mathematisch zu formulieren (d.h. mit welcher Methode kann man für einen gegebenen
Punkt p feststellen, auf welcher Seite eines ebenfalls gegebenen Dreiecks T er sich befindet)?
Geben Sie außerdem an, ob der Vektor nT aus Aufgabe 2 unter dieser Definition in den
der Vorder- oder Rückseite des Dreiecks zugewandten Halbraum weist. Begründen Sie Ihre
Antwort!
[#0264]
(Frage III/17 28. September 2001)
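A compact version of the test: with the triangle normal n = (v1 − v0) × (v2 − v0), the sign of n · (p − v0) tells on which side of the triangle's plane a point p lies (triangle and test points below are illustrative).

import numpy as np

def side_of_triangle(p, v0, v1, v2):
    n = np.cross(v1 - v0, v2 - v0)        # normal defined by the vertex order (winding)
    return np.sign(np.dot(n, p - v0))     # +1 front half-space, -1 back, 0 on the plane

v0, v1, v2 = np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])
print(side_of_triangle(np.array([0.2, 0.2,  1.0]), v0, v1, v2))   #  1.0
print(side_of_triangle(np.array([0.2, 0.2, -1.0]), v0, v1, v2))   # -1.0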
• Erläutern Sie, wie ein monochromes digitales Rasterbild, das ein Schwarzweißfilm-Negativ
repräsentiert, durch Manipulation seines Histogramms in das entsprechende Positivbild umgewandelt werden kann!
[#0267]
(Frage III/18 28. September 2001)
• In Abbildung B.76 sind die Histogramme von zwei verschiedenen digitalen Grauwertbildern A und B gezeigt. Nehmen Sie an, es würde nun auf beide Bilder die Operation „Histogrammäqualisierung“ angewandt werden, sodass die neuen Bilder A′ bzw. B′ daraus entstehen.
1. Skizzieren Sie die Histogramme von A′ und B′.
2. Kommentieren Sie die Auswirkung der Histogrammäqualisierung bei den Bildern A
und B bzgl. Helligkeit und Kontrast!
Begründen Sie Ihre Antworten!
[#0270]
Figure B.76: Histogramme von zwei verschiedenen Bildern ((a) Histogramm von Bild A, (b) Histogramm von Bild B)
(Frage III/17 19. Oktober 2001)
• Bei der perspektivischen Transformation werden entfernte Objekte zwar verkleinert abgebildet, Geraden bleiben jedoch auch in der Projektion als Geraden erhalten. Geben Sie eine
mathematische Begründung dieser Eigenschaft anhand der Projektionsmatrix
        ( 1  0  0  0 )
   M =  ( 0  1  0  0 ) ,
        ( 0  0  0  1 )
        ( 0  0  1  0 )
die einen Punkt p gemäß p′ = M p in den Punkt p′ überführt!
Hinweis: die x- und z-Koordinate einer Geraden stehen über die Gleichung x = kz + d
zueinander in Beziehung (Sonderfälle können vernachlässigt werden). Zeigen Sie, dass nach
der Transformation x′ = k′z′ + d′ gilt, und verfahren Sie analog für y.
[#0271]
(Frage III/16 19. Oktober 2001)
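A numerical version of the argument: take three collinear points on a line x = kz + d, y = mz + e, project them with the matrix M given above and check that the resulting image points are again collinear (the line parameters below are arbitrary examples).

import numpy as np

M = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def project(p):
    q = M @ np.append(p, 1.0)      # homogeneous coordinates
    return q[:2] / q[3]            # divide by the homogeneous component (= z)

k, d, m, e = 0.5, 1.0, -0.25, 2.0
pts = [np.array([k * z + d, m * z + e, z]) for z in (1.0, 2.0, 5.0)]
a, b, c = (project(p) for p in pts)
# 2D collinearity test: the "cross product" of the two difference vectors vanishes.
print((b - a)[0] * (c - a)[1] - (b - a)[1] * (c - a)[0])   # ~0 -> collinear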
• In Abbildung B.77 ist ein Torus mit strukturierter Oberfläche gezeigt, wobei sich die Lichtquelle
einmal links (Abbildung B.77(a)) und einmal rechts (Abbildung B.77(b)) vom Objekt befindet.
Zur Verdeutlichung sind in den Abbildungen B.77(c) und B.77(d) vergrößerte Ausschnitte
dargestellt. Welche Technik wurde zur Visualisierung der Oberflächenstruktur eingesetzt,
und was sind die typischen Eigenschaften, anhand derer man das Verfahren hier erkennen
kann?
[#0282]
Figure B.77: Torus mit Oberflächenstruktur ((a) Beleuchtung von links, (b) Beleuchtung von rechts, (c) Detail aus (a), (d) Detail aus (b))
(Frage III/18 9. November 2001)
• Die morphologische Dilation A ⊕ B kann als
   A ⊕ B = ∪_{x∈A} B_x
geschrieben werden, also als Mengenvereinigung des an jedes Pixel x ∈ A verschobenen Maskenelements B. Zeigen Sie unter Verwendung dieser Definition die Kommutativität der Dilation, also A ⊕ B = B ⊕ A!
Hinweis: Schreiben Sie A ⊕ B = A ⊕ (B ⊕ E), wobei E das 1 × 1 Pixel große „Einheitsmaskenelement“ ist, das das Objekt bei der Dilation unverändert lässt.
[#0283]
(Frage III/17 9. November 2001)
• Welche der folgenden Transformationen sind in homogenen Koordinaten durch eine Matrixmultiplikation (x′ = M · x) darstellbar? Begründen Sie Ihre Antwort!
– Translation
– perspektivische Projektion
– Rotation
– bilineare Transformation
– Scherung
– Skalierung
– bikubische Transformation
[#0290]
(Frage III/18 15. März 2002)
• Gegeben sei die Matrix
        ( 2  −2   3 )
   M =  ( 2   2  −4 ) ,
        ( 0   0   1 )
mit deren Hilfe ein Punkt p im zweidimensionalen Raum in homogenen Koordinaten in einen Punkt p̃′ = M · p̃ übergeführt wird. Diese Operation lässt sich in kartesischen Koordinaten alternativ als
   p′ = s · R(ϕ) · p + t
anschreiben, wobei s der Skalierungsfaktor, R(ϕ) die Rotationsmatrix (Drehwinkel ϕ) und
t der Translationsvektor sind. Ermitteln Sie s, ϕ und t!
[#0292]
(Frage III/19 1. Februar 2002)
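The parameters can be read off programmatically from the upper-left 2×2 block (which must equal s · R(ϕ)) and the last column; a small sketch:

import numpy as np

M = np.array([[2.0, -2.0,  3.0],
              [2.0,  2.0, -4.0],
              [0.0,  0.0,  1.0]])

s = np.hypot(M[0, 0], M[1, 0])                    # length of the first column = scale factor
phi = np.degrees(np.arctan2(M[1, 0], M[0, 0]))    # rotation angle in degrees
t = M[:2, 2]                                      # translation vector

print(s, phi, t)    # scale, angle and translation of the similarity transform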
• Das Auge des kanadischen Bergschafes in Abbildung B.78(a) ist in den Abbildungen B.78(b) bis B.78(d) vergrößert dargestellt⁶. Zur Interpolation wurden das nearest neighbor Verfahren,
bilineare und bikubische Interpolation verwendet. Ordnen Sie diese Interpolationsverfahren
den drei Bildern B.78(b) bis B.78(d) zu und begründen Sie Ihre Antwort!
[#0293]
(Frage III/16 1. Februar 2002)
⁶ Der Ausschnitt wurde zur Verdeutlichung der Ergebnisse einer Kontraststreckung unterzogen.
• Nehmen Sie an, Sie seien Manager der Firma Rasen&Mäher und sollen für eine Werbekampagne Angebote von Druckereien für ein einfärbiges grünes Plakat einholen. Die Druckerei 1 bietet das Plakat in der Farbe C_CMYK^(1) an, die Druckerei 2 legt ein Angebot für ein Plakat der Farbe C_CMYK^(2), wobei
   C_CMYK^(1) = (0.6, 0.1, 0.7, 0.0)^T,
   C_CMYK^(2) = (0.2, 0.0, 0.3, 0.3)^T.
Welcher Druckerei würden Sie den Auftrag erteilen, wenn
1. möglichst geringe Herstellungskosten
2. ein möglichst intensiver Farbton
das Auswahlkriterium ist? Begründen Sie Ihre Antwort!
[#0295]
(Frage III/19 1. Februar 2002)
• Nehmen Sie an, Sie seien Manager der Firma Rasen&Mäher und sollen für eine Werbekampagne Angebote von Druckereien für ein einfärbiges grünes Plakat einholen. Die Druckerei 1 bietet das Plakat in der Farbe C_CMYK^(1) an, die Druckerei 2 legt ein Angebot für ein Plakat der Farbe C_CMYK^(2), wobei
   C_CMYK^(1) = (0.5, 0.0, 0.6, 0.1)^T,
   C_CMYK^(2) = (0.5, 0.3, 0.6, 0.0)^T.
Welcher Druckerei würden Sie den Auftrag erteilen, wenn
1. möglichst geringe Herstellungskosten
2. ein möglichst intensiver Farbton
das Auswahlkriterium ist? Begründen Sie Ihre Antwort!
[#0300]
(Frage III/17 1. Februar 2002)
• Das Auge des kanadischen Bergschafes in Abbildung B.79(a) ist in den Abbildungen B.79(b) bis B.79(d) vergrößert dargestellt⁷. Zur Interpolation wurden das nearest neighbor Verfahren,
bilineare und bikubische Interpolation verwendet. Ordnen Sie diese Interpolationsverfahren
den drei Bildern B.79(b) bis B.79(d) zu und begründen Sie Ihre Antwort!
[#0304]
(Frage III/18 1. Februar 2002)
• Der in Abbildung B.80 gezeigte BSP-Baum beschreibt ein zweidimensionales Polygon. Die
Trennebenen (bzw. -geraden, da wir den zweidimensionalen Fall betrachten) in jedem Knoten
sind durch Gleichungen der Form ax+by = c gegeben, wobei die Außenseite jeweils durch die
Ungleichung ax + by > c und die Innenseite durch ax + by < c charakterisiert sind. Weiters
führen (wie in Abbildung B.80 gezeigt) die „Außen“-Pfade nach links und die „Innen“-Pfade nach rechts.
Zeichnen Sie in einem geeignet beschrifteten Koordinatensystem das Polygon, das durch
diesen BSP-Baum beschrieben wird, und kennzeichnen Sie, welche Kante zu welcher Gleichung gehört!
[#0307]
(Frage III/20 15. März 2002)
⁷ Der Ausschnitt wurde zur Verdeutlichung der Ergebnisse einer Kontraststreckung unterzogen.
• Erklären Sie die Begriffe „Grenzfrequenz“ (cutoff frequency) und ideales vs. nicht ideales Filter im Zusammenhang mit digitalen Rasterbildern! In welchem Zusammenhang stehen
diese Konzepte mit dem Aussehen des Ausgabebildes eines Filters?
[#0309]
(Frage III/19 15. März 2002)
• In Abbildung B.81 wurde der bekannte Stanford-Bunny mit drei verschiedenen Beleuchtungsmodellen dargestellt. Um welche Beleuchtungsmodelle handelt es sich in den Abbildungen
B.81(a), B.81(b) und B.81(c)? Anhand welcher Eigenschaften der Bilder haben Sie die
gesuchten Beleuchtungsmodelle erkannt?
[#0310]
(Frage III/16 15. März 2002)
Figure B.78: Vergrößerung eines Bildausschnittes unter Verwendung verschiedener Interpolationsverfahren ((a) Originalbild, (b) Verfahren 1, (c) Verfahren 2, (d) Verfahren 3)
Figure B.79: Vergrößerung eines Bildausschnittes unter Verwendung verschiedener Interpolationsverfahren ((a) Originalbild, (b) Verfahren 1, (c) Verfahren 2, (d) Verfahren 3)
Figure B.80: BSP-Baum
Figure B.81: Darstellung eines 3D-Modells unter Anwendung verschiedener Beleuchtungsmodelle ((a) Modell 1, (b) Modell 2, (c) Modell 3)
Index
xy-stitching, 50
z-Buffer, 209
Weber-Ratio, 87
8-Code, 190
Casteljau-Algorithmus, 181, 307
cavalier, 171
chain code, 312, 317
chain-code, 189
chromaticity, 91
classification
supervised, 291
unsupervised, 291
Clipping, 167
clipping, 165
CMY-Farbmodell, 95, 302
CMYK-Farbmodell, 93, 95, 289, 302
Cohen-Sutherland, 167, 300, 301
color model, 107, 292
Computer Graphics/Visualization, 57
computer-aided tomographic, 207
Computergrafik und Bildanalyse, 266, 284
Computertomografie, 53, 286
cone-tree, 263
cones, 45
control points, 180, 289
convolution, 130
cornea, 45
CSG, 199, 296
cut-off frequency, 134
cutoff-Frequenz, 135, 287
absolute transformation, 232
Abtastung, 34, 117
Active Vision, 265
active vision, 265
Affine matching, 285
anaglyphs, 230
anterior chamber, 45
Anti-Blur filter, 259
Approximation, 179, 287, 289
approximation, 155
Auflösung, geometrische, 34, 117, 283, 284
Auflösung, radiometrische, 34, 284
Augenabstand, 229
Augmented Reality, 256, 285
augmented reality, 56
Augmented Reality, 57, 256, 287
back-face culling, 208
basis matrix, 177
Basisfunktion, 177
Bezier-Kurve, 181, 307
Bezier-Kurven, 179
bi-directional reflectivity function, 222
Bilderkennung, 35
Bildmodell, 34, 283
bilineare Transformation, 164, 329
binokulares Sehen, 231, 297
blending functions, 177
Blending Funktionen, 177
blind spot, 45
Boundary-Representation, 196, 298
bounding boxes, 208
box function, 134
Bresenham-Algorithmus, 65, 283, 297, 301
BSP-Tree, 198, 300, 315
Bump-Mapping, 148, 285
bump-mapping, 148
data garmets, 56, 292
data-garments, 56
DDA-Algorithmus, 65, 283
density, 88
density slicing, 105
depth cues, 207, 323
descriptive geometry, 171
direct capture system, 52
Diskretisierung, 34
distance
between pixels, 38
dither matrix, 88
dots, 107
dynamic range, 88
dynamischer Bereich, 88, 115, 284
cabinet, 171
calibration, 256
edge-image, 132
Elektronenstrahlschirm, 35
Entzerrung, 287
Erosion, 75, 326
exterior orientation, 48, 56
Füllen von Polygonen, 65, 289
Farbfilmnegativ, 107, 324
Farbmodell, CIE, 92, 93, 283, 284
Farbmodell, CMY, 95, 286
Farbmodell, CMYK, 95, 286
Farbmodell, RGB, 93, 95, 284
feature, 290
feature space, 290
feature vector, 46
Fenster, 40
fiducial marks, 173
Filter, 40
filter
high pass
Butterworth, 136, 292
ideal, 136, 292
filter mask, 127
Fourier-Transformation, 289, 326
fovea, 45
Freiheitsgrad, 168, 325
gamma, 49
Gauss-filter, 129, 286
Geometrievektor, 177
geometry vector, 177
Gouraud-shading, 219, 287, 323
gradation curve, 119
Gradientenbild, 134, 301
Grafik-Pipeline, 265, 285
Grauwertzuweisung, 287
gross fog, 49
Halbraumcodes, 167
half tone, 314
half-space codes, 166
halo, 212
head-mounted displays, 56
hierarchical matching, 234
histogram, 337
equalization, 50, 105, 117, 119
spreading, 119
Histogramm, 120, 323
Hit-or-miss Operator, 80, 289
Hochpassfilter, 132, 324
homogene Koordinaten, 157, 168, 285, 299,
300, 325
homologue points, 232
HSV-Farbmodell, 97, 305
hue, 96
Human-Computer-Interfaces, 263
hyper-spectral, 46
ideal filter, 134
illuminate, 51
image
black & white, 87
color, 87
false color, 90
half tone, 88
image flickering, 230
Image Processing/Computer Vision, 57
image quality, 115
immersive visualization, 263
information slices, 263
inner orientation, 232
intensity slicing, 105
Interpolation, 179, 287, 289
interpolation, 155, 176
Interpolation, bilineare, 252, 297
Kante, 38, 286
Kantendetektion, 134, 306
Kell-factor, 117
Kettenkodierung, 190
Klassifikation, 240, 244, 287, 304
Klassifizierung, 242
Koeffizientenmatrix, 162
Koordinatentransformation, 325
Korrelation, normalisiert, 234, 303
leaf, 192
Least Squares, 180
Least Squares Method, 164
Least squares method, 164, 329
Level of Detail, 199
light, 90
line pairs per millimeter, 50
Linie, 38
Linienpaar, 117
listening mode, 53
logische Verknüpfung, 40
luminance, 90
luminosity, 117
Man-Machine Interaction, 263
Maske, 40
masked negative, 106
median filter, 128
Median-Filter, 129, 299
Medianfilter, 129, 285, 323
Mehrfachbilder, 46, 287
Merkmalsraum, 242
mexican hat, 131
MIP-maps, 249
mirror stereoscope, 230
Moiré effect, 107, 292
moments, 145
morphological
closing, 314, 315
erosion, 75, 283
filtering, 79, 291, 322
opening, 77, 78, 284, 305, 328
morphology, 75, 82, 326
mosaicing, 249
motion blur, 259, 291
Motion Picture Expert Group, 274
multi illumination, 56
multi-images, 46
multi-position, 46
multi-sensor, 46
multi-spectral, 46
multi-temporal, 46
multiple path, 50
Multispektralbild, 32
Multispektrales Abtastsystem, 52
Nachbarschaft, 38, 283
nearest neighbor, 165, 251
negative color, 91
nicht-perspektive Kamera, 51, 53, 286
node file, 251
nodes, 251
normal equation matrix, 164
offset print, 107, 292
one-point perspective, 172
Operationen, algebraische, 40
operator
Marr-Hildreth, 293
optische Dichte, 107, 324
parallactic angle, 229
parallax, 229
parallel difference, 229
Parametervektor, 164, 329
parametrische Kurvendarstellung, 177
paraphernalia, 48
passive Radiometrie, 54, 289
Passpunkte, 289
Phong-Modell, 218, 299
Phong-shading, 219, 287
photo detector, 49
photo-multiplier, 49
photography
negative, 337
Photometric Stereo, 206
pigments, 90
pipeline, 265
polarization, 230
pose, 48, 56
preprocessing, 120
projection, oblique, 171
projection, orthographic, 171
Projektionen, planar, 172, 284
prozedurale Texturen, 150, 289
pseudo-color, 90, 105
push-broom technology, 49
Quadtree, 193, 296
Radar, 54, 287
Radiosity, 285
radiosity, 222
Rasterdarstellung, 35
Rasterkonversion, 36, 284
ratio imaging, 107
Ratio-Bild, 113, 283
Rauschen, kohärentes, 326
ray tracing, 210, 291
ray-tracing, 210
Raytracing, recursive, 208, 298
Rectangular Tree, 199
Region, 38
relative orientation, 232
remote sensing, 52
Resampling, 287
resampling, 165, 291
Resampling, geometrisches, 249, 285
resolution, 45
RGB-Farbmodell, 93, 95, 97, 289, 302, 305
rigid body transformation, 168, 325
ringing, 135
Roberts-Operator, 134, 306
rods, 45
Rotation, 157, 303
Sampling, 34
Scannen, 50, 287
scanning electron-microscopes, 55
Schwellwert, 120, 323
Schwellwertbild, 32
Schwerpunkt, 73, 322
screening, 88
Segmentierung, 120, 323
sensor
non-optical, 233, 292
sensor model, 46
Sensor-Modell, 48, 289
Shape-from-Focus, 206
Shape-from-Shading, 206
Shape-from-X, 207, 323
sinc-filter, 128
Skalierung, 157, 303
Sobel-Operator, 133, 299
sound, navigation and range, 54
spatial partitioning, 197, 289
spatial-domain representation, 130
spectral representation, 130
Spektralraum, 147, 286
Spiegelreflexion, 218, 284
splitting, 190, 306
spy photography, 116
starrer Körper, 168, 325
step and stare, 49
Stereo, 230, 232, 285, 324
Stereo, photometrisches, 207, 284
stereo-method, 206
stereopsis, 175, 233, 292
Structured Light, 206
structured light, 55
Strukturelement, 82
support, 137
sweep, 195
Sweeps, 195, 287
table-lens, 263
template, 127
texels, 147
Textur, 147, 286
Texture-Mapping, 285
Tiefenunterschied, 229
Tiefenwahrnehmungshilfen, 207, 323
Tiefpassfilter, 135, 287
total plastic, 231
track, 55
Tracking, 57, 256, 287
transform
medial axis, 68, 308
Transformation, 157, 162, 252, 297, 299
transformations
conform, 170, 292
Transformationsmatrix, 157, 168, 299, 300,
303, 323
tri-chromatic coefficients, 91
tri-stimulus values, 91
trivial acceptance, 166
Trivial rejection, 166
undercolor removal, 95
Unsharp Masking, 132, 284
unsharp masking, 131
US Air Force Resolution Target, 50
vanishing point, 171
Vektordarstellung, 35
View Plane, 174
View Plane Normal, 174
view point, 232
view point normals, 232
View Reference Point, 174
View-Frustum, 199
Virtual Reality, 256, 285
vitreous humor, 45
volume element, 53
voxel, 53
Voxel-Darstellung, 285
Wahrscheinlichkeitsdichtefunktion, 244, 304
window, 127
wire-frame, 194
XOR, 40
YIQ-Farbmodell, 96, 286, 327
Zusammenhang, 38
List of Algorithms
1  Affine matching . . . 14
2  Threshold image . . . 37
3  Simple raster image scaling by pixel replication . . . 42
4  Image resizing . . . 42
5  Logical mask operations . . . 45
6  Fast mask operations . . . 45
7  Digital differential analyzer . . . 69
8  Thick lines using a rectangular pen . . . 72
9  Dilation . . . 80
10 Halftone-Image (by means of a dither matrix) . . . 95
11 Conversion from RGB to HSI . . . 104
12 Conversion from HSI to RGB . . . 105
13 Conversion from RGB to HSV . . . 106
14 Conversion from HSV to RGB . . . 107
15 Conversion from RGB to HLS . . . 108
16 Conversion from HLS to RGB . . . 109
17 hlsvalue(N1,N2,HLSVALUE) . . . 109
18 Masked negative of a color image . . . 112
19 Histogram equalization . . . 124
20 Local image improvement . . . 126
21 Weighted Antialiasing . . . 143
22 Gupta-Sproull-Antialiasing . . . 144
23 Texture mapping . . . 154
24 Casteljau . . . 186
25 Chain coding . . . 195
26 Splitting . . . 196
27 Quadtree . . . 198
28 Creation of a BSP tree . . . 204
29 z-buffer . . . 215
30 Raytracing for Octrees . . . 217
31 Gouraud shading . . . 226
32 Phong-shading . . . 227
33 Shadow map . . . 227
34 Implementation of the Atherton-Weiler-Greenberg algorithm . . . 227
35 Radiosity . . . 229
36 Feature space . . . 246
37 Classification without rejection . . . 247
38 Classification with rejection . . . 248
39 Calculation with a node file . . . 256
40 Nearest neighbor . . . 257
41 z-buffer pipeline . . . 272
42 Phong pipeline . . . 272
43 Pipeline for lossless compression . . . 276
44 Pipeline for lossy compression . . . 277
45 JPEG image compression . . . 281
46 MPEG compression pipeline . . . 282
List of Definitions
1  Amount of data in an image . . . 39
2  Connectivity . . . 43
3  Distance . . . 43
4  Perspective camera . . . 53
5  Skeleton . . . 73
6  Difference . . . 80
7  Erosion . . . 81
8  Open . . . 83
9  Closing . . . 83
10 Morphological filter . . . 84
11 Hit or Miss Operator . . . 86
12 Contour . . . 87
13 Conversion from CIE to RGB . . . 98
14 CMY color model . . . 100
15 CMYK color model . . . 101
16 YIQ color model . . . 103
17 Histogram stretching . . . 124
18 Conformal transformation . . . 162
19 Rotation in 2D . . . 165
20 2D rotation matrix . . . 166
21 Sequenced rotations . . . 167
22 Affine transformation with 2D homogeneous coordinates . . . 168
23 Bilinear transformation . . . 169
24 Rotation in 3D . . . 175
25 Affine transformation with 3D homogeneous coordinates . . . 176
26 Bezier-curves in 2D . . . 185
27 2D morphing for lines . . . 197
28 Wireframe structure . . . 200
29 Boundary representation . . . 202
30 Cell-structure . . . 203
31 Ambient light . . . 223
32 Lambert model . . . 224
33 total plastic . . . 237
List of Figures
4.1  Morphologische Erosion als Abfolge Komplement→Dilation→Komplement . . . 82
4.2  morphologisches Öffnen . . . 84
5.1  Histogramm von Abbildung B.29 . . . 95
5.2  eine Ebene im HSV-Farbmodell . . . 110
6.1  Histogramm eines Graukeils . . . 127
6.2  Histogramme . . . 128
7.1  Anwendung eines Median-Filters . . . 135
7.2  Tief- und Hochpassfilter . . . 135
7.3  Tief- und Hochpassfilter . . . 138
7.4  Roberts-Operator . . . 140
9.1  rotated coordinate system . . . 166
9.2  Konstruktion einer Bezier-Kurve nach Casteljau . . . 187
11.1 grafische Auswertung des z-Buffer-Algorithmus . . . . . . . . . . . . . . . . . . . . 216
B.1 wiederholte Speicherung eines Bildes in verschiedenen Größen . . . . . . . . . . . . . 294
B.2 dreidimensionales Objekt mit verschiedenen Darstellungstechniken gezeigt . . . . . 300
B.3 Überführung einer Vektorgrafik in eine andere . . . . . . . . . . . . . . . . . . . . . 301
B.4 Prozesskette der Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
B.5 Pixelraster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
B.6 binäres Rasterbild . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
B.7 Tisch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
B.8 Inputbild . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
B.9 Die Verbindung zweier Pixel soll angenähert werden . . . . . . . . . . . . . . . . . 304
B.10 Objekt bestehend aus zwei Flächen . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
B.11 Aufteilung des Primärstrahls bei „recursive raytracing“ . . . . . . . . . . . . . . . 305
B.12 Lineare Transformation M eines Objekts A in ein Objekt B . . . . . . . . . . . . . 305
B.13 Anwendung des Sobel-Operators auf ein Grauwertbild . . . . . . . . . . . . . . . 306
B.14 Anwendung eines Median-Filters auf ein Grauwertbild . . . . . . . . . . . . . . . . 306
B.15 Beleuchtetes Objekt mit spiegelnder Oberfläche nach dem Phong-Modell . . . . . 307
B.16 Grauwertbild als höchstauflösende Ebene einer Bildpyramide . . . . . . . . . . . . 308
B.17 Polygon für BSP-Darstellung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
B.18 Anwendung des Clipping-Algorithmus von Cohen-Sutherland . . . . . . . . . . 309
B.19 Clipping nach Cohen-Sutherland . . . . . . . . . . . . . . . . . . . . . . . . . . 309
B.20 Verbindung zweier Punkte nach Bresenham . . . . . . . . . . . . . . . . . . . . . 310
B.21 Anwendung eines Gradientenoperators . . . . . . . . . . . . . . . . . . . . . . . . . 310
B.22 Auffinden der Kantenpixel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
B.23 Rand einer Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
B.24 Boolsche Operationen auf Binärbildern . . . . . . . . . . . . . . . . . . . . . . . . 312
B.25 Ermittlung der normalisierten Korrelation . . . . . . . . . . . . . . . . . . . . . . . 312
B.26 Konstruktion eines Kurvenpunktes auf einer Bezier-Kurve nach Casteljau . . . 313
B.27 allgemeine Rotation mit Skalierung . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
B.28 drei Merkmalsvektoren im zweidimensionalen Raum . . . . . . . . . . . . . . . . . 314
B.29 digitales Grauwertbild (Histogramm gesucht) . . . . . . . . . . . . . . . . . . . . . 314
B.30 leere Filtermasken . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
B.31 morphologisches Öffnen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
B.32 eine Ebene im HSV-Farbmodell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
B.33 Graukeil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
B.34 Roberts-Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
B.35 zweidimensionale Polygonrepräsentation . . . . . . . . . . . . . . . . . . . . . . . . 318
B.36 Objekt und Kamera im Weltkoordinatensystem . . . . . . . . . . . . . . . . . . . . 319
B.37 Graukeil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
B.38 Anwendung der normalisierten Kreuzkorrelation . . . . . . . . . . . . . . . . . . . 320
B.39 Anwendung der medial axis Transformation . . . . . . . . . . . . . . . . . . . . . . 320
B.40 Graukeil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
B.41 Anwendung des Hit-or-Miss-Operators auf ein Binärbild . . . . . . . . . . . . . . . 321
B.42 Erstellen dicker Linien . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
B.43 Definition eines zweidimensionalen Objekts durch die Kettencode-Sequenz „221000110077666434544345“ . . . 323
B.44 Transformation von vier Punkten . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
B.45 Sub-Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
B.46 drei digitale Grauwertbilder und ihre Histogramme . . . . . . . . . . . . . . . . . . 325
B.47 Halbtonverfahren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
B.48 Halbtonverfahren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
B.49 morphologisches Schließen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
B.50 Anwendung des Hit-or-Miss-Operators auf ein Binärbild . . . . . . . . . . . . . . . 326
B.51 Halbtonverfahren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
B.52 Polygon für BSP-Darstellung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
B.53 Farbbildnegativ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
B.54 überwachte Klassifikation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
B.55 Rechteck mit Störobjekten . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.56 Pixelanordnung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.57 Bild mit Störungen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.58 Rasterdarstellung eines Objekts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.59 Grauwertbild . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.60 Transformationsmatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
B.61 Digitales Rasterbild mit zum Rand hin abfallender Intensität . . . . . . . . . . . . 331
B.62 Farbfilmnegativ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
B.63 Vereinfachter Aufbau des bällefangenden Roboters auf der Landesausstellung comm.gr2000az . . . 333
B.64 Bild mit überlagertem kohärentem Rauschen . . . . . . . . . . . . . . . . . . . . . 334
B.65 Alternative Berechnung der morphologischen Erosion . . . . . . . . . . . . . . . . . 335
B.66 Foto mit geringem Kontrast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
B.67 Morphologisches Öffnen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
B.68 Abstandsberechnung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
B.69 verschiedene Filteroperationen
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
B.70 Zylinder mit koaxialer Bohrung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
B.71 Segmentierung eines Grauwertbildes . . . . . . . . . . . . . . . . . . . . . . . . . . 340
B.72 Artefakte bei einem schachbrettartigen Muster . . . . . . . . . . . . . . . . . . . . 340
B.73 Anwendung der normalisierten Kreuzkorrelation auf ein gedrehtes Bild . . . . . . . 341
B.74 automatische Kontrastverbesserung . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
B.75 unscharfe Kante in einem digitalen Grauwertbild . . . . . . . . . . . . . . . . . . . 342
B.76 Histogramme von zwei verschiedenen Bildern . . . . . . . . . . . . . . . . . . . . . 343
B.77 Torus mit Oberflächenstruktur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
B.78 Vergrößerung eines Bildausschnittes unter Verwendung verschiedener Interpolationsverfahren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
B.79 Vergrößerung eines Bildausschnittes unter Verwendung verschiedener Interpolationsverfahren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
B.80 BSP-Baum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
B.81 Darstellung eines 3D-Modells unter Anwendung verschiedener Beleuchtungsmodelle 350
Bibliography
[FvDFH90] James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes. Computer Graphics, Principles and Practice, Second Edition. Addison-Wesley, Reading,
Massachusetts, 1990. Overview of research to date.
[GW92]     Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Addison-Wesley, June 1992. ISBN 0-201-50803-6.