Color and Geometrical Structure in Images

Applications in microscopy
Jan-Mark Geusebroek
This book was typeset by the author using LaTeX 2ε.
Cover: Victory Boogie Woogie, by Piet Mondriaan, 1942–1944, oil-painting with
pieces of plastic and paper. Reproduction and permission for printing kindly provided by Gemeentemuseum Den Haag.
Copyright © 2000 by Jan-Mark Geusebroek.
All rights reserved. No part of this publication may be reproduced or transmitted in
any form or by any means, electronic or mechanical, including photocopy, recording,
or any information storage and retrieval system, without permission from the author
([email protected]).
ISBN 90-5776-057-6
Color and Geometrical Structure in Images
Applications in microscopy
ACADEMISCH PROEFSCHRIFT (academic dissertation)
to obtain the degree of doctor
at the Universiteit van Amsterdam,
under the authority of the Rector Magnificus
prof. dr J. J. M. Franse,
before a committee appointed by the College voor Promoties,
to be defended in public in the Aula of the University
on Thursday, 23 November 2000, at 12:00
by
Jan-Mark Geusebroek
born in Amsterdam
Promotion committee (Promotiecommissie):
Prof. dr ir A. W. M. Smeulders
Dr H. Geerts
Prof. dr J. J. Koenderink
Prof. dr G. D. Finlayson
Prof. dr ir L. van Vliet
Prof. dr ir C. A. Grimbergen
Prof. dr ir F. C. A. Groen
Prof. dr P. van Emde Boas
Faculty:
Natuurwetenschappen, Wiskunde & Informatica
Kruislaan 403
1098 SJ Amsterdam
The Netherlands
The investigations described in this thesis were carried out at the Janssen Research
Foundation, Beerse, Belgium.
The study was supported by the Janssen Research Foundation.
Advanced School for Computing and Imaging
The work described in this thesis has been carried out at the Intelligent Sensory
Information Systems group. This work was carried out in graduate school ASCI.
ASCI dissertation series number 54.
Contents

1 Introduction
  1.1 Part I: Color
  1.2 Part II: Geometrical Structure

2 Color and Scale
  2.1 Color and Observation Scale
    2.1.1 The Spectral Structure of Color
    2.1.2 The Spatial Structure of Color
  2.2 Colorimetric Analysis of the Gaussian Color Model
  2.3 Conclusion

3 A Physical Basis for Color Constancy
  3.1 Color Image Formation Model
    3.1.1 Color Formation for Reflection of Light
    3.1.2 Color Formation for Transmission of Light
    3.1.3 Special Cases
  3.2 Illumination Invariant Properties of Object Reflectance or Transmittance
  3.3 Experiments
    3.3.1 Overview
    3.3.2 Small-Band Experiment
    3.3.3 Broad-Band Experiment
    3.3.4 Colorimetric Experiment
  3.4 Discussion

4 Measurement of Color Invariants
  4.1 Color Image Formation Model
  4.2 Determination of Color Invariants
    4.2.1 Invariants for White but Uneven Illumination
    4.2.2 Invariants for White but Uneven Illumination and Matte, Dull Surfaces
    4.2.3 Invariants for White, Uniform Illumination and Matte, Dull Surfaces
    4.2.4 Invariants for Colored but Uneven Illumination
    4.2.5 Invariants for a Uniform Object
    4.2.6 Summary of Color Invariants
    4.2.7 Geometrical Color Invariants in Two Dimensions
  4.3 Measurement of Color Invariants
    4.3.1 Measurement of Geometrical Color Invariants
    4.3.2 Discriminative Power for RGB Recording
    4.3.3 Evaluation of Scene Geometry Invariance
    4.3.4 Localization Accuracy for the Geometrical Color Invariants
  4.4 Conclusion

5 Robust Autofocusing in Microscopy
  5.1 Material and Methods
    5.1.1 The Focus Score
    5.1.2 Measurement of the Focus Curve
    5.1.3 Sampling the Focus Curve
    5.1.4 Large, Flat Preparations
    5.1.5 Preparation and Image Acquisition
    5.1.6 Evaluation of Performance for High NA
  5.2 Results
    5.2.1 Autofocus Performance Evaluation
    5.2.2 Evaluation of Performance for High NA
    5.2.3 Comparison of Performance with Small Derivative Filters
    5.2.4 General Observations
  5.3 Discussion

6 Segmentation of Tissue Architecture by Distance Graph Matching
  6.1 Materials and Methods
    6.1.1 Hippocampal Tissue Preparation
    6.1.2 Image Acquisition and Software
    6.1.3 K-Nearest Neighbor Graph
    6.1.4 Distance Graph Matching
    6.1.5 Distance Graph Comparison
    6.1.6 Cost Functions
    6.1.7 Evaluation of Robustness on Simulated Point Patterns
    6.1.8 Algorithm Robustness Evaluation
    6.1.9 Robustness for Scale Measure
    6.1.10 Cell Detection
    6.1.11 Hippocampal CA Region Segmentation
  6.2 Results
    6.2.1 Algorithm Robustness Evaluation
    6.2.2 Robustness for Scale Measure
    6.2.3 Hippocampal CA Region Segmentation
  6.3 Discussion
  6.4 Appendix: Dynamic Programming Solution for String Matching

7 A Minimum Cost Approach for Segmenting Networks of Lines
  7.1 Network Extraction Algorithm
    7.1.1 Vertex Detection
    7.1.2 Line Point Detection
    7.1.3 Line Tracing
    7.1.4 Graph Extraction
    7.1.5 Edge Saliency and Basin Coverage
    7.1.6 Thresholding the Saliency Hierarchy
    7.1.7 Overview
    7.1.8 Error Analysis
  7.2 Illustrations
    7.2.1 Heart Tissue Segmentation
    7.2.2 Neurite Tracing
    7.2.3 Crack Detection
    7.2.4 Directional Line Detection
  7.3 Conclusion

8 Discussion
  8.1 Color
  8.2 Geometrical Structure
  8.3 General Conclusion

Samenvatting (Summary in Dutch)
Chapter 1
Introduction
When looking at Victory Boogie Woogie, by the Dutch painter Piet Mondrian, the
yellow blocks appear jumpy and unstable, as if they move [33]. As the painting
hangs firmly fixed to the wall, the visual effect results from within the brain as it
processes the incoming visual information. In fact, a visual scene entering the
brain is fed into three subsystems [24, 34]. One subsystem segments the scene into parts
by apparent color contrast; it gives us the ability to see the various
colored patches as different entities. A second subsystem provides us with the color
of the parts; it is used for identifying the patches based on their color.
The third subsystem localizes objects in the world. It tells us where the patches
are in the scene. In contrast, the latter system is color blind, judging the scene
on intensity variations only. Cooperation between the first subsystem, segmenting
the different colored parts, and the latter subsystem, localizing the different patches,
results in ambiguity when the intensity of neighboring color patches is similar. The
phenomenon is in effect in Victory Boogie Woogie by the yellow stripes on a white
background, as described by Livingstone [33].
Apart from the color appearance of the blocks, Mondrian arranged blocks to form
a pattern of perpendicular lines. The visual arrangement is sifted out by the third,
monochromatic subsystem which extracts the spatial organization of the scene. The
lines are brought about by an intensity contrast with the background. The yellow stripes
have no such contrast, yet lines still appear, as the gaps are filled in by the brain. In
Victory Boogie Woogie, Mondrian combined local color contrast and the geometrical
arrangement of details to stimulate a visual sensation in the brain.
Like Victory Boogie Woogie, this thesis deals with both color and spatial structure.
Part I describes the spatial interaction between colors. Color is discussed in its physical environment of light. Consequently, the physics of light reflection is included
in the human subsystem dealing with shape extraction. Part II describes the quantification of geometrical structure specifically applied to microscopy, although some
of the concepts may have a broader application span. Tissue at the microscopical
level often exhibits a regular pattern. Automatic extraction of such arrangements is
considered, aiming at drug screening for pharmaceutical research. The two parts are
mostly separated from one another, as is the case for perception. Using the two parts
in conjunction in future research may yield synergy in color image processing.
1.1 Part I: Color
Color seems to be an inalienable property of objects. It is the orange that has
that color. However, the heart of the matter is quite different. Human perception
actively assigns colors to an observed scene. There is a discrepancy between the
physics of light, and color as signified by the brain. One undeniable fact is that color
perception is bootstrapped by a physical cause: it results from light falling onto the
eye. Objects in the world respond to daylight by reflecting different parts of the
incoming light spectrum differently. The specific component of reflection mainly instantiates
the color appearance of the object. Another fact is that color perception results
from experience. We assign the color of an orange its label as we have learned by
experience, being capable of doing so through the biological mechanism. Experience has
led to the assignment of names to colors. It would have given language no advantage
to label colors when we could not compare them with memory. A last contribution
to color as we know it is evolution that has shaped the actual mechanism of color
vision. Evolution, such that a species adapts to its environment, has driven the use
of color by perception. Color is one of the main cues for segmenting objects in a
scene. The difference in color of the green leaves that obscure oranges allows for easy
detection of fruit. Color has a high identification power for an object. Orange things
in a tree are clearly not lemons, although the shape is similar. Color in combination
with shading provides a clue for depth perception, hence geometry, of an object. For
monochromatic vision, such clues are highly ambiguous. Hence, color perception gives
primates advantage over monochromatic species.
In terms of physics, daylight is reflected by an object and reaches the eye. It is the
reflectance ratio over the wavelengths of radiant energy that is an object property,
hence the reflection function for an orange indeed is a physical characteristic of the
fruit. However, the amount of radiant energy falling onto the retina depends on
both the reflectance function and the light source illuminating the object. Still, we
observe an orange to be orange in sunlight, by candlelight, independent of shadow,
frontal illumination, or oblique illumination. All these variables influence the energy
distribution as it enters the eye, the variability being imposed by the physical laws of
light reflection. Human color vision has adapted to include these physical laws, due
to which we neglect the scene-induced variations.
Observation of color by the human visual system proceeds by absorption of light
by three different receptors. The pigment sensitivities extend over a broad range of
wavelengths, but they can be characterized by the spectral region for which sensitivity
is maximum. The maximum absorption of the pigments is at short, middle, and long
wavelengths, for which reason the receptors are named blue, green, and red cones.
Before transmission of the image to the brain, the outputs of the receptors are combined
in two ways. First, the outputs of the three cone types at each position on the retina
are combined to represent three opponent color axes. One axis describes intensity,
the black to white colors. A second axis yields yellow to blue colors, and a third axis
results in red to green colors. The combinations are known as opponent colors, first
described by Hering [22]. A second combination yields the comparison of neighboring
opponent color responses. Such a spatial comparison is performed within circular
areas, called receptive fields. The opponent colors are spatially compared, yielding
black–white, yellow–blue, and red–green receptive fields. Receptive fields are found
in primates at different sizes, and for different opponent pathways [4, 5, 10, 36, 57].
The existence of receptive fields implies that color is a spatial property. Hence color
perception is the result of contrast between opponent spectral responses.
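As an illustration of the opponent recombination described above, the following sketch maps RGB values onto intensity, yellow-blue, and red-green axes. The particular linear combinations are one common computational choice, not the physiological weights; the function name and coefficients are assumptions for illustration only.

```python
import numpy as np

def opponent_channels(rgb):
    """Map an RGB image (H x W x 3, float) onto three opponent axes:
    intensity (black-white), yellow-blue, and red-green."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (R + G + B) / np.sqrt(3.0)
    yellow_blue = (R + G - 2.0 * B) / np.sqrt(6.0)
    red_green = (R - G) / np.sqrt(2.0)
    return intensity, yellow_blue, red_green
```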
In computer vision, as opposed to machine vision [48], one would like to mimic
human color perception to analyze and interpret color images. For example, in biological and medical research, tissue sections are evaluated by light microscopy. These
sections are stained with standard methods, often over one-hundred years old, and
especially designed to be discriminated by the human eye. Hence, segmentation of
classically stained preparations can best be based on human color perception. However, the understanding of the human visual system has not yet reached the level needed to
describe the world with the mathematical precision of computer vision [53].
A color image may be defined as a coherent measurement of the spatio-spectral
energy density reflected by a scene. Hence, a color image is the result of the interaction between light source and scene, observed by a measurement probe. From a
computer vision perspective, this definition raises two fundamental problems. First,
how to combine spectral measurements and spatial structure? Common practice in
color image processing is to use the color information without considering the spatial
coherence [11, 14, 16, 17, 37, 42, 45]. Early attempts to include spatial scale into
color observation are the work of Land [31], Di Zenzo [58], and Cumani
[3]. Applications of color receptive fields in computer vision include [15, 19, 40, 50, 55].
Although these methods intuitively capture the structure of color images, no foundation for color observation is available. A solid physical basis for combining color
information with spatial resolution would solve a fundamental problem of how to
probe the spatio-spectral influx of information-bearing energy.
A second fundamental question is how to integrate the physical laws of light reflection into color measurement. Modeling the physical process of color image formation
provides a clue to the object-specific parameters [6, 25, 28, 29, 30, 39, 43, 46, 56, 59].
The question boils down to deriving the invariant properties of color vision, [1, 9, 13,
16, 17, 20, 46]. With invariance we mean a property f of object t which receives value
f(t) regardless of unwanted conditions W in the appearance of t [47]. For human color
vision, the group of disturbing conditions W′ is categorized by shadow, highlights,
light source, and scene geometry. Scene geometry is determined by the number of
light sources, light source directions, viewing direction, and object shape. The invariant class W′ is referred to as photometric invariance. For observation of images,
geometric invariance is of importance [12, 18, 26, 32, 49]. The group of spatial disturbing conditions is given by translation, rotation, and observation scale. Since the
human eye projects the three-dimensional world onto a two-dimensional image, the
group may be extended with projection invariance. Both photometric and geometric
invariance are required for a color vision system to reduce the complexity intrinsic to
color images.
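To make the notion of photometric invariance concrete, a textbook example is the normalized rgb chromaticity, which for matte surfaces discounts shading and intensity changes. This is only an illustration of the invariance concept; it is not one of the invariants derived later in this thesis.

```python
import numpy as np

def normalized_rgb(rgb, eps=1e-8):
    """Chromaticity (r, g, b) = (R, G, B) / (R + G + B): a classic
    photometric invariant for matte, dull surfaces under white light."""
    s = rgb.sum(axis=-1, keepdims=True) + eps
    return rgb / s
```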
In this thesis, these two fundamental questions are considered, aiming at robust
measurement of color invariants. Here, color invariance represents the combined photometric and geometric invariance class. The aim is to describe the local structure
of color images in a systematic, irreducible, and complete sense. The problem is
approached from a measurement theoretic viewpoint, by using aperture functions as
given by linear scale-space theory. Robust color measurement is achieved by selecting the appropriate scale for the aperture function. Conventional scale-space theory
observes the world without imposing a priori information. As a result, the spatial
operators defined in scale-space theory are translation, rotation, and scale invariant.
More importantly, classical scale-space apertures introduce no spurious details due to
the measurement process [26, 27].
In Chapter 2, we use the general scale-space assumptions to formulate a theory of
color measurement [7]. However, our visual system is the result of evolution. When
concerned with color, evolution is guided by the physical laws of light reflection,
imposing the effects of shadows, shading, and highlights [29, 30, 56]. Hence, human
color perception is constrained by the physical laws of light. Chapter 3 describes the
physics of color image formation, and makes a connection between color invariance
derived from physics and color constancy as characteristic for human color perception.
In Chapter 4 the physical laws for color image formation are exploited to derive a
complete, irreducible system of color invariants.
1.2 Part II: Geometrical Structure
The second part of this thesis is concerned with the extraction of geometrical arrangement of local structure. The processes of cell differentiation and apoptosis in growing
tissue result in the clustering of cells forming functional parts [2, 8, 23]. Often, these
functional parts exhibit a regular structure, which is the result of cell division and
specialization. The minimization of occupied space, a natural constraint imposed by
gravity [51], yields dense packing of cells into somewhat regular arrays, and hence
regularly shaped cell patterns. The geometrical arrangement of structures
in tissues may reveal differences between physiological and pathological conditions.
Classical light microscopy is often used to observe tissue structure. The tissue of
interest is cut into the thin slices necessary to observe the structures by transmission
of light. Contrast is added to the slices by staining procedures, resulting in the
highlighting of structures against a uniform background. The chemical state of cells is
quantified by color analysis after staining procedures. Tissue architecture is analyzed
by the spatial arrangement of cells, neurites, blood vessels, fibers, and other cellular
objects.
The regularity of cell aggregates in tissues does not imply that the quintessence of
the arrangement can be captured in an algorithm. Biological variety causes clusters
to be irregular. Observation by light microscopy demands the extraction of sliced
samples from the tissue. The deformation caused by cutting the three-dimensional
structure into two-dimensional transections again results in spatial distortion of cluster regularity. These distortions impose high demands on the robustness of the
algorithm.
We consider the fundamental problem of geometric structure: how to capture the
arrangement of local structures? For example, a tissue may be considered, at a very
naive level, as a cluster of cells. Hence, a cell may be considered a local marker,
whereas the arrangement of cells is characteristic for the tissue. Such arrangements
impose a grammar of local structures. Graph morphology is the basic tool to describe
these grammars [21, 35, 38, 41, 44, 52, 54].
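As a minimal sketch of such a graph over local markers, the following builds the k-nearest-neighbor graph over detected cell positions; the function name and the choice of k are illustrative assumptions, not the method of Chapter 6.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_graph(points, k=4):
    """Edge list of the k-nearest-neighbor graph over 2D cell positions
    (an n x 2 array); a simple carrier for 'grammars' of local structure."""
    tree = cKDTree(points)
    # query k+1 neighbors, since the nearest neighbor of a point is itself
    _, idx = tree.query(points, k=k + 1)
    return [(i, int(j)) for i, row in enumerate(idx) for j in row[1:]]
```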
Chapter 6 describes the extraction of architectures by example structures. For
a regular architecture, a small sample of the geometric arrangement captures the
essential information for automatic extraction. This fact is exploited in the segmentation of tissue architecture for histological preparations. The method is validated by
comparing algorithm performance with the performance of an expert.
Chapter 7 presents an algorithm for the extraction of line networks from local image structure. A network is given by knots and their interconnections. The extraction
of knots and line points yields a localized description of the network. A graph-based
method is applied to the extraction of cardiac myocytes from heart muscle sections.
To derive tissue architecture related parameters, as described in Chapter 6 and
Chapter 7, the tissue needs to be digitized into the computer. For tissue sections, often
large compared to the microscope field of view, automatic acquisition involves a
scanning process. During scanning, the microscope needs to be focused when the tissue
surface is not planar, as is often the case. Since sufficiently accurate methods are
not available for autofocusing, the second part of this thesis starts with Chapter 5
describing a robust method for focusing preparations in scanning light microscopy.
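The following sketch illustrates autofocusing by sampling a focus curve with a derivative-energy score. The score shown is a generic gradient-energy measure, offered only as an illustration; the actual focus score and sampling strategy of this thesis are developed in Chapter 5.

```python
import numpy as np
from scipy import ndimage

def focus_score(image, sigma=1.0):
    """Gradient-energy focus measure: an in-focus image has more
    high-frequency content, so the score peaks near best focus."""
    gx = ndimage.gaussian_filter(image, sigma, order=(0, 1))
    gy = ndimage.gaussian_filter(image, sigma, order=(1, 0))
    return float(np.mean(gx**2 + gy**2))

def best_focus(stack):
    """Pick the slice of a through-focus image stack maximizing the score."""
    return max(range(len(stack)), key=lambda i: focus_score(stack[i]))
```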
Bibliography
[1] E. Angelopoulou, S. Lee, and R. Bajcsy. Spectral gradient: A material descriptor
invariant to geometry and incident illumination. In Proceedings of the Seventh
IEEE International Conference on Computer Vision, pages 861–867. IEEE Computer Society, 1999.
[2] R. Chandebois. Cell sociology: A way of reconsidering the current concepts of
morphogenesis. Acta Bioth., 25:71–102, 1976.
[3] A. Cumani. Edge detection in multispectral images. CVGIP: Graphical Models
and Image Processing, 53(1):40–51, 1991.
[4] D. M. Dacey and B. B. Lee. The “blue-on” opponent pathway in primate retina
originates from a distinct bistratified ganglion cell type. Nature, 367:731–735,
1994.
[5] D. M. Dacey, B. B. Lee, D. K. Stafford, J. Pokorney, and V. C. Smith. Horizontal
cells of the primate retina: Cone specificity without spectral opponency. Science,
271:656–659, 1996.
[6] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance
and texture of real world surfaces. ACM Trans Graphics, 18:1–34, 1999.
[7] A. Dev and R. van den Boomgaard. Color and scale: The spatial structure of
color images. Technical report, ISIS institute, Department of Computer Science,
University of Amsterdam, Amsterdam, The Netherlands, 1999.
[8] K. J. Dormer. Fundamental Tissue Geometry for Biologists. Cambridge Univ.
Press, London, 1980.
[9] M. D’Zmura and P. Lennie. Mechanisms of color constancy. J. Opt. Soc. Am.
A, 3(10):1662–1672, 1986.
[10] S. Engel, X. Zhang, and B. Wandell. Colour tuning in human visual cortex
measured with functional magnetic resonance imaging. Nature, 388:68–71, 1997.
[11] G. D. Finlayson. Color in perspective. IEEE Trans. Pattern Anal. Machine
Intell., 18(10):1034–1038, 1996.
[12] L. Florack. Image Structure. Kluwer Academic Publishers, Dordrecht, 1997.
[13] D. H. Foster and S. M. C. Nascimento. Relational colour constancy from invariant
cone-excitation ratios. Proc. R. Soc. London B, 257:115–121, 1994.
[14] B. V. Funt and G. D. Finlayson. Color constant color indexing. IEEE Trans.
Pattern Anal. Machine Intell., 17(5):522–529, 1995.
Bibliography
7
[15] C. Garbay, G. Brugal, and C. Choquet. Application of colored image analysis to
bone marrow cell recognition. Analyt. Quantit. Cytol., 3:272–280, 1981.
[16] R. Gershon, D. Jepson, and J. K. Tsotsos. Ambient illumination and the determination of material changes. J. Opt. Soc. Am. A, 3:1700–1707, 1986.
[17] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec.,
32:453–464, 1999.
[18] L. J. Van Gool, T. Moons, E. J. Pauwels, and A. Oosterlinck. Vision and Lie’s
approach to invariance. Image Vision Comput., 13(4):259–277, 1995.
[19] D. Hall, V. Colin de Verdière, and J. L. Crowley. Object recognition using
coloured receptive fields. In Proceedings Sixth European Conference on Computer
Vision (ECCV), volume 1, pages 164–177, LNCS 1842. Springer-Verlag, 26th
June-1st July, 2000.
[20] G. Healey and A. Jain. Retrieving multispectral satellite images using physics-based invariant representations. IEEE Trans. Pattern Anal. Machine Intell.,
18:842–848, 1996.
[21] H. J. A. M. Heijmans, P. Nacken, A. Toet, and L. Vincent. Graph morphology.
J. Visual Communication Image Representation, 3:24–38, 1992.
[22] E. Hering. Outlines of a Theory of the Light Sense. Harvard University Press,
Cambridge, MS, 1964.
[23] H. Honda. Geometrical models for cells in tissues. Int. Rev. Cytol., 81:191–248,
1983.
[24] D. H. Hubel. Eye, Brain, and Vision. Scientific American Library, New York,
NY, 1988.
[25] D. B. Judd and G. Wyszecki. Color in Business, Science, and Industry. Wiley,
New York, NY, 1975.
[26] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.
[27] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern.,
63:291–297, 1990.
[28] J. J. Koenderink and A. J. van Doorn. Illuminance texture due to surface
mesostructure. J. Opt. Soc. Am. A, 13:452–463, 1996.
[29] P. Kubelka. New contribution to the optics of intensely light-scattering materials.
part I. J. Opt. Soc. Am., 38(5):448–457, 1948.
8
Introduction
[30] P. Kubelka and F. Munk. Ein beitrag zur optik der farbanstriche. Z. Techn.
Physik, 12:593, 1931.
[31] E. H. Land. The retinex theory of color vision. Sci. Am., 237:108–128, 1977.
[32] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[33] M. Livingstone. Art, illusion and the visual system. Sci. Am., 258:78–85, 1988.
[34] M. Livingstone and D. Hubel. Segregation of form, color, movement, and depth:
Anatomy, physiology, and perception. Science, 240:740–749, 1988.
[35] R. Marcelpoil and Y. Usson. Methods for the study of cellular sociology: Voronoï
diagrams and parametrization of the spatial relationships. J. Theor. Biol.,
154:359–369, 1992.
[36] R. H. Masland. Unscrambling color vision. Science, 271:616–617, 1996.
[37] B. A. Maxwell and S. A. Shafer. Physics-based segmentation of complex objects
using multiple hypotheses of image formation. Comput. Vision Image Understanding, 65(2):269–295, 1997.
[38] F. Meyer. Skeleton and perceptual graphs. Signal Processing, 16:335–363, 1989.
[39] K. D. Mielenz, K. L. Eckerle, R. P. Madden, and J. Reader. New reference
spectrophotometer. Appl. Optics, 12(7):1630–1641, 1973.
[40] M. Mirmehdi and M. Petrou. Segmentation of color textures. IEEE Trans.
Pattern Anal. Machine Intel., 22(2):142–159, 2000.
[41] A. Mojsilović, J. Kovačević, J. Hu, R. J. Safranek, and S. K. Ganapathy. Matching and retrieval based on the vocabulary and grammar of color patterns. IEEE
Trans. Image Processing, 9(1):38–54, 2000.
[42] S. K. Nayar and R. M. Bolle. Computing reflectance ratios from an image. Pat.
Rec., 26:1529–1542, 1993.
[43] M. Oren and S. K. Nayar. Generalization of the Lambertian model and implications for machine vision. Int. J. Computer Vision, 14:227–251, 1995.
[44] J. Palmari, C. Dussert, Y. Berthois, C. Penel, and P. M. Martin. Distribution of
estrogen receptor heterogeneity in growing MCF–7 cells measured by quantitative
microscopy. Cytometry, 27:26–35, 1997.
[45] G. Sapiro. Color and illuminant voting. IEEE Trans. Pattern Anal. Machine
Intel., 21(11):1210–1215, 1999.
Bibliography
9
[46] S. A. Shafer. Using color to separate reflection components. Color Res. Appl.,
10(4):210–218, 1985.
[47] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content
based image retrieval at the end of the early years. submitted to IEEE Trans.
Pattern Anal. Machine Intell.
[48] H. M. G. Stokman. Robust Photometric Invariance in Machine Color Vision.
PhD thesis, University of Amsterdam, Amsterdam, The Netherlands, 2000.
[49] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, Boston, 1994.
[50] B. Thai and G. Healey. Modeling and classifying symmetries using a multiscale opponent color representation. IEEE Trans. Pattern Anal. Machine Intell.,
20(11):1224–1235, 1998.
[51] D. W. Thompson. On Growth and Form. Cambridge University Press, London,
England, 1971.
[52] A. Toet. Hierarchical clustering through morphological graph transformation.
Pat. Rec. Let., 12:391–399, 1991.
[53] D. Travis. Effective Color Displays, Theory and Practice. Academic Press, 1991.
[54] L. Vincent. Graphs and mathematical morphology. Signal Processing, 16:365–
388, 1989.
[55] S. G. Wolf, R. Ginosar, and Y. Y. Zeevi. Spatio-chromatic model for colour
image processing. In Proceedings 12th IAPR International Conference on Pattern
Recognition, volume 1, pages 599–601. IEEE, October 9–13, 1994.
[56] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.
[57] R. A. Young. The Gaussian derivative theory of spatial vision: Analysis of
cortical cell receptive field line-weighting profiles. Technical Report GMR-4920,
General Motors Research Center, Warren, MI, 1985.
[58] S. Di Zenzo. A note on the gradient of a multi-image. Comput. Vision Graphics
Image Processing, 33:116–125, 1986.
[59] R. Zhou, E. H. Hammond, and D. L. Parker. A multiple wavelength algorithm
in color image analysis and its applications in stain decomposition in microscopy
images. Med. Phys., 23(12):1977–1986, 1996.
Part I
Color
Chapter 2
Color and Scale
The Spatial Structure of Color Images
appeared in the proceedings of the sixth European Conference on Computer Vision, vol. 1, pp.
331–341, 2000.
“Lightness and color are field phenomena, not point phenomena.”
– Edwin H. Land.
There has been a recent revival in the analysis of color in computer vision. This is
mainly due to the common knowledge that more visual information leads to easier
interpretation of the visual scene. A color image is easier to segment than a greyvalued image since some edges are only visible in the color domain and will not be
detected in the grey-valued image. An area of great interest is searching for particular
objects in images and image databases, for which color is a feature with a wide range of
data values and hence high potential for discriminability. Color can thus be seen
as an additional cue in image interpretation. Moreover, color can be used to extract
object reflectance robust for a change in imaging conditions [4, 5, 14, 15]. Therefore
color features are well suited for the description of an object.
Colors are only defined in terms of human observation. Modern analysis of color
has started in colorimetry where the spectral content of tri-chromatic stimuli are
matched by a human, resulting in the well-known XYZ color matching functions [17].
However, from the pioneering work of Land [13] we know that a perceived color does
not directly correspond to the spectral content of the stimulus; there is no one-to-one
mapping of spectral content to perceived color. For example, a colorimetry purist
will not consider brown to be a color, but computer vision practitioners would like
to be able to define brown in an image when searching on colors. Hence, it is not
only the spectral energy distribution coding color information, but also the spatial
configuration of colors. We aim at a physical basis for the local interpretation of color
images.
Common image processing sense tells us that the grey-value of a particular pixel
is not a meaningful entity. The value 42 by itself tells us little about the meaning of
the pixel in its environment. It is the local spatial structure of an image that has a
close geometrical interpretation [10]. Yet representing the spatial structure of a color
image is an unsolved problem.
The theory of scale-space [10, 16] adheres to the fact that observation and scale are
intertwined; a measurement is performed at a certain resolution. Differentiation is one
of the fundamental operations in image processing, and one which is nicely defined
[3] in the context of scale-space. In this chapter we discuss how to represent color as
a scalar field embedded in a scale-space paradigm. As a consequence, the differential
geometry framework is extended to the domain of color images. We demonstrate color
invariant edge detectors which are robust to shadow and highlight boundaries.
The chapter is organized as follows. Section 2.1 considers the embedding of color
in the scale-space paradigm. In section 2.2 we derive estimators for the parameters in
the scale-space model, and give optimal values for these parameters. The resulting
sensitivity curves are colorimetrically compared with human color vision.
2.1 Color and Observation Scale
A spatio-spectral energy distribution is only measurable at a certain spatial resolution and a certain spectral bandwidth. Hence, physically realizable measurements
inherently imply integration over the spectral and spatial dimensions. The integration
reduces the infinite-dimensional Hilbert space of spectra at an infinitesimally small
spatial neighborhood to a limited number of measurements. As suggested by Koenderink [11], general aperture functions, or Gaussians and their derivatives, may be used
to probe the spatio-spectral energy distribution. We emphasize that no essentially
new color model is proposed here, but rather a theory of color measurement. The
specific choice of color representation is irrelevant for our purpose. For convenience
we first concentrate on the spectral dimension, later on we show the extension to the
spatial domain.
2.1.1 The Spectral Structure of Color
From scale space theory we know how to probe a function at a certain scale; the probe
should have a Gaussian shape in order to prevent the creation of extra details into
the function when observed at a higher scale (lower resolution) [10]. As suggested
by Koenderink [11], we can probe the spectrum with a Gaussian. In this section,
we consider the Gaussian as a general probe for the measurement of spatio-spectral
differential quotients. No essentially new color model is proposed, but rather a theory
of color measurement.
Formally, let E(λ) be the energy distribution of the incident light, where λ denotes wavelength, and let G(λ; λ0, σλ) be the Gaussian at spectral scale σλ positioned at λ0. The spectral energy distribution may be approximated by a Taylor expansion at λ0,

$$E(\lambda) = E^{\lambda_0} + \lambda E_\lambda^{\lambda_0} + \frac{1}{2} \lambda^2 E_{\lambda\lambda}^{\lambda_0} + \ldots \qquad (2.1)$$
Measurement of the spectral energy distribution with a Gaussian aperture yields a weighted integration over the spectrum. The observed energy in the Gaussian color model, at infinitely small spatial resolution, approaches in second order to

$$\hat{E}^{\sigma_\lambda}(\lambda) = \hat{E}^{\lambda_0,\sigma_\lambda} + \lambda \hat{E}_\lambda^{\lambda_0,\sigma_\lambda} + \frac{1}{2} \lambda^2 \hat{E}_{\lambda\lambda}^{\lambda_0,\sigma_\lambda} + \ldots \qquad (2.2)$$
where

$$\hat{E}^{\lambda_0,\sigma_\lambda} = \int E(\lambda)\, G(\lambda; \lambda_0, \sigma_\lambda)\, d\lambda \qquad (2.3)$$

measures the spectral intensity,

$$\hat{E}_\lambda^{\lambda_0,\sigma_\lambda} = \int E(\lambda)\, G_\lambda(\lambda; \lambda_0, \sigma_\lambda)\, d\lambda \qquad (2.4)$$

measures the first order spectral derivative, and

$$\hat{E}_{\lambda\lambda}^{\lambda_0,\sigma_\lambda} = \int E(\lambda)\, G_{\lambda\lambda}(\lambda; \lambda_0, \sigma_\lambda)\, d\lambda \qquad (2.5)$$
measures the second order spectral derivative. Further, Gλ and Gλλ denote derivatives
of the Gaussian with respect to λ. Note that, throughout the thesis, we assume scale
normalized Gaussian derivatives to probe the spectral energy distribution.
Definition 1 (Gaussian Color Model) The Gaussian color model measures the coefficients $\hat{E}^{\lambda_0,\sigma_\lambda}$, $\hat{E}_\lambda^{\lambda_0,\sigma_\lambda}$, $\hat{E}_{\lambda\lambda}^{\lambda_0,\sigma_\lambda}$, ... of the Taylor expansion of the Gaussian weighted spectral energy distribution at λ0 and scale σλ.
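A minimal numerical sketch of Definition 1, assuming a spectrum sampled on a uniform wavelength grid; the function name and sampling conventions are assumptions for illustration.

```python
import numpy as np

def gaussian_color_model(E, lam, lam0=520.0, sigma=55.0):
    """Measure E_hat, E_hat_lambda, E_hat_lambdalambda (eqs. 2.3-2.5) by
    integrating a sampled spectrum E(lam) against a Gaussian aperture
    and its scale-normalized first and second derivatives."""
    x = (lam - lam0) / sigma
    g = np.exp(-0.5 * x**2) / (sigma * np.sqrt(2.0 * np.pi))
    g1 = -x / sigma * g                 # first derivative of the Gaussian
    g2 = (x**2 - 1.0) / sigma**2 * g    # second derivative of the Gaussian
    dl = lam[1] - lam[0]                # uniform wavelength spacing assumed
    e0 = np.sum(E * g) * dl
    e1 = sigma * np.sum(E * g1) * dl        # scale normalization: sigma^1
    e2 = sigma**2 * np.sum(E * g2) * dl     # scale normalization: sigma^2
    return e0, e1, e2
```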
One might be tempted to consider a higher than second order structure of
the smoothed spectrum. However, the subspace spanned by the human visual system
is of dimension 3, and hence higher order spectral structure cannot be observed by
the human visual system.
Figure 2.1: The probes for spatial color consist of probing the product of the spatial and the spectral space with a Gaussian aperture.
2.1.2 The Spatial Structure of Color
Introduction of spatial extent in the Gaussian color model yields a local Taylor expansion at wavelength λ0 and position $\vec{x}_0$. Each measurement of a spatio-spectral energy distribution has a spatial as well as a spectral resolution. The measurement is obtained by probing an energy density volume in a three-dimensional spatio-spectral space, where the size of the probe is determined by the observation scales σλ and σx,
see fig. 2.1. It is directly clear that we do not separately consider spatial scale and
spectral scale, but actually probe an energy density volume in the 3d spectral-spatial
space where the “size” of the volume is specified by the observation scales.
We can describe the observed spatial-spectral energy density Ê(λ, ~x) of light as
a Taylor series for which the coefficients are given by the energy convolved with
Gaussian derivatives:
$$\hat{E}(\lambda, \vec{x}) = \hat{E} + \begin{pmatrix} \vec{x} \\ \lambda \end{pmatrix}^{\!T} \begin{bmatrix} \hat{E}_{\vec{x}} \\ \hat{E}_\lambda \end{bmatrix} + \frac{1}{2} \begin{pmatrix} \vec{x} \\ \lambda \end{pmatrix}^{\!T} \begin{bmatrix} \hat{E}_{\vec{x}\vec{x}} & \hat{E}_{\vec{x}\lambda} \\ \hat{E}_{\lambda\vec{x}} & \hat{E}_{\lambda\lambda} \end{bmatrix} \begin{pmatrix} \vec{x} \\ \lambda \end{pmatrix} + \ldots \qquad (2.6)$$

where

$$\hat{E}_{\vec{x}^i \lambda^j}(\lambda, \vec{x}) = E(\lambda, \vec{x}) * G_{\vec{x}^i \lambda^j}(\lambda, \vec{x}; \sigma_\lambda, \sigma_x) . \qquad (2.7)$$
Here, $G_{\vec{x}^i \lambda^j}(\lambda, \vec{x}; \sigma_\lambda, \sigma_x)$ are the spatio-spectral probes, or color receptive fields. The coefficients of the Taylor expansion of $\hat{E}(\lambda, \vec{x})$ represent the local image structure completely. Truncation of the Taylor expansion results in an approximate representation, optimal in least squares sense.
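As a sketch of eq. 2.7, the spatial part of the probes can be realized by convolving each spectral channel with spatial Gaussian derivatives; the helper below assumes the three channels have already been measured, and uses scipy's separable Gaussian filters.

```python
from scipy import ndimage

def spatio_spectral_derivatives(channels, sigma_x=2.0):
    """Given the spectral channels of the Gaussian color model as 2D
    arrays (e.g. {'E': ..., 'El': ..., 'Ell': ...}), measure them and
    their first-order spatial derivatives at scale sigma_x."""
    out = {}
    for name, chan in channels.items():
        out[name] = ndimage.gaussian_filter(chan, sigma_x)
        out[name + "_x"] = ndimage.gaussian_filter(chan, sigma_x, order=(0, 1))
        out[name + "_y"] = ndimage.gaussian_filter(chan, sigma_x, order=(1, 0))
    return out
```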
For human vision, it is known that the Taylor expansion is spectrally truncated at
second order [8]. Hence, higher order derivatives do not affect color as observed by the
human visual system. Therefore, three receptive field families should be considered;
the luminance receptive fields as known from luminance scale-space [12] extended
with a yellow-blue receptive field family measuring the first order spectral derivative,
and a red-green receptive field family probing the second order spectral derivative.
For human vision, the Taylor expansion for luminance is spatially truncated at fourth
order [18].
2.2 Colorimetric Analysis of the Gaussian Color Model
The eye projects the infinite-dimensional spectral density function onto a 3d ‘color’
space. Not any 3d subspace of the Hilbert space of spectra equals the subspace that
nature has chosen. Any subspace we create with an artificial color model should
be reasonably close in some metrical sense to the spectral subspace spanned by the
human visual system.
Formally, the infinite-dimensional spectrum e is projected onto a 3d space c by $c = A^T e$, where $A^T = (X\,Y\,Z)$ represents the color matching matrix. The subspace in which c resides is defined by the color matching functions $A^T$. The range $\mathcal{R}(A^T)$ defines what spectral distributions e can be reached from c, and the nullspace $\aleph(A^T)$ defines which spectra e cannot be observed in c. Since any spectrum $e = e_{\mathcal{R}} + e_{\aleph}$ can be decomposed into a part that resides in $\mathcal{R}(A^T)$ and a part that resides in $\aleph(A^T)$, we define
Definition 2 The observable part of the spectrum equals $e_{\mathcal{R}} = \Pi_{\mathcal{R}} e$, where $\Pi_{\mathcal{R}}$ is the projection onto the range of the human color matching functions $A^T$.

Definition 3 The non-observable (or metameric black) part of the spectrum equals $e_{\aleph} = \Pi_{\aleph} e$, where $\Pi_{\aleph}$ is the projection onto the nullspace of the human color matching functions $A^T$.

The projection on the range $\mathcal{R}(A^T)$ is given by [1]

$$\Pi_{\mathcal{R}} : A^T \mapsto \mathcal{R}(A^T) = A \left( A^T A \right)^{-1} A^T \qquad (2.8)$$

and the projection on the nullspace

$$\Pi_{\aleph} : A^T \mapsto \aleph(A^T) = I - A \left( A^T A \right)^{-1} A^T = \Pi_{\mathcal{R}}^{\perp} . \qquad (2.9)$$

Any spectral probe $B^T$ that has the same range as $A^T$ is said to be colorimetric with $A^T$, and hence differs only in an affine transformation. An important property of the range projector $\Pi_{\mathcal{R}}$ is that it uniquely specifies the subspace. Thus, we can rephrase the previous statement into:
Proposition 4 The human color space is uniquely defined by $\mathcal{R}(A^T)$. Any color model $B^T$ is colorimetric with $A^T$ if and only if $\mathcal{R}(A^T) = \mathcal{R}(B^T)$.
In this way we can tell if a certain color model is colorimetric with the human visual
system. Naturally this is a formal definition. It is not well suited for a measurement
approach where the color subspaces are measured with a given precision. A definition
of the difference between subspaces is given by [7, Section 2.6.3],
Proposition 5 The largest principal angle θ between color subspaces given by their color matching functions $A^T$ and $B^T$ equals

$$\sin \theta(A^T, B^T) = \left\| \mathcal{R}(A^T) - \mathcal{R}(B^T) \right\|_2 .$$
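The projectors of eqs. 2.8-2.9 and the subspace distance of Proposition 5 translate directly into a few lines of linear algebra; the sketch below assumes color matching functions sampled as the columns of matrices A and B.

```python
import numpy as np

def range_projector(A):
    """P = A (A^T A)^{-1} A^T, the projector onto the range of the color
    matching functions stacked as columns of A (eq. 2.8)."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

def metameric_black(e, A):
    """Split a sampled spectrum e into its observable part and its
    metameric black (non-observable) part (Definitions 2 and 3)."""
    e_obs = range_projector(A) @ e
    return e_obs, e - e_obs

def largest_principal_angle(A, B):
    """Proposition 5: sin(theta) is the spectral norm of the difference
    of the two range projectors."""
    s = np.linalg.norm(range_projector(A) - range_projector(B), ord=2)
    return np.arcsin(min(s, 1.0))
```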
Up to this point we have established expressions describing the similarity between different subspaces. We are now in a position to compare the subspace of the Gaussian color model with the human visual system by using the XYZ color matching functions. Hence, parameters for the Gaussian color model may be optimized to capture a spectral subspace similar to that spanned by human vision, see fig. 2.2. Let the Gaussian color matching functions be given by G(λ0, σλ). We have 2 degrees of freedom in positioning the subspace of the Gaussian color model; the mean λ0 and scale σλ of the Gaussian. We wish to find the optimal subspace that minimizes the largest principal angle between the subspaces, i.e.:
$$B(\lambda_0, \sigma_\lambda)^T = \left[ G(\lambda; \lambda_0, \sigma_\lambda) \;\; G_\lambda(\lambda; \lambda_0, \sigma_\lambda) \;\; G_{\lambda\lambda}(\lambda; \lambda_0, \sigma_\lambda) \right]^T$$

$$\sin \theta = \operatorname*{argmin}_{\lambda_0, \sigma_\lambda} \left\| \mathcal{R}(A^T) - \mathcal{R}\!\left( B(\lambda_0, \sigma_\lambda)^T \right) \right\|_2$$
An approximate solution is obtained for λ0 = 520 nm and σλ = 55 nm. The corresponding angles between the principal axes of the Gaussian sensitivities and the 1931 and 1964 CIE standard observers are given in tab. 2.1. Figure 2.3 shows the different sensitivities, together with the optimal (least square) transform from the XYZ sensitivities to the Gaussian basis, given by

$$\begin{pmatrix} \hat{E} \\ \hat{E}_\lambda \\ \hat{E}_{\lambda\lambda} \end{pmatrix} = \begin{pmatrix} -0.48 & 1.2 & 0.28 \\ 0.48 & 0 & -0.4 \\ 1.18 & -1.3 & 0 \end{pmatrix} \begin{pmatrix} \hat{X} \\ \hat{Y} \\ \hat{Z} \end{pmatrix} . \qquad (2.10)$$
Since the transformed sensitivities are a linear (affine) transformation of the original
XYZ sensitivities, the transformation is colorimetric with human vision. The transform is close to the Hering basis for color vision [8], for which the yellow-blue pathway
indeed is found in the visual system of primates [2].
Figure 2.2: Cohen’s fundamental matrix $\mathcal{R}$ for the CIE 1964 standard observer (a), and for the Gaussian color model (λ0 = 520 nm, σλ = 55 nm) (b).
An RGB camera approximates the CIE 1931 XYZ basis for colorimetry by the linear transform [9]

$$\begin{pmatrix} \hat{X} \\ \hat{Y} \\ \hat{Z} \end{pmatrix} = \begin{pmatrix} 0.62 & 0.11 & 0.19 \\ 0.3 & 0.56 & 0.05 \\ -0.01 & 0.03 & 1.11 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} . \qquad (2.11)$$
The best linear transform from XYZ values to the Gaussian color model is given by (eq. 2.10). Hence, the product of (eq. 2.11) and (eq. 2.10) gives the desired implementation of the Gaussian color model in RGB terms,

$$\begin{pmatrix} \hat{E} \\ \hat{E}_\lambda \\ \hat{E}_{\lambda\lambda} \end{pmatrix} = \begin{pmatrix} 0.06 & 0.63 & 0.27 \\ 0.3 & 0.04 & -0.35 \\ 0.34 & -0.6 & 0.17 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} . \qquad (2.12)$$
A better approximation to the Gaussian color model may be obtained for known
camera sensitivities. Figure 2.4 shows an example image and its Gaussian color model
components.
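A sketch of the RGB implementation: multiplying the matrices of eq. 2.11 and eq. 2.10 reproduces eq. 2.12 up to rounding, and the product can be applied per pixel.

```python
import numpy as np

# eq. 2.11: RGB to CIE 1931 XYZ
RGB2XYZ = np.array([[ 0.62, 0.11, 0.19],
                    [ 0.30, 0.56, 0.05],
                    [-0.01, 0.03, 1.11]])

# eq. 2.10: XYZ to the Gaussian color model basis
XYZ2E = np.array([[-0.48,  1.20,  0.28],
                  [ 0.48,  0.00, -0.40],
                  [ 1.18, -1.30,  0.00]])

# the product matches eq. 2.12 up to rounding
RGB2E = XYZ2E @ RGB2XYZ

def rgb_to_gaussian_color_model(rgb):
    """Map an H x W x 3 RGB image to (E_hat, E_hat_lambda,
    E_hat_lambdalambda), stacked along the last axis."""
    return np.einsum('ij,hwj->hwi', RGB2E, rgb)
```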
2.3 Conclusion
We have established the measurement of spatial color information from RGB-images,
based on the Gaussian scale-space paradigm. We have shown that the formation of
color images yields a spatio-spectral integration process at a certain spatial and spectral
resolution. Hence, measurement of color images implies probing a three-dimensional
energy density at a spatial scale σx and spectral scale σλ . The Gaussian aperture
may be used to probe the spatio-spectral energy distribution.
Table 2.1: Angles between the principal axes for various color systems. For determining the optimal values λ0, σλ, the largest angle θ1 is minimized. The distance between the Gaussian sensitivities for the optimal values λ0 = 520 nm, σλ = 55 nm and the different CIE colorimetric systems is comparable. Note the difference between the CIE systems is 9.8°.

        Gauss – XYZ 1931    Gauss – XYZ 1964    XYZ 1931 – 1964
  θ1    26°                 23.5°               9.8°
  θ2    21.5°               17.5°               3.9°
  θ3    3°                  3°                  1°
Figure 2.3: The Gaussian sensitivities at λ0 = 520 nm and σλ = 55 nm (a). The best linear
transformation from the CIE 1964 XYZ sensitivities (b) to the Gaussian bases is shown in (c). Note
the correspondence between the transformed sensitivities and the Gaussian color model.
Figure 2.4: The example image (a) and its color components Ê (b), Êλ (c), and Êλλ (d), respectively.
Note that for the color component Êλ achromaticity is shown in grey, negative bluish values are shown
in dark, and positive yellowish in light. Further, for Êλλ achromaticity is shown in grey, negative
greenish in dark, and positive reddish in light.
We have achieved a spatial color model, founded in physics as well as in measurement science. The parameters of the Gaussian color model have been estimated such
that a spectral subspace similar to that of human vision is captured. The Gaussian color
model solves the fundamental problem of color and scale by integrating the spatial
and color information. The model measures the coefficients of the Taylor expansion of
the spatio-spectral energy distribution. Hence, the Gaussian color model describes the
local structure of color images. As a consequence, the differential geometry framework
is extended to the domain of color images.
Spatial differentiation of expressions derived from the Gaussian color model is
inherently well-posed, in contrast with often ad-hoc methods for detection of hue
edges and other color edge detectors. Application areas include physics-based vision
[5], image database searches [6], and object tracking.
Bibliography
[1] J. B. Cohen and W. E. Kappauff. Color mixture and fundamental metamer:
Theory, algebra, geometry, application. Am. J. Psych., 98:171–259, 1985.
[2] D. M. Dacey and B. B. Lee. The “blue-on” opponent pathway in primate retina
originates from a distinct bistratified ganglion cell type. Nature, 367:731–735,
1994.
[3] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever.
Cartesian differential invariants in scale-space. Journal of Mathematical Imaging
and Vision, 3(4):327–348, 1993.
[4] R. Gershon, D. Jepson, and J. K. Tsotsos. Ambient illumination and the determination of material changes. J. Opt. Soc. Am. A, 3:1700–1707, 1986.
[5] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec.,
32:453–464, 1999.
[6] T. Gevers and A. W. M. Smeulders. Content-based image retrieval by viewpoint-invariant image indexing. Image Vision Comput., 17(7):475–488, 1999.
[7] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins
Press Ltd., London, 1996.
[8] E. Hering. Outlines of a Theory of the Light Sense. Harvard University Press,
Cambridge, MS, 1964.
[9] ITU-R Recommendation BT.709. Basic parameter values for the HDTV standard for the studio and for international programme exchange. Technical Report
BT.709 [formerly CCIR Rec. 709], ITU, 1211 Geneva 20, Switzerland, 1990.
22
Chapter 2. Color and Scale
[10] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.
[11] J. J. Koenderink and A. Kappers. Color Space. Utrecht University, The Netherlands, 1998.
[12] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern.,
63:291–297, 1990.
[13] E. H. Land. The retinex theory of color vision. Sci. Am., 237:108–128, 1977.
[14] K. D. Mielenz, K. L. Eckerle, R. P. Madden, and J. Reader. New reference
spectrophotometer. Appl. Optics, 12(7):1630–1641, 1973.
[15] S. A. Shafer. Using color to separate reflection components. Color Res. Appl.,
10(4):210–218, 1985.
[16] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, Boston, 1994.
[17] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.
[18] R. A. Young. The Gaussian derivative theory of spatial vision: Analysis of
cortical cell receptive field line-weighting profiles. Technical Report GMR-4920,
General Motors Research Center, Warren, MI, 1985.
Chapter 3
A Physical Basis for Color Constancy
Part of this work has appeared in the proceedings of the Second International Conference on
Scale-Space Theory in Computer Vision, 1999, pp. 459–464.
“As organisms grew more intricate, their sense organs multiplied and became
both more complex and more delicate. More messages of greater variety were
received from and about the external environment. Along with that (whether
as cause or effect we cannot tell), there developed an increasing complexity of
the nervous system, the living instrument that interpreted and stored the data
collected by the sense organs.”
– Isaac Asimov.
A well known property of human vision, known as color constancy, is the ability
to correct for color deviations caused by a difference in illumination. Although the
effect is a long standing research topic [13, 15, 21], the mechanism involved is only
partly resolved.
A common approach to investigate color constant behavior is by psychophysical
experiments [1, 13, 14]. Despite the exact nature of such experiments, there are
intrinsic difficulties in explaining the experimental results. For relatively simple experiments, the results may not explain in enough detail the mechanism underlying color
constancy. For example, in [14] the same stimulus patch, either illuminated by the
test illuminant, or by the reference illuminant, was presented to the left and right
eye. The subject was asked to match the appearance of the color under the reference
illuminant to the color under the test illuminant. As discussed by the authors, the
experiment is synthetic in that the visual scene lacks a third dimension. Although
the results correspond to their predictions, they are unable to prove their theory on
natural scenes, the scenes where shadow plays an important role. On the other hand,
for complex experiments, with inherently a large number of variables involved, the
results do not describe color constancy in isolation from other perceptual mechanisms.
In [1], a more natural scene is used, in that objects were placed in the experimentation room. The observer judged the appearance of a test patch mounted on the far
wall of the room. The observer was asked to vary the chromaticity of the test patch
so that it appeared achromatic. The color constancy reported is excellent, but the
experiments could not be interpreted in enough detail to explain the results. Hence,
a fundamental problem in experimental colorimetry is that the complex experimental
environment necessary to examine color constancy makes it hard to draw conclusions.
An alternative approach to reveal the mechanisms involved in color constancy
is by considering the spectral image formation. Modeling the physical process of
spectral image formation provides insight into the effect of different parameters on
object reflectance [2, 3, 4, 5, 6, 19]. In this chapter, we aim at a physical basis for
color constancy rather than a psychophysical one. Object reflectance is well modeled
by Shafer [20], based on the older Kubelka-Munk theory [11, 12]. The Kubelka-Munk
theory models the reflected and transmitted spectrum of a colored layer, based on
a material dependent scattering and absorption function, under the assumption that
light is isotropically scattered within the material. The theory has proven to be
successful for a wide variety of materials and applications [8, 22]. The theory unites
spectral color formation for both reflecting and transparent materials
into one photometric model. Therefore, the Kubelka-Munk theory is well suited for
determining material properties from color measurements. In Chapter 4, the use of
the Kubelka-Munk model is demonstrated for the measurement of object reflectance
from color images, under various general assumptions regarding imaging conditions.
In this chapter, we concentrate on color constant measurement of object color under
both reflectance of light as well as light transmission.
When considering the estimation of material properties on the basis of local measurements, differential equations constitute a natural framework to describe the physical process of image formation. A well known technique from scale-space theory [10]
is the convolution of a signal with a derivative of the Gaussian kernel to obtain the
derivative of the signal. The Gaussian function regularizes the underlying distribution,
resulting in robustness against noise. The standard deviation σ of the Gaussian determines the observation scale. Introduction of wavelength in the scale-space paradigm,
as suggested by Koenderink [9], leads to a spatio-spectral family of Gaussian aperture
functions. These color receptive fields are introduced in Chapter 2 as the Gaussian
color model. The Gaussian color model provides a physical basis, which is compatible
with colorimetry, for the measurement of color constant object properties.
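A minimal 1D illustration of this well-posed differentiation: convolving a noisy signal with the first derivative of a Gaussian yields the derivative of the smoothed signal in a single operation. The kernel construction below is a standard textbook form, not code from this thesis.

```python
import numpy as np

def gaussian_derivative(signal, sigma, dx=1.0):
    """Differentiate a noisy, uniformly sampled 1D signal by convolution
    with the first derivative of a Gaussian at scale sigma."""
    radius = int(4 * sigma / dx)
    x = np.arange(-radius, radius + 1) * dx
    g1 = -x / sigma**2 * np.exp(-x**2 / (2 * sigma**2))
    g1 /= sigma * np.sqrt(2 * np.pi)    # normalization of the Gaussian
    return np.convolve(signal, g1, mode='same') * dx
```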
The color constancy problem is often posed as retrieving the unknown illuminant
from a given scene [2, 3, 14, 19]. In contrast to that approach, features invariant
to a change in illuminant can be developed [4, 5, 6]. In this chapter, we focus on
differential expressions which are robust to a change in illumination color. The performance of these color invariants is demonstrated by experiments on spectral data.
Additionally, robustness against changes in the imaging conditions, such as camera
viewpoint, illumination direction, and object geometry is achieved, as demonstrated
in Chapter 4.
The organization of the chapter is as follows. In section 3.1, color image formation
is modeled by means of the Kubelka-Munk theory. Invariant differential expressions
which meet the given constraints are derived from the model in section 3.2. Application of the Gaussian color model described in Chapter 2 implies measurement of
the derived spatio-spectral differential invariants. Section 3.3 describes experimental
setup and color constancy results for the proposed method compared to well-known
methods from the literature. Finally, a confrontation between physics-based and perception-based color constancy is given in section 3.4.
3.1 Color Image Formation Model
In this section, image formation is modeled by means of the Kubelka-Munk theory
[8, 22] for colorant layers. Under the assumption that light within the material is
isotropically scattered, the material layer may be characterized by a wavelength dependent scatter coefficient and absorption coefficient. The class of materials for which
the theory is useful ranges from dyed paper and textiles, opaque plastics, paint films,
up to enamel and dental silicate cements [8]. The model may be applied to both
reflecting and transparent material.
3.1.1 Color Formation for Reflection of Light
Consider a homogeneously colored material patch of uniform thickness d and infinitesimal area, characterized by its absorption coefficient k(λ) and scatter coefficient s(λ).
When illuminated by incident light with spectral distribution e(λ), light scattering
within the material causes diffuse body reflection (fig. 3.1), while Fresnel interface
reflectance occurs at the surface boundaries.
When the thickness of the layer is such that further increase in thickness does not
affect the reflected color, Fresnel reflectance at the back surface may be neglected. The
incident light is partly reflected at the front surface, and partly enters the material,
is isotropically scattered, and a part again passes the front-surface boundary. The
reflected spectrum in the viewing direction ~v , ignoring secondary scattering after
internal boundary reflection, is given by [8, 22]:
ER(λ) = e(λ) (1 − ρf(λ, ~n, ~s, ~v))² R∞(λ) + e(λ) ρf(λ, ~n, ~s, ~v)    (3.1)
where ~n is the surface patch normal and ~s the direction of the illumination source,
and ρf the Fresnel front surface reflectance coefficient in the viewing direction. The
Figure 3.1: Illustration of the photometric model. The object, refractive index n2, is illuminated by
e(λ) (medium refractive index n1 ), and light is reflected and scattered in the viewing direction.
body reflectance

R∞(λ) = a(λ) − b(λ)    (3.2)

depends on the absorption and scattering coefficients by

a(λ) = 1 + k(λ)/s(λ),    b(λ) = √(a(λ)² − 1) .    (3.3)
Simplification is obtained by considering neutral interface reflection, assuming
that the Fresnel reflectance coefficient has a constant value over the spectrum. For
commonly used materials, interface reflection is constant with respect to wavelength
within a few percent across the visible spectrum [8, 18]. Equation (3.1) reduces to
ER(λ) = e(λ) (1 − ρf(~n, ~s, ~v))² R∞(λ) + e(λ) ρf(~n, ~s, ~v) .    (3.4)
The influence of the Fresnel reflectance varies from perfectly diffuse body reflectance
ρf = 0, or Lambertian reflection, to total mirroring of the illuminating source (ρf = 1).
Hence, the spectral color of ER is an additive mixture of the color of the light source
and the perfectly diffuse body reflectance color.
Because of projection of the energy distribution on the image plane, vectors ~n, ~s
and ~v will depend on the position at the imaging plane. The energy of the incoming
spectrum at a point ~x on the image plane is then related to
ER(λ, ~x) = e(λ, ~x) (1 − ρf(~x))² R∞(λ, ~x) + e(λ, ~x) ρf(~x)    (3.5)
Figure 3.2: Illustration of the photometric model. The object, refractive index n2, is illuminated by
e(λ) (medium refractive index n1 ). When the material is transparent, light is transmitted through
the material, enters medium n3 , and is observed.
where the spectral distribution at each point ~x originates from a specific material patch.
The major assumption made for the model of (eq. 3.5) is that locally planar surface patches are examined, for which the material is homogeneously colored. These constraints are imposed by the Kubelka-Munk theory, resulting in isotropic scattering of light within the material. The assumption is valid when the resolution is fine enough to consider locally uniform colored patches, whereas individual staining particles are not resolved. Further, the thickness of the layer is assumed to be such that no light reaches the other side of the material. For everyday scenes, these assumptions seem to be justified. Concerning the Fresnel reflectance, the photometric model assumes a neutral interface at the surface patch. As discussed in [18, 20], deviations of ρf over the visible spectrum are small for commonly used materials, therefore the Fresnel reflectance coefficient may be considered constant. The internally Fresnel reflected light contributes little in many cases [22], and is ignored in the model.
3.1.2 Color Formation for Transmission of Light
Consider a homogeneously colored material patch of uniform thickness d and infinitesimal area, characterized by its absorption coefficient k(λ) and scatter coefficient s(λ).
When illuminated by incident light with spectral distribution e(λ), absorption and scattering by the material determine its transmission color (fig. 3.2), while Fresnel
interface reflectance occurs at both the front and back surface boundaries.
When the layer is thin, such that the material is transparent, the transmitted spectrum through the layer in the viewing direction ~v, ignoring the effect of interreflections between the material surfaces, is given by [8, 22]:

ET(λ) = e(λ) (1 − ρf(λ, ~n, ~s, ~v)) (1 − ρb(λ, ~n, ~s, ~v)) b(λ) / (a(λ) sinh[b(λ)s(λ)l(~n, ~s, ~v)c] + b(λ) cosh[b(λ)s(λ)l(~n, ~s, ~v)c])    (3.6)
where again ~n is the material patch normal and ~s is the direction of the illumination
source. Further, c is the staining concentration and l the distance traveled by the light
through the material. The terms ρf and ρb denote the Fresnel front and back surface
reflectance coefficient, respectively. The factors a and b depend on the absorption and
scattering coefficients as given by (eq. 3.3).
Simplification is obtained by considering neutral interface reflection, assuming
that the Fresnel reflectance coefficients have a constant value over the spectrum. In
that case, the Fresnel reflectance affects the intensity of the transmitted light only.
Further, by considering a small angle of incidence at the transparent layer, the path
length l(~n, ~s, ~v ) = d. Equation (3.6) reduces to
ET(λ) = e(λ) (1 − ρf(~n, ~s, ~v)) (1 − ρb(~n, ~s, ~v)) b(λ) / (a(λ) sinh[b(λ)s(λ)dc] + b(λ) cosh[b(λ)s(λ)dc]) .    (3.7)
Because of projection of the energy distribution on the image plane, vectors ~n, ~s
and ~v will depend on the position ~x at the imaging plane,
ET(λ, ~x) = e(λ, ~x)(1 − ρf(~x))(1 − ρb(~x)) b(λ, ~x) / (a(λ, ~x) sinh[b(λ, ~x)s(λ, ~x)d(~x)c(~x)] + b(λ, ~x) cosh[b(λ, ~x)s(λ, ~x)d(~x)c(~x)])    (3.8)

where the spectral distribution at each point ~x originates from a specific transparent patch.
One of the assumptions made for the model of (eq. 3.8) is that locally planar material patches with parallel sides are examined, for which the material is homogeneously colored. The assumption is valid when the material is neither fluorescent nor in any sense optically active, and the resolution is fine enough to consider locally uniform colored patches, while individual stain particles are not resolved. Again, these constraints are imposed by the Kubelka-Munk theory. Further, normal incidence of light at the layer is assumed, so that the optical path length through the layer approximates its thickness. In transmission light microscopy, the preparation and observation conditions fairly justify these assumptions. Concerning the Fresnel reflectance, the photometric model assumes a neutral interface at the transparent patch. As discussed in [18], deviations of ρf, ρb over the visible spectrum are small for commonly used materials. For example, the refractive index of immersion oil often used in microscopy varies by only 3.3% over the visible spectrum. Therefore, the Fresnel reflectance coefficients ρf and ρb may be considered constant over the spectrum. The contribution of internally Fresnel reflected light is small in many cases [22], and is therefore ignored in the model.
3.1.3 Special Cases
Thus far, we have achieved a photometric model for spectral color formation, which
is applicable for both reflecting and transmitting materials, and valid under a wide
variety of circumstances and materials. The following special cases can be derived.
For matte, dull surfaces, the Fresnel coefficient can be considered negligible, ρf(~x) ≈ 0, for which ER (eq. 3.5) reduces to the Lambertian model for diffuse body reflection,

ER(λ, ~x) = e(λ, ~x) R∞(λ, ~x)    (3.9)
as expected.
By introducing cb(λ) = e(λ)R∞(λ), ci(λ) = e(λ), mb(~n, ~s, ~v) = (1 − ρf(~n, ~s, ~v))² and mi(~n, ~s, ~v) = ρf(~n, ~s, ~v), (eq. 3.4) may be reformulated as

ER(λ) = mb(~n, ~s, ~v) cb(λ) + mi(~n, ~s, ~v) ci(λ)    (3.10)
which corresponds to the dichromatic reflection model proposed by Shafer [20].
For light transmission, when the scattering coefficient is low compared to the absorption coefficient, s(λ) ≪ k(λ), ET (eq. 3.8) reduces to Bouguer's or Lambert-Beer's law for absorption [22],

ET(λ, ~x) = e(λ, ~x) (1 − ρf(~x)) (1 − ρb(~x)) exp(−k(λ, ~x)d(~x)c(~x))    (3.11)
as expected.
Further, a unified model for both reflection and transmission of light is obtained
when considering Lambertian reflection and a uniform illumination for both cases.
For matte, dull surfaces, and a uniform illumination affected by shading, ER (eq. 3.5) reduces to a multiplicative (Lambertian) model for body reflection,

ER(λ, ~x) = e(λ) i(~x) R∞(λ, ~x)    (3.12)
where e(λ) is the colored but spatially uniform illumination and i(~x) denotes the intensity distribution due to the surface geometry. Similarly, for a uniformly illuminated transparent material, with intensity affected by shading and Fresnel reflectance, ET (eq. 3.8) may be rewritten as

ET(λ, ~x) = e(λ) i(~x) C(λ, ~x)    (3.13)
where e(λ) is the uniform illumination, i(~x) denotes the intensity distribution, including Fresnel reflectance at front and back surface, and C(λ, ~x) represents the total extinction coefficient, that is the total absorption and scattering coefficient, within the transparent layer. A general model for spectral image formation useful in both reflectance and transmission of light may now be written as a multiplicative model,

E(λ, ~x) = e(λ) i(~x) m(λ, ~x)    (3.14)
where m(λ, ~x) denotes the material transmittance or reflectance function. Again,
e(λ) is the colored but spatially uniform illumination and i(~x) denotes the intensity
distribution. The validity of the model may be derived from models (eq. 3.5) and (eq. 3.8). For reflectance of light, the model is valid for matte, dull surfaces, for which the Fresnel reflectance is negligible, and for isotropic light scattering within the
material. For light transmission, the model is valid for neutral interface reflection,
small angle of incidence to the surface normal, and isotropic light scattering within
the material. The model as such is used in the next sections to derive color invariant
material properties.
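As an illustration, the multiplicative model of (eq. 3.14) is straightforward to simulate. The following minimal Python/NumPy sketch synthesizes a spatio-spectral energy distribution as the product of an illumination spectrum, a geometric intensity term, and a material function; the illuminant, shading, and material shapes are invented for illustration only and are not part of the experiments reported here.

```python
import numpy as np

# Wavelength axis (nm, 5 nm apart) and a one-dimensional spatial axis.
lam = np.arange(390.0, 731.0, 5.0)        # 69 spectral samples
x = np.linspace(0.0, 1.0, 200)            # spatial positions

# e(lambda): a colored but spatially uniform illuminant (arbitrary smooth shape).
e = 1.0 + 0.5 * np.sin((lam - 390.0) / 340.0 * np.pi)

# i(x): intensity variation due to the scene geometry (shading), invented here.
i = 0.6 + 0.4 * np.cos(2.0 * np.pi * x)

# m(lambda, x): material reflectance or transmittance, with a material
# transition at x = 0.5 between a bluish and a reddish patch.
m_blue = np.exp(-((lam - 450.0) / 60.0) ** 2) + 0.05
m_red = np.exp(-((lam - 600.0) / 60.0) ** 2) + 0.05
m = np.where(x[None, :] < 0.5, m_blue[:, None], m_red[:, None])

# E(lambda, x) = e(lambda) i(x) m(lambda, x), cf. (eq. 3.14).
E = e[:, None] * i[None, :] * m
print(E.shape)                            # (69, 200)
```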
3.2 Illumination Invariant Properties of Object Reflectance or Transmittance
Any method for finding invariant color properties relies on a photometric model and
on assumptions about the physical variables involved. For example, hue is known to
be insensitive to surface orientation, illumination direction, intensity and highlights,
under a white illumination [6]. Normalized rgb is an object property for matte, dull
surfaces illuminated by white light. When the illumination color varies or is not white,
other object properties which are related to constant physical parameters should be
measured. In this section, expressions for determining material changes in images
will be derived, robust to a change in illumination color over time. Therefore, the
photometric model derived in section 3.1 is taken into account.
Consider the photometric reflection model (eq. 3.14) and an illumination with
locally constant color,
E(λ, ~x) = e(λ) i(~x) m(λ, ~x)    (3.15)
where e(λ) represents the illumination spectrum. The assumption allows for the
extraction of expressions describing material changes independent of the illumination.
Without loss of generality, we restrict ourselves to the one dimensional case; two
dimensional expressions may be derived according to Chapter 4. Differentiation of
(eq. 3.15) with respect to λ results in
∂E/∂λ = i(x) m(λ, x) ∂e/∂λ + i(x) e(λ) ∂m/∂λ .    (3.16)
Dividing (eq. 3.16) by (eq. 3.15) gives the relative differential,
(1/E(λ, x)) ∂E/∂λ = (1/e(λ)) ∂e/∂λ + (1/m(λ, x)) ∂m/∂λ .    (3.17)
The result consists of two terms, the former depending on the illumination color and
the latter depending on material properties. Since the illumination color is constant
with respect to x, differentiation to x yields a material property only,
∂/∂x { (1/E(λ, x)) ∂E/∂λ } = ∂/∂x { (1/m(λ, x)) ∂m/∂λ } .    (3.18)
Within the Kubelka-Munk model, assuming matte, dull surfaces or transparent
layers, and assuming a single light source, Nλx determines changes in object reflectance or transmittance,
Nλx = (1/E(λ, x)) ∂²E/∂λ∂x − (1/E(λ, x)²) (∂E/∂λ)(∂E/∂x)    (3.19)
which determines material changes independent of the viewpoint, surface orientation,
illumination direction, illumination intensity and illumination color. The expression
results from differentiation of (eq. 3.18).
The expression given by (eq. 3.19) is the fundamental lowest order illumination
invariant. Any spatio-spectral derivative of (eq. 3.19) inherently depends on the body
reflectance or object transmittance only. According to [17], a complete and irreducible
set of differential invariants is obtained by taking all higher order derivatives of the
fundamental invariant,
Nλxλ^m x^n = ∂^(m+n)/∂λ^m ∂x^n { (1/E(λ, x)) ∂²E/∂λ∂x − (1/E(λ, x)²) (∂E/∂λ)(∂E/∂x) }    (3.20)
for m ≥ 0, n ≥ 0.
Application of the chain rule for differentiation yields the higher order expressions in terms of the spatio-spectral energy distribution. For instance, the spectral
derivative of Nλx is given by
Nλλx = (Eλλx E² − Eλλ Ex E − 2 Eλx Eλ E + 2 Eλ² Ex) / E³    (3.21)
where E(λ, x) is written as E for simplicity and indices denote differentiation. Note
that these expressions are valid everywhere E(λ, x) > 0. These invariants may be
interpreted as the spatial derivative of the normalized spectral slope Nλ and curvature
Nλλ of the reflectance function R∞ . Expressions for higher order derivatives are
straightforward.
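A minimal numerical sketch of (eq. 3.19) and (eq. 3.21) is given below (hypothetical Python/NumPy code; np.gradient finite differences are used here as a crude stand-in for the Gaussian derivative filters of Chapter 2, and the spectra are invented). It also illustrates the claimed invariance: multiplying E by a different illuminant spectrum leaves the invariant essentially unchanged.

```python
import numpy as np

def N_invariants(E, dlam, dx):
    """Finite-difference estimates of N_lambda_x (eq. 3.19) and
    N_lambda_lambda_x (eq. 3.21) on a sampled E(lambda, x) grid."""
    El = np.gradient(E, dlam, axis=0)       # E_lambda
    Ell = np.gradient(El, dlam, axis=0)     # E_lambda_lambda
    Ex = np.gradient(E, dx, axis=1)         # E_x
    Elx = np.gradient(El, dx, axis=1)       # E_lambda_x
    Ellx = np.gradient(Ell, dx, axis=1)     # E_lambda_lambda_x
    Nlx = Elx / E - El * Ex / E ** 2
    Nllx = (Ellx * E**2 - Ell * Ex * E - 2 * Elx * El * E + 2 * El**2 * Ex) / E**3
    return Nlx, Nllx

# A toy scene: a material with spectrally drifting reflectance, shading i(x),
# and two different illuminant spectra e1, e2 (all functions invented).
lam = np.arange(390.0, 731.0, 5.0)[:, None]
x = np.linspace(0.0, 1.0, 200)[None, :]
m = np.exp(-((lam - 450.0 - 150.0 * x) / 80.0) ** 2) + 0.1
i = 0.5 + 0.4 * np.cos(2.0 * np.pi * x)
e1 = 1.0 + 0.5 * np.sin(lam / 200.0)
e2 = 0.6 + 0.4 * np.cos(lam / 150.0)

dx = 1.0 / 199.0
N1, _ = N_invariants(e1 * i * m, 5.0, dx)
N2, _ = N_invariants(e2 * i * m, 5.0, dx)
print(np.max(np.abs(N1 - N2)))   # small: N_lambda_x is insensitive to e(lambda) and i(x)
```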
A special case of (eq. 3.20) is for Lambert-Beer absorption (eq. 3.11) and slices of
locally constant thickness. Under these circumstances, ratios of invariants from set
N,
N′ = Nm,n / Np,q    (3.22)
for m, p ≥ 1 and n, q ≥ 0, are independent of the slice thickness. The property is
proven by considering differentiation with respect to λ of (eq. 3.11), and division by (eq. 3.11), which results in

(1/ET(λ, x)) ∂ET/∂λ = (1/e(λ)) ∂e/∂λ − d c(x) ∂k/∂λ .    (3.23)
Differentiation of the expression with respect to x yields

∂/∂x { (1/ET(λ, x)) ∂ET/∂λ } = −d c(x) ∂²k/∂λ∂x − d (∂k/∂λ)(∂c/∂x) .    (3.24)
By taking ratios of higher order derivatives, the constant thickness d is eliminated.
Summarizing, we have derived a complete set of color constant expressions determining object reflectance or transmittance. The expressions are invariant for a
change of illumination over time. The major assumption underlying the proposed
invariants is a single colored illumination, effectuating a spatially constant illumination spectrum. For an illumination color varying slowly over the scene with respect
to the spatial variation of the object reflectance or transmittance, simultaneous color
constancy is achieved by the proposed invariant.
We have proven that spatial differentiation is necessary to achieve color constancy
when pre-knowledge about the illuminant is not available. Hence, any color constant system should perform both spectral as well as spatial comparison in order
to be invariant against illumination changes, which confirms the theory of relational
color constancy as proposed in [4]. Accurate estimates of spatio-spectral differential quotients can be obtained by applying the Gaussian color model as described in
Chapter 2.
3.3 Experiments

3.3.1 Overview
The transmissions of 168 patches from a calibration grid (IT8.7/1, Agfa, Mortsel,
Belgium) were measured (Spectrascan PR-713PC, Photo Research, Chatsworth, CA)
from 390 nm to 730 nm, resampled at 5 nm intervals. The patches include achromatic
colors, skin-like tints and full colors (fig. 3.3). Each patch i will be represented by its
spectral transmission m̂i .
For the case of daylight, incandescent and halogen light, the emission spectra
are known to be a one parameter function of color temperature. For these important
classes of illuminants, the spectral energy distributions ek(λ) were calculated according
to the CIE method as described in [22]. Daylight illuminants were calculated in
the range of 4,000K up to 10,000K color temperature in steps of 500K. The 4,000K
and 10,000K illuminants represent extremes of daylight, whereas 6,500K represents
average daylight. Emission spectra of halogen and incandescent lamps are equivalent
33
3.3. Experiments
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Figure 3.3: The CIE 1964 chromaticity diagram of the colors in the calibration grid used for the
experiments, illuminated by average daylight D65
to blackbody radiators, generated from 2,000K up to 5,000K according to [22, Section
1.2.2].
For the case of fluorescent light, illuminants F1–F12 are used, as given by [7].
These are 12 representative spectral power distributions for fluorescent lamps.
Considering (eq. 3.14), the spectrum ski (λ) transmitted by a planar patch i under
illuminant k is given by
ski(λ) = ek(λ) mi(λ)    (3.25)
where mi (λ) is the spectral transmittance and ek (λ) the illumination spectrum.
Color values are calculated by the weighted summation over the transmitted spectrum ski at 5 nm intervals. For the CIE 1964 XYZ sensitivities, the XYZ value is
obtained by [22, Section 3.3.8]
X = (1/k10) Σλ x̄10(λ) ek(λ) mi(λ)
Y = (1/k10) Σλ ȳ10(λ) ek(λ) mi(λ)
Z = (1/k10) Σλ z̄10(λ) ek(λ) mi(λ)    (3.26)
where k10 is a constant to normalize Yw = 100, Yw being the intensity of the light
source. Similarly, for the Gaussian color model (see Chapter 2) we have
E = ∆λ Σλ G(λ; λ0, σλ) ek(λ) mi(λ)
Eλ = σλ ∆λ Σλ Gλ(λ; λ0, σλ) ek(λ) mi(λ)
Eλλ = σλ² ∆λ Σλ Gλλ(λ; λ0, σλ) ek(λ) mi(λ)    (3.27)
where ∆λ = 5 nm. Further, σλ = 55 nm and λ0 = 520 nm to be colorimetric with
human vision (see Chapter 2).
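The three measurements of (eq. 3.27) amount to Gaussian-weighted sums over the sampled spectrum. A minimal sketch follows (hypothetical Python/NumPy code; the illuminant and transmittance spectra are invented example data, not the measured calibration patches).

```python
import numpy as np

def gaussian_color_model(lam, spectrum, lam0=520.0, sigma=55.0, dlam=5.0):
    """E, E_lambda, E_lambda_lambda of a sampled spectrum, following (eq. 3.27):
    weighted sums with a Gaussian aperture and its first two derivatives."""
    u = (lam - lam0) / sigma
    G = np.exp(-0.5 * u ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    Gl = -u / sigma * G                     # dG/dlambda
    Gll = (u ** 2 - 1.0) / sigma ** 2 * G   # d^2G/dlambda^2
    E = dlam * np.sum(G * spectrum)
    El = sigma * dlam * np.sum(Gl * spectrum)
    Ell = sigma ** 2 * dlam * np.sum(Gll * spectrum)
    return E, El, Ell

lam = np.arange(390.0, 731.0, 5.0)
ek = 1.0 + 0.3 * np.sin(lam / 120.0)          # hypothetical illuminant e^k(lambda)
mi = np.exp(-((lam - 560.0) / 70.0) ** 2)     # hypothetical transmittance m_i(lambda)
print(gaussian_color_model(lam, ek * mi))
```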
Color constancy is examined by evaluating edge strength under different simulated illumination conditions. Borders are formed by combining all patches with one
another, yielding 14,028 different color combinations. A ground truth is obtained by
taking a perfect white light illuminant. The reference boils down to an equal energy spectrum. The ground truth represents the patch transmission function up to
multiplication by a constant a,
srefi(λ) = a mi(λ) .    (3.28)
The difference in edge strength for two patches illuminated by the test illuminant
and the reference illuminant indicates the error in color constancy. We define the
color constancy ratio as
εk = 1 − | dk(i, j) − dref(i, j) | / dref(i, j)    (3.29)
where dk is the color difference between two patches i, j under the test illuminant k,
and dref is the difference between the same two patches under the reference illuminant,
that is equal energy illumination. The color constancy ratio εk measures the deviation
in edge strength between two patches i, j due to illuminant k relative to the edge
strength under the reference illuminant.
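In code, the ratio of (eq. 3.29) is a one-liner; the sketch below (with invented example values) makes the convention explicit.

```python
def constancy_ratio(d_test, d_ref):
    """Color constancy ratio epsilon^k of (eq. 3.29): one minus the relative
    deviation in edge strength between test and reference illuminant."""
    return 1.0 - abs(d_test - d_ref) / d_ref

# A border whose edge strength changes from 0.100 (equal energy reference)
# to 0.095 under the test illuminant yields 95% constancy.
print(constancy_ratio(0.095, 0.100))   # 0.95
```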
Three experiments are performed. One experiment evaluates the performance of
the proposed invariant (eq. 3.19) under ideal conditions. That is, it evaluates Nλx
at scale σλ = 5 nm, for multiple small-band measurements with ∆λ = 5 nm covering
the visual spectrum. A second experiment assesses the influence of broad-band filters
on color constancy. The first experiment is repeated but now for σλ = 55 nm filters,
again with ∆λ = 5 nm covering the visual spectrum. The final experiment evaluates
color constancy for a colorimetric system detecting color differences. Three broad-band measures (σλ = 55 nm) are taken at λ0 = 515 nm. The proposed invariant is
evaluated against the performance for color constancy of [21] and uv color space [22].
Table 3.1: Results for the small-band experiment for invariant Nλx with σλ = 5 nm and 69 spectral samples, ∆λ = 5 nm apart. Average percentage constancy ε̄ over 14,028 color edges is given, together with standard deviation σ.

        Daylight                 Blackbody                Fluorescent
        T [K]   ε̄ [%]   (σ)     T [K]   ε̄ [%]   (σ)     Ill.   ε̄ [%]   (σ)
        4000    99.9    (0.2)   2000    99.9    (0.1)   F1     99.4    (0.5)
        4500    99.9    (0.2)   2500    99.9    (0.1)   F2     99.1    (0.8)
        5000    99.9    (0.2)   3000    99.9    (0.0)   F3     98.7    (1.1)
        5500    99.9    (0.2)   3500    99.9    (0.0)   F4     98.2    (1.6)
        6000    99.9    (0.2)   4000    100.0   (0.0)   F5     99.5    (0.5)
        6500    99.9    (0.2)   4500    100.0   (0.0)   F6     99.0    (0.9)
        7000    99.9    (0.1)   5000    100.0   (0.0)   F7     99.5    (0.5)
        7500    99.9    (0.1)                           F8     99.1    (0.7)
        8000    99.9    (0.1)                           F9     98.8    (1.0)
        8500    99.9    (0.1)                           F10    95.6    (1.7)
        9000    99.9    (0.1)                           F11    94.4    (2.0)
        9500    99.9    (0.1)                           F12    93.2    (2.4)
        10000   99.9    (0.1)

3.3.2 Small-Band Experiment
For each patch transmission, 69 Gaussian weighted samples were taken every 5 nm
with σλ = 5 nm. Invariant Nλx was calculated between each combination of two
patches for each central wavelength λ0 of the filters. For the experiment, color difference is defined by

ds = √( Σλc Nλxλc(i, j)² )    (3.30)

where λc denotes the central wavelength of the cth filter (σλ = 5 nm), and Nλxλc(i, j) the edge strength (eq. 3.19) between patch i and j for filter c. Color constancy is determined by (eq. 3.29), using ds as measure for color difference.
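The summation of (eq. 3.30) can be sketched as follows (hypothetical Python/NumPy code). For an ideal step edge between two patches, the spatial derivative in Nλx reduces to the jump of the normalized spectral slope Eλ/E across the border; that simplification, an assumption of this sketch, is used here in place of a full spatial measurement.

```python
import numpy as np

def d_s(E_i, El_i, E_j, El_j):
    """Color difference d_s of (eq. 3.30) between patches i and j. The inputs
    are per-filter measurements E and E_lambda, one entry per central
    wavelength lambda_c (e.g. from gaussian_color_model above at 69 centers).
    The per-filter edge strength is approximated by the jump of E_lambda/E
    across an ideal step edge (a simplification, not (eq. 3.19) itself)."""
    E_i, El_i = np.asarray(E_i, float), np.asarray(El_i, float)
    E_j, El_j = np.asarray(E_j, float), np.asarray(El_j, float)
    edge = El_j / E_j - El_i / E_i
    return np.sqrt(np.sum(edge ** 2))
```

Feeding d_s under a test and under the reference illuminant into the constancy ratio above yields, under the stated simplification, the kind of quantity tabulated in tab. 3.1.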
The results for the experiment are shown in tab. 3.1. Average constancy for daylight and blackbody is 99.9 ± 0.2%, which amounts to perfect color constancy. For the fluorescent illuminants, average constancy is 97.9 ± 1.3%, almost perfect color constancy. The small error is caused by the spectral spikes in the fluorescent emission spectra, smoothed to the filter size of σλ = 5 nm.
The experiment demonstrates that perfect illumination invariance can be achieved
by using the proposed invariants and a spectrophotometer.
Table 3.2: Results for the broad-band experiment for invariant Nλx with σλ = 55 nm and 69 spectral samples, ∆λ = 5 nm apart. Average percentage constancy ε̄ and standard deviation σ are given over the 14,028 different edges.

        Daylight                 Blackbody                Fluorescent
        T [K]   ε̄ [%]   (σ)     T [K]   ε̄ [%]   (σ)     Ill.   ε̄ [%]   (σ)
        4000    97.3    (2.3)   2000    93.2    (5.3)   F1     89.6    (9.2)
        4500    99.8    (1.9)   2500    95.8    (3.3)   F2     86.1    (11.3)
        5000    99.2    (1.6)   3000    97.1    (2.2)   F3     84.0    (12.0)
        5500    99.5    (1.5)   3500    97.9    (1.5)   F4     82.2    (12.3)
        6000    99.7    (1.3)   4000    98.4    (1.1)   F5     89.1    (9.3)
        6500    99.8    (1.3)   4500    98.7    (0.8)   F6     85.1    (11.7)
        7000    99.9    (1.2)   5000    98.9    (0.7)   F7     94.7    (7.1)
        7500    99.0    (1.2)                           F8     95.4    (6.8)
        8000    99.1    (1.2)                           F9     94.0    (7.6)
        8500    99.1    (1.2)                           F10    88.0    (9.8)
        9000    99.1    (1.2)                           F11    87.5    (9.8)
        9500    99.1    (1.2)                           F12    86.8    (9.8)
        10000   99.1    (1.2)

3.3.3 Broad-Band Experiment
The experiment investigates the influence of broad-band filters by repeating the previous experiment but now for σλ = 55 nm. Hence, 69 largely overlapping Gaussian
weighted samples of the transmission spectrum are obtained.
The results show (tab. 3.2) a constancy for daylight of 98.7 ± 1.5%. For blackbody radiators, a constancy of 97.1 ± 2.6% is achieved. These numbers are close to the results obtained for small-band filters. For fluorescent illuminants the error increases to 15% (average constancy 88.5 ± 9.9%) when using broad-band filters. Hence, approximation of derivatives with broad-band filters is valid under daylight and blackbody illumination.
3.3.4 Colorimetric Experiment
For the colorimetric experiment, Gaussian weighted samples are taken at λ0 = 520 nm and σλ = 55 nm. Color difference is defined by

dN = √( Nλx(i, j)² + Nλλx(i, j)² )    (3.31)

where Nλx(i, j) (eq. 3.19) and Nλλx(i, j) (eq. 3.21) measure total chromatic edge strength between patch i and j. Color constancy is determined by (eq. 3.29), using dN as measure for color difference.
For comparison, the experiment is repeated with the CIE XYZ 1964 sensitivities for observation. Color difference is defined by the Euclidean distance in the CIE 1976 u′v′ color space [22, Section 3.3.9],

duv = √( (u′i − u′j)² + (v′i − v′j)² )    (3.32)

where i, j represent the different patches. Color constancy is determined by (eq. 3.29), using duv as measure for color difference. Note that for the u′v′ color space no information about the light source is included. Further, u′v′ space is similar to uv space up to a transformation of the achromatic point. The additive transformation of the white point makes uv space a color constant space. Differences in u′v′ are equal to differences in uv space. Hence, duv is an illumination invariant measure of color difference.
As a well known reference, the von Kries transform for chromatic adaptation [21] is evaluated in a similar experiment. The von Kries method is based on Lambertian reflection, assuming that the (known) sensor responses to the illuminant may be used to eliminate the illuminant from the measurement. For the experiment, von Kries adaptation is applied to the measured color values, and the result is transformed to the equal energy illuminant [7]. Thereafter, color difference between patches i and j taken under the test illuminant is calculated according to (eq. 3.32). Comparison to the color difference between the same two patches under the reference illuminant is obtained by (eq. 3.29), using the von Kries transformed u′v′ distance as measure for color distance.
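The von Kries transform itself is a diagonal scaling of the sensor responses. A minimal sketch follows (hypothetical Python/NumPy code; the scaling is applied directly to XYZ channels here, whereas a cone-like sensor basis is commonly used in practice, and the illuminant values are invented for illustration).

```python
import numpy as np

def von_kries_adapt(xyz, illum_test, illum_ref):
    """Sketch of von Kries chromatic adaptation [21]: divide each channel by
    the (known) response to the test illuminant, then rescale to the
    reference illuminant."""
    gains = np.asarray(illum_ref, dtype=float) / np.asarray(illum_test, dtype=float)
    return np.asarray(xyz, dtype=float) * gains

# A patch measured under a reddish test illuminant, mapped onto the equal
# energy reference illuminant.
print(von_kries_adapt([30.0, 25.0, 15.0],
                      illum_test=[110.0, 100.0, 60.0],
                      illum_ref=[100.0, 100.0, 100.0]))
```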
Results for the color constancy measurements are given for daylight illumination
(tab. 3.3), blackbody radiators (tab. 3.4), and fluorescent light (tab. 3.5).
Average constancy over the different phases of daylight is 91.8 ± 6.1% for the proposed invariant. Difference in u′v′ color space performs similarly with an average of 91.9 ± 6.3%. The von Kries transform is 5% more color constant, 96.0 ± 3.3%. As expected, the von Kries transform has a better performance given that the color of the illuminant is taken into account.
For blackbody radiators, the proposed invariant is on average 88.9 ± 12.5% color
constant. The proposed invariant is more color constant than u′v′ differences, average 82.4 ± 15.1%. Again, the von Kries transform is even better with an average of 93.4 ± 6.8%.
For these types of illuminants, often running at a low color temperature, variation
due to illumination color is drastically reduced by the proposed method.
The proposed method is less color constant than von Kries adaptation, which requires knowledge of the color of the light source. In comparison to u′v′ color differences, the proposed invariant offers better performance for low color temperature illuminants.
Color constancy for fluorescent illuminants is on average 85.0 ± 11.8% for the proposed invariant, 84.7 ± 10.5% for u′v′ difference, and 89.4 ± 8.8% for the von Kries transform. As already pointed out for tab. 3.2, the large integration filters are not capable of offering color constancy for the class of fluorescent illuminants. The use of broad-band filters limits the applicability to smooth spectra, for which the Gaussian weighted differential quotients as derived in Chapter 2 are accurate estimations. For outdoor scenes, halogen illumination and incandescent light, the illumination spectra may be considered smooth, as shown by the experimental results of tab. 3.2 versus tab. 3.1.

Table 3.3: Results for the different colorimetric experiments with daylight illumination, ranging from 4,000K to 10,000K color temperature. Average percentage constancy ε̄ and standard deviation σ for the proposed invariant N, the von Kries transform, and u′v′ difference.

        N                  von Kries          u′v′
T [K]   ε̄ [%]   (σ)       ε̄ [%]   (σ)       ε̄ [%]   (σ)
4000    92.2    (5.6)     96.1    (3.2)     86.9    (10.0)
4500    94.5    (4.2)     97.9    (1.8)     91.1    (7.1)
5000    94.9    (2.8)     99.2    (0.7)     94.5    (4.6)
5500    94.1    (1.8)     98.9    (1.0)     96.6    (2.0)
6000    93.2    (2.7)     97.9    (1.7)     97.6    (1.8)
6500    92.5    (4.0)     96.9    (2.4)     96.1    (2.7)
7000    91.8    (5.2)     96.1    (2.9)     94.3    (3.8)
7500    91.2    (6.2)     95.4    (3.4)     92.7    (4.9)
8000    90.6    (7.0)     94.8    (3.8)     91.2    (6.0)
8500    90.1    (7.6)     94.3    (4.2)     89.9    (6.9)
9000    89.6    (8.2)     93.8    (4.5)     88.8    (7.7)
9500    89.2    (8.7)     93.4    (4.8)     87.8    (8.4)
10000   88.8    (9.1)     93.0    (5.1)     86.9    (9.1)
3.4 Discussion
This chapter presents a physics-based background for color constancy, valid for both
light reflectance as well as light transmittance. To achieve that goal, the Kubelka-Munk theory is used as a model for color image formation. By considering spatial and
spectral derivatives of the formation model, object reflectance properties are derived
independent of the spectral energy distribution of the illuminant. Knowledge about
the spectral power distribution of the illuminant is not required for the proposed
invariant, as opposed to the well known von Kries transform for color constancy [21].
The robustness of our invariant (eq. 3.19) is assured by using the Gaussian color model, introduced in Chapter 2. The Gaussian color model is considered an adequate
approximation of the human tri-stimulus sensitivities. The Gaussian color model measures the intensity, first, and second order derivative of the spectral energy distribution, combined in a well-established spatial observation theory. Application of the Gaussian color model in color constancy ensures compatibility with colorimetry, while inherently physically sound and robust measurements are derived.

Table 3.4: Results for the different colorimetric experiments with blackbody radiators from 2,000K to 5,000K color temperature. Average percentage constancy ε̄ and standard deviation σ for the proposed invariant N, the von Kries transform, and u′v′ difference.

        N                  von Kries          u′v′
T [K]   ε̄ [%]   (σ)       ε̄ [%]   (σ)       ε̄ [%]   (σ)
2000    75.6    (24.5)    85.6    (12.4)    65.8    (24.9)
2500    82.5    (16.5)    89.0    (9.4)     72.3    (20.6)
3000    87.1    (11.2)    91.9    (6.8)     78.3    (16.3)
3500    90.8    (7.5)     94.3    (4.7)     83.7    (12.4)
4000    93.7    (4.9)     96.3    (3.0)     88.4    (8.9)
4500    96.0    (3.0)     97.9    (1.7)     92.5    (6.0)
5000    96.9    (1.7)     99.1    (0.7)     95.9    (3.4)

Table 3.5: Results for the colorimetric experiments with representative fluorescent illuminants. Average percentage constancy ε̄ and standard deviation σ for the proposed invariant N, the von Kries transform, and u′v′ difference.

        N                  von Kries          u′v′
Ill.    ε̄ [%]   (σ)       ε̄ [%]   (σ)       ε̄ [%]   (σ)
F1      82.4    (14.4)    89.4    (7.9)     88.6    (7.9)
F2      82.7    (12.4)    87.8    (7.9)     82.9    (7.5)
F3      79.9    (13.5)    85.5    (9.8)     76.4    (11.4)
F4      77.2    (15.4)    83.6    (11.6)    71.4    (14.9)
F5      81.1    (15.4)    88.1    (8.6)     87.4    (8.9)
F6      80.6    (13.7)    85.9    (8.9)     79.7    (8.8)
F7      90.2    (7.8)     95.2    (3.7)     93.7    (3.9)
F8      93.6    (3.1)     97.8    (1.6)     94.6    (4.4)
F9      93.3    (4.4)     95.3    (3.6)     90.1    (7.8)
F10     87.2    (9.1)     91.1    (8.8)     91.7    (9.2)
F11     87.1    (10.1)    88.3    (11.2)    85.5    (12.6)
F12     84.9    (13.6)    85.0    (13.8)    74.9    (18.7)
From a different perspective, color constancy was considered in [1, 14]. The background is experimental colorimetry, where subjects are asked to match the reference
and test illumination condition. As a consequence their experiments do not include
shadow and shading. The result of their approach shows approximate color constancy under natural illuminants. However, their approach is unable to cope with
color constancy of three dimensional scenes, where shadow plays an important role.
The advantage of our physical approach over an empirical colorimetric approach, is
that invariant properties are deduced from the image formation model. Our proposed invariant (eq. 3.19) is designed to be insensitive to intensity changes due to the scene geometry.
The proposed invariant (eq. 3.19) is evaluated by experiments on spectral data of
168 transparent patches, illuminated by daylight, blackbody, and fluorescent illuminants. Average constancy is 90 ± 5% for daylight, 90 ± 10% for blackbody radiators,
and 85 ± 10% for fluorescent illuminants. The performance of the proposed method
is slightly less than that of the von Kries transform. Average constancy for von Kries
on the 168 patches is 95 ± 3% for daylight, 95 ± 5% for blackbody radiators, and
90 ± 10% for fluorescent illuminants. This is explained from the fact that the von
Kries transform requires explicit knowledge of material and illuminant, and even than
the difference is small. There are many circumstances where such a knowledge of material and illuminant is missing, especially in image retrieval from large databases,
or when calibration is not practically feasible as is frequently the case in light microscopy. The proposed method requires knowledge about the material only, hence is
applicable under a larger set of imaging circumstances.
As an alternative for color constancy under an unknown illuminant, one could use Luv color space differences [22] instead of the proposed method. We have evaluated color constancy for both methods. The proposed invariant offers similar performance to u′v′ color differences. This is remarkable, given the different background against which the methods are derived. Whereas u′v′ is derived from colorimetric experiments, hence from human perception, the proposed invariant N is derived from measurement theory, the physics of observation, and from physical reflection models. Apparently, it is the physical cause of color, and the environmental variation in physical parameters, to which the human visual system adapts.
As pointed out in [14], mechanisms responding to cone-specific contrast offer a better correspondence with human vision than a system that estimates illuminant and reflectance spectra. The research presented here raises the question whether the illuminant is estimated at all in pre-attentive vision. The physical model presented demands spatial comparison in order to achieve color constancy, thereby confirming relational color constancy as a first step in color constant vision [4, 16]. Hence, low-level mechanisms such as color constant edge detection reported here may play a role in front-end vision.
Bibliography
[1] D. H. Brainard. Color constancy in the nearly natural image: 2. Achromatic loci.
J. Opt. Soc. Am. A, 15:307–325, 1998.
[2] M. D’Zmura and P. Lennie. Mechanisms of color constancy. J. Opt. Soc. Am.
A, 3(10):1662–1672, 1986.
[3] G. D. Finlayson. Color in perspective. IEEE Trans. Pattern Anal. Machine
Intell., 18(10):1034–1038, 1996.
[4] D. H. Foster and S. M. C. Nascimento. Relational colour constancy from invariant
cone-excitation ratios. Proc. R. Soc. London B, 257:115–121, 1994.
[5] B. V. Funt and G. D. Finlayson. Color constant color indexing. IEEE Trans.
Pattern Anal. Machine Intell., 17(5):522–529, 1995.
[6] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec.,
32:453–464, 1999.
[7] R. W. G. Hunt. Measuring Colour. Ellis Horwood Limited, Hertfordshire, England, 1995.
[8] D. B. Judd and G. Wyszecki. Color in Business, Science, and Industry. Wiley,
New York, NY, 1975.
[9] J. J. Koenderink and A. Kappers. Color Space. Utrecht University, The Netherlands, 1998.
[10] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern.,
63:291–297, 1990.
[11] P. Kubelka. New contribution to the optics of intensely light-scattering materials.
part I. J. Opt. Soc. Am., 38(5):448–457, 1948.
[12] P. Kubelka and F. Munk. Ein Beitrag zur Optik der Farbanstriche. Z. Techn. Physik, 12:593, 1931.
[13] E. H. Land. The retinex theory of color vision. Sci. Am., 237:108–128, 1977.
[14] M. P. Lucassen and J. Walraven. Color constancy under natural and artificial
illumination. Vision Res., 37:2699–2711, 1996.
[15] L. T. Maloney and B. A. Wandell. Color constancy: a method for recovering
surface spectral reflectance. J. Opt. Soc. Am. A, 3:29–33, 1986.
[16] S. M. C. Nascimento and D. H. Foster. Relational color constancy in achromatic
and isoluminant images. J. Opt. Soc. Am. A, 17(2):225–231, 2000.
42
Chapter 3. A Physical Basis for Color Constancy
[17] P. Olver, G. Sapiro, and A. Tannenbaum. Differential invariant signatures
and flows in computer vision: A symmetry group approach. In B. M. ter
Haar Romeny, editor, Geometry-Driven Diffusion in Computer Vision. Kluwer
Academic Publishers, Boston, 1994.
[18] M. Pluta. Advanced Light Microscopy, volume 1. Elsevier, Amsterdam, 1988.
[19] G. Sapiro. Color and illuminant voting. IEEE Trans. Pattern Anal. Machine Intell., 21(11):1210–1215, 1999.
[20] S. A. Shafer. Using color to separate reflection components. Color Res. Appl.,
10(4):210–218, 1985.
[21] J. von Kries. Influence of adaptation on the effects produced by luminous stimuli. In D. L. MacAdam, editor, Sources of Color Vision. MIT Press, Cambridge, MA, 1970.
[22] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.
Chapter 4

Measurement of Color Invariants
submitted∗ to IEEE Transactions on Pattern Analysis and Machine Intelligence.
“Attaching significance to invariants is an effort to recognize what, because of
its form or colour or meaning or otherwise, is important or significant in what is
only trivial or ephemeral.”
– H. W. Turnbull.
It is well known that color is a powerful cue in the distinction and recognition of
objects. Segmentation based on color, rather than just intensity, provides a broader
class of discrimination between material boundaries. Modeling the physical process
of color image formation provides a clue to the object-specific parameters [6, 8, 19].
To reduce some of the complexity intrinsic to color images, parameters with known
invariance are of prime importance. Current methods for the measurement of color
invariance require a fully sampled spectrum as input data, usually derived by a spectrometer. Angelopoulou et al. [1] use the spectral gradient to estimate surface reflectance from multiple images of the same scene, captured with different spectral
narrow band filters. The assumptions underlying their approach require a smoothly
varying illumination. Their method is able to accurately estimate surface reflectance
independent of the scene geometry. Stokman and Gevers [20] propose a method for
edge classification from spectral images. Their method aims at detecting edges and assigning to each one of the types: shadow or geometry, highlight, or material edge. Under
the assumption of spectral narrow band filters, and for a known illumination spectrum, they prove their method to be accurate in edge classification. These approaches hamper broad use, as spectrometers are both slow and expensive. In addition they do not provide two-dimensional spatial resolution easily. In this chapter we aim at a broad range of color invariants measured from RGB-cameras.

∗ Part of this work has appeared in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2000, vol. 1, pp. 50–57.
To that end, differential geometry is adopted as the framework for feature detection and segmentation of images. Its impact in computer vision is overwhelming but mostly limited to grey-value images [5, 15, 21]. Embedding the theory in the scale-space paradigm [11, 13] resulted in well-posed differential operators robust against noisy measurements, with the Gaussian aperture as the fundamental operator. Only a few papers are available on color differential geometry [7, 18], which are mainly based on the color gradient proposed by Di Zenzo [23]. In that paper, an expression for the color gradient is derived by analysis of the eigensystem of the color structure tensor. In [2], curvature and zero-crossing detection is investigated for the directional derivative of the color gradient. For these geometrical invariants no physical model is taken into account, yielding measurements which are highly influenced by the specific imaging circumstances such as shadow, illumination, and viewpoint. We consider the introduction of wavelength in the scale-space paradigm, as suggested by Koenderink [12]. This leads to a spatio-spectral family of Gaussian aperture functions, introduced in Chapter 2 as the Gaussian color model. Hence, the Gaussian color model may be considered an extension of the differential geometry framework into the spatio-spectral domain. We apply the spatio-spectral scale-space to the measurement of photometric and geometric invariants.
In Chapter 3, the authors discuss the use of the Shafer model [19], effectively based
on the older Kubelka-Munk theory [14], to measure object reflectance independent of
illumination color. The Kubelka-Munk theory models the reflected spectrum of a colored body [10, 22], based on a material-dependent scattering and absorption function,
under the assumption that light is isotropically scattered within the material. The
theory has proven to be successful for a wide variety of materials and applications [10].
Therefore, the Kubelka-Munk theory is well suited for determining material properties from color measurements. We use the Kubelka-Munk theory for the definition
of object reflectance properties, for a wide range of assumptions regarding imaging
conditions.
The measurement of invariance involves a balance between constancy of the measurement regardless of the disturbing influence of the unwanted transform on the one hand, and retained discriminating power between truly different states of the objects on the other. As a general rule, for features allowing ignorance of a larger set of disturbing factors, less discriminative power can be expected. We refer to such features as broad features. Hence, both invariance and discriminating power of a method should be investigated simultaneously. Only this allows one to assess the practical performance of the proposed method. In this chapter we extensively investigate invariant properties and discriminative power.
The chapter is organized as follows. Section 4.1 describes a physical model for image formation, based on the Kubelka-Munk theory. The first contribution of this chapter is a complete set of invariant expressions derived for basically three different imaging conditions (section 4.2). A second important contribution considers the robust measurement of invariant expressions from RGB-images (section 4.3). Further, section 4.3 demonstrates the performance of the features in terms of invariance and discriminative power between different colored patches, which may be considered a third contribution.
4.1 Color Image Formation Model
In Chapter 3, image formation is modeled by means of the Kubelka-Munk theory
[10, 22] for colorant layers. Under the assumption that light within the material
is isotropically scattered, the material layer may be characterized by a wavelength
dependent scatter coefficient and absorption coefficient. The model unites both reflectance of light and transparent materials. The class of materials for which the
theory is useful ranges from dyed paper and textiles, opaque plastics, paint films, up
to enamel and dental silicate cements [10].
In the sequel we will derive color invariant expressions under various imaging
conditions. Therefore, an image formation model adequate for reflectance of light in
real-world scenes is considered. We consider the Kubelka-Munk theory as a general
model for color image formation. The photometric reflectance model resulting from
the Kubelka-Munk theory is given by (see Chapter 3)
E(λ, ~x) = e(λ, ~x) (1 − ρf(~x))² R∞(λ, ~x) + e(λ, ~x) ρf(~x)    (4.1)
where ~x denotes the position at the imaging plane and λ the wavelength. Further,
e(λ, ~x) denotes the illumination spectrum and ρf (~x) the Fresnel reflectance at ~x. The
material reflectivity is denoted by R∞ (λ, ~x). The reflected spectrum in the viewing
direction is given by E(λ, ~x). When redefining symbols cb(λ, ~x) = e(λ, ~x)R∞(λ, ~x), ci(λ, ~x) = e(λ, ~x), mb(~x) = (1 − ρf(~x))² and mi(~x) = ρf(~x), (eq. 4.1) reduces to

E(λ, ~x) = mb(~x) cb(λ, ~x) + mi(~x) ci(λ, ~x)    (4.2)
which is the dichromatic reflection model by Shafer [19].
Concerning the Fresnel reflectance, the photometric model assumes a neutral interface at the surface patch. As discussed in [17, 19], deviations of ρf over the visible
spectrum are small for commonly used materials, therefore the Fresnel reflectance
coefficient may be considered constant.
The following special case can be derived. For matte, dull surfaces, the Fresnel coefficient can be considered negligible, ρf(~x) ≈ 0, for which E(λ, ~x) (eq. 4.1) reduces to the Lambertian model for diffuse body reflection

E(λ, ~x) = e(λ, ~x) R∞(λ, ~x)    (4.3)

as expected.
4.2 Determination of Color Invariants
Any method for finding invariant color properties relies on a photometric model and
on assumptions about the physical variables involved. For example, hue is known to
be insensitive to surface orientation, illumination direction, intensity and highlights,
under a white illumination [8]. Normalized rgb is an object property but only for
matte, dull surfaces and only when illuminated by white light. When the illumination
color is not white, other object properties should be measured.
In this section, expressions for determining invariant properties in color images
will be derived for three different imaging conditions, taking into account the photometric model derived in section 4.1. The imaging conditions are assumed to be the 5
relevant out of 8 combinations of: a. white or colored illumination, b. matte, dull
object or general object, or c. uniformly stained object or generally colored object.
Further specialization as uniform illumination or a single illumination spectrum may
be considered. Note that each essentially different condition of the scene, object or recording circumstances results in a different suited set of invariant expressions. For notational convenience, we first concentrate on the one dimensional case; two dimensional
expressions will be derived later when introducing geometrical invariants.
4.2.1 Invariants for White but Uneven Illumination
Consider the photometric reflection model (eq. 4.1). For white illumination, the
spectral components of the source are approximately constant over the wavelengths.
Hence, a spatial component i(x) denotes intensity variations, resulting in
E(λ, x) = i(x) { ρf(x) + (1 − ρf(x))² R∞(λ, x) } .    (4.4)
The assumption allows the extraction of expressions describing object reflectance
independent of the Fresnel reflectance. Let indices of λ and x indicate differentiation; from now on we drop (λ, x) from E(λ, x) when this causes no confusion.
Lemma 6 Within the Kubelka-Munk model, assuming dichromatic reflection and
white illumination,
H = Eλ / Eλλ
is an object reflectance property independent of viewpoint, surface orientation, illumination direction, illumination intensity and Fresnel reflectance coefficient.
Proof: Differentiating (eq. 4.4) with respect to λ twice results in
Eλ = i(x) (1 − ρf(x))² ∂R∞(λ, x)/∂λ

and

Eλλ = i(x) (1 − ρf(x))² ∂²R∞(λ, x)/∂λ² .

Hence, their ratio depends on derivatives of the object reflectance function R∞(λ, x) only, which proves the lemma. □
To interpret H, consider the local Taylor expansion at λ0 truncated at second
order,
E(λ0 + ∆λ) ≈ E(λ0) + ∆λ Eλ(λ0) + ½ ∆λ² Eλλ(λ0) .    (4.5)
The extremum of E(λ0 + ∆λ) is at the ∆λ for which the first order derivative is zero,

d/dλ {E(λ0 + ∆λ)} = Eλ(λ0) + ∆λ Eλλ(λ0) = 0 .    (4.6)
Hence, for ∆λ near the origin λ0,

∆λmax = − Eλ(λ0) / Eλλ(λ0) .    (4.7)
In conclusion, the property H is related to the hue (i.e. arctan(∆λmax)) of the material. For Eλλ(λ0) < 0 the result is at a maximum and describes a Newtonian (prism) color, whereas for Eλλ(λ0) > 0 the result is at a minimum and indicates a non-Newtonian (slit) color.
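A small sketch of Lemma 6 follows (hypothetical Python code; the use of arctan2 is a choice of this sketch to keep the angle well defined when Eλλ vanishes, in line with the stability remark following Proposition 7 below).

```python
import numpy as np

def hue_like(El, Ell):
    """The invariant H = E_lambda / E_lambda_lambda of Lemma 6, returned as
    the angle arctan(H); El and Ell are first and second order spectral
    derivative measurements at one image position."""
    return np.arctan2(El, Ell)

# E_lambda_lambda < 0: the Taylor extremum (eq. 4.7) is a maximum, a Newtonian
# (prism) color; E_lambda_lambda > 0: a minimum, a non-Newtonian (slit) color.
print(hue_like(0.8, -0.4))
```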
Of significant importance is the derivation of a complete set Ψ of functionally independent (irreducible) differential invariants Ψi . Completeness states that all possible
independent invariants for the unwanted distortion are present in the set Ψ. From
Olver et al. [16], the basic method for constructing a complete set of differential
invariants is to use invariant differential operators. A differential operator is said to
be invariant under a given distortion if it maps differential invariants to higher order
differential invariants. Hence, by iteration, such an operator produces a hierarchy
of differential invariants of arbitrarily large order n, given a lowest order invariant.
The lowest order invariant is referred to as the fundamental invariant. Summarizing,
for a lowest order color invariant, a differential operator may be defined to construct
complete, irreducible sets of color invariants under the same imaging conditions by
iteration.
Proposition 7 A complete and irreducible set of color invariants, up to a given
differential order, is given by all derivatives of the fundamental color invariant.
In the sequel we will define the generating differential operator given the lowest order
fundamental invariant.
The expression given by Lemma 6 is a fundamental lowest order invariant. As
a result of Proposition 7, differentiation of the expression for H with respect to x
or λ results in object reflectance properties under a white illumination. Note that
H is ill-defined when the second order spectral derivative vanishes. We prefer to compute differentials of arctan(H), a monotonic function of H, for which the spatial derivatives yield better numerical stability.
Corollary 8 Within the Kubelka-Munk model, a complete and irreducible set of invariants for dichromatic reflection and a white illumination is given by
Hλ^m x^n = ∂^(m+n)/∂λ^m ∂x^n { arctan( Eλ / Eλλ ) }    (4.8)
for m, n ≥ 0.
Application of the chain rule for differentiation yields the higher order expressions
in terms of the spatio-spectral energy distribution. For illustration, we give all expressions for first spatial derivative and second spectral order. The hue spatial derivative
is given by
Hx = (Eλλ Eλx − Eλ Eλλx) / (Eλ² + Eλλ²)    (4.9)

admissible for Eλ² + Eλλ² > 0.
In the sequel we also need an expression for color saturation S,

S = (1/E(λ, x)) √( Eλ² + Eλλ² ) .    (4.10)

4.2.2 Invariants for White but Uneven Illumination and Matte, Dull Surfaces
A class of tighter invariants may be derived when the object is matte and dull. Consider the photometric reflection model (eq. 4.4), for matte, dull surfaces with low
Fresnel reflectance, ρf (~x) ≈ 0,
E = i(x) R∞(λ, x) .    (4.11)
These assumptions allow the derivation of expressions describing object reflectance
independent of the intensity distribution.
Lemma 9 Within the Kubelka-Munk model, assuming matte, dull surfaces, and a white illumination,

Cλ = Eλ / E

is an object reflectance property independent of the viewpoint, surface orientation, illumination direction and illumination intensity.
Proof: Differentiation of (eq. 4.11) with respect to λ and normalization by (eq. 4.11) results in an equation depending on object property only,

Eλ / E = (1/R∞(λ, x)) ∂R∞(λ, x)/∂λ

which proves the lemma. □

The property Cλ may be interpreted as describing object color regardless of intensity.
As a result of Proposition 7, all normalized higher order spectral derivatives of
Cλ , and their spatial derivatives, result in object reflectance properties under white
illumination. The normalization by E is to be evaluated at the spectral wavelength
of interest, and therefore is considered locally constant with respect to λ.
Corollary 10 Within the Kubelka-Munk model, a complete and irreducible set of
invariants for matte, dull surfaces, under a white illumination is given by
Cλ^m x^n = ∂^n/∂x^n { Eλ^m / E }    (4.12)

for m ≥ 1, n ≥ 0.
Specific first spatial and second spectral order expressions are given by

Cλλ = Eλλ / E
Cλx = (Eλx E − Eλ Ex) / E²
Cλλx = (Eλλx E − Eλλ Ex) / E² .    (4.13)

Note that these expressions are valid everywhere E > 0. These invariants may be interpreted as the spatial derivative of the intensity normalized spectral slope Cλ and curvature Cλλ.
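A sketch of the C set along one scanline follows (hypothetical Python/NumPy code; E, El, Ell are assumed to be per-position Gaussian color measurements, and np.gradient stands in for a proper Gaussian spatial derivative).

```python
import numpy as np

def C_invariants(E, El, Ell, dx):
    """C_lambda, C_lambda_lambda and their spatial derivatives, cf. (eq. 4.12)
    and (eq. 4.13), from arrays of measurements along one spatial dimension."""
    Cl = El / E
    Cll = Ell / E
    Clx = np.gradient(Cl, dx)    # finite-difference estimate of (E_lambda_x E - E_lambda E_x)/E^2
    Cllx = np.gradient(Cll, dx)  # finite-difference estimate of (E_lambda_lambda_x E - E_lambda_lambda E_x)/E^2
    return Cl, Cll, Clx, Cllx
```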
4.2.3 Invariants for White, Uniform Illumination and Matte, Dull Surfaces
For uniform illumination, consider again the photometric reflection model (eq. 4.11)
for matte, dull surfaces, and a white and uniform illumination with intensity i,
E(λ, x) = i R∞(λ, x) .    (4.14)
The assumption of a white and uniformly illuminated object may be achieved under
well defined circumstances, such as photography of art. These assumptions allow the
derivation of expressions describing object reflectance independent of the intensity
level.
Lemma 11 Within the Kubelka-Munk model, assuming matte, dull surfaces, planar objects, and a white and uniform illumination,

Wx = Ex / E

determines changes in object reflectance independent of the illumination intensity.
Proof: Differentiation of (eq. 4.14) with respect to x and normalization by (eq. 4.14) results in

Ex / E = (1/R∞(λ, x)) ∂R∞(λ, x)/∂x .

This is an object reflectance property. □
The property Wx may be interpreted as an edge detector specific for changes in
spectral distribution. Under common circumstances, a geometry dependent intensity
term is present, hence Wx does not represent pure object properties but will include
shadow edges where present.
As a result of Proposition 7, all normalized higher order derivatives of Wx yield
object reflectance properties under a white and uniform illumination. The normalization by E is to be evaluated at the spatial and spectral point of interest. Hence it
is considered locally constant.
Corollary 12 Within the Kubelka-Munk model, a complete and irreducible set of
invariants for matte, dull surfaces, planar objects, under a white and uniform illumination is given by
Wλ^m x^n = Eλ^m x^n / E    (4.15)

for m ≥ 0, n ≥ 1.
Specific expressions for E > 0 up to first spatial and second spectral order are given by

Wλx = Eλx / E
Wλλx = Eλλx / E .    (4.16)

These invariants may be interpreted as the intensity normalized spatial derivatives of the spectral intensity E, spectral slope Eλ and spectral curvature Eλλ.
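The corresponding sketch for the W set (hypothetical code, under the same assumptions as the C sketch above) differs only in the normalization, which here divides the spatial derivatives by E directly.

```python
import numpy as np

def W_invariants(E, El, Ell, dx):
    """W_x, W_lambda_x, W_lambda_lambda_x of (eq. 4.16): intensity-normalized
    spatial derivatives, estimated with finite differences along a scanline."""
    Wx = np.gradient(E, dx) / E
    Wlx = np.gradient(El, dx) / E
    Wllx = np.gradient(Ell, dx) / E
    return Wx, Wlx, Wllx
```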
4.2.4 Invariants for Colored but Uneven Illumination
For colored illumination, when the spectral energy distribution of the illumination
does not vary over the scene, the illumination may be decomposed into a spectral
component e(λ) representing the illumination color, and a spatial component i(x)
denoting variations in intensity due to the scene geometry. Hence, for matte, dull
surfaces ρf → 0,
E = e(λ) i(x) R∞(λ, x) .    (4.17)
The assumption allows us to derive expressions describing object reflectance independent of the illumination.
Lemma 13 Within the Kubelka-Munk model, assuming matte, dull surfaces and a
single illumination spectrum,
Nλx = (Eλx E − Eλ Ex) / E²
determines changes in object reflectance independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color.
Proof: Differentiation of (eq. 4.17) with respect to λ results in

Eλ = i(x) R∞(λ, x) ∂e(λ)/∂λ + e(λ) i(x) ∂R∞(λ, x)/∂λ .

Dividing by (eq. 4.17) gives the relative differential,

Eλ / E = (1/e(λ)) ∂e(λ)/∂λ + (1/R∞(λ, x)) ∂R∞(λ, x)/∂λ .

The result consists of two terms, the former depending on the illumination color only and the latter depending on body and Fresnel reflectance only. Differentiation to x yields

∂/∂x {Eλ / E} = ∂/∂x { (1/R∞(λ, x)) ∂R∞(λ, x)/∂λ } .

The right hand side depends only on object properties. This proves the lemma. □
The invariant Nλx may be interpreted as the spatial derivative of the spectral
change of the reflectance function R∞ (λ, x) and therefore indicates transitions in
object reflectance. Hence, Nλx determines material transitions regardless of illumination color and intensity distribution.
As a result of Proposition 7, further differentiation of Nλx results in object reflectance properties under a colored illumination.
Corollary 14 Within the Kubelka-Munk model, a complete and irreducible set of
invariants for matte, dull surfaces, and a single illumination spectrum, is given by
Nλ^m x^n = ∂^(m+n−2)/∂λ^(m−1) ∂x^(n−1) { (Eλx E − Eλ Ex) / E² }    (4.18)

for m ≥ 1, n ≥ 1.
The third order example is the spectral derivative of Nλx for E(λ, x) > 0,

Nλλx = (Eλλx E² − Eλλ Ex E − 2 Eλx Eλ E + 2 Eλ² Ex) / E³ .    (4.19)

4.2.5 Invariants for a Uniform Object
For a uniformly colored planar surface, the reflectance properties are spatially constant. Hence the reflectance function R∞ and Fresnel coefficient ρf are independent of x,

E = e(λ, x) { ρf + (1 − ρf)² R∞(λ) } .    (4.20)
For a single illumination source, expressions describing interreflections may be extracted, i.e. the reflected spectrum of surrounding materials.
Lemma 15 Within the Kubelka-Munk model, assuming dichromatic reflection, a single illumination source, and a uniformly colored planar surface,
Uλx = (Eλx E − Eλ Ex) / E²
determines interreflections of colored objects, independent of the object spectral reflectance function.
Proof: Differentiating (eq. 4.20) with respect to λ results in
Eλ = { ρf + (1 − ρf)² R∞(λ) } ∂e(λ, x)/∂λ + e(λ, x) (1 − ρf)² ∂R∞(λ)/∂λ .
Normalization by (eq. 4.20) results in
Eλ / E = (1/e(λ, x)) ∂e(λ, x)/∂λ + ( (1 − ρf)² / (ρf + (1 − ρf)² R∞(λ)) ) ∂R∞(λ)/∂λ .
Differentiation with respect to x results in
∂/∂x { Eλ / E } = ∂/∂x { (1/e(λ, x)) ∂e(λ, x)/∂λ }

which depends on the illumination only. Differentiation yields the lemma. □
The property Uλx may be interpreted as describing edges due to interreflections
and specularities. When ambient illumination casting a different spectral distribution
is present, the invariant describes shadow edges due to the combined ambient and
incident illumination.
Note that the expression of Lemma 15 is identical to the expression of Lemma 13.
Consequently, changes in object reflectance cannot be distinguished from interreflections
in single images. Further differentiation of Uλx yields interreflections when assuming
a uniformly colored planar surface. The result is identical to (eq. 4.19).
4.2.6 Summary of Color Invariants
In conclusion, within the Kubelka-Munk model, various sets of invariants are derived,
as summarized in tab. 4.1. The class of materials for which the invariants are useful
ranges from dyed paper and textiles, opaque plastics, and paint films, up to enamel and
dental silicate cements [10]. The invariant sets may be ordered by broadness of
invariance, where broader sets discount a larger set of disturbing factors than
tighter sets.
Table 4.1: Summary of the various color invariant sets and their invariance to specific imaging
conditions. Invariance is denoted by “+”, whereas sensitivity to the imaging condition is indicated
by “–”. Note that the reflected spectral energy distribution E is sensitive to all the conditions cited.
     viewing    surface      highlights  illumination  illumination  illumination  inter-
     direction  orientation              direction     intensity     color         reflection
H    +          +            +           +             +             –             –
N    +          +            –           +             +             +             –
U    +          +            –           +             +             +             –
C    +          +            –           +             +             –             –
W    –          –            –           –             +             –             –
E    –          –            –           –             –             –             –
The table suggests using the narrowest set of invariants for known imaging
conditions, since H ⊂ N = U ⊂ C ⊂ W ⊂ E. In the case that recording
circumstances are unknown, the table offers a broad-to-narrow hierarchy. Hence, an
incremental strategy of invariant feature extraction may be applied. Combination
of invariants opens up the way to edge type classification as suggested in [9]: the
vanishing of edges for certain invariants indicates whether their cause is shading,
specular reflectance, or material boundaries.
4.2.7 Geometrical Color Invariants in Two Dimensions
So far, we have established color invariant descriptors, based on differentials in the
spectral and the spatial domain, in one spatial dimension. When applied in two
dimensions, the result depends on the orientation of the image content. In order to
obtain meaningful image descriptions, it is crucial to derive descriptors which are
invariant with respect to translation, rotation and scaling. For the grey-value
luminance L, geometrical invariants are well established [5]. Translation and scale
invariance is obtained by examining the (Gaussian) scale-space, which is a natural
representation for investigating the scaling behavior of image features [11]. Florack
et al. [5] extend the Gaussian scale-space with rotation invariance, by considering in
a systematic manner local gauge coordinates. The coordinate axes w and v are aligned
with the gradient and isophote tangent directions, respectively. Hence, the first
order gradient
gauge invariant is the magnitude of the luminance gradient,
Lw = √(Lx² + Ly²) .    (4.21)
Note that the first order isophote gauge invariant is zero by definition. The second
order invariants are given by
Lvv = (Lx² Lyy − 2 Lx Ly Lxy + Ly² Lxx) / Lw²    (4.22)
related to isophote curvature,
Lvw = ( Lx Ly (Lyy − Lxx) − (Lx² − Ly²) Lxy ) / Lw²    (4.23)
related to flow-line curvature, and
Lww = (Lx² Lxx + 2 Lx Ly Lxy + Ly² Lyy) / Lw²    (4.24)
related to isophote density. Note that the Laplacian operator ∆L = Lxx + Lyy is an
invariant and hence equal to
∆L = Lvv + Lww .
On the basis of these spatial results, we combine (eq. 4.21)—(eq. 4.24) with the color
invariants for the 1D-case established before. The resulting first order expressions are
given in tab. 4.2.
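To make the gauge expressions concrete, the following Python sketch (ours, not part
of the original text) evaluates (eq. 4.21)-(eq. 4.24) with Gaussian derivative filters;
the function and parameter names are illustrative only.

# Illustrative sketch (not from the thesis): the gauge invariants
# (eq. 4.21)-(eq. 4.24) computed from Gaussian derivatives of a
# grey-value image L at scale sigma.
import numpy as np
from scipy.ndimage import gaussian_filter

def gauge_invariants(L, sigma=1.0, eps=1e-10):
    L = L.astype(float)
    Lx  = gaussian_filter(L, sigma, order=(0, 1))   # d/dx (columns)
    Ly  = gaussian_filter(L, sigma, order=(1, 0))   # d/dy (rows)
    Lxx = gaussian_filter(L, sigma, order=(0, 2))
    Lxy = gaussian_filter(L, sigma, order=(1, 1))
    Lyy = gaussian_filter(L, sigma, order=(2, 0))
    Lw2 = Lx**2 + Ly**2 + eps                       # guard against flat areas
    Lw  = np.sqrt(Lw2)                                          # eq. 4.21
    Lvv = (Lx**2*Lyy - 2*Lx*Ly*Lxy + Ly**2*Lxx) / Lw2           # eq. 4.22
    Lvw = (Lx*Ly*(Lyy - Lxx) - (Lx**2 - Ly**2)*Lxy) / Lw2       # eq. 4.23
    Lww = (Lx**2*Lxx + 2*Lx*Ly*Lxy + Ly**2*Lyy) / Lw2           # eq. 4.24
    return Lw, Lvv, Lvw, Lww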
Two or three measures for edge strength are derived, one for each spectral differential
order. The only exception is H. Total edge strength due to differences in the
energy distribution may be defined by the root squared sum of the edge strengths
under a given imaging condition. A summary of total edge strength measures, ordered
by degree of invariance, is given in tab. 4.3.

Table 4.2: Summary of the first order geometrical invariants for the various color invariant sets. See
tab. 4.1 for invariant class.

H:    Hw = √(Hx² + Hy²)
         = 1/(Eλ² + Eλλ²) √( (Eλλ Eλx − Eλ Eλλx)² + (Eλλ Eλy − Eλ Eλλy)² )
Cλ:   Cλw = √(Cλx² + Cλy²)
          = 1/E² √( (Eλx E − Eλ Ex)² + (Eλy E − Eλ Ey)² )
Cλλ:  Cλλw = √(Cλλx² + Cλλy²)
           = 1/E² √( (Eλλx E − Eλλ Ex)² + (Eλλy E − Eλλ Ey)² )
W:    Iw = √(Wx² + Wy²)
         = 1/E √(Ex² + Ey²)
Wλ:   Wλw = √(Wλx² + Wλy²)
          = 1/E √(Eλx² + Eλy²)
Wλλ:  Wλλw = √(Wλλx² + Wλλy²)
           = 1/E √(Eλλx² + Eλλy²)
Nλ:   Nλw = √(Nλx² + Nλy²)
          = 1/E² √( (Eλx E − Eλ Ex)² + (Eλy E − Eλ Ey)² )
Nλλ:  Nλλw = √(Nλλx² + Nλλy²)
           = 1/E³ √(A² + B²)
      where A = Eλλx E² − Eλλ Ex E − 2 Eλx Eλ E + 2 Eλ² Ex
      and   B = Eλλy E² − Eλλ Ey E − 2 Eλy Eλ E + 2 Eλ² Ey
For completeness, spatial second order derivatives in two dimensions are given
in tab. 4.4 and tab. 4.5. The derivation of higher order invariants is straightforward.
Usually many derivatives are involved here, raising some doubt on the sustainable
computational accuracy of the result.
4.3 Measurement of Color Invariants
Up to this point we have considered invariant expressions describing material properties under some general assumptions. They are derived from expressions exploring
the infinite-dimensional Hilbert space of spectra at an infinitesimally small spatial
neighborhood. As shown in Chapter 2, the spatio-spectral energy distribution is
measurable only at a certain spatial extent and a certain spectral bandwidth. Hence,
physical measurements imply integration over spectral and spatial dimensions. In
this section we exploit the Gaussian color model as presented in Chapter 2 to define
measurable color invariants.

Table 4.3: Summary of the total edge strength measures for the various color invariant sets, ordered
by degree of invariance. The edge strength Ew is not invariant to any change in imaging conditions.
See tab. 4.1 for invariant class.

E:  Ew = √(Ex² + Eλx² + Eλλx² + Ey² + Eλy² + Eλλy²)
W:  Ww = √(Wx² + Wλx² + Wλλx² + Wy² + Wλy² + Wλλy²)
C:  Cw = √(Cλx² + Cλλx² + Cλy² + Cλλy²)
N:  Nw = √(Nλx² + Nλλx² + Nλy² + Nλλy²)
H:  Hw = √(Hx² + Hy²)
4.3.1 Measurement of Geometrical Color Invariants
Measurement of the geometrical color invariants is obtained by substitution of the
Gaussian basis, as derived from the RGB measurements (see Chapter 2), in the
invariant expressions derived in section 4.2. Measured values for the geometrical
color invariants given in tab. 4.2 and tab. 4.4 are obtained by substituting the
measured values Ê, Êλ and Êλλ at given scale σx for E, Eλ and Eλλ. In this section,
we demonstrate the color invariant properties for each of the assumed imaging
conditions by applying the invariants to an example image. The invariants regarding a
uniform object are not demonstrated separately, since the expressions are included in
the invariants for colored illumination.
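As an illustration of this substitution, the sketch below (ours) derives the measured
Ê, Êλ and Êλλ from an RGB image by a linear transform and evaluates the invariant
Ŵλw of tab. 4.2. The 3 × 3 matrix is the RGB-to-Gaussian-color-model approximation
quoted in related publications; treat its coefficients as an assumption, not as the
exact camera calibration used here.

# Illustrative sketch (ours): Gaussian color model components from RGB
# and the invariant W_lambda_w of tab. 4.2.
import numpy as np
from scipy.ndimage import gaussian_filter

RGB2E = np.array([[0.06,  0.63,  0.27],    # E:   intensity
                  [0.30,  0.04, -0.35],    # El:  spectral slope
                  [0.34, -0.60,  0.17]])   # Ell: spectral curvature

def gaussian_color_model(rgb):
    # rgb: float array of shape (H, W, 3); returns three (H, W) planes.
    return np.tensordot(RGB2E, rgb.astype(float), axes=([1], [2]))

def w_lambda_w(rgb, sigma=1.0, eps=1e-6):
    E, El, _ = gaussian_color_model(rgb)
    Elx = gaussian_filter(El, sigma, order=(0, 1))  # spatial derivatives of E_lambda
    Ely = gaussian_filter(El, sigma, order=(1, 0))
    Es  = gaussian_filter(E, sigma) + eps           # E at the same scale
    return np.sqrt(Elx**2 + Ely**2) / Es            # W_lambda_w (tab. 4.2)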
Measurement of invariants for white illumination
The invariant Ĥ is representative for the hue or dominant color of the material,
disregarding intensity and highlights. The pseudo invariant Ŝ (eq. 4.10) denotes the
purity of the color, and therefore is sensitive to highlights, since at these points
color is desaturated. An example is shown in fig. 4.1. The invariant Ĥw represents
the hue gradient magnitude, detecting color edges independent of intensity and
highlights, as demonstrated in fig. 4.1.
Common expressions for hue are known to be noise sensitive. In the scale-space
framework, Gaussian regularization offers a trade-off between noise and detail
sensitivity. The influence of noise on hue gradient magnitude Ĥw for various σx is
shown in fig. 4.2. The influence of noise on the hue edge detection is drastically
reduced for larger observational scale σx.
Table 4.4: Summary of the spatial second order derivatives for the various color invariant sets.

∂/∂xx:

H:    Hxx = −2 (Eλλ Eλx − Eλ Eλλx)(Eλ Eλx + Eλλ Eλλx) / (Eλ² + Eλλ²)²
            + (Eλ² + Eλλ²)(Eλλ Eλxx − Eλ Eλλxx) / (Eλ² + Eλλ²)²
Cλ:   Cλxx = (Eλxx E² − Eλ Exx E − 2 Eλx Ex E + 2 Eλ Ex²) / E³
Cλλ:  Cλλxx = (Eλλxx E² − Eλλ Exx E − 2 Eλλx Ex E + 2 Eλλ Ex²) / E³
W:    Wxx = Exx / E
Wλ:   Wλxx = Eλxx / E
Wλλ:  Wλλxx = Eλλxx / E
Nλ:   Nλxx = (Eλxx E² − Eλ Exx E − 2 Eλx Ex E + 2 Eλ Ex²) / E³
Nλλ:  Nλλxx = (Eλλxx E − Eλλ Exx − 2 Eλxx Eλ − 2 Eλλx Ex − 2 Eλx²) / E²
              + (2 Eλλ Ex² E + 8 Eλx Eλ Ex E + 2 Eλ² Exx E − 6 Eλ² Ex²) / E⁴

∂/∂xy:

H:    Hxy = −2 (Eλλ Eλx − Eλ Eλλx)(Eλ Eλy + Eλλ Eλλy) / (Eλ² + Eλλ²)²
            + (Eλ² + Eλλ²)(Eλλ Eλxy − Eλ Eλλxy − Eλy Eλλx + Eλx Eλλy) / (Eλ² + Eλλ²)²
Cλ:   Cλxy = (Eλxy E² + Eλx Ey E − Eλy Ex E − Eλ Exy E − 2 Eλx Ey E + 2 Eλ Ex Ey) / E³
Cλλ:  Cλλxy = (Eλλxy E² + Eλλx Ey E − Eλλy Ex E − Eλλ Exy E − 2 Eλλx Ey E + 2 Eλλ Ex Ey) / E³
W:    Wxy = Exy / E
Wλ:   Wλxy = Eλxy / E
Wλλ:  Wλλxy = Eλλxy / E
Nλ:   Nλxy = (Eλxy E² + Eλx Ey E − Eλy Ex E − Eλ Exy E − 2 Eλx Ey E + 2 Eλ Ex Ey) / E³
Nλλ:  Nλλxy = (Eλλxy E − Eλλ Exy − 2 Eλxy Eλ − Eλλx Ey − Eλλy Ex − 2 Eλx Eλy) / E²
              + (2 Eλλ Ex Ey E + 4 Eλx Eλ Ey E + 4 Eλy Eλ Ex E + 2 Eλ² Exy E − 6 Eλ² Ex Ey) / E⁴
Table 4.5: Summary of the second order geometrical invariants for the various color invariant sets.

∂/∂vv:

H:    Hvv = (Hx² Hyy − 2 Hx Hy Hxy + Hy² Hxx) / Hw²
Cλ:   Cλvv = (Cλx² Cλyy − 2 Cλx Cλy Cλxy + Cλy² Cλxx) / Cλw²
Cλλ:  Cλλvv = (Cλλx² Cλλyy − 2 Cλλx Cλλy Cλλxy + Cλλy² Cλλxx) / Cλλw²
W:    Ivv = (Ex² Eyy − 2 Ex Ey Exy + Ey² Exx) / (E √(Ex² + Ey²))
Wλ:   Wλvv = (Eλx² Eλyy − 2 Eλx Eλy Eλxy + Eλy² Eλxx) / (E √(Eλx² + Eλy²))
Wλλ:  Wλλvv = (Eλλx² Eλλyy − 2 Eλλx Eλλy Eλλxy + Eλλy² Eλλxx) / (E √(Eλλx² + Eλλy²))
Nλ:   Nλvv = (Nλx² Nλyy − 2 Nλx Nλy Nλxy + Nλy² Nλxx) / Nλw²
Nλλ:  Nλλvv = (Nλλx² Nλλyy − 2 Nλλx Nλλy Nλλxy + Nλλy² Nλλxx) / Nλλw²

∂/∂vw:

H:    Hvw = ( Hx Hy (Hxx − Hyy) − (Hx² − Hy²) Hxy ) / Hw²
Cλ:   Cλvw = ( Cλx Cλy (Cλxx − Cλyy) − (Cλx² − Cλy²) Cλxy ) / Cλw²
Cλλ:  Cλλvw = ( Cλλx Cλλy (Cλλxx − Cλλyy) − (Cλλx² − Cλλy²) Cλλxy ) / Cλλw²
W:    Ivw = ( Ex Ey (Exx − Eyy) − (Ex² − Ey²) Exy ) / (E √(Ex² + Ey²))
Wλ:   Wλvw = ( Eλx Eλy (Eλxx − Eλyy) − (Eλx² − Eλy²) Eλxy ) / (E √(Eλx² + Eλy²))
Wλλ:  Wλλvw = ( Eλλx Eλλy (Eλλxx − Eλλyy) − (Eλλx² − Eλλy²) Eλλxy ) / (E √(Eλλx² + Eλλy²))
Nλ:   Nλvw = ( Nλx Nλy (Nλxx − Nλyy) − (Nλx² − Nλy²) Nλxy ) / Nλw²
Nλλ:  Nλλvw = ( Nλλx Nλλy (Nλλxx − Nλλyy) − (Nλλx² − Nλλy²) Nλλxy ) / Nλλw²

∂/∂ww:

H:    Hww = (Hx² Hxx + 2 Hx Hy Hxy + Hy² Hyy) / Hw²
Cλ:   Cλww = (Cλx² Cλxx + 2 Cλx Cλy Cλxy + Cλy² Cλyy) / Cλw²
Cλλ:  Cλλww = (Cλλx² Cλλxx + 2 Cλλx Cλλy Cλλxy + Cλλy² Cλλyy) / Cλλw²
W:    Iww = (Ex² Exx + 2 Ex Ey Exy + Ey² Eyy) / (E √(Ex² + Ey²))
Wλ:   Wλww = (Eλx² Eλxx + 2 Eλx Eλy Eλxy + Eλy² Eλyy) / (E √(Eλx² + Eλy²))
Wλλ:  Wλλww = (Eλλx² Eλλxx + 2 Eλλx Eλλy Eλλxy + Eλλy² Eλλyy) / (E √(Eλλx² + Eλλy²))
Nλ:   Nλww = (Nλx² Nλxx + 2 Nλx Nλy Nλxy + Nλy² Nλyy) / Nλw²
Nλλ:  Nλλww = (Nλλx² Nλλxx + 2 Nλλx Nλλy Nλλxy + Nλλy² Nλλyy) / Nλλw²
Figure 4.1: Example of the invariants associated with Ĥ. The example image is shown in (a),
invariant Ĥ in (b), the derived expression Ŝ in (c), and gradient magnitude Ĥw in (d). Intensity
changes and highlights are suppressed in the Ĥ and Ĥw images. The Ŝ image shows a low purity at
color borders, due to mixing of colors on the two sides of the border. For all pictures, σx = 1 pixel
and the image size is 256 × 256.
Figure 4.2: The influence of white additive noise on gradient magnitude Ĥw. Independent Gaussian
zero-mean noise is added to each of the RGB channels, SNR = 5 (a), and Ĥw is determined for
σx = 1 (b), σx = 2 (c) and σx = 4 pixels (d), respectively. Note the noise robustness of the hue
gradient Ĥw for larger σx.
Measurement of invariants for white illumination and matte, dull surfaces
The invariants Ĉλ and Ĉλλ represent normalized color; consequently, their spatial
derivatives measure the normalized color gradients. Ĉλw may be interpreted as the
color gradient magnitude for transitions in the first order spectral derivative, whereas
Ĉλλw detects edges related to the second order spectral derivative. An example of
the normalized colors and their gradients is shown in fig. 4.3.
Figure 4.3: Examples of the normalized colors Ĉλ denoting the first spectral derivative (a), Ĉλλ
denoting the second spectral derivative (b), and their gradient magnitudes Ĉλw (c) and Ĉλλw (d),
respectively. Note that intensity edges are suppressed, whereas highlights are still present.
Figure 4.4: Examples of the gradient magnitudes Iˆw (a), Ŵλw (b) and Ŵλλw (c), respectively. Note
that all images show edges due to intensity differences and highlights. Iˆw shows purely intensity or
shadow edges, while Ŵλw and Ŵλλw show color edges.
Measurement of invariants for white and uniform illumination and matte,
dull surfaces
The invariant Iˆw denotes intensity or shadow edges, whereas the invariants Ŵλw and
Ŵλλw represent color edges. Ŵλw may be interpreted as the gradient magnitude for
the first spectral derivative. A similar interpretation holds for Ŵλλw, but here edges
caused by the second spectral derivative are detected. An example of the gradients is
shown in fig. 4.4.
Figure 4.5: Examples of the gradient magnitudes N̂λw (a) and N̂λλw (b). Note that intensity edges
are suppressed. Further, note that the assumptions underlying this invariant do not account for
highlights and interreflections, as is seen in the figure.
Measurement of invariants for colored illumination
The invariants N̂λw and N̂λλw may be interpreted as the reflectance function gradient
magnitudes for the spectral first and second order derivatives, respectively. Hence,
material edges are detected independent of illumination intensity and illumination
color. An example of the gradients N̂λw and N̂λλw is shown in fig. 4.5. In Chapter 3,
illumination color invariance is investigated for the proposed edge strength, resulting
in a significant reduction of chromatic variation due to illumination color. For a more
elaborate discussion on the subject, see Chapter 3.
Total color gradients
The expressions for total gradient magnitude are given by Êw, Ŵw, Ĉw, N̂w, and Ĥw.
The proposed edge strength measures may be ordered by degree of invariance: Êw
measures spectral edge strength; Ŵw measures color edge strength, disregarding
intensity level; Ĉw measures chromatic edge strength, disregarding intensity
distribution; N̂w measures chromatic edge strength, disregarding illumination; and
Ĥw measures edges in dominant wavelength, disregarding intensity and highlights. An
example of the proposed measures is shown in fig. 4.6.
4.3.2 Discriminative Power for RGB Recording
In order to investigate the discriminative power of the proposed invariants, edge detection between 1013 different colors of the PANTONE† color system is examined.
The 1013 PANTONE colors‡ are recorded by an RGB-camera (Sony DXC-930P) under
a 5200K daylight simulator (Little Light, Grigull, Jungingen, Germany). Purely
achromatic patches are removed from the dataset, leaving 1000 colored patches. In
this way, numerically unstable results for set Ĥ are avoided.

† PANTONE is a trademark of Pantone, inc.
‡ We use the PANTONE edition 1992-1993, Groupe BASF, Paris, France.

Figure 4.6: Examples for the total color edge strength measures. a. Ŵw, invariant for a constant
gain or intensity factor; note that this image shows intensity, color, and highlight boundaries. b. Ĉw
and c. N̂w, invariant for shading, are shown. d. Ĥw, invariant for shading and highlights. The effect
of intensity and highlights on the different invariants is in accordance with tab. 4.1.
Color edges are formed by combining each of the patches with all others, yielding
499,500 different edges. Edges are defined virtually by computing the left-hand part
on one patch and the right-hand side of the filter on one of the other patches. The total
edge strength measures for invariants Ê, Ŵ , Ĉ, N̂ , and Ĥ (tab. 4.3) are measured
for each color combination at a scale of σx = {0.75, 1, 2, 3} pixels, hence evaluating
the total performance of each set of invariants. Discrimination between colors is
determined by evaluating the ratio of discriminatory contrast between patches to
within patch noise,
DNRc(i, j) = ĉij / max_k √( (1/N²) Σ_{x,y} ĉk(x, y)² )    (4.25)
where ĉ denotes one of the edge strength measures for Ê, Ŵ , Ĉ, N̂ , or Ĥ, respectively.
Further, ĉij denotes the edge strength between patch i and j, and ĉk denotes the
responses of the edge detector to noise within patch k. Hence, for detector ĉ, the
denominator in expression (eq. 4.25) expresses the maximum response over the 1000
patches due to noise, whereas the numerator expresses the response due to the color
edge. Two colors are defined to be discriminable when DNR ≥ 3, effectuating a
conservative threshold.
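A minimal sketch (ours) of this criterion: the denominator of (eq. 4.25) is the
worst-case root-mean-square noise response over the patches, and an edge counts as
discriminable when the ratio reaches the threshold of 3.

# Illustrative sketch (ours) of the discrimination criterion (eq. 4.25).
import numpy as np

def dnr(c_ij, patch_responses):
    # c_ij: edge strength between patches i and j (scalar);
    # patch_responses: iterable of 2-D detector responses, one per patch k.
    noise = max(np.sqrt(np.mean(r.astype(float)**2)) for r in patch_responses)
    return c_ij / noise            # discriminable when dnr(...) >= 3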
The results of the experiment are shown in tab. 4.6.

Table 4.6: For each invariant, the number of colors is given which can be discriminated from one
another in the PANTONE color system (1000 colors). The number refers to the amount of colors
still to be distinguished with the conservative criterion DNR > 3 given the hardware and spatial
scale σx. For σx ≥ 2, Ê and Ŵ discriminate between all patches, hence the results are saturated.

     σx = 0.75   σx = 1   σx = 2   σx = 3
Ê    970         983      1000     1000
Ŵ    944         978      1000     1000
Ĉ    702         820      949      970
N̂    631         757      962      974
Ĥ    436         461      452      462

For colors uniformly distributed in color space, and for the configuration used and
spatial scale σx = 0.75, about 970 colors can be distinguished from one another (Ê).
For invariant Ŵ, performance reduces to 950 colors. A further decrease holds for Ĉ and N̂, which distinguish
between approximately 700 and 630 colors, respectively. Lowest discriminative power
is achieved by invariant set Ĥ, which discriminates approximately 440 colors. When
the spatial scale σx increases, discrimination improves. A larger spatial scale yields
better reduction of noise, hence a more accurate estimate of the true color is obtained.
The results shown for σx ≥ 2 are saturated for Ê and Ŵ . Hence, a larger set of colors
can be discriminated than shown here. Note that for σx ≥ 2 the performance for Ĉ is
comparable to the performance of N̂, again indicating saturation. Note also that the
power of discrimination, expressed as the number of discriminable colors, decreases
with increasing degree of invariance. These are very encouraging results given
a standard RGB-camera rather than a spectrophotometer. To discriminate 450 to 950
colors while maintaining invariance on just two patches in the image is helpful for
many practical image retrieval problems.
4.3.3 Evaluation of Scene Geometry Invariance
In this section, illumination and viewing direction invariance is evaluated by
experiments on a collection of real-world surfaces. Colored patches from the CUReT
database (http://www.cs.columbia.edu/CAVE/curet/) are selected [3]. The database
consists of planar patches of common materials, captured under various illumination
and viewing directions. Hence, recordings
of 27 colored material patches, each captured under 205 different illumination and
viewing directions, are considered. Color edges are formed by combining the patch
for each imaging condition with the others, yielding 205 × (205 − 1)/2 = 20,910
different edges per material. Edges are defined virtually by computing the left-hand
part on one patch and the right-hand side of the filter on one of the other patches.
The total edge strength measures for invariants Ê, Ŵ , Ĉ, N̂ , and Ĥ (tab. 4.3) are
obtained for each material at scale σx = 3 pixels. The root squared sum over the
measured edge strengths indicates sensitivity to the scene geometry for the material
and edge strength measure under consideration. For the spectral edge strength E,
edge strength was normalized to the average intensity over all viewing conditions. In
this way, comparison between the various edge strengths measured was possible.
The results are shown in tab. 4.7. A high value of Ê indicates influence of scene
geometry on material surface reflectance. By construction of the database, which
contains planar patches, the value for Ŵ approximates the value for Ê. Exceptions are
surfaces with rough texture, exhibiting shadow edges larger than the measurement
scale (straw, cracker a). Further, the selected center point in the 205 recordings does
not correspond to one identical point on the material patches, causing an error for
non-uniformly colored patches (orange peel, peacock feather), or patches exhibiting
intensity variations (rabbit fur, brick b, moss). The measured errors for Ĉ and N̂ are
approximately equal. White light is used for the recordings, hence both Ĉ and N̂
reduce the measurement variation due to changes in intensity. Exceptions are
scattering materials with fine texture relative to the measurement scale σx = 3 pixels
(velvet, rug b), for which highlights influence the measured surface reflectance.
Overall, for Ĉ and N̂, variation in edge strength due to illumination and viewing direction
is reduced drastically. Even for these non-Lambertian real-world surfaces, invariant
sets N̂ and Ĉ are highly robust against changes in scene geometry. For Ĥ, results
are influenced by numerical stability. Highly saturated materials (velvet, artificial
grass, lettuce leaf, soleirolia plant, moss) result in a small error since Êλ² + Êλλ² ≫ 0.
The exception is again the non-uniformly colored orange peel. Note that the error due
to highlights for velvet is much smaller than that measured for Ĉ and N̂. For materials
with lower saturation, errors become larger. Overall, the influence of illumination and
viewing direction is slightly reduced for Ĥ.
In conclusion, the table demonstrates the expected error for real, commonly
non-Lambertian surfaces. These results demonstrate the usefulness of the various invariant
sets for material classification and recognition, based on surface reflectance properties.
4.3.4 Localization Accuracy for the Geometrical Color Invariants
Rotational and translational invariance remains to be evaluated. The independence of
the derived expressions of the coordinate system is mathematically shown in [4]. It is
demonstrated by the examples shown in fig. 4.6. The measurement problem related to
rotation and translation invariance is the accuracy of edge localization between
different colors. In order to investigate the localization accuracy of the proposed
invariants, edge location is evaluated between 1000 different colors of the PANTONE
system. The uncoated patches as described in the preceding section (section 4.3.2)
are used to form 499,500 different color edges.
Table 4.7: Results for the scene geometry invariance evaluation on the CUReT dataset [3]. The root
squared sum in measured total edge strength over 205 recordings under different viewing and
illumination directions is given for each of the materials. A high value for Ê indicates a large
influence of scene geometry on surface reflectance for the considered material. A low value for the
variation of invariants Ŵ, Ĉ, N̂ or Ĥ relative to Ê indicates robustness against scene geometry for
the invariant under consideration. The table offers an indication for the error to expect in
estimating surface reflectance for real materials.

material                 recordings   E      W/E    C/E    N/E    H/E
Velvet                   205          102.7  1.01   0.86   1.45   0.03
Pebbles                  205          25.0   1.07   0.15   0.15   0.59
Artificial Grass         205          29.1   0.97   0.22   0.23   0.28
Roof Shingle             205          27.4   1.04   0.30   0.30   2.00
Cork                     205          34.5   0.93   0.20   0.19   0.59
Rug b                    205          28.2   1.01   0.45   0.38   0.58
Sponge                   205          29.0   1.09   0.21   0.21   0.62
Lambswool                205          29.6   1.09   0.18   0.17   0.53
Lettuce Leaf             205          41.3   1.02   0.13   0.15   0.26
Rabbit Fur               205          26.0   1.15   0.19   0.18   0.68
Roof Shingle (zoomed)    205          34.2   0.98   0.19   0.19   0.97
Human Skin               205          39.7   0.99   0.10   0.09   0.47
Straw                    205          33.7   1.13   0.12   0.12   0.38
Brick b                  205          77.6   0.55   0.06   0.06   0.30
Corduroy                 205          27.3   1.14   0.07   0.07   0.47
Linen                    205          39.1   0.90   0.11   0.11   0.62
Brown Bread              205          29.7   1.06   0.19   0.19   0.54
Corn Husk                205          33.3   1.05   0.10   0.11   0.41
Soleirolia Plant         205          39.4   0.98   0.15   0.17   0.24
Wood a                   205          34.8   0.94   0.19   0.18   0.58
Orange Peel              205          73.3   0.73   0.30   0.28   0.36
Wood b                   205          34.3   0.97   0.13   0.13   0.36
Peacock Feather          205          61.3   0.66   0.16   0.16   0.70
Tree Bark                205          37.0   0.97   0.16   0.16   0.48
Cracker a                205          28.7   1.22   0.19   0.19   0.71
Cracker b                205          25.3   1.01   0.30   0.27   0.75
Moss                     205          46.9   0.84   0.13   0.15   0.20
The total edge strength measures for invariants Ê, Ŵ, Ĉ, N̂, and Ĥ (tab. 4.3) are
measured for each color combination at a scale of
σx = {1, 2, 4} pixels, hence evaluating the total performance of each set of invariants.
For color pairs that can be distinguished (see section 4.3.2), the edge position between
different patches is determined by tracing the maximum response along the edge in
the resulting edge strength image. The average deviation between the measured edge
location and real edge location is considered to be a good measure for the localization
accuracy. The root mean squared error in edge location is determined over all color
pairs for each of the total edge strength measures.
The results of the experiment are shown in tab. 4.8.
Table 4.8: Results for the edge localization experiment, relative to pixel size. For each invariant, the
root mean squared error in measured edge position over the color pairs from the PANTONE system
is given.

        Ê      Ŵ      Ĉ      N̂      Ĥ
σx = 1  0.22   0.51   0.66   0.65   2.46
σx = 2  0.28   1.03   1.49   1.45   1.63
σx = 4  0.36   1.82   2.44   2.38   0.70
For the invariants Ê, Ŵ, Ĉ, and N̂, localization accuracy degrades for higher spatial
scale σx. This is a well-known property of Gaussian smoothing in the intensity domain.
The invariants all result in a larger localization error than Ê, due to the severe
reduction in edge contrast. The localization error for Ĉ is almost identical to the
error for N̂, as expected. Note that the localization error remains within the spatial
scale σx. For the invariant Ĥ, edge strength is normalized
by the squared sum of the spectral derivatives (eq. 4.9). Hence, localization accuracy
improves for higher spatial scale due to a better estimation of local chromaticity.
In conclusion, edge localization accuracy is slightly reduced for the invariant sets
in comparison to Ê. However, precision remains within the spatial differential scale
σx . The results show Ĥ to be noise sensitive for small spatial scale σx < 2.
4.4 Conclusion
We have derived geometrical color invariant expressions describing material properties
under three independent assumptions regarding the imaging conditions: a. white or
colored illumination, b. matte, dull object or general object, and c. uniformly stained
object or generally colored object. The reflectance model under which the invariants
remain valid is useful for a wide range of materials [10]. Experiments on an example
image showed the invariant sets C and N to be successful in disregarding shadow
edges, whereas the set H is shown to be successful in discounting both shadow edges
and highlights. In Chapter 3, the degree of illumination color invariance for set N̂ is
investigated.
We showed the discriminative power of the invariants to be orderable by broadness
of invariance. Highest discriminative power is obtained by set Ŵ (950 colors out of
1000), which has the tightest set of disturbing conditions, namely overall illumination
intensity or camera gain. Discrimination degraded for set Ĉ (700 colors), which
is invariant to shading effects. Set N̂, invariant to shading and illumination color,
discriminates between 630 colors, whereas set Ĥ, invariant to shadows and highlights,
has the lowest discriminative power (440 colors). Discriminating power is increased when
considering a larger spatial scale σx , thereby taking a larger neighborhood into account
for determining the color value. Hence, a larger spatial scale results in a more accurate
estimate of color at the point of interest, increasing the accuracy of the result. The
aim of the chapter is reached in that high color discrimination resolution is achieved
while maintaining constancy against disturbing imaging conditions, both theoretically
and experimentally.
We have restricted ourselves in several ways. We have derived expressions up
to the second spatial order, and investigated their performance only for the spatial
gradient. The derivation of higher order derivatives is straightforward, and may aid in
corner detection [21]. Usually many derivatives are involved here, raising some doubt
on the sustainable accuracy of the result. Consequently, a larger spatial scale may be
necessary to increase the accuracy of measurements involving higher order derivatives.
Further, we have only considered spectral derivatives up to second order, yielding
compatibility with human color vision. For a spectrophotometer, measurements can
be obtained at different positions λ0 , for different scales σλ , and for higher spectral
differential order, thereby exploiting the generality of the Gaussian color model.
We provided different classes of color invariants, under general assumptions regarding
the imaging conditions. We have shown how to reliably measure color invariants
from RGB images by using the Gaussian color model. The Gaussian color model
extends the differential geometry approach from grey-value images to multi-spectral
differential geometry. Further, we experimentally proved the color invariants to be
successful in discounting shadows and highlights, resulting in accurate measurements
of surface reflectance properties. The presented framework for color measurement is
well-defined on a physical basis, hence it is theoretically better founded as well as
experimentally better evaluated than existing methods for the measurement of color
features in RGB images.
Bibliography
[1] E. Angelopoulou, S. Lee, and R. Bajcsy. Spectral gradient: A material descriptor
invariant to geometry and incident illumination. In Proceedings of the Seventh
IEEE International Conference on Computer Vision, pages 861–867. IEEE Computer Society, 1999.
[2] A. Cumani. Edge detection in multispectral images. CVGIP: Graphical Models
and Image Processing, 53(1):40–51, 1991.
[3] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance
and texture of real world surfaces. ACM Trans Graphics, 18:1–34, 1999.
[4] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever.
Scale and the differential structure of images. Image and Vision Computing,
10(6):376–388, 1992.
[5] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever.
Cartesian differential invariants in scale-space. Journal of Mathematical Imaging
and Vision, 3(4):327–348, 1993.
[6] R. Gershon, D. Jepson, and J. K. Tsotsos. Ambient illumination and the determination of material changes. J. Opt. Soc. Am. A, 3:1700–1707, 1986.
[7] T. Gevers, S. Ghebreab, and A. W. M. Smeulders. Color invariant snakes. In
P. H. Lewis and M. S. Nixon, editors, Proceedings of the Ninth British Machine
Vision Conference, pages 659–670. University of Southampton, 1998.
[8] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec.,
32:453–464, 1999.
[9] T. Gevers and H. Stokman. Reflectance based edge classification. In Proceedings of Vision Interface, pages 25–32. Canadian Image Processing and Pattern
Recognition Society, 1999.
[10] D. B. Judd and G. Wyszecki. Color in Business, Science, and Industry. Wiley,
New York, NY, 1975.
[11] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.
[12] J. J. Koenderink and A. Kappers. Color Space. Utrecht University, The Netherlands, 1998.
[13] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern.,
63:291–297, 1990.
[14] P. Kubelka and F. Munk. Ein beitrag zur optik der farbanstriche. Z. Techn.
Physik, 12:593, 1931.
[15] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[16] P. Olver, G. Sapiro, and A. Tannenbaum. Differential invariant signatures
and flows in computer vision: A symmetry group approach. In B. M. ter
Haar Romeny, editor, Geometry-Driven Diffusion in Computer Vision. Kluwer
Academic Publishers, Boston, 1994.
[17] M. Pluta. Advanced Light Microscopy, volume 1. Elsevier, Amsterdam, 1988.
[18] G. Sapiro and D. L. Ringach. Anisotropic diffusion of multivalued images with
applications to color filtering. IEEE Trans. Image Processing, 5(11):1582–1586,
1996.
[19] S. A. Shafer. Using color to separate reflection components. Color Res. Appl.,
10(4):210–218, 1985.
[20] H. Stokman and T. Gevers. Detection and classification of hyper-spectral edges.
In Proceedings of the Tenth British Machine Vision Conference, pages 643–651.
CRI Repro Systems Ltd., 1999.
[21] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, Boston, 1994.
[22] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.
[23] S. Di Zenzo. A note on the gradient of a multi-image. Comput. Vision Graphics
Image Processing, 33:116–125, 1986.
Part II
Geometrical Structure
Chapter 5
Robust Autofocusing in Microscopy

appeared in Cytometry, vol. 39, pp. 1–9, 2000.
European patent application filed under no. 99201795.4 on June 4, 1999.
“The way the Nutri-Matic machine functioned was very interesting. When the
Drink button was pressed it made an instant but highly detailed examination
of the subject’s taste buds, a spectroscopic analysis of the subject’s metabolism
and then sent tiny experimental signals down the neural pathways to the taste
centres of the subject’s brain to see what was likely to go down well. However,
it invariably produced a plastic cup filled with a liquid which was almost, but
not quite, entirely unlike tea.”
in The Hitch Hikers Guide to the Galaxy, by Douglas Adams.
Along with the introduction of high throughput screenings, quantitative microscopy is gaining importance in pharmaceutical research. Fully automatic acquisition of microscope images in an unattended operation coupled to an automatic
image analysis system allows for the investigation of morphological changes. Time
lapse experiments reveal the effect of drug compounds on the dynamics of living cells.
Histochemical assessment of fixed tissue sections is used to quantify pathological
modification.
A critical step in automatic screening is focusing. Fast and reliable autofocus
methods for the acquisition of microscope images are indispensable for routine use on
a large scale. Autofocus algorithms should be generally applicable to a large variety
of microscopic modes and to a large variety of preparation techniques and specimen
types. Although autofocusing is a long-standing topic in the literature [4, 5, 7, 8, 9, 10, 11,
19], no such generally applicable solution is available. Methods are often designed for
one kind of imaging mode. They have been tested under well-defined circumstances.
The assumptions made for determining the focal plane in fluorescence microscopy
are not compatible with the same in phase contrast microscopy, and this holds true
throughout. We consider the design of a method which is generally applicable in light
microscopy.
From Fourier optics [13] it has been deduced that well-focused images contain
more detail than images out of focus. A focus score is used to measure the amount of
detail. The focus curve can be estimated from sampling the focus score for different
levels of focus. Some examples of focus curves are shown in fig. 5.2. Best focus is
found by searching for the optimum in the focus curve. In a classical approach the
value of the focus score is estimated for a few focus positions [8, 19, 2]. Evaluation of
the scores indicates where on the focus curve to take the next sample. Repeating the
process iteratively should ensure convergence to the focal plane. A major drawback
is that such an optimization procedure presupposes a. a uni-modal focus function, and
b. a broad-tailed extremum to obtain a wide focus range, neither of which holds true in
general. In reality, the focus curve depends on the microscope setup, imaging mode
and preparation characteristics [16]. When the assumed shape of the focus curve
does not match the real focus curve, or when local extrema emerge, convergence to
the focal plane is not guaranteed.
Groen et al. [5] specify criteria for the design of autofocus procedures. We adopt
these criteria of good focusing: a. accuracy, b. reproducibility, c. general applicability,
d. insensitivity to other parameters. Under insensitivity to other parameters is
understood robustness against noise and optical artifacts common to microscopic
image acquisition. Further, we reject the criterion of unimodality of the focus curve,
which cannot be achieved in practice [16, 15]. As a consequence, the range or broadness
of the extremum in the focus curve is of less relevance.
In this report, an autofocus method is presented which is generally applicable in
different microscopic modes. The aim was to develop a method especially suited for
an unattended operational environment, such as high throughput screenings. Therefore,
the method should be robust against confounding factors common in microscopy,
such as noise, optical artifacts and dust on the preparation surface. To evaluate the
performance of the autofocus method, experiments have been conducted in screening
applications.
5.1 Material and Methods

5.1.1 The Focus Score
From Fourier optics, measurement of the focus score can best be based on the energy
content of a linearly filtered image [5, 13]. From [5, 8, 16] it can be deduced that an
optimal focus score is output by the gradient filter. Scale-space theory [20] leads to
the use of the first order Gaussian derivative to measure the focus score. The σ of the
Gauss filter determines the scale of prominent features. The focus function becomes
F(σ) = 1/(NM) Σ_{x,y} [f(x, y) ∗ Gx(x, y, σ)]² + [f(x, y) ∗ Gy(x, y, σ)]²
     = 1/(NM) Σ_{x,y} fx² + fy²    (5.1)
where f (x, y) is the image grey value, Gx (x, y, σ) and Gy (x, y, σ) are the first order
Gaussian derivatives in the x- and y-direction at scale σ, N M is the total amount
of pixels in the image, and fx , fy are the image derivatives at scale σ in the x- and
y-direction, respectively.
Often, a trade-off between noise sensitivity and detail sensitivity can be observed
for a specific microscope set-up. For example, in fluorescence microscopy the signal
to noise ratio (SNR) is often low, and relatively smooth images are examined. For
phase contrast microscopy, SNR is high, and small details (the phase transitions)
have to be detected. Accuracy of autofocusing depends on the signal to noise ratio
as propagated through the focus score filter [19]. Therefore, the σ of the Gaussian
filter should be chosen such that noise is maximally suppressed, while the response
to details of interest in the image is preserved. For bar-like structures, the value of σ
should conform to [17]
σ ≈ d / (2√3)    (5.2)
where the thickness of the bar is given by d. Assuming that the smallest detail to be
focused may be considered bar shaped, (eq. 5.2) gives an indication for the minimal
value of σ. Note that the filter response degrades for smaller values, whereas a very
large value smooths all details to noise level.
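A minimal sketch (ours) of this focus score, with σ derived from an assumed bar
width d (in pixels) via (eq. 5.2); scipy's Gaussian derivative filter stands in for
Gx and Gy.

# Illustrative sketch (ours) of the focus score (eq. 5.1) with sigma
# chosen from the smallest bar-like detail (eq. 5.2).
import numpy as np
from scipy.ndimage import gaussian_filter

def focus_score(image, d=3.0):
    sigma = d / (2.0 * np.sqrt(3.0))                 # eq. 5.2
    f = image.astype(float)
    fx = gaussian_filter(f, sigma, order=(0, 1))     # first order Gaussian
    fy = gaussian_filter(f, sigma, order=(1, 0))     # derivatives at scale sigma
    return np.mean(fx**2 + fy**2)                    # eq. 5.1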
5.1.2 Measurement of the Focus Curve
Consider a system consisting of the following hardware: 1) a microscope with scanning
stage and position controller for both axial and lateral direction, 2) a camera placed
on the microscope recording its field of view, 3) a video digitizer connected to a
computer system, writing the camera output at video rate into the computer's memory.
The computer system is able to send positioning commands to the stage controller.
Examples of such systems will be given later.
The focal plane of the microscope is assumed to be within a pre-defined interval
∆z around the start z-position z. The scanning stage is moved down to the position
zmin = z − ½∆z. Backlash correction is applied by sending the stage further down than
necessary, and raising it again to the given position [10]. In this way, focus positions
are always reached from the same direction. As a result, mechanical tolerance in
cog-wheels is eliminated.
At t = 0 ms, the stage controller starts raising the stage to traverse the complete
focus interval ∆z. During the stage movement through focus, successive images of
the preparation are captured at 40 ms intervals (video rate). The focus score of each
captured image is calculated. The image buffer is re-used for the next video frame,
necessitating only two image memory buffers to be active at any time. One of the
buffers is used for focus score calculation of the previously captured image, while the
other is used for capturing the next image. Calculation of the focus score should thus
be performed within one video frame time.
As soon as the stage has reached the end of the focus interval, timing is stopped
at t = td ms. An estimation of the focus curve is obtained for the complete focus
interval. The global optimum in the estimate for the focus curve represents the focal
plane. Now, each z-position is related to the time at which the corresponding image
has been captured. When linear movement of the stage is assumed, the position at
which the image at time ti is taken corresponds to
zi = (ti / td) ∆z + zmin    (5.3)
where td represents the travel duration, ∆z is the focus interval, and zmin is the start
position (position at t = 0 ms).
Since the focus curve is parabolic around the focal plane [10, 11, 19], high focus
precision can be achieved by quadratic interpolation. When assuming linear stage
movement, or z = vt + zmin , the focus curve around the focal plane can be approximated by
s(t) = c + bt + at²    (5.4)
The exact focus position is obtained by fitting a parabola through the detected
optimum and its neighboring measurements. Consider the detected optimum s(to) = so
at time t = to. The time axis may be redefined such that the detected optimum is at
time t = 0. Then, the neighboring scores are given by (sn, tn) and (sp, tp), respectively.
Solving for a, b and c gives
c = so ,   b = (−so tn² + sp tn² + so tp² − sn tp²) / (tn² tp − tn tp²) ,
           a = (so tn − sp tn − so tp + sn tp) / (tn² tp − tn tp²)    (5.5)
The peak of the parabola, and thus the elapsed time to the focus position, is given by
tf = −b / (2a) + to = (so tn² − sp tn² − so tp² + sn tp²) / ( 2 (so tn − sp tn − so tp + sn tp) ) + to    (5.6)
The focal plane is at position
zf = (tf / td) ∆z + zmin    (5.7)

to which is moved, taking the backlash correction into account.
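The fit of (eq. 5.4)-(eq. 5.7) amounts to a few lines; the sketch below (ours)
returns the interpolated focal position given the detected optimum and its two
neighboring samples.

# Illustrative sketch (ours): sub-frame focus estimate by the parabolic
# fit of (eq. 5.4)-(eq. 5.7).
def focus_position(t_n, s_n, t_o, s_o, t_p, s_p, dz, t_d, z_min):
    tn, tp = t_n - t_o, t_p - t_o   # shift time axis so the optimum is at t = 0
    num = s_o*tn**2 - s_p*tn**2 - s_o*tp**2 + s_n*tp**2
    den = 2.0 * (s_o*tn - s_p*tn - s_o*tp + s_n*tp)
    t_f = num / den + t_o           # eq. 5.6
    return t_f / t_d * dz + z_min   # eq. 5.7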
5.1.3 Sampling the Focus Curve
The depth of field of an optical system is defined as the axial distance from the
focal plane over which details still can be observed with satisfactory sharpness. The
thickness of the slice which can be considered in focus is then given by [14, 26]
zd = λ / ( 2n ( 1 − √(1 − (NA/n)²) ) )    (5.8)
where n is the refractive index of the medium, λ the wavelength of the light used,
and NA the numerical aperture of the objective. The focus curve is sampled at the
Nyquist rate when measured at zd intervals [18]. The parabolic fitting ensures that
the focus position is centered within thick specimens, i.e. specimens much larger
than zd. Common video hardware captures frames at a fixed rate. Thus the sampling
density of the focus curve can only be influenced by adjusting the stage velocity to
travel zd µm per video frame time.
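As a worked example (ours): for the 5× / NA 0.15 bright-field setup with 530 nm
light, (eq. 5.8) gives the 23.4 µm depth of field listed in tab. 5.1, which then
fixes the stage velocity.

# Illustrative sketch (ours): depth of field (eq. 5.8) and the stage
# velocity that samples the focus curve at one z_d per 40 ms video frame.
import math

def depth_of_field(wavelength_um, NA, n=1.0):
    return wavelength_um / (2.0*n * (1.0 - math.sqrt(1.0 - (NA/n)**2)))

z_d = depth_of_field(0.530, NA=0.15)   # ~23.4 um for the 5x / NA 0.15 objective
v_eff = z_d                            # um per video frame time (40 ms)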
In order to calculate the focus score within video frame time for current sensors
and computer systems, simplification of the focus function (eq. 5.1) is considered. For
biological preparations, details are distributed isotropically over the image. The response of the filter in one direction is adequate for determination of the focal plane.
Further computation time can be saved by estimating the filter response from a fraction of the scan lines in the image. Then, the focus function is given by
F(σ) = L/(NM) Σ_{x,y} [f(x, y) ∗ Gx(x, y, σ)]² .    (5.9)
For our purpose, every sixth row (L = 6) is used. A recursive implementation
of the Gaussian derivative filter is used [22], for which the computation time
is independent of the value of σ. The calculation time is kept under 40 ms for all
computer systems we used in the experiments, even when the system is running other
tasks simultaneously. Comparison between the focus curve calculated in two dimensions
for the whole image (eq. 5.1) and the response of (eq. 5.9) reveals only marginal
differences for all experiments.
5.1.4 Large, Flat Preparations
For the acquisition of multiple aligned images from large, flat preparations, the variation in focus position is assumed small but noticeable at high magnification. Proper
acquisition of adjacent images can be obtained by focusing a few fields. Within the
preparation, the procedure starts by focusing the first field. Fields surrounding the
focused field are captured, until the next field to capture is a given distance away
from the initially focused field. Deviation from best focus is now corrected for by
78
Chapter 5. Robust Autofocusing in Microscopy
focusing over a small interval. The preparation is scanned, keeping track of focus
position at fields further away than a given distance from the nearest of all the previously focused fields. The threshold distance for which focusing is skipped depends
on the preparation flatness and magnification, and has to be empirically optimized
for efficiency. Fields that have been skipped for focusing are positioned at the focus
level of the nearest focused field. Small variations in focus position while scanning
the preparation are corrected during acquisition.
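A minimal sketch (ours) of this scan strategy, under the simplifying assumption that
refocusing is a single call; the small-interval correction is abstracted into the
hypothetical autofocus() helper.

# Illustrative sketch (ours) of the lazy refocusing strategy described
# above: a field is refocused only when it is farther than a threshold
# from every previously focused field; otherwise the focus level of the
# nearest focused field is reused.
import math

def scan(fields, autofocus, threshold):
    # fields: iterable of (x, y) stage positions; autofocus(x, y) -> z.
    focused = {}                      # (x, y) -> z of fields focused so far
    for x, y in fields:
        near = min(focused, key=lambda p: math.dist(p, (x, y)), default=None)
        if near is None or math.dist(near, (x, y)) > threshold:
            z = focused[(x, y)] = autofocus(x, y)   # full focus sweep
        else:
            z = focused[near]                       # reuse nearest focus level
        yield x, y, z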
5.1.5 Preparation and Image Acquisition
The autofocus algorithm is intensively tested in the following applications: a. quantitative
neuronal morphology, b. time-lapse experiments of cardiac myocyte dedifferentiation,
c. immunohistochemical label detection in fixed tissue, d. C. Elegans
GFP-VM screening, e. acquisition of smooth muscle cells, and f. immunocytochemical
label detection in fixed cells. Each of these applications is described below. The
software package SCIL Image version 1.4 [21] (TNO-TPD, Delft, The Netherlands) is
used for image processing, extended with the autofocus algorithm and functions for
automatic stage control and image capturing. All preparations are observed on Zeiss
inverted microscopes (Carl Zeiss, Oberkochen, Germany), except for the immunohistochemical
label detection, which is observed with a Zeiss Axioskop. The wavelength
of the light used is 530 nm, unless stated otherwise. For automatic position control,
the microscopes are equipped with a scanning stage and MAC4000 or (comparable)
MC2000 controller (Märzhäuser, Wetzlar, Germany). At power-on, the stage
is calibrated and an initial focus level is indicated manually. Backlash correction is
empirically determined. For each application, the focus interval ∆z is determined by
evaluating the variability in the z-position between focus events.
Quantitative Neuronal Morphology in Bright-field Mode
Morphological changes of neurons are automatically quantified as described in [12].
Briefly, PC12 cells were plated in poly-L-lysine (Sigma, St. Louis, MO) coated 12-well
plates. In each well 5 × 104 cells were seeded. After 24 hours the cells were fixed with
1% glutaraldehyde for 10 minutes. Then the cells were washed twice with distilled
water. The plates were dried in an incubator.
The plates are examined in bright-field illumination mode, for details see tab. 5.1.
The camera used is an MX5 (Adaptec, Eindhoven, The Netherlands) 780 × 576 video
frame transfer CCD with pixel size 8.2 × 16.07 µm², operating at room temperature
with auto gain turned off. Adjacent images are captured by an Indy R4600 132MHz
workstation (Silicon Graphics, Mountain View, CA), resulting in an 8 × 8 mosaic
image for each well. Prior to the acquisition of the well, autofocusing at the center of
the scan area is performed. The smallest details to focus are the neurites, which are
about 3 pixels thick, yielding σ = 1.0 (eq. 5.2). The wavelength of the illumination is
about 530 nm, resulting in 23.4 µm depth of field (eq. 5.8). The effective stage velocity
is somewhat different due to rounding off to controller built-in speeds. Due to the
low magnification, backlash correction is not necessary.
Cardiac Myocyte Dedifferentiation in Phase Contrast Mode
Cardiac myocytes were isolated from adult rat (ca. 250 g) hearts by collagenase
perfusion as described in [3]. The cell suspension containing cardiomyocytes and
fibroblasts was seeded on laminin coated plastic petri dishes, supplied with M199 and
incubated for one hour. Thereafter, unattached and/or dead cells were washed away
by rinsing once with M199. The petri dishes were filled with M199 +20% fetal bovine
serum and incubated at 37◦ C.
The petri dishes are examined in phase contrast mode, for details see tab. 5.1. During the experiment, ambient temperature is maintained at 37◦ C. Time-lapse recordings (15 hours) are made in 6 manually selected fields, one in each of the 6 petri
dishes. The scanning stage visits the selected fields at 120 second intervals. Fields
are captured using a CCD camera (TM-765E, Pulnix, Alzenau, Germany). They are
added to JPEG compressed digital movies (Indy workstation with Cosmo compressor
card, SGI, Mountain View, CA), one for each selected field. Autofocusing is applied
once per cycle, successively refocusing all the fields in 6 cycles. The smallest details
to focus are the cell borders.
Immunohistochemical Label Detection in Bright-field Mode
Sections of the amygdala of mice injected with a toxic compound were cut at 15 µm
thickness through the injection site. They were subsequently immunostained for the
presence of the antigen, using a polyclonal antibody (44-136, Quality Control Biochemicals Inc., Hopkinton, MA) and visualized using the chromogen DAB.
Four microscope slides (40 brain slices) at once are mounted on the scanning
stage and observed in bright-field illumination mode, see tab. 5.1. Adjacent images
are captured (Meteor/RGB frame-grabber, Matrox, Donval, Quebec, Canada in an
Optiplex GXi PC with Pentium 200MHz MMX, Dell, Round Rock, TX) by use of
an MX5 CCD camera (Adaptec, Eindhoven, The Netherlands). As a result, mosaics
of complete brain slices are stored on disk. Prior to acquisition, autofocusing at
approximately the center of the brain slice is performed, the smallest details to focus
being tissue structures. Due to the low magnification, backlash correction is not
necessary.
C. Elegans GFP-VM Screening in Fluorescence Mode
Individual C. Elegans worms transgenic for GFP expressing vulval muscles (GFPVM) were selected from stock, and one young adult hermafrodite (P0 ) was placed in
80
Chapter 5. Robust Autofocusing in Microscopy
each of the 60 center wells of a 96-well plate (Costar, Acton, MA) filled with natural
growth medium, and incubated for five days at 25◦ C to allow F1 progeny to reach
adult stage.
Before image acquisition, fluorescent beads (F-8839, Molecular Probes, Eugene,
OR) are added to the wells as background markers for the focus algorithm. The well
plate is examined in fluorescence mode, see tab. 5.1. A FITC filter (B, Carl Zeiss,
Oberkochen, Germany) in combination with a 100W Xenophot lamp is used to excite
the GFP. Images are captured (O2 R5000 180 MHz workstation, Silicon Graphics,
Mountain View, CA) using an intensified CCD camera (IC-200, PTI, Monmouth
Junction, NJ). Each of the selected wells is scanned and the adjacent images, completely covering the well, are stored on disk. Variability in the z-position between the
center of the wells turned out to be within 250 µm, which is taken as focus interval for
initial focusing. After autofocusing on the well center, deviation from best focus while
scanning the well is corrected over one-fifth of the initial focus interval. Focusing of
all fields further than 3 fields away from a focused field was sufficient to keep track of
the focal plane. The diameter of the fluorescent spheres is 15 µm (30 pixels), which
is much larger than zd . Since the spheres are homogeneously stained, the smallest
detail to consider in the z-direction is a cylindrically shaped slice through the spheres,
where the cylinder height is determined by the horizontal resolution. Therefore, stage
velocity is reduced to approximately one third of the sphere diameter during focusing.
Acquisition of Smooth Muscle Cells in Phase Contrast Mode
Smooth muscle cells were enzymatically isolated from the circular muscle layer of
guinea-pig ileum by a procedure adapted from [1]. Dispersed cells were suspended in
a HEPES buffered saline containing 1 mM CaCl2 . Aliquots (200 µl) of the cell suspension were distributed over test tubes and maintained at 37◦ C for 30 minutes. Then,
800 µl of medium containing the compound to be tested was added and cells were
incubated for 30 seconds. The reaction was stopped by addition of 1% glutaraldehyde.
A drop of each cell suspension is brought on a microscopic glass slide, and observed
in phase contrast mode (see tab. 5.1). A region containing sufficient cells is selected
manually and adjacent images are captured (Indy R4600 132MHz workstation, Silicon
Graphics, Mountain View, CA) using an MX5 CCD camera (Adaptec, Eindhoven,
The Netherlands). Autofocusing is performed at approximately the center of the
selected area, the smallest details being the elongated cells.
Immunocytochemical Label Detection in Fluorescence Mode
Human fibroblasts were seeded in a 96-well plate (Costar, Acton, MA) at 7000 cells
per well, in 2% FBS/Optimem. Cells were immunostained according to [6] with
primary antibody rabbit anti human NF-κ B (p65) (Santa Cruz Biotechnology, Santa
Cruz, CA) and secondary Cy3-labeled sheep anti-rabbit (Jackson, West Grove, PA).
Further, nuclear counter staining with Hoechst 33342 (Molecular Probes, Eugene,
OR) was applied.

Table 5.1: Summary of the experimental setup and parameter settings for the various experiments.
The value for sigma (eq. 5.2) is given together with the smallest structure (d) in pixels. The focus
interval ∆z and depth of field zd are given in [µm]. The effective velocity used during focusing is
given by veff in [µm / 40 ms].

application             mode      obj   (NA)     σ    (d)   ∆z     zd          veff
Quant neuronal morph    bright    5×    (0.15)   1.0  (3)   500    23.4        24.7
Cardiac myocyte dediff  phase     32×   (0.4)    1.0  (4)   100    3.2         2.5
Immunohist label det    bright    2.5×  (0.075)  1.0  (3)   1,000  94          98.7
C. Elegans screening    fluoresc  40×   (0.6)    8.5  (30)  50     1.33        4.94
Acq smooth muscle       phase     10×   (0.3)    1.0  (4)   500    5.75        4.94
Immunocyt label det     fluoresc  40×   (0.6)    8.5  (30)  250    1.13/1.50   4.94
Well plates are examined in fluorescence mode, see tab. 5.1. A DAPI-FITC-TRITC
filter (XF66, Omega Optical, Brattleboro, VT) in combination with a 100W Xenophot
lamp is used to excite the cells (emission nuclei at 450 nm, immuno signal at 600 nm).
Adjacent images are captured (O2 workstation R5000 180 MHz, Silicon Graphics,
Mountain View, CA) using an intensified CCD camera (IC-200, PTI, Monmouth
Junction, NJ). Autofocusing is performed at approximately the center of the scan
area, the smallest details being the nuclei. Cell thickness is about 5–15 µm, much
larger than zd. Therefore, during focusing, stage velocity is reduced such that the
stage travels approximately the cell thickness per frame.
5.1.6 Evaluation of Performance for High NA
The autofocus algorithm performance is objectively evaluated by comparing its random
focus error with that of human observers. For this purpose, 2 µm Epon sections of dog left ventricle
cardiac myocytes stained with periodic acid-Schiff and toluidine blue are observed
with a Zeiss Axioplan. A high-NA 40× NA 1.4 oil-immersion objective is used, for
which the depth of field is zd = 0.36 µm (eq. 5.8). Autofocusing is considered not
trivial under these circumstances. Unfocused, arbitrarily selected fields (20 in total)
are visited and manually focused by two independent experienced observers. Focus
positions are recorded for both observers. Similarly, the focus positions found by the
autofocus algorithm are recorded (σ = 1.0, backlash correction 15 µm, ∆z = 25 µm).
Comparison of the random error between observers and for observer vs. autofocus
gives an objective evaluation of autofocus performance.
[Plot omitted: focus score (arbitrary units) versus z-position [µm], from −250 to 250 µm.]
Figure 5.1: Focus function as measured for the smooth muscle cells in phase contrast mode. The
focus score (arbitrary units) of one representative field is plotted as function of the z-position. The
peaks are caused by phase transition effects; the focal plane for the cell bodies is at −75 µm.
5.2 Results
5.2.1 Autofocus Performance Evaluation
The focus algorithm was not able to focus accurately on the smooth muscle cells.
Figure 5.1 shows a representative focus curve measured with σ = 1.0. Measurement
of the focus curve at other scales resulted in similar curves. The peaks are caused
by phase transitions occurring when scanning through focus. For different focus positions, bright halos appear around the cells due to light diffraction [15]. The area
of the cell bodies is small compared to the size of the halos, and thus the relevant
image information content is too low. These circumstances caused failure of the focus
algorithm to accurately focus on the cell bodies.
For the other applications, fig. 5.2 shows the average focus curves, not considering
complete failures. The variation in focus score is mainly due to the different number
of cells or amount of tissue present in each field. For the time lapse of the cardiac
myocytes (fig. 5.2b), variation in focus score is caused by the dedifferentiation of the
cardiac myocytes over time. The variation in focus score for the immunohistochemical
label detection (fig. 5.2c) is caused by contrast differences between slices. Further, for
the quantitative neuronal morphology (fig. 5.2a), the measured focus curve with lowest
maximum score (peak at 0.004) is at a field containing only some dead cells. Note
the local maximum beneath focus, caused by a 180° phase shift in the point spread
function of the optical system [25].
Table 5.2 shows a summary of autofocus performance. All fields were accurately
focused according to an experienced observer, except for a few complete failures. Focus could not be determined on empty fields, as is the case for 14 failures in the C.
Elegans GFP-VM screening. For the immunohistochemical label detection, focusing
Table 5.2: Summary of the results for the various experiments. The total number of focus events is
denoted by # events. The time needed for focusing is given by tfoc in seconds, and as percentage of
the total acquisition time tacq .

application              mode      # events   fail   (correct)   tfoc   tacq    (tacq)
Quant neuronal morph     bright    180        0      (100%)      1.7    7.5%    (4.5 min.)
Cardiac myocyte dediff   phase     75         0      (100%)      2.8    —       —
Immunohist label det     bright    100        2      (98%)       1.5    7%      (3 min.)
C. Elegans screening     fluoresc  1800       14     (> 99%)     1.1    12%     (4.5 hour)
Immunocyt label det      fluoresc  300        2      (> 99%)     2.8    14%     (20 min.)
failed on 2 fields, which did not contain enough contrast for focusing. Further, for
2 fields in the immuno signal of the immunocytochemical label detection, the camera
was completely saturated (bloomed) due to preparation artifacts, causing the autofocus algorithm to fail. For the C. Elegans GFP-VM screening, total acquisition time
for a 96-well plate was 4.5 hours for 28,000 images, which is reasonable given the time
needed for preparation.
In summary, failure is caused by a shortage of relevant image information content. The proposed algorithm was completely successful in determining correct focus
position for the thoroughly stained preparations of the quantitative neuronal morphology, even for fields containing only a few dead cells. Further, complete success
was achieved for the cardiac myocyte dedifferentiation. Despite the morphological
changes in image content during the experiment, none of the time lapse movies was
out of focus any time. A high success rate was obtained for the immunohistochemical
label detection, failing only for the 2 fields that did not contain enough contrast. For the fluorescence applications, the images were highly degraded by the presence of random noise
(SNR ≤ 10 dB) due to fluorescent bacteria (C. Elegans screening), camera noise and
structural noise caused by earth loops in combination with the extremely sensitive
CCD camera. Nevertheless, a high success rate was achieved.
5.2.2 Evaluation of Performance for High NA
Comparison between observer 1 and observer 2 resulted in an average error of
0.070 µm, whereas autofocus versus observer 1 resulted in 0.423 µm error. Hence,
the autofocus method as implemented is slightly biased. The root mean squared
error was 0.477 µm between observers, and 0.494 µm between autofocus and observer,
both of which are within the range of the depth of field for the objective used. Maximum
error between observers was 1.27 µm, and for autofocus versus observer 1.12 µm, both
within the slice thickness of 2 µm. In conclusion, even for high-NA objectives, autofocus
performance is comparable to that of experienced observers.
[Plots omitted: six panels of average focus score (arbitrary units) versus z-position [µm], each showing mean, min, and max curves.]
Figure 5.2: Average focus score (arbitrary units) as function of the z-position measured for different
applications. a. Quantitative neuronal morphology. b. Cardiac myocyte dedifferentiation. c.
Immunohistochemical label detection. d. C. Elegans GFP-VM screening. e. Immunocytochemical
label detection nuclei and f. immuno signal, respectively. The measured focus curves indicated by
“max” and “min” represent the focus events resulting in the highest and lowest maximum score,
respectively, indicating variability and influence of noise on the estimate of the focus score.
5.2.3 Comparison of Performance with Small Derivative Filters
In order to evaluate the effect of the scale σ in the estimate for the focus score,
experiments with σ = 0.5 are performed.
For the quantitative neuronal morphology, accurate focusing with σ = 0.5 was not
possible for 1 out of 24 fields. In this case, the algorithm focused on the reversed phase
contrast image. Application of the small scale in focusing of the cardiac myocyte dedifferentiation failed whenever fungal contamination at the medium surface occurred,
which was then taken as the focal plane. Taking σ = 1.0 solved this problem by consistently focusing on the myocytes. Focusing with σ = 0.5 on the immunohistochemical
label detection resulted in focusing on dust particles at the glass surface for 5 out of
24 fields. For the fluorescence applications, accurate focusing was not possible with
σ = 0.5, due to the small signal to noise ratio (SNR ≤ 10 dB). Experiments taken
with σ = 0.75 resulted in inaccurate focusing for 18 out of 30 fields for the C. Elegans
GFP-VM screening. Further, the algorithm was not able to focus accurately on 13
out of 30 fields for the nuclei in the immunocytochemical label detection, and failed
for 17 out of 30 fields on the immuno signal. Repeating these experiments with the
values of σ as given in tab. 5.1 resulted in accurate focus for all fields.
5.2.4 General Observations
The scale σ provides robustness against noise and artifacts. A larger scale
resulted in robustness against phase reversion (quantitative neuronal morphology),
fungal contamination at the medium surface (cardiac myocyte dedifferentiation), dust
on the glass surface (immunohistochemical label detection) and noise (the fluorescence
applications). The performance of small differential filters, as used in [2, 5, 19, 16], is
poor given the number of inaccurately focused images for σ = 0.5 or σ = 0.75.
For the different applications, the chosen focus interval was effectively used for
about 30%, i.e. the top of the measured focus curve was commonly within one-third
of the focus interval centered at the origin. The focus interval should not be taken
too narrow, to ensure that the focal plane is inside the interval regardless of manual
placement of the preparations. An effective use of 30% of the interval for 95% of the
focus events seems an acceptable rule of thumb.
The time needed for the autofocus algorithm varied from 1.5 up to 2.8 seconds for
current sensors and computer systems, which is in the same time range as experienced
observers. Focus time is completely determined by the depth of field and the video
frame time, which both can be considered as given quantities, and by the size of the
focus interval. Therefore, further reduction of focus time can only be achieved by a
smaller focus interval, on the condition that the variability in preparation position is
limited. When positional variability is low or well known, the focus interval ∆z can
86
Chapter 5. Robust Autofocusing in Microscopy
be reduced to exactly fit the variability. For the applications given, focus time can be
reduced by up to a factor of 3 in this way.
Failure of the autofocus algorithm due to a shortage of image content can be well
predicted. If the focal plane is inside the focus interval, there should be a global maximum in the estimate of the focus curve. Comparing the maximum focus score so with
the highest of the focus scores at the ends of the focus interval, se = max(s(0), s(td )),
which are certainly not in focus, determines the signal content with respect to noise.
When the maximum score does not significantly exceed the focus scores at the ends of
the interval, that is, when (so − se )/se < α, the found focus position should be rejected. In this
case, focusing is better based on a neighboring field. For the reported results, a
threshold of α = 10% safely predicts all failures.
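The rejection rule lends itself to a few lines of code. Below is a minimal sketch in Python; the function name and the example score values are hypothetical, as the thesis does not prescribe an implementation:

    # Sketch of the failure-prediction rule: reject a focus sweep when the
    # peak score does not rise significantly above the interval ends.
    def accept_focus(scores, alpha=0.10):
        """scores: focus scores sampled along the focus interval, with
        scores[0] and scores[-1] at the interval ends (assumed out of focus);
        alpha: required relative margin (s_o - s_e) / s_e."""
        s_o = max(scores)                     # maximum focus score
        s_e = max(scores[0], scores[-1])      # highest end-of-interval score
        return (s_o - s_e) / s_e >= alpha

    # A clear peak is accepted; a nearly flat curve (e.g. an empty field) is
    # rejected, and focusing would fall back to a neighboring field.
    print(accept_focus([0.08, 0.12, 0.35, 0.90, 0.40, 0.15, 0.09]))  # True
    print(accept_focus([0.100, 0.105, 0.102, 0.104, 0.101]))         # False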
5.3 Discussion
The success of automatic morphological screenings stands or falls with the accuracy of
autofocus procedures. Although focusing is trivial for a trained observer, automatic
systems often fail to focus images in different microscopic modalities. Autofocus procedures are often optimized for one specific preparation, visualized in one microscopic
imaging mode. This report presents a method for autofocusing in multi-mode light
microscopy. The objective was to develop a focus algorithm which is generally applicable in microscopy, and robust against confounding factors common in microscopy.
Defocused images inherently have less information content than well-focused images [5, 8]. Focus functions based on this criterion, such as the Gaussian derivative
filter used in the presented method, by definition respond to the best focus position
with a local maximum. Reliable focusing, without taking a-priori information into
account, is possible whenever the best focus response becomes the global maximum.
This criterion is fulfilled when the information content due to the signal is higher than
that of optical artifacts, inherent to some modes of microscopic image formation, and
noise. Sampling of the focus curve at Nyquist rate over the complete focal range
guarantees detection of the global maximum. Consequently, the present autofocus
method is generally applicable in any microscopic mode, whenever the amount of
detail in the preparation is of larger influence than artifacts and noise.
The effectiveness of the proposed method has been evaluated experimentally for
the following specimens: neuronal cells in bright-field, cardiac myocytes in phase contrast, neuronal tissue sections in bright-field, fluorescent beads and GFP-VM-expressing C. Elegans nematodes, smooth muscle cells in phase contrast, and immunocytochemically fluorescently labeled fibroblasts. The method was not able to focus the
smooth muscle cells accurately, due to a lack of relevant image information content.
For the other experiments, 2830 fields were focused with an overall success rate of
99.4%, and for the remaining 0.6% failure could be safely predicted. For each new
specimen and microscope set-up, it suffices to set the parameters for scale σ, focus
interval ∆z and focus speed, which can be derived from the size of the structures in
the specimen, the illumination used, and the objective NA. In addition, for the scanning of large
preparations, the distance after which focus has to be corrected and the fraction of the
focus interval to correct for should be set.
In contrast to other autofocus methods, the proposed algorithm is robust against
confounding factors such as: a. noise, b. optical artifacts inherent to a particular
mode of microscopic image formation, e.g. halos in phase-contrast microscopy, and c.
artifacts such as dust and fungal contamination, lying at a different focus level than
the preparation. Focusing is performed within 2 or 3 seconds, which is in the same
time range as trained observers. Moreover, even for high NA objectives, autofocus
accuracy is comparable to experienced observers. For high magnification imaging of
thick specimens, the method can be easily combined with focal plane reconstruction
techniques [23, 24].
No constraints have been imposed on the focus curve other than that the global
maximum indicates the focal plane. Hence, the method is generally applicable in light
microscopy. The reliability of the proposed autofocus method allows for unattended
operation on a large scale.
Bibliography
[1] K. N. Bitar and G. M. Makhlouf. Receptors on smooth muscle cells: Characterization by contraction and specific antagonists. J. Physiology, 242:G400–407,
1982.
[2] F. R. Boddeke, L. J. van Vliet, H. Netten, and I. T. Young. Autofocusing in
microscopy based on the OTF and sampling. Bioimaging, 2:193–203, 1994.
[3] L. Ver Donck, P. J. Pauwels, G. Vandeplassche, and M. Borgers. Isolated rat
cardiac myocytes as an experimental model to study calcium overload: the effect
of calcium-entry blockers. Life Sci., 38:765–772, 1986.
[4] L. Firestone, K. Cook, K. Culp, N. Talsania, and K. Preston Jr. Comparison of
autofocus methods for automated microscopy. Cytometry, 12:195–206, 1991.
[5] F. C. A. Groen, I. T. Young, and G. Ligthart. A comparison of different focus
functions for use in autofocus algorithms. Cytometry, 6:81–91, 1985.
[6] T. Henkel, U. Zabel, K. van Zee, J. M. Müller, E. Fanning, and P. A. Baeuerle.
Intramolecular masking of the nuclear location signal and dimerization domain
in the precursor for the p50 NF-κB subunit. Cell, 68:1121–1133, 1992.
[7] E. T. Johnson and L. J. Goforth. Metaphase spread detection and focus using
closed circuit television. J. Histochem. Cytochem., 22:536–545, 1974.
[8] E. Krotkov. Focusing. Int. J. Computer Vision, 1:223–237, 1987.
[9] S. J. Lockett, K. Jacobson, and B. Herman. Application of 3d digital deconvolution to optically sectioned images for improving the automatic analysis of
fluorescent-labeled tumor specimens. Proc. SPIE, 1660:130–139, 1992.
[10] D. C. Mason and D. K. Green. Automatic focusing of a computer-controlled
microscope. IEEE Trans. Biomed. Eng., 22:312–317, 1975.
[11] M. L. Mendelsohn and B. H. Mayall. Computer-oriented analysis of human
chromosomes-iii: Focus. Comput. Biol. Med., 2:137–150, 1971.
[12] R. Nuydens, C. Heers, A. Chadarevian, M. de Jong, R. Nuyens, F. Cornelissen, and H. Geerts. Sodium butyrate induces aberrant tau phosphorylation and
programmed cell death in human neuroblastoma cells. Brain Res., 688:86–94,
1995.
[13] A. Papoulis. The Fourier Integral and Its Applications. McGraw-Hill, New York,
1960.
[14] M. Pluta. Advanced Light Microscopy, volume 1. Elsevier, Amsterdam, 1988.
[15] M. Pluta. Advanced Light Microscopy, volume 2. Elsevier, Amsterdam, 1989.
[16] J. H. Price and D. A. Gough. Comparison of phase-contrast and fluorescence
digital autofocus for scanning microscopy. Cytometry, 16:283–297, 1994.
[17] C. Steger. An unbiased detector of curvilinear structures. IEEE Trans. Pattern
Anal. Machine Intell., 20:113–125, 1998.
[18] N. Streibl. Depth transfer by an imaging system. Opt. Act., 31:1233–1241, 1984.
[19] M. Subbarao and J. K. Tyan. Selecting the optimal focus measure for autofocusing and depth-from-focus. IEEE Trans. Pattern Anal. Machine Intell.,
20:864–870, 1998.
[20] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, Boston, 1994.
[21] R. van Balen, D. Koelma, T. K. ten Kate, B. Mosterd, and A. W. M. Smeulders.
ScilImage: a multi-layered environment for use and development of image processing software. In H. I. Christensen and J. L. Crowley, editors, Experimental
Environments for Computer Vision & Image Processing, pages 107–126. World
Scientific Publishing, 1994.
[22] L. J. van Vliet, I. T. Young, and P. W. Verbeek. Recursive Gaussian derivative
filters. In Proceedings ICPR ’98, pages 509–514. IEEE Computer Society Press,
1998.
[23] H. S. Wu, J. Barba, and J. Gil. A focusing algorithm for high magnification
cell imaging. J. Microscopy, 184:133–142, 1996.
[24] T. T. E. Yeo, S. H. Ong, Jayasooriah, and R. Sinniah. Autofocusing for tissue
microscopy. Imaging Vision Comput., 11:629–639, 1993.
[25] I. T. Young, J. J. Gerbrands, and L. J. van Vliet. Fundamental Image Processing.
Delft University of Technology, Delft, 1995.
[26] I. T. Young, R. Zagers, L. J. van Vliet, J. Mullikin, F. Boddeke, and H. Netten.
Depth-of-focus in microscopy. In Proceedings 8th SCIA, pages 493–498, 1993.
Chapter 6
Segmentation of Tissue Architecture by Distance Graph Matching
appeared in Cytometry, vol. 35, pp. 11–22, 1999.
“Cell and tissue, shell and bone, leaf and flower, are so many portions of matter, and it is in obedience to the laws of physics that their particles have been
moved, moulded and conformed. They are no exceptions to the rule that God
always geometrizes. Their problems of form are in the first instance mathematical problems, their problems of growth are essentially physical problems, and
the morphologist is, ipso facto, a student of physical science.”
– D’Arcy W. Thompson
Quantitative morphological analysis of fixed tissue plays an increasingly important role in the study of biological and pathological processes. Specific detection
issues can be approached by classical staining methods, enzyme histochemical analysis, or immunohistochemical processes. The tissue can not only be characterized by
the properties of individual cells, such as staining intensity or expression of specific
proteins, but also by the geometrical arrangement of the cells [4, 6, 10]. Interesting
tissue parameters are derived from the topographical relationship between cells. For
instance, topographical analysis in tumor grading can significantly improve routine
diagnosis [3, 8, 16]. Studies of growing cancer cell lines have revealed a non-random
distribution of cells [5, 14]. Partitioning of epithelial tissue by cell topography is used
for quantitative evaluations [17]. We propose a new method for the partitioning of
tissues. As an example, structural integrity of hippocampal tissue after ischemia will
be examined.
As a first step, tissue parts of interest have to be segmented into cell clusters.
Segmentation of cell clusters can be based on distances between the center of gravity
of the cells. The recognition of tissue architecture is then reduced to determining
borders of point patterns. The problem, stated as such, can be solved by the application
of neighbor graphs and their partitioning.
The Voronoï graph is often applied as a modeling tool for point patterns [3, 5,
13, 16]. The definition of the Voronoï graph is given by polygons Z(p), where each
polygon defines the area for which all points are closer to marker p than to any
other marker [22]. A polygon Z(p) is called the zone of (geometrical) influence of p.
Neighboring markers to p are defined by the set of all markers for which the zone of
influence touches that of p. Such a tesselation of the plane depends on the spatial
distribution of the cell markers. Cluster membership is determined by evaluation of
geometrical feature measurements on the zones of influence [1].
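As an aside, the zone of influence is straightforward to measure with standard computational-geometry tools. The following sketch (assuming numpy and scipy are available; this is not the software used in this chapter) computes the area of every Voronoï polygon, on which an area threshold could then decide cluster membership:

    import numpy as np
    from scipy.spatial import Voronoi

    def zone_areas(points):
        """Area of the Voronoi polygon Z(p) (zone of influence) for every
        marker p; unbounded zones at the border get area numpy.inf."""
        vor = Voronoi(points)
        areas = np.full(len(points), np.inf)
        for i, region_idx in enumerate(vor.point_region):
            region = vor.regions[region_idx]
            if -1 in region or len(region) == 0:
                continue  # zone of influence is unbounded
            poly = vor.vertices[region]
            x, y = poly[:, 0], poly[:, 1]
            # Shoelace formula for the polygon area.
            areas[i] = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        return areas

    # Interior markers of a regular grid all get unit zone area; border
    # markers are unbounded, illustrating the border problem noted below.
    pts = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
    print(zone_areas(pts))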
Rodenacker et al. [17] used the Voronoï graph for partitioning epithelial tissue.
Segmentation was obtained by propagating the neighbors from the basal layer of the
epithelial tissue to the surface. Borders between basal, intermediate and superficial
areas were determined by examining the occupied surface of propagation. In this way,
every third of the total area of the Voronoı̈ graph was assigned to one of the regions,
yielding three regions with approximately similar areas in terms of zones of influence.
As discussed elsewhere [5, 12], the Voronoï graph is sensitive to detection errors.
Removal or insertion of one object will change the characteristics of the Voronoï
graph. A second drawback is that the Voronoï graph is ill-defined at cluster borders.
This makes the Voronoï graph unsuited for robust segmentation of tissue architecture.
Another option for the recognition of point patterns is a modification of the
Voronoï graph: the k-nearest neighbor graph [11, 15, 19]. The neighbors of a point
p are ordered as the nearest, second-nearest, up to the kth-nearest neighbor of p.
The k-nearest neighbor graph is defined by connecting each point to its k-nearest
neighboring points [22]. The strength of each connection is weighted by the distance
between points. Similarity between k-nearest neighbor graphs is determined by comparing the graphs extracted from detected point patterns with prototype k-nearest
neighbor graphs.
In Schwarz and Exner [19], the distance distribution to one of the nearest neighbors
was used for the separation of clusters from a background of randomly disposed points.
The main drawback is that not all patterns can be discriminated by considering only
one specific k-nearest neighbor distance.
Lavine et al. [11] used sequences of sorted interpoint distances extracted from noisy
point images to match the image with one of a set of prototype patterns. Similarity
between prototype and point set is based on a rankwise comparison. From the two
sorted interpoint distance vectors, the corresponding (relative-) difference vector is
calculated. The number of components which exceeds a given threshold is used for
discrimination between patterns. A major disadvantage of the rankwise comparison
6.1. Materials and Methods
93
is that all components have to be detected. When the nearest neighbor is missed in
the detection, the first one in rank is compared with the second one. Thus, failure to
detect one cell results in poor similarity.
Automatic segmentation of tissue architecture is difficult because biological variability and tissue preparation have a major influence on the tissue at hand. The detection and classification of individual cells in the tissue is prone to error. Although
most authors [3, 5, 12] were aware of the lack of robustness in the quantification of
tissue architecture, little effort was made to incorporate uncertainty of cell detection in tissue architecture quantification methods. Lavine et al. [11] showed that the
k-nearest neighbor graph is well-suited for point pattern recognition under spatial
distortions, but the method used is not able to anticipate cell detection errors.
In this chapter we present a robust method for tissue architecture segmentation,
based on the k-nearest neighbor graph. A sequence comparison algorithm is used to
allow missing or extra detected cells in the detected point set. Uncertainty in cell
classification is incorporated into the matching process. Experiments show that the
robustness of the method presented is superior to that of existing methods.
The method is demonstrated by segmentation of the CA region in rat hippocampi,
where structural integrity of the CA1 cell layer is affected by ischemia. The correlation
between manual scoring and automatic analysis of CA1 preservation is shown to be
excellent.
6.1 Materials and Methods
6.1.1 Hippocampal Tissue Preparation
Rat brains were fixed by intracardiac perfusion with diluted Karnovsky’s fixative (2%
formaldehyde, 2.5% glutaraldehyde in Sörensen’s phosphate buffer; pH 7.4). They
were immersed overnight in the same fixative. Coronal vibratome sections of the
dorsal hippocampus were prepared stereotaxically 3.6 mm caudally to the bregma
(Vibratome 1000, TPI, St. Louis, MO). Slices (200 µm) were postfixed with 2%
osmium-tetroxide, dehydrated in a graded ethanol series, and routinely embedded in
Epon. Epon sections were cut at 2 µm and stained with toluidine blue.
6.1.2 Image Acquisition and Software
Images were captured by a CCD camera (MX5, Adimec, Eindhoven, The Netherlands), which is a 780 × 576 video frame transfer CCD with pixel size 8.2 × 16.07 µm²,
operating at room temperature with auto gain turned off. The camera was mounted
on top of an Axioskop in bright-field illumination mode (Carl Zeiss, Oberkochen,
Germany). The microscope was equipped with a scanning stage for automatic position control (stage and MC2000 controller, Märzhäuser, Wetzlar, Germany). The
[Figure omitted: a graph of nodes connected by distance-weighted edges.]
Figure 6.1: Example of a k-nearest neighbor graph. The nodes represent cells in tissue, while the
edges represent their relation. The relations in this graph are given by the two nearest neighboring
cells, and edges are weighted by the distance between the cells.
scanning stage was calibrated for a 10× magnification and adjacent 512 × 512 images
were captured to ensure that complete hippocampi were scanned. Typical composite
image sizes were 6144 × 4096 pixels, or 4.94 × 3.30 mm². For image processing, the
software package SCIL-Image version 1.4 (TNO-TPD, Delft, The Netherlands) was
used on an O2 workstation (SGI, Mountain View, CA). The package was extended
with the distance graph matching algorithm.
6.1.3 K-Nearest Neighbor Graph
Consider an image of a tissue containing cells. Detection of cells in the image will
result in m markers at possible cell locations. Let V be the set of m detected cell
markers, V = {v1 , v2 , . . . , vm }. The elements in V are called vertices or nodes. A
graph G(V, E) (fig. 6.1) defines how elements of V are related to one another. The
relation between the vertices is defined by the set of edges E, in which an element
eij connects vertex vi to vertex vj . A weighted graph is defined by the graph G(V, E),
where a value is assigned to each edge eij .
The k-nearest neighbor graph of a node v is defined as the subset of the k vertices
closest to v. The edges between v and the neighboring vertices are weighted by the
Euclidean distance, or Nvk = { d1 , d2 , . . . , dk | di = dist(v, vi ), di < di+1 }. Taking
k = 1 for all v ∈ V results in the nearest neighbor graph, in which each cell is
connected to its closest neighbor.
The average edge length in the k-nearest neighbor graph gives a measure of scale
of the pattern of cells. Division of all distances di in a k-nearest neighbor graph by the
average of all distances in the graph, d̄, normalizes the graph for scale, i.e., d̃i = di /d̄.
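These definitions translate directly into code. A minimal brute-force sketch in Python (helper names are hypothetical):

    import math

    def knn_graph(points, k):
        """For each point v, return its sorted distances to the k nearest
        neighbors: N_v^k = {d_1, ..., d_k | d_i = dist(v, v_i), d_i < d_(i+1)}."""
        graph = []
        for i, v in enumerate(points):
            dists = sorted(math.dist(v, w) for j, w in enumerate(points) if j != i)
            graph.append(dists[:k])
        return graph

    def normalize_scale(graph):
        """Divide every distance by the average edge length d_bar, making
        the graph invariant to the scale of the cell pattern."""
        all_d = [d for neigh in graph for d in neigh]
        d_bar = sum(all_d) / len(all_d)
        return [[d / d_bar for d in neigh] for neigh in graph]

    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
    print(normalize_scale(knn_graph(pts, k=2)))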
6.1.4 Distance Graph Matching
Point patterns of interest were extracted from the k-nearest neighbor graph. As an
example, consider fig. 6.2. A regular structured tissue was assumed, consisting of cells
Figure 6.2: Extraction of tissue architecture. A typical relationship around a cell is obtained from
an example of the tissue of interest (a). The prototype k-nearest neighbor graph is derived from
distances to cells (b). All prototypes shown are considered equal to fit deformed tissue parts. Further
freedom is given by a certain elasticity of the edges in the prototype graph. Extraction of the tissue
architecture proceeds by fitting the prototype graph on each cell and its neighborhood in the tissue
(c). Within the similar tissue parts, the graph will fit. Outside these regions, matching is limited
to only one or two edges. In order to safeguard against cell detection errors, not all edges in the
prototype have to fit the cellular neighborhood.
regularly distributed over the tissue. Such a point pattern reveals an equally spaced
layout everywhere within the tissue borders. The surrounding of each cell belonging
to the pattern can be modeled by the neighborhood of one single cell (fig. 6.2). The
k-nearest neighbor graph of a typical pattern cell gives a characterization of the point
pattern of interest. After selection of a typical cell, the pattern is given by a prototype
k-nearest neighbor graph, with distance set P = {p1 , p2 , . . . , pk }, where pi denotes
the prototype distances. Acceptance or rejection of a detected object as belonging to
the cell-cluster of interest is based on comparison of the observed k-nearest neighbor
distances Nvk , to the prototype defined by the characteristic distances to the neighbors
in P .
6.1.5 Distance Graph Comparison
The difference between observation and prototype set is expressed by the replacements necessary to match the prototype with the observation. This is referred to
as dissimilarity between sets [18]. For example, consider for simplicity the discrete
observed set {3, 10, 11, 15, 20, 20, 21, 25} and prototype {5, 5, 10, 10, 20, 20}.
When disregarding the last distances in the observation (21, 25), two substitutions
(3 → 5, 11 → 10), one insertion (5) and one deletion (15) transform the observed
distance set into the prototype. So there are four modifications between prototype
and observation. The extra distances at the end of the observed set are necessary
for expanding the comparison when elements are deleted in the beginning of the set.
Without these extra elements, deletion of one item at the beginning of the set implies
the addition of an item at the end of the set. There will be no need for addition
when there is a cell at the correct distance. Therefore, the number of elements l in the
observation should be larger than the prototype length k to allow for expansion in
the comparison.
A cost is assigned to each type of replacement. Let ci be the cost for insertion,
cd the cost for deletion, cs the cost for substitution, and cm the cost for matching.
In the example, 11 is closer to 10 than 3 is to 5, which can be reflected in their
respective matching costs. The minimum total cost t, necessary to transform the
observed set into the prototype, gives the similarity between the sets. The minimum
cost is obtained by using a string matching algorithm [18] (see Appendix).
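For illustration, the minimum total cost can be computed with the standard dynamic-programming recurrence for string matching. The sketch below simplifies the cost model to unit costs and exact matches (the graded matching cost of section 6.1.6 would replace the 0/cs term) and handles the trailing observed elements by taking the best prefix; these choices are assumptions made for the example:

    def match_cost(observed, prototype, ci=1.0, cd=1.0, cs=1.0):
        """Minimum total cost t to transform the observed distance set into
        the prototype by insertions (ci), deletions (cd) and substitutions
        (cs); an exact match costs nothing. Trailing observed elements may
        be left unmatched, which is why l > k is useful."""
        n, m = len(observed), len(prototype)
        dp = [[0.0] * (m + 1) for _ in range(n + 1)]
        for j in range(1, m + 1):
            dp[0][j] = dp[0][j - 1] + ci          # insert prototype element
        for i in range(1, n + 1):
            dp[i][0] = dp[i - 1][0] + cd          # delete observed element
            for j in range(1, m + 1):
                sub = 0.0 if observed[i - 1] == prototype[j - 1] else cs
                dp[i][j] = min(dp[i - 1][j - 1] + sub,   # match / substitute
                               dp[i][j - 1] + ci,        # insertion
                               dp[i - 1][j] + cd)        # deletion
        return min(dp[i][m] for i in range(n + 1))       # drop trailing elements

    obs = [3, 10, 11, 15, 20, 20, 21, 25]
    proto = [5, 5, 10, 10, 20, 20]
    t = match_cost(obs, proto)
    print(t)                                  # 4.0, as in the example above
    t_upper = len(proto) * min(1.0, 1.0)      # t_upper = k min(ci, cs)
    print((t_upper - t) / t_upper * 100)      # correspondence C of eq. (6.1) below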
The lowest possible value for the cost t is obtained when both sets are equal.
The number of replacements is then zero, and thus the cost is zero. An upper bound for
the cost necessary to match two sets is obtained when all elements are replaced. In
this case, either all elements are inserted at the beginning of the set, or all elements
are substituted, depending on the respective costs. The upper bound is then given
by tupper = k min(ci , cs ). Normalization of the minimum total cost gives a correspondence measure, indicating how well the observed pattern matches the prototype,
i.e.,

C = (tupper − t)/tupper × 100%.    (6.1)
Discrimination between two known point patterns, cluster and background, can
be based on example and counterexample. Consider the observed k-nearest neighbor
graph Nvk , the prototype P describing the pattern of interest, and a prototype B characterizing the background pattern. When elements in background B match elements
in P , the cost tbackgr related to matching P with B is less than the upper bound for
the minimum cost. Then, discrimination between the two patterns is enhanced by
normalizing the correspondence to the cost given by matching P with background B,
or

C′ = (tbackgr − t)/tbackgr × 100%.    (6.2)

Note that C′ can be negative for patterns which correspond neither to the foreground
prototype nor to the background prototype. The extension to multiclass problems
can be made by considering prototype P for the class of interest, and prototypes
B1 , B2 , . . . , Bn for the remaining classes. Matching P with each of the prototypes
Bi gives the correspondences between the pattern of interest and the other patterns.
The pattern Bi which is most similar to P results in the lowest matching cost, which
should be used for normalization.
6.1.6 Cost Functions
The total cost depends on the comparison between each of the individual elements
of Nvk and P , and thus the replacements necessary to match them. The replacement
operations are given by insertion (cost ci ), deletion (cost cd ), substitution (cs ), and
match (cm ).
The cost for matching cm is zero when the two distances are equal. The difference
between two distances is defined as their relative deviation, or δ = |di − pj |/pj . Here,
di denotes the observed distance and pj the prototype distance with which to compare.
Robustness against spatial distortion is obtained by allowing a percentage deviation
α in the comparison of distances [11]. In this case, two distances are considered equal
as long as their relative deviation is smaller than the tolerance α. A minimum value
for α is given by the distance measurement error.
When the deviation percentage between two distances is higher than α, their
correspondence is included in the matching cost. The correspondence C then depends
on the total distance deviation between the compared elements. The matching cost
is taken linearly proportional to the distance deviation, or

cm = 0                        if δ ≤ α,
cm = (δ − α) cs /(cs − α)     if α < δ < cs ,    (6.3)
cm = cs                       otherwise.
The cost for matching is cs if δ ≥ cs , which is equivalent to a substitution operation.
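Transcribed directly, the matching cost reads as follows (a sketch; the default parameter values correspond to the settings α = 0.1 and cs = 0.35 used later in this chapter):

    # Sketch of the matching cost of eq. (6.3): zero within the tolerance
    # alpha, rising linearly to the substitution cost cs.
    def matching_cost(d_i, p_j, alpha=0.1, cs=0.35):
        """Cost c_m for matching an observed distance d_i against a
        prototype distance p_j, from their relative deviation delta."""
        delta = abs(d_i - p_j) / p_j
        if delta <= alpha:
            return 0.0                                 # considered equal
        if delta < cs:
            return (delta - alpha) * cs / (cs - alpha) # linear ramp
        return cs                                      # full substitution cost

    print(matching_cost(105.0, 100.0))   # 0.0   (5% deviation, within alpha)
    print(matching_cost(120.0, 100.0))   # 0.14  (20% deviation, partial cost)
    print(matching_cost(150.0, 100.0))   # 0.35  (delta >= cs)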
For our case, cell detector properties determine the costs for insertion. For a
sensitive detector, the probability to miss a cell is low. As a consequence, the cost
for insertion should be high compared to deletion. Alternatively, a low-sensitive cell
detector will overlook cells, but fewer artifacts will be detected. Thus, the costs for
98
Chapter 6. Segmentation of Tissue Architecture by Distance Graph Matching
insertion should be low relative to deletion. The insertion cost is therefore tuned to
the cell detector performance, or
ci /cd ∝ #A/#M.    (6.4)
Here, #A denotes the estimated average number of artifacts detected as cells,
and #M denotes the estimated average number of missed cells.
The deletion cost is derived from object features. A probability distribution can
be obtained from well-chosen measurements, e.g., the contour ratio, on a test set of
objects. Afterwards, the probability P (vi ) for object vi being a cell is extracted from
the measured distribution. When an object has a low probability of being a cell, the
object should be deleted. Therefore, rather than considering a fixed deletion cost, the
probability of an object being a cell determines the deletion cost for that object, or
cd (vi ) ∝ P (vi ).    (6.5)
As a result, the correspondence measure for the object under examination is only
slightly affected by the deletion of artifacts. The rejection of detected objects as
being artifacts can be based on both cell probability P (vi ) and the correspondence C
of the object to the cluster prototype.
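A sketch of how these two tuning rules could be coded; the unit proportionality constants and the probability argument are assumptions for illustration only:

    # Hedged sketch of eqs. (6.4) and (6.5): tuning the replacement costs to
    # the cell detector and to the per-object cell probability.
    def insertion_cost(n_artifacts, n_missed, cd=1.0):
        """ci/cd proportional to #A/#M: a sensitive detector (few missed
        cells) makes insertions expensive relative to deletions."""
        return cd * n_artifacts / n_missed

    def deletion_cost(p_cell, c=1.0):
        """Deletion cost proportional to the probability P(v_i) that the
        object is a cell (e.g. derived from the reciprocal contour ratio);
        deleting a probable artifact (low P) is nearly free."""
        return c * p_cell

    print(insertion_cost(n_artifacts=4, n_missed=2))  # 2.0
    print(deletion_cost(p_cell=0.15))                 # 0.15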
6.1.7 Evaluation of Robustness on Simulated Point Patterns
Four algorithms, based on the Voronoı̈ graph, nearest neighbor distance, the method
of Lavine et al. [11], and the proposed distance graph matching, were tested in simulations. The segmentation performance was measured as a function of the input
distortion. The input consisted of a foreground point pattern embedded in a background pattern, distorted by some random process.
For the simulations, two arbitrarily chosen patterns were generated. A hexagonal
point pattern was embedded in a random point pattern with the same density, and the
same pattern was placed in a hexagonal pattern with half the density (fig. 6.3). Artificial distortion was added to the sets by consecutive random removal, addition, and
displacement of points. The distortion was regulated from 0% up to a maximum, resulting in a noisy realization of the ideal patterns. By removing points, the algorithm
is tested for robustness against missing cells. Addition of points reveals robustness
of the algorithm against false cell detections. Robustness against spatial distortion is
examined by means of point displacement. Each one of the four methods was tested
for robustness against the given distortions. The combination of removal and displacement of points shows robustness against touching cells. The other combinations
show the interaction of distortions on robustness.
The segmentation performance indicates how well the foreground pattern was discriminated from the background points. It was measured as function of the distortion.
[Figure omitted: the two test point patterns, panels (a) and (b).]
Figure 6.3: Point patterns used for the experiments. a. A regular hexagonal pattern inside a
hexagonal pattern with half the density. b. A regular hexagonal pattern inside a random pattern
with the same density.
The performance of the various algorithms was measured as one minus the ratio of
false negatives combined with the ratio of false positives, or

P = 1 − #Fb /#Truthf − #Bf /#Truthb .    (6.6)

Here, #Fb denotes the number of foreground markers classified as background, #Bf
denotes the number of background markers classified as foreground, and #Truthf
and #Truthb denote the true number of foreground and background markers, respectively, in the distorted data set.
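In code, the performance measure is a one-liner (argument names are hypothetical):

    # Sketch of the segmentation performance measure of eq. (6.6); 0
    # corresponds to random classification, 1 to a perfect segmentation.
    def performance(n_fg_as_bg, n_bg_as_fg, n_truth_fg, n_truth_bg):
        return 1.0 - n_fg_as_bg / n_truth_fg - n_bg_as_fg / n_truth_bg

    # Example: 5 of 100 foreground markers lost, 10 of 200 background
    # markers wrongly accepted as foreground.
    print(performance(5, 10, 100, 200))   # 0.90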
6.1.8 Algorithm Robustness Evaluation
For the experiments, the area of the influence zones in the Voronoï graph was thresholded [7] in order to partition the test point patterns. The thresholds were chosen
such that 10% distortion on the distance to the nearest neighbors was allowed for the
undistorted foreground pattern. This amounts to calculating the minimum and maximum area for scaled versions of the pattern, with scaling factors 0.9 and 1.1.
With regard to the nearest-neighbor distance, thresholds were taken such that
10% perturbation in the nearest-neighbor distance was allowed, determined in the
undistorted foreground pattern.
The method given by Lavine et al. [11] was tested for k ∈ {5, 10, 15, 20, 25}.
Implementation of this method was achieved by using the distance graph matching
algorithm. Examples of both foreground and background pattern were used for discrimination. Costs for insertion and deletion were taken as infinity (ci = cd = ∞);
thus, only substitutions or matches were allowed. The allowed perturbation in the
distances was set at 10% (cs = α = 0.1). The correspondence C 0 (eq. 6.2) was thresholded at 50%.
Experiments for the proposed distance graph matching method were taken with
prototype length k ∈ {5, 10, 15, 20, 25}. In order to allow the string matching
to expand, the number of observed elements considered for matching was twice the
length of the prototype set (l = 2k). Examples of both foreground and background
pattern were used for discrimination. Substitution of cells was not allowed, except as a
deletion followed by an insertion operation. This can be achieved by taking the cost for
substitution equal to the sum of the costs for insertion and deletion (cs = ci + cd = c).
The costs for insertion and deletion were taken as equal. The allowed perturbation
in the distances was taken to be 10% (c = α = 0.1). The correspondence C′ (eq. 6.2)
was thresholded at 50%. This way, parameters were set to permit fair comparison
between the four methods for tissue architecture segmentation.
6.1.9 Robustness for Scale Measure
In order to investigate the influence of distortions on the scale normalization measure,
the measure was tested in the simulations. The normalization factor d̄, the average
neighbor distance, was calculated under addition, removal, and displacement of points.
The percentage error relative to the initial scale measure, d̄ at 0% distortion, was measured
as function of the distortion. The number of neighbors k considered for calculation
of the scale measure was taken to be {1, 5, 10, 15}.
6.1.10 Cell Detection
Cell domes were extracted from the hippocampal images by grey-level reconstruction [2], resulting in a grey-value image containing the tops of all mountains when
considering the input image as a grey-level landscape. From the dome image, saturated transparent parts were removed, and the remaining objects were thresholded.
The results contained cell bodies, neurite parts and artifacts. An opening was applied to remove the neurite parts. After labeling, the center of gravity of each object
was calculated and used for determination of the k-nearest neighbor graphs. The
reciprocal contour ratio (1/cr) was used as a measure for cell probability (eq. 6.5).
6.1.11 Hippocampal CA Region Segmentation
Segmentation of the CA region was obtained by supervised selection of an example
region. An arbitrary section, unaffected by ischemia, was taken and, after cell detection, one of the cells in the CA1 region was manually selected. The neighborhood of
the selected cell was used as a prototype for segmentation. No counter (background)
example was taken. Each of the four algorithms was used for segmentation of the
CA region. Parameters for segmentation were derived from the example neighborhood to permit fair comparison between the methods.
Thresholds for the area of the influence zones in the Voronoı̈ graph were derived
from the example, such that 35% distortion on the distance to the nearest neighbor
was allowed.
For the nearest-neighbor method, thresholds were taken such that 35% distortion
on the nearest-neighbor distance in the example was allowed.
The method of Lavine et al. [11] was implemented by using the distance graph
matching algorithm. Costs for insertion and deletion were taken as infinity (ci =
cd = ∞), allowing only substitutions or matches with 35% tolerance (cs = α = 0.35).
The deletion cost for individual objects was adjusted by the cell probability, derived
from the contour ratio. For graph matching, 15 neighbors were taken into account. A
cell was considered as a cluster cell when the similarity between distance graph and
prototype was at least 50%.
For the distance graph matching method, substitution of cells was not allowed,
achieved by setting cs = ci + cd = c. The substitution cost was tuned to allow
for 35% distortion in the distances, from which the last 25% was included in the
correspondence measure (c = 0.35, α = 0.1). After visual examination of the detector
performance, the insertion cost was set at twice the deletion cost. The deletion cost for
individual objects was adjusted by the cell probability, derived from the contour ratio.
For the distance graph matching, 15 neighbors were taken into account. Matching
was allowed to expand to twice the amount of neighbors in the prototype (l = 2k). A
cell was considered as a cluster cell when the similarity between distance graph and
prototype was at least 50%.
6.2 Results
6.2.1 Algorithm Robustness Evaluation
Figure 6.4 shows the results of the performance of the algorithms on the simulated
point patterns, where 0% performance corresponds to random classification of the
markers. The distortion for removal and addition of points is given as the percentage
of points removed or added. For displacement of points, the distortion is given as percentage of displacement up to half the nearest neighbor distance (100%) of the undistorted hexagonal foreground pattern. When the distortion in displacement reaches
100%, the hexagonal pattern has become a random pattern, indistinguishable from
the random background pattern (fig. 6.4f). The optimum performances which can be
reached for the three types of distortion are shown in fig. 6.4b,d,f. In those cases, the
segmentation result corresponds to correct classification of all (remaining) markers.
The results of the combined experiments are examined for interaction between the
different kinds of distortions, and their relation with the individual performances.
The behavior of the algorithms under all distortions remains similar for both test
patterns. This suggests that the performance of the different methods is insensitive
to the type of test pattern.
For addition and displacement of points, the minimum and maximum performance
over the 25 simulation trials remains within 20% from the average. For removal of
points, the minimum and maximum performance was within 40% from the average
for the Voronoı̈, Lavine et al. [11], and distance graph matching methods. The nearest
neighbor method shows a deviation of 60% from the average for removal of points,
which is due to the normalization of the performance measure to the amount of
markers (eq. 6.6).
Figure 6.4a–d reveals that thresholding the area of influence in the Voronoı̈ graph
is inadequate in determining cluster membership when cell detection is not reliable.
No point can be removed or added without changing the Voronoı̈ partition for all
(Voronoı̈) neighbors surrounding the removed or added point. A second drawback
is the high initial error of 20% and 35%, respectively. Under displacement of points
(fig. 6.4e,f), segmentation based on the Voronoı̈ graph is shown to be robust. Figure 6.4f reveals the bias (100% distortion, 10% performance) for the Voronoı̈ graph
at the image border. Points near the image border are all (correctly) classified as
background due to their deviation from the normal area of influence, resulting in a
better than random classification for the indistinguishable fore- and background. Experiments for the Voronoı̈ method performed with thresholding the deviation on the
nearest-neighbor distance at 5% give only marginally better performances (data not
shown). For the combination of displacement and removal, the resulting segmentation
error showed both factors to be additive below 15% removal (data not shown). Similarly, for the displacement and addition of points the combined error was shown to be
the addition of errors caused by applying each distortion separately. The performance
under removal and addition of points is only slightly better than the addition of the
individual errors.
Segmentation based on the nearest-neighbor distance behaves like the optimum
when distorted by removal of points (fig. 6.4a,b). Under the condition of addition of
points (fig. 6.4c,d), performance is as bad as with the Voronoı̈ method. Since 10%
distortion on the nearest neighbor distances is allowed, the method performs well up
to the 10% displacement (fig. 6.4e,f). As shown elsewhere [19], segmentation based
on one of the other k-nearest neighbors is able to improve the discrimination between
patterns. Behavior of the method under distortions for higher k remains similar to the
results shown for k = 1. The performance for the combinations removal-addition and
removal-displacement was completely determined by addition and displacement (data
not shown), respectively. As can be expected from fig. 6.4a,b the influence of removal
of points may be neglected. For the combination of addition and displacement of
points, the effect on the segmentation error is the addition of the errors caused by
each distortion separately.
For the method of Lavine et al. [11], the results are shown for k = 10. The initial segmentation error between the test point patterns (fig. 6.4a,b) is smaller than
with both the Voronoı̈ and nearest-neighbor method. Taking more neighbors into
account clearly results in better discrimination between point patterns. The performance under removal of points degrades faster than the nearest-neighbor segmentation (fig. 6.4a,b), while the performance for addition of points (fig. 6.4c,d) degrades
less severely for small distortions. The tolerance for spatial distortion is improved
in comparison to the nearest-neighbor method. Analysis based on larger neighborhood sizes (k ∈ {15, 20, 25}) shows that the performance for removal and addition
of points degrades faster, whereas the performance improves under the condition of
displacement of points. Additionally, the initial error increases by a few percent. For k = 5, segmentation performance is comparable, except for the initial error
which increases by a few percent. The error due to both the combinations removal-displacement and addition-displacement was shown to be almost perfectly additive
(data not shown). For the combination of addition and removal of points, the error
due to removal is counteracted by the addition of points for large distortions.
The distance graph matching method performs slightly better than the method
of Lavine et al. [11] for removal of points (fig. 6.4a,b). Under the condition of point
addition, the distance graph matching method is clearly superior. The initial error
in the discrimination between both hexagonal foreground and background is zero for
both the distance graph method and that of Lavine et al. [11]. For the discrimination
between hexagonal foreground and random background, the initial performance for
the distance graph matching is better than with the method of Lavine et al. [11].
Performance for a small neighborhood size is comparable to the performance with
the method of Lavine et al. [11] (k = 5). For large neighborhood sizes (k ≥ 15),
performance for removal and addition degrades faster, but remains better than with
the method of Lavine et al. [11]. Under displacement of points, the performance
increases for high k. Additionally, the initial error increases by a few percent. The
performance for the combined distortion from addition and displacement of points was
shown to be completely determined by the point displacement (data not shown). For
removal and addition, the error due to removal was reduced by the random addition
of points for severe distortions. The combination of removal and displacement was
shown to be better than the addition of the respective errors.
From these experiments, it can be concluded that both thresholding the area of
influence in the Voronoı̈ graph and thresholding the distance to one of the nearest neighbors are not suitable for robust segmentation of tissue architecture. The
experiments undertaken show the instability of the Voronoı̈ graph for detection errors. The Voronoı̈ graph is certainly useful for determination of neighbors [16], but
more robust parameters can be estimated from the Euclidean distance between these
neighbors [23]. The proposed distance graph matching algorithm indeed has a better
performance under detection errors than the method of Lavine et al. [11]. Therefore,
the distance graph matching method is more suitable for use in the partitioning of
tissue architecture.
[Plots omitted: six panels of % performance versus % distortion, with curves for the Voronoï, nearest neighbor, Lavine, and distance graph methods, and the optimum.]
Figure 6.4: Average segmentation performance is plotted as function of the distortion. Each point
represents the average performance over 25 trials for the given percentage of distortion. For the
method of Lavine et al. [11] and the distance graph matching method, results for k = 10 are shown.
a. Point removal, hexagonal background. b. Point removal, random background. c. Point addition,
hexagonal background. d. Point addition, random background. e. Point displacement, hexagonal
background. f. Point displacement, random background.
[Plot omitted: average % error versus % distortion, with curves for removal, addition, and shift of points.]
Figure 6.5: Influence of removal, addition, and displacement of points on the scale normalization
measure d̄ for k = 10. Average percentage error over 25 trials.
6.2.2 Robustness for Scale Measure
Robustness of the scale normalization was tested on both artificial data sets. Results
for k = 10 on the hexagonal-hexagonal data set are shown in fig. 6.5. The result for
k = 1 degrades for addition and displacement of points, while removal of points is
more stable. The results for k = 5 and k = 15 are almost identical to the results
shown for k = 10. The results for the hexagonal-random data set are almost identical
to the hexagonal-hexagonal results for k ∈ {5, 10, 15}. The experiment shows that the
average k-nearest neighbor distance is useful in normalization for scale when taking
k large enough.
6.2.3 Hippocampal CA Region Segmentation
The new method of distance graph matching was tested on the segmentation of the CA
region in rat hippocampi (fig. 6.6a), based on the preservation of the CA1 structure
after ischemia [9]. Here, the correlation between manual and automatic counting of
the preserved cells in the CA1 region is shown. An example of the cell detection is
shown in fig. 6.6b. As a result from the distance graph matching, all cells in the CA
and hilus region were extracted from the image (fig. 6.6c). Only cluster cells are
preserved in the segmented image.
The CA1 region (fig. 6.6) is that part inside the CA region, starting orthogonally
at the end of the CA inside the hilus, and ending where the CA region becomes
thicker before the U-turn. Manual counting was performed on 2–4 slices for each
animal, resulting in a total number of preserved neurons counted in a total length of
CA1 region (cells/mm) per animal.
To demonstrate the usefulness of the proposed segmentation method, correlation
between these manual countings and automatic counting is shown. Due to the ambiguous definition of the CA1 region, manual indication of the CA1 region in the
hippocampus image was necessary. For each hippocampus, three points were obtained,
[Images omitted: three panels (a)–(c).]
Figure 6.6: Example of the segmentation of cell clusters in the hippocampus of a rat. The line
segment SME indicates the CA1 region. All segmented cells in figure (c) between points S and E
are considered part of the CA1 region. The length of the CA1 region is derived from the length of
line SME. a. Hippocampus image as acquired by the setup. b. The resulting image from the cell
detection. c. Cell clusters after the distance graph matching.
indicating the start (S), middle (M), and end (E) of the CA1 region. The
segmented cells between start and end point, and within a reasonable distance from
the line segment SME connecting the three points, were classified as belonging to
the CA1 region. The average number of cells per unit length was calculated for the
obtained cell cluster. The cluster length was taken to be the length of line segment
SME. Figure 6.7 shows the correlation between the manual and automatic counting
for each of the algorithms tested. Results obtained with segmentation based on the
Voronoı̈ graph and for the nearest-neighbor distance do not correlate well with manual
counting. The method of Lavine et al. [11] is biased (mean error −12.9) and results in
a mean squared error of 405.0 [20]. For the distance graph matching algorithm, the
mean error is 0.1 and the mean squared error is 174.8.
6.3 Discussion
The geometrical arrangement of cells in tissues may reveal differences between physiological and pathological conditions based on structure. This intuitive notion does not imply that the arrangement can easily be captured in an algorithm. Quantification of tissue architecture, when successful and objectively measurable, opens the way to better assessment of treatment response. Before deriving parameters from tissue architecture, partitioning of the tissue into its parts of interest is necessary.
We present a method for the segmentation of homogeneous tissue parts based on cell clustering. The objective is to develop a method which is robust under spatial distortions intrinsic to the acquisition of biological preparations, such as squeezing of the tissue, as well as taking a two-dimensional transection through a three-dimensional block. These manipulation artefacts lead to two major confounding factors: a. distortion in the cell density, and b. errors in cell detection. Distortion in cell density is reflected in the distance between cells. Irregularity or spatial distortion in the cell positions, and thus distortion in the neighbor distances, is inherent to tissues. Squeezing of tissue, or local nonrigid deformation, results in structural changes in cell density and thus changes in neighbor distances. Small changes in transection angle cause loss of cells in regions of the tissue. A second source of error in cell detection is the classification of artifacts as cells; conversely, cells may be overlooked during detection, causing lack of proper definition of the local tissue architecture. When neighboring cells touch one another, they are often erroneously detected as one single cell. The method also deals with the uncertainty in cell classification often encountered in the automatic processing of tissues. Errors in the assignment of cells on cluster borders should be minimal to prevent influence of cluster shape on the segmentation result. The quantitative method enables reliable classification of areas by type of tissue.
In contrast to other cell pattern segmentation methods, the proposed distance
Figure 6.7: Correlation between the average number of cells per mm CA1 length per animal counted manually, and the number of segmented cells per estimated mm CA1 length per animal. The dashed line indicates y = x. a. Voronoï method. b. Nearest-neighbor method. c. Method of Lavine et al. [11]. d. Distance graph matching method.
graph matching algorithm meets the various demands formulated above. Detection errors such as missing cells or detected artifacts are corrected by insertion and deletion operations, respectively. Deviation of the distances to neighboring cells is incorporated by allowing some tolerance in distance matching. Local deformation of the tissue has only minor influence as long as the deviation in distances remains within tolerance. The total sum of errors, combined with the deviation in distances, indicates how well the cell and its environment fit the prototype environment. A possible drawback of the algorithm is its insensitivity to orientation. It is possible for two different patterns to have the same distance graphs. Under these circumstances, segmentation is not possible by any algorithm based on interpoint distances.
Including cell probability in the matching process further improves segmentation
performance. The interplay between the probability indicating that the object is or is not a cell, and the fit of the object in the cluster prototype, allows better rejection of artifacts, while cluster cell classification is less affected. Cell confidence levels can be derived from the evaluation of the probability distribution of cell features such as contour ratio. In order to remain independent of microscope and camera settings, the cell features chosen should not depend on scale, absolute intensity, etc.
The selection of an example often involves a supervised (i.e., interactive) procedure. The design of such a procedure requires adherence to several principles [21].
Among other requirements, reproducibility under the same intention is considered the
most important for our purpose. As a consequence, any prototype selection algorithm
should only consider cells in conformity with the expert’s intention.
Application of the method to the detection of the CA structure in rat hippocampi showed that even narrow elongated structures, only a few cells thick, can be well segmented using the proposed distance graph matching. Results obtained semiautomatically correlate well with manual counts of preserved cells in the CA1 region, as long as there are enough cells left to discern regular clusters. The other segmentation methods tested, based on the area of influence in the Voronoï graph, the distance to the nearest neighbor, and the method of Lavine et al. [11], resulted in poor correlation between automatic segmentation and the counts by the expert. For the case of CA region determination, the proposed method proved to be compatible with the perception of the pathologist. We have not applied the method to other tissue segmentation problems.
For the recognition of tissue architecture, the proposed distance graph matching algorithm has proven to be a useful tool. The method reduces the nonbiological variation in the analysis of tissue sections and thus improves confidence in the result. The present method can be applied to any field where regular patterns have to be recognized, as long as the directional distribution of neighbors can be neglected.
6.4 Appendix: Dynamic Programming Solution for String Matching
The dynamic programming solution for matching the observation with the prototype is given in fig. 6.8. The graph searches a small set (horizontal) inside a larger set (vertical): it represents horizontally the prototype set P = {p1 , p2 , . . . , pk } and vertically the input set Nvl = {d1 , d2 , . . . , dl }. Each node C[i, j] in the graph represents the comparison of the ith element from the prototype with the jth element from the input set.

The directional edges in the graph determine which operations (deletion, insertion, or matching/substitution) are necessary to obtain the observed and prototype distance at the same position in the comparison string. For instance, each valid path
Figure 6.8: The dynamic programming solution for string matching.
from “start” to node C[2, 3] describes the operations necessary to end up with a set where the third element in the observation is matched to the second one. A horizontal step represents insertion of a prototype element: the same observed distance is compared to the next prototype element. A vertical step implies deletion of an observation: the next observed distance is compared to the same prototype element. Matching or substitution is represented by a diagonal step.
A cost is assigned to each edge. Using an edge to reach a particular node implies adding the edge penalty to the total cost involved in reaching the node. Horizontal edges have cost ci ; vertical edges have cost cd . The cost of a diagonal edge depends on the comparison between the elements connected to the node from which the arrow starts: the cost is zero when the elements match (cm = 0), cs when the observed element is substituted for the prototype element (when cs ≤ cm ), or cm for making them match otherwise.
The cost to reach a particular node is the sum of all costs incurred when taking some valid path from “start” to the node considered. The minimum cost to reach that node corresponds to the path with the least total cost among all possible paths. When considering only the previous nodes, i.e., all nodes from which the one under consideration can be reached, the problem can be reformulated into a recurrence relation. In this case, the minimum cost path is given by the least of the minimum cost paths to the previous nodes, increased by the cost of traveling to the node of interest.
Comparison begins at the “start” node, and each column is processed consecutively
from top to bottom. In this manner, the minimum cost paths to the previous nodes
are already determined when arriving at a particular node. The minimum cost to
reach the node under consideration is then given by:

$$C[i, j] = \min \begin{cases} C[i, j-1] + c_d \\ C[i-1, j] + c_i \\ C[i-1, j-1] + c_m \\ C[i-1, j-1] + c_s \end{cases} \, . \qquad (6.7)$$
The initial value at “start” is zero; the cost from “start” to the first node is also zero. The cost assigned to nonexisting edges (at the border of the graph) is considered infinite.
The “term” nodes at the bottom and right side of the graph are used for collecting the costs assigned to matching the last element in the observation (bottom) or the last element from the prototype (right). The term node C[k + 1, l + 1] describes the cost associated with matching the input set exactly to the prototype. The only interest is in finding the prototype in a (larger) number of observed distances, for which the cost is given by node C[k + 1, k + 1]. This is the first node where the observation is exactly transformed into the prototype. When additional insert and delete operations on the observed set result in a smaller matching cost, that path should be taken as the minimum cost path. Therefore, the minimum total cost is given by the minimum of the term nodes from C[term, k + 1] to C[term, l + 1].
The order of the string matching algorithm is O(l × k) [18]. Here, k is the number of neighbors in the prototype, and l is the number of neighbors taken from the observation. When the cost for matching is constant, as is the cost for substitution, algorithms of lower complexity for comparing ordered sequences are known.
Bibliography
[1] N. Ahuja and M. Tuceryan. Extraction of early perceptual structure in dot
patterns: Integrating region, boundary, and component gestalt. Comput. Vision
Graphics Image Process., 48(3):304–356, 1989.
[2] S. Beucher and F. Meyer. The morphological approach to segmentation: The
watershed transformation. In E. R. Dougherty, editor, Mathematical Morphology
in Image Processing, chapter 12, pages 433–481. Marcel Dekker, New York, 1993.
[3] G. Bigras, R. Marcelpoil, E. Brambilla, and G. Brugal. Cellular sociology applied to neuroendocrine tumors of the lung: Quantitative model of neoplastic
architecture. Cytometry, 24:74–82, 1996.
[4] R. Chandebois. Cell sociology: A way of reconsidering the current concepts of
morphogenesis. Acta Bioth., 25:71–102, 1976.
[5] F. Darro, A. Kruczynski, C. Etievant, J. Martinez, J. L. Pasteels, and R. Kiss.
Characterization of the differentiation of human colorectal cancer cell lines by
means of Voronoï diagrams. Cytometry, 14:783–792, 1993.
[6] K. J. Dormer. Fundamental Tissue Geometry for Biologists. Cambridge Univ.
Press, London, 1980.
[7] C. Duyckaerts, G. Godefroy, and J. J. Hauw. Evaluation of neuronal numerical
density by Dirichlet tessellation. J. Neurosci. Methods, 51:47–69, 1994.
[8] M. Guillaud, J. B. Matthews, A. Harrison, C. MacAulay, and K. Skov. A novel
image cytometry method for quantification of immunohistochemical staining of
cytoplasmic antigens. Analyt. Cell. Pathol., 14:87–99, 1997.
[9] M. Haseldonckx, J. Van Reempts, M. Van de Ven, and L. Wouters. Protection
with lubeluzole against delayed ischemic brain damage in rats. Stroke, 28:428–
432, 1997.
[10] H. Honda. Geometrical models for cells in tissues. Int. Rev. Cytol., 81:191–248,
1983.
[11] D. Lavine, B. A. Lambird, and L. N. Kanal. Recognition of spatial point patterns.
Pattern Rec., 16:289–295, 1983.
[12] R. Marcelpoil and Y. Usson. Methods for the study of cellular sociology: Voronoï
diagrams and parametrization of the spatial relationships. J. Theor. Biol.,
154:359–369, 1992.
[13] G. A. Meijer, J. A. M. Beliën, P. J. van Diest, and J. P. A. Baak. Image analysis
in clinical pathology. J. Clin. Pathol., 50:365–370, 1997.
[14] J. Palmari, C. Dussert, Y. Berthois, C. Penel, and P. M. Martin. Distribution of
estrogen receptor heterogeneity in growing MCF–7 cells measured by quantitative
microscopy. Cytometry, 27:26–35, 1997.
[15] C. R. Rao and S. Suryawanshi. Statistical analysis of shape of objects based on
landmark data. Proc. Natl. Acad. Sci. USA, 93:12132–12136, 1996.
[16] E. Raymond, M. Raphael, M. Grimaud, L. Vincent, J. L. Binet, and F. Meyer.
Germinal center analysis with the tools of mathematical morphology on graphs.
Cytometry, 14:848–861, 1993.
[17] K. Rodenacker and P. Bischoff. Quantification of tissue sections: Graph theory
and topology as modelling tools. Pattern Rec. Lett., 11:275–284, 1990.
[18] D. Sankoff and J. B. Kruskal. Time Warps, String Edits and Macromolecules:
The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading,
1983.
[19] H. Schwarz and H. E. Exner. The characterization of the arrangement of feature
centroids in planes and volumes. J. Microscopy, 129:155–169, 1983.
[20] L. B. Sheiner and S. L. Beal. Some suggestions for measuring predictive performance. J. Pharmacokinet. Biopharm., 9(4):503–512, 1981.
[21] A. W. M. Smeulders, S. Delgado Olabarriaga, R. van den Boomgaard, and
M. Worring. Design considerations for interactive segmentation. In R. Jain
and S. Santini, editors, Visual Information Systems 97, pages 5–12, San Diego,
1997. Knowledge Systems Institute.
[22] L. Vincent. Graphs and mathematical morphology. Signal Processing, 16:365–
388, 1989.
[23] F. Wallet and C. Dussert. Multifactorial comparative study of spatial point
pattern analysis methods. J. Theor. Biol., 187:437–447, 1997.
Chapter 7

A Minimum Cost Approach for Segmenting Networks of Lines
submitted to the International Journal of Computer Vision.
Alice came to a fork in the road. ‘Which road do I take?’ she asked. ‘Where do you want to go?’ responded the Cheshire cat. ‘I don’t know,’ Alice answered. ‘Then,’ said the cat, ‘it doesn’t matter.’
– Lewis Carroll
The detection of lines in images is an important low-level task in computer vision.
Successful techniques are available for the detection of curvilinear structures [4, 6, 12].
They are applied in pharmaceutical research, where interesting tissue parameters can be obtained by the extraction of blood vessels, neurites, or tissue layers. Furthermore, the extraction of roads, railroads, rivers, and channels from satellite or aerial images can be used to update geographic information systems.
A higher level of information is obtained by connecting the lines into networks. Applications are found in the roads between crossings, the highways connecting cities, the railway system between stations, and the neurite system connecting the neurons, all yielding organizational information about the network under consideration. Extraction of line networks rests on the detection of connections, the vertices in the network, as well as their interconnecting curves. The linking of line points over the interconnections is an ill-defined problem, since the curves are likely to contain gaps and branches. It is more attractive to find the minimum cost path between vertices: the path which contains the most line evidence. The vertices can be used to guide the line tracking. Network extraction is then reduced to tracing lines between vertices.
In this chapter, we consider the robust extraction of networks of lines by the application of minimum cost graphs. The design objective is robustness against gaps in lines, which we consider the most prominent source of error in network extraction. We propose a robust measure for edge saliency, which indicates the confidence for each connection.
7.1
Network Extraction Algorithm
A network consists of vertices interconnected by lines.

Definition 16 (Network of Lines) A network of lines is defined by a set of vertices indicating line end points, and the corresponding set of lines representing interconnections, where none of the lines cross.
The definition above implies vertices at crossings. The network can be segmented by
tracing the lines between vertices. Therefore, four steps are considered: a. the detection of line points, b. the detection of vertices, c. finding the optimal paths between
neighboring vertices yielding the lines, and d. the extraction of the network graph
from the set of vertices and lines. A flow diagram is given in fig. 7.1. Post-processing
may include pruning of the graph to remove false branches, and the assignment of
confidence levels to the found graph. Graph confidence is given by the saliency of the
detected lines, and the basin coverage indicating how much line evidence is covered by
the graph. If the network graph covers all line evidence, no lines are missed. However,
if not all line evidence is covered by the graph, lines may be missed during extraction.
Hence, basin coverage together with edge saliency indicate missed lines and spurious lines in the network graph. Each of these steps is described in further detail below.
7.1.1 Vertex Detection
For specific applications, the network vertices are geometrical structures which are easier to detect than the interconnecting lines. Often, these are salient points
in the image. We assume these structures to be detected as landmarks to guide
the line tracing algorithm. For a general method one may rely on the detection of
saddlepoints, T-junctions, and crossings to obtain vertices [9, 13].
7.1.2 Line Point Detection
Theoretically, in two dimensions, line points are detected by considering the second order directional derivative in the gradient direction [12]. For a line point, the first order directional derivative perpendicular to the line vanishes, while the second order directional derivative exhibits an extremum. Hence, the second order directional
Figure 7.1: Flow diagram for network extraction. a. Action flow diagram, b. the corresponding
data flow. Graph extraction results in the network graph, line saliency indicating confidence for the
extracted lines, and basin coverage indicating missed lines.
derivative perpendicular to the line is a measure of line contrast. The second order
directional derivatives are calculated by considering the eigenvalues of the Hessian,
$$H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} \qquad (7.1)$$

given by

$$\lambda_\pm = \tfrac{1}{2}\left( f_{xx} + f_{yy} \pm \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy}^2} \,\right) \qquad (7.2)$$
where f(x, y) is the grey-value function and indices denote differentiation. After ordering the eigenvalues by magnitude, |λ+| > |λ−|, λ+ yields the second order directional derivative perpendicular to the line. Bright lines are observed when λ+ < 0 and dark lines when λ+ > 0 [10]. For both types of lines, the magnitude |λ+| indicates line contrast. Note that this formulation is free of parameters.
In practice, one can only measure differential expressions at a certain observation
scale [5, 7]. By considering Gaussian-weighted differential quotients, $f_x^\sigma = G_x(\sigma) \ast f$, a measure of line contrast is given by

$$R(x, y, \sigma) = \sigma^2 \left| \lambda_+^\sigma \right| \frac{1}{b^\sigma} \qquad (7.3)$$
where σ, the Gaussian standard deviation, denotes the scale for observing the eigenvalues, and where line brightness b is given by
$$b^\sigma = \begin{cases} f^\sigma & \text{if } \lambda_+^\sigma \le 0, \\ W - f^\sigma & \text{otherwise.} \end{cases} \qquad (7.4)$$
Line brightness is measured relative to black for bright lines, and relative to the white level W (255 for an 8-bit camera) for dark lines. The original expression (eq. 7.2) is of dimension [intensity/pixel²]. Multiplication by σ², which is of dimension [pixel²], normalizes line contrast (eq. 7.3) for the differential scale. Normalization by line brightness b results in a dimensionless quantity. As a consequence, the value of R(.) lies within [0 . . . 1].
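The computation of (eq. 7.3) maps directly onto Gaussian derivative filters. Below is a minimal sketch using SciPy; the function name, the axis convention (rows correspond to y), and the small guard keeping the denominator positive are our own assumptions, not part of the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def line_contrast(f, sigma, W=255.0):
    """Dimensionless line contrast R(x, y, sigma) of eq. 7.3 (a sketch)."""
    f = f.astype(float)
    # Gaussian-weighted second order differential quotients.
    fxx = gaussian_filter(f, sigma, order=(0, 2))
    fyy = gaussian_filter(f, sigma, order=(2, 0))
    fxy = gaussian_filter(f, sigma, order=(1, 1))
    # Eigenvalues of the Hessian (eq. 7.2); lam_plus is the one largest
    # in magnitude, i.e. the second order directional derivative
    # perpendicular to the line.
    disc = np.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    lam1 = 0.5 * (fxx + fyy + disc)
    lam2 = 0.5 * (fxx + fyy - disc)
    lam_plus = np.where(np.abs(lam1) > np.abs(lam2), lam1, lam2)
    # Line brightness (eq. 7.4): relative to black for bright lines,
    # relative to the white level W for dark lines.
    fs = gaussian_filter(f, sigma)
    b = np.where(lam_plus <= 0, fs, W - fs)
    # Scale-normalized, dimensionless contrast in [0, 1] (eq. 7.3).
    return sigma ** 2 * np.abs(lam_plus) / np.maximum(b, 1e-10)
```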
The response of the second order directional derivative |λ+| does not only depend on the image data, but is also affected by the Gaussian smoothing scale σ. By analysis of the response to a given line profile as a function of scale, one can determine the optimal scale for line detection. For a bar-shaped line profile of width w, the response of R(.) (eq. 7.3) as a function of the quotient q = w/σ is plotted in fig. 7.2. The response of R(.) is biased towards thin lines, and gradually degrades for larger w. For a thin line (w → 0) the response equals line contrast, whereas for a large value of w relative to σ the response vanishes. Hence, the value of σ should be large enough to
capture the line width. For optimal detection of lines, the value of σ should at least
equal the width of the thickest line in the image,
$$\sigma \ge w \, . \qquad (7.5)$$

When line thickness varies, one can set the value of σ to the width ŵ of the thickest line to expect,

$$\sigma = \hat{w} \, . \qquad (7.6)$$
In this case, the response is slightly biased towards thin lines.

The differential expression (eq. 7.3) is a point measure, indicating whether a given pixel belongs to a line structure or not. The result is not the line structure itself, but a set of points accumulating evidence for a line. In the sequel we discuss how to integrate line evidence to extract line structures.
7.1.3 Line Tracing
Consider a line and its two endpoints S1 and S2 . Of all possible paths Ξ between S1 and S2 , the path which integrates the most line evidence is considered the best connection between the vertices. Therefore, we reformulate line tracing as a minimum cost optimization problem. First, let r(x, y, σ) be a cost function depending on R(.) (eq. 7.3),

$$r(x, y, \sigma) = \frac{\epsilon}{\epsilon + R(x, y, \sigma)} \qquad (7.7)$$

and let us define the path integral, taking σ for granted, to be

$$c(S_1, S_2) = \min_{\Xi} \int_{S_1}^{S_2} r(x(p), y(p))\; dp \, . \qquad (7.8)$$
Figure 7.2: Response of R(.) (eq. 7.3) at the centerline as a function of relative line width q = w/σ.
Here, (x(p), y(p)) is the Cartesian coordinate of the path, parameterized by the linear path coordinate p. The path integral c(S1 , S2 ) yields the integrated cost (eq. 7.7) over the best defined path in terms of line contrast R(.). For high line contrast the line is well-defined, and the cost r(.) ≈ ε/R(.) ≈ 0. For a low value of R(.), the cost approximates 1, such that the Euclidean shortest path is traced. Hence, the constant term ε in (eq. 7.7) determines the trade-off between following the maximum line contrast and taking the shortest route. The value of ε is typically very small compared to the line contrast, e.g. ε = 0.001, which assures that plateaus are crossed. Note that line extraction does not introduce additional parameters.
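A single line trace according to (eqs. 7.7, 7.8) can be sketched with an off-the-shelf minimum cost path routine. Here scikit-image's route_through_array stands in for the grey-weighted distance transform machinery of the next subsection; the function name and parameters are illustrative only.

```python
import numpy as np
from skimage.graph import route_through_array

def trace_line(R, s1, s2, eps=0.001):
    """Minimum cost path between vertices s1 and s2 (a sketch).

    R      : line contrast image (eq. 7.3), values in [0, 1]
    s1, s2 : (row, col) vertex coordinates
    """
    r = eps / (eps + R)          # cost function of eq. 7.7
    # geometric=True weighs diagonal steps by their Euclidean length,
    # approximating the path integral of eq. 7.8.
    path, cost = route_through_array(r, s1, s2,
                                     fully_connected=True, geometric=True)
    return np.asarray(path), cost
```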
7.1.4 Graph Extraction
Now consider an image containing vertices S = {S1 , S2 , . . . , Sn }. In our case, lines connect neighboring vertices. The aim is to extract the network graph G = (S, E), with vertices S and edges E, the interconnecting lines given by the minimum
cost paths. As there will be no crossing paths (see section 7.1.1), the graph G may be
found by a local solution. Hence, we concentrate on connecting neighboring vertices.
Neighbors are defined by assigning a zone of influence to each vertex, where each
region Z(Si ) defines the area for which all points p are closer to Si than to any other
vertex [15],
$$Z(S_i) = \left\{ p \in \mathbb{R}^2 \mid \forall A \in S \setminus \{S_i\} : c(p, S_i) < c(p, A) \right\} . \qquad (7.9)$$
Here, distance is measured with respect to the cost c(p, Si ) (eq. 7.8). The regions of influence
correspond to the catchment basins of the topographical watershed algorithm
[11]. Neighboring vertices to Si are defined by the set of all vertices for which the
zone of influence touches that of Si . Hence, neighboring vertices share an edge in the
topographical watershed segmentation. The minimum cost path Ψij between Si and
Sj runs over the edge shared by Si and Sj .
The graph G is computed by applying the topographical watershed transform.
First, the grey-weighted distance transform is applied to the cost image given by r (eq. 7.7), with the vertices S as mask. The grey-weighted distance transform propagates the costs from the masks over their areas of influence, resulting in a wavefront
collision at places where two zones of influence meet. The collision events result in the
edges between neighboring vertices, yielding the watershed by topographic distance.
The minimum cost path between two neighboring vertices runs over the minimum in
their common edge. Therefore, any edge between two neighboring vertices is traced
for its minimum. Steepest descent on each side of the saddlepoint results in the minimum cost path between the vertices. Tracing the steepest descents for all different
borders between the zones of influence results in the network graph G.
The described algorithm requires one distance transform to find the zones of influence. Hence, the order of the algorithm is determined by the grey-weighted distance transform, which is of order O(N²), N being the image dimension [14]. Note that the graph algorithm is free of parameters.
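The zones of influence (eq. 7.9) and the neighbor relation can be sketched with a marker-based watershed on the cost image, which approximates the catchment basins of the topographical watershed referred to above. The neighbor extraction by scanning adjacent label pairs is our own simplification.

```python
import numpy as np
from skimage.segmentation import watershed

def zones_of_influence(r, vertices):
    """Catchment basins Z(S_i) of eq. 7.9 and the neighboring vertex
    pairs (a sketch); `vertices` is a list of (row, col) positions."""
    markers = np.zeros(r.shape, dtype=int)
    for i, (row, col) in enumerate(vertices, start=1):
        markers[row, col] = i
    # Each basin collects the points cheaper to reach from its own
    # vertex than from any other one.
    labels = watershed(r, markers)
    # Neighboring vertices: label pairs adjacent somewhere in the image.
    neighbors = set()
    for dr, dc in ((0, 1), (1, 0)):
        a = labels[:labels.shape[0] - dr, :labels.shape[1] - dc]
        b = labels[dr:, dc:]
        mask = a != b
        neighbors.update(zip(a[mask], b[mask]))
    return labels, {tuple(sorted(p)) for p in neighbors}
```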
7.1.5 Edge Saliency and Basin Coverage
A natural measure of edge saliency is the integrated line contrast (eq. 7.3) over the
edge,
$$s(S_1, S_2) = \int_{S_1}^{S_2} R(x(p), y(p))\; dp \qquad (7.10)$$
where S1 , S2 are the start and end nodes, and where (x(p), y(p)) is the path. Note that s(.), like R, is a dimensionless and parameter-free quantity. A confidence measure indicating how well the edge is supported by the image data is given by the average saliency over the line,
$$\bar{s}(S_1, S_2) = \frac{1}{l}\, s(S_1, S_2) \qquad (7.11)$$
where l is the path length. Again, s̄ is a dimensionless quantity, with range [0 . . . 1],
a high value indicating a well-defined line.
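Given a traced pixel path, the saliency measures (eqs. 7.10, 7.11) reduce to a discrete path integral. A minimal sketch, assuming the (row, col) pixel chain produced by the line tracer:

```python
import numpy as np

def edge_saliency(R, path):
    """Integrated line contrast s (eq. 7.10) and its length-normalized
    average s-bar (eq. 7.11) over a pixel path (a sketch)."""
    path = np.asarray(path)
    # Euclidean step lengths dp along the discrete path.
    steps = np.linalg.norm(np.diff(path.astype(float), axis=0), axis=1)
    # Line contrast sampled at the path pixels.
    values = R[path[:, 0], path[:, 1]]
    midpoints = 0.5 * (values[1:] + values[:-1])
    s = float(np.sum(midpoints * steps))     # s(S1, S2), eq. 7.10
    return s, s / float(np.sum(steps))       # s-bar(S1, S2), eq. 7.11
```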
Each basin in the minimum cost graph is surrounded by a number of connected
paths forming the basin perimeter. An indication of segmentation confidence for a
basin B may be obtained by considering the average saliency over the surrounding
lines, compared to the average line contrast inside the graph basins. The average
saliency over the basin perimeter is given by
$$\bar{s}_B(B) = \frac{1}{l} \oint_{B} R(x(p), y(p))\; dp \qquad (7.12)$$
l being the length of the basin perimeter, and p representing the linear path coordinate. A high value, in the range [0 . . . 1], indicates a well-outlined basin.
The average line contrast within basin B is measured by
$$\bar{c}_B(B) = \frac{1}{A(B^{\ominus})} \iint_{B^{\ominus}} R(x, y)\; dx\, dy \, . \qquad (7.13)$$
B⊖ is the basin eroded by a band of thickness given by σ. Erosion is applied to prevent the detected line points, smoothed by the Gaussian at scale σ, from influencing the basin contrast. In (eq. 7.13), A(.) is the area of the eroded basin. The value of c̄B increases when line structures are present inside the basin, possibly due to a missed line in the graph. Coverage of the graph G is defined by the ratio of the line contrast remaining inside the basins relative to the line contrast covered by the graph edges,
$$\bar{c}(B) = 1 - \frac{\bar{c}_B(B)}{\bar{s}_B(B)} \, . \qquad (7.14)$$
When all line points are covered by the basin perimeter, c̄ will be close to one. For a
basin containing a missed line, the average line contrast over the basin will be high.
When a spurious edge outlines the basin, summed contrast over the edges will be low,
yielding a lower coverage value.
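Per basin, the coverage measure (eqs. 7.13, 7.14) can be sketched as follows; rounding σ to an integer erosion count, and returning full coverage for an empty interior, are discretization choices of ours.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def basin_coverage(R, basin_mask, perimeter_saliency, sigma):
    """Coverage c-bar(B) of eq. 7.14 (a sketch).

    R                  : line contrast image (eq. 7.3)
    basin_mask         : boolean mask of basin B
    perimeter_saliency : average saliency over the perimeter (eq. 7.12)
    sigma              : line detection scale; the basin is eroded by
                         a band of this thickness (eq. 7.13)
    """
    eroded = binary_erosion(basin_mask, iterations=int(np.ceil(sigma)))
    area = eroded.sum()
    if area == 0:
        return 1.0
    c_inside = R[eroded].sum() / area              # eq. 7.13
    return 1.0 - c_inside / perimeter_saliency     # eq. 7.14
```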
7.1.6 Thresholding the Saliency Hierarchy
The graph G is constructed such that neighboring vertices are connected, regardless of the absence of interconnecting lines. For a spurious connection, saliency will be low, since evidence of a connecting line is lacking. Pruning of the graph for spurious lines may be achieved by thresholding on saliency. Pruning G by saliency imposes a hierarchy on the graph, ranging from the graph G with all edges included, down to the graph consisting of the one best defined edge in terms of contrast. The threshold parameter indicates the saliency level in the hierarchy. Note the introduction of a parameter, indicating the application-dependent hierarchy level of the graph. We propose two methods to prune edges by saliency.
First, global pruning may proceed by removing all ill-defined lines for which
$$\bar{s}(S_1, S_2) < \alpha \, . \qquad (7.15)$$
The resulting graph consists of the most contrasting lines, removing lines for which
contrast is below the threshold. The method is applicable when a clear distinction
between lines and background is present.
For the case of a textured background, a local pruning method based on local comparison of edge saliency may be applied. Pruning of low confidence edges is performed by removing all edges for which an alternative path, via other vertices, can be found with higher confidence. Path confidence between S1 and Sn via vertices Si is defined by the average saliency over the n − 1 edges,

$$\bar{s}(S_1, S_2, \ldots, S_n) = \frac{1}{l} \left( \sum_{i=1}^{n-1} s(S_i, S_{i+1}) \right) . \qquad (7.16)$$
Here, l is the total path length. The direct path between S1 and Sn is pruned when

$$\bar{s}(S_1, S_n) < \alpha \, \max \bar{s}(S_1, \ldots, S_n) \qquad (7.17)$$

where the maximum is taken over all alternative paths between S1 and Sn . Locally
ill-defined lines are removed from the graph, the degree of removal given by α. For
α = 0, no lines are removed, whereas for α = 1 all lines are removed for which a
detour via another vertex yields higher saliency. Hence, short ill-defined paths are
pruned when longer well-defined paths exist between the vertices. The method is
applicable for a textured background, and when enough connections are present to
determine alternative routes between neighboring vertices.
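The local pruning rule (eq. 7.17) can be sketched on a graph whose edges carry the integrated saliency s (eq. 7.10) and the path length l. The networkx representation and the cutoff bounding the detour search are our own assumptions:

```python
import networkx as nx

def prune_local(G, alpha, cutoff=4):
    """Remove edges for which a detour has sufficiently higher average
    saliency (eq. 7.17); a sketch. Edges need attributes 's' and 'length'."""
    for u, v, data in list(G.edges(data=True)):
        direct = data['s'] / data['length']
        # Score every alternative route by its average saliency (eq. 7.16).
        H = G.copy()
        H.remove_edge(u, v)
        best = 0.0
        for path in nx.all_simple_paths(H, u, v, cutoff=cutoff):
            pairs = list(zip(path[:-1], path[1:]))
            s = sum(H.edges[a, b]['s'] for a, b in pairs)
            l = sum(H.edges[a, b]['length'] for a, b in pairs)
            best = max(best, s / l)
        # Prune the direct edge when a detour is better by factor alpha.
        if direct < alpha * best:
            G.remove_edge(u, v)
    return G
```

For α = 0 no edge is ever removed, while for α = 1 every edge with a better-scoring detour is pruned, matching the limiting cases described above.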
7.1.7 Overview
The algorithm is illustrated in fig. 7.3. The figure shows the extraction of cell borders from heart tissue (fig. 7.3a). Extracted vertices are indicated in fig. 7.3b. Line contrast is calculated according to (eq. 7.3), shown in fig. 7.3c. The tracing of minimum cost paths is shown in fig. 7.3d. Most of the lines are correctly detected, together with some spurious lines. Local pruning of the graph results in fig. 7.3e. Here, all edges which are not supported by the image data are removed. Figure 7.3f shows the area coverage, where black indicates c̄ = 1, and white indicates c̄ = 0.
In summary, we have proposed a one-parameter algorithm for the extraction of
line networks. The parameter indicates the saliency level in a hierarchical graph. The
graph tessellates the image into regions, where each edge travels over the minimum
cost path between vertices. The resulting graph is labeled by edge saliency and area
coverage, both derived from line contrast.
7.1.8 Error Analysis
The robustness of the proposed algorithm can best be evaluated by considering the different types of errors that may occur in forming the network graph. Table 7.1 gives an overview of possible errors and their consequences on the network graph G. The columns represent the consequences an error may have on the network graph G; the rows list the errors which may result from the vertex
and line detection. In the sequel we discuss the sensitivity of the proposed algorithm to these types of errors.

When the image contains textured regions, the texture may cause a high response for the line point detection. Hence, the algorithm will falsely respond to the texture as being an unbroken line and find an optimal path, as illustrated in fig. 7.4a. Further, when spurious line structures are present in the image data, without being part of the network, distortions may occur when the line is near other interconnections. In that case, the best path between vertices may run via the spurious line. An example is shown in fig. 7.4b, where text interferes with dashed line structures. For missed lines, basin coverage degrades. As the line structure is not part of the network, such sensitivity is unwanted.
Gaps in lines, or lines slightly off the vertex, illustrated by fig. 7.4c, have no consequences except that saliency degrades.

When a line structure is of too low contrast to contribute enough to form a line, the line may be pruned after confidence thresholding. An example of a missed line is shown in fig. 7.4d. As a consequence, coverage degrades, thereby indicating the event of a missed line.

For the case of a falsely detected vertex off a line (no example available), the vertex will be connected to the network. Saliency of the spurious lines will be low, as line evidence is missing from the image. Hence, pruning of the network by saliency is likely to solve such errors.
Spurious or missed vertices at lines have, except for the insertion or deletion of a vertex, respectively, no consequence for the extracted network. An example of spurious vertices is given in fig. 7.4e. The measure of saliency is invariant to insertion and deletion of vertices. This is proven by considering the path integral (eq. 7.10). Insertion of a vertex Sx at the path S1 , S2 results in
$$\begin{aligned} s(S_1, S_x) + s(S_x, S_2) &= \int_{S_1}^{S_x} R(x(p), y(p))\, dp + \int_{S_x}^{S_2} R(x(p), y(p))\, dp \\ &= \int_{S_1}^{S_2} R(x(p), y(p))\, dp \\ &= s(S_1, S_2) \qquad \square \end{aligned}$$
which is of course equal to the original saliency. Invariance to vertex deletion follows from the reverse argument. Since the average line contrast within the graph basins is not affected by insertion or deletion of vertices at edges, coverage (eq. 7.14) is invariant to vertex insertion or deletion at lines.
More critical is overlooking a vertex at a fork or line end. An example of a missing vertex is shown in fig. 7.4f. In both cases, an edge is missed in the resulting graph, and coverage degrades as not all line points are covered by the graph edges. When a vertex at a line end is missed, the line may be connected to a different vertex,
Table 7.1: Types of errors general to the extraction of networks of lines, and their consequences. Columns denote events in graph construction, whereas rows represent detection errors. Wanted sensitivity of the proposed algorithm is indicated by “+”, unwanted sensitivity to errors by “–”, and robustness of the proposed method to errors by ¤.

                             vertex  vertex  edge    edge    edge       saliency    coverage
Error type                   insert  delete  insert  delete  deviation  (eq. 7.10)  (eq. 7.14)
spurious line                ¤       ¤       ¤       ¤       –          ¤           –
gap in line                  ¤       ¤       ¤       ¤       ¤          +           ¤
line off vertex              ¤       ¤       ¤       ¤       ¤          +           ¤
missed line                  ¤       ¤       ¤       –       ¤          ¤           +
spurious vertex off line     –       ¤       +       ¤       ¤          –           ¤
spurious vertex at line      –       ¤       ¤       ¤       ¤          ¤           ¤
missed vertex at line        ¤       ¤       ¤       –       ¤          ¤           ¤
missed vertex at fork        ¤       ¤       ¤       –       ¤          –           +
missed vertex at line end    –       ¤       ¤       –       –          ¤           +
causing the minimum cost path to cross the background. Pruning of the network by saliency is likely to solve the error.

Besides errors general to the extraction of networks of lines, the proposed algorithm generates errors specific to minimum cost path based methods. By definition, only one path between two vertices can be of minimum cost. Any other path connecting the same vertices will be removed, as illustrated in fig. 7.5a. As a consequence, an edge is missed in the graph, and basin coverage degrades, indicating the event of a missed line.
Further, when a better defined path exists in the neighborhood of the traced
path, the algorithm tends to take a shortcut via the better defined path, as shown in
fig. 7.5b. In that case, coverage degrades to indicate the missed line, whereas saliency
increases due to the better defined route.
In conclusion, the proposed method is robust against: a. gaps in lines, b. lines
slightly off their vertex, c. spurious lines, and d. spurious vertices at lines. The
algorithm is sensitive to: a. missed lines, b. spurious vertices off lines, and c. missed
vertices at forks. For missed vertices, the resulting graph is degraded. For missed
lines, the graph may be degraded, and confidence of the area in which the missed line
is situated may be too high. Specific for the algorithm is the sensitivity to shortcuts,
and the inability to trace more than one line between connections.
7.2 Illustrations

7.2.1 Heart Tissue Segmentation
Figure 7.3 illustrates the application of the proposed algorithm to the extraction of cells from heart tissue. The tissue consists of cardiac muscle cells, the dark textured areas, and blood vessels, the white discs. Cell borders are transparent lines surrounding all cardiac muscle cells. Due to the dense packing of cells, blood vessels are squeezed between the cells. The cell borders appear as bright lines connecting the blood vessels. Further, the dense packing causes gaps in the lines at places where the light microscopic resolving power is too low to examine the cell border.

In the cardiac muscle cell application, the blood vessels are considered as initial vertices. The vessels are detected by dome extraction [3] (fig. 7.3a). The extracted network graph, together with basin saliency and coverage, is shown in fig. 7.3d,e,f. The heart tissue segmentation is a successful application in that a large number of cells is correctly segmented by the proposed algorithm. Individual cell parameters may be estimated after selecting those cells with high saliency and coverage. The number of cells extracted from the tissue is in the same range as for qualitative studies based on interactively outlining the cells by experts [1, 2, 16]. Hence, the algorithm enables the quantitative assessment of morphological changes in heart tissue at a cellular level.
7.2.2 Neurite Tracing
A second example (fig. 7.6) shows the interactive segmentation of neurites. The neurite starting points at the cell bodies are interactively indicated, and used as initial markers for the network segmentation algorithm. The resulting network is shown in fig. 7.6b. In this case, local pruning of lines is not possible, since no alternative routes between the markers are present. Paths between cells which are not connected are removed by thresholding the saliency (eq. 7.15). Note that no errors are caused by lack of line structure, indicated in fig. 7.6a. The overall saliency of the result is s̄ = 0.44, indicating that the line contrast spans almost half the dynamic range of the camera. Coverage is c̄ = 0.95, indicating that 95% of the line structures present in the image are covered by the network graph. Hence, the result is considered highly accurate.
7.2.3 Crack Detection
An example of general line detection is shown in fig. 7.7, where cracks in ink at high magnification are traced. The image shows an ink line, at such a magnification that the ink completely covers the image. Cracks in the ink form white lines, due to the transmission of light, against a background of black ink. Note that no natural markers are present.
For the general case of line detection, saddlepoint detection may be used to extract markers. The saddlepoints on bright lines are detected by

$$f_x^\sigma = 0, \quad f_y^\sigma = 0, \quad \lambda_+^\sigma < 0, \quad f_{xx}^\sigma f_{yy}^\sigma - \left(f_{xy}^\sigma\right)^2 < -\alpha \, . \qquad (7.18)$$
Here, α indicates salient saddlepoints, and is typically small to suppress spurious
saddlepoints due to noise. The saddlepoints are used as markers for the network
extraction algorithm.
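On a discrete grid the gradient zero crossings of (eq. 7.18) must be approximated; the sketch below flags pixels with a near-zero gradient, a negative λ+, and a sufficiently negative Hessian determinant. The gradient tolerance is our own discretization parameter, not part of the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saddle_markers(f, sigma, alpha=1.0, grad_tol=0.5):
    """Saddlepoints on bright lines (eq. 7.18); a sketch returning
    (row, col) marker positions."""
    f = f.astype(float)
    fx = gaussian_filter(f, sigma, order=(0, 1))
    fy = gaussian_filter(f, sigma, order=(1, 0))
    fxx = gaussian_filter(f, sigma, order=(0, 2))
    fyy = gaussian_filter(f, sigma, order=(2, 0))
    fxy = gaussian_filter(f, sigma, order=(1, 1))
    # Eigenvalue of the Hessian largest in magnitude (eq. 7.2).
    disc = np.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    lam1 = 0.5 * (fxx + fyy + disc)
    lam2 = 0.5 * (fxx + fyy - disc)
    lam_plus = np.where(np.abs(lam1) > np.abs(lam2), lam1, lam2)
    det = fxx * fyy - fxy ** 2        # negative at saddlepoints
    mask = ((fx ** 2 + fy ** 2 < grad_tol ** 2)
            & (lam_plus < 0) & (det < -alpha))
    return np.argwhere(mask)
```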
The detected saddlepoints are highlighted in fig. 7.7b. The result of the proposed algorithm, with the saddlepoints as vertices, is shown in fig. 7.7c. Average saliency is thresholded (eq. 7.15) to remove paths which cross the background. Overall saliency of the graph is 0.313, and coverage 0.962. The cracks are successfully extracted by the proposed algorithm, except that the crack ends are missing when end markers are absent. In that case, the detected cracks are too short.

Since no natural markers are present, the algorithm should be robust against marker insertion or deletion at lines. Figure 7.7d shows the result after random removal of half the markers in fig. 7.7c. Errors in the result include a shortcut via a more contrasting line. Further, line ends are pruned due to the absence of markers. Note that saliency is only marginally affected by the new situation, 0.316 instead of 0.313, and coverage likewise is reduced only marginally, from 0.962 to 0.960. Hence, the algorithm is robust to variations in the threshold value α for saddlepoint detection (eq. 7.18).
7.2.4 Directional Line Detection
Characteristic of the proposed algorithm is that line evidence is accumulated over the line. When line evidence is absent, the algorithm optimizes the shortest path to the neighboring line parts to continue integration. As a result, when large gaps are present, the algorithm may find an alternative route by crossing the background to a neighboring line, tracking that line, and jumping back to the original line after the gap. The problem may be solved by including line orientation information in the algorithm.
To proceed, we consider directional filtering for the detection of line contrast. Consider (eq. 7.3), which was measured by isotropic Gaussian filters of scale σ. For the directional filtering, we consider anisotropic Gaussian filters of scales σl and σs for the longest and shortest axis, respectively, and of orientation θ. Hence, line contrast is given by

$$R'(x, y, \sigma_l, \sigma_s, \theta) = \sigma_l \sigma_s \left| \lambda_+^{\sigma_l, \sigma_s, \theta} \right| \frac{1}{b^{\sigma_l, \sigma_s}} \, , \qquad (7.19)$$
where $b^{\sigma_l, \sigma_s}$ is given by (eq. 7.4). The scale σs depends on the line width as given by (eq. 7.6), whereas σl is tuned to adequately capture the line direction. Hence, σl should
be large enough to bridge small gaps, but should not be too large, to prevent errors when line curvature is high. In practice, an aspect ratio given by σs = ŵ and σl = 3σs is often sufficient.
Now that we have established how to filter in a particular direction, the filter needs to be tuned to the line direction. Two options are considered. First, eigenvector analysis of the Hessian yields the principal line direction. One could apply a first non-directional pass to obtain the line direction, as described by Steger [12]. A second pass then tunes the filter at each pixel to the line direction to obtain line contrast. Note that the filter orientation may be different for each position in the image plane. Instead of tuning the filter, sampling the image at different orientations may be applied: one applies (eq. 7.19) for a number of orientations. When the filter is correctly aligned with the line, the filter response is maximal, whereas a filter perpendicular to the line yields a low response. Hence, the per-pixel maximum line contrast over the orientations yields directional filtering.
The proposed method is applied to an example of a dashed line pattern, given in fig. 7.8a, taken from [8]. The example is taken from the hardest class, the “complex” patterns. The grey dots represent interactively selected markers, indicating crossings and line end points. Orientation filtering is applied at 0°, 30°, 60°, 90°, 120°, and 150°, and the maximum line contrast per pixel is taken over the sampled orientations. The result after graph extraction and saliency thresholding is shown in fig. 7.8b. The crude sampling of orientation space causes some of the lines to be noisy; a finer sampling enhances the result. Further, one line part is missed, due to a shorter line connecting the same markers. The text present in the example causes the algorithm to follow parts of the text instead of the original line. Note that isotropic line detection does not adequately extract the graph (fig. 7.8c).
7.3 Conclusion
The extraction and interpretation of networks of lines from images yields important organizational information about the network under consideration. We present a one-parameter algorithm for the extraction of line networks from images. The parameter indicates the extracted saliency level in a hierarchical graph. Input for the algorithm is the domain specific knowledge of interconnection points. The algorithm results in the network graph, together with edge saliency and catchment basin coverage.
The proposed method assigns a robust measure of saliency to each minimum
cost path, based on the average path cost. Edges with a low saliency compared to
alternative routes are removed from the graph, leading to an improved segmentation
result. The correctness of the network extraction is indicated by the edge saliency
and area coverage. Hence, confidence in the final result can be based on the overall
network saliency.
Design issues are robustness against general errors summarized in tab. 7.1. The
proposed method is robust against: a. gaps in lines, b. lines slightly off their vertex,
c. spurious lines, and d. spurious vertices at lines. The algorithm is sensitive to:
a. missed lines, b. spurious vertices off lines, and c. missed vertices at forks.
Thresholding on saliency reduces the errors caused by spurious vertices. Missed lines
are signaled by a measure of coverage (eq. 7.14), indicating how much of the line
evidence is covered by the network graph. Specific for the algorithm is the sensitivity
to shortcuts, and the inability to trace more than one line between connections. Any
algorithm based on minimum cost paths is sensitive to these types of errors.
We restricted ourselves to locally defined line networks, where lines connect neighboring vertices. For globally defined networks, like electronic circuits, the algorithm can be adapted to yield a regional or global solution. In that case, several distance transforms have to be applied, at the cost of a higher computational complexity. The pruning of the network and the measure of saliency remain applicable in the global case.
Incorporation of line direction information into the algorithm results in a better estimate of line contrast, and hence improves graph extraction. The eigenvector analysis of the directional derivatives yields an estimate of the local direction of the line. The directional information may be included by considering an anisotropic metric for the line contrast filtering. Experiments showed a better detection of the network graph for dashed line detection. The example given is considered a complex configuration, according to [8]. A disadvantage is the longer computation time, due to the anisotropic filtering pass.
The proposed method results in the extraction of networks from connection point to connection point. The routing from a starting connection to its final destination depends on the functionality of the network, and is not considered in this chapter. Correct interpretation of the network in the presence of distortion obviously requires information on the function of the network.
For the extraction of line networks, the proposed method has proven to be a useful tool. The method is robust against gaps in lines, and against spurious vertices at lines, which we consider the most prominent sources of error in line detection. Hence, the proposed method enables reliable extraction of line networks. Furthermore, the method indicates detection confidence, thereby supporting error-proof interpretation of the network functionality. The proposed method is applicable to a broad variety of line networks, including dashed lines, as demonstrated by the illustrations. Hence, the proposed method yields a major step towards general line tracking algorithms.
Bibliography
[1] J. Ausma, M. Wijfels, F. Thoné, L. Wouters, M. Allessie, and M. Borgers. Structural changes of atrial myocardium due to sustained atrial fibrillation in the goat.
Circulation, 96:3157–3163, 1997.
[2] C. A. Beltrami, N. Finato, M. Rocco, G. A. Feruglio, C. Puricelli, E. Cigola,
F. Quaini, E. H. Sonnenblick, G. Olivetti, and P. Anversa. Structural basis of
end-stage failure in ischemic cardiomyopathy in humans. Circulation, 89:151–163,
1994.
[3] S. Beucher and F. Meyer. The morphological approach to segmentation: The
watershed transformation. In E. R. Dougherty, editor, Mathematical Morphology
in Image Processing, chapter 12, pages 433–481. Marcel Dekker, New York, 1993.
[4] L. D. Cohen and R. Kimmel. Global minimum for active contour models: A
minimal path approach. Int. J. Computer Vision, 24:57–78, 1997.
[5] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever.
Scale and the differential structure of images. Image and Vision Computing,
10(6):376–388, 1992.
[6] J. Illingworth and J. Kittler. A survey of the Hough transform. Computer Vision
Graphics Image Process., 44:87–116, 1988.
[7] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.
[8] B. Kong, I. T. Phillips, R. M. Haralick, A. Prasad, and R. Kasturi. A benchmark:
Performance evaluation of dashed-line detection algorithms. In R. Kasturi and
K. Tombre, editors, Graphics Recognition Methods and Applications, pages 270–
285. Springer-Verlag, 1996.
[9] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[10] C. Lorenz, I. C. Carlsen, T. M. Buzug, C. Fassnacht, and J. Weese. A multiscale line filter with automatic scale selection based on the Hessian matrix for
medical image segmentation. In Scale Space Theories in Computer Vision, pages
152–163. Springer-Verlag, 1998.
[11] F. Meyer. Topographic distance and watershed lines. Signal Processing, 38:113–
125, 1994.
[12] C. Steger. An unbiased detector of curvilinear structures. IEEE Trans. Pattern
Anal. Machine Intell., 20:113–125, 1998.
[13] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, Boston, 1994.
[14] P. W. Verbeek and J. H. Verwer. Shading from shape, the Eikonal equation
solved by grey-weighted distance transformation. Pat. Rec. Let., 11:681–690,
1990.
[15] L. Vincent. Graphs and mathematical morphology. Signal Processing, 16:365–
388, 1989.
[16] H. W. Vliegen, A. van der Laarse, J. A. N. Huysman, E. C. Wijnvoord, M. Mentar, C. J. Cornelisse, and F. Eulderink. Morphometric quantification of myocyte
dimensions validated in normal growing rat hearts and applied to hypertrophic
human hearts. Cardiovasc. Res., 21:352–357, 1987.
Figure 7.3: Example of line detection on heart tissue (a), observed by transmission light microscopy. The dark contours show the segmented blood vessels, superimposed on the original image. Line contrast R(.) is shown in (b), the minimum cost graph in (c). The final segmentation (d) is obtained after local pruning of spurious edges for α = 0.9 (eq. 7.17). The estimated saliency (e) (eq. 7.11) and area coverage (f) (eq. 7.14), with dark representing high confidence in the result.
Figure 7.4: Examples of failures in the line detection. a. The detection of a spurious line due to a textured region. b. The deviation of a line due to spurious line structures, the text, in the image. c. A gap in a line; the line is correctly detected by the algorithm without errors (result not shown). d. A missing connection due to lack of line evidence. e. Extra vertices on the line do not influence the algorithm performance. f. A missing vertex at a fork, resulting in a missed line in the network graph.
Figure 7.5: Examples of failures specific to algorithms based on minimum cost paths. a. The missing of a line due to the double linking of vertices, for which only the best connection is preserved. b. A shortcut along a better defined line to optimally connect two vertices.
Figure 7.6: Extraction of a neurite network (a); note the gaps present in the neurites. The traced
network is shown in (b). The dots represent the interactively indicated neurite start points at the
cell bodies.
Figure 7.7: Extraction of a general line network. a. A high magnification image of ink, completely covering the image, distorted by white cracks through which light is transmitted. No natural markers are present. b. The saddlepoints at the bright lines (eq. 7.18). c. The detected lines, overall saliency s̄ = 0.313, coverage c̄ = 0.962. d. The result for half the number of markers, overall saliency s̄ = 0.316, coverage c̄ = 0.960. Note the shortcut and the removal of line ends.
Figure 7.8: Extraction of a dashed line network (a), taken from [8]; markers are interactively selected at line crossings and line end points, indicated by grey dots. The extracted network is shown in (b). Errors made include some shortcuts, and the missing of the bent line part due to the presence of a second, shorter connection between the markers. Note the difference with the isotropic result for scale σl (c). The scale σl was taken to integrate line evidence over the gaps in the dashed lines. Deviation from the centerline in the isotropic case is the result of the large integration scale compared to the line width.
Chapter 8

Discussion

8.1 Color
In this thesis, we have developed a theory for the measurement of color in images,
founded in physics as well as in measurement science. The thesis considers a physical
basis for the measurement of spatio-spectral energy distributions, integrated with the
laws of light reflection. The Gaussian color model solves the fundamental problem of
color and scale by integrating the spatial and color information.
The differential geometry framework [10, 4] is extended to the domain of color
images. As a consequence, we have given a physical basis for the opponent color
theory, and a physical basis for color receptive fields. Furthermore, it was concluded
that the Gaussian color model is the natural representation for investigating the
scaling behavior of color image features. The Gaussian color model describes the
differential structure of color images. Selection of scale enables robust and accurate
measurement of color value, even under noisy circumstances.
Color perception is mainly constrained by the number of spectral measurements, the spectral resolution. Due to the limited space available on the retina, evolution was forced to trade off between the number of different spectral receptors and their spatial distribution. For humans, spectral vision is limited to three color samples, and a tremendous number of spatial samples. Therefore, the Gaussian color model measures intensity, first order, and second order derivatives of the incoming spectral energy distribution. Daylight has driven evolution to set the central wavelength of color vision at about 520 nm, and the spectral range at about 330 nm. For any colorimetric system, measurement is constrained by these parameters.
A second achievement of the thesis is the integration of the physical laws of spectral
image formation into the measurement of color invariants. We define a complete
framework for the extraction of photometric invariant properties. The framework for
color measurement is grounded in the physics of observation. Hence it is theoretically
better founded and experimentally better evaluated than existing methods for the measurement of color features in RGB images. The framework can be applied to any vision problem where the reflection laws differ from those of everyday vision. Among other imaging circumstances, application areas are satellite imaging [3], vision in bad weather [7], and underwater vision.
The physical model presented in Chapter 3 demands spatial comparison in order to achieve color constancy. The model confirms relational color constancy as a first step in color constant vision systems [2, 8]. The subdivision of human perception into edge detection based on color contrast, cooperating with a subsystem for assigning colors to the segmented visual scene, may yield an overall performance which is highly color constant. Hence, spatial edge detection based on color contrast plays an important role in color constancy.
Most of the color invariant sets presented in Chapter 3 and Chapter 4 have spatial edge detection as their lowest order expression. Edge detection is confirmed by Livingstone and Hubel [5] and by Foster [2] to be of primary importance in human vision. To cite Livingstone and Hubel on one of the three visual subsystems: "Although neurons in the early stages of this system are color-selective, those at higher levels respond to color-contrast borders but do not carry information about what colors form the border." They conclude that the subsystem is important in seeing stationary objects in great detail, given its slow time course and high resolution. From a physical perspective, these results are evident given the invariants derived in Chapter 4.
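As a hedged illustration of this connection, the sketch below computes an edge map on the intensity-normalized components Eλ/E and Eλλ/E, in the spirit of the C invariants: a global scaling of the illumination intensity cancels in the ratios before the spatial derivatives are taken. It reuses the hypothetical gaussian_color_model helper of the earlier sketch and only approximates the exact C expressions of Chapter 4.

import numpy as np
from scipy.ndimage import gaussian_filter

def c_invariant_edges(rgb, sigma_x=1.0, eps=1e-6):
    """Edge strength of the intensity-normalized colors (a C-type sketch).

    Normalizing El and Ell by E cancels a global scaling of the
    illumination intensity before the spatial derivatives are taken.
    """
    E, El, Ell = np.moveaxis(gaussian_color_model(rgb, sigma_x), -1, 0)
    Cl, Cll = El / (E + eps), Ell / (E + eps)
    strength = np.zeros_like(E)
    for C in (Cl, Cll):
        gx = gaussian_filter(C, sigma_x, order=(0, 1))  # d/dx at scale sigma_x
        gy = gaussian_filter(C, sigma_x, order=(1, 0))  # d/dy at scale sigma_x
        strength += gx ** 2 + gy ** 2
    return np.sqrt(strength)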
We show in Chapter 4 that the discriminative power of the invariants can be ordered by the broadness of the group of invariance. A broad-to-narrow hierarchy of the invariance groups considered is given in section 4.2.6:
                          H    N    C    W    E
  viewing direction       +    +    +    –    –
  surface orientation     +    +    +    –    –
  highlights              +    –    –    –    –
  illumination direction  +    +    +    –    –
  illumination intensity  +    +    +    +    –
  illumination color      –    +    –    –    –
  inter-reflection        –    –    –    –    –
Invariance is denoted by +, whereas sensitivity to the imaging condition is indicated by –. The discriminative power of the invariance groups is given in section 4.3.2:
       σx = 0.75   σx = 1   σx = 2   σx = 3
  Ê      970         983     1000     1000
  Ŵ      944         978     1000     1000
  Ĉ      702         820      949      970
  N̂      631         757      962      974
  Ĥ      436         461      452      462
Each number refers to the number of colors, out of 1,000 patches, still distinguished by the invariant, and is an absolute number given the hardware and the spatial scale σx. For the proposed color invariants, discriminating power increases when a larger spatial scale σx is considered, thereby taking a larger neighborhood into account when determining the color value. Hence, a larger spatial scale yields a more accurate estimate of the color at the point of interest, and thereby a more reliable result.
The aim of the thesis is reached in that high color discrimination resolution is achieved while maintaining constancy against disturbing imaging conditions, both theoretically and experimentally. The proposed invariance groups describe the local structure of color images in a systematic, irreducible, and complete sense. The invariance groups incorporate the physics of light reflection as well as the constraints imposed by human color perception.
8.2 Geometrical Structure
Characterization of histological or pathological conditions can be based on the topographical relationships between tissue structures. Capturing the arrangement of local structure enables the extraction of global tissue architecture. Such an extraction procedure should be insensitive to distortions intrinsic to the acquisition of biological preparations.

In this thesis, a graph-based approach for the robust extraction of tissue architecture is established. A central design issue is robustness against errors common in the preparation of biological tissues, such as taking transections through a three-dimensional block, and errors in the detection of cells, blood vessels, and cell borders. Biological variation, which causes the architecture to be irregular, is treated as a design issue rather than as a source of error [1, 6, 9, 11]. As demonstrated in Chapter 6, these design considerations accomplished the recognition of tissue architecture.
In both Chapter 6 and Chapter 7, the extraction of geometrical arrangements is
based on local structure. Tissue architecture is derived from the local relationships
between markers. Confidence in the final result is estimated by the saliency of the
detected structures, and the goodness of fit to the quintessence of the architecture.
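As an illustration of deriving architecture from local relationships between markers, the following sketch builds a neighborhood graph over detected marker positions and prunes implausibly long edges. The Delaunay construction and the median-based pruning rule are assumptions of this sketch, standing in for, not reproducing, the graph-morphology operators of Chapters 6 and 7.

import numpy as np
from scipy.spatial import Delaunay

def marker_graph(points, prune_factor=2.0):
    """Neighborhood graph over detected markers (e.g. cell positions).

    Delaunay edges much longer than the median edge length are pruned,
    which makes the extracted arrangement tolerant to missed or spurious
    marker detections. Returns the kept edges as index pairs.
    """
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:  # each triangle contributes three edges
        for i in range(3):
            a, b = sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))
            edges.add((a, b))
    length = {e: np.linalg.norm(points[e[0]] - points[e[1]]) for e in edges}
    cutoff = prune_factor * np.median(list(length.values()))
    return [e for e, d in length.items() if d <= cutoff]

# Toy usage: a jittered grid of "cells".
rng = np.random.default_rng(0)
pts = np.stack(np.meshgrid(np.arange(10.0), np.arange(10.0)), -1).reshape(-1, 2)
pts += 0.15 * rng.standard_normal(pts.shape)
print(len(marker_graph(pts)), "edges kept")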
Robust extraction of tissue architecture reduces the nonbiological variation in the analysis of tissue sections, and thus improves confidence in the result. The quantitative methods based on local structure enable reliable classification of areas by type of tissue.
Combining the methodologies proposed in this thesis enables effective analysis and interpretation of histologically stained tissue sections. The proposed frameworks allow for fully automatic screening of drug targets in pharmaceutical research [12].
8.3 General Conclusion
This thesis makes a contribution to the field of color vision. The constraints imposed by human color vision are incorporated in the physical measurement of spatio-spectral energy distributions. The spatial interaction between colors is derived from
the physics of light reflection. Hence, the proposed framework for color measurement
enables the interpretation of color images from both a physical and a perceptual
viewpoint.
The second contribution of the thesis is the assessment of spatial arrangement.
The methodology presented is applied to the segmentation of biological tissue sections observed by light microscopy. The proposed concepts can be utilized in other
application areas.
As demonstrated by Mondriaan, the combination of color and spatial organization captures the essential visual information, in that the subsystems dealing with shape and with localization are both in effect. Hence, combining color and spatial structure, a combination yet to follow and the way to go, resolves the perceptual organization of images: Victory Boogie Woogie.
Bibliography
[1] F. Darro, A. Kruczynski, C. Etievant, J. Martinez, J. L. Pasteels, and R. Kiss.
Characterization of the differentiation of human colorectal cancer cell lines by
means of Voronoï diagrams. Cytometry, 14:783–792, 1993.
[2] D. H. Foster and S. M. C. Nascimento. Relational colour constancy from invariant
cone-excitation ratios. Proc. R. Soc. London B, 257:115–121, 1994.
[3] G. Healey and A. Jain. Retrieving multispectral satellite images using physics-based invariant representations. IEEE Trans. Pattern Anal. Machine Intell.,
18:842–848, 1996.
[4] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[5] M. Livingstone and D. Hubel. Segregation of form, color, movement, and depth:
Anatomy, physiology, and perception. Science, 240:740–749, 1988.
[6] R. Marcelpoil and Y. Usson. Methods for the study of cellular sociology: Voronoï
diagrams and parametrization of the spatial relationships. J. Theor. Biol.,
154:359–369, 1992.
[7] S. Narasimhan and S. Nayar. Chromatic framework for vision in bad weather.
In Proceedings of the Conference on Computer Vision and Pattern Recognition,
volume 1, pages 598–605. IEEE Computer Society, 2000.
[8] S. M. C. Nascimento and D. H. Foster. Relational color constancy in achromatic
and isoluminant images. J. Opt. Soc. Am. A, 17(2):225–231, 2000.
[9] E. Raymond, M. Raphael, M. Grimaud, L. Vincent, J. L. Binet, and F. Meyer.
Germinal center analysis with the tools of mathematical morphology on graphs.
Cytometry, 14:848–861, 1993.
[10] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, Boston, 1994.
[11] H. W. Venema. Determination of nearest neighbours in muscle fibre patterns
using a generalised version of the Dirichlet tessellation. Pattern Recognit. Lett., 12:445–
449, 1991.
[12] K. Ver Donck, I. Maillet, I. Roelens, L. Bols, P. Van Osta, T. Bogaert, and J. Geysen. High density C. elegans screening. In Proceedings of the 12th International C. elegans Meeting, page 871, 1999.
Samenvatting∗
Color and Geometrical Structure in Images
Applications in microscopy
This thesis treats both color and geometrical structure. Color is approached from theoretical measurement science, in which color is the result of a local, spatio-spectral aperture measurement. Differential calculus is then used to derive features that are invariant under everyday illumination conditions. The features are extensively tested in experiments for invariance and discriminative power. The experiments show the high discriminative power of the various invariants, and thereby demonstrate their invariant properties. New in this thesis is the coupling between the physical measurement of color and human color perception.

This thesis further treats the quantification of geometrical structures, applied specifically in light microscopy, although the methodology developed is more broadly applicable. Graph-based mathematical-morphology methods are developed for segmenting regular point and line patterns, such as those present in brain and heart tissue. The methods developed have been applied successfully in quantifying morphological parameters, and can be deployed in the search for potential drugs in pharmaceutical research.
∗ Summary in Dutch, here translated to English
Dankwoord†
This thesis concludes a learning process, and has thereby inherently been influenced by many people. I learned a great deal from my supervisor, Arnold, who helped me through the difficult passages in the text, after he had already earned his keep with a few remarks on the research itself . . . I especially appreciated the "pushing" of the www-weekends to the Ardennes, and later the Eifel, the Wadden Islands, and the Zeeland coast, and I will look forward to them in the future.

The research was largely carried out at Janssen Pharmaceutica, under the supervision of Hugo, who gave me the room to go entirely my own way, but kept up the pressure to actually follow that way through. I greatly appreciated this freedom, for which Frans also deserves praise for the great patience he had until "finally" something emerged that was also applicable. The collaboration with Frans and Peter has, in my opinion, led to a number of good applications of image processing, in particular of scale-space methods, in biological and pharmaceutical research, partly due to the "technology pull" of Kris, Luc, and Rony. Although I often held on to my own modest "Dutch" opinion, the discussions with Frans and Peter have certainly changed my views.

Although Janssen has a very large research division, as a computer science PhD student you feel somewhat lost among the biologists, with the sole exception of Luc, who boxed my ears with statistics many a time. The visits to Amsterdam every two to three weeks were therefore most welcome, and an inexhaustible source of inspiration. The impact of the discussions on those days with Rein, Theo, Anuj, Geert, Dennis, Harro, Erik, and Arnold (Jonk) can clearly be recognized in the thesis. My sincere thanks for this; it gave me strong backing.

A clear shortcoming of a computer science PhD student is the limited knowledge of biology. After many stories from my Belgian fellow PhD students at Janssen about NF-κB receptors, biochemical pathways, and apoptosis of NGF-deprived PC12 cell lines, a few things have become clear to me. Rony, Gwenda, Gerrit, Christopher, and the students of the other departments, thank you for teaching me the biological background needed to keep up in a pharmaceutical company. Naturally, I also owe a word of thanks here to Astrid, Peter, and Jos.

The graphics department of Janssen prepared the figures in this thesis, and printed the thesis. Many times I alarmed Lambert with strange image formats, formulas, and encapsulated PostScript figures. I am also indebted to Jozef and Bob for producing figures and posters, very successful once they were displayed on the poster board. Thanks also to Marcel and Luc for the opportunity to carry out this research at Janssen, and for arranging the funding of this book.

I have experienced both Janssen and the UvA as pleasant environments in which to do research, partly due to the good atmosphere in both groups. For this, my thanks to all colleagues of the former Life Sciences department, in particular Mirjam, Gerd, Eddy, Roger, Koen, Marc, Guy, and Greet, and to everyone in the ISIS group, in particular Benno, Marcel, Silvia, Carlo, Kees, Wilko, Edo, Herke-Jan, Frank, Joost, Tat, Andy, and Jeroen.

Astrid, how can I ever thank you . . .

† Acknowledgements in Dutch, here translated to English