An Approach Towards Fast Gradient

Transcription

An Approach Towards Fast Gradient
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
1
An Approach Towards
Fast Gradient-based Image Segmentation
Benjamin Hell*, Student Member, IEEE, Marc Kassubeck, Pablo Bauszat, Martin Eisemann,
Marcus Magnor, Senior Member, IEEE
Abstract—In this paper we present and investigate an approach to fast multi-label color image segmentation using convex
optimization techniques. The presented model is in some ways
related to the well-known Mumford-Shah model, but deviates in
certain important aspects. The optimization problem has been
designed with two goals in mind: The objective function should
represent fundamental concepts of image segmentation, such as
incorporation of weighted curve length and variation of intensity
in the segmented regions, while allowing transformation into a
convex concave saddle point problem that is computationally
inexpensive to solve. This paper introduces such a model, the
nontrivial transformation of this model into a convex-concave
saddle point problem, and the numerical treatment of the
problem. We evaluate our approach by applying our algorithm
to various images and show that our results are competitive in
terms of quality at unprecedentedly low computation times. Our
algorithm allows high-quality segmentation of megapixel images
in a few seconds and achieves interactive performance for low
resolution images.
Index Terms—unsupervised image segmentation, convex optimization
I. I NTRODUCTION
T
HE importance of grouping performed by the human
visual system has been evident at least since the Gestalt
movement in psychology [1]. Not surprisingly, grouping is also
considered essential for a wide range of computational vision
problems. For instance, intermediate-level vision problems
such as stereo or motion estimation benefit from appropriate
region support for correspondence estimation. Higher-level
problems, such as image indexing [2], foreground-background
separation [3], or recognition by parts [4] can also benefit
from a useful segmentation of the image. The problem of
image segmentation and grouping is, therefore, a long-standing
challenge in computer vision. In 1989, Mumford and Shah
introduced their famous image model [5] and showed how it
can be applied to many segmentation problems. Unfortunately,
it is highly non-linear and cannot be easily optimized. Existing
convex approaches typically suffer from high computational
costs, especially when separating into many distinct regions.
In this paper, we propose a model incorporating all the basic
aspects of the gradient-based Mumford-Shah approach while
maintaining an easy and efficiently-to-handle convex saddle
point structure to enable fast computation. Our work is inspired
Copyright (c) 2015 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to [email protected].
Authors are with the Computer Graphics Lab at TU Braunschweig, Germany. Please see http://graphics.tu-bs.de/people/ for contact.
Fig. 1. High Resolution Image Segmentation. Using the CUDA implementation of the presented algorithm, a megapixel image (top left) is segmented
in 8 regions (top right) in about 3 seconds, while a quarter megapixel image
(bottom left) can be segmented in 3 regions (bottom right) in a fraction of a
second.
by the seminal previous work on the Mumford-Shah functional
by Alberti et. al. [6] and the works of Pock et. al [7] and
Strekalovskiy et. al. [8], which build upon Alberti’s investigations. In Section II we introduce our model by considering the
binary segmentation case. We derive its unique saddle point
representation and present non-trivial extensions that allow
for multi-region segmentation and proper color information
treatment. In Section III, we show how a primal-dual convex
optimization algorithm can be applied to solve the saddle
point problem. To efficiently treat multi-region segmentation,
we introduce a very fast, approximative projection scheme in
Section III-C. We conclude with an evaluation of our approach
and compare our results to other state-of-the-art techniques.
A. Related Work
Major contributions to image segmentation and similar problems have been made by Mumfurd-Shah in [5] and by research
on active contours [9]–[11]. These approaches have in common, that they are non-convex, hence it is in general difficult
to obtain a global optimum. Their numerical method of choice
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
to obtain a local optimal solution is the level set method.
Later on researchers turned to convex problems which lead to
expectation maximization approaches and considered convex
relaxations for the piecewise-constant Mumford-Shah model
[12], [13] as well as the scalar piecewise smooth MumfordShah model [7], which incorporates ideas closely related to
the transformation process used in this paper and is based on
[6]. Our goal is to make use of aspects, that are relevant for
image segmentation, without the need of involving the entire
graph of a function that corresponds to the optimal solution of
the problem. This way, at the cost of not being able to use our
approach for image smoothing techniques, we do not have to
introduce an additional dimension to the problem, leading to
faster computation times. We also use directional information
obtained from a segment’s boundary normal, a topic which has
also been touched in [14]. Another challenge is the extension
of binary segmentation techniques to multi-region segmentation. In this regard, prior work most closely related to ours is
[15], [16] and, most of all, [12]. The latter paper also contains
summaries on crucial facts about projection algorithms that
play an essential role in convex multi-region segmentation. In
contrast to [12] this paper introduces a vastly different and
novel image segmentation model and a very fast projection
scheme, which is essential for achieving short computation
times while maintaining a high quality segmentation output.
Fast first order primal-dual algorithms for convex optimization
problems relevant to this work have also been investigated in
[17], [18] and [19].
B. Contributions
Overall this paper makes the following contributions:
●
●
●
●
●
●
We introduce a unique model that shows resemblance
to the famous Mumford-Shah functional and can be
transformed into an easy-to-handle, convex saddle point
problem without introducing additional dimensionality.
We show how this model can be transformed into its
unique saddle point representation.
We show how to exploit a well established primaldual convex optimization algorithm to obtain the unique
solution of the saddle point problem.
To calculate projections onto the specific convex sets
that evolve naturally from the introduced model and the
specific application of the first order primal-dual algorithm, we introduce a very fast, approximative projection
scheme.
The introduced model is flexible and can be adjusted to
quite some degree. The presented ideas can be applied
to a wide class of optimization problems and are not
restricted to the form of the model presented in this paper.
The presented problem can be solved without introducing
any additional dimensionality. Together with a very fast
projection algorithm this leads to very fast computation
times, which is one of the major benefits of our approach.
2
I∶
⟨.,.⟩X ∶
∥.∥p ∶
∂∶
D∶
∇∶
˜∶
∇
Ω ⊂ Rn ∶
BV ∶
V∶
Φ∶
l∶
k∶
image intensity (scalar respectively color vector)
inner product on Hilbert space X
p-norm
boundary (on sets) resp. differential operator
weak derivative
gradient vector
discrete gradient vector
rectangular domain of the given picture
space of functions of bounded variation
convex restriction set of the primal variable
convex restriction set of the dual variable
number of image channels
k + 1 is the number of segments
II. M ODELING P ROCESS
This section introduces the optimization model that will
be investigated throughout this paper. For convenience, we
start with presenting the binary segmentation case first (Section II-A) and consider the extension to the multi-region case
afterwards (Section II-C). Also, we will focus on basic color
image treatment first and will offer an extension of the model
to treat intensity gradients for each color channel individually
later on. From a modeling perspective, both extensions are
straightforward but involve some technical difficulties that
need to be overcome. All our observations hold true for
arbitrary domain dimension n; for standard 2D image segmentation we have n = 2.
A. The Novel Binary Segmentation Model
We first consider a rather general form of a binary image
segmentation model. To label the occurring segments we use
an indicator function v ∶ Ω → {0,1} which operates on the
image domain Ω. The space of all these binary functions will
be called V˜ ∶= {v ∣ v ∶ Ω → {0,1}}. Jv is the jump set of the
binary function v which represents the boundary of the individual image segments. Hn−1 represents the n−1-dimensional
Hausdorff-measure. For dimension n = 2 (2D image segmentation), this n−1-dimensional measure corresponds to curve
length measurement. The general form of our binary model is
then given by
inf
v∈V˜
∫ ∥DI(x)∥2 dx +
Ω/Jv
´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¶
intensity variation on segments
2
λ1 ∫ ∥d(x)−νJv(x)∥2 dHn−1 +λ2∫ g(x)dHn−1 +
Jv
´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶
directional deviation
Jv
´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶
(1)
weighted boundary length
λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx
Ω
´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶
image data
C. Notation and Abbreviations
The following notation will be used throughout the whole
paper.
λ1 ,λ2 ,λ3 ∈ R are scalar weights to balance the individual
terms in the objective function, in essence controlling their
respective influence. In the following we will explain and
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
Fig. 2. Approximation in the discrete setting The figure shows the exact
boundary (blue line) and its discrete approximation (red blocks). As the
calculation eventually takes place on an equidistant grid (pixel grid in the
2D case), we can approximate the integral ∫Ω/Jv ∥DI(x)∥2 dx (integration
˜
without blue curve) via subtracting ∫Jv ∥∇I(x)∥
2 dx (integration red boxes)
˜
˜ is the numerical approximation of the image
from ∫Ω ∥∇I(x)∥
2 dx. ∇I
gradient vector and Jv the jump set of the binary function v, which
corresponds to the boundary (cf. (3) and (4)).
3
incides with the ability to transform a relaxed version of the
problem into a convex saddle point problem. Alberti et. al. [6]
show that this is possible for the Mumfurd-Shah functional [5].
Unfortunately, the saddle point problem can only be obtained
by extending the solution space by one additional dimension.
This leads to high computational costs for obtaining an optimal
solution numerically because a higher dimensional space has
to be discretized. In contrast, our approach works directly
with image gradients (cf. the intensity variation term) and
without any similarity to the initial image term like the beforementioned Mumford-Shah functional. Hence, we are able to
preserve the dimensionality of the solution space by transforming problem (1) first and obtaining a convex saddle point
representation for this problem afterwards in Section II-B.
The only term that needs to be modified is the intensity
variation term ∫Ω/Jv ∥DI(x)∥2 dx. We assume I ∈ BV, so according to Appendix A we have the following decomposition
of the distributional derivative DI:
motivate the individual terms of the objective function above.
intensity variation on segments: This term essentially
penalizes any large deviation of intensity values on each
segment, ignoring intensity changes that occur at a segment’s
boundary, which is described by the jump set Jv . A similar
term can, for example, be found in the Mumford-Shah
functional [5]. Note that Jv is a null set with respect to
the n-dimensional Hausdorff measure on Ω. DI has to be
understood as the distributional derivative of the intensity I,
which we assume to be a function of bounded variation and
to exhibit discontinuities on the jump set Jv . This way the
exclusion of the null set Jv in the set Ω/Jv makes sense and
cannot be omitted.
DI = ∇I +CI +(I + −I − )νI dHn−1 ⌞JI
From this representation it immediatelly follows:
∫ ∥DI(x)∥2 dx = ∫ ∥DI(x)∥2 dx− ∫ ∥DI(x)∥2 dx =
Ω
Ω/Jv
weighted boundary length: This term penalizes large
boundary measure and additionally weighs the penalization at
each point x by g(x) > 0, an idea also occurring in edge-based
segmentation approaches [11]. There are lots of meaningful
choices for a weighting function g(.), such as ideas from
geodesic active contours [11]. For one possible choice see
Section II-E.
image data: This term serves as a regularizer and involves
the data terms ρi (x). Essentially, ρi (x) yields an image data
related cost of assigning label i at point x. For a specific choice
of ρi see Section II-E. Without this image data term, v ≡ const.
on Ω, i.e. the jump set Jv is empty and no segmentation takes
place, would tend to be the optimal solution of problem (1),
which has to be avoided.
Problem (1) has been designed with the purpose of obtaining
the solution while minimizing computational effort. This co-
Jv
+
−
n−1
∫ ∥DI(x)∥2 dx− ∫ (I (x)−I (x))dH
Ω
(3)
Jv
˜ as the numerical derivative
In a discrete setting, i.e. with ∇I
and calculation taking place on a equidistant grid, we get
+
−
n−1
≈
∫ ∥DI(x)∥2 dx− ∫ (I (x)−I (x))dH
Ω
directional deviation: νJv is the inner unit normal to the
boundary of the set {x ∶ v(x) = 1}. The boundary itself
coincides with the jump set Jv . In (9) we will see that this
normal vector is closely related to the distributional derivative
of the indicator function v. d(x) represents a suggestion for
the unit normal vector of the boundary passing through x. We
will use the normalized image gradient for d(x) later on (cf.
Section II-E). Direction-based terms, though with a different
intention, have also been used in [14].
(2)
Jv
n−1
˜
˜
2 dx− ∫ ∥∇I(x)∥
2 dH
∫ ∥∇I(x)∥
Ω
(4)
Jv
which is visualized in Figure 2.
Substituting (3) into (1) and making use of the fact
that the term ∫Ω ∥DI(x)∥2 dx has no influence on the
optimal argument of the whole optimization problem we
arrive at the final version of our binary segmentation model:
2
inf λ1 ∫ ∥d(x)−νJv(x))∥2 dHn−1 +
v∈V˜
Jv
+
−
n−1
∫ λ2 g(x)−(I (x)−I (x)) dH +
Jv ´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶
f (x,λ2 )=f (x)∶=
(5)
λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx
Ω
B. Transformation into Saddle Point Problem
One essential property of the optimization problem (5) is that
it can be rewritten as a convex-concave saddle point (min/max)
problem by introducing additional dual variables and relaxing
the set V˜ . This property will be essential for obtaining the
optimal solution from an analytical and numerical standpoint.
We make use of a similar technique as the one presented in
[6] by Alberti et. al. for the Mumford-Shah functional. Note
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
4
that the approximation process, and hence the particular optimization problem (5), can be transformed into the saddle point
problem without introducing an additional dimensionality as
in the Mumford-Shah case (cf. [6]). Another deviation from
Alberti’s work, and at the same time the key to obtaining the
saddle point formulation, is the derivation of an appropriate
restriction set Φ for the dual variable.
−2λ1 d(x) with radius f (x)+λ1 (1+∥d(x)∥22 ). To do so, we
take a look at the possible radius:
Theorem 1 (Curve Integral Representation).
and C(x) being the supposed to be center of the circle.
Substituting into (13) we obtain
2
n−1
=
∫ λ1 ∥d(x)−νJv(x)∥2 +f (x,λ2 )H
Jv
φ∈Φ
with φ(x) being on the line described by
2
⟨φ(x),η(x)⟩ = f (x)+λ1 ∥d(x)−νJv(x)∥2
2
with Φ being the convex restriction set for the dual variable
φ given by:
Φ = {φ ∈ C0∞ (Ω)n ∶ ∥φ(x)+2λ1 d(x)∥2 ≤
f (x)+λ1 (1+∥d(x)∥22 ), x ∈ Ω}
(7)
Proof. Let η(x) ∈ R2 be an arbitrary unit vector, i.e.
∥η(x)∥2 = 1. Let us further define a half space dependent on
η(x) by
Hη(x) ∶= {φ(x) ∈ Rn ∶ ⟨φ(x),η(x)⟩ ≤
2
f (x)+λ1 ∥d(x)−η(x)∥2 }
(8)
We now turn to one of the most important facts for this proof, a
general form of the coarea formula, also used in [6], which is according to Appendix A - a special case of the decomposition
of a function of bounded variation:
Dv = νv ⋅Hn−1 ⌞Jv
(9)
with Hn−1 being the n−1-dimensional Hausdorff measure, i.e.
curve length measure in the 2D case, and νJv being the inner
unit normal vector to the boundary of the set {x ∶ v(x) = 1}.
With these results from measure theory we obtain for
φ ∈ C0∞ (Ω)n :
n−1
∫ ⟨φ(x),Dv(x)⟩ = ∫ ⟨φ(x),νJv(x)⟩dH
(10)
(14)
(15)
If we choose C(x) = −2λ1 d(x), we obtain
R(x) = f (x)+λ1 (1+∥d(x)∥22 )
Ω
Ω
(13)
R(x) = f (x)+λ1 ∥d(x)−η(x)∥2 −⟨C(x),η(x)⟩
(6)
sup ∫ ⟨φ(x),Dv(x)⟩
R(x) ∶= ⟨φ(x)−C(x),η(x)⟩
(16)
which is independent of η(x).
Remarks
● At each point x the set Φ describes a circle around
center −2λ1 d(x) with radius f (x)+λ1 (1+∥d(x)∥22 ).
● The radius of the circle has to be positive for equality
(6) to hold. In practice, this can be easily achieved at
any point x by choosing suitable parameters λ1 , λ2
and an appropriate weighting function g(x). As the
value of λ1 has, in general, only little impact on the
final segmentation result, this does not pose any serious
restriction.
● It is possible to choose negative values for f (x). This is
an interesting property of the model. Essentially, f (x) <
0 forces the boundary to pass through point x.
Using Theorem 1, the optimization problem (5) can be
rewritten as:
inf sup ∫ ⟨φ(x),Dv(x)⟩+
v∈V˜ φ∈Φ
Ω
λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx
(17)
Ω
with V˜ ∶= {v ∶ Ω → {0,1}}. Convex relaxation of V˜ into
V ∶= {v ∶ Ω → [0,1]} makes (17) a convex-concave saddle
point problem, which will be considered throughout the rest
of this paper:
Jv
Recall that ∥νJv(x)∥2 = 1. So the following equality holds true:
∫
2
λ1 ∥d(x)−νJv(x)∥2 +f (x)dHn−1 =
Jv
sup ∫ ⟨φ(x),Dv(x)⟩
φ∈Φ
Ω
λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx
(11)
(18)
Ω
with Φ defined in (7).
˜
φ∈Φ
Ω
˜ is the space of all C ∞ (Ω)n -functions, which, evaluwhere Φ
0
ated at a certain point x, lie in the intersection of all possible
half spaces defined by (8):
˜ ∶= {φ ∈ C0∞ (Ω)n ∶ φ(x) ∈ Hη(x) ,∥η(x)∥2 = 1,x ∈ Ω}
Φ
inf sup ∫ ⟨φ(x),Dv(x)⟩+
v∈V
(12)
˜ = Φ, as
The only thing left to do is to show that indeed Φ
defined in (7). This coincides with the intersection of all
possible half spaces Hη(x) forming a circle around the center
C. Extension to Multi-Region Case
In this section, we extend the saddle point representation (18)
to the multi-region case. This extension is in a certain way
straightforward, but the representation for multiple segments
introduces some technical difficulties that need to be addressed
accurately.
To represent multiple regions mathematically we introduce a
concatenation of binary functions, each one splitting the image
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
5
area into one more segment. This representation follows the
ideas presented in [16], which in return is based on [15]. The
new set of binary variables is defined by:
V˜ = {vi ∶ Ω → {0,1} ∶ 1 ≥ v1 (x) ≥ ⋅⋅⋅ ≥ vk (x) ≥ 0, x ∈ Ω}
(19)
with region Ri (i = 1,...,k +1) being represented by
Ri = {x ∈ Ω ∶ vi−1 (x)−vi (x) = 1} (i = 1,...,k +1)
k
(20)
with v0 (.) ≡ 1 and vk+1 (.) ≡ 0.
Straightforward extension of the objective function in (18)
by using the scalar product ∑ki=1 supφi ∈Φ ∫Ω ⟨φi ,Dvi ⟩ leads
to counting certain parts of the boundary multiple times,
which unfortunately produces undesirable results (especially
for f (x) < 0). Hence, to obtain an appropriate curve integral
representation, analogous to the one in Theorem 1, we need
to restrict the corresponding dual variables to an appropriate
convex set Φ, which is given by the following theorem.
Theorem 2 (Curve Integral Representation for Multiple Segments).
∫
2
λ1 ∥d(x)−νJv(x)∥2 +f (x,λ2 )dHn−1 =
Jv
(21)
k
sup ∑ ∫ ⟨φi (x),Dvi (x)⟩
φ∈Φ i=1
Ω
with Dvi (.) (i = 1,...,k) being the weak derivatives of the
binary functions vi (.) ∶ Ω → {0,1} (i = 1,...,k) representing the
segments according to scheme (20). Jv being the combined
jump set of all these binary functions vi (i = 1,...,k), i.e.
the boundary of all k +1 segments, νJv being the inner
normal to at least one of the sets {x ∶ vi (x) = 1} (i = 1,...,k)
(boundaries of segments specified by primal variable v may
overlap, but if they do they still share the same normal)
and Φ being the convex restriction set for the dual variable
φ(.) = (φ1 (.),...,φk (.)) given by:
Φ ={(φ1 ,...,φk ) ∶ φi ∈ C0∞ (Ω)n ,∥ ∑ φi (x)+2λ1 d(x)∥2 ≤
i1 ≤i≤i2
f (x)+λ1 (1+∥d(x)∥22 ), x ∈ Ω, 1 ≤ i1 < i2 ≤ k}
(22)
Proof. Let Ai = {x ∶ vi (x) = 1}, with v = (v1 ,...,vk ) ∈ V˜ (cf.
definition (19)). Due to the structure of V˜ we immediately
obtain the property:
x ∈ ∂Ai2 ∧x ∈ ∂Ai1 ⇒ x ∈ ∂Ai (i1 ≤ i ≤ i2 )
(23)
So for each x ∈ Ω there exist i1 ,i2 ∈ {1,...,k} such that
k
i2
∑⟨φi (x),Dvi (x)⟩ = ⟨ ∑ φi (x),Dvi1 (x)⟩
i=1
From Theorem 2 in conjunction with a straightforward
extension of the image data term in the objective function
and convex relaxation of the primal restriction set V˜ , we
obtain the following convex-concave saddle point problem
for multi-region image segmentation:
(24)
inf sup ∑ ∫ ⟨φi (x),Dvi (x)⟩+
v∈V
φ∈Φ i=1
Ω
k+1
(25)
λ3 ∑ ∫ (vi−1 (x)−vi (x))ρi (x)dx
i=1 Ω
with convex restriction set
V = {vi ∶ Ω → [0,1] ∶ 1 ≥ v1 (x) ≥ ⋅⋅⋅ ≥ vk (x) ≥ 0, x ∈ Ω}
(26)
and Φ defined in (22), v0 (.) ≡ 1, vk+1 (.) ≡ 0 and
ρi (i = 1,...,k +1) the image data term. For a specific
choice of ρi see Section II-E.
D. Handling Color Information
So far, color information only appeared in the image data term
(volume integral) of the objective function. The curve integral
term does not make any use of multiple color channels
yet. The image gradient ∇I(.) has been pre-calculated on
a grayscale picture. While this leads to faster calculation
times to obtain the segmentation, obviously information
that is easily available is not used in the segmentation
process. To do so, we introduce additional dual variables
(one for each color channel) in the saddle point problem
(25), which leads us to a simple, full-color multi-region model:
3
k
inf ∑ sup ∑ ∫ ⟨φij (x),Dvj (x)⟩+
v∈V
i=1 φi ∈Φi j=1 Ω
k+1
(27)
λ3 ∑ ∫ (vi−1 (x)−vi (x))ρi (x)dx
i=1 Ω
with φi (.) = (φi1 (.),...,φik (.)) (i = 1,...,3) and
Φi ={(φi1 ,...,φik ) ∶ φij ∈ C0∞ (Ω)n ,
∥ ∑ φij (x)+2λ1 di (x)∥2 ≤ fi (x)+λ1 (1+∥di (x)∥22 ),
j1 ≤j≤j2
x ∈ Ω, 1 ≤ j1 < j2 ≤ k}
(i = 1,...,3)
(28)
with Ii (.) representing the image intensity according to
channel i (i = 1,...,3). The definition of fi (.) follows directly
from applying the definition in (5) channelwise, i.e.:
fi (x) = fi (x,λ2 ) ∶= λ2 gi (x)−(Ii+ (x)−Ii− (x))
(29)
i=i1
where equality has to be understood according to the definition
of the weak derivative D and i1 > i2 is admissible, in which
case the sum yields 0.
The rest of the proof follows by considering the right hand
side of equation (24) and applying the proof of Theorem 1.
E. Specific Choices for the Implementation
Let N (.) represent the normalization operator, i.e.
⎧
y
⎪
⎪
N (y) ∶= ⎨ ∥y∥2
⎪
⎪
⎩0
y≠0
y=0
(30)
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
For all our specific implementations of the algorithm based
on the models presented in Section II-A, Section II-C and
Section II-D, we use the following choices for the respective
functions introduced so far:
d(x) ∶= N (DI(x)): With this choice, the directional deviation
term describes the deviation of the normalized image gradient
(gradient of intensity I) from the unit normal vector of the
separating curve (described by Jv ). This favors a region
boundary which has a normal vector pointing in the direction
of intensity increase. Normalization of the image gradient
vector is necessary to be able to compare it to a unit normal
vector.
1
g(x) = g(x,α) ∶= 1+α∥DI(x)∥
(α > 0): This specific choice fa2
vors the segment’s boundary to run through points with large
image gradient. Similar terms occur in edge-based image
segmentation approaches, such as geodesic active contours
[11]. For the color model in Section II-D we simply define
this weight channelwise, i.e. gi (x) ∶= 1/(1+α∥DIi (x)∥2 ).
ρi (x) ∶= ∥I(x)− I¯i ∥2 (i = 1,...,k): Describes the local (pointwise) deviation of the image intensity I(x) - for color images
this is a 3D-vector - from some appropriate mean color
value I¯i . Those values need to be obtained once in a precomputation step. One option to calculate those mean color
values would be to use k-means clustering, although a lot of
other appropriate methods like histogram analysis or manual
selection are available and may be more appropriate depending
on the scenario at hand. Once again, it is intentional to keep
ρi simple as the according term in the objective function acts
as a regularizer (cf. Section II-A).
Note that all these choices could be replaced by different
meaningful terms that do have certain intentional properties
and depend on the same arguments.
III. O PTIMIZATION
This section deals with the actual optimization process. The
main purpose of transforming the initial optimization problem
into a convex-concave saddle point problem is to be able to
use fast primal-dual convex optimization algorithms to solve
the problem for its unique solution. It is well known that
those algorithms perform very well on optimization problems
exhibiting a saddle point structure. This has been shown for
many computer vision problems like image denoising, image
inpainting, image deconvolution and zooming, motion estimation, and image smoothing, just to name a few. An application
to those examples, a detailed comparison to other algorithms
and proof of the stability and convergence properties of first
order primal-dual algorithms for convex saddle point problems
can be found in [17] and [19].
In this section we will give a short overview of the primaldual algorithm we use for calculating our results, and we
show how to apply this algorithm to the specific problem at
hand. Applying the algorithm involves nontrivial projection
operations in the multi-region case. In Section III-C, we will
show fast projection algorithms for projecting onto the primal
restriction set V and introduce our own projection algorithm
to compute projections onto the dual restriction set Φ.
6
A. The Primal-Dual Projection Algorithm
The primal-dual algorithm we use is presented in detail in [17].
This section gives a short overview and determines the notation
we will use later on. The algorithm works on convex-concave
saddle point problems.
The general form of the problem considered by Chambolle
et. al. [17] suited to our problem is
inf
sup
v∈BV (Ω)k φ∈C ∞ (Ω)nkl
⟨φ,K(v)⟩L2 (Ω)nkl +G(v)−F ∗ (φ) (31)
0
The function spaces have been chosen to suit problems (18),
(25) and (27). Recall that k +1 is the number of segments
and l shall be the number of image channels (l = 1:grayscale,
l = 3: color). K(.) is a linear operator, G(.) is a convex lower
semicontinuous function, and F ∗ (.) is the convex conjugate of
a lower semicontinuous function F (.). The conjugacy notation
has to do with the usual derivation of saddle point problems
via the Fenchel-Legendre transform [20].
With the notation introduced above, the optimization algorithm we use is the following:
φn+1 = (Id+σ∂F ∗ )−1 (φn +σK(˜
v n ))
v n+1 = (Id+τ ∂G)−1 (v n −τ K ∗ (φn+1 ))
n+1
v˜
= 2v
n+1
−v
(32)
n
with Id being the identity operator and step widths σ,τ > 0
chosen such that στ ∥K∥22 < 1, with ∥K∥2 being the operator
norm of K. The inequality is essential for convergence of
the algorithm, hence choosing an appropriate upper bound of
∥K∥22 is mandatory.
There has been a paper published on preconditioning for the
primal-dual algorithm [18] and one on a more general theory
regarding first order forward-backward algorithms [19], both
introducing techniques to potentially speed up the algorithm
at hand. We found that in our particular case a heuristic, fixed
choice of the step sizes τ and σ introduced in (32) yields the
best results.
To apply the algorithm one essentially needs to implement
discrete versions of the specific resolvent operators (Id+
σ∂F ∗ )−1 and (Id+τ ∂G)−1 . In addition, proper discrete forms
of the linear operator K and its adjoint operator K ∗ have
to be determined. The process of convexification explained
in Section II leads to the relaxed convex restriction set V
(26). So as a final step the solution, i.e. the primal variable
v = (v1 ,...,vk ) needs to be thresholded in each component to
obtain a proper segmentation.
B. Applying the Algorithm
This section gives a short summary of the problem-specific
operators appearing in the algorithm (32) and other implementation details. We consider the multi-region color model
(27) introduced in Section II-D. The treatment for the binary
segmentation model (18) and the semi-color multi-region
model (25) should become evident as special cases.
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
7
Comparison of the saddle point form of the multi-region
color model (27) to the general form (31) yields
l
k
∑ ∑ ∫ ⟨φij (x),Dvj (x)⟩ −δΦ (φ)
v∈BV (Ω)k φ∈C ∞ (Ω)nkd i=1 j=1
´¹¹ ¹ ¸¹¹ ¹ ¶
0
Ω
∗
´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ F (φ)=
inf
sup
⟨φ,K(v)⟩=
(33)
k+1
+λ3 ∑ ∫ (vi (x)−vi−1 (x))ρi (x)dx+δV (v)
i=1 Ω
´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶
G(v)=
with δΦ (.) and δV (.) being the indicator functions for the dual
restriction set Φ = {Φ1 ,...,Φl } defined in (28), respectively the
primal restriction set V defined in (26):
⎧
⎧
⎪0 v ∈ V
⎪0 φ ∈ Φ
⎪
⎪
δV (v) = ⎨
, δΦ (φ) = ⎨
(34)
⎪
⎪
∞ otherwise
∞ otherwise
⎪
⎪
⎩
⎩
Note that according to (4) it makes sense to choose fi (x) =
˜ i (x) (∇
˜ being the discrete gradient operator),
λ2 gi (x)− ∇I
which occurs in the definition of the set Φ and has been defined
in (5), respectively (29). For a practical choice of gi (.) see
Section II-E.
The resolvent operators can be deduced from the following
identity (cf. [17]) which implies usage of first order optimality
conditions and the fundamental theorem of variational calculus:
(Id+σ∂F ∗ )−1 (φ) = argmin {
θ∈C0∞ (Ω)nkd
∥θ −φ∥22
+F ∗ (θ)} = ΠΦ (φ)
2σ
∥u−v∥22
+G(u)} =
2τ
u∈L2 (Ω)k
⎛
⎛ −∥I(.)− I¯1 ∥2 +∥I(.)− I¯2 ∥2 ⎞⎞
⋮
⎟⎟
ΠV ⎜v(.)−τ λ3 ⎜
⎝
⎝−∥I(.)− I¯k ∥2 +∥I(.)− I¯k+1 ∥2 ⎠⎠
(Id+τ ∂G)−1 (v) = argmin {
(35)
with Π being the pointwise projection operator with respect
to Euclidean distance. It should be evident by now that these
projection operators will have a huge influence on calculation
time, so fast algorithms to perform these projections will
be essential. We introduce algorithms for ΠΦ and ΠV in
Section III-C.
The operator K(.) for l = 3 (color) is given by
⎛⎛Dv1 ⎞
⎛Dvk ⎞⎞
K(v) = Dv ∶= ⎜⎜Dv1 ⎟ , ... , ⎜Dvk ⎟⎟
⎝⎝Dv1 ⎠
⎝Dvk ⎠⎠
(36)
Comparison to (33) directly leads to the adjoint operator
K ∗ (.):
3
3
i=1
i=1
K ∗ (φ) = −divφ ∶= −(∑ divφi1 , ... , ∑ divφik )
(37)
Note that for the grayscale case (l = 1) those operators reduce to K(v) = Dv = (Dv1 ,...,Dvk ) and K ∗ (φ) = −divφ =
−(divφ1 ,...,divφk ). Using an estimation process like in the
proof of Theorem 3.1 in [21] we obtain ∥K∥22 ≤ 4nkl. We use
this estimate to choose appropriate step widths σ and τ such
that στ ∥K∥22 < 1, which leads to convergence of the primaldual algorithm (32) (cf. [17]).
For initializing v and φ practically any random value could
be used, as the final result is unique due to the convex structure
of the problem and hence does not depend on the starting point
of the computation. Of course, to reduce computation time it
is best to choose an initial solution that is as close as possible
to the final - yet to be determined - optimum. As noted in
Section II-E there are many options to compute mean color
values I¯i for the data term ρi . In general it makes sense to
base a first guess for v on the mean color values I¯i chosen for
the image data term and choose Dv as a starting point for φ.
C. Projection Algorithms
Eventually all the operations presented in previous sections
take place locally at each point of the input image, which is
especially beneficial for parallel implementations (e.g. on the
GPU). This also holds true for the projection operators ΠV
and ΠΦ . As already mentioned, algorithms representing those
local projections have huge influence on computation time.
1) Projection on Primal Set V: Taking a look at the
resolvent operator (Id+τ ∂G)−1 presented in (35) one sees
that ΠV projects at each point x a vector v(x) ∈ Rk onto the
set V (x) ∶= {y = (y1 ,...,yk ) ∈ Rk ∶ 1 ≥ y1 ≥ ⋅⋅⋅ ≥ yk ≥ 0}, which
follows directly from the definition of V in (26). To execute
the projection we slightly modify an algorithm presented in
[12] by Chambolle et al.:
Algorithm
To project y = (y1 ,...,yk ) ∈ Rk on V (x), we obtain the
projection y˜ = (˜
y1 ,..., y˜k ) = ΠV (x) (y) by
1) set y˜i ∶= yi and define k clusters Zi ∶= {i} (i = 1,...,k)
2) identify index i with y˜i < y˜i+1 , if there is no such index
abort the algorithm and jump to step 4
3) set clusters Zj ∶= Zi ∪Zi+1 ∀j ∈ Zi ∪Zi+1 , set y˜i+1 ∶=
y˜i ∶= ∑j∈Zi (˜
yj )/∣Zi ∣ and go back to step 2
4) set
y˜i ∶= max(min(˜
yi ,1),0)
(clipping),
then
y˜ = ΠV (x) (y)
Remarks
● The presented algorithm deviates from the algorithm
presented in [12] in the last step, i.e. the clipping
process.
● It can be shown that doing the clipping at the end results
in the algorithm yielding the exact projection on the set
V (x) defined in (26).
2) Projection on Dual Set Φ: For implementing the pointwise projection ΠΦ onto the set Φ = {Φ1 ,...,Φl } we need a fast
algorithm to project φi (x) ∈ Rnk (i = 1,...,l) at point x onto
Φi (x) ∶= {y = (y1 ,...,yk ) ∈ Rnk ∶ ∥ ∑ yj +2λ1 di (x)∥2 ≤
j1 ≤j≤j2
fi (x)+λ1 (1+∥di (x)∥22 ), 1 ≤ j1 < j2 ≤ k}
(38)
which follows directly from the definition (28) of the dual
restriction set Φi (i = 1,...,l).
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
8
Given a vector y = (y1 ,...,yk ) ∈ Rnk we start with introducing z = (z1 ,...,zk ) ∈ Rnk by defining
input
i
zi ∶= ∑ yj , z0 ∶= 0Rn ⇒ zj2 −zj1 −1 = ∑ yj
j1 ≤j≤j2
j=1
(39)
SMMS
In addition we define radius and center by
R ∶= fi (x)+λ1 (1+∥di (x)∥22 ), C ∶= −2λ1 di (x)
(40)
ours
This way, we can rewrite
Φi (x) = {y = (y1 ,...,yk ) ∈ Rnk ∶ ∥zj2 −zj1 −1 −C∥ ≤ R,
j
zj = ∑ ym (j = 1,...,k), z0 = 0Rn , 1 ≤ j1 ≤ j2 ≤ k}
(41)
m=1
which leads to the idea of using successive projections to
approximate the projection process.
Algorithm
Let y = (y1 ,...,yk ) ∈ Rnk be given, we then project on Φi (x)
specified in (38) by
1) set z0 ∶= 0Rn , z˜0 ∶= z0 and j ∶= 1
2) set zj ∶= z˜j−1 +yj , then iteratively project zj
onto
⋂ Km with circle Km ∶= {z ∈ Rn ∶
m=0,...,j−1
∥z − z˜m −C∥2 ≤ R}, the projected point shall be z˜j
a) from the definition of C and R one can see that
⋂ Kj ≠ ∅ (i = 1,...,k) if fi (x) ≥ 0, otherwise
j=0,...,i−1
(b) yields at least a good approximation
b) a fast and again approximative routine for projecting zj onto ⋂ Km is to set
m=0,...,j−1
z˜j ∶= argmax {∥z −zj ∥2 ∶ z = ΠKm(zj ) (m = 0,...,j −1)}
z
with ΠKm being the Euclidean projection on the
circle Km
3) set j ∶= j +1, if j ≤ k go to step 2, otherwise go to step 4
4) obtain final projected vector y˜ = (˜
y1 ,..., y˜k ) via reversing the accumulation process: y˜j ∶= z˜j − z˜j−1 (j =
1,...,k)
Remarks
● The presented algorithm does not yield the exact projection, but the objective function is very well conditioned
with regard to the dual variable φ. So small errors in
obtaining this dual variable practically only impact the
rate of convergence, not the final result.
● The algorithm is a very fast way to obtain a sufficiently
accurate projection on the set Φi (x) and hence on the
discretization of the set Φ.
IV. E XPERIMENTAL VALIDATION
In this section we apply the algorithm developed in Section II
and III to a variety of different test scenarios. We show that
the results are competitive to state-of-the-art segmentation algorithms in terms of quality and can be obtained very fast. The
presented algorithm is fully parallelizable. We implemented it
in CUDA.
Fig. 3. Binary segmentation: Results for a series of natural scenery images
(top) by the new statistical model for Mumford-Shah (SMMS) [23] (middle)
and our proposed approach (bottom). Each picture has a resolution of about
0.2 megapixels. Computation time for our approach is, depending on the
image, between 40 and 95 ms for a high accuracy of = 10−4 . According to
[23], the SMMS algorithm takes roughly 10 seconds to compute each image.
In the preceding sections we showed how to obtain a highly
efficiently computable structure of the initial optimization
problem, which led to the primal-dual projection scheme using
tailored projection algorithms. We show that the whole process
leads to very short computation times, which to the best of
our knowledge have not been achieved with other approaches
based on similar models yet. All statistics were measured on
an Intel Core i7-4930K, 3.40 GHz and 32 GB RAM PC with
an NVIDIA GeForce GTX 780Ti running Windows 8 64-bit.
In the following, we show results for binary segmentation
and then advance to multi-region segmentation evaluation.
We conclude this section with a comparison to clustering
algorithms and the application of the Berkeley image segmentation benchmark BSDS500 (cf. [22]). All test scenarios are
evaluated using algorithm (32). Recall that the primal variable
is denoted by v and the dual variable by φ. The output we
show is the accumulated, thresholded primal variable, which
we rescaled to values between 0 and 1. Each intensity in the
resulting grayscale image corresponds to one segment. If the
root mean square deviation
√ in the primal variable drops below
a specified value , i.e. ∥v n+1 −v n ∥22 /(resolution⋅k) < , the
algorithm terminates. If not explicitly mentioned otherwise, we
use = 10−4 , σ = 10 and τ = 0.99/(σ ⋅4nkl) (cf. Section III-B)
and the threshold value 0.5. We refer to results computed with
50000 iterations as reference solutions.
A. Binary Segmentation
We first consider our binary segmentation model (18). While
from a mathematical standpoint the theory behind the algorithm in the binary case is pretty much the same as for
the multi-region case, the projection scheme presented in
Section III boils down to an easily computable clipping (for
the primal variable) and projecting onto a single circle (dual
variable). Figure 3 shows results for binary segmentation of
graylevel images. We compare our results with those of Ref.
[23]. The results are comparable in terms of quality and can be
obtained in a fraction of a second even for high accuracy . Our
experiments show that realtime (≤ 50 ms) computation speeds
are achievable with almost the same segmentation result.
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
Fig. 4. 3-label segmentation: The input image with resolution 512 × 512
(left) and the initial solution based on least Euclidean distance towards picked
mean color values {I¯1 , I¯2 , I¯3 } (middle). The output of the algorithm (right)
based on the semi-color model (25) is obtained in approx. 250 ms with =
0.5 ⋅ 10−4 in 700 iterations. Parameter choices are λ1 = 10, λ2 = 5, λ3 = 1 and
α = 12.
Fig. 5. Challenging 4-label segmentation: The input image initially created
by Farbman et. al [24] at resolution 500 × 375 (left) and the initial solution based on least Euclidean distance towards picked mean color values
{I¯1 , I¯2 , I¯3 , I¯4 } (middle). The output of the algorithm (right) based on the
semi-color model (25) is obtained in approx. 0.5 seconds in 1400 iterations.
Parameter choices are λ1 = 2, λ2 = 21, λ3 = 2 and α = 1.
B. Multi-Region Segmentation
For multi-region segmentation using our semi-color model
(25) and full-color model (27) presented in Section II, the computation process involves far more complicated projections,
as introduced in Section III-C. In most cases it is sufficient
to use our much faster multi-region semi-color model (25).
Results for 3-label, 4-label and 8-label segmentation using this
approach are presented in Figure 4, Figure 5 and Figure 6.
In Figure 7, we present a case where the full-color model
yields superior results to the multi-region model, i.e. when the
grayscale-converted input color image has very low contrast.
C. Convergence Analysis
The rate of convergence is essential for any iterative algorithm.
As we are dealing with a convex-concave saddle point problem
in (18), (25) and (27), it is safe to assume convergence when
∥v n+1 −v n ∥2 gets very ”low”. What ”low” means depends on
the scenario at hand and cannot be answered in general. For
example, ∥v n+1 −v n ∥2 scales linearly with the number of segments and also depends on the step width τ chosen for the al-
Fig. 6. 8-label segmentation: The input image also used in [15] at resolution
640 × 491 (left) and the initial solution based on k-means clustering (middle).
The output of the algorithm (right) based on the semi-color model (25) is
obtained in 1.4 seconds with = 10−4 in 900 iterations. Parameter choices
are λ1 = 2, λ2 = 3.5, λ3 = 1 and α = 1.
9
Fig. 7. Model comparison: In this artificially created case the input image
(left) is segmented using the semi-color model (middle) and the full-color
model (right). Due to the lack of image gradient information, the image data
term takes the predominant role in the semi-color model (cf. Section II-A).
Fig. 8. Relative convergence: Root√
mean square deviation (RMSD) of successive primal variable values, i.e.
∥v n+1 − v n ∥22 /(resolution ⋅ k), plotted
against number of iterations for four different test scenarios, presented in this
paper: Binary segmentation (magenta) from Figure 3, 3-label segmentation
(red) from Figure 4, 4-label segmentation (blue) from Figure 5 and 8-label
segmentation from Figure 6.
gorithm (the smaller τ gets, the smaller the difference will be).
The determining factor for when the algorithm can be stopped
should be the actual visual change of the final segmentation
output. Of course, this can only be measured in artificial test
environments where a reference solution is known. Because
of these observations, in Figure 8 we present graphs for the
root mean square deviation
(RMDS) of the primal variable,
√
which is defined by ∥v n+1 −v n ∥22 /(resolution⋅k), depending
on the number of iterations. The actual deviation of the output
after each iteration from the reference solution can be seen in
Figure 9.
Fig. 9. Absolute convergence: Same scenario as in Figure 8, but considering
root mean square deviation of the primal variable to a reference solution. The
difference of a few pixels after about 500 iterations in all four scenarios
corresponds to ≈ 10−4 as a stopping criteria (cf. Figure 8).
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
res. / seg.
128x128
256x256
512x512
1024x1024
2
0.08
0.08
0.19
0.65
4
ms
ms
ms
ms
0.08
0.16
0.45
1.65
10
8
ms
ms
ms
ms
0.14
0.38
1.28
4.99
16
ms
ms
ms
ms
0.46
1.62
6.21
24.9
input
k-means(global) k-means(local) mean-shift
quick-shift
ms
ms
ms
ms
TABLE I
C OMPUTATION TIME : P ERFORMANCE IN TIME PER ITERATION OF OUR
ALGORITHM FOR SEGMENTING THE FIREWORKS IMAGE ( PRESENTED IN
F IGURE 1) FOR DIFFERENT RESOLUTIONS AND NUMBERS OF SEGMENTS
USING THE SEMI - COLOR MULTI - REGION MODEL PRESENTED IN
S ECTION II-D.
D. Computation Time
Table I shows representative computation times for one image
at different resolutions and segmentation into different numbers of segments with our algorithm implemented in CUDA.
This shows that our algorithm scales very well with the
number of pixels and the number of segments and makes high
resolution image segmentation possible in a matter of seconds.
In general, it is difficult to compare computation times
to other approaches. First, different methods achieve at least
slightly different segmentation results on one and the same
input image. This is due to different mathematical models
describing the segmenation process. For example, one major
feature of our approach is the incorporation of the ”intensity
variation on segments” term (cf. (1)), which to the best of
our knowledge only occurs in a similar fashion in MumfordShah-related approaches like [7]. Second, the computation
time depends to a great extent on the implementation of the
algorithm. Highly parallelizable algorithms like ours benefit
greatly by the efficient computing capabilities of a GPU. In our
experiments we experienced a speedup of approx. factor 100
compared to an optimized Mathematica implementation of our
algorithm. This is why we first want to focus on computation
times of other approaches implemented in CUDA.
In [15] the authors claim that typical runtimes for image
segmentation problems for 512 × 512 images and 10 labels
are in the range of 50 seconds on a Nvidia Tesla C1060 GPU.
For the approach presented in [14] usual runtimes for 640 ×
480 images and 5 labels are around 90 seconds, and for 320
× 240 images 7 seconds on a NVIDIA GTX 480 GPU. In [7]
typical execution times for 128 × 128 images with the solution
discretized on 32 levels are in the order of 600 seconds on
on a Tesla C1060 GPU. All these computation times have
been obtained using slightly older hardware than the NVIDIA
GTX 780Ti we used for our experiments. Taking this into
account, our algorithm still outperforms those approaches in
terms of computation time. An implementation of the quickshift algorithm (cf. [25]), a method similar to mean-shift and
hence completely different from our approach, only needs a
few hundred milliseconds running on our setup (for further
details see Section IV-E).
Additionally, we want to compare our approach to popular
CPU-based segmentation algorithms. According to [23] the
statistical model for Mumford-Shah (SMMS) and multiphase
Mumford-Shah (MPMS) CPU implementations take at least
50 seconds to compute, depending on the number of labels
Fig. 10. Comparison to Mean-Shift and K-Means Clustering: From left to
right: The input image and the results obtained by k-means clustering (global
optimum, local optimum), mean-shift (Wolfram Mathematica Implementation)
and the quick-shift implemention of [25] (with additional grouping of regions
and parameters experimentally chosen for best visual results).
and the image size. The segmentation algorithm gPb-owt-ucm
presented in [22], which the Berkeley image segmentation
benchmark (see Section IV-F) is built upon, has rather long
computation times of at least 100 seconds on images with
a moderate resolution. According to [26], optimized k-means
clustering algorithms usually take only a few hundred milliseconds to compute, but are not guaranteed to converge to a global
optimum. Efficient mean-shift algorithm implementations need
at least a few seconds to compute (cf. [27]).
E. Comparison to Mean-Shift and K-Means Clustering
Mean-shift and k-means clustering are well known clustering algorithms, that are widely used for many applications,
one of them being image segmentation. It is important to
note though that those algorithms do not yield coherent image
segments as their output. They rather produce clusters of data,
which in the case of an image corresponds to clusters of
pixels. Nevertheless, with appropriate grouping mechanisms
and filters, their output can be used for image segmentation.
For example, if the output contains some noise, denoising
methods can be used create suitable image regions. For a short
comparison we included Figure 10, which shows the same
pictures as Figure 5 and Figure 4, but this time with applying
k-means clustering and the mean-shift algorithm. There are
many implementations and different variants of k-means clustering and mean-shift algorithms available. Computation times
vary from a few seconds to a few hundred seconds depending
on the algorithm used and do also depend heavily on the image
size.
The k-means clustering algorithm can be implemented in
such a way that the computation time is very low. Unfortunately there is no guarantee that the algorithm converges
to its global optimum, which is in general the desired result. [26] proposes a fast global k-means algorithm, which
generates new initial solutions consecutively by running basic
k-means clustering in each iteration. While there is no guarantee that the algorithm reaches the global optimum, it is fast
and delivers in general better results than basic versions of
the k-means algorithm, because it has a higher probability of
reaching the global optimum of the clustering problem.
Mean-shift algorithms usually have longer computation
times and the number of clusters is in general determined
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
BSDS500
Covering
PRI
VI
ODS OIS Best ODS OIS ODS OIS
Human
0.72 0.72
0.88 0.88
1.17 1.17
gPb-owt-ucm
0.59 0.65 0.74
0.83 0.86
1.69 1.48
ours
0.47 0.59 0.74
0.77 0.83
2.32 1.78
Mean Shift
0.54 0.58 0.66
0.79 0.81
1.85 1.64
Felz-Hutt
0.52 0.57 0.69
0.80 0.82
2.21 1.87
Canny-owt-ucm
0.49 0.55 0.66
0.79 0.83
2.19 1.89
NCuts
0.45 0.53 0.67
0.78 0.80
2.23 1.89
Quad-Tree
0.32 0.37 0.46
0.73 0.74
2.46 2.32
TABLE II
R ESULTS OF THE B ERKELEY IMAGE SEGMENTATION BENCHMARK .
F OR EACH SEGMENTATION METHOD , THE LEFTMOST THREE
COLUMNS REPORT THE SCORE OF THE COVERING OF GROUND - TRUTH
SEGMENTS ACCORDING TO OPTIMAL DATASET SCALE (ODS),
OPTIMAL IMAGE SCALE (OIS), OR B EST COVERING CRITERIA . T HE
11
input
gPb-owt-ucm
ours
RIGHTMOST FOUR COLUMNS COMPARE THE SEGMENTATION
METHODS AGAINST GROUND - TRUTH USING THE P ROBABILISTIC R AND
I NDEX (PRI) AND VARIATION OF I NFORMATION (VI) BENCHMARKS ,
RESPECTIVELY. F OR DETAILS SEE [22].
automatically, which makes it difficult to obtain a desired
number of clusters. Amongst others, [27] shows a reasonably
fast and reliable solution for mean-shift algorithms, which still
needs a few seconds running on larger images. Quick-shift, a
method similar in concept to the mean-shift algorithm yields
faster solutions, though. [25] proposes an implementation of
the quick-shift algorithm which runs on the GPU. On our setup
it achieved average computation times of 300 milliseconds for
images with a resolution of approximately 0.25 megapixels.
Figure 10 shows that under certain circumstances, and with
additional accumulation of regions (grouping mechanism), the
quick-shift algorithm can obtain good image segmentation
results.
Fig. 11. Results of our algorithm on images of the Berkeley segmentation
dataset: The input image (left), the output of the gPb-owt-ucm (Berkeley)
algorithm of paper [22] (middle) and the output of our algorithm (right)
with automated parameter choices. The shown output serves as the input
to the Berkeley image segmentation benchmark. The average computation
time for the gPb-owt-ucm algorithm, using code supplied by the authors,
is approximately 150 seconds, while our approach took an average of
approximately 1 second per image on the dataset.
F. Berkeley Benchmark Results
In the following we show how to adapt our algorithm to
hierarchical segmentation and apply the most recent Berkeley Image Segmentation Benchmark BSDS500 [22]. This
benchmark provides 500 images partitioned into a 200 image
training set with manually created ground truth images, a
validation dataset consisting of 100 images and a 200 image
test set. Paper [22] also introduces a state-of-the-art image
segmentation algorithm (gPb-owt-ucm) and the segmentation
benchmark is heavily based on specific details and implementations of this algorithm. Ground truth data is obtained from
segmentations done manually by multiple humans.
It is important to note that the benchmark does neither
take the computation time nor the coherency of the detected
regions into account. It also does not require the contours
to be closed. All of these are important characteristics of
image segmentation algorithms which are used beyond pure
object recognition and are major benefits of our approach.
Nevertheless, we were able to rank among the top scoring
algorithms, mentioned in [22] and Table II, and our algorithm
is the fastest among those. For example, the best scoring gPbowt-ucm algorithm from the Berkeley group takes an average
of 150 seconds for each picture in the test dataset and, while
our algorithm needs about 1 second to compute the final result.
As the output of our algorithm is parameter-dependent we
decided to compute optimal parameters on the supplied training data set and use those parameters on the final benchmark
test dataset. We, however, consider the optimal choice of
parameters as future work which will improve the results
even further. All computations for this section were carried
out using our algorithm with the semi-color model (25).
To transform the output of our algorithm into a compatible
format for the benchmark, we separated individual coherent
regions from our segmentation results and built a tree-like
structure, where separated regions are merged successively.
This structure served as the input for the benchmark. The
detailed results of the benchmark can be seen in Table II. Some
results of our algorithm and Berkeleys own gPb-owt-ucmalgorithm on images of the Berkeley test dataset are shown
in Figure 11.
In general applying benchmarks in image segmentation is
a nontrivial task and results are always only objective to
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
a certain degree. The authors of the Berkeley benchmark
state in [28] that ”it is considerably easier to quantify the
performance of computer vision algorithms at recognition
than at segmentation” and that some researchers argue that
”segmentation or grouping performance can be evaluated only
in the context of a task such as object recognition”. In addition
to that, there are certain important facts to keep in mind:
● The benchmark does not take the computation time into
account, so we can only compare our results in terms of
quality.
● As our focus is on computation time, we did not construct
any more sophisticated segmentation processes that might
involve running the algorithm multiple times on one
image. Those approaches might improve the quality and
accuracy of the segmentation considerably, but contradict
the focus of this paper.
● The benchmark incorporates different kind of measures
to compare a certain segmentation of an image to the
ground truth segmentation. The most important one being
covering-measure, i.e. measuring of region overlap.
● Measurement takes place on a local per pixel level, so
the coherency of regions is not taken into accout. This
also explains why algorithms like mean-shift are able to
obtain a rather high score.
● A segmentations contour must not be closed to be applicable for the benchmark.
● The benchmark favors a structure, where segments with
the same label belong to one coherent region in the image.
This is in contrast to our labeling scheme, which we
consider a feature of our approach.
● The output of the segmentation algorithm presented in
this paper depends considerably on the choices of the
parameters available, especially λ2 and λ3 .
● The segmentation algorithm presented in [22], which the
benchmark is built upon, has long computation times,
taking 100 seconds or longer for average sized images
of 0.15 megapixels.
● In general, we consider the choice of optimal parameters
for our algorithm as future work. The same holds true for
additional user input and the choice of pixel dependent
parameters, like λi (x) (i = 1,...,3). This will improve
the quality further. The rather strong dependency on the
parameters also shows in lower ODS (optimal dataset
scale) values for our approach.
V. C ONCLUSION
In this paper we have introduced a very fast method for
unsupervised, gradient based image segmentation. We have
evaluated our approach on different test cases, and achieve
high-quality results at a fraction of the usually required computation time. This work is intended as a proof of concept
of the ideas behind the modeling and transformation process
presented in Section II. We feel that these ideas can be applied
to a more general class of optimization problems that tries
to minimize gradient norms specifically excluding boundaries.
While the presented approach works autonomously, additional
user input can be incorporated to increase the visual quality of
12
the segmentation. For example, it would be an interesting and
useful extension to let the user influence the weights for each
energy term in the objective function locally at each pixel.
Because of the short computation times it would also be interesting to see real time interaction of the user with the algorithm
via some sort of GUI. The presented approach is independent
of the domain dimension, so for future work we plan to
consider 3D image segmentation and video segmentation.
From a mathematical standpoint we would like to investigate
the justification of the convex relaxation process used to obtain
a convex-concave saddle point formulation in Section II and
the detailed influence of the approximative projection scheme
presented in Section III-C on convergence of the algorithm.
In addition, the full-color model presented in Section II-D
could be extended to have connected restriction sets (similar
to the work of Strekalovskiy et. al. [8] on the Mumford-Shah
functional), which would further improve quality.
A PPENDIX A
D ISTRIBUTIONAL D ERIVATIVE OF F UNCTIONS OF
B OUNDED VARIATION
Functions of bounded variation (BV) have many interesting
analytical properties. The one most relevant for this paper is
the decomposition of their distributional derivative. Let u ∈ BV,
then according to [29] the distributional derivative Du has the
following Lebesgue-Nikodym decomposition:
Du = ∇u+Cu+(u+ −u− )νu dHn−1 ⌞Ju
(42)
In the above decomposition ∇u denotes the absolutely continous part with respect to Lebesgue measure on Ω of the
measure Du. Cu indicates the so-called Cantor part of the
measure Du. Both parts, ∇u and Cu, do not charge any dHn−1
finite sets, with dHn−1 denoting n−1-dimensional Hausdorff measure. Only the last part, i.e. (u+ −u− )νu dHn−1 ⌞Ju ,
charges dHn−1 finite sets, with Ju being the jump set of the
function u, νu being the normal vector to this jump set pointing
in the ascending direction, and u+ > u− being left and right
limit values of the function u at the jumping point. For binary
functions - and in general piecewise constant functions - the
decomposition (42) consists for obvious reasons only of the
jump part, i.e. for the binary function v ∶ Ω → {0,1} we have:
Dv = (v + −v − )νv dHn−1 ⌞Jv
(43)
A rather elaborate overview of BV functions and the above
decomposition can be found in [12].
ACKNOWLEDGMENT
The research leading to these results has received funding
from the European Unions Seventh Framework Programme
FP7/2007-2013 under grant agreement no. 256941, Reality
CG.
R EFERENCES
[1] M. Wertheimer, Laws of organization in perceptual forms. Harcourt,
Brace & Jovanovitch, 1938.
[2] K. W. T. Jr, B. L. Bhaduri, E. A. Bright, A. Cheriyadat, T. P. Karnowski,
P. J. Palathingal, T. E. Potok, and J. R. Price, “Automated Feature Generation in Large-Scale Geospatial Libraries for Content-Based Indexing,”
2006.
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015
[3] A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural
image matting,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 30, no. 2, pp. 228–242, 2008.
[4] I. Kokkinos and P. Maragos, “Synergy between object recognition
and image segmentation using the expectation-maximization algorithm,”
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 31, no. 8, pp. 1486–1501, 2009.
[5] D. Mumford and J. Shah, “Optimal approximations by piecewise smooth
functions and associated variational problems,” Comm. Pure Appl.
Math., vol. 42, no. 5, pp. 577–685, 1989.
[6] G. Alberti, G. Bouchitt, and G. Dal Maso, “The calibration method
for the mumford-shah functional and free-discontinuity problems,” Calculus of Variations and Partial Differential Equations, vol. 16, no. 3,
pp. 299–333, 2003.
[7] T. Pock, D. Cremers, H. Bischof, and A. Chambolle, “An algorithm for
minimizing the mumford-shah functional,” in Computer Vision, 2009
IEEE 12th International Conference on, pp. 1133–1140, Sept 2009.
[8] E. Strekalovskiy, A. Chambolle, and D. Cremers, “A convex representation for the vectorial mumford-shah functional,” in IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), (Providence, Rhode
Island), June 2012.
[9] L. A. Vese and T. F. Chan, “A multiphase level set framework for
image segmentation using the mumford and shah model,” Int. J. Comput.
Vision, vol. 50, no. 3, pp. 271–293, 2002.
[10] T. F. Chan and L. A. Vese, “Active contours without edges.,” IEEE
Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, 2001.
[11] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours.,” in
ICCV, pp. 694–699, 1995.
[12] A. Chambolle, D. Cremers, and T. Pock, “A convex approach to
minimal partitions,” SIAM Journal on Imaging Sciences, vol. 5, no. 4,
pp. 1113–1158, 2012.
[13] J. Lellmann, J. Kappes, J. Yuan, F. Becker, and C. Schn¨orr, “Convex
multi-class image labeling by simplex-constrained total variation,” in
Proceedings of the Second International Conference on Scale Space
and Variational Methods in Computer Vision, SSVM ’09, (Berlin,
Heidelberg), pp. 150–162, Springer-Verlag, 2009.
[14] E. Strekalovskiy and D. Cremers, “Generalized ordering constraints
for multilabel optimization,” in Proceedings of the 2011 International
Conference on Computer Vision, ICCV ’11, (Washington, DC, USA),
pp. 2619–2626, IEEE Computer Society, 2011.
[15] T. Pock, A. Chambolle, H. Bischof, and D. Cremers, “A convex relaxation approach for computing minimal partitions,” in IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), (Miami, Florida),
2009.
[16] D. Cremers, T. Pock, K. Kolev, and A. Chambolle, “Convex relaxation
techniques for segmentation, stereo and multiview reconstruction,” in
Markov Random Fields for Vision and Image Processing, MIT Press,
2011.
[17] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for
convex problems with applications to imaging,” J. Math. Imaging Vis.,
vol. 40, no. 1, pp. 120–145, 2011.
[18] T. Pock and A. Chambolle, “Diagonal preconditioning for first order
primal-dual algorithms in convex optimization,” in Computer Vision
(ICCV), 2011 IEEE International Conference on, pp. 1762–1769, Nov
2011.
[19] D. Lorenz and T. Pock, “An accelerated forward-backward algorithm
for monotone inclusions,” 2014, 1403.3522.
[20] R. Rockafellar, Convex Analysis. Convex Analysis, Princeton University
Press, 1997.
[21] A. Chambolle, “An algorithm for total variation minimization and
applications,” J. Math. Imaging Vis., vol. 20, pp. 89–97, Jan. 2004.
[22] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection
and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 33, pp. 898–916, May 2011.
[23] B.-W. Hong, Z. Lu, and G. Sundaramoorthi, “A new model and simple
algorithms for multi-label mumford-shah problems,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8,
2013.
[24] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving
decompositions for multi-scale tone and detail manipulation,” in ACM
SIGGRAPH 2008 Papers, SIGGRAPH ’08, (New York, NY, USA),
pp. 67:1–67:10, ACM, 2008.
[25] B. Fulkerson and S. Soatto, “Really quick shift: Image segmentation
on a GPU,” in Trends and Topics in Computer Vision - ECCV 2010
Workshops, Heraklion, Crete, Greece, September 10-11, 2010, Revised
Selected Papers, Part II, pp. 350–358, 2010.
13
[26] J. Xie and S. Jiang, “A simple and fast algorithm for global k-means
clustering,” Education Technology and Computer Science, International
Workshop on, vol. 2, pp. 36–40, 2010.
[27] P. Wang, D. Lee, A. G. Gray, and J. M. Rehg, “Fast mean shift with
accurate and stable convergence.,” in AISTATS (M. Meila and X. Shen,
eds.), vol. 2 of JMLR Proceedings, pp. 604–611, JMLR.org, 2007.
[28] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human
segmented natural images and its application to evaluating segmentation
algorithms and measuring ecological statistics,” in in Proc. 8th Intl Conf.
Computer Vision, pp. 416–423, 2001.
[29] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in
Sobolev and BV Spaces: Applications to PDEs and Optimization.
MPS-SIAM Series on Optimization, Society for Industrial and Applied
Mathematics, 2006.
Benjamin Hell received his Masters degree in Industrial Mathematics in 2011 from the University
of Bayreuth, Germany. He is currently pursuing his
PhD in Computer Science at the Computer Graphics
Lab at the TU Braunschweig. His research interests
include optimization and geometry.
Marc Kassubeck received his Bachelor degree in
Computer Science in 2013 from TU Braunschweig,
Germany. He is currently pursuing his Masters degree in Computer Science at TU Braunschweig.
His research interests include image processing and
applied mathematics.
Pablo Bauszat received his Diploma degree in Computer Science from TU Braunschweig, Germany, in
2011. He is currently pursuing his PhD in Computer
Science at TU Braunschweig, Germany. His research
interests include real-time rendering, ray tracing,
light transport simulation and perceptual rendering.
Martin Eisemann received a Diploma degree in
Computational Visualistics from the University of
Koblenz-Landau, Germany, in 2006 and his PhD
degree in Computer Graphics from the TU Braunschweig, Germany, and received the best student
paper award at the annual conference of the European Association for Computer Graphics (Eurographics) in 2008. 2011 he became Postdoctoral
Researcher at the Computer Graphics Lab at the
TU Braunschweig. In 2014 he spent one semester
as Postdoctoral Researcher at the TU Delft. Since
2015 he is professor at the TH K¨oln. His main research interests include
image- and video-based rendering and editing, visual analytics, and realistic
and interactive rendering.
Marcus Magnor is full professor and head of the
Computer Graphics Lab at TU Braunschweig. He
holds a Diploma degree in Physics (1997) and a
PhD in Electrical Engineering (2000). After his postgraduate time as Research Associate in the Graphics
Lab at Stanford University, he established his own
research group at the Max-Planck-Institut Informatik
in Saarbr¨ucken. He completed his habilitation and
received the venia legendi in Computer Science from
Saarland University in 2005. In 2009, he spent one
semester as Fulbright scholar and Visiting Associate
Professor at the University of New Mexico. His research interests meander
along the visual information processing pipeline, from image formation, acquisition, and analysis to image synthesis, display, perception, and cognition.