Image Processing - Jerzy Karczmarczuk
Introduction to Image Processing
DESS + Maîtrise d'Informatique, Université de Caen
Jerzy Karczmarczuk
Caen 1997/1998

Contents

1 What do we need it for?
2 2D Geometric Transformations
   2.1 Typical Linear Transformations
      2.1.1 Scaling, Rotation, Shearing
   2.2 Other Affine Transformations and Perspective
      2.2.1 Linear Transformation of Triangles
      2.2.2 Nonlinear Deformations (Overview)
   2.3 How to do Geometry in Discrete Space
3 Filtering
   3.1 Introduction: Filters and Convolutions
   3.2 Typical Filters and their Applications
      3.2.1 Smoothing filters
      3.2.2 De-noising by median filtering
      3.2.3 Gradients
      3.2.4 Laplace and Sobel operators
      3.2.5 Sharpening, and Edge Enhancing
   3.3 Fourier Transform and Spatial Frequencies
      3.3.1 Discrete Cosine Transform
4 Colour Space Manipulations
   4.1 Colour Spaces and Channels
   4.2 Transfer Curves, and Histogram Manipulations
      4.2.1 Transfer Curves
      4.2.2 Practical gamma correction
      4.2.3 Histogram Equalization
   4.3 Transparence Channel
5 Image Algebra
   5.1 Addition, Subtraction and Multiplication
      5.1.1 Some Simple Examples
   5.2 Working with Layers
      5.2.1 Construction and Usage of the Transparence Channel
6 Some Special 3D Effects
   6.1 2D Bump mapping
   6.2 Displacement Maps
      6.2.1 Another example: turbulence
7 Other Deformation Techniques
   7.1 Warping
   7.2 Morphing

Chapter 1  What do we need it for?

Our course deals with the creation of images, and concretely with the synthesis and rendering of 3-dimensional scenes. What is the role of the image processing here, do we really need to add new topics to a domain already sufficiently rich? The answer is: yes, we do. Of course, in order to construct a sound 3D model and to launch a ray tracer, one does not need to master all, sometimes very specific 2D techniques, although at least the 3D scene should be constructed in a way adapted to its 2D presentation. (If you choose badly the position, the direction, or the focal attributes of your camera, even a wonderful composition of your scene won't help you. . . ) If the rendering program/device has been constructed and piloted correctly, perhaps no post-processing is ever needed.
However much more often one has to do some minor corrections, or more important alterations of the produced images. You might wish to add through post-processing some special effects which would be extremely costly if integrated into the 3D rendering, or just add some 2D external elements to the image, such as some text or frames. Also, when composing a synthetic scene with a “natural” texture, parts of a photograph etc., more often than not the contrast, brightness or colour distribution should be adjusted. The image processing domain has some independent, creative aspects as well. We will not speak here about artistic creation and painting techniques, although the author confesses that this is for him a fascinating subject. We might have to think seriously about: A Creation of various coloured textures: regular and stochastic; based on real photographs or totaly synthesized; simple or showing a replicated geometric pattern; texts, etc. In general – everything we may need to enrich the surfaces of 3D objects. B Compositions and collages; clips from one image added to another; elimination or replication of some fragments; retouch. C Colour space transformations • Adjustments of the luminosity, contrast, hue (colour bias) or/and gamma correction. 3 4 • Histogram equilibration (equalization) or stretching. • Dithering and halftoning. Colour quantization and creation of micropatterns: modifying the pixel structure of the image in order to simulate non-existing colours. • Thresholding; all kind of transfer curve manipulations and creation of artificial colour variations. Also: ex post colouring of gray photographs. • “Image algebra” (or arithmetic if you prefer). Addition, multiplication, and other variants of image “blitting” or “rasterOps”. These techniques permit to change colour distributions, but also to add/subtract image fragments, administrate the “sprites” in animated pictures, etc. D Geometric transformations: rotations, scaling, simulated perspective; nonlinear transformations adapted to the texture-mapping manipulations or the deformation introduced by some non-standard cameras: panoramic, fish-eye, etc. Arbitrary deformations: warping. E Composite deformations and blending: morphing. F Special effects: • Bump mapping and other pseudo-extrusion techniques which give to the image a 3D appearance, notably the “embossing” technique. • Lighting effects: halos and glows, lens reflections, distance (“atmospheric”) attenuation introduced ex post. • Particular artistic effects: transforming realistic images into pointilist (van Gogh like) or other impressionist tableaux; “hot wax”, aquarelle, or carbon/chalk pictures, etc. One may wish to transform a photograph into an ancient copperplate engraving, or a comics strip style drawing. The possibilities are infinite. The main problem is to transform human imagination into an algorithm. . . G “Classical” filtering manipulations: edge enhancing, noise removal, anti-aliasing by blurring, sharpening, etc. H More analytic operations, which recognize or segment the image fragments: contour tracing, or zone (colour interval) selection, essential for the cutting and/or replacing picture fragments. The contour finding and representation is a very large domain per se, we will mention briefly some standard filtering techniques, but we cannot discuss other, more modern and fascinating subjects as the active contours (“snakes”) or the watershed algorithm. 
These, anyway, serve principally for the image analysis and interpretation rather than as a creation aid. 5 I Other manipulations which belong to the image analysis, and which will be omitted from these notes, or just briefly mentioned: • Vectorization: transformation of a bitmap into a set of geometric objects – lines, circles, etc. • Segmentation through the Hough transform: representation of geometrical entities of the image in the space of parameters which define these objects. • Karhunen-Loeve (Hotelling) transform which is a powerful tool of the statistical analysis of images, signals, etc. • Reconstruction of images from various linear signals, for example from their Radon transforms generated by X-ray scanners. (The Radon or Karhunen-Loeve transforms may serve also for more prosaic manipulations. They might help to localise and to parameterise the lines which should be by definition horizontal or vertical, but they are not, because the photo we put in the scanner was slightly slanted.) J We shall not discuss either the image compression (which is the object of another course for the 5-th year students: DESS “Images”). The list of omissions is anyway almost infinite: wavelet representation, procedural creation of pictures, for example through IFS (Iterated Function Systems) or the Lsystems (Lindenmeyer “grammatical” approach to the generation of iterative and recursive pictures, very good for the simulation of plants), etc. These items are not independent, but strongly related. For instance, the simulated 3D effects are often not geometric manipulations, but just some specific modifications of the colour distribution (bump-mapping). (But displacement maps are of geometric nature.) During the preparation of these notes we have used very different software, commercial and free. The image processing packages are very abundant and it is easy to find several free or shareware programs very powerful and user friendly. The commercial package heavily used was the well known Photoshop, but the Unix users may obtain almost all its special effects (and many more!) from a free package GIMP, a wonderful interactive and programmable tool, still evolving. As our ambition is to explain the essentials of the image processing, it was necessary to do some low-level programming, to treat images as matrices of numerical values (gray level, colour triplets or palette indices), and to process them as such. Of course it is possible to use any programming language to do so, and we have tried several of them. The necessity to perform many experiments, to change interactively this or that element of the processed picture precludes the usage of classical compile-and-run languages, as C++ or Java, the interactivity is much more important than the brutal efficiency, so we used the scientific programming system Matlab. It is a commercial package, but there are other, free systems well adapted to matrix 6 processing, such as SciLab or Tela. (But Matlab is a scientific, vectorized computation package, and has excellent interfacing tools which permit both high-level and low-level visual data processing. This is slightly less developed in above-mentioned free systems (to which we may add Rlab and Octave), which have other objectives than image processing. There are also some powerful programming/integrating machines specifically adapted to the treatment of images as Khoros. The low-level programming is left to the user, but it is seldom needed. 
All typical image algebra and geometry are already ready to use in the Khoros standard libraries, and the combination of modules and their interactive construction and execution using the visual, dataflow-style interface is really very comfortable and elegant. Khoros has a mixed status: one can get the sources freely and compile them (which sometimes is not trivial. . . ), or buy a commercially packaged full system, with plenty of documentation and examples. We acknowledge the existence of some other similar packages, commercial, but whose distributors sometimes offer working demonstration versions, for example the IDL system, which offers a good graphical programming language and very decent interfacing. Of course we will also use some drawing packages, for example the MetaPost system, which permits to include some PostScript drawings into the document without having to program them using the horrible PostScript language (very clumsy, but sometimes very useful!). MetaPost has the advantage of generating a particularly simple, developed PostScript, easily convertible into PDF by the ConTeXt package of Hans Hagen, without the necessity of using the commercial Adobe Distiller product. These notes are constructed independently of the other parts of the author's course on image synthesis, and they may be read separately, but – of course – they should be treated as a part of a bigger whole. This is the first version, very incomplete, and probably very buggy. Please contact the author if a particularly shameful error is discovered.

Chapter 2  2D Geometric Transformations

2.1 Typical Linear Transformations

We begin thus to discuss some geometric manipulations of 2D images. They will be presented, if necessary, in a coordinate-independent fashion, using abstract vector algebra, but the difference between the 3D scenes, where we are interested in "real" objects and their mutual relations, and 2D images is essential. There is no need to introduce abstractions where the only thing to do is just to transform the pixel coordinates in a loop. There is no particular need for the homogeneous coordinates, although they might simplify the presentation of the simulated perspective transformation. Only continuous geometry is discussed in this section. The real problem – the manipulation of discrete pixel matrices – is postponed to section (2.3).

2.1.1 Scaling, Rotation, Shearing

The scaling is a very simple operation:

\[ x \to s_x x, \qquad y \to s_y y, \tag{2.1} \]

or, if you wish,

\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.2} \]

where the pair s_x, s_y may contain negative components, but then one has to reinterpret the negative coordinates; usually both x and y are positive, starting at (0, 0), and rarely do we think of the image as of something centered around the origin of the coordinate system (although it might help while considering an image to be a filter, see section (3)). We have then to add the compensating translation. If in the original picture x varies from 0 to A, and s_x is negative, the real final transformation is

\[ x \to s_x (x - A). \tag{2.3} \]

The rotation matrix is well known:

\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.4} \]

but what is sometimes less known is the fact that this rotation can be composed out of three shearing (slanting) transformations parallel to the coordinate axes. The x-shearing transformation, which gives the effect shown on Fig. (2.1) (any Brazilians among readers?. . . ), has the representation (2.5):

\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} 1 & \kappa \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{2.5} \]

Fig.
2.1: The x-shearing transformation Defining κ = tan(θ/2) we prove easily that the rotation identity: cos(θ) − sin(θ) 1 −κ 1 = sin(θ) cos(θ) 0 1 sin(θ) matrix fulfils the following 0 1 1 −κ 0 1 , (2.6) which give together the chain of transformations shown on Fig. (2.2). Fig. 2.2: Rotation from shearing It seems a little useless, since it is more complicated than a simple rotation matrix, but it might be faster, if there is no pixel interpolation involved. Horizontal or vertical slanting displace entire rows or columns, and if we have a fast pixel block transfer routine, some time may be economised. Beware however of aliasing! The slanting deformation can be easily generalized producing a trapezium shown on Fig. (2.3). 2.2 Other Affine Transformations and Perspective 9 The mathematics of this transformation is quite simple, we see that the slanting coefficient in (2.5) is now x-dependent, and that this dependence is linear. So, the whole transformation is not linear any more: y → y, x → x + (α + βx)y. This might be considered as a poor-man perspective (horizontal) transformation: the figure represents Fig. 2.3: Generalized sheara rectangle disposed horizontally, perpendicularly to ing the vertical plane defined by the sheet of paper, or the screen of this document. We look at this stripe from above and from the right, and the shape is intuitively correct. However, simulating the perspective in such a way is a rather bad idea. The problem is that – as easily seen from the figure – the displacement and the compression are strictly horizontal. But the proportions along the other direction are modified as well. We know that the perspective is affine only in the homogeneous coordinates. 2.2 Other Affine Transformations and Perspective The real perspective looks like on the Fig. (2.4). The details of the transformation depend on the relation between the simulated orientation of the original image and the screen. Our example figure is placed vertically and perpendicularly to the screen, but, for example the “Star Wars” text is horizontal. We might obtain an oblique orientation of the original also, like on Fig. (2.5). Fig. 2.4: Once upon a time. . . there were some perspectives Now, how to obtain these transformations? The technique used is the following. The image is enclosed in a rectangular box. We can move arbitrarily the corners of this box, producing an arbitrary quadrilateral. The parameters of the perspective transformation which combines • the simulated position of the original image in the 3D space, and • the parameters of its projection on the screen 2.2 Other Affine Transformations and Perspective 10 Fig. 2.5: Perspective; oblique orientation is retrieved from these corners, and all the rest is straightforward. For the sake of completeness we re-derive the perspective projection formulæ. The following entities are known: • The position of the projection point (the eye): ~xP . Usually in the 3D computations this point is fixed, for example at origin, or ~xP = (0, 0, 1), etc. Here this is not an independent geometric object, we will find it from the resulting projection quadrilateral. • The projection plane (screen): x~0 · ~n = d. The homothetic projection is shown on Fig. (2.6). ~x x~0 ~n ~xP Fig. 2.6: Perspective Projection The vector x~0 is the homothetic map of ~x, so we can write that x~0 − ~xP = κ(~x − ~xP ). (2.7) But x~0 lies on the projection plane. Thus, multiplying the equation (2.7) by ~n we get d − ~xP · ~n κ= , (2.8) (~x − ~xP ) · ~n from which x~0 can be easily computed. 
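For readers who prefer to see numbers, here is a minimal Matlab sketch of the projection (2.7)–(2.8); the eye position, the plane and the scene point are, of course, arbitrary example values and not part of the original notes.

```matlab
% Homothetic (perspective) projection of a scene point x onto the plane x'.n = d,
% seen from the eye xP, following equations (2.7)-(2.8).
xP = [0 0 1];            % the eye (projection point); example value
n  = [0 0 1];            % unit normal of the projection plane
d  = 0;                  % the plane passes through the origin
x  = [0.3 -0.2 -2.5];    % an arbitrary scene point

kappa  = (d - dot(xP, n)) / dot(x - xP, n);   % eq. (2.8)
xprime = xP + kappa * (x - xP);               % eq. (2.7): the projected point
disp(xprime)             % its last coordinate is 0: it lies on the plane
```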
But here we are interested in solving a totally different problem! First, we simplify the equation (2.8) identifying the projection 2.2 Other Affine Transformations and Perspective 11 screen with the plane xy (so ~n is the unit vector along the z axis, and d = 0), and placing the focal point xP at (0, 0, r). We obtain the following transformation 0 r x x = . (2.9) 0 y r−z y Here we don’t know the vector (x, y, z) yet. It belongs to the original image, considered always rectangular, with its canonical Cartesian system, say {~x0 , ~u, ~v }, where ~x0 is a distinguished point, for example one corner, or the center of the image, and ~u, ~v define the axes. Of course, if ~x0 is the left lower corner, the natural choice will be ~u = ~x1 − ~x0 , and ~v = ~x2 − ~x0 , etc. Some other convention might be easier if the origin is chosen at the image center. So, we need to find this coordinate system, i.e. 3 vectors with two of them perpendicular. There are thus 8 unknowns, and we have 8 equations for the 4 distorted corners. We leave this exercice to the reader. 2.2.1 Linear Transformation of Triangles When a fragment to fragment mapping is needed, as in morphing (see the section (7.1)), usually both the source and the target areas are split in simple polygons, for example in triangles. Triangles are always convex and their simplicity ensures that the mapping is unique, and without pathologies. The task consists in mapping the triangle spanned by the three points p~0 , p~1 , p~2 into the triangle defined by ~q0 , ~q1 , ~q2 . The mapping should be linear. The Fig. (2.7) shows the result. q2 p2 x’ q1 x p0 q0 p1 Fig. 2.7: Linear Triangle Mapping The solution goes as follows. We establish within the first triangle a local coordinate system spanned by ~u = p~1 − p~0 , and ~v = p~2 − p~0 . The axes are not normalized. Every internal point ~x of the triangle admits the representation ~x = p~0 + α~u + β~v . Knowing that the problem is planar, and that we can treat the vector product as scalar (pseudo-scalar, but this is unimportant; it has one component only), we get ~x = 1 ((~x ∧ ~v )~u − (~x ∧ ~u)~v ) , ~u ∧ ~v (2.10) 2.2 Other Affine Transformations and Perspective 12 i.e., α = ~x ∧ ~v /~u ∧ ~v , etc. In the second triangle we introduce the corresponding base ~g = ~q1 − ~q0 and ~h = ~q2 − ~q0 , and we restore ~x0 = ~q0 + α~g + β~h. The only detail which remains is the correct implementation in the discrete case, as always. Obviously, if the problem of triangle mapping is solved, any polygons may be treated after their triangularisation. A natural caveat seems appropriate here: if the mapping is linear, and the triangular components of a polygon are treated separately, the resulting global mapping is continuous, but straight lines might break at the triangle boundaries, as shown on Fig. (2.8). Fig. 2.8: Linear Transformation of Polygons is Dangerous Such technique might not be acceptable. Moreover, even in the case of quadrilaterals, if the target figure is convex, there is a choice of two diagonals, which add some ambiguity to the mapping strategy. The lines will break in different places. The section (7.1) discusses some other methods of deformation, essentially non-linear. 2.2.2 Nonlinear Deformations (Overview) The perspective is already nonlinear, but we want to treat here some more general cases, especially useful in texture mapping. If a flat picture is projected on the surface of a 3D object, and then this object is projected on the screen, the two transformations have to be composed. 
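Before turning to these composite transformations, a minimal Matlab sketch of the triangle mapping of section (2.2.1) may be useful. The planar "vector product" is taken as the pseudo-scalar u_x v_y − u_y v_x, and all the coordinates below are invented for the example.

```matlab
% Map a point x of the triangle (p0,p1,p2) into the triangle (q0,q1,q2),
% using the pseudo-scalar product of eq. (2.10).
wedge = @(a,b) a(1)*b(2) - a(2)*b(1);      % planar "vector product"

p0=[0 0]; p1=[1 0]; p2=[0 1];              % source triangle (example)
q0=[2 1]; q1=[4 1.5]; q2=[2.5 3];          % target triangle (example)
x  = [0.25 0.25];                          % a point inside the source triangle

u = p1 - p0;  v = p2 - p0;
alpha =  wedge(x - p0, v) / wedge(u, v);   % local coordinates of x
beta  = -wedge(x - p0, u) / wedge(u, v);   % sign follows from u ∧ v = -(v ∧ u)

g = q1 - q0;  h = q2 - q0;
xprime = q0 + alpha*g + beta*h;            % the image of x in the target triangle
```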
If it is possible to derive the composite transformation before the rendering, this process can be accelerated. If for the rendering the ray tracing is used, this is usually useless. We have to find the intersection of the ray with the point in the 3D space, and from its coordinates we may deduce the colour of the pixel. But if a polygon scan is used, or if the radiosity machine prepared all the 3D surfaces in a more clever way, and we have only to project everything on the screen, some simplifications can be obtained. Some faster rendering engines like 3DS or the dynamic games: Doom, Quake, etc. pre-prepare the mapped textures. Another application of nonlinear deformations is the general warping, which will be discussed in section (7.1). The warping is ususally an “artistic”, manual manipulation of the image, but there is at least one nonlinear flat transformation which may be considered algorithmic, and it is strongly related to the perspective transformation: the normalisation of images (photographs) obtained with very wide 2.2 Other Affine Transformations and Perspective 13 lens camera, for example with the “fish-eye” lenses. We may wish to “flatten”, to restore straight lines of such picture as Fig. (2.9), or inversely, to compose one panoramic strip out of flat image fragments, as on Fig. (2.10). The first picture has been taken from the NASA collection, essentially unprotected, and the other from the BYTE Web page, with a possible copyright infringement. If somebody knows more about their restrictions, please let me know). Fig. 2.9: Fish view of a cosmic enterprise Fig. 2.10: Brooklyn bridge composed out of flat fragments It is possible of course to combine small picture fragments in an affine way, without introducing curves, nor the the “Cinerama” style discontinuities of tangents. The Fig. (2.11) shows it very properly done. All these operations need a very strong analytic apparatus, or an interactive adjustment, by a human. For example, in order to render the straight lines on the 2.2 Other Affine Transformations and Perspective 14 Fig. 2.11: Reconstruction of a wide-angle shot Cap Canaveral photo, either one has to deduce the true geometric proportions of the 3D scene which requires a rather intelligent geometric segmentation, or one tries through dynamic warping to transform all the circle segments into straight lines, which may not be unique. Of course, if the focal properties of the fish-eye camera are known, the restoration of the picture (or its simulated generation from a flat image) is essentially straightforward, although not entirely trivial. We describe now partially, and with some simplifications the fish-eye camera. This is not a – strictly speaking – image processing problem, and it will migrate to the 3D-geometry section in future releases of these notes, but for the moment the reader might treat this subject as an exercice in image deformation. The most classical fish-eye model (not necessarily fully practical) is based on the stereographic projection shown on Fig. (2.12). The idea is thus very simple. Instead of projecting the 3D scene on a flat screen, we project it on spherical surface. In such a way we can cover 180 or more degrees without introducing geometric pathologies. (Of course, we cannot cover 360◦ , but there are already some standardised techniques 2.2 Other Affine Transformations and Perspective 15 of producing and displaying the images covering the full solid angle, for example IPIX, which is even accessible as a Web browser plug-in). spher. proj. 
final proj. focus Fig. 2.12: “Fish-eye”-like deformation We have the following geometric entities: • The projection sphere with radius R, considered usually to be very small as compared with the true scene dimensions, but it is not small, when a simulated projection is used to deform an already existing 2D image. • The main focus which is not necessarily the center of the sphere. • The stereographic projection point, and the projection plane: we need to map the sphere sector to a flat disk. We may choose for example the pole opposite to the main vision direction (the center of the image disk), and for the plane – the “equator”. Other variants are also possible, but more difficult to treat mathematically. • Finally, we have to define somewhere the limiting angle. This is essential for the fish-eye rendering, but also for the flattening: we have to know how far shall we go, the size of the 180◦ image is essentially infinite. . . If we want just to simulate the fish-eye camera and to construct a distorted image from something perfectly standard, we may use the center of the sphere as the focal point. This is neither the standard stereographic projection, nor a general fish-eye, but the IPIX normalized camera. The original image may be placed on a plane tangent to the sphere (this is just the choice of the scaling factor). We have the following relation between the “real” radius r on a flat image tangent to the projection sphere, and the distorted radius z on the equator plane: z/R r =p R 1 − z 2 /R2 (2.11) 2.2 Other Affine Transformations and Perspective which can be easily inverted to generate the fish-eye distortion z r/R =p . R 1 + r2 /R2 16 (2.12) The task of transforming the Cartesian coordinates (x, y) to r and some angular variable is left to the reader. But we show the result of this transformation on Fig. (2.13). Of course you know the original. (The result could be of better quality if we have followed the rules suggested in the next section. We have not interpolated the pixel colours.) Fig. 2.13: Babel as seen by a fish There is one important point in this exercice which shows how the conceptual and geometric difference between the 3D scene creation and th 2D picture manipulation reflects on some mathematic details. Here we don’t know anything about the sphere radius R, but we know the size of the original, and the dimensions of the new image we are manufacturing. If the vertical half-height of the tangent image (centered) is equal to A, and that of the equatorial projection – B, we have 1 1 1 = 2− 2 (2.13) 2 R A B We suggest very strongly that the local readers repeat these calculations for any position of the focal point, not necessarily in the center of the sphere. This is a very good examination subject. 2.3 How to do Geometry in Discrete Space 2.3 17 How to do Geometry in Discrete Space This is a very important section. If a discrete set of points (or intervals, but localized as pixels) is stretched or compressed, one has to define the interpolation procedure which is being used. There is no unique algorithm to do this. Moreover the pixels are usually square (or rectangular) oriented canonically wrt. the x and y axes. If we rotate the image the pixels change their positions, but their shape does not rotate. They must occupy canonical positions also, their coordinates must be integer again. So, all kind of very nasty effects is expected: holes, pixel fusion (after rounding), aliasing etc. 
The loops for (y=0;y<ymax;y++) for (x=0;x<xmax;x++) {newpt=transf(x,y); NA[newpt.x][newpt.y]=A[x][y];} in general cannot be applied directly. The basic idea which eliminates the topologic pathologies (holes), although does not prevent the aliasing and other troubles resulting from the discretisation, is the application of the inverse transform. The program calculates first the image of the transformed contour: if the original image is a rectangle, and the transformation is simple, for example affine, it is only necessary to find the new coordinates of the 4 corners, which may be then connected by straight lines. In general case we have to find the boundaries of the result space, and then we will fill this space regularly with pixels. The program goes now as follows: for all (x0 , y 0 ) in the transformed zone the program calculates (x, y) – the inverse mapping. The result will usually be a pair of reals, and not integers. In the most primitive case it suffices to round the coordinates and to assign the pixel value corresponding to the coordinates found. But caveat programmator! Rounding or truncating is a harsh operation, and if the transformation is repeated several times, the distortions will accumulate. Fig. (2.14) shows the result of the rotation of a picture by 360 degrees in 10 stages, without any interpolation. Fig. 2.14: Discrete rotation iterated Of course this result is not very acceptable, and this manipulation was done on purpose. A serious (and in particular: multi-stage) transformation must interpolate 2.3 How to do Geometry in Discrete Space 18 the pixel values. The result might be then a little smeared or fuzzy, but in general is quite good. Fig. (2.15) shows the result of the same manipulation, a full rotation in 10 slices of 36 degrees, but with interpolation. Fig. 2.15: Discrete rotation with interpolation The interpolation might be performed in many ways. The simplest may even be linear. The source pixels are treated as points occupying the vertices of a rectangular grid. If the reverse transformation constructs a point between the vertices, for example if we obtain a pair (x, y) reduced to the unit square whose corners correspond to original pixels denoted by I00 , I10 , I01 , and I11 , we obtain the resulting value by the bilinear interpolation Ixy = (1 − x)(1 − y)I00 + x(1 − y)I10 + (1 − x)yI01 + xyI11 . (2.14) In practice a bicubic interpolation is much better, and it is not so complicated. (This is a good examination subject. . . ) There is a small problem near the image edges: what shall we do if the reverse map gives coordinates outside the pixel grid? Or even so near the edges or the corners, that a bicubic interpolation is impossible. A slightly different interpolation scheme might then be used, simpler than cubic. First we “collapse” all the pixels to their corners (point-like vertices) using the nearest-neighbour approximation. The vertices far from the image edges are just averages of the 4 neighbouring pixels, and the boundaries are calculated as follows from the Fig.(2.16). We may use the following equations which are quite intuitive: 1 (A + B + C + D) 4 f + i = A + B etc. (e + i)/2 = A. i = (2.15) (2.16) (2.17) The other vertices are computed by symmetry and iteration. Note that the exterior vertices extrapolate rather than interpolate the pixel colours. e = 1 (7A − B − C − D) 4 (2.18) 2.3 How to do Geometry in Discrete Space g f e A h C k 19 B i D j l Fig. 
2.16: Nearest-neighbour pixel interpolation 1 (3A + 3B − C − D) 4 1 h = (3A + 3C − B − D) 4 1 i = (A + B + C + D) 4 f = (2.19) (2.20) (2.21) (2.22) It may be troublesome, and throw us outside the allowed colour space (negative or too big intensity). Now this matrix is converted using the inverse transform technique, and the pixels are reconstructed from the vertices using equations similar to those above. The reader is kindly asked to solve explicitly these equations in the reverse direction. What happens if the transformed image contains L-like concave boundaries? Chapter 3 Filtering 3.1 Introduction: Filters and Convolutions Mathematically the convolution of two functions: f (x) and g(x) is given by the integral Z (f ? g)(x) = f (z)g(x − z) dz (3.1) which has to be generalized into two dimensions and discretized. We get thus the convolution formula for two matrices A and B: XX (A ? B)ij = Akl Bi−k,j−l , (3.2) k l where the indices run through all the intervals where they are defined. Usually k, etc. is greater (or equal, depending on the convention used) than zero, and goes up to the upper limit of the matrix dimension. When it becomes negative, the element is neglected. This seems trivial, but is not necessarily so: sometimes it is preferable to apply the cyclic convention, where the negative indices “wrap around” the matrix – the image is not considered to be a finite patch of a plane, but a torus. In this way the problems with boundary conditions may be omitted. Usually one of the concerned matrices is the image (or three images – one for each colour plane), and the other is the filter which is usually much smaller than the image. The sum (3.2) should thus be reformulated in a way which minimizes the number of operations. A useful convention used in this domain is the cyclicity of the filter matrix. We don’t need to apply the toroidal boundary condition to the image, but very often the filter intuitively should be “symmetric about zero”, in order to to generate spurious anisotropies. Convolutions, as shown by the equations above are linear operations, and they are not sufficient to obtain all kind of posible special effects, but they are universal enough. (We will discuss here essentially one kind of non-linear filter – the median, and its variants, but its presentation will be rather superficial). In the section (4) 20 3.2 Typical Filters and their Applications 21 we will discuss some details of the colour representation of images, but the theory of colours will be treated in a separate set of notes. For us a pixel is a numeric value which can be normalized to the interval [0, 1), or [0, 255) if you wish. If the image is coloured, we can treat the three component separately. here we will not discuss the colour mixing or conversions at all. Henceforth we shall either consider gray-level images, or treat the three channels identically. The interpretation of the filtering process can be given on many different conceptual and mathematical levels. This course cannot treat all the mathematical details of the signal processing theory, so for the moment the reader may think that the filtering produces the value of a given result pixel by a linear combination of its neighbours. This may smooth the image if we calculate the averages (i.e. if all the weight factors are positive), or it may enhance local differences if some weight factors are negative. A general warning is necessary. 
A convolution of the image with any matrix, and in particular with a filter possessing negative elements, may throw the image out of the legal colour space. The filtering application should warn the user, and permit to correct the result by cropping the illegal values (they will remain either minimal or maximal), or renormalise the whole image by shifting and/or multiplication of the pixel values by an appropriate constant. There is no unique solution in such a case. Image processing packages usually offer to the user a possibility of adding an offset immediately during the filtering. We shall discuss this problem under a different angle in section (5).

3.2 Typical Filters and their Applications

3.2.1 Smoothing filters

Look at the horizontal fragment of the da Vinci "Last Supper" fresco on Fig. (3.1). This picture is extremely noisy. Convoluting it with the matrix

\[ \frac{1}{25} \begin{pmatrix} 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \end{pmatrix}, \tag{3.3} \]

where the factor 1/25 is the obvious normalisation factor, produces the transformation shown on Fig. (3.2). The noise has been considerably attenuated. Of course the image is blurred now, but if we don't need the full resolution, it is better to smooth it before resampling, as the plain resampling usually does not get rid of the noise, as shown on the left of Fig. (3.3).

Fig. 3.1: Last Supper – fragment
Fig. 3.2: Image blurred with a 5 × 5 homogeneous filter
Fig. 3.3: Reduced images

In order to show the relation between the de-noising through smoothing and the deterioration of contours we have exaggerated the size of the filter mask: 3 × 3 would be sufficient. The uniform, square mask is simple and fast, but more powerful methods can be applied. In particular, an often used smoothing filter is the gaussian function

\[ g(x) = \frac{1}{N}\, e^{-x^2/2\sigma^2}. \tag{3.4} \]

In two dimensions x is replaced by r = \sqrt{x^2 + y^2}. The Gaussian filter is sometimes used for special effects, and its range is far from being local – sometimes several dozens of pixels along both directions are concerned, and usually it is parametrable. It may be computed directly by a floating-point routine, or as an iterated convolution of the matrix \(\begin{pmatrix}1&1\\1&1\end{pmatrix}\) with itself. For example, the second and the fourth iterations give

\[ g_2 = \frac{1}{16}\begin{pmatrix} 1&2&1 \\ 2&4&2 \\ 1&2&1 \end{pmatrix}, \qquad g_4 = \frac{1}{256}\begin{pmatrix} 1&4&6&4&1 \\ 4&16&24&16&4 \\ 6&24&36&24&6 \\ 4&16&24&16&4 \\ 1&4&6&4&1 \end{pmatrix}. \tag{3.5} \]

Here the variance is proportional to the size of the matrix, but this can be changed; we wanted just to show that a Gaussian-like filter can be obtained without floating-point calculations, but by iterations. Of course, instead of generating a big filter matrix, a small one is iteratively applied. As mentioned above, in several interesting cases the size of the Gaussian matrix is big. The complexity of the convolution algorithm is proportional to the surface of the filter, and filters too large are inefficient. In the case of Gaussians we have another possibility of simplification. The two-dimensional Gaussian exponential factorizes: exp(−r²/2σ²) = exp(−x²/2σ²) exp(−y²/2σ²). We may thus apply the two factors separately, first convoluting the columns in each row separately, and then repeating the same operation on all rows. The complexity is reduced from N², where N is the filter size, to linear. The one-line or one-column filtering 1×m matrix again does not need floating point computations, but may be obtained by convoluting [1, 1] with itself, and normalizing.
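The two recipes above are easy to express in Matlab. A minimal sketch, assuming a gray-level image stored as a matrix of doubles in [0, 1] (the file name is only an example):

```matlab
a = double(imread('lena_gray.bmp')) / 255;   % gray-level image, values in [0,1]

% (1) homogeneous 5x5 smoothing, as in (3.3)
box5 = ones(5,5) / 25;
smooth1 = conv2(a, box5, 'same');

% (2) Gaussian-like binomial filter, built by iterating [1 1], applied separably
g = 1;
for k = 1:4                  % 4 iterations give [1 4 6 4 1]
    g = conv(g, [1 1]);
end
g = g / sum(g);              % normalisation, here 1/16
smooth2 = conv2(conv2(a, g, 'same'), g', 'same');   % rows first, then columns
```

The separable variant performs two one-dimensional passes instead of one two-dimensional convolution, which is exactly the complexity reduction described above.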
By using asymmetric Gaussian filters, with different horizontal and vertical variance it is possible to obtain many interesting effects, which will be discussed later. 3.2.2 De-noising by median filtering Fig. (3.4) shows two more smoothing experiences. The image on the left was obtained by the Gaussian blurring followed by a sharpenig filter which will be discussed in the next section. The image on the right is the application of the median process. Instead of averaging the pixel values around the center, the filtering chooses one representative value. Concretely: first a mask – 3 × 3, 5 × 5 or other is defined. Within this mask all 9, or 25 pixels are identified and sorted with respect to their values. The middle value replaces the central pixel of the block. An even size of the block may also be used, 3.2 Typical Filters and their Applications 24 Fig. 3.4: Gaussian filtering (sharpened), and median denoising although it is less popular. But the existence of one central pixel is not needed. In the even case the resulting picture will be shifted “colourfully” by 0.5 pixels. One should not exaggerate with the size of the median zone, as it introduces homogeneous colour patches. But it can be used then as an artistic tool. There are some recent variants of this scheme: the size of the reprentative block is not fixed, but of varying size: 2, 3, 4, 5, etc., not necessarily centered. For each block the (normalised) statistical variance of the picture is calculated, and the block with minimal variance is chosen for the median computations. The author of this technique claims that a good denoising effect is obtained, and the edges become less fuzzy than in the standard model. Of course, the variance-sensitive techniques are known for years, for example the linear (but adaptive) Wiener filter. 3.2.3 Gradients Taking the averages: homogeneous or Gaussian, is equivalent to the integration of the picture with a positive weight function. This operation eliminates the fluctuations, the high frequencies present in the image. The measure of these fluctuations which can help us to localize the edges is the operation inverse – the differentiation. In a discrete structure the gradient can be of course approximated by the difference of two neighbouring pixels. The gradient is a vector, and we can choose to take its absolute value, or a directional quantity. In any case, if the gradient filter is applied to a homogeneous colour zone, the result is close to zero. Fig. (3.5) shows three different variants of gradient-like filtering. The left picture is the result of the filter [−1, 1], where 1 is at the central position, and all the remaining values of the filetr matrix are zero. This is thus a horizontal gradient which strenghtens the vertical edges. The result is so weakly visible that we had to augment the luminosity and the contrast of the image. (Note that this filter produces plenty of negative values which are cropped to zero.) The side-effect of this image enhancing was the introduction of some noise. 0 0 −1 The central variant is a diagonal filter 0 1 0 , but with an additive constant 0 0 0 (equal to 128 – the mid-value of the intensity full interval). This is the classical embossing filter – the simulation of a bas-relief. If this filter is applied to a full 3.2 Typical Filters and their Applications 25 Fig. 3.5: Lena make-up colour image, the result is not very different (why?). Theimage at theright is 0 −1 −1 another directional, gradient-like filter, but more complex: 1 1 −1 . 
Note that the full matrix is

\[ \begin{pmatrix} 0 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & 1 & 0 \end{pmatrix}, \]

and that its overall weight here is 1 and not zero. No constant is needed to produce the embossing effect. Previously the filtering modulated a flat, gray surface; here the image itself defines the artificially extruded plane. The reader should understand this effect. (Actually, extruding the original picture seems artistically worse than embossing the flat surface.)

3.2.4 Laplace and Sobel operators

The discrete version of the second derivative has the following form:

\[ f''(x) \to \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}. \tag{3.6} \]

The basic discrete version of the Laplace filter which computes ∂²f/∂x² + ∂²f/∂y² is just

\[ L_1 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \]

This filter is used to detect the edges, and also for the sharpening. It is not unique; we may define other Laplacian-like filter matrices, for example

\[ L_2 = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \quad L_3 = \begin{pmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{pmatrix}, \quad L_4 = \begin{pmatrix} -1 & 0 & -1 \\ 0 & 4 & 0 \\ -1 & 0 & -1 \end{pmatrix}, \tag{3.7} \]

or asymmetric variants, vertical or horizontal only. The left variant on Fig. (3.6) has been obtained with L_2, which gives contours slightly better than other versions (at least in the case of Lena).

Fig. 3.6: Laplacian and Sobel edge filtering

The right contour is obtained with the combined action of two Sobel filters:

\[ S_h = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}, \qquad S_v = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}. \tag{3.8} \]

Both act independently on the image, producing the horizontal and vertical partial contours I_1 and I_2. Now we combine these images with I = \sqrt{I_1^2 + I_2^2}. The right picture on Fig. (3.6) was produced by a simplified operation: I = I_1 + I_2, which is very slightly less regular, but much less expensive. More about the algebraic combination of images is presented in section (5). Of course, if we want to segment the picture and to find the geometric description of the contours, all this work is still ahead; the filtering prepares only the bitmap. Anyway we have barely scratched the surface of this domain; there are more powerful methods, for example the application of adaptive gradient filters followed by the search of directional maxima.

3.2.5 Sharpening, and Edge Enhancing

Laplacian, Sobel or Prewitt filters, XY-oriented or diagonal, may serve to identify the edges. How can we just enhance them in order to sharpen the picture? If we add 1 to the central element of the Laplacian matrices, L_1 or L_2, etc., we obtain sharpening filters. Such a filter, for example

\[ \begin{pmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \tag{3.9} \]

may also be understood a little differently. Imagine that we have smoothed the image I_0 with the filter

\[ \frac{1}{9} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \]

obtaining I_1. A weighted average interpolates between the original and the blurred one: I = αI_0 + (1 − α)I_1. When α decreases from 1 to 0 we move from the original to the blurred version. But what happens if α > 1? This is an extrapolation which increases the distance between the smoothed image and the result. It gets sharpened; at least it should be sharper than the original. In order to get the matrix (3.9), the value α = 10 is needed.

3.3 Fourier Transform and Spatial Frequencies

Of course we won't treat here the rich and enormous domain of spatial frequency analysis. Our aim is to process the images in order to render them better, and not to analyse them. The review of the discrete Fourier transform will thus be very superficial. We shall discuss here essentially two applications:

• Rescaling, and
• Fast convolution (and correlation).
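As a foretaste of the second application, the convolution of an image with a mask can be obtained through the FFT. A minimal Matlab sketch (the file name is only an example, and the mask is the homogeneous 5 × 5 filter of (3.3)); up to rounding errors it gives the same result as the direct conv2:

```matlab
a = double(imread('lena_gray.bmp')) / 255;   % gray-level image
h = ones(5,5) / 25;                          % the smoothing mask (3.3)

[n,m] = size(a);  [p,q] = size(h);
A = fft2(a, n+p-1, m+q-1);                   % zero-padded transforms
H = fft2(h, n+p-1, m+q-1);
c = real(ifft2(A .* H));                     % full linear convolution
% c agrees (up to rounding) with conv2(a, h)
```

For a small mask the direct convolution is of course faster; the FFT route pays off only when the filter itself is large.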
The basic truth to remember is that the Fourier transform converts distances into frequencies which are, in a sense, dual to distances. When we look at the formula Z +∞ (3.10) f (x) → g(ω) = F[f ](ω) = f (x)eiωx dx −∞ we see immediately that if g(ω) = F[f (x)], then F[f (ax)] = 1/ag(ω/a). Shrinking (horizontally) the transform, dilates the function. The discrete case is slightly more difficult to visualise. Here “shrinking” or “dilating” is less meaningful, we have just a number of points. Moreover, if we look at the discrete FT formula: gk = N −1 X fj e2πijk/N (3.11) j=0 we see that the periodicity (exp(2πi) = 1) implies that for large j, approaching N we don’t have “very high frquencies”, but again low, and negative. The highest frequencies correspond to the index j = N/2. In the discrete case we don’t have “distances”. The linear length measure is conventional. So, if we want to scale a function, to dilate it by a factor p, we need to pass from N to p×N points. (Obviously, enlarging a picture means augmenting the number of pixels). If our aim is to enlarge the picture without introducing new, spurious frequencies corresponding to the sharp edges between the “macro-pixels”, we add just some zeros to the Fourier transform, we correct the scale (a multiplicative factor), and we invert it. Where shall we add those zeros? Of course in the middle, for example the vector g0 , g1 , g2 , g3 , g4 , g5 changes into g0 , g1 , g2 , g3 0, 0, 0, 0, 0, g3 , g4 , g5 . We have take into account the following properties of the FT: • g0 corresponds to the frequency zero (the global average), is real, and has usually the biggest numerical value among all the Fourier coefficients. 3.3 Fourier Transform and Spatial Frequencies 28 • If g corresponds to a FT of a real function, (and the images are usually real), we have the following symmetry: gN −k = gk . This symmetry must be preserved when we add some zeros. In our example the length of the vector was 6, with a solitary g3 in the middle. It had to be duplicated. The Fig. (3.7) shows the spectral stretching of a vector. The original had 32 elements (dashed line), the dilated – 128 (solid blue). Note that this dilating which did not introduce new frequencies, smoothed the function (and introduced some small oscillations, which unfortunately may be visible on images. You may call them (cum grano salis) diffraction patterns. 4 2 0 −2 0 20 40 60 80 100 120 Fig. 3.7: Spectral stretching of a curve The Fig. (3.8) shows the splitting of the transformed image before the inversion of the transform. Fig. 3.8: The dilation splits the Fourier transform The Fig. (3.9) shows the result of such a dilation on the eye of Lena. We discarded the colours in order to avoid the delicate discussion on the chrominance transformations. The colour diffraction patterns are even more disturbing than the gray ones. . . The left picture presents once more the classical cubic interpolation, and the right one – the diffractive dilation of the original. Of course, using the median filter or others it is possible to smooth out the square diffraction patterns, but usually with the aid of well chosen wavelets it is possible to improve the result substantially. The Fourier transform is a venerable, but not the only spectral tool on the market! 3.3 Fourier Transform and Spatial Frequencies 29 Fig. 3.9: Cubic interpolation vs. 
spectral stretching 3.3.1 Discrete Cosine Transform The complex Fourier transform has some advantages over slightly more complicated cosine or sine transforms, but these, and especially the Discrete Cosine Transform (DCT) is used very frequently. It might be used for the spectral filtering, but its main application is the image compression, for example in JPEG documents. We give here for the reference the formulæ used. If Fkl is the matrix representing the image (or its fragment: JPEG uses 8 × 8 blocks), its transform is defined by Gpq = Cp Cq M −1 N −1 X X Fmn cos m=0 n=0 0≤p<M . (3.12) 0≤q<N π(2n + 1)q π(2m + 1)p cos , 2M 2N where Cp = √ 1/ M , p 2/M , p=0 , 1≤p<M Cq = √ 1/ p N, 2/N , q=0 . 1≤q<N (3.13) The formula (3.12) is invertible, and its inverse is given by: Fmn = M −1 N −1 X X p=0 q=0 Cp Cq Gpq cos π(2n + 1)q π(2m + 1)p cos , 2M 2N 0≤m<M . (3.14) 0≤n<N The usage of this formula for the image compression is discussed in another set of notes. We want only to signal here that usually the DCT is dominated by low frequencies, and the other can be eliminated without introducing visible distortions. The DCT of the “eye” (gray) picture gives G11 = 1589, (this is the frequency 0), 12 values near the origin are bigger than 100, and the remaining (of 240) are much, much smaller! Chapter 4 Colour Space Manipulations 4.1 Colour Spaces and Channels By the name “Colour Space” one usually denotes various linear or non-linear combinations of the three “basic” colours in conformity with the generally accepted trichromic theory. An introduction to the thory of colours is presented elsewhere. If the manipulations of colours are linear and “synchronous” (the same operation applied separately to each colour plane of the image), we can use the space we want, and specifically the representation RGB, because often the images are stored in the computer memory as 3 matrices, one for each plane R, G or B. These planes will be called channels. A colour image needs at least 3 channels in order to represent (more or less) faithfully all hues, but we might need more than that. In particular: • We shall use the subtractive CMYK (Cyan – Magenta – Yellow – blacK) space, as this particular set of channels is well adapted to the printing process: the more ink you put on paper, the darker is the result. When preparing a picture to be printed and adjusting some colours, this space is most often used. But also for some interactive colour balance adjustments on the screen the CMYK space is useful, as the factorization of the global luminance (or rather: darkness) makes it easier to tune the picture. • When some irreversible manipulations take place, for example when the histogram equalization eliminates some colours in favour of other, more frequent, such operation cannot be done separately for each plane, otherwise the colours might get severely distorted. Usually one separates then the luminance and the chrominance, transforming everything for example into the CIEL*a*b, or even XYZ space. The luminance channel undergoes the histogram massacre, but the chroma channels are left intact, and then the RGB representation is reconstructed. • If for some reasons the colours must be quantized, and an optimized palette chosen among all the 224 TrueColours, a judicious choice of channels may be 30 4.2 Transfer Curves, and Histogram Manipulations 31 very helpful. • When combining the image fragments, superposing or masking them the transparency, or α channel is very important. 
This is not a “visible” colour, but affects the display of other channels. A multi-channel image may have several artificial channels, which during the final rendering will be flattened out, and disappear, but without them the image processing would be a horribly cumbersome task, very clumsy and difficult to learn. Some examples might be useful here. Looking at the original Lena portrait we should remark something strange. Not only the plume of her hat is violet, but her eyes as well. . . In fact, the colour shifting is a popular technique in advertising and in press, to render the food pictures or the skin of beautiful girls more appetising. The Fig. (4.1) shows conspicuously that the picture of Lena is stained. For – probably – some deep philosophical reasons the Playboy editors found that the overall tone of the image should be rose. Using the CMYK separation, and eliminating some Magenta we “normalize” the result. Fig. 4.1: “Playboy” models’ life is not always rose. . . 4.2 Transfer Curves, and Histogram Manipulations The left picture on Fig. (4.2) is not very interesting. One can hardly distinguish the details (in fact the standard brightness and contrast of the author’s screen are such that the picture is almost completely black). It was produced by Povray from the file piece3.pov written by Truman Brown without the application of the standard gamma correction. Of course, we could enhance manually the brightness and the contrast, or modify the transfer curve, but sometimes a more automatic approach would be useful. We shall come to that. 4.2 Transfer Curves, and Histogram Manipulations 4.2.1 32 Transfer Curves What are transfer curves? They are just transformations in the colour space. A diagonal line Iout = Iin is the identity. In order to enhance the contrast the curve should make clear areas clearer, and dark – darker. The overall brightness is enhanced by the vertical lifting of the curve. The two remaining fragments on Fig. (4.2) show a manually tuned curve and the result of the operation. Fig. 4.2: Adjusting transfer curves More or less the same result may be obtained automatically through the analysis of the histogram of the image. The full TrueColour histogram is a vector with 2563 entries, and is usually not done for obvious reasons, the number of “bins” is enormous, and most of them will be empty anyway. One usually constructs three histograms, one for each plane, or even just one for the luminance. The result in the case of the “piece3” example is shown on Fig. (4.3). 25 20 15 10 5 0 0 20 40 60 80 Fig. 4.3: Histogram of the Piece3 image We see that the histogram is null above the index 80. More than two-thirds of the colour space is empty for each channel. In such a case it would be sufficient to multiply every colour by 3 (more or less), and that would render the image brighter. But not enough! The histogram is still biased towards small luminance. Moreover, in cases where dark and bright areas coexist, but they are far from equilibrium, as shown at the left of Fig. (4.6), no histogram stretching is possible. 4.2 Transfer Curves, and Histogram Manipulations 4.2.2 33 Practical gamma correction A very important and typical transformation in the colour (mainly: intensity) space is the power-law: (4.1) Io = Iiγ . In fact such correction is usually introduced already during the acquisition of a real scene by a camera. But if the image is synthesised, created without camera. . . 
The well-known RT package Povray until recently produced “raw” images, but the version 3 permits to add a power-like correction directly during the rendering. A relatively easy way to brighten the piece3 image has been shown. We know already that the bright segment of the transfer curve (intensity near one) is irrelevant, as the histogram is concentrated around zero. This dark part can be easily lifted by the formule (4.1) with γ about 0.33. But the contrast will be too low, the image will be grayish, as shown on Fig. (4.4), where γ was chosen equal to 4. (In fact this is the inverse of what is normally called γ in the image transfer jargon.) Fig. 4.4: “Piece3” with gamma-correction Before continuing the subject we have to point out that the manipulation of the transfer curves might be used to obtain completly delirious artistic effects. We give just one complex example, which begins here, but which will be modified later. We begin with a gray, Gaussian random noise image. Some percentage of background, white pixels turned into gray. Then the pixel diffusion was iterated a few times. This operation displaces a pixel, exchanging it with one of its neighbours, randomly chosen. Despite the naı̈ve hope that this shall produce a uniform chaos, what we really see, is the agglomeration of colours: when two identical pixels approach each other, the diffusion tends to keep them near. Chaotic diffusion increases fluctuations. Such are the laws of physics, but we cannot comment that here. Then, resulting image is blurred with a Gaussian filter, and an arbitrary, manual manipulation of the transfer curves for each component separately finishes the task. 4.2 Transfer Curves, and Histogram Manipulations 34 Fig. 4.5: Random colour distribution 4.2.3 Histogram Equalization However, we can perform then an irreversible (in general) operation called histogram equalization: The pixels will change colours in such a way that the colours almost non-existent on the image (with very low histogram value) might disappear, but the colours strongly populated will be alloted more “Lebensraum”. The Fig. (4.6) shows one possible result of the equalisation operation. Fig. 4.6: Euro-Disneyland, the truth, and the treachery. . . This manipulation is not unique, at least three popular variants exist, and they might be accompanied by some contrast and brightness, or gamma correction. The idea goes as follows: first the histogram is constructed, and the avarage colour (or brightness) computed. Then, for each colour within the histogram add its population to an accumulator. If the result is bigger than the average, subtract this average and allot one new colour bin to this entry. If after the subtraction the result is still big, because this colour was very popular, subtract the average again, and add a new bin, and repeat this operation until exhaustion. The accumulator may have some residual value different from zero. To this value the next histogram column is added, and the iteration repeated. If some colour is so rare, that adding its histogram entry to the accumulator does not make it pass the threshold, this histogram entry 4.2 Transfer Curves, and Histogram Manipulations 35 is eliminated. At the end of each iterations we have a range of new colours which can be attributed to one on the original image. A transformation “one-to-many” cannot be reversible in general. There are, as mentioned above, three popular strategies of choice: 1. Choose the center of the alloted range of colours. 
In the new histogram the popular colours will be more widely spaced. This is the simplest possibility.

2. Choose randomly, independently for each pixel, a new colour among all eligible ones (within the calculated range). This introduces a noise into the image, but it might be less disturbing than a severely quantized colour space.

3. Choose – if possible – the colour of the pixel’s neighbours (e.g. the median, or the average). This avoids the noise without impoverishing the colour space, but it blurs the image. This variant is used rarely, as it is quite time-consuming.

We recall once more that the equalization might be done for each channel separately, or globally for the luminance without touching the chroma channels, the image being then reconstructed. The first variant usually decreases the colour saturation. (Apparently this is the method chosen by PaintShop. Photoshop produces much more colourful equalized images.) We present now a complete Matlab program which does the histogram equalization (one channel), and we show some results. On Fig. (4.6) the equalization was done with Photoshop. PaintShop produces something similar, but almost completely gray, the colours have been too well equilibrated. . .

  a=double(imread('picture.bmp')); % a: matrix NxM, converted to double
  [n,m]=size(a);                   % dimensions
  nw=zeros(n,m);                   % new matrix
  his=zeros(1,256);                % the histogram vector, initialized
  for iy=1:n, for ix=1:m,
    fc=a(iy,ix)+1; his(fc)=his(fc)+1;
  end; end;
  avg=sum(his)/256;                % the average
  lft=zeros(1,256); rgt=zeros(1,256);
  nh=zeros(1,256);                 % new histogram
  rule=input('Which rule? (1 or 2) ');
  rr=1; accum=0;
  for z=1:256,
    lft(z)=rr; accum=accum+his(z);
    while accum>=avg,
      accum=accum-avg; rr=rr+1;
    end;
    rgt(z)=rr;                     % number of allotted columns is ready
    if rule==1,
      nh(z)=floor((rr+lft(z))/2)-1;     % the new value
    else nh(z)=rgt(z)-lft(z);           % the interval
    end;
  end;
  % New image reconstruction
  for iy=1:n, for ix=1:m,
    z=a(iy,ix)+1;                  % sorry, no zero index in Matlab
    if lft(z)==rgt(z), nw(iy,ix)=lft(z)-1;
    else
      if rule==1, nw(iy,ix)=nh(z);
      else nw(iy,ix)=lft(z)-1+floor(rand*nh(z)); % the new value here
      end;
    end;
  end; end;

The second rule may be modified: instead of using a new random value for the histogram bin where each pixel is tossed, the random generator adapts itself, giving more chances to the poorer. We don’t give the solution for a possible third rule which assigns the new colour depending on the pixel environment. This is slow, and delicate: when a new colour is assigned in an xy loop, we don’t yet know the colours of all the neighbours of the modified pixel. So a mixed strategy seems easier to adopt: rule 2 is used, and then the image is despeckled by averaging or median smoothing.

4.3 Transparence Channel

The administration of the transparent areas on the image will be discussed in the next section. We note only here that there are several possibilities to deal with “invisible” zones of a picture.

• For indexed images one specific palette index is treated as the “transparent” colour. This is often used with GIFs.

• A full α channel is an integral part of the image. In such a way it is possible to specify the degree of opacity.

• If more than one transparency channel is used, for example if a specific opacity channel is attached to every other “visible” colour plane, the transparence becomes colourful, which can be used for many special effects, artistic or technical, such as selective visualisation of very complex volumetric data.
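As a small illustration of the first two possibilities, here is a hypothetical Matlab sketch which promotes a GIF-style colour key into a full (here binary) α channel; the file name and the key index are invented for the example:

  [ind, map] = imread('sprite.gif');  % indexed image and its palette
  key = 1;                            % palette index declared "transparent" (an assumption)
  alpha = double(ind ~= key);         % 1 = opaque, 0 = transparent
  rgb = ind2rgb(ind, map);            % expand the indexed image into RGB
  % the pair (rgb, alpha) now forms a four-channel image with a binary opacity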
The full power of the transparence channels shows itself when images are composed and superposed. In the simplest case the displaying engine needs only to look up the transparency value of a pixel and to display it or not. More precisely: it should display either the pixel or the background, explicit or default. If the full α channel is present, it is necessary to perform an interpolation; the displayed colour is equal to C = αI + (1 − α)B, where I is the image, and B – the background. We see that α here is the opacity rather than the transparence.

Some delicate questions may be posed concerning the influence of filtering on the transparence channels. The subtraction of two transparence values is rather difficult to understand, and unless you know what you are doing, it is better to stay far away from that. On the other hand, a blending between a “normal” colour and the transparence channel is extremely important – this is a standard way to introduce soft shadows into images.

Chapter 5 Image Algebra

5.1 Addition, Subtraction and Multiplication

Apparently there is nothing really fascinating here. If we manipulate images as numeric matrices, we can add them, multiply them by constants or element-wise, bias the pixels by some additive constants, etc. There are just a few intuitive rules to master. The reader already knows almost everything if he has learned the filtering operations well.

1. Multiplication by a constant c < 1 darkens the image, and c > 1 lightens it. Such multiplication is the simplest case of histogram stretching.

2. Averaging two images interpolates between them. Usually the addition is followed by a division of the resulting values by 2, otherwise the result (which always enhances the intensity of the concerned channel) may be illegal. By varying in a loop the parameter α ∈ [0, 1] used to combine two images additively: I = (1 − α)I1 + αI2, we obtain a blending (fading-off) between the two source images, often used in animation, or in real movies. If we don’t need just one interpolation, but a whole sequence (for example in morphing, discussed in section (7.2)), or some other kind of animation, the linear (in time) blending might be too sharp, and often a different interpolator, a sigmoidal function: I = (1 − s(α))I1 + s(α)I2, where s(α) = 3α² − 2α³, is used. (Here α is the interpolation time, between 0 and 1.) This function (which is one of the Hermite basic splines) maps the unit interval into itself, and the “speed” of the mapping varies smoothly at the beginning and the end of the process. Don’t forget that geometric manipulation of images needs a more elaborate interpolation between pixels. Several image processing packages, such as Photoshop, offer a smoother interpolator: the bicubic, two-dimensional Catmull-Rom spline. This is not discussed here.

3. The subtraction in general may produce negative numbers, and the arithmetic package should crop them to zero. Such effects as embossing, etc. need the addition of an additive constant to the result of a subtraction. Of course this addition is performed before the eventual cropping.

4. The possibility of multiplying images means that the colour space is normalised to [0, 1]³, all three planes containing the percentage of the “full” colour. Some packages, e.g. PaintShop, apparently do it wrongly, and the multiplication of two dark images might produce something bright.
Of course, if 0 is black and 1 is white, the multiplication can only darken the image, and this is the way of adding shadows ex post to an image. How to lighten an image fragment? Simple: take the negative of the image and the negative of the “anti-shadow”. Multiply them, which will darken the result, and invert it back again. Of course the division of images is an ill-defined operation, and should be avoided, unless you have some private crazy ideas about its meaning. (The reader who follows a more detailed course on image analysis knows nevertheless that a division is used to sharpen images. If the image is smoothed by its convolution with a Gaussian-like filter, in the Fourier space the transformed image is the product of the transforms of the image and the filter. So, if we have a blurred image we can – looking at the supposed edge smearing – extract the width of the blurring filter which would produce the same effect, prepare the transform of this filter artificially, and divide the image transform by it. The inverse transform of the result should sharpen the image. This technique is often used in astronomy. Of course it might introduce, and usually does, a substantial local noise, due to the fact that high frequencies have been enhanced. The Fourier transform of a Gaussian is a Gaussian, and dividing by it raises considerably the values of the “frequency pixels” far from the origin.)

5.1.1 Some Simple Examples

Let us construct the artificial shadow shown on Fig. (5.1). There is absolutely nothing fascinating here, this picture has been made in one minute. The shadow is constructed from the distorted letter, multiplied by the background image. A very simple image package will require from the user the correct choice of the filling gray before applying the multiplication, otherwise the shadow may become too dense or too light. But a more intelligent and more interactive approach is also possible – don’t forget the α channel. Even if it is not available explicitly (as in some contexts in Photoshop, where apparently the transparence manipulations have been designed by a committee of 1439 ambitious experts. . . ), it is usually possible to declare globally the transparence of the blending colours.

Fig. 5.1: Simple shadow

We have mentioned that in order to “anti-shadow” a fragment of the image we shadow its negative and we invert the result. But this is not always what we want. If a bump-mapping texture produces a strong light reflex on the surface, this effect should be added to the normal, “diffuse” colour. More about that will be found in the section (6.1) devoted to bump mapping.

Another example is a variant of the solarization effect. A solarized picture is in fact a badly exposed and/or badly processed photograph. The dark and middle zones remain unchanged, but parts which on the original are too light are “burned out” and become dark. Fig. (5.2) shows two variants of this effect. At the left we see the original Photoshop solarization, and at the center – our modified variant, which additionally enhances the contrast.

Fig. 5.2: Solarization

The middle picture is particularly easy to obtain: it suffices to compute the difference between the image and its negative, and to invert the result. In fact, for light areas we get 1 − (I − (1 − I)) = 2(1 − I), an enhanced negative. For the dark parts we have 1 − ((1 − I) − I) = 2I. The “original” solarization may then be obtained trivially – it suffices to divide this result by two.
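As a check of this little algebra, here is a minimal Matlab sketch of the modified solarization (the absolute difference between the image and its negative, inverted); halving the result gives the “original” variant. The file names are arbitrary:

  I = double(imread('lena.bmp'))/255;  % intensities normalised to [0, 1]
  modified = 1 - abs(I - (1 - I));     % 2I in the dark parts, 2(1-I) in the light ones
  original = modified/2;               % the "classical" solarization
  imwrite(modified, 'solarized.bmp');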
Obviously, the solarization applied to a colour picture usually gives useless and disgusting effects (who wants a deadly green Lena?). But this particular intensity shift may introduce some almost specular, silky reflexes, whose colour may then be corrected by the “hard light” composition rule discussed in the next section. This gives us the right picture on Fig. (5.2).

5.2 Working with Layers

Layers are conceptually simpler than channels, but technically they may be quite complex. They can be thought of as superposed transparencies. When we have to compile a rather complicated algebraic superposition of various images, it is preferable to put all the parts on separate layers, possibly to duplicate and mask some of them, and when the image is ready, we can collapse all the layers into one. Conceptually a multilayer image is just an M × N × D “tensor”, where D is the number of layers. Of course, it is possible to do everything using separate images. In popular packages like Photoshop the layers are integrated into the interface. The advantage of such a protocol is that the layers are ordered and we see immediately which one may cover the others. The layer interface automatically provides a number of operators; for example a two-layer image may be composed declaratively as a “normal” superposition, where the upper image dominates everywhere it is not transparent, or as a multiplication, difference, etc.

Perhaps it might be useful to comment on the various layer combination modes chosen by the Photoshop creators according to user wishes. We know that the normal mode covers the underlying layers. Here are some of the remaining modes. We do not plan to teach the reader how to use Photoshop, but to suggest which facilities should be present if one day he tries to construct his own image processing superpackage, which will replace Photoshop, Gimp, and all the others. You will see that many of these modes are simple combinations of more primitive operations. There is a difference between the global layer composition modes and the drawing tool modes. Suppose that the upper layer pixel has colour c0, and the layer beneath – c1.

• Dissolve. This mode constructs the pixel colour c by a random choice between c0 and c1, depending on the opacity of c0. If the upper layer is opaque, c0 is always chosen. This might be used to simulate a spray (airbrush) painting.

• Behind. Used for painting. If a layer is thought of as an acetate sheet, the normal painting (brush, etc.) replaces the existing colour. The “behind” mode is equivalent to painting at the back of this sheet. Of course it makes sense only if the sheet contains some transparent or semi-transparent areas. In such a way one layer can be used twice.

• Clear. It is just the eraser, but attached to the line (stroking) tools, or to the filling commands (path fill and paintbucket). It renders the touched areas transparent, and of course is used when manual erasing would be too cumbersome.

• Multiply. Multiplies the pixel contents considered as fractions between 0 and 1, channel by channel. The result is always darker, unless one of the concerned pixels is white. This may be a global option.

• Screen mode. This is the “anti-shadowing” effect. For each channel separately the inverses of c1 and c0 are multiplied, and the inverse of the result is taken: c = 1 − (1 − c0)(1 − c1). The final effect is always lighter. Screening with white gives white, and screening with black leaves the colour unchanged.

• Overlay mode.
This is a little complicated, and may either multiply or screen the pixels depending on the base colour. If the base (c1) and the blending (upper, c0) colours are random, the result is difficult to understand. The idea is to preserve the shadows and the lights of the base colour: where it is strong, it remains, otherwise the blending colour “wins”. The result is as if we looked at the base image through a coloured glass, but an “active” one, i.e. a white blending colour may lighten the image, give some milky appearance to it, while black does not destroy the image, but darkens it, and the darkening is more pronounced where the areas are already dark. Yes, obviously an example is needed. . . Fig. (5.3) shows the effect of overlaying.

Fig. 5.3: Overlay mode

Exercise. Try to deduce the underlying colour algebra for this operation.

• Soft Light mode. Now the blending colour may darken or lighten the base image. Imagine that c0 represents a diffused spotlight. If it is light, lighter than 0.5, then the image is lightened, otherwise it is darkened (an “anti-light” effect). If the blending colour is white or black, the result is pronounced, but never turns into white or black.

• Hard Light mode. Here the effect is stronger: the image is lightened or darkened depending on the blending colours. If c0 is light, the result is screened, if it is dark, the result is multiplied, so it is possible to get very light (white) highlights, or very deep shadows.

• Darken. This is simple. The darker colour is chosen. There is also the “Lighten” option.

• Difference. The result is the absolute value of the subtraction of the two colours, so the result is never negative.

• Hue mode. The hue (spectral location) of the blending colour replaces the base hue, but the saturation and the luminance remain. Imagine that the base image has been converted into gray, and then artificially coloured with the blending pixels. The Color mode replaces not only the hue, but also the saturation; the effect of artificial tinting is more pronounced than in the previous mode. The Luminosity mode is the inverse of Colour – the hue and saturation remain, the luminance is taken from the blending image. Finally, there is the Saturation mode, which takes only the saturation from the blend layer. This mode may be used for the selective (through painting) elimination of some too saturated colours, for example for simulating the atmospheric colour attenuation with the distance.

5.2.1 Construction and Usage of the Transparence Channel

All these algebraic and layer operations are mathematically trivial, and belong to the practical folklore of the image processing art. We have mentioned that the work with the transparence channel may be delicate. If it is not accessible directly, several possibilities to simulate it exist, provided the selection tools and the image arithmetic are sufficiently complete. In particular, knowing that the transparence per se cannot produce visible results, superposing a partially transparent image I1 with the opacity α < 1 over the base image I0 means that the result is computed as I = αI1 + (1 − α)I0 (unless some non-standard arithmetic: “screen”, “soft light”, etc. mode is used. We shall not discuss this here). Some image synthesis packages, such as 3DMax, automatically generate the alpha channel: everything which belongs to the image is white, and the background is black (if we choose to “see” the opacity in such a way).
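A minimal Matlab sketch of this compositing formula, assuming the α channel is stored as a separate gray image (all file names are invented for the example):

  I1 = double(imread('object.bmp'))/255;      % partially transparent foreground
  I0 = double(imread('background.bmp'))/255;  % base image of the same size
  alpha = double(imread('alpha.bmp'))/255;    % opacity, one channel, values in [0, 1]
  alpha = repmat(alpha, [1 1 3]);             % replicate over the three colour planes
  result = alpha.*I1 + (1 - alpha).*I0;       % I = alpha*I1 + (1 - alpha)*I0
  imwrite(result, 'composed.bmp');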
Such an automatically generated α channel may be very important when the image is subsequently used for animation – the system automatically filters out the transparent zones when composing the final picture. If you have to do it by hand, and if you wish to obtain the effect shown on Fig. (5.4), you must

• Pick out the fragment of the original image (at the left) which will be erased, and construct a mask M.

• M – say, black on white – is multiplied by the image. The face is erased.

• The same mask, but inverted, multiplies the replacement (unfortunately its original disappeared somewhere. . . ).

• The two components are added.

Fig. 5.4: Composition of two images

Chapter 6 Some Special 3D Effects

6.1 2D Bump mapping

The technique of bump-mapping is often used in the synthesis of 3D scenes, where it simulates spatial textures: by deforming the normals to the surface of the rendered object, it modifies the illumination conditions, and simulates bumps, rugosity, holes, etc. The aim of bump mapping in the domain of image processing might be different. Of course, we can produce some 3D effects which simulate the extrusion of a flat contour, for example a text, but more often this technique is used to add some texture to the surface of the image: to simulate a canvas, a piece of hairy tapestry, etc. The 2D image is already rendered, and adding some additional lighting effect should not deteriorate the colour contents of the picture, so the bump mapping should not be overdone. The shadows should not be too deep, and the specular spots rather narrow. And, in general, the size of the bumps should not be too big either. Fig. (6.1) shows the result of the application of a bump map.

Fig. 6.1: Bump mapping

This effect is produced by the simulation of a directed lighting on a two-dimensional image, and in general may be quite involved, with many parameters. The basic idea goes as follows. Imagine that you have a (fake) extrusion given by a gray (intensity) image, whose one-dimensional section is shown on Fig. (6.2). The light beam is represented by the red dashed lines. The bump map has the same size as the work image. Those regions which correspond to the angle 90◦ between the normal to the bump profile and the light direction will be enhanced. Any specular model may be used, for example the Phong formula I_s = k cos^n(θ), where n may vary between, say, 3 and 200. But beware: in modeling glossy effects on a 3D scene the specular contribution is always positive. Here the shadow is active and may darken some parts of the image. We have to choose a priori the “neutral” direction (usually “horizontal”) where the lighting effect vanishes.

Fig. 6.2: Bump mapping geometry

Our mathematics is conditioned by the fact that we don’t have the profile geometry explicitly given. The bump map is just an image, where the black areas represent the extrusion, and white is “flat” (or vice-versa). These are the main computations involved:

• Deduce the fake normal vector to the bump image.

• Compute the scalar product of this normal and the direction of light.

• Enhance the result using the Phong (or other) formula, and split the dark and light areas in order to superpose them on the work image.

• Complete these last algebraic manipulations.

The blue (mostly negative, slightly truncated) profile on Fig. (6.2) is the cos(θ), where θ is the angle between the normal and the light direction vector, normalized so as to “neutralize” the horizontal parts of the image.
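A rough Matlab sketch of the steps just listed, under simplifying assumptions: the bump map and the work image are gray matrices bump and img already normalised to [0, 1], the light lies in the image plane, and the cos^n term is replaced by a signed power of the directional slope (so this is only a directional embossing enhanced by a power law, not a full Phong model):

  [gx, gy] = gradient(bump);               % gradient of the bump profile (a filtering exercise)
  L = [cos(pi/4), sin(pi/4)];              % light direction in the image plane (an assumption)
  slope = -(gx*L(1) + gy*L(2));            % signed slope towards the light; 0 on flat areas
  slope = slope / (max(abs(slope(:))) + eps);  % normalise, so horizontal parts stay neutral
  n = 20; k = 0.5;                         % "Phong-like" exponent and strength
  relief = k * sign(slope) .* abs(slope).^n;   % light (positive) and dark (negative) parts
  out = min(max(img + relief, 0), 1);      % superpose on the work image and clamp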
The normal is a vector orthogonal to the gradient, and the gradient is just an exercise in filtering. It should be done carefully in order not to truncate the negative values too early. We can obtain separately the x and y components of the gradient, or directly, by a judiciously chosen offset, the gradient whose xy projection is collinear with the light beam direction. The normal to the image has the same property. We obtain a standard directional “embossing” effect, which is then separated into light and dark contributions by the (signed, and truncated) subtraction of the neutral gray. The contrasts of the shadows and reflexes should be enhanced, and the rest is almost trivial.

6.2 Displacement Maps

This is an extremely popular and powerful deformation technique. In general, geometric (“horizontal”, not just in the colour space) deformations, such as twirling, bumping of a part of the image, etc., are not presented as mathematical, analytically given transformations, but their sources are shapes themselves. The displacement map is an image. It may be of the same size as the deformed subject, or any other, in which case we will use scaling. Call I_xy a point on the original image, and D_xy the colour value corresponding to the reduced point of the map. The reduction takes place if the sizes of I and D are different. Then, the point (x, y) of the image corresponds to

x̃ = (W_D / W_I) x,   (6.1)
ỹ = (H_D / H_I) y   (6.2)

on the map, where W and H denote the widths and heights of I and D. The inverse transformation, which is trivial, will also be needed. Note that in general these transformations are real, not integer.

The value of D determines the displacement vector, which in general has two components, so the displacement map should be at least a two-channel picture. (Photoshop uses the Red and Green channels, Blue is neutral). Since the value of the pixel is somewhat conventional (the interval [0, 255] is meaningless), one usually adopts an additional convention: the maximum displacement length (in pixels) s is established independently of the map D. The minimum colour (0) of D corresponds to the maximum shift in one direction, say, to the left, and the maximum value – to the right. The “gray” value (say, g) is neutral. Of course a more general algorithm can introduce a special offset, but it is a complication which can be resolved separately.

Now, the deformation goes as follows. For all the pixels (x, y) of the new, deformed image the corresponding (reduced) point on D is computed. Suppose that the map is by convention normalised so that the maximum colour value is equal to 1 (then g = 1/2). The vector generated by the two used channels of the map image is (D_x, D_y). From this value the displacement vector is calculated:

d = (d_x, d_y) = (s · (D_x − g), s · (D_y − g)).   (6.3)

(Of course, it is possible to use different horizontal and vertical scales and offsets). Now, the new pixel at (x, y) is not I_xy, but another value I_x′y′, where

(x′, y′) = (x + d_x, y + d_y),   (6.4)

interpolated appropriately, since (x′, y′) need not be integer, and the neighbouring pixels are looked up.

As an example we show how to produce an enlarging lens: a circular displacement map which moves all the pixels which are to the left of the center to the left, and the right pixels to the right. The same holds with respect to the vertical coordinate. Fig. (6.3) shows the two displacement channels and the result.

Fig. 6.3: Simulating lenses by displacement map
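Here is a minimal Matlab sketch of the application of a displacement map, following (6.3) and (6.4), with a nearest-neighbour lookup instead of a real interpolation and the map assumed to be of the same size as the image (all names are invented for the example):

  I = double(imread('photo.bmp'))/255;  % the image to deform
  D = double(imread('dmap.bmp'))/255;   % the map stored as RGB: R = Dx, G = Dy
  s = 20; g = 0.5;                      % maximum shift in pixels, neutral gray
  [H, W, C] = size(I);
  out = I;
  for y = 1:H
    for x = 1:W
      dx = s*(D(y,x,1) - g); dy = s*(D(y,x,2) - g); % formula (6.3)
      xs = min(max(round(x + dx), 1), W);           % source position (6.4), clamped
      ys = min(max(round(y + dy), 1), H);
      out(y,x,:) = I(ys,xs,:);                      % nearest-neighbour lookup
    end
  end
  imwrite(out, 'deformed.bmp');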
In the lens maps of Fig. (6.3) the gradients were linear and the displacement zones were homogeneous either vertically or horizontally. Much more elaborate effects can be obtained by using blurred d-maps, and by masking the deformed zones, confining them to some areas. Fig. (6.4) shows a slightly different distortion of the original, which simulates a hemi-spherical transparent “water drop” on the image. Fig. (6.5) suggests how to derive the displacement map image from the desired geometry, but we suggest that the reader performs all the computations himself.

Fig. 6.4: Displacement map simulating a transparent hemisphere

Fig. 6.5: The “water drop”

6.2.1 Another example: turbulence

Many packages offer the “twirl” effect which simulates a whirlpool. Suppose that we wish to simulate more or less realistically a physical whirl vortex, for example a big cyclone needed for an atmospheric texture. We suppose that it is a 2-dimensional (cylindrically symmetric) problem. Fig. (6.6) shows some geometric relations which must be obeyed according to some conservation laws. Suppose that the “matter” moves counterclockwise, and is sucked into a central region, where the laws are different.

Fig. 6.6: The geometry of a tornado

The thin ring of radius r and width dr occupies the area 2πr dr. If we assume that the “matter” inside is not compressible (or that its pressure does not change significantly), the surface conservation law determines its radial speed. The constancy of the area means that dr/dt = const/r, which, when integrated over time, gives the functional dependence r(t) = r0 √(1 − t/t0). The constant t0 is chosen so that at that time the vortex element falls into the center. This time depends on the initial conditions and on the “force” of the vortex. In order to construct an incremental displacement map we will not need this integrated formula, but it might be useful to know it. This result is independent of the angular motion. Here the angular momentum conservation determines the rotational speed. For a thin ring the angular momentum is given by M = 2πr dr · ωr², which means that r²ω is constant. We may thus postulate that in a short (and constant) interval of time we have the displacements Δr = c/r and Δφ = h/r², with some appropriate constants c and h. These equations need to be translated into the Cartesian system, an exercise which we leave for the reader (a possible sketch is given at the end of this subsection). Fig. (6.7) shows the map and the result of its iterated (about 15 times) application to some ordinary fractal clouds.

Fig. 6.7: Tornado

We have eliminated the singularity at the center by introducing a small neutral zone, but many other solutions are possible, for example replacing 1/r by r/(r² + ε). It should be obvious to everybody that this technique, or a similar one, is still not a good way to produce realistic cyclones. The clouds here are smeared, while any satellite photo shows that at the scale of the turbulence the cloud edges are quite sharp. Actually, the clouds are created within the turbulence, and their local properties are not determined (or only partly) by the spiraling motion. A good cloud model, which takes into account the pressure of the air and the condensation of the water vapour due to the adiabatic decompression, and which somehow brings the third dimension into play, simply does not exist. Any takers?
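A possible Matlab sketch of one incremental step of this vortex displacement, in Cartesian coordinates (the constants, the size of the neutral zone and the cloud image are arbitrary choices made for the example):

  img = double(imread('clouds.bmp'))/255;       % some fractal clouds
  [H, W, C] = size(img);
  [X, Y] = meshgrid((1:W) - W/2, (1:H) - H/2);  % coordinates centred on the vortex
  r   = max(sqrt(X.^2 + Y.^2), 8);              % radius, with a small neutral zone
  phi = atan2(Y, X);
  c = 40; h = 4000;                             % "force" of the radial and angular motion
  rn   = r - c./r;                              % delta r = c/r (inwards)
  phin = phi + h./r.^2;                         % delta phi = h/r^2
  Xs = min(max(round(rn.*cos(phin) + W/2), 1), W);  % source positions, clamped
  Ys = min(max(round(rn.*sin(phin) + H/2), 1), H);
  out = img;
  for ch = 1:C                                  % nearest-neighbour lookup, channel by channel
    plane = img(:,:,ch);
    out(:,:,ch) = reshape(plane(sub2ind([H W], Ys(:), Xs(:))), H, W);
  end
  % iterating this step about 15 times produces a spiral like the one of Fig. (6.7)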
Chapter 7 Other Deformation Techniques

7.1 Warping

Warping became an over-used term. It may mean a general deformation, but for us it will mean an interactively produced image distortion which respects some continuity constraints – as if we put the image on a rubber sheet, and the rubber was then deformed with all the pixels following (and, of course, with some interpolation). Warping is the subject of a thick book, and we cannot discuss here all the possible techniques, nor all the applications. It is not possible either to teach how to practically obtain particularly disgusting deformations of our favourite political personages. Thus, we sketch only the usage of the elastic deformation theory, and we suggest how a sufficiently general warping package might be structured. We stress that elasticity is just one among many possible models, and its only purpose is to introduce some regularity into the deformation process, to restrain the madness of the creator.

Imagine – for simplicity – the one-dimensional elastic line shown on Fig. (7.1). Now, we should not confuse the coordinate position x on this axis, the dynamic position which we call p(x), and the dynamic displacement of a point, which we call d(x). This is a trivial observation for a physicist, but for computer science students the fact that the displacement belongs to “the same space” as x is confusing. In the initial static configuration p(x) = x, or d(x) = 0.

Fig. 7.1: One-dimensional elasticity

The original configuration is the lower line. The rubber element at the point x0 has been displaced, and finds itself at x′0. Thus, p(x0) = x′0. Its neighbours follow, but their displacement is not the same as that of the central point, because the element at the left is attracted by the left neighbours, and the element at the right is less attracted (or more repelled) by its right neighbours. The neighbourhoods are infinitesimal, we may consider x1 = x0 − dx, x2 = x0 + dx, but again – do not confuse this with the – also infinitesimal – displacement d(x0) = x′0 − x0. We recall that the original configuration is in equilibrium, the elastic forces cancel. Now we displace the element at x = x0. What force acts upon it now? We build up the following, easy to grasp equation:

F(p(x)) = k (d(x + dx) − d(x)) − k (d(x) − d(x − dx)) = k (p(x + dx) − 2p(x) + p(x − dx)),   (7.1)

which takes into account that the net force is the result of the incomplete cancellation, and that the difference between the displacements of two neighbours is equal to the difference between the shifted positions, the absolute contributions cancelling. We see that the expansion of p(x + dx) into a Taylor series about x must go up to the second derivative, as the first derivatives cancel. Knowing that the force is proportional to the acceleration d²p/dt², we have

∂²p(x, t)/∂t² = C ∂²p(x, t)/∂x²,   (7.2)

i.e. the one-dimensional wave equation, as expected. In the multidimensional case the spatial derivative should be replaced by the Laplacian. In equilibrium Δp = 0. When we pass from the (2-dimensional) continuum to a discrete grid indexed by pairs ij: x_{ij} + dx = x_{i,j+1}, etc., the Laplace equation takes the form

4p_{i,j} − p_{i−1,j} − p_{i+1,j} − p_{i,j−1} − p_{i,j+1} = 0.   (7.3)

This means that the equilibrium position of each node is the symmetric barycenter of its four neighbours. We might – perhaps this digression will be useful for some readers – derive this formula directly within the discrete formulation. Imagine a discrete grid whose vertices are connected with elastic springs.
The potential energy of the system is equal to the sum of the energies of the corresponding oscillators:

U = (k/2) Σ_{⟨j,l⟩} (p_j − p_l)²,   (7.4)

where j and l are two-dimensional, double indices locating the vertices in the grid, and the sum goes over all pairs of neighbouring indices, i.e. over all springs. The equilibrium is obtained when the energy reaches its minimum, as a function of the positions {p}. The derivative over p_k gives

∂U/∂p_k = k Σ_i (p_k − p_i) = 0   for all k,   (7.5)

where i runs over the neighbours of k, which again shows that p_k is the arithmetic average of the positions of its neighbours. Now, in order to solve such a set of equations numerically, we must fix some of the vertices, i.e. introduce some boundary conditions. Fig. (7.2) shows what happens when we displace one point and fix its position. The left drawing shows the initial equilibrium, which is trivial, but there is no cheating: we have fixed the border of the grid, and a MetaPost program found the internal vertices.

Fig. 7.2: Elastic displacement

We see that the elastic adjustment of the neighbourhood does not necessarily prevent the mesh cross-over. In this example the mesh boundaries are too close. Fig. (7.3) shows a less drastic mesh folding.

Fig. 7.3: Grid size importance

If a huge, but very localized warping is needed, it is better to do it in several stages, or to use a different elastic model, otherwise a very dense grid would be needed. The solution of the discretized Laplace equation is easy: the assignment (7.3) is iterated until convergence (a minimal sketch is given after the list below). If the grid is large, this process might be slow.

In general, which facilities should a warping package provide?

1. Of course it should ensure a correct interfacing for the image import, and for the image and animation export. More about that in the morphing section below.

2. The grid should be adaptive. It does not need to be visible. It is perfectly possible to give the user just the possibility to fix some points or lines (segments, polygons, splines), and the system may choose the grid according to the geometric details.

3. The warping may be defined by dragging points, lines, or whole areas. Internal boundary conditions may be established, and the system should understand that if the warping zone is constrained within a closed area, its exterior does not participate in the iterative solution of the Laplace equation.
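On the numerical side, here is a minimal Matlab sketch of the relaxation mentioned above: the barycentric assignment (7.3) is iterated, with the border and any user-dragged vertices kept frozen (the grid size, the number of iterations and the displaced vertex are invented for the example):

  N = 21;                                  % a small square grid
  [px, py] = meshgrid(1:N, 1:N);           % initial positions p(i,j)
  fixed = false(N); fixed([1 N],:) = true; fixed(:,[1 N]) = true;  % freeze the border
  px(11,11) = 15; py(11,11) = 14; fixed(11,11) = true;             % drag one vertex
  for it = 1:500                           % iterate (7.3) until approximate convergence
    ax = (circshift(px,[1 0]) + circshift(px,[-1 0]) + ...
          circshift(px,[0 1]) + circshift(px,[0 -1]))/4;  % barycenter of the 4 neighbours
    ay = (circshift(py,[1 0]) + circshift(py,[-1 0]) + ...
          circshift(py,[0 1]) + circshift(py,[0 -1]))/4;
    px(~fixed) = ax(~fixed); py(~fixed) = ay(~fixed);
  end
  % (px, py) now holds the relaxed grid, through which the image can be resampled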
7.2 Morphing

Morphing is a combination of (generalized) warping and colour interpolation, which deforms one image in such a way that at the end it becomes another one. The Internet is full of clichés with the deformation of human faces one into another (e.g. Michael Jackson’s “Black or White” clip), or the transformation of a human face into a beast (wolf, tiger, etc.). Thus, we shall not show a morphing example here. Usually the warping phase is liberal, and there is no “elasticity” involved, especially if the source and target images are so different that it is difficult to find some common topological patterns. The user splits the source image into simple polygons, usually triangles, although any contours can be used, provided that the morphing software can perform the nonlinear transformations between arbitrary closed contours well. On the target image the same set of polygons is automatically re-created, and the user deforms them manually, moving the vertices. Most of the popular morphing programs are too tolerant: the user may produce topological singularities, forcing some triangles to overlap, or leaving holes. Even the construction of the covering triangles may be clumsy, and some packages help the user by an automatic triangulation, for example using the Delaunay algorithm, which produces “equilibrated” triangles, not too elongated, if possible. The user then chooses only a number of control points. For example, when morphing faces it is natural to localise the eyes, the mouth corners, and the face contour.

The affine transformation between triangles has already been discussed; the only modification here is that this transformation, and the corresponding texture transmutation, are multi-stage processes. If a value (the point position or the pixel colour) passes from v0 into v1 in N stages, with v(0) = v0 and v(N) = v1, then v(i) = (1/N)((N − i)v0 + i·v1), if we choose to apply the linear interpolation. Often a more general approach is better for artistic reasons. Instead of using the linear interpolation weights (1 − x, x) (where x symbolically denotes i/N), the Hermite cubic function 3x² − 2x³ is used. In Michael Jackson’s clip a more sophisticated approach has been adopted: some parts of the image converge faster to the final form than others.

When changing a human face into a beast (or into a well known politician), the artist might ask himself several questions, for example:

• Perhaps it would be better to go with the warping all the way through before interpolating the colours: first the distortion of the shape, and then put some hairy texture upon it. Or, inversely, first put the hair, scales, or spots on the skin of the original human, and only then transform it into a lion, a fish, or somebody whose name we all know, but we won’t mention it here.

• The speed of the transmutation need not only be non-linear, it might also be asymmetric. Shall it start slowly and speed up, or decelerate?

The abundant amateurish morphing packages on the Internet continue to develop. If you want one day to make your own, please consider the following advice:

1. The interface should be professional, and this does not mean that the number of menus and buttons should be greater than 10. . . The package should be able to import at least 3–5 standard image formats, and generate at least two different ones. It should also be able to save on disk the working context, i.e. the point meshes and/or the triangles.

2. Don’t forget that a graphical interface without the “undo” command will sooner or later, but rather soon, end in the waste basket.

3. Use a professional colour interpolation scheme. Aliasing in morphs is rarely acceptable.

4. Generate separate intermediate images, but learn to generate also some compound images, for example animated GIFs. This is not difficult. You might also generate some scripts which will drive an MPEG encoder launched automatically by your package.

5. It is very frustrating when one cannot morph two images which differ (even slightly) in size. Offer the user the possibility to crop one of the images, or to resize it, or the other one. Preferably both, with the resize parameters constrained to preserve the proportions.

6. A dense mesh of control points and lines is a mess, badly legible on colour images. Use colour for the control entities, but permit the user to mask (or to attenuate) the colour of the images on the display. Gray images are significantly clearer, and do not risk rendering red control dots invisible.

7. Plug in the Delaunay or another automatic triangulation algorithm.
8. Cry loud when a topological incongruity is generated: overlapping triangles, or holes. Think about offering – as an option – a limited warping scheme, for example the elastic model.

9. It would be nice to parametrise the transition speed, to introduce some non-linearities, or even to choose different warping/interpolation speeds in different parts of the image, as suggested above.

10. Add some really original examples. This remark is very silly, of course. But apparently the morphing package creators pretend to ignore it. . .
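To close, a minimal Matlab sketch of the interpolation schedule discussed in this section: a cross-dissolve between two already warped frames of the same size, with the Hermite easing 3x² − 2x³ instead of the linear weights (the file names and the number of stages are, of course, arbitrary):

  A = double(imread('source.bmp'))/255;  % the (warped) source frame
  B = double(imread('target.bmp'))/255;  % the target frame, same size
  N = 10;                                % number of stages
  for i = 0:N
    x = i/N;
    s = 3*x^2 - 2*x^3;                   % Hermite easing; use s = x for a linear blending
    frame = (1 - s)*A + s*B;             % colour interpolation between the two images
    imwrite(frame, sprintf('morph%02d.bmp', i));
  end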