Image Processing - Jerzy Karczmarczuk
Introduction to Image Processing
DESS + Maîtrise d'Informatique, Université de Caen
Jerzy Karczmarczuk
Caen 1997/1998

Contents

1 What do we need it for?
2 2D Geometric Transformations
   2.1 Typical Linear Transformations
      2.1.1 Scaling, Rotation, Shearing
   2.2 Other Affine Transformations and Perspective
      2.2.1 Linear Transformation of Triangles
      2.2.2 Nonlinear Deformations (Overview)
   2.3 How to do Geometry in Discrete Space
3 Filtering
   3.1 Introduction: Filters and Convolutions
   3.2 Typical Filters and their Applications
      3.2.1 Smoothing filters
      3.2.2 De-noising by median filtering
      3.2.3 Gradients
      3.2.4 Laplace and Sobel operators
      3.2.5 Sharpening, and Edge Enhancing
   3.3 Fourier Transform and Spatial Frequencies
      3.3.1 Discrete Cosine Transform
4 Colour Space Manipulations
   4.1 Colour Spaces and Channels
   4.2 Transfer Curves, and Histogram Manipulations
      4.2.1 Transfer Curves
      4.2.2 Practical gamma correction
      4.2.3 Histogram Equalization
   4.3 Transparence Channel
5 Image Algebra
   5.1 Addition, Subtraction and Multiplication
      5.1.1 Some Simple Examples
   5.2 Working with Layers
      5.2.1 Construction and Usage of the Transparence Channel
6 Some Special 3D Effects
   6.1 2D Bump mapping
   6.2 Displacement Maps
      6.2.1 Another example: turbulence
7 Other Deformation Techniques
   7.1 Warping
   7.2 Morphing

Chapter 1  What do we need it for?

Our course deals with the creation of images, and concretely with the synthesis and rendering of 3-dimensional scenes. What is the role of the image processing here, do we really need to add new topics to a domain already sufficiently rich? The answer is: yes, we do. Of course, in order to construct a sound 3D model and to launch a ray tracer, one does not need to master all, sometimes very specific 2D techniques, although at least the 3D scene should be constructed in a way adapted to its 2D presentation. (If you choose badly the position, the direction, or the focal attributes of your camera, even a wonderful composition of your scene won't help you. . . ) If the rendering program/device has been constructed and piloted correctly, perhaps no post-processing is ever needed.
However much more often one has to do some minor corrections, or more important alterations of the produced images. You might wish to add through post-processing some special effects which would be extremely costly if integrated into the 3D rendering, or just add some 2D external elements to the image, such as some text or frames. Also, when composing a synthetic scene with a “natural” texture, parts of a photograph etc., more often than not the contrast, brightness or colour distribution should be adjusted. The image processing domain has some independent, creative aspects as well. We will not speak here about artistic creation and painting techniques, although the author confesses that this is for him a fascinating subject. We might have to think seriously about: A Creation of various coloured textures: regular and stochastic; based on real photographs or totaly synthesized; simple or showing a replicated geometric pattern; texts, etc. In general – everything we may need to enrich the surfaces of 3D objects. B Compositions and collages; clips from one image added to another; elimination or replication of some fragments; retouch. C Colour space transformations • Adjustments of the luminosity, contrast, hue (colour bias) or/and gamma correction. 3 4 • Histogram equilibration (equalization) or stretching. • Dithering and halftoning. Colour quantization and creation of micropatterns: modifying the pixel structure of the image in order to simulate non-existing colours. • Thresholding; all kind of transfer curve manipulations and creation of artificial colour variations. Also: ex post colouring of gray photographs. • “Image algebra” (or arithmetic if you prefer). Addition, multiplication, and other variants of image “blitting” or “rasterOps”. These techniques permit to change colour distributions, but also to add/subtract image fragments, administrate the “sprites” in animated pictures, etc. D Geometric transformations: rotations, scaling, simulated perspective; nonlinear transformations adapted to the texture-mapping manipulations or the deformation introduced by some non-standard cameras: panoramic, fish-eye, etc. Arbitrary deformations: warping. E Composite deformations and blending: morphing. F Special effects: • Bump mapping and other pseudo-extrusion techniques which give to the image a 3D appearance, notably the “embossing” technique. • Lighting effects: halos and glows, lens reflections, distance (“atmospheric”) attenuation introduced ex post. • Particular artistic effects: transforming realistic images into pointilist (van Gogh like) or other impressionist tableaux; “hot wax”, aquarelle, or carbon/chalk pictures, etc. One may wish to transform a photograph into an ancient copperplate engraving, or a comics strip style drawing. The possibilities are infinite. The main problem is to transform human imagination into an algorithm. . . G “Classical” filtering manipulations: edge enhancing, noise removal, anti-aliasing by blurring, sharpening, etc. H More analytic operations, which recognize or segment the image fragments: contour tracing, or zone (colour interval) selection, essential for the cutting and/or replacing picture fragments. The contour finding and representation is a very large domain per se, we will mention briefly some standard filtering techniques, but we cannot discuss other, more modern and fascinating subjects as the active contours (“snakes”) or the watershed algorithm. 
These, anyway, serve principally for the image analysis and interpretation rather than as a creation aid. 5 I Other manipulations which belong to the image analysis, and which will be omitted from these notes, or just briefly mentioned: • Vectorization: transformation of a bitmap into a set of geometric objects – lines, circles, etc. • Segmentation through the Hough transform: representation of geometrical entities of the image in the space of parameters which define these objects. • Karhunen-Loeve (Hotelling) transform which is a powerful tool of the statistical analysis of images, signals, etc. • Reconstruction of images from various linear signals, for example from their Radon transforms generated by X-ray scanners. (The Radon or Karhunen-Loeve transforms may serve also for more prosaic manipulations. They might help to localise and to parameterise the lines which should be by definition horizontal or vertical, but they are not, because the photo we put in the scanner was slightly slanted.) J We shall not discuss either the image compression (which is the object of another course for the 5-th year students: DESS “Images”). The list of omissions is anyway almost infinite: wavelet representation, procedural creation of pictures, for example through IFS (Iterated Function Systems) or the Lsystems (Lindenmeyer “grammatical” approach to the generation of iterative and recursive pictures, very good for the simulation of plants), etc. These items are not independent, but strongly related. For instance, the simulated 3D effects are often not geometric manipulations, but just some specific modifications of the colour distribution (bump-mapping). (But displacement maps are of geometric nature.) During the preparation of these notes we have used very different software, commercial and free. The image processing packages are very abundant and it is easy to find several free or shareware programs very powerful and user friendly. The commercial package heavily used was the well known Photoshop, but the Unix users may obtain almost all its special effects (and many more!) from a free package GIMP, a wonderful interactive and programmable tool, still evolving. As our ambition is to explain the essentials of the image processing, it was necessary to do some low-level programming, to treat images as matrices of numerical values (gray level, colour triplets or palette indices), and to process them as such. Of course it is possible to use any programming language to do so, and we have tried several of them. The necessity to perform many experiments, to change interactively this or that element of the processed picture precludes the usage of classical compile-and-run languages, as C++ or Java, the interactivity is much more important than the brutal efficiency, so we used the scientific programming system Matlab. It is a commercial package, but there are other, free systems well adapted to matrix 6 processing, such as SciLab or Tela. (But Matlab is a scientific, vectorized computation package, and has excellent interfacing tools which permit both high-level and low-level visual data processing. This is slightly less developed in above-mentioned free systems (to which we may add Rlab and Octave), which have other objectives than image processing. There are also some powerful programming/integrating machines specifically adapted to the treatment of images as Khoros. The low-level programming is left to the user, but it is seldom needed. 
All typical image algebra and geometry are already ready to use in the Khoros standard libraries, and the combination of modules and their interactive construction and execution using the visual, dataflow-style interface is really very comfortable and elegant. Khoros has a mixed status: one can get the sources freely and compile them (which sometimes is not trivial. . . ), or buy a commercially packaged full system, with plenty of documentation and examples. We acknowledge the existence of some other similar packages, commercial, but whose distributors sometimes offer working demonstration versions, for example the IDL system, which offers a good graphical programming language and very decent interfacing. Of course we will also use some drawing packages, for example the MetaPost system, which permits to include some PostScript drawings into the document without having to program them using the horrible PostScript language (very clumsy, but sometimes very useful!). MetaPost has the advantage of generating a particularly simple, developed PostScript, easily convertible into PDF by the ConTeXt package of Hans Hagen, without the necessity of using the commercial Adobe Distiller product. These notes are constructed independently of the other parts of the author's course on image synthesis, and they may be read separately, but – of course – they should be treated as a part of a bigger whole. This is the first version, very incomplete, and probably very buggy. Please contact the author if a particularly shameful error is discovered.

Chapter 2  2D Geometric Transformations

2.1 Typical Linear Transformations

We begin thus to discuss some geometric manipulations of 2D images. They will be presented, if necessary, in a coordinate-independent fashion, using abstract vector algebra, but the difference between the 3D scenes, where we are interested in "real" objects and their mutual relations, and 2D images is essential. There is no need to introduce abstractions where the only thing to do is just to transform the pixel coordinates in a loop. There is no particular need for the homogeneous coordinates, although they might simplify the presentation of the simulated perspective transformation. Only continuous geometry is discussed in this section. The real problem – the manipulation of discrete pixel matrices – is postponed to section (2.3).

2.1.1 Scaling, Rotation, Shearing

The scaling is a very simple operation:

\[ x \to s_x x, \qquad y \to s_y y, \tag{2.1} \]

or, if you wish,

\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.2} \]

where the pair s_x, s_y may contain negative components, but then one has to reinterpret the negative coordinates; usually both x and y are positive, starting at (0, 0), and rarely do we think of the image as of something centered around the origin of the coordinate system (although it might help while considering an image to be a filter, see section (3)). We have then to add the compensating translation. If in the original picture x varies from 0 to A, and s_x is negative, the real final transformation is

\[ x \to s_x (x - A). \tag{2.3} \]

The rotation matrix is well known:

\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.4} \]

but what is sometimes less known is the fact that this rotation can be composed out of three shearing (slanting) transformations parallel to the coordinate axes. The x-shearing transformation, which gives the effect shown on Fig. (2.1) (any Brazilians among readers?. . . ), has the representation (2.5):

\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} 1 & \kappa \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{2.5} \]

Fig.
2.1: The x-shearing transformation Defining κ = tan(θ/2) we prove easily that the rotation identity: cos(θ) − sin(θ) 1 −κ 1 = sin(θ) cos(θ) 0 1 sin(θ) matrix fulfils the following 0 1 1 −κ 0 1 , (2.6) which give together the chain of transformations shown on Fig. (2.2). Fig. 2.2: Rotation from shearing It seems a little useless, since it is more complicated than a simple rotation matrix, but it might be faster, if there is no pixel interpolation involved. Horizontal or vertical slanting displace entire rows or columns, and if we have a fast pixel block transfer routine, some time may be economised. Beware however of aliasing! The slanting deformation can be easily generalized producing a trapezium shown on Fig. (2.3). 2.2 Other Affine Transformations and Perspective 9 The mathematics of this transformation is quite simple, we see that the slanting coefficient in (2.5) is now x-dependent, and that this dependence is linear. So, the whole transformation is not linear any more: y → y, x → x + (α + βx)y. This might be considered as a poor-man perspective (horizontal) transformation: the figure represents Fig. 2.3: Generalized sheara rectangle disposed horizontally, perpendicularly to ing the vertical plane defined by the sheet of paper, or the screen of this document. We look at this stripe from above and from the right, and the shape is intuitively correct. However, simulating the perspective in such a way is a rather bad idea. The problem is that – as easily seen from the figure – the displacement and the compression are strictly horizontal. But the proportions along the other direction are modified as well. We know that the perspective is affine only in the homogeneous coordinates. 2.2 Other Affine Transformations and Perspective The real perspective looks like on the Fig. (2.4). The details of the transformation depend on the relation between the simulated orientation of the original image and the screen. Our example figure is placed vertically and perpendicularly to the screen, but, for example the “Star Wars” text is horizontal. We might obtain an oblique orientation of the original also, like on Fig. (2.5). Fig. 2.4: Once upon a time. . . there were some perspectives Now, how to obtain these transformations? The technique used is the following. The image is enclosed in a rectangular box. We can move arbitrarily the corners of this box, producing an arbitrary quadrilateral. The parameters of the perspective transformation which combines • the simulated position of the original image in the 3D space, and • the parameters of its projection on the screen 2.2 Other Affine Transformations and Perspective 10 Fig. 2.5: Perspective; oblique orientation is retrieved from these corners, and all the rest is straightforward. For the sake of completeness we re-derive the perspective projection formulæ. The following entities are known: • The position of the projection point (the eye): ~xP . Usually in the 3D computations this point is fixed, for example at origin, or ~xP = (0, 0, 1), etc. Here this is not an independent geometric object, we will find it from the resulting projection quadrilateral. • The projection plane (screen): x~0 · ~n = d. The homothetic projection is shown on Fig. (2.6). ~x x~0 ~n ~xP Fig. 2.6: Perspective Projection The vector x~0 is the homothetic map of ~x, so we can write that x~0 − ~xP = κ(~x − ~xP ). (2.7) But x~0 lies on the projection plane. Thus, multiplying the equation (2.7) by ~n we get d − ~xP · ~n κ= , (2.8) (~x − ~xP ) · ~n from which x~0 can be easily computed. 
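For readers who prefer to see numbers, here is a minimal Matlab sketch of the projection (2.7)–(2.8); the eye position, the plane and the scene point are, of course, arbitrary example values and not part of the original notes.

```matlab
% Homothetic (perspective) projection of a scene point x onto the plane x'.n = d,
% seen from the eye xP, following equations (2.7)-(2.8).
xP = [0 0 1];            % the eye (projection point); example value
n  = [0 0 1];            % unit normal of the projection plane
d  = 0;                  % the plane passes through the origin
x  = [0.3 -0.2 -2.5];    % an arbitrary scene point

kappa  = (d - dot(xP, n)) / dot(x - xP, n);   % eq. (2.8)
xprime = xP + kappa * (x - xP);               % eq. (2.7): the projected point
disp(xprime)             % its last coordinate is 0: it lies on the plane
```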
But here we are interested in solving a totally different problem! First, we simplify the equation (2.8) identifying the projection 2.2 Other Affine Transformations and Perspective 11 screen with the plane xy (so ~n is the unit vector along the z axis, and d = 0), and placing the focal point xP at (0, 0, r). We obtain the following transformation 0 r x x = . (2.9) 0 y r−z y Here we don’t know the vector (x, y, z) yet. It belongs to the original image, considered always rectangular, with its canonical Cartesian system, say {~x0 , ~u, ~v }, where ~x0 is a distinguished point, for example one corner, or the center of the image, and ~u, ~v define the axes. Of course, if ~x0 is the left lower corner, the natural choice will be ~u = ~x1 − ~x0 , and ~v = ~x2 − ~x0 , etc. Some other convention might be easier if the origin is chosen at the image center. So, we need to find this coordinate system, i.e. 3 vectors with two of them perpendicular. There are thus 8 unknowns, and we have 8 equations for the 4 distorted corners. We leave this exercice to the reader. 2.2.1 Linear Transformation of Triangles When a fragment to fragment mapping is needed, as in morphing (see the section (7.1)), usually both the source and the target areas are split in simple polygons, for example in triangles. Triangles are always convex and their simplicity ensures that the mapping is unique, and without pathologies. The task consists in mapping the triangle spanned by the three points p~0 , p~1 , p~2 into the triangle defined by ~q0 , ~q1 , ~q2 . The mapping should be linear. The Fig. (2.7) shows the result. q2 p2 x’ q1 x p0 q0 p1 Fig. 2.7: Linear Triangle Mapping The solution goes as follows. We establish within the first triangle a local coordinate system spanned by ~u = p~1 − p~0 , and ~v = p~2 − p~0 . The axes are not normalized. Every internal point ~x of the triangle admits the representation ~x = p~0 + α~u + β~v . Knowing that the problem is planar, and that we can treat the vector product as scalar (pseudo-scalar, but this is unimportant; it has one component only), we get ~x = 1 ((~x ∧ ~v )~u − (~x ∧ ~u)~v ) , ~u ∧ ~v (2.10) 2.2 Other Affine Transformations and Perspective 12 i.e., α = ~x ∧ ~v /~u ∧ ~v , etc. In the second triangle we introduce the corresponding base ~g = ~q1 − ~q0 and ~h = ~q2 − ~q0 , and we restore ~x0 = ~q0 + α~g + β~h. The only detail which remains is the correct implementation in the discrete case, as always. Obviously, if the problem of triangle mapping is solved, any polygons may be treated after their triangularisation. A natural caveat seems appropriate here: if the mapping is linear, and the triangular components of a polygon are treated separately, the resulting global mapping is continuous, but straight lines might break at the triangle boundaries, as shown on Fig. (2.8). Fig. 2.8: Linear Transformation of Polygons is Dangerous Such technique might not be acceptable. Moreover, even in the case of quadrilaterals, if the target figure is convex, there is a choice of two diagonals, which add some ambiguity to the mapping strategy. The lines will break in different places. The section (7.1) discusses some other methods of deformation, essentially non-linear. 2.2.2 Nonlinear Deformations (Overview) The perspective is already nonlinear, but we want to treat here some more general cases, especially useful in texture mapping. If a flat picture is projected on the surface of a 3D object, and then this object is projected on the screen, the two transformations have to be composed. 
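Before turning to these composite transformations, a minimal Matlab sketch of the triangle mapping of section (2.2.1) may be useful. The planar "vector product" is taken as the pseudo-scalar u_x v_y − u_y v_x, and all the coordinates below are invented for the example.

```matlab
% Map a point x of the triangle (p0,p1,p2) into the triangle (q0,q1,q2),
% using the pseudo-scalar product of eq. (2.10).
wedge = @(a,b) a(1)*b(2) - a(2)*b(1);      % planar "vector product"

p0=[0 0]; p1=[1 0]; p2=[0 1];              % source triangle (example)
q0=[2 1]; q1=[4 1.5]; q2=[2.5 3];          % target triangle (example)
x  = [0.25 0.25];                          % a point inside the source triangle

u = p1 - p0;  v = p2 - p0;
alpha =  wedge(x - p0, v) / wedge(u, v);   % local coordinates of x
beta  = -wedge(x - p0, u) / wedge(u, v);   % sign follows from u ∧ v = -(v ∧ u)

g = q1 - q0;  h = q2 - q0;
xprime = q0 + alpha*g + beta*h;            % the image of x in the target triangle
```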
If it is possible to derive the composite transformation before the rendering, this process can be accelerated. If for the rendering the ray tracing is used, this is usually useless. We have to find the intersection of the ray with the point in the 3D space, and from its coordinates we may deduce the colour of the pixel. But if a polygon scan is used, or if the radiosity machine prepared all the 3D surfaces in a more clever way, and we have only to project everything on the screen, some simplifications can be obtained. Some faster rendering engines like 3DS or the dynamic games: Doom, Quake, etc. pre-prepare the mapped textures. Another application of nonlinear deformations is the general warping, which will be discussed in section (7.1). The warping is ususally an “artistic”, manual manipulation of the image, but there is at least one nonlinear flat transformation which may be considered algorithmic, and it is strongly related to the perspective transformation: the normalisation of images (photographs) obtained with very wide 2.2 Other Affine Transformations and Perspective 13 lens camera, for example with the “fish-eye” lenses. We may wish to “flatten”, to restore straight lines of such picture as Fig. (2.9), or inversely, to compose one panoramic strip out of flat image fragments, as on Fig. (2.10). The first picture has been taken from the NASA collection, essentially unprotected, and the other from the BYTE Web page, with a possible copyright infringement. If somebody knows more about their restrictions, please let me know). Fig. 2.9: Fish view of a cosmic enterprise Fig. 2.10: Brooklyn bridge composed out of flat fragments It is possible of course to combine small picture fragments in an affine way, without introducing curves, nor the the “Cinerama” style discontinuities of tangents. The Fig. (2.11) shows it very properly done. All these operations need a very strong analytic apparatus, or an interactive adjustment, by a human. For example, in order to render the straight lines on the 2.2 Other Affine Transformations and Perspective 14 Fig. 2.11: Reconstruction of a wide-angle shot Cap Canaveral photo, either one has to deduce the true geometric proportions of the 3D scene which requires a rather intelligent geometric segmentation, or one tries through dynamic warping to transform all the circle segments into straight lines, which may not be unique. Of course, if the focal properties of the fish-eye camera are known, the restoration of the picture (or its simulated generation from a flat image) is essentially straightforward, although not entirely trivial. We describe now partially, and with some simplifications the fish-eye camera. This is not a – strictly speaking – image processing problem, and it will migrate to the 3D-geometry section in future releases of these notes, but for the moment the reader might treat this subject as an exercice in image deformation. The most classical fish-eye model (not necessarily fully practical) is based on the stereographic projection shown on Fig. (2.12). The idea is thus very simple. Instead of projecting the 3D scene on a flat screen, we project it on spherical surface. In such a way we can cover 180 or more degrees without introducing geometric pathologies. (Of course, we cannot cover 360◦ , but there are already some standardised techniques 2.2 Other Affine Transformations and Perspective 15 of producing and displaying the images covering the full solid angle, for example IPIX, which is even accessible as a Web browser plug-in). spher. proj. 
final proj. focus Fig. 2.12: “Fish-eye”-like deformation We have the following geometric entities: • The projection sphere with radius R, considered usually to be very small as compared with the true scene dimensions, but it is not small, when a simulated projection is used to deform an already existing 2D image. • The main focus which is not necessarily the center of the sphere. • The stereographic projection point, and the projection plane: we need to map the sphere sector to a flat disk. We may choose for example the pole opposite to the main vision direction (the center of the image disk), and for the plane – the “equator”. Other variants are also possible, but more difficult to treat mathematically. • Finally, we have to define somewhere the limiting angle. This is essential for the fish-eye rendering, but also for the flattening: we have to know how far shall we go, the size of the 180◦ image is essentially infinite. . . If we want just to simulate the fish-eye camera and to construct a distorted image from something perfectly standard, we may use the center of the sphere as the focal point. This is neither the standard stereographic projection, nor a general fish-eye, but the IPIX normalized camera. The original image may be placed on a plane tangent to the sphere (this is just the choice of the scaling factor). We have the following relation between the “real” radius r on a flat image tangent to the projection sphere, and the distorted radius z on the equator plane: z/R r =p R 1 − z 2 /R2 (2.11) 2.2 Other Affine Transformations and Perspective which can be easily inverted to generate the fish-eye distortion z r/R =p . R 1 + r2 /R2 16 (2.12) The task of transforming the Cartesian coordinates (x, y) to r and some angular variable is left to the reader. But we show the result of this transformation on Fig. (2.13). Of course you know the original. (The result could be of better quality if we have followed the rules suggested in the next section. We have not interpolated the pixel colours.) Fig. 2.13: Babel as seen by a fish There is one important point in this exercice which shows how the conceptual and geometric difference between the 3D scene creation and th 2D picture manipulation reflects on some mathematic details. Here we don’t know anything about the sphere radius R, but we know the size of the original, and the dimensions of the new image we are manufacturing. If the vertical half-height of the tangent image (centered) is equal to A, and that of the equatorial projection – B, we have 1 1 1 = 2− 2 (2.13) 2 R A B We suggest very strongly that the local readers repeat these calculations for any position of the focal point, not necessarily in the center of the sphere. This is a very good examination subject. 2.3 How to do Geometry in Discrete Space 2.3 17 How to do Geometry in Discrete Space This is a very important section. If a discrete set of points (or intervals, but localized as pixels) is stretched or compressed, one has to define the interpolation procedure which is being used. There is no unique algorithm to do this. Moreover the pixels are usually square (or rectangular) oriented canonically wrt. the x and y axes. If we rotate the image the pixels change their positions, but their shape does not rotate. They must occupy canonical positions also, their coordinates must be integer again. So, all kind of very nasty effects is expected: holes, pixel fusion (after rounding), aliasing etc. 
The loops for (y=0;y<ymax;y++) for (x=0;x<xmax;x++) {newpt=transf(x,y); NA[newpt.x][newpt.y]=A[x][y];} in general cannot be applied directly. The basic idea which eliminates the topologic pathologies (holes), although does not prevent the aliasing and other troubles resulting from the discretisation, is the application of the inverse transform. The program calculates first the image of the transformed contour: if the original image is a rectangle, and the transformation is simple, for example affine, it is only necessary to find the new coordinates of the 4 corners, which may be then connected by straight lines. In general case we have to find the boundaries of the result space, and then we will fill this space regularly with pixels. The program goes now as follows: for all (x0 , y 0 ) in the transformed zone the program calculates (x, y) – the inverse mapping. The result will usually be a pair of reals, and not integers. In the most primitive case it suffices to round the coordinates and to assign the pixel value corresponding to the coordinates found. But caveat programmator! Rounding or truncating is a harsh operation, and if the transformation is repeated several times, the distortions will accumulate. Fig. (2.14) shows the result of the rotation of a picture by 360 degrees in 10 stages, without any interpolation. Fig. 2.14: Discrete rotation iterated Of course this result is not very acceptable, and this manipulation was done on purpose. A serious (and in particular: multi-stage) transformation must interpolate 2.3 How to do Geometry in Discrete Space 18 the pixel values. The result might be then a little smeared or fuzzy, but in general is quite good. Fig. (2.15) shows the result of the same manipulation, a full rotation in 10 slices of 36 degrees, but with interpolation. Fig. 2.15: Discrete rotation with interpolation The interpolation might be performed in many ways. The simplest may even be linear. The source pixels are treated as points occupying the vertices of a rectangular grid. If the reverse transformation constructs a point between the vertices, for example if we obtain a pair (x, y) reduced to the unit square whose corners correspond to original pixels denoted by I00 , I10 , I01 , and I11 , we obtain the resulting value by the bilinear interpolation Ixy = (1 − x)(1 − y)I00 + x(1 − y)I10 + (1 − x)yI01 + xyI11 . (2.14) In practice a bicubic interpolation is much better, and it is not so complicated. (This is a good examination subject. . . ) There is a small problem near the image edges: what shall we do if the reverse map gives coordinates outside the pixel grid? Or even so near the edges or the corners, that a bicubic interpolation is impossible. A slightly different interpolation scheme might then be used, simpler than cubic. First we “collapse” all the pixels to their corners (point-like vertices) using the nearest-neighbour approximation. The vertices far from the image edges are just averages of the 4 neighbouring pixels, and the boundaries are calculated as follows from the Fig.(2.16). We may use the following equations which are quite intuitive: 1 (A + B + C + D) 4 f + i = A + B etc. (e + i)/2 = A. i = (2.15) (2.16) (2.17) The other vertices are computed by symmetry and iteration. Note that the exterior vertices extrapolate rather than interpolate the pixel colours. e = 1 (7A − B − C − D) 4 (2.18) 2.3 How to do Geometry in Discrete Space g f e A h C k 19 B i D j l Fig. 
2.16: Nearest-neighbour pixel interpolation 1 (3A + 3B − C − D) 4 1 h = (3A + 3C − B − D) 4 1 i = (A + B + C + D) 4 f = (2.19) (2.20) (2.21) (2.22) It may be troublesome, and throw us outside the allowed colour space (negative or too big intensity). Now this matrix is converted using the inverse transform technique, and the pixels are reconstructed from the vertices using equations similar to those above. The reader is kindly asked to solve explicitly these equations in the reverse direction. What happens if the transformed image contains L-like concave boundaries? Chapter 3 Filtering 3.1 Introduction: Filters and Convolutions Mathematically the convolution of two functions: f (x) and g(x) is given by the integral Z (f ? g)(x) = f (z)g(x − z) dz (3.1) which has to be generalized into two dimensions and discretized. We get thus the convolution formula for two matrices A and B: XX (A ? B)ij = Akl Bi−k,j−l , (3.2) k l where the indices run through all the intervals where they are defined. Usually k, etc. is greater (or equal, depending on the convention used) than zero, and goes up to the upper limit of the matrix dimension. When it becomes negative, the element is neglected. This seems trivial, but is not necessarily so: sometimes it is preferable to apply the cyclic convention, where the negative indices “wrap around” the matrix – the image is not considered to be a finite patch of a plane, but a torus. In this way the problems with boundary conditions may be omitted. Usually one of the concerned matrices is the image (or three images – one for each colour plane), and the other is the filter which is usually much smaller than the image. The sum (3.2) should thus be reformulated in a way which minimizes the number of operations. A useful convention used in this domain is the cyclicity of the filter matrix. We don’t need to apply the toroidal boundary condition to the image, but very often the filter intuitively should be “symmetric about zero”, in order to to generate spurious anisotropies. Convolutions, as shown by the equations above are linear operations, and they are not sufficient to obtain all kind of posible special effects, but they are universal enough. (We will discuss here essentially one kind of non-linear filter – the median, and its variants, but its presentation will be rather superficial). In the section (4) 20 3.2 Typical Filters and their Applications 21 we will discuss some details of the colour representation of images, but the theory of colours will be treated in a separate set of notes. For us a pixel is a numeric value which can be normalized to the interval [0, 1), or [0, 255) if you wish. If the image is coloured, we can treat the three component separately. here we will not discuss the colour mixing or conversions at all. Henceforth we shall either consider gray-level images, or treat the three channels identically. The interpretation of the filtering process can be given on many different conceptual and mathematical levels. This course cannot treat all the mathematical details of the signal processing theory, so for the moment the reader may think that the filtering produces the value of a given result pixel by a linear combination of its neighbours. This may smooth the image if we calculate the averages (i.e. if all the weight factors are positive), or it may enhance local differences if some weight factors are negative. A general warning is necessary. 
A convolution of the image with any matrix, and in particular with a filter possessing negative elements, may throw the image out of the legal colour space. The filtering application should warn the user, and permit to correct the result by cropping the illegal values (they will remain either minimal or maximal), or renormalise the whole image by shifting and/or multiplication of the pixel values by an appropriate constant. There is no unique solution in such a case. Image processing packages usually offer to the user a possibility of adding an offset immediately during the filtering. We shall discuss this problem under a different angle in section (5).

3.2 Typical Filters and their Applications

3.2.1 Smoothing filters

Look at the horizontal fragment of the da Vinci "Last Supper" fresco on Fig. (3.1). This picture is extremely noisy. Convoluting it with the matrix

\[ \frac{1}{25} \begin{pmatrix} 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \end{pmatrix}, \tag{3.3} \]

where the factor 1/25 is the obvious normalisation factor, produces the transformation shown on Fig. (3.2). The noise has been considerably attenuated. Of course the image is blurred now, but if we don't need the full resolution, it is better to smooth it before resampling, as the plain resampling usually does not get rid of the noise, as shown on the left of Fig. (3.3).

Fig. 3.1: Last Supper – fragment
Fig. 3.2: Image blurred with a 5 × 5 homogeneous filter
Fig. 3.3: Reduced images

In order to show the relation between the de-noising through smoothing and the deterioration of contours we have exaggerated the size of the filter mask: 3 × 3 would be sufficient. The uniform, square mask is simple and fast, but more powerful methods can be applied. In particular, an often used smoothing filter is the gaussian function

\[ g(x) = \frac{1}{N}\, e^{-x^2/2\sigma^2}. \tag{3.4} \]

In two dimensions x is replaced by r = \sqrt{x^2 + y^2}. The Gaussian filter is sometimes used for special effects, and its range is far from being local – sometimes several dozens of pixels along both directions are concerned, and usually it is parametrable. It may be computed directly by a floating-point routine, or as an iterated convolution of the matrix \(\begin{pmatrix}1&1\\1&1\end{pmatrix}\) with itself. For example, the second and the fourth iterations give

\[ g_2 = \frac{1}{16}\begin{pmatrix} 1&2&1 \\ 2&4&2 \\ 1&2&1 \end{pmatrix}, \qquad g_4 = \frac{1}{256}\begin{pmatrix} 1&4&6&4&1 \\ 4&16&24&16&4 \\ 6&24&36&24&6 \\ 4&16&24&16&4 \\ 1&4&6&4&1 \end{pmatrix}. \tag{3.5} \]

Here the variance is proportional to the size of the matrix, but this can be changed; we wanted just to show that a Gaussian-like filter can be obtained without floating-point calculations, but by iterations. Of course, instead of generating a big filter matrix, a small one is iteratively applied. As mentioned above, in several interesting cases the size of the Gaussian matrix is big. The complexity of the convolution algorithm is proportional to the surface of the filter, and filters too large are inefficient. In the case of Gaussians we have another possibility of simplification. The two-dimensional Gaussian exponential factorizes: exp(−r²/2σ²) = exp(−x²/2σ²) exp(−y²/2σ²). We may thus apply the two factors separately, first convoluting the columns in each row separately, and then repeating the same operation on all rows. The complexity is reduced from N², where N is the filter size, to linear. The one-line or one-column filtering 1×m matrix again does not need floating point computations, but may be obtained by convoluting [1, 1] with itself, and normalizing.
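The two recipes above are easy to express in Matlab. A minimal sketch, assuming a gray-level image stored as a matrix of doubles in [0, 1] (the file name is only an example):

```matlab
a = double(imread('lena_gray.bmp')) / 255;   % gray-level image, values in [0,1]

% (1) homogeneous 5x5 smoothing, as in (3.3)
box5 = ones(5,5) / 25;
smooth1 = conv2(a, box5, 'same');

% (2) Gaussian-like binomial filter, built by iterating [1 1], applied separably
g = 1;
for k = 1:4                  % 4 iterations give [1 4 6 4 1]
    g = conv(g, [1 1]);
end
g = g / sum(g);              % normalisation, here 1/16
smooth2 = conv2(conv2(a, g, 'same'), g', 'same');   % rows first, then columns
```

The separable variant performs two one-dimensional passes instead of one two-dimensional convolution, which is exactly the complexity reduction described above.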
By using asymmetric Gaussian filters, with different horizontal and vertical variance it is possible to obtain many interesting effects, which will be discussed later. 3.2.2 De-noising by median filtering Fig. (3.4) shows two more smoothing experiences. The image on the left was obtained by the Gaussian blurring followed by a sharpenig filter which will be discussed in the next section. The image on the right is the application of the median process. Instead of averaging the pixel values around the center, the filtering chooses one representative value. Concretely: first a mask – 3 × 3, 5 × 5 or other is defined. Within this mask all 9, or 25 pixels are identified and sorted with respect to their values. The middle value replaces the central pixel of the block. An even size of the block may also be used, 3.2 Typical Filters and their Applications 24 Fig. 3.4: Gaussian filtering (sharpened), and median denoising although it is less popular. But the existence of one central pixel is not needed. In the even case the resulting picture will be shifted “colourfully” by 0.5 pixels. One should not exaggerate with the size of the median zone, as it introduces homogeneous colour patches. But it can be used then as an artistic tool. There are some recent variants of this scheme: the size of the reprentative block is not fixed, but of varying size: 2, 3, 4, 5, etc., not necessarily centered. For each block the (normalised) statistical variance of the picture is calculated, and the block with minimal variance is chosen for the median computations. The author of this technique claims that a good denoising effect is obtained, and the edges become less fuzzy than in the standard model. Of course, the variance-sensitive techniques are known for years, for example the linear (but adaptive) Wiener filter. 3.2.3 Gradients Taking the averages: homogeneous or Gaussian, is equivalent to the integration of the picture with a positive weight function. This operation eliminates the fluctuations, the high frequencies present in the image. The measure of these fluctuations which can help us to localize the edges is the operation inverse – the differentiation. In a discrete structure the gradient can be of course approximated by the difference of two neighbouring pixels. The gradient is a vector, and we can choose to take its absolute value, or a directional quantity. In any case, if the gradient filter is applied to a homogeneous colour zone, the result is close to zero. Fig. (3.5) shows three different variants of gradient-like filtering. The left picture is the result of the filter [−1, 1], where 1 is at the central position, and all the remaining values of the filetr matrix are zero. This is thus a horizontal gradient which strenghtens the vertical edges. The result is so weakly visible that we had to augment the luminosity and the contrast of the image. (Note that this filter produces plenty of negative values which are cropped to zero.) The side-effect of this image enhancing was the introduction of some noise. 0 0 −1 The central variant is a diagonal filter 0 1 0 , but with an additive constant 0 0 0 (equal to 128 – the mid-value of the intensity full interval). This is the classical embossing filter – the simulation of a bas-relief. If this filter is applied to a full 3.2 Typical Filters and their Applications 25 Fig. 3.5: Lena make-up colour image, the result is not very different (why?). Theimage at theright is 0 −1 −1 another directional, gradient-like filter, but more complex: 1 1 −1 . 
Note that the full matrix is

\[ \begin{pmatrix} 0 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & 1 & 0 \end{pmatrix}, \]

and that its overall weight here is 1 and not zero. No constant is needed to produce the embossing effect. Previously the filtering modulated a flat, gray surface; here the image itself defines the artificially extruded plane. The reader should understand this effect. (Actually, extruding the original picture seems artistically worse than embossing the flat surface.)

3.2.4 Laplace and Sobel operators

The discrete version of the second derivative has the following form:

\[ f''(x) \to \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}. \tag{3.6} \]

The basic discrete version of the Laplace filter which computes ∂²f/∂x² + ∂²f/∂y² is just

\[ L_1 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \]

This filter is used to detect the edges, and also for the sharpening. It is not unique; we may define other Laplacian-like filter matrices, for example

\[ L_2 = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \quad L_3 = \begin{pmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{pmatrix}, \quad L_4 = \begin{pmatrix} -1 & 0 & -1 \\ 0 & 4 & 0 \\ -1 & 0 & -1 \end{pmatrix}, \tag{3.7} \]

or asymmetric variants, vertical or horizontal only. The left variant on Fig. (3.6) has been obtained with L_2, which gives contours slightly better than other versions (at least in the case of Lena).

Fig. 3.6: Laplacian and Sobel edge filtering

The right contour is obtained with the combined action of two Sobel filters:

\[ S_h = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}, \qquad S_v = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}. \tag{3.8} \]

Both act independently on the image, producing the horizontal and vertical partial contours I_1 and I_2. Now we combine these images with I = \sqrt{I_1^2 + I_2^2}. The right picture on Fig. (3.6) was produced by a simplified operation: I = I_1 + I_2, which is very slightly less regular, but much less expensive. More about the algebraic combination of images is presented in section (5). Of course, if we want to segment the picture and to find the geometric description of the contours, all this work is still ahead; the filtering prepares only the bitmap. Anyway we have barely scratched the surface of this domain; there are more powerful methods, for example the application of adaptive gradient filters followed by the search of directional maxima.

3.2.5 Sharpening, and Edge Enhancing

Laplacian, Sobel or Prewitt filters, XY-oriented or diagonal, may serve to identify the edges. How can we just enhance them in order to sharpen the picture? If we add 1 to the central element of the Laplacian matrices, L_1 or L_2, etc., we obtain sharpening filters. Such a filter, for example

\[ \begin{pmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \tag{3.9} \]

may also be understood a little differently. Imagine that we have smoothed the image I_0 with the filter

\[ \frac{1}{9} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \]

obtaining I_1. A weighted average interpolates between the original and the blurred one: I = αI_0 + (1 − α)I_1. When α decreases from 1 to 0 we move from the original to the blurred version. But what happens if α > 1? This is an extrapolation which increases the distance between the smoothed image and the result. It gets sharpened; at least it should be sharper than the original. In order to get the matrix (3.9), the value α = 10 is needed.

3.3 Fourier Transform and Spatial Frequencies

Of course we won't treat here the rich and enormous domain of spatial frequency analysis. Our aim is to process the images in order to render them better, and not to analyse them. The review of the discrete Fourier transform will thus be very superficial. We shall discuss here essentially two applications:

• Rescaling, and
• Fast convolution (and correlation).
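As a foretaste of the second application, the convolution of an image with a mask can be obtained through the FFT. A minimal Matlab sketch (the file name is only an example, and the mask is the homogeneous 5 × 5 filter of (3.3)); up to rounding errors it gives the same result as the direct conv2:

```matlab
a = double(imread('lena_gray.bmp')) / 255;   % gray-level image
h = ones(5,5) / 25;                          % the smoothing mask (3.3)

[n,m] = size(a);  [p,q] = size(h);
A = fft2(a, n+p-1, m+q-1);                   % zero-padded transforms
H = fft2(h, n+p-1, m+q-1);
c = real(ifft2(A .* H));                     % full linear convolution
% c agrees (up to rounding) with conv2(a, h)
```

For a small mask the direct convolution is of course faster; the FFT route pays off only when the filter itself is large.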
The basic truth to remember is that the Fourier transform converts distances into frequencies which are, in a sense, dual to distances. When we look at the formula Z +∞ (3.10) f (x) → g(ω) = F[f ](ω) = f (x)eiωx dx −∞ we see immediately that if g(ω) = F[f (x)], then F[f (ax)] = 1/ag(ω/a). Shrinking (horizontally) the transform, dilates the function. The discrete case is slightly more difficult to visualise. Here “shrinking” or “dilating” is less meaningful, we have just a number of points. Moreover, if we look at the discrete FT formula: gk = N −1 X fj e2πijk/N (3.11) j=0 we see that the periodicity (exp(2πi) = 1) implies that for large j, approaching N we don’t have “very high frquencies”, but again low, and negative. The highest frequencies correspond to the index j = N/2. In the discrete case we don’t have “distances”. The linear length measure is conventional. So, if we want to scale a function, to dilate it by a factor p, we need to pass from N to p×N points. (Obviously, enlarging a picture means augmenting the number of pixels). If our aim is to enlarge the picture without introducing new, spurious frequencies corresponding to the sharp edges between the “macro-pixels”, we add just some zeros to the Fourier transform, we correct the scale (a multiplicative factor), and we invert it. Where shall we add those zeros? Of course in the middle, for example the vector g0 , g1 , g2 , g3 , g4 , g5 changes into g0 , g1 , g2 , g3 0, 0, 0, 0, 0, g3 , g4 , g5 . We have take into account the following properties of the FT: • g0 corresponds to the frequency zero (the global average), is real, and has usually the biggest numerical value among all the Fourier coefficients. 3.3 Fourier Transform and Spatial Frequencies 28 • If g corresponds to a FT of a real function, (and the images are usually real), we have the following symmetry: gN −k = gk . This symmetry must be preserved when we add some zeros. In our example the length of the vector was 6, with a solitary g3 in the middle. It had to be duplicated. The Fig. (3.7) shows the spectral stretching of a vector. The original had 32 elements (dashed line), the dilated – 128 (solid blue). Note that this dilating which did not introduce new frequencies, smoothed the function (and introduced some small oscillations, which unfortunately may be visible on images. You may call them (cum grano salis) diffraction patterns. 4 2 0 −2 0 20 40 60 80 100 120 Fig. 3.7: Spectral stretching of a curve The Fig. (3.8) shows the splitting of the transformed image before the inversion of the transform. Fig. 3.8: The dilation splits the Fourier transform The Fig. (3.9) shows the result of such a dilation on the eye of Lena. We discarded the colours in order to avoid the delicate discussion on the chrominance transformations. The colour diffraction patterns are even more disturbing than the gray ones. . . The left picture presents once more the classical cubic interpolation, and the right one – the diffractive dilation of the original. Of course, using the median filter or others it is possible to smooth out the square diffraction patterns, but usually with the aid of well chosen wavelets it is possible to improve the result substantially. The Fourier transform is a venerable, but not the only spectral tool on the market! 3.3 Fourier Transform and Spatial Frequencies 29 Fig. 3.9: Cubic interpolation vs. 
spectral stretching 3.3.1 Discrete Cosine Transform The complex Fourier transform has some advantages over slightly more complicated cosine or sine transforms, but these, and especially the Discrete Cosine Transform (DCT) is used very frequently. It might be used for the spectral filtering, but its main application is the image compression, for example in JPEG documents. We give here for the reference the formulæ used. If Fkl is the matrix representing the image (or its fragment: JPEG uses 8 × 8 blocks), its transform is defined by Gpq = Cp Cq M −1 N −1 X X Fmn cos m=0 n=0 0≤p<M . (3.12) 0≤q<N π(2n + 1)q π(2m + 1)p cos , 2M 2N where Cp = √ 1/ M , p 2/M , p=0 , 1≤p<M Cq = √ 1/ p N, 2/N , q=0 . 1≤q<N (3.13) The formula (3.12) is invertible, and its inverse is given by: Fmn = M −1 N −1 X X p=0 q=0 Cp Cq Gpq cos π(2n + 1)q π(2m + 1)p cos , 2M 2N 0≤m<M . (3.14) 0≤n<N The usage of this formula for the image compression is discussed in another set of notes. We want only to signal here that usually the DCT is dominated by low frequencies, and the other can be eliminated without introducing visible distortions. The DCT of the “eye” (gray) picture gives G11 = 1589, (this is the frequency 0), 12 values near the origin are bigger than 100, and the remaining (of 240) are much, much smaller! Chapter 4 Colour Space Manipulations 4.1 Colour Spaces and Channels By the name “Colour Space” one usually denotes various linear or non-linear combinations of the three “basic” colours in conformity with the generally accepted trichromic theory. An introduction to the thory of colours is presented elsewhere. If the manipulations of colours are linear and “synchronous” (the same operation applied separately to each colour plane of the image), we can use the space we want, and specifically the representation RGB, because often the images are stored in the computer memory as 3 matrices, one for each plane R, G or B. These planes will be called channels. A colour image needs at least 3 channels in order to represent (more or less) faithfully all hues, but we might need more than that. In particular: • We shall use the subtractive CMYK (Cyan – Magenta – Yellow – blacK) space, as this particular set of channels is well adapted to the printing process: the more ink you put on paper, the darker is the result. When preparing a picture to be printed and adjusting some colours, this space is most often used. But also for some interactive colour balance adjustments on the screen the CMYK space is useful, as the factorization of the global luminance (or rather: darkness) makes it easier to tune the picture. • When some irreversible manipulations take place, for example when the histogram equalization eliminates some colours in favour of other, more frequent, such operation cannot be done separately for each plane, otherwise the colours might get severely distorted. Usually one separates then the luminance and the chrominance, transforming everything for example into the CIEL*a*b, or even XYZ space. The luminance channel undergoes the histogram massacre, but the chroma channels are left intact, and then the RGB representation is reconstructed. • If for some reasons the colours must be quantized, and an optimized palette chosen among all the 224 TrueColours, a judicious choice of channels may be 30 4.2 Transfer Curves, and Histogram Manipulations 31 very helpful. • When combining the image fragments, superposing or masking them the transparency, or α channel is very important. 
This is not a “visible” colour, but affects the display of other channels. A multi-channel image may have several artificial channels, which during the final rendering will be flattened out, and disappear, but without them the image processing would be a horribly cumbersome task, very clumsy and difficult to learn. Some examples might be useful here. Looking at the original Lena portrait we should remark something strange. Not only the plume of her hat is violet, but her eyes as well. . . In fact, the colour shifting is a popular technique in advertising and in press, to render the food pictures or the skin of beautiful girls more appetising. The Fig. (4.1) shows conspicuously that the picture of Lena is stained. For – probably – some deep philosophical reasons the Playboy editors found that the overall tone of the image should be rose. Using the CMYK separation, and eliminating some Magenta we “normalize” the result. Fig. 4.1: “Playboy” models’ life is not always rose. . . 4.2 Transfer Curves, and Histogram Manipulations The left picture on Fig. (4.2) is not very interesting. One can hardly distinguish the details (in fact the standard brightness and contrast of the author’s screen are such that the picture is almost completely black). It was produced by Povray from the file piece3.pov written by Truman Brown without the application of the standard gamma correction. Of course, we could enhance manually the brightness and the contrast, or modify the transfer curve, but sometimes a more automatic approach would be useful. We shall come to that. 4.2 Transfer Curves, and Histogram Manipulations 4.2.1 32 Transfer Curves What are transfer curves? They are just transformations in the colour space. A diagonal line Iout = Iin is the identity. In order to enhance the contrast the curve should make clear areas clearer, and dark – darker. The overall brightness is enhanced by the vertical lifting of the curve. The two remaining fragments on Fig. (4.2) show a manually tuned curve and the result of the operation. Fig. 4.2: Adjusting transfer curves More or less the same result may be obtained automatically through the analysis of the histogram of the image. The full TrueColour histogram is a vector with 2563 entries, and is usually not done for obvious reasons, the number of “bins” is enormous, and most of them will be empty anyway. One usually constructs three histograms, one for each plane, or even just one for the luminance. The result in the case of the “piece3” example is shown on Fig. (4.3). 25 20 15 10 5 0 0 20 40 60 80 Fig. 4.3: Histogram of the Piece3 image We see that the histogram is null above the index 80. More than two-thirds of the colour space is empty for each channel. In such a case it would be sufficient to multiply every colour by 3 (more or less), and that would render the image brighter. But not enough! The histogram is still biased towards small luminance. Moreover, in cases where dark and bright areas coexist, but they are far from equilibrium, as shown at the left of Fig. (4.6), no histogram stretching is possible. 4.2 Transfer Curves, and Histogram Manipulations 4.2.2 33 Practical gamma correction A very important and typical transformation in the colour (mainly: intensity) space is the power-law: (4.1) Io = Iiγ . In fact such correction is usually introduced already during the acquisition of a real scene by a camera. But if the image is synthesised, created without camera. . . 
The well-known RT package Povray until recently produced “raw” images, but the version 3 permits to add a power-like correction directly during the rendering. A relatively easy way to brighten the piece3 image has been shown. We know already that the bright segment of the transfer curve (intensity near one) is irrelevant, as the histogram is concentrated around zero. This dark part can be easily lifted by the formule (4.1) with γ about 0.33. But the contrast will be too low, the image will be grayish, as shown on Fig. (4.4), where γ was chosen equal to 4. (In fact this is the inverse of what is normally called γ in the image transfer jargon.) Fig. 4.4: “Piece3” with gamma-correction Before continuing the subject we have to point out that the manipulation of the transfer curves might be used to obtain completly delirious artistic effects. We give just one complex example, which begins here, but which will be modified later. We begin with a gray, Gaussian random noise image. Some percentage of background, white pixels turned into gray. Then the pixel diffusion was iterated a few times. This operation displaces a pixel, exchanging it with one of its neighbours, randomly chosen. Despite the naı̈ve hope that this shall produce a uniform chaos, what we really see, is the agglomeration of colours: when two identical pixels approach each other, the diffusion tends to keep them near. Chaotic diffusion increases fluctuations. Such are the laws of physics, but we cannot comment that here. Then, resulting image is blurred with a Gaussian filter, and an arbitrary, manual manipulation of the transfer curves for each component separately finishes the task. 4.2 Transfer Curves, and Histogram Manipulations 34 Fig. 4.5: Random colour distribution 4.2.3 Histogram Equalization However, we can perform then an irreversible (in general) operation called histogram equalization: The pixels will change colours in such a way that the colours almost non-existent on the image (with very low histogram value) might disappear, but the colours strongly populated will be alloted more “Lebensraum”. The Fig. (4.6) shows one possible result of the equalisation operation. Fig. 4.6: Euro-Disneyland, the truth, and the treachery. . . This manipulation is not unique, at least three popular variants exist, and they might be accompanied by some contrast and brightness, or gamma correction. The idea goes as follows: first the histogram is constructed, and the avarage colour (or brightness) computed. Then, for each colour within the histogram add its population to an accumulator. If the result is bigger than the average, subtract this average and allot one new colour bin to this entry. If after the subtraction the result is still big, because this colour was very popular, subtract the average again, and add a new bin, and repeat this operation until exhaustion. The accumulator may have some residual value different from zero. To this value the next histogram column is added, and the iteration repeated. If some colour is so rare, that adding its histogram entry to the accumulator does not make it pass the threshold, this histogram entry 4.2 Transfer Curves, and Histogram Manipulations 35 is eliminated. At the end of each iterations we have a range of new colours which can be attributed to one on the original image. A transformation “one-to-many” cannot be reversible in general. There are, as mentioned above, three popular strategies of choice: 1. Choose the center of the alloted range of colours. 
In the new histogram the popular colours will be more widely spaced. This is the simplest possibility.

2. Choose randomly, independently for each pixel, a new colour among all eligible ones (within the calculated range). This introduces a noise into the image, but it might be less disturbing than a severely quantized colour space.

3. Choose – if possible – the colour of the pixel’s neighbours (e.g. the median, or the average). This avoids the noise without impoverishing the colour space, but it blurs the image. This variant is used rarely, as it is quite time-consuming.

We recall once more that the equalization might be done for each channel separately, or globally for the luminance without touching the chroma channels, the image being then reconstructed. The first variant usually decreases the colour saturation. (Apparently this is the method chosen by PaintShop. Photoshop produces much more colourful equalized images.) We present now a complete Matlab program which does the histogram equalization (one channel), and we show some results. On Fig. (4.6) the equalization was done with Photoshop. PaintShop produces something similar, but almost completely gray, the colours have been too well equilibrated. . .

  a=double(imread('picture.bmp')); % a: matrix NxM, converted to double
  [n,m]=size(a);                   % dimensions
  nw=zeros(n,m);                   % new matrix
  his=zeros(1,256);                % the histogram vector, initialized
  for iy=1:n, for ix=1:m,
    fc=a(iy,ix)+1; his(fc)=his(fc)+1;
  end; end;
  avg=sum(his)/256;                % the average
  lft=zeros(1,256); rgt=zeros(1,256);
  nh=zeros(1,256);                 % new histogram
  rule=input('Which rule? (1 or 2) ');
  rr=1; accum=0;
  for z=1:256,
    lft(z)=rr; accum=accum+his(z);
    while accum>=avg,
      accum=accum-avg; rr=rr+1;
    end;
    rgt(z)=rr;                     % number of allotted columns is ready
    if rule==1,
      nh(z)=floor((rr+lft(z))/2)-1;     % the new value
    else nh(z)=rgt(z)-lft(z);           % the interval
    end;
  end;
  % New image reconstruction
  for iy=1:n, for ix=1:m,
    z=a(iy,ix)+1;                  % sorry, no zero index in Matlab
    if lft(z)==rgt(z), nw(iy,ix)=lft(z)-1;
    else
      if rule==1, nw(iy,ix)=nh(z);
      else nw(iy,ix)=lft(z)-1+floor(rand*nh(z)); % the new value here
      end;
    end;
  end; end;

The second rule may be modified: instead of using a new random value for the histogram bin where each pixel is tossed, the random generator adapts itself, giving more chances to the poorer. We don’t give the solution for a possible third rule which assigns the new colour depending on the pixel environment. This is slow, and delicate: when a new colour is assigned in an xy loop, we don’t yet know the colours of all the neighbours of the modified pixel. So a mixed strategy seems easier to adopt: rule 2 is used, and then the image is despeckled by averaging or median smoothing.

4.3 Transparence Channel

The administration of the transparent areas on the image will be discussed in the next section. We note only here that there are several possibilities to deal with “invisible” zones of a picture.

• For indexed images one specific palette index is treated as the “transparent” colour. This is often used with GIFs.

• A full α channel is an integral part of the image. In such a way it is possible to specify the degree of opacity.

• If more than one transparency channel is used, for example if a specific opacity channel is attached to every other “visible” colour plane, the transparence becomes colourful, which can be used for many special effects, artistic or technical, such as selective visualisation of very complex volumetric data.
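As a small illustration of the first two possibilities, here is a hypothetical Matlab sketch which promotes a GIF-style colour key into a full (here binary) α channel; the file name and the key index are invented for the example:

  [ind, map] = imread('sprite.gif');  % indexed image and its palette
  key = 1;                            % palette index declared "transparent" (an assumption)
  alpha = double(ind ~= key);         % 1 = opaque, 0 = transparent
  rgb = ind2rgb(ind, map);            % expand the indexed image into RGB
  % the pair (rgb, alpha) now forms a four-channel image with a binary opacity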
The full power of the transparence channels shows itself when images are composed and superposed. In the simplest case the displaying engine needs only to look up the transparency value of a pixel and to display it or not. More precisely: it should display either the pixel or the background, explicit or default. If the full α channel is present, it is necessary to perform an interpolation; the displayed colour is equal to C = αI + (1 − α)B, where I is the image, and B – the background. We see that α here is the opacity rather than the transparence.

Some delicate questions may be posed concerning the influence of filtering on the transparence channels. The subtraction of two transparence values is rather difficult to understand, and unless you know what you are doing, it is better to stay far away from that. On the other hand, a blending between a “normal” colour and the transparence channel is extremely important – this is a standard way to introduce soft shadows into images.

Chapter 5 Image Algebra

5.1 Addition, Subtraction and Multiplication

Apparently there is nothing really fascinating here. If we manipulate images as numeric matrices, we can add them, multiply them by constants or element-wise, bias the pixels by some additive constants, etc. There are just a few intuitive rules to master. The reader already knows almost everything if he has learned the filtering operations well.

1. Multiplication by a constant c < 1 darkens the image, and c > 1 lightens it. Such multiplication is the simplest case of histogram stretching.

2. Averaging two images interpolates between them. Usually the addition is followed by a division of the resulting values by 2, otherwise the result (which always enhances the intensity of the concerned channel) may be illegal. By varying in a loop the parameter α ∈ [0, 1] used to combine two images additively: I = (1 − α)I1 + αI2, we obtain a blending (fading-off) between the two source images, often used in animation, or in real movies. If we don’t need just one interpolation, but a whole sequence (for example in morphing, discussed in section (7.2)), or some other kind of animation, the linear (in time) blending might be too sharp, and often a different interpolator, a sigmoidal function: I = (1 − s(α))I1 + s(α)I2, where s(α) = 3α² − 2α³, is used. (Here α is the interpolation time, between 0 and 1.) This function (which is one of the Hermite basic splines) maps the unit interval into itself, and the “speed” of the mapping varies smoothly at the beginning and the end of the process. Don’t forget that geometric manipulation of images needs a more elaborate interpolation between pixels. Several image processing packages, such as Photoshop, offer a smoother interpolator: the bicubic, two-dimensional Catmull-Rom spline. This is not discussed here.

3. The subtraction in general may produce negative numbers, and the arithmetic package should crop them to zero. Such effects as embossing, etc. need the addition of an additive constant to the result of a subtraction. Of course this addition is performed before the eventual cropping.

4. The possibility of multiplying images means that the colour space is normalised to [0, 1]³, all three planes containing the percentage of the “full” colour. Some packages, e.g. PaintShop, apparently do it wrongly, and the multiplication of two dark images might produce something bright.
Of course, if 0 is black and 1 is white, the multiplication can only darken the image, and this is the way of adding shadows ex post to an image. How to lighten an image fragment? Simple: take the negative of the image and the negative of the “anti-shadow”. Multiply them, which will darken the result, and invert it back again. Of course the division of images is an ill-defined operation, and should be avoided, unless you have some private crazy ideas about its meaning. (The reader who follows a more detailed course on image analysis knows nevertheless that a division is used to sharpen images. If the image is smoothed by its convolution with a Gaussian-like filter, in the Fourier space the transformed image is the product of the transforms of the image and the filter. So, if we have a blurred image we can – looking at the supposed edge smearing – extract the width of the blurring filter which would produce the same effect, prepare the transform of this filter artificially, and divide the image transform by it. The inverse transform of the result should sharpen the image. This technique is often used in astronomy. Of course it might introduce, and usually does, a substantial local noise, due to the fact that high frequencies have been enhanced. The Fourier transform of a Gaussian is a Gaussian, and dividing by it raises considerably the values of the “frequency pixels” far from the origin.)

5.1.1 Some Simple Examples

Let us construct the artificial shadow shown on Fig. (5.1). There is absolutely nothing fascinating here, this picture has been made in one minute. The shadow is constructed from the distorted letter, multiplied by the background image. A very simple image package will require from the user the correct choice of the filling gray before applying the multiplication, otherwise the shadow may become too dense or too light. But a more intelligent and more interactive approach is also possible – don’t forget the α channel. Even if it is not available explicitly (as in some contexts in Photoshop, where apparently the transparence manipulations have been designed by a committee of 1439 ambitious experts. . . ), it is usually possible to declare globally the transparence of the blending colours.

Fig. 5.1: Simple shadow

We have mentioned that in order to “anti-shadow” a fragment of the image we shadow its negative and we invert the result. But this is not always what we want. If a bump-mapping texture produces a strong light reflex on the surface, this effect should be added to the normal, “diffuse” colour. More about that will be found in the section (6.1) devoted to bump mapping.

Another example is a variant of the solarization effect. A solarized picture is in fact a badly exposed and/or badly processed photograph. The dark and middle zones remain unchanged, but parts which on the original are too light are “burned out” and become dark. Fig. (5.2) shows two variants of this effect. At the left we see the original Photoshop solarization, and at the center – our modified variant, which additionally enhances the contrast.

Fig. 5.2: Solarization

The middle picture is particularly easy to obtain: it suffices to compute the difference between the image and its negative, and to invert the result. In fact, for light areas we get 1 − (I − (1 − I)) = 2(1 − I), an enhanced negative. For the dark parts we have 1 − ((1 − I) − I) = 2I. The “original” solarization may then be obtained trivially – it suffices to divide this result by two.
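As a check of this little algebra, here is a minimal Matlab sketch of the modified solarization (the absolute difference between the image and its negative, inverted); halving the result gives the “original” variant. The file names are arbitrary:

  I = double(imread('lena.bmp'))/255;  % intensities normalised to [0, 1]
  modified = 1 - abs(I - (1 - I));     % 2I in the dark parts, 2(1-I) in the light ones
  original = modified/2;               % the "classical" solarization
  imwrite(modified, 'solarized.bmp');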
Obviously, the solarization applied to a colour picture usually gives useless and disgusting effects (who wants a deadly green Lena?). But this particular intensity shift may introduce some almost specular, silky reflexes, whose colour may then be corrected by the “hard light” composition rule discussed in the next section. This gives us the right picture on Fig. (5.2).

5.2 Working with Layers

Layers are conceptually simpler than channels, but technically they may be quite complex. They can be thought of as superposed transparencies. When we have to compile a rather complicated algebraic superposition of various images, it is preferable to put all the parts on separate layers, possibly to duplicate and mask some of them, and when the image is ready, we can collapse all the layers into one. Conceptually a multilayer image is just an M × N × D “tensor”, where D is the number of layers. Of course, it is possible to do everything using separate images. In popular packages like Photoshop the layers are integrated into the interface. The advantage of such a protocol is that the layers are ordered and we see immediately which one may cover the others. The layer interface automatically provides a number of operators; for example a two-layer image may be composed declaratively as a “normal” superposition, where the upper image dominates everywhere it is not transparent, or as a multiplication, difference, etc.

Perhaps it might be useful to comment on the various layer combination modes chosen by the Photoshop creators according to user wishes. We know that the normal mode covers the underlying layers. Here are some of the remaining modes. We do not plan to teach the reader how to use Photoshop, but to suggest which facilities should be present if one day he tries to construct his own image processing superpackage, which will replace Photoshop, Gimp, and all the others. You will see that many of these modes are simple combinations of more primitive operations. There is a difference between the global layer composition modes and the drawing tool modes. Suppose that the upper layer pixel has colour c0, and the layer beneath – c1.

• Dissolve. This mode constructs the pixel colour c by a random choice between c0 and c1, depending on the opacity of c0. If the upper layer is opaque, c0 is always chosen. This might be used to simulate a spray (airbrush) painting.

• Behind. Used for painting. If a layer is thought of as an acetate sheet, the normal painting (brush, etc.) replaces the existing colour. The “behind” mode is equivalent to painting at the back of this sheet. Of course it makes sense only if the sheet contains some transparent or semi-transparent areas. In such a way one layer can be used twice.

• Clear. It is just the eraser, but attached to the line (stroking) tools, or to the filling commands (path fill and paintbucket). It renders the touched areas transparent, and of course is used when manual erasing would be too cumbersome.

• Multiply. Multiplies the pixel contents considered as fractions between 0 and 1, channel by channel. The result is always darker, unless one of the concerned pixels is white. This may be a global option.

• Screen mode. This is the “anti-shadowing” effect. For each channel separately the inverses of c1 and c0 are multiplied, and the inverse of the result is taken: c = 1 − (1 − c0)(1 − c1). The final effect is always lighter. Screening with white gives white, and screening with black leaves the colour unchanged.

• Overlay mode.
This is a little complicated, and may either multiply or screen the pixels depending on the base colour. If the base (c1) and the blending (upper, c0) colours are random, the result is difficult to understand. The idea is to preserve the shadows and the lights of the base colour: where it is strong, it remains, otherwise the blending colour “wins”. The result is as if we looked at the base image through a coloured glass, but an “active” one, i.e. a white blending colour may lighten the image, give some milky appearance to it, while black does not destroy the image, but darkens it, and the darkening is more pronounced where the areas are already dark. Yes, obviously an example is needed. . . Fig. (5.3) shows the effect of overlaying.

Fig. 5.3: Overlay mode

Exercise. Try to deduce the underlying colour algebra for this operation.

• Soft Light mode. Now the blending colour may darken or lighten the base image. Imagine that c0 represents a diffused spotlight. If it is light, lighter than 0.5, then the image is lightened, otherwise it is darkened (an “anti-light” effect). If the blending colour is white or black, the result is pronounced, but never turns into white or black.

• Hard Light mode. Here the effect is stronger: the image is lightened or darkened depending on the blending colours. If c0 is light, the result is screened, if it is dark, the result is multiplied, so it is possible to get very light (white) highlights, or very deep shadows.

• Darken. This is simple. The darker colour is chosen. There is also the “Lighten” option.

• Difference. The result is the absolute value of the subtraction of the two colours, so the result is never negative.

• Hue mode. The hue (spectral location) of the blending colour replaces the base hue, but the saturation and the luminance remain. Imagine that the base image has been converted into gray, and then artificially coloured with the blending pixels. The Color mode replaces not only the hue, but also the saturation; the effect of artificial tinting is more pronounced than in the previous mode. The Luminosity mode is the inverse of Colour – the hue and saturation remain, the luminance is taken from the blending image. Finally, there is the Saturation mode, which takes only the saturation from the blend layer. This mode may be used for the selective (through painting) elimination of some too saturated colours, for example for simulating the atmospheric colour attenuation with the distance.

5.2.1 Construction and Usage of the Transparence Channel

All these algebraic and layer operations are mathematically trivial, and belong to the practical folklore of the image processing art. We have mentioned that the work with the transparence channel may be delicate. If it is not accessible directly, several possibilities to simulate it exist, provided the selection tools and the image arithmetic are sufficiently complete. In particular, knowing that the transparence per se cannot produce visible results, superposing a partially transparent image I1 with the opacity α < 1 over the base image I0 means that the result is computed as I = αI1 + (1 − α)I0 (unless some non-standard arithmetic: “screen”, “soft light”, etc. mode is used. We shall not discuss this here). Some image synthesis packages, such as 3DMax, automatically generate the alpha channel: everything which belongs to the image is white, and the background is black (if we choose to “see” the opacity in such a way).
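A minimal Matlab sketch of this compositing formula, assuming the α channel is stored as a separate gray image (all file names are invented for the example):

  I1 = double(imread('object.bmp'))/255;      % partially transparent foreground
  I0 = double(imread('background.bmp'))/255;  % base image of the same size
  alpha = double(imread('alpha.bmp'))/255;    % opacity, one channel, values in [0, 1]
  alpha = repmat(alpha, [1 1 3]);             % replicate over the three colour planes
  result = alpha.*I1 + (1 - alpha).*I0;       % I = alpha*I1 + (1 - alpha)*I0
  imwrite(result, 'composed.bmp');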
Such an automatically generated α channel may be very important when the image is subsequently used for animation – the system automatically filters out the transparent zones when composing the final picture. If you have to do it by hand, and if you wish to obtain the effect shown on Fig. (5.4), you must

• Pick out the fragment of the original image (at the left) which will be erased, and construct a mask M.

• M – say, black on white – is multiplied by the image. The face is erased.

• The same mask, but inverted, multiplies the replacement (unfortunately its original disappeared somewhere. . . ).

• The two components are added.

Fig. 5.4: Composition of two images

Chapter 6 Some Special 3D Effects

6.1 2D Bump mapping

The technique of bump-mapping is often used in the synthesis of 3D scenes, where it simulates spatial textures: by deforming the normals to the surface of the rendered object, it modifies the illumination conditions, and simulates bumps, rugosity, holes, etc. The aim of bump mapping in the domain of image processing might be different. Of course, we can produce some 3D effects which simulate the extrusion of a flat contour, for example a text, but more often this technique is used to add some texture to the surface of the image: to simulate a canvas, a piece of hairy tapestry, etc. The 2D image is already rendered, and adding some additional lighting effect should not deteriorate the colour contents of the picture, so the bump mapping should not be overdone. The shadows should not be too deep, and the specular spots rather narrow. And, in general, the size of the bumps should not be too big either. Fig. (6.1) shows the result of the application of a bump map.

Fig. 6.1: Bump mapping

This effect is produced by the simulation of a directed lighting on a two-dimensional image, and in general may be quite involved, with many parameters. The basic idea goes as follows. Imagine that you have a (fake) extrusion given by a gray (intensity) image, whose one-dimensional section is shown on Fig. (6.2). The light beam is represented by the red dashed lines. The bump map has the same size as the work image. Those regions which correspond to the angle 90◦ between the normal to the bump profile and the light direction will be enhanced. Any specular model may be used, for example the Phong formula I_s = k cos^n(θ), where n may vary between, say, 3 and 200. But beware: in modeling glossy effects on a 3D scene the specular contribution is always positive. Here the shadow is active and may darken some parts of the image. We have to choose a priori the “neutral” direction (usually “horizontal”) where the lighting effect vanishes.

Fig. 6.2: Bump mapping geometry

Our mathematics is conditioned by the fact that we don’t have the profile geometry explicitly given. The bump map is just an image, where the black areas represent the extrusion, and white is “flat” (or vice-versa). These are the main computations involved:

• Deduce the fake normal vector to the bump image.

• Compute the scalar product of this normal and the direction of light.

• Enhance the result using the Phong (or other) formula, and split the dark and light areas in order to superpose them on the work image.

• Complete these last algebraic manipulations.

The blue (mostly negative, slightly truncated) profile on Fig. (6.2) is the cos(θ), where θ is the angle between the normal and the light direction vector, normalized so as to “neutralize” the horizontal parts of the image.
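A rough Matlab sketch of the steps just listed, under simplifying assumptions: the bump map and the work image are gray matrices bump and img already normalised to [0, 1], the light lies in the image plane, and the cos^n term is replaced by a signed power of the directional slope (so this is only a directional embossing enhanced by a power law, not a full Phong model):

  [gx, gy] = gradient(bump);               % gradient of the bump profile (a filtering exercise)
  L = [cos(pi/4), sin(pi/4)];              % light direction in the image plane (an assumption)
  slope = -(gx*L(1) + gy*L(2));            % signed slope towards the light; 0 on flat areas
  slope = slope / (max(abs(slope(:))) + eps);  % normalise, so horizontal parts stay neutral
  n = 20; k = 0.5;                         % "Phong-like" exponent and strength
  relief = k * sign(slope) .* abs(slope).^n;   % light (positive) and dark (negative) parts
  out = min(max(img + relief, 0), 1);      % superpose on the work image and clamp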
The normal is a vector orthogonal to the gradient, and the gradient is just an exercise in filtering. It should be done carefully in order not to truncate the negative values too early. We can obtain separately the x and y components of the gradient, or directly, by a judiciously chosen offset, the gradient whose xy projection is collinear with the light beam direction. The normal to the image has the same property. We obtain a standard directional “embossing” effect, which is then separated into light and dark contributions by the (signed, and truncated) subtraction of the neutral gray. The contrasts of the shadows and reflexes should be enhanced, and the rest is almost trivial.

6.2 Displacement Maps

This is an extremely popular and powerful deformation technique. In general, geometric (“horizontal”, not just in the colour space) deformations, such as twirling, bumping of a part of the image, etc., are not presented as mathematical, analytically given transformations, but their sources are shapes themselves. The displacement map is an image. It may be of the same size as the deformed subject, or any other, in which case we will use scaling. Call I_xy a point on the original image, and D_xy the colour value corresponding to the reduced point of the map. The reduction takes place if the sizes of I and D are different. Then, the point (x, y) of the image corresponds to

x̃ = (W_D / W_I) x,   (6.1)
ỹ = (H_D / H_I) y   (6.2)

on the map, where W and H denote the widths and heights of I and D. The inverse transformation, which is trivial, will also be needed. Note that in general these transformations are real, not integer.

The value of D determines the displacement vector, which in general has two components, so the displacement map should be at least a two-channel picture. (Photoshop uses the Red and Green channels, Blue is neutral). Since the value of the pixel is somewhat conventional (the interval [0, 255] is meaningless), one usually adopts an additional convention: the maximum displacement length (in pixels) s is established independently of the map D. The minimum colour (0) of D corresponds to the maximum shift in one direction, say, to the left, and the maximum value – to the right. The “gray” value (say, g) is neutral. Of course a more general algorithm can introduce a special offset, but it is a complication which can be resolved separately.

Now, the deformation goes as follows. For all the pixels (x, y) of the new, deformed image the corresponding (reduced) point on D is computed. Suppose that the map is by convention normalised so that the maximum colour value is equal to 1 (then g = 1/2). The vector generated by the two used channels of the map image is (D_x, D_y). From this value the displacement vector is calculated:

d = (d_x, d_y) = (s · (D_x − g), s · (D_y − g)).   (6.3)

(Of course, it is possible to use different horizontal and vertical scales and offsets). Now, the new pixel at (x, y) is not I_xy, but another value I_x′y′, where

(x′, y′) = (x + d_x, y + d_y),   (6.4)

interpolated appropriately, since (x′, y′) need not be integer, and the neighbouring pixels are looked up.

As an example we show how to produce an enlarging lens: a circular displacement map which moves all the pixels which are to the left of the center to the left, and the right pixels to the right. The same holds with respect to the vertical coordinate. Fig. (6.3) shows the two displacement channels and the result.

Fig. 6.3: Simulating lenses by displacement map
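Here is a minimal Matlab sketch of the application of a displacement map, following (6.3) and (6.4), with a nearest-neighbour lookup instead of a real interpolation and the map assumed to be of the same size as the image (all names are invented for the example):

  I = double(imread('photo.bmp'))/255;  % the image to deform
  D = double(imread('dmap.bmp'))/255;   % the map stored as RGB: R = Dx, G = Dy
  s = 20; g = 0.5;                      % maximum shift in pixels, neutral gray
  [H, W, C] = size(I);
  out = I;
  for y = 1:H
    for x = 1:W
      dx = s*(D(y,x,1) - g); dy = s*(D(y,x,2) - g); % formula (6.3)
      xs = min(max(round(x + dx), 1), W);           % source position (6.4), clamped
      ys = min(max(round(y + dy), 1), H);
      out(y,x,:) = I(ys,xs,:);                      % nearest-neighbour lookup
    end
  end
  imwrite(out, 'deformed.bmp');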
In the lens maps of Fig. (6.3) the gradients were linear and the displacement zones were homogeneous either vertically or horizontally. Much more elaborate effects can be obtained by using blurred d-maps, and by masking the deformed zones, confining them to some areas. Fig. (6.4) shows a slightly different distortion of the original, which simulates a hemi-spherical transparent “water drop” on the image. Fig. (6.5) suggests how to derive the displacement map image from the desired geometry, but we suggest that the reader performs all the computations himself.

Fig. 6.4: Displacement map simulating a transparent hemisphere

Fig. 6.5: The “water drop”

6.2.1 Another example: turbulence

Many packages offer the “twirl” effect which simulates a whirlpool. Suppose that we wish to simulate more or less realistically a physical whirl vortex, for example a big cyclone needed for an atmospheric texture. We suppose that it is a 2-dimensional (cylindrically symmetric) problem. Fig. (6.6) shows some geometric relations which must be obeyed according to some conservation laws. Suppose that the “matter” moves counterclockwise, and is sucked into a central region, where the laws are different.

Fig. 6.6: The geometry of a tornado

The thin ring of radius r and width dr occupies the area 2πr dr. If we assume that the “matter” inside is not compressible (or that its pressure does not change significantly), the surface conservation law determines its radial speed. The constancy of the area means that dr/dt = const/r, which, when integrated over time, gives the functional dependence r(t) = r0 √(1 − t/t0). The constant t0 is chosen so that at that time the vortex element falls into the center. This time depends on the initial conditions and on the “force” of the vortex. In order to construct an incremental displacement map we will not need this integrated formula, but it might be useful to know it. This result is independent of the angular motion. Here the angular momentum conservation determines the rotational speed. For a thin ring the angular momentum is given by M = 2πr dr · ωr², which means that r²ω is constant. We may thus postulate that in a short (and constant) interval of time we have the displacements Δr = c/r and Δφ = h/r², with some appropriate constants c and h. These equations need to be translated into the Cartesian system, an exercise which we leave for the reader (a possible sketch is given at the end of this subsection). Fig. (6.7) shows the map and the result of its iterated (about 15 times) application to some ordinary fractal clouds.

Fig. 6.7: Tornado

We have eliminated the singularity at the center by introducing a small neutral zone, but many other solutions are possible, for example replacing 1/r by r/(r² + ε). It should be obvious to everybody that this technique, or a similar one, is still not a good way to produce realistic cyclones. The clouds here are smeared, while any satellite photo shows that at the scale of the turbulence the cloud edges are quite sharp. Actually, the clouds are created within the turbulence, and their local properties are not determined (or only partly) by the spiraling motion. A good cloud model, which takes into account the pressure of the air and the condensation of the water vapour due to the adiabatic decompression, and which somehow brings the third dimension into play, simply does not exist. Any takers?
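A possible Matlab sketch of one incremental step of this vortex displacement, in Cartesian coordinates (the constants, the size of the neutral zone and the cloud image are arbitrary choices made for the example):

  img = double(imread('clouds.bmp'))/255;       % some fractal clouds
  [H, W, C] = size(img);
  [X, Y] = meshgrid((1:W) - W/2, (1:H) - H/2);  % coordinates centred on the vortex
  r   = max(sqrt(X.^2 + Y.^2), 8);              % radius, with a small neutral zone
  phi = atan2(Y, X);
  c = 40; h = 4000;                             % "force" of the radial and angular motion
  rn   = r - c./r;                              % delta r = c/r (inwards)
  phin = phi + h./r.^2;                         % delta phi = h/r^2
  Xs = min(max(round(rn.*cos(phin) + W/2), 1), W);  % source positions, clamped
  Ys = min(max(round(rn.*sin(phin) + H/2), 1), H);
  out = img;
  for ch = 1:C                                  % nearest-neighbour lookup, channel by channel
    plane = img(:,:,ch);
    out(:,:,ch) = reshape(plane(sub2ind([H W], Ys(:), Xs(:))), H, W);
  end
  % iterating this step about 15 times produces a spiral like the one of Fig. (6.7)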
Chapter 7 Other Deformation Techniques

7.1 Warping

Warping became an over-used term. It may mean a general deformation, but for us it will mean an interactively produced image distortion which respects some continuity constraints – as if we put the image on a rubber sheet, and the rubber was then deformed with all the pixels following (and, of course, with some interpolation). Warping is the subject of a thick book, and we cannot discuss here all the possible techniques, nor all the applications. It is not possible either to teach how to practically obtain particularly disgusting deformations of our favourite political personages. Thus, we sketch only the usage of the elastic deformation theory, and we suggest how a sufficiently general warping package might be structured. We stress that elasticity is just one among many possible models, and its only purpose is to introduce some regularity into the deformation process, to restrain the madness of the creator.

Imagine – for simplicity – the one-dimensional elastic line shown on Fig. (7.1). Now, we should not confuse the coordinate position x on this axis, the dynamic position which we call p(x), and the dynamic displacement of a point, which we call d(x). This is a trivial observation for a physicist, but for computer science students the fact that the displacement belongs to “the same space” as x is confusing. In the initial static configuration p(x) = x, or d(x) = 0.

Fig. 7.1: One-dimensional elasticity

The original configuration is the lower line. The rubber element at the point x0 has been displaced, and finds itself at x′0. Thus, p(x0) = x′0. Its neighbours follow, but their displacement is not the same as that of the central point, because the element at the left is attracted by the left neighbours, and the element at the right is less attracted (or more repelled) by its right neighbours. The neighbourhoods are infinitesimal, we may consider x1 = x0 − dx, x2 = x0 + dx, but again – do not confuse this with the – also infinitesimal – displacement d(x0) = x′0 − x0. We recall that the original configuration is in equilibrium, the elastic forces cancel. Now we displace the element at x = x0. What force acts upon it now? We build up the following, easy to grasp equation:

F(p(x)) = k (d(x + dx) − d(x)) − k (d(x) − d(x − dx)) = k (p(x + dx) − 2p(x) + p(x − dx)),   (7.1)

which takes into account that the net force is the result of the incomplete cancellation, and that the difference between the displacements of two neighbours is equal to the difference between the shifted positions, the absolute contributions cancelling. We see that the expansion of p(x + dx) into a Taylor series about x must go up to the second derivative, as the first derivatives cancel. Knowing that the force is proportional to the acceleration d²p/dt², we have

∂²p(x, t)/∂t² = C ∂²p(x, t)/∂x²,   (7.2)

i.e. the one-dimensional wave equation, as expected. In the multidimensional case the spatial derivative should be replaced by the Laplacian. In equilibrium Δp = 0. When we pass from the (2-dimensional) continuum to a discrete grid indexed by pairs ij: x_{ij} + dx = x_{i,j+1}, etc., the Laplace equation takes the form

4p_{i,j} − p_{i−1,j} − p_{i+1,j} − p_{i,j−1} − p_{i,j+1} = 0.   (7.3)

This means that the equilibrium position of each node is the symmetric barycenter of its four neighbours. We might – perhaps this digression will be useful for some readers – derive this formula directly within the discrete formulation. Imagine a discrete grid whose vertices are connected with elastic springs.
The potential energy of the system is equal to the sum of the energies of the corresponding oscillators:

U = (k/2) Σ_{⟨j,l⟩} (p_j − p_l)²,   (7.4)

where j and l are two-dimensional, double indices locating the vertices in the grid, and the sum goes over all pairs of neighbouring indices, i.e. over all springs. The equilibrium is obtained when the energy reaches its minimum, as a function of the positions {p}. The derivative over p_k gives

∂U/∂p_k = k Σ_i (p_k − p_i) = 0   for all k,   (7.5)

where i runs over the neighbours of k, which again shows that p_k is the arithmetic average of the positions of its neighbours. Now, in order to solve such a set of equations numerically, we must fix some of the vertices, i.e. introduce some boundary conditions. Fig. (7.2) shows what happens when we displace one point and fix its position. The left drawing shows the initial equilibrium, which is trivial, but there is no cheating: we have fixed the border of the grid, and a MetaPost program found the internal vertices.

Fig. 7.2: Elastic displacement

We see that the elastic adjustment of the neighbourhood does not necessarily prevent the mesh cross-over. In this example the mesh boundaries are too close. Fig. (7.3) shows a less drastic mesh folding.

Fig. 7.3: Grid size importance

If a huge, but very localized warping is needed, it is better to do it in several stages, or to use a different elastic model, otherwise a very dense grid would be needed. The solution of the discretized Laplace equation is easy: the assignment (7.3) is iterated until convergence (a minimal sketch is given after the list below). If the grid is large, this process might be slow.

In general, which facilities should a warping package provide?

1. Of course it should ensure a correct interfacing for the image import, and for the image and animation export. More about that in the morphing section below.

2. The grid should be adaptive. It does not need to be visible. It is perfectly possible to give the user just the possibility to fix some points or lines (segments, polygons, splines), and the system may choose the grid according to the geometric details.

3. The warping may be defined by dragging points, lines, or whole areas. Internal boundary conditions may be established, and the system should understand that if the warping zone is constrained within a closed area, its exterior does not participate in the iterative solution of the Laplace equation.
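On the numerical side, here is a minimal Matlab sketch of the relaxation mentioned above: the barycentric assignment (7.3) is iterated, with the border and any user-dragged vertices kept frozen (the grid size, the number of iterations and the displaced vertex are invented for the example):

  N = 21;                                  % a small square grid
  [px, py] = meshgrid(1:N, 1:N);           % initial positions p(i,j)
  fixed = false(N); fixed([1 N],:) = true; fixed(:,[1 N]) = true;  % freeze the border
  px(11,11) = 15; py(11,11) = 14; fixed(11,11) = true;             % drag one vertex
  for it = 1:500                           % iterate (7.3) until approximate convergence
    ax = (circshift(px,[1 0]) + circshift(px,[-1 0]) + ...
          circshift(px,[0 1]) + circshift(px,[0 -1]))/4;  % barycenter of the 4 neighbours
    ay = (circshift(py,[1 0]) + circshift(py,[-1 0]) + ...
          circshift(py,[0 1]) + circshift(py,[0 -1]))/4;
    px(~fixed) = ax(~fixed); py(~fixed) = ay(~fixed);
  end
  % (px, py) now holds the relaxed grid, through which the image can be resampled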
7.2 Morphing

Morphing is a combination of (generalized) warping and colour interpolation, which deforms one image in such a way that at the end it becomes another one. The Internet is full of clichés with the deformation of human faces one into another (e.g. Michael Jackson’s “Black or White” clip), or the transformation of a human face into a beast (wolf, tiger, etc.). Thus, we shall not show a morphing example here. Usually the warping phase is liberal, and there is no “elasticity” involved, especially if the source and target images are so different that it is difficult to find some common topological patterns. The user splits the source image into simple polygons, usually triangles, although any contours can be used, provided that the morphing software can perform the nonlinear transformations between arbitrary closed contours well. On the target image the same set of polygons is automatically re-created, and the user deforms them manually, moving the vertices. Most of the popular morphing programs are too tolerant: the user may produce topological singularities, forcing some triangles to overlap, or leaving holes. Even the construction of the covering triangles may be clumsy, and some packages help the user by an automatic triangulation, for example using the Delaunay algorithm, which produces “equilibrated” triangles, not too elongated, if possible. The user then chooses only a number of control points. For example, when morphing faces it is natural to localise the eyes, the mouth corners, and the face contour.

The affine transformation between triangles has already been discussed; the only modification here is that this transformation, and the corresponding texture transmutation, are multi-stage processes. If a value (the point position or the pixel colour) passes from v0 into v1 in N stages, with v(0) = v0 and v(N) = v1, then v(i) = (1/N)((N − i)v0 + i·v1), if we choose to apply the linear interpolation. Often a more general approach is better for artistic reasons. Instead of using the linear interpolation weights (1 − x, x) (where x symbolically denotes i/N), the Hermite cubic function 3x² − 2x³ is used. In Michael Jackson’s clip a more sophisticated approach has been adopted: some parts of the image converge faster to the final form than others.

When changing a human face into a beast (or into a well known politician), the artist might ask himself several questions, for example:

• Perhaps it would be better to go with the warping all the way through before interpolating the colours: first the distortion of the shape, and then put some hairy texture upon it. Or, inversely, first put the hair, scales, or spots on the skin of the original human, and only then transform it into a lion, a fish, or somebody whose name we all know, but we won’t mention it here.

• The speed of the transmutation need not only be non-linear, it might also be asymmetric. Shall it start slowly and speed up, or decelerate?

The abundant amateurish morphing packages on the Internet continue to develop. If you want one day to make your own, please consider the following advice:

1. The interface should be professional, and this does not mean that the number of menus and buttons should be greater than 10. . . The package should be able to import at least 3–5 standard image formats, and generate at least two different ones. It should also be able to save on disk the working context, i.e. the point meshes and/or the triangles.

2. Don’t forget that a graphical interface without the “undo” command will sooner or later, but rather soon, end in the waste basket.

3. Use a professional colour interpolation scheme. Aliasing in morphs is rarely acceptable.

4. Generate separate intermediate images, but learn to generate also some compound images, for example animated GIFs. This is not difficult. You might also generate some scripts which will drive an MPEG encoder launched automatically by your package.

5. It is very frustrating when one cannot morph two images which differ (even slightly) in size. Offer the user the possibility to crop one of the images, or to resize it, or the other one. Preferably both, with the resize parameters constrained to preserve the proportions.

6. A dense mesh of control points and lines is a mess, badly legible on colour images. Use colour for the control entities, but permit the user to mask (or to attenuate) the colour of the images on the display. Gray images are significantly clearer, and do not risk rendering red control dots invisible.

7. Plug in the Delaunay or another automatic triangulation algorithm.
8. Cry loud when a topological incongruity is generated: overlapping triangles, or holes. Think about offering – as an option – a limited warping scheme, for example the elastic model.

9. It would be nice to parametrise the transition speed, to introduce some non-linearities, or even to choose different warping/interpolation speeds in different parts of the image, as suggested above.

10. Add some really original examples. This remark is very silly, of course. But apparently the morphing package creators pretend to ignore it. . .
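To close, a minimal Matlab sketch of the interpolation schedule discussed in this section: a cross-dissolve between two already warped frames of the same size, with the Hermite easing 3x² − 2x³ instead of the linear weights (the file names and the number of stages are, of course, arbitrary):

  A = double(imread('source.bmp'))/255;  % the (warped) source frame
  B = double(imread('target.bmp'))/255;  % the target frame, same size
  N = 10;                                % number of stages
  for i = 0:N
    x = i/N;
    s = 3*x^2 - 2*x^3;                   % Hermite easing; use s = x for a linear blending
    frame = (1 - s)*A + s*B;             % colour interpolation between the two images
    imwrite(frame, sprintf('morph%02d.bmp', i));
  end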