A Flexible Shearlet Transform - Sparse Approximations and Dictionary Learning

Bachelor's thesis (Bachelorarbeit) for the degree of Bachelor of Science in Mathematics
Technische Universität Berlin, Fakultät für Mathematik und Naturwissenschaften, Studiengang Mathematik
Submitted by Sandra Keiper (matriculation no. 318795)
First examiner: Prof. Dr. Gitta Kutyniok
Second examiner: Prof. Dr. Reinhold Schneider

Statutory declaration: the independent and unaided preparation of this thesis is affirmed in lieu of an oath. Berlin, ........................................................................... Signature

German Summary (translated)

The field of image processing is very broad, ranging from image denoising and edge detection to image compression. An important goal is to find image models and, for these models, representation systems that are able to detect edges in an optimal way. Optimal here means that few elements of the representation system suffice to detect the edges or to represent the image. The first well-known representation systems are wavelet bases/frames. These have good properties for detecting point singularities; edges, however, cannot be detected by wavelet systems in the optimal way just described. For this reason, Gitta Kutyniok and others have worked on finding an extension of these systems that is suited to detecting edges optimally. This thesis is concerned with this further development of wavelet systems, the so-called shearlet systems, and with the optimality of shearlet systems for an N-term approximation of images. To this end, the first part of the thesis lays the theoretical groundwork: the Fourier transform and Hölder spaces as well as frames and wavelets are introduced. Then the main part of the thesis begins. As mentioned above, an image model must first be found that models natural images as generally as possible.
By natural images we mean images that actually occur in nature, for example photographs of real objects. A widespread model is that of cartoon-like images. The idea of this model is that an image f ∈ L²(R²) (or, for a natural image, a sufficiently small image patch) consists of two smooth regions separated by a (likewise smooth) curve; see Fig. 1 for an illustration. The smoothness, of course, still has to be specified precisely. The works of Gitta Kutyniok and others are based on cartoon-like images whose two regions, like the discontinuity curve, are C². This thesis is concerned with carrying these results over to more general cartoon-like images: instead of C² regularity we only require C^β regularity of the two separated regions and C^α regularity of the separating curve, where 1 < α ≤ β ≤ 2.

Definition (cartoon-like images). The set of cartoon-like images E^β_{α,L}(R²) is the set of all functions f : R² → C of the form

    f = f₀ + f₁ χ_B,

where B is a star-shaped region with piecewise C^α boundary curve and f₀, f₁ ∈ C^β(R²).

We now want to approximate this class of images with the help of a representation system. Let Ψ = (ψ_i)_{i∈I} be a basis of L²(R²). Then for every f ∈ L²(R²) there exists a sequence of coefficients (c_i(f))_{i∈I} such that f can be represented as

    f = Σ_{i∈I} c_i(f) ψ_i.

Fig. 1: Example of a cartoon-like image.

For the N-term approximation we select the N coefficients largest in modulus, (c_i(f))_{i∈I_N}, and reconstruct f as

    f_N = Σ_{i∈I_N} c_i(f) ψ_i.

The error ‖f_N − f‖ we make naturally decreases as more coefficients are included.
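As an illustrative aside (not part of the thesis; the experiments in Chapter 7 use Matlab), the N-term selection step can be sketched in a few lines of Python. For an orthonormal basis, Parseval's identity gives the squared L² error of the best N-term approximation directly as the energy of the discarded coefficients. The function name n_term_error is hypothetical.

```python
# Sketch: squared error of the best N-term approximation in an
# orthonormal basis. By Parseval, dropping a coefficient c contributes
# |c|^2 to the squared L2 error, so we keep the N largest in modulus.
def n_term_error(coeffs, N):
    """Sum of squares of all but the N largest (in modulus) coefficients."""
    dropped = sorted((abs(c) for c in coeffs), reverse=True)[N:]
    return sum(c * c for c in dropped)

coeffs = [4.0, -2.0, 1.0, 0.5, -0.25]
print(n_term_error(coeffs, 2))  # 1.0 + 0.25 + 0.0625 = 1.3125
```

For a frame rather than a basis this identity becomes the upper bound of Lemma 2.19 below, with the frame bound A entering as a factor 1/A.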
This thesis investigates the rate at which this error decays when shearlet systems are chosen as the representation systems. For the special class of cartoon-like images with C² regularity, as well as for the three-dimensional case (there even for the more general class of cartoon-like images), i.e. for f ∈ L²(R³), the approximation rates have already been proven. Note that shearlet systems are no longer bases but, more generally, frames; for simplicity we shall not pursue this distinction further in this summary. For this class of cartoon-like images it is shown in Chapter 3 that the best decay rate that can be achieved in general is N^{−α}. The result of this thesis is that this approximation rate is attained by shearlet systems. But first we define shearlet systems.

Definition. For α ∈ (1, 2] let the scaling matrices A_{2^j}, Ã_{2^j}, j ∈ Z, be defined as

    A_{2^j} = [ 2^{jα/2}  0 ; 0  2^{j/2} ],    Ã_{2^j} = [ 2^{j/2}  0 ; 0  2^{jα/2} ],

and the shear matrix S_k as

    S_k = [ 1  k ; 0  1 ].

Then for c = (c₁, c₂) ∈ R²₊ the cone-adapted discrete shearlet system SH(Φ, Ψ, Ψ̃; c, α) for the parameter α ∈ (1, 2], generated by ϕ, ψ, ψ̃, is given by

    SH(Φ, Ψ, Ψ̃; c, α) = Φ(ϕ; c₁, α) ∪ Ψ(ψ; c, α) ∪ Ψ̃(ψ̃; c, α),

where

    Φ(ϕ; c₁, α) = { ϕ_m = ϕ(· − m) : m ∈ c₁Z² },
    Ψ(ψ; c, α) = { ψ_{j,k,m} = 2^{j(α+1)/4} ψ(S_k A_{2^j} · − m) : j ≥ 0, |k| ≤ ⌈2^{j(α−1)/2}⌉, m ∈ cZ² },
    Ψ̃(ψ̃; c, α) = { ψ̃_{j,k,m} = 2^{j(α+1)/4} ψ̃(S_kᵀ Ã_{2^j} · − m) : j ≥ 0, |k| ≤ ⌈2^{j(α−1)/2}⌉, m ∈ c̃Z² },

with c̃ = (c₂, c₁). By cZ² we mean that cz = (c₁z₁, c₂z₂) for z = (z₁, z₂) ∈ Z².

Fig. 2: Partition of the frequency domain.

Considering the partition of the frequency domain shown in Fig. 2, one can show that Ψ(ψ; c, α) is a frame for { f ∈ L²(R²) : ess-supp f̂ ⊂ C₁ ∪ C₃ } and Ψ̃(ψ̃; c, α) is a frame for { f ∈ L²(R²) : ess-supp f̂ ⊂ C₂ ∪ C₄ }. The main result of the thesis now reads:

Theorem.
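The scaling and shear matrices of this definition can be written down directly; the following pure-Python sketch (an illustration only, not the thesis' Matlab/ShearLab code) shows the matrices and the admissible shear range at a given scale.

```python
import math

# Sketch of the matrices from the shearlet definition, for alpha in (1, 2].
# Matrices are nested lists in row-major order; names are hypothetical.
def scaling_matrix(j, alpha):
    # A_{2^j} = diag(2^{j*alpha/2}, 2^{j/2})
    return [[2 ** (j * alpha / 2), 0.0], [0.0, 2 ** (j / 2)]]

def shear_matrix(k):
    # S_k = [[1, k], [0, 1]]
    return [[1.0, float(k)], [0.0, 1.0]]

def shear_range(j, alpha):
    # admissible shears at scale j: |k| <= ceil(2^{j(alpha-1)/2})
    kmax = math.ceil(2 ** (j * (alpha - 1) / 2))
    return range(-kmax, kmax + 1)

print(len(list(shear_range(4, 2.0))))  # alpha = 2: |k| <= 4, i.e. 9 shears
```

For α close to 1 the scaling becomes nearly isotropic and very few shears per scale remain; for α = 2 one recovers the parabolic scaling of classical shearlets.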
Assume that ϕ, ψ, ψ̃ ∈ L²(R²) have compact support and that SH(Φ, Ψ, Ψ̃; c, α) forms a frame for L²(R²). Then SH(Φ, Ψ, Ψ̃; c, α) satisfies (under some additional assumptions, which are specified in the thesis) the property of almost optimally sparse approximation of functions f ∈ E^β_{α,L}(R²). That is, there exists a constant C > 0 such that

    ‖f_N − f‖²₂ ≤ C · N^{−α} · (log₂ N)^{α+1}    as N → ∞,

where f_N is the N-term approximation of f obtained by selecting the N coefficients largest in modulus.

The last part of the thesis describes the implementation of the shearlet transform and uses this implementation to compute, for various images, the decay rate of the N-term approximation error.

Contents

1 Introduction
2 Theoretical Basics
  2.1 Fourier Transform
  2.2 Hölder Spaces and Fractional Order Sobolev Spaces
  2.3 Frames
  2.4 Wavelets
3 Optimality Result
  3.1 The Class E^β_{α,L} of Cartoon-like Images
  3.2 Optimality Rate
4 Decay of the Approximation Error using a Shearlet System
  4.1 Cone-adapted Shearlet System
  4.2 Decay Rate of the N-term Approximation Error
5 Proofs
  5.1 Proof of Proposition 4.3
  5.2 Proof of Proposition 4.4
  5.3 Proof of Theorem 4.5
  5.4 Proof of the Main Result Theorem 4.6
6 Extension to a Singularity Curve with Corners
7 Implementation
  7.1 Shearlab Implementation
  7.2 N-Term Approximation and Results
  7.3 Results
8 Conclusion

List of Figures

1 Example of a cartoon-like image.
2 Natural image containing a cartoon-like structure.
3 Partition of frequency domain.
4 Support of the shearlet.
5 Intersection of the cubic window with the support of a shearlet and the discontinuity.
6 Shearlets interacting with corner points.
7 Shearlet intersecting the boundary curve and the smallest parallelogram P that entirely contains the curve in the interior of the shearlet.
8 Shearlets interacting with corner points.
9 Low-pass filter, which is used in the implementation.
10 High-pass filter, which is used in the implementation.
11 Pair of quadrature mirror filters, which is used in the implementation.
12 Two-dimensional low-pass filter in frequency domain.
13 Two-dimensional fan filter.
14 Family of synthetic images of more and more parallel oscillating curves.
15 Graph of the approximations. Comparison of the different images for fixed α.
16 Graph of the approximations. Comparison of the different images for fixed α.
17 Famous images for image processing: 'Lena' and 'Barbara'.
18 Graph of the approximations for the images 'Lena' and 'Barbara'.

List of Tables

1 Difference between α used in the Matlab implementation and in the code.
2 Number of shears for each level depending on α.
3 Decay rate for synthetic images.
4 Decay rate for synthetic images.

1 Introduction

The field of image processing includes many areas, such as denoising, compression and feature detection. An important goal of this field is to find image models and, for each such model, representation systems that are capable of detecting edges in an optimally sparse way. Among the best-known representation systems are wavelet frames, which have good properties for detecting point singularities. Wavelet-based methods are used in all of the subfields of image processing mentioned above. However, wavelet frames are not able to detect edges in an optimally sparse way. For this reason my mentors Gitta Kutyniok and Wang-Q Lim, among others, developed the wavelet methods further.

To study methods of image processing in a general way one has to restrict oneself to an image model. There are different ways to do this, for example digital and continuous models. In the theoretical part of this thesis we use a continuous model, namely cartoon-like images; for the implementation, of course, we have to use a digital model, i.e. we view the image as an element of R^{n×n}. The cartoon-like model is based on the assumption that a 'natural' image is composed of smooth parts separated by finitely many singularities. But what does 'smoothness' mean in this context? In earlier definitions of cartoon-like images one considers C²-smooth regions that are separated by a C²-smooth discontinuity curve.
But this concept has since been generalized: the regions are only required to be C^β-smooth and the discontinuity curve C^α-smooth. The mother function (generator function) of the representation system can be chosen either band-limited or compactly supported. Most of the results that exhibit optimally sparse approximation are only applicable to band-limited generators. The first proof in this context for compactly supported generators was given in [9], but only for the standard model of cartoon-like images. In this thesis we want to extend the results of [9] to cartoon-like images that are C^β-smooth apart from a C^α singularity curve, for 1 < α ≤ β ≤ 2.

In the first section we introduce all the theoretical basics needed to understand the results and their proofs. The second chapter introduces cartoon-like images mathematically and derives the optimal approximation rate for this class of images. Afterwards, in the third chapter, we introduce shearlet frames that depend on the smoothness of the image we would like to analyze; moreover, we state the main result of this thesis: the optimal approximation rate can be reached using these shearlet frames. Section five proves the main result together with all the auxiliary results it requires. In section six we extend the result to singularity curves that are only piecewise C^α-smooth.

Up to this point we assume that we know the smoothness of the image we would like to approximate, denoise, etc. In practice, however, we do not know this smoothness and therefore do not know which frame to use. The last section deals with this problem. The main idea is to compute the approximation rate of an image for a particular α and then to learn the best α. So first the Matlab implementation of the shearlet transform is explained, and afterwards we learn the best α for these images.

2 Theoretical Basics

In this section we introduce the basic knowledge needed for the theoretical part of this thesis.
First we introduce the Fourier transform on L²(R^n) and some of its important properties. Subsequently, the Hölder spaces and some important fractional order Sobolev spaces are defined. Then we discuss frames and wavelets, which will later be extended to shearlets. First we fix the multi-index notation:

Definition 2.1. An n-dimensional multi-index α is an n-tuple α = (α₁, ..., α_n). We write |α| for the sum |α| = α₁ + ... + α_n, by x^α we mean x^α = x₁^{α₁} · ... · x_n^{α_n}, and by D^α we mean the differential operator D^α = ∂^{|α|} / (∂x₁^{α₁} ... ∂x_n^{α_n}).

2.1 Fourier Transform

Definition 2.2. For f ∈ L¹(R^n) set

    (Ff)(ξ) = (2π)^{−n/2} ∫_{R^n} f(x) e^{−i⟨ξ, x⟩} dx    for all ξ ∈ R^n.

The function Ff is called the Fourier transform of f. Ff is well-defined since the integral exists for f ∈ L¹(R^n). The following proposition states a few elementary properties of Ff.

Proposition 2.3 ([12]). Let f ∈ L¹(R^n).

i) Ff ∈ C₀(R^n), and F : L¹(R^n) → C₀(R^n) is a continuous linear operator.
ii) For y ∈ R^n: F(f(· − y))(ξ) = e^{−i⟨y, ξ⟩} Ff(ξ).
iii) For a > 0: F(f(a·))(ξ) = a^{−n} Ff(ξ/a).
iv) If x^α f ∈ L¹(R^n) for |α| ≤ k, then D^α(Ff) = (−i)^{|α|} F(x^α f).

To define the Fourier transform for functions f ∈ L²(R^n) it is useful to restrict F to a subspace of L¹(R^n) that is dense in L²(R^n). The so-called Schwartz space is such a subspace.

Definition 2.4. A function f : R^n → C is called rapidly decreasing if

    lim_{|x|→∞} x^α f(x) = 0    for all α ∈ N₀^n,

where x^α = x₁^{α₁} · ... · x_n^{α_n}. The space

    S(R^n) = { f ∈ C^∞(R^n) : D^β f is rapidly decreasing for all β ∈ N₀^n }

is called the Schwartz space.

The Schwartz space is a subspace of L¹(R^n), so the Fourier transform is well-defined on this space. The extension of the Fourier transform to L²-functions is based on the following results:

Proposition 2.5 ([12]). i) S(R^n) is dense in L²(R^n). ii) The Fourier transform is a bijection of S(R^n) onto itself.
The inverse operator is given by

    (F^{−1}f)(x) = (2π)^{−n/2} ∫_{R^n} f(ξ) e^{i⟨ξ, x⟩} dξ    for all x ∈ R^n,

and it holds that ⟨Ff, Fg⟩_{L²} = ⟨f, g⟩_{L²}. In particular ‖Ff‖_{L²} = ‖f‖_{L²} for all f ∈ S(R^n), and therefore the operator F is well-defined, bijective and isometric on S(R^n) with respect to the norm ‖·‖_{L²}. Because of the density of S(R^n) in L²(R^n), one can extend F to a continuous operator on L²(R^n), called the Fourier-Plancherel transform and denoted by F₂. One often writes f̂ instead of F₂f. The so-called Plancherel equation holds for this extension:

    ⟨F₂f, F₂g⟩_{L²} = ⟨f, g⟩_{L²} and ‖F₂f‖_{L²} = ‖f‖_{L²}    for all f, g ∈ L²(R^n).

Now we state the Fourier Slice Theorem, which connects a projection of a two-dimensional function with its Fourier transform.

Theorem 2.6 (Fourier Slice Theorem). Let f : R² → C, and let the projection of f onto the x₁-axis be given by

    p(x₁) = ∫_R f(x₁, x₂) dx₂.

The corresponding slice of the Fourier transform is

    s(ξ₁) = f̂(ξ₁, 0) = (2π)^{−1} ∫_R ∫_R f(x₁, x₂) e^{−ix₁ξ₁} dx₁ dx₂
          = (2π)^{−1} ∫_R p(x₁) e^{−ix₁ξ₁} dx₁
          = (2π)^{−1/2} p̂(ξ₁).

Hence (2π)^{1/2} f̂(ξ₁, 0) = p̂(ξ₁), and with the inverse Fourier transform:

    ∫_R f(x₁, x₂) dx₂ = p(x₁) = (2π)^{−1/2} ∫_R p̂(ξ₁) e^{ix₁ξ₁} dξ₁ = ∫_R f̂(ξ₁, 0) e^{ix₁ξ₁} dξ₁.

2.2 Hölder Spaces and Fractional Order Sobolev Spaces

In this thesis we aim at an optimality result for one class of images, called cartoon-like images; that is, for some 1 < α ≤ β ≤ 2, images of two smooth parts separated by a curve of C^α-regularity, where the two smooth parts can be described by functions in C^β. Therefore the Hölder space C^α(R^n) as well as some fractional order Sobolev spaces have to be introduced. First we introduce the space C^m of all m-times continuously differentiable functions.

Definition 2.7.
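The Plancherel identity can be illustrated numerically with its discrete analogue: the unitary DFT preserves the ℓ² norm of a vector, just as F₂ preserves the L² norm. The following is a pure-Python sketch for illustration only; dft_unitary is a hypothetical helper, not part of the thesis code.

```python
import cmath

# Unitary discrete Fourier transform: X_k = (1/sqrt(n)) * sum_m x_m e^{-2pi i mk/n}.
# With this normalization the DFT is an isometry, the discrete Plancherel identity.
def dft_unitary(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            / cmath.sqrt(n) for k in range(n)]

x = [1.0, 2.0, 0.0, -1.0]
X = dft_unitary(x)
energy_x = sum(abs(v) ** 2 for v in x)   # squared l2 norm of the signal
energy_X = sum(abs(v) ** 2 for v in X)   # squared l2 norm of its transform
print(abs(energy_x - energy_X) < 1e-9)   # True: the norms agree
```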
Let Ω ⊂ R^n. Then:

i) The space C^m(Ω), for a non-negative integer m, is the vector space consisting of all functions ϕ which, together with their derivatives D^α ϕ of order |α| ≤ m, are continuous on Ω.

ii) The vector space C^m(Ω̄) is the subspace of C^m(Ω) consisting of all those functions ϕ for which D^α ϕ is bounded and uniformly continuous on Ω for all 0 ≤ |α| ≤ m.

Note that ϕ ∈ C^m(Ω) need not be bounded, but if ϕ ∈ C^m(Ω) is bounded and uniformly continuous then it possesses a unique bounded continuous extension to Ω̄ [1]. With the norm

    ‖ϕ‖_{C^m(Ω̄)} = max_{0≤|α|≤m} sup_{x∈Ω} |D^α ϕ(x)|

the space C^m(Ω̄) is a Banach space [1]. Now we are ready to introduce the Hölder spaces:

Definition 2.8.

i) For 0 ≤ λ ≤ 1, a function ϕ fulfills the Hölder condition of exponent λ if there exists a positive constant K such that

    |ϕ(x) − ϕ(y)| ≤ K|x − y|^λ    for all x, y ∈ Ω.

ii) If 0 ≤ λ ≤ 1, we define C^{m,λ}(Ω̄) to be the subspace of C^m(Ω̄) consisting of all those functions ϕ for which D^α ϕ satisfies the Hölder condition of exponent λ in Ω for all 0 ≤ |α| ≤ m.

Often we write σ = m + λ, where m is a non-negative integer and 0 ≤ λ ≤ 1, and C^σ(Ω̄) instead of C^{m,λ}(Ω̄); a function ϕ ∈ C^{m,λ}(Ω̄) is then called Hölder-σ smooth. Note that C^{m,λ}(Ω̄) together with the norm

    ‖ϕ‖_{C^{m,λ}(Ω̄)} = ‖ϕ‖_{C^m(Ω̄)} + max_{0≤|α|≤m} sup_{x≠y} |D^α ϕ(x) − D^α ϕ(y)| / |x − y|^λ

is a Banach space [1].

There is an equivalent definition of the Hölder space that yields an estimate we will use later, so it should be stated here [11]: f is a Hölder-α smooth function if it has ⌊α⌋ derivatives and if there exists a constant C > 0 such that

    |f(x) − T_y^{⌊α⌋}(x)| ≤ C|x − y|^α    for all x, y,

where T_y^{⌊α⌋} denotes the Taylor polynomial of f of degree ⌊α⌋ at the point y.
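The Hölder condition can be probed numerically. A classical example (an illustration only; holder_ratio is a hypothetical helper) is ϕ(x) = √x on [0, 1], which satisfies the Hölder condition of exponent 1/2 with constant K = 1 but is not Lipschitz at 0.

```python
import itertools
import math

# Probe the Hoelder condition |phi(x) - phi(y)| <= K |x - y|^lambda numerically.
def holder_ratio(phi, x, y, lam):
    return abs(phi(x) - phi(y)) / abs(x - y) ** lam

# For phi = sqrt and lambda = 1/2 the classical constant is K = 1:
# |sqrt(x) - sqrt(y)|^2 = x + y - 2 sqrt(xy) <= |x - y|.
pts = [i / 100 for i in range(101)]
worst = max(holder_ratio(math.sqrt, x, y, 0.5)
            for x, y in itertools.combinations(pts, 2))
print(worst <= 1.0 + 1e-12)  # True: no sampled pair exceeds K = 1
```

The worst ratio is attained at y = 0, which also shows why no exponent λ > 1/2 would work for this function.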
Note that for the range 1 < α ≤ 2 and Ω ⊂ R we have T_y^{⌊α⌋}(x) = f(y) + f'(y)(x − y), and this yields the inequality:

    |f(x) − (f(y) + f'(y)(x − y))| ≤ C|x − y|^α
    ⇒ |f(x)| ≤ |f(y)| + |f'(y)||x − y| + C|x − y|^α.    (2.1)

Now we want to introduce some fractional order Sobolev spaces; that is, to extend the notion of Sobolev spaces of integer order to non-integer order. There are many ways to do this, but we restrict ourselves to the Sobolev-Slobodezki space W^{s,p}(R^n) and the Sobolev space H^s(R^n). The reason is that the Sobolev-Slobodezki space W^{s,p}(R^n) carries a norm which uses a Hölder-type condition, while the Sobolev space H^s(R^n) can be defined via the Fourier transform. But first recall the definition of the Sobolev space W^{m,p} for a non-negative integer m.

Definition 2.9.

i) For a non-negative integer m and 1 ≤ p < ∞ the norm ‖·‖_{m,p} is defined as

    ‖u‖_{m,p} := ( Σ_{0≤|α|≤m} ‖D^α u‖_p^p )^{1/p}.

ii) The Sobolev space W^{m,p}(Ω) is defined as

    W^{m,p}(Ω) := { u ∈ L^p(Ω) : ‖u‖_{m,p} < ∞ }.

Note that by D^α u we mean the weak partial derivative. Now we can introduce the Sobolev-Slobodezki space:

Definition 2.10.

i) For 0 < µ < 1 and 1 ≤ p < ∞ the Slobodezki seminorm is defined as

    |v|_{µ,p} := ( ∫_Ω ∫_Ω |v(x) − v(y)|^p / |x − y|^{n+µp} dx dy )^{1/p},

where n is the dimension of the ambient space Ω ⊂ R^n.

ii) The Sobolev-Slobodezki space is defined as

    W^{s,p}(Ω) := { v ∈ W^{m,p}(Ω) : |D^α v|_{µ,p} < ∞ for all |α| = m },

where s = m + µ.

Now the Fourier transform comes into play again. We will see that for p = 2 the Sobolev-Slobodezki space W^{s,2}(R^n) coincides with the Sobolev space H^s(R^n), which is defined by means of the Fourier transform.

Proposition 2.11 ([4]). Let u ∈ W^{s,2}(R^n) with s > 0. Then there are constants c₁ > 0 and c₂ > 0 such that

    c₁‖u‖²_{s,2} ≤ ‖(1 + |·|²)^{s/2} Fu‖²₂ ≤ c₂‖u‖²_{s,2}.

Definition 2.12.
For s > 0:

i) the norm ‖·‖_{H^s} is defined as

    ‖u‖_{H^s} := ‖(1 + |·|²)^{s/2} Fu‖₂,

ii) the Sobolev space H^s(R^n) is defined as

    H^s(R^n) := { u ∈ S(R^n)* : ‖u‖_{H^s} < ∞ },

where S(R^n)* denotes the dual space of the Schwartz space S(R^n).

From Proposition 2.11 it follows that the two spaces coincide, i.e. W^{s,2}(R^n) = H^s(R^n). Later it will be easier to work with elements of the space H^s(R^n) than of C^s(R^n), so we need a connection between these two spaces. We first show how C^s(R^n) and W^{s,2}(R^n) are related; since H^s(R^n) and W^{s,2}(R^n) coincide, this also shows the desired relation. Since we will later restrict ourselves to R², it suffices to state the following theorem for R².

Theorem 2.13. Let 0 < s < ∞. Then for every ε > 0 the following embedding holds:

    C₀^{s+ε}(R²) ⊂ W^{s,2}(R²),

where the zero as usual indicates that we only consider the subspace of compactly supported elements.

Proof. Let ϕ ∈ C₀^{s+ε}(R²). Then there is an Ω > 0 such that ϕ is supported on [−Ω, Ω]² ⊂ R², and by the Hölder condition we have, for s + ε = m + µ and some K > 0,

    |D^α ϕ(x) − D^α ϕ(y)| ≤ K|x − y|^µ    for all |α| = m.

To show that ϕ also lies in W^{s,2}(R²) we have to show that |D^α ϕ|_{µ−ε,2} < ∞ for all |α| = m (note that s = m + (µ − ε)). We estimate:

    |D^α ϕ(x) − D^α ϕ(y)|² / |x − y|^{2+2(µ−ε)}
      = ( |D^α ϕ(x) − D^α ϕ(y)| / |x − y|^µ · 1 / |x − y|^{1−ε} )²
      ≤ K² · 1 / |x − y|^{2(1−ε)}.

Hence,

    |D^α ϕ|²_{µ−ε,2} = ∫_{[−Ω,Ω]²} ∫_{[−Ω,Ω]²} |D^α ϕ(x) − D^α ϕ(y)|² / |x − y|^{2+2(µ−ε)} dx dy
      ≤ K² · ∫_{[−Ω,Ω]²} ∫_{[−Ω,Ω]²} 1 / |x − y|^{2(1−ε)} dx dy < ∞,

where in the last step we used that for every ε > 0 the map (x, y) ↦ |x − y|^{−2(1−ε)} is integrable over the bounded set [−Ω, Ω]² × [−Ω, Ω]²: substituting u = x − y, the singularity |u|^{−2(1−ε)} at u = 0 is integrable in R² since 2(1 − ε) < 2. This shows that the Slobodezki seminorm of D^α ϕ is finite, and therefore that ϕ lies in W^{s,2}(R²), which proves the theorem.
To conclude this subsection it remains to introduce a fractional order derivative of order smaller than s for a function ϕ ∈ H^s(R^n). There are different ways to do this (see for example [1]), but we want to introduce the method based on the Fourier transform. Recall the following property of the Fourier transform:

Lemma 2.14 ([12]). Let ϕ ∈ W^{m,2}(R^n) for a non-negative integer m. Then for |α| ≤ m we have

    F(D^α ϕ) = i^{|α|} ξ^α Fϕ.

Since the right-hand side also makes sense for non-integer values of |α|, we can define the fractional order derivative this way.

Definition 2.15. Let ϕ ∈ W^{s,2}(R^n) for s > 0. Then for |α| ≤ s define the fractional order derivative D^α ϕ to fulfill

    F(D^α ϕ) = i^{|α|} ξ^α Fϕ.

2.3 Frames

Recall that if we have a Hilbert space H and an orthonormal basis S ⊂ H of H, then every element x ∈ H can be written as x = Σ_{e∈S} ⟨x, e⟩e. This so-called reconstruction formula is a very useful property of an orthonormal basis. But orthonormality is a very strong condition, so it would be nice to relax it without losing the reconstruction formula. Since frames achieve exactly this, we introduce them now. Later we will see that the cone-adapted shearlet system indeed forms a frame; hence we have the reconstruction formula.

Definition 2.16.

i) Let {ϕ_i : i ∈ I} be a collection of elements in a Hilbert space H. Then (ϕ_i)_{i∈I} forms a frame for H if there exist constants 0 < A ≤ B < ∞ such that

    A‖f‖² ≤ Σ_{i∈I} |⟨f, ϕ_i⟩|² ≤ B‖f‖²    for all f ∈ H.

The constants A and B are called frame bounds.

ii) If A = B can be chosen, then (ϕ_i)_{i∈I} is called a tight frame. If A = B = 1, then (ϕ_i)_{i∈I} is called a Parseval frame.

As mentioned at the beginning of this subsection, we want to have a reconstruction formula for a frame.
Note that for a tight frame the reconstruction formula is easy to derive; since we have

    Σ_{i∈I} |⟨f, ϕ_i⟩|² = A‖f‖²    for all f ∈ H,

it is easy to see that f can be written as [6]

    f = (1/A) Σ_{i∈I} ⟨f, ϕ_i⟩ϕ_i    for all f ∈ H.

For the general case this is much more complicated, and one first needs to introduce the frame operator:

Definition 2.17.

i) Let (ϕ_i)_{i∈I} ⊂ H be a frame for H. Then

    T : H → ℓ²(I),    f ↦ (⟨f, ϕ_i⟩)_{i∈I}

is the analysis operator of (ϕ_i)_{i∈I}. The adjoint operator

    T* : ℓ²(I) → H,    (c_i)_{i∈I} ↦ Σ_{i∈I} c_i ϕ_i

is called the synthesis operator.

ii) The frame operator with respect to (ϕ_i)_{i∈I} is given by

    S = T*T : H → H,    f ↦ Σ_{i∈I} ⟨f, ϕ_i⟩ϕ_i.

Note that the synthesis operator is well-defined since T is bounded [6]. Now we can present the desired reconstruction formula:

Theorem 2.18 ([6]). Let (ϕ_i)_{i∈I} ⊂ H be a frame for H and let S be its frame operator as defined above. Then (S^{−1}ϕ_i)_{i∈I} is also a frame for H, with frame bounds B^{−1} and A^{−1}, the so-called canonical dual frame. For each f ∈ H we then have

i) the reconstruction formula

    f = Σ_{i∈I} ⟨f, ϕ_i⟩S^{−1}ϕ_i,

ii) the decomposition formula

    f = Σ_{i∈I} ⟨f, S^{−1}ϕ_i⟩ϕ_i.

Since our goal is to determine how the N-term approximation error decays, it is useful to bound this error as in the following lemma.

Lemma 2.19 ([8]). Let (ϕ_i)_{i∈I} be a frame for H with frame bounds A and B, and let (ϕ̃_i)_{i∈I} be the canonical dual frame defined above. Let I_N ⊂ I with #I_N = N, and let f_N = Σ_{i∈I_N} ⟨f, ϕ_i⟩ϕ̃_i. Then

    ‖f − f_N‖² ≤ (1/A) Σ_{i∉I_N} |⟨f, ϕ_i⟩|².

2.4 Wavelets

The ability of shearlets to deliver sparse representations of anisotropic features is based on the ability of wavelets to deliver sparse representations of point singularities. For this reason we introduce wavelets and wavelet bases for L²(R²), together with a result that illustrates why wavelets are far from reaching the optimal rate. Two-dimensional wavelets are an extension of the one-dimensional case.
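The tight-frame reconstruction formula f = (1/A) Σ ⟨f, ϕ_i⟩ϕ_i can be verified on a small finite example: three unit vectors in R² at mutual angles of 120° (the "Mercedes-Benz" frame) form a tight frame with bound A = 3/2. The following pure-Python sketch is for illustration only; the function name reconstruct is hypothetical.

```python
import math

# The "Mercedes-Benz" frame: three unit vectors in R^2 at 120-degree angles.
# It is a tight frame with frame bound A = 3/2.
phis = [(math.cos(2 * math.pi * i / 3 + math.pi / 2),
         math.sin(2 * math.pi * i / 3 + math.pi / 2)) for i in range(3)]

def reconstruct(f, frame, A):
    """Tight-frame reconstruction: f = (1/A) * sum <f, phi_i> phi_i."""
    out = [0.0, 0.0]
    for p in frame:
        c = f[0] * p[0] + f[1] * p[1]   # coefficient <f, phi_i>
        out[0] += c * p[0] / A
        out[1] += c * p[1] / A
    return out

f = (0.7, -1.3)
g = reconstruct(f, phis, 1.5)
print(all(abs(a - b) < 1e-12 for a, b in zip(f, g)))  # True: f is recovered
```

Note that the frame is redundant (three vectors in a two-dimensional space), yet the reconstruction is exact; this is the redundancy that shearlet frames exploit as well.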
So first we consider the one-dimensional case, in particular L²(R).

Definition 2.20. Let ψ ∈ L²(R), ψ ≠ 0. Then ψ is called a wavelet if

    C_ψ := ∫₀^∞ |ψ̂(ξ)|² / ξ dξ < ∞.

Note that this condition, the so-called wavelet condition, says that the Fourier transform of a wavelet tends rapidly to zero near the origin.

Proposition 2.21 ([2]). The wavelet condition can only be fulfilled if ψ̂(0) = 0.

Now we aim for a wavelet orthonormal basis of L²(R). For this we need the concept of the multiresolution analysis.

Definition 2.22. Let ψ be a wavelet. A collection of functions {ψ_{j,k}}_{j,k∈Z} of the form

    ψ_{j,k}(x) = 2^{−j/2} ψ(2^{−j}x − k)

is called a wavelet system in L²(R). If the wavelet system forms an orthonormal basis for L²(R), the system is called a wavelet orthonormal basis.

Definition 2.23. A sequence (V_j)_{j∈Z} of closed subspaces of L²(R) is called a multiresolution analysis (MRA) if the following conditions are fulfilled:

i) Inclusion: {0} ⊂ ... ⊂ V₂ ⊂ V₁ ⊂ V₀ ⊂ V₋₁ ⊂ V₋₂ ⊂ ... ⊂ L²(R).
ii) Totality: the closure of ∪_{j∈Z} V_j equals L²(R).
iii) Trivial intersection: ∩_{j∈Z} V_j = {0}.
iv) Scaling: for all j ∈ Z: f ∈ V_j ⇔ f(2^j ·) ∈ V₀.
v) Translation: for all j, k ∈ Z: f ∈ V_j ⇔ f(· − 2^j k) ∈ V_j.
vi) Scaling function: there exists a function ϕ ∈ L²(R) such that {ϕ(· − m) : m ∈ Z} is an orthonormal basis (ONB) of V₀. This function is typically called the scaling function.

Note that by (iv) and (vi) we have V_j = span{ϕ_{j,m} : m ∈ Z}, where ϕ_{j,m}(x) = 2^{−j/2} ϕ(2^{−j}x − m).

Lemma 2.24 ([2]). Let ϕ be the scaling function associated with an MRA. Then there exists a sequence (h_k)_{k∈Z} ⊂ R such that

    ϕ(x) = 2^{1/2} Σ_{k∈Z} h_k ϕ(2x − k)    for all x ∈ R.

We say that ϕ satisfies the scaling equation.

Definition 2.25. Let (V_j)_{j∈Z} ⊂ L²(R) form an MRA. Then the associated wavelet spaces W_j, j ∈ Z, are defined by

    V_j = V_{j+1} ⊕ W_{j+1},    W_{j+1} ⊥ V_{j+1}.

Note that by this definition it follows that [2]

    V_j = ⊕_{m≥j+1} W_m

and, because of the totality of the spaces V_j, also

    L²(R) = ⊕_{m∈Z} W_m.
With these results we can construct a wavelet basis for L²(R), as the following theorem shows:

Theorem 2.26 ([2]). Let (V_j) be an MRA with scaling function ϕ that fulfills the scaling equation with the sequence (h_k). Define ψ ∈ V₋₁ as

    ψ(x) = 2^{1/2} Σ_{k∈Z} (−1)^k h_{1−k} ϕ(2x − k).

Then the following statements hold:

i) The set {ψ_{j,k} = 2^{−j/2}ψ(2^{−j}· − k) : k ∈ Z} is an ONB of W_j.
ii) The set {ψ_{j,k} : j, k ∈ Z} is an ONB of L²(R).
iii) The function ψ is a wavelet.

In fact we wanted to construct a wavelet basis for L²(R²), and with the orthonormal wavelet basis for L²(R) we can do this by tensor products.

Theorem 2.27 ([2]). Let (V_j) be an MRA for L²(R) with scaling function ϕ ∈ L²(R) and associated wavelet ψ ∈ L²(R). Define for (x₁, x₂) ∈ R²

    ψ¹(x₁, x₂) = ϕ(x₁)ψ(x₂),    ψ²(x₁, x₂) = ψ(x₁)ϕ(x₂),    ψ³(x₁, x₂) = ψ(x₁)ψ(x₂).

Then the set

    Ψ = { ψ^k_{j,m}(x₁, x₂) = 2^{−j} ψ^k(2^{−j}x₁ − m₁, 2^{−j}x₂ − m₂) : j, m₁, m₂ ∈ Z; k ∈ {1, 2, 3} }

is an ONB for L²(R²). This ONB is a so-called wavelet basis.

From [7] we know that for a 'nice' wavelet ψ and a function f that is smooth apart from a discontinuity point x₀ ∈ R², the wavelet coefficients

    W_ψ f(2^j, k) = 2^{−j} ∫_{R²} ψ(2^{−j}(x − k)) f(x) dx

decay rapidly at fine scales except for k near x₀. So the wavelet transform is able to locate point singularities. But the next theorem shows that the approximation rate obtained by wavelet approximation is only of order N^{−1}, and later we will see that this rate is far from optimal.

Theorem 2.28 ([8]). Let Ψ be a wavelet basis for L²(R²). Suppose f = χ_B, where B is a ball contained in [0, 1]². Then

    ‖f − f_N‖²_{L²} ≍ N^{−1}    for N → ∞,

where f_N is the best N-term approximation from Ψ.

3 Optimality Result

Figure 1: Example of a cartoon-like image.

In this section we want to deduce the optimal approximation rate that can be reached for cartoon-like images.
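As a concrete instance of the constructions above: for the Haar MRA the scaling coefficients are h₀ = h₁ = 1/√2, and one level of the resulting discrete transform splits a signal into averages (the V-part) and details (the W-part) of Definition 2.25. A minimal pure-Python sketch, with hypothetical helper names and not the thesis code:

```python
import math

# One level of the discrete Haar transform: pairwise normalized averages
# (approximation, V_{j+1}) and differences (detail, W_{j+1}).
def haar_step(x):
    s = math.sqrt(2)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def haar_inverse(approx, detail):
    # Perfect reconstruction: x_{2i} = (a+d)/sqrt(2), x_{2i+1} = (a-d)/sqrt(2).
    s = math.sqrt(2)
    x = []
    for a, d in zip(approx, detail):
        x += [(a + d) / s, (a - d) / s]
    return x

x = [4.0, 2.0, 5.0, 5.0]
a, d = haar_step(x)
print(all(abs(u - v) < 1e-12 for u, v in zip(x, haar_inverse(a, d))))  # True
```

Note how the detail coefficients vanish wherever the signal is locally constant (here d = 0 for the pair 5.0, 5.0); this is the discrete face of the sparsity of wavelet coefficients away from singularities.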
Therefore the class of cartoon-like images has to be defined first.

3.1 The Class E^β_{α,L} of Cartoon-like Images

The main idea for modelling an image mathematically was given in [3]: such images contain two smooth (C²-) regions separated by a smooth (C²-) curve (see Figure 1). In this thesis we do not want to restrict ourselves to C² regularity; we therefore consider images that contain two smooth C^β regions separated by a C^α curve, where α and β lie between one and two. For clarity we first introduce the set STAR^α(ν, L), and afterwards the class of cartoon-like images E^β_{α,L}.

For α > 0 and ν > 0, let ρ : [0, 2π) → [0, ∞) be continuous, and define the set B ⊂ R² by

    B = { x ∈ R² : x = (‖x‖₂, θ) in polar coordinates, ‖x‖₂ ≤ ρ(θ), θ ∈ [0, 2π) },    (3.1)

such that the boundary ∂B of B is a closed curve parametrized by

    b(θ) = ( ρ(θ) cos(θ), ρ(θ) sin(θ) ),    θ ∈ [0, 2π),

and the radius function ρ is Hölder continuous with coefficient ν, i.e.

    max_{|γ|=⌊α⌋} sup_{θ≠θ'} |∂^γ ρ(θ) − ∂^γ ρ(θ')| / |θ − θ'|^{{α}} ≤ ν,    (3.2)

where {α} = α − ⌊α⌋.

Definition 3.1.

i) For ν > 0 the set STAR^α(ν) is defined to contain all sets B ⊂ R² that are translates of sets obeying (3.1) and (3.2).

Figure 2: Natural image containing a cartoon-like structure.

ii) The class STAR^α(ν, L) is defined to be the set containing all sets B with piecewise C^α boundary, i.e. ∂B is the union of finitely many pieces ∂B₁, ..., ∂B_L, which do not overlap except at their endpoints, and each piece ∂B_i, i ∈ {1, ..., L}, can be represented in parametric form by a C^α-smooth radius function ρ_i = ρ_i(θ) obeying (3.2).

Note that inequality (3.2) in particular implies that the discontinuity curve is C^α-regular, and that by definition we impose no restriction on the sharpness of the edges, since there is no specification of how the pieces ∂B_i meet. Also observe that STAR^α(ν) = STAR^α(ν, 1).
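The boundary parametrization b(θ) = (ρ(θ)cos θ, ρ(θ)sin θ) is easy to sample numerically. The following sketch uses a hypothetical radius function ρ(θ) = 0.3 + 0.1·cos(3θ) (chosen for illustration only, not taken from the thesis):

```python
import math

# Sample the star-shaped boundary b(theta) = (rho(theta) cos theta,
# rho(theta) sin theta) at n equispaced angles in [0, 2*pi).
def boundary(rho, n=8):
    return [(rho(t) * math.cos(t), rho(t) * math.sin(t))
            for t in (2 * math.pi * i / n for i in range(n))]

pts = boundary(lambda t: 0.3 + 0.1 * math.cos(3 * t))
print(len(pts), pts[0])  # 8 points; at theta = 0: (0.4, 0.0)
```

Any continuous ρ bounded away from 0 yields a star-shaped region in the sense of (3.1); the Hölder condition (3.2) then controls the regularity of the resulting edge.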
Having introduced $STAR^\alpha(\nu, L)$, we are ready to define the class of cartoon-like images.

Definition 3.2.
i) Let $\mu, \nu > 0$, $\alpha, \beta \in (1, 2]$ and $L \in \mathbb{N}$. Then $E^\beta_{\alpha,L}(\mathbb{R}^2)$ denotes the set of functions $f : \mathbb{R}^2 \to \mathbb{C}$ of the form
$$f = f_0 + f_1 \chi_B,$$
where $B \in STAR^\alpha(\nu, L)$ and $f_i \in C^\beta(\mathbb{R}^2)$ with $\mathrm{supp}\, f_0 \subset [0,1]^2$ and $\|f_i\|_{C^\beta} \le \mu$ for $i = 0, 1$. Define $E^\beta_\alpha(\mathbb{R}^2) = E^\beta_{\alpha,1}(\mathbb{R}^2)$.
ii) $E^{bin}_{\alpha,L}(\mathbb{R}^2)$ denotes the class of binary cartoon-like images, that is, functions $f = f_0 + f_1\chi_B \in E^\beta_{\alpha,L}(\mathbb{R}^2)$ with $f_0 = 0$ and $f_1 = 1$.

This image model seems appropriate since natural images can often be characterized as containing many smooth regions separated by curves with sharp corners. So if we look only at a small detail of an image, it resembles our model; for an example see Figure 2.

3.2 Optimality Rate

In this subsection we aim for a benchmark for the sparse approximation of functions in $E^\beta_{\alpha,L} \subset L^2(\mathbb{R}^2)$. So let $\Psi = (\psi_i)_{i \in I}$ be a dictionary for $L^2(\mathbb{R}^2)$, where $I$ is not necessarily countable. Without loss of generality we assume $\|\psi_i\|_{L^2} = 1$. Then for every function $f \in L^2(\mathbb{R}^2)$ there exists a countable subset $I_f$ of $I$ and a sequence $c(f) = (c_i(f))_{i \in I_f}$ such that
$$f = \sum_{i \in I_f} c_i(f)\, \psi_i.$$
For an $N$-term approximation we have to find the $N$ largest coefficients in $\Psi_f = (\psi_i)_{i \in I_f}$. But without a restriction on the search depth this can be infeasible in practice. The reason is that we could choose $\Psi$ to be a countable dense subset of $L^2(\mathbb{R}^2)$, since $L^2(\mathbb{R}^2)$ is separable. This would yield arbitrarily good sparse approximations, since there is always an element of $\Psi$ that is closer to $f$ than any given one; hence the search would never end. For this reason we impose polynomial depth search: this requires, for a polynomial $\pi(n)$, that the $n$ terms of an $n$-term approximation come from the first $\pi(n)$ terms of the dictionary [5].
To derive the optimality rate under the polynomial depth search restriction, we first have to define what it means for a function space to contain a copy of $\ell^p_0$. We will see that if a function space $F$ contains a copy of $\ell^p_0$, then for every $\tau < p$ there exists an element $f \in F$ such that $c(f) \notin \ell^\tau$. From this we can deduce the optimality rate.

Definition 3.3.
i) A function space $F$ is said to contain an embedded orthogonal hypercube of dimension $m$ and side $\delta$ if there exist $f_0 \in F$ and orthogonal functions $\psi_{i,m,\delta}$, $i = 1, \dots, m$, with $\|\psi_{i,m,\delta}\|_{L^2} = \delta$, such that the collection of hypercube vertices
$$H(m; f_0, (\psi_i)) = \left\{ h = f_0 + \sum_{i=1}^m \varepsilon_i \psi_{i,m,\delta} \ \Big|\ \varepsilon_i \in \{0, 1\} \right\}$$
is embedded in $F$.
ii) A function space $F$ is said to contain a copy of $\ell^p_0$ if $F$ contains embedded orthogonal hypercubes of dimension $m(\delta)$ and side $\delta$, and if for some sequence $\delta_k \to 0$ and some constant $C > 0$ there exists a $k_0$ such that
$$m(\delta_k) \ge C \delta_k^{-p} \quad \text{for all } k \ge k_0. \tag{3.3}$$

Theorem 3.4 ([5]). Suppose $F$ contains a copy of $\ell^p_0$. Then for every $\tau < p$, allowing only polynomial depth search, we have
$$\sup_{f \in F} \|c(f)\|_{\ell^\tau} = +\infty.$$

Now we want to transfer these results to our model of cartoon-like images, and we will see for which $p$ the class of cartoon-like images contains a copy of $\ell^p_0$.

Theorem 3.5.
i) The class of binary cartoon-like images $E^{bin}_\alpha(\mathbb{R}^2)$ contains a copy of $\ell^p_0$ for $p = 2/(\alpha+1)$.
ii) The space of Hölder functions $C^\beta(\mathbb{R}^2)$ with compact support in $[0,1]^2$ contains a copy of $\ell^p_0$ for $p = 2/(\beta+1)$.

Proof.
i) This follows directly from Donoho [5], since $B$ is in particular star-shaped.
ii) To show that $C^\beta(\mathbb{R}^2)$ contains a copy of $\ell^p_0$, we have to find a collection of embedded orthogonal hypercubes of dimension $m(\delta)$ and side $\delta$ such that (3.3) holds. Let $\varphi \in C_0^\infty(\mathbb{R})$ with $\mathrm{supp}\,\varphi \subset [0,1]$ and define for $m \in \mathbb{N}$ and $i_1, i_2 \in \{0, \dots, m-1\}$
$$\psi_{i,m}(t) = \psi_{i_1,i_2,m}(t) = m^{-\beta}\, \varphi(m t_1 - i_1)\, \varphi(m t_2 - i_2),$$
where $i = (i_1, i_2)$ and $t = (t_1, t_2) \in \mathbb{R}^2$.
Let $\psi(t) = \varphi(t_1)\varphi(t_2)$; then $\psi$ and the $\psi_{i,m}$ lie in $C^\beta(\mathbb{R}^2)$, since $\varphi \in C^\infty(\mathbb{R})$. It follows that $\|\psi_{i,m}\|_{L^2}^2 = m^{-2\beta-2}\|\psi\|_{L^2}^2$; to see this, note that $\int_{\mathbb{R}} |\varphi(m t_k - i_k)|^2\, dt_k = m^{-1}\|\varphi\|_{L^2}^2$. Since $\mathrm{supp}\,\psi_{i,m} \subset [\frac{i_1}{m}, \frac{i_1+1}{m}] \times [\frac{i_2}{m}, \frac{i_2+1}{m}]$, we have $\mathrm{supp}\,\psi_{i,m} \cap \mathrm{supp}\,\psi_{j,m} = \emptyset$ for $i \neq j$. Hence $\psi_{i,m}$ and $\psi_{j,m}$ are orthogonal in $L^2(\mathbb{R}^2)$ for $i \neq j$, and we have the hypercube embedding
$$H\!\left(m^2; 0, (\psi_{i,m})\right) = \left\{ h = \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \varepsilon_{i_1,i_2}\, \psi_{i_1,i_2,m} \ \Big|\ \varepsilon_{i_1,i_2} \in \{0,1\} \right\}$$
of dimension $m^2$ and side $\delta = \|\psi_{i,m}\|_{L^2} = m^{-\beta-1}\|\psi\|_{L^2}$. Therefore, if we choose $m(\delta)$ as
$$m(\delta) = \left\lfloor \left( \frac{\delta}{\|\psi\|_{L^2}} \right)^{-\frac{1}{\beta+1}} \right\rfloor,$$
it follows for $\delta_k \to 0$ with $\delta_{k_0}$ sufficiently small that
$$m(\delta_k)^2 \ge \left( \left( \frac{\delta_k}{\|\psi\|_{L^2}} \right)^{-\frac{1}{\beta+1}} - 1 \right)^{\!2} = \left( \frac{\delta_k}{\|\psi\|_{L^2}} \right)^{-\frac{2}{\beta+1}} \left( 1 - \left( \frac{\delta_k}{\|\psi\|_{L^2}} \right)^{\frac{1}{\beta+1}} \right)^{\!2} \ge C\, \delta_k^{-\frac{2}{\beta+1}},$$
so the hypercubes have dimension at least $C\delta_k^{-p}$ with $p = 2/(\beta+1)$. $\Box$

With this result and Theorem 3.4 we see that $\|c(f)\|_{\ell^p}$ cannot be bounded for $p < \max\{2/(\alpha+1),\, 2/(\beta+1)\}$. Let $(c_n(f)^*)_{n \in \mathbb{N}}$ be a decreasing (in modulus) rearrangement of the coefficients $(c_n(f))_{n \in \mathbb{N}}$. That $c(f) \notin \ell^\tau$ for every $\tau < p$ means that $(|c_n(f)^*|)_n$ cannot decay faster than $n^{-1/p}$. Since $f$ may be taken from $E^{bin}_\alpha(\mathbb{R}^2)$, which contains a copy of $\ell^p_0$ for $p = 2/(\alpha+1)$, the worst-case decay is $|c_n(f)^*| \gtrsim n^{-(\alpha+1)/2}$; analogously $|c_n(f)^*| \gtrsim n^{-(\beta+1)/2}$, since $f$ may be taken from $C^\beta(\mathbb{R}^2)$. This implies that the best decay any dictionary can guarantee on the whole class is
$$|c_n(f)^*| \lesssim n^{-\min\left\{ \frac{\alpha+1}{2},\, \frac{\beta+1}{2} \right\}}.$$
Suppose now that $(|c_n^*|)_{n \in \mathbb{N}} = (|c_n(f)^*|)_{n \in \mathbb{N}}$ decays as $|c_n^*| \lesssim n^{-(\alpha+1)/2}$. From Lemma 2.19 we know that $\|f - f_N\|^2 \le \frac{1}{A} \sum_{n > N} |c_n^*|^2$, where $A$ is a lower frame bound of the frame $(\varphi_i)_i$ of $L^2(\mathbb{R}^2)$ and $c_n(f) = \langle f, \varphi_n \rangle$. Hence
$$\|f - f_N\|^2 \le \frac{1}{A} \sum_{n > N} |c_n^*|^2 \lesssim \sum_{n > N} n^{-(\alpha+1)} \lesssim \int_N^\infty x^{-(\alpha+1)}\, dx \lesssim C N^{-\alpha}.$$
Supposing instead that $|c_n^*|$ decays as $|c_n^*| \lesssim n^{-(\beta+1)/2}$ yields $\|f - f_N\|^2 \lesssim C N^{-\beta}$. Summarizing, the optimal approximation error cannot decay faster than $O\!\left( \max\{N^{-\alpha}, N^{-\beta}\} \right)$.
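The hypercube construction in the proof of Theorem 3.5 ii) can be checked numerically. The sketch below is my own illustration (the concrete bump $\varphi$ and the quadrature grid are arbitrary choices): it builds the 1D factors $\varphi(mt - i)$ and verifies the disjoint supports and the norm scaling $\|\psi_{i,m}\|_{L^2}^2 = m^{-2\beta-2}\|\psi\|_{L^2}^2$ used above.

```python
import numpy as np

beta, m = 1.5, 4
t = np.linspace(0.0, 1.0, 4001)
dt = t[1] - t[0]

def phi(s):
    """Smooth C_0^infinity bump supported on (0, 1)."""
    out = np.zeros_like(s)
    inside = (s > 0) & (s < 1)
    out[inside] = np.exp(-1.0 / (s[inside] * (1.0 - s[inside])))
    return out

# 1D factors of psi_{i,m}: phi(m t - i) lives on [i/m, (i+1)/m]
f0, f1 = phi(m * t - 0), phi(m * t - 1)
overlap = np.sum(f0 * f1) * dt          # disjoint supports -> 0
norm_phi2 = np.sum(phi(t) ** 2) * dt    # ||phi||_{L2}^2
norm_f02 = np.sum(f0 ** 2) * dt         # should be ||phi||^2 / m
# 2D: psi_{i,m} = m^{-beta} phi(m t1 - i1) phi(m t2 - i2) factorizes, so
norm_psi_im2 = m ** (-2 * beta) * norm_f02 ** 2  # = m^{-2 beta - 2} ||psi||^2
```

Here $\|\psi\|_{L^2}^2 = \|\varphi\|_{L^2}^4$ for $\psi(t) = \varphi(t_1)\varphi(t_2)$, so `norm_psi_im2` matches $m^{-2\beta-2}\|\psi\|_{L^2}^2$ up to quadrature error, confirming the side length $\delta = m^{-\beta-1}\|\psi\|_{L^2}$.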
For the parameter range $1 < \alpha \le \beta \le 2$ this rate reduces to $O(N^{-\alpha})$.

4 Decay of the Approximation Error using a Shearlet System

4.1 Cone-adapted Shearlet System

As we have seen in Theorem 2.28 and Section 3, the approximation rate obtained by wavelet approximation of a cartoon-like image is far from optimal. The reason is that wavelets are only able to resolve point singularities, whereas a cartoon-like image is characterized by curve singularities. Therefore the authors of [7] extended the wavelet transform by a shearing operation, which yields a new system: the shearlet system. In this section we introduce this system.

For $\alpha \in (1, 2]$ we introduce the scaling matrices $A_{2^j}, \tilde{A}_{2^j}$, $j \in \mathbb{Z}$, defined by
$$A_{2^j} = \begin{pmatrix} 2^{j\alpha/2} & 0 \\ 0 & 2^{j/2} \end{pmatrix}, \qquad \tilde{A}_{2^j} = \begin{pmatrix} 2^{j/2} & 0 \\ 0 & 2^{j\alpha/2} \end{pmatrix},$$
and the shear matrix $S_k$, defined by
$$S_k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}.$$
As shown in Figure 3, we partition the frequency domain into the four cones
$$C_1 = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \xi_1 \ge 1,\ |\xi_2/\xi_1| \le 1\}, \qquad C_2 = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \xi_2 \ge 1,\ |\xi_1/\xi_2| \le 1\},$$
$$C_3 = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \xi_1 \le -1,\ |\xi_2/\xi_1| \le 1\}, \qquad C_4 = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \xi_2 \le -1,\ |\xi_1/\xi_2| \le 1\},$$
and a centered square
$$R = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \|(\xi_1, \xi_2)\|_\infty < 1\}.$$
This partition is useful because, in order to form a frame for $L^2(\mathbb{R}^2)$, it then suffices to take a bounded set of shearing parameters $k$. The idea is to construct so-called shearlet frames for the subspaces of $L^2(\mathbb{R}^2)$ induced by the square $R$ as well as by $C_1 \cup C_3$ and by $C_2 \cup C_4$, which together form a frame for $L^2(\mathbb{R}^2)$.

Definition 4.1. For $c = (c_1, c_2) \in \mathbb{R}^2_+$ the cone-adapted discrete shearlet system $SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha)$ for the parameter $\alpha \in (1, 2]$, generated by $\varphi, \psi, \tilde{\psi}$, is defined by
$$SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha) = \Phi(\varphi; c_1, \alpha) \cup \Psi(\psi; c, \alpha) \cup \tilde{\Psi}(\tilde{\psi}; c, \alpha),$$

Figure 3: Partition of the frequency domain.
where
$$\Phi(\varphi; c_1, \alpha) = \left\{ \varphi_m = \varphi(\cdot - m) : m \in c_1\mathbb{Z}^2 \right\},$$
$$\Psi(\psi; c, \alpha) = \left\{ \psi_{j,k,m} = 2^{j(\alpha+1)/4}\, \psi(S_k A_{2^j} \cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j(\alpha-1)/2} \rceil,\ m \in c\mathbb{Z}^2 \right\},$$
$$\tilde{\Psi}(\tilde{\psi}; c, \alpha) = \left\{ \tilde{\psi}_{j,k,m} = 2^{j(\alpha+1)/4}\, \tilde{\psi}(S_k^T \tilde{A}_{2^j} \cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j(\alpha-1)/2} \rceil,\ m \in \tilde{c}\mathbb{Z}^2 \right\},$$
where $\tilde{c} = (c_2, c_1)$ and $c\mathbb{Z}^2$ means $cz = (c_1 z_1, c_2 z_2)$ for $z = (z_1, z_2) \in \mathbb{Z}^2$. For easier handling define for $j \ge 0$ the set of admissible parameters
$$\Lambda_j = \left\{ \lambda = (j, k, m) : |k| \le \lceil 2^{j(\alpha-1)/2} \rceil,\ m \in c\mathbb{Z}^2 \right\}.$$
As mentioned above, we want to know whether this shearlet system forms a frame for $L^2(\mathbb{R}^2)$. Under certain conditions one can show that $\Psi(\psi; c, \alpha)$ forms a frame for the subspace of $L^2(\mathbb{R}^2)$ induced by $C_1 \cup C_3$, i.e. for
$$\left\{ f \in L^2(\mathbb{R}^2) : \text{ess-supp}\, \hat{f} \subset C_1 \cup C_3 \right\}.$$
Analogously, since $\Psi(\psi; c, \alpha)$ and $\tilde{\Psi}(\tilde{\psi}; c, \alpha)$ are linked by a rotation of $90^\circ$, $\tilde{\Psi}(\tilde{\psi}; c, \alpha)$ forms a frame for $\left\{ f \in L^2(\mathbb{R}^2) : \text{ess-supp}\, \hat{f} \subset C_2 \cup C_4 \right\}$. This implies that $SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^2)$ under certain conditions. These conditions will not be stated here; for details see [10]. But a minimal requirement that we use many times is the property of a shearlet to be feasible; it ensures that the essential support of the shearlet in the frequency domain is bounded.

Definition 4.2. Let $\delta, \gamma > 0$. A function $\psi \in L^2(\mathbb{R}^2)$ is called a $(\delta, \gamma)$-feasible shearlet if there exist $q \ge q' > 0$ and $q \ge r > 0$ such that
$$|\hat{\psi}(\xi)| \lesssim \min\{1, |q\xi_1|^\delta\} \cdot \min\{1, |q'\xi_1|^{-\gamma}\} \cdot \min\{1, |r\xi_2|^{-\gamma}\}. \tag{4.1}$$
In the following we assume $q = q' = r = 1$. Recall that by Heisenberg's uncertainty principle a shearlet that is compactly supported in the time domain cannot be compactly supported in the frequency domain; the feasibility condition still provides a decay condition on the shearlet in the frequency domain. Since the shearlet $\psi$ is compactly supported in $[0,1]^2$, the shearlets $\psi_{j,k,m}$ are supported in a parallelogram of side lengths $2^{-j\alpha/2}$ and $2^{-j/2}$, as illustrated in Figure 4.
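The anisotropic support geometry just described can be made concrete with a small numerical sketch (my own illustration; the parameter values are arbitrary): pulling the unit square back through $(S_k A_{2^j})^{-1}$ yields the parallelogram containing $\mathrm{supp}\,\psi_{j,k,m}$, with $x_2$-extent $2^{-j/2}$ and $x_1$-side $2^{-j\alpha/2}$.

```python
import numpy as np

def A(j, alpha):
    """Scaling matrix A_{2^j} = diag(2^{j alpha/2}, 2^{j/2})."""
    return np.diag([2.0 ** (j * alpha / 2), 2.0 ** (j / 2)])

def S(k):
    """Shear matrix S_k."""
    return np.array([[1.0, float(k)], [0.0, 1.0]])

j, alpha, k = 4, 1.5, 2
M = S(k) @ A(j, alpha)
# psi(S_k A_{2^j} x) is supported where S_k A_{2^j} x lies in supp psi;
# modeling supp psi = [0,1]^2, pull the unit square back through M^{-1}
unit_square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
parallelogram = unit_square @ np.linalg.inv(M).T

x2_extent = parallelogram[:, 1].max() - parallelogram[:, 1].min()
# the side produced by the scaling alone has length 2^{-j alpha/2}
x1_side = np.linalg.norm(parallelogram[1] - parallelogram[0])
```

The determinant $|\det(S_k A_{2^j})| = 2^{j(\alpha+1)/2}$ explains the $L^2$-normalization factor $2^{j(\alpha+1)/4}$ in Definition 4.1; for $\alpha \to 1$ the two side lengths coincide (square-like supports), while for $\alpha > 1$ their ratio $2^{j(\alpha-1)/2}$ grows with $j$ (line-like supports).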
For $\alpha$ close to $1$ the shearlets are square-like. But for $\alpha > 1$ the shearlets become more and more line-like as $j \to \infty$; that is, in one direction the shearlets become smaller and smaller.

Figure 4: Support of the shearlet.

4.2 Decay Rate of the N-term Approximation Error

In this section we state the main result: the $N$-term approximation error obtained with a cone-adapted shearlet system satisfying suitable conditions attains the optimal approximation rate $N^{-\alpha}$. First we need to analyze the shearlet coefficients $\langle f, \psi_\lambda \rangle$. We separate this analysis into shearlets whose support stays away from the discontinuity curve and shearlets whose support intersects the discontinuity curve. For the latter we further subdivide the analysis: first a linear discontinuity curve, then a general discontinuity curve, and finally a general discontinuity curve with finitely many corners. The last extension, to a discontinuity curve with corners, is not that difficult and will be given in Section 6. In this section we only state the results; the proofs are given in the following sections.

Let $SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha)$ be a shearlet frame for $L^2(\mathbb{R}^2)$, which we write as $SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha) = (\omega_i)_{i \in I}$ to keep the notation of Section 2.3. The canonical dual frame associated with the frame operator is denoted by $(\tilde{\omega}_i)_{i \in I}$. Then by the reconstruction formula of Theorem 2.18 every $f \in E^\beta_\alpha(\mathbb{R}^2)$ can be expanded as
$$f = \sum_{i \in I} \langle f, \omega_i \rangle \tilde{\omega}_i.$$
We define the $N$-term approximation $f_N$ of $f \in E^\beta_\alpha(\mathbb{R}^2)$ as
$$f_N = \sum_{i \in I_N} \langle f, \omega_i \rangle \tilde{\omega}_i,$$
where $(\langle f, \omega_i \rangle)_{i \in I_N}$ are the $N$ largest coefficients in magnitude. Note that this is not always the best $N$-term approximation, but it is an $N$-term approximation, and as we will see it suffices to reach the optimal decay rate.
Further note that this approximation is not linear: if $f_N$ is the $N$-term approximation of $f$ with index set $I_{f,N}$ and $g_N$ the $N$-term approximation of $g$ with index set $I_{g,N}$, then $f_N + g_N$ is the $N$-term approximation of $f + g$ only if $I_{f,N} = I_{g,N}$.

In the following sections we prove some estimates, which we now state. Note that we use generic constants in the proofs, that is, we use the same symbol for constants whose value may change from line to line. We first introduce some notation.

For a scale $j \ge 0$ and $p \in \mathbb{Z}^2$ let $Q_{j,p}$ denote the dyadic cube
$$Q_{j,p} := \left[ -2^{-j/2}, 2^{-j/2} \right]^2 + 2^{-j/2} p.$$
Next define
$$\mathcal{Q}_j := \left\{ Q_{j,p} : \mathrm{int}(Q_{j,p}) \cap \partial B \neq \emptyset \right\},$$
where $\mathrm{int}(Q_{j,p})$ denotes the interior of $Q_{j,p}$; i.e. $\mathcal{Q}_j$ is the collection of dyadic squares $Q_{j,p}$ whose interior intersects the discontinuity curve $\partial B$. Now the shearlets come into play: for a scale $j \ge 0$ and $p \in \mathbb{Z}^2$ define $\Lambda_{j,p}$ to be the set of shearlet indices whose corresponding shearlet intersects the discontinuity curve $\partial B$ in the interior of $Q_{j,p}$, i.e.
$$\Lambda_{j,p} := \left\{ \lambda \in \Lambda_j : \mathrm{int}(Q_{j,p}) \cap \partial B \cap \mathrm{int}(\mathrm{supp}\, \psi_\lambda) \neq \emptyset \right\},$$
where $\Lambda_j$ denotes the set of all shearlet indices at scale $j$. Finally, for $0 < \varepsilon < 1$, define $\Lambda_{j,p}(\varepsilon)$ to be the set of indices in $\Lambda_{j,p}$ whose shearlet coefficients $|\langle f, \psi_\lambda \rangle|$ are larger than $\varepsilon$, and $\Lambda(\varepsilon)$ to be the collection of all $\Lambda_{j,p}(\varepsilon)$ across all scales $j \ge 0$ and all $p \in \mathbb{Z}^2$:
$$\Lambda_{j,p}(\varepsilon) := \left\{ \lambda \in \Lambda_{j,p} : |\langle f, \psi_\lambda \rangle| > \varepsilon \right\} \quad \text{and} \quad \Lambda(\varepsilon) := \bigcup_{j,p} \Lambda_{j,p}(\varepsilon).$$
The set $S_{j,p} := \bigcup_{\lambda \in \Lambda_{j,p}} \mathrm{supp}\, \psi_\lambda$, for $Q_{j,p} \in \mathcal{Q}_j$, is then contained in a cubic window of size $C \cdot 2^{-j/2} \times C \cdot 2^{-j/2}$, hence of asymptotically the same size as $Q_{j,p}$.

As a first step we analyze the shearlet coefficients of shearlets not interacting with the discontinuity curve.
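To get a feeling for the sets $\mathcal{Q}_j$, the following sketch (my own illustration with an assumed circular edge curve, not taken from the thesis) counts the dyadic squares $Q_{j,p}$ whose interior meets $\partial B$; since $\partial B$ is one-dimensional, the count grows roughly like $2^{j/2}$, i.e. inversely proportionally to the side length.

```python
import numpy as np

def count_Qj_hitting_boundary(j, r=0.35, center=(0.5, 0.5)):
    """Count squares Q_{j,p} = [-2^{-j/2}, 2^{-j/2}]^2 + 2^{-j/2} p whose
    interior intersects the circle of radius r around center."""
    h = 2.0 ** (-j / 2)
    c = np.asarray(center, dtype=float)
    count = 0
    p_range = range(int(-1 / h) - 2, int(2 / h) + 3)
    for p1 in p_range:
        for p2 in p_range:
            lo = np.array([p1 * h - h, p2 * h - h])
            hi = np.array([p1 * h + h, p2 * h + h])
            nearest = np.clip(c, lo, hi)          # closest point of the square
            corners = np.array([[lo[0], lo[1]], [lo[0], hi[1]],
                                [hi[0], lo[1]], [hi[0], hi[1]]])
            dmin = np.hypot(*(nearest - c))
            dmax = np.hypot(corners[:, 0] - c[0], corners[:, 1] - c[1]).max()
            if dmin < r < dmax:  # circle passes through the square
                count += 1
    return count

n4, n8 = count_Qj_hitting_boundary(4), count_Qj_hitting_boundary(8)
```

Going from $j = 4$ to $j = 8$ halves the side length twice, so the count should grow by roughly a factor of $4$; this is the counting heuristic behind estimates over $\mathcal{Q}_j$ later on.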
Note that for computing the shearlet coefficient $\langle f, \psi_\lambda \rangle$ only the part of $f$ on the support of $\psi_\lambda$ matters. On this part $f$ is $C^\beta$-regular, since the shearlet does not intersect the discontinuity curve. Therefore it suffices to consider functions $f$ that are $C^\beta$-smooth on all of $\mathbb{R}^2$.

Proposition 4.3. Let $f \in C^\beta(\mathbb{R}^2)$ with $\mathrm{supp}\, f \subset [0,1]^2$. Suppose that $\psi \in L^2(\mathbb{R}^2)$ is compactly supported and $(\delta, \gamma)$-feasible for $\delta > \gamma + \beta$ and $\gamma > 3$. Then
$$\sum_{n > N} |c(f)^*_n|^2 \lesssim N^{-\beta+\varepsilon} \quad \text{as } N \to \infty$$
for any $\varepsilon > 0$.

So we obtain the decay rate $N^{-\beta+\varepsilon}$ for any $\varepsilon > 0$; since $\varepsilon$ can be chosen arbitrarily small, this is essentially the desired decay rate $N^{-\beta}$.

Now we come to the more interesting part: the decay rate of the shearlet coefficients associated with the discontinuity curve. By assumption the discontinuity curve is $C^\alpha$-regular;

Figure 5: In a sufficiently small cubic window $Q_{j,p}$ the discontinuity curve has one of these characteristics. In the intersection of the cubic window with the support of a shearlet and the discontinuity curve, we can choose a point and the tangent to the discontinuity at this point. The left picture is an example for Case 4a) and the right picture for Case 4b).

hence for sufficiently large $j$ the edge curve can be parametrized either by $(x_1, E(x_1))$ or by $(E(x_2), x_2)$, with $E \in C^\alpha$, in the interior of $S_{j,p}$ (see Figure 5). Now there are two cases to distinguish:

Case 4a: The discontinuity curve can be parametrized by $(x_1, E(x_1))$ or by $(E(x_2), x_2)$ with $E \in C^\alpha$ in the interior of $S_{j,p}$, such that for any $\lambda \in \Lambda_{j,p}$ we have $|E'(\hat{x}_2)| \le 2$ or $|E'(\hat{x}_1)|^{-1} \le 2$ for some $\hat{x} = (\hat{x}_1, \hat{x}_2) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\, \psi_\lambda) \cap \partial B$.

Case 4b: The discontinuity curve can be parametrized either by $(x_1, E(x_1))$ or by $(E(x_2), x_2)$ with $E \in C^\alpha$ in the interior of $S_{j,p}$, such that for any $\lambda \in \Lambda_{j,p}$ we have $|E'(\hat{x}_2)| > 2$ or $|E'(\hat{x}_1)|^{-1} > 2$ for some $\hat{x} = (\hat{x}_1, \hat{x}_2) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\, \psi_\lambda) \cap \partial B$.
Note that the case $E'(\hat{x}_1) = 0$ is covered by interpreting $E'(\hat{x}_1)^{-1} = \infty$.

As mentioned above, we first assume that the discontinuity curve is linear on the support of the shearlet and estimate the coefficient.

Proposition 4.4. Let $\psi \in L^2(\mathbb{R}^2)$ be compactly supported, and suppose that $\psi$ is $(\delta, \gamma)$-feasible for $\delta > \gamma + \beta$ and $\gamma > 4$ and satisfies
$$\left| \frac{\partial}{\partial \xi_2} \hat{\psi}(\xi) \right| \le |h(\xi_1)| \cdot \left( 1 + \frac{|\xi_2|}{|\xi_1|} \right)^{-\gamma} \tag{4.2}$$
for some $h \in L^1(\mathbb{R})$. Furthermore, let $\lambda \in \Lambda_{j,p}$ for $j \ge 0$ and $p \in \mathbb{Z}^2$. Suppose that $f \in E^\beta_\alpha(\mathbb{R}^2)$ for $1 < \alpha \le \beta \le 2$ and that $\partial B$ is linear on the support of $\psi_\lambda$ in the sense that
$$\mathrm{supp}\, \psi_\lambda \cap \partial B \subset H$$
for some affine line $H$ of $\mathbb{R}^2$. Then:

i) if $H$ has normal vector $(-1, s)$ with $s \le 3$,
$$|\langle f, \psi_\lambda \rangle| \le C\, \frac{2^{-j(\alpha+1)/4}}{\left| k + 2^{j(\alpha-1)/2} s \right|^3} \tag{4.3}$$
for some constant $C > 0$;

ii) if $H$ has normal vector $(-1, s)$ with $s \ge 3/2$,
$$|\langle f, \psi_\lambda \rangle| \le C\, 2^{-j(7\alpha-5)/4} \tag{4.4}$$
for some constant $C > 0$;

iii) if $H$ has normal vector $(0, s)$ with $s \in \mathbb{R}$, then (4.4) holds as well.

Observe that if the line is parametrized by the equation $x_1 = s \cdot x_2 + c$ for some $c \in \mathbb{R}$, its direction vector is $(s, 1)$, and therefore its normal vector is $(-1, s)$. We see that Case 4a) is handled by Proposition 4.4 i) and Case 4b) by Proposition 4.4 ii) and iii). Note that it is no problem that cases i) and ii) of Proposition 4.4 overlap, since both cases yield the desired decay rate, as we will see later.

With this estimate we can prove the more general result for shearlet coefficients of shearlets intersecting a general discontinuity curve without corners.

Theorem 4.5. Let $\psi \in L^2(\mathbb{R}^2)$ be compactly supported and suppose that $\psi$ is $(\delta, \gamma)$-feasible for $\delta > \gamma + \beta$ and $\gamma > 4$ and satisfies condition (4.2) of Proposition 4.4. Furthermore, let $\lambda \in \Lambda_{j,p}$ for $j \ge 0$ and $p \in \mathbb{Z}^2$. Suppose that $f \in E^\beta_\alpha(\mathbb{R}^2)$ for $1 < \alpha \le \beta \le 2$ and $\mu, \nu > 0$. For fixed $\tilde{x} = (\tilde{x}_1, \tilde{x}_2) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\, \psi_\lambda) \cap \partial B$ let $H$ be the tangent line to the discontinuity curve $\partial B$ at $\tilde{x} = (\tilde{x}_1, \tilde{x}_2)$.
Then:

i) if $H$ has normal vector $(-1, s)$ with $s \le 3$,
$$|\langle f, \psi_\lambda \rangle| \le C\, \frac{2^{-j(\alpha+1)/4}}{\left| k + 2^{j(\alpha-1)/2} s \right|^{\alpha+1}} \tag{4.5}$$
for some constant $C > 0$;

ii) if $H$ has normal vector $(-1, s)$ with $s \ge 3/2$,
$$|\langle f, \psi_\lambda \rangle| \le C\, 2^{-j(2\alpha^2+\alpha-1)/4} \tag{4.6}$$
for some constant $C > 0$;

iii) if $H$ has normal vector $(0, s)$ with $s \in \mathbb{R}$, then (4.6) holds as well.

With this estimate for the shearlet coefficients of shearlets intersecting a general discontinuity curve, we can compute the decay rate of the $N$-term approximation using the $N$ largest coefficients in modulus, and we will see that this rate meets the desired optimal decay rate.

Theorem 4.6. Let $c > 0$ and let $\varphi, \psi, \tilde{\psi} \in L^2(\mathbb{R}^2)$ be compactly supported. Suppose that the shearlet $\psi$ is $(\delta, \gamma)$-feasible for $\delta > \gamma + \beta$ and $\gamma > 4$ and satisfies condition (4.2), and that the shearlet $\tilde{\psi}$ satisfies the same conditions with the roles of $\xi_1$ and $\xi_2$ reversed. Further suppose that $SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^2)$. Then for every $\nu > 0$ the shearlet frame $SH(\Phi, \Psi, \tilde{\Psi}; c, \alpha)$ provides almost optimally sparse approximations of functions $f \in E^\beta_\alpha(\mathbb{R}^2)$; i.e. there exists some constant $C > 0$ such that
$$\|f - f_N\|_2^2 \le C N^{-\alpha} \cdot (\log_2 N)^{\alpha+1} \quad \text{as } N \to \infty, \tag{4.7}$$
where $f_N$ is the nonlinear $N$-term approximation of $f$ obtained by choosing the $N$ largest coefficients in magnitude.

5 Proofs

5.1 Proof of Proposition 4.3

To prove Proposition 4.3 we first establish an analogous result for functions in $H^\beta(\mathbb{R}^2)$, since for such functions we have fractional order derivatives by Definition 2.15. Having shown this result, we can transfer it to functions in $C^\beta(\mathbb{R}^2)$ by Theorem 2.13. First we prove the following estimate, which we need for the first step.

Lemma 5.1. Let $g \in H^\beta(\mathbb{R}^2)$ with $\mathrm{supp}\, g \subset [0,1]^2$. Suppose $\psi \in L^2(\mathbb{R}^2)$ is $(\delta, \gamma)$-feasible for $\delta > \gamma + \beta$, $\gamma > 3$. Then there exists a constant $B > 0$ such that
$$\sum_{j=0}^{\infty} \sum_{|k| \le 2^{j(\alpha-1)/2}} \sum_{m \in \mathbb{Z}^2} 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m} \rangle|^2 \le B\, \|\partial^{(\beta,0)} g\|_{L^2}^2.$$
Proof. Choose $\varphi$ such that $(2\pi i \xi_1)^\beta \hat{\varphi}(\xi) = \hat{\psi}(\xi)$ for $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$. Then $\varphi \in L^2(\mathbb{R}^2)$, since
$$0 \le \int_{\mathbb{R}^2} |\varphi|^2\, dx = \int_{\mathbb{R}^2} |\hat{\varphi}|^2\, d\xi = \int_{\mathbb{R}^2} \left| (2\pi i \xi_1)^{-\beta}\, \hat{\psi}(\xi) \right|^2 d\xi$$
$$\le (2\pi)^{-2\beta} \int_{\mathbb{R}} \int_{\mathbb{R}} \left| \min\{1, |\xi_1|^{\delta-\beta}\}\, \min\{1, |\xi_1|^{-\gamma}\}\, \min\{1, |\xi_2|^{-\gamma}\} \right|^2 d\xi_2\, d\xi_1$$
$$\le C\, (2\pi)^{-2\beta} \left( \int_{\mathbb{R}\setminus[-1,1]} |\xi_1|^{-2\gamma}\, d\xi_1 + \int_{-1}^{1} |\xi_1|^{2(\delta-\beta)}\, d\xi_1 \right) < \infty.$$
It follows that $D^{(\beta,0)}\varphi = \psi$, since
$$\widehat{D^{(\beta,0)}\varphi}(\xi) = (2\pi i)^{|(\beta,0)|}\, \xi^{(\beta,0)}\, \hat{\varphi}(\xi) = (2\pi i \xi_1)^\beta\, \hat{\varphi}(\xi) = \hat{\psi}(\xi),$$
and therefore
$$\left| \langle D^{(\beta,0)} g, \varphi_{j,k,m} \rangle \right|^2 = \left| \langle (2\pi i \xi_1)^\beta \hat{g}, \hat{\varphi}_{j,k,m} \rangle \right|^2 = \left| \langle g, D^{(\beta,0)} \varphi_{j,k,m} \rangle \right|^2 = 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m} \rangle|^2,$$
where in the last step we used $D^{(\beta,0)}(\varphi_{j,k,m}) = 2^{j\alpha\beta/2} (D^{(\beta,0)}\varphi)_{j,k,m} = 2^{j\alpha\beta/2}\, \psi_{j,k,m}$. Now the claim follows from a corollary shown in [10] for the three-dimensional case, which holds analogously in two dimensions:
$$\sum_{j=0}^{\infty} \sum_{|k| \le 2^{j(\alpha-1)/2}} \sum_{m \in \mathbb{Z}^2} 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m} \rangle|^2 = \sum_{j=0}^{\infty} \sum_{|k| \le 2^{j(\alpha-1)/2}} \sum_{m \in \mathbb{Z}^2} \left| \langle D^{(\beta,0)} g, \varphi_{j,k,m} \rangle \right|^2 \le B\, \|D^{(\beta,0)} g\|_{L^2}^2. \qquad \Box$$

With this estimate we are ready to prove the statement for functions in $H^\beta(\mathbb{R}^2)$.

Lemma 5.2. Let $g \in H^\beta(\mathbb{R}^2)$ with $\mathrm{supp}\, g \subset [0,1]^2$. Suppose that $\psi \in L^2(\mathbb{R}^2)$ is compactly supported and $(\delta, \gamma)$-feasible for $\delta > \gamma + \beta$ and $\gamma > 3$. Then
$$\sum_{n > N} |c(g)^*_n|^2 \lesssim N^{-\beta} \quad \text{as } N \to \infty,$$
where $c(g)^*_n$ is the $n$-th largest coefficient in magnitude among the $\langle g, \psi_\lambda \rangle$.

Proof. Define $\tilde{\Lambda}_j := \{\lambda \in \Lambda_j : \mathrm{supp}\, \psi_\lambda \cap \mathrm{supp}\, g \neq \emptyset\}$ and $N_J := \left| \bigcup_{j=0}^{J-1} \tilde{\Lambda}_j \right|$. Then $N_J$ satisfies
$$N_J \sim \sum_{j=0}^{J-1} 2^{j(\alpha-1)/2}\, 2^{j\alpha/2}\, 2^{j/2} = \sum_{j=0}^{J-1} 2^{j\alpha} \sim 2^{J\alpha},$$
since $2^{-j\alpha/2} \cdot 2^{-j/2}$ is the volume of the support of a shearlet $\psi_\lambda$ with $\lambda \in \tilde{\Lambda}_j$, so that $2^{j\alpha/2} \cdot 2^{j/2}$ bounds the number of relevant translates $m$, and $2^{j(\alpha-1)/2}$ is the number of shear parameters $k$ at scale $j$.
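The counting $N_J \sim 2^{J\alpha}$ is just a geometric sum; a quick numerical check (parameters chosen arbitrarily, my own sketch):

```python
alpha, J = 1.5, 24
# per scale j: ~2^{j(alpha-1)/2} shear parameters times
# ~2^{j(alpha+1)/2} translates whose support meets [0,1]^2
N_J = sum(2.0 ** (j * (alpha - 1) / 2) * 2.0 ** (j * (alpha + 1) / 2)
          for j in range(J))
# geometric sum: N_J = (2^{J alpha} - 1) / (2^alpha - 1), of order 2^{J alpha}
ratio = N_J / 2.0 ** (J * alpha)
```

The ratio stabilizes at $1/(2^\alpha - 1)$, confirming $N_J \asymp 2^{J\alpha}$, which is exactly what makes the reparametrization $N \sim 2^{\alpha j_0}$ at the end of the proof possible.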
Note that the sum $\sum_{n > N_{j_0}} |c(g)^*_n|^2$ does not contain the $N_{j_0}$ largest elements of $(|\langle g, \psi_{j,k,m} \rangle|)_{j,k,m}$, and therefore
$$\sum_{n > N_{j_0}} |c(g)^*_n|^2 \le \sum_{j \ge j_0} \sum_{k,m} |\langle g, \psi_{j,k,m} \rangle|^2,$$
since both sums have the same number of summands and the right-hand sum may contain some of the $N_{j_0}$ largest coefficients in place of smaller ones. This yields
$$\sum_{j_0=1}^{\infty} 2^{\alpha\beta j_0} \sum_{n > N_{j_0}} |c(g)^*_n|^2 \le C \sum_{j_0=1}^{\infty} 2^{\alpha\beta j_0} \sum_{j \ge j_0} \sum_{k,m} |\langle g, \psi_{j,k,m} \rangle|^2 = C \sum_{j=1}^{\infty} \sum_{k,m} |\langle g, \psi_{j,k,m} \rangle|^2 \sum_{j_0=1}^{j} 2^{\alpha\beta j_0} \le C \sum_{j=1}^{\infty} \sum_{k,m} |\langle g, \psi_{j,k,m} \rangle|^2\, 2^{\alpha\beta j} < \infty,$$
where the middle equality is a rearrangement of terms and the last step follows from Lemma 5.1. In particular $2^{\alpha\beta j_0} \sum_{n > N_{j_0}} |c(g)^*_n|^2 \le C$, and therefore
$$\sum_{n > N_{j_0}} |c(g)^*_n|^2 \le C\, 2^{-\alpha\beta j_0} = C \left( 2^{\alpha j_0} \right)^{-\beta} \le C N_{j_0}^{-\beta}.$$
Finally let $N > 0$; then there exists a positive integer $j_0 > 0$ such that $N \sim N_{j_0} \sim 2^{\alpha j_0}$, and with the estimate above the claim follows:
$$\sum_{n > N} |c(g)^*_n|^2 \sim \sum_{n > N_{j_0}} |c(g)^*_n|^2 \le C N_{j_0}^{-\beta} \sim C N^{-\beta}. \qquad \Box$$

Proof of Proposition 4.3. Let $f \in C^\beta(\mathbb{R}^2)$. By the embedding Theorem 2.13 for fractional order Sobolev spaces, $f$ also lies in $H^{\beta-\varepsilon}(\mathbb{R}^2)$, since $C^\beta(\mathbb{R}^2) \subset H^{\beta-\varepsilon}(\mathbb{R}^2)$. Now Lemma 5.2 yields
$$\sum_{n > N} |c(f)^*_n|^2 \lesssim N^{-(\beta-\varepsilon)} = N^{-\beta+\varepsilon} \quad \text{as } N \to \infty. \qquad \Box$$

5.2 Proof of Proposition 4.4

We are now ready to prove Proposition 4.4, i.e. the estimate for the shearlet coefficients under the assumption that the singularity curve is linear on the support of the shearlet. The proof follows the proof of the three-dimensional case as well as the proof of the two-dimensional case for $\alpha = 2$ in [9], [10]. Without loss of generality we assume that $f$ is nonzero only on $B$. We first consider cases i) and ii) of Proposition 4.4. The line can be written as
$$H = \left\{ x \in \mathbb{R}^2 : \langle x - x_0, (-1, s) \rangle = 0 \right\}$$
for some $x_0 \in \mathbb{R}^2$, since $(-1, s)$ is the normal vector, which is orthogonal to the discontinuity line, and $x_0$ gives the translation from the origin to the actual position of the discontinuity line.
Step 1. Since integration is easier along lines parallel to the discontinuity line, we shear the discontinuity line so that it becomes parallel to the $x_2$-axis:
$$S_{-s} H = \left\{ x \in \mathbb{R}^2 : \langle S_s x - x_0, (-1, s) \rangle = 0 \right\} = \left\{ x \in \mathbb{R}^2 : \langle x - S_{-s} x_0, S_s^T(-1, s) \rangle = 0 \right\}$$
$$= \left\{ x \in \mathbb{R}^2 : \langle x - S_{-s} x_0, (-1, 0) \rangle = 0 \right\} = \left\{ x \in \mathbb{R}^2 : x_1 = \hat{x}_1 \right\},$$
where $\hat{x}_1 = (S_{-s} x_0)_1$. So $x_1$ is constant on $S_{-s}H$, i.e. $S_{-s}H$ is parallel to the $x_2$-axis. But this requires a modification of the shear parameter, since we want
$$\langle f, \psi_{j,k,m} \rangle = \langle f(S_s \cdot), \psi_{j,\hat{k},m} \rangle.$$
If we define $\hat{k}$ by $\hat{k} := k + 2^{j(\alpha-1)/2} s$, this equality is fulfilled. Indeed, the substitution $y = S_s x$ shows
$$\int_{\mathbb{R}^2} f(S_s x)\, \psi\!\left( S_{\hat{k}} A_{2^j} x - m \right) dx = \int_{\mathbb{R}^2} f(y)\, \psi\!\left( S_k A_{2^j} y - m \right) dy,$$
since $S_{\hat{k}} A_{2^j} S_{-s} = S_k A_{2^j}$, i.e. $S_{\hat{k}} A_{2^j} = S_k A_{2^j} S_s$.

To simplify the integration further, we fix a new origin at $x_1 = \hat{x}_1$ (the $x_2$-coordinate of this new origin will be fixed in the next step). Then $f$ vanishes on one side of the $x_2$-axis, i.e. on one side of $S_{-s}H$, since $f$ is nonzero only on $B$; say $f$ vanishes for $x_1 < 0$. So it suffices to consider $\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat{k},m} \rangle$ for $\Omega = \mathbb{R}^+ \times \mathbb{R}$ and $f_0 \in C^\beta(\mathbb{R}^2)$.

Figure 6: (a) Support of a shearlet which intersects the discontinuity line; (b) support of the shearlet after shearing the discontinuity line and the shearlet.

Step 2. Without loss of generality we assume $\hat{k} < 0$, since the case $\hat{k} \ge 0$ can be handled similarly. By translation symmetry we can also assume $m = (0,0)$. Since $\psi$, and therefore $\psi_\lambda$, is compactly supported, we can define a parallelogram $P_{j,\hat{k}}$ that contains $\mathrm{supp}\, \psi_\lambda$ and restrict the integration to $P_{j,\hat{k}}$. By compact support there is an $L > 0$ such that $\mathrm{supp}\, \psi \subset [-L, L]^2$, and by a rescaling argument we assume $L = 1$.
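The shear-parameter modification $\hat{k} = k + 2^{j(\alpha-1)/2}s$ in Step 1 rests on a matrix identity, $S_{\hat{k}} A_{2^j} = S_k A_{2^j} S_s$, which drives the substitution $y = S_s x$ above. A quick numerical confirmation (my own sketch, arbitrary parameter values):

```python
import numpy as np

def A(j, alpha):
    """Scaling matrix A_{2^j}."""
    return np.diag([2.0 ** (j * alpha / 2), 2.0 ** (j / 2)])

def S(t):
    """Shear matrix S_t."""
    return np.array([[1.0, t], [0.0, 1.0]])

j, alpha, k, s = 3, 1.8, -2, 0.7
k_hat = k + 2.0 ** (j * (alpha - 1) / 2) * s

# S_{k_hat} A_{2^j} = S_k A_{2^j} S_s, hence
# psi(S_{k_hat} A_{2^j} x) with x = S_{-s} y equals psi(S_k A_{2^j} y)
lhs = S(k_hat) @ A(j, alpha)
rhs = S(k) @ A(j, alpha) @ S(s)
```

Equivalently $S_{\hat{k}} A_{2^j} S_{-s} = S_k A_{2^j}$, so substituting $y = S_s x$ turns $\langle f(S_s\cdot), \psi_{j,\hat{k},m} \rangle$ back into $\langle f, \psi_{j,k,m} \rangle$.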
Then $\mathrm{supp}\, \psi_{j,\hat{k},0}$ is contained in
$$P_{j,\hat{k}} = \left\{ x \in \mathbb{R}^2 : \left| (S_{\hat{k}} A_{2^j} x)_1 \right| \le 1,\ \left| (S_{\hat{k}} A_{2^j} x)_2 \right| \le 1 \right\} = \left\{ x \in \mathbb{R}^2 : \left| 2^{j\alpha/2} x_1 + \hat{k} 2^{j/2} x_2 \right| \le 1,\ |x_2| \le 2^{-j/2} \right\}.$$
One side of $P_{j,\hat{k}}$ is given by $2^{j\alpha/2} x_1 + \hat{k} 2^{j/2} x_2 = 1$, which is equivalent to $x_1 = 2^{-j\alpha/2} - 2^{-j(\alpha-1)/2} \hat{k} x_2$. Setting $x_2 = 0$ gives $x_1 = 2^{-j\alpha/2}$; that is, this side of $P_{j,\hat{k}}$ intersects the $x_1$-axis at $x_1 = 2^{-j\alpha/2}$. To fix the $x_2$-coordinate of the new origin so that this side of the parallelogram passes through the new origin, we require
$$\mathrm{supp}\, \psi_{j,\hat{k},0} \subset P_{j,\hat{k}} + \left( 2^{-j\alpha/2}, 0 \right) =: \tilde{P}_{j,\hat{k}}$$
relative to the new origin. The sides of $\tilde{P}_{j,\hat{k}}$ are given by $x_2 = \pm 2^{-j/2}$, $2^{j\alpha/2} x_1 + 2^{j/2} \hat{k} x_2 = 0$ and $2^{j\alpha/2} x_1 + 2^{j/2} \hat{k} x_2 = 2$. By a rescaling argument we assume that the right-hand side of the last equation is $1$ instead of $2$. Solving the last two equations for $x_2$ yields
$$L_1 : x_2 = -\frac{2^{j\alpha/2}}{2^{j/2}\hat{k}}\, x_1 = -\frac{2^{j(\alpha-1)/2}}{\hat{k}}\, x_1, \qquad L_2 : x_2 = \frac{1}{2^{j/2}\hat{k}} - \frac{2^{j\alpha/2}}{2^{j/2}\hat{k}}\, x_1 = \frac{2^{-j/2}}{\hat{k}} - \frac{2^{j(\alpha-1)/2}}{\hat{k}}\, x_1.$$
Since $x_1 \ge 0$ and $x_2 \ge -2^{-j/2}$, the lower bound for the integration along $x_1$ is $0$ and the upper bound is $K_1 = 2^{-j\alpha/2} + |\hat{k}|\, 2^{-j\alpha/2}$, which follows directly from solving the equation $L_2$ for $x_2 = -2^{-j/2}$. This yields the adequate but simpler integration
$$\left| \langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat{k},m} \rangle \right| \lesssim \int_0^{K_1}\! \int_{L_2}^{L_1} f_0(S_s x)\, \psi_{j,\hat{k},m}(x)\, dx_2\, dx_1,$$
where the inner integration runs along lines parallel to the singularity curve.

Step 3. To estimate the integral we use the Taylor expansion for Hölder smooth functions introduced in (2.1); this also yields a partition of the integral, whose parts we can estimate separately. The Taylor expansion of $f_0(S_s \cdot)$ in $x_2$-direction at a point $\dot{x} = (x_1, \dot{x}_2) \in L_1$ is
$$f_0\!\left( S_s \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) = f_0(S_s \dot{x}) + \left\langle \nabla f_0(S_s \dot{x}), \begin{pmatrix} s \\ 1 \end{pmatrix} \right\rangle (x_2 - \dot{x}_2) + O\!\left( |x_2 - \dot{x}_2|^\beta \right).$$
Since $\dot{x} \in L_1$ we have $x_2 - \dot{x}_2 = x_2 + \frac{2^{j(\alpha-1)/2}}{\hat{k}} x_1$, and since all partial derivatives of $f_0$ are bounded by the constant $\mu$ ($f$ is cartoon-like), it follows that
$$\left| f_0(S_s x) \right| \le C\, (1 + |s|)^\beta \left( 1 + \left| x_2 + \tfrac{2^{j(\alpha-1)/2}}{\hat{k}}\, x_1 \right| + \left| x_2 + \tfrac{2^{j(\alpha-1)/2}}{\hat{k}}\, x_1 \right|^\beta \right).$$
This yields
$$\left| \langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat{k},m} \rangle \right| \lesssim (1 + |s|)^\beta \int_0^{K_1} \sum_{l=1}^{3} I_l(x_1)\, dx_1, \tag{5.1}$$
where
$$I_1(x_1) = \left| \int_{L_2}^{L_1} \psi_{j,\hat{k},m}(x)\, dx_2 \right|, \tag{5.2}$$
$$I_2(x_1) = \left| \int_{L_2}^{L_1} \left( x_2 + \tfrac{2^{j(\alpha-1)/2}}{\hat{k}}\, x_1 \right) \psi_{j,\hat{k},m}(x)\, dx_2 \right| = \left| \int_{L_2}^{L_1} (x_2 + K_2 x_1)\, \psi_{j,\hat{k},m}(x)\, dx_2 \right|, \tag{5.3}$$
$$I_3(x_1) = \left| \int_{L_2}^{L_1} \left| x_2 + \tfrac{2^{j(\alpha-1)/2}}{\hat{k}}\, x_1 \right|^\beta \psi_{j,\hat{k},m}(x)\, dx_2 \right| = \left| \int_0^{2^{-j/2}/|\hat{k}|} x_2^\beta\, \psi_{j,\hat{k},m}(x_1, x_2 - K_2 x_1)\, dx_2 \right|, \tag{5.4}$$
and $K_2$ is defined by $K_2 := \frac{2^{j(\alpha-1)/2}}{\hat{k}}$.

Step 4. We now estimate $I_1$. An easy computation shows
$$\hat{\psi}_{j,\hat{k},0}(\xi_1, \xi_2) = 2^{-j(\alpha+1)/4}\, \hat{\psi}\!\left( 2^{-j\alpha/2}\xi_1,\ -\hat{k}\, 2^{-j\alpha/2}\xi_1 + 2^{-j/2}\xi_2 \right), \tag{5.5}$$
since
$$\hat{\psi}_{j,\hat{k},0}(\xi) = \int_{\mathbb{R}^2} \psi_{j,\hat{k},0}(x)\, e^{-2\pi i \langle x, \xi \rangle}\, dx = 2^{j(\alpha+1)/4} \int_{\mathbb{R}^2} \psi(S_{\hat{k}} A_{2^j} x)\, e^{-2\pi i \langle x, \xi \rangle}\, dx$$
$$\stackrel{y = S_{\hat{k}} A_{2^j} x}{=} 2^{-j(\alpha+1)/4} \int_{\mathbb{R}^2} \psi(y)\, e^{-2\pi i \langle A_{2^j}^{-1} S_{\hat{k}}^{-1} y,\, \xi \rangle}\, dy = 2^{-j(\alpha+1)/4} \int_{\mathbb{R}^2} \psi(y)\, e^{-2\pi i \langle y,\, (S_{\hat{k}}^{-1})^T (A_{2^j}^{-1})^T \xi \rangle}\, dy$$
$$= 2^{-j(\alpha+1)/4}\, \hat{\psi}\!\left( (S_{\hat{k}}^{-1})^T (A_{2^j}^{-1})^T \xi \right) = 2^{-j(\alpha+1)/4}\, \hat{\psi}\begin{pmatrix} 2^{-j\alpha/2}\xi_1 \\ -\hat{k}\, 2^{-j\alpha/2}\xi_1 + 2^{-j/2}\xi_2 \end{pmatrix}.$$
By feasibility of $\psi$ it follows that
$$\left| \hat{\psi}_{j,\hat{k},0}(\xi_1, 0) \right| \le 2^{-j(\alpha+1)/4}\, \min\{1, |2^{-j\alpha/2}\xi_1|^\delta\}\, \min\{1, |2^{-j\alpha/2}\xi_1|^{-\gamma}\}\, \min\{1, |\hat{k}\, 2^{-j\alpha/2}\xi_1|^{-\gamma}\} \lesssim 2^{-j(\alpha+1)/4}\, h_1\!\left( 2^{-j\alpha/2}\xi_1 \right) (1 + |\hat{k}|)^{-\gamma},$$
where $h_1$ is given by
$$h_1(\xi) := \min\{1, |\xi|^\delta\}\, \min\{1, |\xi|^{-\gamma}\},$$
which lies in $L^1(\mathbb{R})$ since $\delta > 0$ and $\gamma > 1$. The Fourier slice theorem (Theorem 2.6) applied to $\psi_{j,\hat{k},0}$ yields
$$I_1(x_1) = \left| \int_{\mathbb{R}} \psi_{j,\hat{k},0}(x)\, dx_2 \right| = \left| \int_{\mathbb{R}} \hat{\psi}_{j,\hat{k},0}(\xi_1, 0)\, e^{2\pi i x_1 \xi_1}\, d\xi_1 \right|,$$
and with the estimates above one gets
$$I_1(x_1) \lesssim 2^{-j(\alpha+1)/4}\, (1 + |\hat{k}|)^{-\gamma} \int_{\mathbb{R}} h_1\!\left( 2^{-j\alpha/2}\xi_1 \right) d\xi_1 = 2^{j(\alpha-1)/4}\, (1 + |\hat{k}|)^{-\gamma} \int_{\mathbb{R}} h_1(\xi_1)\, d\xi_1 \lesssim 2^{j(\alpha-1)/4}\, (1 + |\hat{k}|)^{-\gamma},$$
where in the last step we used that $h_1 \in L^1(\mathbb{R})$.
Step 5. In this step we estimate $I_2$, and we will see that $I_2$ decays faster than $I_1$; hence we can leave $I_2$ out of the final analysis. First we split $I_2$:
$$I_2(x_1) \le \left| \int_{\mathbb{R}} x_2\, \psi_{j,\hat{k},0}(x)\, dx_2 \right| + |K_2 x_1| \left| \int_{\mathbb{R}} \psi_{j,\hat{k},0}(x)\, dx_2 \right| =: S_1 + S_2.$$
Again applying the Fourier slice theorem, with $f_{j,\hat{k},0}(x) := x_2\, \psi_{j,\hat{k},0}(x)$,
$$S_1 = \left| \int_{\mathbb{R}} x_2\, \psi_{j,\hat{k},0}(x)\, dx_2 \right| = \left| \int_{\mathbb{R}} \hat{f}_{j,\hat{k},0}(\xi_1, 0)\, e^{2\pi i x_1 \xi_1}\, d\xi_1 \right| = \left| \int_{\mathbb{R}} \widehat{x_2 \psi_{j,\hat{k},0}}(\xi_1, 0)\, e^{2\pi i x_1 \xi_1}\, d\xi_1 \right|. \tag{5.6}$$
By the differentiation property of the Fourier transform (Proposition 2.3) we know that
$$\widehat{x_2\, \psi_{j,\hat{k},0}} = -\frac{1}{2\pi i}\, \frac{\partial}{\partial \xi_2}\, \hat{\psi}_{j,\hat{k},0}.$$
Thus we can estimate
$$S_1 \lesssim \int_{\mathbb{R}} \left| \frac{\partial}{\partial \xi_2} \hat{\psi}_{j,\hat{k},0}(\xi_1, 0) \right| d\xi_1 \lesssim 2^{-j(\alpha+1)/4}\, 2^{-j/2}\, (1 + |\hat{k}|)^{-\gamma} \int_{\mathbb{R}} \left| h(2^{-j\alpha/2}\xi_1) \right| d\xi_1 = 2^{-j(3-\alpha)/4}\, (1 + |\hat{k}|)^{-\gamma} \int_{\mathbb{R}} |h(\xi_1)|\, d\xi_1 \lesssim 2^{-j(3-\alpha)/4}\, (1 + |\hat{k}|)^{-\gamma},$$
where we used in the second step condition (4.2) of Proposition 4.4 and in the last one that $h \in L^1(\mathbb{R})$. The estimate for $S_2$ follows directly from the estimate for $I_1$, since $S_2 \le |K_2 x_1|\, I_1(x_1)$ and
$$|K_2 x_1| \le \frac{2^{j(\alpha-1)/2}}{|\hat{k}|}\, |\hat{k}|\, 2^{-j\alpha/2} = 2^{-j/2},$$
since $|x_1| \le |\hat{k}|\, 2^{-j\alpha/2}$. Hence
$$S_2 \lesssim 2^{-j/2}\, 2^{j(\alpha-1)/4}\, (1 + |\hat{k}|)^{-\gamma} = 2^{-j(3-\alpha)/4}\, (1 + |\hat{k}|)^{-\gamma}.$$
In summary, $I_2(x_1) \lesssim 2^{-j(3-\alpha)/4}\, (1 + |\hat{k}|)^{-\gamma}$.

Step 6. It remains to estimate $I_3$:
$$I_3 \le \int_0^{2^{-j/2}/|\hat{k}|} x_2^\beta\, \|\psi_{j,\hat{k},0}\|_{L^\infty}\, dx_2 \lesssim 2^{j(\alpha+1)/4} \int_0^{2^{-j/2}/|\hat{k}|} x_2^\beta\, dx_2 \lesssim 2^{j(\alpha-2\beta-1)/4}\, |\hat{k}|^{-(\beta+1)}.$$

Step 7. We are now ready to estimate $\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat{k},0} \rangle$. Since $I_2$ decays faster than $I_1$, we can leave it out of the analysis. Therefore we get
$$\left| \langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat{k},0} \rangle \right| \lesssim (1 + |s|)^\beta \int_0^{K_1} \left( \frac{2^{j(\alpha-1)/4}}{(1 + |\hat{k}|)^\gamma} + \frac{2^{j(\alpha-2\beta-1)/4}}{|\hat{k}|^{\beta+1}} \right) dx_1 = (1 + |s|)^\beta \left( \frac{2^{-j(\alpha+1)/4}}{(1 + |\hat{k}|)^{\gamma-1}} + \frac{2^{-j(\alpha+2\beta+1)/4}}{|\hat{k}|^{\beta}} \right). \tag{5.7}$$
Suppose that $s \le 3$. Then, using $\gamma - 1 > 3$, (5.7) reduces to
$$|\langle f, \psi_{j,k,m} \rangle| \lesssim \frac{2^{-j(\alpha+1)/4}}{(1 + |\hat{k}|)^3} + \frac{2^{-j(\alpha+2\beta+1)/4}}{|\hat{k}|^{\beta}} \lesssim \frac{2^{-j(\alpha+1)/4}}{(1 + |\hat{k}|)^3}.$$
Since $\hat{k} = k + 2^{j(\alpha-1)/2}s$, this gives the claimed estimate
$$|\langle f, \psi_{j,k,m} \rangle| \lesssim \frac{2^{-j(\alpha+1)/4}}{\left( 1 + \left| k + 2^{j(\alpha-1)/2} s \right| \right)^3}.$$
On the other hand, suppose that $s \ge 3/2$. Then we can use the following estimates:

• $\left| k 2^{-j(\alpha-1)/2} + s \right| \ge |s| - |k|\, 2^{-j(\alpha-1)/2} \ge \frac{1}{2} - 2^{-j(\alpha-1)/2} \ge C$ for sufficiently large $j$, since $|k| \le 2^{j(\alpha-1)/2} + 1$;

• $$\frac{2^{-j(\alpha+1)/4}}{(1 + |\hat{k}|)^3} = \frac{2^{-j(7\alpha-5)/4}}{\left( 2^{-j(\alpha-1)/2} + \left| k 2^{-j(\alpha-1)/2} + s \right| \right)^3} \le \frac{2^{-j(7\alpha-5)/4}}{\left| k 2^{-j(\alpha-1)/2} + s \right|^3} \lesssim 2^{-j(7\alpha-5)/4};$$

• the prefactor $(1 + |s|)^\beta$ is harmless, since
$$\frac{(1 + |s|)^\beta}{(|s| - C)^3} \le \left( \frac{1 + |s|}{|s| - C} \right)^{\beta} \frac{1}{(|s| - C)^{3 - \beta}} \le C.$$

Hence
$$(1 + |s|)^\beta\, \frac{2^{-j(\alpha+1)/4}}{(1 + |\hat{k}|)^3} \lesssim (1 + |s|)^\beta\, \frac{2^{-j(7\alpha-5)/4}}{\left| |s| - \left| 1 - 2^{-j(\alpha-1)/2} \right| \right|^3} \lesssim 2^{-j(7\alpha-5)/4}.$$

Step 8. Now we consider case iii), where a similar computation applies. Suppose $\partial B$ can be parametrized by $(x_1, E(x_1))$ with $E'(x_1) = 0$ and $E \in C^\alpha$. In this case we do not need to shear the discontinuity curve, since it is already parallel to the $x_1$-axis. Again let $P_{j,k}$ be the parallelogram containing the support of $\psi_{j,k,0}$, and fix the new origin such that one side of the parallelogram passes through it, i.e. define $\tilde{P}_{j,k} := P_{j,k} - (2^{-j\alpha/2}, 0)$, so that relative to the new origin $\mathrm{supp}\, \psi_{j,k,0} \subset \tilde{P}_{j,k}$. Then $\tilde{P}_{j,k}$ has the sides $|x_2| \le 2^{-j/2}$ and
$$L_1 : x_1 = -2^{-j(\alpha-1)/2} k x_2, \qquad L_2 : x_1 = 2 \cdot 2^{-j\alpha/2} - 2^{-j(\alpha-1)/2} k x_2.$$
Therefore
$$\langle f_0 \chi_{\tilde{\Omega}}, \psi_{j,k,0} \rangle = C \int_0^{2^{-j/2}}\! \int_{L_2}^{L_1} f_0\, \psi_{j,k,0}\, dx_1\, dx_2,$$
where $\tilde{\Omega} = \mathbb{R} \times \mathbb{R}^+$. Taylor expansion of $f_0$ in $x_1$-direction at $\dot{x} = (\dot{x}_1, \dot{x}_2) \in L_1$ yields
$$f_0\begin{pmatrix} x_1 \\ \dot{x}_2 \end{pmatrix} = f_0\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} + \frac{\partial}{\partial x_1} f_0\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} \left( x_1 + 2^{-j(\alpha-1)/2} k x_2 \right) + O\!\left( \left| x_1 + 2^{-j(\alpha-1)/2} k x_2 \right|^\beta \right).$$
By the feasibility condition (4.1) we can conclude that
$$\int_{\mathbb{R}} x_1^l\, \psi(x)\, dx_1 = 0 \quad \text{for all } x_2 \in \mathbb{R},\ l = 0, 1, \tag{5.8}$$
since
$$0 \le |\hat{\psi}(0, \xi_2)| \le \min\{1, 0^\delta\}\, \min\{1, |\xi_2|^{-\gamma}\} = 0$$
and, using $|\hat{\psi}(\xi_1, \xi_2)| \lesssim |\xi_1|^\delta$ with $\delta > 1$,
$$\frac{\partial}{\partial \xi_1} \hat{\psi}(0, \xi_2) = \lim_{\xi_1 \to 0} \frac{\hat{\psi}(\xi_1, \xi_2) - \hat{\psi}(0, \xi_2)}{\xi_1} = \lim_{\xi_1 \to 0} \frac{\hat{\psi}(\xi_1, \xi_2)}{\xi_1} = 0.$$
Shearing preserves vanishing moments, since
$$\int_{\mathbb{R}} x_1^l\, \psi\!\left( S_k (x_1, x_2)^T \right) dx_1 = \int_{\mathbb{R}} (x_1 - k x_2)^l\, \psi\!\left( (x_1, x_2)^T \right) dx_1.$$
Thus, by (5.8), the zeroth and first order Taylor terms integrate to zero, and the integration reduces to
$$\left| \langle f_0 \chi_{\tilde{\Omega}}, \psi_{j,k,0} \rangle \right| \le \|\psi\|_\infty\, 2^{j(\alpha+1)/4} \int_0^{2^{-j/2}}\! \int_{L_2}^{L_1} \left| x_1 + 2^{-j(\alpha-1)/2} k x_2 \right|^\beta dx_1\, dx_2 \lesssim 2^{j(\alpha+1)/4} \int_0^{2^{-j/2}}\! \int_0^{2 \cdot 2^{-j\alpha/2}} u^\beta\, du\, dx_2 \lesssim 2^{-j(2\alpha\beta + \alpha + 1)/4},$$
which is at most the bound $2^{-j(7\alpha-5)/4}$ claimed in (4.4), since $2\alpha\beta + \alpha + 1 \ge 7\alpha - 5$ is equivalent to $\beta \ge 3 - 3/\alpha$, which holds for $\beta \ge \alpha$. $\Box$
Figure 7: Shearlet intersecting the boundary curve and the smallest parallelogram $P$ that entirely contains the curve in the interior of the shearlet.

5.3 Proof of Theorem 4.5

Let $(j,k,m) \in \Lambda_{j,p}$, fix $\hat x = (\hat x_1, \hat x_2) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \partial B$, and let $s$ be the slope of the tangent to the edge curve $\partial B$ at $(\hat x_1, \hat x_2)$. Without loss of generality we can assume, by translation symmetry, that the edge curve satisfies $E(0) = 0$ and, as before, that $m = 0$. Now select $P$ to be the smallest parallelogram that entirely contains the edge curve, parametrized by $(x_1, E(x_1))$ or $(E(x_2), x_2)$, in the interior of $\mathrm{supp}\,\psi_{j,k,0}$, and whose two sides are parallel to the tangent to the edge curve at $(\hat x_1, \hat x_2) = (0,0)$. Now we can split the shearlet coefficients in the following way:
\[ \langle f, \psi_{j,k,0}\rangle = \langle \chi_P f, \psi_{j,k,0}\rangle + \langle \chi_{P^C} f, \psi_{j,k,0}\rangle = \langle \chi_P f, \psi_{j,k,0}\rangle + \langle \chi_{P^C} f(S_s\cdot), \psi_{j,\hat k,0}\rangle, \]
where the shearing operation is used for the part outside the parallelogram, since for this part we can use the estimate for a linear discontinuity curve. For this reason we can concentrate in this step on computing $\langle \chi_P f, \psi_{j,k,0}\rangle$.

First we assume that the edge curve can be parametrized by $(x_1, E(x_1))$ with the slope of the tangent at $(0,0)$ not equal to zero, or by $(E(x_2), x_2)$, where $E \in C^\alpha$. Estimating the lengths of the sides of $P$ will give us the volume of $P$, and therefore an estimate for the shearlet coefficients. Let $d$ be the length of those sides of the parallelogram that are parallel to the tangent. We observe that $d$ is the distance between the two points in which the tangent intersects the boundary of $\mathrm{supp}\,\psi_{j,k,0}$. From this observation it follows that
\[ d = \frac{2^{-j/2}}{|\hat k|}\,\sqrt{s^2+1}. \]
To see this, remember that the parallelogram containing the support of $\psi_{j,k,0}$ has the sides
\[ \big|2^{j\alpha/2}x_1 + k2^{j/2}x_2\big| \le 1, \]
which, with $x_1 = sx_2$, yields
\[ x_{2\pm} = \pm\frac{1}{2^{j\alpha/2}s + k2^{j/2}} = \pm\frac{2^{-j/2}}{\hat k}, \qquad x_{1\pm} = s\,x_{2\pm} = \pm s\,\frac{2^{-j/2}}{\hat k}, \]
and the distance is given by
\[ d = \Big\|\begin{pmatrix} x_{1+} \\ x_{2+} \end{pmatrix} - \begin{pmatrix} x_{1-} \\ x_{2-} \end{pmatrix}\Big\|. \]
Now we can also estimate the width of the parallelogram $P$, call it $\tilde d$. Since the edge curve can be parametrized by a $C^\alpha$-function $E$ with bounded curvature,
\[ \tilde d \le \Big(\frac{2^{-j/2}}{|\hat k|}\,\sqrt{s^2+1}\Big)^{\alpha}. \]
In summary, the volume of $P$ can be estimated as
\[ \mathrm{Vol}(P) \le C\,\Big(\frac{2^{-j/2}}{|\hat k|}\,\sqrt{s^2+1}\Big)^{\alpha+1} = C\,\frac{(s^2+1)^{(\alpha+1)/2}}{|\hat k|^{\alpha+1}}\,2^{-j(\alpha+1)/2}. \]
This implies
\[ |\langle f\chi_P, \psi_{j,k,0}\rangle| \lesssim 2^{j(\alpha+1)/4}\,\|f\|_\infty\|\psi\|_\infty\,\frac{(s^2+1)^{(\alpha+1)/2}}{|\hat k|^{\alpha+1}}\,2^{-j(\alpha+1)/2} \lesssim 2^{-j(\alpha+1)/4}\,\frac{(s^2+1)^{(\alpha+1)/2}}{|\hat k|^{\alpha+1}}. \quad (5.9) \]
For $s < 3$ this yields $|\langle f\chi_P, \psi_{j,k,0}\rangle| \lesssim 2^{-j(\alpha+1)/4}/|\hat k|^{\alpha+1}$ and
\[ |\langle \chi_{P^C} f(S_s\cdot), \psi_{j,\hat k,0}\rangle| \lesssim \frac{2^{-j(\alpha+1)/4}}{(1+|\hat k|)^3}, \]
and with the estimate for the linear part we are left with:
\[ |\langle f, \psi_\lambda\rangle| \le C\Big(\frac{2^{-j(\alpha+1)/4}}{|\hat k|^{\alpha+1}} + \frac{2^{-j(\alpha+1)/4}}{(1+|\hat k|)^3}\Big) \le C\,\frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s|^{\alpha+1}}, \]
where we used in the last step that $(1+|\hat k|)^3 \ge (1+|\hat k|)^{\alpha+1} \ge |\hat k|^{\alpha+1}$, since $\alpha+1 \le 3$.

For the case that $s > 3/2$ we first observe the following:
\[ \frac{(1+s^2)^{(\alpha+1)/2}}{|s|^{\alpha+1}} = \Big(\frac{s^2+1}{s^2}\Big)^{(\alpha+1)/2} = \Big(1 + \frac{1}{s^2}\Big)^{(\alpha+1)/2} \le 2^{(\alpha+1)/2} \le 2^{3/2}, \]
where we used that $1/s^2 < 1$, since $s > 1$. Now estimate (5.9) yields
\begin{align*}
|\langle f\chi_P, \psi_{j,k,0}\rangle| &\lesssim 2^{-j(\alpha+1)/4}\,\frac{(s^2+1)^{(\alpha+1)/2}}{|k + 2^{j(\alpha-1)/2}s|^{\alpha+1}}
= 2^{-j(\alpha+1)/4}\,\frac{(s^2+1)^{(\alpha+1)/2}}{|s|^{\alpha+1}}\cdot\frac{1}{|k/s + 2^{j(\alpha-1)/2}|^{\alpha+1}} \\
&\lesssim \frac{2^{-j(\alpha+1)/4}}{2^{j(\alpha-1)(\alpha+1)/2}} = 2^{-j(2\alpha^2+\alpha-1)/4}.
\end{align*}
Together with the estimate for the linear part, $|\langle \chi_{P^C} f(S_s\cdot), \psi_{j,\hat k,0}\rangle| \lesssim 2^{-j(7\alpha-5)/4}$, we get
\[ |\langle f, \psi_{j,\hat k,0}\rangle| \lesssim 2^{-j(7\alpha-5)/4} + 2^{-j(2\alpha^2+\alpha-1)/4} \lesssim 2^{-j(2\alpha^2+\alpha-1)/4}, \]
since $(7\alpha-5)/4 \ge \alpha^2/2 + \alpha/4 - 1/4$ for $1 \le \alpha \le 2$, with equality if and only if $\alpha \in \{1,2\}$. We still have to handle the case that the edge curve is parametrized by $(x_1, E(x_1))$ with tangent parallel to the $x_1$-axis.
Again we let $P$ be the parallelogram that contains the edge curve in the interior of $\mathrm{supp}\,\psi_{j,k,0}$ and observe that one side of $P$ is parallel to the $x_1$-axis; hence the length $d$ of this side is given by the distance between the boundary lines of $\mathrm{supp}\,\psi_{j,k,0}$. This observation yields
\[ d = 2^{-j\alpha/2}, \qquad \tilde d = 2^{-j\alpha^2/2}, \qquad \text{and} \qquad \mathrm{Vol}(P) \lesssim 2^{-j\alpha(1+\alpha)/2}. \]
And finally we have
\[ |\langle f\chi_P, \psi_{j,k,0}\rangle| \lesssim 2^{j(\alpha+1)/4}\,\|f\|_\infty\|\psi\|_\infty\,2^{-j\alpha(1+\alpha)/2} \lesssim 2^{j(1-\alpha-2\alpha^2)/4}. \]
With the same argumentation as in case ii) we obtain the desired result.

5.4 Proof of the Main Result Theorem 4.6

Now we want to prove the main result for a cartoon-like image with $C^\alpha$-regularity, but without corners. The result of Theorem 4.3 shows that shearlet coefficients of shearlets that do not interact with the discontinuity curve can be neglected: since for the restriction $1 < \alpha \le \beta$ it holds that $N^{-\beta+\varepsilon} \le N^{-\alpha+\varepsilon}$, these shearlet coefficients meet the decay rate. Now we want to estimate the other shearlet coefficients. We do this in two steps: we first estimate $|\Lambda_{j,p}(\varepsilon)|$ and $|\Lambda_{j,p}|$ for case 4a) and then for case 4b).

Claim 1a: For case 4a) we have the following estimate:
\[ |\langle f, \psi_\lambda\rangle| \le 2^{-j(\alpha+1)/4}. \]
Proof. Since $\psi \in L^2(\mathbb{R}^2)$ and $\psi$ is compactly supported on a bounded set $\Omega \subset \mathbb{R}^2$, it holds that $\psi|_\Omega \in L^2(\Omega) \subset L^1(\Omega)$ by the embedding property of the Lebesgue spaces. Hence $\psi \in L^1(\mathbb{R}^2)$ and $\|\psi\|_{L^1} \le C$ for some positive constant $C$. The Hölder inequality implies:
\[ |\langle f, \psi_\lambda\rangle| \le \|f\|_\infty\,\|\psi_\lambda\|_{L^1} \le \|f\|_\infty\,C\cdot 2^{-j(\alpha+1)/4} \le 2\mu C\,2^{-j(\alpha+1)/4}, \]
where in the last step we used that $\|f_i\|_{C^\beta} \le \mu$, $i \in \{0,1\}$, by definition. This in particular implies $\sup_x|f_i(x)| \le \mu$ and $\sup_x|f(x)| \le \sup_x|f_0(x)| + \sup_x|f_1(x)| \le 2\mu$. For simplicity we may assume that $2\mu C = 1$, which proves the claim.

For estimating $|\Lambda_{j,p}(\varepsilon)|$ we restrict our attention to the shearlet coefficients that are larger than $\varepsilon$ in absolute value. We will see that we can restrict our attention to a special range of scales $j$.
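The scaling factor $2^{-j(\alpha+1)/4}$ for $\|\psi_\lambda\|_{L^1}$ used in Claim 1a follows from a change of variables; a short sketch, assuming the normalization $\psi_{j,k,m}(x) = 2^{j(\alpha+1)/4}\,\psi(S_kA_{\alpha,j}x - m)$ with $A_{\alpha,j} = \mathrm{diag}(2^{j\alpha/2}, 2^{j/2})$ (an assumption consistent with the theoretical dilation matrix stated in Section 7):

```latex
\|\psi_{j,k,m}\|_{L^1}
  = 2^{j(\alpha+1)/4}\int_{\mathbb{R}^2}\bigl|\psi(S_kA_{\alpha,j}x - m)\bigr|\,dx
  = \frac{2^{j(\alpha+1)/4}}{\det(A_{\alpha,j})}\,\|\psi\|_{L^1}
  = 2^{j(\alpha+1)/4}\,2^{-j(\alpha+1)/2}\,\|\psi\|_{L^1}
  = 2^{-j(\alpha+1)/4}\,\|\psi\|_{L^1},
```

since $\det S_k = 1$ and $\det A_{\alpha,j} = 2^{j\alpha/2}\cdot 2^{j/2} = 2^{j(\alpha+1)/2}$.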
Claim 2a: $\Lambda_{j,p}(\varepsilon)$ contains only shearlet coefficients at scales $j$ that meet the inequality
\[ j \le \frac{4}{\alpha+1}\log_2(\varepsilon^{-1}). \]
Proof. By the definition of $\Lambda_{j,p}(\varepsilon)$ and Claim 1a) it holds:
\[ \varepsilon \le |\langle f, \psi_\lambda\rangle| \le 2^{-j(\alpha+1)/4} \iff \log_2(\varepsilon) \le -j(\alpha+1)/4 \iff j \le \frac{4}{\alpha+1}\log_2(\varepsilon^{-1}). \]

For the following step we need some new notation, which we introduce now:
\[ M_{j,k,Q_{j,p}} := \big\{ m \in \mathbb{Z}^2 : \mathrm{supp}(\psi_{j,k,m}) \cap \mathrm{int}(Q_{j,p}) \cap \partial B \ne \emptyset \big\}. \]
For $m = (m_1, 0)$ define
\[ P_{j,k,m} = P_{j,k} + \big(2^{-j\alpha/2}m_1, 0\big), \quad (5.10) \]
and its crossline $P_0$, given by
\[ P_{0,j,m} = \big\{ x \in \mathbb{R}^2 : x_1 + k2^{-j(\alpha-1)/2}x_2 = 2^{-j\alpha/2}m_1,\ |x_2| \le 2^{-j/2} \big\}. \]

Claim 3a: For each shear index $k$ and $s \in [-2,2]$ it holds:
\[ \big|M_{j,k,Q_{j,p}}\big| \le C\big(|k + 2^{j(\alpha-1)/2}s| + 1\big), \quad (5.11) \]
and this bound is independent of the choice of the point $\hat x \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}(\psi_\lambda)) \cap \partial B$.

Proof. Independence. Let $\hat x'$ be another point in $\mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}(\psi_\lambda)) \cap \partial B$, and let $s$ and $s'$ be the associated slopes of the tangents to the discontinuity curve $E$ at $\hat x$ and $\hat x'$. Since $E \in C^\alpha$ and $\alpha - 1$ is the fractional part of $\alpha$, by definition of the Hölder space there is a constant $C_1 > 0$, independent of $\hat x, \hat x'$, such that:
\[ |s - s'| = |E'(\hat x_2) - E'(\hat x_2')| \le C_1\,|\hat x_2 - \hat x_2'|^{\alpha-1} \le C_1\,\big(2^{-j/2}\big)^{\alpha-1} = C_1\,2^{-j(\alpha-1)/2}, \quad (5.12) \]
where we used that $|\hat x_2 - \hat x_2'| \le 2^{-j/2}$. And hence,
\begin{align*}
|k + 2^{j(\alpha-1)/2}s'| &= |k + 2^{j(\alpha-1)/2}s - 2^{j(\alpha-1)/2}s + 2^{j(\alpha-1)/2}s'| \\
&\le |k + 2^{j(\alpha-1)/2}s| + 2^{j(\alpha-1)/2}|s - s'| \le |k + 2^{j(\alpha-1)/2}s| + C_1 \le C\big(|k + 2^{j(\alpha-1)/2}s| + 1\big).
\end{align*}
This proves that estimate (5.11) remains asymptotically the same, independently of the values of $s$ and $s'$.

Estimation. For each fixed $j, k$ we want to count the number of translates $m \in \mathbb{Z}^2$ such that $\mathrm{supp}(\psi_{j,k,m})$ intersects the discontinuity curve inside $Q_{j,p}$. Observe that for fixed $m_1$ only a finite number of $m_2$-translates can fulfill $\mathrm{supp}(\psi_{j,k,m}) \cap \mathrm{int}(Q_{j,p}) \cap \partial B \ne \emptyset$, since this number is bounded by the number of parallelograms $P_{j,k,m}$ that intersect $Q_{j,p}$, and this is independent of the translate $p$ of the cube, because the sizes of $P_{j,k,m}$ and $Q_{j,p}$ do not change under translation.
For this reason it suffices to estimate the number of relevant $m_1$-translates for a fixed $m_2$; the number of $m$-translates is then obtained by multiplying the number of $m_1$-translates with a fixed constant:
\[ \big|M_{j,k,Q_{j,p}}\big| \le C\,\big|\{ m_1 \in \mathbb{Z} : \mathrm{supp}(\psi_{j,k,m}) \cap \mathrm{int}(Q_{j,p}) \cap \partial B \ne \emptyset \}\big|. \]
For simplicity fix $m_2 = 0$. Without loss of generality assume $Q = Q_{j,p} = [-2^{-j/2}, 2^{-j/2}]^2$, and let $H$ be the tangent line to $\partial B$ at $(0,0)$. Note that $\mathrm{supp}(\psi_{j,k,m}) \subset P_{j,k,m}$, since $\mathrm{supp}(\psi_{j,k,0}) \subset P_{j,k}$, for $m = (m_1, 0)$. So we can substitute in the following way:
\[ \big|M_{j,k,Q_{j,p}}\big| \le C\,\big|\{ m_1 \in \mathbb{Z} : P_{j,k,m} \cap \mathrm{int}(Q_{j,p}) \cap \partial B \ne \emptyset \}\big|. \]
By the definition of a cartoon-like image the curvature of $\partial B$ is bounded, so replacing $\partial B$ by the tangent line $H$, meaning
\[ \big|M_{j,k,Q_{j,p}}\big| \le C\,\big|\{ m_1 \in \mathbb{Z} : P_{j,k,m} \cap \mathrm{int}(Q_{j,p}) \cap H \ne \emptyset \}\big|, \]
does not change the asymptotic behavior of the estimate. Replacing $P_{j,k,m}$ by its crossline $P_0$ does not change it either. Altogether we arrive at a description that is much easier to handle:
\[ \big|M_{j,k,Q_{j,p}}\big| \le C\,\big|\{ m_1 \in \mathbb{Z} : P_{0,j,m} \cap \mathrm{int}(Q_{j,p}) \cap H \ne \emptyset \}\big|. \]
Solving $P_{0,j,m}$ for $x_1$ yields:
\[ x_1 = 2^{-j\alpha/2}m_1 - 2^{-j(\alpha-1)/2}kx_2. \]
With the description of the tangent line $H$: $x_1 = sx_2$, equating both expressions gives:
\begin{align*}
sx_2 = 2^{-j\alpha/2}m_1 - 2^{-j(\alpha-1)/2}kx_2 &\iff 2^{-j\alpha/2}m_1 = sx_2 + 2^{-j(\alpha-1)/2}kx_2 \\
&\iff m_1 = \big(2^{j\alpha/2}s + 2^{j/2}k\big)x_2 \iff m_1 = \big(2^{j(\alpha-1)/2}s + k\big)\,2^{j/2}x_2.
\end{align*}
Using that $|x_2| \le 2^{-j/2}$ gives the desired estimate for $m_1$:
\[ |m_1| \le \big|2^{j(\alpha-1)/2}s + k\big|, \]
and the claim is proven.

We have just proved that estimate (5.11) is independent of the choice of the point $\hat x \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}(\psi_\lambda)) \cap \partial B$. But it is also important that any choice of $\hat x$ yields essentially the same estimate for the shearlet coefficient itself. This holds for sufficiently large scaling index $j$.
Claim 4a: Let $\hat x'$ be another point in $\mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}(\psi_\lambda)) \cap \partial B$, and let $s$ and $s'$ be the associated slopes of the tangents to the discontinuity curve $E$ at $\hat x$ and $\hat x'$. Then the following estimate holds:
\[ \frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s'|^{\alpha+1}} \le 2^{\alpha+1}\,\frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s|^{\alpha+1}}. \]
Proof. Without loss of generality we assume that $|k + 2^{j(\alpha-1)/2}s| \ge 2C_1$, where $C_1$ is the Hölder constant appearing in (5.12). This can be seen as follows:
\begin{align*}
\big\{ k \in \mathbb{Z} : |k + 2^{j(\alpha-1)/2}s| < 2C_1 \big\} &= \big\{ k \in \mathbb{Z} : -2C_1 < k + 2^{j(\alpha-1)/2}s < 2C_1 \big\} \\
&= \big\{ k \in \mathbb{Z} : -2C_1 - 2^{j(\alpha-1)/2}s < k < 2C_1 - 2^{j(\alpha-1)/2}s \big\},
\end{align*}
and $\big(2C_1 - 2^{j(\alpha-1)/2}s\big) - \big(-2C_1 - 2^{j(\alpha-1)/2}s\big) = 4C_1$. Hence, the number of parameters $k$ that do not fulfill the assumption is bounded by a constant independent of $j$. Now from (5.12) it follows that
\[ |k + 2^{j(\alpha-1)/2}s| \ge 2C_1 \ge 2\cdot 2^{j(\alpha-1)/2}|s - s'| \ge 2\big(|k + 2^{j(\alpha-1)/2}s| - |k + 2^{j(\alpha-1)/2}s'|\big), \]
which implies
\[ 2\,|k + 2^{j(\alpha-1)/2}s'| \ge |k + 2^{j(\alpha-1)/2}s|, \]
and therefore
\[ \frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s'|^{\alpha+1}} \le 2^{\alpha+1}\,\frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s|^{\alpha+1}}. \]

Now we are ready to present the estimate for $|\Lambda_{j,p}(\varepsilon)|$.

Claim 5a: The number of coefficients in $\Lambda_{j,p}(\varepsilon)$ for fixed $j$ and case 4a) can be estimated as follows:
\[ |\Lambda_{j,p}(\varepsilon)| \le C\,\big(\varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4} + 1\big)^2. \]
Proof. In this case Theorem 4.5 gives us the following estimate:
\[ \varepsilon \le |\langle f, \psi_\lambda\rangle| \le \frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s|^{\alpha+1}} \;\Rightarrow\; |k + 2^{j(\alpha-1)/2}s| \le \varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4}. \quad (5.13) \]
Let $K_j(\varepsilon) = \big\{ k \in \mathbb{Z} : |k + 2^{j(\alpha-1)/2}s| \le \varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4} \big\}$. Since $\Lambda_{j,p}(\varepsilon)$ is the union of the $M_{j,k,Q_{j,p}}$ over $k$, we can conclude with the help of Claim 3a):
\begin{align*}
|\Lambda_{j,p}(\varepsilon)| \le C\sum_{k\in K_j(\varepsilon)} \big|M_{j,k,Q_{j,p}}\big| &\le C\sum_{k\in K_j(\varepsilon)} \big(|k + 2^{j(\alpha-1)/2}s| + 1\big) \\
&\le C\sum_{k\in K_j(\varepsilon)} \big(\varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4} + 1\big) \le C\,\big(\varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4} + 1\big)^2,
\end{align*}
where in the last step we used that the number of $\hat k$ with $|\hat k| \le \varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4}$ is bounded by $2\,\varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4}$, and therefore the number of $k \in K_j(\varepsilon)$ is bounded by the same number.

The next step is to derive analogous results for case 4b).
Note that, by arguments analogous to the proof of independence in Claim 3a), it suffices to consider only one fixed $\hat x \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}(\psi_\lambda)) \cap \partial B$ with associated slope $s$.

Claim 1b: $\Lambda_{j,p}(\varepsilon)$ contains only shearlet coefficients at scales $j$ that meet the estimate
\[ j \le \frac{4}{2\alpha^2+\alpha-1}\,\log_2(\varepsilon^{-1}). \]
Proof. With Theorem 4.5 ii) respectively iii) and $(j,k,m) \in \Lambda_{j,p}(\varepsilon)$ it follows:
\[ \varepsilon \le |\langle f, \psi_\lambda\rangle| \le 2^{-j(2\alpha^2+\alpha-1)/4} \iff \log_2(\varepsilon) \le -j(2\alpha^2+\alpha-1)/4 \iff j \le \frac{4}{2\alpha^2+\alpha-1}\,\log_2(\varepsilon^{-1}). \]

In this case it suffices for the main result to use a very crude estimate for $\Lambda_{j,p}(\varepsilon)$. This is displayed in the following claim:

Claim 2b: The number of coefficients in $\Lambda_{j,p}(\varepsilon)$ for fixed $j$ and case 4b) can be estimated as
\[ |\Lambda_{j,p}(\varepsilon)| \lesssim 2^{j\alpha/2}. \]
Proof. With the same argumentation as before: to estimate the number of translates $m$ for fixed $j, k$ such that $\mathrm{supp}(\psi_{j,k,m})$ intersects the discontinuity curve inside $Q_{j,p}$, it suffices to estimate the number of $m_1$-translates for fixed $m_2$. So let $m_2$ be fixed; then we can bound the number of possible $m_1$-translates by $2^{j/2}$, since $Q_{j,p}$ is a cube of size $C\cdot 2^{-j/2}$ with $C = 2$. By the definition of the shearlet frame, the number of shear parameters $k$ is bounded by $C\cdot 2^{j(\alpha-1)/2}$. Therefore
\[ |\Lambda_{j,p}| \le C\cdot 2^{j/2}\cdot 2^{j(\alpha-1)/2} = C\cdot 2^{j\alpha/2}, \]
and in particular $|\Lambda_{j,p}(\varepsilon)| \le C\cdot 2^{j\alpha/2}$, since $\Lambda_{j,p}(\varepsilon) \subset \Lambda_{j,p}$.

Now the estimate of $|\Lambda_{j,p}(\varepsilon)|$ is known for all relevant scales $j$, so we can estimate $|\Lambda(\varepsilon)|$ by accumulating $|\Lambda_{j,p}(\varepsilon)|$ over all relevant scales $j$ and all translations $p$ of the cubic window $Q_{j,0}$ that contain a part of the discontinuity curve. Since the discontinuity curve is contained in $[-1,1]^2$, only the cubic windows $Q_{j,p} \subset [-1,1]^2$ are relevant; for a fixed scale $j$ these are fewer than the side length of $[-1,1]^2$ divided by the side length of the cubic window, i.e. the number of translations $p$ is bounded by $C\cdot 1/2^{-j/2} = C\cdot 2^{j/2}$.
Note that $[-1,1]^2$ of course contains $(2^{j/2})^2 = 2^j$ cubic windows, but we only count the ones which meet the discontinuity. With the estimates of $|\Lambda_{j,p}(\varepsilon)|$ for the cases 4a) and 4b) it follows:
\[ |\Lambda(\varepsilon)| \le C\sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j/2}\big(\varepsilon^{-1/(\alpha+1)}\,2^{-j/4} + 1\big)^2 + C\sum_{j=0}^{\frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})+C} 2^{j/2}\cdot 2^{j\alpha/2} \le C\,\log_2(\varepsilon^{-1})\,\varepsilon^{-\frac{2}{\alpha+1}}. \]
To see this, note that for $\alpha \le 2$ it is $\varepsilon^{-\frac{1}{\alpha-1/2}} \le \varepsilon^{-\frac{2}{\alpha+1}}$, and:
\begin{align*}
\sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j/2}\big(\varepsilon^{-1/(\alpha+1)}\,2^{-j/4} + 1\big)^2
&\le \frac{4}{\alpha+1}\log_2(\varepsilon^{-1})\cdot 2^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})/2}\,\big(\varepsilon^{-1/(\alpha+1)}\,2^{-\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})/4} + 1\big)^2 \\
&= \frac{4}{\alpha+1}\log_2(\varepsilon^{-1})\cdot \varepsilon^{-2/(\alpha+1)}\,\big(\varepsilon^{-1/(\alpha+1)}\cdot\varepsilon^{1/(\alpha+1)} + 1\big)^2 \\
&\lesssim \log_2(\varepsilon^{-1})\,\varepsilon^{-2/(\alpha+1)},
\end{align*}
and:
\begin{align*}
\sum_{j=0}^{\frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})+C} 2^{j(\alpha+1)/2}
&\le \frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})\cdot 2^{\frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})\,(\alpha+1)/2} \\
&= \frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})\cdot \varepsilon^{-\frac{1}{\alpha-1/2}} \lesssim \log_2(\varepsilon^{-1})\,\varepsilon^{-\frac{1}{\alpha-1/2}}.
\end{align*}
We know that $|\Lambda(\varepsilon)|$ is the number of shearlets $\psi_\lambda$ with shearlet coefficients $\langle f, \psi_\lambda\rangle$ larger than $\varepsilon$ in magnitude. Setting $n = |\Lambda(\varepsilon)|$ gives
\begin{align*}
n = |\Lambda(\varepsilon)| \le C\,\log_2(\varepsilon^{-1})\,\varepsilon^{-2/(\alpha+1)}
&\iff n^{(\alpha+1)/2} \le C\,\log_2(\varepsilon^{-1})^{(\alpha+1)/2}\,\varepsilon^{-1} \\
&\iff \varepsilon \le C\,n^{-(\alpha+1)/2}\,\log_2(\varepsilon^{-1})^{(\alpha+1)/2}.
\end{align*}
With $n$ large enough, more precisely $n > \varepsilon^{-1}$, this yields:
\[ \varepsilon \le n^{-(\alpha+1)/2}\,\log_2(n)^{(\alpha+1)/2}. \]
That is, for $n \in \mathbb{N}$ fewer than $n$ shearlet coefficients are larger than $n^{-(\alpha+1)/2}\log_2(n)^{(\alpha+1)/2}$, and in particular the $n$-th largest shearlet coefficient is smaller than this value:
\[ |\theta(f)|_n \le n^{-(\alpha+1)/2}\,\log_2(n)^{(\alpha+1)/2}. \]
This implies
\[ \sum_{n>N} |\theta(f)|_n^2 \le \sum_{n>N} n^{-(\alpha+1)}\,\log_2(n)^{\alpha+1} \le \int_N^\infty x^{-(\alpha+1)}\,\log_2(x)^{\alpha+1}\,dx. \]
By partial integration we obtain the estimate which proves the claim:
\[ \int_N^\infty x^{-(\alpha+1)}\,\log_2(x)^{\alpha+1}\,dx = C\Big(\big[-x^{-\alpha}\log_2(x)^{\alpha+1}\big]_N^\infty - \int_N^\infty x^{-(\alpha+1)}\,\frac{\partial}{\partial x}\log_2(x)^{\alpha+1}\,dx\Big) \le C\,N^{-\alpha}\,\log_2(N)^{\alpha+1}, \]
where we used in the last step that $\int_N^\infty x^{-(\alpha+1)}\frac{\partial}{\partial x}\log_2(x)^{\alpha+1}\,dx \ge 0$, since the logarithm is monotonically increasing and therefore the integrand is positive on the range $[N,\infty)$.
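As a quick numerical sanity check of this tail estimate (a sketch, not part of the proof), one can compare the partial tail sum against the claimed envelope $N^{-\alpha}\log_2(N)^{\alpha+1}$; the cutoffs and the constant 2 below are arbitrary choices for the experiment:

```python
import math

def tail_sum(alpha, N, M=100000):
    # Partial sum of n^{-(alpha+1)} * log2(n)^(alpha+1) over N < n <= M.
    return sum(n ** -(alpha + 1) * math.log2(n) ** (alpha + 1)
               for n in range(N + 1, M + 1))

def envelope(alpha, N):
    # The claimed bound N^{-alpha} * log2(N)^(alpha+1) (up to a constant).
    return N ** -alpha * math.log2(N) ** (alpha + 1)

for alpha in (1.25, 1.5, 2.0):
    for N in (50, 200):
        ratio = tail_sum(alpha, N) / envelope(alpha, N)
        # The ratio stays bounded by a moderate constant, as the estimate predicts.
        assert 0 < ratio < 2.0
```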
(a) Shearlet intersecting a corner point. (b) Shearlet interacting with a corner point but not intersecting it.
Figure 8: Shearlets interacting with corner points.

6 Extension to a Singularity Curve with Corners

Now we want to show that the main result also holds for the extended class of cartoon-like images $E^\beta_{\alpha;L}$ for $L > 1$. In this case the boundary curve is only required to be piecewise $C^\alpha$-smooth, i.e. there are finitely many points $p$, called corner points, where the boundary curve is not $C^\alpha$-smooth. It suffices to consider shearlets interacting with the corner points, since for shearlets not interacting with them the estimates made before still hold. Again our goal is to estimate $|\Lambda(\varepsilon)|$. There are two ways a shearlet can interact with a corner point: on the one hand, the shearlet can intersect a corner point (see Figure 8a); on the other hand, the shearlet can interact with two boundary curves that meet in a corner point without intersecting the corner point itself (see Figure 8b). For these cases we get the following estimate of $|\Lambda(\varepsilon)|$:

Theorem 6.1. Let $f \in E^\beta_{\alpha;L}$ for $1 < \alpha \le \beta \le 2$ and suppose that $\psi \in L^2(\mathbb{R}^2)$ satisfies the conditions of Theorem 4.6. Consider the following two cases:

Case 6a) The shearlets $\psi_\lambda$ intersect a corner point, in which two parts $\partial B_0$ and $\partial B_1$ of the edge curve meet.

Case 6b) The shearlets $\psi_\lambda$ intersect two parts $\partial B_0$ and $\partial B_1$ of the edge curve that meet in a corner point, but the shearlets do not intersect the corner point.

Then:

i) For Case 6a) we get $|\Lambda(\varepsilon)| \le \varepsilon^{-2(\alpha-1)/(\alpha+1)}$, i.e. the number of shearlets that intersect a corner point and have shearlet coefficients larger than $\varepsilon$ in magnitude is bounded by this number.

ii) For Case 6b) we get $|\Lambda(\varepsilon)| \le \varepsilon^{-2(\alpha-1)/(\alpha+1)}$ as well.

Proof. Case a) The number of shearlets of level $j \ge 0$ intersecting one corner point is bounded by the number of shearing indices $k$, i.e. by $C\cdot 2^{j(\alpha-1)/2}$.
By assumption there are only finitely many corner points, and of course the number of corner points is independent of the shearlets and therefore of the scale $j$. Hence, the number of shearlets that intersect one of the corner points is also bounded by $C\cdot 2^{j(\alpha-1)/2}$. By equation (5.10) we get
\[ |\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j(\alpha-1)/2} \lesssim \frac{4}{\alpha+1}\log_2(\varepsilon^{-1})\cdot 2^{\frac{2(\alpha-1)}{\alpha+1}\log_2(\varepsilon^{-1})} \lesssim \varepsilon^{-\frac{2(\alpha-1)}{\alpha+1}}. \]

Case b) Let $B \in STAR^\alpha(\nu)$ be the set which gives the parameterization of $f = f_0 + f_1\chi_B$ (see Definition 3.2). Define $\tilde Q^0_j$ to be the set of dyadic squares $Q$ containing two distinct boundary curves. Then let $C = Q\setminus B$ for some cubic window $Q \subset [0,1]^2$ that contains the two boundary curves. Then we can write the function $f$ as $f = f_0\chi_C + f_1\chi_B$ for some $f_0, f_1 \in C^\beta([0,1]^2)$. Note that this parameterization is different from the one given in Definition 3.2, but of course, if we adapt $f_0$ and $f_1$, this representation holds as well. Now we can also write it in the following way:
\[ f = f_0\chi_C + f_1\chi_B = (f_0 - f_1)\chi_C + f_1. \]
Since for the smooth function $f_1$ the optimality rate is achieved, we can concentrate on $f := g\chi_C$, where $g \in C^\beta([0,1]^2)$ is defined as $g := f_0 - f_1$. The utility of this parameterization of $f$ is that the integrand of $\int_Q g\psi_\lambda\,dx = \langle g, \psi_\lambda\rangle$ vanishes on $B$, and therefore we can split the integral (see Figure 8) such that for each part we can use the estimates made for the case where no corners lie on the edge curve. For this reason we show the estimate for the case that the two parts of the edge curve are linear on $Q$; it is then easy to see that the result for a general discontinuity curve follows analogously. Since $\partial B_0$ and $\partial B_1$ are linear on $Q$, we can write them as
\[ L_i := \big\{ x \in \mathbb{R}^2 : \langle x - x_0^i, (1, s_i)\rangle = 0 \big\} \quad \text{for } i = 0,1, \]
and some $x_0^i \in \mathbb{R}^2$. We assume $|s_i| \le 3$ for $i = 0,1$; the other cases can be handled similarly, again because we can split the integral into two parts that can be handled like the linear case treated before.
Next we define the sets of translates $M^i_{j,k,Q}$ as before, but for both parts of the edge curve:
\[ M^i_{j,k,Q} := \big\{ m \in \mathbb{Z}^2 : \mathrm{supp}\,\psi_{j,k,m} \cap L_i \cap Q \ne \emptyset \big\}. \]
By the estimate (5.11) we know that $|M^i_{j,k,Q}| \lesssim |k + 2^{j(\alpha-1)/2}s_i| + 1 =: |\hat k_i| + 1$ for $i = 0,1$. It follows that
\[ \big|M^0_{j,k,Q} \cap M^1_{j,k,Q}\big| \lesssim \min_{i=0,1}\big(|k + 2^{j(\alpha-1)/2}s_i| + 1\big) = \min_{i=0,1}\big(|\hat k_i| + 1\big). \quad (6.1) \]
Applying Theorem 4.4 to the hyperplanes $L_0$ and $L_1$, we have
\[ |\langle f, \psi_\lambda\rangle| \lesssim \frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s_0|^3} + \frac{2^{-j(\alpha+1)/4}}{|k + 2^{j(\alpha-1)/2}s_1|^3} \lesssim \max_{i=0,1}\frac{2^{-j(\alpha+1)/4}}{|\hat k_i|^3}, \quad (6.2) \]
since we get the corresponding result for each part of the integral. Using (6.1) and (6.2), and assuming $|\hat k_0| \le |\hat k_1|$, we can estimate $|\Lambda(\varepsilon)|$ as follows:
\[ |\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} \sum_{Q\in\tilde Q^0_j} \sum_{|\hat k_0|} \big(1 + |\hat k_0|\big) \overset{(5.13)}{\lesssim} \sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} \big|\tilde Q^0_j\big|\,\big(1 + \varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4}\big)^2 \lesssim \big|\tilde Q^0_j\big|\,\varepsilon^{-\frac{2}{\alpha+1}}. \]
Since the number of dyadic squares $Q \in \tilde Q^0_j$ which contain two distinct boundary curves is bounded by a constant $C > 0$ for all levels $j \ge 0$, the desired estimate is shown.

α (Matlab):  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
α (Theory):  2.8  2.6  2.4  2.2  2.0  1.8  1.6  1.4  1.2  1.0

Table 1: Correspondence between the α used in the Matlab implementation and the α used in the theory.

7 Implementation

The next step is to compute an α-shearlet approximation for specific images. Now the problem is that we do not know the regularity of the images, and therefore we do not know which α-shearlet frame we have to use. Hence, we want to learn the α that gives the best decay rate of the approximation error. In this section we briefly explain the implementation of the α-shearlet approximation of images. The first subsection deals with the implementation of ShearLab, that is, with computing the shearlet coefficients for one α and reconstructing the image from them. The second subsection focuses on the N-term approximation; the goal is to find the best α.
Note that the code uses a different dilation matrix than we use in the theory, but the two coincide after adapting the α (see also Table 1):
\[ A_{\mathrm{Matlab}} = \begin{pmatrix} 2^j & 0 \\ 0 & 2^{j\alpha} \end{pmatrix}, \qquad A_{\mathrm{Theory}} = \begin{pmatrix} 2^{j\alpha/2} & 0 \\ 0 & 2^{j/2} \end{pmatrix}. \]

7.1 Shearlab Implementation

Note first that in practice we cannot treat the images as functions in $L^2(\mathbb{R}^2)$; we have to treat them as matrices $y \in \mathbb{R}^{n\times n}$. The idea of the implementation is based on filters, i.e. the shearlets will be constructed as filters. This is why we introduce filters and the basic operations that we are going to use. Since images are two-dimensional we need two-dimensional filters, but these can be constructed from the one-dimensional case. So we begin by introducing what a one-dimensional filter is.

Definition 7.1. Let $\ell(\mathbb{Z})$ be the space of all sequences $c: \mathbb{Z} \to \mathbb{R}$.

i) Let $F, H \in \ell(\mathbb{Z})$. Then the convolution of $F$ and $H$ is defined as
\[ F * H: \mathbb{Z} \to \mathbb{R}, \qquad m \mapsto \sum_{k\in\mathbb{Z}} F(m-k)\,H(k). \]

ii) The upsampling operator $\uparrow_n: \ell(\mathbb{Z}) \to \ell(\mathbb{Z})$ is defined by:
\[ \uparrow_n F(k) = \begin{cases} F(k/n), & \text{if } k \in n\mathbb{Z}, \\ 0, & \text{else.} \end{cases} \]

iii) Let $F \in \ell(\mathbb{Z})$. The z-transform $F^*$ of $F$ is defined as:
\[ F^*(z) = \sum_{k\in\mathbb{Z}} F(k)\,z^{-k}, \qquad z \in \mathbb{C}\setminus\{0\}. \]

iv) A filter is an operator $F: \ell(\mathbb{Z}) \to \ell(\mathbb{Z})$.

Note that if we want to convolve two vectors $F$ and $H$ which have only finite length (in contrast to sequences), we regard them as sequences in the following way: let $N_F$ be the length of $F$ and $N_H$ the length of $H$. Then define
\[ \tilde F(n) = \begin{cases} F(n), & \text{if } 1 \le n \le N_F, \\ 0, & \text{else,} \end{cases} \qquad \tilde H(n) = \begin{cases} H(n), & \text{if } 1 \le n \le N_H, \\ 0, & \text{else.} \end{cases} \]
Then it holds that $F * H = \tilde F * \tilde H$.

We distinguish two kinds of filters. High-pass filters pass the high frequencies of a function and reduce the low frequencies; low-pass filters pass the low frequencies and reduce the amplitude of the high frequencies. Examples of both kinds of filters can be seen in Figure 9 and Figure 10. A method to construct a pair of one high-pass and one low-pass filter is given by quadrature mirror filters. The convenience of this kind of filters is the perfect reconstruction property, i.e.
that the squared frequency responses of the two filters sum to one at each frequency (see Figure 11).

Definition 7.2. Let $h_0$ be some filter with Fourier transform $H_0$ which fulfills
\[ |H_0(\xi)|^2 + |H_0(\xi+\pi)|^2 = 1 \quad \text{for all } \xi. \]
Define $h_1$ by $h_1(n) = (-1)^n h_0(n)$ for all $n \in \mathbb{Z}$. Then $h_0$ and $h_1$ are quadrature mirror filters.

Note that $h_0$ and $h_1$ indeed have the perfect reconstruction property. To see this, note first that the Fourier transform and the z-transform coincide in the sense that $F^*(e^{i\xi}) = \hat F(\xi)$ for $F \in \ell(\mathbb{Z})$. Now it follows:
\[ h_1^*(z) = \sum_n h_1(n)\,z^{-n} = \sum_n (-1)^n h_0(n)\,z^{-n} = \sum_n h_0(n)\,(-z)^{-n} = h_0^*(-z) \]
and
\[ H_1(\xi) = h_1^*(e^{i\xi}) = h_0^*(-e^{i\xi}) = h_0^*(e^{i\pi}e^{i\xi}) = h_0^*(e^{i(\pi+\xi)}) = H_0(\pi+\xi), \]
and therefore
\[ |H_0(\xi)|^2 + |H_1(\xi)|^2 = |H_0(\xi)|^2 + |H_0(\xi+\pi)|^2 = 1. \]

Figure 9: Low-pass filter used in the implementation. (a) One-dimensional low-pass filter in time domain. (b) One-dimensional low-pass filter in frequency domain.

Figure 10: High-pass filter used in the implementation. (a) One-dimensional high-pass filter in time domain. (b) One-dimensional high-pass filter in frequency domain.

Figure 11: Pair of quadrature mirror filters used in the implementation.

The idea now is to construct the two-dimensional filters from the one-dimensional ones. For the low-frequency part this is very simple, since it is rectangular: the two-dimensional low-pass filter we need can be computed as $h_0^T\cdot h_0$. This multiplication gives a matrix, since $h_0$ is a row vector. Figure 12 shows how the low-pass filter looks in the frequency domain.
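The operations of Definition 7.1, the quadrature-mirror construction of Definition 7.2, and the outer-product step for the two-dimensional low-pass filter can be sketched in pure Python. The filter h0 = [1/2, 1/2] below is a hypothetical example fulfilling the condition of Definition 7.2, not the filter actually used in the implementation:

```python
import cmath

def convolve(F, H):
    # Discrete convolution (F * H)(m) = sum_k F(m - k) H(k), Definition 7.1 i).
    out = [0.0] * (len(F) + len(H) - 1)
    for m in range(len(out)):
        for k, h in enumerate(H):
            if 0 <= m - k < len(F):
                out[m] += F[m - k] * h
    return out

def upsample(F, n):
    # Upsampling operator of Definition 7.1 ii): nonzero only on multiples of n.
    out = [0.0] * (n * len(F))
    for k, f in enumerate(F):
        out[n * k] = f
    return out

def outer(u, v):
    # Two-dimensional low-pass filter as the matrix product h0^T * h0.
    return [[a * b for b in v] for a in u]

def freq_response(h, xi):
    # H(xi) = sum_n h(n) e^{-i xi n}: the z-transform evaluated at z = e^{i xi}.
    return sum(c * cmath.exp(-1j * xi * n) for n, c in enumerate(h))

# Hypothetical quadrature mirror pair: h1(n) = (-1)^n h0(n).
h0 = [0.5, 0.5]
h1 = [(-1) ** n * c for n, c in enumerate(h0)]

# Perfect reconstruction: |H0(xi)|^2 + |H1(xi)|^2 = 1 at every frequency.
for i in range(8):
    xi = i * cmath.pi / 4
    total = abs(freq_response(h0, xi)) ** 2 + abs(freq_response(h1, xi)) ** 2
    assert abs(total - 1.0) < 1e-12

lowpass_2d = outer(h0, h0)   # 2D low-pass for the coarse-scale part
```

The real ShearLab filters are of course longer; the sketch only mirrors the definitions used in the text.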
Figure 12: Two-dimensional low-pass filter in frequency domain.

For the high-frequency part it is not that easy, because the shearlets should have their essential support in the cones (compare Figure 3). For this reason we use two-dimensional fan filters, whose support in the frequency domain is cone-like (see Figure 13).

Figure 13: Two-dimensional fan filter.

The next step is to upsample the filter, such that the filters are compressed. Note that the fan filter is a matrix and not a vector, so we apply the upsampling defined above to every column. The result can be seen in Figure 14a. As Figure 14a shows, the essential support of the resulting shearlets does not yet lie in the cones $C$ (remember the partition of the frequency domain introduced in Section 4.1 and Figure 3). To restrict to the frequencies in those cones (the high frequencies), we convolve the filter with the high-pass filter of the quadrature mirror filter pair defined above. Note that the convolution of a matrix with a vector is the same as convolving every row with the vector; for that reason we can use the one-dimensional convolution defined above. Figure 14b shows the resulting essential support, which conforms to the essential support of an unsheared shearlet.

(a) Upsampled fan filter. (b) Upsampled and convolved fan filter.

Now we have to shear, and this is the point where the $\alpha \in (0,1]$, and therefore the difference between the frames, comes into play. Let $N$ be the size of the image; the number of shears can be
The reason for the rounding in this computation is the need of an integer number of shears. But this yields the same number for some α. Table 2 shows the number of shears for each level and each α in the case Jf = 9 and J = 2. α=0.1-0.2 α=0.3 α=0.4 α=0.5-0.6 α=0.7 α=0.8-1.0 Level 1 3 3 2 2 1 1 Level2 4 3 3 2 2 1 Table 2: Number of shears for each level depending on α. After constructing the shearlet frame, we are able to compute the shearlet coefficients. Let shearlow be the the filter of the low frequency part and shearj,kj the filter of level j and shearing kj where kj ∈ [1, nj ]. Let ŷ the Fourier transform of the image y, then the coefficient matrices can be computed in frequency domain as: (dlow )n,m = (shearlow )n,m · ŷ (dj,kj )n,m = (shearj,k )n,m · ŷ And of course, by applying the inverse Fourier transform we get the coefficients in time domain. For reconstruction of the image from the coefficients we have to pay attention to the fact that the shearlet frame is not a tight frame. Therefore we need the dual frame for reconstruction. ˜ low , shear ˜ j,k be the dual frame elements (for the computation look at the Matlab Let shear code). Then in Fourier domain we can reconstruct the image as: ŷn,m ˜ low )n,m + = (dlow )n,m · (shear nj J X X ˜ j,k )n,m (dj,kj )n,m · (shear j j=1 kj =1 By applying the inverse Fourier transform the image is reconstructed. 7.2 N-Term Approximation and Results Now assume that we have a special image that we like to approximate. As mentioned above, we do not know it’s regularity and we have to learn the best α. But for this implementation it is easy. Remember that we get, by choosing Jf = 9 and J = 2, only six different frames. 7.2 N-Term Approximation and Results 67 Therefore it is no problem to compute the decay rate of the approximation fault for each of this frames and to compare them. 
But if we look at larger images, we are able to choose $J_f$ and $J$ larger as well; the number of different frames then increases, and we have to think about 'real' dictionary learning.

Having computed the shearlet coefficients for one α-shearlet frame, we can approximate an image by using the $N$ largest coefficients in magnitude and compute the decay rate of the approximation error. Since the coefficients of the low-frequency part are the same for each α, we use all coefficients of this part and then start the approximation. In each step $i = 1,2,\dots$ we take the $N_i$ largest coefficients in magnitude of the high-frequency part, reconstruct the image $y$, and call this reconstruction $y_{N_i}$. Then we compute the error as
\[ e_{N_i} = \|y - y_{N_i}\|_F. \]
If we assume that the error decays as $N^{-\gamma}$, it holds for some constant $C > 0$ that
\[ e_N = C\cdot N^{-\gamma}. \]
Taking logarithms on both sides yields
\[ \log_2(e_N) = \log_2(C) - \gamma\log_2(N). \]
This implies that we get in each step the error
\[ \log_2(e_{N_i}) = \log_2(C) - \gamma\log_2(N_i), \]
which yields the matrix equation:
\[ \begin{pmatrix} \log_2(e_{N_1}) \\ \log_2(e_{N_2}) \\ \vdots \\ \log_2(e_{N_n}) \end{pmatrix} = \begin{pmatrix} 1 & \log_2(N_1) \\ 1 & \log_2(N_2) \\ \vdots & \vdots \\ 1 & \log_2(N_n) \end{pmatrix} \begin{pmatrix} \log_2(C) \\ -\gamma \end{pmatrix}. \]
Therefore we have to find a least-squares solution of this problem, which we can compute with the help of the Matlab function 'polyfit'. Let us do this computation for different images in the next subsection.

7.3 Results

First we tried to compute the N-term approximation for different images with the ShearLab implementation introduced before. But with this implementation one always finds the same α as the best. This is because this implementation is not suited for an N-term approximation (which takes the N largest coefficients in magnitude), since the supports of translated shearlets overlap. Hence, if one shearlet meets the discontinuity, many of its translates meet it as well, and there are too many large coefficients.
Hence, all of the corresponding coefficients will be chosen without additional benefit for the image reconstruction. For this reason the implementation has to be adapted. At the moment, however, the adapted code does not work for all possible α; therefore we have to restrict ourselves to some of the possible α.

First we want to look at synthetic images. To this end we draw an image of parallel lines and proceed to more and more parallel oscillating curves (see Figure 14). For each of these images we compute the approximation rate for several α and compare them. For these images we want to compute the decay rate for α = 2, 2.5, 3. The choice of α may seem surprising; of course, it would be better to also choose α smaller than two, but there is still a problem in the adapted code for this choice of α. For our purposes this should not matter, because the point here is to see some differences. As a follow-up to this bachelor's thesis we will carry out these computations, too.

If we look at the graphics that show the decay rate of the N-term approximation of the images, we see that in the first steps the decay rate is not that large. But the theory makes claims about the decay rate only for sufficiently large j. For this reason we have to look at the decay rate of the finest level, which in our case is j = 3. Beyond a certain point the decay rate increases immensely, because we use too much information. So it should be sufficient to look at the first two to ten percent of the coefficients to compute the relevant decay rate. For this number of coefficients the PSNR is between 40 dB and 60 dB (for the definition of the PSNR see equation 8.1). This is a good value for the PSNR, i.e. one cannot see any differences in the picture. If we look at Figure 15, we can imagine that the PSNR decreases the more the curves oscillate, and this is indeed the case.
So it would be better to take more coefficients for the images with more strongly oscillating curves; but in order to have comparability not only between the different α but also between the different images, we decided to take the same number of coefficients for each image. As mentioned before, a PSNR of 40 dB is a good PSNR.

Table 3 contains the computed decay rates for the mentioned α and the images shown in Figure 14. For a better impression of these rates see Figure 16, where the graph of the mapping log2(fault) as a function of log2(number of coefficients) is drawn. Figure 15 shows the decay rates of the different images for a fixed α. What we can see is: the more the curve oscillates, the better the approximation is for a smaller α.

Figure 14: Family of synthetic images of more and more strongly oscillating parallel curves. Panels: (a) Lines, (b)–(g) Oscillating 1–6.

           Fig. 14a   Fig. 14b   Fig. 14c   Fig. 14d   Fig. 14e   Fig. 14f   Fig. 14g
α = 2      −0.9431    −0.8441    −0.7417    −0.8377    −0.8433    −0.8806    −0.9416
α = 2.5    −1.3302    −0.7858    −0.7299    −0.7773    −0.8236    −0.7221    −0.7076
α = 3      −1.3582    −0.7610    −0.7304    −0.7460    −0.7865    −0.6395    −0.6351

Table 3: Decay rates for the synthetic images.

Figure 15: Graphs of the approximations (log2(fault) against log2(coefficients)); comparison of the different images for fixed α. Panels: (a) α = 2, (b) α = 2.5, (c) α = 3.
Figure 16: Graphs of the approximations; comparison of the different α for a fixed image. Panels: (a) Lines, (b) Oscillating 2, (c) Oscillating 6.

           Lena       Barbara
α = 2      −0.8578    −0.9560
α = 2.5    −0.7758    −0.8701
α = 3      −0.7514    −0.8181

Table 4: Decay rates for the real images.

Now we want to look at real images. We take two of the most famous pictures in image processing: 'Lena' and 'Barbara' (see Figure 17). For these real images we see that a smaller α is better for the approximation rate, but the differences are not that large. One should not be surprised that the results do not show the decay rates we estimated in the theoretical part of this thesis: for that estimation we assumed infinitely many frame elements (shearlets) and estimated the decay rate for sufficiently large j, whereas in the implementation we only have finitely many frame elements.

All in all, the examples show that with this implementation a smaller α yields a better decay rate of the error of the N-term approximation. But there are also pictures for which a larger α yields a better decay rate (remember Figure 16a).

Figure 17: Famous images in image processing: 'Lena' and 'Barbara'. Panels: (a) Lena, (b) Barbara.

Figure 18: Graphs of the approximations for the images 'Lena' and 'Barbara'. Panels: (a) Lena, (b) Barbara.
8 Conclusion

In this bachelor's thesis we have introduced cone-adapted discrete shearlet systems, which are adapted to some α. We have then shown that, using the α-adapted shearlet system, the approximation error of the largest-N-term approximation decays as N^{-α} · log2(N)^{α+1}. Hence it provides almost the optimal sparse approximation rate, which, as we have also shown, is N^{-α}.

In the last section we have introduced the Matlab implementation of the shearlet transform and computed some decay rates. Our goal was to find the 'best' α-adapted shearlet system for certain special images, since we do not know the regularity of an image when we see it and therefore do not know which shearlet system, more precisely which scaling matrix, we have to use. At the moment the code does not work for all possible α we looked at in the theoretical part, since we had to adapt it. So the next step is to make the adapted code work for all these α and to look at the corresponding decay rates.

As mentioned in the last section, if we approximate bigger images, we can choose a larger finest level and therefore more levels overall. The expectation is that we then see larger differences in the decay rates for different α. Furthermore, for larger j the number of equal shearing numbers (even for different α) decreases, and therefore so does the number of different α that yield the same transformation (recall Equation 7.1). So if we have many different transformations and want to find the best one, we have to learn it. A next step will therefore be to implement a dictionary learning algorithm that finds the best transformation (i.e., the most suitable α-shearlet system) for special classes of images.

However, we also want to look at other measures; that is, we do not only want to measure the approximation quality via the decay rate of the N-term approximation.
Another conceivable measure is the denoising quality: assume that we have an image and add noise to it. Then we can use the shearlet transform to denoise it. The denoising quality can now be measured as the difference between the PSNR of the noisy and the PSNR of the denoised image. The PSNR is defined as
\[
  \mathrm{PSNR} := 20 \cdot \log_{10}\left(\frac{255 \cdot N}{\|f^d - f\|_F}\right), \qquad (8.1)
\]
where f is the original image to which we add noise, f^d the denoised image, and N the size of the image. Note that to measure the denoising quality we can use the original Matlab code.
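The PSNR of Equation (8.1) can be sketched in Python/NumPy for a square image. Interpreting "N the size of the image" as the side length of a square N × N image is an assumption; under it the formula reduces to the usual 20 · log10(255 / RMSE):

```python
import numpy as np

def psnr(f, f_d):
    """PSNR as in Equation (8.1) for a square N x N image:
    20 * log10(255 * N / ||f_d - f||_F) = 20 * log10(255 / RMSE)."""
    f = np.asarray(f, dtype=float)
    f_d = np.asarray(f_d, dtype=float)
    N = f.shape[0]                      # side length (assumed square)
    return 20.0 * np.log10(255.0 * N / np.linalg.norm(f_d - f))

# sanity check: a uniform error of one gray level gives 20*log10(255) dB
print(round(psnr(np.zeros((8, 8)), np.ones((8, 8))), 2))  # ~48.13
```

The denoising quality described above would then be the difference `psnr(f, f_denoised) - psnr(f, f_noisy)` for a given noisy/denoised pair.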