A Flexible Shearlet Transform - Sparse Approximations and Dictionary Learning
Bachelor's thesis
submitted for the degree of
Bachelor of Science in Mathematics
Technische Universität Berlin
Fakultät für Mathematik und Naturwissenschaften
Degree program: Mathematics
submitted by
Sandra Keiper
(matriculation no.: 318795)
First examiner: Prof. Dr. Gitta Kutyniok
Second examiner: Prof. Dr. Reinhold Schneider
I affirm in lieu of oath that I have produced this thesis independently and in my own hand.
Berlin, ...........................................................................
Signature
Summary of the Thesis (Deutsche Zusammenfassung)

The field of image processing is very broad, ranging from denoising and edge detection to image compression. An important goal is to find image models and, for them, representation systems that are able to resolve edges in an optimal way. Optimal here means that few elements of the representation system suffice to detect the edges or to represent the image. The first well-known representation systems are wavelet bases and frames, which have good properties for detecting point singularities; edges, however, wavelet systems cannot resolve in this optimal way. For this reason Gitta Kutyniok and others have worked on an extension of these systems that is suited to detecting edges optimally.

This thesis is concerned with this further development of wavelet systems, the so-called shearlet systems, and with the optimality of shearlet systems for the N-term approximation of images. To this end, the first part of the thesis lays the theoretical groundwork: the Fourier transform and Hölder spaces as well as frames and wavelets are introduced. After that the main part of the thesis begins.
As mentioned before, we first need an image model that captures natural images as generally as possible. By natural images we mean images that actually occur in nature, for example photographs of real objects. A widespread model is the model of cartoon-like images. The idea of this model is that an image f ∈ L²(R²) (or, for a natural image, a sufficiently small image patch) consists of two smooth regions separated by a (likewise smooth) curve; see Fig. 1 for an illustration. The smoothness, of course, has to be specified more precisely. The works of Gitta Kutyniok and others are based on cartoon-like images whose two regions, as well as the discontinuity curve, are C²-smooth. This thesis carries these results over to more general cartoon-like images: instead of C²-smoothness we only require C^β-smoothness of the two separated regions and C^α-smoothness of the separating curve, where 1 < α ≤ β ≤ 2.
Definition (Cartoon-like images). The set of cartoon-like images E^β_{α,L}(R²) is the set of all functions f : R² → C of the form

f = f0 + f1 χ_B,

where B is a star-shaped region with piecewise C^α(R²)-smooth boundary curve and f0, f1 ∈ C^β(R²).
We now want to approximate this class of images with the help of a representation system. Let Ψ = (ψi)_{i∈I} be a basis for L²(R²). Then for every f ∈ L²(R²) there exists a sequence of coefficients (ci(f))_{i∈I} such that f can be represented in the form

f = Σ_{i∈I} ci(f) ψi.

Fig. 1: Example of a cartoon-like image.
For the N-term approximation we now pick the N coefficients (ci(f))_{i∈I_N} that are largest in absolute value and reconstruct f in the form

f_N = Σ_{i∈I_N} ci(f) ψi.

The error ‖f_N − f‖ that we make in doing so naturally decreases as we include more coefficients. At which rate this error decays when we choose shearlet systems as representation systems is what this thesis investigates. For the special class of cartoon-like images with C²-regularity, as well as for the three-dimensional case, i.e. for f ∈ L²(R³) (there even for the more general class of cartoon-like images), the approximation rates have already been proved. Note that shearlet systems are no longer bases but, more generally, frames; for simplicity we shall not pursue this distinction further in this summary. For this class of cartoon-like images it is shown in Chapter 3 that the best decay rate that can be achieved in general is N^{−α}. The result of this thesis will be that this approximation rate is attained with the help of shearlet systems. But first we define shearlet systems.
Definition. For α ∈ (1, 2], let the scaling matrices A_{2^j}, Ã_{2^j}, j ∈ Z, be defined as

A_{2^j} = [ 2^{jα/2}   0
            0          2^{j/2} ],   Ã_{2^j} = [ 2^{j/2}   0
                                               0          2^{jα/2} ],

and the shear matrix S_k as

S_k = [ 1   k
        0   1 ].

Then, for c = (c1, c2) ∈ R²₊, the cone-adapted discrete shearlet system SH(Φ, Ψ, Ψ̃; c, α) for the parameter α ∈ (1, 2], generated by Φ, Ψ, Ψ̃, is given by

SH(Φ, Ψ, Ψ̃; c, α) = Φ(ϕ; c1, α) ∪ Ψ(ψ; c, α) ∪ Ψ̃(ψ̃; c̃, α),

where

Φ(ϕ; c1, α) = { ϕ_m = ϕ(· − m) : m ∈ c1 Z² },

Ψ(ψ; c, α) = { ψ_{j,k,m} = 2^{j(α+1)/4} ψ(S_k A_{2^j} · − m) : j ≥ 0, |k| < ⌈2^{j(α−1)/2}⌉, m ∈ cZ² },

Ψ̃(ψ̃; c̃, α) = { ψ̃_{j,k,m} = 2^{j(α+1)/4} ψ̃(S_k^T Ã_{2^j} · − m) : j ≥ 0, |k| < ⌈2^{j(α−1)/2}⌉, m ∈ c̃Z² },

with c̃ = (c2, c1). By cZ² we mean that for z = (z1, z2) ∈ Z² we set cz = (c1 z1, c2 z2).

Fig. 2: Partition of the frequency domain.
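For concreteness, the scaling and shear matrices of this definition are easy to tabulate numerically. The following Python sketch (illustrative only; the thesis' own implementation uses Matlab) builds A_{2^j}, Ã_{2^j}, S_k and the shear range for a given α:

```python
import numpy as np

def scaling_matrices(j, alpha):
    """Anisotropic scaling matrices A_{2^j} and Ã_{2^j} for parameter alpha."""
    A = np.diag([2 ** (j * alpha / 2), 2 ** (j / 2)])
    A_tilde = np.diag([2 ** (j / 2), 2 ** (j * alpha / 2)])
    return A, A_tilde

def shear_matrix(k):
    """Shear matrix S_k (unimodular: it only tilts, it does not scale)."""
    return np.array([[1.0, k], [0.0, 1.0]])

# At scale j the shear parameter k ranges over |k| < ceil(2^(j*(alpha-1)/2)).
j, alpha = 4, 1.5
A, A_tilde = scaling_matrices(j, alpha)
k_max = int(np.ceil(2 ** (j * (alpha - 1) / 2)))
S = shear_matrix(1)
```

Note how α interpolates between isotropic wavelet-like scaling (α near 1) and strongly anisotropic scaling (α = 2).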
If one considers the partition of the frequency domain shown in Fig. 2, one can show that Ψ(ψ; c, α) is a frame for

{ f ∈ L²(R²) : ess-supp f̂ ⊂ C1 ∪ C3 }

and Ψ̃(ψ̃; c̃, α) is a frame for

{ f ∈ L²(R²) : ess-supp f̂ ⊂ C2 ∪ C4 }.
The main result of the thesis reads:

Theorem. Assume that ϕ, ψ and ψ̃ ∈ L²(R²) have compact support and that SH(Φ, Ψ, Ψ̃; c, α) forms a frame for L²(R²). Then, under some additional assumptions that are specified in the thesis, SH(Φ, Ψ, Ψ̃; c, α) provides almost optimally sparse approximations of functions f ∈ E^β_{α,L}(R²); that is, there exists a constant C > 0 such that

‖f_N − f‖²₂ ≤ C N^{−α} (log₂ N)^{α+1}   for N → ∞,

where f_N is the N-term approximation of f obtained by selecting the N coefficients that are largest in absolute value.

The last part of the thesis describes the implementation of the shearlet transform and, using this implementation, computes the decay rate of the N-term approximation error for various images.
Contents

1 Introduction
2 Theoretical Basics
  2.1 Fourier Transform
  2.2 Hölder Spaces and Fractional Order Sobolev Spaces
  2.3 Frames
  2.4 Wavelets
3 Optimality Result
  3.1 The Class E^β_{α,L} of Cartoon-like Images
  3.2 Optimality Rate
4 Decay of the Approximation Error using a Shearlet System
  4.1 Cone-adapted Shearlet System
  4.2 Decay Rate of the N-term Approximation Error
5 Proofs
  5.1 Proof of Proposition 4.3
  5.2 Proof of Proposition 4.4
  5.3 Proof of Theorem 4.5
  5.4 Proof of the Main Result Theorem 4.6
6 Extension to a Singularity Curve with Corners
7 Implementation
  7.1 Shearlab Implementation
  7.2 N-Term Approximation and Results
  7.3 Results
8 Conclusion
List of Figures

1 Example of a cartoon-like image.
2 Natural image containing a cartoon-like structure.
3 Partition of frequency domain.
4 Support of the shearlet.
5 Intersection of the cubic window with the support of a shearlet and the discontinuity.
6 Shearlets interacting with corner points.
7 Shearlet intersecting the boundary curve and the smallest parallelogram P that entirely contains the curve in the interior of the shearlet.
8 Shearlets interacting with corner points.
9 Low-pass filter, which is used in the implementation.
10 High-pass filter, which is used in the implementation.
11 Pair of quadrature mirror filters, which is used in the implementation.
12 Two-dimensional low-pass filter in frequency domain.
13 Two-dimensional fan filter.
14 Family of synthetic images of more and more parallel oscillating curves.
15 Graph of the approximations. Comparison of the different images for fixed α.
16 Graph of the approximations. Comparison of the different images for fixed α.
17 Famous images for image processing: 'Lena' and 'Barbara'.
18 Graph of the approximations for the images 'Lena' and 'Barbara'.
List of Tables

1 Difference between α used in the Matlab implementation and in the code.
2 Number of shears for each level depending on α.
3 Decay rate for synthetic images.
4 Decay rate for synthetic images.
1 Introduction
The field of image processing comprises many areas, such as denoising, compression and feature detection. An important goal of this field is to find image models and representation systems for such a model that are capable of detecting edges in an optimally sparse way. Among the best-known representation systems are wavelet frames, which have good properties for detecting point singularities. Wavelet-based methods are used in all the subfields of image processing mentioned above. However, wavelet frames are not able to detect edges in an optimally sparse way. For this reason my mentors Gitta Kutyniok and Wang-Q Lim, among others, developed the wavelet methods further.
To study methods of image processing in a general way, one has to restrict oneself to an image model. There are different ways to do this, for example digital and continuous models. In the theoretical part of this thesis we use a continuous model, namely cartoon-like images, but for the implementation we of course have to use a digital model, i.e. we regard the image as an element of R^{n×n}. The cartoon-like model is based on the assumption that a 'natural' image is composed of smooth parts separated by finitely many singularities. But what does 'smoothness' mean in this context? Earlier definitions of cartoon-like images consider C²-smooth regions separated by a C²-smooth discontinuity curve. But this concept has been developed further, allowing lower, non-integer orders of smoothness.
The mother function (generator) of the representation system can be chosen either band-limited or compactly supported. Most of the results that exhibit optimally sparse approximation are only applicable to band-limited generators. The first proof in this context for compactly supported generators was given in [9], but only for the standard model of cartoon-like images.
In this thesis we want to extend the results given in [9] to cartoon-like images that are C^β-smooth apart from a C^α singularity curve, for 1 < α ≤ β ≤ 2. Chapter 2 introduces all the theoretical basics we need to understand the results and their proofs. Chapter 3 introduces cartoon-like images mathematically and derives the optimal approximation rate for this class of images. Afterwards, Chapter 4 introduces shearlet frames that depend on the smoothness of the image we would like to analyze; moreover we state the main result of this thesis: the optimal approximation rate can be reached by using these shearlet frames. Chapter 5 proves the main result and all the results needed for it. In Chapter 6 we extend the result to singularity curves that are only piecewise C^α-smooth.
Up to this point we assume that we know the smoothness of the image that we want to approximate, denoise, etc. But in practice we do not know this smoothness, and therefore we do not know which frame we have to use. The last chapter deals with this problem. The main idea is to compute the approximation rate of an image for a specific α and then to learn the best α. So first the implementation of the shearlet transform in Matlab will be explained, and afterwards we will learn the best α for these images.
2 Theoretical Basics
In this section we introduce the basic knowledge that we need for the theoretical part of this thesis. First we introduce the Fourier transform on L²(Rⁿ) and some of its important properties. Subsequently the Hölder spaces and some important fractional order Sobolev spaces are defined. Then we discuss frames and wavelets, which will later be extended to shearlets. First we fix the multi-index notation:

Definition 2.1. An n-dimensional multi-index α is an n-tuple α = (α1, ..., αn). By |α| we mean the sum |α| = α1 + ... + αn, by x^α the product x^α = x1^{α1} · ... · xn^{αn}, and by D^α the differential operator D^α = (∂/∂x1)^{α1} · ... · (∂/∂xn)^{αn}.
2.1 Fourier Transform

Definition 2.2. For f ∈ L¹(Rⁿ) set

(Ff)(ξ) = (2π)^{−n/2} ∫_{Rⁿ} f(x) e^{−i⟨ξ, x⟩} dx   for all ξ ∈ Rⁿ.

The function Ff is called the Fourier transform of f.

Ff is well-defined since the integral exists for f ∈ L¹(Rⁿ). The following proposition states a few easy properties of Ff.

Proposition 2.3 ([12]). Let f ∈ L¹(Rⁿ).

i) Ff ∈ C0(Rⁿ), and F : L¹(Rⁿ) → C0(Rⁿ) is a continuous linear operator.

ii) For y ∈ Rⁿ, F(f(· − y))(ξ) = e^{−i⟨y, ξ⟩} Ff(ξ).

iii) For a > 0, F(f(a·))(ξ) = a^{−n} Ff(ξ/a).

iv) If x^α f ∈ L¹(Rⁿ) for |α| ≤ k, then D^α(Ff) = (−i)^{|α|} F(x^α f).
To define the Fourier transform for functions f ∈ L²(Rⁿ), it is useful to restrict F to a subspace of L¹(Rⁿ) that is also dense in L²(Rⁿ). The so-called Schwartz space is such a subspace.

Definition 2.4. A function f : Rⁿ → C is called rapidly decreasing if

lim_{|x|→∞} x^α f(x) = 0   for all α ∈ N₀ⁿ,

where x^α = x1^{α1} · ... · xn^{αn}. The space

S(Rⁿ) = { f ∈ C^∞(Rⁿ) : D^β f is rapidly decreasing for all β ∈ N₀ⁿ }

is called the Schwartz space.
The Schwartz space is a subspace of L¹(Rⁿ), so the Fourier transform is well-defined on this space. The extension of the Fourier transform to L²-functions is based on the following results:

Proposition 2.5 ([12]).

i) S(Rⁿ) is dense in L²(Rⁿ).

ii) The Fourier transform is a bijection from S(Rⁿ) onto S(Rⁿ). The inverse operator is given by

(F⁻¹f)(x) = (2π)^{−n/2} ∫_{Rⁿ} f(ξ) e^{i⟨ξ, x⟩} dξ   for all x ∈ Rⁿ,

and it holds that

⟨Ff, Fg⟩_{L²} = ⟨f, g⟩_{L²}.

In particular it follows that ‖Ff‖_{L²} = ‖f‖_{L²} for all f ∈ S(Rⁿ), and therefore the operator F is well-defined, bijective and isometric on S(Rⁿ) with respect to the norm ‖·‖_{L²}. Because of the density of S(Rⁿ) in L²(Rⁿ), one can extend F to a continuous operator on L²(Rⁿ), called the Fourier-Plancherel transform and denoted by F₂. One often writes f̂ instead of F₂f. This extension fulfills the so-called Plancherel equation:

⟨F₂f, F₂g⟩_{L²} = ⟨f, g⟩_{L²}  and  ‖F₂f‖_{L²} = ‖f‖_{L²}   for all f, g ∈ L²(Rⁿ).
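The Plancherel equation has an exact discrete analogue for the unitary DFT, which is easy to check numerically. A small Python sketch (illustrative only, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(256) + 1j * rng.standard_normal(256)
g = rng.standard_normal(256) + 1j * rng.standard_normal(256)

# Unitary (orthonormal) DFT: with norm="ortho" the transform is an isometry,
# mirroring <F2 f, F2 g> = <f, g> and ||F2 f|| = ||f||.
F = lambda x: np.fft.fft(x, norm="ortho")

inner = lambda a, b: np.vdot(b, a)  # <a, b> = sum a_k * conj(b_k)
assert np.allclose(inner(F(f), F(g)), inner(f, g))
assert np.allclose(np.linalg.norm(F(f)), np.linalg.norm(f))
```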
Now we state the Fourier Slice Theorem, which gives a connection between a projection of a two-dimensional function and the Fourier transform.

Theorem 2.6 (Fourier Slice Theorem).
Let f : R² → C. The projection of f onto the x1-axis is given by

p(x1) = ∫_R f(x1, x2) dx2.

The slice is then

s(ξ1) = f̂(ξ1, 0) = (1/2π) ∫_R ∫_R f(x1, x2) e^{−i x1 ξ1} dx1 dx2
                 = (1/2π) ∫_R ( ∫_R f(x1, x2) dx2 ) e^{−i x1 ξ1} dx1
                 = (1/2π) ∫_R p(x1) e^{−i x1 ξ1} dx1
                 = (2π)^{−1/2} p̂(ξ1).

Hence (2π)^{−1/2} p̂(ξ1) = f̂(ξ1, 0), and with the inverse Fourier transform:

∫_R f(x1, x2) dx2 = p(x1) = (2π)^{−1/2} ∫_R p̂(ξ1) e^{i x1 ξ1} dξ1 = ∫_R f̂(ξ1, 0) e^{i x1 ξ1} dξ1.
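The slice theorem likewise has an exact discrete counterpart: the zero-frequency row of a 2D DFT equals the 1D DFT of the column sums. A Python sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))

# Projection: integrate out one variable (here: sum over the rows).
p = img.sum(axis=0)

# Slice: the zero-frequency row of the 2D DFT ...
slice_ = np.fft.fft2(img)[0, :]

# ... equals the 1D DFT of the projection (discrete Fourier slice theorem).
assert np.allclose(slice_, np.fft.fft(p))
```

This identity is the basis of filtered back-projection in tomography, where the continuous version of the theorem is applied to line projections.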
2.2 Hölder Spaces and Fractional Order Sobolev Spaces
In this thesis we aim for an optimality result for one class of images, called cartoon-like images; that is, for some 1 < α ≤ β ≤ 2, images consisting of two smooth parts separated by a curve of C^α-regularity, where both smooth parts can be described by functions in C^β. Therefore the Hölder space C^α(Rⁿ) as well as some fractional order Sobolev spaces have to be introduced. First we introduce the space C^m of m-times continuously differentiable functions.
Definition 2.7. Let Ω ⊂ Rⁿ.

i) The space C^m(Ω), for a non-negative integer m, is the vector space consisting of all functions ϕ which, together with their derivatives D^α ϕ of order |α| ≤ m, are continuous on Ω.

ii) The vector space C^m(Ω̄) is the subspace consisting of all those functions ϕ ∈ C^m(Ω) for which D^α ϕ is bounded and uniformly continuous on Ω for all 0 ≤ |α| ≤ m.

Note that ϕ ∈ C^m(Ω) need not be bounded, but every ϕ ∈ C^m(Ω̄) possesses a unique bounded continuous extension to the closure Ω̄ [1]. With the norm ‖ϕ‖_{C^m(Ω̄)} = max_{0≤|α|≤m} sup_{x∈Ω} |D^α ϕ(x)|, the space C^m(Ω̄) is a Banach space [1].
Now we are ready to introduce the Hölder spaces:

Definition 2.8.

i) For 0 ≤ λ ≤ 1, a function ϕ fulfills the Hölder condition of exponent λ if there exists a positive constant K such that

|ϕ(x) − ϕ(y)| ≤ K |x − y|^λ   for all x, y ∈ Ω.

ii) If 0 ≤ λ ≤ 1, we define C^{m,λ}(Ω) to be the subspace of C^m(Ω) consisting of all those functions ϕ for which D^α ϕ satisfies the Hölder condition of exponent λ in Ω for all 0 ≤ |α| ≤ m. A function ϕ ∈ C^{m,λ}(Ω) is called Hölder-(m+λ) smooth.

Often we write σ = m + λ, where m is a non-negative integer and 0 ≤ λ ≤ 1, and write C^σ(Ω) instead of C^{m,λ}(Ω). Note that C^{m,λ}(Ω), together with the norm

‖ϕ‖_{C^{m,λ}(Ω)} = ‖ϕ‖_{C^m(Ω)} + max_{0≤|α|≤m} sup_{x,y∈Ω, x≠y} |D^α ϕ(x) − D^α ϕ(y)| / |x − y|^λ,

is a Banach space [1].
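As a quick numerical illustration (a sketch, not from the thesis): the square root satisfies the Hölder condition with exponent λ = 1/2 but with no larger exponent, which a brute-force estimate of the Hölder quotient makes visible.

```python
import numpy as np

def hoelder_quotient(f, xs, lam):
    """Largest quotient |f(x)-f(y)| / |x-y|^lam over all sample pairs x != y."""
    x, y = np.meshgrid(xs, xs)
    mask = x != y
    return np.max(np.abs(f(x[mask]) - f(y[mask])) / np.abs(x[mask] - y[mask]) ** lam)

f = np.sqrt  # f(x) = sqrt(x) on [0, 1] is Hoelder continuous with exponent 1/2
xs = np.concatenate(([0.0, 1e-8], np.linspace(1e-4, 1.0, 200)))

q_half = hoelder_quotient(f, xs, 0.5)  # stays bounded (in fact, bounded by 1)
q_more = hoelder_quotient(f, xs, 0.6)  # blows up near 0: sqrt is not Hoelder-0.6
```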
There is an equivalent characterization of Hölder smoothness that yields an estimate we will use later; therefore it should be stated here [11]:

f is a Hölder-α smooth function if it has ⌊α⌋ derivatives and if there exists a constant C > 0 such that

|f(x) − T_y^{⌊α⌋}(x)| ≤ C |x − y|^α   for all x, y,

where T_y^{⌊α⌋} denotes the Taylor polynomial of f of degree ⌊α⌋ at the point y.

Note that for the range 1 < α ≤ 2 and Ω ⊂ R we have T_y^{⌊α⌋}(x) = f(y) + f′(y)(x − y), and this yields the inequalities:

|f(x) − T_y^{⌊α⌋}(x)| = |f(x) − (f(y) + f′(y)(x − y))| ≤ C |x − y|^α
⇒ |f(x)| − |f(y) + f′(y)(x − y)| ≤ C |x − y|^α
⇒ |f(x)| ≤ |f(y)| + |f′(y)| |x − y| + C |x − y|^α.   (2.1)
Now we want to introduce some fractional order Sobolev spaces; that is, to extend the notion of Sobolev spaces from integer to non-integer orders. There are many ways to do this, but we restrict ourselves to the Sobolev-Slobodezki space W^{s,p}(Rⁿ) and the Sobolev space H^s(Rⁿ): the Sobolev-Slobodezki space W^{s,p}(Rⁿ) is equipped with a norm that uses a Hölder-type condition, while the Sobolev space H^s(Rⁿ) can be defined via the Fourier transform. But first recall the definition of the Sobolev space W^{m,p} for a non-negative integer m.

Definition 2.9.

i) For a non-negative integer m and 1 ≤ p < ∞, the norm ‖·‖_{m,p} is defined as

‖u‖_{m,p} := ( Σ_{0≤|α|≤m} ‖D^α u‖_p^p )^{1/p}.

ii) The Sobolev space W^{m,p}(Ω) is defined as

W^{m,p}(Ω) := { u ∈ L^p(Ω) : ‖u‖_{m,p} < ∞ }.

Note that by D^α u we mean the weak partial derivative. Now we can introduce the Sobolev-Slobodezki space:
Definition 2.10.

i) For 0 < µ < 1 and 1 ≤ p < ∞, the Slobodezki seminorm is defined as

|v|_{µ,p} := ( ∫_Ω ∫_Ω |v(x) − v(y)|^p / |x − y|^{d+µp} dx dy )^{1/p},

where d is the dimension of Ω ⊂ R^d.

ii) The Sobolev-Slobodezki space is defined as

W^{s,p}(Ω) := { v ∈ W^{m,p}(Ω) : |D^α v|_{µ,p} < ∞ for all |α| = m },

where s = µ + m.

Now the Fourier transform comes into play again. We will see that for p = 2 the Sobolev-Slobodezki space W^{s,2}(Rⁿ) coincides with the Sobolev space H^s(Rⁿ), which is defined by means of the Fourier transform.
Proposition 2.11 ([4]). Let u ∈ W^{s,2}(Rⁿ) with s > 0. Then there are constants c1 > 0 and c2 > 0 such that

c1 ‖u‖²_{s,2} ≤ ‖(1 + |·|²)^{s/2} Fu‖²₂ ≤ c2 ‖u‖²_{s,2}.

Definition 2.12. For s > 0,

i) the norm ‖·‖_{H^s} is defined as

‖u‖_{H^s} := ‖(1 + |·|²)^{s/2} Fu‖₂,

ii) the Sobolev space H^s(Rⁿ) for s > 0 is defined as

H^s(Rⁿ) := { u ∈ S(Rⁿ)* : ‖u‖_{H^s} < ∞ },

where S(Rⁿ)* denotes the dual space of the Schwartz space S(Rⁿ).
From Proposition 2.11 it follows that the two spaces coincide, i.e. W^{s,2}(Rⁿ) = H^s(Rⁿ). Later it will be easier to work with elements of H^s(Rⁿ) than of C^s(Rⁿ), so we need a connection between these two spaces. We first show how C^s(Rⁿ) and W^{s,2}(Rⁿ) are related; because H^s(Rⁿ) and W^{s,2}(Rⁿ) coincide, this also gives the desired relation. Since we will later restrict ourselves to R², it suffices to state the following theorem for R².

Theorem 2.13. Let 0 < s < ∞. Then for every ε > 0 the following embedding holds:

C₀^{s+ε}(R²) ⊂ W^{s,2}(R²),

where the subscript zero indicates, as usual, that we only consider compactly supported elements.
Proof. Let ϕ ∈ C₀^{s+ε}(R²). Then there is an Ω > 0 such that ϕ is supported in [−Ω, Ω]² ⊂ R², and by the Hölder condition we have, for s + ε = m + µ and some K > 0,

|D^α ϕ(x) − D^α ϕ(y)| / |x − y|^µ ≤ K   for all |α| = m.

To show that ϕ also lies in W^{s,2}(R²), we have to show that |D^α ϕ|_{µ−ε,2} < ∞ for all |α| = m. We have the estimate

|D^α ϕ(x) − D^α ϕ(y)|² / |x − y|^{2+2(µ−ε)} = ( |D^α ϕ(x) − D^α ϕ(y)| / |x − y|^µ · 1 / |x − y|^{1−ε} )² ≤ K² · 1 / |x − y|^{2(1−ε)}.

Hence

|D^α ϕ|²_{µ−ε,2} = ∫_{R²} ∫_{R²} |D^α ϕ(x) − D^α ϕ(y)|² / |x − y|^{2+2(µ−ε)} dx dy
               = ∫_{[−Ω,Ω]²} ∫_{[−Ω,Ω]²} |D^α ϕ(x) − D^α ϕ(y)|² / |x − y|^{2+2(µ−ε)} dx dy
               ≤ K² ∫_{[−Ω,Ω]²} ∫_{[−Ω,Ω]²} 1 / |x − y|^{2(1−ε)} dx dy < ∞,

where in the last step we used that for every ε > 0 the map z ↦ |z|^{−2(1−ε)} is contained in L¹(Ω̃) for every bounded region Ω̃ ⊂ R², since the exponent satisfies 2(1−ε) < 2; substituting z = x − y gives

∫_{[−Ω,Ω]²} ∫_{[−Ω,Ω]²} |x − y|^{−2(1−ε)} dx dy ≤ ∫_{[−Ω,Ω]²} ( ∫_{[−2Ω,2Ω]²} |z|^{−2(1−ε)} dz ) dy < ∞.

But this shows that the Slobodezki seminorm |D^α ϕ|_{µ−ε,2} is finite and therefore that ϕ is also in W^{s,2}(R²), which proves the theorem.
To conclude this subsection it remains to introduce a fractional order derivative of order smaller than s for a function ϕ ∈ H^s(Rⁿ). There are different methods to do this (see for example [1]), but we introduce the method based on the Fourier transform. Recall the following property of the Fourier transform:

Lemma 2.14 ([12]). Let ϕ ∈ W^{m,2}(Rⁿ) for a non-negative integer m. Then for |α| ≤ m we have

F(D^α ϕ) = i^{|α|} ξ^α Fϕ.

Since the right-hand side also makes sense for non-integer values of |α|, we can define the fractional order derivative this way.

Definition 2.15. Let ϕ ∈ W^{s,2}(Rⁿ) for s > 0. Then for |α| ≤ s define the fractional order derivative D^α ϕ by

F(D^α ϕ) = i^{|α|} ξ^α Fϕ.
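In one dimension this definition can be tried out numerically on a periodic grid: multiplying the DFT by the Fourier multiplier (ik)^α (principal branch) realizes the fractional derivative, and for sin the closed form D^α sin(x) = sin(x + απ/2) is reproduced. A Python sketch (illustrative only):

```python
import numpy as np

def frac_derivative(f_vals, alpha):
    """Fractional derivative via the Fourier multiplier (i k)^alpha,
    for a function sampled on a uniform grid of [0, 2*pi)."""
    n = f_vals.size
    k = np.fft.fftfreq(n, d=1.0 / n)      # integer frequencies 0, 1, ..., -1
    multiplier = np.zeros(n, dtype=complex)
    nz = k != 0
    multiplier[nz] = (1j * k[nz]) ** alpha  # principal branch of the power
    return np.fft.ifft(multiplier * np.fft.fft(f_vals)).real

n = 256
x = 2 * np.pi * np.arange(n) / n
alpha = 0.5

# Known closed form: D^alpha sin(x) = sin(x + alpha*pi/2).
approx = frac_derivative(np.sin(x), alpha)
exact = np.sin(x + alpha * np.pi / 2)
assert np.allclose(approx, exact, atol=1e-8)
```

For α = 1 the multiplier reduces to ik, i.e. the ordinary derivative.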
2.3 Frames
Recall that if we have a Hilbert space H and an orthonormal basis S ⊂ H of H, then every element x of H can be written as x = Σ_{e∈S} ⟨x, e⟩ e. This so-called reconstruction formula is a very useful property of an orthonormal basis. But orthonormality is a very strong condition, so it would be desirable to drop it without losing the reconstruction formula. Since frames accomplish this, we introduce them now. Later we will see that the cone-adapted shearlet system indeed forms a frame; hence we have a reconstruction formula.
Definition 2.16.

i) Let {ϕi : i ∈ I} be a collection of elements in a Hilbert space H. Then (ϕi)_{i∈I} forms a frame for H if there exist constants 0 < A ≤ B < ∞ such that

A‖f‖² ≤ Σ_{i∈I} |⟨f, ϕi⟩|² ≤ B‖f‖²   for all f ∈ H.

The constants A and B are called frame bounds.

ii) If A = B is possible, then (ϕi)_{i∈I} is called a tight frame. If A = B = 1, then (ϕi)_{i∈I} is called a Parseval frame.
As mentioned at the beginning of this subsection, we want a reconstruction formula for a frame. Note that for a tight frame the reconstruction formula is very simple to derive: since we have

Σ_{i∈I} |⟨f, ϕi⟩|² = A‖f‖²   for all f ∈ H,

it is easy to see that f can be written as [6]

f = (1/A) Σ_{i∈I} ⟨f, ϕi⟩ ϕi   for all f ∈ H.

For the general case this is much more complicated, and one first needs to introduce the frame operator:
Definition 2.17.

i) Let (ϕi)_{i∈I} ⊂ H be a frame for H. Then

T : H → ℓ²(I),   f ↦ (⟨f, ϕi⟩)_{i∈I}

is the analysis operator of (ϕi)_{i∈I}. The adjoint operator is given by

T* : ℓ²(I) → H,   (ci)_{i∈I} ↦ Σ_{i∈I} ci ϕi,

and is called the synthesis operator.

ii) The frame operator with respect to (ϕi)_{i∈I} is given by

S = T*T : H → H,   f ↦ Σ_{i∈I} ⟨f, ϕi⟩ ϕi.
Note that the synthesis operator is well-defined since T is bounded [6]. Now we can present the desired reconstruction formula:

Theorem 2.18 ([6]). Let (ϕi)_{i∈I} ⊂ H be a frame for H and let S be its frame operator as defined above. Then (S⁻¹ϕi)_{i∈I} is also a frame for H with frame bounds B⁻¹ and A⁻¹, the so-called canonical dual frame. Then for each f ∈ H we have

i) the reconstruction formula

f = Σ_{i∈I} ⟨f, ϕi⟩ S⁻¹ϕi,

ii) the decomposition formula

f = Σ_{i∈I} ⟨f, S⁻¹ϕi⟩ ϕi.
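In finite dimensions Theorem 2.18 can be verified directly with linear algebra: S becomes a matrix, and the canonical dual frame is obtained by applying S⁻¹ to each frame vector. A Python sketch (illustrative only; the random frame is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)

# A redundant frame for R^5: 12 random vectors (almost surely spanning).
Phi = rng.standard_normal((12, 5))          # rows are the frame vectors phi_i

# Frame operator S f = sum_i <f, phi_i> phi_i, i.e. S = Phi^T Phi.
S = Phi.T @ Phi
dual = Phi @ np.linalg.inv(S)               # rows are S^{-1} phi_i (S symmetric)

f = rng.standard_normal(5)
coeffs = Phi @ f                            # analysis: <f, phi_i>

# Reconstruction formula: f = sum_i <f, phi_i> S^{-1} phi_i.
f_rec = dual.T @ coeffs
assert np.allclose(f_rec, f)

# Decomposition formula: f = sum_i <f, S^{-1} phi_i> phi_i.
f_dec = Phi.T @ (dual @ f)
assert np.allclose(f_dec, f)
```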
Since our goal is to determine how the N-term approximation error decays, it is useful to bound this error as in the following lemma.

Lemma 2.19 ([8]). Let (ϕi)_{i∈I} be a frame for H with frame bounds A and B, and let (ϕ̃i)_{i∈I} be the canonical dual frame defined above. Let I_N ⊂ I with #I_N = N, and let f_N = Σ_{i∈I_N} ⟨f, ϕi⟩ ϕ̃i. Then

‖f − f_N‖² ≤ (1/A) Σ_{i∉I_N} |⟨f, ϕi⟩|².
2.4 Wavelets
The ability of shearlets to deliver sparse representations of anisotropic features is based on the ability of wavelets to deliver sparse representations of point singularities. For this reason we introduce wavelets and wavelet bases for L²(R²), together with a result that illustrates why wavelets are far from reaching the optimal rate. Two-dimensional wavelets are an extension of the one-dimensional case, so we first consider the one-dimensional case, i.e. L²(R).

Definition 2.20. Let ψ ∈ L²(R), ψ ≠ 0. Then ψ is called a wavelet if

C_ψ := ∫₀^∞ |ψ̂(ξ)|² / ξ dξ < ∞.

Note that this condition, the so-called wavelet condition, says that the Fourier transform of a wavelet must decay to zero sufficiently fast near zero.

Proposition 2.21 ([2]). The wavelet condition can only be fulfilled if ψ̂(0) = 0.
Now we aim to obtain a wavelet orthonormal basis for L²(R). For this we need the concept of a multiresolution analysis.

Definition 2.22. Let ψ be a wavelet. A collection of functions {ψ_{j,k}}_{j,k∈Z} of the form ψ_{j,k}(x) = 2^{−j/2} ψ(2^{−j}x − k) is called a wavelet system in L²(R). If the wavelet system forms an orthonormal basis for L²(R), the system is called a wavelet orthonormal basis.

Definition 2.23. A sequence (V_j)_{j∈Z} of closed subspaces of L²(R) is called a multiresolution analysis (MRA) if the following conditions are fulfilled:

i) Inclusion: for all j ∈ Z, {0} ⊂ ... ⊂ V₂ ⊂ V₁ ⊂ V₀ ⊂ V₋₁ ⊂ V₋₂ ⊂ ... ⊂ L²(R)
ii) Totality: the closure of ∪_{j∈Z} V_j is L²(R)
iii) Trivial intersection: ∩_{j∈Z} V_j = {0}
iv) Scaling: for all j ∈ Z: f ∈ V_j ⇔ f(2^j ·) ∈ V₀
v) Translation: for all j, k ∈ Z: f ∈ V_j ⇔ f(· − 2^j k) ∈ V_j
vi) Scaling function: there exists a function ϕ ∈ L²(R) such that {ϕ(· − m) : m ∈ Z} is an orthonormal basis (ONB) of V₀. This function is typically called the scaling function.

Note that by (iv) and (vi) we have

V_j = span{ϕ_{j,m} : m ∈ Z},   where ϕ_{j,m}(x) = 2^{−j/2} ϕ(2^{−j}x − m).
Lemma 2.24 ([2]). Let ϕ be the scaling function associated with an MRA. Then there exists a sequence (h_k)_{k∈Z} ⊂ R such that

ϕ(x) = 2^{1/2} Σ_{k∈Z} h_k ϕ(2x − k)   for all x ∈ R.

We say that ϕ satisfies the scaling equation.
Definition 2.25. Let (V_j)_{j∈Z} ⊂ L²(R) form an MRA. Then the associated wavelet spaces W_j, j ∈ Z, are defined by

V_j = V_{j+1} ⊕ W_{j+1},   W_{j+1} ⊥ V_{j+1}.

Note that by this definition it follows that [2]

V_j = ⊕_{m≥j+1} W_m

and, because of the totality of the (V_j), also

L²(R) = ⊕_{m∈Z} W_m.
With these results we can construct a wavelet basis for L²(R), as the following theorem [2] shows:

Theorem 2.26 ([2]). Let (V_j) be an MRA with scaling function ϕ that fulfills the scaling equation with the sequence (h_k). Define ψ ∈ V₋₁ as

ψ(x) = 2^{1/2} Σ_{k∈Z} (−1)^k h_{1−k} ϕ(2x − k).

Then the following statements are true:

i) The set {ψ_{j,k} = 2^{−j/2} ψ(2^{−j} · − k) : k ∈ Z} is an ONB of W_j.
ii) The set {ψ_{j,k} : j, k ∈ Z} is an ONB of L²(R).
iii) The function ψ is a wavelet.
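The simplest instance is the Haar MRA, where ϕ = χ_{[0,1)} and h = (1/√2, 1/√2), so that the recipe of Theorem 2.26 yields ψ = χ_{[0,1/2)} − χ_{[1/2,1)}. One discrete Haar step and its inverse can be sketched as follows (Python, illustrative only):

```python
import numpy as np

# Haar filters: h from the scaling equation, g_k = (-1)^k h_{1-k} for k = 0, 1.
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([h[1], -h[0]])

def haar_step(c):
    """One level of the discrete Haar transform: split V_j into V_{j+1} + W_{j+1}."""
    approx = h[0] * c[0::2] + h[1] * c[1::2]
    detail = g[0] * c[0::2] + g[1] * c[1::2]
    return approx, detail

def haar_step_inv(approx, detail):
    """Inverse of one Haar step (the transform is orthonormal)."""
    c = np.empty(2 * approx.size)
    c[0::2] = h[0] * approx + g[0] * detail
    c[1::2] = h[1] * approx + g[1] * detail
    return c

x = np.arange(8.0)
a, d = haar_step(x)
assert np.allclose(haar_step_inv(a, d), x)       # perfect reconstruction
assert np.isclose(np.linalg.norm(x) ** 2,
                  np.linalg.norm(a) ** 2 + np.linalg.norm(d) ** 2)  # energy split
```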
In fact we want to construct a wavelet basis for L²(R²), and with the orthonormal wavelet basis for L²(R) we can do this via tensor products.

Theorem 2.27 ([2]). Let (V_j) be an MRA for L²(R) with scaling function ϕ ∈ L²(R) and associated wavelet ψ ∈ L²(R). Define for (x1, x2) ∈ R²

ψ¹(x1, x2) = ϕ(x1)ψ(x2),
ψ²(x1, x2) = ψ(x1)ϕ(x2),
ψ³(x1, x2) = ψ(x1)ψ(x2).

Then the set

Ψ = { ψ^k_{j,m}(x1, x2) = 2^{−j} ψ^k(2^{−j}x1 − m1, 2^{−j}x2 − m2) : j, m1, m2 ∈ Z; k ∈ {1, 2, 3} }

is an ONB for L²(R²).
Note that this ONB is a so-called wavelet basis. From [7] we know that for a 'nice' wavelet ψ and a function f that is smooth apart from a discontinuity at a point x0 ∈ R², the wavelet transform

W_ψ f(2^{−j}, k) = 2^{−j/2} ∫_{R²} ψ(2^{−j}(x − k)) f(x) dx

decays rapidly as j → ∞, except for k near x0. So the wavelet transform is able to locate point singularities. But the next theorem shows that the approximation rate obtained by wavelet approximation is only O(N^{−1}), and later we will see that this approximation rate is far from being optimal.

Theorem 2.28 ([8]). Let Ψ be a wavelet basis for L²(R²). Suppose f = χ_B, where B is a ball contained in [0, 1]². Then

‖f − f_N‖²_{L²} ≍ N^{−1}   for N → ∞,

where f_N is the best N-term approximation from Ψ.
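Theorem 2.28 can be observed numerically: for the orthonormal 2D Haar basis, the squared N-term error equals the energy of the discarded coefficients, so the decay of a discretized ball indicator can be measured without an inverse transform. A Python sketch (illustrative only; grid size and radius are arbitrary choices):

```python
import numpy as np

def haar2d(a):
    """Orthonormal 2D Haar wavelet transform (Mallat scheme), n a power of two."""
    a = a.astype(float).copy()
    n = a.shape[0]
    while n > 1:
        for axis in (0, 1):
            b = np.swapaxes(a[:n, :n], 0, axis)   # view: transform rows, then columns
            s = (b[0::2] + b[1::2]) / np.sqrt(2.0)
            d = (b[0::2] - b[1::2]) / np.sqrt(2.0)
            b[: n // 2], b[n // 2 : n] = s, d
        n //= 2                                   # recurse on the approximation block
    return a

# f = indicator of a disk in [0,1]^2, sampled on a 256 x 256 grid.
n = 256
xs = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(xs, xs)
f = ((X - 0.5) ** 2 + (Y - 0.5) ** 2 <= 0.3 ** 2).astype(float)

c = np.sort(np.abs(haar2d(f)).ravel())[::-1]      # coefficients, largest first

# Orthonormality: the squared N-term error is the energy of what we discard.
err = lambda N: np.sum(c[N:] ** 2)

ratio = err(200) / err(800)   # roughly 4 for an N^{-1} decay
```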
Figure 1: Example of a cartoon-like image.
3 Optimality Result
In this section we want to deduce the optimal approximation rate that can be reached for
cartoon-like images. Therefore the class of cartoon-like images has to be defined first.
3.1 The Class E^β_{α,L} of Cartoon-like Images
The main idea for modeling an image mathematically was given in [3]; that is, such images contain two smooth (C²-) regions separated by a smooth (C²-) curve (see Figure 1). In this thesis we do not want to restrict ourselves to C²-regularity; therefore we consider images that contain two smooth C^β-regions separated by a C^α-curve, where α and β lie between one and two. For clarity we first introduce the set STAR^α(ν, L), and afterwards the class of cartoon-like images E^β_{α,L}.
For α > 0, let ρ : [0, 2π) → [0, ∞) be continuous, and define the set B ⊂ R² by

B = { x ∈ R² : x = (‖x‖₂, θ) in polar coordinates, ‖x‖₂ ≤ ρ(θ), θ ∈ [0, 2π) }

such that the boundary ∂B of B is the closed curve parametrized by

b(θ) = ( ρ(θ) cos(θ), ρ(θ) sin(θ) )ᵀ,   θ ∈ [0, 2π),   (3.1)

and the radius function ρ is Hölder continuous with coefficient ν, i.e.

max_{|γ|=⌊α⌋} sup_{θ≠θ'} |∂^γ ρ(θ) − ∂^γ ρ(θ')| / |θ − θ'|^{{α}} ≤ ν,   (3.2)

where {α} = α − ⌊α⌋ denotes the fractional part of α.
Definition 3.1.

i) For ν > 0 the set STAR^α(ν) is defined to contain all sets B ⊂ R2 that are translates of sets obeying (3.1) and (3.2).
Figure 2: Natural image containing a cartoon-like structure.
ii) The class STAR^α(ν, L) is defined to be the set containing all sets B with piecewise C^α-boundary, i.e. ∂B is the union of finitely many pieces ∂B_1, ..., ∂B_L, which do not overlap except at their boundaries, and each piece ∂B_i, i ∈ {1, ..., L}, can be represented in parametric form by a C^α-smooth radius function ρ_i = ρ_i(θ) obeying (3.2).
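To make the definition concrete, here is a small sketch (not from the thesis; the radius function rho below is an arbitrary example) that tests membership in a star-shaped set given by its radius function in polar coordinates:

```python
# Illustrative sketch: membership test for a star-shaped set B described by a
# continuous radius function rho on [0, 2*pi). The particular rho is just an
# example choice, a smooth 2*pi-periodic perturbation of a circle.
import math

def rho(theta):
    return 0.4 + 0.1 * math.cos(2 * theta)

def in_B(x1, x2, center=(0.5, 0.5)):
    # x lies in B iff its distance to the center is at most rho(theta)
    dx, dy = x1 - center[0], x2 - center[1]
    theta = math.atan2(dy, dx) % (2 * math.pi)
    return math.hypot(dx, dy) <= rho(theta)

print(in_B(0.5, 0.5), in_B(0.99, 0.99))
```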
Note that inequality (3.2) in particular implies that the discontinuity curve is C^α-regular, and that by definition we have no restriction on the sharpness of the edges, since there is no specification of how the pieces ∂B_i meet. Also observe that STAR^α(ν) = STAR^α(ν, 1). Having introduced STAR^α(ν, L), we are ready to define the class of cartoon-like images:
Definition 3.2.

i) Let µ, ν > 0, α, β ∈ (1, 2] and L ∈ N. Then E^β_{α,L}(R2) denotes the set of functions f : R2 → C of the form

f = f0 + f1 χ_B,

where B ∈ STAR^α(ν, L) and f_i ∈ C^β(R2) with supp f0 ⊂ [0, 1]² and ‖f_i‖_{C^β} ≤ µ for i = 0, 1. Define E^β_α(R2) = E^β_{α,1}(R2).

ii) E^{bin}_{α,L}(R2) denotes the class of binary cartoon-like images, that is, functions f = f0 + f1 χ_B ∈ E^β_{α,L}(R2) where f0 = 0 and f1 = 1.
This model of images seems appropriate, since natural images can often be characterized as containing many smooth regions separated by curves with sharp corners. So if we only look at a small detail of an image, it resembles our model. For an example see Figure 2.
3.2 Optimality Rate

In this subsection we aim for a benchmark for sparse approximation of functions in E^β_{α,L} ⊂ L2(R2). So let Ψ = (ψ_i)_{i∈I} be a dictionary for L2(R2), where I is not necessarily countable. Without loss of generality we assume ‖ψ_i‖_{L2} = 1. Then for every function f ∈ L2(R2) there exists a countable subset I_f of I and a sequence (c_i(f))_{i∈I_f} = c(f) such that

f = Σ_{i∈I_f} c_i(f) ψ_i.
For an N-term approximation we have to search for the N largest coefficients in Ψ_f = (ψ_i)_{i∈I_f}. But without a restriction on the search depth this can be infeasible in practice. The reason is that we could choose Ψ to be a countable dense subset of L2(R2), since L2(R2) is separable. This would yield arbitrarily good sparse approximations, since there is always an element of Ψ that is nearer to f than the previous one; hence the search would never end. For this reason we impose polynomial depth search: this requires, for a polynomial π(n), that the first n terms in the n-term approximation come from the first π(n) terms of the dictionary [5].

To find the optimality rate under the polynomial depth search restriction we have to define what it means for a function space to contain a copy of ℓ^p_0. We will see that if a function space F contains a copy of ℓ^p_0, then for every τ < p there exists an element f ∈ F such that c(f) ∉ ℓ^τ. From this we can deduce the optimality rate.
Definition 3.3.

i) A function space F is said to contain an embedded orthogonal hypercube of dimension m and side δ if there exist f0 ∈ F and orthogonal functions ψ_{i,m,δ}, i = 1, ..., m, with ‖ψ_{i,m,δ}‖_{L2} = δ, such that the collection of hypercube vertices

H(m; f0, (ψ_i)) = { h = f0 + Σ_{i=1}^{m} ε_i ψ_{i,m} : ε_i ∈ {0, 1} }

is embedded in F.

ii) A function space F is said to contain a copy of ℓ^p_0 if F contains embedded orthogonal hypercubes of dimension m(δ) and side δ, and if for some sequence δ_k → 0 and some constant C > 0 there exists a k0 such that

m(δ_k) ≥ C δ_k^{−p}  for all k ≥ k0.   (3.3)
Theorem 3.4 ([5]). Suppose F contains a copy of ℓ^p_0. Then for every τ < p, and allowing only polynomial depth search, we have

sup_{f∈F} ‖c(f)‖_{ℓ^τ} = +∞.
Now we want to transfer these results to our model of cartoon-like images, and we will see for which p the class of cartoon-like images contains a copy of ℓ^p_0.

Theorem 3.5.

i) The class of binary cartoon-like images E^{bin}_α(R2) contains a copy of ℓ^p_0 for p = 2/(α + 1).

ii) The space of Hölder functions C^β(R2) with compact support in [0, 1]² contains a copy of ℓ^p_0 for p = 2/(β + 1).
Proof.

i) Follows directly from Donoho [5] for star-shaped functions, since B is in particular star-shaped.

ii) To show that C^β(R2) contains a copy of ℓ^p_0 we have to find a collection of embedded orthogonal hypercubes of dimension m(δ) and side δ such that (3.3) holds.

Let ϕ ∈ C_0^∞ with supp ϕ ⊂ [0, 1] and define for m ∈ N and i1, i2 ∈ {0, ..., m − 1}

ψ_{i,m}(t) = ψ_{i1,i2,m}(t) = m^{−β} ϕ(m t1 − i1) ϕ(m t2 − i2),

where i = (i1, i2) and t = (t1, t2) ∈ R2. We let ψ(t) = ϕ(t1)ϕ(t2); then ψ and ψ_{i,m} ∈ C^β(R2), since ϕ ∈ C^∞(R). It follows that ‖ψ_{i,m}‖²_{L2} = m^{−2β−2} ‖ψ‖²_{L2}; to see this, note that ∫_R |ϕ(m t_k − i_k)|² dt_k = m^{−1} ‖ϕ‖²_{L2}.

Since supp ψ_{i,m} ⊂ [i1/m, (i1+1)/m] × [i2/m, (i2+1)/m], we have supp ψ_{i,m} ∩ supp ψ_{j,m} = ∅ for i ≠ j. Hence ψ_{i,m} and ψ_{j,m} are orthogonal in L2(R2) for i ≠ j, and we have the hypercube embedding

H(m²; 0, (ψ_{i,m})) = { h = Σ_{i1=1}^{m} Σ_{i2=1}^{m} ε_{i1,i2} m^{−β} ϕ(m · − i1) ϕ(m · − i2) : ε_i ∈ {0, 1} } = { h = Σ_{i=1}^{m²} ε_i ψ_{i,m} : ε_i ∈ {0, 1} },

where δ = ‖ψ_{i,m}‖_{L2} = m^{−β−1} ‖ψ‖_{L2}.

Therefore, if we choose m(δ) as m(δ) = ⌊(δ/‖ψ‖_{L2})^{−1/(β+1)}⌋², it follows for δ_k → 0 and δ_{k0} sufficiently small that

m(δ_k) ≥ ( (δ_k/‖ψ‖_{L2})^{−1/(β+1)} − 1 )² = (δ_k/‖ψ‖_{L2})^{−2/(β+1)} ( 1 − (δ_k/‖ψ‖_{L2})^{1/(β+1)} )² ≥ (δ_k/‖ψ‖_{L2})^{−2/(β+1)} ( 1 − (δ_{k0}/‖ψ‖_{L2})^{1/(β+1)} )² ≥ C · δ_k^{−2/(β+1)}.
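A quick numeric sanity check of this choice of m(δ) (illustrative, with ‖ψ‖_{L2} normalized to 1 for simplicity): the side length is δ = m^{−(β+1)}, so the hypercube dimension m² equals δ^{−2/(β+1)} exactly, which is the exponent p = 2/(β+1) used above.

```python
# Sanity check (illustrative, assuming ||psi||_{L2} = 1): for side length
# delta = m^-(beta+1), the hypercube dimension m^2 equals delta^(-2/(beta+1)),
# i.e. it grows with exponent p = 2/(beta+1).
beta = 1.5
for m in (4, 16, 64):
    delta = m ** (-(beta + 1))
    print(m, m * m, delta ** (-2 / (beta + 1)))
```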
With this result and Theorem 3.4 we see that ‖c(f)‖_{ℓ^p} cannot be bounded for p < max{2/(α+1), 2/(β+1)}.

Let (c_n(f)^*)_{n∈N} be a decreasing (in modulus) rearrangement of the coefficients (c(f)_n)_{n∈N}. For the sequence (c_n(f)^*)_n not to lie in ℓ^p means that ((c_n(f)^*)^p)_n behaves asymptotically worse than 1/n, i.e. (c_n(f)^*)^p ≥ 1/n for n sufficiently large. Therefore c_n(f)^* ≥ n^{−(α+1)/2}, since f ∈ E^{bin}_α(R2), which contains a copy of ℓ^p_0 for p = 2/(α+1). Analogously it follows that c_n(f)^* ≥ n^{−(β+1)/2}, since f ∈ C^β(R2). This implies the worst-case lower bound

(c_n(f)^*)_n ≳ n^{− min{(α+1)/2, (β+1)/2}}.

Suppose now that (|c^*_n|)_{n∈N} = (|c_n(f)^*|)_{n∈N} decays as |c^*_n| ≲ n^{−(α+1)/2}. From Lemma 2.19 we know that ‖f − f_N‖² ≤ (1/A) Σ_{n>N} |⟨f, ϕ_n⟩|², where A is a lower frame bound for the frame (ϕ_i)_i of L2(R2). Hence,

‖f − f_N‖² ≤ (1/A) Σ_{n>N} |c^*_n|² ≲ (1/A) Σ_{n>N} n^{−(α+1)} ≲ ∫_N^∞ x^{−(α+1)} dx ≲ C N^{−α},

for (c(f)_n)_{n∈N} = (⟨f, ϕ_n⟩)_{n∈N}. By supposing that |c^*_n| decays as |c^*_n| ≲ n^{−(β+1)/2}, this yields ‖f − f_N‖² ≲ C N^{−β}. Summarizing, it follows that the optimal approximation error cannot exceed O(max{N^{−α}, N^{−β}}) convergence. For the parameter range 1 < α ≤ β ≤ 2 this rate reduces to O(N^{−α}).
4 Decay of the Approximation Error using a Shearlet System

4.1 Cone-adapted Shearlet System
As we have seen in Theorem 2.28 and Section 3, the approximation rate obtained by wavelet approximation of a cartoon-like image is far from optimal. The reason is that wavelets are only able to locate point singularities, while a cartoon-like image is characterized by curve singularities. So the authors of [7] extended the wavelet transform by a shearing operation, which yields a new system: the shearlet system. In this section we want to introduce the shearlet system.
For α ∈ (1, 2] we introduce the scaling matrices A_{2^j}, Ã_{2^j}, j ∈ Z, defined by

A_{2^j} = diag(2^{jα/2}, 2^{j/2}),   Ã_{2^j} = diag(2^{j/2}, 2^{jα/2}),

and the shear matrix S_k, defined by

S_k = [1, k; 0, 1].
As can be seen in Figure 3, we partition the frequency domain into the following four cones

C1 = { (ξ1, ξ2) ∈ R2 : ξ1 ≥ 1, |ξ2/ξ1| ≤ 1 },
C2 = { (ξ1, ξ2) ∈ R2 : ξ2 ≥ 1, |ξ1/ξ2| ≤ 1 },
C3 = { (ξ1, ξ2) ∈ R2 : ξ1 ≤ −1, |ξ2/ξ1| ≤ 1 },
C4 = { (ξ1, ξ2) ∈ R2 : ξ2 ≤ −1, |ξ1/ξ2| ≤ 1 },

and a centered square

R = { (ξ1, ξ2) ∈ R2 : ‖(ξ1, ξ2)‖_∞ < 1 }.

This partition is useful since, for forming a frame for L2(R2), it then suffices to take a bounded set of shearing parameters k. The idea is now to construct a so-called shearlet frame for the subspace of L2(R2) induced by the square R as well as for those induced by C1 ∪ C3 and C2 ∪ C4, which all together form a frame for L2(R2).
Definition 4.1. For c = (c1, c2) ∈ R²_+ the cone-adapted discrete shearlet system SH(Φ, Ψ, Ψ̃; c, α) for the parameter α ∈ (1, 2], generated by ϕ, ψ and ψ̃, is defined by

SH(Φ, Ψ, Ψ̃; c, α) = Φ(ϕ; c1, α) ∪ Ψ(ψ; c, α) ∪ Ψ̃(ψ̃; c, α),
Figure 3: Partition of frequency domain.
where

Φ(ϕ; c1, α) = { ϕ_m = ϕ(· − m) : m ∈ c1 Z² },
Ψ(ψ; c, α) = { ψ_{j,k,m} = 2^{j(α+1)/4} ψ(S_k A_{2^j} · − m) : j ≥ 0, |k| < ⌈2^{j(α−1)/2}⌉, m ∈ cZ² },
Ψ̃(ψ̃; c, α) = { ψ̃_{j,k,m} = 2^{j(α+1)/4} ψ̃(S_k^T Ã_{2^j} · − m) : j ≥ 0, |k| < ⌈2^{j(α−1)/2}⌉, m ∈ c̃Z² },

where c̃ = (c2, c1) and cZ² means: for z = (z1, z2) ∈ Z² we set cz = (c1 z1, c2 z2).

For easier handling define for j ≥ 0 the set of possible parameters λ:

Λ_j = { λ = (j, k, m) : |k| < ⌈2^{j(α−1)/2}⌉, m ∈ cZ² }.
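The scale-dependent range of the shear parameter can be made concrete with a short sketch (illustrative, not from the thesis): at scale j there are about 2^{j(α−1)/2} admissible shears k, while the support of a shearlet shrinks anisotropically like 2^{−jα/2} × 2^{−j/2}.

```python
# Illustrative sketch: size of the shear-parameter range in Lambda_j.
# For each scale j the shear parameter k runs over |k| < ceil(2^(j*(alpha-1)/2)),
# so the number of directions grows like 2^(j*(alpha-1)/2); the side lengths
# printed alongside are the (approximate) support dimensions of a shearlet.
import math

def shear_range(j, alpha):
    bound = math.ceil(2 ** (j * (alpha - 1) / 2))
    return [k for k in range(-bound + 1, bound)]  # all k with |k| < bound

alpha = 2.0
for j in (0, 2, 4, 6):
    ks = shear_range(j, alpha)
    print(j, len(ks), 2 ** (-j * alpha / 2), 2 ** (-j / 2))
```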
As mentioned above, we want to know whether this shearlet system forms a frame for L2(R2). Under certain conditions one can show that Ψ(ψ; c, α) forms a frame for the subspace of L2(R2) induced by C1 ∪ C3, i.e. for

{ f ∈ L2(R2) : ess-supp f̂ ⊂ C1 ∪ C3 }.

Analogously, since Ψ(ψ; c, α) and Ψ̃(ψ̃; c, α) are linked by a rotation of 90°, Ψ̃(ψ̃; c, α) forms a frame for

{ f ∈ L2(R2) : ess-supp f̂ ⊂ C2 ∪ C4 }.

This implies that SH(Φ, Ψ, Ψ̃; c, α) forms a frame for L2(R2) under certain conditions. These conditions will not be stated here; for details see [10]. But a minimal requirement that we use many times is the property of a shearlet to be feasible. This property ensures that the essential support of the shearlet in the frequency domain is bounded.
Definition 4.2. Let δ, γ > 0. A function ψ ∈ L2(R2) is called a (δ, γ)-feasible shearlet if there exist q ≥ q' > 0 and q ≥ r > 0 such that

|ψ̂(ξ)| ≲ min{1, |q ξ1|^δ} · min{1, |q' ξ1|^{−γ}} · min{1, |r ξ2|^{−γ}}.   (4.1)
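The bound (4.1) is a separable decay envelope on |ψ̂|; the following sketch (illustrative; delta and gamma are example values, with q = q' = r = 1) evaluates it and shows its two characteristic effects, vanishing along the axis ξ1 = 0 and polynomial decay in both frequency variables:

```python
# Illustrative sketch: the (delta, gamma)-feasibility envelope of (4.1) for
# q = q' = r = 1. delta = 8, gamma = 4 are example values with gamma > 3.
def envelope(xi1, xi2, delta=8.0, gamma=4.0):
    a = min(1.0, abs(xi1) ** delta)                              # vanishes at xi1 = 0
    b = min(1.0, abs(xi1) ** (-gamma)) if xi1 != 0 else 1.0      # decay in xi1
    c = min(1.0, abs(xi2) ** (-gamma)) if xi2 != 0 else 1.0      # decay in xi2
    return a * b * c

print(envelope(0.0, 5.0), envelope(2.0, 0.0), envelope(4.0, 4.0))
```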
In the following parts we will assume that q = q' = r = 1. Remember that by Heisenberg's uncertainty principle a shearlet that is compactly supported in the time domain cannot be compactly supported in the frequency domain; but thanks to feasibility we still have a decay condition on the shearlet in the frequency domain.

Since the shearlet ψ is compactly supported in [0, 1]², the shearlets ψ_{j,k,m} will be supported in a parallelogram of side lengths 2^{−jα/2} and 2^{−j/2}, as illustrated in Figure 4. For α close to 1 the shearlets are square-like, but for α > 1 the shearlets become more and more line-like as j → ∞; that is, in one direction the shearlets become smaller and smaller.
Figure 4: Support of the shearlet.
4.2 Decay Rate of the N-term Approximation Error
In this section we state the main result: the N-term approximation error obtained using a cone-adapted shearlet system that fulfills certain conditions attains the optimal approximation rate N^{−α} up to a log-factor. But first we need to analyze the shearlet coefficients ⟨f, ψ_λ⟩. We separate this analysis into shearlets whose support is away from the discontinuity curve and shearlets whose support interacts with the discontinuity curve. For the shearlets interacting with the discontinuity curve we further subdivide the analysis: first a linear discontinuity curve, then a general discontinuity curve, and finally a general discontinuity curve with finitely many corners. The last extension, to a discontinuity curve with corners, is not that difficult and will be given in Section 6. In this section we only state the results; the proofs are given in the following sections.
Let SH(Φ, Ψ, Ψ̃; c, α) be a shearlet frame for L2(R2), which we write as

SH(Φ, Ψ, Ψ̃; c, α) = (ω_i)_{i∈I}

to have the same notation as in Section 2.3. The canonical dual frame associated with the frame operator is denoted by (ω̃_i)_{i∈I}. Then by the reconstruction formula of Theorem 2.18 we know that every f ∈ E^β_α(R2) can be written as

f = Σ_{i∈I} ⟨f, ω_i⟩ ω̃_i.

We define the N-term approximation f_N of f ∈ E^β_α(R2) to be

f_N = Σ_{i∈I_N} ⟨f, ω_i⟩ ω̃_i,

where (⟨f, ω_i⟩)_{i∈I_N} are the N largest coefficients in magnitude. Note that this is not always the best N-term approximation, but it is an N-term approximation, and as we will see it suffices to reach the optimal decay rate. Further note that this approximation is not linear: if f_N is the N-term approximation of f with index set I_{f,N} and g_N the N-term approximation of g with index set I_{g,N}, then f_N + g_N is an N-term approximation of f + g only if I_{f,N} = I_{g,N}.
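The selection rule behind f_N can be sketched in a few lines (illustrative; the coefficient lists are made-up numbers standing in for frame coefficients):

```python
# Illustrative sketch: the nonlinear N-term approximation keeps the N frame
# coefficients <f, omega_i> that are largest in magnitude.
def n_term_index_set(coeffs, N):
    # indices of the N largest coefficients in magnitude
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return set(order[:N])

c_f = [0.9, -0.05, 0.4, 0.01, -0.6]
c_g = [0.02, 0.8, -0.03, 0.7, 0.05]
I_f = n_term_index_set(c_f, 2)
I_g = n_term_index_set(c_g, 2)
# nonlinearity: f and g select different index sets, so f_N + g_N is in
# general a 2N-term (not an N-term) approximation of f + g
print(I_f, I_g, I_f == I_g)
```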
In the following sections we prove the estimates that are stated now. Note that we use generic constants in the proofs, that is, we use the same symbol for constants even if their value changes. We first have to introduce some notation.
For a scale j ≥ 0 and p ∈ Z² let Q_{j,p} denote the dyadic cube defined by

Q_{j,p} := [−2^{−j/2}, 2^{−j/2}]² + 2^{−j/2} p.

Next define Q_j as

Q_j := { Q_{j,p} : int(Q_{j,p}) ∩ ∂B ≠ ∅ },

where int(Q_{j,p}) denotes the interior of Q_{j,p}; i.e. Q_j is the collection of dyadic squares Q_{j,p} whose interior intersects the discontinuity curve ∂B.

Now the shearlets come into play: for some scale j ≥ 0 and some p ∈ Z² define Λ_{j,p} to be the set of shearlet indices such that the corresponding shearlet intersects the discontinuity curve ∂B in the interior of Q_{j,p}, i.e.

Λ_{j,p} := { λ ∈ Λ_j : int(Q_{j,p}) ∩ ∂B ∩ int(supp ψ_λ) ≠ ∅ },

where Λ_j denotes the set of all shearlet indices at scale j.

Finally, we define Λ_{j,p}(ε) for 0 < ε < 1 to be the set of shearlet indices whose support intersects Q_{j,p} and whose corresponding shearlet coefficients |⟨f, ψ_λ⟩| are larger than ε, and Λ(ε) to be the collection of all Λ_{j,p}(ε) across all scales j ≥ 0 and all p ∈ Z²:

Λ_{j,p}(ε) := { λ ∈ Λ_{j,p} : |⟨f, ψ_λ⟩| > ε }  and  Λ(ε) := ⋃_{j,p} Λ_{j,p}(ε).

The set S_{j,p} := ⋃_{λ∈Λ_{j,p}} supp ψ_λ, for Q_{j,p} ∈ Q_j, is then contained in a cubic window of size C·2^{−j/2} × C·2^{−j/2}, hence of asymptotically the same size as Q_{j,p}.
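The collection Q_j can be illustrated numerically (a sketch, not from the thesis): taking ∂B to be a circle, we flag every dyadic cube whose interior the circle passes through; since the curve has finite length, the number of flagged cubes grows roughly like 2^{j/2}.

```python
# Illustrative sketch: count the dyadic cubes Q_{j,p} (side 2*2^(-j/2),
# translation step 2^(-j/2)) whose interior meets a circle of radius r.
import math

def count_Qj(j, r=0.35, c=0.5):
    h = 2 ** (-j / 2)
    n = int(round(2 ** (j / 2)))
    count = 0
    for p1 in range(-2, n + 2):
        for p2 in range(-2, n + 2):
            lo1, hi1 = h * p1 - h, h * p1 + h
            lo2, hi2 = h * p2 - h, h * p2 + h
            # nearest and farthest distance from the circle center to the cube
            dx = max(lo1 - c, 0.0, c - hi1)
            dy = max(lo2 - c, 0.0, c - hi2)
            dmin = math.hypot(dx, dy)
            dmax = math.hypot(max(abs(lo1 - c), abs(hi1 - c)),
                              max(abs(lo2 - c), abs(hi2 - c)))
            if dmin < r < dmax:  # circle passes through the cube
                count += 1
    return count

for j in (2, 4, 6, 8):
    print(j, count_Qj(j))
```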
As a first step we want to analyze the shearlet coefficients of shearlets not interacting with the discontinuity curve. Note that for computing the shearlet coefficient ⟨f, ψ_λ⟩, only the part of f on the support of ψ_λ matters. On this part f is C^β-regular, since the shearlet does not intersect the discontinuity curve. Therefore it suffices to look at functions f that are C^β-smooth on all of R2.
Proposition 4.3. Let f ∈ C^β(R2) with supp f ⊂ [0, 1]². Suppose that ψ ∈ L2(R2) is compactly supported and (δ, γ)-feasible for δ > γ + β and γ > 3. Then:

Σ_{n>N} |c(f)^*_n|² ≲ N^{−β+ε}  as N → ∞

for any ε > 0.

So we have the decay rate N^{−β+ε} for any ε > 0; since we can choose ε arbitrarily small, this essentially yields our desired decay rate N^{−β}.
Now we come to the more interesting part: the decay rate of the shearlet coefficients associated with the discontinuity curve. By assumption, the discontinuity curve is of C^α-regularity; hence, for sufficiently large j the edge curve can be parametrized either by (x1, E(x1)) or by (E(x2), x2), with E ∈ C^α, in the interior of S_{j,p} (see Figure 5).

Figure 5: In a sufficiently small cubic window Q_{j,p} the discontinuity curve has one of these characteristics. In the intersection of the cubic window with the support of a shearlet and the discontinuity curve, we choose a point and the tangent to the discontinuity at this point. The left picture is an example for case 4a) and the right picture for case 4b).
Now there are two cases to distinguish:

Case 4a: The discontinuity curve can be parametrized by (x1, E(x1)) or by (E(x2), x2) with E ∈ C^α in the interior of S_{j,p}, such that for any λ ∈ Λ_{j,p} we have

|E′(x̂2)| ≤ 2 or |E′(x̂1)|^{−1} ≤ 2

for one x̂ = (x̂1, x̂2) ∈ int(Q_{j,p}) ∩ int(supp ψ_λ) ∩ ∂B.

Case 4b: The discontinuity curve can be parametrized either by (x1, E(x1)) or by (E(x2), x2) with E ∈ C^α in the interior of S_{j,p}, such that for any λ ∈ Λ_{j,p} we have

|E′(x̂2)| > 2 or |E′(x̂1)|^{−1} > 2

for one x̂ = (x̂1, x̂2) ∈ int(Q_{j,p}) ∩ int(supp ψ_λ) ∩ ∂B. Note that E′(x̂1) = 0 can be identified with E′(x̂1)^{−1} = ∞.
As mentioned above we first assume that the discontinuity curve is linear on the support of
the shearlet and estimate the coefficient.
Proposition 4.4. Let ψ ∈ L2(R2) be compactly supported, and suppose that ψ is (δ, γ)-feasible for δ > γ + β and γ > 4 and satisfies

|∂/∂ξ2 ψ̂(ξ)| ≤ |h(ξ1)| · (1 + |ξ2|/|ξ1|)^{−γ}   (4.2)

for some h ∈ L1(R). Furthermore, let λ ∈ Λ_{j,p} for j ≥ 0 and p ∈ Z². Suppose that f ∈ E^β_α(R2) for 1 < α ≤ β ≤ 2 and that ∂B is linear on the support of ψ_λ in the sense that

supp ψ_λ ∩ ∂B ⊂ H

for some affine line H of R2. Then:

i) if H has normal vector (−1, s) with s ≤ 3,

|⟨f, ψ_λ⟩| ≤ C · 2^{−j(α+1)/4} / |k + 2^{j(α−1)/2} s|³   (4.3)

for some constant C > 0;

ii) if H has normal vector (−1, s) with s ≥ 3/2,

|⟨f, ψ_λ⟩| ≤ C · 2^{−j(7α−5)/4}   (4.4)

for some constant C > 0;

iii) if H has normal vector (0, s) with s ∈ R, then (4.4) holds.
Observe that if the line has slope s, then its direction vector is given by (s, 1), since the line is parametrized by the equation x1 = s · x2 + c for some c ∈ R; therefore the normal vector of the line is given by (−1, s). We see that case 4a) is handled in 4.4 i), and case 4b) is handled in 4.4 ii) and iii). Note that it is no problem that cases i) and ii) of Proposition 4.4 overlap, since both cases yield the desired decay rate, as we will see later.
Having this estimate, we can use it to prove the more general result for shearlet coefficients of shearlets intersecting a general discontinuity curve without corners.

Theorem 4.5. Let ψ ∈ L2(R2) be compactly supported and suppose that ψ is (δ, γ)-feasible for δ > γ + β and γ > 4 and satisfies condition (4.2) of Proposition 4.4. Furthermore, let λ ∈ Λ_{j,p} for j ≥ 0 and p ∈ Z². Suppose that f ∈ E^β_α(R2) for 1 < α ≤ β ≤ 2 and µ, ν > 0. For fixed x̃ = (x̃1, x̃2) ∈ int(Q_{j,p}) ∩ int(supp ψ_λ) ∩ ∂B let H be the tangent line to the discontinuity curve ∂B at x̃. Then:

i) if H has normal vector (−1, s) with s ≤ 3,

|⟨f, ψ_λ⟩| ≤ C · 2^{−j(α+1)/4} / |k + 2^{j(α−1)/2} s|^{α+1}   (4.5)

for some constant C > 0;

ii) if H has normal vector (−1, s) with s ≥ 3/2,

|⟨f, ψ_λ⟩| ≤ C · 2^{−j(2α²+α−1)/4}   (4.6)

for some constant C > 0;

iii) if H has normal vector (0, s) with s ∈ R, then (4.6) holds.
With the estimate for the shearlet coefficients intersecting a general discontinuity curve, we can compute the decay rate of the N-term approximation using the N largest coefficients in modulus, and we will see that this decay rate meets the desired optimal decay rate.

Theorem 4.6. Let c > 0 and let ϕ, ψ, ψ̃ ∈ L2(R2) be compactly supported. Suppose that the shearlet ψ is (δ, γ)-feasible for δ > γ + β and γ > 4 and satisfies condition (4.2), and that the shearlet ψ̃ satisfies the same conditions with the roles of ξ1 and ξ2 reversed. Further suppose that SH(Φ, Ψ, Ψ̃; c, α) forms a frame for L2(R2). Then for every ν > 0, the shearlet frame SH(Φ, Ψ, Ψ̃; c, α) provides almost optimally sparse approximations of functions f ∈ E^β_α(R2), i.e. there exists some constant C > 0 such that:

‖f − f_N‖²_2 ≤ C N^{−α} · (log₂ N)^{α+1}  as N → ∞,   (4.7)

where f_N is the nonlinear N-term approximation of f obtained by choosing the N largest coefficients in magnitude.
5 Proofs

5.1 Proof of Proposition 4.3
To prove Proposition 4.3 we first prove an analogous result for functions in H^β(R2), since for such functions we have fractional order derivatives due to Definition 2.15. Once we have shown this result, we can transfer it to functions in C^β(R2) by Theorem 2.13. But first we prove the following estimate, which we need for the first step.
Lemma 5.1. Let g ∈ H^β(R2) with supp g ⊂ [0, 1]². Suppose ψ ∈ L2(R2) is (δ, γ)-feasible for δ > γ + β, γ > 3. Then there exists a constant B > 0 such that:

Σ_{j=0}^∞ Σ_{|k|<2^{j(α−1)/2}} Σ_{m∈Z²} 2^{αβj} |⟨g, ψ_{j,k,m}⟩|² ≤ B ‖∂^{(β,0)} g‖²_{L2}.
Proof. Choose ϕ such that (2πiξ1)^β ϕ̂(ξ) = ψ̂(ξ) for ξ = (ξ1, ξ2) ∈ R2. Then ϕ ∈ L2(R2), since:

0 ≤ ∫_{R²} |ϕ|² dx = ∫_{R²} |ϕ̂|² dξ = ∫_{R²} |(2πiξ1)^{−β} ψ̂(ξ)|² dξ
≤ (2π)^{−2β} ∫_{R²} min{1, |ξ1|^{δ−β}}² min{1, |ξ1|^{−γ}}² min{1, |ξ2|^{−γ}}² dξ
= (2π)^{−2β} ∫_R min{1, |ξ1|^{δ−β}}² min{1, |ξ1|^{−γ}}² ( ∫_R min{1, |ξ2|^{−γ}}² dξ2 ) dξ1
≤ C (2π)^{−2β} ( ∫_{R∖[−1,1]} min{1, |ξ1|^{δ−β}}² min{1, |ξ1|^{−γ}}² dξ1 + ∫_{−1}^{1} min{1, |ξ1|^{δ−β}}² min{1, |ξ1|^{−γ}}² dξ1 )
= C (2π)^{−2β} ( ∫_{R∖[−1,1]} |ξ1|^{−2γ} dξ1 + ∫_{−1}^{1} |ξ1|^{2(δ−β)} dξ1 ) < ∞.

It follows that D^{(β,0)} ϕ = ψ, since

(D^{(β,0)} ϕ)^(ξ) = (2πi)^{|(β,0)|} ξ^{(β,0)} ϕ̂(ξ) = (2πiξ1)^β ϕ̂(ξ) = ψ̂(ξ).

And therefore

|⟨D^{(β,0)} g, ϕ_{j,k,m}⟩|² = |⟨(2πiξ1)^β ĝ, ϕ̂_{j,k,m}⟩|² = |⟨ĝ, (2πiξ1)^β ϕ̂_{j,k,m}⟩|² = |⟨g, D^{(β,0)} ϕ_{j,k,m}⟩|² = 2^{αβj} |⟨g, ψ_{j,k,m}⟩|²,

where in the last step we used D^{(β,0)} ϕ_{j,k,m} = 2^{αβj/2} (D^{(β,0)} ϕ)_{j,k,m}. Now the claim follows by a corollary that is shown in [10] for the three-dimensional case, and which follows analogously in the two-dimensional case:

Σ_{j=0}^∞ Σ_{|k|<2^{j(α−1)/2}} Σ_{m∈Z²} 2^{αβj} |⟨g, ψ_{j,k,m}⟩|² = Σ_{j=0}^∞ Σ_{|k|<2^{j(α−1)/2}} Σ_{m∈Z²} |⟨D^{(β,0)} g, ϕ_{j,k,m}⟩|² ≤ B ‖D^{(β,0)} g‖²_{L2}. □
With this estimate we are ready to prove the statement for functions in H^β(R2).

Lemma 5.2. Let g ∈ H^β(R2) with supp g ⊂ [0, 1]². Suppose that ψ ∈ L2(R2) is compactly supported and (δ, γ)-feasible for δ > γ + β and γ > 3. Then:

Σ_{n>N} |c(g)^*_n|² ≲ N^{−β}  as N → ∞,

where c(g)^*_n is the n-th largest coefficient in magnitude from (⟨g, ψ_λ⟩)_λ.
Proof. Define Λ̃_j := { λ ∈ Λ_j : supp ψ_λ ∩ supp g ≠ ∅ } and N_J := |⋃_{j=0}^{J−1} Λ̃_j|. Then N_J satisfies the estimate

N_J ∼ Σ_{j=0}^{J−1} 2^{j(α−1)/2} · 2^{jα/2} · 2^{j/2} = Σ_{j=0}^{J−1} 2^{jα} ∼ 2^{Jα},

since 2^{−jα/2} · 2^{−j/2} is the volume of the support of a shearlet ψ_λ with λ ∈ Λ_j, so that 2^{jα/2} · 2^{j/2} is the number of possible translates m, and 2^{j(α−1)/2} is the number of possible parameters k at scale j.

Note that the sum Σ_{n>N_{j0}} |c(g)^*_n|² does not contain the N_{j0} largest elements of (|⟨g, ψ_{j,k,m}⟩|)_{j,k,m}, and therefore it holds that

Σ_{n>N_{j0}} |c(g)^*_n|² ≤ Σ_{j≥j0} Σ_{k,m} |⟨g, ψ_{j,k,m}⟩|²,

since the number of summands in both sums is the same and the second sum can contain some of the N_{j0} largest coefficients instead of the smaller ones. This yields

Σ_{j0=1}^∞ 2^{αβj0} Σ_{n>N_{j0}} |c(g)^*_n|² ≤ C Σ_{j0=1}^∞ Σ_{j≥j0} Σ_{k,m} 2^{αβj0} |⟨g, ψ_{j,k,m}⟩|²
= C Σ_{j=1}^∞ Σ_{k,m} |⟨g, ψ_{j,k,m}⟩|² Σ_{j0=1}^{j} 2^{αβj0}   (rearrangement of terms)
≤ C Σ_{j=1}^∞ Σ_{k,m} |⟨g, ψ_{j,k,m}⟩|² 2^{αβj} < ∞,

where the last step follows from Lemma 5.1.

In particular we have 2^{αβj0} Σ_{n>N_{j0}} |c(g)^*_n|² ≤ C, and therefore

Σ_{n>N_{j0}} |c(g)^*_n|² ≤ C 2^{−αβj0} = C (2^{αj0})^{−β} ≤ C N_{j0}^{−β}.

Finally let N > 0; then there exists a positive integer j0 > 0 such that N ∼ N_{j0} ∼ 2^{αj0}, and with the estimate above the claim follows:

Σ_{n>N} |c(g)^*_n|² ∼ Σ_{n>N_{j0}} |c(g)^*_n|² ≤ C N_{j0}^{−β} ∼ C N^{−β}. □
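The counting estimate N_J ∼ 2^{Jα} used in the proof can be checked numerically (an illustrative sketch, not from the thesis): the geometric sum Σ_{j<J} 2^{jα} is dominated by its last term, so the ratio N_J / 2^{Jα} stays bounded between fixed constants.

```python
# Illustrative numeric check: the geometric sum sum_{j<J} 2^(j*alpha) behaves
# like its largest term 2^(J*alpha); the ratio tends to 1/(2^alpha - 1).
alpha = 1.5
for J in (4, 8, 12):
    NJ = sum(2 ** (j * alpha) for j in range(J))
    print(J, NJ / 2 ** (J * alpha))
```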
Proof of Proposition 4.3. Let f ∈ C^β(R2). Then, by the embedding Theorem 2.13 for fractional order Sobolev spaces, f is also in H^{β−ε}(R2), since C^β(R2) ⊂ H^{β−ε}(R2). Now Lemma 5.2 yields

Σ_{n>N} |c(f)^*_n|² ≲ N^{−(β−ε)} = N^{−β+ε}  as N → ∞. □
5.2 Proof of Proposition 4.4
Now we are ready to prove Proposition 4.4, that is, we want to prove the estimate for the shearlet coefficients under the assumption that the singularity curve is linear on the support of the shearlet. The proof follows in an analogous way to the proof of the three-dimensional case as well as the proof of the two-dimensional case for α = 2 in [9], [10].

Without loss of generality we assume that f is only nonzero on B. We first consider the cases i) and ii) of Proposition 4.4. The line can be written as

H = { x ∈ R2 : ⟨x − x0, (−1, s)⟩ = 0 }

for some x0 ∈ R2, since (−1, s) is the normal vector, which is orthogonal to the discontinuity line, and x0 gives the translation from the origin to the actual position of the discontinuity line.
Step 1. Since integration is easier along lines parallel to the discontinuity line, we shear the discontinuity line so that it becomes parallel to the x2-axis:

S_{−s} H = { x ∈ R2 : ⟨S_s x − x0, (−1, s)⟩ = 0 }
        = { x ∈ R2 : ⟨x − S_{−s} x0, S_s^T (−1, s)⟩ = 0 }
        = { x ∈ R2 : ⟨x − S_{−s} x0, (−1, 0)⟩ = 0 }
        = { x ∈ R2 : x1 = x̂1 },

where x̂1 = (S_{−s} x0)_1. So we see that x1 is constant, i.e. S_{−s} H is parallel to the x2-axis. But this requires a modification of the shear parameter, since it should hold that

⟨f, ψ_{j,k,m}⟩ = ⟨f(S_s ·), ψ_{j,k̂,m}⟩.

If we define k̂ by k̂ := k + 2^{j(α−1)/2} s, the equality is fulfilled. Indeed, an easy integral substitution with y = S_s x shows

∫_{R²} f(S_s x) ψ(S_{k̂} A_{2^j} x) dx = ∫_{R²} f(y) ψ(S_k A_{2^j} y) dy,

since S_{k̂} A_{2^j} = S_k A_{2^j} S_s. To simplify the integration further, we fix a new origin at x1 = x̂1 (the x2-coordinate of this new origin will be defined in the next step). Then f will be equal to zero on one side of the x2-axis, i.e. on one side of S_{−s} H, since f is only non-zero on B. Say f is equal to zero for x1 < 0. So it suffices to look at ⟨f0(S_s ·) χ_Ω, ψ_{j,k̂,m}⟩ for Ω = R⁺ × R and f0 ∈ C^β(R2).

Figure 6: Shearlets interacting with corner points. (a) Support of a shearlet which intersects the discontinuity line. (b) Support of the shearlet after shearing the discontinuity line and the shearlet.
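The shear-parameter correction of Step 1 can be verified numerically. The following sketch (illustrative parameter values) checks the matrix identity behind the substitution, namely that k̂ = k + 2^{j(α−1)/2} s gives S_{k̂} A_{2^j} = S_k A_{2^j} S_s:

```python
# Illustrative check of the shear identity used in Step 1.
def matmul(M, N):
    return [[sum(M[i][l] * N[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

def S(k):
    # shear matrix S_k
    return [[1.0, k], [0.0, 1.0]]

def A(j, alpha):
    # scaling matrix A_{2^j}
    return [[2 ** (j * alpha / 2), 0.0], [0.0, 2 ** (j / 2)]]

j, k, s, alpha = 3, 2, 0.75, 1.5
k_hat = k + 2 ** (j * (alpha - 1) / 2) * s
lhs = matmul(S(k_hat), A(j, alpha))
rhs = matmul(matmul(S(k), A(j, alpha)), S(s))
print(lhs, rhs)
```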
Step 2. Without loss of generality we assume that k̂ < 0, since the case k̂ ≥ 0 can be handled similarly. Because of translation symmetry we can also assume m = (0, 0).

Since ψ, and therefore ψ_λ, is compactly supported, we can define a parallelogram P_{j,k̂} that contains supp ψ_λ and restrict the integration to P_{j,k̂}. Due to the compact support there is an L > 0 such that supp ψ ⊂ [−L, L]², and by a rescaling argument we assume L = 1. Then supp ψ_{j,k̂,0} is contained in

P_{j,k̂} := { x ∈ R2 : |(S_{k̂} A_{2^j} x)_1| ≤ 1, |(S_{k̂} A_{2^j} x)_2| ≤ 1 }
        = { x ∈ R2 : |2^{jα/2} x1 + k̂ 2^{j/2} x2| ≤ 1, |x2| ≤ 2^{−j/2} }.

One side of P_{j,k̂} is given by 2^{jα/2} x1 + k̂ 2^{j/2} x2 = 1, which is equivalent to x1 = 2^{−jα/2} − 2^{−j(α−1)/2} k̂ x2. Solving this equation for x2 = 0 gives x1 = 2^{−jα/2}; that means this side of P_{j,k̂} intersects the x1-axis at x1 = 2^{−jα/2}. To fix the x2-coordinate of the new origin such that this side of the parallelogram passes through the new origin, it should hold that supp ψ_{j,k̂,m} ⊂ P_{j,k̂} + (2^{−jα/2}, 0) =: P̃_{j,k̂} relative to that new origin.

The sides of P̃_{j,k̂} are given by x2 = ±2^{−j/2}, 2^{jα/2} x1 + 2^{j/2} k̂ x2 = 0 and 2^{jα/2} x1 + 2^{j/2} k̂ x2 = 2. By a rescaling argument we assume that the right-hand side of the last equation is 1 instead of 2. Solving the last two equations for x2 yields

L1: x2 = − (2^{jα/2} x1)/(2^{j/2} k̂) = − 2^{j(α−1)/2} x1 / k̂,
L2: x2 = 1/(2^{j/2} k̂) − (2^{jα/2} x1)/(2^{j/2} k̂) = 2^{−j/2}/k̂ − 2^{j(α−1)/2} x1 / k̂.

Since x1 ≥ 0 and x2 ≥ −2^{−j/2}, the lower bound for the integration along x1 is given by 0 and the upper bound by K1 = 2^{−jα/2} + k̂ 2^{−jα/2}, which follows directly from solving the equation for L2. This yields the adequate but simpler integration

|⟨f0(S_s ·) χ_Ω, ψ_{j,k̂,m}⟩| ≲ ∫_0^{K1} ∫_{L2}^{L1} |f0(S_s x) ψ_{j,k̂,m}(x)| dx2 dx1,

where the inner integration is along lines parallel to the singularity curve.
Step 3. To simplify the integration we estimate it using the Taylor expansion for Hölder smooth functions introduced in (2.1). This also yields a partition of the integral; afterwards we can estimate each part of the integral separately.

The Taylor expansion of f0(S_s ·) in x2-direction at a point ẋ = (x1, ẋ2) ∈ L1 gives:

|f0(S_s (x1, x2))| ≤ |f0(S_s ẋ)| + | s · (∂/∂x1 f0)(S_s ẋ) + (∂/∂x2 f0)(S_s ẋ) | · |x2 − ẋ2| + C · |x2 − ẋ2|^β
≤ C (1 + |s|)^β · ( 1 + |x2 + 2^{j(α−1)/2} x1 / k̂| + |x2 + 2^{j(α−1)/2} x1 / k̂|^β ),

where we used in the last step that x2 − ẋ2 = x2 + 2^{j(α−1)/2} x1 / k̂, since (x1, ẋ2) = ẋ ∈ L1, and that all partial derivatives of f0 are bounded by a constant µ, since f is cartoon-like. This yields:

|⟨f0(S_s ·) χ_Ω, ψ_{j,k̂,m}⟩| ≲ (1 + |s|)^β ∫_0^{K1} Σ_{l=1}^{3} I_l(x1) dx1,   (5.1)

where

I1(x1) = | ∫_{L2}^{L1} ψ_{j,k̂,m}(x) dx2 |,   (5.2)

I2(x1) = | ∫_{L2}^{L1} ( x2 + 2^{j(α−1)/2} x1 / k̂ ) ψ_{j,k̂,m}(x) dx2 | = | ∫_{L2}^{L1} (x2 + K2 x1) ψ_{j,k̂,m}(x) dx2 |,   (5.3)

I3(x1) = ∫_{L2}^{L1} | x2 + 2^{j(α−1)/2} x1 / k̂ |^β |ψ_{j,k̂,m}(x)| dx2 = ∫_0^{2^{−j/2}/|k̂|} (x2)^β |ψ_{j,k̂,m}(x1, x2 − K2 x1)| dx2,   (5.4)

and K2 is defined by K2 := 2^{j(α−1)/2} / k̂.
Step 4. We now want to estimate I1. An easy computation shows

ψ̂_{j,k̂,0}(ξ1, ξ2) = 2^{−j(α+1)/4} ψ̂( 2^{−jα/2} ξ1, −k̂ 2^{−jα/2} ξ1 + 2^{−j/2} ξ2 ),   (5.5)

since:

ψ̂_{j,k̂,0}(ξ1, ξ2) = ∫_{R²} ψ_{j,k̂,0}(x1, x2) e^{−2πi⟨x, ξ⟩} d(x1, x2)
= 2^{j(α+1)/4} ∫_{R²} ψ(S_{k̂} A_{2^j} x) e^{−2πi⟨x, ξ⟩} d(x1, x2)
= 2^{−j(α+1)/4} ∫_{R²} ψ(y) e^{−2πi⟨A_{2^j}^{−1} S_{k̂}^{−1} y, ξ⟩} d(y1, y2)   (substitution y = S_{k̂} A_{2^j} x)
= 2^{−j(α+1)/4} ∫_{R²} ψ(y) e^{−2πi⟨y, (S_{k̂}^{−1})^T (A_{2^j}^{−1})^T ξ⟩} d(y1, y2)
= 2^{−j(α+1)/4} ψ̂( (S_{k̂}^{−1})^T (A_{2^j}^{−1})^T ξ )
= 2^{−j(α+1)/4} ψ̂( 2^{−jα/2} ξ1, −k̂ 2^{−jα/2} ξ1 + 2^{−j/2} ξ2 ).

By the assumption that ψ is feasible it follows that

|ψ̂_{j,k̂,0}(ξ1, 0)| ≤ 2^{−j(α+1)/4} min{1, |2^{−jα/2} ξ1|^δ} · min{1, |2^{−jα/2} ξ1|^{−γ}} · min{1, |k̂ 2^{−jα/2} ξ1|^{−γ}}
≤ 2^{j(α−1)/4} 2^{−jα/2} h1(2^{−jα/2} ξ1) (1 + |k̂|)^{−γ},

where h1 is given by

h1(ξ) := min(1, |ξ|^δ) min(1, |ξ|^{−γ}) ≤ min(|ξ|^{−γ}, |ξ|^{δ−γ}).

Since δ − γ ≥ 0, this shows h1 ∈ L1(R). The Fourier-Slice Theorem (see Theorem 2.6) applied to ψ_{j,k̂,0} yields

I1(x1) = | ∫_R ψ_{j,k̂,0}(x) dx2 | = | ∫_R ψ̂_{j,k̂,0}(ξ1, 0) e^{2πiξ1 x1} dξ1 |.

And with the estimates above one gets

I1(x1) ≲ ∫_R 2^{j(α−1)/4} 2^{−jα/2} h1(2^{−jα/2} ξ1) (1 + |k̂|)^{−γ} dξ1 = 2^{j(α−1)/4} (1 + |k̂|)^{−γ} ∫_R |h1(ξ1)| dξ1 ≲ 2^{j(α−1)/4} (1 + |k̂|)^{−γ},

where we used in the last step that h1 is in L1.
Step 5. In this step we estimate I2, and we will see that I2 decays faster than I1; hence we can leave I2 out of our analysis. First we split I2:

I2(x1) ≤ | ∫_R x2 ψ_{j,k̂,0}(x) dx2 | + |K2 x1| | ∫_R ψ_{j,k̂,0}(x) dx2 | =: S1 + S2.

Again by applying the Fourier-Slice Theorem one gets, with f_{j,k̂,0}(x) := x2 ψ_{j,k̂,0}(x), that

S1 = | ∫_R x2 ψ_{j,k̂,0}(x) dx2 | = | ∫_R f̂_{j,k̂,0}(ξ1, 0) e^{2πiξ1 x1} dξ1 | = | ∫_R (x2 ψ_{j,k̂,0})^(ξ1, 0) e^{2πiξ1 x1} dξ1 |.   (5.6)

By the properties of the Fourier transform (see Proposition 2.3) we know that

∂/∂ξ2 ψ̂_{j,k̂,0} = −2πi (x2 ψ_{j,k̂,0})^.

Thus we can estimate

S1 ≲ | ∫_R (∂/∂ξ2 ψ̂_{j,k̂,0})(ξ1, 0) e^{2πi x1 ξ1} dξ1 |
≲ 2^{−j(α+1)/4} 2^{−j/2} ∫_R |h(2^{−jα/2} ξ1)| (1 + |k̂|)^{−γ} dξ1
= 2^{−j(α+1)/4} 2^{−j/2} 2^{jα/2} (1 + |k̂|)^{−γ} ∫_R |h(ξ1)| dξ1
≲ 2^{−j(3−α)/4} (1 + |k̂|)^{−γ},

where we used in the second step condition (4.2) of Proposition 4.4 and in the last one that h ∈ L1(R).

The estimate for S2 follows directly from the estimate for I1, since S2 ≤ |K2 x1| I1 and K2 x1 can be estimated as follows:

|K2 x1| ≤ (2^{j(α−1)/2}/|k̂|) · |k̂| 2^{−jα/2} = 2^{−j/2},

since |x1| ≤ −k̂ 2^{−jα/2} = |k̂| 2^{−jα/2}. Hence,

S2 ≤ 2^{−j/2} · 2^{j(α−1)/4} (1 + |k̂|)^{−γ} = 2^{−j(3−α)/4} (1 + |k̂|)^{−γ}.

In summary we can conclude that I2(x1) ≲ 2^{−j(3−α)/4} (1 + |k̂|)^{−γ}.
Step 6. It remains to estimate I3:

I3 ≤ ∫_0^{2^{−j/2}/|k̂|} (x2)^β ‖ψ_{j,k̂,0}‖_{L∞} dx2 ≲ 2^{j(α+1)/4} ∫_0^{2^{−j/2}/|k̂|} (x2)^β dx2 ≲ 2^{j(α−2β−1)/4} |k̂|^{−(β+1)}.
Step 7. We are now ready to estimate ⟨f0(S_s ·) χ_Ω, ψ_{j,k̂,0}⟩. Since I2 decays faster than I1, we can leave it out of our analysis. Therefore we get the estimate:

|⟨f0(S_s ·) χ_Ω, ψ_{j,k̂,0}⟩| ≲ (1 + |s|)^β ∫_0^{K1} ( 2^{j(α−1)/4}/(1 + |k̂|)^γ + 2^{j(α−2β−1)/4}/|k̂|^{β+1} ) dx1
= (1 + |s|)^β ( 2^{−j(α+1)/4}/(1 + |k̂|)^{γ−1} + 2^{−j(α+2β+1)/4}/|k̂|^β ).   (5.7)

Suppose that s ≤ 3; then (5.7) reduces to:

|⟨f, ψ_{j,k,m}⟩| ≲ 2^{−j(α+1)/4}/(1 + |k̂|)³ + 2^{−j(α+2β−1)/4}/((1 + |k̂|)³ |k̂|^β) ≲ 2^{−j(α+1)/4}/(1 + |k + 2^{j(α−1)/2} s|)³.

On the other hand suppose that s ≥ 3/2; then we can make the following estimates:

• |k 2^{−j(α−1)/2} + s| ≥ |s| − |k| 2^{−j(α−1)/2} ≥ 1/2 − 2^{−j(α−1)/2} ≥ C for sufficiently large j, since |k| ≤ 2^{j(α−1)/2} + 1,

• 2^{−j(α+1)/4}/(1 + |k̂|)³ = 2^{−j(7α−5)/4}/(2^{−j(α−1)/2} + |k 2^{−j(α−1)/2} + s|)³ ≤ 2^{−j(7α−5)/4}/|k 2^{−j(α−1)/2} + s|³,

• (1 + |s|)^β / (|s| − |1 − 2^{−j(α−1)/2}|)³ ≤ ((1 + |s|)/(|s| − C))^β · (1/(|s| − C))^{3−β} ≤ C ((1/|s| + 1)/(1 − C/|s|))^β ≤ C.

Hence,

(1 + |s|)^β · 2^{−j(α+1)/4}/(1 + |k̂|)³ ≲ 2^{−j(7α−5)/4} (1 + |s|)^β / (|s| − |1 − 2^{−j(α−1)/2}|)³ ≲ 2^{−j(7α−5)/4}.

Step 8.
Now we consider case iii) and can do a similar computation. Suppose $\partial B$ can be parametrized by $(x_1, E(x_1))$ with $E'(x_1) = 0$ and $E\in C^\beta$. In this case we do not need to shear the discontinuity curve, since it is already parallel to the $x_1$-axis. Again let $P_{j,k}$ be the parallelogram that contains the support of $\psi_{j,k,0}$, and fix the new origin such that one side of the parallelogram passes through it, i.e. define $\tilde P_{j,k} := P_{j,k} - \left(2^{-j\alpha/2},\,0\right)$, so that relative to the new origin it holds that $\operatorname{supp}\psi_{j,k,0}\subset\tilde P_{j,k}$. Then $\tilde P_{j,k}$ has the sides $|x_2|\le 1$ and
$$L_1:\quad x_1 = -2^{-j(\alpha-1)/2}kx_2, \qquad L_2:\quad x_1 = 2\cdot 2^{-j\alpha/2} - 2^{-j(\alpha-1)/2}kx_2.$$
Therefore
$$\langle f_0\chi_{\tilde\Omega},\psi_{j,k,0}\rangle = C\int_0^{2^{-j/2}}\int_{L_2}^{L_1} f_0\cdot\psi_{j,k,0}\,dx_1\,dx_2,$$
where $\tilde\Omega = \mathbb{R}\times\mathbb{R}_+$. Taylor expansion of $f_0$ in the $x_1$-direction yields:
$$f_0\!\begin{pmatrix}x_1\\x_2\end{pmatrix} = f_0\!\begin{pmatrix}\dot x_1\\\dot x_2\end{pmatrix} + \left(x_1 + 2^{-j(\alpha-1)/2}kx_2\right)\frac{\partial}{\partial x_1}f_0\!\begin{pmatrix}\dot x_1\\\dot x_2\end{pmatrix} + \left(x_1 + 2^{-j(\alpha-1)/2}kx_2\right)^{\beta}\frac{\partial^2}{\partial x_1^2}f_0\!\begin{pmatrix}\xi_1\\\dot x_2\end{pmatrix}.$$
By assumption (4.2) of Proposition 4.4 we can conclude that
$$\int_\mathbb{R} x_1^l\,\psi(x)\,dx_1 = 0 \quad\forall x_2\in\mathbb{R},\ l=0,1, \tag{5.8}$$
since
$$0 \le \left|\hat\psi(0,\xi_2)\right| \le \min(1,|0|^\alpha)\,\min(1,|0|^{-\gamma})\,\min(1,|\xi_2|^{-\gamma}) = 0$$
and
$$\frac{\partial}{\partial\xi_1}\hat\psi(0,\xi_2) = \lim_{\xi_1\to 0}\frac{\hat\psi(\xi_1,\xi_2)-\hat\psi(0,\xi_2)}{\xi_1} = \lim_{\xi_1\to 0}\frac{\hat\psi(\xi_1,\xi_2)}{\xi_1} = 0.$$
Shearing preserves vanishing moments, since
$$\int_\mathbb{R} x_1^l\,\psi\!\left(S_k(x_1,x_2)^T\right)dx_1 = \int_\mathbb{R}(x_1-kx_2)^l\,\psi\!\left((x_1,x_2)^T\right)dx_1.$$
Thus integration reduces to:
$$\left|\langle f_0\chi_{\tilde\Omega},\psi_{j,k,0}\rangle\right| \le \|\psi\|_\infty\,2^{j(\alpha+1)/4}\int_0^{2^{-j/2}}\int_{L_2}^{L_1}\left|x_1 + 2^{-j(\alpha-1)/2}kx_2\right|^{\beta}dx_1\,dx_2 \lesssim 2^{j(\alpha+1)/4}\int_0^{2^{-j/2}}\int_{-2^{-j\alpha/2}}^{0}(x_1)^{\beta}\,dx_1\,dx_2 = 2^{-j(3\alpha+2\alpha\beta-3)/4}.$$
Figure 7: A shearlet intersecting the boundary curve, and the smallest parallelogram P that entirely contains the curve in the interior of the shearlet.
5.3 Proof of Theorem 4.5
Let $(j,k,m)\in\Lambda_{j,p}$ and fix $\hat x = (\hat x_1,\hat x_2)\in\operatorname{int}(Q_{j,p})\cap\operatorname{int}(\operatorname{supp}\psi_\lambda)\cap\partial B$, and let $s$ be the slope of the tangent to the edge curve $\partial B$ at $(\hat x_1,\hat x_2)$. Without loss of generality we can assume that the edge curve satisfies $E(0)=0$ by translation symmetry, and as before $m=0$. Select now $P$ to be the smallest parallelogram that entirely contains the edge curve, parametrized by $(x_1,E(x_1))$ or $(E(x_2),x_2)$, in the interior of $\operatorname{supp}\psi_{j,k,0}$, and whose two sides are parallel to the tangent to the edge curve at $(\hat x_1,\hat x_2) = (0,0)$. Now we can split the shearlet coefficients in the following way:
$$\langle f,\psi_{j,k,0}\rangle = \langle\chi_P f,\psi_{j,k,0}\rangle + \langle\chi_{P^C}f,\psi_{j,k,0}\rangle = \langle\chi_P f,\psi_{j,k,0}\rangle + \langle\chi_{P^C}f(S_s\cdot),\psi_{j,\hat k,0}\rangle,$$
where the shearing operation is used for the part outside the parallelogram since for this we
can use the estimate for a linear discontinuity curve. For this reason in this step we can
concentrate on computing h χP f , ψj,k,0 i.
First we assume that the edge curve can be parameterized by (x1 , E(x1 )) with the slope of the
tangent at (0, 0) not equal to zero, or by (E(x2 ), x2 ) where E ∈ C α . Estimating the length
of the sides of P will give us the volume of P, and therefore an estimation for the shearlet
coefficients. Let d be the length of these sides of the parallelogram that are parallel to the
tangent. We observe that d is the distance between two points in which the tangent intersects
the boundary of supp ψj,k,0 . From this observation it follows that:
$$d = \frac{2^{-j/2}}{\hat k}\sqrt{s^2+1}.$$
To see this, remember that the parallelogram containing the support of $\psi_{j,k,0}$ has sides
$$\left|2^{j\alpha/2}x_1 + k2^{j/2}x_2\right| \le 1,$$
with $x_1 = sx_2$, which yields
$$x_2^\pm = \pm\frac{1}{2^{j\alpha/2}s + k2^{j/2}} = \pm\frac{2^{-j/2}}{\hat k}, \qquad x_1^\pm = \pm\frac{s}{2^{j\alpha/2}s + k2^{j/2}} = \pm s\,\frac{2^{-j/2}}{\hat k},$$
and the distance is given by $\left\|\begin{pmatrix}x_1^+\\x_2^+\end{pmatrix} - \begin{pmatrix}x_1^-\\x_2^-\end{pmatrix}\right\| = d$.
Now we can also estimate the width of the parallelogram $P$; call it $\tilde d$. Since the edge curve can be parametrized by a $C^\alpha$-function $E$ with bounded curvature,
$$\tilde d \le \left(\frac{2^{-j/2}}{\hat k}\sqrt{s^2+1}\right)^{\alpha}.$$
In summary the volume of $P$ can be estimated as
$$\operatorname{Vol}(P) \le C\left(\frac{2^{-j/2}}{\hat k}\sqrt{s^2+1}\right)^{\alpha+1} = C\left(s^2+1\right)^{(\alpha+1)/2}\frac{2^{-j(\alpha+1)/2}}{\hat k^{\,\alpha+1}}.$$
This implies
$$|\langle f\chi_P,\psi_{j,k,0}\rangle| \lesssim 2^{j(\alpha+1)/4}\,\|f\|_\infty\|\psi\|_\infty\left(s^2+1\right)^{(\alpha+1)/2}\frac{2^{-j(\alpha+1)/2}}{\hat k^{\,\alpha+1}} \lesssim 2^{-j(\alpha+1)/4}\,\frac{\left(s^2+1\right)^{(\alpha+1)/2}}{\hat k^{\,\alpha+1}}. \tag{5.9}$$
For $s < 3$ this yields $|\langle f\chi_P,\psi_{j,k,0}\rangle| \lesssim \frac{2^{-j(\alpha+1)/4}}{|\hat k|^{\alpha+1}}$, and with the estimate for the linear part, $\left|\langle\chi_{P^C}f(S_s\cdot),\psi_{j,\hat k,0}\rangle\right| \lesssim \frac{2^{-j(\alpha+1)/4}}{(1+|\hat k|)^3}$, we are left with:
$$|\langle f,\psi_\lambda\rangle| \le C\left(\frac{2^{-j(\alpha+1)/4}}{|\hat k|^{\alpha+1}} + \frac{2^{-j(\alpha+1)/4}}{\left(1+|\hat k|\right)^{3}}\right) \le C\,\frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s\right|^{\alpha+1}},$$
where we used in the last step that $\left(1+|\hat k|\right)^3 \ge \left(1+|\hat k|\right)^{\alpha+1} \ge |\hat k|^{\alpha+1}$, since $\alpha+1\le 3$. For the case that $s > 3/2$ we first observe the following:
$$\frac{\left(1+s^2\right)^{(\alpha+1)/2}}{|s|^{\alpha+1}} = \frac{\left(s^2\left(\frac{1}{s^2}+1\right)\right)^{(\alpha+1)/2}}{|s|^{\alpha+1}} \le \frac{2^{(\alpha+1)/2}\,|s|^{\alpha+1}}{|s|^{\alpha+1}} = 2^{(\alpha+1)/2} \le C,$$
where we used that $1/s^2 < 1$ since $s > 1$. Now estimate (5.9) yields
$$|\langle f\chi_P,\psi_{j,k,0}\rangle| \lesssim 2^{-j(\alpha+1)/4}\,\frac{\left(s^2+1\right)^{(\alpha+1)/2}}{\left|k+2^{j(\alpha-1)/2}s\right|^{\alpha+1}} = 2^{-j(\alpha+1)/4}\,\frac{\left(s^2+1\right)^{(\alpha+1)/2}}{\left|k/s+2^{j(\alpha-1)/2}\right|^{\alpha+1}|s|^{\alpha+1}} \lesssim \frac{2^{-j(\alpha+1)/4}}{2^{j(\alpha-1)(\alpha+1)/2}} = 2^{-j(2\alpha^2+\alpha-1)/4}.$$
Together with the estimate for the linear part, $\left|\langle\chi_{P^C}f(S_s\cdot),\psi_{j,\hat k,0}\rangle\right| \lesssim 2^{-j(7\alpha-5)/4}$, we get
$$\left|\langle f,\psi_{j,\hat k,0}\rangle\right| \lesssim 2^{-j(7\alpha-5)/4} + 2^{-j(2\alpha^2+\alpha-1)/4} \lesssim 2^{-j(2\alpha^2+\alpha-1)/4},$$
since $(7\alpha-5)/4 \ge \alpha^2/2 + \alpha/4 - 1/4$ for $1\le\alpha\le 2$, with equality if and only if $\alpha\in\{1,2\}$.
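The exponent comparison used in the last step can be verified directly:

```latex
\[
(7\alpha - 5) - (2\alpha^2 + \alpha - 1)
  \;=\; -2\alpha^2 + 6\alpha - 4
  \;=\; -2(\alpha - 1)(\alpha - 2)
  \;\ge\; 0 \quad\text{for } 1 \le \alpha \le 2,
\]
```

so dividing by $4$ gives $(7\alpha-5)/4 \ge (2\alpha^2+\alpha-1)/4$, with equality exactly for $\alpha\in\{1,2\}$.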
Now we still have to handle the case that the edge curve is parametrized by $(x_1, E(x_1))$. Again we let $P$ be the parallelogram that contains the edge curve in the interior of $\operatorname{supp}\psi_{j,k,0}$ and observe that one side of $P$ is parallel to the $x_1$-axis; hence the length $d$ of this side is given by the distance between the boundary lines of $\operatorname{supp}\psi_{j,k,0}$. This observation yields
$$d = 2^{-j\alpha/2}, \qquad \tilde d = 2^{-j\alpha^2/2} \qquad\text{and}\qquad \operatorname{Vol}(P) \lesssim 2^{-j\alpha(1+\alpha)/2}.$$
And finally we have
$$|\langle f\chi_P,\psi_{j,k,0}\rangle| \lesssim 2^{j(\alpha+1)/4}\,\|f\|_\infty\|\psi\|_\infty\,2^{-j\alpha(1+\alpha)/2} \lesssim 2^{j(1-\alpha-2\alpha^2)/4}.$$
With the same argumentation as in case ii) we obtain the desired result.
5.4 Proof of the Main Result Theorem 4.6
Now we want to prove the main result for a cartoon-like image with $C^\alpha$-regularity, but without corners. The result of Theorem 4.3 shows that shearlet coefficients of shearlets that do not interact with the discontinuity line can be neglected: since under the restriction $1<\alpha\le\beta$ we have $N^{-\beta+\varepsilon} \le N^{-\alpha+\varepsilon}$, these shearlet coefficients meet the decay rate. Now we want to estimate the other shearlet coefficients. We do this in two steps; we first estimate $|\Lambda_{j,p}(\varepsilon)|$ and $|\Lambda_{j,p}|$ for case 4a) and then for case 4b).
Claim 1a: For case 4a) we have the following estimate:
$$|\langle f,\psi_\lambda\rangle| \le 2^{-j(\alpha+1)/4}.$$
Proof. Since $\psi\in L^2(\mathbb{R}^2)$ and $\psi$ is compactly supported on a bounded set $\Omega\subset\mathbb{R}^2$, it holds that $\psi|_\Omega\in L^2(\Omega)\subset L^1(\Omega)$ by the embedding property of the Lebesgue spaces. Hence, $\psi\in L^1(\mathbb{R}^2)$ and $\|\psi\|_{L^1}\le C$ for some positive constant $C$. The Hölder inequality implies:
$$|\langle f,\psi_\lambda\rangle| \le \|f\|_\infty\,\|\psi_\lambda\|_{L^1} \le \|f\|_\infty\,C\cdot 2^{-j(\alpha+1)/4} \le 2\mu C\,2^{-j(\alpha+1)/4},$$
where in the last step we used that $\|f_i\|_{C^\beta}\le\mu$, $i\in\{0,1\}$, by definition. This in particular implies $\sup_x|f_i(x)|\le\mu$ and $\sup_x|f(x)| \le \sup_x|f_0(x)| + \sup_x|f_1(x)| \le 2\mu$. For simplicity we may assume that $2\mu C = 1$, which proves the claim.
For estimating $|\Lambda_{j,p}(\varepsilon)|$ we restrict our attention to the shearlet coefficients that are larger than $\varepsilon$ in absolute value. We will see that we can restrict our attention to a special range of scales $j$.
Claim 2a: $\Lambda_{j,p}(\varepsilon)$ contains only shearlet coefficients at scales $j$ that meet the inequality
$$j \le \frac{4}{\alpha+1}\log_2(\varepsilon^{-1}).$$
Proof. By definition of $\Lambda_{j,p}(\varepsilon)$ and Claim 1a) it is:
$$\varepsilon \le |\langle f,\psi_\lambda\rangle| \le 2^{-j(\alpha+1)/4} \;\Leftrightarrow\; \log_2(\varepsilon) \le -j(\alpha+1)/4 \;\Leftrightarrow\; j \le \frac{4}{\alpha+1}\log_2(\varepsilon^{-1}).$$
For the following step we need some new notation, which we introduce now:
$$M_{j,k,Q_{j,p}} := \left\{m\in\mathbb{Z}^2 : \operatorname{supp}(\psi_{j,k,m})\cap\operatorname{int}(Q_{j,p})\cap\partial B \neq \emptyset\right\}.$$
For $m = (m_1,0)$ define
$$P_{j,k,m} = P_{j,k} + \left(2^{-j\alpha/2}m_1,\,0\right), \tag{5.10}$$
52
5 PROOFS
and its crossline $P_0$ given by
$$P_{0,j,m} = \left\{x\in\mathbb{R}^2 : x_1 + k2^{-j(\alpha-1)/2}x_2 = 0,\ |x_2|\le 2^{-j/2}\right\} + \left(2^{-j\alpha/2}m_1,\,0\right).$$
Claim 3a: For each shear index $k$ and $s\in[-2,2]$ it holds that
$$\left|M_{j,k,Q_{j,p}}\right| \le C\cdot\left(\left|k+2^{j(\alpha-1)/2}s\right|+1\right), \tag{5.11}$$
which is independent of the choice of the point $\hat x\in\operatorname{int}(Q_{j,p})\cap\operatorname{int}(\operatorname{supp}\psi_\lambda)\cap\partial B$.
Proof.
Independence
Let $\hat x'$ be another point in $\operatorname{int}(Q_{j,p})\cap\operatorname{int}(\operatorname{supp}\psi_\lambda)\cap\partial B$ and let $s$ and $s'$ be the associated slopes of the tangents to the discontinuity curve $E$ at $\hat x$ and $\hat x'$. Since $E\in C^\alpha$ and $\alpha-1$ is the fractional part of $\alpha$ by definition of the Hölder space, there is a constant $C_1>0$, independent of $\hat x,\hat x'$, such that:
$$|s-s'| = \left|E'(\hat x_2)-E'(\hat x_2')\right| \le C_1\cdot\left|\hat x_2-\hat x_2'\right|^{\alpha-1} \le C_1\cdot\left(2^{-j/2}\right)^{\alpha-1} = C_1\cdot 2^{-j(\alpha-1)/2}, \tag{5.12}$$
where we used that $|x_2|\le 2^{-j/2}$. And hence,
$$\left|k+2^{j(\alpha-1)/2}s'\right| \le \left|k+2^{j(\alpha-1)/2}s\right| + 2^{j(\alpha-1)/2}\left|s-s'\right| \le \left|k+2^{j(\alpha-1)/2}s\right| + C_1 \le C\cdot\left(\left|k+2^{j(\alpha-1)/2}s\right|+1\right).$$
This proves that estimate (5.11) remains asymptotically the same, independently of the values of $s$ and $s'$.
Estimation
For each fixed $j,k$ we want to count the number of translates $m\in\mathbb{Z}^2$ such that $\operatorname{supp}(\psi_{j,k,m})$ intersects the discontinuity curve inside $Q_{j,p}$. Observe that for fixed $m_1$ only a finite number of $m_2$-translates can fulfill $\operatorname{supp}(\psi_{j,k,m})\cap\operatorname{int}(Q_{j,p})\cap\partial B\neq\emptyset$, since this number is bounded by the number of parallelograms $P_{j,k,m}$ intersecting $Q_{j,p}$, and this is independent of the translate $p$ of the cube since the sizes of $P_{j,k,m}$ and $Q_{j,p}$ do not change under translation. For this reason it suffices to estimate the number of relevant $m_1$-translations for a fixed $m_2$; the number of $m$-translates is then obtained by multiplying the number of $m_1$-translates by a fixed constant:
$$\left|M_{j,k,Q_{j,p}}\right| \le C\cdot\left|\left\{m_1\in\mathbb{Z} : \operatorname{supp}(\psi_{j,k,m})\cap\operatorname{int}(Q_{j,p})\cap\partial B\neq\emptyset\right\}\right|.$$
For simplicity fix $m_2 = 0$. Without loss of generality assume $Q = Q_{j,p} = \left[-2^{-j/2},\,2^{-j/2}\right]^2$ and let $H$ be the tangent line to $\partial B$ at $(0,0)$. Note that $\operatorname{supp}(\psi_{j,k,m})\subset P_{j,k,m}$, since $\operatorname{supp}(\psi_{j,k,0})\subset P_{j,k}$ for $m = (m_1,0)$. So we can substitute in the following way:
$$\left|M_{j,k,Q_{j,p}}\right| \le C\cdot\left|\left\{m_1\in\mathbb{Z} : P_{j,k,m}\cap\operatorname{int}(Q_{j,p})\cap\partial B\neq\emptyset\right\}\right|.$$
By definition of a cartoon-like image the curvature of $\partial B$ is bounded, so replacing $\partial B$ by the tangent line $H$, i.e.
$$\left|M_{j,k,Q_{j,p}}\right| \le C\cdot\left|\left\{m_1\in\mathbb{Z} : P_{j,k,m}\cap\operatorname{int}(Q_{j,p})\cap H\neq\emptyset\right\}\right|,$$
does not change the asymptotic behaviour of the estimate. Replacing $P_{j,k,m}$ by its crossline $P_0$ does not change it either. Altogether we arrive at the following description, which is much easier to handle:
$$\left|M_{j,k,Q_{j,p}}\right| \le C\cdot\left|\left\{m_1\in\mathbb{Z} : P_{0,j,m}\cap\operatorname{int}(Q_{j,p})\cap H\neq\emptyset\right\}\right|.$$
Solving $P_{0,j,m}$ for $x_1$ yields:
$$x_1 = 2^{-j\alpha/2}m_1 - 2^{-j(\alpha-1)/2}kx_2.$$
With the description of the tangent line $H$: $x_1 = sx_2$, equating both expressions gives:
$$sx_2 = 2^{-j\alpha/2}m_1 - 2^{-j(\alpha-1)/2}kx_2 \;\Leftrightarrow\; 2^{-j\alpha/2}m_1 = sx_2 + 2^{-j(\alpha-1)/2}kx_2 \;\Leftrightarrow\; m_1 = \left(2^{j\alpha/2}s + 2^{j/2}k\right)x_2 = \left(2^{j(\alpha-1)/2}s + k\right)2^{j/2}x_2.$$
Using that $|x_2|\le 2^{-j/2}$ gives the desired estimate for $m_1$:
$$|m_1| \le \left|2^{j(\alpha-1)/2}s + k\right|,$$
and the claim is proven.
We have just proved that estimate (5.11) is independent of the choice of the point $\hat x\in\operatorname{int}(Q_{j,p})\cap\operatorname{int}(\operatorname{supp}\psi_\lambda)\cap\partial B$. But it is also important that, independently of the choice of $\hat x$, we obtain the same estimate for the shearlet coefficient. This holds for a sufficiently large scaling index $j$.
Claim 4a: Let $\hat x'$ be another point in $\operatorname{int}(Q_{j,p})\cap\operatorname{int}(\operatorname{supp}\psi_\lambda)\cap\partial B$ and let $s$ and $s'$ be the associated slopes of the tangents to the discontinuity curve $E$ at $\hat x$ and $\hat x'$. Then the following estimate holds:
$$\frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s'\right|^{\alpha+1}} \le 2^{\alpha+1}\,\frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s\right|^{\alpha+1}}.$$
Proof. Without loss of generality we may assume that $\left|k+2^{j(\alpha-1)/2}s\right| \ge 2C_1$, where $C_1$ is the Hölder constant appearing in (5.12). This can be seen as follows:
$$\left\{k\in\mathbb{Z} : \left|k+2^{j(\alpha-1)/2}s\right| < 2C_1\right\} = \left\{k\in\mathbb{Z} : -2C_1 - 2^{j(\alpha-1)/2}s < k < 2C_1 - 2^{j(\alpha-1)/2}s\right\},$$
and $\left(2C_1 - 2^{j(\alpha-1)/2}s\right) - \left(-2C_1 - 2^{j(\alpha-1)/2}s\right) = 4C_1$. Hence, the number of parameters $k$ that do not fulfil the assumption is bounded by a constant independent of $j$. Now from (5.12) it follows that
$$\left|k+2^{j(\alpha-1)/2}s\right| \ge 2C_1 \ge 2\cdot 2^{j(\alpha-1)/2}\left|s-s'\right| = 2\left|\left(k+2^{j(\alpha-1)/2}s\right) - \left(k+2^{j(\alpha-1)/2}s'\right)\right| \ge 2\left(\left|k+2^{j(\alpha-1)/2}s\right| - \left|k+2^{j(\alpha-1)/2}s'\right|\right),$$
which implies
$$2\left|k+2^{j(\alpha-1)/2}s'\right| \ge \left|k+2^{j(\alpha-1)/2}s\right|$$
and therefore
$$\frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s'\right|^{\alpha+1}} \le 2^{\alpha+1}\,\frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s\right|^{\alpha+1}}.$$
Now we are ready to present the estimation for |Λj,p (ε)|.
Claim 5a: The number of coefficients in $\Lambda_{j,p}(\varepsilon)$ for fixed $j$ and case 4a) can be estimated as follows:
$$\left|\Lambda_{j,p}(\varepsilon)\right| \le C\cdot\left(\varepsilon^{-\frac{1}{\alpha+1}}\cdot 2^{-j/4}+1\right)^2.$$
Proof. In this case Theorem 4.5 gives us the following estimate:
$$\varepsilon \le |\langle f,\psi_\lambda\rangle| \le \frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s\right|^{\alpha+1}} \;\Rightarrow\; \left|k+2^{j(\alpha-1)/2}s\right| \le \varepsilon^{-\frac{1}{\alpha+1}}\,2^{-j/4}. \tag{5.13}$$
Let $K_j(\varepsilon) = \left\{k\in\mathbb{Z} : \left|k+2^{j(\alpha-1)/2}s\right| \le \varepsilon^{-\frac{1}{\alpha+1}}2^{-j/4}\right\}$. Since $\Lambda_{j,p}(\varepsilon)$ is the union of the $M_{j,k,Q_{j,p}}$ over $k$, we can conclude with the help of Claim 3a):
$$|\Lambda_{j,p}(\varepsilon)| \le C\sum_{k\in K_j(\varepsilon)}\left|M_{j,k,Q_{j,p}}\right| \le C\sum_{k\in K_j(\varepsilon)}\left(\left|k+2^{j(\alpha-1)/2}s\right|+1\right) \le C\sum_{k\in K_j(\varepsilon)}\left(\varepsilon^{-\frac{1}{\alpha+1}}2^{-j/4}+1\right) \le C\left(\varepsilon^{-\frac{1}{\alpha+1}}2^{-j/4}+1\right)^2,$$
where in the last step we used that the number of $\hat k$ with $|\hat k| \le \varepsilon^{-\frac{1}{\alpha+1}}\cdot 2^{-j/4}$ is bounded by $2\cdot\varepsilon^{-\frac{1}{\alpha+1}}\cdot 2^{-j/4}$, and therefore the number of $k\in K_j(\varepsilon)$ is bounded by the same number.
The next step is to derive analogous results for case 4b). Note that by analogous arguments as in the proof of independence in Claim 3a, it suffices to consider only one fixed $\hat x\in\operatorname{int}(Q_{j,p})\cap\operatorname{int}(\operatorname{supp}\psi_\lambda)\cap\partial B$ with associated slope $s$.
Claim 1b: $\Lambda_{j,p}(\varepsilon)$ contains only shearlet coefficients at scales $j$ that meet the estimate
$$j \le \frac{4}{2\alpha^2+\alpha-1}\log_2(\varepsilon^{-1}).$$
Proof. With Theorem 4.5 ii) and iii), respectively, and $(j,k,m)\in\Lambda_{j,p}(\varepsilon)$ it follows:
$$\varepsilon \le |\langle f,\psi_\lambda\rangle| \le 2^{-j(2\alpha^2+\alpha-1)/4} \;\Leftrightarrow\; \log_2(\varepsilon) \le -j(2\alpha^2+\alpha-1)/4 \;\Leftrightarrow\; j \le \frac{4}{2\alpha^2+\alpha-1}\log_2(\varepsilon^{-1}).$$
In this case it suffices for the main result to use a very crude estimate for $\Lambda_{j,p}(\varepsilon)$. This is displayed in the following claim:
Claim 2b: The number of coefficients in $\Lambda_{j,p}(\varepsilon)$ for fixed $j$ and case 4b) can be estimated as
$$|\Lambda_{j,p}(\varepsilon)| \lesssim 2^{j\alpha/2}.$$
Proof. With the same argumentation as before for estimating the number of translates $m$ for fixed $j,k$ such that $\operatorname{supp}(\psi_{j,k,m})$ intersects the discontinuity curve inside $Q_{j,p}$, it suffices to estimate the number of $m_1$-translates for fixed $m_2$. So let $m_2$ be fixed; then we can bound the number of possible $m_1$-translates by $2^{j/2}$, since $Q_{j,p}$ is a cube of size $C\cdot 2^{-j/2}$ with $C = 2$. By definition of the shearlet frame the number of shear parameters $k$ is bounded by $C\cdot 2^{j(\alpha-1)/2}$. Therefore $|\Lambda_{j,p}| \le C\cdot 2^{j/2}\cdot 2^{j(\alpha-1)/2} = C\cdot 2^{j\alpha/2}$, and in particular $|\Lambda_{j,p}(\varepsilon)| \le C\cdot 2^{j\alpha/2}$ since $\Lambda_{j,p}(\varepsilon)\subset\Lambda_{j,p}$.
Now the estimate of $\Lambda_{j,p}(\varepsilon)$ is known for all relevant scales $j$, so we can estimate $\Lambda(\varepsilon)$ by accumulating $\Lambda_{j,p}(\varepsilon)$ over all relevant scales $j$ and all translations $p$ of the cubic window $Q_{j,0}$ which contain a part of the discontinuity curve. Since the discontinuity curve is contained in $[-1,1]^2$, only the cubic windows $Q_{j,p}\subset[-1,1]^2$ are relevant; for a fixed scale $j$ there are fewer than the length of $[-1,1]^2$ divided by the length of the cubic window, i.e. the number of translations $p$ is bounded by $C\cdot 1/2^{-j/2} = C\cdot 2^{j/2}$. Note that $[-1,1]^2$ of course contains $(2^{j/2})^2 = 2^j$ cubic windows, but we only count the ones which meet the discontinuity.
With the estimates of $|\Lambda_{j,p}(\varepsilon)|$ for cases 4a) and 4b) it follows that:
$$|\Lambda(\varepsilon)| \le C\sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j/2}\left(\varepsilon^{-1/(\alpha+1)}\cdot 2^{-j/4}+1\right)^2 + C\sum_{j=0}^{\frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})+C} 2^{j/2}\cdot 2^{j\alpha/2} \le C\cdot\log_2(\varepsilon^{-1})\cdot\varepsilon^{-\frac{2}{\alpha+1}}.$$
To see this, note that for $\alpha\le 2$ it is $\varepsilon^{-\frac{1}{\alpha-1/2}} \le \varepsilon^{-\frac{2}{\alpha+1}}$ and:
$$\sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j/2}\left(\varepsilon^{-1/(\alpha+1)}\cdot 2^{-j/4}+1\right)^2 \le \frac{4}{\alpha+1}\log_2(\varepsilon^{-1})\cdot 2^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})/2}\left(\varepsilon^{-1/(\alpha+1)}\,2^{-\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})/4}+1\right)^2$$
$$= \frac{4}{\alpha+1}\log_2(\varepsilon^{-1})\cdot\varepsilon^{-2/(\alpha+1)}\left(\varepsilon^{-1/(\alpha+1)}\cdot\varepsilon^{1/(\alpha+1)}+1\right)^2 \lesssim \log_2(\varepsilon^{-1})\cdot\varepsilon^{-2/(\alpha+1)}$$
and:
$$\sum_{j=0}^{\frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})+C} 2^{j(\alpha+1)/2} \le \frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2\!\left(\varepsilon^{-1}\right)\cdot 2^{\frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2(\varepsilon^{-1})\,(\alpha+1)/2} = \frac{2}{(\alpha-\frac12)(\alpha+1)}\log_2\!\left(\varepsilon^{-1}\right)\cdot\varepsilon^{-\frac{1}{\alpha-1/2}} \lesssim \log_2\!\left(\varepsilon^{-1}\right)\cdot\varepsilon^{-\frac{1}{\alpha-1/2}}.$$
We know that $|\Lambda(\varepsilon)|$ is the number of shearlets $\psi_\lambda$ with shearlet coefficients $\langle f,\psi_\lambda\rangle$ larger than $\varepsilon$ in magnitude. Setting $n = |\Lambda(\varepsilon)|$ gives
$$n = |\Lambda(\varepsilon)| \le C\cdot\log_2(\varepsilon^{-1})\cdot\varepsilon^{-2/(\alpha+1)} \;\Leftrightarrow\; n^{(\alpha+1)/2} \le C\cdot\log_2(\varepsilon^{-1})^{(\alpha+1)/2}\cdot\varepsilon^{-1} \;\Leftrightarrow\; \varepsilon \le C\cdot n^{-(\alpha+1)/2}\cdot\log_2(\varepsilon^{-1})^{(\alpha+1)/2}.$$
With $n$ large enough, more precisely $n > \varepsilon^{-1}$, so that $\log_2(\varepsilon^{-1}) \le \log_2(n)$, this yields:
$$\varepsilon \lesssim n^{-(\alpha+1)/2}\cdot\log_2(n)^{(\alpha+1)/2}.$$
That is, for $n\in\mathbb{N}$, fewer than $n$ shearlet coefficients are larger than $n^{-(\alpha+1)/2}\cdot\log_2(n)^{(\alpha+1)/2}$, and in particular the $n$-th largest shearlet coefficient is smaller than this value:
$$|\theta(f)|_n \le n^{-(\alpha+1)/2}\cdot\log_2(n)^{(\alpha+1)/2}.$$
This implies
$$\sum_{n>N}|\theta(f)|_n^2 \le \sum_{n>N} n^{-(\alpha+1)}\cdot\log_2(n)^{(\alpha+1)} \le \int_N^\infty x^{-(\alpha+1)}\cdot\log_2(x)^{(\alpha+1)}\,dx.$$
By partial integration we obtain the estimate, which proves the claim:
$$\int_N^\infty x^{-(\alpha+1)}\cdot\log_2(x)^{(\alpha+1)}\,dx = C\left(\left[x^{-\alpha}\log_2(x)^{\alpha+1}\right]_\infty^N - \int_N^\infty x^{-\alpha}\,\frac{\partial}{\partial x}\log_2(x)^{(\alpha+1)}\,dx\right) \le C\cdot N^{-\alpha}\log_2(N)^{\alpha+1},$$
where we used in the last step that $\int_N^\infty x^{-\alpha}\frac{\partial}{\partial x}\log_2(x)^{(\alpha+1)}\,dx \ge 0$, since the logarithm is monotonically increasing and therefore the integrand is positive on $[N,\infty)$.
Figure 8: Shearlets interacting with corner points. (a) A shearlet intersecting a corner point. (b) A shearlet interacting with a corner point but not intersecting it.
6 Extension to a Singularity Curve with Corners
Now we want to show that the main result also holds for the extended class of cartoon-like images $E^\beta_{\alpha;L}$ for $L > 1$. In this case the boundary curve is only required to be piecewise $C^\alpha$-smooth, i.e. there are finitely many points $p$, called corner points, where the boundary curve is not $C^\alpha$-smooth. It suffices to consider shearlets interacting with the corner points, since for shearlets not interacting with them the estimates we made before still hold. Again our goal is to estimate $|\Lambda(\varepsilon)|$.
There are two ways a shearlet can interact with a corner point. On the one hand the shearlet can intersect a corner point (see Figure 8a), and on the other hand the shearlet can interact with two boundary curves that meet in a corner point without intersecting the corner point itself (see Figure 8b). For these cases we get the following estimate of $|\Lambda(\varepsilon)|$:
Theorem 6.1. Let $f\in E^\beta_{\alpha;L}$ for $1<\alpha\le\beta\le 2$ and suppose that $\psi\in L^2(\mathbb{R}^2)$ satisfies the conditions of Theorem 4.6. Consider the following two cases: Case 6a) the shearlets $\psi_\lambda$ intersect a corner point in which two parts $\partial B_0$ and $\partial B_1$ of the edge curve meet; Case 6b) the shearlets $\psi_\lambda$ intersect two parts $\partial B_0$ and $\partial B_1$ of the edge curve that meet in a corner point, but the shearlets do not intersect the corner point. Then
i) for Case 6a) we get $|\Lambda(\varepsilon)| \le \varepsilon^{-\frac{2(\alpha-1)}{\alpha+1}}$, i.e. the number of shearlets that intersect a corner point and have shearlet coefficients larger than $\varepsilon$ in magnitude is bounded by this number;
ii) for Case 6b) we get $|\Lambda(\varepsilon)| \le \varepsilon^{-\frac{2(\alpha-1)}{\alpha+1}}$ as well.
Proof.
Case a)
The number of shearlets of level $j\ge 0$ intersecting one corner point is bounded by the number of shearing indices $k$, i.e. by $C\cdot 2^{j(\alpha-1)/2}$. By assumption there are only finitely many corner points, and of course the number of corner points is independent of the shearlets and therefore of the scale $j$. Hence, the number of shearlets that intersect one of the corner points is also bounded by $C\cdot 2^{j(\alpha-1)/2}$. By equation (5.10) we get
$$|\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j(\alpha-1)/2} \lesssim \frac{4}{\alpha+1}\log_2(\varepsilon^{-1})\cdot 2^{\frac{2(\alpha-1)}{\alpha+1}\log_2(\varepsilon^{-1})} \lesssim \varepsilon^{-\frac{2(\alpha-1)}{\alpha+1}}.$$
Case b) Let $B\in STAR^\alpha(\nu)$ be the set which gives the parameterization of $f = f_0 + f_1\chi_B$ (see Definition 3.2). Define $\tilde{\mathcal Q}^0_j$ to be the set of dyadic squares $Q$ containing two distinct boundary curves. Then let $C = Q\setminus B$ for some cubic window $Q\subset[0,1]^2$ that contains the two boundary curves. Then we can write the function $f$ as $f = f_0\chi_C + f_1\chi_B$ for some $f_0,f_1\in C^\beta([0,1]^2)$. Note that this parameterization is different from the one given in Definition 3.2, but of course, if we adapt $f_0$ and $f_1$, it is also valid. Now we can also write it in the following way:
$$f = f_0\chi_C + f_1\chi_B = (f_0 - f_1)\chi_C + f_1.$$
Since for the smooth function $f_1$ the optimal rate is achieved, we can concentrate on $f := g\chi_C$, where $g\in C^\beta([0,1]^2)$ is defined as $g := f_0 - f_1$. The utility of this parameterization of $f$ is that the integral $\int_Q g\psi_\lambda\,dx = \langle g,\psi_\lambda\rangle$ vanishes on $B$, and therefore we can split the integral (see Figure 8); for each part we can use the estimates we made for the case where no corners are on the edge curve. For this reason we show the estimate for the case that the two parts of the edge curve are linear on $Q$; it is then very easy to see that the result for a general discontinuity curve follows analogously. Since $\partial B_0$ and $\partial B_1$ are linear on $Q$ we can write them as
$$L_i := \left\{x\in\mathbb{R}^2 : \langle x - x_0^i,\,(1,s_i)\rangle = 0\right\}\quad\text{for } i=0,1 \text{ and some } x_0^i\in\mathbb{R}^2.$$
We assume $s_i\le 3$ for $i=0,1$; the other cases can be handled similarly, again because we can split the integral into two parts that can be handled similarly to the special case $L = 0$. Next we define the sets of translates $M^i_{j,k,Q}$ as before, but for both parts of the edge curve:
$$M^i_{j,k,Q} := \left\{m\in\mathbb{Z}^2 : \operatorname{supp}\psi_{j,k,m}\cap L_i\cap Q \neq \emptyset\right\}.$$
60
6 EXTENSION TO A SINGULARITY CURVE WITH CORNERS
By the estimate (5.11) we know that $\left|M^i_{j,k,Q}\right| \lesssim \left|k+2^{j(\alpha-1)/2}s_i\right| + 1 =: |\hat k_i| + 1$ for $i=0,1$. It follows that
$$\left|M^0_{j,k,Q}\cap M^1_{j,k,Q}\right| \lesssim \min_{i=0,1}\left(\left|k+2^{j(\alpha-1)/2}s_i\right|+1\right) = \min_{i=0,1}\left(|\hat k_i|+1\right). \tag{6.1}$$
Applying Theorem 4.4 to the hyperplanes $L_0$ and $L_1$, we have
$$|\langle f,\psi_\lambda\rangle| \lesssim \frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s_0\right|^3} + \frac{2^{-j(\alpha+1)/4}}{\left|k+2^{j(\alpha-1)/2}s_1\right|^3} \lesssim \max_{i=0,1}\frac{2^{-j(\alpha+1)/4}}{|\hat k_i|^3}, \tag{6.2}$$
since we get the adequate result for each part of the integral. Using (6.1) and (6.2) and assuming $|\hat k_0| \le |\hat k_1|$ we can estimate $|\Lambda(\varepsilon)|$ as follows:
$$|\Lambda(\varepsilon)| \lesssim \sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})}\,\sum_{Q\in\tilde{\mathcal Q}^0_j}\,\sum_{|\hat k_0|}\left(1+|\hat k_0|\right) \overset{(5.13)}{\lesssim} \left|\tilde{\mathcal Q}^0_j\right|\sum_{j=0}^{\frac{4}{\alpha+1}\log_2(\varepsilon^{-1})} 2^{j/2}\left(1+\varepsilon^{-\frac{1}{\alpha+1}}2^{-j/4}\right)^2 \lesssim \left|\tilde{\mathcal Q}^0_j\right|\,\varepsilon^{-\frac{2}{\alpha+1}}.$$
Since the number of dyadic squares $Q\in\tilde{\mathcal Q}^0_j$ which contain two distinct boundary curves is bounded by a constant $C>0$ for all levels $j\ge 0$, the desired estimate is shown.
α (Matlab)   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
α (Theory)   2.8   2.6   2.4   2.2   2.0   1.8   1.6   1.4   1.2   1.0

Table 1: Correspondence between the α used in the Matlab implementation and the α used in the theory.
7 Implementation
The next step is to compute an α-shearlet approximation for specific images. Now the problem is that we do not know the regularity of the images, and therefore we do not know which α-shearlet frame we have to use. Hence, we want to learn the α that gives the best decay rate of the approximation error.
In this section we want to explain briefly the implementation of the α-shearlet approximation of images. The first subsection deals with the implementation of ShearLab, that is, computing the shearlet coefficients for one α and reconstructing the image from them. The second subsection focuses on the N-term approximation; the goal is to find the best α.
Note that the code uses a different dilation matrix than we use in the theory, but this coincides after adapting α (see also Table 1):
$$A_{\text{Matlab}} = \begin{pmatrix} 2^j & 0 \\ 0 & 2^{j\alpha} \end{pmatrix}, \qquad A_{\text{Theory}} = \begin{pmatrix} 2^{j\alpha/2} & 0 \\ 0 & 2^{j/2} \end{pmatrix}.$$
7.1 Shearlab Implementation
Note first that in practice we cannot treat images as functions in $L^2(\mathbb{R}^2)$; we have to treat them as matrices $y\in\mathbb{R}^{n\times n}$. The idea of the implementation is based on filters, i.e. the shearlets will be constructed as filters. This is why we introduce filters and the basic operations that we are going to use. Since images are two-dimensional we need two-dimensional filters, but these can be constructed from the one-dimensional case. So we begin by introducing what a one-dimensional filter is.
Definition 7.1. Let $\ell(\mathbb{Z})$ be the space of all sequences $c:\mathbb{Z}\to\mathbb{R}$.
i) Let $F,H\in\ell(\mathbb{Z})$. Then the convolution of $F$ and $H$ is defined as
$$F*H:\ \mathbb{Z}\to\mathbb{R},\qquad m\mapsto\sum_{k\in\mathbb{Z}}F(m-k)H(k).$$
ii) The upsampling operator $\uparrow_n:\ell(\mathbb{Z})\to\ell(\mathbb{Z})$ is defined by:
$$\uparrow_n F(k) = \begin{cases} F(k/n), & \text{if } k\in n\mathbb{Z},\\ 0, & \text{else.}\end{cases}$$
iii) Let $F\in\ell(\mathbb{Z})$. The z-transform $F^*$ of $F$ is defined as:
$$F^*(z) = \sum_{k\in\mathbb{Z}}F(k)z^{-k},\quad z\in\mathbb{C}\setminus\{0\}.$$
iv) A filter is an operator $F:\ell(\mathbb{Z})\to\ell(\mathbb{Z})$.
Note that if we want to convolve two vectors $F$ and $H$ which have only finite length (in contrast to sequences), we view them as sequences in the following way: let $N_F$ be the length of $F$ and $N_H$ the length of $H$. Then define
$$\tilde F(n) = \begin{cases} F(n), & \text{if } 1\le n\le N_F,\\ 0, & \text{else,}\end{cases} \qquad \tilde H(n) = \begin{cases} H(n), & \text{if } 1\le n\le N_H,\\ 0, & \text{else.}\end{cases}$$
Then it holds that $F*H = \tilde F*\tilde H$.
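The convolution and upsampling operations of Definition 7.1 can be sketched in a few lines (a sketch in Python rather than Matlab; finite filters are treated as zero-padded sequences as described above):

```python
def convolve(F, H):
    """Full discrete convolution of two finite filters,
    viewed as sequences supported on the indices 0..len-1."""
    out = [0.0] * (len(F) + len(H) - 1)
    for m in range(len(out)):
        for k in range(len(H)):
            if 0 <= m - k < len(F):
                out[m] += F[m - k] * H[k]
    return out

def upsample(F, n):
    """Upsampling operator: keeps F(k/n) at indices k in nZ, zeros elsewhere."""
    out = [0.0] * (n * (len(F) - 1) + 1)
    for k, v in enumerate(F):
        out[n * k] = v
    return out

print(convolve([1.0, 1.0], [1.0, 1.0]))  # [1.0, 2.0, 1.0]
print(upsample([1.0, 2.0, 3.0], 2))      # [1.0, 0.0, 2.0, 0.0, 3.0]
```

Convolving a matrix with a vector, as done later for the fan filter, then amounts to applying `convolve` to every row (or `upsample` to every column).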
We distinguish two kinds of filters. High-pass filters pass the high frequencies of a function and reduce the low frequencies; low-pass filters pass the low frequencies and reduce the amplitude of the high frequencies. Examples of these kinds of filters can be seen in Figure 9 and Figure 10. A method to construct a pair of one high-pass and one low-pass filter is given by quadrature mirror filters. The convenience of this kind of filter pair is the perfect reconstruction property, i.e. the sum at each frequency over both filters is equal to one (see Figure 11).
Definition 7.2. Let $h_0$ be some filter with Fourier transform $H_0$ which fulfills $|H_0(\xi)|^2 + |H_0(\xi+\pi)|^2 = 1$ for all $\xi$. Define $h_1$ as $h_1(n) = (-1)^n h_0(n)$ for all $n\in\mathbb{Z}$. Then $h_0$ and $h_1$ are quadrature mirror filters.
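A quick numerical check of Definition 7.2 can be sketched as follows (a sketch: the averaging filter h0 = [1/2, 1/2] is our own illustrative choice here, not the filter used in ShearLab):

```python
import cmath

h0 = [0.5, 0.5]  # low-pass filter satisfying |H0(xi)|^2 + |H0(xi+pi)|^2 = 1
h1 = [(-1) ** n * h0[n] for n in range(len(h0))]  # h1(n) = (-1)^n h0(n)

def H(h, xi):
    """Fourier transform of a finite filter: H(xi) = sum_n h(n) e^{-i n xi}."""
    return sum(h[n] * cmath.exp(-1j * n * xi) for n in range(len(h)))

# perfect reconstruction property: |H0(xi)|^2 + |H1(xi)|^2 = 1 for all xi
for xi in [0.0, 0.3, 1.0, 2.5]:
    total = abs(H(h0, xi)) ** 2 + abs(H(h1, xi)) ** 2
    assert abs(total - 1.0) < 1e-12
print("perfect reconstruction verified")
```

For this h0 one computes $|H_0(\xi)|^2 = (1+\cos\xi)/2$ and $|H_1(\xi)|^2 = (1-\cos\xi)/2$, so the sum is exactly one.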
Note that $h_0$ and $h_1$ indeed have the perfect reconstruction property. To see this, note first that the Fourier transform and the z-transform coincide in the sense that $F^*(e^{i\xi}) = \hat F(\xi)$ for $F\in\ell(\mathbb{Z})$. Now it follows that
$$h_1^*(z) = \sum_n h_1(n)z^{-n} = \sum_n (-1)^n h_0(n)z^{-n} = \sum_n h_0(n)(-z)^{-n} = h_0^*(-z)$$
Figure 9: Low-pass filter used in the implementation. (a) One-dimensional low-pass filter in the time domain. (b) One-dimensional low-pass filter in the frequency domain.
Figure 10: High-pass filter used in the implementation. (a) One-dimensional high-pass filter in the time domain. (b) One-dimensional high-pass filter in the frequency domain.
and
$$H_1(\xi) = h_1^*(e^{i\xi}) = h_0^*(-e^{i\xi}) = h_0^*(e^{i\pi}e^{i\xi}) = h_0^*(e^{i(\pi+\xi)}) = H_0(\pi+\xi),$$
and therefore
$$|H_0(\xi)|^2 + |H_1(\xi)|^2 = |H_0(\xi)|^2 + |H_0(\xi+\pi)|^2 = 1.$$
Figure 11: Pair of quadrature mirror filters used in the implementation.
The idea is now to construct the two-dimensional filters from the one-dimensional ones. For the low-frequency part this is very simple, since it is rectangular. The two-dimensional low-pass filter we need can be computed as $h_0^T\cdot h_0$; this multiplication gives a matrix, since $h_0$ is a row vector. Figure 12 shows how the low-pass filter looks in the frequency domain.
Figure 12: Two-dimensional low-pass filter in the frequency domain.
For the high-frequency part it is not that easy, because the shearlets should have their essential support in the cones (compare Figure 3). For this reason we use two-dimensional fan filters, whose support in the frequency domain is cone-like (see Figure 13).
Figure 13: Two-dimensional fan filter.
The next step is to upsample the filter so that the filters are compressed. Note that the fan filter is a matrix and not a vector, so we apply the upsampling defined above to every column. The result can be seen in Figure 14a. As Figure 14a shows, the essential support of the resulting shearlets does not lie in the cones C (remember the partition of the frequency domain introduced in Section 4.1 and Figure 3). To restrict to the frequencies in those cones (the high frequencies) we convolve the filter with the high-pass filter of the quadrature mirror filter pair defined above. Note that convolving a matrix with a vector is the same as convolving every row with the vector; for that reason we can use the one-dimensional convolution defined above. Figure 14b shows the resulting essential support, which conforms to the essential support of an unsheared shearlet.
Figure 14: (a) Upsampled fan filter. (b) Upsampled and convolved fan filter.
Now we have to shear, and that is the point where the α ∈ (0, 1], and therefore the difference between the frames, comes into play. Let $N$ be the size of the image; the number of shears can be computed as
$$n_j = J_f - j - \lfloor (J_f - j)\cdot\alpha\rfloor, \tag{7.1}$$
where $J_f = \log_2(N)$ is called the finest level and $j$ is the actual level. In the following we will use $J_f = 9$ and $J = 2$, where $J$ is the number of levels. The reason for the rounding in this computation is the need for an integer number of shears; this yields the same number for several α. Table 2 shows the number of shears for each level and each α in the case $J_f = 9$ and $J = 2$.
          α=0.1-0.2   α=0.3   α=0.4   α=0.5-0.6   α=0.7   α=0.8-1.0
Level 1       3         3       2         2         1         1
Level 2       4         3       3         2         2         1

Table 2: Number of shears for each level depending on α.
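Formula (7.1) can be sketched directly (Python sketch of the shear count; how the implementation maps the levels of Table 2 to the index j is not fully specified in the text, so we only evaluate the formula itself):

```python
import math

def num_shears(Jf, j, alpha):
    """Number of shears n_j = (Jf - j) - floor((Jf - j) * alpha), cf. (7.1)."""
    return (Jf - j) - math.floor((Jf - j) * alpha)

# with Jf = 9 as in the text:
print(num_shears(9, 6, 0.1))  # 3
print(num_shears(9, 6, 0.4))  # 2
print(num_shears(9, 5, 0.3))  # 3
```

The flooring explains why several values of α give the same shear count, as visible in Table 2.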
After constructing the shearlet frame, we are able to compute the shearlet coefficients. Let $\text{shear}_{\text{low}}$ be the filter of the low-frequency part and $\text{shear}_{j,k_j}$ the filter of level $j$ and shearing $k_j$, where $k_j\in[1,n_j]$. Let $\hat y$ be the Fourier transform of the image $y$; then the coefficient matrices can be computed in the frequency domain as:
$$(d_{\text{low}})_{n,m} = (\text{shear}_{\text{low}})_{n,m}\cdot\hat y_{n,m}, \qquad (d_{j,k_j})_{n,m} = (\text{shear}_{j,k_j})_{n,m}\cdot\hat y_{n,m}.$$
And of course, by applying the inverse Fourier transform we get the coefficients in the time domain.
For the reconstruction of the image from the coefficients we have to pay attention to the fact that the shearlet frame is not a tight frame; therefore we need the dual frame for reconstruction. Let $\widetilde{\text{shear}}_{\text{low}}$, $\widetilde{\text{shear}}_{j,k_j}$ be the dual frame elements (for their computation see the Matlab code). Then in the Fourier domain we can reconstruct the image as:
$$\hat y_{n,m} = (d_{\text{low}})_{n,m}\cdot(\widetilde{\text{shear}}_{\text{low}})_{n,m} + \sum_{j=1}^{J}\sum_{k_j=1}^{n_j}(d_{j,k_j})_{n,m}\cdot(\widetilde{\text{shear}}_{j,k_j})_{n,m}.$$
By applying the inverse Fourier transform the image is reconstructed.
7.2 N-Term Approximation and Results
Now assume that we have a particular image that we would like to approximate. As mentioned above, we do not know its regularity and we have to learn the best α. For this implementation, however, this is easy: remember that by choosing $J_f = 9$ and $J = 2$ we get only six different frames. Therefore it is no problem to compute the decay rate of the approximation error for each of these frames and to compare them. But if we look at larger images we can also choose $J_f$ and $J$ larger; then the number of different frames increases and we have to think about 'real' dictionary learning.
Having computed the shearlet coefficients for one α-shearlet frame, we can approximate an image by using the $N$ largest coefficients in magnitude and compute the decay rate of the approximation error. Since the coefficients of the low-frequency part are the same for each α, we use all coefficients of this part and then start the approximation. In each step $i = 1,2,\dots$ we take the $N_i$ largest coefficients in magnitude of the high-frequency part, reconstruct the image $y$, and call this reconstruction $y_{N_i}$. Then we compute the error as
$$e_{N_i} = \|y - y_{N_i}\|_F.$$
If we assume that the error decays as $N^{-\gamma}$, then for some constant $C > 0$ it holds that
$$e_N = C\cdot N^{-\gamma}.$$
Taking logarithms on both sides yields
$$\log_2(e_N) = \log_2(C) - \gamma\log_2(N).$$
This implies that in each step we get the error $\log_2(e_{N_i}) = \log_2(C) - \gamma\log_2(N_i)$, which yields the matrix equation:
$$\begin{pmatrix}\log_2(e_{N_1})\\\log_2(e_{N_2})\\\vdots\\\log_2(e_{N_n})\end{pmatrix} = \begin{pmatrix}1 & \log_2(N_1)\\1 & \log_2(N_2)\\\vdots & \vdots\\1 & \log_2(N_n)\end{pmatrix}\begin{pmatrix}\log_2(C)\\-\gamma\end{pmatrix}.$$
Therefore we have to find a least-squares solution of this problem, which we can compute with the help of the Matlab function 'polyfit'. Let us do this computation for different images in the next subsection.
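The least-squares fit of the decay rate γ (done in Matlab with 'polyfit') can be sketched in Python; here the error values are synthetic, generated with a known rate γ = 1.5 to check that the fit recovers it:

```python
import math

def fit_decay_rate(Ns, errs):
    """Least-squares fit of log2(e_N) = log2(C) - gamma*log2(N); returns gamma."""
    xs = [math.log2(N) for N in Ns]
    ys = [math.log2(e) for e in errs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # gamma is minus the slope of the regression line

# synthetic errors e_N = 4 * N^{-1.5}
Ns = [10, 20, 50, 100, 200]
errs = [4.0 * N ** (-1.5) for N in Ns]
gamma = fit_decay_rate(Ns, errs)
assert abs(gamma - 1.5) < 1e-9
print(gamma)
```

This is the same computation as Matlab's degree-1 'polyfit' on the pairs $(\log_2 N_i, \log_2 e_{N_i})$.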
7.3 Results
First we tried to compute the N-term approximation for different images with the ShearLab implementation introduced before. But with this implementation one always finds the same α to be the best. That is because this implementation is not suited for an N-term approximation (which takes the N largest coefficients in magnitude), since the supports of translated shearlets overlap. Hence, if a shearlet meets the discontinuity, many of its translates meet it as well, and there are too many large coefficients; all of the corresponding coefficients will be chosen without an additional cost for the image reconstruction. For this reason the implementation has to be adapted. But at the moment it does not work for all possible α, so we have to restrict ourselves to some of the possible α.
First we want to look at synthetic images. To this end we draw an image of parallel lines and pass to more and more strongly oscillating parallel curves (see Figure 14). For each of these images we compute the approximation rate for several α and compare them.
For these images we compute the decay rate for α = 2, 2.5, 3. The choice of α may seem surprising; of course, it would be better to also choose α smaller than two, but the adapted code still has a problem with this choice of α. For our purposes this is not critical, since the point here is merely to see some differences. These cases will be considered in work following this bachelor's thesis.
If we look at the graphics that show the decay rate of the N-term approximation of the images, we see that in the first steps the decay rate is not very large. But the theory only makes claims about the decay rate for sufficiently large j. For this reason we have to look at the decay rate on the finest level, which in our case is j = 3. Beyond a certain point the decay rate increases immensely, because we then use too much information. It should therefore be sufficient to look at the first two to ten percent of the coefficients to compute a suitable decay rate. For this number of coefficients the PSNR lies between 40 dB and 60 dB (for the definition of the PSNR see equation (8.1)). This is a good PSNR value, i.e. no differences are visible in the picture. Looking at Figure 15, one may suspect that the PSNR decreases the more the curves oscillate, and this is indeed the case. It would therefore be better to take more coefficients for the images with more strongly oscillating curves; but in order to retain comparability between the different images, and not only between the different α, we decided to take the same number of coefficients for each image. As mentioned before, a PSNR of 40 dB is a good value.
Table 3 contains the computed decay rates for the mentioned α and the images shown in Figure 14. To visualize these rates, see Figures 15 and 16, where the graphs of the mapping log2(error) = f(log2(number of coefficients)) are drawn.
Figure 15 shows the decay rates of the different images for fixed α. We can see that the more a curve oscillates, the better the approximation for a smaller α becomes.
[Figure 14: Family of synthetic images of more and more parallel oscillating curves. Panels: (a) Lines, (b)–(g) Oscillating 1–6.]
            α = 2      α = 2.5    α = 3
Fig. 14a   −0.9431    −1.3302    −1.3582
Fig. 14b   −0.8441    −0.7858    −0.7610
Fig. 14c   −0.7417    −0.7299    −0.7304
Fig. 14d   −0.8377    −0.7773    −0.7460
Fig. 14e   −0.8433    −0.8236    −0.7865
Fig. 14f   −0.8806    −0.7221    −0.6395
Fig. 14g   −0.9416    −0.7076    −0.6351

Table 3: Decay rates for the synthetic images.
[Figure 15: Graphs of the approximations, log2(fault) against log2(coefficients). Comparison of the different images (Lines, Oscillating 1–6) for fixed α. Panels: (a) α = 2, (b) α = 2.5, (c) α = 3.]
[Figure 16: Graphs of the approximations, log2(fault) against log2(coefficients). Comparison of the different α (2, 2.5, 3) for fixed images. Panels: (a) Lines, (b) Oscillating 2, (c) Oscillating 6.]
            α = 2      α = 2.5    α = 3
Lena       −0.8578    −0.7758    −0.7514
Barbara    −0.9560    −0.8701    −0.8181

Table 4: Decay rates for the real images 'Lena' and 'Barbara'.
Now we want to look at real images. We take two of the most famous pictures in image processing, namely 'Lena' and 'Barbara' (see Figure 17). For these real images we see that a smaller α is better for the approximation rate, although the differences are not very large.
It should not come as a surprise that the results do not show the decay rates we estimated in the theoretical part of this thesis: for that estimate we assumed infinitely many frame elements (shearlets) and estimated the decay rate for sufficiently large j, whereas the implementation only has finitely many frame elements. All in all, the examples show that for this implementation a smaller α yields a better decay rate of the error of the N-term approximation. But there are also pictures for which a larger α yields a better decay rate (recall Figure 16a).
[Figure 17: Famous images for image processing: 'Lena' and 'Barbara'. Panels: (a) Lena, (b) Barbara.]
[Figure 18: Graphs of the approximations for the images 'Lena' and 'Barbara', log2(fault) against log2(coefficients), comparing α = 2, 2.5, 3. Panels: (a) Lena, (b) Barbara.]
8 Conclusion
In this bachelor's thesis we have introduced cone-adapted discrete shearlet systems, which are adapted to some α. We have then shown that, using the α-adapted shearlet system, the approximation error of the best N-term approximation decays as N^{-α} · log2(N)^{α+1}. Hence it provides almost the optimal sparse approximation rate, which, as we have also shown, is N^{-α}.
In the last section we introduced the Matlab implementation of the shearlet transform and computed some decay rates. Our goal was to find the 'best' α-adapted shearlet system for certain special images, since we do not know the regularity of an image when we see it, and therefore we do not know which shearlet system, more precisely which scaling matrix, to use.
At the moment, the code does not work for all possible α considered in the theoretical part, since we had to adapt it. The next step is therefore to make the adapted code work for all these α and to examine the corresponding decay rates. As mentioned in the last section, if we approximate larger images, we can choose a larger finest level and therefore more levels overall. The expectation is that we then see larger differences between the decay rates for different α. Furthermore, for larger j the number of equal shearing numbers (even for different α) decreases, and with it the number of different α that yield the same transformation; recall equation (7.1). So if we have many different transformations and want to find the best one, we have to learn it. A further step will thus be to implement a dictionary learning algorithm that finds the best transformation (i.e. the most suitable α-shearlet system) for special classes of images.
However, we also want to look at other measures; that is, we do not only want to measure the approximation quality via the decay rate of the N-term approximation. Another conceivable measure is the denoising quality: assume that we have an image and add noise to it. Then we can use the shearlet transform to denoise it, and the denoising quality can be measured as the difference between the PSNR of the noisy and the PSNR of the denoised image. The PSNR is defined as
\[
\mathrm{PSNR} := 20 \cdot \log_{10}\left(\frac{255 \cdot N}{\|f^d - f\|_F}\right), \qquad (8.1)
\]
where f is the original image to which we add noise, f^d the denoised image, and N the size of the image. Note that to measure the denoising quality we can use the original Matlab code.
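Equation (8.1) can be sketched directly in Python. Here we read "N the size of the image" as the side length of a square image, i.e. the square root of the number of pixels; this is an interpretation on our part, chosen because it makes (8.1) coincide with the usual 20·log10(255/RMSE) definition for 8-bit images.

```python
import numpy as np

def psnr(f, f_d):
    """PSNR as in equation (8.1) for 8-bit images:
    20 * log10(255 * N / ||f_d - f||_F), with N = sqrt(#pixels)
    (interpreting 'size' as the side length of a square image)."""
    diff = np.linalg.norm(np.asarray(f_d, float) - np.asarray(f, float), 'fro')
    n = np.sqrt(np.asarray(f).size)
    return 20.0 * np.log10(255.0 * n / diff)
```

The denoising quality described above would then be psnr(f, f_denoised) − psnr(f, f_noisy).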