Low-Complexity Scalable Multi-Dimensional Image
Coding with Random Accessibility
CIPR Technical Report TR-2008-5
Ying Liu
August 2008
Center for
Image Processing Research
Rensselaer Polytechnic Institute
Troy, New York 12180-3590
http://www.cipr.rpi.edu
LOW-COMPLEXITY SCALABLE MULTIDIMENSIONAL
IMAGE CODING WITH RANDOM ACCESSIBILITY
By
Ying Liu
A Thesis Submitted to the Graduate
Faculty of Rensselaer Polytechnic Institute
in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
Major Subject: Electrical Engineering
Approved by the
Examining Committee:
William A. Pearlman, Thesis Adviser
Alhussein Abouzeid, Member
Mukkai Krishnamoorthy, Member
John W. Woods, Member
Rensselaer Polytechnic Institute
Troy, New York
July 2008
(For Graduation August 2008)
LOW-COMPLEXITY SCALABLE MULTIDIMENSIONAL
IMAGE CODING WITH RANDOM ACCESSIBILITY
By
Ying Liu
An Abstract of a Thesis Submitted to the Graduate
Faculty of Rensselaer Polytechnic Institute
in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
Major Subject: Electrical Engineering
The original of the complete thesis is on file
in the Rensselaer Polytechnic Institute Library
Examining Committee:
William A. Pearlman, Thesis Adviser
Alhussein Abouzeid, Member
Mukkai Krishnamoorthy, Member
John W. Woods, Member
Rensselaer Polytechnic Institute
Troy, New York
July 2008
(For Graduation August 2008)
© Copyright 2008
by
Ying Liu
All Rights Reserved
CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
ACKNOWLEDGMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
   1.1 Desirable Functionality . . . . . . . . . . . . . . . . . . . . . . . 2
       1.1.1 SNR Scalability . . . . . . . . . . . . . . . . . . . . . . . . 2
       1.1.2 Resolution Scalability . . . . . . . . . . . . . . . . . . . . 3
       1.1.3 Random Accessibility . . . . . . . . . . . . . . . . . . . . . 3
       1.1.4 Low Complexity and Resource Usage . . . . . . . . . . . . . . . 3
   1.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
   1.3 Outline of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 7
2. WAVELET AND SET PARTITION CODING . . . . . . . . . . . . . . . . . . . . 9
   2.1 Image Coding Background . . . . . . . . . . . . . . . . . . . . . . . 9
       2.1.1 Discrete Wavelet Transform . . . . . . . . . . . . . . . . . . 9
       2.1.2 Statistical Characteristics of Wavelet Transformed Images . . 11
       2.1.3 Bit-plane Coding . . . . . . . . . . . . . . . . . . . . . . . 12
       2.1.4 Bit-plane Coding Passes . . . . . . . . . . . . . . . . . . . 12
   2.2 Set-Partitioning Image Coding . . . . . . . . . . . . . . . . . . . 13
       2.2.1 SPIHT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
             2.2.1.1 Spatial Orientation Trees . . . . . . . . . . . . . . 14
             2.2.1.2 Coding Algorithm . . . . . . . . . . . . . . . . . . 16
       2.2.2 SPECK . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
       2.2.3 SBHP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
       2.2.4 EBCOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
   2.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3. LOW-COMPLEXITY 3-D IMAGE CODER: 3D-SBHP . . . . . . . . . . . . . . . . 30
   3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
   3.2 Three-Dimensional Integer Wavelet Transform . . . . . . . . . . . . 32
       3.2.1 Lifting Scheme . . . . . . . . . . . . . . . . . . . . . . . . 33
       3.2.2 Scaling Factors . . . . . . . . . . . . . . . . . . . . . . . 36
   3.3 Scalable 3D-SBHP . . . . . . . . . . . . . . . . . . . . . . . . . . 37
       3.3.1 Coding Algorithm . . . . . . . . . . . . . . . . . . . . . . . 38
       3.3.2 Processing Order of Sorting Pass . . . . . . . . . . . . . . . 42
       3.3.3 Entropy Coding . . . . . . . . . . . . . . . . . . . . . . . . 43
       3.3.4 Memory and Complexity Analysis . . . . . . . . . . . . . . . . 44
       3.3.5 Scalable Coding . . . . . . . . . . . . . . . . . . . . . . . 45
             3.3.5.1 Resolution Scalability . . . . . . . . . . . . . . . . 46
             3.3.5.2 Rate Control . . . . . . . . . . . . . . . . . . . . . 46
   3.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . 49
       3.4.1 Lossless Coding Performance . . . . . . . . . . . . . . . . . 50
             3.4.1.1 Lossless Coding Performance by Use of Different Integer Wavelet Transforms . . . 50
             3.4.1.2 Comparison of Lossless Performance with Different Algorithms . . . 51
             3.4.1.3 Lossless coding performance by use of different code-block sizes . . . 54
       3.4.2 Lossy performance . . . . . . . . . . . . . . . . . . . . . . 54
       3.4.3 Resolution scalable results . . . . . . . . . . . . . . . . . 56
       3.4.4 Computational Complexity . . . . . . . . . . . . . . . . . . . 57
   3.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . 58
4. Region-of-Interest Decoding . . . . . . . . . . . . . . . . . . . . . . 69
   4.1 Code-block Selection . . . . . . . . . . . . . . . . . . . . . . . . 69
   4.2 Random Accessibility . . . . . . . . . . . . . . . . . . . . . . . . 74
       4.2.1 Wavelet Transform vs. Random Accessibility . . . . . . . . . . 74
             4.2.1.1 Filter Implementation . . . . . . . . . . . . . . . . 74
             4.2.1.2 ROI decoding performance by use of different wavelet filters and wavelet decomposition levels . . . 75
       4.2.2 Code-block Configurations vs. Random Accessibility . . . . . . 76
             4.2.2.1 Lossy-to-lossless coding performance by use of different code-block sizes . . . 76
             4.2.2.2 ROI decoding performance by use of different code-block sizes and ROI sizes . . . 78
       4.2.3 ROI access performance by use of different bit allocation methods . . . 78
   4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5. Multistage Lattice Vector Quantization for Hyperspectral Image Compression . . . 83
   5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
   5.2 Vector Quantization . . . . . . . . . . . . . . . . . . . . . . . . 84
       5.2.1 Lattice Vector Quantization . . . . . . . . . . . . . . . . . 85
             5.2.1.1 Classical Lattice . . . . . . . . . . . . . . . . . . 85
             5.2.1.2 LVQ Codebook . . . . . . . . . . . . . . . . . . . . . 86
       5.2.2 Multistage Lattice Vector Quantization . . . . . . . . . . . . 87
   5.3 MLVQ-SPIHT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
             5.3.0.1 Cubic Z4 LVQ . . . . . . . . . . . . . . . . . . . . . 90
             5.3.0.2 Pyramid D4 LVQ . . . . . . . . . . . . . . . . . . . . 91
             5.3.0.3 Sphere D4 LVQ . . . . . . . . . . . . . . . . . . . . 92
   5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . 92
   5.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . 94
6. Four-Dimensional Wavelet Compression of 4-D Medical Images Using Scalable 4-D SBHP . . . 98
   6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
   6.2 Scalable 4D-SBHP . . . . . . . . . . . . . . . . . . . . . . . . . 100
       6.2.1 Wavelet Decomposition in 4-D . . . . . . . . . . . . . . . . 100
       6.2.2 Coding Algorithm . . . . . . . . . . . . . . . . . . . . . . 101
       6.2.3 Scalable Coding . . . . . . . . . . . . . . . . . . . . . . . 105
   6.3 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . 106
       6.3.1 Comparison of Lossless performance with 3-D and 4-D schemes . . . 106
       6.3.2 Comparison of Lossy performance with 3-D schemes . . . . . . 107
       6.3.3 Resolution scalable results . . . . . . . . . . . . . . . . . 108
   6.4 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . 108
7. Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 115
   7.1 Contributions of the Thesis . . . . . . . . . . . . . . . . . . . . 115
   7.2 Further Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
       7.2.1 Improving Compression Efficiency . . . . . . . . . . . . . . 117
       7.2.2 3D-SBHP on Video . . . . . . . . . . . . . . . . . . . . . . 117
LITERATURE CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
APPENDICES
A. Huffman Codes for Entropy Coding and Statistics of the Training Set . . . 125
LIST OF TABLES
2.1  Filter coefficients for the Daubechies' biorthogonal 9/7 filters . . . 10
2.2  Comparison of wavelet-based image coders . . . 29
3.1  Average standard deviation of volumetric image sequences along X, Y, and Z directions . . . 32
3.2  Lossless integer filters . . . 35
3.3  Description of the image volumes . . . 50
3.4  Comparison of lossless coding results, in bits/pixel, of different coding methods by use of different integer filters on CT data . . . 51
3.5  Comparison of lossless coding results, in bits/pixel, of different coding methods by use of different integer filters on MR data . . . 52
3.6  Comparison of lossless coding results, in bits/pixel, of different coding methods by use of different integer filters on AVIRIS data (decomposition level of 3 is used on all dimensions) . . . 52
3.7  Comparison of different coding methods for lossless compression of 8-bit medical image volumes (bits/pixel) . . . 54
3.8  Comparison of different coding methods for lossless coding of 16-bit AVIRIS image volumes (bits/pixel) (decomposition level of 5 is used in the spatial domain and decomposition level of 2 on the spectral axis) . . . 54
3.9  Lossless coding results by use of different code-block sizes (bits/pixel) . . . 55
3.10 PSNR performance (in dB) of 3D-SBHP at various rates for medical volumetric image data. These rates are obtained by truncation of the lossless bitstream . . . 55
3.11 PSNR for decoding CT skull at a variety of resolutions and bit rates . . . 56
3.12 Bytes used to losslessly reconstruct CT skull at a variety of resolutions . . . 56
3.13 Comparison of lossless encoding time between AT-3D-SPIHT and 3D-SBHP on the images CT skull and MR liver t1 (wavelet transform times are not included) . . . 58
3.14 Comparison of decoding time between AT-3D-SPIHT and 3D-SBHP on the images CT skull and MR liver t1 at a variety of bit rates (wavelet transform times are not included) . . . 59
3.15 Lossless decoding time of 3D-SBHP on CT skull and MR liver t1 at a variety of resolutions . . . 59
3.16 Comparison of CPU cycles used for the wavelet transform, lossless encoding, and disk I/O between AT-3D-SPIHT and 3D-SBHP on the CT skull image sequence . . . 60
4.1  Number of taps of integer filters . . . 75
4.2  Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64 × 64 × 64, code-block size = 8 × 8 × 2, spatial wavelet decomposition level = 3) . . . 76
4.3  Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64 × 64 × 64, code-block size = 8 × 8 × 2, spatial wavelet decomposition level = 2) . . . 76
4.4  Description of the image volumes . . . 77
5.1  Description of the image volume Moffett Field . . . 93
5.2  Comparison of rate-distortion results of different coding methods in signal-to-noise ratio (SNR), in dB . . . 94
6.1  Average standard deviation of 4D fMRI and 4D CT image data along X, Y, Z, and T directions . . . 100
6.2  Description of the image volumes . . . 106
6.3  Lossless compression performance using 4D-SBHP and 3D-SBHP (bits/pixel) . . . 107
6.4  Lossless compression performance using 4D methods (bits/pixel) . . . 107
6.5  SNR for decoding siem at a variety of resolutions and bit rates . . . 108
A.1  Probabilities for 15 significant subset masks collected from the medical image training set . . . 126
A.2  Probabilities for 15 significant subset masks collected from the hyperspectral image training set . . . 126
A.3  Probabilities for the number of significant subsets in a split significant set. These statistics are collected from both the medical and hyperspectral image training sets . . . 127
A.4  Probabilities of significance of a generated subset when a set is split. These statistics are collected from both the medical and hyperspectral image training sets . . . 127
A.5  Huffman codewords generated for 15 significant subset masks based on the medical image training set . . . 127
A.6  Huffman codewords generated for 15 significant subset masks based on the hyperspectral image training set . . . 128
LIST OF FIGURES
1.1  An example of medical CT images (256 × 256 × 192) . . . 2
1.2  An example of hyperspectral images (512 × 512 × 224) . . . 2
1.3  Block diagram of a general transform coding system . . . 6
2.1  Two-channel filter structure for subband coding . . . 10
2.2  Illustration of a two-dimensional dyadic DWT decomposition when two levels are performed . . . 11
2.3  Parent-child relationship in SPIHT . . . 15
2.4  Partitioning of wavelet transformed image into sets S and I . . . 19
2.5  Quadtree partitioning of set S . . . 19
2.6  Octave partitioning of set I . . . 20
2.7  Set partitioning rules used by SBHP . . . 24
2.8  Example of JPEG2000 code-block scan pattern . . . 28
3.1  Wavelet decomposition structure with 3 levels of 2D spatial transform followed by 2 levels of 1D axial transform . . . 33
3.2  The forward wavelet transform using lifting: first the Lazy wavelet (subsample into even and odd), then alternating lifting and dual lifting steps, and finally a scaling . . . 34
3.3  The inverse wavelet transform using lifting: first a scaling, then alternating dual lifting and lifting steps, and finally the inverse Lazy transform . . . 34
3.4  An example of scaling factors used in an integer wavelet transform to approximate a 3D unitary transform . . . 61
3.5  Wavelet decomposition structure with 2 levels of 1D packet decomposition along the axial direction, followed by 3 levels of 2D dyadic transform in the spatial domain . . . 62
3.6  Partitioning of the code-block into sets S and I . . . 62
3.7  Quadtree partitioning of set S . . . 63
3.8  Octave-band partitioning of set I . . . 63
3.9  Set partitioning rules used by 3-D SBHP . . . 64
3.10 12 resolution levels with 3-level wavelet decomposition in the spatial domain and 2-level wavelet decomposition in the spectral direction . . . 64
3.11 An example of 3D-SBHP SNR and resolution scalable coding. The compressed bitstream generated on bitplane α in code-block β is notated as b(α,β). Code-blocks are encoded and indexed from the lowest subband to the highest subband . . . 65
3.12 Bitstream structure generated by 3D-SBHP. The compressed bitstream generated on bitplane α in code-block β is notated as b(α,β). R(i,j,k) denotes the number of bits used after the ith coding pass (i = 0: LIP pass; i = 1: LIS pass; i = 2: LSP pass) at the nth bit plane for code-block Bk. D(i,j,k) denotes the derivative of the rate-distortion curve, δD(i,j,k), after the ith coding pass (i = 0: LIP pass; i = 1: LIS pass; i = 2: LSP pass) at the nth bit plane for code-block Bk . . . 65
3.13 Reconstructed CT skull 1st slice by 3D-SBHP, from left to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp, and original slice . . . 66
3.14 Reconstructed MR liver t1 1st slice by 3D-SBHP, from left to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp, and original slice . . . 67
3.15 A visual example of resolution scalable decoding. From left to right: 1/4, 1/2, and full resolution at 0.125 bpp . . . 68
4.1  Spatial access with code-blocks . . . 70
4.2  Parent-offspring dependencies in the 3D orientation tree . . . 70
4.3  2D example of code-block selection. Filter length is considered . . . 72
4.4  A visual example of 3D-SBHP random access decoding . . . 73
4.5  Rate-distortion performance with increasing code-block size . . . 77
4.6  Rate-distortion performance with increasing ROI size . . . 79
4.7  A visual example of ROI decoding from a 3-D SBHP bitstream using different wavelet filters . . . 80
4.8  Rate-distortion performance with different priorities for code-blocks . . . 82
5.1  Multistage lattice VQ with A2 lattice . . . 88
5.2  An example of parent-child relationship between vectors when vector dimension N = 4 . . . 89
5.3  Vector SPIHT with successive refinement LVQ . . . 91
5.4  Comparison of original and reconstructed Moffett scene 3, 49th band, by MLVQ-SPIHT, from top to bottom: original, 0.1 bpp, 0.5 bpp . . . 96
5.5  Comparison of lossy performance for the Moffett Field image, scene 3 . . . 97
6.1  Wavelet decomposition structure with 2 levels of 1D temporal transform followed by 2 levels of 1D axial transform and 2D spatial transform. The black block is the lowest frequency subband . . . 101
6.2  Quadtree partitioning of set S . . . 103
6.3  Octave-band partitioning of set I . . . 103
6.4  Set partitioning rules used by 4-D SBHP . . . 110
6.5  An example of 4D-SBHP SNR and resolution scalable coding. Each bitplane α in block β is notated as b(α, β). Code-blocks are encoded and indexed from the lowest subband to the highest subband . . . 111
6.6  Bitstream structure generated by 4D-SBHP. Each bitplane α in block β is notated as b(α, β). Rate-distortion information is stored in the header of every code-block . . . 111
6.7  Comparison of lossy performance on the mb01 image data . . . 112
6.8  Comparison of lossy performance on the siem image data . . . 113
6.9  Reconstructed siem sequence at time t = 20 by 4D-SBHP, from left to right, top to bottom: original, 0.5 bpp, 1.0 bpp, and 2.0 bpp . . . 114
6.10 A visual example of resolution scalable decoding. Full resolution and 1/2 resolution of one slice at 0.25 bpp . . . 114
ACKNOWLEDGMENT
First and foremost, I offer my sincerest gratitude to my thesis advisor, Professor William A. Pearlman, for his carefully considered advice, patience, selfless
support and inspirational enthusiasm for this research. I am very fortunate to have
worked with him.
I would like to thank Professor John W. Woods, Alhussein Abouzeid and
Mukkai Krishnamoorthy for serving as members of my thesis committee. I am
grateful for their help and thoughtful comments on this work.
Many of my fellow students in the Center for Image Processing Research (CIPR), too many to mention individually, were of great help while this work was being done. Their knowledge and experience in related research have been very helpful.
I acknowledge with thanks the financial support I received for my Ph.D. program from the Office of Naval Research, the Electrical, Computer, and Systems Engineering Department, and Rensselaer Polytechnic Institute.
Lastly, and most importantly, I would like to thank my family, to whom this thesis is dedicated. Their never-ending support in all my endeavors has been invaluable.
ABSTRACT
Multi-dimensional data sets, such as hyperspectral images and medical volumetric data generated by computed tomography (CT) or magnetic resonance (MR) imaging, typically contain many image slices and require huge amounts of storage and transmission bandwidth. To compress such large image data sets, a low-complexity and efficient image coding algorithm is highly desirable. Furthermore, in the Internet environment, to make interactive viewing more efficient, we need a compression scheme that is inherently scalable and supports a high degree of random accessibility.

The first aspect of this work proposes a fast coding method that supports both SNR and resolution scalability and decoding of a region of interest by random access to the bitstream. In order to achieve minimal complexity, we use fixed-symbol Huffman coding instead of context-based arithmetic coding. Multi-dimensional subband/wavelet coding is applied to exploit the dependencies and multi-resolution structure in all dimensions. We adopt wavelet bitplane coding to provide full SNR scalability. The hierarchical coding and block-based structure enable spatial accessibility and a resolution-scalable representation of the wavelet transform coefficients. The framework is designed and implemented for both 3D and 4D image sources. We demonstrate through extensive experiments that our coding scheme performs comparably in compression to other algorithms while yielding very high coding speeds and supporting all the features planned for JPEG 2000.

The second aspect of this thesis proposes a coding method for wavelet coefficients of 3D image sources using vector quantization. In the proposed algorithm, multistage lattice vector quantization (MLVQ) is used to exploit correlations between image slices while offering successive refinement with low coding complexity and computation. Different LVQs, including cubic Z4 and D4 lattices, are implemented with SPIHT. The experimental results show that the MLVQ-based schemes provide better rate-distortion performance at low bit rates than 2D-SPIHT and algorithms that employ 3D wavelet transforms.
CHAPTER 1
INTRODUCTION
In the past decade, the acquisition, transmission, storage, and processing of digital images have become widespread. Digital technology allows visual information to be regenerated, processed, archived, and transmitted easily. Most significantly, digital images readily support a diverse range of services over the Internet. Despite these advantages, there is one problem with digital images: they generally consist of large amounts of data when represented in uncompressed form, especially the three-dimensional and four-dimensional image data produced by medical data acquisition devices or multispectral/hyperspectral imaging techniques.
Such volumetric data sets are best viewed as a collection of still images, known as slices. Magnetic Resonance Imaging (MRI), Computer-assisted Tomography (CT), and Ultrasound (US) are typical examples of medical imaging techniques that generate three-dimensional and four-dimensional image data. The slices represent cross sections of the subject at various positions along a third orthogonal axis at a given time instant. Figure 1.1 gives a visual example of a 3D medical CT image sequence. Hyperspectral imaging is a remote sensing technique that also generates three-dimensional data sets, where the slices represent narrow and contiguous spectral bands of the region being viewed by the instrument. As an example, shown in Figure 1.2, a 512 × 512 × 224 hyperspectral image with 16 bits/pixel has a raw file size of 117.5 Mbytes, which would take roughly an hour to transmit even over a high-speed 256 Kbps digital subscriber line (DSL). Therefore, efficient compression should be applied to these data sets before storage and transmission. On the other hand, in many Internet applications, it is indispensable to guarantee interactivity for consultation and quantitative analysis. As a consequence, trading off image quality and algorithm complexity against the bit-rate constraint requires the compression scheme to provide not only efficient compression but also additional functionality.
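The storage and transmission figures above follow directly from the cube dimensions; a quick sanity check in code (the 256 Kbps link rate is taken from the example above):

```python
# Raw size and transmission time of the 512 x 512 x 224, 16 bit/pixel
# hyperspectral cube from the example above.
width, height, bands = 512, 512, 224
bits_per_pixel = 16

total_bytes = width * height * bands * bits_per_pixel // 8
total_bits = total_bytes * 8

link_bps = 256_000                    # 256 Kbps DSL line
minutes = total_bits / link_bps / 60

print(f"raw size: {total_bytes / 1e6:.1f} Mbytes")  # raw size: 117.4 Mbytes
print(f"transfer: {minutes:.0f} minutes")           # transfer: 61 minutes
```

The hour-scale transfer time for a single uncompressed scene is exactly why compression, and the ability to decode only a region or resolution of interest, matters here.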
Figure 1.1: An example of medical CT images (256 × 256 × 192).
Figure 1.2: An example of hyperspectral images (512 × 512 × 224).
1.1 Desirable Functionality

1.1.1 SNR Scalability
SNR scalability is a functionality provided by multiple layers, such that each enhancement layer carries coefficients quantized with improved accuracy. This means the data that are more important for reconstructing the image are stored before less important data in the compressed image description. In fact, if we take a long enough prefix of the bitstream, we should be able to get a completely lossless representation of the original image. This lossy-to-lossless compression is useful whenever images are needed at several different quality settings. For medical consultation, discarding small image details that might be an indication of pathology could alter a diagnosis, causing severe human and legal consequences [1]. In this case, lossless decoding is preferred. On the other hand, lossy decoding allows users to quickly browse through a large volumetric data set.
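The lossy-to-lossless behavior described above can be made concrete with a toy bitplane coder (a deliberate simplification, not the coder developed in later chapters): magnitudes are sent most-significant plane first, so any prefix of the planes decodes to a coarser approximation, and the full set decodes losslessly.

```python
# Toy bitplane coder: sending MSB planes first yields an embedded code.
def encode_bitplanes(values, num_bits=8):
    """Emit one list of bits per plane, most significant plane first."""
    return [[(v >> b) & 1 for v in values]
            for b in range(num_bits - 1, -1, -1)]

def decode_prefix(planes, num_planes, num_bits=8):
    """Reconstruct using only the first num_planes planes (rest taken as 0)."""
    out = [0] * len(planes[0])
    for i in range(num_planes):
        b = num_bits - 1 - i
        for j, bit in enumerate(planes[i]):
            out[j] |= bit << b
    return out

data = [200, 13, 97, 255, 4]
planes = encode_bitplanes(data)
print(decode_prefix(planes, 3))  # coarse prefix: [192, 0, 96, 224, 0]
print(decode_prefix(planes, 8))  # all planes, lossless: [200, 13, 97, 255, 4]
```

The same principle underlies the wavelet bitplane coding used in this thesis, where significance of whole sets, rather than raw bits, is what gets coded.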
1.1.2 Resolution Scalability
Resolution scalability is the ability to easily display the image at different resolutions. For an algorithm to be resolution scalable, the beginning of the compressed image bitstream should contain data for reconstructing a small, low-resolution version of the image. Each successive part of the bitstream, along with the previous bits, should contain the data for reconstructing a larger, higher-resolution version of the image. This capability would be very useful in image browsing, or in any system that requires different resolutions of an image for output to different devices.
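A one-level Haar transform illustrates why wavelet coders get this property almost for free (a minimal 1D sketch; the coders discussed later use longer lifting filters and separable 2D/3D transforms): the low-pass subband is itself a half-resolution version of the signal, so a decoder can simply stop after it.

```python
# One-level Haar transform: the low-pass half IS a half-resolution signal.
def haar_forward(x):
    lo = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]  # averages
    hi = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]  # details
    return lo, hi

def haar_inverse(lo, hi):
    out = []
    for a, d in zip(lo, hi):
        out += [a + d, a - d]
    return out

row = [10, 12, 80, 84, 30, 30, 6, 2]
lo, hi = haar_forward(row)
print(lo)                    # half-resolution view: [11.0, 82.0, 30.0, 4.0]
print(haar_inverse(lo, hi))  # full resolution, recovered exactly
```

Iterating the transform on the low-pass output gives the dyadic hierarchy of resolutions that the bitstream exposes level by level.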
1.1.3 Random Accessibility
Random accessibility refers to the ability to render an arbitrary portion of the
image data set from an embedded compressed codestream without having to decode
the entire image. This feature, when combined with resolution scalability, would
be very useful in interactive applications. After viewing a low-resolution overview
of the image, a user can zoom to a particular region without decoding the entire
high-resolution image data set.
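Mechanically, random access comes down to mapping the requested region onto the independently decodable units of the codestream; a simplified 2D sketch with a hypothetical 64 × 64 code-block grid (filter-support overlap, which matters in practice and is treated in Chapter 4, is ignored here):

```python
# Map a rectangular ROI (x1, y1 exclusive) onto the code-block grid that
# covers it; only these blocks need to be fetched and decoded.
def blocks_for_roi(x0, y0, x1, y1, block_w, block_h):
    bx0, by0 = x0 // block_w, y0 // block_h
    bx1, by1 = (x1 - 1) // block_w, (y1 - 1) // block_h
    return [(bx, by) for by in range(by0, by1 + 1)
                     for bx in range(bx0, bx1 + 1)]

# A 40x40 ROI at (100, 60) with 64x64 code-blocks touches only 4 blocks.
print(blocks_for_roi(100, 60, 140, 100, 64, 64))
# -> [(1, 0), (2, 0), (1, 1), (2, 1)]
```

In the actual coders this selection is done per subband of the wavelet decomposition, with the ROI coordinates scaled down at each level.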
1.1.4 Low Complexity and Resource Usage
The complexity of an image compression algorithm is measured by the number of data operations required to perform encoding and decoding. Resource usage usually means the speed of encoding/decoding and the amount of memory used by the compression algorithm. For some applications, the speed of encoding and decoding an image is critical. In other applications, the amount of memory used by the compression algorithm needs to be small to keep the cost of the entire system low. Clearly, it is desirable for an image compression algorithm to be both as fast as possible and to have a memory footprint that is as small as possible.
1.2 Related Works

In the past, some Discrete Cosine Transform (DCT)-based schemes [5, 6] were proposed for volumetric coding. These techniques split the image volume into N × N × N blocks and apply a 3-D DCT to each block. Such 3-D DCT-based schemes encounter two problems: 1) they cannot meet the requirements imposed by the scalability paradigm, and 2) DCT-based processes cannot be used for true lossless coding.¹ The latter is extremely important for medical applications, which often cannot tolerate any distortion that could lead to a faulty diagnosis.

¹ We are referring to a true floating-point DCT, not an integer approximation to the DCT, which is not as efficient but can be compatible with lossless coding.

To overcome these problems while maintaining good compression performance, many promising wavelet-based image coding algorithms have been proposed recently. Shapiro's Embedded Zerotree Wavelet (EZW) [18] was the first efficient wavelet-based image coding algorithm. Later work by Said and Pearlman [19] on set partitioning in hierarchical trees (SPIHT) improved upon EZW coding and applied it successfully to both lossy and lossless compression. Islam and Pearlman proposed Set Partitioned Embedded bloCK (SPECK) [17], a low-complexity, block-based image coder with similar features. While EZW and SPIHT represent zerotree-based image coders, SPECK represents image coders based on zero-block structures. The new JPEG 2000 standard is based on a similar scheme called Embedded Block Coding with Optimized Truncation (EBCOT) [24]. A SPECK variant called Subband Block Hierarchical Partitioning (SBHP) [16] was proposed as a low-complexity alternative to EBCOT in the JPEG 2000 Working Group. SBHP was incorporated into the JPEG 2000 coding framework simply by replacing EBCOT as the entropy coding engine.

Although 3-D image data can be compressed by applying a two-dimensional compression algorithm to each slice independently, the high correlation between slices makes three-dimension-based algorithms a better choice. Recently, compression techniques with a separate 3D wavelet transform and 3D coding of quantization
indices have been considered by several researchers. 3-D context-based EZW (3-D CB-EZW) [7], a 3D zerotree coder based on a modified EZW, has been used with good results for the compression of volumetric images. However, as pointed out by Xiong et al. [8], the problem of efficient progressive lossy compression is not addressed there.
The well-known SPIHT algorithm has been extended to three dimensions by Kim
and Pearlman [9]. Dragotti et al. applied 3-D SPIHT for compression of multispectral images [10]. 3-D SPIHT has been applied on volumetric medical data by
Kim and Pearlman [11]. Stripe-based SPIHT has been proposed for volumetric medical data compression with low memory [12]. Recently, Christophe and Pearlman presented an adaptation of 3D-SPIHT to support random accessibility and resolution scalability [13]. Tang et al. [21] extended SPECK to three dimensions and applied 3-D SPECK to hyperspectral images. 3-D SPECK treats each subband as a code-block and generates an embedded codestream for each code-block independently. The EBCOT algorithm has also been extended to three dimensions by several researchers. Three-Dimensional Cube Splitting EBCOT (3D CS-EBCOT) [14] partitions the wavelet-coefficient prism into fixed-size 64 × 64 × 64 code-blocks and applies a cube-splitting technique to each code-block. Xu et al. extended EBCOT
to Three-Dimensional Embedded Subband Coding with Optimized Truncation (3-D
ESCOT) [15] by treating each subband as a code-block. JPEG2000 Part 2 [27] also
provides a method to code multicomponent images. After a 3D discrete wavelet
transform (DWT), the JPEG2000 coder is applied on each transformed slice independently. All 3-D applications are potentially affected by the fact that Part 2 fails
to enable a number of source coding features in the cross-component direction [29].
Now JP3D, a new work item within the JPEG working group, is under development
to provide extensions of JPEG2000 for logically rectangular 3D data sets.
Although many techniques with 3D wavelet transform and 3D coding have
been proposed for compressing 3D datasets, most of them are unable to provide full
scalability or random access functionality. In this thesis, we address low-complexity
compression techniques which support full scalability and a degree of random access
into the multi-dimensional image data with a single codestream per data set.
In a transform coding system [2, 3, 4] as depicted in Figure 1.3, three-dimensional
transform, quantization, and adaptive coding methods based on three-dimensional
context modeling are all candidates for exploiting the relationships between slices.
Due to its superior performance over scalar quantization, vector quantization has
been applied in many wavelet-based coding algorithms.
[Figure: block diagram — encoder: Transform → Quantization → Entropy Encoder; decoder: Entropy Decoder → Inverse Quantization → Inverse Transform.]
Figure 1.3: Block diagram of general transform coding system.
In [53], subband image coding with VQ based on LBG [55] codebook generation is
proposed. Since the LBG training algorithm causes high computational cost and
coding complexity, especially as the vector dimension and bit rate increase, lattice
vector quantization is proposed to reduce the computational complexity [57].
Plain lattice vector quantization (LVQ) of wavelet coefficient vectors has been
successfully employed for image compression [61, 62, 63]. In order to improve performance, it is reasonable to consider combining LVQ with powerful wavelet-based
zerotree or set-partitioning image coding methods and bitplane-wise successive refinement methodologies for scalar sources, as in EZW, SPIHT and SPECK. In [64],
a multistage lattice vector quantization is used along with both zerotree structure
and quadtree structure that produced comparable results to JPEG 2000 at low bit
rates. VEZW [65] and VSPIHT [66, 67, 68] have successfully employed LVQ with
2D-EZW and 2D-SPIHT, respectively. In VSPECK [69], tree-structured vector
quantization (TSVQ) [70] and ECVQ [71] are used to code the significant coefficients
for 2D-SPECK.
Since VQ has the ability to exploit the statistical correlation between neighboring data in a straightforward manner, the second aspect of this thesis proposes
a coding method for wavelet coefficients of 3D image sources using vector quantization. The multistage LVQ is used to obtain the counterpart of bitplane-wise
successive refinement, where successive lattice codebooks in the shape of Voronoi
regions of multidimensional lattice are used.
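As a concrete illustration of what a lattice quantizer does (this code is not from the thesis; the function name is ours), the standard Conway-Sloane rounding procedure finds the nearest point of the D4 lattice, whose points are the integer 4-vectors with even coordinate sum:

```python
def nearest_D4(x):
    """Nearest point of D4 = {v in Z^4 : sum(v) even} to x, via the
    Conway-Sloane procedure: round to Z^4; if the coordinate sum is odd,
    re-round the coordinate with the largest rounding error."""
    g = [round(v) for v in x]
    if sum(g) % 2 != 0:
        # index of the coordinate that was farthest from an integer
        i = max(range(4), key=lambda k: abs(x[k] - g[k]))
        g[i] += 1 if x[i] > g[i] else -1
    return g
```

Because no codebook search is needed, quantization cost stays low even in four dimensions, which is the appeal of LVQ over trained codebooks.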
1.3 Outline of the Thesis
This chapter briefly introduces the motivation and desirable features of 3-D
image compression. It also includes related work and the outline of this thesis.
In Chapter 2, fundamentals of hierarchical wavelet-based image compression
schemes are introduced. Brief reviews of the discrete wavelet transform and bit-plane
coding are given first. The basic coding mechanisms are described by reviewing and
analyzing several representative hierarchical set partitioning algorithms, including
SPIHT, SPECK and SBHP. A brief description of the EBCOT algorithm is also
given in this chapter.
In Chapter 3, 3D-SBHP, a very fast, low-complexity volumetric image coding
algorithm supporting SNR scalability, resolution scalability, and random-access decoding, is presented.
Our main interest is scalable compression techniques which also support a degree
of random access into the volumetric data. Here, 3D-DWT is applied on an image
sequence to exploit the correlation along the spatial dimensions and axial dimension.
After the wavelet transform, the wavelet coefficient prism is split into fixed-size codeblocks and the 3-D SBHP algorithm is applied on each code-block independently.
The algorithm is based on set partitioning and bit-plane coding. The set partitioning technique can quickly zoom to the high energy areas. 3D-DWT and block-based
coding naturally support resolution scalable coding, while bit-plane coding enables
SNR scalability. Experiments show that our proposed algorithm provides comparable efficiency to other algorithms, while supporting all desirable features addressed
in Section 1.1.
Chapter 4 addresses the random access decoding method of 3D-SBHP. The code-block selection method is chosen so that the image sequence can be encoded only
once and then the decoder can directly extract a subset of the codestream to reconstruct a chosen Region of Interest (ROI) of required quality. In this chapter,
we investigate random accessibility and compression efficiency of highly scalable
volumetric compression from both the transform and coding perspective.
In Chapter 5, we extend the SPIHT coding algorithm with lattice vector
quantization to code hyperspectral images. In the proposed algorithm, multistage
lattice vector quantization (MLVQ) is used to exploit correlations between image
slices, while offering successive refinement with low coding complexity and computation. Different lattices, including the cubic Z4 and D4, are considered, and their performance is compared with other 2D and 3D wavelet-based image compression
algorithms.
In Chapter 6, the idea of the 3D-SBHP algorithm in Chapter 3 is extended
to the 4D case. Resolution scalability is empirically investigated, and the lossy-to-lossless compression performance is compared with other 3D and 4D volumetric
compression schemes.
In Chapter 7, the overall conclusions of this thesis and further work are
discussed.
CHAPTER 2
WAVELET AND SET PARTITION CODING
Recently, a number of hierarchical wavelet based image coding techniques have
emerged. All these techniques are based on the idea of set partitioning and exploiting the hierarchical subband pyramidal structure of the transformed images. In this
chapter, we briefly review the fundamentals of wavelet based hierarchical set partitioning image compression algorithms. The Discrete Wavelet Transform is briefly
introduced first and is followed by basic wavelet bitplane coding techniques. The
last section reviews several important hierarchical set partitioning image coding algorithms, including Set Partitioning In Hierarchical Trees (SPIHT), Set Partitioned
Embedded bloCK (SPECK), and Subband Block Hierarchical Partitioning (SBHP).
Embedded Block Coding with Optimized Truncation (EBCOT), which is the
basis of the JPEG2000 standard, is also described in this chapter.
2.1 Image Coding Background

2.1.1 Discrete Wavelet Transform
The wavelet transform represents its input in terms of functions that are localized
in both time and frequency. Mathematically, the wavelet transform approximates a function by representing it as a linear combination of two sets of functions:
Φ and Ψ. The set Φ is constructed from the scaling function, while Ψ is constructed
from the mother wavelet. Such superposition decomposes the function into different
scale levels, where each level is then further decomposed with a resolution matched
to the level. More detailed mathematical introductions to wavelets can be found in
[30, 31, 32].
The discrete wavelet transform (DWT) is applied to discretely sampled data
and is based on a low-pass filter and a high-pass filter. A filter is defined by a finite
set of filter coefficients. Table 2.1 gives the filter coefficients of the biorthogonal
Daubechies 9/7 wavelet [33]. Generally, the output of a filter can be computed by a
convolution of the filter coefficients with the input data followed by downsampling
as shown in Fig 2.1. A convolution of filter coefficients c_1, ..., c_m with input data
x[1], ..., x[n] producing output y[1], ..., y[n] is given as follows:

    y[i] = Σ_{j=1}^{m} c_j · x[i + j − ⌈m/2⌉].
As can be seen, the time required to compute the DWT is proportional to
the number of filter coefficients. Since downsampling removes half the outputs,
the computations for those outputs are not necessary. A very efficient method of
computing the DWT, known as the lifting scheme, was presented in [34, 35] to
reduce the number of arithmetic operations.
    k      Analysis                         Synthesis
           Low-pass     High-pass           Low-pass     High-pass
    0       0.602949     1.115087           1.115087     0.602949
    ±1      0.266864    -0.591271           0.591271    -0.266864
    ±2     -0.078223    -0.057543          -0.057543    -0.078223
    ±3     -0.016864     0.091271          -0.091271     0.016864
    ±4      0.026748                                     0.026748

Table 2.1: Filter coefficients for the Daubechies’ biorthogonal 9/7 filters
[Figure: two-channel filter bank — the input x[n] passes through analysis filters H̃ (low-pass) and G̃ (high-pass), each followed by downsampling by 2, producing subband outputs y1[n] and y2[n]; after the channel, each subband is upsampled by 2 and passed through synthesis filters H and G to reconstruct x̂[n].]
Figure 2.1: Two channel filter structure for subband coding.
A subband is the set of transform coefficient outputs when applying one filter to
the input data points. Thus, there are two subbands after the DWT is applied once.
One represents the low-pass filter output (L1 ), and the other represents the high-pass
filter output (H1 ). When applied to two-dimensional data such as images, the DWT
is applied horizontally to each row of the image and vertically on each column of the
wavelet coefficients calculated in the row transformations. Thus, the first level of
the transform consists of four subbands: a horizontal and vertical low-pass subband
(LL), a horizontal low-pass and vertical high-pass subband (LH), a horizontal high-pass and vertical low-pass subband (HL), and a horizontal and vertical high-pass
subband (HH). The LL subband represents a low-resolution overview of the image,
while the other subbands represent high-frequency detail information. In the LH,
HL, and HH subbands, most coefficients are close to zero, while those that are not
close to zero represent edges in the image. When dyadic wavelet decomposition is
applied to an image, each successive level of the transform operates only on the LL
subband data produced by the previous level. This structure efficiently represents
edges in the high-frequency subbands, as well as smooth regions in the low frequency
subbands. This method is also known as pyramidal wavelet decomposition and
octave subband wavelet decomposition. Fig 2.2 shows two levels of dyadic wavelet
decomposition applied to an image.
[Figure: the original image is split into LL1, HL1, LH1, HH1 after one level; a second level further splits LL1 into LL2, HL2, LH2, HH2.]
Figure 2.2: Illustration of a two-dimensional dyadic DWT decomposition
when two levels are performed.
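The row-then-column procedure can be sketched as follows. The thesis uses the 9/7 filters of Table 2.1; the sketch substitutes the simple unnormalized Haar pair purely to stay short, and all names are illustrative:

```python
def haar_step(v):
    """One 1D analysis step with the unnormalized Haar pair:
    low = pair averages, high = pair half-differences."""
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return low, high

def dwt2d(img):
    """One 2D level: filter rows, then columns, yielding LL, HL, LH, HH."""
    rows = [haar_step(r) for r in img]
    L = [r[0] for r in rows]             # horizontally low-pass half
    H = [r[1] for r in rows]             # horizontally high-pass half

    def column_step(mat):
        lo_cols, hi_cols = [], []
        for c in range(len(mat[0])):
            lo, hi = haar_step([mat[r][c] for r in range(len(mat))])
            lo_cols.append(lo)
            hi_cols.append(hi)
        # transpose the per-column results back to row-major order
        return ([list(t) for t in zip(*lo_cols)],
                [list(t) for t in zip(*hi_cols)])

    LL, LH = column_step(L)              # vertical low / high of the low half
    HL, HH = column_step(H)              # vertical low / high of the high half
    return LL, HL, LH, HH

LL, HL, LH, HH = dwt2d([[1.0, 2.0], [3.0, 4.0]])
```

Feeding LL back into `dwt2d` would produce the second dyadic level of Fig 2.2.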
2.1.2 Statistical Characteristics of Wavelet Transformed Images
Image compression algorithms work by exploiting the correlation between co-
efficients and then removing this correlation so that the image can be represented
in fewer bits.
EZW is a very successful compression algorithm based on two statistical characteristics of wavelet-transformed natural images. The first property is that many
wavelet coefficients are zero or close to zero beyond the LL subband and those coefficients generally obey the zerotree property: if a coefficient is found to be insignificant
in a given bit-plane, then all its descendants are also likely to be insignificant. Another property is that the magnitude of a child coefficient is usually less than that
of its parent. This property is used in deciding the testing order.
In SPECK and EBCOT, the main idea is to exploit the clustering of energy in
the hierarchical structure of transformed images: significant coefficients
are likely to cluster together, regardless of whether they share the same parent.
2.1.3 Bit-plane Coding
Bit-plane coding has been applied to many wavelet based image compression
algorithms, such as EZW, SPIHT, SPECK and EBCOT. If we write the absolute
value of the wavelet transformed coefficient c_i in binary format, |c_i| = Σ_n b_n·2^n,
where the bit-plane index n = 0, 1, ..., n_max and the bit b_n ∈ {0, 1}, then bit-plane
n consists of the nth least significant bit of each coefficient
magnitude. The bit-planes are coded in order so that no bit from bit-plane n is
coded before all bits from bit-plane n + 1 have been coded.
The bit-plane ordering represents successive refinement of a simple scalar quantization of the coefficients. With each additional bit-plane, the quantization bins
get smaller and the uncertain interval of |ci | is halved. Since the wavelet coefficient
can be positive or negative, the sign of each coefficient must be encoded. Typically,
the sign is encoded when a coefficient is identified to be significant.
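A toy sketch of this successive refinement (illustrative only, not the thesis codec): each additional decoded plane halves the uncertainty interval of every magnitude, and a mid-point reconstruction fills in the undecoded remainder:

```python
def encode_bitplanes(coeffs):
    """Emit the bit-planes of the coefficient magnitudes, from the most
    significant plane n_max down to plane 0."""
    n_max = max(abs(c) for c in coeffs).bit_length() - 1
    planes = [[(abs(c) >> n) & 1 for c in coeffs] for n in range(n_max, -1, -1)]
    return n_max, planes

def reconstruct(signs, n_max, planes, keep):
    """Decode only the first `keep` planes of each magnitude."""
    mags = [0] * len(planes[0])
    for p in range(keep):
        n = n_max - p
        for i, b in enumerate(planes[p]):
            mags[i] |= b << n
    if keep <= n_max:                  # some low bits are still uncertain
        half = 1 << (n_max - keep)     # mid-point of the uncertainty interval
        mags = [m + half if m else m for m in mags]
    return [s * m for s, m in zip(signs, mags)]

coeffs = [37, -5, 12, -20]
signs = [1 if c >= 0 else -1 for c in coeffs]
n_max, planes = encode_bitplanes(coeffs)
approx = reconstruct(signs, n_max, planes, keep=2)  # two most significant planes
```

With only two planes decoded, the largest coefficients are already roughly placed, which is exactly why bit-plane ordering yields a good embedded codestream.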
Said and Pearlman [19] show that greater magnitude coefficients in the wavelet
transform domain affect the quality of the image more than lesser magnitude coefficients. This suggests that the successive refinement strategy of bit-plane coding is
a very efficient way to generate an embedded codestream. Since the sign is very important once the coefficient becomes significant, sending the sign bit immediately after
a coefficient becomes significant also helps produce a good embedded codestream.
Although many image compression algorithms code the wavelet transformed
coefficients by bit-plane coding, they are different in the method of scanning and
compressing a bit-plane. In this chapter we will discuss some of these methods.
2.1.4 Bit-plane Coding Passes
Wavelet image coding methods typically code each bit-plane with several
passes. EZW, SPIHT and SPECK code each bit-plane with two passes – a significance pass and a refinement pass. The significance pass conveys significance and sign
information for coefficients that have not yet been found to be significant. The refinement pass sends one more bit for each coefficient that became significant in the
previous bit-plane. For each bit-plane, the two-pass scheme divides coefficients into
two sets: one containing coefficients that became significant in a previous bit-plane;
and one containing coefficients not yet identified to be significant. The refinement
bits have roughly an equal chance of being one or zero, while most significance pass
bits are expected to be zero because many wavelet coefficients are close to zero. Due
to this probability difference, the division of two sets leads to better rate-distortion
performance. JPEG2000 uses three passes per bit-plane instead of two. The third
pass, the cleanup pass, is used to convey significance and sign information for those coefficients that have not yet been found to be significant and are predicted to remain
insignificant during the processing of the current bit plane. Using several passes per
bit plane reduces the amount of data associated with each coding pass, facilitating
finer control over rate.
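The two-pass division can be sketched as follows (an illustration with hypothetical names; "already significant" means a 1 appeared in a plane above the current one):

```python
def plane_passes(coeffs, n):
    """Split bit-plane n into significance-pass and refinement-pass bits."""
    sig_pass, ref_pass = [], []
    for c in coeffs:
        # a coefficient is already significant if |c| reaches a higher plane
        already_significant = abs(c) >= (1 << (n + 1))
        bit = (abs(c) >> n) & 1
        (ref_pass if already_significant else sig_pass).append(bit)
    return sig_pass, ref_pass

sig_bits, ref_bits = plane_passes([37, -5, 12, -20], n=4)
```

The significance-pass bits are mostly zero and compress well; the refinement bits are nearly random, which is why coding them separately pays off.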
2.2 Set-Partitioning Image Coding
As mentioned before, in the significance pass, a significance map is used to
represent each bit-plane. The efficiency of representing this map determines the
compression performance. An efficient method of representing the significance map
is to group coefficients into sets. A set S is said to be significant with respect to
bitplane n if

    max_{(i,j)∈S} |c_{i,j}| ≥ 2^n,

and insignificant otherwise, where c_{i,j} is the wavelet transform coefficient at coordinate (i, j). The insignificant set can be represented with a single bit 0, while the
significant set is recursively partitioned and tested until the significant coefficient is
located. The objective of a set partitioning scheme is to create new partitions such
that subsets expected to be insignificant contain a large number of coefficients, while
subsets expected to be significant contain only one.
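A minimal sketch of the set significance test (all names illustrative):

```python
def significant(coeff_set, n):
    """A set is significant w.r.t. bit-plane n iff some |c| >= 2**n."""
    return max(abs(c) for c in coeff_set) >= (1 << n)

# An insignificant set of any size costs only a single 0 bit; a significant
# set would be partitioned further and retested.
sets = {"A": [3, 1, 0, 2], "B": [0, 45, 7, 2]}
sig_map = {name: int(significant(s, 5)) for name, s in sets.items()}
```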
There are typically two types of sets:
• Interband Sets: contain coefficients from a number of different subbands. The
best known and widely implemented interband set is the zerotree set in EZW
and Spatial Orientation Tree in SPIHT.
• Intraband Sets: only contain coefficients that lie wholly within a subband.
An example of the intraband coding scheme is quadtree partitioning, which
quickly zooms to areas of high energy while maintaining large sets of low
energy coefficients.
In this section, set partitioning image compression algorithms that serve as a
benchmark are discussed in detail.
2.2.1 SPIHT
Said and Pearlman’s SPIHT [19] belongs to a class of embedded, tree-structured
significance mapping schemes. Compared to EZW, more (wide-sense) zerotrees are
efficiently found and represented in SPIHT by separating the tree root from the
tree, i.e., a zerotree whose root coefficient is significant. According to the study in [20], SPIHT
is a degree-2 zerotree coder and EZW a degree-0 zerotree coder. This results in
better performance and faster speed than EZW.
2.2.1.1 Spatial Orientation Trees
Spatial orientation trees, or Zerotrees, are based upon the hypothesis that
if a wavelet coefficient at a coarse scale is insignificant with respect to a given
threshold T, then all wavelet coefficients of the same orientation in the same spatial
location at finer scales are likely to be insignificant with respect to T [18]. In the
hierarchical subband system, every coefficient at a given scale can be related to a set
of coefficients at the next finer scale of the same orientation. The spatial orientation
tree is constructed based on the hypothesis and this parent-child relationship across
levels of the decomposition. The trees are partitioned into four types of sets:
• H: roots of all spatial orientation trees. They are grouped into 2 × 2 blocks
whereby the upper left coefficient has no offspring.
• O(i, j): the offspring set contains the direct offspring of the node at coordinates
(i, j), that is the four coefficients at the same spatial location in the next level
of the pyramid. Except at the highest and lowest pyramid levels, the offspring
set is defined as:
    O(i, j) = {(2i, 2j), (2i, 2j + 1), (2i + 1, 2j), (2i + 1, 2j + 1)}.    (2.1)
• D(i, j): set of all descendants of the coefficient (i, j).
• L(i, j): D(i, j) − O(i, j).
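Assuming a square transform and ignoring the special 2 × 2 grouping of the roots, the offspring and descendant sets can be sketched as:

```python
def offspring(i, j):
    """Direct offspring O(i, j) of node (i, j), per Eq. (2.1)."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

def descendants(i, j, size):
    """All descendants D(i, j) that fall inside a size x size transform."""
    out, frontier = [], offspring(i, j)
    while frontier and frontier[0][0] < size:   # each level shares one scale
        out.extend(frontier)
        frontier = [q for node in frontier for q in offspring(*node)]
    return out

D = descendants(1, 1, 8)                        # 4 offspring + 16 grandchildren
L_set = set(D) - set(offspring(1, 1))           # L(i, j) = D(i, j) - O(i, j)
```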
The parent-child relationships and four types of set definitions of SPIHT in
the 2D spatial orientation tree are shown in Fig 2.3. Here we say that a LIS entry
is of type A if it represents D(i, j), and of type B if it represents L(i, j).
[Figure: three-level subband pyramid (LL3, HL3, LH3, HH3, ..., HL1, HH1) showing the parent-child links, a set root (i, j) with its offspring set O(i, j), the descendant set D(i, j) (type A set), and the set L(i, j) (type B set).]
Figure 2.3: Parent-child relationship in SPIHT.
2.2.1.2 Coding Algorithm
The SPIHT algorithm maintains three lists to store significance information.
Significant coefficients are stored in the List of Significant Pixels (LSP), while insignificant coefficients are stored in the List of Insignificant Pixels (LIP). Coefficient
c(i, j) which represents insignificant set D(i, j) or L(i, j) is stored in List of Insignificant Sets (LIS) of type A or LIS of type B, respectively.
The algorithm consists of four stages: initialization, sorting pass, refinement
pass and quantization step update. The last three of the four stages are repeated
for each bit-plane.
First, the algorithm is initialized by adding all coefficients in the lowest subband to the LIP, and all those with offspring to the LIS as type A. The sorting pass
begins by testing each entry in the LIP for significance with respect to the current
threshold and coding the result. For each type A entry in the LIS the descendant
set is tested. If it is significant, the set is partitioned into a type B set and four
offspring. The reason is that if a set is significant, it is likely that the descendant
which is significant will turn out to be its offspring. After the decomposition, the
decomposed coefficients and sets are further tested for significance. This process is
repeated until all the significant coefficients of that root are located. During the refinement pass, for each entry in the LSP, except those added in the last sorting pass,
the nth MSB of each coefficient magnitude is output. In this manner, a bitplane transmission
scheme is achieved. For the quantization step, n is decremented by 1 to process the
next lower bit plane. This process will be repeated until either the desired rate is
reached or all coefficients have been transmitted. As a result, SPIHT generates a
progressive codestream.
The detailed SPIHT algorithm is presented below.
Terminology:
Sn(τ) ≡ significance of set τ with respect to bit-plane n:

    Sn(τ) = 1 if 2^n ≤ max_{(i,j)∈τ} |c_{i,j}| < 2^{n+1}, and 0 otherwise.    (2.2)

SPIHT Algorithm
1. Initialization
• output n = ⌊log₂(max_{∀(i,j)} |c_{i,j}|)⌋
• set LSP = ∅
• set LIP = {(i, j) : (i, j) ∈ H}
• set LIS = {(i, j) ∈ H : D(i, j) ≠ ∅}, all entries initially of type A
2. Sorting Pass
(a) for each (i, j) ∈ LIP,
• output Sn (i, j)
• if Sn (i, j) = 1, move (i, j) to LSP and output the sign of ci,j
(b) for each (i, j) ∈ LIS,
i. if (i, j) ∈ LIS (type A),
• output Sn (D(i, j))
• if Sn (D(i, j)) = 1,
– for each (k, l) ∈ O(i, j),
∗ output Sn (k, l)
∗ if Sn (k, l) = 1, add (k, l) to LSP and output sign of ck,l
∗ if Sn (k, l) = 0, add (k, l) to LIP
– if L(i, j) = ∅, remove (i, j) from LIS (type A) and skip step ii;
else change (i, j) to type B
ii. if (i, j) ∈ LIS (type B),
• output Sn (L(i, j))
• if Sn (L(i, j)) = 1,
– add each (k, l) ∈ O(i, j) to LIS (type A)
– remove (i, j) from LIS (type B)
3. Refinement Pass
(a) for each (i, j) ∈ LSP, except those included in the last sorting pass, output
the nth MSB of |ci,j |
4. Quantization Step
(a) decrement n by 1
(b) go to step 2
2.2.2 SPECK
The SPECK algorithm [17] improves upon SPIHT [19], SWEET [37] and AGP
[38] by producing a fully embedded bit-stream which employs progressive transmission by coding bitplanes in decreasing order. The SPECK coding scheme provides
excellent results, comparable to popular image coding schemes such as SPIHT.
SPECK is different from SPIHT in that it does not use spatial orientation trees,
rather, like SWEET and AGP, it makes use of sets of the form of blocks of contiguous coefficients within subbands. SPECK incorporates the octave band partitioning
of SWEET to exploit the hierarchical structure of the wavelet transform. It makes
use of the quadtree splitting scheme from AGP to quickly zoom to areas of high
energy while maintaining areas of low activity in relatively large sets.
The SPECK algorithm makes use of rectangular regions of the image, referred to
as sets of type S. The dimension of a set S depends on the dimension of the original
image and the subband level of the pyramidal structure at which the set lies. To
test the significance of the set S, SPECK follows the same terminology used in the
SPIHT algorithm.
Two linked lists are maintained in the SPECK algorithm: the List of Insignificant Sets
(LIS) and the List of Significant Pixels (LSP). The former contains sets of type S of
varying sizes which have not yet been found significant, while the latter contains
coefficients which have been found significant.
Like SPIHT, SPECK also consists of four steps: the initialization step; the
sorting pass; the refinement pass; and the quantization step. The algorithm starts
by partitioning the transformed image into two sets: set S which is the root of the
pyramid, and set I which is everything that is left of the image after taking out
the root, as shown in Fig. 2.4. To start the algorithm, set S is added to the LIS.
The sorting pass examines the significance of the LIS and set I. If a set S in LIS
is insignificant, it stays in the LIS. Otherwise, quadtree partitioning will be applied
to S. The significant set S is partitioned into four equal subsets and retested. In
this manner, the quadtree procedure recursively divides the set S into homogeneous
rectangular regions until all significant coefficients are located. The partitioning
process is demonstrated in Fig. 2.5 where quadtree partitioning is used to locate
two significant coefficients. The motivation for quadtree partitioning of such sets is
to zoom in quickly to areas of high energy in the set S and code them first.
[Figure: the root set S occupies the top-left corner of the transformed image; set I is the remainder.]
Figure 2.4: Partitioning of wavelet transformed image into sets S and I.
[Figure: a significant set S (S = 1) is split into quadrants with S0 = 1, S1 = 0, S2 = 0, S3 = 1; the significant quadrants are split again until the two significant coefficients are isolated.]
Figure 2.5: Quadtree partitioning of set S.
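The recursive quadtree significance coding of a set S (the CodeS idea) can be sketched as follows; this toy encoder emits one significance bit per tested set and recurses only into significant sets, with all names illustrative:

```python
def quad_split(x0, y0, w, h):
    """Split a w x h region at (x0, y0) into (up to) four near-equal quadrants."""
    hw, hh = (w + 1) // 2, (h + 1) // 2
    cand = [(x0, y0, hw, hh), (x0 + hw, y0, w - hw, hh),
            (x0, y0 + hh, hw, h - hh), (x0 + hw, y0 + hh, w - hw, h - hh)]
    return [(x, y, qw, qh) for (x, y, qw, qh) in cand if qw > 0 and qh > 0]

def code_set(img, region, n, bits):
    """Emit 1 and recurse if the region holds a coefficient >= 2**n, else emit 0."""
    x0, y0, w, h = region
    sig = any(abs(img[y][x]) >= (1 << n)
              for y in range(y0, y0 + h) for x in range(x0, x0 + w))
    bits.append(1 if sig else 0)
    if sig and (w > 1 or h > 1):
        for quad in quad_split(x0, y0, w, h):
            code_set(img, quad, n, bits)

img = [[0, 0, 0, 0],
       [0, 9, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 12]]
bits = []
code_set(img, (0, 0, 4, 4), n=3, bits=bits)   # bit-plane 3, threshold 8
```

The two insignificant quadrants each cost a single bit, while the recursion zooms straight to the two significant coefficients.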
For each bit plane, after testing all sets of type S, the set I is tested next.
If I is significant, it is partitioned by another partition scheme - the octave band
partitioning. Fig. 2.6 gives an illustration of this partitioning scheme. Set I is
partitioned into four sets - three type S sets and one type I set. These new sets are
recursively tested for significance. The octave partitioning scheme is used to exploit
this hierarchical structure of the subband decomposition, where the energy is more
likely concentrated at the top most levels of the pyramid and as one goes down the
pyramid, the energy content decreases gradually.
Once one sorting pass has occurred, sets of type S of varying sizes are added to
LIS. During the next lower bit plane, these sets are processed in increasing order of
[Figure: set I is split into three S sets (S1, S2, S3) bordering the previous pyramid level and a smaller remaining set I.]
Figure 2.6: Octave partitioning of set I.
their size. SPECK uses an array of lists. Each list corresponds to a level of pyramid
and stores sets of fixed size. Processing the lists in an order that corresponds to
increasing size of sets completely eliminates the need for any sorting mechanism,
which would significantly slow down the coding speed.
Once all the sets have been processed for a bit plane, the refinement pass is
initiated for that bit plane. This procedure is the same as that of SPIHT.
The pseudo code of SPECK is given below.
1. Initialization
• Partition image transform X into two sets: S ≡ root, and I ≡ X − S
• output n = ⌊log₂(max_{(i,j)∈X} |c_{i,j}|)⌋
• add S to LIS and set LSP = ∅
2. Sorting Pass
• in increasing order of size of sets
– for each set S ∈ LIS,
∗ ProcessS(S)
• ProcessI()
3. Refinement Pass
• for each (i, j) ∈ LSP, except those included in the last sorting pass, output
the nth MSB of |ci,j |
4. Quantization Step
• decrement n by 1, and go to step 2
ProcessS(S)
{
• output Sn (S)
• if Sn (S) = 1
– if S is a coefficient, output sign of S and add S to LSP
– else CodeS(S)
– if S ∈ LIS, remove S from LIS
• else
– if S ∉ LIS, add S to LIS
}
CodeS(S)
{
• partition S into four equal subsets O(S)
• for each O(S)
– output Sn (O(S))
– if Sn (O(S)) = 1
∗ if O(S) is a coefficient, output sign of O(S) and add O(S) to LSP
∗ else CODES(O(S))
– else
∗ add O(S) to LIS
}
ProcessI()
{
• output Sn (I)
• if Sn (I) =1
– CodeI()
}
CodeI()
{
• partition I into four sets - three S and one I
• for each of the three sets S
– ProcessS(S)
• ProcessI()
}
2.2.3 SBHP
Subband Block Hierarchical Partitioning (SBHP) [16], a SPECK variant, was
originally proposed as a low complexity alternative to JPEG2000. SBHP has been
incorporated into the JPEG2000 Verification Model (VM) 4.2, where a command
line switch initiates the SBHP coding engine in place of EBCOT [24] in coding the
codeblocks of the subbands. Every single feature and mode of operation supported
by the VM continues to be available with SBHP. Like EBCOT, SBHP is applied to
blocks of wavelet coefficients extracted from inside subbands. Except for the fact
that it does not use the arithmetic encoder, it does not require any change in any
of the VM functions outside of the entropy coding.
SBHP uses SPECK’s octave-band partitioning scheme on codeblocks and encodes the S sets with the quadtree splitting CodeS(S) procedure of SPECK as
described in the last section. Minor differences are that SBHP maintains three lists:
List of Insignificant Sets (LIS), List of Insignificant Pixels (LIP) and List of Significant Pixels (LSP). SBHP uses a separate LIP for insignificant isolated pixels.
The LIP is visited first, and then the LIS in order of increasing set size. Therefore,
the two lists LIP and LIS are functionally equivalent to the one LIS list in SPECK.
Figure 2.7 shows the sequential process of partitioning a 16 × 16 codeblock.
In SBHP, the partitioning of the codeblock mimics the octave band partitioning in
SPECK by starting with a 2 × 2 block S at the upper left, with the rest of the block,
the I set, as shown in figure 2.7(a). The coding proceeds in the codeblock just as
it does for the full-transform SPECK described in the last section until the block’s
target file size is reached. In this example, the first set can be decomposed into 4
individual pixels and the second set can be decomposed into three 2 × 2 blocks and
the remaining pixels. Figure 2.7(b) shows the next level of decomposition: each 2 × 2
set can be decomposed into 4 pixels, and the remaining set can be partitioned into
groups of 4 × 4, plus the remaining pixels. In the next stage, each 4 × 4 set is split
into four 2 × 2 sets, and the remaining set is partitioned into three 8 × 8 sets. As shown in
figure 2.7(c), at this moment there is no set of remaining pixels. Figure 2.7(d) shows
how the process continues, until all sets are partitioned to individual pixels. The
procedure is repeated on the next codeblock until all codeblocks in each subband
are coded. The subbands are visited in order from lowest to highest frequency in the
same order dictated by the octave band partitioning in the full-transform SPECK.
The pseudo code of the SBHP algorithm is given below.
For l = 1, 2, ..., L
1. Initialization
• Partition code-block X into two sets: S ≡ the top-left 2 × 2 block, and I ≡ X − S
• output n = ⌊log₂(max_{(i,j)∈X} |c_{i,j}|)⌋
• add S to LIS, set LIP = all coefficients in the code-block, and set LSP = ∅
2. Sorting Pass
• for each ci,j ∈ LIP output Sn (ci,j )
– if Sn (ci,j ) = 1
[Figure 2.7: Set partitioning rules used by SBHP.]
∗ output sign of ci,j and move ci,j from LIP to LSP
• in increasing order of size of sets
– for each set S ∈ LIS,
∗ ProcessS(S)
• ProcessI()
3. Refinement Pass
• for each (i, j) ∈ LSP, except those included in the last sorting pass, output
the nth MSB of |ci,j |
4. Quantization Step
• decrement n by 1, and go to step 2
ProcessS(S)
{
• output Sn (S)
• if Sn (S) = 1
– CodeS(S)
– if S ∈ LIS, remove S from LIS
• else
– if S ∉ LIS, add S to LIS
}
CodeS(S)
{
• partition S into four equal subsets O(S)
• for each O(S)
– output Sn (O(S))
– if Sn (O(S)) = 1
∗ CodeS(O(S))
– else
∗ add O(S) to LIS
}
ProcessI()
{
• output Sn (I)
• if Sn (I) =1
– CodeI()
}
CodeI()
{
• partition I into four sets - three S and one I
• for each of the new generated sets S
– ProcessS(S)
• ProcessI()
}
SBHP uses a simple fixed Huffman code of 15 symbols for encoding the significance map bits generated by the SPECK algorithm. No entropy coding is
used for the sign and the refinement bits. This results in some compression loss, but
these bits are very difficult to compress efficiently, and nothing
is simpler than just moving those "raw" bits to the compressed stream. Although
EBCOT outperforms SBHP in terms of PSNR, tests showed that SBHP was about
4 times faster than the JPEG2000 VM 4.2 in encoding and about 6 to 8 times faster
in decoding for the embedded version and as much as 11 times faster for the nonembedded version, in which case the complexity of SBHP becomes close to baseline
JPEG. For natural images, such as photographic and medical images, the reductions
in PSNR from VM 4.2 are in the range of 0.4-0.5 dB. SBHP showed losses in bit
rate at the same PSNR level from 5-10% for lossy compression and only 1-2% for
lossless compression [39].
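The count of 15 presumably comes from the 2^4 − 1 nonzero patterns of the four quadrant-significance bits: the all-zero pattern cannot occur for a set that is split, since it was split only because it tested significant. The sketch below merely enumerates these symbols; it is not SBHP's actual Huffman table:

```python
from itertools import product

# Quadrant-significance patterns of a split set: at least one of the four
# bits must be 1, because a set is only split after testing significant.
symbols = [bits for bits in product([0, 1], repeat=4) if any(bits)]

def quad_symbol(bits4):
    """Map a 4-bit significance pattern to a symbol index in 0..14 (hypothetical)."""
    assert any(bits4), "a split set must contain a significant quadrant"
    return (bits4[0] << 3 | bits4[1] << 2 | bits4[2] << 1 | bits4[3]) - 1
```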
2.2.4 EBCOT
Taubman’s embedded block coding with optimized truncation (EBCOT) [24]
is significant because it offers the capabilities of embeddedness, resolution scalability,
and spatial accessibility.
In EBCOT, compression to a specific rate is achieved as a two-tier process. The
image is first compressed without considering the target bit rate; the second
tier then postprocesses the bitstream to produce a rate-distortion optimized bitstream
for the specific rate.
EBCOT divides each subband into relatively small codeblocks (typically 32 ×
32 or 64 × 64), and codes each codeblock independently to generate an embedded
bitstream for that block. Truncation points are marked in the embedded bitstream
for each codeblock, with each truncation corresponding to a certain quality metric.
The tier two algorithm selects various truncation points from each codeblock to
construct the optimal embedded bitstream for a given bit rate. Since the codeblocks
are relatively small, and the multi-resolution wavelet transform method is used,
this coder is also spatially accessible and resolution scalable. The downside of this
scheme is that for each desired embedded rate, truncation points have to be marked
in each codeblock. To approximate a true embedded scheme, EBCOT has to select
a large number of truncation points for each codeblock. The overhead associated
with location and quality metric information of each truncation point has a negative
effect on performance.
In EBCOT, codeblocks are encoded in bit-plane order. Each codeblock is recursively partitioned into sub-blocks, typically down to 16 × 16 dimensions. A quadtree structure over the sub-blocks codes the significance of each sub-block explicitly prior to sample-by-sample coding within the significant sub-blocks. Each significant sub-block is encoded with a specialized sub-block coding method: its coefficients are coded by an arithmetic coder with 18 different contexts, where a coefficient's context is determined by its own significance status and that of its eight adjacent neighbors. Five additional contexts are used for encoding the sign bits of coefficients that have just become significant. Fractional bit-plane coding is achieved using four passes per bit-plane. Over a wide variety of images and bit rates, the PSNR improvement over SPIHT is about 0.4 dB on average.
The EBCOT algorithm is the basis for the JPEG2000 standard with some
modifications introduced to the entropy coding part of EBCOT. Most of the changes
are described in [40, 24]. Here we give a brief summary of these changes.
• To reduce the model adaptation cost in typical images, some of the contexts
are initialized in the assumed highly skewed state instead of the traditional
equi-probable state.
28
• A low-complexity but less-effective coder known as the MQ [41] coder is used
instead of the traditional arithmetic coder.
• Only three coding passes are used in the fractional bitplane scheme.
• There is no quadtree partitioning of code-blocks. Each bit-plane of a code-block is scanned in a particular order, as shown in Figure 2.8.
Figure 2.8: Example of JPEG2000 code-block scan pattern
The cumulative effect of these modifications is a 40% improvement in execution speed for the entropy coding part, with an average loss of about 0.15 dB [24].
2.3 Conclusion
In the previous sections, we discussed the desirable features of image compression and the most popular recently proposed wavelet-based image coding algorithms. Table 2.2 gives an approximate comparison of these algorithms as a general guide for choosing an appropriate coder for different applications. For example, when low complexity and good PSNR performance are a must, we can choose algorithms like SPIHT or SPECK. If scalability and accessibility are preferred, SBHP and JPEG2000 are good candidates.
Coder           PSNR         SNR          Resolution   Random         Complexity
                Performance  Scalability  Scalability  Accessibility
Original SPIHT  Very Good    Yes          No           No             Low
SPECK           Very Good    Yes          Yes          No             Low
SBHP            Good         Yes          Yes          Yes            Very Low
EBCOT/JPEG2K    Excellent    Yes          Yes          Yes            High

Table 2.2: Comparison of wavelet-based image coders
CHAPTER 3
LOW-COMPLEXITY 3-D IMAGE CODER: 3D-SBHP
In this chapter, we present a low-complexity three-dimensional image compression algorithm that supports Signal-to-Noise Ratio (SNR) scalability, resolution scalability, and Region-Of-Interest (ROI) decoding. We demonstrate progressive lossy-to-lossless compression of volumetric images using a three-dimensional integer wavelet transform and subband block hierarchical partitioning (SBHP). The coding efficiency comes from exploiting the dependencies in all three dimensions. The hierarchical coding and block-based structure enables spatial accessibility and a resolution-scalable representation of the wavelet transform coefficients.
3.1 Introduction
Nowadays, many medical data acquisition devices and multispectral imaging techniques produce three-dimensional image data. The increasing use of three-dimensional imaging modalities triggers the need for efficient techniques to transport and store the related volumetric data. To make interactive viewing more efficient, we need a compression scheme that is inherently scalable and that supports a high degree of random accessibility and fast encoding/decoding.

To store and transmit such three-dimensional data, the image volume is typically treated as a stack of slices. In the previous chapter, we introduced several benchmark wavelet-based embedded image compression algorithms. It is always possible to compress the slices independently and use multiplexing mechanisms to select from each slice the correct bitstream to support the required Quality-of-Service for the whole volumetric image; we shall use such a scheme as our reference point.
Since neighboring slices have high spatial correlation, it is natural to try to improve compression efficiency by exploiting this property. To provide scalability and compression efficiency, many wavelet-based coders have been extended to three dimensions, such as Three-Dimensional Context-Based Embedded Zerotree of Wavelet coefficients (3D-CB-EZW) [7], Three-Dimensional Set Partitioning In Hierarchical Trees (3D-SPIHT) [9, 11, 10], Stripe-based SPIHT [12], Three-Dimensional Set Partitioned Embedded bloCK (3D-SPECK) [21], Three-Dimensional Cube Splitting EBCOT (3D CS-EBCOT) [14], Three-Dimensional SPIHT with Random Access and Resolution Scalability (RARS 3D-SPIHT) [13], and the Annex of Part II of the JPEG2000 standard [27] for multi-component imagery compression. Although all of these algorithms support SNR scalability, only the EBCOT-based 3D CS-EBCOT, JPEG2000 multi-component, and RARS 3D-SPIHT can support SNR/resolution scalability and random accessibility simultaneously. However, 3D CS-EBCOT inherits high complexity from EBCOT, RARS 3D-SPIHT has a relatively high memory requirement, and JPEG2000 multi-component fails to enable a number of source coding features in the cross-component direction.
With interactive 3D image viewing in mind as the primary application, we can see that, among the algorithms shown in Table 2.2, SBHP is a very good candidate for volumetric images. Recently, JP3D, Part 10 of JPEG2000 [28], which will provide extensions of JPEG2000 for logically rectangular 3D data sets with no time component, has been issued, but no workable application is available. Our work, presenting a low-complexity 3D image coder for interactive applications, is motivated in part by JP3D. It not only emphasizes desired features, such as scalability and the ability to access regions of interest within volumetric images, but also provides an application that allows efficient network access to compressed volumetric image data and their metadata in a way that exploits these features.
In this chapter, we first briefly introduce the 3D wavelet/subband transform structure used in 3D-SBHP and describe the lifting scheme and integer-to-integer transforms. In addition to the integer transform, we also describe the proper scaling of the coefficients and a wavelet transform structure that together result in an approximately unitary 3D transform. In Section 3.3, we present the details of the 3D-SBHP algorithm. Simulation results, comparisons, and analysis are given in Section 3.4. Finally, we summarize and conclude this chapter in Section 3.5.
3.2 Three-Dimensional Integer Wavelet Transform
The proposed volumetric coding system consists of a 3D wavelet/subband transform part and a coding part with the 3D-SBHP kernel. Table 3.1 shows the average standard deviation (STD) of four medical and hyperspectral image sequences given in Table 6.2 along the X, Y, and axial Z directions, respectively. As an example, the average STD along the axial direction Z is calculated by

Average_{x,y}( std_z( GOF(x, y, z) ) )

where Average_{x,y} denotes the mean with respect to x and y, and std_z is the standard deviation with respect to z,

std_z = sqrt( (1/N) Σ_{i=1}^{N} (z_i − z̄)² ).

The result shows that the STDs along the X and Y directions are close, while the STD along Z is much smaller. It is therefore reasonable to apply the wavelet transform along the Z direction in a different way from that along the X and Y directions. In our 3D wavelet transform scheme, the 2D spatial transform and the 1D axial transform (across image slices) are done separately, by first performing a 2D dyadic wavelet decomposition on each image slice and then performing a 1D wavelet packet decomposition across the resulting image slices. A heterogeneous selection of filter types and a different number of decomposition levels for each spatial direction (x, y, or z) are supported by this separable wavelet decomposition module. This allows adapting the size of the wavelet pyramid in each direction when the spatial resolution is limited. Figure 3.1 shows an example of the wavelet decomposition structure of a packet non-symmetric transform with 3 levels of 2D spatial transform followed by 2 levels of 1D axial transform.
STD              X         Y          Z
CT skull         39.927    48.624     5.061
MR Liver t1      28.157    42.851     9.173
moffett scene 1  654.062   702.028    578.474
moffett scene 2  907.438   1208.555   370.583

Table 3.1: Average standard deviation of volumetric image sequences along X, Y, and Z directions.
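The statistic in Table 3.1 can be illustrated with a small sketch. The toy volume below is a made-up example (not one of the test sequences): one noisy slice repeated along z with a tiny drift, mimicking the strong slice-to-slice correlation of CT/MR data.

```python
from statistics import mean, pstdev

# Hypothetical 4x4x4 volume indexed [z][y][x].
base = [[17, 201, 96, 55],
        [250, 3, 180, 77],
        [9, 144, 66, 230],
        [120, 31, 199, 8]]
vol = [[[base[y][x] + 0.1 * z for x in range(4)] for y in range(4)]
       for z in range(4)]

def avg_std_z(v):
    # Average_{x,y}( std_z( v ) ), as defined above
    return mean(pstdev(v[z][y][x] for z in range(4))
                for y in range(4) for x in range(4))

def avg_std_x(v):
    # Average_{z,y}( std_x( v ) ), for comparison
    return mean(pstdev(v[z][y][x] for x in range(4))
                for z in range(4) for y in range(4))
```

On real volumes, this gap between the axial and spatial STDs (e.g., 5.061 vs. 39.927 for CT skull in Table 3.1) is what motivates treating the Z direction differently.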
Figure 3.1: Wavelet decomposition structure with 3 levels of 2D spatial transform followed by 2 levels of 1D axial transform.

Because the number of slices in a typical volumetric data set can be quite large, it is impractical to buffer all slices for the axial transform. In our scheme, the image slices are collected into Groups Of Slices (GOS) of F consecutive slices. Each GOS is independently transformed and coded. This also makes random access to a selected slice easier.
3.2.1 Lifting Scheme
The traditional Mallat algorithm [42] for the wavelet transform involves recursively convolving the signal with two decomposition filters h and g and decimating the result to obtain the wavelet coefficients at every decomposition level. However, integer wavelet transforms are not easy to construct by this traditional method. W. Sweldens [35] introduced the lifting scheme, which allows the discrete wavelet transform to be computed with reduced computational complexity and with support for lossless transforms [43].

The idea of the lifting scheme is to divide the wavelet transform into a split and a set of lifting steps. Sweldens et al. [44] proved the theorem: Every wavelet or subband transform with finite filters can be obtained as the Lazy wavelet followed by a finite number of primal and dual lifting steps and a scaling. The lifting scheme
is shown in Figure 3.2 and Figure 3.3. The dual polyphase matrix and the polyphase matrix are given by [44] (2 × 2 matrices written as [row1; row2]):

P̃(z) = ∏_{i=1}^{m} [1, 0; −s_i(z^{−1}), 1] [1, −t_i(z^{−1}); 0, 1] [1/K, 0; 0, K]    (3.1)

P(z) = ∏_{i=1}^{m} [1, s_i(z); 0, 1] [1, 0; t_i(z), 1] [K, 0; 0, 1/K]    (3.2)

Figure 3.2: The forward wavelet transform using lifting: First the Lazy wavelet (subsample into even and odd), then alternating lifting and dual lifting steps, and finally a scaling.

Figure 3.3: The inverse wavelet transform using lifting: First a scaling, then alternating dual lifting and lifting steps, and finally the inverse Lazy transform.
Since every wavelet transform can be written in lifting steps, it follows that we can build an integer version of every wavelet transform: in each lifting step, one simply rounds off the output of the filter right before the addition or subtraction. This yields an integer-to-integer transform.

We present several lossless integer lifting filters of the form (N, Ñ), where N is the number of vanishing moments of the analyzing high-pass filter and Ñ is the number of vanishing moments of the synthesizing high-pass filter (vanishing moments correspond to the multiplicity of zero as a root of the spectrum of the filter). The integer (2,2) filter given below is the 5/3 filter used in JPEG2000.
Table 3.2 gives the name and number of filter taps of the given filters [36].
• S+P (B) transform [45]

  h[n] = x[2n+1] − x[2n]
  l[n] = x[2n] + ⌊h[n]/2⌋
  hd[n] = h[n] − ⌊ Σ_{i=−1}^{1} α_i ( l[n+i−1] − l[n+i] ) − β_1 h[n+1] + 1/2 ⌋    (3.3)

  where α_{−1} = −1/16, α_0 = 4/16, α_1 = 8/16, β_1 = 6/16.

• (2,2) transform

  h[n] = x[2n+1] − ⌊ (x[2n] + x[2n+2])/2 + 1/2 ⌋
  l[n] = x[2n] + ⌊ (h[n−1] + h[n])/4 + 1/2 ⌋    (3.4)

• (2,4) transform

  h[n] = x[2n+1] − ⌊ (x[2n] + x[2n+2])/2 + 1/2 ⌋
  l[n] = x[2n] + ⌊ (19/64)(h[n−1] + h[n]) − (3/64)(h[n−2] + h[n+1]) + 1/2 ⌋    (3.5)

• (2+2,2) transform

  h[n] = x[2n+1] − ⌊ (x[2n] + x[2n+2])/2 + 1/2 ⌋
  l[n] = x[2n] + ⌊ (h[n−1] + h[n])/4 + 1/2 ⌋
  hd[n] = h[n] − ⌊ (−l[n−1] + l[n] + l[n+1] − l[n+2])/16 + 1/2 ⌋    (3.6)

Filter  Number of          Number of Low-pass  Number of High-pass
Name    Vanishing Moments  Filter Taps         Filter Taps
S+P     (2,4)              2                   6
5/3     (2,2)              5                   3
9/3     (2,4)              9                   3
5/11    (2+2,2)            5                   11

Table 3.2: Lossless integer filters.
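As a concrete illustration, here is a minimal runnable sketch of the integer (2,2) [5/3] transform of Eq. (3.4), one decomposition level on an even-length signal. The boundary handling (clamping indices to the nearest valid sample) is our own choice; the thesis does not specify the extension used.

```python
def fwd22(x):
    """One level of the integer (2,2) lifting transform, cf. Eq. (3.4).
    For integers, floor((a+b)/2 + 1/2) = (a+b+1)//2 and
    floor((c+d)/4 + 1/2) = (c+d+2)//4, so only shifts/adds are needed."""
    n = len(x) // 2
    e = lambda i: x[2 * min(max(i, 0), n - 1)]             # clamped even sample
    h = [x[2*i + 1] - (e(i) + e(i + 1) + 1) // 2 for i in range(n)]   # predict
    hh = lambda i: h[min(max(i, 0), n - 1)]                # clamped detail
    l = [x[2*i] + (hh(i - 1) + hh(i) + 2) // 4 for i in range(n)]     # update
    return l, h

def inv22(l, h):
    """Inverse: undo the update, then undo the predict - exactly lossless."""
    n = len(l)
    hh = lambda i: h[min(max(i, 0), n - 1)]
    even = [l[i] - (hh(i - 1) + hh(i) + 2) // 4 for i in range(n)]
    e = lambda i: even[min(max(i, 0), n - 1)]
    x = []
    for i in range(n):
        x.append(even[i])
        x.append(h[i] + (e(i) + e(i + 1) + 1) // 2)
    return x
```

Because the decoder recomputes exactly the same rounded predictions from the same integers, the round-trip is lossless for any integer input, which is the property exploited for progressive lossy-to-lossless coding.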
From a computational standpoint, the lifting scheme has shown numerous benefits over the traditional DWT. Daubechies and Sweldens [44] give the theorem: Asymptotically, for long filters, the cost of the lifting algorithm for computing the wavelet transform is one half of the cost of the standard algorithm. For an (N, Ñ) wavelet, the cost of the standard algorithm is 3(N + Ñ) − 2, while the cost of the lifting algorithm is (3/2)(N + Ñ). Furthermore, the lifting scheme requires only in-place computations, and the integer-to-integer lifting scheme requires no floating-point computations, which yields substantial savings in computation and memory requirements.
3.2.2 Scaling Factors
A typical problem encountered with the 3D integer wavelet transform is the complexity needed to make the transform unitary. The problem is caused by the fact that integer wavelet transforms are not unitary, so the quantization error in the wavelet domain is not the same as the error in the spatial domain. This does not affect the performance of lossless compression; however, a unitary transform is necessary to achieve good lossy coding performance.

The normalization factors can be determined by calculating the L2 norm of the low-pass and high-pass filters. In the 1D case, after every two-band wavelet decomposition, the low-pass band generally needs to be scaled up by √2 while the high-pass band needs to be scaled down by 1/√2 to make the transform unitary. It is not difficult to make the 2D integer transform unitary, since the typical scaling factors for the 2D case are approximately powers of two [45] and can therefore be implemented by bit shifts. Figure 3.4(a) shows the scaling factors for a three-level 2D dyadic integer wavelet transform.

However, the scaling factors are not powers of two for odd-dimensional dyadic integer wavelet transforms. In 3D, therefore, the scaling factors for the different subbands of a dyadic transform are not all powers of two, so simple bit shifts cannot make the transform approximately unitary. We have to find a 1D transform structure that allows simple bit shifts of the wavelet coefficients to make the transform unitary. Some proposals [8, 46] make use of a wavelet packet transform to obtain unitarity.
Fig. 3.4(b) shows the simplest two-level wavelet packet tree.

In our 3D scheme with integer filters, to make the 3D transform approximately unitary, a 1D wavelet packet decomposition and scaling are performed in the axial domain, and a 2D dyadic wavelet decomposition and 2D scaling factors are applied to each image slice. Fig. 3.4(c) shows the scaling factors after a three-level 2D integer wavelet transform in the spatial dimensions and a two-level 1D packet transform in the axial dimension, where GOS = 4. The factors are the products of the corresponding scaling factors in Figure 3.4(a) and Figure 3.4(b). For the wavelet decomposition, the asymmetric (decoupled) decomposition is chosen, since it is reported to show better performance than the symmetric decomposition when the correlation along the axial direction is stronger than that along the horizontal and vertical directions [47]. In the asymmetric 3D wavelet transform, several cascaded wavelet transforms along the Z direction first remove the correlation along the axial direction. Then, a set of alternating wavelet transforms along the X and Y directions removes the correlation in the spatial domain. Figure 3.5 gives an example of a 3D asymmetric decomposition with a two-level wavelet decomposition in the axial domain and a three-level wavelet decomposition in the spatial domain.
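The bookkeeping behind this choice can be sketched by tracking, for each terminal subband of a 1D decomposition, the exponent of √2 in its unitary scaling factor; only even exponents (integer powers of two) can be realized with bit shifts. The functions below are an illustrative sketch of ours, not code from the thesis.

```python
def packet_exponents(levels):
    """sqrt(2)-exponents of the unitary scaling factor for each terminal
    subband when BOTH branches are split at every level (wavelet packet).
    One split scales the low band by sqrt(2) (+1), the high band by
    1/sqrt(2) (-1)."""
    bands = [0]
    for _ in range(levels):
        bands = [e + 1 for e in bands] + [e - 1 for e in bands]
    return sorted(bands)

def dyadic_exponents(levels):
    """Same, but only the low-pass branch is split further (Mallat dyadic)."""
    low, out = 0, []
    for _ in range(levels):
        out.append(low - 1)   # high band created at this level
        low += 1              # surviving low band gains another sqrt(2)
    return sorted(out + [low])
```

`packet_exponents(2)` gives `[-2, 0, 0, 2]`, i.e., factors 1/2, 1, 1, 2 - all bit shifts - whereas `dyadic_exponents(2)` gives `[-1, 0, 2]`, whose odd exponent corresponds to a 1/√2 factor that no bit shift can realize. This is why the packet structure of Fig. 3.4(b) is used along Z.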
3.3 Scalable 3D-SBHP
Consider a 3D image data set that has been transformed using the 3D discrete wavelet transform, as shown in Figure 3.5. The image sequence is represented by an indexed set of wavelet transform coefficients c_{i,j,k} located at position (i, j, k) in the transformed image sequence. Following the idea in [19], for a given bit plane n and a given set τ of coefficients, we define the significance function:
S_n(τ) = { 1, if 2^n ≤ max_{(i,j,k)∈τ} |c_{i,j,k}| < 2^{n+1},
         { 0, otherwise.                                        (3.7)
Following this definition, we say that a set τ is significant with respect to bit plane n if S_n(τ) = 1; otherwise, we say that τ is insignificant.
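Operationally, since sets already found significant at higher bit planes have been removed from the lists, the test the coder actually performs reduces to comparing the set's maximum magnitude against 2^n. A minimal sketch:

```python
def top_bitplane(coeffs):
    """n = floor(log2(max |c|)), as computed in the initialization step."""
    return max(abs(c) for c in coeffs).bit_length() - 1

def significance(coeffs, n):
    """1 if the set holds a coefficient with a 1 in bit plane n or above,
    else 0 (the form of S_n(tau) used by the sorting pass)."""
    return 1 if max(abs(c) for c in coeffs) >= (1 << n) else 0
```

For the set {3, −9, 4}, the top bit plane is 3 (since 2^3 ≤ 9 < 2^4), so the set is significant at n = 3 and insignificant at n = 4.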
In 3D-SBHP, each subband is partitioned into code-blocks of the same size. The 3D-SBHP algorithm makes use of rectangular prisms within the code-blocks. These rectangular prisms, referred to as sets of type S, can be of varying dimensions; the dimensions of a set S depend on the dimensions of the code-block and the partitioning rules. Because of the limited number of frames in a GOS, the dimension of the code-block along the axial direction may be much shorter than its dimensions along the x and y directions. As a result, some S sets are 2D sets, i.e., their axial dimension is 1. We define Max2D to be the largest 2D S set that can be generated: for a 2^m × 2^m × 2^l code-block, Max2D is the 2^{m−l} × 2^{m−l} × 1 set. 3D-SBHP always has S sets of at least 2 × 2 × 1 coefficients.

The size of a set is defined to be the cardinality C of the set, i.e., the number of coefficients in it. During the course of the algorithm, sets of various sizes are formed, depending on the characteristics of the coefficients in the code-block:

size(S) = C(S) ≡ |S|    (3.8)

3D-SBHP also defines another type of set, referred to as type I. These sets are obtained by chopping off a small rectangular prism from the top-left portion of the code-block. Figure 3.6 illustrates a typical set I.
To minimize the number of significance tests for a given bit plane, 3D-SBHP maintains three lists:

• LIS (List of Insignificant Sets) - all the sets (with more than one coefficient) that are insignificant and do not belong to a larger insignificant set.

• LIP (List of Insignificant Pixels) - coefficients that are insignificant and do not belong to an insignificant set.

• LSP (List of Significant Pixels) - all coefficients found to be significant in previous passes.
3.3.1 Coding Algorithm
The 3D-SBHP coder is applied to every code-block independently and generates a highly scalable bitstream for each code-block by using the same form of progressive bit-plane coding as in SPIHT [18]. The coder encodes the code-blocks resolution by resolution, from code-blocks in the lower-resolution subbands to code-blocks in the higher-resolution subbands. This enables progressive-resolution decoding.

The 3D-SBHP algorithm consists of an initialization step, sorting and refinement passes, and a quantization step. Assume the total number of code-blocks is L and that all code-blocks are indexed from the low-pass subband to the high-pass subbands. The pseudocode of the algorithm is given below.
For l = 1, 2, ..., L
1. Initialization
• Partition code-block X into two sets: S ≡ the top-left 2 × 2 × 1 rectangular prism, and I ≡ X − S (see Figure 3.6)
• output n = ⌊log2( max_{(i,j,k)∈X} |c_{i,j,k}| )⌋
• add S to LIS, set LIP = all coefficients in the code-block, and set LSP = ∅
2. Sorting Pass
• for each ci,j,k ∈ LIP output Sn (ci,j,k )
– if Sn (ci,j,k ) = 1
∗ output sign of ci,j,k and move ci,j,k from LIP to LSP
• in increasing order of size of sets
– for each set S ∈ LIS,
∗ ProcessS(S)
• ProcessI()
3. Refinement Pass
• for each (i, j, k) ∈ LSP, except those included in the last sorting pass,
output the nth MSB of |ci,j,k |
4. Quantization Step
• decrement n by 1, and go to step 2
ProcessS(S)
{
• output Sn (S)
• if Sn (S) = 1
– CodeS(S)
– if S ∈ LIS, remove S from LIS
• else
– if S ∉ LIS, add S to LIS
}
CodeS(S)
{
• if size(S) ≤ size(M ax2D)
– partition S into four equal subsets O(S)
• else
– partition S into eight equal subsets O(S)
• for each O(S)
– output Sn (O(S))
– if Sn (O(S)) = 1
∗ CodeS(O(S))
– else
∗ add O(S) to LIS
}
ProcessI()
{
• output Sn (I)
• if Sn (I) =1
– CodeI()
}
CodeI()
{
• if size(X − I) ≤ size(M ax2D)
– partition I into four sets - three S and one I
• else
– partition I into eight sets - seven S and one I
• for each of the new generated sets S
– ProcessS(S)
• ProcessI()
}
3D-SBHP is based on a set-partitioning strategy. Figure 3.7 and Figure 3.8 illustrate the partitioning process used in 3D-SBHP; these splits occur only when a set is significant. Below we explain the partitioning rule of the 3D-SBHP algorithm in detail, using a 16 × 16 × 4 code-block as an example.

If the 16 × 16 × 4 code-block is significant, the algorithm starts by partitioning the code-block into two sets: a set S composed of the 2 × 2 × 1 top-left wavelet coefficients in the first frame, and a set I containing the remaining coefficients, as shown in Figure 3.9(a). The LIS is initialized to the set S. Here Max2D is the 4 × 4 × 1 set.
In the first set-partitioning stage, the set S, which is smaller than Max2D, can be decomposed into 4 individual coefficients, and the set I can be decomposed into three 2 × 2 × 1 S sets and a new I set, as shown in Figure 3.9(b). Figure 3.9(c) shows the second stage of set partitioning: each 2 × 2 × 1 S set can be decomposed into 4 coefficients, and the remaining I set can be split into seven 4 × 4 × 1 S sets and a remaining I set. In the third stage, as shown in Figure 3.9(d), each 4 × 4 × 1 S set is split into four 2 × 2 × 1 S sets, and the I set is partitioned into seven 8 × 8 × 2 S sets; here size(X − I) > size(Max2D). Figure 3.9(e) shows that each 2 × 2 × 1 S set can be decomposed into 4 coefficients, and each 8 × 8 × 2 S set can be split into eight 4 × 4 × 1 S sets. This process continues until all sets are partitioned down to individual coefficients; these partitions apply only to significant sets.
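The dimension sequence of this walkthrough follows from a single rule in CodeS: compare the set size against size(Max2D). A sketch (set dimensions as (x, y, z) tuples; for the 16 × 16 × 4 block, Max2D = 4 × 4 × 1 has size 16):

```python
def split_s(dims, max2d_size):
    """Partitioning rule of CodeS: a set no larger than Max2D is quad-split
    in the spatial plane; a larger set is oct-split in all three
    dimensions. `dims` = (x, y, z) extent of the S set."""
    x, y, z = dims
    if x * y * z <= max2d_size:
        return [(x // 2, y // 2, z)] * 4      # four 2-D subsets
    return [(x // 2, y // 2, z // 2)] * 8     # eight 3-D subsets
```

Matching the example above: an 8 × 8 × 2 set (size 128 > 16) oct-splits into eight 4 × 4 × 1 sets; a 4 × 4 × 1 set (size 16) quad-splits into 2 × 2 × 1 sets; a 2 × 2 × 1 set quad-splits into individual coefficients.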
For each new bit plane, the significance of the coefficients in the LIP is tested first. Significant coefficients move from the LIP to the LSP; a bit 1 is output to indicate the significance of the coefficient, and another bit is output to represent the sign of the pixel. Then each set in the LIS is tested for significance. If a set S is not significant, it stays in the LIS. Otherwise, the significant S set is partitioned following the quadtree partitioning rules shown above until all significant coefficients in that S set are located and coded; the algorithm sends the significant coefficients to the LSP. Once all sets of type S are processed, the set I, if it exists, is tested against the same threshold. If it is significant, it is partitioned by the octave-band partitioning rule shown above. After the set I is partitioned, the new S sets are processed in the regular image-scanning order.
Once all the S and I sets have been processed, the refinement pass is initiated, which refines the coefficients in the LSP except those included in the just-completed sorting pass.

The last step of the algorithm decrements n by 1 (halving the threshold), and the sequence of sorting and refinement passes is repeated against this lower threshold.

In our coding scheme, the above sequence of four steps - initialization, sorting pass, refinement pass, and quantization - is applied to every code-block independently and generates an SNR-progressive bitstream for every code-block.
3.3.2 Processing Order of Sorting Pass
After a sorting pass has occurred, insignificant coefficients and sets of type S of varying sizes have been generated and added to the LIP and LIS, respectively. During the next sorting pass, the algorithm goes through the LIP first and then processes the sets of type S in increasing order of their size.
This strategy is based on the idea that, during the sorting pass, the algorithm sends to the LIP those coefficients whose immediate neighbors have tested significant against some threshold but which have not themselves tested significant against that threshold. Because of energy clustering in the transform domain, these insignificant coefficients tend to have magnitudes close to those of their significant neighbors, and therefore a higher probability of becoming significant at some nearby lower threshold. The second reason is that the overhead involved in processing a single coefficient in the LIP, or a small S set in the LIS, is much lower than that involved in processing a large S set. Therefore, if the coding algorithm stops in the middle of a sorting pass, executing the sorting pass in increasing order of set size improves performance and facilitates finer rate control.
Instead of using a single large list holding S sets of varying sizes, we use an array of smaller lists of type LIS, each containing sets of type S of one fixed size. Since the total number of S sets formed during the coding process remains the same, using an array of lists does not increase the memory requirement of the coder. The use of multiple lists completely eliminates the need for any sorting mechanism to process the sets of type S in increasing order of their size, and it speeds up the encoding/decoding process.
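One way to realize this (an illustrative sketch of ours, not the thesis code) is a fixed bank of FIFO queues, one per possible S-set size; visiting the buckets from smallest to largest then needs no sorting at all:

```python
from collections import deque

class BucketedLIS:
    """One FIFO per S-set size. The possible sizes are known in advance
    (they are fixed by the code-block dimensions and the partitioning
    rules), so the bucket array is allocated once and reused."""
    def __init__(self, sizes):
        # dict preserves insertion order, so building from sorted sizes
        # fixes the visiting order once and for all
        self.buckets = {s: deque() for s in sorted(sizes)}
    def add(self, s_set):
        self.buckets[len(s_set)].append(s_set)       # FIFO insert
    def drain_increasing(self):
        for size, q in self.buckets.items():
            while q:
                yield q.popleft()                    # FIFO removal
```

Sets of equal size come back in first-in-first-out order, and smaller sets always come back before larger ones.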
3.3.3 Entropy Coding
During the sorting pass, when an S set is split, the significance patterns of the mask have unequal probabilities, as shown in Table A.6 and Table A.2 of Appendix A. We exploit this fact to reduce the number of compressed bits with simple entropy coding. Although entropy coding is a powerful tool for improving compression performance, it adds complexity, and adaptive arithmetic or adaptive Huffman coding adds much more. To keep the complexity small, instead of using arithmetic coding, 3D-SBHP uses only three fixed Huffman codes under certain conditions. We generate the individual Huffman tables from an analysis of training sets of both medical and hyperspectral images, as shown in Appendix A. Since most set partitionings produce four subsets or pixels, we can code them together. 3D-SBHP therefore uses a Huffman code with 15 symbols, corresponding to all possible outcomes (the all-insignificant pattern is excluded, since a set is split only when it is significant). The longest Huffman codeword is 7 bits. To speed up decoding, we can use lookup tables instead of a binary tree.

The other reason for choosing fixed Huffman coding is random accessibility. During an adaptive coding process, the entropy coder pays a price for inaccuracies in its conditional probability estimates until it converges to the source statistics; this probability adaptation requires sufficiently many samples to converge. Unlike 3D CS-EBCOT [14], our 3D-SBHP reduces the temporal dimension of the 3D code-blocks to enhance slice accessibility, and the smaller code-blocks might not contain enough samples to amortize this learning penalty.

For the sign and refinement bits, the probabilities of 1 and 0 are both very near 1/2, so it is very hard to compress these bits efficiently. Therefore, no entropy coding is applied to them, although this results in some compression loss; we simply move these "raw" bits to the compressed bitstream.
3.3.4 Memory and Complexity Analysis
3D-SBHP has low memory requirements. The algorithm splits every subband into fixed-size code-blocks and processes every code-block independently. Therefore, at any given time during the coding process, only a fixed amount of memory is used for coding/decoding. The size of this dynamic memory does not depend on the size of the volumetric image, only on the size of the code-block, so even for a huge volumetric image only a small amount of dynamic memory is needed. Since the algorithm works with fixed-size code-blocks, the fixed-size dynamic memory can be allocated in advance and reused for all code-blocks, and no time-consuming memory management is needed. Moreover, the data in a code-block can fit in the CPU's fast cache memory, which minimizes accesses to slow memory.

3D-SBHP also has low computational complexity. Only the most basic operations - memory accesses, bit shifts, additions, and comparisons - are required by the coder; no multiplication or division is needed.

The complexity analysis of 3D-SBHP can be divided into two parts: independent of, and dependent on, the bit rate.
The 3D-SBHP coder first executes a preprocessing pass that visits all coefficients in the code-block to gather information about the bits in all bit planes. This pass requires only one bitwise OR operation per coefficient, following a predetermined sequence; all bit-plane coders need a similar pass to identify the top bit plane. The bit-rate-independent complexity is associated with this preprocessing pass. As mentioned above, a bitwise OR operation is needed for each coefficient and for each set. The total number of sets in 3D-SBHP is about 1/3 the number of coefficients, so the preprocessing pass needs about 4/3 accesses per coefficient.
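A sketch of this pass over one code-block (flattened to a list of coefficients here):

```python
def or_mask(coeffs):
    """One pass: bitwise-OR of magnitudes. Bit n of the result is 1 iff
    some coefficient has a 1 in bit plane n, so the mask answers every
    'is anything here significant at plane n?' question in O(1)."""
    acc = 0
    for c in coeffs:
        acc |= abs(c)          # the single OR per coefficient
    return acc

def significant_at(mask, n):
    return (mask >> n) != 0

block = [3, -9, 4, 0, 17]
mask = or_mask(block)          # 0b11111 == 31
top = mask.bit_length() - 1    # top bit plane, here 4 (16 <= |17| < 32)
```

The same OR-mask, kept per set, is what lets the later sorting passes spend exactly one output bit (and one comparison) on an all-zero set.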
The bit-rate-dependent complexity is proportional to the number of compressed bits. For most common bit rates, most of the computational effort is spent processing the LIS. We can roughly measure the complexity of coding a bit plane by counting the number of bit comparisons used to test its bits. 3D-SBHP tests only the elements in its lists. When all bits of a set in a bit plane are equal to zero, the set-partitioning process uses exactly one bit to indicate it. Since the information about these sets is gathered in the preprocessing pass, only one comparison is required per bit in the compressed stream (the decoder just reads the bit). So the number of bits generated per bit plane is equal to the number of comparisons.

Two facts are important in reducing the list-management complexity of 3D-SBHP. First, the algorithm works with small fixed-size code-blocks, so the list memory can be allocated in advance, and no time-consuming memory management is required. Second, all the lists and list arrays are updated with the most efficient list-management discipline: FIFO.
3.3.5 Scalable Coding
In many applications, one may need to view only a low-precision image at high resolution, or a low-resolution image at high precision. For these applications, the coder needs the capability to encode the image only once, with all bit planes and all resolutions, so that during decoding the user can extract a subset of the bitstream to reconstruct the image at a specified resolution and quality.
3.3.5.1 Resolution Scalability
In a wavelet coding system, resolution scalability enables an increase of resolution as bits in higher frequency subbands are decoded. For a 2D image, after N levels of wavelet decomposition, the image has N + 1 resolution levels. For a 3D image sequence with N-level wavelet decomposition in the spatial direction and M-level wavelet decomposition in the spectral direction, a total of (N + 1) × (M + 1) resolution levels are available, as shown in Figure 3.10. 3D-SBHP is applied to every code-block inside the subbands independently. At the encoder side, along each direction, no code-block in the higher frequency subbands can be encoded before all code-blocks in the lower frequency subbands are encoded. As shown in Figure 3.11, the whole bit stream is resolution scalable. At the decoder side, if a user wants to view the image at resolution n, the bits belonging to the code-blocks related to resolution n are extracted for decoding. To locate a specified code-block in the bitstream, the size of every code-block (the number of compressed bits generated for it) needs to be kept in the header.
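As a sketch of this lookup, the per-block sizes stored in the header give each block's byte offset by a running sum, and the decoder can then slice out exactly the blocks a requested resolution needs. The function and variable names below are illustrative, not taken from the thesis:

```python
def block_offsets(sizes):
    """Byte offset of each code-block in the bitstream body.

    sizes[i] is the compressed length (in bytes) of code-block i, as
    recorded in the header; blocks are stored in encoding order.
    """
    offsets = []
    pos = 0
    for s in sizes:
        offsets.append(pos)
        pos += s
    return offsets


def extract_blocks(bitstream, sizes, wanted):
    """Return {block_index: bytes} for the blocks a resolution needs."""
    offsets = block_offsets(sizes)
    return {i: bitstream[offsets[i]:offsets[i] + sizes[i]] for i in wanted}
```

Because the blocks are stored from the lowest to the highest subband, decoding resolution n amounts to calling `extract_blocks` with the indices of all blocks at resolutions 0 through n.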
3.3.5.2 Rate Control
Within every code-block, the bitstream is SNR progressive by bitplane coding; overall, however, the bitstream is resolution progressive, not SNR progressive. But if the bits belonging to the same threshold from every code-block are put into the bitstream starting from the highest threshold down to the lowest, the composite bitstream is embedded. For a given target bit rate, we need to apply a bit allocation algorithm to select a cutting point for every code-block so as to minimize the distortion. The solution occurs at the same rate-distortion slope for every code-block receiving non-zero rate. The Lagrangian optimization method given in [24] is used in our scheme to find the optimal cutting point for every code-block Bk, whose embedded bitstream may be truncated at rate Rk, leading to distortion Dk in the reconstructed image.
The additive distortion measure, squared error, is chosen, so that the total distortion is D = Σ_k D_k. Lagrangian optimization tells us that, given a parameter λ, the set of optimal truncation rates {R_k} is the one which minimizes the cost function

  D(λ) + λR(λ) = Σ_k (D_k^λ + λR_k^λ),        (3.9)

where Σ_k R_k^λ = R ≤ R_target. So, if we can find a value of λ for all code-blocks such that the truncation rates which minimize the cost function yield R(λ) = R_target, then this set of truncation rates must be an optimal solution.
Let λ1 and λ2 be two different Lagrangian parameters, and let (R1, D1) and (R2, D2) be the solutions of min(D + λR) for λ1 and λ2, respectively. Then, by Lemma 2 in [25], R1 ≥ R2 if λ1 < λ2. Given a target bitrate R_target, we can find the value of λ quickly by using this property. The bitrate R(λ) is first calculated with a starting value of λ. Then λ is modified according to the relative values of R_target and R(λ). This process is repeated until the desired value is found.
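Since R(λ) is non-increasing in λ, this search can be sketched as a bisection in the log domain. The `total_rate` callback and the bracket bounds below are our assumptions, not fixed by the thesis:

```python
def find_lambda(total_rate, r_target, lam_lo=1e-9, lam_hi=1e9, iters=60):
    """Bisect for a lambda such that total_rate(lambda) ~= r_target.

    total_rate(lam) must return the sum of the per-block truncation
    rates minimizing D_k + lam * R_k, and must be non-increasing in
    lam (a larger lambda penalizes rate more, so fewer bits survive).
    """
    for _ in range(iters):
        lam = (lam_lo * lam_hi) ** 0.5  # geometric midpoint: lambda spans decades
        if total_rate(lam) > r_target:
            lam_lo = lam   # over budget -> increase lambda
        else:
            lam_hi = lam   # under budget -> decrease lambda
    return lam_hi          # largest tested lambda that meets the budget
```

Returning `lam_hi` guarantees the budget is met, at the cost of possibly leaving a few bytes unused; a real allocator would fill the remainder greedily.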
Since every code-block is coded separately, we need to solve the optimal truncation problem for every code-block Bk. A simple algorithm to find the truncation rate R_k which minimizes (D_k^λ + λR_k^λ) for a given λ is as follows [24]:
• initialize i = 0;
• for j = 1, 2, 3, ...
  – set ∆R_k^j = R_k^j − R_k^i and ∆D_k^j = D_k^i − D_k^j, where R_k^j increases with j;
  – if ∆D_k^j / ∆R_k^j > λ, then update i = j.
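The pruning loop above translates directly into code. A sketch for one code-block, where `points` holds the candidate (rate, distortion) pairs in order of increasing rate, with index 0 the zero-rate point:

```python
def best_truncation(points, lam):
    """Index of the truncation point minimizing D + lam * R.

    points: [(R0, D0), (R1, D1), ...] with R increasing, D decreasing.
    Implements the slope test from [24]: advance to j whenever
    (D_i - D_j) / (R_j - R_i) > lam.
    """
    i = 0
    for j in range(1, len(points)):
        d_rate = points[j][0] - points[i][0]
        d_dist = points[i][1] - points[j][1]
        if d_rate > 0 and d_dist / d_rate > lam:
            i = j
    return i
```

For rate-distortion points on their convex hull, the index returned coincides with the exhaustive minimizer of D + λR.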
To calculate the distortion-rate slopes ∆D_k^j / ∆R_k^j, we need the number of bits used in compression and the corresponding decrease in distortion. Exact computation of the squared error requires computing squared values for each coded pixel, which is not trivial. In 3D-SBHP, instead of calculating the distortion for each pixel, we simplify this computation by estimating the reduction in distortion [26] as a function of the number of elements in the LIS, LIP and LSP. The distortion is evaluated in the transform domain. The 3D integer wavelet transform with scaling is approximately orthogonal, so the error from evaluating the distortion in the transform domain remains reasonably small.
At the 3D-SBHP encoder side, on each bit plane n, all wavelet coefficients whose magnitudes are greater than or equal to the threshold τ = 2^n and less than 2τ are found significant by the 3D-SBHP coder. Once a coefficient C is found significant, its position and approximate magnitude, 1.5τ, are inferred from the significance map with one bit, and its sign is coded using one additional bit.
Initially, at the decoder side, every coefficient of the transformed image is assumed to be zero. If we assume that the coefficient value C is positive and uniformly distributed over [τ, 2τ), then the expected squared error in reproducing the coefficient as 0 is

  D_0 = E{(C − 0)²} = E{C²} = ∫_τ^{2τ} (1/τ) c² dc = (7/3) τ².        (3.10)
If we reproduce the coefficient as Ĉ = 1.5τ, the expected squared error becomes

  D_τ = E{(C − Ĉ)²} = E{(C − 1.5τ)²} = ∫_τ^{2τ} (1/τ) (c − 1.5τ)² dc = (1/12) τ².        (3.11)
So, finding a newly significant coefficient reduces the expected sum squared error by

  D_0 − D_τ = (7/3) τ² − (1/12) τ² = (27/12) τ².        (3.12)
Equation 3.11 also gives the expected squared error of a coefficient refined up to significance threshold τ. If a coefficient is refined from threshold 2τ down to threshold τ, the reduction due to refinement is

  D_{2τ} − D_τ = (1/12)(2τ)² − (1/12) τ² = (1/4) τ².        (3.13)
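Equations 3.10–3.13 can be checked numerically for any τ by integrating against the uniform density 1/τ on [τ, 2τ); a small sketch:

```python
def expected_sq_error(rep, tau, n=100_000):
    """E{(C - rep)^2} for C uniform on [tau, 2*tau), by the midpoint rule."""
    h = tau / n
    total = 0.0
    for k in range(n):
        c = tau + (k + 0.5) * h     # midpoint of the k-th subinterval
        total += (c - rep) ** 2
    return total * h / tau          # the uniform density on [tau, 2*tau) is 1/tau
```

With τ = 1, `expected_sq_error(0.0, 1.0)` recovers 7/3 (Eq. 3.10), `expected_sq_error(1.5, 1.0)` recovers 1/12 (Eq. 3.11), and their difference recovers 27/12 (Eq. 3.12).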
For each bit plane n, 3D-SBHP calculates rate-distortion information at three
points (corresponding to ends of LIP pass, LIS pass and LSP pass) in the coding
process for every code-block Bk. This information comprises the rates R_{i,n,k} (i = 0, 1, 2, corresponding to the LIP, LIS and LSP passes, respectively), i.e. the total number of bits used so far; P_{i,n,k}, the number of pixels in the LSP; and δD_{i,n,k}, the derivative of the rate-distortion curve.
δD_{i,n,k} can be calculated as the average decrease in distortion per coded bit. Using Equation 3.12, Equation 3.13 and experimental refinement of the 27/12 factor, we approximate δD_{i,n,k} as

  δD_{0,n,k} = −2.15 (P_{0,n,k} − P_{2,n+1,k}) τ² / (R_{0,n,k} − R_{2,n+1,k})        (3.14)
  δD_{1,n,k} = −1.95 (P_{1,n,k} − P_{0,n,k}) τ² / (R_{1,n,k} − R_{0,n,k})        (3.15)
  δD_{2,n,k} = −0.25 P_{2,n+1,k} τ² / (R_{2,n,k} − R_{1,n,k})        (3.16)
For any truncation point R_k in a code-block's embedded bitstream, linear interpolation is used to estimate the derivative of the rate-distortion curve:

  δD(R_k) = δD_{i,n,k} + (R_k − R_{i,n,k})(δD_{i+1,n,k} − δD_{i,n,k}) / (R_{i+1,n,k} − R_{i,n,k}),   R_{i,n,k} ≤ R_k ≤ R_{i+1,n,k}.        (3.17)
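A sketch of the slope estimates and their interpolation (Equations 3.14–3.17) for one code-block and one bit plane; the argument names are ours, not the thesis's:

```python
def pass_slopes(p0, p1, p2_prev, r, r2_prev, tau):
    """delta-D estimates after the LIP, LIS and LSP passes (Eqs 3.14-3.16).

    p0, p1: LSP sizes P_{0,n,k}, P_{1,n,k}; p2_prev: P_{2,n+1,k} from the
    previous (higher) bit plane; r = [R_{0,n,k}, R_{1,n,k}, R_{2,n,k}];
    r2_prev: R_{2,n+1,k}; tau: the current threshold 2**n.
    """
    d0 = -2.15 * (p0 - p2_prev) * tau**2 / (r[0] - r2_prev)   # Eq. 3.14
    d1 = -1.95 * (p1 - p0) * tau**2 / (r[1] - r[0])           # Eq. 3.15
    d2 = -0.25 * p2_prev * tau**2 / (r[2] - r[1])             # Eq. 3.16
    return [d0, d1, d2]


def interp_slope(rk, rates, slopes, i):
    """Linear interpolation of the slope at rate rk (Eq. 3.17),
    valid for rates[i] <= rk <= rates[i + 1]."""
    return slopes[i] + (rk - rates[i]) * (slopes[i + 1] - slopes[i]) / (
        rates[i + 1] - rates[i])
```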
To enable SNR scalability, rate-distortion information is calculated by (3.14), (3.15) and (3.16) for every bitplane n, and R_{i,n,k} and δD_{i,n,k} are stored in the header for every code-block during coding, as shown in Figure 3.12. When decoding, the Lagrangian optimization method above is used to find the optimal truncation point for every code-block's bit stream; bitstream interleaving is then performed to produce the final bitstream.
3.4 Numerical Results
We conduct our experiments on four 8-bit CT medical image volumes, four 8-bit MR medical image volumes, and four 16-bit Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) hyperspectral image volumes. AVIRIS has 224 bands and 614 × 512 pixel resolution; for our experiments, we cropped each scene to 512 × 512 × 224 pixels. Table 3.3 describes these sequences.
In this section, we provide simulation results and compare the proposed 3-D
volumetric codec with other algorithms.
File Name         Image Type   Volume Size       Bit Depth (bit/pixel)
Skull             CT           256 × 256 × 192   8
Wrist             CT           256 × 256 × 176   8
Carotid           CT           256 × 256 × 64    8
Aperts            CT           256 × 256 × 96    8
Liver t1          MR           256 × 256 × 48    8
Liver t2e1        MR           256 × 256 × 48    8
Sag head          MR           256 × 256 × 48    8
Ped chest         MR           256 × 256 × 64    8
moffett scene 1   AVIRIS       512 × 512 × 224   16
moffett scene 2   AVIRIS       512 × 512 × 224   16
moffett scene 3   AVIRIS       512 × 512 × 224   16
jasper scene 1    AVIRIS       512 × 512 × 224   16

Table 3.3: Description of the image volumes
3.4.1 Lossless Coding Performance
To show the compression performance and SNR scalability of 3D-SBHP, we first present lossless coding results, including comparisons of performance using different integer wavelet transforms and different code-block sizes. The results are given in bits per pixel, averaged over the entire image volume. We show the lossy performance in the next subsection.
In our experiments, all image sequences are compressed losslessly with GOS = 16.
3.4.1.1 Lossless Coding Performance by Use of Different Integer Wavelet Transforms
The integer wavelet transforms S+P(B), I(2,2) and I(2+2,2) are compared for 3D-SBHP, Asymmetric Tree based 3D-SPIHT (AT-3D-SPIHT) [51], 3D-SPECK and 2D-SBHP. Table 3.4 and Table 3.5 compare the lossless coding performance of 3D-SBHP, AT-3D-SPIHT, 3D-SPECK, 3D-CB-EZW and 2D-SBHP on CT and MR volumetric data sets, respectively. Table 3.6 compares 3D-SBHP, AT-3D-SPIHT and 3D-SPECK on AVIRIS hyperspectral image volumes using different filters. Three decomposition levels are used for AT-3D-SPIHT, 3D-SPECK, 3D-SBHP and 2D-SBHP in all dimensions; 3D-CB-EZW uses two decomposition levels on all three dimensions. 3D-SBHP uses code-block dimensions of 64 × 64 × 4. The results show that none of these filters performs best for all data sets. These
three filters have similar performance on medical image data sets. I(2+2,2) performs
the worst on AVIRIS image data sets for almost all selected algorithms, but shows
average performance on medical image data sets. In general, the S+P and I(2,2)
filters perform better most of the time.
Algorithm     GOS          Filter     CT Skull   CT Wrist   CT Carotid   CT Aperts
3D-SBHP       16           S+P        2.2911     1.4644     1.5941       1.1047
                           I(2,2)     2.2301     1.3347     1.6684       1.0525
                           I(2+2,2)   2.2701     1.4002     1.6631       1.0876
AT-3D-SPIHT   16           S+P        2.0752     1.2811     1.4976       1.0403
                           I(2,2)     2.1321     1.2490     1.5772       0.9938
                           I(2+2,2)   2.1754     1.3083     1.5844       1.0370
3D-SPECK      whole seq.   S+P        2.2063     1.3731     1.6041       1.1134
                           I(2,2)     2.1626     1.2718     1.6824       1.0667
                           I(2+2,2)   2.0170     1.2538     1.6517       1.1502
3D-CB-EZW     16           S+P        2.2046     1.3274     1.4553       1.0139
                           I(2,2)     2.9519     1.8236     2.1408       1.4263
                           I(2+2,2)   2.1792     1.2267     1.4618       0.9424
2D-SBHP       1            S+P        3.2916     1.9733     2.1300       1.3573
                           I(2,2)     3.3125     2.0267     2.1795       1.4564
                           I(2+2,2)   3.2969     1.9720     2.1421       1.4326

Table 3.4: Comparison of lossless coding results (bit/pixel) of different coding methods using different integer filters on CT data.
3.4.1.2 Comparison of Lossless Performance with Different Algorithms
Table 3.7 compares the lossless compression performance of 3D-SPIHT, 3D-SPECK, 3D-CB-EZW, 3D-SBHP, JPEG2000 multi-component and a 2D lossless compression algorithm, JPEG2000, on medical data.
To obtain these results, 3D-SBHP uses code-block dimensions 64 × 64 × 4 and a GOS of 16, while the other 3D algorithms treat the entire image sequence as one coding unit. For all 3D algorithms, a three-level wavelet transform was applied in all three dimensions using the I(2+2,2) filter. JPEG2000 multi-component first applies the I(2+2,2) filter in the axial direction, then codes every resultant spectral slice as a separate file with Kakadu JPEG2000 [50], which uses the integer 5/3 filter.
Comparing the average compression performance listed in the last row of the table, JPEG2000 multi-component gives the best coding efficiency. As an extension of SBHP, a low-complexity alternative to JPEG2000, 3D-SBHP on average yields
Algorithm     GOS          Filter     MR Liver1   MR Liver2   MR head   MR chest
3D-SBHP       16           S+P        2.5609      1.8225      2.3241    2.2387
                           I(2,2)     2.5001      1.8354      2.3091    2.0081
                           I(2+2,2)   2.5257      1.8477      2.3219    2.0873
AT-3D-SPIHT   16           S+P        2.3697      1.7444      2.2025    2.0280
                           I(2,2)     2.3423      1.7501      2.1557    1.8779
                           I(2+2,2)   2.3191      1.7868      2.2071    1.9629
3D-SPECK      whole seq.   S+P        2.5520      1.8403      2.3463    2.1951
                           I(2,2)     2.5049      1.8585      2.3455    2.0320
                           I(2+2,2)   2.4331      1.8733      2.3589    2.1160
3D-CB-EZW     16           S+P        2.4156      1.7530      2.3569    2.1174
                           I(2,2)     3.2270      2.5771      2.8631    2.4954
                           I(2+2,2)   2.3239      1.7512      2.2690    1.9895
2D-SPIHT      1            S+P        3.1288      2.4982      2.6913    2.8555
2D-SBHP       1            S+P        3.4102      2.5759      2.9069    3.1367
                           I(2,2)     3.5061      2.6528      3.1575    3.2131
                           I(2+2,2)   3.4090      2.5606      3.1451    3.1327

Table 3.5: Comparison of lossless coding results (bit/pixel) of different coding methods using different integer filters on MR data.
Algorithm     GOS          Filter     moffett   moffett   moffett   jasper
                                      scene 1   scene 2   scene 3   scene 1
3D-SBHP       16           S+P        7.0598    8.4385    6.8563    6.8097
                           I(2,2)     7.1848    8.5674    6.8536    6.9705
                           I(2+2,2)   7.4741    8.7774    7.1244    7.2418
AT-3D-SPIHT   whole seq.   S+P        6.5270    7.6781    6.3969    6.2602
                           I(2,2)     6.5860    7.7774    6.4701    6.5324
                           I(2+2,2)   6.6316    7.8481    6.5026    6.5647
3D-SPECK      whole seq.   S+P        6.9102    8.0550    6.8209    6.7014
                           I(2,2)     7.1360    8.1910    6.6402    7.0213
                           I(2+2,2)   7.2617    8.2420    6.7016    6.8403

Table 3.6: Comparison of lossless coding results (bit/pixel) of different coding methods using different integer filters on AVIRIS data (decomposition level of 3 is used on all dimensions).
23% higher compression performance than 2D JPEG2000, and is 13% worse than JPEG2000 multi-component. Compared with the average compression results of the other 3D algorithms, 3D-SBHP is 2%, 10% and 13% worse in compression efficiency than 3D-SPECK, 3D-SPIHT and 3D-CB-EZW, respectively. On the other hand, 3D-SBHP outperforms most algorithms on some sequences. Considering that 3D-SBHP is applied with GOS = 16, while the other 3D algorithms use the whole sequence as their coding unit, a small performance gap is expected.
Table 3.8 presents the lossless performance of 3D-SBHP, 3D-SPIHT, 3D-SPECK, JP2K-Multi, 2D-SPIHT and JPEG2000 on hyperspectral data. 3D-SBHP uses a five-level dyadic S+P(B) transform in the spatial domain and a two-level 1D S+P(B) transform on the spectral axis, with GOS = 16 and code-block size 64 × 64 × 4. JP2K-Multi is implemented by first applying the S+P filter along the spectral dimension, followed by 2D JPEG2000 on the spatial domain using the integer (5,3) filter. For all other 3D algorithms, all 224 bands are coded as a single unit and five decomposition levels are applied in every dimension.
For the AVIRIS test image volumes, 3D-SPIHT gives the best coding efficiency. 3D-SBHP is comparable to 3D-SPIHT on the AVIRIS image sequences; on average, it is only about 2% inferior to 3D-SPIHT and 3D-SPECK. Our algorithm yields, on average, about 2%, 13% and 17% higher compression efficiency than JPEG2000 multi-component, 2D-SPIHT and JPEG2000, respectively. Again, we sacrifice coding efficiency to gain random accessibility and low memory usage by using GOS = 16.
Compared with other coding algorithms, 3D-SBHP performs better on hyperspectral images than on medical images. In Table 3.1, we listed the average standard deviations (STD) of the medical and hyperspectral image datasets along the x, y, and z directions. Since hyperspectral image data has high STD along all three directions, after the wavelet transform hyperspectral images tend to have a notable number of high-valued coefficients in the high frequency subbands. The quadtree partitioning used in 3D-SBHP can zoom in on these areas of high energy very quickly. For medical image data, the STD along all three directions, especially the axial direction, is very low; after the wavelet transform, the high frequency subbands will have very low energy. When a wavelet coefficient at a coarse scale is insignificant with respect to a given threshold T, wavelet coefficients of the same orientation in the same spatial location at finer scales have a much higher probability of being insignificant with respect to T. This property can be exploited by zerotrees or spatial orientation trees. This may explain why 3D-SPIHT and 3D-EZW give better performance than 3D-SPECK and 3D-SBHP on medical image data, but inferior or comparable performance on hyperspectral image data.
File Name       3D-SPIHT   3D-SBHP   3D-SPECK   3D-CB-EZW   JP2K-Multi   JPEG2000
CT Skull        2.0051     2.2701    2.0170     2.0095      1.7450       2.9993
CT Wrist        1.1570     1.4002    1.2538     1.1393      1.1771       1.7648
CT Carotid      1.5498     1.6631    1.6517     1.3930      1.6785       2.0277
CT Aperts       1.0313     1.0876    1.1502     0.8923      0.7290       1.2690
MR Liver t1     2.2447     2.5257    2.4331     2.2076      2.3814       3.2640
MR Liver t2e1   1.6914     1.8477    1.8733     1.6591      1.6247       2.5804
MR Sag head     2.1750     2.3219    2.3589     2.2846      2.5961       2.9134
MR Ped chest    1.9218     2.0873    2.1160     1.8705      1.4884       3.1106
average         1.7220     1.9004    1.8567     1.6820      1.6775       2.4912

Table 3.7: Comparison of different coding methods for lossless compression of 8-bit medical image volumes (bits/pixel).
File Name         3D-SPIHT   3D-SBHP   3D-SPECK   JP2K-Multi   2D-SPIHT   JPEG2000
moffett scene 1   6.9411     7.0333    6.9102     7.1748       7.9714     8.7905
moffett scene 2   7.9174     8.4333    8.0835     8.4131       9.8503     10.0815
moffett scene 3   6.7402     6.8359    6.8209     7.0021       7.5874     7.7258
jasper scene 1    6.7157     6.7842    6.7014     6.8965       7.7977     8.8560
average           7.0786     7.2716    7.1290     7.3716       8.3458     8.7959

Table 3.8: Comparison of different coding methods for lossless coding of 16-bit AVIRIS image volumes (bit/pixel) (decomposition level of 5 is used on the spatial domain and decomposition level of 2 on the spectral axis).
3.4.1.3 Lossless Coding Performance by Use of Different Code-block Sizes
Table 3.9 compares the lossless compression results for all image data listed in Table 3.3 using different code-block sizes: 8 × 8 × 2, 16 × 16 × 2, 32 × 32 × 4 and 64 × 64 × 4. The image sequences are compressed with GOS = 16 and the I(2,2) filter. Three levels of wavelet decomposition are applied in all three dimensions. The results show that for all image sequences, increasing the code-block size improves the performance somewhat. The reason is that a larger code-block size decreases the total overhead for the whole image sequence.
3.4.2 Lossy Performance
As discussed before, we can obtain reconstructed volumetric slices at any bit rate from a single compressed embedded bitstream generated by 3D-SBHP. To get good lossy performance, we use the wavelet transform structure and scaling factors shown in Figure 3.5 and Figure 3.4(c), which make the 3D integer transform
File Name         8 × 8 × 2   16 × 16 × 2   32 × 32 × 4   64 × 64 × 4
Skull             3.1066      2.4758        2.2617        2.2301
Wrist             2.1780      1.5601        1.3604        1.3347
Carotid           2.5093      1.8973        1.6952        1.6684
Aperts            1.8857      1.2718        1.0793        1.0525
Liver t1          3.3724      2.7478        2.5287        2.5001
Liver t2e1        2.6961      2.0709        1.8661        1.8354
Sag head          3.1859      2.5538        2.3395        2.3091
Ped chest         2.8729      2.2502        2.0372        2.0081
moffett scene 1   8.3711      7.5104        7.2282        7.1848
moffett scene 2   9.8242      8.9170        8.6086        8.5674
moffett scene 3   8.0128      7.1722        6.8960        6.8536
jasper scene 1    8.1417      7.2922        7.0130        6.9705

Table 3.9: Lossless coding results by use of different code-block sizes (bits/pixel)
approximately unitary. In this section, we show the performance of lossy reconstruction from the losslessly compressed file. The quality of reconstruction is measured by the peak signal-to-noise ratio (PSNR) over the whole image sequence, defined by

  PSNR = 10 log10 (x²_peak / MSE)  dB,        (3.18)

where x_peak = 255 for these medical images and MSE denotes the mean squared error between all the original and reconstructed slices. Table 3.10 shows the PSNR performance for four different medical volumetric image sets at four different bit rates (these rates are obtained by truncation of the lossless bitstream). The I(2,2) filter and 32 × 32 × 4 code-block size are used here.
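Equation 3.18 is straightforward to compute; a minimal sketch for flattened 8-bit sample arrays:

```python
import math


def psnr(original, reconstructed, x_peak=255.0):
    """PSNR (dB) per Eq. 3.18: 10*log10(x_peak^2 / MSE) over all samples."""
    n = len(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / n
    return 10.0 * math.log10(x_peak ** 2 / mse)
```

For an image sequence, the sample lists would hold every pixel of every slice, so the MSE is averaged over the whole volume, as in the text.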
Bit rate    CT Skull   CT Carotid   MR Liver t1   MR Sag head
0.125 bpp   33.36      39.76        38.18         39.08
0.25 bpp    38.65      44.04        41.88         42.60
0.5 bpp     42.92      47.06        44.58         45.77
1.0 bpp     47.17      49.60        46.87         48.29

Table 3.10: PSNR performance (in dB) of 3D-SBHP at various rates for medical volumetric image data. These rates are obtained by truncation of the lossless bitstream.
Figure 3.13 and Figure 3.14 show the first slices of CT Skull and MR Liver t1 reconstructed by 3D-SBHP. The algorithm yields good subjective quality; differences from the original are barely noticeable, especially at higher bit rates.
3.4.3 Resolution Scalable Results
The CT medical sequence "skull", the I(2,2) integer filter, and a 32 × 32 × 4 code-block size are selected for this comparison. Figure 3.15 shows the reconstructed CT skull sequence decoded from a single scalable code stream at a variety of resolutions at 0.125 bpp. Here, bit rate (bpp) is defined as the number of code bits (byte budget × 8) divided by the total number of pixels in the original image sequence. The PSNR values listed in Table 3.11 for low-resolution image sequences are calculated with respect to the lossless reconstruction at the corresponding resolution. Table 3.11 shows that for a given byte budget, the PSNR values increase from one resolution to the next lower one, and at each resolution level, the PSNR values increase from lower bit rates to higher ones. The corresponding byte budgets for lossless reconstruction at a variety of resolutions are provided in Table 3.12. We can see that the computational cost of decoding is reduced from one resolution level to the next lower one.
Bit Rate   Byte budget           PSNR (dB)
(bpp)      (bytes)      1/4 resolution   1/2 resolution   Full
0.03125    49152        33.12            25.53            21.74
0.0625     98304        49.73            31.61            25.80
0.125      196608       lossless         38.84            33.80
0.25       393216       lossless         43.88            37.97
0.5        786432       lossless         lossless         42.09
1.0        1572864      lossless         lossless         46.81
2.0        3145728      lossless         lossless         51.12

Table 3.11: PSNR for decoding CT skull at a variety of resolutions and bit rates
Figure 3.15 shows the first slice of the sequence decoded at 0.125 bpp at a variety of resolutions. Even at a low resolution, we get a clear view of the image sequence.
Byte budget (bytes) for lossless reconstruction
1/4 resolution   1/2 resolution   Full
137333           757110           3725185

Table 3.12: Bytes used to losslessly reconstruct CT skull at a variety of resolutions
3.4.4 Computational Complexity
One of the main advantages of 3D-SBHP is its speed of operation. In this section, we show the coding speed. 3D-SBHP has been implemented in standard C++ and compiled with the VC++.NET compiler. Tests are performed on a laptop with an Intel 1.50 GHz Pentium M processor running Microsoft Windows XP. Coding speed is measured in CPU cycles; the RDTSC (read time-stamp counter) instruction is used for the cycle count.
CT skull and MR liver t1 are selected for testing. These image sequences are compressed losslessly with GOS = 16 and code-block size 32 × 32 × 4. Three levels of spatial dyadic integer wavelet transform and two levels of temporal integer wavelet transform are applied to all image sequences using I(2,2) filters. Both the 3D-SBHP and AT-3D-SPIHT schemes perform lossless encoding. The decoding times of 3D-SBHP and AT-3D-SPIHT given in this section include the time for bit allocation and for decoding the selected bitstream. In Table 3.13, Table 3.14 and Table 3.15, we measure only the encoding and decoding time; the wavelet transform times and disk I/O times are not included.
The lossless encoding times of AT-3D-SPIHT and 3D-SBHP on CT Skull and MR liver t1 are compared in Table 3.13, measured in total CPU cycles used for the whole image sequence and in average CPU cycles per pixel. Table 3.14 compares the decoding times of AT-3D-SPIHT and 3D-SBHP on CT Skull and MR liver t1 at rates of 0.125, 0.25, 0.5 and 1.0 bpp. The comparison shows that the 3D-SBHP encoder runs around 6 times faster than the AT-3D-SPIHT encoder. As the bit rate increases from 0.125 bpp to the full bit rate, the 3D-SBHP decoder is about 2 to 10 times faster than the AT-3D-SPIHT decoder. For both schemes, the decoding time is much less than the encoding time, and the decoding time roughly doubles when the bit rate is doubled. For these two kinds of test image sequences, the average coding times per pixel are very similar at every bit rate.
Table 3.15 compares the decoding times of 3D-SBHP on CT Skull and MR liver t1 at a variety of resolutions. These variable-resolution reconstructions are decoded from the lossless bit stream. The table gives both the total CPU cycles and the cycles/pixel used to losslessly reconstruct the image sequence at the desired resolution
level. The cycles/pixel value is calculated by averaging the total cycles over the number of pixels in the original image sequence. The results show that the computational cost is reduced rapidly (to about 1/6) from one resolution level to the next lower one.
Table 3.16 compares the CPU cycles used for the wavelet transform, encoding and disk I/O operations for both AT-3D-SPIHT and 3D-SBHP on the CT Skull image sequence. In AT-3D-SPIHT, the whole image sequence is read into memory, followed by a two-level 1D wavelet transform in the axial direction and a three-level 2D dyadic wavelet transform in the spatial domain; the coding algorithm is then applied to the whole transformed sequence. In 3D-SBHP, the whole sequence is divided into GOSs of size 16, the same wavelet transform is applied to every GOS separately, and code-blocks of size 32 × 32 × 4 are coded by the 3D-SBHP coding algorithm independently. The comparison shows that with the smaller GOS size, the wavelet transform performed in 3D-SBHP is about 3 times faster than the one performed in AT-3D-SPIHT. For AT-3D-SPIHT, more CPU cycles are used for virtual memory paging because of its larger memory usage. For both 3D-SBHP and AT-3D-SPIHT, the major share of CPU cycles is spent on disk I/O. The speed of disk I/O is mainly determined by the amount of memory, the data transfer rate of the hardware, and the frequency of access to low-speed peripheral devices such as disk drives.
File          Total Cycles (×10^6)          Cycles/pixel
              3D-SBHP    AT-3D-SPIHT        3D-SBHP   AT-3D-SPIHT
CT Skull      1643.162   10086.096          130.58    801.570
MR liver t1   449.921    2560.516           143.58    813.966

Table 3.13: The comparison of lossless encoding time between AT-3D-SPIHT and 3D-SBHP on images CT skull and MR liver t1. (Wavelet transform times are not included.)
3.5 Summary and Conclusions
A low-complexity three-dimensional image coding algorithm, 3D-SBHP, is presented in this chapter. Fixed Huffman coding and one coding pass per bit plane are used to reduce the coding time. The proposed algorithm supports all functions of JPEG2000. An integer wavelet transform is used to enable lossy-to-lossless recon-
Bit Rate      Total Cycles (×10^6)        Cycles/pixel
              3D-SBHP   AT-3D-SPIHT       3D-SBHP   AT-3D-SPIHT
CT Skull
0.125         155.757   375.695           12.37     29.86
0.25          209.629   786.145           16.65     62.477
0.5           312.836   1677.159          24.86     133.29
1.0           496.994   3689.307          39.49     293.20
lossless      814.119   8333.717          64.70     662.30
MR liver t1
0.125         38.857    96.860            12.35     30.79
0.25          53.322    174.739           16.95     55.55
0.5           80.057    396.864           25.44     126.16
1.0           126.758   844.629           40.29     268.50
lossless      231.21    2142.805          73.50     681.18

Table 3.14: The comparison of decoding time between AT-3D-SPIHT and 3D-SBHP on images CT skull and MR liver t1 at a variety of bit rates. (Wavelet transform times are not included.)
Resolution    Total Cycles (×10^6)   Cycles/pixel
CT Skull
1/4           18.638                 1.481
1/2           113.901                18.104
Full          814.119                64.70
MR liver t1
1/4           6.903                  2.194
1/2           38.106                 12.113
Full          231.21                 73.50

Table 3.15: Lossless decoding time of 3D-SBHP on CT skull and MR liver t1 at a variety of resolutions
struction. The experimental results show that, with a small loss of compression efficiency, it is able to encode an image sequence around 6 times faster than AT-3D-SPIHT and, depending on the bit rate, to decode an image sequence about 2 to 10 times faster than AT-3D-SPIHT.
                         Total Cycles (×10^6)            Cycles/pixel
                         3D-SBHP       AT-3D-SPIHT       3D-SBHP    AT-3D-SPIHT
wavelet transform time   975.830       3036.749          77.55      241.33
encoding time            1643.162      10086.096         130.58     801.570
I/O time                 1105972.911   1676627.888       87894.82   133246.41
total coding time        1108591.903   1689750.733       88102.96   134289.32

Table 3.16: The comparison of CPU cycles used for wavelet transform, lossless encoding and disk I/O between AT-3D-SPIHT and 3D-SBHP on the CT skull image sequence.
[Figure 3.4: An example of scaling factors used in the integer wavelet transform to approximate a 3D unitary transform. (a) Scaling factors for a three-level 2D dyadic integer wavelet transform. (b) Wavelet packet transform in the third dimension. (c) Scaling factors for a three-level 2D integer wavelet transform with a two-level packet transform in the axial dimension.]
Figure 3.5: Wavelet decomposition structure with 2 levels of 1D packet decomposition along the axial direction, followed by 3 levels of 2D dyadic transform in the spatial domain.
[Figure 3.6: Partitioning of the code-block into sets S and I.]
[Figure 3.7: Quadtree partitioning of set S: (a) size(S) ≤ size(Max2D); (b) size(S) > size(Max2D).]

[Figure 3.8: Octave-band partitioning of set I: (a) size(S) ≤ size(Max2D); (b) size(S) > size(Max2D).]
[Figure 3.9: Set partitioning rules used by 3D-SBHP.]
[Figure 3.10: 12 resolution levels with 3-level wavelet decomposition in the spatial domain and 2-level wavelet decomposition in the spectral direction.]
[Figure 3.11: An example of 3D-SBHP SNR and resolution scalable coding. The compressed bitstream generated on bitplane α in code-block β is denoted b(α,β). Code-blocks are encoded and indexed from the lowest subband to the highest subband.]
[Figure 3.12: Bitstream structure generated by 3D-SBHP. The compressed bitstream generated on bitplane α in code-block β is denoted b(α,β). R_{i,n,k} denotes the number of bits used after the ith coding pass (i = 0: LIP pass; i = 1: LIS pass; i = 2: LSP pass) at the nth bit plane for code-block B_k, and δD_{i,n,k} denotes the derivative of the rate-distortion curve after the ith coding pass at the nth bit plane for code-block B_k.]
Figure 3.13: Reconstructed CT Skull 1st slice by 3D-SBHP, from left to
right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp,
and original slice
Figure 3.14: Reconstructed MR Liver t1 1st slice by 3D-SBHP, from left
to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0
bpp, and original slice
Figure 3.15: A visual example of resolution scalable decoding. From left
to right: 1/4, 1/2 and full resolution at 0.125 bpp
CHAPTER 4
Region-of-Interest Decoding
In interactive viewing applications, users usually need only a section of the image sequence for analysis and diagnosis. It is therefore very important to have region-of-interest retrievability, which can greatly save decoding time and transmission bandwidth. In this chapter, we illustrate how to apply 3D-SBHP to achieve Region-Of-Interest (ROI) access.
JPEG2000 offers three ROI coding methods: tiling, coefficient scaling and code-block selection. Since code-block selection doesn't require the ROI to be determined and segmented before encoding, the image sequence can be encoded only once, and it is up to the decoder to extract a subset of the bit stream to reconstruct an image region specified in spatial location and quality. This gives the user flexibility at decoding time, which is vital for interactive applications. In this chapter, we apply 3D-SBHP to achieve ROI access by the method of code-block selection.
4.1 Code-block Selection
Consider an image sequence that has been transformed using the discrete wavelet transform. The transformed image sequence exhibits a hierarchical pyramid structure, and the wavelet coefficients in the pyramid subband system are spatially correlated with some region of the image sequence. Figure 4.1 shows an example of spatial access with code-block selection and the correlation between the spatial domain and the wavelet transform domain. In 3D-SBHP, code-blocks are of a fixed size, and they represent an increasing spatial extent at lower frequency subbands. Figure 4.2 gives an example of the parent-offspring dependencies in the 3D spatial orientation tree after a 2-level wavelet packet decomposition (2D spatial + 1D temporal). Except for the coefficients in the lowest spatial and temporal subband, every coefficient located at (i, j, k) has its unique parent at (⌊i/2⌋, ⌊j/2⌋, ⌊k/2⌋) in the lower subband. All coefficients are organized in trees with roots located in the lowest subband.
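The parent-offspring relation above is a one-line coordinate map; a sketch:

```python
def parent(i, j, k):
    """Parent of coefficient (i, j, k) in the 3D orientation tree:
    (floor(i/2), floor(j/2), floor(k/2)) in the next lower subband."""
    return (i // 2, j // 2, k // 2)


def offspring(i, j, k):
    """The eight children of (i, j, k) at the next finer scale."""
    return [(2 * i + a, 2 * j + b, 2 * k + c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
```

Each coefficient has eight offspring at the next finer scale, and `parent` inverts `offspring`, which is what lets a decoder walk from a spatial region back to the subband coefficients that influence it.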
Figure 4.1: Spatial access with code-blocks.
Figure 4.2: Parent-offspring dependencies in the 3D orientation tree.
In this chapter, we consider retrieving a rectangular prism in an image sequence. Since the wavelet transform is separable, we first consider the random
access problem in one dimension.
Let A denote the upper-left corner and B the upper-right corner of the first frame of the rectangular prism, so that [x_A, x_B) is the range of the rectangular prism in the X direction. Let [x^F_{k,l}, x^R_{k,l}) denote the X-direction interval that is related to the rectangular prism at DWT level k in the low-pass or high-pass subband, where l ∈ {0, 1} indexes the low-pass and high-pass subbands, respectively. Suppose the volume size of the image sequence is W × H × D. If we do not consider the filter length, the boundaries of each interval can be found recursively using

    x^F_{k,l} = ⌊x^F_{(k−1),0} / 2⌋ + l × W/2^k,    x^F_{0,0} = x_A,
    x^R_{k,l} = ⌈x^R_{(k−1),0} / 2⌉ + l × W/2^k,    x^R_{0,0} = x_B
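The recursion can be sketched as follows (a hedged illustration; the function and variable names are ours). Integer floor and ceiling divisions implement ⌊·⌋ and ⌈·⌉:

```python
def roi_intervals(x_A, x_B, W, K):
    """X-direction intervals [x^F_{k,l}, x^R_{k,l}) related to an ROI
    spanning [x_A, x_B) along a W-sample axis, for DWT levels k = 1..K.
    l = 0 indexes the low-pass subband and l = 1 the high-pass subband;
    the filter length is ignored, as in the recursion above."""
    intervals = {}
    xF, xR = x_A, x_B                    # x^F_{0,0} = x_A, x^R_{0,0} = x_B
    for k in range(1, K + 1):
        xF, xR = xF // 2, -(-xR // 2)    # floor and ceiling halving
        for l in (0, 1):
            off = l * (W // 2 ** k)      # subband offset l * W / 2^k
            intervals[(k, l)] = (xF + off, xR + off)
    return intervals
```

For example, `roi_intervals(64, 96, 256, 1)` places the level-1 low-pass interval at (32, 48) and the high-pass interval at (160, 176).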
The spatial error penetration due to the filter length around edges can be calculated from the wavelet filter length and the number of wavelet decomposition levels. Topiwala [49] gives an approximate equation for the error penetration, by which the spread of the error E (in pixels) as a function of the wavelet filter length L and the number of wavelet decomposition levels K is given by

    E(K, L) = (2^K − 1)(2 · (L−3)/2 + 1),   L even
    E(K, L) = (2^K − 1)(2 · (L−2)/2 + 1),   L odd        (4.1)

As shown in Equation 4.1, the number of error-penetrated pixels grows exponentially in the number of decomposition levels and is proportional to the synthesis filter length. To get perfect reconstruction of the ROI, we must take the filter length into account.
Suppose we have a synthesis filter with filter length L = M + N + 1,

    g_n = Σ_{i=−M}^{N} a_i × f_{n+i},
Figure 4.3: 2D example of code-block selection. Filter length is considered.
then the boundaries of each interval become

    x^F_{k,l} = max{0, ⌊(x^F_{(k−1),0} − M) / 2⌋} + l × W/2^k,            x^F_{0,0} = x_A,
    x^R_{k,l} = min{⌈(x^R_{(k−1),0} + N) / 2⌉, W/2^k − 1} + l × W/2^k,    x^R_{0,0} = x_B
Similarly, the boundaries of each interval in the Y direction, [y^F_{k,l}, y^R_{k,l}), and in the temporal direction, [z^F_{k,l}, z^R_{k,l}), can be found following the same principle.
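The filter-aware recursion along one axis can be sketched in the same style as before (an illustration under our naming; M and N are the one-sided extents of the synthesis filter g_n = Σ_{i=−M}^{N} a_i f_{n+i}):

```python
def roi_intervals_fl(x_A, x_B, W, K, M, N):
    """Filter-aware subband intervals along one axis: each interval is
    widened by M samples on the left and N on the right before halving,
    then clipped to the subband of width W / 2^k."""
    intervals = {}
    xF, xR = x_A, x_B
    for k in range(1, K + 1):
        sub = W // 2 ** k                      # subband width W / 2^k
        xF = max(0, (xF - M) // 2)             # floor, clipped at 0
        xR = min(-(-(xR + N) // 2), sub - 1)   # ceiling, clipped at sub - 1
        for l in (0, 1):
            intervals[(k, l)] = (xF + l * sub, xR + l * sub)
    return intervals
```

Compared with the filter-free recursion, each level inflates the interval by the filter support before halving, which is exactly the error-penetration effect described above.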
Suppose that an image sequence is decomposed to level K in the spatial domain and level T in the temporal domain with synthesis filter length L, and coded with code-block size O × P × Q, and suppose we wish to reconstruct an X × Y × Z (Z ≤ GOS size) 3D region, where

    X = x^R_{0,0} − x^F_{0,0},    Y = y^R_{0,0} − y^F_{0,0},    Z = z^R_{0,0} − z^F_{0,0}.
The number of decoded code-blocks, denoted N_B, is given below:

    N_B = Σ_{j=1}^{K} Σ_{l=S}^{1} s × (⌈x^R_{j,l}/O⌉ − ⌊x^F_{j,l}/O⌋ + 1) × (⌈y^R_{j,l}/P⌉ − ⌊y^F_{j,l}/P⌋ + 1)
              × ( Σ_{i=1}^{T} Σ_{n=t}^{1} (⌈z^R_{i,n}/Q⌉ − ⌊z^F_{i,n}/Q⌋ + 1) )

where

    S = 1 if j < K, 0 if j = K;    t = 1 if i < T, 0 if i = T;    s = 3 if S = 1, 1 if S = 0        (4.2)
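Each bracketed factor in the N_B sum counts the code-blocks intersected along one axis; a small helper makes the counting rule explicit (a sketch of that per-axis factor only, with names of our choosing):

```python
import math

def blocks_along_axis(cF, cR, B):
    """Code-blocks of size B intersected by the coefficient interval
    [cF, cR], following the per-axis factor in the N_B formula:
    ceil(cR / B) - floor(cF / B) + 1."""
    return math.ceil(cR / B) - math.floor(cF / B) + 1
```

The product of the three axis counts gives one subband's contribution to N_B.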
For example, consider a 32×32×4 3D region positioned at row 64, column 90, in frame number 5 of an image sequence that is decomposed to level 2 with synthesis filter length 3 and coded with code-block size 16×16×2. If we do not consider the filter length, 96 code-blocks are needed for reconstruction of the ROI; we call these code-blocks type I code-blocks. To losslessly reconstruct this region, 156 code-blocks are needed, i.e., 60 more code-blocks are used for lossless reconstruction; we call these extra code-blocks type II code-blocks. Since subband transforms are not shift invariant, the same 3D region positioned at different locations may need different numbers of code-blocks for reconstruction. Figure 4.3 shows a 2D illustration of code-block selection where the filter length is considered. Type I code-blocks and type II code-blocks are labelled in the figure.
Figure 4.4(a) and Figure 4.4(b) give 2D and 3D visual examples of ROI decoding. In the 3D example, the ROI extends from (134, 117, 17) to (198, 181, 112).

Figure 4.4: A visual example of 3D-SBHP random access decoding. (a) A 2D example: left, the 17th slice of the CT skull sequence at 1/2 resolution; right, the 17th slice of the ROI-decoded image sequence at full resolution. (b) A 3D example: left, the CT skull sequence; right, the ROI-decoded image sequence.
4.2 Random Accessibility
In this section, we assess the impact that the wavelet transform and the code-block configuration have on the compression efficiency and accessibility of the scalable bitstream generated by 3D-SBHP.
4.2.1 Wavelet Transform vs. Random Accessibility
As filtering is an expansive operation, the number of wavelet coefficients that need to be extracted for reconstruction always exceeds the number of pixels contained in the rectangular prism of interest. As shown in Equation 4.2, the number of code-blocks needed to reconstruct a given region is a function of both the wavelet decomposition level and the filter length. In this section, we show their effects on random accessibility.
4.2.1.1 Filter Implementation
For comparison, we use the 2-tap S, I(2,2), I(4,2) and I(4,4) filters to construct wavelet transforms that map integers to integers. The equations of the filters are given below. Table 4.1 gives the number of low-pass and high-pass filter taps for these four integer filters.
    2-tap S filter:
        h_m = c_{2m+1} − c_{2m}
        l_m = c_{2m} + ⌊h_m / 2⌋                                                        (4.3)

    I(2,2):
        h_m = c_{2m+1} − ⌊(c_{2m} + c_{2m+2})/2 + 1/2⌋
        l_m = c_{2m} + ⌊(h_{m−1} + h_m)/4 + 1/2⌋                                        (4.4)

    I(4,2):
        h_m = c_{2m+1} − ⌊(9/16)(c_{2m} + c_{2m+2}) − (1/16)(c_{2m−2} + c_{2m+4}) + 1/2⌋
        l_m = c_{2m} + ⌊(h_{m−1} + h_m)/4 + 1/2⌋                                        (4.5)

    I(4,4):
        h_m = c_{2m+1} − ⌊(9/16)(c_{2m} + c_{2m+2}) − (1/16)(c_{2m−2} + c_{2m+4}) + 1/2⌋
        l_m = c_{2m} + ⌊(9/32)(h_{m−1} + h_m) − (1/32)(h_{m−2} + h_{m+1}) + 1/2⌋        (4.6)
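To make the lifting structure concrete, here is a minimal one-level implementation of the I(2,2) transform of Equation 4.4 (the index-clamping boundary handling and the names are our assumptions, not necessarily those of the 3D-SBHP codec; note that ⌊x/2 + 1/2⌋ = ⌊(x + 1)/2⌋ and ⌊x/4 + 1/2⌋ = ⌊(x + 2)/4⌋):

```python
def i22_forward(c):
    """One level of the I(2,2) integer wavelet transform (Eq. 4.4)
    for an even-length integer signal c."""
    n2 = len(c) // 2
    e = c[0::2]                                  # even samples c_{2m}
    ec = lambda m: e[min(max(m, 0), n2 - 1)]     # clamped boundary access
    h = [c[2 * m + 1] - (ec(m) + ec(m + 1) + 1) // 2 for m in range(n2)]
    hc = lambda m: h[min(max(m, 0), n2 - 1)]
    l = [e[m] + (hc(m - 1) + hc(m) + 2) // 4 for m in range(n2)]
    return l, h

def i22_inverse(l, h):
    """Exact inverse: the lifting steps are undone in reverse order."""
    n2 = len(l)
    hc = lambda m: h[min(max(m, 0), n2 - 1)]
    e = [l[m] - (hc(m - 1) + hc(m) + 2) // 4 for m in range(n2)]
    ec = lambda m: e[min(max(m, 0), n2 - 1)]
    c = []
    for m in range(n2):
        c += [e[m], h[m] + (ec(m) + ec(m + 1) + 1) // 2]
    return c
```

Because every step is an integer lifting step, the inverse reproduces the input exactly, which is what enables lossless coding.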
    Filter Name   Number of low-pass filter taps   Number of high-pass filter taps
    2-tap S       2                                2
    I(2,2)        5                                3
    I(4,2)        9                                7
    I(4,4)        13                               7

Table 4.1: Number of taps of the integer filters.
4.2.1.2 ROI decoding performance by use of different wavelet filters and wavelet decomposition levels
Tables 4.2 and 4.3 show the effect of different filter lengths and wavelet decomposition levels in the spatial direction on ROI decoding performance. A two-level wavelet transform with the I(2,2) filter is used in the temporal direction in all cases. The experiment is performed on the CT Carotid sequence with code-block size 8 × 8 × 2, and the rectangular region of size 64 × 64 × 64 with its lower-left corner at the center of the image slice is selected as the ROI. The distortions are compared after all type I code-blocks are decoded. We see that decreasing the synthesis filter length offers slightly better distortion when the number of taps is larger than two. As shown in Figure 4.7(b), only the 2-tap S filter allows perfect separation of the background from the ROI, although its compression performance is inferior to that of the longer filters. For the other, longer filters, the quality of the ROI suffers much from error penetration, as shown in Figures 4.7(c)-4.7(e). Table 4.3 gives the performance when the spatial wavelet decomposition level is 2; it provides more than 4 dB better performance for all filters than 3-level decomposition. Again, we see that fewer decomposition levels offer better ROI quality at the price of decreased compression capability. As shown in Equation 4.1, the number of error-penetrated pixels grows exponentially in the number of decomposition levels and is proportional to the synthesis filter length. To improve ROI quality, reducing the number of decomposition levels and the filter length is preferred. However, if the compression efficiency and the quality of the background are also of some importance, a balance has to be struck.
Table 4.2: Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64×64×64, code-block size = 8×8×2, spatial wavelet decomposition level = 3)

    Filter    Type I code-blocks          Bit rate for losslessly   Bit rate for lossless
    Type      Bit rate (bpp)  PSNR (dB)   decoding ROI (bpp)        compression (bpp)
    2-tap S   0.2237          lossless    0.2237                    2.7647
    I(2,2)    0.1967          18.83       0.6150                    2.5973
    I(4,2)    0.1912          18.48       0.6018                    2.5802
    I(4,4)    0.1922          18.49       0.6058                    2.5933
Table 4.3: Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64×64×64, code-block size = 8×8×2, spatial wavelet decomposition level = 2)

    Filter    Type I code-blocks          Bit rate for losslessly   Bit rate for lossless
    Type      Bit rate (bpp)  PSNR (dB)   decoding ROI (bpp)        compression (bpp)
    2-tap S   0.2267          lossless    0.2267                    2.7796
    I(2,2)    0.1999          22.87       0.5541                    2.6058
    I(4,2)    0.1937          22.57       0.5397                    2.5888
    I(4,4)    0.1947          22.59       0.5433                    2.6020

4.2.2 Code-block Configurations vs. Random Accessibility
In this section, we turn our attention to code-block configurations that optimize random accessibility and compression efficiency. We use the integer filter I(2,2) with three levels of spatial dyadic wavelet transform and two levels of temporal wavelet transform for comparison.
4.2.2.1 Lossy-to-lossless coding performance by use of different code-block sizes
We first compare the lossy-to-lossless coding performance using different coding units. We test 3-D SBHP on a set of eight 8-bit medical image sequences; Table 4.4 gives a description of these sequences. Figure 4.5 illustrates the effect of increasing the code-block size on the rate-distortion performance. Average PSNR is calculated over all eight image sequences. The results show only a subtle decrease in rate-distortion performance when the code-block size is reduced from 64 × 64 × 4 to 32 × 32 × 2. A significant loss is observed when the code-block size is less
Figure 4.5: Rate-distortion performance with increasing code-block size.
than 16 × 16 × 2. This performance decrease is particularly significant at bit rates below 0.5 bpp, where there is as much as a 10 dB decrease in PSNR for the 16 × 16 × 2 code-block size and a 20 dB decrease for the 8 × 8 × 2 code-block size compared to the other three larger code-block sizes at a given bit rate. The reason for the reduction in coding efficiency is that a smaller code-block size increases the total overhead for the whole image sequence. As shown in the figure, for the 8 × 8 × 2 code-block size, there is almost no PSNR increase when the bit rate increases from 0.03125 bpp to 0.125 bpp; that is, all decoded bits are overhead bits.
Table 4.4: Description of the image volumes

    File Name    Image Type   Volume Size
    Skull        CT           256 × 256 × 192
    Wrist        CT           256 × 256 × 176
    Carotid      CT           256 × 256 × 64
    Aperts       CT           256 × 256 × 96
    Liver t1     MR           256 × 256 × 48
    Liver t2e1   MR           256 × 256 × 48
    Sag head     MR           256 × 256 × 48
    Ped chest    MR           256 × 256 × 64
4.2.2.2 ROI decoding performance by use of different code-block sizes and ROI sizes
In Figure 4.6, we show the inter-dependence between ROI size and code-block size. The experiment is performed on the CT Carotid sequence; rectangular regions of sizes 16 × 16 × 64, 32 × 32 × 64 and 64 × 64 × 64, each with its lower-left corner at the center of the image slice, are selected as ROIs. For every ROI size, the figure gives the performance of using 8 × 8 × 2, 16 × 16 × 2 and 32 × 32 × 2 code-blocks. All type I code-blocks are decoded before type II code-blocks. It can be seen that code-block size 8 × 8 × 2 gives the best lossless decoding performance in all three ROI-size experiments, and enlarging the ROI increases the bit rate at which the ROI is losslessly decoded only when the ROI size is larger than or equal to the image region to which a code-block in the highest subband is correlated. In the figure, the rate-distortion point at which all type I code-blocks are fully decoded is labelled by a black × on every curve. It is clear that when type II code-blocks are decoded, for a given code-block size, smaller ROI curves have higher slope, and for a given ROI size, the curves with smaller code-block size have higher slope. A higher slope indicates that the type II code-blocks are more important than those in the lower-slope case. It also indicates that in the smaller code-block cases, a higher percentage of the bits decoded from type II code-blocks is used for perfect reconstruction of the ROI, while in the larger code-block cases, more of the bits decoded from type II code-blocks contribute to the background. However, the larger code-block sizes give better rate-distortion performance at low bit rates; this occurs because of the per-code-block overhead. Therefore, in applications where only a high-quality ROI is required or the ROI size is small, smaller code-blocks (say 8 × 8 × 2) should be used, whereas if the desired bit rate is very low, or the background is also of some importance, larger code-blocks should be used.
4.2.3 ROI access performance by use of different bit allocation methods
In all of our previous experiments, the bit allocation method we used for the ROI is as follows: we first allocate bits to type I code-blocks from the higher bit-planes to the lower bit-planes, then allocate the remainder of the bit budget to the type II code-
Figure 4.6: Rate-distortion performance with increasing ROI size. (a) ROI size = 16 × 16 × 64; (b) ROI size = 32 × 32 × 64; (c) ROI size = 64 × 64 × 64.
Figure 4.7: A visual example of ROI decoding from the 3-D SBHP bit stream using different wavelet filters. (a) The 5th slice of the original CT Carotid image sequence; (b) the ROI-decoded image slice with the Haar filter; (c) with the I(2,2) filter; (d) with the I(4,2) filter; (e) with the I(4,4) filter.
blocks. In Figure 4.6(b), when 8 × 8 × 2 code-blocks are used, all type I code-blocks are fully decoded at rate 0.062 bpp with a low PSNR of 19.37 dB. When the bit rate increases from 0.0315 bpp to 0.062 bpp, no significant performance improvement is shown in the figure, while when the type II code-blocks are decoded, the quality increases sharply. That means that type II code-blocks play an important role in improving the quality, and the low bit-planes (corresponding to the least significant bits of the wavelet coefficients in the ROI) in type I code-blocks contribute very little to the visual quality of the ROI. Therefore, it would make sense to terminate decoding of type I code-blocks before we reach the low bit-planes and instead send high bit-planes from the type II code-blocks when the given bit rate is not sufficient to fully decode all ROI-related code-blocks. In Figure 4.8, we compare the rate-distortion performance of three different bit allocation methods. The first, shown as decode priority 1 in the figure, is the decoding scheme we used in the last experiment. The second, shown as decode priority 2, gives corresponding bit-planes in the two kinds of code-blocks the same priority; that is, it allocates bits to all ROI-related code-blocks together from the highest bit-plane to the lowest bit-plane. In the third scheme, shown as decode priority 3, we first allocate bits to the type I code-blocks from the highest bit-plane down to the fourth lowest bit-plane, then allocate the remaining bit budget to all ROI-related code-blocks from the higher bit-planes to the lower bit-planes. Comparing these three schemes, although they achieve lossless decoding of the ROI at the same bit rate, their rate-distortion curves are significantly different. At low bit rates (< 0.1 bpp), the third scheme gives at most 15 dB and 5 dB better performance than the first and second schemes, respectively, whereas at higher bit rates, the first scheme performs best. Therefore, given a bit rate, to get the best lossy ROI decoding performance, we need to find an optimal bit allocation method according to the relative importance of type I code-blocks and type II code-blocks. As we discussed in previous sections, the relative importance of these two kinds of blocks depends on the code-block size, ROI size, filter length and wavelet decomposition level.
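The third allocation scheme can be sketched as an ordering of (code-block type, bit-plane) decoding steps (a simplified illustration with names of our choosing; a real allocator would also track the remaining bit budget):

```python
def priority3_order(n_planes, cutoff):
    """Decode priority 3: type I code-blocks from the top bit-plane
    down to `cutoff`, then all remaining (type, plane) pairs from the
    higher bit-planes to the lower ones."""
    order = [('I', p) for p in range(n_planes - 1, cutoff - 1, -1)]
    for p in range(n_planes - 1, -1, -1):
        order.append(('II', p))
        if p < cutoff:
            order.append(('I', p))
    return order
```

Priorities 1 and 2 correspond to simpler orderings: all type I planes first, or all planes of both types interleaved from the top.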
Figure 4.8: Rate-distortion performance with different priorities for code-blocks.

4.3 Conclusions
In this chapter, we presented the 3D-SBHP algorithm and empirically investigated the code-block selection method of ROI access by applying 3D-SBHP to medical volumetric images. Our work shows that ROI access performance is affected by several coding parameters, and we outlined some of the trade-offs involved in ROI access. Finally, we gave a possible way to optimize ROI access performance at the decoder side.
CHAPTER 5
Multistage Lattice Vector Quantization for Hyperspectral
Image Compression
Lattice vector quantization (LVQ) offers a substantial reduction in computational load and design complexity due to the regular structure of the lattice [52]. In this chapter, we extend the SPIHT coding algorithm with lattice vector quantization to code hyperspectral images. In the proposed algorithm, multistage lattice vector quantization (MLVQ) is used to exploit correlations between image slices, while offering successive refinement with low coding complexity and computation. Different LVQs, including cubic Z4 and D4, are considered, and their performance is compared with other 2D and 3D wavelet-based image compression algorithms.
5.1 Introduction
As we mentioned in Chapter 1, the transform, the quantization and the coding of quantized coefficients are all candidates for exploiting the relationships between the slices. Due to its superior performance over scalar quantization, vector quantization has been applied in many wavelet-based coding algorithms.
The Linde-Buzo-Gray (LBG) algorithm [55] is the most common approach to designing vector quantizers. In [53], subband image coding with VQ generated by an LBG codebook is proposed. The LBG training algorithm incurs high computational cost and coding complexity, especially as the vector dimension and bit rate increase. Lattice vector quantization, which is an extension of uniform scalar quantization to multiple dimensions, is an approach that reduces the computational complexity [57].
Plain lattice vector quantization of wavelet coefficient vectors has been successfully employed for image compression [61, 62, 63]. In order to improve performance, it is reasonable to consider combining LVQ with powerful wavelet-based zerotree or set-partitioning image coding methods and with the bitplane-wise successive refinement methodologies for scalar sources, as in EZW, SPIHT and SPECK. In [64], multistage lattice vector quantization is used along with both a zerotree structure and a quadtree structure and produces results comparable to JPEG 2000 at low bit rates. VEZW [65] and VSPIHT [66, 67, 68] have successfully employed LVQ with 2D-EZW and 2D-SPIHT, respectively. In VSPECK [69], tree-structured vector quantization (TSVQ) [70] and ECVQ [71] are used to code the significant coefficients for 2D-SPECK.
For volumetric images, and especially for hyperspectral images, neighboring slices convey highly related spatial details. Since VQ has the ability to exploit the statistical correlation between neighboring data in a straightforward manner, we use VQ on volumetric images to exploit the correlation in the axial direction. In particular, multistage LVQ is used to obtain the counterpart of bitplane-wise successive refinement, where successive lattice codebooks in the shape of Voronoi regions of a multidimensional lattice are used.
This chapter is organized as follows. We first review basic lattice vector quantization and multistage LVQ. The multistage LVQ-based SPIHT (MLVQ-SPIHT) algorithm is given in Section 5.3. The performance of MLVQ-SPIHT for hyperspectral image compression is presented in Section 5.4. Section 5.5 concludes the chapter.
5.2 Vector Quantization
The basic idea of a vector quantizer is to quantize pixel sequences rather than single pixels. A vector quantizer of dimension n and size L is defined as a function that maps an arbitrary vector X ∈ R^n into one of L output vectors Y_1, Y_2, ..., Y_L, called codevectors, belonging to R^n. The vector quantizer is completely specified by the L codevectors and their corresponding nonoverlapping partitions of R^n, called Voronoi regions. A Voronoi region V_i is defined by the equation [72]

    V_i = {X ∈ R^n : ||X − Y_i|| ≤ ||X − Y_j||, ∀j ≠ i}        (5.1)
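Equation 5.1 translates directly into a brute-force nearest-codevector search (shown only to fix ideas; the lattice quantizers below exist precisely to avoid this exhaustive scan):

```python
def vq_encode(x, codebook):
    """Map x to the index of the nearest codevector in the Euclidean
    sense, i.e. the codevector whose Voronoi region contains x."""
    dist2 = lambda y: sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))
```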
Given a desired bit rate per dimension b and vector dimension n, the codebook size is 2^{bn}. Although the Linde-Buzo-Gray (LBG) algorithm can generate locally optimal codebooks, the complexity grows exponentially because the codewords have to be compared and chosen among 2^{bn} possible vectors.
Lattice vector quantization (LVQ), which builds codebooks as subsets of multidimensional lattices, solves the complexity problem of LBG-based vector quantizers and yields very general codebooks.
5.2.1 Lattice Vector Quantization
A lattice L in R^n is composed of all integral combinations of a set of linearly independent vectors. That is,

    L = {Y | Y = u_1 a_1 + ... + u_n a_n}        (5.2)

where {a_1, ..., a_n} is a set of n linearly independent vectors and the {u_1, ..., u_n} are integers. A lattice coset Λ is obtained from a lattice L by adding a fixed translation vector t to the points of the lattice:

    Λ = {Y | Y = u_1 a_1 + ... + u_n a_n + t}        (5.3)

Around each point Y_i in a lattice coset Λ, an associated nearest-neighbour set of points called its Voronoi region is defined as [72]

    V(Λ, Y_i) = {X ∈ R^n : ||X − Y_i|| ≤ ||X − Y_j||, ∀Y_j ∈ Λ}        (5.4)

The zero-centered Voronoi region V(Λ, 0) is defined as

    V(Λ, 0) = V(Λ, Y_i) − Y_i        (5.5)

In lattice vector quantization, the input vector is mapped to the lattice points of a certain chosen lattice type. The lattice points or codewords may be selected from the coset points or the truncated lattice points [72].
5.2.1.1 Classical Lattices
Conway and Sloane [57] investigated lattice properties and determined the optimal lattices for several dimensions. They also give a fast quantization algorithm [58] that makes searching for the closest lattice point to a given vector extremely fast.
The cubic lattice Z^n is the simplest lattice form; it consists of all the integer points of the coordinate system for a given lattice dimension. Other important lattices are the root lattices A_n (n ≥ 1), D_n (n ≥ 2), E_n (n = 6, 7, 8) and the Barnes-Wall lattice Λ_16. These lattices give the best sphere packings and coverings in their respective dimensions [52]. In this chapter, we use Z^4, D_4 and A_2. The A_n lattice is defined as follows:

    A_n = {(x_0, x_1, ..., x_n) ∈ Z^{n+1} : x_0 + ... + x_n = 0}, for n ≥ 1        (5.6)

For n ≥ 3, the D_n lattice is defined as follows:

    D_n = {(x_1, ..., x_n) ∈ Z^n : x_1 + ... + x_n even}        (5.7)

The lattice E_8 is defined by

    E_8 = D_8 ∪ ((1/2)·1_8 + D_8)        (5.8)

where 1_8 stands for the all-one vector of 8 dimensions.
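For D_n, the fast quantization of Conway and Sloane [58] reduces nearest-point search to rounding: round every coordinate, and if the coordinate sum is odd, re-round the worst coordinate the other way. A minimal sketch (ties at exact half-integers would need extra care):

```python
def nearest_Dn(x):
    """Nearest D_n lattice point (integer coordinates with even sum)
    to a real vector x, via the Conway-Sloane rounding trick."""
    f = [round(v) for v in x]
    if sum(f) % 2 != 0:
        # coordinate with the largest rounding error
        i = max(range(len(x)), key=lambda j: abs(x[j] - f[j]))
        f[i] += 1 if x[i] > f[i] else -1
    return f
```

The cost is a single pass over the coordinates, independent of any codebook size, which is the complexity advantage LVQ has over LBG search.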
5.2.1.2 LVQ Codebook
The codebook of a lattice quantizer is obtained by selecting a finite number of lattice points out of an infinite lattice. An LVQ codebook is determined by a root lattice, a truncation and a scaling factor. The root lattice is the lattice coset from which the codebook is actually constructed. A truncation must be applied to the root lattice in order to select a finite number of lattice points and quantize input data of finite energy. The bit rate of the LVQ is determined by the number of points in the truncated area. To obtain the best rate-distortion trade-off, we must scale and truncate the lattice properly. To do this, we need to know how many lattice points lie within the truncated area, i.e., we need to know the shape of the truncated area.
Two kinds of truncation shapes are considered in this chapter. When the signal to be compressed has an i.i.d. multivariate Gaussian distribution, the surfaces of equal probability are ordinary spheres, and the truncated area is spherical [62]. In such applications the size of the codebook can be calculated by the theta function of the lattice, described in [52]. In the case of Laplacian sources (for the cubic lattice), the surfaces of equal probability are spheres under the L1 metric, which are sometimes called pyramids. The number of lattice points Num(n, r) lying on a hyper-pyramid of radius r in n-dimensional space R^n is given by Fischer [60] as

    Num(n, r) = Num(n − 1, r) + Num(n − 1, r − 1) + Num(n, r − 1)        (5.9)

The truncation is determined by specifying the shape and radius of the hyper-sphere/hyper-pyramid that best matches the probability distribution of the input source. The scaling factor is used to control the distance between any two nearest lattice points, i.e., the maximum granular error of the quantizer [64]. The support of the distribution of the granular quantization error has the shape of the Voronoi region.
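Fischer's recursion (Eq. 5.9) is easy to evaluate once base cases are added; we assume the conventional boundary conditions Num(n, 0) = 1 (the origin) and Num(0, r) = 0 for r > 0:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num(n, r):
    """Number of integer points in Z^n on the hyper-pyramid surface
    |x_1| + ... + |x_n| = r (Fischer's recursion, Eq. 5.9)."""
    if r == 0:
        return 1                 # only the origin
    if n == 0:
        return 0                 # no coordinates left but r > 0
    return num(n - 1, r) + num(n - 1, r - 1) + num(n, r - 1)
```

For example, num(4, 2) = 32 and num(4, 4) = 192, matching the shell sizes quoted for the pyramid D4 codebook in Section 5.3; for even radii the Z^4 and D4 counts coincide, since the parity of the L1 norm equals the parity of the coordinate sum.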
5.2.2 Multistage Lattice Vector Quantization
The essence of our successive-refinement lattice VQ is to generate a series of decreasing-scale zero-centered Voronoi lattice regions, V_0(Λ_0, 0), V_1(Λ_1, 0), V_2(Λ_2, 0), ..., each covering the zero-centered Voronoi region of the previous, higher scale. The coarsest-scale quantizer is completely specified by the lattice points y_i and their corresponding nonoverlapping Voronoi regions V_0(Λ_0, y_i). To prevent divergence of the overload quantization error, the truncated LVQ at the current stage should be able to cover the Voronoi region of the previous stage. On the other hand, any overlap of quantization regions at two successive stages will decrease compression efficiency, so the optimal truncated lattice should coincide with the Voronoi region of the root lattice [64]. However, this optimal condition cannot always be satisfied.
Figure 5.1 gives an example of this multistage LVQ with the hexagonal A_2 lattice and scale-down factor r = 4. First, the input vector x ∈ R^n is quantized to the output vector u_0 = y_0 by the coarsest-scale quantizer. The uncertainty in x has been reduced to the Voronoi region V_0(Λ_0, y_0) around the chosen codevector y_0. The next quantizer quantizes the approximation error (x − u_0), which falls into the zero-centered Voronoi region V_0(Λ_0, 0), using a finer lattice VQ to obtain a refinement u_1 = z_1. Now the uncertainty in x is reduced to the zero-centered Voronoi region V_1(Λ_1, 0) of lattice coset Λ_1. The next finer-scale quantizer quantizes the error (x − u_0 − u_1), reducing the uncertainty in x to the zero-centered Voronoi region of Λ_2. Continuing in this way, the final approximation x̂ of vector x is

    x̂ = u_0 + u_1 + u_2 + ...
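The refinement loop can be sketched generically; here `quantize` stands for any unit-scale lattice quantizer (e.g. coordinate rounding for Z^n) and r is the scale-down factor. This is an illustration of the accumulation x̂ = u_0 + u_1 + ..., not the thesis implementation:

```python
def multistage_quantize(x, quantize, stages, r=4.0):
    """Accumulate x_hat = u0 + u1 + u2 + ... by repeatedly quantizing
    the residual with the lattice scaled down by r at each stage."""
    x_hat = [0.0] * len(x)
    scale = 1.0
    for _ in range(stages):
        resid = [xi - hi for xi, hi in zip(x, x_hat)]
        u = [scale * q for q in quantize([ri / scale for ri in resid])]
        x_hat = [hi + ui for hi, ui in zip(x_hat, u)]
        scale /= r
    return x_hat
```

With rounding as the quantizer, each stage shrinks the residual, mirroring the shrinking Voronoi regions of Figure 5.1.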
Figure 5.1: Multistage lattice VQ with A2 lattice.
5.3 MLVQ-SPIHT
In this section, we describe our new algorithm, MLVQ-SPIHT, which combines the multistage lattice vector quantization methodology with the SPIHT coding algorithm to code 3D hyperspectral image data sets.
In MLVQ-SPIHT, a 2D DWT is applied to each image slice independently. For a given vector dimension N, we segment the sequence of transformed images into groups of slices with GOS = N; N = 4 is used for our application. The coding algorithm is applied to every GOS independently. For the transformed image slices in the same GOS, we group wavelet coefficients at the same spatial location into vectors. For example, for spatial location (i, j) and transformed slices S_0, S_1, ..., S_{N−1} in the GOS, the vector associated with this location is v(i, j) = (S_0(i, j), S_1(i, j), ..., S_{N−1}(i, j)). The parent-child relationship between the vectors in different subbands is the same as in [19]. Figure 5.2 gives an example of the parent-child relationship between vectors when the vector dimension is N = 4.
Figure 5.2: An example of the parent-child relationship between vectors when vector dimension N = 4.
The SPIHT algorithm is used to search for significance at the current metric threshold, which is based on certain pre-defined decision regions that gradually decrease in scale following a given rule. Every decision region is defined by two surfaces (one on the inside, the other on the outside) enclosing the origin that successively decrease in size. For every given decision region, the SPIHT algorithm is used to test the significance of the N-dimensional vectors. Each sorting pass locates significant vectors and roughly quantizes these significant vectors in the same pass. The vectors ascertained as significant in a pass are progressively refined in successive passes using our multistage LVQ. Figure 5.3 uses the A_2 lattice to illustrate our vector SPIHT, where the lattice at each stage decreases in scale by a factor r = 4, the threshold for the SPIHT sorting pass decreases in scale by a factor of 2, and the L2 norm is used for the significance test. The wavelet vectors are first scaled so that all scaled vectors lie within or on the hyperspherical surface of L2 norm equal to a given standardized value R. For the first sorting pass, the significant region is bounded on the outside by the hyperspherical surface of L2 norm R and on the inside by the hyperspherical surface of L2 norm R/2. For the following sorting passes, the significant regions are bounded by zero-centered hyperspherical surfaces, with the inside one having half the scale of the outside one. For example, if a vector is ascertained as significant in the first sorting pass, i.e., the vector is located in the 1st significant region, the vector is roughly encoded by the first-stage LVQ of the 1st significant region, which uses translations of the V/2 lattice coset. When the sorting pass reaches the 3rd significant region, the vector ascertained as significant in the 1st pass is refined by the second-stage LVQ, which uses translations of the V/8 lattice coset. As shown in Figure 5.3, the bracketed sequences denote the successively lower-scale lattices used to quantize vectors in each significant region. We believe this scheme can provide good compression performance with successive refinement.
Based on the above scheme, we implemented a SPIHT-based coding algorithm
using four-dimensional wavelet vectors as shown in Figure 5.2. Several LVQs are
implemented in our scheme.
5.3.0.1 Cubic Z4 LVQ
To define a cubic Z4 LVQ codebook, the root lattice Z4 and a cubic truncation are used. The cubic truncation requires the L∞ norm (maximum norm) for vector magnitude measurement. The L∞ norm is defined as

    ||X||_∞ := max(|x_1|, ..., |x_N|)
Figure 5.3: Vector SPIHT with successive refinement LVQ. The 1st significant region uses lattice scales V/2 (V/8, V/32, ...), the 2nd uses V/4 (V/16, V/64, ...), and the 3rd uses V/8 (V/32, V/128, ...).
For cubic truncation, the bit rate is evenly allocated to each of the lattice's N dimensions. This implies that cubic Z4 LVQ is actually equivalent to four individual scalar quantizers applied independently to each of the four coefficients in a vector. The cubic truncation area has exactly the same shape as the Voronoi region of the corresponding root lattice, and the number of codewords in the codebook can always be an integer power of 2, which prevents loss in coding efficiency. However, cubic truncation does not match the typical distributions of subband coefficients well, which decreases compression performance. Two different bit rates are used in our cubic Z4 multistage LVQ. When a newly significant vector is quantized in its first-stage quantizer, an 8-bit LVQ is used to quantize both significance and signs; in all refinement stages, a 4-bit LVQ is used. The truncation radii of these two kinds of LVQ are therefore 2 and 1, respectively. If the threshold is scaled down by two at each successive layer, these layers are equivalent to bit planes.
5.3.0.2 Pyramid D4 LVQ
To define a pyramid D4 LVQ codebook, the root lattice D4 and a pyramid truncation are used. The pyramid truncation requires the L1 norm for vector magnitude measurement. The L1 norm is defined as

‖X‖1 := ∑_{i=1}^{N} |x_i|
In our implementation, the truncation radius is set to four. Lattice points inside this truncation area lie on two hyper-pyramid surfaces with constant L1 norms 2 and 4, respectively. The numbers of lattice points on these two shells are 32 and 192, respectively [62], so 8-bit indexes are used to code these 225 codewords (the two shells plus the zero vector). The same LVQ codebook is used in all stages. Since the Voronoi region is closer to a sphere and is inconsistent with the shape of the pyramid truncation, the scale-down factor is set to 1/3 [64] to get the best balance between the overlaps and gaps between the Voronoi region at the current stage and that of the previous stage.
5.3.0.3 Sphere D4 LVQ
To define a sphere D4 LVQ codebook, the root lattice D4 and a sphere truncation are used. The sphere truncation requires the L2 norm for vector magnitude measurement. The L2 norm is defined as

‖X‖2 := (∑_{i=1}^{N} |x_i|²)^{1/2}

In our implementation, the truncation radius is set to 2. Lattice points inside this truncation area lie on two hyper-sphere surfaces with constant L2 norms √2 and 2, respectively. The number of lattice points on each of these two shells is 24 [62], so 6-bit indexes are used to code these 48 codewords. The scale-down factor is set to 1/2.
5.4 Experimental Results
The proposed MLVQ-SPIHT algorithm is used to compress the hyperspectral image "Moffett Field". Its properties are shown in Table 5.1.
The pyramid wavelet decomposition employed here uses the S+P wavelet filter, and a 5-level spatial transform is performed. After wavelet transformation, the magnitude of each vector is calculated according to the norm corresponding to the particular LVQ. The fast quantizing and coding algorithm proposed by Conway and Sloane [58, 59] is used to code the significant vectors. For each significant region, the LVQ indices are coded using an adaptive arithmetic coder, and the significance information is adaptively arithmetic coded as described in [19].

File Name         Image Type   Volume Size       Bit Depth (bit/pixel)   Power (σx²)
moffett scene 1   AVIRIS       512 × 512 × 224   16                      4803298
moffett scene 3   AVIRIS       512 × 512 × 224   16                      2177316

Table 5.1: Description of the image volume Moffett Field
The quality of reconstruction is measured by signal-to-noise ratio (SNR). SNR is defined by

SNR = 10 log10 (σx² / MSE) dB        (5.10)
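The SNR of Equation (5.10) is straightforward to compute; a minimal sketch (function name ours; we take σx² as the mean squared value of the original data, the "power" reported in Table 5.1):

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR = 10*log10(signal power / MSE), as in Equation (5.10)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    power = np.mean(original ** 2)                 # sigma_x^2
    mse = np.mean((original - reconstructed) ** 2)
    return 10.0 * np.log10(power / mse)
```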
Figure 5.5 compares the rate-distortion performance of MLVQ-SPIHT with scalar SPIHT for each band. In Figure 5.5, the SNR results for 2D-SPIHT and MLVQ-SPIHT are obtained by calculating the σx² and MSE for each band separately. The plots show that MLVQ-SPIHT offers over 3 dB improvement at 0.1 bpp and 0.5 bpp for all bands in the sequence. This implies that the hyperspectral sequences are highly correlated, and that vector quantization along the wavelength axis can efficiently exploit these inter-band correlations. A visual comparison of the original and reconstructed 49th band of moffett scene 3 at 0.1 bpp and 0.5 bpp is given in Figure 5.4.
Table 5.2 compares the rate-distortion results for MLVQ-SPIHT using different LVQs with 3D-SPIHT, 3D-SPECK, and the JPEG2000 multi-component integer implementation for the Moffett hyperspectral image volume [22]. Five levels of the dyadic S+P (B) integer filter were applied in all three dimensions for 3D-SPIHT and 3D-SPECK. For JPEG2000 multi-component, a five-level 1D S+P (B) filter was first applied on the spectral axis, followed by the (5,3) filter in the spatial domain. For MLVQ-SPIHT, to enable SNR scalability, bit stream boundaries are maintained for every coding layer. To compare with these three-dimensional compression algorithms,
bits belonging to the same fraction of the same coding layer in the different four-dimensional vector bands can be extracted for decoding. The SNR results of MLVQ-SPIHT are obtained by first calculating the overall MSE and power (the σx² shown in Table 5.1) of the whole image sequence, and then applying Equation (5.10). The results show that at low bit rates, the MLVQ-SPIHT algorithms outperform the 3D compression algorithms; as the bit rate increases, the 3D algorithms give better performance. In general, sphere D4 LVQ shows better performance than cubic Z4 LVQ. The reason that MLVQ-SPIHT performs worse at high bit rates, which require more quantization stages, is the overlaps and gaps between two successive stages. As mentioned before, to prevent divergence of the overload quantization error, the truncated LVQ at the current stage should be able to cover the Voronoi region of the previous stage. On the other hand, any overlap of quantization regions at two successive stages decreases compression efficiency. However, it is very difficult to find a truncated lattice that is perfectly consistent with the Voronoi region of the root lattice.
moffett scene 1
Bit Rate   3D-SPIHT [23]   3D-SPECK [23]   JP2K-Multi [23]   Cubic Z4   Pyramid D4   Sphere D4
0.1 bpp    15.509          15.717          14.770            16.475     16.401       17.035
0.2 bpp    20.605          20.778          19.655            20.617     21.174       21.905
0.5 bpp    29.105          29.199          27.999            25.136     24.861       26.492
1.0 bpp    37.198          37.284          36.312            29.602     31.692       32.646

moffett scene 3
Bit Rate   3D-SPIHT [23]   3D-SPECK [23]   JP2K-Multi [23]   Cubic Z4   Pyramid D4   Sphere D4
0.1 bpp    10.828          10.622          10.264            11.817     11.807       12.361
0.2 bpp    16.740          16.557          15.952            17.144     18.149       18.278
0.5 bpp    26.102          25.998          25.298            23.859     24.605       25.100
1.0 bpp    34.946          34.845          33.835            31.169     31.152       31.305

Table 5.2: Comparison of rate-distortion results of different coding methods in signal-to-noise ratio (SNR) in dB. The last three columns are MLVQ-SPIHT with the cubic Z4, pyramid D4, and sphere D4 LVQs.
5.5 Summary and Conclusions
In this chapter, we presented a multidimensional image compression algorithm that extends SPIHT with lattice vector quantization and supports successive refinement. In the proposed algorithm, multistage lattice vector quantization is used to exploit correlations between image slices. Cubic Z4 LVQ, sphere D4 LVQ, and pyramid D4 LVQ are implemented in the proposed scheme. The experimental results show that the MLVQ-based schemes exploit the inter-band correlations along the wavelength axis and provide better rate-distortion performance at low bit rates than 2D-SPIHT and the algorithms that employ 3D wavelet transforms.
Figure 5.4: Comparison of the original and reconstructed moffett scene 3 49th band by MLVQ-SPIHT, from top to bottom: original, 0.1 bpp, 0.5 bpp.
[Figure: SNR (dB) versus spectral band for 2D SPIHT, Z4 SPIHT, and Sphere D4 SPIHT on Moffett Field, scene 3; (a) at 0.1 bpp, (b) at 0.5 bpp.]

Figure 5.5: Comparison of lossy performance for the Moffett Field image, scene 3.
CHAPTER 6
Four-Dimensional Wavelet Compression of 4-D Medical
Images Using Scalable 4-D SBHP
In this chapter, we propose a low-complexity wavelet-based method for progressive lossy-to-lossless compression of four-dimensional (4-D) medical images. The Subband Block Hierarchical Partitioning (SBHP) algorithm is modified and extended to four dimensions, and applied to every code block independently. The resulting algorithm, 4D-SBHP, efficiently encodes 4-D image data by exploiting the dependencies in all dimensions, while enabling progressive SNR and resolution decompression. The resolution-scalable and lossy-to-lossless performances are empirically investigated. The experimental results show that our 4-D scheme achieves better compression performance on 4-D medical images than 3-D volumetric compression schemes.
6.1 Introduction
Four-dimensional (4-D) data sets, such as images generated by computed tomography (CT) and functional magnetic resonance imaging (fMRI), are increasingly used in diagnosis. Three-dimensional (3-D) volumetric images are two-dimensional (2-D) image slices that represent cross sections of a subject. Four-dimensional (4-D) medical images, which can be seen as a time series of 3-D images, represent the live action of human anatomy and consume even larger amounts of resources for transmission and storage than 3-D image data. For example, a few seconds of volumetric CT image sequences require a few hundred megabytes of memory. Therefore, for modern multimedia applications, particularly in the Internet environment, efficient compression techniques are necessary to reduce storage and transmission bandwidth. Furthermore, in many applications it is highly desirable to have SNR and resolution scalability within a single embedded bitstream per data set. SNR scalability gives the user the option of lossless decoding, which is important for analysis and diagnosis, and also allows the user to reconstruct image data at a lower rate or quality for rapid browsing through a large image data set. Resolution scalability can provide image browsing with low memory and computational cost.
Since 4-D image data can be represented as multiple 2-D slices or 3-D volumes, it is possible to code these 2-D slices or 3-D volumes independently. Many wavelet-based 2-D [18, 19, 17, 24] and 3-D [8, 73, 7, 14, 74] image compression algorithms have been proposed and applied to medical images. However, those 2-D and 3-D methods do not exploit the dependency among pixels in different volumes. Since 4-D medical data is normally temporally smooth, the high correlation between volumes makes an algorithm based on four-dimensional coding a better choice.
Very little work has been done in the field of 4-D medical image compression. Zeng et al. [76] used a 4-D discrete wavelet transform and extended EZW to 4-D for lossy compression of echocardiographic data. SPIHT was extended to 4-D and tested on fMRI and 4-D ultrasound images by Lalgudi et al. [77]. These two algorithms are zerotree codecs and use a symmetric tree structure. Lalgudi et al. [75] applied a 4-D wavelet transform to fMRI data and compressed the transformed slices separately with JPEG2000. Kassim et al. [78] proposed a lossy-to-lossless compression method for 4-D medical images using a combination of a 3-D integer wavelet transform and 3-D motion compensation.
In this chapter, we propose a low-complexity progressive lossy-to-lossless compression algorithm that exploits dependencies in all four dimensions by using a 4-D discrete wavelet transform and a 4-D coder. We extend SBHP [16], originally proposed as a low-complexity alternative to JPEG2000 [24], to four dimensions. We have already reported on the extension of SBHP to three dimensions and shown that this 3D-SBHP is about 6 times faster in lossless encoding and 6 to 10 times faster in lossless decoding than asymmetric-tree 3D-SPIHT [74]. This block-based algorithm has better scalability and random accessibility than zerotree coders. 4D-SBHP is based on coding 4-D subblocks of 4-D wavelet subbands and can provide scalability and fast encoding and decoding. In this chapter, we investigate the lossy-to-lossless performance and resolution scalability of 4D-SBHP in detail.
The rest of this chapter is organized as follows. We present the scalable 4D-SBHP algorithm in Section 6.2. Experimental results of scalable coding are given in Section 6.3. Section 6.4 concludes this study.
6.2 Scalable 4D-SBHP

6.2.1 Wavelet Decomposition in 4-D
For 4D datasets, the variances along the axial and temporal directions are determined by the thickness of slices and the imaging speed, so the variances among the four dimensions may be very different. In general, the similarity of pixel values along the temporal and axial directions is expected to be closer than along the other two directions, and the similarity along the X and Y directions is very close. In Table 6.1, we give the average standard deviation (STD) of a 4D fMRI medical dataset and a 4D CT dataset along the X, Y, axial Z, and temporal T directions. This asymmetric similarity has also been shown in [75] for 4-D fMRI image data sets. Therefore, it is reasonable to apply transforms along the axial and temporal directions in different ways from the transforms along the X and Y directions in the 4D wavelet transform.
STD     X        Y        Z        T
siem    32.131   24.451   19.731   2.991
ct4d    22.975   23.460   4.547    2.171

Table 6.1: Average standard deviation of 4D fMRI and 4D CT image data along X, Y, Z and T directions.
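A Table 6.1-style statistic can be obtained from a 4-D array in a few lines of NumPy. A hedged sketch (function name ours; we assume the statistic is the standard deviation of each 1-D line of samples along an axis, averaged over all lines):

```python
import numpy as np

def average_std_per_axis(volume):
    """Average standard deviation of an N-D data set along each axis:
    for each axis, take the std of every 1-D line of samples along that
    axis, then average over all such lines."""
    volume = np.asarray(volume, dtype=float)
    return [float(np.mean(np.std(volume, axis=ax)))
            for ax in range(volume.ndim)]
```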
In our method, the 2D spatial transformation, 1D axial transformation (along image slices), and 1D temporal transformation are done separately, by first performing 1D dyadic wavelet decomposition in the temporal domain, followed by 1D dyadic wavelet decomposition along the axial direction, and then 2D dyadic spatial decomposition in the XY planes. This separable wavelet decomposition module supports a heterogeneous selection of filter types and different numbers of decomposition levels for each direction (x, y, z, or t), which allows adapting the size of the wavelet pyramid in each direction in case the resolution is limited. Figure 6.1 shows a 4-D (x, y, z, t) data set after 2 levels of 4-D wavelet transform.
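The one-level S transform (integer Haar: l = ⌊(a+b)/2⌋, h = a − b, which is later used along the temporal and axial directions) can be sketched as a single-axis operation on a 4-D array; applying it along t, then z, then x and y gives one level of the separable decomposition described above. The formulas are the standard S-transform ones; function names and the even-length assumption are ours:

```python
import numpy as np

def s_transform_axis(data, axis):
    """One level of the reversible S transform (integer Haar) along one
    axis of an integer array whose length on that axis is even.
    Returns (low, high): low = floor((a+b)/2), high = a - b per pair."""
    d = np.moveaxis(np.asarray(data, dtype=np.int64), axis, 0)
    a, b = d[0::2], d[1::2]              # even / odd samples along axis
    low = (a + b) // 2                   # floor division = S transform
    high = a - b
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def inverse_s_transform_axis(low, high, axis):
    """Exact integer inverse: a = l + floor((h+1)/2), b = a - h."""
    l = np.moveaxis(np.asarray(low, dtype=np.int64), axis, 0)
    h = np.moveaxis(np.asarray(high, dtype=np.int64), axis, 0)
    a = l + (h + 1) // 2
    b = a - h
    out = np.empty((2 * l.shape[0],) + l.shape[1:], dtype=np.int64)
    out[0::2], out[1::2] = a, b
    return np.moveaxis(out, 0, axis)
```

Perfect reconstruction holds for any integers, which is what makes lossy-to-lossless coding from one bitstream possible.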
Because the number of volumes and slices in a typical 4-D data set can be
quite large, it is impractical to buffer all volumes and slices for the temporal and
[Figure: a 4-D image data set in a GOV (F slices; axes X, Y, Z, T), before and after the 4-D wavelet transform, with subbands LLt, LHt, and Ht.]

Figure 6.1: Wavelet decomposition structure with 2 levels of 1D temporal transform followed by 2 levels of 1D axial transform and 2D spatial transform. The black block is the lowest frequency subband.
axial transform. In our scheme, F consecutive slices in T consecutive volumes are collected into a Group Of Volumes (GOV). For example, the set of slices S_{k,l}, k ∈ [0, F − 1], l ∈ [0, T − 1], forms a GOV. We use the notation GOV(a, b) to indicate that the GOV has a slices in each volume and b volumes. Each GOV is independently transformed and coded.
6.2.2 Coding Algorithm
The 2-D SBHP algorithm is a SPECK [17] variant which was originally designed as a low-complexity alternative to JPEG2000 [16]. 4-D SBHP is a modification and extension of 2-D SBHP to four dimensions. In 4-D SBHP, each subband is partitioned into 4-D code-blocks, all of the same size. 4-D set partitioning is applied to every code-block independently and generates a highly scalable bit-stream for each code-block by using the same form of progressive bit-plane coding as in SPIHT [18].
Consider a 4-D image data set that has been transformed using a discrete wavelet transform. The image sequence is represented by an indexed set of wavelet transform coefficients c_{i,j,k,l} located at position (i, j, k, l) in the transformed image sequence. Following the idea in [19], for a given bit plane n and a given set τ of coefficients, we define the significance function:

S_n(τ) = 1, if 2^n ≤ max_{(i,j,k,l)∈τ} |c_{i,j,k,l}| < 2^{n+1};  0, otherwise.        (6.1)
Following this definition, we say that set τ is significant with respect to bit
plane n if Sn (τ ) = 1. Otherwise, we say that set τ is insignificant.
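The significance test of Equation (6.1) translates directly into code; a minimal sketch (function name ours; the upper bound is taken as strict, i.e., the set first becomes significant at bit plane n):

```python
import numpy as np

def significance(coeffs, n):
    """S_n(tau): a set of wavelet coefficients is significant with
    respect to bit plane n when its largest magnitude reaches 2**n
    but has not yet reached bit plane n+1."""
    m = int(np.max(np.abs(np.asarray(coeffs, dtype=np.int64))))
    return 1 if 2 ** n <= m < 2 ** (n + 1) else 0
```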
In 4D-SBHP, each subband is partitioned into 4-D code-blocks of the same size. The 4D-SBHP algorithm makes use of sets referred to as sets of type S, which can be of varying dimensions. The dimension of a set S depends on the dimension of the 4-D code-block and the partitioning rules. Because of the limited number of volumes and slices in a GOV, the dimensions of the 4-D code-block along the temporal and axial directions might be much shorter than the dimensions along the x and y directions. In our method, we set the 4-D code-block size to be 2^M × 2^M × 2^N × 2^N, (M > N > 0), i.e., the code-block has equal dimensions along the x and y directions, and equal dimensions along the z and t directions. With these dimensions, the initial stages of partitioning result in some S sets that are 2-D sets, i.e., whose temporal and axial dimensions are both 1. We define Max2D to be the maximum 2-D S set that can be generated. For a 2^M × 2^M × 2^N × 2^N code-block, Max2D is the 2^{M−N} × 2^{M−N} × 1 × 1 set. 4D-SBHP always has type S sets with at least 2 × 2 × 1 × 1 coefficients.
The size of a set is defined to be its cardinality C, i.e., the number of coefficients in the set:

size(S) = C(S) ≡ |S|        (6.2)

During the course of the algorithm, sets of various sizes will be formed, depending on the characteristics of the coefficients in the code-block.
4-D SBHP is based on a set-partitioning strategy. Figure 6.2 and Figure 6.3
illustrate the partitioning process used in 4D-SBHP.
Below we explain in detail the 4-D partition rules by using a 64 × 64 × 4 × 4
Figure 6.2: Quadtree partitioning of set S into O(S): (a) size(S) ≤ size(Max2D); (b) size(S) > size(Max2D).
Figure 6.3: Octave-band partitioning of set I into S sets and a new I set: (a) size(S) ≤ size(Max2D); (b) size(S) > size(Max2D).
4-D code-block X as an example. Here size(Max2D) = 16 × 16 × 1 × 1.
The algorithm starts with two sets, as shown in Figure 6.4(a). One is the S set: the 2 × 2 × 1 × 1 top-left wavelet coefficients at the (0,0,0,0) position in X; the other is the I set, which contains the remaining coefficients, I = X − S. 4-D SBHP works in a 2-D domain and follows exactly the octave-band partition rules of 2-D SBHP [16] until size(X − I) is equal to size(Max2D), i.e., 16 × 16 × 1 × 1, as shown in Figure 6.4(b). In the next stage, the upper-left 16 × 16 × 1 × 1 set at the (0,0,0,0) position in X follows the 2-D quadrisection partition rules until all the significant coefficients are located, and the remaining I set is partitioned into fifteen 16 × 16 × 1 × 1 S sets (labeled 2-16 in Figure 6.4(c)) and one I set. The 2-D SBHP partition rule is applied to these 16 × 16 × 1 × 1 S sets to locate significant coefficients. At the following stage, the remaining I set is partitioned into fifteen 32 × 32 × 2 × 2 S sets, labeled 17-31 in Figure 6.4(d). The two 32 × 32 × 2 3-D blocks with the same label make one 32 × 32 × 2 × 2 4-D block. At the next step, each 32 × 32 × 2 × 2 4-D block is partitioned into sixteen 16 × 16 × 1 × 1 blocks, and the 2-D SBHP partition rules are applied to those 2-D blocks until all sets are partitioned down to individual coefficients. Figure 6.4(e) shows this partition on block 17.
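The 2-D quadrisection step used above can be sketched as a recursion that descends only into quadrants that are significant at the current bit plane. This is an illustrative simplification, not the full 4D-SBHP coder (it omits bit emission, the lists, and the octave-band stage); names and the square power-of-two block assumption are ours:

```python
import numpy as np

def quadrisect_significant(block, n, origin=(0, 0)):
    """Recursively quadrisect a square 2-D array, descending only into
    quadrants significant at bit plane n, until individual significant
    coefficients are located. Returns their (row, col) positions."""
    h, w = block.shape
    if np.max(np.abs(block)) < 2 ** n:
        return []                        # insignificant set: stop here
    if h == 1 and w == 1:
        return [origin]                  # significant coefficient found
    hh, hw = h // 2, w // 2
    found = []
    for dy, dx in ((0, 0), (0, hw), (hh, 0), (hh, hw)):
        sub = block[dy:dy + hh, dx:dx + hw]
        found += quadrisect_significant(
            sub, n, (origin[0] + dy, origin[1] + dx))
    return found
```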
During the coding process, a set is partitioned following the above rules when at least one of its subsets is significant. To minimize the number of significance tests for a given bit-plane, 4-D SBHP maintains three lists:

• LIS (List of Insignificant Sets) - all the sets (with more than one coefficient) that are insignificant but do not belong to a larger insignificant set.

• LIP (List of Insignificant Pixels) - pixels that are insignificant and do not belong to an insignificant set.

• LSP (List of Significant Pixels) - all pixels found to be significant in previous passes.
Instead of using a single large LIS that has sets of varying sizes, we use an array of smaller lists of type LIS, each containing sets of a fixed size. All the lists and list arrays are updated with the most efficient list management method, FIFO. Since the total number of sets formed during the coding process remains the same, using an array of lists does not increase the memory requirement of the coder. The use of multiple lists completely eliminates the need for any sorting mechanism for processing sets in increasing order of their size, and speeds up the encoding/decoding process. For each new bit plane, the significance of coefficients in the LIP is tested first, then the sets in the LIS in increasing order of their sizes, and lastly the refinement bits for coefficients in the LSP are coded. Testing sets by increasing size allows significant coefficients to be found more quickly, and hence conveys value information prior to set significance information, which conveys no value of individual coefficients.
The way 4D-SBHP entropy codes the comparison results is an important factor in reducing coding complexity. Instead of using adaptive arithmetic or Huffman coding, 4D-SBHP uses only three fixed Huffman codes, in some special conditions. Since there are only four subsets or pixels after most sets are partitioned, they can be coded together. In 4D-SBHP, we choose a Huffman code with 15 symbols, which is used in Chapter 3, corresponding to all the possible outcomes. The longest Huffman codeword is 6 bits. To speed up decoding, we can use lookup tables instead of binary trees. No entropy coding is used for the sign or the refinement bits. For these large datasets, this light use of entropy coding is a major factor in the low complexity and speed of the 4D-SBHP algorithm.
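The lookup-table decoding trick can be illustrated as follows. The actual 15-symbol code table from Chapter 3 is not reproduced in this excerpt, so the sketch below uses a small hypothetical prefix code whose maximum codeword length is 6 bits, matching the stated bound; every 6-bit window then indexes a table directly, replacing a walk down the binary code tree:

```python
# Hypothetical prefix code for illustration only (not the real
# 15-symbol 4D-SBHP table); maximum codeword length is 6 bits.
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111000'}
MAXLEN = 6

def build_lookup(code):
    """Expand a prefix code into a 2**MAXLEN-entry table mapping every
    MAXLEN-bit pattern to (symbol, codeword length)."""
    table = [None] * (1 << MAXLEN)
    for sym, word in code.items():
        pad = MAXLEN - len(word)
        base = int(word, 2) << pad
        for tail in range(1 << pad):     # all completions of the word
            table[base + tail] = (sym, len(word))
    return table

def decode(bits, table, nsyms):
    """Decode nsyms symbols from a bit string with one table read each."""
    out, pos = [], 0
    for _ in range(nsyms):
        window = bits[pos:pos + MAXLEN].ljust(MAXLEN, '0')
        sym, length = table[int(window, 2)]
        out.append(sym)
        pos += length
    return ''.join(out)
```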
6.2.3 Scalable Coding
In a wavelet coding system, resolution scalability enables an increase of resolution when bits in higher-frequency subbands are decoded. As shown in Figure 6.1, 4D-SBHP codes code-blocks in the black part first, then code-blocks in the white subbands. For a 2D image, after N levels of wavelet decomposition, the image has N + 1 resolution levels. For a 4D image sequence with N-level wavelet decomposition in the spatial direction, M-level wavelet decomposition in the axial direction, and K-level wavelet decomposition in the temporal direction, a total of (N + 1) × (M + 1) × (K + 1) resolution levels are available. As shown in Figure 6.5, 4D-SBHP codes code-blocks from the lowest to the highest frequency subbands. The algorithm generates a progressive bit stream for each code-block, and the whole bit stream is resolution scalable. If a user wants to decode up to resolution n, the bits belonging to the same fraction of the same bit planes in the code-blocks related to resolution n can be extracted for decoding.
4D-SBHP is applied independently to every 4-D code-block inside a subband. An embedded bit stream is generated by bitplane coding, but overall, the bitstream is resolution scalable. To enable SNR scalability, rate-distortion information is calculated and stored in the header of each code-block during the coding process. When decoding, the rate allocation method described in Chapter 3 is used to select optimal cutting points for every code-block. Then the selected bitstream from every code-block is interleaved to form the final bitstream.
6.3 Numerical Results
We tested our algorithm on four fMRI, two 4D CT, and one 4D ultrasound imaging datasets. Table 6.2 gives a brief description of these image datasets.
File Name   Image Type   Dimension (x,y,z,t)     Bit Depth (bit/pixel)   Resolution in 'xy' and z respectively (mm)
mb01        fMRI         64 × 64 × 20 × 100      13                      3.75 and 5
siem        fMRI         64 × 64 × 16 × 120      8                       unknown
feeds       fMRI         64 × 64 × 20 × 180      13                      3.75 and 5
3T          fMRI         64 × 64 × 28 × 244      13                      3.75 and 5
heart       ultrasound   128 × 128 × 128 × 12    8                       unknown
ct4d        CT           256 × 256 × 256 × 16    8                       unknown
4D CT       CT           512 × 512 × 108 × 8     13                      0.787109 and 2.5

Table 6.2: Description of the image volumes
In this section, we provide simulation results and compare the proposed 4-D
codec with 3-D volumetric algorithms.
6.3.1 Comparison of Lossless performance with 3-D and 4-D schemes
For 4-D datasets (x, y, z, t), we can apply 3D wavelet compression on either the xyz cube or the xyt cube. Table 6.3 compares the lossless compression performance of 4-D SBHP with 3-D SBHP applied on the xyz cube and the xyt cube. We obtain considerable compression improvement for 4D-SBHP compared with 3D-SBHP on xyz, and a small improvement compared with 3D-SBHP on xyt. All results were obtained using three-level I(2,2) reversible transforms in the xy domain and one-level S transforms along the temporal and axial directions. A 4-D code-block size of (32 × 32 × 2 × 2) and GOV size GOV(z, t) = (4, 4) are chosen here. For 3D-SBHP, a code-block size of (32 × 32 × 2) and a GOS (group of slices) size of 4 (GOV(4, 1)) are chosen.
Table 6.4 compares the lossless compression performance of 4D SBHP with 4D JPEG2000 [75], 4D EZW, and 4D SPIHT [77]. Instead of applying the wavelet transform on every GOV independently, these works apply 4D JPEG2000, 4D EZW, and 4D SPIHT on the 4D wavelet transform of the whole 4D dataset. For these three 4D methods [75, 77], the compression parameter settings, such as code-block size and wavelet decomposition level, are not reported. To obtain the 4D SBHP results,
File Name   3D-SBHP on 'xyz' cube   3D-SBHP on 'xyt' cube   4D-SBHP
mb01        7.4732                  6.8829                  6.8196
siem        4.9947                  4.7234                  4.6187
feeds       5.4952                  4.5045                  4.4923
3T          10.3150                 9.6163                  9.4972
heart       2.0719                  2.1728                  2.0272
ct4d        3.2870                  3.0830                  2.8394
4D CT       5.1953                  5.0137                  4.8667

Table 6.3: Lossless compression performance using 4D-SBHP and 3D-SBHP (bits/pixel)
GOV(z, t) = (16, 32) and wavelet decomposition levels (xy, z, t) = (3, 2, 5) are used for the mb01 and siem datasets. For the heart image data, we use GOV(z, t) = (32, 8) and wavelet decomposition levels (xy, z, t) = (3, 2, 2). A code-block size of (64 × 64 × 4 × 4) is used on all three datasets. As shown in the table, 4D-SBHP is roughly comparable to 4D JPEG2000, 4D-EZW, and 4D-SPIHT in lossless performance, while having lower memory requirements and complexity.
File Name   4D JPEG2000 [77]   4D EZW [77]   4D SPIHT [77]   4D-SBHP
mb01        5.962              5.984         5.749           6.0585
siem        4.056              4.331         3.933           4.1580
heart       1.68               2.013         1.735           1.9905

Table 6.4: Lossless compression performance using 4D methods (bits/pixel)

6.3.2 Comparison of Lossy performance with 3-D schemes
In this section, we show the performance of lossy reconstruction from the losslessly compressed file. The quality of reconstruction is measured by the signal-to-noise ratio (SNR) over the whole 4-D image dataset. SNR is defined by

SNR = 10 log10 (Px² / MSE) dB        (6.3)

where Px² is the average squared value of the original 4D medical image dataset and MSE denotes the mean squared error between all the original and reconstructed slices. Figure 6.7 and Figure 6.8 compare the lossy performance of 4-D SBHP with 3-D SBHP applied on the xyz cube and the xyt cube. Figures 6.7(a) and 6.8(a) show plots of SNR versus bitrate over the whole mb01 and siem image
data. We also evaluated the lossy performance of our algorithm in 2-D, as medical images are usually viewed slice by slice. Figure 6.7(b) shows the lossy results of the (x, y) slices at every z of the 4-D block at time t = 8, i.e., (x, y, z, 8) for every z of mb01 at 3.5 bits/pixel. Figure 6.8(b) shows the lossy results of the (x, y) slices at every z of the 4-D block at time t = 20, i.e., (x, y, z, 20) for every z of siem at 1.0 bits/pixel. Figures 6.7 and 6.8 clearly show that our 4-D scheme exploits the redundancy in all four dimensions and is superior to the 3-D coding schemes.
Figure 6.9 shows a 3D view of the reconstructed siem sequence at time t = 20. The algorithm has good subjective quality; small differences in quality are almost unnoticeable, especially at higher bit rates.
6.3.3 Resolution scalable results
The fMRI medical sequence siem, a three-level I(2,2) reversible transform in the xy domain, and a two-level S transform along the temporal and axial directions are selected for this comparison, with GOV size GOV(z, t) = (16, 16). Figure 6.10 shows the reconstructed siem images decoded from a single scalable code stream at a variety of resolutions at 0.25 bpp. The SNR values listed in Table 6.5 for the low-resolution image sequences are calculated with respect to the lossless reconstruction of the corresponding resolution. Table 6.5 shows that the SNR values increase from one resolution to the next lower one.
Bit Rate   SNR (dB), 1/2 resolution   SNR (dB), full resolution
0.0625     23.722                     12.347
0.125      28.817                     13.356
0.25       37.80                      15.187
0.5        lossless                   18.716
1          lossless                   21.985
2          lossless                   25.883

Table 6.5: SNR for decoding siem at a variety of resolutions and bit rates
6.4 Summary and Conclusions
In this chapter, we proposed an image coding algorithm, 4D-SBHP, for lossy-to-lossless compression of 4-D medical images using a four-dimensional DWT and four-dimensional set partitioning. This block-based algorithm supports resolution scalability. Fixed Huffman coding and one coding pass per bit plane are used to reduce the coding time. The experimental results show that 4D-SBHP exploits the redundancy in all four dimensions and achieves a higher compression ratio than 3-D compression schemes. Furthermore, 4D-SBHP is low in complexity and exhibits fast encoding and decoding times.
[Figure: the partitioning stages of a 64 × 64 × 4 × 4 4-D code-block: (a) the initial 2 × 2 × 1 × 1 S set and the I set; (b) octave-band partitioning down to the 16 × 16 × 1 × 1 Max2D set; (c) fifteen 16 × 16 × 1 × 1 S sets labeled 2-16 and a new I set; (d) fifteen 32 × 32 × 2 × 2 S sets labeled 17-31; (e) partition of block 17 into 16 × 16 × 1 × 1 blocks.]

Figure 6.4: Set partitioning rules used by 4-D SBHP.
[Figure: bitplanes b(n_i, i) of code-blocks 0 through L, each preceded by a header, ordered from the highest to the lowest bitplane (SNR scalable) and from the LLLLLL subband (resolution 0) to the HHHHHH subband (resolution k) (resolution scalable).]

Figure 6.5: An example of 4D-SBHP SNR and resolution scalable coding. Each bitplane α in block β is notated as b(α, β). Code-blocks are encoded and indexed from the lowest subband to the highest subband.
[Figure: for each block, a header followed by bitplanes from the highest, e.g. b(n, 0), down to the lowest, b(0, 0), with the rate R_{i,j,0} and distortion decrement δD_{i,j,0} recorded in the block header.]

Figure 6.6: Bitstream structure generated by 4D-SBHP. Each bitplane α in block β is notated as b(α, β). Rate-distortion information is stored in the header of every code-block.
[Figure: (a) SNR (dB) versus rate (bpp) for 4D-SBHP, 3D-SBHP xyt, and 3D-SBHP xyz on the mb01 image data; (b) SNR (dB) versus slice number for every slice in the eighth time sequence of mb01 at 3.5 bpp.]

Figure 6.7: Comparison of lossy performance of mb01 image data.
[Figure: (a) SNR (dB) versus rate (bpp) for 4D-SBHP, 3D-SBHP xyt, and 3D-SBHP xyz on the siem image data; (b) SNR (dB) versus slice number for every slice in the twentieth time sequence of siem at 1.0 bpp.]

Figure 6.8: Comparison of lossy performance of siem image data.
Figure 6.9: Reconstructed siem sequence at time t = 20 by 4D-SBHP, from left to right, top to bottom: original, 0.5 bpp, 1.0 bpp, and 2.0 bpp.
Figure 6.10: A visual example of resolution scalable decoding. Full resolution and 1/2 resolution of one slice at 0.25 bpp.
CHAPTER 7
Conclusions and Future Work
This thesis has investigated the problem of volumetric image compression. In
this chapter, we summarize the contributions of this work and give some directions
for future research.
7.1 Contributions of the Thesis
In the first part, a low-complexity, embedded, block-based, wavelet transform coding algorithm was proposed for volumetric image compression. Based on the properties of volumetric image data, an asymmetric 3D wavelet transform is applied to maximally de-correlate the source signals. The Three-Dimensional Subband Block Hierarchical Partitioning (3D-SBHP) algorithm efficiently encodes volumetric image data by exploiting the correlations in all dimensions. Fixed Huffman coding and one coding pass per bit plane are used to achieve low computational complexity. 3D-SBHP generates an embedded bitstream and supports SNR scalability, resolution scalability, and random accessibility from one bitstream.
The details of SNR-scalable and resolution-scalable 3D-SBHP are given in
Chapter 3. In 3D-SBHP, each subband is divided into contiguous code-blocks.
3D-SBHP is applied to every code-block independently and generates a highly scalable
bitstream for each code-block. The processing order of the code-blocks is resolution
by resolution, i.e., the code-blocks in the next finer resolution are coded only after
all code-blocks in a given resolution have been coded. We described the integer filter
mode of 3D-SBHP, which enables lossy-to-lossless decompression from the same bitstream.
A wavelet packet structure and coefficient scaling are used to make the integer wavelet
transform approximately unitary.
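The resolution-by-resolution processing order described above can be sketched as follows. This is an illustrative reconstruction in Python, not the thesis code; the function and parameter names (`assemble_bitstream`, `codeblocks`) are hypothetical.

```python
def assemble_bitstream(codeblocks):
    """Concatenate independently coded code-block streams resolution by
    resolution: every block of one resolution level is emitted before any
    block of the next finer level, so a decoder targeting a reduced
    resolution can simply stop reading at the recorded offset.
    `codeblocks` maps resolution level -> list of per-block byte strings."""
    out = bytearray()
    offsets = {}
    for level in sorted(codeblocks):        # coarsest resolution first
        offsets[level] = len(out)           # where this level starts
        for stream in codeblocks[level]:
            out += stream
    return bytes(out), offsets
```

Because each code-block stream is self-contained, the same layout also supports SNR scalability within each block: truncating a block's stream at any bit-plane boundary still yields a decodable prefix.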
Chapter 4 described the way 3D-SBHP supports random access. The details of
the method that finds the code-blocks corresponding to a given ROI are given in that
chapter. The equations for the number of code-blocks used to reconstruct a given ROI
are also derived. The impact that the wavelet transform and code-block configuration
have on the compression efficiency and accessibility of an embedded bitstream is
assessed by applying 3D-SBHP to medical volumetric images. Our work shows that
ROI access performance is affected by a set of coding parameters. With regard to
both coding and access efficiency, we outlined some reasonable trade-offs for 3D
volumetric images.
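As a rough illustration of the code-block lookup, the sketch below maps a one-dimensional ROI to a range of code-block indices in a subband at a given decomposition level. It ignores the filter-support expansion that the actual method in Chapter 4 accounts for, and all names are hypothetical.

```python
def roi_codeblock_range(roi_start, roi_end, level, block_size):
    """Return the (first, last) code-block indices covering the half-open
    ROI [roi_start, roi_end) after `level` dyadic wavelet decompositions.
    Filter-support expansion is ignored in this simplified sketch."""
    lo = roi_start >> level          # coordinates halve at each level
    hi = (roi_end - 1) >> level
    return lo // block_size, hi // block_size
```

Applying the same mapping independently along each axis gives the 3D set of code-blocks to decode; coarser levels need fewer blocks for the same ROI, which is one source of the coding-parameter trade-offs noted above.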
With a small loss of quality, 3D-SBHP provides low-complexity, SNR-scalable,
resolution-scalable and random-access decoding features. 3D-SBHP is a very
good candidate for applications that need high-speed encoding of volumetric images
and other features, such as random-access decoding and progressive decoding without
any transcoding of an encoded bitstream.
The extension of 3D-SBHP to the 4D case in Chapter 6 is another
contribution of this work. The 4D wavelet transform and 4D coding method give a
significant improvement over previous 2D and 3D techniques.
In Chapter 5 of the thesis, quantization techniques are investigated to
exploit the relationships between the slices in a volumetric image dataset. For
volumetric images, especially hyperspectral images, neighboring slices convey
highly related spatial details, and vector quantization can exploit the statistical
correlation between neighboring data in a straightforward manner. Based on these
facts, vector quantization is combined with the SPIHT coding algorithm to code
hyperspectral images. Lattice vector quantization is used to obtain low computational
complexity. In particular, multistage lattice vector quantization (MLVQ) is used
to exploit correlations between image slices, while offering successive refinement.
Different LVQs, including the cubic Z4 and D4 lattices, are considered, and their
performances are compared with those of other 2D and 3D wavelet-based image
compression algorithms. This new method exploits the inter-band correlations along
the wavelength axis and provides better rate-distortion performance at low bit rates
than 2D-SPIHT and algorithms that employ 3D wavelet transforms.
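One reason the D4 lattice keeps complexity low is that its nearest-point search is essentially two rounding passes (the fast algorithm of Conway and Sloane [58]). The sketch below illustrates that search; it is an illustrative reconstruction, not the thesis implementation.

```python
import numpy as np

def nearest_d4(x):
    """Nearest point of the D4 lattice (integer 4-vectors with even
    coordinate sum) to a real 4-vector x, via Conway-Sloane rounding."""
    f = np.rint(x)                       # nearest point of Z^4
    if int(f.sum()) % 2 == 0:            # even sum: already a D4 point
        return f
    # Otherwise re-round the coordinate with the largest rounding error
    # in the opposite direction; this restores even parity.
    k = int(np.argmax(np.abs(x - f)))
    f[k] += 1.0 if x[k] > f[k] else -1.0
    return f
```

In a multistage (MLVQ) setting, the residual `x - nearest_d4(x)` would be scaled up and quantized again, each stage refining the previous one.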
7.2
Further Work
Some suggestions for future work are given below:
7.2.1
Improving Compression Efficiency
Our 3D-SBHP can support all features of JPEG2000 and has very low complexity.
In lossless compression efficiency, 3D-SBHP is comparable to other 3D image coders
on AVIRIS image data; our result is only 2% worse than the highest efficiency
listed in Table 3.8. However, for medical image data, the lossless compression
efficiency of 3D-SBHP is about 2%-10% lower than that of other 3D algorithms. Given
the limitations of storage and transmission bandwidth, improving 3D-SBHP's
compression performance is important. In 3D-SBHP, two fixed Huffman codes are
used for all code-blocks. Since wavelet coefficients in different subbands have
different source statistics, we can investigate the source statistics of medical
image data and find a better low-complexity entropy coder.
7.2.2
3D-SBHP on Video
For emerging video and multimedia applications, resolution and fidelity scalabilities
are essential. 3D-SBHP, a high-speed 3D wavelet subband coder with
SNR and resolution scalabilities, is a good candidate for video compression.
Recently, 3D wavelet coding via motion-compensated temporal filtering (MCTF) has
emerged as a very effective structure for highly scalable video coding, as in the
MC-EZBC (Embedded ZeroBlocks Coding) video coder. 3D-SBHP can replace the
EZBC subband coder as a low-complexity alternative. The idea of MCTF-based
3D-SBHP can also be applied to 4D medical volumetric images.
LITERATURE CITED
[1] T. Hamid, “DICOM requirements for JPEG2000”,
ISO/IECJTC1/SC29/WG1, Report N944, 1998.
[2] A. K. Jain, “Fundamentals of Digital Image Processing”, Englewood Cliffs,
NJ: Prentice-Hall, 1989.
[3] J. S. Lim, “Two-Dimensional Signal and Image Processing”, Englewood
Cliffs, NJ: Prentice-Hall, 1990.
[4] J. W. Woods and S. D. O’Neil, “Subband coding of images”, IEEE Trans. on
Acoust., Speech and Signal Processing, Vol. 34, pp. 1278-1288, Oct. 1986.
[5] G. P. Abousleman, M. W. Marcellin, and B. R. Hunt, “Compression of
hyperspectral imagery using the 3-D DCT and hybrid DPCM/DCT”, IEEE
Trans. on Geosci. Remote Sensing, Vol. 33, pp. 26-34, Jan. 1995.
[6] A. Vlaicu, S. Lungu, N. Crisan, and S. Persa, “New compression techniques
for storage and transmission of 2D and 3D medical images”, in Proc. SPIE
Advanced Image and Video Communications and Storage Technologies, vol.
2451, pp. 370-377, Feb. 1995.
[7] A. Bilgin, G. Zweig, and M. W. Marcellin, “Three-dimensional image
compression with integer wavelet transforms”, Applied Optics, Vol. 39, No. 11,
Apr. 2000.
[8] Z. Xiong, X. Wu, S. Cheng, and J. Hua, “Lossy-to-Lossless compression of
medical volumetric data using three-dimensional integer wavelet transforms”,
IEEE Trans. on Medical Imaging, Vol. 22, No. 3, pp. 459-470, March 2003.
[9] B. Kim and W. A. Pearlman, “An embedded wavelet video coder using
three-dimensional set partitioning in hierarchical trees”, IEEE Data
Compression Conference, pp. 251-260, March 1997.
[10] P. Dragotti, G. Poggi, and A. Ragozini, “Compression of multispectral images
by three-dimensional SPIHT algorithm”, IEEE Trans. on Geosci. Remote
Sensing, Vol. 38, pp. 416-428, Jan. 2000.
[11] Y. Kim and W. A. Pearlman, “Lossless volumetric medical image
compression”, in Proc. SPIE Conference on Applications of Digital Image
Processing XXII, vol. 3808, pp. 305-312, July 1999.
[12] Y. S. Kim and W. A. Pearlman, “Stripe-based SPIHT lossy compression of
volumetric medical images for low memory usage and uniform reconstruction
quality”, in Proc. ICASSP, vol. 4, pp. 2031-2034, June 2000.
[13] E. Christophe and W. A. Pearlman, ”Three-dimensional SPIHT Coding of
Volume Images with Random Access and Resolution Scalability”, EURASIP
Journal on Image and Video Processing, 2008.
[14] P. Schelkens, J. Barbarien, and J. Cornelis, “Compression of volumetric
medical data based on cube-splitting”, in Applications of Digital Image
Processing XXIII, Proc. of SPIE 4115, pp. 91-101, San Diego, CA, July 2000.
[15] J. Xu, Z. Xiong, S. Li, and Y. Zhang, “3-D embedded subband coding with
optimal truncation(3-D ESCOT)”, J. Appl. Comput. Harmon. Anal., vol. 10,
pp. 290-315, May 2001.
[16] C. Chrysafis, A. Said, A. Drukarev, A. Islam, and W. A. Pearlman, “SBHP - A
low complexity wavelet coder”, IEEE Int. Conf. Acoust., Speech and Sig.
Proc. (ICASSP2000), vol. 4, pp. 2035-2038, June 2000.
[17] A. Islam and W.A. Pearlman, “An embedded and efficient low-complexity
hierarchical image coder”, in Proc. SPIE Visual Comm. and Image
Processing, Vol. 3653, pp. 294-305, 1999.
[18] J.M. Shapiro, “Embedded image coding using zerotrees of wavelet
coefficients”, IEEE Trans. Image Processing, Vol. 41, pp. 3445-3462, Dec.
1993.
[19] A. Said and W.A. Pearlman, “A new, fast and efficient image codec based on
set-partitioning in hierarchical trees”, IEEE Trans. on Circuits and Systems
for Video Technology, Vol. 6, pp. 243-250, June 1996.
[20] Y. Cho and W. A. Pearlman, “Quantifying the Performance of Zerotrees of
Wavelet Coefficients: Degree-k Zerotree Model”, IEEE Trans. on Signal
Processing, Vol. 55, Part 1, pp. 2425-2431, June 2007.
[21] X. Tang, W. A. Pearlman and J. W. Modestino, “Hyperspectral image
compression using three-dimensional wavelet coding”, SPIE/IS&T Electronic
Imaging 2003, Proceedings of SPIE, Vol. 5022, Jan. 2003.
[22] X. Tang and W. A. Pearlman, ”Three-Dimensional Wavelet-Based
Compression of Hyperspectral Images”, Chapter in Hyperspectral Data
Compression, Kluwer Academic Publishers 2005.
[23] X. Tang, ”Wavelet Based Multi-Dimensional Image Coding Algorithms”,
Ph.D thesis, Rensselaer Polytechnic Institute, 2005.
[24] D. Taubman, “High performance scalable image compression with EBCOT”,
IEEE Trans. on Image Processing, Vol. 9, pp. 1158-1170, July 2000.
[25] Y. Shoham and A. Gersho, “Efficient Bit Allocation for an Arbitrary Set of
Quantizers”, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol.
36, no. 9, pp. 1445-1453, September 1988.
[26] F.W. Wheeler, ”Trellis source coding and memory constrained image coding”,
Ph.D thesis, Rensselaer Polytechnic Institute, 2000.
[27] “Information Technology - JPEG 2000 Image Coding System: Part 2 - Extensions”, no. 15444-2, ISO/IEC JTC1/SC29/WG1 IS, 2002.
[28] “Information Technology - JPEG 2000 Image Coding System: Part 10 - Extensions for three-dimensional data”, no. 15444-10, ISO/IEC
JTC1/SC29/WG1 IS, 2008.
[29] D. T. Lee, “JPEG 2000: Retrospective and New Developments”, Proceedings
of the IEEE, Vol. 93, No. 1, pp. 32-41, Jan. 2005.
[30] I. Daubechies, “Ten lectures on wavelets”, Society for Industrial and Applied
Mathematics, Philadelphia, 1992.
[31] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding
using wavelet transform”, IEEE Trans. on Image Processing, Vol. 1, No. 2,
pp. 205-220, Apr. 1992.
[32] C. K. Chui, ”An Introduction to Wavelets”, Academic Press, San Diego, 1992.
[33] C. Christopoulos, A. Skodras, and T. Ebrahimi. , “The JPEG2000 Still Image
Coding: An Overview”, IEEE Transactions on Consumer Electronics, Vol.
46, No. 4, pp. 1103-1127, November 2000.
[34] W. Sweldens, “The lifting scheme: a custom-design construction of
biorthogonal wavelets”, Applied and Computational Harmonic Analysis, Vol.
3, No. 2, pp. 186-200, 1996.
[35] W. Sweldens, “The lifting scheme: A construction of second generation
wavelets”, SIAM J. Math Anal., Vol. 29, No. 2, pp. 511-546, 1997.
[36] A. R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, “Wavelet
transforms that map integers to integers”, J. Appl. Computa. Harmonics
Anal. 5, pp. 332-369, 1998.
[37] J. Andrew, ”A simple and efficient hierarchical image coder”, IEEE
International Conf. on Image Proc.(ICIP-97), Vol. 3, pp. 658-661, Oct. 1997.
[38] A. Said and W. A. Pearlman, “Low-complexity waveform coding via alphabet
and sample-set partitioning”, Visual Communications and Image Processing
’97, Proceedings of SPIE, vol. 3024, pp. 25-37, Feb. 1997.
[39] W. A. Pearlman, A. Islam, N. Nagaraj, and A. Said, “Efficient, low-complexity
image coding with a set-partitioning embedded block coder”, IEEE Trans. on
Circuits and Systems for Video Technology, Vol. 14, No. 11, pp. 1219-1235,
Nov. 2004.
[40] Reduced Complexity Entropy Coding, ISO/IEC JTC1/SC29/WG1 N1312,
June 1999.
[41] Proposal of the Arithmetic Coder for JPEG2000, ISO/IEC JTC1/SC29/WG1
N762, Mar. 1998.
[42] S. Mallat, ”A wavelet tour of signal processing”, Academic Press, 2nd
Edition, pp. 413, 1999
[43] A. R. Calderbank, I. Daubechies, W. Sweldens, and B. Yeo, “Wavelet
transforms that map integers to integers”, Appl. Comput. Harmon. Anal., vol.
5, no. 3, pp. 332-369, 1998.
[44] I. Daubechies and W. Sweldens, ”Factoring wavelet transforms into lifting
steps”, J. Fourier Anal. Appl., vol. 4, pp. 247-269, 1998.
[45] A. Said and W. Pearlman, ”An image multiresolution representation for
lossless and lossy compression”, IEEE Trans. Image Processing, vol. 5, pp.
1303-1310, Sep. 1996.
[46] Z. Xiong, X. Wu, D.Y. Yun, and W.A. Pearlman, ”Progressive coding of
medical volumetric data using three-dimensional integer wavelet packet
transform”, Medical Technology Symposium, 1998. Proceedings. Pacific, pp.
384-387, 1998.
[47] C. He, J. Dong, Y.F. Zheng, and Z. Gao, ” Optimal 3-D coefficient tree
structure for 3D wavelet video coding”, IEEE Transactions on Circuits and
Systems for Video Technology, vol. 13, pp. 961-972, Oct 2003.
[48] A. P. Bradley and F. W. M. Stentiford, “JPEG 2000 and region of interest coding”,
Digital Image Computing: Techniques and Applications (DICTA), Melbourne,
Australia, pp. 303-308, 2002.
[49] P. N. Topiwala, “Wavelet Image and Video Compression”, Kluwer Academic
Publishers, 1998.
[50] Kakadu JPEG2000 v3.4, http://www.kakadusoftware.com/.
[51] S. Cho and W. A. Pearlman, ”Error Resilient Video Coding with Improved
3-D SPIHT and Error Concealment”, SPIE/IS&T Electronic Imaging 2003,
Proceedings SPIE Vol. 5022, pp. 125-136, Jan. 2003.
[52] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices, and Groups,
Springer, New York, NY, USA, 1988.
[53] S.P. Voukelatos and J. Soraghan, ”Very low bit-rate color video coding using
adaptive subband vector quantization with dynamic bit allocation”, IEEE
Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 2, April
1997, pp. 424-428.
[54] C.E. Shannon, ”Coding theorems for a discrete source with a fidelity
criterion” IRE Nat. Conv. Rec., pt. 4, 1959, pp. 142-163.
[55] Y.L. Linde, A. Buzo and R.M. Gray, ”An algorithm for vector quantizer
design”, IEEE Trans. on Communication, Vol. COM-28, Jan. 1980, pp. 84-95.
[56] R. M. Gray, Source Coding Theory, Boston: Kluwer, 1990.
[57] J. H. Conway and N. J. A. Sloane, “Voronoi regions of lattices, second moments
of polytopes, and quantization”, IEEE Trans. on Information Theory, Vol.
IT-28, Mar. 1982, pp. 211-226.
[58] J.H. Conway and N.J.A Sloane, ”Fast quantizing and decoding algorithms for
lattice quantizers and codes”, IEEE Trans. on Information Theory, Vol.
IT-28, Mar. 1982, pp.227-232.
[59] J.H. Conway and N.J.A Sloane, ”A fast encoding method for lattice codes and
quantizers”, IEEE Trans. on Information Theory, Vol. IT-29, No. 6, Nov.
1983, pp.820-824.
[60] T.R. Fischer, ”A pyramid vector quantizer”, IEEE Trans. on Information
Theory, Vol. IT-32, July 1986, pp. 568-583.
[61] M. Antonini, M. Barlaud, and P. Mathieu, ”Image coding using lattice vector
quantization of wavelet coefficients”, in Proc. IEEE Int. Conf. Acoustics,
Speech and Signal Processing, Toronto, ON, Canada, May 1991, pp.2273-2276.
[62] M. Barlaud, P. Sole, T. Gaidon, M. Antonini and P. Mathieu, ”Pyramidal
lattice vector quantization for multiscale image coding”, IEEE Trans. on
Image Processing, Vol. IP-3, No.4, July 1994, pp. 367-381.
[63] A. Woolf and G. Rogers, ”Lattice vector quantization of image wavelet
coefficient vectors using a simplified form of entropy coding”, in Proc. IEEE
Int. Conf. Acoustics, Speech and Signal Processing, Vol. 5, Adelaide,
Australia, Apr. 1994, pp. 269-272.
[64] H. Man, F. Kossentini, and M. J. T. Smith, ”A family of efficient and channel
error resilient wavelet/subband image coders”, IEEE Trans. on Circuits and
Systems for Video Technology, Vol. 9, no. 1, pp. 95-108, 1999.
[65] E. A. B. Da Silva, D. G. Sampson, and M. Ghanbari, “A successive
approximation vector quantizer for wavelet transform image coding”, IEEE
Trans. Image Processing, vol. 5, pp. 299-310, Feb. 1996.
[66] J. Knipe, X. Li,and B. Han, ”An improved lattice vector quantization scheme
for wavelet compression”, IEEE Trans. Signal Processing, vol. 46, pp.
239-243, Jan. 1998.
[67] D. Mukherjee and S. K. Mitra, ” Vector set partitioning with classified
successive refinement VQ for embedded wavelet image and video coding”, in
Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 5, Seattle,
WA, May 1998, pp. 2809-2812.
[68] D. Mukherjee and S. K. Mitra, “Successive refinement lattice vector
quantization”, IEEE Trans. Image Processing, vol. 11, no. 12, pp. 1337-1348,
Dec. 2002.
[69] C. C. Chao and R. M. Gray, “Image compression with vector SPECK
algorithm”, in Proceedings of IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP’06), vol. 2, pp. 445-448, Toulouse,
France, May 2006.
[70] K. Rose, D. Miller, and A. Gersho, “Entropy-constrained tree-structured
vector quantizer design”, IEEE Trans. Image Processing, vol. 5, No. 2, pp.
393-398, Feb. 1996.
[71] P. A. Chou, T. Lookabaugh, and R. M. Gray, “Entropy-constrained vector
quantization”, IEEE Trans. Acoust. Speech, Signal Processing, vol. 37, No. 1,
pp. 31-42, Feb. 1989.
[72] A.A. Gersho and R. M. Gray, Vector Quantization and Signal Compression,
Kluwer Academic, New York, NY, USA, 1992.
[73] S. Cho, D. Kim, and W. A. Pearlman, ”Lossless compression of volumetric
medical images with improved 3-D SPIHT algorithm”, Journal of Digital
Imaging, Vol. 17, No. 1, pp. 57-63, March 2004.
[74] Y. Liu and W. A. Pearlman, ”Scalable three-dimensional SBHP algorithm
with region of interest access and low complexity,” Applications of Digital
Image Processing XXIX , Proc. SPIE Vol. 6312, pp. 631209-1–11, Aug. 2006.
[75] H.G. Lalgudi, A. Bilgin, M.W. Marcellin, A. Tabesh, M.S. Nadar, and T.P.
Trouard, ”Four-dimensional compression of fMRI using JPEG2000,” in Proc.
SPIE International Symposium on Medical Imaging, Feb. 2005.
[76] L. Zeng, C. P. Jansen, S. Marsch, M. Unser, and P. R. Hunziker,
“Four-dimensional wavelet compression of arbitrarily sized echocardiographic
data,” IEEE Transactions on Medical Imaging, Vol. 21, pp. 1179-1187, Sept.
2002.
[77] H. G. Lalgudi, A. Bilgin, M. W. Marcellin, M. S. Nadar, and T. P. Trouard,
“Compression of fMRI and ultrasound images using 4D SPIHT,” in
Proceedings of 2005 International Conference on Image Processing, Genova,
Italy, September 2005.
[78] A. Kassim, P. Yan, W. Lee, and K. Sengupta, “Motion compensated
lossy-to-lossless compression of 4-D medical images using integer wavelet
transforms”, IEEE Trans. on Info. Tech. in Biomedicine, Vol. 9, No. 1, pp.
132-138, March 2005.
APPENDIX A
Huffman Codes for Entropy Coding and Statistics of the
Training Set
In 3D-SBHP, the dimension of the code-block along the axial direction is much
shorter than its dimensions in the spatial domain. This property makes most
sets 2D sets. All sets are stored in order. For those 2D sets, LIS[k] points to all
2^(k+1) × 2^(k+1) sets. When such a set is partitioned, it produces four subsets or
pixels, which we code together using three individual Huffman codes for three context models.
• Huffman Code 1: if a set in LIS[0] becomes significant, use Huffman
Code 1 to code the significance mask of this set.
• Huffman Code 2: if a non-2 × 2 set in LIS[i] (i > 0) becomes significant,
use Huffman Code 2 to code the significance mask of this set.
• Huffman Code 3: if a 2 × 2 set newly generated in the current
bitplane becomes significant, use Huffman Code 3 to code the significance mask
of this set.
The statistics used to generate the Huffman codes for these three contexts,
and the generated codewords, are listed in the following.
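The codewords in Tables A.5 and A.6 can be regenerated from the collected probabilities by a standard Huffman construction; the sketch below (a hypothetical helper, not the thesis code) shows the idea. The exact bit patterns depend on tie-breaking among equal-probability nodes, so only the codeword lengths are guaranteed to match an optimal code.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Map symbols 1..len(probs) to binary Huffman codewords."""
    tie = count()                            # stable tie-breaker for equal probs
    heap = [(p, next(tie), sym) for sym, p in enumerate(probs, start=1)]
    heapq.heapify(heap)
    while len(heap) > 1:                     # merge the two least-likely nodes
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (a, b)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol index
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes
```

Feeding in one context column of Table A.1 or A.2 yields a fixed code table that the coder can then use for every code-block, which is what keeps the entropy-coding stage in 3D-SBHP so cheap.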
Probabilities of Medical Image Training Set
Significant Mask   Context 1   Context 2   Context 3
        1          0.090296    0.091414    0.135432
        2          0.090227    0.090745    0.135160
        3          0.063756    0.048479    0.055889
        4          0.090235    0.088137    0.135425
        5          0.062569    0.041981    0.054073
        6          0.055897    0.026445    0.044545
        7          0.053861    0.045748    0.031364
        8          0.089652    0.086817    0.134552
        9          0.055741    0.026342    0.044816
       10          0.062524    0.041575    0.053016
       11          0.053850    0.045509    0.031308
       12          0.064294    0.042958    0.055011
       13          0.053944    0.046189    0.031343
       14          0.053892    0.045667    0.031379
       15          0.059261    0.231995    0.026688
Table A.1: Probabilities for 15 significant subset masks collected from
medical image training set.
Probabilities of Hyperspectral Image Training Set
Significant Mask   Context 1   Context 2   Context 3
        1          0.115262    0.096264    0.173623
        2          0.115253    0.096757    0.173633
        3          0.061992    0.049633    0.042905
        4          0.114914    0.095634    0.173279
        5          0.060506    0.047942    0.041147
        6          0.056912    0.038823    0.034611
        7          0.037598    0.049960    0.013908
        8          0.115096    0.098630    0.174340
        9          0.056961    0.038891    0.034788
       10          0.061132    0.050287    0.042022
       11          0.037595    0.049989    0.013926
       12          0.062093    0.050924    0.043522
       13          0.037630    0.049149    0.013912
       14          0.037637    0.049835    0.013839
       15          0.029422    0.137281    0.010545
Table A.2: Probabilities for 15 significant subset masks collected from
hyperspectral image training set.
No. of Significant Subsets   Medical Image   Hyperspectral Image
  in a Significant Set
            1                   0.4436             0.5191
            2                   0.3009             0.2937
            3                   0.1646             0.1381
            4                   0.0909             0.0491
Table A.3: Probabilities for the number of significant subsets in a split
significant set. These statistics are collected from both the medical
image training set and the hyperspectral image training set.
Probability that a generated subset is significant
Medical Image   Hyperspectral Image
   0.4758             0.4293
Table A.4: Probability of significance of a generated subset when a set
is split. These statistics are collected from both the medical image
training set and the hyperspectral image training set.
Huffman Codewords of Medical Image Training Set
Significant Mask   Context 1   Context 2   Context 3
        1          000         010         000
        2          0100        0110        100
        3          1100        1110        0001
        4          0010        0001        010
        5          1010        01011       1001
        6          0110        11011       01011
        7          1110        1001        11011
        8          0001        0101        110
        9          1001        00111       0101
       10          0101        10111       1101
       11          1101        01111       00111
       12          0011        11111       0011
       13          1011        1101        10111
       14          0111        0011        01111
       15          1111        00          11111
Table A.5: Huffman codewords generated for 15 significant subset masks
based on medical image training set.
Huffman Codewords of Hyperspectral Image Training Set
Significant Mask   Context 1   Context 2   Context 3
        1          000         0010        001
        2          100         1010        00
        3          0001        0110        00011
        4          010         1110        101
        5          1001        0001        10011
        6          01011       01111       01011
        7          11011       1001        001111
        8          110         000         10
        9          0101        11111       11011
       10          1101        0101        00111
       11          00111       1101        101111
       12          0011        0011        10111
       13          10111       1011        011111
       14          01111       0111        0111111
       15          11111       100         1111111
Table A.6: Huffman codewords generated for 15 significant subset masks
based on hyperspectral image training set.