PDF - Center for Image Processing Research
Low-Complexity Scalable Multi-Dimensional Image Coding with Random Accessibility
CIPR Technical Report TR-2008-5
Ying Liu
August 2008
Center for Image Processing Research
Rensselaer Polytechnic Institute
Troy, New York 12180-3590
http://www.cipr.rpi.edu

LOW-COMPLEXITY SCALABLE MULTIDIMENSIONAL IMAGE CODING WITH RANDOM ACCESSIBILITY

By Ying Liu

A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY

Major Subject: Electrical Engineering

Approved by the Examining Committee:
William A. Pearlman, Thesis Adviser
Alhussein Abouzeid, Member
Mukkai Krishnamoorthy, Member
John W. Woods, Member

Rensselaer Polytechnic Institute
Troy, New York
July 2008 (For Graduation August 2008)

The original of the complete thesis is on file in the Rensselaer Polytechnic Institute Library.

© Copyright 2008 by Ying Liu
All Rights Reserved

CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENT
ABSTRACT

1. INTRODUCTION
   1.1 Desirable Functionality
       1.1.1 SNR Scalability
       1.1.2 Resolution Scalability
       1.1.3 Random Accessibility
       1.1.4 Low Complexity and Resource Usage
   1.2 Related Works
   1.3 Outline of the Thesis

2. WAVELET AND SET PARTITION CODING
   2.1 Image Coding Background
       2.1.1 Discrete Wavelet Transform
       2.1.2 Statistical Characteristics of Wavelet Transformed Images
       2.1.3 Bit-plane Coding
       2.1.4 Bit-plane Coding Passes
   2.2 Set-Partitioning Image Coding
       2.2.1 SPIHT
             2.2.1.1 Spatial Orientation Trees
             2.2.1.2 Coding Algorithm
       2.2.2 SPECK
       2.2.3 SBHP
       2.2.4 EBCOT
   2.3 Conclusion

3. LOW-COMPLEXITY 3-D IMAGE CODER: 3D-SBHP
   3.1 Introduction
   3.2 Three-Dimensional Integer Wavelet Transform
       3.2.1 Lifting Scheme
       3.2.2 Scaling Factors
   3.3 Scalable 3D-SBHP
       3.3.1 Coding Algorithm
       3.3.2 Processing Order of Sorting Pass
       3.3.3 Entropy Coding
       3.3.4 Memory and Complexity Analysis
       3.3.5 Scalable Coding
             3.3.5.1 Resolution Scalability
             3.3.5.2 Rate Control
   3.4 Numerical Results
       3.4.1 Lossless Coding Performance
             3.4.1.1 Lossless Coding Performance by Use of Different Integer Wavelet Transforms
             3.4.1.2 Comparison of Lossless Performance with Different Algorithms
             3.4.1.3 Lossless Coding Performance by Use of Different Code-block Sizes
       3.4.2 Lossy Performance
       3.4.3 Resolution Scalable Results
       3.4.4 Computational Complexity
   3.5 Summary and Conclusions

4. Region-of-Interest Decoding
   4.1 Code-block Selection
   4.2 Random Accessibility
       4.2.1 Wavelet Transform vs. Random Accessibility
             4.2.1.1 Filter Implementation
             4.2.1.2 ROI Decoding Performance by Use of Different Wavelet Filters and Wavelet Decomposition Levels
       4.2.2 Code-block Configurations vs. Random Accessibility
             4.2.2.1 Lossy-to-lossless Coding Performance by Use of Different Code-block Sizes
             4.2.2.2 ROI Decoding Performance by Use of Different Code-block Sizes and ROI Sizes
       4.2.3 ROI Access Performance by Use of Different Bit Allocation Methods
   4.3 Conclusions

5. Multistage Lattice Vector Quantization for Hyperspectral Image Compression
   5.1 Introduction
   5.2 Vector Quantization
       5.2.1 Lattice Vector Quantization
             5.2.1.1 Classical Lattice
             5.2.1.2 LVQ Codebook
       5.2.2 Multistage Lattice Vector Quantization
   5.3 MLVQ-SPIHT
       5.3.0.1 Cubic Z4 LVQ
       5.3.0.2 Pyramid D4 LVQ
       5.3.0.3 Sphere D4 LVQ
   5.4 Experimental Results
   5.5 Summary and Conclusions

6. Four-Dimensional Wavelet Compression of 4-D Medical Images Using Scalable 4-D SBHP
   6.1 Introduction
   6.2 Scalable 4D-SBHP
       6.2.1 Wavelet Decomposition in 4-D
       6.2.2 Coding Algorithm
       6.2.3 Scalable Coding
   6.3 Numerical Results
       6.3.1 Comparison of Lossless Performance with 3-D and 4-D Schemes
       6.3.2 Comparison of Lossy Performance with 3-D Schemes
       6.3.3 Resolution Scalable Results
   6.4 Summary and Conclusions

7. Conclusions and Future Work
   7.1 Contributions of the Thesis
   7.2 Further Work
       7.2.1 Improving Compression Efficiency
       7.2.2 3D-SBHP on Video

LITERATURE CITED

APPENDICES
A. Huffman Codes for Entropy Coding and Statistics of the Training Set

LIST OF TABLES

2.1 Filter coefficients for the Daubechies biorthogonal 9/7 filters
2.2 Comparison of wavelet-based image coders
3.1 Average standard deviation of volumetric image sequences along X, Y, and Z directions
3.2 Lossless integer filters
3.3 Description of the image volumes
3.4 Comparison of lossless coding results, in bits/pixel, of different coding methods by use of different integer filters on CT data
3.5 Comparison of lossless coding results, in bits/pixel, of different coding methods by use of different integer filters on MR data
3.6 Comparison of lossless coding results, in bits/pixel, of different coding methods by use of different integer filters on AVIRIS data (decomposition level of 3 is used on all dimensions)
3.7 Comparison of different coding methods for lossless compression of 8-bit medical image volumes (bits/pixel)
3.8 Comparison of different coding methods for lossless coding of 16-bit AVIRIS image volumes (bits/pixel) (decomposition level of 5 is used in the spatial domain and decomposition level of 2 on the spectral axis)
3.9 Lossless coding results by use of different code-block sizes (bits/pixel)
3.10 PSNR performance (in dB) of 3D-SBHP at various rates for medical volumetric image data. These rates are obtained by truncation of the lossless bitstream
3.11 PSNR for decoding CT skull at a variety of resolutions and bit rates
3.12 Bytes used to losslessly reconstruct CT skull at a variety of resolutions
3.13 Comparison of lossless encoding time between AT-3D-SPIHT and 3D-SBHP on the images CT skull and MR liver t1 (wavelet transform times are not included)
3.14 Comparison of decoding time between AT-3D-SPIHT and 3D-SBHP on the images CT skull and MR liver t1 at a variety of bit rates (wavelet transform times are not included)
3.15 Lossless decoding time of 3D-SBHP on CT skull and MR liver t1 at a variety of resolutions
3.16 Comparison of CPU cycles used for wavelet transform, lossless encoding and disk I/O between AT-3D-SPIHT and 3D-SBHP on the CT skull image sequence
4.1 Number of taps of integer filters
4.2 Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64 × 64 × 64, code-block size = 8 × 8 × 2, spatial wavelet decomposition level = 3)
4.3 Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64 × 64 × 64, code-block size = 8 × 8 × 2, spatial wavelet decomposition level = 2)
4.4 Description of the image volumes
5.1 Description of the image volume Moffett Field
5.2 Comparison of rate-distortion results of different coding methods in signal-to-noise ratio (SNR) in dB
6.1 Average standard deviation of 4D fMRI and 4D CT image data along X, Y, Z and T directions
6.2 Description of the image volumes
6.3 Lossless compression performance using 4D-SBHP and 3D-SBHP (bits/pixel)
6.4 Lossless compression performance using 4D methods (bits/pixel)
6.5 SNR for decoding siem at a variety of resolutions and bit rates
A.1 Probabilities for 15 significant subset masks collected from the medical image training set
A.2 Probabilities for 15 significant subset masks collected from the hyperspectral image training set
A.3 Probabilities for the number of significant subsets in a split significant set. These statistics are collected from both the medical image training set and the hyperspectral image training set
A.4 Probabilities of significance of a generated subset when a set is split. These statistics are collected from both the medical image training set and the hyperspectral image training set
A.5 Huffman codewords generated for 15 significant subset masks based on the medical image training set
A.6 Huffman codewords generated for 15 significant subset masks based on the hyperspectral image training set

LIST OF FIGURES

1.1 An example of medical CT images (256 × 256 × 192)
1.2 An example of hyperspectral images (512 × 512 × 224)
1.3 Block diagram of a general transform coding system
2.1 Two-channel filter structure for subband coding
2.2 Illustration of a two-dimensional dyadic DWT decomposition when two levels are performed
2.3 Parent-child relationship in SPIHT
2.4 Partitioning of wavelet transformed image into sets S and I
2.5 Quadtree partitioning of set S
2.6 Octave partitioning of set I
2.7 Set partitioning rules used by SBHP
2.8 Example of JPEG2000 code-block scan pattern
3.1 Wavelet decomposition structure with 3 levels of 2D spatial transform followed by 2 levels of 1D axial transform
3.2 The forward wavelet transform using lifting: first the Lazy wavelet (subsample into even and odd), then alternating lifting and dual lifting steps, and finally a scaling
3.3 The inverse wavelet transform using lifting: first a scaling, then alternating dual lifting and lifting steps, and finally the inverse Lazy transform
3.4 An example of scaling factors used in the integer wavelet transform to approximate a 3D unitary transform
3.5 Wavelet decomposition structure with 2 levels of 1D packet decomposition along the axial direction, followed by 3 levels of 2D dyadic transform in the spatial domain
3.6 Partitioning of the code-block into sets S and I
3.7 Quadtree partitioning of set S
3.8 Octave-band partitioning of set I
3.9 Set partitioning rules used by 3-D SBHP
3.10 12 resolution levels with 3-level wavelet decomposition in the spatial domain and 2-level wavelet decomposition in the spectral direction
3.11 An example of 3D-SBHP SNR and resolution scalable coding. The compressed bitstream generated on bitplane α in code-block β is notated as b(α, β). Code-blocks are encoded and indexed from the lowest subband to the highest subband
3.12 Bitstream structure generated by 3D-SBHP. The compressed bitstream generated on bitplane α in code-block β is notated as b(α, β). R(i,j,k) denotes the number of bits used after the ith coding pass (i = 0: LIP pass; i = 1: LIS pass; i = 2: LSP pass) at the jth bit plane for code-block Bk. D(i,j,k) denotes the derivative of the rate-distortion curve, δD(i,j,k), after the ith coding pass at the jth bit plane for code-block Bk
3.13 Reconstructed CT skull, 1st slice, by 3D-SBHP; from left to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp, and original slice
3.14 Reconstructed MR liver t1, 1st slice, by 3D-SBHP; from left to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp, and original slice
3.15 A visual example of resolution scalable decoding. From left to right: 1/4, 1/2 and full resolution at 0.125 bpp
4.1 Spatial access with code-blocks
4.2 Parent-offspring dependencies in the 3D orientation tree
4.3 2D example of code-block selection. Filter length is considered
4.4 A visual example of 3D-SBHP random access decoding
4.5 Rate-distortion performance with increasing code-block size
4.6 Rate-distortion performance with increasing ROI size
4.7 A visual example of ROI decoding from a 3-D SBHP bitstream using different wavelet filters
4.8 Rate-distortion performance with different priorities for code-blocks
5.1 Multistage lattice VQ with the A2 lattice
5.2 An example of the parent-child relationship between vectors when vector dimension N = 4
5.3 Vector SPIHT with successive refinement LVQ
5.4 Comparison of original and reconstructed Moffett scene 3, 49th band, by MLVQ-SPIHT; from top to bottom: original, 0.1 bpp, 0.5 bpp
5.5 Comparison of lossy performance for the Moffett Field image, scene 3
6.1 Wavelet decomposition structure with 2 levels of 1D temporal transform followed by 2 levels of 1D axial transform and 2D spatial transform. The black block is the lowest frequency subband
6.2 Quadtree partitioning of set S
6.3 Octave-band partitioning of set I
6.4 Set partitioning rules used by 4-D SBHP
6.5 An example of 4D-SBHP SNR and resolution scalable coding. Each bitplane α in block β is notated as b(α, β). Code-blocks are encoded and indexed from the lowest subband to the highest subband
6.6 Bitstream structure generated by 4D-SBHP. Each bitplane α in block β is notated as b(α, β). Rate-distortion information is stored in the header of every code-block
6.7 Comparison of lossy performance of mb01 image data
6.8 Comparison of lossy performance of siem image data
6.9 Reconstructed siem sequence at time t = 20 by 4D-SBHP; from left to right, top to bottom: original, 0.5 bpp, 1.0 bpp, and 2.0 bpp
6.10 A visual example of resolution scalable decoding. Full resolution and 1/2 resolution of one slice at 0.25 bpp

ACKNOWLEDGMENT

First and foremost, I offer my sincerest gratitude to my thesis advisor, Professor William A. Pearlman, for his carefully considered advice, patience, selfless support and inspirational enthusiasm for this research. I am very fortunate to have worked with him.

I would like to thank Professors John W. Woods, Alhussein Abouzeid and Mukkai Krishnamoorthy for serving as members of my thesis committee. I am grateful for their help and thoughtful comments on this work.

Many of my fellow students in the Center for Image Processing Research (CIPR), too many to be mentioned individually, were of great help while this work was being done. Their knowledge and experience in performing related research has been very helpful.

I acknowledge with thanks the financial support I received for my Ph.D. program from the Office of Naval Research, the Electrical, Computer and Systems Engineering Department, and Rensselaer Polytechnic Institute.

Lastly, and most importantly, I would like to thank my family, to whom this thesis is dedicated. Their never-ending support in all my endeavors has been invaluable.

ABSTRACT

Multi-dimensional data sets, such as hyperspectral images and the medical volumetric data generated by computed tomography (CT) or magnetic resonance (MR), typically contain many image slices and require a huge amount of storage and transmission bandwidth. To compress such large image data, it is highly desirable to have a low-complexity and efficient image coding algorithm.
Furthermore, in the Internet environment, to make interactive viewing more efficient, we need a compression scheme that is inherently scalable and supports a high degree of random accessibility.

The first aspect of this work proposes a fast coding method that supports both SNR and resolution scalability and decoding of a region of interest by random access to the bitstream. In order to achieve minimal complexity, we use fixed-symbol Huffman coding instead of context-based arithmetic coding. Multi-dimensional subband/wavelet coding is applied to exploit the dependencies, and to provide a multi-resolution representation, in all dimensions. We adopt wavelet bitplane coding to give full SNR scalability. The hierarchical coding and block-based structure enable spatial accessibility and a resolution scalable representation of the wavelet transform coefficients. The framework is designed and implemented for both 3D and 4D image sources. We demonstrate through extensive experiments that our coding scheme performs comparably in compression to other algorithms, while yielding very high coding speeds and supporting all features planned for JPEG 2000.

The second aspect of this thesis proposes a coding method for wavelet coefficients of 3D image sources using vector quantization. In the proposed algorithm, multistage lattice vector quantization (MLVQ) is used to exploit correlations between image slices, while offering successive refinement with low coding complexity and computation. Different LVQs, including cubic Z4 and D4 lattices, are implemented with SPIHT. The experimental results show that MLVQ-based schemes provide better rate-distortion performance at low bit rates than 2D-SPIHT and than algorithms that employ 3D wavelet transforms.

CHAPTER 1
INTRODUCTION

In the past decade, the acquisition, transmission, storage, and processing of digital images have become widespread. Digital technology allows visual information to be regenerated, processed, archived and transmitted easily.
Most significantly, digital images readily support a diverse range of services over the Internet. Despite these advantages, digital images have one problem: they consist of large amounts of data in uncompressed form, especially the three-dimensional and four-dimensional image data produced by medical data acquisition devices or multispectral/hyperspectral imaging techniques. Such volumetric data sets are best viewed as a collection of still images, known as slices. Magnetic Resonance Imaging (MRI), Computer-assisted Tomography (CT) and Ultrasound (US) are typical examples of medical imaging techniques that generate three-dimensional and four-dimensional image data. The slices represent cross sections of the subject at various positions along a third orthogonal axis at a given time instant. Figure 1.1 gives a visual example of a 3D medical CT image sequence. Hyperspectral imaging is a remote sensing technique that also generates three-dimensional data sets, where the slices represent narrow and contiguous spectral bands of the region being viewed by the instrument. As the example in Figure 1.2 shows, a 512 × 512 × 224 hyperspectral image with 16 bits/pixel has a raw file size of about 117.4 Mbytes, which would take roughly an hour to transmit even over a high-speed 256 kbps digital subscriber line (DSL). Therefore, efficient compression should be applied to these data sets before storage and transmission. Moreover, in many Internet applications it is indispensable to guarantee interactivity for consultation and quantitative analysis. As a consequence, trading off image quality and algorithm complexity against the bit-rate constraint requires the compression scheme to provide not only efficient compression but also other functionality.

Figure 1.1: An example of medical CT images (256 × 256 × 192).
Figure 1.2: An example of hyperspectral images (512 × 512 × 224).
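The storage and transmission figures quoted above are easy to verify. The short sanity check below (our own sketch; the variable names are ours, not the thesis's) reproduces the raw size of the hyperspectral cube and shows that a 256 kbit/s link needs on the order of an hour, not minutes, for the full volume:

```python
# Sanity check of the numbers above: raw size of a 512 x 512 x 224
# hyperspectral cube at 16 bits/pixel, and the time to push it through
# a 256 kbit/s DSL link.
width, height, bands = 512, 512, 224
bits_per_pixel = 16

raw_bits = width * height * bands * bits_per_pixel
raw_mbytes = raw_bits / 8 / 1e6       # ~117.4 MB of raw data

link_bps = 256_000                    # 256 kbit/s
seconds = raw_bits / link_bps         # ~3670 s, i.e. about an hour

print(f"{raw_mbytes:.1f} MB, {seconds / 60:.0f} min")
```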
1.1 Desirable Functionality

1.1.1 SNR Scalability

SNR scalability is a functionality provided by multiple layers, such that each enhancement layer carries coefficients quantized with improved accuracy. This means the data that are more important for reconstructing the image are stored before less important data in the compressed image description. In fact, if we take a long enough prefix of the bitstream, we obtain a completely lossless representation of the original image. This lossy-to-lossless compression is useful whenever images are needed at several different quality settings. For medical consultation, discarding small image details that might be an indication of pathology could alter a diagnosis, causing severe human and legal consequences [1]. In this case, lossless decoding is preferred. On the other hand, lossy decoding allows users to quickly browse through a large volumetric data set.

1.1.2 Resolution Scalability

Resolution scalability is the ability to easily display the image at different resolutions. For an algorithm to be resolution scalable, the beginning of the compressed image bitstream should contain the data for reconstructing a small, low-resolution version of the image. Each successive part of the bitstream, along with the previous bits, should contain the data for reconstructing a larger, higher-resolution version of the image. This capability is very useful in image browsing, or in any system that requires different resolutions of an image for output to different devices.

1.1.3 Random Accessibility

Random accessibility refers to the ability to render an arbitrary portion of the image data set from an embedded compressed codestream without having to decode the entire image. This feature, when combined with resolution scalability, is very useful in interactive applications.
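The successive-approximation mechanism behind the SNR scalability of Section 1.1.1 can be sketched in a few lines (an illustrative toy of ours, not the thesis's coder; `decode_top_planes` is an invented name):

```python
# Toy bit-plane decoder: reconstruct coefficient magnitudes from only the
# k most significant bit planes.  Each extra plane halves the worst-case
# error; keeping all planes reproduces the values exactly (lossless).

def decode_top_planes(coeff, k, num_planes=8):
    """Magnitude of `coeff` as seen after the top k of `num_planes` planes."""
    mask = ~((1 << (num_planes - k)) - 1)   # zero out the low bit planes
    return abs(coeff) & mask

coeffs = [200, 57, 13, 3]
for k in (2, 4, 8):
    print(k, [decode_top_planes(c, k) for c in coeffs])
# 2 [192, 0, 0, 0]
# 4 [192, 48, 0, 0]
# 8 [200, 57, 13, 3]
```

Truncating an embedded bitstream after any plane yields exactly this kind of coarse-to-exact reconstruction, which is the lossy-to-lossless behavior described above.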
After viewing a low-resolution overview of the image, a user can zoom to a particular region without decoding the entire high-resolution image data set.

1.1.4 Low Complexity and Resource Usage

The complexity of an image compression algorithm is measured by the number of data operations required to perform encoding and decoding. Resource usage usually means the speed of encoding/decoding and the amount of memory used by the compression algorithm. For some applications, the speed of encoding and decoding an image is critical. In other applications, the amount of memory used by the compression algorithm needs to be small to keep the cost of the entire system low. Clearly, it is desirable for an image compression algorithm to be as fast as possible and to have a memory footprint that is as small as possible.

1.2 Related Works

In the past, some Discrete Cosine Transform (DCT) based schemes [5, 6] were proposed for volumetric coding. In these techniques, the image volume is split into N × N × N blocks and a 3-D DCT is applied to each block. These 3-D DCT based schemes encounter two problems: 1) they cannot meet the requirements imposed by the scalability paradigm, and 2) DCT-based processing cannot be used for true lossless coding.¹ The latter is extremely important for medical applications, which often cannot tolerate any distortion that could lead to a faulty diagnosis. To overcome these problems while maintaining good compression performance, many promising wavelet-based image coding algorithms have been proposed in recent years. Shapiro's Embedded Zerotree Wavelet (EZW) [18] was the first efficient wavelet-based image coding algorithm. Later work by Said and Pearlman [19] on set partitioning in hierarchical trees (SPIHT) improved upon EZW coding and applied it successfully to both lossy and lossless compression. Islam and Pearlman proposed Set Partitioned Embedded bloCK (SPECK) [17], a low-complexity, block-based image coder with similar features.
While EZW and SPIHT represent zero-tree based image coders, SPECK represents image coders based on zero-block structures. The JPEG 2000 standard is based on a similar scheme called Embedded Block Coding with Optimized Truncation (EBCOT) [24]. A SPECK variant called Subband Block Hierarchical Partitioning (SBHP) [16] was proposed as a low-complexity alternative to EBCOT in the JPEG 2000 Working Group. SBHP was incorporated into the JPEG 2000 coding framework simply by replacing EBCOT as the entropy coding engine.

Although 3-D image data can be compressed by applying a two-dimensional compression algorithm to each slice independently, the high correlation between slices makes three-dimension based algorithms a better choice. Recently, compression techniques with a separate 3D wavelet transform and 3D coding of the quantization indices have been considered by several researchers.

¹ We are referring to a true floating-point DCT, not an integer approximation to the DCT, which is not as efficient but can be compatible with lossless coding.

3-D context-based EZW (3-D CB-EZW) [7], a 3D zero-tree coder based on a modified EZW, has been used with good results for compression of volumetric images. However, as pointed out by Xiong et al. [8], the problem of efficient progressive lossy compression is not addressed there. The well-known SPIHT algorithm was extended to three dimensions by Kim and Pearlman [9]. Dragotti et al. applied 3-D SPIHT to the compression of multispectral images [10]. 3-D SPIHT was applied to volumetric medical data by Kim and Pearlman [11]. Stripe-based SPIHT has been proposed for volumetric medical data compression with low memory [12]. Recently, Christophe and Pearlman presented an adaptation of 3D-SPIHT to support random accessibility and resolution scalability [13]. Tang et al. [21] extended SPECK to three dimensions and applied 3-D SPECK to hyperspectral images.
3-D SPECK treats each subband as a code-block and generates an embedded codestream for each code-block independently. The EBCOT algorithm has also been extended to three dimensions by several researchers. Three-Dimensional Cube Splitting EBCOT (3D CS-EBCOT) [14] partitions the wavelet coefficient prism into fixed-size 64 × 64 × 64 code-blocks and applies a cube-splitting technique to each code-block. Xu et al. extended EBCOT to Three-Dimensional Embedded Subband Coding with Optimized Truncation (3-D ESCOT) [15] by treating each subband as a code-block. JPEG2000 Part 2 [27] also provides a method to code multicomponent images: after a 3D discrete wavelet transform (DWT), the JPEG2000 coder is applied to each transformed slice independently. All 3-D applications are potentially affected by the fact that Part 2 fails to enable a number of source coding features in the cross-component direction [29]. JP3D, a new work item within the JPEG working group, is now under development to provide extensions of JPEG2000 for logically rectangular 3D data sets.

Although many techniques with 3D wavelet transforms and 3D coding have been proposed for compressing 3D data sets, most of them are unable to provide full scalability or random access functionality. In this thesis, we address low-complexity compression techniques which support full scalability and a degree of random access into the multi-dimensional image data with a single codestream per data set.

In a transform coding system [2, 3, 4], as depicted in Figure 1.3, a three-dimensional transform, quantization, and adaptive coding methods based on three-dimensional context modelling are all candidates for exploiting the relationships between slices. Due to its superior performance over scalar quantization, vector quantization has been applied in many wavelet-based coding algorithms.
Figure 1.3: Block diagram of a general transform coding system (encoder: transform, quantization, entropy encoder; decoder: entropy decoder, inverse quantization, inverse transform).

In [53], subband image coding with VQ using LBG [55] codebook generation is proposed. Since the LBG training algorithm incurs high computational cost and coding complexity, especially as the vector dimension and bit rate increase, lattice vector quantization has been proposed to reduce the computational complexity [57]. Plain lattice vector quantization (LVQ) of wavelet coefficient vectors has been successfully employed for image compression [61, 62, 63]. To improve performance, it is reasonable to consider combining LVQ with powerful wavelet-based zerotree or set-partitioning image coding methods and the bitplane-wise successive refinement methodologies for scalar sources used in EZW, SPIHT, and SPECK. In [64], multistage lattice vector quantization is used along with both a zerotree structure and a quadtree structure, producing results comparable to JPEG 2000 at low bit rates. VEZW [65] and VSPIHT [66, 67, 68] have successfully combined LVQ with 2D-EZW and 2D-SPIHT, respectively, and in VSPECK [69], tree-structured vector quantization (TSVQ) [70] and ECVQ [71] are used to code the significant coefficients of 2D-SPECK. Since VQ has the ability to exploit the statistical correlation between neighboring data in a straightforward manner, the second aspect of this thesis proposes a coding method for the wavelet coefficients of 3D image sources using vector quantization. Multistage LVQ is used to obtain the counterpart of bitplane-wise successive refinement, where successive lattice codebooks shaped by the Voronoi regions of a multidimensional lattice are used.

1.3 Outline of the Thesis

This chapter briefly introduces the motivation for and desirable features of 3-D image compression. It also includes related work and the outline of this proposal.
In Chapter 2, the fundamentals of hierarchical wavelet-based image compression schemes are introduced. Brief reviews of the discrete wavelet transform and bit-plane coding are given first. The basic coding mechanisms are described by reviewing and analyzing several representative hierarchical set partitioning algorithms, including SPIHT, SPECK, and SBHP. A brief description of the EBCOT algorithm is also given in this chapter.

In Chapter 3, a very fast, low-complexity volumetric image coding algorithm, 3D-SBHP, supporting SNR scalable, resolution scalable, and random access decoding is presented. Our main interest is scalable compression techniques which also support a degree of random access into the volumetric data. Here, a 3D-DWT is applied to an image sequence to exploit the correlation along the spatial dimensions and the axial dimension. After the wavelet transform, the wavelet coefficient prism is split into fixed-size code-blocks and the 3-D SBHP algorithm is applied to each code-block independently. The algorithm is based on set partitioning and bit-plane coding. The set partitioning technique can quickly zoom in on the high-energy areas. The 3D-DWT and block-based coding naturally support resolution scalable coding, while bit-plane coding enables SNR scalability. Experiments show that our proposed algorithm provides efficiency comparable to other algorithms, while supporting all the desirable features addressed in Section 1.1.

Chapter 4 addresses the random access decoding method of 3D-SBHP. The code-block selection method is chosen so that the image sequence can be encoded only once, after which the decoder can directly extract a subset of the codestream to reconstruct a chosen Region of Interest (ROI) at the required quality. In this chapter, we investigate the random accessibility and compression efficiency of highly scalable volumetric compression from both the transform and coding perspectives.

In Chapter 5, we extend the SPIHT coding algorithm with lattice vector quantization to code hyperspectral images.
In the proposed algorithm, multistage lattice vector quantization (MLVQ) is used to exploit correlations between image slices, while offering successive refinement with low coding complexity and computation. Different lattices, including the cubic lattice Z4 and the lattice D4, are considered. Their performances are compared with other 2D and 3D wavelet-based image compression algorithms.

In Chapter 6, the idea of the 3D-SBHP algorithm in Chapter 3 is extended to the 4D case. Resolution scalability is empirically investigated, and the lossy-to-lossless compression performance is compared with other 3D and 4D volumetric compression schemes.

In Chapter 7, the overall conclusions of this proposal and further work are discussed.

CHAPTER 2 WAVELET AND SET PARTITION CODING

Recently, a number of hierarchical wavelet-based image coding techniques have emerged. All of these techniques are based on the idea of set partitioning and exploit the hierarchical subband pyramidal structure of the transformed images. In this chapter, we briefly review the fundamentals of wavelet-based hierarchical set partitioning image compression algorithms. The Discrete Wavelet Transform is briefly introduced first and is followed by basic wavelet bit-plane coding techniques. The last section reviews several important hierarchical set partitioning image coding algorithms, including Set Partitioning In Hierarchical Trees (SPIHT), Set Partitioned Embedded bloCK (SPECK), and Subband Block Hierarchical Partitioning (SBHP). Embedded Block Coding with Optimized Truncation (EBCOT), which is the basis of the JPEG2000 standard, is also described in this chapter.

2.1 Image Coding Background

2.1.1 Discrete Wavelet Transform

The wavelet transform represents its input in terms of functions that are localized in both time and frequency. Mathematically, the wavelet transform approximates a function by representing it as a linear combination of two sets of functions: Φ and Ψ.
The set Φ is constructed from the scaling function, while Ψ is constructed from the mother wavelet. Such a superposition decomposes the function into different scale levels, where each level is then further decomposed with a resolution matched to that level. More detailed mathematical introductions to wavelets can be found in [30, 31, 32].

The discrete wavelet transform (DWT) is applied to discretely sampled data and is based on a low-pass filter and a high-pass filter. A filter is defined by a finite set of filter coefficients. Table 2.1 gives the filter coefficients of the biorthogonal Daubechies 9/7 wavelet [33]. Generally, the output of a filter can be computed by a convolution of the filter coefficients with the input data followed by downsampling, as shown in Fig 2.1. A convolution of filter coefficients c_1, ..., c_m with input data x[1], ..., x[n], producing output y[1], ..., y[n], is given as follows:

y[i] = Σ_{j=1}^{m} c_j x[i + j − ⌈m/2⌉].

As can be seen, the time required to compute the DWT is proportional to the number of filter coefficients. Since downsampling removes half the outputs, the computations for those outputs are not necessary. A very efficient method of computing the DWT, known as the lifting scheme, was presented in [34, 35] to reduce the number of arithmetic operations.

Table 2.1: Filter coefficients for the Daubechies biorthogonal 9/7 filters

                Analysis                        Synthesis
  k     Low-pass      High-pass         Low-pass      High-pass
  0     0.602949      1.115087          1.115087      0.602949
 ±1     0.266864     -0.591271          0.591271     -0.266864
 ±2    -0.078223     -0.057543         -0.057543     -0.078223
 ±3    -0.016864      0.091271         -0.091271      0.016864
 ±4     0.026748                                      0.026748

Figure 2.1: Two-channel filter structure for subband coding (analysis filters followed by downsampling by 2; channel; upsampling by 2 followed by synthesis filters).

A subband is the set of transform coefficient outputs obtained when applying one filter to the input data points. Thus, there are two subbands after the DWT is applied once.
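The convolution-plus-downsampling step above can be sketched in Python using the analysis filters of Table 2.1. This is a minimal illustration, not the thesis's implementation: the function names are assumptions, and the border handling here simply clamps indices, whereas practical coders use symmetric extension.

```python
import math

# Analysis filters of the Daubechies 9/7 wavelet, expanded from Table 2.1
# (the coefficients are symmetric about k = 0).
LOW = [0.026748, -0.016864, -0.078223, 0.266864, 0.602949,
       0.266864, -0.078223, -0.016864, 0.026748]
HIGH = [0.091271, -0.057543, -0.591271, 1.115087,
        -0.591271, -0.057543, 0.091271]

def filter_signal(x, c):
    """y[i] = sum_j c_j * x[i + j - ceil(m/2)], with index clamping at edges."""
    m = len(c)
    off = math.ceil(m / 2)
    n = len(x)
    y = []
    for i in range(n):
        acc = 0.0
        for j in range(1, m + 1):                  # 1-based j as in the text
            k = min(max(i + j - off, 0), n - 1)    # clamp at the borders
            acc += c[j - 1] * x[k]
        y.append(acc)
    return y

def dwt_1d(x):
    """One DWT level: filter, then keep every other output (downsampling)."""
    return filter_signal(x, LOW)[::2], filter_signal(x, HIGH)[::2]
```

On a constant signal, the low-pass subband reproduces the constant (the low-pass taps in Table 2.1 sum to 1) while the high-pass subband is essentially zero, which is the behavior the zerotree coders of Section 2.2 rely on.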
One represents the low-pass filter output (L1), and the other represents the high-pass filter output (H1). When applied to two-dimensional data such as images, the DWT is applied horizontally to each row of the image and then vertically to each column of the wavelet coefficients calculated in the row transformations. Thus, the first level of the transform consists of four subbands: a horizontally and vertically low-pass subband (LL), a horizontally low-pass and vertically high-pass subband (LH), a horizontally high-pass and vertically low-pass subband (HL), and a horizontally and vertically high-pass subband (HH). The LL subband represents a low-resolution overview of the image, while the other subbands represent high-frequency detail information. In the LH, HL, and HH subbands, most coefficients are close to zero, while those that are not represent edges in the image.

When dyadic wavelet decomposition is applied to an image, each successive level of the transform operates only on the LL subband data produced by the previous level. This structure efficiently represents edges in the high-frequency subbands, as well as smooth regions in the low-frequency subbands. This method is also known as pyramidal wavelet decomposition or octave subband wavelet decomposition. Fig 2.2 shows two levels of dyadic wavelet decomposition applied to an image.

Figure 2.2: Illustration of a two-dimensional dyadic DWT decomposition when two levels are performed (the first level splits the original image into LL1, HL1, LH1, and HH1; the second level splits LL1 into LL2, HL2, LH2, and HH2).

2.1.2 Statistical Characteristics of Wavelet Transformed Images

Image compression algorithms work by exploiting the correlation between coefficients and then removing this correlation so that the image can be represented in fewer bits. EZW is a very successful compression algorithm based on two statistical characteristics of wavelet-transformed natural images.
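The row-then-column filtering that produces the four subbands can be sketched as follows. For brevity this illustration uses the two-tap Haar averaging/differencing filters rather than the 9/7 pair of Table 2.1, and the names are illustrative.

```python
def dwt2_level(img):
    """One level of a 2D dyadic DWT (Haar filters for brevity):
    rows first, then columns, yielding the LL, HL, LH, HH subbands."""
    def rows_pass(m):
        low, high = [], []
        for r in m:
            low.append([(r[2*i] + r[2*i+1]) / 2 for i in range(len(r) // 2)])
            high.append([(r[2*i] - r[2*i+1]) / 2 for i in range(len(r) // 2)])
        return low, high

    def transpose(m):
        return [list(c) for c in zip(*m)]

    L, H = rows_pass(img)                                      # horizontal pass
    LL, LH = (transpose(s) for s in rows_pass(transpose(L)))   # vertical on L
    HL, HH = (transpose(s) for s in rows_pass(transpose(H)))   # vertical on H
    return LL, HL, LH, HH
```

On a smooth (here, constant) image, the LL subband carries the low-resolution overview while LH, HL, and HH are zero, matching the statistics described above.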
The first property is that many wavelet coefficients beyond the LL subband are zero or close to zero, and those coefficients generally obey the zerotree property: if a coefficient is found to be insignificant in a given bit-plane, then all of its descendants are also likely to be insignificant. The other property is that the magnitude of a child coefficient is usually less than that of its parent. This property is used in deciding the testing order. In SPECK and EBCOT, the main idea is to exploit the clustering of energy in the hierarchical structure of transformed images: significant coefficients are likely to cluster together, regardless of whether they share the same parent.

2.1.3 Bit-plane Coding

Bit-plane coding has been applied in many wavelet-based image compression algorithms, such as EZW, SPIHT, SPECK, and EBCOT. If we write the absolute value of a wavelet transform coefficient c_i in binary format,

|c_i| = Σ_n b_n 2^n,

where the bit-plane index n = 0, 1, ..., n_max and the bit b_n ∈ {0, 1}, then bit-plane n consists of the single bit b_n, the nth least significant bit of the coefficient magnitude. The bit-planes are coded in order, so that no bit from bit-plane n is coded before all bits from bit-plane n + 1 have been coded. The bit-plane ordering represents successive refinement of a simple scalar quantization of the coefficients. With each additional bit-plane, the quantization bins get smaller and the uncertainty interval of |c_i| is halved. Since a wavelet coefficient can be positive or negative, the sign of each coefficient must also be encoded. Typically, the sign is encoded when a coefficient is identified as significant.

Said and Pearlman [19] show that greater-magnitude coefficients in the wavelet transform domain affect the quality of the image more than lesser-magnitude coefficients. This suggests that the successive refinement strategy of bit-plane coding is a very efficient way to generate an embedded codestream.
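The binary expansion above can be made concrete with a small sketch (names are illustrative; integer magnitudes are assumed for simplicity):

```python
def bitplanes(coeffs):
    """Decompose coefficient magnitudes into bit-planes n_max .. 0.
    Bit-plane n holds, for every coefficient, the nth bit of |c_i|."""
    mags = [abs(c) for c in coeffs]
    n_max = max(mags).bit_length() - 1
    return [[(m >> n) & 1 for m in mags] for n in range(n_max, -1, -1)]

def refine(planes):
    """Successively rebuild the magnitudes one bit-plane at a time."""
    approx = [0] * len(planes[0])
    for plane in planes:
        # appending each plane halves the uncertainty interval of |c_i|
        approx = [(a << 1) | b for a, b in zip(approx, plane)]
    return approx
```

Sending the planes from n_max downward is exactly the successive refinement described in the text: after every plane, each magnitude is known to within half the previous interval.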
Since the sign is very important once a coefficient becomes significant, sending the sign bit immediately after a coefficient becomes significant also helps produce a good embedded codestream. Although many image compression algorithms code the wavelet transform coefficients by bit-plane coding, they differ in the method of scanning and compressing a bit-plane. In this chapter we will discuss some of these methods.

2.1.4 Bit-plane Coding Passes

Wavelet image coding methods typically code each bit-plane with several passes. EZW, SPIHT, and SPECK code each bit-plane with two passes: a significance pass and a refinement pass. The significance pass conveys significance and sign information for coefficients that have not yet been found to be significant. The refinement pass sends one more bit for each coefficient that became significant in a previous bit-plane. For each bit-plane, the two-pass scheme divides the coefficients into two sets: one containing coefficients that became significant in a previous bit-plane, and one containing coefficients not yet identified as significant. The refinement bits have roughly an equal chance of being one or zero, while most significance pass bits are expected to be zero because many wavelet coefficients are close to zero. Due to this probability difference, the division into two sets leads to better rate-distortion performance.

JPEG2000 uses three passes per bit-plane instead of two. The third pass, the cleanup pass, is used to convey significance and sign information for those coefficients that have not yet been found to be significant and are predicted to remain insignificant during the processing of the current bit-plane. Using several passes per bit-plane reduces the amount of data associated with each coding pass, facilitating finer control over rate.

2.2 Set-Partitioning Image Coding

As mentioned before, in the significance pass, a significance map is used to represent each bit-plane.
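The two-pass scheme can be sketched as follows. This is an illustrative, raw-bit sketch under assumed names: a real coder entropy-codes these bits and, in SPIHT or SPECK, drives the significance pass through set partitioning rather than a flat scan.

```python
def two_pass_encode(coeffs, n_max):
    """For each bit-plane n from n_max down to 0: a significance pass
    (significance bit, plus a sign bit for each newly significant
    coefficient) followed by a refinement pass (one magnitude bit for
    each coefficient that became significant in an earlier bit-plane)."""
    significant = set()
    out = []                                     # raw (uncoded) output bits
    for n in range(n_max, -1, -1):
        # significance pass over coefficients not yet significant
        for i, c in enumerate(coeffs):
            if i not in significant:
                sig = 1 if abs(c) >= (1 << n) else 0
                out.append(sig)
                if sig:
                    out.append(0 if c >= 0 else 1)   # sign bit
        newly = {i for i, c in enumerate(coeffs)
                 if abs(c) >= (1 << n) and i not in significant}
        # refinement pass: only coefficients from earlier bit-planes
        for i in sorted(significant):
            out.append((abs(coeffs[i]) >> n) & 1)
        significant |= newly
    return out
```

Note how the refinement pass skips the coefficients found in the current significance pass, matching the "except those included in the last sorting pass" rule in the pseudocode of Section 2.2.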
The efficiency of representing this map determines the compression performance. An efficient method of representing the significance map is to group coefficients into sets. A set S is said to be significant with respect to bit-plane n if

max_{(i,j)∈S} |c_{i,j}| ≥ 2^n;

otherwise it is insignificant, where c_{i,j} is the wavelet transform coefficient at coordinate (i, j). An insignificant set can be represented with a single bit 0, while a significant set is recursively partitioned and tested until the significant coefficients are located. The objective of a set partitioning scheme is to create new partitions such that subsets expected to be insignificant contain a large number of coefficients, while subsets expected to be significant contain only one. There are typically two types of sets:

• Interband sets: contain coefficients from a number of different subbands. The best known and most widely implemented interband sets are the zerotree set in EZW and the Spatial Orientation Tree in SPIHT.

• Intraband sets: contain only coefficients that lie wholly within a subband. An example of an intraband coding scheme is quadtree partitioning, which quickly zooms in on areas of high energy while maintaining large sets of low-energy coefficients.

In this section, set partitioning image compression algorithms that serve as benchmarks are discussed in detail.

2.2.1 SPIHT

Said and Pearlman's SPIHT [19] belongs to a class of embedded, tree-structured significance mapping schemes. Compared to EZW, more (wide-sense) zerotrees are efficiently found and represented in SPIHT by separating the tree root from the tree, i.e., a zerotree whose root is a significant coefficient. In the terminology of the study in [20], SPIHT is a degree-2 zerotree coder and EZW a degree-0 zerotree coder. This results in better performance and faster speed than EZW.
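The set significance test above translates directly into code (an illustrative rendering; the function name and the coordinate-list representation of S are assumptions):

```python
def significant(coeffs, S, n):
    """Significance of a set S (a list of (i, j) coordinates) with
    respect to bit-plane n: max over S of |c_{i,j}| >= 2**n."""
    return max(abs(coeffs[i][j]) for (i, j) in S) >= (1 << n)
```

A single call answers the test for an arbitrarily large set, which is why an insignificant set costs only one bit in the codestream.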
2.2.1.1 Spatial Orientation Trees

Spatial orientation trees, or zerotrees, are based upon the hypothesis that if a wavelet coefficient at a coarse scale is insignificant with respect to a given threshold T, then all wavelet coefficients of the same orientation at the same spatial location at finer scales are likely to be insignificant with respect to T [18]. In the hierarchical subband system, every coefficient at a given scale can be related to a set of coefficients at the next finer scale of the same orientation. The spatial orientation tree is constructed based on this hypothesis and this parent-child relationship across levels of the decomposition. The trees are partitioned into four types of sets:

• H: the roots of all spatial orientation trees. They are grouped into 2 × 2 blocks whereby the upper-left coefficient has no offspring.

• O(i, j): the offspring set contains the direct offspring of the node at coordinates (i, j), that is, the four coefficients at the same spatial location in the next level of the pyramid. Except at the highest and lowest pyramid levels, the offspring set is defined as:

O(i, j) = {(2i, 2j), (2i, 2j + 1), (2i + 1, 2j), (2i + 1, 2j + 1)};  (2.1)

• D(i, j): the set of all descendants of the coefficient at (i, j).

• L(i, j): D(i, j) − O(i, j).

The parent-child relationships and the four types of set definitions of SPIHT in the 2D spatial orientation tree are shown in Fig 2.3. Here we say that an LIS entry is of type A if it represents D(i, j), and of type B if it represents L(i, j).

Figure 2.3: Parent-child relationship in SPIHT (offspring sets O(i, j), type A sets D(i, j), and type B sets L(i, j), with tree roots in the LL3 subband).

2.2.1.2 Coding Algorithm

The SPIHT algorithm maintains three lists to store significance information. Significant coefficients are stored in the List of Significant Pixels (LSP), while insignificant coefficients are stored in the List of Insignificant Pixels (LIP).
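The offspring and descendant sets can be computed as follows (a sketch under assumed names; the special rule for tree roots in H, where the upper-left coefficient of each 2 × 2 block has no offspring, is omitted for brevity):

```python
def offspring(i, j, rows, cols):
    """O(i, j) per Eq. (2.1); empty at the finest level (out of bounds)."""
    cand = [(2*i, 2*j), (2*i, 2*j + 1), (2*i + 1, 2*j), (2*i + 1, 2*j + 1)]
    return [(r, c) for (r, c) in cand if r < rows and c < cols]

def descendants(i, j, rows, cols):
    """D(i, j): all descendants across all finer levels;
    L(i, j) is then D(i, j) minus O(i, j)."""
    out = []
    frontier = offspring(i, j, rows, cols)
    while frontier:
        out.extend(frontier)
        frontier = [p for q in frontier for p in offspring(*q, rows, cols)]
    return out
```

For an 8 × 8 transform, a node at (1, 1) has 4 offspring and 16 grandchildren, so |D(1, 1)| = 20 and |L(1, 1)| = 16, illustrating how a single type A significance bit can stand in for a large descendant set.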
A coordinate (i, j) representing an insignificant set D(i, j) or L(i, j) is stored in the List of Insignificant Sets (LIS) as a type A or type B entry, respectively. The algorithm consists of four stages: initialization, sorting pass, refinement pass, and quantization step update. The last three stages are repeated for each bit-plane.

First, the algorithm is initialized by adding all coefficients in the lowest subband to the LIP, and all of those with offspring to the LIS as type A entries. The sorting pass begins by testing each entry in the LIP for significance with respect to the current threshold and coding the result. For each type A entry in the LIS, the descendant set is tested. If it is significant, the set is partitioned into a type B set and four offspring. The reason is that if a set is significant, it is likely that the significant descendant will turn out to be one of its offspring. After the decomposition, the decomposed coefficients and sets are further tested for significance. This process is repeated until all the significant coefficients of that root are located. During the refinement pass, for each entry in the LSP except those added in the last sorting pass, the nth most significant bit of the coefficient is output. In this manner, a bit-plane transmission scheme is achieved. For the quantization step, n is decremented by 1 to process the next lower bit-plane. This process is repeated until either the desired rate is reached or all coefficients have been transmitted. As a result, SPIHT generates a progressive codestream. The detailed SPIHT algorithm is presented below.

Terminology: S_n(τ) denotes the significance of set τ with respect to bit-plane n:

S_n(τ) = 1 if 2^n ≤ max_{(i,j)∈τ} |c_{i,j}| < 2^{n+1}, and 0 otherwise.  (2.2)

SPIHT Algorithm

1. Initialization

• output n = ⌊log2(max_{∀(i,j)} |c_{i,j}|)⌋
• set LSP = ∅
• set LIP = (i, j) ∈ H
• set type A LIS = (i, j) ∈ H, such that D(i, j) ≠ ∅
2. Sorting Pass

(a) for each (i, j) ∈ LIP:
  • output S_n(i, j)
  • if S_n(i, j) = 1, move (i, j) to the LSP and output the sign of c_{i,j}

(b) for each (i, j) ∈ LIS:
  i. if (i, j) ∈ LIS (type A):
    • output S_n(D(i, j))
    • if S_n(D(i, j)) = 1:
      – for each (k, l) ∈ O(i, j):
        ∗ output S_n(k, l)
        ∗ if S_n(k, l) = 1, add (k, l) to the LSP and output the sign of c_{k,l}
        ∗ if S_n(k, l) = 0, add (k, l) to the LIP
      – if L(i, j) = ∅, remove (i, j) from the LIS (type A) and skip step ii; else change (i, j) to type B
  ii. if (i, j) ∈ LIS (type B):
    • output S_n(L(i, j))
    • if S_n(L(i, j)) = 1:
      – add each (k, l) ∈ O(i, j) to the LIS (type A)
      – remove (i, j) from the LIS (type B)

3. Refinement Pass

(a) for each (i, j) ∈ LSP, except those included in the last sorting pass, output the nth MSB of |c_{i,j}|

4. Quantization Step

(a) decrement n by 1
(b) go to step 2

2.2.2 SPECK

The SPECK algorithm [17] improves upon SPIHT [19], SWEET [37], and AGP [38] by producing a fully embedded bit-stream which supports progressive transmission by coding bit-planes in decreasing order. The SPECK coding scheme provides excellent results, comparable to popular image coding schemes such as SPIHT. SPECK differs from SPIHT in that it does not use spatial orientation trees; rather, like SWEET and AGP, it makes use of sets in the form of blocks of contiguous coefficients within subbands. SPECK incorporates the octave band partitioning of SWEET to exploit the hierarchical structure of the wavelet transform. It makes use of the quadtree splitting scheme of AGP to quickly zoom in on areas of high energy while maintaining areas of low activity in relatively large sets.

The SPECK algorithm makes use of rectangular regions of the image, referred to as sets of type S. The dimension of a set S depends on the dimension of the original image and the subband level of the pyramidal structure at which the set lies. To test the significance of a set S, SPECK follows the same terminology used in the SPIHT algorithm.
Two linked lists are maintained in the SPECK algorithm: the List of Insignificant Sets (LIS) and the List of Significant Pixels (LSP). The former contains sets of type S of varying sizes which have not yet been found significant, while the latter contains coefficients which have been found significant. Like SPIHT, SPECK also consists of four steps: the initialization step, the sorting pass, the refinement pass, and the quantization step.

The algorithm starts by partitioning the transformed image into two sets: a set S which is the root of the pyramid, and a set I which is everything that is left of the image after taking out the root, as shown in Fig. 2.4. To start the algorithm, the set S is added to the LIS. The sorting pass examines the significance of the LIS entries and the set I. If a set S in the LIS is insignificant, it stays in the LIS. Otherwise, quadtree partitioning is applied to S: the significant set S is partitioned into four equal subsets, which are retested. In this manner, the quadtree procedure recursively divides the set S into homogeneous rectangular regions until all significant coefficients are located. The partitioning process is demonstrated in Fig. 2.5, where quadtree partitioning is used to locate two significant coefficients. The motivation for quadtree partitioning of such sets is to zoom in quickly to areas of high energy in the set S and code them first.

Figure 2.4: Partitioning of the wavelet transformed image into sets S and I.

Figure 2.5: Quadtree partitioning of a set S (a significant set is split into four quadrants, and significant quadrants are split again until the individual significant coefficients are isolated).

For each bit-plane, after testing all sets of type S, the set I is tested next. If I is significant, it is partitioned by another partitioning scheme: octave band partitioning. Fig. 2.6 gives an illustration of this partitioning scheme. The set I is partitioned into four sets: three type S sets and one type I set. These new sets are recursively tested for significance.
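The quadtree splitting of a significant set S, as in Fig. 2.5, can be sketched recursively (an illustrative sketch under assumed names; the LIS/LSP bookkeeping is omitted so that only the emitted significance-map bits remain):

```python
def quadtree_bits(block, n, r0=0, c0=0, size=None):
    """Quadtree significance coding of a square set against bit-plane n:
    emit 0 for an insignificant set; emit 1 and recurse into the four
    quadrants of a significant set until single coefficients are reached."""
    if size is None:
        size = len(block)
    sig = any(abs(block[r][c]) >= (1 << n)
              for r in range(r0, r0 + size)
              for c in range(c0, c0 + size))
    bits = [1 if sig else 0]
    if sig and size > 1:
        half = size // 2
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
            bits += quadtree_bits(block, n, r0 + dr, c0 + dc, half)
    return bits
```

A 4 × 4 set with a single significant coefficient costs nine bits at that bit-plane, while an entirely insignificant set of any size costs just one, which is the zooming behavior described above.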
The octave partitioning scheme is used to exploit the hierarchical structure of the subband decomposition, where the energy is most likely concentrated at the topmost levels of the pyramid, and as one goes down the pyramid, the energy content decreases gradually. Once one sorting pass has occurred, sets of type S of varying sizes have been added to the LIS. During the next lower bit-plane, these sets are processed in increasing order of their size.

Figure 2.6: Octave partitioning of the set I into three sets S1, S2, S3 and a new, smaller set I.

SPECK uses an array of lists. Each list corresponds to a level of the pyramid and stores sets of a fixed size. Processing the lists in an order that corresponds to increasing size of sets completely eliminates the need for any sorting mechanism, which would significantly slow down the coding speed. Once all the sets have been processed for a bit-plane, the refinement pass is initiated for that bit-plane. This procedure is the same as that of SPIHT. The pseudocode of SPECK is given below.

1. Initialization

• partition the image transform X into two sets: S ≡ root, and I ≡ X − S
• output n = ⌊log2(max_{∀(i,j)∈X} |c_{i,j}|)⌋
• add S to the LIS and set LSP = ∅

2. Sorting Pass

• in increasing order of size of sets
  – for each set S ∈ LIS,
    ∗ ProcessS(S)
• ProcessI()

3. Refinement Pass

• for each (i, j) ∈ LSP, except those included in the last sorting pass, output the nth MSB of |c_{i,j}|
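The octave band partitioning step can be sketched as a coordinate computation (regions as (row0, col0, height, width) tuples; the names and the return convention are assumptions of this sketch):

```python
def octave_partition(s_rows, s_cols):
    """Octave band partitioning of the set I: the three new S sets are the
    blocks that, together with the already-partitioned top-left corner,
    tile a corner of twice the linear size; the new I is what remains
    outside that doubled corner."""
    s1 = (0, s_cols, s_rows, s_cols)         # right of the old corner
    s2 = (s_rows, 0, s_rows, s_cols)         # below the old corner
    s3 = (s_rows, s_cols, s_rows, s_cols)    # diagonal block
    return [s1, s2, s3], (2 * s_rows, 2 * s_cols)   # new corner size
```

Repeatedly splitting I this way walks the partition up the pyramid one resolution level at a time (e.g. a 2 × 2 root grows to a 4 × 4, then an 8 × 8 resolved corner), matching the energy ordering the scheme is designed to exploit.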
4. Quantization Step

• decrement n by 1, and go to step 2

ProcessS(S)
{
• output S_n(S)
• if S_n(S) = 1
  – if S is a single coefficient, output the sign of S and add S to the LSP
  – else CodeS(S)
  – if S ∈ LIS, remove S from the LIS
• else
  – if S ∉ LIS, add S to the LIS
}

CodeS(S)
{
• partition S into four equal subsets O(S)
• for each O(S)
  – output S_n(O(S))
  – if S_n(O(S)) = 1
    ∗ if O(S) is a single coefficient, output the sign of O(S) and add O(S) to the LSP
    ∗ else CodeS(O(S))
  – else
    ∗ add O(S) to the LIS
}

ProcessI()
{
• output S_n(I)
• if S_n(I) = 1
  – CodeI()
}

CodeI()
{
• partition I into four sets: three S and one I
• for each of the three sets S
  – ProcessS(S)
• ProcessI()
}

2.2.3 SBHP

Subband Block Hierarchical Partitioning (SBHP) [16], a SPECK variant, was originally proposed as a low-complexity alternative for JPEG2000. SBHP has been incorporated into the JPEG2000 Verification Model (VM) 4.2, where a command-line switch initiates the SBHP coding engine in place of EBCOT [24] for coding the codeblocks of the subbands. Every single feature and mode of operation supported by the VM continues to be available with SBHP. Like EBCOT, SBHP is applied to blocks of wavelet coefficients extracted from inside subbands. Apart from the fact that it does not use the arithmetic encoder, it does not require any change in any of the VM functions outside of the entropy coding.

SBHP uses SPECK's octave-band partitioning scheme on codeblocks and encodes the S sets with the quadtree-splitting CodeS(S) procedure of SPECK, as described in the last section. Minor differences are that SBHP maintains three lists: the List of Insignificant Sets (LIS), the List of Insignificant Pixels (LIP), and the List of Significant Pixels (LSP). SBHP uses a separate LIP for insignificant isolated pixels, but the LIP is visited first and then the LIS in order of increasing set size. Therefore, the two lists LIP and LIS are functionally equivalent to the single LIS list in SPECK. Figure 2.7 shows the sequential process of partitioning a 16 × 16 codeblock.
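Putting the two partitioning rules together, a single sorting pass over one bit-plane can be sketched as follows. This is a simplified illustration of the ProcessS/CodeS and ProcessI/CodeI pseudocode above, under assumed names: signs, the LIS/LSP lists carried across bit-planes, and entropy coding are all omitted, and the root is assumed to be a 2 × 2 corner of a power-of-two square transform.

```python
def speck_sort_pass(X, n, root=2):
    """One SPECK-style sorting pass for bit-plane n over a square
    transform X, combining quadtree splitting of S sets with octave
    band partitioning of I. Returns the emitted significance-map bits
    and the newly significant coordinates."""
    N = len(X)
    bits, lsp = [], []
    thresh = 1 << n

    def sig_block(r0, c0, size):
        return any(abs(X[r][c]) >= thresh
                   for r in range(r0, r0 + size)
                   for c in range(c0, c0 + size))

    def process_s(r0, c0, size):                 # ProcessS + CodeS
        s = 1 if sig_block(r0, c0, size) else 0
        bits.append(s)
        if s:
            if size == 1:
                lsp.append((r0, c0))
            else:
                half = size // 2
                for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
                    process_s(r0 + dr, c0 + dc, half)

    def process_i(corner):                       # ProcessI + CodeI
        if corner >= N:
            return
        s = 1 if any(abs(X[r][c]) >= thresh
                     for r in range(N) for c in range(N)
                     if r >= corner or c >= corner) else 0
        bits.append(s)
        if s:  # split I into three S sets plus a smaller I
            process_s(0, corner, corner)
            process_s(corner, 0, corner)
            process_s(corner, corner, corner)
            process_i(2 * corner)

    process_s(0, 0, root)
    process_i(root)
    return bits, lsp
```

Running it on a 4 × 4 transform shows both behaviors: a significant coefficient in the root is located by quadtree splitting alone, while one deep in I first triggers an octave split and is then pinned down by quadtree splitting within the diagonal S set.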
In SBHP, the partitioning of the codeblock mimics the octave band partitioning in SPECK, starting with a 2 × 2 block S at the upper left and the rest of the block as the I set, as shown in Figure 2.7(a). The coding proceeds in the codeblock just as it does for the full-transform SPECK described in the last section until the block's target file size is reached. In this example, the first set can be decomposed into 4 individual pixels, and the second set can be decomposed into three 2 × 2 blocks and the remaining pixels. Figure 2.7(b) shows the next level of decomposition: each 2 × 2 set can be decomposed into 4 pixels, and the remaining set can be partitioned into groups of 4 × 4, plus the remaining pixels. In the next stage, each 4 × 4 set is split into four 2 × 2 sets, and the remaining set is partitioned into three 8 × 8 sets. As shown in Figure 2.7(c), at this moment there is no set of remaining pixels. Figure 2.7(d) shows how the process continues until all sets are partitioned down to individual pixels. The procedure is repeated on the next codeblock until all codeblocks in each subband are coded. The subbands are visited in order from lowest to highest frequency, in the same order dictated by the octave band partitioning in the full-transform SPECK. The pseudocode of the SBHP algorithm is given below.

Figure 2.7: Set partitioning rules used by SBHP.

For l = 1, 2, ..., L

1. Initialization

• partition the code-block X into two sets: S ≡ the top-left 2 × 2 block, and I ≡ X − S
• output n = ⌊log2(max_{∀(i,j)∈X} |c_{i,j}|)⌋
• add S to the LIS, set LIP = all coefficients in the codeblock, and set LSP = ∅

2. Sorting Pass

• for each c_{i,j} ∈ LIP, output S_n(c_{i,j})
  – if S_n(c_{i,j}) = 1
    ∗ output the sign of c_{i,j} and move c_{i,j} from the LIP to the LSP
• in increasing order of size of sets
  – for each set S ∈ LIS,
    ∗ ProcessS(S)
• ProcessI()

3. Refinement Pass

• for each (i, j) ∈ LSP, except those included in the last sorting pass, output the nth MSB of |c_{i,j}|
4. Quantization Step

• decrement n by 1, and go to step 2

ProcessS(S)
{
• output S_n(S)
• if S_n(S) = 1
  – CodeS(S)
  – if S ∈ LIS, remove S from the LIS
• else
  – if S ∉ LIS, add S to the LIS
}

CodeS(S)
{
• partition S into four equal subsets O(S)
• for each O(S)
  – output S_n(O(S))
  – if S_n(O(S)) = 1
    ∗ CodeS(O(S))
  – else
    ∗ add O(S) to the LIS
}

ProcessI()
{
• output S_n(I)
• if S_n(I) = 1
  – CodeI()
}

CodeI()
{
• partition I into four sets: three S and one I
• for each of the newly generated sets S
  – ProcessS(S)
• ProcessI()
}

SBHP uses a simple fixed Huffman code with 15 symbols for encoding the significance map bits generated by the SPECK algorithm. No entropy coding is used for the sign and refinement bits. This results in some compression loss, but it has been observed that it is very difficult to compress these bits efficiently, and nothing is simpler than just moving those "raw" bits to the compressed stream. Although EBCOT outperforms SBHP in terms of PSNR, tests showed that SBHP was about 4 times faster than the JPEG2000 VM 4.2 in encoding and about 6 to 8 times faster in decoding for the embedded version, and as much as 11 times faster for the non-embedded version, in which case the complexity of SBHP becomes close to that of baseline JPEG. For natural images, such as photographic and medical images, the reductions in PSNR relative to VM 4.2 are in the range of 0.4-0.5 dB. SBHP showed losses in bit rate at the same PSNR level of 5-10% for lossy compression and only 1-2% for lossless compression [39].

2.2.4 EBCOT

Taubman's Embedded Block Coding with Optimized Truncation (EBCOT) [24] is significant because it offers the capabilities of embeddedness, resolution scalability, and spatial accessibility. In EBCOT, compression to a specific rate is achieved as a two-tier process. The image is first compressed without considering the target bit rate; the second tier then post-processes the bitstream to produce a rate-distortion optimized bitstream for a specific rate.
EBCOT divides each subband into relatively small codeblocks (typically 32 × 32 or 64 × 64) and codes each codeblock independently to generate an embedded bitstream for that block. Truncation points are marked in the embedded bitstream of each codeblock, with each truncation point corresponding to a certain quality metric. The tier-two algorithm selects truncation points from each codeblock to construct the optimal embedded bitstream for a given bit rate. Since the codeblocks are relatively small and a multi-resolution wavelet transform is used, this coder is also spatially accessible and resolution scalable. The downside of this scheme is that for each desired embedded rate, truncation points have to be marked in each codeblock. To approximate a truly embedded scheme, EBCOT has to select a large number of truncation points for each codeblock. The overhead associated with the location and quality metric information of each truncation point has a negative effect on performance.

In EBCOT, codeblocks are encoded in bit-plane order. Each codeblock is recursively partitioned into sub-blocks, typically down to dimensions of 16 × 16. A quadtree structure is applied over the sub-blocks to code the significance of each sub-block explicitly prior to sample-by-sample coding within the significant sub-blocks. Each significant sub-block is encoded with a specialized sub-block coding method: the coefficients of a significant sub-block are coded by an arithmetic coder with 18 different contexts. A coefficient's context is determined by its own significance status and the significance status of its eight adjacent neighbors. Moreover, five additional contexts are used for encoding the sign bits of coefficients that have just become significant. Fractional bit-plane coding is achieved using four passes for each bit-plane. Over a wide variety of images and bit rates, the PSNR performance improvement over SPIHT is about 0.4 dB on average.
The EBCOT algorithm is the basis for the JPEG2000 standard, with some modifications introduced to the entropy coding part of EBCOT. Most of the changes are described in [40, 24]. Here we give a brief summary of these changes.

• To reduce the model adaptation cost in typical images, some of the contexts are initialized in an assumed highly skewed state instead of the traditional equiprobable state.
• A lower-complexity but less effective coder known as the MQ coder [41] is used instead of the traditional arithmetic coder.
• Only three coding passes are used in the fractional bit-plane scheme.
• There is no quadtree partitioning of code-blocks. Each bit plane of a code-block is scanned in a particular order, as shown in Figure 2.8.

Figure 2.8: Example of JPEG2000 code-block scan pattern

The cumulative effect of these modifications is a 40% improvement in execution speed for the entropy coding part, with an average loss of about 0.15 dB [24].

2.3 Conclusion

In the previous sections, we discussed the desirable features of image compression and the most popular recently proposed wavelet-based image coding algorithms. In Table 2.2, we make an approximate comparison of these algorithms to provide a general guide for choosing an appropriate coder for different applications. For example, when low complexity and good PSNR performance are a must, we can choose algorithms like SPIHT or SPECK. If scalability and accessibility are preferred, SBHP and JPEG2000 are good candidates.
                        SPIHT      SPECK      SBHP      EBCOT/JPEG2K
PSNR Performance        Very Good  Very Good  Good      Excellent
SNR Scalability         Yes        Yes        Yes       Yes
Resolution Scalability  No         Yes        Yes       Yes
Random Accessibility    No         No         Yes       Yes
Complexity              Low        Low        Very Low  High

Table 2.2: Comparison of wavelet-based image coders

CHAPTER 3 LOW-COMPLEXITY 3-D IMAGE CODER: 3D-SBHP

In this chapter, we present a low-complexity three-dimensional image compression algorithm that supports Signal-to-Noise Ratio (SNR) scalability, resolution scalability, and Region-Of-Interest (ROI) decoding. We demonstrate progressive lossy-to-lossless compression of volumetric images using a three-dimensional integer wavelet transform and subband block hierarchical partitioning (SBHP). The coding efficiency comes from exploiting the dependencies in all three dimensions. The hierarchical coding and block-based structure enable spatial accessibility and a resolution-scalable representation of the wavelet transform coefficients.

3.1 Introduction

Nowadays, many medical data acquisition devices and multispectral imaging techniques produce three-dimensional image data. The increasing use of three-dimensional imaging modalities triggers the need for efficient techniques to transport and store the related volumetric data. To make interactive viewing more efficient, we need a compression scheme that is inherently scalable and that supports a high degree of random accessibility and fast encoding/decoding. To store and transmit such three-dimensional data, the image volume is typically treated as a stack of slices. In the previous chapter, we introduced several benchmark wavelet-based embedded image compression algorithms. It is always possible to compress the slices independently and use multiplexing mechanisms to select from each slice the correct bitstream to support the required Quality-of-Service for the whole volumetric image, and we shall use such a scheme as our reference point.
Since neighboring slices have high spatial correlation, it is natural to try to improve compression efficiency by exploiting this property. To provide scalability and compression efficiency, many wavelet-based coders have been extended to three dimensions, such as Three-Dimensional Context-Based Embedded Zerotree of Wavelet coefficients (3D-CB-EZW) [7], Three-Dimensional Set Partitioning In Hierarchical Trees (3D-SPIHT) [9, 11, 10], Stripe-based SPIHT [12], Three-Dimensional Set Partitioned Embedded bloCK (3D-SPECK) [21], Three-Dimensional Cube Splitting EBCOT (3D CS-EBCOT) [14], Three-Dimensional SPIHT with Random Access and Resolution Scalability (RARS 3D-SPIHT) [13], and the Annex of Part II of the JPEG2000 standard [27] for multi-component imagery compression. Although all of these algorithms support SNR scalability, only the EBCOT-based 3D CS-EBCOT, JPEG2000 multi-component, and RARS 3D-SPIHT can support SNR/resolution scalability and random accessibility simultaneously. However, 3D CS-EBCOT inherits high complexity from EBCOT, RARS 3D-SPIHT has a relatively high memory requirement, and JPEG2000 multi-component fails to enable a number of source coding features in the cross-component direction. With interactive 3D image viewing in mind as the primary application, we can see that, among the algorithms shown in Table 2.2, SBHP is a very good candidate for volumetric images. Recently, JP3D, Part 10 of JPEG2000 [28], which will provide extensions of JPEG2000 for logically rectangular 3D data sets with no time component, has been issued, but no workable application is yet available. Our work, presenting a low-complexity 3D image coder for interactive applications, is motivated in part by JP3D.
Our work not only emphasizes desired features, such as scalability and the ability to access regions of interest within volumetric images, but also provides an application which allows efficient network access to compressed volumetric image data and their metadata in a way that exploits these features. In this chapter, we first briefly introduce the 3D wavelet/subband transform structure used in 3D-SBHP and describe the lifting scheme and integer-to-integer transforms. In addition to the integer transform, we also describe the proper scaling of coefficients and the wavelet transform structure that together yield an approximately unitary 3D transform. In Section 3.3, we present the details of the 3D-SBHP algorithm. Simulation results, comparisons, and analysis are given in Section 3.4. Finally, we summarize and conclude this chapter in Section 3.5.

3.2 Three-Dimensional Integer Wavelet Transform

The proposed volumetric coding system consists of a 3D wavelet/subband transform part and a coding part with a 3D SBHP kernel. Table 3.1 shows the average standard deviation (STD) of four medical and hyperspectral image sequences given in Table 6.2 along the X, Y, and axial Z directions, respectively. As an example, the average STD along the axial direction Z is calculated by

Average_{x,y}( std_z( GOF(x, y, z) ) )

where Average_{x,y} is the mean with respect to x and y, and std_z is the STD with respect to z:

std_z = sqrt( (1/N) Σ_{i=1}^{N} (z_i − z̄)² )

The results show that the STDs along the X and Y directions are close, while the STD along Z is much smaller. So, it is reasonable to apply the wavelet transform along the Z direction in a different way from that along the X and Y directions. In our 3D wavelet transform scheme, the 2D spatial transform and the 1D axial transform (across image slices) are done separately, by first performing a 2D dyadic wavelet decomposition on each image slice, and then performing a 1D wavelet packet decomposition across the resulting image slices.
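The directional statistic just described is straightforward to compute. The sketch below is an illustration using plain Python lists indexed [z][x][y] (our own layout choice, not the thesis implementation), with the population standard deviation matching the 1/N definition above:

```python
from statistics import mean, pstdev

def avg_std_along_z(gof):
    """Average_{x,y}( std_z( GOF(x, y, z) ) ): population standard
    deviation of each spatial position's values across the slices,
    averaged over all (x, y).  gof is a nested list indexed [z][x][y]."""
    nx, ny = len(gof[0]), len(gof[0][0])
    return mean(pstdev(slice_[x][y] for slice_ in gof)
                for x in range(nx) for y in range(ny))

# three identical 2x2 slices: no variation along the axial direction
print(avg_std_along_z([[[10, 20], [30, 40]]] * 3))  # -> 0.0
```

A volume with high slice-to-slice similarity, like the medical sequences of Table 3.1, yields a much smaller value along Z than the same statistic computed along X or Y.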
A heterogeneous selection of filter types and a different number of decomposition levels for each spatial direction (x, y, or z) are supported by this separable wavelet decomposition module. This allows adapting the depth of the wavelet pyramid in each direction in case the spatial resolution is limited. Figure 3.1 shows an example of a wavelet decomposition structure of a packet non-symmetric transform with 3 levels of 2D spatial transform followed by 2 levels of 1D axial transform.

STD   CT skull   MR Liver t1   moffett scene 1   moffett scene 2
X     39.927     28.157        654.062           907.438
Y     48.624     42.851        702.028           1208.555
Z     5.061      9.173         578.474           370.583

Table 3.1: Average standard deviation of volumetric image sequences along the X, Y, and Z directions.

Because the number of slices in a typical volumetric data set can be quite large, it is impractical to buffer all slices for the axial transform. In our scheme, the image slices are collected into Groups Of Slices (GOS) of F consecutive slices. Each GOS is independently transformed and coded. This also makes random access to a selected slice easier.

Figure 3.1: Wavelet decomposition structure with 3 levels of 2D spatial transform followed by 2 levels of 1D axial transform.

3.2.1 Lifting Scheme

The traditional Mallat algorithm [42] for the wavelet transform involves recursively convolving the signal with two decomposition filters h and g and decimating the results to obtain the wavelet coefficients at every decomposition level. However, integer wavelet transforms are not easy to construct by this traditional method. W. Sweldens [35] introduced the lifting scheme, which allows computing the discrete wavelet transform with reduced computational complexity and with support for lossless transforms [43]. The idea of the lifting scheme is to divide the wavelet transform into a split step and a set of lifting steps. Sweldens et al.
[44] proved the following theorem: every wavelet or subband transform with finite filters can be obtained as the Lazy wavelet followed by a finite number of primal and dual lifting steps and a scaling. The lifting scheme is shown in Figure 3.2 and Figure 3.3. The dual polyphase matrix and polyphase matrix are given by [44]:

P̃(z) = ∏_{i=1}^{m} [ 1, 0 ; −s_i(z⁻¹), 1 ] [ 1, −t_i(z⁻¹) ; 0, 1 ] · [ 1/K, 0 ; 0, K ]   (3.1)

P(z) = ∏_{i=1}^{m} [ 1, s_i(z) ; 0, 1 ] [ 1, 0 ; t_i(z), 1 ] · [ K, 0 ; 0, 1/K ]   (3.2)

Figure 3.2: The forward wavelet transform using lifting: first the Lazy wavelet (subsample into even and odd), then alternating lifting and dual lifting steps, and finally a scaling.

Figure 3.3: The inverse wavelet transform using lifting: first a scaling, then alternating dual lifting and lifting steps, and finally the inverse Lazy transform.

Since we can write a wavelet transform with lifting steps, it follows that we can build an integer version of every wavelet transform. For example, in each lifting step one can round off the result of the filter right before the addition or subtraction. This results in an integer-to-integer transform. We present several lossless integer lifting filters of the form (N, Ñ), where N is the number of vanishing moments of the analyzing high-pass filter and Ñ is the number of vanishing moments of the synthesizing high-pass filter (vanishing moments correspond to the multiplicity of zero as a root in the spectrum of the filter). The integer (2, 2) filter given below is the 5/3 filter used in JPEG2000. Table 3.2 gives the name and number of filter taps of the given filters [36].

• S+P (B) transform [45]

h[n] = x[2n + 1] − x[2n]
l[n] = x[2n] + ⌊h[n]/2⌋
h_d[n] = h[n] − ⌊ Σ_{i=−1}^{1} α_i (l[n + i − 1] − l[n + i]) − β₁ h[n + 1] + 1/2 ⌋   (3.3)

where α₋₁ = −1/16, α₀ = 4/16, α₁ = 8/16, β₁ = 6/16.
• (2,2) transform

h[n] = x[2n + 1] − ⌊(x[2n] + x[2n + 2])/2 + 1/2⌋
l[n] = x[2n] + ⌊(h[n − 1] + h[n])/4 + 1/2⌋   (3.4)

• (2,4) transform

h[n] = x[2n + 1] − ⌊(x[2n] + x[2n + 2])/2 + 1/2⌋
l[n] = x[2n] + ⌊(19/64)(h[n − 1] + h[n]) − (3/64)(h[n − 2] + h[n + 1]) + 1/2⌋   (3.5)

• (2+2,2) transform

h[n] = x[2n + 1] − ⌊(x[2n] + x[2n + 2])/2 + 1/2⌋
l[n] = x[2n] + ⌊(h[n − 1] + h[n])/4 + 1/2⌋
h_d[n] = h[n] − ⌊(−l[n − 1] + l[n] + l[n + 1] − l[n + 2])/16 + 1/2⌋   (3.6)

Filter Name                      S+P    5/3    9/3    5/11
Number of Vanishing Moments      (2,4)  (2,2)  (2,4)  (2+2,2)
Number of low-pass filter taps   2      5      9      5
Number of high-pass filter taps  6      3      3      11

Table 3.2: Lossless integer filters.

From a computational standpoint, the lifting scheme has shown numerous benefits over the traditional DWT. Daubechies and Sweldens [44] give the theorem: asymptotically, for long filters, the cost of the lifting algorithm for computing the wavelet transform is one half of the cost of the standard algorithm. For an (N, Ñ) wavelet, the cost of the standard algorithm is 3(N + Ñ) − 2, while the cost of the lifting algorithm is (3/2)(N + Ñ). Furthermore, the lifting scheme requires only in-place computations, and the integer-to-integer lifting scheme does not require any floating-point computations, which yields substantial savings in computation and memory requirements.

3.2.2 Scaling Factors

A typical problem encountered with the 3D integer wavelet transform is the complexity needed to make the transform unitary. The problem is caused by the fact that integer wavelet transforms are not unitary, so quantization error in the wavelet domain is not the same as error in the spatial domain. This does not affect the performance of lossless compression. However, a unitary transform is necessary in order to achieve good lossy coding performance. The normalization factors can be determined by calculating the L2 norm of the low-pass and high-pass filters.
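As a concrete instance, the integer (2,2) transform of Equation 3.4 above can be implemented in a few lines. This is a sketch under an assumed symmetric boundary extension (the text does not specify a boundary rule); the inverse simply undoes the update and predict steps in reverse order, so reconstruction is exact:

```python
def fwd_53(x):
    """Forward integer (2,2) / 5-3 lifting of Equation 3.4 on an
    even-length list, with an assumed symmetric boundary extension."""
    n = len(x) // 2
    even = lambda k: x[2 * (n - 1)] if k == n else x[2 * k]  # mirror last even
    # predict: h[k] = x[2k+1] - floor((x[2k] + x[2k+2])/2 + 1/2)
    h = [x[2 * k + 1] - ((even(k) + even(k + 1) + 1) >> 1) for k in range(n)]
    # update:  l[k] = x[2k] + floor((h[k-1] + h[k])/4 + 1/2)
    l = [x[2 * k] + ((h[max(k - 1, 0)] + h[k] + 2) >> 2) for k in range(n)]
    return l, h

def inv_53(l, h):
    """Inverse lifting: undo the update step, then the predict step."""
    n = len(l)
    e = [l[k] - ((h[max(k - 1, 0)] + h[k] + 2) >> 2) for k in range(n)]
    even = lambda k: e[n - 1] if k == n else e[k]
    x = []
    for k in range(n):
        x.append(e[k])
        x.append(h[k] + ((even(k) + even(k + 1) + 1) >> 1))
    return x

x = [12, 7, 3, 9, 15, 2, 8, 8]
l, h = fwd_53(x)
assert inv_53(l, h) == x  # the integer transform reconstructs exactly
```

Because the inverse subtracts exactly the integer quantities the forward transform added, the rounding inside the floors cancels, which is why lossless operation is guaranteed regardless of the rounding rule chosen.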
Generally, in the 1D case, the low-pass band needs to be scaled up by √2 while the high-pass band needs to be scaled down by 1/√2 after every two-band wavelet decomposition to make the transform unitary. It is not difficult to make the 2D integer transform unitary, since the typical scaling factors needed to obtain a unitary transform in the 2D case are approximately powers of two [45]. Therefore, the scaling can easily be implemented by bit shifts. Figure 3.4(a) shows the scaling factors for a three-level 2D dyadic integer wavelet transform. However, the scaling factors are not powers of two for odd-dimensional dyadic integer wavelet transforms. Therefore, in 3D the scaling factors for the different subbands are not all powers of two, and a simple bit shift cannot make the transform approximately unitary. We have to find a 1D transform structure that allows a simple bit shift of the wavelet coefficients to make the transform unitary. Some proposals [8, 46] make use of a wavelet packet transform to obtain unitarity. Figure 3.4(b) shows the simplest two-level wavelet packet tree. In our 3-D scheme with integer filters, to make the 3-D transform approximately unitary, the 1-D wavelet packet decomposition and scaling are performed in the axial domain, and the 2-D dyadic wavelet decomposition and 2D scaling factors are applied to each image slice. Figure 3.4(c) shows the scaling factors after a three-level 2D integer wavelet transform in the spatial dimensions and a two-level 1D packet transform in the axial dimension, where GOS = 4. The factors are the products of the corresponding scaling factors in Figure 3.4(a) and Figure 3.4(b). For the wavelet decomposition, the asymmetric (decoupling) decomposition is chosen, since it is reported to show better performance than the symmetric decomposition when the correlation along the axial direction is stronger than that along the horizontal and vertical directions [47].
In the asymmetric 3D wavelet transform, several cascaded wavelet transforms along the Z direction first remove the correlation along the axial direction. Then, a set of alternating wavelet transforms along the X and Y directions removes the correlation in the spatial domain. Figure 3.5 gives an example of a 3D asymmetric decomposition with two-level wavelet decomposition in the axial domain and three-level wavelet decomposition in the spatial domain.

3.3 Scalable 3D-SBHP

Consider a 3D image data set that has been transformed using the 3D discrete wavelet transform, as shown in Figure 3.5. The image sequence is represented by an indexed set of wavelet transform coefficients c_{i,j,k} located at position (i, j, k) in the transformed image sequence. Following the idea in [19], for a given bit plane n and a given set τ of coefficients, we define the significance function:

S_n(τ) = 1, if 2^n ≤ max_{(i,j,k)∈τ} |c_{i,j,k}| < 2^{n+1};
         0, otherwise.   (3.7)

Following this definition, we say that a set τ is significant with respect to bit plane n if S_n(τ) = 1. Otherwise, we say that the set τ is insignificant.

In 3D-SBHP, each subband is partitioned into code-blocks of the same size. The 3D-SBHP algorithm makes use of rectangular prisms within the code-blocks. These rectangular prisms, or sets, referred to as sets of type S, can be of varying dimensions. The dimensions of a set S depend on the dimensions of the code-block and the partitioning rules. Because of the limited number of frames in a GOS, the dimension of the code-block along the axial direction might be much shorter than its dimensions along the x and y directions. As a result, some S sets are 2D sets, i.e., their axial dimension is 1. We define Max2D to be the maximum 2D S set that can be generated. For a 2^m × 2^m × 2^l code-block, Max2D is the 2^{m−l} × 2^{m−l} × 1 set. 3D-SBHP always has S sets with at least 2 × 2 × 1 coefficients. The size of a set is defined to be the cardinality C of the set, i.e., the number of coefficients in the set.
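Equation 3.7 translates directly into code; a minimal sketch, with a flat list of coefficient values standing in for the set τ:

```python
def significance(coeffs, n):
    """S_n(tau) of Equation 3.7: 1 when the largest coefficient
    magnitude in the set falls in [2**n, 2**(n+1)), else 0."""
    m = max(abs(c) for c in coeffs)
    return 1 if (1 << n) <= m < (1 << (n + 1)) else 0

print(significance([3, -12, 5], 3))  # max magnitude 12 lies in [8, 16) -> 1
```

In the coder itself, a set reaching bit plane n is already known from the earlier passes to have maximum magnitude below 2^(n+1), so the test reduces to a single comparison against the threshold 2^n.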
During the course of the algorithm, sets of various sizes will be formed, depending on the characteristics of the coefficients in the code-block:

size(S) = C(S) ≡ |S|   (3.8)

3D-SBHP also defines another type of set, referred to as type I. These sets are obtained by chopping off a small rectangular prism from the top-left portion of the code-block. Figure 3.6 illustrates a typical set I. To minimize the number of significance tests for a given bit plane, 3D-SBHP maintains three lists:

• LIS (List of Insignificant Sets) - all the sets (with more than one coefficient) that are insignificant but do not belong to a larger insignificant set.
• LIP (List of Insignificant Pixels) - coefficients that are insignificant and do not belong to an insignificant set.
• LSP (List of Significant Pixels) - all coefficients found to be significant in previous passes.

3.3.1 Coding Algorithm

The 3D-SBHP coder is applied to every code-block independently and generates a highly scalable bit-stream for each code-block by using the same form of progressive bit-plane coding as in SPIHT [18]. The coder encodes the code-blocks resolution by resolution, from code-blocks in lower-resolution subbands to code-blocks in higher-resolution subbands. This enables progressive resolution decoding. The 3D-SBHP algorithm consists of an initialization step, the sorting and refinement passes, and a quantization step. Assume the total number of code-blocks is L and all code-blocks are indexed from the low-pass subband to the high-pass subbands. The pseudo code of the algorithm is given below.

For l = 1, 2, ..., L

1. Initialization
• Partition code-block X into two sets: S ≡ the top-left 2 × 2 × 1 rectangular prism, and I ≡ X − S (see Figure 3.6)
• output n = ⌊log₂( max_{(i,j,k)∈X} |c_{i,j,k}| )⌋
• add S to LIS, set LIP = all coefficients in the code-block, and set LSP = ∅

2.
Sorting Pass
• for each c_{i,j,k} ∈ LIP, output S_n(c_{i,j,k})
  – if S_n(c_{i,j,k}) = 1
    ∗ output the sign of c_{i,j,k} and move c_{i,j,k} from LIP to LSP
• in increasing order of the size of the sets
  – for each set S ∈ LIS,
    ∗ ProcessS(S)
• ProcessI()

3. Refinement Pass
• for each (i, j, k) ∈ LSP, except those included in the last sorting pass, output the nth MSB of |c_{i,j,k}|

4. Quantization Step
• decrement n by 1, and go to step 2

ProcessS(S)
{
• output S_n(S)
• if S_n(S) = 1
  – CodeS(S)
  – if S ∈ LIS, remove S from LIS
• else
  – if S ∉ LIS, add S to LIS
}

CodeS(S)
{
• if size(S) ≤ size(Max2D)
  – partition S into four equal subsets O(S)
• else
  – partition S into eight equal subsets O(S)
• for each O(S)
  – output S_n(O(S))
  – if S_n(O(S)) = 1
    ∗ CodeS(O(S))
  – else
    ∗ add O(S) to LIS
}

ProcessI()
{
• output S_n(I)
• if S_n(I) = 1
  – CodeI()
}

CodeI()
{
• if size(X − I) ≤ size(Max2D)
  – partition I into four sets - three S and one I
• else
  – partition I into eight sets - seven S and one I
• for each of the newly generated sets S
  – ProcessS(S)
• ProcessI()
}

3-D SBHP is based on a set-partitioning strategy. Figure 3.7 and Figure 3.8 illustrate the partitioning process used in 3D-SBHP. These splits occur only when a set is significant. Below we explain the partitioning rule of the 3D-SBHP algorithm in detail, using a 16 × 16 × 4 code-block as an example. If the 16 × 16 × 4 code-block is significant, the algorithm starts by partitioning the code-block into two sets: set S, which is composed of the 2 × 2 × 1 top-left wavelet coefficients in the first frame, and set I, which contains the remaining coefficients, as shown in Figure 3.9(a). The LIS is initialized to set S. Here size(Max2D) = 4 × 4 × 1. In the first set-partitioning stage, set S, which is smaller than Max2D, can be decomposed into 4 individual coefficients, and set I can be decomposed into three 2 × 2 × 1 S sets and a new I set, as shown in Figure 3.9(b).
Figure 3.9(c) shows the second stage of set partitioning: each 2 × 2 × 1 S set can be decomposed into 4 coefficients, and the remaining I set can be split into seven 4 × 4 × 1 S sets and a remaining I set. In the third stage, as shown in Figure 3.9(d), each 4 × 4 × 1 S set is split into four 2 × 2 × 1 S sets, and the I set is partitioned into seven 8 × 8 × 2 S sets. Here, size(X − I) > size(Max2D). Figure 3.9(e) shows that each 2 × 2 × 1 S set can be decomposed into 4 coefficients, and each 8 × 8 × 2 S set can be split into eight 4 × 4 × 1 S sets. This process continues until all sets are partitioned down to individual coefficients; these partitions are applied only to significant sets.

For each new bit plane, the significance of the coefficients in the LIP is tested first. Significant coefficients move from the LIP to the LSP; a bit 1 is output to indicate the significance of the coefficient, and another bit is output to represent the sign of the pixel. Then each set in the LIS is tested for significance. If the set S is not significant, it stays in the LIS. Otherwise, the significant S set is partitioned following the quadtree partitioning rules shown above, until all significant coefficients in that S set are located and coded. The algorithm sends the significant coefficients to the LSP. Once all sets of type S are processed, the set I, if it exists, is processed by testing it against the same threshold. If it is significant, it is partitioned by the octave-band partitioning rule shown above. After the set I is partitioned, the new S sets are processed in the regular image-scanning order. Once all the sets S and I have been processed, the refinement pass is initiated, which refines the coefficients in the LSP except those included in the just-completed sorting pass. The last step of the algorithm is to decrement n by 1, halving the threshold, and the sequence of sorting and refinement passes is repeated against this lower threshold.
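The split rule walked through above (quadtree for sets no larger than Max2D, octree otherwise) can be sketched as follows; the (x0, y0, z0, w, h, d) tuple encoding of an S set is our own convention, not the coder's data structure:

```python
def split(s, max2d_size):
    """Partition rule of 3D-SBHP: an S set whose size is at most
    size(Max2D) splits into four 2-D subsets (quadtree); a larger set
    splits into eight 3-D subsets (octree).
    An S set is represented as (x0, y0, z0, w, h, d)."""
    x0, y0, z0, w, h, d = s
    w2, h2 = w // 2, h // 2
    if w * h * d <= max2d_size:                 # 2-D set: quadtree split
        return [(x0 + dx, y0 + dy, z0, w2, h2, d)
                for dx in (0, w2) for dy in (0, h2)]
    d2 = d // 2                                 # 3-D set: octree split
    return [(x0 + dx, y0 + dy, z0 + dz, w2, h2, d2)
            for dx in (0, w2) for dy in (0, h2) for dz in (0, d2)]

# 16x16x4 code-block example from the text: size(Max2D) = 4*4*1 = 16
assert len(split((0, 0, 0, 8, 8, 2), 16)) == 8   # octree into 4x4x1 sets
assert len(split((0, 0, 0, 4, 4, 1), 16)) == 4   # quadtree into 2x2x1 sets
```

Applied to the example above, an 8 × 8 × 2 set (size 128 > 16) splits into eight 4 × 4 × 1 sets, while a 4 × 4 × 1 set (size 16) splits into four 2 × 2 × 1 sets, matching Figures 3.9(d) and 3.9(e).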
In our coding scheme, the above sequence of four steps - the initialization step, the sorting and refinement passes, and the quantization step - is applied to every code-block independently and generates an SNR-progressive bit stream for every code-block.

3.3.2 Processing Order of the Sorting Pass

Once one sorting pass has occurred, insignificant coefficients and sets of type S of varying sizes are generated and added to the LIP and LIS, respectively. During the sorting pass, the algorithm goes through the LIP first, then processes sets of type S in increasing order of their size. This strategy is based on the observation that, during the sorting pass, the algorithm sends to the LIP those coefficients whose immediate neighbors have tested significant against some threshold but which have not themselves tested significant against that particular threshold. Because of energy clustering in the transform domain, these insignificant coefficients tend to have magnitudes close to the magnitudes of their significant neighboring coefficients. Therefore, these coefficients have a higher probability of becoming significant at some nearby lower threshold. The second reason is that the overhead involved in processing a single coefficient in the LIP, or a smaller S set in the LIS, is much lower than that involved in processing a larger S set. Therefore, if the coding algorithm stops in the middle of a sorting pass, executing the sorting pass in increasing order of set size improves performance and facilitates finer rate control. Instead of using a single large list holding sets S of varying sizes, we use an array of smaller lists of type LIS, each containing sets of type S of a fixed size.
Since the total number of sets S formed during the coding process remains the same, using an array of lists does not increase the memory requirement of the coder. The use of multiple lists completely eliminates the need for any sorting mechanism for processing sets of type S in increasing order of their size, and it speeds up the encoding/decoding process.

3.3.3 Entropy Coding

During the sorting pass, when an S set is split, the significance patterns of the mask have unequal probabilities, as shown in Table A.6 and Table A.2 of Appendix A. We exploit this fact to reduce the number of compressed bits with simple entropy coding. Although entropy coding is a powerful tool for improving compression performance, it does add complexity, and adaptive arithmetic or adaptive Huffman coding adds much more. In order to keep the complexity small, instead of using arithmetic coding, 3D-SBHP uses only three fixed Huffman codes under certain special conditions. We generate individual Huffman tables based on the analysis of training sets of both medical images and hyperspectral images, as shown in Appendix A. Since most set partitions produce four subsets or pixels, we can code their four significance bits together. In 3D-SBHP, we choose a Huffman code with 15 symbols, corresponding to all the possible outcomes. The largest Huffman codeword is 7 bits long. To speed up decoding, we can use lookup tables instead of a binary tree to decode. The other reason to choose fixed Huffman coding is random accessibility. During an adaptive coding process, the entropy coder pays a price for inaccuracies in the conditional probability estimates until it converges to the source statistics. The probability adaptation process requires sufficiently many samples to reach convergence. Unlike 3D-CS-EBCOT [14], our 3D-SBHP reduces the 3D code-block temporal dimensions to enhance slice accessibility. The smaller code-blocks might not have enough samples to compensate for this learning penalty.
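Lookup-table decoding of a short fixed Huffman code works by peeking the next max_len bits and indexing a table built once in advance. The real 15-symbol, maximum-length-7 tables are in Appendix A; the 4-symbol table below is purely hypothetical and only illustrates the mechanics:

```python
def build_lut(code_table, max_len):
    """Build a 2**max_len-entry table for one-shot Huffman decoding:
    peek max_len bits, index the table, get (symbol, codeword length)."""
    lut = [None] * (1 << max_len)
    for sym, code in code_table.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        for tail in range(1 << pad):   # every bit pattern starting with `code`
            lut[base + tail] = (sym, len(code))
    return lut

# hypothetical 4-symbol code, only to show the mechanics
table = {0: '0', 1: '10', 2: '110', 3: '111'}
lut = build_lut(table, max_len=3)
assert lut[0b101] == (1, 2)  # next bits '101': decode symbol 1, consume 2 bits
```

After each lookup the decoder advances the bit pointer by the returned codeword length, so decoding one symbol costs a single table access rather than one branch per bit.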
For the sign and refinement bits, the probabilities of a 1 or a 0 are very near 1/2, so it is very hard to compress these bits efficiently. Therefore, no entropy coding is used for these bits, although this results in some compression loss. We simply move these "raw" bits to the compressed bitstream.

3.3.4 Memory and Complexity Analysis

3D-SBHP has low memory requirements. The algorithm splits every subband into fixed-size code-blocks and processes every code-block independently. Therefore, at any given time during the coding process, only a fixed amount of memory is used for coding/decoding. The size of the dynamic memory does not depend on the size of the volumetric image; it depends only on the size of the code-block. So even for a huge volumetric image, only a small amount of dynamic memory is needed. Since the algorithm works with fixed-size code-blocks, the fixed-size dynamic memory can be allocated in advance and reused for all code-blocks, and no time-consuming memory management is needed. Moreover, the data in the code-block can fit in the CPU's fast cache memory, which minimizes access to slow memory.

3D-SBHP also has low computational complexity. Only the most basic operations, such as memory accesses, bit shifts, additions, and comparisons, are required by the coder. No multiplication or division is required. The complexity analysis of 3D-SBHP can be divided into two parts: independent of and dependent on the bit rate. The 3D-SBHP coder first executes a preprocessing pass that visits all coefficients in the code-block to gather information about the bits in all bit planes. This pass requires only one bitwise OR operation per coefficient, following a predetermined sequence. All bit-plane coders need a similar pass to identify the top bit plane. The bit-rate-independent complexity is associated with this preprocessing pass. As mentioned above, a bitwise OR operation is necessary for each coefficient and for each set.
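The OR accumulation just described reduces to a couple of lines; a minimal sketch for one code-block's coefficient list:

```python
def top_bitplane(coeffs):
    """Preprocessing pass: one bitwise OR per coefficient magnitude;
    the highest set bit of the accumulated OR is the most significant
    bit plane n = floor(log2(max |c|))."""
    acc = 0
    for c in coeffs:
        acc |= abs(c)          # a single OR per coefficient
    return acc.bit_length() - 1 if acc else -1

print(top_bitplane([3, -19, 7]))  # 16 <= max magnitude 19 < 32, so n = 4
```

The OR of all magnitudes has the same highest set bit as the maximum magnitude, so no comparisons are needed to find the starting bit plane.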
The total number of sets in 3D-SBHP is about 1/3 the number of coefficients, so we need about 4/3 memory accesses per coefficient for the preprocessing pass. The bit-rate-dependent complexity is proportional to the number of compressed bits. For the most common bit rates, most of the computational effort is spent processing the LIS. We can roughly measure the complexity of coding a bit plane by counting the number of bit comparisons used to test its bits. 3D-SBHP tests only the elements in its lists. When all bits in a set inside a bit plane are equal to zero, the set-partitioning process uses exactly one bit to indicate this. Since the information about these sets is gathered in the preprocessing pass, only one comparison is required (the decoder just reads the bit) per bit in the compressed stream. So, the number of bits generated per bit plane is equal to the number of comparisons.

Two facts are important in reducing the list management complexity in 3D-SBHP. First, the algorithm works with small fixed-size code-blocks, so the list memory can be allocated in advance, and no time-consuming memory management is required. Second, all the lists and list arrays are updated with the most efficient list management method, FIFO.

3.3.5 Scalable Coding

In many applications, one may need to view only a low-precision image at high resolution, or a low-resolution image at high precision. For these applications, the coder needs the capability to encode the image only once, with all bit planes and all resolutions, so that the user can extract a subset of the bit stream during decoding to reconstruct an image with the specified resolution and quality.

3.3.5.1 Resolution Scalability

In a wavelet coding system, resolution scalability enables an increase of resolution as the bits in higher-frequency subbands are decoded. For a 2D image, after N levels of wavelet decomposition, the image has N + 1 resolution levels.
For a 3D image sequence with N-level wavelet decomposition in the spatial direction and M-level wavelet decomposition in the spectral direction, a total of (N + 1) × (M + 1) resolution levels is available, as shown in Figure 3.10. 3D-SBHP is applied to every code-block inside the subbands independently. At the encoder side, along each direction, no code-block in a higher-frequency subband can be encoded before all code-blocks in the lower-frequency subbands are encoded. As shown in Figure 3.11, the whole bit stream is resolution scalable. At the decoder side, if a user wants to view the image at resolution n, then the bits belonging to the code-blocks related to resolution n are extracted for decoding. To locate a specified code-block in the bitstream, the size of every code-block (the number of compressed bits generated for that code-block) needs to be kept in the header.

3.3.5.2 Rate Control

Within every code-block, the bitstream is SNR progressive by bit-plane coding, but overall the bitstream is resolution progressive, not SNR progressive. However, if the bits belonging to the same threshold from every code-block are put into the bitstream starting from the highest threshold and proceeding to the lowest, then the composite bitstream becomes embedded. For a given target bit rate, we need to apply a bit allocation algorithm that selects a cutting point for every code-block so as to minimize the distortion. The solution is the same rate-distortion slope for every code-block receiving non-zero rate. The Lagrangian optimization method given in [24] is used in our scheme to find the optimal cutting point for every code-block B_k, whose embedded bitstream may be truncated at rate R_k, leading to distortion D_k in the reconstructed image. The additive distortion measure, squared error, is chosen, giving the total distortion

D = Σ_k D_k.
The Lagrangian optimization says that, given a parameter λ, the set of optimal truncation rates {R_k} is the one which minimizes the cost function

D(λ) + λR(λ) = Σ_k (D_k^λ + λ R_k^λ),   (3.9)

where Σ_k R_k^λ = R(λ) ≤ R_target. So, if we can find a value of λ for all code-blocks such that the truncation rates which minimize the cost function yield R(λ) = R_target, then this set of truncation rates must be an optimal solution. Let λ₁ and λ₂ be two different Lagrangian parameters, and let (R₁, D₁) and (R₂, D₂) be the solutions of min(D + λR) for λ₁ and λ₂, respectively. Then, by Lemma 2 in [25], we have R₁ ≥ R₂ if λ₁ < λ₂. Given a target bitrate R_target, we can find the value of λ quickly by using this property. The bitrate R(λ) is first calculated with a starting value of λ. Then we modify the value of λ according to the relative values of R_target and R(λ). This process is repeated until the desired value is found.

Since every code-block is coded separately, we need to solve the optimal truncation problem for every code-block B_k. A simple algorithm to find the truncation rate R_k which minimizes (D_k^λ + λR_k^λ) for a given λ is as follows [24]:

• initialize i = 0;
• for j = 1, 2, 3, ...
  – set ΔR_k^j = R_k^j − R_k^i and ΔD_k^j = D_k^i − D_k^j, where R_k^j increases with j;
  – if ΔD_k^j / ΔR_k^j > λ, then update i = j.

To calculate the distortion-rate slopes ΔD_k^j / ΔR_k^j, we need the number of bits used in compression and the corresponding decrease in distortion. Exact computation of the squared error requires computing a squared value for each coded pixel, which is not trivial. In 3D-SBHP, instead of calculating the distortion for each pixel, we simplify this computation by estimating the reduction in distortion [26] as a function of the number of elements in the LIS, LIP, and LSP. The evaluation of the distortion is done in the transform domain.
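The truncation search just listed, together with the bisection on λ that exploits the monotonicity from Lemma 2 of [25], can be sketched as follows; the candidate (rate, distortion) points are toy numbers, not measured data:

```python
def truncate_block(points, lam):
    """Truncation-point search from [24]: points is a list of
    (rate, distortion) pairs with rate increasing; advance the chosen
    index i to j while the distortion-rate slope exceeds lambda."""
    i = 0
    for j in range(1, len(points)):
        dr = points[j][0] - points[i][0]
        dd = points[i][1] - points[j][1]
        if dr > 0 and dd / dr > lam:
            i = j
    return i

def find_lambda(blocks, r_target, lo=1e-6, hi=1e6, iters=50):
    """Bisect on lambda: the total rate is non-increasing in lambda,
    so raise lambda while over budget and lower it otherwise."""
    for _ in range(iters):
        lam = (lo + hi) / 2
        rate = sum(b[truncate_block(b, lam)][0] for b in blocks)
        if rate > r_target:
            lo = lam
        else:
            hi = lam
    return hi  # hi always satisfies the rate budget

# toy candidate truncation points (rate, distortion) per code-block
blocks = [[(0, 100), (2, 40), (4, 20), (6, 15)],
          [(0, 80), (1, 50), (3, 30)]]
lam = find_lambda(blocks, r_target=5)
assert sum(b[truncate_block(b, lam)][0] for b in blocks) <= 5
```

Returning the upper endpoint of the bracket guarantees the selected truncation points never exceed the target rate, at the cost of possibly leaving a little of the budget unused when the rate function jumps across R_target.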
The 3D integer wavelet transform with scaling is approximately orthogonal, so the error from evaluating the distortion in the transform domain remains reasonably small.

At the 3D-SBHP encoder side, on each bit plane n, all wavelet coefficients whose magnitudes are greater than the threshold τ = 2^n and less than 2τ are considered significant by the 3D-SBHP coder. Once a coefficient C is found significant, its position and approximate magnitude, which is 1.5τ, are inferred from the significance map by one bit, and its sign is coded using one additional bit. Initially, at the decoder side, every coefficient of the transformed image is assumed to be zero. Assume that the coefficient value C is positive and uniformly distributed over [τ, 2τ). Then the expected squared error in reproducing the coefficient as 0 is

D_0 = E\{(C - 0)^2\} = E\{C^2\} = \int_{\tau}^{2\tau} \frac{1}{\tau} c^2 \, dc = \frac{7}{3}\tau^2.   (3.10)

If we reproduce the coefficient at \hat{C} = 1.5τ, the expected squared error becomes

D_{\tau} = E\{(C - \hat{C})^2\} = E\{(C - 1.5\tau)^2\} = \int_{\tau}^{2\tau} \frac{1}{\tau} (c - 1.5\tau)^2 \, dc = \frac{1}{12}\tau^2.   (3.11)

So, finding a newly significant coefficient reduces the expected sum squared error by

D_0 - D_{\tau} = \frac{7}{3}\tau^2 - \frac{1}{12}\tau^2 = \frac{27}{12}\tau^2.   (3.12)

Equation 3.11 also gives the expected squared error of a coefficient refined up to significance threshold τ. If a coefficient is refined to significance threshold τ, the reduction by refinement is

D_{2\tau} - D_{\tau} = \frac{1}{12}(2\tau)^2 - \frac{1}{12}\tau^2 = \frac{1}{4}\tau^2.   (3.13)

For each bit plane n, 3D-SBHP calculates rate-distortion information at three points (corresponding to the ends of the LIP, LIS and LSP passes) in the coding process for every code-block B_k. This information consists of the rates R_{i,n,k} (i = 0, 1, 2, corresponding to the LIP, LIS and LSP passes, respectively), i.e., the total number of bits used so far; P_{i,n,k}, the number of pixels in the LSP; and δD_{i,n,k}, the derivative of the rate-distortion curve. δD_{i,n,k} can be calculated as the average decrease in distortion per coded bit.
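The closed forms in (3.10)–(3.13) follow from the uniform-distribution assumption and can be checked numerically; the midpoint-rule integrator below is purely illustrative:

```python
def expected_sq_error(rep, tau, n=10000):
    # E{(C - rep)^2} for C uniform on [tau, 2*tau), via a midpoint Riemann sum.
    h = tau / n
    total = 0.0
    for k in range(n):
        c = tau + (k + 0.5) * h
        total += (c - rep) ** 2
    return total * h / tau
```

For τ = 1 this reproduces D_0 ≈ 7/3, D_τ ≈ 1/12, and a significance reduction D_0 − D_τ ≈ 27/12, matching the expressions above.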
Using Equation 3.12, Equation 3.13, and an experimental refinement of the 27/12 factor, we approximate δD_{i,n,k} as

δD_{0,n,k} = -2.15 (P_{0,n,k} - P_{2,n+1,k}) \tau^2 / (R_{0,n,k} - R_{2,n+1,k})   (3.14)
δD_{1,n,k} = -1.95 (P_{1,n,k} - P_{0,n,k}) \tau^2 / (R_{1,n,k} - R_{0,n,k})   (3.15)
δD_{2,n,k} = -0.25 P_{2,n+1,k} \tau^2 / (R_{2,n,k} - R_{1,n,k})   (3.16)

For any truncation point R_k in a code-block's embedded bit-stream, linear interpolation is used to estimate the derivative of the rate-distortion curve:

δD(R_k) = δD_{i,n,k} + \frac{(R_k - R_{i,n,k})(δD_{i+1,n,k} - δD_{i,n,k})}{R_{i+1,n,k} - R_{i,n,k}},  R_{i,n,k} ≤ R_k ≤ R_{i+1,n,k}.   (3.17)

To enable SNR scalability, rate-distortion information is calculated by (3.14), (3.15) and (3.16) for every bitplane n, and R_{i,n,k} and δD_{i,n,k} are stored in the header for every code-block during the coding process, as shown in Figure 3.12. When decoding, the Lagrangian optimization method given above is used to find the optimal truncation points for every code-block's bit stream, and then bitstream interleaving is performed to obtain the final bitstream.

3.4 Numerical Results

We conduct our experiments on four 8-bit CT medical image volumes, four 8-bit MR medical image volumes, and four 16-bit Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) hyperspectral image volumes. An AVIRIS scene has 224 bands at 614 × 512 pixel resolution; for our experiments, we cropped each scene to 512 × 512 × 224 pixels. Table 3.3 describes these sequences. In this section, we provide simulation results and compare the proposed 3-D volumetric codec with other algorithms.
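Equation 3.17 is ordinary piecewise-linear interpolation between the stored pass-end points; a sketch with hypothetical argument names (the rates and slopes recorded for one code-block, in increasing-rate order):

```python
def interp_slope(rate, rate_pts, slope_pts):
    # Linearly interpolate the rate-distortion derivative between stored points.
    for i in range(len(rate_pts) - 1):
        if rate_pts[i] <= rate <= rate_pts[i + 1]:
            frac = (rate - rate_pts[i]) / (rate_pts[i + 1] - rate_pts[i])
            return slope_pts[i] + frac * (slope_pts[i + 1] - slope_pts[i])
    raise ValueError("rate outside the stored range")
```

For example, halfway between stored points (10, −4.0) and (20, −2.0) the estimated derivative is −3.0.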
File Name         Image Type   Volume Size        Bit Depth (bit/pixel)
Skull             CT           256 × 256 × 192    8
Wrist             CT           256 × 256 × 176    8
Carotid           CT           256 × 256 × 64     8
Aperts            CT           256 × 256 × 96     8
Liver t1          MR           256 × 256 × 48     8
Liver t2e1        MR           256 × 256 × 48     8
Sag head          MR           256 × 256 × 48     8
Ped chest         MR           256 × 256 × 64     8
moffett scene 1   AVIRIS       512 × 512 × 224    16
moffett scene 2   AVIRIS       512 × 512 × 224    16
moffett scene 3   AVIRIS       512 × 512 × 224    16
jasper scene 1    AVIRIS       512 × 512 × 224    16

Table 3.3: Description of the image volumes

3.4.1 Lossless Coding Performance

To show the compression performance and SNR scalability of 3D-SBHP, we first present lossless coding results, including comparisons of performance using different integer wavelet transforms and using different code-block sizes. The results are given in bits per pixel, averaged over the entire image volume. We show the lossy performance in the next subsection. In our experiments, all image sequences are compressed losslessly with GOS = 16.

3.4.1.1 Lossless Coding Performance by Use of Different Integer Wavelet Transforms

Integer wavelet transforms S+P(B), I(2,2) and I(2+2,2) are compared for 3D-SBHP, Asymmetric Tree based 3D-SPIHT (AT-3D-SPIHT) [51], 3D-SPECK and 2D-SBHP. Table 3.4 and Table 3.5 compare the lossless coding performance of 3D-SBHP, AT-3D-SPIHT, 3D-SPECK, 3D-CB-EZW and 2D-SBHP on CT and MR volumetric data sets, respectively. Table 3.6 compares 3D-SBHP, AT-3D-SPIHT and 3D-SPECK on AVIRIS hyperspectral image volumes by use of different filters. Three decomposition levels are used for AT-3D-SPIHT, 3D-SPECK, 3D-SBHP and 2D-SBHP in all dimensions; 3D-CB-EZW uses two decomposition levels in all three dimensions. 3D-SBHP uses code-block dimensions of 64 × 64 × 4. The results show that no single filter performs best on all data sets. The three filters have similar performance on the medical image data sets.
I(2+2,2) performs the worst on the AVIRIS image data sets for almost all selected algorithms, but shows average performance on the medical image data sets. In general, the S+P and I(2,2) filters perform better most of the time.

Algorithm     GOS             Filter     CT Skull  CT Wrist  CT Carotid  CT Aperts
3D-SBHP       16              S+P        2.2911    1.4644    1.5941      1.1047
                              I(2,2)     2.2301    1.3347    1.6684      1.0525
                              I(2+2,2)   2.2701    1.4002    1.6631      1.0876
AT-3D-SPIHT   16              S+P        2.0752    1.2811    1.4976      1.0403
                              I(2,2)     2.1321    1.2490    1.5772      0.9938
                              I(2+2,2)   2.1754    1.3083    1.5844      1.0370
3D-SPECK      whole sequence  S+P        2.2063    1.3731    1.6041      1.1134
                              I(2,2)     2.1626    1.2718    1.6824      1.0667
                              I(2+2,2)   2.0170    1.2538    1.6517      1.1502
3D-CB-EZW     16              S+P        2.2046    1.3274    1.4553      1.0139
                              I(2,2)     2.9519    1.8236    2.1408      1.4263
                              I(2+2,2)   2.1792    1.2267    1.4618      0.9424
2D-SBHP       1               S+P        3.2916    1.9733    2.1300      1.3573
                              I(2,2)     3.3125    2.0267    2.1795      1.4564
                              I(2+2,2)   3.2969    1.9720    2.1421      1.4326

Table 3.4: Comparison of lossless coding results in terms of bit/pixel of different coding methods by use of different integer filters on CT data.

3.4.1.2 Comparison of Lossless Performance with Different Algorithms

Table 3.7 compares the lossless compression performance of 3D-SPIHT, 3D-SPECK, 3D-CB-EZW, 3D-SBHP, JPEG2000 multi-component and a 2D lossless compression algorithm, JPEG2000, on medical data. To obtain these results, 3D-SBHP uses code-block dimensions 64 × 64 × 4 and a GOS of 16, while the other 3D algorithms treat the entire image sequence as one coding unit. For all 3D algorithms, a three-level wavelet transform was applied in all three dimensions using the I(2+2,2) filter. JPEG2000 multi-component first applies the I(2+2,2) filter in the axial direction, then codes every resultant spectral slice as a separate file with Kakadu JPEG2000 [50], which uses the integer 5/3 filter. Comparing the average compression performance listed in the last row of the table, JPEG2000 multi-component gives the best coding efficiency.
As an extension of SBHP, a low-complexity alternative to JPEG2000, 3D-SBHP on average yields 23% higher compression performance than 2D JPEG2000, and is 13% worse than JPEG2000 multi-component. Compared with the average compression results of the other 3D algorithms, 3D-SBHP is 2%, 10% and 13% worse in compression efficiency than 3D-SPECK, 3D-SPIHT and 3D-CB-EZW, respectively. On the other hand, 3D-SBHP outperforms most algorithms on some sequences.

Algorithm     GOS             Filter     MR Liver1  MR Liver2  MR head  MR chest
3D-SBHP       16              S+P        2.5609     1.8225     2.3241   2.2387
                              I(2,2)     2.5001     1.8354     2.3091   2.0081
                              I(2+2,2)   2.5257     1.8477     2.3219   2.0873
AT-3D-SPIHT   16              S+P        2.3697     1.7444     2.2025   2.0280
                              I(2,2)     2.3423     1.7501     2.1557   1.8779
                              I(2+2,2)   2.3191     1.7868     2.2071   1.9629
3D-SPECK      whole sequence  S+P        2.5520     1.8403     2.3463   2.1951
                              I(2,2)     2.5049     1.8585     2.3455   2.0320
                              I(2+2,2)   2.4331     1.8733     2.3589   2.1160
3D-CB-EZW     16              S+P        2.4156     1.7530     2.3569   2.1174
                              I(2,2)     3.2270     2.5771     2.8631   2.4954
                              I(2+2,2)   2.3239     1.7512     2.2690   1.9895
2D-SPIHT      1               S+P        3.1288     2.4982     2.6913   2.8555
2D-SBHP       1               S+P        3.4102     2.5759     2.9069   3.1367
                              I(2,2)     3.5061     2.6528     3.1575   3.2131
                              I(2+2,2)   3.4090     2.5606     3.1451   3.1327

Table 3.5: Comparison of lossless coding results in terms of bit/pixel of different coding methods by use of different integer filters on MR data.

Algorithm     GOS             Filter     moffett scene 1  moffett scene 2  moffett scene 3  jasper scene 1
3D-SBHP       16              S+P        7.0598           8.4385           6.8563           6.8097
                              I(2,2)     7.1848           8.5674           6.8536           6.9705
                              I(2+2,2)   7.4741           8.7774           7.1244           7.2418
AT-3D-SPIHT   whole sequence  S+P        6.5270           7.6781           6.3969           6.2602
                              I(2,2)     6.5860           7.7774           6.4701           6.5324
                              I(2+2,2)   6.6316           7.8481           6.5026           6.5647
3D-SPECK      whole sequence  S+P        6.9102           8.0550           6.8209           6.7014
                              I(2,2)     7.1360           8.1910           6.6402           7.0213
                              I(2+2,2)   7.2617           8.2420           6.7016           6.8403

Table 3.6: Comparison of lossless coding results in terms of bit/pixel of different coding methods by use of different integer filters on AVIRIS data. (A decomposition level of 3 is used in all dimensions.)
Considering that 3D-SBHP is applied with GOS = 16 while the other 3D algorithms use the whole sequence as their coding unit, a small performance gap is expected.

Table 3.8 presents the lossless performance of 3D-SBHP, 3D-SPIHT, 3D-SPECK, JP2K-Multi, 2D-SPIHT and JPEG 2000 on hyperspectral data. 3D-SBHP uses a five-level dyadic S+P(B) filter in the spatial domain and a two-level 1D S+P(B) filter on the spectral axis, with GOS = 16 and code-block size 64 × 64 × 4. JP2K-Multi is implemented by first applying the S+P filter in the spectral dimension, followed by 2D JPEG 2000 in the spatial domain using the integer (5,3) filter. For all other 3D algorithms, all 224 bands are coded as a single unit and a five-level transform is applied in every dimension. For the AVIRIS test image volumes, 3D-SPIHT gives the best coding efficiency. 3D-SBHP is comparable to 3D-SPIHT on the AVIRIS image sequences: on average, it is only about 2% inferior to 3D-SPIHT and 3D-SPECK. Our algorithm yields, on average, about 2%, 13% and 17% higher compression efficiency than JPEG2000 multi-component, 2D-SPIHT and JPEG2000, respectively. Again, we sacrifice coding efficiency to gain random accessibility and low memory usage by using GOS = 16.

Compared with other coding algorithms, 3D-SBHP performs better on hyperspectral images than on medical images. In Table 3.1, we listed the average standard deviations (STD) of the medical and hyperspectral image datasets along the x, y, and z directions. Since hyperspectral image data has a high STD along all three directions, after the wavelet transform hyperspectral images tend to have a notable number of high-value coefficients in the high frequency subbands, and the quadtree partitioning used in 3D-SBHP can zoom in on these areas of high energy very quickly. For medical image data, the STD along all three directions, especially the axial direction, is very low; after the wavelet transform, the high frequency subbands have very low energy.
When a wavelet coefficient at a coarse scale is insignificant with respect to a given threshold T, wavelet coefficients of the same orientation at the same spatial location at finer scales are much more likely to be insignificant with respect to T. This property is exploited by zerotrees and spatial orientation trees, which may explain why 3D-SPIHT and 3D-EZW give better performance than 3D-SPECK and 3D-SBHP on medical image data, but inferior or comparable performance on hyperspectral image data.

File Name        3D-SPIHT  3D-SBHP  3D-SPECK  3D-CB-EZW  JP2K-Multi  JPEG 2000
CT Skull         2.0051    2.2701   2.0170    2.0095     1.7450      2.9993
CT Wrist         1.1570    1.4002   1.2538    1.1393     1.1771      1.7648
CT Carotid       1.5498    1.6631   1.6517    1.3930     1.6785      2.0277
CT Aperts        1.0313    1.0876   1.1502    0.8923     0.7290      1.2690
MR Liver t1      2.2447    2.5257   2.4331    2.2076     2.3814      3.2640
MR Liver t2e1    1.6914    1.8477   1.8733    1.6591     1.6247      2.5804
MR Sag head      2.1750    2.3219   2.3589    2.2846     2.5961      2.9134
MR Ped chest     1.9218    2.0873   2.1160    1.8705     1.4884      3.1106
average          1.7220    1.9004   1.8567    1.6820     1.6775      2.4912

Table 3.7: Comparison of different coding methods for lossless compression of 8-bit medical image volumes (bits/pixel).

File Name        3D-SPIHT  3D-SBHP  3D-SPECK  JP2K-Multi  2D-SPIHT  JPEG 2000
moffett scene 1  6.9411    7.0333   6.9102    7.1748      7.9714    8.7905
moffett scene 2  7.9174    8.4333   8.0835    8.4131      9.8503    10.0815
moffett scene 3  6.7402    6.8359   6.8209    7.0021      7.5874    7.7258
jasper scene 1   6.7157    6.7842   6.7014    6.8965      7.7977    8.8560
average          7.0786    7.2716   7.1290    7.3716      8.3458    8.7959

Table 3.8: Comparison of different coding methods for lossless coding of 16-bit AVIRIS image volumes (bit/pixel). (A decomposition level of 5 is used in the spatial domain and a decomposition level of 2 on the spectral axis.)

3.4.1.3 Lossless Coding Performance by Use of Different Code-block Sizes

Table 3.9 compares the lossless compression results for all image data listed in Table 6.2, using code-block sizes 8 × 8 × 2, 16 × 16 × 2, 32 × 32 × 4 and 64 × 64 × 4.
The image sequences are compressed with GOS = 16 and the I(2,2) filter, with three levels of wavelet decomposition applied in all three dimensions. The results show that, for all image sequences, increasing the code-block size improves the performance somewhat. The reason for the improvement in coding efficiency is that a larger code-block size decreases the total overhead for the whole image sequence.

File Name        8 × 8 × 2  16 × 16 × 2  32 × 32 × 4  64 × 64 × 4
Skull            3.1066     2.4758       2.2617       2.2301
Wrist            2.1780     1.5601       1.3604       1.3347
Carotid          2.5093     1.8973       1.6952       1.6684
Aperts           1.8857     1.2718       1.0793       1.0525
Liver t1         3.3724     2.7478       2.5287       2.5001
Liver t2e1       2.6961     2.0709       1.86613      1.8354
Sag head         3.1859     2.5538       2.3395       2.3091
Ped chest        2.8729     2.2502       2.0372       2.0081
moffett scene 1  8.3711     7.5104       7.2282       7.1848
moffett scene 2  9.8242     8.9170       8.6086       8.5674
moffett scene 3  8.0128     7.1722       6.8960       6.8536
jasper scene 1   8.1417     7.2922       7.0130       6.9705

Table 3.9: Lossless coding results by use of different code-block sizes (bits/pixel)

3.4.2 Lossy Performance

As discussed before, we can obtain reconstructed volumetric slices at any bit rate from the single compressed embedded bitstream generated by 3D-SBHP. To obtain good lossy performance, we use the wavelet transform structure and scaling factors shown in Figure 3.5 and Figure 3.4(c), which make the 3D integer transform approximately unitary. In this section, we show the performance of lossy reconstruction from the losslessly compressed file. The quality of reconstruction is measured by the peak signal-to-noise ratio (PSNR) over the whole image sequence, defined by

PSNR = 10 \log_{10} \frac{x_{peak}^2}{MSE} \; dB   (3.18)

where x_peak = 255 for these medical images and MSE denotes the mean squared error between all the original and reconstructed slices. Table 3.10 shows the PSNR performance for four different medical volumetric image sets at four different bit rates (these rates are obtained by truncation of the lossless bitstream).
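Equation 3.18 computes directly; the sketch below treats a whole sequence as one flat list of samples (a hypothetical helper, with peak = 255 as used for these medical images):

```python
import math

def psnr(orig, recon, peak=255):
    # 10 * log10(peak^2 / MSE), with MSE averaged over every sample.
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return 10 * math.log10(peak ** 2 / mse)
```

An MSE of 1 against an 8-bit peak gives 10·log10(255²) ≈ 48.13 dB, a useful sanity check against the values in Table 3.10.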
The I(2,2) filter and a 32 × 32 × 4 code-block size are used here.

Bit rate    CT Skull  CT Carotid  MR Liver t1  MR Sag head
0.125 bpp   33.36     39.76       38.18        39.08
0.25 bpp    38.65     44.04       41.88        42.6
0.5 bpp     42.92     47.06       44.58        45.77
1.0 bpp     47.17     49.6        46.87        48.29

Table 3.10: PSNR performance (in dB) of 3D-SBHP at various rates for medical volumetric image data. These rates are obtained by truncation of the lossless bitstream.

Figure 3.13 and Figure 3.14 show the first slices of CT Skull and MR Liver t1 reconstructed by 3D-SBHP. The algorithm yields good subjective quality; small differences in quality from the original are almost unnoticeable, especially at higher bit rates.

3.4.3 Resolution Scalable Results

The CT medical sequence "skull", the I(2,2) integer filter, and a 32 × 32 × 4 code-block size are selected for this comparison. Figure 3.15 shows the reconstructed CT skull sequence decoded from a single scalable code stream at a variety of resolutions at 0.125 bpp. Here, bit rate (bpp) is defined as the number of code bits (byte budget × 8) divided by the total number of pixels in the original image sequence. The PSNR values listed in Table 3.11 for low-resolution image sequences are calculated with respect to the lossless reconstruction at the corresponding resolution. Table 3.11 shows that, for a given byte budget, the PSNR values increase from one resolution to the next lower one, and at each resolution level the PSNR values increase from lower bit rates to higher ones. The corresponding byte budgets for lossless reconstruction at a variety of resolutions are provided in Table 3.12. We can see that the computational cost of decoding is reduced from one resolution level to the next lower one.
                                          PSNR (dB)
Bit Rate (bpp)  Byte budget (bytes)  1/4 resolution  1/2 resolution  FULL
0.03125         49152                33.12           25.53           21.74
0.0625          98304                49.73           31.61           25.8
0.125           196608               lossless        38.84           33.80
0.25            393216               lossless        43.88           37.97
0.5             786432               lossless        lossless        42.09
1.0             1572864              lossless        lossless        46.81
2.0             3145728              lossless        lossless        51.12

Table 3.11: PSNR for decoding CT skull at a variety of resolutions and bit rates

Figure 3.15 demonstrates the first slice of the sequence decoded at 0.125 bpp at a variety of resolutions. Even at a low resolution, we can get a clear view of the image sequence.

Resolution      Byte budget (bytes) for lossless reconstruction
1/4 resolution  137333
1/2 resolution  757110
FULL            3725185

Table 3.12: Bytes used to losslessly reconstruct CT skull at a variety of resolutions

3.4.4 Computational Complexity

One of the main advantages of 3D-SBHP is its speed of operation. In this section, we report the coding speed. 3D-SBHP has been implemented in standard C++ and compiled with the VC++.NET compiler. Tests are performed on a laptop with an Intel 1.50 GHz Pentium M processor running Microsoft Windows XP. The coding speed is measured in CPU cycles; the RDTSC (read time-stamp counter) instruction is used for the cycle count. CT skull and MR liver t1 are selected for testing. These image sequences are compressed losslessly with GOS = 16 and code-block size 32 × 32 × 4, and three levels of spatial dyadic integer wavelet transform and two levels of temporal integer wavelet transform are applied to all image sequences using the I(2,2) filter. Both the 3D-SBHP and AT-3D-SPIHT schemes perform lossless encoding. The decoding times of 3D-SBHP and AT-3D-SPIHT given in this section include the time for bit allocation and for decoding the selected bitstream. In Table 3.13, Table 3.14 and Table 3.15, we measure only the encoding and decoding time; wavelet transform times and disk I/O times are not included.
The lossless encoding times of AT-3D-SPIHT and 3D-SBHP on CT Skull and MR liver t1 are compared in Table 3.13, measured both in total CPU cycles for the whole image sequence and in average CPU cycles per pixel. Table 3.14 compares the decoding times of AT-3D-SPIHT and 3D-SBHP on CT Skull and MR liver t1 at rates of 0.125, 0.25, 0.5 and 1.0 bpp. The comparison shows that the 3D-SBHP encoder runs around 6 times faster than the AT-3D-SPIHT encoder. As the bit rate increases from 0.125 bpp to the full bit rate, the 3D-SBHP decoder is about 2 to 10 times faster than the AT-3D-SPIHT decoder. For both schemes, the decoding time is much less than the encoding time, and the decoding time roughly doubles when the bit rate is doubled. For these two kinds of test image sequences, the average times used for coding a single pixel are very similar at every bit rate.

Table 3.15 compares the decoding times of 3D-SBHP on CT Skull and MR liver t1 at a variety of resolutions. These variable-resolution reconstructions are losslessly decoded from the lossless bit stream. The table gives both the total CPU cycles and the cycles/pixel used to losslessly reconstruct the image sequence at the desired resolution level. The cycles/pixel value is calculated by averaging the total cycles over the number of pixels in the original image sequence. The results show that the computational cost is reduced rapidly (to about 1/6) from one resolution level to the next lower one.

Table 3.16 compares the CPU cycles used for wavelet transform, encoding and disk I/O operations for both AT-3D-SPIHT and 3D-SBHP on the CT Skull image sequence. In AT-3D-SPIHT, the whole image sequence is read into memory, followed by a two-level 1D wavelet transform in the axial direction and a three-level 2D dyadic wavelet transform in the spatial domain; the coding algorithm is then applied to the whole transformed sequence. In 3D-SBHP, the whole sequence is divided into GOSs of size 16, and the same wavelet transform levels are applied to every GOS separately.
Code-blocks of size 32 × 32 × 4 are coded by the 3D-SBHP coding algorithm independently. The comparison shows that, with the smaller GOS size, the wavelet transform performed in 3D-SBHP is about 3 times faster than the one performed in AT-3D-SPIHT. For AT-3D-SPIHT, more CPU cycles are used for virtual memory paging because of its larger memory usage. For both 3D-SBHP and AT-3D-SPIHT, the major share of CPU cycles is used for disk I/O operations. The speed of disk I/O is mainly determined by the amount of memory, the data transfer rate of the hardware, and the frequency of access to low-speed peripheral devices such as disk drives.

File          Total Cycles (×10^6)           Cycles/pixel
              3D-SBHP    AT-3D-SPIHT         3D-SBHP   AT-3D-SPIHT
CT Skull      1643.162   10086.096           130.58    801.570
MR liver t1   449.921    2560.516            143.58    813.966

Table 3.13: Comparison of lossless encoding time between AT-3D-SPIHT and 3D-SBHP on CT skull and MR liver t1. (Wavelet transform times are not included.)

Bit Rate      Total Cycles (×10^6)           Cycles/pixel
              3D-SBHP    AT-3D-SPIHT         3D-SBHP   AT-3D-SPIHT
CT Skull
0.125         155.757    375.695             12.37     29.86
0.25          209.629    786.145             16.65     62.477
0.5           312.836    1677.159            24.86     133.29
1.0           496.994    3689.307            39.49     293.20
lossless      814.119    8333.717            64.70     662.30
MR liver t1
0.125         38.857     96.860              12.35     30.79
0.25          53.322     174.739             16.95     55.55
0.5           80.057     396.864             25.44     126.16
1.0           126.758    844.629             40.29     268.50
lossless      231.21     2142.805            73.50     681.18

Table 3.14: Comparison of decoding time between AT-3D-SPIHT and 3D-SBHP on CT skull and MR liver t1 at a variety of bit rates. (Wavelet transform times are not included.)

Resolution    Decoding Total Cycles (×10^6)  Cycles/pixel
CT Skull
1/4           18.638                         1.481
1/2           113.901                        18.104
Full          814.119                        64.70
MR liver t1
1/4           6.903                          2.194
1/2           38.106                         12.113
Full          231.21                         73.50

Table 3.15: Lossless decoding time of 3D-SBHP on CT skull and MR liver t1 at a variety of resolutions

3.5 Summary and Conclusions

A low-complexity three-dimensional image coding algorithm, 3D-SBHP, is presented in this chapter. Fixed Huffman coding and one coding pass per bit plane are used to reduce the coding time. The proposed algorithm supports all functions of JPEG2000. An integer wavelet transform is used to enable lossy-to-lossless reconstruction. The experimental results show that, with a small loss of quality, 3D-SBHP is able to encode an image sequence around 6 times faster than AT-3D-SPIHT and, depending on the bit rate, to decode an image sequence about 2 to 10 times faster than AT-3D-SPIHT.

                        Total Cycles (×10^6)         Cycles/pixel
                        3D-SBHP      AT-3D-SPIHT     3D-SBHP    AT-3D-SPIHT
wavelet transform time  975.830      3036.749        77.55      241.33
encoding time           1643.162     10086.096       130.58     801.570
I/O time                1105972.911  1676627.888     87894.82   133246.41
total coding time       1108591.903  1689750.733     88102.96   134289.32

Table 3.16: Comparison of CPU cycles used for wavelet transform, lossless encoding and disk I/O between AT-3D-SPIHT and 3D-SBHP on the CT skull image sequence.

Figure 3.4: An example of scaling factors used in the integer wavelet transform to approximate a 3D unitary transform. (a) Scaling factors for a three-level 2D dyadic integer wavelet transform. (b) Wavelet packet transform in the third dimension. (c) Scaling factors for a three-level 2D integer wavelet transform with a two-level packet transform in the axial dimension.

Figure 3.5: Wavelet decomposition structure with 2 levels of 1D packet decomposition along the axial direction, followed by 3 levels of 2D dyadic transform in the spatial domain.

Figure 3.6: Partitioning of the code-block into sets S and I.

Figure 3.7: Quadtree partitioning of set S. (a) size(S) ≤ size(Max2D). (b) size(S) > size(Max2D).
Figure 3.8: Octave-band partitioning of set I. (a) size(S) ≤ size(Max2D). (b) size(S) > size(Max2D).

Figure 3.9: Set partitioning rules used by 3-D SBHP.

Figure 3.10: 12 resolution levels with 3-level wavelet decomposition in the spatial domain and 2-level wavelet decomposition in the spectral direction.

Figure 3.11: An example of 3D-SBHP SNR and resolution scalable coding. The compressed bitstream generated on bitplane α in code-block β is denoted b(α,β). Code-blocks are encoded and indexed from the lowest subband to the highest subband.

Figure 3.12: Bitstream structure generated by 3D-SBHP. The compressed bitstream generated on bitplane α in code-block β is denoted b(α,β). R_{i,n,k} denotes the number of bits used after the ith coding pass (i = 0: LIP pass; i = 1: LIS pass; i = 2: LSP pass) at the nth bit plane for code-block B_k, and δD_{i,n,k} denotes the derivative of the rate-distortion curve after the ith coding pass at the nth bit plane for code-block B_k.

Figure 3.13: Reconstructed CT Skull 1st slice by 3D-SBHP; from left to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp, and the original slice.

Figure 3.14: Reconstructed MR Liver t1 1st slice by 3D-SBHP; from left to right, top to bottom: 0.125 bpp, 0.25 bpp, 0.5 bpp, 1.0 bpp, and the original slice.

Figure 3.15: A visual example of resolution scalable decoding.
From left to right: 1/4, 1/2 and full resolution at 0.125 bpp.

CHAPTER 4 Region-of-Interest Decoding

In interactive viewing applications, users usually need only a section of the image sequence for analysis and diagnosis. Therefore, region-of-interest retrievability is very important: it can greatly reduce decoding time and transmission bandwidth. JPEG2000 offers three ROI coding methods: tiling, coefficient scaling and code-block selection. Since code-block selection does not require the ROI to be determined and segmented before encoding, the image sequence can be encoded only once, and it is up to the decoder to extract a subset of the bit stream to reconstruct an image region specified in spatial location and quality. This gives the user flexibility at decoding time, which is vital for interactive applications. In this chapter, we illustrate how to apply 3D-SBHP to achieve Region-Of-Interest (ROI) access by the method of code-block selection.

4.1 Code-block Selection

Consider an image sequence which has been transformed using the discrete wavelet transform. The transformed image sequence exhibits a hierarchical pyramid structure, and the wavelet coefficients in the pyramid subband system are spatially correlated with regions of the image sequence. Figure 4.1 shows an example of spatial access with code-block selection and the correlation between the spatial domain and the wavelet transform domain. In 3-D SBHP, code-blocks are of a fixed size, and they represent an increasing spatial extent at lower frequency subbands. Figure 4.2 gives an example of the parent-offspring dependencies in the 3D spatial orientation tree after a 2-level wavelet packet decomposition (2D spatial + 1D temporal). Except for the coefficients in the lowest spatial and temporal subband, every coefficient located at (i, j, k) has a unique parent at (⌊i/2⌋, ⌊j/2⌋, ⌊k/2⌋) in the lower subband.
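The floor mapping above, and its inverse view (the eight offspring of a coefficient), can be written down directly; both helpers are illustrative:

```python
def parent(i, j, k):
    # Parent of coefficient (i, j, k) in the next-coarser subband.
    return (i // 2, j // 2, k // 2)

def children(i, j, k):
    # The eight offspring of (i, j, k) at the next-finer scale.
    return [(2 * i + a, 2 * j + b, 2 * k + c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
```

Every one of the eight children of (2, 3, 3) maps back to (2, 3, 3) under `parent`, which is the tree structure the spatial orientation trees exploit.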
All coefficients are organized into trees with roots located in the lowest subband.

Figure 4.1: Spatial access with code-blocks.

Figure 4.2: Parent-offspring dependencies in the 3D orientation tree.

In this chapter, we consider retrieving a rectangular prism from an image sequence. Since the wavelet transform is separable, we first consider the random access problem in one dimension. Let A denote the upper-left corner and B the upper-right corner in the first frame of the rectangular prism; then [x_A, x_B) is the range of the rectangular prism in the X direction. Let [x^F_{k,l}, x^R_{k,l}) denote the X-direction interval related to the rectangular prism at DWT level k, where l ∈ {0, 1} indicates the low-pass and high-pass subband, respectively. Suppose the volume size of the image sequence is W × H × D. If we do not consider the filter length, the boundaries of each interval can be found recursively as

x^F_{k,l} = \left\lfloor \frac{x^F_{k-1,0}}{2} \right\rfloor + l \cdot \frac{W}{2^k},   x^F_{0,0} = x_A,
x^R_{k,l} = \left\lceil \frac{x^R_{k-1,0}}{2} \right\rceil + l \cdot \frac{W}{2^k},   x^R_{0,0} = x_B.

The spatial error penetration around edges caused by the filter length can be calculated from the wavelet filter length and the number of levels of wavelet decomposition. Topiwala [49] gives an approximate equation for the error penetration, by which the spread of the error E (in pixels) as a function of the wavelet filter length L and the number of wavelet decomposition levels K is

E(K, L) = (2^K - 1)\left(\frac{L-3}{2} + 1\right),  L even;
E(K, L) = (2^K - 1)\left(\frac{L-2}{2} + 1\right),  L odd.   (4.1)

As shown in Equation 4.1, the number of error-penetrated pixels grows exponentially with the number of decomposition levels and is proportional to the synthesis filter length. To get perfect reconstruction of the ROI region, we must take the filter length into account.
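The filter-length-free recursion can be sketched as follows: the low-pass (l = 0) boundaries are iterated down the levels, and each high-pass (l = 1) interval is offset by the subband origin W/2^k. Names are illustrative and W is assumed to be a power of two:

```python
def intervals_1d(xa, xb, W, K):
    # Per-level [start, end) index ranges related to [xa, xb),
    # ignoring filter length; l = 0 is low-pass, l = 1 is high-pass.
    out = {}
    f, r = xa, xb
    for k in range(1, K + 1):
        f, r = f // 2, -(-r // 2)      # floor and ceiling of the halved bounds
        for l in (0, 1):
            out[(k, l)] = (f + l * (W // 2**k), r + l * (W // 2**k))
    return out
```

For the range [64, 96) in a line of W = 512 samples with K = 2, the level-2 high-pass interval comes out as [144, 152).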
Suppose we have a synthesis filter with filter length L = M + N + 1,

g_n = \sum_{i=-M}^{N} a_i \, f_{n+i}.

Figure 4.3: 2D example of code-block selection when the filter length is considered. Type I and type II code-blocks are labelled.

Taking the filter length into account, the boundaries of each interval become

x^F_{k,l} = \max\left\{0, \left\lfloor \frac{x^F_{k-1,0} - M}{2} \right\rfloor\right\} + l \cdot \frac{W}{2^k},   x^F_{0,0} = x_A,
x^R_{k,l} = \min\left\{\left\lceil \frac{x^R_{k-1,0} + N}{2} \right\rceil, \frac{W}{2^k} - 1\right\} + l \cdot \frac{W}{2^k},   x^R_{0,0} = x_B.

Similarly, the boundaries of each interval in the Y direction, [y^F_{k,l}, y^R_{k,l}), and in the temporal direction, [z^F_{k,l}, z^R_{k,l}), can be found following the same principle. Suppose that an image sequence is decomposed to level K in the spatial domain and level T in the temporal domain with synthesis filter length L, and coded with code-block size O × P × Q. To reconstruct an X × Y × Z (Z ≤ GOS size) 3D region, where X = x^R_{0,0} − x^F_{0,0}, Y = y^R_{0,0} − y^F_{0,0} and Z = z^R_{0,0} − z^F_{0,0}, the number of decoded code-blocks, denoted N_B, is

N_B = \sum_{j=1}^{K} \sum_{l=S}^{1} s \left( \left\lceil \frac{x^R_{j,l}}{O} \right\rceil - \left\lfloor \frac{x^F_{j,l}}{O} \right\rfloor + 1 \right) \left( \left\lceil \frac{y^R_{j,l}}{P} \right\rceil - \left\lfloor \frac{y^F_{j,l}}{P} \right\rfloor + 1 \right) \sum_{i=1}^{T} \sum_{n=t}^{1} \left( \left\lceil \frac{z^R_{i,n}}{Q} \right\rceil - \left\lfloor \frac{z^F_{i,n}}{Q} \right\rfloor + 1 \right)   (4.2)

where S = 1 for j < K and S = 0 for j = K; t = 1 for i < T and t = 0 for i = T; and s = 3 for S = 1 and s = 1 for S = 0.

For example, consider a 32 × 32 × 4 3D region positioned at row 64, column 90, in frame number 5 of an image sequence which is decomposed to level 2 with synthesis filter length 3 and coded with code-block size 16 × 16 × 2. If we do not consider the filter length, 96 code-blocks are needed for reconstruction of the ROI region; we call these type I code-blocks. To losslessly reconstruct this region, 156 code-blocks are needed, i.e., 60 extra code-blocks are used for lossless reconstruction; we call these extra code-blocks type II code-blocks. Since subband transforms are not shift invariant, the same 3D region positioned at different locations may need different numbers of code-blocks for reconstruction. Figure 4.3 shows a 2D illustration of code-block selection where the filter length is considered.
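Under this reading of the widened recursion (a filter reaching M samples back and N forward, with the result clamped to the subband), the per-level intervals can be sketched as:

```python
def intervals_1d_filtered(xa, xb, W, K, M, N):
    # Per-level [start, end) ranges when the synthesis filter widens each
    # interval by M (past) and N (future) taps before downsampling.
    out = {}
    f, r = xa, xb
    for k in range(1, K + 1):
        size = W // 2**k                      # subband length at level k
        f = max(0, (f - M) // 2)
        r = min(-(-(r + N) // 2), size - 1)   # ceiling, clamped to the subband
        for l in (0, 1):
            out[(k, l)] = (f + l * size, r + l * size)
    return out
```

Comparing this with the filter-length-free recursion for the same region shows the widening that produces the extra type II code-blocks.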
Type I code-blocks and type II code-blocks are labelled in the figure. Figures 4.4(a) and 4.4(b) give 2D and 3D visual examples of ROI decoding. In the 3D example, the ROI extends from (134, 117, 17) to (198, 181, 112).

Figure 4.4: A visual example of 3D-SBHP random access decoding. (a) 2D example: left, the 17th slice of the CT skull sequence at 1/2 resolution; right, the 17th slice of the ROI-decoded image sequence at full resolution. (b) 3D example: left, the CT skull sequence; right, the ROI-decoded image sequence.

4.2 Random Accessibility

In this section, we assess the impact that the wavelet transform and the code-block configuration have on the compression efficiency and accessibility of the scalable bitstream generated by 3D-SBHP.

4.2.1 Wavelet Transform vs. Random Accessibility

Since filtering is an expansive operation, the number of wavelet coefficients that must be extracted for reconstruction always exceeds the number of pixels contained in the rectangular prism of interest. As shown in Equation 4.2, the number of code-blocks needed to reconstruct a given region is a function of both the wavelet decomposition level and the filter length. In this section, we show their effects on random accessibility.

4.2.1.1 Filter Implementation

For comparison, we use the 2-tap S filter and the I(2,2), I(4,2) and I(4,4) filters to construct wavelet transforms that map integers to integers. The filter equations are given below, and Table 4.1 gives the number of low- and high-pass filter taps for these four integer filters.
    2-tap S filter:
        h_m = c_{2m+1} - c_{2m}
        l_m = c_{2m} + ⌊h_m / 2⌋                                                       (4.3)

    I(2,2):
        h_m = c_{2m+1} - ⌊(1/2)(c_{2m} + c_{2m+2}) + 1/2⌋
        l_m = c_{2m} + ⌊(1/4)(h_{m-1} + h_m) + 1/2⌋                                    (4.4)

    I(4,2):
        h_m = c_{2m+1} - ⌊(9/16)(c_{2m} + c_{2m+2}) - (1/16)(c_{2m-2} + c_{2m+4}) + 1/2⌋
        l_m = c_{2m} + ⌊(1/4)(h_{m-1} + h_m) + 1/2⌋                                    (4.5)

    I(4,4):
        h_m = c_{2m+1} - ⌊(9/16)(c_{2m} + c_{2m+2}) - (1/16)(c_{2m-2} + c_{2m+4}) + 1/2⌋
        l_m = c_{2m} + ⌊(9/32)(h_{m-1} + h_m) - (1/32)(h_{m-2} + h_{m+1}) + 1/2⌋       (4.6)

Table 4.1: Number of taps of the integer filters.

    Filter Name | Low-pass taps | High-pass taps
    2-tap S     | 2             | 2
    I(2,2)      | 5             | 3
    I(4,2)      | 9             | 7
    I(4,4)      | 13            | 7

4.2.1.2 ROI decoding performance by use of different wavelet filters and wavelet decomposition levels

Tables 4.2 and 4.3 show the effect of different filter lengths and spatial wavelet decomposition levels on ROI decoding performance. A two-level wavelet transform with the I(2,2) filter is used in the temporal direction in all cases. The experiment is performed on the CT Carotid sequence with code-block size 8 × 8 × 2; a rectangular region of size 64 × 64 × 64 with its lower-left corner at the center of the image slice is selected as the ROI. Distortions are compared after all type I code-blocks are decoded. We see that decreasing the synthesis filter length offers slightly better distortion when the number of taps is larger than two. As shown in Figure 4.7(b), only the 2-tap S filter allows perfect separation of the background from the ROI, although its compression performance is inferior to that of the longer filters. For the longer filters, the quality of the ROI suffers considerably from error penetration, as shown in Figures 4.7(c)-4.7(e). Table 4.3 gives the performance when the spatial wavelet decomposition level is 2; it shows more than 4 dB better performance for all filters than three-level decomposition. Again, we see that fewer decomposition levels offer better ROI quality at the price of decreased compression capability.
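The 2-tap S filter of Equation 4.3 is simple enough to verify directly. A minimal sketch of one forward and inverse level on an even-length integer signal (illustrative helper names, not the thesis implementation):

```python
def s_forward(c):
    """2-tap S transform (Eq. 4.3): h_m = c_{2m+1} - c_{2m},
    l_m = c_{2m} + floor(h_m / 2)."""
    low, high = [], []
    for m in range(len(c) // 2):
        h = c[2 * m + 1] - c[2 * m]
        low.append(c[2 * m] + (h >> 1))   # >> 1 is floor division by 2
        high.append(h)
    return low, high

def s_inverse(low, high):
    """Invert Eq. 4.3: c_{2m} = l_m - floor(h_m / 2), c_{2m+1} = c_{2m} + h_m."""
    c = []
    for l, h in zip(low, high):
        c0 = l - (h >> 1)
        c.extend([c0, c0 + h])
    return c
```

Because every step is an integer operation, the inverse recovers the input exactly, which is what makes lossless coding, and the perfect ROI/background separation noted above, possible with this filter.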
As shown in Equation 4.1, the number of error-penetrated pixels grows exponentially with the number of decomposition levels and is proportional to the synthesis filter length. To improve ROI quality, reducing the number of decomposition levels and the filter length is preferred. However, if the compression efficiency and the quality of the background are also of some importance, a balance has to be struck.

Table 4.2: Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64×64×64, code-block size = 8×8×2, spatial wavelet decomposition level = 3)

    Filter Type | Bit rate, type I decoded (bpp) | PSNR, type I decoded (dB) | Rate for losslessly decoding ROI (bpp) | Rate for lossless compression (bpp)
    2-tap S     | 0.2237 | lossless | 0.2237 | 2.7647
    I(2,2)      | 0.1967 | 18.83    | 0.6150 | 2.5973
    I(4,2)      | 0.1912 | 18.48    | 0.6018 | 2.5802
    I(4,4)      | 0.1922 | 18.49    | 0.6058 | 2.5933

Table 4.3: Comparison of different wavelet filters on ROI access and lossless encoding (ROI size = 64×64×64, code-block size = 8×8×2, spatial wavelet decomposition level = 2)

    Filter Type | Bit rate, type I decoded (bpp) | PSNR, type I decoded (dB) | Rate for losslessly decoding ROI (bpp) | Rate for lossless compression (bpp)
    2-tap S     | 0.2267 | lossless | 0.2267 | 2.7796
    I(2,2)      | 0.1999 | 22.87    | 0.5541 | 2.6058
    I(4,2)      | 0.1937 | 22.57    | 0.5397 | 2.5888
    I(4,4)      | 0.1947 | 22.59    | 0.5433 | 2.6020

4.2.2 Code-block Configurations vs. Random Accessibility

In this section, we turn our attention to code-block configurations that optimize random accessibility and compression efficiency. We use the integer filter I(2,2) with three levels of spatial dyadic wavelet transform and two levels of temporal wavelet transform for comparison.

4.2.2.1 Lossy-to-lossless coding performance by use of different code-block sizes

We first compare lossy-to-lossless coding performance using different coding units. We test 3D-SBHP on a set of eight 8-bit medical image sequences, described in Table 4.4.
Figure 4.5 illustrates the effect of increasing the code-block size on rate-distortion performance. Average PSNR is calculated over all eight image sequences. The results show only a subtle decrease in rate-distortion performance when the code-block size is reduced from 64 × 64 × 4 to 32 × 32 × 2; the significant loss is observed when the code-block size is less than 16 × 16 × 2.

Figure 4.5: Rate-distortion performance with increasing code-block size (code-block sizes 8×8×2, 16×16×2, 32×32×2, 32×32×4 and 64×64×4).

This performance decrease is particularly significant at bit rates below 0.5 bpp, where there is as much as a 10 dB decrease in PSNR for the 16 × 16 × 2 code-block size and a 20 dB decrease for the 8 × 8 × 2 code-block size compared to the other three larger code-block sizes at a given bit rate. The reason for the reduction in coding efficiency is that a smaller code-block size increases the total overhead for the whole image sequence. As shown in the figure, for the 8 × 8 × 2 code-block size there is almost no PSNR increase when the bit rate grows from 0.03125 bpp to 0.125 bpp; that is, nearly all decoded bits are overhead bits.

Table 4.4: Description of the image volumes

    File Name  | Image Type | Volume Size
    Skull      | CT | 256 × 256 × 192
    Wrist      | CT | 256 × 256 × 176
    Carotid    | CT | 256 × 256 × 64
    Aperts     | CT | 256 × 256 × 96
    Liver t1   | MR | 256 × 256 × 48
    Liver t2e1 | MR | 256 × 256 × 48
    Sag head   | MR | 256 × 256 × 48
    Ped chest  | MR | 256 × 256 × 64

4.2.2.2 ROI decoding performance by use of different code-block sizes and ROI sizes

In Figure 4.6, we show the inter-dependence between ROI size and code-block size. The experiment is performed on the CT Carotid sequence; rectangular regions with their lower-left corner at the center of the image slice and sizes of 16 × 16 × 64, 32 × 32 × 64 and 64 × 64 × 64 are selected as ROIs. For every ROI size, the figure gives the performance of 8 × 8 × 2, 16 × 16 × 2 and 32 × 32 × 2 code-blocks.
All type I code-blocks are decoded before type II code-blocks. Code-block size 8 × 8 × 2 gives the best lossless decoding performance in all three ROI-size experiments. Enlarging the ROI increases the bit rate at which the ROI is losslessly decoded only when the ROI is larger than or equal to the image region to which a code-block in the highest subband is correlated. In the figure, the rate-distortion point at which all type I code-blocks are fully decoded is labelled by a black × on every curve. It is clear that once type II code-blocks are being decoded, for a given code-block size, smaller-ROI curves have higher slope, and for a given ROI size, curves with smaller code-block size have higher slope. A higher slope indicates that the type II code-blocks are more important than in the lower-slope case. It also indicates that with smaller code-blocks, a higher percentage of the bits decoded from type II code-blocks is used for perfect reconstruction of the ROI, whereas with larger code-blocks, more of the bits decoded from type II code-blocks contribute to the background. However, the larger code-block sizes give better rate-distortion performance at low bit rates because of the per-code-block overhead. Therefore, in applications where only a high-quality ROI is required or the ROI is small, smaller code-blocks (say 8 × 8 × 2) should be used, whereas if the desired bit rate is very low, or the background is also of some importance, larger code-blocks should be used.
4.2.3 ROI access performance by use of different bit allocation methods

In all of our previous experiments, the bit allocation method used for the ROI first allocates bits to the type I code-blocks from the higher bit-planes to the lower bit-planes, and then allocates the remainder of the bit budget to the type II code-blocks.

Figure 4.6: Rate-distortion performance with increasing ROI size: (a) ROI size = 16 × 16 × 64; (b) ROI size = 32 × 32 × 64; (c) ROI size = 64 × 64 × 64. Each panel compares code-block sizes 8 × 8 × 2, 16 × 16 × 2 and 32 × 32 × 2.

Figure 4.7: A visual example of ROI decoding from the 3D-SBHP bit stream using different wavelet filters: (a) 5th slice of the original CT Carotid image sequence; the ROI-decoded image slice with (b) the Haar filter, (c) the I(2,2) filter, (d) the I(4,2) filter, and (e) the I(4,4) filter.

In Figure 4.6(b), when the 8 × 8 × 2 code-block is used, all type I code-blocks are fully decoded at rate 0.062 bpp with a low PSNR of 19.37 dB. When the bit rate increases from 0.0315 bpp to 0.062 bpp there is no significant performance improvement, whereas when the type II code-blocks are decoded the quality increases sharply. This means that type II code-blocks play an important role in improving quality, and that the low bit-planes (corresponding to the least significant bits of the wavelet coefficients in the ROI) in type I code-blocks contribute very little to the visual quality of the ROI.
Therefore, it makes sense to terminate decoding of type I code-blocks before reaching the low bit-planes and instead send high bit-planes from the type II code-blocks when the given bit rate is not sufficient to fully decode all ROI-related code-blocks. In Figure 4.8, we compare the rate-distortion performance of three bit allocation methods. The first, shown as decode priority 1 in the figure, is the decoding scheme used in the previous experiment. The second, shown as decode priority 2, gives corresponding bit-planes in the two kinds of code-blocks the same priority; that is, it allocates bits to all ROI-related code-blocks together from the highest bit-plane to the lowest bit-plane. In the third scheme, shown as decode priority 3, we first allocate bits to the type I code-blocks from the highest bit-plane down to the fourth-lowest bit-plane, and then allocate the remaining bit budget to all ROI-related code-blocks from the higher bit-planes to the lower. Although all three schemes achieve lossless decoding of the ROI at the same bit rate, their rate-distortion curves differ significantly. At low bit rates (< 0.1 bpp), the third scheme gives at most 15 dB and 5 dB better performance than the first and second schemes, respectively, whereas at higher bit rates the first scheme performs best. Therefore, given a bit rate, obtaining the best lossy ROI decoding performance requires finding an optimal bit allocation method according to the relative importance of type I and type II code-blocks. As addressed in previous sections, the relative importance of these two kinds of blocks depends on the code-block size, ROI size, filter length and wavelet decomposition level.

Figure 4.8: Rate-distortion performance with different priorities for code-blocks.
4.3 Conclusions

In this chapter, we presented the 3D-SBHP algorithm and empirically investigated a code-block-selection ROI access method by applying 3D-SBHP to medical volumetric images. Our work shows that ROI access performance is affected by several coding parameters, and we outlined some of the trade-offs in ROI access. Finally, we gave a possible way to optimize ROI access performance at the decoder side.

CHAPTER 5
Multistage Lattice Vector Quantization for Hyperspectral Image Compression

Lattice vector quantization (LVQ) offers a substantial reduction in computational load and design complexity due to the regular structure of the lattice [52]. In this chapter, we extend the SPIHT coding algorithm with lattice vector quantization to code hyperspectral images. In the proposed algorithm, multistage lattice vector quantization (MLVQ) is used to exploit correlations between image slices while offering successive refinement with low coding complexity and computation. Different LVQs, including cubic Z4 and D4, are considered, and their performance is compared with other 2D and 3D wavelet-based image compression algorithms.

5.1 Introduction

As mentioned in Chapter 1, the transform, the quantization and the coding of quantized coefficients are all candidates for exploiting the relationships between slices. Due to its superior performance over scalar quantization, vector quantization has been applied in many wavelet-based coding algorithms. The Linde-Buzo-Gray (LBG) algorithm [55] is the most common approach to designing vector quantizers. In [53], subband image coding with VQ generated by an LBG codebook is proposed. The LBG training algorithm incurs high computational cost and coding complexity, especially as the vector dimension and bit rate increase. Lattice vector quantization, an extension of uniform scalar quantization to multiple dimensions, is an approach to reducing this computational complexity [57].
Plain lattice vector quantization of wavelet coefficient vectors has been successfully employed for image compression [61, 62, 63]. To improve performance, it is reasonable to combine LVQ with powerful wavelet-based zerotree or set-partitioning image coding methods and the bitplane-wise successive refinement methodologies for scalar sources used in EZW, SPIHT and SPECK. In [64], multistage lattice vector quantization is used with both a zerotree structure and a quadtree structure, producing results comparable to JPEG 2000 at low bit rates. VEZW [65] and VSPIHT [66, 67, 68] have successfully employed LVQ with 2D-EZW and 2D-SPIHT, respectively, and in VSPECK [69], tree-structured vector quantization (TSVQ) [70] and ECVQ [71] are used to code the significant coefficients for 2D-SPECK.

For volumetric images, especially hyperspectral images, neighboring slices convey highly related spatial details. Since VQ can exploit the statistical correlation between neighboring data in a straightforward manner, we apply VQ to volumetric images to exploit the correlation in the axial direction. In particular, multistage LVQ is used to obtain the counterpart of bitplane-wise successive refinement, where successive lattice codebooks in the shape of Voronoi regions of a multidimensional lattice are used.

This chapter is organized as follows. We first review basic lattice vector quantization and multistage LVQ. The multistage-LVQ-based SPIHT (MLVQ-SPIHT) is given in Section 5.3. The performance of MLVQ-SPIHT for hyperspectral image compression is presented in Section 5.4. Section 5.5 concludes the chapter.

5.2 Vector Quantization

The basic idea of a vector quantizer is to quantize pixel sequences rather than single pixels. A vector quantizer of dimension n and size L is defined as a function that maps an arbitrary vector X ∈ R^n into one of L output vectors Y_1, Y_2, ..., Y_L, called codevectors, belonging to R^n.
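A vector quantizer in this sense is just a nearest-codevector search. A minimal sketch with a toy codebook (hypothetical helper names, not from the thesis):

```python
def vq_encode(x, codebook):
    """Map vector x to the index of its nearest codevector (squared L2)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def vq_decode(index, codebook):
    """Reconstruct a vector from its codebook index."""
    return codebook[index]
```

For an unstructured codebook of size 2^{bn}, this exhaustive search (and LBG training) grows exponentially with the rate and dimension, which is the motivation for the lattice codebooks of the next section.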
The vector quantizer is completely specified by the L codevectors and their corresponding nonoverlapping partitions of R^n, called Voronoi regions. A Voronoi region V_i is defined by [72]

    V_i = {X ∈ R^n : ||X - Y_i|| ≤ ||X - Y_j||, i ≠ j}        (5.1)

Given a desired bit rate per dimension b and vector dimension n, the codebook size is equal to 2^{bn}. Although the LBG (Linde-Buzo-Gray) algorithm can generate locally optimal codebooks, its complexity grows exponentially because the codewords have to be compared and chosen among 2^{bn} possible vectors. Lattice vector quantization (LVQ), which builds codebooks as subsets of multidimensional lattices, solves the complexity problem of LBG-based vector quantizers and yields very general codebooks.

5.2.1 Lattice Vector Quantization

A lattice L in R^n is composed of all integral combinations of a set of linearly independent vectors. That is,

    L = {Y | Y = u_1 a_1 + ... + u_n a_n}        (5.2)

where {a_1, ..., a_n} is a set of n linearly independent vectors and {u_1, ..., u_n} are integers. A lattice coset Λ is obtained from a lattice L by adding a fixed translation vector t to the points of the lattice:

    Λ = {Y | Y = u_1 a_1 + ... + u_n a_n + t}        (5.3)

Around each point Y_i in a lattice coset Λ, an associated nearest-neighbour set of points, called the Voronoi region, is defined as [72]

    V(Λ, Y_i) = {X ∈ R^n : ||X - Y_i|| ≤ ||X - Y_j||, Y_i ∈ Λ, ∀ Y_j ∈ Λ}        (5.4)

The zero-centered Voronoi region V(Λ, 0) is defined as

    V(Λ, 0) = V(Λ, Y_i) - Y_i        (5.5)

In lattice vector quantization, the input vector is mapped to the lattice points of a chosen lattice type. The lattice points, or codewords, may be selected from the coset points or from the truncated lattice points [72].

5.2.1.1 Classical Lattices

Conway and Sloane [57] investigated lattice properties and determined the optimal lattices for several dimensions. They also give a fast quantization algorithm [58] which makes searching for the closest lattice point to a given vector extremely fast.
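For the D_n lattice used later in this chapter, the Conway-Sloane fast search amounts to rounding plus a parity fix. A sketch (rounding-tie behavior is inherited from Python's `round`, so inputs exactly halfway between integers are avoided here):

```python
def closest_Dn(x):
    """Closest point of D_n (integer vectors with even coordinate sum) to a
    real vector x: round every coordinate; if the coordinate sum is odd,
    re-round the coordinate with the largest rounding error the other way."""
    f = [round(v) for v in x]
    if sum(f) % 2 == 0:
        return f
    # index of the coordinate furthest from its rounded value
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f
```

The parity fix costs one extra pass over the coordinates, so quantization is linear in the dimension rather than exponential in the codebook size.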
The cubic lattice Z^n is the simplest lattice; it consists of all the integer points of the coordinate system for a given lattice dimension. Other important lattices are the root lattices A_n (n ≥ 1), D_n (n ≥ 2) and E_n (n = 6, 7, 8), and the Barnes-Wall lattice Λ_16. These lattices give the best sphere packings and coverings in their respective dimensions [52]. In this chapter, we use Z^4, D_4 and A_2.

The A_n lattice is defined by

    A_n = {(x_0, x_1, ..., x_n) ∈ Z^{n+1} : x_0 + ... + x_n = 0}, for n ≥ 1        (5.6)

For n ≥ 3, the D_n lattice is defined as

    D_n = {(x_1, ..., x_n) ∈ Z^n : x_1 + ... + x_n even}        (5.7)

The lattice E_8 is defined by

    E_8 = D_8 ∪ ((1/2)·1_8 + D_8)        (5.8)

where 1_8 stands for the all-one vector of dimension 8.

5.2.1.2 LVQ Codebook

The codebook of a lattice quantizer is obtained by selecting a finite number of lattice points from an infinite lattice. An LVQ codebook is determined by a root lattice, a truncation and a scaling factor. The root lattice is the lattice coset from which the codebook is actually constructed. A truncation must be applied to the root lattice in order to select a finite number of lattice points and quantize input data of finite energy. The bit rate of the LVQ is determined by the number of points in the truncated area. To obtain the best rate-distortion trade-off, we must scale and truncate the lattice properly; to do this, we need to know how many lattice points lie within the truncated area, i.e., the shape of the truncated area. Two kinds of truncation shapes are considered in this chapter. When the signal to be compressed has an i.i.d. multivariate Gaussian distribution, the surfaces of equal probability are ordinary spheres, and the truncated area is spherical [62]. In these applications the size of the codebook is calculated by the theta function of the lattice, described in [52].
In the case of Laplacian sources (for the cubic lattice), the surfaces of equal probability are spheres in the L1 metric, sometimes called pyramids. The number of lattice points Num(n, r) lying on a hyper-pyramid of radius r in n-dimensional space R^n is given by Fischer [60] as

    Num(n, r) = Num(n - 1, r) + Num(n - 1, r - 1) + Num(n, r - 1)        (5.9)

The truncation is determined by specifying the shape and radius of the hypersphere or hyperpyramid that best matches the probability distribution of the input source. The scaling factor controls the distance between any two nearest lattice points, i.e., the maximum granular error of the quantizer [64]. The support of the distribution of the granular quantization error has the shape of the Voronoi region.

5.2.2 Multistage Lattice Vector Quantization

The essence of our successive-refinement lattice VQ is to generate a series of decreasing-scale zero-centered Voronoi lattice regions, V_0(Λ_0, 0), V_1(Λ_1, 0), V_2(Λ_2, 0), ..., each covering the zero-centered Voronoi region of the previous, larger scale. The coarsest-scale quantizer is completely specified by lattice points y_i and their corresponding nonoverlapping Voronoi regions V_0(Λ_0, y_i). To prevent divergence of the overload quantization error, the truncated LVQ at the current stage should be able to cover the Voronoi region of the previous stage. On the other hand, any overlap of quantization regions at two successive stages decreases compression efficiency. So the optimal truncated lattice should be consistent with the Voronoi region of the root lattice [64]; however, this optimal condition cannot always be satisfied. Figure 5.1 gives an example of this multistage LVQ with the hexagonal A_2 lattice and scale-down factor r = 4. First, the input vector x ∈ R^n is quantized to the output vector u_0 = y_0 by the coarsest-scale quantizer. The uncertainty in x is thereby reduced to the Voronoi region V_0(Λ_0, y_0) around the chosen codevector y_0.
The next quantizer quantizes the approximation error (x - u_0), which falls into the zero-centered Voronoi region V_0(Λ_0, 0), using a finer lattice VQ to obtain a refinement u_1 = z_1. Now the uncertainty in x is reduced to the zero-centered Voronoi region V_1(Λ_1, 0) of lattice coset Λ_1. The next-finer-scale quantizer quantizes the error (x - u_0 - u_1), reducing the uncertainty in x to the zero-centered Voronoi region of Λ_2. Continuing in this way, the final approximation x̂ of the vector x is

    x̂ = u_0 + u_1 + u_2 + ...

Figure 5.1: Multistage lattice VQ with the A_2 lattice (support region; Voronoi region with radius r; Voronoi region with radius r/4; lattice points shown at different scales).

5.3 MLVQ-SPIHT

In this section, we describe our new algorithm, MLVQ-SPIHT, which combines the multistage lattice vector quantization methodology and the SPIHT coding algorithm to code 3D hyperspectral image data sets. In MLVQ-SPIHT, a 2D DWT is applied to each image slice independently. For a given vector dimension N, we segment the sequence of transformed images into groups of slices with GOS = N; N = 4 is used for our application. The coding algorithm is applied to every GOS independently. For every transformed image slice in the same GOS, we group wavelet coefficients at the same spatial location into vectors. For example, for spatial location (i, j) and transformed slices S_0, S_1, ..., S_{N-1} in the GOS, the vector associated with this location is v(i, j) = (S_0(i, j), S_1(i, j), ..., S_{N-1}(i, j)). The parent-child relationship between vectors in different subbands is the same as in [19]. Figure 5.2 gives an example of the parent-child relationship between vectors when the vector dimension is N = 4.

Figure 5.2: An example of the parent-child relationship between vectors when vector dimension N = 4.
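The refinement loop of Section 5.2.2 can be sketched with any base lattice quantizer; here a Z^n rounding quantizer stands in for the A_2 quantizer of Figure 5.1 (an illustrative simplification, not the thesis implementation):

```python
def closest_Zn(v):
    """Cubic-lattice (Z^n) quantizer: round each coordinate."""
    return [float(round(c)) for c in v]

def multistage_lvq(x, stages, r=4.0):
    """Quantize x, then repeatedly quantize the residual on a copy of the
    lattice scaled down by r per stage; the reconstruction is the sum of
    the per-stage outputs u_0 + u_1 + ..."""
    approx = [0.0] * len(x)
    outputs = []
    for s in range(stages):
        scale = 1.0 / r**s
        residual = [(xi - ai) / scale for xi, ai in zip(x, approx)]
        u = [scale * c for c in closest_Zn(residual)]
        outputs.append(u)
        approx = [ai + ui for ai, ui in zip(approx, u)]
    return approx, outputs
```

Each extra stage shrinks the uncertainty region by the scale-down factor, which is the vector counterpart of sending one more bit plane.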
The SPIHT algorithm is used to search for significance at the current metric threshold, which is based on certain pre-defined decision regions that gradually decrease in scale following a given rule. Every decision region is defined by two surfaces enclosing the origin (one on the inside, the other on the outside) that successively decrease in size. For every given decision region, the SPIHT algorithm is used to test the significance of the N-dimensional vectors. Each sorting pass locates significant vectors and roughly quantizes them in the same pass; the vectors ascertained as significant in a pass are then progressively refined in successive passes using our multistage LVQ.

Figure 5.3 uses the A_2 lattice to illustrate our vector SPIHT, where the lattice at each stage decreases in scale by a factor r = 4, the threshold for the SPIHT sorting pass decreases in scale by a factor of 2, and the L2 norm is used for the significance test. The wavelet vectors are first scaled so that all scaled vectors lie within or on the hyperspherical surface of L2 norm equal to a given standardized value R. For the first sorting pass, the significant region is bounded on the outside by the hyperspherical surface of L2 norm R and on the inside by the hyperspherical surface of L2 norm R/2. For the following sorting passes, the significant regions are bounded by zero-centered hyperspherical surfaces, with the inside one at half the scale of the outside one. For example, if a vector is ascertained as significant in the first sorting pass, i.e., it is located in the 1st significant region, the vector is roughly encoded by the first-stage LVQ of the 1st significant region, which uses translations of the V/2 lattice coset. When the sorting pass reaches the 3rd significant region, the vector ascertained as significant in the 1st pass is refined by the second-stage LVQ, which uses translations of the V/8 lattice coset.
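The pass-by-pass membership rule just described can be sketched as a small helper (hypothetical function, assuming pass p covers L2 norms in (R/2^{p+1}, R/2^p]):

```python
import math

def first_significant_pass(vec, R, max_pass=16):
    """Index of the sorting pass in which a vector first becomes significant:
    pass p is bounded outside by R / 2**p and inside by R / 2**(p+1)."""
    norm = math.sqrt(sum(c * c for c in vec))
    for p in range(max_pass):
        if norm > R / 2 ** (p + 1):
            return p
    return None   # insignificant down to the finest tested threshold
```

A vector first found significant in pass p is then revisited, and refined by one more LVQ stage, in each later pass.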
As shown in Figure 5.3, the bracketed sequences denote the successively lower-scale lattices used to quantize vectors in each significant region. We believe this scheme can provide good compression performance with successive refinement. Based on the above scheme, we implemented a SPIHT-based coding algorithm using four-dimensional wavelet vectors as shown in Figure 5.2. Several LVQs are implemented in our scheme.

Figure 5.3: Vector SPIHT with successive refinement LVQ. (Relative to the hyperspherical surface of L2 norm R: the 1st significant region uses lattices V/2 (V/8, V/32, ...); the 2nd, V/4 (V/16, V/64, ...); the 3rd, V/8 (V/32, V/128, ...).)

5.3.0.1 Cubic Z4 LVQ

To define a cubic Z4 LVQ codebook, the root lattice Z4 and a cubic truncation are used. The cubic truncation requires the L∞ norm (maximum norm) for vector magnitude measurement, defined as

    ||X||_∞ := max(|x_1|, ..., |x_N|)

For cubic truncation, the bit rate is evenly allocated to each of the lattice's N dimensions, which implies that the cubic Z4 LVQ is actually equivalent to four individual scalar quantizers applied independently to each of the four coefficients in a vector. The cubic truncation area has exactly the same shape as the Voronoi region of the corresponding root lattice, and the number of codewords in the codebook can always be an integer power of 2, which prevents loss of coding efficiency. However, cubic truncation does not match the typical distributions of subband coefficients well, which decreases compression performance.

Two different bit rates are used in our cubic Z4 multistage LVQ. When a newly significant vector is quantized by its first-stage quantizer, an 8-bit LVQ is used to quantize both significance and signs; in all refinement stages, a 4-bit LVQ is used. The truncation radii of these two LVQs are 2 and 1, respectively. If the threshold is scaled down by two at each successive layer, these layers are equivalent to bit planes.
5.3.0.2 Pyramid D4 LVQ

To define a pyramid D4 LVQ codebook, the root lattice D4 and a pyramid truncation are used. The pyramid truncation requires the L1 norm for vector magnitude measurement, defined as

    ||X||_1 := Σ_{i=1}^{N} |x_i|

In our implementation, the truncation radius is set to four. Lattice points inside this truncation area lie on two hyper-pyramid surfaces with constant L1 norms 2 and 4; the numbers of lattice points on these two shells are 32 and 192, respectively [62]. So 8-bit indexes are used to code these 225 codewords (the two shells together with the zero vector). The same LVQ codebook is used in all stages. Since the Voronoi region is closer to a sphere and is inconsistent with the shape of the pyramid truncation, to get the best balance between overlaps and gaps between the Voronoi region at the current stage and that of the previous stage, the scale-down factor is set to 1/3 [64].

5.3.0.3 Sphere D4 LVQ

To define a sphere D4 LVQ codebook, the root lattice D4 and a sphere truncation are used. The sphere truncation requires the L2 norm for vector magnitude measurement, defined as

    ||X||_2 := sqrt( Σ_{i=1}^{N} |x_i|^2 )

In our implementation, the truncation radius is set to 2. Lattice points inside this truncation area lie on two hyper-sphere surfaces with constant L2 norms √2 and 2; the numbers of lattice points on these two shells are 24 and 24, respectively [62]. So 6-bit indexes are used to code these 48 codewords. The scale-down factor is set to 1/2.

5.4 Experimental Results

The proposed MLVQ-SPIHT algorithm is used to compress the hyperspectral image "Moffett Field"; its properties are shown in Table 5.1. The pyramid wavelet decomposition employed here uses the S+P wavelet filter, and a 5-level spatial transform is performed.
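The shell counts quoted for both truncations can be checked by brute force, together with Fischer's recursion (5.9); a sketch:

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def num(n, r):
    """Fischer's recursion (5.9): integer points with L1 norm exactly r in R^n."""
    if r == 0:
        return 1
    if n == 0:
        return 0
    return num(n - 1, r) + num(n - 1, r - 1) + num(n, r - 1)

def d4_l2_shells(radius):
    """Count D4 points (even coordinate sum) on each nonzero squared-L2
    shell up to radius**2, by brute-force enumeration."""
    shells = {}
    r = int(radius)
    for v in product(range(-r, r + 1), repeat=4):
        if sum(v) % 2:
            continue                      # not a D4 point
        q = sum(c * c for c in v)
        if 0 < q <= radius * radius:
            shells[q] = shells.get(q, 0) + 1
    return shells
```

num(4, 2) and num(4, 4) give the 32 and 192 points of the two pyramid shells (every integer vector of even L1 norm also has even coordinate sum, so these are all D4 points), and the L2 enumeration recovers the 24 + 24 points of the sphere truncation.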
After the wavelet transform, the magnitude of each vector is calculated according to the norm corresponding to the particular LVQ. The fast quantizing and coding algorithm proposed by Conway and Sloane [58, 59] is used to code the significant vectors. For each significant region, the LVQ indices are coded using an adaptive arithmetic coder, and the significance information is adaptively arithmetic coded as described in [19].

Table 5.1: Description of the image volume Moffett Field

    File Name       | Image Type | Volume Size     | Bit Depth (bit/pixel) | Power (σ_x^2)
    moffett scene 1 | AVIRIS     | 512 × 512 × 224 | 16                    | 4803298
    moffett scene 3 | AVIRIS     | 512 × 512 × 224 | 16                    | 2177316

The quality of reconstruction is measured by the signal-to-noise ratio (SNR), defined by

    SNR = 10 log10(σ_x^2 / MSE) dB        (5.10)

Figure 5.5 compares the rate-distortion performance of MLVQ-SPIHT with scalar SPIHT for each band. In Figure 5.5, the SNR results for 2D-SPIHT and MLVQ-SPIHT are obtained by calculating σ_x^2 and the MSE for each band separately. The plots show that MLVQ-SPIHT offers over 3 dB improvement at 0.1 bpp and 0.5 bpp for all bands in the sequence. This implies that the hyperspectral sequences are highly correlated, and that vector quantization along the wavelength axis can efficiently exploit these inter-band correlations. A visual comparison of the original and reconstructed 49th band of moffett scene 3 at 0.1 bpp and 0.5 bpp is given in Figure 5.4.

Table 5.2 compares the rate-distortion results for MLVQ-SPIHT using different LVQs with 3D-SPIHT, 3D-SPECK and the JPEG2000 multi-component integer implementation for the Moffett hyperspectral image volume [22]. Five levels of the dyadic S+P (B) integer filter were applied in all three dimensions for 3D-SPIHT and 3D-SPECK. For JPEG2000 multi-component, a five-level 1D S+P (B) filter was first applied on the spectral axis, followed by the (5,3) filter in the spatial domain. For MLVQ-SPIHT, to enable SNR scalability, bit-stream boundaries are maintained for every coding layer.
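Equation 5.10 in code form (a sketch; σ_x^2 is taken here as the variance of the original data, matching the Power column of Table 5.1):

```python
import math

def snr_db(orig, recon):
    """SNR = 10 log10(sigma_x^2 / MSE) in dB (Eq. 5.10)."""
    n = len(orig)
    mean = sum(orig) / n
    power = sum((x - mean) ** 2 for x in orig) / n      # sigma_x^2
    mse = sum((x - y) ** 2 for x, y in zip(orig, recon)) / n
    return 10 * math.log10(power / mse)
```

Computed per band, this gives the curves of Figure 5.5; computed over the whole sequence, it gives the entries of Table 5.2.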
To compare with those three-dimensional compression algorithms, bits belonging to the same fraction of the same coding layer in the different four-dimensional vector bands can be extracted for decoding. The SNR results of MLVQ-SPIHT are obtained by first calculating the overall MSE and power (the σx² shown in Table 5.1) of the whole image sequence, then applying Equation (5.10). The results show that at low bit rates, the MLVQ-SPIHT algorithms outperform the 3D compression algorithms. As the bit rate increases, the 3D algorithms give better performance. In general, sphere D4 LVQ shows better performance than cubic Z4 LVQ. The reason that MLVQ-SPIHT performs worse at high bit rates, which involve more quantization stages, is the overlaps and gaps between two successive stages. As mentioned before, to prevent divergence of the overload quantization error, the truncated LVQ at the current stage should be able to cover the Voronoi region of the previous stage. On the other hand, any overlap of quantization regions at two successive stages decreases compression efficiency. However, it is very difficult to find a truncated lattice which is perfectly consistent with the Voronoi region of the root lattice.
Table 5.2: Comparison of rate-distortion results of different coding methods in signal-to-noise ratio (SNR), in dB

moffett scene 1
Bit Rate | 3D-SPIHT [23] | 3D-SPECK [23] | JP2K-Multi [23] | MLVQ-SPIHT Cubic Z4 | MLVQ-SPIHT Pyramid D4 | MLVQ-SPIHT Sphere D4
0.1 bpp  | 15.509 | 15.717 | 14.770 | 16.475 | 16.401 | 17.035
0.2 bpp  | 20.605 | 20.778 | 19.655 | 20.617 | 21.174 | 21.905
0.5 bpp  | 29.105 | 29.199 | 27.999 | 25.136 | 24.861 | 26.492
1.0 bpp  | 37.198 | 37.284 | 36.312 | 29.602 | 31.692 | 32.646

moffett scene 3
Bit Rate | 3D-SPIHT [23] | 3D-SPECK [23] | JP2K-Multi [23] | MLVQ-SPIHT Cubic Z4 | MLVQ-SPIHT Pyramid D4 | MLVQ-SPIHT Sphere D4
0.1 bpp  | 10.828 | 10.622 | 10.264 | 11.817 | 11.807 | 12.361
0.2 bpp  | 16.740 | 16.557 | 15.952 | 17.144 | 18.149 | 18.278
0.5 bpp  | 26.102 | 25.998 | 25.298 | 23.859 | 24.605 | 25.100
1.0 bpp  | 34.946 | 34.845 | 33.835 | 31.169 | 31.152 | 31.305

5.5 Summary and Conclusions

In this chapter, we presented a multidimensional image compression algorithm which extends SPIHT with lattice vector quantization and supports successive refinement. In the proposed algorithm, multistage lattice vector quantization is used to exploit correlations between image slices. Cubic Z4 LVQ, sphere D4 LVQ and pyramid D4 LVQ are implemented in the proposed scheme. The experimental results show that the MLVQ-based schemes exploit the inter-band correlations along the wavelength axis and provide better rate-distortion performance at low bit rates than 2D-SPIHT and those algorithms that employ 3D wavelet transforms.

Figure 5.4: Comparison of the original and reconstructed moffett scene 3 49th band by MLVQ-SPIHT; from top to bottom: original, 0.1 bpp, 0.5 bpp.
Figure 5.5: Comparison of lossy performance for the Moffett Field image, scene 3: (a) rate-distortion performance of MLVQ-SPIHT (Z4 and sphere D4) versus scalar 2D SPIHT over all spectral bands at 0.1 bpp; (b) the same comparison at 0.5 bpp.

CHAPTER 6
Four-Dimensional Wavelet Compression of 4-D Medical Images Using Scalable 4-D SBHP

In this chapter, we propose a low-complexity wavelet-based method for progressive lossy-to-lossless compression of four-dimensional (4-D) medical images. The Subband Block Hierarchical Partitioning (SBHP) algorithm is modified and extended to four dimensions, and applied to every code-block independently. The resultant algorithm, 4D-SBHP, efficiently encodes 4D image data by exploiting the dependencies in all dimensions, while enabling progressive SNR and resolution decompression. The resolution-scalable and lossy-to-lossless performances are empirically investigated. The experimental results show that our 4-D scheme achieves better compression performance on 4-D medical images when compared with 3-D volumetric compression schemes.

6.1 Introduction

Four-dimensional (4-D) data sets, such as images generated by computed tomography (CT) and functional magnetic resonance imaging (fMRI), are increasingly used in diagnosis. Three-dimensional (3-D) volumetric images are two-dimensional (2-D) image slices that represent cross sections of a subject. Four-dimensional (4-D) medical images, which can be seen as a time series of 3-D images, represent the live action of human anatomy and consume even larger amounts of resources for transmission and storage than 3-D image data. For example, a few seconds of volumetric CT image sequences require a few hundred megabytes of memory.
Therefore, for modern multimedia applications, particularly in the Internet environment, efficient compression techniques are necessary to reduce storage and transmission bandwidth. Furthermore, in many applications it is highly desirable to have SNR and resolution scalability with a single embedded bitstream per data set. SNR scalability gives the user an option of lossless decoding, which is important for analysis and diagnosis, and also allows the user to reconstruct image data at lower rate or quality for rapid browsing through a large image data set. Resolution scalability can provide image browsing with low memory cost and computational resources. Since 4-D image data can be represented as multiple 2-D slices or 3-D volumes, it is possible to code these 2-D slices or 3-D volumes independently. Many wavelet-based 2-D [18, 19, 17, 24] and 3-D [8, 73, 7, 14, 74] image compression algorithms have been proposed and applied to medical images. However, those 2-D and 3-D methods do not exploit the dependency among pixels in different volumes. Since 4-D medical data is normally temporally smooth, the high correlation between volumes makes an algorithm based on four-dimensional coding a better choice. Very little work has been done in the field of 4-D medical image compression. Zeng et al. [76] use a 4-D discrete wavelet transform and extended EZW to 4-D for lossy compression of echocardiographic data. SPIHT was extended to 4-D and tested on fMRI and 4-D ultrasound images by Lalgudi et al. [77]. These two algorithms are zerotree codecs and use a symmetric tree structure. Lalgudi et al. [75] applied a 4-D wavelet transform to fMRI data, and compressed the transformed slices by JPEG2000 separately. Kassim et al. [78] proposed a lossy-to-lossless compression method for 4-D medical images using a combination of 3-D integer wavelet transform and 3-D motion compensation.
In this chapter, we propose a low-complexity progressive lossy-to-lossless compression algorithm that exploits dependencies in all four dimensions by using a 4-D discrete wavelet transform and a 4-D coder. We extend SBHP [16], originally proposed as a low-complexity alternative to JPEG2000 [24], to four dimensions. We have already reported on the extension of SBHP to three dimensions and shown that this 3D-SBHP is about 6 times faster in lossless encoding and 6 to 10 times faster in lossless decoding than asymmetric-tree 3D-SPIHT [74]. This block-based algorithm has better scalability and random accessibility than zerotree coders. 4D-SBHP is based on coding 4-D subblocks of 4-D wavelet subbands and can provide scalability and fast encoding and decoding. In this chapter, we investigate the lossy-to-lossless performance and resolution scalability of 4D-SBHP in detail. The rest of this chapter is organized as follows. We present the scalable 4D-SBHP algorithm in Section 2. Experimental results of scalable coding are given in Section 3. Section 4 concludes this study.

6.2 Scalable 4D-SBHP

6.2.1 Wavelet Decomposition in 4-D

For 4D datasets, the variances along the axial and temporal directions are determined by the thickness of slices and the imaging speed. The variances among the four dimensions may be very different. In general, the similarity of pixel values along the temporal and axial directions is expected to be closer than along the other two directions, and the similarity along the X and Y directions is very close. In Table 6.1, we give the average standard deviation (STD) of a 4D fMRI medical dataset and a 4D CT dataset along the X, Y, axial Z and temporal T directions. This asymmetric similarity has also been shown in [75] for 4-D fMRI image data sets. Therefore, it is reasonable to apply transforms along the axial and temporal directions in different ways from the transforms along the X and Y directions in the 4D wavelet transform.
Table 6.1: Average standard deviation of 4D fMRI (siem) and 4D CT (ct4d) image data along the X, Y, Z and T directions.

STD  | X      | Y      | Z      | T
siem | 32.131 | 24.451 | 19.731 | 2.991
ct4d | 22.975 | 23.460 | 4.547  | 2.171

In our method, the 2D spatial transformation, the 1D axial transformation (along image slices) and the 1D temporal transformation are done separately: first a 1D dyadic wavelet decomposition in the temporal domain, followed by a 1D dyadic wavelet decomposition along the axial direction, and then a 2D dyadic spatial decomposition in the XY planes. A heterogeneous selection of filter types and different numbers of decomposition levels for each direction (x, y, z or t) are supported by this separable wavelet decomposition module. This allows adapting the size of the wavelet pyramid in each direction in case the resolution is limited. Figure 6.1 shows a 4-D (x, y, z, t) data set after 2 levels of 4-D wavelet transform.

Because the number of volumes and slices in a typical 4-D data set can be quite large, it is impractical to buffer all volumes and slices for the temporal and axial transforms. In our scheme, F consecutive slices in T consecutive volumes are collected into a Group Of Volumes (GOV). For example, the set of slices S_{k,l}, k ∈ (0, F − 1), l ∈ (0, T − 1), forms a GOV. We shall use the notation GOV(a, b) to indicate that the GOV has a slices in each volume and b volumes. Each GOV is independently transformed and coded.

Figure 6.1: Wavelet decomposition structure with 2 levels of 1D temporal transform followed by 2 levels of 1D axial transform and 2D spatial transform. The black block is the lowest frequency subband.

6.2.2 Coding Algorithm

The 2-D SBHP algorithm is a SPECK [17] variant which was originally designed as a low-complexity alternative to JPEG2000 [16]. 4-D SBHP is a modification and extension of 2-D SBHP to four dimensions. In 4-D SBHP, each subband is partitioned into 4-D code-blocks.
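As an illustration of the decomposition order just described (temporal, then axial, then spatial), here is a minimal numpy sketch using the plain integer S transform; the thesis also uses I(2,2) filters in the xy domain, and the level counts and GOV size below are illustrative, not the thesis settings:

```python
import numpy as np

def s_transform_1d(a, axis):
    """One level of the integer S transform (Haar with integer rounding)
    along one axis: low = floor((even + odd) / 2), high = even - odd."""
    a = np.moveaxis(a, axis, 0)
    even, odd = a[0::2], a[1::2]
    low = (even + odd) >> 1          # arithmetic shift = floor division by 2
    high = even - odd
    out = np.concatenate([low, high], axis=0)
    return np.moveaxis(out, 0, axis)

def dyadic(a, axis, levels):
    """Dyadic decomposition: re-apply the transform to the low half."""
    n = a.shape[axis]
    for _ in range(levels):
        sl = [slice(None)] * a.ndim
        sl[axis] = slice(0, n)
        a[tuple(sl)] = s_transform_1d(a[tuple(sl)], axis)
        n //= 2
    return a

# Axes ordered (x, y, z, t); decomposition order as in the text:
# temporal first, then axial, then the 2-D spatial transform.
gov = np.random.randint(0, 256, size=(64, 64, 4, 4)).astype(np.int64)
gov = dyadic(gov, axis=3, levels=1)        # temporal
gov = dyadic(gov, axis=2, levels=1)        # axial
for ax in (0, 1):                          # spatial xy
    gov = dyadic(gov, axis=ax, levels=3)
```

Because each step is an integer-to-integer lifting transform, the whole decomposition remains reversible, which is what enables lossy-to-lossless decoding.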
All 4-D code-blocks have the same size. 4-D set partitioning is applied to every code-block independently and generates a highly scalable bit-stream for each code-block by using the same form of progressive bitplane coding as in SPIHT [18].

Consider a 4-D image data set that has been transformed using a discrete wavelet transform. The image sequence is represented by an indexed set of wavelet coefficients c_{i,j,k,l} located at position (i, j, k, l) in the transformed image sequence. Following the idea in [19], for a given bit plane n and a given set τ of coefficients, we define the significance function:

S_n(τ) = 1, if 2^n ≤ max_{(i,j,k,l)∈τ} |c_{i,j,k,l}| < 2^{n+1}; 0, otherwise.    (6.1)

Following this definition, we say that set τ is significant with respect to bit plane n if S_n(τ) = 1. Otherwise, we say that set τ is insignificant.

In 4D-SBHP, each subband is partitioned into 4-D code-blocks of the same size. The 4D-SBHP algorithm makes use of sets referred to as sets of type S, which can be of varying dimensions. The dimension of a set S depends on the dimension of the 4-D code-block and the partitioning rules. Because of the limited number of volumes and slices in a GOV, the dimensions of the 4-D code-block along the temporal and axial directions might be much shorter than those along the x and y directions. In our method, we set the 4-D code-block size to 2^M × 2^M × 2^N × 2^N, (M > N > 0), i.e., the code-block has equal dimensions along the x and y directions, and equal dimensions along the z and t directions. With these dimensions, the initial stages of partitioning result in some S sets that are 2D sets, i.e., whose temporal and axial dimensions are both 1. We define Max2D to be the maximum 2D S set that can be generated. For a 2^M × 2^M × 2^N × 2^N code-block, Max2D is the 2^{M−N} × 2^{M−N} × 1 × 1 set. 4D-SBHP always has type S sets with at least 2 × 2 × 1 × 1 coefficients.
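The significance test of Eq. (6.1) is a one-liner; here is a sketch with a toy coefficient map (the values and names are hypothetical):

```python
def significance(coeffs, tau, n):
    """S_n(tau) of Eq. (6.1): 1 if the largest coefficient magnitude in the
    set tau lies in [2**n, 2**(n+1)), else 0."""
    m = max(abs(coeffs[idx]) for idx in tau)
    return 1 if 2 ** n <= m < 2 ** (n + 1) else 0

# Toy coefficient map with 4-D indices (i, j, k, l).
c = {(0, 0, 0, 0): -9, (0, 1, 0, 0): 3, (1, 0, 0, 0): 0}
tau = list(c)
print(significance(c, tau, 3))  # max |c| = 9 and 8 <= 9 < 16, so 1
print(significance(c, tau, 2))  # 0
```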
The size of a set is defined to be the cardinality C of the set, i.e., the number of coefficients in the set:

size(S) = C(S) ≡ |S|    (6.2)

During the course of the algorithm, sets of various sizes are formed, depending on the characteristics of coefficients in the code-block. 4-D SBHP is based on a set-partitioning strategy. Figure 6.2 and Figure 6.3 illustrate the partitioning process used in 4D-SBHP.

Figure 6.2: Quadtree partitioning of set S: (a) size(S) ≤ size(Max2D); (b) size(S) > size(Max2D).

Figure 6.3: Octave-band partitioning of set I: (a) size(S) ≤ size(Max2D); (b) size(S) > size(Max2D).

Below we explain the 4-D partitioning rules in detail using a 64 × 64 × 4 × 4 4-D code-block X as an example. Here size(Max2D) = 16 × 16 × 1 × 1. The algorithm starts with two sets, as shown in Figure 6.4(a). One is the S set of 2 × 2 × 1 × 1 top-left wavelet coefficients at the (0,0,0,0) position in X, and the other is the I set which contains the remaining coefficients, I = X − S. 4-D SBHP works on a 2-D domain and follows exactly the octave-band partitioning rules of 2-D SBHP [16] until size(X − I) equals size(Max2D), i.e., 16 × 16 × 1 × 1, as shown in Figure 6.4(b). In the next stage, the upper-left 16 × 16 × 1 × 1 set at the (0,0,0,0) position in X follows the 2-D quadrisection partitioning rules until all the significant coefficients are located, and the remaining I set is partitioned into fifteen 16 × 16 × 1 × 1 S sets (labeled 2-16 in Figure 6.4(c)) and one I set. The 2-D SBHP partitioning rule is applied to these 16 × 16 × 1 × 1 S sets to locate significant coefficients. At the following stage, the remaining I set is partitioned into fifteen 32 × 32 × 2 × 2 S sets, labeled 17-31 in Figure 6.4(d). The two 32 × 32 × 2 3-D blocks with the same label make one 32 × 32 × 2 × 2 4-D block.
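The stage-by-stage yield of this octave-band partitioning can be tabulated with a small helper (my own formalization of the rules above, checked against the 64 × 64 × 4 × 4 example):

```python
def octave_band_sets(M, N):
    """S sets produced by successive octave-band splits of I for a
    2^M x 2^M x 2^N x 2^N code-block: (count, dimensions) per stage."""
    stages = []
    sx = 2                      # resolved region is sx x sx x 1 x 1 so far
    while sx < 2 ** (M - N):    # 2-D stages: I -> three S sets + new I
        stages.append((3, (sx, sx, 1, 1)))
        sx *= 2
    sz = 1
    while sx < 2 ** M:          # 4-D stages: I -> fifteen S sets + new I
        stages.append((15, (sx, sx, sz, sz)))
        sx *= 2
        sz *= 2
    return stages

print(octave_band_sets(6, 2))
# For the 64x64x4x4 example: three S sets each of 2x2, 4x4, 8x8 (all 2-D),
# then fifteen 16x16x1x1 and fifteen 32x32x2x2 sets, as in Figure 6.4.
```

Together with the initial 2 × 2 × 1 × 1 S set, these stages account for all 64 · 64 · 4 · 4 coefficients of the code-block.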
At the next step, each 32 × 32 × 2 × 2 4-D block is partitioned into sixteen 16 × 16 × 1 × 1 blocks, and the 2-D SBHP partitioning rules are applied to those 2-D blocks until all sets are partitioned down to individual coefficients. Figure 6.4(e) shows this partition on block 17.

During the coding process, a set is partitioned following the above rules when at least one of its subsets is significant. To minimize the number of significance tests for a given bit-plane, 4-D SBHP maintains three lists:

• LIS (List of Insignificant Sets) - all the sets (with more than one coefficient) that are insignificant but do not belong to a larger insignificant set.
• LIP (List of Insignificant Pixels) - pixels that are insignificant and do not belong to an insignificant set.
• LSP (List of Significant Pixels) - all pixels found to be significant in previous passes.

Instead of using a single large LIS that holds sets of varying sizes, we use an array of smaller lists of type LIS, each containing sets of a fixed size. All the lists and list arrays are updated in the most efficient list-management manner, FIFO. Since the total number of sets formed during the coding process remains the same, using an array of lists does not increase the memory requirement of the coder. The use of multiple lists completely eliminates the need for any sorting mechanism to process sets in increasing order of their size, and it speeds up encoding/decoding. For each new bit plane, the significance of coefficients in the LIP is tested first, then the sets in the LIS in increasing order of their sizes, and lastly the refinement bits for coefficients in the LSP are coded. Testing sets in increasing order of size finds significant coefficients more quickly and hence conveys value information prior to set-significance information, which conveys no value information about individual coefficients.

The way 4D-SBHP entropy codes the comparison results is an important factor in reducing the coding complexity.
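A minimal sketch of this list organization (class and method names are mine): one FIFO per set size, so draining the size classes smallest-first needs no sorting:

```python
from collections import deque

class SBHPLists:
    """Sketch of the list management described above: LIP and LSP hold
    individual coefficients; the LIS is an array of FIFO lists, one per set
    size, so sets come out in increasing size with no sorting."""
    def __init__(self, set_sizes):
        self.lip = deque()
        self.lsp = deque()
        # one FIFO per distinct set size, keyed smallest to largest
        self.lis = {size: deque() for size in sorted(set_sizes)}

    def add_set(self, s, size):
        self.lis[size].append(s)          # FIFO within each size class

    def drain_by_increasing_size(self):
        for size in self.lis:             # keys were inserted smallest first
            while self.lis[size]:
                yield self.lis[size].popleft()

lists = SBHPLists([4, 16, 64])
for name, size in [("A", 64), ("B", 4), ("C", 4), ("D", 16)]:
    lists.add_set(name, size)
print(list(lists.drain_by_increasing_size()))  # ['B', 'C', 'D', 'A']
```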
Instead of using adaptive arithmetic or Huffman coding, 4D-SBHP uses only three fixed Huffman codes in some special conditions. Since there are only four subsets or pixels after most set partitions, they can be coded together. In 4D-SBHP, we choose a Huffman code with 15 symbols, as used in Chapter 3, corresponding to all the possible outcomes. The longest Huffman codeword is 6 bits. To speed up decoding, we can use lookup tables instead of binary trees. No entropy coding is used for the sign or the refinement bits. For these large datasets, this light use of entropy coding is a major factor in the low complexity and speed of the 4D-SBHP algorithm.

6.2.3 Scalable Coding

In a wavelet coding system, resolution scalability enables an increase of resolution as bits in higher frequency subbands are decoded. As shown in Figure 6.1, 4D-SBHP codes the code-blocks in the black part first, then the code-blocks in the white subbands. For a 2D image, after N levels of wavelet decomposition, the image has N + 1 resolution levels. For a 4D image sequence with N-level wavelet decomposition in the spatial directions, M-level wavelet decomposition in the spectral direction and K-level wavelet decomposition in the temporal direction, a total of (N + 1) × (M + 1) × (K + 1) resolution levels are available. As shown in Figure 6.5, 4D-SBHP codes code-blocks from the lowest to the highest frequency subbands. The algorithm generates a progressive bit stream for each code-block, and the whole bit stream is resolution scalable. If a user wants to decode up to resolution n, the bits belonging to the same fraction of the same bit planes in the code-blocks related to resolution n can be extracted for decoding. 4D-SBHP is applied independently to every 4-D code-block inside a subband. An embedded bit stream is generated by bitplane coding, but overall, the bitstream is resolution scalable.
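The lookup-table decoding mentioned above can be sketched as follows. The codeword lengths here are hypothetical (the thesis's actual 15-symbol table is not reproduced); the sketch only assumes a prefix code with maximum length 6, builds it canonically, and decodes with a single index into a 64-entry table:

```python
def canonical_code(lengths):
    """Canonical prefix code from a {symbol: codeword length} map."""
    codes, code, prev = {}, 0, 0
    for s in sorted(lengths, key=lambda s: (lengths[s], s)):
        code <<= lengths[s] - prev
        codes[s] = (code, lengths[s])
        code += 1
        prev = lengths[s]
    return codes

def build_lut(codes, maxlen=6):
    """Table mapping any maxlen-bit window to (symbol, bits consumed)."""
    lut = [None] * (1 << maxlen)
    for s, (c, l) in codes.items():
        base = c << (maxlen - l)
        for i in range(1 << (maxlen - l)):
            lut[base + i] = (s, l)
    return lut

# Hypothetical length assignment for the 15 significance patterns 1..15;
# Kraft sum <= 1 and maximum length 6, matching the constraints in the text.
lengths = dict(zip(range(1, 16), [2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6]))
codes = canonical_code(lengths)
lut = build_lut(codes)

# Decoding is a single table lookup on the next 6 bits of the stream.
sym, used = lut[codes[5][0] << (6 - codes[5][1])]
print(sym, used)  # 5 4
```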
To enable SNR scalability, rate-distortion information is calculated and stored in the header of each code-block during the coding process. When decoding, the rate-allocation method described in Chapter 3 is used to select optimal cutting points for every code-block. The selected bitstreams from every code-block are then interleaved to get the final bitstream.

6.3 Numerical Results

We tested our algorithm on four fMRI, two 4D CT and one 4D ultrasound imaging datasets. Table 6.2 gives a brief description of these image datasets.

Table 6.2: Description of the image volumes

File Name | Image Type | Dimension (x,y,z,t)  | Bit Depth (bit/pixel) | Resolution in xy and z (mm)
mb01      | fMRI       | 64 × 64 × 20 × 100   | 13 | 3.75 and 5
siem      | fMRI       | 64 × 64 × 16 × 120   | 8  | unknown
feeds     | fMRI       | 64 × 64 × 20 × 180   | 13 | 3.75 and 5
3T        | fMRI       | 64 × 64 × 28 × 244   | 13 | 3.75 and 5
heart     | ultrasound | 128 × 128 × 128 × 12 | 8  | unknown
ct4d      | CT         | 256 × 256 × 256 × 16 | 8  | unknown
4D CT     | CT         | 512 × 512 × 108 × 8  | 13 | 0.787109 and 2.5

In this section, we provide simulation results and compare the proposed 4-D codec with 3-D volumetric algorithms.

6.3.1 Comparison of Lossless Performance with 3-D and 4-D Schemes

For 4-D datasets (x, y, z, t), we can employ 3D wavelet compression on either the xyz cube or the xyt cube. Table 6.3 compares the lossless compression performance of 4-D SBHP with 3-D SBHP applied to the xyz cube and the xyt cube. We get considerable compression improvement for 4D-SBHP compared to 3D-SBHP on xyz, and a small improvement compared to 3D-SBHP on xyt. All results were obtained using three-level I(2,2) reversible transforms in the xy domain and one-level S transforms along the temporal and axial directions. A 4-D code-block size of 32 × 32 × 2 × 2 and GOV size GOV(z, t) = (4, 4) were chosen here. For 3D-SBHP, code-block size 32 × 32 × 2 and a GOS (group of slices) size of 4 (GOV(4, 1)) were chosen.

Table 6.4 compares the lossless compression performance of 4D-SBHP with 4D JPEG2000 [75], 4D EZW and 4D SPIHT [77].
Instead of applying the wavelet transform to every GOV independently, these works apply 4D JPEG2000, 4D EZW and 4D SPIHT to the 4D wavelet transform of the whole 4D dataset. For these three 4D methods [75, 77], compression parameter settings such as code-block size and wavelet decomposition levels are not mentioned. To get the 4D-SBHP results, GOV(z, t) = (16, 32) and wavelet decomposition levels (xy, z, t) = (3, 2, 5) are used for the mb01 and siem datasets. For the heart image data, we use GOV(z, t) = (32, 8) and wavelet decomposition levels (xy, z, t) = (3, 2, 2). Code-block size 64 × 64 × 4 × 4 is used on all three datasets. As shown in the table, 4D-SBHP is roughly comparable to 4D JPEG2000, 4D EZW and 4D SPIHT in lossless performance, while having lower memory requirements and complexity.

Table 6.3: Lossless compression performance using 4D-SBHP and 3D-SBHP (bits/pixel)

File Name | 3D-SBHP on xyz cube | 3D-SBHP on xyt cube | 4D-SBHP
mb01   | 7.4732  | 6.8829 | 6.8196
siem   | 4.9947  | 4.7234 | 4.6187
feeds  | 5.4952  | 4.5045 | 4.4923
3T     | 10.3150 | 9.6163 | 9.4972
heart  | 2.0719  | 2.1728 | 2.0272
ct4d   | 3.2870  | 3.0830 | 2.8394
4D CT  | 5.1953  | 5.0137 | 4.8667

Table 6.4: Lossless compression performance using 4D methods (bits/pixel)

File Name | 4D JPEG2000 [75] | 4D EZW [77] | 4D SPIHT [77] | 4D-SBHP
mb01  | 5.962 | 5.984 | 5.749 | 6.0585
siem  | 4.056 | 4.331 | 3.933 | 4.1580
heart | 1.68  | 2.013 | 1.735 | 1.9905

6.3.2 Comparison of Lossy Performance with 3-D Schemes

In this section, we show the performance of lossy reconstruction from the losslessly compressed file. The quality of reconstruction is measured by the signal-to-noise ratio (SNR) over the whole 4-D image dataset. SNR is defined by

SNR = 10 log10 (Px² / MSE) dB    (6.3)

where Px² is the average squared value of the original 4D medical image dataset and MSE denotes the mean squared error between all the original and reconstructed slices. Figure 6.7 and Figure 6.8 compare the lossy performance of 4-D SBHP with 3-D SBHP applied to the xyz cube and the xyt cube.
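Eq. (6.3) translates directly into code; a sketch (toy data, not one of the thesis datasets):

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR of Eq. (6.3): 10*log10(Px^2 / MSE), where Px^2 is the mean
    squared value of the original data and MSE is taken over all samples."""
    x = np.asarray(original, dtype=np.float64)
    y = np.asarray(reconstructed, dtype=np.float64)
    power = np.mean(x ** 2)
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(power / mse)

x = np.full((4, 4, 2, 2), 2.0)      # toy 4-D volume
print(snr_db(x, 0.9 * x))           # power/MSE = 4/0.04 = 100 -> 20.0 dB
```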
Figures 6.7(a) and 6.8(a) show plots of signal-to-noise ratio (SNR) versus bit rate over the whole mb01 and siem image data. We also evaluated the lossy performance of our algorithm in 2-D, as medical images are usually viewed slice by slice. Figure 6.7(b) shows the lossy results of the (x, y) slices at every z of the 4-D block at time t = 8, i.e., (x, y, z, 8) for every z of mb01 at 3.5 bits/pixel. Figure 6.8(b) shows the lossy results of the (x, y) slices at every z of the 4-D block at time t = 20, i.e., (x, y, z, 20) for every z of siem at 1.0 bits/pixel. Figures 6.7 and 6.8 clearly show that our 4-D scheme exploits the redundancy in all four dimensions and is superior to the 3-D coding schemes. Figure 6.9 shows the 3D view of the reconstructed siem sequence at time t = 20. The algorithm has good subjective quality. Small differences in quality are almost unnoticeable, especially at higher bit rates.

6.3.3 Resolution Scalable Results

The fMRI medical sequence siem, a three-level I(2,2) reversible transform in the xy domain and a two-level S transform along the temporal and axial directions are selected for this comparison, with GOV size GOV(z, t) = (16, 16). Figure 6.10 shows the reconstructed siem images decoded from a single scalable code stream at a variety of resolutions at 0.25 bpp. The SNR values listed in Table 6.5 for low-resolution image sequences are calculated with respect to the lossless reconstruction of the corresponding resolution. Table 6.5 shows that the SNR values increase from one resolution to the next lower one.

Table 6.5: SNR for decoding siem at a variety of resolutions and bit rates

Bit Rate (bpp) | SNR (dB), 1/2 resolution | SNR (dB), full resolution
0.0625 | 23.722   | 12.347
0.125  | 28.817   | 13.356
0.25   | 37.80    | 15.187
0.5    | lossless | 18.716
1      | lossless | 21.985
2      | lossless | 25.883

6.4 Summary and Conclusions

In this chapter, we proposed an image coding algorithm, 4D-SBHP, for lossy-to-lossless compression of 4-D medical images using a four-dimensional DWT and four-dimensional set partitioning.
This block-based algorithm supports resolution scalability. Fixed Huffman coding and one coding pass per bit plane are used to reduce the coding time. The experimental results show that 4D-SBHP exploits the redundancy in all four dimensions and achieves a higher compression ratio than 3-D compression schemes. Furthermore, 4D-SBHP is low in complexity and exhibits fast encoding and decoding times.

Figure 6.4: Set partitioning rules used by 4-D SBHP, illustrated on a 64 × 64 × 4 × 4 code-block; panels (a)-(e) show the successive partitioning stages.

Figure 6.5: An example of 4D-SBHP SNR and resolution scalable coding. Each bitplane α in block β is notated as b(α, β). Code-blocks are encoded and indexed from the lowest subband to the highest subband.

Figure 6.6: Bitstream structure generated by 4D-SBHP. Each bitplane α in block β is notated as b(α, β). Rate-distortion information is stored in the header of every code-block.
Figure 6.7: Comparison of lossy performance on the mb01 image data: (a) SNR versus rate for 4D-SBHP, 3D-SBHP xyt and 3D-SBHP xyz; (b) lossy performance of every slice in the eighth time sequence of mb01 at 3.5 bpp.

Figure 6.8: Comparison of lossy performance on the siem image data: (a) SNR versus rate for 4D-SBHP, 3D-SBHP xyt and 3D-SBHP xyz; (b) lossy performance of every slice in the twentieth time sequence of siem at 1.0 bpp.

Figure 6.9: Reconstructed siem sequence at time t = 20 by 4D-SBHP; from left to right, top to bottom: original, 0.5 bpp, 1.0 bpp, and 2.0 bpp.

Figure 6.10: A visual example of resolution scalable decoding. Full resolution and 1/2 resolution of one slice at 0.25 bpp.

CHAPTER 7
Conclusions and Future Work

This thesis has investigated the problem of volumetric image compression. In this chapter, we summarize the contributions of this work and give some directions for future research.

7.1 Contributions of the Thesis

In the first part, a low-complexity, embedded, block-based, wavelet transform coding algorithm was proposed for volumetric image compression. Based on the properties of volumetric image data, an asymmetric 3D wavelet transform is applied to maximally decorrelate the source signals. The Three-Dimensional Subband Block Hierarchical Partitioning (3D-SBHP) algorithm efficiently encodes volumetric image data by exploiting the correlations in all dimensions. Fixed Huffman coding and one coding pass per bit plane are used to achieve low computational complexity. 3D-SBHP generates an embedded bitstream and supports SNR scalability, resolution scalability and random accessibility from one bitstream.
The details of SNR-scalable and resolution-scalable 3D-SBHP are given in Chapter 3. In 3-D SBHP, each subband is divided into contiguous code-blocks. 3-D SBHP is applied to every code-block independently and generates a highly scalable bit-stream for each code-block. The processing order of the code-blocks is resolution by resolution, i.e., the code-blocks in the next finer resolution are coded only after all code-blocks in a given resolution are coded. We described the integer filter mode for 3D-SBHP, which enables lossy-to-lossless decompression from the same bitstream. A wavelet packet structure and coefficient scaling are used to make the integer wavelet transform approximately unitary.

Chapter 4 described the way 3D-SBHP is used to support random access. The details of the method that finds the corresponding code-blocks for a given ROI are given in that chapter. The equations for the number of code-blocks used to reconstruct a given ROI are also derived. The impact that the wavelet transform and code-block configuration have on the compression efficiency and accessibility of an embedded bitstream is assessed by applying 3D-SBHP to medical volumetric images. Our work shows that the ROI access performance is affected by a set of coding parameters. With regard to both coding and access efficiency, we outlined some reasonable trade-offs for 3D volumetric images. With a small loss of quality, 3D-SBHP offers low-complexity, SNR-scalable, resolution-scalable and random-access decoding features. 3D-SBHP is a very good candidate for applications that need high-speed encoding of volumetric images and other features, such as random-access decoding and progressive decoding without any transcoding of an encoded bitstream. The extension of 3D-SBHP to the 4D case in Chapter 6 is another contribution of this work. The 4D wavelet transform and 4D coding method give a significant improvement over previous 2D and 3D techniques.
In Chapter 5 of the thesis, quantization techniques are investigated to exploit the relationships between the slices in a volumetric image dataset. For volumetric images, especially hyperspectral images, neighboring slices convey highly related spatial details, and vector quantization has the ability to exploit the statistical correlation between neighboring data in a straightforward manner. Based on those facts, vector quantization is combined with the SPIHT coding algorithm to code hyperspectral images. Lattice vector quantization is used to achieve low computational complexity. In particular, multistage lattice vector quantization (MLVQ) is used to exploit correlations between image slices while offering successive refinement. Different LVQs, including cubic Z4 and D4, are considered, and their performance is compared with other 2D and 3D wavelet-based image compression algorithms. This new method exploits the inter-band correlations along the wavelength axis and provides better rate-distortion performance at low bit rates than 2D-SPIHT and those algorithms that employ 3D wavelet transforms.

7.2 Further Work

Some suggestions for future work are given below:

7.2.1 Improving Compression Efficiency

Our 3D-SBHP can support all features of JPEG2000 and has very low complexity. For lossless compression efficiency, 3D-SBHP is comparable to other 3D image coders on AVIRIS image data. Our result is only 2% worse than the highest efficiency listed in Table 3.8. However, for medical image data, the lossless compression efficiency of 3D-SBHP is about 2%-10% lower than that of other 3D algorithms. Due to the limitations of storage and transmission bandwidth, improvement of the 3D-SBHP compression performance is important. In 3D-SBHP, two fixed Huffman codes are used for all code-blocks.
Since wavelet coefficients in different subbands have different source statistics, we can investigate the source statistics of medical image data and find a better low-complexity entropy coder.

7.2.2 3D-SBHP on Video

For emerging video and multimedia applications, resolution and fidelity scalabilities are essential. 3D-SBHP, a high-speed 3D wavelet subband coder with SNR and resolution scalability, is a good candidate for video compression. Recently, 3D wavelet coding via a motion-compensated temporal filter (MCTF) has emerged as a very effective structure for highly scalable video coding, as in the MC-EZBC (Embedded ZeroBlocks Coding) video coder. 3D-SBHP can replace the EZBC subband coder as a low-complexity alternative. The idea of MCTF-based 3D-SBHP can also be applied to 4D medical volumetric images.

LITERATURE CITED

[1] T. Hamid, "DICOM requirements for JPEG2000", ISO/IEC JTC1/SC29/WG1, Report N944, 1998.
[2] A. K. Jain, "Fundamentals of Digital Image Processing", Englewood Cliffs, NJ: Prentice-Hall, 1989.
[3] J. S. Lim, "Two-Dimensional Signal and Image Processing", Englewood Cliffs, NJ: Prentice-Hall, 1990.
[4] J. W. Woods and S. D. O'Neil, "Subband coding of images", IEEE Trans. on Acoust., Speech and Signal Processing, Vol. 34, pp. 1278-1288, Oct. 1986.
[5] G. P. Abousleman, M. W. Marcellin, and B. R. Hunt, "Compression of hyperspectral imagery using the 3-D DCT and hybrid DPCM/DCT", IEEE Trans. on Geosci. Remote Sensing, Vol. 33, pp. 26-34, Jan. 1995.
[6] A. Vlaicu, S. Lungu, N. Crisan, and S. Persa, "New compression techniques for storage and transmission of 2D and 3D medical images", in Proc. SPIE Advanced Image and Video Communications and Storage Technologies, Vol. 2451, pp. 370-377, Feb. 1995.
[7] A. Bilgin, G. Zweig, and M. W. Marcellin, "Three-dimensional image compression with integer wavelet transform", Applied Optics, Vol. 39, No. 11, April 2000.
[8] Z. Xiong, X. Wu, S. Cheng, and J.
Hua, “Lossy-to-Lossless compression of medical volumetric data using three-dimensional integer wavelet transforms”, IEEE Trans. on Medical Imaging, Vol. 22, No. 3, pp. 459-470, March 2003. [9] B.Kim and W.A.Pearlman, “An embedded wavelet video coder using three-dimensional set partitioning in hierarchical tree”, IEEE Data Compression Conference, pp. 251-260, March 1997. [10] P. Dragotti, G. Poggi, and A. Ragozini, “Compression of multispectral images by three-dimensional SPIHT algorithm”, IEEE Trans. on Geosci. Remote Sensing, Vol. 38, pp. 416-428, Jan. 2000. [11] Y. Kim and W. A. Pearlman, “Lossless volumetric medical image compression”, in Proc. SPIE Conference on Applications of Digita Image Processing XXII, vol. 3808, pp. 305-312, July 1999. 118 119 [12] Y. S. Kim and W. A. Pearlman, “Stripe-based SPIHT lossy compression of volumetric medical images for low memory usage and uniform reconstruction quality”, in Proc. ICASSP, vol. 4, pp. 2031-2034, June 2000. [13] E. Christophe and W. A. Pearlman, ”Three-dimensional SPIHT Coding of Volume Images with Random Access and Resolution Scalability”, EURASIP Journal on Image and Video Processing, 2008. [14] P. Schelkens, J. Barbarien, and J. Cornelis, “Compression of volumetric medical data based on cube-splitting”, in Applications of Digital Image Processing XXIII, Proc. of SPIE 4115, pp. 91-101, San Diego, CA, July 2000. [15] J. Xu, Z. Xiong, S. Li, and Y. Zhang, “3-D embedded subband coding with optimal truncation(3-D ESCOT)”, J. Appl. Comput. Harmon. Anal., vol. 10, pp. 290-315, May 2001. [16] C. Chysafis, A. Said, A. Drukarev, A. Islam, and W.A Pearlman, “SBHP - A Low complexity wavelet coder”, IEEE Int. Conf. Acoust., Speech and Sig. Proc. (ICASSP2000), vol. 4, PP. 2035-2038, June 2000. [17] A. Islam and W.A. Pearlman, “An embedded and efficient low-complexity hierarchical image coder”, in Proc. SPIE Visual Comm. and Image Processing, Vol. 3653, pp. 294-305, 1999. [18] J.M. 
Shapiro, “Embedded image coding using zerotrees of wavelet coefficients”, IEEE Trans. Image Processing, Vol. 41, pp. 3445-3462, Dec. 1993. [19] A. Said and W.A. Pearlman, “A new, fast and efficient image codec based on set-partitioning in hierarchical trees”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, pp. 243-250, June 1996. [20] Y. Cho and W. A. Pearlman, “Quantifying the Performance of Zerotrees of Wavelet Coefficients: Degree-k Zerotree Model”, IEEE Trans. on Signal Processing, Vol. 55, Part 1, pp. 2425-2431, June 2007. [21] X. Tang, W.A. Pearlman and J.W. Modestino, ”HyPerspectral image compression using three-dimensional wavelet coding”, SPIE/IS&T Electronic Imaging 2003, Proceedings of SPIE, Vol. 5022, Jan. 2003. [22] X. Tang and W. A. Pearlman, ”Three-Dimensional Wavelet-Based Compression of Hyperspectral Images”, Chapter in Hyperspectral Data Compression, Kluwer Academic Publishers 2005. [23] X. Tang, ”Wavelet Based Multi-Dimensional Image Coding Algorithms”, Ph.D thesis, Rensselaer Polytechnic Institute, 2005. 120 [24] D. Taubman, “High performance scalable image compression with EBCOT”, IEEE Trans. on Image Processing, Vol. 9, pp. 1158-1170, July 2000. [25] Y. Shoham and A. Gersho, “Efficient Bit Allocation for an Arbitrary Set of Quantizers”, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 36, no. 9, pp. 1445-1453, September 1988. [26] F.W. Wheeler, ”Trellis source coding and memory constrained image coding”, Ph.D thesis, Rensselaer Polytechnic Institute, 2000. [27] ”Information Technology - JPEG 2000 Image Coding System : Part 2 Extensions”, no. 15444-2, ISO/IEC JEC1/SC29/WG1 IS, 2002. [28] ”Information Technology - JPEG 2000 Image Coding System : Part 10 Extensions for three-dimensional data”, no. 15444-10 ISO/IEC JEC1/SC29/WG1 IS, 2008. [29] D. T. Lee, ”JPEG 2000: Retrospective and New Developments”, in the Proceedings of the IEEE Vol. 93, No. 1, , pp 32-41, Jan. 2005. [30] I. 
Daubbechies, “Ten lectures on wavelets”, Society for Industrial and Applied Mathematics, Philadelphia, 1992. [31] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform”, IEEE Trans. on Image Processing, Vol. 1, No. 2, pp. 205-220, Apr. 1992. [32] C. K. Chui, ”An Introduction to Wavelets”, Academic Press, San Diego, 1992. [33] C. Christopoulos, A. Skodras, and T. Ebrahimi. , “The JPEG2000 Still Image Coding: An Overview”, IEEE Transactions on Consumer Electronics, Vol. 46, No. 4, pp. 1103-1127, November 2000. [34] W. Swelden, ”The lifting scheme: a custom-design construction of biorthogonal wavelets”, Applied and Computational Harmonic Analysis, Vol. 3, No. 2, pp. 186–200, 1996. [35] W. Swelden, ”The lifting scheme: A construction of second generation wavelets”, SIAM J. Math Anal., Vol. 29, No. 2, pp. 511-546, 1997. [36] A.R. Calderbank, I. Daubechies, W. Swelden, and B.-L. Yeo, ”Wavelet transforms that map integers to integers”, J. Appl. Computa. Harmonics Anal. 5, pp.332-369, 1998. [37] J. Andrew, ”A simple and efficient hierarchical image coder”, IEEE International Conf. on Image Proc.(ICIP-97), Vol. 3, pp. 658-661, Oct. 1997. 121 [38] A. Said and W.A. Pearlman, ”Low-complexity waveform coding wia alphabet and sample-set partitioning”, Visual Communications and Image Processing’ 97, Proceedings of SPIE, vol. 3024, pp. 25-37, Feb. 1997. [39] W.A. Pearlman, A. Islam, N. Nagaraj, and A. Said, “Efficient, low-complexity image coding with a set-partitioning embedded block coder”, IEEE Trans. on Ciruits and Systems for Video Technology, Vol. 14, No. 11, pp. 1219-1235, Nov. 2004. [40] Reduced Complexity Entropy Coding, ISO/IEC JTC1/SC29/WG1 N1312, June 1999. [41] Proposal of the Arithmetic Coder for JPEG2000, ISO/IEC JTC1/SC29/WG1 N762, Mar. 1998. [42] S. Mallat, ”A wavelet tour of signal processing”, Academic Press, 2nd Edition, pp. 413, 1999 [43] A.R. Calderbank, I.Daubechies, W. Sweldens, and B. 
Yeo, ” Wavelet transforms that map integers to integers” Appl. Comput. Harmon. Anal., vol. 5, no. 3, pp. 332-369, 1998. [44] I. Daubechies and W. Sweldens, ”Factoring wavelet transforms into lifting steps”, J. Fourier Anal. Appl., vol. 4, pp. 247-269, 1998. [45] A. Said and W. Pearlman, ”An image multiresolution representation for lossless and lossy compression”, IEEE Trans. Image Processing, vol. 5, pp. 1303-1310, Sep. 1996. [46] Z. Xiong, X. Wu, D.Y. Yun, and W.A. Pearlman, ”Progressive coding of medical volumetric data using three-dimensional integer wavelet packet transform”, Medical Technology Symposium, 1998. Proceedings. Pacific, pp. 384-387, 1998. [47] C. He, J. Dong, Y.F. Zheng, and Z. Gao, ” Optimal 3-D coefficient tree structure for 3D wavelet video coding”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 961-972, Oct 2003. [48] A.P. Bradley and F.W.M.Stentiford JPEG 2000 and region of interest coding, Digital Image Computing Techniques and Applications(DICTA), Melbourne, Australia, pp. 303-308, 2002. [49] P.N.Topiwala, “Wavelet Image and video compression”, Kluver Academic Publishers, 1998. [50] Kakadu JPEG2000 v3.4, http://www.kakadusoftware.com/. 122 [51] S. Cho and W. A. Pearlman, ”Error Resilient Video Coding with Improved 3-D SPIHT and Error Concealment”, SPIE/IS&T Electronic Imaging 2003, Proceedings SPIE Vol. 5022, pp. 125-136, Jan. 2003. [52] J.H. Conway and N.J.A Sloane, Sphere-Packing, Lattice, and Groups, Springer, New York, NY, USA, 1988. [53] S.P. Voukelatos and J. Soraghan, ”Very low bit-rate color video coding using adaptive subband vector quantization with dynamic bit allocation”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 2, April 1997, pp. 424-428. [54] C.E. Shannon, ”Coding theorems for a discrete source with a fidelity criterion” IRE Nat. Conv. Rec., pt. 4, 1959, pp. 142-163. [55] Y.L. Linde, A. Buzo and R.M. Gray, ”An algorithm for vector quantizer design”, IEEE Trans. 
on Communication, Vol. COM-28, Jan. 1980, pp. 84-95. [56] R.M. Gray, Source Coding Theoty, Boston: Kluwer, 1990. [57] J.H. Conway and N.J.A. Sloane, ”Voronoi region of lattices, second moments of polytopes, and quantization”, IEEE Trans. on Information Theory, Vol. IT-28, Mar. 1982, pp. 211-226. [58] J.H. Conway and N.J.A Sloane, ”Fast quantizing and decoding algorithms for lattice quantizers and codes”, IEEE Trans. on Information Theory, Vol. IT-28, Mar. 1982, pp.227-232. [59] J.H. Conway and N.J.A Sloane, ”A fast encoding method for lattice codes and quantizers”, IEEE Trans. on Information Theory, Vol. IT-29, No. 6, Nov. 1983, pp.820-824. [60] T.R. Fischer, ”A pyramid vector quantizer”, IEEE Trans. on Information Theory, Vol. IT-32, July 1986, pp. 568-583. [61] M. Antonini, M. Barlaud, and P. Mathieu, ”Image coding using lattice vector quantization of wavelet coefficients”, in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Toronto, ON, Canada, May 1991, pp.2273-2276. [62] M. Barlaud, P. Sole, T. Gaidon, M. Antonini and P. Mathieu, ”Pyramidal lattice vector quantization for multiscale image coding”, IEEE Trans. on Image Processing, Vol. IP-3, No.4, July 1994, pp. 367-381. [63] A. Woolf and G. Rogers, ”Lattice vector quantization of image wavelet coefficient vectors using a simplified form of entropy coding”, in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vol. 5, Adelaide, Australia, Apr. 1994, pp. 269-272. 123 [64] H. Man, F. Kossentini, and M. J. T. Smith, ”A family of efficient and channel error resilient wavelet/subband image coders”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, no. 1, pp. 95-108, 1999. [65] E.A.B. Da Silva, D.G.Sampson, and M. Ghanbari, ” A successvie approximation vector quantizer for wavelet transform image coding”, IEEE Trans. Image Processing, vol. 5, pp. 299-310, Feb. 1996. [66] J. Knipe, X. Li,and B. 
Han, ”An improved lattice vector quantization scheme for wavelet compression”, IEEE Trans. Signal Processing, vol. 46, pp. 239-243, Jan. 1998. [67] D. Mukherjee and S. K. Mitra, ” Vector set partitioning with classified successive refinement VQ for embedded wavelet image and video coding”, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 5, Seattle, WA, May 1998, pp. 2809-2812. [68] D. Mukherjee and S. K. Mitra, ” Successive refinment lattice vector quantization”, IEEE Trans. Image Porcessing, vol. 11, no. 12, pp. 1337-1348, Dec. 2002. [69] C. C. Chao and R. M. Gray, ”Image compression with vector SPECK algorithm”, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’06, vol. 2, pp. 445-448, Toulouse, France, May 2006. [70] K. Rose, D. Miller, and A. Gersho, ” Entropy-constrained tree-structured vector quantizer design”, IEEE Trans. Image Porcessing, vol. 5, No. 2, pp. 393-398, Feb 1996. [71] P. A. Chou, T. Lookabaugh, and R. M. Gray, ” Entropy-constrained vector quantization”, IEEE Trans. Acoust. Speech, Signal Processing, vol. 37, No. 1, , pp. 31-42, Feb 1989. [72] A.A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic, New York, NY, USA, 1992. [73] S. Cho, D. Kim, and W. A. Pearlman, ”Lossless compression of volumetric medical images with improved 3-D SPIHT algorithm”, Journal of Digital Imaging, Vol. 17, No. 1, pp. 57-63, March 2004. [74] Y. Liu and W. A. Pearlman, ”Scalable three-dimensional SBHP algorithm with region of interest access and low complexity,” Applications of Digital Image Processing XXIX , Proc. SPIE Vol. 6312, pp. 631209-1–11, Aug. 2006. [75] H.G. Lalgudi, A. Bilgin, M.W. Marcellin, A. Tabesh, M.S. Nadar, and T.P. Trouard, ”Four-dimensional compression of fMRI using JPEG2000,” in Proc. SPIE International Symposium on Medical Imaging, Feb. 2005. 124 [76] L. Zeng, C.P. Jansen, S. Marcsch, M. 
Unser, and P.R Hunziker, ”Four-dimensional wavelet compression of arbitrarily sized echocardiographic data,” IEEE Transactions on Medical Imaging, Vol. 21, pp. 1179-1187, Sept 2002. [77] H.G. Lalgudi, A. Bilgin, M.W. Marcellin, and M.S. Nadar, and T.P., ”Compression of fMRI and ultrasound images using 4D SPIHT,”, in Proceedings of 2005 International Conference on Image Processing, Genova, Italy, September 2005. [78] A.Kassim, P.Yan, W.Lee, and K.Sengupta, ”Motion compensated lossy-to-lossless compression of 4-D medical images using integer wavelet transforms”, IEEE Trans. on Info. Tech. in Biomedicine, Vol. 9, no., 1, pp. 132-138, March 2005. APPENDIX A Huffman Codes for Entropy Coding and Statistics of the Training Set In 3D-SBHP, the dimension of the code-block along the axial direction is much short than the dimensions along in the spatial domain. This property makes most sets are 2D set. All sets are stored in order. For those 2D sets, LIS[k] points to all 2k+1 × 2k+1 sets. When those sets are partitioned, there are four subsets or pixels, we code them together by three individual Huffman codes for three context models. • Huffman Code 1: if the set in the LIS[0] becomes significant, use Huffman Code 1 to code the significant mask of this set. • Huffman Code 2: if the non-2 × 2 set in the LIS[i] (i > 0) becomes significant, use Huffman Code 2 to code the significant mask of this set. • Huffman Code 3: if the 2 × 2 set which is newly generated in the current bitplane becomes significant, use Huffman Code 3 to code the significant mask of this set. Some statistics utilized to generated Huffman codes for these three contexts and the generated codewords are list in the following. 
Significant Mask  Context 1  Context 2  Context 3
 1                0.090296   0.091414   0.135432
 2                0.090227   0.090745   0.135160
 3                0.063756   0.048479   0.055889
 4                0.090235   0.088137   0.135425
 5                0.062569   0.041981   0.054073
 6                0.055897   0.026445   0.044545
 7                0.053861   0.045748   0.031364
 8                0.089652   0.086817   0.134552
 9                0.055741   0.026342   0.044816
10                0.062524   0.041575   0.053016
11                0.053850   0.045509   0.031308
12                0.064294   0.042958   0.055011
13                0.053944   0.046189   0.031343
14                0.053892   0.045667   0.031379
15                0.059261   0.231995   0.026688

Table A.1: Probabilities for the 15 significant subset masks, collected from the medical image training set.

Significant Mask  Context 1  Context 2  Context 3
 1                0.115262   0.096264   0.173623
 2                0.115253   0.096757   0.173633
 3                0.061992   0.049633   0.042905
 4                0.114914   0.095634   0.173279
 5                0.060506   0.047942   0.041147
 6                0.056912   0.038823   0.034611
 7                0.037598   0.049960   0.013908
 8                0.115096   0.098630   0.174340
 9                0.056961   0.038891   0.034788
10                0.061132   0.050287   0.042022
11                0.037595   0.049989   0.013926
12                0.062093   0.050924   0.043522
13                0.037630   0.049149   0.013912
14                0.037637   0.049835   0.013839
15                0.029422   0.137281   0.010545

Table A.2: Probabilities for the 15 significant subset masks, collected from the hyperspectral image training set.

No. of Significant Subsets  Medical Image  Hyperspectral Image
1                           0.4436         0.5191
2                           0.3009         0.2937
3                           0.1646         0.1381
4                           0.0909         0.0491

Table A.3: Probabilities for the number of significant subsets in a split significant set, collected from both the medical image and hyperspectral image training sets.

Medical Image  Hyperspectral Image
0.4758         0.4293

Table A.4: Probabilities of significance of a generated subset when a set is split, collected from both the medical image and hyperspectral image training sets.
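The codeword lengths implied by these statistics can be checked independently: running the standard Huffman construction on a probability column of Table A.1 should reproduce the length profile of the corresponding column of Table A.5. A sketch (this is the textbook Huffman merge, not the thesis implementation):

```python
import heapq
import math

# Context-1 probabilities of the 15 significance masks
# (Table A.1, medical image training set).
P = [0.090296, 0.090227, 0.063756, 0.090235, 0.062569,
     0.055897, 0.053861, 0.089652, 0.055741, 0.062524,
     0.053850, 0.064294, 0.053944, 0.053892, 0.059261]

def huffman_lengths(probs):
    """Optimal prefix-code lengths via the standard Huffman merge:
    repeatedly join the two least-probable subtrees; every join adds
    one bit to each symbol inside the joined subtrees."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

lengths = huffman_lengths(P)
H = -sum(p * math.log2(p) for p in P)       # source entropy (bits)
L = sum(p * l for p, l in zip(P, lengths))  # average code length
```

For this near-uniform column the result is one 3-bit codeword (for mask 1, the most probable) and fourteen 4-bit codewords, agreeing with the Context 1 column of Table A.5, and the average length L stays within one bit of the entropy H, as Huffman coding guarantees.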
Significant Mask  Context 1  Context 2  Context 3
 1                000        010        000
 2                0100       0110       100
 3                1100       1110       0001
 4                0010       0001       010
 5                1010       01011      1001
 6                0110       11011      01011
 7                1110       1001       11011
 8                0001       0101       110
 9                1001       00111      0101
10                0101       10111      1101
11                1101       01111      00111
12                0011       11111      0011
13                1011       1101       10111
14                0111       0011       01111
15                1111       00         11111

Table A.5: Huffman codewords generated for the 15 significant subset masks, based on the medical image training set.

Significant Mask  Context 1  Context 2  Context 3
 1                000        0010       001
 2                100        1010       00
 3                0001       0110       00011
 4                010        1110       101
 5                1001       0001       10011
 6                01011      01111      01011
 7                11011      1001       001111
 8                110        000        10
 9                0101       11111      11011
10                1101       0101       00111
11                00111      1101       101111
12                0011       0011       10111
13                10111      1011       011111
14                01111      0111       0111111
15                11111      100        1111111

Table A.6: Huffman codewords generated for the 15 significant subset masks, based on the hyperspectral image training set.
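The three context rules of Appendix A reduce to a small dispatch function. The sketch below is illustrative only: the function name and argument encoding are assumptions, and the ordering of the checks reflects one reading of the rules, namely that a 2 × 2 set generated in the current bit plane takes Code 3 even though it sits at the finest LIS level.

```python
def significance_context(lis_level, is_new_2x2_set):
    """Select the Huffman context for a newly significant set.

    Rules (Appendix A):
      Code 3 -- a 2x2 set newly generated in the current bit plane
      Code 1 -- a set from LIS[0]
      Code 2 -- any other set from LIS[i], i > 0
    """
    if is_new_2x2_set:
        return 3
    if lis_level == 0:
        return 1
    return 2
```

Keeping the selection this simple is consistent with the low-complexity goal of 3D-SBHP: the context, and hence the Huffman table, is determined without inspecting any coefficient values.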