image authentication using distributed source coding
IMAGE AUTHENTICATION USING DISTRIBUTED SOURCE CODING

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Yao-Chung Lin
September 2010

© 2011 by Yao-Chung Lin. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License: http://creativecommons.org/licenses/by-nc/3.0/us/ This dissertation is online at: http://purl.stanford.edu/jw121yz9884

Abstract

Image authentication is important in content delivery via untrusted intermediaries, such as peer-to-peer (P2P) file sharing. Many differently encoded versions of the original image might exist. In addition, intermediaries might tamper with the contents. Distinguishing legitimate diversity from malicious manipulations is the challenge addressed in this dissertation. We propose an approach using distributed source coding for the image authentication problem. The key idea is to provide a Slepian-Wolf encoded quantized image projection as authentication data. This version can be correctly decoded with the help of an authentic image as side information. Distributed source coding provides the desired robustness against legitimate variations while detecting illegitimate modification. The decoder, incorporating expectation maximization (EM) algorithms, can authenticate images which have undergone contrast, brightness, and affine warping adjustments. Our novel authentication system also offers tampering localization by using inference over a factor graph that represents tampering models. Video quality monitoring is closely related to the image authentication problem. We contribute an approach using distributed source coding.
The video receiver sends the Slepian-Wolf coded video projections to the quality monitoring server, which has access to the original video. Distributed source coding provides rate-efficient encoding of the projection by exploiting the correlation between the projections of the original and received videos. We show that the projections can be encoded at a low rate of just a few kilobits per second. Compared to the ITU-T J.240 Recommendation for remote PSNR monitoring, our scheme achieves a bit-rate that is lower by at least one order of magnitude.

Acknowledgments

During my time at Stanford, I have been fortunate to have the support of many great people. My advisor, Prof. Bernd Girod, consistently inspires me with interesting research questions and creative suggestions, making my graduate study joyful. I would like to thank Prof. Robert M. Gray and Dr. Ton Kalker for serving on my committee and offering insightful comments that improved this dissertation. I am deeply grateful to Dr. Erwin Bellers and Torsten Fink for mentoring my research projects through three summers and beyond. Working with the Image, Video, and Multimedia Systems (IVMS) group has been a rewarding experience. We shared not only sharp insights and stimulating ideas but also the interesting things happening around us. I sincerely thank my key collaborator, David Varodayan, for countless fruitful discussions that gradually enriched my research and baby care experience. I would like to extend my gratitude to the other members, alumni, and visitors of the IVMS group for their friendship throughout the years. Most importantly, my deepest gratitude goes to my parents and grandparents for their love throughout my life; to my wife, Caroline, for making my life wonderful; and to little Lucas, who arrived recently, for reminding me to work hard and keep learning. This dissertation is dedicated to my family.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Research Contributions
  1.2 Organization
2 Background
  2.1 Robust Hashing for Image Authentication
    2.1.1 Compression-Inspired Features
    2.1.2 Block Projection
    2.1.3 Robust Projection
    2.1.4 Coding of Features
  2.2 Foundations of Distributed Source Coding
    2.2.1 Lossless Distributed Source Coding
    2.2.2 Practical Slepian-Wolf Coding
  2.3 Secure Biometrics
    2.3.1 Secure Biometrics Using Slepian-Wolf Codes
    2.3.2 Privacy Leakage and Secret-Key Rate
    2.3.3 Comparison to Image Authentication
  2.4 Rate Constrained Hypothesis Testing
  2.5 Video Quality Monitoring Background
    2.5.1 Full-Reference Quality Assessment
    2.5.2 No-Reference Quality Assessment
    2.5.3 Reduced-Reference Quality Assessment
  2.6 Summary
3 Image Authentication Using DSC
  3.1 Image Authentication Problem
    3.1.1 Two-State Channel
    3.1.2 Residual Statistics
  3.2 Image Authentication System
  3.3 Simulation Results
    3.3.1 Authentication Data Size
    3.3.2 Receiver Operating Characteristic
  3.4 Summary
4 Learning Unknown Parameters
  4.1 Two-State Channel
  4.2 EM Decoder for Contrast and Brightness Adjustment
  4.3 EM Decoder for Affine Warping Adjustment
  4.4 Contrast, Brightness, and Affine Warping Adjustment
  4.5 Simulation Results
    4.5.1 Contrast and Brightness Adjustment
    4.5.2 Affine Warping
    4.5.3 Contrast, Brightness, and Affine Warping Adjustment
  4.6 Summary
5 Tampering Localization
  5.1 Space-Varying Two-State Channel
  5.2 Decoder Factor Graph
  5.3 Spatial Models for State Nodes
  5.4 Tampering Localization for Adjusted Images
  5.5 Simulation Results
    5.5.1 Setup
    5.5.2 Decodable Rate
    5.5.3 Receiver Operating Characteristic
  5.6 Summary
6 Video Quality Monitoring Using DSC
  6.1 Video Quality Monitoring System
    6.1.1 J.240 Feature Extraction
    6.1.2 PSNR Estimation
  6.2 Performance Prediction
    6.2.1 Performance Prediction
    6.2.2 Synthesized Data Simulation
  6.3 Experimental Results
  6.4 Summary
7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
A Test Images
B Concavity of L̂
Bibliography

List of Figures

2.1 Image authentication scheme based on robust hashing
2.2 Source coding with side information at the decoder
2.3 Compression for hypothesis testing with side information
3.1 Two-state lossy channel
3.2 Examples of the two-state lossy channel output
3.3 The difference between the two-state lossy channel input and output
3.4 Sample autocorrelation function of differences
3.5 Power spectral density function of differences
3.6 The difference distributions between the two-state lossy channel input and output using the blockwise mean as the projection
3.7 The difference distributions between the two-state lossy channel input and output using a high-frequency projection
3.8 Image authentication system using distributed source coding
3.9 Minimum rate for decoding the Slepian-Wolf bitstream for the image Lena with the projection X quantized to 4 bits
3.10 Authentication data sizes in bytes using conventional fixed-length coding and distributed source coding for different numbers of quantization bits
3.11 ROC curves of tampering detection with different numbers of quantization bits of X
3.12 ROC equal error rates for different authentication data sizes using conventional fixed-length coding and distributed source coding
3.13 ROC curves of various authentication methods
4.1 Two-state channel with unknown adjustment parameters
4.2 Examples of the channel output
4.3 The oracle decoder
4.4 Contrast and brightness learning Slepian-Wolf decoder
4.5 Search traces for different decoders
4.6 Realignment of an affine warped image
4.7 Slepian-Wolf decoder with affine warping parameter learning
4.8 Example of corresponding coordinate estimation in 1D
4.9 Minimum decodable rates for contrast and brightness adjusted images
4.10 ROC curves for contrast and brightness adjusted images
4.11 Minimum decodable rates for rotated and sheared images
4.12 ROC curves for the target images that have undergone affine warping
4.13 ROC curves for the target images that have undergone contrast, brightness, and affine warping adjustments
5.1 Space-varying two-state lossy channel
5.2 Target image overlaid with channel states
5.3 Factor graph for the localization decoder
5.4 Factor graph for the localization decoder with spatial models
5.5 Spatial models for the channel states
5.6 Space-varying two-state lossy channel with contrast and brightness adjustment
5.7 Contrast and brightness learning Slepian-Wolf decoder for tampering localization
5.8 Minimum localization data rates for decoding S(Xq) using tampered side information compared to the authentication data rates
5.9 ROC curves of the tampering localization decoders using spatial models
5.10 ROC curves of the tampering localization decoders facing contrast and brightness adjusted images
6.1 Video quality monitoring scheme using distributed source coding
6.2 Random projection of the J.240 feature extraction module
6.3 Distributions of X and X − Y
6.4 MSE estimation errors of maximum likelihood estimation
6.5 PSNR estimation errors of maximum likelihood estimation
6.6 Average squared PSNR estimation error
6.7 Distortion-rate curves of the transcoded test video sequences
6.8 RMS PSNR estimation error versus the number of bits in the quantization of X
6.9 RMS PSNR estimation error versus video digest data rates
A.1 Test images used in simulations
B.1 Concavity of the log-likelihood function as we vary the number of bits in the quantization of X

Chapter 1
Introduction

Media content can be efficiently delivered through intermediaries, such as peer-to-peer (P2P) file sharing and P2P multicast streaming. Popular P2P file sharing systems include BitTorrent [1], eMule [2], and KaZaA [3]. In these systems, each user not only receives the requested content but also acts as a relay, forwarding the received portions to the other users.
Since the same content can be re-encoded several times, media content in these P2P file sharing systems is available in various digital formats, such as JPEG [73] and JPEG2000 [77] for images, and MPEG-1 [74], MPEG-2 [75], and H.264/AVC [76, 79] for videos. On the other hand, untrusted intermediaries might tamper with the media for a variety of reasons, such as interfering with the distribution of particular files, piggybacking unauthentic content, or generally discrediting a particular distribution system. A recent survey indicates that more than 50% of popular songs in KaZaA are corrupted [96], e.g., replaced with noise or with different songs. Distinguishing legitimate encoded versions from maliciously tampered ones is important in applications that deliver media content through untrusted intermediaries. The problem is more challenging if some legitimate adjustments, such as cropping and resizing an image, are allowed in addition to lossy compression. Such adjustments might not change the meaning of the content, but could be misclassified as tampering. Users might also be interested in localizing tampered regions in content already deemed tampered, so that only the tampered regions need to be retransmitted for recovery instead of the whole content. In addition, the content distributor and the users can benefit from knowing the quality of the received content. Distinguishing legitimate encodings with possible adjustments from tampering, localizing tampering, and estimating the received content quality are the challenges addressed in this thesis, with a focus on image authentication problems. During digital video content delivery, distortions are introduced by lossy compression, bitstream transmission, and reconstruction of the media content from possibly transcoded or damaged bitstreams.
To ensure quality of service for the whole media delivery system, the first step is to monitor the fidelity of the received video. The quality monitoring problem is closely related to image authentication. Image authentication systems test two hypotheses, legitimate and tampered, while quality monitoring systems examine a continuum of hypotheses about quality. This work contributes to the video quality monitoring problem by improving coding efficiency and quality estimation methods.

1.1 Research Contributions

This thesis presents an extension of robust hashing for image authentication and quality monitoring problems using distributed source coding principles. Some results have been published in [100–108]. The major contributions of this work are summarized below:

• The concept of distributed source coding is applied to the image authentication problem. Statistical properties of legitimate encodings and tampering are formulated. The use of blockwise projection of images is justified as capturing the spatial structure of possible tampering. The robust hash, generated by Slepian-Wolf encoding of the quantized original image projection, is transmitted to the user. The user attempts to decode this robust hash using the target image as side information. The Slepian-Wolf result [167] indicates that the lower the distortion between the side information and the original, the fewer authentication bits are required for correct decoding. By correctly choosing the size of the authentication data, this insight allows us to distinguish between legitimate encoding variations of the image and illegitimate modifications.

• An extension of Slepian-Wolf decoding using an expectation maximization (EM) algorithm is proposed to address the authentication problem when additional adjustments appear.
The extended decoder iteratively updates the editing parameters using the soft information of the partially decoded image projection and then decodes the Slepian-Wolf bitstream using the updated side information. The resulting authentication scheme is robust against contrast, brightness, and affine warping adjustments.

• We model the tampering localization problem using a space-varying channel and construct the corresponding decoder factor graph representation [89]. Using a message-passing algorithm over the decoder factor graph, the scheme can localize tampering in an image. Extensions with 1D and 2D spatial models that exploit the contiguity of tampering result in a smaller required Slepian-Wolf bitstream.

• The analysis and coding of video quality monitoring systems are investigated. Maximum-likelihood estimation of the mean squared error between original and received feature pixels reduces the number of bits required for quantization of the feature pixels. Slepian-Wolf coding of the quantized feature pixels yields additional rate savings. We characterize the system using the Cramér-Rao lower bound.

1.2 Organization

This thesis is organized as follows:

• Chapter 2 reviews past approaches for image authentication. We then describe some foundations of lossless distributed source coding, which offer a potential improvement over image authentication systems based on robust hashing. We also review research into the secure biometric problem, which is closely related to the image authentication problem. Both secure biometrics and image authentication have similar information-theoretic fundamentals. The theoretical results suggest exploring the image authentication problem using distributed source coding. Lastly, we review past approaches for video quality monitoring, which is closely related to the image authentication problem and can also benefit from distributed source coding.
• Chapter 3 introduces the image authentication system using distributed source coding. We formulate the image authentication problem as a hypothesis testing setup by assuming that legitimate compression introduces small white noise whereas tampering adds spatially correlated noise. The proposed system reduces the image dimensions by using a blockwise projection to capture the spatial structure. The Slepian-Wolf encoder codes the projections. By correctly choosing the size of the Slepian-Wolf bitstream, the resulting bitstream can be decoded using the legitimate image as side information. Section 3.3 shows that our system can distinguish legitimate encodings from malicious tampering using authentication data of less than 100 bytes.

• Chapter 4 presents a solution for authenticating images that have undergone legitimate editing, such as contrast, brightness, and affine warping adjustments. The authentication decoder learns the editing parameters directly from the target image while decoding the authentication data using an EM algorithm. Section 4.1 introduces a two-state channel with unknown editing parameters to formulate the problem. Section 4.2 describes the proposed authentication decoder for images that have undergone contrast and brightness adjustment. Section 4.3 presents our solution for the authentication of affine warped images. Section 4.4 extends the decoder to address images that have simultaneously undergone contrast, brightness, and affine warping adjustments. Experimental results in Section 4.5 demonstrate that the EM decoder can distinguish legitimate editing from malicious tampering while accurately learning the parameters, with an authentication data size comparable to that of the oracle decoder, which knows the ground-truth parameters.

• Chapter 5 extends the authentication system to localize the tampering. Section 5.1 formulates the localization problem using a space-varying two-state channel model.
Section 5.2 describes the factor graph representation of the corresponding decoder. The decoder can reconstruct the image projection from the localization data using tampered images as side information by performing a message-passing algorithm over the factor graph. Section 5.3 presents spatial models that exploit the spatial correlation of the tampering. Section 5.4 extends the decoder to localize the tampering in tampered images that have also undergone legitimate contrast and brightness adjustment. Simulation results in Section 5.5 demonstrate that the authentication system can localize the tampering with high probability using localization data of a few hundred bytes.

• Chapter 6 studies a reduced-reference video quality monitoring scheme using distributed source coding. Section 6.1 describes the scheme in detail and provides the rationale for using distributed source coding. In Section 6.2, a theoretical framework based on the Cramér-Rao lower bound gives a performance prediction for the maximum likelihood variance estimation. The proposed performance prediction is confirmed by simulations. In Section 6.3, our approach is compared with the ITU-T J.240 Recommendation for remote PSNR monitoring.

Chapter 2
Background

The objective of image authentication is to distinguish legitimate variations in content from maliciously edited ones. Past approaches to image authentication fall into three groups: forensics, watermarking, and robust hashing. In digital forensics, the user verifies authenticity solely by inspecting the received content [55]. For example, the histograms of transform coefficients offer a clue about how many times lossy compression has been applied [118], and the spectrum of the color components tells us whether the image is spliced from two or more images [129]. Unfortunately, these forensic methods do not work well on low-quality images, since compression noise or re-encoding weakens the forensic traces.
Without any information from the original, one cannot confirm the integrity of the received content; therefore, content unrelated to the original may pass forensic checking. The next option for image authentication is watermarking, in which a semi-fragile watermark is embedded into the host signal waveform without perceptual distortion [36, 48, 206]. Users confirm authenticity by extracting the watermark from the received content. The system design should ensure that the watermark survives lossy compression but "breaks" under malicious manipulation. Unfortunately, watermarking authentication is not backward compatible with previously encoded content, i.e., unmarked content cannot be authenticated later. Embedded watermarks might also increase the bit rate required when compressing a media file.

Robust hashing can check the integrity of the received content using a small amount of data derived from the original content. Cryptographic hashing [42, 47, 151] is a special case in which the authentication data are generated using a scrambling hash function that is nearly impossible to invert; no modification of the content is allowed, since any change yields a very different hash value. However, cryptographic hashing is not applicable to the image authentication problem, because processed images are not exactly identical to the original yet carry the same meaning. Researchers have therefore been investigating robust hashing schemes that distinguish allowable distortion from malicious editing. This chapter reviews related work on robust hashing approaches. Section 2.1 reviews robust hashing schemes to offer an overview of previous approaches to the image authentication problem. Section 2.2 describes the key element of this work, distributed source coding, by reviewing the Slepian-Wolf results and some practical implementations of Slepian-Wolf codecs.
Section 2.3 reviews contributions to the secure biometric problem, which is closely related to the image authentication problem. Both secure biometrics and image authentication have similar information-theoretic fundamentals, which will be reviewed in Section 2.4. The theoretical results suggest exploring the image authentication problem using distributed source coding. A closely related problem, quality monitoring, is reviewed in Section 2.5.

2.1 Robust Hashing for Image Authentication

Robust hashing achieves verification of previously encoded media by using an authentication server to supply authentication data to the user. Digital signatures [42, 152] solve the problem when only unaltered content is allowed. The idea is to generate a hash value of the original content using a cryptographic hash function, such as MD5 [151] or SHA-1 [47], which is then signed with the private key of an authority using an asymmetric encryption system. The user checks whether the content has been altered by comparing the hash value of the received content to that in the digital signature. However, this solution is not applicable when some legitimate editing is allowed, since changing any single bit leads to a completely different hash. Inspired by cryptographic hashing, robust hashing aims to provide proof of perceptual integrity: if two media signals are perceptually indistinguishable, they should have identical hash values. A common approach in media hashing is to extract features that have perceptual importance and should survive compression. The authentication data are generated by compressing these features or their hash values. The user checks the authenticity of the received content by comparing its features or their hash values to the authentication data. Typical robust hashing schemes for image authentication consist of three parts: feature extraction, coding of the feature vector, and verification.
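The avalanche behavior of cryptographic hashes mentioned above is easy to see directly. A minimal sketch (the tiny image and the choice of SHA-1 are purely illustrative):

```python
import hashlib

# A tiny 4x4 grayscale "image" as raw bytes.
image = bytes([120, 121, 119, 118,
               122, 120, 121, 119,
               118, 122, 120, 121,
               121, 119, 118, 122])

# A perceptually identical copy: one pixel brightened by a single level.
altered = bytearray(image)
altered[0] += 1

h1 = hashlib.sha1(image).hexdigest()
h2 = hashlib.sha1(bytes(altered)).hexdigest()

# The two digests share essentially no structure, even though the
# images are visually indistinguishable -- which is why cryptographic
# hashing cannot authenticate re-encoded or lightly processed images.
print(h1 != h2)  # True
```

Robust hashing avoids this by hashing perceptually meaningful features rather than the raw bytes.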
In feature extraction, the original image is analyzed to obtain a set of feature vectors that are robust against some types of processing, such as lossy compression. The (possibly quantized) feature vectors are coded into a bitstream as part of the authentication data. The authenticity of the received image is verified at the receiver using the authentication data, which can be delivered through secure channels or embedded in the image.

Figure 2.1: Image authentication scheme based on robust hashing. The authentication data are generated from robust feature vectors of the original image. The received image is classified as authentic or tampered using the authentication data delivered via secure channels.

2.1.1 Compression-Inspired Features

Many image authentication systems achieve robustness against lossy compression by using compression-invariant features. For JPEG compression, Lin et al. proposed using the invariant relationships of DCT coefficients [97–99]. The key idea is that the partial order relations between two transform coefficients (i.e., ≤ and ≥) remain unchanged after quantization and reconstruction. The binary feature vector encodes a set of DCT coefficient relations of the same frequency in two pseudo-randomly selected blocks. For JPEG2000, Lu et al. proposed a similar image authentication scheme using the partial order relations of wavelet child-parent pairs [115, 116]. Oostveen et al. [128] and Abdel-Mottaleb et al. [12] also investigated relation-based features for authentication.
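The invariance these relation-based schemes exploit follows from the monotonicity of uniform quantization: since JPEG quantizes same-frequency coefficients with a common step size, a ≤ b implies q(a) ≤ q(b) after reconstruction. A small sketch with synthetic coefficients (the Gaussian model and step sizes are illustrative, not from this dissertation):

```python
import numpy as np

def jpeg_quantize(c, step):
    """Uniform quantization and reconstruction, as JPEG applies
    per frequency with a common step size across blocks."""
    return np.round(c / step) * step

rng = np.random.default_rng(7)
# Same-frequency DCT coefficients from two pseudo-randomly chosen blocks.
a = rng.normal(0.0, 50.0, size=1000)
b = rng.normal(0.0, 50.0, size=1000)

for step in (4, 16, 40):
    qa, qb = jpeg_quantize(a, step), jpeg_quantize(b, step)
    # Rounding is nondecreasing, so a <= b implies q(a) <= q(b):
    # the relation never reverses, even at coarse quantization.
    assert np.all(qa[a <= b] <= qb[a <= b])
print("partial order preserved at all tested step sizes")
```

A strict inequality can collapse to a tie under coarse quantization, but it can never reverse, which is what makes the 1-bit relation feature robust.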
Other approaches directly use intermediate results of image coding, such as the scan states of embedded block coding with optimal truncation (EBCOT) from JPEG2000 [171], the binary significance map of set partitioning in hierarchical trees (SPIHT) [214], direct hashing of the wavelet coefficients [14, 155], and critical sets of DCT coefficients [85]. These compression-inspired features are designed for the corresponding compression schemes but fail under other coding schemes or common image processing.

2.1.2 Block Projection

Researchers have investigated block-based statistics or more sophisticated projections of the feature vectors to increase robustness. Block-based approaches preserve feature locality for locating possible tampering. Schneider et al. proposed using block-based histograms for image authentication [156]. The block-based histograms are robust against acceptable image compression, but an attacker can easily change the content while keeping the histograms unchanged. Fridrich considered zero-mean low-pass Gaussian pseudo-random projections for image authentication [56, 57]. The purpose of the zero-mean projection is to enhance robustness against contrast and brightness adjustment and other common image processing. However, this raises a security issue: because the null space of the projection is known, the attacker can arbitrarily change the mean of image blocks while keeping the feature vector unaltered. The same problem affects authentication schemes that use fixed projections, such as features based on image block standard deviations or means [84, 113], column and row projections [208], and transform coefficients [215, 219]. This security issue can be addressed by using pseudo-random projections or pseudo-random tiling, as in [193], which keep the null space or the partitions unknown to the attacker. Typical block-based features also suffer from a robustness issue: when the target image is rotated, cropped, resized, or translated, the system fails because the features in the authentication data are no longer aligned to the target image. Mihcak et al. proposed features derived from a low-pass band of randomly partitioned images, and argued that one should iteratively apply an order-statistics filter and a linear filter to increase robustness against slight affine transformations [123]. Khanna et al. suggest normalizing the images before verification [87]. Next, we review some sophisticated projections for authenticating images that have undergone rotation, scaling, and translation.

2.1.3 Robust Projection

Many sophisticated projections have been proposed to be invariant or robust to rotation, scaling, and translation. Most are based on the Radon and Fourier transforms. Lefebvre et al. use principal component analysis on the Radon transform coefficients of an image to obtain features for authentication [94]. The important coefficients of the Radon-transformed image are extracted by principal component analysis to yield a short representation for the authentication data; principal component analysis also makes the features robust to scaling and rotation. Similarly, Seo et al. [162] and Zhang et al. [220] obtain affine-invariant features from the log-mapped Radon transform coefficients. Swaminathan et al. proposed using a polar representation of the 2D Fourier transform of an image to generate robust features [174, 175]. Each element of the proposed feature vector is a pseudo-random weighted summation along the circumference at a selected radius in the Fourier-transformed image. The magnitudes of the Fourier transform coefficients are independent of translation, and the circumferential summation offers rotational invariance.
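A toy version of this circumferential summation can be sketched as follows. It uses plain ring sums instead of the pseudo-random weights of [174, 175], and the check below covers only the (cyclic) translation case; invariance to true rotations on a pixel grid is only approximate:

```python
import numpy as np

def ring_features(img, n_rings=8):
    """Sum 2D FFT magnitudes over rings around the spectrum center.

    Magnitudes discard phase, hence translation; summing over a full
    ring discards rotation about the center (approximately, for real
    rotations of a sampled image)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(1.0, min(h, w) / 2.0, n_rings + 1)
    return np.array([spec[(radius >= lo) & (radius < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

rng = np.random.default_rng(1)
img = rng.random((64, 64))
shifted = np.roll(img, (5, 9), axis=(0, 1))  # cyclic translation

print(np.allclose(ring_features(img), ring_features(shifted)))  # True
```

In an actual scheme the ring weights would be keyed pseudo-random sequences, so that the attacker cannot exploit a publicly known null space.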
Other features robust against rotation include the variance of pixels along the radial direction [40, 41], the mean of pixels within a radial sector [213], contourlet coefficients [22, 195], and diametric strip projection [179]. One interesting approach is to treat each image block as a non-negative matrix and then use dimensionality reduction techniques to generate image hashes [67, 88, 119, 126, 127, 180]. Other methods to enhance the robustness of an image authentication system include more sophisticated features that are important to the human visual system. Many approaches consider using visually important features for image authentication. These features are designed to be robust against moderate-quality image compression and other content-preserving processing but sensitive to malicious editing. Monga et al. proposed using features derived from end-stop wavelet coefficients, to which human visual perception is reportedly sensitive [124, 125]. Edges and contours of objects in an image have been investigated as promising candidates for robust features in image authentication [21, 43, 138, 139]. Features inspired by the computer vision community for content-based image retrieval have been considered by Bhattacharjee et al. [21], Roy et al. [153], Lu et al. [114], and Schlauweg et al. [154].

2.1.4 Coding of Features

Most researchers in this field focus on feature extraction. Generation of the final authentication data requires quantization and compression of the feature vectors; however, few articles discuss coding methods. Most approaches use coarse quantization to obtain short authentication data. For example, Fridrich et al. use 1-bit quantization of random projection coefficients [56, 57, 153], and the relation-based approaches [12, 97–99, 115, 116, 128] can be considered 1-bit quantizations of coefficient differences.
The entropy-coded or cryptographically hashed binary representation of the feature vector serves as the authentication data. Since the original and the legitimate target images are highly correlated, one can generate less authentication data by exploiting the correlation. To the best of our knowledge, Venkatesan et al. were the first to consider error-correcting coding in an image authentication system to reduce the authentication data size [193]. The idea is to project the binary feature vectors of the images into syndrome bits of an error-correcting code and directly compare the syndrome bits to decide authenticity. Sun et al. independently considered using error-correcting coding in an image authentication system [170]. Their approach uses systematic Hamming codes to obtain the parity bits of the binary feature vectors as the authentication data. The idea is to concatenate the parity bits of the systematic channel code with the binary feature vector of the received image and thereby correct the errors introduced by image processing, such as compression. Both approaches use bounded-distance error-correcting codes. Further improvements can be made with the knowledge of distributed source coding. Another use for distributed source coding in authentication is to secure the feature vector. Johnson et al. proposed an image authentication scheme that protects the feature vector using subtractive dithering and compresses the dithered feature vector using distributed source coding [83]. The decoder uses the dither sequence as side information to decode the authentication data. The next section reviews the foundations of distributed source coding and the connection to error-correcting codes.

2.2 Foundations of Distributed Source Coding

Distributed source coding addresses the separate compression of statistically dependent random sequences. Each encoder separately observes a random sequence and sends a bitstream to a single decoder.
The decoder reconstructs the random sequences from the incoming bitstreams. Efficient compression can be achieved despite independent encoding. This concept has been applied to many applications, such as data compression for sensor networks [130, 209], low-complexity video encoding [5–11, 134–137], compression for flexible video playback [28–31], systematic lossy error protection for video [8, 140–147, 223], distributed compression for large camera arrays [222], distributed stereo image coding [24–26, 190–192], and compression of encrypted content [157, 161]. Overviews of recent developments and applications of distributed source coding can be found in [44, 64]. This section reviews the Slepian-Wolf theorem and practical implementations of Slepian-Wolf coding.

2.2.1 Lossless Distributed Source Coding

A sequence of N independent and identically distributed (i.i.d.) samples of a finite-alphabet random variable X can be compressed with negligible information loss at a coding rate (in bits per source symbol) arbitrarily close to H(X) as N tends to infinity, where H(X) is the entropy of the random process [163]. Similarly, a random sequence of i.i.d. samples of Y can be compressed into a binary sequence at a coding rate close to H(Y). Clearly, a total coding rate of H(X) + H(Y) allows the sequences of X and Y to be reconstructed at the decoder. If X and Y are statistically dependent, only H(X, Y) bits per symbol are needed, but can this be achieved by encoding X and Y separately? Slepian and Wolf gave a positive answer to this question [167], establishing the achievable rate region:

RX + RY ≥ H(X, Y)
RX ≥ H(X|Y)
RY ≥ H(Y|X)

where RX and RY are the coding rates in bits per source symbol for X and Y, respectively. Consider the special case depicted in Figure 2.2. We are interested in compression of a sequence of X statistically dependent on the side information Y. The side information Y is only available at the decoder.
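To make the rate region concrete, the following sketch (a hypothetical binary example of ours, not taken from the dissertation) computes H(X), H(X,Y), and the Slepian-Wolf bound H(X|Y) for a fair bit X observed through a binary symmetric correlation channel with crossover probability 0.1:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

eps = 0.1  # crossover probability of the correlation channel
joint = {(0, 0): .5 * (1 - eps), (0, 1): .5 * eps,
         (1, 0): .5 * eps,       (1, 1): .5 * (1 - eps)}

H_XY = entropy(list(joint.values()))
H_X = entropy([joint[0, 0] + joint[0, 1], joint[1, 0] + joint[1, 1]])
H_Y = entropy([joint[0, 0] + joint[1, 0], joint[0, 1] + joint[1, 1]])
H_X_given_Y = H_XY - H_Y  # chain rule: the Slepian-Wolf bound for X

print(f"H(X)   = {H_X:.3f} bits/symbol")        # rate for coding X alone
print(f"H(X|Y) = {H_X_given_Y:.3f} bits/symbol")  # rate with side info Y
```

Even though the encoder of X never sees Y, it may compress down to H(X|Y) ≈ 0.47 bits/symbol instead of H(X) = 1 bit/symbol in this example.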
The Slepian-Wolf result implies that the rate for lossless compression of X is at least H(X|Y). Conversely, when the rate is less than H(X|Y), the probability of decoding error is bounded away from zero. Recent work contributes several practical solutions approaching the theoretical limit. We discuss practical implementations of this coding scheme in Section 2.2.2.

2.2.2 Practical Slepian-Wolf Coding

Although the theoretical results of distributed source coding have been available for more than three decades, most practical coding schemes emerged only recently.

Figure 2.2: A special case of the Slepian-Wolf theorem. The discrete memoryless random variables X and Y are statistically dependent, but Y is only available at the decoder.

From the proof of the theorem in [34], the encoder divides the source space X^N into 2^(NR) bins. For each input sequence, the encoder sends the index of its bin. With a sufficient rate R for the binning, the decoder can reliably find the unique and correct X that is jointly typical with the side information Y. Although the proof does not provide any constructive means of binning, recent advances in channel codes offer promising solutions. In the outline of the alternative proof of the Slepian-Wolf theorem in [207], the author demonstrates that channel coding is closely related to Slepian-Wolf coding. Consider binary sequences X and Y that are correlated, differing only slightly. The difference can be modeled as an error sequence introduced by a correlation channel that captures the statistical dependence between X and Y. The key idea is to send the syndrome of the source X to the decoder. The decoder corrects the error by decoding the concatenation of the syndrome and the side information Y. Csiszár showed that linear codes can achieve the Slepian-Wolf rate with a bounded error exponent [38].
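The syndrome idea can be illustrated with the classic (7,4) Hamming code. This toy sketch (our own illustration, not a scheme from the literature reviewed here) transmits 3 syndrome bits instead of 7 source bits, and the decoder recovers the source exactly whenever the side information differs from it in at most one position:

```python
# Parity-check matrix of the (7,4) Hamming code: column j holds the
# 3-bit binary expansion of j+1, so a single-bit error at position j
# produces syndrome value j+1.
H_cols = [[(j + 1) >> k & 1 for k in (2, 1, 0)] for j in range(7)]

def syndrome(v):
    """3-bit syndrome H*v over GF(2)."""
    s = [0, 0, 0]
    for j, bit in enumerate(v):
        if bit:
            s = [a ^ b for a, b in zip(s, H_cols[j])]
    return s

# Encoder: observes x, transmits only its 3 syndrome bits (not all 7).
x = [1, 0, 1, 1, 0, 0, 1]
s_x = syndrome(x)

# Decoder: side information y differs from x in at most one position.
y = x[:]
y[4] ^= 1  # the correlation channel flips one bit

# The syndrome of the difference x XOR y locates the flipped bit.
s_diff = [a ^ b for a, b in zip(syndrome(y), s_x)]
if any(s_diff):
    pos = (s_diff[0] << 2 | s_diff[1] << 1 | s_diff[2]) - 1
    y[pos] ^= 1

print(y == x)  # True: x recovered from 3 bits plus side information
```

Practical Slepian-Wolf codecs replace this toy Hamming code with capacity-approaching codes, as discussed next, but the syndrome-as-bin-index principle is the same.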
This implies that using syndrome bits of channel codes as binning indices is a suitable binning strategy. Practical Slepian-Wolf coding schemes are based on channel codes such as block and trellis codes [131]. More sophisticated distributed source coding techniques are based on channel codes that approach the Shannon limit, such as turbo codes [20] and low-density parity-check (LDPC) codes [58]. Turbo codes for compression of a binary source with side information were independently proposed by García-Frías and Zhao [59, 61, 221], Bajcsy et al. [16], and Aaron et al. [4]. LDPC codes were suggested by Liveris et al. [111, 112] and Schonberg et al. [158–160] as well as other authors [33, 63, 92, 168, 181]. Iterative LDPC decoding methods using the message-passing algorithm are attractive because they integrate naturally with factor graphs [89] of source or channel models. García-Frías et al. demonstrated the compression of binary sequences correlated by a hidden Markov model using LDPC codes [62], as well as a decoding algorithm for LDPC codes over a finite-state binary Markov channel [60]. Varodayan et al. [187] and Schonberg et al. [157] independently extended the LDPC decoder to exploit 2D spatial source correlation for compression. Aaron et al. first introduced the concept of rate adaptivity to the distributed source coding community. They proposed using rate-compatible punctured turbo (RCPT) codes for practical Slepian-Wolf coding to enable rate-efficient applications of distributed source coding in which terminals transmit bitstreams incrementally without significant rate penalty [11]. Varodayan et al. then invented rate-adaptive LDPC codes [186, 188] as the LDPC counterpart. In addition to rate adaptivity, side information adaptation was introduced to offer a novel means of motion and disparity estimation for distributed stereo image and video coding [189–192].
Our work on authentication of adjusted images, described in Chapter 4, has been inspired by adaptive distributed source coding [185]. Beyond video and image coding applications, researchers in the security community have found that distributed source coding meets the needs of their applications. The next section presents some work in biometric security that applies distributed source coding techniques.

2.3 Secure Biometrics

Access control to data or physical locations plays an important role in preventing malicious actions. Traditional approaches use possession of secret knowledge, e.g., a password, or a physical token, e.g., identifying documents. The former can be guessed by unauthorized persons or forgotten by legitimate users; the latter might be forged, lost, or stolen. Biometric systems provide an alternative solution to access control. Biometric data, such as fingerprints [15] and iris scans [39], cannot be forgotten or lost. These kinds of data contain a large amount of information and are therefore hard to guess or copy. However, each measurement of biometric data suffers from noise due to intrinsic variation and differing measurement conditions and devices. For example, fingerprint scans can differ due to elastic skin deformation or dust on the finger. Biometric systems should be robust to these variations. Most approaches rely on pattern recognition techniques. The enrollment biometric data of an individual are extracted and stored at registration. Later, a biometric reading is compared to the stored biometric data for authentication. If the two are close enough, access is granted. However, biometric data, unlike passwords or identifying documents, cannot be regenerated once they are compromised. Unencrypted storage of the original biometric data would pose a security risk.
Just as passwords in computer systems are not stored in the clear, to prevent them from being compromised, biometric data should also be stored in a protected way. The solution for passwords, i.e., cryptographic hashing, cannot be directly applied to biometric data, because the aforementioned measurement noise appears in a biometric reading, whereas a password is always the same. The challenges of secure biometrics thus include privacy of the original biometric data and robustness to measurement noise.

2.3.1 Secure Biometrics Using Slepian-Wolf Codes

Researchers at Mitsubishi Electric Research Laboratories (MERL) achieve secure storage of biometric data using Slepian-Wolf codes [45, 46, 120, 172, 173, 194] together with the information-theoretic framework of common randomness [38]. The key idea is to use Slepian-Wolf coding as a tool for extracting a common secret from two observations of correlated sources. In the enrollment stage, the system extracts biometric features from the raw biometric data of an individual and then encodes the features into a Slepian-Wolf bitstream and a cryptographic hash value. In the authentication stage, the system decodes the Slepian-Wolf bitstream using the probe biometric features as side information. The hash value of the decoded biometric features is compared to the original hash value stored in the system. If the hash values are identical, access is granted; otherwise, access is denied. The authors assume that the biometric data from one user are statistically independent of the biometric data from other users. On the other hand, biometric measurements from the same user are correlated. A suitable conditional distribution models the correlation between the biometric readings (the enrollment and the probe) from the same user. Slepian-Wolf coding provides robustness against the measurement noise by exploiting this correlation.
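A minimal sketch of this enrollment/authentication flow, using a toy (7,4) Hamming syndrome in place of a practical LDPC code (the feature vectors here are hypothetical, and a real system would use much longer features):

```python
import hashlib

def syndrome(v):
    """Syndrome of a 7-bit vector under the (7,4) Hamming code,
    encoded as an integer; column j of the parity-check matrix
    is the binary expansion of j+1."""
    s = 0
    for j, bit in enumerate(v):
        if bit:
            s ^= j + 1
    return s

def digest(bits):
    return hashlib.sha256(bytes(bits)).hexdigest()

# Enrollment: store only the syndrome and a cryptographic hash of the
# enrollment feature vector -- never the biometric itself.
enrolled = [1, 0, 1, 1, 0, 0, 1]          # hypothetical feature bits
stored = (syndrome(enrolled), digest(enrolled))

def authenticate(probe):
    """Decode the stored syndrome using the probe as side information,
    then compare cryptographic hashes."""
    s_diff = stored[0] ^ syndrome(probe)
    candidate = probe[:]
    if s_diff:                 # single-bit discrepancy: flip it back
        candidate[s_diff - 1] ^= 1
    return digest(candidate) == stored[1]

print(authenticate([1, 0, 1, 0, 0, 0, 1]))  # noisy reading, same user
print(authenticate([0, 1, 0, 0, 1, 1, 0]))  # a different user's features
```

The noisy reading of the same user decodes to the enrolled vector and matches the hash; a different user's features fail syndrome decoding and are rejected.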
The system is secure insofar as the Slepian-Wolf bitstream alone does not leak too much information and the cryptographic hash value presumably leaks no information. In [120], the authors demonstrated a secure biometric system for iris biometric data using the above principle and characterized the tradeoffs between security and robustness of secure biometrics. In [45, 46], the authors applied the same principle to secure fingerprint biometric data and developed a fingerprint channel model for a message-passing decoder to capture the measurement noise. Later, the authors proposed an alternative approach using a feature transformation that maps the fingerprint biometric data to a binary vector with appropriate statistics for LDPC syndrome coding [172], avoiding the complicated message-passing algorithms.

2.3.2 Privacy Leakage and Secret-Key Rate

A research group at Eindhoven University of Technology independently investigated biometric problems from an information-theoretic perspective [69–72]. The authors determined that the secure biometric problem is related to common randomness [38]. Their contribution is to investigate the tradeoff between the secret-key rate and the privacy leakage in the biometric system setting. In their setting, the enrollment and authentication biometric data form a common (random) secret key using a public message derived from the enrollment biometric data. The secret key can be used in a cryptographic system for secure communication and access control. A higher secret-key rate requires the public message to leak more private information about the enrollment biometric data. The authors established the optimal secret-key and privacy-leakage rate regions.
Although they aimed at a slightly different goal from MERL, which investigated the tradeoff between robustness and privacy leakage, both contributions give the same insight: the Slepian-Wolf theorem plays an important role in an optimal detection scheme.

2.3.3 Comparison to Image Authentication

The image authentication problem is closely related to secure biometrics. In the image authentication problem, the original image and the authentic target image are similar; in the secure biometric problem, the enrollment biometric data and the probe biometric data from the same person are similar. Neither target images nor probe biometric data are available at the time of generating the authentication data or the secure biometric data. These cases fit the setting of distributed source coding, which can lead to high coding efficiency for authentication data and secure biometric data. In the image authentication problem, a lower encoding rate yields smaller authentication data; in the secure biometric problem, a lower encoding rate yields more privacy of the original biometric data. Despite these similarities, the two problems do have some differences. In the secure biometric problem, the biometric data from two different persons are assumed to be independent. In the image authentication problem, tampered target images are usually correlated with the original one, but with different statistics than the authentic target images. This means that the secure biometric problem is a hypothesis test against independence under rate constraints [13], while the image authentication problem is a more general rate-constrained hypothesis testing problem [65, 66]. The next section reviews information-theoretic results on hypothesis testing under rate constraints. The results suggest that the image authentication problem can benefit more than the biometric problem from using distributed source coding.
2.4 Hypothesis Testing with Side Information Under Rate Constraints

Let each pair (X, Y) be independently and identically distributed over the discrete space 𝒳 × 𝒴 subject to a joint distribution determined by the following two hypotheses:

(X, Y) ∼ PXY(X, Y) if H0 is true, QXY(X, Y) if H1 is true. (2.1)

Hypothesis testing determines which hypothesis is true based on the observations (x^N, y^N). The Type I error α is the error of rejecting H0 when it is actually true. The Type II error β is the error of accepting H0 when H1 is actually true. These two errors express a tradeoff made by the choice of the acceptance region A_N ⊆ 𝒳^N × 𝒴^N for H0. The optimal error probabilities are related to the divergence of the two distributions P and Q. Specifically, the minimal Type II error exponent for a given probability of Type I error has been determined by Stein [27, 35]:

Lemma 1 (Stein's Lemma [35, Theorem 12.8.1]). Let X1, X2, ..., XN be i.i.d. drawn from P if H0 is true, or from Q if H1 is true, with P > 0 and Q > 0 for all values in 𝒳, and let the Kullback-Leibler divergence from P to Q, defined as D(P||Q) = Σ_{x∈𝒳} P(x) log (P(x)/Q(x)), be finite. Let A_N ⊆ 𝒳^N be an acceptance region for hypothesis H0, and let the probabilities of error be α_N = P^N(A_N^c) and β_N = Q^N(A_N). For 0 < ε < 1/2, define

β_N^ε = min_{A_N : α_N < ε} β_N.

Then

lim_{ε→0} lim_{N→∞} (1/N) log β_N^ε = −D(P||Q).

If both X^N and Y^N are available at the same site, one can perform the optimal hypothesis test and then send the one-bit decision. This achieves the optimal decision performance at zero rate. A more interesting problem is the asymmetric case in which Y^N is only available at the decoder. Consider the setup depicted in Figure 2.3. The pairs (X, Y)^N are i.i.d. subject to the joint probability P(X, Y) if H0 is true and Q(X, Y) otherwise. The encoder can only observe X^N and sends an encoding of X^N at rate R to the decoder.
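As a small numerical illustration of Stein's lemma (a toy example of ours, not from the dissertation), the following computes D(P||Q) for two binary sources; the lemma says the optimal Type II error then decays roughly as 2^(−N·D(P||Q)):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(P||Q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]  # distribution of X under H0 (hypothetical)
Q = [0.9, 0.1]  # distribution of X under H1 (hypothetical)

D = kl(P, Q)
print(f"D(P||Q) = {D:.4f} bits")
# Stein's lemma: for any fixed Type I error level, the optimal Type II
# error behaves like beta_N ~ 2^(-N * D(P||Q)) as N grows, so every
# additional sample multiplies the optimal beta by about 2^(-0.737).
```

The larger the divergence between the two hypotheses, the faster the optimal test drives the Type II error to zero.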
The decoder decides which hypothesis is true based on the incoming encoding of X^N and the side information Y^N. Note that it is not necessary to reconstruct X^N at the decoder to obtain the decision. The decoder tries to minimize the decision error under the encoding rate constraint. Define the optimal error exponent:

σ(ε, R) ≡ lim inf_{N→∞} −(1/N) log β_N^ε(R)

where ε is the constraint on the Type I error, i.e., lim sup_{N→∞} α_N ≤ ε, and β_N^ε(R) is the minimal Type II error. The goal is to find the value of σ(ε, R). However, to the best of our knowledge, for most results in this problem setting only lower (achievability) bounds are known. The only exception is hypothesis testing against independence.

Figure 2.3: (X, Y)^N are i.i.d. subject to the joint probability P(X, Y) if H0 is true, Q(X, Y) otherwise. The side information Y^N is only available at the decoder. The encoder sends an encoded bitstream based on the observations X^N to the decoder. The decoder decides which hypothesis is true based on the incoming bitstream and the side information Y^N.

Ahlswede and Csiszár established the tight bound on the error exponent under a rate constraint for the case of hypothesis testing against independence, i.e.,

H0: PXY(X, Y) (2.2)
H1: PX(X)PY(Y). (2.3)

Theorem 2 (Ahlswede and Csiszár [13]). Consider the hypothesis test with (2.2) and (2.3). For all 0 ≤ ε < 1 and all R ≥ 0,

σ(ε, R) = max_{U ∈ L(R), ||U|| ≤ ||X|| + 1} I(U; Y)

where

L(R) = {U | I(U; X) ≤ R, U → X → Y}, (2.4)

and U is an auxiliary random variable with cardinality ||U|| such that U → X → Y forms a Markov chain and the mutual information satisfies the rate constraint I(U; X) ≤ R. Note that this bound is tight, i.e., the achievable lower bound meets the converse upper bound.
In the case that R ≥ H(X), one can achieve the optimal decision error exponent I(X; Y) by letting U = X. The mutual information I(X; Y) is the divergence between the joint distribution P(X, Y) and the product of the two marginals P(X)P(Y). This result is closely related to secure biometrics. According to the information-theoretic results on secure biometrics in [69], the rate constraint here corresponds to the privacy-leakage constraint in secure biometrics, and the optimal decision error exponent corresponds to the chosen secret-key rate. We now review general rate-constrained hypothesis testing by stating the following theorem.

Theorem 3 (Han [65], 1987). Consider the setting depicted in Figure 2.3 and the hypothesis test in (2.1). For all 0 ≤ ε < 1 and all R ≥ 0,

σ(ε, R) ≥ max_{U ∈ L(R), ||U|| ≤ ||X|| + 1} min_{P̃UXY ∈ S(U)} D(P̃UXY || QUXY),

where S(U) = {P̃UXY | P̃UX = PUX, P̃UY = PUY}, QU|X = PU|X, and L(R) is defined in (2.4).

This result gives only an achievable lower bound on the error exponent for general hypothesis testing under a rate constraint. However, if Q(X, Y) = P(X)P(Y), this result reduces to Theorem 2. Clearly, the special case R ≥ H(X) gives the optimal decision error exponent σ(ε, R) = D(PXY || QXY). Both Theorem 2 and Theorem 3 consider only zero-error coding schemes, i.e., the decoder reconstructs the joint type of U, X, and Y with exactly zero error probability. Later, Shimokawa et al. established an achievable lower bound for a wider class of coding schemes that includes coding schemes with decaying nonzero error probabilities. Note that most Slepian-Wolf coding schemes are in this class.

Theorem 4 (Shimokawa et al. [166], 1994). Consider the setting depicted in Figure 2.3 and the hypothesis test in (2.1). For all 0 ≤ ε < 1 and all R ≥ 0,
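For hypothesis testing against independence, the exponent at sufficient rate is therefore I(X;Y) = D(PXY || PX PY). A quick check of this identity on a hypothetical joint distribution:

```python
import math

# Hypothetical joint pmf of a correlated binary pair (X, Y) under H0.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
PX = {x: sum(v for (a, b), v in P.items() if a == x) for x in (0, 1)}
PY = {y: sum(v for (a, b), v in P.items() if b == y) for y in (0, 1)}

# I(X;Y) computed directly as the divergence D(P_XY || P_X P_Y),
# i.e., the error exponent against H1: independence.
I = sum(v * math.log2(v / (PX[x] * PY[y])) for (x, y), v in P.items())
print(f"I(X;Y) = {I:.4f} bits")
```

The more strongly X and Y are correlated under H0, the larger this exponent, and the easier it is to reject a fabricated (independent) pair.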
σ(ε, R) ≥ max_{U ∈ L*(R), ||U|| ≤ ||X|| + 1} min(d1(U), d2(U)),

where

L*(R) = {U | I(U; X|Y) ≤ R, U → X → Y} (2.5)

d1(U) = min_{P̃UXY ∈ S(U)} D(P̃UXY || QUXY) (2.6)

d2(U) = +∞ if R ≥ I(U; X); otherwise [R − I(U; X|Y)]^+ + min_{P̃UXY ∈ T(U)} D(P̃UXY || QUXY) (2.7)

T(U) = {P̃UXY | P̃UX = PUX, P̃Y = PY, H_P̃(U|Y) ≥ H_P(U|Y)} (2.8)

S(U) = {P̃UXY | P̃UX = PUX, P̃UY = PUY}. (2.9)

This result covers the wider class of coding schemes by additionally accounting for the error exponent of the decoding error in (2.7) and the rate-constraint relaxation obtained by decoding with side information Y in (2.5), and it yields a tighter bound than Theorem 3. Consider a special case in which

R ≥ H(X|Y) + D(PXY || QXY); (2.10)

then we can set U ≡ X and obtain the optimal decision error exponent σ(ε, R) = D(PXY || QXY). Note that the secure biometric problem assumes that the biometric data from different persons are independent, i.e., QXY = PX PY. This makes the rate requirement in (2.10) become R ≥ H(X|Y) + I(X; Y) = H(X). On the other hand, most cases in the image authentication problem involve only minor illegitimate tampering, which suggests the tampering probability model QXY is not far from the authentic model PXY. We can assume the rate constraint in (2.10) satisfies H(X|Y) + D(PXY || QXY) ≤ H(X), provided D(PXY || QXY) ≤ I(X; Y), which is a reasonable assumption for the image authentication problem. Therefore, a Slepian-Wolf type coding scheme makes more sense for achieving rate-efficient coding in the image authentication problem than in the secure biometric problem.

2.5 Video Quality Monitoring Background

A problem closely related to image authentication is video quality monitoring. This section reviews related work on quality monitoring systems. Past approaches fall into three classes: full-reference, no-reference, and reduced-reference.
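The rate comparison in (2.10) can be made concrete with a toy binary model (our own hypothetical numbers): an authentic channel that flips X with probability 0.05 and a tampering model that flips it with probability 0.3. The sufficient rate H(X|Y) + D(PXY||QXY) then falls well below the H(X) = 1 bit/symbol that the independence assumption of secure biometrics would demand:

```python
import math

def H(p):
    """Entropy in bits of a probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)

def joint(eps):
    """Joint pmf of (X, Y): X a fair bit, Y = X flipped w.p. eps."""
    return {(0, 0): .5 * (1 - eps), (0, 1): .5 * eps,
            (1, 0): .5 * eps,       (1, 1): .5 * (1 - eps)}

P, Q = joint(0.05), joint(0.3)  # authentic vs. tampered correlation
H_X_given_Y = H(list(P.values())) - 1.0  # H(X,Y) - H(Y); Y is uniform
D = sum(P[k] * math.log2(P[k] / Q[k]) for k in P)

rate_needed = H_X_given_Y + D  # sufficient rate for the optimal exponent
print(f"H(X|Y) + D(P||Q) = {rate_needed:.3f} bits/symbol (< H(X) = 1)")
```

In this example the authentication data can be encoded at roughly 0.58 bits per feature symbol while still achieving the optimal error exponent, which is the rate advantage that motivates the Slepian-Wolf approach of the following chapters.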
2.5.1 Full-Reference Quality Assessment

Full-reference quality assessment measures the distortion of the target video with reference to the original video. The most common objective metrics are the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR). Most researchers seek objective metrics that are highly correlated with subjective quality assessment, which involves the human visual system. Perceptual quality measurement for speech, image, and video coding has been studied for three decades [17, 18, 82, 91, 201, 203]. The idea is to decompose the signal into several frequency bands and evaluate the distortion with a different weight for each subband according to human perception. Statistical models are used to predict subjective quality from objective metrics. Wang et al. correlated distortion with perceptual quality by separating the structural distortion from the non-structural distortion [197]. Sheikh et al. suggested evaluating fidelity using the mutual information between the reference and the distorted videos based on natural scene statistics [165]. The subjective distortion-rate performance of video encoders can benefit from full-reference quality assessment, since video encoders have access to the original content. For most video broadcasting applications, however, the received video and the original video are not available at the same site, so full-reference quality assessment is not applicable.

2.5.2 No-Reference Quality Assessment

No-reference methods estimate the quality at the client device using the received video alone. Common approaches use statistical image models and temporal information of videos to estimate objective or subjective quality. Turaga et al. estimated the quantization error power of intra-coded images using the statistics of the discrete cosine transform coefficients [182]. For video, Reibman et al.
additionally considered the temporal dependency to estimate the mean squared error of received video degradation due to packet loss [149]. Yamada et al. proposed estimating the MSE due to packet loss using the effectiveness of error concealment, which is derived from the discontinuity between concealed and correctly decoded blocks [210]. Approaches to subjective quality estimation include using natural scene statistics [164], estimating blocking and ringing artifacts [109, 110, 196, 198], and estimation using luminance gradients [54]. Researchers additionally use temporal information to assess the visual quality of video content [117, 150, 212]. Since many video coding schemes introduce deblocking filters and post-processing to enhance subjective quality, the distortions are difficult to quantify without the reference. Another type of no-reference quality assessment uses watermarking. Researchers have suggested embedding a watermark into the original content to assess the received quality by relating the received reconstruction PSNR to the watermark detection rate [78, 169], comparing against the original watermarking signal [23, 53], or delivering the original quality features for comparison using watermarking [200]. The watermarking approaches offer more accurate quality estimation, but the tradeoff is quality degradation due to watermark embedding, or the higher bit rate required to compress the media content without destroying the watermark.

2.5.3 Reduced-Reference Quality Assessment

Reduced-reference quality assessment approaches achieve more accurate quality estimation by providing features of the original video at a low bit rate. The first reduced-reference video quality assessment system was proposed by Webster et al. [202]. Features derived from spatial and temporal pixel gradients of the original video content are sent to the receiver. The received video quality is estimated by combining the distortions of the features.
This scheme has been refined using features describing spatial activity [204] and color coherence [205]. Most researchers in this field focus on feature extraction according to target quality metrics. In [49, 52, 90], features including blocking, blur, edge-based image activity, gradient-based image activity, and intensity masking are transmitted and combined into a hybrid image quality metric for receiver quality assessment. Extensions include extracting the features from multi-resolution images [51, 68, 121], assessing quality by treating the region of interest and the background differently [50], and using a neural network to learn an estimator of subjective quality from the features [93]. A statistical approach was proposed by Wang et al. The authors propose using the generalized Gaussian distribution to model the wavelet coefficients of the original image and sending the estimated parameters as the reduced reference. The quality measurement is derived from the Kullback-Leibler divergence between the original model specified by the parameters and the received image statistics [199]. Less work has been done on the coding of features. Yamada et al. propose estimating the reconstruction PSNR by sending a representative luminance value and an entropy-coded bit map indicating the positions in the original image that have the representative luminance value [211]. The ITU-T J.240 Recommendation suggests a subsampling method using a projection of the image signal after whitening in both the spatial and Walsh-Hadamard transform domains [80, 86]. Conventional source coding of the projections is not efficient due to the large variance of the whitened coefficients. In contrast, we report an efficient coding scheme using distributed source coding to exploit the correlation between the projections of the original and received images [32, 108]. Tagliasacchi et al.
propose similar schemes using Wyner-Ziv coding for MSE estimation using random projections [183] and for perceptual quality estimation using structural features [176]. In their approaches, the decoder first reconstructs the features using minimum mean squared error (MMSE) estimation given the side information and the reconstructed quantization indices, and then estimates the quality according to the selected metric. Unfortunately, the MMSE reconstruction leads to suboptimal quality estimation. Later, in Chapter 6, we will show that quality estimation that directly uses the side information indices yields lower estimation error.

2.6 Summary

This chapter reviews past approaches to robust hashing for image authentication, covering compression-invariant features, block projections, and robust projections. Although past contributions were rarely dedicated to the coding of feature vectors, the evidence from the literature indicates that error-correcting coding can improve security and reduce authentication data size. We then review lossless distributed source coding, which is the fundamental technique behind the error-correcting coding that offers such improvements. We describe the contributions to secure biometrics, which has a setting similar to image authentication. These contributions indicate that distributed source coding plays an important role in an optimal secure biometric scheme. We then review some information-theoretic results on hypothesis testing with multiterminal data compression. These results lead us to explore the potential of distributed source coding in the image authentication problem, which will be presented in the following chapters, and in the quality monitoring system, which will be described in Chapter 6.

Chapter 3 Image Authentication Using Distributed Source Coding

This chapter investigates a practical image authentication scheme using distributed source coding.
The key idea is to provide a user with a Slepian-Wolf encoded projection of the original image as authentication data, and for the user to attempt to decode this bitstream using the target image as side information. The Slepian-Wolf result [167] indicates that the lower the distortion between the side information and the original, the fewer authentication bits are required for correct decoding. This insight allows us, by choosing the size of the authentication data appropriately, to distinguish between legitimate encoding variations of the image and illegitimate modifications. Section 3.1 formulates the image authentication problem and justifies the use of block projection and distributed source coding for generating rate-efficient authentication data. Section 3.2 describes the image authentication scheme and its rationale in detail. Simulation results presented in Section 3.3 demonstrate the tradeoffs between authentication data size and tampering detection performance.

3.1 Image Authentication Problem

Our approach to the image authentication problem is through hypothesis testing. The authentication data provide some information about the original image to the user. The user makes the authentication decision based on the target image and the authentication data. We first describe a two-state channel that models the target image. Section 3.1.2 details the statistical assumptions and describes the projection basis.

3.1.1 Two-State Channel

We model the target image y by way of a two-state lossy channel, shown in Figure 3.1. In the legitimate state, the channel performs lossy compression and reconstruction, such as JPEG and JPEG2000, with peak signal-to-noise ratio (PSNR) of 30 dB or better. In the tampered state, it additionally includes a malicious attack.
Figure 3.1: The target image y is modeled as an output of a two-state lossy channel. In the legitimate state, the channel consists of lossy compression and reconstruction, such as JPEG and JPEG2000; in the tampered state, the channel further applies a malicious attack.

Figure 3.2 demonstrates a sample input and two outputs of this channel. The source image x is a Kodak test image at 512×512 resolution. In the legitimate state, the channel is JPEG2000 compression and reconstruction at (the worst permissible) 30 dB PSNR. In the tampered state, a further malicious attack is applied: a 19×163 pixel text banner is overlaid on the reconstructed image and some objects are removed. The joint statistics of x and y vary depending on the state of the channel. We illustrate this by plotting in Figure 3.3 the luminance difference between the target and the original images.

Figure 3.2: An image from the Kodak test image set. (a) x original, (b) y at the output of the legitimate channel, and (c) y at the output of the tampered channel.

In the legitimate state, the difference resembles white noise due to the compression; in the tampered state, the channel additionally introduces tampering which results in image-like differences in some regions. Based on this observation, we describe the image authentication problem in a hypothesis testing formulation:

    x − y = z = z0, if the channel is in the legitimate state,
                z1, if the channel is in the tampered state,    (3.1)

where we model z0 as white noise, and z1 similar to z0 except that it has some regions which contain image-like noise. Next, we will describe the statistical assumptions on z0 and z1.
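The two-state model can be sanity-checked with a small simulation. The sketch below is illustrative only: it is one-dimensional, uses an AR(1) process as a stand-in for the image-like tampering noise, and uses hypothetical noise variances. It draws residuals z under both hypotheses and compares their lag-1 sample autocorrelation: the legitimate residual is nearly uncorrelated, the tampered one strongly correlated.

```python
import random

def channel_residual(n, tampered, rng):
    """Draw a 1-D residual z per the two-state model of Equation (3.1):
    white noise when legitimate; white noise plus a correlated, image-like
    component (an AR(1) process as a stand-in) in a tampered region."""
    z = [rng.gauss(0.0, 7.0) for _ in range(n)]  # compression noise, illustrative sigma
    if tampered:
        s = 0.0
        for i in range(n // 2, n):  # tampered region covers the second half
            s = 0.97 * s + rng.gauss(0.0, 8.0)  # strongly correlated component
            z[i] += s
    return z

def lag1_corr(z):
    """Sample autocorrelation at lag 1, normalized by the lag-0 value."""
    n = len(z)
    mu = sum(z) / n
    r0 = sum((v - mu) ** 2 for v in z)
    r1 = sum((z[i] - mu) * (z[i - 1] - mu) for i in range(1, n))
    return r1 / r0

rng = random.Random(0)
rho_leg = lag1_corr(channel_residual(20000, False, rng))
rho_tam = lag1_corr(channel_residual(20000, True, rng))
```

The gap between the two correlations is what the blockwise projection of Section 3.1.2 exploits.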
3.1.2 Residual Statistics

To illustrate the residual statistics, we use Kodak test images and compress them at different qualities to generate legitimate images. We additionally overlay text banners and remove some objects in the images to generate tampered images like Figure 3.2. We assume that the residual process z in tampered or legitimate regions is wide-sense stationary. Figure 3.4 plots sample autocorrelation functions Rzz(k, l) = E[(z(m, n) − μz)(z(m − k, n − l) − μz)] of the residual in legitimate and tampered regions, where μz is the mean of the residual z, assumed to be zero. The residual z in legitimate regions is less correlated with its neighbors, and more correlated in tampered regions. We therefore model the autocorrelation function of z as:

    Rzz(k, l) = σ0² δ(k, l),                                   legitimate region,
                σ1² exp(−λ1|k|) exp(−λ2|l|) + σ0² δ(k, l),     tampered region,    (3.2)

where the autocorrelation function in the tampered region is based on the separable image model [81].

Figure 3.3: The difference between the two-state lossy channel input and output. (a) The difference resembles white noise in the legitimate state. (b) In the tampered state, the channel introduces tampering resulting in image-like differences.

To find important components that distinguish legitimate and tampered regions, we plot power spectral density (PSD) functions Φzz(ω1, ω2) (based on the model of z in Equation (3.2) with λ1 = λ2 = 0.025) at ω2 = 0 in Figure 3.5. The PSD in legitimate regions is flat, while in tampered regions it is peaked at low frequency. This suggests that low-frequency components can effectively distinguish legitimate from tampered regions, while high-frequency components offer less discrimination.
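The flat-versus-peaked contrast can be verified directly from the model: the sketch below evaluates the discrete-space Fourier transform of the autocorrelation in Equation (3.2) by truncated summation, with illustrative variances roughly matching the sample values in Figure 3.4. The legitimate spectrum is constant; the tampered spectrum dwarfs it at DC and falls off rapidly.

```python
import math

def psd(w1, w2, sigma0_sq, sigma1_sq=0.0, lam1=0.025, lam2=0.025, K=200):
    """DTFT of the model autocorrelation Rzz(k, l) at frequency (w1, w2).

    Legitimate model: Rzz = sigma0^2 * delta(k, l) -> flat (white) spectrum.
    Tampered model adds the separable term sigma1^2 exp(-lam1|k|) exp(-lam2|l|),
    summed directly with the exponential truncated at |k|, |l| <= K.
    """
    val = sigma0_sq  # the delta term contributes a constant component
    if sigma1_sq > 0.0:
        s1 = sum(math.exp(-lam1 * abs(k)) * math.cos(w1 * k) for k in range(-K, K + 1))
        s2 = sum(math.exp(-lam2 * abs(l)) * math.cos(w2 * l) for l in range(-K, K + 1))
        val += sigma1_sq * s1 * s2
    return val

# Illustrative variances (roughly the Rzz(0,0) sample values in Figure 3.4).
leg_dc = psd(0.0, 0.0, 48.0)
leg_hi = psd(math.pi, 0.0, 48.0)
tam_dc = psd(0.0, 0.0, 48.0, sigma1_sq=1300.0)
tam_hi = psd(math.pi, 0.0, 48.0, sigma1_sq=1300.0)
```

This is exactly the behavior plotted in Figure 3.5: the low-frequency components carry almost all of the discriminative power.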
Figure 3.4: Sample autocorrelation function Rzz normalized by Rzz(0, 0) in (a) legitimate regions, Rzz(0, 0) = 47.9, and (b) tampered regions, Rzz(0, 0) = 1326.8. The residual z in legitimate regions is less correlated with its neighbors, while it is more correlated in tampered regions.

We test these assumptions by plotting the distribution of the difference Z = Y − X, where X and Y are image projections of x and y in Figure 3.2, respectively. The projections are blockwise of size 16×16 pixels. In Figure 3.6, we use the block mean as the projection. Since the samples of the projection difference Z are sums of compression noise, the distribution of Z resembles a Gaussian, by the central limit theorem. In the tampered channel state, the image samples in the tampered region are unrelated to those of the original image and have large variance in low frequencies, giving the distribution of Z non-negligible tails. On the other hand, in Figure 3.7, we use the highest frequency basis in the 2D Hadamard transform as the projection basis. Both difference distributions, for legitimate and tampered images, resemble a Gaussian and are similar to each other. This indicates that the high frequency projection hardly distinguishes tampered images from legitimate ones. Both Figure 3.6 and Figure 3.7 show that image projections of the legitimate image are highly correlated with the original image projection.

Figure 3.5: Power spectral density function of z at ω2 = 0 in legitimate and tampered regions.

Now we describe the image authentication problem at the blockwise projection level in the hypothesis testing setting:

    X|Y ∼ P(X|Y) = N(Y, σ0²),                               if y is legitimate,
          Q(X|Y) = (1 − γ) N(Y, σ0²) + γ Ptampered(X|Y),    if y is tampered,    (3.3)

where γ ∈ [0, 1] is the tampered fraction of image blocks, and Ptampered(X|Y) is the probability model for tampered blocks, which depends on the projection basis. We assume that Ptampered(X|Y) = U(X) is a uniform distribution over the dynamic range of X when we use the mean projection. Having both projections X and Y, the optimal decision is based on the likelihood ratio test: P(X, Y)/Q(X, Y) ≷ T. The next section will describe our image authentication scheme, which uses these statistical assumptions to efficiently generate authentication data by using distributed source coding.

Figure 3.6: The difference distributions between the two-state lossy channel input and output using the blockwise mean as the projection. (a) The difference distribution resembles a Gaussian in the legitimate state. (b) In the tampered state, the tampered channel introduces larger deviations.

Figure 3.7: The difference distributions between the two-state lossy channel input and output using a high frequency projection. (a) The difference distribution in the legitimate state. (b) The difference distribution in the tampered state.
Both difference distributions resemble a Gaussian and are similar to each other. This means that the high frequency projection can hardly distinguish tampered images from legitimate ones.

3.2 Image Authentication System

Figure 3.8 depicts the image authentication scheme using distributed source coding. The left-hand side of Figure 3.8 shows that the authentication data consist of a Slepian-Wolf encoded quantized image projection of x and a digital signature of that version. The verification decoder, on the right-hand side of Figure 3.8, knows the statistics of the worst permissible legitimate channel and can correctly decode the authentication data with the help of an authentic image y as side information.

Figure 3.8: Image authentication system using distributed source coding. The authentication data consist of a Slepian-Wolf encoded quantized pseudorandom projection of the original image, a random seed, and a signature of the image projection. The target image is modeled as an output of the two-state lossy channel shown in Figure 3.1.

The user projects the target image using the same projection to yield the side information and tries to decode the Slepian-Wolf bitstream using the side information.
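The structure of the authentication data can be sketched as follows. This is only a sketch: the HMAC below is a stand-in for the public-key digital signature D(Xq, Ks) (the actual system signs a cryptographic hash of Xq with a private key and verifies with a public key), and the Slepian-Wolf bitstream is passed through opaquely.

```python
import hashlib
import hmac
import os
import struct

SERVER_KEY = b"stand-in-key"  # stand-in: the real system uses a private/public key pair

def make_auth_data(xq, sw_bitstream):
    """Bundle the authentication data: Slepian-Wolf bitstream, seed, signature.

    `xq` is the sequence of quantized projection indices (0..255 each) and
    `sw_bitstream` the Slepian-Wolf (LDPC syndrome) bytes.
    """
    seed = os.urandom(8)  # random seed Ks, fresh for each request
    digest = hashlib.sha256(struct.pack(f"{len(xq)}B", *xq)).digest()
    signature = hmac.new(SERVER_KEY, digest + seed, hashlib.sha256).digest()
    return {"sw": sw_bitstream, "seed": seed, "sig": signature}

def verify_digest(xq_reconstructed, auth):
    """Verifier side: check the reconstructed projection against the signature."""
    digest = hashlib.sha256(
        struct.pack(f"{len(xq_reconstructed)}B", *xq_reconstructed)).digest()
    expected = hmac.new(SERVER_KEY, digest + auth["seed"], hashlib.sha256).digest()
    return hmac.compare_digest(expected, auth["sig"])

auth = make_auth_data([3, 7, 1, 0, 15], b"\x5a\x12")
```

A fresh seed per request is what blocks the nullspace attack described below: an attacker cannot know the projection basis in advance.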
If the decoding fails, i.e., the hash value of the reconstructed image projection does not match the signature, the verification decoder declares the image tampered; otherwise, the reconstructed image projection and the side information are examined using hypothesis testing. In our authentication system shown in Figure 3.8, a pseudorandom projection (based on a randomly drawn seed Ks) is applied to the original image x, and the projection coefficients X are quantized to yield Xq. The authentication data comprise two parts, both derived from Xq. The Slepian-Wolf bitstream S(Xq) is the output of a Slepian-Wolf encoder based on low-density parity-check (LDPC) codes [111, 112], and the much smaller digital signature D(Xq, Ks) consists of the seed Ks and a cryptographic hash value of Xq signed with a private key. The authentication data are generated by a server upon request. Each response uses a different random seed Ks, which is provided to the decoder as part of the authentication data. This prevents an attack which simply confines the tampering to the nullspace of the projection. Based on the random seed, for each 16×16 nonoverlapping block Bi, we generate a 16×16 pseudorandom matrix Pi by drawing its elements independently from a Gaussian distribution N(1, σp²) and normalizing so that ||Pi||2 = 1. We choose σp = 0.2 empirically. In this way, we maintain the nice properties of the mean projection suggested in the previous section while gaining sensitivity to high-frequency attacks. The inner product ⟨Bi, Pi⟩ is quantized into an element of Xq. The rate of the Slepian-Wolf bitstream S(Xq) determines how statistically similar the target image must be to the original to be declared authentic. If the conditional entropy H(Xq|Y) exceeds the bit rate R in bits per pixel, Xq can no longer be decoded correctly [167].
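The projection step can be sketched as follows, assuming a shared pseudorandom generator seeded with Ks on both sides. The quantizer range below is an illustrative choice; the text does not fix it.

```python
import math
import random

BLOCK = 16
SIGMA_P = 0.2  # empirically chosen spread around the all-ones mean basis

def block_projection(block, rng):
    """Project one 16x16 block onto a pseudorandom basis with entries drawn
    i.i.d. from N(1, sigma_p^2), normalized so that ||P_i||_2 = 1 (treating
    the matrix as a vector), and return the inner product <B_i, P_i>."""
    p = [[rng.gauss(1.0, SIGMA_P) for _ in range(BLOCK)] for _ in range(BLOCK)]
    norm = math.sqrt(sum(v * v for row in p for v in row))
    return sum(block[r][c] * p[r][c] / norm
               for r in range(BLOCK) for c in range(BLOCK))

def quantize(x, n_bits, lo=0.0, hi=255.0 * BLOCK):
    """Uniform quantization of a projection coefficient to n_bits, saturating
    at the range limits.  The range [lo, hi] is an illustrative choice that
    covers the non-negative projections produced by this basis."""
    levels = 1 << n_bits
    idx = int((x - lo) / (hi - lo) * levels)
    return min(max(idx, 0), levels - 1)

# The same seed K_s on both sides reproduces the same projection basis.
rng = random.Random(42)
flat_block = [[128.0] * BLOCK for _ in range(BLOCK)]
coeff = block_projection(flat_block, rng)
xq = quantize(coeff, 4)
```

Because the basis is a perturbed mean, the coefficient stays close to the block mean statistics analyzed in Section 3.1.2, while the perturbation makes the projection sensitive to attacks confined to high frequencies.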
Therefore, the rate of S(Xq) should be chosen to distinguish between the different joint statistics induced in the images by the legitimate and tampered channel states. At the encoder, we select a Slepian-Wolf bit rate just sufficient to authenticate both legitimate 30 dB JPEG2000 and JPEG reconstructed versions of the original image. At the receiver, the user seeks to authenticate the image y with authentication data S(Xq) and D(Xq, Ks). The receiver first projects y to Y in the same way as during authentication data generation. A Slepian-Wolf decoder reconstructs Xq′ from the Slepian-Wolf bitstream S(Xq) using Y as side information. Decoding is via the LDPC message-passing algorithm [111, 112], initialized according to the statistics of the legitimate channel state at the worst permissible quality for the given original image. Finally, the image digest of Xq′ is computed and compared to the image digest decrypted from the digital signature D(Xq, Ks) using a public key. If these two image digests do not match, the receiver recognizes that image y is tampered; otherwise, the receiver makes a decision based on the likelihood ratio test P(Xq′, Y)/Q(Xq′, Y) ≷ T, where P and Q are probability models derived from (3.3) for the legitimate and tampered states, respectively, and T is a fixed decision threshold.

3.3 Simulation Results

We use the test images, shown in Appendix A, at 512×512 resolution with 8-bit grayscale. The two-state channel in Figure 3.1 has JPEG2000 or JPEG compression and reconstruction applied at several qualities. The malicious attack consists of overlaying a 19×163 pixel text banner at a random location in the image or removing a randomly selected Maximally Stable Extremal Region (MSER) [122] by interpolating the region from its boundaries. The text color is white or black, whichever is more visible, to avoid generating trivial attacks, such as white text on a white area.
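The decision rule can be illustrated with a small sketch of the per-block log-likelihood ratio under the models of Equation (3.3), using the Gaussian variance of 2 and γ = 0.02 from Section 3.3.2; the 8-bit dynamic range assumed for the uniform tampered model is an illustrative choice.

```python
import math

SIGMA0_SQ = 2.0   # variance of the Gaussian in P, as in Section 3.3.2
GAMMA = 0.02      # tampered fraction gamma in the mixture Q
DYN_RANGE = 256.0 # dynamic range of the projection, illustrative (8-bit images)

def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_lr(x, y):
    """log P(X|Y) - log Q(X|Y) for one block projection, per Equation (3.3)."""
    p = gauss(x, y, SIGMA0_SQ)                                   # legitimate model
    q = (1 - GAMMA) * gauss(x, y, SIGMA0_SQ) + GAMMA / DYN_RANGE  # mixture model
    return math.log(p) - math.log(q)

# A small projection difference supports the legitimate hypothesis ...
llr_clean = log_lr(100.0, 101.0)
# ... while a large difference, typical of a tampered block, does not.
llr_tampered = log_lr(100.0, 140.0)
```

Summing such per-block ratios and comparing against the threshold T reproduces the test P(X, Y)/Q(X, Y) ≷ T used throughout this chapter.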
The quantization of the authentication encoder is varied so that the Slepian-Wolf encoder processes between 1 and 8 bitplanes, starting with the most significant. The Slepian-Wolf codec is implemented using rate-adaptive LDPC codes [188] with a block size of 1024 bits. During authentication data generation, the bitplanes of X are encoded successively as LDPCA syndromes. The bitplanes are conditionally decoded, with each decoded bitplane acting as additional side information for subsequent bitplanes, as in [7].

3.3.1 Authentication Data Size

Figure 3.9 compares the minimum rate that would be required to decode the Slepian-Wolf bitstream S(Xq) for side information Y due to legitimate and tampered channel states for Lena, with the projection X quantized to 4 bits. The following observations also hold for other images and levels of quantization. The rate required to decode S(Xq) with legitimately created side information is significantly lower than the rate (averaged over 100 trials) when the side information is tampered, for JPEG2000 or JPEG reconstruction PSNR above 30 dB. Moreover, as the PSNR increases, the rate for legitimate side information decreases, while the rate for tampered side information stays high and close to that of conventional fixed length coding. The rate gap justifies our choice for the Slepian-Wolf bitstream size: the size just sufficient to authenticate both legitimate 30 dB JPEG2000 and JPEG reconstructed versions of the original image.

Figure 3.9: Minimum rate for decoding the Slepian-Wolf bitstream for the image Lena with the projection X quantized to 4 bits.
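The successive bitplane encoding used by the authentication encoder can be sketched as follows; the LDPCA syndrome computation itself is omitted, and only the MSB-first plane decomposition the encoder operates on is shown.

```python
def to_bitplanes(indices, n_bits):
    """Split quantized projection indices into bitplanes, most significant
    first.  Each plane is one bit per coefficient; the Slepian-Wolf encoder
    would compress each plane into LDPC syndromes (not reproduced here)."""
    return [[(v >> (n_bits - 1 - b)) & 1 for v in indices] for b in range(n_bits)]

def from_bitplanes(planes):
    """Reassemble indices from bitplanes (MSB first), as a decoder would after
    conditionally decoding each plane given the previously decoded ones."""
    n = len(planes[0])
    out = [0] * n
    for plane in planes:
        for i in range(n):
            out[i] = (out[i] << 1) | plane[i]
    return out

xq = [5, 12, 0, 15, 7]       # hypothetical 4-bit quantization indices
planes = to_bitplanes(xq, 4)
```

Decoding the most significant plane first is what lets each decoded plane serve as side information for the planes below it.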
Figure 3.10 shows the maximum selected Slepian-Wolf bitstream size in bytes among all the test images for 1 to 8 bits of quantization of X. For 4-bit quantization, the Slepian-Wolf bitstream size is less than 80 bytes, or 2.3% of the encoded file sizes at 30 dB reconstruction. Compared to conventional fixed length coding, distributed source coding offers a substantial rate saving. For an authentication data size of 120 bytes, conventional fixed length coding can only deliver 1-bit quantized projections, while distributed source coding can offer 5-bit precision. The overall effect is lower decision error.

Figure 3.10: Authentication data sizes in bytes using conventional fixed length coding and distributed source coding for different numbers of bits in quantization.

3.3.2 Receiver Operating Characteristic

We now fix the authentication data sizes for the different numbers of quantization bits shown in Figure 3.10 to evaluate tampering detection using 3,450 legitimate and 3,450 tampered test images. We measure the false acceptance rate (the chance that a tampered image is falsely accepted as a legitimate one) and the false rejection rate (the chance that a legitimate image is falsely detected as a tampered one). Figure 3.11 compares the receiver operating characteristic (ROC) curves for tampering detection with different numbers of bits in quantization, obtained by sweeping the decision threshold T in the likelihood ratio test. In the likelihood ratio test, we set the variance of the Gaussian in P to be 2 and γ (the convex combination parameter in Q in (3.3)) to be 0.02. Figure 3.11 shows that higher quantization precision offers better detection performance, but this comes at the cost of more authentication data.
Figure 3.11: ROC curves of tampering detection with different numbers of bits in quantization of X for test images. Higher quantization precision offers better detection performance.

Figure 3.12 combines the results of Figures 3.10 and 3.11, depicting the ROC equal error rate versus the authentication data size in bytes for different coding methods. The equal error rates are interpolated from the ROC curves as the points where the false acceptance rate equals the false rejection rate. Distributed source coding reduces the authentication data size by 75% to 83% compared to conventional fixed length coding at the same ROC equal error rate of 2%.

Figure 3.12: ROC equal error rates for different authentication data sizes using conventional fixed length coding and distributed source coding.

Now, we compare our authentication system to others in the literature: Lin et al. [99], Fridrich [57], and Swaminathan et al. [175]. The method described in [99] is JPEG-inspired. The first 2 DCT coefficients (according to the zigzag order) per 8×8 image block are selected to generate authentication data of 512 bytes per image. The method proposed by Fridrich is block projection based. We set it to generate 20 bits per 64×64 block according to zero-mean low-pass pseudorandom projections, which yields 168 bytes per image. The verification is based on the Hamming distance of the hash values. Swaminathan's method takes circular summations of the 2D Fourier transformed image to generate hashes of 100 bytes per image. Our approach using distributed source coding generates 78 bytes per image.
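The equal-error-rate interpolation described above can be sketched as follows, using hypothetical ROC sample points (false acceptance falling and false rejection rising as the threshold T sweeps):

```python
def equal_error_rate(roc_points):
    """Interpolate the ROC equal error rate: the point where the false
    acceptance rate equals the false rejection rate.  `roc_points` is a list
    of (far, frr) pairs from sweeping the threshold, ordered by decreasing FAR."""
    prev_far, prev_frr = roc_points[0]
    for far, frr in roc_points[1:]:
        if far <= frr:  # the curves cross between the previous and current point
            d_prev, d_cur = prev_far - prev_frr, far - frr
            t = d_prev / (d_prev - d_cur) if d_prev != d_cur else 0.0
            return prev_far + t * (far - prev_far)  # linear interpolation
        prev_far, prev_frr = far, frr
    return roc_points[-1][0]

# Hypothetical ROC samples, not the measured curves of Figure 3.11.
eer = equal_error_rate([(0.30, 0.001), (0.10, 0.01), (0.02, 0.02), (0.005, 0.08)])
```

Applying this to each curve in Figure 3.11 produces one point per coding method and quantization depth in Figure 3.12.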
Figure 3.13 plots the ROC curves of various image authentication systems, obtained by sweeping their decision thresholds. Our approach outperforms the methods proposed by Fridrich and Swaminathan et al. and performs comparably to Lin's method, which requires an authentication data size of 512 bytes, while our approach requires only 78 bytes per image.

Figure 3.13: ROC curves of various authentication methods.

3.4 Summary

This chapter presents a statistical model for the image authentication problem using a two-state channel. The proposed scheme captures the spatial structure of tampering noise using a blockwise projection. The analysis suggests that the mean projection be used as the principal discriminant basis. For reasons of security, we instead use a mean plus pseudorandom noise as the projection basis. The quantized projection coefficients of the original image are compressed at a proper rate by the Slepian-Wolf encoder to yield the authentication data, which can be correctly decoded using authentic images as side information. Distributed source coding provides robustness against various legitimate encodings while detecting malicious modifications, at a much lower authentication data rate than conventional fixed length coding. The proposed authentication system has lower detection error rates and smaller authentication data size compared with other systems. The authentication decoder presented in this chapter addresses various lossy compression schemes. The next chapter will discuss an adaptive distributed source coding decoder that uses a statistical method to broaden the robustness of the system to common adjustments, such as contrast and brightness adjustment and affine warping.
Chapter 4

Learning Unknown Parameters of Image Adjustment

The previous chapter presents an image authentication scheme that distinguishes legitimate encodings from tampering using distributed source coding. It is very common for the target image to be adjusted to accommodate display capabilities or to improve content presentation. For example, the image might be cropped and resized to meet the size and resolution of the client display, or contrast and brightness adjustment may be applied to an image that is too dark or overexposed. If we consider those adjusted images legitimate, the image authentication system described in the previous chapter cannot authenticate them, because the side information would be considered tampered due to the adjustments. As mentioned in Chapter 2, past approaches address this challenge by using invariant projections or features robust against legitimate editing. These approaches are not flexible: they work well for the editing for which they are designed, but might fail for other types of editing. An obvious workaround is to try every possible adjustment to align the received image. Heuristic approaches would exhaustively search the entire editing parameter space to authenticate the target image; the complexity of this approach grows exponentially as the dimension of the editing space increases. This chapter presents a solution in which the authentication decoder learns the editing parameters directly from the target image through decoding the authentication data using an expectation maximization (EM) algorithm. Section 4.1 introduces a two-state channel with unknown editing parameters to formulate the problem. Section 4.2 describes the proposed authentication decoder for images that have undergone contrast and brightness adjustment. Section 4.3 presents our solution to authentication of affine warped images.
Section 4.4 extends the decoder to address images that have simultaneously undergone contrast, brightness, and affine warping adjustment. Experimental results in Section 4.5 demonstrate that the EM decoder can distinguish legitimate editing from malicious tampering while accurately learning the parameters. The authentication data size is comparable to that of the oracle decoder, which knows the ground-truth parameters.

4.1 Two-State Channel with Unknown Adjustment Parameters

We model the target image by way of a two-state channel with unknown adjustment parameters, as shown in Figure 4.1. In both states, the channel adjusts the image via legitimate editing with a fixed but unknown parameter θ. In the legitimate state, we model y = f(x; θ) + z, where x and y are the original and target images, respectively, and z is noise introduced by compression and reconstruction. In the tampered state, the channel additionally applies malicious tampering.

Figure 4.1: The target image is modeled as an output of a two-state channel affected by a global editing function f(.; θ) with unknown but fixed parameter θ. In the tampered state, the channel additionally applies malicious tampering.

Figure 4.2 demonstrates the channel for a Kodak test image at 512×512 resolution. In Figure 4.2(b), a target image has undergone contrast and brightness adjustment. We have f(x; α, β) = αx + β, where α, β ∈ R are contrast and brightness adjustment parameters, respectively. In Figure 4.2(c), the channel applies affine warping: y(m) = f(x; A, b) = x(Am + b), where m ∈ R2 are the corresponding coordinates in the target image, and A ∈ R2×2, b ∈ R2 are transformation and translation parameters, respectively. Figure 4.2(d) shows a target image which has simultaneously undergone contrast, brightness, and affine warping adjustment:
y(m) = f(x; A, b, α, β) = αx(Am + b) + β. In the last case, there are 8 scalar parameters. Heuristic methods may need to decode and test authentication data for 10^8 possible candidates, given 10 candidate values per parameter, which makes exhaustive search practically infeasible. Moreover, since the authenticity decision is based on the likelihood ratio test P(Xq, y; θ)/Q(Xq, y; θ) ≷ T, accurate estimation of θ is needed for confident decision results.

Figure 4.2: One of the Kodak test images. (a) The original image; (b) a legitimate image with contrast increased by 20% and brightness decreased by 10/255; (c) a legitimate image rotated 5 degrees around the center; and (d) a legitimate image with contrast increased by 20%, brightness decreased by 10/255, and rotated 5 degrees around the center. All target images (b)-(d) are compressed and reconstructed by JPEG at 30 dB PSNR.

Figure 4.3 shows an unrealistic solution: the decoder has an oracle knowing the true editing parameters of the target image corresponding to the authentication data. The target image is then compensated using the parameters provided by the oracle. Then the decoder decodes the Slepian-Wolf bitstream and tests the target image and reconstructed image projection in the same way described in Chapter 3. The following sections will show how to turn this unrealistic solution into a practical one using statistical learning techniques for various editing models.

Figure 4.3: The oracle decoder knows the parameters and compensates the target image to align with the authentication data.
Then the Slepian-Wolf bitstream is decoded using the compensated target image as side information to yield an a posteriori pmf of the quantized projection Papp(Xq). The reconstructed quantized image projection is the result of a hard decision on Papp(Xq).

4.2 EM Decoder for Contrast and Brightness Adjustment

This section considers a target image that has undergone contrast and brightness adjustment. The example shown in Figure 4.2(b) has the contrast and brightness parameters (α, β) = (1.2, −10), such that y = αx + β + z. Without knowing the adjustment parameters, the decoder of Chapter 3 requires a high Slepian-Wolf bitstream rate to decode successfully, since it is unaware of the adjustment. The affine relationship is preserved by the random image projection due to its linearity; that is, Y = αX + β + Z. Consequently, even a legitimate Y is poor side information for the decoding of X, and the decoder will treat the adjusted image as tampered. Unlike past approaches in which the projection or the features might be invariant to the contrast and brightness adjustment, we solve this problem by decoding the authentication data while learning the parameters that establish the correlation between the target and original images. Estimation of the contrast and brightness adjustment parameters requires the target image and the original image projections, but the latter is not available before decoding. This situation, with latent variables to estimate, can be addressed using the statistical method called expectation maximization. Figure 4.4 shows the Slepian-Wolf decoder with EM. It decodes the Slepian-Wolf bitstream S(Xq) using the target image projection Y as side information and yields the reconstructed image projection Xq′.
Note that it now decodes the bitstream via an EM algorithm that updates the a posteriori probability mass function (pmf) Papp(Xq) in the E-step and updates α and β by maximum likelihood estimation in the M-step.

Figure 4.4: The Slepian-Wolf decoder with contrast and brightness adjustment learning decodes the Slepian-Wolf bitstream S(Xq) using the side information Y compensated with the previously estimated contrast and brightness adjustment parameters. Each iteration produces a soft estimate of Xq in the E-step and updates the contrast and brightness adjustment parameters in the M-step.

In the E-step, we fix contrast α and brightness β at their current estimates. Next we apply contrast and brightness adjustment with these values to obtain a priori pmfs of the image projection Xq(i) from the side information Y(i). Finally, we run one iteration of joint bitplane LDPC decoding on the a priori pmfs with the Slepian-Wolf bitstream S(Xq) to produce extrinsic pmfs Papp(Xq(i) = xq), which we denote Qi(xq) for convenience. In the M-step, we fix these extrinsic pmfs Qi(xq) of the projection Xq(i) and estimate α and β with reference to the side information Y(i). For robustness, we only consider projections for which max_xq Qi(xq) > T = 0.9995, denoting the set of eligible indices as C.¹ We now derive optimality conditions² on the parameters α and β for the maximization of a lower bound L̂(α, β) of the log-likelihood function L(α, β).
The lower bound is due to Jensen’s inequality and the concavity of log(·):

$$L(\alpha, \beta) \equiv \sum_{i \in C} \log P(X_q(i), Y(i); \alpha, \beta) = \sum_{i \in C} \log \sum_{x_q} Q_i(x_q)\, P(x_q, Y(i); \alpha, \beta) \qquad (4.1)$$

$$\geq \sum_{i \in C} \sum_{x_q} Q_i(x_q) \left[ \log P(x_q \mid Y(i); \alpha, \beta) + \log P(Y(i)) \right] \equiv \hat{L}(\alpha, \beta), \qquad (4.2)$$

where the distribution P(xq | Y(i); α, β) is modeled as a quantized Gaussian with mean (1/α)(Y(i) − β) and variance σz²/α². The quantization of X is uniform and saturated for X less than 0 or greater than 255. Setting the partial derivatives of L̂(α, β) with respect to α and β to zero, we obtain the optimality conditions:

$$\alpha = \frac{|C| \sum_{i \in C} \mu_x^i\, Y(i) - \sum_{i \in C} \sum_{j \in C} \mu_x^i\, Y(j)}{|C| \sum_{i \in C} \mu_{x^2}^i - \sum_{i \in C} \sum_{j \in C} \mu_x^i\, \mu_x^j}, \qquad \beta = \frac{1}{|C|} \sum_{i \in C} \left( Y(i) - \alpha \mu_x^i \right),$$

where

$$\mu_x^i = \sum_{x_q} Q_i(x_q)\, E[X(i) \mid q(X(i)) = x_q, Y(i); \alpha, \beta], \qquad \mu_{x^2}^i = \sum_{x_q} Q_i(x_q)\, E[X(i)^2 \mid q(X(i)) = x_q, Y(i); \alpha, \beta].$$

¹ To guarantee that C is nonempty, we encode a small portion of the quantized image projection Xq with degree-1 syndrome bits. The decoder knows those values with probability 1 and includes their indices in C.
² Appendix B discusses the concavity of L̂(α, β) to justify the optimality conditions.

Since both the left- and right-hand sides of the optimality conditions contain α and β, we update them iteratively until convergence, or for at most 30 iterations. The outer loop of EM iterations terminates when hard decisions on Papp(Xq(i) = xq) satisfy the constraints imposed by S(Xq). If the hash value of Xq does not match the one in the authentication data, the decoder declares the image to be tampered; otherwise, we make a decision based on the log-likelihood ratio test P(Xq, y; α, β)/Q(Xq, y; α, β) ≷ T, where T is a fixed threshold.

Figure 4.5 demonstrates the efficiency of the EM decoder by illustrating the traces of the parameter search for different decoders. The ground truth of the contrast parameter is 0.84, and that of the brightness parameter is 10. The oracle decoder directly outputs the ground truth.
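When the extrinsic pmfs Qi are degenerate (a hard decision on each projection, so that µx^i = x(i) and µx²^i = x(i)²), the closed-form M-step updates above reduce to an ordinary least-squares fit of Y on X. A minimal numerical sketch of this special case, using synthetic data rather than the dissertation's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data following Y(i) = alpha * X(i) + beta + Z(i) (illustrative values)
alpha_true, beta_true = 1.2, -10.0
x = rng.uniform(0, 255, size=4096)
y = alpha_true * x + beta_true + rng.normal(0, 2.0, size=4096)

# Degenerate pmfs Q_i: the posterior moments reduce to mu_x[i] = x[i], mu_x2[i] = x[i]**2
mu_x, mu_x2 = x, x ** 2
C = len(x)

# Iterate the closed-form M-step updates (beta depends on the current alpha)
alpha, beta = 1.0, 0.0
for _ in range(30):
    num = C * np.sum(mu_x * y) - np.sum(mu_x) * np.sum(y)
    den = C * np.sum(mu_x2) - np.sum(mu_x) ** 2
    alpha = num / den
    beta = np.mean(y - alpha * mu_x)

print(alpha, beta)
```

With soft pmfs, the same updates apply with µx^i and µx²^i replaced by the posterior expectations defined above.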
The decoder unaware of adjustment uses 1 and 0 for the contrast and brightness parameters, respectively. In Figure 4.5(c), the exhaustive-search decoder tries to decode the authentication data using samples of the parameter space, from 0.75 to 1.2 for the contrast parameter and −20 to 20 for the brightness parameter, until it obtains a parameter sample with which the bitstream decodes successfully. The discrete search space makes the resulting parameters inaccurate, and the computational complexity grows exponentially as the parameter dimension increases. Figure 4.5(d) shows the search trace of our proposed EM decoder. Even though the initial parameters are far from the ground truth, the decoder approaches the ground truth within a manageable number of iterations. Unlike the exhaustive search, the EM decoder estimates the parameters in a continuous space. Simulation results in Section 4.5 will demonstrate the accuracy of our parameter estimation.

Figure 4.5: Search traces for different decoders.
(a) The oracle decoder directly outputs the ground truth; (b) the decoder unaware of adjustment outputs (1, 0) for the contrast and brightness parameters; (c) the exhaustive-search decoder tries to decode the authentication data using the parameters in the discrete search space until it reaches a parameter that successfully decodes the authentication data; and (d) the proposed EM decoder iteratively updates the parameters and decodes the authentication data.

4.3 EM Decoder for Affine Warping Adjustment

In the previous section, the authentication system was extended to be robust against contrast and brightness adjustment using an EM algorithm. This section presents an extension for robustness against affine warping adjustment, which includes cropping, resizing, rotation, and shearing. The example target image shown in Figure 4.2(c) is first rotated counterclockwise by 5 degrees around the image center, then cropped to 512×512, and finally JPEG compressed and reconstructed at 30 dB PSNR. Recall that we model the editing as y(m) = x(Am + b) + z(m), where

$$A = \begin{pmatrix} 0.996 & -0.087 \\ 0.087 & 0.996 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 23 \\ -21 \end{pmatrix}$$

for a 5-degree counterclockwise rotation and cropping. Without knowing the adjustment parameters, the decoder requires a high rate of the Slepian-Wolf bitstream for successful decoding and suffers from a high false rejection rate. The authentication of such image adaptation differs from the problem addressed in the previous section, since the target image is no longer aligned with the corresponding authentication data. Our solution is to realign the target image by estimating the affine warping parameters using the corresponding authentication data. The authentication decision is based on the reconstructed image projection and the compensated target image. Due to affine warping and cropping, some portions of the original image are cropped out of the target image y.
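The affine model above can be illustrated numerically. The sketch below builds A for a 5-degree counterclockwise rotation and then recovers (A, b) from noisy block-coordinate correspondences n ≈ Am + b by ordinary least squares, the hard-decision special case of the M-step update (4.3); the noise level and number of correspondences are illustrative assumptions:

```python
import numpy as np

theta = np.radians(5)  # 5-degree counterclockwise rotation
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])  # approx [[0.996, -0.087], [0.087, 0.996]]
b_true = np.array([23.0, -21.0])  # translation values quoted in the text

rng = np.random.default_rng(1)
m = rng.uniform(0, 512, size=(200, 2))                        # coordinates in the target image
n = m @ A_true.T + b_true + rng.normal(0, 0.5, size=(200, 2))  # noisy correspondences

# Least squares: stack rows g = (m1, m2, 1) and solve G p = n,
# where p = [[A11, A21], [A12, A22], [b1, b2]]
G = np.hstack([m, np.ones((len(m), 1))])
p, *_ = np.linalg.lstsq(G, n, rcond=None)
A_est, b_est = p[:2].T, p[2]

print(np.round(A_est, 3))
print(np.round(b_est, 1))
```

In the actual decoder the correspondences are soft (pmfs over coordinates), so the normal equations use the expectations E[GᵀG] and E[Gᵀ·] instead of hard coordinates.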
The cropped-out areas of the target image are not considered in the authentication decision. Figure 4.6 shows the target image realigned to the original. The blue areas in Figure 4.6(c) indicate the cropped-out regions. We refer to the remaining area of the image as the cropped-in region.

Figure 4.6: Realignment of an affine warped image. (a) The original image, a Kodak test image; (b) the target image rotated counterclockwise by 5 degrees and cropped to 512×512; and (c) the realigned target image with a color overlay. The blue areas associated with the 16×16 blocks indicate the cropped-out regions; the other blocks form the cropped-in region.

The affine-learning Slepian-Wolf decoder shown in Figure 4.7 takes the Slepian-Wolf bitstream S(Xq) and the target image y and yields the reconstructed image projection X̂q. It now decodes the authentication data via an EM algorithm. The E-step updates the a posteriori probability mass functions (pmfs) Papp(Xq) using the Slepian-Wolf decoder and also estimates corresponding coordinates for a subset of reliably decoded projections. The M-step updates the affine warping parameters based on the corresponding coordinate distributions, denoted Papp(m) in Figure 4.7. This loop of EM iterations terminates when hard decisions on Papp(Xq) satisfy the constraints imposed by S(Xq).

In the E-step, we fix the parameters A and b at their current hard estimates. The inverse transform is applied to the target image y to obtain a compensated image ycomp. If the affine warping parameters are accurate, ycomp will be closely aligned to the original image x in the cropped-in region. We derive a priori pmfs for the image projections Xq as follows. In the cropped-in region, we use Gaussian distributions centered at the random projection values of ycomp, and in the cropped-out region, we use uniform distributions.
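The a priori pmf construction just described can be sketched for a single projection. The quantizer depth, dynamic range, and noise standard deviation below are illustrative assumptions:

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(t, mu, sigma):
    return 0.5 * (1 + erf((t - mu) / (sigma * sqrt(2))))

def a_priori_pmf(y_proj, cropped_in, levels=16, lo=0.0, hi=255.0, sigma_z=4.0):
    """A priori pmf over quantization bins for one projection: in the cropped-in
    region, the integral of a Gaussian centered at the compensated side-information
    projection over each bin (saturated at the range ends); cropped-out: uniform."""
    if not cropped_in:
        return np.full(levels, 1.0 / levels)
    edges = np.linspace(lo, hi, levels + 1)
    edges[0], edges[-1] = -np.inf, np.inf   # saturate below 0 and above 255
    pmf = np.array([gauss_cdf(edges[k + 1], y_proj, sigma_z)
                    - gauss_cdf(edges[k], y_proj, sigma_z)
                    for k in range(levels)])
    return pmf / pmf.sum()

pmf_in = a_priori_pmf(100.0, cropped_in=True)
pmf_out = a_priori_pmf(100.0, cropped_in=False)
print(pmf_in.argmax(), pmf_in.sum(), pmf_out.max())
```

The cropped-in pmf concentrates on the bin containing the side-information value, while the cropped-out pmf carries no information, which is exactly the asymmetry the LDPC decoder exploits.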
Then, we run three iterations of joint bitplane LDPC decoding on the a priori pmfs with the Slepian-Wolf bitstream S(Xq) to produce a posteriori pmfs Papp(Xq). We estimate the corresponding coordinates for those projections for which max_{xq} Papp(Xq(i) = xq) > T = 0.995, denoting this set of reliably decoded projection indices as C. We also denote the maximizing reconstruction value xq as xq^max(i). As before, degree-1 syndrome bits are sent to guarantee that C is nonempty. We obtain corresponding coordinate pmfs Papp(m(i)) for these projections by maximizing the following log-likelihood function:

$$L(A, b) \equiv \sum_{i \in C} \log P(x_q^{\max}(i), n^{(i)}, y; A, b) = \sum_{i \in C} \log \left[ \sum_{m^{(i)}} P(x_q^{\max}(i), n^{(i)}, y \mid m^{(i)}; A, b)\, P(m^{(i)}) \right],$$

where n(i) is the set of top-left coordinates of the 16×16 projection blocks Bi in the original image x, and the latent variable m(i) represents the corresponding set of coordinates in y.

Figure 4.7: The Slepian-Wolf decoder with affine warping parameter learning decodes the Slepian-Wolf bitstream S(Xq) using the target image y as side information. The soft output of the quantized image projection Papp(Xq) is matched to the target image y to yield corresponding coordinate estimates in the E-step. The affine warping parameters (A, b) are estimated in the M-step.

In this way, we associate a corresponding coordinate pmf Papp(m(i)) with each projection Xq(i) in C. For the projection Xq(i), we produce the pmf Papp(m(i) = m) by matching Xq(i) to projections obtained from y through vectors m over a small search window. Specifically, Papp(m(i) = m) is proportional to the integral, over the quantization interval of xq^max(i), of a Gaussian centered at the projection of the block at m in the image y. Figure 4.8 gives a 1D example of the resulting distribution for the projection at n(i) = 193.
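The window matching that yields Papp(m(i) = m) can be sketched in 1D. The toy signal, the block projection (a 16-sample mean), the quantization interval, and the search window below are illustrative assumptions, not the dissertation's pseudo-random projection:

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(t, mu, sigma):
    return 0.5 * (1 + erf((t - mu) / (sigma * sqrt(2))))

rng = np.random.default_rng(2)
y = rng.uniform(0, 255, size=256)        # toy 1-D target signal
block, sigma_z = 16, 4.0

# Over-complete projections of y: the mean of each 16-sample window (toy projection)
cands = np.arange(0, len(y) - block + 1)
proj_y = np.array([y[m:m + block].mean() for m in cands])

# Suppose the reliably decoded value x_q^max falls in the bin [q_lo, q_hi)
# and the true offset is m = 100 (hypothetical values)
true_m = 100
q_lo, q_hi = proj_y[true_m] - 2, proj_y[true_m] + 2

# P_app(m) proportional to the integral of N(x; proj_y(m), sigma_z^2) over the bin,
# evaluated over a small search window around the current alignment
window = cands[(cands > true_m - 8) & (cands < true_m + 8)]
scores = np.array([gauss_cdf(q_hi, proj_y[m], sigma_z)
                   - gauss_cdf(q_lo, proj_y[m], sigma_z) for m in window])
p_app = scores / scores.sum()
print(window[p_app.argmax()])
```

The resulting pmf peaks at the offset whose projection best explains the decoded quantization bin, mirroring the 1D example of Figure 4.8.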
The quantized projection, shown as a red bar in Figure 4.8(a), is matched against the projections of y in Figure 4.8(b) over the search window to obtain

$$P_{\mathrm{app}}(m) \propto \int_{x:\, q(x) = x_q^{\max}(i)} P(x \mid Y(m))\, dx$$

in Figure 4.8(c).

Figure 4.8: Example of corresponding coordinate estimation in 1D. A reliably decoded quantized image projection Xq, shown as a red bar in (a), is matched to the over-complete image projections of the target image over a search window, shown in (b). The resulting a posteriori probability of the corresponding coordinate, Papp(m), is proportional to P(Xq | Y(m)).

In the M-step, we re-estimate the parameters A and b by holding the corresponding coordinate pmfs Papp(m(i)) fixed and maximizing a lower bound of the log-likelihood function L(A, b):

$$\begin{aligned} (A, b) &:= \arg\max_{A, b} \sum_{i \in C} \sum_{m^{(i)}} Q_i(m^{(i)}) \log P(x_q^{\max}(i), n^{(i)}, y \mid m^{(i)}; A, b) \\ &= \arg\max_{A, b} \sum_{i \in C} \sum_{m^{(i)}} Q_i(m^{(i)}) \left[ \log P(n^{(i)} \mid m^{(i)}, x_q^{\max}(i), y; A, b) + \log P(x_q^{\max}(i), y \mid m^{(i)}) \right]. \end{aligned}$$

The lower bound is due to Jensen’s inequality and the concavity of log(·). Note also that P(xq^max(i), y | m(i)) does not depend on the parameters A and b and can thus be ignored in the maximization. We model P(n(i) | m(i), xq^max(i), y; A, b) as a Gaussian distribution, i.e., (n(i) − Am(i) − b) ∼ N(0, σ²I). As in the method of least squares, log P(n(i) | m(i), xq^max(i), y; A, b) is a concave function of A and b. Taking partial derivatives with respect to A and b, and setting these to zero, we obtain the optimal updates:

$$\begin{pmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \\ b_1 & b_2 \end{pmatrix} := E[G^T G]^{-1}\, E\!\left[ G^T \begin{pmatrix} \vdots & \vdots \\ n_1^{(i)} & n_2^{(i)} \\ \vdots & \vdots \end{pmatrix} \right], \qquad (4.3)$$
where G is the matrix that stacks the rows (m1^(i), m2^(i), 1), and

$$E[G^T G] = \sum_{i \in C} \begin{pmatrix} E[(m_1^{(i)})^2] & E[m_1^{(i)} m_2^{(i)}] & E[m_1^{(i)}] \\ E[m_1^{(i)} m_2^{(i)}] & E[(m_2^{(i)})^2] & E[m_2^{(i)}] \\ E[m_1^{(i)}] & E[m_2^{(i)}] & 1 \end{pmatrix}.$$

The likelihood ratio test for authenticity is P(Xq, y; A, b)/Q(Xq, y; A, b) ≷ T, measured over the cropped-in area of the compensated target image.

4.4 EM Decoder for Contrast, Brightness, and Affine Warping Adjustment

Joint estimation of the contrast, brightness, and affine warping adjustment parameters is not a trivial extension of the EM decoders described in Sections 4.2 and 4.3. Without a proper estimate of the contrast and brightness adjustment parameters, the corresponding coordinate estimation would fail. On the other hand, estimation of the contrast and brightness adjustment parameters cannot be done without reference to the corresponding coordinates. The key idea to solve this problem is to use the soft information of the corresponding coordinates in the contrast and brightness adjustment parameter estimation.

As before, in the E-step, we fix the parameters A, b, α, and β at their current hard estimates and obtain a compensated image ycomp. We derive a priori pmfs for the image projections Xq as follows. In the cropped-in region, we use Gaussian distributions centered at the random projection values of ycomp, and in the cropped-out region, we use uniform distributions. Then, we run three iterations of joint bitplane LDPC decoding on the a priori pmfs with the Slepian-Wolf bitstream S(Xq) to produce a posteriori pmfs Papp(Xq(i) = xq). We estimate the corresponding coordinates m(i) for those projections for which max_{xq} Papp(Xq(i) = xq) > T = 0.995, denoting this set of reliably decoded projection indices as C. We also denote the maximizing reconstruction value xq as xq^max(i).
The latent variable update can be written as

$$Q_i(m) := P(m^{(i)} = m \mid x_q^{\max}(i), y, n^{(i)}; A, b, \alpha, \beta).$$

In the M-step, we re-estimate the parameters A, b, α, and β by holding the corresponding coordinate pmfs Qi(m) fixed and maximizing a lower bound of the log-likelihood function:

$$\begin{aligned} L(A, b, \alpha, \beta) &\equiv \sum_{i \in C} \log P(x_q^{\max}(i), n^{(i)}, y; A, b, \alpha, \beta) \\ &= \sum_{i \in C} \log \left[ \sum_{m^{(i)}} P(x_q^{\max}(i), n^{(i)}, y \mid m^{(i)}; A, b, \alpha, \beta)\, P(m^{(i)}) \right] \\ &\geq \sum_{i \in C} \sum_{m} Q_i(m) \log P(x_q^{\max}(i), n^{(i)}, y \mid m; A, b, \alpha, \beta) \\ &= \sum_{i \in C} \sum_{m} Q_i(m) \left[ \log P(n^{(i)} \mid m, x_q^{\max}(i), y; A, b) + \log P(x_q^{\max}(i), y \mid m; \alpha, \beta) \right]. \qquad (4.4) \end{aligned}$$

The lower bound is due to Jensen’s inequality and the concavity of log(·). Note also that P(xq^max(i), y | m; α, β) does not depend on the parameters A and b, and P(n(i) | m, xq^max(i), y; A, b) does not depend on the parameters α and β. Thus, we can maximize the lower bound separately over these two sets of parameters. The affine warping parameters are updated using (4.3). Similarly, we model P(xq^max(i) | y, m; α, β) as a quantized Gaussian with mean (Y(m) − β)/α and variance σz²/α². The quantization of X is uniform and saturated for X less than 0 or greater than 255. Setting the partial derivatives with respect to α and β to zero, we obtain the optimal updates³:

$$\alpha := \frac{|C| \sum_{i \in C} \mu_{XY}^i - \sum_{i \in C} \sum_{j \in C} \mu_X^i \mu_Y^j}{|C| \sum_{i \in C} \mu_{X^2}^i - \sum_{i \in C} \sum_{j \in C} \mu_X^i \mu_X^j}, \qquad \beta := \frac{1}{|C|} \sum_{i \in C} \left( \mu_Y^i - \alpha \mu_X^i \right),$$

where

$$\begin{aligned} \mu_X^i &= E_{m \sim Q_i}\!\left[ E[X \mid Y(m), x_q^{\max}(i)] \right], & \mu_Y^i &= E_{m \sim Q_i}[Y(m)], \\ \mu_{X^2}^i &= E_{m \sim Q_i}\!\left[ E[X^2 \mid Y(m), x_q^{\max}(i)] \right], & \mu_{XY}^i &= E_{m \sim Q_i}\!\left[ Y(m)\, E[X \mid Y(m), x_q^{\max}(i)] \right]. \end{aligned}$$

The likelihood ratio test for authenticity is P(Xq, y; A, b, α, β)/Q(Xq, y; A, b, α, β) ≷ T, measured over the cropped-in area of the compensated target image.

4.5 Simulation Results

We use Kodak and classic test images, shown in Appendix A, at 512×512 resolution with 8-bit grayscale for the simulation. The space-varying two-state channel in Figure 4.1 applies JPEG2000 or JPEG compression and reconstruction at several qualities above 30 dB.
In Section 4.5.1, the channel applies contrast and brightness adjustment. Section 4.5.2 shows the results when affine warping adjustment is applied. In Section 4.5.3, the channel simultaneously applies contrast, brightness, and affine warping adjustment.

4.5.1 Contrast and Brightness Adjustment

Our first experiment uses Lena of size 512×512 at 8-bit grayscale. The two-state channel in Figure 4.1 randomly selects contrast and brightness parameters (α, β) from the set {(1.2, −20), (1.1, −10), (1.0, 0), (0.9, 10), (0.8, 20)}. After adjustment, JPEG2000 or JPEG compression and reconstruction is applied at 30 dB reconstruction PSNR. The malicious attack overlays a 20×122 pixel text banner randomly in the image. The text color is white or black, whichever is more visible, to avoid generating trivial attacks, such as white text on a white area. The image projection X is quantized to 4 bits, and the Slepian-Wolf encoder uses a 4096-bit LDPC code with 200 degree-1 syndrome nodes.

³ Appendix B discusses the concavity of the lower bound to justify the optimality conditions.

Figure 4.9 compares the minimum rate (averaged over 20 trials) for decoding S(Xq) with legitimate and tampered side information using three different decoding schemes: the proposed EM decoder that learns α and β, an oracle decoder that knows α and β, and a decoder unaware of adjustment that always uses α = 1 and β = 0. The EM decoder separates the minimum decodable rates as effectively as the oracle decoder, while the decoder unaware of adjustment cannot always decode at low rates with legitimate side information. This makes the decoder unaware of adjustment unable to distinguish tampered from legitimately adjusted images. The same observation holds for the other test images.
We set the authentication data size to 107 bytes and measure the false acceptance rate (the chance that a tampered image is falsely accepted as legitimate) and the false rejection rate (the chance that a legitimate image is falsely detected as tampered), using 30,000 test target images derived from 15 sample images. The channel settings remain the same, except that α and β are drawn uniformly at random from [0.8, 1.2] and [−20, 20], respectively, and the JPEG2000/JPEG reconstruction PSNR is selected from 30 to 42 dB.

Figure 4.9: Minimum rates for decoding the Slepian-Wolf bitstream S(Xq) with legitimate and tampered side information using different decoders. The decoder unaware of adjustment requires high rates once the target image is adjusted or tampered. This makes the decoder unaware of adjustment unable to distinguish tampered from legitimately adjusted images.

Figure 4.10 compares the receiver operating characteristic (ROC) curves for tampering detection of four decoders, obtained by sweeping the decision threshold T in the likelihood ratio test. The decoder unaware of adjustment has a high probability of false rejection, since it considers the contrast and brightness adjusted images to be tampered. The EM decoder, the oracle decoder, and the decoder using an exhaustive search of the parameters can confidently distinguish legitimately adjusted images from tampered ones. In the legitimate case, the EM decoder estimates α and β with mean squared errors 6.2 × 10⁻⁵ and 0.9, respectively.
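The ROC curves are produced by sweeping the decision threshold T of the likelihood ratio test. A toy sketch of this sweep, using hypothetical log-likelihood-ratio scores rather than the dissertation's actual decoder outputs:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical log-likelihood-ratio scores: higher means "more authentic"
legit = rng.normal(5.0, 1.0, size=10000)    # legitimate target images
tamper = rng.normal(0.0, 1.0, size=10000)   # tampered target images

# Sweep the decision threshold T of the likelihood ratio test
for T in (1.0, 2.5, 4.0):
    far = np.mean(tamper >= T)   # false acceptance: tampered image accepted
    frr = np.mean(legit < T)     # false rejection: legitimate image rejected
    print(f"T={T}: FAR={far:.4f}, FRR={frr:.4f}")
```

Each threshold yields one (FRR, FAR) operating point; sweeping T traces the full curve, and better decoders separate the two score populations so that both error rates can be small simultaneously.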
Both the EM and oracle decoders cleanly separate the minimum decodable rates.

Figure 4.10: ROC curves for different decoders. The target images have undergone random contrast and brightness adjustment and JPEG/JPEG2000 compression. The EM decoder and the exhaustive-search decoder, which tries parameter samples at intervals of 0.01 for α and 1 for β rounded from the ground truth, perform very close to the oracle decoder, while the decoder unaware of adjustment rejects authentic test images with high probability.

4.5.2 Affine Warping

Now we evaluate the performance of the affine-warping EM decoder on test images with affine warping adjustments. The first experiment measures the minimum decodable rates for rotated and sheared target images. The two-state channel in Figure 4.1 applies an affine warping adjustment to the images and crops them to 512×512. Then JPEG2000 or JPEG compression and reconstruction is applied at 30 dB reconstruction PSNR. In the illegitimate state, the malicious attack overlays a 20×122 pixel text banner randomly on the image. The image projection X is quantized to 4 bits, and the Slepian-Wolf encoder uses a 4096-bit LDPC code with 400 degree-1 syndrome nodes. Figure 4.11 compares the minimum rates for decoding S(Xq) with legitimate test images using three different decoding schemes: the EM decoder that learns the affine parameters, an oracle decoder that knows the parameters, and a decoder unaware of adjustment that always assumes no adjustment. Figures 4.11(a) and (b) show the results when the affine warping adjustments are rotation around the image center and horizontal shearing, respectively.
The EM decoder requires minimum rates only slightly higher than the oracle decoder, while the decoder unaware of adjustment requires increasingly higher rates as the adjustment grows.

Figure 4.11: Minimum rates for decoding the authentication data using legitimate adjusted test images as side information, for different decoders. (a) The test images have undergone rotation. (b) The test images have undergone horizontal shearing. The EM decoder requires minimum rates only slightly higher than the oracle decoder, while the decoder unaware of adjustment requires increasingly higher rates as the adjustment grows.

For the next experiment, we set the authentication data size to 250 bytes and measure the false acceptance and rejection rates. The acceptance decision is based on the likelihood ratio of Xq and y with the estimated parameters, within the estimated cropped-in blocks. The channel settings remain the same, except that the transform parameters A11 and A22 are randomly drawn from [0.95, 1.05], A21 and A12 from [−0.1, 0.1], and b1 and b2 from [−10, 10]. The JPEG2000/JPEG reconstruction PSNR is selected from 30 to 42 dB. With 15,000 trials, Figure 4.12 shows the ROC curves created by sweeping the decision threshold of the likelihood ratio test. The EM decoder performance is very close to that of the oracle decoder, while the decoder unaware of adjustment rejects authentic test images with high probability.
The exhaustive-search decoder, which tries parameter samples at intervals of 0.01 for A and 1 for b rounded from the ground truth, also suffers from a high probability of false rejection due to the inaccurate parameters used. In the legitimate case, the EM decoder estimates the transform parameters A11, A21, A12, A22, b1, and b2 with mean squared errors 6.0 × 10⁻⁷, 4.1 × 10⁻⁶, 4.2 × 10⁻⁷, 1.6 × 10⁻⁶, 0.06, and 0.69, respectively.

Figure 4.12: ROC curves for different decoders. The target images have undergone random affine warping adjustment and JPEG/JPEG2000 compression. The EM decoder performance is very close to that of the oracle decoder, while the decoder unaware of adjustment rejects authentic test images with high probability. The exhaustive-search decoder, which tries parameter samples at intervals of 0.01 for A and 0.1 for b rounded from the ground truth, also suffers from a high probability of false rejection due to the inaccurate parameters used.

4.5.3 Contrast, Brightness, and Affine Warping Adjustment

We set the authentication data size to 250 bytes and measure the false acceptance and rejection rates. The acceptance decision is based on the likelihood of Xq and y with the estimated parameters, within the estimated cropped-in blocks. The channel settings remain the same, except that the parameter α is randomly drawn from [0.9, 1.1], β from [−10, 10], A11 and A22 from [0.95, 1.05], A21 and A12 from [−0.05, 0.05], and b1 and b2 from [−10, 10]. The JPEG2000/JPEG reconstruction PSNR is selected from 30 to 42 dB. With 15,000 trials, Figure 4.13 shows the ROC curves created by sweeping the decision threshold of the likelihood ratio test. The EM decoder performance is very close to that of the oracle decoder, while the decoder unaware
of adjustment rejects authentic test images with high probability. The exhaustive-search decoder, which tries parameter samples at intervals of 0.01 for A and α, 0.1 for β, and 1 for b rounded from the ground truth, also suffers from a high probability of false rejection due to the inaccurate parameters used. In the legitimate case, the EM decoder estimates the transform parameters A11, A21, A12, A22, b1, b2, α, and β with mean squared errors 4.5 × 10⁻⁷, 2.6 × 10⁻⁶, 3.4 × 10⁻⁷, 1.6 × 10⁻⁶, 0.05, 0.54, 2.0 × 10⁻⁵, and 0.34, respectively.

Figure 4.13: ROC curves for different decoders. The target images have undergone random contrast, brightness, and affine warping adjustments and JPEG/JPEG2000 compression. The EM decoder performance is very close to that of the oracle decoder, while the decoder unaware of adjustment rejects authentic test images with high probability. The exhaustive-search decoder, which tries parameter samples at intervals of 1 for b and 0.01 for the others rounded from the ground truth, also suffers from a high probability of false rejection due to the inaccurate parameters used.

4.6 Summary

The image authentication system using distributed source coding has been extended to be robust against contrast, brightness, and affine warping adjustment. The system now decodes the Slepian-Wolf bitstream and estimates the adjustment parameters using an EM algorithm. Experimental results demonstrate that the system can distinguish legitimate encodings of authentic images from illegitimately modified versions, despite arbitrary contrast, brightness, and affine warping adjustments, using authentication data of less than 250 bytes per image.
With accurate parameter estimation within a manageable number of iterations, our system outperforms the exhaustive-search decoder. Our system can now decode the Slepian-Wolf bitstream in the authentication data using legitimate target images that might have undergone contrast, brightness, and affine warping adjustments. The next chapter will present a method that decodes the authentication data at low rates even with tampered target images. This enables the system to localize the tampering.

Chapter 5: Tampering Localization

The previous chapter extended the authentication system to be robust against various legitimate adjustments. The authentication system selects an authentication data rate such that the data can only be correctly decoded using authentic (possibly adjusted) target images as side information; decoding fails as a result of tampering. However, localization of the tampering requires reconstructing the original image projection using the tampered image as side information. As shown in the previous chapters, using legitimate editing models to decode the authentication data with tampered side information would require a high authentication data rate. An alternative to delivering the original image projection is to use conventional coding, which makes the data size independent of the target images. However, a better solution using distributed source coding can be obtained by leveraging the correlation between the original image and slightly tampered target images. This chapter presents an augmented decoder that can localize the tampering in those images deemed inauthentic, by using the sum-product algorithm [89] over a factor graph that represents tampering models. Section 5.1 formulates the localization problem using a space-varying two-state channel. Section 5.2 describes the factor graph representation of the localization decoder.
The decoder can reconstruct the image projection from the localization data using tampered images as side information. Section 5.3 presents spatial models that exploit the spatial correlation of the tampering. Section 5.4 extends the decoder to localize the tampering in tampered images that have also undergone legitimate contrast and brightness adjustment. Simulation results in Section 5.5 demonstrate that the authentication system can localize the tampering with high probability and that the spatial models offer additional improvements.

5.1 Space-Varying Two-State Channel

We introduce the space-varying two-state channel shown in Figure 5.1 to replace the two-state channel for the tampering localization problem. In the legitimate state, the channel output is legitimate editing, such as JPEG2000 compression and reconstruction. The tampered state additionally includes malicious tampering. The channel state variable Si is defined per nonoverlapping 16×16 block of the image y. If any pixel in block Bi is part of the tampering, Si = 1; otherwise, Si = 0. The authentication problem discussed in the previous chapters is a decision on whether Σi Si > 0; the tampering localization problem can be formulated as deciding on Si for each block, given the Slepian-Wolf bitstream S(Xq). Figure 5.2(c) shows the channel states overlaid on a tampered target image. The red blocks are tampered, and the others are legitimate.

Figure 5.1: Space-varying two-state lossy channel. The image is divided into nonoverlapping blocks. Each block has an associated channel state indicating whether the block is tampered or legitimate.

Given the quantized original image projection Xq and the target image projection Y, one can infer the channel state S using Bayes’ theorem:

$$P(S \mid X_q, Y) = \frac{P(X_q, Y \mid S)\, P(S)}{P(X_q, Y)}. \qquad (5.1)$$
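Per-block channel state inference via (5.1) can be sketched with a toy observation model: a Gaussian likelihood around the side information in the legitimate state and a uniform likelihood in the tampered state. The noise level, prior, and 8-bit range are illustrative assumptions:

```python
import numpy as np

def state_posterior(x_rec, y_proj, sigma_z=4.0, p_tamper=0.1):
    """Posterior P(S_i = 1 | X_q(i), Y(i)) via Bayes' rule (5.1).
    Legitimate (S_i = 0): Gaussian likelihood around the side information;
    tampered (S_i = 1): uniform over the 8-bit range (toy model)."""
    lik_legit = (np.exp(-0.5 * ((x_rec - y_proj) / sigma_z) ** 2)
                 / (sigma_z * np.sqrt(2 * np.pi)))
    lik_tamper = 1.0 / 256.0
    num = lik_tamper * p_tamper
    return num / (num + lik_legit * (1 - p_tamper))

print(state_posterior(100.0, 101.0))   # projections agree: low tampering posterior
print(state_posterior(100.0, 160.0))   # projections disagree: posterior near 1
```

Blocks whose reconstructed projection disagrees strongly with the target projection are pushed toward S_i = 1, which is the behavior the factor graph of the next section embeds into LDPC decoding.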
Figure 5.2: The target image in (b) is a tampered version of the original image in (a). The image in (c) shows the overlaid channel state for each 16×16 block. The red blocks are tampered, and the others are legitimate.

The localization decoder requires more information than the authentication decoder, since it additionally estimates the channel states, and a tampered image is usually less correlated with Xq than an authentic one. Fortunately, since we use rate-adaptive LDPC codes [188], the localization decoder reuses the authentication data. Incremental localization data are sent through the Slepian-Wolf bitstream S(Xq). In addition, the LDPC decoder can naturally adopt the channel state inference in (5.1). The next section introduces the decoder factor graph that connects the LDPC decoding to the channel state inference. The sum-product algorithm over the factor graph simultaneously decodes the Slepian-Wolf bitstream and localizes the tampering.

5.2 Decoder Factor Graph

A factor graph [89] is a bipartite graphical model that represents a factorization of a joint probability distribution of random variables. There are two classes of nodes: factor nodes and variable nodes. The variable nodes represent the random variables of interest; the factor nodes represent the probabilistic relationships among the adjacent variable nodes. Based on the factor graph representation, the sum-product algorithm efficiently marginalizes the approximate joint distribution for all variables. The sum-product algorithm has emerged in many applications in coding, statistical filtering, and artificial intelligence. We apply this technique to our tampering localization problem. The factor graph in Figure 5.3 shows the relationships among the Slepian-Wolf bitstream (LDPC syndromes), the bits of the image projection Xq (3-bit quantization in this example), the side information, and the channel states.
The variable nodes of interest are the bits $[X_q^1(i), X_q^2(i), X_q^3(i)]$, which form the binary representation of the 3-bit quantized image projection $X_q(i)$, and the channel states $S_i$. The factor attached to each syndrome node is an indicator function of the satisfaction of that syndrome constraint. The factor $f_b^i(X_q(i), S_i) = P(X_q(i) \mid Y(i); S_i)$ represents the relationship between the image projection $X_q(i)$, the side information $Y(i)$, and the channel state $S_i$. When $S_i = 0$, $f_b^i(X_q(i), 0)$ is proportional to the integral of a Gaussian distribution with mean $Y(i)$ and a fixed variance $\sigma_z^2$ over the quantization interval of $X_q(i)$. When $S_i = 1$, $f_b^i(X_q(i), 1)$ is uniform. The factor connected to each state node, $f_s^i(S_i) = P(S_i)$, is the a priori probability of the channel state.

The localization decoder applies the sum-product algorithm [89] on the factor graph to estimate each channel state likelihood $P(S_i = 1)$ and decode the Slepian-Wolf bitstream $S(X_q)$. Decoding is initialized with the syndrome node values $S(X_q)$ and the side information $Y$ embedded in $f_b^i$. In each iteration, the state node $S_i$ passes its belief message to the factor node $f_b^i$:

u^i_{s \to f_b}(s) = f_s^i(s) = P(S_i = s).

The factor node $f_b^i$ then summarizes all incoming messages and generates the outgoing messages as follows:

u^i_{f_b \to s}(s) \propto \sum_{x_1, x_2, x_3} f_b^i(x_1, x_2, x_3, s) \prod_{k=1}^{3} u^{(i,k)}_{x \to f_b}(x_k)

u^{(i,j)}_{f_b \to x}(x_j) \propto \sum_{\{s, x_1, x_2, x_3\} \setminus x_j} f_b^i(x_1, x_2, x_3, s) \, u^i_{s \to f_b}(s) \prod_{k \in \{1,2,3\} \setminus j} u^{(i,k)}_{x \to f_b}(x_k),

where $u^{(i,k)}_{x \to f_b}$ is the belief message from the bit node $X_q^k(i)$ (the kth most significant bit of $X_q(i)$) to the factor node $f_b^i$, and $u^{(i,k)}_{f_b \to x}$ is the opposite. The messages are normalized so that $\sum_x u(x) = 1$. The decoder takes the messages $u^{(i,j)}_{f_b \to x}$ and performs one iteration of LDPC decoding to yield updated $u^{(i,j)}_{x \to f_b}$ for the next iteration.
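The factor node update above can be sketched directly. The code below assumes a 3-bit uniform quantizer, a Gaussian legitimate-state likelihood, and illustrative step-size and noise parameters; it computes the message to the state node and the extrinsic messages to the three bit nodes.

```python
import math
from itertools import product

def gauss_bin_mass(xq, y, step, sigma):
    """Gaussian mass of N(y, sigma^2) over the quantization bin of index xq."""
    cdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(((xq + 1) * step - y) / sigma) - cdf((xq * step - y) / sigma)

def factor_node_messages(u_x, u_s, y, step=32, sigma=4.0):
    """One update at f_b^i. u_x: three bit beliefs [P(b=0), P(b=1)];
    u_s: state belief [P(S=0), P(S=1)]; y: side information value.
    Returns (message to the state node, extrinsic messages to the 3 bit nodes)."""
    def f(bits, s):
        xq = bits[0] * 4 + bits[1] * 2 + bits[2]          # MSB-first binary value
        return gauss_bin_mass(xq, y, step, sigma) if s == 0 else 1.0 / 8
    # message to the state node: marginalize the bits with their incoming beliefs
    m_s = [sum(f(b, s) * u_x[0][b[0]] * u_x[1][b[1]] * u_x[2][b[2]]
               for b in product((0, 1), repeat=3)) for s in (0, 1)]
    m_s = [v / sum(m_s) for v in m_s]
    # messages to each bit node: marginalize out the state and the other bits
    m_x = []
    for j in range(3):
        m = [0.0, 0.0]
        for b in product((0, 1), repeat=3):
            for s in (0, 1):
                w = f(b, s) * u_s[s]
                for k in range(3):
                    if k != j:
                        w *= u_x[k][b[k]]
                m[b[j]] += w
        m_x.append([v / sum(m) for v in m])
    return m_s, m_x
```

With uninformative bit beliefs and side information near the center of one bin, the outgoing bit messages concentrate on the binary representation of that bin.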
The a priori probability of the channel states, $f_s^i$, is re-estimated in each iteration as follows:

f_s^i(s) = \frac{1}{N} \sum_{i=1}^{N} u^i_{f_b \to s}(s).

The message-passing iterations terminate when the hard decisions on the bits of $X_q$ satisfy the constraints imposed by the syndrome $S(X_q)$. The summary of all the incoming messages at each state node $S_i$ yields the marginal probability of the channel state:

P_{app}(S_i = s) \propto u^i_{f_b \to s}(s) \, u^i_{f_s \to s}(s).

Finally, each block $B_i$ of y is declared to be tampered if the marginal probability $P_{app}(S_i = 1) > T$, a fixed decision threshold.

Figure 5.3: Factor graph for the localization decoder. The syndrome nodes are indicator functions of the satisfaction of the syndrome constraints. The factor node connecting $X_q(i)$ and $S_i$ is the conditional probability $P(X_q(i) \mid Y(i), S_i)$. The factor node $f_s^i$ represents the a priori probability of the channel state.

Here, we assume that the channel states are independent of each other. In most cases, however, the channel states are correlated with the adjacent channel states due to the contiguity of tampering; Figure 5.2(c) shows one example. The next section describes spatial models on the channel states that exploit the spatial contiguity of tampered regions.

5.3 Spatial Models for State Nodes

Figure 5.4 depicts a modified decoder factor graph in which the spatial correlation of the channel states is captured by one of two spatial models: a 1D or a 2D Markov model. The independent model of the previous section (Figure 5.3) factorizes the joint distribution of the channel states as $P(S) = \prod_i P(S_i)$, assuming each channel state is independent of the others.
This section considers a 1D Markov model that gives $P(S) = P(S_1) \prod_i P(S_i \mid S_{i-1})$, and a 2D model with $P(S) = P(S_{boundary}) \prod_i P(S_i \mid S_{i-1}, S_{i-w}, S_{i-w-1})$, where $S_{boundary}$ denotes the channel states located at the left and top image boundaries and $w$ is the number of image projections in a row. The factor graphs of the 1D and 2D models are shown in Figure 5.5.

Figure 5.4: Factor graph for the localization decoder with spatial models. The factor nodes $f_b^i$ pass belief messages of the channel states to the spatial models, which capture the contiguity of tampered blocks. The spatial models then return the channel state beliefs back to the factor nodes $f_b^i$.

The spatial models receive messages $u^i_{f_b \to s}$ from the factor nodes $f_b^i$, and reply with messages $u^i_{s \to f_b}$. The decoder for the 1D Markov model in Figure 5.5(a), parameterized by the probability $f_t(S_i, S_{i-1}) = P(S_i \mid S_{i-1})$, achieves this with one iteration of the Baum-Welch algorithm [19], that is,

Forward recursion: u^{i+1}_{f_t \to s}(s) \propto \sum_{s_i} P(s \mid s_i) \, u^i_{f_t \to s}(s_i) \, u^i_{f_b \to s}(s_i)

Backward recursion: v^{i-1}_{f_t \to s}(s) \propto \sum_{s_i} P(s_i \mid s) \, v^i_{f_t \to s}(s_i) \, u^i_{f_b \to s}(s_i).

The 1D spatial model then returns the messages $u^i_{s \to f_b}(s) \propto u^i_{f_t \to s}(s) \, v^i_{f_t \to s}(s)$, normalized so that $u^i_{s \to f_b}(0) + u^i_{s \to f_b}(1) = 1$.

The 2D Markov random field in Figure 5.5(b) is parameterized by the probability $f_t(S_i, S_{i-1}, S_{i-w}, S_{i-w-1}) = P(S_i \mid S_{i-1}, S_{i-w}, S_{i-w-1})$, and so employs a modified Baum-Welch iteration similar to that of [187]. The forward and backward recursions are

u^i_{f_t \to s}(s) = \sum_{s_{i-1}, s_{i-w}, s_{i-w-1}} P(s \mid s_{i-1}, s_{i-w}, s_{i-w-1}) \prod_{j \in \{i-1, i-w, i-w-1\}} u^j_{f_t \to s}(s_j) \, u^j_{f_b \to s}(s_j)

v^i_{f_t \to s}(s) = \sum_{s_{i+1}, s_{i+w}, s_{i+w+1}} P(s_{i+w+1} \mid s_{i+1}, s_{i+w}, s) \prod_{j \in \{i+1, i+w, i+w+1\}} v^j_{f_t \to s}(s_j) \, u^j_{f_b \to s}(s_j).
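A minimal sketch of the 1D forward-backward sweep described above, assuming a symmetric transition probability `p_stay` (an illustrative parameter, not a value from the system):

```python
def markov_1d_update(u_in, p_stay=0.9):
    """One Baum-Welch-style sweep over a 1-D chain of state beliefs.
    u_in[i] = [P(S_i = 0), P(S_i = 1)] received from the factor nodes f_b^i.
    Transition: P(S_i = s | S_{i-1} = s') = p_stay if s == s', else 1 - p_stay.
    Returns the extrinsic messages u_{s -> f_b} for each node."""
    n = len(u_in)
    P = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    fwd = [[0.5, 0.5] for _ in range(n)]
    bwd = [[0.5, 0.5] for _ in range(n)]
    for i in range(1, n):                      # forward recursion
        m = [sum(P[sp][s] * fwd[i - 1][sp] * u_in[i - 1][sp] for sp in (0, 1))
             for s in (0, 1)]
        z = sum(m); fwd[i] = [v / z for v in m]
    for i in range(n - 2, -1, -1):             # backward recursion
        m = [sum(P[s][sn] * bwd[i + 1][sn] * u_in[i + 1][sn] for sn in (0, 1))
             for s in (0, 1)]
        z = sum(m); bwd[i] = [v / z for v in m]
    out = []
    for i in range(n):                         # combine forward and backward beliefs
        m = [fwd[i][s] * bwd[i][s] for s in (0, 1)]
        z = sum(m); out.append([v / z for v in m])
    return out
```

A weakly tampered-looking block surrounded by strongly tampered neighbors is pulled toward the tampered state, which is exactly the contiguity effect the spatial model is meant to exploit.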
The resulting message $u^i_{s \to f_b}$ is again given by $u^i_{s \to f_b}(s) \propto u^i_{f_t \to s}(s) \, v^i_{f_t \to s}(s)$, normalized such that $u^i_{s \to f_b}(0) + u^i_{s \to f_b}(1) = 1$.

In summary, the decoder runs LDPC decoding of the Slepian-Wolf bitstream $S(X_q)$ using the side information $Y$ and yields the beliefs of the bit nodes of $X_q$. The decoder then generates the beliefs of the channel states based on $f_b^i = P(X_q \mid Y, S)$. These belief messages pass through one of the spatial models (the independent model of the previous section, 1D, or 2D). The returned channel state belief messages summarize all the incoming messages from the factor nodes $f_b^i$ under the spatial model. The factor nodes $f_b^i$ then update the belief messages of each bit node of $X_q$ using the new channel state beliefs and the side information $Y$. The LDPC decoder takes the updated bit belief messages into the next iteration.

Figure 5.5: Factor graphs of spatial models for the channel states: (a) 1D Markov chain, (b) 2D Markov random field.

An extension of the update in the factor nodes $f_b^i$ also accounts for side information that has undergone some adjustment, by using an EM algorithm as described in the previous chapter. The next section discusses tampering localization in an inauthentic image that has undergone such adjustments.

5.4 Tampering Localization for Contrast and Brightness Adjusted Images

The authentication of images that have undergone global adjustments was discussed in the previous chapter. The same challenge arises in the tampering localization problem: a localization decoder that is unaware of the adjustment will deem legitimately adjusted image blocks to be tampered, which makes the tampering localization result useless.
This section presents our solution, which combines the localization decoder with an EM algorithm to learn the adjustment parameters. From the perspective of the EM algorithm, the parameter estimation additionally uses the channel state information to learn the adjustment parameters; from the perspective of the localization decoder, the factor graph nodes $f_b^i$ compensate the side information for the adjustment using the estimated parameters. Though this section only describes the localization decoder for contrast and brightness adjustment, the same principle applies to other adjustments.

Figure 5.6 shows an extended model of Figure 5.1 that additionally includes a global contrast and brightness adjustment. The tampering localization system described in the previous sections is not robust against contrast and brightness adjustment. Because the random image projection is linear, the affine relationship is preserved in the projection domain; that is, $Y = \alpha X + \beta + Z$. A localization decoder unaware of this adjustment will therefore provide useless tampering localization information. This section describes an extended localization decoder that can correctly localize the tampering in contrast and brightness adjusted images using an EM algorithm.

Figure 5.6: Space-varying two-state lossy channel with contrast and brightness adjustment. The target image is now affected by a global contrast and brightness adjustment.

The introduction of learning to the tampering localization system only requires that the Slepian-Wolf decoder block of Figure 5.4 be embedded within the contrast-and-brightness-learning loop of Figure 5.7. As before, the decoder takes the Slepian-Wolf bitstream $S(X_q)$ and the side information $Y$, yields the reconstructed image projection $X_q$, and estimates the channel states $S_i$.
But it now does this via an EM algorithm that updates the a posteriori probability mass functions (pmfs) $P_{app}(X_q)$ and $P_{app}(S_i)$ in the E-step and updates the contrast $\alpha$ and brightness $\beta$ by maximum likelihood estimation in the M-step.

Figure 5.7: Contrast and brightness learning Slepian-Wolf decoder for tampering localization. The decoder decodes the Slepian-Wolf bitstream $S(X_q)$ using the side information $Y$ compensated with the previously estimated contrast and brightness adjustment parameters. Each iteration produces a soft estimate of $X_q$ and the channel states $S$ in the E-step and updates the contrast and brightness adjustment parameters in the M-step.

In the E-step, the information in the a priori pmfs and the Slepian-Wolf bitstream $S(X_q)$ is combined via one iteration of the sum-product algorithm over the localization decoder factor graph (Figure 5.3 in Section 5.2 or Figure 5.4 in Section 5.3). This produces the a posteriori pmfs of the image projection pixels, $Q_i(x_q) = P_{app}(X_q(i) = x_q)$, and of the channel states, $P_{app}(S_i = 0)$, denoted as $w(i)$.
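Under the weighted linear model used in the M-step, the $\alpha$ and $\beta$ updates amount to a weighted least-squares fit on the posterior moments produced by the E-step. The sketch below assumes those moments and weights are already available; the helper names and the fixed iteration count are illustrative assumptions, not the exact implementation.

```python
def m_step(Y, mu_x, mu_x2, w, iters=30):
    """M-step sketch: estimate contrast alpha and brightness beta.
    Y[i]: side information; mu_x[i], mu_x2[i]: posterior first and second
    moments of X(i) from the E-step; w[i] = P(S_i = 0), the belief that
    block i is untampered (tampered blocks are down-weighted)."""
    W = sum(w)
    alpha, beta = 1.0, 0.0
    for _ in range(iters):  # fixed-point sweep (moments held fixed in this sketch)
        num = sum(wi * mx * yi for wi, mx, yi in zip(w, mu_x, Y)) \
            - sum(wi * mx for wi, mx in zip(w, mu_x)) \
            * sum(wj * yj for wj, yj in zip(w, Y)) / W
        den = sum(wi * m2 for wi, m2 in zip(w, mu_x2)) \
            - sum(wi * mx for wi, mx in zip(w, mu_x)) ** 2 / W
        alpha = num / den
        beta = sum(wi * (yi - alpha * mx) for wi, yi, mx in zip(w, Y, mu_x)) / W
    return alpha, beta
```

In the full algorithm the moments themselves depend on $(\alpha, \beta)$, which is why the update is iterated; with the moments held fixed, as here, a single pass already reaches the weighted regression solution.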
The a priori pmf over channel states is denoted by φs (s). Setting partial derivatives of L̂(α, β) with respect to α and β to zero, we obtain the optimality conditions1 : " " w(i)µix Y (i) − i∈C j∈C w(i)w(j)µixY (j) α= " " " W i∈C wi µix2 − i∈C j∈C w(i)w(j)µix µjx 1 & β= w(i)(Y (i) − αµix ) W i∈C W " i∈C φs (0) = W/|C|, and φs (1) = 1 − φs (0), & where W = w(i) i∈C µix = & Qi (xq )E[X(i)|q(X(i)) = xq , Y (i); α, β] xq µix2 = & xq 1 Qi (xq )E[X(i)2 |q(X(i)) = xq , Y (i); α, β] Appendix B discusses the concavity of L̂(α, β) to claim the optimal conditions. CHAPTER 5. TAMPERING LOCALIZATION 75 Since both the left and right hand sides of the optimality conditions contain α and β, we update them iteratively until convergence or at most 30 iterations. The outer loop of EM iterations terminates when hard decisions on Papp (Xq (i) = xq ) satisfy the constraints imposed by S(Xq ). Finally, each block Bi is declared to be tampered if Papp (Si = 1) > T , a fixed decision threshold. Note that by setting wi = 1 for all i, this result reduces to the EM algorithm for the authentication problem. The localization system now can localize the tampering in contrast and brightness adjusted tampered images. The same framework can also apply to the other adjustments by including the channel state estimations into parameter estimation. 5.5 Simulation Results This section shows the simulation results of the tampering localization decoder. In practice, the localization decoder would only run if the authentication decoder deems an image to be tampered, so we test the tampering localization system only with maliciously tampered images. Section 5.5.1 describes the simulation setup. Section 5.5.2 shows the minimum rates for successful decoding for various tampered images using different spatial models. In Section 5.5.3, we measure the failure rates of tampering localization. 5.5.1 Setup We use test images in the simulation, shown in Appendix A, at 512×512 resolution in 8-bit gray resolution. 
The space-varying two-state channel in Figure 5.1 applies JPEG2000 or JPEG compression and reconstruction at several qualities above 30 dB. The malicious tampering consists of overlaying up to five text banners of different sizes at random locations in the image. The text banner sizes are 198x29, 29x254, 119x16, 16x131, and 127x121 pixels. The text color is white or black, depending on which is more visible, again avoiding trivial attacks such as overlaying white text on a white area.

5.5.2 Decodable Rate

We first compare the minimum localization data rate required by the localization decoder. Figure 5.8 shows the Slepian-Wolf bitstream components $S(X_q)$ of these rates (in bits per pixel of the original image x) for Lena with $X_q$ quantized to 4 bitplanes. All five text banners are placed for malicious tampering, because greater tampering is more easily detected but makes localization more difficult. The placement is random over 100 trials, leading to tampering of 12% to 17% of the nonoverlapping 16x16 blocks of the original image x. To successfully decode the localization data using tampered target images as side information, the DSC localization decoders using spatial models (independent, 1D, and 2D) require much lower localization data rates than the DSC decoder using a legitimate model, whose rate is close to that of conventional fixed length coding. The DSC localization decoders using spatial models reduce the localization data size by around 65% compared to conventional fixed length coding. The 1D and 2D spatial models offer additional savings of 12% and 15%, respectively, compared to the independent spatial model. The required localization rate is roughly three times the required authentication rate.
Since we use rate-adaptive LDPC codes [188], the localization decoder re-uses the authentication data and only requires the incremental localization data rates (the gaps to the authentication data rate) to discover not only the location of the tampering but also its magnitude. In the worst case over all trials, the largest bitstream sizes are 232, 208, and 192 bytes for the independent, 1D, and 2D spatial models, respectively.

5.5.3 Receiver Operating Characteristic

Using these Slepian-Wolf bitstream sizes, we measure various failure rates. Since our system is block based, we measure the false rejection rate by counting the falsely deemed tampered blocks, and the false acceptance rate by counting the undetected tampered pixels. The rate of falsely deemed tampered blocks is the proportion of untampered blocks mistaken for tampered blocks. The rate of undetected tampered pixels is the proportion of tampered pixels accepted as untampered. Figure 5.9 shows the receiver operating characteristic (ROC) curves of the tampering localization decoders.

Figure 5.8: Minimum localization data rates for decoding $S(X_q)$ using tampered side information, compared to the authentication data rates. The decoder with a legitimate model requires a high rate, close to that of conventional fixed length coding. The decoders with spatial models give around 65% rate savings compared to fixed length coding. The 1D and 2D spatial models offer additional savings of 12% and 15%, respectively, compared to the independent spatial model.
Using rate-adaptive LDPC codes, only the incremental localization data rates (the gaps to the authentication data rate) are sent.

The ROC curves are measured for the decoders using spatial models, with $X$ quantized to 4 bits, as the decision threshold $T$ varies. The rates of falsely deemed tampered blocks can reach zero while keeping the undetected tampered pixel rates near 2%, since most of the blocks falsely deemed untampered have only a few tampered pixels.

Figure 5.9: ROC curves of the tampering localization decoders using spatial models. The rates of falsely deemed tampered blocks can reach zero, while keeping the undetected tampered pixel rates at about 2%, since most of the blocks falsely deemed untampered have only a few tampered pixels. In most cases, the 1D and 2D spatial models achieve a lower undetected tampered pixel rate at a given falsely deemed tampered block rate.

We now test the EM localization decoder by additionally applying contrast and brightness adjustment to the same set of test images. The contrast and brightness adjustment parameters $\alpha$ and $\beta$ are drawn uniformly at random from [0.8, 1.2] and [-20, 20], respectively. The Slepian-Wolf bitstream is set to 278 bytes, the largest minimal Slepian-Wolf data size of the EM decoder among 50 training trials. We compare the performance of the EM localization decoder that learns $\alpha$ and $\beta$ from initial values 1 and 0, the oracle decoder that knows $\alpha$ and $\beta$, and the decoder unaware of the adjustment, which assumes $\alpha = 1$ and $\beta = 0$. The independent, 1D, and 2D spatial models are applied to all decoders for testing. Figure 5.10 plots these ROC curves as the decision threshold $T$ varies.
The results indicate that the localization performance of the EM decoder is close to that of the oracle decoder, while the decoder unaware of the adjustment has high rates of falsely deemed tampered blocks due to the inconsistency in the contrast and brightness parameters. The 1D and 2D spatial models offer additional improvement for the EM decoder.

Figure 5.10: ROC curves of the tampering localization decoders facing contrast and brightness adjusted images. The localization performance of the EM decoder is close to that of the oracle decoder, while the decoder unaware of the adjustment has high rates of falsely deemed tampered blocks due to the inconsistency in the contrast and brightness parameters. The 1D and 2D spatial models offer additional improvement for the EM decoder.

5.6 Summary

The image authentication system using distributed source coding is extended to perform tampering localization in images already deemed to be tampered. The system decodes the authentication data plus incremental localization data using the sum-product algorithm over a factor graph representing the space-varying two-state channel model. The localization decoder can work jointly with EM algorithms to learn adjustment parameters for images that have undergone legitimate adjustments. Simulation results demonstrate that the system can decode the Slepian-Wolf bitstream at a low rate even when the side information is tampered and the contrast and brightness have been adjusted.
The 1D and 2D spatial models, which capture the contiguity of tampering, additionally reduce the localization data size by 12% to 15% and offer better localization performance compared to the independent model.

Chapter 6

Video Quality Monitoring Using Distributed Source Coding

Digital video delivery involves coding the video content into a bitstream, transmitting the bitstream to end users, and reconstructing the video from the received (possibly transcoded or damaged) bitstream. Distortions are introduced in each of these steps. Lossy video coding compresses the video content into a small bitstream but induces compression distortion. Packets might be lost during transmission, especially over wireless links. The decoder at the end user tries to reconstruct the video content from the incoming packets, with error protection and concealment to mitigate the distortion due to packet loss. To ensure the quality of service of the whole video delivery system, the first step is to monitor the fidelity of the received video.

This chapter presents and investigates a reduced-reference video quality monitoring scheme using distributed source coding. Section 6.1 describes the scheme in detail and the rationale for using distributed source coding and maximum likelihood estimation of the received video quality. Section 6.2 provides a theoretical performance prediction based on the Cramér-Rao lower bound for maximum likelihood quality estimation. In Section 6.3, our approach is compared to the ITU-T J.240 Recommendation for remote PSNR monitoring, and the theoretical performance prediction is confirmed.

6.1 Video Quality Monitoring System

Figure 6.1 depicts the proposed quality monitoring system using distributed source coding. We denote the original video as y and the received video as x. Each user provides a video digest consisting of a Slepian-Wolf coded random projection of the received video.
The quality monitoring server uses the projection of the original video as side information to decode the video digest. It then analyzes the projections to estimate the quality in terms of reconstruction Peak Signal-to-Noise Ratio (PSNR). This architecture is advantageous for two reasons. First, the users are responsible only for Slepian-Wolf encoding, which is much less computationally demanding than Slepian-Wolf decoding. Second, Slepian-Wolf coding of the projection exploits the correlation between the projections of the original and received videos, yielding a much smaller digest than conventional coding. We first describe the overall operation of the system, leaving the details of the pseudorandom projection to Section 6.1.1 and the analysis methods to Section 6.1.2.

Figure 6.1: Proposed video quality monitoring scheme using distributed source coding. The video digest from the receivers consists of the random seed and the Slepian-Wolf coded quantized projection of the received video. The pseudorandom projection follows the ITU-T J.240 Recommendation described in Figure 6.2. The server uses the original video projection as side information to decode the incoming video digest and yields reconstructed video projections. The mean squared error between the original and the received video is estimated using the original and the reconstructed quantized projections.

The right-hand side of Figure 6.1 shows the user receiving video. The user applies a pseudorandom projection (based on a randomly drawn seed $K_s$) to the received video x and quantizes the projection coefficients $X$ to yield $X_q$. These quantized coefficients are then coded by a Slepian-Wolf encoder based on low-density parity-check (LDPC) codes [111, 112].
The user sends the Slepian-Wolf bitstream $S(X_q)$ as a video digest back to the quality monitoring server (shown on the left-hand side of Figure 6.1) through a secure channel. The user pseudorandomly generates a J.240 projection as an $N_b \times N_b$ block $P$ according to a seed $K_s$. The seed changes for each frame and is communicated to the quality monitoring server along with the Slepian-Wolf bitstream. For each nonoverlapping block $B_i$ of x, the inner product $\langle B_i, P \rangle$ is quantized into an element of $X_q$. The rate $R$ of the Slepian-Wolf bitstream $S(X_q)$ is determined by the joint statistics of $X_q$ and $Y$: if the conditional entropy $H(X_q \mid Y)$ exceeds the rate $R$, then $X_q$ can no longer be correctly decoded [167]. Therefore, we choose the rate $R$ to be just sufficient for decoding given x at the worst permissible quality.

Upon receiving the video digest, the quality monitoring server first projects the original video y into $Y$ using the same projections as the user. A Slepian-Wolf decoder reconstructs $X_q'$ from the Slepian-Wolf bitstream using $Y$ as side information. Decoding is via the LDPC message-passing algorithm [111, 112], initialized according to the statistics of the worst permissible degradation for the given original video. Finally, the quality monitoring server analyzes the reconstructed projection $X_q'$ and the projection $Y$ of the original video to estimate the video quality in terms of reconstruction PSNR.

6.1.1 J.240 Feature Extraction

For quality estimation, we use the projection defined in the feature extraction module (shown in Figure 6.2) of the J.240 Recommendation [80]. Each $N_b \times N_b$ block $B_i$ is whitened in both the spatial and Walsh-Hadamard Transform (WHT) domains, using pseudorandom number (PN) sequences s and t respectively, to yield a block $F_i$. Each element of s and t is either 1 or -1. From this block, a single feature pixel $F_i(k)$ is selected.
Casting $B_i$ and $F_i$ as 1-D vectors, we can write

F_i = \underbrace{H^{-1} T H S}_{G_P} B_i,

where $H$ is the WHT matrix (cast from the 2-D WHT), and $S$ and $T$ are diagonal whitening matrices with entries s and t, respectively. The projection $P$ that produces $F_i(k)$ is the kth row of $G_P$.

Figure 6.2: Random projection of the J.240 feature extraction module. An image or video frame is divided into nonoverlapping blocks. Each block is whitened in both the spatial and Walsh-Hadamard transform domains. A whitened pixel is selected as the feature pixel to represent the block for the PSNR estimation.

6.1.2 PSNR Estimation

In the ITU-T J.240 Recommendation, the estimated PSNR (ePSNR) between x and y is computed as follows:

eMSE_{J240} = \frac{Q_s^2}{N} \sum_{i=1}^{N} (X_q'(i) - Y_q(i))^2    (6.1)

ePSNR_{J240} = 10 \log_{10} \frac{255^2}{eMSE_{J240}},    (6.2)

where $N$ is the number of samples, $Y_q$ is the quantized version of $Y$, and $Q_s$ is the quantization step size of $Y_q$ and $X_q'$.

Since the quality monitoring server has complete information about $Y$, Valenzise et al. suggest reconstructing $X'$ based on MMSE estimation given $Y$ and $X_q'$, and then estimating the MSE using $Y$ and $X'$ [183]:

X'(i) = E[X \mid X_q'(i), Y(i)]    (6.3)

eMSE_{MMSE} = \frac{1}{N} \sum_{i=1}^{N} (X'(i) - Y(i))^2    (6.4)

ePSNR_{MMSE} = 10 \log_{10} \frac{255^2}{eMSE_{MMSE}}    (6.5)

In our system, we propose maximum likelihood estimation of the MSE between x and y directly from $X_q'$ and $Y$ as follows:

eMSE_{ML} = \frac{1}{N} \sum_{i=1}^{N} E[(X - Y(i))^2 \mid Y(i), X_q'(i)]    (6.6)

ePSNR_{ML} = 10 \log_{10} \frac{255^2}{eMSE_{ML}}    (6.7)

We will compare the quality estimation performance of these three estimators in Section 6.3. Compression of $X_q$ using distributed source coding is much more efficient than conventional coding.
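The whitening cascade $F_i = H^{-1} T H S B_i$ can be sketched in a few lines; the Sylvester construction of $H$ and the pure-list arithmetic below are implementation choices for illustration, not the J.240 reference implementation.

```python
def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def j240_feature(block, s, t, k):
    """One J.240-style feature: compute F_i = H^{-1} T H S B_i and return F_i(k).
    block, s, t are length-n lists (the 2-D block cast to 1-D); s, t in {+1, -1}."""
    n = len(block)
    H = hadamard(n)
    v = [si * bi for si, bi in zip(s, block)]                         # spatial whitening S
    v = [sum(H[r][c] * v[c] for c in range(n)) for r in range(n)]     # WHT
    v = [ti * vi for ti, vi in zip(t, v)]                             # WHT-domain whitening T
    v = [sum(H[r][c] * v[c] for c in range(n)) / n for r in range(n)] # inverse WHT, H^{-1} = H/n
    return v[k]
```

Because $H/\sqrt{n}$ is orthogonal and $S$, $T$ only flip signs, $G_P$ preserves the block energy, which is what makes a single selected feature pixel representative.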
Figure 6.3 depicts the distributions of $X$ and $X - Y$ for the first 100 frames of the Foreman sequence at CIF resolution, and shows that $X$ has a large variance whereas $X$ and $Y$ are highly correlated. We model $X \mid Y$ as a Gaussian with mean $Y$ and variance $\sigma_z^2$, which is unknown at the decoder but can be estimated.

Figure 6.3: Distributions of $X$ and $X - Y$ for the first 100 frames of the Foreman sequence at CIF resolution. The projection $X$ has a large variance, whereas $X$ and $Y$ are highly correlated.

6.2 Performance Prediction with the Cramér-Rao Lower Bound

The performance of MSE or PSNR estimation is related to the quantization of $X$ and the number of samples, denoted by $N$. A more precise representation of $X$, or more samples, yields lower estimation error, but requires a higher rate to deliver $X_q$. This section analyzes the trade-off between video digest size and estimation error for different configurations of block size and quantization. A more general information theoretic analysis has been given by Han et al. [66], where a lower (achievable) bound on coding $X$ is provided, but the converse part remains open. Although the analysis in this section is dedicated to our practical video quality monitoring system, the result can be applied to other systems that require variance estimation from quantized information. We first derive the performance prediction as a function of $N$ and the quantization of $X$ in Section 6.2.1, and then use synthesized data to confirm the result in Section 6.2.2.

6.2.1 Performance Prediction

Let $X(i)$ and $Y(i)$ be i.i.d. continuous random variables with the relationship $X(i) - Y(i) = Z(i) \sim N(0, \sigma_z^2)$, for $i = 1, \ldots, N$. The target parameter is $\theta \triangleq \sigma_z^2$. The case in which $X$ and $Y$ are available at the same terminal is well studied: one can estimate
$\theta$ locally and transmit the estimate $\hat{\theta}$ to the server. In our case, the remote terminals have access to $X$, but $Y$ is only available at the server. Here, we focus on the scheme that uses a scalar quantizer followed by a Slepian-Wolf encoder. Let $X_q(i)$ be the quantized version of $X(i)$ under a scalar quantizer $Q(\cdot)$, i.e., $X_q(i) = Q(X(i))$. The achievable rate $R_x$ for Slepian-Wolf coding is then $H(X_q \mid Y; \sigma_l^2)$, where $\sigma_l^2$ is the noise variance at the worst permissible quality. This leaves the problem of relating the estimation error to the quantizer and the number of samples.

Note that our maximum likelihood estimator of $\theta = \sigma_z^2$ in (6.6) is unbiased, i.e.,

E[\hat{\theta}_{ML}] = E\left[\frac{1}{N} \sum_{i=1}^{N} E_X[(X - Y(i))^2 \mid Y(i), X_q(i)]\right]    (6.8)
= \frac{1}{N} \sum_{i=1}^{N} E\left[E_X[(X - Y(i))^2 \mid Y(i), X_q(i)]\right]    (6.9)
= \frac{1}{N} \sum_{i=1}^{N} E\left[\sum_{x_q} E_X[(X - Y(i))^2 \mid Y(i), X_q(i) = x_q] \, P(X_q(i) = x_q \mid Y(i))\right]    (6.10)
= \frac{1}{N} \sum_{i=1}^{N} E\left[\sum_{x_q} P(X_q(i) = x_q \mid Y(i)) \, \frac{\int_{x: Q(x) = x_q} (x - Y(i))^2 f(x \mid Y(i)) \, dx}{P(X_q(i) = x_q \mid Y(i))}\right]    (6.11)
= \frac{1}{N} \sum_{i=1}^{N} E\left[\int_x (x - Y(i))^2 f(x \mid Y(i)) \, dx\right]    (6.12)
= \frac{1}{N} \sum_{i=1}^{N} E\left[E_X[(X - Y(i))^2 \mid Y(i)]\right]    (6.13)
= \theta,    (6.14)

where (6.9) is due to the linearity of expectation; (6.10), (6.11), and (6.13) follow from the definition of expectation; (6.12) is due to the concatenation of the integration intervals over all $x_q$; and (6.14) follows from iterated expectation.

According to the Cramér-Rao theorem [37, 148], the variance of any unbiased estimator is bounded below by the inverse of the Fisher information [133] of $\theta$:

I(\theta) = E\left[\left(\frac{\partial \log f(X_q, Y; \theta)}{\partial \theta}\right)^2\right] = \frac{N}{4\theta^4}\left(E\left[E^2[(X - Y)^2 \mid X_q, Y]\right] - \theta^2\right).

Note that here we assume $P(X \mid Y) \sim N(Y, \theta = \sigma_z^2)$. Therefore, we obtain the Cramér-Rao lower bound (CRLB) on the mean squared estimation error of any unbiased estimator:

E[(\hat{\theta} - \theta)^2] = \mathrm{Var}[\hat{\theta}] \ge \frac{1}{I(\theta)} = \frac{4\theta^4}{N(G - \theta^2)},    (6.15)

where $G = E[E^2[(X - Y)^2 \mid X_q, Y; \theta]]$ is a function of the quantizer.
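As an illustration of evaluating the estimator $\hat{\theta}_{ML}$ of (6.6), the conditional second moment over a quantization bin can be computed by numerical integration. The uniform quantizer, integration grid, and helper names below are assumptions made for the sketch.

```python
import math

def conditional_sq_error(y, xq, step, sigma, grid=2000):
    """E[(X - y)^2 | Y = y, Xq = xq], with X | Y = y ~ N(y, sigma^2)
    restricted to the bin [xq*step, (xq+1)*step); midpoint-rule integration."""
    lo, hi = xq * step, (xq + 1) * step
    num = den = 0.0
    for j in range(grid):
        x = lo + (j + 0.5) * (hi - lo) / grid
        f = math.exp(-0.5 * ((x - y) / sigma) ** 2)   # unnormalized Gaussian density
        num += (x - y) ** 2 * f
        den += f
    return num / den

def emse_ml(Y, Xq, step, sigma):
    """Maximum likelihood MSE estimate as in (6.6), and ePSNR as in (6.7)."""
    m = sum(conditional_sq_error(y, xq, step, sigma) for y, xq in zip(Y, Xq)) / len(Y)
    return m, 10 * math.log10(255 ** 2 / m)
```

When the bin is wide relative to sigma and the side information sits near the bin center, the conditional second moment approaches sigma squared, recovering the unquantized variance estimate.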
Note that the result in (6.15) generalizes the variance estimation error lower bound to quantized information. The null quantization function Q(X) = X gives G = 3θ², and (6.15) then yields

$$
E[(\hat{\theta} - \theta)^2] \ge \frac{2\theta^2}{N},
$$

which is exactly the CRLB of the unbiased Gaussian variance estimator. Since the log-likelihood function log f(X_q, Y; θ) is twice differentiable and E[∂ log f(X_q, Y; θ)/∂θ] = 0, by the Cramér-Rao theorem the maximum-likelihood estimator is efficient, i.e.,

$$
E[(\hat{\theta}_{ML} - \theta)^2] = \frac{4\theta^4}{N(G - \theta^2)}.
$$

The last step is to relate the estimation error in MSE to that in PSNR in dB. Recall that

$$
\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ (\mathrm{dB})
$$

for 8-bit gray level images with peak value 255. Assuming the estimate is close enough to the true value to use a first-order approximation, we obtain

$$
\Delta \mathrm{PSNR} \approx \frac{d\,\mathrm{PSNR}}{d\theta}\,\Delta\theta = -\frac{10}{\theta \ln 10}\,\Delta\theta,
\qquad
E\big[|e\mathrm{PSNR} - \mathrm{PSNR}|^2\big] \approx \left(\frac{10}{\theta \ln 10}\right)^2 E\big[|\hat{\theta} - \theta|^2\big].
$$

Hence, the approximation of the mean squared estimation error of ePSNR_ML is

$$
E\big[|e\mathrm{PSNR}_{ML} - \mathrm{PSNR}|^2\big] \approx \left(\frac{10}{\theta \ln 10}\right)^2 \frac{4\theta^4}{N(G - \theta^2)} = \frac{75.44 \times \theta^2}{N(G - \theta^2)} \ (\mathrm{dB}^2) \qquad (6.16)
$$

6.2.2 Synthesized Data Simulation

To confirm the performance prediction results of the previous subsection, we randomly generate X of size N uniformly from [0, 255], Z of size N according to N(0, σ_z²), and set Y = X + Z. X is further quantized into different numbers of bits to yield X_q. For each configuration, we generate 20,000 sets of data and measure the estimated MSE using maximum likelihood estimation. Figure 6.4 shows the estimation error in MSE for a varying number of samples N, number of bits in quantization, and choice of σ_z². These results confirm that the efficiency of maximum likelihood estimation matches the Cramér-Rao lower bound of (6.15) for various settings. Figure 6.5 plots the average squared PSNR estimation errors in dB² for different numbers of samples N and different numbers of bits in quantization.
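Equations (6.15) and (6.16) are straightforward to evaluate; a minimal sketch (function names are ours) that also recovers the classic unquantized bound 2θ²/N from G = 3θ²:

```python
import math

def crlb_mse(theta, n, g):
    """Cramer-Rao lower bound (6.15): E[(theta_hat - theta)^2] >= 4 theta^4 / (n (g - theta^2))."""
    return 4.0 * theta**4 / (n * (g - theta**2))

def predicted_psnr_err(theta, n, g):
    """Predicted squared PSNR estimation error (6.16), in dB^2."""
    return (10.0 / (theta * math.log(10.0)))**2 * crlb_mse(theta, n, g)

theta = 100.0                 # sigma_z^2, i.e. MSE = 100 (roughly 28 dB PSNR)
n = 6336                      # one CIF GOP of 16x16 block projections
g_null = 3.0 * theta**2       # null quantizer Q(X) = X gives G = 3 theta^2
print(crlb_mse(theta, n, g_null))            # equals 2 theta^2 / n
print(predicted_psnr_err(theta, n, g_null))  # equals 75.44 theta^2 / (n (G - theta^2)) dB^2
```

With a real quantizer, G must instead be estimated (e.g., by Monte Carlo over (X_q, Y)); coarser quantization shrinks G − θ² and so raises both bounds.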
The estimation errors are averaged over the ground truth σ_z² values which yield reconstruction PSNRs of {26, 28, …, 38} dB. The results indicate that the performance of the maximum likelihood estimation is close to the performance prediction in (6.16).

After confirming the prediction of the PSNR estimation performance with quantization, we are now interested in the overall trade-offs between the video digest data size and the PSNR estimation performance. We consider PSNR estimation of each 16-frame group of pictures (GOP) in 30 frames per second (fps) video sequences in CIF or QCIF format at the worst permissible quality of 26 dB. The quality monitoring scheme uses uniform quantization and the ITU-T J.240 feature extraction with 8×8, 16×16, or 32×32 block projections. This yields different numbers of samples to transmit. A lower bound on the trade-offs consists of (6.16) for the PSNR estimation performance and the conditional entropy H(X_q|Y) for the ideal Slepian-Wolf bit rates. This lower bound can be generated from the model without simulation data. For the synthesized data simulation, we run 100 trials for each configuration and plot the largest minimum decodable rate among the trials as the practical Slepian-Wolf bit
rate. The practical Slepian-Wolf decoder uses rate-adaptive LDPC codes [188] conditionally decoded as in [7].

Figure 6.4: These plots show the estimation errors of MSE using maximum likelihood estimation with different configurations: number of samples, choice of σ_z², and number of bits in quantization of X, for (a) N=1584, PSNR=26 dB; (b) N=6336, PSNR=26 dB; (c) N=1584, PSNR=38 dB; (d) N=6336, PSNR=38 dB. These results confirm the efficiency of maximum likelihood estimation and the derivation of the Cramér-Rao lower bound.

Figure 6.6 plots the average squared PSNR estimation error versus Slepian-Wolf bit rate for different block sizes as we vary the number of bits in quantization. It shows that the predicted performance closely matches the synthesized data simulation results. The results suggest using 16×16 block projections for video digest rates less than 50 kbps and 20 kbps for CIF and QCIF video sequences, respectively.
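The sample counts and digest rates behind these suggestions follow directly from the frame geometry; a small sketch (helper names are ours, and the fixed bits-per-sample cost is an assumption — Slepian-Wolf coding needs only about H(X_q|Y) bits per sample, which is lower):

```python
def samples_per_gop(width, height, block, gop=16):
    """One projection coefficient per block per frame,
    consistent with the N values quoted in the text."""
    return (width // block) * (height // block) * gop

def digest_rate_kbps(n_samples, bits_per_sample, fps=30.0, gop=16):
    """Digest rate if each quantized coefficient costs a fixed number of bits."""
    return n_samples * bits_per_sample * fps / gop / 1000.0

print(samples_per_gop(352, 288, 16))   # CIF with 16x16 blocks -> 6336 per GOP
print(samples_per_gop(176, 144, 16))   # QCIF with 16x16 blocks -> 1584 per GOP
print(digest_rate_kbps(6336, 4))       # ~47.5 kbps at an assumed 4 bits/sample
```

The CIF and QCIF counts reproduce N = 6336 and N = 1584 used throughout this section, and an assumed 4 bits per sample already lands under the 50 kbps figure quoted for CIF even before Slepian-Wolf savings.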
Figure 6.5: These plots show the estimation errors of PSNR using maximum likelihood estimation with different configurations: number of samples N and number of bits in quantization of X, for (a) N=1584 and (b) N=6336.

Figure 6.6: Average squared PSNR estimation error of a 16-frame GOP versus Slepian-Wolf bit rate for 30 fps video in (a) CIF and (b) QCIF formats, for 8×8, 16×16, and 32×32 block projections (simulation and model). The predicted performance using (6.16) and the conditional entropy H(X_q|Y) matches the synthesized data simulation results.

6.3 Experimental Results

We compare our quality monitoring scheme to various solutions based on the ITU-T J.240 Recommendation. We use original videos consisting of the first 160 frames of the Foreman, Football, News, Mobile, and Coastguard CIF video sequences at 30 frames per second (fps) for simulation. To create received videos, the video sequences are first compressed and reconstructed by H.264 with quantization parameters (QP) 21, 24, and 26 for I-, P-, and B-pictures, respectively. The group of pictures (GOP) coding structure is IBBPBBP and the GOP size is 16 frames. Then the compressed video is transcoded into CIF or QCIF resolution with GOP structure IPPP, GOP size 16 frames, and QP at most 38. The reconstruction yields the received videos.
Figure 6.7 plots the distortion-rate curves of the test video sequences. The reconstruction PSNR varies from 26 dB to 38 dB.

Figure 6.7: Distortion-rate curves of the transcoded test video sequences in (a) CIF and (b) QCIF. The reconstruction PSNR varies from 26 dB to 38 dB.

At the user, a video digest unit consists of 16 frames. In the simulations, we vary the quantization of the random projection coefficients over different numbers of bitplanes. Each bitplane is coded at the Slepian-Wolf encoder using rate-adaptive LDPC codes [188] with a block size of 6336 bits for each CIF bitplane and 1584 bits for each QCIF bitplane. At the quality monitoring server, the bitplanes are conditionally decoded as in [7].

Figure 6.8 shows the root mean square (RMS) PSNR estimation error as we vary the number of bits in quantization, comparing the maximum likelihood estimation in (6.6), the J.240 PSNR estimation in (6.1), and the MMSE reconstruction estimation in (6.3). Each point represents the RMS PSNR estimation error

$$
\sqrt{\frac{1}{N}\sum_{i=1}^{N} |e\mathrm{PSNR}_i - \mathrm{PSNR}_i|^2}
$$

of the luminance component over 350 measurements using five video sequences, 10 GOPs per sequence, and seven transcoding QPs from the set {26, 28, …, 38}. Figure 6.8 indicates that we can obtain a PSNR estimation error of just 0.2 dB with maximum likelihood estimation using 7- and 8-bit quantization for CIF and QCIF, respectively. J.240 uses both quantized X and Y and leads to inefficient PSNR estimation. The PSNR estimation using the MMSE reconstruction of X always underestimates the reconstruction MSE and yields large estimation error.
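The RMS aggregation above is a plain root mean square over per-GOP measurements; a minimal sketch (the function name and the toy numbers are ours, not the dissertation's data):

```python
import math

def rms_psnr_error(estimates, ground_truth):
    """sqrt((1/N) * sum_i |ePSNR_i - PSNR_i|^2) over all GOP measurements."""
    n = len(estimates)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(estimates, ground_truth)) / n)

# Toy check: three GOP measurements with errors of 0.1, -0.1, and 0.2 dB.
print(rms_psnr_error([30.1, 27.9, 33.2], [30.0, 28.0, 33.0]))
```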
Figure 6.8: RMS PSNR estimation error versus the number of bits in the quantization of X, for (a) CIF and (b) QCIF, comparing MMSE reconstruction, J.240, and maximum likelihood estimation. The PSNR estimation using MMSE reconstruction always underestimates the MSE and yields large estimation error. J.240 uses both quantized X and Y and leads to inefficient PSNR estimation. Our maximum likelihood MSE estimation achieves 0.2 dB using only 7- and 8-bit quantization for CIF and QCIF, respectively.

Figure 6.9 compares different combinations of estimation and coding methods by depicting the RMS PSNR estimation error versus the video digest data rate in kilobits per second (kbps), for videos at 30 fps. At an RMS PSNR estimation error of 0.2 dB, maximum likelihood estimation and distributed source coding can reduce the video digest data rate by up to 85% compared to the ITU-T J.240 Recommendation. This enables mobile users to feed back the received QCIF video digest at a reasonable rate of 7 kbps using distributed source coding, instead of 30 kbps with the ITU-T J.240 Recommendation. Figure 6.9 also depicts the performance lower bound derived from (6.16) and the conditional entropy H(X_q|Y) as rate measurements. Even though the video data do not perfectly match the assumptions described in Section 6.2.1, the lower bound closely matches the simulation results.
Figure 6.9: RMS PSNR estimation error versus video digest data rates for videos at 30 fps, in (a) CIF and (b) QCIF, comparing J.240, ML estimation with fixed-length coding, ML estimation with DSC, and the lower bound. The maximum likelihood estimation lowers the PSNR estimation error given the same number of bits in the quantization of X. Distributed source coding exploits the correlation between X and Y and yields a rate savings of 85% at an RMS PSNR estimation error of 0.2 dB.

The simulation results demonstrate that the performance using maximum likelihood estimation and distributed source coding is close to the performance prediction using the Cramér-Rao lower bound.

6.4 Summary

A rate-efficient video quality monitoring scheme using distributed source coding is presented and investigated. In our scheme, each user sends a Slepian-Wolf coded projection of its received video to the quality monitoring server. The server decodes the projection using the original video as side information and then estimates the MSE using maximum likelihood estimation. Distributed source coding exploits the correlation between the original and received video projections and leads to significant rate savings. We contribute a performance prediction of the quality estimation for various system configurations using the Cramér-Rao lower bound. Distributed source coding and maximum likelihood estimation offer up to 85% video digest rate savings compared to the ITU-T J.240 Recommendation at the same performance. The performance prediction matches the simulation results for both synthesized and video data.
Chapter 7

Conclusions and Future Work

7.1 Conclusions

This dissertation presents and investigates a novel image authentication scheme that distinguishes legitimate encoding variations of an image from tampered versions based on distributed source coding and statistical methods. A two-state lossy channel model represents the statistical dependency between the original and the target images. Tampering degradations are captured by a statistical image model, and legitimate compression noise is assumed to be additive white Gaussian noise. Dimensionality reduction uses block projection to address the spatial correlation of the tampering model and to distinguish tampered from legitimate degradations. Using Slepian-Wolf coding that exploits the correlation between the original and the target image projections achieves significant rate savings. The Slepian-Wolf decoder is extended with Expectation Maximization algorithms to address target images that have undergone contrast, brightness, and affine warping adjustments. The decoding loop iteratively estimates the editing parameters based on the side information and the soft information of the original image projection.

The block projection makes it possible to localize the tampering in an image that has been deemed tampered. The localization decoder infers the tampered locations and decodes the Slepian-Wolf bitstream by applying the message-passing algorithm over a factor graph. The factor graph represents the relationship among the Slepian-Wolf bitstream, the projections of the original image and the target image, and the block states. Spatial models are applied to the block states to exploit the spatial correlation of the tampering. Simulation results demonstrate that the system can decode the Slepian-Wolf bitstream at a low rate even when the side information is tampered and its contrast and brightness are adjusted.
1D and 2D spatial models exploiting the contiguity of tampering additionally reduce the localization data size by 12% to 15% and offer better localization performance compared to the independent model.

In addition to the image authentication system, this dissertation explores a rate-efficient video quality monitoring scheme using distributed source coding. In our scheme, each user sends a Slepian-Wolf coded projection of its received video to the quality monitoring server. The server decodes the projection using the original video as side information and then estimates the MSE of the received video using maximum likelihood estimation. A PSNR estimation performance prediction using the Cramér-Rao lower bound is developed. The prediction suggests the choice of projection block size and quantization. Distributed source coding and maximum likelihood estimation offer up to 85% video digest rate savings compared to the ITU-T J.240 Recommendation at the same performance.

Advanced statistical signal processing methods play an important role throughout this dissertation. Spectral analysis provides insight for choosing the right projection basis. The EM algorithm offers robustness against many common image adjustments. Statistical inference over factor graphs is the basis of the distributed source decoder and its extension to localize tampering. Maximum likelihood estimation in video quality monitoring achieves accurate PSNR estimation. We consistently find that techniques based on a rigorous mathematical analysis greatly outperform ad hoc methods.

7.2 Future Work

Our novel ideas on authentication using distributed source coding have attracted attention in the research community. With additional assumptions on the sparsity of tampering, similar schemes have been proposed using Wyner-Ziv coding and compressive sensing for image authentication [177, 178] and audio authentication [132, 184].
Our tampering localization ideas have been adopted in a number of other applications, such as coding of thumbnail video for distortion-aware retransmission [95] and coding of digest data for video file synchronization [216–218].

To the best of our knowledge, little work has been carried out to date towards statistical models for image tampering. This thesis uses a simple additive model that assumes that a tampered image is formed by adding an image-like random process to the original image. Future work should consider more sophisticated tampering models.

The current design of the proposed image authentication system uses a pseudorandom projection to prevent an attacker from altering the image in the null space of the projection. Our pseudorandom projection choice is based on the assumption that tampering is image-like. An attacker might attempt to compromise the system based on this assumption. With models of the attacker's incentives and the designer's objectives, a game-theoretic analysis could suggest an equilibrium for a more sophisticated system design.

The proposed authentication system uses EM algorithms to handle images that have undergone common adjustments. This thesis reported detailed algorithms for contrast, brightness, and affine warping adjustment. Many other common image processing operations, such as filtering and gamma correction, could be included in future extensions. The limits and optimization of distributed source coding combined with EM algorithms remain open.

The tampering localization in our authentication system is achieved using inference over the decoder factor graph, which combines the LDPC code graph with spatial models for tampering. The system could benefit from LDPC code optimization for the localization decoder and yield lower data requirements for the localization.

The quality monitoring system studied in this thesis focuses on PSNR estimation using the ITU-T J.240 Recommendation projection.
The proposed system can also be applied to other features and the corresponding metrics proposed in the quality assessment literature. This will raise interesting design issues on distributed source coding and optimal quality estimation for various quality assessment features. A natural extension considering a symmetric setup, in which the content provider and the viewer both send information to a central quality monitoring server, will pose interesting challenges.

While the general problem of statistical inference under rate constraints remains open, it has inspired us to investigate distributed source coding applications that infer the relation among separated sources instead of reconstructing the source. We believe that there are many other applications that fit this setting and can greatly benefit from distributed source coding.

Appendix A

Test Images

Throughout this thesis, simulations are carried out using Kodak and classic test images shown in Figure A.1. All images are at 512×512 resolution with 8-bit gray levels.

Figure A.1: Test images used in simulations.

Appendix B

Concavity of L̂

We use the concavity of L̂(α, β) in (4.2), (4.4), and (5.2) to claim the optimality conditions in terms of partial derivatives being zero. We first derive the Hessian of L₀(x_q, i; α, β) = log P(x_q | Y(i); α, β) and then give the condition on the quantization of X for L̂(α, β) = Σ_i Σ_{x_q} Q_i(x_q) L₀(x_q, i; α, β) to be concave, where Q_i(x_q) is the resulting estimate of x_q in the E-steps. Recalling that αX|Y ∼ N(Y − β, σ²), we have

$$
L_0 = \log P(x_q \mid Y; \alpha, \beta)
= \log \int_{x:\,Q(x)=x_q} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(Y - \alpha x - \beta)^2}{2\sigma^2}\right) dx.
$$

Let f(x; α, β) = (1/(√(2π) σ)) exp(−(Y − αx − β)²/(2σ²)). The first-order derivatives of L₀ with respect to α and β are

$$
\frac{\partial L_0}{\partial \alpha}
= \frac{1}{P(x_q \mid Y; \alpha, \beta)} \int_{x:\,Q(x)=x_q} f(x; \alpha, \beta)\, \frac{x}{\sigma^2}\,(Y - \alpha x - \beta)\, dx
= \frac{1}{\sigma^2}\, E[(Y - \alpha X - \beta) X \mid x_q, Y; \alpha, \beta]
$$

and
$$
\frac{\partial L_0}{\partial \beta}
= \frac{1}{P(x_q \mid Y; \alpha, \beta)} \int_{x:\,Q(x)=x_q} f(x; \alpha, \beta)\, \frac{1}{\sigma^2}\,(Y - \alpha x - \beta)\, dx
= \frac{1}{\sigma^2}\, E[(Y - \alpha X - \beta) \mid x_q, Y; \alpha, \beta].
$$

The second-order derivatives of L₀ are

$$
\begin{aligned}
\frac{\partial^2 L_0}{\partial \alpha^2} &= \frac{1}{\sigma^4}\Big(E[(Y - \alpha X - \beta)^2 X^2 \mid x_q, Y; \alpha, \beta] - E^2[(Y - \alpha X - \beta) X \mid x_q, Y; \alpha, \beta]\Big) - \frac{E[X^2 \mid x_q, Y; \alpha, \beta]}{\sigma^2}\\
\frac{\partial^2 L_0}{\partial \beta^2} &= \frac{1}{\sigma^4}\Big(E[(Y - \alpha X - \beta)^2 \mid x_q, Y; \alpha, \beta] - E^2[(Y - \alpha X - \beta) \mid x_q, Y; \alpha, \beta]\Big) - \frac{1}{\sigma^2}\\
\frac{\partial^2 L_0}{\partial \alpha\, \partial \beta} &= \frac{1}{\sigma^4} E[(Y - \alpha X - \beta)^2 X \mid x_q, Y; \alpha, \beta] - \frac{1}{\sigma^4} E[(Y - \alpha X - \beta) \mid x_q, Y; \alpha, \beta]\, E[(Y - \alpha X - \beta) X \mid x_q, Y; \alpha, \beta]\\
&\quad - \frac{1}{\sigma^2} E[X \mid x_q, Y; \alpha, \beta]
\end{aligned}
$$

We rewrite this as a Hessian matrix:

$$
\nabla^2 L_0 = \begin{pmatrix} \frac{\partial^2 L_0}{\partial \alpha^2} & \frac{\partial^2 L_0}{\partial \alpha \partial \beta} \\[2pt] \frac{\partial^2 L_0}{\partial \alpha \partial \beta} & \frac{\partial^2 L_0}{\partial \beta^2} \end{pmatrix}
= \frac{1}{\sigma^4} \mathrm{Cov}\left[\begin{pmatrix} (Y - \alpha X - \beta)X \\ (Y - \alpha X - \beta) \end{pmatrix} \,\middle|\, x_q, Y\right]
- \frac{1}{\sigma^2}\, E\left[\begin{pmatrix} X^2 & X \\ X & 1 \end{pmatrix} \,\middle|\, x_q, Y\right] \qquad (\mathrm{B.1})
$$

This suggests that if Σ_i Σ_{x_q} Q_i(x_q) ∇²_{α,β} L₀(x_q, i) ⪯ 0, then L̂(α, β) is concave. With the additional assumption that (1/N) Σ_i Q_i(X_q) converges to P(X_q) for a sufficiently large number of samples N, the second-order derivative of L̂(α, β) approaches E[∇² L₀]:

$$
\begin{aligned}
E[\nabla^2 L_0] &= \frac{1}{\sigma^4} \mathrm{Cov}\begin{pmatrix} (Y - \alpha X - \beta)X \\ (Y - \alpha X - \beta) \end{pmatrix}
- \frac{1}{\sigma^4} \mathrm{Cov}\left(E\left[\begin{pmatrix} (Y - \alpha X - \beta)X \\ (Y - \alpha X - \beta) \end{pmatrix} \,\middle|\, X_q, Y\right]\right)
- \frac{1}{\sigma^2}\, E\begin{pmatrix} X^2 & X \\ X & 1 \end{pmatrix} &(\mathrm{B.2})\\
&= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
- \frac{1}{\sigma^4} \mathrm{Cov}\left(E\left[\begin{pmatrix} (Y - \alpha X - \beta)X \\ (Y - \alpha X - \beta) \end{pmatrix} \,\middle|\, X_q, Y\right]\right) &(\mathrm{B.3})
\end{aligned}
$$

where (B.2) is due to the law of total variance and (B.3) uses the third- and fourth-order Gaussian statistics. This suggests that the choice of the quantization of X should satisfy

$$
E[\nabla^2 L_0] = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
- \frac{1}{\sigma^4} \mathrm{Cov}\left(E\left[\begin{pmatrix} (Y - \alpha X - \beta)X \\ (Y - \alpha X - \beta) \end{pmatrix} \,\middle|\, X_q, Y\right]\right) \preceq 0, \qquad (\mathrm{B.4})
$$

such that, as (1/N) Σ_i Q_i(x_q) converges to P(x_q) for sufficiently large N, L̂(α, β) is a concave function.

Figure B.1 plots det(E[∇² L₀]) and E[∂² L₀/∂α²] as we vary the number of bits in the quantization of X, for Y uniformly distributed over [10, 235], α = 1.3, β = −10, and σ² = 10. For this setting, 2-bit quantization is sufficient to make L̂ a concave function of α and β.
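Concavity hinges on E[∇²L₀] being negative semidefinite. In the unquantized limit the matrix reduces to −(1/σ²)E[(X, 1)(X, 1)ᵀ], which is automatically negative semidefinite since it is the negated expectation of an outer product. The Monte Carlo sketch below is our own (not the dissertation's code) and uses the Figure B.1 setting — Y uniform over [10, 235], α = 1.3, β = −10, σ² = 10 — to check the two eigenvalues of that limiting Hessian directly.

```python
import math
import random

random.seed(1)
alpha, beta, sigma2 = 1.3, -10.0, 10.0   # Figure B.1 setting (assumed here)
n = 200000
s_xx = s_x = 0.0                          # running sums for E[X^2] and E[X]
for _ in range(n):
    y = random.uniform(10.0, 235.0)
    # alpha*X | Y ~ N(Y - beta, sigma^2)  =>  X = (Y - beta + noise) / alpha
    x = (y - beta + random.gauss(0.0, math.sqrt(sigma2))) / alpha
    s_xx += x * x
    s_x += x

# Unquantized-limit Hessian expectation: -(1/sigma^2) * E[[X^2, X], [X, 1]]
a = -s_xx / (n * sigma2)                  # E[d^2 L0 / d alpha^2]
b = -s_x / (n * sigma2)                   # mixed term
c = -1.0 / sigma2                         # E[d^2 L0 / d beta^2]

# Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]
disc = math.sqrt((a - c) ** 2 + 4.0 * b * b)
lam_hi = 0.5 * (a + c + disc)
lam_lo = 0.5 * (a + c - disc)
print(lam_hi, lam_lo)                     # both negative: concave in (alpha, beta)
```

Both eigenvalues come out negative (the determinant equals Var(X)/σ⁴ > 0 with a negative trace), which is the no-quantization reference point against which the quantized curves of Figure B.1 are compared.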
Note that without quantization, i.e., Q(X) = X, we have

$$
E[\nabla^2 L_0] = -\frac{1}{\sigma^2}\, E\begin{pmatrix} X^2 & X \\ X & 1 \end{pmatrix}. \qquad (\mathrm{B.5})
$$

This serves as a reference to see how the quantization of X affects the estimation of α and β.

Figure B.1: Let Y be uniformly distributed over [10, 235] and set α = 1.3, β = −10, and σ² = 10. We plot (a) det(E[∇² L₀]) and (b) E[∂² L₀/∂α²] as we vary the number of bits in the quantization of X. For this setting, 2-bit quantization of X is sufficient to make L̂ a concave function of α and β.

Bibliography

[1] BitTorrent. http://www.bittorrent.com/.
[2] eMule Project. http://emule-project.net/.
[3] KaZaA. http://www.kazaa.com/.
[4] A. Aaron and B. Girod. Compression with side information using Turbo codes. In Data Compression Conference, pages 252–261, Snowbird, UT, April 2002.
[5] A. Aaron and B. Girod. Wyner-Ziv video coding with low-encoder complexity. In Picture Coding Symposium, San Francisco, CA, December 2004.
[6] A. Aaron, S. Rane, and B. Girod. Wyner-Ziv video coding with hash-based motion compensation at the receiver. In IEEE International Conference on Image Processing, Singapore, October 2004.
[7] A. Aaron, S. Rane, E. Setton, and B. Girod. Transform-domain Wyner-Ziv codec for video. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, 2004.
[8] A. Aaron, S. Rane, R. Zhang, and B. Girod. Wyner-Ziv coding for video: Applications to compression and error resilience. In IEEE Data Compression Conference, Snowbird, UT, November 2003.
[9] A. Aaron, S. Setton, and B. Girod. Towards practical Wyner-Ziv coding of video. In IEEE International Conference on Image Processing, Barcelona, Spain, September 2003.
[10] A. Aaron, D. Varodayan, and B.
Girod. Wyner-Ziv residual coding of video. In Picture Coding Symposium, Beijing, China, December 2006.
[11] A. Aaron, R. Zhang, and B. Girod. Wyner-Ziv coding of motion video. In Asilomar Conference on Signals and Systems, Pacific Grove, CA, November 2002.
[12] M. Abdel-Mottaleb, G. Vaithilingam, and S. Krishnamachari. Signature-based image identification. In SPIE Conference on Multimedia Systems and Applications, pages 22–28, Boston, MA, September 1999.
[13] R. Ahlswede and I. Csiszar. Hypothesis testing with communication constraints. IEEE Transactions on Information Theory, 32(4):533–542, July 1986.
[14] F. Ahmed and M. Y. Siyal. A secure and robust hashing scheme for image authentication. In Fifth International Conference on Information, Communications and Signal Processing, pages 705–709, 2005.
[15] D. R. Ashbaugh. Ridgeology, 1991.
[16] J. Bajcsy and P. Mitran. Coding for the Slepian-Wolf problem with Turbo codes. In IEEE Global Telecommunications Conference, volume 2, pages 1400–1404, 2001.
[17] T. Barnwell. Correlation analysis of subjective and objective measures for speech quality. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 706–709, April 1980.
[18] T. Barnwell and A. Bush. Statistical correlation between objective and subjective measures for speech quality. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 595–598, April 1978.
[19] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1):164–171, October 1970.
[20] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. In IEEE International Conference on Communications, volume 2, pages 1064–1070, May 1993.
[21] S. Bhattacharjee and M. Kutter. Compression tolerant image authentication.
In International Conference on Image Processing, volume 1, pages 435–439, October 1998.
[22] A. Bouzidi and N. Baaziz. Contourlet domain feature extraction for image content authentication. In IEEE International Workshop on Multimedia Signal Processing, pages 202–206, October 2006.
[23] P. Campisi, M. Carli, G. Giunta, and A. Neri. Blind quality assessment system for multimedia communications using tracing watermarking. IEEE Transactions on Signal Processing, 51(4):996–1002, April 2003.
[24] D. Chen, D. Varodayan, M. Flierl, and B. Girod. Distributed stereo image coding with improved disparity and noise estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, NV, March 2008.
[25] D. Chen, D. Varodayan, M. Flierl, and B. Girod. Wyner-Ziv coding of multiview images with unsupervised learning of disparity and gray code. In IEEE International Conference on Image Processing, San Diego, CA, October 2008.
[26] D. Chen, D. Varodayan, M. Flierl, and B. Girod. Wyner-Ziv coding of multiview images with unsupervised learning of two disparities. In International Conference on Multimedia and Expo, Hannover, Germany, June 2008.
[27] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–507, December 1952.
[28] N.-M. Cheung and A. Ortega. Distributed source coding application to low-delay free viewpoint switching in multiview video compression. In Picture Coding Symposium, Lisbon, Portugal, November 2007.
[29] N.-M. Cheung and A. Ortega. Flexible video decoding: A distributed source coding approach. In IEEE International Workshop on Multimedia Signal Processing, Crete, Greece, October 2007.
[30] N.-M. Cheung and A. Ortega. Compression algorithms for flexible video decoding. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, January 2008.
[31] N.-M. Cheung, H. Wang, and A. Ortega.
Video compression with flexible playback order based on distributed source coding. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, January 2006.
[32] K. Chono, Y.-C. Lin, D. Varodayan, Y. Miyamoto, and B. Girod. Reduced-reference image quality assessment using distributed source coding. In IEEE International Conference on Multimedia and Expo, pages 609–612, April 2008.
[33] T. P. Coleman, A. H. Lee, M. Medard, and M. Effros. On some new approaches to practical Slepian-Wolf compression inspired by channel coding. In Data Compression Conference, pages 282–291, March 2004.
[34] T. Cover. A proof of the data compression theorem of Slepian and Wolf for ergodic sources. IEEE Transactions on Information Theory, 21(2):226–228, March 1975.
[35] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 1991.
[36] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon. Secure spread spectrum watermarking for images, audio and video. In IEEE International Conference on Image Processing, Lausanne, Switzerland, September 1996.
[37] H. Cramér. Mathematical Methods of Statistics. Princeton Univ. Press, 1946.
[38] I. Csiszar. Linear codes for sources and source networks: Error exponents, universal coding. IEEE Transactions on Information Theory, 28(4):585–592, July 1982.
[39] J. Daugman and C. Downing. Epigenetic randomness, complexity, and singularity of human iris patterns. In Proceedings of the Royal Society, volume B, pages 1737–1740, 2001.
[40] C. De Roover, C. De Vleeschouwer, F. Lefebvre, and B. Macq. Robust image hashing based on radial variance of pixels. In IEEE International Conference on Image Processing, volume 3, pages 77–80, September 2005.
[41] C. De Roover, C. De Vleeschouwer, F. Lefebvre, and B. Macq. Robust video hashing based on radial projections of key frames. IEEE Transactions on Signal Processing, 53(10):4020–4037, October 2005.
[42] W. Diffie and M. E. Hellman.
New directions in cryptography. IEEE Transactions on Information Theory, IT-22(6):644–654, November 1976.
[43] J. Dittmann, A. Steinmetz, and R. Steinmetz. Content-based digital signature for motion picture authentication and content-fragile watermarking. In IEEE International Conference on Multimedia Computing and Systems, volume 2, pages 209–213, July 1999.
[44] P. L. Dragotti and M. Gastpar. Distributed Source Coding. Academic Press, 2009.
[45] S. C. Draper, A. Khisti, E. Martinian, A. Vetro, and J. S. Yedidia. Secure storage of fingerprint biometrics using Slepian-Wolf codes. In Workshop on Information Theory and Applications, San Diego, CA, 2007.
[46] S. C. Draper, A. Khisti, E. Martinian, A. Vetro, and J. S. Yedidia. Using distributed source coding to secure fingerprint biometrics. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, HI, April 2007.
[47] D. Eastlake. US secure hash algorithm 1 (SHA1), RFC 3174, September 2001.
[48] J. J. Eggers and B. Girod. Blind watermarking applied to image authentication. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, May 2001.
[49] U. Engelke, M. Kusuma, H.-J. Zepernick, and M. Caldera. Reduced-reference metric design for objective perceptual quality assessment in wireless imaging. Signal Processing: Image Communication, 24(7):525–547, 2009.
[50] U. Engelke, V. X. Nguyen, and H.-J. Zepernick. Regional attention to structural degradations for perceptual image quality metric design. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 869–872, April 2008.
[51] U. Engelke and H.-J. Zepernick. Multi-resolution structural degradation metrics for perceptual image quality assessment. In Picture Coding Symposium, Lisbon, Portugal, November 2007.
[52] U. Engelke and H.-J. Zepernick. Quality evaluation in wireless imaging using feature-based objective metrics.
In International Symposium on Wireless Pervasive Computing, pages 367–372, February 2007.
[53] M. C. Q. Farias, S. Mitra, M. Carli, and A. Neri. A comparison between an objective quality measure and the mean annoyance values of watermarked videos. In IEEE International Conference on Image Processing, volume 3, pages 469–472, 2002.
[54] M. C. Q. Farias and S. K. Mitra. No-reference video quality metric based on artifact measurements. In IEEE International Conference on Image Processing, volume 3, pages 141–144, September 2005.
[55] H. Farid. Image forgery detection. IEEE Signal Processing Magazine, 26(2):16–25, March 2009.
[56] J. Fridrich. Robust bit extraction from images. In International Conference on Multimedia Computing and Systems, volume 2, pages 536–540, July 1999.
[57] J. Fridrich and M. Goljan. Robust hash functions for digital watermarking. In International Conference on Information Technology: Coding and Computing, pages 178–183, 2000.
[58] R. G. Gallager. Low-Density Parity Check Codes. PhD thesis, MIT, Cambridge, MA, 1963.
[59] J. García-Frías. Compression of correlated binary sources using Turbo codes. IEEE Communications Letters, 5(10):417–419, October 2001.
[60] J. García-Frías. Decoding of low-density parity-check codes over finite-state binary Markov channels. IEEE Transactions on Communications, 52(11):1840–1843, November 2004.
[61] J. García-Frías and Y. Zhao. Data compression of unknown single and correlated binary sources using punctured Turbo codes. In Allerton Conference on Communication, Control, and Computing, Monticello, IL, October 2001.
[62] J. García-Frías and W. Zhong. LDPC codes for compression of multiterminal sources with hidden Markov correlation. IEEE Communications Letters, 7(3):115–117, March 2003.
[63] N. Gehrig and P. L. Dragotti. Symmetric and asymmetric Slepian-Wolf codes with systematic and nonsystematic linear codes. IEEE Communications Letters, 9(1):61–63, January 2005.
[64] B. Girod, A. M.
Aaron, S. Rane, and D. Rebollo-Monedero. Distributed video coding. Proceedings of the IEEE, 93(1):71–83, January 2005. [65] T. S. Han. Hypothesis testing with multiterminal data compression. IEEE Transactions on Information Theory, 33(6):759–772, November 1987. [66] T. S. Han and S. Amari. Statistical inference under multiterminal data compression. IEEE Transactions on Information Theory, 44(6):2300–2324, October 1998. BIBLIOGRAPHY 111 [67] A. M. Hassan, A. Al-Hamadi, B. Michaelis, Y. M. Y. Hasan, and M. A. A. Wahab. Semi-fragile image authentication using robust image hashing with localization. In IEEE International Conference on Machine Vision, pages 133 –137, December 2009. [68] S. S Hemami and M. A. Masry. A scalable video quality metric and applications. In International Workshop on Video Processing and Quality Metrics for Consumer Electronics, January 2005. [69] T. Ignatenko. Secret-Key Rates and Privacy Leakage in Biometric Systems. PhD thesis, Eindhoven University of Technology, The Netherlands, 2009. [70] T. Ignatenko and F. M. J. Willems. On privacy in secure biometric authentication systems. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages II–121–II–124, April 2007. [71] T. Ignatenko and F. M. J. Willems. Privacy leakage in biometric secrecy systems. In 46th Annual Allerton Conference on Communication, Control, and Computing, pages 850–857, September 2008. [72] T. Ignatenko and F. M. J. Willems. Secret rate - privacy leakage in biometric systems. In IEEE International Symposium on Information Theory, pages 2251– 2255, 28 2009-July 3 2009. [73] ISO/IEC. IS 10918-1: Information technology – Digital compression and coding of continuous-tone still images: Requirements and guidelines, 1990. [74] ISO/IEC. IS information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s–part 2:video, 1993. [75] ISO/IEC. 
IS 13818-2: Information technology – Generic coding of moving pictures and associated audio informatio–part 2:video, 1995. [76] ISO/IEC. IS 14496-10: Information technology – Coding of audio-visual objects – part 10: Advanced video coding, 2003. 112 BIBLIOGRAPHY [77] ISO/IEC. IS 15444: Information technology – JPEG 2000 image coding system, 2004. [78] ITU-T. Recommendation J.147: Objective picture quality measurement method by use of in-service test signals, July 2002. [79] ITU-T. Recommendation H.264: Advanced video coding for generic audiovisual services, 2003. [80] ITU-T. Recommendation J.240: Framework for remote monitoring of transmitted picture signal-to-noise ratio using spread-spectrum and orthogonal transform, June 2004. [81] A.K. Jain. Advances in mathematical models for image processing. Proceedings of the IEEE, 69(5):502 – 528, May 1981. [82] N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385 –1422, October 1993. [83] M. Johnson and K. Ramchandran. Dither-based secure image hashing using distributed coding. In IEEE International Conference on Image Processing, volume 2, pages 751–754, September 2003. [84] C. Kailasanathan and R. C Naini. Image authentication surviving acceptable modifications using statistical measures and k-mean segmentation. In Workshop on Nonlinear Signal and Image Processing, June 2001. [85] C. Kailasanathan, R. S. Naini, and P. Ogunbona. Compression tolerant DCT based image hash. In International Conference on Distributed Computing Systems Workshops, pages 562–567, May 2003. [86] R. Kawada, O. Sugimoto, A. Koike, M. Wada, and S. Matsumoto. Highly precise estimation scheme for remote video PSNR using spread spectrum and extraction of orthogonal transform coefficients. Electronics and Communications in Japan (Part I: Communications), 89(6):51–62, 2006. BIBLIOGRAPHY 113 [87] N. Khanna, A. Roca, G. T. C. Chiu, J. P. Allebach, and Delp E. J. 
Improvements on image authentication and recovery using distributed source coding. In SPIE Conference on Media Forensics and Security, 2009. [88] S. S. Kozat, R. Venkatesan, and M. K. Mihcak. Robust perceptual image hashing via matrix invariants. In Image Processing, 2004. ICIP ’04. 2004 International Conference on, volume 5, pages 3443–3446, October 2004. [89] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sumproduct algorithm. IEEE Transactions on Information Theory, 47(10):498–519, February 2001. [90] T. M. Kusuma and H.-J. Zepernick. A reduced-reference perceptual quality metric for in-service image quality assessment. In IEEE Joint First Workshop on Mobile Future and Symposium on Trends in Communications, pages 71–74, October 2003. [91] C. J. Lambrecht and O. Verscheure. Perceptual quality measure using a spatiotemporal model of the human visual system. In SPIE Conference on Digital Video Compression: Algorithms and Technologies, pages 450–460, January 1996. [92] C.-F. Lan, A. D. Liveris, K. Narayanan, Z. Xiong, and C. Georghiades. SlepianWolf coding of multiple M-ary sources using LDPC codes. In Data Compression Conference, pages 549–, March 2004. [93] P. Le Callet, C. Viard-Gaudin, and D. Barba. Continuous quality assessment of MPEG2 video with reduced reference. In International Workshop on Video Processing and Quality Metrics for Consumer Electronics, January 2005. [94] F. Lefebvre, J. Czyz, and B. Macq. A robust soft hash algorithm for digital image signature. In International Conference on Multimedia and Expo, Baltimore, Maryland, 2003. BIBLIOGRAPHY 114 [95] Z. Li, Y.-C. Lin, D. Varodayan, P. Baccichet, and B. Girod. Distortion-aware retransmission and concealment of video packets using a Wyner-Ziv-coded thumbnail. In IEEE International Workshop on Multimedia Signal Processing, pages 424 –428, October 2008. [96] J. Liang, R. Kumar, Y. Xi, and K. W. Ross. Pollution in P2P file sharing systems. In INFOCOM 2005. 
24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 2, pages 1174–1185, March 2005. [97] C.-Y. Lin and S.-F. Chang. Generating robust digital signature for image/video authentication. In ACM Multimedia: Multimedia and Security Workshop, pages 49–54, Bristol, UK, September 1998. [98] C.-Y. Lin and S.-F. Chang. A robust image authentication method surviving JPEG lossy compression. In SPIE Conference on Storage and Retrieval for Image and Video Database, San Jose, CA, January 1998. [99] C.-Y. Lin and S.-F. Chang. A robust image authentication method distinguishing JPEG compression from malicious manipulation. IEEE Transactions on Circuits and Systems for Video Technology, 11(2):153–168, February 2001. [100] Y.-C. Lin, D. Varodayan, T. Fink, E. Bellers, and B. Girod. Authenticating contrast and brightness adjusted images using distributed source coding and expactation maximization. In International Conference on Multimedia and Expo, Hannover, Germany, June 2008. [101] Y.-C. Lin, D. Varodayan, T. Fink, E. Bellers, and B. Girod. Localization of tampering in contrast and brightness adjusted images using distributed source coding and expectation maximization. In IEEE International Conference on Image Processing, San Diego, CA, October 2008. [102] Y.-C. Lin, D. Varodayan, and B. Girod. Image authentication and tampering localization using distributed source coding. In IEEE Multimedia Signal Processing Workshop, Crete, Greece, Ocbober 2007. 115 BIBLIOGRAPHY [103] Y.-C. Lin, D. Varodayan, and B. Girod. Image authentication based on distributed source coding. In IEEE International Conference on Image Processing, San Antonio, TX, September 2007. [104] Y.-C. Lin, D. Varodayan, and B. Girod. Spatial models for localization of image tampering using distributed source codes. In Picture Coding Symposium, Lisbon, Portugal, November 2007. [105] Y.-C. Lin, D. Varodayan, and B. Girod. 
Authenticating cropped and re- sized images using distributed source coding and expectation maximization. In IS&T/SPIE Electronic Imaging, Media Forensics and Security XI, San Jose, CA, January 2009. [106] Y.-C. Lin, D. Varodayan, and B. Girod. Distributed source coding authentication of images with affine warping. In IEEE International Conference on Acoustic, Speech, and Signal Processing, Taipei, Taiwan, April 2009. [107] Y.-C. Lin, D. Varodayan, and B. Girod. Distributed source coding authentication of images with contrast and brightness adjustment and affine warping. In International Picture Coding Symposium, Chicago, IL, May 2009. [108] Y.-C. Lin, D. Varodayan, and B. Girod. Video quality monitoring for mobile multicast peers using distributed source coding. In 5th International Mobile Multimedia Communications Conference, London, UK, September 2009. [109] H. Liu and I. Heynderickx. A no-reference perceptual blockiness metric. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 865–868, 2008. [110] S. Liu and A. C. Bovik. Efficient DCT-domain blind measurement and reduction of blocking artifacts. IEEE Transactions on Circuits and Systems for Video Technology, 12(12):1139–1149, December 2002. BIBLIOGRAPHY 116 [111] A. Liveris, Z. Xiong, and C. Georghiades. Compression of binary sources with side information at the decoder using LDPC codes. In IEEE Global Communications Symposium, volume 2, pages 1300–1304, Taipei, Taiwan, November 2002. [112] A. Liveris, Z. Xiong, and C. Georghiades. Compression of binary sources with side information at the decoder using LDPC codes. IEEE Communications Letters, 6(10):440–442, October 2002. [113] D.-C. Lou and J.-L. Liu. Fault resilient and compression tolerant digital signature for image authentication. IEEE Transactions on Consumer Electronics, 46(1):31–39, February 2000. [114] C.-S. Lu, C.-Y. Hsu, S.-W. Sun, and P.-C. Chang. Robust mesh-based hashing for copy detection and tracing of images. 
In IEEE International Conference on Multimedia and Expo, volume 1, pages 731–734, June 2004. [115] C.-S. Lu and H.-Y. M. Liao. Structural digital signature for image authentication: an incidental distortion resistant scheme. In ACM workshops on Multimedia, pages 115–118, Los Angeles, CA, 2000. [116] C.-S. Lu and H.-Y. M. Liao. Structural digital signature for image authentication: an incidental distortion resistant scheme. IEEE Transactions on Multimedia, 5(2):161–173, June 2003. [117] Ligang Lu, Zhou Wang, A. C. Bovik, and J. Kouloheris. Full-reference video quality assessment considering structural distortion and no-reference quality evaluation of MPEG video. In IEEE International Conference on Multimedia and Expo, volume 1, pages 61–64, 2002. [118] J. Lukas and J. Fridrich. Estimation of primary quantization matrix in double compressed JPEG images. In Digital Forensic Research Workshop, August 2003. BIBLIOGRAPHY 117 [119] W. Lv and Z.J. Wang. Fast Johnson-Lindenstrauss transform for robust and secure image hashing. In IEEE Workshop on Multimedia Signal Processing, pages 725–729, October 2008. [120] E. Martinian, S. Yekhanin, and J. S. Yedidia. Secure biometrics via syndromes. In Allerton Conference on Communications, Control and Computing, Monticello, IL, September 2005. [121] M. Masry, S. S. Hemami, and Y. Sermadevi. A scalable wavelet-based video distortion metric and applications. IEEE Transactions on Circuits and Systems for Video Technology, 16(2):260–273, February 2006. [122] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. In British Machine Vision Conference, 2002. [123] K. Mihcak and R. Venkatesan. New iterative geometric techniques for robust image hashing. In Workshop on Security and Privacy in Digital Rights Management, pages 13–21, November 2001. [124] V. Monga and B. L. Evans. Robust perceptual image hashing using feature points. 
In IEEE International Conference on Image Processing, volume 1, pages 677–680, October 2004. [125] V. Monga and B. L. Evans. Perceptual image hashing via feature points: Performance evaluation and tradeoffs. IEEE Transactions on Image Processing, 15(11):3452–3465, November 2006. [126] V. Monga and M. K. Mhcak. Robust and secure image hashing via non-negative matrix factorizations. IEEE Transactions on Information Forensics and Security, 2(3):376–390, September 2007. [127] V. Monga and M. K. Mihcak. Robust image hashing via non-negative matrix factorizations. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, May 2006. 118 BIBLIOGRAPHY [128] J. Oostveen, T. Kalker, and J. Haitsma. Visual hashing of video: applications and techniques. In SPIE Conference on Applications of Digital Image Processing, page 121131, San Diego, CA, July 2001. [129] A.C. Popescu and H. Farid. Exposing digital forgeries in color filter array interpolated images. IEEE Transactions on Signal Processing, 53(10):3948– 3959, October 2005. [130] S. S. Pradhan, J. Kusuma, and K. Ramchandran. Distributed compression in a dense microsensor network. IEEE Signal Processing Magazine, 19(2):51–60, March 2002. [131] S. S. Pradhan and K. Ramchandran. Distributed source coding using syndromes (DISCUS): design and construction. In Data Compression Conference, pages 158–167, March 1999. [132] G. Prandi, G. Valenzise, M. Tagliasacchi, and A. Sarti. Detection and identification of sparse audio tampering using distributed source coding and compressive sensing techniques. In International Conference on Digital Audio Effects, September 2008. [133] J. W. Pratt. F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum likelihood estimation. The Annals of Statistics 4, 3:501–514, 1976. [134] R. Puri, A. Majumdar, and K. Ramchandran. PRISM: A video coding paradigm with motion estimation at the decoder. IEEE Transactions on Image Processing, 16(10):24362448, October 2007. 
[135] R. Puri and K. Ramchandran. PRISM: a new robust video coding architecture based on distributed compression principles. In Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2002.
[136] R. Puri and K. Ramchandran. PRISM: a ‘reversed’ multimedia coding paradigm. In IEEE International Conference on Image Processing, Barcelona, Spain, 2003.
[137] R. Puri and K. Ramchandran. PRISM: an uplink-friendly multimedia coding paradigm. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, 2003.
[138] M. P. Queluz. Towards robust content based techniques for image authentication. In IEEE Workshop on Multimedia Signal Processing, pages 297–302, December 1998.
[139] M. P. Queluz. Content-based integrity protection of digital images. In SPIE Conference on Security and Watermarking of Multimedia Contents, pages 85–93, San Jose, CA, January 1999.
[140] S. Rane. Systematic Lossy Error Protection of Video Signals. PhD thesis, Stanford University, Stanford, CA, 2007.
[141] S. Rane, A. Aaron, and B. Girod. Lossy forward error protection for error-resilient digital video broadcasting. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, July 2004.
[142] S. Rane, A. Aaron, and B. Girod. Systematic lossy forward error protection for error resilient digital video broadcasting - a Wyner-Ziv coding approach. In IEEE International Conference on Image Processing, Singapore, October 2004.
[143] S. Rane, A. Aaron, and B. Girod. Error-resilient video transmission using multiple embedded Wyner-Ziv descriptions. In IEEE International Conference on Image Processing, Genoa, Italy, September 2005.
[144] S. Rane, P. Baccichet, and B. Girod. Modeling and optimization of a systematic lossy error protection based on H.264/AVC redundant slices. In Picture Coding Symposium, Beijing, China, April 2006.
[145] S. Rane and B. Girod. Analysis of error-resilient video transmission based on systematic source-channel coding. In Picture Coding Symposium, San Francisco, CA, December 2004.
[146] S. Rane and B. Girod. Systematic lossy error protection versus layered coding with unequal error protection. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, January 2005.
[147] S. Rane and B. Girod. Systematic lossy error protection based on H.264/AVC redundant slices. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, January 2006.
[148] C. Rao. Information and the accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37:81–89, 1945.
[149] A. R. Reibman, V. A. Vaishampayan, and Y. Sermadevi. Quality monitoring of video over a packet network. IEEE Transactions on Multimedia, 6(2):327–334, April 2004.
[150] M. Ries, O. Nemethova, and M. Rupp. Motion based reference-free quality estimation for H.264/AVC video streaming. In IEEE International Symposium on Wireless Pervasive Computing, February 2007.
[151] R. Rivest. The MD5 message-digest algorithm, RFC 1321, April 1992.
[152] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, 1978.
[153] S. Roy and Q. Sun. Robust hash for detecting and localizing image tampering. In IEEE International Conference on Image Processing, San Antonio, TX, 2007.
[154] M. Schlauweg and E. Müller. Gaussian scale-space features for semi-fragile image authentication. In Picture Coding Symposium, pages 1–4, May 2009.
[155] M. Schlauweg, D. Pröfrock, and E. Müller. JPEG2000-based secure image authentication. In Workshop on Multimedia and Security, pages 62–67, Geneva, Switzerland, 2006.
[156] M. Schneider and S.-F. Chang. A robust content based digital signature for image authentication. In IEEE International Conference on Image Processing, volume 3, pages 227–230, September 1996.
[157] D. Schonberg, S. Draper, and K. Ramchandran. On compression of encrypted images. In IEEE International Conference on Image Processing, pages 269–272, October 2006.
[158] D. Schonberg, S. S. Pradhan, and K. Ramchandran. LDPC codes can approach the Slepian-Wolf bound for general binary sources. In Allerton Conference on Communication, Control, and Computing, Champaign, IL, October 2002.
[159] D. Schonberg, S. S. Pradhan, and K. Ramchandran. Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources. In Asilomar Conference on Signals, Systems and Computers, volume 1, pages 835–839, November 2003.
[160] D. Schonberg, K. Ramchandran, and S. S. Pradhan. Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources. In Data Compression Conference, pages 292–301, March 2004.
[161] D. Schonberg, C. Yeo, S. C. Draper, and K. Ramchandran. On compression of encrypted video. In Data Compression Conference, pages 173–182, March 2007.
[162] J. S. Seo, J. Haitsma, T. Kalker, and C. D. Yoo. Affine transformation resilient image fingerprinting. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, 2003.
[163] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, July and October 1948.
[164] H. R. Sheikh, A. C. Bovik, and L. Cormack. No-reference quality assessment using natural scene statistics: JPEG2000. IEEE Transactions on Image Processing, 14(11):1918–1927, November 2005.
[165] H. R. Sheikh, A. C. Bovik, and G. de Veciana. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Transactions on Image Processing, 14(12):2117–2128, December 2005.
[166] H. Shimokawa, T. S. Han, and S. Amari. Error bound of hypothesis testing with data compression. In IEEE International Symposium on Information Theory, page 114, 1994.
[167] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. IEEE Transactions on Information Theory, IT-19(4):471–480, July 1973.
[168] V. Stankovic, A. D. Liveris, Z. Xiong, and C. N. Georghiades. Design of Slepian-Wolf codes by channel code partitioning. In Data Compression Conference, pages 302–311, March 2004.
[169] O. Sugimoto, R. Kawada, M. Wada, and S. Matsumoto. Objective measurement scheme for perceived picture quality degradation caused by MPEG encoding without any reference pictures. In SPIE Conference on Visual Communications and Image Processing, volume 4310, pages 932–939, 2001.
[170] Q. Sun, S.-F. Chang, M. Kurato, and M. Suto. A new semi-fragile image authentication framework combining ECC and PKI infrastructure. In IEEE International Symposium on Circuits and Systems, Phoenix, AZ, May 2002.
[171] Q. Sun, S.-F. Chang, M. Kurato, and M. Suto. A quantitative semi-fragile JPEG2000 image authentication system. In IEEE International Conference on Image Processing, volume 2, pages 921–924, 2002.
[172] Y. Sutcu, S. Rane, J. S. Yedidia, S. C. Draper, and A. Vetro. Feature extraction for a Slepian-Wolf biometric system using LDPC codes. In IEEE International Symposium on Information Theory, pages 2297–2301, July 2008.
[173] Y. Sutcu, S. Rane, J. S. Yedidia, S. C. Draper, and A. Vetro. Feature transformation for a Slepian-Wolf biometric system based on error correcting codes. In IEEE Conference on Computer Vision and Pattern Recognition - Biometrics Workshop, Anchorage, Alaska, 2008.
[174] A. Swaminathan, Y. Mao, and M. Wu. Image hashing resilient to geometric and filtering operations. In IEEE International Workshop on Multimedia Signal Processing, pages 355–358, September/October 2004.
[175] A. Swaminathan, Y. Mao, and M. Wu. Robust and secure image hashing. IEEE Transactions on Information Forensics and Security, 1(2):215–230, June 2006.
[176] M. Tagliasacchi, G. Valenzise, M. Naccari, and S. Tubaro. A reduced-reference structural similarity approximation for videos corrupted by channel errors. Springer Multimedia Tools and Applications, 48(3):471–492, 2010.
[177] M. Tagliasacchi, G. Valenzise, and S. Tubaro. Localization of sparse image tampering via random projections. In IEEE International Conference on Image Processing, pages 2092–2095, October 2008.
[178] M. Tagliasacchi, G. Valenzise, and S. Tubaro. Hash-based identification of sparse image tampering. IEEE Transactions on Image Processing, 18(11):2491–2504, November 2009.
[179] Z. Tang, S. Wang, X. Zhang, and W. Wei. Perceptual similarity metric resilient to rotation for application in robust image hashing. In International Conference on Multimedia and Ubiquitous Engineering, pages 183–188, June 2009.
[180] Z. Tang, S. Wang, X. Zhang, W. Wei, and S. Su. Robust image hashing for tamper detection using non-negative matrix factorization. Journal of Ubiquitous Convergence and Technology, 2(1):18–26, May 2008.
[181] T. Tian, J. García-Frías, and W. Zhong. Compression of correlated sources using LDPC codes. In Data Compression Conference, page 450, March 2003.
[182] D. S. Turaga, Y. Chen, and J. Caviedes. No reference PSNR estimation for compressed pictures. In IEEE International Conference on Image Processing, volume 3, pages 61–64, 2002.
[183] G. Valenzise, M. Naccari, M. Tagliasacchi, and S. Tubaro. Reduced-reference estimation of channel-induced video distortion using distributed source coding. In ACM Multimedia, Vancouver, British Columbia, Canada, October 2008.
[184] G. Valenzise, G. Prandi, and M. Tagliasacchi. Identification of sparse audio tampering using distributed source coding and compressive sensing techniques. EURASIP Journal on Image and Video Processing, 2009:1–12, 2009.
[185] D. Varodayan. Adaptive Distributed Source Coding. PhD thesis, Stanford University, Stanford, CA, 2010.
[186] D. Varodayan, A. Aaron, and B. Girod. Rate-adaptive distributed source coding using low-density parity-check codes. In Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California, November 2005.
[187] D. Varodayan, A. Aaron, and B. Girod. Exploiting spatial correlation in pixel-domain distributed image compression. In Picture Coding Symposium, Beijing, China, April 2006.
[188] D. Varodayan, A. Aaron, and B. Girod. Rate-adaptive codes for distributed source coding. EURASIP Signal Processing Journal, Special Section on Distributed Source Coding, 86(11):3123–3130, November 2006.
[189] D. Varodayan, D. Chen, M. Flierl, and B. Girod. Wyner-Ziv coding of video with unsupervised motion vector learning. EURASIP Signal Processing: Image Communication Journal, Special Issue on Distributed Video Coding, 23(5):369–378, June 2008.
[190] D. Varodayan, Y.-C. Lin, A. Mavlankar, M. Flierl, and B. Girod. Wyner-Ziv coding of stereo images with unsupervised learning of disparity. In Picture Coding Symposium, Lisbon, Portugal, 2007.
[191] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod. Distributed coding of random dot stereograms with unsupervised learning of disparity. In IEEE International Workshop on Multimedia Signal Processing, Victoria, BC, Canada, 2006.
[192] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod. Distributed grayscale stereo image coding with unsupervised learning of disparity. In IEEE Data Compression Conference, Snowbird, UT, 2007.
[193] R. Venkatesan, S.-M. Koon, M. H. Jakubowski, and P. Moulin. Robust image hashing. In IEEE International Conference on Image Processing, volume 3, pages 664–666, 2000.
[194] A. Vetro, S. C. Draper, S. Rane, and J. S. Yedidia. Securing biometric data, chapter 11, pages 293–324. Academic Press, Inc., 2009.
[195] Y. Wang, D. Wu, H. Zhang, and X. Niu. A robust contourlet based image hash algorithm. In IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pages 1010–1013, September 2009.
[196] Z. Wang, A. C. Bovik, and B. L. Evans. Blind measurement of blocking artifacts in images. In IEEE International Conference on Image Processing, volume 3, pages 981–984, 2000.
[197] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004.
[198] Z. Wang, H. R. Sheikh, and A. C. Bovik. No-reference perceptual quality assessment of JPEG compressed images. In IEEE International Conference on Image Processing, volume 1, pages 477–480, September 2002.
[199] Z. Wang and E. P. Simoncelli. Reduced-reference image quality assessment using a wavelet-domain natural image statistic model. In SPIE Conference on Human Vision and Electronic Imaging, San Jose, CA, January 2005.
[200] Z. Wang, G. Wu, H. R. Sheikh, E. P. Simoncelli, E.-H. Yang, and A. C. Bovik. Quality-aware images. IEEE Transactions on Image Processing, 15(6):1680–1689, June 2006.
[201] A. B. Watson, J. Hu, and J. F. McGowan III. Digital video quality metric based on human vision. SPIE Journal of Electronic Imaging, 10(1):20–29, 2001.
[202] A. A. Webster, C. T. Jones, M. H. Pinson, S. D. Voran, and S. Wolf. An objective video quality assessment system based on human perception. In SPIE Conference on Human Vision, Visual Processing, and Digital Display, volume 1913, pages 15–26, 1993.
[203] S. Winkler. Issues in vision modeling for perceptual video quality assessment. Signal Processing, 78(2):231–252, 1999.
[204] S. Wolf and M. H. Pinson. Spatial-temporal distortion metric for in-service quality monitoring of any digital video system. In SPIE Conference on Multimedia Systems and Applications, volume 3845, pages 266–277, 1999.
[205] S. Wolf and M. H. Pinson. Low bandwidth reduced reference video quality monitoring system. In International Consumer Electronics Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, January 2005.
[206] R. B. Wolfgang and E. J. Delp. A watermark for digital images. In IEEE International Conference on Image Processing, Lausanne, Switzerland, September 1996.
[207] A. Wyner. Recent results in the Shannon theory. IEEE Transactions on Information Theory, 20(1):2–10, January 1974.
[208] L. Xie, G. R. Arce, and R. F. Graveman. Approximate image message authentication codes. IEEE Transactions on Multimedia, 3(2):242–252, June 2001.
[209] Z. Xiong, A. D. Liveris, and S. Cheng. Distributed source coding for sensor networks. IEEE Signal Processing Magazine, 21(5):80–94, September 2004.
[210] T. Yamada, Y. Miyamoto, and M. Serizawa. No-reference video quality estimation based on error-concealment effectiveness. In IEEE Packet Video Conference, Lausanne, Switzerland, November 2007.
[211] T. Yamada, Y. Miyamoto, M. Serizawa, and H. Harasaki. Reduced-reference based video quality metrics using representative luminance values. In International Consumer Electronics Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, January 2007.
[212] F. Yang, S. Wan, Y. Chang, and H. R. Wu. A novel objective no-reference metric for digital video quality assessment. IEEE Signal Processing Letters, 12(10):685–688, October 2005.
[213] S. Yang. Robust image hash based on cyclic coding the distributed features. In International Conference on Hybrid Intelligent Systems, volume 2, pages 441–444, August 2009.
[214] S.-H. Yang and C.-F. Chen. Robust image hashing based on SPIHT. In International Conference on Information Technology: Research and Education, pages 110–114, June 2005.
[215] R.-X. Zhan, K. Y. Chau, Z.-M. Lu, B.-B. Liu, and W. H. Ip. Robust image hashing for image authentication based on DCT-DWT composite domain. In IEEE International Conference on Intelligent Systems Design and Applications, volume 2, pages 119–122, November 2008.
[216] H. Zhang, C. Yeo, and K. Ramchandran. VSYNC: a novel video file synchronization protocol. In ACM Multimedia, pages 757–760, Vancouver, British Columbia, Canada, October 2008.
[217] H. Zhang, C. Yeo, and K. Ramchandran. Rate efficient remote video file synchronization. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1845–1848, April 2009.
[218] H. Zhang, C. Yeo, and K. Ramchandran. Remote video file synchronization for heterogeneous mobile clients. In SPIE Conference on Applications of Digital Image Processing, volume 7443, page 74430F, 2009.
[219] H. Zhang, H. Zhang, Q. Li, and X. Niu. Predigest Watson’s visual model as perceptual hashing method. In International Conference on Convergence and Hybrid Information Technology, volume 2, pages 617–620, November 2008.
[220] H.-L. Zhang, C.-Q. Xiong, and G.-Z. Geng. Content based image hashing robust to geometric transformations. In International Symposium on Electronic Commerce and Security, volume 2, pages 105–108, May 2009.
[221] Y. Zhao and J. García-Frías. Data compression of correlated non-binary sources using punctured Turbo codes. In Data Compression Conference, pages 242–251, Snowbird, UT, April 2002.
[222] Z. Zhu, A. Aaron, and B. Girod. Distributed compression for large camera array. In International Workshop on Statistical Signal Processing, St. Louis, MO, September 2003.
[223] Z. Zhu, S. Rane, and B. Girod. Systematic lossy error protection (SLEP) for video transmission over wireless ad hoc networks. In SPIE Visual Communications and Image Processing Conference, Beijing, China, July 2005.