IMAGE AUTHENTICATION
USING DISTRIBUTED SOURCE CODING
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Yao-Chung Lin
September 2010
© 2011 by Yao-Chung Lin. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/jw121yz9884
Abstract
Image authentication is important in content delivery via untrusted intermediaries,
such as peer-to-peer (P2P) file sharing. Many differently encoded versions of the original image might exist. In addition, intermediaries might tamper with the contents.
Distinguishing legitimate diversity from malicious manipulations is the challenge addressed in this dissertation.
We propose an approach using distributed source coding for the image authentication problem. The key idea is to provide a Slepian-Wolf encoded quantized image
projection as authentication data. This version can be correctly decoded with the
help of an authentic image as side information. Distributed source coding provides
the desired robustness against legitimate variations while detecting illegitimate modification. The decoder incorporating expectation maximization (EM) algorithms can
authenticate images which have undergone contrast, brightness, and affine warping
adjustments. Our novel authentication system also offers tampering localization by
using inference over a factor graph that represents tampering models.
Video quality monitoring is closely related to the image authentication problem.
We contribute an approach using distributed source coding. The video receiver sends
the Slepian-Wolf coded video projections to the quality monitoring server which has
access to the original video. Distributed source coding provides rate-efficient encoding
of the projection by exploiting the correlation between the projections of the original
and received videos. We show that the projections can be encoded at a low rate of
just a few kilobits per second. Compared to the ITU-T J.240 Recommendation for remote PSNR monitoring, our scheme achieves a bit rate at least one order of magnitude lower.
Acknowledgments
During my time at Stanford, I have been fortunate to have the support of many great people. My advisor, Prof. Bernd Girod, consistently inspired me with interesting research questions and creative suggestions that made my graduate study joyful. I would like to thank Prof. Robert M. Gray and Dr. Ton Kalker for serving on my committee and offering insightful comments that improved this dissertation. I am deeply grateful to Dr. Erwin Bellers and Torsten Fink for mentoring my research projects through three summers and beyond.
Working with the Image, Video, and Multimedia Systems (IVMS) group has been a rewarding experience. We shared not only sharp insights and stimulating ideas but also the interesting things happening around us. I sincerely thank my key collaborator, David Varodayan, for countless fruitful discussions that gradually enriched my research and baby-care experience. I would like to extend my gratitude to the other members, alumni, and visitors of the IVMS group for their friendship throughout the years.

Most importantly, my deepest gratitude goes to my parents and grandparents for their love throughout my life; to my wife, Caroline, for making my life wonderful; and to little Lucas, who arrived recently, for reminding me to work hard and keep learning. This dissertation is dedicated to my family.
Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Research Contributions
  1.2 Organization
2 Background
  2.1 Robust Hashing for Image Authentication
    2.1.1 Compression-Inspired Features
    2.1.2 Block Projection
    2.1.3 Robust Projection
    2.1.4 Coding of Features
  2.2 Foundations of Distributed Source Coding
    2.2.1 Lossless Distributed Source Coding
    2.2.2 Practical Slepian-Wolf Coding
  2.3 Secure Biometrics
    2.3.1 Secure Biometrics Using Slepian-Wolf Codes
    2.3.2 Privacy Leakage and Secret-Key Rate
    2.3.3 Comparison to Image Authentication
  2.4 Rate Constrained Hypothesis Testing
  2.5 Video Quality Monitoring Background
    2.5.1 Full-Reference Quality Assessment
    2.5.2 No-Reference Quality Assessment
    2.5.3 Reduced-Reference Quality Assessment
  2.6 Summary
3 Image Authentication Using DSC
  3.1 Image Authentication Problem
    3.1.1 Two-State Channel
    3.1.2 Residual Statistics
  3.2 Image Authentication System
  3.3 Simulation Results
    3.3.1 Authentication Data Size
    3.3.2 Receiver Operating Characteristic
  3.4 Summary
4 Learning Unknown Parameters
  4.1 Two-State Channel
  4.2 EM Decoder for Contrast and Brightness Adjustment
  4.3 EM Decoder for Affine Warping Adjustment
  4.4 Contrast, Brightness, and Affine Warping Adjustment
  4.5 Simulation Results
    4.5.1 Contrast and Brightness Adjustment
    4.5.2 Affine Warping
    4.5.3 Contrast, Brightness, and Affine Warping Adjustment
  4.6 Summary
5 Tampering Localization
  5.1 Space-Varying Two-State Channel
  5.2 Decoder Factor Graph
  5.3 Spatial Models for State Nodes
  5.4 Tampering Localization for Adjusted Images
  5.5 Simulation Results
    5.5.1 Setup
    5.5.2 Decodable Rate
    5.5.3 Receiver Operating Characteristic
  5.6 Summary
6 Video Quality Monitoring Using DSC
  6.1 Video Quality Monitoring System
    6.1.1 J.240 Feature Extraction
    6.1.2 PSNR Estimation
  6.2 Performance Prediction
    6.2.1 Performance Prediction
    6.2.2 Synthesized Data Simulation
  6.3 Experimental Results
  6.4 Summary
7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
A Test Images
B Concavity of L̂
Bibliography
List of Figures

2.1 Image authentication scheme based on robust hashing
2.2 Source coding with side information at the decoder
2.3 Compression for hypothesis testing with side information
3.1 Two-state lossy channel
3.2 Examples of the two-state lossy channel output
3.3 The difference between the two-state lossy channel input and output
3.4 Sample autocorrelation function of differences
3.5 Power spectral density function of differences
3.6 The difference distributions between the two-state lossy channel input and output using the blockwise mean as the projection
3.7 The difference distributions between the two-state lossy channel input and output using a high frequency projection
3.8 Image authentication system using distributed source coding
3.9 Minimum rate for decoding the Slepian-Wolf bitstream for the image Lena with the projection X quantized to 4 bits
3.10 Authentication data sizes in number of bytes using conventional fixed-length coding and distributed source coding for different numbers of quantization bits
3.11 ROC curves of tampering detection with different numbers of bits in the quantization of X
3.12 ROC equal error rates for different authentication data sizes using conventional fixed-length coding and distributed source coding
3.13 ROC curves of various authentication methods
4.1 Two-state channel with unknown adjustment parameters
4.2 Examples of the channel output
4.3 The oracle decoder
4.4 Contrast and brightness learning Slepian-Wolf decoder
4.5 Search traces for different decoders
4.6 Realignment of an affine warped image
4.7 Slepian-Wolf decoder with affine warping parameter learning
4.8 Example of corresponding coordinate estimation in 1D
4.9 Minimum decodable rates for contrast and brightness adjusted images
4.10 ROC curves for contrast and brightness adjusted images
4.11 Minimum decodable rates for rotated and sheared images
4.12 ROC curves for the target images that have undergone affine warping
4.13 ROC curves for the target images that have undergone contrast, brightness, and affine warping adjustments
5.1 Space-varying two-state lossy channel
5.2 Target image overlaid with channel states
5.3 Factor graph for the localization decoder
5.4 Factor graph for the localization decoder with spatial models
5.5 Spatial models for the channel states
5.6 Space-varying two-state lossy channel with contrast and brightness adjustment
5.7 Contrast and brightness learning Slepian-Wolf decoder for tampering localization
5.8 Minimum localization data rates for decoding S(Xq) using tampered side information compared to the authentication data rates
5.9 ROC curves of the tampering localization decoders using spatial models
5.10 ROC curves of the tampering localization decoders facing contrast and brightness adjusted images
6.1 Video quality monitoring scheme using distributed source coding
6.2 Random projection of the J.240 feature extraction module
6.3 Distributions of X and X − Y
6.4 MSE estimation errors of maximum likelihood estimation
6.5 PSNR estimation errors of maximum likelihood estimation
6.6 Average squared PSNR estimation error
6.7 Distortion-rate curves of the transcoded test video sequences
6.8 RMS PSNR estimation error versus the number of bits in the quantization of X
6.9 RMS PSNR estimation error versus video digest data rates
A.1 Test images used in simulations
B.1 Concavity of the log-likelihood function as we vary the number of bits in quantization of X
Chapter 1
Introduction
Media content can be efficiently delivered through intermediaries, such as peer-to-peer
(P2P) file sharing and P2P multicast streaming. Popular P2P file sharing systems
include BitTorrent [1], eMule [2], and KaZaA [3]. In these systems, each user not
only receives the requested content but also acts as a relay forwarding the received
portions to the other users. Since the same content can be re-encoded several times,
media content in those P2P file sharing systems is available in various digital formats,
such as JPEG [73] and JPEG2000 [77] for images, and MPEG-1 [74], MPEG-2 [75],
and H.264/AVC [76, 79] for videos. On the other hand, untrusted intermediaries might tamper with the media for a variety of reasons, such as interfering with
the distribution of particular files, piggybacking unauthentic content, or generally
discrediting a particular distribution system. A recent survey indicates that more
than 50% of popular songs in KaZaA are corrupted [96], e.g., replaced with noise
or different songs. Distinguishing legitimate encoding versions from maliciously tampered ones is important in applications that deliver media content through untrusted
intermediaries.
The problem is more challenging if some legitimate adjustments, such as cropping
and resizing an image, are allowed in addition to lossy compression. Such adjustments might not change the meaning of the content, but could be misclassified as
tampering. Users might also be interested in localizing tampered regions in a received
content already deemed tampered, so that the user can request that the tampered
regions be retransmitted for recovery instead of requesting the whole content. On
the other hand, the content distributor and the users can benefit from knowing the
quality of the received content. Distinguishing legitimate encodings with possible adjustments from tampering, localizing tampering, and estimating the received content
quality are the challenges addressed in this thesis with a focus on image authentication
problems.
During digital video content delivery, distortions are introduced by lossy compression, bitstream transmission, and reconstruction of the media content from
possibly transcoded or damaged bitstreams. To ensure quality of service for the
whole media delivery system, the first step is to monitor the fidelity of the received
video. The quality monitoring problem is closely related to image authentication.
Image authentication systems test two hypotheses, legitimate and tampered, while
the quality monitoring systems examine a continuum of hypotheses about quality.
This work contributes to the video quality monitoring problem by improving coding
efficiency and quality estimation methods.
1.1 Research Contributions
This thesis presents an extension of robust hashing for image authentication and quality monitoring problems using distributed source coding principles. Some results have
been published in [100–108]. The major contributions of this work are summarized
below:
• The concept of distributed source coding is applied to the image authentication
problem. Statistical properties of legitimate encodings and tampering are formulated. The use of blockwise projection of images is justified to capture the
spatial structure of possible tampering. The robust hash generated by Slepian-Wolf encoding of the quantized original image projection is transmitted to the
user. The user attempts to decode this robust hash using the target image as
side information. The Slepian-Wolf result [167] indicates that the lower the distortion between side information and the original, the fewer authentication bits
are required for correct decoding. By correctly choosing the size of the authentication data, this insight allows us to distinguish between legitimate encoding
variations of the image and illegitimate modifications.
• An extension of Slepian-Wolf decoding using an expectation maximization (EM)
algorithm is proposed to address the authentication problem when additional
adjustments appear. The extended decoder iteratively updates the editing parameters using the soft information of the partially decoded image projection
and then decodes the Slepian-Wolf bitstream using the updated side information. The resulting authentication scheme can be robust against contrast,
brightness, and affine warping adjustments.
• We model the tampering localization problem using a space-varying channel and
construct the corresponding decoder factor graph representation [89]. Using
a message-passing algorithm over the decoder factor graph, the scheme can
localize tampering in an image. The extensions with 1D and 2D spatial models
to exploit the contiguity of tampering result in a smaller required Slepian-Wolf
bitstream.
• The analysis and coding of video quality monitoring systems are investigated.
The maximum-likelihood estimation of the mean squared error between original
and received feature pixels results in a smaller number of bits required for
quantization of the feature pixels. Slepian-Wolf coding of quantized feature
pixels additionally yields rate savings. We characterize the system by using the
Cramér-Rao lower bound.
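The syndrome-based mechanism behind the first contribution can be illustrated with a toy sketch. A tiny parity-check code with distinct nonzero columns (a shortened Hamming code) stands in for the powerful LDPC codes used in practice, and all bit patterns below are made up; the point is only that a low-rate syndrome decodes correctly against mildly distorted side information but fails against heavily tampered side information.

```python
import numpy as np

n, k = 12, 4
# Parity-check matrix whose columns are the binary codes of 1..12,
# so every single-bit error has a unique, nonzero syndrome.
H = np.array([[(j + 1) >> b & 1 for j in range(n)] for b in range(k)])

x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0])  # quantized projection bits
syndrome = H @ x % 2   # the Slepian-Wolf bitstream sent as authentication data

def decode(y):
    """Correct y to the unique word within Hamming distance 1 that
    matches the transmitted syndrome, or return None (tampered)."""
    for idx in [()] + [(i,) for i in range(n)]:
        e = np.zeros(n, dtype=int)
        e[list(idx)] = 1
        if np.array_equal(H @ ((y + e) % 2) % 2, syndrome):
            return (y + e) % 2
    return None

legit = x.copy(); legit[5] ^= 1                    # small legitimate distortion
tampered = x.copy(); tampered[[0, 1, 3, 7]] ^= 1   # larger tampering

assert np.array_equal(decode(legit), x)   # decodes to the original projection
assert decode(tampered) is None           # no nearby codeword: reject
```

The rate trade-off in the bullet above appears here in miniature: four syndrome bits suffice to recover twelve source bits when the side information is close to the original, but not when it has drifted far away.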
1.2 Organization
This thesis is organized as follows:
• Chapter 2 reviews past approaches for image authentication. Then we describe
some foundations of lossless distributed source coding which offer potential improvement on the image authentication system based on robust hashing. We
also review research into the secure biometric problem, which is closely related to the image authentication problem. Both secure biometrics and image authentication have similar information-theoretic fundamentals. The theoretical results
suggest that we explore the image authentication problem using distributed
source coding. Lastly, we review past approaches for video quality monitoring which is closely related to the image authentication problem and can also
benefit from distributed source coding.
• Chapter 3 introduces the image authentication system using distributed source
coding. We formulate the image authentication problem with a hypothesis
testing setup by assuming that legitimate compressions introduce small white
noise whereas tampering adds spatially correlated noise. The proposed system
reduces the image dimensions by using blockwise projection to capture the
spatial structure. The Slepian-Wolf encoder codes the projections. By correctly
choosing the size of the Slepian-Wolf bitstream, the resulting bitstreams can be
decoded using the legitimate image as side information. Section 3.3 shows that
our system can distinguish legitimate encodings from malicious tampering using
authentication data of less than 100 bytes.
• Chapter 4 presents a solution to authenticate images that have undergone legitimate editing, such as contrast, brightness, and affine warping adjustments.
The authentication decoder learns the editing parameters directly from the target image through decoding the authentication data using an EM algorithm.
Section 4.1 introduces a two-state channel with unknown editing parameters
to formulate the problem. Section 4.2 describes the proposed authentication
decoder for images that have undergone contrast and brightness adjustment.
Section 4.3 presents our solution to the authentication of affine warped images.
Section 4.4 extends the decoder to address images that have simultaneously
undergone contrast, brightness, and affine warping adjustment. Experimental
results in Section 4.5 demonstrate that the EM decoder can distinguish legitimate editing from malicious tampering while accurately learning the parameter
with an authentication data size comparable to that of the oracle decoder, which knows
the ground truth parameters.
• Chapter 5 extends the authentication system to localize the tampering. Sec-
tion 5.1 formulates the localization problem using a space-varying two-state
channel model. Section 5.2 describes the factor graph representation of the
corresponding decoder. The decoder can reconstruct the image projection from the localization data, using tampered images as side information, by performing
a message-passing algorithm over the factor graph. Section 5.3 presents spatial models that exploit the spatial correlation of the tampering. Section 5.4
extends the decoder to localize the tampering in tampered images that have
undergone legitimate contrast and brightness adjustment. Simulation results in
Section 5.5 demonstrate that the authentication system can localize the tampering with high probability using localization data of a few hundred bytes.
• Chapter 6 studies a reduced-reference video quality monitoring scheme using
distributed source coding. Section 6.1 describes the scheme in detail and provides the rationale for using distributed source coding. In Section 6.2, a theoretical framework based on the Cramér-Rao lower bound gives a performance
prediction of the maximum likelihood variance estimation. The proposed performance prediction is confirmed by simulations. In Section 6.3, our approach is
compared with the ITU-T J.240 Recommendation for remote PSNR monitoring.
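The noise model underlying Chapters 3 through 5 — legitimate compression behaving like small white noise, tampering like spatially correlated noise — can be sketched numerically. The data below are synthetic and the 8×8 block size is an arbitrary choice; the sketch shows why blockwise means suppress white noise but preserve localized tampering.

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.normal(0, 2, size=(64, 64))     # white noise from legitimate compression
patch = np.zeros((64, 64))
patch[20:40, 20:40] = 8.0                   # spatially correlated tampering

def block_means(d, b=8):
    """Blockwise mean projection of a 64x64 difference image."""
    return d.reshape(64 // b, b, 64 // b, b).mean(axis=(1, 3))

# Averaging over 64 pixels shrinks white noise by a factor of 8,
# while blocks fully inside the tampered region keep their full amplitude.
assert np.abs(block_means(white)).max() < 2.0
assert np.isclose(block_means(patch).max(), 8.0)
```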
Chapter 2
Background
The objective of image authentication is to distinguish legitimate variations in content
from maliciously edited ones. Past approaches for image authentication fall into three
groups: forensics, watermarking, and robust hashing. In digital forensics, the user
verifies the authenticity by solely checking the received content [55]. For example,
the histograms of transform coefficients offer a clue about how many times lossy
compression has been applied [118] and the spectrum of color components tells us
if the image is spliced from two or more images [129]. Unfortunately, these forensic
methods cannot work well on low-quality images, since compression noise or re-encoding would weaken those forensic traces. Without any information from the
original, one cannot confirm the integrity of the received content. Therefore, content
independent from the original may pass forensic checking. The next option for image
authentication is watermarking. In this option, a semi-fragile watermark is embedded
into the host signal waveform without perceptual distortion [36, 48, 206]. Users can
confirm the authenticity by extracting the watermark from the received content. The
system design should ensure that the watermark survives lossy compression, but that
it “breaks” as a result of malicious manipulations. Unfortunately, watermarking
authentication is not backward compatible with previously encoded contents, i.e.,
unmarked content cannot be authenticated later. Embedded watermarks might also
increase the bit rate required when compressing a media file.
Robust hashing can check the integrity of the received content using a small
amount of data derived from the original content. Cryptographic hashing [42,47,151]
is a special case in which the authentication data are generated using a scrambling
hash function that is nearly impossible to invert; any modification of the content is
not allowed as modifications yield a very different hash value. However, cryptographic
hashing is not applicable to the image authentication problem as processed images are
not exactly identical to the original but carry the same meaning. Researchers have
been investigating robust hashing schemes that distinguish allowable distortion from
malicious editing. This chapter reviews related work on robust hashing approaches.
Section 2.1 reviews robust hashing schemes to offer an overview of previous approaches to the image authentication problem. Section 2.2 describes the key element
of this work, distributed source coding, by reviewing Slepian-Wolf results, and some
practical implementations of the Slepian-Wolf codec. Section 2.3 reviews contributions to the secure biometric problem, which is closely related to the image authentication problem. Both secure biometrics and image authentication have similar information-theoretic fundamentals, which will be reviewed in Section 2.4. The theoretical
results suggest that we explore the image authentication problem using distributed
source coding. A closely related problem, quality monitoring, will be reviewed in
Section 2.5.
2.1 Robust Hashing for Image Authentication
Robust hashing achieves verification of previously encoded media by using an authentication server to supply authentication data to the user. Digital signatures [42, 152]
have solved the problem when only unaltered content is allowed. The idea is to
generate a hash value of the original content using a cryptographic hash function,
such as MD5 [151] and SHA-1 [47], which is then signed by the private key of an
authority using an asymmetric encryption system. The user can check if the content
is altered by comparing the hash value of the received content to that in the digital
signature. However, this solution is not applicable when some legitimate editing is
allowed, since changing any single bit leads to a completely different hash. Inspired by
cryptographic hashing, robust hashing is proposed to provide proof of perceptual integrity.
If two media signals are perceptually indistinguishable, they should have identical
hash values. A common approach in media hashing is extracting features which have
perceptual importance and should survive compression. The authentication data are
generated by compressing these features or their hash values. The user checks the
authenticity of the received content by comparing the features or their hash values to
the authentication data.
Typical robust hashing schemes for image authentication consist of three parts:
feature extraction, coding of feature vector, and verification. In feature extraction,
the original image is analyzed to obtain a set of feature vectors that would be robust
against some type of processing, such as lossy compression. The (possibly quantized)
feature vectors are coded into a bitstream as a part of the authentication data. The
authenticity of the received image is verified at the receiver along with the authentication data which can be delivered through secure channels or embedded in the image.
Figure 2.1: Image authentication scheme based on robust hashing. The authentication data are generated from robust feature vectors of the original image. The
received image is classified as authentic or tampered using the authentication data, which are delivered via secure channels.
2.1.1 Compression-Inspired Features
Many image authentication systems achieve robustness against lossy compression by
using compression invariant features. For JPEG compression, Lin et al. proposed
to use the invariant relationship of DCT coefficients [97–99]. The key idea is that
the partial order relations between two transform coefficients (i.e., ≤ and ≥) remain
unchanged after quantization and reconstruction. The binary feature vector encodes
a set of the DCT coefficient relations of the same frequency in two pseudo-randomly
selected blocks. For JPEG2000, Lu et al. proposed a similar image authentication
scheme using the partial order relation of wavelet child-parent pairs [115, 116]. Oostveen et al. [128] and Adel-Mottaleb et al. [12] also investigated relation-based features
for authentication. Other approaches directly use intermediate results in image coding, such as scan states of embedded block coding with optimal truncation (EBCOT)
from JPEG2000 [171], binary significance map in set partitioning in hierarchical trees
(SPIHT) [214], directly hashing the wavelet coefficients [14, 155], and using critical
sets of DCT coefficients [85]. These compression-inspired features are designed for
the corresponding compression schemes but would fail in other coding schemes or
common image processing.
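The invariance that these relation-based features rely on follows from monotonicity: uniform quantization with a shared step size never reverses the order of two coefficients (though it may merge them). A minimal sketch, with made-up coefficient values and a JPEG-style step size:

```python
import numpy as np

def quantize(c, step):
    """JPEG-style uniform quantization followed by reconstruction."""
    return np.round(c / step) * step

step = 8.0   # shared quantization step for coefficients of the same frequency
rng = np.random.default_rng(0)

# Quantization is monotone non-decreasing, so the >= relation between
# two same-frequency coefficients survives quantize-and-reconstruct.
for c1, c2 in rng.uniform(-100, 100, size=(1000, 2)):
    hi, lo = max(c1, c2), min(c1, c2)
    assert quantize(hi, step) >= quantize(lo, step)
```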
2.1.2 Block Projection
Researchers investigated using block-based statistics or more sophisticated projection
for the feature vectors to increase robustness. Block-based approaches can preserve
feature locality for locating possible tampering. Schneider et al. proposed using image
block-based histograms for image authentication [156]. The block-based histograms
are robust against acceptable image compression, but an attacker can easily change
the content while keeping the histograms unchanged. Fridrich considered zero-mean
low-pass Gaussian pseudo-random projection for image authentication [56, 57]. The
purpose of zero-mean projection is to enhance the robustness against contrast and
brightness adjustment and other common image processing. However, this raises the
security issue that the attacker can arbitrarily change the mean of image blocks while
keeping the feature vector unaltered. This is because the null space of the projection
is known. The same problem affects authentication schemes that use a fixed projection, such as features based on image block standard deviations or means [84, 113],
column and row projections [208], and transform coefficients [215, 219]. This security
issue can be addressed by using pseudo-random projection or pseudo-random tiling,
such as [193], that keeps the null space or partitions unknown to the attacker. Typical
block-based features suffer from a robustness issue: when the target image is rotated,
cropped, resized, or translated, the system fails as the features in the authentication
data are no longer aligned to the target image. Mihcak et al. proposed the features
derived from a low-pass band of randomly partitioned images. They argued that
one should iteratively apply an order statistics filter and a linear filter to increase
the robustness against slight affine transformations [123]. Khanna et al. suggested
normalizing the images before verification [87]. Next, we will review some sophisticated projections to authenticate images that have undergone rotation, scaling, and
translation.
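Both the brightness robustness of a zero-mean projection and its null-space weakness follow from linearity: a constant offset added to a block is annihilated by any projection whose weights sum to zero. A minimal numeric sketch (block size, seed, and offset are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(16, 16)).astype(float)

weights = rng.normal(size=(16, 16))
weights -= weights.mean()          # zero-mean pseudo-random projection

feature = float((weights * block).sum())

# Robustness: a global brightness shift leaves the feature unchanged ...
brightened = block + 25.0
assert np.isclose(float((weights * brightened).sum()), feature)
# ... which is also the weakness: an attacker who knows the null space
# can alter the block mean arbitrarily without altering the feature.
```

Keeping the projection pseudo-random and secret, as discussed above, denies the attacker knowledge of that null space.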
2.1.3 Robust Projection
Many sophisticated projections are proposed to be invariant or robust to rotation,
scaling, and translation. Most are based on Radon and Fourier transforms. Lefebvre et al. use principal component analysis on Radon transform coefficients of images
to obtain features for authentication [94]. The important coefficients of the Radon
transformed image are extracted by principal component analysis to yield a short
representation for the authentication data. Principal component analysis also makes
the features robust to scaling and rotation. Similarly, Seo et al. [162] and Zhang et
al. [220] obtain affine invariant features from the log-mapped Radon transform coefficients. Swaminathan et al. proposed using a polar representation of the 2D Fourier
transform of an image to generate robust features [174, 175]. Each element of the proposed feature vector is a pseudo-randomly weighted summation along the circumference at a selected radius in the Fourier-transformed image. The magnitudes of Fourier transform coefficients are invariant to translation, and the circumferential summation offers rotational invariance. Other features robust against rotation include the
variance of pixels along a radius [40, 41], the mean of pixels within a radial sector [213],
contourlet coefficients [22, 195], and diametric strip projection [179]. One interesting
approach is to treat each image block as a non-negative matrix and then use dimensionality reduction techniques to generate image hashes [67, 88, 119, 126, 127, 180]. Other
methods to enhance the robustness of the image authentication system include more
sophisticated features that are important to the human visual system.
Many approaches consider using visually important features for image authentication. These features are designed to be robust against moderate quality image
compression and other content preserving processing but sensitive to malicious editing. Monga et al. proposed using the features derived from the end-stop wavelet
coefficients to which human visual perception is reportedly sensitive [124,125]. Edges
and contours of objects in an image are investigated as promising candidates for
robust features in image authentication [21, 43, 138, 139]. Features inspired by the
computer vision community for content-based image retrieval have been considered
by Bhattacharjee et al. [21], Roy et al. [153], Lu et al. [114], and Schlauweg et al. [154].
2.1.4 Coding of Features
Most researchers in this field focus on investigating feature extraction. Generation
of the final authentication data requires quantization and compression of the feature
vectors. However, few articles discuss coding methods. Most approaches use a coarse
quantization to have short authentication data. For example, Fridrich et al. use 1-bit
quantization for random projection coefficients [56, 57, 153], and the relation-based
approaches [12, 97–99, 115, 116, 128] can be considered 1-bit quantizations of coefficient differences. The entropy-coded or cryptographically hashed binary representation
of the feature vector serves as the authentication data. Since the original and the
legitimate target images are highly correlated, one can generate less authentication
data by exploiting the correlation.
To the best of our knowledge, Venkatesan et al. were the first to consider error-correcting coding in the image authentication system to reduce the authentication
data size [193]. The idea is to project the binary feature vectors of the images into
syndrome bits of an error-correcting code and directly compare the syndrome bits to
decide the authenticity. Sun et al. independently considered using error-correcting
coding in the image authentication system [170]. Their approach uses systematic
Hamming codes to obtain the parity bits of the binary feature vectors as the authentication data. The idea is to use the parity bits of systematic channel codes
concatenated with the binary feature vector of the received image to correct the
errors introduced by image processing, such as compression. Both approaches use
bounded distance error-correcting codes. Further improvements can be made with
the knowledge of distributed source coding. Another use for distributed source coding
in authentication is to secure the feature vector. Johnson et al. proposed an image
authentication scheme that protects the feature vector using subtractive dithering
and compresses the dithered feature vector using distributed source coding [83]. The
decoder uses the dither sequence as side information to decode the authentication
data. The next section will review the foundations of distributed source coding and
the connection to error-correcting codes.
2.2 Foundations of Distributed Source Coding
Distributed source coding addresses separate compression of statistically dependent
random sequences. Each encoder separately observes a random sequence and sends a
bitstream to a single decoder. The decoder reconstructs the random sequences from
the incoming bitstreams. Efficient compression can be achieved despite independent
encoding. This concept has been applied to many applications, such as data compression for sensor networks [130,209], low complexity video encoding [5–11,134–137],
compression for flexible video playback [28–31], systematic lossy error protection for
videos [8, 140–147, 223], distributed compression for large camera arrays [222], distributed stereo image coding [24–26, 190–192], and compression of encrypted content [157, 161]. Overviews of recent developments and applications of distributed
source coding can be found in [44,64]. This section reviews the Slepian-Wolf theorem
and practical implementations of Slepian-Wolf coding.
2.2.1 Lossless Distributed Source Coding
A sequence of N independent and identically distributed (i.i.d.) samples of a finite-alphabet random variable X can be compressed with negligible information loss at a
coding rate (in bits per source symbol) arbitrarily close to H(X) as N tends to infinity,
where H(X) is the entropy of the random process [163]. Similarly, a random sequence
of i.i.d. samples of Y can be compressed into a binary sequence at a coding rate close
to H(Y ). Clearly, a total coding rate of (H(X) + H(Y )) can allow the sequences of
X and Y to be reconstructed at the decoder. If X and Y are statistically dependent,
only H(X, Y ) bits per symbol are needed, but can it be done by encoding X and Y
separately?
Slepian and Wolf gave a positive answer to the above question [167], showing
the achievable rate region:
RX + RY ≥ H(X, Y )
RX ≥ H(X|Y )
RY ≥ H(Y |X)
where RX and RY are the coding rates in bits per source symbol for X and Y ,
respectively.
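To make the rate region concrete, consider a toy binary example of our own (not from the theorem itself): X is a fair bit and Y is X passed through a binary symmetric channel with crossover probability 0.1.

```python
import numpy as np

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# X ~ Bernoulli(1/2); Y = X xor Z with Z ~ Bernoulli(0.1).
p = 0.1
H_X = 1.0
H_Y = 1.0
H_XY = H_X + h2(p)          # H(X,Y) = H(X) + H(Y|X), and H(Y|X) = h2(p)
H_X_given_Y = h2(p)         # by symmetry, H(X|Y) = h2(p)

# Separate encoding at the Slepian-Wolf corner point (R_X, R_Y) = (h2(p), 1):
# the total rate equals the joint entropy, well below the naive H(X) + H(Y) = 2.
print(round(H_X_given_Y, 3))    # ≈ 0.469 bits per symbol for X
print(round(H_XY, 3))           # ≈ 1.469 bits per symbol in total
assert abs((H_X_given_Y + H_Y) - H_XY) < 1e-12
```

At this corner point, Y is compressed conventionally at rate 1 bit while X rides along at only h2(0.1) ≈ 0.469 bits, even though the two encoders never communicate.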
Consider the special case depicted in Figure 2.2. We are interested in compression
of a sequence of X statistically dependent on the side information Y . The side
information Y is only available at the decoder. The Slepian-Wolf result implies that
the rate for lossless compression of X is at least H(X|Y ). Conversely, when the rate
is less than H(X|Y ), the probability of decoding error is bounded away from zero.
Recent work contributes several practical solutions approaching the theoretical limit.
We will discuss practical implementations of this coding scheme in Section 2.2.2.
2.2.2 Practical Slepian-Wolf Coding
Although the theoretical results of distributed source coding have been available for
more than three decades, most practical coding schemes emerged only recently. From
Figure 2.2: A special case of the Slepian-Wolf theorem. The discrete memoryless
random variables X and Y are statistically dependent, but Y is only available at the
decoder.
the proof of the theorem in [34], the encoder divides the source space X^N into 2^{NR}
bins. For each input sequence, the encoder sends the index of its bin. With a sufficient
rate R for the binning, the decoder can reliably find the unique and correct X that
is jointly typical with the side information Y . Although the proof does not provide
any constructive means of binning, recent advances on channel codes offer promising
solutions.
In the outline of the alternative proof of the Slepian-Wolf theorem in [207], the
author demonstrates that channel coding is closely related to Slepian-Wolf coding.
Consider binary sequences X and Y that are correlated and differ in only a few positions. The difference can be modeled as an error sequence introduced by a correlation
channel that captures the statistical dependence between X and Y . The key idea
is to send the syndrome of the source X to the decoder. The decoder corrects the
error by decoding the concatenation of the syndrome and the side information Y .
Csiszár showed linear codes can achieve the Slepian-Wolf rate with a bounded error
exponent [38]. This implies that using syndrome bits of channel codes as binning
indices is a suitable binning strategy.
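The syndrome-as-bin-index idea can be sketched with the (7,4) Hamming code, assuming the correlation channel flips at most one bit (a deliberately tiny illustration of ours; practical systems use long LDPC or turbo codes):

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j holds the binary
# expansion of j+1, so the syndrome of a single-bit error names its position.
H = np.array([[int(b) for b in f"{j:03b}"] for j in range(1, 8)]).T   # 3 x 7

x = np.array([1, 0, 1, 1, 0, 0, 1])    # source sequence X
y = x.copy()
y[3] ^= 1                              # side information Y: one bit flipped

# Encoder: transmit only the 3-bit syndrome of x (3 bits instead of 7).
s = H @ x % 2

# Decoder: the syndrome of the error pattern e = x xor y equals
# s xor syndrome(y); for a one-bit error it points at the flipped position.
e_syn = (s + H @ y) % 2
e = np.zeros(7, dtype=int)
if e_syn.any():
    pos = int("".join(str(b) for b in e_syn), 2) - 1   # position named by syndrome
    e[pos] = 1
x_hat = (y + e) % 2
assert np.array_equal(x_hat, x)        # lossless recovery from 3 bits + side info
```

The 3-bit syndrome plays the role of the bin index: all 16 codeword shifts sharing that syndrome form one bin, and the side information picks out the correct member.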
Recent practical Slepian-Wolf coding schemes are based on channel codes like
block and trellis codes [131]. More sophisticated distributed source coding techniques are based on channel codes that approach the Shannon limit, such as
Turbo codes [20] and low-density parity-check (LDPC) codes [58]. Turbo codes for
compression of binary source with side information were independently proposed by
Garcı́a-Frı́as and Zhao [59, 61, 221], Bajcsy et al. [16], and Aaron et al. [4]. LDPC
codes were suggested by Liveris et al. [111,112] and Schonberg et al. [158–160] as well
as other authors [33, 63, 92, 168, 181].
Iterative LDPC decoding methods using the message-passing algorithm are attractive as they can intuitively integrate with factor graphs [89] of source or channel
models. Garcı́a-Frı́as et al. demonstrated the compression of binary sequences correlated by a hidden Markov model using LDPC codes [62], as well as the decoding
algorithm for LDPC codes over a finite-state binary Markov channel [60]. Varodayan et al. [187] and Schonberg et al. [157] independently extended the LDPC decoder
to exploit the 2D spatial source correlation for compression.
Aaron et al. first introduced the concept of rate adaptivity to the distributed
source coding community. They proposed to use rate-compatible punctured turbo
(RCPT) codes for practical Slepian-Wolf coding to enable rate-efficient applications
of distributed source coding in which terminals can transmit bitstreams incrementally
without significant rate penalty [11]. Varodayan et al. then invented rate-adaptive
LDPC codes [186, 188] as the LDPC counterpart. In addition to rate adaptivity, side information adaptation was introduced to offer a novel means for motion and
disparity estimation of distributed stereo images and video coding [189–192]. Our
work in authentication of adjusted images described in Chapter 4 has been inspired
by adaptive distributed source coding [185].
In addition to video and image coding applications, researchers in the security
community found that the distributed source coding technique meets the needs of
their applications. The next section will present some work in biometric security
which applies distributed source coding techniques.
2.3 Secure Biometrics
Access control to data or physical locations plays an important role in preventing
malicious actions. Traditional approaches rely on secret knowledge, e.g., a password, or a physical token, e.g., an identifying document. The former can be guessed
by unauthorized persons or forgotten by the legitimate users; the latter might be
forged, lost, or stolen. Biometric systems provide an alternative solution to access
control. Biometric data, such as fingerprints [15] and iris scans [39], cannot be forgotten or lost. These kinds of data contain a large amount of information and therefore
are hard to guess and copy. However, each measurement of biometric data suffers
from noise due to intrinsic variation, different measurement conditions, and devices.
For example, fingerprint scans can be different due to elastic skin deformation or
dust on the finger. Biometric systems should be robust to those variations. Most
approaches rely on pattern recognition techniques. The enrollment biometric data of
an individual are extracted and stored at registration. Later, the biometric reading
is compared to the stored biometric data for authentication. If the two biometric
data are close enough, then access is granted. However, biometric data, unlike passwords or identifying documents, cannot be regenerated once they are compromised.
Unencrypted storage of the original biometric data would pose a security risk. Just
as passwords in computer systems are not stored in the clear to prevent them from being compromised, biometric data should also be stored in a protected way. The solution for a
password, i.e., cryptographic hashing, cannot be directly applied to biometric data as
the aforementioned measurement noise appears in a biometric reading while a password is always the same. The challenge of secure biometrics includes privacy of the
original biometric data and robustness to measurement noise.
2.3.1 Secure Biometrics Using Slepian-Wolf Codes
Researchers at Mitsubishi Electric Research Laboratories (MERL) achieve secure storage of biometric data using Slepian-Wolf codes [45, 46, 120, 172, 173, 194], building on the information-theoretic notion of common randomness [38]. The key idea is to use Slepian-Wolf coding as a tool for extracting a common secret from two observations of
correlated sources. In the enrollment stage, the system extracts the biometric features
from the raw biometric data from an individual and then encodes the features into
a Slepian-Wolf bitstream and a cryptographic hash value. In the authentication stage,
the system decodes the Slepian-Wolf bitstream using the probe biometric features as
side information. The hash value of the decoded biometric feature is compared to
the original hash value stored in the system. If the hash values are identical, access
is granted; otherwise, access is denied.
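This enrollment/authentication flow can be sketched in a toy form, using the (7,4) Hamming code in place of the long LDPC codes used in practice (the function names and the 7-bit "features" are ours, purely illustrative):

```python
import hashlib
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j is the binary
# expansion of j+1, so a one-bit error's syndrome names its position.
H = np.array([[int(b) for b in f"{j:03b}"] for j in range(1, 8)]).T   # 3 x 7

def enroll(features):
    """Store only a syndrome and a hash; the features themselves are discarded."""
    s = (H @ features) % 2
    digest = hashlib.sha256(features.tobytes()).hexdigest()
    return s, digest

def authenticate(record, probe):
    """Decode the stored syndrome using the probe as side information."""
    s, digest = record
    e_syn = (s + H @ probe) % 2            # syndrome of enrollment xor probe
    e = np.zeros(7, dtype=int)
    if e_syn.any():
        e[int("".join(str(b) for b in e_syn), 2) - 1] = 1   # fix one noisy bit
    decoded = (probe + e) % 2
    return hashlib.sha256(decoded.tobytes()).hexdigest() == digest

enrollment = np.array([1, 0, 1, 1, 0, 0, 1])
record = enroll(enrollment)

probe_same = enrollment.copy(); probe_same[5] ^= 1    # same user, one noisy bit
probe_other = np.array([0, 1, 1, 0, 1, 0, 0])         # different user

assert authenticate(record, probe_same)       # access granted despite noise
assert not authenticate(record, probe_other)  # access denied
```

The stored record reveals only 3 syndrome bits and a hash, never the enrollment features; the decoder tolerates exactly the noise the code can correct.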
The authors assume that the biometric data of one user are statistically independent of the biometric data of other users. On the other hand, the
biometric measurements from the same user are correlated. A suitable conditional
distribution models the correlation between the biometric readings (the enrollment
and the probe) from the same user. Slepian-Wolf coding provides the robustness
against the measurement noise by exploiting the correlation. The system is secure
as the Slepian-Wolf bitstream alone does not leak too much information and the
cryptographic hash value presumably leaks no information.
In [120], the authors demonstrated a secure biometric system for iris biometric
data using the above principle and indicated the tradeoffs between security and robustness of secure biometric systems. In [45, 46], the authors applied the same principle
to secure fingerprint biometric data and developed a fingerprint channel model for
a message-passing decoder to capture the measurement noise. Later, the authors
proposed an alternative approach using a feature transformation to map the fingerprint biometric data to a binary vector which has appropriate statistics for LDPC
syndrome coding [172] without the complicated message-passing algorithms.
2.3.2 Privacy Leakage and Secret-Key Rate
A research group at Eindhoven University of Technology independently investigated the secure biometric problem from an information-theoretic perspective [69–72]. The authors determined that the secure biometric problem is related to common randomness [38]. Their contribution is to investigate the tradeoff between the secret-key rate
and the privacy-leakage for the biometric system setting. In their setting, the enrollment and authentication biometric data can form a common (random) secret key
using a public message derived from the enrollment biometric data. The secret key
can be used in the cryptographic system for secure communication and access control. A higher secret-key rate would require the public message to leak more private
information about the enrollment biometric data. The authors established the optimal secret-key and privacy-leakage rate regions. Although they aimed at a slightly
different goal from MERL, which investigated the tradeoff between robustness
and privacy leakage, both contributions give the same insight that the Slepian-Wolf
theorem plays an important role in an optimal detection scheme.
2.3.3 Comparison to Image Authentication
The image authentication problem is closely related to secure biometrics. In the
image authentication problem, the original image and the authentic target image are
similar; in the secure biometric problem, the enrollment biometric data and the probe
biometric data from the same person are similar. Neither the target image nor the probe biometric data are available at the time of generating the authentication data or
secure biometric data. These cases fit the setting of distributed source coding which
can lead to high coding efficiency for authentication data and secure biometric data.
In the image authentication problem, a lower encoding rate yields smaller authentication data; in the secure biometric problem, a lower encoding rate yields more privacy
of the original biometric data. Despite these similarities, these two problems do have
some differences.
In the secure biometric problem, the biometric data from two different persons
are assumed to be independent. In the image authentication problem, the tampered
target images are usually correlated to the original one but with different statistics
than the authentic target images. This means that the secure biometric problem is
actually a hypothesis testing against independence under rate constraints [13], while
the image authentication problem is a more general rate-constrained hypothesis testing problem [65, 66]. The next section will review information-theoretic results on hypothesis testing under rate constraints. These results suggest that the image authentication problem can benefit more from distributed source coding than the secure biometric problem.
2.4 Hypothesis Testing with Side Information Under Rate Constraints
Let each pair (X, Y ) be independently and identically distributed over the discrete space X × Y , subject to a joint distribution determined by one of two hypotheses:

(X, Y ) ~ P_XY (X, Y ) if H0 is true,
(X, Y ) ~ Q_XY (X, Y ) if H1 is true.    (2.1)
Hypothesis testing determines which hypothesis is true based on the observations (x^N , y^N ). The Type I error α is the error of rejecting H0 when it is actually true. The Type II error β is the error of accepting H0 when H1 is actually true. These two errors express a tradeoff made by the choice of the acceptance region A ⊆ X^N × Y^N for H0. The optimal error probabilities are related to the divergence between the two distributions P and Q. Specifically, the minimal Type II error exponent for a given probability of Type I error has been determined by Stein [27, 35]:
Lemma 1 (Stein's Lemma, [35, Theorem 12.8.1]). Let X_1, X_2, . . . , X_N be i.i.d. drawn from P if H0 is true, or from Q if H1 is true, with P(x) > 0 and Q(x) > 0 for all x ∈ X. The Kullback-Leibler divergence from P to Q is defined as

D(P||Q) ≜ Σ_{x∈X} P(x) log (P(x)/Q(x)) < ∞.

Let A_N ⊆ X^N be an acceptance region for hypothesis H0, and let the probabilities of error be α_N = P^N(A_N^c) and β_N = Q^N(A_N). For 0 < ε < 1/2, define

β_N^ε = min_{A_N : α_N < ε} β_N.

Then

lim_{ε→0} lim_{N→∞} (1/N) log β_N^ε = −D(P||Q).
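As a small numeric illustration of our own (not from the text): for distinguishing a fair coin from a coin with heads probability 0.8, the lemma predicts that the optimal Type II error decays as roughly 2^(−N·D(P||Q)).

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bern(p) || Bern(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

# H0: fair coin, P = Bern(0.5); H1: biased coin, Q = Bern(0.8).
D = kl_bernoulli(0.5, 0.8)
print(round(D, 4))                    # 0.3219 bits per observation

# Stein's lemma: for any fixed Type I error level, the best achievable
# Type II error decays exponentially, like 2^(-N * D).
for N in (50, 100, 200):
    print(f"N = {N}: beta ~ 2^{-N * D:.1f}")
```

Doubling the number of observations squares the predicted Type II error, which is why the divergence, not the raw distance between distributions, governs detection performance.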
If both X^N and Y^N are available at the same site, one can perform the optimal hypothesis test and then send the one-bit decision. This would
achieve the optimal decision performance at zero-rate. A more interesting problem is
to consider the asymmetric case in which Y N is only available at the decoder.
Consider the setup depicted in Figure 2.3. The pairs (X, Y ) are i.i.d. subject to the joint distribution P (X, Y ) if H0 is true and Q(X, Y ) otherwise. The encoder can only
observe X N and send the encoding of X N at the rate R to the decoder. The decoder
decides which hypothesis is true based on the incoming encoding of X N and the side
information Y N . Note that it is not necessary to reconstruct X N at the decoder
to obtain the decision. The decoder tries to minimize the decision error under the
encoding rate constraint. Define the optimal error exponent

σ(ε, R) ≡ lim inf_{N→∞} −(1/N) log β_N^ε(R),

where ε is the constraint on the Type I error, i.e., lim sup_{N→∞} α_N ≤ ε, and β_N^ε(R) is the minimal Type II error under the rate constraint R. The goal is to find the value of σ(ε, R). However, to the best of our knowledge, most results for this problem setting provide only lower (achievable) bounds. The only exception is hypothesis testing against independence.
Figure 2.3: (X, Y )N are i.i.d. subject to the joint probability P (X, Y ) if H0 is true,
Q(X, Y ), otherwise. The side information Y N is only available at the decoder. The
encoder sends an encoded bitstream based on the observations X N to the decoder.
The decoder decides which hypothesis is true based on the incoming bitstream and
the side information Y N .
Ahlswede and Csiszár established the tight bound on the error exponent under a rate constraint for the case of hypothesis testing against independence, i.e.,
H0 : PXY (X, Y )
(2.2)
H1 : PX (X)PY (Y ).
(2.3)
Theorem 2 (Ahlswede and Csiszár [13]). Consider the hypothesis test given by (2.2) and (2.3). For all 0 ≤ ε < 1 and all R ≥ 0,

σ(ε, R) = max_{U ∈ L(R), ||U|| ≤ ||X|| + 1} I(U; Y ),

where

L(R) = {U | I(U; X) ≤ R, U → X → Y },    (2.4)

and U is an auxiliary random variable with cardinality ||U|| such that U → X → Y forms a Markov chain and the mutual information satisfies the rate constraint I(U; X) ≤ R.
Note that this bound is tight, i.e. the achievable lower bound meets the converse
upper bound. In the case that R ≥ H(X), one can achieve the optimal decision error exponent of I(X; Y ) by letting U = X. The mutual information I(X; Y ) is the divergence between the joint distribution P (X, Y ) and the product of the two marginals P (X)P (Y ). This result is closely related to secure biometrics. According to the information-theoretic results on secure biometrics in [69], the rate
constraint here corresponds to the privacy leakage constraint in the secure biometrics,
and the optimal decision error exponent corresponds to the chosen secret-key rate.
We now review the general rate-constrained hypothesis testing by stating the
following theorem.
Theorem 3 (Han [65], 1987). Consider the setting depicted in Figure 2.3 and the hypothesis test in (2.1). For all 0 ≤ ε < 1 and all R ≥ 0,

σ(ε, R) ≥ max_{U ∈ L(R), ||U|| ≤ ||X|| + 1} min_{P̃_UXY ∈ S(U)} D(P̃_UXY || Q_UXY ),

where S(U) = {P̃_UXY | P̃_UX = P_UX , P̃_UY = P_UY }, Q_{U|X} = P_{U|X} , and L(R) is defined in (2.4).
This result only gives an achievable lower bound of the error exponent for the general hypothesis testing under the rate constraint. However, if Q(X, Y ) = P (X)P (Y ),
this result reduces to Theorem 2. Clearly, the special case R ≥ H(X) gives the optimal decision error exponent σ(ε, R) = D(P_XY || Q_XY ).
Both Theorem 2 and Theorem 3 only consider zero-error coding schemes, i.e., the decoder reconstructs the joint type of U, X, and Y with exactly zero error probability. Later, Shimokawa et al. established an achievable lower bound for a wider
class of coding schemes that includes the coding schemes with decaying nonzero-error
probabilities. Note that most Slepian-Wolf coding schemes are in this class.
Theorem 4 (Shimokawa et al. [166], 1994). Consider the setting depicted in Figure 2.3 and the hypothesis test in (2.1). For all 0 ≤ ε < 1 and all R ≥ 0,

σ(ε, R) ≥ max_{U ∈ L*(R), ||U|| ≤ ||X|| + 1} min(d_1(U), d_2(U)),

where

L*(R) = {U | I(U; X|Y ) ≤ R, U → X → Y }    (2.5)

d_1(U) = min_{P̃_UXY ∈ S(U)} D(P̃_UXY || Q_UXY )    (2.6)

d_2(U) = +∞ if R ≥ I(U; X), and otherwise
d_2(U) = [R − I(U; X|Y )]^+ + min_{P̃_UXY ∈ T(U)} D(P̃_UXY || Q_UXY )    (2.7)

T(U) = {P̃_UXY | P̃_UX = P_UX , P̃_Y = P_Y , H_P̃(U|Y ) ≥ H_P(U|Y )}    (2.8)

S(U) = {P̃_UXY | P̃_UX = P_UX , P̃_UY = P_UY }    (2.9)
This result covers the wider class of coding schemes by additionally accounting for the decoding-error exponent in (2.7) and the relaxed rate constraint obtained by decoding with side information Y in (2.5), and yields a tighter bound than Theorem 3.
Consider a special case in which
R ≥ H(X|Y ) + D(P_XY || Q_XY ),    (2.10)
then we can set U ≡ X and obtain the optimal decision error exponent σ(ε, R) = D(P_XY || Q_XY ). Note that the secure biometric problem assumes that the biometric data from different persons are independent, i.e., Q_XY = P_X P_Y . This implies the rate requirement in (2.10) becomes R ≥ H(X|Y ) + I(X; Y ) = H(X). On the other hand, most cases in the image authentication problem involve only minor illegitimate tampering, which suggests the tampering probability model Q_XY is not far from the authentic model P_XY . We can then assume the rate constraint in (2.10) satisfies H(X|Y ) + D(P_XY || Q_XY ) ≤ H(X), provided D(P_XY || Q_XY ) ≤ I(X; Y ), which is a reasonable assumption for the image authentication problem. Therefore, a Slepian-Wolf type coding scheme makes more sense for the image authentication problem, to achieve rate-efficient coding, than for the secure biometric problem.
2.5 Video Quality Monitoring Background
A closely related problem to image authentication is video quality monitoring. This
section reviews the related work on quality monitoring systems. Past approaches are
categorized into three classes: full-reference, no-reference, and reduced-reference.
2.5.1 Full-Reference Quality Assessment
The full-reference quality assessment measures the distortion of the target video with reference to the original video. The most common objective metrics are the mean squared error (MSE) and peak signal-to-noise ratio (PSNR). Most researchers seek objective
metrics that are highly correlated to the subjective quality assessment which involves
the human visual system. This perceptual quality measurement for speech, image,
and video coding has been proposed for three decades [17, 18, 82, 91, 201, 203]. The
idea is to decompose the signal into several frequency bands and evaluate the distortion with a different weight for each subband according to human perception.
Statistical models are used to capture the subjective quality from objective metrics.
Wang et al. correlated the distortion to the perceptual quality by separating the
structural distortion from the non-structural one [197]. Sheikh et al. suggested evaluating the fidelity using mutual information between the reference and the distorted
videos based on natural scene statistics [165]. The subjective distortion-rate performance of video encoders can benefit from full-reference quality assessment,
since video encoders have access to the original content. For most video broadcasting
applications, since the received video and the original video are not available at the same site,
the full-reference quality assessment is not applicable.
2.5.2 No-Reference Quality Assessment
No-reference methods estimate the quality at the client device using the received
video alone. Common approaches use statistical image models and temporal information of videos to estimate objective or subjective quality. Turaga et al. estimated
the quantization error power of intra-coded images using the statistics of the discrete
cosine transform coefficients [182]. For video, Reibman et al. additionally considered
the temporal dependency to estimate the mean squared error of received video degradation due to packet loss [149]. Yamada et al. proposed estimating the MSE due
to packet loss using the effectiveness of error concealment, derived from the discontinuity between concealed and correctly decoded blocks [210]. Approaches to subjective
quality estimation include using natural scene statistics [164], estimation of blocking
and ringing artifacts [109,110,196,198], and estimation using luminance gradients [54].
Researchers additionally use temporal information to assess the visual quality of video
content [117, 150, 212]. Since many video coding schemes introduce deblocking filters
and post-processing to enhance subjective quality, the distortions are difficult to
quantify without the reference. Another type of approach in no-reference quality
assessment uses watermarking. Researchers have suggested embedding a watermark
into the original content to assess the received quality by relating the received reconstruction PSNR to the watermark detection rate [78, 169], comparing to the original
watermarking signal [23,53], or delivering the original quality features for comparison
using watermarking [200]. The watermark approaches offer more accurate quality
estimation, but the tradeoff is quality degradation due to watermark embedding or the higher bit rate required to compress the media content without destroying the watermark.
2.5.3 Reduced-Reference Quality Assessment
The reduced-reference quality assessment approaches achieve more accurate quality estimation by providing features of the original video at a low bit rate. The
first reduced-reference video quality assessment system was proposed by Webster et
al. [202]. Features derived from spatial and temporal pixel gradients of the original
video content are sent to the receiver. The received video quality is estimated by combining the distortion of the features. This scheme has been refined by using features
describing the spatial activity [204] and color coherence [205]. Most researchers in
this field focus on investigating feature extraction according to target quality metrics.
In [49, 52, 90], features including blocking, blur, edge-based image activity, gradient-based image activity, and intensity masking are transmitted and combined into a
hybrid image quality metric for the receiver quality assessment. Extensions include
extracting the features in multi-resolution images [51, 68, 121], assessing the quality
by treating the region of interest and background differently [50], and using a neural
network to learn the estimator of subjective quality from features [93]. A statistical
approach is proposed by Wang et al. The authors propose using the generalized
Gaussian distribution to model wavelet coefficients of the original image and send
the estimated parameters as the reduced reference. The quality measurement is derived from the Kullback-Leibler divergence between the original model specified by
the parameters and the received image statistics [199].
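The divergence-based reduced-reference idea can be sketched as follows; for brevity, this illustration of ours transmits a coarse histogram rather than fitted generalized Gaussian parameters, so it only mimics the spirit of [199]:

```python
import numpy as np

def kl_bits(p, q, eps=1e-12):
    """KL divergence in bits between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
# Stand-in for wavelet-subband coefficients of the original image; a Laplacian
# is a crude proxy for the heavy-tailed statistics modeled in [199].
original = rng.laplace(scale=4.0, size=50_000)
bins = np.linspace(-30, 30, 33)
reference_hist, _ = np.histogram(original, bins=bins)   # the "reduced reference"

# Receiver side: degraded coefficients (coarse quantization as the distortion).
received = np.round(original / 8.0) * 8.0
received_hist, _ = np.histogram(received, bins=bins)

# Quality score: divergence between reference and received statistics.
score = kl_bits(reference_hist.astype(float), received_hist.astype(float))
print(f"distortion score: {score:.3f} bits")            # larger = more degraded
```

Only the 32-bin histogram (or, in [199], a two-parameter model fit) crosses the channel, so the reference overhead stays tiny compared with sending the original coefficients.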
Less work has been done on the coding of features. Yamada et al. propose estimating the reconstruction PSNR by sending a representative luminance value and
an entropy coded bit map indicating the positions in the original image having the
representative luminance value [211]. The ITU-T J.240 Recommendation suggests a
subsampling method using a projection of the image signal after whitening in both
spatial and Walsh-Hadamard transform domains [80,86]. Conventional source coding
of the projections is not efficient due to the large variance of the whitened coefficients. In contrast, we report an efficient coding scheme using distributed source
coding to exploit the correlation between the projections of the original and received
images [32, 108]. Tagliasacchi et al. propose similar schemes using Wyner-Ziv coding for MSE estimation based on random projections [183] and for perceptual quality
estimation using structural features [176]. In their approaches, the decoder first reconstructs the features using minimum mean squared error (MMSE) estimation given
the side information and the reconstructed quantization indices; the decoder then estimates the quality according to the selected metric. Unfortunately, the MMSE reconstruction leads to suboptimal quality estimation. Later in Chapter 6, we will
show that quality estimation that directly uses side information indices yields lower
estimation error.
2.6 Summary
This chapter reviews past approaches to robust hashing for image authentication,
covering compression-invariant features, block projections, and robust projections.
Although past contributions were rarely dedicated to the coding of feature vectors,
the evidence from the literature indicates that error-correcting coding can provide
improvements in security and reductions in authentication data size. We then review
lossless distributed source coding, the fundamental technique behind the
error-correcting coding that offers such improvements. We describe the contributions
to secure biometrics, a problem whose setting is similar to that of image authentication. These
contributions indicate that distributed source coding plays an important
role toward an optimal secure biometric scheme. We then review some information-theoretic
results on hypothesis testing with multiterminal data compression. These
results lead us to explore the potential of distributed source coding in the image
authentication problem, presented in the following chapters, and in the
quality monitoring system described in Chapter 6.
Chapter 3
Image Authentication
Using Distributed Source Coding
This chapter investigates a practical image authentication scheme using
distributed source coding. The key idea is to provide a user with a Slepian-Wolf
encoded projection of the original image as authentication data, and for the user to
attempt to decode this bitstream using the target image as side information. The
Slepian-Wolf result [167] indicates that the lower the distortion between side information and the original, the fewer authentication bits are required for correct decoding.
By choosing the size of the authentication data appropriately, we can therefore
distinguish between legitimate encoding variations of the image and illegitimate modifications.
Section 3.1 formulates the image authentication problem and justifies the use of
block projection and distributed source coding for generating rate-efficient authentication data. Section 3.2 describes the image authentication scheme and its rationale in
detail. Simulation results presented in Section 3.3 demonstrate the tradeoffs between
the authentication data size and tampering detection performance.
3.1 Image Authentication Problem
Our approach to the image authentication problem is through hypothesis testing.
The authentication data provide some information about the original image to the
user. The user makes the authentication decision based on the target image and the
authentication data. We first describe a two-state channel that models the target
image. Section 3.1.2 details the statistical assumptions and describes the projection
basis.
3.1.1 Two-State Channel
We model the target image y by way of a two-state lossy channel, shown in Figure 3.1.
In the legitimate state, the channel performs lossy compression and reconstruction,
such as JPEG and JPEG2000, with peak signal-to-noise ratio (PSNR) of 30 dB or
better. In the tampered state, it additionally includes a malicious attack.
Figure 3.1: The target image y is modeled as an output of a two-state lossy channel.
In the legitimate state, the channel consists of lossy compression and reconstruction,
such as JPEG and JPEG2000; in the tampered state, the channel further applies a
malicious attack.
Figure 3.2 demonstrates a sample input and two outputs of this channel. The
source image x is a Kodak test image at 512×512 resolution. In the legitimate state,
the channel is JPEG2000 compression and reconstruction at (the worst permissible)
30 dB PSNR. In the tampered state, a further malicious attack is applied: a 19×163
pixel text banner is overlaid on the reconstructed image and some objects are removed.
The joint statistics of x and y vary depending on the state of the channel. We
illustrate this by plotting in Figure 3.3 the luminance difference between the target
Figure 3.2: An image from the Kodak test image set. (a) x original, (b) y at the
output of the legitimate channel, and (c) y at the output of the tampered channel.
and the original images. In the legitimate state, the difference resembles white noise
due to the compression; in the tampered state, the channel additionally introduces
tampering which results in image-like differences in some regions.
Based on the aforementioned observation, we describe the image authentication
problem in a hypothesis testing formulation:
    x − y = z = z0,  if the channel is in the legitimate state,
                z1,  if the channel is in the tampered state,        (3.1)
where we model z0 as white noise, and z1 as similar to z0 except that it has some regions
which contain image-like noise. Next, we will describe the statistical assumptions on
z0 and z1 .
3.1.2 Residual Statistics
To illustrate the residual statistics, we use Kodak test images and compress them
at different qualities to generate legitimate images. We additionally overlay text
banners and remove some objects in the images to generate tampered images like
Figure 3.2. We assume that the residual process z in tampered or legitimate regions
is wide-sense stationary.
Figure 3.3: The difference between the two-state lossy channel input and output. (a)
The difference resembles white noise in the legitimate state. (b) In the tampered
state, the channel introduces tampering resulting in image-like differences.
Figure 3.4 plots sample autocorrelation functions Rzz(k, l) =
E[(z(m, n) − µz)(z(m − k, n − l) − µz)] of the residual in legitimate and tampered
regions, where µz is the mean of the residual z assumed to be zero. The residual z in
legitimate regions is weakly correlated with its neighbors, whereas in tampered
regions it is strongly correlated.
We therefore model the autocorrelation function of z as:

    Rzz(k, l) = σ0² δ(k, l),                                       legitimate region,
                σ1² exp(−λ1|k|) exp(−λ2|l|) + σ0² δ(k, l),         tampered region,      (3.2)
where the autocorrelation function in the tampered region is based on the separable image model [81]. To find important components that distinguish legitimate and
tampered regions, we plot power spectral density (PSD) functions Φzz (ω1 , ω2 ) (based
on the model of z in Equation (3.2) with λ1 = λ2 = 0.025) at ω2 = 0 in Figure 3.5.
The PSD is flat in legitimate regions but peaked at low frequencies in tampered regions. This suggests that low-frequency components discriminate well between legitimate
and tampered regions, while high-frequency components offer little discrimination.
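The contrast between the two PSDs can be made concrete with a short numerical sketch (Python/NumPy, purely illustrative; it uses the closed-form DTFT of the two-sided exponential, and assumes σ0² = 47.9 and σ1² = 1326.8 − 47.9 read off the Figure 3.4 values):

```python
import numpy as np

def exp_dtft(omega, lam):
    # DTFT of exp(-lam*|k|): (1 - a^2) / (1 - 2a cos(omega) + a^2), a = exp(-lam)
    a = np.exp(-lam)
    return (1 - a**2) / (1 - 2 * a * np.cos(omega) + a**2)

def psd(omega1, omega2, sigma0_sq, sigma1_sq=0.0, lam1=0.025, lam2=0.025):
    # PSD of the residual model in Eq. (3.2): separable low-pass term
    # (tampered regions only) plus white noise.
    return sigma1_sq * exp_dtft(omega1, lam1) * exp_dtft(omega2, lam2) + sigma0_sq

w = np.linspace(-np.pi, np.pi, 257)                          # slice at omega2 = 0
legit = psd(w, 0.0, sigma0_sq=47.9)                          # flat
tamp = psd(w, 0.0, sigma0_sq=47.9, sigma1_sq=1326.8 - 47.9)  # peaked at omega1 = 0
```

The tampered-region PSD dominates near ω1 = 0 and falls off away from it, consistent with the observation that low-frequency components carry the discriminative information.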
Figure 3.4: Sample autocorrelation function Rzz normalized according to Rzz (0, 0) in
(a) legitimate regions, Rzz (0, 0) = 47.9, and (b) tampered regions, Rzz (0, 0) = 1326.8.
The residual z in legitimate regions is less correlated to its neighbors, while it is more
correlated in tampered regions.
We test this assumption by plotting the distribution of the difference Z =
Y − X, where X and Y are image projections of x and y in Figure 3.2, respectively.
The projections are computed over non-overlapping 16×16-pixel blocks. In Figure 3.6, we use the block
mean as the projection. Since the samples of the projection difference Z are sums
of compression noise, the distribution of Z resembles a Gaussian, by the central
limit theorem. In the tampered channel state, the image samples in the tampered
region are unrelated to those of the original image and have large variance in low
frequencies, giving the distribution of Z non-negligible tails. On the other hand, in
Figure 3.7, we use the highest frequency basis in the 2D Hadamard transform as
the projection basis. Both the difference distributions of legitimate and tampered
images resemble a Gaussian and are similar to each other. This indicates that the
high frequency projection hardly distinguishes tampered images from legitimate ones.
Both Figure 3.6 and Figure 3.7 show that image projections of the legitimate image
are highly correlated to the original image projection.
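A minimal synthetic illustration of these block statistics (Python; random stand-in images rather than the Kodak set, with the 19×163 banner size taken from the text):

```python
import numpy as np

def block_mean_projection(img, block=16):
    # One coefficient per non-overlapping block x block region.
    h, w = img.shape
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 255.0, size=(512, 512))      # stand-in original image
y_legit = x + rng.normal(0.0, 7.0, x.shape)       # compression-like white noise
y_tamp = y_legit.copy()
y_tamp[100:119, 100:263] = 255.0                  # 19x163 overlaid banner

z_legit = block_mean_projection(y_legit) - block_mean_projection(x)
z_tamp = block_mean_projection(y_tamp) - block_mean_projection(x)
# z_legit: each entry averages 256 noise samples, so it is near-Gaussian
# with small variance (central limit theorem); z_tamp additionally has
# heavy tails from blocks overlapping the banner.
```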
Figure 3.5: Power spectral density function of z at ω2 = 0 in legitimate and tampered
regions.
Now we describe the image authentication problem at the blockwise projection
level in the hypothesis testing setting:
    X|Y ∼ P(X|Y) = N(Y, σ0²),                                   y is legitimate,
          Q(X|Y) = (1 − γ) N(Y, σ0²) + γ Ptampered(X|Y),        y is tampered,       (3.3)
where γ ∈ [0, 1] is the tampered fraction of image blocks, and Ptampered (X|Y ) is the
probability model for tampered blocks depending on the projection basis. We assume
that Ptampered (X|Y ) = U(X) is a uniform distribution over the dynamic range of X
when we use the mean projection. Having both projections X and Y , the optimal
decision is based on the likelihood ratio test: P(X, Y)/Q(X, Y) ≷ T. The next section will
describe our image authentication scheme which uses these statistical assumptions to
efficiently generate authentication data by using distributed source coding.
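The decision rule can be sketched as follows (Python; σ0² = 2 and γ = 0.02 are the values used later in Section 3.3, and Ptampered is taken to be the uniform model of the mean-projection case):

```python
import numpy as np

def log_likelihood_ratio(X, Y, sigma0_sq=2.0, gamma=0.02, dyn_range=256.0):
    # Per-block log P(X|Y) - log Q(X|Y) for the test in Eq. (3.3), with
    # Ptampered(X|Y) = U(X) uniform over the dynamic range of X.
    log_gauss = (-(X - Y) ** 2 / (2 * sigma0_sq)
                 - 0.5 * np.log(2 * np.pi * sigma0_sq))
    q = (1 - gamma) * np.exp(log_gauss) + gamma / dyn_range
    return log_gauss - np.log(q)

def declare_legitimate(X, Y, log_T=0.0):
    # Accept when the total log-likelihood ratio exceeds log T.
    return log_likelihood_ratio(X, Y).sum() > log_T

rng = np.random.default_rng(1)
Y = np.full(1024, 100.0)                       # side-information projections
X_legit = Y + rng.normal(0.0, 1.0, Y.shape)    # small compression-like noise
X_tamp = rng.uniform(0.0, 255.0, Y.shape)      # blocks unrelated to Y
```

Legitimate blocks each contribute a small positive log-ratio, while a block far from Y is penalized heavily, so even a small tampered fraction drives the sum negative.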
Figure 3.6: The difference distributions between the two-state lossy channel input and
output using the blockwise mean as the projection. (a) The difference distribution
resembles a Gaussian in the legitimate state. (b) In the tampered state, the tampered
channel introduces larger deviations.
Figure 3.7: The difference distributions between the two-state lossy channel input
and output using a high frequency projection. (a) The difference distribution in
the legitimate state. (b) The difference distribution in the tampered state. Both
the difference distributions resemble a Gaussian and are similar to each other. This
means that the high frequency projection can hardly distinguish tampered images
from legitimate ones.
3.2 Image Authentication System
Figure 3.8 depicts the image authentication scheme using distributed source coding.
The left-hand side of Figure 3.8 shows that the authentication data consist of a
Slepian-Wolf encoded quantized image projection of x and a digital signature of that
version. The verification decoder, on the right-hand side of Figure 3.8, knows the
statistics of the worst permissible legitimate channel and can correctly decode the
authentication data with the help of an authentic image y as side information.
Figure 3.8: Image authentication system using distributed source coding. The authentication data consist of a Slepian-Wolf encoded quantized pseudorandom projection of the original image, a random seed, and a signature of the image projection.
The target image is modeled as an output of the two-state lossy channel shown in
Figure 3.1. The user projects the target image using the same projection to yield
the side information and tries to decode the Slepian-Wolf bitstream using the side
information. If the decoding fails, i.e., the hash value of the reconstructed image
projection does not match the signature, the verification decoder declares the image tampered; otherwise, the reconstructed image projection, along with the side information,
is examined using hypothesis testing.
In our authentication system shown in Figure 3.8, a pseudorandom projection
(based on a randomly drawn seed Ks ) is applied to the original image x and the
projection coefficients X are quantized to yield Xq . The authentication data are
comprised of two parts, both derived from Xq . The Slepian-Wolf bitstream S(Xq )
is the output of a Slepian-Wolf encoder based on low-density parity-check (LDPC)
codes [111,112] and the much smaller digital signature D(Xq , Ks ) consists of the seed
Ks and a cryptographic hash value of Xq signed with a private key.
The authentication data are generated by a server upon request. Each response
uses a different random seed Ks , which is provided to the decoder as part of the
authentication data. This prevents an attack which simply confines the tampering to the nullspace of the projection. Based on the random seed, for each 16×16
nonoverlapping block Bi, we generate a 16×16 pseudorandom matrix Pi by drawing
its elements independently from a Gaussian distribution N(1, σp²) and normalizing so
that ||Pi||2 = 1. We choose σp = 0.2 empirically. In this way, we retain the desirable
properties of the mean projection suggested in the previous section while gaining
sensitivity to high-frequency attacks. The inner product ⟨Bi, Pi⟩ is quantized into an
element of Xq.
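A sketch of this projection step (Python; ‖Pi‖2 is interpreted here as the Frobenius norm, image dimensions are assumed divisible by the block size, and quantization to Xq is omitted):

```python
import numpy as np

def pseudorandom_projection(img, seed, block=16, sigma_p=0.2):
    # For each non-overlapping block B_i, draw a matrix P_i with i.i.d.
    # N(1, sigma_p^2) entries, normalize ||P_i|| = 1, and record <B_i, P_i>.
    # Entries clustered around 1 keep the mean-like low-pass behavior;
    # the sigma_p perturbation adds sensitivity to high-frequency attacks.
    rng = np.random.default_rng(seed)          # seed plays the role of K_s
    h, w = img.shape
    coeffs = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            P = rng.normal(1.0, sigma_p, size=(block, block))
            P /= np.linalg.norm(P)             # Frobenius normalization
            coeffs.append(float(np.sum(img[r:r + block, c:c + block] * P)))
    return np.array(coeffs)
```

Because the seed is transmitted with the authentication data, the verifier regenerates exactly the same matrices Pi, while a fresh seed per request keeps an attacker from confining tampering to the projection's nullspace.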
The rate of the Slepian-Wolf bitstream S(Xq ) determines how statistically similar
the target image must be to the original to be declared authentic. If the conditional
entropy H(Xq|Y) exceeds the bit rate R in bits per pixel, Xq can no longer be decoded
correctly [167]. Therefore, the rate of S(Xq ) should be chosen to distinguish between
the different joint statistics induced in the images by the legitimate and tampered
channel states. At the encoder, we select a Slepian-Wolf bit rate just sufficient to
authenticate both legitimate 30 dB JPEG2000 and JPEG reconstructed versions of
the original image.
At the receiver, the user seeks to authenticate the image y with authentication data
S(Xq ) and D(Xq , Ks ). It first projects y to Y in the same way as during authentication
data generation. A Slepian-Wolf decoder reconstructs Xq′ from the Slepian-Wolf
bitstream S(Xq) using Y as side information. Decoding is via the LDPC message-passing algorithm [111, 112], initialized according to the statistics of the legitimate
channel state at the worst permissible quality for the given original image. Finally,
the image digest of Xq′ is computed and compared to the image digest decrypted
from the digital signature D(Xq, Ks) using a public key. If these two image digests do
not match, the receiver recognizes that image y is tampered; otherwise the receiver
makes a decision based on the likelihood ratio test: P(Xq′, Y)/Q(Xq′, Y) ≷ T, where P and Q are
probability models derived from (3.3) for legitimate and tampered states, respectively,
and T is a fixed decision threshold.
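The digest check that gates the likelihood ratio test can be sketched as follows (Python; SHA-256 stands in for the unspecified cryptographic hash, and the public-key signing of the digest is omitted):

```python
import hashlib
import numpy as np

def image_digest(xq):
    # Hash of the quantized projection X_q; in the real system this digest
    # is signed with the server's private key to form D(X_q, K_s).
    return hashlib.sha256(np.asarray(xq, dtype=np.uint8).tobytes()).hexdigest()

def digests_match(xq_decoded, published_digest):
    # If the reconstructed X_q' hashes to a different value, Slepian-Wolf
    # decoding failed and the image is declared tampered outright; only on
    # a match does the verifier proceed to the likelihood ratio test.
    return image_digest(xq_decoded) == published_digest

xq = np.array([3, 7, 1, 12], dtype=np.uint8)   # toy quantized projection
published = image_digest(xq)                   # carried in the signature
```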
3.3 Simulation Results
We use the test images, shown in Appendix A, at 512×512 resolution in 8-bit
grayscale. The two-state channel in Figure 3.1 applies JPEG2000 or JPEG compression and reconstruction at several qualities. The malicious attack consists of overlaying a 19×163 text banner at a random location in the image
or removing a randomly selected Maximally Stable Extremal Region (MSER) [122]
by interpolating the region from its boundaries. The text color is white or black,
whichever is more visible, to avoid generating trivial attacks, such as white text on a
white area.
The quantization of the authentication encoder is varied so that the Slepian-Wolf
encoder processes between 1 and 8 bitplanes, starting with the most significant. The
Slepian-Wolf codec is implemented using rate-adaptive LDPC codes [188] with block
size of 1024 bits. During authentication data generation, the bitplanes of X are encoded successively as LDPCA syndromes. The bitplanes are conditionally decoded,
with each decoded bitplane acting as additional side information for subsequent bitplanes, as in [7].
3.3.1 Authentication Data Size
Figure 3.9 compares the minimum rate that would be required to decode the Slepian-Wolf bitstream S(Xq) for side information Y due to legitimate and tampered channel
states for Lena with the projection X quantized to 4 bits. The following observations
also hold for other images and levels of quantization. The rate required to decode
S(Xq ) with legitimately created side information is significantly lower than the rate
(averaged over 100 trials) when the side information is tampered, for JPEG2000 or
JPEG reconstruction PSNR above 30 dB. Moreover, as the PSNR increases, the rate
for legitimate side information decreases, while the rate for tampered side information
Figure 3.9: Minimum rate for decoding Slepian-Wolf bitstream for the image Lena
with the projection X quantized to 4 bits.
stays high, close to the rate of conventional fixed length coding. The rate gap justifies
our choice for the Slepian-Wolf bitstream size: the size just sufficient to authenticate
both legitimate 30 dB JPEG2000 and JPEG reconstructed versions of the original
image.
Figure 3.10 shows the maximum selected Slepian-Wolf bitstream size in bytes
among all the test images from 1 to 8 bits in quantization of X. For 4-bit quantization,
the Slepian-Wolf bitstream size is less than 80 bytes or 2.3% of the encoded file sizes
at 30 dB reconstruction. Compared to conventional fixed length coding, distributed
source coding offers a great rate saving. For authentication data size of 120 bytes,
conventional fixed length coding can only deliver 1-bit quantized projections, while
distributed source coding can offer 5-bit precision. The overall effect is lower decision
error.
Figure 3.10: Authentication data sizes in number of bytes using conventional fixed
length coding and distributed source coding for different numbers of bits in quantization.
3.3.2 Receiver Operating Characteristic
We now fix the authentication data sizes for the different numbers of quantization bits
shown in Figure 3.10 and evaluate tampering detection using 3,450 legitimate and
3,450 tampered test images. We measure the false acceptance rate (the chance that
a tampered image is falsely accepted as a legitimate one) and the false rejection rate
(the chance that a legitimate image is falsely detected as a tampered one). Figure 3.11
compares the receiver operating characteristic (ROC) curves for tampering detection
with different numbers of bits in quantization by sweeping the decision threshold T
in the likelihood ratio test. In the likelihood ratio test, we set the variance of the
Gaussian in P to be 2 and γ (the convex combination parameter in Q in (3.3)) to be
0.02.
Figure 3.11 shows that higher quantization precision offers better detection performance, but this comes at the cost of more authentication data. Figure 3.12 combines
Figure 3.11: ROC curves of tampering detection with different numbers of bits in quantization of X for test images. This demonstrates that higher quantization precision
offers better detection performance.
the results of Figures 3.10 and 3.11, depicting the ROC equal error rate versus the
authentication data size in bytes for different coding methods. The equal error rates
are interpolated from the ROC curves as the points where the false acceptance rate
equals the false rejection rate. Distributed source coding reduces the authentication
data size by 75% to 83% compared to conventional fixed length coding at the same
ROC equal error rates of 2%.
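The interpolation step can be sketched as follows (Python; the four-point ROC is hypothetical):

```python
import numpy as np

def equal_error_rate(far, frr):
    # Linearly interpolate the point where the false acceptance rate equals
    # the false rejection rate, given both curves swept over the same
    # decision thresholds (far decreasing while frr increases).
    d = np.asarray(far, float) - np.asarray(frr, float)
    i = int(np.nonzero(np.diff(np.sign(d)))[0][0])   # sign change of far - frr
    t = d[i] / (d[i] - d[i + 1])                     # fraction into the segment
    return far[i] + t * (far[i + 1] - far[i])

eer = equal_error_rate([0.50, 0.20, 0.05, 0.01], [0.01, 0.05, 0.20, 0.50])
```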
Now, we compare our authentication system to others in the literature: Lin et
al. [99], Fridrich [57], and Swaminathan et al. [175]. The method described in [99]
is JPEG-inspired. The first 2 DCT coefficients (according to the zigzag order) per
8×8 image block are selected to generate authentication data of 512 bytes per image.
The method proposed by Fridrich is block projection based. We set it to generate
20 bits per 64×64 block according to zero-mean low-pass pseudorandom projections.
This yields 168 bytes per image. The verification is based on Hamming distance of
the hash values. Swaminathan’s method takes circular summations of the 2D Fourier
Figure 3.12: ROC equal error rates for different authentication data sizes using conventional fixed length coding and distributed source coding.
transformed image to generate hashes of 100 bytes per image. Our approach using
distributed source coding generates 78 bytes per image.
Figure 3.13 plots the ROC curves of various image authentication systems by
sweeping their decision thresholds. Our approach outperforms the methods proposed
by Fridrich and Swaminathan et al., and performs close to Lin's method, which
requires 512 bytes of authentication data while our approach requires only
78 bytes per image.
3.4 Summary
This chapter presents a statistical model for the image authentication problem using a
two-state channel. The proposed scheme captures the spatial structure of tampering
noise using a blockwise projection. The analysis suggests that the mean projection
be used as the principal discriminant basis. For reasons of security, we instead use
Figure 3.13: ROC curves of various authentication methods.
a mean plus pseudo-random noise as the projection basis. The quantized projection
coefficients of the original image are compressed at a proper rate by the Slepian-Wolf encoder to yield the authentication data, which can be correctly decoded using
authentic images as side information. Distributed source coding provides robustness against various legitimate encodings while detecting malicious modifications at
a much lower rate of authentication data than conventional fixed length coding. The
proposed authentication system has lower detection error rates and smaller authentication data size compared with other systems. The authentication decoder presented
in this chapter addresses various lossy compressions. The next chapter will discuss
an adaptive distributed source coding decoder using a statistical method to broaden
the robustness of the system for some common adjustments, such as contrast and
brightness adjustment, and affine warping.
Chapter 4
Learning Unknown Parameters
of Image Adjustment
The previous chapter presents an image authentication scheme that distinguishes
legitimate encodings from tampering using distributed source coding. In practice, the target image is often adjusted to suit the display capability or to improve the
content presentation. For example, the image might be cropped and resized to
match the size and resolution of the client display, or contrast and brightness adjustment may be applied to an image that is too dark or over-exposed. If such
adjusted images are considered legitimate, the image authentication system described
in the previous chapter cannot authenticate them, because the adjustments make
the side information appear tampered.
As mentioned in Chapter 2, past approaches address this challenge by using invariant projections or features robust to the legitimate edits. These approaches
are not flexible: they work well for the editing they are designed for, but may
fail for other types of editing. An apparent workaround is to try every possible adjustment to align the received image, exhaustively searching
the editing space to authenticate the target image. The complexity of
this approach grows exponentially with the dimension of the editing space.
This chapter presents a solution in which the authentication decoder learns the
editing parameters directly from the target image through decoding the authentication data using an expectation maximization (EM) algorithm. Section 4.1 introduces
a two-state channel with unknown editing parameters to formulate the problem. Section 4.2 describes the proposed authentication decoder for images that have undergone
contrast and brightness adjustment. Section 4.3 presents our solution to authentication of affine warped images. Section 4.4 extends the decoder to address images that
have simultaneously undergone contrast, brightness, and affine warping adjustment.
Experimental results in Section 4.5 demonstrate that the EM decoder can distinguish
legitimate editing from malicious tampering while accurately learning the parameters.
The authentication data size is comparable to that of an oracle decoder that knows
the ground-truth parameters.
4.1 Two-State Channel with Unknown Adjustment Parameters
We model the target image by way of a two-state channel with unknown adjustment
parameters as shown in Figure 4.1. In both states, the channel adjusts the image via
legitimate editing with a fixed but unknown parameter θ. In the legitimate state,
we model y = f (x; θ) + z, where x and y are the original and the target images,
respectively, and z is noise introduced by compression and reconstruction. In the
tampered state, the channel additionally applies malicious tampering.
Figure 4.2 demonstrates the channel for a Kodak test image at 512×512 resolution. In Figure 4.2(b), a target image has undergone contrast and brightness
adjustment. We have f(x; α, β) = αx + β, where α, β ∈ R are contrast and brightness
adjustment parameters, respectively. In Figure 4.2(c), the channel applies affine
warping: y(m) = f(x; A, b) = x(Am + b), where m ∈ R² are the corresponding
coordinates in the target image, and A ∈ R^{2×2}, b ∈ R² are transformation
and translation parameters, respectively. Figure 4.2(d) shows a target image which
has simultaneously undergone contrast, brightness and affine warping adjustment:
Figure 4.1: The target image is modeled as an output of a two-state channel affected
by a global editing function f (.; θ) with unknown but fixed parameter θ. In the
tampered state, the channel additionally applies malicious tampering.
y(m) = f (x; A, b, α, β) = αx(Am + b) + β. In the last case, there are 8 scalar
parameters. Heuristic methods may need to decode and test the authentication data against 10^8 candidates given 10 candidate values per parameter, which makes exhaustive
search practically infeasible. Moreover, since the authenticity decision is based on
the likelihood ratio test P(Xq, y; θ)/Q(Xq, y; θ) ≷ T, accurate estimation of θ is needed for confident
decision results.
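For reference, a sketch of the combined editing model y(m) = αx(Am + b) + β (Python; nearest-neighbor sampling and border clamping are simplifying assumptions not specified in the text):

```python
import numpy as np

def edit(x, A, b, alpha, beta):
    # Affine warping followed by contrast/brightness adjustment:
    # y(m) = alpha * x(A m + b) + beta, with m the target coordinates.
    h, w = x.shape
    mm, nn = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    m = np.stack([mm.ravel(), nn.ravel()]).astype(float)      # target coords
    src = np.asarray(A, float) @ m + np.asarray(b, float).reshape(2, 1)
    r = np.clip(np.rint(src[0]).astype(int), 0, h - 1)        # nearest neighbor,
    c = np.clip(np.rint(src[1]).astype(int), 0, w - 1)        # clamped at borders
    return alpha * x[r, c].reshape(h, w) + beta

x = np.arange(16.0).reshape(4, 4)
y = edit(x, np.eye(2), [0.0, 0.0], 1.2, -10.0)   # contrast/brightness only
```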
Figure 4.3 shows an unrealistic solution. The decoder has an oracle knowing the
true editing parameters of the target image corresponding to the authentication data.
Figure 4.2: One of the Kodak test images. (a) The original image; (b) a legitimate
image with contrast increased by 20% and brightness decreased by 10/255; (c) a
legitimate image rotated 5 degrees around the center; and (d) a legitimate image
with contrast increased by 20%, brightness decreased by 10/255, and rotated 5 degrees
around the center. All target images (b)-(d) are compressed and reconstructed by
JPEG at 30 dB PSNR.
The target image is then compensated using the parameters provided by the oracle.
Then the decoder decodes the Slepian-Wolf bitstream and tests the target image
and reconstructed image projection in the same way described in Chapter 3. The
following sections will show how to turn this unrealistic solution into a practical one
using statistical learning techniques for various editing models.
Figure 4.3: The oracle decoder knows the parameters and compensates the target
image to align with the authentication data. Then the Slepian-Wolf bitstream is
decoded using the compensated target image as side information to yield an a posteriori pmf of the quantized projection Papp (Xq ). The reconstructed quantized image
projection is the result of a hard decision on Papp (Xq ).
4.2 EM Decoder for Contrast and Brightness Adjustment
This section considers a target image that has undergone contrast and brightness
adjustment. The example shown in Figure 4.2(b) has the contrast and brightness
parameters (α, β) = (1.2, −10), such that y = αx + β + z. Without knowing these
parameters, the decoder of Chapter 3 requires a high Slepian-Wolf bitstream rate
to decode successfully. The affine relationship is preserved by the random image projection
due to its linearity; that is, Y = αX + β + Z. Consequently, even legitimate Y is
poor side information for the decoding of X and the decoder will treat the adjusted
image as tampered. Unlike past approaches in which the projection or the features
might be invariant to the contrast and brightness adjustment, we solve this problem
46
CHAPTER 4. LEARNING UNKNOWN PARAMETERS
by decoding the authentication data while learning the parameters that establish the
correlation between the target and original images. Estimation of the contrast and
brightness adjustment parameters requires the target image and the original image
projections, but the latter is not available before decoding. This situation with latent
variables to estimate can be addressed using the statistical method called expectation
maximization.
Figure 4.4 shows the Slepian-Wolf decoder with EM. It decodes the Slepian-Wolf
bitstream S(Xq ) using the target image projection Y as side information and yields
the reconstructed image projection Xq′. Note that it now decodes the bitstream
via an EM algorithm that updates the a posteriori probability mass function (pmf)
Papp (Xq ) in the E-step and updates α and β by maximum likelihood estimation in
the M-step.
Figure 4.4: The Slepian-Wolf decoder with contrast and brightness adjustment learning decodes the Slepian-Wolf bitstream S(Xq ) using the side information Y compensated with the previously estimated contrast and brightness adjustment parameters.
Each iteration produces soft estimation of Xq in the E-step and updates the contrast
and brightness adjustment parameters in the M-step.
In the E-step, we fix contrast α and brightness β at their current estimates. Next
we apply contrast and brightness adjustment with these values to obtain a priori pmfs
of the image projection Xq (i) from the side information Y (i). Finally, we run one
iteration of joint bitplane LDPC decoding on the a priori pmfs with the Slepian-Wolf
bitstream S(Xq ) to produce extrinsic pmfs Papp (Xq (i) = xq ), which we denote Qi (xq )
for convenience.
In the M-step, we fix these extrinsic pmfs Qi (xq ) of the projection Xq (i) and
estimate α and β with reference to the side information Y (i). For robustness, we
only consider projections for which maxxq Qi (xq ) > T = 0.9995, denoting the set
of eligible indices as C.¹ We now derive optimality conditions² on the parameters α
and β for the maximization of a lower bound L̂(α, β) of the log-likelihood function
L(α, β). The lower bound is due to Jensen’s inequality and the concavity of log(.):
L(α, β) ≡ Σ_{i∈C} log P(Xq(i), Y(i); α, β)                                        (4.1)
        = Σ_{i∈C} log Σ_{xq} Qi(xq) P(xq, Y(i); α, β)
        ≥ Σ_{i∈C} Σ_{xq} Qi(xq) [ log P(xq | Y(i); α, β) + log P(Y(i)) ] ≡ L̂(α, β),  (4.2)
where the distribution P (xq |Y (i); α, β) is modeled as a quantized Gaussian with mean
at α1 (Y (i) − β) and variance σz2 /α2 . The quantization of X is uniform and saturated
for X less than 0 or greater than 255. Setting partial derivatives of L̂(α, β) with
respect to α and β to zero, we obtain the optimality conditions²:
" "
i
i
µ
Y
(i)
−
x
i∈C
i∈C
j∈C µx Y (j)
α=
"
"
"
|C| i∈C µix2 − i∈C j∈C µix µjx
1 &
β=
Y (i) − αµix
|C| i∈C
|C|
1
"
¹ To guarantee that C is nonempty, we make sure to encode a small portion of the quantized image projection Xq with degree-1 syndrome bits. The decoder knows those values with probability 1 and includes their indices in C.
² Appendix B discusses the concavity of L̂(α, β) to claim the optimality conditions.
where

µx^i = Σ_{xq} Qi(xq) E[ X(i) | q(X(i)) = xq, Y(i); α, β ]

µx²^i = Σ_{xq} Qi(xq) E[ X(i)² | q(X(i)) = xq, Y(i); α, β ].
Since both the left and right hand sides of the optimality conditions contain α and
β, we update them iteratively until convergence or at most 30 iterations. The outer
loop of EM iterations terminates when hard decisions on Papp (Xq (i) = xq ) satisfy the
constraints imposed by S(Xq ).
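Treating the posterior means µx^i as known quantities for a moment, the optimality conditions above reduce to an ordinary least-squares fit of the side information Y on µx. A minimal numpy sketch of that closed-form solve (the data, noise level, and sample count are illustrative assumptions, and the posterior second moments are approximated by squaring the means):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x = rng.uniform(0, 255, 500)                    # stand-ins for posterior means mu_x^i
Y = 1.2 * mu_x - 10 + rng.normal(0, 2.0, 500)      # side-information projections

C = len(mu_x)
num = C * np.sum(mu_x * Y) - np.sum(mu_x) * np.sum(Y)
den = C * np.sum(mu_x**2) - np.sum(mu_x) ** 2      # exact moments would use mu_{x^2}^i
alpha = num / den                                  # contrast estimate
beta = np.mean(Y - alpha * mu_x)                   # brightness estimate
```

In the actual M-step the means themselves depend on (α, β) through the posterior, which is why the update is iterated to convergence rather than solved once.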
If the hash value of Xq does not match the one in the authentication data, the
decoder declares the image to be tampered; otherwise, we make a decision based on
the log-likelihood ratio test, P(Xq, y; α, β) / Q(Xq, y; α, β) ≷ T, where T is a fixed threshold.
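A toy single-dimension version of this decision can be sketched as follows. The legitimate model P (quantized Gaussian around the compensated projection) and the tampered model Q (uniform over the projection range), as well as every parameter value, are illustrative assumptions rather than the dissertation's exact models:

```python
import math

def log_likelihood_ratio(xq, y, alpha, beta, sigma=2.0, levels=16, step=16.0):
    """Toy LLR: P = legitimate quantized-Gaussian model, Q = uniform tampering model."""
    llr = 0.0
    for q, yi in zip(xq, y):
        center = (q + 0.5) * step                  # reconstruction value of bin q
        mean = (yi - beta) / alpha                 # compensated side information
        log_p = (-0.5 * ((center - mean) / sigma) ** 2
                 - math.log(sigma * math.sqrt(2 * math.pi)))
        log_q = -math.log(levels * step)           # uniform over [0, 256)
        llr += log_p - log_q
    return llr

centers = [(q + 0.5) * 16.0 for q in (2, 5, 9)]
# side information consistent with (alpha, beta) = (1.2, -10) vs. unrelated values
llr_leg = log_likelihood_ratio([2, 5, 9], [1.2 * c - 10 for c in centers], 1.2, -10.0)
llr_tamp = log_likelihood_ratio([2, 5, 9], [0.0, 0.0, 0.0], 1.2, -10.0)
```

A positive LLR favors authenticity; sweeping the threshold T traces out the ROC curves shown in Section 4.5.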
Figure 4.5 demonstrates the efficiency of the EM decoder by illustrating the parameter search traces for different decoders. The ground truth contrast parameter is 0.84, and the brightness parameter is 10. The oracle decoder directly outputs the ground truth. The decoder unaware of adjustment uses 1 and 0 for the contrast and brightness
parameters, respectively. In Figure 4.5(c), the exhaustive searching decoder tries to decode the authentication data using samples in the parameter space from 0.75 to 1.2 for the contrast parameter and from −20 to 20 for the brightness parameter, until it obtains a parameter sample that can successfully decode the bitstream. The discrete search space makes the resulting parameters inaccurate, and the computational complexity grows exponentially as the parameter dimension increases. Figure 4.5(d) shows
the search trace of our proposed EM decoder. Even though the initial parameters are far from the ground truth, the decoder approaches the ground truth within a manageable number of iterations. Unlike the exhaustive search, the EM decoder estimates the parameters in a continuous space. Simulation results in Section 4.5 will demonstrate the accuracy of our parameter estimation.
Figure 4.5: Search traces for different decoders. (a) The oracle decoder directly
outputs the ground truth; (b) the decoder unaware of adjustment outputs (1,0) for
contrast and brightness parameters; (c) the exhaustive searching decoder tries to
decode the authentication data using the parameters in the discrete search space,
until it reaches a parameter that can successfully decode the authentication data;
and, (d) the proposed EM decoder iteratively updates the parameters and decodes
the authentication data.
4.3 EM Decoder for Affine Warping Adjustment
In the previous section, the authentication system was extended to be robust against
contrast and brightness adjustment using an EM algorithm. This section presents an
extension of robustness against affine warping adjustment which includes cropping,
resizing, rotation, and shearing. The example target image shown in Figure 4.2(c) is
first rotated counterclockwise by 5 degrees around the image center, then cropped to 512×512, and finally JPEG compressed and reconstructed at 30 dB PSNR. Recall that we model the editing as y(m) = x(Am + b) + z(m), where

A = [ 0.996  −0.087 ]      b = [  23 ]
    [ 0.087   0.996 ]  and     [ −21 ]

for a 5-degree counterclockwise rotation and cropping. Without
knowing the adjustment parameters, the decoder requires a high rate of the Slepian-Wolf
bitstream for successful decoding and suffers from a high false rejection rate. The
authentication of such image adaptation is different from the problem addressed in
the previous section, since the target image is no longer aligned with the corresponding
authentication data. Our solution is to realign the target image by estimating the
affine warping parameters using the corresponding authentication data.
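For the rotation-plus-cropping example above, the realignment step amounts to inverting the affine coordinate map. A small numpy check (the sample coordinate is illustrative; the A and b values are the 5-degree example from the text):

```python
import numpy as np

theta = np.deg2rad(5.0)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # ~ [[0.996, -0.087], [0.087, 0.996]]
b = np.array([23.0, -21.0])

m = np.array([100.0, 150.0])        # a coordinate in the target image y
n = A @ m + b                       # corresponding coordinate in the original x
m_back = np.linalg.solve(A, n - b)  # inverse transform used to compensate y
```

Once the parameters are estimated, applying this inverse map to every target coordinate produces the compensated image ycomp aligned to the original in the cropped-in region.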
The authentication decision is based on the reconstructed image projection and
the compensated target image. Due to affine warping and cropping, some portions of
the original image are cropped out in the target image y. The cropped-out areas of
the target image are not considered in the authentication decision. Figure 4.6 shows
the target image realigned to the original. The blue areas in Figure 4.6(c) indicate the
cropped-out regions. We refer to the remaining area of the image as the cropped-in
region.
The affine-learning Slepian-Wolf decoder shown in Figure 4.7 takes the Slepian-Wolf bitstream S(Xq) and the target image y and yields the reconstructed image projection Xq′. It now decodes the authentication data via an EM algorithm. The E-step updates the a posteriori probability mass functions (pmfs) Papp(Xq) using the Slepian-Wolf decoder and also estimates corresponding coordinates for a subset of reliably-decoded projections. The M-step updates the affine warping parameters based on the corresponding coordinate distributions, denoted Papp(m) in Figure 4.7. This loop of EM iterations terminates when hard decisions on Papp(Xq) satisfy the constraints imposed by S(Xq).
In the E-step, we fix the parameters A and b at their current hard estimates.
The inverse transform is applied to the target image y to obtain a compensated image
ycomp . If the affine warping parameters are accurate, ycomp would be closely aligned to
Figure 4.6: Realignment of an affine warped image. (a) The original image of a Kodak
test image; (b) target image rotated counterclockwise by 5 degrees and cropped to
512×512; and, (c) realigned target image color overlaid. The blue areas associated
with the 16×16 blocks indicate the cropped-out regions; the other blocks form the
cropped-in region.
the original image x in the cropped-in region. We derive a priori pmfs for the image
projections Xq as follows. In the cropped-in region, we use Gaussian distributions
centered at the random projection values of ycomp , and in the cropped-out region,
we use uniform distributions. Then, we run three iterations of joint bitplane LDPC
decoding on the a priori pmfs with the Slepian-Wolf bitstream S(Xq ) to produce a
posteriori pmfs Papp (Xq ).
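The construction of the a priori pmfs described above can be sketched as follows; the number of quantizer levels, the step size, and the Gaussian width are illustrative assumptions, not the system's actual settings.

```python
import numpy as np

def apriori_pmfs(y_proj, cropped_in, levels=16, step=16.0, sigma=8.0):
    """A priori pmfs over quantization bins: Gaussian centered on the compensated
    projection for cropped-in blocks, uniform for cropped-out blocks."""
    centers = (np.arange(levels) + 0.5) * step
    pmfs = np.empty((len(y_proj), levels))
    for i, (y, inside) in enumerate(zip(y_proj, cropped_in)):
        if inside:
            p = np.exp(-0.5 * ((centers - y) / sigma) ** 2)
            pmfs[i] = p / p.sum()
        else:
            pmfs[i] = 1.0 / levels     # no side information in cropped-out blocks
    return pmfs

# one cropped-in projection near bin 2 (center 40) and one cropped-out projection
pmfs = apriori_pmfs([40.0, 40.0], [True, False])
```

The uniform rows contribute no bias toward any bin, so cropped-out blocks are effectively decoded from the syndrome bits alone.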
We estimate the corresponding coordinates for those projections for which max_{xq} Papp(Xq(i) = xq) > T = 0.995, denoting this set of reliably-decoded projection indices as C. We also denote the maximizing reconstruction value xq as xqmax(i).
As before, the degree-1 syndrome bits are sent to guarantee that C is nonempty. We
obtain corresponding coordinate pmfs Papp (m(i) ) for these projections by maximizing
the following log-likelihood function:
L(A, b) ≡ Σ_{i∈C} log P(xqmax(i), n(i), y; A, b)
        = Σ_{i∈C} log [ Σ_{m(i)} P(xqmax(i), n(i), y | m(i); A, b) P(m(i)) ],
5&"$+2()*&$+
!
!"#
'?!@
!-4*6
9'+6#&%:;4'<
=#2>2"+&* "?#$@
%&66?#$@
,+-4%>2"1-2+/
)*&$+(A"4B+-2#4%
#$&
Figure 4.7: The Slepian-Wolf decoder with affine warping parameter learning decodes
the Slepian-Wolf bitstream S(Xq ) using the target image y as side information. The
soft output of the quantized image projection Papp (Xq ) is matched to the target image
y to yield corresponding coordinate estimations in the E-step. The affine warping
parameters (A, b) are estimated in the M-step.
where n(i) is the set of top-left coordinates of the 16×16 projection blocks Bi in the
original image x, and the latent variable m(i) represents the corresponding set of
coordinates in y.
In this way, we associate a corresponding coordinate vector Papp (m(i) ) with each
projection Xq (i) in C. For the projection Xq (i), we produce the pmf Papp (m(i) = m)
by matching Xq (i) to projections obtained from y through vectors m over a small
search window. Specifically, Papp(m(i) = m) is proportional to the integral, over the quantization interval of xqmax(i), of a Gaussian centered at the projection of a block at m in the image y. Figure 4.8 gives a 1D example of the resulting distribution for the projection at n(i) = 193. The quantized projection, shown as a red bar in Figure 4.8(a), is matched against the projections of y in Figure 4.8(b) over the search window to obtain

Papp(m) ∝ ∫_{x: Q(x) = xqmax(i)} P(x | Y(m(i))) dx

in Figure 4.8(c).
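This matching step can be sketched in 1D using the Gaussian CDF to integrate over the quantization interval; the interval bounds, the window projections, and σ below are all illustrative assumptions.

```python
import math

def coord_pmf(y_window, lo, hi, sigma=4.0):
    """Papp(m) ∝ integral over [lo, hi) of N(x; Y(m), sigma^2), for each candidate
    coordinate m in a small search window (toy 1D version)."""
    def cdf(t, mu):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    w = [cdf(hi, y) - cdf(lo, y) for y in y_window]
    s = sum(w)
    return [v / s for v in w]

# projections of y at three candidate coordinates; the quantization interval of
# the decoded xq^max is taken as [88, 104) in this toy example
pmf = coord_pmf([60.0, 95.0, 130.0], lo=88.0, hi=104.0)
```

The candidate whose projection falls inside the quantization interval dominates the pmf, which is what concentrates Papp(m) around the true corresponding coordinate.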
In the M-step, we re-estimate parameters A and b by holding the corresponding
coordinate pmfs Papp (m(i) ) fixed and maximizing a lower bound of the log-likelihood
Figure 4.8: Example of corresponding coordinate estimation in 1D. A reliably decoded
quantized image projection Xq shown as a red bar in (a) matches the over-complete
image projections of the target image over a search window shown in (b). The resulting a posteriori probability of the corresponding coordinate Papp (m) is proportional
to P (Xq |Y (m)).
function L(A, b):
(A, b) := arg max_{A,b} Σ_{i∈C} Σ_{m(i)} Qi(m(i)) log P(xqmax(i), n(i), y | m(i); A, b)
        = arg max_{A,b} Σ_{i∈C} Σ_{m(i)} Qi(m(i)) [ log P(n(i) | m(i), xqmax(i), y; A, b) + log P(xqmax(i), y | m(i)) ].
The lower bound is due to Jensen's inequality and the concavity of log(·). Note also that P(xqmax(i), y | m(i)) does not depend on the parameters A and b and can thus be ignored in the maximization. We model P(n(i) | m(i), xqmax(i), y; A, b) as a Gaussian distribution, i.e., (n(i) − Am(i) − b) ∼ N(0, σ²I). Similar to the method of least squares, log P(n(i) | m(i), xqmax(i), y; A, b) is a concave function of A and b. Taking partial derivatives with respect to A and b, and setting these to zero, we obtain the optimal updates:
[ A11  A21 ]
[ A12  A22 ]  :=  E(GᵀG)⁻¹ E[Gᵀ] [   ⋮       ⋮   ]
[ b1   b2  ]                     [ n1(i)   n2(i) ] ,                    (4.3)
                                 [   ⋮       ⋮   ]
where

G = [   ⋮       ⋮     ⋮ ]
    [ m1(i)   m2(i)   1 ]    (one row per i ∈ C)
    [   ⋮       ⋮     ⋮ ]

and

E[GᵀG] = Σ_{i∈C} [ E[(m1(i))²]      E[m1(i) m2(i)]   E[m1(i)] ]
                 [ E[m1(i) m2(i)]   E[(m2(i))²]      E[m2(i)] ]
                 [ E[m1(i)]         E[m2(i)]         1        ].

The likelihood ratio test for authenticity is P(Xq, y; A, b) / Q(Xq, y; A, b) ≷ T, measured over the cropped-in area of the compensated target image.
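Equation (4.3) is the normal-equation solution of a linear least-squares problem; with hard corresponding coordinates it can be reproduced with a standard least-squares solver. A sketch with synthetic correspondences (the noise level, sample count, and rng seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[0.996, -0.087], [0.087, 0.996]])   # 5-degree rotation
b_true = np.array([23.0, -21.0])

m = rng.uniform(0, 512, (50, 2))                          # coordinates in y
n = m @ A_true.T + b_true + rng.normal(0, 0.5, (50, 2))   # noisy coordinates in x

G = np.hstack([m, np.ones((50, 1))])               # rows [m1, m2, 1], as in (4.3)
params, *_ = np.linalg.lstsq(G, n, rcond=None)     # [[A11, A21], [A12, A22], [b1, b2]]
A_est = params[:2].T
b_est = params[2]
```

In the actual M-step the correspondences are soft (pmfs Qi(m)), so the sums in G and GᵀG become expectations under Qi, as written above.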
4.4 EM Decoder for Contrast, Brightness, and Affine Warping Adjustment
Joint estimation of contrast, brightness, and affine warping adjustment parameters is
not a trivial extension of the EM decoders described in Section 4.2 and Section 4.3.
Without a proper estimation of contrast and brightness adjustment parameters, the
corresponding coordinate estimation would fail. On the other hand, estimation of
contrast and brightness adjustment parameters cannot be done without reference to
corresponding coordinates. The key idea to solve this problem is to use the soft
information of corresponding coordinates in the contrast and brightness adjustment
parameter estimation.
As before, in the E-step, we fix the parameters A, b, α, and β at their current
hard estimates and obtain a compensated image ycomp . We derive a priori pmfs
for the image projections Xq as follows. In the cropped-in region, we use Gaussian distributions centered at the random projection values of ycomp, and in the cropped-out region, we use uniform distributions. Then, we run three iterations of joint bitplane LDPC decoding on the a priori pmfs with the Slepian-Wolf bitstream S(Xq) to produce a posteriori pmfs Papp(Xq(i) = xq).
We estimate the corresponding coordinates m(i) for those projections for which max_{xq} Papp(Xq(i) = xq) > T = 0.995, denoting this set of reliably-decoded projection indices as C. We also denote the maximizing reconstruction value xq as xqmax(i). The latent variable update can be written as

Qi(m) := P(m(i) = m | xqmax(i), y, n(i); A, b, α, β).
In the M-step, we re-estimate the parameters A, b, α, and β by holding the
corresponding coordinate pmfs Qi (m) fixed and maximizing a lower bound of the
log-likelihood function:
L(A, b, α, β) ≡ Σ_{i∈C} log P(xqmax(i), n(i), y; A, b, α, β)
              = Σ_{i∈C} log [ Σ_{m(i)} P(xqmax(i), n(i), y | m(i); A, b, α, β) P(m(i)) ]
              ≥ Σ_{i∈C} Σ_m Qi(m) log P(xqmax(i), n(i), y | m; A, b, α, β)
              = Σ_{i∈C} Σ_m Qi(m) [ log P(n(i) | m, xqmax(i), y; A, b) + log P(xqmax(i), y | m; α, β) ].   (4.4)
The lower bound is due to Jensen's inequality and the concavity of log(·). Note also that P(xqmax(i), y | m(i); α, β) does not depend on the parameters A and b, and P(n(i) | m, xqmax(i), y; A, b) does not depend on the parameters α and β. Thus, we can maximize the lower bound separately over these two sets of parameters. The affine warping parameters are updated using (4.3).
Similarly, we model P(xqmax(i) | y, m; α, β) as a quantized Gaussian with mean (Y(m) − β)/α and variance σz²/α². The quantization of X is uniform and saturated for X less than 0 or greater than 255. Setting partial derivatives with respect to α and β
to zero, we obtain the optimal updates³:

α := ( |C| Σ_{i∈C} µXY^i − Σ_{i∈C} Σ_{j∈C} µX^i µY^j ) / ( |C| Σ_{i∈C} µX²^i − Σ_{i∈C} Σ_{j∈C} µX^i µX^j )

β := (1/|C|) Σ_{i∈C} ( µY^i − α µX^i ),
where

µX^i = E_{m∼Qi}[ E[X | Y(m), xqmax(i)] ],
µY^i = E_{m∼Qi}[ Y(m) ],
µX²^i = E_{m∼Qi}[ E[X² | Y(m), xqmax(i)] ],
µXY^i = E_{m∼Qi}[ Y(m) E[X | Y(m), xqmax(i)] ].
The likelihood ratio test for authenticity is P(Xq, y; A, b, α, β) / Q(Xq, y; A, b, α, β) ≷ T, measured over the cropped-in area of the compensated target image.
4.5 Simulation Results
We use Kodak and classic test images, shown in Appendix A, at 512×512 resolution with 8-bit grayscale depth for the simulation. The space-varying two-state channel in
Figure 4.1 applies JPEG2000 or JPEG compression and reconstruction at several
qualities above 30 dB. In Section 4.5.1, the channel applies contrast and brightness
adjustment. Section 4.5.2 shows the results when affine warping adjustment is applied.
In Section 4.5.3, the channel simultaneously applies contrast, brightness, and affine
warping adjustment.
4.5.1 Contrast and Brightness Adjustment
Our first experiment uses Lena of size 512×512 at 8-bit gray resolution. The two-state channel in Figure 4.1 randomly selects contrast and brightness parameters (α, β)
³ Appendix B discusses the concavity of L̂(α, β) to claim the optimality conditions.
from the set {(1.2, −20), (1.1, −10), (1.0, 0), (0.9, 10), (0.8, 20)}. After adjustment,
JPEG2000 or JPEG compression and reconstruction is applied at 30 dB reconstruction PSNR. The malicious attack overlays a 20×122 pixel text banner randomly in
the image. The text color is white or black, whichever is more visible, to avoid generating trivial attacks, such as white text on a white area. The image projection X
is quantized to 4 bits, and the Slepian-Wolf encoder uses a 4096-bit LDPC code with
200 degree-1 syndrome nodes. Figure 4.9 compares the minimum rate (averaged over
20 trials) for decoding S(Xq ) with legitimate and tampered side information using
three different decoding schemes: the proposed EM decoder that learns α and β, an
oracle decoder that knows α and β, and a decoder unaware of adjustment that always
uses α = 1 and β = 0. The EM decoder separates the minimum decodable rates as
effectively as the oracle decoder, while the decoder unaware of adjustment cannot always decode at low rates with legitimate side information. This makes the decoder unaware of adjustment unable to distinguish tampered from legitimately adjusted images. The same observation holds for other test images.
We set the authentication data size to 107 bytes and measure the false acceptance
rate (the chance that a tampered image is falsely accepted as a legitimate one) and
the false rejection rate (the chance that a legitimate image is falsely detected as a
tampered one), using 30,000 test target images derived from 15 sample images. The
channel settings remain the same except that α and β are drawn uniformly at random
from [0.8, 1.2] and [−20, 20], respectively, and JPEG2000/JPEG reconstruction PSNR
is selected from 30-42 dB. Figure 4.10 compares the receiver operating characteristic
(ROC) curves for tampering detection of four decoders by sweeping the decision
threshold T in the likelihood ratio test. The decoder unaware of adjustment has high
possibility of false rejection, since it considers the contrast and brightness adjusted
images as tampered. The EM decoder, the oracle decoder, and the decoder using
an exhaustive search of parameters can confidently distinguish legitimate adjusted
images from tampered ones. In the legitimate case, the EM decoder estimates α and
β with mean squared error 6.2 × 10−5 and 0.9, respectively.
Figure 4.9: Minimum rates for decoding the Slepian-Wolf bitstream S(Xq) with legitimate and tampered side information using different decoders. The decoder unaware of adjustment requires high rates once the target image is adjusted or tampered, and so cannot distinguish tampered from legitimately adjusted images. Both the EM and oracle decoders separate the minimum decodable rates.
4.5.2 Affine Warping
Now we evaluate the performance of the affine-warping EM decoder for the test images
with affine warping adjustments. The first experiment demonstrates the minimum
decodable rates for rotated and sheared target images. The two-state channel in
Figure 4.1 applies an affine warping adjustment to the images and crops them to
512×512. Then JPEG2000 or JPEG compression and reconstruction is applied at
30 dB reconstruction PSNR. In the illegitimate state, the malicious attack overlays
a 20×122 pixel text banner randomly on the image. The image projection X is
quantized to 4 bits, and the Slepian-Wolf encoder uses a 4096-bit LDPC code with
400 degree-1 syndrome nodes. Figure 4.11 compares the minimum rates for decoding S(Xq ) with legitimate test images using three different decoding schemes: the
Figure 4.10: ROC curves for different decoders. The target images have undergone
random contrast and brightness adjustment and JPEG/JPEG2000 compression. The
EM decoder and the exhaustive searching decoder, which tries parameter samples at
intervals of 0.01 for α and 1 for β rounded from the ground truth, have performances
very close to that of the oracle decoder, while the decoder unaware of adjustment
rejects authentic test images with high probability.
EM decoder that learns the affine parameters, an oracle decoder that knows the parameters, and a decoder unaware of adjustment that always assumes no adjustment.
Figures 4.11 (a) and (b) show the results when the affine warping adjustments are
rotation around the image center and horizontal shearing, respectively. The EM decoder requires minimum rates only slightly higher than the oracle decoder, while the
decoder unaware of adjustment requires higher and higher rates as the adjustment
increases.
For the next experiment, we set the authentication data size to 250 bytes and
measure false acceptance and rejection rates. The acceptance decision is made based
on the likelihood ratio of Xq and y with estimated parameters within the estimated
Figure 4.11: Minimum rate for decoding authentication data using legitimate adjusted test images as side information, using different decoders. (a) The test images have undergone rotation. (b) The test images have undergone horizontal shearing. The EM decoder requires minimum rates only slightly higher than the oracle decoder, while the decoder unaware of adjustment requires higher and higher rates as the adjustment increases.
cropped-in blocks. The channel settings remain the same except that transform parameters A11 and A22 are randomly drawn from [0.95, 1.05], A21 and A12 from [-0.1,
0.1], and b1 and b2 from [-10, 10]. The JPEG2000/JPEG reconstruction PSNR is
selected from 30 to 42 dB. With 15,000 trials, Figure 4.12 shows the ROC curves
created by sweeping the decision threshold of the likelihood ratio. The EM decoder
performance is very close to that of the oracle decoder, while the decoder unaware
of adjustment rejects authentic test images with high probability. The exhaustive
searching decoder, which tries parameter samples at intervals of 0.01 for A and 1 for
b rounded from the ground truth, also suffers from high probability of false rejection
due to the inaccurate parameters used. In the legitimate case, the EM decoder estimates the transform parameters A11 , A21 , A12 , A22 , b1 , and b2 , with mean squared
error 6.0 × 10−7, 4.1 × 10−6 , 4.2 × 10−7 , 1.6 × 10−6 , 0.06, and 0.69, respectively.
Figure 4.12: ROC curves for different decoders. The target images have undergone
random affine warping adjustment and JPEG/JPEG2000 compression. The EM decoder performance is very close to that of the oracle decoder, while the decoder
unaware of adjustment rejects authentic test images with high probability. The exhaustive searching decoder, which tries parameter samples at intervals of 0.01 for A
and 0.1 for b rounded from the ground truth, also suffers from high probability of
false rejection due to the inaccurate parameters used.
4.5.3 Contrast, Brightness, and Affine Warping Adjustment
We set the authentication data size to 250 bytes and measure false acceptance and
rejection rates. The acceptance decision is made based on the likelihood of Xq and
y with estimated parameters within the estimated cropped-in blocks. The channel
settings remain the same except that parameter α is randomly drawn from [0.9,1.1],
β from [-10,10], A11 and A22 from [0.95, 1.05], A21 and A12 from [-0.05, 0.05], and
b1 and b2 from [-10, 10]. The JPEG2000/JPEG reconstruction PSNR is selected
from 30 to 42 dB. With 15,000 trials, Figure 4.13 shows the ROC curves created
by sweeping the decision threshold of the likelihood ratio test. The EM decoder
performance is very close to that of the oracle decoder, while the decoder unaware
of adjustment rejects authentic test images with high probability. The exhaustive
searching decoder, which tries parameter samples at intervals of 0.01 for A and α, 0.1
for β, and 1 for b rounded from the ground truth, also suffers from high probability
of false rejection due to the inaccurate parameters used. In the legitimate case, the
EM decoder estimates the transform parameters A11 , A21 , A12 , A22 , b1 , b2 , α, and β
with mean squared error 4.5 × 10−7 , 2.6 × 10−6 , 3.4 × 10−7 , 1.6 × 10−6 , 0.05, 0.54,
2.0 × 10−5 , and 0.34, respectively.
Figure 4.13: ROC curves for different decoders. The target images have undergone
random contrast, brightness, and affine warping adjustments and JPEG/JPEG2000
compression. The EM decoder performance is very close to that of the oracle decoder,
while the decoder unaware of adjustment rejects authentic test images with high
probability. The exhaustive searching decoder, which tries parameter samples at
intervals of 1 for b and 0.01 for the others rounded from the ground truth, also suffers
from high probability of false rejection due to the inaccurate parameters used.
4.6 Summary
The image authentication system using distributed source coding has been extended
to be robust against contrast, brightness, and affine warping adjustment. The system
now decodes the Slepian-Wolf bitstream and estimates the adjustment parameters using an EM algorithm. Experimental results demonstrate that the system can distinguish legitimate encodings of authentic images from illegitimately modified versions,
despite arbitrary contrast, brightness, and affine warping adjustment, using authentication data of less than 250 bytes per image. With accurate parameter estimation
within a manageable number of iterations, our system outperforms the exhaustive
searching decoder.
Our system now can decode the Slepian-Wolf bitstream in the authentication data
using legitimate target images that might have undergone contrast, brightness, and
affine warping adjustments. The next chapter will present a method that decodes the
authentication data at low rates even with tampered target images. This enables the
system to localize the tampering.
Chapter 5
Tampering Localization
The previous chapter extends the authentication system to be robust against various
legitimate adjustments. The authentication system selects an authentication data
rate so that it can only be correctly decoded using authentic (possibly adjusted) target
images as side information. It would fail to decode the authentication data as a result
of tampering. However, localization of tampering would require reconstructing the
original image projection using the tampered image as side information. As shown in
the previous chapters, using legitimate editing models to decode the authentication
data with tampered side information would require a high authentication data rate.
An alternative to delivering the original image projection is to use conventional coding
that makes data size independent of target images. However, a better solution using
distributed source coding can be offered by leveraging the correlation between the
original image and slightly tampered target images.
This chapter presents an augmented decoder that can localize the tampering in
those images deemed to be inauthentic by using the sum-product algorithm [89] over
a factor graph that represents tampering models. Section 5.1 formulates the localization problem using a space-varying two-state channel. Section 5.2 describes the
factor graph representation of the localization decoder. The decoder can reconstruct
the image projection of the localization data using tampered images as side information. Section 5.3 presents spatial models that exploit the spatial correlation of the
tampering. Section 5.4 extends the decoder to localize the tampering in tampered
images that have undergone legitimate contrast and brightness adjustment. Simulation results in Section 5.5 demonstrate that the authentication system can localize
the tampering with high probability and that the spatial models offer additional improvements.
5.1 Space-Varying Two-State Channel
We introduce a space-varying two-state channel shown in Figure 5.1 to replace the
two-state channel for the tampering localization problem. In the legitimate state, the
channel applies legitimate editing, such as JPEG2000 compression and reconstruction. The tampered state additionally includes malicious tampering. The channel
state variable Si is defined per nonoverlapping 16×16 blocks of image y. If any pixel
in block Bi is part of the tampering, Si = 1; otherwise, Si = 0. The authentication problem discussed in the previous chapters is a decision on Σ_i Si > 0; the tampering localization problem can be formulated as deciding on Si for each block, given the
Slepian-Wolf bitstream S(Xq ). Figure 5.2(c) shows the channel states overlaid on a
tampered target image. The red blocks are tampered, and the others are legitimate.
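The per-block channel state definition can be sketched directly; the mask and the 64×64 image size below are hypothetical examples.

```python
import numpy as np

def channel_states(tamper_mask, block=16):
    """S_i = 1 if any pixel of the 16x16 block B_i is tampered, else 0."""
    h, w = tamper_mask.shape
    S = np.zeros((h // block, w // block), dtype=int)
    for i in range(h // block):
        for j in range(w // block):
            blk = tamper_mask[i * block:(i + 1) * block, j * block:(j + 1) * block]
            S[i, j] = int(blk.any())
    return S

mask = np.zeros((64, 64), dtype=bool)
mask[5:12, 20:40] = True            # a hypothetical tampered region
S = channel_states(mask)            # flags the two blocks the region touches
```

Under this definition, authentication asks whether S contains any 1, while localization asks which entries of S are 1.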
Figure 5.1: Space-varying two-state lossy channel. The image is divided into nonoverlapping blocks. Each block has an associated channel state indicating whether the
block is tampered or legitimate.
Given the quantized original image projection Xq , and the target image projection
Y , one can infer the channel state S using Bayes’ theorem:
P(S | Xq, Y) = P(Xq, Y | S) P(S) / P(Xq, Y).                (5.1)
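A toy single-block version of this inference might look as follows; the Gaussian legitimate model, the uniform tampered model, and the prior are all illustrative assumptions, not the dissertation's fitted models.

```python
import math

def posterior_tampered(xq_center, y, sigma=2.0, p_tamper=0.1):
    """Bayes update (5.1) for one block: the legitimate state keeps Y near the
    reconstructed projection; the tampered state makes Y roughly uniform on [0, 256)."""
    p_y_leg = (math.exp(-0.5 * ((y - xq_center) / sigma) ** 2)
               / (sigma * math.sqrt(2 * math.pi)))
    p_y_tam = 1.0 / 256.0
    num = p_y_tam * p_tamper
    return num / (num + p_y_leg * (1.0 - p_tamper))
```

When the target projection agrees with the decoded projection, the posterior tampering probability is small; a large mismatch drives it toward one, which is the per-block decision the localization decoder makes.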
Figure 5.2: The target image in (b) is a tampered version of the original image (a).
The image in (c) is the overlaid channel state for each 16×16 block. The red blocks
are tampered, and the others are legitimate.
The localization decoder requires more information than the authentication decoder
since it additionally estimates the channel states, and a tampered image is usually less
correlated to Xq than an authentic one. Fortunately, since we use rate-adaptive LDPC
codes [188], the localization decoder reuses the authentication data. Incremental
localization data are sent through the Slepian-Wolf bitstream S(Xq ). In addition, the
LDPC decoder can naturally adopt the channel state inference in (5.1). The next
section will introduce the decoder factor graph that connects the LDPC decoding
to the channel state inference. The sum-product algorithm over the factor graph
simultaneously decodes the Slepian-Wolf bitstream and localizes the tampering.
5.2 Decoder Factor Graph
A factor graph [89] is a bipartite graphical model that presents a factorization of
a joint probability distribution of random variables. There are two classes of nodes:
factor nodes and variable nodes. The variable nodes represent the random variables of
interest; the factor nodes represent the probabilistic relationships among the adjacent
variable nodes. Based on the factor graph representation, the sum-product algorithm
efficiently marginalizes the approximate joint distribution for all variables. The sum-product algorithm has emerged in many applications in coding, statistical filtering,
and artificial intelligence. We apply this technique to our tampering localization
problem.
The factor graph in Figure 5.3 shows the relationship among the Slepian-Wolf
bitstream (LDPC syndromes), bits of image projection Xq (3-bit quantization in this
example), side information, and channel states. The variable nodes of interest are
[Xq1 (i), Xq2 (i), Xq3 (i)] which are the binary representations of the 3-bit quantized image
projection Xq (i) and the channel states Si . The factor node at each syndrome node
is an indicator function of the satisfaction of that syndrome constraint. The factor
fbi (Xq (i), Si ) = P (Xq (i)|Y (i); Si ) represents the relationship between image projection
Xq (i), side information Y (i), and the channel state Si . When Si = 0, fbi (Xq (i), 0) is
proportional to the integral of a Gaussian distribution with mean Y (i) and a fixed
variance σz2 over the quantization interval of Xq (i). When Si = 1, fbi (Xq (i), 1) is
uniform. The factor connected to each state node fsi (Si ) = P (Si ) is the a priori
probability of channel state.
The localization decoder applies the sum-product algorithm [89] on the factor
graph to estimate each channel state likelihood P (Si = 1) and decode the SlepianWolf bitstream S(Xq ). Decoding is initialized with the syndrome node values S(Xq )
and the side information $Y$ embedded in $f_b^i$. For each iteration, the state node $S_i$ passes the belief message to the factor node $f_b^i$: $u^i_{s \to f_b}(s) = f_s^i(s) = P(S_i = s)$. Then the factor node $f_b^i$ summarizes all incoming messages and generates the outgoing messages as follows:

$$u^i_{f_b \to s}(s) \propto \sum_{x_1, x_2, x_3} f_b^i(x_1, x_2, x_3, s) \prod_{k=1}^{3} u^{(i,k)}_{x \to f_b}(x_k)$$

$$u^{(i,j)}_{f_b \to x}(x_j) \propto \sum_{\{s, x_1, x_2, x_3\} \setminus x_j} f_b^i(x_1, x_2, x_3, s)\, u^i_{s \to f_b}(s) \prod_{k \in \{1,2,3\} \setminus j} u^{(i,k)}_{x \to f_b}(x_k)$$

where $u^{(i,k)}_{x \to f_b}$ is the belief message from the bit node $X_q^k(i)$ (the $k$th most significant bit of $X_q(i)$) to the factor node $f_b^i$, and $u^{(i,k)}_{f_b \to x}$ is the message in the opposite direction. The messages are normalized so that $\sum_x u(x) = 1$. The decoder takes the messages $u^{(i,j)}_{f_b \to x}$ and performs one iteration of LDPC decoding to yield updated $u^{(i,j)}_{x \to f_b}$ for the next iteration. The a priori probability of the channel states $f_s^i$ is re-estimated each iteration as follows:

$$f_s^i(s) = \frac{1}{N} \sum_{i=1}^{N} u^i_{f_b \to s}(s)$$
The message-passing iterations terminate when the hard decisions on the bits of $X_q$ satisfy the constraint imposed by the syndrome $S(X_q)$. The summary of all the incoming messages to the $S_i$ nodes yields the marginal probability of the channel state: $P_{\mathrm{app}}(S_i = s) \propto u^i_{f_b \to s}(s)\, u^i_{f_s \to s}(s)$. Finally, each block $B_i$ of $y$ is declared to be tampered if the marginal probability $P_{\mathrm{app}}(S_i = 1) > T$, a fixed decision threshold.
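To make the factor $f_b^i$ concrete, the following Python sketch evaluates both channel hypotheses for a single block and combines them with a state prior. The step size, noise variance, number of quantization levels, and prior are illustrative values rather than parameters from this chapter, so this is a sketch of the model, not the thesis implementation.

```python
import math

def f_b(xq, y, s, step=8.0, sigma_z=2.0, levels=8):
    """Factor f_b^i(Xq(i)=xq, Si=s) = P(Xq(i) | Y(i); Si).

    s = 0 (legitimate): integral of a Gaussian with mean y and
    variance sigma_z^2 over the quantization bin of index xq.
    s = 1 (tampered): uniform over all bins.
    """
    if s == 1:
        return 1.0 / levels
    lo, hi = xq * step, (xq + 1) * step        # bin edges of index xq
    if xq == 0:                                # saturate the outer bins
        lo = -math.inf
    if xq == levels - 1:
        hi = math.inf
    phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return phi((hi - y) / sigma_z) - phi((lo - y) / sigma_z)

def tamper_posterior(xq, y, p_tamper=0.1):
    """Marginal tamper belief for one block, combining a prior P(Si)
    with the evidence from the decoded projection value."""
    l0 = (1.0 - p_tamper) * f_b(xq, y, 0)
    l1 = p_tamper * f_b(xq, y, 1)
    return l1 / (l0 + l1)
```

For a block whose decoded projection bin is consistent with the side information, the posterior tamper belief stays small; for a bin far from the side information it approaches one.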
Figure 5.3: Factor graph for the localization decoder. The syndrome nodes are indicator functions of the satisfaction of the syndrome constraints. The factor node
connecting Xq (i) and Si is the conditional probability P (Xq (i)|Yi , Si ). The factor
node fsi represents the a priori probability of the channel state.
Here, we assume that the channel states are independent of each other. In most cases, however, each channel state is correlated with its adjacent channel states due to the contiguity of tampering. Figure 5.2(c) shows one example. The next section
will describe spatial models on the channel states to exploit the spatial contiguity of
tampered regions.
5.3 Spatial Models for State Nodes
Figure 5.4 depicts a modified decoder factor graph. The spatial correlation of the channel states is now considered using two spatial models: 1D and 2D Markov models. In the previous section, the independent model in Figure 5.3 factorizes the joint distribution of the channel states as $P(S) = \prod_i P(S_i)$ by assuming each channel state is memoryless. This section considers a 1D Markov model that gives $P(S) = P(S_1) \prod_i P(S_i \mid S_{i-1})$, and a 2D model $P(S) = P(S_{\mathrm{boundary}}) \prod_i P(S_i \mid S_{i-1}, S_{i-w}, S_{i-w-1})$, where $S_{\mathrm{boundary}}$ are the channel states located at the left and top image boundaries and $w$ is the number of image projections in a row. The factor graphs of the 1D and 2D models are shown in Figure 5.5.
Figure 5.4: Factor graph for the localization decoder with spatial models. The factor
nodes fbi pass belief messages of the channel states to the spatial models which capture
the contiguity of tampering blocks. The spatial models then return the channel state
beliefs back to the factor nodes fbi .
The spatial models receive messages $u^i_{f_b \to s}$ from the factor nodes $f_b^i$, and reply with messages $u^i_{s \to f_b}$. The decoder for the 1D Markov model in Figure 5.5(a), parameterized by the probability $f_t(S_i, S_{i-1}) = P(S_i \mid S_{i-1})$, achieves this with one iteration of the Baum-Welch algorithm [19]. That is,

Forward recursion: $u^{i+1}_{f_t \to s}(s) \propto \sum_{s_i} P(s \mid s_i)\, u^i_{f_t \to s}(s_i)\, u^i_{f_b \to s}(s_i)$

Backward recursion: $v^{i-1}_{f_t \to s}(s) \propto \sum_{s_i} P(s_i \mid s)\, v^i_{f_t \to s}(s_i)\, u^i_{f_b \to s}(s_i)$

Then the 1D spatial model returns the messages $u^i_{s \to f_b}(s) \propto u^i_{f_t \to s}(s)\, v^i_{f_t \to s}(s)$, which are normalized so that $u^i_{s \to f_b}(0) + u^i_{s \to f_b}(1) = 1$.
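The 1D forward-backward sweep can be sketched in a few lines of numpy. Here `evidence` stands in for the messages $u^i_{f_b \to s}$ produced by the factor nodes, and the persistence probability of the transition matrix is a hypothetical value, not one from the thesis.

```python
import numpy as np

def spatial_1d(evidence, p_stay=0.95):
    """One forward-backward sweep over a 1-D chain of block states.

    evidence[i] = (u_fb_to_s(0), u_fb_to_s(1)) for block i.  Returns
    the normalized replies u_s_to_fb, which exclude each block's own
    local evidence, as in the message-passing formulation.
    """
    n = len(evidence)
    T = np.array([[p_stay, 1 - p_stay],      # T[a, b] = P(next=b | cur=a)
                  [1 - p_stay, p_stay]])
    fwd = np.full((n, 2), 0.5)
    bwd = np.full((n, 2), 0.5)
    for i in range(1, n):                    # forward recursion
        fwd[i] = (fwd[i - 1] * evidence[i - 1]) @ T
        fwd[i] /= fwd[i].sum()
    for i in range(n - 2, -1, -1):           # backward recursion
        bwd[i] = T @ (bwd[i + 1] * evidence[i + 1])
        bwd[i] /= bwd[i].sum()
    reply = fwd * bwd
    return reply / reply.sum(axis=1, keepdims=True)
```

An ambiguous block sitting inside a run of tampered blocks is pulled toward the tampered state by its neighbors, which is exactly the contiguity effect the spatial model is meant to capture.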
The 2D Markov random field in Figure 5.5(b) is parameterized by the probability $f_t(S_i, S_{i-1}, S_{i-w}, S_{i-w-1}) = P(S_i \mid S_{i-1}, S_{i-w}, S_{i-w-1})$, and so employs a modified Baum-Welch iteration similar to that of [187]. The forward and backward recursions are

$$u^i_{f_t \to s}(s) = \sum_{s_{i-1},\, s_{i-w},\, s_{i-w-1}} P(s \mid s_{i-1}, s_{i-w}, s_{i-w-1}) \prod_{j \in \{i-1,\, i-w,\, i-w-1\}} u^j_{f_t \to s}(s_j)\, u^j_{f_b \to s}(s_j)$$

$$v^i_{f_t \to s}(s) = \sum_{s_{i+1},\, s_{i+w},\, s_{i+w+1}} P(s_{i+w+1} \mid s_{i+1}, s_{i+w}, s) \prod_{j \in \{i+1,\, i+w,\, i+w+1\}} v^j_{f_t \to s}(s_j)\, u^j_{f_b \to s}(s_j)$$

The resulting message $u^i_{s \to f_b}$ is given by

$$u^i_{s \to f_b}(s) \propto u^i_{f_t \to s}(s)\, v^i_{f_t \to s}(s),$$

which is normalized such that $u^i_{s \to f_b}(0) + u^i_{s \to f_b}(1) = 1$.
In summary, the decoder runs LDPC decoding of the Slepian-Wolf bitstream
S(Xq ) using the side information Y and yields the beliefs of bit nodes of Xq . The
decoder then generates the beliefs of channel states based on fbi = P (Xq |Y, S). These
belief messages pass through one of the spatial models (IID in the previous section,
1D, or 2D). The returning channel state belief messages summarize all the incoming
Figure 5.5: Factor graphs of spatial models for the channel states (a) 1D Markov
chain (b) 2D Markov random field.
messages from the factor nodes fbi with the spatial model. The factor nodes fbi then
update the belief messages of each bit node of Xq using the new channel state beliefs
and side information Y . The LDPC decoder takes the updated bit belief messages for
the next iteration. An extension of the update in the factor nodes fbi also considers
the side information that has undergone some adjustments by using an EM algorithm as described in the previous chapter. The next section will discuss tampering
localization in an inauthentic image that has undergone some adjustments.
5.4 Tampering Localization for Contrast and Brightness Adjusted Images
The authentication of images that have undergone some global adjustments was discussed in the previous chapter. The same challenge also arises in the tampering
localization problem. The localization decoder that is unaware of adjustment will
deem the legitimately adjusted image blocks as tampered ones. This makes the tampering localization result useless. This section presents our solution which combines
the localization decoder with an EM algorithm to learn the adjustment parameters.
From the perspective of the EM algorithm, the parameter estimation additionally uses
the channel state information to learn the adjustment parameters; from the perspective of the localization decoder, the factor graph node fbi using the side information
compensates for the adjustment using the estimated parameters. Though this section
only describes the localization decoder for contrast and brightness adjustment, the
same principle can apply to other adjustments.
Figure 5.6 is an extended model of Figure 5.1 that additionally includes a global
contrast and brightness adjustment. The tampering localization system described in
the previous sections is not robust against the contrast and brightness adjustment.
This is because the affine relationship is preserved by the random image projection
due to its linearity; that is, Y = αX + β + Z. The localization decoder that is
unaware of this adjustment will provide useless tampering localization information.
This section describes an extended localization decoder that can correctly localize the
tampering in contrast and brightness adjusted images using an EM algorithm.
"(4/
Figure 5.6: Space-varying two-state lossy channel with contrast and brightness adjustment. The target image now is affected by a global contrast and brightness
adjustment.
The introduction of learning to the tampering localization system only requires
that the Slepian-Wolf decoder block of Figure 5.4 be embedded within the contrast-and-brightness-learning loop of Figure 5.7. As before, the decoder takes the Slepian-Wolf bitstream S(Xq ) and the side information Y , yields the reconstructed image
projection Xq , and estimates the channel states Si . But it now does this via an EM
algorithm that updates the a posteriori probability mass functions (pmfs) Papp (Xq )
and Papp (Si ) in the E-step and updates contrast α and brightness β by maximum
likelihood estimation in the M-step.
Figure 5.7: Contrast and brightness learning Slepian-Wolf decoder for tampering
localization. The decoder decodes the Slepian-Wolf bitstream S(Xq ) using the side
information Y compensated with the previously estimated contrast and brightness
adjustment parameters. Each iteration produces a soft estimation of Xq and the
channel states S in the E-step and updates the contrast and brightness adjustment
parameters in the M-step.
In the E-step, the information in the a priori pmfs and the Slepian-Wolf bitstream
S(Xq ) are combined via one iteration of the sum-product algorithm over the localization decoder factor graph (Figure 5.3 in Section 5.2 or Figure 5.4 in Section 5.3). This
produces a posteriori pmfs of the image projection pixels Qi (xq ) = Papp (Xq (i) = xq )
and the channel states Papp (Si = 0), denoted as w(i).
In the M-step, the lower bound $\hat{L}(\alpha, \beta)$ of the log-likelihood function is adapted to include the channel states in the statistical model as follows:

$$L(\alpha, \beta) \equiv \sum_{i \in C} \log P(X_q(i), Y(i); \alpha, \beta) = \sum_{i \in C} \log \left( \sum_{s=0}^{1} P(X_q(i), Y(i) \mid S_i = s; \alpha, \beta)\, \phi_s(s) \right)$$

$$\geq \sum_{i \in C} \sum_{x_q} Q_i(x_q) \log \left( \sum_{s=0}^{1} P(x_q, Y(i) \mid S_i = s; \alpha, \beta)\, \phi_s(s) \right) \equiv \hat{L}(\alpha, \beta), \tag{5.2}$$
where the untampered distribution $P(X_q(i), Y(i) \mid S_i = 0; \alpha, \beta)$ is a quantized Gaussian with mean $(Y(i) - \beta)/\alpha$ and variance $\sigma^2/\alpha^2$, and the tampered distribution $P(X_q(i), Y(i) \mid S_i = 1; \alpha, \beta)$ is uniform. The quantization of $X$ is uniform and saturated for $X$ less than 0 or greater than 255. The a priori pmf over channel states is denoted by $\phi_s(s)$. Setting the partial derivatives of $\hat{L}(\alpha, \beta)$ with respect to $\alpha$ and $\beta$ to zero, we obtain the optimality conditions¹:

$$\alpha = \frac{W \sum_{i \in C} w(i)\,\mu_x^i\, Y(i) - \sum_{i \in C} \sum_{j \in C} w(i)\,w(j)\,\mu_x^i\, Y(j)}{W \sum_{i \in C} w(i)\,\mu_{x^2}^i - \sum_{i \in C} \sum_{j \in C} w(i)\,w(j)\,\mu_x^i\, \mu_x^j}$$

$$\beta = \frac{1}{W} \sum_{i \in C} w(i)\left(Y(i) - \alpha \mu_x^i\right)$$

$$\phi_s(0) = W/|C|, \quad \phi_s(1) = 1 - \phi_s(0),$$

where

$$W = \sum_{i \in C} w(i)$$

$$\mu_x^i = \sum_{x_q} Q_i(x_q)\, E[X(i) \mid q(X(i)) = x_q, Y(i); \alpha, \beta]$$

$$\mu_{x^2}^i = \sum_{x_q} Q_i(x_q)\, E[X(i)^2 \mid q(X(i)) = x_q, Y(i); \alpha, \beta]$$

¹Appendix B discusses the concavity of $\hat{L}(\alpha, \beta)$ to justify the optimality conditions.
Since both the left- and right-hand sides of the optimality conditions contain $\alpha$ and $\beta$, we update them iteratively until convergence, for at most 30 iterations. The outer loop of EM iterations terminates when hard decisions on $P_{\mathrm{app}}(X_q(i) = x_q)$ satisfy the constraints imposed by $S(X_q)$. Finally, each block $B_i$ is declared to be tampered if $P_{\mathrm{app}}(S_i = 1) > T$, a fixed decision threshold. Note that by setting $w(i) = 1$ for all $i$, this result reduces to the EM algorithm for the authentication problem.
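The weighted least-squares core of this M-step can be sketched as a single numpy pass. Here the posterior moments $\mu_x^i$ and $\mu_{x^2}^i$ are held fixed for illustration; in the full algorithm they are recomputed from the new $(\alpha, \beta)$ on each inner iteration, which is what makes the update a fixed-point iteration.

```python
import numpy as np

def update_alpha_beta(mu_x, mu_x2, Y, w):
    """One pass of the optimality conditions for contrast alpha and
    brightness beta.

    mu_x, mu_x2: per-block posterior means of X(i) and X(i)^2;
    Y: target image projection; w: weights w(i) = P_app(S_i = 0)
    from the E-step, which discount blocks believed to be tampered.
    """
    W = w.sum()
    num = W * np.sum(w * mu_x * Y) - np.sum(w * mu_x) * np.sum(w * Y)
    den = W * np.sum(w * mu_x2) - np.sum(w * mu_x) ** 2
    alpha = num / den
    beta = np.sum(w * (Y - alpha * mu_x)) / W
    return alpha, beta
```

On exact affine data with unit weights this reduces to ordinary least squares and recovers the adjustment parameters exactly.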
The localization system can now localize the tampering in contrast and brightness adjusted tampered images. The same framework can also apply to other adjustments by including the channel state estimates in the parameter estimation.
5.5 Simulation Results
This section shows the simulation results of the tampering localization decoder. In
practice, the localization decoder would only run if the authentication decoder deems
an image to be tampered, so we test the tampering localization system only with maliciously tampered images. Section 5.5.1 describes the simulation setup. Section 5.5.2
shows the minimum rates for successful decoding for various tampered images using
different spatial models. In Section 5.5.3, we measure the failure rates of tampering
localization.
5.5.1 Setup
We use the test images shown in Appendix A at 512×512 resolution with 8-bit grayscale. The space-varying two-state channel in Figure 5.1 applies JPEG2000 or JPEG compression and reconstruction at several qualities above 30 dB. The malicious tampering consists of the overlaying of up to five text banners of different sizes at random locations in the image. The text banner sizes are 198×29, 29×254, 119×16, 16×131, and 127×121 pixels. The text color is white or black, depending on which is more visible, thereby avoiding trivial attacks, such as overlaying white text on a white area.
5.5.2 Decodable Rate
We first compare the minimum localization data rate required by the localization
decoder. Figure 5.8 shows the Slepian-Wolf bitstream components S(Xq ) of these
rates (in bits per pixel of the original image x) for Lena with Xq quantized to 4
bitplanes. All five text banners are placed for malicious tampering: greater tampering is more easily detected, but makes localization more difficult. The placement is random for 100 trials, leading to tampering of 12% to 17% of
the nonoverlapping 16×16 blocks of the original image x. To successfully decode the
localization data using tampered target images as side information, the DSC localization decoders using spatial models (IID, 1D, and 2D) achieve much lower localization
data rates than the DSC decoder using a legitimate model whose rate is close to the
conventional fixed length coding. The DSC localization decoders using spatial models
reduce the localization data size by around 65% compared to the conventional fixed length coding. The 1D and 2D spatial models offer an additional 12% and 15% savings, respectively, compared to the independent spatial model. The required localization rate
is roughly three times the required authentication rate. Since we use rate-adaptive
LDPC codes [188], the localization decoder re-uses the authentication data and only
requires the incremental localization data rates (the gaps to the authentication data
rate) to discover not only the location of the tampering but also the magnitude of
the tampering. In the worst case over all trials, the largest bitstream sizes are 232,
208, and 192 bytes for the independent, 1D and 2D spatial models, respectively.
5.5.3 Receiver Operating Characteristic
Using these Slepian-Wolf bitstream sizes, we measure various failure rates. Since our
system is block based, we measure the false rejection rate by counting the falsely
deemed tampered blocks, and the false acceptance rate by counting the undetected tampered pixels. The rate of falsely deemed tampered blocks is the proportion of untampered blocks mistaken for tampered blocks. The rate of undetected tampered pixels is the proportion of tampered pixels accepted as untampered pixels. Figure 5.9 shows the
receiver operating characteristic (ROC) curves of the tampering localization decoders
[Plot: localization data rate (bits per pixel of the original image, 0 to 0.02) versus legitimate reconstruction PSNR (30 to 42 dB), with curves for Fixed Length Coding, DSC with Legitimate Model, DSC with IID, 1D, and 2D Spatial Models, and the Authentication Data rate.]
Figure 5.8: Minimum localization data rates for decoding S(Xq ) using tampered
side information compared to the authentication data rates. The decoder with a
legitimate model requires a high rate close to the conventional fixed length coding.
The decoder with spatial models gives around 65% rate savings compared to the fixed
length coding. The 1D and 2D spatial models offer additional 12% and 15% savings,
respectively, compared to the independent spatial model. Using rate-adaptive LDPC
codes, only the incremental localization data rates (the gaps to the authentication
data rate) are sent.
using spatial models for X quantized to 4 bits as the decision threshold T varies. The
rates of falsely deemed tampered blocks can reach zero, while keeping the undetected
tampered pixel rates near 2%, since most of the blocks falsely deemed untampered
have only a few tampered pixels.
We now test the EM localization decoder by additionally applying contrast and
brightness adjustment on the same set of test images. The contrast and brightness
adjustment parameters α and β are drawn uniformly at random from [0.8, 1.2] and
[−20, 20], respectively. The Slepian-Wolf bitstream is set to 278 bytes, the largest
minimal Slepian-Wolf data size of the EM decoder among 50 training trials. We compare the performance of the EM localization decoder that learns α and β from initial
[Plot: undetected tampered pixel rate (0.014 to 0.028) versus falsely deemed tampered block rate (0 to 0.01), with curves for the Independent, 1D Spatial, and 2D Spatial models.]
Figure 5.9: ROC curves of the tampering localization decoders using spatial models.
The rates of falsely deemed tampered blocks can reach zero, while keeping the undetected tampered pixel rates at about 2%, since most of the blocks falsely deemed
untampered have only a few pixels tampered. In most cases, 1D and 2D spatial models achieve a lower undetected tampered pixel rate at a given falsely deemed tampered
block rate.
values 1 and 0, the oracle decoder that knows α and β, and the decoder unaware of adjustment that assumes α = 1 and β = 0. The independent, 1D, and 2D spatial models are applied to all decoders for testing. Figure 5.10 plots these ROC curves as the
decision threshold T varies. The results indicate that the localization performance of
the EM decoder is close to the oracle decoder, while the decoder unaware of adjustment has high failure rates of falsely deemed tampered blocks due to inconsistency in
the contrast and brightness parameters. 1D and 2D spatial models offer additional
improvement for the EM decoder.
[Plot: undetected tampered pixel rate (0 to 0.12) versus falsely deemed tampered block rate (0 to 1), with curves for the EM, Oracle, and adjustment-unaware decoders, each with the independent, 1D, and 2D spatial models.]
Figure 5.10: ROC curves of the tampering localization decoders facing contrast and
brightness adjusted images. The localization performance of the EM decoder is close
to the oracle decoder, while the decoder unaware of adjustment has high failure rates
of falsely deemed tampered blocks due to inconsistency in the contrast and brightness
parameters. The 1D and 2D spatial models offer additional improvement for the EM
decoder.
5.6 Summary
The image authentication system using distributed source coding is extended to perform tampering localization in images already deemed to be tampered. The system
decodes the authentication data plus incremental localization data using the sum-product algorithm over the factor graph representing the space-varying two-state
channel model. The localization decoder can work jointly with EM algorithms to
learn adjustment parameters for the images that have undergone legitimate adjustments. Simulation results demonstrate that the system can decode the Slepian-Wolf
bitstream at a low rate even when side information is tampered and contrast and
brightness have been adjusted. 1D and 2D spatial models capturing the contiguity
of tampering additionally reduce the localization data size by 12% to 15% and offer
better localization performance compared to the independent model.
Chapter 6
Video Quality Monitoring
Using Distributed Source Coding
Digital video delivery involves coding the video content into a bitstream, transmitting
the bitstream to end users, and reconstructing the video from the received (possibly
transcoded or damaged) bitstream. Distortions are introduced in these steps. Lossy
video coding compresses video content into small bitstreams but induces compression
distortions. Packets might be lost during the transmission, especially through wireless
links. The decoder at the end user tries to reconstruct the video content from the
incoming packets with error protection and concealment to mitigate the distortion
due to packet loss. To ensure the quality of service for the whole video delivery
system, the first step is to monitor the fidelity of the received video.
This chapter presents and investigates a reduced-reference video quality monitoring scheme using distributed source coding. Section 6.1 describes the scheme in detail
and the rationale for using distributed source coding and maximum likelihood estimation of the received video quality. Section 6.2 provides a theoretical performance
prediction based on the Cramér-Rao lower bound for maximum likelihood quality
estimation. In Section 6.3, our approach is compared to the ITU-T J.240 Recommendation for remote PSNR monitoring and the theoretical performance prediction
is confirmed.
6.1 Video Quality Monitoring System
Figure 6.1 depicts the proposed quality monitoring system using distributed source
coding. We denote the original video as y and the received video as x. Each user
provides a video digest consisting of a Slepian-Wolf coded random projection of the
received videos. The quality monitoring server uses the projection of the original video
as side information to decode the video digest. It then analyzes the projections to
estimate the quality in terms of reconstruction Peak Signal to Noise Ratio (PSNR).
This architecture is advantageous because the users are responsible only for Slepian-Wolf encoding, which is much less computationally demanding than Slepian-Wolf decoding. We first describe the overall operation of the system, leaving the
details of the pseudorandom projection to Section 6.1.1 and analysis methods to
Section 6.1.2.
Figure 6.1: Proposed video quality monitoring scheme using distributed source coding. The video digest from the receivers consists of the random seed and the SlepianWolf coded quantized projection of the received video. The pseudorandom projection
follows the ITU-T J.240 Recommendation described in Figure 6.2. The server uses
the original video projection as side information to decode the incoming video digest and yields reconstructed video projections. The mean squared error between the
original and the received video are estimated using the original and the reconstructed
quantized projection.
The right-hand side of Figure 6.1 shows the user receiving video. It applies a
pseudorandom projection (based on a randomly drawn seed Ks ) to its received video
x and quantizes the projection coefficients X to yield Xq . These quantized coefficients
are then coded by a Slepian-Wolf encoder based on low-density parity-check (LDPC)
codes [111, 112]. The user sends the Slepian-Wolf bitstream S(Xq ) as a video digest
back to the quality monitoring server (shown on the left-hand side of Figure 6.1)
through a secure channel.
The user pseudorandomly generates a J.240 projection as an Nb × Nb block P
according to a seed Ks . The seed changes for each frame and is communicated
to the quality monitoring server along with the Slepian-Wolf bitstream. For each
nonoverlapping block $B_i$ of $x$, the inner product $\langle B_i, P \rangle$ is quantized into an element
of Xq . The rate R of Slepian-Wolf bitstream S(Xq ) is determined by the joint statistics
of Xq and Y . If the conditional entropy H(Xq |Y ) exceeds the rate R, then Xq can
no longer be correctly decoded [167]. Therefore, we choose the rate R to be just
sufficient to decode given x at the worst permissible quality.
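To make the rate rule concrete, the following sketch estimates $H(X_q \mid Y)$ by Monte Carlo under the Gaussian correlation model used later in the chapter ($X = Y + Z$, $Z \sim \mathcal{N}(0, \sigma_z^2)$). The quantizer step, noise levels, and the uniform distribution assumed for $Y$ are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from math import erf, sqrt

def cond_entropy_bits(step, sigma_z, n_y=200, seed=0):
    """Monte Carlo estimate of H(Xq | Y) in bits per sample for a
    uniform quantizer of the given step, assuming X = Y + Z with
    Z ~ N(0, sigma_z^2) and Y drawn uniformly on [0, 255]."""
    phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
    rng = np.random.default_rng(seed)
    ys = rng.uniform(0.0, 255.0, n_y)
    edges = np.arange(-64.0, 320.0 + step, step)   # cover the value range
    h = 0.0
    for y in ys:
        # pmf of the quantizer bin index given Y = y
        p = np.array([phi((edges[k + 1] - y) / sigma_z) -
                      phi((edges[k] - y) / sigma_z)
                      for k in range(len(edges) - 1)])
        p = p[p > 1e-12]
        h -= np.sum(p * np.log2(p))
    return h / n_y
```

As expected, noisier correlation channels (larger worst-permissible degradation) need a higher Slepian-Wolf rate, which is why the rate is chosen for the worst permissible quality.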
Upon receiving the video digest, the quality monitoring server first projects the
original video y into Y using the same projections as at the user. A Slepian-Wolf decoder reconstructs $X_q'$ from the Slepian-Wolf bitstream using Y as side information.
Decoding is via the LDPC message-passing algorithm [111, 112] initialized according
to the statistics of the worst permissible degradation for the given original video. Finally, the quality monitoring server analyzes the reconstructed projection $X_q'$ and the
projection Y of the original video to estimate video quality in terms of reconstruction
PSNR.
6.1.1 J.240 Feature Extraction
For quality estimation, we use the projection defined in the feature extraction module
(shown in Figure 6.2) of the J.240 Recommendation [80]. Each Nb × Nb block Bi is
whitened in both spatial and Walsh-Hadamard Transform (WHT) domains using
pseudorandom number (PN) sequences s and t, respectively, to yield block Fi . Each
element in s and t is either 1 or -1. From this block, a single feature pixel Fi (k) is
selected. Casting $B_i$ and $F_i$ as 1-D vectors, we can write

$$F_i = \underbrace{H^{-1} T H S}_{G_P}\, B_i$$
where H is the WHT matrix (cast from the 2D WHT), and S and T are diagonal
whitening matrices with entries s and t, respectively. The projection P that produces
Fi (k) is the k th row of GP .
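A small numpy sketch of this projection follows, using the 1-D Sylvester-Hadamard construction in place of the 2-D WHT and ignoring the scaling conventions of the actual J.240 implementation, so it is an illustration of the algebra rather than a compliant feature extractor.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix
    (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def j240_projection(block, s, t, k):
    """Whiten an Nb x Nb block in the spatial and transform domains
    and return the k-th feature pixel F_i(k).

    s, t: +/-1 PN sequences forming the diagonal whitening matrices
    S and T, as in F_i = H^{-1} T H S B_i."""
    n = block.size
    H = hadamard(n)
    S, T = np.diag(s), np.diag(t)
    GP = np.linalg.inv(H) @ T @ H @ S          # G_P = H^{-1} T H S
    return (GP @ block.reshape(-1))[k]
```

A quick sanity check: with the transform-domain sequence t all ones, T is the identity, so $G_P$ collapses to $S$ and the feature pixel is just the sign-flipped spatial pixel.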
Figure 6.2: Random projection of J.240 feature extraction module. An image or video
frame is divided into nonoverlapped blocks. Each block is whitened in both spatial
and Walsh-Hadamard transform domains. A whitened pixel is selected as the feature
pixel to represent the block for the PSNR estimation.
6.1.2 PSNR Estimation
In the ITU-T J.240 Recommendation, the estimated PSNR (ePSNR) between x and y is computed as follows:

$$\mathrm{eMSE}_{\mathrm{J240}} = \frac{Q_s^2}{N} \sum_{i=1}^{N} \left(X_q'(i) - Y_q(i)\right)^2 \tag{6.1}$$

$$\mathrm{ePSNR}_{\mathrm{J240}} = 10 \log_{10} \frac{255^2}{\mathrm{eMSE}_{\mathrm{J240}}} \tag{6.2}$$

where $N$ is the number of samples, $Y_q$ is the quantized version of $Y$, and $Q_s$ is the quantization step size of $Y_q$ and $X_q'$.
Since the quality monitoring server has complete information of $Y$, Valenzise et al. suggest reconstructing $X'$ based on MMSE estimation given $Y$ and $X_q'$, and then estimating the MSE using $Y$ and $X'$ [183]:

$$X'(i) = E[X \mid X_q'(i), Y(i)] \tag{6.3}$$

$$\mathrm{eMSE}_{\mathrm{MMSE}} = \frac{1}{N} \sum_{i=1}^{N} \left(X'(i) - Y(i)\right)^2 \tag{6.4}$$

$$\mathrm{ePSNR}_{\mathrm{MMSE}} = 10 \log_{10} \frac{255^2}{\mathrm{eMSE}_{\mathrm{MMSE}}} \tag{6.5}$$

In our system, we propose maximum likelihood estimation of the MSE between x and y directly from $X_q'$ and $Y$ as follows:

$$\mathrm{eMSE}_{\mathrm{ML}} = \frac{1}{N} \sum_{i=1}^{N} E[(X - Y(i))^2 \mid Y(i), X_q'(i)] \tag{6.6}$$

$$\mathrm{ePSNR}_{\mathrm{ML}} = 10 \log_{10} \frac{255^2}{\mathrm{eMSE}_{\mathrm{ML}}} \tag{6.7}$$

We will compare the quality estimation performance of these three estimators in Section 6.3.
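The three estimators can be sketched numerically as below. The Gaussian correlation model $X \mid Y \sim \mathcal{N}(Y, \sigma_z^2)$, the dense-grid evaluation of the truncated-Gaussian moments, and all parameter values are assumptions of this illustration, not the thesis implementation.

```python
import numpy as np

def epsnr(emse):
    """ePSNR in dB from an estimated MSE, per (6.2)/(6.5)/(6.7)."""
    return 10.0 * np.log10(255.0 ** 2 / emse)

def estimators(Xq, Y, step, sigma_z):
    """Sketch of the J.240, MMSE, and ML MSE estimators.

    Xq holds quantizer bin indices of the received-video projection;
    Y is the original-video projection.  Moments of the Gaussian
    restricted to each bin are approximated on a dense grid."""
    grid = np.linspace(0.0, 255.0, 4096)
    Yq = np.floor(Y / step)
    e_j240 = e_mmse = e_ml = 0.0
    for xq, y, yq in zip(Xq, Y, Yq):
        # conditional density of X given Y = y, restricted to bin xq
        in_bin = (grid >= xq * step) & (grid < (xq + 1) * step)
        pdf = np.exp(-((grid - y) ** 2) / (2.0 * sigma_z ** 2)) * in_bin
        pdf /= pdf.sum()
        x_hat = np.sum(pdf * grid)                    # E[X | Xq, Y]
        e_j240 += step ** 2 * (xq - yq) ** 2
        e_mmse += (x_hat - y) ** 2
        e_ml += np.sum(pdf * (grid - y) ** 2)         # E[(X-Y)^2 | Xq, Y]
    n = len(Xq)
    return e_j240 / n, e_mmse / n, e_ml / n
```

On synthetic data the ML estimate tracks the true MSE, while the J.240 estimate also absorbs the quantization noise of the digests.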
Compression of Xq using distributed source coding is much more efficient than
using conventional coding. Figure 6.3 depicts the distributions of X and X − Y of
the first 100 frames of the Foreman sequence at CIF resolution, and shows that X
has a large variance whereas X and Y are highly correlated. We model X|Y as a
Gaussian with mean Y and variance σz2 , which is unknown at the decoder but can be
estimated.
6.2 Performance Prediction with the Cramér-Rao Lower Bound
The performance of MSE or PSNR estimation is related to the quantization of X
and the number of samples, denoted as N. More precise representation of X or more
samples yields lower estimation error, but requires a higher rate to deliver Xq . This
section analyzes the tradeoffs between video digest size and estimation error with
Figure 6.3: Distributions of X and X − Y of the first 100 frames of the Foreman
sequence at CIF resolution. The projection X has a large variance whereas X and Y
are highly correlated.
different configurations of block size and quantization. A more general information-theoretic analysis has been addressed by Han et al. [66], where a lower (achievable)
bound of coding X is provided, but the converse part remains open. Although the
analysis in this section is dedicated to our practical video quality monitoring system,
the result can be applied to other systems that require variance estimation using
quantized information. We first derive the performance prediction as a function of N
and the quantization of X in Section 6.2.1 and then use synthesized data to confirm
the result in Section 6.2.2.
6.2.1 Performance Prediction
Let $X(i)$ and $Y(i)$ be i.i.d. continuous random variables with the relationship $X(i) - Y(i) = Z(i) \sim \mathcal{N}(0, \sigma_z^2)$, for $i = 1, \ldots, N$. The target parameter is $\theta \triangleq \sigma_z^2$. The case
in which X and Y are available at the same terminal is well-studied: one can estimate
θ locally and transmit the estimation result θ̂ to the server. In our case, the remote
terminals have access to X, but Y is only available at the server. Here, we focus on
the scheme that uses a scalar quantizer cascaded with a Slepian-Wolf encoder. Let Xq(i) be the quantized version of X(i) under a scalar quantizer Q(·), i.e., Xq(i) = Q(X(i)). Therefore, the achievable rate Rx for Slepian-Wolf coding is H(Xq|Y; σ_l²), where σ_l² is the noise variance at the worst permissible quality. This leaves the problem of relating the estimation error to the quantizer and the number of samples.
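The conditional entropy H(Xq|Y) that sets the ideal Slepian-Wolf rate can be estimated numerically. Below is a minimal sketch under the Gaussian model X|Y ∼ N(Y, σ²), assuming a uniform quantizer on [0, 255] and Monte Carlo averaging over Y; the parameters are illustrative, not the thesis's configuration:

```python
import math
import random

def gauss_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cond_entropy_bits(bits, sigma, trials=1000, lo=0.0, hi=255.0, seed=1):
    """Monte Carlo estimate of H(Xq | Y) in bits for a uniform `bits`-bit
    quantizer of X on [lo, hi], with X | Y ~ N(Y, sigma^2) and
    Y ~ Uniform[lo, hi]. Edge cells absorb the Gaussian tails."""
    rng = random.Random(seed)
    levels = 2 ** bits
    step = (hi - lo) / levels
    total = 0.0
    for _ in range(trials):
        y = rng.uniform(lo, hi)
        h = 0.0
        for k in range(levels):
            pa = 0.0 if k == 0 else gauss_cdf(lo + k * step, y, sigma)
            pb = 1.0 if k == levels - 1 else gauss_cdf(lo + (k + 1) * step, y, sigma)
            p = pb - pa
            if p > 0.0:
                h -= p * math.log2(p)
        total += h
    return total / trials

h_fine = cond_entropy_bits(4, sigma=2.0)    # mild degradation: low rate
h_coarse = cond_entropy_bits(4, sigma=8.0)  # heavier degradation: higher rate
print(h_fine, h_coarse)
```

As expected, a noisier channel leaves more uncertainty about the quantization cell of X given Y, so the ideal Slepian-Wolf rate grows with σ.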
Note that our maximum-likelihood estimator of θ = σ_z² in (6.6) is unbiased, i.e.,

$$
\begin{aligned}
E[\hat{\theta}_{\mathrm{ML}}] &= E\left[\frac{1}{N}\sum_{i=1}^{N} E_X\left[(X - Y(i))^2 \,\middle|\, Y(i), X_q(i)\right]\right] &\quad (6.8)\\
&= \frac{1}{N}\sum_{i=1}^{N} E\left[E_X\left[(X - Y(i))^2 \,\middle|\, Y(i), X_q(i)\right]\right] &\quad (6.9)\\
&= \frac{1}{N}\sum_{i=1}^{N} E\left[\sum_{x_q} E_X\left[(X - Y(i))^2 \,\middle|\, Y(i), X_q(i) = x_q\right] P(X_q(i) = x_q \mid Y(i))\right] &\quad (6.10)\\
&= \frac{1}{N}\sum_{i=1}^{N} E\left[\sum_{x_q} \frac{\int_{x:\,Q(x)=x_q} (x - Y(i))^2 f(x \mid Y(i))\, dx}{P(X_q(i) = x_q \mid Y(i))}\, P(X_q(i) = x_q \mid Y(i))\right] &\quad (6.11)\\
&= \frac{1}{N}\sum_{i=1}^{N} E\left[\int_x (x - Y(i))^2 f(x \mid Y(i))\, dx\right] &\quad (6.12)\\
&= \frac{1}{N}\sum_{i=1}^{N} E\left[E_X\left[(X - Y(i))^2 \,\middle|\, Y(i)\right]\right] &\quad (6.13)\\
&= \theta &\quad (6.14)
\end{aligned}
$$

where (6.9) is due to linearity of expectation; (6.10), (6.11), and (6.13) follow by definition of expectation; (6.12) follows by concatenating the integration intervals over all x_q; and (6.14) by iterated expectation.
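The unbiasedness argument can be spot-checked with a small Monte Carlo sketch. The inner conditional expectation E_X[(X − Y(i))² | Y(i), Xq(i)] is approximated by grid integration over the quantizer cell; Y is drawn uniformly and X = Y + Z so that X|Y ∼ N(Y, θ) holds exactly. The step size, sample count, and grid resolution are illustrative choices, not the system's:

```python
import math
import random

random.seed(2)
theta = 100.0                 # true sigma_z^2
sigma = math.sqrt(theta)
step = 16.0                   # assumed uniform quantizer step for X
N = 20000

def cell_mse(y, k, n_grid=100):
    """E[(X - y)^2 | Y = y, Q(X) = k] for X | Y ~ N(y, sigma^2),
    via midpoint-rule integration over the cell [k*step, (k+1)*step)."""
    a, b = k * step, (k + 1) * step
    num = den = 0.0
    for j in range(n_grid):
        x = a + (j + 0.5) * (b - a) / n_grid
        w = math.exp(-((x - y) ** 2) / (2.0 * theta))  # unnormalized density
        num += w * (x - y) ** 2
        den += w
    return num / den

acc = 0.0
for _ in range(N):
    y = random.uniform(0.0, 255.0)
    x = y + random.gauss(0.0, sigma)          # X - Y = Z ~ N(0, theta)
    acc += cell_mse(y, math.floor(x / step))  # uses only (Y, Xq), not X itself
theta_hat = acc / N
print(theta_hat)
```

The average lands close to θ = 100 even though each term sees only the quantized Xq, mirroring the derivation above.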
According to the Cramér-Rao theorem [37, 148], for any unbiased estimator, the variance of the estimator is bounded below by the inverse of the Fisher information [133] of θ:

$$
I(\theta) = E\left[\left(\frac{\partial \log f(X_q, Y; \theta)}{\partial \theta}\right)^{2}\right] = \frac{N}{4\theta^4}\left(E\left[E^2\left[(X - Y)^2 \mid X_q, Y\right]\right] - \theta^2\right)
$$

Note that here we assume P(X|Y) ∼ N(Y, θ = σ_z²). Therefore, we obtain the Cramér-Rao lower bound (CRLB) of the mean squared estimation error for any unbiased estimator:

$$
E[(\hat{\theta} - \theta)^2] = \mathrm{Var}[\hat{\theta}] \geq \frac{1}{I(\theta)} = \frac{4\theta^4}{N(G - \theta^2)} \quad (6.15)
$$

where G = E[E²[(X − Y)² | Xq, Y; θ]] is a function of the quantizer. Note that the result in (6.15) generalizes the variance estimation error lower bound to quantized information. The null quantization function Q(X) = X gives G = 3θ², and the result in (6.15) yields E[(θ̂ − θ)²] ≥ 2θ²/N, which is exactly the CRLB of the unbiased Gaussian variance estimator. Since the log-likelihood function log f(Xq, Y; θ) is twice differentiable and E[∂ log f(Xq, Y; θ)/∂θ] = 0, by the Cramér-Rao theorem the maximum-likelihood estimator is efficient, i.e., E[(θ̂_ML − θ)²] = 4θ⁴/(N(G − θ²)).
The last step is to relate the estimation error in MSE to that in PSNR in dB. Recall that PSNR = 10 log10(255²/MSE) (dB) for 8-bit gray level images with a peak value of 255. Assuming the estimate is close enough to the true value for a first-order approximation, we obtain

$$
\Delta\mathrm{PSNR} \approx \frac{d\,\mathrm{PSNR}}{d\theta}\,\Delta\theta = -\frac{10}{\theta \log(10)}\,\Delta\theta
$$

$$
E\left[|\mathrm{ePSNR} - \mathrm{PSNR}|^2\right] \approx \left(\frac{10}{\theta \log(10)}\right)^2 E\left[|\hat{\theta} - \theta|^2\right]
$$

Hence, the approximation of the mean squared estimation error of ePSNR_ML is

$$
E\left[|\mathrm{ePSNR}_{\mathrm{ML}} - \mathrm{PSNR}|^2\right] \approx \left(\frac{10}{\theta \log(10)}\right)^2 \frac{4\theta^4}{N(G - \theta^2)} = \frac{75.44\,\theta^2}{N(G - \theta^2)} \ (\mathrm{dB}^2) \quad (6.16)
$$

6.2.2 Synthesized Data Simulation
To confirm the performance prediction results in the previous subsection, we randomly
generate X of size N uniformly from [0, 255], Z of size N according to N (0, σz2 ), and
Y = X + Z. X is further quantized with different numbers of bits to yield Xq . For
each configuration, we generate 20,000 sets of data, and measure the estimated MSE
using maximum likelihood estimation. Figure 6.4 shows the estimation error in MSE
for a varying number of samples N, number of bits in quantization, and noise variance σ_z².
These results confirm that the efficiency of maximum likelihood estimation matches
the Cramér-Rao lower bound of (6.15) for various settings.
Figure 6.5 plots the average squared PSNR estimation errors in dB2 for different
numbers of samples N, and different numbers of bits in quantization. The estimation errors are averaged over ground-truth σ_z² values that yield reconstruction PSNRs
of {26, 28, . . . , 38} dB. The results indicate that the performance of the maximum
likelihood estimation is close to the performance prediction in (6.16).
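The constant 75.44 in (6.16) is just 4(10/ln 10)², which a quick numeric check confirms (illustrative helper, not the thesis's code):

```python
import math

const = 4.0 * (10.0 / math.log(10.0)) ** 2
print(round(const, 2))  # 75.44

def psnr_err_bound_db2(theta, n, G):
    """Predicted mean squared PSNR estimation error from (6.16), in dB^2."""
    return const * theta ** 2 / (n * (G - theta ** 2))

# With null quantization (G = 3*theta^2) the prediction reduces to
# const / (2n), independent of the true noise variance theta.
b1 = psnr_err_bound_db2(100.0, 6336, 3.0 * 100.0 ** 2)
b2 = psnr_err_bound_db2(25.0, 6336, 3.0 * 25.0 ** 2)
print(b1, b2)
```

This θ-independence in the unquantized case is why the PSNR error prediction is so uniform across quality levels.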
After confirming the prediction of the PSNR estimation performance with quantization, we are now interested in the overall trade-offs between the video digest data
size and the PSNR estimation performance. We consider PSNR estimation of each
16-frame group of pictures (GOP) in 30 frames per second (fps) video sequences in
CIF or QCIF format at the worst permissible quality of 26 dB. The quality monitoring scheme uses uniform quantization and the ITU-T J.240 feature extraction of
8×8, 16×16, or 32×32 block projections. This yields different numbers of samples
to transmit. A lower bound of the trade-offs consists of (6.16) for PSNR estimation performance and the conditional entropy H(Xq |Y ) for the ideal Slepian-Wolf bit
rates. This lower bound can be generated from the model without simulation data.
For synthesized data simulation, we run 100 trials for each configuration and plot the
largest minimum decodable rate among the trials as the practical Slepian-Wolf bit
[Figure 6.4: four panels plotting averaged |MSE − eMSE|² versus the number of bits in quantization, comparing simulation and performance prediction: (a) N=1584, PSNR=26 dB; (b) N=6336, PSNR=26 dB; (c) N=1584, PSNR=38 dB; (d) N=6336, PSNR=38 dB.]
Figure 6.4: These plots show the estimation errors of MSE using maximum likelihood estimation with different configurations: number of samples, noise variance σ_z², and number of bits in quantization of X. These results confirm the efficiency of maximum likelihood estimation and the derivation of the Cramér-Rao lower bound.
rate. The practical Slepian-Wolf decoder uses rate-adaptive LDPC codes [188] conditionally decoded as in [7]. Figure 6.6 plots the average squared PSNR estimation
error versus Slepian-Wolf bit rate for different block sizes as we vary numbers of bits
in quantization. It shows that the predicted performance closely matches the synthesized data simulation results. The results suggest using 16×16 block projection for
video digest rates less than 50 kbps and 20 kbps for CIF and QCIF video sequences,
respectively.
[Figure 6.5: two panels plotting averaged |PSNR − ePSNR|² versus the number of bits in quantization, comparing simulation and performance prediction: (a) N=1584; (b) N=6336.]
Figure 6.5: These plots show the estimation errors of PSNR using maximum likelihood
estimation with different configurations: number of samples N, and number of bits
in quantization of X.
[Figure 6.6: two panels plotting averaged |ePSNR_ML − PSNR|² (dB²) versus Slepian-Wolf bit rate (kbps) for 8×8, 16×16, and 32×32 block projections, comparing simulation and model: (a) CIF; (b) QCIF.]
Figure 6.6: Average squared PSNR estimation error of a 16-frame GOP versus
Slepian-Wolf bit rate for 30 fps video in (a) CIF and (b) QCIF formats. The predicted
performance using (6.16) and conditional entropy H(Xq |Y ) matches the synthesized
data simulation results.
6.3 Experimental Results
We compare our quality monitoring scheme to various solutions based on ITU-T
J.240 Recommendation. We use original videos consisting of the first 160 frames of
Foreman, Football, News, Mobile, and Coastguard CIF video sequences at 30 frames
per second (fps) for simulation. To create received videos, the video sequences are
first compressed and reconstructed by H.264 with quantization parameters (QP) 21,
24, and 26, for I-, P-, and B-pictures, respectively. The group of pictures (GOP)
coding structure is IBBPBBP and GOP size is 16 frames. Then the compressed
video is transcoded into CIF or QCIF resolution with GOP structure IPPP, GOP
size 16 frames, and QP at most 38. The reconstruction yields the received videos.
Figure 6.7 plots the distortion-rate curves of the test video sequences. The reconstruction PSNR varies from 26 dB to 38 dB.

[Figure 6.7: two panels plotting reconstruction PSNR (dB) versus rate (kbps) for the Foreman, Football, News, Mobile, and Coastguard sequences: (a) CIF; (b) QCIF.]
38
36
34
32
Foreman
Football
News
Mobile
CoastGuard
30
28
26
0
200
400
Rate (kbps)
600
800
(b) QCIF
Figure 6.7: Distortion-rate curves of the transcoded test video sequences of (a) CIF
and (b) QCIF. The reconstructed PSNR varies from 26 dB to 38 dB.
At the user, a video digest unit consists of 16 frames. In the simulations, we
vary the quantization of the random projection coefficients to different numbers of
bitplanes. Each bitplane is coded at the Slepian-Wolf encoder using rate-adaptive
LDPC codes [188] with block size of 6336 bits for each CIF bitplane and 1584 bits for
each QCIF bitplane. At the quality monitoring server, the bitplanes are conditionally
decoded as in [7].
Figure 6.8 shows the root mean square (RMS) PSNR estimation error as we vary
the number of bits in quantization, comparing the maximum likelihood in (6.6), J.240
PSNR estimation in (6.1), and MMSE reconstruction estimation in (6.3). Each point
represents the RMS PSNR estimation error $\sqrt{\frac{1}{N}\sum_{i=1}^{N} |\mathrm{ePSNR}_i - \mathrm{PSNR}_i|^2}$ of the luminance component over 350 measurements using five video sequences, 10 GOPs per sequence, and seven transcoding QPs from the set {26, 28, . . . , 38}. Figure 6.8 indicates that we can obtain PSNR estimation error of just 0.2 dB with maximum likelihood estimation using 7- and 8-bit quantization for CIF and QCIF, respectively. J.240 uses both quantized X and Y and leads to inefficient PSNR estimation. The PSNR estimation using the MMSE reconstruction of X always underestimates the reconstruction MSE and yields large estimation error.
[Figure 6.8: two panels plotting RMS of |ePSNR − PSNR| (dB) versus the number of bits in quantization, comparing MMSE reconstruction, J.240, and maximum likelihood: (a) CIF; (b) QCIF.]
Figure 6.8: RMS PSNR estimation error versus the number of bits in the quantization
of X. The PSNR estimation using MMSE reconstruction always underestimates the
MSE and yields large estimation error. The J.240 uses both quantized X and Y
and leads to inefficient PSNR estimation. Our maximum likelihood MSE estimation
achieves 0.2 dB using only 7- and 8-bit quantization for CIF and QCIF, respectively.
Figure 6.9 compares different combinations of estimation and coding methods by
depicting the RMS PSNR estimation error versus the video digest data rate in kilobits
per second (kbps), for videos at 30 fps. At RMS PSNR estimation error of 0.2 dB,
maximum likelihood estimation and distributed source coding can reduce the video digest data rate by up to 85% compared to the ITU-T J.240 Recommendation. This enables mobile users to feed back the received QCIF video digest at a reasonable rate
of 7 kbps using distributed source coding, instead of 30 kbps with the ITU-T J.240
Recommendation. Figure 6.9 also depicts the performance lower bound derived from
(6.16) and the conditional entropy H(Xq |Y ) as rate measurements. Even though the
video data do not perfectly match the assumptions described in Section 6.2.1, the
lower bound closely matches the simulation results.
[Figure 6.9: two panels plotting RMS of |ePSNR − PSNR| (dB) versus bit rate (kbps), comparing J.240, ML estimation + fixed length coding, ML estimation + DSC, and the lower bound: (a) CIF; (b) QCIF.]
Figure 6.9: RMS PSNR estimation error versus video digest data rates for videos at 30 fps. The maximum likelihood estimation lowers the PSNR estimation error given the same number of bits in the quantization of X. Distributed source coding exploits the correlation between X and Y and yields a rate savings of 85% at an RMS PSNR estimation error of 0.2 dB. The simulation results demonstrate that using maximum likelihood estimation and distributed source coding is close to the performance prediction using the Cramér-Rao lower bound.
6.4 Summary
A rate-efficient video quality monitoring scheme using distributed source coding is
presented and investigated. In our scheme, each user sends a Slepian-Wolf coded
projection of its received video to the quality monitoring server. The server decodes
the projection using the original video as side information and then estimates MSE
using maximum likelihood estimation. Distributed source coding exploits the correlation between the original and received video projections and leads to significant
rate savings. We contribute a performance prediction of quality estimation for various system configurations using the Cramér-Rao lower bound. Distributed source
coding and maximum likelihood estimation offer up to 85% video digest rate savings
compared to the ITU-T J.240 Recommendation at the same performance. The performance prediction matches the simulation results for both synthesized and video
data.
Chapter 7
Conclusions and Future Work
7.1 Conclusions
This dissertation presents and investigates a novel image authentication scheme that
distinguishes legitimate encoding variations of an image from tampered versions based
on distributed source coding and statistical methods.
A two-state lossy channel model represents the statistical dependency between the
original and the target images. Tampered degradations are captured by using a statistical image model, and legitimate compression noise is assumed to be additive white
Gaussian noise. Dimensionality reduction uses block projection to address the spatial
correlation of the tampering model and to distinguish the tampered from legitimate
degradations. Using Slepian-Wolf coding that exploits the correlation between the
original and the target image projections achieves significant rate savings.
The Slepian-Wolf decoder is extended to use Expectation Maximization algorithms
to address the target images that have undergone contrast, brightness, and affine
warping adjustment. The decoding loop iteratively estimates the editing parameters based on the side information and the soft information of the original image
projection.
The block projection offers a possibility to localize the tampering in an image that
has been deemed tampered. The localization decoder infers the tampered locations
and decodes the Slepian-Wolf bitstream by applying the message-passing algorithm
over a factor graph. The factor graph represents the relationship among the Slepian-Wolf bitstream, projections of the original image and the target image, and the block
states. Spatial models are applied to the block states to exploit the spatial correlation
of the tampering. Simulation results demonstrate that the system can decode the
Slepian-Wolf bitstream at a low rate even when side information is tampered and
its contrast and brightness are adjusted. 1D and 2D spatial models exploiting the
contiguity of tampering additionally reduce the localization data size by 12% to 15%
and offer better localization performance compared to the independent model.
In addition to the image authentication system, this dissertation explores a rate-efficient video quality monitoring scheme using distributed source coding. In our
scheme, each user sends a Slepian-Wolf coded projection of its received video to
the quality monitoring server. The server decodes the projection using the original
video as side information and then the MSE of the received video is estimated using
maximum likelihood estimation. A PSNR estimation performance prediction using
the Cramér-Rao lower bound is developed. The prediction suggests the choice of
projection block size and the quantization. Distributed source coding and maximum
likelihood estimation offer up to 85% video digest rate savings compared to the ITU-T
J.240 Recommendation at the same performance.
Advanced statistical signal processing methods play an important role throughout this dissertation. Spectral analysis provides insight into choosing the right
projection basis. The EM algorithm offers robustness against many common image
adjustments. Statistical inference over factor graphs is the basis of the distributed
source decoder and its extension to localize tampering. Maximum likelihood estimation in video quality monitoring achieves accurate PSNR estimation. We consistently
find that techniques based on a rigorous mathematical analysis greatly outperform
ad hoc methods.
7.2 Future Work
Our novel ideas on authentication using distributed source coding have attracted
attention in the research community. With additional assumptions on sparsity of
tampering, similar schemes are proposed using Wyner-Ziv coding and compressive
sensing for image authentication [177, 178] and audio authentication [132, 184]. Our
tampering localization ideas have been adopted by a number of other applications,
such as coding of thumbnail video for distortion-aware retransmission [95] and coding
of digest data for video file synchronization [216–218].
To the best of our knowledge, little work has been carried out to date towards
statistical models for image tampering. This thesis uses a simple additive model
that assumes that a tampered image is formed by adding an image-like random process to the original image. Future work should consider more sophisticated tampering
models. The current design of the proposed image authentication system uses a pseudorandom projection to prevent an attacker from altering the image in the null space
of the projection. Our pseudorandom projection choice is based on the assumption
that tampering is image-like. An attacker might attempt to compromise the system
based on this assumption. With models for the attacker's incentives and the designer's objectives, a game-theoretic analysis could suggest an equilibrium for a more sophisticated
system design.
The proposed authentication system uses EM algorithms to address the images
that have undergone some adjustments. This thesis reported detailed algorithms
for contrast, brightness, and affine warping adjustments. Many other common image processing operations, such as filtering and gamma correction, could be included in future extensions. The limits and optimization of distributed source coding combined with EM
algorithms remain open.
The tampering localization in our authentication system is achieved using inference over the decoder factor graph, which combines the LDPC code graph with spatial
models for tampering. The system can benefit from LDPC code optimization for the
localization decoder and yield lower data requirements for the localization.
The quality monitoring system studied in this thesis focuses on PSNR estimation
using the ITU-T J.240 Recommendation projection. The proposed system can also
be applied to other features and the corresponding metrics proposed in the quality assessment literature. This will raise interesting design issues in distributed source
coding and optimal quality estimation for various quality assessment features. A natural extension considering a symmetric setup in which the content provider and the
viewer send information to a central quality monitoring server will pose interesting
challenges.
While the general problem of statistical inference under rate constraints remains
open, it has inspired us to investigate distributed source coding applications that
infer the relation among separated sources instead of reconstructing the source. We
believe that there are many other applications that fit this setting and can greatly
benefit from distributed source coding.
Appendix A
Test Images
Throughout this thesis, simulations are carried out using Kodak and classic test
images shown in Figure A.1. All images are at 512×512 resolution with 8-bit grayscale depth.
Figure A.1: Test images used in simulations.
Appendix B
Concavity of L̂
We use the concavity of L̂(α, β) in (4.2), (4.4), and (5.2) to claim the optimality conditions in terms of partial derivatives being zero. We first derive the Hessian of L0(x_q, i; α, β) = log P(x_q | Y(i); α, β), and then give the condition on the quantization of X for L̂(α, β) = Σ_i Σ_{x_q} Q_i(x_q) L0(x_q, i; α, β) to be concave, where Q_i(x_q) is the resulting estimate of x_q in the E-steps. Recalling that αX | Y ∼ N(Y − β, σ²), we have

$$
L_0 = \log P(x_q \mid Y; \alpha, \beta) = \log \int_{x:\,Q(x)=x_q} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(Y - \alpha x - \beta)^2}{2\sigma^2}\right) dx
$$

Let $f(X; \alpha, \beta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(Y - \alpha X - \beta)^2}{2\sigma^2}\right)$. The first-order derivatives of L0 with respect to α and β are
$$
\frac{\partial L_0}{\partial \alpha} = \frac{1}{P(x_q \mid Y; \alpha, \beta)} \int_{x:\,Q(x)=x_q} f(x; \alpha, \beta)\,\frac{x}{\sigma^2}\,(Y - \alpha x - \beta)\, dx = \frac{1}{\sigma^2}\, E[(Y - \alpha X - \beta)X \mid x_q, Y; \alpha, \beta]
$$

and

$$
\frac{\partial L_0}{\partial \beta} = \frac{1}{P(x_q \mid Y; \alpha, \beta)} \int_{x:\,Q(x)=x_q} f(x; \alpha, \beta)\,\frac{1}{\sigma^2}\,(Y - \alpha x - \beta)\, dx = \frac{1}{\sigma^2}\, E[(Y - \alpha X - \beta) \mid x_q, Y; \alpha, \beta]
$$
The second-order derivatives of L0 are

$$
\begin{aligned}
\frac{\partial^2 L_0}{\partial \alpha^2} &= \frac{1}{\sigma^4}\left(E[(Y-\alpha X-\beta)^2 X^2 \mid x_q, Y; \alpha, \beta] - E^2[(Y-\alpha X-\beta)X \mid x_q, Y; \alpha, \beta]\right) - \frac{E[X^2 \mid x_q, Y; \alpha, \beta]}{\sigma^2} \\
\frac{\partial^2 L_0}{\partial \beta^2} &= \frac{1}{\sigma^4}\left(E[(Y-\alpha X-\beta)^2 \mid x_q, Y; \alpha, \beta] - E^2[(Y-\alpha X-\beta) \mid x_q, Y; \alpha, \beta]\right) - \frac{1}{\sigma^2} \\
\frac{\partial^2 L_0}{\partial \alpha\,\partial \beta} &= \frac{1}{\sigma^4}\, E[(Y-\alpha X-\beta)^2 X \mid x_q, Y; \alpha, \beta] - \frac{1}{\sigma^4}\, E[(Y-\alpha X-\beta) \mid x_q, Y; \alpha, \beta]\, E[(Y-\alpha X-\beta)X \mid x_q, Y; \alpha, \beta] - \frac{1}{\sigma^2}\, E[X \mid x_q, Y; \alpha, \beta]
\end{aligned}
$$
We rewrite these in a Hessian matrix,

$$
\nabla^2 L_0 = \begin{pmatrix} \frac{\partial^2 L_0}{\partial \alpha^2} & \frac{\partial^2 L_0}{\partial \alpha \partial \beta} \\[2pt] \frac{\partial^2 L_0}{\partial \alpha \partial \beta} & \frac{\partial^2 L_0}{\partial \beta^2} \end{pmatrix} = \frac{1}{\sigma^4}\,\mathrm{Cov}\!\left[\begin{pmatrix} (Y-\alpha X-\beta)X \\ (Y-\alpha X-\beta) \end{pmatrix} \,\middle|\, x_q, Y\right] - \frac{1}{\sigma^2}\, E\!\left[\begin{pmatrix} X^2 & X \\ X & 1 \end{pmatrix} \,\middle|\, x_q, Y\right] \quad (\mathrm{B.1})
$$

This suggests that if Σ_i Σ_{x_q} Q_i(x_q) ∇²_{α,β} L0(x_q, i) ⪯ 0, then L̂(α, β) is concave. With the additional assumption that (1/N) Σ_i Q_i(X_q) converges to P(X_q) for a sufficiently large number of samples N, the second-order derivative of L̂(α, β) approaches E[∇² L0].
$$
\begin{aligned}
E[\nabla^2 L_0] &= \frac{1}{\sigma^4}\,\mathrm{Cov}\!\left[\begin{pmatrix} (Y-\alpha X-\beta)X \\ (Y-\alpha X-\beta) \end{pmatrix}\right] - \frac{1}{\sigma^4}\,\mathrm{Cov}\!\left[E\!\left[\begin{pmatrix} (Y-\alpha X-\beta)X \\ (Y-\alpha X-\beta) \end{pmatrix} \,\middle|\, X_q, Y\right]\right] - \frac{1}{\sigma^2}\, E\!\left[\begin{pmatrix} X^2 & X \\ X & 1 \end{pmatrix}\right] \quad &(\mathrm{B.2}) \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \frac{1}{\sigma^4}\,\mathrm{Cov}\!\left[E\!\left[\begin{pmatrix} (Y-\alpha X-\beta)X \\ (Y-\alpha X-\beta) \end{pmatrix} \,\middle|\, X_q, Y\right]\right] \quad &(\mathrm{B.3})
\end{aligned}
$$

where (B.2) is due to the law of total variance and (B.3) uses the third- and fourth-order Gaussian statistics. This suggests that the choice of the quantization of X should satisfy

$$
E[\nabla^2 L_0] = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \frac{1}{\sigma^4}\,\mathrm{Cov}\!\left[E\!\left[\begin{pmatrix} (Y-\alpha X-\beta)X \\ (Y-\alpha X-\beta) \end{pmatrix} \,\middle|\, X_q, Y\right]\right] \preceq 0, \quad (\mathrm{B.4})
$$

so that, as (1/N) Σ_i Q_i(x_q) converges to P(x_q) for sufficiently large N, L̂(α, β) is a concave function.
Figure B.1 plots det(E[∇² L0]) and E[∂² L0/∂α²] as we vary the number of bits in the quantization of X, for Y uniformly distributed over [10, 235], α = 1.3, β = −10, and σ² = 10. For this setting, 2-bit quantization is sufficient to make L̂ a concave function of α and β. Note that without quantization, i.e., Q(X) = X, we have

$$
E[\nabla^2 L_0] = -\frac{1}{\sigma^2}\, E\!\left[\begin{pmatrix} X^2 & X \\ X & 1 \end{pmatrix}\right]. \quad (\mathrm{B.5})
$$

This serves as a reference for how the quantization of X affects the estimation of α and β.
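The concavity claim can also be spot-checked numerically without evaluating (B.4) in closed form: build the quantized log-likelihood L̂ from synthetic data under this appendix's setting (Y uniform on [10, 235], α = 1.3, β = −10, σ² = 10) and verify that a finite-difference Hessian at the true parameters is negative definite. The sketch below uses an assumed 3-bit uniform quantizer on [0, 256); the sample size, quantizer range, and step h are illustrative choices:

```python
import math
import random

random.seed(3)
ALPHA, BETA, SIGMA2 = 1.3, -10.0, 10.0
SIGMA = math.sqrt(SIGMA2)
BITS = 3
STEP = 256.0 / (2 ** BITS)   # assumed uniform quantizer for X on [0, 256)

def cdf(t):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Draw (Y, Xq): alpha*X | Y ~ N(Y - beta, sigma^2), then quantize X.
data = []
for _ in range(30000):
    y = random.uniform(10.0, 235.0)
    x = (y - BETA + random.gauss(0.0, SIGMA)) / ALPHA
    data.append((y, math.floor(x / STEP)))

def L_hat(a, b):
    """Average log P(xq | y; a, b), with P given by Gaussian CDF differences
    over the quantizer cell [k*STEP, (k+1)*STEP)."""
    s = 0.0
    for y, k in data:
        lo, hi = k * STEP, (k + 1) * STEP
        p = cdf((a * hi - (y - b)) / SIGMA) - cdf((a * lo - (y - b)) / SIGMA)
        s += math.log(max(p, 1e-300))
    return s / len(data)

# Central finite-difference Hessian at the true (alpha, beta).
h = 1e-3
f0 = L_hat(ALPHA, BETA)
haa = (L_hat(ALPHA + h, BETA) - 2 * f0 + L_hat(ALPHA - h, BETA)) / h**2
hbb = (L_hat(ALPHA, BETA + h) - 2 * f0 + L_hat(ALPHA, BETA - h)) / h**2
hab = (L_hat(ALPHA + h, BETA + h) - L_hat(ALPHA + h, BETA - h)
       - L_hat(ALPHA - h, BETA + h) + L_hat(ALPHA - h, BETA - h)) / (4 * h**2)

det = haa * hbb - hab**2
print(haa, hbb, det)  # negative definite: haa < 0, hbb < 0, det > 0
```

This checks the empirical Hessian at a single point rather than the analytic condition (B.4), but it agrees with Figure B.1's finding that a few quantization bits already yield a concave L̂.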
[Figure B.1: (a) det(E[∇² L0]) and (b) E[∂² L0/∂α²] versus the number of bits in the quantization of X, with −E[X²]/σ² and the without-quantization values shown for reference.]
Figure B.1: Let Y be uniformly distributed over [10, 235] and set α = 1.3, β = −10, and σ² = 10. We plot (a) det(E[∇² L0]) and (b) E[∂² L0/∂α²] as we vary the number of bits in the quantization of X. For this setting, 2-bit quantization of X is sufficient to make L̂ a concave function of α and β.
Bibliography
[1] BitTorrent. http://www.bittorrent.com/.
[2] eMule Project. http://emule-project.net/.
[3] KaZaA. http://www.kazaa.com/.
[4] A. Aaron and B. Girod. Compression with side information using Turbo codes.
In Data Compression Conference, pages 252–261, Snowbird, UT, April 2002.
[5] A. Aaron and B. Girod. Wyner-Ziv video coding with low-encoder complexity.
In Picture Coding Symposium, San Francisco, CA, December 2004.
[6] A. Aaron, S. Rane, and B. Girod. Wyner-Ziv video coding with hash-based
motion compensation at the receiver. In IEEE International Conference on
Image Processing, Singapore, October 2004.
[7] A. Aaron, S. Rane, E. Setton, and B. Girod. Transform-domain Wyner-Ziv
codec for video. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, 2004.
[8] A. Aaron, S. Rane, R. Zhang, and B. Girod. Wyner-Ziv coding for video:
Applications to compression and error resilience. In IEEE Data Compression
Conference, Snowbird, UT, November 2003.
[9] A. Aaron, S. Setton, and B. Girod. Towards practical Wyner-Ziv coding of
video.
In IEEE International Conference on Image Processing, Barcelona,
Spain, September 2003.
104
[10] A. Aaron, D. Varodayan, and B. Girod. Wyner-Ziv residual coding of video. In
Picture Coding Symposium, Beijing, China, December 2006.
[11] A. Aaron, R. Zhang, and B. Girod. Wyner-Ziv coding of motion video. In
Asilomar Conference on Signals and Systems, Pacific Grove, CA, November
2002.
[12] M. Abdel-Mottaleb, G. Vaithilingam, and S. Krishnamachari. Signature-based
image identification. In SPIE conference on Multimedia Systems and Applications, pages 22–28, Boston, MA, September 1999.
[13] R. Ahlswede and I. Csiszar. Hypothesis testing with communication constraints.
IEEE Transactions on Information Theory, 32(4):533–542, July 1986.
[14] F. Ahmed and M.Y. Siyal. A secure and robust hashing scheme for image
authentication. In Fifth International Conference on Information, Communications and Signal Processing, pages 705–709, 2005.
[15] D. R. Ashbaugh. Ridgeology, 1991.
[16] J. Bajcsy and P. Mitran. Coding for the Slepian-Wolf problem with Turbo codes.
In IEEE Global Telecommunications Conference, volume 2, pages 1400–1404,
2001.
[17] T. Barnwell. Correlation analysis of subjective and objective measures for
speech quality. In IEEE International Conference on Acoustics, Speech, and
Signal Processing, volume 5, pages 706 – 709, April 1980.
[18] T. Barnwell and A. Bush. Statistical correlation between objective and subjective measures for speech quality. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 595 – 598, April 1978.
[19] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique
occurring in the statistical analysis of probabilistic functions of Markov chains.
Annals of Mathematical Statistics, 41(1):164–171, October 1970.
[20] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. In IEEE International Conference on Communications, volume 2, pages 1064–1070, May 1993.
[21] S. Bhattacharjee and M. Kutter. Compression tolerant image authentication.
In International Conference on Image Processing, volume 1, pages 435–439,
October 1998.
[22] A. Bouzidi and N. Baaziz. Contourlet domain feature extraction for image
content authentication. In IEEE International Workshop on Multimedia Signal
Processing, pages 202–206, October 2006.
[23] P. Campisi, M. Carli, G. Giunta, and A. Neri. Blind quality assessment system
for multimedia communications using tracing watermarking. IEEE Transactions on Signal Processing, 51(4):996–1002, April 2003.
[24] D. Chen, D. Varodayan, M. Flierl, and B. Girod. Distributed stereo image
coding with improved disparity and noise estimation. In IEEE International
Conference on Acoustics, Speech, and Signal Processing, Las Vegas, NV, March
2008.
[25] D. Chen, D. Varodayan, M. Flierl, and B. Girod. Wyner-Ziv coding of multiview images with unsupervised learning of disparity and gray code. In IEEE
International Conference on Image Processing, San Diego, CA, October 2008.
[26] D. Chen, D. Varodayan, M. Flierl, and B. Girod. Wyner-Ziv coding of multiview images with unsupervised learning of two disparities. In International
Conference on Multimedia and Expo, Hannover, Germany, June 2008.
[27] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based
on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–
507, December 1952.
[28] N.-M. Cheung and A. Ortega. Distributed source coding application to low-delay free viewpoint switching in multiview video compression. In Picture Coding Symposium, Lisbon, Portugal, November 2007.
[29] N.-M. Cheung and A. Ortega. Flexible video decoding: A distributed source
coding approach. In IEEE International Workshop on Multimedia Signal Processing, Crete, Greece, October 2007.
[30] N.-M. Cheung and A. Ortega. Compression algorithms for flexible video decoding. In SPIE Visual Communications and Image Processing Conference, San
Jose, CA, January 2008.
[31] N.-M. Cheung, H. Wang, and A. Ortega. Video compression with flexible
playback order based on distributed source coding. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, January 2006.
[32] K. Chono, Y.-C. Lin, D. Varodayan, Y. Miyamoto, and B. Girod. Reducedreference image quality assessment using distributed source coding. In IEEE
International Conference on Multimedia and Expo, pages 609–612, April 2008.
[33] T. P. Coleman, A. H. Lee, M. Medard, and M. Effros. On some new approaches
to practical Slepian-Wolf compression inspired by channel coding. In Data
Compression Conference, pages 282–291, March 2004.
[34] T. Cover. A proof of the data compression theorem of Slepian and Wolf for
ergodic sources. IEEE Transactions on Information Theory, 21(2):226–228,
March 1975.
[35] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley
& Sons, Inc., 1991.
[36] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon. Secure spread spectrum
watermarking for images, audio and video. In IEEE International Conference on
Image Processing, Lausanne, Switzerland, September 1996.
[37] H. Cramér. Mathematical Methods of Statistics. Princeton Univ. Press, 1946.
[38] I. Csiszar. Linear codes for sources and source networks: Error exponents,
universal coding. IEEE Transactions on Information Theory, 28(4):585–592,
July 1982.
[39] J. Daugman and C. Downing. Epigenetic randomness, complexity, and singularity of human iris patterns. In Proceedings of the Royal Society, volume B,
pages 1737 – 1740, 2001.
[40] C. De Roover, C. De Vleeschouwer, F. Lefebvre, and B. Macq. Robust image
hashing based on radial variance of pixels. In IEEE International Conference
on Image Processing, volume 3, pages 77–80, September 2005.
[41] C. De Roover, C. De Vleeschouwer, F. Lefebvre, and B. Macq. Robust video
hashing based on radial projections of key frames. IEEE Transactions on Signal
Processing, 53(10):4020–4037, October 2005.
[42] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22(6):644–654, January 1976.
[43] J. Dittmann, A. Steinmetz, and R. Steinmetz. Content-based digital signature
for motion pictures authentication and content-fragile watermarking. In IEEE
International Conference on Multimedia Computing and Systems, volume 2,
pages 209–213, July 1999.
[44] P. L. Dragotti and M. Gastpar. Distributed Source Coding. Academic Press,
2009.
[45] S. C. Draper, A. Khisti, E. Martinian, A. Vetro, and J. S. Yedidia. Secure
storage of fingerprint biometrics using Slepian-Wolf codes. In Workshop on
Information Theory and Applications, San Diego, CA, 2007.
[46] S. C. Draper, A. Khisti, E. Martinian, A. Vetro, and J. S. Yedidia. Using
distributed source coding to secure fingerprint biometric. In IEEE International
Conference on Acoustics, Speech, and Signal Processing, Honolulu, HI, April
2007.
[47] D. Eastlake. US secure hash algorithm 1 (SHA1), RFC 3174, September 2001.
[48] J. J. Eggers and B. Girod. Blind watermarking applied to image authentication.
In IEEE International Conference on Acoustics, Speech, and Signal Processing,
Salt Lake City, UT, May 2001.
[49] U. Engelke, M. Kusuma, H.-J. Zepernick, and M. Caldera. Reduced-reference
metric design for objective perceptual quality assessment in wireless imaging.
Signal Processing: Image Communication, 24(7):525–547, 2009.
[50] U. Engelke, V. X. Nguyen, and H.-J. Zepernick. Regional attention to structural
degradations for perceptual image quality metric design. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 869–872, April 2008.
[51] U. Engelke and H.-J. Zepernick. Multi-resolution structural degradation metrics
for perceptual image quality assessment. In Picture Coding Symposium, Lisbon,
Portugal, November 2007.
[52] U. Engelke and H.-J. Zepernick. Quality evaluation in wireless imaging using feature-based objective metrics. In International Symposium on Wireless
Pervasive Computing, pages 367–372, February 2007.
[53] M. C. Q. Farias, S. Mitra, M. Carli, and A. Neri. A comparison between
an objective quality measure and the mean annoyance values of watermarked
videos. In IEEE International Conference on Image Processing, volume 3, pages
469–472, 2002.
[54] M. C. Q. Farias and S. K. Mitra. No-reference video quality metric based on
artifact measurements. In IEEE International Conference on Image Processing,
volume 3, pages 141–144, September 2005.
[55] H. Farid. Image forgery detection. IEEE Signal Processing Magazine, 26(2):16–
25, March 2009.
[56] J. Fridrich. Robust bit extraction from images. In International Conference on
Multimedia Computing and Systems, volume 2, pages 536–540, July 1999.
[57] J. Fridrich and M. Goljan. Robust hash functions for digital watermarking. In
International Conference on Information Technology: Coding and Computing,
pages 178–183, 2000.
[58] R. G. Gallager. Low-Density Parity Check Codes. PhD thesis, MIT, Cambridge,
MA, 1963.
[59] J. García-Frías. Compression of correlated binary sources using Turbo codes.
IEEE Communications Letters, 5(10):417–419, October 2001.
[60] J. García-Frías. Decoding of low-density parity-check codes over finite-state
binary Markov channels. IEEE Transactions on Communications, 52(11):1840–
1843, November 2004.
[61] J. García-Frías and Y. Zhao. Data compression of unknown single and correlated binary sources using punctured Turbo codes. In Allerton Conference on
Communication, Control, and Computing, Monticello, IL, October 2001.
[62] J. García-Frías and W. Zhong. LDPC codes for compression of multiterminal sources with hidden Markov correlation. IEEE Communications Letters, 7(3):115–117, March 2003.
[63] N. Gehrig and P. L. Dragotti. Symmetric and asymmetric Slepian-Wolf codes
with systematic and nonsystematic linear codes. IEEE Communications Letters,
9(1):61–63, January 2005.
[64] B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero. Distributed video
coding. Proceedings of the IEEE, 93(1):71–83, January 2005.
[65] T. S. Han. Hypothesis testing with multiterminal data compression. IEEE
Transactions on Information Theory, 33(6):759–772, November 1987.
[66] T. S. Han and S. Amari. Statistical inference under multiterminal data compression. IEEE Transactions on Information Theory, 44(6):2300–2324, October
1998.
[67] A. M. Hassan, A. Al-Hamadi, B. Michaelis, Y. M. Y. Hasan, and M. A. A.
Wahab. Semi-fragile image authentication using robust image hashing with
localization. In IEEE International Conference on Machine Vision, pages 133–137, December 2009.
[68] S. S. Hemami and M. A. Masry. A scalable video quality metric and applications. In International Workshop on Video Processing and Quality Metrics for
Consumer Electronics, January 2005.
[69] T. Ignatenko. Secret-Key Rates and Privacy Leakage in Biometric Systems.
PhD thesis, Eindhoven University of Technology, The Netherlands, 2009.
[70] T. Ignatenko and F. M. J. Willems. On privacy in secure biometric authentication systems. In IEEE International Conference on Acoustics, Speech and
Signal Processing, volume 2, pages II–121–II–124, April 2007.
[71] T. Ignatenko and F. M. J. Willems. Privacy leakage in biometric secrecy systems. In 46th Annual Allerton Conference on Communication, Control, and
Computing, pages 850–857, September 2008.
[72] T. Ignatenko and F. M. J. Willems. Secret rate - privacy leakage in biometric
systems. In IEEE International Symposium on Information Theory, pages 2251–
2255, June 28–July 3, 2009.
[73] ISO/IEC. IS 10918-1: Information technology – Digital compression and coding
of continuous-tone still images: Requirements and guidelines, 1990.
[74] ISO/IEC. IS information technology – Coding of moving pictures and associated
audio for digital storage media at up to about 1.5 Mbit/s – part 2: Video, 1993.
[75] ISO/IEC. IS 13818-2: Information technology – Generic coding of moving
pictures and associated audio information – part 2: Video, 1995.
[76] ISO/IEC. IS 14496-10: Information technology – Coding of audio-visual objects
– part 10: Advanced video coding, 2003.
[77] ISO/IEC. IS 15444: Information technology – JPEG 2000 image coding system,
2004.
[78] ITU-T. Recommendation J.147: Objective picture quality measurement method by use of in-service test signals, July 2002.
[79] ITU-T. Recommendation H.264: Advanced video coding for generic audiovisual
services, 2003.
[80] ITU-T. Recommendation J.240: Framework for remote monitoring of transmitted picture signal-to-noise ratio using spread-spectrum and orthogonal transform, June 2004.
[81] A. K. Jain. Advances in mathematical models for image processing. Proceedings of the IEEE, 69(5):502–528, May 1981.
[82] N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of
human perception. Proceedings of the IEEE, 81(10):1385–1422, October 1993.
[83] M. Johnson and K. Ramchandran. Dither-based secure image hashing using
distributed coding. In IEEE International Conference on Image Processing,
volume 2, pages 751–754, September 2003.
[84] C. Kailasanathan and R. C. Naini. Image authentication surviving acceptable
modifications using statistical measures and k-mean segmentation. In Workshop
on Nonlinear Signal and Image Processing, June 2001.
[85] C. Kailasanathan, R. S. Naini, and P. Ogunbona. Compression tolerant DCT
based image hash. In International Conference on Distributed Computing Systems Workshops, pages 562–567, May 2003.
[86] R. Kawada, O. Sugimoto, A. Koike, M. Wada, and S. Matsumoto. Highly precise
estimation scheme for remote video PSNR using spread spectrum and extraction
of orthogonal transform coefficients. Electronics and Communications in Japan
(Part I: Communications), 89(6):51–62, 2006.
[87] N. Khanna, A. Roca, G. T. C. Chiu, J. P. Allebach, and E. J. Delp. Improvements on image authentication and recovery using distributed source coding.
In SPIE Conference on Media Forensics and Security, 2009.
[88] S. S. Kozat, R. Venkatesan, and M. K. Mihcak. Robust perceptual image
hashing via matrix invariants. In IEEE International Conference on Image Processing, volume 5, pages 3443–3446, October 2004.
[89] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519,
February 2001.
[90] T. M. Kusuma and H.-J. Zepernick. A reduced-reference perceptual quality
metric for in-service image quality assessment. In IEEE Joint First Workshop
on Mobile Future and Symposium on Trends in Communications, pages 71–74,
October 2003.
[91] C. J. Lambrecht and O. Verscheure. Perceptual quality measure using a spatio-temporal model of the human visual system. In SPIE Conference on Digital Video Compression: Algorithms and Technologies, pages 450–460, January
1996.
[92] C.-F. Lan, A. D. Liveris, K. Narayanan, Z. Xiong, and C. Georghiades. Slepian-Wolf coding of multiple M-ary sources using LDPC codes. In Data Compression Conference, page 549, March 2004.
[93] P. Le Callet, C. Viard-Gaudin, and D. Barba. Continuous quality assessment
of MPEG2 video with reduced reference. In International Workshop on Video
Processing and Quality Metrics for Consumer Electronics, January 2005.
[94] F. Lefebvre, J. Czyz, and B. Macq. A robust soft hash algorithm for digital image signature. In International Conference on Multimedia and Expo, Baltimore,
Maryland, 2003.
[95] Z. Li, Y.-C. Lin, D. Varodayan, P. Baccichet, and B. Girod. Distortion-aware retransmission and concealment of video packets using a Wyner-Ziv-coded thumbnail. In IEEE International Workshop on Multimedia Signal Processing, pages
424–428, October 2008.
[96] J. Liang, R. Kumar, Y. Xi, and K. W. Ross. Pollution in P2P file sharing systems. In IEEE INFOCOM, volume 2, pages 1174–1185, March 2005.
[97] C.-Y. Lin and S.-F. Chang. Generating robust digital signature for image/video
authentication. In ACM Multimedia: Multimedia and Security Workshop, pages
49–54, Bristol, UK, September 1998.
[98] C.-Y. Lin and S.-F. Chang. A robust image authentication method surviving
JPEG lossy compression. In SPIE Conference on Storage and Retrieval for
Image and Video Database, San Jose, CA, January 1998.
[99] C.-Y. Lin and S.-F. Chang. A robust image authentication method distinguishing JPEG compression from malicious manipulation. IEEE Transactions on
Circuits and Systems for Video Technology, 11(2):153–168, February 2001.
[100] Y.-C. Lin, D. Varodayan, T. Fink, E. Bellers, and B. Girod. Authenticating
contrast and brightness adjusted images using distributed source coding and expectation maximization. In International Conference on Multimedia and Expo,
Hannover, Germany, June 2008.
[101] Y.-C. Lin, D. Varodayan, T. Fink, E. Bellers, and B. Girod. Localization of
tampering in contrast and brightness adjusted images using distributed source
coding and expectation maximization. In IEEE International Conference on
Image Processing, San Diego, CA, October 2008.
[102] Y.-C. Lin, D. Varodayan, and B. Girod. Image authentication and tampering localization using distributed source coding. In IEEE Multimedia Signal
Processing Workshop, Crete, Greece, October 2007.
[103] Y.-C. Lin, D. Varodayan, and B. Girod. Image authentication based on distributed source coding. In IEEE International Conference on Image Processing,
San Antonio, TX, September 2007.
[104] Y.-C. Lin, D. Varodayan, and B. Girod. Spatial models for localization of
image tampering using distributed source codes. In Picture Coding Symposium,
Lisbon, Portugal, November 2007.
[105] Y.-C. Lin, D. Varodayan, and B. Girod. Authenticating cropped and resized images using distributed source coding and expectation maximization. In IS&T/SPIE Electronic Imaging, Media Forensics and Security XI, San Jose, CA, January 2009.
[106] Y.-C. Lin, D. Varodayan, and B. Girod. Distributed source coding authentication of images with affine warping. In IEEE International Conference on
Acoustics, Speech, and Signal Processing, Taipei, Taiwan, April 2009.
[107] Y.-C. Lin, D. Varodayan, and B. Girod. Distributed source coding authentication of images with contrast and brightness adjustment and affine warping. In
International Picture Coding Symposium, Chicago, IL, May 2009.
[108] Y.-C. Lin, D. Varodayan, and B. Girod. Video quality monitoring for mobile
multicast peers using distributed source coding. In 5th International Mobile
Multimedia Communications Conference, London, UK, September 2009.
[109] H. Liu and I. Heynderickx. A no-reference perceptual blockiness metric. In
IEEE International Conference on Acoustics, Speech and Signal Processing,
pages 865–868, 2008.
[110] S. Liu and A. C. Bovik. Efficient DCT-domain blind measurement and reduction
of blocking artifacts. IEEE Transactions on Circuits and Systems for Video
Technology, 12(12):1139–1149, December 2002.
[111] A. Liveris, Z. Xiong, and C. Georghiades. Compression of binary sources with
side information at the decoder using LDPC codes. In IEEE Global Communications Symposium, volume 2, pages 1300–1304, Taipei, Taiwan, November
2002.
[112] A. Liveris, Z. Xiong, and C. Georghiades. Compression of binary sources with
side information at the decoder using LDPC codes. IEEE Communications
Letters, 6(10):440–442, October 2002.
[113] D.-C. Lou and J.-L. Liu. Fault resilient and compression tolerant digital signature for image authentication. IEEE Transactions on Consumer Electronics,
46(1):31–39, February 2000.
[114] C.-S. Lu, C.-Y. Hsu, S.-W. Sun, and P.-C. Chang. Robust mesh-based hashing
for copy detection and tracing of images. In IEEE International Conference on
Multimedia and Expo, volume 1, pages 731–734, June 2004.
[115] C.-S. Lu and H.-Y. M. Liao. Structural digital signature for image authentication: an incidental distortion resistant scheme. In ACM workshops on Multimedia, pages 115–118, Los Angeles, CA, 2000.
[116] C.-S. Lu and H.-Y. M. Liao. Structural digital signature for image authentication: an incidental distortion resistant scheme. IEEE Transactions on Multimedia, 5(2):161–173, June 2003.
[117] L. Lu, Z. Wang, A. C. Bovik, and J. Kouloheris. Full-reference video
quality assessment considering structural distortion and no-reference quality
evaluation of MPEG video. In IEEE International Conference on Multimedia
and Expo, volume 1, pages 61–64, 2002.
[118] J. Lukas and J. Fridrich. Estimation of primary quantization matrix in double
compressed JPEG images. In Digital Forensic Research Workshop, August
2003.
[119] W. Lv and Z. J. Wang. Fast Johnson-Lindenstrauss transform for robust and
secure image hashing. In IEEE Workshop on Multimedia Signal Processing,
pages 725–729, October 2008.
[120] E. Martinian, S. Yekhanin, and J. S. Yedidia. Secure biometrics via syndromes.
In Allerton Conference on Communications, Control and Computing, Monticello, IL, September 2005.
[121] M. Masry, S. S. Hemami, and Y. Sermadevi. A scalable wavelet-based video
distortion metric and applications. IEEE Transactions on Circuits and Systems
for Video Technology, 16(2):260–273, February 2006.
[122] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from
maximally stable extremal regions. In British Machine Vision Conference, 2002.
[123] K. Mihcak and R. Venkatesan. New iterative geometric techniques for robust
image hashing. In Workshop on Security and Privacy in Digital Rights Management, pages 13–21, November 2001.
[124] V. Monga and B. L. Evans. Robust perceptual image hashing using feature
points. In IEEE International Conference on Image Processing, volume 1, pages
677–680, October 2004.
[125] V. Monga and B. L. Evans. Perceptual image hashing via feature points: Performance evaluation and tradeoffs. IEEE Transactions on Image Processing,
15(11):3452–3465, November 2006.
[126] V. Monga and M. K. Mihcak. Robust and secure image hashing via non-negative
matrix factorizations. IEEE Transactions on Information Forensics and Security, 2(3):376–390, September 2007.
[127] V. Monga and M. K. Mihcak. Robust image hashing via non-negative matrix
factorizations. In IEEE International Conference on Acoustics, Speech and
Signal Processing, volume 2, May 2006.
[128] J. Oostveen, T. Kalker, and J. Haitsma. Visual hashing of video: applications
and techniques. In SPIE Conference on Applications of Digital Image Processing, pages 121–131, San Diego, CA, July 2001.
[129] A. C. Popescu and H. Farid. Exposing digital forgeries in color filter array
interpolated images. IEEE Transactions on Signal Processing, 53(10):3948–
3959, October 2005.
[130] S. S. Pradhan, J. Kusuma, and K. Ramchandran. Distributed compression in
a dense microsensor network. IEEE Signal Processing Magazine, 19(2):51–60,
March 2002.
[131] S. S. Pradhan and K. Ramchandran. Distributed source coding using syndromes
(DISCUS): design and construction. In Data Compression Conference, pages
158–167, March 1999.
[132] G. Prandi, G. Valenzise, M. Tagliasacchi, and A. Sarti. Detection and identification of sparse audio tampering using distributed source coding and compressive sensing techniques. In International Conference on Digital Audio Effects,
September 2008.
[133] J. W. Pratt. F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum
likelihood estimation. The Annals of Statistics, 4(3):501–514, 1976.
[134] R. Puri, A. Majumdar, and K. Ramchandran. PRISM: A video coding paradigm
with motion estimation at the decoder. IEEE Transactions on Image Processing, 16(10):2436–2448, October 2007.
[135] R. Puri and K. Ramchandran. PRISM: a new robust video coding architecture based on distributed compression principles. In Allerton Conference on
Communication, Control, and Computing, Monticello, IL, 2002.
[136] R. Puri and K. Ramchandran.
PRISM: a ‘reversed’ multimedia coding
paradigm. In IEEE International Conference on Image Processing, Barcelona,
Spain, 2003.
[137] R. Puri and K. Ramchandran. PRISM: an uplink-friendly multimedia coding
paradigm. In IEEE International Conference on Acoustics, Speech, and Signal
Processing, Hong Kong, China, 2003.
[138] M. P. Queluz. Towards robust content based techniques for image authentication. In IEEE Workshop on Multimedia Signal Processing, pages 297–302,
December 1998.
[139] M. P. Queluz. Content-based integrity protection of digital images. In SPIE
Conference on Security Watermarking Multimedia Contents, pages 85–93, San
Jose, CA, January 1999.
[140] S. Rane. Systematic Lossy Error Protection of Video Signals. PhD thesis,
Stanford University, Stanford, CA, 2007.
[141] S. Rane, A. Aaron, and B. Girod. Lossy forward error protection for error-resilient digital video broadcasting. In SPIE Visual Communications and Image
Processing Conference, San Jose, CA, July 2004.
[142] S. Rane, A. Aaron, and B. Girod. Systematic lossy forward error protection
for error resilient digital video broadcasting - a Wyner-Ziv coding approach. In
IEEE International Conference on Image Processing, Singapore, October 2004.
[143] S. Rane, A. Aaron, and B. Girod. Error-resilient video transmission using
multiple embedded Wyner-Ziv descriptions. In IEEE International Conference
on Image Processing, Genoa, Italy, September 2005.
[144] S. Rane, P. Baccichet, and B. Girod. Modeling and optimization of a systematic
lossy error protection based on H.264/AVC redundant slices. In Picture Coding
Symposium, Beijing, China, April 2006.
[145] S. Rane and B. Girod. Analysis of error-resilient video transmission based on
systematic source-channel coding. In Picture Coding Symposium, San Francisco,
CA, December 2004.
[146] S. Rane and B. Girod. Systematic lossy error protection versus layered coding
with unequal error protection. In SPIE Visual Communications and Image
Processing Conference, San Jose, CA, January 2005.
[147] S. Rane and B. Girod. Systematic lossy error protection based on H.264/AVC
redundant slices. In SPIE Visual Communications and Image Processing Conference, San Jose, CA, January 2006.
[148] C. Rao. Information and the accuracy attainable in the estimation of statistical
parameters. Bulletin of the Calcutta Mathematical Society, 37:81–89, 1945.
[149] A. R. Reibman, V. A. Vaishampayan, and Y. Sermadevi. Quality monitoring of
video over a packet network. IEEE Transactions on Multimedia, 6(2):327–334,
April 2004.
[150] M. Ries, O. Nemethova, and M. Rupp. Motion based reference-free quality
estimation for H.264/AVC video streaming. In IEEE International Symposium
on Wireless Pervasive Computing, February 2007.
[151] R. Rivest. The MD5 message-digest algorithm, RFC 1321, April 1992.
[152] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–
126, 1978.
[153] S. Roy and Q. Sun. Robust hash for detecting and localizing image tampering.
In IEEE International Conference on Image Processing, San Antonio, TX, 2007.
[154] M. Schlauweg and E. Müller. Gaussian scale-space features for semi-fragile
image authentication. In Picture Coding Symposium, pages 1–4, May 2009.
[155] M. Schlauweg, D. Pröfrock, and E. Müller. JPEG2000-based secure image
authentication. In Workshop on Multimedia and Security, pages 62–67, Geneva,
Switzerland, 2006.
[156] M. Schneider and S.-F. Chang. A robust content based digital signature for
image authentication. In IEEE International Conference on Image Processing,
volume 3, pages 227–230, September 1996.
[157] D. Schonberg, S. Draper, and K. Ramchandran. On compression of encrypted
images. In IEEE International Conference on Image Processing, pages 269–272,
October 2006.
[158] D. Schonberg, S. S. Pradhan, and K. Ramchandran. LDPC codes can approach
the Slepian-Wolf bound for general binary sources. In Allerton Conference on
Communication, Control, and Computing, Champaign, IL, October 2002.
[159] D. Schonberg, S. S. Pradhan, and K. Ramchandran. Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources.
In Asilomar Conference on Signals, Systems and Computers, volume 1, pages
835–839, November 2003.
[160] D. Schonberg, K. Ramchandran, and S. S. Pradhan. Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources.
In Data Compression Conference, pages 292–301, March 2004.
[161] D. Schonberg, C. Yeo, S. C. Draper, and K. Ramchandran. On compression
of encrypted video. In Data Compression Conference, pages 173–182, March 2007.
[162] J. S. Seo, J. Haitsma, T. Kalker, and C. D. Yoo. Affine transformation resilient
image fingerprinting. In IEEE International Conference on Acoustics, Speech,
and Signal Processing, Hong Kong, China, 2003.
[163] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–659, July and October 1948.
[164] H. R. Sheikh, A. C. Bovik, and L. Cormack. No-reference quality assessment
using natural scene statistics: JPEG2000. IEEE Transactions on Image Processing, 14(11):1918–1927, November 2005.
[165] H. R. Sheikh, A. C. Bovik, and G. de Veciana. An information fidelity criterion
for image quality assessment using natural scene statistics. IEEE Transactions
on Image Processing, 14(12):2117–2128, December 2005.
[166] H. Shimokawa, T. S. Han, and S. Amari. Error bound of hypothesis testing with
data compression. In IEEE International Symposium on Information Theory,
page 114, 1994.
[167] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources.
IEEE Transactions on Information Theory, IT-19(4):471–480, July 1973.
[168] V. Stankovic, A. D. Liveris, Z. Xiong, and C. N. Georghiades. Design of Slepian-Wolf codes by channel code partitioning. In Data Compression Conference,
pages 302–311, March 2004.
[169] O. Sugimoto, R. Kawada, M. Wada, and S. Matsumoto. Objective measurement
scheme for perceived picture quality degradation caused by MPEG encoding
without any reference pictures. In SPIE Conference on Visual Communications
and Image Processing, volume 4310, pages 932–939, 2001.
[170] Q. Sun, S.-F. Chang, M. Kurato, and M. Suto. A new semi-fragile image
authentication framework combining ECC and PKI infrastructure. In IEEE
International Symposium on Circuits and Systems, Phoenix, AZ, May 2002.
[171] Q. Sun, S.-F. Chang, M. Kurato, and M. Suto. A quantitative semi-fragile
JPEG2000 image authentication system. In IEEE International Conference on
Image Processing, volume 2, pages 921–924, 2002.
[172] Y. Sutcu, S. Rane, J. S. Yedidia, S. C. Draper, and A. Vetro. Feature extraction
for a Slepian-Wolf biometric system using LDPC codes. In IEEE International Symposium on Information Theory, pages 2297–2301, July 2008.
[173] Y. Sutcu, S. Rane, J. S. Yedidia, S. C. Draper, and A. Vetro. Feature transformation for a Slepian-Wolf biometric system based on error correcting codes.
In IEEE Conference on Computer Vision and Pattern Recognition - Biometrics
Workshop, Anchorage, Alaska, 2008.
[174] A. Swaminathan, Y. Mao, and M. Wu. Image hashing resilient to geometric
and filtering operations. In IEEE International Workshop on Multimedia Signal
Processing, pages 355–358, September/October 2004.
[175] A. Swaminathan, Y. Mao, and M. Wu. Robust and secure image hashing. IEEE
Transactions on Information Forensics and Security, 1(2):215–230, June 2006.
[176] M. Tagliasacchi, G. Valenzise, M. Naccari, and S. Tubaro. A reduced-reference
structural similarity approximation for videos corrupted by channel errors.
Springer Multimedia Tools and Applications, 48(3):471–492, 2010.
[177] M. Tagliasacchi, G. Valenzise, and S. Tubaro. Localization of sparse image
tampering via random projections. In IEEE International Conference on Image
Processing, pages 2092–2095, October 2008.
[178] M. Tagliasacchi, G. Valenzise, and S. Tubaro. Hash-based identification of
sparse image tampering. IEEE Transactions on Image Processing, 18(11):2491–2504, November 2009.
[179] Z. Tang, S. Wang, X. Zhang, and W. Wei. Perceptual similarity metric resilient
to rotation for application in robust image hashing. In International Conference
on Multimedia and Ubiquitous Engineering, pages 183–188, June 2009.
[180] Z. Tang, S. Wang, X. Zhang, W. Wei, and S. Su. Robust image hashing for
tamper detection using non-negative matrix factorization. Journal of Ubiquitous
Convergence and Technology, 2(1):18–26, May 2008.
[181] T. Tian, J. García-Frías, and W. Zhong. Compression of correlated sources
using LDPC codes. In Data Compression Conference, page 450, March 2003.
[182] D. S. Turaga, Y. Chen, and J. Caviedes. No reference PSNR estimation for
compressed pictures. In IEEE International Conference on Image Processing,
volume 3, pages 61–64, 2002.
[183] G. Valenzise, M. Naccari, M. Tagliasacchi, and S. Tubaro. Reduced-reference
estimation of channel-induced video distortion using distributed source coding.
In ACM Multimedia, Vancouver, British Columbia, Canada, October 2008.
[184] G. Valenzise, G. Prandi, and M. Tagliasacchi. Identification of sparse audio
tampering using distributed source coding and compressive sensing techniques.
EURASIP Journal on Image and Video Processing, 2009:1–12, 2009.
[185] D. Varodayan. Adaptive Distributed Source Coding. PhD thesis, Stanford University, Stanford, CA, 2010.
[186] D. Varodayan, A. Aaron, and B. Girod. Rate-adaptive distributed source coding using low-density parity-check codes. In Asilomar Conference on Signals,
Systems, and Computers, Pacific Grove, California, November 2005.
[187] D. Varodayan, A. Aaron, and B. Girod. Exploiting spatial correlation in pixel-domain distributed image compression. In Picture Coding Symposium, Beijing,
China, April 2006.
[188] D. Varodayan, A. Aaron, and B. Girod. Rate-adaptive codes for distributed
source coding. EURASIP Signal Processing Journal, Special Section on Distributed Source Coding, 86(11):3123–3130, November 2006.
[189] D. Varodayan, D. Chen, M. Flierl, and B. Girod. Wyner-Ziv coding of video
with unsupervised motion vector learning. EURASIP Signal Processing: Image
Communication Journal, Special Issue on Distributed Video Coding, 23(5):369–
378, June 2008.
[190] D. Varodayan, Y.-C. Lin, A. Mavlankar, M. Flierl, and B. Girod. Wyner-Ziv
coding of stereo images with unsupervised learning of disparity. In Picture
Coding Symposium, Lisbon, Portugal, 2007.
[191] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod. Distributed coding of
random dot stereograms with unsupervised learning of disparity. In IEEE International Workshop on Multimedia Signal Processing, Victoria, BC, Canada,
2006.
[192] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod. Distributed grayscale
stereo image coding with unsupervised learning of disparity. In IEEE Data
Compression Conference, Snowbird, UT, 2007.
[193] R. Venkatesan, S.-M. Koon, M. H. Jakubowski, and P. Moulin. Robust image
hashing. In IEEE International Conference on Image Processing, volume 3,
pages 664–666, 2000.
[194] A. Vetro, S. C. Draper, S. Rane, and J. S. Yedidia. Securing biometric data,
chapter 11, pages 293–324. Academic Press, Inc., 2009.
[195] Y. Wang, D. Wu, H. Zhang, and X. Niu. A robust contourlet based image hash
algorithm. In IEEE International Conference on Intelligent Information Hiding
and Multimedia Signal Processing, pages 1010–1013, September 2009.
[196] Z. Wang, A. C. Bovik, and B. L. Evans. Blind measurement of blocking artifacts
in images. In IEEE International Conference on Image Processing, volume 3,
pages 981–984, 2000.
[197] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality
assessment: from error visibility to structural similarity. IEEE Transactions on
Image Processing, 13(4):600–612, April 2004.
[198] Z. Wang, H. R. Sheikh, and A. C. Bovik. No-reference perceptual quality
assessment of JPEG compressed images. In IEEE International Conference on
Image Processing, volume 1, pages 477–480, September 2002.
[199] Z. Wang and E. P. Simoncelli. Reduced-reference image quality assessment
using a wavelet-domain natural image statistic model. In SPIE Conference on
Human Vision and Electronic Imaging, San Jose, CA, January 2005.
[200] Z. Wang, G. Wu, H. R. Sheikh, E. P. Simoncelli, E.-H. Yang, and A. C. Bovik.
Quality-aware images. IEEE Transactions on Image Processing, 15(6):1680–
1689, June 2006.
[201] A. B. Watson, J. Hu, and J. F. McGowan III. Digital video quality metric based
on human vision. SPIE Journal of Electronic Imaging, 10(1):20–29, 2001.
[202] A. A. Webster, Jones C. T., M. H. Pinson, S. D. Voran, and S. Wolf. An
objective video quality assessment system based on human perception. In SPIE
Conference on Human Vision, Visual Processing, and Digital Display, volume
1913, pages 15–26, 1993.
[203] S. Winkler. Issues in vision modeling for perceptual video quality assessment.
Signal Processing, 78(2):231–252, 1999.
[204] S. Wolf and M. H. Pinson. Spatial-temporal distortion metric for in-service quality monitoring of any digital video system. In SPIE Conference on Multimedia
Systems and Applications, volume 3845, pages 266–277, 1999.
[205] S. Wolf and M. H. Pinson. Low bandwidth reduced reference video quality
monitoring system. In International Consumer Electronics Workshop on Video
Processing and Quality Metrics, Scottsdale, AZ, January 2005.
[206] R. B. Wolfgang and E. J. Delp. A watermark for digital images. In IEEE International Conference on Image Processing, Lausanne, Switzerland, September
1996.
[207] A. Wyner. Recent results in the Shannon theory. IEEE Transactions on Information Theory, 20(1):2–10, January 1974.
[208] L. Xie, G. R. Arce, and R. F. Graveman. Approximate image message authentication codes. IEEE Transactions on Multimedia, 3(2):242–252, June 2001.
[209] Z. Xiong, A. D. Liveris, and S. Cheng. Distributed source coding for sensor
networks. IEEE Signal Processing Magazine, 21(5):80–94, September 2004.
[210] T. Yamada, Y. Miyamoto, and M. Serizawa. No-reference video quality estimation based on error-concealment effectiveness. In IEEE Packet Video Conference, Lausanne, Switzerland, November 2007.
[211] T. Yamada, Y. Miyamoto, M. Serizawa, and H. Harasaki. Reduced-reference
based video quality metrics using representative luminance values. In International Consumer Electronics Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, January 2007.
[212] F. Yang, S. Wan, Y. Chang, and H. R. Wu. A novel objective no-reference
metric for digital video quality assessment. IEEE Signal Processing Letters,
12(10):685–688, October 2005.
[213] S. Yang. Robust image hash based on cyclic coding the distributed features.
In International Conference on Hybrid Intelligent Systems, volume 2, pages
441–444, August 2009.
[214] S.-H. Yang and C.-F. Chen. Robust image hashing based on SPIHT. In International Conference on Information Technology: Research and Education,
pages 110–114, June 2005.
[215] R.-X. Zhan, K. Y. Chau, Z.-M. Lu, B.-B. Liu, and W. H. Ip. Robust image
hashing for image authentication based on DCT-DWT composite domain. In
IEEE International Conference on Intelligent Systems Design and Applications,
volume 2, pages 119–122, November 2008.
[216] H. Zhang, C. Yeo, and K. Ramchandran. VSYNC: a novel video file synchronization protocol. In ACM Multimedia, pages 757–760, Vancouver, British
Columbia, Canada, October 2008.
[217] H. Zhang, C. Yeo, and K. Ramchandran. Rate efficient remote video file synchronization. In IEEE International Conference on Acoustics, Speech and Signal
Processing, pages 1845–1848, April 2009.
[218] H. Zhang, C. Yeo, and K. Ramchandran. Remote video file synchronization for
heterogeneous mobile clients. In SPIE Conference on Applications of Digital
Image Processing, volume 7443, page 74430F, 2009.
[219] H. Zhang, H. Zhang, Q. Li, and X. Niu. Predigest Watson’s visual model as
perceptual hashing method. In International Conference on Convergence and
Hybrid Information Technology, volume 2, pages 617–620, November 2008.
[220] H.-L. Zhang, C.-Q. Xiong, and G.-Z. Geng. Content based image hashing robust to geometric transformations. In International Symposium on Electronic
Commerce and Security, volume 2, pages 105–108, May 2009.
[221] Y. Zhao and J. García-Frías. Data compression of correlated non-binary sources
using punctured Turbo codes. In Data Compression Conference, pages 242–251,
Snowbird, UT, April 2002.
[222] Z. Zhu, A. Aaron, and B. Girod. Distributed compression for large camera
arrays. In IEEE Workshop on Statistical Signal Processing, St. Louis,
MO, September 2003.
[223] Z. Zhu, S. Rane, and B. Girod. Systematic lossy error protection (SLEP) for
video transmission over wireless ad hoc networks. In SPIE Visual Communications and Image Processing Conference, Beijing, China, July 2005.