Audio Protection with Removable Watermarking

DEPARTMENT OF ELECTRICAL AND INFORMATION ENGINEERING
DEGREE PROGRAMME IN INFORMATION ENGINEERING
AUDIO PROTECTION WITH REMOVABLE
WATERMARKING FOR MOBILE
DISTRIBUTION
Author
___________________________________
Marko Brockman
Supervisor
___________________________________
Tapio Seppänen
Accepted
_______/_______2009
Grade
___________________________________
Brockman M. (2009) Audio Protection with Removable Watermarking for
Mobile Distribution. University of Oulu, Department of Electrical and Information
Engineering, Oulu, Finland. Master’s thesis, 77 p.
ABSTRACT
The failure of encryption-based digital rights management systems and the growing
popularity of online music stores have led to an increasing need for new technologies
that can protect the copyright of audio content while it is stored in an unprotected
format. Digital watermarking can be used for creating solutions based on embedding
inaudible identifiers known as digital fingerprints into digital audio. These
fingerprints can be used for detecting the origin of the content in case the content is
distributed illegally.
In this work, an audio protection system utilizing removable watermarking and
fingerprinting technologies was designed and implemented. An audible watermark was
inserted into free preview samples, which were made available on the server. The user
could download and listen to the samples, and request a license from the server,
which enabled the client application to transform the audible watermark in the
preview song into a unique inaudible fingerprint containing user identity
information.
The fingerprint watermark was tested against an extensive set of signal processing
attacks, which made inaudible changes to the audio. The inaudibility of the
watermark was also tested with a dozen test users. The fingerprint watermark was
robust against all attacks, and average listeners could not tell the difference between
watermarked and original versions in the listening tests.
The results show that digital watermarking and fingerprinting technologies can be
used for creating a robust and imperceptible digital rights management system. The
use of removable watermarking provides increased usability, especially in mobile
audio distribution.
Keywords: Digital rights management, DRM, frequency hopping, robust
watermarking, audio synchronization
Brockman M. (2009) Audion suojaaminen poistettavalla vesileimauksella
mobiilijakelussa [Audio protection with removable watermarking in mobile
distribution]. University of Oulu, Department of Electrical and Information
Engineering. Master’s thesis, 77 p.
TIIVISTELMÄ
The failure of encryption-based digital rights management techniques has created a
growing need for new technologies that could protect the copyrights of digital music
in an unencrypted format. Digital watermarking can be used to create solutions based
on embedding inaudible identifiers, digital fingerprints, into digital music. These
fingerprints can be used to identify the original owner of a music file if the content
is distributed illegally.
In this work, an audio protection system utilizing removable watermarking and
fingerprinting technology was designed and implemented. An audible watermark was
first embedded into the preview songs, which were made freely available on the server
for users to download to their own mobile devices. Users could listen to the preview
versions and download a license from the server, with which the terminal device was
able to transform the audible watermark in the song into a user-specific digital
fingerprint.
The robustness of the fingerprint watermark was tested against an extensive set of
signal processing attacks that made inaudible changes to the songs. The inaudibility
of the watermark was also tested with ten test users. The fingerprint watermark
withstood all attacks, and the average listener could not distinguish between the
watermarked and the original song in the listening tests.
The results show that digital watermarking and fingerprinting technology can be
used to create a robust and inaudible digital rights management system. The use of
removable watermarking improves the usability of the system, especially in mobile
music distribution.
Keywords: Digital rights management, DRM, frequency hopping, robust
watermarking, audio synchronization
TABLE OF CONTENTS
ABSTRACT
TIIVISTELMÄ
TABLE OF CONTENTS
PREFACE
ABBREVIATIONS
1. INTRODUCTION
2. DIGITAL RIGHTS MANAGEMENT
  2.1. Background
  2.2. Rights models
  2.3. Rights Expression Languages
  2.4. DRM reference architecture
    2.4.1. The content server
    2.4.2. The license server
    2.4.3. The client
  2.5. Mobile DRM
    2.5.1. OMA DRM Version 1.0
    2.5.2. OMA DRM Version 2.0
  2.6. DRM in digital audio
    2.6.1. Audio CDs
    2.6.2. Online music stores
3. DIGITAL WATERMARKING FOR AUDIO
  3.1. Background
  3.2. Watermark characteristics
  3.3. Watermarking methods
    3.3.1. General watermarking scheme
    3.3.2. Watermarking domains
    3.3.3. Direct sequence spread spectrum method
    3.3.4. Frequency hopping method
  3.4. Digital watermarks in DRM
    3.4.1. Watermarking and encryption
    3.4.2. Digital fingerprinting
    3.4.3. Tamper detection
    3.4.4. Attacking digital watermarks
  3.5. Removable watermarking
  3.6. Commercial solutions
    3.6.1. MarkAny
    3.6.2. Verance
    3.6.3. Philips
    3.6.4. Alpha Tec
4. ALGORITHM FOR REMOVABLE WATERMARKING AND FINGERPRINTING
  4.1. Embedding
  4.2. Noise transform
  4.3. Reading the fingerprint
  4.4. Performance evaluation
    4.4.1. Robustness
    4.4.2. Imperceptibility
  4.5. Discussion
5. DESIGN OF ROBUST AUDIO PROTECTION SYSTEM
  5.1. General system description
  5.2. Use cases
    5.2.1. Downloading songs for preview
    5.2.2. Purchasing a license for a song
    5.2.3. Requirements specification
  5.3. System architecture
    5.3.1. Music store
    5.3.2. Client application
    5.3.3. Communications protocol
  5.4. Software design
    5.4.1. Client application
    5.4.2. Server application
    5.4.3. Sequence diagrams
6. SYSTEM IMPLEMENTATION AND TESTING
  6.1. Software platforms
  6.2. Limitations
  6.3. Functional tests
    6.3.1. Downloading a list of preview files
    6.3.2. Downloading a preview file
    6.3.3. Music file playback
    6.3.4. Requesting a license for a preview file
    6.3.5. Generating unique licenses
    6.3.6. Noise transform
    6.3.7. Maintaining the network connection
  6.4. Technical tests
    6.4.1. Preview file download time
    6.4.2. License file download time
    6.4.3. Noise transform processing time
    6.4.4. Multiple users support
    6.4.5. Server stability
    6.4.6. Client stability
  6.5. User tests
  6.6. Discussion
7. DISCUSSION
8. SUMMARY
9. REFERENCES
10. APPENDICES
PREFACE
This thesis was completed as part of the Zirion project at MediaTeam Oulu,
a research group in the Information Processing Laboratory of the University of Oulu,
Finland. The project focused on creating new value-adding services and content
distribution channel prototypes that function in a real environment. The work started
in January 2008 with algorithm development and it was followed by system design
and implementation in the summer. This thesis document was written during the fall.
I would like to acknowledge Professor Tapio Seppänen for giving essential advice
during the writing of this thesis and digital watermarking researchers Mikko
Löytynoja, Anja Keskinarkaus and Anu Pramila for a fun and encouraging work
atmosphere during my years of DRM and watermarking research in MediaTeam in
2004-2008. I would also like to thank Professor Mika Ylianttila for improvement
suggestions during the reviewing process and Dr. Pertti Väyrynen for proofreading
the text.
Special thanks to my family for constant support during my studies, and to my
wife Tiia for all the love and sandwiches.
Oulu, December 5, 2008
Marko Brockman
ABBREVIATIONS
3G        3rd Generation mobile phone standard
AACS      Advanced Access Content System
AD/DA     Analog to Digital / Digital to Analog converter
CD        Compact Disc
CEK       Content Encryption Key
DFT       Discrete Fourier Transform
DRM       Digital Rights Management
DVD       Digital Versatile Disc
FFT       Fast Fourier Transform
FP1       Feature Pack 1
HAS       Human Auditory System
HD DVD    High-Definition Digital Versatile Disc
HTTP      Hypertext Transfer Protocol
ICICS     International Conference on Information and Communications Security
ID        Identification
IFFT      Inverse Fast Fourier Transform
IP        Internet Protocol
IT        Information Technology
ITU       International Telecommunication Union
ITU-R     ITU Radiocommunication Sector
JND       Just Noticeable Difference
JPEG      Joint Photographic Experts Group
LAME      LAME Ain’t an MP3 Encoder
LCG       Linear Congruential Generator
MOS       Mean Opinion Score
MP3       MPEG-1 Audio Layer 3
MPEG      Moving Picture Experts Group
ODRL      Open Digital Rights Language
OMA       Open Mobile Alliance
REL       Rights Expression Language
S60       Symbian S60 platform
SDK       Software Development Kit
SDMI      Secure Digital Music Initiative
TCP       Transmission Control Protocol
UML       Unified Modeling Language
URL       Uniform Resource Locator
USB       Universal Serial Bus
VCMS/A    Verance Copy Management System for Audio content
WAV       Waveform Audio Format
XML       eXtensible Markup Language
XrML      eXtensible Rights Markup Language
1. INTRODUCTION
The future of music sales is online and mobile. According to an eMarketer report,
sales of music on physical media are predicted to drop by almost two thirds in just
five years, from 2006 to 2011. Online and mobile sales are predicted to become the
major sales channel, with a share of 56.5 percent of total music sales worldwide. [1]
One of the enablers for online and mobile music has been digital rights
management. It provides the means for protecting the content ownership and
copyrights by restricting unauthorized distribution and usage. However, traditional
DRM solutions have proved controversial. Different techniques were tried for
preventing the copying of audio CDs, but they caused compatibility problems with so
many players that DRM is no longer used in audio CD distribution. In mobile music,
there are separate groups of music player manufacturers and online music retailers
using different DRM techniques, which are not interoperable. This is not an ideal
situation from the consumer perspective, because DRM protected music purchased
from an online music store may be playable in digital audio players of only one
manufacturer.
The dominant digital music format is currently MPEG-1 Audio Layer 3, more
commonly known as MP3. It is also the de facto standard encoding of music played
on digital audio players. The problem with MP3 regarding mobile music distribution
is that it does not support copy protection. This has caused online music retailers to
use other, DRM-enabled proprietary audio formats. The aim is to make it difficult to
use the music files in ways not permitted by the record companies.
Most of the current encryption-based solutions can be circumvented by burning the
music to CD and then ripping it back into some unprotected format such as MP3.
Digital watermarking can be used for creating a solution for the rights
management problem of digital audio. The nature of watermarking allows the audio
to be unencrypted because the content protection is embedded into the audio signal
itself. The use of an unprotected file format enables the music to be played on any
digital audio player, and the music can easily be burned to CD as well. This
eliminates many of the attacks used on other DRM systems and allows better
consumer satisfaction because of wider usability. The problem is, however, that
digital watermarks can be vulnerable to signal processing attacks. The watermarked
signal can be modified so that the modification is inaudible for a human listener, but
the watermark signal may be destroyed in the process. This is a major challenge for
all watermarking applications, and one of the focus areas of this thesis is the
robustness of the watermarking algorithm.
The goal of this thesis was to design and implement a robust audio protection
system using removable watermarking and fingerprinting techniques. The emphasis
of the design process was on the watermarking algorithms, which were developed and
tested in the Matlab environment. The algorithms are an improved version of
the audio protection scheme presented in [2], which is also based on research done in
the Zirion project of the MediaTeam Oulu research group. The most significant
improvement presented in this thesis is a method where the audible watermark is
transformed into an imperceptible digital fingerprint, when the user purchases a
license for the content and the audible noise signal is removed from the audio file.
An algorithm for extracting the fingerprint from the audio was also developed. The
new algorithm is also improved in terms of modularity and robustness. A music store
server and a client application were developed for performing the user tests. The
server was implemented as a Java application on a Linux machine, and the client
application was implemented on the Symbian S60 3rd Edition platform. The
watermarking algorithms were implemented on the S60 platform for use by the
client application. The perceptual quality of the watermark was analyzed in a
listening test with a dozen test users, and the robustness was tested by applying an
extensive set of attacks to watermarked audio clips.
This thesis is divided into five main chapters. Chapter 2 explains the basic
principles, components and concepts of digital rights management with a focus on
digital audio. Chapter 3 introduces digital audio watermarking and presents its major
characteristics, methods and applications. Chapter 4 presents the watermarking
algorithms which form the core part of the implemented software, and chapter 5
covers the design process of the audio protection system which is the major
contribution of this thesis. Chapter 6 briefly presents the implementation details and
the results of user testing, and finally, chapter 7 includes the final discussion of the
contribution of this thesis and analyzes the possibilities for future work.
2. DIGITAL RIGHTS MANAGEMENT
The term Digital Rights Management (DRM) refers to controlling and managing
rights of digital intellectual property. Usually, the term is used when referring to
managing rights of digital media content, such as images, video and audio, but in a
broad view, it also includes software copy protection scenarios. This chapter
introduces the basic concepts of DRM systems, with a special focus on the usage of
DRM in managing copyrighted digital audio.
2.1. Background
Before the digital age, when copying information was not as easy as it is today, there
was rarely a need for any special rights management systems. Instead, rights were
tightly bound to the media format itself. If you bought a book, you were not allowed to
create a copy of it, and even if you had wanted to, copying it would not have been
easy. Therefore, the book as a medium restricted its usage in the way the
publisher wanted. The introduction of Compact Cassettes made copying
music somewhat easier, but the quality of a copy was never as good as the original. [3]
The birth of DRM has primarily been a consequence of digital formats and the
Internet. The ability to distribute content easily and affordably between computers
has allowed piracy to grow fast, and DRM has been created as a countermeasure by
the content producing industry. The first DRM generation focused primarily on
security and encryption as a means of countering unauthorized copying. The content
was encrypted, and only paying customers could unlock and use it. The next
generation DRM systems introduced a whole new range of capabilities, such as
description, identification, trading, protecting, monitoring and tracking of rights
usages. In other words, DRM started to include everything a person can do with
media content. [4]
The environment of digital rights management is mainly characterized by three
aspects: law, technology and business models. The rights which need to be managed
exist because of law. For every original work which is made accessible through some
medium of expression, the copyright law assigns copyrights to the author. The author
has the right to reproduce, modify, distribute, perform, or display the work publicly.
These rights are essential to DRM. The other important law to DRM is contract law,
because all licenses between the content provider and the consumer are basically
contracts, which usually grant access to intellectual property for some monetary
compensation. DRM would not exist without laws, and it is also the law's current
response to the new content usage scenarios that technology has made possible. From
the legal point of view, DRM determines which usages are authorized and which are
not. [3]
Technology is the second major enabler for DRM. Without the ability to represent
and enforce rights models digitally, the whole concept would remain theoretical.
However, technical implementations have their challenges, especially in the field of
security. Studies have shown that every DRM system can be broken, and therefore,
the goal is usually to create security that is robust enough [5]. The DRM reference
architecture, which is the technical base of most of the DRM solutions, is presented
in section 2.4.
Business models are the driving force of DRM development. DRM enables new
ways to distribute content and opens up new business opportunities. This was
discovered after the first DRM systems failed in making the online world work
exactly like the offline world in terms of content distribution and rights management.
The Internet was seen merely as a new medium, rather than a whole new world of
business model possibilities. DRM works where it supports the business models and
not the other way around. Examples of the new business models are paid downloads,
pay-per-view and superdistribution [3]. Economic factors such as the market
situation play a role when the content providers determine which business models to
use for content distribution. These factors also have an effect on consumers and their
willingness to start using new content delivery services.
The functionality of DRM can be split into two categories, as illustrated in
Figure 1. DRM is about both managing digital rights and digital management of
rights. The former means that rights holders must identify their content, collect
metadata about it and assert the rights they have to the content. In addition, the rights
holders must develop business models for content distribution in order to benefit
from their rights to the content. In other words, DRM enables management of digital
rights for the rights holders. The other category, digital management of rights,
concerns digitally enforcing the distribution and usage rules set by the rights holders.
Most of the DRM functions fall into the enforcement category. [4]
Figure 1. The two parts of DRM.
2.2. Rights models
Rights have traditionally been divided into three categories: legal, transactional and
implicit rights. Legal rights are the ones you get when you produce some kind of
original work that falls under the categories of copyright law. Legal rights may also be
obtained through a legal procedure, such as applying for a patent. The second type of
rights is transactional rights. These are rights that you receive or give up because of
some transactions, such as buying or selling. The third type is implicit rights. They
are rights which are bound to the medium that the information is in. For example, a
book allows the owner to read the book as many times as he wants, and also sell or
lend the book to someone else. [3]
The evolution of DRM has led to the introduction of rights models, which specify
more accurately the types of rights the digital rights management system can use.
The rights models are important when the business models related to digital media
are designed. In the new approach, the rights are also divided into three fundamental
categories: render, transport and derivative rights. Render rights are the rights to
present the digital content on some output medium. For example, printing or viewing
on a screen are render rights. The second type of rights is transport rights, which
concern copying, moving or loaning the content. The third rights type, derivative
rights, has to do with manipulating the original content in a way that a new work is
produced. The new content can be extracted or edited from the original, or the
original content can be embedded into some new context, where a new work is
created. [3]
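The three-way division above can be made concrete with a small data model. The following is only an illustrative sketch: the class names, the action names and the `is_allowed` helper are assumptions made for this example and do not come from any DRM standard or from the system described later in this thesis.

```python
from dataclasses import dataclass, field
from enum import Enum


class RightType(Enum):
    """The three fundamental categories of the rights model."""
    RENDER = "render"          # presenting content: print, view, play
    TRANSPORT = "transport"    # copying, moving or loaning content
    DERIVATIVE = "derivative"  # extracting, editing or embedding content


# Hypothetical mapping from concrete actions to their category.
ACTION_CATEGORY = {
    "print": RightType.RENDER,
    "view": RightType.RENDER,
    "play": RightType.RENDER,
    "copy": RightType.TRANSPORT,
    "move": RightType.TRANSPORT,
    "loan": RightType.TRANSPORT,
    "extract": RightType.DERIVATIVE,
    "edit": RightType.DERIVATIVE,
    "embed": RightType.DERIVATIVE,
}


@dataclass
class RightsGrant:
    """Set of actions a rights holder has granted for one content item."""
    content_id: str
    allowed_actions: set = field(default_factory=set)

    def is_allowed(self, action: str) -> bool:
        # Unknown actions are rejected rather than silently permitted.
        if action not in ACTION_CATEGORY:
            raise ValueError(f"unknown action: {action}")
        return action in self.allowed_actions

    def categories(self) -> set:
        """Return the rights categories touched by this grant."""
        return {ACTION_CATEGORY[a] for a in self.allowed_actions}


# A preview grant that only permits playback (a render right).
preview = RightsGrant("song-001", {"play"})
```

With this model, a preview version of a song would carry only render rights, while a purchased version could add transport rights such as copying.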
2.3. Rights Expression Languages
Although rights can be expressed with simpler formats, several complex Rights
Expression Languages (REL) have been developed for expressing the rights
specifications. A REL is a formal language: unlike copyright law, it is not open to
interpretation, but defines rights as precisely as a programming language does.
The most popular languages are Open Digital Rights Language (ODRL) and
eXtensible Rights Markup Language (XrML). An overview of a Rights Expression
Language is presented in Figure 2. [6]
Figure 2. Overview of a Rights Expression Language.
ODRL is an XML-based language intended as an open standard for rights
expressions. It is managed by the ODRL Initiative, which supports
open-source DRM projects implementing ODRL specifications [3]. The Open
Mobile Alliance (OMA) adopted ODRL for its REL specification as early as
OMA DRM specification version 1.0 and has continued to support it in versions 2.0
and 2.1. [7][8][9][10]
XrML is also based on XML, and it has been selected as the basis for the MPEG-21
REL. XrML is developed by ContentGuard, which holds a range of patents covering
its usage. License fees apply if XrML is used in a context covered by the
patents. [11]
2.4. DRM reference architecture
A system that enforces rights models is called a DRM system. Although the
architecture of a DRM system depends heavily on the specific usage scenario, there
are some common components which are found in most systems. This common
structure is called the DRM reference architecture. It consists of three major components: the content
server, the license server and the client. The DRM reference architecture is illustrated
in Figure 3.
Figure 3. The DRM reference architecture.
2.4.1. The content server
The content server includes a content database for all content files, and the
functionality to prepare content for DRM controlled distribution. In addition to the
content itself, the database stores metadata information about the content, such as
title, author, format and price. For end users, the content server allows access to the
DRM enabled content downloads.
The content files are usually manipulated in some way in order to prepare them for
controlled distribution when they are imported into the content repository. This is
done by the content packager component of the content server. All files which are
brought into the system by the content providers are first processed by the content
packager and then placed into the content database for storing. Another important
task of the content packager is the specification of rights the content provider wants
to allow for the user. Separate rights can be specified for previewing purposes, and
several purchasing options can be offered to the user. The content packager can be,
for example, a web interface running on top of the server that provides database
access for the content providers.
An essential feature of the content packager is batch processing. As content
providers generally add plenty of content in a single session, it must be possible to
input multiple files with customizable rights models into the system.
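The batch processing requirement can be sketched as follows. This is a minimal illustration under stated assumptions: the `CatalogEntry` record, the `package_batch` function and the dictionary keys are hypothetical names chosen for this example, and the actual preparation of the content files (for instance watermark embedding) is left out.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Metadata and rights stored in the content database for one file."""
    title: str
    author: str
    fmt: str
    price: float
    rights: tuple


def package_batch(items, default_rights=("preview",)):
    """Turn a batch of submitted files into catalog entries.

    Each item is a dict with 'title', 'author', 'format' and 'price'.
    An optional 'rights' entry overrides the default rights model,
    so rights can be customized per file within one session.
    """
    catalog = []
    for item in items:
        rights = tuple(item.get("rights", default_rights))
        catalog.append(
            CatalogEntry(item["title"], item["author"],
                         item["format"], item["price"], rights)
        )
    return catalog


# Two files submitted in one session, the second with custom rights.
catalog = package_batch([
    {"title": "Song A", "author": "Artist", "format": "mp3", "price": 0.99},
    {"title": "Song B", "author": "Artist", "format": "mp3", "price": 1.29,
     "rights": ["preview", "purchase"]},
])
```

Using a default rights model with per-file overrides keeps a large submission short while still allowing individual files to be treated differently.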
2.4.2. The license server
Although the licensing system can be implemented in many ways, the license server
in a typical DRM system creates licenses for each user from content rights, user
identities and content encryption keys. The rights and possible encryption keys are
provided by the content server, and the client provides information about the user
identity. As the communications path between the license server and the client is
usually insecure, the data transmissions must be protected with public-key
cryptography. In small scale systems, the content and license servers can be
combined and used in a single process.
In addition to generating and transmitting licenses to the client, the license server
is responsible for the financial transaction of the licensing process. The license server
uses the identity of the user to fetch the necessary details concerning the transaction,
such as credit card or account details. The identity of the user can be created from a
username, social security number, or any other piece of information which accurately
identifies the user.
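The way a license combines content rights, user identity and a content encryption key can be sketched as below. This is only a sketch under stated assumptions: the `License` and `LicenseServer` names and the in-memory dictionaries are hypothetical, and both the financial transaction and the public-key protection of the transmission are deliberately omitted.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """A license ties one user to one content item, its rights and its key."""
    user_id: str
    content_id: str
    rights: tuple
    content_key: bytes


class LicenseServer:
    def __init__(self, rights_db, key_db):
        # Both tables would be supplied by the content server.
        self._rights = rights_db  # content_id -> tuple of allowed actions
        self._keys = key_db       # content_id -> content encryption key
        self.issued = []          # record of licenses, e.g. for billing

    def issue(self, user_id, content_id):
        if content_id not in self._rights:
            raise KeyError(f"unknown content: {content_id}")
        lic = License(user_id, content_id,
                      self._rights[content_id], self._keys[content_id])
        self.issued.append(lic)
        return lic


# Example data: one song with a randomly generated 128-bit key.
server = LicenseServer(
    rights_db={"song-001": ("play",)},
    key_db={"song-001": secrets.token_bytes(16)},
)
lic = server.issue("alice", "song-001")
```

In a real deployment the returned license would be encrypted for the requesting client before it leaves the server.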
2.4.3. The client
The DRM client side application can reside on a variety of platforms depending on
the usage scenario. The primary functionality of the client is contained in a DRM
controller, which can either be an independent piece of software or it can be
integrated into the content rendering application itself. In some solutions, the DRM
controller is an external piece of dedicated hardware. The main functions of the
controller are to gather identity information from the user, obtain licenses from the
license server, authorize the rendering application to have access to the content and
perform the possible content decryption. Additionally, the controller delivers the
user’s commands to the license server for requesting licenses and checking the
payment options. The DRM controller must support public-key cryptography for
secure data transmission between the client and the license server.
The usage authorization scenarios depend on the rights model applied to the content.
The basic model authorizes the user to access the content as many times as
desired for a single fee. Other models may grant or restrict access to the content
for a limited time depending on the selected payment option. Another possibility is to
restrict the number of renderings with a counter-based solution. Securing the usage
counter in the client device remains an implementation problem, especially in cases
when the user is not required to be online when accessing the content. Trusted computing and
hash-based solutions have been proposed for secure storage of the usage counter. [12]
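The hash-based idea can be illustrated with a minimal sketch, which is not taken from [12]. It assumes that the client holds a secret device key and stores the remaining number of renderings together with an HMAC tag, so that an edited counter is detected. The function names and the 4-byte counter layout are hypothetical.

```python
import hashlib
import hmac


def seal_counter(remaining: int, device_key: bytes) -> bytes:
    """Return the 4-byte counter followed by an HMAC-SHA256 tag over it."""
    payload = remaining.to_bytes(4, "big")
    tag = hmac.new(device_key, payload, hashlib.sha256).digest()
    return payload + tag


def open_counter(record: bytes, device_key: bytes) -> int:
    """Verify the tag and return the counter; raise ValueError on tampering."""
    payload, tag = record[:4], record[4:]
    expected = hmac.new(device_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("usage counter has been modified")
    return int.from_bytes(payload, "big")


def consume_rendering(record: bytes, device_key: bytes) -> bytes:
    """Spend one rendering and return the newly sealed counter record."""
    remaining = open_counter(record, device_key)
    if remaining == 0:
        raise PermissionError("no renderings left")
    return seal_counter(remaining - 1, device_key)
```

A tag of this kind only detects modification of the stored value. It does not stop the user from restoring an older, validly tagged record, which is one reason why trusted hardware is also proposed for the task.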
There are two types of content rendering applications in the client: those with
built-in DRM support, which can handle the content usage restrictions and license
processing by themselves, and non-DRM applications, which must be
restricted by the DRM system before they get access to the content. The main advantage
of the applications with built-in DRM is security. The programs allow only the specific
functions which the content provider or the rights holder has allowed. However, the
disadvantages are considerable. The vendors must distribute the application to all content
users, and in addition, the users must learn to use the new application. This can be an
enormous burden for the users. A more common approach is to let the users use their
existing rendering applications, and modify their behavior with plug-ins. For
example, the plug-in framework of Adobe Acrobat makes it possible to disable
commands such as Print and Save As. The advantage of using plug-ins for DRM
purposes is that the users usually have the base application installed, and installing an
additional plug-in is clearly a smaller burden compared to installing a whole new
application. However, making the plug-in frameworks as secure as individual
applications is problematic.
2.5. Mobile DRM
The most important player in the mobile DRM industry is the Open Mobile Alliance
(OMA), which is a standards body developing open standards for the mobile phone
industry. Its members include mobile phone manufacturers, mobile operators,
application and content providers and other IT companies.
2.5.1. OMA DRM Version 1.0
OMA DRM 1.0 was the first industry standard method for protecting mobile content.
It was approved in 2004, and it is currently supported by most mobile phones
on the market. The goal of OMA DRM 1.0 is to follow common DRM practices while
conforming to the special requirements and characteristics of the mobile domain, and
to provide basic functionality with some level of security. Version 1.0 provides three
methods for content protection and delivery. The methods, forward-lock, combined
delivery and separate delivery, are illustrated in Figure 4.
Figure 4. Content protection methods in OMA DRM 1.0.
The simplest and most supported method is forward-lock, where the content is
wrapped in a DRM message and delivered to the mobile device. After that the user
cannot send the message or the content to any other device. The send option is
removed from all applications where it would normally be, and the file cannot be
copied over USB or Bluetooth connection either. Because a forward-lock message
does not contain any rights specifications, a set of default rights are applied for the
media object.
The combined delivery method is similar to forward-lock except that the DRM
message also contains the rights specifications. The rights object defines what the
user can do with the content. For example, it can allow temporary access to the
content or allow the content to be rendered a certain number of times. As with the
forward-lock method, the content and the rights are wrapped in a DRM message.
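The two example restrictions mentioned above, temporary access and a limited number of renderings, can be sketched as a simple check. The class and field names below are hypothetical and do not follow the actual rights expression syntax of OMA DRM; the sketch only shows how a DRM agent might evaluate such a rights object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RightsObject:
    """Illustrative rights description; None means "no restriction"."""
    max_renderings: Optional[int] = None   # allowed number of renderings
    expires_at: Optional[float] = None     # expiry as Unix time in seconds
    renderings_used: int = 0


def may_render(rights: RightsObject, now: float) -> bool:
    """Return True if the content may be rendered at time `now`."""
    if rights.expires_at is not None and now >= rights.expires_at:
        return False
    if (rights.max_renderings is not None
            and rights.renderings_used >= rights.max_renderings):
        return False
    return True
```

For example, `may_render(RightsObject(max_renderings=3, renderings_used=3), now=0.0)` evaluates to `False`, whereas a rights object with both fields left as `None` always permits rendering.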
The separate delivery method provides more security by separating the content and
the rights. The content is encrypted with symmetric encryption, which makes the
content object useless for parties without the Content Encryption Key (CEK). This
allows distribution of the content via insecure transport methods. The rights and the
encryption key are wrapped in a license object, which must be delivered through
a secure transport channel. Unlike forward-lock and combined delivery, the content
object can be delivered freely without compromising the business model behind the
rights. Superdistribution, which means distributing the content objects directly
between users, is part of the separate delivery method and it should even be
encouraged by the content provider because it may bring additional customers to the
system. The users who have received new content can unlock it by acquiring the
corresponding license.
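The separation of content object and license object can be sketched as follows. This is only an illustration under stated assumptions: the function `keystream_xor` is a toy stand-in for a real symmetric cipher such as AES and must not be used for actual protection, and the dictionary layout of the license object is hypothetical.

```python
import hashlib
import secrets


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256 based keystream.

    Applying the function twice with the same key restores the data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def package(content: bytes, rights: dict):
    """Return (content_object, license_object) for separate delivery."""
    cek = secrets.token_bytes(32)                 # Content Encryption Key
    content_object = keystream_xor(content, cek)  # may travel over any channel
    license_object = {"cek": cek, "rights": rights}  # needs a secure channel
    return content_object, license_object


def unlock(content_object: bytes, license_object: dict) -> bytes:
    """Recover the content once the license has been acquired."""
    return keystream_xor(content_object, license_object["cek"])
```

In superdistribution only the content object is passed from user to user. Each recipient still has to acquire a license object of their own before `unlock` can succeed.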
2.5.2. OMA DRM Version 2.0
In the first DRM revision OMA focused on the fundamental building blocks for a
DRM system, but the security level was not high enough for creating a robust
system. The new OMA DRM 2.0 addressed these security issues with new features
based on the separate delivery method.
The new security model relies heavily on the DRM agent of the user device. The
content itself is packaged in a similar secure container encrypted with a symmetrical
content encryption key, but in addition it utilizes PKI (Public Key Infrastructure)
certificates for increased security. Every device with OMA DRM 2.0 support has an
individual PKI certificate with a public and a private key. Every rights object is then
encrypted with the public key of the receiver before it is sent over the network. The
rights object contains the symmetrical key that is used to decrypt the actual content
files.
The devices must be registered with the rights issuer before they can receive rights
objects. During the registration the client certificate is validated against a blacklist of
known hacked devices. This method allows banning the distribution of rights objects
to non-trusted devices.
2.6. DRM in digital audio
Although digital audio formats have been around for several decades, the record
companies did not start using DRM technologies with digital audio until 2002, when
BMG introduced a copy protection DRM system to be used with audio CDs. The
system failed badly as users reported the CDs would not play on PCs or car CD
players [13]. The introduction was mainly due to the popularity of the peer-to-peer file
sharing program Napster between 1999 and 2001, which forced the record industry
to start taking the threat of Internet piracy seriously. After that the use of DRM
spread to most major record labels, but the current trend seems to be to find other
solutions to the piracy issues of digital music. This section discusses the current state
of DRM technologies in audio CDs and online music stores.
2.6.1. Audio CDs
DRM technologies were previously used in digital audio CDs, but major publishers
have since abandoned the technology and CDs with DRM are no longer published.
The last publisher to give up DRM on audio CDs was EMI in January 2007. [14]
The goal of DRM in audio CDs was to prevent unauthorized copying. This was
attempted by shuffling the audio content in such a way that ripping the audio into
non-DRM formats such as WAV or MP3 would not succeed. The CDs contained a
dedicated piece of DRM software to achieve this, and often some bonus data content
for computer use was also included on the CD.
The DRM software caused many problems for legitimate users. The reason
was that the discs with DRM software were not standard-compliant
Compact Discs but CD-ROM media discs. This rendered the CDs unplayable on
some CD players and computers. Some DRM software included security
vulnerabilities that exposed the users’ computers to exploits. The most
famous incident was in 2005, when the Sony BMG DRM software was discovered to
automatically install a rootkit to a PC where the audio CD was inserted. Sony BMG
admitted the mistake and agreed to recall CDs with the security problem from stores
and publish uninstallers for computers where the rootkit had already been installed.
[15]
2.6.2. Online music stores
Digital Rights Management has become a common component of online music
stores, where it aims at restricting the usage of purchased music. Most of the
dominating music stores on the market apply some kind of DRM, but some have
decided to offer music both with and without DRM, and others have abandoned
DRM completely and sell only unprotected music files. Currently, there is a clear
trend towards selling music without DRM, because music stores have noticed that
consumers are more willing to pay for wider usability rather than usage restrictions
through DRM.
The dominant player in the online music store market is the iTunes Store. In June
2008, iTunes had 70 percent market share in online music sales [16]. It uses Apple’s
FairPlay DRM technology, which is integrated into the iTunes application used for
shopping and managing songs purchased from the iTunes Store. The songs
purchased from iTunes can only be played on a computer with iTunes installed or
on Apple’s portable media player, the iPod. Other MP3 devices or applications do not
support audio files with FairPlay DRM. Currently, FairPlay allows users to access
their music files from five computers and create a maximum of seven CD copies of
any playlist containing tracks purchased from iTunes. FairPlay DRM can be
circumvented by burning the music to a CD and then ripping it back into any
non-DRM format. In April 2007, Apple and the record label EMI announced an option
for customers to purchase DRM-free music from iTunes. This was after the CEO of
Apple, Steve Jobs, published an article in February 2007 where he disputed the
benefits of DRM and wished for record labels to allow Apple to sell music without
DRM. Jobs explained that since only 3 percent of the music on the average iPod was
protected with DRM, its significance for the music industry would be negligible.
However, the songs purchased from iTunes without FairPlay include the purchaser’s
name and other identifying information. [17] [18]
Another widely used DRM platform in online music stores is the Windows Media
DRM system by Microsoft. It was part of the PlaysForSure certification, which has
recently been rebranded as Certified for Windows Vista. PlaysForSure was created to
challenge FairPlay and to create the de facto standard for music stores other than
iTunes. It has largely achieved this, but has still remained the underdog because of the
success of Apple’s iPod devices, which are incompatible with DRM music purchased
from PlaysForSure stores [19]. However, Nokia chose PlaysForSure DRM for its
upcoming Comes With Music service, which is planned to launch during 2008.
It allows users to download an unlimited amount of music with their mobile phones
for a period of one year after purchasing the device [20]. Other large online music
stores using PlaysForSure DRM are Napster and Wal-Mart Music Downloads.
Some online music stores, such as eMusic and Amazon, have decided to sell all
their music without DRM software restrictions. This allows music purchased from
their stores to be played on any digital audio player, which is a clear advantage over
FairPlay or PlaysForSure. Some stores claim that DRM is not beneficial for sales,
and encourage publishers and independent music labels to allow distributing their
music without DRM restrictions. One example is the German online music store
Musicload, which announced in March 2007 that three out of four customer support
calls were due to problems caused by malfunctioning DRM systems. [21]
3. DIGITAL WATERMARKING FOR AUDIO
Digital watermarking is a process where information is embedded into a digital host
signal, which can be video, audio, or an image. The watermark can be visible or
invisible depending on the application. The term watermark derives from traditional
paper watermarking, where a visible mark was inserted on paper for authentication
purposes. This chapter presents the main characteristics, methods and applications
for digital watermarking with a special focus on audio watermarking. The use of
watermarking for DRM purposes is also discussed.
3.1. Background
The history of information hiding, or steganography, can be traced back 4,000 years to
ancient Egypt, where information was hidden in small adjustments of characters.
Later, the Greek storyteller Herodotus explained in his Histories that wax was often
used to cover a message on a wooden panel to send secret messages [22]. Since then,
numerous methods have been developed for hiding information, from changing the
heights of letter-strokes in a cover text to microdots and invisible ink. The
World Wars in particular spurred research into reliable and secure ways to deliver secret
messages. The advantage of steganography over cryptography is that because the
message is embedded in a cover signal, a casual observer may not even notice that
there is a hidden message in it. A plain cryptographically encoded message will
always attract attention, because it is clear that it contains some kind of valuable
information worth encrypting. While cryptography is about protecting the content
of messages, steganography is about concealing their very existence. Therefore,
cryptography and steganography are usually used in combination to ensure the
security of the message.
Traditional watermarking relates to the invention of papermaking in China, but it
did not receive broad use until the 18th century in America and Europe, where it
was used as an authentication method for books and money and also for recording
manufacturing dates [22]. Nowadays, paper watermarks are used mainly for proving
originality and complicating illegal reproduction of important documents and
banknotes.
The introduction of digital media formats opened many new possibilities for data
hiding. The quality of digital signals is higher than their analog counterparts and
copying can be done without losing signal fidelity. Digital video, audio or image can
also be easily transmitted over information networks. These advantages enabled the
possibility to hide information into a digital signal in a way that it is statistically and
perceptually undetectable. In some cases, the hidden information can be recovered
even if the digital information is compressed, edited or converted from digital to
analog format and back. [23]
The first digital watermarking publications date back to the 1980s, but a notable
increase in research activity occurred in the late 1990s, when the number of publications
related to digital watermarking rose to over 100 a year. The
increase in publications led to the first academic conference on information hiding,
which was organized in 1996. The increase was due to the concern of the publishing
industries over copyright issues, because the new technology had made copying of
digital material easy. [24]
The primary focus of digital watermarking research has always been watermarking
of digital images. The first papers on digital audio watermarking were published in
1999 and several embedding and extraction methods have been developed since then.
The main feature of all developed algorithms for audio watermarking is taking
advantage of the human auditory system (HAS) to improve the imperceptibility of
the watermarks. Compared to the human visual system, the HAS is more sensitive to
dynamic changes. This is a major challenge for audio watermarking, because
inaudibility is often a requirement for audio watermarking applications. [25]
The applications for digital watermarking generally concentrate on protection of
ownership rights of digital video, audio or image. Typical applications include digital
signatures, fingerprinting, broadcast monitoring, authentication, copy control and
secret communication. Digital signatures and fingerprints can be used for identifying
the owner and the consumer of the content. Broadcast monitoring relates to tracking
the appearance of distributed material in television or radio broadcasts. Fragile
watermarks can also be used for content authentication to make sure the content has
not been altered from the original version. This type of watermark is designed so that
it is destroyed if the content is modified. Another application is copy protection,
where the embedded information contains the rules for content usage and
distribution. Secure communication applications resemble the classical
steganography scenarios except that the communication channel is a watermark
embedded to a digital signal. [22]
3.2. Watermark characteristics
Digital watermarks have three important characteristics that are determined by the
type of application: capacity, robustness and imperceptibility. Capacity is the amount
of data that can be embedded in the watermark, robustness is the ability of the
watermark to resist modifications to the host signal, and imperceptibility means that
the watermark cannot be detected from the host signal with human senses. These
characteristics partially conflict with each other, which means that one of them can be
emphasized only at the expense of the others. Trade-offs must be accepted for optimal
performance. For example, a robust watermark cannot achieve both high capacity
and imperceptibility. Figure 5 illustrates this compromise.
Figure 5. Trade-offs in digital watermarking.
Robustness is generally the most important watermark characteristic in copy
protection scenarios. The watermark should be designed so that it is not possible to
remove the watermark without a proper secret key. Robustness also means the ability
of the watermark to resist modifications in the host signal. Resistance
to lossy compression, such as JPEG or MPEG compression, is a requirement
for most watermark applications. In the case of value-adding watermarking, usually
only robustness against unintentional attacks such as geometrical distortions,
lossy compression and AD/DA conversion is required. This is because the user would
not gain any benefit from deliberately destroying such a watermark. [26]
Imperceptibility is also an important factor in most watermarking applications. It is
affected by the embedding method and especially the embedding strength. Various
methods can be applied for finding the optimal embedding strength, which is at a
threshold where the watermark is as robust as possible but still unnoticeable. For
example, the Just Noticeable Difference (JND) method can be applied for finding the
threshold and instead of being embedded at constant strength, the watermark signal
can be dynamically scaled so that it is just below the JND level. The JND function is
very complex in reality, but depending on the signal type it can be modeled with
different techniques. [27]
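The idea of dynamic scaling can be sketched in a simplified form. The example below does not implement a JND model. As a crude stand-in it assumes that the threshold of each block lies a fixed margin below the level of the host signal in that block, and it scales the watermark block by block to sit exactly at that level. The function name, block length and margin are assumptions made for the illustration.

```python
import numpy as np


def scale_watermark(host, watermark, block_len=512, margin_db=20.0):
    """Scale each watermark block so that its RMS level is margin_db
    below the RMS level of the corresponding host block."""
    host = np.asarray(host, dtype=float)
    watermark = np.asarray(watermark, dtype=float)
    scaled = np.zeros_like(host)
    for start in range(0, len(host), block_len):
        h = host[start:start + block_len]
        w = watermark[start:start + block_len]
        h_rms = np.sqrt(np.mean(h ** 2))
        w_rms = np.sqrt(np.mean(w ** 2))
        if w_rms > 0.0:
            # A level difference of margin_db corresponds to an
            # amplitude ratio of 10^(margin_db / 20).
            gain = h_rms * 10.0 ** (-margin_db / 20.0) / w_rms
            scaled[start:start + block_len] = gain * w
    return scaled
```

With this kind of scaling, loud passages carry a stronger watermark than quiet ones and silent blocks carry none, whereas a constant embedding strength would have to be chosen according to the quietest passage.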
The capacity of the watermark can usually be increased directly at the cost of
robustness. For example, if there are multiple channels of information in the
watermark, the same information can be embedded in all of them for increased
robustness, or all channels can be used for different information for increased
capacity. As well as robustness and imperceptibility, the capacity requirement
depends on the watermark application. A simple allow-or-not copy protection can be
achieved with just one bit, whereas complex DRM applications may require
hundreds of bits for accurate rights model descriptions, user fingerprints and digital
signatures.
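The multi-channel example can be made concrete with a small sketch. It assumes five hypothetical watermark channels that each carry one bit, and compares repeating a single bit over all channels with sending five different bits.

```python
def majority_decode(channels):
    """Decode one bit that was repeated over all channels."""
    return 1 if sum(channels) * 2 > len(channels) else 0


# Robust use: one payload bit (1) repeated over five channels.
received = [1, 0, 1, 1, 0]               # two channels corrupted by an attack
robust_bit = majority_decode(received)   # the majority vote still gives 1

# High-capacity use: five different payload bits, one per channel.
payload = [1, 0, 1, 1, 0]
corrupted = [1, 1, 1, 1, 0]              # a single corrupted channel
bit_errors = sum(a != b for a, b in zip(payload, corrupted))
```

In the first case the capacity is one bit but up to two corrupted channels are tolerated. In the second case the capacity is five bits, but every corrupted channel becomes a bit error in the payload.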
Other digital watermarking characteristics include algorithm complexity and
performance. Generally, the more complicated the algorithm is, the harder it is to
break the system. On the other hand, large and complex systems are harder to
manage and they are more prone to programming errors. Complex algorithms also
lead to increased processing time when embedding or extracting the watermark. This
can be an issue in DRM systems where performance is vital for the system to
function properly. For example, watermark extraction must not cause delays when
playing a video or an audio file.
3.3. Watermarking methods
This section presents the general watermarking scheme and introduces the most
common domains where digital watermarks are embedded. Each watermarking
method consists of an embedding and an extraction algorithm. The embedding
algorithm inserts the watermark into the content data and the extraction algorithm
reads the watermark information from the data. However, in some applications just
verifying the existence of the watermark is required.
3.3.1. General watermarking scheme
The general watermarking scheme introduces the basic functional principles of a
digital watermarking system. The scheme consists of watermark embedding and
extraction processes. The most important component of both processes is the
algorithm, which handles the system inputs and produces the output of the process.
The algorithms are discussed in more detail in the remainder of section 3.3, where the
different watermarking methods are introduced.
The basic framework of the watermark embedding process consists of the
embedding algorithm, the original data contents and the watermark information as
input, and the produced watermarked data contents as output. An optional
watermarking key can be utilized in the embedding process depending on the
algorithm details. The purpose of the key is to increase the security of the system so
that the security does not depend on the algorithm being secret. The watermark
information can be in any digital format the algorithm understands, but usually a bit
array is used because of the small capacity of the watermark. The input data can be
video, audio or an image signal, and the output data is generally in the same format
as the input, but with the watermark information inserted in the data contents. The
general embedding process is illustrated in Figure 6.
Figure 6. A general embedding scheme of digital watermarking.
The watermark extraction framework contains the extraction algorithm, the
watermarked data contents as input and the extracted watermark information as
output. The output of the extraction process can be a similar bit array as in the
embedding phase or simply whether the watermark was found or not. The general
watermark extraction scheme is presented in Figure 7.
Figure 7. A general extraction scheme of digital watermarking.
If the optional watermarking key was used in the embedding phase, it is usually
required in the extraction phase as well. Furthermore, if the original data contents are
used in the extraction process, the term informed, non-blind or private watermarking
is used. This means that the extraction process takes advantage of the original
content while extracting. Depending on the implementation of the algorithm, the
informed extraction techniques can greatly facilitate the watermark extraction. The
simplest informed extraction method is to subtract the original data from the
watermarked data, so the remaining data contains only the watermark in case the
watermarked data is not modified after the embedding process.
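The subtraction-based informed extraction can be sketched in a few lines. The signals below are synthetic stand-ins, and the additive embedding used to create the watermarked signal is an assumption made only for the example.

```python
import numpy as np


def informed_extract(watermarked, original):
    """Recover an additive watermark by subtracting the original signal."""
    return np.asarray(watermarked, dtype=float) - np.asarray(original, dtype=float)


rng = np.random.default_rng(0)
original = rng.normal(0.0, 0.1, 1000)                  # stand-in for host audio
watermark = 0.001 * rng.choice([-1.0, 1.0], size=1000)
watermarked = original + watermark                     # assumed additive embedding
recovered = informed_extract(watermarked, original)
```

If the watermarked signal has been processed after embedding, the difference also contains the processing distortion, and the watermark has to be detected within it, for example by correlation.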
If the original data is not required in the extraction process, the term blind or
public watermarking is used. This is easier to manage because the presence of the
original data can be a difficult requirement in some extraction schemes. The blind
watermarking scenario is also more challenging in terms of algorithm development,
but it also allows a wider range of possibilities for application development. The
term semi-private watermarking is also sometimes used. It does not use the original
cover signal in the detection process like private watermarking, but instead a
different published watermarked signal. [28]
3.3.2. Watermarking domains
Watermarks can be embedded in audio in the time domain or in some transform domain,
such as the Fourier domain. The selection of domain affects the properties of the
watermark concerning imperceptibility and robustness. Frequency domain
watermarks are generally considered more inaudible, but they are especially
vulnerable to frequency modifications such as pitch shifting or dynamic
compression. Time domain watermarking techniques generally use spread spectrum
based watermarking. Other domains used for audio watermarking are the wavelet
domain and the cepstrum domain, the cepstrum being basically the Fourier transform
of the decibel spectrum of the signal. [29]
Watermarking in the Fourier domain is based on the Fourier transform of the
signal. It is one of the most important tools of modern digital signal processing,
although the Fourier series was introduced already in the early 19th century by Joseph
Fourier. By using the Fourier transform, the signal can be presented in frequency
domain, where its frequency components are easily modifiable. The discrete Fourier
transform (DFT) X(k) of a discrete signal x(n) of length N is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i k n / N}, \qquad (1)$$

where k = 0, …, N-1. In actual implementations, the DFT is never calculated with the
formula (1), but rather using an efficient Fast Fourier Transform (FFT) algorithm.
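As an illustration, formula (1) can be evaluated directly and compared with an FFT routine. The sketch below uses NumPy, and the function name `naive_dft` is hypothetical. The direct evaluation needs on the order of N² operations, which is why it is only useful for checking small examples.

```python
import numpy as np


def naive_dft(x):
    """Evaluate formula (1) directly as a matrix-vector product."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                       # one row per output index k
    return np.exp(-2j * np.pi * k * n / N) @ x


x = np.array([1.0, 2.0, 3.0, 4.0])
X_direct = naive_dft(x)
X_fft = np.fft.fft(x)                          # same result, O(N log N)
```

Both calls return the same coefficients up to rounding errors, since `numpy.fft.fft` uses the same sign and scaling convention as formula (1).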
Frequency domain watermarking takes advantage of the insensitivity of the human
auditory system to phase variations. It can also benefit from techniques similar to
those used in audio compression, such as psychoacoustic models. [30] Embedding a
watermark in the Fourier domain basically means modifying the frequency coefficients
of the Fourier transformed signal. The embedding can be done in individual
coefficients, or spread spectrum or frequency hopping techniques can be utilized. The
frequency hopping method is presented in section 3.3.4. One possibility is to modify the coefficient
magnitudes by a specified amount of decibels (dB). The magnitudes of the DFT
coefficients are described as

$$|X(k)| = \sqrt{(X_{Re}(k))^2 + (X_{Im}(k))^2}, \qquad (2)$$

where $X_{Re}$ is the real part and $X_{Im}$ is the imaginary part of the DFT. The coefficient
magnitudes can be converted into decibel domain with the formula

$$X_{dB}(k) = 10 \times \log_{10}(|X(k)|), \qquad (3)$$

where $X_{dB}(k)$ contains the coefficient magnitudes in the decibel domain. The
advantage of using the decibel domain is that handling large differences of
watermark intensities is simpler. This technique is utilized in [2] and [29].
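Formulas (2) and (3) and the modification of a coefficient magnitude by a given number of decibels can be sketched with NumPy as follows. The function names are hypothetical, and the sketch follows the decibel convention of formula (3), so a change of `delta_db` multiplies the magnitude by 10^(delta_db / 10). The phase of the coefficient is left unchanged.

```python
import numpy as np


def magnitude_db(X):
    """Formulas (2) and (3): coefficient magnitudes in the decibel domain."""
    return 10.0 * np.log10(np.abs(X))


def shift_magnitude_db(X, k, delta_db):
    """Return a copy of X where the magnitude of coefficient k is changed
    by delta_db decibels while its phase is kept."""
    X = np.array(X, dtype=complex)
    new_mag = 10.0 ** ((magnitude_db(X[k]) + delta_db) / 10.0)
    X[k] = new_mag * np.exp(1j * np.angle(X[k]))
    return X


X = np.fft.fft([1.0, 2.0, 3.0, 4.0])
Y = shift_magnitude_db(X, k=1, delta_db=3.0)
```

Two practical points are not covered by the sketch. A coefficient with zero magnitude has no finite decibel value and needs a guard, and for a real-valued signal the mirrored coefficient N - k must be changed in the same way, or a real-input transform such as `numpy.fft.rfft` should be used, so that the inverse transform stays real.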
3.3.3. Direct sequence spread spectrum method
Spread spectrum watermarking means that the power of the watermark information is
deliberately spread wider in the frequency domain in order to hide the signal more
efficiently in the cover signal. Currently, spread spectrum methods are the most
popular watermarking methods in the literature. Two types of spread spectrum
methods are generally used in digital watermarking: frequency hopping and direct
sequence spread spectrum methods. The frequency hopping method is based on fast
switching of the carrier frequency according to a pseudorandom sequence, which
must be known both in the embedding and extraction phases. The direct sequence
method spreads the watermark signal into a wider band signal, also created from a
pseudorandom sequence. [31]
In direct sequence spread spectrum watermarking, the watermark signal
constructed from pseudorandom sequences can be added to the cover signal by
simply adding or subtracting the samples. As the pseudorandom sequence is
generally much shorter than the host signal, the sequence is repeated for every block
of the host signal. One possible method is to add the pseudorandom signal to the
block if the bit to be embedded is one, and subtract if the bit is zero. This method is
illustrated in Figure 8. This kind of approach keeps the computational complexity of
the embedding algorithm very low for facilitating real-time usage.
Figure 8. An example of embedding a direct sequence spread spectrum watermark.
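The add-or-subtract embedding described above can be sketched with NumPy as follows. The function names, the use of a seeded random generator as the keyed pseudorandom source and the scaling factor `alpha` are assumptions made for the illustration.

```python
import numpy as np


def make_pn_sequence(length, key):
    """Pseudorandom sequence of +1/-1 values; the key seeds the generator."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)


def embed_dsss(host, bits, pn, alpha):
    """Add alpha * pn to a block for bit one, subtract it for bit zero."""
    block_len = len(pn)
    if len(host) < block_len * len(bits):
        raise ValueError("host signal is too short for the payload")
    out = np.array(host, dtype=float)
    for i, bit in enumerate(bits):
        start = i * block_len
        sign = 1.0 if bit == 1 else -1.0
        out[start:start + block_len] += sign * alpha * pn
    return out


pn = make_pn_sequence(1024, key=1234)
host = np.random.default_rng(0).normal(0.0, 0.1, 4 * 1024)   # stand-in audio
watermarked = embed_dsss(host, bits=[1, 0, 1, 1], pn=pn, alpha=0.01)
```

The embedding costs one multiplication and one addition per sample, which is in line with the low computational complexity mentioned above.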
The embedded information can be extracted from a direct sequence spread spectrum
watermark by calculating the cross-correlation between the original pseudorandom
sequence and the watermarked signal block by block. If the pseudorandom sequence
has been embedded with a large enough scaling factor, the cross-correlation will show
a spike at the middle position of the sequence. The spike will be positive if the
embedded bit was one, and negative if the bit was zero. Figure 9 illustrates the
extraction process of direct sequence spread spectrum watermarking.
Figure 9. An example of extracting a direct sequence spread spectrum watermark.
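The correlation-based extraction can be sketched as follows. The example assumes that the blocks are already aligned, so only the zero-lag value of the cross-correlation is evaluated; this value equals the middle sample of the full cross-correlation of a block with the pseudorandom sequence. The host signal is synthetic noise and the parameter values are chosen only for the illustration.

```python
import numpy as np


def extract_dsss(watermarked, pn, n_bits):
    """Decide each bit from the sign of the zero-lag correlation with pn."""
    block_len = len(pn)
    bits = []
    for i in range(n_bits):
        block = watermarked[i * block_len:(i + 1) * block_len]
        spike = float(np.dot(block, pn))    # middle of the full correlation
        bits.append(1 if spike > 0.0 else 0)
    return bits


rng = np.random.default_rng(1)
pn = rng.choice([-1.0, 1.0], size=4096)
bits = [1, 0, 0, 1, 1, 0, 1, 0]
host = rng.normal(0.0, 0.1, size=len(bits) * len(pn))        # stand-in audio
signs = np.repeat([1.0 if b == 1 else -1.0 for b in bits], len(pn))
watermarked = host + 0.02 * signs * np.tile(pn, len(bits))   # add or subtract pn
recovered = extract_dsss(watermarked, pn, len(bits))
```

The decision is reliable here because the watermark contributes 0.02 · 4096 ≈ 82 to each correlation value, while the contribution of the host has a standard deviation of only about 0.1 · √4096 = 6.4. With a smaller scaling factor or a shorter sequence the spike would no longer stand out from the host signal.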
In addition to fast computation, the advantage of the direct sequence method is
fairly good robustness against different signal processing attacks. The downside is,
however, that the watermark signal becomes audible relatively easily if the power of
the watermark signal is increased too much. To achieve maximum inaudibility and
robustness for the spread spectrum watermark, several methods have been suggested
for analyzing the cover audio. Such methods include the psychoacoustic models
for perceptual transparency and the whitening procedure for improved correlation
in the extraction phase used in [33], and the temporal masking and m-sequences
used to increase correlation strength in [32].
Another important usage for direct sequence spread spectrum methods in audio
watermarking is synchronization. It is a procedure for determining the exact location
of the watermark in the extraction process. Finding the location is important, because
generally the watermark is embedded block by block in the audio, starting from some
specified position. The synchronization can be performed either by inserting the
synchronization signal once at the beginning of the block sequence or at the
beginning of each block. The former method is faster, but the latter provides more
robustness due to individual synchronization of each block.
The synchronization signal is usually a pseudorandom spread spectrum signal
similar to those used in the direct sequence methods, except that the synchronization
signal can be much longer. In the extraction process, the synchronization point is
found by calculating the cross-correlation of the original synchronization signal and the
watermarked signal. The spike in the cross-correlation result determines the
synchronization offset, to which the signal must be shifted before starting the
extraction of the actual watermark. Direct sequence spread spectrum watermarks
are inherently self-synchronizing, but a separate synchronization signal can
be used for increasing the robustness of the watermark. This is because the
pseudorandom sequence for the watermark data is usually much shorter than the
synchronization sequence. Separate synchronization signals must be used if the
watermark is embedded with the frequency hopping method.
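The search for the synchronization offset can be sketched with `numpy.correlate`. The example hides a synchronization sequence at an offset that the extractor does not know and locates it from the position of the correlation spike. The sequence length, the embedding strength and the synthetic host signal are assumptions made for the illustration.

```python
import numpy as np


def find_sync_offset(signal, sync):
    """Return the lag at which the cross-correlation with sync peaks."""
    corr = np.correlate(signal, sync, mode="valid")   # one value per lag
    return int(np.argmax(corr))


rng = np.random.default_rng(2)
sync = rng.choice([-1.0, 1.0], size=2048)     # synchronization sequence
host = rng.normal(0.0, 0.1, size=8192)        # stand-in for host audio
true_offset = 1234                            # unknown to the extractor
marked = host.copy()
marked[true_offset:true_offset + len(sync)] += 0.03 * sync
found_offset = find_sync_offset(marked, sync)
```

The extraction of the actual watermark would then start from `found_offset + len(sync)`, assuming that the data blocks directly follow the synchronization signal.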
3.3.4. Frequency hopping method
The frequency hopping method is very different in nature from the direct sequence
method. Instead of being a wide band signal, the frequency hopping watermark is
present only in very narrow bands at any given time. The frequency of the signal changes
rapidly over time according to a pre-defined pseudorandom sequence. The frequency
hopping band defines limits for the hopping sequence. The pseudorandom sequence
defining the frequency hopping sequence can be used as the watermark key for
securing the exact location of the watermark signal in the frequency coefficients.
An example of the frequency hopping method is presented in [29]. It divides the
host audio into blocks of 1024 FFT coefficients and selects two coefficients
according to the pseudorandom frequency hopping sequence. The method changes
the values of these coefficients to the subband mean, which is calculated from the
coefficients around the two coefficients. If a one bit is embedded, the lower coefficient
magnitude is set K decibels higher and the higher coefficient is set K decibels lower.
If a zero bit is embedded, the procedure is the opposite. This method is illustrated in
Figure 10. The watermark strength is directly determined by the value of K.
Therefore, K cannot be higher than the distance from the subband mean value to the
frequency masking threshold in order for the watermark to remain below the JND
level. The presented method also includes attack characterization, which analyzes the
host signal blocks with different signal processing methods in order to achieve
maximum robustness against MPEG compression.
Figure 10. A frequency hopping method for embedding digital watermarks.
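The embedding rule described above can be sketched in a simplified form. The sketch is not the implementation of [29]: it omits the masking threshold and the attack characterization, and several details are assumptions, namely the block length, the hopping band, the use of two adjacent coefficients as the pair, the interpretation of the lower coefficient as the one with the lower frequency, and the width of the subband used for the mean. Decibel values follow formula (3).

```python
import numpy as np

BLOCK = 2048         # samples per block (assumption)
BAND = (50, 400)     # hopping band as coefficient indices (assumption)
HALF_SUBBAND = 4     # extra coefficients on each side of the pair (assumption)


def hop_positions(key, n_blocks):
    """Keyed pseudorandom hopping sequence: one coefficient index per block."""
    rng = np.random.default_rng(key)
    return rng.integers(BAND[0], BAND[1], size=n_blocks)


def embed_fh(host, bits, key, k_db):
    """Embed one bit per block into the coefficient pair (k, k + 1)."""
    out = np.array(host, dtype=float)
    for i, (bit, k) in enumerate(zip(bits, hop_positions(key, len(bits)))):
        seg = slice(i * BLOCK, (i + 1) * BLOCK)
        X = np.fft.rfft(out[seg])
        mag_db = 10.0 * np.log10(np.abs(X) + 1e-12)
        mean_db = mag_db[k - HALF_SUBBAND:k + HALF_SUBBAND + 2].mean()
        delta = k_db if bit == 1 else -k_db
        # Bit one: lower coefficient K dB above the subband mean, higher
        # coefficient K dB below it. Bit zero: the opposite.
        for idx, target in ((k, mean_db + delta), (k + 1, mean_db - delta)):
            X[idx] = 10.0 ** (target / 10.0) * np.exp(1j * np.angle(X[idx]))
        out[seg] = np.fft.irfft(X, n=BLOCK)
    return out


def extract_fh(signal, n_bits, key):
    """Read each bit by comparing the magnitudes of the coefficient pair."""
    bits = []
    for i, k in enumerate(hop_positions(key, n_bits)):
        X = np.fft.rfft(signal[i * BLOCK:(i + 1) * BLOCK])
        bits.append(1 if abs(X[k]) > abs(X[k + 1]) else 0)
    return bits


payload = [1, 0, 1, 1, 0]
host = np.random.default_rng(0).normal(0.0, 0.1, len(payload) * BLOCK)
marked = embed_fh(host, payload, key=42, k_db=3.0)
recovered = extract_fh(marked, len(payload), key=42)
```

Without the key the extractor does not know which coefficients carry the watermark in each block, which is the securing role of the hopping sequence described above. In an unprocessed signal the two coefficients differ by exactly 2K decibels, so the comparison recovers the payload.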
3.4. Digital watermarks in DRM
Digital watermarks can be used for protecting and managing rights of digital audio
content. Because watermarking differs in nature from traditional DRM solutions, it
introduces new possibilities for creating successful DRM solutions. The following
sections present the most important features and applications for watermarks in
DRM.
3.4.1. Watermarking and encryption
The general approach of DRM in audio is to use some proprietary format where the
content and the rights management metadata are encrypted. Before playback, the file
must be decrypted with a DRM controller, as described in section 2.4.3. After
decryption, the content is separated from the metadata, which causes a security risk.
Audio capturing methods exploit this risk, called the analog hole, which arises at
the latest when the playback starts and the audio is converted to analog format. This
enables capturing the audio without the DRM metadata. With watermarking, the
metadata is always part of the content, and there is no analog hole problem. Even if
the audio was captured at any point, the rights management metadata would still be
present in the audio content. Figure 11 illustrates this difference between encryption and
watermarking. [3]
A more efficient solution is achieved when encryption and watermarking are used
concurrently. Encryption can be used with watermarking for creating a secure
end-to-end communications channel. Generally, there are two basic ways of combining
watermarking and encryption. The first is to encrypt the watermark information, but
leave the content file unencrypted. This protects the watermark so that even if
an outsider were able to extract the metadata from the content, the information
would still be encrypted. The other solution is to encrypt both the watermark and the
content file. This provides the maximum security, but is also more complex in terms
of calculation time when the metadata needs to be read prior to content playback.
This approach also removes the benefit that the audio would be playable on all media
players. Figure 12 presents an example DRM scenario where the audio file is
encrypted after the watermark has been inserted. [3]
Figure 11. Comparison of encryption and watermarking in DRM.
Figure 12. Encryption and watermarking can be used together in order to create
a more efficient DRM solution.
The goals of encryption and watermarking are very different and the choice of
which technology to use must be considered with the specific application in mind.
Encryption provides access control so that only authorized users with proper
encryption keys can have access to the content. However, the protection provided by
encryption is only in use when the content is encrypted. After decryption the content
is as vulnerable as without encryption. This access control problem has proven to be
problematic for the music industry, but it can be complemented with digital
watermarking in order to create a more robust DRM solution. [34]
3.4.2. Digital fingerprinting
Digital fingerprinting is a technology for digital rights management where unique
identifiers, known as digital fingerprints, are embedded in content before
distribution. The technique is mostly used with digital audio and video, but in
principle it can be applied to digital images as well. Conventional digital
watermarking methods can be used in the fingerprint embedding process. These
methods are discussed in more detail in section 3.3.
The purpose of digital fingerprints is to trace the owner of a particular copy of
multimedia content in case the content is leaked to unintended domains. To achieve
this, the unique identifier in the content must be linked to the owner in some way, for
example with the user ID database used by the content vendor. Then, if a copy of the
content is discovered on the Internet, the user who made it available can be traced by
reading the fingerprint in the content. Figure 13 illustrates the process of embedding
a digital fingerprint in an audio file. The user, Alice, has just purchased a music file
from an online music vendor and the vendor’s server adds Alice’s customer ID as a
fingerprint to the audio. The resulting unique fingerprinted copy of the audio is then
distributed to Alice.
Figure 13. A customer-specific identifier is embedded in the audio content prior to
distribution in order to create a unique digital fingerprint.
Digital fingerprints should be imperceptible and they should not be easily removed
even if the content is tampered with. Imperceptibility is vital because users
would not have any motivation to select a service where the content quality is not as
good as in other services. The fingerprint robustness is also important, because
culprits can try to remove the fingerprint with different attacks, and even non-hostile
users may perform file type conversions or other operations which modify the
content slightly. Therefore, the fingerprint should be as robust as possible while still
remaining imperceptible. The capacity is usually not an important factor, since the
number of unique identifiers doubles with every additional fingerprint bit.
3.4.3. Tamper detection
Another important application scenario for digital watermarking in DRM use is
tamper detection. It provides means for determining whether the content has been
modified from the original version. This is achieved by embedding a fragile
watermark in the original content, which is then extracted in the content verification
phase. The fragile watermark is designed so that it is destroyed if the content is
modified in any way, so if the watermark can be extracted successfully then it is
certain that the content has not been modified from the original version.
The main challenge in this application is to prevent unauthorized insertion of the
authentication watermark to tampered or unauthorized multimedia signals. It can also
be desirable to detect specific changes to the content, such as lossy compression,
which can be distinguished from actual content tampering. Most tamper detection
applications use blind watermarking, because of the unavailability of the original
signal. [34]
3.4.4. Attacking digital watermarks
Attacks on digital watermarks can be defined as intentional or unintentional
modifications to the watermarked signal. That is, every small change to the signal
can be considered an attack on the watermark. The watermarking method used
determines which attacks the watermark resists better than others. The
required robustness level also depends on the watermarking application scenario.
The attacks can be generally divided into two categories: friendly and hostile
attacks. Friendly attacks are usually unintentional: the user does not have any
knowledge of the watermark and/or its embedding procedure. Hostile attacks are
always intentional and they aim at destroying the watermark. An example of a
friendly attack could be a radio station performing an audio preparation process for
the audio material. The audio can be normalized to the correct volume level,
equalized for better perceived quality and possibly also run through noise removal,
which removes unwanted parts of the audio. A more common friendly attack on
watermarked audio is MP3 compression, where the severity of the attack depends
heavily on the compression rate used.
The growing number of attacks has led to the development of special applications
for testing the robustness of embedded watermarks. A properly defined benchmark
can also function as a performance comparison tool for different watermarking
algorithms. The StirMark application has been created with the benchmark aspect in
mind. It aims at providing a trusted third party watermark evaluation tool for image,
audio and video watermarking. [35]
There are various types of attacks on digital audio watermarks. Based on their basic
characteristics, the attacks can be classified into a few basic groups.
Dynamics attacks change the dynamic profile of the audio. They can be linear or nonlinear, which modify the spectral components depending on the frequency. Filter
attacks cut off or increase a specific band from the spectrum. The simplest filter
attacks are low-pass and high-pass filters, but more complex ones can also be used.
Ambience attacks create effects similar to those naturally present in a closed room,
most commonly reverb and delay. Conversion attacks are caused by changing the
audio format. Sampling rate, bit depth or the number of channels are the usual
properties affected by the audio format selection. Lossy compression attacks use
some specific algorithm to remove information from the audio, which causes loss of
quality. The MP3 compression falls into this category. Noise attacks add some type
of noise to the audio signal. Modulation attacks include effects such as chorus and
flanging. Time stretch and pitch shift attacks change the audio length while keeping
the pitch constant, or change the pitch while keeping the audio length constant.
The pitch shift in particular is one of the most sophisticated attacks and it can cause
problems for watermarks embedded on specific narrow band frequencies. [35]
The attacker is not limited to a single attack, but can instead perform
multiple attacks on the same watermarked audio track. This presents a new challenge
to the watermarking algorithms because they have to resist many interference signals
at the same time. However, the attacker must keep in mind that the perceptual quality
of the audio suffers more from a group attack and therefore the attack strength must
be lower than if performing just a single attack.
The goal of the attacker is to modify the watermarked audio signal just enough for
the watermark to be destroyed, because he wants to keep the audio quality as high as
possible. On the other hand, the watermark is robust enough if it survives every
attack up to the point where the cover signal is degraded enough to
make the listening experience noticeably worse. An example of a successful attack
scenario is presented in Figure 14. The attack modifies the watermarked audio in a
way that the extraction algorithm is unable to extract the watermark information
correctly, but the audio content is still of good quality so the listening experience
remains unaffected.
Figure 14. A successful attack destroys the metadata of the watermark, but does
not affect the listening experience too much.
Every watermarking method is strong against some attacks and weak against
others depending on the embedding details. For example, if the watermark is
embedded in the high frequencies, then a low-pass filter can be a tough attack. If the
frequency hopping method is used, then a dynamics attack and the pitch shifter are
the most dangerous adversaries. Figure 15 presents a spectral frequency display of
two audio signals. The first is the original unmodified version and the second is a
version where the pitch of the audio has been shifted by 5 percent. It can be seen that
the frequency components are compressed in the frequency scale. Because a pitch shift
scales every frequency by the same ratio, the absolute displacement is larger in the
high frequencies and smaller in the low frequencies.
Figure 15. The spectral frequency display of an audio clip before and after a pitch
shift attack shows that the change is especially notable in the high frequencies.
In the case of digital fingerprinting, there is one special type of attack to be
considered, namely the multi-user collusion attack. It is one of the most sophisticated
attacks against digital fingerprints, because instead of being performed just by one
malicious user, the collusion attack is a group attack. First, all users participating in
the attack get the fingerprinted content legitimately, and then they use averaging
methods to attenuate all fingerprints. If the fingerprint embedding and identification
scheme does not take this attack into account, the collusion attack can relatively
easily destroy all fingerprints from the content. [34]
Figure 16. A collusion attack, where two users combine their fingerprinted copies into
a colluded copy in which the fingerprint is destroyed.
Figure 16 presents a scenario where two users, Alice and Bob, use their
fingerprinted copies to create a colluded copy, where the fingerprint is destroyed. In
practice, dozens of users may take part, but the principle is the same. The attack can be
performed by directly averaging the corresponding samples of the synchronized
fingerprinted files of all participating users. This is an example of a linear attack, which is a simple
and effective way of attenuating digital fingerprints. Because of the good perceptual
quality of the fingerprinted audio clips, nonlinear attacks can be used as well. An
effective nonlinear attack is to analyze the minimum, maximum and median of the
corresponding sample values of all fingerprinted versions and use some function of them
for determining the final output value for the colluded copy. [34]
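The attenuating effect of both attack types can be illustrated with a short Python sketch. It is not taken from [34]: the additive fingerprint model, the random +1/-1 fingerprints, the non-blind correlation detector and the particular mixing function of the nonlinear attack are assumptions chosen for the example.

```python
import random
import statistics

def fingerprinted_copy(host, fingerprint, strength=0.01):
    """Assumed additive model: host sample plus a scaled fingerprint chip."""
    return [h + strength * f for h, f in zip(host, fingerprint)]

def average_collusion(copies):
    """Linear attack: average the corresponding samples of all copies."""
    return [sum(samples) / len(samples) for samples in zip(*copies)]

def order_statistic_collusion(copies):
    """Nonlinear attack built from the minimum, maximum and median.

    The weighting (lo + hi + 2 * median) / 4 is an arbitrary example
    of "some function" of the three statistics.
    """
    out = []
    for samples in zip(*copies):
        lo, hi = min(samples), max(samples)
        med = statistics.median(samples)
        out.append((lo + hi + 2 * med) / 4)
    return out

def detector_score(suspect, host, fingerprint):
    """Non-blind detector: correlate (suspect - host) with a fingerprint."""
    total = sum((s - h) * f for s, h, f in zip(suspect, host, fingerprint))
    return total / len(host)

def demo(n_users=4, n_samples=4096, seed=1):
    """Return user 0's detector score for a single copy and both attacks."""
    rng = random.Random(seed)
    host = [rng.uniform(-1, 1) for _ in range(n_samples)]
    prints = [[rng.choice((-1.0, 1.0)) for _ in range(n_samples)]
              for _ in range(n_users)]
    copies = [fingerprinted_copy(host, p) for p in prints]
    single = detector_score(copies[0], host, prints[0])
    averaged = detector_score(average_collusion(copies), host, prints[0])
    nonlinear = detector_score(order_statistic_collusion(copies), host, prints[0])
    return single, averaged, nonlinear
```

With N colluders, averaging reduces each participant's detector score to roughly 1/N of the single-copy value, so a detector threshold tuned for single copies can miss every participant.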
However, a good collusion-resistant fingerprint can survive a collusion attack and
identify all or part of the participants in the attack. One possibility is to use
orthogonal watermarks as fingerprints, which eases the fingerprint distinction process.
The other solution is to use code modulation, creating user fingerprints from
linear combinations of orthogonal basis signals. This method introduces
correlation between fingerprints, so more fingerprints can be used than would normally
be possible for a given dimensionality. [34][36]
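The orthogonal case can be sketched in Python as follows. The example is an illustration under stated assumptions, not the scheme of [34] or [36]: the fingerprints are rows of a Sylvester-type Hadamard matrix, the embedding is additive, the detector is non-blind, and the threshold is chosen by hand. Code modulation is not covered by the sketch.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).

    The rows are mutually orthogonal sequences of +1 and -1.
    """
    rows = [[1]]
    while len(rows) < n:
        rows = ([r + r for r in rows] +
                [r + [-x for x in r] for r in rows])
    return rows

def fingerprint_copy(host, code, strength=0.01):
    """Assumed additive embedding of one user's code."""
    return [h + strength * c for h, c in zip(host, code)]

def average_collusion(copies):
    """Averaging attack over the colluders' copies."""
    return [sum(samples) / len(samples) for samples in zip(*copies)]

def identify(suspect, host, codes, threshold):
    """Return the indices of all users whose code correlates with the residual.

    Because the codes are orthogonal, an innocent user's score is zero
    (up to rounding), while each of K colluders scores strength / K.
    """
    residual = [s - h for s, h in zip(suspect, host)]
    accused = []
    for user, code in enumerate(codes):
        score = sum(r * c for r, c in zip(residual, code)) / len(code)
        if score > threshold:
            accused.append(user)
    return accused
```

The limitation named in the text is visible here: n orthogonal codes need at least n dimensions, so the number of users is bounded by the fingerprint length unless correlated codes are allowed.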
3.5. Removable watermarking
Removable watermarking is a special technique where the watermark is embedded in
a way that it can be removed from the host signal after embedding. It is also
sometimes called reversible, invertible or erasable watermarking.
Embedding a reversible watermark is different from conventional watermarking
because the embedding algorithm must store the recovery information for the
watermark. This information is used in the watermark removal process for accurate
recovery of the original signal. So far, research has mainly concentrated
on removable watermarking of digital images, but most of the methods can be
applied to audio signals as well.
The methods for reversible watermarking for images and video have been divided
into three categories: data compression, difference expansion and histogram bin
exchanging. Data compression methods embed the recovery information required for
removing the watermark into the watermark itself. In difference expansion, the
watermark data is embedded in expanded values of some small numbers which
represent the features of the original data. The third method category relies on
shifting the histogram bins according to the watermark information. [37]
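As an illustration of the difference expansion category, the following Python sketch embeds one bit into a pair of integer values and restores the pair exactly. It follows the commonly described integer average and difference formulation and is not taken from [37]. The function names are hypothetical, and the handling of values that would leave the valid sample range after expansion, which a practical scheme needs, is omitted.

```python
def de_embed(x, y, bit):
    """Embed one bit into the integer pair (x, y) by difference expansion.

    The integer average l is kept, and the difference h is doubled
    with the bit placed in its least significant position.
    """
    l = (x + y) // 2          # integer average (floor)
    h = x - y                 # difference
    h2 = 2 * h + bit          # expanded difference carrying the bit
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x2, y2):
    """Recover the original pair and the embedded bit."""
    h2 = x2 - y2
    bit = h2 & 1              # the bit is the LSB of the expanded difference
    h = h2 // 2               # original difference (floor division)
    l = (x2 + y2) // 2        # the integer average is unchanged by embedding
    return l + (h + 1) // 2, l - h // 2, bit
```

For example, the pair (206, 201) with bit 1 becomes (209, 198): the average stays at 203 while the difference grows from 5 to 11. The distortion therefore grows with the original difference, which is why smooth regions with small differences are preferred for embedding.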
The major challenges for removable watermarking are embedding capacity and
robustness [37]. Capacity is often limited because the amount of required recovery
information usually increases with the increased capacity. It is not necessarily a
requirement for all applications, but it certainly limits the usage of removable
watermarking to a certain set of applications.
3.6. Commercial solutions
Although digital watermarking is a relatively new research topic, it has received a
steadily growing interest from the media industry which is constantly finding more
applications for utilizing it in various scenarios. Most of the commercial
development has concentrated on digital image watermarking, but a few digital audio
watermarking solutions have also been developed.
The usage of digital watermarking techniques in the media industry is
wide-ranging. It is used by broadcasters to track and measure TV programming and
advertising, and by movie studios and music labels to deter content piracy. Media
and entertainment companies use watermarking to identify and manage media assets,
photographers and image aggregators to manage image copyrights, and satellite
image providers to verify ownership of their images. Governments also authenticate
IDs and prevent document counterfeiting with the help of digital watermarking
applications. [38]
3.6.1. MarkAny
MarkAny is a Korean company developing digital security, authentication and
copyright protection applications. It provides image, video and audio watermarking
products and also a CastLog broadcast monitoring system for video and audio
content. MarkAny also has a product aiming at protecting user-generated content
with digital watermarks from unauthorized copying. MarkAny watermarking
technology has been implemented for applications in mobile commerce, document
security and forensic tracking usage.
The audio watermarking solution of MarkAny, MAO 2.0, is a product
concentrating on copyright protection. It embeds copyright information in audio
content, and it features imperceptible watermark embedding and robustness against
audio compression and signal processing. It is essentially a software library which
can be integrated into other applications. The current version supports only the WAV
audio format, but it can be applied to other service providers, extending the support
to MP3 or other file formats. MAO 2.0 is certified by the Secure Digital Music Initiative
(SDMI). MAO 2.0 is mainly targeted at music producers, audio software developers
and the music content industry in general. [39]
3.6.2. Verance
Verance provides cross-platform copy protection solutions for video and music
content. They are widely used in business and consumer applications, most notably
in Blu-ray, HD DVD, DVD-Audio and SD-Audio formats. Verance provides a set of
embedders and verifiers for inserting the watermark into video and audio content.
Detectors are also provided for various platforms for reading the watermark
information. [40]
Verance Copy Management System for Audio content (VCMS/A) is a copy
protection solution for DVD-Audio, SD-Audio and SDMI portable device consumer
product formats. It claims to be the only solution providing persistent and
self-identifying copyright management across all music distribution formats. Some
product manufacturers have inserted detectors in their DVD-Audio, SD-Audio and
SDMI portable devices which enable detection of VCMS/A watermarks in audio
content. The detection module then interprets the watermarked information and
delivers it to the playback and record control software. [40]
The Verance audio watermarking technology has been included in the Advanced
Access Content System (AACS) standard for content distribution and digital rights
management. It is widely used in HD DVD and Blu-ray discs and players. The movie
studios can insert a watermark to a movie audio track, which can then be detected in
video players if someone manages to copy the movie, for example with illegal
camcording. The watermark is embedded by modifying the audio waveform in a
regular pattern to convey the information. Another application of the Verance
audio watermarking solution with AACS is in home entertainment, where the
creation of illegal copies of purchased or rented discs can be prevented. [41]
3.6.3. Philips
Philips provides content identification solutions for digital video and audio through
several digital watermarking products. Its video watermarking products include
RepliTrack for forensic tracking purposes, CompoTrack product family for creating
flexible watermarking solutions, CineFence for protection against illegal movie
recording in digital cinema environment and VTrack for identifying the source of
pirated PayTV content.
The Philips audio watermarking software, CompoTrack WAV, is part of the CompoTrack
family. It is a product for Microsoft Windows with a DLL API and
support for the WAV file format. CompoTrack WAV also includes a detector
for WAV audio files or WAV audio streams.
Several other companies base their watermarking solutions on Philips
CompoTrack products. Media Science International has developed MSI Copy
Control software for providing services for record labels to protect their digital
content from piracy using content management systems and watermarking
technologies. It focuses on protecting the audio copyright right from the recording
studios and also through promotion and distribution, and promises full compatibility
and integration with Rimage, a producer of Blu-ray and DVD-R discs. Other
companies using CompoTrack WAV for creating watermark solutions include
Fortium and Ezee studios.
3.6.4. Alpha Tec
Alpha Tec is a company specialized in digital image and video processing and
multimedia applications. Alpha Tec has a digital audio watermarking product called
AudioMark, which concentrates on copyright protection scenarios. The AudioMark
software package is designed for embedding inaudible watermarks in digital audio
and detecting them from suspected audio files. It supports RAW and WAV audio file
formats, batch processing and uses blind extraction while detecting the watermark
from an audio signal.
AudioMark claims to be resistant to MPEG audio compression, filtering,
resampling and requantization signal processing operations. One of the goals of the
company in designing AudioMark is user-friendliness.
4. ALGORITHM FOR REMOVABLE WATERMARKING AND
FINGERPRINTING
This chapter introduces the watermarking algorithm used in the audio protection
system which is discussed in further detail in chapter 5. The algorithm designed in
this work is an improved version of the algorithm presented in [2]. The paper
introduced a removable watermarking algorithm for digital audio, where an audible
noise signal is inserted into an audio file, which is then made available freely as a
teaser of the original content. The user can then remove the noise and restore the
original audio quality for a fee.
The goal of the new algorithm presented in this thesis is to provide tools for an
online music store to publish all their music as free preview versions on the Internet,
after audible watermarks have been inserted to the audio files. The purpose of the
watermark is to disturb the listening experience enough to make extended listening
uncomfortable, but still allow a nice preview of the song in question. The users can
download and listen to the preview versions freely, but if they want to have the high
quality version without the disturbing watermark, they have to purchase a license for
the song. When the user purchases a license, the online store creates and sends it to
the user’s device, which then removes the audible noise and transforms it into an
inaudible digital fingerprint.
The fingerprint in the purchased song contains the user’s music store ID, which
can be used for tracing the original owner of the copy if the song is being distributed
on the Internet. Therefore, it is very important that the fingerprint is robust enough to
resist basic signal processing attacks. The robustness of the fingerprint was one of
the priorities of this thesis, and the algorithm was tested against a large set of signal
processing attacks. The test scenario and results are discussed further in section 4.4.
The algorithm is divided into three phases: embedding, noise transform and
fingerprint detection. The embedding phase is very similar to the previous algorithm
version of [2], but some modifications have been made. These include adding the
Linear Congruential Generator (LCG) for generating the pseudo-random sequence
and the synchronization signal embedding process. The noise transform is based on
the watermark removal algorithm, but the new version includes synchronization and
the ability to leave part of the watermark as a digital fingerprint. Also, both the
embedding and the noise transform algorithms include a new feature for supporting
different band widths of the watermark. This feature was used for improving the
robustness and the imperceptibility of the new version. The fingerprint detection part
is completely new and it was not included in the previous version. These phases are
presented in further detail in the following sections.
4.1. Embedding
In the embedding phase, a removable watermark is inserted into the original audio in
order to produce the distributable preview version. The algorithm combines several
digital watermarking techniques, such as frequency hopping and direct sequence
spread spectrum watermarking.
Inputs of the process are the uncompressed original audio file and the pseudorandom key for improving the security of the watermark. At first, the original file is
divided into blocks of 1024 samples and each block is processed separately from
here on. The FFT coefficient magnitude of each block is calculated, and the pseudorandom key defines the coefficients which are modified according to a random K
value from a specified decibel range [min_k, max_k]. The scaling factors k1 and k2
are then calculated by modifying the FFT magnitude array in decibels and comparing
it to the original complex FFT array. The values in the complex FFT array are then
scaled according to the scaling values in order to produce the complex FFT array
with the added noise. After the IFFT, all blocks of 1024 samples are combined
together and the distributable audio file is created. The final step is to add a spread
spectrum synchronization signal to the beginning of the block sequence.
Outputs of the process are the distributable audio file and the watermarking key,
which contains the recovery information needed for removing the watermark. The
recovery information consists of the used pseudo-random key, the array of K values
and the synchronization signal. The embedding process is illustrated in Figure 17.
Figure 17. The process of embedding the initial audible watermark.
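The per-block processing can be sketched in Python as follows. The sketch is a simplified illustration and not the implementation of this work: the LCG constants, the number of modified coefficients per block (n_bins), the hopping band and the direct scaling of the complex coefficients by a gain of 10^(K/20) are assumptions, and the spread spectrum synchronization signal is left out. A small recursive FFT is included only to keep the example self-contained.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT with 1/n scaling."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

class LCG:
    """Linear congruential generator seeded with the pseudo-random key.

    The multiplier and increment are commonly used 32-bit constants,
    assumed here; the thesis does not state its parameters.
    """
    def __init__(self, seed):
        self.state = seed % 2**32

    def next(self):
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state

    def uniform(self, lo, hi):
        return lo + (hi - lo) * self.next() / 2**32

def embed_block(block, rng, min_k, max_k, n_bins=8, band=(20, 200)):
    """Add audible noise to one block; return the block and its K values."""
    n = len(block)
    spectrum = fft([complex(s) for s in block])
    k_values = []
    for _ in range(n_bins):
        # The high bits of the LCG state choose the coefficient (hop).
        index = band[0] + (rng.next() >> 16) % (band[1] - band[0])
        k_db = rng.uniform(min_k, max_k)
        gain = 10.0 ** (k_db / 20.0)
        spectrum[index] *= gain
        spectrum[n - index] *= gain   # mirror bin, keeps the output real
        k_values.append((index, k_db))
    noisy = [v.real for v in ifft(spectrum)]
    return noisy, k_values
```

The list of (coefficient, K) pairs returned for each block corresponds to the recovery information: together with the key it is sufficient to undo the scaling later.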
4.2. Noise transform
The noise transform process takes place when the user has acquired the license for
the audio file, and the noise can be removed from the preview version. The required
inputs are the free distributable version of the audio, the spread spectrum
synchronization signal and the watermarking key. The output of the process is the
fingerprinted audio file. An overview of the algorithm is presented in Figure 18.
Figure 18. An overview of the noise transform process of the implemented system.
The process can be divided into three steps: synchronization, block processing and
combining the resulting audio. Synchronization determines the starting point of the
watermarking sequence. The synchronization method utilizes direct sequence spread
spectrum watermarking techniques, which are described in more detail in section
3.3.3. Synchronization is important because different lossy compression encoders,
such as LAME for MP3 encoding, may add some additional samples to the
beginning of the audio in the encoding phase. The synchronization signal is removed
from the audio after the starting point has been located in order to achieve higher
audio quality.
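The synchronization step can be illustrated with the following Python sketch. It assumes a time-domain pseudo-noise sequence of +1/-1 chips added to the first samples of the audio and a sliding correlation search. The function names, the sequence length and the embedding strength are hypothetical and do not describe the implemented system.

```python
import random

def pn_sequence(key, length):
    """Pseudo-noise chip sequence of +1/-1 values derived from a key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def add_sync(audio, pn, strength):
    """Add the scaled PN sequence to the first len(pn) samples."""
    out = list(audio)
    for i, chip in enumerate(pn):
        out[i] += strength * chip
    return out

def find_sync(audio, pn, max_offset):
    """Return the offset in [0, max_offset] with the highest correlation."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(max_offset + 1):
        window = audio[offset:offset + len(pn)]
        score = sum(a * c for a, c in zip(window, pn))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

def remove_sync(audio, pn, strength, offset):
    """Subtract the PN sequence again once its position is known."""
    out = list(audio)
    for i, chip in enumerate(pn):
        out[offset + i] -= strength * chip
    return out
```

If an encoder prepends, for example, 37 samples to the file, the correlation peak moves to offset 37 and block processing can start from there instead of from the first sample.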
After the synchronization step, the audio is divided into blocks, which are
processed in a similar way as in the embedding phase. FFT coefficient magnitudes
of the individual blocks are modified with the K array values which are part of the
watermarking key. The pseudo-random key is used for deriving the frequency
hopping sequence which determines the exact FFT coefficients to be modified. The
modification is done by first determining the scaling values k1 and k2 by modifying
the FFT magnitudes array values, and then scaling the actual complex FFT array for
creating the fingerprinted FFT values. Then, after the IFFT, all blocks are combined
together and the fingerprinted audio file is created.
The actual noise transform from noise into a fingerprint is done when the FFT
coefficients are modified with the K array values. This is possible because the K array
values sent to the client are not exactly the values which were stored by the server in the
embedding phase. Instead, the server modifies them slightly so that they
contain the digital fingerprint of the user. The ID of the user in the music store can
be used as the fingerprint data. This means that a unique K array must be generated
by the server every time a new customer purchases a license for a song, because of
different fingerprint data.
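The idea can be sketched in Python as follows. The sketch works directly on coefficient magnitudes and assumes one particular way of modifying the K array: the server shifts each K value by a small offset delta_db whose sign depends on the fingerprint bit. The text above does not specify how the values are modified, so this mapping, the function names and the value of delta_db are hypothetical.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def apply_k(magnitudes, bins, k_values, direction):
    """Scale the selected coefficients by +K dB (direction=+1) or -K dB (-1)."""
    out = list(magnitudes)
    for index, k_db in zip(bins, k_values):
        out[index] *= db_to_gain(direction * k_db)
    return out

def licensed_k_array(k_values, fingerprint_bits, delta_db=0.5):
    """Server side: derive a user-specific K array from the stored one.

    Removing K - delta_db leaves +delta_db of the noise in place (bit 1),
    removing K + delta_db leaves -delta_db (bit 0).
    """
    return [k - delta_db if bit else k + delta_db
            for k, bit in zip(k_values, fingerprint_bits)]

def noise_transform(noisy_magnitudes, bins, licensed_k):
    """Client side: undo the noise using the licensed (modified) K array."""
    return apply_k(noisy_magnitudes, bins, licensed_k, direction=-1)
```

Under this assumption the client never sees the exact K array, so it cannot restore the original coefficients; what remains after the transform is a residual of plus or minus delta_db per modified coefficient, which carries the fingerprint.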
One advantage of this kind of approach is that the song is never in an unprotected
state, because it transforms directly from the free preview version into the
fingerprinted version without any additional steps in between. It is also convenient
for the user because he does not have to download the song again after purchasing.
Instead, he only needs to acquire the license and wait for the local noise transform
process to be completed.
4.3. Reading the fingerprint
The last part of the algorithm is the one that the rightsholders would rather not use,
but unfortunately, it is still probably necessary. When a rightsholder discovers that
songs are being distributed illegally, he can take one of the songs and check if there
is a fingerprint. The purpose of the fingerprint reading algorithm is to use the whole
song and extract the digital fingerprint as reliably as possible. It uses the original
audio file in the process, so the extraction method is non-blind. An overview of the
fingerprint reading process is presented in Figure 19.
The first step of the algorithm is to synchronize the fingerprinted file to the
original audio file. The synchronization is performed in a similar way as with a
separate synchronization signal. Then, as in the previous algorithms, the audio is
divided into blocks and the FFT coefficient magnitudes are analyzed. This time also
the original signal is divided into blocks and their FFT magnitudes are calculated.
The pseudo-random frequency hopping sequence is generated from the pseudo-random
key, and the corresponding coefficients of the original and fingerprinted signals are compared.
This gives us the bit value and the bit intensity of the current block. After all blocks
have been compared, the encoded bit array is created by integrating over all bit
values and their intensities. The final step is to decode the error correction, which
results in the fingerprint data.
Figure 19. The algorithm for reading the fingerprint.
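The comparison and integration steps can be sketched in Python as follows. The example assumes that every block carries a single coefficient at a known hop position, that block i carries bit i mod n_bits of the encoded fingerprint, and that the signed level difference in decibels serves as bit value and intensity. These are simplifying assumptions for illustration, and the error correction decoding is left out.

```python
import math

def block_soft_value(orig_magnitude, marked_magnitude):
    """Signed level difference in dB between fingerprinted and original.

    The sign is read as the bit value, the size as the bit intensity.
    """
    return 20.0 * math.log10(marked_magnitude / orig_magnitude)

def read_fingerprint(orig_blocks, marked_blocks, hop_bins, n_bits):
    """Integrate the soft values of all blocks and decide each bit.

    orig_blocks and marked_blocks hold one magnitude list per block,
    and hop_bins[i] is the coefficient index used in block i.
    """
    totals = [0.0] * n_bits
    for i, (orig, marked, index) in enumerate(
            zip(orig_blocks, marked_blocks, hop_bins)):
        totals[i % n_bits] += block_soft_value(orig[index], marked[index])
    return [1 if total > 0 else 0 for total in totals]
```

Because each bit is summed over many blocks, a few blocks damaged by an attack are outvoted by the remaining ones, which is the reason for using the whole song in the extraction.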
4.4. Performance evaluation
Because the protection of the presented DRM algorithm relies heavily on the
robustness and imperceptibility of the fingerprint watermark, a proper testing process
had to be performed in order to evaluate the performance of the algorithm. The
robustness and imperceptibility were tested separately in two test cases. The
robustness test applied a series of signal processing attacks on fingerprinted songs.
The attacks aimed at destroying the fingerprint watermark without otherwise
destroying the audio quality. The imperceptibility was tested with a listening test with
a dozen test users. The test was implemented with a web-based Audio Quality
Evaluation Tool, which allowed the users to listen to watermarked and
non-watermarked audio clips and evaluate their perceptual quality.
4.4.1. Robustness
The robustness of the algorithm was tested with an extensive set of attacks against
the fingerprint watermark. A set of 15 different attacks was compiled from [35],
which presents a wide range of selected attacks from various attack classes used in
the StirMark benchmark. These attacks were applied to 16 different fingerprinted
audio samples, which represent different musical styles. The attacks were configured
to cause nearly imperceptible changes to the audio in order to provide a realistic
attack scenario, because the listening experience must not deteriorate, as explained in
section 3.4.4. The attacks and their descriptions are presented in Table 1. The
specific attack properties are listed in more detail in Appendix 1.
Table 1. The applied attacks and their descriptions
No. | Attack | Description
1 | MP3 compression (two attacks with 128 and 192 kbps bit rate) | The de facto standard encoding for music on digital audio players. Basically, a lossy compression algorithm utilizing psychoacoustics models for effective perceptual coding.
2 | Chorus | Multiple delayed versions of the audio are added into itself. The delay time, modulation strength and voice number parameters can be modified.
3 | Compressor | The loudest signal peaks are limited, which allows stronger overall signal strength.
4 | Delay | A delayed copy of the audio is added to the original copy.
5 | Flanger | A slightly delayed copy of the audio is added to it, with the length of the delay changing constantly.
6 | Invert | Inaudible attack. The audio sample values are inverted.
7 | Low pass filter | A filter that removes all frequencies higher than the chosen parameter.
8 | Pitch (two attacks with -1% and +1% pitch) | The frequency of the audio is changed without changing the speed of the audio signal.
9 | Random noise | The audio sample values are modified with a random value. The maximum change from the original is specified with a separate parameter.
10 | Resampling | The sampling frequency of the audio is changed.
11 | Reverb | Similar to delay but with shorter delay time and reflections.
12 | Stretch (two attacks with -2% and +2% stretch) | The audio duration is changed without changing the audio frequency or pitch.
The test preparations required creating the fingerprinted versions of all songs in
the test. Three different versions were used: one without error correction, one with
Hamming encoding and one with turbo encoding. First, the original song versions were
processed with the embedding algorithm presented in section 4.1. This created the
distributable preview versions, which were then transformed into the fingerprinted
versions with the noise transform algorithm described in section 4.2. All 15 attacks
were applied to each of the three versions of every fingerprinted song with Adobe
Audition 2.0 audio processing software. This resulted in a total of 720 attacked
fingerprinted audio clips (16 songs x 3 versions x 15 attacks).
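The size of the test set can be checked by enumerating the combinations. The following is only a small illustrative sketch: the attack and encoding names are taken from Table 2, while the sample names are placeholders, because the individual test files are not named here.

```python
from itertools import product

# Attack and encoding names as listed in Table 2.
ATTACKS = [
    "MP3 128kbps", "MP3 192kbps", "Chorus", "Compressor", "Delay",
    "Flanger", "Invert", "Low pass filter", "Pitch -1%", "Pitch +1%",
    "Random noise", "Resampling", "Reverb", "Stretch -2%", "Stretch +2%",
]
ENCODINGS = ["Not encoded", "Hamming", "Turbo"]

# Placeholder names for the 16 fingerprinted audio samples (assumption).
SAMPLES = ["sample{:02d}".format(i) for i in range(1, 17)]

# One test clip per (sample, encoding, attack) combination.
test_clips = list(product(SAMPLES, ENCODINGS, ATTACKS))
print(len(test_clips))
```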
The final step of the testing process was to check if the fingerprint reading
algorithm was still able to read the fingerprint watermark from the audio clips
regardless of the attacks. The test used the fingerprint reading algorithm presented in
section 4.3. for detecting the watermark.
It should be noted that the length of the audio clips was between 11.5 seconds and
19 seconds, with the average length being 13.5 seconds. The clips were selected to be
short because the noise transform algorithm had to be run on a mobile phone, and
increasing the audio length would increase the processing time. The short length of
the audio samples greatly affects the effectiveness of the fingerprint detection
algorithm, because during embedding the fingerprint is repeated over the whole audio.
In the detection phase, the detection is likewise repeated over the whole audio, and
a longer audio file provides a more reliable detection result. If complete songs were
used, the audio would be 10-15 times longer than in these tests. Therefore, this test
provides data on how the algorithm performs with much shorter audio files than would
normally be used in a system like this. The fingerprint detection results in a test
with full songs would be expected to be better than the results of this test.
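The benefit of repetition can be illustrated with a simple example. The sketch below assumes that the repeated reads of the fingerprint are combined with a per-bit majority vote; this is an assumption made only for illustration, and the actual detection algorithm of section 4.3 may combine the results differently. The function name majority_vote and the example bit strings are hypothetical.

```python
def majority_vote(copies):
    """Combine repeated reads of the same bit string by per-bit majority.

    copies: list of equally long lists of 0/1 values, one per repetition.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    decoded = []
    for i in range(length):
        ones = sum(copy[i] for copy in copies)
        # A bit is decoded as 1 only if more than half of the copies agree.
        decoded.append(1 if 2 * ones > len(copies) else 0)
    return decoded


# Three noisy reads of the fingerprint [1, 0, 1, 1], each with one bit error.
reads = [
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
print(majority_vote(reads))
```

With only one read, any single bit error corrupts the fingerprint, whereas three reads already tolerate one error per bit position. A longer audio file allows more repetitions and therefore more errors to be outvoted.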
The results of the robustness test are presented in Table 2. The table presents the
detection percentage of the fingerprint from all 16 test samples. A result of 100%
means that the fingerprint was detected from all 16 audio clips, which were modified
with the corresponding attack. A result of 81% means that the fingerprint was
detected from 13 of 16 audio clips.
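The relation between the number of detections and the reported percentage is a plain rounding operation. A minimal sketch, where detection_rate is a hypothetical helper name:

```python
def detection_rate(detected, total=16):
    """Return the detection percentage rounded to a whole number."""
    return round(100 * detected / total)


print(detection_rate(16))  # all 16 clips detected
print(detection_rate(13))  # 13 of 16 clips detected
```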
Table 2. The fingerprint detection percentages of each song after the signal
processing attacks

Attack            Not encoded   Hamming   Turbo
MP3 128kbps       100%          100%      100%
MP3 192kbps       100%          100%      100%
Chorus            100%          100%      100%
Compressor        81%           100%      100%
Delay             100%          100%      100%
Flanger           100%          100%      100%
Invert            100%          100%      100%
Low pass filter   100%          100%      100%
Pitch -1%         25%           25%       31%
Pitch +1%         38%           31%       19%
Random noise      100%          100%      100%
Resampling        100%          100%      100%
Reverb            100%          100%      100%
Stretch -2%       94%           100%      100%
Stretch +2%       88%           69%       94%
4.4.2. Imperceptibility
The imperceptibility of the algorithm was tested with several users who listened to
watermarked and non-watermarked audio clips. The test environment was a web-based
Audio Quality Evaluation Tool, which has been used in previous studies for
evaluating the perceptual quality of digital audio watermarks.
The total number of users who participated in the listening test was 10. Seven of
the respondents were aged 21-30 years, and there was one respondent in each of the
age categories 11-20, 31-40 and 41-50 years. Six test users described themselves as
average music listeners, two dealt with music in their work and two played a
musical instrument.
From the audio clips used in the robustness test, 10 clips were selected to be used
in the imperceptibility test. The clips were selected to represent different music
styles. Fingerprinted versions were created in the same way as in the robustness
test. This time only the version without error correction was used, since message
encoding does not affect the perceptual quality of the audio signal. The original and
the fingerprinted audio clips were then uploaded to the evaluation tool. Every file
was uploaded twice in order to increase the statistical reliability of the results.
This resulted in 40 audio clips, of which 20 were watermarked and 20 were not.
The users first listened to every clip one at a time and evaluated whether it was
watermarked or not. Then they evaluated the perceptual quality of the clips with a
grade from 1 to 5. The grades were accompanied with textual descriptions, which are
presented in Table 3. The scale is one of the suggested subjective audio quality
measurement methods in ITU-T P.800 recommendation [42].
Table 3. The grades and their descriptions used in the user tests

Grade   Text
5       Imperceptible
4       Perceptible, but not annoying
3       Slightly annoying
2       Annoying
1       Very annoying
The answer percentages for each audio clip, both non-watermarked and
watermarked versions, are presented in Table 4. The answers are categorized into
three listener groups. The fourth result column includes the results of all three
listener groups. The presented percentages in the non-watermarked column are
averaged between the two identical non-watermarked audio clips, and the
percentages in the watermarked column are averaged between the two identical
watermarked audio clips.
From the 20 non-watermarked audio clips the average number of correct answers
was 20*0.63 = 12.6, and from the 20 watermarked audio clips, the average number
of correct answers was 20*(1-0.57) = 8.6. This results in a total of 21.2 of 40 correct
answers per user, which is 53% of the total amount. The users answered on average
that 60% of the songs were not watermarked and 40% were watermarked although
the correct ratio was 50% to 50%.
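The arithmetic above can be reproduced with a few lines of code. This is only a restatement of the calculation, using the overall averages of Table 4 (63% and 57%) as input; the variable names are illustrative.

```python
CLIPS_PER_CLASS = 20             # 20 non-watermarked and 20 watermarked clips
P_NONWM_MARKED_NONWM = 0.63      # share of "not watermarked" answers, original clips
P_WM_MARKED_NONWM = 0.57         # share of "not watermarked" answers, watermarked clips

# Correct answers: "not watermarked" for originals, "watermarked" for the rest.
correct_nonwm = CLIPS_PER_CLASS * P_NONWM_MARKED_NONWM        # about 12.6
correct_wm = CLIPS_PER_CLASS * (1 - P_WM_MARKED_NONWM)        # about 8.6
total_correct = correct_nonwm + correct_wm                    # about 21.2

share_correct = total_correct / (2 * CLIPS_PER_CLASS)         # about 0.53
share_answered_nonwm = (P_NONWM_MARKED_NONWM + P_WM_MARKED_NONWM) / 2  # about 0.60

print(round(total_correct, 1), round(share_correct, 2), round(share_answered_nonwm, 2))
```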
Table 4. The percentages of answers deciding that the audio clip was not
watermarked

             Non-watermarked                  Watermarked
Audio clip   Average  Musician  Pro    All    Average  Musician  Pro    All
bigyellow    25%      50%       75%    40%    42%      75%       25%    45%
exitmusic    50%      50%       75%    55%    58%      50%       25%    50%
bryanadams   75%      75%       75%    75%    67%      50%       75%    65%
cocker       58%      75%       75%    65%    75%      75%       75%    75%
duel         75%      25%       75%    70%    58%      50%       75%    60%
finlandia    58%      25%       75%    55%    58%      75%       75%    65%
metallica    75%      50%       100%   75%    67%      75%       50%    65%
queen        67%      25%       100%   75%    50%      50%       50%    50%
sipe         58%      100%      75%    70%    42%      25%       75%    45%
sting        58%      50%       25%    50%    50%      75%       25%    50%
AVERAGE      60%      53%       75%    63%    57%      60%       55%    57%
The results of the perceptual quality evaluation of the audio clips are presented as
mean opinion score (MOS) values in Table 5. The grades were accompanied by the
textual descriptions presented in Table 3.
Table 5. The average (MOS) grades per audio clip categorized with listener types

             Non-watermarked                  Watermarked
Audio clip   Average  Musician  Pro    All    Average  Musician  Pro    All
bigyellow    4.50     4.75      5.00   4.65   4.59     5.00      5.00   4.75
exitmusic    4.09     4.75      5.00   4.40   3.67     5.00      4.50   4.10
bryanadams   4.92     5.00      5.00   4.95   4.92     5.00      5.00   4.95
cocker       4.67     5.00      5.00   4.80   4.50     5.00      5.00   4.70
duel         4.67     5.00      5.00   4.80   4.59     4.75      5.00   4.70
finlandia    4.84     5.00      5.00   4.90   4.33     5.00      5.00   4.60
metallica    4.75     5.00      5.00   4.85   4.75     5.00      5.00   4.85
queen        4.67     4.75      5.00   4.75   4.50     5.00      5.00   4.70
sipe         4.50     5.00      5.00   4.70   4.59     5.00      5.00   4.75
sting        4.58     5.00      5.00   4.75   4.59     4.75      5.00   4.70
AVERAGE      4.62     4.93      5.00   4.76   4.50     4.95      4.95   4.68
4.5. Discussion
Robustness and imperceptibility of the fingerprint watermark were among the main
focus areas of this thesis. The watermark was tested against a large set of different
attacks to ensure that it could not be easily removed. The attack set included attacks
from every attack group presented in [35].
The results indicate that the implemented algorithm is most vulnerable to the pitch
attack. This was expected, because the watermark is embedded in the frequency
domain. The FFT coefficient values change radically in the pitch attack because it
modifies the sound frequencies directly. Figure 15 in section 3.4.4. demonstrates the
effect of this attack. However, modifying the pitch is probably the most audible of
the attacks. If two versions of a song modified with the pitch -1% and pitch +1%
attacks were listened to carefully and compared, the listener would probably be able
to notice a clear difference between them. The difference between pitch -1% or +1%
and the original version is much harder to notice, but it is still the most audible
attack of the set, apart from resampling to 8,000 Hz, which was not intended to be an
inaudible attack.
The use of forward error correction seems to improve the results a little. The
reason for the lower success rates in some situations is probably the use of short
audio clips. Error correction increases the message length by 75% (Hamming) or by
256% (turbo), which means that the message can be repeated far fewer times during
the audio than if error correction was not used. Using longer and more realistic
audio clip lengths would improve the results significantly in all situations. The
clips were selected to be short in this study because of performance issues, as the
duration of the noise transform process increases linearly with the audio clip
length. Another reason for using short clips was the use of copyrighted works as
test samples. The fair use principles recommend using audio clips shorter than 30
seconds if the works are copyrighted.
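The effect of the coding overhead on the number of repetitions can be sketched as follows. The overhead figures of 75% and 256% are the ones stated above, but the embedding capacity of 1000 bits and the 32-bit message are purely hypothetical values chosen for illustration, and repetitions is a hypothetical helper name.

```python
import math


def repetitions(capacity_bits, message_bits, overhead):
    """Number of complete copies of the encoded message that fit in a clip.

    overhead is the relative growth of the message caused by error
    correction, e.g. 0.75 for a 75% increase.
    """
    encoded_bits = math.ceil(message_bits * (1 + overhead))
    return capacity_bits // encoded_bits


CAPACITY = 1000   # hypothetical number of embeddable bits in a short clip
MESSAGE = 32      # hypothetical fingerprint length in bits

for name, overhead in [("Not encoded", 0.0), ("Hamming", 0.75), ("Turbo", 2.56)]:
    print(name, repetitions(CAPACITY, MESSAGE, overhead))
```

Under these assumptions the uncoded message fits 31 times, the Hamming-coded message 17 times and the turbo-coded message 8 times, which illustrates why the benefit of error correction can be partly lost in short clips.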
The overall results show that the implemented algorithm is fairly robust against the
most common signal processing attacks. The lowest success rates, between 19% and 38%
(Table 2), were against the pitch attack, which means that the fingerprint was still
detected in roughly every fourth audio clip. This is a sufficient result in most
cases, because Internet piracy rarely distributes individual songs; instead, whole
albums or even whole discographies are distributed. This increases the probability
that the fingerprint can be detected in at least some of the songs, which is the
goal of this DRM system.
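The argument can be quantified with a simple probability estimate. The sketch below assumes that detection succeeds independently for each song with the same probability, which is a simplification, and the album size of 12 songs is a hypothetical example value.

```python
def p_detect_any(p_single, songs):
    """Probability of detecting the fingerprint in at least one of the songs,
    assuming independent detections with equal success probability."""
    return 1.0 - (1.0 - p_single) ** songs


# A 25% per-song detection rate over a hypothetical 12-song album.
print(round(p_detect_any(0.25, 12), 3))
```

Under these assumptions, even the weakest per-song detection rate gives a probability of roughly 97% of finding the fingerprint somewhere on the album.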
The imperceptibility of the fingerprint watermark was tested by ten test users,
who listened to watermarked and original versions of the audio clips. In the first
test, they evaluated whether each clip was watermarked or not, and in the second
test, they evaluated the perceptual quality of the audio clips with a numerical grade
from 1 to 5.
The results of the experiment show that the test users marked 63% of the original
non-watermarked audio clips as non-watermarked and 57% of the watermarked clips
as non-watermarked. The difference of 6 percentage points comes down to 24 answers
out of the total of 400 answers collected in the entire experiment. This is 2.4
answers per user, 0.24 answers per song and 0.06 per clip. This means that in every
fourth song an average user would favor the non-watermarked version in one of the
four audio clips of the song. In other words, the user would favor the
non-watermarked version in every 17th audio clip. This result would suggest that the
watermark was not entirely inaudible. However, the margin of error and the
confidence interval must also be taken into account.
The standard error can be calculated with the formula

$$ e = \sqrt{\frac{p (1 - p)}{n}}, \tag{4} $$

where p is the proportion of correct answers and n is the total number of answers in
the test. With p = 0.53 and n = 400, this gives a standard error of 2.50%. For
normally distributed samples, the 95% confidence interval is calculated by
multiplying the standard error by 1.960. The resulting 95% confidence interval is
therefore ±4.89%. This means that if a larger sample set were used, the proportion of
correct answers would with 95% probability be between 48.11% and 57.89% of the total
number of audio clips listened to. By evaluating the Gauss error function, we can
deduce that if the confidence level is loosened to 76.99%, the resulting bounds are
50.00% and 56.00%. This statistical analysis of the first listening test suggests
that listeners would with 76.99% probability slightly favor the non-watermarked
version in terms of audio quality.
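The figures above can be reproduced with the standard library. The sketch uses p = 0.53 and n = 400 from the listening test and the normal approximation; the function names are illustrative. The confidence level for the ±3% margin is evaluated with the standard error rounded to 2.50%, as in the text.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from n answers."""
    return math.sqrt(p * (1 - p) / n)


def confidence_for_margin(margin, se):
    """Two-sided confidence level of p +/- margin under a normal approximation."""
    return math.erf(margin / (se * math.sqrt(2)))


se = standard_error(0.53, 400)        # about 0.0250
margin_95 = 1.960 * se                # about 0.0489
level = confidence_for_margin(0.03, round(se, 3))   # about 0.77

print(round(100 * se, 2), round(100 * margin_95, 2), round(100 * level, 2))
```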
The difference in results between listener groups is quite notable. The users
identifying themselves as professionals working with audio got the best results,
which is not unexpected. However, the two musician test users got the least accurate
results. If the results of the two professional test users are removed, then the
percentages of ‘not watermarked’ answers are 58.25% for non-watermarked and 57.75%
for watermarked audio clips. This suggests that an average user could not tell the
difference between the watermarked and the non-watermarked version. The analysis of
the results without the professionals is justified because the two persons have been
working closely with digital audio watermarking, which gives them an advantage in
the watermark detection.
In the second test, the listeners evaluated the perceptual quality of the audio clips
by giving them a grade from 1 to 5, where 1 was very annoying and 5 was
imperceptible. The users identifying themselves as average music listeners were
clearly the most critical in their evaluation, with an average grade of 4.619 for
non-watermarked and 4.503 for watermarked audio clips. The trend is the same as in
the first test, where the users also preferred the non-watermarked version. The
average grade of the musician test group also follows the same pattern as in the
first test, because they gave an average score of 4.95 for the watermarked clips and
4.925 for the non-watermarked clips. The professionals were surprisingly the least
critical and they gave the best scores for both the non-watermarked and watermarked
audio clips.
The lower scores of the average users could be a result of misinterpreting the
natural distortions of the music. This is supported by the lowest scores of 4.09
and 3.67 given to the exitmusic audio clip, which is heavily compressed with a
dynamic range compressor in order to achieve higher loudness levels. The resulting
audio has less dynamics than a normal uncompressed song, and even the
non-watermarked version sounds distorted, although it does not contain any
watermarks. This may have affected the listeners, especially in the average group,
who gave the score of 4.09 for the non-watermarked clip. However, the lowest
watermarked clip score of 4.10 for exitmusic from all listeners is justified,
because the compression makes the originally quiet parts of the song sound loud. The
song has quiet singing and a guitar playing in the background, but as the levels are
maxed out, the algorithm embeds the watermark with a stronger scaling than if the
audio levels were lower. This causes the watermark to be more audible in exitmusic
than in other clips. The average users seem to have been the most sharp-eared in
detecting this.
Apart from the exitmusic audio clip, the differences between the non-watermarked
and watermarked MOS grades are small. If the exitmusic grades are removed from the
test, the average grade from all users was 4.79 for non-watermarked and 4.74 for
watermarked clips. The difference in grades is only 1%, which is less than the
standard error. This implies that the watermark would be inaudible if the problem
with dynamically compressed audio signals was fixed.
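The averages without exitmusic can be recomputed from the 'All' columns of Table 5. The following sketch only restates that calculation; mean_without is a hypothetical helper name.

```python
from statistics import mean

# MOS grades from the 'All' columns of Table 5.
NON_WATERMARKED_ALL = {
    "bigyellow": 4.65, "exitmusic": 4.40, "bryanadams": 4.95, "cocker": 4.80,
    "duel": 4.80, "finlandia": 4.90, "metallica": 4.85, "queen": 4.75,
    "sipe": 4.70, "sting": 4.75,
}
WATERMARKED_ALL = {
    "bigyellow": 4.75, "exitmusic": 4.10, "bryanadams": 4.95, "cocker": 4.70,
    "duel": 4.70, "finlandia": 4.60, "metallica": 4.85, "queen": 4.70,
    "sipe": 4.75, "sting": 4.70,
}


def mean_without(scores, excluded):
    """Average grade over all clips except the excluded one."""
    return mean(value for clip, value in scores.items() if clip != excluded)


non_wm = mean_without(NON_WATERMARKED_ALL, "exitmusic")
wm = mean_without(WATERMARKED_ALL, "exitmusic")
relative_difference = (non_wm - wm) / non_wm   # about 0.01

print(round(non_wm, 2), round(wm, 2), round(100 * relative_difference, 1))
```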
The conclusion from the results is that the fingerprint watermark is very close to
being inaudible. In the first test, only the professional users could tell the difference
between watermarked and non-watermarked audio clips. The second test showed that
the watermark algorithm should be improved to achieve more inaudible results in
quiet audio clips that have been dynamically compressed. Otherwise, the results
show that the users were not able to distinguish the watermark from the audio.
5. DESIGN OF ROBUST AUDIO PROTECTION SYSTEM
This chapter presents the design of a robust audio protection system utilizing digital
watermarking techniques. The purpose of the system was to implement and test a
new business model for mobile audio distribution. The new business model is
described in section 5.1. The use cases which are used for deriving the requirement
specification are presented in section 5.2. A detailed system architecture description
is given in section 5.3., and the software design process is described in section 5.4.
5.1. General system description
The implemented system is a mobile audio distribution system, which consists of an
online music store server and a mobile client application. The purpose is to allow the
music vendor to distribute all their music as freely distributable preview versions,
which are available for download to all users. These preview versions contain an
audible noise signal, which still allows the users to listen to the whole song and
get an impression of what the high-quality version would sound like. The users can
then purchase a license for the song, which removes the noise and restores the
original high quality of the song.
The new business model tries to solve the problem of people using short preview
samples as ringtones in mobile phones, causing revenue losses for the mobile
industry. The preview versions with the added noise are not as attractive for
ringtone use because of the disturbing noise, so ringtone sales are not affected by
the use of preview samples. The use of full songs as previews also gives the users a
better preview of the whole song and speeds up the song purchasing process, because
users do not need to download the song again. This can also show up as an increase
in music sales, especially in mobile markets where the bandwidth is not yet as high
as on desktop computers.
The digital watermarking algorithms presented in chapter 4 form an important part
of the system, because they perform the required watermark and fingerprint
processing functions. The music store server and the client application are basically
interfaces for the actual algorithms connected via client-server architecture. The
embedding algorithm is used by the online music store when songs are imported to
the system for the first time, and the noise transform algorithm is used by the mobile
client application when the user purchases a license for a song. The fingerprint
detection algorithm is a special type of algorithm, which is used separately from the
other two software components of the system. The idea is to read the fingerprint of
only those audio files which show up on unauthorized music distribution channels,
such as peer-to-peer networks.
5.2. Use cases
The use case analysis concentrates on finding the requirements of the software by
means of separate use cases, which represent the different behavioral aspects of the
system. Each use case contains a set of inputs for which the system must respond
accordingly. This leads to the specification of functional requirements for the system.
The analysis defines only use cases initiated from the client side, because the music
store administration tasks, such as importing new audio files and detecting
fingerprints, were not implemented outside the Matlab environment.
5.2.1. Downloading songs for preview
The first use case consists of requesting a content index from the server, displaying it
to the user and downloading one of the preview music files in the index. The use case
is initiated by the user of the client application. It sets requirements for the message
sequences between the server and the client, and the performance of the network
connection is also an important factor during the song download process.
Actors: Music store customer
Preconditions: The client application has been installed on the mobile phone and it
is running. The network connection is available on the phone over Wi-Fi or 3G
network. The music store server is running on a computer with network access and it
has preview music files available for download. The IP address and port of the server
have been set in the settings of the client application.
Description: The user selects a menu command which starts the preview file
download process. The user is asked for a network access point. He selects either the
preferred Wi-Fi connection or a 3G network connection. After a few seconds, the
server responds by sending a list of the preview files available for download. The
list is displayed to the user as a selection list box. The user selects one song from the
list and the server starts sending the file. A progress bar is displayed to the user,
indicating the status of the download. After the file has been downloaded, the new
song is written to the phone memory and it is possible to listen to the preview with
the client application.
Exceptions:
1) The phone fails to initiate a network connection.
2) The network connection is interrupted before the song has been downloaded.
3) There is not enough memory for saving the downloaded file on the phone.
4) The mobile phone runs out of battery.
Post-conditions: The user has downloaded a preview music file from the music
server. The song is saved on the phone and it is displayed to the user in the client
application user interface. The file can be played with the client application.
5.2.2. Purchasing a license for a song
The second use case is about requesting, transmitting and applying a license file to a
previously downloaded preview music file. It is also initiated by the client user. The
primary task in the use case is applying the license, which requires a lot of
processing power and memory from the device. The network connection is used only
for requesting and transmitting the license data.
Actors: Music store customer
Preconditions: The client application has been installed on the mobile phone and it
is running. The network connection is available on the phone over Wi-Fi or 3G
network. The music store server is running on a computer with network access and
has preview music files available for download. The server also has the
watermarking keys for the corresponding files stored. The IP address and port of the
server and the user ID have been set in the settings of the client application.
Description: The user selects a song from a list box which displays all previously
downloaded preview music files. Then the user selects a purchase menu command,
which starts the license retrieval process. The user is asked for a network access
point. He selects either Wi-Fi or 3G network connection. The client application sends
a license request containing the ID of the selected song and the user ID to the server.
The server generates a unique license containing the user ID embedded in the
watermarking key and sends the key back to the client application. After a few
seconds, the client begins the noise transform process described in section 4.2. The
process removes the noise watermark in the preview version and transforms it into a
fingerprinted version where the user ID has been embedded with an inaudible
watermark. After the noise transform process, the fingerprinted version is written to
the phone memory. The new file is displayed in the user interface.
Exceptions:
1) The phone fails to initiate a network connection.
2) The network connection is interrupted before the license has been received.
3) There is not enough memory for saving the transformed file on the phone.
4) The mobile phone runs out of battery.
Post-conditions: The user has purchased a license for a previously downloaded
preview music file. The application has created a high quality fingerprinted version
of the song, which is displayed to the user in the user interface of the client
application. The file can be played with the client application.
5.2.3. Requirements specification
The requirements for the designed system can be derived from the use cases. The
requirements can be divided into technical and functional requirements, where the
former concern the technical constraints of the system and the latter the actual
functionality the system must provide. The technical requirements are listed in Table
6 and the functional requirements in Table 7.
Table 6. The technical requirements of the system

No.  Requirement                                                      Use case
1    The preview files must be downloadable over Wi-Fi or 3G          1
     connection in less than a minute.
2    The license files must be downloaded in less than five seconds.  2
3    The noise transform must not take over 30 seconds.               2
4    The server must support multiple users simultaneously.           1, 2
5    The server must not crash if the connection is terminated        1, 2
     unexpectedly.
6    The client must not crash if the connection is terminated        1, 2
     unexpectedly.
Table 7. The functional requirements of the system

No.  Requirement                                                      Use case
1    The client application must be able to download a list of        1
     available preview files from the server and display the list
     to the user.
2    The user must be able to select a song from the list of          1
     preview files and the song must be downloaded and saved on the
     client device.
3    The user must be able to play preview and fingerprinted songs    1, 2
     on the device.
4    The user must be able to select a preview music file and         2
     request a license from the server.
5    When requested, the server must generate a unique license        2
     based on the ID of the user and send the license to the client.
6    The client must perform a noise transform on the device          2
     removing the audible noise from a preview music file and
     transforming it into a fingerprinted song.
7    The network connection between the server and the client must    1, 2
     be left on for the duration the client program is running
     after the client has connected to the server.
5.3. System architecture
The general structure of the system is based on client-server architecture. The server
functions as a music store service provider and the client application is used for
accessing the services provided by the store with a mobile device. These two major
components of the system are further divided into several subcomponents which
represent more accurate abstractions of the component functionalities.
A functional overview of the architecture is presented in Figure 20.
Figure 20. An overview of the system architecture with a sequence of events
demonstrating the functionality of the system.
5.3.1. Music store
The music store is a server component providing services to client users. The store
consists of an audio database, watermark database and a license server. An interface
for accessing the administration functions is also provided. The audio database is
where the original versions and the distributable preview versions of the songs are
stored. The original uncompressed versions should be stored as protected files,
because they need to be accessed only by store administrators. The watermark
database contains all watermarking keys, which are linked to the respective preview
music files in the audio database. The keys must also be protected from unauthorized
access, because they contain the information for removing the noise from the
preview versions available for free download. The license server part is used for
generating licenses for songs requested by the client. It uses information from the
watermarking database and the client for creating unique licenses which are then sent
to the client.
The music store has two separate user groups: administrator users and client users.
The administrator users have direct access to the audio and watermark databases and
they can import new songs to the system. They can also access the original
uncompressed songs in the audio database. The client users have access to the music
server via TCP/IP services by using the client application. They can download
preview versions of the songs for free, or request licenses to be generated for
previously downloaded preview versions.
The song importing process is managed by the administrator user. During the
process the preview versions of the songs are created from the original versions by
using the embedding algorithm described in section 4.1. A watermarking key is also
created for every imported song. The key contains information for removing the
watermark inserted into the preview version and it must be stored securely in the
watermarking database.
The song preview download process is straightforward. First, the client sends a
request to the server for a list of all available songs which is then displayed to the
user. The user then selects one song from the list which is then requested from the
server and the download process can start. The files could also be available on an
HTTP server for downloading with web browsers.
The third major task of the music store is generating licenses. The licenses are
generated by a license server, which must have access to the watermark database
containing the watermarking keys. The license consists of a song-specific
pseudorandom key, a scaling value of the synchronization signal and an array of
values indicating transitions in the frequency domain of the song. All these values
are part of the watermarking key, but the plain frequency domain transitions must not
be used in the license. This is because the original array contains the information
for completely removing the audible watermark in the preview version. This is not
the intention of the license; instead, the watermark should be transformed into an
inaudible fingerprint. This is achieved by slightly modifying the frequency domain
transition array according to the ID of the user requesting the license. This
method generates a unique license for each user and each song, which increases the
security of the system.
All communication with the client is done via TCP socket interface. The server
initializes a server socket which spawns a new server thread for every incoming
socket connection. This approach enables multiple simultaneous users.
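The thread-per-connection approach can be sketched with the socketserver module of the Python standard library. This is not the implemented server, only a minimal illustration of the same idea: it answers only the content index request (message code 00, reply code 10, as defined in section 5.3.3), and the file list, class names and helper functions are hypothetical.

```python
import socket
import socketserver
import threading

# Hypothetical content index; a real store would read it from the audio database.
PREVIEW_FILES = ["song1.wav", "song2.wav"]


class StoreHandler(socketserver.StreamRequestHandler):
    """Handles one client connection; each instance runs in its own thread."""

    def handle(self):
        # Read '\n'-terminated messages until the client closes the connection.
        for raw in self.rfile:
            message = raw.decode("ascii", errors="replace").rstrip("\n")
            if message == "00#":
                names = "".join(name + "#" for name in PREVIEW_FILES)
                reply = "10#{}#{}\n".format(len(PREVIEW_FILES), names)
                self.wfile.write(reply.encode("ascii"))
            else:
                break  # file and license requests are left out of this sketch


class StoreServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads do not keep the process alive


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread (port 0 selects a free port)."""
    server = StoreServer((host, port), StoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_index(host, port):
    """Minimal client: request the content index and return the raw reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(b"00#\n")
        with conn.makefile("rb") as stream:
            return stream.readline().decode("ascii")
```

A server started with start_server() can be queried with request_index(*server.server_address) and stopped with server.shutdown() followed by server.server_close(). Because every connection is served by its own handler thread, a slow or interrupted client does not block the others.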
5.3.2. Client application
The client application is used for accessing the services provided by the music store
server. Its purpose is to function as an easy-to-use front-end for using the functions
required by the business model described in section 5.1.
In addition to the previously mentioned functions related to communication with
the music store, song preview downloads and requesting licenses, the client
application includes several other features as well. It includes a file browser for
accessing the downloaded preview files and fingerprinted music files. It features
basic file operations such as deleting and selecting. The music files can be played
by selecting a file and clicking the play button. This activates the embedded music
player in the application, which offers functions for pausing, resuming and stopping
the music playback. The volume can also be controlled with the dedicated volume
control buttons of the device.
The main functionality concerning the business model is located in the DRM agent
component. It contains the algorithm for the noise transform process which removes
the audible noise from the preview file and transforms it into an inaudible
user-specific fingerprint. Information from the received license is used in the
process as described in section 4.2.
5.3.3. Communications protocol
The client and the server communicate with a simple and efficient text-based
communications protocol. The purpose of the protocol is to deliver messages from
client to server and vice versa. There are three different message sequences which
are all initiated by the client. The first is a request for the content index which
contains an array of all the downloadable preview files in the server. The server
responds by sending the content index. The next is a request for downloading a
specific preview file. The server responds by sending the length of the file and then
the actual file. The third sequence is a request for a license of a particular preview
file. The client must identify itself by sending the user ID to the server in this phase.
The server responds by creating the license and sending it to the client. Table 8 lists
all the messages from client to server and Table 9 all the messages from server to
client. All messages end with a specific message end character ‘\n’. In the content
file message from server to client, the end character is sent before the content file, so
that the client knows how to process the following N incoming bytes.
Table 8. Messages from client to server
No.  Event                                        Format
1    Request a content index from the server      00#
2    Request a content file (‘filename’) from     01#<content>filename</content>#
     the server
3    Request a license for a specific content     02#<content>filename</content>#<userid>id</userid>#
     (‘filename’) from the server for the user
     (‘id’)
Table 9. Messages from server to client
No.  Event                                        Format
1    Content index (N = number of files)          10#N#filename1#filename2#...#
2    Content file                                 11#N#
     (N = length of the file in bytes)            followed by the content data
3    Content license                              12#K#S#N#A#
     (K = pseudo-random key,
     S = synchronization signal scale,
     N = length of the array A,
     A = array of transitions in the
     frequency domain, separated by a space
     character)
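The message formats of Tables 8 and 9 can be illustrated with a short Java sketch. It is not the thesis implementation: the class name RemoProtocol, its method names and the choice to parse the transition array as floating-point numbers are assumptions made for illustration. Only the message layouts and the ‘\n’ end character are taken from the tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper for the text protocol; all names here are hypothetical. */
final class RemoProtocol {
    static final String END = "\n"; // message end character

    private RemoProtocol() {}

    /** Message 1 of Table 8: request the content index. */
    static String indexRequest() {
        return "00#" + END;
    }

    /** Message 2 of Table 8: request a preview file. */
    static String contentRequest(String filename) {
        return "01#<content>" + filename + "</content>#" + END;
    }

    /** Message 3 of Table 8: request a license for a file and a user. */
    static String licenseRequest(String filename, String userId) {
        return "02#<content>" + filename + "</content>#<userid>" + userId + "</userid>#" + END;
    }

    /** Message 1 of Table 9, "10#N#filename1#filename2#...#": returns the file names. */
    static List<String> parseIndex(String message) {
        String[] fields = message.trim().split("#");
        if (fields.length < 2 || !fields[0].equals("10")) {
            throw new IllegalArgumentException("not a content index: " + message);
        }
        int n = Integer.parseInt(fields[1]);
        if (fields.length != n + 2) {
            throw new IllegalArgumentException("expected " + n + " file names");
        }
        return new ArrayList<>(Arrays.asList(fields).subList(2, fields.length));
    }

    /** Message 3 of Table 9, "12#K#S#N#A#": returns A (assumed here to be numeric). */
    static double[] parseLicenseTransitions(String message) {
        String[] fields = message.trim().split("#");
        if (fields.length != 5 || !fields[0].equals("12")) {
            throw new IllegalArgumentException("not a license: " + message);
        }
        int n = Integer.parseInt(fields[3]);
        String[] values = fields[4].trim().split(" ");
        if (values.length != n) {
            throw new IllegalArgumentException("expected " + n + " transitions");
        }
        double[] transitions = new double[n];
        for (int i = 0; i < n; i++) {
            transitions[i] = Double.parseDouble(values[i]);
        }
        return transitions;
    }
}
```

Because every field is terminated by ‘#’ and every message by ‘\n’, a receiver can split a message without any look-ahead. The sketch does not handle file names that themselves contain ‘#’, which the protocol description does not address either.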
5.4. Software design
The requirements specification derived from the system use cases is the basis of the
software design process. The goal of the software design is to design a software
system that provides all the functions defined in the requirements. The design
approach used in this thesis was to utilize Unified Modeling Language (UML)
techniques for presenting the design. Static software structures were presented with
UML class diagrams and the core behavior was described with UML state diagrams.
In addition, a UML sequence diagram was used to give an overview of the
communication sequence between the server and the client.
5.4.1. Client application
The client application architecture is based on a typical Symbian S60 3rd Edition
application framework. The large number of classes compared to actual functionality
is because of the nature of the S60 architecture.
The basic S60 framework consists of Application, Document, AppUi and View
classes. The entry point function creates the Application class which creates the
Document class which in turn creates the AppUi class. The AppUi acts as a main
controller class in the framework, and it is responsible for creating any view or
container classes. Most S60 applications follow the traditional software architectural
pattern of separating the model, the view and the controller. The model represents the
data or the state of the application. It is often also called the application engine. The
view contains all the visual elements the application displays to the user such as
menus, text or images, and the controller is responsible for reading all user input
events and processing them accordingly.
Figure 21. The class diagram of the client application.
The class diagram of the client application is presented in Figure 21. The basic S60
framework classes are located at the top of the diagram forming the
model-view-controller architecture. CRemoAppUi is the main controller class, and it handles all
user inputs such as menu events or button presses. CRemoAppView is responsible
for drawing the user interface which has a file browser as the main component. The
contents of the file browser are read in the CRemoEngine class, which implements
most of the functionality of the application.
Figure 22. The state diagram of the client application.
Application settings are contained in the CSettings class. The stored settings are
server IP address and port, user ID and volume level. The class stores the settings on
a CDictionaryStore, which is basically an ini file with read and write stream access.
The settings can be modified with the class CSettingsDialog which opens an editing
dialog for modifying the values. These classes are owned by the CRemoEngine but
operated by the CRemoAppUi when the user selects the settings menu command.
The network connection is created and operated by the CSocketsEngine class. It
uses several classes to assist in the process, such as CNetConnection which creates
and maintains the actual connection to the network. CMessage is used for parsing the
message received from the server and delivering it to CRemoEngine.
CSocketsReader and CSocketsWriter are used in the corresponding socket operations
and CTimeOutTimer notifies the socket engine in case there is a connection timeout.
Additional classes used by the engine class are CSecondTimer, CFileHandler,
CAudioPlayer, CAlgorithm and CWatermarkFunctionLibrary. The timer class is
used for calling delayed operations such as reconnecting to the server or starting the
noise transform process. The transform process must be delayed a little because the
user interface must be updated before the transform process starts, and it takes a
moment to redraw the screen. CFileHandler encapsulates some functions for
accessing files, CAudioPlayer contains the functions for audio playback and
CAlgorithm implements the actual watermarking algorithms required for the noise
transform process. In addition, CWatermarkFunctionLibrary also contains some
helper functions for handling complex arrays in S60 environment.
A simplified state diagram of the client is presented in Figure 22. It presents the
functions of the application, the server connection logic and the communication
sequences between the server and the client. The only simplifications are the
omission of the settings menu and the file browser states. These are separate from the
main functionality so the validity of the state diagram is not affected.
The client starts unconnected with the file browser displaying the files in the
sounds folder of the S60 directory structure. The possible functions for the user at
this point are playing an audio file or requesting a content list or a license. If the user
selects an audio file and clicks play, the audio playback process is started. The
process terminates when the end of the audio clip is reached or the stop button is
pressed. The playback can also be paused and resumed. Another two functions
available for the user are requesting a content list from the server or requesting a
license for a previously downloaded preview music file. Both of these processes
require the network connection to the server, so the connection process is started.
After the user has been asked for the network access point, an Internet connection is
first established and then a direct TCP/IP connection to the music store server is created.
Then, depending on which function was originally selected, the
corresponding message is sent to the server.
In addition to the initial idle state, there is another idle state where the only
difference is that the server connection has been established. Both states have the
same functions available, but the connected idle state is the only state where the
client can receive messages from the server. After a new message has been received,
it is parsed and the output is presented to the user. Downloading large files from the
server is a special case where the client application goes into a special content
receiving state where the client does not parse the incoming data, but instead it writes
everything into a buffer. After the number of bytes specified in the header message
has been received, the client writes the data into a file and returns to the idle state.
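The content receiving state amounts to length-prefixed framing: the header "11#N#" ends with the end character, and the next N bytes are raw file data. The following Java sketch illustrates the idea under stated assumptions. The real client is a Symbian C++ application, so the class ContentReceiver and its methods are hypothetical and serve only to show the framing logic.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative receiver for the content file message; names are hypothetical. */
final class ContentReceiver {
    private ContentReceiver() {}

    /** Reads one text message up to the '\n' end character, which is not returned. */
    static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        // Read byte by byte so that no content bytes are consumed by accident.
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1) {
            throw new EOFException("connection closed before the end of the header");
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Reads "11#N#" and then exactly N bytes of unparsed content data. */
    static byte[] receiveContent(InputStream in) throws IOException {
        String[] fields = readHeader(in).split("#");
        if (fields.length != 2 || !fields[0].equals("11")) {
            throw new IOException("unexpected header for a content file");
        }
        int length = Integer.parseInt(fields[1]);
        byte[] data = new byte[length];
        // readFully blocks until all N bytes have arrived or the stream ends.
        new DataInputStream(in).readFully(data);
        return data;
    }
}
```

Reading the header without a buffering reader matters here, because a buffered text reader could swallow the first content bytes together with the header. The content itself may contain ‘\n’ or ‘#’ bytes, which is why it is counted rather than parsed.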
5.4.2. Server application
The server is based on a basic Java socket server architecture. This contains the main
method in a separate RemoServer class, which creates a ServerSocket object and
assigns a specified TCP port to it. A separate RemoServerThread class contains the
actual server functionality such as reading input commands and sending data to the
client. The class diagram of the music store server is presented in Figure 23.
Figure 23. The class diagram of the server software.
The state diagram presented in Figure 24 illustrates the functionality of the
RemoServer class. The main method of the server keeps listening to the port, and it
creates a new RemoServerThread every time a new incoming connection is detected.
RemoServer then passes the corresponding Socket object to the new thread so that it
can have read and write access to the socket.
Figure 24. The state diagram of the RemoServer class.
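The accept loop and the thread-per-connection structure described for RemoServer and RemoServerThread can be sketched as follows. This is a minimal illustration and not the thesis source code: the class names SketchServer and SketchServerThread, the fixed index reply and the file names in it are placeholders, and the sketch uses try-with-resources, which requires a newer Java version than the Java SE 1.6 platform used in the thesis.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Accept loop with one worker thread per connection; names are hypothetical. */
final class SketchServer {
    private SketchServer() {}

    static void serve(ServerSocket listener) throws IOException {
        while (true) {
            Socket client = listener.accept();      // blocks until a client connects
            new SketchServerThread(client).start(); // hand the socket to a new thread
        }
    }
}

final class SketchServerThread extends Thread {
    private final Socket socket;

    SketchServerThread(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        try (Socket s = socket) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            OutputStream out = s.getOutputStream();
            String command;
            // Idle state: wait for the next command until the client disconnects.
            while ((command = in.readLine()) != null) {
                out.write(respond(command).getBytes(StandardCharsets.US_ASCII));
                out.flush();
            }
        } catch (IOException e) {
            // A lost connection ends this thread only; the accept loop keeps running.
            System.err.println("connection closed: " + e);
        }
    }

    /** Placeholder dispatch: answers only the index request, with a fixed list. */
    static String respond(String command) {
        if (command.equals("00#")) {
            return "10#2#queen#sting#\n";
        }
        return ""; // content and license requests are omitted from this sketch
    }
}
```

Since each connection has its own thread and its own socket, an exception raised by one client cannot interrupt the accept loop or the other clients.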
The functionality of the RemoServerThread class is presented as a state diagram in
Figure 25. After a new RemoServerThread object is created, it goes into an idle state
where the thread listens to input commands from the socket. After a command has
been received, the thread parses it according to the communication protocol rules
presented in section 5.3.3. If the parsed command is a content index request, the
server reads the content index from a file and creates the response message in an
appropriate format. The message is then sent to the client. If the input command is a
content request, the server reads the requested content file and creates and sends the
message to the client. In the case of a license request, the server must first read the
original license data from the file system. This data contains the pseudo-random key,
synchronization signal scale and an array of transitions in the frequency domain. The
final license data is then created by modifying the license data according to the user
ID the client has sent to the server. After this operation, the license message is
created and sent normally. After sending any message to the client, the server returns
to the idle state. The server thread is terminated in case the connection to the client is
closed.
Figure 25. The state diagram of the RemoServerThread class.
Figure 26. The sequence diagram illustrating the message interchange between the
client and the server.
5.4.3. Sequence diagrams
The network messages used between the server and the client are specified in section
5.3.3. All the possible sequences of messages can also be derived from the state
diagram of the client software, but Figure 26 illustrates a basic message sequence.
All time-consuming activities during the message exchange are performed
on the client side, which is an important observation from the sequence
diagram. The only server-side phase that takes longer than a fraction of a second
is sending the content files, because this operation is divided into multiple
message packets. It follows that the server can easily
manage multiple connections at the same time, and performance should not become
an issue.
6. SYSTEM IMPLEMENTATION AND TESTING
The system implementation is based on the software design process presented in
chapter 5. The watermarking algorithm presented in chapter 4 is also an essential part
of the implementation. This chapter discusses the main features of the implemented
software system with an emphasis on the system and user testing. The system
functionality was tested against the requirements specified in section 5.2.3. The user
tests were based on a web-based Audio Quality Evaluation Tool.
6.1. Software platforms
The server was implemented on the Java SE 1.6 platform on a Linux server with kernel
version 2.6.18. The implementation is characterized by the architecture of the basic
Java socket server and the small number of different messages it must be able
to handle. This resulted in a moderately simple implementation with fewer than 400
lines of Java code. The required audio and watermark databases were implemented
using a dedicated directory in the filesystem where the preview audio files were
stored. The content index and the license database were implemented with a text file,
which could be easily read by the server application.
The client was implemented on the Symbian S60 3rd Edition platform. The SDK
version used was S60 3rd Edition Feature Pack 1 (FP1), and the device used for
debugging and user testing was a Nokia N95, which runs the same S60 3rd Edition FP1
operating system version. The S60 platform sets many requirements for the
developer and the development process. The platform is not as polished as the Java
platform in terms of documentation quality and the quality of the platform libraries
delivered with the SDK. The application was implemented by reusing several
components from S60 applications previously created by the author. These
components include parts of the algorithm, the logging system, socket engine, timer, file
handler and the watermark function library. Reusing these components facilitated
rapid software development, but the S60 platform unfortunately
sometimes behaves illogically and presents seemingly random problems, for
example, during compilation. This makes the development process
slower than it would be on another platform.
The signal processing algorithms required by the system were originally designed
and developed in a Matlab environment. This is a common approach that provides a
fast and efficient way of testing and debugging new algorithms. The algorithm used
for the noise transform process was then ported to the S60 platform to be used in the
client application.
6.2. Limitations
The implemented system has some limitations, or simplifications, concerning some
parts of the functionality. Most of the limitations concern the server part of the
software. The two most distinguishable limitations are the lack of an actual financial
transaction during the license purchasing process and the lack of a proper interface
for importing the songs to the music store server.
Before the license is generated on the server and sent to the client, the financial
transaction event should occur. However, in the current implementation it is left as a
concept only and it is not implemented. This is because the transaction would be too
complex to implement and also because it would not affect the general usability and
the test results.
The other major simplification is the song importing process. The importing
algorithm was not ported to the software platform of the server, but instead it was
run in Matlab. The algorithm produces the watermarking key and the
preview version of the song, which can then be manually uploaded to the server. This
is impractical for hundreds of songs or more, but in this test only a couple of
dozen songs were used, so the simplification was justified.
In addition, the server has no graphical user interface, but instead it is used via
TCP/IP access by the client and with direct file access to the databases by the
administrator. This approach also accelerated the development process. Another
limitation is that the communications channel between the server and the client is not
secure. It is done with a regular unencrypted TCP/IP socket connection, which would
be an enormous security issue if it were used in a real system. Public-key
encryption could be implemented, but it was left out because it would not affect the
test results in any way.
6.3. Functional tests
The system functionality was tested against the requirements specified in section
5.2.3. The system should provide all behavior listed in Table 7 and also follow the
technical conditions specified in Table 6. The results of the functional tests are
presented in this section.
6.3.1. Downloading a list of preview files
Requirement: The client application must be able to download a list of available
preview files from the server and display the list to the user.
Execution: The user selects the Download songs command from the menu, and
selects a valid access point to be used for creating the Internet connection. A
connection to the server is established and the list of the available preview files is
transmitted from the server to the client. The list is displayed to the user.
Result: The test was successful.
6.3.2. Downloading a preview file
Requirement: The user must be able to select a song from the list of preview files
and the song must be downloaded and saved on the client device.
Execution: The user selects the Download songs command from the menu, and
selects a valid access point to be used for creating the Internet connection. A
connection to the server is established and the list of the available preview files is
transmitted from the server to the client. The list is displayed to the user. The user
selects a song from the list and presses the download button. The file transmission
begins and a progress bar indicating the download status is displayed to the user.
After the download is completed, the file is saved on the device and it appears on the
file browser.
Result: The test was successful.
6.3.3. Music file playback
Requirement: The user must be able to play preview and fingerprinted songs on the
device.
Execution: The user selects a preview music file on the file browser and presses the
OK button. Music playback begins, and the device softkeys are changed to Pause and
Stop. After the playback is complete, the music playback ends and the softkeys are
changed back to Options and Exit. The user then selects a fingerprinted song and
clicks the OK button. The music playback begins.
Result: The test was successful.
6.3.4. Requesting a license for a preview file
Requirement: The user must be able to select a preview music file and request a
license from the server.
Execution: The user selects a preview file on the file browser and selects the
Purchase menu option. After selecting a valid access point, the server connection is
established and the license is immediately received. A text indicates that the license
has been received and the noise transform process has started.
Result: The test was successful.
6.3.5. Generating unique licenses
Requirement: When requested, the server must generate a unique license based on
the ID of the user and send the license to the client.
Execution: The user selects a preview file on the file browser. Then the user selects
the Purchase menu option. After selecting a valid access point, the server connection
is established and the text output of the server indicates that the client has requested a
license and it has sent the filename of the song and the user ID to be used in the
license creation process. The server receives the data and reads the watermarking key
from the database. Then it modifies the watermarking key according to the user ID.
A unique license is created for the user, which is then sent to the client.
Result: The test was successful.
6.3.6. Noise transform
Requirement: The client must perform a noise transform on the device removing the
audible noise from a preview music file and transforming it into a fingerprinted song.
Execution: The user selects a preview file on the file browser. Then the user selects
the Purchase menu option. After selecting a valid access point, the server connection
is established and the license is immediately received. The noise transform process
starts. After 13 seconds, the process is completed and a new fingerprinted file is
created. The new file is displayed on the file browser.
Result: The test was successful.
6.3.7. Maintaining the network connection
Requirement: The network connection between the server and the client must be
kept open for as long as the client program is running after the client has connected to
the server.
Execution: The user selects the Download songs command from the menu, and
selects a valid access point to be used for creating the Internet connection. A
connection to the server is established and the list of the available preview files is
transmitted from the server to the client. The list is displayed to the user, but the user
selects Cancel and the file browser is displayed again. The server connection is still
open.
Result: The test was successful.
6.4. Technical tests
The technical properties of the system were tested against the technical requirements
specified in section 5.2.3. The tests were performed on a Nokia N95 phone and a
Wi-Fi connection. The server was running on a dedicated server computer running
Linux operating system.
6.4.1. Preview file download time
Requirement: The preview files must be downloadable over Wi-Fi or 3G
connection in less than a minute.
Execution: The user selects the Download songs command from the menu and
selects a valid access point. The server connection is established and the list of the
available preview files is displayed to the user. Then the user selects a song from the
list and presses the download button. The file transmission begins and a progress bar
is displayed to the user, which indicates the download status. The download time is
recorded and the process is repeated for every file. The results are presented in Table
10.
Result: The test was successful.
Table 10. Technical performance details of the system
Audio clip      Download time   License download time   Noise transform time
aerosmith       8s              < 1s                    12s
bigyellow       10s             < 1s                    15s
bryanadams      11s             < 1s                    13s
celine          10s             < 1s                    12s
cocker          16s             < 1s                    13s
dafunk          15s             < 1s                    15s
duel            18s             < 1s                    17s
exitmusic       19s             < 1s                    19s
finlandia       16s             < 1s                    16s
Madonna         11s             < 1s                    14s
metallica       11s             < 1s                    12s
ordinaryworld   16s             < 1s                    19s
queen           10s             < 1s                    13s
rushing         22s             < 1s                    15s
sipe            14s             < 1s                    17s
sting           10s             < 1s                    13s
AVERAGE         13.6s           < 1s                    14.7s
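The averages in Table 10 and the margins to the requirement limits can be re-derived from the individual measurements. The following Java sketch is only a sanity check. The class and method names are hypothetical, while the measured values are those of Table 10 and the limits (one minute for the preview download, 30 seconds for the noise transform) are the requirements stated in sections 6.4.1 and 6.4.3.

```java
/** Sanity check of Table 10; class and method names are hypothetical. */
final class Table10Check {
    // Measured times in seconds, in the row order of Table 10.
    static final int[] DOWNLOAD = {8, 10, 11, 10, 16, 15, 18, 19, 16, 11, 11, 16, 10, 22, 14, 10};
    static final int[] TRANSFORM = {12, 15, 13, 12, 13, 15, 17, 19, 16, 14, 12, 19, 13, 15, 17, 13};

    private Table10Check() {}

    /** Arithmetic mean of the measurements. */
    static double mean(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    /** Largest single measurement, which is what the requirement limits constrain. */
    static int max(int[] values) {
        int largest = values[0];
        for (int v : values) {
            largest = Math.max(largest, v);
        }
        return largest;
    }

    public static void main(String[] args) {
        System.out.printf("download:  mean %.1f s, max %d s (limit 60 s)%n",
                mean(DOWNLOAD), max(DOWNLOAD));
        System.out.printf("transform: mean %.1f s, max %d s (limit 30 s)%n",
                mean(TRANSFORM), max(TRANSFORM));
    }
}
```

The slowest download (22 s) and the slowest noise transform (19 s) are both well inside their limits, so the requirements hold for every clip and not only on average.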
6.4.2. License file download time
Requirement: The license files must be downloaded in less than five seconds.
Execution: The user selects a preview file on the file browser and selects the
Purchase option from the menu. After selecting a valid access point, the server
connection is established and the license is received. The download time is recorded
and the process is repeated for every file. The results are presented in Table 10.
Result: The test was successful.
6.4.3. Noise transform processing time
Requirement: The noise transform must not take over 30 seconds.
Execution: The user selects a preview file on the file browser and selects the
Purchase option from the menu. After selecting a valid access point, the server
connection is established and the license is received. Then the noise transform
process begins and its duration is recorded. The results are listed in Table 10.
Result: The test was successful.
6.4.4. Multiple users support
Requirement: The server must support multiple users simultaneously.
Execution: The user selects the Download songs menu command and selects a valid
access point. After the server connection is established, the list of available preview
files is shown to the user. At this point, another user connects to the server and
requests the list for the preview files. A new thread is created on the server. Both
users can select and download preview songs at the same time.
Result: The test was successful.
6.4.5. Server stability
Requirement: The server must not crash if the connection is terminated
unexpectedly.
Execution: The user selects the Download songs menu command and selects a valid
access point. After the server connection is established, the list of available preview
files is shown to the user. Then, the client device is shut down by pressing the power
button. This terminates the connection between the server and the client. The server
console displays a java.net.SocketException message and the running thread is
terminated. The server is accepting new connections normally.
Result: The test was successful.
6.4.6. Client stability
Requirement: The client must not crash if the connection is terminated
unexpectedly.
Execution: The user selects the Download songs menu command and selects a valid
access point. After the server connection is established, the list of available preview
files is shown to the user. Then the server process is killed in the server machine. The
connection between the server and the client is terminated. After the connection
timeout, the client notices that the connection has been lost. The client can use all
commands normally and the connection to the server is attempted if the command
requires a server connection.
Result: The test was successful.
6.5. User tests
In addition to the algorithm imperceptibility tests, the test users answered a
questionnaire on the business model behind the system. The users were first
introduced to the client application running on a Nokia N95 mobile phone, and then
they used the application to download free preview audio files from the server. After
listening to the audio files, they purchased a license to at least one preview file,
which was then transformed into a fingerprinted audio file. They listened to the
fingerprinted version and compared it to the preview version. After the test, they
answered the questionnaire. The answers are presented in Table 11.
Table 11. User questionnaire results
Question                                               Yes     No
Have you bought music with a mobile device?            0%      100%
Do you think a full song preview with the noise        100%    0%
is better than a normal preview with a short high
quality sample?
Do you think a full song preview with the noise        60%     40%
is better than a full song preview with otherwise
decreased audio quality (low bit rate)?
Do you think it is beneficial that you don't need      100%    0%
to download the song again after purchasing?
Would you consider using the system if it was          20%     80%
commercially available?
The questionnaire was presented with the same audio quality evaluation toolkit as
the imperceptibility tests. Before listening to the watermarked and non-watermarked
samples, the users answered the questions about their experience of using the
system. The respondents are therefore the same as in the imperceptibility test, which
was explained in section 4.4.2.
6.6. Discussion
The users who participated in the questionnaire had no previous experience of
purchasing music from online music stores with a mobile device. The reason is
probably that the era of mobile music sales is just taking its first steps with the
iTunes support of iPhone, Nokia Music Store and the Comes With Music service
coming to the latest smartphones. The trend is clear, however, as the eMarketer
report shows [1].
The users were quite reluctant to use a similar system if it were commercially
available. The users may have thought that the commercial system would be as crude
as the implemented demo version. The iPhone solution, where you can easily
purchase music from iTunes, is currently setting the standard for the implementation
quality required by customers. Another reason may be that people do not
appreciate the idea of having music available in their mobile phones. There are still
many people who do not have smartphones with digital music player capabilities,
and some are still very content with phones that have only call and text messaging
capabilities.
The idea for using the full song for previewing purposes was appreciated. Every
test user preferred the full song to the generally used short preview clips. They also
appreciated the feature that they did not have to download the song again after they
had listened to the preview version and decided to unlock the high quality version.
The watermark noise used to disturb the listening experience in the preview version
did not get full marks from every test user. Some would have preferred a low bit rate
version or otherwise decreased audio quality to the watermark noise in the preview
version. The watermark noise inserted by the embedding algorithm can be modified
to sound different, but this thesis concentrated more on the imperceptibility and
robustness of the fingerprint watermark. Future research could put more emphasis on
the sound quality of the initial watermark on the preview songs.
7. DISCUSSION
The failure of encryption-based DRM systems in digital audio distribution has led to
an increase in online music stores that sell their music in some unprotected audio
format, such as MP3. At the same time, the online and mobile music markets
continue to grow as more people are getting accustomed to using digital audio
players and smartphones. This has led to an increasing need for new DRM
technologies that can protect the content while it is stored externally
unprotected.
Digital watermarking can be used for creating solutions based on embedding
inaudible identifiers known as digital fingerprints in audio. These fingerprints can
then be used for detecting the origin of the content in the case of Internet piracy. This
work designed and implemented an audio protection system utilizing removable
watermarking and fingerprinting technologies. The fingerprint watermark was robust
against signal processing attacks and it proved to be very close to imperceptible
in the listening tests. A more detailed analysis of the robustness and imperceptibility
tests is presented in section 4.5. Test users also answered a questionnaire on
purchasing music with a mobile device. These results are discussed in section 6.6.
Digital audio watermarking has been widely researched and several methods have
been developed for embedding the watermark. This thesis is also a continuation in
the series of digital watermarking publications at the MediaTeam Oulu research
group. The digital watermarking research at MediaTeam has currently two main
topics: image and audio watermarking. The algorithms presented in this thesis are
part of the long-term research of audio watermarking algorithms, and they combine
pseudo-random frequency hopping sequences and spread spectrum synchronization
with removable watermarking and fingerprinting techniques.
Related work in the field has usually concentrated on algorithm details and testing
the algorithm performance. This work also presents a fully functional system for
mobile audio distribution utilizing watermarking and fingerprinting techniques. This
allowed test users to try out the system and give their valuable opinion. Related work
on removable watermarking has mainly concentrated on digital image watermarking,
while this work combines audio watermarking with removable watermarking
techniques.
The industry has been very interested in the algorithms developed in this study,
and a local company has bought the rights to apply for a patent for the invention of
transforming the audible noise into an inaudible fingerprint. This proves the
significance of this work to the industry. In addition, because the results of this work show a
clear and significant improvement over the previous version of the algorithm, a new
paper based on this thesis will be submitted to a conference.
The interest shown by the industry opens up many directions for future research,
and the algorithms could be improved in several ways. From the user's point of
view, the audible watermark in the music file previews should be tuned to the level
where it is not too disturbing but still makes long-term listening uncomfortable.
The fingerprint watermark is also still not perfect in terms of robustness and
inaudibility. The robustness against the pitch attack could be improved by
spreading the embedded signal over a wider frequency band, so that modifications
to the audio frequency would not destroy the watermark so easily. Robustness
should also be evaluated with full-length songs, which could reveal new ways to
optimize the embedding algorithm.
The perceptual quality of the fingerprint watermark could be improved by using
an optimal frequency band for every audio file. A more effective frequency band
could allow the embedding strength of the fingerprint to be lowered, which would
directly improve the perceptual quality.
8. SUMMARY
This thesis designed, implemented and evaluated an audio protection system for
a mobile distribution environment. The technological focus was on digital rights
management and digital audio watermarking techniques.
The audio protection system consists of a server and a client component. The
server provides free preview music files that can be downloaded with the client
application. The preview files contain an audible watermark noise signal. The idea is
to allow music vendors to distribute complete songs as watermarked previews,
which gives customers a better preview of the music than the traditional
30-second samples. The noise signal makes long-term listening of the preview
unpleasant, but still allows the user to have a proper preview of the whole song.
The client application contains a watermarking algorithm that transforms the noise
signal into an inaudible fingerprint, effectively creating a high-quality version of
the song. This process requires a license, which is generated by the server and sent
to the client. The license contains the watermarking key required for the noise
transform process.
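The preview and unlock steps described above can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the algorithm developed in this thesis: it works in the time domain with plain additive noise and a simple spread spectrum correlation detector, whereas the thesis uses frequency hopping and spread spectrum synchronization. All function names, the key string and the parameter values are hypothetical, and the fingerprint strength is exaggerated so that the toy detector works on a short clip.

```python
import math
import random

def keyed_noise(key, n, amplitude):
    """Pseudo-random noise that any holder of `key` can regenerate exactly."""
    rng = random.Random(f"{key}:noise")
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(n)]

def chips(key, n):
    """Pseudo-random +/-1 spreading sequence used for the fingerprint."""
    rng = random.Random(f"{key}:fingerprint")
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def make_preview(clean, key, amplitude=0.2):
    """Server side: add the audible noise watermark to the clean song."""
    noise = keyed_noise(key, len(clean), amplitude)
    return [s + w for s, w in zip(clean, noise)]

def unlock(preview, key, user_bits, chunk, amplitude=0.2, strength=0.03):
    """Client side: subtract the keyed noise, then embed the user's bits."""
    noise = keyed_noise(key, len(preview), amplitude)
    out = [s - w for s, w in zip(preview, noise)]
    c = chips(key, len(out))
    for i, bit in enumerate(user_bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * chunk, (i + 1) * chunk):
            out[j] += strength * sign * c[j]
    return out

def detect(audio, key, n_bits, chunk):
    """Recover the bits by correlating each chunk with the spreading sequence."""
    c = chips(key, len(audio))
    bits = []
    for i in range(n_bits):
        corr = sum(audio[j] * c[j] for j in range(i * chunk, (i + 1) * chunk))
        bits.append(1 if corr > 0 else 0)
    return bits

# Toy usage: a 440 Hz tone stands in for the song (hypothetical values).
chunk = 10_000
user_bits = [1, 0, 1, 1]
clean = [0.5 * math.sin(2 * math.pi * 440 * t / 44100)
         for t in range(chunk * len(user_bits))]
preview = make_preview(clean, "license-key")
song = unlock(preview, "license-key", user_bits, chunk)
recovered = detect(song, "license-key", len(user_bits), chunk)
```

In this sketch the key delivered in the license is the only information the client needs to regenerate and cancel the noise, which mirrors the role of the watermarking key in the system described above.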
The purpose of the digital fingerprint, which is inserted into the high-quality
version during the noise transform process, is to identify the user in case the
song is leaked to unauthorized domains such as piracy torrents. The advantage of
carrying the identity of the song owner in a digital watermark is that it cannot be
removed with traditional DRM circumvention techniques, such as burning the music
to a CD and then ripping it back into an unprotected format. The watermark remains
in the audio even if the audio is converted into analog format.
The implementation was tested with three separate test cases. First, the
robustness of the fingerprint watermark was tested against an extensive set of
attacks, which made inaudible changes to the audio in an attempt to destroy the
fingerprint. The results showed that the algorithm was robust even when short audio
clips were used. The audio files used in real-life cases would be 10-15 times
longer, which should improve the results notably. Second, the inaudibility of the
fingerprint was evaluated in a listening test with 10 test users. The results
indicated that an average user could not tell the difference between watermarked
and non-watermarked audio clips. The audio quality should still be improved for
clips that are dynamically compressed. Finally, the test users answered a
questionnaire about the audio distribution business case implemented in the system.
They liked the idea that the full song was available in the free preview version,
although the sound quality of the watermark could be improved.
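A common way to quantify robustness results of this kind is the bit error rate between the embedded and the extracted fingerprint. The sketch below is illustrative only: the function names, the example bit patterns and the repetition scheme are assumptions, not taken from the thesis. It also shows one possible reason why longer files could help, namely that a fingerprint repeated several times can be recovered by majority voting even when each individual copy contains an error.

```python
def bit_error_rate(sent, received):
    """Fraction of fingerprint bits that differ after an attack."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have the same length")
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)

def majority_vote(copies):
    """Combine several extracted copies of the same fingerprint bit by bit."""
    n = len(copies)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*copies)]

# Hypothetical example: three extracted copies, each with one flipped bit.
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
copies = [
    [1, 0, 1, 1, 0, 0, 1, 1],  # last bit flipped
    [1, 1, 1, 1, 0, 0, 1, 0],  # second bit flipped
    [1, 0, 1, 0, 0, 0, 1, 0],  # fourth bit flipped
]
single = bit_error_rate(fingerprint, copies[0])
combined = bit_error_rate(fingerprint, majority_vote(copies))
```

Here a single copy has one wrong bit out of eight, while the majority vote over the three copies restores the original fingerprint.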
The algorithms developed in this work have received an interested response from
the industry, and a local company has bought the rights to apply for a patent for the
idea of transforming the audible noise watermark into an inaudible fingerprint. Also,
the good results in the robustness and imperceptibility tests could enable publishing a
conference paper on the topic of this thesis.
The main topics for future work include optimizing the perceptual quality of the
audible watermark in the preview versions to a level that balances giving a good
preview against encouraging the customer to purchase the high-quality version.
The robustness and imperceptibility of the fingerprint could also be improved with
additional research on the watermark embedding details.
9. REFERENCES
[1]
Verna, P. (2007). Recorded Music: Digital Falls Short. (read 11.8.2008)
eMarketer report. URL:
http://www.emarketer.com/Reports/All/Emarketer_2000472.aspx
[2]
Löytynoja, M., Cvejic, N. and Seppänen, T. (2007). Audio protection with
removable watermarking. Proc. Sixth International Conference on
Information, Communications and Signal Processing (ICICS 2007),
December 10-13, Singapore, 1-4.
[3]
Rosenblatt, W., Mooney, S. and Trippe, W. (2001). Digital Rights
Management: Business and Technology, John Wiley & Sons, Inc,
Chichester, 312 p. ISBN: 978-0-7645-4889-5
[4]
Rump, N. (2003): Digital Rights Management: Technological Aspects. In
Digital Rights Management – Technological, Economical, Legal and
Political Aspects. LNCS 2770, Springer, Berlin, 3-15. ISBN: 3-540-40465-1
[5]
Hauser, T. and Wenz, C. (2003): DRM Under Attack: Weaknesses in
Existing Systems. In Digital Rights Management – Technological,
Economical, Legal and Political Aspects. LNCS 2770, Springer, Berlin,
206-223.
[6]
Iannella, R. (2001). Digital Rights Management (DRM) Architectures.
D-Lib Magazine 7(6).
[7]
Open Digital Rights Language (ODRL) (read 23.7.2008) URL:
http://odrl.net
[8]
Rights Expression Language. Approved Version 1.0 – 15 June 2004. Open
Mobile Alliance. OMA-Download-DRMREL-V1_0-20040615-A
[9]
DRM Rights Expression Language. Approved Version 2.0.1 – 26 Feb
2008. Open Mobile Alliance. OMA-TS-DRMREL-V2_0_1-20080226-A
[10]
DRM Rights Expression Language. Candidate Version 2.1 – 24 Jul 2007.
Open Mobile Alliance. OMA-TS-DRMREL-V2_1-20070724-C
[11]
eXtensible Rights Markup Language (XrML) (read 23.7.2008) URL:
http://www.xrml.org
[12]
Löytynoja, M. and Seppänen, T. (2005). Hash-based Counter Scheme for
Digital Rights Management. Proc. 2005 IEEE International Conference on
Multimedia & Expo, Amsterdam, The Netherlands, 121-124.
[13]
Taylor, S. (2007). Industry sees sunnier side of digital copying. (read
17.11.2008) Global Technology Forum, Best practice, 21 Aug 2007. URL:
http://globaltechforum.eiu.com/index.asp?layout=printer_friendly&doc_id=11248
[14]
EMI abandons CD DRM. (read 6.8.2008) Boing Boing. URL:
http://www.boingboing.net/2007/01/08/emi-abandons-cd-drm.html
[15]
Halderman, J. A. and Felten, E. W. (2006). Lessons from the Sony CD
DRM episode. Proceedings of the 15th conference on USENIX Security
Symposium - Volume 15. Vancouver, B.C., Canada, USENIX Association.
[16]
Digital developments could be tipping point for MP3. (read 11.8.2008) URL:
http://www.reuters.com/article/musicNews/idUSN0132743320071203
[17]
Apple iTunes Store Support – Authorization FAQ (read 11.8.2008) URL:
http://www.apple.com/support/itunes/store/authorization/
[18]
Jobs, S. (2007). Thoughts on Music. (read 11.8.2008) URL:
http://www.apple.com/hotnews/thoughtsonmusic/
[19]
Anderson, N. (2007). PlayForSure becomes “Certified for Windows Vista”
(read 12.8.2008) URL:
http://arstechnica.com/news.ars/post/20071212-playforsure-becomes-certified-for-windows-vista.html
[20]
Nokia outlines its vision of Internet evolution and commitment to
environmental sustainability. (December 2007) Nokia Press Release. URL:
http://www.nokia.com/A4136001?newsid=1172937
[21]
Fisher, K. (2007). Musicload: 75% of customer service problems caused
by DRM (read 13.8.2008) URL:
http://arstechnica.com/news.ars/post/20070318-75-percent-customer-problems-caused-by-drm.html
[22]
Juergen, S. (2005). Digital Watermarking for Digital Media, Information
Resources Press, Arlington, VA. ISBN: 159140519X
[23]
Swanson, M. D., Kobayashi, M. and Tewfik, A. H. (1998). Multimedia
data-embedding and watermarking technologies. Proceedings of the IEEE
86(6): 1064-1087.
[24]
Petitcolas, F. A. P., Anderson, R. J. and Kuhn, M. G. (1999). Information
hiding - a survey. Proceedings of the IEEE 87(7): 1062-1078.
[25]
Cvejic, N. and Seppänen, T. (eds.) (2008) Digital Audio Watermarking
Techniques and Technologies: Applications and Benchmarks, Information
Science Reference, Hershey, PA, USA, 1-10.
[26]
Pramila, A. (2007) Watermark synchronization in camera phones and
scanning devices. Master’s Thesis, University of Oulu, Department of
Electrical and Information Engineering, Oulu, Finland.
[27]
Mäkelä, K. (2000) Digital watermarking and steganography. Diploma
Thesis, Department of Electrical Engineering, University of Oulu, Oulu,
Finland.
[28]
Petitcolas, F. A. P. (2003): Digital Watermarking. In Digital Rights
Management – Technological, Economical, Legal and Political Aspects.
LNCS 2770, Springer, Berlin, 81-92.
[29]
Cvejic, N. and Seppänen, T. (2004). Spread spectrum audio watermarking
using frequency hopping and attack characterization. Signal Processing.
84(1): 207-213.
[30]
Wen-Nung, L. and Li-Chun, C. (2006). Robust and high-quality
time-domain audio watermarking based on low-frequency amplitude
modification. IEEE Transactions on Multimedia. 8(1): 46-59
[31]
Löytynoja, M. (2008) Digital Rights Management of Audio Distribution in
Mobile Networks. Dissertation, Acta Univ Oul C 311, Department of
Electrical and Information Engineering, University of Oulu, Finland.
[32]
Cvejic, N., Keskinarkaus, A. and Seppänen, T. (2001). Audio
watermarking using m-sequences and temporal masking. Proc. 7th IEEE
Workshop on Applications of Signal Processing to Audio and Acoustics,
New York, NY, 227-230.
[33]
Jong Won, S. and Jin Woo, H. (2001). Audio watermarking for copyright
protection of digital audio data. Electronics Letters 37(1): 60-61.
[34]
Liu, K. J. R., Trappe, W., Wang, Z. J., Wu, M. and Zhao, H. (2005).
Multimedia Fingerprinting Forensics for Traitor Tracing, EURASIP Book
Series on Signal Processing and Communications, Hindawi Publishing
Corporation, New York, 272p. ISBN: 978-9775945181
[35]
Steinebach, M., Petitcolas, F. A. P., Raynal, F., Dittmann, J., Fontaine, C.,
Seibel, S., Fates, N. and Ferri, L. C. (2001). StirMark Benchmark: Audio
Watermarking Attacks. Proceedings of the International Conference on
Information Technology: Coding and Computing, April 2-4, Las Vegas,
49-54. ISBN: 0-7695-1062-0
[36]
Wu, M., Trappe, W., Wang, Z. J. and Liu, K. J. R. (2004).
Collusion-resistant fingerprinting for multimedia. Signal Processing Magazine, IEEE
21(2): 15-27.
[37]
Feng, J.-B., Lin, I.-C., Tsai, C.-S. and Chu, Y.-P. (2006). Reversible
watermarking: Current status and key issues. International Journal of
Network Security 2(3): 161-171.
[38]
Digital Watermarking Alliance brochure. (read 5.11.2008) URL:
http://www.digitalwatermarkingalliance.org/about.asp
[39]
MarkAny – Inaudible and Robust Audio Watermark Technology. (read
5.11.2008) URL: http://www.markany.com/en/sub_index.asp?fn=product
&spname=product_04_04
[40]
Verance Music Solutions (read 5.11.2008) URL:
http://www.verance.com/solutions/music.php
[41]
Verance Announces Availability of Audio Watermark Technology for
High-Definition Entertainment Formats. (read 5.11.2008). Verance press
release. July 2, 2007
[42]
ITU-T Recommendation P.800 (1996). Methods for Subjective
Determination of Transmission Quality
10. APPENDICES
Appendix 1
The properties of the applied attacks against the fingerprint watermark
No.  Attack            Properties
1    MP3 compression   Encoder: LAME version 3.97 MMX
                       Parameters: --quiet -h -b 128 (attack #1)
                       Parameters: --quiet -h -b 192 (attack #2)
                       Sampling frequency: 44100Hz
2    Chorus            Voices: 5
                       Delay time: 30ms
                       Delay rate: 1.2Hz
                       Feedback: 10%
                       Spread: 60ms
                       Modulation depth: 5dB
                       Modulation rate: 2.0Hz
                       Output level: Dry 100%, Wet 5%
3    Compressor        Attack time: 1ms
                       Release time: 500ms
                       Output gain: 0dB
                       Threshold: -50dB
                       Ratio: 1:1.0
4    Delay             Delay time: 400ms
                       Mix: 5%
5    Flanger           Initial delay: 1ms
                       Final delay: 2ms
                       Stereo phasing: 45 degrees
                       Feedback: 10%
                       Modulation rate: 0.40Hz
                       Mix: 5%
6    Invert            -
7    Low pass filter   Cut-off frequency: 15kHz
8    Pitch             Splicing frequency: 40Hz
                       Overlapping: 33%
                       Ratio: 101% (attack #1)
                       Ratio: 99% (attack #2)
9    Random noise      Maximum noise amount: 0.91%
10   Resampling        Resampling to 8000Hz
                       Bit depth: 8
11   Reverb            Decay time: 700ms
                       Pre-delay time: 10ms
                       Diffusion: 1818ms
                       Perception: 50
                       Output level: Dry 100%, Wet 20%
12   Stretch           Splicing frequency: 40Hz
                       Overlapping: 33%
                       Ratio: 102% (attack #1)
                       Ratio: 98% (attack #2)
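Three of the simpler attacks in the table (Invert, Delay and Random noise) can be sketched in a few lines using the listed parameter values. This is a minimal illustration, not the tool used in the tests: the function names are hypothetical, audio is assumed to be a list of samples in the range -1.0 to 1.0, "Mix: 5%" is assumed to mean that the delayed copy is added at 5% of its level, and "Maximum noise amount: 0.91%" is assumed to mean 0.91% of full scale.

```python
import random

SAMPLE_RATE = 44100  # Hz, the sampling frequency listed for the MP3 attack

def invert(audio):
    """Invert attack: flip the polarity of every sample."""
    return [-s for s in audio]

def delay(audio, delay_ms=400, mix=0.05, rate=SAMPLE_RATE):
    """Delay attack: add an attenuated copy of the signal delay_ms later.

    Assumption: the delayed copy is mixed on top of the unchanged dry signal.
    """
    d = int(rate * delay_ms / 1000)
    return [s + (mix * audio[i - d] if i >= d else 0.0)
            for i, s in enumerate(audio)]

def random_noise(audio, max_amount=0.0091, seed=0):
    """Random noise attack: add uniform noise bounded by max_amount.

    Assumption: max_amount is a fraction of full scale (1.0).
    """
    rng = random.Random(seed)
    return [s + max_amount * rng.uniform(-1.0, 1.0) for s in audio]

# Toy usage: a single impulse makes the effect of each attack easy to see.
impulse = [1.0] + [0.0] * (SAMPLE_RATE - 1)
delayed = delay(impulse)
noisy = random_noise(impulse)
```

With the impulse input, the delayed output contains the original impulse at sample 0 and a copy at 5% of its level 400 ms (17640 samples) later.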