mp3 book: Table of Contents
<david.weekly.org>
January 4 2002

Table of Contents
Chapter 0: Introduction
● What’s In This Book
● Who This Book Is For
● How To Read This Book

Chapter 1: The Hype
● What Is Internet Audio and Why Do People Use It?
● Some Thoughts on the New Economy
● A Brief History of Internet Audio
  ❍ Bell Labs, 1957 - Computer Music Is Born
  ❍ Compression in Movies & Radio - MP3 is Invented!
  ❍ The Net Circa 1996: RealAudio, MIDI, and .AU
● The MP3 Explosion
  ❍ 1996 - The Release
  ❍ 1997 - The Early Adopters
  ❍ 1998 - The Explosion
  ❍ sidebar - The MP3 Summit
  ❍ 1999 - Commercial Acceptance
● Why Did It Happen?
  ❍ Hardware
  ❍ Open Source -> Free, Convenient Software
  ❍ Standards
  ❍ Memes: Idea Viruses
● Conclusion
Chapter 2: The Guts of Music Technology
● Digital Audio Basics
● Understanding Fourier
● The Biology of Hearing
● Psychoacoustic Masking
  ❍ Normal Masking
  ❍ Tone Masking
  ❍ Noise Masking
● Critical Bands and Prioritization
● Fixed-Point Quantization
● Conclusion

Chapter 3: Modern Audio Codecs
● MPEG Evolves
  ❍ MP2
  ❍ MP3
  ❍ AAC / MPEG-4
● Other Internet Audio Codecs
  ❍ AC-3 / Dolbynet
  ❍ RealAudio G2
  ❍ VQF
  ❍ QDesign Music Codec 2
  ❍ EPAC
● Summary
Chapter 4: The New Pipeline: The New Way To Produce, Distribute, and Listen to Music
● Digital Recording
  ❍ to DAT (studio)
  ❍ from CD (post-master)
● MIDI Studios
● Digital Editing
● Digital Distribution
● Digital Consumption
● Portable Digital Audio
Chapter 5: Software Tools
● Encoding
  ❍ Audio Catalyst
  ❍ BladeEnc
  ❍ Fraunhofer’s tools
  ❍ Liquid Audio
  ❍ MusicMatch
  ❍ Microsoft
  ❍ RealJukebox / RealEncoder
  ❍ WinDAC32 & Other Rippers
  ❍ 3rd Party Encoding
● Playback
  ❍ WinAMP
  ❍ Sonique
  ❍ Microsoft
  ❍ FreeAMP
  ❍ RealPlayer
  ❍ Other Players
● Serving
  ❍ RealServer
  ❍ Shoutcast & Icecast
  ❍ Microsoft
● 3rd Party Serving
  ❍ Live365
  ❍ Myplay
● Summary
Chapter 6: The Law
● What Are You Allowed To Do With Music?
  ❍ Recording Rights, Composition Rights
  ❍ Streaming, Downloading, and Public Performance
● What Laws Are There?
  ❍ The Audio Home Recording Act of 1992
  ❍ The Digital Millennium Copyright Act of 1998
  ❍ The "No Net Copy" Act of 1997
● Where To Look For More Information
● Summary
Chapter 7: The Security Issue
● Encryption Systems
  ❍ Liquid Audio
  ❍ a2bmusic
  ❍ mjuice
  ❍ Microsoft’s ASX
● Watermarking Systems
  ❍ Aris
  ❍ SDMI
● Whence eMusic?
● Why People Will Try To Protect Music Even When It’s Impossible
● Why MP3 Will Be Slow To Die
● Summary

Chapter 8: How Artists Can Use The Internet (Push Out / Suck In)
● The Consumer Is Your Network: How and Why Superdistribution Works
● How To Push Out (Be Heard!)
● How To Suck In (Get Visitors!)
● How To Make Money From Your Fans
Chapter 9: Enjoying Internet Music
● The Hunt For Good Music
  ❍ Indies
    ■ MP3.com
    ■ AMP3.com
    ■ EMusic.com
    ■ Liquid Music Network
  ❍ Popular Music
    ■ MusicMaker
    ■ Napster
    ■ IRC
    ■ Friends!
● Streaming
● The Portable Issue
  ❍ Burning Audio CDs
  ❍ Burning MP3 CDs
  ❍ Portable MP3 Players
● Summary
The Leaders of the Revolution
● Michael Robertson, MP3.com
● Karlheinz Brandenburg, FHG IIS
● Shawn Fanning, Napster
● Jim Griffin, OneHouse/Cherry Lane Digital
● Gene Hoffman, eMusic
● Justin Frankel, Nullsoft/AOL
● Phil Wiser, Liquid Audio
● Jack Moffit, Icecast
● Doug Camplejohn, MyPlay
● Ram Samudrala
● Summary
Chapter 10: The Future
● What are the Labels Scared of?
● Personalized Radio
● Donation Systems / Shareware Music
● Multichannel Audio
● Interactive Music
  ❍ Collaborative Composition
  ❍ Voice-Based Composition
  ❍ "Cyberskat"
● Digital Video
● Summary

Appendix A: The Author's Story
Appendix B: Web Resources

content & layout copyright '2000 -{ david e weekly }-
Chapter 1: The Hype About Internet Audio
What Is Internet Audio and Why Do People Use It?
When people say "Internet audio," they're generally not
speaking about websites that sell CDs online. Instead,
they're talking about the recent phenomenon of
downloading files from the Internet that contain
information about music in a similar fashion to the way
that a CD stores music. This means that you can play
music on your computer without a CD, or a tape, or a
vinyl record! The song is stored in a file. These files
tend to be very large, as it takes a lot of information to
store high-quality audio. As a result, most people use
programs that compress their music - this way their
music files take up much less space on their hard
drives, but the music maintains the quality of a CD.
The most popular of these compressed music formats
is known as MP3. (We'll get into exactly how it works in Chapter 2!)
Once you have individual songs in files stored on your
computer, you can have much more control over your
music than if you had been listening only with a CD
player. For instance: you could make a list of your
favorite 100 jazz tunes, or send a song that you
particularly loved to a friend who lives across the
country. If you have a CD burner, you can even burn
custom audio mixes onto CDs for your friends! Since
files are copied perfectly, they do not degrade as you
make more copies like a tape would. Programs are now
cheaply and widely available to allow users to quickly
make music files of their entire CD collection. For
these reasons and more, in the last three years, it has
become very popular among college students to store
music on their computers.
Many people have complained that putting music on
your computer limits you, because you can only listen
to music while you're sitting in front of your computer!
Fortunately, several major manufacturers have solved
this problem by introducing small devices that can
store and play your music away from the computer:
they are shaped like very small Walkmen and weigh
almost nothing at all. Unfortunately, such
devices are not yet compelling at the time of this
writing, playing only an hour of music, after which you
must run back to your computer to "refill" the device
with new music - hardly suitable for a ski trip! There
are, however, even newer devices that will likely be
widely available by the time you're reading this that
will allow you to store many tens of hours of music.
Unlike most other technological revolutions before it
(such as the introduction of CDs), MP3 and other
Internet audio formats were not introduced by the
record labels. Instead, they were introduced by
consumers who, finding the technology exciting,
passed the knowledge on by word of mouth. In fact,
most record companies have been quite unhappy about
the existence of MP3s, chiefly because it is now possible
to quite easily obtain copyrighted music for free: the
latest Beck tune is just a click away, regardless of what
the label or the band thinks about it. Most labels are
scared that free copying on the Internet will erase their
ability to make a profit or, more importantly, to pay
artists. In Chapter 10, we'll see why they're scared.
Some Thoughts on the New Economy
The Internet is changing our notion of a market. We used to think that an
economy would be centered upon the sale of physical goods, with a small
market for services. The rapid and nearly free redistribution that the Web
permits morphs what were once products into services. News, once a physical
commodity, to be delivered on pressed sheets of paper, has since become a
service on the Internet. Obviously, the Internet cannot so dramatically change
industries less centered on the circulation of ideas: the steel industry, for
instance, has likely been undergoing far less rapid upheaval than the news
industry.
The music economy has been particularly interesting: originally, music was a
service. One paid to attend a concert - you did not receive any physical object
that embodied the music; that would be unthinkable! But when Edison first
recorded his voice on a wax cylinder at the end of the 19th century, all that
changed. Music could now be "bottled up," contained within a physical object,
and sold, just like bread and beef, as a commodity. New advances in production
technology, such as Ford's ingenious assembly lines, placed phonographs and
radios in millions of homes, which in turn allowed for the rapid commercial
distribution of music that exists to this day. Large record companies would
solicit radio stations to play their music, which in turn allowed for rapid and
widespread exposure, in its turn leading to increased sales of records. Pop
stars could be made or broken in a twinkling; music as a commodity was thriving
and labels (and a few lucky artists) were raking it in.
But now the Internet is entering into the picture and erasing the concept of
music as a product, returning music to the service market. Since music can be
(and is!) freely copied, an individual song carries little value: instead, it is the
arrangement and/or the branding of the song that is coming to be of value.
A Brief History of Internet Audio
So where did this notion of having computers play music come from? Truth be
told, it wasn't a sudden quantum leap; computer music has been evolving for
over 30 years. If any one place or any one man can be said to be the source of
this whole hullabaloo, though, it would have to be Max Mathews' group at Bell
Labs in New Jersey.
Bell Labs, 1957 - Computer Music Is Born
Max Mathews was working as a researcher for AT&T, whose Bell Laboratories have
produced some of the most amazing technological discoveries of the century,
such as the transistor, the laser, the digital computer, and, most relevantly,
electronic audio recording and the phonograph. While there
had been a few individuals who had made machines capable of electronically
generating music, Max was the first to generate music on a general-purpose
computer. In 1957, Max released "Music I", a program for a very early IBM
computer that allowed music to be synthesized in the computer and output to a
speaker. In the mid-60's, famous movie director Stanley Kubrick heard a later
and more advanced version of Max's program actually sing the classic song
"Daisy, Daisy, Bicycle Built for Two..." and was so impressed with the
technology that he incorporated it into his movie 2001: A Space Odyssey.
(Near the end of the movie, we discover that this song was the first thing that
HAL, the film's intelligent and self-aware computer, had learned.) The original
version is included on the CD in the back if you'd like to have a listen.
Compression in Movies and Radio - MP3 is Invented!
If you did bother to listen to the sample, you too would conclude that music
synthesis has come a long way since then, with "Techno" (primarily
computer-generated music) emerging as a musical category in its own right,
and most modern pop songs making heavy use of computer synthesis. But
using a computer to synthesize music is only one part of the picture: since
computers can perfectly copy music, it would seem to be most prudent to use a
digital device to transmit and store music.
The film industry has been very interested in digital audio formats from the
beginning, but there was a very interesting initial problem with adding digital
audio to movies. Audio was stored in a very small band to the right and left of
each frame of the movie: it would be impossible to store the full digital signal,
so the music needed to be compressed in order to fit on the reel. Dolby
Laboratories, along with several other companies, rose to the challenge and
invented several compression schemes that survive to this day.
Radio stations also were keenly interested in digital audio, albeit for different
reasons. Radio producers desired the ability to simultaneously broadcast a live
show to many stations without a loss in quality. The solution would have to be
for a broadcasting facility to "call up" a radio station and digitally transmit the
audio. The problem with this is that the speed at which the telephone networks
in the late 1980's sent information was far too slow for uncompressed audio. As
a result, several companies undertook extensive research to discover an
effective way to compress audio enough to be sent over the telephone lines.
Karlheinz Brandenburg at Fraunhofer IIS, a German commercial research
institute, designed one of the most effective algorithms for audio compression:
as the third and most advanced method for compressing audio standardized by
the Moving Picture Experts Group (MPEG), it was dubbed MPEG Layer 3
audio, or MP3 for short. MP3 was invented in 1989 and standardized by 1991.
The algorithm was so complicated that only a very expensive, dedicated piece of
hardware could run it; the notion that a personal computer would some day be
able to run such software had likely occurred to few.
The Net Circa 1996: RealAudio, MIDI, and .AU
Around 1991, the world's largest inter-network (a network of computer
networks) connecting U.S. government, educational, and research facilities,
started to garner public attention. It became known as the Internet, or just
"The Net" for short. University students gradually started using electronic
mail, or "email," to send letters and messages to their friends on campus or at
other colleges.
At the same time, Tim Berners-Lee was in Switzerland, developing the World
Wide Web for CERN, The European Center for Nuclear Research (the acronym
is from the French title). The University of Illinois at Urbana-Champaign soon
decided to implement a high-quality graphical cross-platform web browser called
Mosaic. Microsoft's Internet Explorer was built on Mosaic's licensed core, and
Netscape's Navigator was written by Mosaic's original developers.
The first versions of these browsers did very little with audio: they had
enough on their plate as it was, and most people involved with developing the
browsers were focused on the creation of a new publication medium: since
journals don't make music, why should a web browser? Nevertheless, one could still
download sound files and play them back. Initially, there was one dominant
format: Sun's .AU files. .AUs sound awful, but they're about as good as you
could hear off of the tiny built-in speakers in a Sun workstation. I'll cover them
in more detail in Chapter 3.
RealAudio v1.0 came out of beta on July 26, 1995, allowing users for the first
time to listen to music as it downloaded: people could begin hearing a tune as
soon as they clicked on it, as opposed to having to wait until the download
completed. That fall, NPR began posting 5-minute news segments on their
website in RealAudio format. Streaming audio had come to the web.
Unfortunately, even with their subsequent 2.0 and 3.0 releases, the audio
quality was awful - unlistenable for anything except speech. People were
amused by audio on the Internet, but few took it seriously.
Arguably the most annoying of all Internet audio formats is MIDI. MIDI files
are stored in a very different fashion from most others. Instead of storing the
recording of, for instance, a piano concerto, a MIDI file stores the notes. That is to say,
all the file contains is that at so-and-so time, a C# is to be played on a grand
piano with such-and-such force. It is up to the computer that plays the actual
file to figure out what that C# should sound like. Naturally, if you have very
expensive gear hooked up to your computer, it will sound great. However,
synthesis on most people's computers sounds absolutely wretched. The two
major pluses of MIDI files are that they take up almost no space at all (you're
just storing the notes!) and that they are editable (if you want to, say, bring
the bass line up by two notes, you can). For the latter reason, this format has
been very popular with musicians. It is the former reason that enabled it to
take off in the early days of the Internet: only MIDI would allow you to hear a
2-minute song after a 15-second download on a 14.4 modem! (In contrast, this
amount of time would be sufficient to download only 2 seconds of MP3 audio.)
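To make the contrast concrete, here's a rough Python sketch of the two
approaches; the event list is only an illustration of the idea (real MIDI is a
compact binary format, and these note values are made up):

    # Event-based storage: a handful of instructions, not sound.
    # Each event: (time in seconds, instrument, note, force 0-127)
    events = [
        (0.0, "grand piano", "C#4", 96),
        (0.5, "grand piano", "E4", 80),
        (1.0, "grand piano", "G#4", 64),
    ]

    # Sample-based storage: the waveform itself. One second of
    # CD-quality stereo sound is 44,100 samples x 2 channels x 16 bits:
    print(44100 * 2 * 16)   # 1,411,200 bits for that single second
    print(len(events))      # versus three tiny note events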
As computers grew faster and people started getting faster and faster
connections to the Internet, an opportunity began to emerge for a high-quality
audio compression algorithm.
The MP3 Explosion
As mentioned earlier, the MP3 algorithm was conceived in 1989 and
standardized in 1991. It had not been anticipated to be widely run on personal
computers due to its computational complexity. However, as Intel continued
pushing out faster and faster chips, it became clear that once out-of-reach
algorithms might be able to run in realtime. It was important that MP3
decoding be able to run in realtime. If it didn't, users would have to wait several
minutes as the computer created a decompressed copy of the song before it
played. By being able to decode the upcoming audio as the song played, users
could click on a song and immediately hear it play, making for a considerably
more compelling experience. It was in 1996 that Intel finally released a
processor fast enough to do this: the Pentium 120.
1996 - The Release
It was late in 1996 that Fraunhofer decided to release their MP3 encoder and
decoder, simply dubbed L3ENC (for Layer 3 Encoder) and WinPlay3 (their
Windows MP3 player), as shareware on the Internet. A few people heard about
it, made a few MP3s, and spread the word. The MP3 buzz began.
One of the most impressive early websites was put up by a handful of students
at Texas A&M University with handles like "bongo" and "frixion." Their site,
called TEK, archived large quantities of high-quality streaming music: with a
click you could be listening to a personalized country music, alternative, or
R&B station. TEK's user interface was smooth and elegant, far beyond what
any commercial entity would manage to pull off for the next few years. Sites
like TEK exposed people to the MP3 revolution and greatly increased
awareness around MP3. Unfortunately, early the next year TEK was shut down
due to pressure from the University's administration. It never went online again.
1997 - The Early Adopters
It was in 1997 that MP3s gained a strong "early adopter" following, including a
good portion of the computer science types at colleges nationwide. As
mentioned in the introduction, this was around the time that I had begun setting
up my personal website to explain the intricacies of MP3s to the Internet public
and give links to the latest players. Many great sites similar to mine were
established; there was a real sense of community between those who were
using MP3s and maintaining MP3 websites. It was not long, however, before
the record labels began to act to stop MP3s from becoming popular.
My personal music website was shut down, along with several dozen other
websites. None of us had made any attempt to avoid detection; we had instead
made our sites as visible as possible, posting their location to all of the popular
search engines. We had also made links to each other's pages. It was, as
a result, a simple task to discover and contact all of us rapidly: indeed, in one
week early in 1997, just about every popular MP3 site on the Net disappeared.
Later MP3 websites focused less on the specific distribution of MP3s and more
on MP3 resources: how to make them, where to get the latest players, what
sorts of places to get the files from, etc. Michael Robertson acquired MP3.COM in late 1997
and developed an effective MP3 portal of this type (popular initially because of
the domain) and also began signing bands up to non-exclusively distribute their
music on the site.
The media caught on starting in the middle of the year, and articles began
appearing in all sorts of business and technology magazines, discussing the
future of the record industry. Near the end of the year, Microsoft quietly
added MP3 playback and encoding to their Netshow (later renamed Windows Media)
tools.
Many programmers began to look at Fraunhofer's programs and improve
upon them, writing their own audio players from scratch. Tomislav Uzelac,
then a Croatian student, decided to make a low-level engine to play back MP3
files that would let other people put a nice user interface on it or integrate it into
other software players. A number of people noticed that this would make it
very easy to create new players and began doing so. Justin Frankel, also a
student at the time, constructed his MP3 player "WinAMP" based on the
engine. WinAMP had a very straightforward, attractive interface and quickly
gained a massive following, which it maintains to this day. Nullsoft,
Justin's holding company for WinAMP, was bought by America Online in June
of 1999.
1998 - The Explosion
The underground MP3 phenomenon continued through the next year, with the
introduction of high-quality software and extremely rapid word-of-mouth
growth. By the end of 1998, most college students had heard about MP3s and
most major news outlets had written at least one story about the new music
explosion.
Sonique was perhaps the most exciting software release of the year, offering a
slick and dynamic interface that felt right out of a sci-fi movie. WinAMP
continued to develop advanced features, like a customizable user interface and
an advanced "plugin" architecture that allowed third-party developers to
integrate new functionality into WinAMP. Hardware manufacturers began to show
interest in the growing MP3 market and Saehan, a Korean hardware manufacturer,
announced that they would be selling a portable MP3 player called the MPMan.
The RIAA (the Recording Industry Association of America) launched their
SoundByting campaign and website in an attempt to steer college-age students
away from music piracy and convince people that sharing music wasn't "cool."
Unfortunately for them, the notion of sharing music has shown itself to be
compelling to wide numbers of people; most people who knew about SoundByting
were already heavily involved with MP3s. In early 1999, the RIAA dumped the PR
agency that had been managing the campaign, but the site remains to this day.

sidebar: The Annual MP3 Summits

Around February of 1998, I was talking on the phone with Michael Robertson. I
mentioned to him that I had had plans to host the first MP3-oriented
conference in the fall of 1997. Unfortunately, plans had fallen through, due
to my not being able to personally pay the down payment on the hotel and
funding coming through too late. Michael sympathized and told me that MP3.com
would sponsor such a conference if I decided to try again. I told him I was
too busy, being a full-time student at Stanford. The next week he called me
and told me that MP3.com was going to put on the conference, just as I had
envisioned it: an annual event with discussion panels from the legal, music,
and tech industries, mingling time, and music at the end. I was greatly
pleased. The Annual MP3 Summits were formed, the first one taking place in
June of 1998. My report on that first Summit is still on their site.

1999 - Commercial Acceptance
1999 signified the complete acceptance of MP3 by hardware, software, and
Internet companies. MP3.com went public as MPPP, and eMusic began signing
popular bands to exclusively sell their albums online (including Bush, James
Brown, Phish, and They Might Be Giants). WinAMP's parent company, Nullsoft,
got bought out by America Online along with Spinner.com, a set of online radio
stations. Yahoo! acquired online audio/video giant broadcast.com while Lycos
purchased Sonique. Nullsoft introduced Shoutcast, new software and services
allowing individuals to listen to, create, and broadcast their own online
radio shows; an open-source variant by the name of Icecast soon showed up to
compete. Startups live365 and myplay jumped onto the scene to allow people to
manage their own MP3 collections and freely outsource their broadcasting.
Dozens of hardware companies began to pump out
portable players with no moving parts, including such
heavy-hitters as RCA, Diamond Multimedia and
Creative Labs, with players expected in 2000 from
Sony, Panasonic, Toshiba, and Casio.
The RIAA conceded that digital audio is likely to be the future of music
distribution; instead of focusing exclusively on trying to stop the MP3
revolution, they
redirected their efforts towards creating a new, secure
music format. Their Secure Digital Music Initiative
(SDMI) tried to formalize a standard in time to allow
hardware manufacturers to incorporate protection into their devices before the
Christmas rush, but negotiations dragged on longer than expected and not a
single SDMI-compliant player was sold in the holiday season.
Napster was also released in 1999, allowing users to connect with music on
each other's computers, achieving particular infamy for its sheer effectiveness
at letting users exchange music. We'll go into more detail on Napster in Chapter
9.
The MP3 revolution was well on its way, with artists signing to online sites left
and right, hardware and software companies making it ever easier to use and
manipulate MP3s, and increasing numbers of listeners flocking to the format.
That's the "how" of the MP3 revolution, but there's another important question
to ask...
Why Did It Happen?
Hardware
The MP3 revolution happened as soon as it was capable of happening - as soon
as computers came onto the market that were fast enough to cope with playing
back MP3 files, the technology took off. Coincidentally, storage space in 1997
was entering into the multi-gigabyte range, allowing regular users to store
many hours of music on their computer without necessarily buying new
storage. As hard drives continued to increase in size and lessen in price, it
became possible to cheaply build absolutely massive (200+ CD) audio
collections on a regular PC; understandably, this has made MP3 usage all the
more compelling. As broadband (high-speed) Internet access is extended to the
U.S. population, it's quite likely that rich-media activities such as MP3 sharing
will continue to explode.
Open Source -> Free, Convenient Software
When Fraunhofer released L3ENC, they also released the source code to play
back the resulting MP3 files. This enabled a whole generation of free MP3
playback engines that in turn became today's popular MP3 players. Without this
source, there might not have been such a diversity of compelling software
players, and MP3 might never have gained the popularity that it did. Many
other formats exist today that are more technically advanced than MP3 but that
do not allow people to freely create players and, consequently, do not have
much of a following.
Fraunhofer was also quite generous in licensing its encoding technology and as
a result there are a fair number of high-quality MP3 encoders available, some
of which are entirely free, others of which can be purchased for a very modest
fee. Some other companies won't license their algorithms for any price; Apple
has restricted Sorenson, the maker of QuickTime's video codec, from licensing
it anywhere else. Such closed policies have made it nearly impossible for
other formats to encroach upon the much more open turf of the MP3 world.
Standards
It is also equally important to MP3's success that it is a very well-defined
standard. As a result, there is complete software and hardware interoperability:
any program that makes an MP3 can create a file that is playable on any
hardware or software MP3 player. New uses of the format, such as with Icecast
and Shoutcast, can be rapidly deployed and integrated into the existing
architecture. Without a standard, such interoperability would be impossible.
Memes: Idea Viruses
It's also important to note that the MP3 revolution could never have happened
(or it would have taken much longer) if the Internet had not been widely
popularized; the Internet allowed participants to post information about the
format, exchange messages, inform, and share. It let people quickly learn about
what MP3s are and obtain software to make and listen to MP3s. Once people
found out, they would often rush to tell their friends about it, spreading the
word rapidly. The Internet especially enables this kind of rapid propagation of
ideas; in some ways the ideas spread like viruses through a population: you
catch an idea from a friend and once you're infected, you pass the idea on to
other friends of yours. Richard Dawkins called such ideas "memes." Since it's
easy to share ideas with people over the Internet, memes can be rapidly
propagated. The Internet enabled the MP3 meme.
Conclusion
MP3 has been a long time coming; we've seen how the development of the
Internet and of audio technology led up to the MP3 explosion and the way that
MP3 has grown from an underground movement into a popular and
widely-accepted activity. Hundreds of companies are now engaged in
MP3-related activity, working on making music easier to make, share, find, and
hear. MP3 is everywhere. I hope you have a better understanding of how and
why MP3 has grown to the level of hype now pervading the media.
content & layout copyright '2000 -{ david e weekly }-
Chapter 2: The Guts of Music Technology
In this section and the ones following, things are going
to get increasingly technical. I'm going to start off
pretty simple and slowly ramp up to some considerably
involved topics, so please feel free to skip the parts that
you already know to get to the juicy stuff. It's possible
that you may find some parts overwhelming. Don't
worry too much about it; just feel free to skim. To
make this easy for you, I've bolded the
key definitions throughout the text. And if you get
bored? Just go to the next chapter. Nobody's quizzing
you on this!
Digital Audio Basics
Computers work by passing small charges through
aluminum trenches etched in silicon and shoving these
charges through various gates: if this charge is here
and that one is too, then the chip will create a charge in
another place. The computer does all of its
computations in ones and zeroes. Integers, like -4, 15,
0, or 3, can be represented with combinations of ones
and zeroes in an arithmetic system called binary.
Humans normally use a "decimal" system with ten
symbols per space: we count 1, 2, 3,...8, 9, 10, 11. In
the binary system there are only two symbols per
space: one counts 1, 10, 11, 100, 101, 110, 111, 1000,
etc.!
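If you have Python handy, you can watch the computer count in binary for
itself (a tiny illustrative snippet):

    # Print each number next to its binary representation.
    for n in range(1, 9):
        print(n, format(n, 'b'))   # 1 1, 2 10, 3 11, ... 8 1000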
If the computer is to understand how to store music,
music must be represented as a series of ones and
zeroes. How can we do this? Well, one thing to keep in
mind throughout all of this discussion is that we're
going to be focusing on making music for humans to
hear. While that may sound trite, that will allow us to
"cheat" and throw out the parts of the music the people
can't hear: a dog might not be able to appreciate
Mozart as much after we're done with things, but if it
sounds just the same to an average Jane, then we've
accomplished our true mission - to have realistic music
come from a computer!
We first need to understand what sound is. When you
hear a sound, like a train whistle or your favorite
hip-hop artist, your eardrum is getting squished in and
out by air. Speakers, whistles, voices, and anything
else that makes sound repeatedly squishes air and then
doesn't. When the sound gets to your ear, it pushes
your eardrum in and out. If the air gets squished in and
out at a constant rate, like 440 times a second, you'll
hear a constant tone, like when someone whistles a
single note. The faster the air gets squished in and out,
the higher the tone you hear; likewise, the low bass tones
of a drum squish the air in and out very slowly, about
50 times a second. Engineers use the measurement
Hertz, abbreviated Hz, to mean "number of times per
second" and kilohertz, or kHz, to mean "thousands of
times per second." Some people with very good
hearing can hear sounds as low as 20Hz and as high as
20kHz. Also, the more violently the air is compressed
and decompressed, the louder the signal is.
Now we can understand what a microphone does. A
microphone consists of a thin diaphragm that acts a lot
like your eardrum: as music is being played, the
diaphragm of the microphone gets pushed in and out.
The more pushed in the diaphragm is, the more
electrical charge the microphone sends back to the
device into which you've plugged your mic. What if
you plug the mic into your computer? The computer is
good at dealing with discrete numbers, also known as
digital information, but the amount that the
microphone is being compressed is always changing; it
is analog information. There is a small piece of
hardware in a computer that allows it to record music
from a microphone: it is called an Analog to Digital
Converter, or ADC for short. It is impossible for us to
record a smooth signal as ones and zeroes and
reproduce it perfectly on a computer. The ADC does
not attempt to perfectly record the signal. Instead,
several thousand times a second it takes a peek at how
squished in the microphone is. The rate at which it
checks on the microphone is called the sampling rate. If
the microphone is 100% squished in, we'll give it the
number 65,535. If the microphone is not squished in at
all, we'll give it a 0, and we'll assign it a number
correspondingly for in-between values: halfway
squished in would merit roughly 32,768. We call these
values samples.
The Nyquist Theorem says that as long as our sampling
rate is twice the frequency of the highest tone we want
to record, we'll be able to accurately reproduce the
tone. Since humans can't hear anything higher than
22kHz, if we sample the microphone 44,000 times a
second, we'll be able to reproduce the highest tones
that people can hear. In fact, CDs sample at 44.1kHz
and, as suggested above, store the amount the
microphone was squished as a number between 0 and
65,535, using 16 ones and zeros, or bits, for every
sample. In this way, we'd say that CDs have a sample
resolution of 16 bits.
All of this data ends up taking a great deal of space: if
we sample a left and a right channel for stereo sound at
44.1kHz, using 16 bits for every sample, that's 1.4
million bits for every second of music! On a 28.8
modem, it would take you about 50 seconds to transmit
a single second of uncompressed music to a friend! We
clearly need a way to use fewer bits to transmit the
music.
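That arithmetic is easy to check; here's a small Python sketch (it assumes
the modem actually achieves its full rated 28,800 bits per second):

    channels = 2             # stereo: left and right
    sample_rate = 44100      # samples per second (44.1kHz)
    bits_per_sample = 16     # CD sample resolution

    bits_per_second = channels * sample_rate * bits_per_sample
    print(bits_per_second)   # 1,411,200 - about 1.4 million bits per second

    modem_speed = 28800      # bits per second for a 28.8 modem
    print(bits_per_second / modem_speed)   # ~49 seconds per second of music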
Those of you comfortable with computers may suggest
we use a compression program like WinZIP or
StuffitDeluxe to reduce the size of these music files.
Unfortunately, this does not work very well. These
compression programs were designed largely with text
in mind. These programs were also designed to
perfectly reproduce every bit: If you compress a
document to put it on a floppy, it had better not be
missing anything when you decompress it on a friend's
machine! Compression algorithms work best when
they know what they are compressing: specialized
algorithms can squish down video to a 100th of its
original size, and people routinely use the JPEG (.JPG)
compression format to reduce the size of pictures on
the web. JPEG is lossy; that is to say, it destroys some
data. If you scan in a beautifully detailed picture and
squish it down to a small JPEG file, you will see that
there are noticeable differences between the original
and the compressed versions, but in general it throws
away the information that is less important for your eye
in understanding what the picture is about. In the same
way, we will get much better compression of sound if
we use an algorithm that understands the way that
people hear and destroys the parts of the sound that we
cannot perceive. Already, we have done this in a small
way by ignoring any sounds above 22kHz. We might
have done things differently if we were making an
audio system for a dog or a whale; we have already
exploited some knowledge of the human ear to our
advantage, and now it is time to further use this
knowledge to compress the sound.
Understanding Fourier
In order to compress the sound, we need to understand what parts are okay to
throw away; that is to say, what the least important parts of the sound are. That
way, we can keep the most important parts of the sound so we can stream them
live through, say, a 28.8k modem.
Now, as it turns out, sound is very tonal. This means that sounds tend to
maintain their pitch for periods of time: a trumpet will play a note for a
half-second, a piano will sound a chord, etc. If I were to whistle an 'A' for a
second, your eardrum may be wiggling in and out very quickly, but the tone
stays constant. While recording the "wiggling" of the signal going in and out
would take a great deal of numbers to describe, in this case it would be much
simpler to simply record the tone and how long it went for, i.e., "440Hz (that's
A!) for 1.0 seconds." In this way, I've replaced hundreds of thousands of
numbers with two numbers.
While clearly most signals are not so compressible, the concept applies: sound
pressure, or the amount that your eardrum is compressed, changes very rapidly
(tens of thousands of times a second), while frequency information, or the tones
that are present in a piece of music, tends not to change very frequently (32
notes per second is pretty fast for a pianist!). If we only had a way to look at
sound in the frequency domain, we could probably get excellent compression.
Luckily for us, Jean-Baptiste Joseph Fourier, a 19th-century mathematician, came up
with a nifty way for transforming a chunk of samples into their respective
frequencies. While describing the method in detail has occupied many
graduate-level electrical engineering books, the concept is straightforward: if I
take a small chunk of audio samples from the microphone as you are whistling,
I take the discrete numbers that describe the microphone's state and run them
through a Discrete Fourier Transform, also known as a DFT. What I get out
is a set of numbers that describe what frequencies are present in the signal and
how strong they are, i.e., "There is a very loud tone playing an A# and there is a
quiet G flat, too." I call the chunk of samples that I feed the DFT my input
window.
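Here's what that looks like in practice, sketched in Python with NumPy (the
440Hz whistle and the one-second window are assumptions chosen for the
example):

    import numpy as np

    sample_rate = 44100                        # samples per second
    t = np.arange(sample_rate) / sample_rate   # one second of time points
    window = np.sin(2 * np.pi * 440 * t)       # a pure 440Hz "whistle"

    spectrum = np.abs(np.fft.rfft(window))     # the DFT of our input window
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    print(freqs[np.argmax(spectrum)])          # 440.0 - the loudest tone

With a full one-second window the frequency bins land 1Hz apart; shrink the
window to 1/100th of a second and they spread to 100Hz apart, which is exactly
the tradeoff described next.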
There is an interesting tradeoff here: if I take a long input window, meaning I
record a long chunk of audio from the microphone and run it all through the
DFT at once, I'll be able to pick out what tone a user was whistling with great
precision. And, just like with people, if I only let the computer hear a sound for
a short moment, it will have poor frequency resolution, i.e., it will be difficult
for it to tell what tone was whistled. Likewise, if I'm trying to nail down exactly
when a user begins to whistle into a microphone and I take short windows, I'll be
able to pick out close to the exact time when they started to whistle; but if I take
very long windows, the Fourier transform won't tell me when a tone began,
only how loud it is. I'd have trouble nailing down when it began and could be
said to have poor time resolution. Frequency resolution and time resolution
work against each other: the more you need to know exactly when a sound
happened, the less you know what tone it is; the more exactly you need to know
what frequencies are present in a signal, the less precisely you know the time at
which those frequencies started or stopped.
As a real-world example of where this is applicable, Microsoft's MS Audio 4
codec uses very long windows. As a result, music encoded in that format is
bright and properly captures the tone of music, but quick, sharp sounds like
hand claps, hihats, or cymbals sound mushy and drawn out. These kinds of
quick bursts of sound are called transients in the audio compression world.
Later on, we'll learn how MP3 deals with this. (AAC and AC-3 use similar
techniques to MP3.)
In 1965, two researchers, James Cooley and John Tukey, invented a way to perform
Fourier transforms a lot faster than had been done before. They decided to call
this algorithm the Fast Fourier Transform, or FFT. You will likely hear this
term used quite a bit in compression literature to refer to the Fourier transform
(the process of looking at what tones are present in a sound).
The Biology of Hearing
Now that we understand how computers listen to sounds and how frequencies
work, we can begin to understand how the human ear actually hears sound. So
I'm going to take a bit of a "time out" from all of this talk about computer
technology to explain some of the basics of ear biology.
As I mentioned before, when sound waves travel through the air, they cause the
eardrum to vibrate, pushing in and out of the ear canal. The back of the eardrum
is attached to an assembly of the three smallest bones in your body, known as
the hammer, anvil, and stirrup. These three bones are pressed up against an oval
section of a spiral fluid cavity in your inner ear shaped like a snail shell, known
as the cochlea. (Cochlea is actually Latin for "snail shell"!) The vibrations from
the bones pushing against the oval window of the cochlea cause hairs within the
cochlea to vibrate.
Depending on the frequency of the vibrations, different sets of hairs in the
cochlea vibrate: high tones excite the hairs near the base of the cochlea, while
low tones excite the hairs at the center of the cochlea. When the hairs vibrate,
they send electrical signals to the brain; the brain then perceives these signals as
sound. The astute reader may notice that this means that the ear is itself
performing a Fourier transform of sorts! The incoming signal (the vibrations of
the air waves) is broken up into frequency components and transmitted to the
brain. This means that thinking about sound in terms of frequency is not only
useful because of the tonality of music, but also because it corresponds to how
we actually perceive sound!
The sensitivity of the cochlear hairs is mind-boggling. The human ear can sense
as little as a picowatt of sound power per square meter, but can take up to a
full watt before starting to feel pain. Visualize dropping a
grain of sand on a huge sheet and being able to sense it. Now visualize
dropping an entire beachful of sand (or, say, an anvil) onto the same sheet,
without the sheet tearing and also being able to sense that. This absurdly large
range of scales necessitated the creation of a new system of acoustic
measurement, called the bel, named after the inventor of the telephone,
Alexander Graham Bell. If one sound is a bel louder than another, it is ten times
louder. If a sound is two bels louder than another, it is a hundred times louder
than the first. If a sound is three bels louder than another, it is a thousand times
louder. Get it? A bel corresponds roughly to however many digits there are
after the first digit. A sound 100,000 times louder than another would mean
there were 5 bels of difference. This system lets us deal with manageably small
numbers that can represent very large numbers. Mathematicians call these
logarithmic numbering systems.
People traditionally have used "tenths of bels," or decibels (dB) to describe
relative sound strengths. In this system, one sound that was 20dB louder than
another would be 2 bels louder, which means it is actually 100 times louder
than the other. People are comfortable with sounds that are a trillion times
louder than the quietest sounds they can hear! This corresponds to 12 bels, or
120dB of difference.
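In code, the conversion is a single logarithm (a quick Python sketch):

    import math

    def decibels(power_ratio):
        # How many dB louder one sound is than another,
        # given the ratio of their powers.
        return 10 * math.log10(power_ratio)

    print(decibels(100000))           # 50.0 - the five-bel example above
    print(decibels(1000000000000))    # 120.0 - a trillion-to-one range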
If a set of hairs is excited, it impairs the ability of nearby hairs to pick up
detailed signals; we'll cover this in the next section. It's also worth noting that
our brain groups these hairs into 25 frequency bands, called critical bands: this
was discovered by acoustic researchers Zwicker, Flottorp, and Stevens in 1957.
We'll review critical bands a bit later on. Now, equipped with a basic
knowledge of the functioning of the ear, we can tackle understanding the parts
of a sound less important to the ear.
Psychoacoustic Masking
Your ear adapts to the sounds in the environment around you. If all is still and
quiet, you can hear a twig snap hundreds of feet away. But when you're at a
concert with rock music blaring, it can be difficult to hear your friend, who is
shouting right into your ear. This is called masking, because the louder sounds
mask the quieter sounds. There are several different kinds of masking that
occur in the human ear.
Normal Masking
Your ear obviously has certain inherent thresholds: you can't hear a mosquito
buzzing 5 miles away even in complete silence, even though, theoretically, it
might be possible with sufficiently sensitive instrumentation. The
human ear is also more sensitive to some frequencies than to others: our best
hearing is around 4000Hz, unsurprisingly not too far from the frequency range
of most speech.
If you were to plot a curve graphing the quietest tone a person can hear versus
frequency, it would look like a "U," with a little downwards notch around
4000Hz. Interestingly enough, people who have
listened to too much loud music have a lump in this curve at 4000Hz, where
they should have a notch. This is why it's hard to hear people talk right after a
loud concert. Continued exposure to loud music will actually permanently
damage your cochlear hair cells, and unlike the hair on your head, cochlear
hairs never grow back.
This curve, naturally, varies from person to person, and gets smaller the older
the subject is, especially in the higher frequencies. Translation: old people
usually have trouble hearing. Theoretically, this variance could be used to
create custom compression for a given person's hearing capability, but this
would require a great deal of CPU horsepower for a server delivering 250
custom streams at once!
Tone Masking
Pure tones, like a steady whistle, mask out nearby tones: if I were to whistle a C
very loudly and you were to whistle a C# very softly, an onlooker (or
"on-listener," really) would not be able to hear the C#. If, however, you were to
whistle an octave or two above me, I might have a better chance of noticing it.
The farther apart the two tones are, the less they mask each other. The louder a
tone is, the more surrounding frequencies it masks out.
Noise Masking
Noise often encompasses a large number of frequencies. When you hear static
on the radio, you're hearing a whole slew of frequencies at once. Noise actually
masks out sounds better than tones: It's easier to whisper to someone at even a
loud classical music concert than it is under a waterfall.
Critical Bands and Prioritization
As mentioned in our brief review of the biology of hearing, frequencies fall into
one of 25 human psychoacoustic "critical bands." This means that we can treat
frequencies within a given band in a similar manner, allowing us to have a
simpler mechanism for computing what parts of a sound are masked out.
So how do we use all of our newly-acquired knowledge about masking to
compress data? Well, we first grab a window of sound, usually about 1/100th of
a second-worth, and we take a look at the frequencies present. Based on how
strong the frequency components are, we compute what frequencies will mask
out what other frequencies.
We then assign a priority based on how much a given frequency pokes up
above the masking threshold: a pure sine wave in quiet would receive nearly all
of our attention, whereas with noise all of our attention would be spread around
the entire signal. Giving more "attention" to a given frequency means allocating
more bits to that frequency than others. In this way, I describe exactly how
much energy is at that frequency with greater precision than for other
frequencies.
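A minimal sketch of that allocation in Python - the band energies and masking
thresholds here are made-up numbers, where a real encoder would compute one
pair per critical band from its psychoacoustic model:

    # For each band: (signal energy in dB, masking threshold in dB).
    bands = [(60, 20), (35, 30), (25, 40), (50, 15)]   # hypothetical values

    # Priority = how far each band pokes above its masking threshold.
    priority = [max(energy - mask, 0) for energy, mask in bands]
    total = sum(priority)

    bit_budget = 64   # bits available for this window
    bits = [round(bit_budget * p / total) for p in priority]
    print(bits)       # [32, 4, 0, 28] - loud, unmasked bands get the most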
Fixed-Point Quantization
How are the numbers encoded with different resolutions? That is to say, how
can I use more bits to describe one number than another? The answer involves
a touch of straightforward math. Do you remember scientific notation? It uses
numbers like 4.02 x 10^32. The 4.02 is called the mantissa. The 32 is usually
called the exponent, but we're going to call it the scale factor. Since
frequencies in the same critical band are treated similarly by our ear, we give
them all the same scale factor and allocate a certain (fixed) number of bits to
the mantissa of each. For example, let's say I had the numbers 149.32, -13.29,
and 0.12 - I'd set a scale factor of 3, since 10^3 = 1,000 and our largest
number is 0.14932 x 10^3. In this way, I'm guaranteed that all of my mantissas
will be between -1 and 1. Do you see why the exponent is called a scale factor
now? I would encode the numbers above as 0.14932, -0.01329, and 0.00012 using a
special algorithm known as fixed-point quantization.
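In Python, choosing the scale factor and mantissas for those three numbers
looks like this (a sketch of the scheme just described):

    import math

    numbers = [149.32, -13.29, 0.12]
    largest = max(abs(n) for n in numbers)
    scale_factor = math.ceil(math.log10(largest))   # 3, since 10^3 = 1,000
    mantissas = [n / 10 ** scale_factor for n in numbers]
    print(scale_factor, mantissas)   # 3 [0.14932, -0.01329, 0.00012],
                                     # give or take float rounding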
Have you ever played the game where someone picks a number between 1 and
100 and you have to guess what it is, but are told if your guess is high or low?
Everybody knows that the best way to play this game is to first guess 50, then
25 or 75 depending, etc., each time halving the possible numbers left.
Fixed-point quantization works in a very similar fashion. The best way to
describe it is to walk through the quantization of a number, like 0.65. Since we
start off knowing the number is between -1 and 1, we should record a 0 if the
number is greater than or equal to 0, and a 1 if it is less than 0. Our number is
greater than zero, so we record 0: now we know the number is between 0 and 1,
so we record a 0 if the number is greater than or equal to 0.5. Being greater, we
record 0 again, narrowing the range to between 0.5 and 1. On the next step, we
note that our number (0.65) is less than 0.75 and record a 1, bringing our total
number to 001. You can see here how with each successive "less-than,
greater-than" decision we record a one or a zero and come twice as close to the
answer. The more decisions I am allowed, the more precisely I may know a
number. We can use a lot of fixed-point quantization decisions on the
frequencies that are most important to our ears and only a few on those that are
less. In this way, we "spend" our bits wisely.
We can reconstruct a number by reversing the process: with 001, we first see
that the number is between 0 and 1, then that it is between 0.5 and 1, and finally
that it is between 0.5 and 0.75. Once we're at the end, we'll guess the number to
be in the middle of the range of numbers we have left: 0.625 in this case. While
we didn't get it exactly right, our quantization error is only 0.025 - not bad for
three ones and zeroes to match a number so closely! Naturally, the more ones
and zeroes that are given, the smaller the quantization error.
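The whole game fits in a few lines of Python; this sketch follows the
convention used in the walkthrough above (0 means "upper half," 1 means
"lower half"):

    def quantize(x, bits, lo=-1.0, hi=1.0):
        # Encode x by repeatedly halving the range [lo, hi].
        code = ''
        for _ in range(bits):
            mid = (lo + hi) / 2
            if x >= mid:
                code += '0'   # keep the upper half
                lo = mid
            else:
                code += '1'   # keep the lower half
                hi = mid
        return code

    def dequantize(code, lo=-1.0, hi=1.0):
        # Replay the decisions, then guess the middle of what's left.
        for bit in code:
            mid = (lo + hi) / 2
            if bit == '0':
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    print(quantize(0.65, 3))    # '001', as in the walkthrough
    print(dequantize('001'))    # 0.625 - a quantization error of 0.025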
Conclusion
The above technique roughly describes the MPEG Layer 2 codec (techie jargon
for compression / decompression algorithm) and is the basis for more advanced
codecs like Layer 3, AAC, and AC-3, all of which incorporate their own extra
tricks, like predicting what the audio is going to do in the next second based on
the past second. At this point you understand the basic foundations of modern
audio compression and are getting comfortable with the language used; it is
time to move to a comprehensive review of modern audio codecs.
content & layout copyright '2000 -{ david e weekly }-
code: Programs I've Written
pixdir 0.3
pixdir is a set of scripts that take recursive directories full of
full-size pictures, munge through them, and create thumbnails
for pictures not yet thumbnailed, then spit out HTML
pages with tables full of thumbnails. (see this in action) The
code's currently quite a hack; not very elegant at all. But it
gets the job done. I'll move to something nicer later.
Simple Finger 1.0
Simple Finger is a simple finger client for Win32 systems. It
should theoretically work on Windows95, 98, NT, or 2000, but
has only been tested on Windows98. It is tiny: the ZIP file
above is 18Kb, including the binary and source code. The
code is simply structured and should serve as a useful tool for
anyone looking to learn Windows Sockets programming. It is
non-graphical, running from the DOS prompt. (No
documentation is included: if you don't know how it works,
you probably can't make use of it anyhow.)
Unbooting Yourself From Napster
Has the Metallica lawsuit banned you from using Napster?
Click here for some quick tips on how to get back online.
DiamondSilk: my senior project
DiamondSilk is a project to create structured data from
unstructured HTML, such as being able to deduce the price of
a product from the corresponding page at buy.com.
Documentation for the Napster Protocol is here.
I did a very quick summary overview of the Napster protocol.
A better and more comprehensive description can be found at
OpenNap.
Information for circumventing an ISP's Napster block is here
Here is a short tutorial on configuring a Linux box to act as a
SOCKS5 proxy to grant people who have been blocked from
Napster access to the Napster network.
A mirror of The Breaking of Cyber Patrol 4 (not my work)
My project for a secure, authenticated file exchange network (known alternately as
safeX or fexnet) -- while code work is just beginning and not yet available, read the
idea.
console othello has its own page!
rio v1.06
The official Rio utilities have now incorporated the ability to download
files from the Rio to the PC, so my patch to do this is no longer
necessary. I have cleaned things up and put them in these easy-to-use
packages:
● RedHat i386 RPM [v1.03 - old!]
● RedHat Source RPM [v1.03 - old!]
● Raw Source [v1.06 - newest]
ftpcheck v0.33
v0.33: Shifted to a more subtle anonymous email address to pass most
anonftp checks - thanks to Tox Gunn for pointing this out!
v0.32: Fixed misclassification of "a.b.c" hostnames as class C IPs
(Thanks, Jesper!)
v0.31: Patched up some dumb bugs and cleaned up the code a little bit
(Thanks Shane!). Wow, over a thousand downloads now! Keep mailing
back those source patches!
v0.3: ftpcheck is now an order of magnitude more efficient, thanks to
improvements from Shane Kerr and some new timeout code that I
wrote. Also now under the GPL.
ftpcheck scans hosts and networks for FTP and anonymous FTP
archives. It was written as a security analysis tool. I wouldn’t
recommend running it on subnets you don’t own, unless you like getting
calls from sysadmins at very early hours in the morning.
requires perl modules
relaycheck v0.3
The parent of ftpcheck, relaycheck scans a network for SMTP hosts that
permit "relaying" of email. These servers are vulnerable because a 3rd
party could use them to relay mail for the purpose of spamming folks.
Please email the administrators of any machines you find with this tool
and tell them to turn off SMTP forwarding!
requires perl modules
sweep v0.4
http://david.weekly.org/code/ (2 of 3) [1/4/2002 10:55:27 AM]
<d.w.o> code: Programs I've Written
mach-sweep was written back when Snap was running their Mach M3
contest in August of 1998. The basic gist of the contest was that
you had some small chance to win an instant prize every time you
searched. So I wrote a perl script to "search" snap for the same term
over and over again, and see if "congratulations!" was anywhere in the
returned page. If so, it would save out the HTML page to disk and notify
me. Otherwise, it would just print a period and search again. I ran it on
about seven machines in parallel -- let’s just say I have enough slinkies,
books, and video cameras to keep myself entertained for a while. ;)
requires perl modules
{required perl modules}
● perl [ for unix | for Win32 | for mac ]
● libnet [info]
● MD5 [info]
● MIME::Base64
● HTML::Parser
● libwww
notes to self on perl modules
codecs: Dave's Encoding Guide
One day, procrastinating from doing schoolwork and about
two days after Microsoft had released their new standard for
compressing sound, called MS Audio 4, I decided to see just
how good (or bad) the codec was and ran these tests, pitting
it against MP3 and RealAudio, both of which it was supposed
to crush. While I certainly don't think the quality is
earth-shattering - it does not scale well to provide CD-quality
audio and has annoying high-frequency artifacts - it may give
RealAudio a run for its money in the low-bitrate market.
As it turned out, the report became pretty popular. Over
30,000 people are estimated to have viewed this report. A
second report will be forthcoming, covering MP2, MP3, AAC,
AC-3, QDesign, EPAC, RealAudio, MS Audio 4, and VQF. (I
decided to reserve CodecReview.com from Internic.)
NOTE: this report is getting pretty old and may not be
representative of the current version of Windows Media.
december 4, 2000 - audio samples are back online!
Contents:
Executive Summary
Introduction
Equipment
DTMF Tone Tests
Sliding Tone Tests
Speech Tests
Music Tests
Updates
Also of Interest
codecs: Executive Summary
MS Audio v4.0 sounds considerably different from existing
codecs, likely due to its completely new compression
scheme. The sound is brighter and has a much greater
frequency range than other codecs, but at a loss of crispness
and precision in the upper register that many find intolerably
mushy and distorted, especially for "transients" - sounds that
occur quickly, like a hand clap or a hihat.
WinAMP and RealAudio’s MP3 playback engines were both
found to not perform properly in some of the tests, whereas
Xing and Sonique properly played back the test MP3 files.
RealAudio was found to perform adequately, not providing
spectacular results, but generally producing the most reliably
listenable files.
MP3 Variable Bitrate (VBR) coding, implemented by Xing,
was found to produce very crisp files, although a test has yet
to be run to see if the files are actually of higher quality than
constant bitrate files at the same average rate.
codecs: Introduction
Today is April 17, 1999. Microsoft a few days ago released a
proprietary codec, known as "MS Audio v4.0." Nobody's
exactly sure what's in it, but it seems to be pretty high-quality.
I wanted to investigate how well MS Audio actually works and
possibly get a glimpse into how it works as well. I created four
test files: a series of pure DTMF tones (like what your
telephone does to dial a number), an ascending tone and a
descending tone at the same time, a short voice clip, and a
short music clip. Just because I felt like it, I decided to check
a number of different encoding schemes for relative quality,
and ended up running a somewhat exhaustive battery of tests
on almost all of the popular codecs. The results were
interesting. I found one codec that could reproduce my
speech at 2.5kbps (that's an hour and a half of speech on
one floppy disk!) and I found some fascinating hints as to how
Xing's Variable Bitrate Coding actually works, as well as
some anomalies in the MS Audio codec itself.
As a quick note, I was unfortunately unable to perform
comprehensive testing on RealAudio’s G2 codec, as their free
G2 Encoder (called the RealProducer G2) was extremely
limited in what bitrates it would let you encode at. I was
somewhat miffed.
UPDATE: Real Networks is sending me a copy of
RealProducer G2 to complete the testing. More details when
it arrives.
codecs: Equipment
● CoolEdit96 (Registered) was used to edit the .WAV files and generate the sample tones.
● RealProducer G2 v6.0.3.271 (Free Version) was used to create the sample G2 files.
● RealPlayer G2 v6.0.5.27 was used to listen to the G2 files.
● Xing's AudioCatalyst v2.0 (Registered) was used to rip raw audio from my CD and, separately, to create the sample MP3 files.
● Microsoft's Windows Media Encoder v4.0.0.3688 was used to encode the sample MS Audio v4, ACELP.Net, G.723.1, VoxWare MetaVoice, and Lernout & Hauspie CELP files.
● The Windows Media Player v6.02.05.0410 was used to listen to all ASX files.
● WinAMP v2.10 was used to listen to some of the MP3 files.
● Sonique v0.90b was used for some MP3 tests.
● FreeAMP v1.2 was used for some MP3 tests.
● XingMP3 Player v1.0.0 was used for some MP3 tests.
codecs: DTMF Tone Tests
sample stereo DTMF tones (2.95 MB)
Interestingly enough, when I tried to encode the above file as
a constant bitrate MP3 file (@16kbps & @24kbps), the result
was complete silence! Apparently (I thought) MP3 had some
fixed filterbanks that didn't take well to pure tones. MP3 at
higher bitrates still sounded weird, and it wasn't until 128kbps
that WinAMP actually got it right. When I accidentally dragged
my 48kbps file onto the RealPlayer and it played correctly, I
realized that the problem was not on the encoding side, but
the decoding side: this was a bug in WinAMP. Try listening to
the following files in WinAMP, then try listening to them in
another program (like RealPlayer or Windows Media).
● 48 kbps MP3 (100 KB)
● 64 kbps MP3 (134 KB)
● 96 kbps MP3 (201 KB)
● 128 kbps MP3 (268 KB)
Lest you all think I’m nuts, or this bug gets fixed, I’ve provided
a WAV output (1.48 MB) of what WinAMP did to the MP3 at
48kbps. This didn’t seem to be a problem in other players.
Xing’s Variable Bitrate Encoding reproduced the file faithfully
at an average of 35.5 kbps at the lowest setting, but could not
get any smaller. The VBR file played back just fine under
WinAMP, unlike the constant bitrate files. There was little
difference (73KB to 90KB) between the highest and the
lowest VBR settings for this file:
● Lowest VBR setting (73 KB)
● Highest VBR setting (90 KB)
RealAudio was disappointing, as I could only scale it down to
about 20kbps with the free version of their encoder. The test
(48 KB) came out a touch scratchy.
The Microsoft encoder was able to scale to much lower
bitrates. When I fed it the tone file and asked it to encode a 5kbps file, I was at first
surprised to see that the output file contained nothing but silence. Then I realized
that all of the frequencies in the file were above 4khz: since the 5kbps encoder was
sampling at 8khz, it had missed all of the tones above 4khz! This, incidentally,
means that the codec includes a good low-pass filter. Otherwise, there would have
been a significant amount of aliasing and I would have ended up with a bunch of
noise. Instead, the whole signal was filtered out and I got pure silence.
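A quick way to see why the low-pass filter matters: if you resample a tone above the new Nyquist limit (half the sampling rate) without filtering first, the tone doesn't vanish - it folds back down as an alias. Here is a small numpy sketch of the principle (an illustration, not Microsoft's actual resampler):

    import numpy as np

    fs_hi, fs_lo = 44100, 8000
    t = np.arange(fs_hi) / fs_hi            # one second of audio
    tone = np.sin(2 * np.pi * 5500 * t)     # 5.5khz tone, above fs_lo's 4khz limit

    # Naive resampling to 8khz: pick the nearest sample, no low-pass filter.
    idx = np.arange(fs_lo) * fs_hi // fs_lo
    naive = tone[idx]

    spectrum = np.abs(np.fft.rfft(naive))
    peak_hz = np.argmax(spectrum) * fs_lo / len(naive)
    print(peak_hz)   # ~2500: the 5.5khz tone aliased to |5500 - 8000| = 2500hz

A good encoder filters everything above 4khz away before resampling, which is why the DTMF file came out as silence rather than as a 2500hz buzz.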
Listening closely to MS Audio encodings of the file, one can hear the warble and
mask when the two frequencies come close together. The two tones are almost
fighting each other for dominance. This anomaly does not seem to be as
conspicuous at lower sampling rates:
● 20kbps MSA4 encoding, sampled @ 16khz (46 KB) (hear the warble & mask)
● 20kbps MSA4 encoding, sampled @ 11khz (46 KB)
High frequency anomalies seem to be a fundamental problem with the MS Audio
codec; I will return to this further on in the writeup.
codecs: Sliding Tone Tests
sample sliding tones (1.76 MB)
This is the area in which I grilled the MS Audio Encoder
because I got the most tangibly interesting results from it:
● MSA4 5kbps w/8khz sampling (9 KB)
● MSA4 8kbps w/8khz sampling (13 KB)
● MSA4 12kbps w/8khz sampling [stereo] (18 KB)
● MSA4 16kbps w/8khz sampling [stereo] (23 KB)
● MSA4 16kbps w/11khz sampling [stereo] (23 KB)
Notice how in the 5 & 12 kbps samples the high and the low
tone toggle between each other, unsure of which should take
dominance. Also note the frequency cutoff (the high tone
suddenly appears & disappears): oddly enough, this cutoff is
not at 4khz as we might expect, but considerably lower. The
warbling would seem to indicate that the encoder is using a sort of
"point of focus," where it concentrates on the most energetic
portion of the signal. The lower-than-4khz cutoff also seems
to indicate that this focus model is not based on filterbanks,
or is at least a very different model. This is clearly something
entirely different from the MPEG-type perceptual audio
encoders. Listen to these MP3 files and hear how the high
tone reaches a full 4khz before cutting out:
● MP3 16kbps mono (20 KB)
● MP3 24kbps mono (30 KB)
● MP3 96kbps stereo (120 KB)
● MP3 VBR (low) (61 KB)
One thing that amused me about this bank of MP3s is that
RealAudio’s builtin MP3 decoder was unable to render them
properly, but WinAMP had no problem with them! Clearly,
there is some variation in MP3 decoder implementations! To
their credit, the Xing, Sonique, and FreeAMP players were all
able to successfully play both sets of files. Note how the
MP3s properly cut off at 4khz.
Again, the G2 test did not provide any interesting result, due
to the limitations on the free version of their encoder. (The paid version costs $150!)
The 20kbps stereo encoding performed adequately.
codecs: Speech Tests
speech sample (2.0 MB)
"Hello, my name is David Weekly and this is a test of speech
quality audio coding. The purple cat, masked, made an
indelible impression on the clandestine cohorts." - a random
sentence with crisp consonants
MS Audio v4.0
● 5kbps, 8khz, mono (9 KB)
● 10kbps, 11khz, mono (16 KB)
● 10kbps, 22khz, mono (16 KB)
● 16kbps, 16khz, mono (24 KB)
● 16kbps, 44khz, mono (24 KB)
● 22kbps, 44khz, mono (33 KB)
● 22kbps, 44khz, stereo (33 KB)
● 32kbps, 32khz, mono (48 KB)
● 32kbps, 44khz, mono (48 KB)
● 40kbps, 44khz, mono (60 KB)
The 5kbps version, while comprehensible, is unpleasant to
listen to; it is echoed, as if I were talking through a tin can.
The 10kbps version at 22khz sounds rather robotic. Reducing
the sampling rate to 11khz produced a much more pleasant
version, as there was less high-frequency "drowning" of the
signal. This is also illustrated in the 16kbps versions at 16
versus 44khz. It seems clear that if one is to use low bitrate
signals with MS Audio, it's better to use a low sample rate as
well. The 32kbps version still adds a rather annoying "swish" to my
voice, as if there were a thick piece of fabric on my lips as I
was speaking. At 40kbps, it becomes listenable, even with
some high-frequency artifacts still remaining.
MP3
● 16 kbps mono (23 KB)
● 24 kbps mono (34 KB)
● 32 kbps mono (46 KB)
● 32 kbps stereo (46 KB)
● 48 kbps stereo (69 KB)
● 128 kbps stereo (183 KB)
● VBR (lowest) stereo (77 KB)
● VBR (highest) stereo (168 KB)
The VBR (lowest) file here performed admirably against the constant bitrate
samples. One notices a high-pitched ringing in the 24-48kbps encodings. The
16kbps encoding is listenable, but sounds like I'm speaking through a plastic tube of
sorts.
Barath Raghavan wrote in to say that Fraunhofer’s encoder offers better quality
than Xing for low, constant bitrate speech. As soon as I get my hands on some
samples, I will post them.
Alternative Speech Codecs
● MetaVoice 2.4 kbps (5.7 KB)
● MetaVoice 3 kbps (6.6 KB)
● Lernout & Hauspie CELP 4.8 kbps (10.6 KB)
● Microsoft G.723.1 5.3 kbps (10.8 KB)
● ACELP.net 5 kbps (10.2 KB)
● ACELP.net 16 kbps (27 KB)
● ADPCM 6 bit (506 KB)
The MetaVoice codec performed outstandingly, intelligibly reproducing my voice at
a mere 2400 bits per second. While it sounds somewhat like a Speak ’n Spell
instead of me, the text comes across fairly clearly. I was pleasantly impressed. The
L&H CELP did not perform too well (IMHO) against G.723.1 and ACELP.net, and
while ADPCM offered high quality, the size was nearly two orders of magnitude
larger than MetaVoice.
ACELP.net would here be my recommended codec of choice for 5-15kbps speech
coding, with MetaVoice handling anything beneath that.
RealAudio
RealAudio did pretty well with their 16 kbps (24 KB) encoding and the 32 kbps (55
KB) were both pleasant to listen to, even if not transparent (i.e., there were
noticeable, but acceptable errors in the audio).
Recommendations
For encoding speech, I recommend the following codecs for the specified bitrates:
codec        bitrate
MetaVoice    < 5kbps
ACELP.net    5kbps - 15kbps
RealAudio    15kbps - 50kbps
MP3 VBR      > 50kbps
codecs: Music Tests
Here is the clincher. The fact of the matter is, most people
don’t encode sine waves or DTMF tones, and most of the
popular content out there isn’t speech, either. It’s music. So I
picked a nice 36-second sample of Brazilian music,
Bermimbau’s "Mandrake Som" from Blue Brazil.
funky music sample (6.3 MB)
Microsoft Audio v4.0
● 16 kbps, 16khz stereo (80 KB)
● 20 kbps, 16khz stereo (98 KB)
● 44 kbps, 44khz stereo (206 KB)
● 64 kbps, 44khz stereo (295 KB)
● 80 kbps (367 KB)
● 96 kbps (439 KB)
● 128 kbps (582 KB)
At 20kbps, MSA4 is definitely carrying the upper frequencies,
but they are "drowning." Listen carefully to how short and
crisp the syncopated hihat sounds in the WAV file and then
how long and watered-down it sounds in the 20kbps version. I
think what is so immediately surprising about MSA4 is that it
bothers to try and encode the higher frequencies at all: we're
not used to hearing that from a modem-rate codec. But now
we see why, perhaps, other codecs steered clear of that area
-- high-frequency anomalies can be quite annoying. Even at
44kbps, the higher frequencies are still being swished
around. RealPlayer, to counter, just hacks the signal down to
a frequency range that it's comfortable with. As a result, the
files are soft and easy to listen to, even if lacking crispness.
To the credit of MSA4, its overall quality is much higher than
that of the other codecs, and its dynamic range is excellent.
RealAudio
● 20 kbps (95 KB)
● 32 kbps (154 KB)
● 44 kbps (209 KB)
● 64 kbps (303 KB)
● 97 kbps (457 KB)
The 20kbps is not very nice to listen to; it feels kind of like sitting in a middle seat in
coach class on a 12-hour flight: all of the sound is packed into too low a frequency
range. The 32kbps version is listenable, but not quite yet pleasant. At 44kbps the
music, to me, breaks some gray border between "listenable" and "funky" - the music
is now genuinely enjoyable, with the hihats and percussion properly accounted for.
The 64kbps version fills out the sound a bit more, but the 97kbps version seems to
have little further to add.
MP3
● 16 kbps, mono (71 KB)
● 24 kbps, mono (107 KB)
● 32 kbps, mono (143 KB)
● 32 kbps, stereo (143 KB)
● 48 kbps, stereo (215 KB)
● 64 kbps, mono (286 KB)
● 64 kbps, stereo (286 KB)
● VBR (lowest) mono (231 KB)
● VBR (lowest) stereo (400 KB)
● VBR (low) mono (280 KB)
● VBR (mid) mono (338 KB)
● VBR (mid) stereo (582 KB)
● VBR (highest) stereo (847 KB)
Note that the RealPlayer G2 will choke on the VBR files, inserting large quantities of
silence. Their VBR support is obviously not quite polished. We see here once more
the classic trade-offs of sampling frequency and stereo vs. mono. The music
becomes listenable at 64kbps. While not encoded above (gosh, I'm getting tired!),
the 128kbps version adds a small amount of clarity and crispness to the sounds.
Notably, the VBR (highest) encoding is effectively transparent. I haven't been able
to find a file that encoded poorly with VBR on its highest setting: it will just suck up
more bits. This is ideal for archiving music, as it doesn't sound lossy at all, even for
very high-fidelity clips (I have an excellent high-rate VBR clip of Sting's "Hounds of
Winter" that just blew me away - I may put up part of it at some point).
Recommendations
While I have yet to put up more music here, I would say that in general MSA4
encodes low/mid frequency music excellently and that you should run to encode
most of your techno / drum&bass / house music with it right away. For those of you
who love folk, classical, or hifi audio, I'd either use MSA4 with a very high bitrate or
MP3 VBR at the highest setting. Given that it's free to stream MP3s and that you
can make and listen to them on a diverse array of platforms (versus just MS's), and
given the relatively lossless character of VBR (highest), I'd vote for that right now. If
you're already within a RealAudio framework, try to provide a 44kbps stream for those
of us at universities and at work who have fast enough connections: the music is an
entirely different (better!) experience when it is clean! Most people have G2 at this
point, or are willing to get it, so I would opt to use it. Although not covered here, the
G2 codec is significantly cleaner & nicer than the RA5 & RA3 Dolbynet-based
codecs. It's worth the move up.

Shame on RealNetworks for making all of their free tools so difficult to access. RN
cannot continue trying to pimp their customers, or people will move to more
pleasant and more powerful frameworks, like MS Audio and/or MP3.
One last thing: I came into this report wanting to dislike MS Audio v4.0. But it has
shown itself admirably, and seems to be based entirely on in-house, proprietary
work. While I loathe closed standards and the way they’ve tied the codec to their
own expensive products, the codec is of extremely high quality, possibly better than
AAC (although I need some listening tests for that!). Kudos to the quiet brains that
made it.
I’ll be making additions and modifications to this document as they come in. Please
feel free to tell me what you thought of the report, what’s wrong with it, where you
have something to add, or where you’d like me to put a sample of yours.
UPDATE: Microsoft did send me an email. In fact, I got an email from Microsoft’s
Codec Group Manager, Amir. Here’s what he said. It’s worth a read as he pointed
out some important technical flaws in the resampler bundled with MS Audio 4.0 that
may have caused the high-frequency errors. They are working on improving their
resampler but suggest in the meantime that anyone using MS Audio should
downsample on their own before encoding. I will downsample before encoding my
next round of tests.
codecs: Updates
If you want to be notified when this report and other items on
my website change, sign up here. I will not give your email
address to anyone, and I usually send out less than one
email message a month.
If you found this report useful, please consider donating a few
bucks or maybe a fresh batch of cookies to keep a starving
college student alive. Just send whatever you can to "David
Weekly, PO Box 14216, Stanford, CA 94309" and the gods
will bless you and I will not die of starvation.
codecs: Also of Interest
If you found this report interesting, you may also find John
Hayward-Warburton's writeup to be worth a read.
RealNetworks themselves did a writeup on MS Audio here.
Robin Whittle did an absolutely incredible comparison of
AAC, MP3, and VQF a few months ago. Robin also covers
lossless audio compression in great detail. Panos Stokas ran
a nice informal series of tests on his page. You could also
look at ISO's AAC listening tests (requires Adobe Acrobat) to
see a very formalized set of tests.
And, of course, if you found this writeup interesting, maybe
you’d like to look at the rest of <david.weekly.org>! =)
Lossless Compression of Audio
● Tests of Shorten, MUSICompress/WaveZIP, WaveArc, Pegasus SPS (ELS-Ultra), Sonarc, LPAC, WavPack, AudioZip, Monkey, RKAU and FLAC audio compression software.
● Links to material concerning the lossless compression (data reduction) of digital audio signals, including some other programs which I did not test.
● A detailed look at Rice coding and other techniques for compressing integers of varying lengths - particularly Elias coding and the work of Peter Fenwick.
● My particular interest is in delivery of music via the Net - with compression which does not affect the sound quality at all. I am primarily interested in compression ratios, not speed of the programs. This is the first web site devoted to listing all known lossless audio compression algorithms and software - please email your suggestions and I will try to keep it up-to-date.

Copyright Robin Whittle 1998 - 2000 [email protected] Originally written 8 December 1998. Complete new test series and update 24 November to 11 December 2000. Latest update 31 October 2001. The update history is at the bottom of this page.

Back to the First Principles main page - for material on telecommunications, music marketing via Internet delivery, the Devil Fish TB-303 modification, the world's longest Sliiiiiiinky and many other show-and-tell items.

To the /audiocomp/ directory, which leads to material on lossy audio compression, in particular, comparing AAC, MP3 and TwinVQ.
This new series of tests was performed as a project paid for by the Centre for Signal Processing,
Nanyang Technological University, Singapore. Dr Lin Xiao, of the Centre, whose program
AudioZip is one of the ten programs I tested, was keen that my tests be independent. To this end, I
used exactly the same test tracks I used in 1998, adding only two pink noise tracks which do not count
towards the averages for file-size and compression ratio. Thanks to Lin Xiao and his colleagues for
enabling me to do a proper job of this!
Let me know if you would like me to send you a dual CD-R set with all the test files so you can
reproduce these tests yourself.
Tests: what can be achieved with lossless compression?
Short answer: 60 to 70% of original file-size with pop, rock, techno and other loud, noisy music; 35% to 60% for quieter choral and orchestral
pieces.
My primary interest is in compression of 16 bit 44.1 kHz stereo audio files - as used in CDs. There are lossless compression systems for 24 bit and
surround-sound systems. While I have a few links to these, they are not tested here. My tests are for programs which run on a Windows machine,
though I have Linux machines as well, and some of these programs run under Linux too. I only found one Mac-only lossless compressor (ZAP)
and have not tested it.
In my 1998 tests I was not interested in speed, but in November 2000, in view of the fact that the compression ratios of the leading programs were
fairly similar, I decided to test their speed as well, since this varies enormously.
Five programs distinguished themselves with high compression ratios:
● Dennis Lee's Waveform Archiver (WavArc).
● Tilman Liebchen's LPAC.
● Lin Xiao's AudioZip.
● Matthew T. Ashland's Monkeys Audio.
● Malcolm Taylor's RKAU.

Since each program performed differently on different types of music, and since the choice of music in these tests is arbitrary, I cannot say
with confidence that any of these programs will produce generally higher rates of compression than the others. With my particular test
material, all five produce significantly higher rates of compression than the other programs I tested.

Since the difference between the best three programs and the next best three is only a few percent, many other factors are likely to influence your
choice of which program is most useful to you.
A full description of the test tracks follows the test results themselves. All tracks were 44.1 kHz 16 bit stereo .WAV files read directly from audio
CD and are either electronic productions or microphone based recordings - except for my Spare Luxury piece which was generated entirely with
software. The music constituted 775 Megabytes of data - 73 minutes of music. The tabulation of these figures was done by MS-DOS directory
listings and pasting the file-sizes as a block into a spreadsheet. (See notes below on exactly how I did it.) Those files are here: sizes9.txt and
lossless-analysis.xls . I am pretty confident there are no clerical or other errors, but these intermediate documents enable you to check.
Audio files contain a certain amount of information - "entropy" - so they cannot be compressed losslessly to any size smaller than that. So it is not
realistic to expect an ever-increasing improvement in lossless compression algorithm performance. The performance can only approach more
closely whatever the basic entropy of the file is. No-one quite knows what that entropy is, of course . . . I think that would require understanding the
datastream in a way which is exactly in tune with its true nature. For instance, a .jpg image of handwriting would appear to contain a lot of data,
unless you could see and recognise the handwriting and record its characters in a suitably compressed format. The true nature of sound varies with
its source, physical environment and recording method, and a lossless compression program cannot adapt itself entirely to the "true" nature of the
sound in each piece of music. Therefore it is not surprising that different algorithms work best on different kinds of music.
Here are the test results, with the figures in the main body of the table showing the compressed file size as a percentage of the original size. The
smallest figure in each row is the best result for that track. The average file sizes are the average of the file sizes of the test tracks 00 to 10. The
average compression ratio is simply 100 divided by the average file size percentage. Except where noted (WaveArc -4, and RKAU -l2 and -l3), I
have selected the highest compression option for all programs tested.

The two test files 11PS and 12PM are pink-noise files with a -12dB signal level. 11PS is independent stereo channels and 12PM is the same signal
on both channels - the left channel of 11PS. These are not realistic tests of compression of music, but they show something about the internal
functioning of the programs. The compression ratios for these pink noise files do not contribute to the averages at the bottom of the table. The
table is unavoidably wide, so scroll sideways and print in landscape. The table alone, for those who want to print it, is available as an HTML file
here: table.html .
Track                     Shorten  WaveZip  WavArc-4  WavArc-5  Pegasus  Sonarc   LPAC  WavPack  AudioZip  Monkey   RKAU
00HI Choral                 37.23    44.81     36.49     34.73    36.69   40.91  39.57    41.77     40.28   38.98  33.28
01CE Solo Cello             42.01    44.71     41.98     40.44    41.14   41.53  40.33    41.38     40.52   39.61  39.18
02BE Orchestra              55.68    57.99     42.00     40.72    42.43   53.15  40.55    43.89     43.48   39.86  39.01
03CC Ballet                 58.28    60.29     57.32     54.58    56.52   55.97  54.31    56.51     55.20   53.82  52.80
04SL Softw. Synth.          42.54    45.23     42.02     39.64    40.70   40.99  39.61    41.88     40.65   38.32  33.06
05BM Club Techno            74.07    75.43     69.51     68.45    70.70   72.91  68.45    69.75     69.34   66.81  66.60
06EB Rampant Techno         68.50    69.56     66.95     66.23    67.67   68.97  67.02    66.48     65.80   66.30  65.88
07BI Rock                   65.04    66.54     62.07     58.79    62.48   59.50  57.59    61.78     58.36   57.15  56.95
08KY Pop                    74.36    75.28     71.39     70.41    72.08   71.13  69.55    71.76     69.47   68.09  68.07
09SR Indian Classical 1     53.54    56.11     46.70     44.63    52.39   51.99  44.45    46.58     47.76   43.41  43.89
10SI Indian Classical 2     58.60    61.50     56.12     50.99    53.46   50.99  49.73    54.34     50.70   49.24  49.23
11PS Pink noise             86.70    89.06     86.25     86.21    86.42   87.13  86.15    86.54     85.87   86.45  85.49
12PM Pink noise mono        86.71    89.06     43.15     43.14    43.27   87.14  43.09    46.29     78.32   46.24  42.75
Average size (00 - 10)      57.26    59.77     53.87     51.78    54.20   55.28  51.92    54.19     52.87   51.05  49.81
Average ratio (00 - 10)     1.746    1.673     1.856     1.931    1.845   1.809  1.926    1.845     1.891   1.959  2.008
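The two summary rows can be reproduced with a couple of lines of Python - shown here with the RKAU column; the published ratio matches when computed from the rounded average size:

    rkau = [33.28, 39.18, 39.01, 52.80, 33.06, 66.60,
            65.88, 56.95, 68.07, 43.89, 49.23]          # tracks 00 - 10
    avg_size = round(sum(rkau) / len(rkau), 2)          # 49.81 (% of original)
    avg_ratio = round(100 / avg_size, 3)                # 2.008
    print(avg_size, avg_ratio)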
Time to compress the 3min 20sec Kylie pop track (500 MHz Celeron):

Shorten        0:17
WaveZip        0:22
WavArc -4      0:30
WavArc -5      4:37
Pegasus SPS    1:42
Sonarc         66:00
LPAC           1:18
WavPack        0:21
AudioZip       6:26
Monkey         0:28
RKAU
The compress time tests were performed with a 500MHz Celeron with 128MB of RAM and a 13Gig IDE hard disc. It took 7 seconds to copy the
test file (00ky.wav 35.9 MB) from and to the disc. These figures should be regarded as accurate to only +/- 20%.
The test files are described below. 6 second, 1 Megabyte sample waveforms are provided. The .wav files are stored in a directory which is not
linked to exactly here, to stop search engines downloading them. The directory is /audiocomp/lossless/wav/ . Type this into your browser if you
wish to download .wav files. Compression of these 6 second samples will no doubt produce different ratios than compressing the entire file, due
to variations in the sound signal from moment to moment.
After I did these tests, I discovered some non-ideal aspects of two files:
● The Orchestra track was in fact mono - both channels were almost identical. I think it was an old analogue recording.
● The Ballet file (Can Can) had 12 seconds of silence at the end.
I have not changed them, since they are the same files as I used in 1998.
For each track: description (file name and size in Megabytes), average level in dB, smallest compressed file size as a ratio of the original, length in min:sec, comments and source.

Choral - Gothic Voices: Hildegard von Bingen: Columbia aspexit (00HI.wav 55.9MB)
    Average level: -29.5 dB.  Smallest file size: 34.7%.  Length: 5:17.
    Source: A Feather on the Breath of God, Hyperion CDA66039.

Solo cello - Janos Starker, J.S. Bach: Suite 1 in G Major (01CE.wav 173.2MB)
    Average level: -20.4 dB.  Smallest file size: 40.3%.  Length: 16:45.
    Source: Sefel SE-CD 300A.

Orchestra - Beethoven 3rd Symphony (02BE.wav 43.6MB)
    Average level: -21.1 dB.  Smallest file size: 40.6%.  Length: 4:07.  Mono.
    Source: Berlin Philharmonic, Music and Arts CD520, from a Classic CD magazine issue 54 cover disc.

Ballet - Offenbach, Can Can (03CC.wav 24.4MB)
    Average level: -14.6 dB.  Smallest file size: 54.3%.  Length: 2:18.  12 sec silence at the end.
    Source: Unknown orchestra, Tek (Innovatek S.A. Bruxelles) 93-006-2.

Software synthesis - my "Spare Luxury" Csound binaural piece (04SL.wav 85.0MB)
    Average level: -20.5 dB.  Smallest file size: 39.6%.  Length: 8:02.

Club techno - Bubbleman (Andy Van): Theme from Bubbleman (05BM.wav 59.1MB)
    Average level: -11.7 dB.  Smallest file size: 68.5%.  Length: 5:35.
    Source: Vicious Vinyl Vol 3 VVLP004CD.

Rampant trance techno - ElBeano (Greg Bean): Ventilator (06EB.wav 44.0MB)
    Average level: -14.3 dB.  Smallest file size: 65.8%.  Length: 4:09.
    Source: Earthcore EARTH 001.

Rock - Billy Idol, White Wedding (07BI.wav 88.9MB)
    Average level: -17.3 dB.  Smallest file size: 57.6%.  Length: 8:23.
    Source: Chrysalis CD 53254.

Pop - Kylie Minogue, I Should be so Lucky (08KY.wav 35.9MB)
    Average level: -14.9 dB.  Smallest file size: 69.5%.  Length: 3:23.
    Source: Mushroom TVD93366.

Indian classical (mandolin and mridangam) - U. Srinivas: Sri Ganapathi (09SR.wav 71.7MB)
    Average level: -12.1 dB.  Smallest file size: 44.4%.  Length: 6:45.
    Source: Academy of Indian Music (Sandstock) Aust. SSM054 CD.

Indian classical (sitar and tabla) - Pt. Kartick Kumar & Niladri Kumar: Misra Piloo (10SI.wav 89.4MB)
    Average level: -19.4 dB.  Smallest file size: 49.7%.  Length: 8:27.
    Source: OMI music D4HI0627.

Pink noise stereo (11PS.wav)
    Average level: -12.2 dB.  Smallest file size: 85.8%.  Length: 1:00.

Pink noise mono (12PM.wav)
    Average level: -12.2 dB.  Smallest file size: 43.1%.  Length: 1:00.
The 10 programs I tested

Shorten - Tony Robinson
WaveZip - Gadget Labs (MUSICompress)
WavArc - Dennis Lee
Pegasus SPS - jpg.com
Sonarc 2.1i - Richard P. Sprague
LPAC - Tilman Liebchen
WavPack 3.1 - David Bryant
AudioZip - Lin Xiao, Centre for Signal Processing, Nanyang Technological University, Singapore
Monkeys Audio 3.7 - Matthew T. Ashland
RKAU - Malcolm Taylor
FLAC - Josh Coalson (Not tested yet.)
Any program listed as running under Windows 95 or 98 will presumably run under Windows ME, NT, 2000, XP etc.
Shorten - Tony Robinson
Homepage: http://www.softsound.com/Shorten.html
email: [email protected]
Operating systems: MS-DOS, Win9x.
Versions and price: Win9x and demos free. More functional MS-DOS and Win9x version available for USD$29.95.
Source code available?: (In the past.)
GUI / command line: GUI & command line.
Notable features: High speed.
Real-time decoder: In paid-for version.
Other features:
● Near-lossless compression available.
● Shorten "supports compression of Microsoft Wave format files (PCM, ALaw and mu-Law variants) as well as many raw binary formats".
● Paid-for version includes batch encoding and decoding, creation of self-extracting encoded files, and an MS-DOS command line encoder/decoder.
Theory of operation: A 1994 paper by Tony Robinson is available from this Cambridge University site.
Options used for tests: GUI program: "lossless".
Technical background to the program is at: http://svr-www.eng.cam.ac.uk/~ajr/GroupPubs/Robinson94-tr156/index.html . I tested version "2.3a1
(32 bit)" as reported in the GUI executable. This was from the shortn23a32e.exe installation file.
Seek information in Shorten files, and other programs which compress to the Shorten file format
There is another version of Shorten, "shortn32.exe" V3.1 at: http://etree.org/shncom.html . etree.org is concerned with lossless compression for
swapping DAT recordings of bands who permit such recordings. This is an MS-DOS executable which reports itself (with the -h option) as:
shorten: version 3.1: (c) 1992-1997 Tony Robinson and SoftSound Ltd
Seek extensions by Wayne Stielau - 9-25-2000
This adds extra data to the file, or as a separate file, to enable quick seeking within a file for real-time playback. It compresses and decompresses.
I was unable to get it to compress without including the seek data, so I did not test it. I assume its performance is the same as the program I
obtained from Tony Robinson's site.
Another program based on Tony Robinson's Shorten is by Michael K. Weise - a Win98/NT/2000 GUI program called "mkw Audio Compression
Tool - mkwACT" http://etree.org/mkw.html . This generates compressed Shorten files with seek information. It can also compress to MP3 using
the Blade codec. I tried installing version 0.97 beta 1 of this program, but there was an error.
Real-time players for Shorten files
In addition to the real-time player included in the full (paid-for) version of Shorten, there is a free plugin for the ubiquitous Windows MP3 (etc. &
etc.) audio player Winamp http://www.winamp.com . The plug-in - ShnAmp v2.0 - http://etree.org/shnamp.html . This uses the special files with
seek information produced by the programs mentioned above.
There is a functionally similar real-time player program for Xmms, the X MultiMedia System (Linux and other Unix-compatible operating systems):
xmms-shn, which is freely available, with source code, from: http://freeshell.org/~jason/shn-utils/xmms-shn/ .
WaveZip - Gadget Labs (MUSICompress)
Homepage: WaveZip http://www.gadgetlabs.com (but see note below on availability). MUSICompress http://hometown.aol.com/sndspace
email: None.
Operating systems: Win9x. (MUSICompress command line demo program runs in a DOS box under any version of Windows.)
Versions and price: Win9x evaluation version is free. A paid-for 24 bit upgrade was available, but Gadget Labs has now gone out of business. (MUSICompress command line demo program is free to use.)
Source code available?: No, but see Al Wegener's Soundspace site (below) for information and source code regarding the MUSICompress algorithm.
GUI / command line: GUI.
Notable features: High speed. Handles 8 and 16 bit .WAV files in stereo and mono. Also supports ACD (Sonic Foundry's ACID) and BUN (Cakewalk Pro).
Real-time decoder: No.
Other features: Very handy file selection system.
Theory of operation: Soundspace Audio's page for their MUSICompress algorithm: http://hometown.aol.com/sndspace (see notes below).
Options used for tests: There are no options. (But see note below on the command line version of MUSICompress.)
On 1 December 2000, Gadget Labs ceased trading and put some of its software in the public domain, with the announcement:
"We regret to announce that Gadget Labs is no longer in business. We sincerely appreciate the support from customers during the last
3 years, and we regret that we didn't meet with enough success to be able to continue to deliver our products and service. This web site
includes technical information and software drivers that are being placed in the public domain. Please note that usage of the
information and drivers contained here is at the user's sole discretion, responsibility, and risk."
Gadget Labs was primarily known for its digital audio interface cards. A Yahoo Groups discussion group regarding Gadget Labs is here. The
WaveZip page at their site (wavezip.htm) has disappeared. There is no mention of WaveZip at their site at present. For now, I have placed the
evaluation version 2.01 of WaveZip in a directory here: WaveZip/ . It is 2.7 megabytes.
In October 2001, Al Wegener wrote to me to point out the command line demo version of MUSICompress, which is available for free (subject to
non-disclosure and no-disassembly) at his site. He wrote:

    Even though the console interface is not nearly as nice as WaveZIP was, people can still
    submit WAV-format files to this PC app and both compress and decompress their files. This
    version also supports lossy compression, where users can play with a decrease in quality
    (one LSB at a time), vs. an increase in compression ratio.

    By the way, I've gotten several new customers recently that use MUSICompress specifically
    because it's fast. On many of these customers' files, an extra 10% compression ratio just
    isn't worth a 20x wait.
MUSI-Compress Theory
The information sheet at: http://members.aol.com/sndspace/download/musi_txt.txt indicates that MUSI-Compress is capable of reducing rock
recordings to between 60 and 70% of their original size. An informative paper from the developer, Al Wegener, is available in Word 6 format from
the Soundspace site. MUSICompress is written in ANSI C using integer math only. It has been ported to at least two DSPs and is used in the
WaveZIP program (see below).
There is also a Matlab version, and the documentation which comes with this indicates that MUSICompress typically requires between 35 and 45
instructions per sample to compress, and between 25 and 35 instructions per sample to expand.
According to Al Wegener, like other commercial lossless audio compression algorithms, MUSICompress uses a predictor to approximate the audio
signal - encoding the prediction data in the output stream - and then computes a set of difference values between the prediction and the actual
signal. These difference values are relatively small integers (in general) and these are compressed using Huffman coding and sent to the output
stream. The compress and decompress functions can apparently be implemented in hardware with 4,700 gates and 20,500 bits of RAM (compress)
and 3,800 gates and 1,500 bits of RAM (decompress) - which sounds pretty snappy to me.
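The predictor/difference split is easy to show in miniature. The following Python sketch is not MUSICompress itself (whose predictor and Huffman tables are its own) - it just uses the simplest possible predictor, "the next sample will equal the previous one", to produce the small residuals an entropy coder would then pack:

    def encode_residuals(samples):
        # Predict each sample as the previous one; keep the differences.
        prev, residuals = 0, []
        for s in samples:
            residuals.append(s - prev)   # small integers for smooth audio
            prev = s
        return residuals

    def decode_residuals(residuals):
        # Replay the same prediction and add each residual back: lossless.
        prev, samples = 0, []
        for r in residuals:
            prev += r
            samples.append(prev)
        return samples

    pcm = [0, 30, 55, 70, 72, 60, 35]        # a smooth snippet of a waveform
    res = encode_residuals(pcm)              # [0, 30, 25, 15, 2, -12, -25]
    assert decode_residuals(res) == pcm      # exact round trip

The residuals cluster near zero, so Huffman coding (as in MUSICompress) or Rice packing can store them in far fewer bits than the raw samples.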
A diagram in the abovementioned paper depicts the approach taken by all the
compression algorithms reviewed on this page. The raw signal is approximated by some kind of
"prediction" algorithm, the parameters of which are selected to produce a wave quite similar to the
input waveform. Those parameters are different for each frame (say 256 samples) of audio and are
packed into a minimum number of bits in the output file (or stream, in a real-time application).
Meanwhile, the difference between the "predicted" waveform and the real signal is packed into as
small a number of bits as possible. Often, the "Rice" coding (AKA Rice packing) algorithm is
used, but MUSI-Compress uses Huffman packing instead. Some of the material mentioned below
contains more detailed theoretical descriptions of Rice packing and other algorithms - and I have
my own explanation below.

This approach is common to all the lossless algorithms I know of. (I worked on my own algorithm
which worked on different principles for a while - but it did not work out well. A good
"prediction" system is crucial.) The predictor is replicated in the decoder - and it must work from
prediction parameters and the previously decoded samples. The predicted value is added to the
"error" value to create the final exactly correct value for that sample. Then the prediction
algorithm is run again, based on the newly decoded sample and some previous ones, to predict the
next sample.
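Rice packing itself fits in a few lines. This is a generic sketch of the textbook scheme (real codecs differ in how they pick the parameter k and handle signs): map signed residuals to non-negative integers, then store each as a unary quotient plus k literal low-order bits:

    def zigzag(n):
        # Interleave signs: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
        return (n << 1) if n >= 0 else ((-n << 1) - 1)

    def rice_encode(n, k):
        # Quotient in unary ("1"*q + "0"), then k low-order bits in binary.
        q, r = n >> k, n & ((1 << k) - 1)
        return "1" * q + "0" + format(r, "b").zfill(k)

    for residual in [0, 2, -12, -25]:
        print(residual, rice_encode(zigzag(residual), k=3))
    # 0 -> 0000, 2 -> 0100, -12 -> 110111, -25 -> 1111110001

Small residuals cost only k+1 bits, while rare large ones pay a slowly growing unary penalty - which is exactly why a good predictor, one that keeps the residuals small, matters so much.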
WavArc - Dennis Lee
Homepage: Unknown - but the program is available here: wavarc/
email: Unknown.
Operating systems: MS-DOS. (i.e., in an MS-DOS window in Win9.x.)
Versions and price: Free.
Source code available?: No.
GUI / command line: Command line.
Notable features: Potentially very high compression. Multiple files stored in one archive.
Real-time decoder: No.
Other features: High compression ratio. Selectable speed/compression trade-off. Compresses WAV files and stores all other files without compression in the archive.
Theory of operation: ?
Options used for tests: "a -c4" and "a -c5".
Dennis Lee's Waveform Archiver is a freeware command-line program to run under MS-DOS or in a Windows command line mode. It can store
multiple .WAV files in a single archive.
Dennis Lee's web page: http://www.ecf.utoronto.ca/~denlee/wavarc.htm disappeared sometime in 1999. Emails to that site (University of Toronto)
enquiring about him have not resulted in any replies.
No source code was available, and there was no mention of what algorithms are used. This program was made available on a low-key basis - but its
performance in "compression level 5" mode significantly exceeds the alternatives that I was aware of when I did my first rounds of tests in late
1998. When compressing, I found that the report it gives on screen about the percentage file size is sometimes completely wrong. I tested version
1.1 of 1 August 1997.
Dennis told me by email on 4 December 1998 that he had done a lot of work on version 2.0 of Waveform Archiver - but is not sure when it will be
finished:
Shortly before completing WA v2.0 I became involved with another project
full-time, and haven't been able to work on WA since. WA v2.0 has some
significant improvements including:
1) Faster at all compression settings.
2) -c6 codec (slightly more optimal than v1.1's -c5).
3) A new -c5 that's much faster (about half the speed of -c4).
This new codec is both backward and forward compatible with v1.1's -c5.
4) Lossless compression for non-audio files (provided by zlib).
5) Several bug fixes including the incorrect compression status on
large files.
I hope to continue work on WA when I find the time.
WavArc began life in 1994, as explained in /wavarc/WA.TXT . I would be very glad to hear of Dennis Lee. I did an extensive web search in
November 2000, but found no leads.
Pegasus SPS - jpg.com
Homepage: http://www.jpg.com/products/sound.html
email: [email protected]
Operating systems: Win9x.
Versions and price: Full version USD$39.95. Evaluation version limited to 10 compressions.
Source code available?: No.
GUI / command line: GUI.
Notable features: WAV files, 8 and 16 bit, stereo and mono.
Real-time decoder: No.
Other features: Batch compression in paid-for version.
Theory of operation: http://www.jpg.com/imagetech_els.htm for generalised ELS algorithm.
Options used for tests: There are no options.
In 1997, Krishna Software Inc. (http://www.krishnasoft.com) wrote a lossless audio compression program for Windows. The program has some
limited audio editing capabilities and several compression modes, but the most significant lossless compression algorithm - ELS - comes from
Pegasus Imaging (http://www.jpg.com), who seem to have developed it initially for JPG image compression. The SPS program is available from
both companies.

Pegasus-SPS provides four lossless compression modes and has the ability to truncate a specified number of bits for lossy compression. I used the
default and highest performance "ELS-Ultra" algorithm for my tests. This was reasonably fast and produced results a fraction of a percent better
than the next two best performing algorithms. When the compression function is working, this program seems to use virtually all the CPU cycles - at
least under Windows 98 - so don't plan on doing much else with your computer!

Some information on ELS - Entropy Logarithmic Scale - encoding is at: http://www.pegasusimaging.com/imagetech_els.htm which leads to a .PDF
file with a scanned version of a 47-page 1996 paper explaining the algorithm: "A Rapid Entropy-Coding Algorithm" by Wm. Douglas Withers.
I tested version 1.00 of Pegasus-SPS.
Sonarc 2.1i - Richard P. Sprague
Homepage: None.
email: None.
Operating systems: MS-DOS.
Versions and price: Was shareware, but author is uncontactable.
Source code available?: No.
GUI / command line: Command line.
Real-time decoder: No.
Theory of operation: ?
Options used for tests: "-x -o0" = use floating point and, for each frame, search for the best predictor order.
Sonarc, by Richard P. Sprague was developed up until 1994. His email address was "[email protected]" but in December 1998, this
address was no longer valid. Sonarc has quite good compression rates, but it is very slow indeed.
There is an entry for it in the speech compression FAQ http://www.itl.atr.co.jp/comp.speech/ at:
http://www.itl.atr.co.jp/comp.speech/Section3/Software/sonarc.html . Sonarc is also listed in Jeff Gilchrist's magnificent MS-DOS/Windows
"Archive Comparison Test" site http://web.act.by.net/~act/ at: http://web.act.by.net/~act/act-indx.html which gives an FTP site for the program:
ftp://ftp.elf.stuba.sk/pub/pc/pack/snrc21i.zip . This is the program I tested: version 2.1i. You can get a copy of it here: sonarc/ . The programs are
MS-DOS executables, dated 27 June 1994. The documentation file, with the shareware arrangements and author's contact details is here:
sonarc/sonarc.txt .
LPAC - Tilman Liebchen
Homepage: http://www-ft.ee.tu-berlin.de/~liebchen/lpac.html
email: [email protected]
Operating systems: Win9x/ME/NT/2000, Linux, Solaris.
Versions and price: Free.
Source code available?: Tilman Liebchen writes that he is contemplating some form of availability, and that "the LPAC codec DLL can be used by anyone for their own programs. I do not supply special documentation for the DLL, but any potential user can contact me."
GUI / command line: GUI and command line. In the future (Dec 2000) the LPAC codec DLL will operate as part of the Exact Audio Copy CD ripper.
Notable features: 8, 16, 20 and 24 bit support.
Real-time decoder: Yes, and a WinAmp plug-in.
Other features: High compression ratio. CRC (Cyclic Redundancy Check) for verifying proper decompression.
Theory of operation: Tilman Liebchen writes "adaptive prediction followed by entropy coding".
Options used for tests: Extra High Compression, Joint Stereo and no Random Access.
Tilman Liebchen is continuing to actively develop LPAC, the successor to LTAC which I tested in 1998.
The results shown here are for the "Extra High Compression" option with "Joint Stereo" and no "Random Access". The Random Access is to aid
seeking in a real-time player, and adds around 1% to the file size. But see the sizes9.txt for the actual file sizes. In all cases not using the "Joint
Stereo" option produced files of the same size or larger.
On 17 January, Tilman wrote:
The new LPAC Codec 3.0 has just been released. It offers significantly
improved compression ("medium" compression is now better than "extra
high" compression was before) together with increased speed (approx.
factor 1.5 - 2). I would be lucky if you could test the new codec and
put the results on your page.
I haven't tested it yet.
WavPack 3.1 - David Bryant
Homepage: http://www.wavpack.com
email: [email protected]
Operating systems: MS-DOS.
Versions and price: Free. Versions 3.1 and 3.6 Beta.
Source code available?: No.
GUI / command line: Command line.
Notable features: High speed.
Real-time decoder: WinAmp plugin currently being developed.
Other features: Compresses non-.WAV files, including Adaptec .CIF files for an entire CD. Nice small distribution file < 82 kbytes.
Theory of operation: http://www.wavpack.com/technical.htm
Options used for tests: No options affected the lossless mode.
I tested version 3.6 Beta of WavPack, using the -h option for the high compression mode which Dave Bryant added in 3.6. WavPack is freely
available, without source code but with a good explanation of the compression algorithm. It is intended as a fast compressor with good compression
ratios for .wav files. Compression and decompression rates of 8 times faster than audio are achieved on a Pentium 300 MHz machine. The
algorithm makes use of the typical correlation which exists between left and right channels in a stereo file. Two additional features are lossless
compression of any file, with high compression for those containing audio (such as CD-R image files), and selectable lossy compression.
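The left/right correlation trick that WavPack (and LPAC's "Joint Stereo", and RKAU's default stereo handling) relies on can be sketched generically - this is the common mid/side idea, not any one program's exact transform:

    def to_mid_side(left, right):
        # mid = rounded-down average, side = difference.
        mid  = [(l + r) >> 1 for l, r in zip(left, right)]
        side = [l - r for l, r in zip(left, right)]
        return mid, side

    def from_mid_side(mid, side):
        # Exact integer inverse: left = mid + ceil(side / 2).
        left  = [m + ((s + 1) >> 1) for m, s in zip(mid, side)]
        right = [l - s for l, s in zip(left, side)]
        return left, right

    L = [100, 101, 99, 98]                    # highly correlated channels
    R = [101, 101, 98, 99]
    m, s = to_mid_side(L, R)                  # side = [-1, 0, 1, -1]: tiny
    assert from_mid_side(m, s) == (L, R)      # lossless round trip

This also explains the pink noise rows in the table above: on 12PM, whose two channels are identical, the channel-decorrelating programs get close to 43% - about half the roughly 86% they manage on 11PS's independent channels - because the side signal collapses to zero.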
AudioZip - Lin Xiao, Centre for Signal Processing, Nanyang Technological University, Singapore
Homepage: http://www.csp.ntu.edu.sg:8000/MMS/MMCProjects.htm
email: Lin Xiao (Dr) [email protected]
Operating systems: Win9x.
Versions and price: Free.
Source code available?: No.
GUI / command line: GUI.
Notable features: High compression ratio.
Real-time decoder: No.
Theory of operation: "LPC with Rice encoding."
Options used for tests: Maximum.
The current version of AudioZip is rather slow - at least in the Maximum compression mode, which I used in these tests. Its user interface is quite primitive; for instance, it is necessary to manually enter the name of each compressed file. However, Lin Xiao writes that he and his team are working to make AudioZip faster and more user friendly. See the note below in the RKAU section on how AudioZip and RKAU achieved the highest compression ratios for the pink noise file.
Monkey 3.7 - 3.81 Matthew T. Ashland
Homepage
http://www.monkeysaudio.com
email
[email protected]
Operating systems
Win9x.
Versions and price
Free.
Source code available?
No, but author could be tempted. Programming
details for the DLLs are provided, along with the
source for the plugin realtime players (which use the
DLL for decoding).
GUI / command line
GUI and command line. Encoder can be used by
Exact Audio Copy CD ripper.
Notable features
High speed and high compression.
Real-time decoder
Standalone program and plugins for Winamp and
Media Jukebox.
Other features
CRC checking. Includes ID3 tags as used in MP3 to
convey information about the track. Can be used as
front end for other compressors, including WavPack,
Shorten and RKAU. Compresses WAV files, mono
or stereo, 8, 16 or 24 bits.
Theory of operation
Adaptive predictor followed by Rice coding.
http://www.monkeysaudio.com/theory.html
Options used for tests
Command line version -c4000.
I tested the 3.81 Beta 1 command-line-only version of Monkey's Audio, using the -c4000 option for highest compression. A separate renamer program is handy for changing the extension of file names - it can recurse into sub-directories.
RKAU Malcolm Taylor
Homepage
http://rksoft.virtualave.net/rkau.html
email
[email protected]
Operating systems
Win9x.
Versions and price
Free.
Source code available?
No.
GUI / command line
Command line. (But Monkeys Audio can be a GUI
front end.)
Notable features
High compression.
Real-time decoder
Winamp plugin.
Other features
Selectable lossy compression modes.
Can include real-time seek information for use with
realtime players.
Theory of operation
?
Options used for tests
-t- -l2
-t- -l2 -s-
-t- -l3
-t- -l3 -s-
I tested version 1.07, with the option "-t-" to not include real-time tags. Malcolm told me that the highest compression option "-l3" sometimes produced compression lower than "-l2", so I tried both options. Likewise, the program's default behaviour of assuming there is something in common between the two stereo channels does not always lead to the best compression, so I tried RKAU with and without the -s- option, giving me four sets of file sizes. See analysis-rkau-107.html for these results and the "best-of" set chosen from the four options. The best-of set is reproduced below. These are the figures I have used in the main comparison table.
File   With or without -s- (to      Either      Best of RKAU 1.07 -t-, with or       %
       disable separate stereo      -L2 or      without -s- and at either -l2
       channels)                    -L3         or -l3 (bytes)
00HI   -s-                          L2          18,610,940                           33.28
01CE                                L3          69,471,291                           39.18
02BE                                L3          17,001,008                           39.01
03CC                                L3          12,879,953                           52.80
04SL   -s-                          L3          28,109,211                           33.06
05BM                                L3          39,382,534                           66.60
06EB                                L2          28,985,245                           65.88
07BI                                L3          50,598,306                           56.95
08KY                                L3          24,435,044                           68.07
09SR                                L2          31,464,353                           43.89
10SI   -s-                          L2          44,015,255                           49.23
11PS   -s-                          L3          9,048,056                            85.49
12PM                                L2          4,524,830                            42.75

Average size 00 - 09: 49.812%     Average ratio: 2.00755
Note that the average file size and compression ratio are based on the best achievable after compressing each file in four ways and manually choosing the smallest file size - something which is not likely to be practical for everyday use. It shows that RKAU has potentially better compression ratios than the other programs for the files I tested, but that at present the program is not smart enough to choose the best approach for each file.
The best results with any one option were for "-l2" (with -t- and without -s-). The average file size was 50.132% and the average compression ratio was 1.99475.
Malcolm suggests that other programs would benefit from correctly choosing whether to treat the stereo channels separately or together (I guess compressing L+R as one channel and L-R as the other, presumably quieter, channel - see the sketch after the table below). You can see from the results for the stereo and "mono" (both stereo channels the same) files which programs are taking notice of stereo correlations. RKAU does this by default, but sometimes it would be better if it did not. Here are the options for each program:
Program             Default - does it recognise      Option to control    Comments
                    correlation between channels?    Joint Stereo?
Shorten             No.
WaveZip (Gadget     No.
Labs)
WaveArc             Yes.                             No.
Pegasus SPS         Yes.                             No.
Sonarc              No.
LPAC                Yes.                                                  Joint Stereo is on by default.
WavPack             Yes.                             No.
AudioZip            To some extent.                  No.
Monkey 3.81 beta    Yes.                             No.                  Best to use Joint Stereo - the results
                                                                          are the same as or better than
                                                                          without it.
RKAU                Yes.                             Yes: -s-             -s- is sometimes better. -l2 is
                                                                          sometimes better than -l3.
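As a minimal illustration of the idea (my sketch in Python, using the trivially reversible L, (R-L) variant mentioned in the FLAC entry below, rather than any particular program's method):

    def lr_to_left_side(left, right):
        # Keep the left channel; store the usually-quieter difference channel.
        return left, [r - l for l, r in zip(left, right)]

    def left_side_to_lr(left, side):
        # Exactly invertible integer arithmetic, so the scheme stays lossless.
        return left, [l + s for l, s in zip(left, side)]

    L = [100, 102, 101]
    R = [98, 101, 103]
    left, side = lr_to_left_side(L, R)
    assert left_side_to_lr(left, side) == (L, R)   # round trip is exact

When the two channels are highly correlated, the difference channel contains much smaller numbers than either original channel, which is exactly what the later coding stages want.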
I have not counted the pink-noise results towards the average compression percentages/ratios, because they do not represent musical signals. Nonetheless, it is interesting to see which algorithms achieve the highest compression ratio for the stereo pink noise file. This signal has no musical pattern in terms of spectrum or sample-to-sample correlation: it is simply white noise (each sample completely random, giving a flat frequency response) filtered to give the pink noise spectrum of -3dB per octave. (For more on pink noise, see: dsp/pink-noise/ .) That RKAU and AudioZip compress it best indicates that their predictors are highly attuned to the signal's statistics, even when there is no musical structure to exploit.
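As an aside, here is a rough sketch (not the method actually used to make the test file) of generating pink noise by shaping white noise to -3dB per octave with an FFT:

    import numpy as np

    def pink_noise(n):
        # White noise has a flat spectrum; dividing each spectral amplitude
        # by sqrt(f) gives power proportional to 1/f, i.e. -3 dB per octave.
        rng = np.random.default_rng(0)
        spectrum = np.fft.rfft(rng.standard_normal(n))
        f = np.fft.rfftfreq(n)
        f[0] = 1.0   # leave the DC bin alone rather than dividing by zero
        return np.fft.irfft(spectrum / np.sqrt(f), n)

    samples = pink_noise(65536)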
FLAC Josh Coalson
Homepage
http://flac.sourceforge.net/
http://sourceforge.net/projects/flac
email
[email protected]
Operating systems
Win9x, Linux, Solaris - any Unix.
Versions and price
Free.
Source code available?
Yes! GPL and LGPL. Written in C.
GUI / command line
Command line.
Notable features
Open source, patent-free format and source code for
codec.
Real-time decoder
Winamp and XMMS plugins.
Other features
Uses stereo interchannel correlation in several
possible ways, including variants such as L, (R-L).
Several predictor algorithms and two approaches to
Rice coding of the residuals. All these can be used
optimally per block. The current version of the codec
uses fixed blocksizes, but the format enables them to
be varied dynamically. Provision for metadata, such
as ID3 tags.
Theory of operation
http://flac.sourceforge.net/format.html
Options used for tests
Not tested yet.
FLAC (Free Lossless Audio Coder) was released in an Alpha form on 10 December 2000. I have not yet tested it. There are a number of parameters which may affect compression ratios, so I will try a few combinations.
Please provide feedback!
Please let me know your suggestions for improving this page, particularly corrections to any problems with my descriptions of the programs tested.
I can't keep linking to every paper or page regarding lossless audio compression, but I would like to link to the major ones. Mark Nelson's compression link farm, below, is likely to be a more complete set of links.
If you like this page, please consider writing to Dr Lin Xiao [email protected], who organised the funding for my work on it in November-December 2000.
With about 150 visits a day, this page is one of the most popular on my website.
Programs I did not test or report on fully
Two programs I tested briefly but have not reported on because their performance was not as good as any of those listed above:
❍ ADA, an MS-DOS command line program: http://wwwcip.informatik.uni-erlangen.de/~hovolk/ada/adaframe.htm .
❍ A simple sample-to-sample diff program followed by zip or gzip: ftp://mustec.bgsu.edu/pub/linux/audiozip .
RAR (AKA WinRAR) is a general purpose archiver with a "Multimedia" option: http://www.rarsoft.com . I tested version 2.80 Beta 1 with the "Best" and "Multimedia" options - the results are in the spreadsheet. I have not added it to the table because, with a few exceptions, its compression ratios were worse than any of the programs listed in the table. RAR's average compression size was 61.219%, giving a ratio of 1.63347. RAR is a shareware program with an evaluation period and a USD$35 registration fee. It has versions in multiple languages for operating systems including Windows, MS-DOS, Mac, Linux and various other flavours of Unix.
MKT http://home.att.net/~mkw/mkwact/ is a Windows drag-and-drop program with its own lossless compression format, by Michael K. Weise, [email protected] . As with RAR and DAKX, the compression ratios were no better than those already in my table, so I have added them only to the spreadsheet. For MKT 0.97 Beta 1, the average compression size was 70.061%, giving a ratio of 1.42732. MKT can also act as a front-end for encoding with LAME (a highly regarded open-source MP3 encoder/decoder) and losslessly with Shorten, with or without real-time seeking information. In doing so, MKT can apparently recurse and create subdirectories.
Emagic have a lossless compressor Zap for the Macintosh, with decompression on Mac and Windows:
http://www.emagic.de/english/products/software/zap.html . I did not test it because I do not have a Macintosh.
The DAKX system, described more fully below, has a Mac-only shareware version and a Windows 9x version 1.0.
Merging Technology's LRC system has no demonstration program: http://www.merging.com/products/lrc.htm.
Links specific to lossless audio compression
There are many more links regarding data compression of integers of varying lengths at the bottom of this page.
Brian Dipert's lossy and lossless codec project
>>> "There is another system!" - Colossus, in The Forbin Project. <<<
http://www.commvergemag.com/commverge/extras/P178673.htm This site tests several of the lossless codecs (encoder / decoder) tested here. In
December 2000, these included MUSICompress (WaveZip), Shorten, WavPack and RAR. This is an ongoing project and the site lists several other
lossless codecs, including MKT which I had not heard of before. This page has some excellent links to lossy and lossless codec sites!
When I looked at it, this page was corrupt and displayed incorrectly on every browser I tried (Netscape, Mozilla, Opera) apart from Microsoft Internet Explorer.
Search Engines
AltaVista Advanced - http://www.altavista.com/cgi-bin/query?pg=aq&text=yes - returned 1,014 pages in December 2000 for the query:
lossless near (sound or audio) and (compression or "data reduction")
Click here to repeat that search.
Mark Nelson's formidable compression resources and link farm
Author Mark Nelson maintains a fabulous set of pages of links to all matters compression. The index page is:
http://dogma.net/DataCompression/
Some lossless audio programs are listed at:
http://dogma.net/DataCompression/NonCommercialProgs.shtml
Also, check out the links at:
http://dogma.net/DataCompression/Lossless.shtml
Dr Dobb's Compression Forum
http://www.ddj.com/topics/compression/ Mark Nelson's extensive resource at the Web abode of what was known, from 1976, as Dr. Dobb's Journal of Calisthenics and Orthodontia: Running Light Without Overbyte.
Usenet newsgroup comp.compression
The FAQ for Usenet newsgroup comp.compression is at: http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
If you don't have direct access (via NNTP) to a Usenet server which carries comp.compression, you can read the posts via the Web at:
http://www.deja.com/group/comp.compression - and no doubt other such sites. You can post from deja.com too, but it is best to set up a free account with
them first.
Mat Hans' thesis
Mat Hans http://users.ece.gatech.edu/~hans/ has written a magnificent PhD thesis on lossless audio compression. It is in PDF format and is available zipped from his web site. (Actually, PDF is already highly compressed - it's generally better not to compress PDFs with another algorithm.) The thesis looks at both lossy and lossless algorithms. In the lossless area, Hans tests and explores the inner workings of many lossless audio compression algorithms.
DAKX
A new algorithm suitable for lossless or lossy compression of audio and other similar signals has been developed by DAKX in North Carolina:
http://www.dakx.com The algorithm is patented in the US: http://www.delphion.com/details?&pn=US05825830__ . My impression is that it focuses on the
most efficient way of storing the differences between one sample and another. This largely concerns setting a bit-length to hold each diff value in the
output stream, with snappy ways of increasing or decreasing that bit-length. There are Macintosh executables for audio files. One (v1.1) is shareware
for USD$40.
D. A. Kopf, DAKX's developer, wrote to me and asked me to link to and test a Windows v1.0 version of DAKX. This is not as developed as the later Mac version, but it has the same compression performance. The program, at http://dakx.com/aps/daxwav32.zip , is free to use. His email dakx-1.txt contains a brief description of DAKX's algorithm.
I have not listed DAKX's compression performance in the table above, since it does not in general surpass any of those listed - and the table is already rather wide. I have added its results to the spreadsheet. Its average file size and compression ratio are 61.444% and 1.62749 respectively. These tests were done in 16 bit mode with the slider on the right at the highest position for "M" maximum compression. D. A. Kopf wrote to me that the Windows version's maximum compression is not quite as good as the Macintosh version's. Compressing one section of a test file (03CC.wav) he found that the Windows and Macintosh file sizes were 57.03% and 56.50% respectively.
Quite apart from its use as a compressor, this Windows version 1.0 of DAKX (and, I imagine, the later Mac version) has a most fascinating and educational function: it can play back WAV files with truncation from 16 bits (no truncation) down to 2 bits, with the truncation selectable by a slider in real-time!! Be sure to increase the size of the window by dragging the bottom right corner - this makes the file selection business easier. Click on the Play button and then double-click on the file you want to listen to. This is a handy test tool for those who swear they need 24 bits! In many listening situations, it is hard to hear 14 bit truncation. This playback function makes a Jim Dandy fuzzbox too!
DVD-AUDIO and Meridian Lossless Packing (MLP)
I have not been following this at all, due to shortage of time, my belief that there's nothing wrong with 44.1 kHz 16 bit digital audio when it is properly
done, and my interest in binaural sound, rather than multi-channel surround sound. (Nonetheless, for mastering, 20 or even 24 bit accuracy would be
handy.)
Click here to search Alta Vista Advanced text mode for material on "meridian lossless packing".
❍ Meridian's site: http://www.meridian.co.uk/m_news.htm .
❍ A paper called "Coding High Quality Digital Audio", available in PDF format from http://www.meridian.co.uk/ara/jas98.htm , has some interesting arguments about why 16 bit 44.1 kHz audio is supposedly not good enough, and mentions lossless compression for 96 kHz 24 bit digital audio for DVD discs. This is part of the ARA (Acoustic Renaissance for Audio) project.
❍ Frequently Asked Questions about Surround Sound: http://www.surroundassociates.com/fqmain.html . More than 12 hours of 44.1 kHz 16 bit stereo with MLP will apparently be possible with DVD-A. It is extraordinary that there are still plans to add watermarking noise to material released on this potentially impeccable format!
❍ Dolby's Frequently Asked Questions about Dolby Digital is one of the files available at: http://www.dolby.com/tech/ .
Various other links
A large list of MS-DOS / Windows archiving programs, including some real antiques, is maintained by Jeff Gilchrist at his ACT Archive Compression
Test site: http://web.act.by.net/~act/ . There is a test there on lossless compression of 8 bit audio files, but I am really only interested in 16 bit stereo files,
which are a very different thing.
Compression Pointers from Stuart Inglis: http://www.internz.com/compression-pointers.html .
Seneschal http://seneschal.net/infoannex.htm?external has an excellent set of links and papers regarding high sample-rate and high bit resolution audio, including the new DVD audio formats. One article there, by Seneschal's Oliver Masciarotte from the July 1998 Mix magazine, discusses various audio formats for DVD (DVD-Audio, which was then being finalised) and potential lossless compression based on the ARA proposal. A June 1999 article interviews a Dolby engineer about Meridian Lossless Packing as used in DVD-Audio.
Not a lossless algorithm, but a relatively low loss system used for broadcasting is Audio Processing Technology's 4:1 fixed rate apt-X system.
http://www.aptx.com This is a real-time, high quality, low delay system (2.76ms for encode and decode combined) - which does not rely on
psycho-acoustic models etc. The FAQ describes it:
ADPCM as used by APT for its apt-X 4:1 compression algorithm
takes the digital signal and breaks it into four frequency sub bands
by means of a QMF digital filter. Each of these sub bands is
subsequently coded by means of predictive analysis; the coder predicts
what the next digital sample in the audio signal will be and subtracts this
prediction from the actual sample. The resulting, small error signal is
transmitted to the decoder which then adds back in the prediction from
identical tables stored in the decoder. NO psycho-acoustic auditory
mask is used to throw away any of the original audio signal resulting in
a near lossless compression system.
In March 2001, a chap from APT wrote to me that the algorithm is available on a demo basis as a Windows DLL.
My work, an explanation of Rice Coding and an exploration of alternative coding strategies for generally
short variable length integers
In November 1997 I spent some time pursuing an old interest - lossless compression of audio signals. I tried sending, for instance, every 32nd
sample, and then sending those in between - sample 16, 48 etc. - as differences from the interpolation of the preceding and following 32nd sample.
Then I would do the 8th samples which were missing, and then the 4th and then all the odd samples on the same basis.
This constitutes using interpolation between already transmitted samples as a predictor - and the results are not particularly promising.
I also experimented with using the highest quality MP3 encoding as a predictor, but even using LAME at 256 kbps, the differences between the original and the decoded signal (once aligned for shifts in the output waveform's timing) were quite large. The difference was generally broadband noise, with its volume depending on the volume of the input signal. This does not look like a promising approach either.
In December I figured out an improvement to traditional "101 => 000001" Rice encoding. It turned out to be a generally sub-optimal variation on the Elias Gamma code.
Here is a quick description of Rice and Elias Gamma - and a link to an excellent paper on these and other functionally similar codes.
Rice Coding, AKA Rice Packing, Elias Gamma codes and other approaches
In a lossless audio compression program, a major task is to store a large body of signed integers of varying lengths. These are the "error" values for each sample: they tell the decoder the difference between its predictor value (based on some variable algorithm working on previously decoded samples) and the true value of the sample.
The coding methods here all relate to storing variable length integers whose distribution is weighted towards low values rather than high ones.
I have not read the original paper:
R. F. Rice, "Some practical universal noiseless coding techniques" Tech Rep. JPL-79-22, Jet Propulsion Laboratory, Pasadena, CA, March
1979.
I have read various not-so-great explanations of Rice coding. There seem to be several related algorithms which come under this heading. Initially, there are Golomb codes, as per the paper:
S.W. Golomb, "Run-Length Encodings", IEEE Trans. Info. Theory, Vol 12, pp 399-401, 1966.
Golomb codes are a generalised approach to dividing a number into two parts, encoding one part directly and the other part - the one which varies more in length - in some other way.
Rice codes are a development of Golomb codes. Here I will deal only with Rice codes of order k = 1, which is the same (I think) as Golomb codes of order m = 1.
The basic principle of Rice coding for k = 1 is very simple: To code a number N, send N zeros followed by a one.
To send 0 (0 binary) with Rice coding, the output is 1.
To send 1 (1 binary) with Rice coding, the output is 01.
To send 4 (100 binary) with Rice coding, the output is 00001.
There are other Rice codes for k = 2 and higher values, but they are not so straightforward and suffer from the problem of involving three, four or
more bits even when sending a simple 0.
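As a minimal sketch in Python (my illustration, not code from any of the programs tested):

    def rice_k1_encode(n):
        # Rice code, k = 1: to code a non-negative integer N,
        # send N zeros followed by a one.
        return "0" * n + "1"

    def rice_k1_decode(bits):
        # The inverse: the value is the number of zeros before the one.
        return bits.index("1")

    assert rice_k1_encode(0) == "1"
    assert rice_k1_encode(1) == "01"
    assert rice_k1_encode(4) == "00001"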
Terminology is a bit of a problem, since there is a larger, more complex operation used as part of a lossless audio compression algorithm (described below) which is often referred to as "Rice" coding or packing, but technically, Rice coding (for k = 1) is nothing more than the above. I will refer to them collectively as Rice - but I think there should be a separate term for the more complex algorithm described below.
The following discusses how Rice (and later some other algorithms) is used as part of a larger operation on multiple signed binary numbers - the
"error" values in a lossless compressor algorithm. The "error" values are to be stored in the file in as compact form as possible. They will be used
by the decoder to arrive at the final value for each output sample, by adding this "error" value to the output of the predictor algorithm, which is
operating from previously decoded samples.
These "error" numbers are generally small, but quite a few of them are large, due to the complex and unpredictable nature of sound. Huffman
coding can also be used, and is used by some of the programs tested here.
These error numbers are typically twos-complement (signed) 16 bit integer, but their values are often small, say in the range of +/- a hundred or so,
and so can be shortened to a 9 bit twos-complement integer. Some may be much larger - say +/- several thousand. Few are the full 16 bits, but
some may be. How do you compress this ragged assortment of numbers?
For these signed numbers, there is a preliminary step of converting to positive integers with a right-affixed sign bit. Let's use some examples, which we will consider as a frame of 8 "error words" to be compressed.
Decimal   16 bit signed integer   Sign   Integer       Integer with sign at right
+20       0000 0000 0001 0100     +      1 0100        101000
+70       0000 0000 0100 0110     +      100 0110      10001100
-5        1111 1111 1111 1011     -      101           1011
+129      0000 0000 1000 0001     +      1000 0001     100000010
-300      1111 1110 1101 0100     -      1 0010 1100   1001011001
+12       0000 0000 0000 1100     +      1100          11000
+510      0000 0001 1111 1110     +      1 1111 1110   1111111100
-31       1111 1111 1110 0001     -      1 1111        111111
The decimal and spaced-out 16 bit signed binary integer representations are to the left. Following those are the numbers converted into a sign and an unsigned integer, with leading zeroes removed. Then those integers have their corresponding sign bit tacked on to the right end, with negative being a "1".
This right column of binary numbers is what we want to store in a compact form. The big problem is that their lengths vary dramatically.
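A minimal sketch of this conversion (my own notation; note the table has no zero sample - a zero would simply become "00" here):

    def fold_sign(n):
        # Magnitude bits with the sign bit tacked on at the right (1 = negative).
        sign = "1" if n < 0 else "0"
        return format(abs(n), "b") + sign

    for n in (20, 70, -5, 129, -300, 12, 510, -31):
        print(n, fold_sign(n))   # +20 -> 101000, -5 -> 1011, -300 -> 1001011001 ...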
The first part of the Rice algorithm is to decide on a "boundary" column in this set of numbers. To the right of it, the bits are generally an impossible-to-compress mixture of 1s and 0s. To the left, for some samples at least, there are some extra ones and zeros which we need to code as well. For example, by some algorithm (which can be complex and iterative for Rice coding of the numbers on the left) a decision is made to set the boundary so that, in this example, the right-most 6 bits (including the sign bit which has been attached) are regarded as incompressible. These bits will be sent directly to the output file. Also, by some means, the decoder has to be told, via a number in the output file, to expect this six bit wide stream of bits. (The decoder already knows the block length, which is typically fixed, so when it receives 48 bits, in this case of an 8 sample frame, having already been told to expect 6 bits of each sample sent directly, it knows exactly how to use these 48 bits. Typically lossless audio programs use longer frames, say several hundred samples.)
So the body of data we are trying to compress is broken into two sections:

          101000
10        001100
          1011
100       000010
1001      011001
          11000
1111      111100
          111111

or, more pedantically:

          101000
10        001100
          001011
100       000010
1001      011001
          011000
1111      111100
          111111
Sending the right block is straightforward - but how do we efficiently encode the stragglers on the left? (Note, "straggler" is my term!) Here they are, expressed as the shortest possible integers. I have added two columns for their decimal equivalents and for which of the 8 samples each belongs to:
Binary   Decimal   Straggler of sample number
0        0         0
10       2         1
0        0         2
100      4         3
1001     9         4
0        0         5
1111     15        6
0        0         7
As mentioned above, the Rice algorithm (for k = 1) has a simple rule for transmitting these numbers: send as many 0s as the value of the number, then send a 1 to indicate that this is the end of the number. (In other descriptions, a variable number of 1s may be sent with a terminating 0.)
So to send straggler number 0, which has a value of 0, the Rice algorithm sends (i.e. writes to the output file) a single "1".
To send straggler number 1 (value 2) the Rice algorithm sends "001".
Similarly, for straggler number 6 (value 15), the Rice algorithm sends "0000000000000001".
To encode the above 8 stragglers, the Rice algorithm puts out:
10011000010000000001100000000000000011
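Putting the pieces together, here is a sketch (mine, illustrating the scheme described above rather than code from any tested program) which splits the eight folded samples at the 6 bit boundary and Rice-codes the stragglers, reproducing the bit strings above:

    def rice_k1_encode(n):
        return "0" * n + "1"

    # The eight "integer with sign at right" values from the earlier table.
    samples = ["101000", "10001100", "1011", "100000010",
               "1001011001", "11000", "1111111100", "111111"]

    WIDTH = 6   # the boundary: the rightmost 6 bits are sent directly
    direct = [s[-WIDTH:].rjust(WIDTH, "0") for s in samples]
    stragglers = [int(s[:-WIDTH], 2) if len(s) > WIDTH else 0 for s in samples]

    print(stragglers)   # [0, 2, 0, 4, 9, 0, 15, 0]
    print("".join(rice_k1_encode(v) for v in stragglers))
    # -> 10011000010000000001100000000000000011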
Now, seeing strings of 0s in a supposedly compressed data stream does seem a little incongruous. As the Kingdom of Id's fearless knight Sir Rodney was heard to say when, during a ship inspection, he was shown first the head (lavatory) and then the poop deck: "Somehow, I sense redundancy here."
So I "invented" an alternative. Initially I could find no reference to such an improvement on basic Rice (k = 1), but a more extensive web-search
showed that my system was a sub-optimal variation on "Elias Gamma" coding.
More efficient alternatives to Rice Coding - Elias Gamma and other codes
The code I "invented" is not exactly described in any paper I have found, so I will give it a name here: Pod Coding, because it reminds me of peas in a pod.
Rice (k = 1) coding has the following relationship between the number to be coded and the encoded result:
Number
Dec   Binary   Encoded result
0     00000    1
1     00001    01
2     00010    001
3     00011    0001
4     00100    00001
5     00101    000001
6     00110    0000001
7     00111    00000001
8     01000    000000001
9     01001    0000000001
10    01010    00000000001
11    01011    000000000001
12    01100    0000000000001
13    01101    00000000000001
14    01110    000000000000001
15    01111    0000000000000001
16    10000    00000000000000001
17    10001    000000000000000001
18    10010    0000000000000000001
Rice coding makes most sense when most of the numbers to be coded are small. This suits the "exponential" or "Laplacian" distribution of numbers which typically make up the differences in a lossless audio encoding algorithm.
(Speaking very loosely here - where is a reference on the various distributions and a graphic of their curves? Gaussian is a bell-shaped curve. Britannica has some explanations and diagrams: http://www.britannica.com/bcom/eb/article/2/0,5716,115242+8,00.html . The best reference I can find is Tony Robinson's paper: http://svr-www.eng.cam.ac.uk/reports/ajr/TR156/node6.html which includes some very helpful graphs. One page, http://www.bmrc.berkeley.edu/people/smoot/papers/imdsp/node1.html , describes the Laplacian as a double-sided exponential.)
Here is the "pod" approach. The example is probably easier to understand than the formal explanation:
Number
Dec   Binary   Encoded result
0     00000    1
1     00001    01
2     00010    0010
3     00011    0011
4     00100    000100
5     00101    000101
6     00110    000110
7     00111    000111
8     01000    00001000
9     01001    00001001
10    01010    00001010
11    01011    00001011
12    01100    00001100
13    01101    00001101
14    01110    00001110
15    01111    00001111
16    10000    0000010000
17    10001    0000010001
18    10010    0000010010
Here is the formal definition.
For 0, send: 1
For 1, send: 01
For 2 bit numbers 1Z, send: 001Z
For 3 bit numbers 1YZ, send: 0001YZ
For 4 bit numbers 1XYZ, send: 00001XYZ
For 5 bit numbers 1WXYZ, send: 000001WXYZ
There's no problem for the decoder knowing how many bits WXYZ etc. to expect - it is one less than the number of 0s which preceded the 1.
In two instances, "pod" coding produces one more bit than the Rice algorithm. In four instances it produces the same number of bits. In all other cases, it produces fewer bits than the Rice algorithm.
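A minimal sketch of the "Pod" code as defined above:

    def pod_encode(n):
        # "Pod": 0 codes as a single 1; a b-bit number codes as b zeros
        # followed by its b binary digits, i.e. exactly 2b bits.
        if n == 0:
            return "1"
        b = n.bit_length()
        return "0" * b + format(n, "b")

    assert pod_encode(0) == "1"
    assert pod_encode(1) == "01"
    assert pod_encode(4) == "000100"
    assert pod_encode(16) == "0000010000"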
Number
Dec   Binary   Rice bits   Pod bits   Benefit
0     00000    1           1          0
1     00001    2           2          0
2     00010    3           4          -1
3     00011    4           4          0
4     00100    5           6          -1
5     00101    6           6          0
6     00110    7           6          1
7     00111    8           6          2
8     01000    9           8          1
9     01001    10          8          2
10    01010    11          8          3
11    01011    12          8          4
12    01100    13          8          5
13    01101    14          8          6
14    01110    15          8          7
15    01111    16          8          8
16    10000    17          10         7
17    10001    18          10         8
18    10010    19          10         9
19    10011    20          10         10
20    10100    21          10         11
There would be few situations in which the loss of efficiency encoding "2" and "4" would not be compensated by the gains in coding numbers higher
than "5".
In the above example, the outputs of the Rice and "Pod" algorithms respectively would be as follows, first broken with commas and then without:
Rice: 1,001,1,00001,0000000001,1,0000000000000001,1
"Pod": 1,0010,1,000100,00001001,1,00001111,1
Rice: 10011000010000000001100000000000000011
"Pod": 100101000100000010011000011111
I imagine that "Pod" coding (the way it encodes each straggler) would be suitable for sending individual signed and unsigned integer values in a compressed datastream, such as changes from one frame to the next in prediction parameters, or the number of bits (the width of the block to the right of the "boundary") to be sent directly without coding.
"Pod" packing has the fortuitous property that the number of bits produced is exactly twice the number of bits encoded - except for encoding "0", which produces 1 bit.
This should greatly simplify the algorithm in a Rice-like system for deciding where to set the boundary between the bits to be sent unencoded and the stragglers to the left to be encoded with the "Pod" algorithm.
A practical implementation could determine the bit length of each complete "sample" (including its sign bit on the right) as it is accumulated in the array prior to coding. By maintaining a counter for each possible sample length, and incrementing the appropriate counter for each sample created, an array of integers would result showing how many 10 bit numbers there were in the frame, how many 9 bit numbers, and so on.
Since "Pod" coding produces precisely 2 bits for every input bit, and one bit if the input is 0, then as we move the boundary (between "stragglers" to
the left and "direct send" bits to the right) to the right, we can answer the question of when to stop rather easily:
1. Samples which are not stragglers require 1 bit to code with "Pod".
2. No matter how long the straggler, moving the boundary to the right (including the case of a "1" appearing as a straggler in the column
immediately to the left of the boundary) will require 2 bits to encode this extra straggler bit.
3. Moving the boundary to the right saves 8 bits (in this example) by reducing the number of bits to be sent without coding from the block to the
right of the boundary.
The result is a simple trade-off: moving the boundary one column to the right costs 2 bits for each straggler and saves 1 bit for each non-straggler, so the net change for S stragglers among the N samples of the frame is 2S - (N - S).
Therefore, the break-even point is the boundary position where 1/3 of the samples have stragglers and 2/3 don't.
The optimal position for the boundary is the one to the left of the position which would increase the number of stragglers to more than 1/3 of the number of samples in the frame.
With the Rice algorithm, this process is more complex (I think) because the exact number of bits to be sent depends on the exact values of the stragglers, not just their bit-lengths. Some papers give a simple formula based on the values of all the samples to determine the optimal placement of the boundary.
I have not implemented or tested this "Pod" approach.
This approach would provide the most benefit over the "Rice" approach when the stragglers have relatively high values - which means when most of the samples are small and a few are much larger. Since "Pod" outperforms the Rice algorithm when the samples peak at around 16 or more times the maximum value which will fit to the right of the "boundary", and since, at a guess, the average of those samples which do not form stragglers would be around a half to a third of that value, this "Pod" approach would only have benefits for relatively "spiky" values.
To what extent this is true of the error samples to be encoded in a lossless audio algorithm, I can't say, but I would venture that it is proportional to the unpredictability of the music.
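Although I have not implemented it, a rough sketch of how a frame's boundary could be chosen by exhaustively evaluating the exact cost at every candidate width (rather than by the 1/3 rule of thumb) might look like this:

    def pod_len(n):
        return 1 if n == 0 else 2 * n.bit_length()

    def best_boundary(values, max_width=17):
        # values: one frame's folded samples, as integers. Total cost =
        # width bits per sample sent directly, plus the Pod-coded
        # stragglers (the bits shifted off to the left of the boundary).
        def cost(w):
            return len(values) * w + sum(pod_len(v >> w) for v in values)
        return min(range(max_width + 1), key=cost)

    frame = [0b101000, 0b10001100, 0b1011, 0b100000010,
             0b1001011001, 0b11000, 0b1111111100, 0b111111]
    print(best_boundary(frame))   # 6 - the boundary used in the example (78 bits)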
Now it turns out that my "pod" approach is a variation, and not necessarily the best variation, on "Elias Gamma" coding.
By far the best paper I can find on Golomb, Rice, Elias and some more involved and novel codes is by Peter Fenwick ( http://www.cs.auckland.ac.nz/~peter-f/ ):
Punctured Elias Codes for variable-length coding of the integers
Peter Fenwick [email protected]
Technical Report 137 ISSN 1173-3500 5 December 1996
Department of Computer Science, The University of Auckland [email protected]
This is available as a PostScript file at: ftp://ftp.cs.auckland.ac.nz/out/peter-f/TechRep137.ps . I have converted it into a PDF file here: TechRep137.pdf . (Peter Fenwick wrote to me that this is fine.)
Peter Fenwick's purpose coincides closely with our interest - how to efficiently encode integers of varying lengths, when most of the integers are
small.
He refers to a paper:
P. Elias, Universal Codeword Sets and Representations of the Integers, IEEE Trans. Info. Theory, Vol IT 21, No 2, pp 194-203,
Mar 1975.
There is an abstract for this at: http://galaxy.ucsd.edu/welcome.htm but no PDF.
Peter Fenwick's paper describes:
http://www.firstpr.com.au/audiocomp/lossless/ (32 of 37) [1/4/2002 10:59:02 AM]
Lossless audio compression
● Golomb and Rice codes of various orders.
● Elias gamma codes.
● Elias delta codes.
● Elias omega codes, which are comparable to Even-Rodeh codes.
● Start-Step-Stop codes.
● Ternary comma codes.
● His new "punctured" code.
● Comparisons of all the above for different sizes of integer and for two probability density functions of actual files in a text compressor.
● A modified ternary comma code with slight improvement for low numbers and less efficiency for numbers higher than 15.
● Variable radix gamma codes - a special, self-extending case of Start-Step-Stop codes.
He describes the Elias Gamma (actually the Gamma symbol with a prime - so maybe Elias Gamma Prime) code as:
Number
Dec   Binary   Encoded result ("Codeword")
1     00001    1
2     00010    010
3     00011    011
4     00100    00100
5     00101    00101
6     00110    00110
7     00111    00111
8     01000    0001000
Note that this does not include an encoding for zero.
My "pod" approach simply extends the above code with a 0 to the left of each of the above codewords, and a "1" for encoding zero.
This is not how Elias Gamma is normally extended to cover 0. The usual approach is to use the same pattern, but start at 0 rather than 1 for the number range it encodes. This is known as "biased Elias Gamma". Here are the two codes alongside each other, with their benefits over standard Rice coding:
Number
Dec   Binary   "Pod"        Biased Elias   "Pod"     Biased Elias
                            Gamma          benefit   Gamma benefit
0     00000    1            1              0         0
1     00001    01           010            0         -1
2     00010    0010         011            -1        0
3     00011    0011         00100          0         -1
4     00100    000100       00101          -1        0
5     00101    000101       00110          0         1
6     00110    000110       00111          1         2
7     00111    000111       0001000        2         1
8     01000    00001000     0001001        1         2
9     01001    00001001     0001010        2         3
10    01010    00001010     0001011        3         4
11    01011    00001011     0001100        4         5
12    01100    00001100     0001101        5         6
13    01101    00001101     0001110        6         7
14    01110    00001110     0001111        7         8
15    01111    00001111     000010000      8         7
16    10000    0000010000   000010001      7         8
17    10001    0000010001   000010010      8         9
18    10010    0000010010   000010011      9         10
Standard Rice coding is clearly the best approach when most of the numbers to be encoded are small.
From the example above, here are the "stragglers" encoded with Rice, my "Pod" extension to Elias Gamma, and the more usual approach: Biased
Elias Gamma:
Rice:               10011000010000000001100000000000000011
"Pod":              100101000100000010011000011111
Biased Elias Gamma: 1011100101000101010000100001
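For completeness, a sketch of Biased Elias Gamma, which reproduces the third bit string above:

    def elias_gamma(n):
        # Elias Gamma for n >= 1: (b - 1) zeros, then n's b binary digits.
        b = n.bit_length()
        return "0" * (b - 1) + format(n, "b")

    def biased_elias_gamma(n):
        # Bias the input by one so that zero becomes codable.
        return elias_gamma(n + 1)

    stragglers = [0, 2, 0, 4, 9, 0, 15, 0]
    print("".join(biased_elias_gamma(v) for v in stragglers))
    # -> 1011100101000101010000100001  (28 bits)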
There are many potential Start-Step-Stop codes - please read the paper for a full explanation. The reference for these is:
E.R. Fiala, D.H. Greene, Data Compression with Finite Windows, Comm. ACM, Vol 32, No 4, pp 490-505, April 1989.
One such code, with an open-ended arrangement so it can be used for arbitrarily large numbers, rather than stopping at 679 as in the paper's Table 5, is what I would call {3, 2, N} Start-Step-Stop coding, where N is some high number to set a limit to the system.
Number range to be coded   Codeword
0 - 7                      0xxx            ( 4 bits total, 1 bit prefix + 3 bits data)
8 - 39                     10xxxxx         ( 7 bits total, 2 bit prefix + 5 bits data)
40 - 167                   110xxxxxxx      (10 bits total, 3 bit prefix + 7 bits data)
168 - 679                  1110xxxxxxxxx   (13 bits total, 4 bit prefix + 9 bits data) etc.
If it was desired to limit the system to 679 (or perhaps to limit it to a slightly lower number and use the last few values as an escape code for the rare occasion of encoding anything higher), then the last line would be as it is in the paper:
168 - 679                  111xxxxxxxxx    (12 bits total, 3 bit prefix + 9 bits data)
This gives excellent coding efficiency for larger numbers, but at the expense of smaller values. In lossless coding, our scheme will very often be
encoding 0, so the above system does not look promising.
An alternative would be {2, 2, N} Start-Step-Stop:
0 - 3                      0xx             ( 3 bits total, 1 bit prefix + 2 bits data)
4 - 19                     10xxxx          ( 6 bits total, 2 bit prefix + 4 bits data)
20 - 83                    110xxxxxx       ( 9 bits total, 3 bit prefix + 6 bits data)
84 - 339                   1110xxxxxxxx    (12 bits total, 4 bit prefix + 8 bits data) etc.
Or {1, 2, N} Start-Step-Stop has another curve, favouring lower values still:
0 - 1                      0x              ( 2 bits total, 1 bit prefix + 1 bit data)
2 - 9                      10xxx           ( 5 bits total, 2 bit prefix + 3 bits data)
10 - 41                    110xxxxx        ( 8 bits total, 3 bit prefix + 5 bits data)
42 - 169                   1110xxxxxxx     (11 bits total, 4 bit prefix + 7 bits data) etc.
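A sketch (my generalisation from the tables above) of an open-ended {start, step, N} encoder:

    def sss_encode(n, start=3, step=2):
        # k ones and a terminating zero select a bucket; the data field is
        # start + k*step bits wide and holds the offset into that bucket.
        base, width, prefix = 0, start, ""
        while n >= base + (1 << width):
            base += 1 << width
            width += step
            prefix += "1"
        return prefix + "0" + format(n - base, "0%db" % width)

    assert sss_encode(0) == "0000"          # 0 - 7: 4 bits
    assert sss_encode(8) == "1000000"       # 8 - 39: 7 bits
    assert sss_encode(40) == "1100000000"   # 40 - 167: 10 bits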
Peter Fenwick describes new codes in which 0's are interspersed after 1's to indicate that the (reversed) number is not finished yet. These
"punctured" codes take a bit of getting used to, but are very efficient at higher values. The bit length at higher values approximates "1.5 log N bits,
in comparison to 2 log N bits for the Elias codes" (logs base 2).
There is a ragged pattern of code-word lengths.
Number
Dec   Binary   P1          P2
0     00000    0           01
1     00001    101         001
2     00010    1001        1011
3     00011    11011       0001
4     00100    10001       10101
5     00101    110101      10011
6     00110    110011      110111
7     00111    1110111     00001
8     01000    1000001     101001
9     01001    1101001     100101
10    01010    1100101     1101101
11    01011    11101101    100011
12    01100    1100011     1101011
13    01101    11101011    1100111
14    01110    11100111    11101111
15    01111    111101111   000001
16    10000    1000001     1010001
The capacity of these various codes to help with coding the "stragglers" in a lossless audio compression algorithm depends on many factors I can't be sure of at present. Those codes which take more than one bit to code 0 are clearly at a disadvantage, since with the boundary set optimally for Rice or "Pod"/Biased Elias Gamma, most (2/3 or more) of the numbers will be 0.
However, since these codes are highly efficient at coding larger integers, the "cost" of having longer stragglers is much less. Standard Rice coding of stragglers more than about 3 bits long can clearly be improved upon with Biased Elias Gamma, which uses about 2 bits per bit of straggler to be coded. But Peter Fenwick's "punctured" codes use only about 1.5 bits for longer values. This should enable the "boundary" separating the stragglers from the bits to be sent without compression to be moved further to the right, so reducing the number of bits to be sent uncompressed.
Some other links to sites concerning coding techniques for variable length integers:
● http://www.cs.auckland.ac.nz/~peter-f/ Peter Fenwick's home page.
● http://www.hn.is.uec.ac.jp/~arimura/compression_papers.html Mitsuharu ARIMURA's Bibliography on Source Coding/Data Compression.
● http://www.hn.is.uec.ac.jp/~arimura/compression_links.html Mitsuharu ARIMURA's Bookmarks on Source Coding/Data Compression.
● http://citeseer.nj.nec.com/cs CiteSeer - a fabulous site indexing and linking a vast number of scientific papers. Often has .PDFs even if the author only has a PostScript file on their site.
● http://www.jucs.org/jucs_3_2/symbol_ranking_text_compression/html/paper.html Symbol Ranking Text Compression with Shannon Recodings. Another paper by Peter Fenwick, from which I found the link to the paper I have just been discussing.
● http://galaxy.ucsd.edu/welcome.htm IEEE Transactions on Information Theory. Index to the journal, but the PDFs which can be found on IEEE CD-ROMs do not seem to be available.
● http://www.cs.tut.fi/~albert/Dev/pucrunch/ An Optimizing Hybrid LZ77 RLE Data Compression Program, aka Improving Compression Ratio for Low-Resource Decompression. Commodore 64 compression and the use of Elias Gamma codes.
● http://www.cs.tut.fi/~albert/Dev/pucrunch/packing.html A handy tutorial on a number of related techniques.
● http://www.perfsci.com/algsamp/ USD$89 for a bunch of C source code, including for Elias coding.
● http://wannabe.guru.org/alg/node167.html Tutorial and source code for Elias codes and Golomb codes.
● http://www.iro.umontreal.ca/~pigeon/science/vlc/relias.html "Recursive Elias codes" with graphs of their efficiency.
● http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html Discussion of Elias Gamma and Delta codes.
● Try Alta Vista Advanced or Google Advanced for terms such as:
  ❍ Elias codes
  ❍ Elias coding
  ❍ Elias gamma
  ❍ Punctured Elias Codes
  ❍ Data Compression with Finite Windows
Tabulation details
These are low level details of how I processed the file size results.
I used a batch file or GUI to compress all the WAV files to individual compressed files with a distinctive file name extension. Then I would use a batch file containing "dir > dir.txt" to list the directory and thus the file sizes. I would edit out all lines except those of the compressed files, put them in the correct order, and then add those to sizes9.txt. Then I would use the MS-DOS command "type sizes9.txt" to show the text in an MS-DOS window. There is a rectangular text select facility there; I selected the block of file sizes, complete with commas, and copied them to the clipboard by pressing Enter. In the spreadsheet, I selected the column containing file sizes and pressed Control V. Voila! The spreadsheet
does the rest. Then I manually copied the percentages and ratios from the spreadsheet into the HTML table.
The Excel spreadsheet and "sizes9.txt" URLs are listed near the top of this page. To add your own pair of columns to the spreadsheet, select the block for an existing program, copy it to the clipboard, place your cursor in the Row 1 cell immediately to the right, and then paste in there.
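Nowadays the whole tabulation could be scripted; a rough sketch, assuming a hypothetical layout with the originals as *.wav and the compressed files alongside them with a .pac extension:

    import os

    for wav in sorted(f for f in os.listdir(".") if f.lower().endswith(".wav")):
        packed = os.path.splitext(wav)[0] + ".pac"   # hypothetical extension
        if os.path.exists(packed):
            pct = 100.0 * os.path.getsize(packed) / os.path.getsize(wav)
            print(f"{wav}\t{os.path.getsize(packed):,}\t{pct:.2f}%")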
This page was created with Netscape Communicator 4.7's Composer. In December 2000, Mozilla's Composer still has a way to go.
Updates in reverse order
● 2001 October 31. Added mention of new MUSICompress web site.
● 2001 March 22. Added mention of APT-X Windows DLL.
● 2001 January 17. Added mention of LPAC 3.0.
● 2000 December 19. Added listing of FLAC, but not tests.
● 2000 December 18. Slight change to DAKX notes - the Mac version performs better than the PC version.
● 2000 December 14. Added mention of MKT and updated RKAU results, with extra tables showing which of the options worked best for each file. Added a table in the RKAU section on the options available in all programs for using stereo correlations. Rounded average file sizes and compression ratios to the nearest number, up or down, so 67.545 becomes 67.55 and 67.544 becomes 67.54.
● 2000 December 12. Completely revised the section on Rice coding etc., adding material on Elias Gamma codes. Added mention of the DAKX Windows version 1.0.
● 2000 December 11. Added test results for RKAU 1.07. Mentioned RAR.
● 2000 December 10. Page completely revised. The old page is here.
Robin Whittle [email protected]
Return to the main page of the First Principles web site.
Comparing AAC, MP3 and TwinVQ Lossy
Compression of Audio
Robin Whittle, Melbourne, Australia [email protected] Last major update 13 September 1999. (But important links were added at the front, and some other minor changes made to links to other sites, since then - 22 March 2001.)
Investigating the quality of lossy algorithms: Advanced Audio Coding (AAC), MPEG Audio Layer 3
(MP3) and Yamaha's SoundVQ, an implementation of TwinVQ.
Back to the Audio compression page, which leads to some tests on lossless algorithms (totally updated in December 2000).
Back to the First Principles main page - for material on telecommunications, Internet music marketing,
stick insects . . .
14 - 19 December 2000 Please note:
There is a highly significant listening test report from the EBU, published in June 2000, on a variety of algorithms including AAC and MP3: http://www.ebu.ch/trev_dolby_frm.html . Proper listening tests are very difficult and expensive to conduct. I recommend you read this report in its entirety before bothering too much with what I wrote below, in December 1998.
The report and a separate file with the results in greater graphic detail are both .PDF files. Current Acrobat plugins are a menace in terms of not caching the file when re-viewing or printing it, and are often too dumb to save to disk with the original file name. Here are the URLs of the main report and two sub-reports which contain graphs in a larger format. If you shift-click on them, you should be able to save them to disc and read them at your leisure. They are about 1.3 Megs in total:
● http://www.ebu.ch/trev_283-kozamernik.pdf
● http://www.ebu.ch/trev_283-kozamernik-images-1.pdf
● http://www.ebu.ch/trev_283-kozamernik-images-2.pdf
The EBU report tests the following codecs:
● Microsoft Windows Media 4.
● AAC - implementation by FhG-IIS.
● MP3 - or close to it, by Opticom.
● Q-Design Music Codec 2 - prototype version of that for Quicktime.
● Real Networks 5.0.
● Real Networks G2. Newer, widely used system based on "DolbyNet".
● Yamaha Sound VQ.
These were tested at:
● 16 kbps mono. Q-Design gets special mention for music, but not for
speech.
● 20 kbps stereo. Lower subjective results than 16 kbps mono. Ditto the
Q-Design special mention.
● 32 kbps stereo. AAC leads.
● 48 kbps stereo. AAC leads with MP3 close behind. Windows Media
gets special mention for a folk music test for being indistinguishable from
the reference. Q-Design is not much better than at 20 kbps.
● 64 kbps stereo. AAC wins by a country mile averaging 80 points. At
this data rate, AAC was the only codec which evaluated in the "excellent"
range for all items tested.
This report also discusses the codecs specifically. The Microsoft and Q-Design
codecs show highly variable results on different test material at 48 and 16 kbps
respectively.
While the report does not give the complete breakdown of results, by codec, by
test item, my interpretation of this is:
1. Forget TwinVQ.
2. The Windows and Q-Design codecs were very fussy about what material
they encoded. With some items they were better and others much
worse. Q-Design shows no significant improvement as the data rate
increases.
3. Real Audio G2 is solid at all rates, except 20kbps stereo where Real
Audio 5 is better. G2 rates a fraction better than AAC at 16 kbps mono.
4. MP3 tails slightly behind AAC as the data rate increases, except at 64 kbps, where AAC is very significantly better than both MP3 and Real Audio G2, which have about the same score.
It's horses for courses!
Unfortunately, while AAC is widely regarded as being better than MP3 (as good at 96 kbps as MP3 at 128 kbps), MP3 is good enough and is so established that the more tightly licensed AAC is unlikely to displace it for a while. Think Beta vs. crappy, widely marketed VHS, except with VHS coming first - and, as before, the average user not being fussy enough to care. Fortunately, with decoders in software on PCs, we aren't stuck with the fixed hardware and media investments which make only one kind of video cassette system viable, even if it is not the best. Portable MP3 players, including CD players, embed decoders which cannot be updated the way PC software can.
I think Real Audio G2 is here to stay for a few years for streaming applications,
and for archived files. Its ability for a single file on disc to generate multiple
streams, including via HTTP, for different players, is very snappy.
AAC licensing is apparently tied up with attempts to keep music "secure" which I think is a waste of time.
Here are some other important new URLs:
http://www.commvergemag.com/commverge/extras/P178673.htm
Extensive analysis and links regarding lossy (MP3 and WMA at
least) compression and some lossless codecs. Be sure to check this
site! When I looked at it, the page was corrupt and would only
display properly on MS Internet Explorer. There are many
interesting things here, including a link to his listening tests of a
watermarking system (Hiss!!) which was clearly audible and is
apparently to be used on DVD audio discs. Watermarks are a
waste of time, for too many reasons to explain here, but see what I
wrote in 1997 about them:
http://www.cni.org/Hforums/cni-copyright/1997-02/1005.html .
http://CodecReview.com/ Dave Weekly's specialist site with many
links, some tests of lossless codecs and plans for a much more
extensive and interactive codec comparison.
http://privatewww.essex.ac.uk/~djmrob/mp3decoders/ David J
M Robinson tests 24 MP3 decoders with a variety of encoders,
including VBR (variable bit rate) and finds that only five pass all
his tests. Salute!
There is a freeware AAC encoder project:
http://www.audiocoding.com The source code is available at:
http://sourceforge.net/projects/faac/ There is a bit of patent
cat-and-mouse going on here!
A Dolby AAC site is: http://www.aac-audio.com . They announce that MusicMatch Jukebox will support AAC. I had a suspicion that AAC or some related Dolby approach is used in Real Audio, which I think achieves remarkable results in stereo at only 20 kbps. The music lacks top-end detail, and speech sounds a little odd, but the music is still well worth listening to, for instance from the archives or real-time source at fab community music station WMNF in Florida. However, the EBU report mentioned above distinguishes between AAC and Real Audio. CodecReview.com states that Real Audio 3 to 5 is based on DolbyNet/AC-3: http://www.dolby.com/tech/ac3flex.html . But what technology is behind Real Audio G2?
The MP3 Encoder's Mailing List is at:
http://geek.rcc.se/mp3encoder/ .
Scope
This page documents my own investigation of the audio quality provided by AAC (an early, unlicensed and non-optimised encoder/decoder), MP3 and TwinVQ/SoundVQ. These are not full-blooded double-blind listening tests. They are for my own interest and concentrate on finding musical sounds which are most likely to cause audible differences in the decoded signal. These tests show the performance of particular encoders and decoders, and do not necessarily show the maximum possible performance of the algorithms.
This site also contains links to other sites regarding these three compression algorithms.
I am particularly interested in the applicability of these compression algorithms to music
delivery - as part of my interest in music marketing, which is the subject of a separate page:
musicmar .
Note that this is not an investigation of low bit-rate schemes suitable for streaming
(real-time delivery) of music via 33.6 or 56 kbps modems. Although I tested some lower bit
rates, I didn't really investigate them. My question was: "What algorithm and bit rate can
be relied upon to encode a very wide range of music so it is audibly indistinguishable from
the original, including with demanding listeners and listening environments?"
6 July 2000 Please note:
1. This work was done in late 1998 and I am not attempting to keep up with
developments in this rapidly changing field. I can't keep this as an
up-to-date link farm for lossy compression either.
2. See the following sites for more recent developments and links:
❍
❍
❍
http://www.mp3-tech.org/ Lots of up-to-date analysis.
http://users.belgacom.net/gc247244/ Detailed testing of MP3
encoders, showing that open-source LAME is the way to go!
LAME is now available as executables for Windows (
http://www.mp3-tech.org/encoders_win.html but you might be
violating patents to use it) as well as in source versions for Linux,
Windows etc. LAME is an intense collaborative effort and no
longer relies on ISO code. Salute! http://www.sulaco.org/mp3/
.
13 September 1999 Please note:
1. This work was done in late 1998 and I am not attempting to keep up with
developments in this rapidly changing field. I can't keep this as an
up-to-date link farm for lossy compression either.
2. My aim was not to find the best MP3 encoder or decoder, but to find out
roughly how good the various algorithms were, or could be.
3. Most of the things I tested have now been superseded by later versions - for instance, MusicMatch http://www.musicmatch.com/ is now (Sept 99) up to version 4.1, totally different from the demo 2.50.005 version I used.
4. I am currently using LAME http://www.sulaco.org/mp3/ on my Linux
machines for MP3 encoding.
Summary
AAC is a most impressive compression algorithm. According to carefully conducted
listening tests, at 128 kbps, it seems to be superior to MP3 at 192 kbps. This is reported by
David Meares, Kaoru Watanabe and Eric Scheirer in their February 98 paper which is in a
Word 6 file, zipped at: http://www.cselt.it/mpeg/public/w2006.zip . I have quoted some of
the results below, in the AAC section.
I found that the audio quality of the Yamaha SoundVQ encoder (2.54eb1) and decoder
(2.51eb1) is noticeably inferior to MP3 or AAC at the available bit rates of 96 and 80 kbps
for stereo. Its performance on simple slowly swept-frequency sine-waves in the 3 to 6 kHz
range is really bad. Amongst TwinVQ users, these problems are generally well recognised
and accepted - with the argument that TwinVQ's artefacts are not too unpleasant, that its
lower bit rate (80 or 96 kbps) is attractive and that it copes well with a wide variety of
music, including tracks which work badly with MP3 joint stereo (for instance those from
analogue master tapes which have significant L - R phase differences).
Test sound files, and some of the decoded files are provided in .WAV format. I have
included some graphic frequency analysis images as well.
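For anyone wanting to reproduce the swept-sine test, here is a minimal sketch (my own, not the script I used at the time) that writes a slowly swept 3 to 6 kHz sine to a 16 bit, 44.1 kHz mono WAV file; the file name sweep.wav is arbitrary:

```python
# Generate a 3 kHz -> 6 kHz linear sine sweep, the range where TwinVQ misbehaved.
import math, struct, wave

RATE, DUR = 44100, 10.0          # 44.1 kHz sampling, 10 seconds
F0, F1 = 3000.0, 6000.0          # sweep limits in Hz

with wave.open("sweep.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)            # 16-bit samples
    w.setframerate(RATE)
    frames = bytearray()
    phase = 0.0
    for n in range(int(RATE * DUR)):
        t = n / RATE
        f = F0 + (F1 - F0) * t / DUR        # instantaneous frequency
        phase += 2 * math.pi * f / RATE     # integrate phase to avoid clicks
        frames += struct.pack("<h", int(0.8 * 32767 * math.sin(phase)))
    w.writeframes(bytes(frames))
```

Feeding such a file through an encoder/decoder pair and comparing the result with the original (by ear or with a frequency analysis) makes the artefacts very easy to find.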
I don't believe that the term "CD quality" should be applied to any lossy algorithm. That
said, I believe that for the majority of music and listening conditions, MP3 when properly
implemented at 128 kbps (though it seems that joint stereo will fail with some out-of-phase
material) and AAC when properly implemented at 128 and probably 96 kbps will
reproduce virtually all music in such a way that the degradation is inaudible to virtually all
listeners.
Personally, if I was buying music, I would want a delivery system that wasn't teetering on
the edge of human perception. My tests of lossless algorithms (See here.) suggest that for
pop, rock and techno, music can only be compressed losslessly to about 55 to 75% of its
normal size.
Until Internet bandwidth and costs improve, MP3 and soon AAC will play a vital role in the
discovery and delivery of music for commercial and non-commercial purposes.
Caveats
I do not have a lot of experience with these algorithms. This was an attempt to find
whatever it took to trip MP3, AAC and TwinVQ up. TwinVQ trips up on the most
fundamental component of sound - the sine wave - and so I cannot take it seriously. Nor do
I think claims that "music does not contain sine waves" are valid. (Think of the Theremin in
the Dr Who theme.) Accepting its limitations, it does cope remarkably well with a wide
range of music. Lots of people like TwinVQ, and a lively discussion about it can be found
at the VQF.COM discussion forum: http://www.vqf.com/bbs/?board=VQF.comForum ,
particularly starting with my post.
This field is changing rapidly. I may not be able to keep this page up-to-date. Be sure to
check with the sites mentioned below for the latest developments.
There are many MP3 encoders and decoders, and it is evident that depending on the
combination of encoder/decoder, the data rate, the type of music, the choice of stereo or
joint-stereo for encoding (if you can choose), and characteristics of the original material
which can cause joint-stereo encoding to sound bad, the audible results may vary
considerably. To test all the combinations would be a mammoth task. Please let me know
if you find anyone doing this even partially.
Updates
I can't keep up with all the developments in lossy audio compression, but I will attempt to
update this page - primarily by linking to more up-to-date sites.
One set of updates is flagged in the text as: up990424 for 24 April 1999. If you search for
this, you will see what has changed.
Another set is flagged in the text as: up990606 for 6 June 1999.
Preamble
I believe that if the Analogue to Digital Conversion (ADC) and Digital to Analogue
Conversion (DAC) are performed properly, then the 44.1 kHz sampling rate and linear 16
bit resolution system established by Sony in the early 1980s for the audio CD is entirely
adequate for reproducing stereo signals which are to be heard by humans in any "ordinary"
listening environment. (This includes the highest quality headphones and speakers with the
most exquisite music. It does not involve hiding a safe distance from the speakers when the
cannons in the 1812 overture go off, and then running up to the speaker to hear
quantisation noise as the track fades out.)
Achieving the potential of 16 bit 44.1 kHz digital audio is a challenging task - it only
became possible around 1990 as far as I am aware. It can best be accomplished with
oversampling ADCs followed by linear-phase digital decimation filters to bring the
sampling rate down to 44.1 kHz, whilst rejecting frequencies outside the audio range
without the need for high-Q analogue filters. For instance see the Delta-Sigma ADCs of
Crystal Semiconductor. (The mathematical and electronic principles of these delta-sigma
ADCs are partially beyond me.)
With the existence of the CD, the DAT recorder and the CD-R, these extraordinary ADCs
which Crystal and AKM pioneered have, as far as I can see, solved the problems of audio
recording and storage.
So why do some people want 96 kHz sampling? Maybe to keep their canine friends happy
or to impress those, including themselves, who believe that 44.1 kHz is inadequate? (There
are some people who work professionally in audio who are very keen about 96 kHz
sampling. Check the Seneschal site for material on 96 kHz audio.) I agree that 20 bit
resolution is highly desirable for recording, mixing and editing, but I still think that a
properly edited (with dither) recording in a form suitable for playback on headphones or
loudspeakers can contain a perfectly adequate signal to noise+distortion ratio with a 16 bit
signal resolution at 44.1 kHz. (Dither extends the resolution in the most audible
frequencies by several bits - to 18 or 19 bits or so. The playback is probably best done with
4 or 8 times oversampling digital filters and 18 bit current switching DACs (the extra bits
are output by the filters and should be used) so that only a very gentle analogue low-pass
filter is required.)
Lossless compression (compression is here used as a synonym for "data-reduction")
algorithms for 16 bit 44.1 kHz stereo signals (1,411,200 bits per second) seem to reduce
most music by only about 30% - so they are not very widely used. It looks like a daunting
task to do much better than this.
So why are people saying that MPEG Audio Layer 3 compression to 128,000 bits per
second (128 kbps - a compression ratio of 11.025 to 1) is "CD Quality"? Because they
want to believe it is true, or they can't tell the difference. (But see later - I found it
hard to tell the difference too.) "CD Quality" should rightfully mean any lossless form of
conveying the full 44.1 kHz 16 bit stereo bitstream - but the term has been so widely
misused now that I think it is best avoided.
MPEG Audio Layer 3 (hereafter referred to as "MP3") and perhaps AAC (MPEG
Advanced Audio Coding) are shaping up as the preferred form of distributing and storing
music via the Internet. In general the bit rate of 128 kbps is used at present - so I am
concerned that we are taking a serious step backwards in audio quality from the potentially
pristine and transparent 16 bit 44.1 kHz system established by Toshi Doi and his colleagues
at Sony in the late 1970s.
These two algorithms - and TwinVQ (Yamaha calls it SoundVQ) - all work by breaking the
sound into short time segments, filtering those segments into separate frequency bands,
encoding the signal in each frequency band, and then - using a mathematical model of
human hearing - sending the most audible parts of the signal to the output stream. With
enough bits in the output stream, the result may be lossless - the decoded file is bit-for-bit
identical with the original. However at the data-rates of interest to Internet users, these
compression algorithms are certainly lossy. With a lot of music, on the crappy speakers
that many people listen to music on, in imperfect listening conditions (computer, car and
other background noises), this loss in the compression system may not be audible at all.
So for general use, with lots of boisterous music, I think these algorithms are likely to be
fine at 128 kbps - assuming the encoding (compression) is performed optimally, which may
not always be the case, due to not all encoders (or decoders) being perfectly written and due
to the CPU-intensive nature of filtering, analysis and of the recursive approaches to figuring
out the best way to pack the data into the output stream.
However this is not to say that the losses in the compression algorithms are insignificant or
should be ignored. Sound and human hearing involve very subtle processes - and having
come all this way to the point where we can record and reproduce stereo audio without any
significant degradation, I don't believe we should put up with lossy compression algorithms
if we are purchasing music for keeps.
This page links to some sites of interest regarding compression, and then documents my
attempt to find the weaknesses of MP3, AAC and VQ.
In the future I may have some links regarding "digital watermarking" or "fingerprinting".
For now, let me say that I think watermarking is doomed to failure for a number of technical
and business reasons.
The three encoder-decoders I used
AAC: The AAC compression algorithm is documented at http://mp3tech.cjb.net , and
www.mp3.com has a list of AAC software. From that list I found the site of the enigmatic
Astrid/Quartex (up990424 it was at
http://www.geocities.com/ResearchTriangle/Facility/2141/ but see the AAC links section
below on where to get it) - who has a Windows based AAC encoder and decoder. Thanks
to [email protected] for making this software available! The files I got were
called aacdec01.zip and aacenc02.zip. These contain version 0.1 of the decoder and 0.2 of
the encoder. The encoder zip file contained an executable and an aacenc.txt file which
were dated 12 October 1998.
Be sure to check at Astrid's site above, and at the AAC sites listed below, for later versions - but here are the zip files in case you find them hard to get: aacdec01.zip aacenc02.zip
According to the Fraunhofer AAC FAQ, any software (such as Astrid/Quartex's) which is
based on the MPEG source code will not be of the highest quality, and any AAC
implementation must be licensed by the patent holders. In case Astrid/Quartex's site
disappears, you may wish to search AltaVista for "aacenc" or "aacdec" (or with "02" or
"03" etc. after that name, such as "aacenc02"), or refer to some of the sites in the AAC links
section below. There is another AAC encoder/decoder from Homeboy as well. See the
AAC links section below for more sites for the Astrid/Quartex encoder/decoder.
MP3: The Munich based Fraunhofer Institut for Integrated Circuits IIS-A is in many
respects the home of MP3 - they did a lot of the work on developing the standard:
http://www.iis.fhg.de/amm/techinf/layer3/ They are not so popular in MP3 circles at
present (November 1998) because of claims they are making regarding patents and pressure
they have successfully exerted on a number of authors of freely available and/or shareware
MP3 programs. I used their Windows demo-edition encoder and decoder for these
experiments. The versions I used are: WinPlay 3 Version 2.3 beta 5 from
http://www.iis.fhg.de/amm/download/mp3player/index.html and the command-line encoder
program "mp3encdemo31.exe" which identifies itself as "MPEG Layer-3 Encoder V3.1
Demo (build Sep 23 1998)" and which comes in the file: mp3encdemo_3_1_win32.zip. The
encoder is available for various Unices - including x86 Linux - and Windows at
http://www.iis.fhg.de/amm/download/mp3enc/index.html .
VQ: Yamaha has a freely available VQ (more properly TwinVQ) encoder and decoder for
Windows - which I used in these tests:
http://www.yamaha-xg.com/english/xg/SoundVQ/index.html . The versions I used are:
encoder 2.54eb1 and decoder 2.51eb1.
Links and information about the algorithms
AAC
AAC will be part of the forthcoming MPEG-4 standard, so "AAC", "MPEG-4" and "MP4"
may be used interchangeably at some sites.
There are three "profiles" for AAC in the MPEG-2 data stream. "Main" is the fully fledged
AAC. "LC" (Low Complexity) and "SSR" (Scalable Sample Rate) are lower quality options
for restricted CPU power implementations. I think that all AAC software mentioned here is
not mucking around with the lower quality profiles.
● The definitive reference for MPEG Audio, including AAC (AKA MP4, of which it is
a subset) is the MPEG Audio FAQ by D. Thom, H. Purnhagen, and the MPEG
Audio Subgroup: http://www.cselt.it/mpeg/faq/faq-audio.htm . (Note this server
is also known as drogo.cselt.stet.it .) It mentions that Dolby Laboratories should be
contacted for AAC licensing - they and associated companies have some of the AAC
technologies covered by patents.
● Dolby Laboratories has an email address, listed at:
http://www.dolby.com/trademark/ for AAC licensing. In a letter to MP3.com's CEO
Michael Robertson http://www.mp3.com/news/135.html (23 November 1998), Dolby
Laboratories states that they are "the licensing administrator for a new compression
technology called AAC". The AAC patent rights apparently belong to AT&T,
Dolby, Fraunhofer and Sony. Dolby asked Robertson to remove links from his
www.filez.com to unlicensed AAC software. "These companies take the unlicensed
use of their technology very seriously, and are presently in the process of
communicating with each of your linked sites. Our goal is to provide them
inexpensive licensing arrangements so that they can continue to utilize AAC
technology." Following on from the letter to Michael Robertson, there is a lively
discussion board (as there is for each MP3.com news item) at:
http://bboard.mp3.com/ubb/Forum4/HTML/000148.html . At present, the only AAC
encoders and decoders which are generally available are the Homeboy and
Astrid/Quartex pairs. While recognising that these are far from optimal, I consider
their availability to be vital for those such as myself who are interested in foreseeing
the development of music marketing - and probably for quite a few other purposes.
If I become convinced that these authors are materially and negatively affecting
whoever owns the patents for these principles, or if I think they are lowering the
standard of audio and software development, then I might take a dim view of them.
At present, they are the only place you can get an AAC encoder or decoder without
paying very large up-front license fees - and they are doing it for free because of their
interest in audio. Hopefully Dolby Laboratories will succeed in making this
excellent technology available to all those who can use it at a price which makes
sense. For a long time they did it with Dolby B, and maintained standards at the
same time. Now it's software and a very different marketing model.
● Astrid/Quartex (up990424) used to have a site with the command line Windows
AAC encoder/decoder I used:
http://www.geocities.com/ResearchTriangle/Facility/2141/ . The programs are
mirrored here, here and at this site (see the AAC section above). According to the
MPEG Audio FAQ V9, referring to the publicly available reference software on
which the Astrid/Quartex 0.2 encoder is based: "The encoder software is not yet a
general multi-channel encoder, and does not yet make use of all AAC coding tools."
Therefore, this early software does not provide the full performance which is possible
with AAC.
● K+K Research in Denmark (up990424) has a new AAC encoder and decoder:
http://kk-research.hypermart.net/ . I have not tried it.
● KM (up990424) http://cad-audio.fsn.net/ (who is associated with K+K) has
extensive and up-to-date pages on audio compression in general and on AAC in
particular: http://cad-audio.fsn.net/aacinfo.htm .
● Homeboy Software http://www.eotd.com/hbsaudio/default.htm are the other people
who have gone to the trouble of writing and freely releasing AAC encoders/decoders
in late November 1998. They seem to have an AAC player plug-in for WinAmp, an
AAC encoder (aacenc05.zip) which has known problems - they are working on a new
version - and soon an AAC player for the Macintosh. Who are these dudes? One of
the directors apparently posted to the AAC discussion at:
http://bboard.mp3.com/ubb/Forum4/HTML/000148.html .
● CSELT's Official MPEG web site http://www.cselt.it/mpeg/ has a Word 6 file
w2006.doc, pkzipped, containing a detailed February 1998 report from David Meares,
Kaoru Watanabe and Eric Scheirer comparing AAC and MP3 at various bit rates with
carefully conducted listening tests. The title is: "Report on the MPEG-2 AAC
Stereo Verification Tests". http://www.cselt.it/mpeg/public/w2006.zip . I quote some
of the results below in the AAC section - they are most impressive.
● mp3.com has a list of AAC software.
● See the next section for a link to MP3Tech.
● A site dedicated to AAC is the Advanced Audio Coding
Homepage: http://nedhosting.com/users/aac/ . They have a discussion section.
● The Fraunhofer Institut has some excellent technical material, including an encoder
block diagram, on AAC: http://www.iis.fhg.de/amm/techinf/aac/ . See also the
FAQ at this site.
● Forbidden Donut Unlimited has a site http://www.forbiddendonut.com which
includes a copy of the Astrid/Quartex aacenc02.zip file.
● There was a Windows player for AAC, MP3 and VQ files, called KJofol 0.402. The
site used to be at: http://www.audioforge.net/kjofol/ but seems to be gone now . . . but
see below for new sites. There was a letter.txt there requesting the authors stop
distributing the program, because of a claimed patent violation. Take a look at the
screenshot. For some reason, this audio compression field leads programmers to
create interfaces they think are exquisitely beautiful and easy to use, but which I think
are just the opposite! MP3 players FreeAmp, Sonique and quite a few others are really
non-standard and focused on circles and curves and trying to be like a piece of hi-fi
equipment, rather than a plain, easy-to-use program.
● A new site for KJofol (Windows player for MP3, AAC and VQ files) is
http://kjofol.org . On 26 November 1998, this has v0.42 and promises v0.5 soon.
Mirrors are here, here and here.
● A company called Mayah plans an editor for AAC files:
http://www.mayah.com/english/n980918e.html .
● A site called AAC Net http://www.worldzone.net/ss/aacnet/ has some AAC
information and also seems to be available from: http://come.to/justmp3 (in the
Tongan domain!). They have a discussion section.
● MP4 Central http://people.goplay.com/MP4Central/ or http://come.to/mp4central
concentrates on AAC audio files.
● Liquid Audio has a commercial program, Liquefier Pro for Windows, which will
soon encode AAC (AKA MP4) files:
http://www.liquidaudio.com/products/liquifier.html . However, I think the output is
probably a proprietary format - or at least optionally so. Liquid Audio promote the
use of watermarking and encryption in an attempt to stop people copying music. I
think this is a waste of time.
● Two Japanese AAC sites: http://ha2.seikyou.ne.jp/home/tlswosk/comp/aac.html and
http://www.moemoe.gr.jp/~hibari/aacjapan.html .
● AAC is also used, together with encryption and proprietary file formats, by AT&T's
http://www.a2bmusic.com with their "a2b" player and music control system. Files
purchased from their site reside on the user's computer and are supposedly unplayable
on any other computer. See my music marketing material for why I think this
attempt to hang onto old certainties about uncopyable music is doomed to failure.
MP3 and other algorithms
See above for links to new sites in 2000.
Quite a few of these sites concern AAC, TwinVQ and PAC compression too.
● The Official MPEG site is at CSELT in Torino, north-west Italy:
http://www.cselt.it/mpeg/ .
● MP3Tech http://www.mp3tech.org/ has information regarding MP3,
AAC, TwinVQ, Dolby AC-3, listening tests, patents etc. There is a web discussion
forum and an mp3tech mailing list. There are also the results of a limited but interesting
listening test of MP3 at different bit rates.
● A significant development is LAME http://www.sulaco.org/mp3/ . This is an open
source patch for the publicly available ISO encoder source file to correct errors in the
algorithms, improve sonic performance and make it run faster. Distribution of this
patch should be free of the patent restrictions concerning functional MP3 encoders
(executables or source). This is a very promising development! The LAME crew
are working intensively on all this and have a deep understanding of the psychoacoustics
and the encoding algorithms. There is also a link to an MP3 Encoders
mailing list. (up990606)
● There are zillions of MP3 sites and a vast range of software. I won't attempt to keep
up with it - the biggest activist site in the MP3 universe is http://www.mp3.com .
There you will find extensive discussion of the technical, legal, moral, industry and
political aspects of audio compression, electronic delivery of music and of copying
and copyright. They also have an extensive set of links to all the relevant MP3 play,
decode, encode etc. software. An essential starting point!
● GoodNoise - soon to be Emusic - is a pioneering company (together with Nordic
Communications) in selling music with discovery, delivery and payment via the Net,
in an open standards format (MP3) and without attempts at preventing listener
copying. There are many other important music marketing sites - so see
www.mp3.com and my music marketing material on another page of this web site:
musicmar.
● Cedric Amand's http://mp3bench.com has a variety of interesting technical,
performance and popularity material regarding MP3 software, and AAC as well.
● The Fraunhofer Institut has some excellent technical material:
http://www.iis.fhg.de/amm/techinf/ .
● The Moving Picture Experts Group site http://www.mpeg.org has lots of information
on MP3 and AAC - and on the data streams they can be put into. MPEG numbering
and terminology is a mess - I won't get into it here.
● Karsten Madsen <[email protected]> has a site: http://cad-audio.fsn.net/
reviewing the Liquefier Pro encoder's AAC (a proper Dolby/Fraunhofer encoder, I
believe), Astrid/Quartex's AAC software, PAC, MP3 and VQF.
● Not related to the audio quality, but relevant to the way people organise large
numbers of MP3 files, is the ID3v2 tagging specification:
http://www.lysator.liu.se/id3v2/ . This is an informal and evolving standard, and I
think the web site is beautifully organised and presented. From their explanation:
"ID3v2 is a new tagging system that lets you put enriching and relevant information
about your audio files within them. In more down-to-earth terms, ID3v2 is a chunk of
data prepended to the binary audio data. Each ID3v2 tag holds one or more smaller
chunks of information, called frames. These frames can contain any kind of
information and data you could think of such as title, album, performer, website,
lyrics, equalizer presets, pictures etc."
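To show how small the fixed part of the scheme is, here is a sketch of reading the 10-byte header that every ID3v2 tag prepends to the audio data (field layout per the ID3v2 specification; the helper name is mine). The tag size is stored as a "syncsafe" integer - four bytes of 7 bits each, so no byte can be mistaken for an MPEG sync pattern:

```python
def read_id3v2_header(path):
    """Return the ID3v2 header fields, or None if the file has no tag."""
    with open(path, "rb") as f:
        h = f.read(10)
    if len(h) < 10 or h[:3] != b"ID3":
        return None
    major, revision, flags = h[3], h[4], h[5]
    # Syncsafe integer: 4 x 7 bits, most significant byte first.
    size = (h[6] << 21) | (h[7] << 14) | (h[8] << 7) | h[9]
    return {"version": (major, revision), "flags": flags, "tag_bytes": size}
```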
● (Update 8 Jan 1999.) Leonardo Maffi has some detailed material, mainly in Italian,
testing the performance of lossy audio and other compression algorithms:
http://computer.digiland.it/1609/ .
MPEG-4
● The Official MPEG site is at CSELT in Torino, north-west Italy:
http://www.cselt.it/mpeg/. They have a version 2 draft of the MPEG-4 work:
http://www.cselt.it/mpeg/standards/mpeg-4/mpeg-4.htm This mentions that AAC and
TwinVQ will be part of the forthcoming MPEG-4 standard. MPEG-4 covers a
bewildering array of concepts beyond direct compression of audio and video. One
relatively straightforward aspect is SAOL - Structured Audio Orchestra Language
http://sound.media.mit.edu/~eds/mpeg4-old/ This is a portable and flexible approach
to digital synthesis of sound with software - based on Csound:
http://mitpress.mit.edu/e-books/csound/frontpage.html or
http://www.firstpr.com.au/csound/ The bewildering stuff is when they start talking
about compressed coding for facial, head and body animation! Apparently, rather
than compressing a video of a person, they are planning on analysing them according
to facial structure, expression, skin texture etc and synthesising an image based on
these parameters at the receiving end. These images would then be merged together
with MPEG-2 video or some VRML nonsense. Propellerhead zone!
TwinVQ
"TwinVQ" is the proper term. But I use "VQ" at this site. "SoundVQ" is Yamaha's term
for this compression system, and files are normally stored with an extension of "VQF".
TwinVQ will also be a part of MPEG-4.
● TwinVQ (Transform-domain Weighted Interleave Vector Quantization)
was developed by NTT Human Interface Laboratories:
http://www.hil.ntt.co.jp/top/index_e.html. The English version of the
TwinVQ home page is: http://music.jpn.net/ .
● Yamaha's site is:
http://www.yamaha-xg.com/english/xg/SoundVQ/index.html .
● A big activist site for TwinVQ is VQF.COM: http://www.vqf.com . They
have a discussion area, which I posted to regarding these tests. My posting is:
http://www.vqf.com/bbs/display.php3?board=VQF.comForum&DISP=2436 .
Follow this link for alternative viewpoints to my negative assessment of
TwinVQ!
● Search for "twinvq" with AltaVista by clicking here!
Other related algorithms
● Dolby AC-3 is a highly respected, proprietary, multi-channel compression system
which is also introduced at this site. DVD uses it at 384 kbps, and cinemas use it at
640 kbps. I don't know of any easy to obtain encoders or decoders for it, so have not
investigated it further. http://www.dolby.com/tech/
● A relatively low loss system used for broadcasting is Audio Processing Technology's
4:1 fixed rate apt-X system. http://www.aptx.com This is a real-time, high quality,
low delay system (2.76ms for encode and decode combined) - which does not rely on
psycho-acoustic models etc. The FAQ describes it:
ADPCM as used by APT for its apt-X 4:1 compression algorithm
takes the digital signal and breaks it into four frequency sub bands
by means of a QMF digital filter. Each of these sub bands is
subsequently coded by means of predictive analysis; the coder predicts
what the next digital sample in the audio signal will be and subtracts this
prediction from the actual sample. The resulting, small error signal is
transmitted to the decoder which then adds back in the prediction from
identical tables stored in the decoder. NO psycho-acoustic auditory
mask is used to throw away any of the original audio signal resulting in
a near lossless compression system.
In March 2001, a chap from APT wrote to me that the algorithm is available on a
demo basis as a Windows DLL.
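The predictive step in the FAQ's description can be illustrated with a toy residual coder (my own sketch; apt-X itself splits the signal into four sub-bands first and uses adaptive prediction and quantisation of the residual, none of which is attempted here):

```python
def encode_residuals(samples):
    """Transmit only the error between each sample and a trivial prediction."""
    prediction, residuals = 0, []
    for s in samples:
        residuals.append(s - prediction)  # the small error signal
        prediction = s                    # crudest predictor: previous sample
    return residuals

def decode_residuals(residuals):
    """Rebuild the signal by adding each error back to the same prediction."""
    prediction, out = 0, []
    for r in residuals:
        s = prediction + r
        out.append(s)
        prediction = s
    return out
```

Because encoder and decoder form identical predictions, this round-trip is exactly lossless; the data reduction in a real system comes from the residuals being much smaller than the samples and therefore cheaper to code.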
● Microsoft (up990424) has developed a new low-bit-rate audio compression system:
http://www.microsoft.com/windows/windowsmedia/ . The encoder is available
here. An article and discussion of its merits is at MP3.COM:
http://www.mp3.com/news/230.html . As always, keep an eye on
http://www.mp3.com for the latest news.
To Page 2
EBU Technical Review
INTERNET AUDIO

EBU listening tests on Internet audio codecs
G. Stoll (IRT) and F. Kozamernik (EBU)
The advent of Internet multimedia has stimulated the development of
several advanced audio and video compression technologies. Although
most of these developments have taken place outside the EBU, many
members are using these low bit-rate codecs extensively for their
webcasting activities, either for downloading or live streaming. To this
end, the EBU Project Group, B/AIM (Audio in Multimedia), was asked to
carry out some tests on several low bit-rate audio codecs that are now
available on the commercial Internet market.
This article gives the results of the subjective evaluations undertaken by B/
AIM in late 1999 and early 2000. These EBU tests are the first international
attempt at comparing the different audio compression schemes used on the
Internet. In addition, prior to conducting these tests, no internationallyagreed subjective method was available for carrying out evaluations on
very low bit-rate, intermediate-quality, codecs. In order to overcome this
problem, the group was instrumental in devising a novel test method to
evaluate specifically these low-quality audio codecs. The new method is
now known as MUSHRA. Both the EBU and ITU-R have now adopted
MUSHRA as a standard evaluation method.
1. Introduction
During the last ten years or so, audio coding technology has made enormous progress.
Many advanced coding schemes have been developed and successfully used in radio
broadcasting, in storage media (e.g. CD, MiniDisc, CD-ROM, DVD) and, particularly,
over the Internet. There have been significant advances in terms of the bit-rate reduction
achieved, and the quality of the speech and music reproduced has been steadily improving.
Nevertheless, the biggest push in low bit-rate audio coding has taken place quite
recently, due to the fast development of the Internet where extremely low bit-rates are
required while preserving the subjective quality of the original signal. Digital radio
broadcast networks and audio automation systems are now almost completely based on
relatively low bit-rate audio coded formats. Within the next few years, the on-line sales
and distribution of music may surpass conventional physical distribution channels in
terms of market share.
2. Audio codecs market
Following the development of early digital codecs such as NICAM [1] and later ISO/IEC
MPEG 1 [2], which are both successfully used in digital broadcasting, there are currently
a large variety of different ultra low bit-rate audio codecs, specifically designed for the
Internet market. Table 1 gives a provisional list of the more important codecs. Because
of the limited bandwidth available over the Internet, extremely efficient compression
techniques for data reduction have been developed.
Current audio-coding standards were developed with relatively simple goals in mind: to
achieve the lowest possible data rate while preserving the subjective quality of the original signal. The foreseen applications were digital broadcast emissions (including DAB
and DVB), CD-ROM, DVD, etc. Since these channels are assumed to provide evenly-distributed
single errors, error mitigation was limited to simple error-detection codes which
would allow muting or interpolation of the error-affected frames at the receiver. In the
case of the Internet, the error characteristics are “block” in nature and radically lower bit-rates
are used, so different design approaches were necessary for optimizing the audio
quality at very low bit-rates. Consequently, many new coding schemes were developed
specifically for the Internet.
The most advanced audio compression systems spread small portions of the encoded signal – both in time and frequency – and transmit these elements interleaved and spread
among many transmission datagrams. Thus the audible effect of a lost or delayed packet
can effectively be minimized by interpolating the data between neighbouring packets. In
order to make the transmitted stream more robust, some redundancy can be added and
the critical elements of the signal can be sent multiple times.
There are additional requirements for advanced compression codecs:
● cut-and-paste editing of the encoded format directly, without audible impairments,
must be possible;
● it should be possible to transmit the same file at different bit-rates, in order to
adapt dynamically to network throughput and congestion.
The latter feature is extremely important as it enables optimal sharing of the bit-rate
between audio and video, and allows storage of a single file in the content database for a
variety of applications – low bit-rate previews, several different medium bit-rates for
streaming, and a high bit-rate version for download or purchase.
As more and more content becomes organized into on-line databases, there is increasing
demand for efficient ways to search and categorize this content, and to package it for
consumption. It is necessary to index and create metadata using audio analysis tools
which classify many parameters of an audio signal. These tools can detect pitch, dynamics, key signature, whether or not the signal contains voice or a musical instrument, how
similar the voice is to another voice, etc. Coded formats must support efficient classification.
With the adoption of Apple’s QuickTime as the basis of the ISO MPEG-4 file and
streaming format, there is a strong common standard architecture defined for the next
generation of multimedia systems.

Table 1 – Most popular streaming audio and/or video systems (status: June 1999).

Product Name                            | Company              | Audio/Video | Platform
Advanced Audio Coding (AAC) – MPEG-4    |                      | A           |
Audioactive                             | Telos                | A           | Win, Mac
AudioSoft                               | Eurodat              | A           | Win, Mac
Destiny Internet Command Engine (DICE)  | Destiny Software     | A           | Win
I-Media                                 | Q-Design             | A           | Win
Intel Streaming Media                   | Intel                | A/V         | Win
Internet Wave                           | Vocaltec             | A           | Win
InterVU                                 | InterVU              | A/V         | Win, Mac
MP3                                     |                      | A           | Win, Mac
Netscape Media                          | Netscape             | A/V         | Win, Mac, Unix
QuickTime                               | Apple                | A/V         | Win, Mac
RealAudio                               | Progressive Networks | A/V         | Win, Mac, Unix
ShockWave                               | Macromedia           | A/V         | Win, Mac
Stream Works                            | Xing Technologies    | A/V         | Win, Mac, Unix
TrueSpeech                              | DSP Group            | A           | Win
ToolVox                                 | VoxWare              | A           | Win, Mac, Unix
VDOLive                                 | VDOnet               | A/V         | Win, Mac
Vosaic                                  | Univ. of Illinois    | A/V         | Win, Mac, Unix
Win Media-Player                        | Microsoft            | A/V         | Win
The advent of such a large number of audio codecs has brought a radically new approach
to standardization. Standards have become less important, since decoders (which are
normally simple and do not require a lot of processing power) are downloadable (possibly in the form of a Java applet) to the client machine along with the content.
In the Internet environment there is no longer a need for a single coding system as is the
case in conventional broadcasting. Indeed, RealAudio is no longer the only, and not
even the main, audio technology used over the Internet.
From the user point of view, it is irrelevant which audio codec is being used – as long as
the technical and commercial performance is comparable. Service providers decide
which coding scheme to use. One of the advantages of this “deregulated” approach is
that decoders can be regularly updated as the technology advances. The user can have
the latest version of the decoder all the time. Audio players can be stored in a flash
memory and not on a hard disk.
Browsers or operating systems are usually shipped with a few audio plug-ins. New plug-ins can be downloaded easily. The user is no longer restricted to the use of plug-ins that
came with the browser but is free to install any new decoder as appropriate.
The business model of audio streaming is likely to change due to the advent of multicasting. Today, ISPs charge per audio stream. In multicasting situations, however, a single
stream will be delivered to several users. The user will then be charged according to the
occupancy of the servers used. Due to the huge competition in the audio decoder market,
audio streamers will be increasingly available for free.
3. Audio quality assessments
One of the principal characteristics of the current Internet audio codecs is that they experience a large variation in terms of the audio quality achieved for different bit-rates and
different audio signals. In addition, they vary in terms of cost, the computation power
required (real time), complexity of handling, reliability of the server, the service quality
(ruggedness against errors), scalability and marketplace penetration.
The main reason for this is that there is no standard. Even in the MPEG family of standards, the implementation of audio encoders is not standardized, allowing for a large variety of possible implementations in the marketplace. Since the encoder is not
standardized, some improvements are possible while keeping the user’s decoder terminal
unchanged.
Analogue sound systems are measured in terms of the signal-to-noise ratio (S/N) and
bandwidth, and they exhibit some harmonic distortions and wide-band noise. Typical
artefacts of digital Internet audio codecs are not “harmonic”; they are usually less pleasant for the listener and are often more noticeable and disturbing.
In order to assess the quality of an audio signal under controlled and repeatable conditions, subjective listening tests using a number of qualified listeners and a selection of
audio sequences are still recognized as being the most reliable way of quality assessment. ITU-R Recommendation BS.1116-1 [3] is used for the evaluation of high-quality
digital audio codecs, exhibiting small impairments of the signal. On the Internet 1, however, medium or even low-quality codecs should be acceptable and are unavoidable.
Thus, compromises in the audio quality are necessary. The test method defined in
BS.1116-1 is not suitable for assessing such lower audio qualities; it is generally too sensitive, leading to a grouping of results at the bottom of the scale.
This is the main reason that EBU Project Group B/AIM proposed a new test method,
termed MUSHRA “MUlti Stimulus test with Hidden Reference and Anchors” [4] 2.
This method has been designed to give a reliable and repeatable measure of the audio
quality of intermediate-quality signals. The method is in the process of being standardized by the ITU-R [5].
4. The EBU MUSHRA method
Regardless of the method used, the conducting of subjective evaluation tests is generally
a highly complex, time-consuming and costly process which requires very careful preparation
and execution, followed by statistical processing of the results 3. Each of these
three phases is briefly described below and is contrasted with ITU-R Recommendation
BS.1116-1.
1. Other applications that may require low bit-rate codecs - due to low available bandwidths - and
which support intermediate audio quality are digital AM (that is DRM - Digital Radio Mondiale),
digital satellite broadcasting, commentary circuits in radio and TV, audio-on-demand services and
audio-on-dial-up lines.
2. This inelegant name was agreed by the majority of B/AIM members in spite of some reservations
concerning the aesthetic appeal of the acronym. However, taking into account the large impairments
and poor audio quality encountered, and the need to endure unpleasant and repetitive listening
to the numerous test items, this name does not seem so inadequate.
3. While several such methods have recently been developed (e.g. the new ITU-R PEAQ Standard
which has been successfully verified at high audio-quality levels), they are not yet mature and reliable
enough to be used in large-scale evaluation tests which feature low and intermediate quality
audio, such as the tests described in this article.
4.1. How MUSHRA works
Whereas BS.1116-1 uses a “double-blind triple-stimulus with hidden reference” test
method, MUSHRA is a “double-blind multi-stimulus” test method with hidden reference
and hidden anchors.
The MUSHRA approach is felt to be more appropriate for evaluating medium and large
impairments.
MUSHRA also has the advantage that it provides an absolute measure of the audio quality of a codec which can be compared directly with the reference, i.e. the original audio
signal as well as the anchors. Such an absolute measure is necessary in order to be able
to compare the results with any other similar tests. If the reference is narrow-band (say
7 kHz), then the codecs under test tend to be rated higher, and this may sometimes lead
to very misleading results (e.g. the NADIB test results).
In a test involving small impairments, assessors are asked to detect and assess any perceptible annoyance of artefacts which may be present in the signal. A hidden reference
signal helps the assessor to detect these artefacts. On the other hand, in a test with relatively large impairments, the assessor should normally have no difficulty in detecting the
artefacts and, therefore, a hidden reference is not necessary. The difficulty however
arises when the assessor must grade the relative annoyances of the various artefacts. The
assessors are asked to judge their degree of “preference” for one type of artefact versus
some other type of artefact.
As MUSHRA is intended for evaluating medium and large impairments, the use of a
high-quality reference (as used in BS.1116-1) is to be questioned. The perceptual distance between the reference and the test items is expected to be relatively large. On the
other hand, the perceptual distances between the test items belonging to different systems may be quite small. Thus, if each system is only compared with the reference, the
differences between any two systems may be too small to discriminate between them.
Consequently, MUSHRA uses not only a high-quality reference but also a direct paired
comparison between different systems. The assessor can switch at will between the reference signal and any of the systems under test. By way of comparison, in BS.1116-1 the
assessor is asked to assess the impairments on “B” compared to a known reference “A”
and then to assess “C” compared to “A”, where B and C are randomly assigned to a hidden reference and the object under test.
Because the assessors can directly compare the impaired signals, they can relatively easily detect differences between the impaired signals and can then grade them accordingly.
This feature permits a high degree of resolution in the grades given to the systems. It is
important to note, however, that assessors will derive their grade for a given system by
comparing that system to the reference signal, as well as to the other signals in each trial.
In the EBU tests, a computer-controlled replay system was used, although other mechanisms using multiple CD or tape machines can also be used. In a given session, the
assessor is presented with a sequence of trials. In each trial, the assessor is presented
with the reference version as well as all versions of the test signal processed by the systems under test. For example, if a test contains seven audio systems, then the assessor is
allowed to switch instantly among at least ten signals (one “known” reference + seven
impaired signals + one “hidden” reference + at least one “hidden” anchor). Depending
on the test, more than one anchor might be used.
During an ITU-R Rec. BS.1116-1 test, assessors tend to approach a given trial by starting
with a detection process, followed by a grading process. In MUSHRA, assessors tend to
begin a session with a rough estimation of the quality. This is followed by a sorting or
ranking process and finally the assessor performs the grading process. Since the ranking
is done in a direct fashion, the results are likely to be more consistent and reliable than
for the BS.1116-1 method.
4.2. Grading process
The grading scale used in the MUSHRA process is different from the one used in
BS.1116-1 which uses the five-grade impairment scale given in ITU-R Recommendation
BS.562 [6] In MUSHRA, the assessors are required to score the stimuli according to the
five-interval Continuous Quality Scale (CQS) 4. The CQS consists of identical graphical
scales (typically 10 cm long or more, with an internal numerical representation in the
range of 0 to 100) which are divided into five equal intervals with the following descriptors from top to bottom:
Excellent
Good
Fair
Poor
Bad
The listeners record their assessments of the quality in a suitable form; for example, with
the use of sliders on an electronic display (see Fig. 1), or by using a pen and paper scale.
4.3. Reference signals
MUSHRA uses the unprocessed original programme material of full bandwidth as the
reference signal. In addition, at least one additional signal (anchor) – being a low-pass
filtered version of the unprocessed signal – should be used. The bandwidth of this additional signal should be 3.5 kHz. Depending on the context of the test, additional anchors
can be used optionally. Other types of anchors, showing similar types of impairments as
the systems under test, can also be used. For example, these types of impairments can
include any of the following possibilities:
● bandwidth limitation of 7.0 kHz or 10 kHz;
● reduced stereo image;
● additional noise;
● drop-outs;
● packet losses.
4. This scale is also used for the evaluation of picture quality (ITU-R Recommendation BT.500-8 [7]).
In the EBU tests, two anchor sequences, i.e. low-pass filtered (3.5 and 7 kHz) versions of
the unprocessed signals, were used. In BS.1116-1, the known reference is always available as stimulus “A”: the hidden reference and the object are simultaneously available but
are randomly assigned to “B” and “C”.
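A 3.5 kHz low-pass anchor of this kind can be produced along these lines (a sketch assuming scipy and a numpy array for the reference; the filter order and Butterworth type are my choices, since the text only fixes the bandwidth):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def make_anchor(reference, rate=44100, cutoff=3500.0, order=8):
    """Low-pass filter the unprocessed reference to produce a hidden anchor."""
    b, a = butter(order, cutoff / (rate / 2))  # cutoff normalized to Nyquist
    return filtfilt(b, a, reference)           # zero-phase filtering
```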
4.4. User interface
Compared to ITU-R Rec. BS.1116-1, the MUSHRA method has the advantage of
displaying all stimuli for one test item at a given bit-rate at the same time (see Fig. 1).
The assessors are therefore able to carry out any comparison between them directly.
The time consumption for the test is significantly lower than for BS.1116 tests.

[Figure 1: User interface for MUSHRA tests.]

Fig. 1 shows the user-interface which was used for each session. The buttons represent
the reference (which is specially displayed on the top left) and all the codecs under
test, including the hidden reference and both processed references, i.e. the two anchors.
Under each button, with the exception of the button for the reference, a slider is used to
grade the quality of the test item according to the continuous quality scale used. For
each of the test items, the signals under test are randomly assigned. In addition, the test
items are randomized for each subject within a session. To avoid sequential effects, each
assessor runs the five sessions in randomized order.
4.5. Selection of assessors
As in BS.1116-1, listening assessors (i.e. evaluators) should have certain experience in
listening critically to the sound sequences. Although the impairments caused by the
Internet audio codecs are generally quite high and therefore relatively easy to detect,
experience shows that experienced listeners give more reliable results, and more
quickly than non-experienced listeners. However, non-experienced listeners generally become sensitive enough to the various types of artefacts after frequent exposure. There are methods of pre- and post-screening to eliminate assessors that are
not able to discriminate between different artefacts with sufficient accuracy.
4.6. Training phase
In order to get reliable results, it is mandatory to train the assessors at special training
sessions in advance of the test. This training has been found to be important for obtaining reliable results. The training should at least expose the assessor to the full range and
nature of the impairments and all the test signals that will be experienced during the test.
This may be achieved using several methods: a simple tape replay system or an interactive computer-controlled system.
4.7. Test material
The choice of test material is crucial to the success of the tests and is far from being a
simple matter. The MUSHRA method uses a selection of ordinary, unprocessed, broadcast
programme sequences - consisting of pure speech, a mixture of speech, music and
background noise, and music only. In contrast, BS.1116-1 uses very critical test
sequences specifically chosen to “stress” or even “break” the codec tested and to reveal
some audible artefacts. The length of the sequences should typically not exceed 20 s to
avoid fatiguing the listeners and also to reduce the total duration of the listening tests.

Abbreviations

AAC: (MPEG-2/4) Advanced Audio Coding
AIFF: (Apple) Audio Interchange File Format
ASF: (Microsoft) Advanced Streaming Format
CFI: Confidence interval
CQS: Continuous quality scale
DR: Danmarks Radio (Denmark)
DVB: Digital Video Broadcasting
DVD: Digital versatile disc
FhG-IIS: Fraunhofer Gesellschaft – Institut für Integrierte Schaltungen
IEC: International Electrotechnical Commission
IRT: Institut für Rundfunktechnik GmbH (German broadcast engineering research centre)
ISDN: Integrated services digital network
ISO: International Organization for Standardization
ITU-R: International Telecommunication Union, Radiocommunication Sector
MPEG: Moving Picture Experts Group
MUSHRA: (EBU) MUlti Stimulus test with Hidden Reference and Anchors
NICAM: Near-instantaneous companding and multiplexing
NOS: Nederlandse Omroep Stichting (Holland)
NRK: Norsk rikskringkasting (Norway)
SR: Sveriges Television Ab (Sweden)
In order to reveal the differences among the systems under test, the material should be
sufficiently critical for each system to be tested. Searching for suitable material is often
time consuming; however, unless truly critical material is found for each system, tests
may fail to reveal differences among systems and may be inconclusive. On the other
hand, too-critical signals (e.g. synthetic, rather than “natural” broadcast programmes)
which are deliberately designed to break a specific system should not be used. Care
should be taken that the artistic or intellectual content of a programme sequence should
be neither so attractive nor so disagreeable or wearisome that the assessors are distracted
from focusing on the detection of impairments. The choice should reflect the expected
likelihood of occurrence of each type of programme material in actual broadcasts 5.
For the purpose of preparing subjective comparison test tapes, the loudness of each
excerpt needs to be adjusted subjectively by the group of skilled assessors – the so-called
“experts panel” – prior to recording it on the test media. This will allow subsequent use
of the test media at a fixed gain setting for all the programme items within a test trial.
For all test sequences, the group of skilled assessors shall convene and come to a consensus on the relative sound levels of the individual test excerpts. In addition, the experts
should come to a consensus on the absolute reproduced sound pressure level for the
sequence as a whole, relative to the alignment level. A tone burst 6 at alignment signal
level may be included at the head of each recording to enable its output alignment level
to be adjusted to the input alignment level required by the reproduction channel [8]. The
tone burst is only for alignment purposes: it should not be replayed during the test. The
sound-programme signal should be controlled so that the amplitudes of the peaks only
rarely exceed the peak amplitude of the permitted maximum signal defined in ITU-R
Recommendation ITU BS.645 [9] (a sine wave 9 dB above the alignment level).
The number of test items to be included in a test can vary but it should not be too large,
otherwise tests would simply be too long. A reasonable number seems to be around 1.5
times the number of systems under test, with a minimum of 5 items per system. Audio
sequences should typically be 10 s to 20 s long. All systems should be tested with the
same selection of test items.
The performance of a multichannel system, under the conditions of two-channel playback, shall be tested using a reference down-mix. Although the use of a fixed down-mix
may be considered to be restricting in some circumstances, it is undoubtedly the most
sensible option for use by broadcasters in the long run. The equations for the reference
down-mix [10] are:

$$L_0 = 1.00\,L + 0.71\,C + 0.71\,L_s$$
$$R_0 = 1.00\,R + 0.71\,C + 0.71\,R_s$$

5. This condition may be fulfilled with some difficulty since the nature of broadcast material may vary
from one station to another and may change in time as musical styles and preferences evolve.
6. For example: 1 kHz, 300 ms, -18 dBFS.
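In code, the down-mix is a one-liner per output channel; a sketch assuming numpy arrays L, R, C, Ls and Rs of equal length:

```python
import numpy as np

def reference_downmix(L, R, C, Ls, Rs):
    """Fold a 5-channel signal to stereo per the reference down-mix above."""
    L0 = 1.00 * L + 0.71 * C + 0.71 * Ls
    R0 = 1.00 * R + 0.71 * C + 0.71 * Rs
    return L0, R0
```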
It goes without saying that the pre-selection of suitable test excerpts for the critical evaluation of the performance of a reference two-channel down-mix should be based on the
reproduction of two-channel down-mixed programme material.
4.8. Listening conditions
The listening tests should be conducted under strictly-controlled conditions as specified
in Sections 7 and 8 of ITU-R Recommendation BS.1116-1. Either headphones or loudspeakers are allowed. However, the use of both within one test session is not permitted.
All assessors must use the same type of transducer.
Individual adjustment of listening level by an assessor is allowed within a session and
should be limited within the range of ± 4 dB relative to the reference level defined in
BS.1116-1. The balance between the test items in one test should be provided by the
selection panel in such a way that the assessors would normally not need to perform individual adjustments for each item. Level adjustments inside one item should not be
allowed.
4.9. Statistical analysis
The statistical analysis of the results obtained is perhaps one of the most demanding
tasks. Its purpose is to apply some mathematical operations to the raw data obtained, and
then present the results in a user-friendly manner.
The assessments for each test condition are converted linearly from measurements of
length on the score sheet to normalized scores in the range 0 to 100, where 0 corresponds
to the bottom of the scale (bad quality). Then, the absolute scores are calculated as follows.
Calculation of the averages of the normalized scores of all listeners who remain after
post-screening will result in the Mean Subjective Scores (MSS).
The first step in the analysis of the results is the calculation of the mean score, $\bar{u}_{jk}$, for
each of the presentations:

$$\bar{u}_{jk} = \frac{1}{N} \sum_{i=1}^{N} u_{ijk} \qquad (1)$$

where:
$u_{ijk}$ is the score of observer $i$ for a given test condition $j$ and sequence $k$;
$N$ is the number of observers.

Similarly, overall mean scores, $\bar{u}_j$ and $\bar{u}_k$, could be calculated for each test condition
and each test sequence.

When presenting the results of a test, all mean scores should have an associated confidence
interval which is derived from the standard deviation and size of each sample.
It is proposed to use the 95% confidence interval which is given by:

$$\left[\,\bar{u}_{jk} - \delta_{jk},\ \bar{u}_{jk} + \delta_{jk}\,\right] \quad \text{where} \quad \delta_{jk} = 1.96\,\frac{S_{jk}}{\sqrt{N}} \qquad (2)$$

The standard deviation for each presentation, $S_{jk}$, is given by:

$$S_{jk} = \sqrt{\sum_{i=1}^{N} \frac{(\bar{u}_{jk} - u_{ijk})^2}{N - 1}} \qquad (3)$$
With a probability of 95%, the absolute value of the difference between the experimental
mean score and the “true” mean score (for a very high number of observers) is smaller
than the 95% confidence interval, on condition that the distribution of the individual
scores meets certain requirements.
Similarly, a standard deviation $S_j$ can be calculated for each test condition. It should
be noted, however, that in cases where a small number of test sequences is used, this
standard deviation will be influenced more by differences between the test sequences than
by variations between the assessors participating in the assessment.
Experience has shown that the scores obtained for different test sequences are dependent
on the criticality of the test material used. A more complete understanding of system
performance can be obtained by presenting results for different test sequences separately,
rather than only as aggregated averages across all the test sequences used in the assessment.
For each test parameter, the mean and 95% confidence interval of the statistical distribution of the assessment grades must be given.
5. The EBU tests
The following seven audio codecs were tested:
Microsoft Windows Media 4
MPEG-2 AAC (implementation by FhG-IIS)
MP3 (close to MPEG-1 and MPEG-2 Layer III, implementation by Opticom)
Q-Design Music Codec 2
RealNetworks 5.0
RealNetworks G2
Yamaha SoundVQ
Each of these codecs was tested at five different bit-rates: 16, 20, 32, 48 and 64 kbit/s.
The test was divided into five sessions, according to the five different bit-rates used.
In each of these sessions (with the exception of Sessions 4 and 5; see footnote 7), all
seven codecs were tested. The session plan is listed below and restated, for illustration,
in the sketch that follows the list:
Session 1: codecs at 16 kbit/s, mono;
Session 2: codecs at 20 kbit/s, stereo;
Session 3: codecs at 32 kbit/s, stereo;
Session 4: codecs at 48 kbit/s, stereo;
Session 5: codecs at 64 kbit/s, stereo.
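A possible way to write this plan down as a small configuration structure in Python (the
codec identifiers are informal abbreviations, not official product names):

    CODECS = ["WinMedia4", "AAC", "MP3", "QDesign2", "Real5", "RealG2", "SoundVQ"]

    # Session plan: session number -> (bit-rate in kbit/s, channel mode, codecs).
    SESSIONS = {
        1: (16, "mono",   list(CODECS)),
        2: (20, "stereo", list(CODECS)),
        3: (32, "stereo", list(CODECS)),
        # RealAudio 5 did not support 48 and 64 kbit/s (see footnote 7).
        4: (48, "stereo", [c for c in CODECS if c != "Real5"]),
        5: (64, "stereo", [c for c in CODECS if c != "Real5"]),
    }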
The test material was partly taken from earlier Internet Radio listening tests, but also
comprised completely new material. The test material consisted of critical, but ordinary
broadcast material. It contained pure speech, speech together with music or background
noise, as well as music only. The length of the sequences was set to a maximum of 17 s,
with a typical length of about 12 s.
The audio items shown in Table 2 were used for the MUSHRA tests.
The bitstreams produced by the encoders under test at the IRT were sent to T-Nova
(Berkom) for verification. The bit-rate of each test item was checked by relating the size
of the encoded file to the length of the sequence (see the sketch below).

Then, all the bitstreams were decoded or replayed for a subjective check of the technical
quality of the items, in order to find any errors that were not caused by the
encoding-decoding process. This also provided an additional check of the bit-rate, as
shown in the display of the decoder or player.
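That check is simple arithmetic: the effective bit-rate is the file size in bits divided
by the duration of the item. A Python sketch follows; the 5% tolerance is an assumed value
used only for illustration:

    import os

    def effective_bitrate_kbps(path, duration_s):
        """Effective bit-rate of an encoded item: file size in bits over duration."""
        return os.path.getsize(path) * 8 / duration_s / 1000.0

    def bitrate_ok(path, duration_s, nominal_kbps, tolerance=0.05):
        """Flag items whose effective bit-rate deviates from the nominal bit-rate
        by more than the (assumed) 5% tolerance."""
        actual = effective_bitrate_kbps(path, duration_s)
        return abs(actual - nominal_kbps) <= tolerance * nominal_kbps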
7. One of the codecs (RealAudio 5) did not support 48 and 64 kbit/s and could not be
   tested in Sessions 4 and 5.
Table 2 – Audio test items which were selected for the listening tests

 No. | Type of audio content    | Audio item                               | Recorded by | Comments
-----+--------------------------+------------------------------------------+-------------+-------------------------------------
  1  | Classical music          | Mozart: Requiem – beginning of Dies Irae | IRT         | New item
  2  | Broadcast programme      | Female speech (Dutch) & music            | NOB         | Used already by EBU B/IR group
  3  | Broadcast programme      | Female speech (Danish)                   | DR          | Used already by EBU B/IR group
  4  | Folk music               | Swedish folk music                       | SR          | Used in ITU-R tests (ITU-R TG 10/2)
  5  | Live broadcast programme | Ice-hockey commentary                    | IRT         | New item
  6  | Jazz music               | Lee Ritenour                             | GRP Records | New item
  7  | Broadcast programme      | Male speech (Danish)                     | DR          | Used already by EBU B/IR group
  8  | Pop music                | Chris Rea – On the Beach                 |             | New item
  9  | Pop music                | Suzanne Vega – Tom's Diner               |             | Used already in previous MPEG tests
6. Summary of test results
The EBU listening tests on Internet audio coding schemes confirmed that the new
MUSHRA methodology provides small confidence intervals and thus reliable and stable
results. The tests also showed that the evaluation results are repeatable and reproducible.
In the following, the main results of each session are described. The main test results
are given in Fig. 2, and more detailed results are available in [4].
[Figure 2: Mean and 95% confidence interval at the various bit-rates tested, compared with
the hidden reference and the bandwidth-limited anchor signals. Panels: a) 16 kbit/s, mono;
b) 20 kbit/s, stereo; c) 32 kbit/s, stereo; d) 48 kbit/s, stereo; e) 64 kbit/s, stereo;
f) hidden reference, 3.5 and 7 kHz low-pass anchor signals; g) MPEG-2/4 AAC, MS Windows
Media 4, Opticom MP3 and RealNetworks G2; h) Q-Design Music Codec 2, RealNetworks 5 and
Yamaha TwinVQ; i) RealNetworks 5 and RealNetworks G2.]

6.1. Results for 16 kbit/s per mono signal

The results for a bit-rate of 16 kbit/s per mono signal are given in Fig. 2a. They show
that the quality provided by all the tested codecs at this bit-rate is significantly lower
than the subjective quality of the 7 kHz low-pass anchor; moreover, no codec is better
than the 3.5 kHz low-pass anchor. The difference between the
different codecs seems to be relatively small, with a grade of about 40 for the best and 25
for the worst.
However, looking at the figures with the detailed results, in particular those which show
the individual test items per codec, it becomes obvious that there are large differences
among the codecs. For example, at 16 kbit/s, the Q-Design Music Codec 2 gives very good
quality with all the music-only items: the quality of the folk music item is no different
from that of the 7 kHz low-pass anchor, which is in the range of "good" quality, and the
same behaviour can be found for the jazz item. However, this Q-Design codec does not
perform so well in cases where music is overlaid by a human voice, or with speech-only
items.
6.2. Results for 20 kbit/s per stereo signal
The results for a bit-rate of 20 kbit/s per stereo signal are given in Fig. 2b. These results
show that the quality provided by all the tested codecs is still significantly lower than the
subjective quality of the 7 kHz low-pass anchor. As in the case of 16 kbit/s mono, the
quality at 20 kbit/s per stereo signal is also lower than that of the 3.5 kHz low-pass
anchor. Comparing the results of Sessions 1 and 2 (i.e. Figs. 2a and 2b), the subjective
quality of the 20 kbit/s stereo signal is slightly worse than that of the 16 kbit/s mono signal, for most of the codecs tested. However, in the case of the low-pass filtered anchors,
there is no difference between Figs. 2a and 2b (because the only difference between
those sessions was that monophonic signals were used in Session 1 and stereophonic in
Session 2).
Again, the Q-Design Music Codec 2 showed a very peculiar behaviour. With the two
music-only items, it demonstrated good quality; in the case of the folk song, the stereo
performance was even better than in the mono case. However, as soon as human voices were
involved in an audio item, the quality of the Q-Design Music Codec 2 dropped
significantly.
6.3. Results for 32 kbit/s per stereo signal
The results for a bit-rate of 32 kbit/s per stereo signal are given in Fig. 2c. The most
obvious result here is that, at this bit-rate, the differences between the various codecs
become more pronounced. The difference between the best and the worst codec is about
25 points on the 100-point scale, whereas this difference was only about 15 points in the
case of 16 kbit/s mono. The better codecs are already approaching the subjective quality
of the 3.5 kHz low-pass anchor.
6.4. Results for 48 kbit/s per stereo signal
The results for a bit-rate of 48 kbit/s per stereo signal are given in Fig. 2d. The
MPEG-2/4 AAC and the Opticom MP3 codecs exhibit a "fair" quality level, comparable to that
of the 7 kHz low-pass filtered anchor. Microsoft Windows Media 4, Q-Design Music Codec 2,
RealNetworks G2 and Yamaha TwinVQ are similar to the 3.5 kHz low-pass filtered anchor. It
should be pointed out that, for certain audio items (e.g. folk music), the
quality of the Windows Media 4 codec was indistinguishable from the hidden reference,
whereas the MPEG-2/4 AAC and Opticom MP3 codecs produced a mean value of only
63, i.e. in the range of “good” quality. Considering the results of the Q-Design Music
Codec 2, it is interesting to note that the quality at 48 kbit/s did not increase significantly
over the quality assessed at 20 kbit/s, for most of the audio items.
6.5. Results for 64 kbit/s per stereo signal
The results for a bit-rate of 64 kbit/s per stereo signal are given in Fig. 2e. Several
codecs showed very promising results at this bit-rate. In particular, the MPEG-2/4 AAC
codec came close to the hidden reference, achieving an overall average of 80 points. It
was the only codec in the 64 kbit/s test which was evaluated in the “excellent” range for
all the items. Both the MPEG-2/4 AAC codec and the Microsoft Windows Media 4
codec exceeded the quality of the 7 kHz low-pass filtered anchor. The difference between
the best and the worst codec was more than 40 points, i.e. the quality differences between
the various codecs were greater at this bit-rate than at the lower ones.
6.6. Results for the hidden anchor and low-pass filtered
anchors
As shown in Fig. 2f, the Confidence Interval (CFI) for the full-bandwidth reference signal increased at 48 and 64 kbit/s. This was because some of the subjects failed to detect
(identify) the hidden reference during the 48 and 64-kbit/s tests. This shows that, even at
the relatively low bit-rates considered in these tests, some codecs are capable of offering
a quality comparable to the full-bandwidth reference.
In most cases, the 7 kHz anchor was evaluated in the range "good" at all the bit-rates
tested. The rating of the 7 kHz anchor, however, dropped somewhat as the bit-rate was
increased, which means that the evaluation of the 7 kHz anchor has some dependency on the
bit-rates being evaluated.

The 3.5 kHz anchor was evaluated well within the range "fair" at all the bit-rates tested.
Again, there was a tendency for the rating of the 3.5 kHz anchor to drop when the bit-rate
was increased. However, the confidence intervals seem to overlap when
comparing the lowest and the highest bit-rates tested, which indicates that the MUSHRA
method is an absolute grading system which gives stable and reliable results.
6.7. Mean and 95% confidence interval

Figs. 2g, 2h and 2i depict the mean values of the scores and the 95% confidence intervals
for the different bit-rates. These charts show that the measurements were very consistent,
thus confirming the validity of the MUSHRA method.
7. Main features of the codecs tested
7.1. Microsoft Windows Media 4
This audio system, based on Windows Media Technologies 4.0 and revealed at NAB 99,
has two basic codecs, specifically designed for encoding music and voice content. The
encoding speed is rather fast, allowing real-time encoding on a standard PC, and is
comparable to that of RealNetworks G2. The multi-threaded architecture increases encoding
performance when using more than one processor; i.e. dual-processor systems encode at
nearly twice the speed of single-processor systems. MS Media 4
Audio offers a very wide bit-rate range from 5 kbit/s to 128 kbit/s with an 8 kHz to
48 kHz sampling rate, in both mono and stereo. The Media 4 codec is a proprietary system, developed by Microsoft. The version which was tested was an update from August
1999.
For the encoding of voice, Windows Media 4 uses a specially-designed voice codec for
compressing the human voice, producing high-quality wideband audio at very low bit-rates.
It is based on ACELP technology and supports bit-rates from 5 kbit/s to 16 kbit/s. This
codec was developed by Sipro Lab Telecom.
With Windows Media Technologies version 4.0, content providers can offer as many as
five different bit-rates (multi-bit-rate streams) for both on-demand and live streams in a
single Advanced Streaming Format (ASF) file. When Windows Media Services and
Windows Media Player connect, they automatically determine the available bandwidth.
The server then selects and serves the appropriate audio stream. If the available bandwidth changes during a transmission, the server will automatically detect this and switch
to a stream with a higher or lower bit-rate.
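The stream-selection logic described above can be pictured roughly as follows. This is a
hypothetical Python sketch, not Microsoft's actual implementation, and the bit-rates in
the example are merely illustrative:

    def select_stream(available_kbps, stream_bitrates_kbps):
        """Pick the highest-bit-rate stream that fits the measured bandwidth;
        fall back to the lowest stream if none fits. Re-evaluating this choice
        whenever the measured bandwidth changes gives the switching behaviour
        described above."""
        fitting = [b for b in sorted(stream_bitrates_kbps) if b <= available_kbps]
        return fitting[-1] if fitting else min(stream_bitrates_kbps)

    # Example: a five-rate file, as allowed by Windows Media Technologies 4.0.
    rates = [16, 20, 32, 48, 64]
    assert select_stream(40, rates) == 32
    assert select_stream(10, rates) == 16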
7.2. MPEG-2, MPEG-4 AAC
AAC forms part of the MPEG-2 and MPEG-4 standards. It uses waveform coding,
based on the modified discrete cosine transform (MDCT) of variable length.

[Figure 3: Selected results from the codecs tested. Panels: a) Windows Media 4 at
48 kbit/s; b) MPEG AAC at 32 kbit/s; c) MP3 at 20 kbit/s; d) Q-Design Music Codec 2 at
16 kbit/s; e) RealNetworks Real 5 at 20 kbit/s.]

To prevent AAC from becoming a medium for music piracy, it is currently only available in
secure formats. At present, an Internet application of AAC is only available from Liquid
Audio. This specific implementation does not support live streaming, nor does it allow
replay of AAC-encoded files from normal servers; a specially-certified Liquid Audio server
is needed. Currently, the system is therefore applicable only to the secure distribution
of music over the Internet. Other implementations of AAC for use on the Internet are
expected to become available soon. Besides the Internet, AAC will be used in the Japanese
HDTV system.
The AAC coder used in this test was the MPEG-2 AAC Main profile encoder according
to ISO/IEC 13818-7, implemented by FhG-IIS. AAC was used with four sampling rates
between 8 and 32 kHz, depending on the bit-rates in use.
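As a rough illustration of the transform involved, here is a direct (unoptimized) Python
sketch of the MDCT of one block. Real AAC encoders add windowing, fast transform
algorithms and switching between long and short block lengths, all of which are omitted:

    import math

    def mdct(x):
        """Direct modified discrete cosine transform: maps a block of 2N samples
        to N coefficients. The 50% overlap between successive blocks and the
        analysis window are left to the caller."""
        two_n = len(x)
        n = two_n // 2
        return [
            sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(two_n))
            for k in range(n)
        ]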
7.3. MPEG-1, MPEG-2 Layer 3 (MP3)
MP3 denotes a special file format which is mainly used for the streaming or downloading of
audio files, but also for broadcasting applications (e.g. contributions via ISDN, or the
WorldSpace satellite broadcasting system). MP3 is based on the ISO/IEC MPEG Layer
3 standard. There exist several implementations of MP3 encoders and plenty of decoder
implementations on the market. The most popular encoders are AudioActive (from
Telos Systems), MP3 Producer (from Opticom) and MP3 Live! (from Xing Technologies). All
these implementations provide both the standardized sampling rates of ISO/IEC 11172-3
and ISO/IEC 13818-3 and a proprietary extension to very-low sampling
rates, named "MPEG-2.5". The MP3 Live! encoder – together with the Xing StreamWorks
MP3 streaming technology – or the AudioActive system, using the Microsoft Advanced
Streaming Format, are usually used for the live streaming of MP3.
For the EBU tests, Opticom’s software encoder and decoder were used. At bit-rates of
48 kbit/s and 64 kbit/s, MP3 was used fully compliant with the MPEG standards whereas
at the lower bit-rates, a sampling frequency of 11 kHz (from the MPEG-2.5 extension)
was used.
7.4. Q-Design Music Codec 2
This codec runs under the QuickTime 4.0 multimedia platform which, previously, was
designed only for the downloading of audio and/or video; since the first public release of
the QuickTime 4.0 beta version in April 1999, however, live streaming is also supported.
The Music Codec 2 is based on a completely new, proprietary, parametric coding system, of
which details are not available. The public version, which ships without charge along with
the QuickTime 4.0 platform, takes a lot of processing power and is thus very slow;
real-time encoding is more or less impossible with this version. A professional version,
which automatically adjusts itself to all the necessary refinements involved in audio
processing, offers a significantly higher processing speed, allowing real-time coding on a
current standard PC or Mac. A new prototype version, not commercially available at the
time, was used for the EBU tests. The sampling rate was fixed at 44.1 kHz at all the
bit-rates tested (see footnote 8).

8. Results below a bit-rate of 32 kbit/s may not be valid for this codec, because a lower
   sampling frequency might have shown better results.
7.5. RealAudio 5.0 and RealNetworks G2
The RealAudio encoder and decoder constitute a proprietary coding system which supports
different coding options through different flavours of the codec.
The RealNetworks G2 audio system is used exclusively for live streaming of audio or the
streaming of audio files. However, the creation of WAV or AIFF files is disabled for
copy protection reasons. The new G2 system – based on DolbyNet coding technology –
provides a big step forward when compared with RealAudio 5.0, thanks to its scalability.
To this end, G2 can be used simultaneously on ISDN networks at 64 kbit/s as well as
with a modem of only 14.4 kbit/s capacity. A number of parallel streams, typically up to
six, can be created simultaneously within one audio file. The system flexibly allows the
quality to be reduced if the available bandwidth reduces (as frequently occurs during
Internet rush-hour periods). This facility can be compared to the Intelligent Streaming
system used by Windows Media 4.0.
7.6. Yamaha SoundVQ
The Yamaha SoundVQ is a TwinVQ (Transform-domain Weighted Interleave Vector
Quantization) coder. It is based on an audio compression technology developed by the
NTT Human Interface Laboratories, in which patterns are developed from multiple units
of data and compared with standard patterns: compressed code for similar patterns is
transmitted. This provides high quality and high compression ratios. The TwinVQ algorithm
has been standardized within MPEG-4 Audio. SoundVQ is not limited to the distribution of
audio data from home pages; it can also be used for voicemail or audio bulletin
boards, or for CD-ROMs containing large amounts of audio data. By using the SoundVQ
"encoder", anyone can easily create data for distribution. The compression ratio can be
selected, allowing the audio data to be compressed to between 1/10th and 1/20th of its
original size. Since encoded files do not require a special server for distribution,
individuals may distribute audio data regardless of their Internet service provider. The
“player” is used in conjunction with Internet browsing software, and allows audio to be
played back from the user’s computer, simply by accessing a homepage.
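The underlying idea – transmitting only the index of the best-matching standard pattern –
can be illustrated with a toy vector quantizer in Python. The two-dimensional codebook is
invented for the example and bears no relation to the actual TwinVQ codebooks:

    def nearest_codeword(vector, codebook):
        """Return the index of the codebook pattern closest (in squared error) to
        the input vector; only this index needs to be transmitted."""
        def sqerr(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return min(range(len(codebook)), key=lambda i: sqerr(vector, codebook[i]))

    # Toy example: 2-dimensional patterns, 4-entry codebook -> 2 bits per vector.
    codebook = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (0.0, 1.0)]
    assert nearest_codeword((0.4, 0.6), codebook) == 1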
8. General conclusions
These EBU tests on Internet audio codecs represent a major collaborative achievement
among EBU members. They also confirm the well-established EBU role in performing
large-scale independent and commercially-neutral evaluations of advanced digital technologies. Following a thorough examination of the test results, the following main conclusions may be drawn:
– The AAC codec is the only one in the tests which was evaluated in the range
  "Excellent" at 64 kbit/s, for all the audio items evaluated.

– The Q-Design and RealNetworks 5 codecs produced, over most of the audio items
  assessed, a grading in the range "Poor" or "Bad", independent of the bit-rate used.

– At 16 kbit/s, the confidence intervals of the MPEG-2/4 AAC coder are fully or
  partly within the range of "Fair", except for two items (i.e. Male and Classics). At
  64 kbit/s, the confidence interval is fully or partly within "Excellent", with the
  exception of two items (i.e. Ice-hockey and Classics).

– MS Windows Media 4 has a quite non-uniform distribution over the different
  audio items and bit-rates. At 16 kbit/s, the quality varies mainly between the
  ranges "Fair" and "Poor". At 64 kbit/s, depending on the audio item tested, the
  quality level could be "Excellent", "Good", "Fair" or even "Poor".

– The Opticom codec's quality is mainly in the range "Poor" at the lowest bit-rate,
  and mainly "Good" at the highest bit-rate.

– The quality of the Q-Design Music Codec 2 depends very much on the nature of
  the audio item, and not very much on the chosen bit-rate. The items Folk and
  Jazz reach a quality level of "Good" even at the lowest bit-rate, but most of the
  remaining items are placed in the category "Fair" or "Bad" even at the highest
  bit-rate.

– The RealNetworks 5 codec was tested only at the three lowest bit-rates under
  test: 16 kbit/s, 20 kbit/s and 32 kbit/s. The quality evaluation of this codec is
  mainly in the category "Fair" and is independent of the bit-rate.

– The RealNetworks G2 codec shows, at 20 kbit/s, a significantly worse quality
  than at 16 kbit/s mono. At 32 kbit/s, it offers a similar quality to 16 kbit/s mono,
  i.e. it seems that Real G2 does not gain from any joint stereo coding. Due to the
  decoded signal's higher frequency response at 48 kbit/s, compared with 32 kbit/s,
  the quality is even worse than at 32 kbit/s. At 64 kbit/s, the quality is in the
  range of "Good" to "Fair" for most of the tested signals.
Franc Kozamernik graduated in 1972 from the Faculty of Electrotechnical
Engineering, University of Ljubljana, Slovenia. Since 1985 he has been with
the European Broadcasting Union (EBU). As a Senior Engineer, he has been
involved in a variety of engineering activities, ranging from digital audio
broadcasting and audio source coding to the RF aspects of the various
audio and video broadcasting system developments. In particular, he contributed to the development and standardization of the DAB and DVB systems.
Currently Mr Kozamernik is the co-ordinator of several EBU research and development Project
Groups including B/AIM (Audio in Multimedia) and B/BMW (Broadcasting of Multimedia on the
Web). He is also involved in several IST collaborative projects, such as SAMBITS (Advanced
Services Market Survey / Deployment Strategies and Requirement / Specification of Integrated
Broadcast and Internet Multimedia Services), Hypermedia and S3M.
Franc Kozamernik was instrumental in establishing the EuroDAB Forum in 1994 to promote
and roll out DAB, and acted as the Project Director of the WorldDAB Forum until the end of
1999. He represents the EBU in Module A of the WorldDAB Forum. He is also a member of
the World Wide Web Consortium (W3C) Advisory Committee.
Gerhard Stoll studied electrical engineering, with the main emphasis on
communications theory and psycho-acoustics, at the universities of Stuttgart and Munich. In 1984 he joined the IRT – the research centre of the
public broadcasters in Germany, Austria and Switzerland – and became
head of the psycho-acoustics group. At the IRT, he was responsible for the
development of the MPEG-Audio Layer II standard.
Mr Stoll has also been a member of various standardization groups – such as MPEG,
Eureka-147, DAB, DVB and the EBU – involved in setting up international standards for
broadcasting. For his contributions in the area of low
bit-rate audio coding, he received the Prof. Lothar Cremer Award of the German Acoustical
Society, and the Fellowship Award of the Audio Engineering Society (AES). As a senior engineer at the IRT, he is now in charge of advanced multimedia broadcasting and information
technology services.
9. Acknowledgements
The authors would like to warmly thank the members of the B/AIM project group who worked
hard in conducting the studies, carrying out the subjective tests and putting together the
final report which served as the basis for the present article. Particular thanks go to
Thomas Sporer (Fraunhofer Institute), for providing the software and user interface for
training and for conducting the tests, as well as for the statistical analysis of the
results; Tor Vidar Fosse (NRK) and Michael Harrit (DR), for providing the assessors and
for conducting the listening tests; Ulf Wüstenhagen (T-Nova), for verification of the test
material; and the other members of the EBU Project Group B/AIM, for their comments and
advice.
10. References
[1] ETS 300 163: Television systems; NICAM 728: Specification for transmission
of two-channel digital sound with terrestrial television systems B, G, H, I and L
http://www.etsi.org/
[2] ISO/IEC 11172-1:1993: Information technology -- Coding of moving pictures
and associated audio for digital storage media at up to about 1,5 Mbit/s
http://www.cselt.it/mpeg/standards/mpeg-1/mpeg-1.htm
[3] ITU-R Recommendation BS.1116-1: Methods for the subjective assessment of
small impairments in audio systems including multichannel sound systems
http://www.itu.int/search/index.html
[4] BPN 029: EBU Report on the Subjective Listening Tests of Some Commercial
Internet Audio Codecs
Contribution of EBU Project Group B/AIM, June 2000.
[5] Preliminary Draft New Recommendation, ITU-R document 10-11Q/TEMP/33:
A method for subjective listening tests of intermediate audio quality - Contribution from the EBU to ITU Working Party 10-11Q
http://www.itu.int/itudoc/itu-r/sg11/docs/wp10-11q/1998-00/contrib/56005.html
INTERNET AUDIO
[6] ITU-R Recommendation BS.562: Subjective assessment of sound quality
http://www.itu.int/plweb-cgi/fastweb?getdoc+view1+itudoc+12352+1++BS.562
[7] ITU-R Recommendation BT.500: Methodology for the subjective assessment of
the quality of television pictures
http://www.itu.int/plweb-cgi/fastweb?getdoc+view1+itudoc+12310+6++BT.500
[8] EBU Recommendation R 68-1992: Alignment level in digital audio production
equipment and in digital audio recorders
http://www.ebu.ch/tech_texts.html
[9] ITU-R Recommendation BS.645: Test signals and metering to be used on international sound programme connections
http://www.itu.int/plweb-cgi/fastweb?getdoc+view1+itudoc+12361+1++BS.645
[10] ITU-R Recommendation BS.775: Multichannel stereophonic sound systems with
and without accompanying picture
http://www.itu.int/plweb-cgi/fastweb?getdoc+view1+itudoc+12373+0++BS.775