Digital Sound Reconstruction Using Arrays of CMOS-MEMS Microspeakers

Brett M. Diamond
2002

Advisor: Prof. Gabriel
Electrical & Computer Engineering

DSR Sound Recordings
Digital Sound Reconstruction Using Arrays of CMOS-MEMS Microspeakers

by
Brett Matthew Diamond
B.S. Carnegie Mellon University (2000)

A project submitted in partial satisfaction of the requirements for the degree of
Master of Science
in
Electrical and Computer Engineering

Carnegie Mellon University

Committee in charge:
Professor Kaigham Gabriel
Professor Richard Stern

2002
To my friends, family, and those without a cure
Table of Contents

Acknowledgments

Chapter 1: Introduction
  1.1 History of Sound Reconstruction
  1.2 Overview of Research
  1.3 Summary

Chapter 2: Digital Sound Reconstruction
  2.1 Concept
  2.2 Bit-Grouped Digital Array
  2.3 Requirements of Digital Sound Reconstruction

Chapter 3: CMOS-MEMS Microspeaker
  3.1 Introduction
  3.2 Fabrication of CMOS-MEMS Microspeaker
  3.3 Dynamic Modeling of CMOS-MEMS Microspeaker
  3.4 Acoustic Response of CMOS-MEMS Microspeaker

Chapter 4: Digital Speaker Arrays
  4.1 Introduction
  4.2 Characterization of Digital Speaker Arrays
    4.2.1 Uniformity
    4.2.2 Linearity
    4.2.3 Directivity
  4.3 Harmonic Distortion

Chapter 5: Experimental Results
  5.1 Introduction
  5.2 Test Setup
    5.2.1 Generation of Digital Signals
    5.2.2 High Voltage Amplifiers
    5.2.3 Acoustic Measurement and Analysis
  5.3 Characterization of Acoustic Response
    5.3.1 Pulse Amplitude
    5.3.2 Pulse Width
    5.3.3 Pulse Type
    5.3.4 Summary
  5.4 Sound Reconstruction Measurements
    5.4.1 Pulsing of Sequential Bits
    5.4.2 The Bit-by-Bit Reconstruction of a 400 Hz Sinusoid
    5.4.3 Effect of Low Pass Filter
    5.4.4 Characterization of Reconstructed Waveforms
    5.4.5 Effect of Sampling Frequency
    5.4.6 Frequency Characterization
    5.4.7 Reconstruction of Complex Periodic Waveforms
    5.4.8 Musical Recordings
    5.4.9 Summary

Chapter 6: Conclusions and Future Work

References

Appendix
  A. Circuit Schematics
  B. Index of Sound Recordings
Acknowledgements

I walked into graduate school not knowing much about MEMS or acoustics. I was a musician and an engineer with a focus on data storage from my undergraduate years. But I was inspired and trained by some of the best people that Carnegie Mellon University has to offer, and I would like to thank them individually for their contributions to this research.

First, I would like to express my sincere gratitude to my research advisor, Ken Gabriel. With his experience in industry, academia, and government, Professor Gabriel provided me with the freedom to explore and the guidance to succeed. His enthusiasm for this project motivated me through the difficult times of graduate school and prevented me from losing sight of my goals. Despite his devotion to his family, he would take a few hours out of his weekend to meet with me when needed. Our professional and personal relationship has provided me with a well-rounded research experience.

Professor Richard Stern has a passion for signal processing and acoustic applications that made him perfect for my committee. I enjoyed taking his DSP course and our frequent discussions on digital sound reconstruction. Professor Adnan Akay introduced me to the theoretical world of acoustics and tolerated my many questions with incredible patience. I would like to thank him for his counsel and his suggestions regarding my thesis.

My friend and colleague, John Neumann, also deserves many thanks. Although I only have one official advisor, John has been an amazing mentor and a source of countless anecdotes that always seem to put a smile on my face. His practical understanding of acoustics and sound recording made him a gold mine of useful information.

All the devices described in this paper were fabricated at the Carnegie Mellon Nanofabrication Facility. I would like to thank the nanofab staff for their support during the countless hours spent fabricating my devices.

The foundation of any good research requires a healthy environment. The MEMS Laboratory at Carnegie Mellon University provided me with the resources and companionship necessary to survive graduate school. First, I would like to thank our lab administrator, Mary Moore, for her support, patience, and endless supply of chocolate that maintained my sugar level well into the late evenings. Next, I would like to thank my officemates: Dan Gaugel, Matte Zeleznick, George Lopez, Kevin Frederick, and Janet Stillman. Their technical knowledge and friendship made my graduate experience worthwhile.

Humor is the cure for almost anything, and one should never cease to befriend those who provide oodles of jokes and laughter. For keeping me sane, I would like to thank my closest friends: Phil Odenz, Mike Beattie, Ed Latimer, Mark Tyberg, Kirstin Connors, and the Trekfest gang.

I would also like to thank two very special people who have been instrumental in my fight against Crohn's Disease: Dr. Richard Duerr and Beth Rothert in the Digestive Disorder Center at UPMC, Pittsburgh. Graduate school would not have been possible for me without their aid.

Finally and most importantly, I would like to thank my immediate and extended family for their constant and strong support throughout my life. My parents, Allen and Harriet Diamond, have shown selfless dedication towards my health, happiness, and success. They taught me to look past the stress of my current situation and to relax and have fun. My brothers, Seth and Eric, were a constant source of advice on social issues and taught me to look ahead towards the future and not dwell on the past. I thank them, as well as my sister-in-law Sandy, and cousins Alicia and Justin, for their advice and support regardless of the time of day.
Chapter 1: Introduction

1.1 History of Sound Reconstruction

The invention of the electric telegraph by Samuel Morse in 1832 stimulated many inventors of that time to find new methods for recording and transmitting messages, including sound and music. Some of the earliest attempts at integrating an acoustic transmitter and an electric circuit occurred in Europe, but quickly influenced many American inventors such as Thomas Edison, Alexander G. Bell, Ernst Siemens, and Emile Berliner. Figure 1.1 highlights some of the important developments in sound reconstruction, beginning with the invention of the "dynamic" or moving-coil transducer in 1874 by Siemens and the telephone in 1876 by Bell. Both inventions involved vibrating a diaphragm with an electromagnet by placing a circular coil of wire inside a magnetic field. Thirty years later, the vacuum tube, used in analog electronics for the first half of the 20th century, was introduced. The voice-coil speaker was replaced in 1921 by the direct-radiator loudspeaker, which employs a magnetically actuated diaphragm to produce sound. This design is the prototype for most current analog speakers. The invention of the transistor in 1947 began the changeover from vacuum tubes to a cheaper, smaller, and faster technology: integrated circuits. In 1965, the application of the Fast Fourier Transform (FFT) to signal processing gave sound electronics the ability to perform real-time filtering and digital/analog conversion. Alongside this paradigm shift from analog electronics to digital signal processing (DSP), sound recording media have also changed, from the first microgroove disc records in 1948 to digital media like CDs and DVDs. The introduction of digital file formats such as MP3 has increased the popularity of these new media over traditional analog media like cassette tapes and LPs.

Figure 1.1 Timeline of Sound Reconstruction: 1874, "dynamic" or moving-coil transducer invented by Siemens; 1876, telephone invented by Bell; 1877, Berliner invents the first microphone; 1906, Fleming invents the first vacuum tube, known as the "thermionic valve"; 1921, Phonetron, the first direct-radiator loudspeaker; 1928, Nyquist proves the Sampling Theorem; 1929, Kellogg invents the electrostatic speaker; 1947, invention of the transistor by Shockley et al.; 1948, commercial 33-1/3 LP (Long Playing) microgroove disc introduced by Goldmark; 1948, Audio Engineering Society (AES) formed; 1963, compact stereo tape cassettes and players developed by Philips; 1965, era of digital signal processing (DSP) begins with the application of the Fast Fourier Transform (FFT) by Cooley and Tukey; 1982, Digital Compact Disc (CD) introduced by a Japanese conglomerate; 1990, Philips introduces a digital audio tape (DAT) recorder using a digital cassette; 1996, DVD (Digital Versatile Disc) increases the capacity of digital audio and video storage from 725 MB to 14 GB per double-sided disc.
1.2 Overview of Research

Despite 130 years of development in sound technology, the transducer is still the only remaining analog component in a world now dominated by digital media and electronics. Therefore, the conversion from analog transducer to digital transducer would be the last "piece of the puzzle" in achieving a completely digital system. This new paradigm of sound reconstruction, referred to as Digital Sound Reconstruction (DSR), would alleviate many of the inadequacies associated with traditional analog speakers. The practical limitations of an analog speaker limit its performance, particularly its frequency response and linearity. For example, it is difficult to produce low-frequency sounds with a small speaker. In addition, a digital-to-analog (D/A) converter must be used before electro-acoustic transduction to account for the incongruity between the analog speaker and the digital electronics responsible for filtering and other signal processing. The converter not only increases cost but introduces additional signal distortion.

Digital sound reconstruction is not a recent concept and has been theorized since the early 1980s in several patents [1,2]. One of the earliest patents involved the design of a fluid-flow control speaker system consisting of pipe openings of a specific area that correspond to a bit of a pulse code modulated (PCM) signal. Another patent describes the hypothetical creation of an array of plastic membranes that are pulsed in time to create a time-varying waveform interpreted by our ears as an analog signal. As far as we can tell, neither of those inventions was ever implemented. Several papers examine the acoustic issues behind the use of a theoretical array of speakers for the purpose of digitally reconstructing sound, including harmonic distortion, directivity, and linearity [3,4]. Again, these papers reflect simulations based on theory.
1.3 Summary

This thesis serves to cross that theoretical threshold by presenting experimental results of DSR using an array of microspeakers. Chapter 2 describes the DSR concept in detail and explains how a transducer array can be used to digitally reconstruct sound. The trade-offs between DSR and traditional analog speakers are also discussed.

The difficulty in demonstrating digital sound reconstruction stems from the high manufacturing costs associated with an essential high-quality array of speakers. However, the solution can be found in a technology that is well developed and useful in making uniform and repeatable components at low cost: Microelectromechanical Systems, or MEMS. This technology can be integrated with conventional CMOS electronics to create devices that intelligently interact with the environment, such as a microphone or speaker. Chapter 3 will cover the fabrication and dynamic modeling of a CMOS-MEMS microspeaker developed at Carnegie Mellon University. This particular speaker design was the foundation of the digital speaker arrays presented in our research.

In Chapter 4, we will branch out from a single microspeaker to discuss the issues concerning transducer arrays. Some of these issues are dependent on the size of the array, while others are more pertinent to the mode of operation: analog or digital. Four different speaker arrays useful for studying digital sound reconstruction are also presented, with the most current design providing the majority of our data.

The first few chapters of this thesis serve to establish the concept and issues behind digital sound reconstruction. Chapter 5 quantitatively investigates many of these issues and describes the laboratory setup used in our experiments. Afterwards, we will demonstrate that DSR is possible for a wide variety of sounds, including music, and compare how well the theory matches our experimental results. The last chapter summarizes the research done over the past two years and comments on future work based on the results presented. We feel that when it comes to sound and other acoustical topics, sometimes you have to hear it to believe it. Therefore, in addition to the many pressure waveforms and frequency analyses, we enclosed a CD of recordings made with a CMOS-MEMS digital speaker array.

So let us begin...
Chapter 2: Digital Sound Reconstruction

2.1 Concept

Traditional sound reconstruction techniques use one or a small number of analog speaker diaphragms with motions that are proportional to the sound being created. As shown in Figure 2.1, louder sound is generated by greater motion of the diaphragm, and different frequencies are produced by time-varying diaphragm motion.

Figure 2.1 Conventional analog sound reconstruction showing diaphragm position corresponding to different points in the sound waveform.

With Digital Sound Reconstruction (DSR), the desired sound waveform is generated from the summation of discrete pulses of acoustic energy produced by an array of speakers, or speaklets. These pulses or clicks contribute a small portion of the overall sound, so unlike analog speakers, DSR speaklets do not require a large dynamic range. Louder sound is generated from a greater number of speaklets emitting clicks, and different frequencies are produced by time-varying numbers of speaklets emitting clicks.

Figure 2.2 shows a graphical description of digital sound reconstruction. Initially, four speaklet clicks are needed to produce a certain amount of pressure (Figure 2.2(a)). At the next instant in time, three speaklets are used to generate a slightly smaller pressure (Figure 2.2(b)). As smaller pressures are needed, fewer speaklets are pulsed (Figure 2.2(c)). Conversely, more speaklets are pulsed when larger pressures are needed (Figure 2.2(d)).

Figure 2.2 Digital Sound Reconstruction (DSR) with a hypothetical 15-speaklet (4-bit) chip. An idealized sound pulse (click) is generated from a single speaklet's binary motion. Multiple speaklet binary motions at different times create a sound waveform.

The resulting summation of sounds or pressure variations produced by the array of speaklets has a magnitude corresponding to the analog value at discrete sampled time values. The number of speaklets in the array and the sampling rate (the frequency at which the number of pulsed speaklets is updated) will determine the resolution of the resulting waveform. Since the human ear inherently has the characteristics of a low-pass filter, the listener hears an acoustically smoother signal nearly identical to the original analog signal. In practice, however, various non-idealities associated with the acoustic pulse of a speaklet and transducer array introduce distortion into the system, preventing an exact reconstruction of the analog waveform. These issues will be discussed later.
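The click-counting idea above can be sketched in a few lines. This is a minimal, hypothetical simulation, not the thesis hardware: the array size (15 speaklets, 4 bits), the 16 kHz update rate, and the offset-binary mapping from signal value to click count are all illustrative choices.

```python
import math

# Hypothetical DSR sketch: a 4-bit array (15 speaklets) reproduces one
# period of a 400 Hz sinusoid by varying, at each sample instant, how
# many speaklets emit a click.

N_BITS = 4
N_SPEAKLETS = 2**N_BITS - 1   # 15 speaklets
FS = 16_000                   # sampling (update) rate, Hz
F_SIGNAL = 400                # target tone, Hz

def speaklets_for_sample(t):
    """Map the instantaneous signal value in [-1, 1] to a click count."""
    x = math.sin(2 * math.pi * F_SIGNAL * t)
    # Offset-binary quantization: 0 clicks at the negative peak,
    # all 15 at the positive peak.
    return round((x + 1) / 2 * N_SPEAKLETS)

# One full period of the tone (40 update intervals at these rates).
counts = [speaklets_for_sample(n / FS) for n in range(FS // F_SIGNAL)]
print(max(counts), min(counts))   # full swing: 15 down to 0
```

The staircase of click counts is what the ear's low-pass character smooths back into the underlying sinusoid.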
2.2 Bit-GroupedDigital Array
A digital transducerarray is requiredto implement
true, direct digital reconstruction
of sound.In a bitgroupeddigital array, eachtransduceris assigned
a bit group, wherethe numberof transducers in
eachgroup is binary weighted[3]. For example,
the 8-bit digital array shownin Figure2.3 haseight
different
groups of transducers.
The most
significant bit (MSB)contains27 or 128transducers
while the least significant bit (LSB)is represented
by a single speaker.In general,an n-bit array will
have 2n-1
Figure 2.3 Conceptualdiagramof an 8-bit transducer
array chip. Thenumber
of transducersin eachgroupis
binary weighted.Bondpads
surroundingthe array supply
driving voltagefromthe package.
transducersand the mth bit of that array will contain 2ml transducers. Whenthe
signal for a particular bit is high, all of the transducersin the groupassignedto the bit are
activatedfor that sample
interval.
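The bit-group bookkeeping above can be sketched directly. The helper names (`active_groups`, `transducers_pulsed`) are hypothetical, but the arithmetic follows the text: group m (m = 1 is the LSB) contains 2^(m-1) transducers, so the total number of transducers pulsed equals the sample value itself.

```python
# Sketch of bit grouping: decompose a PCM sample into the binary-weighted
# speaklet groups that must be driven high for that sample interval.

def active_groups(sample, n_bits=8):
    """Return the bit groups (1 = LSB, n_bits = MSB) driven high."""
    assert 0 <= sample < 2**n_bits
    return [m for m in range(1, n_bits + 1) if sample & (1 << (m - 1))]

def transducers_pulsed(sample, n_bits=8):
    """Total transducers emitting a click; equals the sample value."""
    return sum(2**(m - 1) for m in active_groups(sample, n_bits))

# A sample value of 200 drives groups 4, 7, and 8:
print(active_groups(200))        # [4, 7, 8] -> 8 + 64 + 128 = 200 clicks
```

The binary weighting is the whole point: no lookup table is needed, because the digital word itself selects which groups to pulse.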
2.3 Requirements of Digital Sound Reconstruction

In Section 2.1 of this chapter, the acoustic response or click of a speaklet was idealized as a rectangular pulse. It will be shown later that in practice, this pulse can never be perfectly rectangular. Regardless of the shape of the acoustic response, however, several requirements are essential to digitally reconstruct sounds using an array of transducers:

- The acoustic response of a single transducer should be fast, on the order of tens of microseconds. This includes the time it takes for the transducer to respond to the impulse as well as the time it takes for the response to decay to negligible levels. This will limit the sampling rates that can be used in converting digital information to an analog acoustic waveform. The response times in this thesis will be measured from the start of the impulse to the maximum pressure of the acoustic response.

- The acoustic response must be repeatable over time and uniform across all speaklets in the array. Variations in uniformity will introduce error into the digital-to-analog conversion. Designing the array on a single chip will minimize process variations during fabrication and control the overall speaklet uniformity.

- Regardless of whether the acoustic responses from multiple speaklets are linear or non-linear, the resulting acoustic energy from those speaklets must add linearly. This implies that the total pressure field in the region is the superposition of all pressure fields generated from the speaklets, or written mathematically:

    p_total(t) = SUM_{n=1..N} (A_n / l_n) sin(w (t - l_n / c) + PHI_n)

where N is the number of speaklets in the array, l_n is the path length from each speaklet to the listening point, and c is the speed of sound (343 m/s). A_n, w, and PHI_n are the source amplitude, frequency, and initial phase, respectively. Without linearity, the summation of acoustic energy (as shown in Figure 2.2) cannot be predicted.
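The superposition requirement can be checked numerically. Below is a minimal sketch of the summation above; the speaklet amplitudes, distances, and phases are illustrative values, not measured ones.

```python
import math

C = 343.0   # speed of sound, m/s (as in the text)

def total_pressure(t, speaklets, omega):
    """Superpose the pressure fields of N speaklets at a listening point.

    Each speaklet is a tuple (A_n, l_n, phi_n): source amplitude, path
    length to the listening point, and initial phase, matching the
    symbols in the text. The 1/l_n factor models spherical spreading.
    """
    return sum(A / l * math.sin(omega * (t - l / C) + phi)
               for (A, l, phi) in speaklets)

# Linearity check: two identical, in-phase speaklets at the same
# distance must produce exactly twice the pressure of one.
omega = 2 * math.pi * 400
one = total_pressure(1e-3, [(1.0, 0.01, 0.0)], omega)
two = total_pressure(1e-3, [(1.0, 0.01, 0.0)] * 2, omega)
print(two / one)   # -> 2.0
```

This doubling is precisely what Figure 2.2 assumes when it maps "more speaklets pulsed" to "proportionally larger pressure."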
Chapter 3: CMOS-MEMS Microspeaker

3.1 Introduction

Current MEMS applications require increasingly complex microsystems and more computational power. One approach to satisfying these technological demands is to incorporate MEMS structures with conventional CMOS electronics processing, commonly referred to as CMOS-MEMS integration. Desirable for systems with arrayed microsensors (e.g., microphones) and microactuators (e.g., speakers), this form of integration can leverage the speed, reliability, and economic benefits of CMOS processing. For example, on-chip electronics can reduce parasitic capacitance from interconnects between the electronics and structures as well as decrease packaging costs.

A variant of this CMOS-based micromachining process, developed at Carnegie Mellon University, has many advantages over other CMOS-MEMS techniques [5]. This process, described in the next section, can be used to produce high-aspect-ratio structures with narrow beam widths and gaps, with limited post-processing steps that are safe for CMOS electronics.

The design of the CMOS-MEMS microspeaker developed by Neumann et al. [6] provides the foundation for demonstrating digital sound reconstruction. A serpentine metal-and-oxide mesh pattern, shown in Figure 3.1, is repeated to form structures with dimensions up to several millimeters. The patterns are realized in a CMOS chip, then etched and released to form a suspended mesh, typically 5 to 60 um above the substrate. The mesh is coated with a polymer to create an airtight membrane, and then electrostatically actuated by applying a varying electrical potential between the CMOS metal and the silicon substrate. The resulting out-of-plane motion is the source of the pressure waves that produce sound.

Figure 3.1 SEM of serpentine mesh (1.5 um beams and gaps) after release from substrate.
3.2 Fabrication of CMOS-MEMS Microspeaker

In this section, we will describe the process used to create the original CMOS-MEMS microspeaker. The CMOS chip comes from a foundry facility (e.g., MOSIS, AMS) with a protective layer of silicon dioxide. As shown in Figure 3.2(a), regions meant for mechanical structures are usually patterned in one or more of the metal layers. To minimize the thickness of the membrane, only the bottom metal layer is utilized in the microspeaker mesh design. Before releasing the CMOS structures that define speakers on the front side of the chip, vent holes must be patterned and etched to connect the membrane cavity with the backside of the chip (Figure 3.2(b)). These holes, generally 15 to [...] um in diameter, are necessary to reduce the acoustic impedance behind the membrane and reduce unwanted resonances that would cause oscillations in the acoustic response.

Figure 3.2(a) 3D view of CMOS chip from a foundry facility, covered with a protective layer of silicon dioxide (overglass), showing the metal layers, CMOS circuitry, and silicon substrate.

After the vent holes are etched, processing of the front side continues as follows:

Figure 3.2(b) 3D view of CMOS chip after the backside vent holes have been etched, leaving ~50 to 60 um of silicon remaining. In some cases, the oxide layer can be used as an etch-stop.
(1) The silicon dioxide is etched anisotropically (directionally) down to the silicon substrate using a reactive-ion etch system. The top metal layer acts as a mask and protects the CMOS mesh structures (Figure 3.2(c)).

Figure 3.2(c) Cross-sectional view of CMOS chip after oxide etch, showing the composite structural layer and exposed silicon.

(2) The underlying silicon substrate is etched isotropically (in all directions) to undercut and release the mechanical structures, which can now be used as electrodes for sensing and actuation, or as wires for connecting to on-chip circuitry (Figure 3.2(d)). The resulting cavity can be etched using an SF6 plasma or Xenon Difluoride (XeF2). An anisotropic deep reactive-ion etch (DRIE) can precede the release etch to form much larger cavities without removing the silicon sidewalls that isolate different structures. Typically, only the isotropic etch is required for cavity depths less than 20 um.

Figure 3.2(d) Cross-sectional view of CMOS chip after silicon release etch, showing the etched cavity.
(3) In the final step, the released CMOS-MEMS mesh is coated with a CHF3 polymer in a chemical vapor deposition (CVD) process. The polymer conforms to all sides of the mesh-like beams until all gaps are sealed, thus creating an airtight suspended membrane (Figure 3.2(e)). The metal layers inside the beams allow the membrane to be electrostatically actuated.

Figure 3.2(e) Cross-sectional view of CMOS chip after polymer deposition. The sealed membrane can be electrostatically actuated by applying a potential between the metal beams and the substrate.
For speaker chips containing a single membrane or a few membranes, and provided that the membrane cavities are at least 40 um deep, we can guarantee with the above process that all speakers will be sufficiently vented. However, with an array of 256 speakers that are less than 250 um in size suspended over a 10-15 um deep cavity, this task becomes daunting. An alternative method was adapted for the large speaker arrays used in this research. Instead of patterning vent holes, we etched the backside of the chip using a combination of anisotropic and isotropic DRIE steps until 50-100 um of silicon remained (Figure 3.3(a)). Approximately 500 um of the outer edges of the chip were masked to provide support during future processing steps. An isotropic etch was added to the recipe to minimize the loading effects that occur with large areas of silicon, which result in highly non-uniform etch profiles.

Figure 3.3(a) 3D view of CMOS chip after modified backside etch, leaving 50 to 100 um of silicon remaining.

After the usual oxide etch described in step (1), vent holes are then patterned and etched on the front side of the chip (instead of the backside). As shown in Figure 3.3(b), this modification allows us to properly vent all membranes with cavities that are only a few microns deep. After the vent hole etch reaches the depth achieved through the backside etch (which is indicated by the presence of light when shined from a microscope back source), release of the membranes can continue as described in steps (2) and (3).

Figure 3.3(b) Cross-sectional view of CMOS chip after the vent holes have been patterned and etched from the front.
3.3 Dynamic Modeling of CMOS-MEMS Microspeaker

The composite materials and serpentine pattern make the CMOS-MEMS membrane a complex structure to model accurately. However, approximations can be made to obtain useful information from a first- or second-order analysis. In a paper on micromachined electrostatic transducers, Ladabaum et al. represent a transducer as a first-order lumped electro-mechanical model, consisting of a linear spring, a mass, and a parallel-plate capacitor [7]. The motion of the membrane can be expressed in a force-balance equation:

    F_ELECTROSTATIC + F_SPRING = F_MASS
where F_ELECTROSTATIC and F_SPRING are the forces exerted by the capacitor and spring, respectively. In order to represent the membrane as a simple mass-spring system, several approximations were necessary. First, the membrane's restoring force (opposing the direction of motion) was assumed to be a linear function of its displacement: F_SPRING = -kx. Secondly, the electrical fringing fields and the curvature of the membrane were neglected. Finally, the damping of the air or other surrounding medium was neglected (the membrane was treated as if operating in a vacuum) to simplify the mathematics. As will be shown later, the damping from the air underneath the sealed membrane has a drastic effect on the response of the system. In either case, the model shows that at some applied voltage, the electrostatic force overcomes the spring's restoring force and the membrane collapses. This collapse point occurs when:

    V_COLLAPSE = sqrt(8 k d0^3 / (27 e S))

where S is the area of the membrane, e is the electric permittivity, and d0 is the separation between the membrane and substrate at rest.
The presence of a thin layer of oxide (170 A) underneath the metal structures prevents the membrane and substrate from shorting after collapse. This layer was neglected in the derivation of the collapse voltage since d0 >> d_INSULATOR. After collapse, however, the membrane will not snap back until the voltage drops below V_SNAP-BACK:

    V_SNAP-BACK = sqrt(2 k d_INSULATOR^2 (d0 - d_INSULATOR) / (e_INSULATOR S))

This hysteretic behavior was predicted and experimentally verified for a CMOS-MEMS microspeaker, as shown in Figure 3.4.
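The collapse and snap-back expressions can be evaluated numerically. The sketch below uses illustrative values: the spring constant and rest gap are assumptions (chosen so the collapse voltage lands near the ~48 V seen in the measurement), while the 170 A insulator thickness comes from the text and the permittivity is that of silicon dioxide. This first-order model reproduces the hysteresis ordering (snap-back well below collapse), though not the specific measured snap-back voltage.

```python
import math

EPS0 = 8.854e-12   # permittivity of free space, F/m

def v_collapse(k, d0, S, eps=EPS0):
    """Pull-in voltage: electrostatic force overcomes the spring force."""
    return math.sqrt(8 * k * d0**3 / (27 * eps * S))

def v_snap_back(k, d0, d_ins, S, eps_ins):
    """Release voltage of a collapsed membrane resting on its insulator."""
    return math.sqrt(2 * k * d_ins**2 * (d0 - d_ins) / (eps_ins * S))

S = (216e-6) ** 2    # area of a 216 um square membrane, m^2
k = 3.2              # assumed effective spring constant, N/m (illustrative)
d0 = 10e-6           # assumed rest gap, m (illustrative)
d_ins = 170e-10      # 170 Angstrom oxide layer, m (from the text)

vc = v_collapse(k, d0, S)
vs = v_snap_back(k, d0, d_ins, S, 3.9 * EPS0)   # SiO2 relative permittivity ~3.9
print(round(vc, 1))   # ~48 V with these assumed values; vs is far lower
```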
Figure 3.4 Hysteretic behavior of membrane displacement as a function of DC voltage. As the applied voltage increases, the membrane collapses around 48 Volts, but does not snap back until the voltage drops below 22 Volts.

In order to maximize system efficiency, it would be optimal to drive these membranes at their resonance frequency, at which the deflection is largest. If we simplify the
serpentine metal-oxide mesh and polymer composite as a flexural plate with mass dominated by the polymer, the resonance frequency, fR, can be written as:

    fR = (1.654 t / a^2) sqrt(E / (rho (1 - v^2)))

where the Young's Modulus, E, Poisson's ratio, v, membrane side length, a, thickness, t, and density, rho, are known or estimated [8]. If we use an experimentally determined Young's Modulus value of 500 MPa for the mesh-polymer structure, then we would expect a 1.4 mm x 1.4 mm CMOS-MEMS membrane (3 um thickness) to have a resonance frequency around 1.5 kHz. If we decrease the side length of the membrane to 216 um and the thickness down to 1.3 um, then the predicted resonance frequency jumps to 27.5 kHz.
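Those two predictions can be reproduced with a short calculation. The sketch below uses the standard clamped-square-plate form (coefficient 1.654); the density and Poisson's ratio are assumed values not given in the text, chosen as typical for a polymer-dominated composite.

```python
import math

def f_resonance(E, rho, nu, a, t):
    """Fundamental resonance of a clamped square plate, side a, thickness t."""
    return 1.654 * t / a**2 * math.sqrt(E / (rho * (1 - nu**2)))

E = 500e6      # experimentally determined Young's modulus, Pa (from the text)
rho = 1500.0   # assumed composite density, kg/m^3 (typical polymer; not in text)
nu = 0.3       # assumed Poisson's ratio (not in text)

f1 = f_resonance(E, rho, nu, 1.4e-3, 3e-6)    # 1.4 mm membrane, 3 um thick
f2 = f_resonance(E, rho, nu, 216e-6, 1.3e-6)  # 216 um membrane, 1.3 um thick
print(round(f1), round(f2))   # roughly 1.5 kHz and 28 kHz
```

With these assumed material constants, the formula lands close to both quoted predictions, and it makes the scaling plain: halving the thickness or doubling the side length moves the resonance by factors of 2 and 4, respectively.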
A frequency response curve of the membrane will validate our predicted results as well as give information concerning the effectiveness of the membrane when driven as an analog speaker. Figure 3.5 shows the frequency response curve of a 216 um x 216 um CMOS-MEMS membrane, measured using an acoustic setup described in Chapter 5. To accommodate the small pressure output of a single 216 um membrane, we applied the 20 V bias and 10 Vp-p sinusoidal input to a group of 128 speakers. As the microphone has its own resonance frequencies that influence the measured output, we overlaid a typical microphone sensitivity plot in the figure to identify the resonant peaks around 15 kHz. There are several additional peaks between 20 and 45 kHz that could be either the membrane resonance or a harmonic of the microphone and cavity resonances. With further tests, we will be able to better identify the sources of these resonant peaks.

Figure 3.5 Frequency response curve for a group of 128 CMOS-MEMS square membranes, each 216 um on a side. Resonant peaks are due to the earphone housing (~6 kHz), microphone sensitivity (~12 kHz), and membrane resonance (20-45 kHz). The dotted line shows a typical microphone sensitivity plot, with a resonance around 12 kHz.
3.4 Acoustic Response of CMOS-MEMS Microspeaker

Clearly, there is a great deal more modeling that can be done to have a better understanding of membrane dynamics. However, as this thesis is meant to be a proof of digital sound reconstruction, it makes more sense to spend time characterizing the acoustic response of the membrane. An acoustic response is the pressure waveform created by one or more speakers in response to an electrostatic input, in the case of our research, a finite high voltage pulse. Figure 3.6 shows the acoustic response of a group of 216 µm square-shaped membranes given a 200 µs, 40 Volt pulse.

Figure 3.6 Combined acoustic response of 128 speaklets (216 µm on a side) to a 40 Volt, 200 µs pulse (also shown).
There are several observations that can be made from this response, the first of which is the time it takes the membrane to respond to the pulse. One possibility for the source of this delay is the propagation time for the acoustic pressure to reach the microphone, typically a few millimeters from the speaker array. However, that only contributes 10 to 15 µs of the response time. Therefore, the majority of the delay must come from the response time of the membrane.
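The propagation contribution quoted above follows from a one-line estimate; this sketch assumes a nominal speed of sound in air of 343 m/s:

```python
def propagation_delay_us(distance_m, c=343.0):
    # Time for an acoustic wavefront to travel distance_m, in microseconds.
    return distance_m / c * 1e6

# a few millimeters of array-to-microphone spacing gives roughly 10-15 us
delay = propagation_delay_us(4e-3)
```

Since the observed response time is much longer than this, the remainder must come from the membrane itself, as the text concludes.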
The second observation is the bipolar nature of the acoustic response. The positive peak arises from the membrane collapsing towards the substrate while the negative peak arises from the release of the membrane. In most cases, the first positive peak is greatest in amplitude and responsible for most of the pressure. However, it is clear that if digital sound reconstruction relies on the superposition of pressures in time, then the negative peaks will counteract positive peaks produced by other membranes. As will be discussed in Chapter 5, there are several other consequences from this behavior. Chapter 6 will present some suggestions that might reduce or possibly avoid this behavior in the future.
The final observation is the oscillatory nature of the acoustic response. Ranging from 10 to 20 kHz in frequency, these oscillations are caused by the resonances in the system, including the membrane, cavity, and microphone. In Section 3.3, we showed that these resonances occur between 6 kHz and 45 kHz. As a consequence, the oscillations will appear in digitally reconstructed waveforms and thus contribute energy to upper harmonics of the system.
Chapter 4: Digital Speaker Arrays

4.1 Introduction
To demonstrate the digital sound reconstruction concept, we constructed four different speaker array chips based on the CMOS-MEMS membrane design described in Chapter 3. The original 3-bit array prototype consisted of seven separate 1.4 mm x 1.4 mm membrane chips mounted on a TO-8 package with holes drilled underneath for proper venting (see Figure 4.1). Four of the speaklets were electrically tied together to form the most significant bit (MSB), another two were tied together to form the next most significant bit, and the remaining one speaklet formed the least significant bit (LSB). This design allowed us to study the individual acoustic response of each speaklet as well as reconstruct simple sinusoidal signals.

Figure 4.1 3-bit (7 speaklet) array mounted on a TO-8 package. Under the chips, vent holes have been drilled through the package. Unused holes are filled to prevent air leakage.

As will be shown later, variations in the fabrication of separate chips (even when processed together) make it impractical and difficult to characterize the array performance. Therefore, the digital speaker arrays shown in Figures 4.2 and 4.3 were designed with multiple speaklets integrated on the same chip, with the speaklets electrically isolated and connected to separate bondpads. Furthermore, multiple chips could be arranged on the same package to create larger bit arrays.

Figure 4.2 Integrated array of four 1.4 mm membrane speaklets. Chip size is 5 mm x 5 mm.

Figure 4.3 3-bit array containing octagonal 900 µm diameter speaklets. Chip size is 5 mm x 2.5 mm.

Based on the initial results measured with these arrays, an 8-bit (255 speaklet) array was designed to balance the resolution necessary to demonstrate digital sound reconstruction with the high costs of fabricating centimeter-scale chips. Shown in Figure 4.4, the 8-bit array contains 256 square speaklets, each 216 µm on a side. As it would be impractical to wire each speaklet in the 8-bit array to its own bondpad, the speaklets were divided up into eight electrically isolated regions, as described in the section on digital speaker arrays. Although we lose the ability to actuate an individual speaklet, only nine bondpads are required: eight for the bit groups and one for the substrate tied to ground.

Figure 4.4 8-bit array containing 255 square membranes, each 216 µm on a side. Total chip size is 5.2 mm x 5.2 mm.

Most of the acoustic response and sound reconstruction measurements detailed in the next chapter, as well as the recordings included with this thesis, were made using the 8-bit speaklet chip.
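The binary weighting behind the eight bit groups can be sketched in a few lines; this illustrative snippet is not from the thesis:

```python
def bit_group_sizes(n_bits):
    # Bit b drives a group of 2**b speaklets (LSB = 1, MSB = 2**(n_bits - 1)).
    return [2 ** b for b in range(n_bits)]

sizes = bit_group_sizes(8)   # [1, 2, 4, 8, 16, 32, 64, 128]
total = sum(sizes)           # 255 addressable speaklets for an 8-bit array
```

The 3-bit prototype follows the same pattern with groups of 4, 2, and 1 speaklets.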
4.2 Characterization of Digital Speaker Arrays

4.2.1 Uniformity
As mentioned in chapter two, any variations in uniformity will introduce error into the digital to analog reconstruction process. We can examine non-uniformity at the individual speaklet level as well as the bit group level. The collapse voltage described in Chapter 3 can be a useful indicator of non-uniformities between speaklets throughout the array. Figure 4.5 shows a 3D graph divided into 16 x 16 data points of collapse voltages for the 8-bit array. To accurately measure the collapse voltage for all 256 speaklet membranes, we increased the voltage in 0.1 Volt increments and marked individual membranes under the microscope as they collapsed. The mean collapse voltage and standard deviation are 28.0 Volts and 3.28 Volts, respectively. Most of these variations occur during the fabrication process described in Chapter 3, but once exposed to an environment outside the cleanroom, dirt and other destructive particles can deposit on the membranes and potentially affect speaker operation.

Figure 4.5 Three dimensional profile of collapse voltages for all 256 speaklet membranes in the 8-bit array. Max: 34.3 V, Min: 18 V.

If we want to look at uniformity from one bit group to another, the best way is to find the normalized acoustic response for each group. In this case, normalization involves measuring the acoustic response of a particular bit group and dividing that response by the number of speaklets in the bit group. Figure 4.6 shows the original and normalized acoustic responses of each group in the 8-bit array given a 40 Volt, 200 µs pulse. The response from the MSB exhibits fewer oscillations than the other bits, probably due in part to the superposition of many speaklets that have slightly different acoustic responses.

Figure 4.6 (a) The acoustic response of each bit group of an 8-bit array given a 40 Volt, 200 µs pulse. (b) The normalized acoustic response of (a) can be found by dividing each acoustic response by the number of speaklets.

Despite variations in the collapse voltage shown in Figure 4.5, the shape and response of the normalized acoustic responses in Figure 4.6(b) have an average cross-correlation value of 0.95, where 1.00 implies a perfectly identical match.
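A zero-lag normalized cross-correlation of the kind quoted above can be computed as follows; this is a generic sketch, not the thesis's analysis code:

```python
import math

def normalize_response(response, n_speaklets):
    # Per-speaklet acoustic response of a bit group.
    return [p / n_speaklets for p in response]

def correlation(x, y):
    # Zero-lag normalized cross-correlation; 1.0 means identical shape.
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den
```

Because the measure is normalized, two responses with the same shape but different amplitudes still correlate at 1.0, which is why normalization by speaklet count is applied first.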
4.2.2 Linearity

Although the acoustic responses of the individual CMOS-MEMS membranes are highly nonlinear with respect to excitation voltage, linearity in the superposition of multiple speaklet responses is essential to digital sound reconstruction. Otherwise, the conversion from digital samples to analog pressures would be non-linear and require signal correction. The maximum pressure of acoustic responses gives a good indication of the array linearity. As shown in Figure 4.7, a linear trendline and R-squared coefficient of determination (R² = 0.9985) can be used to approximate these values. The R-squared value or coefficient of determination displays how closely the estimated values of the trendline match the actual data. As the R-squared value approaches unity, the trendline becomes more accurate. A value of R² greater than 0.95 is considered statistically linear, so compared to our value of 0.9985, the superposition of acoustic responses between bit groups is linear.

Figure 4.7 Maximum pressure for the acoustic response of each bit group given a 40 Volt, 200 µs pulse.
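The R-squared figure can be reproduced from (speaklet count, peak pressure) pairs with an ordinary least-squares fit; a minimal stdlib-only sketch:

```python
def r_squared(xs, ys):
    # Fit y = slope*x + intercept by least squares, then report the
    # coefficient of determination R^2 (1.0 = perfectly linear data).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Feeding in the measured peak pressures for the eight bit groups (1, 2, 4, ..., 128 speaklets) would reproduce the 0.9985 value quoted above.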
4.2.3 Directivity

In discussions involving speaker arrays, it is common to examine how the sound pressure fields created from individual speakers interact with each other. Depending on the geometry of the array and the frequencies of interest, the resulting pressure field can vary with listening position, thereby giving a sense of direction to sound created by the array. There are many applications for which the production of directional sound is useful, such as sound recording, military communications, and sonar.
There are generally two sources for the creation of a pressure field that varies with listening position. If the size of the speaker is large compared with the wavelengths of interest, then the speaker must be modeled as a sound source that exhibits non-uniform and frequency-dependent radiation properties. If the size of the array (or maximum distance between speakers) is comparable to the wavelengths of interest, then the path difference from one listening position to another will vary considerably and cause irregularities in the sound field.

As the dimensions of typical CMOS-MEMS membranes range between 100 and 2000 µm, their size would not become an issue for frequencies in the audio (or even low ultrasonic) range. Although the size of a speaker array is comparable to a quarter-wavelength for frequencies above a few kHz, the variation in path difference from two different speakers is minimal. In fact, for most in-ear applications, the listening position (i.e. the eardrum) is fixed and receives a constant pressure field. Therefore, it is safe to assume that directivity is not an issue in digital sound reconstruction using CMOS-MEMS technology for earphone applications.
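The size argument above is easy to quantify: audible wavelengths dwarf the membrane dimensions. A quick sketch (assuming a speed of sound of 343 m/s):

```python
def wavelength_m(freq_hz, c=343.0):
    # Acoustic wavelength in air at frequency freq_hz.
    return c / freq_hz

# Even at 20 kHz the wavelength (~17 mm) is far larger than the
# largest 2000 um (2 mm) CMOS-MEMS membrane dimension.
lam = wavelength_m(20e3)
membrane = 2000e-6
```

Only well into the ultrasonic range would the membrane size approach the wavelength and force a directional source model.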
4.3 Harmonic Distortion

The harmonic distortion, Dh, can be useful in evaluating the effectiveness of any sound reconstruction process and is defined as:

Dh = ( Σ(n=2 to N) An ) / ( Σ(n=1 to N) An )

where An is the Fourier coefficient of the nth harmonic in the sound waveform. From this equation, Dh can be viewed as the ratio of the sum of harmonic components, excluding the fundamental (n=1), to the whole spectrum of the signal. The harmonic distortion of a perfectly linear analog array, in which the speakers are driven by the same sinusoidal signal without the presence of distortion, is zero at all locations since it does not contain any harmonics other than
the fundamental. The harmonic distortion for commercial analog earphones ranges from 0.1% to 1% for high quality speakers and 1% to 10% for lower quality speakers. These values vary with frequency and do not include an additional fractional percent distortion from the electronics.

For a digital transducer array, Dh will have nonzero values that vary with different listening locations [3]. Non-uniformities in the acoustic response between speaklets, described in Section 4.2.1, contribute to harmonic distortion by introducing higher harmonic components to the resulting waveform. If the size of the array is comparable to the wavelength of the signal, then any phase discrepancy between two speaklets, resulting from different paths to the listening point, will also contribute some harmonic distortion. As mentioned in the previous section, however, this issue is not applicable for earphone applications of DSR.

Figure 4.8 Example of spectral analysis taken during digital sound reconstruction experiments. The frequency range studied was 20 to 20,000 Hz. Measured sound levels given in units of decibel-Volts or dBV.

For cases where directivity does become an issue, the array geometry and speaklet arrangement will have a considerable effect on harmonic distortion. Initially, one might think that increasing the number of bits in a digital array would improve sound quality due to the increased resolution (or quantization levels). However, Huang et al. report via simulations that increasing the number of speaklets past a threshold will increase the path difference between speaklets, and thus negatively affect sound quality through increased harmonic distortion.
The value of Dh, also referred to as total harmonic distortion (THD), can be calculated using data collected from the spectrum analyzer. As shown in Figure 4.8, the analyzer can give us the strength of a particular frequency in terms of dBVrms, or decibel-Volts. To convert dBVrms into power, we need to first back out the Vrms by the following formula:

dBVrms = 20·log10(Vrms)  ⇒  Vrms = 10^(dBVrms / 20)

To obtain the power or Fourier coefficient An at each frequency, we simply square each Vrms value. As the frequency range of the human ear does not extend beyond 20 kHz, we will concern ourselves only with spectral power of frequencies between 20 Hz and 20 kHz.
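Putting the two steps together, THD can be computed from analyzer readings as follows. This sketch uses hypothetical dBVrms values for a fundamental and its harmonics; it is not data from the thesis:

```python
def thd(dbv_levels):
    # dbv_levels: spectrum-analyzer readings (dBVrms), fundamental first,
    # then its harmonics. Convert each to Vrms, square to get power, and
    # take the ratio of harmonic power to total power.
    powers = [(10 ** (dbv / 20.0)) ** 2 for dbv in dbv_levels]
    return sum(powers[1:]) / sum(powers)

# Fundamental at 0 dBV with one harmonic 40 dB down -> Dh near 0.01%
dh = thd([0.0, -40.0])
```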
Chapter 5: Experimental Results

5.1 Introduction

Now that the theory and issues behind digital sound reconstruction have been presented, we are ready to demonstrate that DSR is possible and compare how well the theory matches our experimental results. We performed our first experiments using the 3-bit (7 speaklet) array described in the last chapter. With this chip, we were able to: (1) begin characterizing the effect of the pulse voltage and pulse width on the acoustic response of a single speaklet; and (2) demonstrate the first attempts at digitally reconstructing sinusoidal waveforms. Figure 5.1 shows representative data from the 3-bit DSR experiments, including (a) the acoustic response of a 1.4 mm speaklet membrane to a 90 Volt, 200 µs pulse, and (b) the reconstruction of a 500 Hz sinusoidal wave using the 3-bit array.

Figure 5.1 (a) Acoustic response of a single 1.4 mm membrane given a 90 Volt, 200 µs pulse, also shown. (b) Reconstructed 500 Hz sinusoidal wave using the original 3-bit array.

We made several observations from these initial sets of experiments. First, the response time of the acoustic response is relatively slow (~250 µs) for most audio reconstruction applications and, as discussed in Chapter 2, limits the precision of the reconstructed waveform. Second, we applied an A-weighted filter, typically used in acoustic noise measurements, to all measurements. A-weighted filters attenuate the low frequency electrical noise in the room and high frequency oscillations from various system resonances, resulting in a smoother acoustic response and reconstructed waveform. Although this filter gave better results, it limited our ability to reconstruct low frequency waveforms or control the filtering step in the digital sound reconstruction process.

Despite these considerations, the most important result was that we were able to reconstruct a 500 Hz sinusoidal-like waveform using discrete, digital acoustic pulses. With only seven separate speaklets (instead of hundreds to thousands of speaklets integrated on the same chip), we were able to prove our theory of digital sound reconstruction.
The next step was to design and fabricate a new set of chips that could produce a higher quality DSR waveform through improved uniformity, faster response, and higher resolution. This ultimately led to the creation of the 8-bit array described in Chapter 4. With 255 speaklets, each 216 µm on a side, we were now able to further explore the issues relating to digital sound reconstruction in detail and avoid the fabrication of hundreds of chips. By the time the 8-bit chip had been fabricated, the test bed used for these experiments had tripled in complexity and capability. It also became evident that the A-weighting filter was inappropriate for this research and was not used in future experiments. With our equipment setup and chips available, we were ready to change the world....
5.2 Test Setup

This section describes the software and hardware used to take the measurements reported in this thesis. An extensive and flexible test bed was necessary to run a wide range of digital sound reconstruction experiments and to sufficiently characterize the acoustic response of CMOS-MEMS membranes.
5.2.1 Generation of Digital Signals

We used two methods for creating digital signals from an analog waveform: (1) direct generation of digital samples; and (2) an analog-to-digital converter (ADC). We reconstructed simple sinusoidal, square, triangle, and sawtooth waveforms using Labview, a programming-based platform that can generate signals with a data acquisition (DAQ) card. The interface shown in Figure 5.2 lets the user choose a periodic waveform to create, the frequency of that waveform, and the sampling rate for the digital output. The amplitude values of the sampled analog waveform are converted into binary numbers, where each bit contains the digital information for a specific group of speaklets in the array. Therefore, with eight bit channels, we can create a signal having 256 distinct levels (0 to 255). As expected, we can create higher resolution waveforms with more bits because of the increased number of levels. By turning off any of these channels, we can examine how different bit weightings affect the reconstructed output.

Figure 5.2 Labview based interface that creates an 8-bit digital representation of a simple periodic waveform. The user can control the waveform frequency, sample rate, pulse width, and choose between sine, square, sawtooth, and triangle waves. Manual mode option allows the creation of a user-defined pattern.

This approach is cumbersome for anything more than a simple periodic waveform, so for more complex waveforms such as music, a realtime hardware implementation is needed. To accomplish this, we used an 8-bit analog-to-digital converter (ADC), a clock generation circuit for the sampling frequency, and a circuit to adjust the input signal to fit inside the input range of the ADC (see Appendix A). The ADC circuit could have been used to generate simpler waveforms, but the Labview approach can generate more consistent and controllable digital output useful for initial testing of our speaker arrays.
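The sample-to-bit-channel conversion described above can be sketched in a few lines of Python; this is an illustrative reimplementation, not the Labview code:

```python
import math

def bit_channels(freq_hz, sample_rate_hz, n_bits=8, n_samples=16):
    # Quantize one sinusoid sample-by-sample into n-bit codes, then split
    # each code into per-bit streams (bit b drives the 2**b-speaklet group).
    levels = 2 ** n_bits - 1
    channels = [[] for _ in range(n_bits)]
    for k in range(n_samples):
        x = 0.5 * (1.0 + math.sin(2 * math.pi * freq_hz * k / sample_rate_hz))
        code = round(x * levels)
        for b in range(n_bits):
            channels[b].append((code >> b) & 1)
    return channels
```

Disabling a channel (zeroing one of the returned streams) mimics the experiment of turning off a bit group to study bit weightings.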
As shown in Figure 5.3, the digital signal must be logically ANDed with another signal to ensure that the speaklets are pulsed for each sample period containing a digital high. The duty cycle of this signal can control the width of the pulse without affecting the frequency. Applying this type of signal is called the pulse method of sound reconstruction.

Figure 5.3 Description of pulse method: the digital input is logically ANDed with a clock signal defined by the sampling frequency and variable duty cycle (40% shown) to determine the pulse width. The result is a signal with finite pulses occurring each sample period with a digital HIGH.

If the digital signal is sent to the array without multiplexing with the pulse signal, then the position of the speaklets will be determined by the logic level of the digital signal. As the majority of acoustic pressure is generated during a transition from one state to another (i.e. the membranes are snapped or released), the generated signal will appear different than when the speaklets are pulsed. This method is called the step method of sound reconstruction. The differences in these two reconstruction methods will be presented later in this chapter.
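The ANDing of the digital word with a duty-cycle clock can be modeled directly; a minimal sketch on a hypothetical tick grid, not the actual hardware timing:

```python
def pulse_method(bits, ticks_per_sample, duty=0.4):
    # AND each digital sample with a clock that is high for the first
    # `duty` fraction of every sample period, producing finite pulses.
    high = int(round(ticks_per_sample * duty))
    out = []
    for b in bits:
        out.extend([b] * high + [0] * (ticks_per_sample - high))
    return out
```

With `duty=1.0` the output degenerates to the step method: the line simply holds the logic level for the whole sample period.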
5.2.2 High Voltage Amplifiers

As both the DAQ card and analog-to-digital converter chip are incapable of generating the 20 to 50 Volts currently necessary to collapse the CMOS-MEMS membranes, we amplified the digital data before driving the speaker arrays. We explored two methods for high voltage amplification: (1) high voltage analog amplifiers, and (2) high voltage MOSFETs. The analog amplifier can be used for a wide variety of waveforms while the high voltage MOSFETs can only generate high voltage pulses with 10-20 µs switching speeds. As each bit channel requires its own amplifier, however, the large expense and size of the analog amplifiers outweigh the usefulness of generating a wider variety of signals. The circuit schematic shown in Figure 5.4 uses a P-MOSFET configuration in which the output reaches VDD (~40 V) when the input gate voltage is a digital low and near ground when the input is a digital high. In the future, with different membrane and gap geometries, smaller voltages suitable for on-chip CMOS electronics will be sufficient to collapse the membranes.

Figure 5.4 High voltage switching circuit to amplify digital pulses.
5.2.3 Acoustic Measurement and Analysis

The last section of our test bed is the equipment used to measure the acoustic response of the speaker array and convert it to an electrical signal for analysis. The types of analyses we want to perform will determine any additional equipment.

Any acoustic measurements where interference from external sounds and reflections is not desired should be done inside a Brüel and Kjær (B&K) 4232 anechoic chamber, a self-contained apparatus with foam walls designed to absorb sound energy (see Figure 5.5). It is also common to take acoustic measurements with the microphone held at a fixed distance from the sound source. To accomplish this, we used a plastic collar to secure the package and microphone with minimal air leakage.

Figure 5.5 B&K 4232 anechoic chamber with B&K ear simulator microphone attached to DSR earphone.
The B&K 4157 Ear Simulator microphone is designed to mimic the acoustic environment of the human ear, which as shown in Figure 5.6, acts as a low-pass filter for frequencies greater than 40 kHz. In Chapter 2, we mentioned how a low-pass filter is necessary to smooth out the acoustic pulses produced by the array. For sampling frequencies greater than 40 kHz, the human ear is a sufficient low-pass filter for anti-aliasing. If the sampling frequency is less than 40 kHz or additional filtering is necessary, then an artificial filter must be applied, either through the design of an acoustic filter in the earphone packaging or by applying an electronic filter at the output of the microphone. We used an RC active filter (see Appendix A) to mimic future acoustic filters that would smooth the digital acoustic pulses.

Figure 5.6 Frequency response of B&K 4157 ear simulator, designed to mimic the acoustic environment of the human ear canal. The plot is based on the microphone sensitivity at 500 Hz.
Now that the acoustic energy has been converted to an amplified electrical voltage, the output of the microphone pre-amplifier is sent to several different locations depending on the analysis. The oscilloscope displays the acoustic waveform as a voltage, which can be converted easily to a pressure by backing out the microphone sensitivity and amplification. For example, the B&K 4157 ear simulator has a sensitivity of 11 mV/Pa, so if the amplifier was set to 100 V/V, then the following equation can be used to back out the RMS pressure as seen by the microphone:

Prms = Vrms / (100 V/V × 0.011 V/Pa) = Vrms / (1.1 V/Pa)
We also connected the microphone amplifier output to a spectrum analyzer for spectral analysis of the DSR waveforms. During digital sound reconstruction measurements, the frequency spectrum gives us information about the additional harmonics present besides the fundamental we are trying to create, as well as information needed to calculate the total harmonic distortion. We can record the acoustic waveform by connecting the microphone output to the auxiliary input of a tape player (for analog recording) or computer (for digital recording). The recordings included with this thesis were made with a laptop and transferred to compact disc.
5.3 Characterization of Acoustic Response

This section describes a variety of experiments used to characterize individual bit and total array responses. Specifically, we will examine how the speed, amplitude, and shape of the acoustic response are affected by the input pulse parameters.
5.3.1 Pulse Amplitude

In Chapter 3, we showed that the displacement of a CMOS-MEMS membrane from an applied electrostatic voltage exhibits a hysteretic behavior. Figure 5.7(a) shows the acoustic response of the 8-bit array MSB (128 speaklets) as a function of the electrostatic voltage for a 200 µs pulse. The maximum pressure of each response, shown in Figure 5.7(b), exhibits a nonlinear relationship to the pulse voltage under 35 Volts and becomes linearly dependent after 35 Volts. It is no coincidence that the collapse voltage for this membrane occurs at the changeover of these two behaviors around 35 Volts.

Figure 5.7 (a) Acoustic response of MSB (128 speaklets) for varying pulse voltages (width = 200 µs). (b) Maximum pressure of each response.

The exponential correlation between pressure and pulse voltage below the collapse voltage point can be explained by the FE ∝ V² relationship between voltage and electrostatic force. Immediately after collapsing, only the center region of the membrane is in contact with the bottom of the cavity. As the voltage is increased beyond the collapse point, the outer regions of the membrane continue moving and provide additional pressure. This process is seen in the linear dependence region between pressure and voltage. Due to variations in collapse voltage between the speaklets, the transition between these two regions for a 128-membrane array is not sharp and represents the average response of all 128 speaklets.
Figure 5.8 Acoustic response of MSB (128 speaklets) for varying pulse widths: 25, 50, 100, 133, 200, 500, 1000, and 2000 µs. An outline of the 40 Volt pulse is overlaid on each diagram.
5.3.2 Pulse Width

The sampling frequency of digital sound reconstruction may be fixed by the electronics providing the input digital words, but we can still control the width of the pulse that remains high within each sample period. Figure 5.8 shows how the amplitude, speed, and shape of the acoustic response vary with the pulse width. Several changes occur as the pulse width is increased: (1) the amplitude increases linearly until it reaches a maximum around 133 µs; (2) the speed of the acoustic response, measured by the time from the pulse start to the maximum positive pressure, also increases linearly until it reaches a maximum around 133 µs; and (3) the first and second peaks begin to separate, leaving a much smaller secondary peak that decays with some oscillation. In the case of the snap pulse response shown in the figure, you can clearly see that the positive peak results from the collapse of the membrane while the negative peak results from the membrane returning to its original position. Once the two peaks do not overlap with each other, the acoustic impulse response transforms into a step response. This type of response is used with the step method of sound reconstruction mentioned in the previous section.
5.3.3 Pulse Type

We can invert the order of the positive and negative pressure peaks by applying a release pulse, where the membrane is released from its deflected state for the pulse duration. As shown in Figure 5.9, the release pulse yields a larger absolute amplitude pressure response than the snap pulse. The speed and overall pattern of the acoustic response, however, remain unaffected by the pulse type.

Figure 5.9 (a) Acoustic response of Bit 6 (64 speaklets) given a 40 Volt, 200 µs snap and release pulse. (b) Absolute maximum pressure for snap and release pulses as a function of pulse width (40 Volt pulse, MSB).
5.3.4 Summary

Changes in the input voltage pulse influence the amplitude of the acoustic response but leave the speed and shape of the response unaffected. With higher voltage pulses, we can achieve greater pressures translating to louder sound, but high voltages limit the use of standard low-voltage CMOS for integration of on-chip electronics. The pulse width gives us more control over the acoustic response and can be used to generate both impulse and step responses. However, the pulse width is ultimately limited by the sampling frequency used for sound reconstruction. As the pulse type only affects the amplitude of the pressure response, there are many opportunities for future research that can use different shaped pulses to improve the quality of reconstructed waveforms.
5.4 Sound Reconstruction Measurements

This section can be considered the most significant part of the thesis because it provides evidence of digitally reconstructed sound. We will start by creating simple exponential and sinusoidal waveforms and turn our attention to examining how the pulse voltage and pulse width affect these reconstructed signals. The last two sections describe the reconstruction of other periodic waveforms and the sound recordings that are included with this thesis. For most of these experiments, we will compare the step and pulse methods of reconstruction mentioned earlier in this chapter.
5.4.1 Pulsing of Sequential Bits

Before delving into sinusoids and other patterns that involve multiple bits firing simultaneously, we reconstructed a simple exponential signal by firing each bit group sequentially. As the acoustic response to a 40 Volt, 200 µs pulse is known for each bit and the cycle time of the pattern is longer than the time necessary for the responses to decay, we can predict the linear superposition of these responses as a function of time. Figure 5.10 shows the predicted and actual experimental waveforms in response to the following exponential sequence: 0, 1, 2, 4, 8, 16, 32, 64, and 128. These values represent the number of speaklets that are pulsed and correspond to a specific bit group (see Chapter 2).

Figure 5.10 Predicted and actual waveform in response to the sequential pulsing of each binary weighted group in the 8-bit array.

Given that linearity of acoustic pressures has already been proven, it is not surprising that the predicted and actual waveforms closely match each other. This experiment also points out a very important current limitation of DSR. The bipolar nature of the acoustic response makes it difficult to produce signals requiring small amplitudes (or pressures) immediately after a large amplitude. The negative peak of the larger bits drowns out the positive peak of the smaller bits, which explains why a negative pressure is present immediately after the MSB clicks instead of a low positive pressure level. Several possible methods for dealing with this issue will be discussed in the next chapter.
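The prediction used above, linear superposition of a known per-bit response shifted in time, can be sketched generically (the unit response here is a placeholder, not measured data):

```python
def predict_superposition(unit_response, sequence, period):
    # Each entry in `sequence` is the number of speaklets fired in that
    # sample period; scale and time-shift the single-speaklet response.
    out = [0.0] * (period * len(sequence) + len(unit_response))
    for i, n in enumerate(sequence):
        for j, p in enumerate(unit_response):
            out[i * period + j] += n * p
    return out

# exponential firing pattern from the text: 0, 1, 2, 4, ..., 128 speaklets
predicted = predict_superposition([0.5, 1.0, -0.8, -0.2],
                                  [0, 1, 2, 4, 8, 16, 32, 64, 128], 8)
```

Using a bipolar unit response, as here, reproduces the masking effect described in the text: the scaled negative tail of a large bit can swamp the positive peak of a small bit that follows.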
5.4.2 TheBit-by-Bit Reconstructionof a 400HzSinusoid
Section 5.2 of this chapter described how an 8-bit digital signal can be created to represent an analog waveform. Since each of the speaklet groups in the 8-bit array is binary weighted, we can study the effect of adding bits one at a time towards the creation of an acoustic waveform. This experiment will serve as the first of many that will highlight differences between the step and pulse methods of sound reconstruction. The series of oscilloscope traces shown in Figure 5.11 describes the reconstruction of a 400 Hz sinusoid as the individual bit groups of the 8-bit array are added sequentially. Figures 5.11(a) and 5.11(b) compare the step and pulse methods of this reconstruction, respectively. In the first set of traces, only the most significant bit (MSB), containing 128 speaklets, is clicked. The second set describes the addition of the next most significant bit, containing 64 speaklets, and so forth. Three key observations can be made from these sequences: (1) the waveform produced by the step method is almost four times greater in amplitude than the waveform produced by the pulse method; (2) the pulse method produces an amplitude modulated reconstructed output with a carrier frequency equal to the sampling rate; and (3) regardless of method, the addition of the two least significant bits (LSB) has little or no effect on the overall waveform.

Figure 5.11 Reconstruction of a 400 Hz sinusoid with various levels of bit-group inclusion (sampled at 8 kHz). Traces run from Bit 7 (MSB) alone down through the successive addition of Bit 6, Bit 5, ..., Bit 0 (LSB); time axis 500 µs/div. We used a 70 µs pulse width for the pulse method.
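The bit-by-bit construction above can be sketched in software. The following is a minimal illustration, not the thesis hardware: it quantizes a 400 Hz tone to 8 bits and shows how driving only the top k binary-weighted speaklet groups changes the reconstructed level. All function names here are our own for illustration.

```python
import math

N_BITS = 8
FS = 8000   # sampling rate (Hz), as in the experiments
F0 = 400    # tone frequency (Hz)

def quantize(x, n_bits=N_BITS):
    """Map x in [-1, 1] to an unsigned level 0..2**n_bits - 1."""
    levels = 2 ** n_bits
    level = int(round((x + 1.0) / 2.0 * (levels - 1)))
    return max(0, min(levels - 1, level))

def bit_groups(level, n_bits=N_BITS):
    """Speaklets fired per binary-weighted group, MSB first: 128, 64, ..., 1."""
    return [((level >> b) & 1) * (1 << b) for b in reversed(range(n_bits))]

def partial_level(level, k, n_bits=N_BITS):
    """Reconstructed level when only the k most significant groups are driven."""
    return sum(bit_groups(level, n_bits)[:k])

# One period of a 400 Hz tone sampled at 8 kHz (20 samples)
samples = [quantize(math.sin(2 * math.pi * F0 * n / FS)) for n in range(FS // F0)]

# Dropping the two LSB groups changes each level by at most 3 out of 255,
# consistent with observation (3) above.
worst = max(abs(partial_level(s, N_BITS - 2) - s) for s in samples)
```

This mirrors why the traces change visibly when the MSB groups are added but barely at all for the last two bits.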
5.4.3 Effect of Low-Pass Filter
Based on the results seen in Figure 5.11, digital sound reconstruction is clearly at an early stage of development. The same high frequency components observed in the acoustic response of single membranes are present in the exponential and sinusoidal waveforms. These frequencies are below 20 kHz and heard as distortion above the lower frequency fundamental. However, we are forgetting the last important step in DSR or any other sampled reconstruction process: the low-pass filter.

As mentioned previously, the nature of DSR involves the production of higher frequencies usually associated with the sampling frequency and with resonances of the membrane, earphone cavity, and microphone. Ideally, the human ear will filter out many of these frequencies, but additional filtering might be necessary for lower sampling frequencies. Figure 5.12 shows the same 8-bit, 400 Hz reconstructed waveform with several different low-pass filters applied electronically: cutoff frequency fc = 22 kHz, 11 kHz, 4 kHz, and 2 kHz. As the cutoff frequency is decreased, the oscillations resulting from the membrane and cavity resonances are attenuated and the quality of the waveform improves greatly. For the waveform reconstructed using the pulse method, the amplitude modulation effect is completely removed with the 4 kHz cutoff frequency because the 8 kHz sampling frequency has been filtered out. As a great deal of the energy of the unfiltered signal arose from the sampling frequency (including harmonics and side bands), the amplitude of the filtered 400 Hz sinusoid becomes very small compared with the waveform using the step method.

Figure 5.12 Oscilloscope traces of a reconstructed 400 Hz sinusoid (sampled at 8 kHz) for a variety of low-pass filters applied electronically. A comparison between the (a) step method and (b) pulse method is shown. We used a 70 µs pulse width for the pulse method (b).

The corresponding frequency spectra and calculated total harmonic distortion (THD) values are shown in Figure 5.13, from which three key observations can be made. First, the application of the low-pass filter clearly attenuates frequencies above its cutoff frequency with minimal effect on the fundamental. Second, we notice from the change in harmonic distortion that the pulse method benefits from the low-pass filter a great deal more than the step method, particularly when the cutoff frequency drops below the sample frequency. This is to be expected, as the pulse method contains more energy around the sampling frequency (and its harmonics) than the step method. Finally, we should note that with either reconstruction method and sufficient low-pass filtering, the digital speaker arrays can attain THD values that correspond to good quality analog speakers (~1% THD).
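The effect of such an electronic low-pass filter can also be approximated in software. As a hedged sketch, here is a second-order (2-pole) low-pass biquad in the standard audio-cookbook form; the thesis hardware actually used the 3-pole circuit of Appendix A, so this is an illustration of the filtering effect, not the real circuit:

```python
import math

def lowpass_biquad(fc, fs, q=1.0 / math.sqrt(2.0)):
    """Return (b, a) coefficients for a 2-pole low-pass biquad."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cw = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cw) / 2.0 / a0, (1.0 - cw) / a0, (1.0 - cw) / 2.0 / a0]
    a = [1.0, -2.0 * cw / a0, (1.0 - alpha) / a0]
    return b, a

def filt(b, a, x):
    """Direct-form I filtering of sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

fs, fc = 44100, 4000.0
b, a = lowpass_biquad(fc, fs)
# A 400 Hz tone passes nearly unchanged; a 16 kHz tone is strongly attenuated.
t = [n / fs for n in range(4096)]
low = filt(b, a, [math.sin(2 * math.pi * 400 * s) for s in t])
high = filt(b, a, [math.sin(2 * math.pi * 16000 * s) for s in t])
```

The unity DC gain and the strong attenuation above cutoff are what remove the sampling-frequency carrier in the pulse-method traces.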
Figure 5.13 Frequency spectra and corresponding total harmonic distortion (THD) values for a reconstructed 400 Hz sinusoid (8 kHz sampling rate) after a variety of low-pass filters are applied electronically. Both the (a) step and (b) pulse methods are shown. (Legible THD values from the figure, from no filter down to the 2 kHz cutoff: step method 41.14%, 39.26%, 25.85%, 2.21%, 0.63%; pulse method 97.68%, 76.13%, 4.32%, 0.14%.)
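THD figures like those reported in Figure 5.13 can be estimated numerically. The sketch below is an assumed method, not the thesis measurement chain: the Goertzel recurrence measures the fundamental and harmonic magnitudes, and THD is the ratio of harmonic to fundamental energy.

```python
import math

def goertzel_mag(x, freq, fs):
    """Magnitude of the DFT bin nearest `freq` via the Goertzel recurrence."""
    n = len(x)
    k = round(freq * n / fs)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for sample in x:
        s1, s2 = sample + coeff * s1 - s2, s1
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return math.sqrt(max(power, 0.0))

def thd(x, f0, fs, n_harmonics=5):
    """Total harmonic distortion: harmonic energy relative to the fundamental."""
    fund = goertzel_mag(x, f0, fs)
    harm = [goertzel_mag(x, i * f0, fs) for i in range(2, 2 + n_harmonics)]
    return math.sqrt(sum(h * h for h in harm)) / fund

fs, f0, n = 8000, 400, 8000
clean = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
dirty = [math.sin(2 * math.pi * f0 * i / fs)
         + 0.1 * math.sin(2 * math.pi * 2 * f0 * i / fs) for i in range(n)]
```

A pure sinusoid gives a THD near zero, while a signal carrying a 10% second harmonic gives a THD near 0.10 (10%).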
5.4.4 Characterization of Reconstructed Waveforms
In Section 5.3, we described how the acoustic response of a single bit group varies with the voltage and width of the applied pulse. It was shown that (1) the pressure increases exponentially with voltage until the membranes collapse and linearly afterwards; and (2) the pressure increases linearly with pulse width until it reaches a maximum amplitude. So the question we want to answer is whether the same variations hold for a digitally reconstructed waveform. Figure 5.14(a) shows a reconstructed 400 Hz (fs = 8 kHz, fc = 4 kHz) sinusoid for a range of pulse voltages. From this diagram, we can see that the shape of the waveform does not change with increasing voltage. The maximum pressure of these waveforms is plotted in Figure 5.14(b), showing that as the pulse voltage increases, the pressure follows the same pattern for the 400 Hz sinusoid as the acoustic response. Furthermore, the transition point between the exponential and linear regions also occurs between 30 and 35 Volts.

Figure 5.14 (a) Reconstructed 400 Hz sinusoid as a function of pulse voltage (200 µs pulse width) using the step method sampled at 8 kHz. (b) Maximum pressure of (a).
In the acoustic response measurements, the voltage pulses were spaced far enough apart to avoid interactions between consecutive pulses. When using the pulse method to reconstruct sound, the pulse spacing is fixed by the sampling rate. Therefore, varying the pulse width will have a slightly different effect with reconstructed waveforms because the spacing between consecutive pulses decreases as the pulse width increases. Figure 5.15 shows the same reconstructed 400 Hz sinusoid for a range of pulse widths, including the measured maximum pressure for each pulse width. As the pulse width increases, the mass of the membranes prevents them from completely responding to the low voltage (digital zero) before the cycle repeats. Therefore, it is not surprising that as the pulse width approaches the sample period, 125 µs, or 100% duty cycle, the waveform matches one observed using the step method.

Figure 5.15 (a) Reconstructed 400 Hz sinusoid as a function of pulse width (40 Volts) using the pulse method sampled at 8 kHz. (b) Maximum pressure of (a).
5.4.5 Effect of Sampling Frequency
The sample frequency at which digital sound reconstruction occurs limits the range of frequencies that can be created. According to the Nyquist sampling theorem, with a sampling frequency fs, we can reconstruct signals with frequencies up to 0.5 fs. Since DSR involves the creation of sound in time, higher sampling rates will also increase the resolution and quality of the waveform. However, as mentioned in Chapter 2, the sampling rate is usually limited by the response time of the membranes. The application of the low-pass filter will also play an important role in choosing a sample frequency. For example, if a complex waveform is reconstructed at an 8 kHz sampling frequency, then a low-pass filter with a 4 kHz cutoff (3 dB) frequency should be used to allow any frequencies under 4 kHz to be properly reconstructed.
Figure 5.16 shows the reconstruction of a 400 Hz sinusoid at four different sampling frequencies: 4 kHz, 8 kHz, 22 kHz, and 44.1 kHz, using low-pass filters with 2 kHz, 4 kHz, 11 kHz, and 22 kHz cutoff frequencies, respectively. In the pulse method examples, notice that the sampling frequency is adequately filtered out to remove the amplitude modulation. However, regardless of sound reconstruction method, the higher sampling rate and low-pass filter combinations are incapable of completely removing the oscillations. Depending on the bandwidth and acceptable amount of distortion in the application, it may be necessary to decrease the cutoff frequency of the low-pass filter until the desired quality is reached.

Figure 5.16 Reconstructed 400 Hz sinusoid as a function of sampling rate for both the step and pulse methods. The corresponding low-pass filter and pulse width for (b) are listed on the right (e.g., fs = 4 kHz, fc = 2 kHz, 133 µs pulse; fs = 8 kHz, fc = 4 kHz, 70 µs pulse).
5.4.6 Frequency Characterization
Now, we focus our attention on the reconstruction of a wider range of frequencies. Figure 5.17 shows the reconstruction of 50, 100, 200, 400, 800, 2000, and 4000 Hz sinusoids using both the step and pulse methods. For the pulse method, we used a 70 µs pulse width (56% duty cycle), but kept the same 8 kHz sampling rate and 4 kHz cutoff low-pass filter for both sets of experiments.

Figure 5.17 Reconstruction of sinusoids of different frequencies using the step and pulse methods sampled at 8 kHz. The corresponding frequency and time axis for each diagram are listed on the right.

With this data, we can begin to construct a frequency response of the speaker array acting as a digital speaker, not to be confused with the frequency response measurement of the CMOS-MEMS membranes discussed in Chapter 3. Shown in Figure 5.18, we see that the step method consistently produces a higher intensity sound than the pulse method for the audio range up to 4 kHz. In either case, the sound power levels for both methods are typically 78 - 95 dB SPL over all frequencies between 50 and 4000 Hz. For comparison, the analog driven membrane response is overlaid in the figure to highlight the improved performance of the digital speaker array over the analog speaker array.

Figure 5.18 Frequency response of the 8-bit array when operating as a digital array using the step and pulse methods. The analog driven membrane response is overlaid on the graph for comparison.
Figure 5.19 Reconstructed 400 Hz waveforms (square, sawtooth, and triangle wave) using an 8 kHz sampling rate (4 kHz cutoff frequency filter) for both step and pulse methods. We used a 70 µs pulse width in (b).
5.4.7 Reconstruction of Complex Periodic Waveforms
Figure 5.19 shows the reconstructed waveforms of several periodic signals containing multiple frequency components: (a) square wave, (b) sawtooth wave, and (c) triangle wave. The waveforms were created using an 8 kHz sampling rate and 4 kHz cutoff low-pass filter. Similar to the sinusoidal waveforms, the step method produces a higher pressure waveform than the pulse method. However, the shape of the waveform better resembles the intended analog waveform using the pulse method than the step method, most likely due in part to the continuous pulsing that is missing from the step method. The switch from a very high pressure to a very low pressure found in the square and sawtooth waveforms is difficult to produce regardless of which method we use. This artifact, seen in the figure as periodic spikes, was discussed previously in Section 5.4.1 on the pulsing of sequential bits.
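For reference, test signals like those of Figure 5.19 can be generated in software before quantization. The following is a minimal sketch with naive, non-band-limited generators (aliasing is ignored), using names of our own choosing:

```python
def square(t, f):
    """±1 square wave at frequency f (Hz), evaluated at time t (s)."""
    return 1.0 if (t * f) % 1.0 < 0.5 else -1.0

def sawtooth(t, f):
    """Sawtooth rising from -1 to +1 over each period."""
    return 2.0 * ((t * f) % 1.0) - 1.0

def triangle(t, f):
    """Triangle wave: rises to +1 at mid-period, falls back to -1."""
    x = (t * f) % 1.0
    return 4.0 * x - 1.0 if x < 0.5 else 3.0 - 4.0 * x

# Sampled at 8 kHz for a 400 Hz fundamental, as in the experiments:
FS, F0 = 8000, 400
saw = [sawtooth(n / FS, F0) for n in range(FS // F0)]
```

Each sample would then be quantized to 8 bits and dispatched to the binary-weighted bit groups as described in Section 5.2.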
5.4.8 Sound Recordings
Although digital sound reconstruction has been theorized in several patents and papers, it has not been demonstrated due to the high costs of engineering a uniform speaker array. With the CMOS-MEMS microspeaker arrays presented in Chapter 4, we were able to prove this theory and take DSR to the next step of becoming an emergent technology. To complete our demonstration, we included some sound recordings of DSR in action. Although a complete listing can be found in Appendix B, we will outline the content on the enclosed compact disc (playable in computers and CD players) in this section.

First, we recorded the click, a term initially mentioned in Chapter 2 to mean the snapping of a single membrane or group of membranes. You will quickly understand why we chose that word to describe this action. Then, we recorded three experiments presented in this chapter: (1) the effect of the low-pass filter and (2) the sampling frequency on the reconstruction of a 400 Hz sinusoid, and (3) the reconstruction of a range of frequencies. In all three experiments, we compared the pulse and step methods. One observation that you will notice immediately is the different loudness between the step and pulse methods and how each is affected by the low-pass filter. The presence of the low-pass filter was explained in the discussion in Section 5.4.3.
The final set of recordings consists of music from a wide variety of genres. Before these recordings, we included a short audio clip in which the music is silent but the electronics are still operational. Depending on the reconstruction method and low-pass filter used, you will hear two different forms of background noise. With the pulse method, no input voltage, and no filter, you will hear a constant frequency equal to the sampling frequency. To explain, we need to refer to Section 5.2, where we described how the input analog signal is divided up into quantized levels represented by a binary number. With this designation, the most positive pressure is assigned the highest level (usually 255 for an 8-bit quantization), and the most negative pressure is assigned the lowest level (usually zero). So what happens when no pressure is present? The middle level (around 127 or 128 for 8-bit) is sent to the array. If the pulse method is used, then the bit groups representing that middle level are periodically snapped at a rate equal to the sampling frequency. Without the presence of a low-pass filter, the sampling frequency will be the major contribution of noise. If the step method is used, then these bit groups remain collapsed until the level changes. Thus the primary source of sound using the step method is the electronic and thermal noise in the system.
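The silence behavior described above follows directly from the quantization. A small illustrative sketch (assuming the unsigned 8-bit convention described in Section 5.2):

```python
# Zero pressure maps to the middle quantization level.
levels = 256
mid = int(round((0.0 + 1.0) / 2.0 * (levels - 1)))  # 127 or 128, by rounding convention

# Bit groups active at the mid level. For 127 = 0b0111_1111, the seven lower
# groups (127 speaklets) snap on every sample under the pulse method,
# producing a tone at the sampling frequency; under the step method those
# same groups stay collapsed until the level changes, leaving only
# electronic and thermal noise.
active_groups = [b for b in range(8) if (mid >> b) & 1]
```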
Where does the electronic noise come from? Two sources of noise typically associated with ADC circuits include time jitter and occasional drift outside the input range of the ADC [9]. If we look at the path the input signal takes during this process, you will notice why these noise sources are so influential. The input signal starts from digital samples stored in the CD or other digital media player. The signal is internally converted to an analog waveform and then converted back to 8-bit digital samples and amplified by the hardware setup discussed in Section 5.2. The microphone converts and amplifies the acoustic waveform to an analog voltage. After applying our electronic filter, the analog signal is then again converted back to digital samples for recording onto the computer. Much of this complicated pathway would be unnecessary and contain less noise if we used a music source that generated the digital samples directly.
Similar experiments with the sinusoids are also heard on the CD using music, including the effect of the low-pass filter and sampling frequency. One experiment adjusts the sampling rate from an initial frequency below the low-pass filter cutoff frequency (where the sampling frequency overwhelms the music) to a final frequency well above the cutoff frequency (where the music is mostly heard). Putting these details aside, however, you will find the recordings very impressive for an acoustic technology this early in its development.
5.4.9 Summary
In this section we presented the reconstruction of several types of waveforms, ranging from simple exponential and sinusoidal signals to more complex musical recordings. We started with the sequential pulsing of each bit group in the 8-bit speaklet chip to form a periodic, exponential waveform. This simple experiment highlighted the effect of a bipolar acoustic response on the reconstruction process. Next, we illustrated the bit-by-bit reconstruction of a 400 Hz sinusoid using the step and pulse methods. The pulse method creates an amplitude modulated waveform with a carrier frequency equal to the sampling rate. For this reason, the pulse method initially contains more energy and exhibits a louder sound than the step method. After the application of our low-pass filter, the pulse method considerably reduces in volume but contains a cleaner waveform. We also confirmed that the input voltage and pulse width (during the pulse method) exhibit the same effect on the reconstructed waveform as they do on the acoustic response of a single speaklet. From the reconstruction of several different frequencies, we were able to construct the frequency response of the 8-bit array when digitally driven and found that it performs 10-20 dB better from 50 to 4000 Hz than an analog driven array. Finally, we demonstrated the reconstruction of more complex periodic waveforms such as square and sawtooth waves, and finished off with musical excerpts. Despite the substantial noise created from the ADC and other electronics, the sound recordings included with this thesis are undeniable proof that digitally reconstructed sound will soon replace its analog counterpart to create a truly digital audio system for the future.
Chapter 6: Conclusions and Future Work
This thesis covered a lot of material, from the concepts and issues regarding digital sound reconstruction to the numerous experiments to characterize the performance of our digital speaker arrays. So before we discuss the implications of this research as well as future work to improve the quality of our DSR earphones, it is important to review the material covered in the last five chapters.
We introduced you to the concept of digital sound reconstruction, a process by which discrete acoustic pulses of energy created from an array of speakers or speaklets are summed to produce a time-varying waveform. The resulting pressure variations have a digital magnitude corresponding to the analog value at sampled time values. An n-bit DSR chip needs 2^n - 1 speaklets in the array. As this form of true direct sound is constructed in the time domain, there are several requirements for DSR to be effective. The acoustic pressure response to a finite pulse must be fast compared to the sampling rate. To ensure a linear system, these same responses must add linearly and be uniform over time from one speaker in the array to another.

Developed at Carnegie Mellon University, the CMOS-MEMS microspeaker provides the performance, reliability, and cost-effectiveness needed for an array of speakers. We described the microspeaker fabrication and the dynamic models useful for characterizing individual membranes. From our magnified view of a single speaklet, we then presented the global issues concerning transducer arrays, including harmonic distortion, a means for evaluating the effectiveness of any sound reconstruction process.
Finally, we presented a wealth of experimental data designed to prove the possibility of digitally reconstructed sound and examine its limitations and further areas of improvement. To drive home the ideas behind DSR, we included oscilloscope traces, frequency spectra, and even sound recordings for those who prefer to make judgments based on what they hear. So where does that leave us?
Digital sound reconstruction has been successfully demonstrated. It is clear from the results presented in Chapter 5, however, that there is more research to be done. The experimental data does not completely match the DSR theory, but it does support it. The bipolar acoustic response of our membranes makes it difficult to reconstruct pressure waveforms according to the digital signal sent to the array. The resulting waveforms, however, are periodic with 50-60% harmonic distortion, compared with 0.1-10% from traditional analog speakers. The response time of our current membranes is too slow compared with the 44.1 kHz sampling rate used in the music industry, but the process works nicely around 8 kHz, sufficient for reproducing frequencies under 4 kHz. Through further mechanical and acoustical modeling of the CMOS-MEMS membrane, we can dampen or adjust system resonances and design a speaker to operate at higher sampling rates with minimal oscillations. Through more careful process control during fabrication and testing, we can improve speaker uniformity and linearity to reduce harmonic distortion.
If we are not able to modify the bipolar acoustic response, then the step method may hold the key to generating cleaner signals. The premise behind DSR presented in this research involves the summation of positive pressures to create a time-varying waveform with a positive DC bias. As described in Chapter 5, this bias can be heard during silence without proper filtering. Therefore, an alternative solution would be to create both positive and negative pressures using the step pulse. As shown in Figure 6.1, when a positive unit of pressure is needed, a high voltage is generated and the speaklet collapses to the substrate. The next time a negative unit of pressure is needed, the voltage drops to zero and the speaklet releases to its original position. The displayed acoustic output is really a step response instead of a pulse response, so the same linearity and uniformity issues with pulses apply. With on-chip electronics, the current position of each speaklet could be stored to avoid confusion about which membranes have collapsed and which are released.

Figure 6.1 Oscilloscope trace for the acoustic step response of the MSB when a step pulse is applied. The positive pressure occurs when the high voltage is present and the negative pressure occurs when the voltage is taken away.
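The bookkeeping such on-chip electronics would perform can be sketched as bitwise state tracking. This is a hypothetical illustration; the function name and interface are ours, not a circuit from this thesis:

```python
def step_transition(prev_level, level, n_bits=8):
    """Given the previous and new n-bit levels, return bit masks of the
    groups to collapse (0 -> 1) and to release (1 -> 0) this sample."""
    mask = (1 << n_bits) - 1
    collapse = level & ~prev_level & mask   # groups newly pulled to the substrate
    release = prev_level & ~level & mask    # groups newly let go
    return collapse, release

# Example: moving from level 128 (MSB collapsed) to 127 (seven lower groups)
collapse, release = step_transition(0b10000000, 0b01111111)
```

Storing only the previous level is enough to decide each group's next action, which is exactly the confusion-avoidance role described above.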
The use of on-chip electronics will also allow us to look beyond the simple digital voltage pulses and create an optimal pulse shape for generating a desired acoustic response. Instead of assigning speaklets to a specific bit group, these electronics could dynamically distribute the load over the entire array. It is also conceivable to detect broken membranes and redirect their signals to other backup speaklets. Both of these features would dramatically improve the reliability and performance of the chip.

With the flexibility of CMOS-MEMS technology, there are many opportunities for improving the quality of digitally reconstructed sound. It is only a matter of time before analog transducers are replaced by their digital counterparts towards the construction of a truly digital sound system.
References

[1] Doi et al., "Fluid Flow Control Speaker System," US patent 4194095 (1980).
[2] W.E. Stinger, Jr., "Direct Digital Loudspeaker," US patent 4515997 (1985).
[3] Huang et al., "Distortion and Directivity in a Digital Transducer Array Loudspeaker," Journal of the Audio Engineering Society, vol. 49, pp. 337-352 (2001 May).
[4] P. Morse and K. Ingard, Theoretical Acoustics (McGraw-Hill, New York, 1968).
[5] G.K. Fedder et al., "Laminated High-Aspect-Ratio Microstructures in a Conventional CMOS Process," Sensors and Actuators A 57, pp. 103-110 (1996).
[6] J.J. Neumann, K.J. Gabriel, "CMOS-MEMS Membrane for Audio-Frequency Acoustic Actuation," Sensors and Actuators A 95, pp. 175-182 (2002).
[7] Ladabaum et al., "Surface Micromachined Capacitive Ultrasonic Transducers," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 45, pp. 678-690 (1998).
[8] W.C. Young, Roark's Formulas for Stress and Strain (McGraw-Hill, New York, 6th ed., 1989).
[9] Tewksbury et al., "Terminology Related to the Performance of S/H, A/D, and D/A Circuits," IEEE Transactions on Circuits and Systems, vol. CAS-25, pp. 419-426 (1978).
Appendix A: Circuit Schematics

A.1 Generation of Digital Samples Using A/D Converter

[Schematic: an 8-bit ADC (data outputs D0-D7, VREF+/VREF-, OE, EOC) with input bias VDC = 0 and VAC ≈ 1 Vpp; the digital samples feed the amplifier circuit.]
A.2 High Voltage Amplification Circuit

[Schematic: digital samples in (from LabVIEW or the ADC circuit) are amplified to ±Vdd and drive the array using either the step method or the pulse method.]
A.3 Low-Pass Filter Circuit

3-pole Butterworth filter with 4 kHz 3 dB cutoff frequency. Resistor values of 10, 20, 70, and 178 kΩ correspond to 2, 4, 11, and 22 kHz cutoff frequencies, respectively.

[Schematic: input from the conditioning amplifier (~2 Vpp); output to oscilloscope, spectrum analyzer, etc.]

NOTE: The above 3-pole Butterworth filter circuit was used in conjunction with a 2-pole filter in the B&K conditioning amplifier to form a 5-pole low-pass filter. The magnitude and phase response of a 5-pole (4 kHz cutoff) filter is shown below.

[Plots: magnitude response and phase response of the 5-pole low-pass filter.]
Appendix B: Index of Sound Recordings

200 µs pulses at 1 s intervals using MSB (128 speaklets)
[1] without low-pass filter (LPF)  :10
[2] with 4 kHz 3 dB LPF  :10

Various frequencies (50, 100, 200, 400, 800, 2000, 4000 Hz) sampled at 8 kHz (4 kHz LPF)
[3] pulse method (70 µs pulse width)  :36
[4] step method  :33

Varying sampling rate/LPF combinations (44 kHz w/ 22 kHz LPF, 22 kHz w/ 11 kHz LPF, 8 kHz w/ 4 kHz LPF, 4 kHz w/ 2 kHz LPF) of 400 Hz sinusoid
[5] pulse method (15, 25, 70, 125 µs pulse widths, respectively)  :20
[6] step method  :20

Varying LPF 3 dB cutoff frequencies (no filter, 22, 11, 8, and 4 kHz) of 400 Hz sinusoid sampled at 8 kHz
[7] pulse method (70 µs pulse width)  :26
[8] step method  :26

Silence sampled at 8 kHz with electronics turned on
[9] pulse method (70 µs pulse width)  :11
[10] step method  :10

Excerpt from "If" by Perry Como sampled at
[11] 44 kHz (22 kHz LPF) using pulse method (11.4 µs pulse width)  :30
[12] 44 kHz (22 kHz LPF) using step method  :30
[13] 22 kHz (11 kHz LPF) using pulse method (22.7 µs pulse width)  :29
[14] 22 kHz (11 kHz LPF) using step method  :30
[15] 8 kHz (4 kHz LPF) using pulse method (62.5 µs pulse width)  :29
[16] 8 kHz (4 kHz LPF) using step method  :29
[17] 4 kHz (2 kHz LPF) using pulse method (125 µs pulse width)  :31
[18] 4 kHz (2 kHz LPF) using step method  :30

[19] Excerpt from "Battlestar Galactica" with sampling rate sweep from 1 kHz to 44.1 kHz (4 kHz LPF) using pulse method (50% duty cycle pulse width)  :40

Excerpt from "Pictures at an Exhibition" sampled at 8 kHz using step method with
[20] no low-pass filter  :34
[21] 22 kHz LPF  :33
[22] 11 kHz LPF  :33
[23] 4 kHz LPF  :33
[24] 2 kHz LPF  :34

Song selections sampled at 8 kHz (4 kHz LPF) using step method
[25] "Sleighride" by Leroy Anderson  2:50
[26] "Splashdown" from Apollo 13  2:05
[27] 1st movement from "Symphony No. 5" by Ludwig van Beethoven  1:39
[28] "Also Sprach Zarathustra" by Richard Strauss  1:35