Digital Sound Reconstruction Using Arrays of CMOS-MEMS Microspeakers
Digital Sound Reconstruction Using Arrays of CMOS-MEMS Microspeakers

by Brett Matthew Diamond
B.S., Carnegie Mellon University (2000)

A project submitted in partial satisfaction of the requirements for the degree of Master of Science in Electrical and Computer Engineering, Carnegie Mellon University, 2002.

Advisor: Prof. Kaigham Gabriel
Electrical & Computer Engineering

Committee in charge:
Professor Kaigham Gabriel
Professor Richard Stern

To my friends, family, and those without a cure

Table of Contents

Acknowledgements
Chapter 1: Introduction
  1.1 History of Sound Reconstruction
  1.2 Overview of Research
  1.3 Summary
Chapter 2: Digital Sound Reconstruction
  2.1 Concept
  2.2 Bit-Grouped Digital Array
  2.3 Requirements of Digital Sound Reconstruction
Chapter 3: CMOS-MEMS Microspeaker
  3.1 Introduction
  3.2 Fabrication of CMOS-MEMS Microspeaker
  3.3 Dynamic Modeling of CMOS-MEMS Microspeaker
  3.4 Acoustic Response of CMOS-MEMS Microspeaker
Chapter 4: Digital Speaker Arrays
  4.1 Introduction
  4.2 Characterization of Digital Speaker Arrays
    4.2.1 Uniformity
    4.2.2 Linearity
    4.2.3 Directivity
  4.3 Harmonic Distortion
Chapter 5: Experimental Results
  5.1 Introduction
  5.2 Test Setup
    5.2.1 Generation of Digital Signals
    5.2.2 High Voltage Amplifiers
    5.2.3 Acoustic Measurement and Analysis
  5.3 Characterization of Acoustic Response
    5.3.1 Pulse Amplitude
    5.3.2 Pulse Width
    5.3.3 Pulse Type
    5.3.4 Summary
  5.4 Sound Reconstruction Measurements
    5.4.1 Pulsing of Sequential Bits
    5.4.2 The Bit-by-Bit Reconstruction of a 400 Hz Sinusoid
    5.4.3 Effect of Low Pass Filter
    5.4.4 Characterization of Reconstructed Waveforms
    5.4.5 Effect of Sampling Frequency
    5.4.6 Frequency Characterization
    5.4.7 Reconstruction of Complex Periodic Waveforms
    5.4.8 Musical Recordings
    5.4.9 Summary
Chapter 6: Conclusions and Future Work
References
Appendix
  A. Circuit Schematics
  B. Index of Sound Recordings

Acknowledgements

I walked into graduate school not knowing much about MEMS or acoustics. I was a musician and an engineer with a focus on data storage from my undergraduate years.
But I was inspired and trained by some of the best people that Carnegie Mellon University has to offer, and I would like to thank them individually for their contributions to this research.

First, I would like to express my sincere gratitude towards my research advisor, Ken Gabriel. From his experience in industry, academia, and government, Professor Gabriel provided me with the freedom to explore and the guidance to succeed. His enthusiasm for this project motivated me through the difficult times of graduate school and prevented me from losing sight of my goals. Despite his devotion to his family, he would take a few hours out of his weekend to meet with me when needed. Our professional and personal relationship has provided me with a well-rounded research experience.

Professor Richard Stern has a passion for signal processing and acoustic applications that made him perfect for my committee. I enjoyed taking his DSP course and our frequent discussions on digital sound reconstruction.

Professor Adnan Akay introduced me to the theoretical world of acoustics and tolerated my many questions with incredible patience. I would like to thank him for his counsel and his suggestions regarding my thesis.

My friend and colleague, John Neumann, also deserves many thanks. Although I only have one official advisor, John has been an amazing mentor and a source of countless anecdotes that always seem to put a smile on my face. His practical understanding of acoustics and sound recording made him a gold mine of useful information.

All the devices described in this paper were fabricated at the Carnegie Mellon Nanofabrication Facility. I would like to thank the nanofab staff for their support during the countless hours spent fabricating my devices.

The foundation of any good research requires a healthy environment. The MEMS Laboratory at Carnegie Mellon University provided me with the resources and companionship necessary to survive graduate school.
First, I would like to thank our lab administrator, Mary Moore, for her support, patience, and an endless supply of chocolate that maintained my sugar level well into the late evenings. Next, I would like to thank my officemates: Dan Gaugel, Matte Zeleznick, George Lopez, Kevin Frederick, and Janet Stillman. Their technical knowledge and friendship made my graduate experience worthwhile.

Humor is the cure for almost anything, and one should never cease to befriend those who provide oodles of jokes and laughter. For keeping me sane, I would like to thank my closest friends: Phil Odenz, Mike Beattie, Ed Latimer, Mark Tyberg, Kirstin Connors, and the Trekfest gang.

I would also like to thank two very special people who have been instrumental in my fight against Crohn's Disease: Dr. Richard Duerr and Beth Rothert in the Digestive Disorder Center at UPMC, Pittsburgh. Graduate school would not have been possible for me without their aid.

Finally and most importantly, I would like to thank my immediate and extended family for their constant and strong support throughout my life. My parents, Allen and Harriet Diamond, have shown selfless dedication towards my health, happiness, and success. They taught me to look past the stress of my current situation and to relax and have fun. My brothers, Seth and Eric, were a constant source of advice on social issues and taught me to look ahead towards the future and not dwell on the past. I thank them, as well as my sister-in-law Sandy, and cousins Alicia and Justin, for their advice and support regardless of the time of day.

Chapter 1: Introduction

1.1 History of Sound Reconstruction

The invention of the electric telegraph by Samuel Morse in 1832 stimulated many inventors of that time to find new methods for recording and transmitting messages, including sound and music. Some of the earliest attempts at integrating an acoustic transmitter and electric circuit occurred in Europe, but quickly influenced many American inventors such as Thomas Edison, Alexander G.
Bell, Ernst Siemens, and Emile Berliner. Figure 1.1 highlights some of the important developments in sound reconstruction, beginning with the invention of the "dynamic" or moving-coil transducer in 1874 by Siemens and the telephone in 1876 by Bell. Both inventions involved vibrating a diaphragm with an electromagnet by placing a circular coil of wire inside a magnetic field. Thirty years later, the vacuum tube, used in analog electronics for the first half of the 20th century, was introduced. The voice-coil speaker was replaced in 1921 by the direct-radiator loudspeaker, which employs a magnetically actuated diaphragm to produce sound. This design is the prototype for most current analog speakers.

Figure 1.1: Timeline of Sound Reconstruction
- 1874: "Dynamic" or moving-coil transducer invented by Siemens
- 1876: Telephone invented by Bell
- 1877: Berliner invents the first microphone
- 1906: Fleming invents the first vacuum tube, known as the "thermionic valve"
- 1921: Phonetron, the first direct-radiator loudspeaker
- 1928: Nyquist proves the Sampling Theorem
- 1929: Kellogg invents the electrostatic speaker
- 1947: Invention of the transistor by Shockley et al.
- 1948: Commercial 33-1/3 LP (Long Playing) microgroove disc introduced by Goldmark; Audio Engineering Society (AES) formed
- 1963: Compact stereo tape cassettes and players are developed by Philips
- 1965: Era of digital signal processing (DSP) begins with the application of the Fast Fourier Transform (FFT) by Cooley and Tukey
- 1982: Digital Compact Disc (CD) introduced by a Japanese conglomerate
- 1990: Philips introduces a digital audio tape recorder (DAT) using a digital cassette
- 1996: DVD (Digital Versatile Disc) increases the capacity of digital storage of audio and video from 725 MB to 14 GB per double-sided disc

The invention of the transistor in 1947 begins the changeover from vacuum tubes to a cheaper, smaller, and faster technology: integrated circuits.
In 1965, the application of the Fast Fourier Transform (FFT) to signal processing gave sound electronics the ability to perform real-time filtering and digital/analog conversion. At the same time as the paradigm shift from analog electronics to digital signal processing (DSP), sound recording media have also changed, from the first microgroove disc records in 1948 to digital media like CDs and DVDs. The introduction of digital file formats such as MP3 has increased the popularity of these new media over traditional analog media like cassette tapes and LPs.

1.2 Overview of Research

Despite 130 years of development in sound technology, the transducer is still the only remaining analog component in a world now dominated by digital media and electronics. Therefore, the conversion from analog transducer to digital transducer would be the last "piece of the puzzle" in achieving a completely digital system. This new paradigm of sound reconstruction, referred to as Digital Sound Reconstruction (DSR), would alleviate many of the inadequacies associated with traditional analog speakers. The practical limitations of an analog speaker limit its performance, particularly the frequency response and linearity. For example, it is difficult to produce low frequency sounds with a small speaker. In addition, a digital-to-analog (D/A) converter must be used before electro-acoustic transduction to account for the incongruity between the analog speaker and the digital electronics responsible for filtering and other signal processing. The converter not only increases cost but introduces additional signal distortion.

Digital sound reconstruction is not a recent concept and has been theorized since the early 1980's in several patents [1, 2]. One of the earliest patents involved the design of a fluid flow control speaker system consisting of pipe openings of a specific area that correspond to a bit of a pulse code modulated (PCM) signal.
Another patent describes the hypothetical creation of an array of plastic membranes that are pulsed in time to create a time-varying waveform interpreted by our ears as an analog signal. As far as we can tell, neither of those inventions was ever implemented. Several papers examine the acoustic issues behind the use of a theoretical array of speakers for the purposes of digitally reconstructing sound, including harmonic distortion, directivity, and linearity [3, 4]. Again, these papers reflect simulations based on theory.

1.3 Summary

This thesis serves to cross that theoretical threshold by presenting experimental results of DSR using an array of microspeakers. Chapter 2 describes the DSR concept in detail and explains how a transducer array can be used to digitally reconstruct sound. The trade-offs between DSR and traditional analog speakers are also discussed.

The difficulty in demonstrating digital sound reconstruction stems from the high manufacturing costs associated with the essential high quality array of speakers. However, the solution can be found in a technology that is well developed and useful for making uniform and repeatable components at low cost: Microelectromechanical Systems, or MEMS. This technology can be integrated with conventional CMOS electronics to create devices that intelligently interact with the environment, such as a microphone or speaker. Chapter 3 will cover the fabrication and dynamic modeling of a CMOS-MEMS microspeaker developed at Carnegie Mellon University. This particular speaker design was the foundation of the digital speaker arrays presented in our research.

In Chapter 4, we will branch out from a single microspeaker to discuss the issues concerning transducer arrays. Some of these issues are dependent on the size of the array while others are more pertinent to the mode of operation: analog or digital. Four different speaker arrays useful for studying digital sound reconstruction are also presented, the most current design providing a majority of our data.
The first few chapters of this thesis serve to establish the concept and issues behind digital sound reconstruction. Chapter 5 quantitatively investigates many of these issues and describes the laboratory setup used in our experiments. Afterwards, we will demonstrate that DSR is possible for a wide variety of sounds, including music, and compare how well the theory matches our experimental results.

The last chapter summarizes the research done over the past two years and comments on future work based on the results presented. We feel that when it comes to sound and other acoustical topics, sometimes you have to hear it to believe it. Therefore, in addition to the many pressure waveforms and frequency analyses, we enclosed a CD of recordings made with a CMOS-MEMS digital speaker array. So let us begin...

Chapter 2: Digital Sound Reconstruction

2.1 Concept

Traditional sound reconstruction techniques use one or a small number of analog speaker diaphragms with motions that are proportional to the sound being created. As shown in Figure 2.1, louder sound is generated by greater motion of the diaphragm, and different frequencies are produced by time-varying diaphragm motion.

Figure 2.1: Conventional analog sound reconstruction showing diaphragm position corresponding to different points in the sound waveform.

With Digital Sound Reconstruction (DSR), the desired sound waveform is generated from the summation of discrete pulses of acoustic energy produced by an array of speakers, or speaklets. These pulses or clicks contribute a small portion of the overall sound, so unlike analog speakers, DSR speaklets do not require a large dynamic range. Louder sound is generated by a greater number of speaklets emitting clicks, and different frequencies are produced by time-varying numbers of speaklets emitting clicks.

Figure 2.2 shows a graphical description of digital sound reconstruction. Initially, four speaklet clicks are needed to produce a certain amount of pressure (Figure 2.2(a)).
At the next instant in time, three speaklets are used to generate a slightly smaller pressure (Figure 2.2(b)). When smaller pressures are needed, fewer speaklets are pulsed (Figure 2.2(c)). Conversely, more speaklets are pulsed when larger pressures are needed (Figure 2.2(d)). The resulting summation of sounds or pressure variations produced by the array of speaklets has a magnitude corresponding to the analog value at discrete sampled time values.

Figure 2.2: Digital Sound Reconstruction (DSR) with a hypothetical 15-speaklet (4-bit) chip. An idealized sound pulse (click) is generated from a single speaklet's binary motion. Multiple speaklet binary motions at different times create a sound waveform.

The number of speaklets in the array and the sampling rate (the frequency at which the number of pulsed speaklets is updated) will determine the resolution of the resulting waveform. Since the human ear inherently has the characteristics of a low-pass filter, the listener hears an acoustically smoother signal identical to the original analog signal. In practice, however, various non-idealities associated with the acoustic pulse of a speaklet and the transducer array introduce distortion into the system, preventing an accurate reconstruction of the analog waveform. These issues will be discussed later.

2.2 Bit-Grouped Digital Array

A digital transducer array is required to implement true, direct digital reconstruction of sound. In a bit-grouped digital array, each transducer is assigned a bit group, where the number of transducers in each group is binary weighted [3]. For example, the 8-bit digital array shown in Figure 2.3 has eight different groups of transducers. The most significant bit (MSB) contains 2^7 or 128 transducers, while the least significant bit (LSB) is represented by a single speaker. In general, an n-bit array will have 2^n - 1 transducers, and the m-th bit of that array will contain 2^(m-1) transducers. When the signal for a particular bit is high, all of the transducers in the group assigned to that bit are activated for that sample interval.

Figure 2.3: Conceptual diagram of an 8-bit transducer array chip. The number of transducers in each group is binary weighted. Bond pads surrounding the array supply driving voltage from the package.

2.3 Requirements of Digital Sound Reconstruction

In Section 2.1 of this chapter, the acoustic response or click of a speaklet was idealized as a rectangular pulse. It will be shown later that in practice, this pulse can never be perfectly rectangular. Regardless of the shape of the acoustic response, however, several requirements of these responses are essential to digitally reconstruct sounds using an array of transducers:

- The acoustic response of a single transducer should be fast, on the order of 10's of microseconds. This includes the time it takes for the transducer to respond to the impulse as well as the time it takes for the response to decay to negligible levels. This will limit the sampling rates that can be used in converting digital information to an analog acoustic waveform. The response times in this thesis will be measured from the start of the impulse to the maximum pressure of the acoustic response.

- The acoustic response must be repeatable over time and uniform across all speaklets in the array. Variations in uniformity will introduce error into the digital-to-analog conversion. Designing the array on a single chip will minimize process variations during fabrication and control the overall speaklet uniformity.

- Regardless of whether the acoustic responses from multiple speaklets are linear or non-linear, the resulting acoustic energy from those speaklets must add linearly. This implies that the total pressure field in the region is the superposition of all pressure fields generated from the speaklets, or written mathematically:

    p_total(t) = sum_{n=1}^{N} (A_n / l_n) * sin( omega * (t - l_n / c) + Phi_n )

  where N is the number of speaklets in the array, l_n is the path length from each speaklet to the listening point, and c is the speed of sound (343 m/s). A_n, omega, and Phi_n are the source amplitude, frequency, and initial phase, respectively.
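To make the linearity requirement concrete, the superposition sum above can be evaluated numerically. This is an illustrative sketch, not code from the thesis; the amplitudes, path lengths, and drive frequency below are hypothetical values:

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s (as used in the text)

def total_pressure(t, amplitudes, path_lengths, omega, phases):
    """Superposition of N speaklet pressure fields at one listening point:
    p(t) = sum_n (A_n / l_n) * sin(omega * (t - l_n / C) + phi_n)."""
    t = np.asarray(t, dtype=float)
    p = np.zeros_like(t)
    for a_n, l_n, phi_n in zip(amplitudes, path_lengths, phases):
        p += (a_n / l_n) * np.sin(omega * (t - l_n / C) + phi_n)
    return p

# Hypothetical example: four identical speaklets 5 mm from the microphone.
t = np.linspace(0.0, 2.5e-3, 1001)   # 2.5 ms observation window
omega = 2.0 * np.pi * 400.0          # 400 Hz drive
p_four = total_pressure(t, [1e-6] * 4, [5e-3] * 4, omega, [0.0] * 4)
p_one = total_pressure(t, [1e-6], [5e-3], omega, [0.0])
# If the fields add linearly, four co-located speaklets yield 4x the pressure of one.
```

With equal path lengths and phases, the sum reduces to a simple scaling, which is exactly the property the bit-grouped drive scheme relies on.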
Without linearity, the summation of acoustic energy (as shown in Figure 2.2) cannot be predicted.

Chapter 3: CMOS-MEMS Microspeaker

3.1 Introduction

Current MEMS applications require increasingly complex microsystems and more computational power. One approach to satisfying these technological demands is to incorporate MEMS structures with conventional CMOS electronics processing, commonly referred to as CMOS-MEMS integration. Desirable for systems with arrayed microsensors (e.g., microphones) and microactuators (e.g., speakers), this form of integration can leverage the speed, reliability, and economic benefits of CMOS processing. For example, on-chip electronics can reduce parasitic capacitance from interconnects between the electronics and structures as well as decrease packaging costs.

A variant of this CMOS-based micromachining process, developed at Carnegie Mellon University, has many advantages over other CMOS-MEMS techniques [5]. This process, described in the next section, can be used to produce high-aspect-ratio structures with narrow beam widths and gaps, with limited post-processing steps that are safe for CMOS electronics.

The design of the CMOS-MEMS microspeaker developed by Neumann et al. [6] provides the foundation for demonstrating digital sound reconstruction. A serpentine metal and oxide mesh pattern, shown in Figure 3.1, is repeated to form structures with dimensions up to several millimeters. The patterns are realized in a CMOS chip, then etched and released to form a suspended mesh, typically 5 to 60 µm above the substrate. The mesh is coated with a polymer to create an airtight membrane, and then electrostatically actuated by applying a varying electrical potential between the CMOS metal and the silicon substrate. The resulting out-of-plane motion is the source of the pressure waves that produce sound.

Figure 3.1: SEM of serpentine mesh (1.5 µm beams and gaps) after release from the substrate.
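For a rough sense of scale of this actuation mechanism, the electrostatic pull-down pressure can be estimated with a simple parallel-plate model. This is an illustration only, not a calculation from the thesis: the parallel-plate approximation ignores fringing fields and membrane curvature, and the bias and gap values below are hypothetical:

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, F/m

def electrostatic_pressure(voltage, gap):
    """Attractive pressure (Pa) between parallel plates separated by `gap`
    metres with `voltage` volts across them: P = eps0 * V^2 / (2 * d^2)."""
    return EPSILON_0 * voltage ** 2 / (2.0 * gap ** 2)

# Hypothetical drive: 40 V across a 10 um membrane-substrate air gap.
p_pull = electrostatic_pressure(40.0, 10e-6)  # on the order of 70 Pa
```

Even tens of pascals of distributed pressure are enough to deflect a thin, compliant membrane, which is why modest drive voltages can produce audible clicks.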
3.2 Fabrication of CMOS-MEMS Microspeaker

In this section, we will describe the process used to create the original CMOS-MEMS microspeaker. The CMOS chip comes from a foundry facility (e.g., MOSIS, AMS) with a protective layer of silicon dioxide. As shown in Figure 3.2(a), regions meant for mechanical structures are usually patterned in one or more of the metal layers. To minimize the thickness of the membrane, only the bottom metal layer is utilized in the microspeaker mesh design.

Figure 3.2(a): 3D view of the CMOS chip from the foundry facility, covered with a protective layer of silicon dioxide (overglass) above the metal layers, CMOS circuitry, and silicon substrate.

Before releasing the CMOS structures that define the speakers on the front side of the chip, vent holes must be patterned and etched to connect the membrane cavity with the backside of the chip (Figure 3.2(b)). These holes, generally 15 to … µm in diameter, are necessary to reduce the acoustic impedance behind the membrane and reduce unwanted resonances that would cause oscillations in the acoustic response.

Figure 3.2(b): 3D view of the CMOS chip after the backside vent holes have been etched, leaving ~50 to 60 µm of silicon remaining. In some cases, the oxide layer can be used as an etch-stop.

After the vent holes are etched, processing of the front side continues as follows:

(1) The silicon dioxide is etched anisotropically (directionally) down to the silicon substrate using a reactive-ion etch system. The top metal layer acts as a mask and protects the CMOS mesh structures (Figure 3.2(c)).

Figure 3.2(c): Cross-sectional view of the CMOS chip after the oxide etch, showing the composite structural layer and exposed silicon.

(2) The underlying silicon substrate is etched isotropically (in all directions) to undercut and release the mechanical structures, which can now be used as electrodes for sensing and actuation, or as wires for connecting to on-chip circuitry (Figure 3.2(d)). The resulting cavity can be etched using an SF6 plasma or Xenon Difluoride (XeF2). An anisotropic deep reactive-ion etch (DRIE) can precede the release etch to form much larger cavities without removing the silicon sidewalls that isolate different structures. Typically, only the isotropic etch is required for cavity depths less than 20 µm.

Figure 3.2(d): Cross-sectional view of the CMOS chip after the silicon release etch, showing the etched cavity.

(3) In the final step, the released CMOS-MEMS mesh is coated with a CHF3 polymer in a chemical vapor deposition (CVD) process. The polymer conforms to all sides of the mesh-like beams until all gaps are sealed, thus creating an airtight suspended membrane (Figure 3.2(e)). The metal layers inside the beams allow the membrane to be electrostatically actuated.

Figure 3.2(e): Cross-sectional view of the CMOS chip after polymer deposition. The sealed membrane can be electrostatically actuated by applying a potential between the metal beams and the substrate.

For speaker chips containing a single membrane or a few membranes, and provided that the membrane cavities are at least 40 µm deep, we can guarantee with the above process that all speakers will be sufficiently vented. However, with an array of 256 speakers that are less than 250 µm in size suspended over a 10 - 15 µm deep cavity, this task becomes daunting.

An alternative method was adapted for the large speaker arrays used in this research. Instead of patterning vent holes, we etched the backside of the chip using a combination of anisotropic and isotropic DRIE steps until 50 - 100 µm of silicon remained (Figure 3.3(a)). Approximately 500 µm of the outer edges of the chip were masked to provide support during future processing steps. An isotropic etch was added to the recipe to minimize the loading effects that occur with large areas of silicon, which result in highly non-uniform etch profiles. After the usual oxide etch described in step (1), vent holes are then patterned and etched on the front side of the chip (instead of the backside).

Figure 3.3(a): 3D view of the CMOS chip after the modified backside etch, leaving 50 to 100 µm of silicon remaining.
As shown in Figure 3.3(b), this modification allows us to properly vent all membranes with cavities that are only a few microns deep. After the vent hole etch reaches the depth achieved through the backside etch (which is indicated by the presence of light when shined from a microscope back source), release of the membranes can continue as described in steps (2) and (3).

Figure 3.3(b): Cross-sectional view of the CMOS chip after the vent holes have been patterned and etched from the front.

3.3 Dynamic Modeling of CMOS-MEMS Microspeaker

The composite materials and serpentine pattern make the CMOS-MEMS membrane a complex structure to accurately model. However, approximations can be made to obtain useful information from a first- or second-order analysis. In a paper on micromachined electrostatic transducers, Ladabaum et al. represent a transducer as a first-order lumped electro-mechanical model, consisting of a linear spring, a mass, and a parallel-plate capacitor [7]. The motion of the membrane can be expressed in a force-balance equation:

    F_ELECTROSTATIC + F_SPRING = F_MASS

where F_ELECTROSTATIC and F_SPRING are the forces exerted by the capacitor and spring, respectively. In order to represent the membrane as a simple mass-spring system, several approximations were necessary. First, the membrane's restoring force (opposing the direction of motion) was assumed to be a linear function of its displacement: F_SPRING = -kx. Secondly, the electrical fringing fields and the curvature of the membrane were neglected. Finally, damping from the air or other surrounding medium was neglected (the membrane was modeled in a vacuum) to simplify the mathematics. As will be shown later, the damping from the air underneath the sealed membrane has a drastic effect on the response of the system. In either case, the model shows that at some applied voltage, the electrostatic force overcomes the spring's restoring force and the membrane collapses.
This collapse point occurs when:

    V_COLLAPSE = sqrt( 8 k d_0^3 / (27 epsilon S) )

where S is the area of the membrane, epsilon is the electric permittivity, d_0 is the separation between the membrane and substrate at rest, and k is the spring constant. The presence of a thin layer of oxide (170 Å) underneath the metal structures prevents the membrane and substrate from shorting after collapse. This layer was neglected in the derivation of the collapse voltage since d_0 >> d_INSULATOR. After collapse, however, the membrane will not snap back until the voltage drops below V_SNAP-BACK:

    V_SNAP-BACK = sqrt( 2 k d_INSULATOR^2 (d_0 - d_INSULATOR) / (epsilon_INSULATOR S) )

This hysteretic behavior was predicted and experimentally verified for a CMOS-MEMS microspeaker, as shown in Figure 3.4.

Figure 3.4: Hysteretic behavior of membrane displacement as a function of DC voltage. As the applied voltage increases, the membrane collapses around 48 Volts, but does not snap back until the voltage drops below 22 Volts.

In order to maximize system efficiency, it would be optimal to drive these membranes at their resonance frequency, for which the deflection is largest. If we simplify the serpentine metal-oxide mesh and polymer composite as a flexural plate with mass dominated by the polymer, the resonance frequency, f_R, can be written as:

    f_R = (35.99 t / (2 pi a^2)) * sqrt( E / (12 rho (1 - nu^2)) )

where the Young's Modulus, E, Poisson's ratio, nu, membrane side length, a, thickness, t, and density, rho, are known or estimated [8]. If we use an experimentally determined Young's Modulus value of 500 MPa for the mesh-polymer structure, then we would expect a 1.4 mm x 1.4 mm CMOS-MEMS membrane (3 µm thickness) to have a resonance frequency around 1.5 kHz. If we decrease the membrane side length to 216 µm and the thickness down to 1.3 µm, then the predicted resonance frequency jumps to 27.5 kHz. A frequency response curve of the membrane will validate our predicted results as well as give information concerning the effectiveness of the membrane when driven as an analog speaker.
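The two resonance predictions quoted above can be checked numerically with the clamped-square-plate formula (fundamental-mode constant lambda^2 ≈ 35.99 from classical plate theory). This is a sketch, not code from the thesis; the density and Poisson's ratio are assumed values (they are not stated in this excerpt), chosen as typical for a polymer membrane:

```python
import math

def plate_resonance_hz(E, rho, nu, a, t):
    """Fundamental resonance of a clamped square plate, side a, thickness t:
    f_R = (35.99 * t / (2*pi*a^2)) * sqrt(E / (12 * rho * (1 - nu^2)))."""
    return (35.99 * t / (2.0 * math.pi * a ** 2)
            * math.sqrt(E / (12.0 * rho * (1.0 - nu ** 2))))

E = 500e6     # Young's modulus of the mesh-polymer composite, Pa (from the text)
rho = 1500.0  # density, kg/m^3 -- assumed value
nu = 0.3      # Poisson's ratio -- assumed value

f_large = plate_resonance_hz(E, rho, nu, a=1.4e-3, t=3.0e-6)  # ~1.5 kHz
f_small = plate_resonance_hz(E, rho, nu, a=216e-6, t=1.3e-6)  # ~28 kHz
```

With these assumptions the two membranes come out near 1.5 kHz and 28 kHz, consistent with the values quoted in the text.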
Figure 3.5 shows the frequency response curve of a 216 µm x 216 µm CMOS-MEMS membrane, measured using an acoustic setup described in Chapter 5. To accommodate the small pressure output of a single 216 µm membrane, we applied the 20 V bias and 10 Vp-p sinusoidal input to a group of 128 speakers. As the microphone has its own resonance frequencies that influence the measured output, we overlaid a typical microphone sensitivity plot in the figure to identify the resonant peaks around 15 kHz. There are several additional peaks between 20 and 45 kHz that could be either the membrane resonance or a harmonic of the microphone and cavity resonances. With further tests, we will be able to better identify the sources of these resonant peaks.

Figure 3.5: Frequency response curve for a group of 128 CMOS-MEMS square membranes, each 216 µm on a side. Resonant peaks are due to the earphone housing (~6 kHz), microphone sensitivity (~15 kHz), and membrane resonance (20-45 kHz). The dotted line shows a typical microphone sensitivity plot, with a resonance around 12 kHz.

3.4 Acoustic Response of CMOS-MEMS Microspeaker

Clearly, there is a great deal more modeling that can be done to gain a better understanding of the membrane dynamics. However, as this thesis is meant to be a proof of digital sound reconstruction, it makes more sense to spend time characterizing the acoustic response of the membrane. An acoustic response is the pressure waveform created by one or more speakers in response to an electrostatic input; in the case of our research, a finite high voltage pulse. Figure 3.6 shows the acoustic response of a group of 216 µm square-shaped membranes given a 200 µs, 40 Volt pulse.

Figure 3.6: Combined acoustic response of 128 speaklets (216 µm on a side) to a 40 Volt, 200 µs pulse (also shown).
There are several observations that can be made from this response, the first of which is the time it takes the membrane to respond to the pulse. One possibility for the source of this delay is the propagation time for the acoustic pressure to reach the microphone, typically a few millimeters from the speaker array. However, that only contributes 10 - 15 µs of the response time. Therefore, the majority of the delay must come from the response time of the membrane.

The second observation is the bipolar nature of the acoustic response. The positive peak arises from the membrane collapsing towards the substrate while the negative peak arises from the release of the membrane. In most cases, the first positive peak is greatest in amplitude and responsible for most of the pressure. However, it is clear that if digital sound reconstruction relies on the superposition of pressures in time, then the negative peaks will counteract positive peaks produced by other membranes. As will be discussed in Chapter 5, there are several other consequences of this behavior. Chapter 6 will present some suggestions that might reduce or possibly avoid this behavior in the future.

The final observation is the oscillatory nature of the acoustic response. Ranging from 10 - 20 kHz in frequency, these oscillations are caused by the resonances in the system, including the membrane, cavity, and microphone. In Section 3.3, we showed that these resonances occur between 6 kHz and 45 kHz. As a consequence, the oscillations will appear in digitally reconstructed waveforms and thus contribute energy to upper harmonics of the system.

Chapter 4: Digital Speaker Arrays

4.1 Introduction

To demonstrate the digital sound reconstruction concept, we constructed four different speaker array chips based on the CMOS-MEMS membrane design described in Chapter 3. The original 3-bit array prototype consisted of seven separate 1.4 mm x 1.4 mm membrane chips mounted on a TO-8 package with holes drilled underneath for proper venting (see Figure 4.1).
Four of the speaklets were electrically tied together to form the most significant bit (MSB), another two were tied together to form the next most significant bit, and the remaining one speaklet formed the least significant bit (LSB).

Figure 4.1 3-bit (7 speaklet) array mounted on a TO-8 package. Under the chips, vent holes have been drilled through the package. Unused holes are filled to prevent air leakage.

This design allowed us to study the individual acoustic response of each speaklet as well as reconstruct simple sinusoidal signals. As will be shown later, variations in the fabrication of separate chips (even when processed together) make it impractical and difficult to characterize the array performance. Therefore, the digital speaker arrays shown in Figures 4.2 and 4.3 were designed with multiple speaklets integrated on the same chip, with the speaklets electrically isolated and connected to separate bond pads. Furthermore, multiple chips could be arranged on the same package to create larger bit arrays.

Figure 4.2 Integrated array of four 1.4 mm membrane speaklets. Chip size is 5 mm x 5 mm.

Figure 4.3 3-bit array containing octagonal 900 µm diameter speaklets. Chip size is 5 mm x 2.5 mm.

Based on the initial results measured with these arrays, an 8-bit (255 speaklet) array was designed to balance the resolution necessary to demonstrate digital sound reconstruction with the high costs of fabricating centimeter-scale chips. Shown in Figure 4.4, the 8-bit array contains 256 square speaklets, each 216 µm on a side.

Figure 4.4 8-bit array containing 255 square membranes, each 216 µm on a side. Total chip size is 5.2 mm x 5.2 mm.

As it would be impractical to wire each speaklet in the 8-bit array to its own bond pad, the speaklets were divided up into eight electrically isolated regions, as described in the section on bit-grouped digital arrays. Although we lose the ability to actuate an individual speaklet, only nine bond pads are required: eight for the bit groups and one for the substrate tied to ground.
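The bit grouping described here can be sketched in a few lines: bit k of an 8-bit sample drives a group of 2^k speaklets, so the number of active speaklets always equals the sample value. (`active_speaklets` is a hypothetical helper for illustration, not code from this work.)

```python
def active_speaklets(sample):
    """Speaklets fired per bit group (LSB first) for an 8-bit sample (0-255)."""
    assert 0 <= sample <= 255
    # bit group k controls 2**k speaklets; all eight groups together total 255
    return [2**k if sample & (1 << k) else 0 for k in range(8)]

counts = active_speaklets(200)      # 200 = 0b11001000
print(counts, sum(counts))          # [0, 0, 0, 8, 0, 0, 64, 128] 200
```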
Most of the acoustic response and sound reconstruction measurements detailed in the next chapter, as well as the recordings included with this thesis, were made using the 8-bit speaklet chip.

4.2 Characterization of Digital Speaker Arrays

4.2.1 Uniformity

As mentioned in Chapter 2, any variations in uniformity will introduce error into the digital to analog reconstruction process. We can examine non-uniformity at the individual speaklet level as well as the bit group level. The collapse voltage described in Chapter 3 can be a useful indicator of non-uniformities between speaklets throughout the array. Figure 4.5 shows a 3D graph, divided into 16 x 16 data points, of collapse voltages for the 8-bit array. To accurately measure the collapse voltage for all 256 speaklet membranes, we increased the voltage in 0.1 Volt increments and marked individual membranes under the microscope as they collapsed. The mean collapse voltage and standard deviation are 28.0 Volts and 3.28 Volts, respectively. Most of these variations occur during the fabrication process described in Chapter 3, but once exposed to an environment outside the cleanroom, dirt and other destructive particles can deposit on the membranes and potentially affect speaker operation.

Figure 4.5 Three-dimensional profile of collapse voltages for all 256 speaklet membranes in the 8-bit array. Max: 34.3 V, Min: 18 V.

If we want to look at uniformity from one bit group to another, the best way is to find the normalized acoustic response for each group. In this case, normalization involves measuring the acoustic response of a particular bit group and dividing that response by the number of speaklets in the bit group. Figure 4.6 shows the original and normalized acoustic responses of each group in the 8-bit array given a 40 Volt, 200 µs pulse.
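The normalization step is just a per-sample division by the group's speaklet count, and a zero-lag normalized cross-correlation then quantifies how similar two groups' shapes are. A minimal sketch with invented waveforms (the five-sample responses below are toy data, not measurements):

```python
import math

def normalize_response(response, n_speaklets):
    """Per-speaklet acoustic response of a bit group."""
    return [p / n_speaklets for p in response]

def correlation(a, b):
    """Zero-lag normalized cross-correlation; 1.00 means identical shapes."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# toy responses: a 2-speaklet group and a 1-speaklet group with similar shapes
group_a = normalize_response([0.0, 2.1, -1.3, 0.2, 0.0], 2)
group_b = normalize_response([0.0, 1.0, -0.6, 0.1, 0.0], 1)
print(round(correlation(group_a, group_b), 3))  # close to 1.0
```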
The response from the MSB exhibits fewer oscillations than the other bits, probably due in part to the superposition of many speaklets that have slightly different acoustic responses. Despite the variations in the collapse voltage shown in Figure 4.5, the shapes of the normalized acoustic responses in Figure 4.6(b) have an average cross-correlation value of 0.95, where 1.00 implies a perfectly identical match.

Figure 4.6 (a) The acoustic response of each bit group of an 8-bit array given a 40 Volt, 200 µs pulse. (b) The normalized acoustic response of (a) can be found by dividing each acoustic response by the number of speaklets.

4.2.2 Linearity

Although the acoustic responses of the individual CMOS-MEMS membranes are highly nonlinear with respect to excitation voltage, linearity in the superposition of multiple speaklet responses is essential to digital sound reconstruction. Otherwise, the conversion from digital samples to analog pressures would be non-linear and would require correction of the input signal. The maximum pressure of the acoustic responses gives a good indication of the array linearity. As shown in Figure 4.7, a linear trendline with a coefficient of determination R² = 0.9985 can be used to approximate these values. The R-squared value, or coefficient of determination, displays how closely the estimated values of the trendline match the actual data. As the R-squared value approaches unity, the trendline becomes more accurate. A value of R² greater than 0.95 is considered statistically linear, so with our value of 0.9985, the superposition of acoustic responses between bit groups is linear.

Figure 4.7 Maximum pressure for the acoustic response of each bit group given a 40 Volt, 200 µs pulse.

4.2.3 Directivity

In discussions involving speaker arrays, it is common to examine how the sound pressure fields created from individual speakers interact with each other.
Depending on the geometry of the array and the frequencies of interest, the resulting pressure field can vary with listening position, thereby giving a sense of direction to sound created by the array. There are many applications for which the production of directional sound is useful, such as sound recording, military communications, and sonar.

There are generally two sources for the creation of a pressure field that varies with listening position. If the size of the speaker is large compared with the wavelengths of interest, then the speaker must be modeled as a sound source that exhibits non-uniform and frequency-dependent radiation properties. If the size of the array (or maximum distance between speakers) is comparable to the wavelengths of interest, then the path difference from one listening position to another will vary considerably and cause irregularities in the sound field.

As the dimensions of typical CMOS-MEMS membranes range between 100 and 2000 µm, their size would not become an issue for frequencies in the audio (or even low ultrasonic) range. Although the size of a speaker array is comparable to a quarter-wavelength for frequencies above a few kHz, the variation in path difference between two different speakers is minimal. In fact, for most in-ear applications, the listening position (i.e. the eardrum) is fixed and receives a constant pressure field. Therefore, it is safe to assume that directivity is not an issue in digital sound reconstruction using CMOS-MEMS technology for earphone applications.

4.3 Harmonic Distortion

The harmonic distortion, Dh, can be useful in evaluating the effectiveness of any sound reconstruction process and is defined as:

Dh = ( Σ(n=2..N) An ) / ( Σ(n=1..N) An )

where An is the Fourier coefficient of the nth harmonic in the sound waveform. From this equation, Dh can be viewed as the ratio of the sum of harmonic components, excluding the fundamental (n=1), to the whole spectrum of the signal.
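In practice, the An come from spectrum-analyzer readings in dBV (the conversion is detailed at the end of this section), and the ratio above is then a one-liner. A sketch with hypothetical readings:

```python
def dbv_to_power(dbv):
    """Convert a spectrum-analyzer reading in dBV(rms) into a power coefficient."""
    v_rms = 10.0 ** (dbv / 20.0)    # invert dBV = 20*log10(Vrms)
    return v_rms ** 2

def harmonic_distortion(powers):
    """D_h: harmonic power (n >= 2) over total power; powers[0] is the fundamental."""
    return sum(powers[1:]) / sum(powers)

# hypothetical readings: fundamental at -20 dBV plus three weak harmonics
readings_dbv = [-20.0, -55.0, -60.0, -65.0]
A = [dbv_to_power(d) for d in readings_dbv]
print(f"THD = {harmonic_distortion(A) * 100:.3f}%")  # THD = 0.045%
```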
The harmonic distortion of a perfectly linear analog array, in which the speakers are driven by the same sinusoidal signal without the presence of distortion, is zero at all locations since it does not contain any harmonics other than the fundamental. The harmonic distortion for commercial analog earphones ranges from 0.1% to 1% for high quality speakers and 1% to 10% for lower quality speakers. These values vary with frequency and do not include an additional fractional percent distortion from the electronics.

For a digital transducer array, Dh will have nonzero values that vary with different listening locations [3]. Non-uniformities in the acoustic response between speaklets, described in Section 4.2.1, contribute to harmonic distortion by introducing higher harmonic components to the resulting waveform. If the size of the array is comparable to the wavelength of the signal, then any phase discrepancy between two speaklets, resulting from different paths to the listening point, will also contribute some harmonic distortion. As mentioned in the previous section, however, this issue is not applicable for earphone applications of DSR.

Figure 4.8 Example of spectral analysis taken during digital sound reconstruction experiments. The frequency range studied was 20 to 20,000 Hz. Measured sound levels are given in units of decibel-Volts or dBV.

For cases where directivity does become an issue, the array geometry and speaklet arrangement will have a considerable effect on harmonic distortion. Initially, one might think that increasing the number of bits in a digital array would improve sound quality due to the increased resolution (or quantization levels). However, Huang et al. report via simulations that increasing the number of speaklets past a threshold will increase the path difference between speaklets, and thus negatively affect sound quality through increased harmonic distortion.
The value of Dh, also referred to as total harmonic distortion (THD), can be calculated using data collected from the spectrum analyzer. As shown in Figure 4.8, the analyzer can give us the strength of a particular frequency in terms of dBVrms, or decibel-Volts. To convert dBVrms into power, we need to first back out the Vrms with the following formula:

dBVrms = 20 log10(Vrms)  =>  Vrms = 10^(dBVrms / 20)

To obtain the power, or Fourier coefficient An, at each frequency, we simply square each Vrms value. As the frequency range of the human ear does not extend beyond 20 kHz, we will concern ourselves only with the spectral power of frequencies between 20 Hz and 20 kHz.

Chapter 5: Experimental Results

5.1 Introduction

Now that the theory and issues behind digital sound reconstruction have been presented, we are ready to demonstrate that DSR is possible and compare how well the theory matches our experimental results. We performed our first experiments using the 3-bit (7 speaklet) array described in the last chapter. With this chip, we were able to: (1) begin characterizing the effect of the pulse voltage and pulse width on the acoustic response of a single speaklet; and (2) demonstrate the first attempts at digitally reconstructing sinusoidal waveforms. Figure 5.1 shows representative data from the 3-bit DSR experiments, including (a) the acoustic response of a 1.4 mm speaklet membrane to a 90 Volt, 200 µs pulse, and (b) the reconstruction of a 500 Hz sinusoidal wave using the 3-bit array.

We made several observations from these initial sets of experiments. First, the response time of the acoustic response is relatively slow (~250 µs) for most audio reconstruction applications and, as discussed in Chapter 2, limits the precision of the reconstructed waveform.
Second, we applied an A-weighted filter, typically used in acoustic noise measurements, to all measurements. A-weighted filters attenuate the low frequency electrical noise in the room and high frequency oscillations from various system resonances, resulting in a smoother acoustic response and reconstructed waveform. Although this filter gave better results, it limited our ability to reconstruct low frequency waveforms or control the filtering step in the digital sound reconstruction process.

Figure 5.1 (a) Acoustic response of a single 1.4 mm membrane given a 90 Volt, 200 µs pulse, also shown. (b) Reconstructed 500 Hz sinusoidal wave using the original 3-bit array.

Despite these considerations, the most important result was that we were able to reconstruct a 500 Hz sinusoidal-like waveform using discrete, digital acoustic pulses. With only seven separate speaklets (instead of hundreds to thousands of speaklets integrated on the same chip), we were able to prove our theory of digital sound reconstruction.

The next step was to design and fabricate a new set of chips that could produce a higher quality DSR waveform through improved uniformity, faster response, and higher resolution. This ultimately led to the creation of the 8-bit array described in Chapter 4. With 255 speaklets, each 216 µm on a side, we were now able to further explore the issues relating to digital sound reconstruction in detail and avoid the fabrication of hundreds of chips. By the time the 8-bit chip had been fabricated, the test bed used for these experiments had tripled in complexity and capability. It also became evident that the A-weighting filter was inappropriate for this research, and it was not used in future experiments. With our equipment setup and chips available, we were ready to change the world....

5.2 Test Setup

This section describes the software and hardware used to take the measurements reported in this thesis.
An extensive and flexible test bed was necessary to run a wide range of digital sound reconstruction experiments and to sufficiently characterize the acoustic response of CMOS-MEMS membranes.

5.2.1 Generation of Digital Signals

We used two methods for creating digital signals from an analog waveform: (1) direct generation of digital samples; and (2) an analog-to-digital converter (ADC). We reconstructed simple sinusoidal, square, triangle, and sawtooth waveforms using Labview, a programming-based platform that can generate signals with a data acquisition (DAQ) card. The interface shown in Figure 5.2 lets the user choose a periodic waveform to create, the frequency of that waveform, and the sampling rate for the digital output. The amplitude values of the sampled analog waveform are converted into binary numbers, where each bit contains the digital information for a specific group of speaklets in the array. Therefore, with eight bit channels, we can create a signal having 256 distinct levels (0 to 255). As expected, we can create higher resolution waveforms with more bits because of the increased number of levels. By turning off any of these channels, we can examine how different bit weightings affect the reconstructed output.

Figure 5.2 Labview-based interface that creates an 8-bit digital representation of a simple periodic waveform. The user can control the waveform frequency, sample rate, and pulse width, and choose between sine, square, sawtooth, and triangle waves. A manual mode option allows the creation of a user-defined pattern.

This approach is cumbersome for anything more than a simple periodic waveform, so for more complex waveforms such as music, a realtime hardware implementation is needed. To accomplish this, we used an 8-bit analog-to-digital converter (ADC), a clock generation circuit for the sampling frequency, and a circuit to adjust the input signal to fit inside the input range of the ADC (see Appendix A).
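The sample-to-bit-channel conversion can be sketched as follows. `sample_to_bits` is a hypothetical helper standing in for the Labview code, which is not reproduced here:

```python
import math

def sample_to_bits(x):
    """Quantize x in [-1, 1] to an 8-bit level and split it into bit channels."""
    level = min(255, max(0, round((x + 1.0) / 2.0 * 255)))
    return [(level >> k) & 1 for k in range(8)]    # LSB first

# one period of a 400 Hz sine sampled at 8 kHz, as in the experiments
fs, f = 8000, 400
frames = [sample_to_bits(math.sin(2 * math.pi * f * n / fs)) for n in range(fs // f)]
print(len(frames), frames[0])   # 20 frames; the first sample sits at mid-level
```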
The ADC circuit could have been used to generate the simpler waveforms, but the Labview approach can generate more consistent and controllable digital output useful for initial testing of our speaker arrays.

As shown in Figure 5.3, the digital signal must be logically ANDed with another signal to ensure that the speaklets are pulsed for each sample period containing a digital high. The duty cycle of this signal can control the width of the pulse without affecting the frequency. Applying this type of signal is called the pulse method of sound reconstruction.

Figure 5.3 Description of the pulse method: the digital input is logically ANDed with a clock signal defined by the sampling frequency and a variable duty cycle (40% shown) to determine the pulse width. The result is a signal with finite pulses occurring at each sample period with a digital HIGH.

If the digital signal is sent to the array without multiplexing with the pulse signal, then the position of the speaklets will be determined by the logic level of the digital signal. As the majority of acoustic pressure is generated during a transition from one state to another (i.e. the membranes are snapped or released), the generated signal will appear different than when the speaklets are pulsed. This method is called the step method of sound reconstruction. The differences between these two reconstruction methods will be presented later in this chapter.

5.2.2 High Voltage Amplifiers

As both the DAQ card and the analog-to-digital converter chip are incapable of generating the 20 to 50 Volts currently necessary to collapse the CMOS-MEMS membranes, we amplified the digital data before driving the speaker arrays. We explored two methods for high voltage amplification: (1) high voltage analog amplifiers, and (2) high voltage MOSFETs. The analog amplifier can be used for a wide variety of waveforms while the high voltage MOSFETs can only generate high voltage pulses with 10-20 µs switching speeds.
As each bit channel requires its own amplifier, however, the large expense and size of the analog amplifiers outweigh the usefulness of generating a wider variety of signals. The circuit schematic shown in Figure 5.4 uses a P-MOSFET configuration in which the output reaches VDD (~40 V) when the input gate voltage is a digital low and is near ground when the input is a digital high. In the future, with different membrane and gap geometries, smaller voltages suitable for on-chip CMOS electronics will be sufficient to collapse the membranes.

Figure 5.4 High voltage switching circuit to amplify digital pulses.

5.2.3 Acoustic Measurement and Analysis

The last section of our test bed is the equipment used to measure the acoustic response of the speaker array and convert it to an electrical signal for analysis. The types of analyses we want to perform will determine any additional equipment. Any acoustic measurements where interference from external sounds and reflections is not desired should be done inside a Brüel & Kjær (B&K) 4232 anechoic chamber, a self-contained apparatus with foam walls designed to absorb sound energy (see Figure 5.5). It is also common to take acoustic measurements with the microphone held at a fixed distance from the sound source. To accomplish this, we used a plastic collar to secure the package and microphone with minimal air leakage.

Figure 5.5 B&K 4232 anechoic chamber with B&K ear simulator microphone attached to DSR earphone.

The B&K 4157 Ear Simulator microphone is designed to mimic the acoustic environment of the human ear, which, as shown in Figure 5.6, acts as a low-pass filter for frequencies greater than 40 kHz. In Chapter 2, we mentioned how a low-pass filter is necessary to smooth out the acoustic pulses produced by the array. For sampling frequencies greater than 40 kHz, the human ear is a sufficient low-pass filter for anti-aliasing.
If the sampling frequency is less than 40 kHz or additional filtering is necessary, then an artificial filter must be applied, either through the design of an acoustic filter in the earphone packaging or by applying an electronic filter at the output of the microphone. We used an RC active filter (see Appendix A) to mimic future acoustic filters that would smooth the digital acoustic pulses.

Figure 5.6 Frequency response of the B&K 4157 ear simulator, designed to mimic the acoustic environment of the human ear canal. The plot is based on the microphone sensitivity at 500 Hz.

Now that the acoustic energy has been converted to an amplified electrical voltage, the output of the microphone pre-amplifier is sent to several different locations depending on the analysis. The oscilloscope displays the acoustic waveform as a voltage, which can be converted easily to a pressure by backing out the microphone sensitivity and amplification. For example, the B&K 4157 ear simulator has a sensitivity of 11 mV/Pa, so if the amplifier was set to 100 V/V, then the following equation can be used to back out the RMS pressure as seen by the microphone:

Prms = Vout / [(100 V/V)(11 mV/Pa)] = Vout / (1.1 V/Pa)

We also connected the microphone amplifier output to a spectrum analyzer for spectral analysis of the DSR waveforms. During digital sound reconstruction measurements, the frequency spectrum gives us information about the additional harmonics present besides the fundamental we are trying to create, as well as the information needed to calculate the total harmonic distortion. We can record the acoustic waveform by connecting the microphone output to the auxiliary input of a tape player (for analog recording) or a computer (for digital recording). The recordings included with this thesis were made with a laptop and transferred to compact disc.

5.3 Characterization of Acoustic Response

This section describes a variety of experiments used to characterize individual bit and total array responses. Specifically, we will examine how the speed, amplitude, and shape of the acoustic response are affected by the input pulse parameters.
5.3.1 Pulse Amplitude

In Chapter 3, we showed that the displacement of a CMOS-MEMS membrane from an applied electrostatic voltage exhibits a hysteretic behavior. Figure 5.7(a) shows the acoustic response of the 8-bit array MSB (128 speaklets) as a function of the electrostatic voltage for a 200 µs pulse. The maximum pressure of each response, shown in Figure 5.7(b), exhibits a nonlinear relationship to the pulse voltage under 35 Volts and becomes linearly dependent after 35 Volts. It is no coincidence that the collapse voltage for this membrane occurs at the changeover between these two behaviors around 35 Volts.

Figure 5.7 (a) Acoustic response of the MSB (128 speaklets) for varying pulse voltages (width = 200 µs). (b) Maximum pressure of (a).

The exponential correlation between pressure and pulse voltage below the collapse voltage point can be explained by the FE ∝ V² relationship between voltage and electrostatic force. Immediately after collapsing, only the center region of the membrane is in contact with the bottom of the cavity. As the voltage is increased beyond the collapse point, the outer regions of the membrane continue moving and provide additional pressure. This process is seen in the linear dependence region between pressure and voltage. Due to variations in collapse voltage between the speaklets, the transition between these two regions for a 128-membrane array is not sharp and represents the average response of all 128 speaklets.

Figure 5.8 Acoustic response of the MSB (128 speaklets) for varying pulse widths: 25, 50, 100, 133, 200, 500, 1000, and 2000 µs. An outline of the 40 Volt pulse is overlaid on each diagram.
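The quadratic voltage dependence below collapse follows from the parallel-plate electrostatic force. A quick sketch (the gap and area values below are invented, chosen only to show the scaling):

```python
EPS0 = 8.854e-12   # permittivity of free space, F/m

def electrostatic_force(V, S, d):
    """Parallel-plate approximation: F = eps0 * S * V^2 / (2 * d^2)."""
    return EPS0 * S * V**2 / (2.0 * d**2)

# doubling the voltage quadruples the force (fixed gap and area)
S, d = (216e-6) ** 2, 2e-6
f1 = electrostatic_force(10.0, S, d)
f2 = electrostatic_force(20.0, S, d)
print(f2 / f1)   # 4.0
```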
5.3.2 Pulse Width

The sampling frequency of digital sound reconstruction may be fixed by the electronics providing the input digital words, but we can still control the width of the pulse that remains high within each sample period. Figure 5.8 shows how the amplitude, speed, and shape of the acoustic response vary with the pulse width. Several changes occur as the pulse width is increased: (1) the amplitude increases linearly until it reaches a maximum around 133 µs; (2) the speed of the acoustic response, measured by the time from the pulse start to the maximum positive pressure, also increases linearly until it reaches a maximum around 133 µs; and (3) the first and second peaks begin to separate, leaving a much smaller secondary peak that decays with some oscillation. In the case of the snap pulse response shown in the figure, you can clearly see that the positive peak results from the collapse of the membrane while the negative peak results from the membrane returning to its original position. Once the two peaks do not overlap with each other, the acoustic impulse response transforms into a step response. This type of response is used with the step method of sound reconstruction mentioned in the previous section.

5.3.3 Pulse Type

We can invert the order of the positive and negative pressure peaks by applying a release pulse, where the membrane is released from its deflected state for the pulse duration. As shown in Figure 5.9, the release pulse yields a larger absolute amplitude pressure response than the snap pulse. The speed and overall pattern of the acoustic response, however, remain unaffected by the pulse type.

Figure 5.9 (a) Acoustic response of Bit 6 (64 speaklets) given a 40 Volt, 200 µs snap and release pulse. (b) Absolute maximum pressure for snap and release pulses as a function of pulse width (40 Volt pulse, MSB).

5.3.4 Summary

Changes in the input voltage pulse influence the amplitude of the acoustic response but leave the speed and shape of the response unaffected.
With higher voltage pulses, we can achieve greater pressures, translating to louder sound, but high voltages limit the use of standard low-voltage CMOS for integration of on-chip electronics. The pulse width gives us more control over the acoustic response and can be used to generate both impulse and step responses. However, the pulse width is ultimately limited by the sampling frequency used for sound reconstruction. As the pulse type only affects the amplitude of the pressure response, there are many opportunities for future research that can use different shaped pulses to improve the quality of reconstructed waveforms.

5.4 Sound Reconstruction Measurements

This section can be considered the most significant part of the thesis because it provides evidence of digitally reconstructed sound. We will start by creating simple exponential and sinusoidal waveforms and then turn our attention to examining how the pulse voltage and pulse width affect these reconstructed signals. The last two sections describe the reconstruction of other periodic waveforms and the sound recordings that are included with this thesis. For most of these experiments, we will compare the step and pulse methods of reconstruction mentioned earlier in this chapter.

5.4.1 Pulsing of Sequential Bits

Before delving into sinusoids and other patterns that involve multiple bits firing simultaneously, we reconstructed a simple exponential signal by firing each bit group sequentially. As the acoustic response to a 40 Volt, 200 µs pulse is known for each bit and the cycle time of the pattern is longer than the time necessary for the responses to decay, we can predict the linear superposition of these responses as a function of time. Figure 5.10 shows the predicted and actual experimental waveforms in response to the following exponential sequence: 0, 1, 2, 4, 8, 16, 32, 64, and 128. These values represent the number of speaklets that are pulsed and correspond to a specific bit group (see Chapter 2).
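The predicted trace is a time-shifted, speaklet-count-scaled superposition of a single measured response. A toy sketch (the five-sample `resp` is invented, standing in for a measured bipolar acoustic response):

```python
def predict_waveform(single_response, sequence, spacing):
    """Superpose scaled copies of one response at each firing instant."""
    total = [0.0] * (spacing * len(sequence) + len(single_response))
    for i, n_speaklets in enumerate(sequence):
        for j, p in enumerate(single_response):
            total[i * spacing + j] += n_speaklets * p   # linear superposition
    return total

resp = [0.0, 1.0, -0.6, 0.1, 0.0]               # toy bipolar response
wave = predict_waveform(resp, [0, 1, 2, 4], spacing=5)
print(wave[6], wave[11], wave[16])              # peaks scale as 1.0 2.0 4.0
```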
Figure 5.10 Predicted and actual waveforms in response to the sequential pulsing of each binary weighted group in the 8-bit array.

Given that the linearity of acoustic pressures has already been proven, it is not surprising that the predicted and actual waveforms closely match each other. This experiment also points out a very important current limitation of DSR. The bipolar nature of the acoustic response makes it difficult to produce signals requiring small amplitudes (or pressures) immediately after a large amplitude. The negative peak of the larger bits drowns out the positive peak of the smaller bits, which explains why a negative pressure is present immediately after the MSB clicks instead of a low positive pressure level. Several possible methods for dealing with this issue will be discussed in the next chapter.

5.4.2 The Bit-by-Bit Reconstruction of a 400 Hz Sinusoid

Section 5.2 of this chapter described how an 8-bit digital signal can be created to represent an analog waveform. Since each of the speaklet groups in the 8-bit array is binary weighted, we can study the effect of adding bits one at a time towards the creation of an acoustic waveform. This experiment will serve as the first of many that will highlight differences between the step and pulse methods of sound reconstruction. The series of oscilloscope traces shown in Figure 5.11 describe the reconstruction of a 400 Hz sinusoid as the individual bit groups of the 8-bit array are added sequentially. Figures 5.11(a) and 5.11(b) compare the step and pulse methods of this reconstruction, respectively.

Figure 5.11 Reconstruction of a 400 Hz sinusoid with various levels of bit-group inclusion, from Bit 7 (MSB) down to Bit 0 (LSB), sampled at 8 kHz. We used a 70 µs pulse width for the pulse method.

In the first set of traces, only the most significant bit (MSB), containing 128 speaklets, is clicked.
The second set describes the addition of the next most significant bit, containing 64 speaklets, and so forth. Three key observations can be made from these sequences: (1) the waveform produced by the step method is almost four times greater in amplitude than the waveform produced by the pulse method; (2) the pulse method produces an amplitude-modulated reconstructed output with a carrier frequency equal to the sampling rate; and (3) regardless of the method, the addition of the two least significant bits (LSB) has little or no effect on the overall waveform.

5.4.3 Effect of Low Pass Filter

Based on the results seen in Figure 5.11, digital sound reconstruction is clearly at an early stage of development. The same high frequency components observed in the acoustic response of single membranes are present in the exponential and sinusoidal waveforms. These frequencies are below 20 kHz and are heard as distortion above the lower frequency fundamental. However, we are forgetting the last important step in DSR or any other sampled reconstruction process: the low-pass filter. As mentioned previously, the nature of DSR involves the production of higher frequencies usually associated with the sampling frequency and the resonances of the membrane, earphone cavity, and microphone. Ideally, the human ear will filter out many of these frequencies, but additional filtering might be necessary for lower sampling frequencies. Figure 5.12 shows the same 8-bit, 400 Hz reconstructed waveform with several different low pass filters applied electronically: cutoff frequency fc = 22 kHz, 11 kHz, 4 kHz, and 2 kHz. As the cutoff frequency is decreased, the oscillations resulting from the membrane and cavity resonances are attenuated and the quality of the waveform improves greatly. For the waveform reconstructed using the pulse method, the amplitude modulation effect is completely removed with the 4 kHz cutoff frequency because the 8 kHz sampling frequency has been filtered out.
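The electronic filtering can be imitated with a one-pole recursive filter, a standard software stand-in for the RC filter used in the test setup (the coefficient mapping below is the usual first-order approximation; the signal values are synthetic):

```python
import math

def rc_lowpass(x, fc, fs):
    """One-pole recursive low-pass approximating an RC filter with cutoff fc."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, out = 0.0, []
    for v in x:
        y += alpha * (v - y)    # first-order tracking of the input
        out.append(y)
    return out

fs = 200_000    # simulation rate, well above all signal content
t = [n / fs for n in range(4000)]
# 400 Hz fundamental plus an unwanted component at the 8 kHz sampling rate
x = [math.sin(2 * math.pi * 400 * u) + math.sin(2 * math.pi * 8000 * u) for u in t]
y = rc_lowpass(x, fc=2000, fs=fs)
peak_in = max(abs(v) for v in x)
peak_out = max(abs(v) for v in y[1000:])   # skip the start-up transient
print(round(peak_in, 2), round(peak_out, 2))
```

With fc = 2 kHz, the 8 kHz component is attenuated by roughly a factor of four while the 400 Hz fundamental passes nearly unchanged, mirroring the behavior seen in Figure 5.12.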
As a great deal of the energy of the unfiltered signal arose from the sampling frequency (including harmonics and side bands), the amplitude of the filtered 400 Hz sinusoid becomes very small compared with the waveform using the step method.

Figure 5.12 Oscilloscope traces of a reconstructed 400 Hz sinusoid (sampled at 8 kHz) for a variety of low-pass filters (no filter, fc = 22 kHz, 11 kHz, 4 kHz, and 2 kHz) applied electronically. A comparison between the (a) step method and (b) pulse method is shown. We used a 70 μs pulse width for the pulse method (b).

The corresponding frequency spectra and calculated total harmonic distortion (THD) values are shown in Figure 5.13, from which three key observations can be made. First, the application of the low-pass filter clearly attenuates frequencies above the cutoff frequency with minimal effect on the fundamental. Second, we notice from the change in harmonic distortion that the pulse method benefits from the low-pass filter a great deal more than the step method, particularly when the cutoff frequency drops below the sample frequency. This is to be expected, as the pulse method contains more energy around the sampling frequency (and its harmonics) than the step method. Finally, note that with either reconstruction method and sufficient low-pass filtering, the digital speaker arrays can attain THD values that correspond to good quality analog speakers (~1% THD).

[Figure 5.13 panels: spectra for the (a) step and (b) pulse methods with no filter and with fc = 22, 11, 4, and 2 kHz. With no filter, THD is 41.14% (step) and 97.68% (pulse); each lower cutoff reduces the THD for both methods.]
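The THD figures in Figure 5.13 can in principle be reproduced as the rms sum of harmonic amplitudes relative to the fundamental. The sketch below uses that standard definition; the thesis does not spell out its exact measurement procedure, so the details here are assumptions.

```python
import math

def component(x, f, fs):
    """Amplitude of the component of x at frequency f (simple correlation)."""
    c = sum(s * math.cos(2 * math.pi * f * k / fs) for k, s in enumerate(x))
    d = sum(s * math.sin(2 * math.pi * f * k / fs) for k, s in enumerate(x))
    return math.hypot(c, d)

def thd_percent(x, f0, fs, harmonics=10):
    """THD: rms of harmonics 2..(harmonics+1) relative to the fundamental."""
    fund = component(x, f0, fs)
    harm = math.sqrt(sum(component(x, m * f0, fs) ** 2
                         for m in range(2, harmonics + 2)))
    return 100.0 * harm / fund

fs = 48_000
# A 400 Hz square wave is rich in odd harmonics; a pure sinusoid is not.
square = [1.0 if k % 120 < 60 else -1.0 for k in range(480)]
sine = [math.sin(2 * math.pi * 400 * k / fs) for k in range(480)]
```

Over its first ten harmonics a square wave measures on the order of 40% THD while the pure sinusoid measures essentially zero, which is the sense in which the well-filtered traces approach analog-speaker quality.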
[At the 2 kHz cutoff, the THD values fall to 0.63% and 0.14%.]

Figure 5.13 Frequency spectra and corresponding total harmonic distortion (THD) values for a reconstructed 400 Hz sinusoid (8 kHz sampling rate) after a variety of low-pass filters are applied electronically. Both the (a) step and (b) pulse methods are shown.

5.4.4 Characterization of Reconstructed Waveforms

In Section 5.3, we described how the acoustic response of a single bit group varies with the voltage and width of the applied pulse. It was shown that (1) the pressure increases exponentially with voltage until the membranes collapse and linearly afterwards; and (2) the pressure increases linearly with pulse width until it reaches a maximum amplitude. So the question we want to answer is whether the same variations hold for a digitally reconstructed waveform. Figure 5.14(a) shows a reconstructed 400 Hz (fs = 8 kHz, fc = 4 kHz) sinusoid for a range of pulse voltages. From this diagram, we can see that the shape of the waveform does not change with increasing voltage. The maximum pressure of these waveforms is plotted in Figure 5.14(b), showing that as the pulse voltage increases, the pressure follows the same pattern for the 400 Hz sinusoid as the acoustic response. Furthermore, the transition point between the exponential and linear regions also occurs between 30 and 35 Volts.

Figure 5.14 (a) Reconstructed 400 Hz sinusoid as a function of pulse voltage (200 μs pulse width) using the step method sampled at 8 kHz. (b) Maximum pressure of (a).

In the acoustic response measurements, the voltage pulses were spaced far enough apart to avoid interactions between consecutive pulses. When using the pulse method to reconstruct sound, the pulse spacing is fixed by the sampling rate. Therefore, varying the pulse width will have a slightly different effect with reconstructed waveforms because the spacing between consecutive pulses decreases as the pulse width increases.
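The timing relationship between pulse width and sample period is easy to make concrete. In this sketch (timing only, stepped in 1 μs increments; not the actual drive electronics), a 70 μs pulse inside the 125 μs sample period of an 8 kHz rate gives a 56% duty cycle:

```python
def pulse_train(n_periods, sample_period_us, pulse_width_us):
    """0/1 drive timing: high for pulse_width_us at the start of each period."""
    return [1 if t < pulse_width_us else 0
            for _ in range(n_periods)
            for t in range(sample_period_us)]

train = pulse_train(4, 125, 70)    # four sample periods at 8 kHz
duty = sum(train) / len(train)     # 70 / 125 = 0.56
```

As the pulse width approaches the full 125 μs period, the duty cycle approaches 100% and the drive signal degenerates into the step method's held levels.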
Figure 5.15 shows the same reconstructed 400 Hz sinusoid for a range of pulse widths, including the measured maximum pressure for each pulse width. As the pulse width increases, the mass of the membranes prevents them from completely responding to the low voltage (digital zero) before the cycle repeats. Therefore, it is not surprising that as the pulse width approaches the sample period, 125 μs, or 100% duty cycle, the waveform matches the one observed using the step method.

Figure 5.15 (a) Reconstructed 400 Hz sinusoid as a function of pulse width (40 Volts) using the pulse method sampled at 8 kHz. (b) Maximum pressure of (a).

5.4.5 Effect of Sampling Frequency

The sample frequency at which digital sound reconstruction occurs limits the range of frequencies that can be created. According to the Nyquist sampling theorem, with a sampling frequency fs, we can reconstruct signals with frequencies up to 0.5 fs. Since DSR involves the creation of sound in time, higher sampling rates will also increase the resolution and quality of the waveform. However, as mentioned in Chapter 2, the sampling rate is usually limited by the response time of the membranes. The application of the low-pass filter will also play an important role in choosing a sample frequency. For example, if a complex waveform is reconstructed at an 8 kHz sampling frequency, then a low-pass filter with a 4 kHz cutoff (3 dB) frequency should be used to allow any frequencies under 4 kHz to be properly reconstructed. Figure 5.16 shows the reconstruction of a 400 Hz sinusoid at four different sampling frequencies, 4 kHz, 8 kHz, 22 kHz, and 44.1 kHz, using low-pass filters with 2 kHz, 4 kHz, 11 kHz, and 22 kHz cutoff frequencies, respectively.
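The pairing of sampling rate and reconstruction-filter cutoff follows the Nyquist limit directly (a one-line illustration; in the experiments the 44.1 kHz rate was paired with a 22 kHz filter, the rounded value of its exact Nyquist frequency):

```python
def reconstruction_filter_cutoff(fs_hz):
    """Low-pass cutoff at the Nyquist limit, fc = fs / 2."""
    return fs_hz / 2

# Rates of 4, 8, 22, and 44.1 kHz pair with 2, 4, 11, and ~22 kHz cutoffs.
cutoffs = [reconstruction_filter_cutoff(fs) for fs in (4000, 8000, 22000, 44100)]
```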
Figure 5.16 Reconstructed 400 Hz sinusoid as a function of sampling rate for both the step and pulse methods. The corresponding low-pass filter and pulse width for (b) are listed on the right.

In the pulse method examples, notice that the sampling frequency is adequately filtered out to remove the amplitude modulation. However, regardless of sound reconstruction method, the higher sampling rate and low-pass filter combinations are incapable of completely removing the oscillations. Depending on the bandwidth and acceptable amount of distortion in the application, it may be necessary to decrease the cutoff frequency of the low-pass filter until the desired quality is reached.

5.4.6 Frequency Characterization

Now, we focus our attention on the reconstruction of a wider range of frequencies. Figure 5.17 shows the reconstruction of 50, 100, 200, 400, 800, 2000, and 4000 Hz sinusoids using both the step and pulse methods. For the pulse method, we used a 70 μs pulse width (56% duty cycle), but kept the same 8 kHz sampling rate and 4 kHz cutoff low-pass filter for both sets of experiments.

Figure 5.17 Reconstruction of sinusoids at different frequencies using the step and pulse methods sampled at 8 kHz. The corresponding frequency and time axis for each diagram is listed on the right.

With this data, we can begin to construct a frequency response of the speaker array acting as a digital speaker array, not to be confused with the frequency response measurement of the CMOS-MEMS membranes discussed in Chapter 3. Shown in Figure 5.18, we see that the step method consistently produces a higher intensity sound than the pulse method for the audio range up to 4 kHz.
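Intensity comparisons like these are quoted in dB SPL; the conversion to and from absolute rms pressure uses the standard 20 μPa reference (a generic acoustics relation, not specific to this setup):

```python
import math

P_REF = 20e-6  # pascals; standard reference pressure for dB SPL

def spl_db(p_rms):
    """Sound pressure level of an rms pressure (Pa) in dB SPL."""
    return 20 * math.log10(p_rms / P_REF)

def pressure_pa(spl):
    """Inverse: rms pressure in pascals for a given dB SPL level."""
    return P_REF * 10 ** (spl / 20)

# 78-95 dB SPL corresponds to roughly 0.16-1.1 Pa rms.
```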
In either case, the sound power levels for both methods are typically 78-95 dB SPL over all frequencies between 50 and 4000 Hz.

Figure 5.18 Frequency response of the 8-bit array when operating as a digital array using the step and pulse methods. The analog driven membrane response is overlaid on the graph for comparison.

For comparison, the analog driven membrane response is overlaid in the figure to highlight the improved performance of the digital speaker array over the analog speaker array.

Figure 5.19 Reconstructed 400 Hz waveforms (square, sawtooth, and triangle wave) using an 8 kHz sampling rate (4 kHz cutoff frequency filter) for both the (a) step and (b) pulse methods. We used a 70 μs pulse width in (b).

5.4.7 Reconstruction of Complex Periodic Waveforms

Figure 5.19 shows the reconstructed waveforms of several periodic signals containing multiple frequency components: a square wave, a sawtooth wave, and a triangle wave. The waveforms were created using an 8 kHz sampling rate and 4 kHz cutoff low-pass filter. Similar to the sinusoidal waveforms, the step method produces a higher pressure waveform than the pulse method. However, the shape of the waveform better resembles the intended analog waveform using the pulse method than the step method, most likely in part from the continuous pulsing that is missing using the step method. Regardless of which method we use, though, the switch from a very high pressure to a very low pressure found in the square and sawtooth waveforms is difficult to produce. This artifact, seen in the figure as periodic spikes, was discussed previously in Section 5.4.1 on the pulsing of sequential bits.

5.4.8 Sound Recordings

Although digital sound reconstruction has been theorized in several patents and papers, it has not been demonstrated due to the high costs of engineering a uniform speaker array.
With the CMOS-MEMS microspeaker arrays presented in Chapter 4, we were able to prove this theory and take DSR to the next step of becoming an emergent technology. To complete our demonstration, we included some sound recordings of DSR in action. Although a complete listing can be found in Appendix B, we will outline the content of the enclosed compact disc (playable in computers and CD players) in this section.

First, we recorded the click, a term initially mentioned in Chapter 2 to mean the snapping of a single membrane or group of membranes. You will quickly understand why we chose that word to describe this action. Then, we recorded three experiments presented in this chapter: (1) the effect of the low-pass filter and (2) the sampling frequency on the reconstruction of a 400 Hz sinusoid; and (3) the reconstruction of a range of frequencies. In all three experiments, we compared the pulse and step methods. One observation that you will notice immediately is the different loudness between the step and pulse methods and how each is affected by the low-pass filter. The presence of the low-pass filter was explained in the discussion in Section 5.4.3.

The final set of recordings consists of music from a wide variety of genres. Before these recordings, we included a short audio clip in which the music is silent but the electronics are still operational. Depending on the reconstruction method and low-pass filter used, you will hear two different forms of background noise. With the pulse method, no input voltage, and no filter, you will hear a constant frequency equal to the sampling frequency. To explain, we need to refer to Section 5.2, where we described how the input analog signal is divided up into quantized levels represented by a binary number. With this designation, the most positive pressure is assigned the highest level (usually 255 for an 8-bit quantization), and the most negative pressure is assigned the lowest level (usually zero). So what happens when no pressure is present? The middle level (around 127 or 128 for 8-bit) is sent to the array.
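This offset-binary assignment can be written out directly (a sketch of the mapping only; the rounding behavior is an assumption):

```python
def quantize_8bit(x):
    """Map a normalized pressure in [-1, 1] to an offset-binary level 0-255."""
    level = round((x + 1.0) * 255 / 2)
    return max(0, min(255, level))

# Silence (zero pressure) therefore rests at the middle code, not at zero.
```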
If the pulse method is used, then the bit groups representing that middle level are periodically snapped at a rate equal to the sampling frequency. Without the presence of a low-pass filter, the sampling frequency will be the major contributor of noise. If the step method is used, then these bit groups remain collapsed until the level changes. Thus the primary source of sound using the step method is the electronic and thermal noise in the system.

Where does the electronic noise come from? Two sources of noise typically associated with ADC circuits include time jitter and occasional drift outside the input range of the ADC [9]. If we look at the path taken by the input signal during this process, you will notice why these noise sources are so influential. The input signal starts from digital samples stored in the CD or other digital media player. The signal is internally converted to an analog waveform and then converted back to 8-bit digital samples and amplified by the hardware setup discussed in Section 5.2. The microphone converts and amplifies the acoustic waveform to an analog voltage. After applying our electronic filter, the analog signal is then again converted back to digital samples for recording onto the computer. Much of this complicated pathway would be unnecessary and contain less noise if we used a music source that generated the digital samples directly.

Similar experiments with the sinusoids are also heard on the CD using music, including the effect of the low-pass filter and sampling frequency. One experiment adjusts the sampling rate from an initial frequency below the low-pass filter cutoff frequency (where the sampling frequency overwhelms the music) to a final frequency well above the cutoff frequency (where the music is mostly heard). Putting these details aside, however, you will find the recordings very impressive for an acoustic technology this early in its development.
5.4.9 Summary

In this section we presented the reconstruction of several types of waveforms, ranging from simple exponential and sinusoidal signals to more complex musical recordings. We started with the sequential pulsing of each bit group in the 8-bit speaklet chip to form a periodic, exponential waveform. This simple experiment highlighted the effect of a bipolar acoustic response on the reconstruction process. Next, we illustrated the bit-by-bit reconstruction of a 400 Hz sinusoid using the step and pulse methods. The pulse method creates an amplitude modulated waveform with a carrier frequency equal to the sampling rate. For this reason, the pulse method initially contains more energy and exhibits a louder sound than the step method. After the application of our low-pass filter, the pulse method considerably reduces in volume but contains a cleaner waveform. We also confirmed that the input voltage and pulse width (during the pulse method) exhibit the same effect on the reconstructed waveform as they do on the acoustic response of a single speaklet. From the reconstruction of several different frequencies, we were able to construct the frequency response of the 8-bit array when digitally driven and found that it performs 10-20 dB better from 50 to 4000 Hz than an analog driven array. Finally, we demonstrated the reconstruction of more complex periodic waveforms such as square and sawtooth waves, and finished off with musical excerpts. Despite the substantial noise created from the ADC and other electronics, the sound recordings included with this thesis are undeniable proof that digitally reconstructed sound will soon replace its analog counterpart to create a truly digital audio system for the future.

Chapter 6: Conclusions and Future Work

This thesis covered a lot of material, from the concepts and issues regarding digital sound reconstruction to the numerous experiments to characterize the performance of our digital speaker arrays.
So before we discuss the implications of this research as well as future work to improve the quality of our DSR earphones, it is important to review the material covered in the last five chapters.

We introduced you to the concept of digital sound reconstruction, a process by which discrete acoustic pulses of energy created from an array of speakers or speaklets are summed to produce a time-varying waveform. The resulting pressure variations have a digital magnitude corresponding to the analog value at sampled time values. An n-bit DSR chip needs 2^n - 1 speaklets in the array. As this form of true direct sound is constructed in the time domain, there are several requirements for DSR to be effective. The acoustic pressure response to a finite pulse must be fast compared to the sampling rate. To ensure a linear system, these same responses must add linearly and be uniform over time and from one speaker in the array to another.

Developed at Carnegie Mellon University, the CMOS-MEMS microspeaker provides the performance, reliability, and cost-effectiveness needed for an array of speakers. We described the microspeaker fabrication and the dynamic models useful for characterizing individual membranes. From our magnified view of a single speaklet, we then presented the global issues concerning transducer arrays, including harmonic distortion, a means for evaluating the effectiveness of any sound reconstruction process. Finally, we presented a wealth of experimental data designed to prove the possibility of digitally reconstructed sound and examine its limitations and further areas of improvement. To drive home the ideas behind DSR, we included oscilloscope traces, frequency spectra, and even sound recordings for those who prefer to make judgments based on what they hear.

So where does that leave us? Digital sound reconstruction has been successfully demonstrated. It is clear from the results presented in Chapter 5, however, that there is more research to be done. The experimental data does not completely match the DSR theory, but it does support it.
The bipolar acoustic response of our membranes makes it difficult to reconstruct pressure waveforms according to the digital signal sent to the array. The resulting waveforms, however, are periodic with 50-60% harmonic distortion, compared with 0.1-10% from traditional analog speakers. The response time of our current membranes is too slow compared with the 44.1 kHz sampling rate used in the music industry, but the process works nicely around 8 kHz, sufficient for reproducing frequencies under 4 kHz. Through further mechanical and acoustical modeling of the CMOS-MEMS membrane, we can dampen or adjust system resonances and design a speaker to operate at higher sampling rates with minimal oscillations. Through more careful processing control during fabrication and testing, we can improve speaker uniformity and linearity to reduce harmonic distortion.

If we are not able to modify the bipolar acoustic response, then the step method may hold the key to generating cleaner signals. The premise behind DSR presented in this research involves the summation of positive pressures to create a time-varying waveform with a positive DC bias. As described in Chapter 5, this bias can be heard during silence without proper filtering. Therefore, an alternative solution would be to create both positive and negative pressures using the step pulse. As shown in Figure 6.1, when a positive unit of pressure is needed, a high voltage is generated and the speaklet collapses to the substrate. The next time a negative unit of pressure is needed, the voltage drops to zero and the speaklet releases to its original position. The displayed acoustic output is really a step response instead of a pulse response, so the same linearity and uniformity issues with pulses apply.

Figure 6.1 Oscilloscope trace of the acoustic step response of the MSB when a step pulse is applied. The positive pressure occurs when the high voltage is present and the negative pressure occurs when the voltage is taken away.
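The alternative step-drive scheme just described can be modeled behaviorally, under the stated assumption that each pressure impulse tracks the change in the number of collapsed speaklets, positive on collapse and negative on release (a sketch, not a physical model):

```python
def step_method_impulses(levels):
    """Signed pressure impulses as the step method tracks a level sequence."""
    impulses, held = [], 0
    for level in levels:
        impulses.append(level - held)  # + speaklets collapsing, - releasing
        held = level
    return impulses

# Rising levels give positive pressure; falling levels give negative pressure.
```

Because the impulses telescope, their running sum always equals the current number of collapsed speaklets, which is why on-chip state tracking (discussed next) is needed to keep the array consistent.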
With on-chip electronics, the current position of each speaklet could be stored to avoid confusion over which membranes have collapsed and which are released. The use of on-chip electronics will also allow us to look beyond simple digital voltage pulses and create an optimal pulse shape for generating a desired acoustic response. Instead of assigning speaklets to a specific bit group, these electronics could dynamically distribute the load over the entire array. It is also conceivable to detect broken membranes and redirect their signals to other backup speaklets. Both of these features would dramatically improve the reliability and performance of the chip. With the flexibility of CMOS-MEMS technology, there are many opportunities for improving the quality of digitally reconstructed sound. It is only a matter of time before analog transducers are replaced by their digital counterparts towards the construction of a truly digital sound system.

References

[1] Doi et al., "Fluid Flow Control Speaker System," US patent 4194095 (1980).
[2] W.E. Stinger, Jr., "Direct Digital Loudspeaker," US patent 4515997 (1985).
[3] Huang et al., "Distortion and Directivity in a Digital Transducer Array Loudspeaker," Journal of the Audio Engineering Society, vol. 49, pp. 337-352 (2001 May).
[4] P. Morse and K. Ingard, Theoretical Acoustics (McGraw-Hill, New York, 1968).
[5] G.K. Fedder et al., "Laminated High-Aspect Ratio Microstructures in a Conventional CMOS Process," Sensors and Actuators A 57, pp. 103-110 (1996).
[6] J.J. Neumann, K.J. Gabriel, "CMOS-MEMS Membrane for Audio-Frequency Acoustic Actuation," Sensors and Actuators A 95, pp. 175-182 (2002).
[7] Ladabaum et al., "Surface Micromachined Capacitive Ultrasonic Transducers," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 45, pp. 678-690 (1998).
[8] W.C. Young, Roark's Formulas for Stress and Strain (McGraw-Hill, New York, 6th ed., 1989).
[9] Tewksbury et al., "Terminology Related to the Performance of S/H, A/D, and D/A Circuits," IEEE Transactions on Circuits and Systems, vol. CAS-25, pp. 419-426 (1978).

Appendix A: Circuit Schematics

A.1 Generation of Digital Samples Using A/D Converter

[Schematic: A/D converter with data outputs D0-D7, VREF+ and VREF- references, OE and EOC control lines, and an input of VDC = 0, VAC ~ 1 Vpp; the digital samples feed the amplifier circuit.]

A.2 High Voltage Amplification Circuit

[Schematic: high-voltage amplification circuit with +Vdd and -Vdd supplies; digital samples in from LabVIEW or the ADC circuit, for either the step method or the pulse method.]

A.3 Low-Pass Filter Circuit

[Schematic: 3-pole Butterworth filter with 4 kHz 3 dB cutoff frequency; output to oscilloscope, spectrum analyzer, etc., through the conditioning amplifier (~2 Vpp).] Resistor values of 10, 20, 70, and 178 kΩ correspond to 2, 4, 11, and 22 kHz cutoff frequencies, respectively.

NOTE: The above 3-pole Butterworth filter circuit was used in conjunction with a 2-pole filter in the B&K conditioning amplifier to form a 5-pole low-pass filter. The magnitude and phase responses of a 5-pole (4 kHz cutoff) filter are shown below.

[Plots: magnitude and phase response of the 5-pole low-pass filter.]

Appendix B: Index of Sound Recordings

200 μs pulses at 1 s intervals using MSB (128 speaklets)
[1] without low-pass filter (LPF) :10
[2] with 4 kHz 3 dB LPF :10

Various frequencies (50, 100, 200, 400, 800, 2000, 4000 Hz) sampled at 8 kHz (4 kHz LPF)
[3] pulse method (70 μs pulse width) :36
[4] step method :33

Varying sampling rate/LPF combinations (44 kHz w/22 kHz LPF, 22 kHz w/11 kHz LPF, 8 kHz w/4 kHz LPF, 4 kHz w/2 kHz LPF) of 400 Hz sinusoid
[5] pulse method (15, 25, 70, 125 μs pulse widths, respectively) :20
[6] step method :20

Varying LPF 3 dB cutoff frequencies (no filter, 22, 11, 8, and 4 kHz) of 400 Hz sinusoid sampled at 8 kHz
[7] step method :26
[8] pulse method (70 μs pulse width) :26

Silence sampled at 8 kHz with electronics turned on
[9] pulse method (70 μs pulse width) :11
[10] step method :10

Excerpt from "If" by Perry Como sampled at
[11] 44 kHz (22 kHz LPF) using pulse method (11.4 μs pulse
width) :30
[12] 44 kHz (22 kHz LPF) using step method :30
[13] 22 kHz (11 kHz LPF) using pulse method (22.7 μs pulse width) :29
[14] 22 kHz (11 kHz LPF) using step method :30
[15] 8 kHz (4 kHz LPF) using pulse method (62.5 μs pulse width) :29
[16] 8 kHz (4 kHz LPF) using step method :29
[17] 4 kHz (2 kHz LPF) using pulse method (125 μs pulse width) :31
[18] 4 kHz (2 kHz LPF) using step method :30

[19] Excerpt from "Battlestar Galactica" with sampling rate sweep from 1 kHz to 44.1 kHz (4 kHz LPF) using pulse method (50% duty cycle pulse width) :40

Excerpt from "Pictures at an Exhibition" sampled at 8 kHz using step method with
[20] no low-pass filter :34
[21] 22 kHz LPF :33
[22] 11 kHz LPF :33
[23] 4 kHz LPF :33
[24] 2 kHz LPF :34

Song selections sampled at 8 kHz (4 kHz LPF) using step method
[25] "Sleighride" by Leroy Anderson 2:50
[26] "Splashdown" from Apollo 13 2:05
[27] 1st movement from "Symphony No. 5" by Ludwig van Beethoven 1:39
[28] "Also Sprach Zarathustra" by Richard Strauss 1:35