Copyright by Jay Brady Fletcher, 2005

Integrated Noise Cancellation with the Least Mean Square Algorithm and the Logarithmic Number System

by Jay Brady Fletcher, B.S.

REPORT
Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of Master of Science in Engineering.
The University of Texas at Austin, December 2005

APPROVED BY SUPERVISING COMMITTEE: Jacob Abraham, Supervisor; Mark McDermott

Dedicated to my pet fish Rooney.

Acknowledgments
I would like to thank my friends, project partners, teachers, and family for their unending support in my endeavors.

Abstract
Integrated Noise Cancellation with the Least Mean Square Algorithm and the Logarithmic Number System
Jay Brady Fletcher, M.S.E.
The University of Texas at Austin, 2005
Supervisor: Jacob Abraham

This paper outlines design considerations and implementation aspects of a portable active noise cancellation solution. Power, area, and performance tradeoffs are examined.

Table of Contents
Acknowledgments
Abstract
List of Tables
List of Figures
Chapter 1. Introduction
  1.1 A Brief History of ANC
  1.2 Product Survey
    1.2.1 Modern Active Noise Cancellation Systems
    1.2.2 Portable Media Players
Chapter 2. Specifications
  2.1 Area and Cost
  2.2 Power
  2.3 Performance
    2.3.1 Noise Cancellation
    2.3.2 Convergence
    2.3.3 Summary of Specifications
Chapter 3. Modelling the System
  3.1 Function Breakdown
    3.1.1 Active Noise Control
    3.1.2 Acoustic Model
    3.1.3 ADC
    3.1.4 Test Bench
    3.1.5 Input Builder
  3.2 Modelling the Logarithmic Number System
    3.2.1 LNS Finite Word-Length Noise
  3.3 Model Results
    3.3.1 Word Length Results
Chapter 4. Low-Power Multiply/Add Building Blocks
  4.1 Logarithmic Number System
  4.2 Implementation
    4.2.1 LNS Multiplication
    4.2.2 LNS Adder
    4.2.3 Linear Adder
    4.2.4 Multiplexer
    4.2.5 ROM
      4.2.5.1 Cell
      4.2.5.2 Row Circuitry
      4.2.5.3 Column Circuitry
  4.3 Circuit-level Simulation
    4.3.1 ROM Simulation
    4.3.2 Linear Adder
Chapter 5. Filter Hardware
  5.1 Possible Enhancements
  5.2 Pipelining the LMS Filter
  5.3 Other Filter Options
Appendices
Appendix A. Least-Mean Square Algorithm
Appendix B. Matlab Routines
  B.1 Modelling Subroutines
    B.1.1 LNS Fix and Saturate
  B.2 Implementation Aid
    B.2.1 LNS Lookup Table Generator
Appendix C. ROM Compiler in PERL
  C.1 HSPICE Output File Samples
Index
Bibliography
Vita

List of Tables
1.1 Several active noise cancellation headphones available at publication [5][21][24].
2.1 Assumptions for estimating the size of an SoC device.
2.2 Active noise control specifications.
3.1 Matlab ADC parameters.
3.2 Performance measures acquired in the test bench function.
3.3 LNS word stored as a string.
3.4 Configuration used to determine N_ROD.
4.1 ROM compiler high-level features.
4.2 Implementation results.
5.1 Filter hardware requirements.
5.2 Filter hardware results.

List of Figures
1.1 Acoustic summation of two audio signals.
1.2 Feedback only noise cancellation.
1.3 Feedforward ANC using the LMS algorithm.
1.4 Secondary source transfer function, H(z), is added to the LMS system [16].
1.5 Filtered-X LMS system with transfer function C(z) added.
1.6 Leaky LMS system.
1.7 Secondary feedback path compensation added to the LMS system.
2.1 Performance of an active noise control algorithm.
2.2 Typical settling time of an active noise control algorithm.
3.1 Overview of Matlab model including the test bench, input builder, and active noise control blocks.
3.2 Active noise control module implemented in Matlab.
3.3 Input builder window-concatenate operation.
3.4 The LNS bow-tie depicts the LNS fixed-point conversion error.
3.5 As |X| approaches 0, X_L approaches negative infinity.
3.6 Mean-square error (MSE) for several different LNS bit precisions.
3.7 Mean-square error (MSE) for varying number of taps.
3.8 LNS fixed-point multiplication error for inputs from -1 to +1.
3.9 LNS fixed-point addition error for inputs from -1 to +1.
3.10 Architecture and design decision tree.
3.11 Determining the fractional precision of the LNS words. 6/2 is found to be optimum.
3.12 MSE measurements after the filter has converged for different LNS fractional precisions.
3.13 BLNCP for the implemented architecture.
3.14 Convergence measurement of the filter.
4.1 LNS multiply logic.
4.2 LNS addition block.
4.3 Ladner-Fischer tree adder [11] used for LNS operations.
4.4 Saturating linear adder used in LNS adder and multiplier blocks.
4.5 4-to-1 CMOS multiplexer [31].
4.6 Overall ROM organization. Slices are interleaved in the actual implementation.
4.7 Single-ended bit-cell shown with parasitic metal capacitance and resistance.
4.8 Differential ROM bit-cell with complementary output (BLb) and parasitic elements.
4.9 Current mirror sense amplifier.
4.10 Latching sense amplifier.
4.11 Power of the 8-bit ROM with inverter receiver.
4.12 ROM read access time.
4.13 Power consumption in the 8-bit linear saturating adder.
4.14 Worst case timing measurement of the 8-bit linear saturating adder.
5.1 Direct form FIR filter block diagram.
5.2 Transposed FIR filter block diagram.
5.3 Least-mean square weight update block.
5.4 Filter implementation.
A.1 A diagram of an adaptive filter system.
C.1 Block diagram of custom ROM compiler written in PERL.

Chapter 1
Introduction

This document describes the design and validation of an on-chip noise cancellation solution for low power applications. The development stems from an increasing demand for feature-rich mobile audio.

1.1 A Brief History of ANC

In 1936, Paul Lueg filed patent number 2,043,416, describing how undesirable audio tones could be selectively removed from the acoustic spectrum by broadcasting the undesirable signal with a phase shift of pi radians using a microphone and loudspeaker. Lueg stated [18]: "According to the present invention the sound oscillations, which are to be silenced are taken in by a receiver and reproduced by a reproducing apparatus in the form of sounds having an opposite phase."

The acoustic waves add together in the atmosphere, minimizing the undesirable acoustic signal, a concept that existed long before Lueg's patent. This simple concept is illustrated in figure 1.1. The 1936 patent suggests that the microphone and loudspeaker be placed a distance apart such that the pi radian phase shift is realized.

One of the first active noise cancellation techniques was developed by Olson and May in 1953 [20]. Deemed the "Electronic Sound Absorber," this solution used a simple feedback loop to provide the secondary source. This system is depicted in figure 1.2.

Figure 1.1: Acoustic summation of two audio signals.

Figure 1.2: Feedback only noise cancellation.

Figure 1.3: Feedforward ANC using the LMS algorithm.
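The feedforward structure of figure 1.3 can be summarized behaviorally in a few lines of MATLAB. The sketch below is illustrative only; the signal names, tone frequency, path coefficients, and step size are assumptions, not values from this report's model (the full model, including the filtered-X and leaky variants of figures 1.5 and 1.6, is developed in chapter 3 and appendix B).

% Minimal feedforward LMS noise canceller, in the spirit of figure 1.3.
N  = 4000;                          % number of samples
x  = sin(2*pi*200*(0:N-1)'/4000);   % reference noise: 200 Hz tone at fs = 4 kHz (assumed)
p  = [0.4 0.25 0.15 0.08];          % stand-in primary path P(z) (assumed)
d  = filter(p, 1, x);               % noise arriving at the error microphone
L  = 13;  mu = 5e-4;                % tap count and step size (assumed)
w  = zeros(L,1);  xbuf = zeros(L,1);
y  = zeros(N,1);  e = zeros(N,1);
for n = 1:N
    xbuf = [x(n); xbuf(1:L-1)];     % delay line of recent reference samples
    y(n) = w.' * xbuf;              % secondary source output y(n)
    e(n) = d(n) - y(n);             % residual error (e = d - y sign convention used here)
    w    = w + 2*mu*e(n)*xbuf;      % LMS weight update (derived in appendix A)
end

As the weights converge, e(n) shrinks toward zero; the step size mu trades convergence speed against stability, a trade-off revisited in chapters 2 and 3.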
Early work by Howells and Applebaum at GE in 1957 [12] sparked development of adaptive interference cancelling techniques. Bernard Widrow and Samuel Stearns developed the least-mean square, or LMS, algorithm in 1959 [30]. The attractiveness of the LMS algorithm lies in its simplicity. A complete derivation of the LMS algorithm is included in appendix A.

The LMS algorithm by itself provides good noise cancellation, but a higher level of noise cancellation can be achieved by taking other system characteristics into consideration. Note that when the acoustic-electric boundary is crossed, a data converter is used along with either a microphone or speaker. The speaker may have a phase and magnitude response, shown as H(z) in figure 1.4, that can be compensated for by filtering the input x(n) [16]. The resulting system is shown in figure 1.5 and is referred to as FX-LMS, or filtered-X LMS. The LMS algorithm can also be modified slightly to help mitigate finite precision rounding noise. This technique is known as leaky LMS [16] and is shown in figure 1.6.

Figure 1.4: Secondary source transfer function, H(z), is added to the LMS system [16].

Figure 1.5: Filtered-X LMS system with transfer function C(z) added.

Figure 1.6: Leaky LMS system.

One final consideration of the noise cancelling system brings the acoustic feedback path into focus. The secondary source is a speaker that may feed back into the reference microphone with some amount of attenuation and phase shift. This path is referred to as the secondary feedback path [16]. The secondary feedback path can corrupt the reference and lessen the performance of the adaptive system. In order to compensate for this path, an electronic counterpart can be added into the adaptive filter module. The idea is to take the secondary source output, apply a digital filter that mimics the secondary path, and subtract the result from the reference signal [16]. Figure 1.7 shows the complete system with this enhancement added. Note that the acoustic secondary feedback path transfer function, F(z), would need to be static for the life of the product, lessening the usefulness of this approach, as the plant is considered to be dynamic and the feedback path is similar to the plant.

Figure 1.7: Secondary feedback path compensation added to the LMS system.

1.2 Product Survey

1.2.1 Modern Active Noise Cancellation Systems

Table 1.1 lists several common noise cancellation headphones and their product specifications.

Product | Cancellation | Battery | Life | Price
Bose QuietComfort 2 | Unspecified | 1xAAA | 35 hrs | $299
Panasonic RP-HC300 | 10 dB | 1xAAA | 35 hrs | $119
Sennheiser PXC 300 | up to 15 dB (<1000 Hz) | 2xAAA | 80 hrs | $189

Table 1.1: Several active noise cancellation headphones available at publication [5][21][24].

While the products developed by Panasonic and Sennheiser are cheaper than that by Bose, user reviews of noise cancelling headphones clearly crown the Bose as the premier noise cancelling performer. All of the headphones consume about the same power via a dedicated battery. Most of the specifications also limit the noise cancellation to a range less than 1 kHz.
1.2.2 Portable Media Players

The portable audio/video player market is saturated with devices that have a limited feature set. The media player manufacturers need new features that differentiate their product from existing products without adding significant power or cost.

Chapter 2
Specifications

The overall goal of the implementation is to add a complex feature without adding too much area and power. In order to achieve this goal, clear specifications must be made up front. Approximations for area, power, and performance can be made based on existing product offerings. The priorities of the specifications are set in the following order:

1. Cost (area)
2. Power
3. Performance

The following sections describe the reasoning behind the specifications for the on-chip noise cancelling solution.

2.1 Area and Cost

While adding the noise cancelling feature to a portable media player will increase the retail price of the end product, only a small increase in the cost of the SoC will be tolerable. The specified cost of adding the noise cancelling feature to an existing device is developed based on die area. According to Jan Rabaey in [23], the amortized cost of an integrated circuit is a function of the area raised to the fourth power:

cost = f(area^4)

From [7], the price of normal headphones typically doubles when ANC technology is added. A high-end media player sells for $300. The best selling ANC-capable headphones also sell for $300. Assuming the projected price of a media player with ANC technology, relative to the original price, is P_ANC, the chip manufacturer could sell their chip for at least that increase without giving away the new feature. The increase in area of the device with ANC, A_ANC, could be expressed as

A_ANC = f( (Price Increase)^(1/4) )

For P_ANC = 125%, the area occupied by the ANC circuits must be less than 6% of the area of the original SoC. If P_ANC = 200%, A_ANC may be as much as 19% of the original device.

With the percent increase in area in mind, a typical SoC die area is needed to determine the exact area requirements of the integrated solution. This estimate of the typical SoC die area, A_SoC, can be arrived at by taking the variables in table 2.1 into consideration.

Description | Symbol | Estimate
Wafer cost | C_w | $3000
Wafer diameter | D_w | 300 mm
Yield, packaged and tested | Y_PT | 75%
Asking sale price per part | P_ASP | $10
Device sales margin | M | 70%

Table 2.1: Assumptions for estimating the size of an SoC device.

From [23], the number of die per wafer, or DPW, is expressed as

DPW = pi*(D_w/2)^2 / A_SoC - pi*D_w / sqrt(2*A_SoC)    (2.1)

The second term in 2.1 accounts for the non-functional die around the perimeter of the wafer. If the area of the part is much smaller than the area of the wafer, the second term in 2.1 can be ignored. The cost of the wafer should be roughly equal to the cost of the sum of useable die:

C_w = (Useable Die) * (Cost per Die)    (2.2)
    = DPW * Y_PT * P_ASP * (1 - M)    (2.3)
    = [pi*(D_w/2)^2 / A_SoC] * Y_PT * P_ASP * (1 - M)    (2.4)

Solving for A_SoC formulates the die size estimate based on these assumptions:

A_SoC = pi*(D_w/2)^2 * Y_PT * P_ASP * (1 - M) / C_w    (2.5)

From this formulation and the estimates in table 2.1, an estimate of 53 mm^2 is made. As mentioned, the increase in area of the newly integrated noise cancellation circuitry should occupy an additional 6-19% of this, or between 3 and 10 square millimeters.

2.2 Power

Many SoC devices' power consumption is rated on battery life.
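(As a quick aside before the power budget: the section 2.1 die-size estimate can be reproduced numerically. The MATLAB lines below simply evaluate equation 2.5 with the table 2.1 assumptions and are illustrative only, not part of the report's model code.)

% Numerical check of the SoC die-area estimate, equation 2.5.
Cw   = 3000;     % wafer cost ($)
Dw   = 300;      % wafer diameter (mm)
Ypt  = 0.75;     % packaged/tested yield
Pasp = 10;       % asking sale price per part ($)
M    = 0.70;     % sales margin
Asoc = pi*(Dw/2)^2 * Ypt*Pasp*(1-M) / Cw   % about 53 mm^2
Aanc = Asoc * [0.06 0.19]                  % about 3 to 10 mm^2 allowed for ANC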
Based on a typical AA battery, the power of an SoC is related to the battery life as

P = (Capacity x V_dd) / (Battery Life)    (2.6)
  = (2850 mAh x 1.5 V) / (Battery Life)    (2.7)

The battery life of the Sigmatel D-major is 50 hrs on 1 AA battery. This results in an average power dissipation of 85 mW. Note that the storage device, be it flash or hard disk, will also dissipate power.

Stand-alone noise cancelling headphones are always powered off of a separate battery, resulting in a long battery life for the combined solution. The battery life of the Sennheiser PXC 250 headphones is rated at 80 hours with 2 AAA batteries (roughly 94 mW). However, the newly integrated media player SoC may only pay a small power penalty to maintain sufficient battery life.

In total, an individual listening to a portable media player with stand-alone active noise cancellation headphones dissipates around 180 mW. Some of the power is redundant in that there are two audio amplifiers in the system, one in the player and another in the headphones. With this in mind, it is reasonable to allow for an increase in power to account for the noise cancellation circuitry. This increase will be to the player alone. The assumption is made that an increase in power of 10-33% would be tolerable, given that a person would consume nearly 100% more power to use the conventional noise cancelling headphones with a separate battery. This results in a power range of 8-28 mW for the additional logic that handles the noise cancellation.

2.3 Performance

Two definitive performance metrics describe the operation of the noise cancelling feature. These are noise cancellation and convergence.

2.3.1 Noise Cancellation

The performance of the noise cancelling solution, a measure of the difference in magnitude between the input and output noise, is chosen to compete directly with existing noise cancelling headset solutions. Typically the noise rejection is quoted as the average difference in dB over a specified band-limited range. This will be referred to henceforth as the band-limited noise cancelling performance (BLNCP). Figure 2.1 depicts a BLNCP measurement in simulation.

A pair of Sennheiser PXC 250 headphones ([24]) is among the latest noise cancelling headphones on the market. Sennheiser specifies these particular headphones to actively reject "up to 15 dB" at frequencies less than 1 kHz. The PXC 250 headphones also provide 15-20 dB of passive attenuation at frequencies above 1.2 kHz [24]. The passive attenuation is realized via the supra-aural ear cups.

Figure 2.1: Performance of an active noise control algorithm (band-limited noise cancelling performance measure; input of 200 Hz and 400 Hz sinusoids, mu = 0.0005, 13 taps, f_anc = 2000; average rejection 39.2 dB over the 0-1000 Hz band).

2.3.2 Convergence

The time that the algorithm takes to converge on a set of coefficients that match the plant is referred to as the settling time. Since the device will be portable, the plant will be dynamic and require the active noise control circuitry to continually adapt the coefficients in a reasonable time that does not affect the desired audio. A typical convergence measurement made in simulation is shown in figure 2.2.

2.3.3 Summary of Specifications

Table 2.2 summarizes the device specifications. The design of the hardware-based noise cancelling solution will be targeted for the specifications in table 2.2.
Figure 2.2: Typical settling time of an active noise control algorithm (approximately 262 ms in this example).

Parameter | Abbreviation | Value | Units
Band Limited Noise Cancellation Performance (1) | BLNCP | 15 | dB
Settling Time | T_set | TBD | ms
Die Area | A_ANC | 10 | mm^2
Power Dissipation | P_ANC | 28 | mW
Notes: 1. Measured as the average difference across the 1 kHz band.

Table 2.2: Active noise control specifications.

Chapter 3
Modelling the System

Matlab [19] is relied on heavily to model discrete time filters in industry and academia alike. The strength of Matlab lies in its capability to perform matrix algebra efficiently. Since the underlying math in signal processing relies on matrix algebra, it makes for an unparalleled discrete signal processing simulation environment.

Mathworks, the developers of Matlab, recommend minimizing the number of for and while loops in the scripting environment, since Matlab is optimized to perform computations on matrices instead of individual pieces of data. Programming without loops can be challenging, but improves the performance of Matlab significantly. Typically, the scripts can be written in the form of a loop to verify functionality, then mapped into parallel operations as an optimization. The simplest example is an FIR filter, wherein the input and coefficients are vector multiplied to determine the output. Matlab prefers computing in exactly that fashion. Computing the FIR output using a while loop is discouraged. However, most of the adaptive filter block was implemented using loops to mimic a SystemC simulation developed in parallel by Rich Lathrop and Michael Hutchinson. Lower level simulations of the mathematical building blocks are described in section 4.3.

The Matlab simulation can be divided into several major blocks. They combine to provide performance measures of the system and enable flexibility of architectural exploration. Figure 3.1 depicts the overall Matlab model. Some of the key Matlab scripts developed for this work can be found in appendix B.

Figure 3.1: Overview of Matlab model including the test bench, input builder, and active noise control blocks.

Figure 3.2: Active noise control module implemented in Matlab.

3.1 Function Breakdown

3.1.1 Active Noise Control

The active noise control block includes the adaptive filter along with the acoustic environment that surrounds it. Key parameters of the adaptive filter are passed in from the test bench to enable external control. The Matlab script executes each operation manually such that it can be modified at a very low level, at the expense of simulation speed. The active noise control function is shown in figure 3.2. This filter topology was developed in [16].

3.1.2 Acoustic Model

The active noise control simulation requires an accurate model of the acoustics as well. The study of acoustic transfer functions related to the human head, or the approximation thereof, is known to the acoustic community as the head related transfer function, or HRTF. The Earlab in Boston [9] maintains a database of HRTFs across test subjects and source location.
Approximating the HRTF with an IIR filter is documented in [14]. MIT's OpenCourseWare provides an excellent graduate-level course in acoustics and hearing that goes into great detail describing the propagation of sound and hearing [6].

The simulation script has a placeholder to use a complex transfer function such as one from the Earlab. However, the performance and convergence measurements acquired are not based on HRTFs from the Earlab. The Earlab transfer functions are in-depth models that even account for the dimensions of the hairs within the ear over the entire hearing range. The adaptive filter is targeted for lower frequencies, and these types of effects are not included.

In the implementation there exist two transfer functions. The first of these acoustic transfer functions is the path from the reference microphone to the secondary source in the left ear, and the second is the path to the right. Note that these two transfer functions are different based on the azimuth of the noise source. The secondary feedback path is also an acoustic one that may be similar to the primary path, with more attenuation. The end product may have a microphone positioned very near the output speaker or farther away. In both cases, there should be a headphone cup around the speaker that attenuates high frequencies.

3.1.3 ADC

A complex ADC block models the characteristics of a realistic pipelined analog to digital converter. Inclusion of a detailed ADC model enables insight into system effects of the ADC inaccuracies. For example, the gain of the residue amplifier can be set as 2.1 plus or minus 6 sigma and the performance degradation can be quantified. Other aspects of the ADC include, but are not limited to, the parameters in table 3.1.

Description | Parameter | Ideal Value
Residue amplifier gain | Gain | 2
Offset of comparator | Offset | 0
Sample rate | Fs | -
Number of bits | Bits | N
Type of output returned (analog/digital) | Output Type (1) | Digital
Notes: 1. The finite precision analog value simplifies the simulation.

Table 3.1: Matlab ADC parameters.

Note that the output of the ADC can be digital or analog. If the analog option is used, quantized values between +1 and -1 are returned. The digital option returns an array of bits, which can be less useful in the simulation, yet mimics a real pipelined ADC. This can be helpful during the design phase, as the array of bits can be post-processed as inputs to a Verilog testbench or HSPICE simulation. The ADC has the ability to quantize the inputs to the adaptive filter, but it is not used in finite precision LNS simulations, since the inputs are quantized to finite LNS values.

3.1.4 Test Bench

The test bench passes inputs to the filter and measures the performance thereof. The ANC system is instantiated from the test bench in one line. For sampling frequency fs, step size mu, filter order Worder, quantization type quant (infinite, linear, or LNS), left of decimal lod, and right of decimal rod, the system is called from the test bench:

[output, weights] = anc(fs, input, mu, Worder, quant, lod, rod);

The function anc returns an output vector and a weight matrix. The size of the weight matrix is Worder x length(input), as it maintains a history of the weights over time.

The response of the filter, and consequently its performance, is greatly dependent on the inputs [30]. A variety of inputs can be passed to the filter to study the performance. The goal of any adaptive filter is to adapt to a changing stimulus. Besides adapting to changing inputs, the filter adapts to a change in the plant as well.
The test bench can take this into consideration by modelling a change in the plant characteristics during the simulation. Performance measures of the active noise control system include, but are not limited to, the items in table 3.2.

Parameter | Description | Units | Constraints
Noise Cancellation | Difference in magnitude of noise output and input | dB | Band-limited
Convergence Time | Time required to reach an acceptable MSE after a transition in the input or plant | seconds | MSE trigger point
Stability | Is the filter stable or not? | Y/N | Stability criteria

Table 3.2: Performance measures acquired in the test bench function.

3.1.5 Input Builder

The input builder provides a non-stationary input to the adaptive filter. It accepts two signals, either synthesized in Matlab or wave files from the hard drive. The two inputs are first windowed to achieve the desired length and to attenuate the beginning and end of each signal. Once the two are windowed, they are concatenated, with any desired amount of overlap. The result is a series of two separate noise sources, which can form a sudden change to the input of the adaptive filter. This type of non-stationary input can be used to benchmark the convergence rate of the filter. A simple case of two sine waves of different frequencies sent as input to the builder shows the window-concatenate operation. This is depicted in figure 3.3.

Figure 3.3: Input builder window-concatenate operation.

3.2 Modelling the Logarithmic Number System

As part of the effort to make the adaptive filter operate with low power consumption, the logarithmic number system [26], or LNS, is employed. Modelling of the LNS behavior was conducted in MATLAB. This section gives an overview of how to model LNS in MATLAB. More detail of the LNS implementation is found in chapter 4.

An N-bit LNS number may be stored as an N+2 length word where the linear sign is stored in the most significant bit. The LNS model utilizes sign-magnitude representation of the exponent. Since the logarithm of a negative number is complex, the linear sign bit indicates if the linear number is negative.

Linear | Linear Sign | Sign | Magnitude | MATLAB Variable
3.8807 | 0 | 0 | 01110111 | '0001110111'
0.7521 | 0 | 1 | 00011001 | '0100011001'
-3.8807 | 1 | 0 | 01110111 | '1001110111'
-0.7521 | 1 | 1 | 00011001 | '1100011001'

Table 3.3: LNS word stored as a string.

For modelling purposes, this can be represented in several different ways. The first is to use a struct wherein the sign and zero flags are stored separately from the value. This method requires the least overhead.

xlns=struct('z',xz,'s',xs,'x',xlog);
xlns.z % Zero flag
xlns.s % Sign flag
xlns.x % Magnitude

The second method of storing this type of data in MATLAB is to pack the binary data into a single word and store the string as it would be stored in a register. For testing LNS itself, as in appendix ??, the struct form is used. Simulation of the adaptive filter, however, only requires quantizing and saturating each stage of the filter. Storing each individual bit is not necessary.

3.2.1 LNS Finite Word-Length Noise

The LNS finite word-lengths exhibit several interesting properties, especially for acoustic applications.

Figure 3.4: The LNS bow-tie depicts the LNS fixed-point conversion error.

Figure 3.4 depicts the conversion error. This will be referred to as the bow-tie effect. The most interesting of the two error properties is the error near zero, referred to henceforth as e_nz. Note the sharp increase in error as the input approaches zero in figure 3.4.
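The bow-tie of figure 3.4 can be reproduced by quantizing the LNS exponent to a finite number of fractional bits and converting back to the linear domain, which is how the model quantizes each filter stage. The MATLAB sketch below is illustrative only; it assumes a round-to-nearest exponent and uses the 4/4 word format of table 3.3 rather than the report's final choice.

% Reproduce the LNS fixed-point conversion error ("bow-tie") of figure 3.4.
b    = 1.2;                 % LNS base recommended in [26]
Nlod = 4;  Nrod = 4;        % integer/fractional exponent bits (table 3.3 format)
x    = linspace(-1, 1, 2001);
x(x == 0) = [];                         % exact zero uses a separate zero flag
xl   = log(abs(x)) / log(b);            % exact LNS exponent
xlq  = round(xl * 2^Nrod) / 2^Nrod;     % quantize the fractional part
lim  = 2^Nlod - 2^-Nrod;                % saturate the exponent magnitude
xlq  = max(min(xlq, lim), -lim);
xrec = sign(x) .* b.^xlq;               % back to the linear domain
plot(x, xrec - x);                      % error versus input: the bow-tie shape
xlabel('Input X'); ylabel('Conversion error');

The plot shows both properties discussed next: an error envelope that grows linearly with the input magnitude, and a plateau of large error for inputs smaller than the smallest representable magnitude.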
From [26], converting a number, X, to an LNS word of base b, X_L, is accomplished by taking the logarithm of X:

X_L = log_b(|X|)    (3.1)

Note that X_L approaches negative infinity as |X| approaches 0. This is depicted in figure 3.5 for b = 1.2.

(X_L) as |X| approaches 0 = lim_{|X| -> 0} log_b(|X|) = -infinity    (3.2)

Figure 3.5: As |X| approaches 0, X_L approaches negative infinity.

While a zero flag represents the situation wherein |X| is equal to zero, the largest LNS exponent value, X_L, determines the magnitude of e_nz. If X_L is at its maximum value, this is as close to negative infinity as the system can represent without being exactly zero. Since the value of X_L will be stored as a normal fixed-point binary number, e_nz can be formulated in terms of N_LOD and N_ROD, the number of bits to the left and right of the decimal, respectively:

e_nz = b^-(2^N_LOD - 2^-N_ROD)    (3.3)
e_nz ~= b^-(2^N_LOD)    (3.4)

Since 2^N_LOD is much larger than 2^-N_ROD, it is desirable to pack more bits to the left of the decimal than the right, resulting in a larger |X_L| and consequently a smaller e_nz. However, stealing bits from the right of the decimal to decrease e_nz adversely affects the second error property, the slope of the bow-tie.

The slope of the bow-tie effect seen in figure 3.4 has a strong linear envelope. The linear envelope has a slope that is proportional to 2^-N_ROD. The effect observed is that the slope of the bow-tie envelope is cut in half for every bit that is added to the right of the decimal, independent of what value is chosen for N_LOD. The two properties of the LNS conversion error allow the designer to choose where the error will impact the design. Simulations of varying N_ROD and N_LOD in the adaptive filter showed that e_nz has the greatest effect on the error. Note that, in contrast, the linear number system has a constant quantization error across the linear input range; for a linear word length N, the quantization error is bounded by half of one least significant bit, independent of the input magnitude.

Figure 3.6 shows the mean-square error convergence versus time for several different test cases. The benefit of increasing N_ROD beyond 2 is very small. N_LOD = 6 provides acceptable BLNCP and a reasonable settling time. This results in an 8-bit LNS word and a 2 x 8 x 2^8 lookup operation, short enough to implement several within the system.

Figure 3.6: Mean-square error (MSE) for several different LNS bit precisions (LOD/ROD of 6/1, 6/2, 6/3, 6/4, 7/2, and 8/2; length-20 filter; 200 Hz stationary input).

Filter tap-length requires extensive characterization and modelling across a myriad of scenarios. Several multi-frequency inputs and a broadband wav file were tested against L = 13 and provided sufficient cancellation. Mean-square error convergence plots for varying filter tap counts are depicted in figure 3.7.

Figure 3.7: Mean-square error (MSE) for varying number of taps (filter orders 10 through 20, LNS 6/2 precision, mu = 0.5e-3).

Error during a multiply or add operation suffers from similar characteristics when the result is near zero. Histograms of the error in LNS multiplication and addition are shown in figures 3.8 and 3.9, respectively. Note the outlying error data points in figure 3.9. In these cases, the two numbers have very close magnitude with opposing signs, resulting in a number close to zero.
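Those outliers are easy to provoke directly. The sketch below is illustrative only; it uses the 6/2 exponent format, an assumed round-to-nearest conversion, and round-trips the addition through the linear domain in the same way the model quantizes each stage.

% Demonstrate the large LNS addition error when two operands nearly cancel.
b = 1.2;  Nlod = 6;  Nrod = 2;
lim   = 2^Nlod - 2^-Nrod;
tolns = @(v) sign(v) .* b.^max(min(round(log(abs(v))/log(b)*2^Nrod)/2^Nrod, lim), -lim);
a  =  0.500;                       % operand 1
c  = -0.495;                       % operand 2: nearly equal magnitude, opposite sign
sq = tolns(tolns(a) + tolns(c));   % quantize the inputs, add, re-quantize the sum
fprintf('exact sum = %.6f\n', a + c);
fprintf('LNS sum   = %.6f\n', sq);       % both operands quantize to the same
fprintf('error     = %.6f\n', sq-(a+c)); % magnitude, so the small true sum is lost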
Figure 3.8: LNS fixed-point multiplication error for inputs from -1 to +1.

Figure 3.9: LNS fixed-point addition error for inputs from -1 to +1.

3.3 Model Results

A wide range of design trade-offs are made to meet the specifications. Figure 3.10 shows the different high-level decisions that are made in designing the low-power adaptive filter.

Figure 3.10: Architecture and design decision tree (adaptation method, filter structure, coefficient word lengths, sample rate, number system, and logic style, weighed against the noise rejection, rejection bandwidth, convergence rate, stability, power, and cost specifications).

3.3.1 Word Length Results

Using the Matlab model, the mean-square error, xi, was measured to determine the optimum bit precision for this system. From section 3.2.1, the integer part of the LNS word, N_LOD, must be sufficiently large to correctly represent small numbers. Stouraitis and Paliouras recommend using b = 1.2 to minimize the bit activity [26]. For N_LOD = 5, e_nz = 0.003, which leaves the smallest representable magnitude too large to represent the step size used in the model; N_LOD = 6 is used instead and results in e_nz = 8.5e-6, which is small enough.

Simulations were run with N_LOD = 6 for varying N_ROD to determine the optimum N_ROD. Figure 3.12 depicts the MSE for the system with the adaptive filter disabled, with floating point precision, and with varying N_ROD.

Parameter | Value
Input | Single tone (302.6 Hz)
fs | 4 kHz
mu | 0.001
b | 1.2
Pz | N=20 Chebyshev window

Table 3.4: Configuration used to determine N_ROD.

Using 3 bits follows the floating point results very closely. However, adding a single bit can double the size of the LNS ROM table in the saturating addition unit (see section 4.2.5). N_ROD = 2 exhibits slightly higher MSE yet meets the specification for noise rejection in table 2.2, so 2 bits are allocated for N_ROD.

Determining the filter tap-length requires extensive modelling under a variety of conditions and stimulus. Simulations for different tap-lengths, input sequences, plant characteristics, and step sizes must collectively give the designer an idea of how many taps are necessary to meet the specifications.

Figure 3.11: Determining the fractional precision of the LNS words (MSE versus time with the ANC disabled, with floating point, and with 6/1, 6/2, and 6/3 precision). 6/2 is found to be optimum.

Figure 3.12: MSE measurements after the filter has converged for different LNS fractional precisions.

Figure 3.13: BLNCP for the implemented architecture (302 Hz input, N=20 Chebyshev-window plant, mu = 0.001, 13 taps, f_anc = 4000, LOD/ROD = 6/2; average rejection 21.6 dB).

Convergence rate is checked under the conditions listed in table 3.4. Figure 3.14 shows the final convergence rate of the filter.

Figure 3.14: Convergence measurement of the filter (roughly 35 ms settling time).
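The integer-width choice above follows directly from equation 3.4; a two-line MATLAB check (illustrative only, with the comments reflecting the interpretation given above):

% Check e_nz ~= b^-(2^Nlod) for the two candidate integer widths (equation 3.4).
b = 1.2;
enz5 = b^-(2^5)   % about 0.003: larger than the 0.001 step size, so unusable
enz6 = b^-(2^6)   % about 8.5e-6: comfortably below the step size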
Chapter 4
Low-Power Multiply/Add Building Blocks

4.1 Logarithmic Number System

The logarithmic number system [28], or LNS, was introduced as an alternative to linear numbers in an attempt to enhance performance. An increase in performance can translate into power savings with minor changes [8]. Multiplication becomes trivial because the logarithm of a product is the sum of the logarithms:

log_b(A * B) = log_b(A) + log_b(B)    (4.1)

The benefit of easily computing the product is paid for with a more complex addition operation. The addition of two linear numbers represented in LNS requires a fairly complex design with many tradeoffs to be made. Most of the literature on LNS focuses on ways to improve the addition circuit.

4.2 Implementation

The basic building blocks used to create the LNS multiply and add units were created using the NCSU 0.18 um technology. Designing the arithmetic units is expedited with the use of several specialized tools. Verification of the adder was completed by using MATLAB to generate a Verilog testbench that provides inputs to the unit and checks the output against what is expected. A custom PERL ROM compiler was developed to create the ROM lookup table for the addition and subtraction operations.

Figure 4.1: LNS multiply logic.

4.2.1 LNS Multiplication

As mentioned, the LNS multiplier is simple. Figure 4.1 exhibits the logic required to compute the multiplication: the exponents are summed in a linear adder and the sign bits are combined. The critical path of the multiplier is just a single addition block:

t_crit = t_add    (4.2)

4.2.2 LNS Adder

Addition, or subtraction, in LNS is typically accomplished via [26]:

C_l = A_l + log_b(1 + b^(B_l - A_l)) = B_l + log_b(1 + b^(A_l - B_l))    (4.3)

Note that the inputs A_l and B_l can be swapped, providing two different ways to compute the same result. Typically, the log_b(1 + b^(A_l - B_l)) operation is accomplished via a lookup table. Reducing the lookup table size is the primary area of interest when using the logarithmic number system. The subtraction operation, expressed in equation 4.4, is similar but requires a separate lookup table:

C_l = A_l + log_b(1 - b^(B_l - A_l)) = B_l + log_b(1 - b^(A_l - B_l))    (4.4)

The addition block is lengthy and requires two adders and the lookup table in series. The critical path through the addition block is:

t_crit = t_add + t_mux + t_lu + t_mux + t_add = 2*t_add + 2*t_mux + t_lu    (4.5)

Comparing equation (4.5) to the multiplier critical path in equation (4.2) shows that the adder will have at least twice the delay of the multiplier. Many papers have been written on how to speed up the addition operation under the LNS system. Moreover, the adder logic has been studied extensively outside the LNS realm, leaving the lookup operation as the focus for improvement. The lookup table can be a PLA, a ROM, or a combination of both.

Designers have found several ways to minimize the lookup table length. The easiest way to reduce the lookup table is to guarantee a negative value for B_l - A_l [29]. This may involve a slight increase in area for the same delay because of the additional adder. However, it reduces the table to one-half the original size. Mark Arnold has researched different interpolation methods extensively in [2], [3], and [4].

Figure 4.2: LNS addition block.

A simple LNS addition scheme was implemented that guarantees negative inputs to the lookup table. Figure 4.2 shows the addition/subtraction unit. Note that the two 7-bit tables can be combined into a single 8-bit table and the output mux becomes part of the lookup column mux.
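A behavioral MATLAB sketch of the add/subtract flow in figure 4.2 is shown below. It is illustrative only (function and variable names are assumptions), it evaluates the log term directly where the hardware would use the ROM lookup, and it orders the operands so the table argument z = B_l - A_l is always non-positive, mirroring equations 4.3 and 4.4.

% Example: 0.5 + 0.25 in LNS, base 1.2 (expected linear result: 0.75)
b  = 1.2;
al = log(0.5)/log(b);    bl = log(0.25)/log(b);
[cl, cs] = lns_addsub(al, 0, bl, 0, b);
disp((1-2*cs) * b^cl)    % back to the linear domain

% Behavioral sketch of LNS addition/subtraction (equations 4.3 and 4.4).
% al, bl are operand exponents; as_, bs_ are linear sign bits (0 or 1).
function [cl, cs] = lns_addsub(al, as_, bl, bs_, b)
    if bl > al                              % order so that z = bl - al <= 0
        [al, bl] = deal(bl, al);  [as_, bs_] = deal(bs_, as_);
    end
    z = bl - al;
    if as_ == bs_
        cl = al + log(1 + b^z)/log(b);      % "plus" table of equation 4.3
    else
        cl = al + log(1 - b^z)/log(b);      % "minus" table of equation 4.4
    end                                     % (equal magnitudes would need the zero flag)
    cs = as_;                               % sign follows the larger-magnitude operand
end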
4.2.3 Linear Adder

A Ladner-Fischer [17] style tree adder was chosen to implement the 2's complement addition required for LNS multiplication and addition. This particular adder is better for longer word lengths. For 8-bit addition, a simpler carry-save or carry-lookahead adder may give similar results. Since the adder is used for signal processing applications, it must saturate if it overflows. Because of this, the output of the adder has to be multiplexed with the most positive output and the most negative output.

4.2.4 Multiplexer

Initially, a pass-gate multiplexer was used. The pass-gate mux, while having desirable timing, is not suitable for low voltage applications. For operation at very low voltages, a CMOS multiplexer is a much better choice. Figure 4.5 depicts the 4:1 multiplexer used to switch between array slices in the ROM (column mux). A similar CMOS 2:1 multiplexer is instantiated within the LNS addition block.

Figure 4.3: Ladner-Fischer tree adder [11] used for LNS operations.

Figure 4.4: Saturating linear adder used in LNS adder and multiplier blocks.

Figure 4.5: 4-to-1 CMOS multiplexer [31].

4.2.5 ROM

The lookup operation is implemented in a ROM array. More complex partitioning or even a PLA implementation may also be suitable choices for LNS. In [29], Taylor utilized a ROM for the majority of the table, while a PLA represents portions of the table that change less rapidly.

A PERL ROM compiler was developed to generate an HSPICE simulation deck that provides power and delay measurements of the array. The goal of the custom ROM compiler developed specifically for this design was to allow quick iterations when determining power consumption and timing. Some of the high level features of the ROM compiler are listed in table 4.1.

Table 4.1: ROM compiler high-level features.
- Implements single or differential bitline
- ROM data read from a file
- Generates PWL inputs based on integer input file
- Short main .sp file hides complexity of array
- Supports drop-in sense amp replacements
- Easily integrates with other circuits

The ROM implemented has 8-bit words and is organized to compensate for the rectangular cell size, 8 lambda x 12 lambda. There are four slices of 64 entries. This gives a more square overall ROM (384 lambda x 512 lambda). Figure 4.6 depicts the overall ROM layout organization.

Figure 4.6: Overall ROM organization. Slices are interleaved in the actual implementation.

Note that the PMOS pre-charge devices and the sense amplifiers can be disabled to save power. If the system that uses the LNS arithmetic block consists of a series of multiply, add, then latch operations, the ROM array can go into a sleep mode. During the sleep mode the pre-charge devices are turned off along with the sense amplifiers, if used. The enables for the pre-charge devices and the sense amplifiers are ganged by slice, such that logic may enable individual slices to save power. The array, comprised of multiple slices, must incorporate a column multiplexer to select between the slices. The layout of the column mux is more efficient if the slices are interleaved.

The ROM compiler generates three separate HSPICE files to allow simulation of the full array along with the array's periphery. The romdeck.sp file houses the raw array, including individual pi-models for the bit and word line routing. The lnsops.sp file sets all of the simulation parameters, generates inputs, and makes measurements.
A third file, periph.sp, contains the array's subcircuits. These include the PMOS pre-charge devices, sense amplifiers, the bit-cell, and pi-models.

Figure 4.7: Single-ended bit-cell shown with parasitic metal capacitance and resistance.

4.2.5.1 Cell

Each individual ROM cell must be programmed to give the correct data value. This is accomplished by programming resistance values within the cell that behave like metal contacts. The differential bit-cell is shown in figure 4.8 and the single-ended bit-cell is shown in figure 4.7. R0 and R1 must be set for each cell so that it connects to either bitline or bitline b. This is accomplished via a handshake between PERL and HSPICE using two ternary statements. The PERL ROM compiler knows if the current bit, which sits at a particular slice/column/row address, should be a 1 or a 0. If the bit is a logic 1, the cell will have a low resistance contact to the NMOS pull-down in the cell. The opposite is true for the contact to bitline b in the differential case. Below is the HSPICE subcircuit that implements each bitcell.

Figure 4.8: Differential ROM bit-cell with complementary output (BLb) and parasitic elements.

* ROM bit-cell: one NMOS pull-down gated by the word line; R1/R0 model the
* presence or absence of the contact to bl/blb, selected by the dat parameter.
.SUBCKT bitcell bl blb wl dat=1
M1 netint wl 0 0 TSMC18DN L=180E-9
+W=270E-9 AD=121.5E-15 AS=121.5E-15
+PD=1.44E-6 PS=1.44E-6 M=1
R1 netint bl  '(dat==1) ? 1e-3 : 10MEG'
R0 netint blb '(dat==0) ? 1e-3 : 10MEG'
.ENDS bitcell

For each bitcell that PERL instantiates, it sets the dat parameter to either '1' or '0'. When dat is equal to '1', the HSPICE bitcell subcircuit will program R1, which models the connection to bitline, with a low resistance, and R0 with a high resistance. Similarly, dat equal to '0' will set a high resistance to bitline and a low resistance to bitline b. The result is a fully programmed NxM array with S slices. This particular implementation makes use of a 64x32 array with 4 slices.

4.2.5.2 Row Circuitry

A two-level, static CMOS decoder [23] was chosen to implement the word-line driver. The pitch of the decoder must match that of the individual ROM cell. A single-ended 1T ROM cell may be as small as 7 lambda x 11 lambda [11], whereas a standard cell may be roughly 32 lambda in height. The small signal version of the ROM is metal limited, as the bit and word line wires must be wide to have lower resistance. The row decoder has to be staggered in order to mate with the ROM pitch.

4.2.5.3 Column Circuitry

The ROM compiler has the capability to instantiate differential bitlines with sense amps or single-ended bitlines with an inverter output (large signal). The column circuitry is made up of the bitline receivers and the column multiplexer. Two common sense amplifiers were investigated for use in the lookup table along with the simple inverter receiver. The current mirror amplifier [23], shown in figure 4.9, consumes too much power to be useful for this application. It was found to consume upwards of 8-9 mW. A latching sense amplifier [23], shown in figure 4.10, was also measured for power and timing. It provided a reduction in power of around 75% and less than 1 ns worst case timing through the ROM. The final solution to the power consumption issue is the large signal inverter receiver. This provides the least power and area, both at the expense of timing.

Figure 4.9: Current mirror sense amplifier.
4.3 Circuit-level Simulation

Blocks within the LNS add unit and the LNS multiply unit were simulated separately, due to the large simulation time of the ROM. From the simulation results, timing and power estimates can be made for the adder and the multiplier.

Figure 4.10: Latching sense amplifier.

4.3.1 ROM Simulation

Simulation of the ROM at the transistor level was conducted to measure timing and power consumption. The following simulation results do not include the row decoder logic or the column multiplexer. Figure 4.11 depicts the ROM power consumption during a read operation. The ROM power is listed in table 4.2. Static power should reduce linearly with the number of slices enabled. A transient simulation assessed the read access time through the ROM array. The result is shown in figure 4.12.

Figure 4.11: Power of the 8-bit ROM with inverter receiver.

Figure 4.12: ROM read access time.

4.3.2 Linear Adder

The linear adder unit was also simulated, in HSPICE, to measure the power consumption and worst case timing path. Since the adder is instantiated three times within the LNS adder and once within the LNS multiplier, its power, area, and timing contributions are significant with regard to the LNS blocks.

Figure 4.13: Power consumption in the 8-bit linear saturating adder.

Figure 4.14: Worst case timing measurement of the 8-bit linear saturating adder.

Block | Power (mW) | Timing (ns) | Area (um^2) | Notes
LNS Add w/o ROM | 0.382 | 3.1 | 9,422 |
ROM | 0.620 | 2.8 | 10,314 |
LNS Adder | 1.002 | 5.9 | 19,736 | 1
LNS Multiplier | 0.150 | 1.2 | 6,933 | 2
Notes: 1. The LNS adder is two 7-bit ROMs, three linear add units, and two multiplexers. 2. The LNS multiplier is comprised of a single linear add unit and an XOR gate. 3. Power and timing simulations are for VDD = 1.2 V and room temperature.

Table 4.2: Implementation results.

Chapter 5
Filter Hardware

With architectural and implementation aspects defined, the filter hardware must be addressed such that timing, area, and power specifications are met. The design specifications for this application were developed in chapter 2 and summarized in table 2.2. The adaptive filter model was developed in chapter 3 and the generic arithmetic units in chapter 4. Mapping the architectural model of chapter 3 onto the building blocks from chapter 4 is the focus of this chapter.

The model developed makes use of a direct form FIR filter. The output of the direct form FIR filter is expressed as

y = sum over i = 0 to L-1 of w(i) * x(L - i)    (5.1)

A direct mapping of equation 5.1 into a signal processing diagram is depicted in figure 5.1. The critical path of this structure is T_mult + L * T_add. Since L will be in the range of 10-20, the cycle time would need to be unreasonably long [22].

Figure 5.1: Direct form FIR filter block diagram.

Transposing the filter in figure 5.1 places the latches in the sum path, reducing the cycle time to T_mult + T_add while preserving the filter structure. Transposition is defined by Keshab Parhi: "Reversing the direction of all edges in a given signal flow diagram and interchanging the input and output ports preserves the functionality of the system." Transposition decouples the cycle time from the tap-length. Figure 5.2 depicts the transposed filter, also known as the data-broadcast FIR filter.
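The equivalence that transposition relies on is easy to sanity-check behaviorally. The MATLAB sketch below is illustrative only (variable names and test data are assumed); it computes the same FIR output with the tapped-delay-line form of figure 5.1 and the data-broadcast form of figure 5.2.

% Direct-form vs. transposed (data-broadcast) FIR: identical outputs.
L = 13;  w = randn(L,1);  x = randn(200,1);

% Direct form (figure 5.1): delay line on the input, adder chain of length L.
xbuf = zeros(L,1);  y1 = zeros(size(x));
for n = 1:length(x)
    xbuf  = [x(n); xbuf(1:L-1)];
    y1(n) = w.' * xbuf;
end

% Transposed form (figure 5.2): x(n) is broadcast to every tap and the
% latches sit in the sum path, so each cycle costs one multiply and one add.
s  = zeros(L,1);  y2 = zeros(size(x));   % s holds the latched partial sums
for n = 1:length(x)
    s     = [s(2:L); 0] + w * x(n);
    y2(n) = s(1);
end

max(abs(y1 - y2))   % essentially zero (floating-point rounding only)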
Figure 5.2: Transposed FIR filter block diagram.

Figure 5.3: Least-mean square weight update block.

The FIR filter, alone, could be easily pipelined using feedforward cutsets to improve the cycle time. However, adding the weight update block, shown in figure 5.3, complicates the structure:

W(n) = W(n - 1) + 2*mu * e(n - 1) * X(n - 1)    (5.2)

The complication arises from the recursive loops formed by adding the weight update to the FIR filter. The weight update unit accepts feedback from the prior weight vector, W(n - 1), and the error signal, e(n - 1). These paths, by definition of the weight update equation in equation 5.2, are not delayed. As it stands, the data-broadcast FIR filter along with the weight update block has a cycle time of T_mult + T_add. If the designer wishes to reduce the cycle time of the filter as a whole, both the FIR and weight update blocks must be addressed. Trade-offs of pipelining the adaptive filter, while not implemented in this application, are discussed in section 5.2. The detailed procedure for pipelining the LMS filter is described by Keshab Parhi in [22] and [25]. The filter and weight update block are shown together in figure 5.4. The low cycle time requirements of the system along with a narrow bit-width allow flexibility in molding the filter to meet the requirements from the model.

Figure 5.4: Filter implementation.

This implementation of the LMS adaptive filter requires the building blocks summarized in table 5.1, for L taps and word length N.

Element | Expression | Quantity (L = 13)
LNS Adder Unit | 2L - 1 | 25
LNS Multiplier Unit | 2L + 1 | 27
Latches | 3LN | 312

Table 5.1: Filter hardware requirements.

From the modelling phase, the sample rate of the filter was found to be 4 kHz. This requires a 250 us cycle time. Table 5.2 exhibits the entire system's characteristics, including area, power, and timing. These are estimates based on the filter hardware shown in figure 5.4 and the building block results in table 4.2.

Qty. | Operation | Power (mW) | Area (um^2)
25 | LNS Addition | 25.05 | 493,400
27 | LNS Multiplication | 4.05 | 187,191
312 | Latches | 9.36 | 9,360
Total | | 38.5 | 689,951

Table 5.2: Filter hardware results.

Power for the adaptive system is 37% over the budget outlined in table 2.2. However, the area is less than 10% of what was budgeted initially. Given the additional area headroom, additional hardware could be added to reduce power consumption. This may come in the form of sleep logic: individual units should be powered off for some portion of the cycle, leaving power to the latches on to save the filter state. The implementation is fully parallel, resulting in a timing surplus, as the filter is able to operate clock for clock with input samples.

5.1 Possible Enhancements

An enhanced successor to this implementation could be comprised of one or all of the following features:

• Shared lookup table: Sharing the lookup operation amongst LNS addition units. This would require a multi-ported ROM, but would be advantageous if a longer word length is desired.

• Folded LMS: Folding the current architecture would require a small increase in logic and latches, but would significantly reduce power. Timing can easily be achieved given the low sample rate.
• Variable Logarithmic Word-Lengths: Variable word-length filters have been studied extensively for the linear number system.

• Tap-Length Modulator: The adaptive filter, by definition, minimizes error by adjusting difference equation coefficients. If the error threshold is too high, additional taps could be dynamically added to the filter. Similarly, taps could be removed from the loop to save power.

5.2 Pipelining the LMS Filter

To pipeline the weight update stages, Keshab Parhi recommends in [25] adding delay directly to the weight input. The additional delay elements may be retimed into the weight update block to achieve a shorter cycle time. In order to strictly follow retiming, delay must be subtracted from the e(n) and x(n) cutsets into the weight update block. Since there are no latches there to be removed, the newly added latches change the filter functionality. The weight update routine will be delayed, with the assumption being that the weights do not have to be updated clock for clock with the filter output. Stability, convergence, and accuracy of the adaptive filter are sacrificed to improve the cycle time [25]. Since the stability is affected, the step size needs to be adjusted to accommodate the new structure. The simulation model of the adaptive filter must take this type of change into account if it is to be implemented.

5.3 Other Filter Options

A multitude of filter hardware choices are available outside of the direct form FIR. IIR and lattice structures are less common, but feasible options that should be considered. The IIR implementation faces a more strict stability requirement: not only does the adaptive filter have to be stable by setting the step size appropriately, but the IIR filter itself must also be stable. With static weights, this concern is lessened. However, as the weights adapt they may cause the filter to have poles outside the unit circle. Lattice structures may also be implemented using IIR or FIR functionality. Adaptive lattice filters are commonly used in speech recognition software.

Appendices

Appendix A
Least-Mean Square Algorithm

Bernard Widrow and Samuel Stearns are the founders of the least-mean square algorithm (LMS). A more in-depth analysis of the performance surface and alternative means of searching it is described in their book, Adaptive Signal Processing [30]. This appendix is a brief summary of how the algorithm is formulated.

Figure A.1: A diagram of an adaptive filter system.

The performance function of the non-recursive (FIR) adaptive filter is known as the mean square error, MSE or xi:

xi = E[e^2(n)]    (A.1)

For the adaptive system in figure A.1, the output of the FIR structure, y(n), at time n is the convolution of the input x(n) and the current filter weights, w(n):

y(n) = w(n) * x(n) = sum over l = 0 to L of w_ln * x_(n-l)    (A.2)

This can also be realized in vector notation (see [13] for a sound review of vector algebra) as:

y(n) = X_n^T W_n = W_n^T X_n    (A.3)

From figure A.1, the error signal is defined as:

e(n) = d(n) + y(n)    (A.4)

In some texts, the error signal may be defined as the subtraction of y(n) from d(n), depending on the application.
5.3 Other Filter Options

A multitude of filter hardware choices are available beyond the direct-form FIR. IIR and lattice structures are less common but feasible options that should be considered. The IIR implementation faces a stricter stability requirement: not only must the adaptive loop be made stable by setting the step size appropriately, the IIR filter itself must also remain stable. With static weights this concern is lessened; however, as the weights adapt they may move the filter's poles outside the unit circle. Lattice structures may be implemented with either IIR or FIR functionality, and adaptive lattice filters are commonly used in speech recognition software.

Appendices

Appendix A
Least-Mean Square Algorithm

Bernard Widrow and Samuel Stearns are the names most closely associated with the least-mean square (LMS) algorithm. A more in-depth analysis of the performance surface, and of alternative means of searching it, is given in their book, Adaptive Signal Processing [30]. This appendix is a brief summary of how the algorithm is formulated.

Figure A.1: A diagram of an adaptive filter system.

The performance function of the non-recursive (FIR) adaptive filter is the mean square error, MSE or ξ:

ξ ≜ E[e^2(n)]    (A.1)

For the adaptive system in figure A.1, the output of the FIR structure, y(n), at time n is the convolution of the input x(n) with the current filter weights w(n):

y(n) = w(n) * x(n) = Σ_{l=0}^{L} w_l(n) x(n - l)    (A.2)

This can also be written in vector notation (see [13] for a sound review of vector algebra) as

y(n) = X_n^T W_n = W_n^T X_n    (A.3)

From figure A.1, the error signal is defined as

e(n) = d(n) + y(n)    (A.4)

In some texts the error signal is instead defined as the subtraction of y(n) from d(n), depending on the application.

Expanding e(n) gives

e(n) = d(n) + X_n^T W_n = d(n) + W_n^T X_n    (A.5)

Squaring the error signal gives

e^2(n) = d^2(n) + X_n^T W_n X_n^T W_n + 2 d(n) X_n^T W_n    (A.6)

The expected value of e^2(n) is

E[e^2(n)] = E[d^2(n) + X_n^T W_n X_n^T W_n + 2 d(n) X_n^T W_n]    (A.7)
          = E[d^2(n)] + E[X_n^T W_n X_n^T W_n] + 2 E[d(n) X_n^T W_n]    (A.8)
          = E[d^2(n)] + E[W_n^T X_n X_n^T W_n] + 2 E[d(n) X_n^T W_n]    (A.9)

(A good explanation of expected value is given on page 471 of [27].)

For this derivation the weight vector is treated as constant, so the n subscript is dropped from W_n and the weights can be factored out of the expectations:

E[e^2(n)] = E[d^2(n)] + W^T E[X_n X_n^T] W + 2 E[d(n) X_n^T] W    (A.10)

Defining A = E[X_n X_n^T] and P = E[d(n) X_n] simplifies the expression for the mean-square error ξ:

ξ = E[e^2(n)] = E[d^2(n)] + W^T A W + 2 P^T W    (A.11)

The gradient of the performance function, ∇(ξ), can be used to find the minimum of ξ:

∇(ξ) = ∂ξ/∂W    (A.12)
     = 2AW - 2P    (A.13)

The minimum occurs where ∇(ξ) = 0. Setting the gradient of the mean-square error to zero reveals the coefficients that achieve minimum error, labeled W0:

W0 = A^-1 P    (A.14)

While this representation of the minimum-error weight vector looks simple, it requires significant resources to compute the new weights. From a control-system perspective, it is also desirable to move toward the minimum in a controlled (damped) fashion rather than jumping immediately to the optimum weight vector W0. This can be achieved by moving "downhill," in the opposite direction of the slope, which is known in the adaptive filter realm as gradient search:

W(n + 1) = W(n) - µ · ∇(n)    (A.15)

This expression for the next weight vector can be used to minimize ξ, realizing the desired behavior. The value of ∇(n) is given by equations (A.12) and (A.13). Computing 2AW - 2P at each iteration, however, requires significant resources. Calculating ∇(n) with m points used in the expected values requires N_mult multiply operations and N_add addition operations:

N_mult = (L^2 + L) + (L^2 + 1) + (2L) + 1    (A.16)
       = 2L^2 + 3L + 2    (A.17)

N_add = (L · m) + (L · m) + 1    (A.18)
      = 2L · m + 1    (A.19)

Of course, N_mult and N_add do not account for the operations required to compute the filter output. Clearly, a simplification must be made for the system to have reasonable resource requirements. The LMS algorithm provides that simplification. Where equation (A.15) uses the gradient of E[e^2(n)], the LMS method uses the gradient of e^2(n) instead:

∇_LMS = ∂e^2(n)/∂W    (A.20)
      = 2 · e(n) · ∂e(n)/∂W    (A.21)
      = -2 · e(n) · X(n)    (A.22)

With this modification to the gradient search method, the weight vector for n + 1 becomes

W(n + 1) = W(n) - µ ∇_LMS    (A.23)
         = W(n) + 2µ e(n) X(n)    (A.24)

This elegant solution to the resource problem requires no vector multiplications and can be computed with N_mult = L + 1 and N_add = L (the filter output itself requires a further L multiplications and L additions). While the LMS simplification is simple to compute, the path it takes to the minimum error is longer and less accurate.
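To make the result concrete, the short sketch below compares the LMS iterate of equation (A.24) with the optimal weights of equation (A.14) on a toy example. It is illustrative only: the signals, filter length, and step size are arbitrary choices, A and P are estimated from sample averages, and the conventional e(n) = d(n) - y(n) form noted after equation (A.4) is used (with the d(n) + y(n) convention the weights would simply converge to the negated values, which is what an anti-noise filter requires).

% Sketch: LMS gradient search (A.24) versus the optimal weights (A.14).
L = 4; mu = 0.02; Ns = 5000;
x = randn(Ns,1);                                     % toy input
d = filter([0.6 -0.4 0.25 -0.1], 1, x) ...
    + 0.01*randn(Ns,1);                              % toy desired signal

% Estimate A = E[X X'] and P = E[d X] from sample averages, then W0 = A\P.
X = zeros(L,Ns);
for n = L:Ns, X(:,n) = x(n:-1:n-L+1); end
A  = (X*X.')/Ns;
P  = (X*d)/Ns;
W0 = A\P;                                            % optimal weights, eq. (A.14)

% LMS update, eq. (A.24), with the conventional error sign.
w = zeros(L,1);
for n = L:Ns
    e = d(n) - w.'*X(:,n);
    w = w + 2*mu*e*X(:,n);
end
disp([W0 w]);   % LMS approaches W0 along a longer, noisier path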
Appendix B
Matlab Routines

Some of the key Matlab models used to develop the noise cancellation system are included in the following sections.

B.1 Modelling Subroutines

The following short routine is used within the adaptive filter to quantize and saturate intermediate calculations to a fixed LNS word length.

B.1.1 LNS Fix and Saturate

function [y] = fixlns(x,lod,rod,base)
    if nargin==0
        x = 2*rand()-1;
        x = 0.000001;
        lod = 4;
        rod = 4;
        base = 1.2;
    end
    limit = 2^(lod)-1/2^(rod);
    signx = sign(x);
    if x==0
        xl = -limit;
        xlf = -limit;
    else
        xl = log10(abs(x))/log10(base);
        xlf = round(xl * 2^(lod+rod)) / 2^(lod+rod);
        if xlf>(limit)
            xlf = limit;
        end
        if xlf<(-limit)
            xlf = -limit;
        end
    end
    y = signx*base^(xlf);
end

B.2 Implementation Aid

This function is used to generate the text file that seeds the ROM compiler. It also calculates the array size and the parasitic elements of the array, and can be used to analyze how much the lookup output changes for different inputs.

B.2.1 LNS Lookup Table Generator

%LNS ROM Table Generator
%Generates the lookup table for the LNS add/sub routine
clear; clc;

%Parameters
WordLength = 8;
SliceWords = 64;
Lambda = (0.180e-6)/2;
CellH = 1*12*Lambda;
CellV = 1*8*Lambda;
m2rsq = 0.08;                        %M2 used for bitlines (R = (l/w)*Rsq)
m2cap = 0.1e-15:(0.1e-15):0.2e-15;   %(F/um)
m2width = CellH*(0.5);
m2space = CellH*(0.5);
ParXtorCap = 0.37e-15;

%Size calculations
Addresses = 2^WordLength;
ArrayBits = Addresses*WordLength
NumberOfSlices = (Addresses) ./ SliceWords;
SingleCell = CellH * CellV;
CellArea_micronsq = CellH * CellV * ArrayBits
CellArea_mmsq = CellArea_micronsq ./ 1e-6;
SliceSize = CellArea_mmsq ./ NumberOfSlices;

%Cell RC calculations
%BitlineR = ;   %bitline resistance

%Worst-case column RC calculation
Columnvr = (CellV*SliceWords/m2width)*m2rsq;
ColMetalvc = (CellV*SliceWords*1e6)*m2cap;
ColParXtorc = (SliceWords)*ParXtorCap;
Columnvc = ColMetalvc+ColParXtorc;

%----- TABLE -----%
%Log base
b = 1.2;
LOD = 4;   %Left of decimal
ROD = 4;   %Right of decimal
%LNS base
b = 1.2;

%Variables
lsgn = -1;   % use -1 if guaranteed to always have a negative input
z = 0:lsgn*1/2^ROD:lsgn*(2^(LOD+ROD)-1)/2^ROD;
GN = -(2^(LOD+ROD)-1)/2^ROD;

%Add-log intermediate calculation for the lookup table
sbp = log10(1+b.^z)/log10(b);
sbpdiff = diff(sbp);
sbprnd = round(sbp*(2^(ROD)))/(2^(ROD));   % round, floor, ceil, fix

if 0
    plot(z,sbp,z,sbprnd);
    str = sprintf('Length of z = %i',length(z));
    disp(str);
end
%figure; plot(z(1:length(z)-1),sbpdiff(1:length(z)-1));

if 0
    writeArray(:,1) = z;
    writeArray(:,2) = sbprnd;
    dlmwrite('arrdat.txt',writeArray,',');
end
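To make the role of the table concrete, the short sketch below performs a single LNS addition of two positive operands using the s_b(z) = log_b(1 + b^z) values tabulated above. It is an illustrative sketch only, assuming the standard sign/logarithm addition identity log_b(x + y) = max(A, B) + s_b(-|A - B|) with A = log_b(x) and B = log_b(y); the operand values are arbitrary and the parameters mirror the generator's defaults.

% Illustrative LNS addition of two positive values using the table above.
b = 1.2; LOD = 4; ROD = 4;                           % generator defaults
z = 0:-1/2^ROD:-(2^(LOD+ROD)-1)/2^ROD;               % table addresses (z <= 0)
sbp = round((log10(1+b.^z)/log10(b))*2^ROD)/2^ROD;   % quantized table contents

x = 0.35; y = 0.2;                                   % arbitrary operands (linear domain)
A = round(log10(x)/log10(b)*2^ROD)/2^ROD;            % LNS representation of x
B = round(log10(y)/log10(b)*2^ROD)/2^ROD;            % LNS representation of y

zq  = max(-abs(A-B), z(end));                        % -|A-B|, clipped to the table range
idx = round(-zq*2^ROD) + 1;                          % table index for zq
S   = max(A,B) + sbp(idx);                           % LNS sum = max + lookup
fprintf('exact sum %.4f, LNS result %.4f\n', x+y, b^S);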
Appendix C
ROM Compiler in PERL

A block diagram of the PERL ROM compiler is shown in figure C.1.

Figure C.1: Block diagram of the custom ROM compiler (romcomp.pl) written in PERL. Its inputs are the input vector file (inpdat.txt), the Matlab array data (arrdat.txt), the custom row and column decoders (decckt.sp), the peripheral circuits (periph.sp), and the ROM specifications (rows, columns, slices, single-ended or differential sense amplifier, line parasitics, simulation drivers, simulation time, step, and rise/fall times); its outputs are the main Spice control file (lnsops.sp), the input PWL vectors, and the full ROM Spice model (romdeck.sp).

C.1 HSPICE Output File Samples

Here is an example of an input vector generated by the ROM compiler. Each address bit is driven for a number of clock cycles based on the fixed-point entries in a separate file.

Vadrin_2 adrin_2 0 PWL
+2.5e-10 0 1e-08 0
+1.025e-08 vdd 2e-08 vdd
+2.025e-08 0 3e-08 0
+3.025e-08 0 4e-08 0
+4.025e-08 0 5e-08 0
+5.025e-08 vdd 6e-08 vdd
+6.025e-08 vdd 7e-08 vdd
+7.025e-08 vdd 8e-08 vdd
+8.025e-08 0 9e-08 0
+9.025e-08 0 1e-07 0

The next listing is a single cell from the array that the compiler generates. The sample is from the final row of the output, which is attached to the output receiver.

$WORD[63]=(2.187500)=(00100011)
XI1522 BL_0_63_0 BL_0_64_0 CRCVERT
XI1523 BL_0_63_0 WL_0_63_0 BITCSE dat=1
XI1524 WL_0_63_0 WL_0_63_1 CRCHORI
XI1525 BL_0_64_0 SAOUT_0_0 INVERTER1
XI1526 BL_0_63_1 BL_0_64_1 CRCVERT
XI1527 BL_0_63_1 WL_0_63_1 BITCSE dat=1
XI1528 WL_0_63_1 WL_0_63_2 CRCHORI
XI1529 BL_0_64_1 SAOUT_0_1 INVERTER1
XI1530 BL_0_63_2 BL_0_64_2 CRCVERT
XI1531 BL_0_63_2 WL_0_63_2 BITCSE dat=0
XI1532 WL_0_63_2 WL_0_63_3 CRCHORI
XI1533 BL_0_64_2 SAOUT_0_2 INVERTER1

Index

Abstract, vi
Acknowledgments, v
Acoustics, 16
Adder, 37
ANC: Filtered-X, 3; History, 1; Leaky LMS, 3; Secondary Path, 5
Appendices, 57
Appendix: Matlab Routines, 62; ROM Compiler in PERL, 67
Architecture: Filter, 50
Area, 8, 10
Band-limited noise cancelling performance, 11
Bibliography, 74
Bow-tie Effect, 22
Column Circuits, 44
Convergence, 12, 33
Cost, 8
Dedication, iv
Die Per Wafer, 9
Earlab, 17
Enhancements, 54
Error near zero (enz), 22
Expected Value, 59
Filter, 50, 53: Direct Form, 51; Pipelining, 55; Transposed, 51
Fixed-point, 21, 28
Gradient, 60
Headphones, Noise Cancelling, 7
HSPICE, 69
Implementation, 34
Input Builder, 19
Introduction, 1
LMS: Derivation, 58
LNS, 34: Adder, 35; Modelling, 19; Multiplication, 35
Logarithmic number system, see LNS
Lookup Operation, see ROM
Lueg, Paul, 1
Matlab, 14
Media Players, 7
Model: Active Noise Control, 16; ADC, 17
Modelling, 14
Multiplexer, 37
Performance, 11, 58
Pipelining, 55
Power, 10
Product Survey, 7
Results, 54
ROM, 39
ROM Cell, 42
Row Decoder, 44
Sample Rate, 54
Sense Amplifiers, 44
Settling Time, 12
Specifications, 8, 13
Test Bench, 18
Weight Update Block, 52

Bibliography

[1] A. Oppenheim and R. Schafer. Discrete-Time Signal Processing. Prentice-Hall, Upper Saddle River, N.J., 2nd edition, 1983.

[2] Mark Arnold. Design of a faithful LNS interpolator. In DSD, pages 336-345, 2001.

[3] Mark Arnold. A pipelined LNS ALU. In Proc. of the IEEE Computer Society Workshop on VLSI, 2001.

[4] Mark Arnold. Geometric-mean interpolation for logarithmic number systems. In Proc. of the International Symposium on Circuits and Systems, volume 2, pages 433-436, May 2004.

[5] Bose. Home page. http://www.bose.com.

[6] L. Braida, J. Rosowski, C. Shera, and K. Stevens. MIT OpenCourseWare, 6.551J: Acoustics of Speech and Hearing, 2004. http://ocw.mit.edu/OcwWeb/ElectricalEngineering-and-Computer-Science/6-551JFall-2004/LectureNotes/index.htm.

[7] David Carey. Noise cancelling headphones: analog margin-makers. Technical report, http://www.planetanalog.com/showArticle.jhtml?articleID=159402571, 2005.

[8] A. P. Chandrakasan, S. Cheng, and R. W. Broderson. Low-power CMOS digital design. 27(4):473-484, April 1992.

[9] EarLab. A virtual laboratory for auditory research. http://earlab.bu.edu.

[10] David Carnoy et al. The sound of silence. Technical report, http://reviews.cnet.com/452030007-1017728-1.html, 2005.

[11] David Harris and Neil Weste. CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley, 2005.

[12] Paul Howells. Technical report, 1959. US Patent 3,202,990.

[13] Erwin Kreyszig. Advanced Engineering Mathematics. John Wiley & Sons, Inc., 7th edition, 1993.

[14] Abhijit Kulkarni and Steven Colburn. Infinite-impulse-response models of the head-related transfer function. Technical report, Acoustical Society of America, 2004. Hearing Research Center and Department of Biomedical Engineering, Boston University.

[15] Sen Kuo. Design of active noise control systems with the TMS320 family. Application note, Texas Instruments, http://focus.ti.com/lit/an/spra042/spra042.pdf, June 1996.
[16] Sen Kuo and Dennis Morgan. Active Noise Control Systems: Algorithms and DSP Implementations. Wiley Interscience, 1996.

[17] Richard Ladner and Michael Fischer. Parallel prefix computation. Journal of the Association for Computing Machinery, pages 831-838, October 1980.

[18] Paul Lueg. Technical report, 1936. US Patent 2,043,416.

[19] MathWorks. Home page. http://www.mathworks.com.

[20] Harry Olson and Everett May. Electronic sound absorber. Acoustical Society of America, 25:1130, November 1953.

[21] Panasonic. Home page. http://www.panasonic.com.

[22] Keshab K. Parhi. VLSI Digital Signal Processing Systems: Design and Implementation. Wiley Interscience, 1999.

[23] Jan Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits. Prentice Hall, Upper Saddle River, New Jersey, 2003.

[24] Sennheiser. Home page. http://www.sennheiser.com.

[25] Naresh Shanbhag and Keshab Parhi. A pipelined LMS adaptive filter architecture. November 1991.

[26] T. Stouraitis and V. Paliouras. Considering the alternatives in low-power design. pages 22-29, July 2001.

[27] Ferrel Stremler. Introduction to Communication Systems. Addison-Wesley, 3rd edition, 1990.

[28] E. E. Swartzlander, Jr. and A. G. Alexopoulos. The sign/logarithm number system. C-24(12):1238-1242, December 1975.

[29] Fred Taylor, Rabinder Gill, Jim Joseph, and Jeff Radke. A 20 bit logarithmic number system processor. 37(2):190-200, February 1988.

[30] Bernard Widrow and Samuel Stearns. Adaptive Signal Processing. Prentice Hall, 1985.

[31] Reto Zimmerman and Wolfgang Fichtner. Low-power logic styles: CMOS versus pass-transistor logic. 32(7), 1997.

Vita

Jay Brady Fletcher

Jay, from Wolfforth, Texas, has had a long and illustrious career as an electrical engineer. Down the winding road of academia, he found himself with a B.S. in engineering physics, a combined physics and electrical engineering degree, from Texas Tech University. He now works at Advanced Micro Devices on high-performance microprocessor development.

Permanent address: 2600 Lake Austin Blvd #11208, Austin, Texas 78703

This report was typeset with LaTeX by the author. (LaTeX is a document preparation system developed by Leslie Lamport as a special version of Donald Knuth's TeX program.)