Copyright
by
Jay Brady Fletcher
2005
Integrated Noise Cancellation with the Least Mean
Square Algorithm and the Logarithmic Number System
by
Jay Brady Fletcher, B.S.
REPORT
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
MASTER OF SCIENCE IN ENGINEERING
THE UNIVERSITY OF TEXAS AT AUSTIN
December 2005
Integrated Noise Cancellation with the Least Mean
Square Algorithm and the Logarithmic Number System
APPROVED BY
SUPERVISING COMMITTEE:
Jacob Abraham, Supervisor
Mark McDermott
Dedicated to my pet fish Rooney.
Acknowledgments
I would like to thank my friends, project partners, teachers, and family
for their unending support in my endeavors.
Integrated Noise Cancellation with the Least Mean
Square Algorithm and the Logarithmic Number System
Jay Brady Fletcher, M.S.E.
The University of Texas at Austin, 2005
Supervisor: Jacob Abraham
This paper outlines design considerations and implementation aspects
of a portable active noise cancellation solution. Power, area, and performance
tradeoffs are examined.
Table of Contents

Acknowledgments
Abstract
List of Tables
List of Figures

Chapter 1. Introduction
1.1 A Brief History of ANC
1.2 Product Survey
1.2.1 Modern Active Noise Cancellation Systems
1.2.2 Portable Media Players

Chapter 2. Specifications
2.1 Area and Cost
2.2 Power
2.3 Performance
2.3.1 Noise Cancellation
2.3.2 Convergence
2.3.3 Summary of Specifications

Chapter 3. Modelling the System
3.1 Function Breakdown
3.1.1 Active Noise Control
3.1.2 Acoustic Model
3.1.3 ADC
3.1.4 Test Bench
3.1.5 Input Builder
3.2 Modelling the Logarithmic Number System
3.2.1 LNS Finite Word-Length Noise
3.3 Model Results
3.3.1 Word Length Results

Chapter 4. Low-Power Multiply/Add Building Blocks
4.1 Logarithmic Number System
4.2 Implementation
4.2.1 LNS Multiplication
4.2.2 LNS Adder
4.2.3 Linear Adder
4.2.4 Multiplexer
4.2.5 ROM
4.2.5.1 Cell
4.2.5.2 Row Circuitry
4.2.5.3 Column Circuitry
4.3 Circuit-level Simulation
4.3.1 ROM Simulation
4.3.2 Linear Adder

Chapter 5. Filter Hardware
5.1 Possible Enhancements
5.2 Pipelining the LMS Filter
5.3 Other Filter Options

Appendices

Appendix A. Least-Mean Square Algorithm

Appendix B. Matlab Routines
B.1 Modelling Subroutines
B.1.1 LNS Fix and Saturate
B.2 Implementation Aid
B.2.1 LNS Lookup Table Generator

Appendix C. ROM Compiler in PERL
C.1 HSPICE Output File Samples

Index

Bibliography

Vita
List of Tables

1.1 Several active noise cancellation headphones available at publication [5][21][24].
2.1 Assumptions for estimating the size of an SoC device.
2.2 Active noise control specifications.
3.1 Matlab ADC Parameters.
3.2 Performance measures acquired in the test bench function.
3.3 LNS word stored as a string.
3.4 Configuration used to determine N_ROD.
4.1 ROM Compiler high-level features.
4.2 Implementation results.
5.1 Filter hardware requirements.
5.2 Filter hardware results.
List of Figures

1.1 Acoustic summation of two audio signals.
1.2 Feedback only noise cancellation.
1.3 Feedforward ANC using the LMS algorithm.
1.4 Secondary source transfer function, H(z), is added to the LMS system [16].
1.5 Filtered-X LMS system with transfer function C(z) added.
1.6 Leaky LMS system.
1.7 Secondary feedback path compensation added to the LMS system.
2.1 Performance of an active noise control algorithm.
2.2 Typical settling time of an active noise control algorithm.
3.1 Overview of Matlab model including the test bench, input builder, and active noise control blocks.
3.2 Active noise control module implemented in Matlab.
3.3 Input builder window-concatenate operation.
3.4 The LNS bow-tie depicts the LNS fixed-point conversion error.
3.5 As |X| approaches 0, X_L approaches −∞.
3.6 Mean-square error (MSE) for several different LNS bit precisions.
3.7 Mean-square error (MSE) for varying number of taps.
3.8 LNS Fixed-Point Multiplication error for inputs from −1 to +1.
3.9 LNS Fixed-Point Addition error for inputs from −1 to +1.
3.10 Architecture and design decision tree.
3.11 Determining the fractional precision of the LNS words. 6/2 is found to be optimum.
3.12 MSE measurements after the filter has converged for different LNS fractional precisions.
3.13 BLNCP for the implemented architecture.
3.14 Convergence measurement of the filter.
4.1 LNS multiply logic.
4.2 LNS addition block.
4.3 Ladner-Fischer tree adder [11] used for LNS operations.
4.4 Saturating linear adder used in LNS adder and multiplier blocks.
4.5 4-to-1 CMOS multiplexer [31].
4.6 Overall ROM organization. Slices are interleaved in actual implementation.
4.7 Single-ended bit-cell shown with parasitic metal capacitance and resistance.
4.8 Differential ROM bit-cell with complementary output (BLb) and parasitic elements.
4.9 Current mirror sense amplifier.
4.10 Latching sense amplifier.
4.11 Power of the 8-bit ROM with inverter receiver.
4.12 ROM read access time.
4.13 Power consumption in the 8-bit linear saturating adder.
4.14 Worst case timing measurement of the 8-bit linear saturating adder.
5.1 Direct form FIR filter block diagram.
5.2 Transposed FIR filter block diagram.
5.3 Least-mean square weight update block.
5.4 Filter implementation.
A.1 A diagram of an adaptive filter system.
C.1 Block diagram of custom ROM compiler written in PERL.
Chapter 1
Introduction
This document describes the design and validation of an on-chip noise
cancellation solution for low power applications. The development stems from
an increasing demand for feature rich mobile audio.
1.1 A Brief History of ANC
In 1936, Paul Lueg filed patent number 2,043,416 describing how undesirable audio tones could be selectively removed from the acoustic spectrum
by broadcasting the undesirable signal with a phase shift of π radians using a
microphone and loudspeaker. Lueg stated [18]:
“According to the present invention the sound oscillations, which
are to be silenced are taken in by a receiver and reproduced by a
reproducing apparatus in the form of sounds having an opposite
phase.”
The acoustic waves add together in the atmosphere, minimizing the undesirable acoustic signal, a concept that had existed long before Lueg's patent.
This simple concept is illustrated in figure 1.1. The 1936 patent suggests that
the microphone and loudspeaker be placed a distance apart such that the
π radian phase shift is realized.
One of the first active noise cancellation techniques was developed by
Olson and May in 1953 [20]. Deemed the “Electronic Sound Absorber,” this
solution used a simple feedback loop to provide the secondary source. This
system is depicted in figure 1.2.
Figure 1.1: Acoustic summation of two audio signals.
Figure 1.2: Feedback only noise cancellation.
Figure 1.3: Feedforward ANC using the LMS algorithm.
Early work by Howells and Applebaum at GE in 1957 [12] sparked development of adaptive interference cancelling techniques. Bernard Widrow and
Samuel Stearns developed the least-mean square, or LMS, algorithm in 1959
[30]. The attractiveness of the LMS algorithm lies in its simplicity. A complete
derivation of the LMS algorithm is included in appendix A.
The LMS algorithm by itself provides good noise cancellation, but a
higher level of noise cancellation can be achieved by taking other system characteristics into consideration. Note that when the acoustic-electric boundary
is crossed, a data converter is used along with either a microphone or speaker.
The speaker may have a phase and magnitude response, shown as H(z) in
figure 1.4, that can be compensated for by filtering the input x(n) [16]. The
resulting system is shown in figure 1.5 and is referred to as FX-LMS , or
filtered-X LMS.
The LMS algorithm can be modified slightly to help mitigate finite-precision
rounding noise. This technique is known as leaky LMS [16] and is shown in
figure 1.6.
Figure 1.4: Secondary source transfer function, H(z), is added to the LMS system [16].
Figure 1.5: Filtered-X LMS system with transfer function C(z) added.
Figure 1.6: Leaky LMS system.
One final consideration of the noise cancelling system brings the acoustic feedback path into focus. The secondary source is a speaker that may
feed back into the reference microphone with some amount of attenuation and
phase shift. This path is referred to as the secondary feedback path [16]. The
secondary feedback path can corrupt the reference and lessen the performance
of the adaptive system. In order to compensate for this path, an electronic
counterpart can be added into the adaptive filter module. The idea is to take
the secondary source output, apply a digital filter that mimics the secondary
path, and subtract the result from the reference signal [16]. Figure 1.7 shows
the complete system with this enhancement added.
Note that the acoustic secondary feedback path transfer function, F (z),
would need to be static for the life of the product, lessening the usefulness of
this approach as the plant is considered to be dynamic and the feedback path
is similar to the plant.
Figure 1.7: Secondary feedback path compensation added to the LMS system.
1.2 Product Survey

1.2.1 Modern Active Noise Cancellation Systems
Table 1.1 lists many common noise cancellation headphones and their
product specifications.
Product               Cancellation         Battery Life    Price
Bose QuietComfort 2   Unspecified          1xAAA, 35 hrs   $299
Panasonic RP-HC300    10 dB                1xAAA, 35 hrs   $119
Sennheiser PXC 300    ≤ 15 dB (<1000 Hz)   2xAAA, 80 hrs   $189

Table 1.1: Several active noise cancellation headphones available at publication [5][21][24].
While products developed by Panasonic and Sennheiser are cheaper
than those by Bose, user reviews of noise cancelling headphones clearly crowned
the Bose as the premier noise cancelling performer. All of the headphones
consume about the same power levels via a dedicated battery. Most of the
specifications also limit the noise cancellation to a range less than 1 kHz.
1.2.2 Portable Media Players

The portable audio/video player market is saturated with devices that
have a limited feature set. The media player manufacturers need new features that differentiate their product from existing products without adding
significant power or cost.
Chapter 2
Specifications
The overall goal of the implementation is to add a complex feature without adding too much area and power. In order to achieve this goal, clear
specifications must be made up front. Approximations for area, power, and
performance can be made based on existing product offerings. The priorities of
the specifications are set in the following order.
1. Cost (area)
2. Power
3. Performance
The following sections describe the reasoning behind the specifications
for the on-chip noise cancelling solution.
2.1 Area and Cost
While adding the noise cancelling feature to a portable media player
will increase the retail price of the end product, only a small increase in the cost
of the SoC will be tolerable. The specified cost of adding the noise cancelling
feature to an existing device is developed based on die area.
According to Jan Rabaey in [23], the amortized cost of an integrated
circuit is a function of the area raised to the fourth power.
cost = f(area^4)
From [7], the price of normal headphones typically doubles when ANC technology is added. A high-end media player sells for $300. The best-selling
ANC-capable headphones also sell for $300. Assuming the projected price of a media player with ANC technology is P_ANC, the chip manufacturer
could sell their chip for at least P_ANC more without giving away the new feature. The increase in area of the device with ANC, A_ANC, could be expressed
as

A_ANC = f((PriceIncrease)^(1/4))

For P_ANC = 125%, the area occupied by the ANC circuits must be less than
6% of the area of the original SoC. If P_ANC = 200%, A_ANC may be as much
as 19% of the original device.
With the percent increase in area in mind, a typical SoC die area is
needed to determine the exact area requirements of the integrated solution.
This estimate of the typical SoC die area, ASoC , can be arrived at by taking
the following variables in table 2.1 into consideration.
Description                   Symbol   Estimate
Wafer cost                    C_w      $3000
Wafer diameter                D_w      300 mm
Yield, packaged and tested    Y_PT     75%
Asking sale price per part    P_ASP    $10
Device sales margin           M        70%

Table 2.1: Assumptions for estimating the size of an SoC device.
From [23], the number of die per wafer, or DPW, is expressed as

DPW = π·(D_w/2)^2 / A_SoC − π·D_w / √(2·A_SoC)    (2.1)
The second term in 2.1 accounts for the non-functional die around the
perimeter of the wafer. If the area of the part is much smaller than the area of
the wafer, the second term in 2.1 can be ignored. The cost of the wafer should
be roughly equal to the cost of the sum of useable die.
C_w = (Useable Die) · (Cost per Die)    (2.2)
    = DPW · Y_PT · P_ASP · (1 − M)    (2.3)
    = (π·(D_w/2)^2 / A_SoC) · Y_PT · P_ASP · (1 − M)    (2.4)
Solving for A_SoC formulates the die size estimate based on these assumptions.

A_SoC = (π·(D_w/2)^2 · Y_PT · P_ASP · (1 − M)) / C_w    (2.5)
From this formulation and the estimates in table 2.1, an estimate of 53
mm^2 is made. As mentioned, the increase in area of the newly integrated noise
cancellation circuitry should occupy an additional 6-19% of this, or between 3
and 10 square millimeters.
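For illustration, the arithmetic above can be written out as a short Matlab check. This is a sketch only; the variable names are not taken from the report's scripts, and the price-ratio exponents simply restate equations 2.1-2.5 together with the cost = f(area^4) assumption.

% Die-area budget check using the assumptions of table 2.1.
Cw   = 3000;     % wafer cost ($)
Dw   = 300;      % wafer diameter (mm)
Ypt  = 0.75;     % packaged and tested yield
Pasp = 10;       % asking sale price per part ($)
M    = 0.70;     % device sales margin

% Equation 2.5 (edge-die term neglected): typical SoC die area
Asoc = (pi*(Dw/2)^2 / Cw) * Ypt * Pasp * (1 - M)     % ~53 mm^2

% Area budget for ANC, from cost = f(area^4)
Aanc_low  = Asoc * (1.25^(1/4) - 1)                  % ~3 mm^2 at a 125% price point
Aanc_high = Asoc * (2.00^(1/4) - 1)                  % ~10 mm^2 at a 200% price point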
2.2 Power

Many SoC devices' power consumption is rated on battery life. Based
on a typical AA battery, the power of an SoC is related to the battery life as

P = (Capacity × Vdd) / Battery Life    (2.6)
  = (2850 mAh × 1.5 V) / Battery Life    (2.7)
The battery life of the Sigmatel D-major is 50 hrs on 1 AA battery. This
results in average power dissipation of 85 mW. Note that the storage device,
be it flash or hard disk, will also dissipate power. Stand-alone noise cancelling
headphones are always powered off of a separate battery, resulting in a long
battery life for the combined solution. The battery life of the Sennheiser PXC
250 headphones is rated at 80 hours with 2 AAA batteries (roughly 94 mW).
However, the newly integrated media player SoC may only pay a small power
penalty to maintain sufficient battery life.
In total, a portable media player used together with stand-alone active noise cancellation headphones dissipates around 180 mW. Some of
the power is redundant in that there are two audio amplifiers in the system, one
in the player and another in the headphones. With this in mind, it is reasonable
to allow for an increase in power to account for the noise cancellation circuitry.
This increase will be to the player alone. The assumption is made that an
increase in power of 10-33% would be tolerable, given that a person would
consume nearly 100% more power to use the conventional noise cancelling
headphones with a separate battery. This results in a power range of 8-28
mW for the additional logic that handles the noise cancellation.
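The same budget can be restated numerically; a short Matlab sketch of the arithmetic, using only the figures quoted above:

% Average player power implied by battery life (equation 2.7), and the
% 10-33% increase allowed for the noise cancellation logic.
P_player = 2850*1.5/50;              % Sigmatel D-major: 1xAA, 50 hrs -> ~85 mW
P_anc    = P_player * [0.10 0.33]    % ~8 mW to ~28 mW budget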
2.3 Performance

Two definitive performance metrics describe the operation of the noise
cancelling feature. These are noise cancellation and convergence.

2.3.1 Noise Cancellation
Performance of the noise cancelling solution, a measure of the difference
in magnitude between the input and output noise, is chosen to compete directly with
existing noise cancelling headset solutions. Typically the noise rejection is
quoted as the average difference in dB over a specified band-limited range. This
will be referred to henceforth as the band-limited noise cancelling performance
(BLNCP). Figure 2.1 depicts a BLNCP measurement in simulation.
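As an illustration of how such a measurement can be taken, the sketch below averages the dB difference between two spectra over the band of interest. It is not the report's test-bench code; the signal names d (noise at the ear with ANC disabled) and e (with ANC enabled) and the plain rectangular averaging are assumptions.

% Band-limited noise cancelling performance (BLNCP) sketch: average
% difference in dB between the uncancelled and cancelled noise spectra
% over 0-1 kHz.  d and e are equal-length column vectors.
fs    = 4000;                      % sample rate (Hz), illustrative
N     = length(d);
f     = (0:N-1)' * fs / N;         % FFT bin frequencies
D     = 20*log10(abs(fft(d)));     % magnitude spectrum, ANC disabled
E     = 20*log10(abs(fft(e)));     % magnitude spectrum, ANC enabled
band  = f <= 1000;                 % band-limited range
blncp = mean(D(band) - E(band))    % average rejection (dB)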
A pair of Sennheiser PXC 250 headphones ([24]) are among the latest
in noise cancelling headphones on the market. Sennheiser specifies these particular headphones to actively reject “up to 15 dB” at frequencies less than 1
kHz. The PXC 250 headphones also provide 15-20 dB of passive attenuation
at frequencies above 1.2 kHz[24]. The passive attenuation is realized via the
supraaural ear cups.
Figure 2.1: Performance of an active noise control algorithm (band-limited noise cancelling performance measure for 200 Hz and 400 Hz sinusoidal inputs, mu = 0.0005, 13 taps, fanc = 2000; average rejection 39.2 dB).
2.3.2 Convergence
The time that the algorithm takes to converge on a set of coefficients
that match the plant is referred to as the settling time. Since the device
will be portable, the plant will be dynamic and require the active noise control circuitry to continually adapt the coefficients in a reasonable time that
doesn’t affect the desired audio. A typical convergence measurement made in
simulation is shown in figure 2.2.
2.3.3 Summary of Specifications
Table 2.2 summarizes the device specifications.
The design of the hardware-based noise cancelling solution will be targeted for the specifications in table 2.2.
Figure 2.2: Typical settling time of an active noise control algorithm (MSE settles in roughly 262 ms).
Parameter                                    Abbreviation  Value  Units
Band Limited Noise Cancellation Performance  BLNCP (1)     15     dB
Settling Time                                T_set         TBD    ms
Die Area                                     A_ANC         10     mm^2
Power Dissipation                            P_ANC         28     mW

Notes:
1. Measured as the average difference across the 1 kHz band.

Table 2.2: Active noise control specifications.
Chapter 3
Modelling the System
Matlab [19] is relied on heavily to model discrete time filters in industry
and academia alike. The strength of Matlab lies in its capability to perform
matrix algebra efficiently. Since the underlying math in signal processing relies on matrix algebra, it makes for an unparalleled discrete signal processing
simulation environment.
Mathworks, the developers of Matlab, recommend minimizing the number of for and while loops in the scripting environment, since Matlab is optimized to perform computations on matrices instead of individual pieces of
data. Programming without loops can be challenging, but improves the performance of Matlab significantly. Typically, the scripts can be written in the
form of a loop to verify functionality, then mapped into parallel operations
as an optimization. The simplest example is an FIR filter, wherein the input
and coefficients are vector multiplied to determine the output. Matlab
prefers computing in exactly that fashion. Computing the FIR output using
a while loop is discouraged. However, most of the adaptive filter block was
implemented using loops to mimic a SystemC simulation developed in parallel
by Rich Lathrop and Michael Hutchinson.
Lower level simulations of the mathematical building blocks are described in section 4.3.
The Matlab simulation can be divided into several major blocks. They
combine to provide performance measures of the system and enable flexibility
of architectural exploration. Figure 3.1 depicts the overall Matlab model.
Some of the key Matlab scripts developed for this work can be found
in appendix B.
Figure 3.1: Overview of Matlab model including the test bench, input builder, and active noise control blocks.
Figure 3.2: Active noise control module implemented in Matlab.
3.1 Function Breakdown

3.1.1 Active Noise Control
The active noise control block includes the adaptive filter along with
the acoustic environment that surrounds it. Key parameters of the adaptive
filter are passed in from the test bench to enable external control. The Matlab
script executes each operation manually such that it can be modified at a very
low level, at the expense of simulation speed.
In figure 3.2, the active noise control function is shown. This filter
topology was developed in [16].
3.1.2 Acoustic Model
The active noise control simulation requires an accurate model of the
acoustics as well. The study of acoustic transfer functions related to the human
head, or the approximation thereof, is known to the acoustic community as the
head related transfer function, or HRTF. The Earlab in Boston [9] maintains
a database of HRTFs across test subjects and source location. Approximating
the HRTF with an IIR filter is documented in [14].
MIT’s OpenCourseWare provides an excellent graduate-level course in
acoustics and hearing that goes into great detail describing the propagation of
sound and hearing [6].
The simulation script has a placeholder to use a complex transfer function such as one from the Earlab. However, the performance and convergence
measurements acquired are not based on HRTFs from the Earlab. The Earlab
transfer functions are in-depth models that even account for the dimensions
of the hairs within the ear over the entire hearing range. The adaptive filter
is targeted for lower frequencies and these types of effects are not included.
In the implementation there exist two transfer functions. The first of
these acoustic transfer functions is the path from the reference microphone to
the secondary source in the left ear and the second is to the right. Note that
these two transfer functions are different based on the azimuth of the noise
source. The secondary feedback path is also an acoustic one, that may be
similar to the primary path, with more attenuation.
The end-product may have a microphone positioned very near the output speaker or farther away. In both cases, there should be a headphone cup
around the speaker that attenuates high frequencies.
3.1.3 ADC
A complex ADC block models the characteristics of a realistic pipelined
analog to digital converter. Inclusion of a detailed ADC model enables insight
into system effects of the ADC inaccuracies. For example, the gain of the
residue amplifier can be set as 2.1 ± 6σ and the performance degradation can
be quantified. Other aspects of the ADC include but are not limited to the
parameters in table 3.1.
Note that the output of the ADC can be digital or analog. If the analog
option is used, quantized values between +1 and -1 are returned. The digital
option returns an array of bits which can be less useful in the simulation, yet
Description                                Parameter        Ideal Value
Residue amplifier gain                     Gain             2
Offset of comparator                       Offset           0
Sample rate                                Fs
Number of bits                             Bits             N
Type of output returned (analog/digital)   Output Type (1)  Digital

Notes:
1. The finite precision analog value simplifies the simulation.

Table 3.1: Matlab ADC Parameters.
mimic a real pipelined ADC. This can be helpful during the design phase as
the array of bits can be post-processed as inputs to a Verilog testbench or
HSPICE simulation.
The ADC has the ability to quantize the inputs to the adaptive filter, but is not used in finite precision LNS simulations, since the inputs are
quantized to finite LNS values.
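A minimal stand-in for that quantizing behaviour is sketched below, ignoring the pipeline non-idealities of table 3.1; the function name and interface are illustrative and do not correspond to the actual ADC model.

% Quantize a signal in [-1, 1) to nbits and return finite-precision
% "analog" values, as the ADC block's analog output mode does.
function q = adc_quantize(x, nbits)
    step = 2^(1 - nbits);               % LSB for a [-1, 1) full-scale range
    q    = round(x / step) * step;      % mid-tread quantizer
    q    = min(max(q, -1), 1 - step);   % clip to the representable range
end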
3.1.4 Test Bench

The test-bench passes inputs to the filter and measures the performance
thereof. The ANC system is instantiated from the testbench in one line. For
sampling frequency fs, stepsize mu, filter order Worder, quantization type
quant (infinite, linear, or LNS), left of decimal lod, and right of decimal rod,
the system is called from the test bench.
[output, weights] = anc(fs, input, mu, Worder, quant, lod, rod);
The function anc returns an output vector and weight matrix. The size
of the weight matrix is Worder x length(input) as it maintains a history of
the weights over time.
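An illustrative invocation is shown below. The parameter values mirror the configuration used later in this chapter, but the encoding of the quant argument (here a string) and the use of the output as a residual error signal are assumptions rather than the actual script interface.

% Illustrative call to anc(); argument encodings are assumed.
fs    = 4000;                         % sample rate (Hz)
t     = (0:3*fs-1)'/fs;               % three seconds of samples
input = sin(2*pi*302.6*t);            % single-tone reference
[output, weights] = anc(fs, input, 0.001, 13, 'lns', 6, 2);
% Treating the output as the residual error, a smoothed MSE can be plotted:
mse = filter(ones(100,1)/100, 1, output.^2);
plot(t, mse); xlabel('Time (s)'); ylabel('MSE');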
The response of the filter and consequently its performance is greatly
dependent on the inputs [30]. A variety of inputs can be passed to the filter
to study the performance. The goal of any adaptive filter is to adapt to a
changing stimulus. Besides adapting to changing inputs, the filter adapts to
a change in the plant as well. The test bench can take this into consideration
by modelling a change in the plant characteristics during the simulation.
Performance measures of the active noise control system include, but
are not limited to, the items in table 3.2.
Parameter           Description                                Units    Constraints
Noise Cancellation  Difference in magnitude of noise           dB       Band-limited
                    output and input
Convergence Time    Time required to reach an acceptable       seconds  MSE trigger point
                    MSE after a transition in the input
                    or plant
Stability           Is the filter stable or not?               Y/N      Stability Criteria

Table 3.2: Performance measures acquired in the test bench function.
3.1.5 Input Builder
The input builder provides a non-stationary input to the adaptive filter.
It accepts two signals, either synthesized in Matlab, or wave files from the hard
drive. The two inputs are first windowed to achieve the desired length and to
attenuate the beginning and end of each signal. Once the two are windowed,
they are concatenated, with any desired amount of overlap. The result is a
series of two separate noise sources, which can form a sudden change to the
input of the adaptive filter. This type of non-stationary input can be used to
benchmark the convergence rate of the filter.
A simple case of two sine waves of different frequencies sent as input to
the builder shows the window-concatenate operation. This is depicted in figure 3.3.
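A minimal Matlab sketch of the window-concatenate idea is given below; it is not the actual input-builder script, and the Hanning window and overlap-add handling are illustrative choices.

% Window two source signals and concatenate them with some overlap,
% producing a non-stationary stimulus for the adaptive filter.
fs = 4000;  n = (0:fs-1)';
a  = sin(2*pi*200*n/fs) .* hanning(fs);   % first noise source, tapered
b  = sin(2*pi*400*n/fs) .* hanning(fs);   % second noise source, tapered
ov = 200;                                 % samples of overlap
x  = [a; zeros(length(b)-ov, 1)];         % first signal plus room for the second
x(end-length(b)+1:end) = x(end-length(b)+1:end) + b;   % overlap-add the second
plot(x);                                  % sudden change partway through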
3.2 Modelling the Logarithmic Number System

As part of the effort to make the adaptive filter operate with low power
consumption, the logarithmic number system [26], or LNS, is employed. Modelling of the LNS behavior was conducted in MATLAB. This section gives an
overview of how to model LNS in MATLAB. More detail of the LNS implementation is found in chapter 4.

Figure 3.3: Input builder window-concatenate operation.

Linear    Linear Sign  Sign  Magnitude  MATLAB Variable
3.8807    0            0     01110111   '0001110111'
0.7521    0            1     00011001   '0100011001'
-3.8807   1            0     01110111   '1001110111'
-0.7521   1            1     00011001   '1100011001'

Table 3.3: LNS word stored as a string.
An N-bit LNS number may be stored as an N+2 length word where the
linear sign is stored in the most significant bit. The LNS model utilizes sign-magnitude representation of the exponent. Since the logarithm of a negative
number is complex, the linear sign bit indicates if the linear number is negative.
For modelling purposes, this can be represented in several different ways. The
first is to use a struct wherein the sign and zero flags are stored separately
from the value. This method requires the least overhead.
xlns=struct(’z’,xz,’s’,xs,’x’,xlog);
xlns.z % Zero Flag
xlns.s % Sign Flag
xlns.x % Magnitude
The second method of storing this type of data in MATLAB is to pack
the binary data into a single word and store the string as it would be stored
in a register.
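A sketch of that packing is shown below. The base and the 4-bit fractional split are chosen so that the result matches the example values in table 3.3; it is not the appendix B routine.

% Pack a linear value into the 10-character LNS string of table 3.3:
% [linear sign | exponent sign | 8-bit exponent magnitude].
b    = 1.2;  nrod = 4;                   % fractional bits chosen to match table 3.3
x    = 3.8807;                           % linear input
ls   = double(x < 0);                    % linear sign bit
e    = log(abs(x)) / log(b);             % LNS exponent, log base b
es   = double(e < 0);                    % exponent sign bit (sign-magnitude)
mag  = min(round(abs(e) * 2^nrod), 255); % quantized 8-bit magnitude
word = [num2str(ls) num2str(es) dec2bin(mag, 8)]   % -> '0001110111'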
For testing LNS itself, as in appendix ??, the struct form is used. Simulation of the adaptive filter, however, only requires quantizing and saturating
each stage of the filter. Storing each individual bit is not necessary.
3.2.1 LNS Finite Word-Length Noise

The LNS finite-word lengths exhibit several interesting properties, especially for acoustic applications.
Figure 3.4: The LNS bow-tie depicts the LNS fixed-point conversion error.
Figure 3.4 depicts the conversion error. This will be referred to as the
bow-tie effect.
The most interesting of the two error properties is the error near zero,
referred to henceforth as e_nz. Note the sharp increase in error as the input
approaches zero in figure 3.4. From [26], converting a number, X, to an LNS
word of base b, X_L, is accomplished by taking the logarithm of X.
X_L = log_b(|X|)    (3.1)

Note that X_L approaches −∞ as |X| approaches 0. This is depicted in figure
3.5 for b = 1.2.

(X_L)_{|X|→0} = lim_{|X|→0} log_b(|X|) = −∞    (3.2)
Figure 3.5: As |X| approaches 0, X_L approaches −∞.
While a zero flag represents the situation wherein |X| is equal to zero, the
largest LNS exponent value, X_L, determines the magnitude of e_nz. If X_L is at
the maximum value, this is as close to −∞ as the system can represent without
being exactly zero.
Since the value of X_L will be stored as a normal fixed-point binary
number, e_nz can be formulated in terms of N_LOD and N_ROD, the number of
bits to the left and right of the decimal, respectively.
e_nz = b^(−(2^N_LOD − 2^−N_ROD))    (3.3)
e_nz ≈ b^(−2^N_LOD)    (3.4)
Because 2^N_LOD is much larger than 2^−N_ROD, it is desirable to pack more
bits to the left of the decimal than the right, resulting in a larger |X_L| and
consequently a smaller e_nz. However, stealing bits from the right of the decimal
to decrease e_nz adversely affects the second error property, the slope of the
bow-tie.
The slope of the bow-tie effect seen in figure 3.4 has a strong linear
envelope. The linear envelope has a slope that is proportional to 2^−N_ROD. The
effect observed is that the slope of the bow-tie envelope is cut in half for every
bit that is added to the right of the decimal, independent of what value is
chosen for N_LOD.
The two properties of the LNS conversion error allow the designer to
choose where the error will impact the design. Simulations of varying N_ROD
and N_LOD in the adaptive filter showed that e_nz has the greatest effect on the
error.
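The behaviour described above is easy to reproduce. The following Matlab sketch quantizes the LNS exponent to an N_LOD/N_ROD split, converts back to the linear domain, and plots the conversion error; it is a stand-in for the report's LNS fix-and-saturate routine, not the routine itself.

% LNS fixed-point conversion error (cf. figure 3.4 and equation 3.3).
b = 1.2;  nlod = 6;  nrod = 2;
x    = linspace(-1, 1, 2001);               % linear inputs
e    = log(abs(x)) ./ log(b);               % ideal LNS exponents
emax = 2^nlod - 2^(-nrod);                  % largest exponent magnitude the word holds
eq   = min(max(round(e*2^nrod)/2^nrod, -emax), emax);   % quantize and saturate
xq   = sign(x) .* b.^eq;                    % back to the linear domain
xq(x == 0) = 0;                             % zero-flag case
plot(x, xq - x);                            % bow-tie shape, large error near zero
e_nz = b^(-emax)                            % error floor near zero, cf. equation (3.3)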
Note that the linear number system has constant quantization error
across the linear input range. For a linear word length N, the quantization
error is bounded by one-half of the least significant bit.

Figure 3.6 shows the mean square error convergence vs time for several
different test cases. The benefit of increasing N_ROD beyond 2 is very low.
N_LOD = 6 provides acceptable BLNCP and a reasonable settling time.
Figure 3.6: Mean-square error (MSE) for several different LNS bit precisions (LOD/ROD combinations, length-20 filter, 200 Hz stationary input).
Figure 3.7: Mean-square error (MSE) for varying number of taps (filter orders 10-20, LNS 6/2 precision, mu = 0.5e-3).
This results in an 8-bit LNS word and a 2 × 8 × 2^8 lookup operation, short enough
to implement several within the system.
Filter tap-length requires extensive characterization and modelling across
a myriad of scenarios. Several multi-frequency inputs and a broadband wav
file were tested against L = 13 and provided sufficient cancellation. Mean
square error convergence plots for varying filter taps are depicted in figure 3.7.
Error during a multiply or add operation suffers from similar characteristics when the result is near zero. Histograms of the error in LNS multiplication and addition are shown in figures 3.8 and 3.9, respectively. Note the
outlying error datapoints in figure 3.9. In these cases, the two numbers have
very close magnitude with opposing signs, resulting in a number close to zero.
Figure 3.8: LNS Fixed-Point Multiplication error for inputs from −1 to +1.
Figure 3.9: LNS Fixed-Point Addition error for inputs from −1 to +1.
3.3 Model Results
A wide range of design trade-offs are made to meet the specifications.
Figure 3.10 shows the different high level decisions that are made in designing
the low-power adaptive filter.
3.3.1 Word Length Results

Using the Matlab model, the mean-square error, ξ, was measured to
determine the optimum bit precision for this system. From section 3.2.1, the
integer part of the LNS word, N_LOD, must be sufficiently large to correctly
represent small numbers. Stouraitis and Paliouras recommend using b = 1.2
to minimize the bit activity [26]. For N_LOD = 5, e_nz = 0.003, which is
too coarse to represent the step size used in the model, so N_LOD = 6 is used and
results in e_nz = 8.5 · 10^−6. This should be small enough to begin with.
Simulations were run with N_LOD fixed at 6 for varying N_ROD to determine the
optimum N_ROD. Figure 3.12 depicts the MSE for the system with the adaptive
filter disabled, with floating point precision, and with varying N_ROD.
Parameter  Value
Input      Single Tone (302.6 Hz)
fs         4 kHz
µ          0.001
b          1.2
P(z)       N=20 Chebyshev Window

Table 3.4: Configuration used to determine N_ROD.
Using 3 fractional bits follows the floating point results very closely. However,
adding a single bit can double the size of the LNS ROM table in the saturating
addition unit (see section 4.2.5). N_ROD = 2 exhibits slightly higher MSE yet
meets the specification for noise rejection in table 2.2, so 2 bits are allocated for
N_ROD.
Determining the filter tap-length requires extensive modelling under
a variety of conditions and stimulus. Simulations for different tap-lengths,
Figure 3.10: Architecture and design decision tree (adaptation method, filter structure, number system, logic style, and fixed-point precision, traded against the noise rejection, convergence, stability, power, and cost specifications).
Figure 3.11: Determining the fractional precision of the LNS words. 6/2 is found to be optimum.
Figure 3.12: MSE measurements after the filter has converged for different
LNS fractional precisions.
Figure 3.13: BLNCP for the implemented architecture (302 Hz input, N=20 Chebyshev window plant, mu = 0.001, 13 taps, fanc = 4000, LOD/ROD 6/2; average rejection 21.6 dB).
Figure 3.14: Convergence measurement of the filter (roughly 35 ms).
input sequences, plant characteristics, and step sizes must collectively give the
designer an idea of how many taps are necessary to meet the specifications.
Convergence rate is checked under the conditions listed in table 3.4.
Figure 3.14 shows the final convergence rate of the filter.
Chapter 4
Low-Power Multiply/Add Building Blocks
4.1 Logarithmic Number System
The logarithmic number system[28], or LNS, was introduced as an alternative to linear numbers in an attempt to enhance performance. An increase
in performance can translate into power savings with minor changes[8].
log_b(A · B) = log_b(A) + log_b(B)    (4.1)
The benefit of easily computing the product is paid for with a more
complex addition operation. The addition of two linear numbers represented
in LNS requires a fairly complex design with many tradeoffs to be made. Most
of the literature on LNS focuses on ways to improve the addition circuit.
4.2 Implementation
The basic building blocks used to create LNS multiply and add units
were created using the NCSU 0.18µm technology. Designing the arithmetic
units is expedited with the use of several specialized tools. Verification of the
adder was completed by using MATLAB to generate a Verilog testbench that
provides inputs to the unit and checks the output against what is expected. A
custom PERL ROM compiler was developed to create the ROM lookup table
for addition and subtraction operations.
Figure 4.1: LNS multiply logic.
4.2.1 LNS Multiplication

As mentioned, the LNS multiplier is simple. Figure 4.1 exhibits the
logic required to compute the multiplication. The critical path of the multiplier
is just a single addition block:

t_crit = t_add    (4.2)

4.2.2 LNS Adder
Addition, or subtraction, in LNS is typically accomplished via [26]:

C_l = A_l + log(1 + b^(B_l − A_l)) = B_l + log(1 + b^(A_l − B_l))    (4.3)
Note that the inputs A_l and B_l can be swapped, providing two different
ways to compute the same result. Typically, the log(1 + b^(A_l − B_l)) operation is
accomplished via a lookup table. Reducing the lookup table size is the primary
area of interest when using the logarithmic number system. The subtraction
operation, expressed in equation 4.4, is similar but requires a separate lookup
table.

C_l = A_l + log(1 − b^(B_l − A_l)) = B_l + log(1 − b^(A_l − B_l))    (4.4)
The addition block is lengthy and requires two adders and the lookup
table in series. The critical path through the addition block is:

t_crit = t_add + t_mux + t_lu + t_mux + t_add = 2·t_add + 2·t_mux + t_lu    (4.5)
Comparing equation (4.5) to the multiplier critical path in equation
(4.2) shows that the adder will have at least 2x the delay of the multiplier.
Many papers have been written on how to speed up the addition operation
under the LNS system. Moreover, the adder logic has been studied extensively outside the LNS realm, leaving the lookup operation as the focus for
improvement.
The lookup table can be a PLA, ROM, or a combination of both. Designers have found several ways to minimize the lookup table length. The
easiest way to reduce the lookup table is to guarantee a negative value for
Bl − Al [29]. This may involve a slight increase in area for the same delay
because of the additional adder. However, it reduces the table to one-half the
original size.
Mark Arnold has researched different interpolation methods extensively
in [2], [3], and [4].
Figure 4.2: LNS addition block.
A simple LNS addition scheme was implemented that guarantees negative inputs to the lookup table. Figure 4.2 shows the addition/subtraction
unit. Note that the two 7-bit tables can be combined into a single 8-bit table
and the output mux becomes part of the lookup column mux.
4.2.3 Linear Adder
A Ladner-Fischer[17] style tree adder was chosen to implement the
2’s complement addition required for LNS multiplication and addition. This
particular adder is better for longer wordlengths. For 8-bit addition, a simpler
carry-save or carry-lookahead adder may give similar results.
Since the adder is used for signal processing applications, it must saturate if it overflows. Because of this, the output of the adder has to be multiplexed with the most positive output and the most negative output.
4.2.4 Multiplexer
Initially, a pass-gate multiplexer was used. The pass-gate mux, while
having desirable timing, is not suitable for low voltage applications. For operation at very low voltages, a CMOS multiplexer is a much better choice. Figure
4.5 depicts the 4:1 multiplexer used to switch between array slices in the ROM
(column mux). A similar CMOS 2:1 multiplexer is instantiated within the LNS addition block.

Figure 4.3: Ladner-Fischer tree adder [11] used for LNS operations.

Figure 4.4: Saturating linear adder used in LNS adder and multiplier blocks.
4.2.5 ROM
The lookup operation is implemented in a ROM array. More complex
partitioning or even a PLA implementation may also be suitable choices for
LNS. In [29], Taylor utilized a ROM for the majority of the table, while a PLA
represents portions of the table that change less rapidly.
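The contents of such a table are straightforward to generate offline. The sketch below produces quantized "plus" and "minus" entries in the spirit of the LNS lookup-table generator of appendix B, but it is not that script, and the 7-bit table depth and integer output format are assumptions.

% Lookup-table contents for the LNS adder: entry d holds log_b(1 +/- b^(-d))
% quantized to 2 fractional bits, addressed by the non-negative exponent
% difference d.
b = 1.2;  nrod = 2;  entries = 2^7;           % one 7-bit table per operation
d = (0:entries-1)' / 2^nrod;                  % differences in steps of 0.25
plus_tbl  = round(log(1 + b.^(-d)) ./ log(b) * 2^nrod);
minus_tbl = round(log(1 - b.^(-d)) ./ log(b) * 2^nrod);
minus_tbl(1) = 0;         % d = 0: exact cancellation, handled by the zero flag
% Larger differences round to zero and need no correction.
% dlmwrite('romdata.txt', [plus_tbl; minus_tbl]);  % integers for the ROM compiler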
A PERL ROM compiler was developed to generate an HSPICE simulation deck that provides power and delay measurements of the array. The
goal of the custom ROM compiler developed specifically for this design was
to allow quick iterations when determining power consumption and timing.
Some of the high level features of the ROM compiler are listed in table 4.2.5.
The ROM implemented has 8 bit words and is organized to compensate
for the rectangular cell size, 8λ × 12λ. There are four slices of 64 entries. This
gives a more square overall ROM (384λ × 512λ). Figure 4.6 depicts the overall
ROM layout organization.
Figure 4.5: 4-to-1 CMOS multiplexer [31].

Implements single or differential bitline
ROM data read from a file
Generates PWL inputs based on integer input file
Short main .sp file hides complexity of array
Supports drop-in sense amp replacements
Easily integrates with other circuits

Table 4.1: ROM Compiler high-level features.

Figure 4.6: Overall ROM organization. Slices are interleaved in actual implementation.

Note that the PMOS pre-charge devices and the sense amplifiers can
be disabled to save power. If the system that uses the LNS arithmetic block
consists of a series of multiply→add→latch operations, the ROM array can go
into a sleep mode. During the sleep mode the pre-charge devices are turned
off along with the sense amplifiers, if used. The enables for the pre-charge
devices and the sense amplifiers are ganged by the slice, such that logic may
enable individual slices to save power.
The array, comprised of multiple slices, must incorporate a column
multiplexer to select between the slices. The layout of the column mux is
more efficient if the slices are interleaved.
The ROM compiler generates three separate HSPICE files to allow
simulation of the full array along with the array’s periphery. The romdeck.sp
file houses the raw array including individual π-models for the bit and word
line routing. A lnsops.sp file sets all of the simulation parameters, generates
inputs, and makes measurements. A third file, periph.sp, contains the array's subcircuits. These include PMOS pre-charge devices, sense amplifiers, bit-cell,
and π-models.
Figure 4.7: Single-ended bit-cell shown with parasitic metal capacitance and resistance.
4.2.5.1 Cell
Each individual ROM cell must be programmed to give the correct data
value. This is accomplished by programming resistance values within the cell
that behave like metal contacts. The differential bit-cell is shown in figure 4.8
and the single-ended bit-cell is shown in figure 4.7.
R0 and R1 must be designed for each cell to be connected to either
bitline or bitline b. This is accomplished via a handshake between PERL and
HSPICE using two ternary statements. The PERL ROM compiler knows if
the current bit, which sits at a particular slice/column/row address, should be
a 1 or a 0. If the bit is a logic 1, the cell will have a low resistance contact to the
NMOS pull-down in the cell. The opposite is true for the contact to bitline b
in the differential case. Below is the HSPICE subcircuit that implements each
bitcell.
Figure 4.8: Differential ROM bit-cell with complementary output (BLb) and parasitic elements.

.SUBCKT bitcell bl blb wl dat=1
M1 netint wl 0 0 TSMC18DN L=180E-9
+W=270E-9 AD=121.5E-15 AS=121.5E-15
+PD=1.44E-6 PS=1.44E-6 M=1
R1 netint bl ’(dat==1) ? 1e-3 : 10MEG’
R0 netint blb ’(dat==0) ? 1e-3 : 10MEG’
.ENDS bitcell
For each bitcell that PERL instantiates, it sets the dat parameter to
either ’1’ or ’0’. When dat is equal to ’1’, the HSPICE bitcell sub-circuit will
program R1, which models the connection to bitline, with a low resistance and
R0 with a high resistance. Similarly, dat equal to ’0’ will set a high resistance
to bitline and a low resistance to bitline b. The result is a fully programmed
NxM array with S slices. This particular implementation makes use of a 64x32
array with 4 slices.
4.2.5.2 Row Circuitry
A two-level, static CMOS decoder [23] was chosen to implement the
word-line driver. The pitch of the decoder must match that of the individual
ROM cell. A single ended 1T ROM cell may be as small as 7λx11λ [11] whereas
a standard cell may be roughly 32λ in height. The small signal version of the
ROM is metal limited as the bit and word line wires must be wide to have
lower resistance. The row decoder has to be staggered in order to mate with
the ROM pitch.
4.2.5.3 Column Circuitry
The ROM compiler has the capability to instantiate differential bitlines with sense amps or single ended bitlines with an inverter output (large
signal). The column circuitry is made up of the bitline receivers and column
multiplexer. Two common sense amplifiers were investigated for use in the
lookup table along with the simple inverter receiver. The current mirror amplifier [23], shown in figure 4.9, consumes too much power to be useful for this
application. It was found to consume upwards of 8-9 mW. A latching sense
amplifier [23], shown in figure 4.10, was also measured for power and timing.
Figure 4.9: Current mirror sense amplifier.
It provided a reduction in power of around 75% and less than 1 ns worst case
timing through the ROM. The final solution to the power consumption issue
is the large signal inverter receiver. This provides the least power and area,
both at the expense of timing.
4.3 Circuit-level Simulation
Blocks within the LNS add unit and the LNS multiply unit were simulated separately, due to the large simulation time of the ROM. From the
simulation results, timing and power estimates can be made for the adder and
the multiplier.
4.3.1 ROM Simulation
Simulation of the ROM at the transistor level was conducted to measure timing and power consumption. The following simulation results do not
include the row decoder logic or the column multiplexer. Figure 4.11 depicts
the ROM power consumption during a read operation. The ROM power is
listed in table 4.2. Static power should reduce linearly with the number of slices
enabled.

Figure 4.10: Latching sense amplifier.
A transient simulation assessed the read access time through the ROM
array. The result is shown in figure 4.12.
4.3.2 Linear Adder
The linear adder unit was also simulated to measure the power consumption and worst case timing path. This was also conducted in HSPICE.
Since the adder is instantiated three times within the LNS adder and once
within the LNS multiplier, the power, area, and timing contributions are significant with regards to the LNS blocks.
Figure 4.11: Power of the 8-bit ROM with inverter receiver.
Block              Power (mW)  Timing (ns)  Area (µm^2)  Notes
LNS Add w/o ROM    0.382       3.1          9,422
ROM                0.620       2.8          10,314
LNS Adder          1.002       5.9          19,736       1
LNS Multiplier     0.150       1.2          6,933        2

Notes:
1. The LNS adder is two 7b ROMs, three linear add units, and 2 multiplexers.
2. The LNS multiplier is comprised of a single linear add unit and an XOR gate.
3. Power and timing simulations are for VDD=1.2V and room temperature.

Table 4.2: Implementation results.
47
lns log-add hspice simulation
1.2
1
Voltages (lin)
800m
600m
400m
200m
0
29n
30n
31n
32n
Time (lin) (TIME)
33n
Figure 4.12: ROM read access time.
48
Figure 4.13: Power consumption in the 8-bit linear saturating adder.
Figure 4.14: Worst case timing measurement of the 8-bit linear saturating adder.
Chapter 5
Filter Hardware
With architectural and implementation aspects defined, the filter hardware must be addressed such that timing, area, and power specifications are
met. The design specifications for this application are developed in chapter
2 and summarized in table 2.2. The adaptive filter model was developed in
chapter 3 and generic arithmetic units in chapter 4. Mapping the architectural
model of chapter 3 using the building blocks from chapter 4 is the focus of this
chapter.
The model developed makes use of a direct form FIR filter. The output
of the direct form FIR filter is expressed as

y = Σ_{i=0}^{L−1} w(i) · x(L − i)    (5.1)
A direct mapping of equation 5.1 into a signal processing diagram is depicted
in figure 5.1. The critical path of this structure is T_mult + L·T_add. Since L will
be in the range of 10-20, the cycle time would need to be unreasonably
long [22]. Transposing the filter in figure 5.1 places the latches in the
sum path, reducing the cycle time to T_mult + T_add while preserving the filter
structure. Transposition is defined by Keshab Parhi:
Reversing the direction of all edges in a given signal flow diagram
and interchanging the input and output ports preserves the functionality of the system.
Transposition decouples the cycle time from the tap-length. Figure 5.2 depicts
the transposed filter, also known as the data-broadcast FIR filter.
Figure 5.1: Direct form FIR filter block diagram.
Figure 5.2: Transposed FIR filter block diagram.
Figure 5.3: Least-mean square weight update block.
The FIR filter, alone, could be easily pipelined using feedforward cutsets to improve the cycle time. However, adding the weight update block,
shown in figure 5.3, complicates the structure.
W(n) = W(n − 1) + 2µ · e(n − 1) · X(n − 1)    (5.2)
The complication arises from the recursive loops formed by adding the weight
update to the FIR filter. The weight update unit accepts feedback from the
prior weight vector, W(n − 1), and the error signal, e(n − 1). These paths, by
definition of the weight update equation in equation 5.2, are not delayed. As it
stands, the data-broadcast FIR filter along with the weight update block has a
cycle time of Tmult +Tadd . If the designer wishes to reduce the cycle time of the
filter as a whole, both the FIR and weight update blocks must be addressed.
Trade-offs of pipelining the adaptive filter, while not implemented in this application, are discussed in section 5.2. The detailed procedure for pipelining the
LMS filter is described by Keshab Parhi in [22] and [25].
The filter and weight update block are shown together in figure 5.4.
The low cycle time requirements of the system along with a narrow bit-width
allow flexibility in molding the filter to meet the requirements from the model.
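Before mapping it to hardware, the combined structure can be stated compactly. The following floating-point Matlab sketch is behaviourally equivalent to the data-broadcast filter plus weight update of figure 5.4 (transposition changes the hardware structure, not the input-output behaviour); it is not the LNS fixed-point implementation, the reference input x and primary-path signal d are assumed given, and the error sign convention is the usual d(n) − y(n) form implied by the + sign in equation 5.2.

% Behavioural LMS adaptive filter: FIR output of equation (5.1) with the
% weight update of equation (5.2).
L  = 13;  mu = 0.001;
w  = zeros(L, 1);                 % filter weights
xh = zeros(L, 1);                 % reference input history
e  = zeros(size(x));
for i = 1:length(x)
    xh   = [x(i); xh(1:end-1)];   % shift in the new reference sample
    y    = w.' * xh;              % filter output (antinoise estimate)
    e(i) = d(i) - y;              % residual at the error microphone
    w    = w + 2*mu * e(i) * xh;  % LMS weight update, equation (5.2)
end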
Figure 5.4: Filter implementation.
This implementation of the LMS adaptive filter requires building blocks
summarized in table 5.1, with L taps and N word-length.
Element              Expression  L = 13 Quantity
LNS Adder Unit       2L − 1      25
LNS Multiplier Unit  2L + 1      27
Latches              3LN         312

Table 5.1: Filter hardware requirements.
From the modelling phase, the sample rate of the filter was found to
be 4 kHz. This requires a 250 µs cycle time.
Table 5.2 exhibits the entire system’s characteristics including area,
power, and timing. These are estimates based on the filter hardware shown in
figure 5.4 and the building block results in table 4.2.
Qty.  Operation           Power (mW)  Area (µm^2)
25    LNS Addition        25.05       493,400
27    LNS Multiplication  4.05        187,191
312   Latches             9.36        9,360
      Total               38.5        689,951

Table 5.2: Filter hardware results.
Power for the adaptive system is 37% over the budget outlined in table
2.2. However, area is less than 10% of what was budgeted for initially.
Given the additional area, additional hardware could be added to reduce power
consumption. This may come in the form of sleep logic. Individual units should
be powered off for some portion of the cycle, leaving power to the latches on to
save the filter state. The implementation is fully parallel, resulting in timing
surplus as the filter is able to operate clock for clock with input samples.
5.1 Possible Enhancements
An enhanced successor to this implementation could be comprised of
one or all of the following features:
54
• Shared lookup table: Sharing the lookup operation amongst
LNS addition units. This would require a multi-ported ROM,
but would be advantageous if a longer wordlength is desired.
• Folded LMS: Folding the current architecture would require
a small increase in logic and latches, but would significantly
reduce power. Timing can easily be achieved given the low
sample rate.
• Variable Logarithmic Word-Lengths: Variable word-length
filters have been studied extensively for the linear number system.
• Tap-Length Modulator: The adaptive filter, by definition,
minimizes error by adjusting difference equation coefficients.
If the error remains too high, additional taps could be
dynamically added to the filter. Similarly, taps could be removed from the loop to save power.
5.2 Pipelining the LMS Filter

To pipeline the weight update stages, Keshab Parhi recommends in [25]
adding delay directly to the weight input. The additional delay elements may
be retimed into the weight update block to achieve a shorter cycle time. In
order to strictly follow retiming, delay must be subtracted from the e(n) and
x(n) cutsets into the weight update block. Since there are no latches to be
removed, the newly added latches change the filter functionality. The weight
update routine will be delayed with the assumption being that the weights do
not have to be updated clock for clock with the filter output.
Stability, convergence, and accuracy of the adaptive filter are sacrificed
to improve the cycle time [25]. Since the stability is affected, the step size
needs to be adjusted to accommodate the new structure. The simulation model
of the adaptive filter must take this type of change into account if it is to be
implemented.
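A minimal Matlab sketch of this delayed update is shown below. The pipeline delay D, step size, and tap count are illustrative assumptions, and x and d are again assumed to be column vectors of reference and desired samples; the update at time n deliberately uses the error and input vector from D samples earlier, which is the behavior the added latches introduce.

% Delayed-LMS sketch: the weight update uses e(n-D) and X(n-D).
L = 13;  mu = 0.005;  D = 2;          % taps, reduced step size, pipeline delay (assumed)
W = zeros(L,1);
e = zeros(length(x),1);
for n = L+D:length(x)
    Xn   = x(n:-1:n-L+1);             % current input vector
    e(n) = d(n) - W'*Xn;              % filter output and error (d - y form)
    Xd   = x(n-D:-1:n-D-L+1);         % input vector from D samples ago
    W    = W + 2*mu*e(n-D)*Xd;        % update with the delayed error and input
end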
5.3 Other Filter Options
A multitude of filter hardware choices is available beyond the direct-form
FIR. IIR and lattice structures are less common, but they are feasible options
that should be considered.
The IIR implementation faces a stricter stability requirement. Not only
must the adaptation be kept stable by setting the step size appropriately; the
IIR filter itself must also remain stable. With static weights, this concern is
lessened. However, as the weights adapt they may move the filter's poles
outside the unit circle.
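As a quick illustration of that concern, the pole locations can be checked after each coefficient update; the sketch below is a hypothetical Matlab fragment in which a holds the adapted feedback coefficients and a_prev the last known-stable set (both names are assumptions, not part of this report's implementation).

% Stability guard for an adaptive IIR section with denominator
% 1 + a(1)*z^-1 + ... + a(M)*z^-M.
poles = roots([1; a(:)]);        % poles of the current recursive filter
if any(abs(poles) >= 1)
    a = a_prev;                  % reject the update (or shrink the step size)
end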
Lattice structures may also be implemented using IIR or FIR functionality. Adaptive lattice filters are commonly used in speech recognition
software.
Appendices
Appendix A
Least-Mean Square Algorithm
The least-mean-square (LMS) algorithm originated with Bernard Widrow and
his collaborators. A more in-depth analysis of the performance surface and
alternative means of searching it is given by Widrow and Samuel Stearns in
their book, Adaptive Signal Processing [30]. This appendix is a brief summary
of how the algorithm is formulated.
Figure A.1: A diagram of an adaptive filter system.
The performance function of the non-recursive (FIR) adaptive filter is
known as the mean square error, MSE or ξ.
ξ ≜ E[e^2(n)]    (A.1)
For the adaptive system in figure A.1, the output of the FIR structure, y(n),
at time n is the convolution of the input x(n) and the current filter weights,
w(n):
y(n) = w(n) ∗ x(n) = Σ_{l=0}^{L} w_l(n) x(n − l)    (A.2)
This can also be realized in vector notation (see [13] for a sound review of
vector algebra) as:
y(n) = Xn^T Wn = Wn^T Xn    (A.3)
From figure A.1, the error signal is defined as:
e(n) = d(n) + y(n)
(A.4)
In some texts, the error signal may be defined as the subtraction of y(n) from
d(n), depending on the application. Expanding e(n) gives
e(n) = d(n) + Xn^T Wn = d(n) + Wn^T Xn    (A.5)
The error signal, e(n), is squared to give
e^2(n) = d^2(n) + Xn^T Wn Xn^T Wn + 2 d(n) Xn^T Wn    (A.6)
The expected value¹ of e^2(n) is
E[e^2(n)] = E[d^2(n) + Xn^T Wn Xn^T Wn + 2 d(n) Xn^T Wn]    (A.7)
          = E[d^2(n)] + E[Xn^T Wn Xn^T Wn] + 2 E[d(n) Xn^T Wn]    (A.8)
          = E[d^2(n)] + E[Wn^T Xn Xn^T Wn] + 2 E[d(n) Xn^T Wn]    (A.9)
Since the performance function is evaluated as a function of the weight vector
Wn, the weights are assumed constant for this derivation and the n subscript is
dropped. The constant weights can then be factored out of the expected value
operations.
E[e^2(n)] = E[d^2(n)] + W^T E[Xn Xn^T] W + 2 E[d(n) Xn^T] W    (A.10)
Defining A = E[Xn Xn^T] and P = E[d(n) Xn] simplifies the expression for the
mean-square error, ξ.
¹A good explanation of expected value is on page 471 of [27].
ξ = E[e^2(n)] = E[d^2(n)] + W^T A W + 2 P^T W    (A.11)
The gradient of the performance function, ∇(ξ), can be used to find
the minimum of ξ.
∇(ξ) = δξ/δW    (A.12)
     = 2AW − 2P    (A.13)
The minimum occurs when ∇(ξ) = 0. Setting the gradient of the mean-square
error to zero reveals the coefficients to be used to achieve minimum error,
labeled W0.
W0 = A^−1 P    (A.14)
While this representation of the minimum error weight vector looks simple, it
requires significant resources to compute the new weights.
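As a rough Matlab sketch of what equation A.14 implies (assuming the conventional error definition e(n) = d(n) − y(n) mentioned earlier, and column vectors x and d of input and desired samples defined elsewhere), A and P can be estimated from m sample points and the linear system solved directly; the matrix build and solve below is exactly the work that is too costly to repeat every iteration.

% Estimate A = E[X X^T] and P = E[d X] from sample averages,
% then solve for the optimal weight vector W0 of equation A.14.
L = 13;  m = 4000;               % tap count (from the design) and sample count (assumed)
A = zeros(L,L);  P = zeros(L,1);
for n = L:m
    Xn = x(n:-1:n-L+1);          % input vector at time n
    A  = A + Xn*Xn';
    P  = P + d(n)*Xn;
end
A = A/(m-L+1);  P = P/(m-L+1);   % sample averages
W0 = A\P;                        % direct solve in place of A^-1 * P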
From a control system perspective, it is desirable to move towards the
minimum in a controlled fashion (damped), rather than immediately moving to
the optimum weight vector, W0 . This can be achieved by moving “downhill,”
or in the opposite direction of the slope. This is known in the adaptive filter
realm as gradient search.
W(n + 1) = W(n) − µ · ∇(n)
(A.15)
This expression for the next weight vector, W(n + 1), can be used to minimize ξ, realizing the desired behavior. The value of ∇(n) is defined in equations A.12 and A.13. Computing
2AW − 2P at each iteration requires significant resources. Calculating ∇(n)
with m points used in the expected value requires Nmult multiply operations
and Nadd addition operations.
Nmult = (L^2 + L) + (L^2 + 1) + (2L) + (1)    (A.16)
      = 2L^2 + 3L + 2    (A.17)
Nadd = (L · m) + (L · m) + 1    (A.18)
     = 2L · m + 1    (A.19)
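For example, with the L = 13 taps used in this design, evaluating the gradient directly would cost Nmult = 2(13)^2 + 3(13) + 2 = 379 multiplications per weight update.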
Of course, Nmult and Nadd do not account for the operations required to
compute the filter output. Clearly, a simplification must be made in order to
make the system have reasonable resource requirements. The LMS algorithm
provides that simplification. Where in A.15 the gradient of E[e^2(n)] was used,
the LMS method makes use of the gradient of e^2(n) instead.
∇_LMS = δ(e^2(n))/δW    (A.20)
      = 2 e(n) · δe(n)/δW    (A.21)
      = −2 e(n) X(n)    (A.22)
With this modification to the gradient search method, the weight vector
for n + 1 is revisited.
W(n + 1) = W(n) − µ ∇_LMS    (A.23)
         = W(n) + 2µ e(n) · X(n)    (A.24)
This elegant solution to the resource problem requires no vector multiplies
and can be computed using Nmult = L + 1 and Nadd = L. Note that the
filter output itself will require another L multiplications and L additions. While
the LMS simplification is computationally cheap, the resulting path to the minimum
error will be longer and less accurate.
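A minimal Matlab sketch of the resulting LMS filter is given below. It assumes the conventional error definition e(n) = d(n) − y(n) mentioned earlier, column vectors x and d supplied by the caller, and an illustrative step size; it is a floating-point reference, not the fixed-point LNS implementation of chapter 5.

% LMS adaptive filter: y(n) = W^T X(n), W(n+1) = W(n) + 2*mu*e(n)*X(n)
L  = 13;               % number of taps (matches the report's filter)
mu = 0.01;             % step size (assumed)
W  = zeros(L,1);       % weight vector
e  = zeros(length(x),1);
for n = L:length(x)
    Xn   = x(n:-1:n-L+1);      % most recent L input samples
    y    = W'*Xn;              % filter output
    e(n) = d(n) - y;           % error
    W    = W + 2*mu*e(n)*Xn;   % weight update of equation A.24
end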
Appendix B
Matlab Routines
Some of the key Matlab models used to develop the noise cancellation
are included in the following sections.
B.1 Modelling Subroutines
The following short routine is used within the adaptive filter model to quantize
and saturate the intermediate calculations to a fixed LNS word length.
B.1.1 LNS Fix and Saturate
function [y] = fixlns(x,lod,rod,base)
% Quantize and saturate x to an LNS representation whose exponent has
% 'lod' bits left of the decimal and 'rod' bits right of the decimal,
% using the given logarithm base.  Returns the value reconstructed from
% the quantized exponent.
if nargin==0
    % Default test values used when the routine is run stand-alone.
    x = 2*rand()-1;
    x = 0.000001;    % override with a small fixed test input
    lod = 4;
    rod = 4;
    base = 1.2;
end
limit = 2^(lod)-1/2^(rod);   % largest representable exponent magnitude
signx = sign(x);
if x==0
    % Zero input: use the most negative exponent (sign(0) = 0, so the
    % reconstructed output is still zero).
    xl = -limit;
    xlf = -limit;
else
    xl = log10(abs(x))/log10(base);              % exponent of |x| in the LNS base
    xlf = round(xl * 2^(lod+rod)) / 2^(lod+rod); % quantize the exponent
    % Saturate the exponent to the representable range.
    if xlf>(limit)
        xlf=limit;
    end
    if xlf<(-limit)
        xlf=-limit;
    end
end
y = signx*base^(xlf);   % reconstruct the (quantized) linear value
end
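As a small usage illustration (a sketch, assuming the 4.4-bit exponent format and base 1.2 used above):

% Quantize a sample value onto the base-1.2 LNS grid
yq = fixlns(0.35, 4, 4, 1.2);   % returns approximately 0.35, rounded to the grid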
B.2 Implementation Aid
The following script generates the text file that seeds the ROM compiler.
Analysis of how much the lookup output changes for different inputs can be
done with it, and array sizes and parasitic elements of the cell array are also
calculated.
B.2.1 LNS Lookup Table Generator
%LNS ROM Table Generator
%Generates the Lookup table for the LNS Add/Sub Routine
clear;clc;
%Parameters
WordLength = 8;
SliceWords = 64;
Lambda = (0.180e-6)/2;
CellH = 1*12*Lambda;
CellV = 1*8*Lambda;
m2rsq = 0.08;
%M2 used for bitlines (R = (l/w)*Rsq)
m2cap = 0.1e-15:(0.1e-15):0.2e-15; %(F/um)
m2width = CellH*(0.5);
m2space = CellH*(0.5);
ParXtorCap = 0.37e-15;
%Size Calculations
Addresses = 2^WordLength;
ArrayBits = Addresses*WordLength
NumberOfSlices = (Addresses) ./ SliceWords;
SingleCell = CellH * CellV;
CellArea_micronsq = CellH * CellV * ArrayBits
CellArea_mmsq = CellArea_micronsq ./ 1e-6;
SliceSize = CellArea_mmsq ./ NumberOfSlices;
%Cell RC Calculations
%BitlineR = ; %bitline resistance
%Worst Case Column RC Calculation
Columnvr = (CellV*SliceWords/m2width)*m2rsq;
ColMetalvc = (CellV*SliceWords*1e6)*m2cap;
ColParXtorc = (SliceWords)*ParXtorCap;
Columnvc = ColMetalvc+ColParXtorc;
%----- TABLE -----%
%Log Base
b=1.2;
LOD=4; %Left of Decimal
ROD=4; %Right of Decimal
%LNS Base
b=1.2;
%Variables
lsgn = -1; % use -1 if guaranteed to always have a negative input
z=0:lsgn*1/2^ROD:lsgn*(2^(LOD+ROD)-1)/2^ROD;
GN=-(2^(LOD+ROD)-1)/2^ROD;
%Add-Log Intermediate Calculation for Lookup Table
sbp=log10(1+b.^z)/log10(b);
sbpdiff = diff(sbp);
sbprnd = round(sbp*(2^(ROD)))/(2^(ROD));
% round, floor, ceil, fix
if 0
plot(z,sbp,z,sbprnd);
str=sprintf('Length of z = %i',length(z)); disp(str);
end
%figure; plot(z(1:length(z)-1),sbpdiff(1:length(z)-1));
if 0
writeArray(:,1)=z;
writeArray(:,2)=sbprnd;
dlmwrite('arrdat.txt',writeArray,',');
end
Appendix C
ROM Compiler in PERL
A block diagram of the PERL ROM compiler is shown in figure C.1.
[Figure C.1 shows the input vector file (inpdat.txt), the Matlab array data (arrdat.txt), the custom row and column decoders (decckt.sp), the array specifications (rows, columns, slices, single-ended or differential sense amp), the line parasitics, and the simulation driver settings (simulation time, step, rise/fall times) feeding the ROM compiler (romcomp.pl), which produces the main SPICE control file (lnsops.sp), the peripheral circuits (periph.sp), the input PWL vectors, and the full ROM SPICE model (romdeck.sp).]
Figure C.1: Block diagram of custom ROM compiler written in PERL.
C.1 HSPICE Output File Samples
Here is an example of an input vector generated by the ROM compiler.
Each address bit is generated for a number of clock cycles based on fixed point
entries in a separate file.
Vadrin_2 adrin_2 0 PWL
+2.5e-10 0 1e-08 0
+1.025e-08 vdd 2e-08 vdd
+2.025e-08 0 3e-08 0
+3.025e-08 0 4e-08 0
+4.025e-08 0 5e-08 0
+5.025e-08 vdd 6e-08 vdd
+6.025e-08 vdd 7e-08 vdd
+7.025e-08 vdd 8e-08 vdd
+8.025e-08 0 9e-08 0
+9.025e-08 0 1e-07 0
This listing is a single cell from the array that the compiler generates.
The sample is from the final row of the output that is attached to the output
receiver.
$WORD[63]=(2.187500)=(00100011)
XI1522 BL_0_63_0 BL_0_64_0 CRCVERT
XI1523 BL_0_63_0 WL_0_63_0 BITCSE dat=1
XI1524 WL_0_63_0 WL_0_63_1 CRCHORI
XI1525 BL_0_64_0 SAOUT_0_0 INVERTER1
XI1526 BL_0_63_1 BL_0_64_1 CRCVERT
XI1527 BL_0_63_1 WL_0_63_1 BITCSE dat=1
XI1528 WL_0_63_1 WL_0_63_2 CRCHORI
XI1529 BL_0_64_1 SAOUT_0_1 INVERTER1
XI1530 BL_0_63_2 BL_0_64_2 CRCVERT
XI1531 BL_0_63_2 WL_0_63_2 BITCSE dat=0
XI1532 WL_0_63_2 WL_0_63_3 CRCHORI
XI1533 BL_0_64_2 SAOUT_0_2 INVERTER1
Index
Direct Form, 51
Pipelining, 55
Transposed, 51
Fixed-point, 21, 28
Abstract, vi
Acknowledgments, v
Acoustics, 16
Adder, 37
ANC
Filtered-X, 3
History, 1
Leaky LMS, 3
Secondary Path, 5
Appendices, 57
Appendix
Matlab Routines, 62
ROM Compiler in PERL, 67
Architecture
Filter, 50
Area, 8, 10
Gradient, 60
Headphones, Noise Cancelling, 7
HSPICE, 69
Implementation, 34
Input Builder, 19
Introduction, 1
Lueg, Paul, 1
LMS
Derivation, 58
LNS, 34
Adder, 35
Modelling, 19
Multiplication, 35
Logarithmic number system, see LNS
Lookup Operation, see ROM
Band-limited noise cancelling performance, 11
Bibliography, 74
Bow-tie Effect, 22
Column Circuits, 44
Convergence, 12, 33
Cost, 8
Matlab, 14
Media Players, 7
Model
Active Noise Control, 16
ADC, 17
Modelling, 14
Multiplexer, 37
Dedication, iv
Die Per Wafer, 9
Earlab, 17
Enhancements, 54
Error near zero (enz ), 22
Expected Value, 59
Performance, 11, 58
Pipelining, 55
Power, 10
Filter, 50, 53
Product Survey, 7
Results, 54
ROM, 39
ROM Cell, 42
Row Decoder, 44
Sample Rate, 54
Sense Amplifiers, 44
Settling Time, 12
Specifications, 8, 13
Test Bench, 18
Weight Update Block, 52
Bibliography
[1] A. Oppenheim and R. Schafer. Discrete-time Signal Processing. Prentice-Hall,
Upper Saddle River, N.J., 2nd edition, 1983.
[2] Mark Arnold. Design of a faithful LNS interpolator. In DSD, pages
336–345, 2001.
[3] Mark Arnold. A pipelined LNS ALU. In Proc. of the IEEE Computer
Society Workshop on VLSI, 2001.
[4] Mark Arnold. Geometric-mean interpolation for logarithmic number systems. In Proc. of the International Symposium on Circuits and Systems,
volume 2, pages 433–436, May 2004.
[5] Bose. Home page. http://www.bose.com.
[6] L Braida, J Rosowski, C Shera, and K Stevens. Mit open courseware,
6.551j: Acoustics of speech and hearing, 2004. http://ocw.mit.edu/OcwWeb/ElectricalEngineering-and-Computer-Science/6-551JFall-2004/LectureNotes/index.htm.
[7] David Carey. Noise cancelling headphones: analog margin-makers. Technical report, http://www.planetanalog.com/showArticle.jhtml?articleID=159402571,
2005.
[8] A. P. Chandrakasan, S. Cheng, and R. W. Broderson. Low-power cmos
digital design. 27(4):473–484, April 1992.
[9] EarLab. A virtual laboratory for auditory research. http://earlab.bu.edu.
[10] David Carnoy et al. The sound of silence. Technical report, http://reviews.cnet.com/452030007-1017728-1.html, 2005.
[11] David Harris and Neil Weste. CMOS VLSI Design : a circuits and
systems perspective. Addison-Wesley, 2005.
[12] Paul Howells. Technical report, 1959. US Patent 3,202,990.
[13] Erwin Kreyszig. Advanced Engineering Mathematics. John Wiley &
Sons, Inc., 7th edition, 1993.
[14] Abhijit Kulkarni and Steven Colburn. Infinite-impulse-response models
of the head-related transfer function. Technical report, Acoustical Society of America, 2004. Hearing Research Center and Department of
Biomedical Engineering, Boston University.
[15] Sen Kuo. Design of active noise control systems with the tms320 family.
Application note, Texas Instruments, http://focus.ti.com/lit/an/spra042/spra042.pdf,
June 1996.
[16] Sen Kuo and Dennis Morgan. Active Noise Control Systems: Algorithms
and DSP Implementations. Wiley Interscience, 1996.
[17] Richard Ladner and Michael Fischer. Parallel prefix computation. Journal of the Association for Computing Machinery, pages 831–838, October
1980.
[18] Paul Lueg. Technical report, 1936. US Patent 2,043,416.
[19] MathWorks. Home page. http://www.mathworks.com.
[20] Harry Olson and Everett May. Electronic sound absorber. Acoustical
Society of America, 25:1130, November 1953.
[21] Panasonic. Home page. http://www.panasonic.com.
[22] Keshab K. Parhi. VLSI Digital Signal Processing Systems: Design and
Implementation. Wiley Interscience, 1999.
[23] Jan Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits. Prentice Hall, Upper Saddle River, New Jersey, 2003.
[24] Sennheiser. Home page. http://www.sennheiser.com.
[25] Naresh Shanbhag and Keshab Parhi. A pipelined LMS adaptive filter
architecture. November 1991.
[26] T. Stouraitis and V. Paliouras. Considering the alternatives in low-power
design. pages 22–29, July 2001.
[27] Ferrel Stremler. Introduction to Communication Systems. Addison-Wesley,
3rd edition, 1990.
[28] E. E. Swartzlander, Jr. and A. G. Alexopoulos. The sign/logarithm
number system. C-24(12):1238–1242, December 1975.
[29] Fred Taylor, Rabinder Gill, Jim Joseph, and Jeff Radke. A 20 bit logarithmic number system processor. 37(2):190–200, February 1988.
[30] Bernard Widrow and Samuel Stearns. Adaptive Signal Processing. Prentice Hall, 1985.
[31] Reto Zimmerman and Wolfgang Fichtner. Low-power logic styles: Cmos
versus pass-transistor logic. 32(7), 1997.
Vita
Jay Brady Fletcher, from Wolfforth, Texas, has had a long and
illustrious career as an electrical engineer. Down the winding road of academia,
he found himself with a BS in engineering physics, a combined physics and
electrical engineering degree, from Texas Tech University. He now works at
Advanced Micro Devices on high performance microprocessor development.
Permanent address: 2600 Lake Austin Blvd #11208
Austin, Texas 78703
This report was typeset with LaTeX† by the author.
†LaTeX is a document preparation system developed by Leslie Lamport as a special
version of Donald Knuth's TeX Program.