International Journal of Digital Communication and Networks (IJDCN)
Volume 2, Issue 3, March 2015
Implementation of Fixed-Point LMS Adaptive
Filter Using Computation Blocks
P.Jayachithra, R.Kanagarathinam, R.Ramakala, S.Sri Kanchana Devi

Abstract— This paper presents an efficient architecture for the
implementation of a delayed least mean square (DLMS) adaptive
filter. A novel partial product generator is used to achieve a
lower adaptation delay and a lower area-delay product, and a
strategy is proposed for optimized balanced pipelining across
the time-consuming combinational blocks of the structure. From
synthesis results, the proposed design offers a lower area-delay
product (ADP) than the best of the existing systolic structures,
on average, for filter lengths N = 8, 16, and 32. An efficient
fixed-point implementation scheme of the proposed architecture
is also presented, and the analytical result is shown to match
the simulation result.
Index Terms—Adaptive filters, circuit optimization, fixed-point
arithmetic, least mean square (LMS) algorithms.
I. INTRODUCTION
The least mean square (LMS) adaptive filter is the most
popular and widely used adaptive filter. It involves a long
critical path due to the inner-product computation needed to
obtain the filter output, so the critical path must be
reduced by pipelined implementation when it exceeds the
desired sample period. The conventional LMS algorithm,
however, does not support pipelined implementation because of
its recursive behavior. It is therefore modified to a form
called the delayed LMS (DLMS) algorithm, which allows
pipelined implementation of the filter. A lot of work has been
done to implement the DLMS algorithm in systolic architectures
to increase the usable frequency, but these involve an
adaptation delay that depends on the filter length N and is
quite high for large-order filters. We propose a 2-bit
multiplication cell, used with an efficient adder tree for
pipelined inner-product computation, to minimize the critical
path and silicon area without increasing the number of
adaptation delays. The existing work on the LMS adaptive
filter does not discuss fixed-point implementation issues,
such as the location of the radix point, the choice of word
length, and quantization at various stages of computation.
Therefore, the proposed design includes a fixed-point
implementation that reduces the number of pipeline delays
along with the area and delay.
II. Structure of AOCs:
The LMS algorithm calculates the filter output and finds the
difference between the computed output and the desired
response. Using this difference, the filter weights are
updated in each iteration. During the nth iteration, the LMS
algorithm updates the weights as follows:
$$W_{n+1} = W_n + \mu \cdot e(n) \cdot \mathbf{x}(n) \tag{1a}$$
where
$$e(n) = d(n) - y(n), \qquad y(n) = W_n^T \cdot \mathbf{x}(n) \tag{1b}$$
Here, $\mathbf{x}(n)$ is the input vector and $W_n$ is the
weight vector of an Nth-order LMS adaptive filter at the nth
iteration, given by
$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T$$
$$W_n = [w_n(0), w_n(1), \ldots, w_n(N-1)]^T$$
d(n) is the desired response, y(n) is the filter output of the
nth iteration, e(n) is the error computed in the nth iteration,
which is used to update the weights, and μ is the
convergence factor.
Instead of using the most recent feedback error e(n),
corresponding to the nth iteration, to update the filter
weights, the DLMS algorithm uses the delayed error e(n − m),
i.e., the error corresponding to the (n − m)th iteration, to
update the current weights. The weight-update formula of the
DLMS algorithm is given by
$$W_{n+1} = W_n + \mu \cdot e(n-m) \cdot \mathbf{x}(n-m) \tag{2}$$
where m is the adaptation delay.
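As a minimal floating-point sketch (a behavioural model, not the hardware structure proposed in this paper), the weight updates (1a) and (2) can be written in Python for a system-identification setup. The function name `dlms_identify`, the uniform random input, the default step size, and the noise-free desired response are assumptions made for illustration. Setting m = 0 gives the LMS update (1a); m > 0 gives the DLMS update (2).

```python
from collections import deque
import random


def dlms_identify(h, m=0, mu=0.05, n_samples=5000, seed=1):
    """Estimate the FIR system h with an LMS (m = 0) or DLMS (m > 0) filter."""
    rng = random.Random(seed)
    N = len(h)
    w = [0.0] * N                    # weight vector W_n
    x_vec = [0.0] * N                # x(n) = [x(n), ..., x(n-N+1)]
    history = deque(maxlen=m + 1)    # (e(k), x(k)) for k = n-m, ..., n

    for _ in range(n_samples):
        # Shift a new input sample into the tapped delay line.
        x_vec = [rng.uniform(-1.0, 1.0)] + x_vec[:-1]
        d = sum(hi * xi for hi, xi in zip(h, x_vec))   # desired response d(n)
        y = sum(wi * xi for wi, xi in zip(w, x_vec))   # y(n) = W_n^T x(n)
        e = d - y                                      # e(n) = d(n) - y(n)
        history.append((e, x_vec))

        # Update only once an error that is m iterations old is available.
        if len(history) == m + 1:
            e_old, x_old = history[0]                  # e(n-m), x(n-m)
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w


if __name__ == "__main__":
    h = [0.5, -0.3, 0.2, 0.1]        # hypothetical unknown system
    print(dlms_identify(h, m=0))
    print(dlms_identify(h, m=5))
```

With a small step size both variants settle on the same weights; the delay m mainly affects how large μ may be before the adaptation becomes unstable.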
Manuscript received March, 2015
P.Jayachithra, PGStudent, Kalasalingam Institute of Technology. Email
id: [email protected]
R.Kanagarathinam PGStudent, Kalasalingam Institute of Technology.
R.Ramakala, PGStudent, Kalasalingam Institute of Technology.
S.Sri Kanchana Devi, PGStudent, Kalasalingam Institute of Technology.
Fig. 1. Structure of AND/OR Cell
All Rights Reserved © 2014 IJDCN
The structure of the conventional delayed LMS adaptive filter is
shown in the figure. It can be seen that the adaptation delay m is
the number of cycles required for the error corresponding to
any given sampling instant to become available to the weight
adaptation circuit.
III. ERROR COMPUTATION BLOCK
The proposed structure for the error-computation unit of
an N-tap DLMS adaptive filter is shown in Fig. 2. It consists
of N 2-b partial product generators (PPGs)
corresponding to N multipliers and a cluster of L/2 binary
adder trees, followed by a single shift-add tree. Each
sub-block is described in detail. The structure of each PPG is
shown in the figure. It consists of L/2 2-to-3 decoders and
the same number of AND/OR cells (AOCs). Each of the
2-to-3 decoders takes a 2-b digit (u1u0) as input and produces
three outputs, $b_0 = u_0 \cdot \bar{u}_1$,
$b_1 = \bar{u}_0 \cdot u_1$, and $b_2 = u_0 \cdot u_1$,
such that b0 = 1 for (u1u0) = 1, b1 = 1 for (u1u0) = 2, and b2 =
1 for (u1u0) = 3. The decoder outputs b0, b1, and b2, along with
w, 2w, and 3w, are fed to an AOC, where w, 2w, and 3w are in
2's complement representation and sign-extended to have (W
+ 2) bits each. To take care of the sign of the input samples
while computing the partial product corresponding to the
most significant digit (MSD), i.e., (uL−1uL−2) of the input
sample, the AOC (L/2 − 1) is fed with w, −2w, and −w as input,
since (uL−1uL−2) can have four possible values: 0, 1, −2, and
−1.
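The digit decoding described above can be checked with a short behavioural sketch in Python. It models a single multiplier only: the AND/OR cell is represented arithmetically as a one-hot selection rather than at the gate level, and the adder trees across taps are not modelled. The function names `ppg_partial_products` and `shift_add` are hypothetical.

```python
def ppg_partial_products(u, w, L):
    """Partial products of w with the 2-bit digits of the L-bit signed input u."""
    assert L % 2 == 0
    assert -(1 << (L - 1)) <= u < (1 << (L - 1))
    u_bits = u & ((1 << L) - 1)          # two's-complement bit pattern of u

    products = []
    for k in range(L // 2):
        u0 = (u_bits >> (2 * k)) & 1
        u1 = (u_bits >> (2 * k + 1)) & 1

        # 2-to-3 decoder: at most one of b0, b1, b2 is 1.
        b0 = u0 & (1 - u1)               # (u1u0) = 1
        b1 = (1 - u0) & u1               # (u1u0) = 2
        b2 = u0 & u1                     # (u1u0) = 3

        if k == L // 2 - 1:
            # Most significant digit takes the values 0, 1, -2, -1.
            multiples = (w, -2 * w, -w)
        else:
            multiples = (w, 2 * w, 3 * w)

        # AND/OR cell: pick the multiple selected by the decoder (0 if none).
        products.append(b0 * multiples[0] + b1 * multiples[1] + b2 * multiples[2])
    return products


def shift_add(products):
    """Weight the partial product of digit k by 4**k and accumulate."""
    return sum(p << (2 * k) for k, p in enumerate(products))


if __name__ == "__main__":
    u, w = -77, 45
    print(shift_add(ppg_partial_products(u, w, 8)) == u * w)
```

Because only the most significant digit carries the sign, the lower digits need just the multiples w, 2w, and 3w, which is why 3w can be computed once and shared.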
Fig. 2. Pipelined Structure of the Error-Computation Block
IV. WEIGHT-UPDATE BLOCK
The proposed structure for the weight-update block
is shown in Fig. 3. It performs N multiply-accumulate (MAC)
operations of the form (μ × e) × xi + wi to update N filter
weights. The step size μ is taken as a negative power of 2 so
that the multiplication with the most recently available error
is realized by a shift operation alone. Each of the MAC units
therefore multiplies the shifted value of the error by the
delayed input sample xi and then adds the corresponding old
weight value wi. All N multiplications
for the MAC operations are performed by N PPGs, followed
by N shift-add trees. Each of the PPGs generates L/2 partial
products corresponding to the product of the shifted
error value μ × e with the L/2 2-bit digits of the
input word xi, where the subexpression 3μ × e is shared within
the multiplier.
The final outputs of the MAC units constitute the desired updated
weights, to be used as inputs to the error-computation block as
well as the weight-update block for the next iteration.
Fig. 3. Proposed structure of the weight-update block.
V. FIXED-POINT SIMULATION ANALYSIS
In this section, we discuss the fixed-point implementation and
optimization of the proposed DLMS adaptive filter. A bit-level
pruning of the adder tree is also proposed to reduce the
area.
Fig. 4. Fixed-Point Representation of a Binary Number
A. Fixed-Point Design Considerations
For fixed-point implementation, the word
lengths and radix points for input samples, weights, and
internal signals need to be decided. Fig. 4 shows the
fixed-point representation of a binary number. Let (X, Xi) be a
fixed-point representation of a binary number, where X is the
word length and Xi is the integer length. The word length and
location of the radix point of xn and wn in Fig. 2 need to be
predetermined by the hardware designer, taking design
constraints such as desired accuracy and hardware
complexity into consideration. Assuming (L, Li) and (W, Wi),
respectively, as the representations of input signals and filter
weights, the formats of all other signals can be derived. The
signal pij, which is the output of the PPG block, has at most three
times the value of the input coefficients. Thus, we can add two
more bits to the word length and to the integer length of the
coefficients to avoid overflow.
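The (X, Xi) notation and the two-bit growth at the PPG output can be illustrated with a small Python sketch. It assumes that the integer length Xi counts the sign bit (as SystemC's sc_fixed does), that values are rounded to nearest, and that out-of-range values saturate. These choices, and the helper names `raw_limits`, `quantize`, `to_real`, and `fits`, are assumptions made for illustration.

```python
def raw_limits(X):
    """Smallest and largest raw integers of an X-bit two's-complement word."""
    return -(1 << (X - 1)), (1 << (X - 1)) - 1


def quantize(value, X, Xi):
    """Convert a real value to the raw integer of format (X, Xi), saturating."""
    frac_bits = X - Xi
    lo, hi = raw_limits(X)
    raw = round(value * (1 << frac_bits))
    return max(lo, min(hi, raw))


def to_real(raw, X, Xi):
    """Real value represented by a raw integer in format (X, Xi)."""
    return raw / (1 << (X - Xi))


def fits(raw, X):
    """True if the raw integer is representable in X bits."""
    lo, hi = raw_limits(X)
    return lo <= raw <= hi


if __name__ == "__main__":
    W, Wi = 16, 0                          # example weight format
    w_raw = quantize(0.25, W, Wi)
    print(w_raw, to_real(w_raw, W, Wi))

    # A PPG output can reach 3w, so it needs the format (W + 2, Wi + 2).
    lo, hi = raw_limits(W)
    print(fits(3 * lo, W + 1), fits(3 * lo, W + 2))
```

One extra bit is not enough for 3w at the negative extreme, which is consistent with adding two bits to both the word length and the integer length.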
B. Computer Simulation of the Proposed DLMS Filter
The proposed fixed-point DLMS adaptive filter is
used for system identification. μ is set to 0.5, 0.25, and
0.125 for filter lengths 8, 16, and 32, respectively, such that
the multiplication with μ does not require any additional
circuits. For the fixed-point simulation, the word length and
radix point of the input and coefficients are set to L = 16, Li =
2, W = 16, and Wi = 0. The fixed-point data types of all the
other signals are derived from these settings. Each learning
curve is averaged over 50 runs to obtain a clean curve. The proposed
design was coded in C++ using the SystemC fixed-point library for
different orders of the band-pass filter, that is, N = 8, 16, and 32.
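Because each step size is a negative power of 2, the product μ × e reduces to an arithmetic right shift. The sketch below shows one weight-update MAC of the form (μ × e) × xi + wi on raw integers. For simplicity it assumes that e, xi, and wi share the same number of fractional bits and that results are truncated, which is not necessarily how the formats are assigned in the actual design. The names `MU_SHIFT` and `mac_update` are hypothetical.

```python
# Shift amounts for mu = 0.5, 0.25, 0.125 at filter lengths 8, 16, 32.
MU_SHIFT = {8: 1, 16: 2, 32: 3}


def mac_update(w_raw, e_raw, x_raw, shift, frac_bits):
    """Return w + (mu * e) * x on raw integers, with mu = 2**-shift."""
    mu_e = e_raw >> shift                    # mu * e as an arithmetic shift
    product = (mu_e * x_raw) >> frac_bits    # rescale to frac_bits again
    return w_raw + product


if __name__ == "__main__":
    F = 8                                    # assumed fractional bits
    e_raw = 128                              # e = 0.5
    x_raw = 128                              # x = 0.5
    w_new = mac_update(0, e_raw, x_raw, MU_SHIFT[8], F)
    print(w_new / (1 << F))                  # 0.5 * 0.5 * 0.5 = 0.125
```

Python's `>>` on negative integers is an arithmetic shift that rounds toward minus infinity, so the sketch truncates in the same direction for both signs.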
VI. PERFORMANCE RESULTS
If we consider each multiplier to have (L − 1) adders,
then the existing designs involve 16N adders, while the
proposed one involves 10N + 2 adders for L = 8. This section
evaluates the performance of the proposed modified least
mean square (LMS) algorithm and shows the simulation
results. The first result shows the output of the LMS
adaptive filter with delay: there is some delay in the output
of the delayed LMS adaptive filter. The second result
shows the output of the LMS adaptive filter without
delay: after the clock input is applied, the output of the
adaptive filter is obtained without delay. ModelSim is the
tool used here to check the performance of the LMS adaptive
filter. It is a complete HDL simulation environment that
enables verification of the source code and of functional and
timing models using test benches.
Fig.7. Output of Weight Update Block
VII. CONCLUSION
We proposed an area-delay-power efficient,
low-adaptation-delay structure for fixed-point implementation of
the LMS adaptive filter. We used a new PPG for efficient
implementation of general multiplications and inner-product
computation by common subexpression sharing. In addition, we
proposed an efficient addition scheme for inner-product
computation that reduces the adaptation delay significantly,
in order to achieve faster convergence and to
reduce the critical path to support high input-sampling rates.
We also proposed a strategy for optimized
balanced pipelining across the time-consuming blocks of the
structure to reduce the adaptation delay and power
consumption.
Fig. 5. Output of AND/OR Cell
Fig. 6. Output of Error Computation Block