International Journal of Digital Communication and Networks (IJDCN), Volume 2, Issue 3, March 2015

Implementation of Fixed-Point LMS Adaptive Filter Using Computation Blocks

P. Jayachithra, R. Kanagarathinam, R. Ramakala, S. Sri Kanchana Devi

Manuscript received March 2015. P. Jayachithra, PG Student, Kalasalingam Institute of Technology (email: [email protected]). R. Kanagarathinam, PG Student, Kalasalingam Institute of Technology. R. Ramakala, PG Student, Kalasalingam Institute of Technology.

Abstract: This paper presents an efficient architecture for the implementation of a delayed least mean square (DLMS) adaptive filter. A novel partial product generator is used to achieve lower adaptation delay and reduced area and delay, and a strategy is proposed for optimized balanced pipelining across the time-consuming combinational blocks of the structure. From synthesis results, the proposed design offers a lower area-delay product (ADP) than the best of the existing systolic structures, on average, for filter lengths N = 8, 16, and 32. An efficient fixed-point implementation scheme of the proposed architecture is also presented, and the analytical result is shown to match the simulation result.

Index Terms: Adaptive filters, circuit optimization, fixed-point arithmetic, least mean square (LMS) algorithms.

I. INTRODUCTION

The least mean square (LMS) adaptive filter is the most popular and widely used adaptive filter. It involves a long critical path due to the inner-product computation needed to obtain the filter output, so the critical path must be reduced by pipelined implementation when it exceeds the desired sample period. However, the conventional LMS algorithm does not support pipelined implementation because of its recursive behavior, so it is modified to a form called the delayed LMS (DLMS) algorithm, which allows pipelined implementation of the filter.

A lot of work has been done to implement the DLMS algorithm in systolic architectures to increase the operating frequency, but these architectures involve an adaptation delay, for filter length N, that is quite high for large-order filters. We propose a 2-bit multiplication cell together with an efficient adder tree for pipelined inner-product computation, to minimize the critical path and silicon area without increasing the number of adaptation delays. The existing work on the LMS adaptive filter does not discuss fixed-point implementation issues, such as the location of the radix point, the choice of word length, and quantization at various stages of computation. Therefore, the fixed-point implementation in the proposed design reduces the number of pipeline delays along with the area and delay.

II. Structure of AOCs

The LMS algorithm calculates the filter output and finds the difference between the computed output and the desired response. Using this difference, the filter weights are updated in each iteration. During the nth iteration, the LMS algorithm updates the weights as follows:

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu \cdot e(n) \cdot \mathbf{x}(n) \qquad (1a)$$

where

$$e(n) = d(n) - y(n), \qquad y(n) = \mathbf{w}_n^T \cdot \mathbf{x}(n) \qquad (1b)$$

Here, $\mathbf{x}(n)$ is the input vector and $\mathbf{w}_n$ is the weight vector of an Nth-order LMS adaptive filter at the nth iteration, given by

$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T$$

$$\mathbf{w}_n = [w_n(0), w_n(1), \ldots, w_n(N-1)]^T$$

$d(n)$ is the desired response, $y(n)$ is the filter output of the nth iteration, $e(n)$ is the error computed in the nth iteration, which is used to update the weights, and $\mu$ is the convergence factor.

Instead of using the most recent feedback error $e(n)$, corresponding to the nth iteration, for updating the filter weights, the DLMS algorithm uses the delayed error $e(n-m)$, i.e., the error corresponding to the (n−m)th iteration, for updating the current weight. The weight-update formula of the DLMS algorithm is given by

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu \cdot e(n-m) \cdot \mathbf{x}(n-m) \qquad (2)$$

where m is the adaptation delay.
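The weight updates in (1a) and (2) can be illustrated with a small floating-point sketch. This is only an illustration under stated assumptions, not the authors' hardware design: the function name `dlms`, the test system `h`, the step size 0.01, and the noise-free system-identification setup are all hypothetical choices made for the example. Setting m = 0 reduces the delayed update (2) to the conventional update (1a).

```python
import random
from collections import deque


def dlms(x, d, num_taps, mu, m=0):
    """Adapt an FIR filter with the delayed LMS rule; m = 0 is plain LMS."""
    w = [0.0] * num_taps
    # Keep the last m + 1 (error, input vector) pairs so that
    # e(n-m) and x(n-m) are available at iteration n.
    history = deque(maxlen=m + 1)
    errors = []
    for n in range(len(x)):
        # x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T, zeros before the first sample.
        xn = [x[n - k] if n >= k else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, xn))  # y(n) = w_n^T x(n)
        e = d[n] - y                               # e(n) = d(n) - y(n)
        errors.append(e)
        history.append((e, xn))
        if len(history) == m + 1:
            e_del, x_del = history[0]              # e(n-m), x(n-m)
            # w_{n+1} = w_n + mu * e(n-m) * x(n-m)
            w = [wk + mu * e_del * xk for wk, xk in zip(w, x_del)]
    return w, errors


# Hypothetical usage: identify a 4-tap system from noise-free data.
random.seed(0)
h = [0.5, -0.3, 0.2, 0.1]  # assumed unknown system, chosen for illustration
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
     for n in range(len(x))]

w_lms, err_lms = dlms(x, d, num_taps=4, mu=0.01, m=0)     # update (1a)
w_dlms, err_dlms = dlms(x, d, num_taps=4, mu=0.01, m=3)   # update (2), m = 3
```

With a small step size, both runs should settle near the coefficients of `h`; the delayed version simply applies each correction m iterations later.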
S. Sri Kanchana Devi, PG Student, Kalasalingam Institute of Technology. All Rights Reserved © 2014 IJDCN.

Fig. 1. Structure of the AND/OR cell.

The structure of the conventional delayed LMS adaptive filter is shown in the accompanying figure. It can be seen that the adaptation delay m is the number of cycles required for the error corresponding to any given sampling instant to become available to the weight-adaptation circuit.

III. ERROR COMPUTATION BLOCK

The proposed structure for the error-computation unit of an N-tap DLMS adaptive filter is shown in Fig. 2. It consists of N 2-bit partial product generators (PPGs), corresponding to N multipliers, and a cluster of L/2 binary adder trees, followed by a single shift-add tree. Each sub-block is described in detail below.

The structure of each PPG is shown in the corresponding figure. It consists of L/2 2-to-3 decoders and the same number of AND/OR cells (AOCs). Each of the 2-to-3 decoders takes a 2-bit digit $(u_1 u_0)$ as input and produces three outputs, $b_0 = u_0 \cdot \overline{u_1}$, $b_1 = \overline{u_0} \cdot u_1$, and $b_2 = u_0 \cdot u_1$, such that $b_0 = 1$ for $(u_1 u_0) = 1$, $b_1 = 1$ for $(u_1 u_0) = 2$, and $b_2 = 1$ for $(u_1 u_0) = 3$. The decoder outputs $b_0$, $b_1$, and $b_2$, along with w, 2w, and 3w, are fed to an AOC, where w, 2w, and 3w are in 2's complement representation and sign-extended to have (W + 2) bits each. To take care of the sign of the input samples while computing the partial product corresponding to the most significant digit (MSD), i.e., $(u_{L-1} u_{L-2})$ of the input sample, the AOC (L/2 − 1) is fed with w, −2w, and −w as inputs, since $(u_{L-1} u_{L-2})$ can have the four possible values 0, 1, −2, and −1.

Fig. 2. Pipelined structure of the error-computation block.

IV. WEIGHT-UPDATE BLOCK

The proposed structure for the weight-update block is shown in Fig. 3. It performs N multiply-accumulate (MAC) operations of the form $(\mu \times e) \times x_i + w_i$ to update the N filter weights. The step size μ is taken as a negative power of 2, so that the multiplication with the recently available error is realized by a shift operation only. Each of the MAC units therefore multiplies the shifted value of the error with the delayed input sample $x_i$ and then adds the corresponding old weight value $w_i$. All N multiplications for the MAC operations are performed by N PPGs, followed by N shift-add trees. Each PPG generates L/2 partial products corresponding to the product of the recently shifted error value $\mu \times e$ with the L/2 2-bit digits of the input word $x_i$, where the subexpression $3\mu \times e$ is shared within the multiplier. The final outputs of the MAC units constitute the updated weights, which are used as inputs to the error-computation block as well as the weight-update block for the next iteration.

Fig. 3. Proposed structure of the weight-update block.

V. FIXED-POINT SIMULATION ANALYSIS

In this section, we discuss the fixed-point implementation and optimization of the proposed DLMS adaptive filter. A bit-level pruning of the adder tree is also proposed to reduce the area.

Fig. 4. Fixed-point representation of a binary number.

A. Fixed-Point Design Considerations

For fixed-point implementation, the word lengths and radix points for input samples, weights, and internal signals need to be decided. Fig. 4 shows the fixed-point representation of a binary number. Let (X, Xi) be a fixed-point representation of a binary number, where X is the word length and Xi is the integer length. The word length and location of the radix point of $x_n$ and $w_n$ in Fig. 2 need to be predetermined by the hardware designer, taking design constraints such as desired accuracy and hardware complexity into consideration.
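As a rough illustration of the (X, Xi) notation, the sketch below quantizes a real value to a signed fixed-point format. It rests on assumptions that the text does not state: values are two's complement, the integer length Xi counts the sign bit, rounding is by truncation toward negative infinity, and overflow saturates. The function name `quantize` is hypothetical.

```python
import math


def quantize(value, word_len, int_len):
    """Quantize value to a signed (word_len, int_len) fixed-point number.

    Assumptions (not from the paper): int_len includes the sign bit,
    truncation toward -infinity, saturation on overflow.
    """
    frac_len = word_len - int_len          # bits to the right of the radix point
    scale = 1 << frac_len                  # weight of one integer code step is 1/scale
    lo = -(1 << (word_len - 1))            # most negative two's-complement code
    hi = (1 << (word_len - 1)) - 1         # most positive two's-complement code
    code = math.floor(value * scale)       # truncate to the nearest lower code
    code = max(lo, min(hi, code))          # saturate instead of wrapping around
    return code / scale


# Example formats: (16, 2) covers [-2, 2), (16, 0) covers [-0.5, 0.5)
# under the sign-bit assumption above.
x_q = quantize(1.25, 16, 2)
w_q = quantize(0.3, 16, 0)
w_sat = quantize(0.75, 16, 0)              # outside the range, so it saturates
```

Under these assumptions, a larger integer length trades resolution for range at a fixed word length, which is the accuracy versus complexity trade-off the designer has to settle.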
Assuming $(L, L_i)$ and $(W, W_i)$, respectively, as the representations of the input signals and filter weights, all other signals can be decided accordingly. The signal $p_{ij}$, which is the output of the PPG block, has at most three times the value of the input coefficients. Thus, we can add two more bits to the word length and to the integer length of the coefficients to avoid overflow.

B. Computer Simulation of the Proposed DLMS Filter

The proposed fixed-point DLMS adaptive filter is used for system identification. μ is set to 0.5, 0.25, and 0.125 for filter lengths 8, 16, and 32, respectively, such that the multiplication with μ does not require any additional circuits. For the fixed-point simulation, the word length and radix point of the input and coefficients are set to $L = 16$, $L_i = 2$, $W = 16$, $W_i = 0$. The fixed-point data types of all other signals follow from these choices. Each learning curve is averaged over 50 runs to obtain a clean curve. The proposed design was coded in C++ using the SystemC fixed-point library for different orders of the band-pass filter, that is, N = 8.

VI. PERFORMANCE RESULTS

If we consider each multiplier to have (L − 1) adders, then the existing designs involve 16N adders, while the proposed one involves 10N + 2 adders for L = 8. This section evaluates the performance of the proposed modified least mean square (LMS) algorithm and shows the simulation results. The first result shows the output of the LMS adaptive filter with delay: the output of the delayed LMS adaptive filter appears after some delay. The second result shows the output of the LMS adaptive filter without delay: once the clock input is applied, the output of the adaptive filter is obtained without delay. ModelSim is the tool used here to check the performance of the LMS adaptive filter.
ModelSim is a complete HDL simulation environment that makes it possible to verify the source code and the functional and timing models using test benches.

Fig. 5. Output of the AND/OR cell.

Fig. 6. Output of the error-computation block.

Fig. 7. Output of the weight-update block.

VII. CONCLUSION

We proposed an area-delay-power efficient, low-adaptation-delay structure for fixed-point implementation of the LMS adaptive filter. We used a new PPG for efficient implementation of general multiplications and inner-product computation by common subexpression sharing. In addition, we proposed an efficient addition scheme for inner-product computation that reduces the adaptation delay significantly, in order to achieve faster convergence and to shorten the critical path so that high input-sampling rates can be supported. We also proposed a strategy for optimized balanced pipelining across the time-consuming blocks of the structure to reduce the adaptation delay and the power consumption.