Adaptive filtering algorithms for acoustic echo and noise cancellation
Geert Rombouts
25 April 2003

KATHOLIEKE UNIVERSITEIT LEUVEN
FACULTEIT TOEGEPASTE WETENSCHAPPEN
DEPARTEMENT ELEKTROTECHNIEK
Kasteelpark Arenberg 10, 3001 Leuven (Heverlee)

Adaptive filtering algorithms for acoustic echo and noise cancellation

Thesis submitted to obtain the degree of Doctor in Applied Sciences by Geert Rombouts.

Jury:
Prof. dr. ir. E. Aernoudt, chairman
Prof. dr. ir. M. Moonen, promotor
Prof. dr. ir. D. Van Compernolle
Prof. dr. ir. B. De Moor
Prof. dr. ir. S. Van Huffel
Prof. dr. ir. P. Sommen (TU Eindhoven)
Prof. dr. ir. I. K. Proudler (King's College, UK)

UDC 681.3*I12:534
April 2003

Copyright Katholieke Universiteit Leuven - Faculteit Toegepaste Wetenschappen, Arenbergkasteel, B-3001 Heverlee. All rights reserved. No part of this publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

D/2003/7515/13
ISBN 90-5682-402-3

For my grandmother, Maria Jonckers.

Abstract

In this thesis, we develop a number of algorithms for acoustic echo and noise cancellation. We derive a fast exact implementation of the affine projection algorithm (APA), and we show that the existing (approximating) fast techniques exhibit problems when strong regularization is used. We develop a number of algorithms for noise cancellation based on optimal filtering techniques for multi–microphone systems. By using QR–decomposition based techniques, a complexity reduction of a factor 50 to 100 is achieved compared to existing implementations. Finally, we show that instead of using a cascade of a noise–cancellation system and an echo–cancellation system, it is better to solve the combined problem as a global optimization problem. The aforementioned noise reduction techniques can be used to solve this optimization problem.

List of symbols

B(k) : Right hand side in the QRD–RLS based noise reduction equation
d(k) : Desired signal of an adaptive filter at time k
d1(k), d2(k), d3(k) : Desired signals for multiple right hand sides
d(k) : Vector with recent desired signal samples
δ : Regularization parameter (diagonal loading)
e(k) : Error signal of an adaptive filter
ε{·} : Expected value operator
f(k) : Loudspeaker reference signal
G : Givens rotation
gi : Far end room paths
hi : Near end room paths
λ : Forgetting factor (weighting factor)
λn : Forgetting factor during noise–only periods
λs : Forgetting factor during speech+noise periods
M : Number of channels
µ : Stepsize
n(k) : Noise signal
N : Number of filter taps per channel
Naec : Number of taps in the AEC part in AENC
Q(k) : Orthogonal matrix Q in a QR–decomposition
R(k) : Upper triangular matrix R in a QR–decomposition
Σ : Diagonal matrix in an SVD–decomposition
σi : Singular value
u(k) : Input vector with microphone signals and echo reference
v(k) : Acoustical disturbance signal
v(k) : Vector with recent disturbance samples
V(k) : Toeplitz matrix with disturbance signal
w(k) : Filter coefficient vector. A subscript may specify the algorithm used.
W(k) : Matrix of which the columns are filter vectors
x(k) : Input signal
x(k) : Input vector
X(k) : Toeplitz matrix with input signal
Ξ(k) : Input correlation matrix
y(k) : Output of adaptive filter
∗ : Convolution symbol

Contents

1 Speech signal enhancement
1.1 Overview
1.2 Problem statement
1.2.1 Nature of acoustical disturbances
1.2.2 AEC, reference–based noise reduction
1.2.3 ANC, reference–less noise reduction
1.2.4 Combined AEC and ANC
1.3 Applications
1.4 The market
1.5 Contributions
1.6 Outline

2 Adaptive filtering algorithms
2.1 Introduction
2.2 Normalized Least Mean Squares algorithm
2.3 Recursive Least Squares algorithms
2.3.1 Standard recursive least squares
2.3.2 QRD–updating
2.3.3 QRD–based RLS algorithm (QRD–RLS)
2.3.4 QRD–based least squares lattice (QRD–LSL)
2.3.5 RLS versus LMS
2.4 Affine Projection based algorithms
2.4.1 The affine projection algorithm
2.4.2 APA versus LMS
2.4.3 The Fast Affine Projection algorithm (FAP)
2.5 Geometrical interpretation
2.6 Conclusion

3 APA–regularization and Sparse APA for AEC
3.1 APA regularization
3.1.1 Diagonal loading
3.1.2 Exponential weighting
3.2 APA with sparse equations
3.3 FAP and the influence of regularization
3.4 Experimental results
3.5 Regularization in multichannel AEC
3.6 Conclusion

4 Block Exact APA (BEAPA) for AEC
4.1 Block Exact Fast Affine Projection (BEFAP)
4.2 Block Exact APA (BEAPA)
4.2.1 Principle
4.2.2 Complexity reduction
4.2.3 Algorithm specification
4.3 Sparse Block Exact APA
4.3.1 Derivation
4.3.2 Complexity reduction
4.3.3 Algorithm specification
4.4 Conclusion
5 QRD–RLS based ANC
5.1 Introduction
5.2 Unconstrained optimal filtering based ANC
5.3 QRD–based algorithm
5.3.1 Speech+noise mode
5.3.2 Noise–only mode
5.3.3 Residual extraction
5.3.4 Initialization
5.3.5 Algorithm description
5.4 Trading off noise reduction vs. signal distortion
5.4.1 Regularization
5.4.2 Speech+noise mode
5.4.3 Noise–only mode
5.5 Complexity
5.6 Simulation results
5.7 Conclusion

6 Fast QRD–LSL–based ANC
6.1 Preliminaries
6.2 Modified QRD–RLS based algorithm
6.2.1 Speech+noise mode
6.2.2 Noise–only mode
6.3 QRD–LSL based algorithm
6.3.1 Per sample versus per vector classification
6.3.2 LSL–algorithm
6.4 Transitions
6.4.1 Transition from speech+noise to noise–only mode
6.4.2 Transition from a noise–only to a speech+noise period
6.5 Noise reduction vs. signal distortion trade–off
6.5.1 Regularization in QRD–LSL based ANC
6.5.2 Regularization using a noise buffer
6.5.3 Mode–dependent regularization
6.6 Complexity
6.7 Simulation results
6.8 Conclusion

7 Integrated noise and echo cancellation
7.1 Introduction
7.2 Optimal filtering based AENC
7.3 Data driven approach
7.4 QRD–RLS based algorithm
7.4.1 Speech+noise/echo updates
7.4.2 Noise/echo–only updates
7.5 QRD–LSL algorithm
7.6 Regularized AENC
7.6.1 Regularization using a noise/echo buffer
7.6.2 Mode–dependent regularization
7.7 Performance
7.8 Complexity
7.9 Conclusion
8 Conclusions

Chapter 1

Speech signal enhancement

A microphone often picks up acoustical disturbances together with a speaker's voice (which is the signal of interest). In this work, algorithms will be developed for techniques that allow these disturbances to be removed from the speech signal before it is processed further.

1.1 Overview

In general, more than one type of disturbance will be present in a microphone signal, each requiring a specific enhancement approach. We will mainly focus on two classes of speech enhancement techniques, namely acoustic echo cancellation (AEC) (section 1.2.2) and acoustic noise cancellation (ANC) (section 1.2.3).

For AEC, a whole range of algorithms exists, from computationally cheap to expensive, with a corresponding range in performance. We will focus on one of the 'intermediate' types of algorithms, of which the performance and complexity can be tuned to the available computational power. We will describe some methods to increase noise robustness, we will show how existing fast implementations fail when their assumptions are violated, and we will derive a fast implementation which does not require any assumptions.

For ANC, a class of promising state of the art techniques exists whose characteristics are complementary to the features of computationally cheaper (and commercially available) techniques. Existing algorithms for these techniques have a high numerical complexity, and hence are not suited for real time implementation. This observation motivates our work in the field of acoustic noise cancellation: we describe a number of algorithms that are (several orders of magnitude) cheaper than existing implementations, and hence allow for real time implementation.

Finally we will show that considering the combined problem of acoustic echo and noise cancellation as a global optimization problem leads to better results than using traditional cascaded schemes. The techniques which we use for ANC can easily be modified to incorporate AEC.

The outline of this first chapter is as follows. After a problem statement in section 1.2, we describe in section 1.3 a number of applications in which acoustic echo and noise cancellation techniques prove useful. In section 1.4, an overview of commercially available applications in this field is given. In section 1.5 our own contributions are summarized. Section 1.6 gives an outline of the remainder of the thesis.

1.2 Problem statement

1.2.1 Nature of acoustical disturbances

In many applications involving speech communication, it is difficult (expensive) to place microphones close to the speakers. The microphone amplification then has to be large due to the large distance to the speech source. As a result, more environmental noise will be picked up than in the case where the microphones would be close to the speech source.

For some of these disturbances, a reference signal may be available. For example, a radio may be playing in the background while someone is making a telephone call. The electrical signal that is fed to the radio's loudspeaker can be used as a reference signal for the radio sound reaching the telephone's microphone. We will call the techniques that rely on the presence of a reference signal 'acoustic echo cancellation techniques' (AEC); the reason for this name will become clear below.
For other types of disturbances, no reference signal is available. Examples of such disturbances are the noise of a computer fan, people who are babbling in the room where someone is using a telephone, car engine noise, ... Techniques that perform disturbance reduction where no reference signal is available will be called ’acoustic noise cancellation techniques’ (ANC) in this text. In some situations the above two noise reduction techniques should be combined with a third enhancement technique, namely dereverberation. Each acoustical environment has an impulse response, which results in a spectral coloration or reverberation of sounds that are recorded in that room. This reverberation is due to reflections of the sound against walls and objects, and hence has specific spatial characteristics, other than those of the original signal. The human auditory system deals with this effectively because it has the ability to concentrate on sounds coming from a certain direction, using information from both ears. If for example one would hear a signal 15 1.2. PROBLEM STATEMENT recorded by only one microphone in a reverberant room, speech signals may easily become unintelligible. Of course also voice recognition systems that are trained on non–reverberated speech will have difficulties handling signals that have been filtered by the room impulse response, and hence dereverberation is necessary. In this thesis, we will concentrate on algorithms for both classes of noise reduction (noise reduction with (AEC) and without (ANC) a reference signal). Dereverberation will not be treated here (we refer to [32, 40, 3] for dereverberation techniques). 1.2.2 AEC, reference–based noise reduction The most typical application of noise reduction in case a reference signal is available, is acoustic echo cancellation (AEC). As mentioned before, we will use the term AEC to refer to the technique itself, even though the disturbance which is reduced is not always strictly an ’echo’. Single channel techniques. A teleconferencing setup consists of two conference rooms (see Figure 1.1) in both of which microphones and loudspeakers are installed. Near end room Far end Speech AEC Near end Speech Far end room Figure 1.1: Acoustic echo cancellation. The loudspeaker signal in the near end room is picked up by the microphone, and would be sent back to the far end room without an echo canceller, where the far end speaker would hear his own voice again (delayed by the communication setup). Sound picked up by the microphones in one room (called the ’far end speech’ and the ’far end room’) is reproduced by the loudspeakers in the other (near end) room. The task of an ’echo canceller’ is to avoid that the portion of the far–end speech signal, which is picked up by the microphones in the near end room, is sent back to the far end. Hearing his own delayed voice will be very annoying to the far end speaker. 16 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT A similar example is voice control of a CD–player. The music itself then can be considered a disturbance (echo) to the voice control system. The loudspeaker signal in both cases is ’filtered’ by the room impulse response. This impulse response is the result of the sound being reflected and attenuated (in a frequency dependent way) by the walls and by objects in the room. Due to the nature of this process, the room acoustics can be modeled by a finite impulse response (FIR) filter. Nonlinear effects (mostly by loudspeaker imperfections) are not considered here. 
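To make the FIR echo–path model concrete, the following minimal Python sketch (our own illustration, not taken from the thesis) simulates the echo component of a microphone signal as the convolution of a loudspeaker signal with a hypothetical room impulse response; the sampling rate, filter length and decay constant are arbitrary assumptions.

    import numpy as np

    fs = 8000                          # sampling rate in Hz (assumed)
    N = 1000                           # number of FIR taps (assumed)
    rng = np.random.default_rng(0)

    # Hypothetical room impulse response: exponentially decaying random taps.
    w_real = rng.standard_normal(N) * np.exp(-np.arange(N) / 200.0)

    x = rng.standard_normal(fs)        # stand-in for the far-end loudspeaker signal
    near_speech = np.zeros(fs)         # near-end speech (silent in this example)

    echo = np.convolve(x, w_real)[:len(x)]   # linear echo picked up by the microphone
    d = echo + near_speech                   # microphone (desired) signal d(k)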
In an acoustic echo cancellation algorithm, a model of the room impulse response is identified. Since the conditions in the room may vary continuously (people moving around being an obvious example), the model needs to be updated continuously. This is done by means of adaptive filtering techniques. In the situation in Figure 1.2 the far end signal x(k) is filtered by the room impulse response, and then picked up by a microphone, together with the desired speech signal of the near end speaker. We consider digital signal processing techniques, hence A/D converted signals, i.e. discrete–time signals and systems. At the same time, the loudspeaker signal x(k) is filtered by a model w(k) of the room impulse response wreal , and subtracted from the microphone signal d(k) : e(k) = d(k) − wT (k)x(k). During periods where the near end speaker is silent, the error (residual) signal e(k) may be used to update w(k), but when the near end speaker is talking, this signal would disturb the adaptation process. We assume that the room characteristics do not change too much during the periods in which near end speech is present, and the adaptation is frozen in these periods by a control algorithm in order to solve this problem. x(k) = [x(k) ... x(k−N+1)] Far End Signal wk − e(k) ++ d(k) Near end Speech Near end room Figure 1.2: Echo canceller : typical situation. In the acoustic echo canceller scheme, the adaptive filtering structure (see also Figure 17 1.2. PROBLEM STATEMENT 2.1) is easily recognized. The input signal to this adaptive filter is the loudspeaker signal x(k) (the reference signal), the desired signal for the filter is the microphone signal d(k), and the error signal e(k) of the adaptive filter is used as the output signal for the AEC scheme. In practice, the length of the room acoustics (and by consequence also the impulse response length of the model w(k)) can easily be 2000 filter taps (even for a rather low sampling frequency of 8 kHz). This is the reason why people often use the celebrated and computationally cheap Normalized Least Mean Squares (NLMS) adaptive filter (see section 2.2), or even cheaper frequency domain derivatives of it for adapting w(k) [19, 18]. The disadvantage of NLMS is its often poor performance for non–white input signals (like speech). While NLMS is a cheap algorithm, the Recursive Least Squares (RLS) algorithm (section 2.3) has a higher performance, and fast variants are indeed used for acoustic echo cancellation [9, 8, 20, 21]. However, due to its complexity, efforts have been done to find algorithms that combine the low complexity of NLMS with the performance of RLS. Most notably are the Fast Newton Transversal filter (FNTF) [42] and fast variants [26] of the Affine Projection Adaptive (APA) [43] filter (see section 2.4). In this thesis, we derive a number of contributions to the field of APA–filtering. The performance advantage offered by these filters compared to NLMS, is due to a ’prewhitening’ structure that removes the correlation from the reference signal. As will be shown later, further signal processing may require multiple microphones (a microphone array) that pick up the sound in the room. The echo canceller structure then obviously has to be repeated for each of the microphones, as shown in Figure 1.3. The prewhitening stage, however, can be ’shared’ among the different microphones in X (k) Far End − e (k) 1 e2 (k) + − − + − d1(k) Near end Speech d2(k) Figure 1.3: Multi–microphone acoustic echo canceller. 
The single channel setup can simply be repeated a multi–microphone setup. 18 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT An acoustic echo canceller never consists of the adaptive filter alone, but always requires some control logic. The adaptive filter is in practice never updated when near end speech is present, and only updated if there is far end signal available. The decision can e.g. be based upon measurements of the correlation of the residual signal e(k) with the loudspeaker signal. In this text, however, this control device will not be considered. All experiments have been done with a ’perfect’ control device, i.e. speech periods have been marked manually. In the acoustic echo canceller context, it is important that the decision device never allows the filter to adapt during a double–talk period (when both far end and near end speaker active), since then the adaptation would be disturbed by the near end signal, and the coefficients would converge to wrong values. The other situation is less problematic : when a period in which only far end talk is present is labeled as double–talk, the echo canceller would not adapt. If this would happen often, the overall convergence would just be somewhat slower. We refer to the literature [10, 25, 31, 45] from which a suitable implementation can be picked. Multichannel techniques. Multi–channel techniques for acoustic echo cancellation [4, 28, 2, 41] should not be confused with multi–microphone techniques. In a multi– microphone–setup, all adaptive filters have the same input signal (the mono loudspeaker signal), while in a multichannel–setup, multiple loudspeakers (or reference signals) are used, see Figure 1.4. An application example is a stereo setup used for X (k) 1 X2(k) Far End Near end Speech − + − + d1(k) Figure 1.4: Multi–channel acoustic echo canceller. The fundamental problem of stereophonic AEC tends to occur in this case, and decorrelation of the loudspeaker signals is necessary to achieve good performance 1.2. PROBLEM STATEMENT 19 teleconferencing in order to provide the listener with a comfortable spatial impression. While the extension of the single channel techniques to multiple microphones is trivial, multi–channel AEC on the other hand is highly non–trivial. A specific problem with multichannel echo cancellation is the non–uniqueness [4, 20, 5, 24, 2] of the solution. This is sometimes referred to as the ’fundamental problem’ of stereophonic echo cancellation. Since all loudspeaker signals stem from the same sound source in the far end room, their joint correlation matrix may be rank–deficient. As a result, there is not a single solution for a multi–channel echo canceller, but a solution space. The echo canceller may find a solution for which the output signal is zero in the absence of near end speech, while the filter is not converged to the real room impulse response (see section 3.5). As a result, the slightest change in the far end room impulse response, may destroy the successful echo cancellation. For multichannel echo cancellation both a change in the transmitting– and in the receiving room will have this effect. Even if this situation would not occur, still the problem becomes ill conditioned if both far–end signals are correlated. This often results in a large sensitivity to noise that may be present in d(k), for example due to continuously present background noise in the near end room. This also indicates that proper measures should be used for the evaluation of different algorithms. 
One should not only look to the energy in the residual echo signal, because it can indeed be small or zero while the filter has not yet converged to the real echo path. For simulated environments, the room acoustics path is known, and hence the distance between this path and the echo canceller path can be plotted. While this is only feasible in artificial setups, it is the only ’correct’ way to evaluate the convergence behaviour of an echo canceller, especially in the multi channel case. 1.2.3 ANC, reference–less noise reduction The signal picked up by the microphone will in realistic situations often also contain disturbance components for which no reference signal is available. Also for this case, multiple approaches to noise cancellation exist. Single channel techniques A microphone picks up a signal of interest, together with noise. Single microphone approaches to noise cancellation will try to estimate the spectral content of the noise (during periods where the signal of interest is absent), and —assuming that the noise signal is stationary— compensate for this spectrum in the spectrum of the microphone input signal whenever the signal of interest is present. The technique is commonly called ’spectral subtraction’ [16, 17]. Single channel approaches are known to perform poorly when the noise source is non–stationary, and when the spectral content of the noise source and of the signal of interest are similar. 20 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT Multi–channel techniques In multi–channel acoustic noise cancellation, a microphone array is used instead of a single microphone to pick up the signal. Apart from the spectral information also the spatial information can be taken into account. Different techniques that exploit this spatial information exist. In filter– and sum beamforming [60], a static beam is formed into the (assumed known) direction of the (speech) source of interest (also called the direction of arrival). While filter–and sum beamforming is about the cheapest multi–channel noise suppression method, deviations in microphone characteristics or microphone placement will have a large influence on the performance, Since signals coming from other directions than the direction of arrival are attenuated, beamforming also provides a form of dereverberation of the signal. Generalized sidelobe cancellers (Griffiths–Jim beamforming) [60] aim at reducing the response into directions of noise sources, with as a constraint a distortionless response towards the direction of arrival. The direction of arrival is required prior knowledge. A voice activity detector is required in order to discriminate between noise– and speech+noise periods, such that the response towards the noise sources can be adapted during noise–only periods. Griffiths–Jim beamforming effectively is a form of constrained optimal filtering. A third method is unconstrained optimal filtering [12][13]. Here a MMSE–optimal estimate of the signal of interest can be obtained, while no prior knowledge is required about geometry. A voice activity detector again is necessary and crucial to proper operation. The distortionless constraint towards the direction of arrival is not imposed here. A parameter can be used to trade off signal distortion against noise reduction. The contributions of this thesis in the field of acoustic noise reduction will be focused on this last method (chapters 5 and 6). 
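As a rough illustration of the unconstrained optimal filtering idea, the sketch below shows a generic multichannel Wiener filter in Python; this is not the GSVD– or QRD–based algorithm developed later in the thesis, and the variable names and shapes are our own assumptions. The speech correlation matrix is estimated as the difference between correlation matrices measured during speech+noise and noise–only periods, as flagged by the voice activity detector.

    import numpy as np

    def wiener_filter_weights(Y_sn, Y_n):
        """Generic unconstrained multichannel Wiener filter sketch.

        Y_sn : (L1, MN) matrix of stacked input vectors from speech+noise frames
        Y_n  : (L2, MN) matrix of stacked input vectors from noise-only frames
        Returns an (MN, MN) matrix W whose columns give MMSE estimates of the
        speech component in each input channel, assuming speech and noise are
        uncorrelated and the noise is stationary across both periods.
        """
        R_yy = (Y_sn.T @ Y_sn) / len(Y_sn)    # speech+noise correlation matrix
        R_vv = (Y_n.T @ Y_n) / len(Y_n)       # noise correlation matrix
        R_xx = R_yy - R_vv                    # estimated speech correlation matrix
        return np.linalg.solve(R_yy, R_xx)    # W = R_yy^{-1} (R_yy - R_vv)

    # Speech estimate for one stacked input vector y(k):  x_hat = W.T @ y

A noise reduction versus distortion trade-off parameter can be introduced by weighting the noise correlation matrix in this expression; it is omitted here for brevity.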
Existing algorithms for unconstrained optimal filtering for acoustic noise reduction are highly complex compared to both other (beamforming–based) methods, which implies that they are not suited for real time implementation. On the other hand, they are quite promising for certain applications, since they have different features than the beamforming–based methods : filter–and sum beamformers are well suited (and even optimal) for enhancing a localized speech source in a diffuse noise field, and generalized sidelobe cancellers are able to adaptively eliminate directional noise sources, but both of them rely upon a priori information about the geometry of the sensor array, the sensor characteristics, and the direction of arrival of the signal of interest. This means that the unconstrained optimal filtering technique is more robust against microphone placement and microphone characteristics, and that the direction of arrival is not required to be known a priori. Another advantage is that they can easily be used for combined AEC/ANC, as we show in chapter 7. 1.3. APPLICATIONS 1.2.4 21 Combined AEC and ANC In many applications, techniques to cancel noise for which a reference signal exists (AEC) are often combined with techniques that do not use a reference signal (ANC), since both types of disturbances are often present. The order in which both signal processing blocks are applied to the signals is very important. In Figure 1.5, both options are shown. The upper scheme will first apply multichannel noise cancellation (no reference signal), and then echo cancellation. The advantage is that, since most referenceless noise reduction schemes make use of multiple microphones, only one echo canceller is needed. Moreover, in addition to the echo path, the echo canceller will have to model the variations in the noise cancellation block. The lower scheme in Figure 1.5 requires an echo canceller for each microphone, and these need to be robust against the noise that is still present in their input signals. In spite of the higher complexity of the second scheme, it is most often used because of its better performance compared to the first scheme. Apart from these combination schemes, a lot of different combination schemes are described in literature [1, 7, 37, 38, 6]. In this thesis, we will show that considering the combined problem as a global optimization problem leads to a better performance. We will describe how the unconstrained filtering techniques derived in the chapters about noise cancellation, can easily be adopted for solving the combined acoustic noise and echo cancellation problem. For echo paths of reasonable length, real time implementation of these techniques is possible with present day processors. 1.3 Applications Tele– and videoconferencing As a first application example we consider teleconferencing. A number of people is meeting in two rooms. In each of these rooms, a microphone array and a loudspeaker are present. The loudspeaker reproduces the sound of the speakers in the other meeting room. The system can be expanded to have more loudspeakers, in order to give the conference participants a spatial impression of the reproduced sound. If no echo–cancellation is applied, echo’s and howling can occur. Echo paths can be as long as 200 msec, while a sampling speed of about 16 kHz is required in order to have a high enough speech quality, resulting in echo path impulse responses of up to 3000 taps. 
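As a quick check of these numbers (our own back-of-the-envelope arithmetic): an echo path of 200 ms sampled at 16 kHz corresponds to 0.2 s x 16 000 samples/s = 3200 filter taps, which is the order of magnitude quoted above.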
On the other hand, people talking in the background, a computer fan, air conditioning are all examples of disturbances that should be handled by means of noise cancellation. Often the echo–cancellers in this type of applications could profit from algorithms as described in chapters 3 and 4, of which the convergence is less dependent on the input signal statistics than what is the case for NLMS. Also algorithms providing the 22 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT From far end Interference Noise To far end Acoustic echo Speaker Desired signal canceller Noise reduction From far end Interference Noise To far end Acoustic echo Speaker Desired signal canceller Noise reduction Figure 1.5: Two methods to combine echo– and noise cancellation. 1.3. APPLICATIONS 23 ’combined’ ANC and AEC–approach in chapter 7 would increase the performance of a speech enhancement system for tele– or videoconferencing. Note though that for larger auditoria the required number of filter taps is huge, and that complexity of the algorithms should be taken into account. Car applications In car applications such as voice control of a mobile phone or sound system, or hands free telephony, noise appears to be the most important problem. For engine noise or radio sound, a reference is available or can be derived, while wind– and tyre noise, passengers talking to each other, ... are disturbances without a reference signal. Acoustic paths in cars are much shorter (up to 256 impulse response taps), as compared to typical conference room impulse responses. Also in this case both ANC and AEC are required. Because of the limited length of the echo path, the algorithms in chapter 7 certainly become an option. Voice control Voice control technology can be found in consumer products, but also finds applications in making technology accessible for disabled people. Speech recognition systems are often trained with clean speech (without noise), because a lot of clean speech databases are available, although also databases are set up for specific noise situations (e.g. speech recognition in cars). A specific problem is voice control of a surround sound audio system, where a multichannel echo–canceller is required in order to suppress the signal stemming from the five speakers after being picked up by the microphone. In this case, reference signals are available, and algorithm with a better performance for coloured signals than NLMS are required (chapters 3 and 4). Hearing aids Acoustic noise cancellation techniques are applied in the field of hearing aids and cochlear implants. It is known that merely amplifying a signal does not contribute to increasing the speech intelligibility, when ’background noises’ are present. Noise cancellation techniques can alleviate this problem, and at present 2– microphone hearing aids with noise cancellation technology are commercially available. The space (and hence the computational power) in a behind–the–ear device is limited, so most of the time cheap (adaptive beamforming) algorithms are used at present, but also these devices could benefit from the techniques in chapters 5 and 6. Selective volume control Techniques that are developed for acoustic echo cancelling, can also be applied in other fields. An example is a ’selective volume control’ 24 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT device, which is used in e.g. discotheques to turn down the sound volume automatically if it exceeds the legal norms. 
In order to avoid that loud noises made by the crowd would result in lowering the amplifier’s volume, an adaptive filter is used to retain only the sound from the loudspeakers in the signal that is picked up by a measurement microphone before the sound pressure level is calculated. A similar system is a volume control application in e.g. a train station, where the volume is automatically turned up if a train passes, or if the crowd is noisy, but which is not sensitive to the sound of the public address system’s own loudspeakers. This kind of applications is even more demanding concerning filter lengths than ordinary echo cancelling in rooms. The legal norms about the maximum sound pressure level are given per frequency on the full audible frequency spectrum. This means that a sampling rate of 44 kHz is required. So the required filter length is more than 10000 filter taps. On the other hand, calculations could be done off–line instead of in real time, and the music signals can be largely correlated. This again requires ’intermediate’ algorithms between NLMS (convergence depends on input signal statistics) and RLS. Recording A recording of e.g. an orchestra or a theatre play imposes different constraints. Microphones will not be placed in an array with an a priori known geometry, but they will be spread over the whole stage on which the performance takes place. The signal of interest does not originate from one specific direction. In dedicated theatres, the noise will mainly consist of the audience, but also scenario’s with noise of air conditioning or heating systems (recordings in churches) are possible. 1.4 The market A large number of companies are currently offering products and services that are linked with the above–mentioned speech enhancement techniques. While in high end devices for auditorium teleconferencing (price about 5000 Euro) it is difficult to gather information on the type of algorithms used, data sheets about desktop conferencing consumer products often indicate that computationally cheap NLMS–like or frequency domain derivatives of NLMS are used. Examples of companies are Spirit Corporation (http://www.spiritcorp.com) , providing code libraries for acoustic echo and noise cancellation optimized for different types of DSP processors, and for the Microsoft Windows operating system. Polycom (http://www.polycom.com) provides ’desktop’ teleconferencing solutions, and the performance data they publish (a convergence time of 10–40 sec) indicate the use of cheap adaptive filters. Larger systems are e.g. built by Clearone (http://www.clearone.com). 1.5. CONTRIBUTIONS 25 Another application is audio enhancement. Both the application CoolEdit (from Syntrillium, http://www.syntrillium.com) and SoundForge (from Sonic Foundry. http://www.sonicfoundry.com) contain signal enhancement modules providing single channel spectral subtraction techniques. Commercial voice command applications often use proprietary techniques based upon beamforming (e.g. with a microphone array on top of a computer monitor (Andrea Electronics, http://www.andreaelectronics.com)). In hearing aids, the commercial state of the art devices use two microphones and Griffiths–Jim beamforming based noise cancellation schemes. The importance of speech enhancement technology in the current market is also shown by the fact that in the most recent version of Microsoft Windows XP noise–cancellation and echo–cancellation features are built in the operating system (http://www.microsoft.com). 
It is clear that in the consumer telecommunications market, the demand for handfree mobile telephony — a direct application of the techniques described here — is high, because of security (and legal) issues concerning use of a mobile phone while driving. As an example : in 2002, the worldwide sales of mobile phones has risen with 6%, 423,4 million devices were sold worldwide (http://www.tijdnet.be/archief). 1.5 Contributions From section 1.4, one can see that the commercially available applications are all based upon ’low complexity’ algorithms, obviously due to real–time and cost constraints. For acoustic echo cancelling often more performant algorithms than NLMS– based ones begin to be used, certainly in ’high end’ applications. The performance and the complexity of the APA–based algorithms we have studied in this work can be ’tuned’ to use the available computational power. We provide some an alternative for obtaining noise–robustness and derive an efficient frequency–domain based algorithm, which does not contain any approximations (contrarily to existing implementations) One notices that the computational complexity of the newer (unconstrained adaptive filtering) algorithms for noise reduction prohibits their commercial application. Of course, with the rise of computational power over the years, in a decade from now these algorithms will also be applied, even in consumer electronics. In this text we will focus our attention to some of these new (’academic’) techniques, and we will derive new algorithms that have a (sometimes dramatically) reduced complexity compared to their predecessors, while keeping their performance at the same level. This should allow these more performant techniques to be considered for use in commercial applications in a much shorter time frame. The contributions to the field of speech enhancement which are treated in this text, can be subdivided into three major categories. 26 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT • The first category consists of signal enhancement techniques for acoustic noise reduction when a reference signal is available (AEC). The results consists of alternative regularization techniques for improving the noise robustness of acoustic echo cancellers based upon the affine projection algorithm (see further on in this text) , and the Block Exact Affine Projection Algorithm (BEAPA), which is a fast frequency domain version of the affine projection algorithm with roughly the same complexity as BEFAP (see further on in this text), but without the need for the assumptions that need to be made for BEFAP. The results hereof are published in the conference papers [50, 47, 48, 49] and in the journal paper [55]. They will be treated in chapters 3 and 4. • The second category focusses on MMSE–based optimal filtering for acoustic noise reduction in case no reference signal is available (ANC). We proposed a QRD–RLS and a QRD—LSL based approach to unconstrained optimal filtering that achieves the same performance as existing (GSVD–based) techniques, but with a complexity reduction of respectively one and two orders of magnitude. These results have been published in the papers [54, 52] and [56, 51]. We will treat them in chapters 5 and 6. • Finally, combination of noise– and echo cancelling is treated in chapter 7, and this result is in our paper [53]. 1.6 Outline 1. Speech signal enhancement 2. Adaptive filtering algorithms Introduction 3. APA regularization and Sparse APA 4. BEAPA for AEC 5. QRD−RLS based ANC 6. 
Fast QRD−LSL based ANC Acoustic echo cancellation Acoustic noise cancellation 7. Integrated noise− and echo cancellation Echo and noise cancellation 8. Conclusions Figure 1.6: Outline of the text The outline of the text is depicted in Figure 1.6. Chapter 2 contains additional introductory material. Relevant adaptive filtering algorithms are reviewed, and the concept of signal flow graphs is explained briefly. 1.6. OUTLINE 27 Chapter 3 and 4 of the thesis focus on acoustic echo cancellation. More specifically in chapter 3 the importance of noise robustness in acoustic echo cancellers is reviewed, and some techniques are derived to implement this into fast affine projection algorithms. We also show that traditional fast implementations exhibit problems when strong regularization is applied. In chapter 4 a frequency domain block exact affine projection algorithm is derived which does not contain the approximations that are present in traditional fast affine projection schemes, while it has a complexity that is comparable to these schemes. Chapter 5 and 6 focus on acoustic noise cancellation techniques. In chapter 5 an unconstrained optimal filtering based noise cancellation algorithm is derived. This algorithm is based upon the QR–decomposition (see section 2.3 for a definition). It obtains the same performance as existing algorithms for unconstrained optimal filtering, while its complexity is an order of magnitude lower. Chapter 6 builds upon the previous one to derive an even cheaper fast QRD–based algorithm while again performance is maintained at the same level. In chapter 7 we discuss the combination of AEC and ANC, and show the performance advantage of using an integrated approach to acoustic noise and echo cancellation compared to traditional combination schemes. Chapter 8 finally, contains the overall conclusions of this work, as well as suggestions for further research. 28 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT Chapter 2 Adaptive filtering algorithms Adaptive filters will play an important role in this text. Therefore, we will devote a chapter to giving an overview of commonly used adaptive filtering techniques. In section 2.1 the general adaptive filtering setup and problem will be reviewed. The normalized least mean squares algorithm (NLMS) and the recursive least squares (RLS) algorithms will be reviewed in sections 2.2 and 2.3. An intermediate class of algorithms, both complexity– and performance–wise, can be derived from the affine projection algorithm (APA). APA will be introduced in section 2.4. In each section complete algorithm descriptions of these algorithms will be given for reference. Later on in this text, APA will be the main topic in chapters 3 and 4, where it will be used for acoustic echo cancellation. Chapters 5, 6 and 7 will mainly be based upon algorithms derived from RLS and fast versions thereof. 2.1 Introduction In this introduction we will give a short overview of the data representations that will be used in the remains of the chapter and the thesis. We will use both adaptive filtering configurations with single and multiple input and output channels. A single input, single output adaptive filtering setup is shown in Figure 2.1. An input signal x(k) is filtered by a filter w(k). The output from this filtering operation is subtracted from a ’desired signal’ d(k) and the resulting ’error signal’ e(k) is used to update the filter coefficients. 
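The following short Python sketch (our own illustration; the update_rule argument is only a placeholder for the concrete algorithms reviewed in the next sections) summarizes this generic filter-and-update loop:

    import numpy as np

    def adaptive_filter(x, d, N, update_rule):
        """Generic single-channel adaptive filtering loop (illustrative sketch).

        x, d        : input and desired signals (1-D arrays of equal length)
        N           : number of filter taps
        update_rule : callable mapping (w, x_vec, e) to the new coefficient vector
        Returns the error signal e(k) = d(k) - w^T(k-1) x(k).
        """
        w = np.zeros(N)
        e = np.zeros(len(x))
        for k in range(N - 1, len(x)):
            x_vec = x[k - N + 1:k + 1][::-1]   # x(k) = [x(k) ... x(k-N+1)]^T
            e[k] = d[k] - w @ x_vec            # error signal
            w = update_rule(w, x_vec, e[k])    # update filter coefficients
        return e

For example, passing update_rule = lambda w, x_vec, e: w + mu * x_vec * e (for some assumed step size mu) would give the unnormalized LMS rule discussed in section 2.2.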
The signals are assumed to be zero mean, and d(k) is a linearly filtered version of x(k) with zero mean noise added that is assumed to be independent of x(k).

[Figure 2.1: Adaptive filter. The filter coefficients w are adapted such that e is minimized.]

All of the algorithms are based upon an overdetermined system of linear equations

    X(k) w(k) = [d(k) d(k-1) ...]^T,                                        (2.1)

where

    X(k) = [x^T(k); x^T(k-1); x^T(k-2); ...],
    x(k) = [x(k) x(k-1) ... x(k-N+1)]^T

(the semicolons separate the rows of X(k)), which will be solved in the least squares sense, i.e. based on an LS criterion

    min_{w_LS(k)} || [d(k) d(k-1) ...]^T - X(k) w(k) ||^2.                  (2.2)

The LS solution is given as

    w_LS(k) = (X^T(k) X(k))^{-1} X^T(k) [d(k) d(k-1) ...]^T.

We will also use the MMSE criterion

    min_{w_MMSE(k)} ε{ (d(k) - x^T(k) w(k))^2 },                            (2.3)

where ε{·} is the expectation operator. The MMSE solution is given as

    w_MMSE(k) = (ε{x(k) x^T(k)})^{-1} ε{x(k) d(k)}.

In each time step k, a new equation is added to (2.1), so at each time instant a new value for w(k) can be calculated. Since adaptivity is required in a changing environment, algorithms will be designed to 'forget' old information. This can be achieved by exponentially weighting the rows of X(k), as is usually done in the RLS algorithm, i.e.

    X(k) = [x^T(k); λ x^T(k-1); λ^2 x^T(k-2); ...],

or by only using the P most recent input vectors in X(k):

    X(k) = [x^T(k); x^T(k-1); ...; x^T(k-P+1)].

[Figure 2.2: A multi–channel adaptive filter. The input vector x(k) consists of the concatenation of the channel input vectors x_i(k), and similarly the filter vector w(k) = [w_1^T(k) w_2^T(k) w_3^T(k)]^T.]

In this text, we will also consider multichannel (multiple input) adaptive filters (see Figure 2.2), where the input vectors x(k) will be defined as

    x(k) = [x_1(k) ... x_1(k-N+1) | x_2(k) x_2(k-1) ... | ... | x_M(k) ... x_M(k-N+1)]^T.   (2.4)

Similarly w(k) is then defined as a stacked version of the filter vectors w_i(k) for i = 1...M:

    w(k) = [w_1^T(k) w_2^T(k) ... w_M^T(k)]^T.

Here M will be the number of input channels of the adaptive filter, and N is the number of filter taps per input channel. Sometimes an alternative definition for the input vector will be used in which the input signals are interlaced:

    x(k) = [x_1(k) ... x_M(k) | x_1(k-1) x_2(k-1) ... | ... | x_M(k-N+1)]^T.                (2.5)

As a result, also the corresponding filter taps will be interlaced.

Considering setups with multiple microphones, we will be solving least squares minimization problems that share the same left–hand side matrix X(k), but have different right hand side vectors. They can be solved concurrently as one multiple–right hand side least squares problem. In this case the columns of a matrix W(k) will be solutions to LS–problems with the columns of a matrix D(k) as their respective right hand sides. A system of equations analogous to (2.1) can be written down:

    X(k) W(k) = D(k),                                                       (2.6)
    X(k) = [x^T(k); x^T(k-1); ...],
    x(k) = [x(k) x(k-1) ... x(k-N+1)]^T,
    D(k) = [d^T(k); d^T(k-1); ...],
    d(k) = [d_1(k) d_2(k) ...]^T.

Note the structure of d(k), of which the components represent the different desired signal samples at time k. The least squares solution can be found from

    min_{W(k)} || D(k) - X(k) W(k) ||.                                      (2.7)

The corresponding MMSE criterion is

    min_{W(k)} ε{ || d^T(k) - x^T(k) W(k) ||^2 }.
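For illustration, the exponentially weighted least squares problem (2.1)–(2.2) can also be formed and solved directly; the Python sketch below (our own, with hypothetical names) is meant only to make the data structure explicit, since the recursive algorithms reviewed in the following sections compute the same solution far more efficiently.

    import numpy as np

    def weighted_ls_solution(x, d, N, lam):
        """Direct (non-recursive) solution of the exponentially weighted LS
        problem (2.1)-(2.2); illustrative sketch only.

        x, d : input and desired signals, N : filter length,
        lam  : forgetting factor (0 < lam <= 1).
        """
        K = len(x)
        rows, rhs = [], []
        for k in range(N - 1, K):
            weight = lam ** (K - 1 - k)              # most recent row gets weight 1
            rows.append(weight * x[k - N + 1:k + 1][::-1])   # weighted x^T(k)
            rhs.append(weight * d[k])                        # weighted d(k)
        X = np.asarray(rows)
        w_ls, *_ = np.linalg.lstsq(X, np.asarray(rhs), rcond=None)
        return w_ls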
In the next sections we will give an overview of the different adaptive filtering techniques that will be used in this thesis.

2.2 Normalized Least Mean Squares algorithm

One approach to solving (2.1) is the Least Mean Squares (LMS) algorithm. This algorithm is in fact a stochastic gradient descent method applied to the underlying MMSE criterion (2.3). The update equations for the filter coefficient vector w_lms(k) are

    e(k+1)     = d(k+1) - x^T(k+1) w_lms(k),                                (2.8)
    w_lms(k+1) = w_lms(k) + µ x(k+1) e(k+1),
    y(k+1)     = x^T(k+1) w_lms(k+1).                                       (2.9)

Here µ is a step size parameter. A full description is shown in Algorithm 1.

Algorithm 1 LMS algorithm
    w_lms = 0
    Loop (new input vector x and desired signal d in each step):
        e = d - x^T w_lms
        w_lms = w_lms + µ x e
        y = x^T w_lms

In order to make the convergence behaviour independent of the input energy, often the Normalized Least Mean Squares (NLMS) algorithm is used, where the filter vector update is divided by the input energy. The algorithm is given by

    e(k+1)      = d(k+1) - x^T(k+1) w_nlms(k),
    w_nlms(k+1) = w_nlms(k) + µ x(k+1) e(k+1) / (x^T(k+1) x(k+1) + δ).      (2.10)

Here δ is a 'regularization term'. In NLMS it guarantees that the denominator cannot become zero, but it also provides noise robustness (see section 2.4). Similar equations are obtained for the definitions (2.6). It can be shown that, for µ = 1 and δ = 0, the a posteriori error for NLMS,

    e_post(k+1) = d(k+1) - x^T(k+1) w_nlms(k+1),

is zero, which means that for the NLMS algorithm the systems of equations (2.1) or (2.6) are effectively reduced to one single equation, namely the most recent one, and that this equation is solved exactly based on a minimum–norm weight vector adaptation.

NLMS is a computationally cheap algorithm with a complexity of 4N flops per sample (for complexity calculations in this text we count an addition and a multiplication as 2 separate floating point operations), but it suffers from slow convergence when non–white input signals are applied. In practice often frequency domain variants of this algorithm are used in order to obtain an even lower complexity. An algorithm description of the time domain NLMS algorithm is given in Algorithm 2.

Algorithm 2 NLMS algorithm
    w_nlms = 0
    Loop (new input vector x and desired signal d in each step):
        e = d - x^T w_nlms
        w_nlms = w_nlms + µ x e / (x^T x + δ)
        y = x^T w_nlms

We also note here that if the LMS algorithm is to be calculated for multiple desired (right hand side) signals, the whole algorithm simply has to be repeated for each desired signal. In the NLMS algorithm the (small) cost of calculating the input energy can be shared:

    e^T(k+1)    = d^T(k+1) - x^T(k+1) W_nlms(k),
    W_nlms(k+1) = W_nlms(k) + µ x(k+1) e^T(k+1) / (x^T(k+1) x(k+1) + δ).    (2.11)

2.3 Recursive Least Squares algorithms

In this section, we will first review the standard recursive least squares algorithm, then the numerically more stable (and thus preferable) QRD–based RLS algorithm, and finally the fast QRD–Least Squares Lattice algorithm.

2.3.1 Standard recursive least squares

Instead of applying a stochastic gradient descent method (NLMS), the recursive least squares (RLS) algorithm solves system (2.6) or (2.1) in a least squares (LS) sense, i.e. based on the LS criterion (2.2), and does so by applying recursive updates to the solution calculated in the previous time step (cf.
newton-iterations on a quadratic error surface where the hessian reduces to a correlation matrix). For exponentially weighted RLS, the update equations are erls (k + 1) Ξ−1 (k + 1) wrls (k + 1) = d(k + 1) − xT (k + 1)wrls (k), 1 −1 (k)x(k + 1)xT (k + 1) λ12 Ξ−1 (k) 1 −1 λ2 Ξ Ξ (k) − = , λ2 1 + λ12 xT (k + 1)Ξ−1 (k)x(k + 1) = wrls (k) + Ξ−1 (k + 1)x(k + 1)erls (k + 1), (2.12) where Ξ−1 (k) is the inverse correlation matrix (Ξ(k) = X T (k)X(k)). The first equation calculates the error at time instant k + 1, while the second equation is the filter coefficient update. Instead of doing an update in the direction of the input vector x(k) as in LMS, in (2.12) the input signal can be seen to be whitened because it is multiplied by the inverse correlation matrix. An algorithm description is provided in Algorithm 3. Again a regularization (or better : ’diagonal loading’) term can be added to the inverse correlation matrix wrls (k + 1) = wrls (k) + (X T (k + 1)X(k + 1) + δI)−1 x(k + 1)erls (k + 1). Here I is the unity matrix. It is well known that this provides robustness to noise terms that could be present in d(k) [27]. 36 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 3 RLS algorithm wrls = 0 Ξinv = 106 .I //Init with large number Loop (input : d and x): erls = d − xT wrls Ξinv = 1 λ2 (Ξinv − 1 λ2 Ξinv xxT 1 λ2 Ξinv 1+ λ12 xT Ξinv x ) wrls = wrls + Ξinv xerls y = xT wrls It is easily seen that in case multiple right hand side signals are present, the update of the inverse correlation matrix can be shared among the different right hand sides : eTrls (k + 1) Ξ(k + 1) Wrls (k + 1) = dT (k + 1) − xT (k + 1)Wrls (k), 1 1 T 1 λ2 Ξ(k)x(k + 1)x (k + 1) λ2 Ξ(k) = Ξ(k) − , λ2 1 + λ12 xT (k + 1)Ξ(k)x(k + 1) = Wrls (k) + Ξ(k + 1)x(k + 1)eTrls (k + 1). (2.13) This effectively means that in that case apart from the cost of calculating the inverse correlation matrix (once), for each channel only an LMS–like updating procedure needs to be calculated. This is easily shown by comparing (2.12) and (2.8). We will now describe an RLS algoritm based on QRD–updates, which is known to have good numerical properties. 2.3.2 QRD–updating Every matrix X ∈ <L×M N with linearly independent columns, L ≥ M N (in our application, M will be the number of microphones or ’input channels’ and N the number of filter taps per microphone) can be decomposed into an orthogonal matrix Q ∈ <L×M N and an upper triangular matrix R ∈ <M N ×M N , where R is of full rank and has no non–zero entries on the diagonal. X = QR. (2.14) This decomposition is called ’QR–decomposition’ (QRD), and R is called the Cholesky– factor or square root of the matrix product X T X, since X T X = RT R. In our applications X(k) is often defined in a time recursive fashion, X(0) = xT (0) , T x (k + 1) X(k + 1) = . (2.15) λX(k) 37 2.3. RECURSIVE LEAST SQUARES ALGORITHMS Here 0 < λ ≤ 1 is a forgetting factor and k is the time index. We will now briefly review the QR–updating procedure [29] for computing the QRD of X(k + 1) from the QRD of X(k). If we replace X(k) by its QR–decomposition, we obtain : X(k + 1) = 1 0 0 Q(k) xT (k + 1) λR(k) . We can now find an orthogonal transformation matrix Q(k + 1) : xT (k + 1) X(k + 1) = , λR(k) 1 0 0 = Q(k + 1) , 0 Q(k) R(k + 1) | {z } 1 0 0 Q(k) (2.16) [∗|Q(k+1)] = Q(k + 1)R(k + 1). The ’*’ are don’t care–entries. Here Q(k + 1) will be constructed as a series of Givens–rotations, Q(k + 1) = G1,2 (θ1 (k + 1))G1,3 (θ2 (k + 1)) . . . 
G1,M N (θM N (k + 1)), with Gi,j (θ) = Ii−1 0 0 0 0 0 cos θ 0 sin θ 0 0 0 Ij−i−1 0 0 0 − sin θ 0 cos θ 0 0 0 0 0 IM N −j . Each of these rotations will zero out one of the elements of the top row in the compound matrix T x (k + 1) (2.17) λR(k) in order to obtain the updated R(k + 1) in the right hand side of (2.16). Q(k) will not be usefull in applications, and hence will not be stored. The procedure for choosing the i, j and θ for the Givens–rotations is best explained in the signal flow graph (SFG) for QRD–updating which is shown in Figure 2.3 for M = 2 and N = 4. In this SFG the upper triangular matrix R(k) can be recognized, as well as the input vector (from the delay line) that is placed on top of it. Compare this to the matrix (2.17). The rotations (hexagons in the signal flow graph) are defined by : a0 b0 = cos θ − sin θ sin θ cos θ a b . 38 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS When a new input vector x(k+1) enters the scheme, the top left hexagon will calculate θ1 such that its output b0 = 0, tan θ1 = x1 (k + 1) , R11 (k + 1) (2.18) and it will update R11 accordingly. Note that the denominator in this expression is never zero by the definition of the QR–decomposition (the matrix R(k) should be properly initialized before the first iteration). The other hexagons in the first row use this θ1 to process the remaining elements of the input vector and the top row of R(k) . This corresponds to applying G1,2 (θ1 (k)). Then the first hexagon in the second row will calculate θ2 so that applying G1,3 (θ2 (k)) nulls the second element of the modified input vector and updates the second row of R(k), and so on [39]. For a more detailed description of this signal flow graph, we refer to [46]. Algorithm 4 shows the QRD– updating process, see also [29]. Note that the updating scheme requires O((M N )2 ) flops per update. Algorithm 4 QRD–updating UpdateQRD (R, x, Weight) { // x is input vector // an upper triangular matrix R is being updated for (i = 0; i < M * N; i++) { R[i][i] *= Weight; temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]); sinTheta = x[i] / temp; cosTheta = R[i][i] / temp; R[i][i] = temp; for (j = i+1; j < M * N; j++) { temp = R[i][j] * dWeight; R[i][j] = cosTheta * temp + sinTheta * x[j]; x[j] = -sinTheta * temp + cosTheta * x[j]; } } } 2.3.3 QRD–based RLS algorithm (QRD–RLS) The QR–decomposition can be used to perform a least squares estimation of the form 2 min kX(k)W (k) − D(k)k . W (k) (2.19) Here W (k) is a matrix, each column of which corresponds to a least squares estimation problem with X(k) and the corresponding column of D(k) (referred to as the 39 2.3. RECURSIVE LEAST SQUARES ALGORITHMS filter input 1 ∆ x1(k+1) x 2 (k+1) filter input 2 ∆ x1(k) x 2(k) ∆ ∆ x1(k−1) x 2(k−1) ∆ x1(k−2) x 2 (k−2) ∆ R11 0 R22 0 R33 0 R44 0 R55 Rij(k+1) Rij(k) = ∆ λ a a’ b 0 0 θ =arctan(b/a) R66 0 hexagons are rotations R77 a a’ b b’ θ θ 0 R88 0 Figure 2.3: Givens–rotations based QRD–updating scheme to update an R(k)–matrix. On top the new input vector is fed in, and for each row of the R(k)–matrix a Givens–rotation is executed in order to obtain an upper triangular matrix R(k + 1). 40 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS “desired response signal”). If (2.1) is solved instead of (2.6), both D(k) and W (k) reduce to a vector. D(k) will also be defined in a time–recursive fashion using weighting : T d (k + 1) D(k + 1) = . λD(k) (2.20) Using equation (2.14) it is found that the least squares solution to (2.19) is given by W (k) = (X(k)T X(k))−1 X T (k)D(k) = R(k)−1 QT (k)D(k) . 
| {z } (2.21) Z(k) Hence W (k) is computed by performing a triangular backsubstitution with left hand side matrix R(k) and right hand side matrix Z(k). From R(k) = QT (k)X(k) and Z(k) , QT (k)D(k) it follows that Z(k) can be obtained by expanding the QRD– updating procedure with the desired signals part, i.e. applying the QRD–updating procedure to X(k) D(k) instead of X(k) only, as shown in Figure 2.4 with d(k) = d1 (k) d2 (k) T . At any point in time the least squares solution W (k) may then be computed based on the stored R(k) and Z(k) according to formula (2.21). The update equation becomes : T 1 0 x (k + 1) dT (k + 1) = 0 Q(k) λR(k) λz(k) 1 0 0 rT (k + 1) Q(k + 1) . 0 Q(k) R(k + 1) z(k + 1) | {z } (2.22) [∗|Q(k+1)] RLS has a rather large computational complexity, but (unlike NLMS) it shows a very good performance that is independent of the input signal statistics. Furthermore, it has been shown in [39] that rT (k + 1) dT (k + 1) − xT (k + 1).W (k) = QM N , i=1 cos θi (k + 1) where rT (k + 1) = ε1 ε2 (2.23) 2.3. RECURSIVE LEAST SQUARES ALGORITHMS 41 is a byproduct of the extended QRD–updating process, as indicated in Figure 2.4. This means that we can extract (a priori) least squares residuals without having to calculate the filter coefficients W (k) first. This is referred to as ’residual extraction’. Note that the denominator in (2.23) can not become zero because since the denominator of (2.18) will never be zero. For the a posteriori residuals, we can write dT (k + 1) − xT (k + 1).W (k + 1) = M N Y cos θi (k + 1)rT (k + 1). (2.24) i=1 The signal flow graph for the whole procedure as given in Figure 2.4 corresponds to Figure 2.3 with the right hand side columns with inputs d1 (k) and d2 (k) added to the right, as well as a “Π cos θ accumulation chain” added to the left. The complexity of this scheme is still O(M 2 N 2 ) per time update. Algorithm 5 gives details about the QRD–RLS procedure. Algorithm 5 Update of the QRD–RLS algorithm QRDRLS_update (R, x, r, Weight) { // x is the input vector // r is the desired signal input (a scalar in this case) // the upper triangular matrix R is being updated, along with // the vector z which is the right hand side // the residual signal is returned PiCos = 1; for (i = 0; i < M * N; i++) { R[i][i] *= Weight; temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]); sinTheta = x[i] / temp; cosTheta = R[i][i] / temp; R[i][0] = temp; for (j = i+1; j < M * N; j++) { temp = R[i][j] * Weight; R[i][j] = cosTheta * temp + sinTheta * x[j]; x[j] = -sinTheta * temp + cosTheta * x[j]; } temp = z[i] / Weight; z[i] = cosTheta * temp + sinTheta * r; r = -sinTheta * temp + cosTheta * r; PiCos *= cosTheta; } return r * PiCos; } 42 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS x1(k+1) filter input 1 x2(k+1) filter input 2 ∆ x1(k) ∆ x2(k) ∆ x1(k−1) ∆ x2(k−1) ∆ x1(k−2) x2(k−2) ∆ d (k+1) d (k+1) 1 2 Z(k+1) 1 0 0 R11 0 R22 0 0 0 R33 0 R44 0 0 0 R55 0 R66 0 0 0 R77 0 R88 0 Π cos θ ε1 ε2 r(k+1) LS residual Figure 2.4: QRD–RLS algorithm. The right hand side (desired signal) is updated with the same rotations as the left hand side. 2.3.4 QRD–based least squares lattice (QRD–LSL) It is well known that the shift–structure property of the input vectors of Figure 2.4 can be exploited to reduce the overall complexity. It can be shown [46] that a QRD–RLS scheme as shown in Figure 2.4 is equivalent to the scheme of Figure 2.5, which requires only O(M 2 N ) flops per update instead of O(M 2 N 2 ) for the original scheme. 
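To make the QRD-updating and residual extraction steps of sections 2.3.2-2.3.3 concrete, the following sketch (ours, in Python/NumPy rather than the C-like pseudocode of the algorithm boxes) rotates a new data row into R and the right hand side z and returns the a posteriori residual through the product of cosines. Initialization, weighting and all variable names are illustrative choices, not the thesis' reference implementation.

import numpy as np

def qrd_rls_update(R, z, x, d, lam):
    # Rotate the new row [x^T | d] into the exponentially weighted triangular
    # factor R and right hand side z (one Givens rotation per row of R), and
    # return the a posteriori residual via the product of cosines
    # (residual extraction, cf. (2.23)-(2.24) and Algorithm 5).
    x = x.astype(float).copy()
    d = float(d)
    pi_cos = 1.0
    for i in range(R.shape[0]):
        R[i, i:] *= lam                       # exponential weighting of row i
        z[i] *= lam
        r = np.hypot(R[i, i], x[i])
        c, s = R[i, i] / r, x[i] / r          # rotation that zeros x[i]
        R[i, i] = r
        t = R[i, i + 1:].copy()
        R[i, i + 1:] = c * t + s * x[i + 1:]
        x[i + 1:] = -s * t + c * x[i + 1:]
        t = z[i]                              # same rotation on the right hand side
        z[i] = c * t + s * d
        d = -s * t + c * d
        pi_cos *= c
    return d * pi_cos

# toy use: identify a 4-tap filter from noisy data
rng = np.random.default_rng(0)
w_true = rng.standard_normal(4)
R, z, lam = np.eye(4) * 1e-3, np.zeros(4), 0.999
xbuf = np.zeros(4)
for k in range(500):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    dk = xbuf @ w_true + 1e-3 * rng.standard_normal()
    e_post = qrd_rls_update(R, z, xbuf, dk, lam)
w_hat = np.linalg.solve(R, z)                 # solve R w = z, cf. (2.21)
print("filter error:", np.linalg.norm(w_hat - w_true))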
Since N (the number of filter taps) is typically larger than M (the number of microphones), this amounts to a considerable complexity reduction. The complexity reduction stems from replacing the off–diagonal part of the triangular structure (the 43 2.3. RECURSIVE LEAST SQUARES ALGORITHMS diagonal part is seen to be still in place), by the computations in the added left hand part. The resulting algorithm is called QRD–LSL (QRD–based least squares lattice), and it is known to be a numerically stable implementation of the RLS algorithm, since it only uses stable orthogonal updates as well as exponential weighting. Note that QRD–LSL needs to read the input one sample ahead as compared to QRD–RLS. For further details on the QRD–LSL derivation, we refer to [46]. In Algorithm 6 the QRD–LSL adaptive filter is given in pseudocode. filter input 1 x1(k+2) filter input 2 x2(k+2) 1 0 0 0 0 ∆ 0 0 0 ∆ 0 0 0 ∆ d (k+1) d (k+1) 1 2 x2(k+1) ∆ R11 0 R22 R33 0 R44 0 ∆ 0 x1(k+1) 0 ∆ 0 ∆ R55 0 R66 0 ∆ 0 0 R77 0 R88 0 Πcos θ ε1 ε2 LS residual Figure 2.5: QRD–LSL. Notice that the inputs for the right hand side part are the desired signals at time k + 1, while the inputs for the left hand side are the input signals at time k + 2 2.3.5 RLS versus LMS RLS is much more complex than LMS, but the performance for colored signals like speech is often better. In formula (2.12) the updating equation for LMS can be recognized, with a ’prewhitening’ added in the form of the multiplication with the inverse correlation matrix. 44 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 6 QRD–LSL update Update(R, x, Weighting, RightHandSideMatrix, r, PiCos) { for (i = 0; i < M*N; i++) { for (j = i; j < M*N; j++) { R[i][j] *= Weighting; } GivensCalcAngle(SinTheta, CosTheta, R[i][i], x[i]); for (j = i+1; j < M*N; j++) { GivensRotate(SinTheta, CosTheta, R[i,j], x[j]); } for (j=0; j < RightHandSideMatrix.GetNrColumns(); j++) { RightHandSide[i][j] *= Weighting; GivensRotate(SinTheta, CosTheta, RightHandSideMatrix[i][j], r[j]); } PiCos *= CosTheta; } return r; } ProcessNewInput(x, Weight, Desired) { PiCos=1; xl = x; xr = x; delay[0] = x; for (int i=0; i < N; i++) { dxl = delay[1]; dxr = dxl; Update(RightR[i], dxr, dWeight, [RotationsRight[i] z[i]], [xr dDesired], PiCos); if (i < N-1) { Update(LeftR[i], xl, dWeight, RotationsLeft[i], dxl, 0); delay[i] = dxl; } xl = xr; } for (int i = 0; i < N-1; i++) { delay[i+1] = delay[i]; } return dDesired*dPiCos; } 2.4. AFFINE PROJECTION BASED ALGORITHMS 2.4 45 Affine Projection based algorithms We will introduce the affine projection algorithm, and its time domain fast version named FAP. 2.4.1 The affine projection algorithm The affine projection algorithm (APA) [43] is an ’intermediate’ algorithm in between the well known NLMS and RLS algorithms, since it has both a performance and a complexity in between those of NLMS and RLS. It is (for the case of a single desired signal) based upon a system of equations of the form d(k) d(k − 1) (2.25) XP (k)T wapa (k − 1) = = dP (k), .. . d(k − P + 1) x(k) x(k − 1) . . . x(k − P + 1) , XP (k) = where N is the filter length, P (with P < N ) is the number of equations in the system. The ’basic’ system of equations (2.1) can again be recognized, this time with a smaller number (P ) of equations. The APA–recursion for a single desired signal is given as : e(k + 1) = dP (k + 1) − XPT (k + 1)wapa (k), g(k + 1) = (XPT (k + 1)XP (k + 1) + δI)−1 e(k + 1), wapa (k + 1) = wapa (k) + µXP (k + 1)g(k + 1) y(k + 1) = xT (k + 1)wapa (k + 1). 
(2.26) Here µ and δ are a step size and a regularization parameter respectively. The regularization parameter is important in providing noise robustness, as will be explained in section 2.4. P is a small number (e.g. 10) compared to N (e.g. 1000 or 2000). The first element of eP (k + 1) is the a priori error of the ’most recent’ equation in each step. An algorithm specification is given in Algorithm 7. The complexity of the algorithm is O(M N P ). Also this algorithm is easily extended to multiple right hand sides (2.6). 2.4.2 APA versus LMS Just like for the RLS–algorithm, one can recognize an LMS–filter in (2.26), preceeded by a pre–whitening step on the input signal. The NLMS algorithm is as a matter of 46 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 7 Affine projection algorithm w=0 Loop (new x and d in each step): add column x to XP as first column remove last column from XP add d as first element from dP remove bottom element from dP e = dP − XPT w g = (XPT XP + δI)−1 e w = w + µXP g y = xT w fact a special case of the APA–algorithm with P = 1. For µ = 1 and δ = 0, the a posteriori error in (2.26) epost (k + 1) = dP (k + 1) − XPT (k + 1)w(k + 1) is zero, i.e. APA basically solves the P most recent equations exactly, based on a minimum norm weight vector update. Remember that the NLMS–algorithm applied a minimum norm update to the solution vector such that the a posteriori error of the most recently added equation is exactly zero. A geometrical interpretation will be given in section 2.5. 2.4.3 The Fast Affine Projection algorithm (FAP) A fast version of APA, called FAP, which has a complexity of 4N + 40P flops is derived in [26]. Since typically P N , FAP only has a small overhead as compared to NLMS. This complexity reduction is accomplished in two steps. First, one only calculates the first element of the P –element error vector eP (k) (see formula 2.26) and one computes the other P − 1 elements as (1 − µ) times the previously computed error. As stated in [26], this approximation is based upon an assumption about the regularization by diagonal loading (δI) : ei (k) = epost i−1 (k − 1) for i > 1. Here ei (k) means the i’th component of the vector e(k), and similarly for epost i (k). Indeed, we have epost (k) = dP (k) − XPT (w(k − 1) + µX(k)(X T (k)X(k) + δI)−1 e(k)) | {z } w(k) ≈ e(k) − µe(k) ≈ (1 − µ)e(k). 47 2.5. GEOMETRICAL INTERPRETATION As shown in [26], this eventually leads to e(k) ≈ e1 (k) −−−−−−− (1 − µ)e1 (k − 1) .. . (1 − µ)eP −1 (k − 1) , (2.27) where e1 (k) = d(k) − xT (k)w(k − 1). Note that with a stepsize µ = 1, the P − 1 lower equations would have been solved exactly already in the previous time step, and hence their error would indeed be zero. A second complexity reduction is achieved by delaying the multiplications in the matrix–vector product X(k)g(k) in equation 2.26. This results in a ’delayed’ coefficient vector ŵ(k − 1) = w(0) + µ k−1 X x(k − l) l=P l X gj (n − l + j), j=0 such that w(k) = ŵ(k − 1) + µXP (k)f (k), where f (k) = g1 (k) g2 (k) + g1 (k − 1) .. . gP −1 (k) + . . . + g1 (k − P + 1) . It can be shown that an updating formula for ŵ(k) exists. A correction term can be used to obtain the residual at time k without having to calculate w(k) first. Details on the derivation can be found in [26]. Algorithm 8 is a full description of FAP. The complexity of the FAP adaptive filter can even be further reduced by using frequency domain techniques. 
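As an illustration of Algorithm 7 and (2.26), the following is a minimal NumPy sketch of one regularized APA iteration; the filter length, P, delta, mu and the toy signals are arbitrary choices of ours, and the small P x P system is solved directly for brevity.

import numpy as np

def apa_step(w, X_P, d_P, x_new, d_new, mu=1.0, delta=1e-2):
    # One affine projection update (cf. Algorithm 7).
    # X_P : N x P matrix with the P most recent input vectors as columns.
    # d_P : length-P vector with the corresponding desired samples.
    X_P = np.column_stack((x_new, X_P[:, :-1]))       # shift in newest column
    d_P = np.concatenate(([d_new], d_P[:-1]))
    e = d_P - X_P.T @ w                               # a priori error vector
    P = X_P.shape[1]
    g = np.linalg.solve(X_P.T @ X_P + delta * np.eye(P), e)   # diagonal loading
    w = w + mu * X_P @ g                              # minimum norm weight update
    y = x_new @ w                                     # filter output
    return w, X_P, d_P, y

# toy echo path identification, N = 64 taps, P = 4 equations
rng = np.random.default_rng(1)
N, P = 64, 4
w_true = rng.standard_normal(N) * np.exp(-np.arange(N) / 20.0)
w = np.zeros(N)
X_P, d_P = np.zeros((N, P)), np.zeros(P)
xbuf = np.zeros(N)
# coloured (lowpass) far end signal, the case where APA pays off over NLMS
far_end = np.convolve(rng.standard_normal(4000), np.ones(8) / 8, mode="same")
for k in range(len(far_end)):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = far_end[k]
    d = xbuf @ w_true + 1e-3 * rng.standard_normal()  # echo plus near end noise
    w, X_P, d_P, y = apa_step(w, X_P, d_P, xbuf, d)
print("misalignment:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))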
In chapter 4, the Block Exact Fast Affine Projection (BEFAP) adaptive filter [59] will be reviewed, and we will derive a block exact version of APA, without the FAP–approximations, but with almost equal complexity as BEFAP. 2.5 Geometrical interpretation All algorithms update their filter coefficient vector in each time step. The similarity between (2.10), (2.12) and (2.26) is obvious. For NLMS with µ = 1 and δ = 0 the a posteriori error is zero in each step, and this corresponds to the fact that the most recent equation in a system of equations like (2.6) 48 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 8 Fast affine projections (FAP) for N filter taps which outputs the residuals of the filtering operation. The notation •a:b denotes the vector formed by the a’th to b’th component (inclusive) of vector • Loop : rxx = rxx + x(k).x2:P (k) + x(k − N )x2:P (k − N ) ê1 = d − xT ŵ e1 =ê1 − µrTxx f1:P −1 e1 (1 − µ)e1 e= . . . (1 − µ)eP −1 Update S = (X T X + δ)−1 0 f1 f = . + Se . . fP −1 ŵ = ŵ + µx(k − P + 1)fP e=e output e1 or d − e1 is solved exactly. For APA (µ = 1 and δ = 0) the a posteriori error vector of size P is zero, which means that the P most recent equations in (2.6) are exactly solved. This is possible as long as P ≤ N . When P > N we can only solve the system of equations in the least squares sense, which then corresponds to an RLS algorithm with a sliding window. So APA is clearly an intermediate algorithm in between RLS and NLMS in view of complexity, but also in view of performance. P is a parameter that can be tuned in function of the available processing power, where a larger P results in higher complexity, but improved performance. The fact that the performance of APA is intermediate between NLMS and RLS, can be shown geometrically. Figure 2.6 shows a geometric representation of the convergence of an NLMS filter with two filter taps. Assume the optimal filter vector that has to be identified by the process to be w. The vectors xi are the consecutive input vectors, while the points wi are the estimates of the filter vector in successive time steps. Assume the estimate of the filter vector at time 0 to be w0 . When a new input x1 arrives, w0 will be updated in the direction of x1 such that the error in the direction of x1 becomes zero (µ = 1 is assumed). This means that w0 is projected on to a line (=affinity) that is orthogonal to the input vector x1 , and that contains the vector w. When the process continues, we see that the estimates converge to w. The convergence rate is higher when the directions of the input vectors are ’white’, which means if they are effectively uncorrelated. 49 2.5. GEOMETRICAL INTERPRETATION W X4 W5 X3 W4 X2 X1 W3 W2 W1 W0 X5 Figure 2.6: Geometrical interpretation of NLMS. The estimate of the filter vector is projected upon an affinity of the orthogonal complement of the last input vector, such that this affinity contains the ’real’ filter vector. W W0 W1 X2 X1 Figure 2.7: Geometrical interpretation of APA. The estimate of the filter vector is projected upon an affinity of the orthogonal complement of the last P input vectors. This affinity contains the ’real’ filter vector. 50 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS For the APA–algorithm, we have a sketch in Figure 2.7 for a system with 3 filter taps and the APA–order P = 2. Now the estimate w0 is projected on the affinity of the orthogonal complement of the last P = 2 input vectors, such that this affinity contains the solution vector w (again, if stepsize µ = 1). 
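This geometric picture can be checked numerically. The short sketch below (ours; the thesis contains no code here) verifies that an NLMS update with mu = 1 zeros the a posteriori error of the newest equation only, whereas an APA update with mu = 1 and delta = 0 zeros the a posteriori errors of the P most recent equations.

import numpy as np

rng = np.random.default_rng(2)
N, P = 8, 3
w = rng.standard_normal(N)                 # current filter estimate
X = rng.standard_normal((N, P))            # P most recent input vectors (columns)
d = rng.standard_normal(P)                 # corresponding desired samples

# NLMS (= APA with P = 1): only the newest equation is solved exactly
x, d0 = X[:, 0], d[0]
w_nlms = w + x * (d0 - x @ w) / (x @ x)
print(d0 - x @ w_nlms)                     # ~0: a posteriori error of newest equation

# APA with mu = 1, delta = 0: the P most recent equations are solved exactly,
# via a minimum norm update (projection onto their common solution affinity)
g = np.linalg.solve(X.T @ X, d - X.T @ w)
w_apa = w + X @ g
print(d - X.T @ w_apa)                     # ~0 vector: all P a posteriori errors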
From this picture it can be seen that APA converges faster than NLMS when x1 and x2 have almost the same direction (i.e. when the input vectors are correlated). This intuitive geometrical interpretation shows that APA is an extension of the NLMS algorithm, and at the same time it explains the name of the affine projection algorithm.

2.6 Conclusion

In this chapter, we have reviewed some adaptive filtering algorithms that will be important in the rest of this text. The NLMS algorithm is a cheap algorithm (O(4MN) complexity) that exhibits performance problems when non-white input signals are used, because its convergence speed depends on the input signal statistics. On the other hand, the RLS algorithm, which performs very well even for non-white signals, is much more expensive (O((MN)^2) complexity for the standard versions, and O(M^2 N) for stable fast versions like QRD-LSL). A class of 'intermediate' algorithms is the APA family of adaptive filters. These filters have a design parameter P with which both complexity and performance can be tuned. RLS and APA can be seen as an LMS filter with additional pre-whitening. Until recently, only NLMS filters and even cheaper frequency domain variants were used to implement acoustic echo cancellation, because of their complexity advantage whenever long adaptive filters are involved, although both RLS and APA are well known to perform better. Due to the increase in computing power over the years, APA filters are increasingly finding their way into this field as well. We continue in this direction by proposing a new fast version of the APA algorithm in the next chapters.

Chapter 3
APA-regularization and Sparse APA for AEC

In the previous chapter we have reviewed the adaptive filtering algorithms that are important in this thesis. In this and the next chapter, we will apply the affine projection algorithm to the problem of acoustic echo cancellation. APA has become a popular method in adaptive filtering applications. Fast versions of it have been developed, such as FAP (chapter 2) and the frequency domain Block Exact Fast Affine Projection (BEFAP) [59]. In this chapter, we focus on three main topics, namely regularization of APA, problems that exist in conventional fast APA implementations when regularization is applied, and finally regularization of APA in multichannel acoustic echo cancellation.

In a traditional echo canceller system, the adaptation is switched off by a control algorithm when the microphone signal contains a near end (speech) signal. However, robustness against continuously present near end noise is also very important, especially for APA-type algorithms, which indeed tend to exhibit a large sensitivity to such near end noise. Regularization is generally used to obtain near end noise robustness. We will review two alternatives for regularization, and introduce a third alternative (which we will call the 'sparse equations' technique). Existing fast implementations of the affine projection algorithm (based upon FAP) exhibit problems when much regularization is used. Besides that, the FAP algorithm cannot be used with the sparse equations technique that we will derive. We will show this in section 3.3, and this motivates further algorithm development in chapter 4.

The outline of this chapter is as follows: in section 3.1, we will first state the problem that occurs if near end noise is present, and how diagonal loading and exponential
APA–REGULARIZATION AND SPARSE APA weighting (regularization) can be used to resolve this . In section 3.2 we will introduce a ’sparse equations’ regularization technique, which will also reduce the influence of near end noise. The problems in the FAP algorithm when much regularization is used, are demonstrated in section 3.3. In sections 3.4 and 3.5 experimental results are given and the behaviour of multichannel echo cancellation is studied when regularization is applied. Conclusions are given in section 3.6. 3.1 APA regularization In this section we will review why regularization is important for APA–type algorithms in case near end noise sources are present. ’Diagonal loading’ and exponential weighting as regularization methods are also reviewed. In the next section, we will introduce a third alternative, which we will call the ’sparse equations technique’. 3.1.1 Diagonal loading The (semipositive definite) covariance matrix X T (k)X(k) that is used (and inverted) in the APA–expressions (2.26) is regularized by adding a small constant δ times a unity matrix (diagonal loading). The equations are repeated here for the update from k − 1 to k (for convenience) : e(k) = dP (k) − XPT (k)w(k − 1) g(k) = (XPT (k)XP (k) + δI)−1 e(k) . w(k) = w(k − 1) + µX(k)g(k) The obvious effect of this is that the matrix can not become indefinite, but regularization also has a beneficial effect when near end noise is present. This is shown in [27] as follows. Rewrite dP (k) as dP (k) = x(k).wreal + n(k) with wreal the room impulse response we are looking for. The vector n(k) consists of only the near end noise in absence of a far end signal. We can derive the formula for the difference vector ∆w(k) between the real impulse response wreal and the identified impulse response at time k, w(k), namely ∆w(k) ≡ w(k) − wreal , ∆w(k) (3.1) = I − µ XP (k)(XPT (k)XP (k) + δI)−1 XPT (k) ∆w(k − 1) + | {z } P (k) µ XP (k)(XPT (k)XP (k) | {z P̃ (k) + δI)−1 n(k). } (3.2) 53 3.1. APA REGULARIZATION If XP (k)is written as its singular value decomposition, XP (k) = U (k)Σ(k)V T (k), we can write P (k) = XP (k)(XPT (k)XP (k) + δI)−1 XPT (k) ⇓ XPT (k)XP (k) , U (k)diag(σ02 (k), . . . , σP2 −1 (k))U T (k) P (k) = U (k)diag( (3.3) σP2 −1 (k) σ02 (k) , . . . , , 0N −P )U T (k), σ02 (k) + δ σP2 −1 (k) + δ where σi (k) are the singular values of XP (k), and U (k) and V (k) are orthogonal. These equations show that δ has an effect both on the adaptation (first term of equation 3.2), and on the near end noise amplification matrix (second term of equation 3.2). P (k) can be interpreted as an almost–projection matrix. If δ is chosen to be the power of the background noise n(k), replacing XP (k) by its singular value decomposition reveals that the directions (see section 2.5) in the adaptation of w(k) with large signal σi2 ≈ 1) and that (unreliable) updates in to noise ratios are retained (since then σ2 +δ i σ2 i directions with small signal to noise ratios are reduced ( σ2 +δ ≈ i obvious choice for δ concerning its influence on the adaptation. σi2 δ ). Hence this is the In the second term of equation 3.2, the continuously present background noise is seen to be multiplied by the matrix µP̃ (k) : P̃ (k) P̃ (k) = XP (k)(XPT (k)XP (k) + δI)−1 ⇓ XPT (k)XP (k) , U (k)diag(σ02 (k), . . . , σP2 −1 (k))U T (k) σ0 (k) σP −1 (k) = U (k)diag( 2 ,..., 2 , 0N −P )V T (k). 
σ0 (k) + δ σP −1 (k) + δ (3.4) Since U (k) and V (k) are orthogonal matrices, the noise amplification factors for the i–th mode are in the k–th step given as τ (σi (k), δ) = µ σi (k) . σi2 (k) + δ (3.5) So the larger δ is chosen, the less the near end noise is amplified into the adaptation of w(k). The conclusion is that by the proposed choice of δ, the amplification of the near end noise is prevented, while the adaptation itself is only reduced in directions with a low SNR. Figure 3.1 shows the echo energy loss for an acoustic echo canceller against time for some speech signal in the presence of near end noise. The dotted line is the loss for an unregularized APA–algorithm, the full line results when a properly chosen regularization term is applied before inverting the correlation matrix. Both are plotted on a logarithmic scale. The regularized case is better. 54 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA Attenuation [dB] 60 50 40 30 20 10 0 0 2 4 6 8 10 12 14 16 18 Time [s] Figure 3.1: When near end noise is present, the dotted line is the echo energy loss in dB for affine projection without regularization, the full line for affine projection with regularization. The graph shows a better result when regularization is applied. Often a Fast Transversal Filter (FTF) algorithm is used [26] to update the inverse correlation matrix (XPT (k)XP (k))−1 , since then also regularization by diagonal loading can rather straightforwardly be built in. But since this type of algorithms is known to have poor numerical properties, we propose to use QR–updating instead. The update equations for APA then become e(k) = dP (k) − XPT (k)w(k − 1) T R (k)RP (k)g(k) = e(k) (3.6) P w(k) = w(k − 1) + µXP (k)g(k) The second equation in (3.6) can then be solved by first updating R(k) by means of Algorithm 4, and then performing two successive backsubstitutions (with quadratical complexity because R(k) is triangular). QR updating is numerically stable since it can be implemented by using only (stable) orthogonal rotations. QR–updating — just like the backsubstitutions — has a quadratic complexity, but since P (the dimension of (XPT (k)XP (k))−1 ) in acoustic echo cancelling applications typically is very small compared to the filter length (P = 2 . . . 10 while N = 2000), this is not an issue. The implementation cost for diagonal loading in the FAP algorithm is zero if (as in the original algorithm) FTF is used to update the correlation matrices, but is is impossible to implement this when (as we propose) the stable QR–updating approach is used. Exponential weighting as a regularization technique on the other hand, fits in nicely 55 3.1. APA REGULARIZATION with the QR–updating approach. 3.1.2 Exponential weighting An alternative way to introduce ’regularization’ consists in using an exponential window for estimating the inverse covariance matrix [44]. The updating for the correlation matrix now becomes (XPT (k + 1)XP (k + 1))−1 = (λXPT (k)XP (k) + x(k + 1)xT (k + 1))−1 with x(k) = [ x(k) x(k − 1) . . . x(k − P + 1)]T . This is in contrast to the equations (2.25) and (2.26), where a sliding window is used. Figure 3.2 shows that when no noise is present, APA with a sliding window and APA with an exponential window both perform almost equally well. As shown in Figure 3.3, the regularization effect of using an exponential window keeps the coefficients from drifting away from the correct solution when noise is present. 
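A small sketch of the second equation of (3.6): once an upper triangular factor R_P of X_P is available, g follows from two O(P^2) triangular solves instead of an explicit matrix inversion. We obtain R_P here from a one-shot QR factorization purely for illustration; in the adaptive filter itself R_P would be tracked recursively with Givens rotations, and the sizes below are arbitrary.

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
N, P = 256, 8
X_P = rng.standard_normal((N, P))          # stand-in for the P recent input vectors
e = rng.standard_normal(P)                 # a priori error vector

# R_P is the Cholesky factor of X_P^T X_P; here from a one-shot QR,
# in the adaptive filter it is tracked with Givens up/downdates instead
R_P = np.linalg.qr(X_P, mode="r")

# solve R_P^T R_P g = e with two backsubstitutions (cf. (3.6))
tmp = solve_triangular(R_P, e, trans="T", lower=False)
g = solve_triangular(R_P, tmp, lower=False)

# same result as the explicit (unregularized) inversion
print(np.allclose(g, np.linalg.solve(X_P.T @ X_P, e)))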
APA (dotted) versus APA with exponential window (full), no noise present 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.2: For a simulated environment with a speech input signal, this plot shows the distance (the norm of the difference) between the real filter and the identified filter coefficient vectors against time, both for original APA with a sliding window (dotted line) and APA with an exponential window (full line). This experiment shows the noiseless case, performance of both algorithms is equal. 56 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA APA (dotted) versus APA with exponential window (full line) when noise is present 0.2 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.3: Distance between identified and real filter vector versus time for the case where the echo signal is a speech signal, and near end noise is present. The full line is APA with an exponential window, the dotted line is the original APA algorithm without regularization. The identified filter coefficients get much closer to the real coefficients in case regularization (by using an exponential window) is used. 3.2 APA with sparse equations In this section we will derive a third alternative for incorporating regularization into the affine projection algorithm, which we call ’Sparse Equations’ technique. Diagonal loading is not easily implemented when the QR–updating technique is used, but both exponential weighting (see previous section) and the ’Sparse Equations’–technique on the other hand, are. Equation 3.5 shows that for δ = 0 noise amplification will be smaller if the smallest singular values are larger. So every method that realizes this is suitable to be used instead of explicit regularization. The reason why the singular values become small, lies in the autocorrelation that exists in the filter input signal. The system of P consecutive equations that is solved in APA, will therefore have a large condition number. This leads to the idea of using nonconsequtive equations. The equations will be less correlated since a typical speech autocorrelation function decreases with the time lag. We call the non–consequtive equations sparse equations. We will develop this further for equally spaced sparse equations. The matrix XP (k) ∈ RN ×P in the equations (2.25) to (2.26) is replaced by a matrix 57 3.2. APA WITH SPARSE EQUATIONS eP (k) ∈ RN ×P as follows. X = d̃P (k) − X̃PT (k)w̃(k − 1) ePT (k)X eP (k))−1 ẽ(k) = (X (3.7) [x(k) x(k − D) . . . x(k − (P − 1)D)] T d(k) d(k − D) . . . d(k − (P − 1)D) = (3.10) ẽ(k) g̃(k) w̃(k) (3.8) (3.9) = w̃(k − 1) + µX̃P (k)g̃(k) where eP (k) X d̃P (k) = Figure 3.4 shows the time behaviour of the smallest and the largest singular value of a e T (k)X eP (k)) regularized1 (XPT (k)XP (k) + δI) (explicit regularization case) and (X P Smallest and largest singular value. Full line : sparse equations, dotted : successive 3 10 σ 2 10 1 10 0 10 −1 10 −2 10 −3 10 −4 10 −5 10 0 0.5 1 1.5 2 2.5 4 Samples x 10 Figure 3.4: Smallest and largest singular value of input correlation matrix for a speech signal in function of time. Dotted line : explicit regularization (δ = 0.1). Full line : sparse equations (D = 10). Signal peak value = 0.1, P = 10, N = 1024. Regularization parameters were tuned for equal initial convergence in an echo canceller setup. (sparse equations case) for a speech signal, plotted on a logarithmic scale. 
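The construction of the sparse-equations data matrix in (3.9)-(3.10) amounts to taking every D-th input vector as a column. The toy sketch below (our code; the lowpass signal, N, P and D are illustrative choices) builds both the consecutive and the spaced data matrix for a strongly autocorrelated input and prints the smallest singular value and condition number of each, which for such signals typically comes out in favour of the spaced equations, in line with Figure 3.4.

import numpy as np

rng = np.random.default_rng(4)
N, P, D = 512, 10, 10
# strongly autocorrelated (lowpass) input, mimicking speech-like correlation
x = np.convolve(rng.standard_normal(5000), np.ones(16) / 16, mode="same")

def data_matrix(x, k, N, P, spacing):
    # N x P matrix whose p-th column is the input vector at time k - p*spacing
    return np.column_stack([x[k - p * spacing - N + 1 : k - p * spacing + 1][::-1]
                            for p in range(P)])

k = 4000
X_std    = data_matrix(x, k, N, P, 1)      # consecutive equations (standard APA)
X_sparse = data_matrix(x, k, N, P, D)      # equations spaced D samples apart

for name, X in [("consecutive", X_std), ("sparse, D=10", X_sparse)]:
    s = np.linalg.svd(X, compute_uv=False)
    print(f"{name:12s}  sigma_min = {s[-1]:.3e}  condition = {s[0] / s[-1]:.1f}")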
Figure 3.4 shows that the matrix constructed using sparse equations typically has a better condition number than the explicitly regularized one. 1 In order to provide a fair comparison, regularization parameters have been tuned so that the initial convergence performance of an APA–adaptive filter is equal in both cases 58 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA There is a restriction though : the input signal x(k) has to be nonzero over the considered time frame (just some background (far end) noise is enough), because otherwise its covariance matrix even with sparse equations will become zero, (i.e. singular). Notice the silence in the input signal in the beginning of the plot. Since a control algorithm is available in every practical implementation of an echo canceller, its internal signals can be used to switch off adaptation when there is no far end signal present, and if there is a signal present, its covariance matrix should be of full rank. Experiments confirm that this is always the case in practice for speech signals. We will again use QR–updating (and downdating, see below) to track the covariance matrix (see section 2.3.2) . XP (k) = QP (k)RP (k) −1 −T T (XPT (k)XP (k))−1 = (RP (k)QTP (k)QP (k)RP (k))−1 = RP (k)RP (k) (3.11) (3.12) Here RP (k) ∈ RP ×P is an upper triangular matrix, QP (k) is an orthogonal matrix. Equation (3.12) shows that only the upper triangular matrix RP (k) needs to be stored and updated. Equation (3.8) can then be calculated by backsubstitution. (An alternative would be using inverse QR–updating [29] instead of QR–updating and multiplications instead of backsubstitutions ). From (2.25) and (3.10) it is seen that for updating XP (k) to XP (k + 1), instead of adding a column to the right and removing a column from the left, also a row can be added to the top, and one removed from the bottom. This translates into size P updates and downdates for the upper triangular matrix RP (k). Updating can be done using Givens-rotations on RP (k) for the updates (which corresponds to adding a row to XP (k)). Similarly, downdating is performed using hyperbolic rotations [29]. The procedure (and SFG, see Figure 2.3) is similar to the QRD–updating procedure, only now hyperbolic transformations of the form 0 a cosh(θ) − sinh(θ) a = b0 − sinh(θ) cosh(θ) b where angle θ is computed in a diagonal processor in the signal flow graph. The downdating algorithm is given in Algorithm 9, together with the function to update R with a rectangular window. In this way a rectangular window is implemented. Because the hyperbolic rotations are not numerically stable, it is interesting to make the window (weakly) exponential by multiplying the matrix RP (k) with a weighting factor λ (very close to 1) in each step. In this case, the filter weights must be compensated. This is due to the fact that XP (k) is updated row by row, while the actual input vectors are the columns. So the ’compensated’ filter vector becomes w̃comp (k) = diag( 1 λN −1 ,..., 1 )w̃(k) λ0 3.2. APA WITH SPARSE EQUATIONS 59 Algorithm 9 QRD–downdating and tracking of R(k) with a rectangular window. If λ 6= 1 the filter vector should be compensated. 
DowndateQRD (R, x) { // x is input vector // an upper triangular matrix R is being downdated for (i = 0; i < M * N; i++) { if (abs(x[i]) < abs(R[i][i])) { temp = x[i]/R[i][i]; coshTheta = 1 / sqrt(1-temp*temp); sinhTheta = coshTheta*temp; } else { temp = R[i][i]/x[i]; sinhTheta=1/sqrt(1-temp*temp); coshTheta=sinhTheta*temp; } R[i][i] = coshTheta * R[i][i] - sinhTheta * x[i]; for (j = i+1; j < M * N; j++) { temp = R[i][j] ; R[i][j] = coshTheta * temp - sinhTheta * x[j]; x[j] = -sinhTheta * temp + coshTheta * x[j]; } } } TrackRRectangularWindow { UpdateQRD(R,xk:−1:k−P +1 ,λ) DowndateQRD(R,xk−N :−1:k−N −P +1 ) } 60 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA How much decorrelation is provided by choosing D larger is of course dependent upon the statistics of the far end (echo reference) signal. In our experiments we have taken a fixed value of D. It should be noted that the complexity and memory requirements of the implementation will rise for larger D. If D is chosen large, more ’past information’ is considered for the estimation of the input statistics, (which is also the case for exponential weighting of course), so tracking of the input signal statistics will become slower. The plots in Figure 3.5 show the evolution of the distance between a (synthetically generated) room impulse response and what APA (dotted line) and APA with sparse equations (full line) identify as the filter vector. In Figure 3.5, there is no near end noise present, and then both methods have almost equal performance. In Figure 3.6, a small quantity of white near end noise disrupts the adaptation of the filter coefficients, which now at some points have a tendency to move away from their optimum values. The sparse equations setup can be seen to perform better than the setup with explicit regularization. This experiment was repeated with different distances between the equations and different regularization factors. Here, a distance of D = 5 was chosen, compared to a δ = 0.01 (where the maximum signal level is 0.1). Distance between real room response and (1) APA (dotted) (2) Sparse−APA (full) 0.2 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.5: Distance between real room response and w(k), the identified filter vector in function of time (speech input). Dotted line is regularized APA (δ = 0.01) , full line is sparse APA (D = 5). No near end noise is present. Sparse APA converges somewhat slower. The tree alternatives for regularization we have described, can all be used, and even combined. If QR–updating is used to keep track of the covariance matrix, regularization by diagonal loading is difficult to implement, but both exponential weighting and the sparse equations technique are valid choices. 61 3.3. FAP AND THE INFLUENCE OF REGULARIZATION norm van fout van disp−w en van gewone w (Volle lijn = disp). 0.2 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.6: Distance between real room response and w(k) for a speech signal. Dotted line is regularized APA (δ = 0.01), full line is sparse APA (D = 5). Near end noise is present. Sparse APA is shown to be a viable alternative for regularization It is also possible to combine the sparse equations technique with exponential weighting. This can easily be done by leaving out the downdates and the compensation for the filter weights. The sparse equations technique provides one with an extra parameter when regularizing APA, or can be used as a standalone technique for regularization. 
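A compact NumPy transcription of the sliding (rectangular) window tracking of R_P, i.e. one Givens update for the newest row followed by one hyperbolic downdate for the oldest row (cf. Algorithm 9). This is only a sketch under the assumption that every downdate is valid, and all sizes and signals are toy choices of ours.

import numpy as np

def givens_update(R, x):
    # Rotate row vector x into upper triangular R (adds x x^T to R^T R).
    x = x.copy()
    for i in range(len(x)):
        r = np.hypot(R[i, i], x[i])
        c, s = R[i, i] / r, x[i] / r
        Ri = R[i, i:].copy()
        R[i, i:] = c * Ri + s * x[i:]
        x[i:] = -s * Ri + c * x[i:]        # x[i] becomes zero here
    return R

def hyperbolic_downdate(R, x):
    # Remove row vector x from R (subtracts x x^T from R^T R), cf. Algorithm 9.
    # Assumes |x[i]| < |R[i, i]| at every step, i.e. a valid downdate.
    x = x.copy()
    for i in range(len(x)):
        t = x[i] / R[i, i]
        ch = 1.0 / np.sqrt(1.0 - t * t)
        sh = ch * t
        Ri = R[i, i:].copy()
        R[i, i:] = ch * Ri - sh * x[i:]
        x[i:] = -sh * Ri + ch * x[i:]
    return R

# sliding (rectangular) window of 50 rows over a stream of P-dimensional rows
rng = np.random.default_rng(5)
P, win = 6, 50
rows = rng.standard_normal((400, P))
R = np.linalg.qr(rows[:win], mode="r")
for k in range(win, rows.shape[0]):
    R = givens_update(R, rows[k])              # add the newest row
    R = hyperbolic_downdate(R, rows[k - win])  # drop the oldest row
X_win = rows[-win:]
print(np.abs(R.T @ R - X_win.T @ X_win).max())  # ~0: R tracks the window exactly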
3.3 FAP and the influence of regularization FAP was reviewed in section 2.4.3. An important observation is that the fast affine projection algorithm [26] builds upon some assumptions that are not valid anymore if the influence of the regularization becomes too large. The algorithm then starts to expose convergence problems, which is clearly shown in Figure 3.7 for a FAP algorithm with exponential weighting as a regularization technique. Figure 3.8 shows another example with explicit regularization for P = 10 with a ’strong’ regularization parameter (δ = 10). The plot shows the time evolution of the synthetically generated room impulse response and the filter vector estimated by both algorithm classes. APA is shown to perform better for this large regularization parameters than the FAP algorithm. This in particular will be a motivation for developing a fast (block exact) APA algorithm in chapter 4, as an alternative to the existing fast (block exact) FAP algorithms. 62 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA 0.25 0.2 0.15 0.1 0.05 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.7: Time evolution of the distance between identified and real filter for FAP (dotted) and FAP with an exponential window, λ = 0.9998 (full line) and a speech signal as far end signal. The approximations made in FAP are clearly not valid anymore when an exponential window is used. 0.2 |Wk−Wreal| 0.18 0.16 0.14 0.12 0.1 FAP, δ=10 0.08 0.06 APA, δ=10 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 Samples x 10 Figure 3.8: Behaviour of the FAP–based algorithms as compared to the APA–based algorithms when strong regularization is involved. The time evolution of the filter vector error norm (distance between real and identified room impulse response) is shown. The APA–algorithm has a much better convergence. The input is a speech signal. 3.4. EXPERIMENTAL RESULTS 3.4 63 Experimental results We will now show some experimental results concerning the regularization effect of the sparse equations technique on the echo canceller performance in the presence of near end noise. In all experiments in this section the same speech signal is used (maximum value of the signal is 0.1). The length of the echo canceller is 900 taps, and it tries to model a synthetically generated room impulse response of 1024 taps. This is a typical situation for echo cancelling : the ’real’ room impulse response is longer than the length of the acoustic echo canceller. The step size parameter is always µ = 1. In Figure 3.9, we compare the time evolution of weight error for APA (dotted line) and Sparse–APA (full line). In this simulation, white near end noise disturbs the adaptation of the filter coefficients. In the first half of the plot, the SNR is higher than in the second half of the plot. In all these experiments, the regularization parameters (δ for APA and the equation distance D for Sparse APA) were tuned to obtain an equal initial convergence, in order to have a fair comparison of the steady state performance. This was done by setting D = 5 for sparse–APA and then experimentally determining the value of δ (= 0.005) in order to get the same initial convergence. Sparse–APA where 10 out of 50 equations are used (so D = 5), outperforms explicitly regularized FAP (δ = 0.005) with 10 successive equations, both in the high and the low SNR part. Its performance is comparable with explicitly regularized APA (δ = 1) with 10 successive equations when the SNR is not too low, while else explicit regularization is better. 
On the plot also the performance of an explicitly regularized FAP–algorithm (δ = 1) with 50 successive equations is shown in order to show the performance drop if only 10 out of 50 equations are used. We can conclude that the performance of Sparse–APA where 10 out of 50 equations are used is better than the performance of FAP with 10 successive equations, if near end noise is present. The reason for this can probably be found in the regularizing effect of the sparse equations technique, and in the fact that the approximations in FAP are not present in Sparse– APA. Figure 3.9 also shows that regularization reduces FAP performance more than in the case of APA. In APA a regularization δ = 1 is needed to slow down the convergence to the same rate as the initial convergence for the sparse equations technique with D = 5. For FAP, the initial convergence speed has already decreased to that point with an explicit regularization of δ = 0.005. So this figure proves the performance benefit of using APA instead of FAP. In Figure 3.10, the tracking behaviour of Sparse APA is compared to the behaviour of APA, and it is the same for both algorithms when they are regularized comparably (equal initial convergence behaviour). In this experiment, D = 25 for Sparse APA, and δ = 0.1 for plain APA. P = 10 in both cases. 64 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA 0.18 |Wk−Wreal| 0.16 0.14 0.12 FAP, δ=.005, P=10 Sparse−APA, D=5, P=10 0.1 0.08 0.06 0.04 APA, δ=1 0.02 FAP, 0.5 1 1.5 δ=1, P=50 2 4 Samples x 10 Figure 3.9: Distance between real room response and w(k) in the presence of near end noise in function of time. Regularization parameters have been tuned to give equal initial convergence characteristics. The SNR of the far end speech signal versus the near end noise is higher in the first half of the signal than in the second half. Regularization reduces FAP performance more than APA performance. 3.5 Regularization in multichannel AEC An important issue is multichannel AEC, as we have already mentioned in chapter 1. When multiple loudspeakers are used to reproduce the sound that stems from one speech source, the non–uniqueness problem occurs [41, 4]. In Figure 3.11 the situation is depicted. Microphones in the far end room pick up the sound of ’Source’ filtered by the transmission room impulse responses g1 and g2 . These signals are then again filtered by the receiving room impulse responses h1 and h2 . If the length N of the echo canceller filter w is larger or equal than the length of the transmission room impulse responses, the following equation holds : g2 g1 T T x1 = x2 0 0 such that g2 0 X(k) −g1 = 0 0 which means that X(k) is rank deficient and hence that no unique solution exists for (2.7). 65 3.5. REGULARIZATION IN MULTICHANNEL AEC Tracking behaviour 0.25 |Wk−Wreal| 0.2 0.15 Sparse−APA 0.1 Explicitly regularized APA 0.05 0 0 0.5 1 1.5 2 2.5 4 x 10 Samples Figure 3.10: Tracking behaviour of Sparse APA (full line) compared to explicitly regularized APA (dotted line) (far end signal is a speech signal). Regularization is tuned to obtain equal initial convergence. At the 12000 th sample, the room characteristics change. The tracking behaviour remains equal. Note the small peaks that occur if the input signal is not ’persistently exciting’ (to be solved by the speech detection device) x1 h1 x2 Transmission room g1 h2 W g2 Source ei + di Receiving room Figure 3.11: The multichannel echo cancellation non uniqueness problem. 
Changes in either the transmission room or the receiving room will destroy successful echo cancellation when the exact paths h1 and h2 have not been identified by w = [ w_1^T  w_2^T ]^T.

Although the adaptive filter will find some solution, only the solution corresponding to the true receiving room echo paths is independent of the transmission room impulse responses. In a mono acoustic echo canceller setup, the filter has to re-adapt if the acoustical environment in the receiving room changes. If a multichannel echo canceller does not succeed in identifying the correct filter paths, changes in the transmission room will also result in a residual echo signal appearing in the acoustic echo canceller output. In practice this situation does not strictly occur, because for echo cancellation the filter length N is usually smaller than the length of the impulse responses in the transmission room, so that

x_1^T(k) [ g_2 ; 0 ] - x_2^T(k) [ g_1 ; 0 ] = \alpha \approx 0

rather than exactly zero. But still, this means that the problem is typically ill-conditioned. Attempts to solve this problem can be found in the literature [41, 58, 28, 7], and they consist of decorrelating the loudspeaker signals (i.e. reducing the cross-correlation between the inputs of the adaptive filter by means of additional filtering operations, non-linearities, noise insertion, etc.). Obviously it is important that this remains inaudible. In addition to these decorrelation techniques (which cannot be exploited too much because of the inaudibility constraint), it is important to use algorithms whose performance is less sensitive to correlation in the input signal than NLMS. In [4] it is shown that RLS performs well because of its independence of the input eigenvalue spread. Since RLS is an expensive algorithm, and APA is intermediate between NLMS and RLS, APA is often considered to be a good candidate for use in multichannel AEC [34, 17, 33].

Experiments show that the influence of near end noise on the adaptation is a lot larger for a multichannel setup than for a mono echo canceller based upon affine projection, and that for good results the regularization has to be a lot stronger, i.e. the problem that occurs in FAP-based algorithms is even more pronounced in this case. In [26], explicit regularization is suggested with δ equal to the near end noise power. But experiments show that this is not enough when there is a large cross correlation between the input channels and a large amount of noise is present. When appropriately strong regularization methods are used instead, the performance drop due to the approximation in FAP is unacceptably large. For this reason, we propose to use the APA algorithm instead of FAP.

In the experiments shown here, 50,000 samples of a signal sampled at 8 kHz have been recorded in stereo, the room impulse responses (1000 taps) we want the filter to identify have been generated artificially, and artificial noise (SNR = 30 dB) has been added. If we apply the Sparse-APA algorithm with spacing D between the equations, experiments show that the cross-correlation problem in the stereo algorithm adds to the auto-correlation problem that is already present in the mono algorithm. This means that a much stronger regularization is required in the stereo case. We have chosen D = 200. This is to be compared with the typical value of D = 10 for the mono case. For exponential updating, a forgetting factor λ = 0.9998 is a typical value.
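The non-uniqueness argument can be illustrated numerically. In the sketch below (ours; lengths, impulse responses and the source signal are arbitrary toy values, and the filter is deliberately taken longer than the transmission room responses so that the exact relation holds), the stacked two-channel data matrix is rank deficient and the zero-padded vector [g2 ; -g1] lies in its null space.

import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(6)
Lg, N, L = 16, 24, 400             # transmission response length, filter length, samples
source = rng.standard_normal(L)
g1 = rng.standard_normal(Lg)       # transmission room responses
g2 = rng.standard_normal(Lg)
x1 = np.convolve(source, g1)[:L]   # loudspeaker signals in the receiving room
x2 = np.convolve(source, g2)[:L]

def conv_matrix(x, N):
    # L x N Toeplitz data matrix with delayed copies of x as columns
    return toeplitz(x, np.r_[x[0], np.zeros(N - 1)])

# stacked two-channel data matrix [X1 | X2] as seen by the adaptive filter
X = np.hstack([conv_matrix(x1, N), conv_matrix(x2, N)])
print("columns:", X.shape[1], " rank:", np.linalg.matrix_rank(X))

# the vector [g2, 0, -g1, 0] (zero padded to length N per channel) is a null vector
null_vec = np.concatenate([g2, np.zeros(N - Lg), -g1, np.zeros(N - Lg)])
print("||X @ null_vec|| =", np.linalg.norm(X @ null_vec))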
As already mentioned, explicit regularization can only be used in the FAP-based algorithms for small regularization terms (δ comparable to the near end noise variance, which was 0.001 in our experiments). When a large δ is required, as in this stereo problem with a lot of noise present, one has to resort to an exact APA implementation. Even δ = 0.1, for which the FAP approximation is not valid anymore, is not large enough to regularize the problem at hand, as shown in Figure 3.12. Eventually, δ = 2 was chosen. The results of the three techniques are shown in Figure 3.13, and can be seen to be comparable.

Figure 3.12: Distance between real and identified impulse response versus time for APA (speech signal) with an explicit regularization factor δ = 0.1. For the mono case this was sufficient, but obviously not for a stereo setup: the filter does not converge.

Finally, we want to reiterate that strong regularization does not solve the stereo echo cancelling problem; only decorrelation techniques do. But regularization is necessary in addition, in order to provide near end noise robustness.

Figure 3.13: Three ways of regularizing the stereo affine projection algorithm for a speech signal input with near end noise present. Full line: distance between the real and the identified impulse response for explicit regularization with δ = 2; dashed line: exponential updating with λ = 0.9999; dotted line: the sparse equations technique with an equation spacing of D = 200. The parameters are clearly higher than in the mono case.

3.6 Conclusion

In this chapter, we have shown that if affine projection techniques are used for acoustic echo cancellation, it is important to provide sufficient regularization in order to obtain robustness against continuously present background noise. This is important in single channel echo cancellation, but even more so in multichannel echo cancellation. In the latter case, cross-correlation between the loudspeaker signals (and hence the input signals of the adaptive filter) leads to ill-conditioning of the problem. Regularization needs to be applied in addition to decorrelation techniques.

We proposed to replace the FTF-based update of the small correlation matrix of size P in the original FAP algorithm by a QRD-based updating procedure that is numerically more stable. Diagonal loading is not easily implemented in this QRD-based approach, and therefore we have described two alternative approaches: exponential weighting, and a new technique based on 'sparse equations'. Comparable performance can be obtained with the three regularization techniques.

We have shown that there are both advantages and disadvantages to the FAP algorithm. Diagonal loading can be incorporated, because FAP uses the FTF algorithm for updating the size-P correlation matrix, but on the other hand FAP makes some approximations that are only valid when not too much regularization is applied, and hence it exhibits performance problems when more regularization is applied (e.g. in multichannel echo cancellation). This observation motivates the derivation of the BEAPA algorithm in chapter 4.
APA–REGULARIZATION AND SPARSE APA Chapter 4 Block Exact APA (BEAPA) for AEC We have explained in the previous chapter why regularization is an absolute necessity in affine projection based adaptive filtering algorithms, and that FAP (fast implementation of APA) relies on an implicit ’small regularization parameter’ assumption. This in particular may lead to poor performance of the FAP algorithm as compared to (properly regularized) APA. In this chapter, a block exact affine projection algorithm (BEAPA) is derived that does not rely on the assumption of a small regularization parameter. It is an exact frequency domain translation of the original APA algorithm, and still has about the same complexity as BEFAP, which is a similar frequency domain (hence low complexity) version of FAP. In a second stage, the BEAPA algorithm is extended to incorporate an alternative to explicit regularization that is based on so–called “sparse” equations (section 3.2). Section 4.1 will review FAP and a frequency domain version thereof : block exact FAP (BEFAP) [59]. FAP is not exactly equal to APA, and therefore some regularization techniques will have different effects in FAP, compared to APA as shown in the previous chapter. In section 4.2, a fast block exact frequency domain version of the affine projection algorithm is derived (Block Exact APA). This algorithm has a complexity that is comparable to the BEFAP–complexity, while it is an exact but fast version of APA. In section 4.3 Block Exact APA is extended to allow for the sparse equations technique to be used (Sparse Block Exact APA). 71 72 4.1 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC Block Exact Fast Affine Projection (BEFAP) In [36] and [59], a block exact version of FAP (see section 2.4.3) is derived, which is referred to as BEFAP. Since the derivation of the algorithm in this chapter is based upon BEFAP, it is instructive to review the concept of this algorithm in order to clarify the differences. A basis filter vector that is fixed during a block of size N2 (e.g. 128) is taken as a basis for a fast (frequency domain) convolution with a (possibly smaller, but a typical value is also 128) block length N1 . Since the filter vector is fixed during this block, the filtering operation can indeed be calculated cheaply in the frequency domain. N1 can be made smaller to reduce the delay of the system. To obtain an exact version of the FAP–algorithm, corrections to the residuals obtained with the fast convolution are calculated during the block. The complexity of the corrections grows within the block, but because of the choice of the parameters, it never reaches the complexity of the full filtering operation that is needed with FAP. After a block, all the corrections during the block of length N2 are applied to the basis filter vector, by means of a frequency domain convolution, resulting in the same output as the original FAP algorithm. If the filter vector were updated in each step and the basis filter vector w(k − 1) were known at time instant k, we can write w(k + i − 1) in terms of w(k − 1). We let sj (k) denote the j’th component of vector s(k) : w(k + i − 1) = w(k − 1) + i X µXP (k + i − j)g(k + i − j), j=1 = w(k − 1) + i+P X−1 sj (k + i − 1)x(k + i − j) j=1 − P −1 X sj (k − 1)x(k − j). 
j=1 The meaning of s(k) is as follows : since the columns in XP (k) shift through this matrix, the multiplications with g(k) can largely be simplified by adding together corresponding components of the vectors g(k) for k = 1..i and thus building up a vector s(k) recursively, containing such summed components. In what follows, k is the sample index at the start of a new block, while i is an index inside a block. If we let s|ji denote a sub–vector consisting of the i’th to the j’th element of s, the vector s(k + i − 1) is recursively obtained as s(k) |{z} ∈RP ×1 = 0 −1 s(k − 1)|P 1 + µg(k) (i = 1), 73 4.1. BLOCK EXACT FAST AFFINE PROJECTION (BEFAP) s(k + i − 1) = | {z } 0 s(k + i − 2) + µg(k + i − 1) 0i−1 (i > 1), (4.1) ∈R(P +i−1)×1 where 0i−1 is a null vector of size i − 1. Vector s(k) grows within a block, but its size is reset to P × 1 at each block border (where a new basis filter is calculated). So in each block, vector s(k + i − 1) grows from size P to size P + N2 − 1. The contents of the first P − 1 positions of s(k − 1) remain intact when crossing block borders. In BEFAP, the filter vector is not updated in each time step, but only at the end of a block. We will use the expression for w(k + i − 1) to derive corrections to the filter output that have to be applied after a filtering operation with the basis filter w(k − 1). The filter output is then written as y(k + i) = xT (k + i)w(k + i − 1) = xT (k + i)w(k − 1) + i+P X−1 − P −1 X sj (k + i − 1)xT (k + i)x(k + i − j) j=1 sj (k − 1)xT (k + i)x(k − j) j=1 = xT (k + i)w(k − 1) + i+P X−1 sj (k + i − 1)rj (k + i) (4.2) j=1 − P −1 X sj (k − 1)ri+j (k + i). j=1 We let rj (k) denote the j’th component of vector r(k). These correlations are defined as rj (k + i) ≡ xT (k + i)x(k + i − j). (4.3) In practical implementations, these correlations are recursively updated. Still referring to [59], one can avoid the third term in the equations for the output if one defines an alternative basis filtering vector : z(k − 1) = w(k − 1) − P −1 X sj (k − 1)x(k − j), (4.4) j=1 which can be updated as z(k + N2 − 1) +N2 −1 = z(k − 1) + X(k + N2 − P )s(k + N2 − 1)|P ,(4.5) P where X(k + N2 − P ) is defined as X(k + N2 − P ) = (4.6) x(k + N2 − P ) x(k + N2 − P − 1) . . . x(k − P + 1) . 74 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC We can now rewrite (4.2) as y(k + i) = xT (k + i)z(k − 1) + i+P X−1 sj (k + i − 1)rj (k + i). (4.7) j=1 The filtering operation with the alternative basis vector in the first term of equation 4.7 and the update to the next basis filter vector (4.5) after N2 samples can be performed in the frequency domain by means of fast convolutions. The block sizes of these convolutions do not necessarily have equal length : the block size for the filtering operation is N1 and the block size for the update is N2 . The overall complexity of this BEFAP–algorithm is 6M1 log2 M1 − 7M1 − 31 + 6N2 + 15P − 4+ N1 P2 − P 6M2 log2 M2 − 7M2 − 31 + 10P 2 + . 
N2 N2 (4.8) Algorithm 10 Block Exact FAP for j = 1 to N2 + P rj1 = (u|kk−L+1 )T u|k−j−1 k−j−L endfor loop for i = 0 to N2 − 1 if (i modulo N1 == 0) <fill next part of y with convolution of next block of N1 samples from u with z> endif ←−−−−−−−−−−−−− ←−−−−−−−−−−−−−−− k−L+i+1 r1 = r1 + uk+i+1 u|k+i+1 k+i+1−N −P +1 − uk−L+i+1 u|k−L+i+1−N −P +1 2 2 +i−1 T 1 i+P ek+i+1 = dk+i+1 − yk+i+1 − (s|P ) r |2 1 E1 = ek+i+1 for α = 2 to P Eα = (1 − µ)Eα−1 endfor <update S −1 , the P × P inverse covariance matrix> e = S −1 E g 0 µe g s= + N2 +P D−2 0N2 −1 s|1 endfor +N2 −1 +N2 +1 z = z+<convolution of s|P with u|k−P P k−P +N2 +1−L+1+1 > k = k + N2 endloop Where N1 and N2 are block lengths for the frequency domain algorithm (e.g. N1 = N2 = 128). Furthermore Mi = N + Ni − 1. The terms containing the logarithms 4.2. BLOCK EXACT APA (BEAPA) 75 are obviously due to the FFT–operations that are used. The complexities of the FFT’s 2 have been taken from [35]. The term P N−P can often be neglected because P N2 . 2 A typical example is P = 3, N1 = N2 = 128, N = 800 leading to 1654 flops per sample (which is about half the complexity of FAP). An algorithm description is provided in Algorithm 10. 4.2 Block Exact APA (BEAPA) In this section a fast implementation of APA (Block Exact Affine Projection Algorithm ) is derived, based on [59]. In FAP (and in BEFAP), the calculation of the lower P − 1 components of the error vector was based upon an approximation (2.27), while only the first component is really computed. In APA all components of the residual vector e ek are computed in each step instead of only the first one. We describe a method that does not require a full filtering operation for all of the P equations. The complexity of the new algorithm is 6M1 log2 M1 − 7M1 − 31 + 6N2 + 15P − 5 + 11P 2 N1 P2 − P 6M2 log2 M2 − 7M2 − 31 + . N2 N2 (4.9) This formula shows that even though the full error vector is calculated, the required number of flops is a lot smaller than doing P full filtering operations. For the example of section 4.1, P = 3, N1 = N2 = 128, N = 800, leading to 1662 flops. (The difference with the BEFAP–algorithm becomes slightly bigger when P is larger). Figure 4.1 is a schematical representation of the final algorithm. 4.2.1 Principle In the FAP–algorithm [26], only the first component of the error vector is calculated, and the others are approximated. Block Exact APA will be derived here along the lines of BEFAP, but such that all error vector components are calculated in each step. When k denotes the sample index corresponding to the beginning of a block, and i (1..N2 ) an index inside the block, we have e (k + i) = XPT (k + i)w(k e + i − 1), y e e (k + i), e(k + i) = dP (k + i) − y e(k + i) = (X T (k + i)X(k + i))−1 e g e(k + i), (4.10) 76 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC From far end Common for all microphones ∆ Blocklength Base Filter z r1 r 2 * * (XT X)−1 rP * * Mostly known from past s + + + + + + } g To far end e Figure 4.1: A schematical representation of the BEAPA algorithm in an echo canceller setup. Bold lines are vectors. A box with an arrow inside is a buffer that outputs a vector. e + i − 1) = w(k e − 1) + w(k i X µXP (k + i − j)e g(k + i − j). (4.11) j=1 In these expressions, dP (k + i) is the desired signal, e e(k + i) is a vector with the e (k + i) is the vector with the a priori errors of the P equations in this step, and y outputs of the filter for these P equations. 
We again propose to use QR–updating and downdating (or in case of exponential weighting as regularization updating only, see chapter 3) to keep track of R(k), the Cholesky factor of X(k) and use this triangular matrix in order to calculate 4.10 with quadratic complexity. From equation 4.11 it can be seen that, in a similar way as was done for BEFAP, we can write e + i − 1) = w(k e − 1) + w(k i+P X−1 j=1 sej (k + i − 1)x(k + i − j) − P −1 X j=1 sej (k − 1)x(k − j), where w̃(k−1) is our basis filter vector, and where the vector e s(k+i−1) is recursively obtained from e s(k + i − 2). In the beginning of each block, the size of vector e s(k) is reset. The recursion is 0 e s(k) = + µe g(k) i = 1, (4.12) −1 e s(k − 1)|P 1 e(k + i − 1) 0 g e s(k + i − 1) = +µ i > 1. (4.13) e s(k + i − 2) 0i−1 77 4.2. BLOCK EXACT APA (BEAPA) The filter outputs yeα (k + i) can now be written as yeα (k + i) e + i − 1) = xT (k + i − (α − 1))w(k T e − 1) + = x (k + i − (α − 1))w(k i+P X−1 j=1 P −1 X j=1 sej (k + i − 1)xT (k + i − (α − 1))x(k + i − j) − sej (k − 1)xT (k + i − (α − 1))x(k − j) e − 1) + = xT (k + i)w(k P −1 X j=1 i+P X−1 j=1 sej (k + i − 1)rjα (k + i) − α sej (k − 1)ri+j (k + i). The correlations are defined as rjα (k + i) ≡ xT (k + i − (α − 1))x(k + i − j), which is merely a shorthand notation for rj−(α−1) (k + i − (α − 1)) as defined in formula 4.3. We proceed (similarly to (4.4)) by defining a modified basis filter vector e e − 1) − z(k − 1) = w(k P −1 X j=1 sej (k − 1)x(k − j), then the filter output can be written as yeα (k + i) = xT (k + i − (α − 1))e z(k − 1) + e sT (k + i − 1)rα (k + i), (4.14) +N2 −1 e z(k + N2 − 1) = e z(k − 1) + X(k + N2 − P )e s(k + N2 − 1)|P , P (4.15) in which both e s(k + i − 1) and rα (k + i) are vectors of length i + P − 1. The autocorrelation vector rα (k + i) is needed to calculate the correction for the α’th e (k + i) (which is needed in turn to calculate the α’th component of the component of y residual vector e e(k + i)). The first term of this equation is a filtering operation with a filter vector e z(k − 1) that is fixed over a block of size N2 , and that is independent of α, so it can be performed efficiently in the frequency domain. The second term is growing inside each block. A recursion for the new filter vector can also be derived along the lines of [59], which gives where e s|ji is a sub–vector consisting of the i’th through j’th element of e s. The matrix X(k) has been defined in (4.6). Here too, the matrix–vector product can be calculated in the frequency domain with fast convolution techniques (block size N2 ). 78 4.2.2 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC Complexity reduction The calculation of all the components of the error vector seems to render its calculation P times as complex. However some important simplifications can be introduced. Writing out the correlation vectors used in the corrections in (4.14) (an example is given for a more general case further on in this chapter), one notices that a recursion exists for them : α−1 rβα (k + i) = r2−(β−1) (k + i − 1) for β ≤ 2, α > 1, (4.16) α−1 rβα (k + i) = rβ−1 (k + i − 1) for β ≥ 2, α > 1. (4.17) Since memory may be comparably expensive as processing power, this recursion appears not to be any advantage, because N2 + P delay lines would be needed. But we can build on this recursion to achieve a major complexity reduction. 
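The growth–and–reset behaviour of the vector s~ may be easier to see in code. Below is a minimal sketch of the recursion (4.12)–(4.13); the function name and the explicit block–border flag are illustrative assumptions.

```python
import numpy as np

def update_s_tilde(s_prev, g, mu, at_block_border):
    """Recursion (4.12)-(4.13) for the growing correction vector s~.

    s_prev : s~ from the previous step (length P at a block border,
             growing by one element per step inside a block)
    g      : current P-vector g~
    """
    P = len(g)
    if at_block_border:
        # (4.12): size is reset to P, keeping the first P-1 old entries shifted down by one
        return np.concatenate(([0.0], s_prev[:P - 1])) + mu * g
    # (4.13): inside a block the vector grows by one element per step
    g_padded = np.concatenate((g, np.zeros(len(s_prev) + 1 - P)))
    return np.concatenate(([0.0], s_prev)) + mu * g_padded
```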
The update recursion for s(k + i) (equation (4.13)) shows that a shift operation is applied to this vector (which grows) in each step, and only the first P elements change after this shift. In the calculation of the error vector, s is multiplied with each of the P vectors rα . This means that part of e sT (k + i − 1)rα (k + i), a scalar, has already been calculated in e step k + i − 1 (namely sT (k + i − 1 − 1)rα−1 (k + i − 1)), and we can calculate a correction s(k + i − 1) to this that consists of the accumulated updates to s(k + i − 1 − 1) multiplied by the relevant first part of the correlation vector : s(k + i − 1) 0P =e s(k + i − 1) − 0 e s(k + i − 2) . (4.18) The (fixed) length of s(k + i − 1) is P . For α > 1, equations 4.17 and 4.18 lead to e sT (k + i − 1)rα (k + i) T T s(k + i − 1) 0 α = r (k + i) + rα (k + i) e 0P s(k + i − 2) r2α (k + i) r3α (k + i) = sT (k + i − 1)e rα (k + i) + e sT (k + i − 2) .. . α r1+i+P (k + i) = sT (k + i − 1)e rα (k + i) + e sT (k + i − 2) r1α−1 (k + i − 1) r2α−1 (k + i − 1) .. . α−1 ri+P (k + i − 1) .(4.19) 4.3. SPARSE BLOCK EXACT APA 79 The vector e rα (k + i) is formed from the first P components of rα (k + i). The last term of equation 4.19 (a scalar) has already been calculated in step k + i − 1. So in each step, one needs to calculate a ’large’ vector product for the correction to the first component of the error vector (with the same size as in the BEFAP–algorithm), and P − 1 small vector products (size P + 1 ) since the second term from 4.19 can be fed through a delay line. For the calculations of the corrections for the first error vector +N2 ) component, this gives an average of (P +N2 −1)(P flops per sample, where N2 is N2 the block size. To this, (P + 1) flops per sample for each of the P − 1 remaining components of the error vector must be added. For typical (small) values of P , this is a lot less complex than calculating all the components straightforwardly. In this setting, one is free to choose if the e rα (k + i) are taken from previous steps as described in equations 4.16 and 4.17, or if they are recalculated at the time that they are needed (by up– and downdates). The latter requires less memory (for delay lines), but more flops. In the acoustic echo cancelling application, often scenario’s occur with multiple microphone setups. Instead of merely repeating the full AEC–scheme, the updates for the inverse correlation matrix and the updates of the correction correlation vectors can be shared among different microphone channels. We have now derived an algorithm that is an exact frequency domain version of the original affine projection algorithm. If QR–updating is used to keep track of the correlation matrix, exponential weighting can be used to incorporate regularization. But also FTF–type algorithms can be used to this end, and than also explicit regularisation can be used. The fact that no approximations are made, is — referring to the results in the previous chapter — clearly an advantage compared to FAP and BEFAP. 4.2.3 Algorithm specification An algorithm description of BEAPA can be found in Algorithm 11. We let u = [x(1), x(2), ...]T be the input signal, and v = [d(1), d(2), ...]T the desired signal, in both of which the order of the samples is different from the definitions of x and d. A right to left arrow above a vector flips the order of the components. 
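To make the cost statement of section 4.2.2 concrete, the per–sample flop count of the error–vector corrections can be tallied as below. This is only a rough count following the expressions quoted in the text, not the complete complexity formula (4.9); the function name is illustrative.

```python
def correction_flops_per_sample(P, N2):
    """Average flops per sample spent on the error-vector corrections in BEAPA,
    as stated in section 4.2.2."""
    first_component = (P + N2 - 1) * (P + N2) / N2  # one 'large' vector product, amortized over the block
    remaining = (P - 1) * (P + 1)                   # P-1 small vector products of size P+1
    return first_component + remaining

# for the chapter's example P = 3, N2 = 128 this gives roughly 141 flops per sample,
# far less than computing all P components straightforwardly with filters of length N = 800
print(correction_flops_per_sample(3, 128))
```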
4.3 Sparse Block Exact APA In this section a sparse–equations version of Block Exact APA (Sparse–Block Exact APA) is derived where the ’sparse equations’ technique of section 3.2 is used for regularization. The complexity of the new algorithm is 80 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC Algorithm 11 Block Exact Affine Projection Algorithm for j = 1 to N2 + P rj1 = (u|kk−L+1 )T u|k−j−1 k−j−L endfor for α = 2 to P for j = 1 to 1 + P k−(α−1) rjα = (u|k−L−(α−1)+1 )T u|k−j−1 k−j−L endfor endfor loop for i = 0 to N2 − 1 if (i modulo N1 == 0) <fill next part of y with convolution of next block of N1 samples from u with z> endif ←−−−−−−−−−−−−− ←−−−−−−−−−−−−−−− k−L+i+1 r1 = r1 + uk+i+1 u|k+i+1 k+i+1−N2 −P +1 − uk−L+i+1 u|k−L+i+1−N2 −P +1 for α = 2 to P rα = ←−−−−−−−−−−−−− ←−−−−−−−−−−−−−−− α r + uk+i−(α−1)+1 u|k+i+1 − uk−L+i−(α−1)+1 u|k−L+i+1 k+i+1−(1+P )+1 k−L+i+1−(1+P )+1 endfor +i−1 T 1 i+P A1k+i+1+1 = (s|P ) r |2 1 ek+i+1 = dk+i+1 − yk+i+1 − A1k+i+1+1 E1 = ek+i+1 for α = 2 to P α−1 D T α 1+P Aα k+i+1+1 = Ak+i+1 + (s ) r |2 Eα = vk+i−α+1 − y(k + i − α + 1) − Aα k+i+1+1 endfor −1 <update S , the P × P inverse covariance matrix> e = S −1 E g 0 µe g s= + N2 +P D−2 0N2 −1 s|1 endfor +N2 −1 +N2 +1 z = z+<convolution of s|P with u|k−P P k−P +N2 +1−L+1+1 > k = k + N2 endloop 6M1 log2 M1 − 7M1 − 31 + 6N2 + 13P + 2P D − 4 + 10P 2 N1 P 2 D2 − P D 6M2 log2 M2 − 7M2 − 31 + P 2D − D + . N2 N2 (4.20) A typical example (see section 4.2) is P = 3, N1 = N2 = 128, N = 800, D = 3 or 5 leading to 1690 or 1718 flops per sample. 81 4.3. SPARSE BLOCK EXACT APA 4.3.1 Derivation Since the Sparse BEAPA algorithm is derived in a manner very similar to BEAPA, we will only briefly state the derivation. The update equations are : bPT (k + i)w(k b (k + i) = X b + i − 1), y b + i) − y b b (k + i), e(k + i) = d(k b T (k + i)X bP (k + i))−1 b b(k + i) = (X g e(k + i), P b + i − 1) = w(k b − 1) + w(k i X j=1 bP (k + i − j)b µX g(k + i − j). (4.21) We can rewrite b +i−1) = w(k b −1)+ w(k i+P D−1 X j=1 sb(k +i−1)j x(k +i−j)− PX D−1 j=1 sbj (k −1)x(k −j) A recursion for ŝ(k) can be written : b s(k) = b s(k + i − 1) = 0 D−1 b s(k − 1)|P 1 0 b s(k + i − 2) +µ + µgb0 (k), gb0 (k + i − 1) 0i−1 with gb1 (k + i − 1) 0D−1 gb2 (k + i − 1) 0D−1 .. . 0 gb (k + i − 1) = gbP (k + i − 1) 0D−1 . (4.22) , (4.23) 82 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC The filter outputs ybα (k + i) can now be written as ybα (k + i) b + i − 1) = xT (k + i − (α − 1)D)w(k T b − 1) + = x (k + i − (α − 1)D)w(k i+P D−1 X j=1 PX D−1 j=1 sbj (k + i − 1)xT (k + i − (α − 1)D)x(k + i − j) − sbj (k − 1)xT (k + i − (α − 1)D)x(k − j) b − 1) + = xT (k + i)w(k PX D−1 j=1 i+P D−1 X j=1 sbj (k + i − 1)b rjα (k + i) − α sbj (k − 1)b ri+j (k + i). The correlations are defined as rbjα (k + i) ≡ xT (k + i − (α − 1)D)x(k + i − j). The modified filter vector is b b − 1) − z(k − 1) = w(k PX D−1 j=1 sbj (k − 1)x(k − j). Then the filter output can be written as ybα (k + i) = xT (k + i − (α − 1)D)b z(k − 1) + b sT (k + i − 1)b rα (k + i), (4.24) in which both b s(k + i − 1) and b rα (k + i) are vectors of length i + P D − 1. A recursion for the filter vector is D+N2 −1 b z(k + N2 − 1) = b z(k − 1) + X(k + N2 − P D)b s(k + N2 − 1)|P . (4.25) PD The matrix X(k) has been defined in 4.6. Note that it contains all input vectors, not only one out of D. Again, the matrix–vector product can be calculated in the frequency domain with fast convolution techniques (block size N2 ). 
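The structure of the sparse–equations data matrix implied by the correlations r̂_j^α (input vectors spaced D samples apart) can be sketched as follows; the helper name and the buffering of the scalar input in a plain array are illustrative assumptions.

```python
import numpy as np

def sparse_data_matrix(u, k, N, P, D):
    """Build the P columns of the sparse-equations data matrix,
    x(k), x(k - D), ..., x(k - (P-1)D), each of length N,
    from the scalar input signal u (u[k] is the sample at time k)."""
    cols = [u[k - p * D - N + 1: k - p * D + 1][::-1] for p in range(P)]
    return np.stack(cols, axis=1)      # shape (N, P)
```

With such a matrix, one exact sparse–APA step has the same form as the reference apa_step sketch given after (4.10)–(4.11), only with the equations taken D samples apart.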
4.3.2 Complexity reduction Just like in the BEAPA–algorithm, a recursion exists for the correlation vectors used in the corrections in 4.24. Take e.g. D = 2 : b r1 (k + i − 2) = (4.26) 83 4.3. SPARSE BLOCK EXACT APA x(k + i − 2)x(k + i − 3) + x(k + i − 3)x(k + i − 4) + . . . + x(k + i − L − 1)x(k + i − L − 2) x(k + i − 2)x(k + i − 4) + x(k + i − 3)x(k + i − 5) + . . . + x(k + i − L − 1)x(k + i − L − 3) . . . x(k + i − 2)x(k − P D + 1) + x(k + i − 3)x(k − P D) + . . . + x(k + i − L − 1)x(k − L − P D + 2) b r2 (k + i) = (4.27) x(k + i − D)x(k + i − 1) + x(k + i − D − 1)x(k + i − 2) + . . . + x(k + i − D − L + 1)x(k + i − L) x(k + i − D)x(k + i − 2) + x(k + i − D − 1)x(k + i − 3) + . . . + x(k + i − D − L + 1)x(k + i − L − 1) x(k + i − D)x(k + i − 3) + x(k + i − D − 1)x(k + i − 4) + . . . + x(k + i − D − L + 1)x(k + i − L − 2) . . . x(k + i − D)x(k − P D + 1)) + . . . + x(k + i − D − L + 1)x(k − L − P D + 2) For this example, one can see that the third to the last component of 4.27, which is the autocorrelation vector needed to calculate the second component of the residual vector at time k + i, have already been calculated 2 time steps before as the first rows of 4.26 for the (smaller) autocorrelation vector needed to calculate the first component of the residual vector at time k + i − 2. Generalizing the above example, we can say that all the components starting from component with index D + 1 from b rα (k + i) α−1 are already available in b r (k + i − D). To interpret this, one has to bear in mind that the vectors b rα (k + i) grow in length with i. Writing out also b r1 (k + i − 1) and 1 α b r (k + i) would show that the components of b r (k + i) with indices from 1 to D are also already calculated in previous steps. This can be summarized as α−1 rbβα (k + i) = rb(D+1)−(β−1) (k + i − D) for β ≤ D + 1, α > 1, (4.28) α−1 rbβα (k + i) = rbβ−D (k + i − D) for β ≥ D + 1, α > 1. (4.29) Instead of using N2 + P D delay lines of length D, we can again reduce this to P scalar delay lines of length D. For b s(k + i) only the first P D elements change after the shift operation in each time step : b s(k + i − 1) 0P D =b s(k + i − 1) − 0D b s(k + i − D − 1) . (4.30) The fixed length of b s(k + i − 1) is D + P D − 1. For α > 1, equations 4.29 and 4.30 lead to 84 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC b sT (k + i − 1)b rα (k + i) T T 0D b s(k + i − 1) α b b = r (k + i) + rα (k + i) b s(k + i − D − 1) 0P D α rbD+1 (k + i) α rbD+2 (k + i) T =b s (k + i − 1)b rα (k + i) + b sT (k + i − D − 1) .. . α rbD+i+P D (k + i) T =b s (k + i − 1)b rα (k + i) + α−1 rb1 (k + i − D) rb2α−1 (k + i − D) b sT (k + i − D − 1) .. . α−1 rbi+P D (k + i − D) (4.31) . The vector b rα (k + i) is formed from the first D + P D − 1 components of b rα (k + i). The last term of equation 4.31 (a scalar) has already been calculated in step k + i − D. Hence in each step, one needs to calculate a ’large’ vector product for the correction to the first component of the error vector (with the same size as in the BEFAP– algorithm), and P − 1 small vector products (size P D + D). For the calculations of the corrections for the first error vector component, this gives an average of (P D + N2 − 1)(P D + N2 ) N2 flops per sample, where N2 is the block size in the BEFAP–algorithm. To this, (P D + D) flops per sample for each of the P − 1 remaining components of the error vector must be added. 4.3.3 Algorithm specification A complete specification is given in Algorithm 12. 
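The reuse property (4.28)–(4.29) follows directly from the definition of the sparse correlations and can be checked numerically with a few lines; the helper names and the signal length below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4000)          # arbitrary scalar input signal
N, D = 64, 3                           # illustrative filter length and equation spacing

def xvec(k):                           # x(k) = [u(k), ..., u(k - N + 1)]^T
    return u[k - N + 1: k + 1][::-1]

def r_hat(alpha, j, k):                # definition of the sparse correlations
    return xvec(k - (alpha - 1) * D) @ xvec(k - j)

# check (4.29): r_hat^alpha_beta(k) equals r_hat^{alpha-1}_{beta-D}(k - D) for beta >= D + 1
k, alpha, beta = 2000, 3, D + 4
print(np.isclose(r_hat(alpha, beta, k), r_hat(alpha - 1, beta - D, k - D)))   # True
```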
Again u = [x(1), x(2), ...]T is the input signal, and v = [d(1), d(2), ...]T the desired signal. These definitions differ from the definitions of x and d. A right to left arrow above a vector flips the order of the components. 4.4. CONCLUSION 85 Algorithm 12 Sparse Block Exact APA for j = 1 to N2 + P D rj1 = (u|kk−L+1 )T u|k−j−1 k−j−L endfor for α = 2 to P for j = 1 to D + P D k−(α−1)D rjα = (u|k−L−(α−1)D+1 )T u|k−j−1 k−j−L endfor endfor loop for i = 0 to N2 − 1 if (i modulo N1 == 0) <fill next part of y with convolution of next block of N1 samples from u with z> endif ←−−−−−−−−−−−−−− ←−−−−−−−−−−−−−−−− k−L+i+1 r1 = r1 + uk+i+1 u|k+i+1 k+i+1−N2 −P D+1 − uk−L+i+1 u|k−L+i+1−N2 −P D+1 for α = 2 to P ←−−−−−−−−−−−−−− rα = rα + uk+i−(α−1)D+1 u|k+i+1 − k+i+1−(D+P D)+1 ←−−−−−−−−−−−−−−−−− k−L+i+1 uk−L+i−(α−1)D+1 u|k−L+i+1−(D+P D)+1 endfor D+i−1 T 1 i+P D A1k+i+1+D = (s|P ) r |2 1 ek+i+1 = dk+i+1 − yk+i+1 − A1k+i+1+D E1 = ek+i+1 for α = 2 to P α−1 D T α D+P D Aα k+i+1+D = Ak+i+1 + (s ) r |2 Eα = vk+i−αD+1 − y(k + i − αD + 1) − Aα k+i+1+D endfor −1 <update S , the P × P inverse covariance matrix> e> <spread the result of S −1 E into g for m = D downto 2 0 µe g + sm = D−2 0 sm−1 |D+P D−1 1 endfor µe g s1 = 0D−1 0 µe g s= + N2 +P D−2 0N2 −1 s|1 endfor D+N2 −1 D+N2 +1 z = z+<convolution of s|P with u|k−P PD k−P D+N2 +1−L+1+1 > k = k + N2 endloop 4.4 Conclusion We have derived a block exact frequency domain version of the affine projection algorithm, named Block Exact APA (BEAPA). This algorithm has a complexity that is 86 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC comparable with the complexity of a block exact frequency domain version of fast affine projection (namely BEFAP), while it does not use the approximations that are present in (BE)FAP. It has the advantage that the convergence characteristics of the original affine projection algorithm are maintained when regularization is applied, while this is not the case when FAP–based fast versions of APA are used. This algorithm has also been extended to allow for the ’sparse equations’ technique for regularization to be used. This is a technique that regularizes the affine projection algorithm if it is used with signals that have a large autocorrelation only for a small lag (e.g. speech). It can be used as a stand alone technique for regularization, if a ’voice activity detection device’ is present that can prevent the inverse correlation matrix to become infinitely large when no far end signal is present, as it is the case in each echo canceller. Chapter 5 QRD–RLS based ANC While the previous chapters were focussed on reference–based noise reduction (AEC) (which means a reference signal for the disturbances was available, namely the loudspeaker signal), in the next chapters we will concentrate on reference–less noise reduction (ANC). In this chapter we will derive an MMSE–optimal unconstrained filtering technique with a complexity that is an order of magnitude smaller than the complexity of existing noise cancellation algorithms based upon this technique. Performance will be kept at the same level though. The new algorithm is based upon a QRD–RLS adaptive filter. While conventional adaptive filtering algorithms have a ’desired signal’–input, our algorithm does not require this desired signal (which is unknown for noise reduction applications). In the next chapter we will — by thoroughly modifying the basic equations and by employing the fast QRD–LSL algorithm — reduce the complexity even by another order of magnitude. 
This chapter is organized as follows. In sections 5.2 and 5.3 we will review unconstrained optimal filtering based ANC. Then we introduce our novel approach based upon recursive QRD–based optimal filtering in section 5.3. In section 5.4 we introduce an algorithm that provides a trade–off parameter with which one can tune the system so that more noise reduction is obtained in exchange for some signal distortion. Finally, complexity figures and simulation results are given in sections 5.5 and 5.6. 87 88 5.1 CHAPTER 5. QRD–RLS BASED ANC Introduction In teleconferencing, hands–free telephony or voice controlled systems, acoustic noise cancellation techniques are used to reduce the effect of unwanted disturbances (e.g. car noise, computer noise, background speakers). Single microphone approaches typically exploit the differences in the spectral content of the noise signal(s) and the speech signal(s) to enhance the input signal. Since a speech signal is highly non–stationary, this may result in a rapidly changing filter being applied to the noisy speech signal. A residual noise signal with continuously changing characteristics or even ’musical noise’ will typically appear at the output. A classic example of this class of algorithms is spectral subtraction [16]. Multi–microphone techniques can additionally take into account spatial properties of the noise and speech sources. In general an adaptive signal processing technique is required since the room characteristics indeed change even with the slightest change in the geometry of the environment. In literature various adaptive multimicrophone ANC techniques have been described, all of them having their advantages and disadvantages. Griffiths–Jim beamforming [60] is a constrained optimal filtering method that aims at adaptively steering a null in the direction of the noise source(s) (or ’jammer(s)’), while keeping a distortionless response in the direction of the speech source. Unconstrained optimal filtering [12][13] is an alternative approach that also takes into account both spectral and spatial information. Unlike Griffiths–Jim beamforming it does not rely on a priori information and hence posesses improved robustness [14]. A speech–noise detection algorithm is needed and crucial for proper operation, but will not be further investigated here, as it will be assumed that a perfect speech detection signal is available. In chapter 1 references to methods for speech/noise detection are given. We note that when the algorithm classifies a ’speech’ period as a ’noise only’ period (for longer time periods), this will result in signal distortion, since signal then ’leaks’ into the noise correlation matrix. This is probably a worse situation than when a noise–period is classified as speech. In that case only the estimation of the noise characteristics is not done at that time, and the statistics of the speech signal (spatial characteristics) are ’forgotten’, but this is not really a problem when the misclassification only occurs during a short period. In an unconstrained optimal filtering approach, the microphone signals are fed to an adaptive filter. The optimal filter attempts to use all available information (also reflections coming from directions other than the speech source direction) in order to optimally reconstruct the signal of interest. 
This effectively means that the spatial pattern of the filter will resemble a beamforming pattern if the reverberation in the room is low, but that the filter will perform better than conventional beamformers under higher reverberation conditions [12]. 89 5.2. UNCONSTRAINED OPTIMAL FILTERING BASED ANC In [12][13][15] the unconstrained optimal filtering problem was solved by means of a GSVD (Generalised Singular Value Decomposition)–approach, while in this chapter we will describe a QRD–based optimal filtering algorithm. While the performance remains roughly the same, the QR–decomposition based algorithm is significantly less complex than the GSVD–based algorithm. The GSVD– approach has a complexity of O(M 3 N 3 ) where M is the number of microphones and N is the number of filter taps per microphone channel. A reduced complexity approximation is possible for the GSVD–approach (based on GSVD–tracking), leading to O(27.5M 2 N 2 ) [13] . The QRD–based approach that we will derive in this paper lowers the complexity to O(3.5M 2 N 2 ) while the performance is equal to that of the initial GSVD–approach, and no approximation whatsoever is employed. Noise Noise h1 h2 h Noise h 4 3 Noise x1 w1 x2 w2 x3 w3 x4 w4 + ^ d Speech component hereof is desired output signal d (unknown) Original speech signal s Figure 5.1: Adaptive optimal filtering in the acoustic noise cancellation context. 5.2 Unconstrained optimal filtering based ANC A typical noise cancellation setup is shown schematically in Figure 5.1 for an array with 4 microphones. A speaker’s voice is picked up by the microphone array, together with noise stemming from sources for which no reference signal is available. Examples are computer fans, air conditioning, other people talking to each other in the background. The absence of a reference signal is the main difference between the ANC techniques we will discuss in this part of this text, and the AEC techniques in chapters 3 and 4. The speech signal s(k) in figure 5.1 is obviously unknown. If we would aim at designing a filter that optimally reconstructs s(k) as a desired signal, then this filter would not only have to cancel the noise in the microphone signals, but it would also have to model the inverse of the acoustic impulse response from the position of the speech source to the position of the microphones. We want to avoid this, since dereverbera- 90 CHAPTER 5. QRD–RLS BASED ANC tion is a different problem that requires different techniques. Hence, we will not use the speech signal itself as a desired signal, but rather the speech component in one (or each) of the microphone signals, which obviously is unknown too. The speech component in the i’th microphone at time k is O di (k) = hi (k) s(k) i = 1 . . . M, where M is the number of microphones, s(k) is the speech signal and hi (k) represents N the room impulse response path from the speech source to microphone i, and is the convolution symbol. The i’th microphone signal is xi (k) = di (k) + vi (k) i = 1 . . . M, where vi (k) is the noise component (sum of the contributions of all noise sources at microphone i). We define the filter input vector as in (2.4), which is repeated for convenience here x(k) = x1 (k) x2 (k) .. . xM (k) x1 (k) = x1 (k) x1 (k − 1) .. . x1 (k − N + 1) . The noise vector v(k) and the speech component signal vector d(k) are defined in a similar way, with x(k) = d(k) + v(k). The following assumptions are made • The noise signal is uncorrelated with the speech signal. 
This results in ε{x(k)xT (k)} = ε{v(k)vT (k)} + ε{cross terms} +ε{d(k)dT (k)} | {z } =0 T = ε{v(k)v (k)} + ε{d(k)dT (k)} ε{d(k)d (k)} = ε{x(k)xT (k)} − ε{v(k)vT (k)}. T Here ε{·} is the expectation operator. • The noise signal is stationary as compared to the speech signal (by which we mean that its statistics change more slowly). This assumption allows us to estimate ε{v(k)vT (k)} during periods in which only noise is present, i.e. ε{v(k)vT (k)} ∼ = ε{v(k − ∆)vT (k − ∆)}with x(k) = v(k) + 0 during noise–only periods. The unconstrained optimal filtering (Wiener filtering) problem is then given as n 2 o (5.1) min ε xT (k)Wwf (k) − dT (k)2 , Wwf (k) 91 5.3. QRD–BASED ALGORITHM where ε{•} is the expected value operator. Note that for the time being we compute the optimal filter to estimate the speech in all (delayed) microphone signals (cfr. definition of d(k)). We can now write the Wiener solution for the optimal filtering problem with x(k) the filter input and d(k) the (unknown) desired filter output is then given as [30] Wwf (k) = (ε{x(k)xT (k)})−1 ε{x(k)dT (k)} = (ε{x(k)xT (k)})−1 ε{(d(k) + v(k))dT (k)} = (ε{x(k)xT (k)})−1 ε{d(k)dT (k)} = (ε{x(k)xT (k)})−1 (ε{x(k)xT (k)} − ε{v(k)vT (k)}). (5.2) If all statistical quantities in the above formula were available, Wwf (k) could straightforwardly be computed with O(M 3 N 3 ) complexity. Each column of Wwf (k) provides the optimal M N –taps filter for optimally estimating the corresponding element of d(k) from x(k), i.e. d̂T (k) = xT (k)Wwf (k). (5.3) In [13] a GSVD–approach to this optimal filtering is described. The GSVD–approach is based upon the joint diagonalisation ε{x(k)xT (k)} = E(k)diag{σi2 (k)}E T (k) ε{v(k)vT (k)} = E(k)diag{ηi2 (k)}E T (k), (5.4) which is then actually calculated by means of a GSVD–decomposition of the data matrices (see also section 5.3). From 5.4 we get Wwf (k) = E −T (k)diag{ σi2 (k) − ηi2 (k) T }E (k). σi2 (k) In the GSVD–algorithm, only afterwards one column of Wwf is picked to serve as a filter vector. A GSVD–approach would have a O(M 3 N 3 ) complexity, but in practice the GSVD solution is tracked or updated (this involves an approximation), which means that the filter can be tracked in O(27.5M 2 N 2 ) flops per sample. 5.3 QRD–based algorithm In this chapter we will present an alternative QRD–updating based approach that leads to comparable performance (or even improved performance since it does not need an SVD–tracking approximation to reduce complexity), but at a significantly lower cost. In the QRD–approach we can select one single entry of dT (k) that we want to estimate, before we compute the optimal filter. The right hand side part of the corresponding LS–estimation problem will then be a vector instead of a matrix. This is the main reason for the dramatical complexity reduction, but besides this of course QRD– updating in itself is cheaper than GSVD–updating. In order to maintain the parallel 92 CHAPTER 5. QRD–RLS BASED ANC between our approach and the GSVD–procedure, we will still consider the full d(k)– vector throughout the derivation, keeping in mind that for a practical implementation, one would select only one element of it. As we want to track any changes in the acoustic environment, we will introduce a weighting in order to reduce the impact of the contributions from the past. Let λs denote the forgetting factor for the speech+noise data, which can be different from λn , the forgetting factor for the noise–only data. A speech/noise detection device will be necessary to operate the algorithm. 
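For reference, the direct (non–recursive) evaluation of the Wiener solution (5.2), restricted to a single column as suggested above, could look as follows. The function name is an illustrative assumption, and the optional parameter mu anticipates the trade–off filter of section 5.4; the QRD–based algorithm derived below never forms these matrices explicitly.

```python
import numpy as np

def wiener_filter_column(Rxx, Rvv, col=0, mu=0.0):
    """One column of W_wf = Rxx^{-1} (Rxx - Rvv), computed directly from
    estimates of the speech+noise and noise-only correlation matrices.
    With mu = 0 this is the plain Wiener solution (5.2); mu > 0 gives the
    regularized trade-off filter (5.11) introduced in section 5.4."""
    rhs = (Rxx - Rvv)[:, col]                       # select which element of d(k) to estimate
    return np.linalg.solve(Rxx + mu**2 * Rvv, rhs)  # MN-taps filter for that column

# the speech estimate for the selected (delayed) microphone signal is then  d_hat = x @ w
```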
Since the noise is assumed to be stationary as compared to the speech contribution, one will often make 0 λs < λn < 1. Our scheme will be based on storing and updating an upper triangular matrix R(k), such that RT (k)R(k) = X T (k)X(k), where we want X T (k)X(k) to be an estimate for ε{x(k)xT (k)}. This is realized by X T (k + 1)X(k + 1) = λ2s X T (k)X(k) + (1 − λ2s )x(k + 1)xT (k + 1). (5.5) Note that this is a slightly different weighting scheme from the one that is explained in equation (2.15), the difference being merely an overall rescaling with (1 − λ2s ). The noise correlation matrix estimate is defined as V T (k + 1)V (k + 1) = λ2n V T (k)V (k) + (1 − λ2n )v(k + 1)vT (k + 1). (5.6) The optimal filtering solution is then obtained as Wqr (k) = (RT (k)R(k))−1 (RT (k)R(k) − V T (k)V (k)) | {z } ≡P (k) Wqr (k) = I − R−1 (k)R−T (k)P (k), where R(k) is the Cholesky factor of X(k) and I is the identity matrix. Due to the second assumption in section 5.2 (namely that the noise is stationary), P (k) can be kept fixed during speech + noise periods and updated (based on formula (5.6)) during noise–only periods. RT (k)R(k) is fixed during noise only periods and updated (based on formula (5.5)) during speech+noise periods. Note that the computed Wqr (k) corresponds to the least squares estimation problem 2 min kD(k) − X(k)Wqr (k)k2 , Wqr (k) where however D(k) = X(k) − V (k) is unknown. Hence, Wqr (k) is a matrix of which the columns are filters that reduce the noise components in the microphone signals in an optimal way. It is clear that WqrN (k) = I − Wqr (k) then provides a set of filters that optimally estimate the noise components in the microphone signals with WqrN (k) = I − Wqr (k) = R−1 (k) R−T (k)P (k) . | {z } ≡B(k) 5.3. QRD–BASED ALGORITHM 93 In the procedure described here, we keep track of both R(k) and B(k) so that at any time WqrN (k) can be computed by backsubstitution in R(k)WqrN (k) = B(k). (5.7) The only storage required is for the matrix R(k) ∈ <M N ×M N and for B(k) ∈ <M N ×M N . In fact, only one column of B(k) has to be stored and updated (cfr. supra), thus providing a signal or noise estimate for the corresponding microphone signal. There are two modes in which the different variables R(k) and B(k) have to be updated, namely speech+noise–mode, and noise only–mode. 5.3.1 Speech+noise – mode Whenever a signal segment is identified as a speech+noise–segment, P (k) is not updated (second assumption), but R(k) needs to be updated. The update formula for R(k) is (compare to (2.16)) p T 0 1 − λ2s xT (k) = Q (k + 1) , R(k + 1) λs R(k) where R(k +1) is again upper triangular1 . As explained in chapter 2, this update gives T both the new upper triangular matrix R(k + 1) and the orthogonal matrix Q (k + 1) containing the necessary rotations to obtain the update. Updating R(k) also implies a change in the stored B(k) = R−T (k)P (k). In order to derive this update, we need an expression for the update of R−1 (k). It is well known2 that the same rotations used to updat R(k) can also be used to update R−T (k) : 0 T ∗ Q (k + 1) = , 1 −T (k) R−T (k + 1) λs R with ∗ a don’t care entry. Hence we have ∗ ∗ = P (k + 1) B(k + 1) R−T (k + 1) ∗ = P (k) R−T (k + 1) 0 T = Q (k + 1) P (k) 1 −T (k) λ R s 0 T = Q (k + 1) . 1 B(k) λs 1 Q(k) 2 This = Q(k)Q(k − 1)...Q(0) does not need to be stored. T x̃ (k) is easily shown starting from 0 R−1 (k) =I R(k) 94 CHAPTER 5. 
QRD–RLS BASED ANC Note that B(k) is weighted with λ1s which is different from the standard exponential weighting in the right hand side of QRD–based adaptive filtering. The complete update can be written in one single matrix update equation : 0 rT (k + 1) R(k + 1) B(k + 1) T Q (k + 1) p 1 − λ2s xT (k + 1) λs R(k) 0 1 λs B(k) = . (5.8) The least squares solution WqrN (k + 1) can now be computed by backsubstitution (equation 5.7), but we will show later on that (using residual extraction) an estimate of the noise can be calculated directly from r(k +1). A signal flow graph of this updating procedure is given in Figure 5.2. x2(k+1) x1(k+1) x2(k) x1(k) R11 R12 R13 0 0 0 r (k+1) 1 r2(k+1) r (k+1) 3 R14 0 R22 R23 R24 R(k) 0 R33 memory cell (delay) 1/λ x R34 0 R44 0 memory cell (delay) λx B(k) Figure 5.2: Updating scheme for signal+noise mode. On the top left new input vectors enter (2 channels, and 2 taps per channel). Rotations are calculated and fed to the right hand side which is updated with 0’s as input. 5.3. QRD–BASED ALGORITHM 5.3.2 95 Noise only–mode. In the noise–only case, one has to update B(k) = R−T (k)P (k) = R−T (k)V T (k)V (k), while R(k) is obviously kept fixed. From equation (5.6) and the fact that in noise–only mode R(k + 1) = R(k), we find that p p B(k + 1) = λ2n B(k) + (R−T (k + 1) 1 − λ2n v(k + 1)) 1 − λ2n vT (k + 1)). p Given R(k + 1), we can compute (R−T (k + 1) 1 − λ2n v(k + 1)) by a backsubstitution. By using an intermediate vector a(k + 1) : p RT (k + 1)a(k + 1) = 1 − λ2n v(k + 1). p A simple multiplication a(k + 1) 1 − λ2n vT (k + 1) now gives p the update for all columns of B(k + 1), i.e. B(k + 1) = λ2n B(k) + a(k + 1) 1 − λ2n vT (k + 1). As already mentioned, R(k) is not updated in this mode, so in figure 5.2 only the framed black boxes (memory cells in the right hand part) are substituted with the corresponding elements of B(k + 1). Note again that, while in the GSVD–based method, all columns of Wgsvd (k + 1) are calculated, and afterwards one of them is selected (arbitrarily) to provide one specific speech signal estimate, the QRD–based method allows one to choose one column (signal) on beforehand, and do all computations for only that one column. 5.3.3 Residual extraction From (2.24) and (5.8) it can be shown that if x(k + 1) belongs to a signal+noise– period, the estimate for the noise components v̂(k + 1) in the microphone signals can be written as v̂T (k + 1) = xT (k + 1)WqrN (k + 1) p −(0 − 1 − λ2s xT (k + 1)WqrN (k + 1)) p 1 − λ2s QM N cos θi (k + 1)rT (k + 1) p = − i=1 , 1 − λ2s which means that an estimate of the noise component is obtained as a least squares residual with a 0 right hand side input. This is exactly the type of right hand side input applied in speech+noise mode updates (section 5.3.1). To obtain the signal estimates d̂(k+1), the noise estimates then have to be subtracted from the reference microphone 96 CHAPTER 5. QRD–RLS BASED ANC signal d̂T (k + 1) = xT (k + 1)(I − WqrN (k + 1)) = xT (k + 1) − v̂T (k + 1) QM N cos θi (k + 1)rT (k + 1) p = xT (k + 1) + i=1 . 1 − λ2s (5.9) x2(k+1) x1(k+1) x2(k) x1(k) 0 0 0 1 0 R11 R12 R13 R14 0 0 R22 R23 R24 0 0 R33 R34 0 0 R44 0 memory cell (delay) noise estimates memory cell λx (delay) 1/λ x Figure 5.3: Signal flow graph for residual extraction. In this setting, the system does not generate any output during noise–only mode, since in the absence of an input vector for the left part of the signal flow graph 5.2 (see section 5.3.1), no rotation parameters are generated, so no residual extraction is possible. 
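A compact numerical sketch of the two update modes may be useful at this point. It uses a library QR factorization instead of an explicit sequence of Givens rotations, so it does not expose the rotation cosines needed for the residual extraction of (5.9); the function names and the NumPy-based formulation are assumptions, not the real–time scheme of Figure 5.2.

```python
import numpy as np

def speech_noise_update(R, B, x, lam_s):
    """Speech+noise-mode step of (5.8): re-triangularize the weighted, extended
    factor and carry the right hand side (weighted with 1/lam_s) along with the
    same orthogonal transformation."""
    lhs = np.vstack([np.sqrt(1.0 - lam_s**2) * x[None, :], lam_s * R])
    rhs = np.vstack([np.zeros((1, B.shape[1])), B / lam_s])
    Q, _ = np.linalg.qr(lhs, mode="complete")
    R_new = (Q.T @ lhs)[:-1, :]           # updated triangular factor R(k+1)
    T = Q.T @ rhs
    # T[-1, :] is the transformed right hand side row, i.e. r(k+1) of (5.8),
    # up to the sign conventions of the library QR factorization
    return R_new, T[:-1, :], T[-1, :]

def noise_only_update(R, B, v, lam_n):
    """Noise-only-mode step of section 5.3.2: R(k) stays fixed, B(k) is
    refreshed with a triangular solve followed by a rank-one update."""
    a = np.linalg.solve(R.T, np.sqrt(1.0 - lam_n**2) * v)  # a backsubstitution in a real implementation
    return R, lam_n**2 * B + np.outer(a, np.sqrt(1.0 - lam_n**2) * v)
```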
In several applications this would be required though. It is perceived as being disturbing when the output signal is exactly zero, often some ’comfort noise’ is preferred. Also if the voice activity detector can not be trusted, and if it does not detect a speech signal during a speech+noise segment, the output of the algorithm would remain zero. If we want to generate an output signal during segments that the voice activity detector identifies as noise–only segments, we can execute a residual extraction procedure as in the speech+noise–mode producing a priori error signals, be it without updating the 5.4. TRADING OFF NOISE REDUCTION VS. SIGNAL DISTORTION 97 R(k) and B(k) (’frozen mode’). 0 rT (k + 1) ∗ ∗ p 1 − λ2s vT (k + 1) Q (k + 1) λs R(k) T 0 1 λs B(k) = . This will of course increase the complexity in noise–only mode, but since the updates need not to be calculated completely (only the rotation parameters and the outputs), the extra complexity will be about half the complexity in speech+noise–mode. The end result will be that the complexity in noise–only mode will become about equal to the complexity in speech+noise mode, and that the maximum complexity of the algorithm does not rise. For real time processing, this maximum complexity is the most important. 5.3.4 Initialization The upper triangular matrix R(0) may be initialized with a small number η on its diagonal. This is required for the QRD–updating algorithm to start. This initialization corresponds to an initial estimation of the speech+noise covariance matrix equal to η 2 I (white noise with variance η 2 ). Due to the exponential weighting in the algorithm, the influence of the initialization will be negligable after a number of samples. 5.3.5 Algorithm description In this algorithm description, we choose to estimate the speech signal d1 (k) in the first microphone signal. This means that also the right hand side consists of only the first column b(k) of B(k). In Algorithm 13, an output signal is also generated during noise–only periods as described above. 5.4 Trading off noise reduction vs. signal distortion In many applications some distortion in the speech signal can be allowed, and hence it is possible to obtain ’more than optimal’ noise reduction in exchange for some signal distortion. We will introduce a parameter that can be used to tune this trade–off. This parameter will take the form of a regularization parameter in the Wiener filter equation. 98 CHAPTER 5. QRD–RLS BASED ANC Algorithm 13 QRD–RLS based ANC R = 0.0001I Loop : if speech+noise p 0 2 x (k + 1) = 1 − λs x(k+ 1) 0 r(k + 1) = R(k + 1) b(k + 1) 0T x (k + 1) 0 T Q (k + 1) 1 λs R(k) λs b(k) output = x1 (k + 1) + r(k+1) QM N i=1 √ cos θi (k+1) 1−λ2s if noise--only v(k + 1) = x(k + 1) Calculate u(k + 1) from RT (k + 1)u(k + 1) = v(k + 1) 2 2 b(k + 1) = λp n b(k) + x1 (k + 1)(1 − λn )u(k + 1) 0 2 x (k + 1) = 1 − λn x(k + 1) 0 r(k + 1) = ∗ ∗ 0T T x (k + 1) 0 Q (k + 1) 1 λs R(k) λs b(k) QM N i=1 cos θi (k+1) ˆ + 1)= x1 (k + 1) + r(k+1) √ d(k 1−λ2s 5.4. TRADING OFF NOISE REDUCTION VS. SIGNAL DISTORTION 5.4.1 99 Regularization In practice, an additional design parameter is often introduced to the unconstrained optimal filtering approach to obtain more noise reduction than achieved by the standard unconstrained optimal filtering scheme. The result will be an increase in signal distortion, but for a lot of applications this is not necessarily harmful. In [22], an alternative to the MMSE–criterium is derived. 
We use a similar, but slightly different criterium : 2 2 min (ε{xT (k)W̃qr (k) − d(k)} + µ ε{vT (k)W̃qr (k)} ). F W̃qr (k) (5.10) F The first term in the minimization criterium accounts for the signal distortion, the second one for the noise reduction. The parameter µ2 can be used to trade off noise reduction versus signal distortion. This leads to W̃qr (k) = (ε{x(k)xT (k)} + µ2 ε{v(k)vT (k)})−1 (ε{x(k)xT (k)} − ε{v(k)vT (k)}). The tradeoff parameter translates into a regularization term in the Wiener filter equation. In a deterministic setting, this leads to W̃qr (k) = (X T (k)X(k) + µ2 V T (k)V (k))−1 (X T (k)X(k) − V T (k)V (k)). (5.11) 5.4.2 Speech+noise mode We return to the QRD–framework, by defining R̃T (k)R̃(k) = X T (k)X(k) + µ2 V T (k)V (k). We will now track the Cholesky factor R̃(k) of X T (k)X(k) + µ2 V T (k)V (k) instead of the Cholesky factor R(k) of X T (k)X(k). Noting that X(k) ε [X(k) µV (k)]T = X T (k)X(k) + µ2 V T (k)V (k), µV (k) it is obvious that this can be done by applying two updates instead of one to the left hand side of Figure 5.3 in each time step. First, an update is done with the microphone input vector, as explained in section 5.3.1, and then a second update is done with µ times a noise input vector that we have stored in a noise buffer. This noise buffer consists of successive input vectors from previous noise–only periods. In section 5.3.2, RT (k) is needed to perform a backsubstitution step. Since in this case we only have access to R̃T (k), we have to rewrite equation (5.11) somewhat : 100 CHAPTER 5. QRD–RLS BASED ANC (RT (k)R(k) + µ2 V T (k)V (k)) W̃qr (k) = RT (k)R(k) − V T (k)V (k) | {z } R̃T (k)R̃(k) = (RT (k)R(k) + µ2 V T (k)V (k)) − V T (k)V (k) − µ2 V T (k)V (k) = R̃T (k)R̃(k) − (1 + µ2 )V T (k)V (k), ˜ −T (k)(V T (k)V (k)) . W̃qr (k) = I − (1 + µ2 )R̃−1 (k)R | {z } W̃qrN Written in the same form as (5.7), we obtain R̃(k)W̃qrN (k) = B̃(k). (5.12) So since we store R̃(k) instead of R(k), we now have to update B̃(k) = R̃(k)−T (1 + µ2 )(V (k)T V (k)). The full procedure in speech+noise mode is as follows : first weight R̃(k) with λs , p and then update it with 1 − λ2s x(k + 1) in order to obtain R̃0 (k). The rotation parameters that are generated by this update are used together with zeros applied to the top of the signal flow graph to update B̃(k) to B̃ 0 (k) : B̃ 0 (k) = R̃0 (k)−T (1 + µ2 )(V (k)T V (k)). This corresponds to the original scheme of section 5.3.1. We let this step generate a residual, and we use it as an output of the noise filter. p Then R̃0 (k) is updated to R̃(k + 1) by applying µ 1 − λ2s v(k + 1) to the left hand part of the signal flow graph, and the rotation parameters are again used to update B̃ 0 (k) to B̃ 00 (k) : B̃ 00 (k) = R̃−T (k + 1)(1 + µ2 )(V T (k)V (k)). The residual signal generated in this step should be discarded. It is possible to update the factor V T (k)V (k) during noise–only mode. In that case B̃(k + 1) = B̃ 00 (k). Another option consists in performing also these updates with noise vectors from the noise–buffer during speech+noise mode. The update from B̃ 00 (k) to B̃(k + 1) is performed as B̃(k + 1) = λ2s B˜00 (k) + (R̃−T (k + 1)(1 − λ2s )(1 + µ2 )v(k + 1))vT (k + 1)). Where v(k + 1) is taken from the noise buffer. This can again be calculated using a backsubstitution followed by a multiplication as explained in section 5.3.2. 101 5.5. COMPLEXITY 5.4.3 Noise–only mode The factor V T (k)V (k) in the right hand side of equation (5.12) can also be updated during noise–only periods. 
In that case, the algorithm proceeds exactly as in section 5.3.2 during noise–only mode. During noise–only mode, also the input vectors must be stored into the noise buffer3 . Algorithm 14 gives a complete specification. 5.5 Complexity In noise-only mode, the complexity for the unregularized algorithm (section 5.3) is (M N )2 + 3M N + M flops per sample if no output signal is generated during noise periods, or 3(M N )2 + 16M N + 2 if an output signal is generated during noise–only periods. In speech+noise mode, the number of flops per sample is 3.5(M N )2 + 15.5M N + M + 2. In these calculations, one flop is one addition or one multiplication. These figures apply when only one filter output is calculated. This can be compared to the complexity of a recursive version of the GSVD–based optimal filtering technique [12] [13], which is O(27.5(M N )2 ) flops per sample. For a typical setting of N = 20 and M = 5, we would obtain 36557 flops per sample for the QRD–based method as compared to 275000 flops per sample for the GSVD–based method, which amounts to an 8–fold complexity reduction4 . For the regularized algorithm of section 5.4, the complexity will be doubled during speech+noise mode (O(7(M N )2 ) compared to the unregularized optimal filtering scheme of section 5.3. The complexity during noise–only mode remains the same. The algorithms have been implemented in real time on a Linux PC (PIII, 1Ghz). For just–real time performance, this leads to a maximum of 3 channels with 10 filter taps per channel for the GSVD–based algorithm. The unregularized QRD–based algorithm 3 If memory would be too expensive, an alternative is to use white noise instead of buffered noise vectors. This will probably lead to more signal distortion, but experiments show that it is still a valid alternative. 4 If complexity would be prohibitive for some applications, the QRD–RLS based algorithm can be used to generate a noise estimate with relatively few filter taps. This estimate can then be fed to a second stage, similarly to [11]. 102 CHAPTER 5. QRD–RLS BASED ANC Algorithm 14 ˆ + 1) is the resulting speech signal QRD–based ANC with trade–off parameter. d(k estimate. R = 0.0001I Loop : if speech+noise p x0 (k + 1) = 1 − λ2s x(k + 1) 0 r(k + 1) T = Q (k + 1) 0 0 R̃ (k) b̃ (k) ˆ + 1) = x1 (k + 1) + d(k r(k+1) QM N i=1 √ x0T (k + 1) λs R̃(k) 1 λs 0 b̃(k) cos θi (k+1) 1−λ2 s v(k + 1) = next noise--vector from noise--buffer p 0 (k + 1) = µ 1 − λ2 v(k + 1) v s 0 r(k + 1) R̃(k + 1) b̃(k + 1) 0T v (k + 1) T = Q (k + 1) R̃0 (k) 0 b̃0 (k) R̃T (k + 1)u(k + 1) = v(k + 1), backsubstitution gives u(k + 1) (if no noise updates during noise--only) b(k + 1) = λ2s b(k) + v1 (k + 1)(1 − λ2s )(1 + µ2 )u(k + 1) (if no noise updates during noise--only) if noise--only Push input vector v(k + 1) = x(k + 1) in noise--buffer RT (k + 1)u(k + 1) = v(k + 1), backsubstitution gives u(k + 1) (if noise updates during noise--only) b(k + 1) = λ2n b(k) + x1 (k + 1)(1 + µ2 )(1 − λ2n )u(k + 1) (if noise updates during noise--only) p x0 (k + 1) = 1 − λ2s x(k + 1) 0T T x (k + 1) 0 0 r(k + 1) = Q (k + 1) 1 λs R(k) b(k) ∗ ∗ λs ˆ + 1) = x1 (k + 1) + d(k r(k+1) QM N i=1 √ cos θi (k+1) 1−λ2 s ! 5.6. SIMULATION RESULTS 103 we have proposed here, when implemented in the time domain, allows for 3 channels with 30 filter taps per channel. The theoretical complexity figures are confirmed : the filter lengths can be made three times longer than for the reference setup with the GSVD–based algorithm (this is indeed expected because of the quadratic complexity, 2 i.e. 
27.5 3.5 ≈ 3 ), while the performance is the same. A subband implementation (16 subbands, 12–fold downsampling) of the QRD–based algorithm allows to use 15 taps per subband in 3 channels, which comes down to an equivalent of 12.3.15 = 540 filter taps. 5.6 Simulation results Theoretically, the GSVD–based approach and the QRD–based approach solve the same problem. We will show some subtle differences between the practical implementations of the GSVD– and the QRD–based techniques. The conclusion will be that also in a practical implementation the behaviour of the GSVD– and the QRD– based algorithm is roughly the same, and hence for the performance results for the QRD–based technique, we can refer to the literature about the GSVD–based technique [14, 12]. Note that all covariance matrices in the above equations and algorithms should be positive definite. The clean speech signal typically exists in a subspace of the input space. Hence a number of eigenvalues may be zero in the difference ε{x(k)xT (k)} − ε{v(k)vT (k)}. A practical estimator however will never obtain exact zeroes for the eigenvalues, and may even produce negative values for the estimation of the eigenvalues of ε{x(k)xT (k)} − ε{v(k)vT (k)}. In the GSVD–approach, direct access to the singular values is possible, and the negative eigenvalues can be corrected to be zero. This is not possible in the QRD–based approach. This difference is most of all seen for short estimation windows. For longer estimation windows, the QRD– and GSVD–results become roughly equal. On the other hand, the GSVD–approach has to incorporate an approximation in order to achieve quadratic complexity. We will also show the influence of this approximation. The speech signal is a sentence that is repeated four times. Speech+noise versus noise– only periods were marked manually. Reverberation was added by a simulated acoustical environment (acoustic path of 1000 taps, sampling frequency 8000 Hz). The speech source is located at about 6◦ from broadside, at 2.8meters from the microphone array. The (white) noise source is located at about 54◦ from broadside at 2.2 meters from the array. The microphone array consists of 4 microphones, spaced 20 cm each, the filters have 40 taps per channel. The first column of W (k) is selected for signal estimation. During each utterance of the sentence, the volume decreases, so the SNR is not constant. Figure 5.4 shows the difference between the QRD–based method and the GSVD–based method for a short estimation window. After the first speech utterance, in the beginning of the noise–only period, the convergence is clearly visible. 104 CHAPTER 5. QRD–RLS BASED ANC The QRD–based method has less distortion because of the approximation used in the GSVD–approach [14], while the GSVD–based method obtains more noise reduction due to the ’corrected singular value’ estimates. Figure 5.5 compares both methods without applying the ’corrections’ to the singular values in the GSVD–method. In that case, the results are quite similar, and the QRD–based method performs slightly better (both concerning distortion and noise reduction) because of the tracking approximation in the calculation of the SVD. In Figure 5.6 the noise estimation window is made longer (λn = 0.99995), and the corrections are applied in the GSVD–based algorithm. The figure shows that in spite of this, the performance of both algorithms is almost the same. 105 5.6. 
Figure 5.4: Four utterances of the same sentence; the energy of the clean speech signal and of the noise signal at microphone 1 is plotted together with the QRD–output and the GSVD–output (simulation). The GSVD–result is better for this case (λn = 0.9997 and λs = 0.9997) than the QRD–result because the negative eigenvalues can be set to zero in the GSVD–method. As shown in the detail panel, the distortion is nevertheless smaller for the QRD–method.

Figure 5.5: Comparison between GSVD–based and QRD–based unconstrained optimal filtering when the correlation matrices are not corrected in the GSVD–approach; again λn = 0.9997 and λs = 0.9997. The cheaper QRD–method performs slightly better because of the approximation used in the GSVD–approach, so the performance of the QRD– and GSVD–algorithms can be considered 'almost equal'.

Figure 5.6: QRD–approach versus GSVD–approach with a longer estimation window: the difference between both algorithms vanishes, even when the eigenvalues are corrected in the GSVD–approach (λn = 0.99995 and λs = 0.9997).

Figure 5.7: When a trade–off (regularization) parameter is introduced, even more noise reduction can be achieved, in exchange for some signal distortion. The upper line is the algorithm output without the trade–off parameter, the lower line the output with a trade–off parameter µ = 2.

The result of introducing regularization (see section 5.4) is clearly shown in Figure 5.7. The upper line in the plot is the output energy of the original algorithm (without regularization), while the lower line shows the output energy when a regularization parameter µ = 2 is chosen. There is more noise reduction (as can be seen in the 'valleys' of the graph), while also some signal distortion is introduced (as can be seen at the peaks of the graph).

5.7 Conclusion

In this chapter, we have derived a new QRD–based algorithm for multichannel unconstrained optimal filtering with an "unknown" desired signal, and applied it to the ANC problem. The same basic problem is solved as in related algorithms that are mostly based upon singular value decompositions, and our approach results in at least equal performance; since the approximations of the GSVD–tracking scheme are not present in the QRD–based algorithm, the performance is often even better. The major advantage of the QRD–based optimal filtering technique is that its complexity is an order of magnitude lower than that of the (approximating) GSVD–based approaches. We have also introduced a trade–off parameter in the QRD–based technique that allows more noise reduction to be obtained in exchange for some signal distortion.
Chapter 6 Fast QRD–LSL–based ANC The QRD–based unconstrained optimal filtering ANC algorithm we have presented in the previous chapter allows for a complexity reduction of an order of magnitude compared to existing unconstrained optimal filtering approaches based on GSVD– computation and tracking. However, complexity still is quadratic in both the filter length and the number of channels, and since in typical applications the filter length is often a few tens–hundreds of taps, this complexity can still be prohibitive. In standard QRD–RLS adaptive filtering, the shift structure of the input signal is exploited in order to obtain an algorithm (QRD–LSL) that is linear in the filter length. In this chapter, we will show how we can also apply this to the QRD–based algorithm of chapter 5 too. This is not straightforwardly achieved though, since in the previous chapter’s algorithm access to the upper triangular cholesky factor was necessary during noise only periods in order to calculate the update of the right hand side. In a QRD–LSL–based algorithm, this matrix is not explicitly present anymore. We will propose a QRD–Least Squares Lattice (QRD–LSL) based unconstrained optimal filtering algorithm for ANC that obtains again the same performance as the GSVD– or QRD–RLS–approach (chapter 5) but now at a dramatically reduced complexity. As mentioned before, if M is the number of microphones, and N the number of filter taps applied to each microphone signal, then the GSVD–based approach has a complexity of O(M 3 N 3 ). An approximate GSVD–solution (which uses GSVD– tracking) still requires O(27.5M 2 N 2 ) flops per sample. The QRD–RLS based solution of chapter 5 reduces this complexity to O(3.5M 2 N 2 ). The algorithm presented in this chapter has a complexity of O(21M 2 N ). For typical parameter settings (N = 50, M = 2), this amounts to an up to 50–fold complexity reduction when compared to the approximative GSVD–solution, and a 8–fold complexity reduction compared to the QRD–RLS–based algorithm. Our algorithm is based on a 109 110 CHAPTER 6. FAST QRD–LSL–BASED ANC numerically stable fast implementation of the RLS–algorithm (QRD–LSL), applied to a reorganized version of the QRD–RLS–based algorithm of chapter 5. In section 6.1, we describe the data model that is used for this algorithm. Then a QRD–RLS algorithm which is a modified version of the algorithm in chapter 5 is derived in section 6.2, and this is worked into a QRD–LSL algorithm in section 6.3. In section 6.4, the transitions between modes are studied in detail, in section 6.5, a regularization parameter is introduced which — similarly to the regularization factor in the previous chapter — can be used to trade off noise reduction versus signal distortion. In section 6.6, complexity figures are given, and in 6.7 simulation results are described. Conclusions are given in section 6.8. 6.1 Preliminaries In order to show the analogy between the QRD– and GSVD– based methods, we will again derive the algorithms with a matrix W (k) of which the columns are the filter vectors and a vector d(k) of which the elements are the desired signals, but one should keep in mind that for a practical implementation only one column of the matrix has to be calculated. It turns out that the method described in chapter 5 can not be straightforwardly modified into a QRD–LSL fast implementation. By reorganizing the problem, we can indeed substitute a QRD–LSL algorithm. 
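To put the quoted complexity orders side by side, the short sketch below evaluates the leading–order flop counts for this chapter's typical setting. Only the leading terms with the constants quoted in the text are used, so the printed ratios are indicative only (the 50–fold and 8–fold figures quoted above account for terms that are dropped here); the function name is illustrative.

```python
def flops_per_sample(M, N):
    """Leading-order flops per sample quoted for the four ANC approaches
    (lower-order terms are dropped)."""
    return {
        "GSVD, exact":          float(M * N) ** 3,      # O(M^3 N^3)
        "GSVD, tracking":       27.5 * (M * N) ** 2,
        "QRD-RLS (chapter 5)":  3.5 * (M * N) ** 2,
        "QRD-LSL (chapter 6)":  21.0 * M ** 2 * N,
    }

# typical setting used in this chapter: M = 2 microphones, N = 50 taps per channel
for name, flops in flops_per_sample(2, 50).items():
    print(f"{name:22s} {flops:12.0f}")
```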
The fast RLS schemes that are available from literature are based upon the requirement that the input signal has a shift structure, which means that each input vector should be a shifted version of the previous input vector. A large number of computations can then be avoided, and ’re–used’ from previous time instants. As a result of the complexity reduction, the matrix R(k) is not explicitly available in the algorithm anymore. In the QRD–based algorithm of chapter 5, the right hand side noise covariance matrix is updated during noise–only periods (section 5.3.2), and the matrix R(k) is needed in order to do this, since a backsubstitution step is required. If we want to obtain a fast algorithm, we will have to come up with a way to update the noise correlation matrix without needing access to R(k). In order to derive a fast algorithm, we first have to reorder the contents of our input vectors. They are redefined as in (2.5), repeated here for convenience x1 (k) .. . xM (k) , x(k) = x1 (k − 1) .. . xM (k − N + 1) which clearly does not have an impact on the algorithms of chapter 5. 111 6.1. PRELIMINARIES Due to the second assumption in section 5.2, we can attempt to estimate ε{x(k)xT (k)} during speech+noise–periods, and ε{v(k)vT (k)} in noise–only periods. We will make use of a weighting scheme in order to provide the ability to adapt to a changing environment. The input matrices X(k) and V (k) are defined as in (5.5) and (5.6). The Wiener–solution is then estimated as W (k) = (RT (k)R(k))−1 (RT (k)R(k) − V T (k)V (k)) = I − R−1 (k) R−T (k)V T (k)V (k), | {z } (6.1) B(k) | {z W N (k) } where I is the identity matrix. Let us now consider the following least squares estimation problem where the upper rows (with X(k)) represent weighted inputs from speech+noise–periods and the lower rows (with V (k)) represent weighted inputs from noise–only periods : 2 X(k) 0 N . W (k) − (6.2) min 1 N V (k) βV (k) W (k) β The normal equations for this system are (X T (k)X(k) + β 2 V T (k)V (k))W N (k) = (V T (k)V (k)), such that W N (k) = ((X T (k)X(k) + β 2 V T (k)V (k)))−1 (V T (k)V (k)) | {z } (6.3) T (k)R (k) Rβ β = Rβ−1 (k) Rβ−T (k)(V T (k)V (k)), | {z } Bβ (k) with Rβ ∈ <M N ×M N and upper triangular. Clearly, for the limiting case of β going to N zero (indicated by β → 0), Rβ→0 (k) = R(k) and Bβ→0 (k) = B(k), so Wβ→0 (k) = N I − W (k), which means that W (k) may be computed as W (k) = I − Wβ→0 (k). We will now provide a QRD-based algorithm that makes use of this feature and that is based on storing and updating the triangular factor Rβ→0 (k) = R(k) as well as the right hand side Bβ→0 (k) = B(k). From now on, we focus on (6.2) and (6.3), keeping in mind that a desired signal estimate is then obtained as d̂T (k) N = xT (k)(I − Wβ→0 (k)) N = xT (k) − xT (k)Wβ→0 (k)), N where xT (k)Wβ→0 (k) in fact corresponds to an estimate of the noise–contribution in T x (k). Note that our updating formulae will be reorganized such that the β does not appear anywhere, hence can effectively be set to zero. 112 CHAPTER 6. FAST QRD–LSL–BASED ANC 6.2 Modified QRD–RLS based algorithm Referring to (6.1), the algorithm will be based on storing and updating only Rβ→0 (k) and Bβ→0 (k). In a second step (section 6.3), this will be turned into a QRD–LSL based algorithm. We know from (6.2) that there will be two modes of updating, depending on the input signal p being classified as speech+noise or noise–only. In speech+noise–mode we apply 1 − λ2s x(k + 1) as an input pto the left hand side , and 0 to the right hand side. 
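The role of the auxiliary parameter β in (6.2)–(6.3) can be verified numerically: for small β the stacked least squares problem reproduces W^N = (X^T X)^{-1} V^T V. The matrix sizes and the random data below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))     # weighted speech+noise rows (illustrative sizes)
V = rng.standard_normal((150, 8))     # weighted noise-only rows

# W^N from the normal equations (6.3) in the limit beta -> 0
W_direct = np.linalg.solve(X.T @ X, V.T @ V)

# W^N from the stacked least squares problem (6.2) with a small beta
beta = 1e-4
A = np.vstack([X, beta * V])
rhs = np.vstack([np.zeros((X.shape[0], X.shape[1])), V / beta])
W_ls, *_ = np.linalg.lstsq(A, rhs, rcond=None)

print(np.allclose(W_direct, W_ls, atol=1e-6))   # True: the two solutions coincide as beta -> 0
```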
We know from (6.2) that there will be two modes of updating, depending on the input signal being classified as speech+noise or noise–only. In speech+noise–mode we apply \sqrt{1-\lambda_s^2}\,x(k+1) as an input to the left hand side, and 0 to the right hand side. During noise–only periods, \beta\sqrt{1-\lambda_n^2}\,v(k+1) should be applied to the left hand side of the SFG, and \frac{1}{\beta}v(k+1) to the right hand side, with β → 0. Assume that at time k a QR–decomposition is available as follows:

\begin{bmatrix} X(k) & 0 \\ \beta V(k) & \frac{1}{\beta}V(k) \end{bmatrix}_{\beta\to 0} = Q(k) \begin{bmatrix} R(k) & B(k) \\ 0 & * \end{bmatrix},   (6.4)

where Q(k) is not stored. From this equation, W^N_{β→0}(k) can be computed as W^N_{β→0}(k) = R^{-1}(k)B(k). Our algorithm will however be based on residual extraction (section 2.3.3), and hence W^N_{β→0}(k) will never be computed explicitly.

6.2.1 Speech+noise–mode

The update formula for the speech+noise–mode is derived as follows, where X(k) is updated based on formula (5.5), while V(k) is kept unchanged, i.e. V(k+1) = V(k):

\begin{bmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta}V(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s X(k) & 0 \\ \beta V(k) & \frac{1}{\beta}V(k) \end{bmatrix}_{\beta\to 0}.

If we define \tilde\beta = \beta/\lambda_s, we obtain

\begin{bmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta}V(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s X(k) & 0 \\ \lambda_s\tilde\beta V(k) & \frac{1}{\lambda_s\tilde\beta}V(k) \end{bmatrix}_{\tilde\beta\to 0} = \begin{bmatrix} 1 & 0 \\ 0 & Q(k) \end{bmatrix} \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \\ 0 & * \end{bmatrix}.

[Figure 6.1: signal flow graph of the QRD–RLS based optimal filtering scheme for acoustic noise suppression. During speech+noise–periods, Givens–rotations are used; the left hand side (updating R(k)) is weighted with λ_s, the right hand side (updating B(k)) with 1/λ_s, and the LS residual is obtained from the bottom row via Π cos θ.]

[Figure 6.2: the same signal flow graph during noise–only–periods, where Gauss–rotations are used. The right hand side memory elements are weighted with λ_n^2, the left and right hand parts use differently defined Gauss–transformations, and the memory cells contain the (unchanged) elements of R(k).]
This means (from (6.4)) that the updated R(k+1) and B(k+1) may be obtained based on a standard QRD–updating

\begin{bmatrix} 0 & r^T(k+1) \\ R(k+1) & B(k+1) \end{bmatrix} = Q^T(k+1) \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \end{bmatrix}.   (6.5)

A signal flow graph representation is given in Figure 6.1. For this example, the right hand side part of the signal flow graph has only 2 columns, estimating the speech components in x_1(k) and x_2(k) only. While the left hand side has a weighting with λ_s, the right hand side part has a weighting with 1/λ_s, which is shown by means of black squares in boxes. As shown in equation (6.6), we do not have to calculate the filter vector in each step; we can use residual extraction (2.24) to obtain the estimate of the speech signal

\hat{d}(k+1) = x(k+1) + r(k+1)\,\frac{\prod_{i=1}^{MN}\cos\theta_i(k+1)}{\sqrt{1-\lambda_s^2}}.   (6.6)

Note that (6.5) and (6.6) are the same as (5.8) and (5.9). The update formulas for the noise–only case, however, will be different from the formulas in chapter 5.

6.2.2 Noise–only mode

The update formula for the noise–only mode is derived as follows, where now X(k) is kept unchanged, i.e. X(k+1) = X(k), while V(k) is implicitly updated as

V(k+1) = \begin{bmatrix} \sqrt{1-\lambda_n^2}\,v^T(k+1) \\ \lambda_n V(k) \end{bmatrix}.

It is convenient to redefine/reorder V(k+1) as

V(k+1) = \begin{bmatrix} \lambda_n V(k) \\ \sqrt{1-\lambda_n^2}\,v^T(k+1) \end{bmatrix},

leading, after substituting the QR–decomposition (6.4) for the old data and absorbing the weighting factors (with \tilde\beta = \beta\lambda_n, \tilde\beta → 0), to

\begin{bmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta}V(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} Q(k) & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} R(k) & \lambda_n^2 B(k) \\ 0 & * \\ \beta\sqrt{1-\lambda_n^2}\,v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v^T(k+1) \end{bmatrix}_{\beta\to 0}.

This means that the updated R(k+1) and B(k+1) may be obtained based on a QRD–updating

\begin{bmatrix} R(k+1) & B(k+1) \\ 0 & r^T(k+1) \end{bmatrix} = Q^T(k+1) \begin{bmatrix} R(k) & \lambda_n^2 B(k) \\ \beta\sqrt{1-\lambda_n^2}\,v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v^T(k+1) \end{bmatrix},   (6.7)

where Q(k+1) is not stored and from which W^N_{β→0}(k+1) can again be computed as W^N_{β→0}(k+1) = R^{-1}(k+1)B(k+1). Note that β and 1/β now appear explicitly in the QRD–updating formula. As we are interested in the case β → 0, we have to work the formulas into an alternative form in which β does not appear explicitly. The end result will be that the orthogonal Givens transformations are replaced by Gauss–transformations, and that the input vector will be

[\sqrt{1-\lambda_n^2}\,v^T(k+1) \quad \sqrt{1-\lambda_n^2}\,v^T(k+1)] \quad \text{instead of} \quad [\beta\sqrt{1-\lambda_n^2}\,v^T(k+1) \quad \tfrac{1}{\beta}\sqrt{1-\lambda_n^2}\,v^T(k+1)].

It will also be shown that the elements of the matrix R(k) are not changed by the updating during noise–only periods. We consider the first orthogonal Givens rotation that is computed in the top left hexagon of the signal flow graph in Figure 6.1,

\begin{bmatrix} R_{11}(k+1) \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta_1(k) & \sin\theta_1(k) \\ -\sin\theta_1(k) & \cos\theta_1(k) \end{bmatrix} \begin{bmatrix} R_{11}(k) \\ \beta\sqrt{1-\lambda_n^2}\,v_1(k+1) \end{bmatrix}.

Here v_j(k) denotes the j'th component of the vector v(k). Since β → 0, we can write

\tan\theta_1(k) = \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)}, \qquad \sin\theta_1(k)\big|_{\beta\to 0} \approx \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)}, \qquad \cos\theta_1(k)\big|_{\beta\to 0} \approx 1.   (6.8)

Hence, in the limit, the rotation is equivalent to

\begin{bmatrix} R_{11}(k+1) \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\frac{\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{11}(k) \\ \sqrt{1-\lambda_n^2}\,v_1(k+1) \end{bmatrix},   (6.9)

so that R_{11}(k+1) = R_{11}(k) and β has disappeared.
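Before the remaining elements of the first row are treated, a small numerical illustration of this limiting argument may be helpful. The sketch below (toy numbers, our own variable names; it is only an illustration, not part of the derivation) computes the exact Givens rotation for a row scaled by a shrinking β and shows that R_11 stays (essentially) unchanged while the transformed noise sample converges to the Gauss–transformation result, independently of β.

    import numpy as np

    lam_n = 0.9997
    R11, v1, vj, R1j = 2.0, 0.7, -0.4, 1.3
    w = np.sqrt(1.0 - lam_n**2)

    for beta in (1e-3, 1e-6, 1e-9):
        # Exact Givens rotation annihilating the beta-scaled noise sample.
        a, b = R11, beta * w * v1
        r = np.hypot(a, b)
        c, s = a / r, b / r
        R11_new = c * R11 + s * beta * w * v1
        R1j_new = c * R1j + s * beta * w * vj
        vj_out  = (-s * R1j + c * beta * w * vj) / beta   # remove the beta scaling
        print(beta, R11_new, R1j_new, vj_out)

    # Gauss-transformation limit (beta -> 0): R row unchanged,
    # transformed sample = w*vj - (w*v1/R11)*R1j.
    print("limit:", R11, R1j, w * vj - (w * v1 / R11) * R1j)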
This rotation is then applied for the updating of the remaining elements in the first row of R(k),

\begin{bmatrix} R_{1j}(k+1) \\ \beta\sqrt{1-\lambda_n^2}\,v_j'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \beta\sqrt{1-\lambda_n^2}\,v_j(k+1) \end{bmatrix}_{\beta\to 0},   (6.10)

which is equivalent to

\begin{bmatrix} R_{1j}(k+1) \\ \sqrt{1-\lambda_n^2}\,v_j'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\frac{\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_n^2}\,v_j(k+1) \end{bmatrix}.   (6.11)

This shows that the elements R_{1j}(k) are indeed unaffected by this transformation. Applying the rotation to the right hand side (the B(k)–part) of the signal flow graph leads to

\begin{bmatrix} B_{1j}(k+1) \\ \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v_{rhs,j}'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} \lambda_n^2 B_{1j}(k) \\ \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v_{rhs,j}(k+1) \end{bmatrix}_{\beta\to 0},   (6.12)

where v_{rhs}(k+1) is the input to the right hand side of the SFG; during noise–only periods, v_{rhs}(k) = v(k). Equation (6.12) is equivalent to

\begin{bmatrix} B_{1j}(k+1) \\ \sqrt{1-\lambda_n^2}\,v_{rhs,j}'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \lambda_n^2 B_{1j}(k) \\ \sqrt{1-\lambda_n^2}\,v_{rhs,j}(k+1) \end{bmatrix}.   (6.13)

Note that v_{rhs}'(k) = v_{rhs}(k). Similar transformations are subsequently applied to the other rows of B(k) and R(k), where the update for the second row of R(k) uses v'(k+1) as its input and generates v''(k+1), which in turn serves as input for the third row, and so on. Note that we have now removed β from all equations.

The above formulae effectively mean that, during noise–only periods, we can replace the original Givens–rotations in Figure 6.1 by so–called Gauss–transformations, as given by formulae (6.11) and (6.13); see Figure 6.2. It also follows from these formulae that the right–hand part and the left–hand part have differently defined Gauss–transformations, namely

G_{left} = \begin{bmatrix} 1 & 0 \\ -* & 1 \end{bmatrix} \quad \text{and} \quad G_{right} = G_{left}^{-T} = \begin{bmatrix} 1 & * \\ 0 & 1 \end{bmatrix}.

In chapter 5, we developed a QRD–based algorithm for unconstrained optimal filtering in which the right hand side update for noise–only periods,

B(k+1) = \lambda_n^2 B(k) + (R^{-T}(k+1)v(k+1))\,v^T(k+1)(1-\lambda_n^2),

is calculated directly by a backsubstitution with an intermediate vector u(k+1) followed by a vector–vector multiplication,

R^T(k+1)u(k+1) = v(k+1), \qquad B(k+1) = \lambda_n^2 B(k) + u(k+1)v^T(k+1)(1-\lambda_n^2).

It is easily shown that this backsubstitution corresponds exactly to applying (6.11) and that the vector–vector multiplication corresponds to (6.13). The reorganized algorithm, however, is more easily converted into a QRD–LSL scheme, see section 6.3.

Since it is assumed that there is no speech present in noise–only mode, we could simply set the desired signal estimate to zero, i.e. \hat{d}(k+1) = 0. In practice, it is often required to also have an estimate available during noise periods, because complete silence could be disturbing to the listener (see also section 5.3.3). Hence a residual extraction scheme that also operates during noise–only mode is again needed. It is obtained as follows. Let G^T(k) be the matrix that combines all left hand part Gauss–transformations at time k. We can generate residuals by applying these left hand side transformations to an input of 0 applied to the right hand side; these transformations then do not update the right hand side. We write (with W^N(k) = R^{-1}(k)B(k))

\begin{bmatrix} R(k) & B(k) \\ 0 & r^T(k+1) \end{bmatrix} = G^T(k+1) \begin{bmatrix} R(k) & B(k) \\ \sqrt{1-\lambda_n^2}\,v^T(k+1) & 0 \end{bmatrix},

and, multiplying both sides from the right by \begin{bmatrix} -W^N(k) \\ I \end{bmatrix},

\begin{bmatrix} 0 \\ r^T(k+1) \end{bmatrix} = G^T(k+1) \begin{bmatrix} 0 \\ -\sqrt{1-\lambda_n^2}\,v^T(k+1)W^N(k) \end{bmatrix}.

Now we can obtain the 'speech signal' estimate:
\hat{d}^T(k+1) = \frac{\sqrt{1-\lambda_n^2}\,x^T(k+1) + r^T(k+1)}{\sqrt{1-\lambda_n^2}}.   (6.14)

An algorithm description can be found in Algorithm 15.

6.3 QRD–LSL based algorithm

Based upon the reorganization in the previous section, we can now derive a fast algorithm based upon QRD–LSL. First (section 6.3.1), the classification into speech+noise or noise–only must be done 'per sample' instead of per vector, in order to be able to maintain a shift structure in the input. After that (in section 6.3.2), the LSL–based algorithm is derived.

6.3.1 Per sample versus per vector classification

In the previous section we described a QRD–updating based scheme to calculate an optimal speech signal estimate from a noisy signal. The scheme is shown in Figure 6.1 for signal+noise input vectors, and in Figure 6.2 for noise–only input vectors.

Algorithm 15 The modified QRD–RLS algorithm (note the different Gauss–transformations in the left– and right hand side)

    QRDRLS_Mod(x, mode) {
      PiCos = 1;
      if mode = noise
        bIn = x[1]                               // right hand side input
        for i = 1:M*N {
          Gauss = CalcGauss(R[i][i], x[i]);
          for (j = i+1:M*N) { ApplyGauss1(Gauss, R[i][j], x[j]) }
          ApplyGauss2(Gauss, b[i], bIn)
        }
      else (mode = signal)
        bIn = 0;
        for i = 1:M*N {
          Givens = CalcGivens(R[i][i], x[i]);
          for (j = i+1:M*N) { ApplyGivens(Givens, R[i][j], x[j]) }
          ApplyGivens(Givens, b[i], bIn, PiCos)
        }
      return PiCos * bIn;
    }

Most of the time, namely within each noise–only segment and within each speech+noise segment, the input vector to (the left hand side part of) this algorithm is a shifted version of the input vector of the previous time step. (The right hand side part does not need a shift structure in order to derive a fast algorithm.) This may allow us to derive a fast implementation with a QRD–LSL structure. We note here that the equivalence between the signal flow graphs of Figure 2.4 and Figure 2.5 was stated in section 2.3.4 for the case where orthogonal Givens transformations are used, but it equally holds when Gauss transformations (cfr. noise–only mode) are used, as this is the limiting case with β → 0.

However, the shift structure of the input vectors to the signal flow graph is temporarily destroyed by a transition between modes if the classification into noise–only versus speech+noise periods is performed on a per–vector basis. When a transition occurs, two successive input vectors will indeed have different scalings (\sqrt{1-\lambda_s^2} versus \sqrt{1-\lambda_n^2}, and β → 0 versus β = 1). We therefore propose to do the noise–only/speech+noise classification on a per–sample basis. Each sample is then effectively given a flag f (f = 1 means signal+noise sample, f = 0 means noise–only sample, which is multiplied by β → 0), which it maintains while travelling through the signal flow graph, both horizontally through the delay line and vertically through successive transformations in a column. All transformations are also given a flag g that indicates whether the transformation is based on (calculated from) a sample from a noise–only period or a signal+noise period. This introduces the transitions gradually into the signal flow graph, which will then allow us to derive a fast algorithm. In this way, the first input vector of a noise period following a speech+noise period will be (including weightings)

[\beta\sqrt{1-\lambda_n^2}\,x_1(k), \ldots, \beta\sqrt{1-\lambda_n^2}\,x_M(k), \sqrt{1-\lambda_s^2}\,x_1(k-1), \ldots, \sqrt{1-\lambda_s^2}\,x_M(k-1), \sqrt{1-\lambda_s^2}\,x_1(k-2), \ldots]^T.   (6.15)

Similarly, the first input vector of a speech+noise period following a noise–only period is
[\sqrt{1-\lambda_s^2}\,x_1(k), \ldots, \sqrt{1-\lambda_s^2}\,x_M(k), \beta\sqrt{1-\lambda_n^2}\,x_1(k-1), \ldots, \beta\sqrt{1-\lambda_n^2}\,x_M(k-1), \beta\sqrt{1-\lambda_n^2}\,x_1(k-2), \ldots]^T.   (6.16)

The shift structure of the signal flow graph is then always preserved. Since the transition occurs gradually, some transformations in the graph will be calculated from inputs that have been multiplied by f = 0 (i.e. β → 0) and applied to inputs which have not, and vice versa. Therefore, we derive four 'rules' that can be used for the updates in the left hand part of the graph (the part that updates R(k)); a small sketch after this list illustrates the corresponding selection logic.

1. A transformation based upon a noise–only input sample (g = 0) and applied to a noise–only input sample (f = 0) can be replaced by a Gauss–transformation. (The Gauss–transformation is different for the left and right hand parts of the signal flow graph.)

2. A transformation based upon a noise–only input sample (g = 0) and applied to a speech+noise sample (f = 1) is replaced by \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

3. A transformation based upon a speech+noise input sample (g = 1) and applied to a noise–only sample (f = 0) is replaced by \begin{bmatrix} \cos\theta & 0 \\ -\sin\theta & 0 \end{bmatrix}.

4. A transformation based upon a speech+noise input sample (g = 1) and applied to a speech+noise sample (f = 1) is an ordinary Givens–rotation.
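Purely as an illustration of the bookkeeping implied by these four rules (the function and parameter names below are ours, not the thesis'), the following sketch selects the 2×2 transformation for the left hand (R–updating) part of the graph from the flags g and f.

    import numpy as np

    def left_transform(f, g, theta=None, v1=None, R11=None, lam_n=None):
        """Return the 2x2 transformation for the left hand (R-updating) part.
        g: flag of the sample the transformation was computed from,
        f: flag of the sample it is applied to (1 = speech+noise, 0 = noise-only)."""
        if g == 0 and f == 0:       # rule 1: Gauss transformation (left hand side form)
            a = np.sqrt(1.0 - lam_n**2) * v1 / R11
            return np.array([[1.0, 0.0], [-a, 1.0]])
        if g == 0 and f == 1:       # rule 2: identity
            return np.eye(2)
        if g == 1 and f == 0:       # rule 3: degenerate Givens rotation
            return np.array([[np.cos(theta), 0.0], [-np.sin(theta), 0.0]])
        # g == 1 and f == 1:        # rule 4: ordinary Givens rotation
        return np.array([[np.cos(theta),  np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])

    # Example: a rotation computed from a noise-only sample applied to a
    # speech+noise sample leaves that sample untouched (rule 2).
    print(left_transform(f=1, g=0) @ np.array([1.3, 0.5]))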
Rule 1 is proven in (6.11) and (6.13), where respectively the Gauss–transformations for the left hand part and the right hand part of the signal flow graph are shown. Rule 4 is the standard orthogonal update. Rule 2 is obvious from

\begin{bmatrix} R_{1j}(k+1) \\ v_j'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_s^2}\,x_j(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_s^2}\,x_j(k+1) \end{bmatrix},   (6.17)

and rule 3 is proven by

\begin{bmatrix} R_{1j}(k+1) \\ v_j'(k) \end{bmatrix} = \begin{bmatrix} \cos\theta_1(k) & \sin\theta_1(k) \\ -\sin\theta_1(k) & \cos\theta_1(k) \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \beta\sqrt{1-\lambda_n^2}\,v_j(k) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} \cos\theta_1(k) & 0 \\ -\sin\theta_1(k) & 0 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_n^2}\,v_j(k) \end{bmatrix}.   (6.18)

It should be noted that the rotations computed from input vector components that are multiplied by β do not have any effect on the elements of R(k); those components can be considered to be zero as far as the updates of R(k) are concerned. For the updates of R(k), the input signal can thus be considered to be

    time →   ∗ ∗ ∗ …    | 0 0 0 …           | ∗ ∗ ∗ …
             S+N updates | noise–only updates | S+N updates

with the transitions S+N → N and N → S+N in between. In this structure, a pre– and post–windowing of the signal that arrives in the estimation process for the correlation matrix square root R(k) is recognized. Based on these transformation rules, Figure 6.1 and Figure 6.2 (with per–vector classification) may be turned into Figure 6.3 (with per–sample classification).

[Figure 6.3: QRD–RLS scheme for acoustic noise cancellation in which the classification into noise–only or signal+noise is done sample by sample. Flag g shows whether a transformation is based upon a signal+noise or a noise–only sample; flag f is carried together with the sample through the SFG and shows whether the sample stems from a signal+noise or a noise–only period. The weighting is λ_s during signal+noise and λ_n during noise–only, and the hexagons are Givens– or Gauss–transformations depending on the flags f and g.]

An important aspect is that this scheme provides (as one can easily verify) an R(k) which effectively corresponds to the triangular factor obtained with Figure 2.4 (plain QRD–updating) when fed with the same sequence of input samples, be it that all noise–only samples are set to zero (pre– and post–windowing). In addition, the right hand side B(k) = R^{-T}(k)V^T(k)V(k) effectively has the same V(k) as in the per–vector classification case, i.e. it consists of (full) noise–only input vectors, as it should.

6.3.2 LSL–algorithm

The signal flow graph of Figure 6.3 is readily transformed into a QRD–LSL type signal flow graph, as shown in Figure 6.4. Note that during a signal+noise to noise–only transition, no residuals can be calculated. In practice, this means that there is no correct noise estimate available, which is audible as a small click in the output signal.
One could run a parallel lattice during transitions in order to be able to generate residuals; since some of the transformations are void during the transition (the right hand side need not be updated in the part of the graph where the Gauss–transformations have already been introduced), this does not double the complexity. Alternatively, one could simply insert 'comfort noise' instead (e.g. from a noise buffer), since the transitions are very short in time.

The complete algorithm is shown in Figure 6.4, where each sample at the input is accompanied by a flag f which is carried through the SFG along with the signal, and each rotation by a flag g. Flag f indicates whether the sample stems from a noise–only period or from a signal+noise period, while g indicates whether the transformation was calculated from a sample of a noise–only period or not. The combination of these flags determines whether a hexagon should be a Givens– or a Gauss–transformation, according to the rules given in section 6.3.1. The full specification can be found in Algorithm 16.

[Figure 6.4: QRD–LSL based unconstrained optimal filtering for acoustic noise suppression. The hexagons are either Givens–rotations or Gauss–rotations, depending upon the flags which designate whether the sample/rotation stems from a noise–only period or from a signal+noise period.]

6.4 Transitions

In this section we look in detail at what happens 'internally' in the algorithm during the different transitions. This information is not strictly necessary to implement the algorithm, but it serves to clarify its internal working.

6.4.1 Transition from speech+noise to noise–only mode

We first explain how the Givens–rotations are changed into Gauss–rotations during speech+noise to noise–only transitions. At the last sample of a speech+noise period, the first sample from a noise period enters the signal flow graph of the lattice algorithm, since the QRD–LSL 'looks ahead' one sample. Figure 2.5 can then be redrawn as in Figure 6.5 because of rule 2 (the rotation in the upper left corner has no effect) and rule 3 (as shown in the figure). Note that all rotations which are passed to the right hand side are still computed as they were in the signal+noise period. An equivalent QRD–RLS scheme would at this time still operate in signal+noise mode, and generate the same rotations and residuals (cfr. Figure 6.1). When more noise–only samples enter the graph, it can be redrawn as in Figure 6.6. The equivalent triangular scheme during the transition is shown in Figure 6.7. It should be noted that in the upper part of Figure 6.6, where the rotations are computed from input samples of the noise–only period, Gauss–transformations are already introduced into the signal flow graph, although they are not yet used to update the right hand side.
Algorithm 16 Fast QRD–LSL based noise cancellation algorithm

    QRDLSLNoise(x, mode)
      PiCos = 1; xl = x; xr = x; delay[0] = x;
      if mode = Noise {bIn = delay[1]} else {bIn = 0}
      for (int i = 0; i < N; i++)
        dxl = delay[i+1]; dxr = dxl;
        if (mode = Signal)
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        if (mode = Noise)
          Gauss = ComputeGauss(Comp2[i], dxr)
          ApplyGauss(Gauss, Rot2[i], xr, b[i], bIn, dNoiseWeight)
          // Note: different transformations on Rot2/xr and b/bIn !!!
          Gauss = ComputeGauss(Comp1[i], xl)
          ApplyGauss(Gauss, Rot1[i], dxl)
        if (mode = SigToNoise)
          if xr.IsFromNoisePeriod and dxr.IsFromNoisePeriod
            Gauss = ComputeGauss(Comp2[i], dxr, dWeight)
            ApplyGauss(Gauss, Rot2[i], xr, dWeight)
            // Left hand side still weighted during transition
            Gauss = ComputeGauss(Comp1[i], xl)
            ApplyGauss(Gauss, Rot1[i], dxl)
          else if xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
            xr = 0;
            // dxl is not changed here
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
          else if not xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        if (mode = NoiseToSig)
          if this is the first sample in a NoiseToSignal transition
            Gauss = ComputeGauss(Comp2[i], dxr)
            ApplyGauss(Gauss, b[i], bIn, dNoiseWeight)
            // xr is not modified in this case !!!
            dxl = 0;
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          else  // as in a Signal period:
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        xl = xr;
      for (int i = N-2; i >= 0; i--) {delay[i+1] = delay[i]}
      return bIn * PiCos;

[Figure 6.5: When u(k+1) is a noise sample (multiplied by β → 0), the lattice signal flow graph reduces to a simplified graph in which rule 2 removes the upper left rotation and rule 3 applies to the remaining upper rotations.]

[Figure 6.6: After some noise samples have entered the scheme, the upper part can be loaded with the Gauss–transformations in order to keep the shift structure.]

[Figure 6.7: The equivalent triangular scheme when the first noise–only sample enters.]

When all rotations are replaced by Gauss–rotations, the transition is finished, and one can start using the rotations to update the right hand side (as described in section 6.2.2). From that time on, the weighting of the memory elements is also stopped, because they are not updated anymore during the noise period and would otherwise become too small after a while.
6.4.2 Transition from a noise–only to a speech+noise period

This transition takes only one sample, and it has no effect on the residual extraction. In a first step, when the first signal+noise sample enters the (one step look ahead) input of the lattice, we can redraw the signal flow graph as shown in Figure 6.8. From now on, weighting is again switched on for all memory elements.

[Figure 6.8: First step in the transition from noise–only to speech+noise. The residuals are computed based on Gauss–rotations.]

At the next time instant, all rotations are replaced by Givens–rotations, and we again obtain the scheme of Figure 2.5.

6.5 Noise reduction vs. signal distortion trade–off

In the QRD–RLS based algorithm of chapter 5, we introduced a regularization parameter that allows tuning of signal distortion versus noise reduction. In this section we do the same for the QRD–LSL based algorithm. Two alternatives will be described: the first one (section 6.5.2) is comparable to the technique used in chapter 5 for QRD–RLS, the other one (section 6.5.3) is based upon continuously updating the signal correlation matrix, even during noise periods. This leads to a 'self tuning' trade–off parameter, which provides infinite noise reduction during noise–only periods and a well regularized algorithm during speech+noise periods.

6.5.1 Regularization in QRD–LSL based ANC

In section 5.4 we have shown how a regularization term µ can be introduced in a QRD–RLS based system for ANC, see equation (5.10). This has led to the following update equation:

\begin{bmatrix} 0 & r_2^T(k+1) \\ 0 & r_1^T(k+1) \\ R(k+1) & B(k+1) \end{bmatrix} = Q^T(k+1) \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \sqrt{1-\lambda_s^2}\,\mu^2 v^T(k) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \end{bmatrix}.   (6.19)

Here v(k) is taken from a noise buffer. The residual signal r_2(k+1) may be used to generate residuals, while the residual signal r_1(k+1) should be discarded. During noise–only periods, the updates for B(k) remain the same as in the non–regularized case.

It is important to see that the property on which the derivation of fast RLS schemes is based, namely the shift structure of the input signal, is no longer present in this case (where two consecutive updates are applied). Each x(k) is a shifted version of x(k−1), and each v(k) is a shifted version of v(k−1); but since they are applied to the left hand side of the signal flow graph intermittently, each input vector is not a shifted version of the previous one. Effectively, the input vectors now correspond to a (weighted) block Toeplitz structure instead of a plain Toeplitz structure.

Equation (6.19) can be implemented in signal flow graphs like Figure 6.1 and Figure 6.2 by applying both updates 'at the same time'. This is realized by replacing each single hexagon in the signal flow graph with two hexagons: the first one performs the rotation with the input signal, and the second one subsequently performs the rotation with the regularization noise. This is shown in Figure 6.9 for the hexagons representing Givens rotations; the same substitution can be applied for the rotations that represent Gauss rotations. As a result, since the number of hexagons doubles,
for each rotation parameter generated and applied in the original scheme (the thick gray arrows), two rotation parameters are now generated and applied in the modified scheme.

[Figure 6.9: Doubling the lines and the hexagons in the signal flow graphs; each hexagon receives a signal input and a regularization input.]

We will now describe two alternatives for implementing regularization in a QRD–LSL based noise cancellation algorithm. The first implementation uses a noise buffer and is based upon the QRD–LSL based noise cancellation algorithm derived earlier in this chapter (Algorithm 16). The second method is based upon a standard QRD–LSL adaptive filter; it avoids the use of a noise buffer and provides a regularization mechanism which puts more emphasis on noise cancellation during noise–only periods.

6.5.2 Regularization using a noise buffer

The fast algorithm which incorporates regularization can be straightforwardly derived from Figure 6.1 and Figure 6.2, modified as described in Figure 6.9. In Figure 6.10, the complete scheme is shown. Compare this scheme to Figure 6.4 and note that the thick lines are in fact 'vector signals' which carry 2–vectors with both signal and regularization noise samples, corresponding to Figure 6.9. Note also the extra column on the right hand side which calculates the residuals based upon the memory elements in the one–but–last right hand side column. The black arrows which depict the rotations are demultiplexed just before this column, and the rotations stemming from updates with regularization noise are discarded (cfr. Figure 6.9). This corresponds to discarding the residual r_1(k+1) in (6.19) and only retaining r_2(k+1).

During speech+noise/echo mode, an update is done with left hand side microphone inputs x(k) and left hand side regularization inputs µ^2 v(k) (taken from a noise buffer); the right hand side input is 0.

[Figure 6.10: Regularization in QRD–LSL based noise cancellation using a noise buffer.]

During noise/echo mode, both the left hand side inputs and the right hand side inputs are v(k). An algorithm description is given in Algorithm 17.

6.5.3 Mode–dependent regularization

As an alternative, we propose not to keep R(k) fixed during noise–only periods, but to update it continuously, with a forgetting factor λ_s (long window) during speech+noise periods and with forgetting factor λ_n (short window) during noise–only periods. In this case, the statistics of the near–end (desired speech signal) component are 'forgotten' by the weighting scheme during noise–only periods, but experiments show that this approach delivers good results concerning ANC. Simulations will be given in section 7.7, where this algorithm is applied to combined noise/echo cancellation. The statistics from the noise–only period (estimated with a short window) then serve as a good 'starting value' for the estimation of the speech+noise statistics during speech+noise periods (with a long window).
Because the statistics of the near end source are indeed forgotten during noise periods, the speech signal sounds a bit 'muffled' at the beginning of a speech+noise period; when the forgetting factors are chosen appropriately, this can be reduced to a hardly noticeable level. A feature of this approach is that during noise–only periods, the system output is reduced to zero. This can be understood as follows. If we allow R^T(k)R(k) to be updated also during noise periods, as we propose here, the influence of the speech+noise covariance estimate is gradually 'forgotten' during noise periods. This in fact corresponds to increasing µ in (5.10). The estimate then converges to V^T(k)V(k), which corresponds to µ → ∞ (hence W^N → I and W → 0) during noise–only periods. On the other hand, during speech+noise periods the regularization effect is gradually forgotten, resulting in a slight increase in the noise level during speech+noise, in exchange for a less distorted signal during signal+noise periods. Hence the speech signal may be slightly distorted at the beginning of a speech+noise period. This procedure can be thought of as a trade–off system that regulates itself: when there is no near end activity, it provides infinite noise reduction, but when a near end signal is present, the signal quality gains importance in the optimisation. A listening test shows that good results can be achieved after some tuning of the parameters.

Algorithm 17 QRD–LSL noise reduction algorithm with regularization. Inputs are 'mode' and the input vector 'x'.

    PiCos = 1; xl = x; xr = x; delay[0] = x;
    if mode = Noise { bIn = delay[1]; add x to noise buffer; bn = 0 }
    else { bIn = 0; bn = get noise vector from noise buffer; bn = µ * bn }
    extra = bIn; bnl = bn; bnr = bn; ndelay[0] = bn
    for (int i = 0; i < N; i++)
      dxl = delay[i+1]; dxr = dxl;
      dbnl = ndelay[i+1]; dbnr = dbnl;
      if (mode = Signal)
        Givens = ComputeGivens(Comp2[i], dxr, dWeight)
        ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
        Givens = ComputeGivens(Comp1[i], xl, dWeight)
        ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        Givens = ComputeGivens(Comp2[i], dbnr, 1)
        ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, 1)
        Givens = ComputeGivens(Comp1[i], bnl, dWeight)
        ApplyGivens(Givens, Rot1[i], dbnl, 1)
      if (mode = Noise)
        // different transformations on Rot2/xr and b/bIn
        Gauss = ComputeGauss(Comp2[i], dxr)
        ApplyGauss(Gauss, Rot2[i], xr, b[i], bIn, dNoiseWeight)
        Gauss = ComputeGauss(Comp1[i], xl)
        ApplyGauss(Gauss, Rot1[i], dxl)
      if (mode = SigToNoise)
        // Left hand side still weighted during transition
        if xr.IsFromNoisePeriod and dxr.IsFromNoisePeriod
          Gauss = ComputeGauss(Comp2[i], dxr, dWeight)
          ApplyGauss(Gauss, Rot2[i], xr, dWeight)
          Gauss = ComputeGauss(Comp1[i], xl)
          ApplyGauss(Gauss, Rot1[i], dxl)
        else if xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
          xr = 0; dbnr = 0;
          // dxl is not changed here
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
          Givens = ComputeGivens(Comp2[i], dbnr, dWeight)
          ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
        else if not xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          Givens = ComputeGivens(Comp2[i], dbnr, 1)
          ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
          Givens = ComputeGivens(Comp1[i], bnl, 1)
          ApplyGivens(Givens, Rot1[i], dbnl, 1)
      if (mode = NoiseToSig)
        // xr is not modified in this case
        if this is the first sample in a NoiseToSignal transition
          Gauss = ComputeGauss(Comp2[i], dxr)
          ApplyGauss(Gauss, b[i], bIn, dNoiseWeight)
          dxl = 0; dbnl = 0;
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          Givens = ComputeGivens(Comp1[i], bnl, 1)
          ApplyGivens(Givens, Rot1[i], dbnl, 1)
        else  // as in a Signal period:
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          Givens = ComputeGivens(Comp2[i], dbnr, dWeight)
          ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
          Givens = ComputeGivens(Comp1[i], bnl, dWeight)
          ApplyGivens(Givens, Rot1[i], dbnl, dWeight)
      xl = xr; bnl = bnr;
    for (int i = N-2; i >= 0; i--) {delay[i+1] = delay[i]; ndelay[i+1] = ndelay[i]}
    return extra * PiCos;

In order to derive a QRD–LSL based algorithm that continuously updates R(k), we can write system (6.2) with β = 1:

\begin{bmatrix} X(k) \\ V(k) \end{bmatrix} W^N = \begin{bmatrix} 0 \\ V(k) \end{bmatrix}.

The normal equations are (X^T(k)X(k) + V^T(k)V(k)) W^N = V^T(k)V(k). During speech+noise periods, the term V^T(k)V(k) in the left hand side becomes unimportant due to the weighting, and the system converges to X^T(k)X(k) W^N = V^T(k)V(k), or equivalently (V^T(k)V(k) + D^T(k)D(k)) W^N = V^T(k)V(k), where D(k) contains the desired speech signal. In a QRD–LSL filter, this is achieved by first weighting both the left hand side and the right hand side with λ_s, and then applying a left hand side input u(k) and a right hand side input 0.

During noise–only periods, the term X^T(k)X(k) in the left hand side is 'forgotten', and the system converges to V^T(k)V(k) W^N = V^T(k)V(k), such that after convergence W^N = I, providing a 'perfect' noise estimate. In this mode, both the left and right hand sides of the QRD–LSL adaptive filter are weighted with λ_n, and the input v^T(k) is applied to the left hand side as well as to the right hand side. Note that both the left hand side R(k) and the right hand side B(k) are updated together. This is possible since during noise–only periods R(k) converges to the Cholesky factor of the noise correlation matrix (because the desired speech statistics are forgotten due to the weighting). After convergence, we can write the right hand side during noise–only periods as

B(k) = R^{-T}(k)V^T(k)V(k) = R^{-T}(k)R^T(k)R(k) = R(k).

So the right hand side converges to R(k) (or, in a practical implementation, to the column of R(k) which corresponds to the chosen right hand side). This can indeed be achieved by applying the input vectors to the left hand side and the right hand side together.
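The self–regulating behaviour just described can be illustrated with the following toy sketch (NumPy, made-up names and exaggerated forgetting factors; it only mirrors the recursions above and is not the thesis' implementation). The correlation estimate is updated with λ_s during speech+noise and with λ_n during noise–only periods; during a long noise–only stretch the resulting W^N drifts towards the identity, so W → 0 and the output is muted.

    import numpy as np

    rng = np.random.default_rng(1)
    MN = 4
    lam_s, lam_n = 0.999, 0.97          # exaggerated for a short toy run

    Phi = np.eye(MN)                    # running estimate of R^T(k) R(k)
    Psi = 0.5 * np.eye(MN)              # running estimate of V^T(k) V(k)

    def update(Phi, Psi, x, noise_only):
        lam = lam_n if noise_only else lam_s
        Phi = lam**2 * Phi + (1 - lam**2) * np.outer(x, x)   # R^T R updated in both modes
        if noise_only:
            Psi = lam_n**2 * Psi + (1 - lam_n**2) * np.outer(x, x)
        return Phi, Psi

    for k in range(2000):               # speech+noise period
        x = rng.standard_normal(MN) + 0.3 * rng.standard_normal(MN)
        Phi, Psi = update(Phi, Psi, x, noise_only=False)
    for k in range(2000):               # long noise-only period
        v = 0.3 * rng.standard_normal(MN)
        Phi, Psi = update(Phi, Psi, v, noise_only=True)

    WN = np.linalg.solve(Phi, Psi)
    print("||W^N - I|| after a long noise-only stretch:",
          np.linalg.norm(WN - np.eye(MN)))   # small: W -> 0, output muted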
Transitions between modes. When R(k) is continuously updated, the choice of λ_s and λ_n is very important. During speech+noise periods, λ_s should be chosen close enough to 1 (e.g. λ_s = 0.999999 for an 8000 Hz sampling rate). During noise–only periods, λ_n can be chosen smaller (a shorter window) for many types of noise (e.g. λ_n = 0.9997 for an 8000 Hz sampling rate), so that convergence during noise–only periods is very fast. On transitions between modes, the weighting is switched between λ_n and λ_s. In a QRD–LSL filter, the shift structure of the input signal must be maintained, which is not the case if the classification into signal+noise and noise–only periods is done on a per–input–vector basis. We can solve this by introducing a pre– and post–windowing scheme for the input vectors: on a transition, we feed N zeroes to the algorithm's input and then switch the weighting parameters. This means that the residual signal is wrong during the transition period, but the estimates of R(k) in the algorithm remain correct. The lack of a correct output during transitions can be solved by inserting comfort noise. On the other hand, experiments show that good results are also obtained when the pre– and post–windowing is ignored and the weighting factors are simply switched at a transition; errors that are introduced in the estimate of R(k) in this way are 'forgotten' by the weighting. For an algorithm description, we refer to the QRD–LSL algorithm (Algorithm 6) with inputs as described in this section.

6.6 Complexity

In the complexity calculations, an addition and a multiplication are counted as two separate floating point operations. Table 6.1 shows the complexities of the different optimal filtering algorithms, and Table 6.2 shows the complexities for some typical parameter settings. The QRD–LSL algorithm has a significantly lower computational complexity than the GSVD–based and QRD–RLS–based algorithms, especially when long filters are used (rightmost column of Table 6.2). This makes the QRD–LSL algorithm suited for real time implementation.

Table 6.1: Complexities in flops per sample of the different algorithms.

    Algorithm                            Mode            Complexity
    recursive GSVD [12][13]              —               27.5 (MN)^2
    Full QRD (chapter 5)                 Noise–only      (MN)^2 + 3MN + M
    Full QRD (chapter 5)                 Speech+noise    3.5 (MN)^2 + 15.5 MN + M + 2
    Fast QRD–LSL                         Noise–only      6 M^2 N
    Fast QRD–LSL                         Speech+noise    (21N − 21/2) M^2 + 19MN − (7/2) M
    Fast QRD–LSL reg. (section 6.5.2)    Speech+noise    2 ((21N − 21/2) M^2 + 19MN − (7/2) M)
    QRD–LSL (section 6.5.3)              —               (21N − 21/2) M^2 + 19MN − (7/2) M

Table 6.2: Complexities in flops per sample for typical parameter settings. These figures make the QRD–LSL algorithm suited for real time implementation.

    Algorithm                  Mode            N = 20, M = 5    N = 50, M = 2
    recursive GSVD [12][13]    —               275 000          275 000
    Full QRD (chapter 5)       Noise–only      10 305           10 302
    Full QRD (chapter 5)       Speech+noise    36 557           36 554
    Fast QRD–LSL               Noise–only      3 000            1 200
    Fast QRD–LSL               Speech+noise    12 120           6 051
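As a quick cross-check of Table 6.2, the small sketch below (our own helper functions, simply evaluating the expressions from Table 6.1) reproduces the per-sample flop counts for the two parameter settings.

    def gsvd(M, N):            return 27.5 * (M * N) ** 2
    def full_qrd_noise(M, N):  return (M * N) ** 2 + 3 * M * N + M
    def full_qrd_speech(M, N): return 3.5 * (M * N) ** 2 + 15.5 * M * N + M + 2
    def lsl_noise(M, N):       return 6 * M ** 2 * N
    def lsl_speech(M, N):      return (21 * N - 21 / 2) * M ** 2 + 19 * M * N - 7 / 2 * M

    for (N, M) in [(20, 5), (50, 2)]:
        print(f"N={N}, M={M}:", gsvd(M, N), full_qrd_noise(M, N),
              full_qrd_speech(M, N), lsl_noise(M, N), lsl_speech(M, N))
    # N=20, M=5: 275000, 10305, 36557, 3000, 12120   (matches Table 6.2)
    # N=50, M=2: 275000, 10302, 36554, 1200, 6051    (matches Table 6.2)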
6.7 Simulation results

For the simulations we used a simulated room environment, 4 microphones, a desired speaker at broadside angle and a noise source at 45 degrees. The signals are short sentences, recorded at 8 kHz. Figure 6.11 compares the QRD–LSL–based optimal filtering method (without regularization) to the GSVD–based optimal filtering method. The QRD–based algorithm achieves roughly the same performance as the GSVD–based method, as expected.

[Figure 6.11: signal energy (dB) versus time (samples) for GSVD–based optimal filtering (dotted), QRD–LSL–based optimal filtering (full line) and the original input signal (dashed). The performance is equal. In the middle of the plot a speech segment is recognized; the algorithm 'sees' the silence at the beginning of the plot also as a speech segment, in order to provide a fair indication of the noise reduction during speech periods.]

6.8 Conclusion

We derived a fast QRD–least squares lattice (QRD–LSL) based unconstrained optimal filtering algorithm for multichannel ANC. The derivation of the QRD–LSL algorithm is based on a significantly reorganized version of the QRD–RLS–based unconstrained optimal filtering scheme of chapter 5. We have explicitly set up the transitions between speech+noise and noise–only periods in such a way that the correlation matrices that are implicitly stored in this fast algorithm correspond to the correlation matrices in the QRD–RLS based algorithm, which assures that the 'internal status' of the algorithm is always correct. For typical parameter settings, an 8–fold complexity reduction is obtained compared to the QRD–RLS based algorithm, without any performance penalty. This makes the approach affordable for real time implementation. Some methods for incorporating regularization were also introduced, allowing more noise reduction to be obtained in exchange for some signal distortion.

Chapter 7
Integrated noise and echo cancellation

In this chapter, we describe an approach to speech signal enhancement in which acoustic echo cancellation and noise reduction, which are traditionally handled separately, are combined into one integrated scheme. The optimization problem defined by this scheme is solved adaptively using the QRD–based algorithms which were developed in the previous chapters. We show that the performance of the integrated scheme is superior to the performance of traditional (cascading) schemes, while complexity is kept at an affordable level.

7.1 Introduction

An acoustic echo canceller (AEC) traditionally uses an adaptive filter with a large number of filter taps, for instance 1000 taps for a signal sampled at 8000 Hz. The reason is that it aims to model the (first part of the) acoustic impulse response of the room, in this case the first 125 msec. Because of the length of the filter, one often has to resort to cheap algorithms (frequency domain NLMS, for example) in order to keep complexity manageable. Acoustic noise cancellers (ANC) typically have shorter filters; delay–and–sum beamformers, for example, do not 'model' the room impulse response but are designed to have a certain spatial sensitivity pattern, which can be obtained with relatively short filter lengths. We are interested in ANC schemes that use multiple channels of audio (multiple microphones) in order to exploit both the spatial and the spectral characteristics of the desired and disturbing signals.

In many applications, for example teleconferencing systems, hands–free telephone sets or voice controlled systems, one has to combine acoustic echo and noise cancellation (AENC). Many different AENC schemes can be found in the literature [1, 7, 37, 38, 6]. Obviously, the combination of both blocks can be done in two ways, as shown in Figure 7.1: either one applies echo cancellation on each of the microphone channels before the noise reduction block, or one applies a single echo canceller on the output signal of the noise reduction block. The latter scheme has the advantage of reduced complexity, but studies have shown that the former combination (first AEC, then ANC) has better performance.

[Figure 7.1: Two ways to combine an acoustic echo canceller with a multichannel noise reduction system. Left: first noise reduction, then echo cancellation on the ANC output. Right: first an echo canceller on each channel, then noise reduction on the residual signals.]
The mere combination (cascading) of these schemes has implications for the performance of the overall system. When AEC filters are applied on each channel before the ANC, the adaptive algorithms used in the AEC should be robust against the noise in the microphone signals (which is, for example, a problem for filters based upon affine projection, see chapter 3 and [27]). It is then also the ANC's task to remove the residual echo independently of the AEC. If, on the other hand, ANC is applied before AEC, the ANC is fed with a signal that also contains the far end echo signal, and the AEC has to track both the (changing) acoustic path of the room and the changes in the ANC filter. So both combination schemes clearly have their disadvantages with respect to performance.

In this chapter, we propose to combine the AEC and the ANC into one single optimisation problem which is then solved adaptively, see Figure 7.2. This leads to a better overall performance. It will be shown that the length Naec of the 'AEC part' of the integrated scheme can be reduced significantly compared to the filter length in traditional echo cancellers, without incurring a major performance loss. The reduced filter length then allows us to use more advanced adaptive algorithms, which have better convergence properties than e.g. NLMS. The algorithms in this chapter are based upon the QRD–based unconstrained optimal filtering methods for ANC described in chapters 5 and 6.

If multichannel acoustic echo cancellation were required, one would face the same problems as in chapter 3: decorrelation techniques would again have to be applied to remove the correlation between the loudspeaker signals. In this chapter we abstract from this and demonstrate the combined approach with mono echo cancellation.

The outline of this chapter is as follows. In section 7.2 we describe the setup and the optimization that will be performed. In section 7.3 the estimates for the statistics are described. Section 7.4 describes the QRD–RLS based algorithm that implements the optimization, and section 7.5 describes the QRD–LSL approach. In section 7.6 we describe how regularization (a trade–off parameter) is introduced. Section 7.7 evaluates the performance of the combined acoustic echo and noise canceller, section 7.8 gives complexity figures, and section 7.9 gives conclusions.

[Figure 7.2: Combined AENC scheme with a far end signal f, M microphone inputs x_1 … x_M feeding filters w_1 … w_M of length N (the ANC part), and a filter of length Naec connected to the far end path (the AEC part), summed into the output y. The speech component in the microphone signals is the (unknown) desired output signal d; the original speech signal is s.]

7.2 Optimal filtering based AENC

Referring to Figure 7.2, the near end speech component in the i'th microphone at time k is

d_i(k) = h_i(k) ⊗ s(k), \qquad i = 1 \ldots M,   (7.1)

where M is the number of microphones, s(k) is the near end signal, h_i(k) represents the acoustic path between the speech source and microphone i, and ⊗ denotes convolution. The echo signals
An assumption we will make is that both the noise and the echo signal are continuously present, which effectively means that in the resulting scheme filter adaptation will be frozen during off–periods of the echo and/or noise (see below). Then we can distinguish 2 modes in the input signals : first the speech+noise/echo mode, for which we will denote the microphone samples with x(k), and second the noise/echo–only mode, for which we write the inputs as x0 (k). The i’th microphone signal during a speech+noise/echo period is xi (k) = di (k) + ni (k) + ei (k) i = 1 . . . M = di (k) + vi (k), and during a noise/echo–only period x0i (k) = ni (k) + ei (k) = vi (k), where ni (k) is the noise component (sum of the contributions of all noise sources at microphone i). We define the microphone input vector x(k) = x1 (k) x2 (k) .. . xM (k) xi (k) = xi(k) xi (k − 1) .. . xi (k − N + 1) , where N is the number of taps for each of the filters in the ANC–part of the scheme. The noise/echo only microphone signal vector x0 (k), the desired speech vector d(k), the echo signal vector e(k) and the noise vectors n(k) are defined in a similar way. Furthermore x(k) = d(k) + v(k) and v(k) = n(k) + e(k). The loudspeaker signal vector is f (k) f (k − 1) f (k) = , .. . f (k − Naec + 1) where Naec is the number of taps in the AEC–part. We define a compound signal vector during speech+noise/echo periods as x(k) u(k) = , f (k) 147 7.2. OPTIMAL FILTERING BASED AENC and during noise/echo–only periods 0 u (k) = x0 (k) f (k) . The following assumptions are made : • The noise and echo signals are uncorrelated with the speech signal. This results in ε{x(k)xT (k)} = ε{d(k)dT (k)} + ε{cross terms} +ε{v(k)vT (k)} | {z } =0 ⇓ T ε{d(k)d (k)} = ε{x(k)xT (k)} − ε{v(k)vT (k)}, and ε{f (k)dT (k)} = 0. Here ε{·} is the expectation operator. • The noise and echo signals are stationary as compared to the near end speech signal (by which we mean that their statistics change slower). This assumption allows us to estimate ε{v(k)vT (k)} during periods in which only noise and echo are present, i.e. where x0 (k) = v(k). This is a classical assumption for ANC systems, but it can be argumented that it does not hold here since the echo signal e(k) typically is not stationary. However, allthough the assumption is not fullfilled for the spectral content of v(k), it is true for the spatial content, since we assume that the loudspeaker which produces f (k) does not move. Experiments confirm the validity of this assumption (see section 7.7). • The noise and echo signals are always present while the near–end signal is sometimes present, i.e. is an on/off signal. One scenario in which this assumption is obviously fullfilled is a voice command application where the echo signal is e.g. a music signal and the near end signal consists of the voice commands. When the echo signal is also a speech signal, hence also an on/off signal, we can either switch off the adaptation during periods where the far end signal is not present (as is done in traditional echo cancellers), or (when adaptation is not switched off) allow that the algorithm — during long off periods of the echo signal — ’forgets’ the position of the far end loudspeaker to which it would normally — in a beam forming interpretation — attempt to steer a zero. We can now write the optimal filtering problem as 2 min ε uT (k)Wwf − dT (k) F , Wwf (7.2) 148 CHAPTER 7. INTEGRATED NOISE AND ECHO CANCELLATION with u(k) the filter input and d(k) the desired filter output, i.e. 
We can now write the optimal filtering problem as

\min_{W_{wf}} \varepsilon\{\| u^T(k)W_{wf} - d^T(k) \|_F^2\},   (7.2)

with u(k) the filter input and d(k) the desired filter output, i.e. the (unknown) desired speech contribution in all the (delayed) microphone signals, see (7.1). The signal estimate is then

\hat{d}^T_{wf}(k) = u^T(k)W_{wf}(k) = \begin{bmatrix} d(k)+v(k) \\ f(k) \end{bmatrix}^T W_{wf}(k).

The Wiener solution is

W_{wf} = (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u(k)d^T(k)\} = (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u(k)(u^T(k)\begin{bmatrix} I \\ 0 \end{bmatrix} - v^T(k))\} = \begin{bmatrix} I \\ 0 \end{bmatrix} - (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u(k)v^T(k)\},

so that finally

W_{wf} = \begin{bmatrix} I \\ 0 \end{bmatrix} - (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u'(k)x'^T(k)\}.   (7.3)

Here I is the identity matrix. We will also use a regularization term in the optimization criterion. Referring to [22], a parameter µ can be used to trade off signal distortion, defined as (d^T(k) − [d^T(k)\ 0]\,W^\mu_{wf}(k)), versus residual noise/echo, defined as ([v^T(k)\ f^T(k)]\,W^\mu_{wf}). We will use a similar, but slightly different, approach. We define the optimization criterion

\min_{W^\mu_{wf}} \varepsilon\{\| u^T(k)W^\mu_{wf}(k) - d^T(k) \|_F^2\} + \mu^2\,\varepsilon\{\| [v^T(k)\ f^T(k)]\,W^\mu_{wf}(k) \|_F^2\}.   (7.4)

Now the Wiener solution is

W^\mu_{wf} = (\varepsilon\{u(k)u^T(k) + \mu^2 u'(k)u'^T(k)\})^{-1}\,\varepsilon\{u(k)d^T(k)\},

and, using ε{u(k)d^T(k)} = ε{u(k)u^T(k)}\begin{bmatrix} I \\ 0 \end{bmatrix} − ε{u(k)v^T(k)} together with ε{u(k)v^T(k)} = ε{u'(k)x'^T(k)} and ε{µ^2 u'(k)u'^T(k)}\begin{bmatrix} I \\ 0 \end{bmatrix} = µ^2\,ε{u'(k)x'^T(k)}, this reduces to

W^\mu_{wf} = \begin{bmatrix} I \\ 0 \end{bmatrix} - \left( \frac{1}{1+\mu^2}\,\varepsilon\{u(k)u^T(k)\} + \frac{\mu^2}{1+\mu^2}\,\varepsilon\{u'(k)u'^T(k)\} \right)^{-1} \varepsilon\{u'(k)x'^T(k)\}.   (7.5)

If all statistical quantities in the above formulas were available, W_{wf} and W^\mu_{wf} could straightforwardly be computed. W_{wf} (or W^\mu_{wf}) is then a matrix of which each column provides an optimal (MN + Naec)–taps filter. One of these columns can be chosen (arbitrarily) to optimally estimate the speech part in the corresponding entry of x(k), i.e. to filter out the noise/echo in one specific (delayed) microphone signal. In practice, of course, not the whole matrix is calculated, but only one selected column.

7.3 Data driven approach

A data driven approach is based on data matrices U(k), U'(k) and X'(k), conveniently defined as

U(k) = \sqrt{1-\lambda_s^2} \begin{bmatrix} u^T(k) \\ \lambda_s u^T(k-1) \\ \lambda_s^2 u^T(k-2) \\ \vdots \end{bmatrix},   (7.6)

U'(k) = \sqrt{1-\lambda_n^2} \begin{bmatrix} u'^T(k) \\ \lambda_n u'^T(k-1) \\ \lambda_n^2 u'^T(k-2) \\ \vdots \end{bmatrix},   (7.7)

X'(k) = U'(k) \begin{bmatrix} I \\ 0 \end{bmatrix},

where λ_s denotes the forgetting factor for the speech+noise/echo data, and λ_n the forgetting factor for the noise/echo–only data. In order to compute (7.3), we want U^T(k)U(k) to be an estimate of ε{u(k)u^T(k)}, i.e. ε{u(k)u^T(k)} ≈ U^T(k)U(k). This is realised by the above definition of U(k), as can be verified from the corresponding update formula

U^T(k+1)U(k+1) = \lambda_s^2 U^T(k)U(k) + (1-\lambda_s^2)\,u(k+1)u^T(k+1).   (7.8)

Such updates may be calculated during speech+noise/echo periods. The other estimate we need in (7.3) is ε{u'(k)x'^T(k)} ≈ U'^T(k)X'(k). This is realized by the definitions of U'(k) and X'(k), as can be verified from the corresponding update formula

U'^T(k+1)X'(k+1) = \lambda_n^2 U'^T(k)X'(k) + (1-\lambda_n^2)\,u'(k+1)x'^T(k+1),   (7.9)

which can be calculated during noise/echo–only periods. Although (7.8) and (7.9) ensure that the estimates are correct, it may be interesting in a practical application to divide both (7.8) and (7.9) by min((1−λ_n^2), (1−λ_s^2)), in order to avoid multiplication and division by very small numbers, and then correct for this in the final result.
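The recursions (7.8) and (7.9) are plain exponentially weighted rank–one updates. The sketch below (toy data, our own names; the chosen right hand side column is arbitrary) applies them directly and prints the rescaling factor mentioned above.

    import numpy as np

    rng = np.random.default_rng(3)
    L = 5                                   # toy dimension of u(k)
    lam_s, lam_n = 0.999, 0.997
    scale = min(1 - lam_n**2, 1 - lam_s**2) # optional rescaling to avoid tiny numbers

    Ruu = np.zeros((L, L))                  # estimate of U^T(k) U(k), eq. (7.8)
    Rux = np.zeros((L, 1))                  # one column of U'^T(k) X'(k), eq. (7.9)

    def speech_update(Ruu, u):
        return lam_s**2 * Ruu + (1 - lam_s**2) * np.outer(u, u)

    def noise_update(Rux, u_prime, x_prime_col):
        return lam_n**2 * Rux + (1 - lam_n**2) * np.outer(u_prime, x_prime_col)

    for _ in range(1000):                   # speech+noise/echo samples
        Ruu = speech_update(Ruu, rng.standard_normal(L))
    for _ in range(1000):                   # noise/echo-only samples
        up = rng.standard_normal(L)
        Rux = noise_update(Rux, up, up[:1]) # desired column: first entry of x'(k)

    print(np.trace(Ruu), Rux.ravel()[:3])
    print("a rescaled variant would divide both updates by", scale)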
For the theoretical derivation of the algorithm, we will continue to work with the unmodified equations (7.8) and (7.9).

7.4 QRD-RLS based algorithm

Using the QR decomposition $U(k) = Q(k)R(k)$, so that $U^T(k)U(k) = R^T(k)R(k)$, we can write (7.3) as

$$W(k) = \begin{bmatrix}I\\0\end{bmatrix} - \left(R^T(k)R(k)\right)^{-1}U'^T(k)X'(k) = \begin{bmatrix}I\\0\end{bmatrix} - R^{-1}(k)R^{-T}(k)U'^T(k)X'(k). \qquad (7.10)$$

Define $W^N(k) = \begin{bmatrix}I\\0\end{bmatrix} - W(k)$; then

$$W^N(k) = R^{-1}(k)\underbrace{R^{-T}(k)U'^T(k)X'(k)}_{\equiv B(k)}. \qquad (7.11)$$

We will again store and update both $R(k)$ and $B(k)$, so that at any time $W^N(k)$ can be computed by backsubstitution:

$$R(k)\,W^N(k) = B(k). \qquad (7.12)$$

Only one column of $B(k)$ has to be stored and updated, thus providing a signal or noise/echo estimate for the corresponding microphone signal.

7.4.1 Speech+noise/echo updates

$R(k)$ and $B(k)$ can be updated as

$$\begin{bmatrix} R(k+1) & B(k+1) \\ 0 & r^T(k+1) \end{bmatrix} = Q^T(k+1)\begin{bmatrix} \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \\ \sqrt{1-\lambda_s^2}\,u^T(k+1) & 0 \end{bmatrix}. \qquad (7.13)$$

The optimal filter coefficients can be computed by backsubstitution (equation (7.12)), or the least squares residuals can be obtained by multiplying the elements of $r(k+1)$ in formula (7.13) by the product of the cosines of the Givens rotation angles. For details, we refer to chapter 5. This results in the signal flow graph (SFG) in Figure 7.3 for M = 2, N = 4, Naec = 6. All signal flow graphs shown in this chapter again have rearranged input vectors $\tilde{u}(k)$ instead of $u(k)$, as follows:

$$\tilde{u}(k) = \begin{bmatrix} x_1(k) \\ \vdots \\ x_M(k) \\ f(k) \\ \hline x_1(k-1) \\ \vdots \\ x_M(k-N+1) \\ f(k-N+1) \\ \hline f(k-N) \\ \vdots \\ f(k-N_{aec}+1) \end{bmatrix},$$

i.e. the microphone and loudspeaker taps are interleaved per lag up to lag $N-1$, followed by the remaining loudspeaker taps. The residuals $y_n(k)$ generated by this SFG are the noise+echo signal estimates:

$$y_n^T(k+1) = \left(0 - u^T(k+1)\,W^N(k+1)\right)\sqrt{1-\lambda_s^2}. \qquad (7.14)$$

The overall output signal (the estimate for the near-end speech signal) can then be written as

$$\hat{d}(k+1) = \begin{bmatrix} u_1(k+1) \\ u_2(k+1) \\ \vdots \\ u_M(k+1) \end{bmatrix} - \frac{y_n(k+1)}{\sqrt{1-\lambda_s^2}}.$$

7.4.2 Noise/echo-only updates

During noise/echo-only periods, $R(k)$ remains unchanged, while $B(k) = R^{-T}(k)U'^T(k)X'(k)$ has to be updated. From equation (7.9), we find that

$$B(k+1) = \lambda_n^2\,B(k) + \left(R^{-T}(k+1)\sqrt{1-\lambda_n^2}\,u'(k+1)\right)\sqrt{1-\lambda_n^2}\,x'^T(k+1).$$

Given $R(k+1)$, we can compute $a(k+1) = R^{-T}(k+1)u'(k+1)$ by substitution in the triangular system

$$R^T(k+1)\,a(k+1) = u'(k+1),$$

so that

$$B(k+1) = \lambda_n^2\,B(k) + (1-\lambda_n^2)\,a(k+1)\,x'^T(k+1),$$

which should be substituted into the memory cells on the right hand side of Figure 7.3 during noise/echo-only mode.

Figure 7.3: Signal flow graph for residual extraction (inputs: mic 1 $x_1(k)$, mic 2 $x_2(k)$ and the echo reference $f(k)$; the graph consists of Givens rotation cells, $\lambda$- and $1/\lambda$-weighted memory cells and a $\Pi\cos\theta$ residual extraction stage producing the LS residual $y_1(k)$). Naec = 6, N = 4, M = 2. The signal flow graph is executed during speech+noise/echo mode, while only the memory elements in the right hand side frame are updated during noise/echo-only mode (as described in section 7.4.2).

It is, just as in chapter 5, also possible to generate residuals in noise/echo-only mode, by executing the signal flow graph in 'frozen mode'. A small numerical sketch of the noise/echo-only update of $B(k)$ is given below.
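The following sketch spells out the noise/echo-only update of section 7.4.2 and the backsubstitution (7.12). The triangular factor, the stored column of B and all data are random toy values, and the sizes are assumptions chosen to match the Figure 7.3 example; this is an illustration of the update formulas, not of the full signal flow graph.

import numpy as np
from scipy.linalg import solve_triangular

def noise_only_B_update(R, B, u0, x0, lam_n):
    # Section 7.4.2: R stays fixed during noise/echo-only periods, while
    # B <- lam_n^2 * B + (1 - lam_n^2) * a x'^T   with   R^T a = u'(k+1).
    a = solve_triangular(R, u0, trans='T', lower=False)    # solve R^T a = u0
    return lam_n**2 * B + (1 - lam_n**2) * np.outer(a, x0)

def solve_WN(R, B):
    # Backsubstitution (7.12): R W^N = B (one or more stored columns of B).
    return solve_triangular(R, B, lower=False)

# Toy usage with a well conditioned random upper triangular R of size M*N + Naec = 14.
rng = np.random.default_rng(1)
dim = 14
R = np.triu(rng.standard_normal((dim, dim))) + 5.0 * np.eye(dim)
B = rng.standard_normal((dim, 1))                           # one stored column of B(k)
B = noise_only_B_update(R, B,
                        u0=rng.standard_normal(dim),        # u'(k+1)
                        x0=rng.standard_normal(1),          # x'(k+1) for the selected mic
                        lam_n=0.9997)
WN = solve_WN(R, B)                                         # one column of W^N(k+1)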
An algorithm description can be found in Algorithm 18.

Algorithm 18: QRD-RLS algorithm for AENC

QRDRLS_AENC_update (R, z, x, r, Weight)
{
  // R : (M*N + Naec) x (M*N + Naec) upper triangular factor
  // z : stored right hand side column (one column of B)
  // x : input vector (u(k) in speech+noise/echo mode, u'(k) in noise/echo mode)
  // r : right hand side input; r = x[1] in signal mode, r = 0 in noise/echo mode
  // Weight : forgetting factor (lambda_s or lambda_n)
  PiCos = 1;
  for (i = 0; i < M * N + Naec; i++) {
    R[i][i] *= Weight;
    temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]);
    sinTheta = x[i] / temp;
    cosTheta = R[i][i] / temp;
    R[i][i] = temp;                          // updated diagonal element
    for (j = i + 1; j < M * N + Naec; j++) {
      temp = R[i][j] * Weight;
      R[i][j] = cosTheta * temp + sinTheta * x[j];
      x[j]   = -sinTheta * temp + cosTheta * x[j];
    }
    temp = z[i] / Weight;                    // right hand side column is weighted by 1/Weight
    z[i] = cosTheta * temp + sinTheta * r;
    r    = -sinTheta * temp + cosTheta * r;
    PiCos *= cosTheta;
  }
  return r * PiCos;                          // least squares residual
}

7.5 QRD-LSL algorithm

We will use the QRD-LSL based algorithm for acoustic noise cancellation which was derived in chapter 6 as a basis for the QRD-LSL based algorithm for combined echo and noise cancellation. Consider the alternative minimization problem with weighting schemes (7.8) and (7.9):

$$\min_{W^N_{fast}} \left\| \begin{bmatrix} U(k) \\ \beta\,U'(k) \end{bmatrix} W^N_{fast} - \begin{bmatrix} 0 \\ \frac{1}{\beta}X'(k) \end{bmatrix} \right\|_F. \qquad (7.15)$$

The normal equations for this system are

$$\left(U^T(k)U(k) + \beta^2\,U'^T(k)U'(k)\right)W^N_{fast}(k) = U'^T(k)X'(k). \qquad (7.16)$$

These can be solved for $W^N_{fast}(k)$. If $\beta \rightarrow 0$, then $W^N_{fast}(k) \rightarrow W^N(k)$ (a small numerical illustration of this limit is given below). This scheme is updated with $u(k)$ as input and 0 as desired signal during speech+noise/echo periods, and with $u'(k)$ as input and $x'(k)$ as desired signal during noise/echo-only periods. Residual extraction can be used to obtain noise estimates, which can then be subtracted from the input signal in order to obtain clean signal estimates.

Since one will often want to use more filter taps for the AEC part than for the ANC part ($N_{aec}$ can be made larger than $N$), one can alternatively use a QRD-LSL scheme with unequal channel lengths. We refer to [46] for details; examples of signal flow graphs with unequal channel lengths are shown below. For an algorithm description, we refer to Algorithm 16, with inputs as described in this section (the input vector is extended with a channel containing the echo reference signal).

7.6 Regularized AENC

Experiments show that better noise/echo cancellation can be obtained by modifying the optimization function so that more emphasis is put on the noise/echo cancellation term, at the expense of increased signal distortion. In fact, this regularization is indispensable when combined echo and noise cancellation is involved. The corresponding optimization problem is given in (5.10). The update equation becomes

$$\begin{bmatrix} R(k+1) & B(k+1) \\ 0 & r_1^T(k+1) \\ 0 & r_2^T(k+1) \end{bmatrix} = Q^T(k+1)\begin{bmatrix} \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \\ \sqrt{1-\lambda_s^2}\,u^T(k+1) & 0 \\ \sqrt{1-\lambda_s^2}\,\mu^2\,u'^T(k) & 0 \end{bmatrix}, \qquad (7.17)$$

where the regularization input $u'(k)$ is taken from a noise buffer. During noise-only periods, $B(k)$ is updated as in section 7.4.2.

7.6.1 Regularization using a noise/echo buffer

The QRD-LSL based noise cancellation algorithm with regularization from section 6.5.2 in chapter 6 can be used as a basis for a QRD-LSL based acoustic noise and echo cancelling algorithm. If $N_{aec} > N$, a QRD-LSL structure with unequal channel lengths can be used [46]. During speech+noise/echo mode, an update is done with microphone inputs $u(k)$ and regularization inputs $\mu^2 x'(k)$, taken from a noise buffer; the right hand side inputs are 0. During noise/echo mode, the inputs for the left hand side are $u'(k)$, and the inputs for the right hand side are $x'(k)$.
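The limit behaviour of (7.15)-(7.16) can be checked numerically: as beta shrinks, the solution of the regularized normal equations approaches the unregularized solution W^N from (7.11)-(7.12). The data matrices and dimensions below are arbitrary toy assumptions, used only to make the check concrete.

import numpy as np

rng = np.random.default_rng(1)
MN, Naec = 8, 6                              # assumed sizes of the ANC and AEC parts
dim = MN + Naec
U  = rng.standard_normal((200, dim))         # speech+noise/echo data matrix U(k)
U0 = rng.standard_normal((200, dim))         # noise/echo-only data matrix U'(k)
X0 = U0[:, :MN]                              # X'(k) = U'(k) [I; 0]

W_N = np.linalg.solve(U.T @ U, U0.T @ X0)    # unregularized W^N(k), cf. (7.11)

for beta in [1.0, 0.1, 0.01]:
    A = U.T @ U + beta**2 * (U0.T @ U0)      # left hand side of the normal equations (7.16)
    W_fast = np.linalg.solve(A, U0.T @ X0)
    print(beta, np.linalg.norm(W_fast - W_N))   # difference shrinks as beta -> 0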
For an algorithm description, we refer to Algorithm 17, where the input vector is extended with one channel containing the echo reference signal.

7.6.2 Mode-dependent regularization

In order to use the alternative noise cancellation algorithm from section 6.5.3 as a basis for combined noise/echo cancellation, we write (7.15) with $\beta = 1$:

$$\begin{bmatrix} U(k) \\ U'(k) \end{bmatrix} W^N_{fast} = \begin{bmatrix} 0 \\ X'(k) \end{bmatrix}.$$

The normal equations for this system are

$$\left(U^T(k)U(k) + U'^T(k)U'(k)\right)W^N_{fast} = U'^T(k)X'(k).$$

During speech+noise/echo periods, due to the weighting the system converges to

$$U^T(k)U(k)\,W^N_{fast} = U'^T(k)X'(k),$$

i.e., neglecting the speech-disturbance cross terms (cf. the first assumption in section 7.2),

$$\left(\begin{bmatrix} V(k) & F(k) \end{bmatrix}^T\begin{bmatrix} V(k) & F(k) \end{bmatrix} + \begin{bmatrix} D(k) & 0 \end{bmatrix}^T\begin{bmatrix} D(k) & 0 \end{bmatrix}\right)W^N_{fast} = U'^T(k)X'(k),$$

where $D(k)$ is the desired speech signal. In the QRD-LSL filter in Figure 7.4, this is achieved by first weighting both the left hand side and the right hand side with $\lambda_s$, and then applying a left hand side input $u(k)$ and a right hand side input 0. During noise/echo-only periods, the system converges to

$$U'^T(k)U'(k)\,W^N_{fast} = U'^T(k)X'(k),$$

such that after convergence

$$W^N_{fast} = \begin{bmatrix} I \\ 0 \end{bmatrix}.$$

In this mode, both the left and right hand sides of Figure 7.4 are weighted with $\lambda_n$, the input $u'(k)$ is applied to the left hand side, and $x'(k)$ to the right hand side. During transitions between modes, pre- and post-windowing should be used, as explained in chapter 6. For an algorithm description we refer to Algorithm 6, with inputs as described in this section.

7.7 Performance

To set up a performance comparison, we have implemented a conventional cascaded multichannel scheme (right hand side of Figure 7.1), consisting of two blocks: first the echo is removed from each of the microphone channels, and then the signals are processed by a noise cancellation scheme. For the echo cancellers we have chosen an RLS algorithm (QRD-lattice), which is not often used in practice because of its complexity, but which ensures that we obtain the best possible result for the two-block scheme. The noise cancellation algorithm used is the QRD-based scheme from [56]. This 'traditional' setup is compared to the integrated approach from section 7.6.2.

The sampling frequency was 8 kHz. A simulated room environment was used, with 4 microphones spaced 20 centimeters apart. The near-end speaker is located at about 10 degrees from broadside, a white noise source at 45 degrees, and the loudspeaker for the far-end signal at -45 degrees. The near-end speaker utters a phrase with decreasing energy, so the signal to noise+echo ratio varies from -10 dB at the beginning of the phrase to -40 dB at the end (Figure 7.5 shows some utterances of the phrase used). The signal to noise ratio varies from +13 dB at the beginning of the phrase to -14 dB at the end.

The parameters used are $\lambda_{echo,trad} = 0.9997$ for the forgetting factors of the RLS-based traditional echo cancellers, and $\lambda_{s,trad} = \lambda_{n,trad} = 0.9997$ for the forgetting factors of the noise cancellation algorithm in the traditional setup. No regularization was applied here. For the new method, we have chosen $\lambda_s = 0.999999$ and $\lambda_n = 0.9997$. While the new method does incorporate regularization, the simulations will show that it nevertheless results in less signal distortion. The simulations compare only speech+noise/echo periods, since the new algorithm suppresses all signal during noise/echo-only periods, which would not yield a relevant comparison. All speech/noise detection has been done manually (i.e. a perfect speech detector is assumed).
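Before turning to the results, the noise/echo-only limit claimed in section 7.6.2 (convergence of the mode-dependent scheme to [I; 0]) can be verified with a small least squares experiment. The dimensions and the random data below are assumptions made purely for this sanity check; only the algebraic property is illustrated.

import numpy as np

rng = np.random.default_rng(2)
MN, Naec = 6, 4
U0 = rng.standard_normal((500, MN + Naec))   # rows u'(k)^T = [x'(k)^T  f(k)^T]
X0 = U0[:, :MN]                              # x'(k) is exactly the microphone part of u'(k)

# Least squares solution of U'(k) W = X'(k); since X'(k) = U'(k) [I; 0] and U'(k)
# has full column rank, the unique solution is W = [I; 0].
W, *_ = np.linalg.lstsq(U0, X0, rcond=None)
print(np.allclose(W[:MN], np.eye(MN)), np.allclose(W[MN:], 0.0))   # True True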
Figure 7.6 and Figure 7.7 show that the integrated approach outperforms the conventional method for a simulated acoustic path of 200 taps, with an echo canceller length Naec = 200, M = 4 microphones, and N = 40 taps per microphone. Both algorithms operate in speech+noise/echo mode in these plots. The valleys (speech pauses) are up to 20 dB lower for the combined algorithm (which means more noise reduction), while the peaks are slightly higher, which shows that there is less signal distortion.

Figure 7.4: QRD-LSL AENC scheme (left hand side inputs: mic 1 $x_1(k)$, mic 2 $x_2(k)$ and the echo reference; right hand side input: $x_3(k)$). During speech+noise/echo periods $\lambda = \lambda_s$, resulting in a long window, and during noise/echo-only periods $\lambda = \lambda_n$, resulting in a shorter window.

Figure 7.5: Some utterances of the phrase which was used for the simulations (energy in dB versus time in samples). The lower curve is the energy of the clean speech signal which reaches the microphone, while the upper curve is the energy of one of the microphone signals. This shows that the SENR varies from -10 dB to -40 dB in each utterance.

Figure 7.6: Comparison of the output signal of the cascaded scheme (black) and the integrated approach (light gray). Both algorithms operate in speech+noise/echo mode.

Figure 7.7: Comparison of the energy (in dB) in the output of the cascaded scheme (ANC following an AEC) and of the integrated approach described here. Notice the deeper valleys (more noise cancellation during speech pauses) and the higher peaks (less signal distortion during speech) for the integrated algorithm. Both algorithms operate in speech+noise/echo mode.

Figure 7.8: Performance of the integrated scheme when undermodelling the echo path (curves for N = 10 and N = 20 combined with Naec = 100 and Naec = 200).

Figure 7.9: Comparison for undermodelling: the cascaded approach (full line) cannot handle this situation as well as the integrated approach (dashed line); in both cases M = 4, Naec = 100, N = 40.

Figure 7.10: For almost the same total number of taps, the cascaded approach with sufficient order for the echo path (M = 4, Naec = 200, N = 20, 280 taps in total, full line) performs worse than the integrated approach with an undermodelled echo path (M = 4, Naec = 100, N = 40, 260 taps in total, dotted line). During the pauses between the word utterances, the difference is very large.
Figure 7.11: Even when both algorithms use N = 40 and M = 4, and the cascaded scheme has sufficient order for the echo path (Naec = 200) while the new scheme is undermodelled (Naec = 100, dashed line), the integrated scheme still performs better, because the noise/echo information is not processed in two independent stages.

The performance of the integrated approach for the case of undermodelling of the echo path, which is often the case in a realistic situation, is shown in Figure 7.8. A comparison of this situation with the cascaded scheme is depicted in Figure 7.9. The (undermodelling) echo cancellers in the cascaded scheme produce a large instantaneous misadjustment, which is due to the non-stationarity of the far-end signal [57]. The independently adapted noise cancellation filter in the cascaded scheme cannot compensate for this, since its input signal is disturbed by the behaviour of the first (AEC) block. The integrated approach is shown to handle this situation far better.

In Figure 7.10, an integrated scheme with an undermodelled echo canceller part is compared with a cascaded scheme with a sufficient order model of the echo path and (about) the same total number of filter taps. Here too the integrated approach outperforms the conventional cascaded scheme.

Finally, Figure 7.12 shows that the echo canceller filter can indeed be made shorter thanks to the advantageous effect of adding the noise filters. The performance of a combined scheme with the noise filters in a noise free environment is better than without the noise filters.

Figure 7.12: Simulation in a noise free environment shows that echo cancelling is aided by the M length-N filters in the signal path. The dotted line is a combined scheme with a 300 tap echo filter and 4 channels with 25 tap noise filters each. It is better than a 300 tap traditional echo canceller alone (full line).

7.8 Complexity

In the complexity calculations, an addition and a multiplication are counted as two separate floating point operations. Table 7.1 shows the complexities of the algorithms.

Algorithm                  Complexity
Full QRD, noise            (MN + Naec)^2 + 3M(N + Naec) + M
Full QRD, speech           3.5(MN + Naec)^2 + 15.5(MN + Naec) + M + 2
Regul. Full QRD, speech    7.5(MN + Naec)^2 + 34.5(MN + Naec)
Cont. Upd. QRD-LSL         (21N - 10.5)(M + 1) + 19(M + 1)N - 3.5(M + 1) + 3 + 21(Naec - N)

Table 7.1: Complexities of different algorithms in flops per sample. M is the number of microphones, Naec is the filter length of the AEC part, and N is the number of filter taps per microphone channel in the ANC part.

For a typical setting in a car environment (Naec = 200, N = 10, M = 3), the complexity of the new continuously updating QRD-LSL based technique is 7912 flops per sample. This is to be compared with the complexity of a cascaded scheme. The QRD-LSL based noise cancellation algorithm we have derived in [51] has a complexity of 2346 flops per sample for these settings. An NLMS-based echo canceller would have a complexity of 800 flops per sample with these parameters. A cascaded scheme with first echo cancelling (one NLMS echo canceller per microphone) and then a QRD-LSL based noise cancellation scheme would thus amount to 3*800 + 2346 = 4746 flops per sample.
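The comparison between the cascaded and the integrated scheme can be spelled out with the figures quoted above. The 4*Naec flop count for NLMS is an assumption (roughly two multiplications and two additions per tap per sample); it happens to reproduce the 800 flops per sample quoted in the text, while the 2346 and 7912 flop figures are simply taken over from the text.

def nlms_flops(n_taps):
    # Rough NLMS cost per sample: about 2 multiplications and 2 additions per tap.
    return 4 * n_taps

M, Naec = 3, 200
cascaded   = M * nlms_flops(Naec) + 2346   # one NLMS AEC per mic + QRD-LSL ANC of [51]
integrated = 7912                          # continuously updating QRD-LSL AENC (text value)
print(cascaded, integrated, integrated / cascaded)   # 4746 7912 ~1.67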
7.9 Conclusion

In this chapter, we have extended both QRD-RLS and QRD-LSL based schemes for noise cancellation with an extra echo-reference input signal, thus obtaining schemes which handle combined noise and echo cancellation as one single optimization problem. We have shown by simulations that the performance is better when such a global optimization problem is solved than when the traditional cascading approach is used. The complexity and performance figures show that, although somewhat more complex, the better performing QRD-LSL based approach presented here can be applied for real time processing as an alternative to cascading techniques.

Chapter 8

Conclusions

In this thesis we have developed a number of techniques which can be used to 'clean up' a speech signal picked up in an adverse acoustic environment. We have made a distinction between disturbances for which a reference signal is available, and disturbances for which no reference signal is available. The first type of disturbance gives rise to techniques which we classify as 'acoustic echo cancellation' (AEC) techniques, while the second type of disturbance can be reduced by 'noise cancellation' (ANC) techniques. Acoustic echo cancellation is treated in the first part of the text, acoustic noise cancellation in the second part, and the combination of both is discussed in the third part.

Acoustic echo cancellation

The NLMS algorithm is a cheap algorithm which may exhibit performance problems when non-white input signals are used. The RLS algorithm, on the other hand, performs very well even for non-white signals, but is much more expensive. A class of 'intermediate' algorithms is the APA family of adaptive filters. RLS and APA can be seen as an NLMS filter with pre-whitening applied. Not too long ago, only NLMS filters and even cheaper frequency domain variants were used to implement acoustic echo cancellation, because of the long adaptive filters involved, although both RLS and APA are well known to perform better. Due to the increase in computing power over the years, APA filters increasingly find their way into this field, notably when multichannel acoustic echo cancellation is involved.

We have shown that if affine projection techniques are used for acoustic echo cancellation, it is important to provide sufficient regularization in order to obtain robustness against continuously present near-end background noise. This is important in single channel echo cancellation, but even more so in the multichannel case, where the cross-correlation between the loudspeaker signals (and hence between the input signals of the adaptive filter) leads to ill-conditioning of the problem.

We have pointed out the advantages and disadvantages of the FAP algorithm. In its traditional version, what we have called 'explicit regularization' can easily be incorporated, because it uses the FTF algorithm for updating the size P correlation matrix. On the other hand, it makes assumptions concerning how much regularization is used, and it exhibits problems when an exponential weighting technique is used for regularization. Another disadvantage is that periodic restarting of the estimation is necessary due to numerical stability problems in the FTF algorithm. We have proposed to replace the update of the small correlation matrix of size P by a QRD-based updating procedure, which is numerically stable.
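As a reference point for the regularization discussion above, the following is a minimal sketch of a generic affine projection update with diagonal loading. It is a textbook-style formulation, not the specific (sparse or block exact) variants developed in this thesis; the step size mu and loading delta are free parameters, and for projection order P = 1 the update reduces to regularized NLMS.

import numpy as np

def apa_update(w, X, d, mu=1.0, delta=1e-2):
    # One regularized affine projection update of order P:
    #   e = d - X^T w
    #   w <- w + mu * X (X^T X + delta I)^{-1} e
    # X is Ltaps x P with the P most recent input vectors as columns,
    # d holds the corresponding desired samples, delta is the diagonal loading.
    e = d - X.T @ w
    P = X.shape[1]
    return w + mu * (X @ np.linalg.solve(X.T @ X + delta * np.eye(P), e))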
Explicit regularization is not easily implemented in this QRD-based approach, and therefore we have described an alternative approach, which we have called the 'sparse equations' technique. This technique regularizes the affine projection algorithm when it is used with signals that have a large autocorrelation only for small lags (e.g. speech). It can also be used as a stand-alone regularization technique. Unfortunately, this technique violates the assumptions on which the FAP algorithm is based, which motivates the development of a fast APA algorithm that does not rely on the assumptions present in FAP. The main reason for developing such an algorithm, however, is the fact that FAP exhibits problems if much regularization is used.

We have thus derived an exact frequency domain version of the affine projection algorithm, named Block Exact APA. This algorithm has a complexity comparable to that of a frequency domain version of fast affine projection (namely BEFAP), and since it is an exact implementation of APA, the convergence characteristics of the original affine projection algorithm are maintained when regularization is applied, while this is not the case when FAP-based fast versions of APA are used. This algorithm was extended to allow the 'sparse equations' regularization technique to be used.

Acoustic Noise Cancellation

In the literature, several noise cancellation schemes can be found. Most of these schemes use multiple microphones in order to take advantage of the spatial characteristics of both speech and noise. Apart from the classical beamforming approaches, unconstrained optimal filtering approaches also exist. Traditionally these have been based upon singular value decomposition techniques, which inherently have a large complexity.

We have derived a new QRD-based algorithm for unconstrained optimal multichannel filtering with an 'unknown' desired signal, and applied it to adaptive acoustic noise suppression. The same basic problem is solved as in related algorithms from the literature, but due to the high computational complexity of the SVD algorithm used in the traditional techniques, approximations (SVD tracking) are often introduced in order to keep the complexity manageable. Using the QRD approach results in a performance which is the same as that of the SVD-based algorithms, or even better, since no approximations are required. The complexity of the QRD-based optimal filtering technique is an order of magnitude lower than that of the (approximating) SVD-based approaches. We have also introduced a trade-off parameter which allows more noise reduction to be obtained in exchange for some (tolerable) signal distortion.

Besides the QRD-based approach, we have also derived a fast QRD-LSL based algorithm and applied it to the 'unknown desired signal' case which is encountered in acoustic noise cancellation. This algorithm is based on a significantly reorganized version of the QRD-RLS based unconstrained optimal filtering scheme. While the QRD-based unconstrained optimal filtering algorithm has a complexity which is an order of magnitude lower than that of the (approximating) SVD-tracking based algorithm, fast QRD-LSL based unconstrained optimal filtering achieves a complexity which is about another factor of 8 lower than that of QRD-based unconstrained optimal filtering (for typical parameter settings).
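The equivalence between the SVD-based and QRD-based routes can be illustrated on a single batch least squares problem: both factorizations yield the same optimal filter, and the point of the recursive QRD schemes is that the QR factor is far cheaper to keep up to date than a (tracked) SVD. The data below are random toy values; the sketch shows the algebraic equivalence only, not the recursive algorithms themselves.

import numpy as np

rng = np.random.default_rng(3)
U = rng.standard_normal((300, 20))     # data matrix (e.g. stacked input vectors)
b = rng.standard_normal(300)           # desired signal column

# SVD route: U = Us diag(s) Vt, least squares solution w = V diag(1/s) Us^T b
Us, s, Vt = np.linalg.svd(U, full_matrices=False)
w_svd = Vt.T @ ((Us.T @ b) / s)

# QRD route: U = Q R, then solve R w = Q^T b by backsubstitution
Q, R = np.linalg.qr(U)
w_qrd = np.linalg.solve(R, Q.T @ b)

print(np.allclose(w_svd, w_qrd))       # True: both give the same optimal filter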
Combination of echo and noise cancellation

We have extended both QRD-RLS and QRD-LSL based schemes for noise cancellation with an extra echo-reference input signal, thus proposing schemes which handle combined noise and echo cancellation as one single optimization problem. We have shown by simulations that the performance is better when such a global optimization problem is solved than when the traditional cascading approach is used. The complexity and performance figures show that, although somewhat more complex, the better performing QRD-LSL based approach can be applied for real time processing as an alternative to cascading techniques.

Further research

In the field of acoustic echo cancellation, no 'perfect' solutions exist yet for multichannel decorrelation. For speech signals, non-linearities such as half-wave rectifiers provide sufficiently good results, but in applications where multichannel audio is involved (voice command applications for audio devices), these solutions introduce intolerable distortion. This subject clearly requires more research. The adaptive filtering techniques which form the core of acoustic echo cancellers are well explored. For cheap consumer products NLMS and frequency domain adaptive filters can be used, while a whole range of better (and more expensive) algorithms exists if one can afford the extra complexity.

For the class of noise cancellation algorithms we have described in this thesis, namely the unconstrained MMSE-optimal filtering class, only SVD-based and (as derived in this text) QRD-based algorithms exist. An interesting subject for future research would be to investigate whether this problem can be handled by 'cheaper' adaptive filtering algorithms, such as APA-based filters. We refer to [23], where the NLMS algorithm is used to implement unconstrained optimal filtering for multichannel noise cancellation. Finding cheaper algorithms is even more important when the combination of echo and noise reduction is considered, as in chapter 7, since the filter length corresponding to the echo path is usually much larger than that of the filters used in the noise reduction part of the algorithm. In traditional setups, where echo and noise cancellation were handled in two separate cascaded schemes, cheap filters could be used for the long paths in the echo canceller, while more complex algorithms could be used for the shorter noise reduction paths. But the experiments in chapter 7 clearly indicate that there is an advantage in solving the combined problem as a whole, so it would be interesting to invest time in trying to reduce the complexity of the integrated optimal filtering approach.

Bibliography

[1] M. Ali. Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation. In ICASSP. IEEE, 1998.

[2] F. Amand, J. Benesty, A. Gilloire, and Y. Grenier. A fast two-channel projection algorithm for stereophonic acoustic echo cancellation. In ICASSP96. IEEE, 1996.

[3] Duncan Bees, Maier Blostein, and Peter Kabal. Reverberant speech enhancement using cepstral processing. In Proceedings of the 1991 IEEE Int. Conf. on Acoust., Speech and Signal Processing, pages 977-980. IEEE, May 1991.

[4] J. Benesty, F. Amand, A. Gilloire, and Y. Grenier. Adaptive filtering algorithms for stereophonic acoustic echo cancellation. In ICASSP, pages 3099-3102. IEEE, 1995.

[5] J. Benesty, A. Gilloire, and Y. Grenier.
A frequency domain stereophonic acoustic echo canceller exploiting the coherence between the channels and using nonlinear transformations. In Proceedings of International Workshop on Acoustics and Echo Cancelling (IWAENC99), pages 28–31. IEEE, 1999. [6] J. Benesty, D. R. Morgan, and J. L. Hall adn M. M. Sondhi. Synthesised stereo combined with acoustic echo cancellation for desktop conferencing. In Proceedings of ICASSP99, 1999. [7] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi. Stereophonic acoustic echo cancellation using nonlinear transformations and comb filtering. In ICASSP. IEEE, 1998. [8] F. Capman, J. Boudy, and P. Lockwood. Acoustic echo cancellation using a fast qr-rls algorithm and multirate schemes. Proceedings of ICASSP, pages 969–972, 1995. [9] F. Capman, J. Boudy, and P. Lockwood. Controlled convergence of qr least squares adaptive algorithms — application to speech echo cancellation. In Proceedings of ICASSP, pages 2297–2300. IEEE, 1997. 169 170 BIBLIOGRAPHY [10] C. Carlemalm, F. Gustafsson, and B. Wahlberg. On the problem of detection and discrimination of double talk and change in the echo path. In ICASSP Conference Proceedings. IEEE, ? [11] S. Doclo, E. De Clippel, and M. Moonen. Multi–microphone noise reduction using gsvd–based optimal filtering with anc postprocessing stage. In Proc. of the 9th IEEE DSP Workshop, Hunt TX, USA. IEEE, Oct. 2000. [12] S. Doclo and M. Moonen. SVD–based optimal filtering with applications to noise reduction in speech signals. In Proc. of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA’99), New Paltz NY, USA, pages 143–146. IEEE, Oct 1999. [13] S. Doclo and M. Moonen. Noise reduction in multi-microphone speech signals using recursive and approximate GSVD–based optimal filtering. In Proc. of the IEEE Benelux Signal Processing Symposium (SPS-2000), Hilvarenbeek, The Netherlands, March 2000. [14] S. Doclo and M. Moonen. GSVD-Based Optimal Filtering for Multi-Microphone Speech Enhancement, chapter 6 in “Microphone Arrays: Signal Processing Techniques and Applications” (Brandstein, M. S. and Ward, D. B., Eds.), pages 111–132. Springer-Verlag, May 2001. [15] S. Doclo and M. Moonen. GSVD-based optimal filtering for single and multimicrophone speech enhancement. IEEE Trans. Signal Processing, 50(9):2230– 2244, September 2002. [16] Matthias Dörbecker and Stefan Ernst. Combination of two–channel spectral subtraction and adaptive wiener post–filtering for noise reduction and dereverberation. In Proceedings of EUSIPCO96, page 995, September 1996. [17] P. Dreiseitel, E. Hansler, and H. Puder. Acoustic echo and noise control — a long lasting challenge, 1998. [18] K. Eneman. Subband and Frequency–Domain Adaptive Filtering Techniques for Speech Enhancement in Hands–free Communication. PhD thesis, Katholieke Universiteit Leuven, Heverlee, Belgium, March 2002. [19] K. Eneman and M. Moonen. Hybrid Subband/Frequency–Domain Adaptive Systems. Signal Processing, 81(1):117–136, January 2001. [20] P. Eneroth, T. Gänsler, S. Gay, and J. Benesty. Studies of a wideband stereophonic acoustic echo canceler. In Proc. 1999 IEEE Workshop on applications of Signal Processing to Audio and Acoustics, pages 207–210. IEEE, October 1999. [21] P. Eneroth, S. Gay, T. Gänsler, and J. Benesty. A hybrid frls/nlms stereo acoustic echo canceller. In Proceedings of IWAENC, 1999. BIBLIOGRAPHY 171 [22] Y. Ephraim and H. L. Van Trees. A signal subspace approach for speech enhancement. 
IEEE Transactions on speech and audio processing, 3(4):251–266, july 1995. [23] D. A. F. Florencio and H. S. Malvar. Multichannel filtering for optimum noise reduction in microphone arrays. In EEE International Conference on Acoustics, Speech, and Signal Processing, pages 197–200. IEEE, May 2001. [24] T. Gänsler and J. Benesty. Stereophonic acoustic echo cancellation and two– channel adaptive filtering : an overview. International Journal of Adaptive Control and Signal Processing, February 2000. [25] T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty. Double–talk robust fast converging algorithms for network echo cancellation. In Proc. 1999 IEEE Workshop on applications of Signal Processing to Audio and Acoustics, pages 215–218. IEEE, October 1999. [26] S. L. Gay and S. Tavathia. The fast affine projection algorithm. In ICASSP, pages 3023–3026. IEEE, 1995. [27] Steven Gay. Fast projection algorithms with application to voice echo cancellation. PhD thesis, Rutgers, The State University of New Jersey, New Brunswick, 1994. [28] A. Gilloire and V. Turbin. Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellers. In ICASSP. IEEE, 1998. [29] Golub and Van Loan. Matrix Computations, chapter 12. Johns Hopkins, 1996. [30] Simon Haykin. Adaptive Filter Theory. Prentice Hall, 3 edition, 1996. [31] P. Heitkämper. An adaptation control for acoustic echo cancellers. IEEE Signal processing letters, 4(6):170 – 173, June 1997. [32] Q.-G. Liu, B. Champagne, and P. Kabal. A microphone array processing technique for speech enhancement in a reverberant space. Speech Communication, 18:317–334, 1996. [33] S. Makino and S. Shimauchi. Stereophonic acoustic echo cancellation - an overview and recent solutions. In Proceedings of International Workshop on Acoustics and Echo Cancelling (IWAENC99), pages 12–19. IEEE, 1999. [34] S. Makino, K. Strauss, S. Shimauchi, Y. Haneda, and A. Nakagawa. Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path. In Proceedings of the ICASSP, pages 299 – 302. IEEE, 1997. [35] Henrique S. Malvar. Signal Processing With Lapped Transforms. Artech House, 0. 172 BIBLIOGRAPHY [36] K. Maouche and D. T. M. Slock. The fast subsampled–updating fast affine projection (fsu fap) algorithm. Research report, Institut EURECOM, 2229, route des Cretes, B.P.193, 06904 Sophia Antipolis Cedex, December 1994. [37] Rainer Martin and Peter Vary. Combined acoustic echo cancellation, dereverberation and noise reduction : a two microphone approach. In Ann. Telecommun., volume 49, pages 429–438. 1994. [38] Rainer Martin and Peter Vary. Combined acoustic echo control and noise reduction for hands–free telephony — state of the art and perspectives. In EUSIPCO96, page 1107, 1996. [39] J. G. McWhirter. Recursive least squares minimisation using a systolic array. In Proc. SPIE Real Time Signal Processing IV, volume 431, pages 105–112, 1983. [40] M. Miyoshi and Y. Kaneda. Inverse Filtering of Room Acoustics. IEEE trans. on Acoustics, Speech and Signal Proc., 36(2):145–152, February 1988. [41] M. Mohan Sondhi, D. R. Morgan, and J. L. Hall. Stereophonic acoustic echo cancellation — an overview of the fundamental problem. IEEE Signal Processing Letters, 2(8):148–151, August 1995. [42] G. V. Moustakides and S. Theodoridis. Fast newton transversal filters – a new class of adaptive estimation algorithms. IEEE Transactions on signal processing, 39(10):2184 – 2193, October 1991. [43] K. Ozeki and T. Umeda. 
An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties. Electronics and communications in Japan, 67-A(5):126 – 132, February 1984. [44] C. B. Papadias and D. T. M. Slock. New adaptive blind equalization algorithms for constant modulus constellations. In ICASSP94, pages 321–324, Adelaide, Australia, April 1994. IEEE. [45] J. Prado and E. Moulines. Frequency domain adaptive filtering with applications to acoustic echo cancellation. Ann. Telecommun, 49(7-8):414–428, 1994. [46] J. G. Proakis, C. M. Rader, F. Ling, C. L. Nikias, M. Moonen, and I. K. Proudler. Algorithms for Statistical Signal Processing. Prentice–Hall, ISBN: 0-13-062219-2, 1/e edition, 2002. [47] G. Rombouts and M. Moonen. Avoiding explicit regularisation in affine projection algorithms for acoustic echo cancellation. In Proceedings of ProRISC99, Mierlo, The Netherlands, pages 395–398, November 1999. [48] G. Rombouts and M. Moonen. A fast exact frequency domain implementation of the exponentially windowed affine projection algorithm. In Proceedings of Symposium 2000 for Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), pages 342–346, Lake Louise, Canada, 2000. BIBLIOGRAPHY 173 [49] G. Rombouts and M. Moonen. Regularized affine projection algorithms for multichannel acoustic echo cancellation. In Proceedings of IEEE-SPS2000, page cdrom, Hilvarenbeek, The Netherlands, March 2000. IEEE. [50] G. Rombouts and M. Moonen. Sparse–befap : A fast implementation of fast affine projection avoiding explicit regularisation. In Proceedings of EUSIPCO2000, pages 1871–1874, September 2000. [51] G. Rombouts and M. Moonen. Fast QRD–lattice–based optimal filtering for acoustic noise reduction. Internal Report KULEUVEN/ESAT-SISTA/TR 01-48, Submitted for publication., May 2001. [52] G. Rombouts and M. Moonen. Acoustic noise reduction by means of qrd–based optimal filtering. In Proceedings of MPCA2002, Leuven, Belgium, November 2002. [53] G. Rombouts and M. Moonen. An integrated approach to acoustic noise and echo suppression. Submitted for publication, January 2002. [54] G. Rombouts and M. Moonen. Qrd–based optimal filtering for acoustic noise reduction. In Proceedings of EUSIPCO2002, Toulouse, France, page CDROM, September 2002. [55] G. Rombouts and M. Moonen. A sparse block exact affine projection algorithm. IEEE Transactions on Speech and Audio Processing, 10(2):100–108, February 2002. [56] G. Rombouts and M. Moonen. QRD–based optimal filtering for acoustic noise reduction. Internal Report KULEUVEN/ESAT-SISTA/TR 01-47, Accepted for publication in Elsevier Signal Processing, February 2003. [57] D. W. E. Schobben and P. C. W. Sommen. On the performance of too short adaptive fir filters. In Proceedings Circuits Systems and Signal Proc. (ProRISC), Mierlo, The Netherlands, pages 545–549, November 1997. [58] S. Shimauchi, Y. Haneda, S. Makino, and Y. Kaneda. New configuration for a stereo echo canceller with nonlinear pre–processing. In ICASSP. IEEE, 1998. [59] M. Tanaka and S. Makino. A block exact fast affine projection algorithm. IEEE Transactions on Speech and Audio Processing, 7(1):79–86, January 1999. [60] D. Van Compernolle and S. Van Gerven. Beamforming with microphone arrays. In V. Cappellini and A. Figueiras-Vidal, editors, Applications of Digital Signal Processing to Telecommunications, pages 107–131. COST 229, 1995. 174 BIBLIOGRAPHY List of publications • Vandaele P., Rombouts G., Moonen M., “Implementation of an RTLS blind equalization algorithm on DSP”, in Proc. 
of the 9th IEEE International Workshop on Rapid System Prototyping, Leuven, Belgium, Jun. 1998, pp. 150-155. • Rombouts G., Moonen M., “Avoiding Explicit Regularisation in Affine Projection Algorithms for Acoustic Echo Cancellation”, in Proc. of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing (ProRISC99), Mierlo, The Netherlands, Nov. 1999, pp. 395-398. • Rombouts G., Moonen M., “A fast exact frequency domain implementation of the exponentially windowed affine projection algorithm”, in Proc. of Symposium 2000 for Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), Lake Louise, Canada, Oct. 2000, pp. 342-346. • Rombouts G., “Regularized affine projection algorithms for multichannel acoustic echo cancellation”, in Proc. of the IEEE Benelux Signal Processing Symposium (SPS2000), Hilvarenbeek, The Netherlands, Mar. 2000. • Rombouts G., Moonen M., “Sparse-BEFAP : A fast implementation of fast affine projection avoiding explicit regularisation”, in Proc. of the European Signal Processing Conference (EUSIPCO), Tampere, Finland, Sep. 2000, pp. 1871-1874. • Schier J., Vandaele P., Rombouts G., Moonen M., “Experimental implementation of the spatial division multiple access (SDMA) algorithms using DSP system with the TMS320C4x processors”, in The proceedings of the Third European DSP Education and Research Conference, Paris, France, Sept. 2000, pp. CD-ROM. • Rombouts G., Moonen M., “Acoustic noise reduction by means of QRD-based unconstrained optimal filtering”, in Proc. of the IEEE Benelux Workshop on Model based processing and coding of audio (MPCA), Leuven, Belgium, Nov. 2002. 175 • Rombouts G., Moonen M., “A sparse block exact affine projection algorithm”, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 2, Feb. 2002, pp. 100-108. • Rombouts G., Moonen M., “QRD-based optimal filtering for acoustic noise reduction”, Accepted for publication in Elsevier Signal Processing, Internal Report 01-47, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001. • Rombouts G., Moonen M., “QRD–based optimal filtering for acoustic noise reduction”, EUSIPCO 2002, Toulouse, France, CDROM. Submitted papers • Rombouts G., Moonen M., “An integrated approach to acoustic noise and echo suppression”, Internal Report 02-206, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2002, submitted to Elsevier Signal Processing. • Rombouts G., Moonen M., “Fast-QRD-based optimal filtering for acoustic noise reduction”, Internal Report 01-48, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001, resubmitted to IEEE Transactions on Speech And Audio processing for 2nd review. 176 Curriculum Vitae Geert Rombouts was born in Turnhout on august 11, 1973. He studied at the Katholieke Universiteit Leuven, faculty of applied sciences (Faculteit Toegepaste Wetenschappen) from 1991 to 1997, where he received his M.Sc. degree in electrical engineering (Burgerlijk Ingenieur Elektrotechniek) in 1997. From 1997 to 2002 he did Phd. research at the Katholieke Universiteit Leuven, faculty of applied sciences. 177