Adaptive filtering algorithms for acoustic echo and noise cancellation
Geert Rombouts
25 April 2003

KATHOLIEKE UNIVERSITEIT LEUVEN
FACULTEIT TOEGEPASTE WETENSCHAPPEN
DEPARTEMENT ELEKTROTECHNIEK
Kasteelpark Arenberg 10, 3001 Leuven (Heverlee)

Adaptive filtering algorithms for acoustic echo and noise cancellation

Thesis submitted to obtain the degree of Doctor in Applied Sciences by Geert Rombouts.

Jury:
Prof. dr. ir. E. Aernoudt, chairman
Prof. dr. ir. M. Moonen, promotor
Prof. dr. ir. D. Van Compernolle
Prof. dr. ir. B. De Moor
Prof. dr. ir. S. Van Huffel
Prof. dr. ir. P. Sommen (TU Eindhoven)
Prof. dr. ir. I. K. Proudler (King's College, UK)

UDC 681.3*I12:534
April 2003

Copyright Katholieke Universiteit Leuven - Faculteit Toegepaste Wetenschappen, Arenbergkasteel, B-3001 Heverlee. All rights reserved. No part of this publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

D/2003/7515/13
ISBN 90-5682-402-3

For my grandmother, Maria Jonckers.

Abstract

In this thesis, we develop a number of algorithms for acoustic echo and noise cancellation. We derive a fast exact implementation of the affine projection algorithm (APA), and we show that the existing (approximating) fast techniques exhibit problems when strong regularization is used. We develop a number of algorithms for noise cancellation based on optimal filtering techniques for multi–microphone systems. By using QR–decomposition based techniques, a complexity reduction of a factor 50 to 100 is achieved compared to existing implementations. Finally, we show that instead of using a cascade of a noise–cancellation system and an echo–cancellation system, it is better to solve the combined problem as a global optimization problem. The aforementioned noise reduction techniques can be used to solve this optimization problem.

List of symbols

B(k) : Right hand side in the QRD–RLS based noise reduction equation
d(k) : Desired signal of an adaptive filter at time k
d1(k), d2(k), d3(k) : Desired signals for multiple right hand sides
d(k) : Vector with recent desired signal samples
δ : Regularization parameter (diagonal loading)
e(k) : Error signal of an adaptive filter
ε{·} : Expected value operator
f(k) : Loudspeaker reference signal
G : Givens rotation
gi : Far end room paths
hi : Near end room paths
λ : Forgetting factor (weighting factor)
λn : Forgetting factor during noise–only periods
λs : Forgetting factor during speech+noise periods
M : Number of channels
µ : Stepsize
n(k) : Noise signal
N : Number of filter taps per channel
Naec : Number of taps in the AEC part in AENC
Q(k) : Orthogonal matrix Q in a QR–decomposition
R(k) : Upper triangular matrix R in a QR–decomposition
Σ : Diagonal matrix in an SVD–decomposition
σi : Singular value
u(k) : Input vector with microphone signals and echo reference
v(k) : Acoustical disturbance signal
v(k) : Vector with recent disturbance samples
V(k) : Toeplitz matrix with disturbance signal
w(k) : Filter coefficient vector. A subscript may specify the algorithm used.
W(k) : Matrix of which the columns are filter vectors
x(k) : Input signal
x(k) : Input vector
X(k) : Toeplitz matrix with input signal
Ξ(k) : Input correlation matrix
y(k) : Output of adaptive filter
∗ : Convolution symbol

Contents

1 Speech signal enhancement
1.1 Overview
1.2 Problem statement
1.2.1 Nature of acoustical disturbances
1.2.2 AEC, reference–based noise reduction
1.2.3 ANC, reference–less noise reduction
1.2.4 Combined AEC and ANC
1.3 Applications
1.4 The market
1.5 Contributions
1.6 Outline

2 Adaptive filtering algorithms
2.1 Introduction
2.2 Normalized Least Mean Squares algorithm
2.3 Recursive Least Squares algorithms
2.3.1 Standard recursive least squares
2.3.2 QRD–updating
2.3.3 QRD–based RLS algorithm (QRD–RLS)
2.3.4 QRD–based least squares lattice (QRD–LSL)
2.3.5 RLS versus LMS
2.4 Affine Projection based algorithms
2.4.1 The affine projection algorithm
2.4.2 APA versus LMS
2.4.3 The Fast Affine Projection algorithm (FAP)
2.5 Geometrical interpretation
2.6 Conclusion

3 APA–regularization and Sparse APA for AEC
3.1 APA regularization
3.1.1 Diagonal loading
3.1.2 Exponential weighting
3.2 APA with sparse equations
3.3 FAP and the influence of regularization
3.4 Experimental results
3.5 Regularization in multichannel AEC
3.6 Conclusion

4 Block Exact APA (BEAPA) for AEC
4.1 Block Exact Fast Affine Projection (BEFAP)
4.2 Block Exact APA (BEAPA)
4.2.1 Principle
4.2.2 Complexity reduction
4.2.3 Algorithm specification
4.3 Sparse Block Exact APA
4.3.1 Derivation
4.3.2 Complexity reduction
4.3.3 Algorithm specification
4.4 Conclusion
5 QRD–RLS based ANC
5.1 Introduction
5.2 Unconstrained optimal filtering based ANC
5.3 QRD–based algorithm
5.3.1 Speech+noise mode
5.3.2 Noise–only mode
5.3.3 Residual extraction
5.3.4 Initialization
5.3.5 Algorithm description
5.4 Trading off noise reduction vs. signal distortion
5.4.1 Regularization
5.4.2 Speech+noise mode
5.4.3 Noise–only mode
5.5 Complexity
5.6 Simulation results
5.7 Conclusion

6 Fast QRD–LSL–based ANC
6.1 Preliminaries
6.2 Modified QRD–RLS based algorithm
6.2.1 Speech+noise mode
6.2.2 Noise–only mode
6.3 QRD–LSL based algorithm
6.3.1 Per sample versus per vector classification
6.3.2 LSL–algorithm
6.4 Transitions
6.4.1 Transition from speech+noise to noise–only mode
6.4.2 Transition from a noise–only to a speech+noise period
6.5 Noise reduction vs. signal distortion trade–off
6.5.1 Regularization in QRD–LSL based ANC
6.5.2 Regularization using a noise buffer
6.5.3 Mode–dependent regularization
6.6 Complexity
6.7 Simulation results
6.8 Conclusion

7 Integrated noise and echo cancellation
7.1 Introduction
7.2 Optimal filtering based AENC
7.3 Data driven approach
7.4 QRD–RLS based algorithm
7.4.1 Speech+noise/echo updates
7.4.2 Noise/echo–only updates
7.5 QRD–LSL algorithm
7.6 Regularized AENC
7.6.1 Regularization using a noise/echo buffer
7.6.2 Mode–dependent regularization
7.7 Performance
7.8 Complexity
7.9 Conclusion
8 Conclusions

Chapter 1

Speech signal enhancement

A microphone often picks up acoustical disturbances together with a speaker's voice (which is the signal of interest). In this work, algorithms will be developed for techniques that allow these disturbances to be removed from the speech signal before it is processed further.

1.1 Overview

In general, more than one type of disturbance will be present in a microphone signal, each requiring a specific enhancement approach. We will mainly focus on two classes of speech enhancement techniques, namely acoustic echo cancellation (AEC) (section 1.2.2) and acoustic noise cancellation (ANC) (section 1.2.3).

For AEC, a whole range of algorithms exists, from computationally cheap to expensive, with a corresponding range in performance. We will focus on one of the 'intermediate' types of algorithms, of which the performance and complexity can be tuned to the available computational power. We will describe some methods to increase noise robustness, we will show how existing fast implementations fail when their assumptions are violated, and we will derive a fast implementation which does not require any assumptions.

For ANC, a class of promising state of the art techniques exists whose characteristics are complementary to the features of computationally cheaper (and commercially available) techniques. Existing algorithms for these techniques have a high numerical complexity, and hence are not suited for real time implementation. This observation motivates our work in the field of acoustic noise cancellation: we describe a number of algorithms that are (several orders of magnitude) cheaper than existing implementations, and hence allow for real time implementation.

Finally we will show that considering the combined problem of acoustic echo and noise cancellation as a global optimization problem leads to better results than using traditional cascaded schemes. The techniques which we use for ANC can easily be modified to incorporate AEC.

The outline of this first chapter is as follows. After a problem statement in section 1.2, we describe in section 1.3 a number of applications in which acoustic echo and noise cancellation techniques prove useful. In section 1.4, an overview of commercially available applications in this field is given. In section 1.5 our own contributions are summarized. Section 1.6 gives an outline of the remainder of the thesis.

1.2 Problem statement

1.2.1 Nature of acoustical disturbances

In many applications involving speech communication, it is difficult (expensive) to place microphones close to the speakers. The microphone amplification then has to be large due to the large distance to the speech source. As a result, more environmental noise will be picked up than in the case where the microphones would be close to the speech source.

For some of these disturbances, a reference signal may be available. For example, a radio may be playing in the background while someone is making a telephone call. The electrical signal that is fed to the radio's loudspeaker can be used as a reference signal for the radio sound reaching the telephone's microphone. We will call the techniques that rely on the presence of a reference signal 'acoustic echo cancellation techniques' (AEC); the reason for this name will become clear below.
For other types of disturbances, no reference signal is available. Examples of such disturbances are the noise of a computer fan, people who are babbling in the room where someone is using a telephone, car engine noise, ... Techniques that perform disturbance reduction where no reference signal is available will be called ’acoustic noise cancellation techniques’ (ANC) in this text. In some situations the above two noise reduction techniques should be combined with a third enhancement technique, namely dereverberation. Each acoustical environment has an impulse response, which results in a spectral coloration or reverberation of sounds that are recorded in that room. This reverberation is due to reflections of the sound against walls and objects, and hence has specific spatial characteristics, other than those of the original signal. The human auditory system deals with this effectively because it has the ability to concentrate on sounds coming from a certain direction, using information from both ears. If for example one would hear a signal 15 1.2. PROBLEM STATEMENT recorded by only one microphone in a reverberant room, speech signals may easily become unintelligible. Of course also voice recognition systems that are trained on non–reverberated speech will have difficulties handling signals that have been filtered by the room impulse response, and hence dereverberation is necessary. In this thesis, we will concentrate on algorithms for both classes of noise reduction (noise reduction with (AEC) and without (ANC) a reference signal). Dereverberation will not be treated here (we refer to [32, 40, 3] for dereverberation techniques). 1.2.2 AEC, reference–based noise reduction The most typical application of noise reduction in case a reference signal is available, is acoustic echo cancellation (AEC). As mentioned before, we will use the term AEC to refer to the technique itself, even though the disturbance which is reduced is not always strictly an ’echo’. Single channel techniques. A teleconferencing setup consists of two conference rooms (see Figure 1.1) in both of which microphones and loudspeakers are installed. Near end room Far end Speech AEC Near end Speech Far end room Figure 1.1: Acoustic echo cancellation. The loudspeaker signal in the near end room is picked up by the microphone, and would be sent back to the far end room without an echo canceller, where the far end speaker would hear his own voice again (delayed by the communication setup). Sound picked up by the microphones in one room (called the ’far end speech’ and the ’far end room’) is reproduced by the loudspeakers in the other (near end) room. The task of an ’echo canceller’ is to avoid that the portion of the far–end speech signal, which is picked up by the microphones in the near end room, is sent back to the far end. Hearing his own delayed voice will be very annoying to the far end speaker. 16 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT A similar example is voice control of a CD–player. The music itself then can be considered a disturbance (echo) to the voice control system. The loudspeaker signal in both cases is ’filtered’ by the room impulse response. This impulse response is the result of the sound being reflected and attenuated (in a frequency dependent way) by the walls and by objects in the room. Due to the nature of this process, the room acoustics can be modeled by a finite impulse response (FIR) filter. Nonlinear effects (mostly by loudspeaker imperfections) are not considered here. 
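To make the FIR echo–path model concrete, the following minimal Python sketch (our own illustration, not taken from the thesis) simulates the echo component of a microphone signal as the convolution of a loudspeaker signal with a hypothetical room impulse response; the sampling rate, filter length and decay constant are arbitrary assumptions.

    import numpy as np

    fs = 8000                          # sampling rate in Hz (assumed)
    N = 1000                           # number of FIR taps (assumed)
    rng = np.random.default_rng(0)

    # Hypothetical room impulse response: exponentially decaying random taps.
    w_real = rng.standard_normal(N) * np.exp(-np.arange(N) / 200.0)

    x = rng.standard_normal(fs)        # stand-in for the far-end loudspeaker signal
    near_speech = np.zeros(fs)         # near-end speech (silent in this example)

    echo = np.convolve(x, w_real)[:len(x)]   # linear echo picked up by the microphone
    d = echo + near_speech                   # microphone (desired) signal d(k)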
In an acoustic echo cancellation algorithm, a model of the room impulse response is identified. Since the conditions in the room may vary continuously (people moving around being an obvious example), the model needs to be updated continuously. This is done by means of adaptive filtering techniques. In the situation in Figure 1.2 the far end signal x(k) is filtered by the room impulse response, and then picked up by a microphone, together with the desired speech signal of the near end speaker. We consider digital signal processing techniques, hence A/D converted signals, i.e. discrete–time signals and systems. At the same time, the loudspeaker signal x(k) is filtered by a model w(k) of the room impulse response wreal , and subtracted from the microphone signal d(k) : e(k) = d(k) − wT (k)x(k). During periods where the near end speaker is silent, the error (residual) signal e(k) may be used to update w(k), but when the near end speaker is talking, this signal would disturb the adaptation process. We assume that the room characteristics do not change too much during the periods in which near end speech is present, and the adaptation is frozen in these periods by a control algorithm in order to solve this problem. x(k) = [x(k) ... x(k−N+1)] Far End Signal wk − e(k) ++ d(k) Near end Speech Near end room Figure 1.2: Echo canceller : typical situation. In the acoustic echo canceller scheme, the adaptive filtering structure (see also Figure 17 1.2. PROBLEM STATEMENT 2.1) is easily recognized. The input signal to this adaptive filter is the loudspeaker signal x(k) (the reference signal), the desired signal for the filter is the microphone signal d(k), and the error signal e(k) of the adaptive filter is used as the output signal for the AEC scheme. In practice, the length of the room acoustics (and by consequence also the impulse response length of the model w(k)) can easily be 2000 filter taps (even for a rather low sampling frequency of 8 kHz). This is the reason why people often use the celebrated and computationally cheap Normalized Least Mean Squares (NLMS) adaptive filter (see section 2.2), or even cheaper frequency domain derivatives of it for adapting w(k) [19, 18]. The disadvantage of NLMS is its often poor performance for non–white input signals (like speech). While NLMS is a cheap algorithm, the Recursive Least Squares (RLS) algorithm (section 2.3) has a higher performance, and fast variants are indeed used for acoustic echo cancellation [9, 8, 20, 21]. However, due to its complexity, efforts have been done to find algorithms that combine the low complexity of NLMS with the performance of RLS. Most notably are the Fast Newton Transversal filter (FNTF) [42] and fast variants [26] of the Affine Projection Adaptive (APA) [43] filter (see section 2.4). In this thesis, we derive a number of contributions to the field of APA–filtering. The performance advantage offered by these filters compared to NLMS, is due to a ’prewhitening’ structure that removes the correlation from the reference signal. As will be shown later, further signal processing may require multiple microphones (a microphone array) that pick up the sound in the room. The echo canceller structure then obviously has to be repeated for each of the microphones, as shown in Figure 1.3. The prewhitening stage, however, can be ’shared’ among the different microphones in X (k) Far End − e (k) 1 e2 (k) + − − + − d1(k) Near end Speech d2(k) Figure 1.3: Multi–microphone acoustic echo canceller. 
The single channel setup can simply be repeated a multi–microphone setup. 18 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT An acoustic echo canceller never consists of the adaptive filter alone, but always requires some control logic. The adaptive filter is in practice never updated when near end speech is present, and only updated if there is far end signal available. The decision can e.g. be based upon measurements of the correlation of the residual signal e(k) with the loudspeaker signal. In this text, however, this control device will not be considered. All experiments have been done with a ’perfect’ control device, i.e. speech periods have been marked manually. In the acoustic echo canceller context, it is important that the decision device never allows the filter to adapt during a double–talk period (when both far end and near end speaker active), since then the adaptation would be disturbed by the near end signal, and the coefficients would converge to wrong values. The other situation is less problematic : when a period in which only far end talk is present is labeled as double–talk, the echo canceller would not adapt. If this would happen often, the overall convergence would just be somewhat slower. We refer to the literature [10, 25, 31, 45] from which a suitable implementation can be picked. Multichannel techniques. Multi–channel techniques for acoustic echo cancellation [4, 28, 2, 41] should not be confused with multi–microphone techniques. In a multi– microphone–setup, all adaptive filters have the same input signal (the mono loudspeaker signal), while in a multichannel–setup, multiple loudspeakers (or reference signals) are used, see Figure 1.4. An application example is a stereo setup used for X (k) 1 X2(k) Far End Near end Speech − + − + d1(k) Figure 1.4: Multi–channel acoustic echo canceller. The fundamental problem of stereophonic AEC tends to occur in this case, and decorrelation of the loudspeaker signals is necessary to achieve good performance 1.2. PROBLEM STATEMENT 19 teleconferencing in order to provide the listener with a comfortable spatial impression. While the extension of the single channel techniques to multiple microphones is trivial, multi–channel AEC on the other hand is highly non–trivial. A specific problem with multichannel echo cancellation is the non–uniqueness [4, 20, 5, 24, 2] of the solution. This is sometimes referred to as the ’fundamental problem’ of stereophonic echo cancellation. Since all loudspeaker signals stem from the same sound source in the far end room, their joint correlation matrix may be rank–deficient. As a result, there is not a single solution for a multi–channel echo canceller, but a solution space. The echo canceller may find a solution for which the output signal is zero in the absence of near end speech, while the filter is not converged to the real room impulse response (see section 3.5). As a result, the slightest change in the far end room impulse response, may destroy the successful echo cancellation. For multichannel echo cancellation both a change in the transmitting– and in the receiving room will have this effect. Even if this situation would not occur, still the problem becomes ill conditioned if both far–end signals are correlated. This often results in a large sensitivity to noise that may be present in d(k), for example due to continuously present background noise in the near end room. This also indicates that proper measures should be used for the evaluation of different algorithms. 
One should not only look to the energy in the residual echo signal, because it can indeed be small or zero while the filter has not yet converged to the real echo path. For simulated environments, the room acoustics path is known, and hence the distance between this path and the echo canceller path can be plotted. While this is only feasible in artificial setups, it is the only ’correct’ way to evaluate the convergence behaviour of an echo canceller, especially in the multi channel case. 1.2.3 ANC, reference–less noise reduction The signal picked up by the microphone will in realistic situations often also contain disturbance components for which no reference signal is available. Also for this case, multiple approaches to noise cancellation exist. Single channel techniques A microphone picks up a signal of interest, together with noise. Single microphone approaches to noise cancellation will try to estimate the spectral content of the noise (during periods where the signal of interest is absent), and —assuming that the noise signal is stationary— compensate for this spectrum in the spectrum of the microphone input signal whenever the signal of interest is present. The technique is commonly called ’spectral subtraction’ [16, 17]. Single channel approaches are known to perform poorly when the noise source is non–stationary, and when the spectral content of the noise source and of the signal of interest are similar. 20 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT Multi–channel techniques In multi–channel acoustic noise cancellation, a microphone array is used instead of a single microphone to pick up the signal. Apart from the spectral information also the spatial information can be taken into account. Different techniques that exploit this spatial information exist. In filter– and sum beamforming [60], a static beam is formed into the (assumed known) direction of the (speech) source of interest (also called the direction of arrival). While filter–and sum beamforming is about the cheapest multi–channel noise suppression method, deviations in microphone characteristics or microphone placement will have a large influence on the performance, Since signals coming from other directions than the direction of arrival are attenuated, beamforming also provides a form of dereverberation of the signal. Generalized sidelobe cancellers (Griffiths–Jim beamforming) [60] aim at reducing the response into directions of noise sources, with as a constraint a distortionless response towards the direction of arrival. The direction of arrival is required prior knowledge. A voice activity detector is required in order to discriminate between noise– and speech+noise periods, such that the response towards the noise sources can be adapted during noise–only periods. Griffiths–Jim beamforming effectively is a form of constrained optimal filtering. A third method is unconstrained optimal filtering [12][13]. Here a MMSE–optimal estimate of the signal of interest can be obtained, while no prior knowledge is required about geometry. A voice activity detector again is necessary and crucial to proper operation. The distortionless constraint towards the direction of arrival is not imposed here. A parameter can be used to trade off signal distortion against noise reduction. The contributions of this thesis in the field of acoustic noise reduction will be focused on this last method (chapters 5 and 6). 
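As a rough illustration of the unconstrained optimal filtering idea, the sketch below shows a generic multichannel Wiener filter in Python; this is not the GSVD– or QRD–based algorithm developed later in the thesis, and the variable names and shapes are our own assumptions. The speech correlation matrix is estimated as the difference between correlation matrices measured during speech+noise and noise–only periods, as flagged by the voice activity detector.

    import numpy as np

    def wiener_filter_weights(Y_sn, Y_n):
        """Generic unconstrained multichannel Wiener filter sketch.

        Y_sn : (L1, MN) matrix of stacked input vectors from speech+noise frames
        Y_n  : (L2, MN) matrix of stacked input vectors from noise-only frames
        Returns an (MN, MN) matrix W whose columns give MMSE estimates of the
        speech component in each input channel, assuming speech and noise are
        uncorrelated and the noise is stationary across both periods.
        """
        R_yy = (Y_sn.T @ Y_sn) / len(Y_sn)    # speech+noise correlation matrix
        R_vv = (Y_n.T @ Y_n) / len(Y_n)       # noise correlation matrix
        R_xx = R_yy - R_vv                    # estimated speech correlation matrix
        return np.linalg.solve(R_yy, R_xx)    # W = R_yy^{-1} (R_yy - R_vv)

    # Speech estimate for one stacked input vector y(k):  x_hat = W.T @ y

A noise reduction versus distortion trade-off parameter can be introduced by weighting the noise correlation matrix in this expression; it is omitted here for brevity.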
Existing algorithms for unconstrained optimal filtering for acoustic noise reduction are highly complex compared to both other (beamforming–based) methods, which implies that they are not suited for real time implementation. On the other hand, they are quite promising for certain applications, since they have different features than the beamforming–based methods : filter–and sum beamformers are well suited (and even optimal) for enhancing a localized speech source in a diffuse noise field, and generalized sidelobe cancellers are able to adaptively eliminate directional noise sources, but both of them rely upon a priori information about the geometry of the sensor array, the sensor characteristics, and the direction of arrival of the signal of interest. This means that the unconstrained optimal filtering technique is more robust against microphone placement and microphone characteristics, and that the direction of arrival is not required to be known a priori. Another advantage is that they can easily be used for combined AEC/ANC, as we show in chapter 7. 1.3. APPLICATIONS 1.2.4 21 Combined AEC and ANC In many applications, techniques to cancel noise for which a reference signal exists (AEC) are often combined with techniques that do not use a reference signal (ANC), since both types of disturbances are often present. The order in which both signal processing blocks are applied to the signals is very important. In Figure 1.5, both options are shown. The upper scheme will first apply multichannel noise cancellation (no reference signal), and then echo cancellation. The advantage is that, since most referenceless noise reduction schemes make use of multiple microphones, only one echo canceller is needed. Moreover, in addition to the echo path, the echo canceller will have to model the variations in the noise cancellation block. The lower scheme in Figure 1.5 requires an echo canceller for each microphone, and these need to be robust against the noise that is still present in their input signals. In spite of the higher complexity of the second scheme, it is most often used because of its better performance compared to the first scheme. Apart from these combination schemes, a lot of different combination schemes are described in literature [1, 7, 37, 38, 6]. In this thesis, we will show that considering the combined problem as a global optimization problem leads to a better performance. We will describe how the unconstrained filtering techniques derived in the chapters about noise cancellation, can easily be adopted for solving the combined acoustic noise and echo cancellation problem. For echo paths of reasonable length, real time implementation of these techniques is possible with present day processors. 1.3 Applications Tele– and videoconferencing As a first application example we consider teleconferencing. A number of people is meeting in two rooms. In each of these rooms, a microphone array and a loudspeaker are present. The loudspeaker reproduces the sound of the speakers in the other meeting room. The system can be expanded to have more loudspeakers, in order to give the conference participants a spatial impression of the reproduced sound. If no echo–cancellation is applied, echo’s and howling can occur. Echo paths can be as long as 200 msec, while a sampling speed of about 16 kHz is required in order to have a high enough speech quality, resulting in echo path impulse responses of up to 3000 taps. 
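As a quick check of these numbers (our own back-of-the-envelope arithmetic): an echo path of 200 ms sampled at 16 kHz corresponds to 0.2 s x 16 000 samples/s = 3200 filter taps, which is the order of magnitude quoted above.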
On the other hand, people talking in the background, a computer fan, air conditioning are all examples of disturbances that should be handled by means of noise cancellation. Often the echo–cancellers in this type of applications could profit from algorithms as described in chapters 3 and 4, of which the convergence is less dependent on the input signal statistics than what is the case for NLMS. Also algorithms providing the 22 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT From far end Interference Noise To far end Acoustic echo Speaker Desired signal canceller Noise reduction From far end Interference Noise To far end Acoustic echo Speaker Desired signal canceller Noise reduction Figure 1.5: Two methods to combine echo– and noise cancellation. 1.3. APPLICATIONS 23 ’combined’ ANC and AEC–approach in chapter 7 would increase the performance of a speech enhancement system for tele– or videoconferencing. Note though that for larger auditoria the required number of filter taps is huge, and that complexity of the algorithms should be taken into account. Car applications In car applications such as voice control of a mobile phone or sound system, or hands free telephony, noise appears to be the most important problem. For engine noise or radio sound, a reference is available or can be derived, while wind– and tyre noise, passengers talking to each other, ... are disturbances without a reference signal. Acoustic paths in cars are much shorter (up to 256 impulse response taps), as compared to typical conference room impulse responses. Also in this case both ANC and AEC are required. Because of the limited length of the echo path, the algorithms in chapter 7 certainly become an option. Voice control Voice control technology can be found in consumer products, but also finds applications in making technology accessible for disabled people. Speech recognition systems are often trained with clean speech (without noise), because a lot of clean speech databases are available, although also databases are set up for specific noise situations (e.g. speech recognition in cars). A specific problem is voice control of a surround sound audio system, where a multichannel echo–canceller is required in order to suppress the signal stemming from the five speakers after being picked up by the microphone. In this case, reference signals are available, and algorithm with a better performance for coloured signals than NLMS are required (chapters 3 and 4). Hearing aids Acoustic noise cancellation techniques are applied in the field of hearing aids and cochlear implants. It is known that merely amplifying a signal does not contribute to increasing the speech intelligibility, when ’background noises’ are present. Noise cancellation techniques can alleviate this problem, and at present 2– microphone hearing aids with noise cancellation technology are commercially available. The space (and hence the computational power) in a behind–the–ear device is limited, so most of the time cheap (adaptive beamforming) algorithms are used at present, but also these devices could benefit from the techniques in chapters 5 and 6. Selective volume control Techniques that are developed for acoustic echo cancelling, can also be applied in other fields. An example is a ’selective volume control’ 24 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT device, which is used in e.g. discotheques to turn down the sound volume automatically if it exceeds the legal norms. 
In order to avoid that loud noises made by the crowd would result in lowering the amplifier’s volume, an adaptive filter is used to retain only the sound from the loudspeakers in the signal that is picked up by a measurement microphone before the sound pressure level is calculated. A similar system is a volume control application in e.g. a train station, where the volume is automatically turned up if a train passes, or if the crowd is noisy, but which is not sensitive to the sound of the public address system’s own loudspeakers. This kind of applications is even more demanding concerning filter lengths than ordinary echo cancelling in rooms. The legal norms about the maximum sound pressure level are given per frequency on the full audible frequency spectrum. This means that a sampling rate of 44 kHz is required. So the required filter length is more than 10000 filter taps. On the other hand, calculations could be done off–line instead of in real time, and the music signals can be largely correlated. This again requires ’intermediate’ algorithms between NLMS (convergence depends on input signal statistics) and RLS. Recording A recording of e.g. an orchestra or a theatre play imposes different constraints. Microphones will not be placed in an array with an a priori known geometry, but they will be spread over the whole stage on which the performance takes place. The signal of interest does not originate from one specific direction. In dedicated theatres, the noise will mainly consist of the audience, but also scenario’s with noise of air conditioning or heating systems (recordings in churches) are possible. 1.4 The market A large number of companies are currently offering products and services that are linked with the above–mentioned speech enhancement techniques. While in high end devices for auditorium teleconferencing (price about 5000 Euro) it is difficult to gather information on the type of algorithms used, data sheets about desktop conferencing consumer products often indicate that computationally cheap NLMS–like or frequency domain derivatives of NLMS are used. Examples of companies are Spirit Corporation (http://www.spiritcorp.com) , providing code libraries for acoustic echo and noise cancellation optimized for different types of DSP processors, and for the Microsoft Windows operating system. Polycom (http://www.polycom.com) provides ’desktop’ teleconferencing solutions, and the performance data they publish (a convergence time of 10–40 sec) indicate the use of cheap adaptive filters. Larger systems are e.g. built by Clearone (http://www.clearone.com). 1.5. CONTRIBUTIONS 25 Another application is audio enhancement. Both the application CoolEdit (from Syntrillium, http://www.syntrillium.com) and SoundForge (from Sonic Foundry. http://www.sonicfoundry.com) contain signal enhancement modules providing single channel spectral subtraction techniques. Commercial voice command applications often use proprietary techniques based upon beamforming (e.g. with a microphone array on top of a computer monitor (Andrea Electronics, http://www.andreaelectronics.com)). In hearing aids, the commercial state of the art devices use two microphones and Griffiths–Jim beamforming based noise cancellation schemes. The importance of speech enhancement technology in the current market is also shown by the fact that in the most recent version of Microsoft Windows XP noise–cancellation and echo–cancellation features are built in the operating system (http://www.microsoft.com). 
It is clear that in the consumer telecommunications market, the demand for handfree mobile telephony — a direct application of the techniques described here — is high, because of security (and legal) issues concerning use of a mobile phone while driving. As an example : in 2002, the worldwide sales of mobile phones has risen with 6%, 423,4 million devices were sold worldwide (http://www.tijdnet.be/archief). 1.5 Contributions From section 1.4, one can see that the commercially available applications are all based upon ’low complexity’ algorithms, obviously due to real–time and cost constraints. For acoustic echo cancelling often more performant algorithms than NLMS– based ones begin to be used, certainly in ’high end’ applications. The performance and the complexity of the APA–based algorithms we have studied in this work can be ’tuned’ to use the available computational power. We provide some an alternative for obtaining noise–robustness and derive an efficient frequency–domain based algorithm, which does not contain any approximations (contrarily to existing implementations) One notices that the computational complexity of the newer (unconstrained adaptive filtering) algorithms for noise reduction prohibits their commercial application. Of course, with the rise of computational power over the years, in a decade from now these algorithms will also be applied, even in consumer electronics. In this text we will focus our attention to some of these new (’academic’) techniques, and we will derive new algorithms that have a (sometimes dramatically) reduced complexity compared to their predecessors, while keeping their performance at the same level. This should allow these more performant techniques to be considered for use in commercial applications in a much shorter time frame. The contributions to the field of speech enhancement which are treated in this text, can be subdivided into three major categories. 26 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT • The first category consists of signal enhancement techniques for acoustic noise reduction when a reference signal is available (AEC). The results consists of alternative regularization techniques for improving the noise robustness of acoustic echo cancellers based upon the affine projection algorithm (see further on in this text) , and the Block Exact Affine Projection Algorithm (BEAPA), which is a fast frequency domain version of the affine projection algorithm with roughly the same complexity as BEFAP (see further on in this text), but without the need for the assumptions that need to be made for BEFAP. The results hereof are published in the conference papers [50, 47, 48, 49] and in the journal paper [55]. They will be treated in chapters 3 and 4. • The second category focusses on MMSE–based optimal filtering for acoustic noise reduction in case no reference signal is available (ANC). We proposed a QRD–RLS and a QRD—LSL based approach to unconstrained optimal filtering that achieves the same performance as existing (GSVD–based) techniques, but with a complexity reduction of respectively one and two orders of magnitude. These results have been published in the papers [54, 52] and [56, 51]. We will treat them in chapters 5 and 6. • Finally, combination of noise– and echo cancelling is treated in chapter 7, and this result is in our paper [53]. 1.6 Outline 1. Speech signal enhancement 2. Adaptive filtering algorithms Introduction 3. APA regularization and Sparse APA 4. BEAPA for AEC 5. QRD−RLS based ANC 6. 
Fast QRD−LSL based ANC Acoustic echo cancellation Acoustic noise cancellation 7. Integrated noise− and echo cancellation Echo and noise cancellation 8. Conclusions Figure 1.6: Outline of the text The outline of the text is depicted in Figure 1.6. Chapter 2 contains additional introductory material. Relevant adaptive filtering algorithms are reviewed, and the concept of signal flow graphs is explained briefly. 1.6. OUTLINE 27 Chapter 3 and 4 of the thesis focus on acoustic echo cancellation. More specifically in chapter 3 the importance of noise robustness in acoustic echo cancellers is reviewed, and some techniques are derived to implement this into fast affine projection algorithms. We also show that traditional fast implementations exhibit problems when strong regularization is applied. In chapter 4 a frequency domain block exact affine projection algorithm is derived which does not contain the approximations that are present in traditional fast affine projection schemes, while it has a complexity that is comparable to these schemes. Chapter 5 and 6 focus on acoustic noise cancellation techniques. In chapter 5 an unconstrained optimal filtering based noise cancellation algorithm is derived. This algorithm is based upon the QR–decomposition (see section 2.3 for a definition). It obtains the same performance as existing algorithms for unconstrained optimal filtering, while its complexity is an order of magnitude lower. Chapter 6 builds upon the previous one to derive an even cheaper fast QRD–based algorithm while again performance is maintained at the same level. In chapter 7 we discuss the combination of AEC and ANC, and show the performance advantage of using an integrated approach to acoustic noise and echo cancellation compared to traditional combination schemes. Chapter 8 finally, contains the overall conclusions of this work, as well as suggestions for further research. 28 CHAPTER 1. SPEECH SIGNAL ENHANCEMENT Chapter 2 Adaptive filtering algorithms Adaptive filters will play an important role in this text. Therefore, we will devote a chapter to giving an overview of commonly used adaptive filtering techniques. In section 2.1 the general adaptive filtering setup and problem will be reviewed. The normalized least mean squares algorithm (NLMS) and the recursive least squares (RLS) algorithms will be reviewed in sections 2.2 and 2.3. An intermediate class of algorithms, both complexity– and performance–wise, can be derived from the affine projection algorithm (APA). APA will be introduced in section 2.4. In each section complete algorithm descriptions of these algorithms will be given for reference. Later on in this text, APA will be the main topic in chapters 3 and 4, where it will be used for acoustic echo cancellation. Chapters 5, 6 and 7 will mainly be based upon algorithms derived from RLS and fast versions thereof. 2.1 Introduction In this introduction we will give a short overview of the data representations that will be used in the remains of the chapter and the thesis. We will use both adaptive filtering configurations with single and multiple input and output channels. A single input, single output adaptive filtering setup is shown in Figure 2.1. An input signal x(k) is filtered by a filter w(k). The output from this filtering operation is subtracted from a ’desired signal’ d(k) and the resulting ’error signal’ e(k) is used to update the filter coefficients. 
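The following short Python sketch (our own illustration; the update_rule argument is only a placeholder for the concrete algorithms reviewed in the next sections) summarizes this generic filter-and-update loop:

    import numpy as np

    def adaptive_filter(x, d, N, update_rule):
        """Generic single-channel adaptive filtering loop (illustrative sketch).

        x, d        : input and desired signals (1-D arrays of equal length)
        N           : number of filter taps
        update_rule : callable mapping (w, x_vec, e) to the new coefficient vector
        Returns the error signal e(k) = d(k) - w^T(k-1) x(k).
        """
        w = np.zeros(N)
        e = np.zeros(len(x))
        for k in range(N - 1, len(x)):
            x_vec = x[k - N + 1:k + 1][::-1]   # x(k) = [x(k) ... x(k-N+1)]^T
            e[k] = d[k] - w @ x_vec            # error signal
            w = update_rule(w, x_vec, e[k])    # update filter coefficients
        return e

For example, passing update_rule = lambda w, x_vec, e: w + mu * x_vec * e (for some assumed step size mu) would give the unnormalized LMS rule discussed in section 2.2.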
The signals are assumed to be zero mean, and d(k) is a linearly filtered version of x(k) with zero mean noise added that is assumed to be independent of x(k).

[Figure 2.1: Adaptive filter. The filter coefficients w are adapted such that e is minimized.]

All of the algorithms are based upon an overdetermined system of linear equations

    X(k) w(k) = [d(k) d(k-1) ...]^T,                                        (2.1)

where

    X(k) = [x^T(k); x^T(k-1); x^T(k-2); ...],
    x(k) = [x(k) x(k-1) ... x(k-N+1)]^T

(the semicolons separate the rows of X(k)), which will be solved in the least squares sense, i.e. based on an LS criterion

    min_{w_LS(k)} || [d(k) d(k-1) ...]^T - X(k) w(k) ||^2.                  (2.2)

The LS solution is given as

    w_LS(k) = (X^T(k) X(k))^{-1} X^T(k) [d(k) d(k-1) ...]^T.

We will also use the MMSE criterion

    min_{w_MMSE(k)} ε{ (d(k) - x^T(k) w(k))^2 },                            (2.3)

where ε{·} is the expectation operator. The MMSE solution is given as

    w_MMSE(k) = (ε{x(k) x^T(k)})^{-1} ε{x(k) d(k)}.

In each time step k, a new equation is added to (2.1), so at each time instant a new value for w(k) can be calculated. Since adaptivity is required in a changing environment, algorithms will be designed to 'forget' old information. This can be achieved by exponentially weighting the rows of X(k), as is usually done in the RLS algorithm, i.e.

    X(k) = [x^T(k); λ x^T(k-1); λ^2 x^T(k-2); ...],

or by only using the P most recent input vectors in X(k):

    X(k) = [x^T(k); x^T(k-1); ...; x^T(k-P+1)].

[Figure 2.2: A multi–channel adaptive filter. The input vector x(k) consists of the concatenation of the channel input vectors x_i(k), and similarly the filter vector w(k) = [w_1^T(k) w_2^T(k) w_3^T(k)]^T.]

In this text, we will also consider multichannel (multiple input) adaptive filters (see Figure 2.2), where the input vectors x(k) will be defined as

    x(k) = [x_1(k) ... x_1(k-N+1) | x_2(k) x_2(k-1) ... | ... | x_M(k) ... x_M(k-N+1)]^T.   (2.4)

Similarly w(k) is then defined as a stacked version of the filter vectors w_i(k) for i = 1...M:

    w(k) = [w_1^T(k) w_2^T(k) ... w_M^T(k)]^T.

Here M will be the number of input channels of the adaptive filter, and N is the number of filter taps per input channel. Sometimes an alternative definition for the input vector will be used in which the input signals are interlaced:

    x(k) = [x_1(k) ... x_M(k) | x_1(k-1) x_2(k-1) ... | ... | x_M(k-N+1)]^T.                (2.5)

As a result, also the corresponding filter taps will be interlaced.

Considering setups with multiple microphones, we will be solving least squares minimization problems that share the same left–hand side matrix X(k), but have different right hand side vectors. They can be solved concurrently as one multiple–right hand side least squares problem. In this case the columns of a matrix W(k) will be solutions to LS–problems with the columns of a matrix D(k) as their respective right hand sides. A system of equations analogous to (2.1) can be written down:

    X(k) W(k) = D(k),                                                       (2.6)
    X(k) = [x^T(k); x^T(k-1); ...],
    x(k) = [x(k) x(k-1) ... x(k-N+1)]^T,
    D(k) = [d^T(k); d^T(k-1); ...],
    d(k) = [d_1(k) d_2(k) ...]^T.

Note the structure of d(k), of which the components represent the different desired signal samples at time k. The least squares solution can be found from

    min_{W(k)} || D(k) - X(k) W(k) ||.                                      (2.7)

The corresponding MMSE criterion is

    min_{W(k)} ε{ || d^T(k) - x^T(k) W(k) ||^2 }.
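For illustration, the exponentially weighted least squares problem (2.1)–(2.2) can also be formed and solved directly; the Python sketch below (our own, with hypothetical names) is meant only to make the data structure explicit, since the recursive algorithms reviewed in the following sections compute the same solution far more efficiently.

    import numpy as np

    def weighted_ls_solution(x, d, N, lam):
        """Direct (non-recursive) solution of the exponentially weighted LS
        problem (2.1)-(2.2); illustrative sketch only.

        x, d : input and desired signals, N : filter length,
        lam  : forgetting factor (0 < lam <= 1).
        """
        K = len(x)
        rows, rhs = [], []
        for k in range(N - 1, K):
            weight = lam ** (K - 1 - k)              # most recent row gets weight 1
            rows.append(weight * x[k - N + 1:k + 1][::-1])   # weighted x^T(k)
            rhs.append(weight * d[k])                        # weighted d(k)
        X = np.asarray(rows)
        w_ls, *_ = np.linalg.lstsq(X, np.asarray(rhs), rcond=None)
        return w_ls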
In the next sections we will give an overview of the different adaptive filtering techniques that will be used in this thesis.

2.2 Normalized Least Mean Squares algorithm

One approach to solving (2.1) is the Least Mean Squares (LMS) algorithm. This algorithm is in fact a stochastic gradient descent method applied to the underlying MMSE criterion (2.3). The update equations for the filter coefficient vector w_lms(k) are

    e(k+1)     = d(k+1) - x^T(k+1) w_lms(k),                                (2.8)
    w_lms(k+1) = w_lms(k) + µ x(k+1) e(k+1),
    y(k+1)     = x^T(k+1) w_lms(k+1).                                       (2.9)

Here µ is a step size parameter. A full description is shown in Algorithm 1.

Algorithm 1 LMS algorithm
    w_lms = 0
    Loop (new input vector x and desired signal d in each step):
        e = d - x^T w_lms
        w_lms = w_lms + µ x e
        y = x^T w_lms

In order to make the convergence behaviour independent of the input energy, often the Normalized Least Mean Squares (NLMS) algorithm is used, where the filter vector update is divided by the input energy. The algorithm is given by

    e(k+1)      = d(k+1) - x^T(k+1) w_nlms(k),
    w_nlms(k+1) = w_nlms(k) + µ x(k+1) e(k+1) / (x^T(k+1) x(k+1) + δ).      (2.10)

Here δ is a 'regularization term'. In NLMS it guarantees that the denominator cannot become zero, but it also provides noise robustness (see section 2.4). Similar equations are obtained for the definitions (2.6). It can be shown that, for µ = 1 and δ = 0, the a posteriori error for NLMS,

    e_post(k+1) = d(k+1) - x^T(k+1) w_nlms(k+1),

is zero, which means that for the NLMS algorithm the systems of equations (2.1) or (2.6) are effectively reduced to one single equation, namely the most recent one, and that this equation is solved exactly based on a minimum–norm weight vector adaptation.

NLMS is a computationally cheap algorithm with a complexity of 4N flops per sample (for complexity calculations in this text we count an addition and a multiplication as 2 separate floating point operations), but it suffers from slow convergence when non–white input signals are applied. In practice often frequency domain variants of this algorithm are used in order to obtain an even lower complexity. An algorithm description of the time domain NLMS algorithm is given in Algorithm 2.

Algorithm 2 NLMS algorithm
    w_nlms = 0
    Loop (new input vector x and desired signal d in each step):
        e = d - x^T w_nlms
        w_nlms = w_nlms + µ x e / (x^T x + δ)
        y = x^T w_nlms

We also note here that if the LMS algorithm is to be calculated for multiple desired (right hand side) signals, the whole algorithm simply has to be repeated for each desired signal. In the NLMS algorithm the (small) cost of calculating the input energy can be shared:

    e^T(k+1)    = d^T(k+1) - x^T(k+1) W_nlms(k),
    W_nlms(k+1) = W_nlms(k) + µ x(k+1) e^T(k+1) / (x^T(k+1) x(k+1) + δ).    (2.11)

2.3 Recursive Least Squares algorithms

In this section, we will first review the standard recursive least squares algorithm, then the numerically more stable (and thus preferable) QRD–based RLS algorithm, and finally the fast QRD–Least Squares Lattice algorithm.

2.3.1 Standard recursive least squares

Instead of applying a stochastic gradient descent method (NLMS), the recursive least squares (RLS) algorithm solves system (2.6) or (2.1) in a least squares (LS) sense, i.e. based on the LS criterion (2.2), and does so by applying recursive updates to the solution calculated in the previous time step (cf.
newton-iterations on a quadratic error surface where the hessian reduces to a correlation matrix). For exponentially weighted RLS, the update equations are erls (k + 1) Ξ−1 (k + 1) wrls (k + 1) = d(k + 1) − xT (k + 1)wrls (k), 1 −1 (k)x(k + 1)xT (k + 1) λ12 Ξ−1 (k) 1 −1 λ2 Ξ Ξ (k) − = , λ2 1 + λ12 xT (k + 1)Ξ−1 (k)x(k + 1) = wrls (k) + Ξ−1 (k + 1)x(k + 1)erls (k + 1), (2.12) where Ξ−1 (k) is the inverse correlation matrix (Ξ(k) = X T (k)X(k)). The first equation calculates the error at time instant k + 1, while the second equation is the filter coefficient update. Instead of doing an update in the direction of the input vector x(k) as in LMS, in (2.12) the input signal can be seen to be whitened because it is multiplied by the inverse correlation matrix. An algorithm description is provided in Algorithm 3. Again a regularization (or better : ’diagonal loading’) term can be added to the inverse correlation matrix wrls (k + 1) = wrls (k) + (X T (k + 1)X(k + 1) + δI)−1 x(k + 1)erls (k + 1). Here I is the unity matrix. It is well known that this provides robustness to noise terms that could be present in d(k) [27]. 36 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 3 RLS algorithm wrls = 0 Ξinv = 106 .I //Init with large number Loop (input : d and x): erls = d − xT wrls Ξinv = 1 λ2 (Ξinv − 1 λ2 Ξinv xxT 1 λ2 Ξinv 1+ λ12 xT Ξinv x ) wrls = wrls + Ξinv xerls y = xT wrls It is easily seen that in case multiple right hand side signals are present, the update of the inverse correlation matrix can be shared among the different right hand sides : eTrls (k + 1) Ξ(k + 1) Wrls (k + 1) = dT (k + 1) − xT (k + 1)Wrls (k), 1 1 T 1 λ2 Ξ(k)x(k + 1)x (k + 1) λ2 Ξ(k) = Ξ(k) − , λ2 1 + λ12 xT (k + 1)Ξ(k)x(k + 1) = Wrls (k) + Ξ(k + 1)x(k + 1)eTrls (k + 1). (2.13) This effectively means that in that case apart from the cost of calculating the inverse correlation matrix (once), for each channel only an LMS–like updating procedure needs to be calculated. This is easily shown by comparing (2.12) and (2.8). We will now describe an RLS algoritm based on QRD–updates, which is known to have good numerical properties. 2.3.2 QRD–updating Every matrix X ∈ <L×M N with linearly independent columns, L ≥ M N (in our application, M will be the number of microphones or ’input channels’ and N the number of filter taps per microphone) can be decomposed into an orthogonal matrix Q ∈ <L×M N and an upper triangular matrix R ∈ <M N ×M N , where R is of full rank and has no non–zero entries on the diagonal. X = QR. (2.14) This decomposition is called ’QR–decomposition’ (QRD), and R is called the Cholesky– factor or square root of the matrix product X T X, since X T X = RT R. In our applications X(k) is often defined in a time recursive fashion, X(0) = xT (0) , T x (k + 1) X(k + 1) = . (2.15) λX(k) 37 2.3. RECURSIVE LEAST SQUARES ALGORITHMS Here 0 < λ ≤ 1 is a forgetting factor and k is the time index. We will now briefly review the QR–updating procedure [29] for computing the QRD of X(k + 1) from the QRD of X(k). If we replace X(k) by its QR–decomposition, we obtain : X(k + 1) = 1 0 0 Q(k) xT (k + 1) λR(k) . We can now find an orthogonal transformation matrix Q(k + 1) : xT (k + 1) X(k + 1) = , λR(k) 1 0 0 = Q(k + 1) , 0 Q(k) R(k + 1) | {z } 1 0 0 Q(k) (2.16) [∗|Q(k+1)] = Q(k + 1)R(k + 1). The ’*’ are don’t care–entries. Here Q(k + 1) will be constructed as a series of Givens–rotations, Q(k + 1) = G1,2 (θ1 (k + 1))G1,3 (θ2 (k + 1)) . . . 
G1,M N (θM N (k + 1)), with Gi,j (θ) = Ii−1 0 0 0 0 0 cos θ 0 sin θ 0 0 0 Ij−i−1 0 0 0 − sin θ 0 cos θ 0 0 0 0 0 IM N −j . Each of these rotations will zero out one of the elements of the top row in the compound matrix T x (k + 1) (2.17) λR(k) in order to obtain the updated R(k + 1) in the right hand side of (2.16). Q(k) will not be usefull in applications, and hence will not be stored. The procedure for choosing the i, j and θ for the Givens–rotations is best explained in the signal flow graph (SFG) for QRD–updating which is shown in Figure 2.3 for M = 2 and N = 4. In this SFG the upper triangular matrix R(k) can be recognized, as well as the input vector (from the delay line) that is placed on top of it. Compare this to the matrix (2.17). The rotations (hexagons in the signal flow graph) are defined by : a0 b0 = cos θ − sin θ sin θ cos θ a b . 38 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS When a new input vector x(k+1) enters the scheme, the top left hexagon will calculate θ1 such that its output b0 = 0, tan θ1 = x1 (k + 1) , R11 (k + 1) (2.18) and it will update R11 accordingly. Note that the denominator in this expression is never zero by the definition of the QR–decomposition (the matrix R(k) should be properly initialized before the first iteration). The other hexagons in the first row use this θ1 to process the remaining elements of the input vector and the top row of R(k) . This corresponds to applying G1,2 (θ1 (k)). Then the first hexagon in the second row will calculate θ2 so that applying G1,3 (θ2 (k)) nulls the second element of the modified input vector and updates the second row of R(k), and so on [39]. For a more detailed description of this signal flow graph, we refer to [46]. Algorithm 4 shows the QRD– updating process, see also [29]. Note that the updating scheme requires O((M N )2 ) flops per update. Algorithm 4 QRD–updating UpdateQRD (R, x, Weight) { // x is input vector // an upper triangular matrix R is being updated for (i = 0; i < M * N; i++) { R[i][i] *= Weight; temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]); sinTheta = x[i] / temp; cosTheta = R[i][i] / temp; R[i][i] = temp; for (j = i+1; j < M * N; j++) { temp = R[i][j] * dWeight; R[i][j] = cosTheta * temp + sinTheta * x[j]; x[j] = -sinTheta * temp + cosTheta * x[j]; } } } 2.3.3 QRD–based RLS algorithm (QRD–RLS) The QR–decomposition can be used to perform a least squares estimation of the form 2 min kX(k)W (k) − D(k)k . W (k) (2.19) Here W (k) is a matrix, each column of which corresponds to a least squares estimation problem with X(k) and the corresponding column of D(k) (referred to as the 39 2.3. RECURSIVE LEAST SQUARES ALGORITHMS filter input 1 ∆ x1(k+1) x 2 (k+1) filter input 2 ∆ x1(k) x 2(k) ∆ ∆ x1(k−1) x 2(k−1) ∆ x1(k−2) x 2 (k−2) ∆ R11 0 R22 0 R33 0 R44 0 R55 Rij(k+1) Rij(k) = ∆ λ a a’ b 0 0 θ =arctan(b/a) R66 0 hexagons are rotations R77 a a’ b b’ θ θ 0 R88 0 Figure 2.3: Givens–rotations based QRD–updating scheme to update an R(k)–matrix. On top the new input vector is fed in, and for each row of the R(k)–matrix a Givens–rotation is executed in order to obtain an upper triangular matrix R(k + 1). 40 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS “desired response signal”). If (2.1) is solved instead of (2.6), both D(k) and W (k) reduce to a vector. D(k) will also be defined in a time–recursive fashion using weighting : T d (k + 1) D(k + 1) = . λD(k) (2.20) Using equation (2.14) it is found that the least squares solution to (2.19) is given by W (k) = (X(k)T X(k))−1 X T (k)D(k) = R(k)−1 QT (k)D(k) . 
| {z } (2.21) Z(k) Hence W (k) is computed by performing a triangular backsubstitution with left hand side matrix R(k) and right hand side matrix Z(k). From R(k) = QT (k)X(k) and Z(k) , QT (k)D(k) it follows that Z(k) can be obtained by expanding the QRD– updating procedure with the desired signals part, i.e. applying the QRD–updating procedure to X(k) D(k) instead of X(k) only, as shown in Figure 2.4 with d(k) = d1 (k) d2 (k) T . At any point in time the least squares solution W (k) may then be computed based on the stored R(k) and Z(k) according to formula (2.21). The update equation becomes : T 1 0 x (k + 1) dT (k + 1) = 0 Q(k) λR(k) λz(k) 1 0 0 rT (k + 1) Q(k + 1) . 0 Q(k) R(k + 1) z(k + 1) | {z } (2.22) [∗|Q(k+1)] RLS has a rather large computational complexity, but (unlike NLMS) it shows a very good performance that is independent of the input signal statistics. Furthermore, it has been shown in [39] that rT (k + 1) dT (k + 1) − xT (k + 1).W (k) = QM N , i=1 cos θi (k + 1) where rT (k + 1) = ε1 ε2 (2.23) 2.3. RECURSIVE LEAST SQUARES ALGORITHMS 41 is a byproduct of the extended QRD–updating process, as indicated in Figure 2.4. This means that we can extract (a priori) least squares residuals without having to calculate the filter coefficients W (k) first. This is referred to as ’residual extraction’. Note that the denominator in (2.23) can not become zero because since the denominator of (2.18) will never be zero. For the a posteriori residuals, we can write dT (k + 1) − xT (k + 1).W (k + 1) = M N Y cos θi (k + 1)rT (k + 1). (2.24) i=1 The signal flow graph for the whole procedure as given in Figure 2.4 corresponds to Figure 2.3 with the right hand side columns with inputs d1 (k) and d2 (k) added to the right, as well as a “Π cos θ accumulation chain” added to the left. The complexity of this scheme is still O(M 2 N 2 ) per time update. Algorithm 5 gives details about the QRD–RLS procedure. Algorithm 5 Update of the QRD–RLS algorithm QRDRLS_update (R, x, r, Weight) { // x is the input vector // r is the desired signal input (a scalar in this case) // the upper triangular matrix R is being updated, along with // the vector z which is the right hand side // the residual signal is returned PiCos = 1; for (i = 0; i < M * N; i++) { R[i][i] *= Weight; temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]); sinTheta = x[i] / temp; cosTheta = R[i][i] / temp; R[i][0] = temp; for (j = i+1; j < M * N; j++) { temp = R[i][j] * Weight; R[i][j] = cosTheta * temp + sinTheta * x[j]; x[j] = -sinTheta * temp + cosTheta * x[j]; } temp = z[i] / Weight; z[i] = cosTheta * temp + sinTheta * r; r = -sinTheta * temp + cosTheta * r; PiCos *= cosTheta; } return r * PiCos; } 42 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS x1(k+1) filter input 1 x2(k+1) filter input 2 ∆ x1(k) ∆ x2(k) ∆ x1(k−1) ∆ x2(k−1) ∆ x1(k−2) x2(k−2) ∆ d (k+1) d (k+1) 1 2 Z(k+1) 1 0 0 R11 0 R22 0 0 0 R33 0 R44 0 0 0 R55 0 R66 0 0 0 R77 0 R88 0 Π cos θ ε1 ε2 r(k+1) LS residual Figure 2.4: QRD–RLS algorithm. The right hand side (desired signal) is updated with the same rotations as the left hand side. 2.3.4 QRD–based least squares lattice (QRD–LSL) It is well known that the shift–structure property of the input vectors of Figure 2.4 can be exploited to reduce the overall complexity. It can be shown [46] that a QRD–RLS scheme as shown in Figure 2.4 is equivalent to the scheme of Figure 2.5, which requires only O(M 2 N ) flops per update instead of O(M 2 N 2 ) for the original scheme. 
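To make the QRD-updating and residual extraction steps of sections 2.3.2-2.3.3 concrete, the following sketch (ours, in Python/NumPy rather than the C-like pseudocode of the algorithm boxes) rotates a new data row into R and the right hand side z and returns the a posteriori residual through the product of cosines. Initialization, weighting and all variable names are illustrative choices, not the thesis' reference implementation.

import numpy as np

def qrd_rls_update(R, z, x, d, lam):
    # Rotate the new row [x^T | d] into the exponentially weighted triangular
    # factor R and right hand side z (one Givens rotation per row of R), and
    # return the a posteriori residual via the product of cosines
    # (residual extraction, cf. (2.23)-(2.24) and Algorithm 5).
    x = x.astype(float).copy()
    d = float(d)
    pi_cos = 1.0
    for i in range(R.shape[0]):
        R[i, i:] *= lam                       # exponential weighting of row i
        z[i] *= lam
        r = np.hypot(R[i, i], x[i])
        c, s = R[i, i] / r, x[i] / r          # rotation that zeros x[i]
        R[i, i] = r
        t = R[i, i + 1:].copy()
        R[i, i + 1:] = c * t + s * x[i + 1:]
        x[i + 1:] = -s * t + c * x[i + 1:]
        t = z[i]                              # same rotation on the right hand side
        z[i] = c * t + s * d
        d = -s * t + c * d
        pi_cos *= c
    return d * pi_cos

# toy use: identify a 4-tap filter from noisy data
rng = np.random.default_rng(0)
w_true = rng.standard_normal(4)
R, z, lam = np.eye(4) * 1e-3, np.zeros(4), 0.999
xbuf = np.zeros(4)
for k in range(500):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    dk = xbuf @ w_true + 1e-3 * rng.standard_normal()
    e_post = qrd_rls_update(R, z, xbuf, dk, lam)
w_hat = np.linalg.solve(R, z)                 # solve R w = z, cf. (2.21)
print("filter error:", np.linalg.norm(w_hat - w_true))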
Since N (the number of filter taps) is typically larger than M (the number of microphones), this amounts to a considerable complexity reduction. The complexity reduction stems from replacing the off–diagonal part of the triangular structure (the 43 2.3. RECURSIVE LEAST SQUARES ALGORITHMS diagonal part is seen to be still in place), by the computations in the added left hand part. The resulting algorithm is called QRD–LSL (QRD–based least squares lattice), and it is known to be a numerically stable implementation of the RLS algorithm, since it only uses stable orthogonal updates as well as exponential weighting. Note that QRD–LSL needs to read the input one sample ahead as compared to QRD–RLS. For further details on the QRD–LSL derivation, we refer to [46]. In Algorithm 6 the QRD–LSL adaptive filter is given in pseudocode. filter input 1 x1(k+2) filter input 2 x2(k+2) 1 0 0 0 0 ∆ 0 0 0 ∆ 0 0 0 ∆ d (k+1) d (k+1) 1 2 x2(k+1) ∆ R11 0 R22 R33 0 R44 0 ∆ 0 x1(k+1) 0 ∆ 0 ∆ R55 0 R66 0 ∆ 0 0 R77 0 R88 0 Πcos θ ε1 ε2 LS residual Figure 2.5: QRD–LSL. Notice that the inputs for the right hand side part are the desired signals at time k + 1, while the inputs for the left hand side are the input signals at time k + 2 2.3.5 RLS versus LMS RLS is much more complex than LMS, but the performance for colored signals like speech is often better. In formula (2.12) the updating equation for LMS can be recognized, with a ’prewhitening’ added in the form of the multiplication with the inverse correlation matrix. 44 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 6 QRD–LSL update Update(R, x, Weighting, RightHandSideMatrix, r, PiCos) { for (i = 0; i < M*N; i++) { for (j = i; j < M*N; j++) { R[i][j] *= Weighting; } GivensCalcAngle(SinTheta, CosTheta, R[i][i], x[i]); for (j = i+1; j < M*N; j++) { GivensRotate(SinTheta, CosTheta, R[i,j], x[j]); } for (j=0; j < RightHandSideMatrix.GetNrColumns(); j++) { RightHandSide[i][j] *= Weighting; GivensRotate(SinTheta, CosTheta, RightHandSideMatrix[i][j], r[j]); } PiCos *= CosTheta; } return r; } ProcessNewInput(x, Weight, Desired) { PiCos=1; xl = x; xr = x; delay[0] = x; for (int i=0; i < N; i++) { dxl = delay[1]; dxr = dxl; Update(RightR[i], dxr, dWeight, [RotationsRight[i] z[i]], [xr dDesired], PiCos); if (i < N-1) { Update(LeftR[i], xl, dWeight, RotationsLeft[i], dxl, 0); delay[i] = dxl; } xl = xr; } for (int i = 0; i < N-1; i++) { delay[i+1] = delay[i]; } return dDesired*dPiCos; } 2.4. AFFINE PROJECTION BASED ALGORITHMS 2.4 45 Affine Projection based algorithms We will introduce the affine projection algorithm, and its time domain fast version named FAP. 2.4.1 The affine projection algorithm The affine projection algorithm (APA) [43] is an ’intermediate’ algorithm in between the well known NLMS and RLS algorithms, since it has both a performance and a complexity in between those of NLMS and RLS. It is (for the case of a single desired signal) based upon a system of equations of the form d(k) d(k − 1) (2.25) XP (k)T wapa (k − 1) = = dP (k), .. . d(k − P + 1) x(k) x(k − 1) . . . x(k − P + 1) , XP (k) = where N is the filter length, P (with P < N ) is the number of equations in the system. The ’basic’ system of equations (2.1) can again be recognized, this time with a smaller number (P ) of equations. The APA–recursion for a single desired signal is given as : e(k + 1) = dP (k + 1) − XPT (k + 1)wapa (k), g(k + 1) = (XPT (k + 1)XP (k + 1) + δI)−1 e(k + 1), wapa (k + 1) = wapa (k) + µXP (k + 1)g(k + 1) y(k + 1) = xT (k + 1)wapa (k + 1). 
(2.26) Here µ and δ are a step size and a regularization parameter respectively. The regularization parameter is important in providing noise robustness, as will be explained in section 2.4. P is a small number (e.g. 10) compared to N (e.g. 1000 or 2000). The first element of eP (k + 1) is the a priori error of the ’most recent’ equation in each step. An algorithm specification is given in Algorithm 7. The complexity of the algorithm is O(M N P ). Also this algorithm is easily extended to multiple right hand sides (2.6). 2.4.2 APA versus LMS Just like for the RLS–algorithm, one can recognize an LMS–filter in (2.26), preceeded by a pre–whitening step on the input signal. The NLMS algorithm is as a matter of 46 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 7 Affine projection algorithm w=0 Loop (new x and d in each step): add column x to XP as first column remove last column from XP add d as first element from dP remove bottom element from dP e = dP − XPT w g = (XPT XP + δI)−1 e w = w + µXP g y = xT w fact a special case of the APA–algorithm with P = 1. For µ = 1 and δ = 0, the a posteriori error in (2.26) epost (k + 1) = dP (k + 1) − XPT (k + 1)w(k + 1) is zero, i.e. APA basically solves the P most recent equations exactly, based on a minimum norm weight vector update. Remember that the NLMS–algorithm applied a minimum norm update to the solution vector such that the a posteriori error of the most recently added equation is exactly zero. A geometrical interpretation will be given in section 2.5. 2.4.3 The Fast Affine Projection algorithm (FAP) A fast version of APA, called FAP, which has a complexity of 4N + 40P flops is derived in [26]. Since typically P N , FAP only has a small overhead as compared to NLMS. This complexity reduction is accomplished in two steps. First, one only calculates the first element of the P –element error vector eP (k) (see formula 2.26) and one computes the other P − 1 elements as (1 − µ) times the previously computed error. As stated in [26], this approximation is based upon an assumption about the regularization by diagonal loading (δI) : ei (k) = epost i−1 (k − 1) for i > 1. Here ei (k) means the i’th component of the vector e(k), and similarly for epost i (k). Indeed, we have epost (k) = dP (k) − XPT (w(k − 1) + µX(k)(X T (k)X(k) + δI)−1 e(k)) | {z } w(k) ≈ e(k) − µe(k) ≈ (1 − µ)e(k). 47 2.5. GEOMETRICAL INTERPRETATION As shown in [26], this eventually leads to e(k) ≈ e1 (k) −−−−−−− (1 − µ)e1 (k − 1) .. . (1 − µ)eP −1 (k − 1) , (2.27) where e1 (k) = d(k) − xT (k)w(k − 1). Note that with a stepsize µ = 1, the P − 1 lower equations would have been solved exactly already in the previous time step, and hence their error would indeed be zero. A second complexity reduction is achieved by delaying the multiplications in the matrix–vector product X(k)g(k) in equation 2.26. This results in a ’delayed’ coefficient vector ŵ(k − 1) = w(0) + µ k−1 X x(k − l) l=P l X gj (n − l + j), j=0 such that w(k) = ŵ(k − 1) + µXP (k)f (k), where f (k) = g1 (k) g2 (k) + g1 (k − 1) .. . gP −1 (k) + . . . + g1 (k − P + 1) . It can be shown that an updating formula for ŵ(k) exists. A correction term can be used to obtain the residual at time k without having to calculate w(k) first. Details on the derivation can be found in [26]. Algorithm 8 is a full description of FAP. The complexity of the FAP adaptive filter can even be further reduced by using frequency domain techniques. 
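As an illustration of Algorithm 7 and (2.26), the following is a minimal NumPy sketch of one regularized APA iteration; the filter length, P, delta, mu and the toy signals are arbitrary choices of ours, and the small P x P system is solved directly for brevity.

import numpy as np

def apa_step(w, X_P, d_P, x_new, d_new, mu=1.0, delta=1e-2):
    # One affine projection update (cf. Algorithm 7).
    # X_P : N x P matrix with the P most recent input vectors as columns.
    # d_P : length-P vector with the corresponding desired samples.
    X_P = np.column_stack((x_new, X_P[:, :-1]))       # shift in newest column
    d_P = np.concatenate(([d_new], d_P[:-1]))
    e = d_P - X_P.T @ w                               # a priori error vector
    P = X_P.shape[1]
    g = np.linalg.solve(X_P.T @ X_P + delta * np.eye(P), e)   # diagonal loading
    w = w + mu * X_P @ g                              # minimum norm weight update
    y = x_new @ w                                     # filter output
    return w, X_P, d_P, y

# toy echo path identification, N = 64 taps, P = 4 equations
rng = np.random.default_rng(1)
N, P = 64, 4
w_true = rng.standard_normal(N) * np.exp(-np.arange(N) / 20.0)
w = np.zeros(N)
X_P, d_P = np.zeros((N, P)), np.zeros(P)
xbuf = np.zeros(N)
# coloured (lowpass) far end signal, the case where APA pays off over NLMS
far_end = np.convolve(rng.standard_normal(4000), np.ones(8) / 8, mode="same")
for k in range(len(far_end)):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = far_end[k]
    d = xbuf @ w_true + 1e-3 * rng.standard_normal()  # echo plus near end noise
    w, X_P, d_P, y = apa_step(w, X_P, d_P, xbuf, d)
print("misalignment:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))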
In chapter 4, the Block Exact Fast Affine Projection (BEFAP) adaptive filter [59] will be reviewed, and we will derive a block exact version of APA, without the FAP–approximations, but with almost equal complexity as BEFAP. 2.5 Geometrical interpretation All algorithms update their filter coefficient vector in each time step. The similarity between (2.10), (2.12) and (2.26) is obvious. For NLMS with µ = 1 and δ = 0 the a posteriori error is zero in each step, and this corresponds to the fact that the most recent equation in a system of equations like (2.6) 48 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS Algorithm 8 Fast affine projections (FAP) for N filter taps which outputs the residuals of the filtering operation. The notation •a:b denotes the vector formed by the a’th to b’th component (inclusive) of vector • Loop : rxx = rxx + x(k).x2:P (k) + x(k − N )x2:P (k − N ) ê1 = d − xT ŵ e1 =ê1 − µrTxx f1:P −1 e1 (1 − µ)e1 e= . . . (1 − µ)eP −1 Update S = (X T X + δ)−1 0 f1 f = . + Se . . fP −1 ŵ = ŵ + µx(k − P + 1)fP e=e output e1 or d − e1 is solved exactly. For APA (µ = 1 and δ = 0) the a posteriori error vector of size P is zero, which means that the P most recent equations in (2.6) are exactly solved. This is possible as long as P ≤ N . When P > N we can only solve the system of equations in the least squares sense, which then corresponds to an RLS algorithm with a sliding window. So APA is clearly an intermediate algorithm in between RLS and NLMS in view of complexity, but also in view of performance. P is a parameter that can be tuned in function of the available processing power, where a larger P results in higher complexity, but improved performance. The fact that the performance of APA is intermediate between NLMS and RLS, can be shown geometrically. Figure 2.6 shows a geometric representation of the convergence of an NLMS filter with two filter taps. Assume the optimal filter vector that has to be identified by the process to be w. The vectors xi are the consecutive input vectors, while the points wi are the estimates of the filter vector in successive time steps. Assume the estimate of the filter vector at time 0 to be w0 . When a new input x1 arrives, w0 will be updated in the direction of x1 such that the error in the direction of x1 becomes zero (µ = 1 is assumed). This means that w0 is projected on to a line (=affinity) that is orthogonal to the input vector x1 , and that contains the vector w. When the process continues, we see that the estimates converge to w. The convergence rate is higher when the directions of the input vectors are ’white’, which means if they are effectively uncorrelated. 49 2.5. GEOMETRICAL INTERPRETATION W X4 W5 X3 W4 X2 X1 W3 W2 W1 W0 X5 Figure 2.6: Geometrical interpretation of NLMS. The estimate of the filter vector is projected upon an affinity of the orthogonal complement of the last input vector, such that this affinity contains the ’real’ filter vector. W W0 W1 X2 X1 Figure 2.7: Geometrical interpretation of APA. The estimate of the filter vector is projected upon an affinity of the orthogonal complement of the last P input vectors. This affinity contains the ’real’ filter vector. 50 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS For the APA–algorithm, we have a sketch in Figure 2.7 for a system with 3 filter taps and the APA–order P = 2. Now the estimate w0 is projected on the affinity of the orthogonal complement of the last P = 2 input vectors, such that this affinity contains the solution vector w (again, if stepsize µ = 1). 
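This geometric picture can be checked numerically. The short sketch below (ours; the thesis contains no code here) verifies that an NLMS update with mu = 1 zeros the a posteriori error of the newest equation only, whereas an APA update with mu = 1 and delta = 0 zeros the a posteriori errors of the P most recent equations.

import numpy as np

rng = np.random.default_rng(2)
N, P = 8, 3
w = rng.standard_normal(N)                 # current filter estimate
X = rng.standard_normal((N, P))            # P most recent input vectors (columns)
d = rng.standard_normal(P)                 # corresponding desired samples

# NLMS (= APA with P = 1): only the newest equation is solved exactly
x, d0 = X[:, 0], d[0]
w_nlms = w + x * (d0 - x @ w) / (x @ x)
print(d0 - x @ w_nlms)                     # ~0: a posteriori error of newest equation

# APA with mu = 1, delta = 0: the P most recent equations are solved exactly,
# via a minimum norm update (projection onto their common solution affinity)
g = np.linalg.solve(X.T @ X, d - X.T @ w)
w_apa = w + X @ g
print(d - X.T @ w_apa)                     # ~0 vector: all P a posteriori errors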
From this picture it can be seen that APA converges faster than NLMS when x1 and x2 have almost the same direction (i.e. when the input vectors are correlated). This intuitive geometrical interpretation shows that APA is an extension of the NLMS algorithm, and at the same time it explains the name of the affine projection algorithm.

2.6 Conclusion

In this chapter, we have reviewed some adaptive filtering algorithms that will be important in the rest of this text. The NLMS algorithm is a cheap algorithm (O(4MN) complexity) that exhibits performance problems when non-white input signals are used, because its convergence speed depends on the input signal statistics. On the other hand, the RLS algorithm, which performs very well even for non-white signals, is much more expensive (O((MN)^2) complexity for the standard versions, and O(M^2 N) for stable fast versions like QRD-LSL). A class of 'intermediate' algorithms is the APA family of adaptive filters. These filters have a design parameter P with which both complexity and performance can be tuned. RLS and APA can be seen as an LMS filter with additional pre-whitening. Until recently, only NLMS filters and even cheaper frequency domain variants were used to implement acoustic echo cancellation, because of their complexity advantage whenever long adaptive filters are involved, although both RLS and APA are well known to perform better. Due to the increase in computing power over the years, APA filters are increasingly finding their way into this field as well. We continue in this direction by proposing a new fast version of the APA algorithm in the next chapters.

Chapter 3
APA-regularization and Sparse APA for AEC

In the previous chapter we have reviewed the adaptive filtering algorithms that are important in this thesis. In this and the next chapter, we will apply the affine projection algorithm to the problem of acoustic echo cancellation. APA has become a popular method in adaptive filtering applications. Fast versions of it have been developed, such as FAP (chapter 2) and the frequency domain Block Exact Fast Affine Projection (BEFAP) [59]. In this chapter, we focus on three main topics, namely regularization of APA, problems that exist in conventional fast APA implementations when regularization is applied, and finally regularization of APA in multichannel acoustic echo cancellation.

In a traditional echo canceller system, the adaptation is switched off by a control algorithm when the microphone signal contains a near end (speech) signal. However, robustness against continuously present near end noise is also very important, especially for APA-type algorithms, which indeed tend to exhibit a large sensitivity to such near end noise. Regularization is generally used to obtain near end noise robustness. We will review two alternatives for regularization, and introduce a third alternative (which we will call the 'sparse equations' technique). Existing fast implementations of the affine projection algorithm (based upon FAP) exhibit problems when much regularization is used. Besides that, the FAP algorithm cannot be used with the sparse equations technique that we will derive. We will show this in section 3.3, and this motivates further algorithm development in chapter 4.

The outline of this chapter is as follows: in section 3.1, we will first state the problem that occurs if near end noise is present, and how diagonal loading and exponential
APA–REGULARIZATION AND SPARSE APA weighting (regularization) can be used to resolve this . In section 3.2 we will introduce a ’sparse equations’ regularization technique, which will also reduce the influence of near end noise. The problems in the FAP algorithm when much regularization is used, are demonstrated in section 3.3. In sections 3.4 and 3.5 experimental results are given and the behaviour of multichannel echo cancellation is studied when regularization is applied. Conclusions are given in section 3.6. 3.1 APA regularization In this section we will review why regularization is important for APA–type algorithms in case near end noise sources are present. ’Diagonal loading’ and exponential weighting as regularization methods are also reviewed. In the next section, we will introduce a third alternative, which we will call the ’sparse equations technique’. 3.1.1 Diagonal loading The (semipositive definite) covariance matrix X T (k)X(k) that is used (and inverted) in the APA–expressions (2.26) is regularized by adding a small constant δ times a unity matrix (diagonal loading). The equations are repeated here for the update from k − 1 to k (for convenience) : e(k) = dP (k) − XPT (k)w(k − 1) g(k) = (XPT (k)XP (k) + δI)−1 e(k) . w(k) = w(k − 1) + µX(k)g(k) The obvious effect of this is that the matrix can not become indefinite, but regularization also has a beneficial effect when near end noise is present. This is shown in [27] as follows. Rewrite dP (k) as dP (k) = x(k).wreal + n(k) with wreal the room impulse response we are looking for. The vector n(k) consists of only the near end noise in absence of a far end signal. We can derive the formula for the difference vector ∆w(k) between the real impulse response wreal and the identified impulse response at time k, w(k), namely ∆w(k) ≡ w(k) − wreal , ∆w(k) (3.1) = I − µ XP (k)(XPT (k)XP (k) + δI)−1 XPT (k) ∆w(k − 1) + | {z } P (k) µ XP (k)(XPT (k)XP (k) | {z P̃ (k) + δI)−1 n(k). } (3.2) 53 3.1. APA REGULARIZATION If XP (k)is written as its singular value decomposition, XP (k) = U (k)Σ(k)V T (k), we can write P (k) = XP (k)(XPT (k)XP (k) + δI)−1 XPT (k) ⇓ XPT (k)XP (k) , U (k)diag(σ02 (k), . . . , σP2 −1 (k))U T (k) P (k) = U (k)diag( (3.3) σP2 −1 (k) σ02 (k) , . . . , , 0N −P )U T (k), σ02 (k) + δ σP2 −1 (k) + δ where σi (k) are the singular values of XP (k), and U (k) and V (k) are orthogonal. These equations show that δ has an effect both on the adaptation (first term of equation 3.2), and on the near end noise amplification matrix (second term of equation 3.2). P (k) can be interpreted as an almost–projection matrix. If δ is chosen to be the power of the background noise n(k), replacing XP (k) by its singular value decomposition reveals that the directions (see section 2.5) in the adaptation of w(k) with large signal σi2 ≈ 1) and that (unreliable) updates in to noise ratios are retained (since then σ2 +δ i σ2 i directions with small signal to noise ratios are reduced ( σ2 +δ ≈ i obvious choice for δ concerning its influence on the adaptation. σi2 δ ). Hence this is the In the second term of equation 3.2, the continuously present background noise is seen to be multiplied by the matrix µP̃ (k) : P̃ (k) P̃ (k) = XP (k)(XPT (k)XP (k) + δI)−1 ⇓ XPT (k)XP (k) , U (k)diag(σ02 (k), . . . , σP2 −1 (k))U T (k) σ0 (k) σP −1 (k) = U (k)diag( 2 ,..., 2 , 0N −P )V T (k). 
σ0 (k) + δ σP −1 (k) + δ (3.4) Since U (k) and V (k) are orthogonal matrices, the noise amplification factors for the i–th mode are in the k–th step given as τ (σi (k), δ) = µ σi (k) . σi2 (k) + δ (3.5) So the larger δ is chosen, the less the near end noise is amplified into the adaptation of w(k). The conclusion is that by the proposed choice of δ, the amplification of the near end noise is prevented, while the adaptation itself is only reduced in directions with a low SNR. Figure 3.1 shows the echo energy loss for an acoustic echo canceller against time for some speech signal in the presence of near end noise. The dotted line is the loss for an unregularized APA–algorithm, the full line results when a properly chosen regularization term is applied before inverting the correlation matrix. Both are plotted on a logarithmic scale. The regularized case is better. 54 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA Attenuation [dB] 60 50 40 30 20 10 0 0 2 4 6 8 10 12 14 16 18 Time [s] Figure 3.1: When near end noise is present, the dotted line is the echo energy loss in dB for affine projection without regularization, the full line for affine projection with regularization. The graph shows a better result when regularization is applied. Often a Fast Transversal Filter (FTF) algorithm is used [26] to update the inverse correlation matrix (XPT (k)XP (k))−1 , since then also regularization by diagonal loading can rather straightforwardly be built in. But since this type of algorithms is known to have poor numerical properties, we propose to use QR–updating instead. The update equations for APA then become e(k) = dP (k) − XPT (k)w(k − 1) T R (k)RP (k)g(k) = e(k) (3.6) P w(k) = w(k − 1) + µXP (k)g(k) The second equation in (3.6) can then be solved by first updating R(k) by means of Algorithm 4, and then performing two successive backsubstitutions (with quadratical complexity because R(k) is triangular). QR updating is numerically stable since it can be implemented by using only (stable) orthogonal rotations. QR–updating — just like the backsubstitutions — has a quadratic complexity, but since P (the dimension of (XPT (k)XP (k))−1 ) in acoustic echo cancelling applications typically is very small compared to the filter length (P = 2 . . . 10 while N = 2000), this is not an issue. The implementation cost for diagonal loading in the FAP algorithm is zero if (as in the original algorithm) FTF is used to update the correlation matrices, but is is impossible to implement this when (as we propose) the stable QR–updating approach is used. Exponential weighting as a regularization technique on the other hand, fits in nicely 55 3.1. APA REGULARIZATION with the QR–updating approach. 3.1.2 Exponential weighting An alternative way to introduce ’regularization’ consists in using an exponential window for estimating the inverse covariance matrix [44]. The updating for the correlation matrix now becomes (XPT (k + 1)XP (k + 1))−1 = (λXPT (k)XP (k) + x(k + 1)xT (k + 1))−1 with x(k) = [ x(k) x(k − 1) . . . x(k − P + 1)]T . This is in contrast to the equations (2.25) and (2.26), where a sliding window is used. Figure 3.2 shows that when no noise is present, APA with a sliding window and APA with an exponential window both perform almost equally well. As shown in Figure 3.3, the regularization effect of using an exponential window keeps the coefficients from drifting away from the correct solution when noise is present. 
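A small sketch of the second equation of (3.6): once an upper triangular factor R_P of X_P is available, g follows from two O(P^2) triangular solves instead of an explicit matrix inversion. We obtain R_P here from a one-shot QR factorization purely for illustration; in the adaptive filter itself R_P would be tracked recursively with Givens rotations, and the sizes below are arbitrary.

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
N, P = 256, 8
X_P = rng.standard_normal((N, P))          # stand-in for the P recent input vectors
e = rng.standard_normal(P)                 # a priori error vector

# R_P is the Cholesky factor of X_P^T X_P; here from a one-shot QR,
# in the adaptive filter it is tracked with Givens up/downdates instead
R_P = np.linalg.qr(X_P, mode="r")

# solve R_P^T R_P g = e with two backsubstitutions (cf. (3.6))
tmp = solve_triangular(R_P, e, trans="T", lower=False)
g = solve_triangular(R_P, tmp, lower=False)

# same result as the explicit (unregularized) inversion
print(np.allclose(g, np.linalg.solve(X_P.T @ X_P, e)))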
APA (dotted) versus APA with exponential window (full), no noise present 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.2: For a simulated environment with a speech input signal, this plot shows the distance (the norm of the difference) between the real filter and the identified filter coefficient vectors against time, both for original APA with a sliding window (dotted line) and APA with an exponential window (full line). This experiment shows the noiseless case, performance of both algorithms is equal. 56 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA APA (dotted) versus APA with exponential window (full line) when noise is present 0.2 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.3: Distance between identified and real filter vector versus time for the case where the echo signal is a speech signal, and near end noise is present. The full line is APA with an exponential window, the dotted line is the original APA algorithm without regularization. The identified filter coefficients get much closer to the real coefficients in case regularization (by using an exponential window) is used. 3.2 APA with sparse equations In this section we will derive a third alternative for incorporating regularization into the affine projection algorithm, which we call ’Sparse Equations’ technique. Diagonal loading is not easily implemented when the QR–updating technique is used, but both exponential weighting (see previous section) and the ’Sparse Equations’–technique on the other hand, are. Equation 3.5 shows that for δ = 0 noise amplification will be smaller if the smallest singular values are larger. So every method that realizes this is suitable to be used instead of explicit regularization. The reason why the singular values become small, lies in the autocorrelation that exists in the filter input signal. The system of P consecutive equations that is solved in APA, will therefore have a large condition number. This leads to the idea of using nonconsequtive equations. The equations will be less correlated since a typical speech autocorrelation function decreases with the time lag. We call the non–consequtive equations sparse equations. We will develop this further for equally spaced sparse equations. The matrix XP (k) ∈ RN ×P in the equations (2.25) to (2.26) is replaced by a matrix 57 3.2. APA WITH SPARSE EQUATIONS eP (k) ∈ RN ×P as follows. X = d̃P (k) − X̃PT (k)w̃(k − 1) ePT (k)X eP (k))−1 ẽ(k) = (X (3.7) [x(k) x(k − D) . . . x(k − (P − 1)D)] T d(k) d(k − D) . . . d(k − (P − 1)D) = (3.10) ẽ(k) g̃(k) w̃(k) (3.8) (3.9) = w̃(k − 1) + µX̃P (k)g̃(k) where eP (k) X d̃P (k) = Figure 3.4 shows the time behaviour of the smallest and the largest singular value of a e T (k)X eP (k)) regularized1 (XPT (k)XP (k) + δI) (explicit regularization case) and (X P Smallest and largest singular value. Full line : sparse equations, dotted : successive 3 10 σ 2 10 1 10 0 10 −1 10 −2 10 −3 10 −4 10 −5 10 0 0.5 1 1.5 2 2.5 4 Samples x 10 Figure 3.4: Smallest and largest singular value of input correlation matrix for a speech signal in function of time. Dotted line : explicit regularization (δ = 0.1). Full line : sparse equations (D = 10). Signal peak value = 0.1, P = 10, N = 1024. Regularization parameters were tuned for equal initial convergence in an echo canceller setup. (sparse equations case) for a speech signal, plotted on a logarithmic scale. 
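The construction of the sparse-equations data matrix in (3.9)-(3.10) amounts to taking every D-th input vector as a column. The toy sketch below (our code; the lowpass signal, N, P and D are illustrative choices) builds both the consecutive and the spaced data matrix for a strongly autocorrelated input and prints the smallest singular value and condition number of each, which for such signals typically comes out in favour of the spaced equations, in line with Figure 3.4.

import numpy as np

rng = np.random.default_rng(4)
N, P, D = 512, 10, 10
# strongly autocorrelated (lowpass) input, mimicking speech-like correlation
x = np.convolve(rng.standard_normal(5000), np.ones(16) / 16, mode="same")

def data_matrix(x, k, N, P, spacing):
    # N x P matrix whose p-th column is the input vector at time k - p*spacing
    return np.column_stack([x[k - p * spacing - N + 1 : k - p * spacing + 1][::-1]
                            for p in range(P)])

k = 4000
X_std    = data_matrix(x, k, N, P, 1)      # consecutive equations (standard APA)
X_sparse = data_matrix(x, k, N, P, D)      # equations spaced D samples apart

for name, X in [("consecutive", X_std), ("sparse, D=10", X_sparse)]:
    s = np.linalg.svd(X, compute_uv=False)
    print(f"{name:12s}  sigma_min = {s[-1]:.3e}  condition = {s[0] / s[-1]:.1f}")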
Figure 3.4 shows that the matrix constructed using sparse equations typically has a better condition number than the explicitly regularized one. 1 In order to provide a fair comparison, regularization parameters have been tuned so that the initial convergence performance of an APA–adaptive filter is equal in both cases 58 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA There is a restriction though : the input signal x(k) has to be nonzero over the considered time frame (just some background (far end) noise is enough), because otherwise its covariance matrix even with sparse equations will become zero, (i.e. singular). Notice the silence in the input signal in the beginning of the plot. Since a control algorithm is available in every practical implementation of an echo canceller, its internal signals can be used to switch off adaptation when there is no far end signal present, and if there is a signal present, its covariance matrix should be of full rank. Experiments confirm that this is always the case in practice for speech signals. We will again use QR–updating (and downdating, see below) to track the covariance matrix (see section 2.3.2) . XP (k) = QP (k)RP (k) −1 −T T (XPT (k)XP (k))−1 = (RP (k)QTP (k)QP (k)RP (k))−1 = RP (k)RP (k) (3.11) (3.12) Here RP (k) ∈ RP ×P is an upper triangular matrix, QP (k) is an orthogonal matrix. Equation (3.12) shows that only the upper triangular matrix RP (k) needs to be stored and updated. Equation (3.8) can then be calculated by backsubstitution. (An alternative would be using inverse QR–updating [29] instead of QR–updating and multiplications instead of backsubstitutions ). From (2.25) and (3.10) it is seen that for updating XP (k) to XP (k + 1), instead of adding a column to the right and removing a column from the left, also a row can be added to the top, and one removed from the bottom. This translates into size P updates and downdates for the upper triangular matrix RP (k). Updating can be done using Givens-rotations on RP (k) for the updates (which corresponds to adding a row to XP (k)). Similarly, downdating is performed using hyperbolic rotations [29]. The procedure (and SFG, see Figure 2.3) is similar to the QRD–updating procedure, only now hyperbolic transformations of the form 0 a cosh(θ) − sinh(θ) a = b0 − sinh(θ) cosh(θ) b where angle θ is computed in a diagonal processor in the signal flow graph. The downdating algorithm is given in Algorithm 9, together with the function to update R with a rectangular window. In this way a rectangular window is implemented. Because the hyperbolic rotations are not numerically stable, it is interesting to make the window (weakly) exponential by multiplying the matrix RP (k) with a weighting factor λ (very close to 1) in each step. In this case, the filter weights must be compensated. This is due to the fact that XP (k) is updated row by row, while the actual input vectors are the columns. So the ’compensated’ filter vector becomes w̃comp (k) = diag( 1 λN −1 ,..., 1 )w̃(k) λ0 3.2. APA WITH SPARSE EQUATIONS 59 Algorithm 9 QRD–downdating and tracking of R(k) with a rectangular window. If λ 6= 1 the filter vector should be compensated. 
DowndateQRD (R, x) { // x is input vector // an upper triangular matrix R is being downdated for (i = 0; i < M * N; i++) { if (abs(x[i]) < abs(R[i][i])) { temp = x[i]/R[i][i]; coshTheta = 1 / sqrt(1-temp*temp); sinhTheta = coshTheta*temp; } else { temp = R[i][i]/x[i]; sinhTheta=1/sqrt(1-temp*temp); coshTheta=sinhTheta*temp; } R[i][i] = coshTheta * R[i][i] - sinhTheta * x[i]; for (j = i+1; j < M * N; j++) { temp = R[i][j] ; R[i][j] = coshTheta * temp - sinhTheta * x[j]; x[j] = -sinhTheta * temp + coshTheta * x[j]; } } } TrackRRectangularWindow { UpdateQRD(R,xk:−1:k−P +1 ,λ) DowndateQRD(R,xk−N :−1:k−N −P +1 ) } 60 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA How much decorrelation is provided by choosing D larger is of course dependent upon the statistics of the far end (echo reference) signal. In our experiments we have taken a fixed value of D. It should be noted that the complexity and memory requirements of the implementation will rise for larger D. If D is chosen large, more ’past information’ is considered for the estimation of the input statistics, (which is also the case for exponential weighting of course), so tracking of the input signal statistics will become slower. The plots in Figure 3.5 show the evolution of the distance between a (synthetically generated) room impulse response and what APA (dotted line) and APA with sparse equations (full line) identify as the filter vector. In Figure 3.5, there is no near end noise present, and then both methods have almost equal performance. In Figure 3.6, a small quantity of white near end noise disrupts the adaptation of the filter coefficients, which now at some points have a tendency to move away from their optimum values. The sparse equations setup can be seen to perform better than the setup with explicit regularization. This experiment was repeated with different distances between the equations and different regularization factors. Here, a distance of D = 5 was chosen, compared to a δ = 0.01 (where the maximum signal level is 0.1). Distance between real room response and (1) APA (dotted) (2) Sparse−APA (full) 0.2 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.5: Distance between real room response and w(k), the identified filter vector in function of time (speech input). Dotted line is regularized APA (δ = 0.01) , full line is sparse APA (D = 5). No near end noise is present. Sparse APA converges somewhat slower. The tree alternatives for regularization we have described, can all be used, and even combined. If QR–updating is used to keep track of the covariance matrix, regularization by diagonal loading is difficult to implement, but both exponential weighting and the sparse equations technique are valid choices. 61 3.3. FAP AND THE INFLUENCE OF REGULARIZATION norm van fout van disp−w en van gewone w (Volle lijn = disp). 0.2 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.6: Distance between real room response and w(k) for a speech signal. Dotted line is regularized APA (δ = 0.01), full line is sparse APA (D = 5). Near end noise is present. Sparse APA is shown to be a viable alternative for regularization It is also possible to combine the sparse equations technique with exponential weighting. This can easily be done by leaving out the downdates and the compensation for the filter weights. The sparse equations technique provides one with an extra parameter when regularizing APA, or can be used as a standalone technique for regularization. 
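A compact NumPy transcription of the sliding (rectangular) window tracking of R_P, i.e. one Givens update for the newest row followed by one hyperbolic downdate for the oldest row (cf. Algorithm 9). This is only a sketch under the assumption that every downdate is valid, and all sizes and signals are toy choices of ours.

import numpy as np

def givens_update(R, x):
    # Rotate row vector x into upper triangular R (adds x x^T to R^T R).
    x = x.copy()
    for i in range(len(x)):
        r = np.hypot(R[i, i], x[i])
        c, s = R[i, i] / r, x[i] / r
        Ri = R[i, i:].copy()
        R[i, i:] = c * Ri + s * x[i:]
        x[i:] = -s * Ri + c * x[i:]        # x[i] becomes zero here
    return R

def hyperbolic_downdate(R, x):
    # Remove row vector x from R (subtracts x x^T from R^T R), cf. Algorithm 9.
    # Assumes |x[i]| < |R[i, i]| at every step, i.e. a valid downdate.
    x = x.copy()
    for i in range(len(x)):
        t = x[i] / R[i, i]
        ch = 1.0 / np.sqrt(1.0 - t * t)
        sh = ch * t
        Ri = R[i, i:].copy()
        R[i, i:] = ch * Ri - sh * x[i:]
        x[i:] = -sh * Ri + ch * x[i:]
    return R

# sliding (rectangular) window of 50 rows over a stream of P-dimensional rows
rng = np.random.default_rng(5)
P, win = 6, 50
rows = rng.standard_normal((400, P))
R = np.linalg.qr(rows[:win], mode="r")
for k in range(win, rows.shape[0]):
    R = givens_update(R, rows[k])              # add the newest row
    R = hyperbolic_downdate(R, rows[k - win])  # drop the oldest row
X_win = rows[-win:]
print(np.abs(R.T @ R - X_win.T @ X_win).max())  # ~0: R tracks the window exactly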
3.3 FAP and the influence of regularization FAP was reviewed in section 2.4.3. An important observation is that the fast affine projection algorithm [26] builds upon some assumptions that are not valid anymore if the influence of the regularization becomes too large. The algorithm then starts to expose convergence problems, which is clearly shown in Figure 3.7 for a FAP algorithm with exponential weighting as a regularization technique. Figure 3.8 shows another example with explicit regularization for P = 10 with a ’strong’ regularization parameter (δ = 10). The plot shows the time evolution of the synthetically generated room impulse response and the filter vector estimated by both algorithm classes. APA is shown to perform better for this large regularization parameters than the FAP algorithm. This in particular will be a motivation for developing a fast (block exact) APA algorithm in chapter 4, as an alternative to the existing fast (block exact) FAP algorithms. 62 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA 0.25 0.2 0.15 0.1 0.05 0 0 0.5 1 1.5 2 2.5 4 x 10 Figure 3.7: Time evolution of the distance between identified and real filter for FAP (dotted) and FAP with an exponential window, λ = 0.9998 (full line) and a speech signal as far end signal. The approximations made in FAP are clearly not valid anymore when an exponential window is used. 0.2 |Wk−Wreal| 0.18 0.16 0.14 0.12 0.1 FAP, δ=10 0.08 0.06 APA, δ=10 0.04 0.02 0 0 0.5 1 1.5 2 2.5 4 Samples x 10 Figure 3.8: Behaviour of the FAP–based algorithms as compared to the APA–based algorithms when strong regularization is involved. The time evolution of the filter vector error norm (distance between real and identified room impulse response) is shown. The APA–algorithm has a much better convergence. The input is a speech signal. 3.4. EXPERIMENTAL RESULTS 3.4 63 Experimental results We will now show some experimental results concerning the regularization effect of the sparse equations technique on the echo canceller performance in the presence of near end noise. In all experiments in this section the same speech signal is used (maximum value of the signal is 0.1). The length of the echo canceller is 900 taps, and it tries to model a synthetically generated room impulse response of 1024 taps. This is a typical situation for echo cancelling : the ’real’ room impulse response is longer than the length of the acoustic echo canceller. The step size parameter is always µ = 1. In Figure 3.9, we compare the time evolution of weight error for APA (dotted line) and Sparse–APA (full line). In this simulation, white near end noise disturbs the adaptation of the filter coefficients. In the first half of the plot, the SNR is higher than in the second half of the plot. In all these experiments, the regularization parameters (δ for APA and the equation distance D for Sparse APA) were tuned to obtain an equal initial convergence, in order to have a fair comparison of the steady state performance. This was done by setting D = 5 for sparse–APA and then experimentally determining the value of δ (= 0.005) in order to get the same initial convergence. Sparse–APA where 10 out of 50 equations are used (so D = 5), outperforms explicitly regularized FAP (δ = 0.005) with 10 successive equations, both in the high and the low SNR part. Its performance is comparable with explicitly regularized APA (δ = 1) with 10 successive equations when the SNR is not too low, while else explicit regularization is better. 
On the plot also the performance of an explicitly regularized FAP–algorithm (δ = 1) with 50 successive equations is shown in order to show the performance drop if only 10 out of 50 equations are used. We can conclude that the performance of Sparse–APA where 10 out of 50 equations are used is better than the performance of FAP with 10 successive equations, if near end noise is present. The reason for this can probably be found in the regularizing effect of the sparse equations technique, and in the fact that the approximations in FAP are not present in Sparse– APA. Figure 3.9 also shows that regularization reduces FAP performance more than in the case of APA. In APA a regularization δ = 1 is needed to slow down the convergence to the same rate as the initial convergence for the sparse equations technique with D = 5. For FAP, the initial convergence speed has already decreased to that point with an explicit regularization of δ = 0.005. So this figure proves the performance benefit of using APA instead of FAP. In Figure 3.10, the tracking behaviour of Sparse APA is compared to the behaviour of APA, and it is the same for both algorithms when they are regularized comparably (equal initial convergence behaviour). In this experiment, D = 25 for Sparse APA, and δ = 0.1 for plain APA. P = 10 in both cases. 64 CHAPTER 3. APA–REGULARIZATION AND SPARSE APA 0.18 |Wk−Wreal| 0.16 0.14 0.12 FAP, δ=.005, P=10 Sparse−APA, D=5, P=10 0.1 0.08 0.06 0.04 APA, δ=1 0.02 FAP, 0.5 1 1.5 δ=1, P=50 2 4 Samples x 10 Figure 3.9: Distance between real room response and w(k) in the presence of near end noise in function of time. Regularization parameters have been tuned to give equal initial convergence characteristics. The SNR of the far end speech signal versus the near end noise is higher in the first half of the signal than in the second half. Regularization reduces FAP performance more than APA performance. 3.5 Regularization in multichannel AEC An important issue is multichannel AEC, as we have already mentioned in chapter 1. When multiple loudspeakers are used to reproduce the sound that stems from one speech source, the non–uniqueness problem occurs [41, 4]. In Figure 3.11 the situation is depicted. Microphones in the far end room pick up the sound of ’Source’ filtered by the transmission room impulse responses g1 and g2 . These signals are then again filtered by the receiving room impulse responses h1 and h2 . If the length N of the echo canceller filter w is larger or equal than the length of the transmission room impulse responses, the following equation holds : g2 g1 T T x1 = x2 0 0 such that g2 0 X(k) −g1 = 0 0 which means that X(k) is rank deficient and hence that no unique solution exists for (2.7). 65 3.5. REGULARIZATION IN MULTICHANNEL AEC Tracking behaviour 0.25 |Wk−Wreal| 0.2 0.15 Sparse−APA 0.1 Explicitly regularized APA 0.05 0 0 0.5 1 1.5 2 2.5 4 x 10 Samples Figure 3.10: Tracking behaviour of Sparse APA (full line) compared to explicitly regularized APA (dotted line) (far end signal is a speech signal). Regularization is tuned to obtain equal initial convergence. At the 12000 th sample, the room characteristics change. The tracking behaviour remains equal. Note the small peaks that occur if the input signal is not ’persistently exciting’ (to be solved by the speech detection device) x1 h1 x2 Transmission room g1 h2 W g2 Source ei + di Receiving room Figure 3.11: The multichannel echo cancellation non uniqueness problem. 
Changes in either the transmission room or the receiving room will destroy successful echo cancellation when the exact paths h1 and h2 have not been identified by w = [ w_1^T  w_2^T ]^T.

Although the adaptive filter will find some solution, only the solution corresponding to the true receiving room echo paths is independent of the transmission room impulse responses. In a mono acoustic echo canceller setup, the filter has to re-adapt if the acoustical environment in the receiving room changes. If a multichannel echo canceller does not succeed in identifying the correct filter paths, changes in the transmission room will also result in a residual echo signal appearing in the acoustic echo canceller output. In practice this situation does not strictly occur, because for echo cancellation the filter length N is usually smaller than the length of the impulse responses in the transmission room, so that

x_1^T(k) [ g_2 ; 0 ] - x_2^T(k) [ g_1 ; 0 ] = \alpha \approx 0

rather than exactly zero. But still, this means that the problem is typically ill-conditioned. Attempts to solve this problem can be found in the literature [41, 58, 28, 7], and they consist of decorrelating the loudspeaker signals (i.e. reducing the cross-correlation between the inputs of the adaptive filter by means of additional filtering operations, non-linearities, noise insertion, etc.). Obviously it is important that this remains inaudible. In addition to these decorrelation techniques (which cannot be exploited too much because of the inaudibility constraint), it is important to use algorithms whose performance is less sensitive to correlation in the input signal than NLMS. In [4] it is shown that RLS performs well because of its independence of the input eigenvalue spread. Since RLS is an expensive algorithm, and APA is intermediate between NLMS and RLS, APA is often considered to be a good candidate for use in multichannel AEC [34, 17, 33].

Experiments show that the influence of near end noise on the adaptation is a lot larger for a multichannel setup than for a mono echo canceller based upon affine projection, and that for good results the regularization has to be a lot stronger, i.e. the problem that occurs in FAP-based algorithms is even more pronounced in this case. In [26], explicit regularization is suggested with δ equal to the near end noise power. But experiments show that this is not enough when there is a large cross correlation between the input channels and a large amount of noise is present. When appropriately strong regularization methods are used instead, the performance drop due to the approximation in FAP is unacceptably large. For this reason, we propose to use the APA algorithm instead of FAP.

In the experiments shown here, 50,000 samples of a signal sampled at 8 kHz have been recorded in stereo, the room impulse responses (1000 taps) we want the filter to identify have been generated artificially, and artificial noise (SNR = 30 dB) has been added. If we apply the Sparse-APA algorithm with spacing D between the equations, experiments show that the cross-correlation problem in the stereo algorithm adds to the auto-correlation problem that is already present in the mono algorithm. This means that a much stronger regularization is required in the stereo case. We have chosen D = 200. This is to be compared with the typical value of D = 10 for the mono case. For exponential updating, a forgetting factor λ = 0.9998 is a typical value.
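The non-uniqueness argument can be illustrated numerically. In the sketch below (ours; lengths, impulse responses and the source signal are arbitrary toy values, and the filter is deliberately taken longer than the transmission room responses so that the exact relation holds), the stacked two-channel data matrix is rank deficient and the zero-padded vector [g2 ; -g1] lies in its null space.

import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(6)
Lg, N, L = 16, 24, 400             # transmission response length, filter length, samples
source = rng.standard_normal(L)
g1 = rng.standard_normal(Lg)       # transmission room responses
g2 = rng.standard_normal(Lg)
x1 = np.convolve(source, g1)[:L]   # loudspeaker signals in the receiving room
x2 = np.convolve(source, g2)[:L]

def conv_matrix(x, N):
    # L x N Toeplitz data matrix with delayed copies of x as columns
    return toeplitz(x, np.r_[x[0], np.zeros(N - 1)])

# stacked two-channel data matrix [X1 | X2] as seen by the adaptive filter
X = np.hstack([conv_matrix(x1, N), conv_matrix(x2, N)])
print("columns:", X.shape[1], " rank:", np.linalg.matrix_rank(X))

# the vector [g2, 0, -g1, 0] (zero padded to length N per channel) is a null vector
null_vec = np.concatenate([g2, np.zeros(N - Lg), -g1, np.zeros(N - Lg)])
print("||X @ null_vec|| =", np.linalg.norm(X @ null_vec))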
As already mentioned, explicit regularization can only be used in the FAP-based algorithms for small regularization terms (δ comparable to the near end noise variance, which was 0.001 in our experiments). When a large δ is required, as in this stereo problem with a lot of noise present, one has to resort to an exact APA implementation. Even δ = 0.1, for which the FAP approximation is not valid anymore, is not large enough to regularize the problem at hand, as shown in Figure 3.12. Eventually, δ = 2 was chosen. The results of the three techniques are shown in Figure 3.13, and can be seen to be comparable.

Figure 3.12: Distance between real and identified impulse response versus time for APA (speech signal) with an explicit regularization factor δ = 0.1. For the mono case this was sufficient, but obviously not for a stereo setup: the filter does not converge.

Finally, we want to reiterate that strong regularization does not solve the stereo echo cancelling problem; only decorrelation techniques do. But regularization is necessary in addition, in order to provide near end noise robustness.

Figure 3.13: Three ways of regularizing the stereo affine projection algorithm for a speech signal input with near end noise present. Full line: distance between the real and the identified impulse response for explicit regularization with δ = 2; dashed line: exponential updating with λ = 0.9999; dotted line: the sparse equations technique with an equation spacing of D = 200. The parameters are clearly higher than in the mono case.

3.6 Conclusion

In this chapter, we have shown that if affine projection techniques are used for acoustic echo cancellation, it is important to provide sufficient regularization in order to obtain robustness against continuously present background noise. This is important in single channel echo cancellation, but even more so in multichannel echo cancellation. In the latter case, cross-correlation between the loudspeaker signals (and hence the input signals of the adaptive filter) leads to ill-conditioning of the problem. Regularization needs to be applied in addition to decorrelation techniques.

We proposed to replace the FTF-based update of the small correlation matrix of size P in the original FAP algorithm by a QRD-based updating procedure that is numerically more stable. Diagonal loading is not easily implemented in this QRD-based approach, and therefore we have described two alternative approaches: exponential weighting, and a new technique based on 'sparse equations'. Comparable performance can be obtained with the three regularization techniques.

We have shown that there are both advantages and disadvantages to the FAP algorithm. Diagonal loading can be incorporated, because FAP uses the FTF algorithm for updating the size-P correlation matrix, but on the other hand FAP makes some approximations that are only valid when not too much regularization is applied, and hence it exhibits performance problems when more regularization is applied (e.g. in multichannel echo cancellation). This observation motivates the derivation of the BEAPA algorithm in chapter 4.
APA–REGULARIZATION AND SPARSE APA Chapter 4 Block Exact APA (BEAPA) for AEC We have explained in the previous chapter why regularization is an absolute necessity in affine projection based adaptive filtering algorithms, and that FAP (fast implementation of APA) relies on an implicit ’small regularization parameter’ assumption. This in particular may lead to poor performance of the FAP algorithm as compared to (properly regularized) APA. In this chapter, a block exact affine projection algorithm (BEAPA) is derived that does not rely on the assumption of a small regularization parameter. It is an exact frequency domain translation of the original APA algorithm, and still has about the same complexity as BEFAP, which is a similar frequency domain (hence low complexity) version of FAP. In a second stage, the BEAPA algorithm is extended to incorporate an alternative to explicit regularization that is based on so–called “sparse” equations (section 3.2). Section 4.1 will review FAP and a frequency domain version thereof : block exact FAP (BEFAP) [59]. FAP is not exactly equal to APA, and therefore some regularization techniques will have different effects in FAP, compared to APA as shown in the previous chapter. In section 4.2, a fast block exact frequency domain version of the affine projection algorithm is derived (Block Exact APA). This algorithm has a complexity that is comparable to the BEFAP–complexity, while it is an exact but fast version of APA. In section 4.3 Block Exact APA is extended to allow for the sparse equations technique to be used (Sparse Block Exact APA). 71 72 4.1 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC Block Exact Fast Affine Projection (BEFAP) In [36] and [59], a block exact version of FAP (see section 2.4.3) is derived, which is referred to as BEFAP. Since the derivation of the algorithm in this chapter is based upon BEFAP, it is instructive to review the concept of this algorithm in order to clarify the differences. A basis filter vector that is fixed during a block of size N2 (e.g. 128) is taken as a basis for a fast (frequency domain) convolution with a (possibly smaller, but a typical value is also 128) block length N1 . Since the filter vector is fixed during this block, the filtering operation can indeed be calculated cheaply in the frequency domain. N1 can be made smaller to reduce the delay of the system. To obtain an exact version of the FAP–algorithm, corrections to the residuals obtained with the fast convolution are calculated during the block. The complexity of the corrections grows within the block, but because of the choice of the parameters, it never reaches the complexity of the full filtering operation that is needed with FAP. After a block, all the corrections during the block of length N2 are applied to the basis filter vector, by means of a frequency domain convolution, resulting in the same output as the original FAP algorithm. If the filter vector were updated in each step and the basis filter vector w(k − 1) were known at time instant k, we can write w(k + i − 1) in terms of w(k − 1). We let sj (k) denote the j’th component of vector s(k) : w(k + i − 1) = w(k − 1) + i X µXP (k + i − j)g(k + i − j), j=1 = w(k − 1) + i+P X−1 sj (k + i − 1)x(k + i − j) j=1 − P −1 X sj (k − 1)x(k − j). 
j=1 The meaning of s(k) is as follows : since the columns in XP (k) shift through this matrix, the multiplications with g(k) can largely be simplified by adding together corresponding components of the vectors g(k) for k = 1..i and thus building up a vector s(k) recursively, containing such summed components. In what follows, k is the sample index at the start of a new block, while i is an index inside a block. If we let s|ji denote a sub–vector consisting of the i’th to the j’th element of s, the vector s(k + i − 1) is recursively obtained as s(k) |{z} ∈RP ×1 = 0 −1 s(k − 1)|P 1 + µg(k) (i = 1), 73 4.1. BLOCK EXACT FAST AFFINE PROJECTION (BEFAP) s(k + i − 1) = | {z } 0 s(k + i − 2) + µg(k + i − 1) 0i−1 (i > 1), (4.1) ∈R(P +i−1)×1 where 0i−1 is a null vector of size i − 1. Vector s(k) grows within a block, but its size is reset to P × 1 at each block border (where a new basis filter is calculated). So in each block, vector s(k + i − 1) grows from size P to size P + N2 − 1. The contents of the first P − 1 positions of s(k − 1) remain intact when crossing block borders. In BEFAP, the filter vector is not updated in each time step, but only at the end of a block. We will use the expression for w(k + i − 1) to derive corrections to the filter output that have to be applied after a filtering operation with the basis filter w(k − 1). The filter output is then written as y(k + i) = xT (k + i)w(k + i − 1) = xT (k + i)w(k − 1) + i+P X−1 − P −1 X sj (k + i − 1)xT (k + i)x(k + i − j) j=1 sj (k − 1)xT (k + i)x(k − j) j=1 = xT (k + i)w(k − 1) + i+P X−1 sj (k + i − 1)rj (k + i) (4.2) j=1 − P −1 X sj (k − 1)ri+j (k + i). j=1 We let rj (k) denote the j’th component of vector r(k). These correlations are defined as rj (k + i) ≡ xT (k + i)x(k + i − j). (4.3) In practical implementations, these correlations are recursively updated. Still referring to [59], one can avoid the third term in the equations for the output if one defines an alternative basis filtering vector : z(k − 1) = w(k − 1) − P −1 X sj (k − 1)x(k − j), (4.4) j=1 which can be updated as z(k + N2 − 1) +N2 −1 = z(k − 1) + X(k + N2 − P )s(k + N2 − 1)|P ,(4.5) P where X(k + N2 − P ) is defined as X(k + N2 − P ) = (4.6) x(k + N2 − P ) x(k + N2 − P − 1) . . . x(k − P + 1) . 74 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC We can now rewrite (4.2) as y(k + i) = xT (k + i)z(k − 1) + i+P X−1 sj (k + i − 1)rj (k + i). (4.7) j=1 The filtering operation with the alternative basis vector in the first term of equation 4.7 and the update to the next basis filter vector (4.5) after N2 samples can be performed in the frequency domain by means of fast convolutions. The block sizes of these convolutions do not necessarily have equal length : the block size for the filtering operation is N1 and the block size for the update is N2 . The overall complexity of this BEFAP–algorithm is 6M1 log2 M1 − 7M1 − 31 + 6N2 + 15P − 4+ N1 P2 − P 6M2 log2 M2 − 7M2 − 31 + 10P 2 + . 
N2 N2 (4.8) Algorithm 10 Block Exact FAP for j = 1 to N2 + P rj1 = (u|kk−L+1 )T u|k−j−1 k−j−L endfor loop for i = 0 to N2 − 1 if (i modulo N1 == 0) <fill next part of y with convolution of next block of N1 samples from u with z> endif ←−−−−−−−−−−−−− ←−−−−−−−−−−−−−−− k−L+i+1 r1 = r1 + uk+i+1 u|k+i+1 k+i+1−N −P +1 − uk−L+i+1 u|k−L+i+1−N −P +1 2 2 +i−1 T 1 i+P ek+i+1 = dk+i+1 − yk+i+1 − (s|P ) r |2 1 E1 = ek+i+1 for α = 2 to P Eα = (1 − µ)Eα−1 endfor <update S −1 , the P × P inverse covariance matrix> e = S −1 E g 0 µe g s= + N2 +P D−2 0N2 −1 s|1 endfor +N2 −1 +N2 +1 z = z+<convolution of s|P with u|k−P P k−P +N2 +1−L+1+1 > k = k + N2 endloop Where N1 and N2 are block lengths for the frequency domain algorithm (e.g. N1 = N2 = 128). Furthermore Mi = N + Ni − 1. The terms containing the logarithms 4.2. BLOCK EXACT APA (BEAPA) 75 are obviously due to the FFT–operations that are used. The complexities of the FFT’s 2 have been taken from [35]. The term P N−P can often be neglected because P N2 . 2 A typical example is P = 3, N1 = N2 = 128, N = 800 leading to 1654 flops per sample (which is about half the complexity of FAP). An algorithm description is provided in Algorithm 10. 4.2 Block Exact APA (BEAPA) In this section a fast implementation of APA (Block Exact Affine Projection Algorithm ) is derived, based on [59]. In FAP (and in BEFAP), the calculation of the lower P − 1 components of the error vector was based upon an approximation (2.27), while only the first component is really computed. In APA all components of the residual vector e ek are computed in each step instead of only the first one. We describe a method that does not require a full filtering operation for all of the P equations. The complexity of the new algorithm is 6M1 log2 M1 − 7M1 − 31 + 6N2 + 15P − 5 + 11P 2 N1 P2 − P 6M2 log2 M2 − 7M2 − 31 + . N2 N2 (4.9) This formula shows that even though the full error vector is calculated, the required number of flops is a lot smaller than doing P full filtering operations. For the example of section 4.1, P = 3, N1 = N2 = 128, N = 800, leading to 1662 flops. (The difference with the BEFAP–algorithm becomes slightly bigger when P is larger). Figure 4.1 is a schematical representation of the final algorithm. 4.2.1 Principle In the FAP–algorithm [26], only the first component of the error vector is calculated, and the others are approximated. Block Exact APA will be derived here along the lines of BEFAP, but such that all error vector components are calculated in each step. When k denotes the sample index corresponding to the beginning of a block, and i (1..N2 ) an index inside the block, we have e (k + i) = XPT (k + i)w(k e + i − 1), y e e (k + i), e(k + i) = dP (k + i) − y e(k + i) = (X T (k + i)X(k + i))−1 e g e(k + i), (4.10) 76 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC From far end Common for all microphones ∆ Blocklength Base Filter z r1 r 2 * * (XT X)−1 rP * * Mostly known from past s + + + + + + } g To far end e Figure 4.1: A schematical representation of the BEAPA algorithm in an echo canceller setup. Bold lines are vectors. A box with an arrow inside is a buffer that outputs a vector. e + i − 1) = w(k e − 1) + w(k i X µXP (k + i − j)e g(k + i − j). (4.11) j=1 In these expressions, dP (k + i) is the desired signal, e e(k + i) is a vector with the e (k + i) is the vector with the a priori errors of the P equations in this step, and y outputs of the filter for these P equations. 
We again propose to use QR–updating and downdating (or in case of exponential weighting as regularization updating only, see chapter 3) to keep track of R(k), the Cholesky factor of X(k) and use this triangular matrix in order to calculate 4.10 with quadratic complexity. From equation 4.11 it can be seen that, in a similar way as was done for BEFAP, we can write e + i − 1) = w(k e − 1) + w(k i+P X−1 j=1 sej (k + i − 1)x(k + i − j) − P −1 X j=1 sej (k − 1)x(k − j), where w̃(k−1) is our basis filter vector, and where the vector e s(k+i−1) is recursively obtained from e s(k + i − 2). In the beginning of each block, the size of vector e s(k) is reset. The recursion is 0 e s(k) = + µe g(k) i = 1, (4.12) −1 e s(k − 1)|P 1 e(k + i − 1) 0 g e s(k + i − 1) = +µ i > 1. (4.13) e s(k + i − 2) 0i−1 77 4.2. BLOCK EXACT APA (BEAPA) The filter outputs yeα (k + i) can now be written as yeα (k + i) e + i − 1) = xT (k + i − (α − 1))w(k T e − 1) + = x (k + i − (α − 1))w(k i+P X−1 j=1 P −1 X j=1 sej (k + i − 1)xT (k + i − (α − 1))x(k + i − j) − sej (k − 1)xT (k + i − (α − 1))x(k − j) e − 1) + = xT (k + i)w(k P −1 X j=1 i+P X−1 j=1 sej (k + i − 1)rjα (k + i) − α sej (k − 1)ri+j (k + i). The correlations are defined as rjα (k + i) ≡ xT (k + i − (α − 1))x(k + i − j), which is merely a shorthand notation for rj−(α−1) (k + i − (α − 1)) as defined in formula 4.3. We proceed (similarly to (4.4)) by defining a modified basis filter vector e e − 1) − z(k − 1) = w(k P −1 X j=1 sej (k − 1)x(k − j), then the filter output can be written as yeα (k + i) = xT (k + i − (α − 1))e z(k − 1) + e sT (k + i − 1)rα (k + i), (4.14) +N2 −1 e z(k + N2 − 1) = e z(k − 1) + X(k + N2 − P )e s(k + N2 − 1)|P , P (4.15) in which both e s(k + i − 1) and rα (k + i) are vectors of length i + P − 1. The autocorrelation vector rα (k + i) is needed to calculate the correction for the α’th e (k + i) (which is needed in turn to calculate the α’th component of the component of y residual vector e e(k + i)). The first term of this equation is a filtering operation with a filter vector e z(k − 1) that is fixed over a block of size N2 , and that is independent of α, so it can be performed efficiently in the frequency domain. The second term is growing inside each block. A recursion for the new filter vector can also be derived along the lines of [59], which gives where e s|ji is a sub–vector consisting of the i’th through j’th element of e s. The matrix X(k) has been defined in (4.6). Here too, the matrix–vector product can be calculated in the frequency domain with fast convolution techniques (block size N2 ). 78 4.2.2 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC Complexity reduction The calculation of all the components of the error vector seems to render its calculation P times as complex. However some important simplifications can be introduced. Writing out the correlation vectors used in the corrections in (4.14) (an example is given for a more general case further on in this chapter), one notices that a recursion exists for them : α−1 rβα (k + i) = r2−(β−1) (k + i − 1) for β ≤ 2, α > 1, (4.16) α−1 rβα (k + i) = rβ−1 (k + i − 1) for β ≥ 2, α > 1. (4.17) Since memory may be comparably expensive as processing power, this recursion appears not to be any advantage, because N2 + P delay lines would be needed. But we can build on this recursion to achieve a major complexity reduction. 
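The growth–and–reset behaviour of the vector s~ may be easier to see in code. Below is a minimal sketch of the recursion (4.12)–(4.13); the function name and the explicit block–border flag are illustrative assumptions.

```python
import numpy as np

def update_s_tilde(s_prev, g, mu, at_block_border):
    """Recursion (4.12)-(4.13) for the growing correction vector s~.

    s_prev : s~ from the previous step (length P at a block border,
             growing by one element per step inside a block)
    g      : current P-vector g~
    """
    P = len(g)
    if at_block_border:
        # (4.12): size is reset to P, keeping the first P-1 old entries shifted down by one
        return np.concatenate(([0.0], s_prev[:P - 1])) + mu * g
    # (4.13): inside a block the vector grows by one element per step
    g_padded = np.concatenate((g, np.zeros(len(s_prev) + 1 - P)))
    return np.concatenate(([0.0], s_prev)) + mu * g_padded
```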
The update recursion for s(k + i) (equation (4.13)) shows that a shift operation is applied to this vector (which grows) in each step, and only the first P elements change after this shift. In the calculation of the error vector, s is multiplied with each of the P vectors rα . This means that part of e sT (k + i − 1)rα (k + i), a scalar, has already been calculated in e step k + i − 1 (namely sT (k + i − 1 − 1)rα−1 (k + i − 1)), and we can calculate a correction s(k + i − 1) to this that consists of the accumulated updates to s(k + i − 1 − 1) multiplied by the relevant first part of the correlation vector : s(k + i − 1) 0P =e s(k + i − 1) − 0 e s(k + i − 2) . (4.18) The (fixed) length of s(k + i − 1) is P . For α > 1, equations 4.17 and 4.18 lead to e sT (k + i − 1)rα (k + i) T T s(k + i − 1) 0 α = r (k + i) + rα (k + i) e 0P s(k + i − 2) r2α (k + i) r3α (k + i) = sT (k + i − 1)e rα (k + i) + e sT (k + i − 2) .. . α r1+i+P (k + i) = sT (k + i − 1)e rα (k + i) + e sT (k + i − 2) r1α−1 (k + i − 1) r2α−1 (k + i − 1) .. . α−1 ri+P (k + i − 1) .(4.19) 4.3. SPARSE BLOCK EXACT APA 79 The vector e rα (k + i) is formed from the first P components of rα (k + i). The last term of equation 4.19 (a scalar) has already been calculated in step k + i − 1. So in each step, one needs to calculate a ’large’ vector product for the correction to the first component of the error vector (with the same size as in the BEFAP–algorithm), and P − 1 small vector products (size P + 1 ) since the second term from 4.19 can be fed through a delay line. For the calculations of the corrections for the first error vector +N2 ) component, this gives an average of (P +N2 −1)(P flops per sample, where N2 is N2 the block size. To this, (P + 1) flops per sample for each of the P − 1 remaining components of the error vector must be added. For typical (small) values of P , this is a lot less complex than calculating all the components straightforwardly. In this setting, one is free to choose if the e rα (k + i) are taken from previous steps as described in equations 4.16 and 4.17, or if they are recalculated at the time that they are needed (by up– and downdates). The latter requires less memory (for delay lines), but more flops. In the acoustic echo cancelling application, often scenario’s occur with multiple microphone setups. Instead of merely repeating the full AEC–scheme, the updates for the inverse correlation matrix and the updates of the correction correlation vectors can be shared among different microphone channels. We have now derived an algorithm that is an exact frequency domain version of the original affine projection algorithm. If QR–updating is used to keep track of the correlation matrix, exponential weighting can be used to incorporate regularization. But also FTF–type algorithms can be used to this end, and than also explicit regularisation can be used. The fact that no approximations are made, is — referring to the results in the previous chapter — clearly an advantage compared to FAP and BEFAP. 4.2.3 Algorithm specification An algorithm description of BEAPA can be found in Algorithm 11. We let u = [x(1), x(2), ...]T be the input signal, and v = [d(1), d(2), ...]T the desired signal, in both of which the order of the samples is different from the definitions of x and d. A right to left arrow above a vector flips the order of the components. 
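To make the cost statement of section 4.2.2 concrete, the per–sample flop count of the error–vector corrections can be tallied as below. This is only a rough count following the expressions quoted in the text, not the complete complexity formula (4.9); the function name is illustrative.

```python
def correction_flops_per_sample(P, N2):
    """Average flops per sample spent on the error-vector corrections in BEAPA,
    as stated in section 4.2.2."""
    first_component = (P + N2 - 1) * (P + N2) / N2  # one 'large' vector product, amortized over the block
    remaining = (P - 1) * (P + 1)                   # P-1 small vector products of size P+1
    return first_component + remaining

# for the chapter's example P = 3, N2 = 128 this gives roughly 141 flops per sample,
# far less than computing all P components straightforwardly with filters of length N = 800
print(correction_flops_per_sample(3, 128))
```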
4.3 Sparse Block Exact APA In this section a sparse–equations version of Block Exact APA (Sparse–Block Exact APA) is derived where the ’sparse equations’ technique of section 3.2 is used for regularization. The complexity of the new algorithm is 80 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC Algorithm 11 Block Exact Affine Projection Algorithm for j = 1 to N2 + P rj1 = (u|kk−L+1 )T u|k−j−1 k−j−L endfor for α = 2 to P for j = 1 to 1 + P k−(α−1) rjα = (u|k−L−(α−1)+1 )T u|k−j−1 k−j−L endfor endfor loop for i = 0 to N2 − 1 if (i modulo N1 == 0) <fill next part of y with convolution of next block of N1 samples from u with z> endif ←−−−−−−−−−−−−− ←−−−−−−−−−−−−−−− k−L+i+1 r1 = r1 + uk+i+1 u|k+i+1 k+i+1−N2 −P +1 − uk−L+i+1 u|k−L+i+1−N2 −P +1 for α = 2 to P rα = ←−−−−−−−−−−−−− ←−−−−−−−−−−−−−−− α r + uk+i−(α−1)+1 u|k+i+1 − uk−L+i−(α−1)+1 u|k−L+i+1 k+i+1−(1+P )+1 k−L+i+1−(1+P )+1 endfor +i−1 T 1 i+P A1k+i+1+1 = (s|P ) r |2 1 ek+i+1 = dk+i+1 − yk+i+1 − A1k+i+1+1 E1 = ek+i+1 for α = 2 to P α−1 D T α 1+P Aα k+i+1+1 = Ak+i+1 + (s ) r |2 Eα = vk+i−α+1 − y(k + i − α + 1) − Aα k+i+1+1 endfor −1 <update S , the P × P inverse covariance matrix> e = S −1 E g 0 µe g s= + N2 +P D−2 0N2 −1 s|1 endfor +N2 −1 +N2 +1 z = z+<convolution of s|P with u|k−P P k−P +N2 +1−L+1+1 > k = k + N2 endloop 6M1 log2 M1 − 7M1 − 31 + 6N2 + 13P + 2P D − 4 + 10P 2 N1 P 2 D2 − P D 6M2 log2 M2 − 7M2 − 31 + P 2D − D + . N2 N2 (4.20) A typical example (see section 4.2) is P = 3, N1 = N2 = 128, N = 800, D = 3 or 5 leading to 1690 or 1718 flops per sample. 81 4.3. SPARSE BLOCK EXACT APA 4.3.1 Derivation Since the Sparse BEAPA algorithm is derived in a manner very similar to BEAPA, we will only briefly state the derivation. The update equations are : bPT (k + i)w(k b (k + i) = X b + i − 1), y b + i) − y b b (k + i), e(k + i) = d(k b T (k + i)X bP (k + i))−1 b b(k + i) = (X g e(k + i), P b + i − 1) = w(k b − 1) + w(k i X j=1 bP (k + i − j)b µX g(k + i − j). (4.21) We can rewrite b +i−1) = w(k b −1)+ w(k i+P D−1 X j=1 sb(k +i−1)j x(k +i−j)− PX D−1 j=1 sbj (k −1)x(k −j) A recursion for ŝ(k) can be written : b s(k) = b s(k + i − 1) = 0 D−1 b s(k − 1)|P 1 0 b s(k + i − 2) +µ + µgb0 (k), gb0 (k + i − 1) 0i−1 with gb1 (k + i − 1) 0D−1 gb2 (k + i − 1) 0D−1 .. . 0 gb (k + i − 1) = gbP (k + i − 1) 0D−1 . (4.22) , (4.23) 82 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC The filter outputs ybα (k + i) can now be written as ybα (k + i) b + i − 1) = xT (k + i − (α − 1)D)w(k T b − 1) + = x (k + i − (α − 1)D)w(k i+P D−1 X j=1 PX D−1 j=1 sbj (k + i − 1)xT (k + i − (α − 1)D)x(k + i − j) − sbj (k − 1)xT (k + i − (α − 1)D)x(k − j) b − 1) + = xT (k + i)w(k PX D−1 j=1 i+P D−1 X j=1 sbj (k + i − 1)b rjα (k + i) − α sbj (k − 1)b ri+j (k + i). The correlations are defined as rbjα (k + i) ≡ xT (k + i − (α − 1)D)x(k + i − j). The modified filter vector is b b − 1) − z(k − 1) = w(k PX D−1 j=1 sbj (k − 1)x(k − j). Then the filter output can be written as ybα (k + i) = xT (k + i − (α − 1)D)b z(k − 1) + b sT (k + i − 1)b rα (k + i), (4.24) in which both b s(k + i − 1) and b rα (k + i) are vectors of length i + P D − 1. A recursion for the filter vector is D+N2 −1 b z(k + N2 − 1) = b z(k − 1) + X(k + N2 − P D)b s(k + N2 − 1)|P . (4.25) PD The matrix X(k) has been defined in 4.6. Note that it contains all input vectors, not only one out of D. Again, the matrix–vector product can be calculated in the frequency domain with fast convolution techniques (block size N2 ). 
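The structure of the sparse–equations data matrix implied by the correlations r̂_j^α (input vectors spaced D samples apart) can be sketched as follows; the helper name and the buffering of the scalar input in a plain array are illustrative assumptions.

```python
import numpy as np

def sparse_data_matrix(u, k, N, P, D):
    """Build the P columns of the sparse-equations data matrix,
    x(k), x(k - D), ..., x(k - (P-1)D), each of length N,
    from the scalar input signal u (u[k] is the sample at time k)."""
    cols = [u[k - p * D - N + 1: k - p * D + 1][::-1] for p in range(P)]
    return np.stack(cols, axis=1)      # shape (N, P)
```

With such a matrix, one exact sparse–APA step has the same form as the reference apa_step sketch given after (4.10)–(4.11), only with the equations taken D samples apart.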
4.3.2 Complexity reduction Just like in the BEAPA–algorithm, a recursion exists for the correlation vectors used in the corrections in 4.24. Take e.g. D = 2 : b r1 (k + i − 2) = (4.26) 83 4.3. SPARSE BLOCK EXACT APA x(k + i − 2)x(k + i − 3) + x(k + i − 3)x(k + i − 4) + . . . + x(k + i − L − 1)x(k + i − L − 2) x(k + i − 2)x(k + i − 4) + x(k + i − 3)x(k + i − 5) + . . . + x(k + i − L − 1)x(k + i − L − 3) . . . x(k + i − 2)x(k − P D + 1) + x(k + i − 3)x(k − P D) + . . . + x(k + i − L − 1)x(k − L − P D + 2) b r2 (k + i) = (4.27) x(k + i − D)x(k + i − 1) + x(k + i − D − 1)x(k + i − 2) + . . . + x(k + i − D − L + 1)x(k + i − L) x(k + i − D)x(k + i − 2) + x(k + i − D − 1)x(k + i − 3) + . . . + x(k + i − D − L + 1)x(k + i − L − 1) x(k + i − D)x(k + i − 3) + x(k + i − D − 1)x(k + i − 4) + . . . + x(k + i − D − L + 1)x(k + i − L − 2) . . . x(k + i − D)x(k − P D + 1)) + . . . + x(k + i − D − L + 1)x(k − L − P D + 2) For this example, one can see that the third to the last component of 4.27, which is the autocorrelation vector needed to calculate the second component of the residual vector at time k + i, have already been calculated 2 time steps before as the first rows of 4.26 for the (smaller) autocorrelation vector needed to calculate the first component of the residual vector at time k + i − 2. Generalizing the above example, we can say that all the components starting from component with index D + 1 from b rα (k + i) α−1 are already available in b r (k + i − D). To interpret this, one has to bear in mind that the vectors b rα (k + i) grow in length with i. Writing out also b r1 (k + i − 1) and 1 α b r (k + i) would show that the components of b r (k + i) with indices from 1 to D are also already calculated in previous steps. This can be summarized as α−1 rbβα (k + i) = rb(D+1)−(β−1) (k + i − D) for β ≤ D + 1, α > 1, (4.28) α−1 rbβα (k + i) = rbβ−D (k + i − D) for β ≥ D + 1, α > 1. (4.29) Instead of using N2 + P D delay lines of length D, we can again reduce this to P scalar delay lines of length D. For b s(k + i) only the first P D elements change after the shift operation in each time step : b s(k + i − 1) 0P D =b s(k + i − 1) − 0D b s(k + i − D − 1) . (4.30) The fixed length of b s(k + i − 1) is D + P D − 1. For α > 1, equations 4.29 and 4.30 lead to 84 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC b sT (k + i − 1)b rα (k + i) T T 0D b s(k + i − 1) α b b = r (k + i) + rα (k + i) b s(k + i − D − 1) 0P D α rbD+1 (k + i) α rbD+2 (k + i) T =b s (k + i − 1)b rα (k + i) + b sT (k + i − D − 1) .. . α rbD+i+P D (k + i) T =b s (k + i − 1)b rα (k + i) + α−1 rb1 (k + i − D) rb2α−1 (k + i − D) b sT (k + i − D − 1) .. . α−1 rbi+P D (k + i − D) (4.31) . The vector b rα (k + i) is formed from the first D + P D − 1 components of b rα (k + i). The last term of equation 4.31 (a scalar) has already been calculated in step k + i − D. Hence in each step, one needs to calculate a ’large’ vector product for the correction to the first component of the error vector (with the same size as in the BEFAP– algorithm), and P − 1 small vector products (size P D + D). For the calculations of the corrections for the first error vector component, this gives an average of (P D + N2 − 1)(P D + N2 ) N2 flops per sample, where N2 is the block size in the BEFAP–algorithm. To this, (P D + D) flops per sample for each of the P − 1 remaining components of the error vector must be added. 4.3.3 Algorithm specification A complete specification is given in Algorithm 12. 
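The reuse property (4.28)–(4.29) follows directly from the definition of the sparse correlations and can be checked numerically with a few lines; the helper names and the signal length below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4000)          # arbitrary scalar input signal
N, D = 64, 3                           # illustrative filter length and equation spacing

def xvec(k):                           # x(k) = [u(k), ..., u(k - N + 1)]^T
    return u[k - N + 1: k + 1][::-1]

def r_hat(alpha, j, k):                # definition of the sparse correlations
    return xvec(k - (alpha - 1) * D) @ xvec(k - j)

# check (4.29): r_hat^alpha_beta(k) equals r_hat^{alpha-1}_{beta-D}(k - D) for beta >= D + 1
k, alpha, beta = 2000, 3, D + 4
print(np.isclose(r_hat(alpha, beta, k), r_hat(alpha - 1, beta - D, k - D)))   # True
```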
Again u = [x(1), x(2), ...]T is the input signal, and v = [d(1), d(2), ...]T the desired signal. These definitions differ from the definitions of x and d. A right to left arrow above a vector flips the order of the components. 4.4. CONCLUSION 85 Algorithm 12 Sparse Block Exact APA for j = 1 to N2 + P D rj1 = (u|kk−L+1 )T u|k−j−1 k−j−L endfor for α = 2 to P for j = 1 to D + P D k−(α−1)D rjα = (u|k−L−(α−1)D+1 )T u|k−j−1 k−j−L endfor endfor loop for i = 0 to N2 − 1 if (i modulo N1 == 0) <fill next part of y with convolution of next block of N1 samples from u with z> endif ←−−−−−−−−−−−−−− ←−−−−−−−−−−−−−−−− k−L+i+1 r1 = r1 + uk+i+1 u|k+i+1 k+i+1−N2 −P D+1 − uk−L+i+1 u|k−L+i+1−N2 −P D+1 for α = 2 to P ←−−−−−−−−−−−−−− rα = rα + uk+i−(α−1)D+1 u|k+i+1 − k+i+1−(D+P D)+1 ←−−−−−−−−−−−−−−−−− k−L+i+1 uk−L+i−(α−1)D+1 u|k−L+i+1−(D+P D)+1 endfor D+i−1 T 1 i+P D A1k+i+1+D = (s|P ) r |2 1 ek+i+1 = dk+i+1 − yk+i+1 − A1k+i+1+D E1 = ek+i+1 for α = 2 to P α−1 D T α D+P D Aα k+i+1+D = Ak+i+1 + (s ) r |2 Eα = vk+i−αD+1 − y(k + i − αD + 1) − Aα k+i+1+D endfor −1 <update S , the P × P inverse covariance matrix> e> <spread the result of S −1 E into g for m = D downto 2 0 µe g + sm = D−2 0 sm−1 |D+P D−1 1 endfor µe g s1 = 0D−1 0 µe g s= + N2 +P D−2 0N2 −1 s|1 endfor D+N2 −1 D+N2 +1 z = z+<convolution of s|P with u|k−P PD k−P D+N2 +1−L+1+1 > k = k + N2 endloop 4.4 Conclusion We have derived a block exact frequency domain version of the affine projection algorithm, named Block Exact APA (BEAPA). This algorithm has a complexity that is 86 CHAPTER 4. BLOCK EXACT APA (BEAPA) FOR AEC comparable with the complexity of a block exact frequency domain version of fast affine projection (namely BEFAP), while it does not use the approximations that are present in (BE)FAP. It has the advantage that the convergence characteristics of the original affine projection algorithm are maintained when regularization is applied, while this is not the case when FAP–based fast versions of APA are used. This algorithm has also been extended to allow for the ’sparse equations’ technique for regularization to be used. This is a technique that regularizes the affine projection algorithm if it is used with signals that have a large autocorrelation only for a small lag (e.g. speech). It can be used as a stand alone technique for regularization, if a ’voice activity detection device’ is present that can prevent the inverse correlation matrix to become infinitely large when no far end signal is present, as it is the case in each echo canceller. Chapter 5 QRD–RLS based ANC While the previous chapters were focussed on reference–based noise reduction (AEC) (which means a reference signal for the disturbances was available, namely the loudspeaker signal), in the next chapters we will concentrate on reference–less noise reduction (ANC). In this chapter we will derive an MMSE–optimal unconstrained filtering technique with a complexity that is an order of magnitude smaller than the complexity of existing noise cancellation algorithms based upon this technique. Performance will be kept at the same level though. The new algorithm is based upon a QRD–RLS adaptive filter. While conventional adaptive filtering algorithms have a ’desired signal’–input, our algorithm does not require this desired signal (which is unknown for noise reduction applications). In the next chapter we will — by thoroughly modifying the basic equations and by employing the fast QRD–LSL algorithm — reduce the complexity even by another order of magnitude. 
This chapter is organized as follows. In sections 5.2 and 5.3 we will review unconstrained optimal filtering based ANC. Then we introduce our novel approach based upon recursive QRD–based optimal filtering in section 5.3. In section 5.4 we introduce an algorithm that provides a trade–off parameter with which one can tune the system so that more noise reduction is obtained in exchange for some signal distortion. Finally, complexity figures and simulation results are given in sections 5.5 and 5.6. 87 88 5.1 CHAPTER 5. QRD–RLS BASED ANC Introduction In teleconferencing, hands–free telephony or voice controlled systems, acoustic noise cancellation techniques are used to reduce the effect of unwanted disturbances (e.g. car noise, computer noise, background speakers). Single microphone approaches typically exploit the differences in the spectral content of the noise signal(s) and the speech signal(s) to enhance the input signal. Since a speech signal is highly non–stationary, this may result in a rapidly changing filter being applied to the noisy speech signal. A residual noise signal with continuously changing characteristics or even ’musical noise’ will typically appear at the output. A classic example of this class of algorithms is spectral subtraction [16]. Multi–microphone techniques can additionally take into account spatial properties of the noise and speech sources. In general an adaptive signal processing technique is required since the room characteristics indeed change even with the slightest change in the geometry of the environment. In literature various adaptive multimicrophone ANC techniques have been described, all of them having their advantages and disadvantages. Griffiths–Jim beamforming [60] is a constrained optimal filtering method that aims at adaptively steering a null in the direction of the noise source(s) (or ’jammer(s)’), while keeping a distortionless response in the direction of the speech source. Unconstrained optimal filtering [12][13] is an alternative approach that also takes into account both spectral and spatial information. Unlike Griffiths–Jim beamforming it does not rely on a priori information and hence posesses improved robustness [14]. A speech–noise detection algorithm is needed and crucial for proper operation, but will not be further investigated here, as it will be assumed that a perfect speech detection signal is available. In chapter 1 references to methods for speech/noise detection are given. We note that when the algorithm classifies a ’speech’ period as a ’noise only’ period (for longer time periods), this will result in signal distortion, since signal then ’leaks’ into the noise correlation matrix. This is probably a worse situation than when a noise–period is classified as speech. In that case only the estimation of the noise characteristics is not done at that time, and the statistics of the speech signal (spatial characteristics) are ’forgotten’, but this is not really a problem when the misclassification only occurs during a short period. In an unconstrained optimal filtering approach, the microphone signals are fed to an adaptive filter. The optimal filter attempts to use all available information (also reflections coming from directions other than the speech source direction) in order to optimally reconstruct the signal of interest. 
This effectively means that the spatial pattern of the filter will resemble a beamforming pattern if the reverberation in the room is low, but that the filter will perform better than conventional beamformers under higher reverberation conditions [12]. 89 5.2. UNCONSTRAINED OPTIMAL FILTERING BASED ANC In [12][13][15] the unconstrained optimal filtering problem was solved by means of a GSVD (Generalised Singular Value Decomposition)–approach, while in this chapter we will describe a QRD–based optimal filtering algorithm. While the performance remains roughly the same, the QR–decomposition based algorithm is significantly less complex than the GSVD–based algorithm. The GSVD– approach has a complexity of O(M 3 N 3 ) where M is the number of microphones and N is the number of filter taps per microphone channel. A reduced complexity approximation is possible for the GSVD–approach (based on GSVD–tracking), leading to O(27.5M 2 N 2 ) [13] . The QRD–based approach that we will derive in this paper lowers the complexity to O(3.5M 2 N 2 ) while the performance is equal to that of the initial GSVD–approach, and no approximation whatsoever is employed. Noise Noise h1 h2 h Noise h 4 3 Noise x1 w1 x2 w2 x3 w3 x4 w4 + ^ d Speech component hereof is desired output signal d (unknown) Original speech signal s Figure 5.1: Adaptive optimal filtering in the acoustic noise cancellation context. 5.2 Unconstrained optimal filtering based ANC A typical noise cancellation setup is shown schematically in Figure 5.1 for an array with 4 microphones. A speaker’s voice is picked up by the microphone array, together with noise stemming from sources for which no reference signal is available. Examples are computer fans, air conditioning, other people talking to each other in the background. The absence of a reference signal is the main difference between the ANC techniques we will discuss in this part of this text, and the AEC techniques in chapters 3 and 4. The speech signal s(k) in figure 5.1 is obviously unknown. If we would aim at designing a filter that optimally reconstructs s(k) as a desired signal, then this filter would not only have to cancel the noise in the microphone signals, but it would also have to model the inverse of the acoustic impulse response from the position of the speech source to the position of the microphones. We want to avoid this, since dereverbera- 90 CHAPTER 5. QRD–RLS BASED ANC tion is a different problem that requires different techniques. Hence, we will not use the speech signal itself as a desired signal, but rather the speech component in one (or each) of the microphone signals, which obviously is unknown too. The speech component in the i’th microphone at time k is O di (k) = hi (k) s(k) i = 1 . . . M, where M is the number of microphones, s(k) is the speech signal and hi (k) represents N the room impulse response path from the speech source to microphone i, and is the convolution symbol. The i’th microphone signal is xi (k) = di (k) + vi (k) i = 1 . . . M, where vi (k) is the noise component (sum of the contributions of all noise sources at microphone i). We define the filter input vector as in (2.4), which is repeated for convenience here x(k) = x1 (k) x2 (k) .. . xM (k) x1 (k) = x1 (k) x1 (k − 1) .. . x1 (k − N + 1) . The noise vector v(k) and the speech component signal vector d(k) are defined in a similar way, with x(k) = d(k) + v(k). The following assumptions are made • The noise signal is uncorrelated with the speech signal. 
This results in ε{x(k)xT (k)} = ε{v(k)vT (k)} + ε{cross terms} +ε{d(k)dT (k)} | {z } =0 T = ε{v(k)v (k)} + ε{d(k)dT (k)} ε{d(k)d (k)} = ε{x(k)xT (k)} − ε{v(k)vT (k)}. T Here ε{·} is the expectation operator. • The noise signal is stationary as compared to the speech signal (by which we mean that its statistics change more slowly). This assumption allows us to estimate ε{v(k)vT (k)} during periods in which only noise is present, i.e. ε{v(k)vT (k)} ∼ = ε{v(k − ∆)vT (k − ∆)}with x(k) = v(k) + 0 during noise–only periods. The unconstrained optimal filtering (Wiener filtering) problem is then given as n 2 o (5.1) min ε xT (k)Wwf (k) − dT (k)2 , Wwf (k) 91 5.3. QRD–BASED ALGORITHM where ε{•} is the expected value operator. Note that for the time being we compute the optimal filter to estimate the speech in all (delayed) microphone signals (cfr. definition of d(k)). We can now write the Wiener solution for the optimal filtering problem with x(k) the filter input and d(k) the (unknown) desired filter output is then given as [30] Wwf (k) = (ε{x(k)xT (k)})−1 ε{x(k)dT (k)} = (ε{x(k)xT (k)})−1 ε{(d(k) + v(k))dT (k)} = (ε{x(k)xT (k)})−1 ε{d(k)dT (k)} = (ε{x(k)xT (k)})−1 (ε{x(k)xT (k)} − ε{v(k)vT (k)}). (5.2) If all statistical quantities in the above formula were available, Wwf (k) could straightforwardly be computed with O(M 3 N 3 ) complexity. Each column of Wwf (k) provides the optimal M N –taps filter for optimally estimating the corresponding element of d(k) from x(k), i.e. d̂T (k) = xT (k)Wwf (k). (5.3) In [13] a GSVD–approach to this optimal filtering is described. The GSVD–approach is based upon the joint diagonalisation ε{x(k)xT (k)} = E(k)diag{σi2 (k)}E T (k) ε{v(k)vT (k)} = E(k)diag{ηi2 (k)}E T (k), (5.4) which is then actually calculated by means of a GSVD–decomposition of the data matrices (see also section 5.3). From 5.4 we get Wwf (k) = E −T (k)diag{ σi2 (k) − ηi2 (k) T }E (k). σi2 (k) In the GSVD–algorithm, only afterwards one column of Wwf is picked to serve as a filter vector. A GSVD–approach would have a O(M 3 N 3 ) complexity, but in practice the GSVD solution is tracked or updated (this involves an approximation), which means that the filter can be tracked in O(27.5M 2 N 2 ) flops per sample. 5.3 QRD–based algorithm In this chapter we will present an alternative QRD–updating based approach that leads to comparable performance (or even improved performance since it does not need an SVD–tracking approximation to reduce complexity), but at a significantly lower cost. In the QRD–approach we can select one single entry of dT (k) that we want to estimate, before we compute the optimal filter. The right hand side part of the corresponding LS–estimation problem will then be a vector instead of a matrix. This is the main reason for the dramatical complexity reduction, but besides this of course QRD– updating in itself is cheaper than GSVD–updating. In order to maintain the parallel 92 CHAPTER 5. QRD–RLS BASED ANC between our approach and the GSVD–procedure, we will still consider the full d(k)– vector throughout the derivation, keeping in mind that for a practical implementation, one would select only one element of it. As we want to track any changes in the acoustic environment, we will introduce a weighting in order to reduce the impact of the contributions from the past. Let λs denote the forgetting factor for the speech+noise data, which can be different from λn , the forgetting factor for the noise–only data. A speech/noise detection device will be necessary to operate the algorithm. 
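For reference, the direct (non–recursive) evaluation of the Wiener solution (5.2), restricted to a single column as suggested above, could look as follows. The function name is an illustrative assumption, and the optional parameter mu anticipates the trade–off filter of section 5.4; the QRD–based algorithm derived below never forms these matrices explicitly.

```python
import numpy as np

def wiener_filter_column(Rxx, Rvv, col=0, mu=0.0):
    """One column of W_wf = Rxx^{-1} (Rxx - Rvv), computed directly from
    estimates of the speech+noise and noise-only correlation matrices.
    With mu = 0 this is the plain Wiener solution (5.2); mu > 0 gives the
    regularized trade-off filter (5.11) introduced in section 5.4."""
    rhs = (Rxx - Rvv)[:, col]                       # select which element of d(k) to estimate
    return np.linalg.solve(Rxx + mu**2 * Rvv, rhs)  # MN-taps filter for that column

# the speech estimate for the selected (delayed) microphone signal is then  d_hat = x @ w
```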
Since the noise is assumed to be stationary as compared to the speech contribution, one will often make 0 λs < λn < 1. Our scheme will be based on storing and updating an upper triangular matrix R(k), such that RT (k)R(k) = X T (k)X(k), where we want X T (k)X(k) to be an estimate for ε{x(k)xT (k)}. This is realized by X T (k + 1)X(k + 1) = λ2s X T (k)X(k) + (1 − λ2s )x(k + 1)xT (k + 1). (5.5) Note that this is a slightly different weighting scheme from the one that is explained in equation (2.15), the difference being merely an overall rescaling with (1 − λ2s ). The noise correlation matrix estimate is defined as V T (k + 1)V (k + 1) = λ2n V T (k)V (k) + (1 − λ2n )v(k + 1)vT (k + 1). (5.6) The optimal filtering solution is then obtained as Wqr (k) = (RT (k)R(k))−1 (RT (k)R(k) − V T (k)V (k)) | {z } ≡P (k) Wqr (k) = I − R−1 (k)R−T (k)P (k), where R(k) is the Cholesky factor of X(k) and I is the identity matrix. Due to the second assumption in section 5.2 (namely that the noise is stationary), P (k) can be kept fixed during speech + noise periods and updated (based on formula (5.6)) during noise–only periods. RT (k)R(k) is fixed during noise only periods and updated (based on formula (5.5)) during speech+noise periods. Note that the computed Wqr (k) corresponds to the least squares estimation problem 2 min kD(k) − X(k)Wqr (k)k2 , Wqr (k) where however D(k) = X(k) − V (k) is unknown. Hence, Wqr (k) is a matrix of which the columns are filters that reduce the noise components in the microphone signals in an optimal way. It is clear that WqrN (k) = I − Wqr (k) then provides a set of filters that optimally estimate the noise components in the microphone signals with WqrN (k) = I − Wqr (k) = R−1 (k) R−T (k)P (k) . | {z } ≡B(k) 5.3. QRD–BASED ALGORITHM 93 In the procedure described here, we keep track of both R(k) and B(k) so that at any time WqrN (k) can be computed by backsubstitution in R(k)WqrN (k) = B(k). (5.7) The only storage required is for the matrix R(k) ∈ <M N ×M N and for B(k) ∈ <M N ×M N . In fact, only one column of B(k) has to be stored and updated (cfr. supra), thus providing a signal or noise estimate for the corresponding microphone signal. There are two modes in which the different variables R(k) and B(k) have to be updated, namely speech+noise–mode, and noise only–mode. 5.3.1 Speech+noise – mode Whenever a signal segment is identified as a speech+noise–segment, P (k) is not updated (second assumption), but R(k) needs to be updated. The update formula for R(k) is (compare to (2.16)) p T 0 1 − λ2s xT (k) = Q (k + 1) , R(k + 1) λs R(k) where R(k +1) is again upper triangular1 . As explained in chapter 2, this update gives T both the new upper triangular matrix R(k + 1) and the orthogonal matrix Q (k + 1) containing the necessary rotations to obtain the update. Updating R(k) also implies a change in the stored B(k) = R−T (k)P (k). In order to derive this update, we need an expression for the update of R−1 (k). It is well known2 that the same rotations used to updat R(k) can also be used to update R−T (k) : 0 T ∗ Q (k + 1) = , 1 −T (k) R−T (k + 1) λs R with ∗ a don’t care entry. Hence we have ∗ ∗ = P (k + 1) B(k + 1) R−T (k + 1) ∗ = P (k) R−T (k + 1) 0 T = Q (k + 1) P (k) 1 −T (k) λ R s 0 T = Q (k + 1) . 1 B(k) λs 1 Q(k) 2 This = Q(k)Q(k − 1)...Q(0) does not need to be stored. T x̃ (k) is easily shown starting from 0 R−1 (k) =I R(k) 94 CHAPTER 5. 
QRD–RLS BASED ANC Note that B(k) is weighted with λ1s which is different from the standard exponential weighting in the right hand side of QRD–based adaptive filtering. The complete update can be written in one single matrix update equation : 0 rT (k + 1) R(k + 1) B(k + 1) T Q (k + 1) p 1 − λ2s xT (k + 1) λs R(k) 0 1 λs B(k) = . (5.8) The least squares solution WqrN (k + 1) can now be computed by backsubstitution (equation 5.7), but we will show later on that (using residual extraction) an estimate of the noise can be calculated directly from r(k +1). A signal flow graph of this updating procedure is given in Figure 5.2. x2(k+1) x1(k+1) x2(k) x1(k) R11 R12 R13 0 0 0 r (k+1) 1 r2(k+1) r (k+1) 3 R14 0 R22 R23 R24 R(k) 0 R33 memory cell (delay) 1/λ x R34 0 R44 0 memory cell (delay) λx B(k) Figure 5.2: Updating scheme for signal+noise mode. On the top left new input vectors enter (2 channels, and 2 taps per channel). Rotations are calculated and fed to the right hand side which is updated with 0’s as input. 5.3. QRD–BASED ALGORITHM 5.3.2 95 Noise only–mode. In the noise–only case, one has to update B(k) = R−T (k)P (k) = R−T (k)V T (k)V (k), while R(k) is obviously kept fixed. From equation (5.6) and the fact that in noise–only mode R(k + 1) = R(k), we find that p p B(k + 1) = λ2n B(k) + (R−T (k + 1) 1 − λ2n v(k + 1)) 1 − λ2n vT (k + 1)). p Given R(k + 1), we can compute (R−T (k + 1) 1 − λ2n v(k + 1)) by a backsubstitution. By using an intermediate vector a(k + 1) : p RT (k + 1)a(k + 1) = 1 − λ2n v(k + 1). p A simple multiplication a(k + 1) 1 − λ2n vT (k + 1) now gives p the update for all columns of B(k + 1), i.e. B(k + 1) = λ2n B(k) + a(k + 1) 1 − λ2n vT (k + 1). As already mentioned, R(k) is not updated in this mode, so in figure 5.2 only the framed black boxes (memory cells in the right hand part) are substituted with the corresponding elements of B(k + 1). Note again that, while in the GSVD–based method, all columns of Wgsvd (k + 1) are calculated, and afterwards one of them is selected (arbitrarily) to provide one specific speech signal estimate, the QRD–based method allows one to choose one column (signal) on beforehand, and do all computations for only that one column. 5.3.3 Residual extraction From (2.24) and (5.8) it can be shown that if x(k + 1) belongs to a signal+noise– period, the estimate for the noise components v̂(k + 1) in the microphone signals can be written as v̂T (k + 1) = xT (k + 1)WqrN (k + 1) p −(0 − 1 − λ2s xT (k + 1)WqrN (k + 1)) p 1 − λ2s QM N cos θi (k + 1)rT (k + 1) p = − i=1 , 1 − λ2s which means that an estimate of the noise component is obtained as a least squares residual with a 0 right hand side input. This is exactly the type of right hand side input applied in speech+noise mode updates (section 5.3.1). To obtain the signal estimates d̂(k+1), the noise estimates then have to be subtracted from the reference microphone 96 CHAPTER 5. QRD–RLS BASED ANC signal d̂T (k + 1) = xT (k + 1)(I − WqrN (k + 1)) = xT (k + 1) − v̂T (k + 1) QM N cos θi (k + 1)rT (k + 1) p = xT (k + 1) + i=1 . 1 − λ2s (5.9) x2(k+1) x1(k+1) x2(k) x1(k) 0 0 0 1 0 R11 R12 R13 R14 0 0 R22 R23 R24 0 0 R33 R34 0 0 R44 0 memory cell (delay) noise estimates memory cell λx (delay) 1/λ x Figure 5.3: Signal flow graph for residual extraction. In this setting, the system does not generate any output during noise–only mode, since in the absence of an input vector for the left part of the signal flow graph 5.2 (see section 5.3.1), no rotation parameters are generated, so no residual extraction is possible. 
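A compact numerical sketch of the two update modes may be useful at this point. It uses a library QR factorization instead of an explicit sequence of Givens rotations, so it does not expose the rotation cosines needed for the residual extraction of (5.9); the function names and the NumPy-based formulation are assumptions, not the real–time scheme of Figure 5.2.

```python
import numpy as np

def speech_noise_update(R, B, x, lam_s):
    """Speech+noise-mode step of (5.8): re-triangularize the weighted, extended
    factor and carry the right hand side (weighted with 1/lam_s) along with the
    same orthogonal transformation."""
    lhs = np.vstack([np.sqrt(1.0 - lam_s**2) * x[None, :], lam_s * R])
    rhs = np.vstack([np.zeros((1, B.shape[1])), B / lam_s])
    Q, _ = np.linalg.qr(lhs, mode="complete")
    R_new = (Q.T @ lhs)[:-1, :]           # updated triangular factor R(k+1)
    T = Q.T @ rhs
    # T[-1, :] is the transformed right hand side row, i.e. r(k+1) of (5.8),
    # up to the sign conventions of the library QR factorization
    return R_new, T[:-1, :], T[-1, :]

def noise_only_update(R, B, v, lam_n):
    """Noise-only-mode step of section 5.3.2: R(k) stays fixed, B(k) is
    refreshed with a triangular solve followed by a rank-one update."""
    a = np.linalg.solve(R.T, np.sqrt(1.0 - lam_n**2) * v)  # a backsubstitution in a real implementation
    return R, lam_n**2 * B + np.outer(a, np.sqrt(1.0 - lam_n**2) * v)
```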
In several applications this would be required though. It is perceived as being disturbing when the output signal is exactly zero, often some ’comfort noise’ is preferred. Also if the voice activity detector can not be trusted, and if it does not detect a speech signal during a speech+noise segment, the output of the algorithm would remain zero. If we want to generate an output signal during segments that the voice activity detector identifies as noise–only segments, we can execute a residual extraction procedure as in the speech+noise–mode producing a priori error signals, be it without updating the 5.4. TRADING OFF NOISE REDUCTION VS. SIGNAL DISTORTION 97 R(k) and B(k) (’frozen mode’). 0 rT (k + 1) ∗ ∗ p 1 − λ2s vT (k + 1) Q (k + 1) λs R(k) T 0 1 λs B(k) = . This will of course increase the complexity in noise–only mode, but since the updates need not to be calculated completely (only the rotation parameters and the outputs), the extra complexity will be about half the complexity in speech+noise–mode. The end result will be that the complexity in noise–only mode will become about equal to the complexity in speech+noise mode, and that the maximum complexity of the algorithm does not rise. For real time processing, this maximum complexity is the most important. 5.3.4 Initialization The upper triangular matrix R(0) may be initialized with a small number η on its diagonal. This is required for the QRD–updating algorithm to start. This initialization corresponds to an initial estimation of the speech+noise covariance matrix equal to η 2 I (white noise with variance η 2 ). Due to the exponential weighting in the algorithm, the influence of the initialization will be negligable after a number of samples. 5.3.5 Algorithm description In this algorithm description, we choose to estimate the speech signal d1 (k) in the first microphone signal. This means that also the right hand side consists of only the first column b(k) of B(k). In Algorithm 13, an output signal is also generated during noise–only periods as described above. 5.4 Trading off noise reduction vs. signal distortion In many applications some distortion in the speech signal can be allowed, and hence it is possible to obtain ’more than optimal’ noise reduction in exchange for some signal distortion. We will introduce a parameter that can be used to tune this trade–off. This parameter will take the form of a regularization parameter in the Wiener filter equation. 98 CHAPTER 5. QRD–RLS BASED ANC Algorithm 13 QRD–RLS based ANC R = 0.0001I Loop : if speech+noise p 0 2 x (k + 1) = 1 − λs x(k+ 1) 0 r(k + 1) = R(k + 1) b(k + 1) 0T x (k + 1) 0 T Q (k + 1) 1 λs R(k) λs b(k) output = x1 (k + 1) + r(k+1) QM N i=1 √ cos θi (k+1) 1−λ2s if noise--only v(k + 1) = x(k + 1) Calculate u(k + 1) from RT (k + 1)u(k + 1) = v(k + 1) 2 2 b(k + 1) = λp n b(k) + x1 (k + 1)(1 − λn )u(k + 1) 0 2 x (k + 1) = 1 − λn x(k + 1) 0 r(k + 1) = ∗ ∗ 0T T x (k + 1) 0 Q (k + 1) 1 λs R(k) λs b(k) QM N i=1 cos θi (k+1) ˆ + 1)= x1 (k + 1) + r(k+1) √ d(k 1−λ2s 5.4. TRADING OFF NOISE REDUCTION VS. SIGNAL DISTORTION 5.4.1 99 Regularization In practice, an additional design parameter is often introduced to the unconstrained optimal filtering approach to obtain more noise reduction than achieved by the standard unconstrained optimal filtering scheme. The result will be an increase in signal distortion, but for a lot of applications this is not necessarily harmful. In [22], an alternative to the MMSE–criterium is derived. 
We use a similar, but slightly different criterium : 2 2 min (ε{xT (k)W̃qr (k) − d(k)} + µ ε{vT (k)W̃qr (k)} ). F W̃qr (k) (5.10) F The first term in the minimization criterium accounts for the signal distortion, the second one for the noise reduction. The parameter µ2 can be used to trade off noise reduction versus signal distortion. This leads to W̃qr (k) = (ε{x(k)xT (k)} + µ2 ε{v(k)vT (k)})−1 (ε{x(k)xT (k)} − ε{v(k)vT (k)}). The tradeoff parameter translates into a regularization term in the Wiener filter equation. In a deterministic setting, this leads to W̃qr (k) = (X T (k)X(k) + µ2 V T (k)V (k))−1 (X T (k)X(k) − V T (k)V (k)). (5.11) 5.4.2 Speech+noise mode We return to the QRD–framework, by defining R̃T (k)R̃(k) = X T (k)X(k) + µ2 V T (k)V (k). We will now track the Cholesky factor R̃(k) of X T (k)X(k) + µ2 V T (k)V (k) instead of the Cholesky factor R(k) of X T (k)X(k). Noting that X(k) ε [X(k) µV (k)]T = X T (k)X(k) + µ2 V T (k)V (k), µV (k) it is obvious that this can be done by applying two updates instead of one to the left hand side of Figure 5.3 in each time step. First, an update is done with the microphone input vector, as explained in section 5.3.1, and then a second update is done with µ times a noise input vector that we have stored in a noise buffer. This noise buffer consists of successive input vectors from previous noise–only periods. In section 5.3.2, RT (k) is needed to perform a backsubstitution step. Since in this case we only have access to R̃T (k), we have to rewrite equation (5.11) somewhat : 100 CHAPTER 5. QRD–RLS BASED ANC (RT (k)R(k) + µ2 V T (k)V (k)) W̃qr (k) = RT (k)R(k) − V T (k)V (k) | {z } R̃T (k)R̃(k) = (RT (k)R(k) + µ2 V T (k)V (k)) − V T (k)V (k) − µ2 V T (k)V (k) = R̃T (k)R̃(k) − (1 + µ2 )V T (k)V (k), ˜ −T (k)(V T (k)V (k)) . W̃qr (k) = I − (1 + µ2 )R̃−1 (k)R | {z } W̃qrN Written in the same form as (5.7), we obtain R̃(k)W̃qrN (k) = B̃(k). (5.12) So since we store R̃(k) instead of R(k), we now have to update B̃(k) = R̃(k)−T (1 + µ2 )(V (k)T V (k)). The full procedure in speech+noise mode is as follows : first weight R̃(k) with λs , p and then update it with 1 − λ2s x(k + 1) in order to obtain R̃0 (k). The rotation parameters that are generated by this update are used together with zeros applied to the top of the signal flow graph to update B̃(k) to B̃ 0 (k) : B̃ 0 (k) = R̃0 (k)−T (1 + µ2 )(V (k)T V (k)). This corresponds to the original scheme of section 5.3.1. We let this step generate a residual, and we use it as an output of the noise filter. p Then R̃0 (k) is updated to R̃(k + 1) by applying µ 1 − λ2s v(k + 1) to the left hand part of the signal flow graph, and the rotation parameters are again used to update B̃ 0 (k) to B̃ 00 (k) : B̃ 00 (k) = R̃−T (k + 1)(1 + µ2 )(V T (k)V (k)). The residual signal generated in this step should be discarded. It is possible to update the factor V T (k)V (k) during noise–only mode. In that case B̃(k + 1) = B̃ 00 (k). Another option consists in performing also these updates with noise vectors from the noise–buffer during speech+noise mode. The update from B̃ 00 (k) to B̃(k + 1) is performed as B̃(k + 1) = λ2s B˜00 (k) + (R̃−T (k + 1)(1 − λ2s )(1 + µ2 )v(k + 1))vT (k + 1)). Where v(k + 1) is taken from the noise buffer. This can again be calculated using a backsubstitution followed by a multiplication as explained in section 5.3.2. 101 5.5. COMPLEXITY 5.4.3 Noise–only mode The factor V T (k)V (k) in the right hand side of equation (5.12) can also be updated during noise–only periods. 
In that case, the algorithm proceeds exactly as in section 5.3.2 during noise–only mode. During noise–only mode, also the input vectors must be stored into the noise buffer3 . Algorithm 14 gives a complete specification. 5.5 Complexity In noise-only mode, the complexity for the unregularized algorithm (section 5.3) is (M N )2 + 3M N + M flops per sample if no output signal is generated during noise periods, or 3(M N )2 + 16M N + 2 if an output signal is generated during noise–only periods. In speech+noise mode, the number of flops per sample is 3.5(M N )2 + 15.5M N + M + 2. In these calculations, one flop is one addition or one multiplication. These figures apply when only one filter output is calculated. This can be compared to the complexity of a recursive version of the GSVD–based optimal filtering technique [12] [13], which is O(27.5(M N )2 ) flops per sample. For a typical setting of N = 20 and M = 5, we would obtain 36557 flops per sample for the QRD–based method as compared to 275000 flops per sample for the GSVD–based method, which amounts to an 8–fold complexity reduction4 . For the regularized algorithm of section 5.4, the complexity will be doubled during speech+noise mode (O(7(M N )2 ) compared to the unregularized optimal filtering scheme of section 5.3. The complexity during noise–only mode remains the same. The algorithms have been implemented in real time on a Linux PC (PIII, 1Ghz). For just–real time performance, this leads to a maximum of 3 channels with 10 filter taps per channel for the GSVD–based algorithm. The unregularized QRD–based algorithm 3 If memory would be too expensive, an alternative is to use white noise instead of buffered noise vectors. This will probably lead to more signal distortion, but experiments show that it is still a valid alternative. 4 If complexity would be prohibitive for some applications, the QRD–RLS based algorithm can be used to generate a noise estimate with relatively few filter taps. This estimate can then be fed to a second stage, similarly to [11]. 102 CHAPTER 5. QRD–RLS BASED ANC Algorithm 14 ˆ + 1) is the resulting speech signal QRD–based ANC with trade–off parameter. d(k estimate. R = 0.0001I Loop : if speech+noise p x0 (k + 1) = 1 − λ2s x(k + 1) 0 r(k + 1) T = Q (k + 1) 0 0 R̃ (k) b̃ (k) ˆ + 1) = x1 (k + 1) + d(k r(k+1) QM N i=1 √ x0T (k + 1) λs R̃(k) 1 λs 0 b̃(k) cos θi (k+1) 1−λ2 s v(k + 1) = next noise--vector from noise--buffer p 0 (k + 1) = µ 1 − λ2 v(k + 1) v s 0 r(k + 1) R̃(k + 1) b̃(k + 1) 0T v (k + 1) T = Q (k + 1) R̃0 (k) 0 b̃0 (k) R̃T (k + 1)u(k + 1) = v(k + 1), backsubstitution gives u(k + 1) (if no noise updates during noise--only) b(k + 1) = λ2s b(k) + v1 (k + 1)(1 − λ2s )(1 + µ2 )u(k + 1) (if no noise updates during noise--only) if noise--only Push input vector v(k + 1) = x(k + 1) in noise--buffer RT (k + 1)u(k + 1) = v(k + 1), backsubstitution gives u(k + 1) (if noise updates during noise--only) b(k + 1) = λ2n b(k) + x1 (k + 1)(1 + µ2 )(1 − λ2n )u(k + 1) (if noise updates during noise--only) p x0 (k + 1) = 1 − λ2s x(k + 1) 0T T x (k + 1) 0 0 r(k + 1) = Q (k + 1) 1 λs R(k) b(k) ∗ ∗ λs ˆ + 1) = x1 (k + 1) + d(k r(k+1) QM N i=1 √ cos θi (k+1) 1−λ2 s ! 5.6. SIMULATION RESULTS 103 we have proposed here, when implemented in the time domain, allows for 3 channels with 30 filter taps per channel. The theoretical complexity figures are confirmed : the filter lengths can be made three times longer than for the reference setup with the GSVD–based algorithm (this is indeed expected because of the quadratic complexity, 2 i.e. 
27.5 3.5 ≈ 3 ), while the performance is the same. A subband implementation (16 subbands, 12–fold downsampling) of the QRD–based algorithm allows to use 15 taps per subband in 3 channels, which comes down to an equivalent of 12.3.15 = 540 filter taps. 5.6 Simulation results Theoretically, the GSVD–based approach and the QRD–based approach solve the same problem. We will show some subtle differences between the practical implementations of the GSVD– and the QRD–based techniques. The conclusion will be that also in a practical implementation the behaviour of the GSVD– and the QRD– based algorithm is roughly the same, and hence for the performance results for the QRD–based technique, we can refer to the literature about the GSVD–based technique [14, 12]. Note that all covariance matrices in the above equations and algorithms should be positive definite. The clean speech signal typically exists in a subspace of the input space. Hence a number of eigenvalues may be zero in the difference ε{x(k)xT (k)} − ε{v(k)vT (k)}. A practical estimator however will never obtain exact zeroes for the eigenvalues, and may even produce negative values for the estimation of the eigenvalues of ε{x(k)xT (k)} − ε{v(k)vT (k)}. In the GSVD–approach, direct access to the singular values is possible, and the negative eigenvalues can be corrected to be zero. This is not possible in the QRD–based approach. This difference is most of all seen for short estimation windows. For longer estimation windows, the QRD– and GSVD–results become roughly equal. On the other hand, the GSVD–approach has to incorporate an approximation in order to achieve quadratic complexity. We will also show the influence of this approximation. The speech signal is a sentence that is repeated four times. Speech+noise versus noise– only periods were marked manually. Reverberation was added by a simulated acoustical environment (acoustic path of 1000 taps, sampling frequency 8000 Hz). The speech source is located at about 6◦ from broadside, at 2.8meters from the microphone array. The (white) noise source is located at about 54◦ from broadside at 2.2 meters from the array. The microphone array consists of 4 microphones, spaced 20 cm each, the filters have 40 taps per channel. The first column of W (k) is selected for signal estimation. During each utterance of the sentence, the volume decreases, so the SNR is not constant. Figure 5.4 shows the difference between the QRD–based method and the GSVD–based method for a short estimation window. After the first speech utterance, in the beginning of the noise–only period, the convergence is clearly visible. 104 CHAPTER 5. QRD–RLS BASED ANC The QRD–based method has less distortion because of the approximation used in the GSVD–approach [14], while the GSVD–based method obtains more noise reduction due to the ’corrected singular value’ estimates. Figure 5.5 compares both methods without applying the ’corrections’ to the singular values in the GSVD–method. In that case, the results are quite similar, and the QRD–based method performs slightly better (both concerning distortion and noise reduction) because of the tracking approximation in the calculation of the SVD. In Figure 5.6 the noise estimation window is made longer (λn = 0.99995), and the corrections are applied in the GSVD–based algorithm. The figure shows that in spite of this, the performance of both algorithms is almost the same. 105 5.6. 
Figure 5.4: Four utterances of the same sentence; the energy of the clean speech signal and of the noise signal at microphone 1 is plotted together with the QRD–output and the GSVD–output (simulation). The GSVD–result is better for this case (λn = 0.9997 and λs = 0.9997) than the QRD–result because the negative eigenvalues can be set to zero in the GSVD–method. As shown in the detail panel, the distortion is nevertheless smaller for the QRD–method.

Figure 5.5: Comparison between GSVD–based and QRD–based unconstrained optimal filtering when the correlation matrices are not corrected in the GSVD–approach; again λn = 0.9997 and λs = 0.9997. The cheaper QRD–method performs slightly better because of the approximation used in the GSVD–approach, so the performance of the QRD– and GSVD–algorithms can be considered 'almost equal'.

Figure 5.6: QRD–approach versus GSVD–approach with a longer estimation window: the difference between both algorithms vanishes, even when the eigenvalues are corrected in the GSVD–approach (λn = 0.99995 and λs = 0.9997).

Figure 5.7: When a trade–off (regularization) parameter is introduced, even more noise reduction can be achieved, in exchange for some signal distortion. The upper line is the algorithm output without the trade–off parameter, the lower line the output with a trade–off parameter µ = 2.

The result of introducing regularization (see section 5.4) is clearly shown in Figure 5.7. The upper line in the plot is the output energy of the original algorithm (without regularization), while the lower line shows the output energy when a regularization parameter µ = 2 is chosen. There is more noise reduction (as can be seen in the 'valleys' of the graph), while also some signal distortion is introduced (as can be seen at the peaks of the graph).

5.7 Conclusion

In this chapter, we have derived a new QRD–based algorithm for multichannel unconstrained optimal filtering with an "unknown" desired signal, and applied it to the ANC problem. The same basic problem is solved as in related algorithms that are mostly based upon singular value decompositions, and our approach results in at least equal performance; since the approximations of the GSVD–tracking scheme are not present in the QRD–based algorithm, the performance is often even better. The major advantage of the QRD–based optimal filtering technique is that its complexity is an order of magnitude lower than that of the (approximating) GSVD–based approaches. We have also introduced a trade–off parameter in the QRD–based technique that allows more noise reduction to be obtained in exchange for some signal distortion.
Chapter 6 Fast QRD–LSL–based ANC The QRD–based unconstrained optimal filtering ANC algorithm we have presented in the previous chapter allows for a complexity reduction of an order of magnitude compared to existing unconstrained optimal filtering approaches based on GSVD– computation and tracking. However, complexity still is quadratic in both the filter length and the number of channels, and since in typical applications the filter length is often a few tens–hundreds of taps, this complexity can still be prohibitive. In standard QRD–RLS adaptive filtering, the shift structure of the input signal is exploited in order to obtain an algorithm (QRD–LSL) that is linear in the filter length. In this chapter, we will show how we can also apply this to the QRD–based algorithm of chapter 5 too. This is not straightforwardly achieved though, since in the previous chapter’s algorithm access to the upper triangular cholesky factor was necessary during noise only periods in order to calculate the update of the right hand side. In a QRD–LSL–based algorithm, this matrix is not explicitly present anymore. We will propose a QRD–Least Squares Lattice (QRD–LSL) based unconstrained optimal filtering algorithm for ANC that obtains again the same performance as the GSVD– or QRD–RLS–approach (chapter 5) but now at a dramatically reduced complexity. As mentioned before, if M is the number of microphones, and N the number of filter taps applied to each microphone signal, then the GSVD–based approach has a complexity of O(M 3 N 3 ). An approximate GSVD–solution (which uses GSVD– tracking) still requires O(27.5M 2 N 2 ) flops per sample. The QRD–RLS based solution of chapter 5 reduces this complexity to O(3.5M 2 N 2 ). The algorithm presented in this chapter has a complexity of O(21M 2 N ). For typical parameter settings (N = 50, M = 2), this amounts to an up to 50–fold complexity reduction when compared to the approximative GSVD–solution, and a 8–fold complexity reduction compared to the QRD–RLS–based algorithm. Our algorithm is based on a 109 110 CHAPTER 6. FAST QRD–LSL–BASED ANC numerically stable fast implementation of the RLS–algorithm (QRD–LSL), applied to a reorganized version of the QRD–RLS–based algorithm of chapter 5. In section 6.1, we describe the data model that is used for this algorithm. Then a QRD–RLS algorithm which is a modified version of the algorithm in chapter 5 is derived in section 6.2, and this is worked into a QRD–LSL algorithm in section 6.3. In section 6.4, the transitions between modes are studied in detail, in section 6.5, a regularization parameter is introduced which — similarly to the regularization factor in the previous chapter — can be used to trade off noise reduction versus signal distortion. In section 6.6, complexity figures are given, and in 6.7 simulation results are described. Conclusions are given in section 6.8. 6.1 Preliminaries In order to show the analogy between the QRD– and GSVD– based methods, we will again derive the algorithms with a matrix W (k) of which the columns are the filter vectors and a vector d(k) of which the elements are the desired signals, but one should keep in mind that for a practical implementation only one column of the matrix has to be calculated. It turns out that the method described in chapter 5 can not be straightforwardly modified into a QRD–LSL fast implementation. By reorganizing the problem, we can indeed substitute a QRD–LSL algorithm. 
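To put the quoted complexity orders side by side, the short sketch below evaluates the leading–order flop counts for this chapter's typical setting. Only the leading terms with the constants quoted in the text are used, so the printed ratios are indicative only (the 50–fold and 8–fold figures quoted above account for terms that are dropped here); the function name is illustrative.

```python
def flops_per_sample(M, N):
    """Leading-order flops per sample quoted for the four ANC approaches
    (lower-order terms are dropped)."""
    return {
        "GSVD, exact":          float(M * N) ** 3,      # O(M^3 N^3)
        "GSVD, tracking":       27.5 * (M * N) ** 2,
        "QRD-RLS (chapter 5)":  3.5 * (M * N) ** 2,
        "QRD-LSL (chapter 6)":  21.0 * M ** 2 * N,
    }

# typical setting used in this chapter: M = 2 microphones, N = 50 taps per channel
for name, flops in flops_per_sample(2, 50).items():
    print(f"{name:22s} {flops:12.0f}")
```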
The fast RLS schemes that are available from literature are based upon the requirement that the input signal has a shift structure, which means that each input vector should be a shifted version of the previous input vector. A large number of computations can then be avoided, and ’re–used’ from previous time instants. As a result of the complexity reduction, the matrix R(k) is not explicitly available in the algorithm anymore. In the QRD–based algorithm of chapter 5, the right hand side noise covariance matrix is updated during noise–only periods (section 5.3.2), and the matrix R(k) is needed in order to do this, since a backsubstitution step is required. If we want to obtain a fast algorithm, we will have to come up with a way to update the noise correlation matrix without needing access to R(k). In order to derive a fast algorithm, we first have to reorder the contents of our input vectors. They are redefined as in (2.5), repeated here for convenience x1 (k) .. . xM (k) , x(k) = x1 (k − 1) .. . xM (k − N + 1) which clearly does not have an impact on the algorithms of chapter 5. 111 6.1. PRELIMINARIES Due to the second assumption in section 5.2, we can attempt to estimate ε{x(k)xT (k)} during speech+noise–periods, and ε{v(k)vT (k)} in noise–only periods. We will make use of a weighting scheme in order to provide the ability to adapt to a changing environment. The input matrices X(k) and V (k) are defined as in (5.5) and (5.6). The Wiener–solution is then estimated as W (k) = (RT (k)R(k))−1 (RT (k)R(k) − V T (k)V (k)) = I − R−1 (k) R−T (k)V T (k)V (k), | {z } (6.1) B(k) | {z W N (k) } where I is the identity matrix. Let us now consider the following least squares estimation problem where the upper rows (with X(k)) represent weighted inputs from speech+noise–periods and the lower rows (with V (k)) represent weighted inputs from noise–only periods : 2 X(k) 0 N . W (k) − (6.2) min 1 N V (k) βV (k) W (k) β The normal equations for this system are (X T (k)X(k) + β 2 V T (k)V (k))W N (k) = (V T (k)V (k)), such that W N (k) = ((X T (k)X(k) + β 2 V T (k)V (k)))−1 (V T (k)V (k)) | {z } (6.3) T (k)R (k) Rβ β = Rβ−1 (k) Rβ−T (k)(V T (k)V (k)), | {z } Bβ (k) with Rβ ∈ <M N ×M N and upper triangular. Clearly, for the limiting case of β going to N zero (indicated by β → 0), Rβ→0 (k) = R(k) and Bβ→0 (k) = B(k), so Wβ→0 (k) = N I − W (k), which means that W (k) may be computed as W (k) = I − Wβ→0 (k). We will now provide a QRD-based algorithm that makes use of this feature and that is based on storing and updating the triangular factor Rβ→0 (k) = R(k) as well as the right hand side Bβ→0 (k) = B(k). From now on, we focus on (6.2) and (6.3), keeping in mind that a desired signal estimate is then obtained as d̂T (k) N = xT (k)(I − Wβ→0 (k)) N = xT (k) − xT (k)Wβ→0 (k)), N where xT (k)Wβ→0 (k) in fact corresponds to an estimate of the noise–contribution in T x (k). Note that our updating formulae will be reorganized such that the β does not appear anywhere, hence can effectively be set to zero. 112 CHAPTER 6. FAST QRD–LSL–BASED ANC 6.2 Modified QRD–RLS based algorithm Referring to (6.1), the algorithm will be based on storing and updating only Rβ→0 (k) and Bβ→0 (k). In a second step (section 6.3), this will be turned into a QRD–LSL based algorithm. We know from (6.2) that there will be two modes of updating, depending on the input signal p being classified as speech+noise or noise–only. In speech+noise–mode we apply 1 − λ2s x(k + 1) as an input pto the left hand side , and 0 to the right hand side. 
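The role of the auxiliary parameter β in (6.2)–(6.3) can be verified numerically: for small β the stacked least squares problem reproduces W^N = (X^T X)^{-1} V^T V. The matrix sizes and the random data below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))     # weighted speech+noise rows (illustrative sizes)
V = rng.standard_normal((150, 8))     # weighted noise-only rows

# W^N from the normal equations (6.3) in the limit beta -> 0
W_direct = np.linalg.solve(X.T @ X, V.T @ V)

# W^N from the stacked least squares problem (6.2) with a small beta
beta = 1e-4
A = np.vstack([X, beta * V])
rhs = np.vstack([np.zeros((X.shape[0], X.shape[1])), V / beta])
W_ls, *_ = np.linalg.lstsq(A, rhs, rcond=None)

print(np.allclose(W_direct, W_ls, atol=1e-6))   # True: the two solutions coincide as beta -> 0
```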
We know from (6.2) that there will be two modes of updating, depending on the input signal being classified as speech+noise or noise–only. In speech+noise–mode we apply \sqrt{1-\lambda_s^2}\,x(k+1) as an input to the left hand side, and 0 to the right hand side. During noise–only periods, \beta\sqrt{1-\lambda_n^2}\,v(k+1) should be applied to the left hand side of the SFG, and \frac{1}{\beta}v(k+1) to the right hand side, with β → 0. Assume that at time k a QR–decomposition is available as follows:

\begin{bmatrix} X(k) & 0 \\ \beta V(k) & \frac{1}{\beta}V(k) \end{bmatrix}_{\beta\to 0} = Q(k) \begin{bmatrix} R(k) & B(k) \\ 0 & * \end{bmatrix},   (6.4)

where Q(k) is not stored. From this equation, W^N_{β→0}(k) can be computed as W^N_{β→0}(k) = R^{-1}(k)B(k). Our algorithm will however be based on residual extraction (section 2.3.3), and hence W^N_{β→0}(k) will never be computed explicitly.

6.2.1 Speech+noise–mode

The update formula for the speech+noise–mode is derived as follows, where X(k) is updated based on formula (5.5), while V(k) is kept unchanged, i.e. V(k+1) = V(k):

\begin{bmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta}V(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s X(k) & 0 \\ \beta V(k) & \frac{1}{\beta}V(k) \end{bmatrix}_{\beta\to 0}.

If we define \tilde\beta = \beta/\lambda_s, we obtain

\begin{bmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta}V(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s X(k) & 0 \\ \lambda_s\tilde\beta V(k) & \frac{1}{\lambda_s\tilde\beta}V(k) \end{bmatrix}_{\tilde\beta\to 0} = \begin{bmatrix} 1 & 0 \\ 0 & Q(k) \end{bmatrix} \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \\ 0 & * \end{bmatrix}.

[Figure 6.1: signal flow graph of the QRD–RLS based optimal filtering scheme for acoustic noise suppression. During speech+noise–periods, Givens–rotations are used; the left hand side (updating R(k)) is weighted with λ_s, the right hand side (updating B(k)) with 1/λ_s, and the LS residual is obtained from the bottom row via Π cos θ.]

[Figure 6.2: the same signal flow graph during noise–only–periods, where Gauss–rotations are used. The right hand side memory elements are weighted with λ_n^2, the left and right hand parts use differently defined Gauss–transformations, and the memory cells contain the (unchanged) elements of R(k).]
This means (from (6.4)) that the updated R(k+1) and B(k+1) may be obtained based on a standard QRD–updating

\begin{bmatrix} 0 & r^T(k+1) \\ R(k+1) & B(k+1) \end{bmatrix} = Q^T(k+1) \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \end{bmatrix}.   (6.5)

A signal flow graph representation is given in Figure 6.1. For this example, the right hand side part of the signal flow graph has only 2 columns, estimating the speech components in x_1(k) and x_2(k) only. While the left hand side has a weighting with λ_s, the right hand side part has a weighting with 1/λ_s, which is shown by means of black squares in boxes. As shown in equation (6.6), we do not have to calculate the filter vector in each step; we can use residual extraction (2.24) to obtain the estimate of the speech signal

\hat{d}(k+1) = x(k+1) + r(k+1)\,\frac{\prod_{i=1}^{MN}\cos\theta_i(k+1)}{\sqrt{1-\lambda_s^2}}.   (6.6)

Note that (6.5) and (6.6) are the same as (5.8) and (5.9). The update formulas for the noise–only case, however, will be different from the formulas in chapter 5.

6.2.2 Noise–only mode

The update formula for the noise–only mode is derived as follows, where now X(k) is kept unchanged, i.e. X(k+1) = X(k), while V(k) is implicitly updated as

V(k+1) = \begin{bmatrix} \sqrt{1-\lambda_n^2}\,v^T(k+1) \\ \lambda_n V(k) \end{bmatrix}.

It is convenient to redefine/reorder V(k+1) as

V(k+1) = \begin{bmatrix} \lambda_n V(k) \\ \sqrt{1-\lambda_n^2}\,v^T(k+1) \end{bmatrix},

leading, after substituting the QR–decomposition (6.4) for the old data and absorbing the weighting factors (with \tilde\beta = \beta\lambda_n, \tilde\beta → 0), to

\begin{bmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta}V(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} Q(k) & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} R(k) & \lambda_n^2 B(k) \\ 0 & * \\ \beta\sqrt{1-\lambda_n^2}\,v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v^T(k+1) \end{bmatrix}_{\beta\to 0}.

This means that the updated R(k+1) and B(k+1) may be obtained based on a QRD–updating

\begin{bmatrix} R(k+1) & B(k+1) \\ 0 & r^T(k+1) \end{bmatrix} = Q^T(k+1) \begin{bmatrix} R(k) & \lambda_n^2 B(k) \\ \beta\sqrt{1-\lambda_n^2}\,v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v^T(k+1) \end{bmatrix},   (6.7)

where Q(k+1) is not stored and from which W^N_{β→0}(k+1) can again be computed as W^N_{β→0}(k+1) = R^{-1}(k+1)B(k+1). Note that β and 1/β now appear explicitly in the QRD–updating formula. As we are interested in the case β → 0, we have to work the formulas into an alternative form in which β does not appear explicitly. The end result will be that the orthogonal Givens transformations are replaced by Gauss–transformations, and that the input vector will be

[\sqrt{1-\lambda_n^2}\,v^T(k+1) \quad \sqrt{1-\lambda_n^2}\,v^T(k+1)] \quad \text{instead of} \quad [\beta\sqrt{1-\lambda_n^2}\,v^T(k+1) \quad \tfrac{1}{\beta}\sqrt{1-\lambda_n^2}\,v^T(k+1)].

It will also be shown that the elements of the matrix R(k) are not changed by the updating during noise–only periods. We consider the first orthogonal Givens rotation that is computed in the top left hexagon of the signal flow graph in Figure 6.1,

\begin{bmatrix} R_{11}(k+1) \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta_1(k) & \sin\theta_1(k) \\ -\sin\theta_1(k) & \cos\theta_1(k) \end{bmatrix} \begin{bmatrix} R_{11}(k) \\ \beta\sqrt{1-\lambda_n^2}\,v_1(k+1) \end{bmatrix}.

Here v_j(k) denotes the j'th component of the vector v(k). Since β → 0, we can write

\tan\theta_1(k) = \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)}, \qquad \sin\theta_1(k)\big|_{\beta\to 0} \approx \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)}, \qquad \cos\theta_1(k)\big|_{\beta\to 0} \approx 1.   (6.8)

Hence, in the limit, the rotation is equivalent to

\begin{bmatrix} R_{11}(k+1) \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\frac{\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{11}(k) \\ \sqrt{1-\lambda_n^2}\,v_1(k+1) \end{bmatrix},   (6.9)

so that R_{11}(k+1) = R_{11}(k) and β has disappeared.
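Before the remaining elements of the first row are treated, a small numerical illustration of this limiting argument may be helpful. The sketch below (toy numbers, our own variable names; it is only an illustration, not part of the derivation) computes the exact Givens rotation for a row scaled by a shrinking β and shows that R_11 stays (essentially) unchanged while the transformed noise sample converges to the Gauss–transformation result, independently of β.

    import numpy as np

    lam_n = 0.9997
    R11, v1, vj, R1j = 2.0, 0.7, -0.4, 1.3
    w = np.sqrt(1.0 - lam_n**2)

    for beta in (1e-3, 1e-6, 1e-9):
        # Exact Givens rotation annihilating the beta-scaled noise sample.
        a, b = R11, beta * w * v1
        r = np.hypot(a, b)
        c, s = a / r, b / r
        R11_new = c * R11 + s * beta * w * v1
        R1j_new = c * R1j + s * beta * w * vj
        vj_out  = (-s * R1j + c * beta * w * vj) / beta   # remove the beta scaling
        print(beta, R11_new, R1j_new, vj_out)

    # Gauss-transformation limit (beta -> 0): R row unchanged,
    # transformed sample = w*vj - (w*v1/R11)*R1j.
    print("limit:", R11, R1j, w * vj - (w * v1 / R11) * R1j)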
This rotation is then applied for the updating of the remaining elements in the first row of R(k),

\begin{bmatrix} R_{1j}(k+1) \\ \beta\sqrt{1-\lambda_n^2}\,v_j'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \beta\sqrt{1-\lambda_n^2}\,v_j(k+1) \end{bmatrix}_{\beta\to 0},   (6.10)

which is equivalent to

\begin{bmatrix} R_{1j}(k+1) \\ \sqrt{1-\lambda_n^2}\,v_j'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\frac{\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_n^2}\,v_j(k+1) \end{bmatrix}.   (6.11)

This shows that the elements R_{1j}(k) are indeed unaffected by this transformation. Applying the rotation to the right hand side (the B(k)–part) of the signal flow graph leads to

\begin{bmatrix} B_{1j}(k+1) \\ \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v_{rhs,j}'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} \lambda_n^2 B_{1j}(k) \\ \frac{1}{\beta}\sqrt{1-\lambda_n^2}\,v_{rhs,j}(k+1) \end{bmatrix}_{\beta\to 0},   (6.12)

where v_{rhs}(k+1) is the input to the right hand side of the SFG; during noise–only periods, v_{rhs}(k) = v(k). Equation (6.12) is equivalent to

\begin{bmatrix} B_{1j}(k+1) \\ \sqrt{1-\lambda_n^2}\,v_{rhs,j}'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \lambda_n^2 B_{1j}(k) \\ \sqrt{1-\lambda_n^2}\,v_{rhs,j}(k+1) \end{bmatrix}.   (6.13)

Note that v_{rhs}'(k) = v_{rhs}(k). Similar transformations are subsequently applied to the other rows of B(k) and R(k), where the update for the second row of R(k) uses v'(k+1) as its input and generates v''(k+1), which in turn serves as input for the third row, and so on. Note that we have now removed β from all equations.

The above formulae effectively mean that, during noise–only periods, we can replace the original Givens–rotations in Figure 6.1 by so–called Gauss–transformations, as given by formulae (6.11) and (6.13); see Figure 6.2. It also follows from these formulae that the right–hand part and the left–hand part have differently defined Gauss–transformations, namely

G_{left} = \begin{bmatrix} 1 & 0 \\ -* & 1 \end{bmatrix} \quad \text{and} \quad G_{right} = G_{left}^{-T} = \begin{bmatrix} 1 & * \\ 0 & 1 \end{bmatrix}.

In chapter 5, we developed a QRD–based algorithm for unconstrained optimal filtering in which the right hand side update for noise–only periods,

B(k+1) = \lambda_n^2 B(k) + (R^{-T}(k+1)v(k+1))\,v^T(k+1)(1-\lambda_n^2),

is calculated directly by a backsubstitution with an intermediate vector u(k+1) followed by a vector–vector multiplication,

R^T(k+1)u(k+1) = v(k+1), \qquad B(k+1) = \lambda_n^2 B(k) + u(k+1)v^T(k+1)(1-\lambda_n^2).

It is easily shown that this backsubstitution corresponds exactly to applying (6.11) and that the vector–vector multiplication corresponds to (6.13). The reorganized algorithm, however, is more easily converted into a QRD–LSL scheme, see section 6.3.

Since it is assumed that there is no speech present in noise–only mode, we could simply set the desired signal estimate to zero, i.e. \hat{d}(k+1) = 0. In practice, it is often required to also have an estimate available during noise periods, because complete silence could be disturbing to the listener (see also section 5.3.3). Hence a residual extraction scheme that also operates during noise–only mode is again needed. It is obtained as follows. Let G^T(k) be the matrix that combines all left hand part Gauss–transformations at time k. We can generate residuals by applying these left hand side transformations to an input of 0 applied to the right hand side; these transformations then do not update the right hand side. We write (with W^N(k) = R^{-1}(k)B(k))

\begin{bmatrix} R(k) & B(k) \\ 0 & r^T(k+1) \end{bmatrix} = G^T(k+1) \begin{bmatrix} R(k) & B(k) \\ \sqrt{1-\lambda_n^2}\,v^T(k+1) & 0 \end{bmatrix},

and, multiplying both sides from the right by \begin{bmatrix} -W^N(k) \\ I \end{bmatrix},

\begin{bmatrix} 0 \\ r^T(k+1) \end{bmatrix} = G^T(k+1) \begin{bmatrix} 0 \\ -\sqrt{1-\lambda_n^2}\,v^T(k+1)W^N(k) \end{bmatrix}.

Now we can obtain the 'speech signal' estimate:
\hat{d}^T(k+1) = \frac{\sqrt{1-\lambda_n^2}\,x^T(k+1) + r^T(k+1)}{\sqrt{1-\lambda_n^2}}.   (6.14)

An algorithm description can be found in Algorithm 15.

6.3 QRD–LSL based algorithm

Based upon the reorganization in the previous section, we can now derive a fast algorithm based upon QRD–LSL. First (section 6.3.1), the classification into speech+noise or noise–only must be done 'per sample' instead of per vector, in order to be able to maintain a shift structure in the input. After that (in section 6.3.2), the LSL–based algorithm is derived.

6.3.1 Per sample versus per vector classification

In the previous section we described a QRD–updating based scheme to calculate an optimal speech signal estimate from a noisy signal. The scheme is shown in Figure 6.1 for signal+noise input vectors, and in Figure 6.2 for noise–only input vectors.

Algorithm 15 The modified QRD–RLS algorithm (note the different Gauss–transformations in the left– and right hand side)

    QRDRLS_Mod(x, mode) {
      PiCos = 1;
      if mode = noise
        bIn = x[1]                               // right hand side input
        for i = 1:M*N {
          Gauss = CalcGauss(R[i][i], x[i]);
          for (j = i+1:M*N) { ApplyGauss1(Gauss, R[i][j], x[j]) }
          ApplyGauss2(Gauss, b[i], bIn)
        }
      else (mode = signal)
        bIn = 0;
        for i = 1:M*N {
          Givens = CalcGivens(R[i][i], x[i]);
          for (j = i+1:M*N) { ApplyGivens(Givens, R[i][j], x[j]) }
          ApplyGivens(Givens, b[i], bIn, PiCos)
        }
      return PiCos * bIn;
    }

Most of the time, namely within each noise–only segment and within each speech+noise segment, the input vector to (the left hand side part of) this algorithm is a shifted version of the input vector of the previous time step. (The right hand side part does not need a shift structure in order to derive a fast algorithm.) This may allow us to derive a fast implementation with a QRD–LSL structure. We note here that the equivalence between the signal flow graphs of Figure 2.4 and Figure 2.5 was stated in section 2.3.4 for the case where orthogonal Givens transformations are used, but it equally holds when Gauss transformations (cfr. noise–only mode) are used, as this is the limiting case with β → 0.

However, the shift structure of the input vectors to the signal flow graph is temporarily destroyed by a transition between modes if the classification into noise–only versus speech+noise periods is performed on a per–vector basis. When a transition occurs, two successive input vectors will indeed have different scalings (\sqrt{1-\lambda_s^2} versus \sqrt{1-\lambda_n^2}, and β → 0 versus β = 1). We therefore propose to do the noise–only/speech+noise classification on a per–sample basis. Each sample is then effectively given a flag f (f = 1 means signal+noise sample, f = 0 means noise–only sample, which is multiplied by β → 0), which it maintains while travelling through the signal flow graph, both horizontally through the delay line and vertically through successive transformations in a column. All transformations are also given a flag g that indicates whether the transformation is based on (calculated from) a sample from a noise–only period or a signal+noise period. This introduces the transitions gradually into the signal flow graph, which will then allow us to derive a fast algorithm. In this way, the first input vector of a noise period following a speech+noise period will be (including weightings)

[\beta\sqrt{1-\lambda_n^2}\,x_1(k), \ldots, \beta\sqrt{1-\lambda_n^2}\,x_M(k), \sqrt{1-\lambda_s^2}\,x_1(k-1), \ldots, \sqrt{1-\lambda_s^2}\,x_M(k-1), \sqrt{1-\lambda_s^2}\,x_1(k-2), \ldots]^T.   (6.15)

Similarly, the first input vector of a speech+noise period following a noise–only period is
[\sqrt{1-\lambda_s^2}\,x_1(k), \ldots, \sqrt{1-\lambda_s^2}\,x_M(k), \beta\sqrt{1-\lambda_n^2}\,x_1(k-1), \ldots, \beta\sqrt{1-\lambda_n^2}\,x_M(k-1), \beta\sqrt{1-\lambda_n^2}\,x_1(k-2), \ldots]^T.   (6.16)

The shift structure of the signal flow graph is then always preserved. Since the transition occurs gradually, some transformations in the graph will be calculated from inputs that have been multiplied by f = 0 (i.e. β → 0) and applied to inputs which have not, and vice versa. Therefore, we derive four 'rules' that can be used for the updates in the left hand part of the graph (the part that updates R(k)); a small sketch after this list illustrates the corresponding selection logic.

1. A transformation based upon a noise–only input sample (g = 0) and applied to a noise–only input sample (f = 0) can be replaced by a Gauss–transformation. (The Gauss–transformation is different for the left and right hand parts of the signal flow graph.)

2. A transformation based upon a noise–only input sample (g = 0) and applied to a speech+noise sample (f = 1) is replaced by \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

3. A transformation based upon a speech+noise input sample (g = 1) and applied to a noise–only sample (f = 0) is replaced by \begin{bmatrix} \cos\theta & 0 \\ -\sin\theta & 0 \end{bmatrix}.

4. A transformation based upon a speech+noise input sample (g = 1) and applied to a speech+noise sample (f = 1) is an ordinary Givens–rotation.
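Purely as an illustration of the bookkeeping implied by these four rules (the function and parameter names below are ours, not the thesis'), the following sketch selects the 2×2 transformation for the left hand (R–updating) part of the graph from the flags g and f.

    import numpy as np

    def left_transform(f, g, theta=None, v1=None, R11=None, lam_n=None):
        """Return the 2x2 transformation for the left hand (R-updating) part.
        g: flag of the sample the transformation was computed from,
        f: flag of the sample it is applied to (1 = speech+noise, 0 = noise-only)."""
        if g == 0 and f == 0:       # rule 1: Gauss transformation (left hand side form)
            a = np.sqrt(1.0 - lam_n**2) * v1 / R11
            return np.array([[1.0, 0.0], [-a, 1.0]])
        if g == 0 and f == 1:       # rule 2: identity
            return np.eye(2)
        if g == 1 and f == 0:       # rule 3: degenerate Givens rotation
            return np.array([[np.cos(theta), 0.0], [-np.sin(theta), 0.0]])
        # g == 1 and f == 1:        # rule 4: ordinary Givens rotation
        return np.array([[np.cos(theta),  np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])

    # Example: a rotation computed from a noise-only sample applied to a
    # speech+noise sample leaves that sample untouched (rule 2).
    print(left_transform(f=1, g=0) @ np.array([1.3, 0.5]))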
Rule 1 is proven in (6.11) and (6.13), where respectively the Gauss–transformations for the left hand part and the right hand part of the signal flow graph are shown. Rule 4 is the standard orthogonal update. Rule 2 is obvious from

\begin{bmatrix} R_{1j}(k+1) \\ v_j'(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\,v_1(k+1)}{R_{11}(k)} & 1 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_s^2}\,x_j(k+1) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_s^2}\,x_j(k+1) \end{bmatrix},   (6.17)

and rule 3 is proven by

\begin{bmatrix} R_{1j}(k+1) \\ v_j'(k) \end{bmatrix} = \begin{bmatrix} \cos\theta_1(k) & \sin\theta_1(k) \\ -\sin\theta_1(k) & \cos\theta_1(k) \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \beta\sqrt{1-\lambda_n^2}\,v_j(k) \end{bmatrix}_{\beta\to 0} = \begin{bmatrix} \cos\theta_1(k) & 0 \\ -\sin\theta_1(k) & 0 \end{bmatrix} \begin{bmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_n^2}\,v_j(k) \end{bmatrix}.   (6.18)

It should be noted that the rotations computed from input vector components that are multiplied by β do not have any effect on the elements of R(k); those components can be considered to be zero as far as the updates of R(k) are concerned. For the updates of R(k), the input signal can thus be considered to be

    time →   ∗ ∗ ∗ …    | 0 0 0 …           | ∗ ∗ ∗ …
             S+N updates | noise–only updates | S+N updates

with the transitions S+N → N and N → S+N in between. In this structure, a pre– and post–windowing of the signal that arrives in the estimation process for the correlation matrix square root R(k) is recognized. Based on these transformation rules, Figure 6.1 and Figure 6.2 (with per–vector classification) may be turned into Figure 6.3 (with per–sample classification).

[Figure 6.3: QRD–RLS scheme for acoustic noise cancellation in which the classification into noise–only or signal+noise is done sample by sample. Flag g shows whether a transformation is based upon a signal+noise or a noise–only sample; flag f is carried together with the sample through the SFG and shows whether the sample stems from a signal+noise or a noise–only period. The weighting is λ_s during signal+noise and λ_n during noise–only, and the hexagons are Givens– or Gauss–transformations depending on the flags f and g.]

An important aspect is that this scheme provides (as one can easily verify) an R(k) which effectively corresponds to the triangular factor obtained with Figure 2.4 (plain QRD–updating) when fed with the same sequence of input samples, be it that all noise–only samples are set to zero (pre– and post–windowing). In addition, the right hand side B(k) = R^{-T}(k)V^T(k)V(k) effectively has the same V(k) as in the per–vector classification case, i.e. it consists of (full) noise–only input vectors, as it should.

6.3.2 LSL–algorithm

The signal flow graph of Figure 6.3 is readily transformed into a QRD–LSL type signal flow graph, as shown in Figure 6.4. Note that during a signal+noise to noise–only transition, no residuals can be calculated. In practice, this means that there is no correct noise estimate available, which is audible as a small click in the output signal.
One could run a parallel lattice during transitions in order to be able to generate residuals; since some of the transformations are void during the transition (the right hand side need not be updated in the part of the graph where the Gauss–transformations have already been introduced), this does not double the complexity. Alternatively, one could simply insert 'comfort noise' instead (e.g. from a noise buffer), since the transitions are very short in time.

The complete algorithm is shown in Figure 6.4, where each sample at the input is accompanied by a flag f which is carried through the SFG along with the signal, and each rotation by a flag g. Flag f indicates whether the sample stems from a noise–only period or from a signal+noise period, while g indicates whether the transformation was calculated from a sample of a noise–only period or not. The combination of these flags determines whether a hexagon should be a Givens– or a Gauss–transformation, according to the rules given in section 6.3.1. The full specification can be found in Algorithm 16.

[Figure 6.4: QRD–LSL based unconstrained optimal filtering for acoustic noise suppression. The hexagons are either Givens–rotations or Gauss–rotations, depending upon the flags which designate whether the sample/rotation stems from a noise–only period or from a signal+noise period.]

6.4 Transitions

In this section we look in detail at what happens 'internally' in the algorithm during the different transitions. This information is not strictly necessary to implement the algorithm, but it serves to clarify its internal working.

6.4.1 Transition from speech+noise to noise–only mode

We first explain how the Givens–rotations are changed into Gauss–rotations during speech+noise to noise–only transitions. At the last sample of a speech+noise period, the first sample from a noise period enters the signal flow graph of the lattice algorithm, since the QRD–LSL 'looks ahead' one sample. Figure 2.5 can then be redrawn as in Figure 6.5 because of rule 2 (the rotation in the upper left corner has no effect) and rule 3 (as shown in the figure). Note that all rotations which are passed to the right hand side are still computed as they were in the signal+noise period. An equivalent QRD–RLS scheme would at this time still operate in signal+noise mode, and generate the same rotations and residuals (cfr. Figure 6.1). When more noise–only samples enter the graph, it can be redrawn as in Figure 6.6. The equivalent triangular scheme during the transition is shown in Figure 6.7. It should be noted that in the upper part of Figure 6.6, where the rotations are computed from input samples of the noise–only period, Gauss–transformations are already introduced into the signal flow graph, although they are not yet used to update the right hand side.
Algorithm 16 Fast QRD–LSL based noise cancellation algorithm

    QRDLSLNoise(x, mode)
      PiCos = 1; xl = x; xr = x; delay[0] = x;
      if mode = Noise {bIn = delay[1]} else {bIn = 0}
      for (int i = 0; i < N; i++)
        dxl = delay[i+1]; dxr = dxl;
        if (mode = Signal)
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        if (mode = Noise)
          Gauss = ComputeGauss(Comp2[i], dxr)
          ApplyGauss(Gauss, Rot2[i], xr, b[i], bIn, dNoiseWeight)
          // Note: different transformations on Rot2/xr and b/bIn !!!
          Gauss = ComputeGauss(Comp1[i], xl)
          ApplyGauss(Gauss, Rot1[i], dxl)
        if (mode = SigToNoise)
          if xr.IsFromNoisePeriod and dxr.IsFromNoisePeriod
            Gauss = ComputeGauss(Comp2[i], dxr, dWeight)
            ApplyGauss(Gauss, Rot2[i], xr, dWeight)
            // Left hand side still weighted during transition
            Gauss = ComputeGauss(Comp1[i], xl)
            ApplyGauss(Gauss, Rot1[i], dxl)
          else if xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
            xr = 0;
            // dxl is not changed here
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
          else if not xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        if (mode = NoiseToSig)
          if this is the first sample in a NoiseToSignal transition
            Gauss = ComputeGauss(Comp2[i], dxr)
            ApplyGauss(Gauss, b[i], bIn, dNoiseWeight)
            // xr is not modified in this case !!!
            dxl = 0;
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          else  // as in a Signal period:
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        xl = xr;
      for (int i = N-2; i >= 0; i--) {delay[i+1] = delay[i]}
      return bIn * PiCos;

[Figure 6.5: When u(k+1) is a noise sample (multiplied by β → 0), the lattice signal flow graph reduces to a simplified graph in which rule 2 removes the upper left rotation and rule 3 applies to the remaining upper rotations.]

[Figure 6.6: After some noise samples have entered the scheme, the upper part can be loaded with the Gauss–transformations in order to keep the shift structure.]

[Figure 6.7: The equivalent triangular scheme when the first noise–only sample enters.]

When all rotations are replaced by Gauss–rotations, the transition is finished, and one can start using the rotations to update the right hand side (as described in section 6.2.2). From that time on, the weighting of the memory elements is also stopped, because they are not updated anymore during the noise period and would otherwise become too small after a while.
6.4.2 Transition from a noise–only to a speech+noise period

This transition takes only one sample, and it has no effect on the residual extraction. In a first step, when the first signal+noise sample enters the (one step look ahead) input of the lattice, we can redraw the signal flow graph as shown in Figure 6.8. From now on, weighting is again switched on for all memory elements.

[Figure 6.8: First step in the transition from noise–only to speech+noise. The residuals are computed based on Gauss–rotations.]

At the next time instant, all rotations are replaced by Givens–rotations, and we again obtain the scheme of Figure 2.5.

6.5 Noise reduction vs. signal distortion trade–off

In the QRD–RLS based algorithm of chapter 5, we introduced a regularization parameter that allows tuning of signal distortion versus noise reduction. In this section we do the same for the QRD–LSL based algorithm. Two alternatives will be described: the first one (section 6.5.2) is comparable to the technique used in chapter 5 for QRD–RLS, the other one (section 6.5.3) is based upon continuously updating the signal correlation matrix, even during noise periods. This leads to a 'self tuning' trade–off parameter, which provides infinite noise reduction during noise–only periods and a well regularized algorithm during speech+noise periods.

6.5.1 Regularization in QRD–LSL based ANC

In section 5.4 we have shown how a regularization term µ can be introduced in a QRD–RLS based system for ANC, see equation (5.10). This has led to the following update equation:

\begin{bmatrix} 0 & r_2^T(k+1) \\ 0 & r_1^T(k+1) \\ R(k+1) & B(k+1) \end{bmatrix} = Q^T(k+1) \begin{bmatrix} \sqrt{1-\lambda_s^2}\,x^T(k+1) & 0 \\ \sqrt{1-\lambda_s^2}\,\mu^2 v^T(k) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \end{bmatrix}.   (6.19)

Here v(k) is taken from a noise buffer. The residual signal r_2(k+1) may be used to generate residuals, while the residual signal r_1(k+1) should be discarded. During noise–only periods, the updates for B(k) remain the same as in the non–regularized case.

It is important to see that the property on which the derivation of fast RLS schemes is based, namely the shift structure of the input signal, is no longer present in this case (where two consecutive updates are applied). Each x(k) is a shifted version of x(k−1), and each v(k) is a shifted version of v(k−1); but since they are applied to the left hand side of the signal flow graph intermittently, each input vector is not a shifted version of the previous one. Effectively, the input vectors now correspond to a (weighted) block Toeplitz structure instead of a plain Toeplitz structure.

Equation (6.19) can be implemented in signal flow graphs like Figure 6.1 and Figure 6.2 by applying both updates 'at the same time'. This is realized by replacing each single hexagon in the signal flow graph with two hexagons: the first one performs the rotation with the input signal, and the second one subsequently performs the rotation with the regularization noise. This is shown in Figure 6.9 for the hexagons representing Givens rotations; the same substitution can be applied for the rotations that represent Gauss rotations. As a result, since the number of hexagons doubles,
for each rotation parameter generated and applied in the original scheme (the thick gray arrows), two rotation parameters are now generated and applied in the modified scheme.

[Figure 6.9: Doubling the lines and the hexagons in the signal flow graphs; each hexagon receives a signal input and a regularization input.]

We will now describe two alternatives for implementing regularization in a QRD–LSL based noise cancellation algorithm. The first implementation uses a noise buffer and is based upon the QRD–LSL based noise cancellation algorithm derived earlier in this chapter (Algorithm 16). The second method is based upon a standard QRD–LSL adaptive filter; it avoids the use of a noise buffer and provides a regularization mechanism which puts more emphasis on noise cancellation during noise–only periods.

6.5.2 Regularization using a noise buffer

The fast algorithm which incorporates regularization can be straightforwardly derived from Figure 6.1 and Figure 6.2, modified as described in Figure 6.9. In Figure 6.10, the complete scheme is shown. Compare this scheme to Figure 6.4 and note that the thick lines are in fact 'vector signals' which carry 2–vectors with both signal and regularization noise samples, corresponding to Figure 6.9. Note also the extra column on the right hand side which calculates the residuals based upon the memory elements in the one–but–last right hand side column. The black arrows which depict the rotations are demultiplexed just before this column, and the rotations stemming from updates with regularization noise are discarded (cfr. Figure 6.9). This corresponds to discarding the residual r_1(k+1) in (6.19) and only retaining r_2(k+1).

During speech+noise/echo mode, an update is done with left hand side microphone inputs x(k) and left hand side regularization inputs µ^2 v(k) (taken from a noise buffer); the right hand side input is 0.

[Figure 6.10: Regularization in QRD–LSL based noise cancellation using a noise buffer.]

During noise/echo mode, both the left hand side inputs and the right hand side inputs are v(k). An algorithm description is given in Algorithm 17.

6.5.3 Mode–dependent regularization

As an alternative, we propose not to keep R(k) fixed during noise–only periods, but to update it continuously, with a forgetting factor λ_s (long window) during speech+noise periods and with forgetting factor λ_n (short window) during noise–only periods. In this case, the statistics of the near–end (desired speech signal) component are 'forgotten' by the weighting scheme during noise–only periods, but experiments show that this approach delivers good results concerning ANC. Simulations will be given in section 7.7, where this algorithm is applied to combined noise/echo cancellation. The statistics from the noise–only period (estimated with a short window) then serve as a good 'starting value' for the estimation of the speech+noise statistics during speech+noise periods (with a long window).
Because the statistics of the near end source are indeed forgotten during noise periods, the speech signal sounds a bit 'muffled' at the beginning of a speech+noise period; when the forgetting factors are chosen appropriately, this can be reduced to a hardly noticeable level. A feature of this approach is that during noise–only periods, the system output is reduced to zero. This can be understood as follows. If we allow R^T(k)R(k) to be updated also during noise periods, as we propose here, the influence of the speech+noise covariance estimate is gradually 'forgotten' during noise periods. This in fact corresponds to increasing µ in (5.10). The estimate then converges to V^T(k)V(k), which corresponds to µ → ∞ (hence W^N → I and W → 0) during noise–only periods. On the other hand, during speech+noise periods the regularization effect is gradually forgotten, resulting in a slight increase in the noise level during speech+noise, in exchange for a less distorted signal during signal+noise periods. Hence the speech signal may be slightly distorted at the beginning of a speech+noise period. This procedure can be thought of as a trade–off system that regulates itself: when there is no near end activity, it provides infinite noise reduction, but when a near end signal is present, the signal quality gains importance in the optimisation. A listening test shows that good results can be achieved after some tuning of the parameters.

Algorithm 17 QRD–LSL noise reduction algorithm with regularization. Inputs are 'mode' and the input vector 'x'.

    PiCos = 1; xl = x; xr = x; delay[0] = x;
    if mode = Noise { bIn = delay[1]; add x to noise buffer; bn = 0 }
    else { bIn = 0; bn = get noise vector from noise buffer; bn = µ * bn }
    extra = bIn; bnl = bn; bnr = bn; ndelay[0] = bn
    for (int i = 0; i < N; i++)
      dxl = delay[i+1]; dxr = dxl;
      dbnl = ndelay[i+1]; dbnr = dbnl;
      if (mode = Signal)
        Givens = ComputeGivens(Comp2[i], dxr, dWeight)
        ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
        Givens = ComputeGivens(Comp1[i], xl, dWeight)
        ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        Givens = ComputeGivens(Comp2[i], dbnr, 1)
        ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, 1)
        Givens = ComputeGivens(Comp1[i], bnl, dWeight)
        ApplyGivens(Givens, Rot1[i], dbnl, 1)
      if (mode = Noise)
        // different transformations on Rot2/xr and b/bIn
        Gauss = ComputeGauss(Comp2[i], dxr)
        ApplyGauss(Gauss, Rot2[i], xr, b[i], bIn, dNoiseWeight)
        Gauss = ComputeGauss(Comp1[i], xl)
        ApplyGauss(Gauss, Rot1[i], dxl)
      if (mode = SigToNoise)
        // Left hand side still weighted during transition
        if xr.IsFromNoisePeriod and dxr.IsFromNoisePeriod
          Gauss = ComputeGauss(Comp2[i], dxr, dWeight)
          ApplyGauss(Gauss, Rot2[i], xr, dWeight)
          Gauss = ComputeGauss(Comp1[i], xl)
          ApplyGauss(Gauss, Rot1[i], dxl)
        else if xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
          xr = 0; dbnr = 0;
          // dxl is not changed here
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
          Givens = ComputeGivens(Comp2[i], dbnr, dWeight)
          ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
        else if not xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          Givens = ComputeGivens(Comp2[i], dbnr, 1)
          ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
          Givens = ComputeGivens(Comp1[i], bnl, 1)
          ApplyGivens(Givens, Rot1[i], dbnl, 1)
      if (mode = NoiseToSig)
        // xr is not modified in this case
        if this is the first sample in a NoiseToSignal transition
          Gauss = ComputeGauss(Comp2[i], dxr)
          ApplyGauss(Gauss, b[i], bIn, dNoiseWeight)
          dxl = 0; dbnl = 0;
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          Givens = ComputeGivens(Comp1[i], bnl, 1)
          ApplyGivens(Givens, Rot1[i], dbnl, 1)
        else  // as in a Signal period:
          Givens = ComputeGivens(Comp2[i], dxr, dWeight)
          ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
          Givens = ComputeGivens(Comp1[i], xl, dWeight)
          ApplyGivens(Givens, Rot1[i], dxl, dWeight)
          Givens = ComputeGivens(Comp2[i], dbnr, dWeight)
          ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
          Givens = ComputeGivens(Comp1[i], bnl, dWeight)
          ApplyGivens(Givens, Rot1[i], dbnl, dWeight)
      xl = xr; bnl = bnr;
    for (int i = N-2; i >= 0; i--) {delay[i+1] = delay[i]; ndelay[i+1] = ndelay[i]}
    return extra * PiCos;

In order to derive a QRD–LSL based algorithm that continuously updates R(k), we can write system (6.2) with β = 1:

\begin{bmatrix} X(k) \\ V(k) \end{bmatrix} W^N = \begin{bmatrix} 0 \\ V(k) \end{bmatrix}.

The normal equations are (X^T(k)X(k) + V^T(k)V(k)) W^N = V^T(k)V(k). During speech+noise periods, the term V^T(k)V(k) in the left hand side becomes unimportant due to the weighting, and the system converges to X^T(k)X(k) W^N = V^T(k)V(k), or equivalently (V^T(k)V(k) + D^T(k)D(k)) W^N = V^T(k)V(k), where D(k) contains the desired speech signal. In a QRD–LSL filter, this is achieved by first weighting both the left hand side and the right hand side with λ_s, and then applying a left hand side input u(k) and a right hand side input 0.

During noise–only periods, the term X^T(k)X(k) in the left hand side is 'forgotten', and the system converges to V^T(k)V(k) W^N = V^T(k)V(k), such that after convergence W^N = I, providing a 'perfect' noise estimate. In this mode, both the left and right hand sides of the QRD–LSL adaptive filter are weighted with λ_n, and the input v^T(k) is applied to the left hand side as well as to the right hand side. Note that both the left hand side R(k) and the right hand side B(k) are updated together. This is possible since during noise–only periods R(k) converges to the Cholesky factor of the noise correlation matrix (because the desired speech statistics are forgotten due to the weighting). After convergence, we can write the right hand side during noise–only periods as

B(k) = R^{-T}(k)V^T(k)V(k) = R^{-T}(k)R^T(k)R(k) = R(k).

So the right hand side converges to R(k) (or, in a practical implementation, to the column of R(k) which corresponds to the chosen right hand side). This can indeed be achieved by applying the input vectors to the left hand side and the right hand side together.
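The self–regulating behaviour just described can be illustrated with the following toy sketch (NumPy, made-up names and exaggerated forgetting factors; it only mirrors the recursions above and is not the thesis' implementation). The correlation estimate is updated with λ_s during speech+noise and with λ_n during noise–only periods; during a long noise–only stretch the resulting W^N drifts towards the identity, so W → 0 and the output is muted.

    import numpy as np

    rng = np.random.default_rng(1)
    MN = 4
    lam_s, lam_n = 0.999, 0.97          # exaggerated for a short toy run

    Phi = np.eye(MN)                    # running estimate of R^T(k) R(k)
    Psi = 0.5 * np.eye(MN)              # running estimate of V^T(k) V(k)

    def update(Phi, Psi, x, noise_only):
        lam = lam_n if noise_only else lam_s
        Phi = lam**2 * Phi + (1 - lam**2) * np.outer(x, x)   # R^T R updated in both modes
        if noise_only:
            Psi = lam_n**2 * Psi + (1 - lam_n**2) * np.outer(x, x)
        return Phi, Psi

    for k in range(2000):               # speech+noise period
        x = rng.standard_normal(MN) + 0.3 * rng.standard_normal(MN)
        Phi, Psi = update(Phi, Psi, x, noise_only=False)
    for k in range(2000):               # long noise-only period
        v = 0.3 * rng.standard_normal(MN)
        Phi, Psi = update(Phi, Psi, v, noise_only=True)

    WN = np.linalg.solve(Phi, Psi)
    print("||W^N - I|| after a long noise-only stretch:",
          np.linalg.norm(WN - np.eye(MN)))   # small: W -> 0, output muted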
Transitions between modes. When R(k) is continuously updated, the choice of λ_s and λ_n is very important. During speech+noise periods, λ_s should be chosen close enough to 1 (e.g. λ_s = 0.999999 for an 8000 Hz sampling rate). During noise–only periods, λ_n can be chosen smaller (a shorter window) for many types of noise (e.g. λ_n = 0.9997 for an 8000 Hz sampling rate), so that convergence during noise–only periods is very fast. On transitions between modes, the weighting is switched between λ_n and λ_s. In a QRD–LSL filter, the shift structure of the input signal must be maintained, which is not the case if the classification into signal+noise and noise–only periods is done on a per–input–vector basis. We can solve this by introducing a pre– and post–windowing scheme for the input vectors: on a transition, we feed N zeroes to the algorithm's input and then switch the weighting parameters. This means that the residual signal is wrong during the transition period, but the estimates of R(k) in the algorithm remain correct. The lack of a correct output during transitions can be solved by inserting comfort noise. On the other hand, experiments show that good results are also obtained when the pre– and post–windowing is ignored and the weighting factors are simply switched at a transition; errors that are introduced in the estimate of R(k) in this way are 'forgotten' by the weighting. For an algorithm description, we refer to the QRD–LSL algorithm (Algorithm 6) with inputs as described in this section.

6.6 Complexity

In the complexity calculations, an addition and a multiplication are counted as two separate floating point operations. Table 6.1 shows the complexities of the different optimal filtering algorithms, and Table 6.2 shows the complexities for some typical parameter settings. The QRD–LSL algorithm has a significantly lower computational complexity than the GSVD–based and QRD–RLS–based algorithms, especially when long filters are used (rightmost column of Table 6.2). This makes the QRD–LSL algorithm suited for real time implementation.

Table 6.1: Complexities in flops per sample of the different algorithms.

    Algorithm                            Mode            Complexity
    recursive GSVD [12][13]              —               27.5 (MN)^2
    Full QRD (chapter 5)                 Noise–only      (MN)^2 + 3MN + M
    Full QRD (chapter 5)                 Speech+noise    3.5 (MN)^2 + 15.5 MN + M + 2
    Fast QRD–LSL                         Noise–only      6 M^2 N
    Fast QRD–LSL                         Speech+noise    (21N − 21/2) M^2 + 19MN − (7/2) M
    Fast QRD–LSL reg. (section 6.5.2)    Speech+noise    2 ((21N − 21/2) M^2 + 19MN − (7/2) M)
    QRD–LSL (section 6.5.3)              —               (21N − 21/2) M^2 + 19MN − (7/2) M

Table 6.2: Complexities in flops per sample for typical parameter settings. These figures make the QRD–LSL algorithm suited for real time implementation.

    Algorithm                  Mode            N = 20, M = 5    N = 50, M = 2
    recursive GSVD [12][13]    —               275 000          275 000
    Full QRD (chapter 5)       Noise–only      10 305           10 302
    Full QRD (chapter 5)       Speech+noise    36 557           36 554
    Fast QRD–LSL               Noise–only      3 000            1 200
    Fast QRD–LSL               Speech+noise    12 120           6 051
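As a quick cross-check of Table 6.2, the small sketch below (our own helper functions, simply evaluating the expressions from Table 6.1) reproduces the per-sample flop counts for the two parameter settings.

    def gsvd(M, N):            return 27.5 * (M * N) ** 2
    def full_qrd_noise(M, N):  return (M * N) ** 2 + 3 * M * N + M
    def full_qrd_speech(M, N): return 3.5 * (M * N) ** 2 + 15.5 * M * N + M + 2
    def lsl_noise(M, N):       return 6 * M ** 2 * N
    def lsl_speech(M, N):      return (21 * N - 21 / 2) * M ** 2 + 19 * M * N - 7 / 2 * M

    for (N, M) in [(20, 5), (50, 2)]:
        print(f"N={N}, M={M}:", gsvd(M, N), full_qrd_noise(M, N),
              full_qrd_speech(M, N), lsl_noise(M, N), lsl_speech(M, N))
    # N=20, M=5: 275000, 10305, 36557, 3000, 12120   (matches Table 6.2)
    # N=50, M=2: 275000, 10302, 36554, 1200, 6051    (matches Table 6.2)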
6.7 Simulation results

For the simulations we used a simulated room environment, 4 microphones, a desired speaker at broadside angle and a noise source at 45 degrees. The signals are short sentences, recorded at 8 kHz. Figure 6.11 compares the QRD–LSL–based optimal filtering method (without regularization) to the GSVD–based optimal filtering method. The QRD–based algorithm achieves roughly the same performance as the GSVD–based method, as expected.

[Figure 6.11: signal energy (dB) versus time (samples) for GSVD–based optimal filtering (dotted), QRD–LSL–based optimal filtering (full line) and the original input signal (dashed). The performance is equal. In the middle of the plot a speech segment is recognized; the algorithm 'sees' the silence at the beginning of the plot also as a speech segment, in order to provide a fair indication of the noise reduction during speech periods.]

6.8 Conclusion

We derived a fast QRD–least squares lattice (QRD–LSL) based unconstrained optimal filtering algorithm for multichannel ANC. The derivation of the QRD–LSL algorithm is based on a significantly reorganized version of the QRD–RLS–based unconstrained optimal filtering scheme of chapter 5. We have explicitly set up the transitions between speech+noise and noise–only periods in such a way that the correlation matrices that are implicitly stored in this fast algorithm correspond to the correlation matrices in the QRD–RLS based algorithm, which assures that the 'internal status' of the algorithm is always correct. For typical parameter settings, an 8–fold complexity reduction is obtained compared to the QRD–RLS based algorithm, without any performance penalty. This makes the approach affordable for real time implementation. Some methods for incorporating regularization were also introduced, allowing more noise reduction to be obtained in exchange for some signal distortion.

Chapter 7
Integrated noise and echo cancellation

In this chapter, we describe an approach to speech signal enhancement in which acoustic echo cancellation and noise reduction, which are traditionally handled separately, are combined into one integrated scheme. The optimization problem defined by this scheme is solved adaptively using the QRD–based algorithms which were developed in the previous chapters. We show that the performance of the integrated scheme is superior to the performance of traditional (cascading) schemes, while complexity is kept at an affordable level.

7.1 Introduction

An acoustic echo canceller (AEC) traditionally uses an adaptive filter with a large number of filter taps, for instance 1000 taps for a signal sampled at 8000 Hz. The reason is that it aims to model the (first part of the) acoustic impulse response of the room, in this case the first 125 msec. Because of the length of the filter, one often has to resort to cheap algorithms (frequency domain NLMS, for example) in order to keep complexity manageable. Acoustic noise cancellers (ANC) typically have shorter filters; delay–and–sum beamformers, for example, do not 'model' the room impulse response but are designed to have a certain spatial sensitivity pattern, which can be obtained with relatively short filter lengths. We are interested in ANC schemes that use multiple channels of audio (multiple microphones) in order to exploit both the spatial and the spectral characteristics of the desired and disturbing signals.

In many applications, for example teleconferencing systems, hands–free telephone sets or voice controlled systems, one has to combine acoustic echo and noise cancellation (AENC). Many different AENC schemes can be found in the literature [1, 7, 37, 38, 6]. Obviously, the combination of both blocks can be done in two ways, as shown in Figure 7.1: either one applies echo cancellation on each of the microphone channels before the noise reduction block, or one applies a single echo canceller on the output signal of the noise reduction block. The latter scheme has the advantage of reduced complexity, but studies have shown that the former combination (first AEC, then ANC) has better performance.

[Figure 7.1: Two ways to combine an acoustic echo canceller with a multichannel noise reduction system. Left: first noise reduction, then echo cancellation on the ANC output. Right: first an echo canceller on each channel, then noise reduction on the residual signals.]
The mere combination (cascading) of these schemes has implications for the performance of the overall system. When AEC filters are applied on each channel before the ANC, the adaptive algorithms used in the AEC should be robust against the noise in the microphone signals (which is, for example, a problem for filters based upon affine projection, see chapter 3 and [27]). It is then also the ANC's task to remove the residual echo independently of the AEC. If, on the other hand, ANC is applied before AEC, the ANC is fed with a signal that also contains the far end echo signal, and the AEC has to track both the (changing) acoustic path of the room and the changes in the ANC filter. So both combination schemes clearly have their disadvantages with respect to performance.

In this chapter, we propose to combine the AEC and the ANC into one single optimisation problem which is then solved adaptively, see Figure 7.2. This leads to a better overall performance. It will be shown that the length Naec of the 'AEC part' of the integrated scheme can be reduced significantly compared to the filter length in traditional echo cancellers, without incurring a major performance loss. The reduced filter length then allows us to use more advanced adaptive algorithms, which have better convergence properties than e.g. NLMS. The algorithms in this chapter are based upon the QRD–based unconstrained optimal filtering methods for ANC described in chapters 5 and 6.

If multichannel acoustic echo cancellation were required, one would face the same problems as in chapter 3: decorrelation techniques would again have to be applied to remove the correlation between the loudspeaker signals. In this chapter we abstract from this and demonstrate the combined approach with mono echo cancellation.

The outline of this chapter is as follows. In section 7.2 we describe the setup and the optimization that will be performed. In section 7.3 the estimates for the statistics are described. Section 7.4 describes the QRD–RLS based algorithm that implements the optimization, and section 7.5 describes the QRD–LSL approach. In section 7.6 we describe how regularization (a trade–off parameter) is introduced. Section 7.7 evaluates the performance of the combined acoustic echo and noise canceller, section 7.8 gives complexity figures, and section 7.9 gives conclusions.

[Figure 7.2: Combined AENC scheme with a far end signal f, M microphone inputs x_1 … x_M feeding filters w_1 … w_M of length N (the ANC part), and a filter of length Naec connected to the far end path (the AEC part), summed into the output y. The speech component in the microphone signals is the (unknown) desired output signal d; the original speech signal is s.]

7.2 Optimal filtering based AENC

Referring to Figure 7.2, the near end speech component in the i'th microphone at time k is

d_i(k) = h_i(k) ⊗ s(k), \qquad i = 1 \ldots M,   (7.1)

where M is the number of microphones, s(k) is the near end signal, h_i(k) represents the acoustic path between the speech source and microphone i, and ⊗ denotes convolution. The echo signals
An assumption we will make is that both the noise and the echo signal are continuously present, which effectively means that in the resulting scheme filter adaptation will be frozen during off–periods of the echo and/or noise (see below). Then we can distinguish 2 modes in the input signals : first the speech+noise/echo mode, for which we will denote the microphone samples with x(k), and second the noise/echo–only mode, for which we write the inputs as x0 (k). The i’th microphone signal during a speech+noise/echo period is xi (k) = di (k) + ni (k) + ei (k) i = 1 . . . M = di (k) + vi (k), and during a noise/echo–only period x0i (k) = ni (k) + ei (k) = vi (k), where ni (k) is the noise component (sum of the contributions of all noise sources at microphone i). We define the microphone input vector x(k) = x1 (k) x2 (k) .. . xM (k) xi (k) = xi(k) xi (k − 1) .. . xi (k − N + 1) , where N is the number of taps for each of the filters in the ANC–part of the scheme. The noise/echo only microphone signal vector x0 (k), the desired speech vector d(k), the echo signal vector e(k) and the noise vectors n(k) are defined in a similar way. Furthermore x(k) = d(k) + v(k) and v(k) = n(k) + e(k). The loudspeaker signal vector is f (k) f (k − 1) f (k) = , .. . f (k − Naec + 1) where Naec is the number of taps in the AEC–part. We define a compound signal vector during speech+noise/echo periods as x(k) u(k) = , f (k) 147 7.2. OPTIMAL FILTERING BASED AENC and during noise/echo–only periods 0 u (k) = x0 (k) f (k) . The following assumptions are made : • The noise and echo signals are uncorrelated with the speech signal. This results in ε{x(k)xT (k)} = ε{d(k)dT (k)} + ε{cross terms} +ε{v(k)vT (k)} | {z } =0 ⇓ T ε{d(k)d (k)} = ε{x(k)xT (k)} − ε{v(k)vT (k)}, and ε{f (k)dT (k)} = 0. Here ε{·} is the expectation operator. • The noise and echo signals are stationary as compared to the near end speech signal (by which we mean that their statistics change slower). This assumption allows us to estimate ε{v(k)vT (k)} during periods in which only noise and echo are present, i.e. where x0 (k) = v(k). This is a classical assumption for ANC systems, but it can be argumented that it does not hold here since the echo signal e(k) typically is not stationary. However, allthough the assumption is not fullfilled for the spectral content of v(k), it is true for the spatial content, since we assume that the loudspeaker which produces f (k) does not move. Experiments confirm the validity of this assumption (see section 7.7). • The noise and echo signals are always present while the near–end signal is sometimes present, i.e. is an on/off signal. One scenario in which this assumption is obviously fullfilled is a voice command application where the echo signal is e.g. a music signal and the near end signal consists of the voice commands. When the echo signal is also a speech signal, hence also an on/off signal, we can either switch off the adaptation during periods where the far end signal is not present (as is done in traditional echo cancellers), or (when adaptation is not switched off) allow that the algorithm — during long off periods of the echo signal — ’forgets’ the position of the far end loudspeaker to which it would normally — in a beam forming interpretation — attempt to steer a zero. We can now write the optimal filtering problem as 2 min ε uT (k)Wwf − dT (k) F , Wwf (7.2) 148 CHAPTER 7. INTEGRATED NOISE AND ECHO CANCELLATION with u(k) the filter input and d(k) the desired filter output, i.e. 
We can now write the optimal filtering problem as

\min_{W_{wf}} \varepsilon\{\| u^T(k)W_{wf} - d^T(k) \|_F^2\},   (7.2)

with u(k) the filter input and d(k) the desired filter output, i.e. the (unknown) desired speech contribution in all the (delayed) microphone signals, see (7.1). The signal estimate is then

\hat{d}^T_{wf}(k) = u^T(k)W_{wf}(k) = \begin{bmatrix} d(k)+v(k) \\ f(k) \end{bmatrix}^T W_{wf}(k).

The Wiener solution is

W_{wf} = (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u(k)d^T(k)\} = (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u(k)(u^T(k)\begin{bmatrix} I \\ 0 \end{bmatrix} - v^T(k))\} = \begin{bmatrix} I \\ 0 \end{bmatrix} - (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u(k)v^T(k)\},

so that finally

W_{wf} = \begin{bmatrix} I \\ 0 \end{bmatrix} - (\varepsilon\{u(k)u^T(k)\})^{-1}\,\varepsilon\{u'(k)x'^T(k)\}.   (7.3)

Here I is the identity matrix. We will also use a regularization term in the optimization criterion. Referring to [22], a parameter µ can be used to trade off signal distortion, defined as (d^T(k) − [d^T(k)\ 0]\,W^\mu_{wf}(k)), versus residual noise/echo, defined as ([v^T(k)\ f^T(k)]\,W^\mu_{wf}). We will use a similar, but slightly different, approach. We define the optimization criterion

\min_{W^\mu_{wf}} \varepsilon\{\| u^T(k)W^\mu_{wf}(k) - d^T(k) \|_F^2\} + \mu^2\,\varepsilon\{\| [v^T(k)\ f^T(k)]\,W^\mu_{wf}(k) \|_F^2\}.   (7.4)

Now the Wiener solution is

W^\mu_{wf} = (\varepsilon\{u(k)u^T(k) + \mu^2 u'(k)u'^T(k)\})^{-1}\,\varepsilon\{u(k)d^T(k)\},

and, using ε{u(k)d^T(k)} = ε{u(k)u^T(k)}\begin{bmatrix} I \\ 0 \end{bmatrix} − ε{u(k)v^T(k)} together with ε{u(k)v^T(k)} = ε{u'(k)x'^T(k)} and ε{µ^2 u'(k)u'^T(k)}\begin{bmatrix} I \\ 0 \end{bmatrix} = µ^2\,ε{u'(k)x'^T(k)}, this reduces to

W^\mu_{wf} = \begin{bmatrix} I \\ 0 \end{bmatrix} - \left( \frac{1}{1+\mu^2}\,\varepsilon\{u(k)u^T(k)\} + \frac{\mu^2}{1+\mu^2}\,\varepsilon\{u'(k)u'^T(k)\} \right)^{-1} \varepsilon\{u'(k)x'^T(k)\}.   (7.5)

If all statistical quantities in the above formulas were available, W_{wf} and W^\mu_{wf} could straightforwardly be computed. W_{wf} (or W^\mu_{wf}) is then a matrix of which each column provides an optimal (MN + Naec)–taps filter. One of these columns can be chosen (arbitrarily) to optimally estimate the speech part in the corresponding entry of x(k), i.e. to filter out the noise/echo in one specific (delayed) microphone signal. In practice, of course, not the whole matrix is calculated, but only one selected column.

7.3 Data driven approach

A data driven approach is based on data matrices U(k), U'(k) and X'(k), conveniently defined as

U(k) = \sqrt{1-\lambda_s^2} \begin{bmatrix} u^T(k) \\ \lambda_s u^T(k-1) \\ \lambda_s^2 u^T(k-2) \\ \vdots \end{bmatrix},   (7.6)

U'(k) = \sqrt{1-\lambda_n^2} \begin{bmatrix} u'^T(k) \\ \lambda_n u'^T(k-1) \\ \lambda_n^2 u'^T(k-2) \\ \vdots \end{bmatrix},   (7.7)

X'(k) = U'(k) \begin{bmatrix} I \\ 0 \end{bmatrix},

where λ_s denotes the forgetting factor for the speech+noise/echo data, and λ_n the forgetting factor for the noise/echo–only data. In order to compute (7.3), we want U^T(k)U(k) to be an estimate of ε{u(k)u^T(k)}, i.e. ε{u(k)u^T(k)} ≈ U^T(k)U(k). This is realised by the above definition of U(k), as can be verified from the corresponding update formula

U^T(k+1)U(k+1) = \lambda_s^2 U^T(k)U(k) + (1-\lambda_s^2)\,u(k+1)u^T(k+1).   (7.8)

Such updates may be calculated during speech+noise/echo periods. The other estimate we need in (7.3) is ε{u'(k)x'^T(k)} ≈ U'^T(k)X'(k). This is realized by the definitions of U'(k) and X'(k), as can be verified from the corresponding update formula

U'^T(k+1)X'(k+1) = \lambda_n^2 U'^T(k)X'(k) + (1-\lambda_n^2)\,u'(k+1)x'^T(k+1),   (7.9)

which can be calculated during noise/echo–only periods. Although (7.8) and (7.9) ensure that the estimates are correct, it may be interesting in a practical application to divide both (7.8) and (7.9) by min((1−λ_n^2), (1−λ_s^2)), in order to avoid multiplication and division by very small numbers, and then correct for this in the final result.
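The recursions (7.8) and (7.9) are plain exponentially weighted rank–one updates. The sketch below (toy data, our own names; the chosen right hand side column is arbitrary) applies them directly and prints the rescaling factor mentioned above.

    import numpy as np

    rng = np.random.default_rng(3)
    L = 5                                   # toy dimension of u(k)
    lam_s, lam_n = 0.999, 0.997
    scale = min(1 - lam_n**2, 1 - lam_s**2) # optional rescaling to avoid tiny numbers

    Ruu = np.zeros((L, L))                  # estimate of U^T(k) U(k), eq. (7.8)
    Rux = np.zeros((L, 1))                  # one column of U'^T(k) X'(k), eq. (7.9)

    def speech_update(Ruu, u):
        return lam_s**2 * Ruu + (1 - lam_s**2) * np.outer(u, u)

    def noise_update(Rux, u_prime, x_prime_col):
        return lam_n**2 * Rux + (1 - lam_n**2) * np.outer(u_prime, x_prime_col)

    for _ in range(1000):                   # speech+noise/echo samples
        Ruu = speech_update(Ruu, rng.standard_normal(L))
    for _ in range(1000):                   # noise/echo-only samples
        up = rng.standard_normal(L)
        Rux = noise_update(Rux, up, up[:1]) # desired column: first entry of x'(k)

    print(np.trace(Ruu), Rux.ravel()[:3])
    print("a rescaled variant would divide both updates by", scale)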
For the theoretical derivation of the algorithm, we will continue to work with the unmodified equations (7.8) and (7.9).

7.4 QRD-RLS based algorithm

Using the QR decomposition $U(k) = Q(k)R(k)$, so that $U^T(k)U(k) = R^T(k)R(k)$, we can write (7.3) as

$$W(k) = \begin{bmatrix}I\\0\end{bmatrix} - \left(R^T(k)R(k)\right)^{-1}U'^T(k)X'(k) = \begin{bmatrix}I\\0\end{bmatrix} - R^{-1}(k)R^{-T}(k)U'^T(k)X'(k). \qquad (7.10)$$

Define $W^N(k) = \begin{bmatrix}I\\0\end{bmatrix} - W(k)$; then

$$W^N(k) = R^{-1}(k)\underbrace{R^{-T}(k)U'^T(k)X'(k)}_{\equiv B(k)}. \qquad (7.11)$$

We will again store and update both $R(k)$ and $B(k)$, so that at any time $W^N(k)$ can be computed by backsubstitution:

$$R(k)\,W^N(k) = B(k). \qquad (7.12)$$

Only one column of $B(k)$ has to be stored and updated, thus providing a signal or noise/echo estimate for the corresponding microphone signal.

7.4.1 Speech+noise/echo updates

$R(k)$ and $B(k)$ can be updated as

$$\begin{bmatrix} R(k+1) & B(k+1) \\ 0 & r^T(k+1) \end{bmatrix} = Q^T(k+1)\begin{bmatrix} \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \\ \sqrt{1-\lambda_s^2}\,u^T(k+1) & 0 \end{bmatrix}. \qquad (7.13)$$

The optimal filter coefficients can be computed by backsubstitution (equation (7.12)), or the least squares residuals can be obtained by multiplying the elements of $r(k+1)$ in formula (7.13) by the product of the cosines of the Givens rotation angles. For details, we refer to chapter 5. This results in the signal flow graph (SFG) in Figure 7.3 for M = 2, N = 4, Naec = 6. All signal flow graphs shown in this chapter again have rearranged input vectors $\tilde{u}(k)$ instead of $u(k)$, as follows:

$$\tilde{u}(k) = \begin{bmatrix} x_1(k) \\ \vdots \\ x_M(k) \\ f(k) \\ \hline x_1(k-1) \\ \vdots \\ x_M(k-N+1) \\ f(k-N+1) \\ \hline f(k-N) \\ \vdots \\ f(k-N_{aec}+1) \end{bmatrix},$$

i.e. the microphone and loudspeaker taps are interleaved per lag up to lag $N-1$, followed by the remaining loudspeaker taps. The residuals $y_n(k)$ generated by this SFG are the noise+echo signal estimates:

$$y_n^T(k+1) = \left(0 - u^T(k+1)\,W^N(k+1)\right)\sqrt{1-\lambda_s^2}. \qquad (7.14)$$

The overall output signal (the estimate for the near-end speech signal) can then be written as

$$\hat{d}(k+1) = \begin{bmatrix} u_1(k+1) \\ u_2(k+1) \\ \vdots \\ u_M(k+1) \end{bmatrix} - \frac{y_n(k+1)}{\sqrt{1-\lambda_s^2}}.$$

7.4.2 Noise/echo-only updates

During noise/echo-only periods, $R(k)$ remains unchanged, while $B(k) = R^{-T}(k)U'^T(k)X'(k)$ has to be updated. From equation (7.9), we find that

$$B(k+1) = \lambda_n^2\,B(k) + \left(R^{-T}(k+1)\sqrt{1-\lambda_n^2}\,u'(k+1)\right)\sqrt{1-\lambda_n^2}\,x'^T(k+1).$$

Given $R(k+1)$, we can compute $a(k+1) = R^{-T}(k+1)u'(k+1)$ by substitution in the triangular system

$$R^T(k+1)\,a(k+1) = u'(k+1),$$

so that

$$B(k+1) = \lambda_n^2\,B(k) + (1-\lambda_n^2)\,a(k+1)\,x'^T(k+1),$$

which should be substituted into the memory cells on the right hand side of Figure 7.3 during noise/echo-only mode.

Figure 7.3: Signal flow graph for residual extraction (inputs: mic 1 $x_1(k)$, mic 2 $x_2(k)$ and the echo reference $f(k)$; the graph consists of Givens rotation cells, $\lambda$- and $1/\lambda$-weighted memory cells and a $\Pi\cos\theta$ residual extraction stage producing the LS residual $y_1(k)$). Naec = 6, N = 4, M = 2. The signal flow graph is executed during speech+noise/echo mode, while only the memory elements in the right hand side frame are updated during noise/echo-only mode (as described in section 7.4.2).

It is, just as in chapter 5, also possible to generate residuals in noise/echo-only mode, by executing the signal flow graph in 'frozen mode'. A small numerical sketch of the noise/echo-only update of $B(k)$ is given below.
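The following sketch spells out the noise/echo-only update of section 7.4.2 and the backsubstitution (7.12). The triangular factor, the stored column of B and all data are random toy values, and the sizes are assumptions chosen to match the Figure 7.3 example; this is an illustration of the update formulas, not of the full signal flow graph.

import numpy as np
from scipy.linalg import solve_triangular

def noise_only_B_update(R, B, u0, x0, lam_n):
    # Section 7.4.2: R stays fixed during noise/echo-only periods, while
    # B <- lam_n^2 * B + (1 - lam_n^2) * a x'^T   with   R^T a = u'(k+1).
    a = solve_triangular(R, u0, trans='T', lower=False)    # solve R^T a = u0
    return lam_n**2 * B + (1 - lam_n**2) * np.outer(a, x0)

def solve_WN(R, B):
    # Backsubstitution (7.12): R W^N = B (one or more stored columns of B).
    return solve_triangular(R, B, lower=False)

# Toy usage with a well conditioned random upper triangular R of size M*N + Naec = 14.
rng = np.random.default_rng(1)
dim = 14
R = np.triu(rng.standard_normal((dim, dim))) + 5.0 * np.eye(dim)
B = rng.standard_normal((dim, 1))                           # one stored column of B(k)
B = noise_only_B_update(R, B,
                        u0=rng.standard_normal(dim),        # u'(k+1)
                        x0=rng.standard_normal(1),          # x'(k+1) for the selected mic
                        lam_n=0.9997)
WN = solve_WN(R, B)                                         # one column of W^N(k+1)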
An algorithm description can be found in Algorithm 18.

Algorithm 18: QRD-RLS algorithm for AENC

QRDRLS_AENC_update (R, z, x, r, Weight)
{
  // R : (M*N + Naec) x (M*N + Naec) upper triangular factor
  // z : stored right hand side column (one column of B)
  // x : input vector (u(k) in speech+noise/echo mode, u'(k) in noise/echo mode)
  // r : right hand side input; r = x[1] in signal mode, r = 0 in noise/echo mode
  // Weight : forgetting factor (lambda_s or lambda_n)
  PiCos = 1;
  for (i = 0; i < M * N + Naec; i++) {
    R[i][i] *= Weight;
    temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]);
    sinTheta = x[i] / temp;
    cosTheta = R[i][i] / temp;
    R[i][i] = temp;                          // updated diagonal element
    for (j = i + 1; j < M * N + Naec; j++) {
      temp = R[i][j] * Weight;
      R[i][j] = cosTheta * temp + sinTheta * x[j];
      x[j]   = -sinTheta * temp + cosTheta * x[j];
    }
    temp = z[i] / Weight;                    // right hand side column is weighted by 1/Weight
    z[i] = cosTheta * temp + sinTheta * r;
    r    = -sinTheta * temp + cosTheta * r;
    PiCos *= cosTheta;
  }
  return r * PiCos;                          // least squares residual
}

7.5 QRD-LSL algorithm

We will use the QRD-LSL based algorithm for acoustic noise cancellation which was derived in chapter 6 as a basis for the QRD-LSL based algorithm for combined echo and noise cancellation. Consider the alternative minimization problem with weighting schemes (7.8) and (7.9):

$$\min_{W^N_{fast}} \left\| \begin{bmatrix} U(k) \\ \beta\,U'(k) \end{bmatrix} W^N_{fast} - \begin{bmatrix} 0 \\ \frac{1}{\beta}X'(k) \end{bmatrix} \right\|_F. \qquad (7.15)$$

The normal equations for this system are

$$\left(U^T(k)U(k) + \beta^2\,U'^T(k)U'(k)\right)W^N_{fast}(k) = U'^T(k)X'(k). \qquad (7.16)$$

These can be solved for $W^N_{fast}(k)$. If $\beta \rightarrow 0$, then $W^N_{fast}(k) \rightarrow W^N(k)$ (a small numerical illustration of this limit is given below). This scheme is updated with $u(k)$ as input and 0 as desired signal during speech+noise/echo periods, and with $u'(k)$ as input and $x'(k)$ as desired signal during noise/echo-only periods. Residual extraction can be used to obtain noise estimates, which can then be subtracted from the input signal in order to obtain clean signal estimates.

Since one will often want to use more filter taps for the AEC part than for the ANC part ($N_{aec}$ can be made larger than $N$), one can alternatively use a QRD-LSL scheme with unequal channel lengths. We refer to [46] for details; examples of signal flow graphs with unequal channel lengths are shown below. For an algorithm description, we refer to Algorithm 16, with inputs as described in this section (the input vector is extended with a channel containing the echo reference signal).

7.6 Regularized AENC

Experiments show that better noise/echo cancellation can be obtained by modifying the optimization function so that more emphasis is put on the noise/echo cancellation term, at the expense of increased signal distortion. In fact, this regularization is indispensable when combined echo and noise cancellation is involved. The corresponding optimization problem is given in (5.10). The update equation becomes

$$\begin{bmatrix} R(k+1) & B(k+1) \\ 0 & r_1^T(k+1) \\ 0 & r_2^T(k+1) \end{bmatrix} = Q^T(k+1)\begin{bmatrix} \lambda_s R(k) & \frac{1}{\lambda_s}B(k) \\ \sqrt{1-\lambda_s^2}\,u^T(k+1) & 0 \\ \sqrt{1-\lambda_s^2}\,\mu^2\,u'^T(k) & 0 \end{bmatrix}, \qquad (7.17)$$

where the regularization input $u'(k)$ is taken from a noise buffer. During noise-only periods, $B(k)$ is updated as in section 7.4.2.

7.6.1 Regularization using a noise/echo buffer

The QRD-LSL based noise cancellation algorithm with regularization from section 6.5.2 in chapter 6 can be used as a basis for a QRD-LSL based acoustic noise and echo cancelling algorithm. If $N_{aec} > N$, a QRD-LSL structure with unequal channel lengths can be used [46]. During speech+noise/echo mode, an update is done with microphone inputs $u(k)$ and regularization inputs $\mu^2 x'(k)$, taken from a noise buffer; the right hand side inputs are 0. During noise/echo mode, the inputs for the left hand side are $u'(k)$, and the inputs for the right hand side are $x'(k)$.
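The limit behaviour of (7.15)-(7.16) can be checked numerically: as beta shrinks, the solution of the regularized normal equations approaches the unregularized solution W^N from (7.11)-(7.12). The data matrices and dimensions below are arbitrary toy assumptions, used only to make the check concrete.

import numpy as np

rng = np.random.default_rng(1)
MN, Naec = 8, 6                              # assumed sizes of the ANC and AEC parts
dim = MN + Naec
U  = rng.standard_normal((200, dim))         # speech+noise/echo data matrix U(k)
U0 = rng.standard_normal((200, dim))         # noise/echo-only data matrix U'(k)
X0 = U0[:, :MN]                              # X'(k) = U'(k) [I; 0]

W_N = np.linalg.solve(U.T @ U, U0.T @ X0)    # unregularized W^N(k), cf. (7.11)

for beta in [1.0, 0.1, 0.01]:
    A = U.T @ U + beta**2 * (U0.T @ U0)      # left hand side of the normal equations (7.16)
    W_fast = np.linalg.solve(A, U0.T @ X0)
    print(beta, np.linalg.norm(W_fast - W_N))   # difference shrinks as beta -> 0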
For an algorithm description, we refer to Algorithm 17, where the input vector is extended with one channel containing the echo reference signal.

7.6.2 Mode-dependent regularization

In order to use the alternative noise cancellation algorithm from section 6.5.3 as a basis for combined noise/echo cancellation, we write (7.15) with $\beta = 1$:

$$\begin{bmatrix} U(k) \\ U'(k) \end{bmatrix} W^N_{fast} = \begin{bmatrix} 0 \\ X'(k) \end{bmatrix}.$$

The normal equations for this system are

$$\left(U^T(k)U(k) + U'^T(k)U'(k)\right)W^N_{fast} = U'^T(k)X'(k).$$

During speech+noise/echo periods, due to the weighting the system converges to

$$U^T(k)U(k)\,W^N_{fast} = U'^T(k)X'(k),$$

i.e., neglecting the speech-disturbance cross terms (cf. the first assumption in section 7.2),

$$\left(\begin{bmatrix} V(k) & F(k) \end{bmatrix}^T\begin{bmatrix} V(k) & F(k) \end{bmatrix} + \begin{bmatrix} D(k) & 0 \end{bmatrix}^T\begin{bmatrix} D(k) & 0 \end{bmatrix}\right)W^N_{fast} = U'^T(k)X'(k),$$

where $D(k)$ is the desired speech signal. In the QRD-LSL filter in Figure 7.4, this is achieved by first weighting both the left hand side and the right hand side with $\lambda_s$, and then applying a left hand side input $u(k)$ and a right hand side input 0. During noise/echo-only periods, the system converges to

$$U'^T(k)U'(k)\,W^N_{fast} = U'^T(k)X'(k),$$

such that after convergence

$$W^N_{fast} = \begin{bmatrix} I \\ 0 \end{bmatrix}.$$

In this mode, both the left and right hand sides of Figure 7.4 are weighted with $\lambda_n$, the input $u'(k)$ is applied to the left hand side, and $x'(k)$ to the right hand side. During transitions between modes, pre- and post-windowing should be used, as explained in chapter 6. For an algorithm description we refer to Algorithm 6, with inputs as described in this section.

7.7 Performance

To set up a performance comparison, we have implemented a conventional cascaded multichannel scheme (right hand side of Figure 7.1), consisting of two blocks: first the echo is removed from each of the microphone channels, and then the signals are processed by a noise cancellation scheme. For the echo cancellers we have chosen an RLS algorithm (QRD-lattice), which is not often used in practice because of its complexity, but which ensures that we obtain the best possible result for the two-block scheme. The noise cancellation algorithm used is the QRD-based scheme from [56]. This 'traditional' setup is compared to the integrated approach from section 7.6.2.

The sampling frequency was 8 kHz. A simulated room environment was used, with 4 microphones spaced 20 centimeters apart. The near-end speaker is located at about 10 degrees from broadside, a white noise source at 45 degrees, and the loudspeaker for the far-end signal at -45 degrees. The near-end speaker utters a phrase with decreasing energy, so the signal to noise+echo ratio varies from -10 dB at the beginning of the phrase to -40 dB at the end (Figure 7.5 shows some utterances of the phrase used). The signal to noise ratio varies from +13 dB at the beginning of the phrase to -14 dB at the end.

The parameters used are $\lambda_{echo,trad} = 0.9997$ for the forgetting factors of the RLS-based traditional echo cancellers, and $\lambda_{s,trad} = \lambda_{n,trad} = 0.9997$ for the forgetting factors of the noise cancellation algorithm in the traditional setup. No regularization was applied here. For the new method, we have chosen $\lambda_s = 0.999999$ and $\lambda_n = 0.9997$. While the new method does incorporate regularization, the simulations will show that it nevertheless results in less signal distortion. The simulations compare only speech+noise/echo periods, since the new algorithm suppresses all signal during noise/echo-only periods, which would not yield a relevant comparison. All speech/noise detection has been done manually (i.e. a perfect speech detector is assumed).
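Before turning to the results, the noise/echo-only limit claimed in section 7.6.2 (convergence of the mode-dependent scheme to [I; 0]) can be verified with a small least squares experiment. The dimensions and the random data below are assumptions made purely for this sanity check; only the algebraic property is illustrated.

import numpy as np

rng = np.random.default_rng(2)
MN, Naec = 6, 4
U0 = rng.standard_normal((500, MN + Naec))   # rows u'(k)^T = [x'(k)^T  f(k)^T]
X0 = U0[:, :MN]                              # x'(k) is exactly the microphone part of u'(k)

# Least squares solution of U'(k) W = X'(k); since X'(k) = U'(k) [I; 0] and U'(k)
# has full column rank, the unique solution is W = [I; 0].
W, *_ = np.linalg.lstsq(U0, X0, rcond=None)
print(np.allclose(W[:MN], np.eye(MN)), np.allclose(W[MN:], 0.0))   # True True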
Figure 7.6 and Figure 7.7 show that the integrated approach outperforms the conventional method for a simulated acoustic path of 200 taps, with an echo canceller length Naec = 200, M = 4 microphones, and N = 40 taps per microphone. Both algorithms operate in speech+noise/echo mode in these plots. The valleys (speech pauses) are up to 20 dB lower for the combined algorithm (which means more noise reduction), while the peaks are slightly higher, which shows that there is less signal distortion.

Figure 7.4: QRD-LSL AENC scheme (left hand side inputs: mic 1 $x_1(k)$, mic 2 $x_2(k)$ and the echo reference; right hand side input: $x_3(k)$). During speech+noise/echo periods $\lambda = \lambda_s$, resulting in a long window, and during noise/echo-only periods $\lambda = \lambda_n$, resulting in a shorter window.

Figure 7.5: Some utterances of the phrase which was used for the simulations (energy in dB versus time in samples). The lower curve is the energy of the clean speech signal which reaches the microphone, while the upper curve is the energy of one of the microphone signals. This shows that the SENR varies from -10 dB to -40 dB in each utterance.

Figure 7.6: Comparison of the output signal of the cascaded scheme (black) and the integrated approach (light gray). Both algorithms operate in speech+noise/echo mode.

Figure 7.7: Comparison of the energy (in dB) in the output of the cascaded scheme (ANC following an AEC) and of the integrated approach described here. Notice the deeper valleys (more noise cancellation during speech pauses) and the higher peaks (less signal distortion during speech) for the integrated algorithm. Both algorithms operate in speech+noise/echo mode.

Figure 7.8: Performance of the integrated scheme when undermodelling the echo path (curves for N = 10 and N = 20 combined with Naec = 100 and Naec = 200).

Figure 7.9: Comparison for undermodelling: the cascaded approach (full line) cannot handle this situation as well as the integrated approach (dashed line); in both cases M = 4, Naec = 100, N = 40.

Figure 7.10: For almost the same total number of taps, the cascaded approach with sufficient order for the echo path (M = 4, Naec = 200, N = 20, 280 taps in total, full line) performs worse than the integrated approach with an undermodelled echo path (M = 4, Naec = 100, N = 40, 260 taps in total, dotted line). During the pauses between the word utterances, the difference is very large.
Figure 7.11: Even when both algorithms use N = 40 and M = 4, and the cascaded scheme has sufficient order for the echo path (Naec = 200) while the new scheme is undermodelled (Naec = 100, dashed line), the integrated scheme still performs better, because the noise/echo information is not processed in two independent stages.

The performance of the integrated approach for the case of undermodelling of the echo path, which is often the case in a realistic situation, is shown in Figure 7.8. A comparison of this situation with the cascaded scheme is depicted in Figure 7.9. The (undermodelling) echo cancellers in the cascaded scheme produce a large instantaneous misadjustment, which is due to the non-stationarity of the far-end signal [57]. The independently adapted noise cancellation filter in the cascaded scheme cannot compensate for this, since its input signal is disturbed by the behaviour of the first (AEC) block. The integrated approach is shown to handle this situation far better.

In Figure 7.10, an integrated scheme with an undermodelled echo canceller part is compared with a cascaded scheme with a sufficient order model of the echo path and (about) the same total number of filter taps. Here too the integrated approach outperforms the conventional cascaded scheme.

Finally, Figure 7.12 shows that the echo canceller filter can indeed be made shorter thanks to the advantageous effect of adding the noise filters. The performance of a combined scheme with the noise filters in a noise free environment is better than without the noise filters.

Figure 7.12: Simulation in a noise free environment shows that echo cancelling is aided by the M length-N filters in the signal path. The dotted line is a combined scheme with a 300 tap echo filter and 4 channels with 25 tap noise filters each. It is better than a 300 tap traditional echo canceller alone (full line).

7.8 Complexity

In the complexity calculations, an addition and a multiplication are counted as two separate floating point operations. Table 7.1 shows the complexities of the algorithms.

Algorithm                  Complexity
Full QRD, noise            (MN + Naec)^2 + 3M(N + Naec) + M
Full QRD, speech           3.5(MN + Naec)^2 + 15.5(MN + Naec) + M + 2
Regul. Full QRD, speech    7.5(MN + Naec)^2 + 34.5(MN + Naec)
Cont. Upd. QRD-LSL         (21N - 10.5)(M + 1) + 19(M + 1)N - 3.5(M + 1) + 3 + 21(Naec - N)

Table 7.1: Complexities of different algorithms in flops per sample. M is the number of microphones, Naec is the filter length of the AEC part, and N is the number of filter taps per microphone channel in the ANC part.

For a typical setting in a car environment (Naec = 200, N = 10, M = 3), the complexity of the new continuously updating QRD-LSL based technique is 7912 flops per sample. This is to be compared with the complexity of a cascaded scheme. The QRD-LSL based noise cancellation algorithm we have derived in [51] has a complexity of 2346 flops per sample for these settings. An NLMS-based echo canceller would have a complexity of 800 flops per sample with these parameters. A cascaded scheme with first echo cancelling (one NLMS echo canceller per microphone) and then a QRD-LSL based noise cancellation scheme would thus amount to 3*800 + 2346 = 4746 flops per sample.
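The comparison between the cascaded and the integrated scheme can be spelled out with the figures quoted above. The 4*Naec flop count for NLMS is an assumption (roughly two multiplications and two additions per tap per sample); it happens to reproduce the 800 flops per sample quoted in the text, while the 2346 and 7912 flop figures are simply taken over from the text.

def nlms_flops(n_taps):
    # Rough NLMS cost per sample: about 2 multiplications and 2 additions per tap.
    return 4 * n_taps

M, Naec = 3, 200
cascaded   = M * nlms_flops(Naec) + 2346   # one NLMS AEC per mic + QRD-LSL ANC of [51]
integrated = 7912                          # continuously updating QRD-LSL AENC (text value)
print(cascaded, integrated, integrated / cascaded)   # 4746 7912 ~1.67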
7.9 Conclusion

In this chapter, we have extended both QRD-RLS and QRD-LSL based schemes for noise cancellation with an extra echo-reference input signal, thus obtaining schemes which handle combined noise and echo cancellation as one single optimization problem. We have shown by simulations that the performance is better when such a global optimization problem is solved than when the traditional cascading approach is used. The complexity and performance figures show that, although somewhat more complex, the better performing QRD-LSL based approach presented here can be applied for real time processing as an alternative to cascading techniques.

Chapter 8

Conclusions

In this thesis we have developed a number of techniques which can be used to 'clean up' a speech signal picked up in an adverse acoustic environment. We have made a distinction between disturbances for which a reference signal is available, and disturbances for which no reference signal is available. The first type of disturbance gives rise to techniques which we classify as 'acoustic echo cancellation' (AEC) techniques, while the second type of disturbance can be reduced by 'noise cancellation' (ANC) techniques. Acoustic echo cancellation is treated in the first part of the text, acoustic noise cancellation in the second part, and the combination of both is discussed in the third part.

Acoustic echo cancellation

The NLMS algorithm is a cheap algorithm which may exhibit performance problems when non-white input signals are used. The RLS algorithm, on the other hand, performs very well even for non-white signals, but is much more expensive. A class of 'intermediate' algorithms is the APA family of adaptive filters. RLS and APA can be seen as an NLMS filter with pre-whitening applied. Not too long ago, only NLMS filters and even cheaper frequency domain variants were used to implement acoustic echo cancellation, because of the long adaptive filters involved, although both RLS and APA are well known to perform better. Due to the increase in computing power over the years, APA filters increasingly find their way into this field, notably when multichannel acoustic echo cancellation is involved.

We have shown that if affine projection techniques are used for acoustic echo cancellation, it is important to provide sufficient regularization in order to obtain robustness against continuously present near-end background noise. This is important in single channel echo cancellation, but even more so in the multichannel case, where the cross-correlation between the loudspeaker signals (and hence between the input signals of the adaptive filter) leads to ill-conditioning of the problem.

We have pointed out the advantages and disadvantages of the FAP algorithm. In its traditional version, what we have called 'explicit regularization' can easily be incorporated, because it uses the FTF algorithm for updating the size P correlation matrix. On the other hand, it makes assumptions concerning how much regularization is used, and it exhibits problems when an exponential weighting technique is used for regularization. Another disadvantage is that periodic restarting of the estimation is necessary due to numerical stability problems in the FTF algorithm. We have proposed to replace the update of the small correlation matrix of size P by a QRD-based updating procedure, which is numerically stable.
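As a reference point for the regularization discussion above, the following is a minimal sketch of a generic affine projection update with diagonal loading. It is a textbook-style formulation, not the specific (sparse or block exact) variants developed in this thesis; the step size mu and loading delta are free parameters, and for projection order P = 1 the update reduces to regularized NLMS.

import numpy as np

def apa_update(w, X, d, mu=1.0, delta=1e-2):
    # One regularized affine projection update of order P:
    #   e = d - X^T w
    #   w <- w + mu * X (X^T X + delta I)^{-1} e
    # X is Ltaps x P with the P most recent input vectors as columns,
    # d holds the corresponding desired samples, delta is the diagonal loading.
    e = d - X.T @ w
    P = X.shape[1]
    return w + mu * (X @ np.linalg.solve(X.T @ X + delta * np.eye(P), e))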
Explicit regularization is not easily implemented in this QRD-based approach, and therefore we have described an alternative approach, which we have called the 'sparse equations' technique. This technique regularizes the affine projection algorithm when it is used with signals that have a large autocorrelation only for small lags (e.g. speech). It can also be used as a stand-alone regularization technique. Unfortunately, this technique violates the assumptions on which the FAP algorithm is based, which motivates the development of a fast APA algorithm that does not rely on the assumptions present in FAP. The main reason for developing such an algorithm, however, is the fact that FAP exhibits problems if much regularization is used.

We have thus derived an exact frequency domain version of the affine projection algorithm, named Block Exact APA. This algorithm has a complexity comparable to that of a frequency domain version of fast affine projection (namely BEFAP), and since it is an exact implementation of APA, the convergence characteristics of the original affine projection algorithm are maintained when regularization is applied, while this is not the case when FAP-based fast versions of APA are used. This algorithm was extended to allow the 'sparse equations' regularization technique to be used.

Acoustic Noise Cancellation

In the literature, several noise cancellation schemes can be found. Most of these schemes use multiple microphones in order to take advantage of the spatial characteristics of both speech and noise. Apart from the classical beamforming approaches, unconstrained optimal filtering approaches also exist. Traditionally these have been based upon singular value decomposition techniques, which inherently have a large complexity.

We have derived a new QRD-based algorithm for unconstrained optimal multichannel filtering with an 'unknown' desired signal, and applied it to adaptive acoustic noise suppression. The same basic problem is solved as in related algorithms from the literature, but due to the high computational complexity of the SVD algorithm used in the traditional techniques, approximations (SVD tracking) are often introduced in order to keep the complexity manageable. Using the QRD approach results in a performance which is the same as that of the SVD-based algorithms, or even better, since no approximations are required. The complexity of the QRD-based optimal filtering technique is an order of magnitude lower than that of the (approximating) SVD-based approaches. We have also introduced a trade-off parameter which allows more noise reduction to be obtained in exchange for some (tolerable) signal distortion.

Besides the QRD-based approach, we have also derived a fast QRD-LSL based algorithm and applied it to the 'unknown desired signal' case which is encountered in acoustic noise cancellation. This algorithm is based on a significantly reorganized version of the QRD-RLS based unconstrained optimal filtering scheme. While the QRD-based unconstrained optimal filtering algorithm has a complexity which is an order of magnitude lower than that of the (approximating) SVD-tracking based algorithm, fast QRD-LSL based unconstrained optimal filtering achieves a complexity which is about another factor of 8 lower than that of QRD-based unconstrained optimal filtering (for typical parameter settings).
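The equivalence between the SVD-based and QRD-based routes can be illustrated on a single batch least squares problem: both factorizations yield the same optimal filter, and the point of the recursive QRD schemes is that the QR factor is far cheaper to keep up to date than a (tracked) SVD. The data below are random toy values; the sketch shows the algebraic equivalence only, not the recursive algorithms themselves.

import numpy as np

rng = np.random.default_rng(3)
U = rng.standard_normal((300, 20))     # data matrix (e.g. stacked input vectors)
b = rng.standard_normal(300)           # desired signal column

# SVD route: U = Us diag(s) Vt, least squares solution w = V diag(1/s) Us^T b
Us, s, Vt = np.linalg.svd(U, full_matrices=False)
w_svd = Vt.T @ ((Us.T @ b) / s)

# QRD route: U = Q R, then solve R w = Q^T b by backsubstitution
Q, R = np.linalg.qr(U)
w_qrd = np.linalg.solve(R, Q.T @ b)

print(np.allclose(w_svd, w_qrd))       # True: both give the same optimal filter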
Combination of echo and noise cancellation

We have extended both QRD-RLS and QRD-LSL based schemes for noise cancellation with an extra echo-reference input signal, thus proposing schemes which handle combined noise and echo cancellation as one single optimization problem. We have shown by simulations that the performance is better when such a global optimization problem is solved than when the traditional cascading approach is used. The complexity and performance figures show that, although somewhat more complex, the better performing QRD-LSL based approach can be applied for real time processing as an alternative to cascading techniques.

Further research

In the field of acoustic echo cancellation, no 'perfect' solutions exist yet for multichannel decorrelation. For speech signals, non-linearities such as half-wave rectifiers provide sufficiently good results, but in applications where multichannel audio is involved (voice command applications for audio devices), these solutions introduce intolerable distortion. This subject clearly requires more research. The adaptive filtering techniques which form the core of acoustic echo cancellers are well explored. For cheap consumer products NLMS and frequency domain adaptive filters can be used, while a whole range of better (and more expensive) algorithms exists if one can afford the extra complexity.

For the class of noise cancellation algorithms we have described in this thesis, namely the unconstrained MMSE-optimal filtering class, only SVD-based and (as derived in this text) QRD-based algorithms exist. An interesting subject for future research would be to investigate whether this problem can be handled by 'cheaper' adaptive filtering algorithms, such as APA-based filters. We refer to [23], where the NLMS algorithm is used to implement unconstrained optimal filtering for multichannel noise cancellation. Finding cheaper algorithms is even more important when the combination of echo and noise reduction is considered, as in chapter 7, since the filter length corresponding to the echo path is usually much larger than that of the filters used in the noise reduction part of the algorithm. In traditional setups, where echo and noise cancellation were handled in two separate cascaded schemes, cheap filters could be used for the long paths in the echo canceller, while more complex algorithms could be used for the shorter noise reduction paths. But the experiments in chapter 7 clearly indicate that there is an advantage in solving the combined problem as a whole, so it would be interesting to invest time in trying to reduce the complexity of the integrated optimal filtering approach.

Bibliography

[1] M. Ali. Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation. In ICASSP. IEEE, 1998.

[2] F. Amand, J. Benesty, A. Gilloire, and Y. Grenier. A fast two-channel projection algorithm for stereophonic acoustic echo cancellation. In ICASSP96. IEEE, 1996.

[3] Duncan Bees, Maier Blostein, and Peter Kabal. Reverberant speech enhancement using cepstral processing. In Proceedings of the 1991 IEEE Int. Conf. on Acoust., Speech and Signal Processing, pages 977-980. IEEE, May 1991.

[4] J. Benesty, F. Amand, A. Gilloire, and Y. Grenier. Adaptive filtering algorithms for stereophonic acoustic echo cancellation. In ICASSP, pages 3099-3102. IEEE, 1995.

[5] J. Benesty, A. Gilloire, and Y. Grenier.
A frequency domain stereophonic acoustic echo canceller exploiting the coherence between the channels and using nonlinear transformations. In Proceedings of International Workshop on Acoustics and Echo Cancelling (IWAENC99), pages 28–31. IEEE, 1999. [6] J. Benesty, D. R. Morgan, and J. L. Hall adn M. M. Sondhi. Synthesised stereo combined with acoustic echo cancellation for desktop conferencing. In Proceedings of ICASSP99, 1999. [7] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi. Stereophonic acoustic echo cancellation using nonlinear transformations and comb filtering. In ICASSP. IEEE, 1998. [8] F. Capman, J. Boudy, and P. Lockwood. Acoustic echo cancellation using a fast qr-rls algorithm and multirate schemes. Proceedings of ICASSP, pages 969–972, 1995. [9] F. Capman, J. Boudy, and P. Lockwood. Controlled convergence of qr least squares adaptive algorithms — application to speech echo cancellation. In Proceedings of ICASSP, pages 2297–2300. IEEE, 1997. 169 170 BIBLIOGRAPHY [10] C. Carlemalm, F. Gustafsson, and B. Wahlberg. On the problem of detection and discrimination of double talk and change in the echo path. In ICASSP Conference Proceedings. IEEE, ? [11] S. Doclo, E. De Clippel, and M. Moonen. Multi–microphone noise reduction using gsvd–based optimal filtering with anc postprocessing stage. In Proc. of the 9th IEEE DSP Workshop, Hunt TX, USA. IEEE, Oct. 2000. [12] S. Doclo and M. Moonen. SVD–based optimal filtering with applications to noise reduction in speech signals. In Proc. of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA’99), New Paltz NY, USA, pages 143–146. IEEE, Oct 1999. [13] S. Doclo and M. Moonen. Noise reduction in multi-microphone speech signals using recursive and approximate GSVD–based optimal filtering. In Proc. of the IEEE Benelux Signal Processing Symposium (SPS-2000), Hilvarenbeek, The Netherlands, March 2000. [14] S. Doclo and M. Moonen. GSVD-Based Optimal Filtering for Multi-Microphone Speech Enhancement, chapter 6 in “Microphone Arrays: Signal Processing Techniques and Applications” (Brandstein, M. S. and Ward, D. B., Eds.), pages 111–132. Springer-Verlag, May 2001. [15] S. Doclo and M. Moonen. GSVD-based optimal filtering for single and multimicrophone speech enhancement. IEEE Trans. Signal Processing, 50(9):2230– 2244, September 2002. [16] Matthias Dörbecker and Stefan Ernst. Combination of two–channel spectral subtraction and adaptive wiener post–filtering for noise reduction and dereverberation. In Proceedings of EUSIPCO96, page 995, September 1996. [17] P. Dreiseitel, E. Hansler, and H. Puder. Acoustic echo and noise control — a long lasting challenge, 1998. [18] K. Eneman. Subband and Frequency–Domain Adaptive Filtering Techniques for Speech Enhancement in Hands–free Communication. PhD thesis, Katholieke Universiteit Leuven, Heverlee, Belgium, March 2002. [19] K. Eneman and M. Moonen. Hybrid Subband/Frequency–Domain Adaptive Systems. Signal Processing, 81(1):117–136, January 2001. [20] P. Eneroth, T. Gänsler, S. Gay, and J. Benesty. Studies of a wideband stereophonic acoustic echo canceler. In Proc. 1999 IEEE Workshop on applications of Signal Processing to Audio and Acoustics, pages 207–210. IEEE, October 1999. [21] P. Eneroth, S. Gay, T. Gänsler, and J. Benesty. A hybrid frls/nlms stereo acoustic echo canceller. In Proceedings of IWAENC, 1999. BIBLIOGRAPHY 171 [22] Y. Ephraim and H. L. Van Trees. A signal subspace approach for speech enhancement. 
IEEE Transactions on speech and audio processing, 3(4):251–266, july 1995. [23] D. A. F. Florencio and H. S. Malvar. Multichannel filtering for optimum noise reduction in microphone arrays. In EEE International Conference on Acoustics, Speech, and Signal Processing, pages 197–200. IEEE, May 2001. [24] T. Gänsler and J. Benesty. Stereophonic acoustic echo cancellation and two– channel adaptive filtering : an overview. International Journal of Adaptive Control and Signal Processing, February 2000. [25] T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty. Double–talk robust fast converging algorithms for network echo cancellation. In Proc. 1999 IEEE Workshop on applications of Signal Processing to Audio and Acoustics, pages 215–218. IEEE, October 1999. [26] S. L. Gay and S. Tavathia. The fast affine projection algorithm. In ICASSP, pages 3023–3026. IEEE, 1995. [27] Steven Gay. Fast projection algorithms with application to voice echo cancellation. PhD thesis, Rutgers, The State University of New Jersey, New Brunswick, 1994. [28] A. Gilloire and V. Turbin. Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellers. In ICASSP. IEEE, 1998. [29] Golub and Van Loan. Matrix Computations, chapter 12. Johns Hopkins, 1996. [30] Simon Haykin. Adaptive Filter Theory. Prentice Hall, 3 edition, 1996. [31] P. Heitkämper. An adaptation control for acoustic echo cancellers. IEEE Signal processing letters, 4(6):170 – 173, June 1997. [32] Q.-G. Liu, B. Champagne, and P. Kabal. A microphone array processing technique for speech enhancement in a reverberant space. Speech Communication, 18:317–334, 1996. [33] S. Makino and S. Shimauchi. Stereophonic acoustic echo cancellation - an overview and recent solutions. In Proceedings of International Workshop on Acoustics and Echo Cancelling (IWAENC99), pages 12–19. IEEE, 1999. [34] S. Makino, K. Strauss, S. Shimauchi, Y. Haneda, and A. Nakagawa. Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path. In Proceedings of the ICASSP, pages 299 – 302. IEEE, 1997. [35] Henrique S. Malvar. Signal Processing With Lapped Transforms. Artech House, 0. 172 BIBLIOGRAPHY [36] K. Maouche and D. T. M. Slock. The fast subsampled–updating fast affine projection (fsu fap) algorithm. Research report, Institut EURECOM, 2229, route des Cretes, B.P.193, 06904 Sophia Antipolis Cedex, December 1994. [37] Rainer Martin and Peter Vary. Combined acoustic echo cancellation, dereverberation and noise reduction : a two microphone approach. In Ann. Telecommun., volume 49, pages 429–438. 1994. [38] Rainer Martin and Peter Vary. Combined acoustic echo control and noise reduction for hands–free telephony — state of the art and perspectives. In EUSIPCO96, page 1107, 1996. [39] J. G. McWhirter. Recursive least squares minimisation using a systolic array. In Proc. SPIE Real Time Signal Processing IV, volume 431, pages 105–112, 1983. [40] M. Miyoshi and Y. Kaneda. Inverse Filtering of Room Acoustics. IEEE trans. on Acoustics, Speech and Signal Proc., 36(2):145–152, February 1988. [41] M. Mohan Sondhi, D. R. Morgan, and J. L. Hall. Stereophonic acoustic echo cancellation — an overview of the fundamental problem. IEEE Signal Processing Letters, 2(8):148–151, August 1995. [42] G. V. Moustakides and S. Theodoridis. Fast newton transversal filters – a new class of adaptive estimation algorithms. IEEE Transactions on signal processing, 39(10):2184 – 2193, October 1991. [43] K. Ozeki and T. Umeda. 
An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties. Electronics and communications in Japan, 67-A(5):126 – 132, February 1984. [44] C. B. Papadias and D. T. M. Slock. New adaptive blind equalization algorithms for constant modulus constellations. In ICASSP94, pages 321–324, Adelaide, Australia, April 1994. IEEE. [45] J. Prado and E. Moulines. Frequency domain adaptive filtering with applications to acoustic echo cancellation. Ann. Telecommun, 49(7-8):414–428, 1994. [46] J. G. Proakis, C. M. Rader, F. Ling, C. L. Nikias, M. Moonen, and I. K. Proudler. Algorithms for Statistical Signal Processing. Prentice–Hall, ISBN: 0-13-062219-2, 1/e edition, 2002. [47] G. Rombouts and M. Moonen. Avoiding explicit regularisation in affine projection algorithms for acoustic echo cancellation. In Proceedings of ProRISC99, Mierlo, The Netherlands, pages 395–398, November 1999. [48] G. Rombouts and M. Moonen. A fast exact frequency domain implementation of the exponentially windowed affine projection algorithm. In Proceedings of Symposium 2000 for Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), pages 342–346, Lake Louise, Canada, 2000. BIBLIOGRAPHY 173 [49] G. Rombouts and M. Moonen. Regularized affine projection algorithms for multichannel acoustic echo cancellation. In Proceedings of IEEE-SPS2000, page cdrom, Hilvarenbeek, The Netherlands, March 2000. IEEE. [50] G. Rombouts and M. Moonen. Sparse–befap : A fast implementation of fast affine projection avoiding explicit regularisation. In Proceedings of EUSIPCO2000, pages 1871–1874, September 2000. [51] G. Rombouts and M. Moonen. Fast QRD–lattice–based optimal filtering for acoustic noise reduction. Internal Report KULEUVEN/ESAT-SISTA/TR 01-48, Submitted for publication., May 2001. [52] G. Rombouts and M. Moonen. Acoustic noise reduction by means of qrd–based optimal filtering. In Proceedings of MPCA2002, Leuven, Belgium, November 2002. [53] G. Rombouts and M. Moonen. An integrated approach to acoustic noise and echo suppression. Submitted for publication, January 2002. [54] G. Rombouts and M. Moonen. Qrd–based optimal filtering for acoustic noise reduction. In Proceedings of EUSIPCO2002, Toulouse, France, page CDROM, September 2002. [55] G. Rombouts and M. Moonen. A sparse block exact affine projection algorithm. IEEE Transactions on Speech and Audio Processing, 10(2):100–108, February 2002. [56] G. Rombouts and M. Moonen. QRD–based optimal filtering for acoustic noise reduction. Internal Report KULEUVEN/ESAT-SISTA/TR 01-47, Accepted for publication in Elsevier Signal Processing, February 2003. [57] D. W. E. Schobben and P. C. W. Sommen. On the performance of too short adaptive fir filters. In Proceedings Circuits Systems and Signal Proc. (ProRISC), Mierlo, The Netherlands, pages 545–549, November 1997. [58] S. Shimauchi, Y. Haneda, S. Makino, and Y. Kaneda. New configuration for a stereo echo canceller with nonlinear pre–processing. In ICASSP. IEEE, 1998. [59] M. Tanaka and S. Makino. A block exact fast affine projection algorithm. IEEE Transactions on Speech and Audio Processing, 7(1):79–86, January 1999. [60] D. Van Compernolle and S. Van Gerven. Beamforming with microphone arrays. In V. Cappellini and A. Figueiras-Vidal, editors, Applications of Digital Signal Processing to Telecommunications, pages 107–131. COST 229, 1995. 174 BIBLIOGRAPHY List of publications • Vandaele P., Rombouts G., Moonen M., “Implementation of an RTLS blind equalization algorithm on DSP”, in Proc. 
of the 9th IEEE International Workshop on Rapid System Prototyping, Leuven, Belgium, Jun. 1998, pp. 150-155. • Rombouts G., Moonen M., “Avoiding Explicit Regularisation in Affine Projection Algorithms for Acoustic Echo Cancellation”, in Proc. of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing (ProRISC99), Mierlo, The Netherlands, Nov. 1999, pp. 395-398. • Rombouts G., Moonen M., “A fast exact frequency domain implementation of the exponentially windowed affine projection algorithm”, in Proc. of Symposium 2000 for Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), Lake Louise, Canada, Oct. 2000, pp. 342-346. • Rombouts G., “Regularized affine projection algorithms for multichannel acoustic echo cancellation”, in Proc. of the IEEE Benelux Signal Processing Symposium (SPS2000), Hilvarenbeek, The Netherlands, Mar. 2000. • Rombouts G., Moonen M., “Sparse-BEFAP : A fast implementation of fast affine projection avoiding explicit regularisation”, in Proc. of the European Signal Processing Conference (EUSIPCO), Tampere, Finland, Sep. 2000, pp. 1871-1874. • Schier J., Vandaele P., Rombouts G., Moonen M., “Experimental implementation of the spatial division multiple access (SDMA) algorithms using DSP system with the TMS320C4x processors”, in The proceedings of the Third European DSP Education and Research Conference, Paris, France, Sept. 2000, pp. CD-ROM. • Rombouts G., Moonen M., “Acoustic noise reduction by means of QRD-based unconstrained optimal filtering”, in Proc. of the IEEE Benelux Workshop on Model based processing and coding of audio (MPCA), Leuven, Belgium, Nov. 2002. 175 • Rombouts G., Moonen M., “A sparse block exact affine projection algorithm”, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 2, Feb. 2002, pp. 100-108. • Rombouts G., Moonen M., “QRD-based optimal filtering for acoustic noise reduction”, Accepted for publication in Elsevier Signal Processing, Internal Report 01-47, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001. • Rombouts G., Moonen M., “QRD–based optimal filtering for acoustic noise reduction”, EUSIPCO 2002, Toulouse, France, CDROM. Submitted papers • Rombouts G., Moonen M., “An integrated approach to acoustic noise and echo suppression”, Internal Report 02-206, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2002, submitted to Elsevier Signal Processing. • Rombouts G., Moonen M., “Fast-QRD-based optimal filtering for acoustic noise reduction”, Internal Report 01-48, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001, resubmitted to IEEE Transactions on Speech And Audio processing for 2nd review. 176 Curriculum Vitae Geert Rombouts was born in Turnhout on august 11, 1973. He studied at the Katholieke Universiteit Leuven, faculty of applied sciences (Faculteit Toegepaste Wetenschappen) from 1991 to 1997, where he received his M.Sc. degree in electrical engineering (Burgerlijk Ingenieur Elektrotechniek) in 1997. From 1997 to 2002 he did Phd. research at the Katholieke Universiteit Leuven, faculty of applied sciences. 177