Widrow-Hoff Learning

May 20, 2010
Outline

1 Introduction
2 ADALINE Network
3 Mean Square Error
4 LMS Algorithm
5 Analysis of Convergence
6 Adaptive Filtering
Introduction
In 1960, Bernard Widrow and his doctoral student Marcian Hoff introduced the ADALINE (ADAptive LInear NEuron) network and the LMS (Least Mean Square) algorithm.
Perceptron Network
Figure: a = hardlim(Wp + b)
ADALINE Network
Figure: a = purelin(Wp + b) = Wp + b
Single ADALINE
a = purelin(n) = purelin(w_{1,1} p_1 + w_{1,2} p_2 + b) = w_{1,1} p_1 + w_{1,2} p_2 + b = {}_1 w^T p + b
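As a quick illustration, here is a minimal sketch of this forward pass in Python/NumPy (the function and variable names are ours, not from the slides); purelin is simply the identity:

    import numpy as np

    def adaline_output(w, p, b):
        """Single ADALINE: a = purelin(w^T p + b), where purelin is the identity."""
        return float(w @ p + b)

    # Two-input example with arbitrarily chosen weights and bias
    w = np.array([1.0, -0.5])   # w_{1,1}, w_{1,2}
    p = np.array([2.0, 1.0])    # p_1, p_2
    print(adaline_output(w, p, b=0.5))   # 1*2 + (-0.5)*1 + 0.5 = 2.0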
Decision Boundary
ADALINE can be used to classify objects into two categories.
Mean Square Error
In statistics, the mean square error or MSE of an estimator is
one of many ways to quantify the difference between an
estimator and the true value of the quantity being estimated.
In other words, the mean square error is a measure of how good an estimator of a distributional parameter is.
Mean Square Error (cont.)
Given a training set \{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}, where p_i is an input and t_i is the corresponding target output, we want to find the weights and biases of the ADALINE that minimize the mean square error, where the error is the difference between the target output and the network output.
Mean Square Error (cont.)
To simplify our discussion, we consider the single-neuron case. For convenience, denote

x = \begin{bmatrix} {}_1 w \\ b \end{bmatrix}, \qquad z = \begin{bmatrix} p \\ 1 \end{bmatrix}

Now we can rewrite a = {}_1 w^T p + b as a = x^T z.
Error Analysis
The ADALINE network mean square error:
F(x) = E[(t - a)^2] = E[(t - x^T z)^2]

where E[\cdot] is the expected value. We can expand F(x) as follows:

F(x) = E[t^2 - 2t x^T z + x^T z z^T x]
     = E[t^2] - 2 x^T E[tz] + x^T E[zz^T] x
     = c - 2 x^T h + x^T R x

where c = E[t^2], h = E[tz] and R = E[zz^T].
Error Analysis (cont.)
The Hessian matrix of F(x) is 2R, where R is the input correlation matrix. We use the following property: all correlation matrices are either positive definite or positive semidefinite.
If R is positive definite, we can find the strong minimum

x^* = R^{-1} h

by setting \nabla F(x) = -2h + 2Rx = 0.
Example 1
Question: Suppose that we have the following input/target pairs:

p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \; t_1 = 1, \qquad p_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \; t_2 = -1

These patterns occur with equal probability, and they are used to train an ADALINE network with no bias. Find the point that minimizes the mean square error.
Example 1 (cont.)
Solution: Since F(x) = c - 2x^T h + x^T Rx, we need to calculate c, h and R.

c = E[t^2] = (1)^2 (0.5) + (-1)^2 (0.5) = 1

h = E[tz] = (0.5)(1) \begin{bmatrix} 1 \\ 1 \end{bmatrix} + (0.5)(-1) \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}

R = E[zz^T] = (0.5) \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix} + (0.5) \begin{bmatrix} 1 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
Example 1 (cont.)
Therefore

F(x) = c - 2x^T h + x^T Rx
     = 1 - 2 \begin{bmatrix} w_{1,1} & w_{1,2} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} + \begin{bmatrix} w_{1,1} & w_{1,2} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} w_{1,1} \\ w_{1,2} \end{bmatrix}
     = 1 - 2 w_{1,2} + w_{1,1}^2 + w_{1,2}^2

Hence

x^* = R^{-1} h = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}

Then we have the minimum at w_{1,1} = 0, w_{1,2} = 1.
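As a numerical cross-check of Example 1, the following sketch (Python/NumPy; our own code, not part of the original slides) computes c, h, R and the minimizer x* = R^{-1} h from the two patterns:

    import numpy as np

    # The two patterns (no bias, so z = p) and targets, each with probability 0.5
    Z = np.array([[1.0,  1.0],
                  [1.0, -1.0]])        # rows are z_1 and z_2
    t = np.array([1.0, -1.0])
    prob = np.array([0.5, 0.5])

    c = np.sum(prob * t**2)             # c = E[t^2]
    h = (prob * t) @ Z                  # h = E[t z]
    R = (Z * prob[:, None]).T @ Z       # R = E[z z^T]
    x_star = np.linalg.solve(R, h)      # x* = R^{-1} h

    print(c)        # 1.0
    print(h)        # [0. 1.]
    print(R)        # identity matrix
    print(x_star)   # [0. 1.]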
Approximate Steepest Descent
In general, it is not desirable or convenient to calculate h and
R.
For this reason, we use an approximate steepest descent
algorithm, in which we use an estimated gradient.
Approximate Gradient
Widrow and Hoff estimate the mean square error F(x) by

\hat{F}(x) = (t(k) - a(k))^2 = e^2(k)

Then, at each iteration, we have a gradient estimate of the form

\nabla \hat{F}(x) = \nabla e^2(k) = \begin{bmatrix} \dfrac{\partial e^2(k)}{\partial w_{1,1}}, & \dfrac{\partial e^2(k)}{\partial w_{1,2}}, & \cdots, & \dfrac{\partial e^2(k)}{\partial w_{1,R}}, & \dfrac{\partial e^2(k)}{\partial b} \end{bmatrix}^T
Approximate Gradient (cont.)
For each j = 1, \ldots, R,

\frac{\partial e^2(k)}{\partial w_{1,j}} = 2e(k) \frac{\partial e(k)}{\partial w_{1,j}} = 2e(k) \frac{\partial [t(k) - a(k)]}{\partial w_{1,j}}
= 2e(k) \frac{\partial [t(k) - ({}_1 w^T p(k) + b)]}{\partial w_{1,j}}
= 2e(k) \frac{\partial}{\partial w_{1,j}} \left[ t(k) - \left( \sum_{i=1}^{R} w_{1,i} p_i(k) + b \right) \right]
= -2e(k) p_j(k)

Similarly, we can obtain

\frac{\partial e^2(k)}{\partial b} = 2e(k) \frac{\partial}{\partial b} \left[ t(k) - \left( \sum_{i=1}^{R} w_{1,i} p_i(k) + b \right) \right] = -2e(k)
Approximate Gradient (cont.)
Note that z(k) = \begin{bmatrix} p_1(k), & p_2(k), & \cdots, & p_R(k), & 1 \end{bmatrix}^T. Thus

\nabla \hat{F}(x) = \nabla e^2(k) = \begin{bmatrix} -2e(k)p_1(k), & \cdots, & -2e(k)p_R(k), & -2e(k) \end{bmatrix}^T = -2e(k) z(k)

To calculate \nabla \hat{F}(x), we only need to multiply the input by the error at iteration k.
LMS Algorithm
Recall the steepest descent algorithm with constant learning rate:

x_{k+1} = x_k - \alpha \nabla F(x) \big|_{x = x_k}

We substitute \nabla \hat{F}(x) for \nabla F(x), then

x_{k+1} = x_k - \alpha \nabla \hat{F}(x) = x_k + 2\alpha e(k) z(k)

which means

{}_1 w(k+1) = {}_1 w(k) + 2\alpha e(k) p(k)
b(k+1) = b(k) + 2\alpha e(k)
LMS Algorithm (cont.)
Thus for multiple neurons, the LMS algorithm can be rewritten in matrix form:

W(k+1) = W(k) + 2\alpha e(k) p^T(k)
b(k+1) = b(k) + 2\alpha e(k)
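These update rules translate directly into code. The following is a minimal sketch in Python/NumPy (the function name and loop structure are ours, not from the slides) of repeated LMS updates over a set of input/target pairs:

    import numpy as np

    def lms_train(W, b, inputs, targets, alpha, epochs=1):
        """Widrow-Hoff updates: W <- W + 2*alpha*e*p^T, b <- b + 2*alpha*e."""
        for _ in range(epochs):
            for p, t in zip(inputs, targets):
                a = W @ p + b                       # network output (purelin)
                e = t - a                           # error vector
                W = W + 2 * alpha * np.outer(e, p)  # weight update
                b = b + 2 * alpha * e               # bias update
        return W, b

    # Usage: single neuron, two inputs, the patterns used in the examples of this section
    inputs = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
    targets = [np.array([1.0]), np.array([-1.0])]
    W, b = lms_train(np.zeros((1, 2)), np.zeros(1), inputs, targets, alpha=0.25, epochs=50)
    print(W, b)   # W approaches [[0, 1]] and b approaches [0]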
Example 2
Question: Suppose that we have the following input/target pairs:

p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \; t_1 = 1, \qquad p_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \; t_2 = -1

These patterns occur with equal probability. Train an ADALINE network with no bias using the LMS algorithm, with the initial guess set to zero and a learning rate α = 0.25.
Example 2 (cont.)
Solution: Apply p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, t_1 = 1:

a(0) = purelin\left( \begin{bmatrix} 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) = 0

e(0) = t(0) - a(0) = 1 - 0 = 1

Thus

W(1) = W(0) + 2\alpha e(0) p^T(0) = \begin{bmatrix} 0 & 0 \end{bmatrix} + 2(0.25)(1) \begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}
Example 2 (cont.)
Apply p_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, t_2 = -1:

a(1) = purelin\left( \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right) = 0

e(1) = t(1) - a(1) = -1 - 0 = -1

Thus

W(2) = W(1) + 2\alpha e(1) p^T(1) = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} + 2(0.25)(-1) \begin{bmatrix} 1 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \end{bmatrix}
Example 2 (cont.)
Apply p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, t_1 = 1 again:

a(2) = purelin\left( \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) = 1

e(2) = t(2) - a(2) = 1 - 1 = 0

Thus

W(3) = W(2) + 2\alpha e(2) p^T(2) = \begin{bmatrix} 0 & 1 \end{bmatrix} + 2(0.25)(0) \begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \end{bmatrix}
Example 2 (cont.)
Apply p_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, t_2 = -1 again:

a(3) = purelin\left( \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right) = -1

e(3) = t(3) - a(3) = -1 - (-1) = 0

Thus

W(4) = W(3) + 2\alpha e(3) p^T(3) = \begin{bmatrix} 0 & 1 \end{bmatrix} + 2(0.25)(0) \begin{bmatrix} 1 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \end{bmatrix}

Thus the algorithm has converged.
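The iteration trace above can be reproduced with a short sketch (Python/NumPy, our own code) for the no-bias ADALINE of Example 2:

    import numpy as np

    patterns = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]   # p_1, p_2
    targets = [1.0, -1.0]                                        # t_1, t_2
    alpha = 0.25
    w = np.zeros(2)                                              # W(0) = [0 0]

    for k in range(4):
        p, t = patterns[k % 2], targets[k % 2]   # patterns are presented alternately
        e = t - w @ p                             # e(k) = t(k) - a(k)
        w = w + 2 * alpha * e * p                 # LMS update (no bias)
        print(f"W({k + 1}) = {w}")                # [0.5 0.5], [0. 1.], [0. 1.], [0. 1.]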
Analysis of Convergence
Take the expectation of x_{k+1} = x_k + 2\alpha e(k) z(k); then

E[x_{k+1}] = E[x_k] + 2\alpha E[e(k) z(k)]
= E[x_k] + 2\alpha \{ E[(t(k) - x_k^T z(k)) z(k)] \}
= E[x_k] + 2\alpha \{ E[t(k) z(k)] - E[(x_k^T z(k)) z(k)] \}
= E[x_k] + 2\alpha \{ E[t(k) z(k)] - E[z(k) z^T(k) x_k] \}
= E[x_k] + 2\alpha \{ E[t(k) z(k)] - E[z(k) z^T(k)] E[x_k] \}
= E[x_k] + 2\alpha \{ h - R E[x_k] \}

where the second-to-last step assumes that x_k is independent of z(k). Thus

E[x_{k+1}] = [I - 2\alpha R] E[x_k] + 2\alpha h
Analysis of Convergence (cont.)
E[x_{k+1}] = [I - 2\alpha R] E[x_k] + 2\alpha h will be stable if

1 - 2\alpha\lambda_i > -1, \quad \forall i

or

0 < \alpha < 1/\lambda_{max}

where the \lambda_i are the eigenvalues of R.

Recall: 1 - 2\alpha\lambda_i are the eigenvalues of I - 2\alpha R.
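To see the condition in action, here is a small sketch (Python/NumPy, our own code) that iterates E[x_{k+1}] = [I - 2αR]E[x_k] + 2αh with the R and h of Example 1 and a learning rate inside the stable range:

    import numpy as np

    R = np.eye(2)                   # input correlation matrix from Example 1
    h = np.array([0.0, 1.0])        # cross-correlation vector from Example 1
    lam_max = np.max(np.linalg.eigvalsh(R))
    alpha = 0.25
    assert 0 < alpha < 1 / lam_max  # stability condition

    Ex = np.zeros(2)                # E[x_0]
    for _ in range(50):
        Ex = (np.eye(2) - 2 * alpha * R) @ Ex + 2 * alpha * h

    print(Ex)                       # close to x* = R^{-1} h = [0, 1]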
Analysis of Convergence (cont.)
If the condition on stability is satisfied, then the steady-state solution satisfies

E[x_{ss}] = [I - 2\alpha R] E[x_{ss}] + 2\alpha h

or

E[x_{ss}] = R^{-1} h = x^*

Thus the LMS solution, obtained by applying one input vector at a time, is the same as the minimum mean square error solution.
Example 3
Question: Suppose that we have the following input/target pairs:

p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \; t_1 = 1, \qquad p_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \; t_2 = -1

These patterns occur with equal probability. What is the maximum stable learning rate for the LMS algorithm?

Solution: From Example 1, we have

R = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

Obviously \lambda_{max} = 1, so the upper limit on the learning rate is \alpha < 1/\lambda_{max} = 1.
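The bound can also be computed directly from the input correlation matrix, as in this short sketch (Python/NumPy, our own code):

    import numpy as np

    # Input vectors of Example 3 (no bias, so z = p), each with probability 0.5
    z1 = np.array([1.0, 1.0])
    z2 = np.array([1.0, -1.0])
    R = 0.5 * np.outer(z1, z1) + 0.5 * np.outer(z2, z2)

    lam_max = np.max(np.linalg.eigvalsh(R))
    print("maximum stable learning rate:", 1.0 / lam_max)   # 1.0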
Perceptron Rule vs. LMS Algorithm
Consider the classification problem with the four classes of input vectors shown below.
Figure: input vectors
Perceptron Rule vs. LMS Algorithm (cont.)
Train a perceptron network with weights

W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

and biases

b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

Figure: Final Decision Boundaries
Perceptron Rule vs. LMS Algorithm (cont.)
Train an ADALINE network with the same weights and biases
as above and learning rate α = 0.04.
Figure: Final Decision Boundaries
Perceptron Rule vs. LMS Algorithm (cont.)
The perceptron rule stops as soon as the patterns are correctly classified, even though some patterns may be close to the boundaries.
The LMS algorithm minimizes the mean square error. Therefore it tries to move the boundaries as far from the reference patterns as possible.
Figure: Left: perceptron; Right: ADALINE
Adaptive Filtering
ADALINE is one of the most widely used neural networks in practical applications.
One of the major application areas of the ADALINE has been adaptive filtering, where it is still used extensively.
Tapped Delay Line
The input signal enters from the left. The output of the tapped delay line (TDL) is an R-dimensional vector, consisting of the input signal at the current time and at delays of 1 to R − 1 time steps.
Figure: Tapped Delay Line (TDL)
Adaptive Filter
The output of the adaptive filter is

a(k) = purelin(Wp(k) + b) = \sum_{i=1}^{R} w_{1,i} \, y(k - i + 1) + b
Figure: Adaptive Filter ADALINE
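A minimal sketch of this filter output in Python/NumPy (the helper name is ours); the delay-line window holds y(k), y(k−1), ..., y(k−R+1):

    import numpy as np

    def adaptive_filter_output(w, b, window):
        """a(k) = sum_i w_{1,i} * y(k - i + 1) + b, with window = [y(k), ..., y(k-R+1)]."""
        return float(w @ window + b)

    # Weights of Example 4 below, no bias, window [y(0), y(-1), y(-2)] = [5, 0, 0]
    w = np.array([2.0, -1.0, 3.0])
    print(adaptive_filter_output(w, 0.0, np.array([5.0, 0.0, 0.0])))   # 10.0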
Example 4
Consider the ADALINE filter below.
Suppose that w_{1,1} = 2, w_{1,2} = -1, w_{1,3} = 3 and the input sequence is

\{y(k)\} = \{\ldots, 0, 0, 0, 5, -4, 0, 0, 0, \ldots\}

where y(0) = 5, y(1) = -4, etc.
Question i
[Q:] What is the filter output just prior to k = 0?
[A:] Just prior to k = 0, three zeros have entered the filter, and the output is zero.
Question ii
[Q:] What is the filter output from k = 0 to k = 5?
[A:] At k = 0,

a(0) = Wp(0) = \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} \end{bmatrix} \begin{bmatrix} y(0) \\ y(-1) \\ y(-2) \end{bmatrix} = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix} \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix} = 10

Similarly, we have

a(1) = Wp(1) = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix} \begin{bmatrix} -4 \\ 5 \\ 0 \end{bmatrix} = -13
Question ii (cont.)
a(2) = Wp(2) = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix} \begin{bmatrix} 0 \\ -4 \\ 5 \end{bmatrix} = 19

a(3) = Wp(3) = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ -4 \end{bmatrix} = -12

a(4) = Wp(4) = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = 0

All remaining outputs will be zero.
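The whole response can be checked with a short sketch (Python/NumPy, our own code) that slides the delay-line window along the input sequence:

    import numpy as np

    w = np.array([2.0, -1.0, 3.0])           # w_{1,1}, w_{1,2}, w_{1,3}
    y = [5.0, -4.0, 0.0, 0.0, 0.0, 0.0]      # y(0), y(1), ... (zeros before k = 0)

    window = np.zeros(3)                      # [y(k), y(k-1), y(k-2)]
    for k, yk in enumerate(y):
        window = np.concatenate(([yk], window[:-1]))   # shift the tapped delay line
        print(f"a({k}) = {w @ window:g}")     # 10, -13, 19, -12, 0, 0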
Question iii
[Q:] How long does y(0) contribute to the output?
[A:] The effect of y(0) lasts from k = 0 through k = 2, so it will have an influence for three time intervals.
Adaptive Noise Cancellation
Figure: Adaptive Filter for Noise Cancellation
Mathematical Analysis
The input vector is given by the current and previous values of the noise source:

z(k) = \begin{bmatrix} v(k) \\ v(k-1) \end{bmatrix}

while the target is the sum of the current signal and the filtered noise:

t(k) = s(k) + m(k)

Now we have

R = E[zz^T] = \begin{bmatrix} E[v^2(k)] & E[v(k)v(k-1)] \\ E[v(k-1)v(k)] & E[v^2(k-1)] \end{bmatrix}

h = E[tz] = \begin{bmatrix} E[(s(k) + m(k)) \, v(k)] \\ E[(s(k) + m(k)) \, v(k-1)] \end{bmatrix}
Signal Assumptions
The EEG signal s is a white (uncorrelated from one time step to the next) random signal, uniformly distributed between the values -0.2 and +0.2.
The noise source is v(k) = 1.2 \sin(2k\pi/3).
The filtered noise is m(k) = 0.12 \sin(2k\pi/3 + \pi/2).
Performance Index
Then we can calculate c = 0.0205 and the minimum mean square error solution

x^* = R^{-1} h = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ -0.0624 \end{bmatrix} = \begin{bmatrix} -0.0578 \\ -0.1156 \end{bmatrix}

Thus the minimum mean square error is

F(x^*) = c - 2x^{*T} h + x^{*T} R x^* = 0.0133

The mean square value of the EEG signal is

E[s^2(k)] = \frac{1}{0.4} \int_{-0.2}^{0.2} s^2 \, ds = 0.0133

Note that the minimum mean square error equals the mean square value of the EEG signal, which means the adaptive filter reproduces the filtered noise m(k) and the remaining error is essentially the EEG signal itself.
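These quantities can be verified numerically. The sketch below (Python/NumPy, our own code) averages over one period of the sinusoidal noise to build R and h and then recomputes x*, F(x*) and E[s^2]:

    import numpy as np

    k = np.arange(3)                                    # the noise has period 3
    v = 1.2 * np.sin(2 * np.pi * k / 3)                 # noise source v(k)
    m = 0.12 * np.sin(2 * np.pi * k / 3 + np.pi / 2)    # filtered noise m(k)
    v_prev = np.roll(v, 1)                              # v(k-1), using periodicity

    R = np.array([[np.mean(v * v),      np.mean(v * v_prev)],
                  [np.mean(v_prev * v), np.mean(v_prev * v_prev)]])
    h = np.array([np.mean(m * v), np.mean(m * v_prev)])  # s is uncorrelated with v

    E_s2 = (1 / 0.4) * (0.2**3 - (-0.2)**3) / 3          # E[s^2] for s uniform on [-0.2, 0.2]
    c = E_s2 + np.mean(m * m)                            # c = E[t^2] = E[s^2] + E[m^2]

    x_star = np.linalg.solve(R, h)
    F_min = c - 2 * x_star @ h + x_star @ R @ x_star
    print(R)       # [[0.72 -0.36], [-0.36 0.72]]
    print(h)       # about [0, -0.0624]
    print(x_star)  # about [-0.0578, -0.1156]
    print(F_min)   # about 0.0133, equal to E[s^2]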
LMS Response
Echo Cancellation
Another very important practical application of adaptive noise
cancellation is echo cancellation.