Widrow-Hoff Learning
Transcription
May 20, 2010

Outline
1. Introduction
2. ADALINE Network
3. Mean Square Error
4. LMS Algorithm
5. Analysis of Convergence
6. Adaptive Filtering

Introduction
In 1960, Bernard Widrow and his doctoral student Marcian Hoff introduced the ADALINE (ADAptive LInear NEuron) network and the LMS (Least Mean Square) algorithm.

Perceptron Network
Figure: a = hardlim(Wp + b)

ADALINE Network
Figure: a = purelin(Wp + b) = Wp + b

Single ADALINE
a = purelin(n) = purelin(w_{1,1} p_1 + w_{1,2} p_2 + b) = w_{1,1} p_1 + w_{1,2} p_2 + b = 1w^T p + b,

where 1w denotes the vector formed by the first row of the weight matrix W.

Decision Boundary
An ADALINE can be used to classify objects into two categories; its decision boundary is the line n = 1w^T p + b = 0.

Mean Square Error
In statistics, the mean square error (MSE) of an estimator is one of many ways to quantify the difference between an estimator and the true value of the quantity being estimated. In other words, the mean square error is a measure of how good an estimator of a distributional parameter is.

Mean Square Error (cont.)
Given a training set {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}, where p_i is an input and t_i is the corresponding target output, we want to find the weights and biases of the ADALINE that minimize the mean square error, where the error is the difference between the target output and the network output.

Mean Square Error (cont.)
To simplify our discussion, we consider the single-neuron case. For convenience, denote

x = [1w; b],   z = [p; 1].

Now we can rewrite a = 1w^T p + b as a = x^T z.
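To make the notation concrete, here is a minimal numerical sketch (not part of the original slides; the weights, bias, and input are made-up values). It checks that the augmented form a = x^T z gives the same output as 1w^T p + b, and shows how the sign of the output assigns an input to one of two categories:

```python
import numpy as np

# Made-up ADALINE parameters for a 2-input, single-neuron network
w = np.array([1.0, -0.8])   # 1w: first (and only) row of the weight matrix W
b = 0.5                     # bias
p = np.array([0.3, 0.7])    # an arbitrary input vector

# Plain form: a = purelin(1w^T p + b) = 1w^T p + b
a_plain = w @ p + b

# Augmented form: x = [1w; b], z = [p; 1], so a = x^T z
x = np.append(w, b)
z = np.append(p, 1.0)
a_aug = x @ z

assert np.isclose(a_plain, a_aug)
print(f"a = {a_aug:.3f}")

# The decision boundary n = 1w^T p + b = 0 splits the input space in two
category = 1 if a_aug >= 0 else 2
print(f"input falls in category {category}")
```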
Error Analysis
The ADALINE network mean square error is

F(x) = E[(t - a)^2] = E[(t - x^T z)^2],

where E[.] denotes the expected value. We can expand F(x) as follows:

F(x) = E[t^2 - 2 t x^T z + x^T z z^T x]
     = E[t^2] - 2 x^T E[t z] + x^T E[z z^T] x
     = c - 2 x^T h + x^T R x,

where c = E[t^2], h = E[t z] and R = E[z z^T].

Error Analysis (cont.)
The Hessian matrix of F(x) is 2R, where R is the input correlation matrix, so we use the following property: all correlation matrices are either positive definite or positive semidefinite. If R is positive definite, we can find the strong minimum x* = R^{-1} h by setting

∇F(x) = -2h + 2Rx = 0.

Example 1
Question: Suppose that we have the following input/target pairs:

p_1 = [1; 1], t_1 = 1,   p_2 = [1; -1], t_2 = -1.

These patterns occur with equal probability, and they are used to train an ADALINE network with no bias. Find the point that minimizes the mean square error.

Example 1 (cont.)
Solution: Since F(x) = c - 2 x^T h + x^T R x, we need to calculate c, h and R. With no bias, z = p, so

c = E[t^2] = (1)^2 (0.5) + (-1)^2 (0.5) = 1,

h = E[t z] = (0.5)(1) [1; 1] + (0.5)(-1) [1; -1] = [0; 1],

R = E[z z^T] = (0.5) [1; 1][1 1] + (0.5) [1; -1][1 -1] = [1 0; 0 1].

Example 1 (cont.)
Therefore

F(x) = c - 2 x^T h + x^T R x
     = 1 - 2 [w_{1,1} w_{1,2}] [0; 1] + [w_{1,1} w_{1,2}] [1 0; 0 1] [w_{1,1}; w_{1,2}]
     = 1 - 2 w_{1,2} + w_{1,1}^2 + w_{1,2}^2.

Hence

x* = R^{-1} h = [1 0; 0 1]^{-1} [0; 1] = [0; 1],

so the minimum is attained at w_{1,1} = 0, w_{1,2} = 1.

Approximate Steepest Descent
In general, it is not desirable or convenient to calculate h and R. For this reason, we use an approximate steepest descent algorithm, in which we work with an estimated gradient.

Approximate Gradient
Widrow and Hoff estimate the mean square error F(x) by

F̂(x) = (t(k) - a(k))^2 = e^2(k).

Then, at each iteration we have a gradient estimate of the form

∇F̂(x) = ∇e^2(k) = [∂e^2(k)/∂w_{1,1}, ∂e^2(k)/∂w_{1,2}, ..., ∂e^2(k)/∂w_{1,R}, ∂e^2(k)/∂b]^T.

Approximate Gradient (cont.)
For each j = 1, ..., R,

∂e^2(k)/∂w_{1,j} = 2 e(k) ∂e(k)/∂w_{1,j}
                 = 2 e(k) ∂[t(k) - a(k)]/∂w_{1,j}
                 = 2 e(k) ∂[t(k) - (1w^T p(k) + b)]/∂w_{1,j}
                 = 2 e(k) ∂/∂w_{1,j} [t(k) - (Σ_{i=1}^{R} w_{1,i} p_i(k) + b)]
                 = -2 e(k) p_j(k).

Similarly we can obtain

∂e^2(k)/∂b = 2 e(k) ∂/∂b [t(k) - (Σ_{i=1}^{R} w_{1,i} p_i(k) + b)] = -2 e(k).

Approximate Gradient (cont.)
Note that z(k) = [p_1(k), p_2(k), ..., p_R(k), 1]^T. Thus

∇F̂(x) = ∇e^2(k) = [-2 e(k) p_1(k), ..., -2 e(k) p_R(k), -2 e(k)]^T = -2 e(k) z(k).

To calculate ∇F̂(x), we only need to multiply the error at iteration k by the input.
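The formula ∇F̂(x) = -2 e(k) z(k) is easy to check numerically. Below is a small sketch (my own illustration, not from the slides; the values of x, p(k) and t(k) are made up) that compares the closed-form estimate with a finite-difference gradient of e^2(k):

```python
import numpy as np

rng = np.random.default_rng(0)

R_inputs = 3                       # number of inputs
x = rng.normal(size=R_inputs + 1)  # current weights and bias, x = [1w; b]
p = rng.normal(size=R_inputs)      # input at iteration k (made-up)
t = 1.0                            # target at iteration k (made-up)

z = np.append(p, 1.0)              # z(k) = [p(k); 1]

def squared_error(x_vec):
    """e^2(k) = (t(k) - a(k))^2 with a(k) = x^T z(k)."""
    return (t - x_vec @ z) ** 2

# Closed-form gradient estimate: grad of e^2(k) is -2 e(k) z(k)
e = t - x @ z
grad_closed = -2.0 * e * z

# Finite-difference check of the same gradient
eps = 1e-6
unit = np.eye(len(x))
grad_fd = np.array([
    (squared_error(x + eps * unit[i]) - squared_error(x - eps * unit[i])) / (2 * eps)
    for i in range(len(x))
])

print(np.allclose(grad_closed, grad_fd, atol=1e-5))  # True
```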
LMS Algorithm
Recall the steepest descent algorithm with constant learning rate:

x_{k+1} = x_k - α ∇F(x)|_{x = x_k}.

We substitute ∇F̂(x) for ∇F(x), so that

x_{k+1} = x_k - α ∇F̂(x) = x_k + 2 α e(k) z(k),

which means

1w(k+1) = 1w(k) + 2 α e(k) p(k),
b(k+1) = b(k) + 2 α e(k).

LMS Algorithm (cont.)
Thus, for multiple neurons, the LMS algorithm can be written in matrix form:

W(k+1) = W(k) + 2 α e(k) p^T(k),
b(k+1) = b(k) + 2 α e(k).

Example 2
Question: Suppose that we have the following input/target pairs:

p_1 = [1; 1], t_1 = 1,   p_2 = [1; -1], t_2 = -1.

These patterns occur with equal probability. Train an ADALINE network with no bias using the LMS algorithm, with the initial guess set to zero and a learning rate α = 0.25.

Example 2 (cont.)
Solution: Apply p_1 = [1; 1], t_1 = 1:

a(0) = purelin([0 0][1; 1]) = 0,
e(0) = t(0) - a(0) = 1 - 0 = 1.

Thus

W(1) = W(0) + 2 α e(0) p^T(0) = [0 0] + 2(0.25)(1)[1 1] = [0.5 0.5].

Example 2 (cont.)
Apply p_2 = [1; -1], t_2 = -1:

a(1) = purelin([0.5 0.5][1; -1]) = 0,
e(1) = t(1) - a(1) = -1 - 0 = -1.

Thus

W(2) = W(1) + 2 α e(1) p^T(1) = [0.5 0.5] + 2(0.25)(-1)[1 -1] = [0 1].

Example 2 (cont.)
Apply p_1 = [1; 1], t_1 = 1 again:

a(2) = purelin([0 1][1; 1]) = 1,
e(2) = t(2) - a(2) = 1 - 1 = 0.

Thus

W(3) = W(2) + 2 α e(2) p^T(2) = [0 1] + 2(0.25)(0)[1 1] = [0 1].

Example 2 (cont.)
Apply p_2 = [1; -1], t_2 = -1 again:

a(3) = purelin([0 1][1; -1]) = -1,
e(3) = t(3) - a(3) = -1 - (-1) = 0.

Thus

W(4) = W(3) + 2 α e(3) p^T(3) = [0 1] + 2(0.25)(0)[1 -1] = [0 1].

The errors are now zero for both patterns, so the algorithm has converged to the minimum mean square error solution found in Example 1.

Analysis of Convergence
Take the expectation of x_{k+1} = x_k + 2 α e(k) z(k):

E[x_{k+1}] = E[x_k] + 2 α E[e(k) z(k)]
           = E[x_k] + 2 α {E[t(k) z(k)] - E[(x_k^T z(k)) z(k)]}
           = E[x_k] + 2 α {E[t(k) z(k)] - E[z(k) z^T(k) x_k]}
           = E[x_k] + 2 α {E[t(k) z(k)] - E[z(k) z^T(k)] E[x_k]}   (assuming x_k is independent of z(k))
           = E[x_k] + 2 α {h - R E[x_k]}.

Thus

E[x_{k+1}] = [I - 2 α R] E[x_k] + 2 α h.

Analysis of Convergence (cont.)
The recursion E[x_{k+1}] = [I - 2 α R] E[x_k] + 2 α h will be stable if

1 - 2 α λ_i > -1 for all i,

or equivalently

0 < α < 1/λ_max,

where the λ_i are the eigenvalues of R. (Recall that 1 - 2 α λ_i are the eigenvalues of I - 2 α R.)
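The following short sketch (not from the original slides) runs the LMS updates of Example 2 in code, and also computes the stability bound 1/λ_max from R for the same data set:

```python
import numpy as np

# Training data from Examples 1 and 2 (no bias, so z = p)
patterns = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
targets = [1.0, -1.0]

alpha = 0.25
W = np.zeros(2)                 # initial guess 1w(0) = [0 0]

# Present p1, p2, p1, p2 as in Example 2
for k in range(4):
    p, t = patterns[k % 2], targets[k % 2]
    a = W @ p                   # a(k) = purelin(1w^T p) = 1w^T p
    e = t - a                   # e(k) = t(k) - a(k)
    W = W + 2 * alpha * e * p   # LMS update: 1w(k+1) = 1w(k) + 2 alpha e(k) p(k)
    print(f"k={k}: a={a:+.2f}, e={e:+.2f}, W={W}")
# Converges to W = [0, 1], the minimum MSE solution of Example 1.

# Stability bound on the learning rate: 0 < alpha < 1/lambda_max
R = 0.5 * np.outer(patterns[0], patterns[0]) + 0.5 * np.outer(patterns[1], patterns[1])
lam_max = np.linalg.eigvalsh(R).max()
print("maximum stable learning rate:", 1.0 / lam_max)   # 1.0 for this R
```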
Analysis of Convergence (cont.)
If the condition for stability is satisfied, then the steady-state solution satisfies

E[x_ss] = [I - 2 α R] E[x_ss] + 2 α h,

or

E[x_ss] = R^{-1} h = x*.

Thus the LMS solution, obtained by applying one input vector at a time, is the same as the minimum mean square error solution.

Example 3
Question: Suppose that we have the following input/target pairs:

p_1 = [1; 1], t_1 = 1,   p_2 = [1; -1], t_2 = -1.

These patterns occur with equal probability. What is the maximum stable learning rate for the LMS algorithm?

Solution: By Example 1, we have

R = [1 0; 0 1].

Obviously λ_max = 1, so the upper limit on the learning rate is α < 1/λ_max = 1.

Perceptron Rule vs. LMS Algorithm
Consider a classification problem with four classes of input vectors.
Figure: input vectors

Perceptron Rule vs. LMS Algorithm (cont.)
Train a perceptron network with initial weights W = [1 0; 0 1] and biases b = [1; 1].
Figure: Final Decision Boundaries

Perceptron Rule vs. LMS Algorithm (cont.)
Train an ADALINE network with the same initial weights and biases and a learning rate α = 0.04.
Figure: Final Decision Boundaries

Perceptron Rule vs. LMS Algorithm (cont.)
The perceptron rule stops as soon as the patterns are correctly classified, even though some patterns may then lie close to the decision boundaries. The LMS algorithm minimizes the mean square error, so it tries to move the decision boundaries as far from the reference patterns as possible.
Figure: Left: perceptron; Right: ADALINE

Adaptive Filtering
The ADALINE is one of the most widely used neural networks in practical applications. One of the major application areas of the ADALINE has been adaptive filtering, where it is still used extensively.

Tapped Delay Line
The input signal enters from the left. The output of the tapped delay line (TDL) is an R-dimensional vector, consisting of the input signal at the current time and at delays of 1 to R-1 time steps.
Figure: Tapped Delay Line (TDL)

Adaptive Filter
The output of the adaptive filter is

a(k) = purelin(Wp(k) + b) = Σ_{i=1}^{R} w_{1,i} y(k - i + 1) + b.

Figure: Adaptive Filter ADALINE
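A tapped delay line feeding a single ADALINE is just a finite impulse response filter. The sketch below (my own illustration; the weights, bias, and input sequence are made-up values) implements a(k) = Σ_{i=1}^{R} w_{1,i} y(k - i + 1) + b directly, and can be used to check Example 4 below by plugging in its weights and input sequence:

```python
import numpy as np

def adaline_filter(w, b, y):
    """Run an ADALINE adaptive filter over the input sequence y.

    w : filter weights [w_{1,1}, ..., w_{1,R}] applied to the tapped delay line
    b : bias
    y : input sequence y(0), y(1), ...
    Returns a(k) = sum_i w_{1,i} * y(k - i + 1) + b, taking y(k) = 0 for k < 0.
    """
    R = len(w)
    outputs = []
    for k in range(len(y)):
        # Tapped delay line: current input plus the R-1 previous inputs
        p = np.array([y[k - i + 1] if k - i + 1 >= 0 else 0.0 for i in range(1, R + 1)])
        outputs.append(float(np.dot(w, p) + b))
    return outputs

# Made-up example: a 3-tap filter over a short made-up sequence
print(adaline_filter(w=[0.5, 0.25, 0.25], b=0.0, y=[1.0, 2.0, 3.0, 4.0]))
```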
Example 4
Consider the ADALINE filter below. Suppose that w_{1,1} = 2, w_{1,2} = -1, w_{1,3} = 3 and the input sequence is

{y(k)} = {..., 0, 0, 0, 5, -4, 0, 0, 0, ...},

where y(0) = 5, y(1) = -4, etc.

Question i
[Q:] What is the filter output just prior to k = 0?
[A:] Just prior to k = 0, three zeros have entered the filter, so the output is zero.

Question ii
[Q:] What is the filter output from k = 0 to k = 5?
[A:] At k = 0,

a(0) = Wp(0) = [w_{1,1} w_{1,2} w_{1,3}] [y(0); y(-1); y(-2)] = [2 -1 3] [5; 0; 0] = 10.

Similarly, we have

a(1) = Wp(1) = [2 -1 3] [-4; 5; 0] = -13.

Question ii (cont.)
a(2) = Wp(2) = [2 -1 3] [0; -4; 5] = 19,
a(3) = Wp(3) = [2 -1 3] [0; 0; -4] = -12,
a(4) = Wp(4) = [2 -1 3] [0; 0; 0] = 0.

All remaining outputs will be zero.

Question iii
[Q:] How long does y(0) contribute to the output?
[A:] The effect of y(0) lasts from k = 0 through k = 2, so it influences the output for three time intervals.

Adaptive Noise Cancellation
Figure: Adaptive Filter for Noise Cancellation

Mathematical Analysis
The input vector is given by the current and previous values of the noise source:

z(k) = [v(k); v(k-1)],

while the target is the sum of the current signal and the filtered noise:

t(k) = s(k) + m(k).

Now we have

R = E[z z^T] = [E[v^2(k)]  E[v(k) v(k-1)]; E[v(k-1) v(k)]  E[v^2(k-1)]],

h = E[t z] = [E[(s(k) + m(k)) v(k)]; E[(s(k) + m(k)) v(k-1)]].
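To make the noise-cancellation setup concrete, here is a minimal simulation sketch (my own illustration, not from the slides; the signal, noise source, and noise path below are made-up stand-ins). A two-tap ADALINE is trained with the LMS rule so that its output approximates the filtered noise m(k), leaving the residual e(k) ≈ s(k):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
alpha = 0.02                       # made-up learning rate

s = rng.uniform(-0.2, 0.2, N)      # signal to recover (made-up stand-in)
v = np.sin(2 * np.pi * np.arange(N) / 3)   # noise source (made-up)
m = 0.5 * np.roll(v, 1)            # filtered noise reaching the target (made-up path)
t = s + m                          # contaminated target t(k) = s(k) + m(k)

x = np.zeros(2)                    # two-tap filter weights [w_{1,1}, w_{1,2}]
restored = np.zeros(N)
for k in range(1, N):
    z = np.array([v[k], v[k - 1]]) # z(k) = [v(k); v(k-1)]
    a = x @ z                      # filter output, the estimate of m(k)
    e = t[k] - a                   # e(k) = t(k) - a(k), the restored signal
    x = x + 2 * alpha * e * z      # LMS update
    restored[k] = e

# After adaptation the residual e(k) should track s(k)
print("mean squared residual error:", np.mean((restored[-500:] - s[-500:]) ** 2))
```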
Signal Assumptions
The EEG signal s is a white (uncorrelated from one time step to the next) random signal, uniformly distributed between the values -0.2 and +0.2. The noise source is

v(k) = 1.2 sin(2πk/3),

and the filtered noise is

m(k) = 0.12 sin(2πk/3 + π/2).

Performance Index
Then we can calculate c = 0.0205 and the minimum mean square error solution

x* = R^{-1} h = [0.72 -0.36; -0.36 0.72]^{-1} [0; -0.0624] = [-0.0578; -0.1156].

Thus the minimum mean square error is

F(x*) = c - 2 x*^T h + x*^T R x* = 0.0133.

The mean square value of the EEG signal is

E[s^2(k)] = (1/0.4) ∫_{-0.2}^{0.2} s^2 ds = 0.0133.

Since the minimum mean square error equals the mean square value of the EEG signal, the optimal filter removes essentially all of the contaminating noise in the mean square sense.

LMS Response

Echo Cancellation
Another very important practical application of adaptive noise cancellation is echo cancellation.
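As a closing check (my own addition, not part of the original slides), a short script that recomputes the noise-cancellation statistics R, h, c and the quoted minimum x* and F(x*) directly from the signal assumptions above:

```python
import numpy as np

# v(k) and m(k) are periodic with period 3, so averaging over one period
# gives the exact expectations.
k = np.arange(3)
v = 1.2 * np.sin(2 * np.pi * k / 3)               # v(k)
m = 0.12 * np.sin(2 * np.pi * k / 3 + np.pi / 2)  # m(k)
v_prev = np.roll(v, 1)                            # v(k-1), using the periodicity

R = np.array([[np.mean(v * v),      np.mean(v * v_prev)],
              [np.mean(v_prev * v), np.mean(v_prev * v_prev)]])
# s is independent of v and zero mean, so E[(s+m) v] = E[m v]
h = np.array([np.mean(m * v), np.mean(m * v_prev)])
E_s2 = (0.2 ** 2) / 3                             # E[s^2] for s ~ Uniform(-0.2, 0.2)
c = E_s2 + np.mean(m * m)                         # c = E[t^2] = E[s^2] + E[m^2]

x_star = np.linalg.solve(R, h)
F_min = c - 2 * x_star @ h + x_star @ R @ x_star

print("R =", R)               # [[0.72 -0.36], [-0.36 0.72]]
print("h =", h)               # approximately [0, -0.0624]
print("c =", round(c, 4))     # 0.0205
print("x* =", x_star)         # [-0.0578, -0.1156]
print("F(x*) =", round(F_min, 4), " E[s^2] =", round(E_s2, 4))  # both 0.0133
```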