Exercise 8: Bias-variance decomposition of mean squared error

CS331: Machine Learning
Prof. Dr. Volker Roth
[email protected]
FS 2015
Aleksander Wieczorek
[email protected]
Dept. of Mathematics and Computer Science
Spiegelgasse 1
4051 Basel
Date: Monday, April 13th 2015
Exercise 8: Bias-variance decomposition of mean squared error
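Suppose that data $(x_i, y_i)$ are observed, where
$$y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \dots, n$$
and
• $x_i = (x_{i1}, \dots, x_{ip}) \in \mathbb{R}^p$
• $y_i \in \mathbb{R}$
• $f : \mathbb{R}^p \to \mathbb{R}$
• $\varepsilon_i$ error with $E[\varepsilon_i] = 0$, $Var[\varepsilon_i] = \sigma$, $Cov[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$.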
Assume that a function $\hat{f}$, constructed from the data $(x_i, y_i)$, is used to approximate the unknown $f$.
The mean squared error (MSE) of $\hat{f}$ measures how well $\hat{f}$ approximates $f$ (i.e. predicts $y$
given a new $x$), with the expectation taken over the random data used to construct $\hat{f}$:
$$MSE(\hat{f}(x)) = E[(\hat{f}(x) - f(x))^2]$$
For a new data point $x$, the MSE can be shown to depend on the bias and variance of $\hat{f}(x)$:
$$MSE(\hat{f}(x)) = Bias(\hat{f}(x))^2 + Var(\hat{f}(x))$$
where
$$Bias(\hat{f}(x)) = E[\hat{f}(x)] - f(x)$$
$$Var(\hat{f}(x)) = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$$
Prove the above result. What does it mean? What is its interpretation in the context of
regression / regularized (ridge) regression?
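The decomposition can also be checked numerically before proving it. The following is a minimal sketch under stated assumptions, not part of the exercise: a one-dimensional linear model through the origin, a fixed design, Gaussian noise, and a ridge-type shrinkage estimator. The function name `simulate` and all constants are illustrative choices.

```python
import random


def simulate(lam, n_rep=5000, seed=0):
    """Estimate bias, variance and MSE of a ridge-type prediction at x0.

    Assumed setup (illustrative only): f(x) = beta * x, fixed design xs,
    Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    xs = [0.5, 1.0, 1.5, 2.0, 2.5]      # fixed design points
    beta, noise_sd, x0 = 2.0, 1.0, 1.0  # true slope, noise level, new point
    f_x0 = beta * x0                    # true value f(x0)
    sxx = sum(x * x for x in xs)

    preds = []
    for _ in range(n_rep):
        # Draw a fresh data set y_i = f(x_i) + eps_i.
        ys = [beta * x + rng.gauss(0.0, noise_sd) for x in xs]
        sxy = sum(x * y for x, y in zip(xs, ys))
        beta_hat = sxy / (sxx + lam)    # ridge estimate; lam = 0 is least squares
        preds.append(beta_hat * x0)

    mean_pred = sum(preds) / n_rep
    bias = mean_pred - f_x0
    var = sum((p - mean_pred) ** 2 for p in preds) / n_rep
    mse = sum((p - f_x0) ** 2 for p in preds) / n_rep
    return bias, var, mse


if __name__ == "__main__":
    for lam in (0.0, 1.0, 10.0):
        bias, var, mse = simulate(lam)
        print(f"lam={lam}: bias^2={bias ** 2:.4f} var={var:.4f} "
              f"bias^2+var={bias ** 2 + var:.4f} mse={mse:.4f}")
```

In this sketch the empirical MSE matches the sum of squared bias and variance up to rounding, and a larger `lam` lowers the variance while pushing the bias away from zero, which is the trade-off the question about ridge regression points at.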
Exercise 9: Hoeffding’s inequality
Consider independent random variables $X_1, \dots, X_n$ which are bounded, i.e. $X_i$ takes values in
$[a_i, b_i]$ with probability 1, $i = 1, \dots, n$. Then, for any $t > 0$, the sum $S_n = \sum_{i=1}^n X_i$ fulfils
the following inequality:
$$P(|S_n - E S_n| \ge t) \le 2 \exp\left( \frac{-2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). \qquad (1)$$
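To get a feeling for inequality (1), the bound can be compared with a simulated tail probability. This is a minimal sketch that assumes the $X_i$ are independent and uniform on $[a_i, b_i]$; the inequality itself holds for any bounded independent variables. The names `hoeffding_bound` and `empirical_tail` are illustrative.

```python
import math
import random


def hoeffding_bound(t, intervals):
    """Right-hand side of (1) for a list of (a_i, b_i) pairs."""
    width_sq = sum((b - a) ** 2 for a, b in intervals)
    return 2.0 * math.exp(-2.0 * t ** 2 / width_sq)


def empirical_tail(t, intervals, n_rep=20000, seed=0):
    """Monte Carlo estimate of P(|S_n - E S_n| >= t) for uniform X_i."""
    rng = random.Random(seed)
    # For X_i uniform on [a_i, b_i], E X_i is the interval midpoint.
    mean_sum = sum((a + b) / 2.0 for a, b in intervals)
    hits = 0
    for _ in range(n_rep):
        s_n = sum(rng.uniform(a, b) for a, b in intervals)
        if abs(s_n - mean_sum) >= t:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    intervals = [(0.0, 1.0)] * 20
    for t in (1.0, 2.0, 3.0):
        print(t, empirical_tail(t, intervals), hoeffding_bound(t, intervals))
```

For uniform variables the simulated tail is typically far below the bound, since Hoeffding's inequality uses only the ranges $[a_i, b_i]$ and not the actual distributions.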
Exercise
Give a proof of Hoeffding’s Inequality.
This can be done in several steps:
1. Show that for independent rv $X_1, \dots, X_n$ and any $s > 0$ we have:
$$P(S_n - E S_n \ge t) \le e^{-st} \prod_{i=1}^n E e^{s(X_i - E X_i)}. \qquad (2)$$
(a) Multiply both sides of the inequality $S_n - E S_n \ge t$ by $s$ and take the exp.
(b) Use Markov's inequality: for a non-negative rv $X$ and any $t > 0$ we have that $P(X \ge t) \le \frac{E X}{t}$.
(c) Use the independence of $X_1, \dots, X_n$.
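Inequality (2) can be sanity-checked in a case where both sides are available in closed form. The sketch below assumes i.i.d. Bernoulli($p$) variables, so that $S_n$ is binomial and $E e^{s(X_i - p)} = (1 - p) e^{-sp} + p e^{s(1 - p)}$. The function names are illustrative and the check is not a substitute for the proof.

```python
import math


def exact_upper_tail(n, p, t):
    """Exact P(S_n - E S_n >= t) for S_n ~ Binomial(n, p)."""
    # Smallest integer k with k >= n*p + t; the small offset guards
    # against floating-point error in n*p + t.
    k_min = max(math.ceil(n * p + t - 1e-12), 0)
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def chernoff_bound(n, p, t, s):
    """Right-hand side of (2) for i.i.d. Bernoulli(p) variables."""
    mgf_centered = (1 - p) * math.exp(-s * p) + p * math.exp(s * (1 - p))
    return math.exp(-s * t) * mgf_centered ** n


if __name__ == "__main__":
    n, p, t = 30, 0.3, 4.0
    print("exact tail:", exact_upper_tail(n, p, t))
    for s in (0.1, 0.5, 1.0, 2.0):
        print("s =", s, "bound:", chernoff_bound(n, p, t, s))
```

The bound holds for every $s > 0$ but its tightness varies with $s$, which is why the last step of the proof optimises over $s$.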
2. Show that for a rv $X$ with $E X = 0$, if $X$ takes values in $[a, b]$ with probability 1, then for
any $s > 0$:
$$E e^{sX} \le e^{s^2 (b - a)^2 / 8} \qquad (3)$$
(a) Since exp is a convex function, $e^{sX} \le C e^{sb} + D e^{sa}$, for some $C$ and $D$ to be
determined.
(b) Take the expectation $E e^{sX}$.
(c) Take the Taylor series expansion of $\log(E e^{sX})$.
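Inequality (3) can be checked numerically for a simple centred variable. The sketch below assumes a two-point distribution on the endpoints: $X = a$ with probability $b/(b - a)$ and $X = b$ with probability $-a/(b - a)$, where $a < 0 < b$, so that $E X = 0$. The function names are illustrative.

```python
import math


def two_point_mgf(s, a, b):
    """E exp(sX) for X = a w.p. b/(b-a) and X = b w.p. -a/(b-a); needs a < 0 < b."""
    return (b * math.exp(s * a) - a * math.exp(s * b)) / (b - a)


def lemma_bound(s, a, b):
    """Right-hand side of (3)."""
    return math.exp(s ** 2 * (b - a) ** 2 / 8.0)


if __name__ == "__main__":
    for a, b in ((-1.0, 1.0), (-1.0, 3.0)):
        for s in (0.1, 0.5, 1.0, 2.0):
            print(a, b, s, two_point_mgf(s, a, b), lemma_bound(s, a, b))
```

For $a = -1$, $b = 1$ this reduces to the familiar inequality $\cosh(s) \le e^{s^2/2}$.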
3. Combining (2) and (3) we obtain:
$$P(S_n - E S_n \ge t) \le \inf_{s > 0} \left( e^{-st} \prod_{i=1}^n e^{s^2 (b_i - a_i)^2 / 8} \right)$$
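The infimum over $s$ can be explored numerically. Writing $c = \sum_{i=1}^n (b_i - a_i)^2$, the expression inside the infimum equals $\exp(-st + s^2 c / 8)$, so it suffices to minimise the quadratic exponent. The sketch below does this on a grid and compares the result with the value $s = 4t/c$ obtained by setting the derivative of the exponent to zero. The helper names, the grid, and the example values of $t$ and $c$ are illustrative assumptions.

```python
import math


def exponent(s, t, c):
    """Exponent -s*t + s^2*c/8, where c is the sum of squared interval widths."""
    return -s * t + s ** 2 * c / 8.0


def minimise_on_grid(t, c, s_max=10.0, steps=10000):
    """Brute-force minimisation of the exponent over a grid on (0, s_max]."""
    best_s, best_val = None, float("inf")
    for k in range(1, steps + 1):
        s = s_max * k / steps
        val = exponent(s, t, c)
        if val < best_val:
            best_s, best_val = s, val
    return best_s, best_val


if __name__ == "__main__":
    t, c = 3.0, 20.0
    s_grid, val_grid = minimise_on_grid(t, c)
    s_closed = 4.0 * t / c            # stationary point of the quadratic
    val_closed = -2.0 * t ** 2 / c    # exponent evaluated at s_closed
    print(s_grid, s_closed)
    print(math.exp(val_grid), math.exp(val_closed))
```

The minimised exponent equals $-2t^2 / c$, which matches the exponent appearing in (1).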
