Chapter 2: Large sample theory, Part IV
Florian Pelgrin (HEC)
September-December, 2010
1. Introduction

Under certain conditions, the OLS estimator is BLUE, or one can derive its exact sampling distribution (e.g., in the Gaussian linear model). In a more general framework, without the Gauss-Markov assumptions or the normality assumption, it is not always possible to find (best linear) unbiased estimators or the exact sampling distribution. What can be done?

Solutions:

1. One may settle for estimators that are (weakly or strongly) consistent, meaning that as the sample size goes to infinity, the distribution of the estimator collapses to the parameter value:

$$\hat{\beta}_{n,OLS} \xrightarrow[n \to \infty]{} \beta_0,$$

where $\beta_0$ is the true unknown parameter vector and the mode of convergence is made precise below. Consistency is then the "counterpart" of unbiasedness for large samples.

2. Asymptotic (large sample) theory tells us about the distribution of the OLS estimator if the sample is sufficiently large:

$$\sqrt{n}\left(\hat{\beta}_{n,OLS} - \beta_0\right) \xrightarrow[n \to \infty]{} \mathcal{N}(\cdot, \cdot).$$

To obtain these results, one applies:
- the Law of Large Numbers (LLN), which, together with suitable assumptions, yields the consistency property;
- the Central Limit Theorem (CLT), which, together with suitable assumptions, yields the large sample distribution.

(A small simulation preview of both results follows the outline below.)

The law of large numbers and the central limit theorem invoke different modes of convergence. Indeed, there is no once-and-for-all definition of convergence for random variables (vectors, matrices), as there is for sequences of real numbers. Among others:
- almost sure convergence (a.s.);
- convergence in probability (p);
- convergence in $L^2$, in mean square, or in quadratic mean (m.s. or $L^2$);
- convergence in distribution.

Each convergence notion provides an essential foundation for certain results and requires suitable assumptions. Some relations exist between these modes:

quadratic mean ⇒ probability ⇒ distribution.

All in all, different laws of large numbers and central limit theorems exist, i.e., different conditions that apply to different kinds of economic and financial data (e.g., time series versus cross-section data).

Outline:
1. Introduction
2. Modes of convergence: almost sure convergence; convergence in probability; convergence in mean square; convergence in distribution; some handy theorems
3. Law of large numbers: overview; law of large numbers in the i.i.d. case
4. Consistency of the OLS estimator
5. Central limit theorems: overview; univariate central limit theorem with i.i.d. observations; multivariate central limit theorem with i.i.d. observations
6. Large sample distribution of the OLS estimator
7. Extension: Delta method: example; univariate Delta method; multivariate Delta method
8. Summary
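As a numerical preview of these two results, here is a minimal Monte Carlo sketch (the single-regressor design, sample sizes, and error distribution are illustrative assumptions, not from the slides): the OLS slope collapses to $\beta_0$ as $n$ grows, while the $\sqrt{n}$-normalized error keeps a stable, roughly Gaussian spread.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0 = 2.0  # true slope (hypothetical)

def ols_slope(n):
    """One draw of the OLS slope in y = x * beta0 + u (no intercept)."""
    x = rng.normal(size=n)
    u = rng.uniform(-0.5, 0.5, size=n)  # deliberately non-Gaussian errors
    y = x * beta0 + u
    return np.sum(x * y) / np.sum(x * x)

for n in (50, 500, 5000):
    b = np.array([ols_slope(n) for _ in range(2000)])
    # sd(b) shrinks like 1/sqrt(n); sd of sqrt(n)*(b - beta0) stays stable
    print(n, b.std(), (np.sqrt(n) * (b - beta0)).std())
```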
2. Modes of convergence

We are mainly concerned with four modes of convergence:
1. almost sure convergence;
2. convergence in probability;
3. convergence in quadratic mean;
4. convergence in distribution.

2.1. Almost sure convergence

Definition (Almost sure convergence). Let $X_1, X_2, \cdots, X_n$ be a sequence of (real-valued) random variables, and let $X$ be a stochastic or non-stochastic variable. $X_n$ converges almost surely to $X$ if, for every $\epsilon > 0$,

$$P\left(\lim_{n \to \infty} |X_n - X| < \epsilon\right) = 1.$$

It is written $X_n \xrightarrow{a.s.} X$.

Remark: For the sake of simplicity, we assume that $X$ is a degenerate random variable, i.e., a real number, say $c$.

Definition. A point estimator $\hat{\theta}_n$ of $\theta_0$ is strongly consistent if $\hat{\theta}_n \xrightarrow{a.s.} \theta_0$.

2.2. Convergence in probability

Definition. Let $\{X_i\}$, $i = 1, \cdots, n$, be a sequence of real-valued random variables. $X_n$ converges in probability to $c$, written $X_n \xrightarrow{p} c$ or $\mathrm{plim}\, X_n = c$, if there exists $c$ such that, for all $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - c| > \epsilon) = 0.$$

Remarks:
1. $X_n$ is very likely to be close to $c$ for large $n$, but what about the location of the remaining small probability mass which is not close to $c$?
2. Convergence in probability allows more erratic behavior in the converging sequence than almost sure convergence.

Definition. A point estimator $\hat{\theta}_n$ of $\theta_0$ is (weakly) consistent if $\hat{\theta}_n \xrightarrow{p} \theta_0$.

2.3. Convergence in mean square

Definition. Let $\{X_i\}$, $i = 1, \cdots, n$, be a sequence of real-valued random variables such that $E|X_n|^2 < \infty$. $X_n$ converges in mean square to $c$, written $X_n \xrightarrow{m.s.} c$, if there exists a real number $c$ such that

$$E\left[|X_n - c|^2\right] \xrightarrow[n \to \infty]{} 0.$$

2.4. Convergence in distribution

Definition. Let $X_1, \cdots, X_n$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ and $F$ denote the cumulative distribution functions of $X_n$ and $X$, respectively. $X_n$ converges in distribution to $X$ if

$$\lim_{n \to \infty} F_n(t) = F(t)$$

for all $t$ at which $F$ is continuous. Convergence in distribution is written $X_n \xrightarrow{d} X$, $X_n \xrightarrow{l} X$, or $X_n \stackrel{a}{\sim} X$.

2.5. Some handy theorems

Theorem (Continuous mapping theorem). Let $g : \mathbb{R}^m \to \mathbb{R}^p$ ($m, p \in \mathbb{N}$) be a multivariate function, and let $\{X_i\}$, $i = 1, \cdots, n$, denote any sequence of random $m \times 1$ vectors such that $X_n$ converges almost surely to $c$. If $g$ is continuous at $c$, then

$$g(X_n) \xrightarrow{a.s.} g(c).$$

(The same result holds for convergence in probability and in distribution.)

Example: Suppose that $X_1, \cdots, X_n$ are i.i.d. $\mathcal{P}(\lambda)$ (Poisson). Then $\bar{X}_n \xrightarrow{p} \lambda$ (WLLN) and, using the continuous mapping theorem, $\exp(-\bar{X}_n) \xrightarrow{p} \exp(-\lambda)$.
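A quick numerical check of the Poisson example ($\lambda = 1.5$ is an illustrative choice): both the sample mean and its continuous transform settle down as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 1.5  # illustrative Poisson rate

for n in (100, 10_000, 1_000_000):
    x = rng.poisson(lam, size=n)
    xbar = x.mean()
    # continuous mapping: exp(-xbar) should approach exp(-lam) ~ 0.2231
    print(n, xbar, np.exp(-xbar), np.exp(-lam))
```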
Theorem (Slutsky, convergence in probability). Let $X_n$ and $Y_n$ be two sequences of random variables.
1. If $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, then $X_n + Y_n \xrightarrow{p} X + Y$.
2. If $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, then $X_n Y_n \xrightarrow{p} XY$.
3. If $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} c$ with $c \neq 0$, then $X_n / Y_n \xrightarrow{p} X / c$.

Remark: This also holds for sequences of random matrices; the last statement then reads: if $X_n \xrightarrow{p} \Omega$, then $X_n^{-1} \xrightarrow{p} \Omega^{-1}$ (provided $\Omega^{-1}$ exists).

Theorem (Slutsky, convergence in distribution). Let $X_1, \cdots, X_n$ and $Y_1, \cdots, Y_n$ be two sequences of random variables, and let $X$ and $c$ be a random variable and a constant, respectively. If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$, then:

$$X_n Y_n \xrightarrow{d} cX, \qquad X_n + Y_n \xrightarrow{d} X + c, \qquad X_n / Y_n \xrightarrow{d} X / c \ \text{ for } c \neq 0.$$

3. Law of large numbers

3.1. Overview

The law of large numbers tells you that sample moments converge in probability (weak law of large numbers), almost surely (strong law of large numbers), or in $L^p$ ($L^p$ law of large numbers) to the corresponding population moments:

$$\bar{X}_n^{(r)} \equiv \frac{1}{n}\sum_{i=1}^{n} X_i^r \xrightarrow[n \to \infty]{} E\left[X_i^r\right].$$

In words: "the probability that the sample moment of order $r$ gets close to the population moment of order $r$ can be made as high as you like by taking a sufficiently large sample."

Example: The proportion of heads in a large number of (independent) tosses of a fair coin is expected to be close to 1/2 (a sketch appears at the end of this section).

[Figure: sample mean paths for $X_i \sim \mathcal{U}[-0.5, 0.5]$, $i = 1, \cdots, 1000$.]

General form of a law of large numbers: Suppose restrictions on the dependence, the distribution, and the moments of a sequence of random variables $\{Z_i\}$; then

$$\bar{Z}_n^{(r)} - \bar{m}_n^{(r)} \xrightarrow{a.s.} 0,$$

where $\bar{Z}_n^{(r)} \equiv n^{-1}\sum_{i=1}^{n} Z_i^r$ and $\bar{m}_n^{(r)} \equiv E\left[\bar{Z}_n^{(r)}\right]$.

Generally, there are four cases:
1. independent and identically distributed observations;
2. independent and heterogeneously distributed observations;
3. dependent and identically distributed observations;
4. dependent and heterogeneously distributed observations.

3.2. Law of large numbers in the i.i.d. case

Theorem (Khinchine). If $\{Z_i\}$, $i = 1, \cdots, n$, is a sequence of independently and identically distributed random variables with finite mean $E(Z_i) = \mu_0$, then:

$$\bar{Z}_n = \frac{1}{n}\sum_{i=1}^{n} Z_i \xrightarrow{p} \mu_0.$$

Theorem (Kolmogorov). If $\{Z_i\}$, $i = 1, \cdots, n$, is a sequence of independently and identically distributed random variables such that $E(|Z_i|) < +\infty$, with $E(Z_i) = \mu_0$, then:

$$\bar{Z}_n = \frac{1}{n}\sum_{i=1}^{n} Z_i \xrightarrow{a.s.} \mu_0.$$
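The coin-toss example above, as a sketch (the number of paths and tosses are arbitrary choices): each cumulative proportion of heads drifts toward 1/2, in line with Khinchine's and Kolmogorov's theorems.

```python
import numpy as np

rng = np.random.default_rng(2)
n_tosses = 10_000

for path in range(3):
    tosses = rng.integers(0, 2, size=n_tosses)  # fair coin: 0 = tails, 1 = heads
    running_mean = tosses.cumsum() / np.arange(1, n_tosses + 1)
    # the running proportion of heads approaches 1/2 along each path
    print(path, running_mean[99], running_mean[999], running_mean[-1])
```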
4. Consistency of the OLS estimator

Theorem (Consistency in the i.i.d. case). Under suitable regularity conditions, the OLS estimator of $\beta_0$ in the multiple linear regression model $y_i = x_i'\beta_0 + u_i$ satisfies:

$$\hat{\beta}_{n,OLS} \xrightarrow{p} \beta_0.$$

Proof: The ordinary least squares estimator is given by:

$$\hat{\beta}_{n,OLS} = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n} x_i y_i\right).$$

STEP 1: Replace $y_i$ by $x_i'\beta_0 + u_i$ and expand:

$$\hat{\beta}_{n,OLS} = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\sum_{i=1}^{n} x_i (x_i'\beta_0 + u_i) = \beta_0 + \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\sum_{i=1}^{n} x_i u_i.$$

STEP 2: Rewrite in sample-mean form (multiply and divide by $n$):

$$\hat{\beta}_{n,OLS} = \beta_0 + \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i u_i\right).$$

STEP 3: Apply the weak law of large numbers (the conditions hold):

$$\frac{1}{n}\sum_{i=1}^{n} x_i x_i' \xrightarrow{p} E[x_i x_i'], \qquad \frac{1}{n}\sum_{i=1}^{n} x_i u_i \xrightarrow{p} E[x_i u_i].$$

STEP 4: Apply the Slutsky theorem:

$$\hat{\beta}_{n,OLS} \xrightarrow{p} \beta_0 + E^{-1}[x_i x_i']\, E[x_i u_i] = \beta_0,$$

since $E^{-1}[x_i x_i']$ exists and $E(x_i u_i) = 0_{k \times 1}$ (an implication of the zero conditional mean condition).

Inconsistency of the OLS estimator:
- Failure of the zero conditional mean assumption, $E(u \mid X) = 0_{n \times 1}$, causes bias.
- Failure of $E(x_i u_i) = 0_{k \times 1}$ causes inconsistency.
- Consistency only requires zero correlation between $u$ and $X$, which is implied by, and weaker than, the conditional mean independence assumption.

5. Central limit theorems

5.1. Overview

One needs more than consistency to do inference, namely the sampling distribution of the OLS estimator:
- On the one hand, consistency (and thus the use of laws of large numbers) only yields a degenerate, point-mass distribution.
- On the other hand, the exact sampling distribution of the OLS estimator was obtained under the (conditional) normality of $u$ ($u \sim \mathcal{N}(\cdot,\cdot)$ or $u \mid X \sim \mathcal{N}(\cdot,\cdot)$). In practice, many outcomes under study are not (conditionally) normal!

In this respect, large sample theory tells us that the distribution of the OLS estimator is approximately normal. In doing so, one applies central limit theorems: "sample moments are asymptotically normally distributed (after re-normalizing), and the asymptotic variance-covariance matrix is given by the variance of the underlying random variable"; or, put differently, "appropriately normalized sample moments are approximately normally distributed in large enough samples."

[Figure: distribution of the normalized sample mean for $X_i \sim \mathcal{U}[-0.5, 0.5]$, $i = 1, \cdots, 1000$.]

General form of a central limit theorem: Suppose restrictions on the dependence, the distribution, and the moments of a sequence of random variables $\{Z_i\}$; then

$$\frac{\bar{Z}_n - \bar{m}_n}{\bar{\sigma}_n / \sqrt{n}} = \sqrt{n}\,\frac{\bar{Z}_n - \bar{m}_n}{\bar{\sigma}_n} \xrightarrow{d} \mathcal{N}(0, 1),$$

where $\bar{Z}_n \equiv n^{-1}\sum_{i=1}^{n} Z_i$, $\bar{m}_n \equiv E\left[\bar{Z}_n\right]$, and $\bar{\sigma}_n^2 \equiv V\left(\sqrt{n}\,\bar{Z}_n\right)$.
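The uniform example in the figure above can be reproduced in a few lines (2000 replications is an arbitrary choice). For $X_i \sim \mathcal{U}[-0.5, 0.5]$ we have $m = 0$ and $\sigma^2 = 1/12$, so the normalized mean should be close to standard normal:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 2000
sigma = np.sqrt(1 / 12)  # sd of U[-0.5, 0.5]

x = rng.uniform(-0.5, 0.5, size=(reps, n))
z = np.sqrt(n) * x.mean(axis=1) / sigma  # normalized sample means

# approximately N(0, 1): mean ~ 0, sd ~ 1, ~95% of mass in [-1.96, 1.96]
print(z.mean(), z.std(), np.mean(np.abs(z) <= 1.96))
```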
5.2. Univariate central limit theorem with i.i.d. observations

Theorem (Lindeberg-Lévy). Let $\{Z_i\}$ denote a sequence of independent and identically distributed real random variables, with mean $\mu_0 = E(Z_i)$ and variance $\sigma_0^2 = V(Z_i) < \infty$. If $\sigma_0^2 \neq 0$, then:

$$\sqrt{n}\,\frac{\bar{Z}_n - \bar{\mu}_n}{\bar{\sigma}_n} = \sqrt{n}\,\frac{\bar{Z}_n - \mu_0}{\sigma_0} = n^{-1/2}\sum_{i=1}^{n}\frac{Z_i - \mu_0}{\sigma_0} \xrightarrow{d} \mathcal{N}(0, 1),$$

where $\bar{Z}_n = n^{-1}\sum_{i=1}^{n} Z_i$, $\bar{\mu}_n = n^{-1}\sum_{i=1}^{n}\mu_0 = \mu_0$, and $\bar{\sigma}_n = \sigma_0$.

Application: Let $X_1, \cdots, X_n$ denote a sequence of independent and identically distributed random variables with $X_i \sim \mathcal{B}(p)$. Then:

$$\sqrt{n}\left(\bar{X}_n - p\right) \xrightarrow{d} \mathcal{N}(0, p(1 - p)).$$

Remarks:
1. Comparing the conditions of the Lindeberg-Lévy theorem with those of the law of large numbers for independent and identically distributed observations, only one additional requirement is imposed: $\sigma_0^2 = V(Z_i) < \infty$. This implies that $E|Z_i| < \infty$.
2. The central limit theorem requires virtually no assumptions (other than independence and finite variances) to end up with normality: normality is inherited from sums of "small" independent disturbances with finite variance.
3. The central limit theorem is "stronger" than the law of large numbers: conclusions can be drawn regarding the speed of convergence ($\sqrt{n}$) and the asymptotic behavior of the distribution.
4. There is no general result for checking how good the approximation is, although bounds exist in special cases (e.g., the Berry-Esseen inequality).
5. The central limit theorem does not assert that the sample mean tends to normality. It is the normalized transformation of the sample mean that has this property!

5.3. Multivariate central limit theorem with i.i.d. observations

Theorem. Let $Z_1, Z_2, \cdots, Z_n$ be independent and identically distributed random vectors of dimension $k$, $Z_i = (Z_{1i}, Z_{2i}, \cdots, Z_{ki})'$, with mean vector $\mu_0 = (\mu_{1,0}, \cdots, \mu_{k,0})'$ and positive definite variance-covariance matrix $\Sigma_0$. Let

$$\bar{Z}_n = \left(\bar{Z}_{1,n}, \cdots, \bar{Z}_{k,n}\right)',$$

where $\bar{Z}_{j,n} = n^{-1}\sum_{i=1}^{n} Z_{ji}$ (with $j = 1, \cdots, k$). Then:

$$\sqrt{n}\left(\bar{Z}_n - \mu_0\right) \xrightarrow{d} \mathcal{N}(0, \Sigma_0).$$
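A minimal sketch of the multivariate version, under an assumed non-Gaussian bivariate design (the construction below is an illustrative choice): the empirical covariance of the $\sqrt{n}$-normalized mean vector should approach $\Sigma_0$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 5000

# assumed design: Z = (X, X + Y) with X, Y i.i.d. Exp(1),
# so mu0 = (1, 2) and Sigma0 = [[1, 1], [1, 2]]
x = rng.exponential(size=(reps, n))
y = rng.exponential(size=(reps, n))
zbar = np.stack([x.mean(axis=1), (x + y).mean(axis=1)], axis=1)

mu0 = np.array([1.0, 2.0])
normalized = np.sqrt(n) * (zbar - mu0)

# empirical covariance should be close to Sigma0
print(np.cov(normalized.T))
```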
6. Large sample distribution of the OLS estimator

Theorem. Consider the multiple linear regression model $y_i = x_i'\beta_0 + u_i$ with assumptions H1-H5. Then, under suitable regularity conditions, the large sample distribution of the OLS estimator in the i.i.d. case is given by:

$$\sqrt{n}\left(\hat{\beta}_{n,OLS} - \beta_0\right) \xrightarrow{d} \mathcal{N}\left(0_{k \times 1},\ \sigma_0^2\, E^{-1}(x_i x_i')\right).$$

Proof: The ordinary least squares estimator is:

$$\hat{\beta}_{n,OLS} = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n} x_i y_i\right).$$

STEP 1: Proceed as in the consistency proof:

$$\hat{\beta}_{n,OLS} = \beta_0 + \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\sum_{i=1}^{n} x_i u_i.$$

STEP 2: Normalize the vector $\hat{\beta}_{n,OLS} - \beta_0$:

$$\sqrt{n}\left(\hat{\beta}_{n,OLS} - \beta_0\right) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i u_i\right).$$

STEP 3: Apply the weak law of large numbers and the central limit theorem. Using the WLLN (and the Slutsky theorem):

$$\left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1} \xrightarrow{p} E^{-1}[x_i x_i'].$$

Using the CLT:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i u_i \xrightarrow{d} \mathcal{N}\left(0_{k \times 1},\ V(x_i u_i)\right),$$

where

$$V(x_i u_i) = E\left(V(x_i u_i \mid x_i)\right) + V\left(E(x_i u_i \mid x_i)\right) = E\left(x_i V(u_i \mid x_i)\, x_i'\right) + 0 = \sigma_0^2\, E(x_i x_i').$$

STEP 4: The Slutsky theorem (convergence in distribution) implies that:

$$\left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i u_i\right) \xrightarrow{d} A^{-1} Z,$$

with $A = E[x_i x_i']$ and $Z \sim \mathcal{N}(0_{k \times 1}, \sigma_0^2 A)$, from which it follows that:

$$\sqrt{n}\left(\hat{\beta}_{n,OLS} - \beta_0\right) \xrightarrow{d} \mathcal{N}\left(A^{-1} 0_{k \times 1},\ \sigma_0^2 A^{-1} A A^{-1}\right) = \mathcal{N}\left(0_{k \times 1},\ \sigma_0^2 A^{-1}\right).$$

Remarks:
1. $\sigma_0^2 A^{-1}$ is unknown!
2. A consistent estimator of $\sigma_0^2$ is $\hat{\sigma}^2_{n,OLS} = (n - k)^{-1}\sum_{i=1}^{n}\hat{u}_i^2$ or $\tilde{\sigma}^2_{n,OLS} = n^{-1}\sum_{i=1}^{n}\hat{u}_i^2$.
3. The sample analog of $A^{-1}$ is $\left(n^{-1}\sum_{i=1}^{n} x_i x_i'\right)^{-1} = n\,(X'X)^{-1}$.
4. A consistent estimator of the asymptotic variance-covariance matrix of $\hat{\beta}_{n,OLS}$ is $V_{asy} = \hat{\sigma}^2_{n,OLS}\,(X'X)^{-1}$.

Definition. A consistent estimator $\hat{\theta}_n$ of $\theta_0$ is said to be asymptotically normally distributed (asymptotically normal) if:

$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{d} \mathcal{N}(0, \Sigma_0).$$

Equivalently, $\hat{\theta}_n$ is asymptotically normal if $\hat{\theta}_n \stackrel{a}{\sim} \mathcal{N}\left(\theta_0,\ n^{-1}\Sigma_0\right)$, with $V_{asy}(\hat{\theta}_n) \equiv \mathrm{avar}(\hat{\theta}_n) = n^{-1}\hat{\Sigma}_n$.

7. Extension: the Delta method

7.1. Example

Consider the following generalized learning curve (Berndt, 1992):

$$C_t = C_1\, N_t^{\alpha_c / R}\, Y_t^{(1 - R)/R}\, \exp(u_t), \qquad t = 1, \cdots, T,$$

where $C_t$, $N_t$, $Y_t$, and $u_t$ denote respectively the real unit cost at time $t$, the cumulative production up to time $t$, the production in time $t$, and an i.i.d. $(0, \sigma^2)$ error term. The two structural parameters are $\alpha_c$ (the learning curve parameter) and $R$ (the returns to scale parameter). The log-linear model writes:

$$\log(C_t) = \log(C_1) + \frac{\alpha_c}{R}\log(N_t) + \frac{1 - R}{R}\log(Y_t) + u_t = x_t'\beta + u_t \quad \text{(reduced-form equation)},$$

where $\beta_0 = \log C_1$, $\beta_1 = \alpha_c / R$, $\beta_2 = (1 - R)/R$, and $x_t = (1, \log N_t, \log Y_t)'$.

Starting from the asymptotic distribution of the estimator of $\beta$ (the reduced-form model), can we back out the asymptotic distribution of the structural parameters $(\alpha_c, R)'$? Three ingredients are needed:
1. The asymptotic distribution of the estimator of $\beta$ has to be known.
2. The structural parameters (or the parameters of interest) must be functions of $\beta$. Example: the learning curve parameters may be recovered using

$$\alpha_c = \frac{\beta_1}{1 + \beta_2} = g_1(\beta), \qquad R = \frac{1}{1 + \beta_2} = g_2(\beta).$$

3. Regularity condition(s).

7.2. Univariate Delta method

Proposition. Let $Z_1, \cdots, Z_n$ be a sequence of independent and identically distributed real random variables, with mean $E(Z_i) = \mu_0$ and variance $V(Z_i) = \sigma_0^2 < \infty$. If $\sigma_0^2 \neq 0$ and $g$ is a continuously differentiable function (from $\mathbb{R}$ to $\mathbb{R}$) with $g'(\mu_0) \neq 0$, then:

$$\sqrt{n}\left(g(\bar{Z}_n) - g(\mu_0)\right) \xrightarrow{d} \mathcal{N}\left(0,\ \sigma_0^2 \left[\frac{dg}{dz}(\mu_0)\right]^2\right).$$

Example: Let $X_1, \cdots, X_n$ be i.i.d. $\mathcal{B}(p)$ random variables. By the central limit theorem:

$$\sqrt{n}\left(\bar{X}_n - p\right) \xrightarrow{d} \mathcal{N}(0, p(1 - p)).$$

Find the limiting distribution of $g(\bar{X}_n) = \bar{X}_n / (1 - \bar{X}_n)$. Take $g(s) = s/(1 - s)$, so that $g'(s) = 1/(1 - s)^2$. We have:

$$\sqrt{n}\left(g(\bar{X}_n) - g(p)\right) \xrightarrow{d} \mathcal{N}\left(0,\ \frac{p}{(1 - p)^3}\right).$$
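A simulation check of this Bernoulli example ($p = 0.3$, the sample size, and the replication count are arbitrary choices): the variance of $\sqrt{n}\,(g(\bar{X}_n) - g(p))$ should be close to $p/(1-p)^3$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 0.3, 2000, 20_000

xbar = rng.binomial(n, p, size=reps) / n  # sample means of i.i.d. B(p)
g = lambda s: s / (1 - s)                 # odds transform
z = np.sqrt(n) * (g(xbar) - g(p))

# delta method predicts asymptotic variance p / (1 - p)^3
print(z.var(), p / (1 - p) ** 3)
```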
7.3. Multivariate Delta method

Proposition. Suppose that the conditions of the multivariate central limit theorem for independent and identically distributed random vectors hold, i.e.

$$\sqrt{n}\left(\bar{Z}_n - \mu_0\right) \xrightarrow{d} \mathcal{N}(0, \Sigma_0).$$

If $g$ is a continuously differentiable function from $\mathbb{R}^k$ to $\mathbb{R}^p$, then:

$$\sqrt{n}\left(g(\bar{Z}_n) - g(\mu_0)\right) \xrightarrow{d} \mathcal{N}\left(0,\ \frac{\partial g}{\partial z'}(\mu_0)\, \Sigma_0\, \frac{\partial g'}{\partial z}(\mu_0)\right),$$

where $\frac{\partial g}{\partial z'}$ denotes the $p \times k$ Jacobian matrix of $g$.

Example: the generalized learning curve (Berndt, 1992). Using the reduced-form model:

$$\sqrt{T}\left(\hat{\beta}_{OLS} - \beta\right) \xrightarrow{d} \mathcal{N}\left(0,\ \sigma^2\left(\frac{X'X}{T}\right)^{-1}\right),$$

or, using a consistent estimator of $\sigma^2$:

$$\hat{\beta}_{OLS} \stackrel{a}{\sim} \mathcal{N}\left(\beta,\ \hat{\sigma}^2 (X'X)^{-1}\right).$$

Therefore:

$$\begin{pmatrix}\hat{\alpha}_c \\ \hat{R}\end{pmatrix} \stackrel{a}{\sim} \mathcal{N}\left(g(\beta),\ \frac{\partial g(\hat{\beta})}{\partial \beta'}\, \hat{\sigma}^2 (X'X)^{-1}\, \frac{\partial g'(\hat{\beta})}{\partial \beta}\right),$$

where $g(\beta) = (\alpha_c, R)'$ and

$$\frac{\partial g(\hat{\beta})}{\partial \beta'} = \begin{pmatrix} \frac{\partial g_1(\hat{\beta})}{\partial \beta_0} & \frac{\partial g_1(\hat{\beta})}{\partial \beta_1} & \frac{\partial g_1(\hat{\beta})}{\partial \beta_2} \\ \frac{\partial g_2(\hat{\beta})}{\partial \beta_0} & \frac{\partial g_2(\hat{\beta})}{\partial \beta_1} & \frac{\partial g_2(\hat{\beta})}{\partial \beta_2} \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{1 + \hat{\beta}_2} & \frac{-\hat{\beta}_1}{(1 + \hat{\beta}_2)^2} \\ 0 & 0 & \frac{-1}{(1 + \hat{\beta}_2)^2} \end{pmatrix}.$$

(A numerical sketch of this computation follows the key concepts below.)

8. Key concepts
- How can we define the convergence of a sequence of real random variables? Are the definitions equivalent?
- What is the interpretation of a weak (strong) law of large numbers? Why do we use it?
- What does consistency mean? Show the weak consistency of the ordinary least squares estimator of $\beta_0$ (and $\sigma_0^2$) with i.i.d. observations.
- What is the interpretation of central limit theorems? Show the asymptotic distribution of the ordinary least squares estimator of $\beta_0$ with i.i.d. observations.
- What is the Delta method? Why is it useful?
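Finally, the multivariate delta-method computation for the learning curve of Section 7.3, as a sketch ($\hat{\beta}$ and the covariance matrix below are made-up inputs standing in for actual OLS output):

```python
import numpy as np

# hypothetical reduced-form OLS output: beta_hat = (log C1, alpha_c/R, (1-R)/R)
beta_hat = np.array([4.0, -0.25, 0.30])
vcov_beta = np.diag([0.10, 0.02, 0.01])  # stand-in for sigma2_hat * (X'X)^{-1}

b1, b2 = beta_hat[1], beta_hat[2]
alpha_c_hat = b1 / (1 + b2)  # g1(beta)
R_hat = 1 / (1 + b2)         # g2(beta)

# 2 x 3 Jacobian of g = (g1, g2) with respect to (beta0, beta1, beta2)
J = np.array([
    [0.0, 1 / (1 + b2), -b1 / (1 + b2) ** 2],
    [0.0, 0.0,          -1 / (1 + b2) ** 2],
])

vcov_structural = J @ vcov_beta @ J.T  # delta-method covariance of (alpha_c, R)
print(alpha_c_hat, R_hat)
print(vcov_structural)
```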