Distribution models
Ignacio Cascos, Depto. Estadística, Universidad Carlos III

Outline
1. Discrete distributions: binomial distribution, geometric distribution, Poisson distribution
2. Continuous distributions: uniform distribution, exponential distribution, normal distribution, Central Limit Theorem
3. Multivariate normal distribution

Binomial distribution
A random experiment consists of n trials such that:
- the trials are independent;
- each trial results either in success or failure;
- the probability of a success, p, remains constant.
The random variable that equals the number of trials that result in a success follows a binomial distribution with parameters n = 1, 2, … and 0 < p < 1:
$X \sim B(n, p)$, where $X \equiv$ number of (independent) trials that result in a success.

Probability mass function:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k \in \{0, 1, \dots, n\}.$$
It is possible to write $X = X_1 + \dots + X_n$, where the $X_i \sim B(1, p)$ are independent random variables.
Parameters: $E[X] = np$; $\mathrm{Var}[X] = np(1-p)$.
If $X \sim B(n_1, p)$ and $Y \sim B(n_2, p)$ are independent, then $X + Y \sim B(n_1 + n_2, p)$.

[Figure: probability mass functions of B(5, 0.7) and B(50, 0.7)]

Example (Four flatmates and a die)
Four flatmates, Alice, Bob, Charly, and Dave, roll a 6-sided die every night to decide who washes the dishes after dinner. If the outcome is 1, Alice washes; if it is 2, Bob does; for 3 or 4, Charly must wash the dishes; and for 5 or 6, it is Dave's turn.
a) What is the probability that Charly washes the dishes at most twice in a week (7 days)?
b) How many days (dinners) must we wait on average until it is Alice's turn to wash the dishes?
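A minimal Python sketch (standard library only) that evaluates the binomial probability mass function and applies it to part a) of the example. It assumes a fair die and independent nights, so the number of nights Charly washes in a week is B(7, 1/3); the function names `binom_pmf` and `binom_cdf` are illustrative, not taken from any library.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), straight from the pmf formula."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): the pmf summed over 0, 1, ..., k."""
    return sum(binom_pmf(j, n, p) for j in range(min(k, n) + 1))


# Part a): Charly washes on outcomes 3 or 4, so p = 2/6 = 1/3 per night
# (assuming a fair die), and X = nights Charly washes in 7 ~ B(7, 1/3).
n, p = 7, 1 / 3
prob_at_most_two = binom_cdf(2, n, p)

# Check E[X] = np and Var[X] = np(1 - p) by summing over the support.
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))

print(f"P(X <= 2) = {prob_at_most_two:.4f}")
print(f"mean = {mean:.4f} (np = {n * p:.4f})")
print(f"var  = {var:.4f} (np(1-p) = {n * p * (1 - p):.4f})")
```

Run as written, it should print P(X <= 2) = 0.5706 (exactly 1248/2187), together with moments matching np = 7/3 and np(1 − p) = 14/9.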
Geometric distribution
In a series of independent trials with constant probability of success $0 < p < 1$, the random variable denoting the number of trials until the first success (the success included) follows a geometric distribution with parameter p:
$X \sim G(p)$, where $X \equiv$ number of (independent) trials until the first success.

Probability mass function:
$$P(X = k) = (1-p)^{k-1} p, \quad k \in \{1, 2, 3, \dots\}.$$
Parameters: $E[X] = 1/p$; $\mathrm{Var}[X] = (1-p)/p^2$.

[Figure: probability mass functions of G(0.3) and G(0.5)]
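Part b) of the flatmates example is a geometric question: each dinner Alice washes with probability p = 1/6 (assuming a fair die), so the waiting time is G(1/6) and the average wait is E[X] = 1/p = 6 dinners. The sketch below checks the mean and variance formulas numerically by truncating the infinite series; `geom_pmf` and the cutoff `K` are illustrative choices.

```python
def geom_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ G(p): k - 1 failures followed by the first success."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


# Part b): Alice washes only on outcome 1, so p = 1/6 per dinner (fair die assumed).
p = 1 / 6

# Truncate the infinite sums at K terms; the neglected tail probability
# P(X > K) = (1 - p)**K is below 1e-150 for K = 2000.
K = 2000
support = range(1, K + 1)
mean = sum(k * geom_pmf(k, p) for k in support)
var = sum((k - mean) ** 2 * geom_pmf(k, p) for k in support)

print(f"E[X]   ~ {mean:.6f}  (1/p = {1 / p:.6f})")
print(f"Var[X] ~ {var:.6f}  ((1-p)/p^2 = {(1 - p) / p**2:.6f})")
```

The truncated sums should reproduce E[X] = 6 and Var[X] = (5/6)/(1/36) = 30 to printing precision.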
Poisson distribution
Assume that certain events occur in a fixed interval of real numbers (period of time, area, volume, …) independently of one another and at a known average rate, so that the mean number of events in the interval is $\lambda > 0$. The random variable that equals the number of events occurring in the interval follows a Poisson distribution with parameter λ:
$X \sim \wp(\lambda)$, where $X \equiv$ number of events in the interval.

Probability mass function:
$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \in \{0, 1, 2, \dots\}.$$
Parameters: $E[X] = \lambda$; $\mathrm{Var}[X] = \lambda$.
If $X \sim \wp(\lambda_1)$ and $Y \sim \wp(\lambda_2)$ are independent, then $X + Y \sim \wp(\lambda_1 + \lambda_2)$.

[Figure: probability mass functions of ℘(1) and ℘(3)]

Example (radioactive material)
A sample of radioactive material emits, on average, 15 alpha particles per minute. If the number of alpha particles emitted follows a Poisson distribution, what is the probability of 10 alpha particles being emitted in:
a) 1 minute?
b) 2 minutes?
c) Many years later, the material averages 6 alpha particles emitted per minute. What is the probability of at least 6 alpha particles being emitted in 1 minute?

(Continuous) uniform distribution
A random variable uniformly distributed in the interval (a, b) represents a number chosen at random between a and b.
The selection is made in such a way that the probability that the random variable lies in any interval inside (a, b) depends only on the length of that interval: $X \sim U(a, b)$.

Density function:
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } x \in (a, b) \\ 0 & \text{if } x \notin (a, b) \end{cases}$$
Cumulative distribution function:
$$F(x) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x-a}{b-a} & \text{if } a < x < b \\ 1 & \text{if } x \ge b \end{cases}$$
Parameters: $E[X] = (a+b)/2$; $\mathrm{Var}[X] = (b-a)^2/12$.

[Figure: density and cumulative distribution function of U(1, 3)]

Exponential distribution
The random variable that equals the distance between successive events in a Poisson process with rate $\lambda > 0$ (mean number of events per unit of time, length, …) follows an exponential distribution with parameter λ:
$X \sim \mathrm{Exp}(\lambda)$, where $X \equiv$ distance between successive events.

Density function:
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
Cumulative distribution function:
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
Parameters: $E[X] = \lambda^{-1}$; $\mathrm{Var}[X] = \lambda^{-2}$.

[Figure: density and cumulative distribution function of the exponential distribution with mean 10]
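The Poisson and exponential models describe the same process from two angles: if particles arrive as a Poisson process with rate λ, the waiting time T until the next one exceeds t exactly when no event falls in an interval of length t, so P(T > t) = P(N = 0) = e^{−λt} with N ~ ℘(λt). A minimal sketch using the radioactive-material numbers (15 particles per minute, time measured in minutes); the helper names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def expon_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1 - exp(-lam * x) if x > 0 else 0.0


rate = 15.0  # alpha particles per minute, from the example; time unit = minutes

# Counts: the Poisson parameter is the rate times the length of the interval.
p_10_in_1min = poisson_pmf(10, rate * 1)  # lambda = 15
p_10_in_2min = poisson_pmf(10, rate * 2)  # lambda = 30
# Part c): P(X >= 6) for X ~ Poisson(6), computed as 1 - P(X <= 5).
p_at_least_6 = 1 - sum(poisson_pmf(k, 6.0) for k in range(6))

# Waiting time T ~ Exp(rate): T > t exactly when no particle is emitted in (0, t].
t = 10 / 60  # 10 seconds expressed in minutes
survival_from_cdf = 1 - expon_cdf(t, rate)
survival_from_counts = poisson_pmf(0, rate * t)

print(f"P(X = 10) in 1 min: {p_10_in_1min:.4f}")
print(f"P(X = 10) in 2 min: {p_10_in_2min:.6f}")
print(f"P(X >= 6), lambda = 6: {p_at_least_6:.4f}")
print(f"P(T > 10 s): {survival_from_cdf:.4f} vs {survival_from_counts:.4f}")
```

Both waiting-time values should equal e^{−2.5} ≈ 0.0821, which is the identity P(T > t) = P(N = 0) in action.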
Lack of memory property. For an exponential random variable T and any $t_1, t_2 > 0$,
$$P(T > t_1 + t_2 \mid T > t_1) = P(T > t_2).$$

Example (radioactive material)
On average, a sample of radioactive material emits 15 alpha particles per minute.
a) What is the average time between the emission of two alpha particles?
b) What is the probability that the time between the emission of two alpha particles is longer than 10 seconds?
c) The last alpha particle was emitted 10 seconds ago. What is the probability that it still takes longer than 10 seconds until the next particle is emitted?

Normal distribution
The most widely used model for the distribution of a random variable is the normal (or Gaussian) distribution. Among other relevant properties, it appears as the limiting distribution in the Central Limit Theorem. A normal distribution is determined by two parameters, the mean µ and the standard deviation σ > 0: $X \sim N(\mu, \sigma)$.

Standard normal density function, N(0, 1):
$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$
Density function of $N(\mu, \sigma)$:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Parameters: $E[X] = \mu$; $\mathrm{Var}[X] = \sigma^2$.

[Figure: density and cumulative distribution function of N(0, 1)]
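The two density formulas are linked by a rescaling: the N(µ, σ) density equals φ((x − µ)/σ)/σ, where φ is the N(0, 1) density. A minimal sketch that codes both formulas, cross-checks them against Python's `statistics.NormalDist` (standard library, Python 3.8+), and recovers E[X] = µ and Var[X] = σ² with crude Riemann sums; the parameters µ = 2, σ = 1.5 and the helper names are arbitrary illustrative choices.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def std_normal_pdf(x: float) -> float:
    """Density of N(0, 1): exp(-x^2 / 2) / sqrt(2 pi)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma), with sigma the standard deviation."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


mu, sigma = 2.0, 1.5  # illustrative parameters
for x in (-1.0, 0.0, 2.0, 4.5):
    direct = normal_pdf(x, mu, sigma)
    rescaled = std_normal_pdf((x - mu) / sigma) / sigma  # standardize, then divide by sigma
    library = NormalDist(mu, sigma).pdf(x)
    print(f"x={x:5.1f}  direct={direct:.6f}  rescaled={rescaled:.6f}  NormalDist={library:.6f}")

# Riemann sums over mu +/- 10 sigma with step h approximate the integrals.
h = 0.001
grid = [mu - 10 * sigma + i * h for i in range(int(20 * sigma / h) + 1)]
mass = sum(normal_pdf(x, mu, sigma) for x in grid) * h
mean = sum(x * normal_pdf(x, mu, sigma) for x in grid) * h
var = sum((x - mu) ** 2 * normal_pdf(x, mu, sigma) for x in grid) * h
print(f"mass={mass:.6f}  mean={mean:.6f}  var={var:.6f}")
```

The three density columns should agree to printing precision, and the Riemann sums should return values close to 1, µ = 2 and σ² = 2.25.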
[Figure: normal densities; one panel compares N(0, 1) (black) and N(2, 1) (red), the other N(0, 0.5) (red), N(0, 1) (black) and N(0, 2) (blue)]

Properties.
1. If $X \sim N(\mu, \sigma)$, then for every $a \neq 0$ and b, $aX + b \sim N(a\mu + b, |a|\sigma)$.
2. If $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$ are independent, then for every a and b, not both zero, $aX + bY \sim N\left(a\mu_1 + b\mu_2, (a^2\sigma_1^2 + b^2\sigma_2^2)^{1/2}\right)$.

Standardization. Given $X \sim N(\mu, \sigma)$, the random variable $(X - \mu)/\sigma$ follows a standard normal distribution, N(0, 1).

Table for the N(0, 1) cdf
[Table: cumulative distribution function of N(0, 1)]

Example (Capacitors)
A machine makes capacitors with a mean capacitance of 25 µF and a standard deviation of 6 µF. Assuming that capacitance follows a Gaussian distribution, find the probability that the capacitance of a capacitor exceeds 31 µF.

Central Limit Theorem
Given n independent random variables $X_1, X_2, \dots, X_n$ with finite means $E[X_i] = \mu_i$ and variances $\mathrm{Var}[X_i] = \sigma_i^2$, their sum is approximately normal for large n (the normal distribution is its limit as $n \to \infty$, after standardization):
$$X_1 + X_2 + \dots + X_n \approx N\left(\sum_{i=1}^{n} \mu_i, \left(\sum_{i=1}^{n} \sigma_i^2\right)^{1/2}\right).$$
The approximation is usually good for n > 30. If the variables are discrete, we apply an adjustment called the continuity correction.

Normal approximations
Normal approximation to the binomial distribution. A binomial distribution B(n, p) with n > 30 and np(1 − p) > 5 is approximately $N\left(np, \sqrt{np(1-p)}\right)$.

[Figure: B(50, 0.7) and its normal approximation N(35, 3.24)]

Example (Four flatmates and a die)
Four flatmates, Alice, Bob, Charly, and Dave, roll a 6-sided die every night to decide who washes the dishes after dinner.
If the outcome is 1, Alice washes; if it is 2, Bob does; for 3 or 4, Charly must wash the dishes; and for 5 or 6, it is Dave's turn.
c) How many days (dinners) must we wait so that, with probability 0.95, Bob has washed the dishes at least 11 times?

Normal approximation to the Poisson distribution. A Poisson distribution $\wp(\lambda)$ with λ > 5 is approximately $N(\lambda, \lambda^{1/2})$.

[Figure: ℘(49) and its normal approximation N(49, 7)]

Example (radioactive material)
On average, a sample of radioactive material emits 15 alpha particles per minute. What is the approximate probability of 10 alpha particles being emitted in:
a) 1 minute?
b) Many years later, the material averages 6 alpha particles emitted per minute. What is the probability of at least 6 alpha particles being emitted in 1 minute?

Bivariate normal distribution
The density function of a random vector that follows a bivariate normal distribution with mean vector $(\mu_1, \mu_2)$ and covariance matrix Σ is
$$f(x_1, x_2) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x_1 - \mu_1, x_2 - \mu_2)\, \Sigma^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}\right).$$
If $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, then
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\,\frac{x_1-\mu_1}{\sigma_1}\cdot\frac{x_2-\mu_2}{\sigma_2}\right]\right).$$

If $X_1$ and $X_2$ have a bivariate normal distribution with mean vector $(\mu_1, \mu_2)$ and covariance matrix Σ, the marginal distributions of $X_1$ and $X_2$ are normal, $X_1 \sim N(\mu_1, \sigma_1)$ and $X_2 \sim N(\mu_2, \sigma_2)$. The correlation ρ measures the dependence between the variables.

[Figure: bivariate normal distributions with ρ = 0, σ_1 = σ_2; ρ = 0, σ_1 = 1, σ_2 = 3; ρ = 0.8, σ_1 = σ_2; and ρ = −0.8, σ_1 = σ_2]

Properties. Given $(X_1, X_2)$, a normal random vector with mean vector $(\mu_1, \mu_2)$ and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
1. if ρ = 0, then $X_1$ and $X_2$ are independent;
2. given $a_1, a_2 \in \mathbb{R}$, not both zero, $a_1X_1 + a_2X_2$ is normal, with mean $a_1\mu_1 + a_2\mu_2$ and variance $a_1^2\sigma_1^2 + 2a_1a_2\rho\sigma_1\sigma_2 + a_2^2\sigma_2^2$.

Example
Given $(X_1, X_2)$, a bivariate normal random vector with mean vector (50, 45) and covariance matrix
$$\Sigma = \begin{pmatrix} 6 & 2 \\ 2 & 4 \end{pmatrix},$$
determine:
a) $P(4X_1 + X_2 \ge 250)$
b) $P(X_1 + 4X_2 \ge 220)$
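Because any nontrivial linear combination of a bivariate normal vector is normal, each part of the last example reduces to a univariate normal probability. A minimal sketch, assuming we compute the variance of $a_1X_1 + a_2X_2$ as $a^\top \Sigma a$ (the same quantity as $a_1^2\sigma_1^2 + 2a_1a_2\rho\sigma_1\sigma_2 + a_2^2\sigma_2^2$) and use Python's `statistics.NormalDist` for the N(0, 1) cdf in place of the printed table; the helper names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Mean vector and covariance matrix from the example.
mu = (50.0, 45.0)
Sigma = ((6.0, 2.0),
         (2.0, 4.0))


def linear_combination(a, mu, Sigma):
    """Mean and standard deviation of a[0]*X1 + a[1]*X2 for a normal vector (X1, X2)."""
    mean = a[0] * mu[0] + a[1] * mu[1]
    # Var = a^T Sigma a = a1^2 s11 + 2 a1 a2 s12 + a2^2 s22
    var = sum(a[i] * Sigma[i][j] * a[j] for i in range(2) for j in range(2))
    return mean, sqrt(var)


def prob_at_least(threshold, a, mu, Sigma):
    """P(a1 X1 + a2 X2 >= threshold), using that the combination is normal."""
    mean, sd = linear_combination(a, mu, Sigma)
    z = (threshold - mean) / sd  # standardization
    return 1 - NormalDist().cdf(z)


p_a = prob_at_least(250, (4, 1), mu, Sigma)  # 4X1 + X2 ~ N(245, sqrt(116))
p_b = prob_at_least(220, (1, 4), mu, Sigma)  # X1 + 4X2 ~ N(230, sqrt(86))
print(f"a) {p_a:.4f}   b) {p_b:.4f}")
```

For this example the combinations are $4X_1 + X_2 \sim N(245, \sqrt{116})$ and $X_1 + 4X_2 \sim N(230, \sqrt{86})$; the printed probabilities should be roughly 0.32 and 0.86.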
The correlation ρ measures the dependence between the variables. Ignacio Cascos Depto. Estadística, Universidad Carlos III 42 Bivariate normal distribution rho=0, sigma1)1, sigma2=3 -3 -10 -2 -5 -1 0 0 1 5 2 10 rho=0, sigma1=sigma2 -1 0 1 2 3 rho=0.8, sigma1=sigma2 -10 -5 0 rho=-0.8, sigma1=sigma2 10 0 -5 0 -5 -4 Ignacio Cascos 5 5 -2 5 -3 -2 0 2 4 Depto. Estadística, Universidad Carlos III -4 -2 0 2 4 43 Bivariate normal distribution Properties. Given (X1,X2) a normal random vactor with mean vector (µ1,µ2) and covariance matrix σ 12 ρσ 1 σ 2 Σ = 2 ρσ σ σ 2 1 2 1. 2. Ignacio Cascos if ρ = 0 then X1 and X2 are independent ; given a1,a2∈IR, a1X1+a2X2 is normal . Depto. Estadística, Universidad Carlos III 44 Example Given (X1,X2) a normal bivariate random vector with mean vector (50,45) and covariance matrix 6 2 Σ = 2 4 Determine: a) P(4X1+X2 ≥ 250) b) P(X1+4X2 ≥ 220) Ignacio Cascos Depto. Estadística, Universidad Carlos III 45