4 RATIO AND REGRESSION METHODS OF ESTIMATION
Hukum Chandra
Indian Agricultural Statistics Research Institute, New Delhi-110012

4.1 INTRODUCTION

In sampling theory, auxiliary information is utilized in the following ways:

(i) At the pre-selection stage, i.e. for stratifying the population.
(ii) At the selection stage, i.e. in selecting the units with probabilities proportional to some suitable measure of size (the size being based on some auxiliary variable).
(iii) At the estimation stage, i.e. in the formulation of ratio-type, regression, difference and product estimators, etc.

Auxiliary information may also be utilized in mixed ways. The information is usually available in one of the following forms:

(i) The values of the auxiliary character(s) are known in advance for every sampling unit of the population.
(ii) The population total(s) or mean(s) of the auxiliary character(s) are known in advance.
(iii) The frequency distribution of some variate x is known, which is required if the population is to be stratified according to the values of x.

The use of auxiliary information at the estimation stage, in the formation of ratio-type and regression estimators and of sampling schemes providing an unbiased regression estimator, is discussed in the following sections.

In sample surveys, the characteristic y under study is often closely related to an auxiliary characteristic x, and data on x are either readily available or can easily be collected for all the units in the population. In such situations, it is customary to consider estimators of the population mean $\bar{Y}_N$ of the survey variable y that use the data on x and are more efficient than estimators that use the data on y alone. Such procedures are attractive because the data on the auxiliary variable can be brought in even after the sample has been selected.
Two commonly used methods of this type are:

(i) the ratio-type method of estimation
(ii) the regression method of estimation

4.2 RATIO-TYPE METHOD OF ESTIMATION

Let a sample of size n be drawn by SRSWOR (simple random sampling without replacement) from a population of size N. Denote by

$y_i$ = the value of the characteristic under study for the ith unit of the population,
$x_i$ = the value of the auxiliary characteristic for the ith unit of the population,
$Y$ = the total of the y values in the population,
$X$ = the total of the x values in the population,
$r_i = y_i / x_i$, the ratio of y to x for the ith unit,
$\bar{r}_n = \frac{1}{n}\sum_{i=1}^{n} r_i$, the simple arithmetic mean of the ratios for all the units in the sample,
$\bar{r}_N = \frac{1}{N}\sum_{i=1}^{N} r_i$, the simple arithmetic mean of the ratios for all the units in the population,
$R_N = \bar{Y}_N / \bar{X}_N = Y / X$, the ratio of the population mean of y to the population mean of x, and
$R_n = \bar{y}_n / \bar{x}_n = \sum_{i=1}^{n} y_i \big/ \sum_{i=1}^{n} x_i$, the corresponding ratio for the sample.

With this notation, an estimator of the population mean $\bar{Y}_N$ is given by

$$\bar{y}_R = R_n \bar{X}_N = \frac{\bar{y}_n}{\bar{x}_n}\,\bar{X}_N .$$

This estimator is known as the ratio-type estimator and presupposes knowledge of $\bar{X}_N$. Here, $R_n$ provides an estimator of the population ratio $R_N$. For example, if y is the number of bullocks on a holding and x its area in acres, the ratio $R_n$ is an estimator of the number of bullocks per acre of holding in the population. The product of $R_n$ with $\bar{X}_N$, the average size of a holding in acres, provides an estimator of $\bar{Y}_N$, the average number of bullocks per holding in the population.

4.2.1 Expected Value of the Ratio Estimator

Note that $R_n$ is a biased estimator of $R_N$, and the bias in $R_n$ is given by

$$\text{Bias in } R_n = -\frac{\mathrm{Cov}(R_n, \bar{x}_n)}{\bar{X}_N} .$$

The expected value of the ratio estimator to the first approximation is given by

$$E_1(\bar{y}_R) = \bar{Y}_N\left[1 + \left(\frac{N-n}{Nn}\right)\left(C_x^2 - \rho\, C_y C_x\right)\right],$$

where $C_x = S_x / \bar{X}_N$, $C_y = S_y / \bar{Y}_N$ and $\rho$ is the population correlation coefficient between x and y.
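The ratio-type estimator and its first-order bias term can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `ratio_estimate` and `first_order_bias`, the five-holding sample and the population mean 2.2 are hypothetical and do not come from the text.

```python
def ratio_estimate(y, x, x_bar_pop):
    """Ratio-type estimate of the population mean of y.

    y, x      : sample values of the study and auxiliary variables
    x_bar_pop : known population mean of x (X-bar_N), assumed given
    """
    if len(y) != len(x) or len(y) == 0:
        raise ValueError("y and x must be non-empty and of equal length")
    r_n = sum(y) / sum(x)        # R_n = y-bar_n / x-bar_n
    return r_n * x_bar_pop       # y-bar_R = R_n * X-bar_N


def first_order_bias(y_bar_pop, N, n, c_x, c_y, rho):
    """First-approximation bias E_1(y-bar_R) - Y-bar_N of the ratio estimator."""
    return y_bar_pop * (N - n) / (N * n) * (c_x ** 2 - rho * c_y * c_x)


# Hypothetical sample: bullocks (y) and area in acres (x) on five holdings
bullocks = [2, 4, 3, 6, 5]
area = [1.0, 2.0, 1.5, 3.0, 2.5]
y_bar_r = ratio_estimate(bullocks, area, x_bar_pop=2.2)
print(y_bar_r)
```

In this made-up sample every holding has exactly two bullocks per acre, so the estimate is simply twice the assumed population mean area.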
It may be noted that the bias to the first approximation vanishes when the regression of y on x is a straight line passing through the origin.

4.2.2 Variance of the Ratio Estimator

The variance of the ratio estimator to a first approximation is given by

$$V_1(R_n) = R_N^2\left(\frac{N-n}{Nn}\right)\left(C_y^2 + C_x^2 - 2\rho\, C_y C_x\right),$$

and the variance of the ratio estimator of the population mean to a first approximation is given by

$$V_1(\bar{y}_R) = \left(\frac{N-n}{Nn}\right)\left(S_y^2 + R_N^2 S_x^2 - 2 R_N S_{yx}\right).$$

4.2.3 Estimator of the Variance of the Ratio Estimator

A consistent estimator of the relative variance of the ratio estimator is given by

$$\frac{\hat{V}_1(R_n)}{R_N^2} = \left(\frac{N-n}{Nn}\right)\left(\frac{s_y^2}{\bar{y}_n^2} + \frac{s_x^2}{\bar{x}_n^2} - \frac{2 s_{yx}}{\bar{y}_n \bar{x}_n}\right),$$

and the estimator of the variance of the ratio estimator of the population mean to a first approximation is given by

$$\hat{V}_1(\bar{y}_R) = \left(\frac{N-n}{Nn}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{yx}\right),$$

where $s_y^2$, $s_x^2$ and $s_{yx}$ are the corresponding sample values.

4.2.4 Efficiency of the Ratio Estimator

In large samples, the ratio estimator will be more efficient than the corresponding estimator based on the simple arithmetic mean if

$$\rho\,\frac{C_y}{C_x} > \frac{1}{2} \quad\text{or}\quad \rho > \frac{1}{2}\,\frac{C_x}{C_y}.$$

If $C_x = C_y$, as may be expected, for example, when y and x denote values of the same variate in two consecutive periods, $\rho$ must be larger than one-half for the ratio estimator to be more efficient than the one based on the simple arithmetic mean.

4.3 RATIO ESTIMATOR IN STRATIFIED SAMPLING

Let there be K strata in the population. Let $N_t$ denote the number of units in the tth stratum and $n_t$ the size of the sample to be selected from it, so that

$$\sum_{t=1}^{K} N_t = N \quad\text{and}\quad \sum_{t=1}^{K} n_t = n.$$

Denote by $R_{n_t}$ the estimate of the population ratio $R_{N_t} = \bar{Y}_{N_t} / \bar{X}_{N_t}$ and by $\bar{y}_{Rt}$ the ratio estimate of the population mean $\bar{Y}_{N_t}$ for the tth stratum. Ratio estimators of the population mean $\bar{Y}_N = \sum_{t=1}^{K} \frac{N_t}{N}\,\bar{Y}_{N_t}$ are discussed in the following sections.

4.3.1 Separate Ratio Estimator ($\bar{y}_{Rs}$)

$$\bar{y}_{Rs} = \sum_{t=1}^{K} \frac{N_t}{N}\,\bar{y}_{Rt} = \sum_{t=1}^{K} p_t\,\bar{y}_{Rt}, \quad\text{where } p_t = \frac{N_t}{N} \quad (t = 1, \ldots, K).$$

This is a biased but consistent estimator of the population mean $\bar{Y}_N$.
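The separate ratio estimator applies the ratio method within each stratum and then weights the stratum estimates by $p_t = N_t/N$. A minimal Python sketch follows, assuming each stratum is supplied as a dictionary with hypothetical keys `N`, `y`, `x` and `x_bar_pop` (the known stratum mean of x); the two-stratum data are invented for illustration.

```python
def separate_ratio_estimate(strata):
    """Separate ratio estimate of the population mean of y.

    strata: list of dicts with keys
        N         - number of units in the stratum (N_t)
        y, x      - sample values in the stratum
        x_bar_pop - known stratum mean of x (assumed available per stratum)
    """
    N = sum(s["N"] for s in strata)
    estimate = 0.0
    for s in strata:
        p_t = s["N"] / N                      # stratum weight p_t = N_t / N
        r_nt = sum(s["y"]) / sum(s["x"])      # stratum sample ratio R_{n_t}
        estimate += p_t * r_nt * s["x_bar_pop"]   # p_t * y-bar_{Rt}
    return estimate


# Hypothetical two-stratum example (values are not from the text)
strata = [
    {"N": 60, "y": [10, 20], "x": [5, 10], "x_bar_pop": 8.0},
    {"N": 40, "y": [9, 15], "x": [3, 5], "x_bar_pop": 4.0},
]
y_bar_rs = separate_ratio_estimate(strata)
print(y_bar_rs)
```

Note that this form needs the population mean of x in every stratum, not only the overall mean.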
The bias to the first approximation is given by

$$\text{Bias in } \bar{y}_{Rs} = E_1(\bar{y}_{Rs}) - \bar{Y}_N = \sum_{t=1}^{K} p_t\,\bar{Y}_{N_t}\left(\frac{N_t - n_t}{N_t n_t}\right)\left(C_{tx}^2 - \rho_t\, C_{tx} C_{ty}\right),$$

where $C_{tx} = S_{tx} / \bar{X}_{N_t}$ and $C_{ty} = S_{ty} / \bar{Y}_{N_t}$.

The variance of $\bar{y}_{Rs}$ to a first approximation is given by

$$V_1(\bar{y}_{Rs}) = \sum_{t=1}^{K} p_t^2\left(\frac{1}{n_t} - \frac{1}{N_t}\right)\left(S_{ty}^2 + R_{N_t}^2 S_{tx}^2 - 2 R_{N_t} S_{txy}\right)$$

$$= \frac{1}{N}\sum_{t=1}^{K} p_t\left(\frac{N_t - n_t}{n_t}\right)\left(S_{ty}^2 + R_{N_t}^2 S_{tx}^2 - 2 R_{N_t} S_{txy}\right)$$

$$= \frac{1}{N}\sum_{t=1}^{K} p_t\left(\frac{N_t - n_t}{n_t}\right)\left(S_{ty}^2 + R_{N_t}^2 S_{tx}^2 - 2 R_{N_t}\,\rho_t\, S_{tx} S_{ty}\right).$$

The above formula is based on the assumption that $n_t$ is large. A consistent estimator of $V_1(\bar{y}_{Rs})$ is given by

$$\hat{V}_1(\bar{y}_{Rs}) = \frac{1}{N}\sum_{t=1}^{K} p_t\left(\frac{N_t - n_t}{n_t}\right)\left(s_{ty}^2 + R_{n_t}^2 s_{tx}^2 - 2 R_{n_t} s_{tyx}\right).$$

In practice, the assumption that $n_t$ is large is not always true. To get over this difficulty, a combined ratio estimator has been suggested, as below.

4.3.2 Combined Ratio Estimator ($\bar{y}_{Rc}$)

$$\bar{y}_{Rc} = \frac{\sum_{t=1}^{K} p_t\,\bar{y}_{n_t}}{\sum_{t=1}^{K} p_t\,\bar{x}_{n_t}}\;\bar{X}_N .$$

This is again a biased but consistent estimator. The relative bias to the first approximation is given by

$$\text{Relative bias in } \bar{y}_{Rc} = \frac{E_1(\bar{y}_{Rc}) - \bar{Y}_N}{\bar{Y}_N} = \sum_{t=1}^{K} p_t^2\left(\frac{N_t - n_t}{N_t n_t}\right)\left(C_{tx}^2 - \rho_t\, C_{tx} C_{ty}\right).$$

The variance of $\bar{y}_{Rc}$ to a first approximation is given by

$$V_1(\bar{y}_{Rc}) = \frac{1}{N}\sum_{t=1}^{K} p_t\left(\frac{N_t - n_t}{n_t}\right)\left(S_{ty}^2 + R_N^2 S_{tx}^2 - 2 R_N\,\rho_t\, S_{ty} S_{tx}\right),$$

and an estimator of the variance is given by

$$\hat{V}_1(\bar{y}_{Rc}) = \frac{1}{N}\sum_{t=1}^{K} p_t\left(\frac{N_t - n_t}{n_t}\right)\left(s_{ty}^2 + R_n^2 s_{tx}^2 - 2 R_n s_{tyx}\right),$$

where

$$R_{n_t} = \frac{\bar{y}_{n_t}}{\bar{x}_{n_t}} \quad\text{and}\quad R_n = \frac{\sum_{t=1}^{K} p_t\,\bar{y}_{n_t}}{\sum_{t=1}^{K} p_t\,\bar{x}_{n_t}} .$$

4.4 REGRESSION METHOD OF ESTIMATION

We have seen that the ratio estimate provides an efficient estimate of the population mean if the regression of y, the variable under study, on x, the auxiliary variable, is linear and the regression line passes through the origin. It frequently happens that, even though the regression of y on x is linear, the regression line does not pass through the origin. Under such conditions, it is more appropriate to use the regression method of estimation than the ratio method.
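Returning briefly to the combined ratio estimator of Section 4.3.2, the following Python sketch shows how it differs computationally from the separate form: the weighted stratum means are pooled first and a single ratio is taken. The function name, the dictionary keys and the two-stratum data are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def combined_ratio_estimate(strata, x_bar_pop):
    """Combined ratio estimate of the population mean of y.

    strata    : list of dicts with keys N (stratum size), y and x (sample values)
    x_bar_pop : known overall population mean of x (X-bar_N)
    """
    N = sum(s["N"] for s in strata)
    # Weighted (stratified) sample means of y and x, with weights p_t = N_t / N
    y_st = sum(s["N"] / N * fmean(s["y"]) for s in strata)
    x_st = sum(s["N"] / N * fmean(s["x"]) for s in strata)
    r_n = y_st / x_st             # single pooled ratio R_n
    return r_n * x_bar_pop


# Hypothetical two-stratum example (values are not from the text)
strata = [
    {"N": 60, "y": [10, 20], "x": [5, 10]},
    {"N": 40, "y": [9, 15], "x": [3, 5]},
]
y_bar_rc = combined_ratio_estimate(strata, x_bar_pop=6.4)
print(y_bar_rc)
```

Only the overall population mean of x is required here, which is one practical reason the combined form may be preferred when stratum samples are small.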
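4.4.1 Simple Regression Estimate

Since the regression coefficient $\beta$ is generally not known, the usual practice is to use its estimate

$$\hat{\beta} = \frac{s_{xy}}{s_x^2}, \quad\text{where}\quad s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x}_n)(y_i - \bar{y}_n) \quad\text{and}\quad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2,$$

giving the simple regression estimate

$$\bar{y}_{lr} = \bar{y}_n + \hat{\beta}\,(\bar{X}_N - \bar{x}_n).$$

Note: The general form of the estimator is $\hat{Y} = \bar{y}_n + k(\bar{X}_N - \bar{x}_n)$.

(i) If $k = \hat{\beta}$, then $\hat{Y} = \bar{y}_n + \hat{\beta}(\bar{X}_N - \bar{x}_n)$, i.e. $\hat{Y}$ is the regression estimator.
(ii) If $k = \bar{y}_n / \bar{x}_n$, then $\hat{Y} = \bar{y}_n + \frac{\bar{y}_n}{\bar{x}_n}(\bar{X}_N - \bar{x}_n) = \frac{\bar{y}_n}{\bar{x}_n}\bar{X}_N$, i.e. $\hat{Y}$ is the ratio estimator.

4.4.2 Expected Value of the Simple Regression Estimator

$$E(\bar{y}_{lr}) = \bar{Y}_N - \mathrm{Cov}(\hat{\beta}, \bar{x}_n),$$

showing that the simple regression estimate is biased by an amount $-\mathrm{Cov}(\hat{\beta}, \bar{x}_n)$.

4.4.3 Variance of the Simple Regression Estimate

To a first approximation,

$$V(\bar{y}_{lr}) \approx \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2\,(1 - \rho^2),$$

where $\rho$ is the correlation coefficient between y and x in the population.

4.4.4 Estimator of the Variance

$$\hat{V}(\bar{y}_{lr}) = \left(\frac{1}{n} - \frac{1}{N}\right) s_y^2\,(1 - r^2),$$

where $r = \dfrac{s_{xy}}{s_x s_y}$ is the sample correlation coefficient.

4.5 REGRESSION ESTIMATORS IN STRATIFIED SAMPLING

Corresponding to the two difference estimates, namely (i) the separate difference estimator and (ii) the combined difference estimator, two regression estimators are obtained when the regression coefficients have to be estimated from the sample.

4.5.1 Separate Regression Estimate

When the $\beta_i$'s are not known in the case of the separate difference estimator, we estimate them from the sample; the resulting estimator is known as the separate regression estimator:

$$\bar{y}_{lrs} = \sum_{i=1}^{K} p_i\left[\bar{y}_{n_i} + \hat{\beta}_i\,(\bar{X}_{N_i} - \bar{x}_{n_i})\right], \quad\text{where}\quad \hat{\beta}_i = \frac{s_{ixy}}{s_{ix}^2}.$$

This estimator is biased, and its variance to the first approximation is given by

$$V(\bar{y}_{lrs}) = \sum_{i=1}^{K} p_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right) S_{iy}^2\,(1 - \rho_i^2),$$

where $\rho_i$ is the correlation coefficient between y and x for the ith stratum. An estimator of this variance is

$$\hat{V}(\bar{y}_{lrs}) = \sum_{i=1}^{K} p_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\left(s_{iy}^2 + \hat{\beta}_i^2 s_{ix}^2 - 2\hat{\beta}_i s_{ixy}\right).$$

4.5.2 Combined Regression Estimator

When the pooled regression coefficient $\beta$ is not known, we replace it by its sample estimate $\hat{\beta}$ and get the combined regression estimator

$$\bar{y}_{lrc} = \sum_{i=1}^{K} p_i\,\bar{y}_{n_i} + \hat{\beta}\left(\bar{X}_N - \sum_{i=1}^{K} p_i\,\bar{x}_{n_i}\right),$$

where

$$\hat{\beta} = \frac{\sum_{i=1}^{K} p_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right) s_{ixy}}{\sum_{i=1}^{K} p_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right) s_{ix}^2}.$$

The variance of the estimator and its estimator, to the first approximation, are given by

$$V(\bar{y}_{lrc}) = \sum_{i=1}^{K} p_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\left(S_{iy}^2 + \beta^2 S_{ix}^2 - 2\beta S_{ixy}\right),$$

and

$$\hat{V}(\bar{y}_{lrc}) = \sum_{i=1}^{K} p_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\left(s_{iy}^2 + \hat{\beta}^2 s_{ix}^2 - 2\hat{\beta} s_{ixy}\right).$$

4.6 PRACTICAL EXAMPLES

Let $y_i$ $(i = 1, \ldots, N)$ be the variate under study and $x_i$ $(i = 1, \ldots, N)$ the auxiliary variate. Let N be the size of the population from which a sample of size n is drawn. Let $X$ be the population total of the auxiliary variate, so that $\bar{X}_N = X / N$.

STEP-I: Calculate

$$\sum_{i=1}^{n} y_i,\quad \sum_{i=1}^{n} x_i,\quad \sum_{i=1}^{n} y_i^2,\quad \sum_{i=1}^{n} x_i^2 \quad\text{and}\quad \sum_{i=1}^{n} x_i y_i .$$

STEP-II: Calculate

$$s_y^2 = \frac{1}{n-1}\left[\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}\right],\quad s_x^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right],\quad s_{xy} = \frac{1}{n-1}\left[\sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}\right],$$

$$b = \frac{s_{xy}}{s_x^2},\quad r = \frac{s_{xy}}{s_x s_y},\quad \bar{y}_n = \frac{1}{n}\sum y_i,\quad \bar{x}_n = \frac{1}{n}\sum x_i,\quad R_n = \frac{\bar{y}_n}{\bar{x}_n},\quad \bar{X}_N = \frac{X}{N}.$$

STEP-III: Calculate

(a) Ratio estimate:

$$\bar{y}_R = \frac{\bar{y}_n}{\bar{x}_n}\,\bar{X}_N .$$

Estimate of its variance:

$$\hat{V}(\bar{y}_R) = \left(\frac{1}{n} - \frac{1}{N}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right).$$

(b) Regression estimate:

$$\bar{y}_{lr} = \bar{y}_n + b\,(\bar{X}_N - \bar{x}_n).$$

Estimate of its variance:

$$\hat{V}(\bar{y}_{lr}) = \left(\frac{1}{n} - \frac{1}{N}\right)\left(s_y^2 + b^2 s_x^2 - 2 b\, s_{xy}\right) = \left(\frac{1}{n} - \frac{1}{N}\right)(1 - r^2)\, s_y^2 .$$

(c) Simple mean estimate:

$$\bar{y}_{srs} = \bar{y}_n .$$

Estimate of its variance:

$$\hat{V}(\bar{y}_{srs}) = \left(\frac{1}{n} - \frac{1}{N}\right) s_y^2 .$$

STEP-IV: Calculate the estimates of relative efficiency

(a) Estimate of relative efficiency of the ratio estimate over the simple mean estimate $= \dfrac{\hat{V}(\bar{y}_{srs})}{\hat{V}(\bar{y}_R)} \times 100$

(b) Estimate of relative efficiency of the regression estimate over the simple mean estimate $= \dfrac{\hat{V}(\bar{y}_{srs})}{\hat{V}(\bar{y}_{lr})} \times 100$

(c) Estimate of relative efficiency of the regression estimate over the ratio estimate $= \dfrac{\hat{V}(\bar{y}_R)}{\hat{V}(\bar{y}_{lr})} \times 100$

Note: The estimate of the standard error (SE) of an estimate is obtained by taking the square root of the corresponding estimate of the variance.

Practical Exercise 1

A sample survey for the study of yield and cultivation practices of guava was conducted in Allahabad district. Out of a total of 146 guava-growing villages in Phulpur-Saran tehsil, 13 villages were selected by the method of simple random sampling.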
The table below presents the total number of guava trees and the area under guava orchards for the 13 selected villages. It is also given that the total area under guava orchards in the 146 villages is 354.78 acres. Using the area under guava orchards as the auxiliary variate, estimate the total number of guava trees in the tehsil, along with its standard error, by using (i) the ratio method of estimation and (ii) the regression method of estimation. (iii) Compare the efficiency of these estimates with that of the estimate which does not make use of the information on the auxiliary variate.

| Sl. No. of village | Total number of guava trees ($y_i$) | Area under guava orchards in acres ($x_i$) |
|---|---|---|
| 1 | 492 | 4.80 |
| 2 | 1008 | 5.99 |
| 3 | 714 | 4.27 |
| 4 | 1265 | 8.43 |
| 5 | 1889 | 14.39 |
| 6 | 784 | 6.53 |
| 7 | 294 | 1.88 |
| 8 | 798 | 6.35 |
| 9 | 780 | 6.58 |
| 10 | 619 | 9.18 |
| 11 | 403 | 2.00 |
| 12 | 467 | 2.20 |
| 13 | 197 | 1.00 |

SOLUTION:

$\sum y_i = 9710$, $\sum x_i = 73.60$, $\sum y_i^2 = 9685234$, $\sum x_i^2 = 579.20$, $\sum x_i y_i = 72879.72$

$s_y^2 = 202717.60$, $s_x^2 = 13.54$, $s_{xy} = 1492.18$, $b = 110.19$, $r = 0.90$

$\bar{y}_n = 746.92$, $\bar{x}_n = 5.66$, $R_n = 131.93$, $\bar{X}_N = 2.43$

| Estimator (mean per village) | Estimate | Estimate of variance | Estimate of standard error |
|---|---|---|---|
| Ratio, $\bar{y}_R$ | 320.59 | 3132.35 | 55.97 |
| Regression, $\bar{y}_{lr}$ | 390.85 | 2683.74 | 51.80 |
| Simple mean, $\bar{y}_n$ | 746.92 | 14205.18 | 119.19 |

(a) Estimate of relative efficiency of the ratio estimate over the simple mean estimate = 453.50
(b) Estimate of relative efficiency of the regression estimate over the simple mean estimate = 529.31
(c) Estimate of relative efficiency of the regression estimate over the ratio estimate = 116.72

Practical Exercise 2

A sample survey was conducted for studying milk yield, feeding and management practices of cattle and buffaloes in the eastern districts of U.P. The whole of the eastern districts of U.P. was divided into four zones (strata). The table below presents the total number of milch cows in 17 randomly selected villages of Zone-I, as enumerated in the winter season and as per the Livestock Census.

Number of milch cows

| Sl. No. of village | Winter season ($y_i$) | Livestock Census ($x_i$) |
|---|---|---|
| 1 | 29 | 41 |
| 2 | 44 | 44 |
| 3 | 25 | 27 |
| 4 | 38 | 53 |
| 5 | 37 | 17 |
| 6 | 27 | 40 |
| 7 | 63 | 53 |
| 8 | 53 | 46 |
| 9 | 64 | 89 |
| 10 | 30 | 37 |
| 11 | 53 | 70 |
| 12 | 25 | 15 |
| 13 | 16 | 30 |
| 14 | 15 | 18 |
| 15 | 12 | 22 |
| 16 | 12 | 13 |
| 17 | 23 | 66 |

Estimate the number of milch cows per village, with its standard error, for the rural area of Zone-I in the winter season by using (i) the ratio method of estimation and (ii) the regression method of estimation. It is given that the total number of milch cows in Zone-I as per the Livestock Census was 10,87,004 and the number of villages in Zone-I was 22,654. Also compare the efficiency of these estimates with that of the simple mean estimate.

SOLUTION:

$\sum y_i = 566$, $\sum x_i = 681$, $\sum y_i^2 = 23450$, $\sum x_i^2 = 34617$, $\sum x_i y_i = 26879$

$s_y^2 = 287.85$, $s_x^2 = 458.56$, $s_{xy} = 262.86$, $b = 0.57$, $r = 0.72$

$\bar{y}_n = 33.29$, $\bar{x}_n = 40.06$, $R_n = 0.83$, $\bar{X}_N = 47.98$

| Estimator (mean per village) | Estimate | Estimate of variance | Estimate of standard error |
|---|---|---|---|
| Ratio, $\bar{y}_R$ | 39.88 | 9.86 | 3.14 |
| Regression, $\bar{y}_{lr}$ | 37.84 | 8.06 | 2.84 |
| Simple mean, $\bar{y}_n$ | 33.29 | 16.92 | 4.11 |

(a) Estimate of relative efficiency of the ratio estimate over the simple mean estimate = 171.67
(b) Estimate of relative efficiency of the regression estimate over the simple mean estimate = 209.85
(c) Estimate of relative efficiency of the regression estimate over the ratio estimate = 122.24

Practical Exercise 3

A pilot sample survey for estimating the extent of cultivation and production of fresh fruits was conducted in three districts of Uttar Pradesh State during the agricultural year 1976-77. The following data were collected.

| Stratum number | Total number of villages ($N_m$) | Total area under orchards in ha. ($X_m$) | Number of villages in sample ($n_m$) |
|---|---|---|---|
| 1 | 985 | 11253 | 6 |
| 2 | 2196 | 25115 | 8 |
| 3 | 1020 | 18870 | 11 |

| Stratum number | Area under orchards in ha. ($x_m$) | Total number of trees ($y_m$) |
|---|---|---|
| 1 | 10.63, 9.90, 1.45, 3.38, 5.17, 10.35 | 747, 719, 78, 201, 311, 448 |
| 2 | 14.66, 2.61, 4.35, 9.87, 2.42, 5.60, 4.70, 36.75 | 580, 103, 316, 739, 196, 235, 212, 1646 |
| 3 | 11.60, 5.29, 7.94, 7.29, 8.00, 1.20, 11.50, 1.70, 2.01, 7.96, 23.15 | 488, 227, 374, 491, 499, 50, 455, 47, 879, 115, 115 |

Estimate the total number of trees in the three districts by different methods and compare their precision.

SOLUTION:

The calculations are shown in the table below, where $W_m = N_m / N$ and $\hat{R}_m = \bar{y}_m / \bar{x}_m$.

| Stratum | $W_m$ | $\frac{1}{n_m} - \frac{1}{N_m}$ | $\bar{x}_m$ | $\bar{y}_m$ | $\hat{R}_m$ | $W_m \bar{x}_m$ | $W_m \bar{y}_m$ | $s_{xm}^2$ | $s_{ym}^2$ | $s_{xym}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.2345 | 0.16598 | 6.81 | 417.33 | 61.28 | 1.60 | 97.66 | 16.03 | 74778.80 | 1008.75 |
| 2 | 0.5227 | 0.12454 | 10.07 | 503.38 | 49.99 | 5.26 | 263.12 | 129.64 | 259107.98 | 5643.81 |
| 3 | 0.2428 | 0.08902 | 7.97 | 340.00 | 42.66 | 1.94 | 82.55 | 38.39 | 65885.60 | 1403.69 |

In what follows, $\hat{Y}$ denotes an estimate of the population total, $X = \sum_m X_m$ the total area under orchards, $\bar{X} = X / N$ and $\bar{X}_m = X_m / N_m$.

(A) RATIO ESTIMATORS

(i) Separate ratio estimate ($\hat{Y}_{Rs}$)

$$\hat{Y}_{Rs} = \sum_{m=1}^{K} \hat{R}_m X_m = 2750077 .$$

Estimate of its variance:

$$\hat{V}(\hat{Y}_{Rs}) = \sum_{m=1}^{K} N_m^2\left(\frac{1}{n_m} - \frac{1}{N_m}\right)\left(s_{ym}^2 + \hat{R}_m^2 s_{xm}^2 - 2\hat{R}_m s_{xym}\right) = 2441137855.48 .$$

(ii) Combined ratio estimate ($\hat{Y}_{Rc}$)

$$\hat{Y}_{Rc} = \frac{\sum_m W_m \bar{y}_m}{\sum_m W_m \bar{x}_m}\,X = 2783995 .$$

Estimate of its variance:

$$\hat{V}(\hat{Y}_{Rc}) = \sum_{m=1}^{K} N_m^2\left(\frac{1}{n_m} - \frac{1}{N_m}\right)\left(s_{ym}^2 + \hat{R}^2 s_{xm}^2 - 2\hat{R}\, s_{xym}\right), \quad\text{where}\quad \hat{R} = \frac{\sum_m W_m \bar{y}_m}{\sum_m W_m \bar{x}_m}.$$

(iii) Efficiency of the separate ratio estimate ($\hat{Y}_{Rs}$) over the combined ratio estimate ($\hat{Y}_{Rc}$)

$$\text{Estimate of relative precision (R.P.)} = \frac{\hat{V}(\hat{Y}_{Rc})}{\hat{V}(\hat{Y}_{Rs})} \times 100 = 246.58\% .$$

(B) REGRESSION ESTIMATORS

(i) Separate regression estimate ($\hat{Y}_{ls}$)

$$\hat{Y}_{ls} = \sum_{m=1}^{K} N_m\left[\bar{y}_m + b_m\,(\bar{X}_m - \bar{x}_m)\right] = 2672911, \quad\text{where}\quad b_m = \frac{s_{xym}}{s_{xm}^2}.$$

Estimate of its variance:

$$\hat{V}(\hat{Y}_{ls}) = \sum_{m=1}^{K} N_m^2\left(\frac{1}{n_m} - \frac{1}{N_m}\right)\left(s_{ym}^2 - b_m^2 s_{xm}^2\right) = 1870633332 .$$

(ii) Combined regression estimate ($\hat{Y}_{lc}$)

$$\hat{Y}_{lc} = N\left[\bar{y}_{st} + b_c\,(\bar{X} - \bar{x}_{st})\right] = 2643949,$$

where

$$\bar{y}_{st} = \sum_{m=1}^{K} W_m \bar{y}_m,\quad \bar{x}_{st} = \sum_{m=1}^{K} W_m \bar{x}_m \quad\text{and}\quad b_c = \frac{\sum_{m=1}^{K}\sum_{j=1}^{n_m}(y_{mj} - \bar{y}_m)(x_{mj} - \bar{x}_m)}{\sum_{m=1}^{K}\sum_{j=1}^{n_m}(x_{mj} - \bar{x}_m)^2}.$$

Estimate of its variance:

$$\hat{V}(\hat{Y}_{lc}) = N^2\sum_{m=1}^{K} \frac{W_m^2\,(1 - f_m)}{n_m (n_m - 1)}\sum_{j=1}^{n_m}\left[(y_{mj} - \bar{y}_m) - b_c\,(x_{mj} - \bar{x}_m)\right]^2 = 2020917640, \quad\text{where}\quad f_m = \frac{n_m}{N_m}.$$

(a) Estimate of efficiency of the separate regression estimate ($\hat{Y}_{ls}$) over the separate ratio estimate ($\hat{Y}_{Rs}$):

$$\text{R.P.} = \frac{\hat{V}(\hat{Y}_{Rs})}{\hat{V}(\hat{Y}_{ls})} \times 100 = 130.50\% .$$

(b) Estimate of efficiency of the combined regression estimate ($\hat{Y}_{lc}$) over the combined ratio estimate ($\hat{Y}_{Rc}$):

$$\text{R.P.} = \frac{\hat{V}(\hat{Y}_{Rc})}{\hat{V}(\hat{Y}_{lc})} \times 100 = 297.86\% .$$

(c) Estimate of efficiency of the separate regression estimate ($\hat{Y}_{ls}$) over the combined regression estimate ($\hat{Y}_{lc}$):

$$\text{R.P.} = \frac{\hat{V}(\hat{Y}_{lc})}{\hat{V}(\hat{Y}_{ls})} \times 100 = 108.03\% .$$
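The STEP-I to STEP-IV procedure of Section 4.6 can be checked numerically. The Python sketch below applies it to the guava data of Practical Exercise 1 (N = 146 villages, total area 354.78 acres); the function name `srs_estimates` and the dictionary keys are hypothetical choices made for this illustration, not notation from the text.

```python
from math import sqrt


def srs_estimates(y, x, N, x_total):
    """Ratio, regression and simple-mean estimates of the population mean
    under SRSWOR, with first-approximation variance estimates (Section 4.6)."""
    n = len(y)
    # STEP-I: sums, sums of squares and sum of products
    sum_y, sum_x = sum(y), sum(x)
    sum_yy = sum(v * v for v in y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    # STEP-II: sample variances, covariance, slope, correlation, means
    s2_y = (sum_yy - sum_y ** 2 / n) / (n - 1)
    s2_x = (sum_xx - sum_x ** 2 / n) / (n - 1)
    s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
    b = s_xy / s2_x
    r = s_xy / sqrt(s2_x * s2_y)
    y_bar, x_bar = sum_y / n, sum_x / n
    R_n = y_bar / x_bar
    X_bar = x_total / N
    fpc = 1 / n - 1 / N
    # STEP-III: estimates and estimated variances
    return {
        "ratio": R_n * X_bar,
        "var_ratio": fpc * (s2_y + R_n ** 2 * s2_x - 2 * R_n * s_xy),
        "regression": y_bar + b * (X_bar - x_bar),
        "var_regression": fpc * (1 - r ** 2) * s2_y,
        "mean": y_bar,
        "var_mean": fpc * s2_y,
    }


# Data of Practical Exercise 1: guava trees (y) and orchard area in acres (x)
trees = [492, 1008, 714, 1265, 1889, 784, 294, 798, 780, 619, 403, 467, 197]
area = [4.80, 5.99, 4.27, 8.43, 14.39, 6.53, 1.88, 6.35, 6.58, 9.18,
        2.00, 2.20, 1.00]
res = srs_estimates(trees, area, N=146, x_total=354.78)

# STEP-IV: relative efficiencies (in per cent)
re_ratio_vs_mean = 100 * res["var_mean"] / res["var_ratio"]
re_reg_vs_mean = 100 * res["var_mean"] / res["var_regression"]
re_reg_vs_ratio = 100 * res["var_ratio"] / res["var_regression"]

for name, value in res.items():
    print(f"{name:15s} {value:12.2f}")
```

Up to rounding, the printed estimates and variances should agree with the figures reported in the solution of Practical Exercise 1; standard errors follow by taking square roots of the variance entries.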