Sampling
Transcription
[Figure: the same image shown with more pixels and with fewer pixels]

Basic Statistics

Basic Statistics of Universe: Central Tendency
Population Mean and Total
• This statistic is a measure of the “central” value in the observed Total Population.
N: the number of all elements in the Total Population
$X_i$: the value of each element in the Total Population
Population Mean: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
Population Total: $X = \sum_{i=1}^{N} X_i = N\bar{X}$

Basic Statistics of Universe: Spread
Population Variance ($S^2$, also written $\sigma^2$)
This statistic is a measure of how variable the Population is about the MEAN $\bar{X}$, i.e. how far, on average, is each element $X_i$ from the mean?
$\sum_{i=1}^{N} (X_i - \bar{X})$ … but this is zero by definition
$\sum_{i=1}^{N} (X_i - \bar{X})^2$ … squaring removes the negative values
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2$ … is the population variance

Population Variability: Example
Population I (1, 2, 3, 4, 5, 6, 7, 8, 9, 10): MEAN = 5.5, VARIANCE = 9.17
Population II (1, 2, 3, 4, 5, 11, 12, 13, 14, 15): MEAN = 8, VARIANCE = 30

Population I: Mean and Variance
$X_i$    $X_i - \bar{X}$    $(X_i - \bar{X})^2$
1        -4.5               20.25
2        -3.5               12.25
3        -2.5               6.25
4        -1.5               2.25
5        -0.5               0.25
6        0.5                0.25
7        1.5                2.25
8        2.5                6.25
9        3.5                12.25
10       4.5                20.25
Sum: 55, 0, 82.5
Mean: $\bar{X} = 55 / 10 = 5.5$
Variance: $S^2 = 82.5 / (10 - 1) = 9.17$

Sampling
[Figure: the same image shown with more dots and with fewer dots]

Sample
• A sample is a subset of the universe that is used to draw conclusions or make inferences about the universe.
• It reduces the time, effort and cost of estimating parameters such as market size, a brand’s sales volume or market share.
• Sample size is usually a commercial decision that weighs cost against benefit:
– Small, unreliable samples are not meaningful
– Large, overly accurate samples that no one can afford do not make commercial sense either

Sample size depends on
• Population Variability
– The larger the variance, the bigger the sample needed
• Product Distribution
• Sample Design
– Optimum stratification and allocation
– Selection
– Projection
• Level of Accuracy
– The higher the required accuracy, the larger the sample
Sample size does not depend on Universe size (provided the universe is large).

Why large samples in India and China?
• Retail Environment: Highly Variable
• Product Distribution: Low
• Difficult to Measure
• Need Large Sample
• Reliability is more Expensive

Central Limit Theorem
• The determination of sample size is based partly on the Central Limit Theorem, which states that the sampling distribution of the mean approaches a normal distribution as the sample size (n) increases, even if the population distribution is not normal.
• This means that when we draw samples of the same size from the universe (e.g. retail outlets), each sample will yield a different result for any variable being measured (such as rate of brand sales per store). However, all samples of the same size and design can be expected to yield results within a measurable range around the true value.

Different samples give different results
Universe size = N
Sample size = n
True value = μ
Measure: Sample 1: $\bar{x}_1$, Sample 2: $\bar{x}_2$, Sample 3: $\bar{x}_3$, Sample 4: $\bar{x}_4$, …
Based on the Central Limit Theorem, the frequency distribution of the values $\bar{x}_1, \bar{x}_2, \ldots$ follows a bell-shaped curve called the normal distribution.

Normal Distribution
[Figure: normal distribution density function; axes: Frequency (%) vs Estimated Value x]
The frequency distribution of estimates $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \ldots$ of a variable X from multiple samples follows the bell-shaped normal distribution curve.
Samples of the same size yield results within a measurable range around the true value.
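A short simulation can make this concrete. The sketch below is an illustration under assumptions of our own, not part of the original method: it builds a hypothetical, deliberately skewed universe of 10,000 stores (exponentially distributed sales with a mean of about 200), draws 2,000 samples of n = 100, and counts how many sample means land within ±1.65 standard errors ($S/\sqrt{n}$) of the true mean. If the Central Limit Theorem holds as described, that share should come out near 90%, even though the universe itself is far from normal.

```python
import random
import statistics

# Hypothetical skewed universe: 10,000 stores whose weekly brand sales are
# exponentially distributed (mean about 200), i.e. clearly non-normal.
rng = random.Random(42)
universe = [rng.expovariate(1 / 200) for _ in range(10_000)]
true_mean = statistics.fmean(universe)


def sample_means(population, n, trials, seed=0):
    """Draw `trials` simple random samples of size n (without replacement)
    and return the mean of each one."""
    r = random.Random(seed)
    return [statistics.fmean(r.sample(population, n)) for _ in range(trials)]


n = 100
means = sample_means(universe, n, trials=2_000)

# CLT: sample means are roughly normal around the true mean, with standard
# error S / sqrt(n) (the finite population correction is ignored here).
se = statistics.stdev(universe) / n ** 0.5
within = sum(abs(m - true_mean) <= 1.65 * se for m in means) / len(means)

print(f"true mean: {true_mean:.1f}, average of sample means: {statistics.fmean(means):.1f}")
print(f"share of samples within ±1.65 SE: {within:.1%}")  # should be close to 90%
```

Changing `n` or the seed changes the individual sample means, but the share within ±1.65 SE should stay near 90%. For much smaller samples drawn from a very skewed universe, the normal approximation becomes rougher.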
For a normal distribution, 68% of all observations fall within ±1 standard deviation of the mean. The normal distribution is defined by a function with two parameters: the mean (μ) and the standard deviation (σ).
Normal distribution represents one of the empirically verified elementary “truths about the general nature of reality”.

Probability that estimated value will lie within ±3% of actual value is 90%
[Figure: normal curve over Estimated Value x, with the central 90% lying between μ - 1.65σ and μ + 1.65σ and 5% in each tail; y-axis: Frequency (%)]

Standards for sampling error
• Nielsen standards for Sampling Error* (tolerance level of error) are:
– National Market: ±3% of sales level
– Major MBDs / Channels: ±6% of sales level
– Minor MBDs: ±6–10% of sales level
• National Market ±3% of sales level … the probability that the estimated value will lie within ±3% of the actual value is 90%
* Sampling Error is also called Relative Standard Error, or RSE

Sample Size for estimating population mean μ
Variance of the distribution of sample means = $\sigma^2$
$1.65\sigma = 0.03\mu = e$, i.e. $Z\sigma = e$
$Z^2\sigma^2 = e^2$
$\sigma^2 = S^2 / n$, where $S^2$ is the Universe variance and μ is the universe average (Central Limit Theorem)
$Z^2 (S^2 / n) = e^2 \Rightarrow n = \frac{Z^2 S^2}{e^2}$

Sample Size (for estimation of population mean)
$n = \frac{Z^2 S^2}{e^2}$
Sample Size = n
Universe variance = $S^2$
Standardized z value associated with the level of confidence = Z
Level of confidence: 90% … Z = 1.65; 95% … Z = 1.96; 99% … Z = 2.58
Acceptable tolerance level of error = e (stated in absolute value)

Sample Size (for estimation of variable)
$n = \frac{Z^2 S^2}{e^2}$
• Sample size is directly proportional to Universe variance.
• To halve the sampling error, we need to quadruple the sample size.
Example: S = 100, μ = 200, Z = 1.65, e = 3% × 200 = 6
$n = 1.65^2 \times 100^2 / (0.03 \times 200)^2 = 27{,}225 / 36 \approx 756$

Sample Size (for estimation of population proportion)
• Quantitative research usually deals with estimating a population proportion, i.e. what % of the population:
– are aware of product X
– say they will buy product X
– agree / strongly agree that product X is superior to product Y
• For a large population, the sample size (n) for these types of “Yes” / “No” questions is:
$n = \frac{Z^2 p(1-p)}{e^2}$
where:
p is the probability of the given response and varies from 0 to 1; it reflects the variability in the data. When p = 1 (or p = 0), no sample is required.
Z is the standardized value associated with the level of confidence; for a 95% confidence level, Z = 1.96.
e is the desired precision, or margin of error.

Sample Size (for estimation of population proportion)
• 0.5 is the most conservative value for p: it is the probability that requires the largest sample size (n).
• Assuming p = 0.5, e = ±5% and Z = 1.96 ≈ 2 (confidence level of 95%):
$n = \frac{Z^2 p(1-p)}{e^2} = \frac{2^2 \times 0.5 \times (1-0.5)}{0.05^2} = 400$ (or 384 to be precise, using Z = 1.96)
With Z ≈ 2 and p = 0.5 this simplifies to $n = \frac{1}{e^2}$.
• To estimate the proportion of a population that responds positively to a question, with a 95% confidence level and a 5% margin of error, we need a sample of about 400 respondents.

Sample Size (for estimation of population proportion)
$n = \frac{1}{e^2}$
e        n
±20%     25
±10%     100
±5%      400
±4%      625
±3%      ≈1,100
±2%      2,500
±1%      10,000
• For small populations (100, for instance), take a census.
• Finite Population Correction (FPC) factor: for medium-size populations, the required (reduced) sample size may be computed with the following adjustment:
$n_{adj} = \frac{n}{1 + n/N}$
• If N = 2,000 and e = 5%, then n = 384 (not rounding Z to 2), and
$n_{adj} = \frac{384}{1 + 384/2000} = \frac{384}{1.192} \approx 322$
Using the FPC factor, one can optimize sample size and save cost for smaller populations. Further savings are possible if the proportions are known to be skewed (i.e. p is not close to 0.5).

Sample size for different margins of error (at confidence level of 95%)
e        n (approx.)
±20%     25
±14%     50
±10%     100
±7%      200
±5%      400
±4%      600
±3.5%    800
±3%      1,000
±2%      2,500
±1%      10,000
[Chart: margin of error (e) against sample size (n), for n from 0 to 1,000]

Design issues for tracking surveys
• Many market research surveys are tracking surveys, in which an independent sample is taken at fixed intervals. The main objective of a tracker is to measure change between the intervals.
• Estimates of change (differences) have margins of error about 40% larger than the corresponding estimates from the individual surveys
– Because both estimates are subject to sampling error: for two independent samples of equal size, the standard error of the difference is √2 ≈ 1.41 times that of each estimate

Tracking Studies: margin of error for differences in proportions (at confidence level of 95%)
e (difference)    n per wave (≈ 2/e²)
±20%              50
±14%              100
±10%              200
±7%               400
±5%               800
±4%               1,250
±3%               2,200
±1%               20,000
[Chart: margin of error (e) for differences against sample size (n), for n from 0 to 1,000]

Design issues for tracking surveys
• The previous charts show a “tracking survey” in which no change was happening
– The population value did not change over time
– The charts show only sampling error!
• Many trackers have samples of only around 20–50 per wave and are unable to measure the change happening in the population.

Design issues for tracking surveys
• This problem is often “solved” with rolling data.
• With rolling data, the result for each wave is averaged across the current and several previous waves
– e.g. with a sample size of 25 per week, an 8-week rolling average gives a sample size of 200
– This improves the standard error (by using more sample), but an 8-week average flattens the data, making any change hard to detect

Types of Sampling
• Probability sampling: we know the probability of each data point being included in the sample … though the probability of inclusion may not be equal
– Random sampling: equal chance of inclusion
– Systematic sampling: equal chance of inclusion
– Stratified sampling: the chance of inclusion may differ between strata
• Non-probability sampling:
– Convenience sampling … such as selecting a sample (e.g. of outlets, as in a retail audit) near home
– Purposive sampling … e.g. selecting brand users
– Quota sampling … where the sample matches a predetermined profile
In real life, we usually use a combination of methods.

[Figure: three targets illustrating “Biased”, “Unbiased, random errors” and “Unbiased and accurate”]

Stratified Sampling
• Stratification is the process of dividing a Universe into groups (called Strata or Cells) in order to select a sample from each one
• Strata examples: provision stores, supermarkets, mini-markets
– Each group is usually internally homogeneous. Homogeneity in retail measurement is based on store characteristics such as store type / retailer chain, geographical location and shop size.
• A Stratified Sample provides greater precision than a Simple Random Sample of the same Sample Size
• Most suitable for retail measurement services

Stratification
Population II (1, 2, 3, 4, 5, 11, 12, 13, 14, 15): MEAN = 8, $\sigma^2$ = 30
Stratum I (1, 2, 3, 4, 5): MEAN = 3, $\sigma^2$ = 2.5
Stratum II (11, 12, 13, 14, 15): MEAN = 13, $\sigma^2$ = 2.5
By reducing the variance within each stratum, a Stratified Sample provides greater precision than a Simple Random Sample of the same size.

Sample Size in Stratified Sampling
Sample size for stratum i (k strata in all) = $n_i$
Stratum i population (i.e. Universe) = $N_i$
Stratum i population variance = $\sigma_i^2$
Standardized z value associated with the level of confidence = Z
Acceptable tolerance level of error = e%
$n_i = \frac{Z^2 \sigma_i^2}{e^2} \times \frac{N_i \sigma_i^2}{\sum_{j=1}^{k} N_j \sigma_j^2}$
Level of confidence: 90% … Z = 1.65; 95% … Z = 1.96; 99% … Z = 2.58

Projection from sample to universe
• Projection: a statistical technique used to estimate Population variables from the observations gathered from the Sample
• Stratified Samples are projected at the Cell level

Projection factors
• Cell Projection Factors can be based on:
– Number of Shops (Numeric Projection): $\frac{\text{\# of Shops in Universe}}{\text{\# of Shops in Sample}}$
– Shops’ ACV (‘All Commodity Value’, or Turnover) or any other size indicator (Ratio Estimation): $\frac{\text{Universe Shops ACV}}{\text{Sample Shops ACV}}$

Projection
• Ratio Estimation is better than Numeric Projection when
– the correlation between the given category’s sales and the measure of size (ACV sales) used to calculate the Ratio Estimation is greater than 0.5
• The distance between the Numeric and Ratio estimates reflects Sample Quality