Generalizability, Reliability and Validity
Elke Johanna de Buhr, PhD
Tulane University

Your Research Proposal
I. Introduction
• A. Problem statement
• B. Research question(s)
• C. Hypothesis
• D. Definitions of terms
II. Review of the relevant literature (the more complete, the better)
• A. Importance of the question being asked
• B. Current status of the topic
• C. Relationship between the literature and the problem statement
III. Method
• A. Target population
• B. Research design and sampling
• C. Data collection plans
• D. Proposed analysis of the data
IV. Implications and limitations

Textbook Chapters and Other Sources
• Course textbooks
• USC Library Guides, Organizing Your Social Sciences Research Paper, http://libguides.usc.edu/writingguide
• And other materials

Data Analysis Plan
• Type of data
• Qualitative and/or quantitative? Representative of a larger population? Standardized vs. open-ended responses?
• Type of comparison
• Comparison between groups and/or over time?
• Type of variable
• Categorical and/or continuous?
• Type of data analysis
• Descriptive analysis and/or hypothesis testing?

Research Limitations
• The limitations of a study are those characteristics of design or methodology that impacted or influenced the application or interpretation of the results of your study
• They are the constraints on the generalizability and utility of findings that result from the ways in which you chose to design the study and/or the method used to establish internal and external validity
USC Library Guides

Generalizability
• When a sample represents a population, the results of the study are said to be generalizable or to have generalizability
Salkind

Sample Selection
• Sample size
• Sampling frame
• Sample selection = sampling
• Probability sampling
• Nonprobability sampling

Sampling Methods
• Census vs. sampling
• A census measures all units in a population
• Sampling identifies and measures a subset of individuals within the population
• Probability vs. non-probability sampling (see the sketch after this slide)
• Probability sampling results in a sample that is representative of the target population
• A non-probability sample is not representative of any population
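To make the census vs. sampling and probability vs. non-probability distinctions concrete, here is a minimal Python sketch. The population of 1,000 ID numbers, the sample size of 50, and the variable names are illustrative assumptions, not part of the lecture; the sketch simply contrasts a simple random sample (a probability method) with a convenience sample (a non-probability method).

```python
import random

# Hypothetical sampling frame: ID numbers for every unit in the target population.
# A census would measure all 1,000 units; sampling measures only a subset.
population = list(range(1, 1001))
sample_size = 50  # illustrative sample size

# Probability sampling: simple random sample. Every unit has a known, equal
# chance of selection, so results can be generalized to the target population.
random.seed(42)  # fixed seed only so the example is reproducible
probability_sample = random.sample(population, sample_size)

# Non-probability sampling: convenience sample, e.g. whoever is easiest to
# reach (here simply the first 50 units on the list). Selection chances are
# unknown, so the sample is not representative of any population.
convenience_sample = population[:sample_size]

print(len(probability_sample), len(convenience_sample))
```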
Reliability, Validity and Sensitivity
• Reliability
• Does the instrument yield consistent results?
• Validity
• Is the instrument appropriate for what needs to be measured?
• Sensitivity
• Do the indicators change proportionally in response to actual changes in the condition or item being measured?

Reliability
• The extent to which the measure produces the same results when used repeatedly to measure the same thing
• Variation in results = measurement error
• Unreliability in measures obscures real differences
Rossi, Lipsey & Freeman

Measurement Errors
• Systematic error (we do not measure what we think we measure)
• Random error (inconsistencies from one measurement to the next)

Reliability (cont.)
• How to verify? (see the sketch after this slide)
• Test-retest reliability:
• Most straightforward but often problematic, esp. if the measurement cannot be repeated before the outcome might have changed
• Internal consistency reliability:
• Examining consistency between similar items on a multi-item measure
• Ready-made measures:
• Reliability information available from previous research
Rossi, Lipsey & Freeman
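As a companion to the verification approaches above, here is a minimal Python sketch of two common reliability checks. The scores and the item matrix are made-up illustrative data, and the lecture does not prescribe these particular formulas; test-retest reliability is conventionally summarized as the correlation between two administrations, and internal consistency as Cronbach's alpha.

```python
import numpy as np

# Hypothetical data: the same five respondents measured twice with the same instrument.
test_scores = np.array([12.0, 15.0, 9.0, 20.0, 17.0])
retest_scores = np.array([13.0, 14.0, 10.0, 19.0, 18.0])

# Test-retest reliability: correlation between the two administrations.
test_retest_r = np.corrcoef(test_scores, retest_scores)[0, 1]

# Hypothetical multi-item measure: five respondents by four similar items.
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Internal consistency (Cronbach's alpha) for a respondents-by-items matrix."""
    k = item_scores.shape[1]                          # number of items
    item_vars = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"test-retest r = {test_retest_r:.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```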
Validity
• The extent to which a measure measures what it is intended to measure
• Usually difficult to test whether a particular measure is valid
• However, it is important that an outcome measure is accepted as valid by stakeholders
Rossi, Lipsey & Freeman

Validity (cont.)
• How to verify?
• Empirical demonstrations:
• Comparison, often with another measure, that shows that the measure produces the results expected
• Demonstration that results of the measure "predict" other characteristics expected to be related to the outcome
• Other approaches:
• Using data from more than one source, careful theoretical justification based on program impact theory, etc.
Rossi, Lipsey & Freeman

Internal Threats to Validity Matrix
(YES = threat applies, MAYBE = possible threat, CONT. = controlled by the design, - = not applicable)
• One-Shot Case Study: History YES, Maturation YES, Mortality YES, Testing -, Instrumentation -, John Henry Effect -, Compensatory Equalization -, Differential Selection -
• One-Group Pretest-Posttest Design: History YES, Maturation YES, Mortality CONT., Testing YES, Instrumentation MAYBE, John Henry Effect -, Compensatory Equalization -, Differential Selection -
• Time Series Design: History YES, Maturation CONT., Mortality CONT., Testing YES, Instrumentation MAYBE, John Henry Effect -, Compensatory Equalization -, Differential Selection -
• Pretest-Posttest Control Group Design: History CONT., Maturation CONT., Mortality CONT., Testing CONT., Instrumentation CONT., John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection CONT.
• Posttest-Only Control Group Design: History CONT., Maturation CONT., Mortality YES, Testing -, Instrumentation -, John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection CONT.
• Single-Factor Multiple Treatment Designs: History CONT., Maturation CONT., Mortality CONT., Testing CONT., Instrumentation CONT., John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection CONT.
• Solomon 4-Group Design: History CONT., Maturation CONT., Mortality CONT., Testing CONT., Instrumentation CONT., John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection CONT.
• Factorial Design: History CONT., Maturation CONT., Mortality CONT., Testing CONT., Instrumentation CONT., John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection CONT.
• Static-Group Comparison Design: History CONT., Maturation CONT., Mortality YES, Testing -, Instrumentation -, John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection YES
• Nonequivalent Control Group Design: History CONT., Maturation CONT., Mortality CONT., Testing CONT., Instrumentation CONT., John Henry Effect MAYBE, Compensatory Equalization MAYBE, Differential Selection CONT.

Glossary of Threats to Internal Validity
• INTERNAL VALIDITY: Any changes that are observed in the dependent variable are due to the effect of the independent variable. They are not due to some other independent variables (extraneous variables, alternative explanations, rival hypotheses). The extraneous variables need to be controlled for in order to be sure that any results are due to the treatment and thus the study is internally valid.
• Threat of History: Study participants may have had outside learning experiences and enhanced their knowledge of a topic, and thus score better when they are assessed after an intervention, independent of the impact of the intervention. (No control group)
• Threat of Maturation: Study participants may have matured in their ability to understand concepts and developed learning skills over time, and thus score better when they are assessed after an intervention, independent of the impact of the intervention. (No control group)
• Threat of Mortality: Study participants may drop out and not participate in all measures. Those who drop out are likely to differ from those who continue to participate. (No pretest)
• Threat of Testing: Study participants might do better on the posttest compared to the pretest simply because they take the same test a second time.
• Threat of Instrumentation: The posttest may have been revised or otherwise modified compared to the pretest, so that the two tests are no longer comparable.
• John Henry Effect: The control group may try extra hard after not becoming part of the "chosen" group (compensatory rivalry).
• Resentful Demoralization of the Control Group: Opposite of the John Henry Effect. The control group may be demoralized and perform below normal after not becoming part of the "chosen" group.
• Compensatory Equalization: The control group may feel disadvantaged for not being part of the "chosen" group and receive extra resources to keep everybody happy. This can cloud the effect of the intervention.
• Statistical Regression: A threat to validity in cases in which the researcher uses extreme groups as study participants that have been selected based on test scores. Due to the role that chance plays in test scores, the scores of students who score at the bottom of the normal curve are likely to go up, while the scores of those who score at the top will go down if they are assessed a second time.
• Differential Selection: The experimental and control groups differ in their characteristics. This may influence the results.
• Selection-Maturation Interaction: Combines the threats to validity described as differential selection and maturation. If the experimental and control groups differ in important respects, such as age, differences in achievement might be due to this maturational characteristic rather than the treatment.
• Experimental Treatment Diffusion: Close proximity of the treatment and control groups might result in treatment diffusion. This clouds the effect of the intervention.

Sensitivity
• The extent to which the values of the measure change when there is a change or difference in the thing being measured
• Outcome measures in program evaluation are sometimes insensitive because:
• They include elements that the program could not reasonably be expected to change
• They have been developed for a different (often diagnostic) purpose
Rossi, Lipsey & Freeman

Example
Margoluis & Salafsky

Sensitivity (cont.)
• How to verify?
• Previous research:
• Identify research in which the measure was used successfully (the programs need to be very similar and the sample size sufficiently large)
• Known differences:
• Apply the outcome measure to groups of known difference or situations of known change, and determine how responsive it is (see the sketch at the end of this document)

Group Discussion
1. Implications and limitations
• Generalizability
• Reliability
• Validity
2. Discussion of individual projects
3. Other questions/concerns
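As a companion to the "known differences" check on the Sensitivity (cont.) slide above, here is a minimal Python sketch. The group scores, the group labels, and the use of a standardized mean difference (Cohen's d) are illustrative assumptions rather than material from the lecture; the idea is simply to apply the outcome measure to two groups known to differ and see how strongly it responds.

```python
import numpy as np

# Hypothetical outcome scores for two groups of known difference
# (e.g., a group known to have high attainment vs. one known to have low attainment).
group_known_high = np.array([78, 85, 82, 90, 76, 88], dtype=float)
group_known_low = np.array([70, 72, 68, 75, 71, 69], dtype=float)

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference (Cohen's d) using a pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# A sensitive measure should clearly separate groups that are known to differ;
# a value near zero would suggest the measure is too insensitive for the purpose.
print(f"Cohen's d between the known groups = {cohens_d(group_known_high, group_known_low):.2f}")
```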