Tom Louis - University of Michigan School of Public Health

Symposium in Honor of Rod Little’s 65th Birthday, 31 October 2015
Perils and Potentials of Self-selected Entry to
Epidemiological Studies and Surveys1
Thomas A. Louis, PhD
Department of Biostatistics
Johns Hopkins Bloomberg SPH
Research & Methodology
U. S. Census Bureau
1
Keiding N, Louis TA (2016). Perils and potentials of self-selected entry to epidemiological studies and
surveys (with discussion and response). J. Roy. Statist. Soc., Ser. A, 179: to appear.
T. A. Louis: Johns Hopkins Biostatistics & Census Bureau
Perils & Potentials • Little Symposium • 10/31/15
Outline
Connections with Rod
Perils & Potentials
The traditional sample survey and challenges to it
The epidemiological context
Epi/Biostat and survey futures
Convergence of the Biostat/Epi and Survey cultures
Closing
Connections
We met in 1972 at Imperial College; I was a postdoc, he was a graduate
student
He seems to have caught up!
Served together on CNSTAT, projects including SAIPE
Followed Rod as Associate Director for Research & Methodology,
Census Bureau
Continue to collaborate on a few projects
Empirical Bayes (EB) for the American Community Survey (ACS); low-overhead design-based CIs
Section 203 of the Voting Rights Act: alternative-language determinations
Possibly most important,
I served as his caddy for 10 holes and then got fired
He claims he played the last 8 holes very well!
It’s not Rod,
but the name is
as close as I could get
Rod at “tomfest”
Counterpoint idea for today
The traditional sample survey
Identify a reference population and a sampling frame
Develop and implement a sampling plan
Simple random, clustered, probability proportional to size, . . .
Conduct a design-based analysis
Population values (Y) are fixed, sample inclusion indicators are the
random variables that follow a joint distribution determined by the
sampling plan: inclusion probabilities/propensities
If the propensities are known, the sample is representative and a
model-free, unbiased estimate of a population feature
(functional of the Ys) is available along with its SE/MOE
However, in practice the propensities must be adjusted for non-participation,
and missing data must be imputed, generally using a model, so the “purity” of the
design-based approach isn’t so pure
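The design-based claim above can be checked on a toy population. This is a minimal sketch, assuming Poisson sampling (each unit included independently) and a hypothetical four-unit population with known inclusion probabilities. The Horvitz-Thompson estimator weights each sampled value by 1/π; averaging it over every possible sample, with the Y values held fixed, recovers the population total without any model for Y.

```python
from itertools import product
from math import prod


def horvitz_thompson_total(y, pi, included):
    """Horvitz-Thompson estimate of the population total: sum of y_i / pi_i over sampled units."""
    return sum(yi / p for yi, p, s in zip(y, pi, included) if s)


def design_expectation(y, pi):
    """Exact design expectation of the HT estimator under Poisson sampling.

    Enumerates all 2**N inclusion patterns, so it is only feasible for tiny N.
    The Y values stay fixed; only the inclusion indicators are random.
    """
    expected = 0.0
    for pattern in product((0, 1), repeat=len(y)):
        # Probability of this joint inclusion pattern (independent inclusion decisions).
        prob = prod(p if s else 1.0 - p for p, s in zip(pi, pattern))
        expected += prob * horvitz_thompson_total(y, pi, pattern)
    return expected


# Hypothetical fixed population values and known inclusion probabilities.
y = [3.0, 7.0, 10.0, 20.0]
pi = [0.2, 0.5, 0.5, 0.8]

print("population total:", sum(y))
print("E[HT estimate]:  ", design_expectation(y, pi))
```

Once the known π must be replaced by a modeled response propensity, this exact unbiasedness no longer holds automatically, which is the loss of "purity" noted above.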
The pure, probability-based survey is in trouble
Survey response rates & representation are declining
Phone response rates are declining due to:
Caller ID, the tsunami of “surveys”
The increasing prevalence of cell phones
Government is not allowed to robocall
“Local” representation is degrading because cell phones are not linked to
an address
Consequences
Bias depends on the extent to which the characteristics of respondents
differ from those of non-respondents
The lesson is that rather than focusing just on response rates, there is a
need to focus on representativeness
“In my view, adjusting for non-response at the estimation stage is the
non-preferred option. Emphasis should be on the design stage, including
consideration of which auxiliary variables should be used in stratification.”1
1
Trewin (2014) What are the quality impacts of conducting high profile official statistical collections on a
voluntary basis? Statistical Journal of the IAOS, 30: 231–235.
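The point that bias depends on respondent/non-respondent differences, not on the response rate alone, follows from a standard deterministic decomposition. With response rate r, the full-population mean is r·ȳ_R + (1 - r)·ȳ_NR, so the respondent mean is off by (1 - r)(ȳ_R - ȳ_NR). The sketch below uses purely hypothetical numbers, loosely echoing the 60% versus 95% question posed later in the talk.

```python
def respondent_mean_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean for the full-population mean.

    Full mean = r * mean_R + (1 - r) * mean_NR, so
    mean_R - full mean = (1 - r) * (mean_R - mean_NR).
    """
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


def full_mean(response_rate, mean_respondents, mean_nonrespondents):
    """Population mean as a mixture of respondents and non-respondents."""
    return response_rate * mean_respondents + (1.0 - response_rate) * mean_nonrespondents


# Hypothetical scenarios: a 60% response rate with respondents close to
# non-respondents, versus a 95% rate with very different respondents.
well_designed = respondent_mean_bias(0.60, 0.55, 0.50)
self_selected = respondent_mean_bias(0.95, 0.90, 0.40)

print(f"60% response, small R/NR gap: bias = {well_designed:.3f}")
print(f"95% response, large R/NR gap: bias = {self_selected:.3f}")
```

Under these assumed values the higher response rate still yields the larger bias, which is why representativeness, not the rate by itself, is the target.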
Growth of online surveys
“Internet surveys have emerged rapidly over the past decade or so . . . .
Inside Research estimates that in 2012, the online survey business had
grown from nothing a decade and a half earlier to more than $1.8 billion.
. . . This represents 43% of all surveys in the U.S. Almost all (85%) of
that growth came at the expense of traditional methods.” 1
The internet to the rescue?
The internet in conducting a traditional sample survey
Targeted invitations, ID or non-ID processing
Avoids many of the drawbacks of Phone/Hard-copy/door-to-door,
but faces similar response rate challenges
Self-enrolled, internet surveys
Big participation, but no sampling frame
Information is “organic” in the manner of Big Data
Big Data may be able to help approximate a sampling frame,
but it is not a cure
1
McCutcheon et al. (2014). Online Panel Surveys: An Interdisciplinary Approach The Untold Story of
Multi-Mode (Online and Mail) Consumer Panels: From Optimal Recruitment to Retention and Attrition, Wiley.
Are non-probability samples informative?
It is widely stated that nonprobability (volunteer) samples can’t be used for
population estimates because the necessary weights aren’t available
“The debate over probability vs. nonprobability samples is about
representation.”
Keeter (2014). Change Is Afoot in the World of Election Polling. Amstat News, October: 3–4
However, would you rather have a 60% response rate from a well-designed
and well-conducted (Gallup) survey or a 95% rate from a self-selected group?
Advantage Gallup: The 60% is also self-selected, but information on the
relation of respondents to non-respondents is available from the sampling
frame and generalizing from the sample is possible
Non-probability has potential: There may be other data that can be
used to develop reasonable weights for some reference population
Collecting paradata and “big data” to support “causal analysis” is key
Analogously, in clinical trials many (most?) causal questions are not
protected by randomization and are not intent-to-treat, but careful
observational data analysis can be informative
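One common way to "develop reasonable weights for some reference population" is post-stratification: each unit is weighted by the population share of its cell divided by that cell's share of the sample. A minimal sketch with a hypothetical volunteer sample and hypothetical reference shares (e.g., from a census); the correction is only as good as the assumption that, within cells, volunteers resemble everyone else on the outcome, which Yeager et al. found did not reliably hold.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight each unit by (population share of its cell) / (sample share of its cell).

    Assumes every cell observed in the sample appears in population_shares and that
    the reference-population shares come from an external source.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical volunteer sample that over-represents the young.
cells = ["young", "young", "young", "old"]
outcome = [1, 1, 0, 0]
shares = {"young": 0.5, "old": 0.5}  # hypothetical reference-population shares

w = poststratification_weights(cells, shares)
print("unweighted mean:     ", sum(outcome) / len(outcome))
print("post-stratified mean:", weighted_mean(outcome, w))
```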
Random digit dialing vs internet surveys1
The probability sample surveys were consistently more accurate than the
non-probability sample surveys, even after post-stratification with
demographics
The non-probability sample survey measurements were much more
variable in their accuracy, both across measures within a single survey
and across surveys with a single measure
Post-stratification improved the overall accuracy of some of the
nonprobability sample surveys but decreased the overall accuracy of others
Probability samples, even ones without especially high response rates,
yielded quite accurate results
1
Yeager et al., (2011). Comparing the accuracy of RDD telephone surveys and Internet surveys
conducted with probability and non-probability samples. Public Opinion Quarterly, 75: 709-747.
Percentage-point absolute errors1
[Figure: absolute errors by survey type: PS Telephone, PS Internet, Non-PS Internet (PS = probability sample)]
1
Plots from Yeager et al. (2011)
There is potential1
“The use of the Internet, the willingness to help advance public health
research, and the study being publicly funded were key motives for
participating in the Web-based NutriNet-Santé cohort.”
“These motives differed by sociodemographic profile and obesity, yet were
not associated with lifestyle or health status.”
But, selection effects and representation can’t be ignored
1
Méjean, et al. (2014). Motives for participating in a web-based nutrition cohort according to
sociodemographic, lifestyle, and health characteristics: the NutriNet-Santé cohort study. J Med Internet Res., 16:
e189.
Transportability1
Science is about generalization, and generalization requires that conclusions
from the laboratory be transported and applied elsewhere, in an environment
that differs in many aspects from that of the laboratory
That most studies are conducted with the intention of applying the results
elsewhere means that we usually deem the target environment sufficiently
similar to the study environment to justify the transport of experimental results
or their ramifications
Very different from Miettinen’s, “In science the generalization is from the
actual study experience to the abstract, with no referent in place or time”
The conditions that permit “transport” have not received systematic treatment
Based on judgments of how target populations may differ from those under
study, the paper offers a formal representational language for making these
assessments precise and for deciding whether causal relations in the target
population can be inferred from those obtained in an experimental study
1
Pearl J, Bareinboim E (2014). External validity: From do-calculus to transportability across populations. Statistical Science,
29: 579–595
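The simplest case of Pearl and Bareinboim's transport formula re-weights stratum-specific causal effects from the study by the target population's distribution of the covariates Z that distinguish the two populations: P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z). This is valid only when Z captures every relevant difference. A minimal sketch, using hypothetical trial results stratified by social status (an effect modifier mentioned later in the talk); because risk differences are linear in these probabilities, the same re-weighting applies to them.

```python
def transported_effect(stratum_effects, target_shares):
    """Re-weight stratum-specific effects by a population's covariate distribution.

    Sketch of P*(y | do(x)) = sum_z P(y | do(x), z) P*(z), assuming z accounts for
    every way the study and target populations differ that matters for the effect.
    """
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    return sum(stratum_effects[z] * share for z, share in target_shares.items())


# Hypothetical trial results, stratified by an effect modifier (social status).
risk_difference = {"low_status": 0.10, "high_status": 0.02}
trial_mix = {"low_status": 0.2, "high_status": 0.8}   # hypothetical study composition
target_mix = {"low_status": 0.6, "high_status": 0.4}  # hypothetical target composition

print("effect in study mix:", transported_effect(risk_difference, trial_mix))
print("effect transported: ", transported_effect(risk_difference, target_mix))
```

If the effect did not vary across strata, the two numbers would coincide; the gap arises entirely from effect modification combined with a different population mix.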
Lack of transportation: Election survey
Discrepancies between actual and survey-reported voting behavior
“. . . the rate at which people report voting in surveys greatly exceeds the
rate at which they actually vote
For example, 78% of respondents to the 2008 National Election Study
(NES) reported voting in the presidential election, compared with the
estimated 57% who actually voted”
“. . . the 57% coming from voting records (a form of ‘big data’)”
“standard predictors of participation, like demographics and measures of
partisanship and political engagement, explain a third to a half as much
about voting participation as one would find from analyzing behavior
reported by survey respondents.”1
Lack of transportation
The magnitudes of associations between personal attributes and voting
participation computed from the survey data don’t transport to those
computed from administrative records
1
Ansolabehere & Hersh (2012). Validation: What Big Data Reveal About Survey Misreporting and the Real
Electorate. Political Analysis, 20: 437–459
Are longitudinal analyses protected?
Unaccounted-for selection propensities that are associated with outcomes
produce biased estimates
Biases are most apparent for prevalences and other cross-sectional
estimates
Changes over time and associations may be less vulnerable to selection
effects; however, if change also depends on inadequately modeled
propensities, estimated change will be biased for a population value
If level is biased, then bias protection depends on level and change being
unrelated, after adjusting for baseline attributes
But there are many examples of high association between level and
change, for example the ‘horse racing effect’: faster runners are in front
Similar issues apply to associations
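A small simulation makes the level/change argument concrete. All ingredients are hypothetical: baseline level is standard normal, change equals slope × level plus independent noise (so the population mean change is 0), and the chance of self-selecting rises with level. When slope is 0 the selected sample estimates mean change without bias; when level predicts change, it does not. Because selection here depends only on level, a model for change given level would still be unbiased, which is the "after adjusting for baseline attributes" caveat.

```python
import numpy as np


def selected_change_bias(slope, n=200_000, seed=1):
    """Simulate level and change, select on level, return bias of the mean change.

    Hypothetical model: level ~ N(0, 1); change = slope * level + N(0, 1);
    selection probability = logistic(2 * level).
    """
    rng = np.random.default_rng(seed)
    level = rng.standard_normal(n)
    change = slope * level + rng.standard_normal(n)
    p_select = 1.0 / (1.0 + np.exp(-2.0 * level))
    selected = rng.random(n) < p_select
    # Mean change among the self-selected minus the mean change in the full population.
    return change[selected].mean() - change.mean()


print("level and change unrelated:", round(selected_change_bias(0.0), 3))
print("level predicts change:     ", round(selected_change_bias(0.5), 3))
```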
The Epidemiological Context
The internet is an attractive resource for enrolling and following volunteer
participants in observational epidemiological studies, but such enrollment raises
concerns
Many epidemiologists discuss the issues, implicitly assuming that “representative
sampling” is equivalent to “simple random sampling” and generally downplay
the role of sampling in favor of careful confounder control
However, they maintain an interest in the possibility of selection bias in the
composition of the study group
A central issue is whether conditional effects in the study group may be
transported to desired target populations
The SnartGravid Study
Initiated in Denmark in 2007 by researchers from Boston U and Aarhus U
Couples recruited via online advertisements (non-commercial health sites, social
networks), press releases, blogs, posters, and word of mouth
Recruitment shortly after initiation of attempts to conceive; couples followed until pregnancy,
until they gave up trying, or for 12 menstrual cycles after initiation
No attempt at representativity of the volunteers
Follow-up via web
By June 1, 2014, more than 8,500 couples recruited
High follow-up: more than 80% of the cohort still included after 1 year
Delayed entry (left truncation): many couples were recruited after the start of their
attempt, and only post-recruitment menstrual cycles were included in the analysis
Care was taken to include only couples who had recently started trying, to avoid hazard
ratio attenuation
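Handling delayed entry in a per-cycle (discrete-time) analysis of time to pregnancy amounts to a risk-set rule: a couple counts as at risk only in the cycles actually observed after recruitment. The sketch below is a minimal illustration with hypothetical couples and a hypothetical data layout, not the SnartGravid analysis code. Each record is (first observed cycle, last observed cycle, pregnant in last cycle?), with cycles counted from the start of the attempt.

```python
from collections import defaultdict


def per_cycle_fecundability(couples):
    """Discrete-time hazard of pregnancy by cycle number, allowing delayed entry.

    A couple contributes to the risk set only for cycles entry..last, so cycles
    before recruitment (left truncation) are never counted.
    """
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for entry, last, pregnant in couples:
        for cycle in range(entry, last + 1):
            at_risk[cycle] += 1
        if pregnant:
            events[last] += 1
    return {c: events[c] / at_risk[c] for c in sorted(at_risk)}


# Hypothetical couples: (first observed cycle, last observed cycle, pregnant in last cycle?)
couples = [(1, 3, True), (2, 2, False), (3, 5, True), (1, 1, True)]
print(per_cycle_fecundability(couples))

# Naive alternative: pretend every couple was observed from cycle 1.
naive = per_cycle_fecundability([(1, last, p) for _, last, p in couples])
print(naive)
```

The naive version counts pre-recruitment cycles of late entrants as non-pregnant cycles at risk, even though those couples were recruited precisely because they had not yet conceived; this pulls early-cycle fecundability down (here cycle 1 drops from 0.5 to 0.25).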
SnartGravid: Optimism regarding self-selection
Paraphrase of Huybrechts et al. (2010)1
Internet-based recruitment of volunteers has raised concerns because the
demographics (e.g., age, socio-economic status) of those with ready internet
access differ from those without it. Furthermore, among those with internet
access, those who choose to volunteer for studies may differ considerably in
lifestyle and health from those who decline
However, volunteering to be studied via the Internet does not introduce concerns
about validity beyond those already present in other studies using volunteers
Differences between study participants and non-participants do not affect
the validity of internal comparisons within a cohort study of volunteers,
which is the main concern
Given internal validity, the only problems with studying Internet users
would occur if the biologic relations that we are studying differed between
Internet users and non-users, a possibility that seems unlikely
The primary concern should therefore be to select study groups for
homogeneity with respect to important confounders, for highly
cooperative behavior, and for availability of accurate information, rather
than attempt to be representative of a natural population
1
Huybrechts et al. (2010) A successful implementation of e-epidemiology: the Danish pregnancy planning
study ‘Snart-Gravid’ Eur J Epidemiol, 25: 297-304
Optimistic conclusion
Scientific generalization of valid estimates of effect (i.e., external validity) does
not require representativeness of the study population in a survey-sampling
sense either
Despite differences between volunteers and non-participants, volunteer cohorts
are often as satisfactory for scientific generalization as demographically
representative cohorts, because of the nature of the questions that
epidemiologists study
The relevant issue is whether the factors that distinguish studied groups from
other groups somehow modify the effect in question
Yes, that is the issue
Miettinen’s view, at least in 19851
“In science the generalization from the actual study experience is not made to a
population of which the study experience is a sample in a technical sense of
probability sampling. In science the generalization is from the actual study
experience to the abstract, with no referent in place or time.”
As many analyses document (e.g., social status as an effect modifier), his view is
far too trusting in immutable truths
1
Miettinen, O. S. (1985). Theoretical Epidemiology. Wiley, New York
Snart-Gravid and representativity
The authors’ view is heavily influenced by the Miettinen declaration, but they
do admit to possible non-representativity of the study sample, phrased as
possible selection bias in the composition of the study sample
Indeed, they are working on a validation study regarding representativity, based
on Danish population registers
‘Representativeness’ is interpreted as simple random sampling, which they
generally consider unnecessary or even counterproductive
Instead, they use an analysis-based cure: perform careful confounder control
(which, it is hoped, does not depend on representativeness of the sample!) to support
conditional associations, which are more generalizable than marginal associations
in existing populations
This approach places considerable responsibility on the statistical techniques and
leaves unmeasured confounders unattended
Arguments in favor of representative samples1
Richiardi, Pizzi & Pearce
Non-representative cohorts lack heterogeneity
If the exposure of interest is associated with the probability of selection, the
exposure-outcome associations estimated in a non-representative cohort may be
biased
If an intermediate variable in the causal pathway from the exposure to the
outcome is associated with the selection, exposure-outcome associations
estimated in a non-representative cohort may be biased
Ebrahim & Davey Smith (editors)
Concluded very cautiously that representativeness should neither be avoided nor
uncritically and universally adopted, but that its value should be evaluated in each particular setting
1
Discussion in Int. J. Epid. 42 (2013)
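The "may be biased" in the first bullet can be made concrete with an exact calculation. Assume, hypothetically, that exposure and outcome are independent in the population (true odds ratio 1). If selection depends on exposure alone, the odds ratio among the selected is unchanged (a property of the odds ratio when there is no effect modification). If selection depends on both exposure and outcome, a null association can look strongly non-null. The selection probabilities below are invented for illustration.

```python
def odds_ratio_among_selected(p_exposed, p_outcome, p_select):
    """Exposure-outcome odds ratio in the selected sample, computed exactly.

    Assumes exposure and outcome are independent in the population (true OR = 1);
    p_select maps (exposed, outcome) -> probability of entering the study.
    """
    def cell(x, y):
        # Expected share of the population that is in cell (x, y) and gets selected.
        px = p_exposed if x else 1.0 - p_exposed
        py = p_outcome if y else 1.0 - p_outcome
        return px * py * p_select[(x, y)]

    return (cell(1, 1) * cell(0, 0)) / (cell(1, 0) * cell(0, 1))


# Hypothetical selection mechanisms.
exposure_only = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.6, (1, 1): 0.6}
exposure_and_outcome = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}

print("selection on exposure only:     OR =", odds_ratio_among_selected(0.5, 0.2, exposure_only))
print("selection on exposure + outcome: OR =", odds_ratio_among_selected(0.5, 0.2, exposure_and_outcome))
```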
Internal vs External Suicide Rates
Pooled clinical trial suicide rates compared to the age-adjusted rates
in the nationally representative Youth Risk Behavior Survey (YRBS)1
These relations can influence treatment comparisons
1
Greenhouse, Kaizar, Kelleher, Seltman, Gardner (2008). Generalizing from clinical trial data: a case study. The risk of suicidality
among pediatric antidepressant users. Statistics in Medicine, 27: 1801-1813
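Age adjustment is commonly done by direct standardization: stratum-specific rates are averaged using the age distribution of a fixed standard population, so groups with different age mixes can be compared. We do not claim this is the exact adjustment Greenhouse et al. used. The sketch uses hypothetical age groups, rates, and weights.

```python
def directly_standardized_rate(stratum_rates, standard_weights):
    """Direct standardization: average stratum-specific rates using a standard population's weights."""
    total_weight = sum(standard_weights.values())
    return sum(stratum_rates[s] * w for s, w in standard_weights.items()) / total_weight


# Hypothetical age-specific rates from a pooled trial population.
trial_rates = {"12-14": 0.01, "15-17": 0.03}
# Hypothetical standard (survey) age distribution.
survey_weights = {"12-14": 0.75, "15-17": 0.25}

print(directly_standardized_rate(trial_rates, survey_weights))
```

Standardization removes differences in age composition only; differences in other attributes between trial enrollees and the survey population remain, which is the point of the comparison above.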
Representative sampling
Kruskal & Mosteller1 identified 9 meanings
1. General acclaim for data (the term ‘representative’ essentially used in a positive
rhetorical fashion)
2. Absence of selective forces [in the sampling process]
3. The sample as a miniature of the population
4. Representative as typical
5. Coverage of the population’s heterogeneity
6. ‘Representative sampling’ as a vague term that is to be made precise
7. Representative sampling as a specific sampling method
8. Representative sampling as permitting good estimation
9. Representative sampling as good enough for a particular purpose
1
Kruskal, Mosteller (1979). Representative sampling, III: The current statistical literature. International
Statistical Review, 47: 245–265
Validation from population-level databases
A finding that did not generalize
In the Nordic countries individual record linkage to detailed population registries
sometimes allows validation of the representativity of a study cohort, which is
always at least partly based on volunteers
Mortality misalignment:
Andersen et al. (1998)1 compared mortality of participants in 3 cohorts
recruited in the Copenhagen area to the general mortality in that area
There is a risk of bias if other causes for the disease under study or
confounders are not taken into account and are differently distributed
among the participants and the target population
Many factors associated with disease and death differ between
participants and non-participants either because they are implicit in the
selection criteria or because of the self-selection
The analysis showed survivor selection in all cohorts (recruited participants
being healthier at baseline than non-recruited individuals), which persisted
beyond ten years of observation for most combinations of age and sex
1
Andersen et al. (1998). A comparison of mortality rates in three prospective studies from Copenhagen with mortality rates in the central part of the
city, and the entire country. European J. of Epidemiology, 14: 579–585
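One standard way to compare a cohort's mortality with that of the surrounding population is the standardized mortality ratio (SMR): observed deaths divided by the deaths expected if cohort members had died at the reference population's age-specific rates. This is offered as an illustration of the idea, not as the exact method of Andersen et al. All numbers below are hypothetical; an SMR below 1 is the signature of the "healthier at baseline" survivor selection described above.

```python
def standardized_mortality_ratio(observed_deaths, person_years, reference_rates):
    """Observed deaths divided by deaths expected at reference-population rates."""
    expected = sum(person_years[s] * reference_rates[s] for s in person_years)
    return observed_deaths / expected


# Hypothetical cohort follow-up by age group, and hypothetical area mortality rates.
person_years = {"40-59": 1000.0, "60-79": 500.0}
area_rates = {"40-59": 0.01, "60-79": 0.04}

smr = standardized_mortality_ratio(21, person_years, area_rates)
print(f"SMR = {smr:.2f}")  # below 1: participants die less often than expected from area rates
```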
Validation from population-level databases
A finding that did generalize
Results from clinical trials on breast-conserving operations
appear applicable to all Danish women1
The Danish Breast Cancer Cooperative Group (DBCG) coordinates breast
cancer therapy in Denmark, where almost all women are treated for free at the
public hospitals
Many RCTs on adjuvant therapy have been conducted with all Danish women as the
sampling frame, suitably stratified, e.g., by age and/or menopausal status
From 1982 to 1989 a randomized trial compared breast-conserving surgery to
total mastectomy, and subsequently breast-conserving therapy was offered as an
option to qualifying patients across Denmark
The population-based registry of the DBCG allowed population-based follow-up
1989-98, finding that
Women younger than 75 years and operated on according to the
recommendations had survival, loco-regional recurrences, distant
metastases, and benefit from adjuvant radiotherapy closely matching the
results from the clinical trial
1
Ewertz et al. (2008). Breast conserving treatment in Denmark, 1989–1998. A nationwide population-based
study of the Danish Breast Cancer Co-operative Group. Acta Oncologica, 47: 682–690.
The future of web-based enrollment
In surveys and Epi/Clinical studies
Web-based enrollment is here to stay and will only increase,
so creative designs and analyses are needed
Personal attributes (demographics, location, . . . ) need to be collected
along with externally available frame information
Administrative records and other ‘big data’ can supplement, calibrate,
and sometimes replace data collected from self-enrolled surveys
They can do the same for traditional surveys
Large N, whether in a web-enrolled survey or from organic data, does not
imply large information or high validity
Properties of web-based enrollment need to be compared to other,
also imperfect, alternatives
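Why large N does not imply large information: the mean squared error of a sample mean is bias² + σ²/n, and selection bias does not shrink as n grows. Equating that MSE to σ²/n_eff gives the size n_eff of an unbiased simple random sample with the same accuracy. The values below (a bias of 0.05 SD units, a million self-enrolled responses, a thousand probability-sample responses) are hypothetical.

```python
import math


def rmse(bias, sd, n):
    """Root mean squared error of a sample mean with the given bias, unit-level SD and sample size."""
    return math.sqrt(bias ** 2 + sd ** 2 / n)


def effective_sample_size(bias, sd, n):
    """Size of an unbiased simple random sample with the same mean squared error."""
    return sd ** 2 / rmse(bias, sd, n) ** 2


# Hypothetical: a million self-enrolled responses with a small selection bias
# versus a thousand responses from a probability sample with no bias.
big_biased = rmse(bias=0.05, sd=1.0, n=1_000_000)
small_unbiased = rmse(bias=0.0, sd=1.0, n=1_000)

print(f"RMSE, n = 1,000,000 with bias 0.05: {big_biased:.4f}")
print(f"RMSE, n = 1,000 unbiased:           {small_unbiased:.4f}")
print(f"effective n of the big sample: {effective_sample_size(0.05, 1.0, 1_000_000):.0f}")
```

Under these assumptions the million biased responses carry roughly the information of 400 unbiased ones, and adding more self-enrolled responses barely changes that.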
The future of (survey) research
Survey research faces increasing challenges in achieving acceptable
response rates, coverage and accuracy
“We are less sure how to conduct good survey research now than we were four
years ago, and much less than eight years ago.
And don’t look for too much help in what the polling aggregation sites may be
offering. They are only as good as the raw material they have to work with.
We may not even know when we’re off base. What this means for 2016 is
anybody’s guess.”1
Ditto for surveys and Epi/Clinical studies!
1
C. Zukin (2015). What’s the matter with Polling? The New York Times, 21 June 2015, http://nyti.ms/1H00TPy
Technology transfer, some convergence
Survey =⇒ Epi/Clinical: Attention to external inference
In most RCTs emphasis is on internal validity (establishing causality). Much less
attention is paid to the implications for patients who may differ in varying ways
and degrees from the specific homogeneous population studied.
However, considerations of external validity are vital for the practising physician
Weisberg (2015) Significance, 12: 22–27
Design strategies to increase generalizability of randomized trials
Random sampling from the target population of interest
Pragmatic trials, which aim to enroll a more representative sample
Doubly randomized preference trials (estimate the effect of randomization)
Stuart (2014). Generalizability of clinical trials results. In, Methods in Comparative Effectiveness Research
Epi/Clinical =⇒ Survey: Relax the grip of design-based approaches
Survey research: Causal modeling, model-based imputation, design-consistent
modeling, including Bayesian approaches for stabilization, adaptive design, . . .
Research on surveys: Experiment whenever possible, including nesting in
ongoing surveys; consider all Epi/Clinical designs and analyses
Research on Surveys & Survey Research
Research on Surveys
Experiments need to permeate the survey world,
including experiments nested in ongoing surveys
When conducting research on surveys, employ the full armamentarium of
designs and analyses
Internal validity dominates, but external validity must be considered
Survey Research
Expand the use of (design-consistent) models, going well beyond small
domain estimation
Summary
Surveys emphasize external validity, representation of a well-specified
reference population
Clinical and epidemiological studies emphasize internal validity
Recommendations
Surveys should adopt additional Biostat/Epi goals and methods
In research on surveys and in survey research
Biostat/Epi should adopt additional survey goals and methods,
and use the survey definition of “representative”
With known sampling weights the sample is representative
1
Pearl J, Bareinboim E (2014). External Validity: From do-calculus to Transportability across Populations.
Statistical Science, 29: 579–595.
Summary
Recommendations
Transportability as a unifying theme1
Innovative designs/analyses including use of BIG DATA to transport
Increased cross-fertilization between the epi/biostat
and survey domains will benefit science and policy
A bit more ⇒
Part of Rod’s comments on Keiding/Louis1
I appreciate the authors’ thoughtful, nuanced article (Thanks!)
The role of probability sampling was widely argued in early debates over
the design of a massive longitudinal epidemiologic study, the U.S.
National Children’s Study (NCS). I was a member of the U.S. Federal
Advisory Committee for the study in its early days, and quoted Sir
Maurice Kendall as arguing powerfully for probability sampling as the
“scientific” design,
in the context of the World Fertility Survey in the 1970s. The Federal
Advisory Committee, consisting largely of prominent epidemiologists,
voted decisively in favor of probability sampling (Thanks for persevering)
Survey samplers distinguish between descriptive estimands (finite
population quantities) and analytic estimands (parameters of a
superpopulation model). Some believe that probability sampling is
important for the former but not the latter. I disagree
Measures of association may be less subject to selection bias than means
and totals, but when there is significant effect modification with observed
or unobserved population characteristics, bias is clearly possible
Closing
An important word is missing from the foregoing: BAYES
Closing
Your conceptual and technological contributions to issues such as the
foregoing are broad and deep
You have been one of the leaders in identifying problems and, rather than
dwelling on the “illness,” in conducting and communicating research aimed at
prevention and cure, with substantial benefits to science and policy
I thank you for these, and for your friendship
And, of course,
I wish you a very happy 65th