Nonparametric hypothesis tests
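Outline
  One-sample tests (7.2)
  Two-sample tests (7.3)
  Ordered categories (7.4)
  Stratified tests (7.5)
  Tests for crossing hazards (7.6)
  Tests based on differences in outcome at a fixed point in time (7.8)

One-sample tests
  Censored sample of size n.
  Test the hypothesis that the population hazard is $h_0(t)$ for all $t \le \tau$, where $h_0$ is a hazard function fully specified over the interval $[0, \tau]$.
  Typically take $\tau$ to be the largest of the study times.
  $H_0: h(t) = h_0(t)$ for all $t \le \tau$
  $H_A: h(t) \ne h_0(t)$ for some $t \le \tau$
  Could also formulate the hypotheses in terms of $H(t)$, $S(t)$, or $f(t)$, since there are one-to-one transformations between these functions.
  Can also formulate them in terms of the restricted mean life on $[0, \tau]$.

Form of the tests: compare observed and hypothesized weighted hazards
  Observed event times $t_1 < t_2 < \dots < t_D$; $d_i$ events at $t_i$; $Y(t_i)$ subjects at risk at $t_i$.
  Increment in the estimated cumulative hazard at $t_i$: $d_i / Y(t_i)$.
  So, compare
    $O(\tau) = \sum_{i=1}^{D} W(t_i) \, d_i / Y(t_i)$ (observed)
  to
    $E(\tau) = \int_0^\tau W(s) \, h_0(s) \, ds$ (expected).
  For the nonparametric (Nelson-Aalen) estimator of the cumulative hazard, $d\hat H(t_i) = d_i / Y(t_i)$, and so $O(\tau) = \int_0^\tau W(s) \, d\hat H(s)$.
  We then compare the weighted sums for the observed and expected to obtain
    $Z(\tau) = O(\tau) - E(\tau)$.
  Variance of the test statistic under the null:
    $V[Z(\tau)] = \int_0^\tau W^2(s) \, h_0(s) / Y(s) \, ds$

Derivation using the counting process approach
  Under $H_0$, $M(t) = N(t) - \int_0^t Y(s) h_0(s) \, ds$ is a martingale.
  $Z(\tau) = \int_0^\tau [W(s) / Y(s)] \, dM(s)$ is a stochastic integral of a weight with respect to a martingale.
  The predictable variation process can be found from that of the original martingale by (3.6).
  Now, $d\langle M \rangle(t) = Y(t) h_0(t) \, dt$, and so
    $\langle Z \rangle(\tau) = \int_0^\tau [W(s) / Y(s)]^2 \, Y(s) h_0(s) \, ds = \int_0^\tau W^2(s) \, h_0(s) / Y(s) \, ds$.
  For large samples, $Z(\tau)$ is asymptotically normally distributed; use $Z(\tau) / \sqrt{V[Z(\tau)]}$ to test the hypothesis.

Choice of weights
  Most popular weight: $W(t) = Y(t)$ yields the (one-sample) log-rank statistic.
  Then $O(\tau) = \sum_i d_i$, the number of observed events up to $\tau$, and $E(\tau) = V[Z(\tau)] = \int_0^\tau Y(s) h_0(s) \, ds$ is the expected number of events.
  Generalization: the Harrington-Fleming family of weights,
    $W_{HF}(t) = Y(t) \, S_0(t)^p \, [1 - S_0(t)]^q$, for $p, q \ge 0$,
  where $S_0(t) = \exp[-H_0(t)]$ is the hypothesized survival function.
  The choice of p and q puts different weights on early and late departures from the null hypothesis.

Disadvantages of one-sample tests
  Neither SAS nor Stata permits one-sample tests.
  It is difficult to fully specify, a priori, a reasonable form for $h_0(t)$.
  Nonetheless valuable didactically: the form of the tests is similar to that of the 2-sample and K-sample tests.

Tests for 2 or more samples
  Let $h_j(t)$ denote the hazard in group j at time t.
  Hypotheses:
    $H_0: h_1(t) = h_2(t) = \dots = h_K(t)$ for all $t \le \tau$
    $H_A$: at least one of the $h_j(t)$ is different for some $t \le \tau$ (a global alternative)
  Notation:
    $d_{ij} \equiv$ number of failures at the ith failure time in group j
    $Y_{ij} \equiv$ number of subjects at risk at the ith failure time in group j
    $d_i = \sum_j d_{ij}$ and $Y_i = \sum_j Y_{ij}$ are the pooled counts
    The index i refers to failures in the pooled group (i.e., all groups put together).

Observed versus expected weighted hazards, as before
  Expected hazards: for the one-sample test, these were computed directly from the prespecified hazard function.
  Here, compute the hazard expected under the hypothesis that there is no association:
    pool all groups together and derive the pooled Nelson-Aalen type estimator of the increments in the cumulative hazard at each failure time, $d_i / Y_i$;
    compare the group-specific increments in the cumulative hazard, $d_{ij} / Y_{ij}$, with the pooled increment at the same failure time;
    compute the summed statistic
      $Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i) \, \{ d_{ij} / Y_{ij} - d_i / Y_i \}$, $j = 1, \dots, K$.
  There are K summed statistics $Z_j(\tau)$.

Combining the $Z_j(\tau)$
  Combine the information from each of the $Z_j(\tau)$ to get an overall test statistic for the global null.
  In principle, a different weight $W_j(t)$ can be used for each group j.
  In practice, weights are restricted to the form $W_j(t_i) = Y_{ij} W(t_i)$, so
    $Z_j(\tau) = \sum_{i=1}^{D} W(t_i) \, \{ d_{ij} - Y_{ij} \, d_i / Y_i \}$,
  a weighted sum of the differences between observed and expected deaths in each group under the null.
  Under the null, the hazard increment is $d_i / Y_i$; multiply by the number at risk in group j to get the expected number of deaths.
  Alternatively, $d_i$ is the observed number of deaths in the pooled sample at $t_i$; under the null, divide the deaths among groups in proportion to the number at risk, so expect $Y_{ij} \, d_i / Y_i$ deaths in group j.
  Similar to the computations in the chi-squared goodness-of-fit statistic.

Variance of the statistics
  Variance:
    $\hat\sigma_{jj} = \sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{ij}}{Y_i} \left(1 - \frac{Y_{ij}}{Y_i}\right) \left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$
  Covariance:
    $\hat\sigma_{jg} = -\sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{ij}}{Y_i} \, \frac{Y_{ig}}{Y_i} \left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$, where $g \ne j$
  Interpretation of the components of the formulae:
    $(Y_i - d_i) / (Y_i - 1)$ is a correction for ties; it equals 1 if $d_i = 1$ (no ties).
    $\frac{Y_{ij}}{Y_i}(1 - \frac{Y_{ij}}{Y_i}) \, d_i$ and $-\frac{Y_{ij}}{Y_i} \frac{Y_{ig}}{Y_i} \, d_i$ come from the variance and covariance of a multinomial random variable with parameters $d_i$ and $p_j = Y_{ij} / Y_i$.

Overall test statistic
  When structured weights are used, the components of the vector $\{Z_j(\tau)\}$ are linearly dependent (they sum to zero).
  So, to test, select any K-1 of the $Z_j$ (as a column vector $Z$), and let $\Gamma$ be the corresponding $(K-1) \times (K-1)$ covariance matrix with entries $\hat\sigma_{jg}$.
  Test statistic:
    $\chi^2 = Z^T \, \Gamma^{-1} \, Z$,
  distributed as $\chi^2_{K-1}$ under $H_0$.
  What additional can be done for K = 2 (2-sample test)?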
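The K-sample computation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the SAS or Stata implementation: the function names `solve` and `weighted_logrank` and the `weight` hook are hypothetical, the weight is assumed to depend only on the pooled number at risk $Y_i$ (which covers log-rank and Gehan weights but not the survival-based weights), and the loops favor readability over speed.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def weighted_logrank(times, events, groups, weight=lambda y: 1.0):
    """K-sample test with structured weights W_j(t_i) = Y_ij * W(t_i).

    times, events (1 = failure, 0 = censored) and groups are parallel lists.
    weight maps the pooled number at risk Y_i to W(t_i); this hook is a
    hypothetical convenience (1.0 gives log-rank, y gives Gehan weights).
    Returns (Z, cov, chi2), with chi2 on K - 1 degrees of freedom.
    """
    labels = sorted(set(groups))
    K = len(labels)
    Z = [0.0] * K
    cov = [[0.0] * K for _ in range(K)]
    data = list(zip(times, events, groups))
    # Loop over the distinct failure times of the pooled sample.
    for t in sorted({u for u, e, _ in data if e == 1}):
        Yj = [sum(1 for u, _, g in data if u >= t and g == lab)
              for lab in labels]
        dj = [sum(1 for u, e, g in data if u == t and e == 1 and g == lab)
              for lab in labels]
        Y, d = sum(Yj), sum(dj)
        w = weight(Y)
        ties = (Y - d) / (Y - 1) if Y > 1 else 0.0  # correction for ties
        for j in range(K):
            pj = Yj[j] / Y
            Z[j] += w * (dj[j] - d * pj)  # observed minus expected deaths
            for g in range(K):
                delta = 1.0 if j == g else 0.0
                # Multinomial variance/covariance with p_j = Y_ij / Y_i.
                cov[j][g] += w * w * d * ties * pj * (delta - Yj[g] / Y)
    # The Z_j sum to zero, so drop the last group before inverting.
    z = Z[:K - 1]
    x = solve([row[:K - 1] for row in cov[:K - 1]], z)
    chi2 = sum(a * b for a, b in zip(z, x))
    return Z, cov, chi2


# Toy data (made up for illustration): two groups, no censoring.
Z, cov, chi2 = weighted_logrank([1, 3, 2, 4], [1, 1, 1, 1],
                                ["A", "A", "B", "B"])
```

Passing `weight=lambda y: y` switches the same routine to Gehan weights; which of the K groups is dropped before inverting does not change the chi-squared value.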
For K = 2 (2-sample test), can get a Z statistic,
    $Z = Z_1(\tau) / \sqrt{\hat\sigma_{11}}$,
  where $\hat\sigma_{11}$ is the estimated variance of $Z_1(\tau)$; compare it to the standard normal distribution, which also allows tests of 1-sided alternatives.

Weight functions W(t)
  Most common: $W(t) = 1$, the log-rank test; optimal power to detect proportional hazards alternatives to the null.
  $W(t_i) = Y_i$ yields a generalization of the Mann-Whitney-Wilcoxon test (Gehan weights).
  Tarone and Ware generalize this: $W(t_i) = f(Y_i)$, where $f(y)$ is a known, fixed function; they suggest $f(y) = \sqrt{y}$.
  Weights based on $Y_i$ can be misleading, since the weights can depend on both the event-time and the censoring distributions, especially when the censoring distributions in the different arms/groups differ.

Alternative versions without this problem
  Let $\tilde S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i + 1}\right)$, a survival estimator close to the product-limit estimator, and use weights $W(t_i) = \tilde S(t_i)$ (Peto and Peto).
  Fleming-Harrington weights:
    $W_{p,q}(t_i) = \hat S(t_{i-1})^p \, [1 - \hat S(t_{i-1})]^q$, for $p, q \ge 0$,
  where $\hat S$ is the product-limit estimator from the pooled sample.
  Use $\hat S(t_{i-1})$ so that the weights are known before the failure (predictable).
  When the weights are predictable, the derivation works easily with counting process theory: $Z_j(\tau)$ is the integral of a predictable process with respect to a martingale.
  Log-rank: p = q = 0.
  Version of Wilcoxon: p = 1, q = 0; gives greater weight to early failures/differences.
  By appropriate choice of p and q, can pick any single region to give the greatest weight.
  The statistics presented are generalizations to censored data of linear rank statistics.

Computation in SAS
  Typically, compute tests with log-rank weights ($W(t_i) = 1$) and Gehan weights ($W(t_i) = Y_i$).
  SAS code:

    proc lifetest;
      time t1*d1(0);
      strata g(0.5 1.5 2.5);
    run;

  The other option uses the TEST statement; the book recommends using the STRATA statement, which automatically produces these 2 test statistics.

  The LIFETEST Procedure
  Testing Homogeneity of Survival Curves for T1 over Strata

  Rank Statistics
    g        Log-Rank    Wilcoxon
    1           2.680       127.0
    2         -15.767     -1545.0
    >=2.5      13.087      1418.0

  Covariance Matrix for the Log-Rank Statistics
    g              1           2       >=2.5
    1        15.5407     -9.9934     -5.5473
    2        -9.9934     19.7910     -9.7976
    >=2.5    -5.5473     -9.7976     15.3449

  Covariance Matrix for the Wilcoxon Statistics
    g              1           2       >=2.5
    1         158038      -96825      -61213
    2         -96825      192190      -95365
    >=2.5     -61213      -95365      156578

  Test of Equality over Strata
    Test         Chi-Square    DF    Pr > Chi-Square
    Log-Rank        15.2198     2             0.0005
    Wilcoxon        16.3034     2             0.0003
    -2Log(LR)       20.6325     2             <.0001

  Other option, using the TEST statement:

    proc lifetest;
      time t1*d1(0);
      test g2 g3;   /* tests groups 2 and 3 separately vs. 1, then combines */
    run;

  The LIFETEST Procedure

  Univariate Chi-Squares for the Wilcoxon Test
    Variable    Test Statistic    Standard Deviation    Chi-Square    Pr > Chi-Square
    g2                 11.1941                3.0937       13.0923             0.0003
    g3                -10.2208                3.0123       11.5126             0.0007

  Covariance Matrix for the Wilcoxon Statistics
    Variable          g2          g3
    g2           9.57106    -4.90931
    g3          -4.90931     9.07394

  Forward Stepwise Sequence of Chi-Squares for the Wilcoxon Test
    Variable    DF    Chi-Square    Pr > Chi-Square    Chi-Square Increment    Pr > Increment
    g2           1       13.0923             0.0003                 13.0923            0.0003
    g3           2       16.1524             0.0003                  3.0601            0.0802

  Univariate Chi-Squares for the Log-Rank Test
    Variable    Test Statistic    Standard Deviation    Chi-Square    Pr > Chi-Square
    g2                 15.7673                4.4521       12.5426             0.0004
    g3                -13.0873                3.9205       11.1435             0.0008

  Covariance Matrix for the Log-Rank Statistics
    Variable          g2          g3
    g2           19.8210     -9.8123
    g3           -9.8123     15.3701

  Forward Stepwise Sequence of Chi-Squares for the Log-Rank Test
    Variable    DF    Chi-Square    Pr > Chi-Square    Chi-Square Increment    Pr > Increment
    g2           1       12.5426             0.0004                 12.5426            0.0004
    g3           2       15.1963             0.0005                  2.6537            0.1033

  The overall test shown here (last row of each stepwise sequence) is somewhat different from what is produced by the STRATA statement.
  The book recommends the first version, which is what is discussed above.

Computation in Stata
  After stsetting the data, use: sts test varlist

    . infile g T1 T2 d1 d2 d3 TA A TC C TP P Z1 Z2 Z3 Z4 Z5 Z6 Z7 z8 z9 z10 using d:\wpfiles\surv_anl\gvt1.txt
    (137 observations read)

    . stset T1 d1

                  id:  --    (meaning each record is a unique subject)
          entry time:  --    (meaning all entered at time 0)
           exit time:  T1
      failure/censor:  d1
    . sts test g

        failure time:  T1
      failure/censor:  d1

    Log-rank test for equality of survivor functions

          |   Events
    g     |  observed    expected
    ------+----------------------
    1     |        24       21.32
    2     |        23       38.77
    3     |        34       20.91
    ------+----------------------
    Total |        81       81.00

          chi2(2) =   15.22
          Pr>chi2 =  0.0005

    . sts test g, wilcoxon

        failure time:  T1
      failure/censor:  d1

    Wilcoxon (Breslow) test for equality of survivor functions

          |   Events                  Sum of
    g     |  observed    expected      ranks
    ------+---------------------------------
    1     |        24       21.32        127
    2     |        23       38.77      -1545
    3     |        34       20.91       1418
    ------+---------------------------------
    Total |        81       81.00          0

          chi2(2) =   16.30
          Pr>chi2 =  0.0003

    . sts test g, logrank

        failure time:  T1
      failure/censor:  d1

    Log-rank test for equality of survivor functions

          |   Events
    g     |  observed    expected
    ------+----------------------
    1     |        24       21.32
    2     |        23       38.77
    3     |        34       20.91
    ------+----------------------
    Total |        81       81.00

          chi2(2) =   15.22
          Pr>chi2 =  0.0005

    . sts test g, fh(0,0)

           failure _d:  d1
     analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events      Events        Sum of
    g     |  observed    expected        ranks
    ------+-----------------------------------
    1     |        24       21.32    2.6800072
    2     |        23       38.77   -15.767289
    3     |        34       20.91    13.087282
    ------+-----------------------------------
    Total |        81       81.00            0

          chi2(2) =   15.22
          Pr>chi2 =  0.0005

    . sts test g, fh(1,0)

           failure _d:  d1
     analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events      Events        Sum of
    g     |  observed    expected        ranks
    ------+-----------------------------------
    1     |        24       21.32    1.0428607
    2     |        23       38.77   -11.899202
    3     |        34       20.91    10.856341
    ------+-----------------------------------
    Total |        81       81.00            0

          chi2(2) =   15.92
          Pr>chi2 =  0.0003
    . sts test g, fh(0,1)

           failure _d:  d1
     analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events      Events        Sum of
    g     |  observed    expected        ranks
    ------+-----------------------------------
    1     |        24       21.32    1.6371465
    2     |        23       38.77   -3.8680867
    3     |        34       20.91    2.2309402
    ------+-----------------------------------
    Total |        81       81.00            0

          chi2(2) =    8.45
          Pr>chi2 =  0.0147

    . sts test g, fh(0.5,0.5)

           failure _d:  d1
     analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events      Events        Sum of
    g     |  observed    expected        ranks
    ------+-----------------------------------
    1     |        24       21.32    2.3425108
    2     |        23       38.77   -6.6296218
    3     |        34       20.91    4.2871109
    ------+-----------------------------------
    Total |        81       81.00            0

          chi2(2) =   14.50
          Pr>chi2 =  0.0007

Tests for trend
  What are the null and alternative hypotheses? What is the idea behind a test for trend? (Same as in categorical data.)
  The null hypothesis is the same as before.
  The idea is to get more power under ordered alternatives.
  Use the same group-specific statistics $Z_j(\tau)$ and estimated covariance matrix.
  The groups are presumed to be meaningfully ordered.
  Assign group-specific scores $a_j$.
  Typically $a_j = j$ is chosen, but other values may be used (e.g., if the categories are based on a continuous variable, might choose the mean of the underlying continuous variable in each category).
  Compute the Z statistic:
    $Z = \frac{\sum_{j=1}^{K} a_j Z_j(\tau)}{\sqrt{\sum_{j=1}^{K} \sum_{g=1}^{K} a_j a_g \hat\sigma_{jg}}}$
  i.e., the numerator is a weighted sum of the statistics, and the denominator is its standard error under the null, derived from the estimated covariances $\hat\sigma_{jg}$ of $Z_j(\tau)$ and $Z_g(\tau)$.

  With SAS, use the TEST statement (asthma data):

    proc lifetest notable;
      time ttr28*rcsr28(1);
      test tseq;
    run;

  Univariate Chi-Squares for the Log-Rank Test
    Variable    Test Statistic    Standard Deviation    Chi-Square    Pr > Chi-Square
    TSEQ              -15.9666               10.0939        2.5021             0.1137

  In Stata, use the trend option (GVT data):
    . sts test g, tr

           failure _d:  d1
     analysis time _t:  T1

    Log-rank test for equality of survivor functions

          |   Events      Events
    g     |  observed    expected
    ------+----------------------
    1     |        24       21.32
    2     |        23       38.77
    3     |        34       20.91
    ------+----------------------
    Total |        81       81.00

          chi2(2) =   15.22
          Pr>chi2 =  0.0005

    Test for trend of survivor functions

          chi2(1) =    2.58
          Pr>chi2 =  0.1082

    . sts test g, tr fh(0.5,0.5)

           failure _d:  d1
     analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events      Events        Sum of
    g     |  observed    expected        ranks
    ------+-----------------------------------
    1     |        24       21.32    2.3425108
    2     |        23       38.77   -6.6296218
    3     |        34       20.91    4.2871109
    ------+-----------------------------------
    Total |        81       81.00            0

          chi2(2) =   14.50
          Pr>chi2 =  0.0007

    Test for trend of survivor functions

          chi2(1) =    0.60
          Pr>chi2 =  0.4385

    . generate g3 = g^2
    . sts test g3, tr

           failure _d:  d1
     analysis time _t:  T1

    Log-rank test for equality of survivor functions

          |   Events      Events
    g3    |  observed    expected
    ------+----------------------
    1     |        24       21.32
    4     |        23       38.77
    9     |        34       20.91
    ------+----------------------
    Total |        81       81.00

          chi2(2) =   15.22
          Pr>chi2 =  0.0005

    Test for trend of survivor functions

          chi2(1) =    4.78
          Pr>chi2 =  0.0289

  Nonlinear transformations of the scores yield different trend tests (the global chi2(2) test is unchanged).

Stratified tests
  Want to test whether, within levels of some covariate(s) X, the hazard is the same across levels of (i.e., not associated with) another covariate A: a test about the main effect of A.
  What is the null hypothesis? When will such tests be of particular use?
  Null hypothesis: $h(t \mid X = x, A = a) = h(t \mid X = x, A = a')$ for all x, a, a'.
  Useful if some covariate X is associated with the hazard: if A is the treatment of interest and is also associated with X, there is confounding, and unconditional tests are biased.
  Similarly, if censoring is associated with X, a comparison of the crude hazards will be biased.
  Stratified tests are nonparametric if the strata levels are naturally defined.
  Strata can also be formed by categorizing continuous variables or collapsing categories.
  What are the problems with this?
  The test is then no longer nonparametric; it makes a homogeneity assumption within each constructed stratum.
  Hypothesis: $H_0: h_{1s}(t) = h_{2s}(t) = \dots = h_{Ks}(t)$ for all strata $s = 1, \dots, M$ and all $t \le \tau$.
  Test statistic: let $Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau)$ and $\hat\sigma_{jg\cdot} = \sum_{s=1}^{M} \hat\sigma_{jgs}$, where $Z_{js}(\tau)$ and $\hat\sigma_{jgs}$ are the statistics and covariances computed within stratum s; construct the chi-squared test statistic as above.
  Limitations of the stratified test?
  As with other stratified analyses, when the number of strata gets large, the strata get small and the test can lose power.

SAS implementation:

    proc lifetest plots = (s);
      time t1*d1(0);
      strata z3;   /* sex */
      test g2 g3;
    run;

Stata (the detail option provides stratum-specific as well as overall stratified tests):

    . sts test g, strata(Z3) detail

        failure time:  T1
      failure/censor:  d1

    Stratified log-rank test for equality of survivor functions

    -> Z3 = 0
          |   Events
    g     |  observed    expected
    ------+----------------------
    1     |         8        7.21
    2     |        10       18.83
    3     |        18        9.96
    ------+----------------------
    Total |        36       36.00

          chi2(2) =   11.33
          Pr>chi2 =  0.0035

    -> Z3 = 1
          |   Events
    g     |  observed    expected
    ------+----------------------
    1     |        16       13.74
    2     |        13       20.35
    3     |        16       10.91
    ------+----------------------
    Total |        45       45.00

          chi2(2) =    5.45
          Pr>chi2 =  0.0655

    -> Total
          |   Events
    g     |  observed    expected(*)
    ------+-------------------------
    1     |        24       20.95
    2     |        23       39.18
    3     |        34       20.87
    ------+-------------------------
    Total |        81       81.00

    (*) sum over calculations within Z3

          chi2(2) =   15.90
          Pr>chi2 =  0.0004

  Results are fairly similar to the unstratified test.

Stratified test for matched pairs
  Each pair is a stratum containing one subject from group 1 and one from group 2.
  In any stratum, if both subjects are at risk at the time of the first failure:
    if the group 1 subject fails first, the contribution to the sum $Z_{1\cdot}(\tau)$ at the first failure time is $W(t_i)(1 - 1/2) = W(t_i)/2$, and the contribution at the subsequent failure time (when/if the other subject fails) is 0;
    if the other subject fails first, the contribution is $W(t_i)(0 - 1/2) = -W(t_i)/2$, with a contribution of 0 at the other failure time.
  If no subject fails, there is no contribution.
  The second failure in a stratum contributes nothing.
  Let $D_1$ be the number of matched pairs in which the individual from group 1 fails first, and $D_2$ the number in which the individual from group 2 fails first.
  For any of the weight functions discussed, get the test statistic
    $Z = \frac{D_1 - D_2}{\sqrt{D_1 + D_2}}$
  What condition is needed for the asymptotics to work?
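As a small illustration of the matched-pairs statistic above, the following Python sketch counts $D_1$ and $D_2$ and forms the Z statistic. The function name `matched_pairs_z` and the input layout are hypothetical, and the handling of a failure tied with the partner's censoring time (the censored partner is treated as still at risk) is an assumed convention, not something stated in the slides.

```python
import math


def matched_pairs_z(pairs):
    """Matched-pairs version of the stratified test.

    pairs: list of ((t1, e1), (t2, e2)), one entry per matched pair, where
    t is the observed time and e is 1 for a failure, 0 for censoring.
    Returns (D1, D2, Z) with Z = (D1 - D2) / sqrt(D1 + D2).
    """
    d1 = d2 = 0
    for (t1, e1), (t2, e2) in pairs:
        # A failure counts only if the partner is still at risk at that time.
        # Assumed convention: a partner censored at the same time is at risk.
        if e1 == 1 and (t1 < t2 or (t1 == t2 and e2 == 0)):
            d1 += 1
        elif e2 == 1 and (t2 < t1 or (t1 == t2 and e1 == 0)):
            d2 += 1
        # Otherwise the pair contributes nothing: no failure, a failure after
        # the partner was censored, or both failing at the same time.
    if d1 + d2 == 0:
        raise ValueError("no informative pairs")
    return d1, d2, (d1 - d2) / math.sqrt(d1 + d2)


# Toy pairs (made up for illustration).
pairs = [((2, 1), (5, 1)), ((3, 1), (4, 0)), ((1, 1), (6, 1)),
         ((7, 1), (2, 1)), ((3, 0), (8, 1)), ((4, 1), (4, 1))]
d1, d2, z = matched_pairs_z(pairs)
```

The resulting Z would be compared with the standard normal distribution, which is reasonable only when the number of informative pairs is large.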
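  The statistic is asymptotically normal as the number of strata (pairs) goes to infinity.
  Strata/pairs in which both subjects fail contribute nothing.
  This is McNemar's test; it is also used in matched case-control studies (there, strata always have 1 failure).
  No loss of information by excluding strata with 0 failures or 2 failures.

Tests for crossing hazards
  Discuss Renyi-type tests: tests with power to detect crossing-hazards alternatives.
  Consider 2-sample tests; these are censored-data analogs of the Kolmogorov-Smirnov statistic.
  Find the value of the test statistic at each death time:
    $Z(t_i) = \sum_{t_k \le t_i} W(t_k) \, \{ d_{k1} - Y_{k1} \, d_k / Y_k \}$
  The increments in the test statistic at death time $t_i$ have positive expectation if $h_1(t_i) > h_2(t_i)$, and negative expectation if the reverse.
  Thus, expect $Z(t)$ to increase until the hazards cross and to decrease thereafter.
  If the hazards cross, the statistic will reach its maximum before the maximum death time.
  Where the hazards do not cross, can base the test on the test statistic at the maximum death time, i.e., $Z(\tau)$; since this is expected to keep increasing, the test will be reasonably powerful.
  Where the hazards cross, this has poor power, since the test statistic decreases after the hazards cross; the early and late parts of the test statistic may cancel each other out.
  Instead, base the test on the maximum value of the test statistic: let
    $Q = \sup\{ |Z(t)| : t \le \tau \} / \sigma(\tau)$
  be the test statistic, where $\sigma^2(\tau)$ is, as before, the variance of $Z(\tau)$ (i.e., at the end of follow-up) under the null:
    $\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2 \, \frac{Y_{k1}}{Y_k} \, \frac{Y_{k2}}{Y_k} \left(\frac{Y_k - d_k}{Y_k - 1}\right) d_k$
  Q is approximately distributed as $\sup\{ |B(x)| : 0 \le x \le 1 \}$, where $B$ is a standard Brownian motion process.
  Critical values of Q are found in Table C.5.

Tests for equality of survival functions at a particular time point
  Let $\theta_j = S_j(t_0)$, with estimator $\hat\theta_j = \hat S_j(t_0)$.
  Let $\theta$, $\hat\theta$, etc. denote the corresponding (column) vectors.
  $H_0: S_1(t_0) = S_2(t_0) = \dots = S_K(t_0)$
  Let C denote a matrix of contrasts, e.g., for K = 3,
    $C = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}$;
  each row defines a separate contrast.
  Hypothesis: $C\theta = 0$.
  How to test:
    $\chi^2 = (C\hat\theta)^T \, [C \hat V C^T]^{-1} \, (C\hat\theta)$,
  where $\hat V$ is the estimated covariance matrix of $\hat\theta$.
  Then $\chi^2$ is the test statistic, with degrees of freedom equal to the number of rows of C.
  What is the structure of $\hat V$, and how does it lead to a simple 2-sample test?
  The $\hat S_j(t_0)$ are independent, so $\hat V$ is diagonal; for two samples this leads to the Z test
    $Z = \frac{\hat S_1(t_0) - \hat S_2(t_0)}{\sqrt{\hat V[\hat S_1(t_0)] + \hat V[\hat S_2(t_0)]}}$

Related: nonparametric estimation of contrasts in survival
  Relative risk: general term.
  Risk ratio: estimate as the ratio of the estimated risks at $t_0$,
    $\widehat{RR}(t_0) = \frac{1 - \hat S_1(t_0)}{1 - \hat S_0(t_0)}$
  How should one estimate confidence intervals?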
  Delta method with a log transformation: not required, but the usual approach.
  The log transformation gives symmetry of the confidence intervals when the indices of exposed/unexposed are permuted.
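The fixed-time-point comparison and the log-scale confidence interval can be sketched together in Python. This is a hedged illustration under several assumptions that are not in the slides: the function names `km_at` and `compare_at` are hypothetical, the variance of each survival estimate is taken from Greenwood's formula, the risk ratio is taken as the ratio of $1 - \hat S(t_0)$ in the two groups, and the delta method on the log scale is assumed to give $\widehat{Var}[\log \widehat{RR}] \approx \hat V_1 / (1 - \hat S_1)^2 + \hat V_0 / (1 - \hat S_0)^2$. The sketch also assumes at least one failure before $t_0$ in each group.

```python
import math


def km_at(times, events, t0):
    """Kaplan-Meier estimate of S(t0) and its Greenwood variance."""
    s, g = 1.0, 0.0
    fail_times = sorted({u for u, e in zip(times, events) if e == 1 and u <= t0})
    for t in fail_times:
        y = sum(1 for u in times if u >= t)  # number at risk at t
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        s *= 1 - d / y
        if y > d:  # guard against division by zero when everyone fails
            g += d / (y * (y - d))
    return s, s * s * g


def compare_at(group1, group0, t0, z_crit=1.96):
    """Z test for S_1(t0) = S_0(t0), plus risk ratio with log-scale CI.

    group1 and group0 are (times, events) tuples; events: 1 = failure.
    """
    s1, v1 = km_at(*group1, t0)
    s0, v0 = km_at(*group0, t0)
    # Independent samples: the variance of the difference is the sum.
    z = (s1 - s0) / math.sqrt(v1 + v0)
    f1, f0 = 1 - s1, 1 - s0  # estimated risks (cumulative incidence)
    rr = f1 / f0
    # Delta method on the log scale (assumed form, see lead-in).
    se_log = math.sqrt(v1 / f1 ** 2 + v0 / f0 ** 2)
    ci = (rr * math.exp(-z_crit * se_log), rr * math.exp(z_crit * se_log))
    return z, rr, ci


# Toy data (made up for illustration): no censoring, t0 = 2.5.
exposed = ([1, 2, 3, 4], [1, 1, 1, 1])
unexposed = ([1, 3, 5, 6], [1, 1, 1, 1])
z, rr, ci = compare_at(exposed, unexposed, 2.5)
z_swapped, rr_swapped, ci_swapped = compare_at(unexposed, exposed, 2.5)
```

Swapping the two groups inverts the risk ratio and maps the interval (lower, upper) to (1/upper, 1/lower), which is the symmetry property that motivates working on the log scale.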