Survival Analysis and Competing Risks Data Analysis
Survival Analysis
DFCI/BCB Training Session, January-February 2016
Haesook T Kim
Department of Biostatistics and Computational Biology
Dana-Farber Cancer Institute

Contents
1. Introduction
2. Terminologies and Notations
3. Nonparametric Estimation
4. Comparison of Survival Curves: Nonparametric Approach
5. Semiparametric Survival Methods: PH Model
6. Analysis of Competing Risks Data

1. Introduction
• Survival analysis is used to analyze data in which the time until an event is of interest.
• Survival analysis utilizes two measurements simultaneously: a binary response (i.e., occurrence of an event vs. no event) and a continuous response (i.e., time to the event). It allows incomplete measurement of the time to the event (i.e., censoring).
• The event should be clearly and precisely defined before the analysis. Examples of an 'event' in cancer clinical trials include death, disease recurrence after treatment, disease progression, treatment-related death, and incidence of new disease. Note: 'event' and 'failure' are often used interchangeably.
• The time origin should be unambiguously defined before the analysis. All individuals should be as comparable as possible at their time origin, e.g., date of randomization, date of documented CR, date of transplantation, date of study enrollment in prospective studies. The time origin (or time zero) is usually well defined in prospective studies, but not always well defined in retrospective studies (e.g., onset of thrombotic microangiopathy (TMA) after allogeneic transplant, readmission studies, the ES study, the Stanford Heart Study, and so on).
"While it might be more biologically meaningful to measure time from the first instant at which the patient's symptoms met certain criteria of severity, the difficulty of determining and the possibility of bias in such values would normally exclude their use as time origin. Such information might, however, be useful as an explanatory variable" (Cox and Oakes, Analysis of Survival Data).
• The time origin does not need to be, and usually is not, at the same calendar time for each individual. Most clinical trials have staggered entry, i.e., patients enter over a certain time period.
[Figure: staggered entry of patients between calendar years 2001 and 2010, with each patient's follow-up aligned to a common time origin (t=0 to t=10).]
• The timescale (the scale for measuring time) should be identical for all individuals, e.g., days, months, or years.
• Characteristics of failure time:
– Failure time is always non-negative.
– It has a skewed distribution and will never be normally distributed; thus reporting the mean survival time is not very meaningful.
– The probability of surviving past a certain time is often more relevant than the expected survival time. The expected survival time may be difficult to estimate if the amount of censoring is large.

1.1 Censoring
• Right-censoring: a failure time is censored if the failure (or event) has not occurred by the end of follow-up, i.e., the true unobserved event lies to the right of the censoring time.
δ = 1 if T ≤ C (uncensored or observed)
δ = 0 if T > C (censored)
where δ is the failure indicator, C is the censoring time, and T is the failure time. A small simulation illustrating this setup follows.
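A minimal R sketch (not from the slides) illustrating Y = min(T, C) and δ = I(T ≤ C); the exponential rates 0.10 and 0.05 are arbitrary values chosen for illustration only.

set.seed(1)
n  <- 10
ft <- rexp(n, rate = 0.10)        # latent failure times T (illustrative rate)
ct <- rexp(n, rate = 0.05)        # independent censoring times C (illustrative rate)
y     <- pmin(ft, ct)             # observed time Y = min(T, C)
delta <- as.numeric(ft <= ct)     # failure indicator delta = I(T <= C)
data.frame(y = round(y, 1), delta)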
• There are three types of right-censoring: Type I, Type II, and Type III.
– Type I censoring occurs when a study is planned to end after a predetermined time, so all Ci are the same (e.g., sacrificing all animals after one month).
– Type II censoring occurs when a study terminates after a predetermined number of events has been observed (e.g., terminate when X deaths have occurred in an animal study). Type II censoring is generally not a feasible design if there is staggered entry to the study. (See Kalbfleisch and Prentice for further details.)
– Type III censoring: C is a random variable, and δ is an indicator.
– We usually deal with random Type III right-censoring, in which each subject's event time is observed only if the event occurs before a certain time, but the censoring time can vary between subjects.
• Left-censoring: a failure time is censored if the failure is known to have occurred before a certain time.
δ = 1 if C ≤ T (observed)
δ = 0 if C > T (censored)
where δ is the failure indicator, C is the censoring time, and T is the failure time.
– In an HIV study, HIV infection may have occurred before an individual enters a cohort study on AIDS. If the time origin is unknown, it is left-censored.
– A study of the age at which African children learn a task: some already knew it (left-censored), some learned it during the study period, and some had not learned it by the end of the study (right-censored).
– A survey of smoking. Q: "When did you first smoke?" Answers are an exact age, never smoked, or smoked but cannot remember the exact time (left-censored).
• Interval-censoring: a combination of right- and left-censoring. E.g., in the nephropathy study for diabetics (Andersen et al., pages 30-31, Statistical Models Based on Counting Processes), for some individuals the exact time of onset of diabetic nephropathy (DN) was not observed; only the time last seen without DN and the time first seen with DN are available.
• Independent vs. informative censoring:
– T and C are independent (T ⊥ C, non-informative censoring) if the censoring distribution contains no information about the distribution of T. Note: we usually assume independence.
– Censoring is considered informative if the distribution of C contains any information about the parameters characterizing the distribution of T.

2. Terminologies and Notations
• T: time to an event.
• C: censoring time.
• Y = min(T, C): right-censored observed time.
• δ = I(T ≤ C): failure indicator, i.e., δ = 1 if Y = T; δ = 0 otherwise.
• X: a random vector of covariates. Covariates can be discrete or continuous; they can also be time-constant or time-varying.
• λ(t): hazard function
• Λ(t): cumulative hazard function
• S(t): (unconditional) survival function, S(t) = Prob(T ≥ t)
If there are covariates, then the conditional functions are
• λ(t|X): conditional hazard function
• Λ(t|X): conditional cumulative hazard function
• S(t|X): conditional survival function
The relationship of these conditional functions will be discussed later.

If T is a continuous random variable, then S(t) = 1 − F(t), where F(t) is the cumulative distribution function of T. The hazard function is
λ(t) = lim_{u→0} Prob(t ≤ T < t+u | T ≥ t) / u
     = lim_{u→0} Prob(t ≤ T < t+u and T ≥ t) / [Prob(T ≥ t) · u]
     = lim_{u→0} Prob(t ≤ T < t+u) / [Prob(T ≥ t) · u]
     = lim_{u→0} [F(t+u) − F(t)] / [S(t) · u] = [∂F(t)/∂t] / S(t) = f(t)/S(t),
where f(t) is the probability density function of T evaluated at t.
Furthermore,
f(t)/S(t) = −[∂S(t)/∂t] / S(t) = −∂log S(t)/∂t,
so that
λ(t) = −∂log S(t)/∂t.
Thus,
Λ(t) = ∫_0^t λ(v) dv = −log S(t), and S(t) = exp[−Λ(t)].

If T is a discrete random variable, then
λ(tk) = P(T = tk) / P(T ≥ tk) = f(tk)/S(tk), where
S(tk) = P(T ≥ tk) = P(T ≥ t1, T ≥ t2, ..., T ≥ tk)
      = P(T ≥ t1) P(T ≥ t2 | T ≥ t1) ... P(T ≥ tk | T ≥ tk−1)
      = P(T ≥ t1) ∏_{j=2}^{k} P(T ≥ tj | T ≥ tj−1)
      = ∏_{j=1}^{k} [1 − P(T = tj | T ≥ tj)] = ∏_{j=1}^{k} (1 − λj).

The survival probability at a certain time is a conditional probability of surviving beyond that time, given that an individual has survived just prior to that time. This conditional probability can be estimated in a study as the number of patients who are alive or event-free without loss to follow-up at that time divided by the number of patients who are alive just prior to that time. The Kaplan-Meier (KM) estimate of the survival probability is then the product of these conditional probabilities up until that time. We will discuss this further later.

2.1 Properties of the distribution of T
Tq is the time by which a fraction q of the subjects will fail (the q-th quantile), i.e., the value t such that S(t) = 1 − q:
Tq = S^{−1}(1 − q) = Λ^{−1}[−log(1 − q)].
The median life length, the time by which 50% of subjects will fail, is obtained by setting S(t) = 0.5:
T0.5 = S^{−1}(0.5) = Λ^{−1}[log(2)].
If the survival distribution is exponential, then λ(t) = λ (i.e., constant hazard over time), and thus
Λ(t) = λt, S(t) = exp[−Λ(t)] = exp(−λt), and T0.5 = log(2)/λ.
If the survival distribution is Weibull, then λ(t) = αγt^{γ−1}, and thus
Λ(t) = αt^γ, S(t) = exp(−αt^γ), and T0.5 = [log(2)/α]^{1/γ}.

3. Nonparametric Estimation
3.1 Kaplan-Meier Method
Since the true survival distribution is seldom known, it is useful to estimate the distribution without making any assumptions. Let Fn(t) be the usual empirical cumulative distribution function in the absence of censoring. Then a nonparametric estimator of S(t) is Sn(t) = 1 − Fn(t) based on the observed failure times T1, ..., Tn:
Sn(t) = [number of Tj > t]/n.
That is, Sn(t) is the fraction of observed failure times that exceed t. In the presence of censoring, however, S(t) can be estimated up until the end of follow-up time by the Kaplan-Meier product-limit estimator (1958, JASA). The product-limit estimator is a nonparametric maximum likelihood estimator, and the formula is as follows:
Ŝ(t) = ∏_{j: tj < t} (1 − λ̂j) = ∏_{j: tj < t} (1 − dj/nj),
where t1, t2, ..., tk are the unique event times, dj is the number of failures at tj, and nj is the number of subjects at risk just prior to tj. The KM estimator of Λ(t) is Λ̂(t) = −log Ŝ(t), and an estimator of the q-th quantile failure time is Ŝ^{−1}(1 − q).

Example: suppose a set of failure times is
10 30 30 50+ 70+ 90 100+
where + denotes a censored time. Then
---------------------------------------------------------------
ti     ni  di  ci  di/ni  (ni-di)/ni  est. of S(t)
---------------------------------------------------------------
0      7   0   0   0/7    7/7         1
10     7   1   0   1/7    6/7         1*(6/7)=0.85
30     6   2   0   2/6    4/6         0.85*(4/6)=0.57
50+    4   0   1   0/4    4/4         0.57*(4/4)=0.57
70+    3   0   1   0/3    3/3         0.57*(3/3)=0.57
90     2   1   0   1/2    1/2         0.57*(1/2)=0.29
100+   1   0   1   0/1    1/1         0.29*(1/1)=0.29
---------------------------------------------------------------

There are a few variance estimators for Ŝ(t). The Greenwood formula for the asymptotic variance of Ŝ(t) is
Var̂(Ŝ(t)) = Ŝ²(t) Σ_{j: tj ≤ t} dj / [nj(nj − dj)].
The Greenwood formula may be unstable in the tail of the distribution, and some authors have suggested alternatives. E.g., Tsiatis's estimator is
Var̂(Ŝ(t)) = Ŝ²(t) Σ_{j: tj ≤ t} dj / nj².
Once we estimate the survival function, we can construct pointwise (1−α)% confidence intervals:
Ŝ(t) ± z_{1−α/2} s.e.[Ŝ(t)].
However, this approach can yield bounds outside of [0, 1]. If this happens, replace values <0 with 0 and values >1 with 1. Another approach is taking a log-log transformation:
• L(t) = log(−log(S(t)))
• 95% CI: [L̂(t) − A, L̂(t) + A]
• Since S(t) = exp(−exp(L(t))), the confidence bounds for the 95% CI on S(t) are
[exp(−e^{L̂(t)+A}), exp(−e^{L̂(t)−A})]
• Substituting L̂(t) = log(−log(Ŝ(t))) back into the above bounds, we get
([Ŝ(t)]^{exp(A)}, [Ŝ(t)]^{exp(−A)})
• Replacing A with 1.96 s.e.(L̂(t)) gives [Ŝ(t)]^{exp(±1.96 s.e.(L̂(t)))}.

Example 1a: product-limit estimator of S(t).
I. SAS code:
proc lifetest data=one outsurv=outs timelist=(12, 24) plots=(s) graphics;
time time*status(0);
run;

The LIFETEST Procedure
Product-Limit Survival Estimates
Timelist  os_t     Survival  Failure  Std Error  Number Failed  Number Left
12.0000   11.6961  0.4742    0.5258   0.0375     94             77
24.0000   22.8994  0.3674    0.6326   0.0387     108            35

Summary Statistics for Time Variable os_t
Quartile Estimates
Percent  Point Estimate  95% CI [Lower   Upper)
75       53.3881         37.5524   .
50       9.1335          6.5380    16.0329
25       3.6468          2.4312    4.5010

Mean 22.1518   Standard Error 1.7612

Summary of the Number of Censored and Uncensored Values
Total  Failed  Censored  Percent Censored
181    114     67        37.02

proc print data=outs;   ** _CENSOR_=1 for censored observations;

Obs  time     _CENSOR_  SURVIVAL  SDF_LCL  SDF_UCL
91   11.4333  1         0.48027   .        .
92   11.6961  0         0.47419   0.40073  0.54765   (i.e., 0.47419 +/- 1.96*0.0375)
93   11.7290  1         0.47419   .        .
94   12.1889  1         0.47419   .        .
130  22.1109  1         0.37681   .        .
131  22.8994  0         0.36739   0.29164  0.44314
132  22.9651  1         0.36739   .        .
133  22.9651  1         0.36739   .        .
134  23.3593  1         0.36739   .        .
135  23.6879  1         0.36739   .        .
136  24.3778  1         0.36739   .        .
137  24.7721  0         0.35658   0.28015  0.43301

Note1: SAS PROC LIFETEST prints a pointwise 95% confidence interval of the KM estimate at each observed failure time. Use ALPHA=0.1 for 90% confidence intervals. The Greenwood formula is the default. These are pointwise confidence intervals for particular time points and should not be interpreted as a global confidence band.
Note2: survfit in R offers other options to calculate the confidence intervals (e.g., conf.type="log-log", "log", ...), as in the sketch below.
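A minimal R sketch (not from the slides) reproducing the hand-computed table above with survfit; conf.type = "plain" gives the untransformed Greenwood interval, and conf.type = "log-log" the transformed interval just described.

library(survival)
time   <- c(10, 30, 30, 50, 70, 90, 100)
status <- c( 1,  1,  1,  0,  0,  1,   0)   # 0 = censored (the "+" times)
fit <- survfit(Surv(time, status) ~ 1, conf.type = "plain")
summary(fit)   # estimates 0.857, 0.571, 0.286 match the table (to rounding)
# log-log transformed intervals, which always stay within [0, 1]:
summary(survfit(Surv(time, status) ~ 1, conf.type = "log-log"))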
ods graphics on;
ods rtf file="lifetest.rtf";
ods noptitle;
ods select SurvivalPlot;   * to select the survival curve only;
*proc lifetest data=one plots=survival(cl cb=hw strata=panel);   * check this out;
*proc lifetest data=one plots=survival(cl cb=hw);   * this produces output tables;
proc lifetest data=one plots=survival(cl cb=hw) notable;
time survtime*censor(1);
strata cell;
run;
ods rtf close;
ods graphics off;

[Figure: Kaplan-Meier curves of probability vs. months (0-36), shown with and without pointwise 95% confidence intervals.]

II. R code:
library(survival)
attach(subset(aml.data, grp==1))
par(mfrow=c(1,2))
fit <- survfit(Surv(os.t, os) ~ grp, type="kaplan-meier")
plot(fit, xlab="months", ylab="probability", mark.time=T, xlim=c(0,36), xaxt="n", col=4)
title("KM curve with 95% CI", cex = 0.7)
# axis(1, at = c(0, 6, 12, 18, 24, 30, 36), lwd=0.5)
axis(2, at = c(0, 0.2, 0.4, 0.6, 0.8, 1), lwd=0.5)
fit2 <- survfit(Surv(os.t, os) ~ grp, conf.type=c("none"), data=temp)
plot(fit2, xlab="months", ylab="probability", mark.time=T, xlim=c(0,36), xaxt="n", col=6)
title("KM curve without 95% CI", cex = 0.7)
# axis(1, at = c(0, 6, 12, 18, 24, 30, 36), lwd=2)
axis(2, at = c(0, 0.2, 0.4, 0.6, 0.8, 1), lwd=2)

3.2 Estimation of Tq: PROC LIFETEST, SAS v9.2
• Tq = min{tj | Ŝ(tj) < 1 − q}. If Ŝ(t) = 1 − q from tj to tj+1, then Tq is taken to be (tj + tj+1)/2 in SAS and R. Note: by definition Ŝ(t) = 1 − q, but in practice Ŝ(t) ≤ 1 − q, thus Tq = tj.
• Median survival time: T0.5 = min{tj | Ŝ(tj) < 0.5}.
• A confidence interval for the median survival time based on Brookmeyer and Crowley (1982) is the set of times t satisfying
Y = (Ŝ(t) − 0.5)² / Var̂(Ŝ(t)) ≤ cα,
where cα is the upper α-th percentile of a central chi-square distribution with 1 d.f., i.e., Prob(Y > cα) = α where Y ~ χ²₁ (e.g., Prob(Y > 3.84146) = 0.05).
• This methodology was further generalized to construct the confidence interval for Tq based on a g-transformed confidence interval for S(t) (Klein and Moeschberger, 1997):
|g(Ŝ(t)) − g(1 − q)| / [g′(Ŝ(t)) σ̂(Ŝ(t))] ≤ z_{1−α/2},
where g′(x) is the first derivative of g(x) and z_{1−α/2} is the 100(1 − α/2)th percentile of the standard normal distribution.
• Options for g(·): Linear (x), Log-Log (log(−log(x))), Arcsine-Square Root (sin^{−1}(√x)), Logit (log(x/(1−x))), Log (log(x)).
• Note: the default method may differ among software packages, so it is important to check the default setting in each package. The R side is sketched below.
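A minimal R sketch (not from the slides) of quantile estimation: quantile() on a survfit object returns Tq with confidence limits, and the limits (not the point estimates) change with the transformation chosen via conf.type. The lung data set shipped with the survival package is used only as a convenient example.

library(survival)
fit.ll <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log-log")
fit.lg <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log")
quantile(fit.ll, probs = c(0.25, 0.50, 0.75))   # T_.25, median, T_.75
quantile(fit.lg, probs = c(0.25, 0.50, 0.75))   # same estimates, different CIs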
proc lifetest data=one outsurv=out1 stderr timelist=(12,24);
time SurvTime*Censor(1);
data temp;
set out1;
ci_med=((SURVIVAL-0.5)/sdf_stderr)**2;
run;
proc print data=temp;
-------------------------------------------------------------------------
The LIFETEST Procedure
Product-Limit Survival Estimates
Timelist  SurvTime  Survival  Failure  Std Error  Number Failed  Number Left
12.0000   12.000    0.8759    0.1241   0.0282     17             120
24.0000   24.000    0.7518    0.2482   0.0369     34             103

Summary Statistics for Time Variable SurvTime
Quartile Estimates
Percent  Point Estimate  Transform  95% CI [Lower  Upper)   ** note [, ): excludes the next event
75       162.000         LOGLOG     132.000  231.000
50       80.000          LOGLOG     52.000   100.000
25       25.000          LOGLOG     18.000   33.000

Mean 132.777   Standard Error 15.368

Summary of the Number of Censored and Uncensored Values
Total  Failed  Censored  Percent Censored
137    128     9         6.57

Obs  SurvTime   _CENSOR_  SURVIVAL  SDF_STDERR  SDF_LCL  SDF_UCL  ci_med
32   45         0         0.63408   0.041228    0.54737  0.70863  10.58
33   48         0         0.62671   0.041403    0.53984  0.70174  9.37
34   49         0         0.61933   0.041567    0.53233  0.69483  8.24
35   51         0         0.59721   0.041998    0.50992  0.67399  5.36
36   52 (LL)    0         0.57509   0.042340    0.48769  0.65298  3.15
37   53         0         0.56772   0.042434    0.48031  0.64594  2.55
38   54         0         0.55298   0.042594    0.46562  0.63181  1.55
39   56         0         0.54560   0.042659    0.45830  0.62471  1.14
40   59         0         0.53823   0.042715    0.45100  0.61760  0.80
41   61         0         0.53086   0.042761    0.44372  0.61047  0.52
42   63         0         0.52348   0.042798    0.43645  0.60332  0.30
43   72         0         0.51611   0.042826    0.42921  0.59616  0.14
44   73         0         0.50874   0.042844    0.42198  0.58897  0.04
45   80         0         0.49399   0.042852    0.40759  0.57456  0.02
46   82         0         0.48662   0.042842    0.40041  0.56732  0.10
47   83         1         0.48662   .           .        .        .
48   84         0         0.47913   0.042832    0.39313  0.55997  0.24
49   87         0         0.47165   0.042812    0.38587  0.55261  0.44
50   87         1         0.47165   .           .        .        .
51   90         0         0.46404   0.042792    0.37849  0.54512  0.71
52   92         0         0.45643   0.042762    0.37113  0.53762  1.04
53   95         0         0.44122   0.042668    0.35647  0.52255  1.90
54   97         1         0.44122   .           .        .        .
55   99         0         0.42574   0.042552    0.34160  0.50718  3.05
56   100 (UL)   0         0.41799   0.042477    0.33419  0.49946  3.73
57   100        1         0.41799   .           .        .        .
58   103        0         0.41011   0.042401    0.32665  0.49161  4.49
59   103        1         0.41011   .           .        .        .
60   105        0         0.40207   0.042325    0.31896  0.48360  5.35

Note: when censored and failure times are tied, the failure time is assumed to occur before the censored time.

data one;   * Example when the estimate of S(t)=0.5;
input time event;
cards;
10 1
30 1
30 1
33 1
50 1
70 0
90 1
100 0
;
proc lifetest;
time time*event(0);
run;

Product-Limit Survival Estimates
time      Survival  Failure  Std Error  Number Failed  Number Left
0.000     1.0000    0        0          0              8
10.000    0.8750    0.1250   0.1169     1              7
30.000    .         .        .          2              6
30.000    0.6250    0.3750   0.1712     3              5
33.000    0.5000    0.5000   0.1768     4              4
50.000    0.3750    0.6250   0.1712     5              3
70.000*   .         .        .          5              2
90.000    0.1875    0.8125   0.1578     6              1
100.000*  .         .        .          6              0
NOTE: The marked survival times are censored observations.

Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  Transform  95% CI [Lower  Upper)
75       90.000          LOGLOG     30.000   .
50       41.500          LOGLOG     10.000   .
25       30.000          LOGLOG     10.000   50.000
*** median = (33+50)/2 = 41.5

> library(survival)
> attach(median_data)
> my.surv <- Surv(time, event)
> s1 <- survfit(Surv(time, event)~dummy)
> print(s1)
Call: survfit(formula = Surv(time, event) ~ dummy)
records  n.max  n.start  events  median  0.95LCL  0.95UCL
8.0      8.0    8.0      6.0     41.5    30.0     NA

NOTE: the median time is the same but the 95% CI is different between SAS and R. This is because SAS uses the Brookmeyer and Crowley method as the default and R uses a log transformation as the default.

proc lifetest conftype=log;
time time*event(0);
run;
Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  Transform  95% CI [Lower  Upper)
75       90.000          LOG        50.000   .
50       41.500          LOG        30.000   .
25       30.000          LOG        10.000   .

[Figure: four panels of the KM curve for this example: the median survival time where S(t)=0.5, and pointwise 95% CIs using the log-log transformation, the log transformation (the default in R), and 'plain' (no transformation, +/- 1.96*s.e.).]

3.3 Life table
• This is a precursor of the KM method.
• A life table is a summary of the survival data grouped into convenient time intervals. In some applications, data are collected in such a grouped form. In other cases, the data might be grouped to get a simpler and more easily understood presentation.
• The life table method is designed primarily for situations in which actual failure and censoring times are unavailable and only the number of failures and the number of censored cases are known in a given interval.
• A good example is the SEER (Surveillance Epidemiology and End Results) data from NCI. SEER provides information on cancer statistics to help reduce the burden of this disease on the U.S. population. They show the survival curve by type of cancer yearly.
• We will not discuss the details of this method further; see Section 3.6 in Lawless, "Statistical Models and Methods for Lifetime Data", for descriptions and examples. A small sketch of the computation follows the example below.

Example 1b: Life Table from Grouped Data. Stanford Heart Transplant Data. See also page 41 in Allison's Survival Analysis Using SAS.
data;
input time status number @@;
cards;
25 1 16     25 0 3
75 1 11     75 0 0
150 1 4     150 0 2
300 1 5     300 0 4
550 1 2     550 0 6
850 1 4     850 0 3
1150 1 1    1150 0 2
...
;
proc lifetest method=life intervals=50 100 200 400 700 1000 1300 1600 plots=(s, h) graphics;
time time*status(0);
freq number;
run;
Note: at time=50, there are 16 deaths and 3 censored. Thus Ŝ(t = 50) = (63−16)/63 = 0.75.
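A minimal base-R sketch (not from the slides) of the actuarial life-table computation for the grouped data above: within each interval the effective number at risk is n − c/2, and the survival estimate is the running product of the interval survival probabilities. Note that the quick check above, (63 − 16)/63, omits the c/2 adjustment that METHOD=LIFE applies.

deaths   <- c(16, 11, 4, 5, 2, 4, 1)      # events per interval (from the data step)
censored <- c( 3,  0, 2, 4, 6, 3, 2)      # censored per interval
at.risk  <- 63 - cumsum(c(0, head(deaths + censored, -1)))  # entering each interval
eff.n    <- at.risk - censored / 2        # actuarial effective sample size
surv     <- cumprod(1 - deaths / eff.n)   # life-table estimate of S(t)
round(surv, 3)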
4. Comparison of Survival Curves: Nonparametric Approach
If there are two (or more) treatment groups in a clinical trial, a natural question to ask is whether one treatment prolongs survival compared to the other, i.e., testing H0: S1(t) = S2(t) for all t. There are a few nonparametric tests to answer this question. These are:
• Log-rank test: Mantel-Haenszel test (default in PROC LIFETEST), Peto & Peto test
• Wilcoxon test: Gehan test (default in PROC LIFETEST), Peto and Prentice test
• Gray and Tsiatis log-rank test for cure-rate survival curves.
Note:
• The weighted log-rank test was suggested by many authors, but is not widely used.

Table. Weights available in PROC LIFETEST
Test                 W(ti)
Log-rank             1
Wilcoxon             ni
Tarone and Ware      √ni
Peto-Peto            S̄(ti)
Modified Peto-Peto   S̄(ti) ni/(ni+1)
Fleming-Harrington   [Ŝ(ti)]^p [1 − Ŝ(ti)]^q
where Ŝ(t) is the product-limit estimate at t and S̄(t) is a survivor function estimate.

• There is also a likelihood ratio test in PROC LIFETEST. This test assumes an exponential survival distribution, i.e., a constant hazard function.

4.1 Log-rank test
Let t1 < ... < tk represent the k ordered distinct failure times for the sample formed by pooling the two (or p) samples. At the j-th failure time tj, we have the following table:

Treatment   no. of deaths   no. alive      Total
1           d1j             n1j − d1j      n1j
2           d2j             n2j − d2j      n2j
Total       dj              nj − dj        nj

where d1j and d2j are the numbers of failures in treatments 1 and 2, respectively, at the j-th failure time, and n1j and n2j are the numbers of patients at risk in treatments 1 and 2, respectively, at the j-th failure time. Then the log-rank statistic over the k failure times is
w = Σ_{j=1}^{k} (d1j − n1j dj / nj) = Σ_{j=1}^{k} (o1j − e1j),
i.e., a sum of deviations of the observed number of events from the expected number of events over time.
Note: by symmetry, the absolute value of w for treatment 1 is the same as the absolute value for treatment 2. Thus, whether summing over treatment 1 or 2, we will have the same overall test statistic, since the numerator of the overall test statistic is the square of w.

If the k contingency tables were independent, the variance of the log-rank statistic w would be V = V1 + ... + Vk, where
V = Σ_{j=1}^{k} n1j n2j dj (nj − dj) / [nj² (nj − 1)],
and an approximate test of the equality of the two (or p) survival distributions is based on an asymptotic χ²₁ (or χ²_{p−1} for p samples) distribution for
w′ V^{−1} w.
Thus, the log-rank test is obtained by constructing a 2x2 table at each distinct failure time and comparing the failure rates between the two groups, conditional on the numbers at risk in the groups. The tables are then combined using the Cochran-Mantel-Haenszel test; this computation is sketched in R after this section.
The log-rank statistic is most powerful if the hazard ratios (or odds ratios) among the samples are constant over time (called 'proportional hazards'). Departure from proportional hazards can be checked by examining the estimated survival curves.
Note: Mantel and Haenszel (1959) proposed the summary statistic for K strata of 2x2 tables
M² = (|Σ n11k − Σ m11k| − 0.5)² / Σ V(n11k),
assuming the null hypothesis of conditional independence. This statistic has approximately a chi-squared distribution with df=1. Cochran (1954) proposed a similar statistic without the continuity correction (0.5), conditioned only on the group totals in each stratum, treating each 2x2 table as two independent binomials. Because of the similarity, the statistic is called the "Cochran-Mantel-Haenszel (CMH)" test. Later, the CMH test was generalized to IxJxK tables by Birch (1965), Landis et al. (1978), and Mantel and Byar (1978), and is called "the generalized CMH test".
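A minimal R sketch (not from the slides) of the computation just described: build the 2x2 table at each distinct failure time, accumulate w = Σ(o − e) and the hypergeometric variance V, and form w²/V. It assumes two groups coded 1 and 2 and should agree with survdiff().

library(survival)
logrank.by.hand <- function(time, status, group) {
  ft <- sort(unique(time[status == 1]))            # distinct failure times
  w <- V <- 0
  for (t in ft) {
    n1 <- sum(time >= t & group == 1)              # at risk in group 1
    n2 <- sum(time >= t & group == 2)              # at risk in group 2
    d  <- sum(time == t & status == 1)             # total deaths at t
    d1 <- sum(time == t & status == 1 & group == 1)
    n  <- n1 + n2
    w  <- w + (d1 - n1 * d / n)                    # observed minus expected
    if (n > 1) V <- V + n1 * n2 * d * (n - d) / (n^2 * (n - 1))
  }
  c(chisq = w^2 / V)
}
# For the leukemia data of Example 2A below, this should return 16.79,
# matching survdiff(Surv(time, dth) ~ grp)$chisq.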
4.2 Wilcoxon Test
The Wilcoxon statistic is
w = Σ_{j=1}^{k} rj (d1j − n1j dj / nj),
and the test statistic is
χ² = w′ V^{−1} w, where V = Σ_{j=1}^{k} rj² n1j n2j dj (nj − dj) / [nj² (nj − 1)].
The Wilcoxon statistic is a weighted log-rank statistic, i.e., the Wilcoxon test gives more weight to early times than to late times, since the weight rj, j = 1, ..., k, reflects the number of subjects at risk at each time and always decreases. Thus, it is less sensitive than the log-rank test to differences between groups that occur at later points in time.
Note: neither test is particularly good at detecting differences when survival curves cross.

4.3 Gray and Tsiatis Log-rank test for a cure rate model
Gray and Tsiatis (1989, Biometrics) proposed a linear rank test that tests the equality of survival distributions, giving more weight to later survival differences than does the log-rank test. Their proposed statistic is
Z_{−1} = Σ [KM(ti−)]^{−1} [ΔN1(ti) − p(ti)] / √( Σ [KM(ti−)]^{−2} p(ti)[1 − p(ti)] ),
where the weight [KM(t−)]^{−1} is the inverse of the left-continuous version of the Kaplan-Meier estimate of survival, N1(t) is the number of observed failures in group 1 by time t, and p(t) = Y1(t)/[Y1(t) + Y2(t)] is the expected number of deaths on treatment 1 at time t, given that one death occurs at t.
There is an in-house R program for this test (the latest version):
> library(mysurv)
> logrank(time, status, group, strata, rho=-1)
rho: specifies the value of rho in the G-rho test (Harrington and Fleming, 1982). rho=0 gives the log-rank test, rho=1 the Peto-Peto Wilcoxon test, and rho=-1 the test discussed by Gray and Tsiatis (1989). (A standard-R sketch of the G-rho family appears after Section 4.4.)

4.4 Stratified Log-rank Test
An overall test statistic is obtained by summing the log-rank statistics wh and the corresponding variances Vh obtained within each of s independent strata:
(Σ_{h=1}^{s} wh)′ (Σ_{h=1}^{s} Vh)^{−1} (Σ_{h=1}^{s} wh),
where w and V are defined above.
Note: in SAS PROC LIFETEST, use the TEST (not STRATA) statement for a stratified log-rank test.
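The G-rho family mentioned in Section 4.3 is also available in standard R through the rho argument of survdiff; a minimal sketch (not from the slides) follows. The aml data set shipped with the survival package is used only for illustration; the rho = -1 Gray-Tsiatis weighting is shown above with the in-house logrank().

library(survival)
survdiff(Surv(time, status) ~ x, data = aml, rho = 0)   # rho = 0: log-rank test
survdiff(Surv(time, status) ~ x, data = aml, rho = 1)   # rho = 1: Peto-Peto Wilcoxon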
Example 2A: Log-rank test
a. Leukemia data from Table 1.1 in Cox and Oakes (1984)
control: 1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
6-MP: 6* 6 6 6 7 9* 10* 10 11* 13 16 17* 19* 20* 22 23 25* 32* 32* 34* 35*
*: censored
Let us reconstruct the data:

Failure   Control           6-MP
Time      d0   c0   n0      d1   c1   n1
1         2    0    21      0    0    21
2         2    0    19      0    0    21
3         1    0    17      0    0    21
4         2    0    16      0    0    21
5         2    0    14      0    0    21
6         0    0    12      3    1    21
7         0    0    12      1    0    17
8         4    0    12      0    0    16
9         0    0    8       0    1    16
10        0    0    8       1    1    15
11        2    0    8       0    1    13
12        2    0    6       0    0    12
13        0    0    4       1    0    12
15        1    0    4       0    0    11
16        0    0    3       1    0    11
17        1    0    3       0    1    10
19        0    0    2       0    1    9
20        0    0    2       0    1    8
22        1    0    2       1    0    7
23        1    0    1       1    0    6
25        0    0    0       0    1    5

Calculating the log-rank statistic by hand:

Failure   Control        Combined
Time      d0    n0       d0+1   n0+1    e      o − e   var
1         2     21       2      42      1.00   1.00    0.488
2         2     19       2      40      0.95   1.05
3         1     17       1      38      0.45   0.55
4         2     16       2      37      0.86   1.14
5         2     14       2      35
6         0     12       3      33
7         0     12       1      29
8         4     12       4      28
9         0     8        0      24
10        0     8        1      23
11        2     8        2      21
12        2     6        2      18
13        0     4        1      16
15        1     4        1      15
16        0     3        1      14
17        1     3        1      13
19        0     2        0      11
20        0     2        0      10
22        1     2        2      9
23        1     1        2      7
25        0     0        0      5
Sum                                            10.251  6.257

where o = d0, e = d0+1 · n0/n0+1, and var = d0+1 (n0 n1/n0+1²)(n0+1 − d0+1)/(n0+1 − 1).
χ²₁ = (10.251)²/6.257 = 16.793

SAS codes
proc freq;
tables time*grp*status / cmh nopercent norow nocol;
weight cnt;

Table 1 of grp by status, controlling for time=1
Frequency | Dead | Alive | Total
Control   | 2    | 19    | 21
6-MP      | 0    | 21    | 21
Total     | 2    | 40    | 42

Table 2 of grp by status, controlling for time=2
Frequency | Dead | Alive | Total
Control   | 2    | 17    | 19
6-MP      | 0    | 21    | 21
Total     | 2    | 38    | 40
.......

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic  Alternative Hypothesis   DF  Value    Prob
1          Nonzero Correlation      1   16.7929  <.0001
2          Row Mean Scores Differ   1   16.7929  <.0001
3          General Association      1   16.7929  <.0001   ** Log-rank test
--------------------------------------------------------------------
proc lifetest;
time time*dth(0);
strata grp;
Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   16.7929     1   <.0001
Wilcoxon   13.4579     1   0.0002
-2Log(LR)  16.4852     1   <.0001

R codes
> library(survival)
> attach(leukemia_data)
> logrank <- survdiff(Surv(time, dth) ~ grp)
> print(logrank)
Call: survdiff(formula = Surv(time, dth) ~ grp)
          N   Observed  Expected  (O-E)^2/E  (O-E)^2/V
grp=6-MP  21  9         19.3      5.46       16.8
grp=cnt   21  21        10.7      9.77       16.8
Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05

> cox1 <- coxph(Surv(time, dth)~factor(grp), method="exact")
> print(summary(cox1))
Call: coxph(formula = Surv(time, dth) ~ factor(grp), method = "exact")
n= 42
               coef     exp(coef)  se(coef)  z       Pr(>|z|)
factor(grp2)1  -1.6282  0.1963     0.4331    -3.759  0.000170 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
               exp(coef)  exp(-coef)  lower .95  upper .95
factor(grp2)1  0.1963     5.095       0.08398    0.4587
Rsquare= 0.321 (max possible= 0.98 )
Likelihood ratio test = 16.25 on 1 df,  p=5.544e-05
Wald test             = 14.13 on 1 df,  p=0.0001704
Score (logrank) test  = 16.79 on 1 df,  p=4.169e-05
Note: the score test from coxph is the log-rank test.

more examples
b.
proc lifetest data=one timelist=(12, 24);
time time*status(0);
strata group;
title "Overall Survival";
** Also, use: strata group treatment;  ** for a comparison of 4 subsets;
** or: strata age (30, 40, 50);  ** for a comparison of age intervals (0,30), [30,40), [40,50), [50+);

The LIFETEST Procedure
Stratum 1: group = Mini
Product-Limit Survival Estimates
Timelist  time     Survival  Failure  Std Error  Number Failed  Number Left
12.0000   9.2649   0.4694    0.5306   0.0713     26             21
24.0000   16.5257  0.2701    0.7299   0.0707     33             4
*** 95% CI for 2-year survival: (0.27-1.96*0.0707, 0.27+1.96*0.0707) = (0.13, 0.409).
Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  95% CI [Lower  Upper)
75       24.7721         14.6201  .
50       9.1335          6.0123   15.3758
25       4.6653          3.9754   6.1437

Stratum 2: group = STD
Product-Limit Survival Estimates
Timelist  time     Survival  Failure  Std Error  Number Failed  Number Left
12.0000   11.6961  0.4771    0.5229   0.0440     68             56
24.0000   22.8994  0.4036    0.5964   0.0453     75             31

Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  95% CI [Lower  Upper)
75       .               37.6181  .
50       9.9220          5.8152   22.8994
25       2.6283          1.5113   4.4025

Summary of the Number of Censored and Uncensored Values
Stratum  group  Total  Failed  Censored  Percent Censored
1        Mini   49     34      15        30.61
2        STD    132    80      52        39.39
------------------------------------------------------------
Total           181    114     67        37.02

Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   0.4374      1   0.5084
Wilcoxon   0.2844      1   0.5939
-2Log(LR)  4.6428      1   0.0312

II. R code
> library(survival)
> logrank <- survdiff(Surv(time, status) ~ group, data=aml.data)
> print(logrank)
      N    Observed  Expected  (O-E)^2/E
Mini  49   34        30.9      0.3100
STD   132  80        83.1      0.1153
Chisq= 0.4 on 1 degrees of freedom, p= 0.5084

fit1 <- survfit(Surv(time, status) ~ group, conf.type=c("none"), data=aml.data)
plot(fit1, xlab=c("Months"), ylab=c("Probability"), mark.time=T, xlim=c(0,36), xaxt="n", col=c(3,6), lwd=2)
title("Overall Survival: AML/MDS, STD vs. NST", cex = 1.2)
axis(1, at = c(0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72))
axis(2, at = c(0, 0.2, 0.4, 0.6, 0.8, 1), lwd = 2)
leg.txt <- c("NST", "STD")   *** Note: NST=mini
legend(60, 0.9, leg.txt, lty = c(1:2), lwd = 2, col = c(3, 6))

[Figure: "Overall Survival: AML/MDS, STD vs. NST" — KM curves of probability vs. months (0-72) for the NST and STD groups.]

b. Veterans Administration (VA) Lung Cancer Data
This was a clinical trial with 137 male patients with advanced inoperable lung cancer. The endpoint was time to death. There were 6 covariates measured at the time of randomization: cell type (squamous cell, large cell, small cell, and adenocarcinoma), Karnofsky performance status, time in months from diagnosis to the start of therapy, age in years, prior therapy (yes/no), and treatment (experimental vs. standard therapy).
proc lifetest data=one notable;
time SurvTime*Censor(1);
strata Cell / test=logrank adjust=sidak;
run;
Test of Equality over Strata
Test      Chi-Square  DF  Pr > Chi-Square
Log-Rank  25.4037     3   <.0001
** other test= options: Fleming, none, LR, MODPETO, PETO, Wilcoxon, Tarone;

Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison           p-Values
Cell    Cell       Chi-Square  Raw     Sidak
adeno   large      7.8476      0.0051  0.0301
adeno   small      0.4843      0.4865  0.9817
adeno   squamous   15.0560     0.0001  0.0006
large   small      8.9284      0.0028  0.0167
large   squamous   0.8739      0.3499  0.9245
small   squamous   14.8237     0.0001  0.0007
-------------------------------------------------------------------------------
proc lifetest data=one notable;
time SurvTime*Censor(1);
strata Cell / test=logrank adjust=sidak diff=control('adeno');
run;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison          p-Values
Cell      Cell    Chi-Square  Raw     Sidak
large     adeno   7.8476      0.0051  0.0152
small     adeno   0.4843      0.4865  0.8646
squamous  adeno   15.0560     0.0001  0.0003

c. CLL data
[Figure: KM curves of probability vs. years from diagnosis (0-25) for Groups 1, 2, and 3.]

1. multiple comparisons:
proc lifetest data=one plot=s timelist=(10,20);
time os_t_dx*os(0);
*strata grp / test=logrank adjust=sidak diff=control("2");
strata grp / test=logrank adjust=sidak;
run;

Group  Total  Failed  Censored  Percent Censored
1      64     17      47        73.44
2      99     5       94        94.95
3      15     2       13        86.67
-------------------------------------------------------------------------------
Total  178    24      154       86.52

Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
1      2      21.8311     <.0001  <.0001
3      2      6.4525      0.0111  0.0220
*******************************************************************;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
1      2      21.8311     <.0001  <.0001
1      3      15.4962     <.0001  0.0002
2      3      6.4525      0.0111  0.0329

2. two group comparison I
proc lifetest data=one plot=s timelist=(10,20);
where grp in ("2", "3");
time os_t_dx*os(0);
strata grp / test=logrank adjust=sidak diff=control("2");
*strata grp;   * same statement;
run;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
3      2      0.3624      0.5472  0.5472

3. two group comparison II
proc lifetest data=one plot=s timelist=(10,20);
where grp in ("1", "3");
time os_t_dx*os(0);
strata grp / test=logrank adjust=sidak diff=control("3");
*strata grp;   * same statement;
run;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
3      1      4.2057      0.0403  0.0403

Note: options for adjust= are bonferroni, dunnett, scheffe, sidak, SMM, GTE, Tukey.

Example 3: Stratified Log-Rank test.
a.
proc lifetest data=one timelist=(12, 24);
time time*status(0);
strata age50;   *** 1 if age>=50, 0 else;
test group2;
Note: the log-rank and Wilcoxon statistics produced by the TEST statement are first calculated within each stratum and then averaged across strata. These are stratified statistics that control for age50, i.e., the rank test for group was calculated for age<50 and age≥50 separately and then averaged.
Rank Tests for the Association of os_t with Covariates Pooled over Strata
Univariate Chi-Squares for the Wilcoxon Test
Variable  Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square
group2    -5.3943         3.0784              3.0706      0.0797
Forward Stepwise Sequence of Chi-Squares for the Wilcoxon Test
Variable  DF  Chi-Square  Pr > Chi-Square  Chi-Square Increment  Pr > Increment
group2    1   3.0706      0.0797           3.0706                0.0797

Univariate Chi-Squares for the Log-Rank Test
Variable  Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square
group     -4.2870         4.2820              1.0024      0.3167
Covariance Matrix for the Log-Rank Statistics
Variable  group2
group2    18.3357
Forward Stepwise Sequence of Chi-Squares for the Log-Rank Test
Variable  DF  Chi-Square  Pr > Chi-Square  Chi-Square Increment  Pr > Increment
group     1   1.0024      0.3167           1.0024                0.3167

> library(survival)
> fit1 <- survdiff(Surv(time, status) ~ group+strata(age50), data=aml.data)   # stratified log-rank test
               N    Observed  Expected  (O-E)^2/E  (O-E)^2/V
factor(grp)=0  132  80        75.7      0.243      1.00
factor(grp)=1  49   34        38.3      0.480      1.00
Chisq= 1 on 1 degrees of freedom, p= 0.316

b.
proc lifetest data=one timelist=(12, 24);
time time*status(0);
strata group age50;

Legend for Strata
Stratum  group  age50
1        Mini   <50
2        Mini   >=50
3        STD    <50
4        STD    >=50

Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   14.6157     3   0.0022
Wilcoxon   17.5549     3   0.0005
-2Log(LR)  17.4462     3   0.0006

[Figure: "Progression-free Survival: AML/MDS, STD vs. NST" — KM curves of probability vs. months (0-72) for the four strata NST age<50, NST age>=50, STD age<50, STD age>=50.]

d. Veterans Administration (VA) Lung Cancer Data
title 'VA Lung Cancer Data';
symbol1 c=blue; symbol2 c=orange; symbol3 c=green; symbol4 c=red; symbol5 c=cyan; symbol6 c=black;
proc lifetest plots=(s,ls,lls) outtest=Test maxtime=600;
time SurvTime*Censor(1);
id Therapy;   *** to identify the type of therapy for each observation in the PL estimates;
strata Cell;
test Age Prior DiagTime Kps Treatment;   ** testing multiple variables stratified by Cell;
run;

** Output of the TEST statement
Rank Tests for the Association of SurvTime with Covariates Pooled over Strata
Univariate Chi-Squares for the Wilcoxon Test
Variable   Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square  Label
Age        14.4158         66.7598             0.0466      0.8290           age in years
Prior      -26.3997        28.9150             0.8336      0.3612           prior treatment?
DiagTime   -82.5069        72.0117             1.3127      0.2519           months till randomization
Kps        856.0           118.8               51.9159     <.0001           karnofsky index
Treatment  -3.1952         3.1910              1.0027      0.3167           treatment indicator

Univariate Chi-Squares for the Log-Rank Test
Variable   Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square  Label
Age        -40.7383        105.7               0.1485      0.7000           age in years
Prior      -19.9435        46.9836             0.1802      0.6712           prior treatment?
DiagTime   -115.9          97.8708             1.4013      0.2365           months till randomization
Kps        1123.1          170.3               43.4747     <.0001           karnofsky index
Treatment  -4.2076         5.0407              0.6967      0.4039           treatment indicator
Plot options:
• s: estimated survivor function against time
• h: estimated hazard function against time
• ls: estimated −log S(t) function against time   ** this is the cumulative hazard **
• lls: estimated log(−log S(t)) function against log time
• Note1: if exponential, −log S(t) = λt and log(−log S(t)) = log(λ) + log(t). Thus, log(−log S(t)) is a linear function of log(t).
• Note2: the ls and lls plots provide an empirical check of the appropriateness of the exponential or Weibull model. If the exponential model is appropriate, the ls curve should be approximately linear through the origin. If the Weibull model is appropriate, the lls curve should be approximately linear, since the Weibull survival function is S(t) = exp(−a·t^r).

4.5 Mantel-Byar Test (JASA, 1974)
• What if group membership is determined during the study follow-up (i.e., is time-dependent)?
• In the Stanford Heart Transplant data, the group membership for 'transplant' vs. 'no transplant' is determined during the study follow-up, and patients receiving a heart transplant must have at least survived from the time of diagnosis to the time of transplant (thus 'time-to-transplant bias'), whereas no such requirement is necessary for the control subjects.
• In a comparison of survival time between responders and non-responders, responders must live long enough to achieve a response.
• Thus, a 'naive' analysis from the start of the treatment (or transplant) ignores this 'lead-time' bias. The bias comes from the increasing probability of observing a response or undergoing transplant with longer follow-up or wait time, i.e., the response/transplant group membership is dependent on the length of follow-up.
• The Mantel-Byar test is a simple modification of the log-rank test to test for a group difference when group membership changes over time (i.e., a time-dependent variable) - a transient-state analysis. See Example 3B.
• KM curves do not work here, since the number of subjects at risk is not necessarily decreasing.
• Simon and Makuch (Stat. Med. 1984) proposed a graphical presentation using a multi-state survival model: the cumulative hazard is calculated within each state and then transformed into a KM-type survival estimator. A minimal time-dependent Cox sketch follows.
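A minimal R sketch (not from the slides) of the time-dependent analysis behind the Mantel-Byar test, using the counting-process form of coxph on the Stanford heart data shipped with the survival package (already in (start, stop] format with the time-dependent covariate transplant). As noted in Example 3B below, the score test from such a fit is the Mantel-Byar test.

library(survival)
fit <- coxph(Surv(start, stop, event) ~ transplant, data = heart)
summary(fit)   # the "Score (logrank) test" line is the Mantel-Byar chi-square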
4.6 Landmark Analysis
• In the initial paper by Anderson et al (JCO, 1983), survival time between "responders" and "non-responders" was compared from the start of the study, although response status was not determined at the start of the study. To correct this 'time-to-response' bias (also called "lead-time bias" or "guarantee-time bias"), the landmark analysis was introduced.
• The goal is to estimate, in an unbiased way, the time-to-event probabilities in each group conditional on the group membership of patients at a specific time point (the landmark time).
• Landmark analysis:
– Select a fixed time point.
– Include only patients alive at the landmark time in the analysis.
– Conduct a usual survival analysis (see the sketch after this section).
• Advantages:
– Simple execution and use of standard survival analysis.
– Unbiased estimation.
– Correct conditional statistical tests.
• Limitations:
– Arbitrary selection of the landmark time (select the landmark a priori).
– Exclusion of early events that occurred prior to the landmark time, and possibly data-driven results.
– Lack of generalizability due to the conditional estimates.
– Misclassification at longer follow-up times for an early landmark time; exclusion of a high proportion of events for a late landmark time.
– Lack of randomization property: the group membership can be confounded with patient characteristics, e.g., responders are good-prognosis patients.
• An alternative method is to use Cox models with a time-dependent covariate.
• See Dafni (Circ Cardiovasc Qual Outcomes 2011) for a review.
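A minimal R sketch (not from the slides) of the landmark procedure above, assuming a hypothetical data frame dat with follow-up time os.t (months), event indicator os, and a response indicator resp whose value is known by the landmark time.

library(survival)
landmark <- 6                                    # landmark time, chosen a priori
lm.dat <- subset(dat, os.t >= landmark)          # keep only patients still at risk
lm.dat$os.t <- lm.dat$os.t - landmark            # restart the clock at the landmark
survdiff(Surv(os.t, os) ~ resp, data = lm.dat)   # conditional log-rank test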
Example 3B: Mantel-Byar test. Stanford Heart Transplant data taken from Mantel and Byar (1974).

Failure   Transplant   No Transplant   Loss to
time      n1    d1     n0    d0        Transplant
0         0     0      68    1         2
1         2     0      65    2         2
2         4     0      61    3         4
5         8     1      54    2         0
7         6     0      52    1         1
8         7     0      50    1         1
11        8     0      48    1         3
15        11    1      44    1         3
17        13    0      40    1         9
27        22    1      28    0         3
34        24    0      25    1         0
35        24    0      24    1         2
36        26    0      21    1         1
38        27    1      19    0         0
39        25    0      19    2         2
43        27    1      15    0         0
44        26    2      15    0         1
49        25    0      14    1         0
50        25    1      13    0         2
57        26    1      11    0         1
60        26    1      10    0         0
65        25    1      10    0         1
67        25    1      9     0         0
68        24    0      9     1         1
71        25    2      7     0         0
76        22    1      7     0         0
77        21    1      7     0         2
84        22    0      5     1         0
99        22    1      4     0         0
101       21    0      4     1         0
148       20    0      3     1         0
152       20    1      2     0         0
187       19    1      2     0         0
218       17    1      2     0         1
284       16    1      1     0         0
339       15    1      1     0         0
674       9     1      1     0         0
732       7     1      1     0         0
851       6     1      1     0         0
1032      3     1      1     0         0
Total           26           23        42

Σo = 26, Σe = 26.575, Σvar = 7.349
χ²₁ = (26 − 26.575)²/7.349 = 0.045

proc freq;
tables time*status*grp / nopercent norow nocol cmh;
weight cnt;

Table 1 of status by grp, controlling for time=0
Frequency | NT  | T  | Total
0         | 67  | 0  | 67
1         | 1   | 0  | 1
Total     | 68  | 0  | 68

Table 2 of status by grp, controlling for time=1
Frequency | NT  | T  | Total
0         | 63  | 2  | 65
1         | 2   | 0  | 2
Total     | 65  | 2  | 67

Table 3 of status by grp, controlling for time=2
Frequency | NT  | T  | Total
0         | 58  | 4  | 62
1         | 3   | 0  | 3
Total     | 61  | 4  | 65

Table 5 of status by grp, controlling for time=5
Frequency | NT  | T  | Total
0         | 52  | 7  | 59
1         | 2   | 1  | 3
Total     | 54  | 8  | 62
....

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic  Alternative Hypothesis   DF  Value   Prob
1          Nonzero Correlation      1   0.0450  0.8320
2          Row Mean Scores Differ   1   0.0450  0.8320
3          General Association      1   0.0450  0.8320   ** MB test

NOTE: the MB test can also be obtained by fitting a Cox model with a time-dependent covariate; the MB test is the score test (see the sketch in Section 4.5).

5. Survival Models
5.1 Proportional Hazards Model - a semiparametric model
• Let λ(t|X) represent the hazard function at time t for an individual with basic covariates X. The relative risk model by Cox (1972) is
λ(t|X) = λ0(t) exp(X′β)   (1)
where λ0(·) is an arbitrary, unspecified baseline hazard function that depends on time t (also called an underlying hazard function, or the hazard function for a standard subject), and X′β = β1X1 + β2X2 + ... + βkXk is the regression effect of the covariates X1, X2, ..., Xk, which is independent of time t.
• This is a PH structure with a log-linear model.
• Any parametric function (e.g., Weibull) can be used for λ0(t). In the PH model (1), λ0(t) doesn't need to be specified to estimate β.
• (1) can also be written in terms of the cumulative hazard and survival functions:
Λ(t|X) = Λ0(t) exp(X′β)
S(t|X) = exp[−Λ0(t) exp(X′β)] = exp[−Λ0(t)]^{exp(X′β)} = S0(t)^{exp(X′β)}
where Λ0(t) is the underlying cumulative hazard function, S0(t) the underlying survival function, and S(t|X) the probability of surviving past time t given the values of the predictors X.

5.2 PH Model Assumptions
The PH model can be linearized with respect to Xβ:
log λ(t|X) = log λ0(t) + Xβ
log Λ(t|X) = log Λ0(t) + Xβ
1. Linearity and additivity of the predictors with respect to the log hazard or log cumulative hazard.
2. Proportional hazards, i.e., no time-by-predictor interaction: the predictors have the same effect on the hazard function at all values of t. The relative hazard function exp(Xβ) is constant over time. For example, the hazard ratio between age 30 and age 40 is the same as the hazard ratio between age 70 and age 80.
3. Exponential link function.

5.3 Interpretation of Parameters
The regression coefficient for Xj, βj, is the increase in the log hazard or log cumulative hazard at any fixed point in time if Xj is increased by one unit and all other predictors are held constant, i.e.,
βj = log λ(t|X1, ..., Xj + 1, Xj+1, ..., Xk) − log λ(t|X1, ..., Xj, Xj+1, ..., Xk),
which is the log of the ratio of the hazards at time t. The ratio of hazards for an individual with predictor values X* compared to an individual with predictor values X is
X* : X hazard ratio = [λ0(t) exp(X*β)] / [λ0(t) exp(Xβ)] = exp(X*β)/exp(Xβ) = exp[(X* − X)β].
If there is only one binary predictor X1, the PH model can be written
λ(t|X1 = 0) = λ0(t)
λ(t|X1 = 1) = λ0(t) exp(β1).
Here exp(β1) is the hazard ratio of X1 = 1 vs. X1 = 0. Note that the PH assumption between X1 = 1 and X1 = 0 needs to be examined, but there is no need to check the linearity assumption. If the single predictor X1 is continuous, the model becomes
λ(t|X1) = λ0(t) exp(β1X1).
A small numeric sketch of this interpretation follows.
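A minimal R sketch (not from the slides), assuming the AML/MDS data frame aml.data used in the examples below: exp(β) for age is the hazard ratio for a one-unit (one-year) increase, and a ten-year increase has hazard ratio exp(10β).

library(survival)
fit <- coxph(Surv(os.t, os) ~ age, data = aml.data)
exp(coef(fit))        # hazard ratio per one year of age
exp(10 * coef(fit))   # hazard ratio per ten years of age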
5.4 Estimation of Parameters
Assuming the PH model (1) is correct, Cox (1972, 1975) proposed a partial likelihood approach to estimate β, since the unknown baseline hazard function λ0(t) prevents constructing a full likelihood function. Cox argued that if the PH model holds, information about λ0(t) is not useful for estimating the parameter of primary interest, β.

Let t1 < t2 < ... < tk (k < n) be the distinct ordered uncensored observations of the n subjects in a sample, assuming no tied uncensored observations, and let ni be the set of subjects at risk at time ti. The conditional probability that the i-th subject is the one that fails at time ti, given that only one subject fails among those at risk at ti, is
Prob(subject i fails at ti | only one subject fails among ni at ti)
= Prob(subject i fails at ti | ni) / Prob(only one fails at ti | ni)
= λ0(ti) exp(Xiβ) / Σ_{j∈ni} λ0(ti) exp(Xjβ)
= exp(Xiβ) / Σ_{j∈ni} exp(Xjβ)
= exp(Xiβ) / Σ_{Yj ≥ ti} exp(Xjβ).
Thus, the partial likelihood is
PL(β) = ∏_{i: Yi uncensored} exp(Xiβ) / Σ_{Yj ≥ Yi} exp(Xjβ) = ∏_i exp(Xiβ) / Σ_{j∈ni} exp(Xjβ).

• Note that λ0(t) cancels out of the numerator and denominator; thus the ratio of hazards is constant over time.
• Since the likelihood function does not make direct use of the censored and uncensored survival times and does not require the specification of λ0(t), it is referred to as a partial likelihood function.
• Since it is computationally more convenient to maximize the log-likelihood, and approximations to the variance of the MLE can be obtained from the second derivatives, the log partial likelihood function can be written as
l(β) = log PL(β) = Σ_{i: Yi uncensored} [Xiβ − log( Σ_{Yj ≥ Yi} exp(Xjβ) )].
• Differentiating this function with respect to β gives the p×1 score vector
U(β) = ∂l(β)/∂β,
and the MLE of β can be obtained by solving U(β) = 0. The negative second derivative of this function is the p×p information matrix
I(β) = −∂²l(β)/∂β²,
and the variance of β̂ is the inverse of the expected information matrix.
• β̂ is consistent and asymptotically normally distributed with mean β and variance {E[I(β)]}^{−1}. Because the expected information matrix is a function of β, which is unknown, the observed information is used to calculate the estimate of β and its variance.
• Both SAS and S-plus/R use the Newton-Raphson algorithm to maximize the log partial likelihood equation.
• Furthermore, the estimates depend only on the ranks of the event times, and are therefore robust and scale-invariant.
• Estimators for λ0(t):
– Nelson-Aalen estimator: Λ̂0(t) = Σ_{j: tj < t} dj/nj
– Breslow (1972) estimator: Λ̂0(t) = Σ_{j: tj < t} dj / Σ_{i∈nj} exp(Xiβ̂)
– Kalbfleisch-Prentice (1973) estimator: reduces to KM without covariates; the default in PROC PHREG.
• If we have the MLE of λ0(t), the MLEs for λ(t|X), Λ(t|X), and S(t|X) are (see the sketch after this list)
λ̂(t|X) = λ̂0(t) exp(Xβ̂)
Λ̂(t|X) = Λ̂0(t) exp(Xβ̂)
Ŝ(t|X) = exp[−Λ̂0(t) exp(Xβ̂)] = Ŝ0(t)^{exp(Xβ̂)}.
• Advantages of the Cox model:
– β can be estimated without any distributional assumption for λ0(t); hence the model is semiparametric.
– Easy to incorporate time-dependent covariates.
– Permits a stratified analysis that controls for nuisance variables.
– Can accommodate both discrete and continuous measurements of event times.
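A minimal R sketch (not from the slides) of recovering these quantities from a fitted model: survfit() on a coxph object evaluates Ŝ(t|X) = Ŝ0(t)^{exp(Xβ̂)} at the covariate values given in newdata, and basehaz() returns the estimated cumulative baseline hazard. The lung data set is used only for illustration.

library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
survfit(fit, newdata = data.frame(age = c(50, 70)))   # predicted survival curves
head(basehaz(fit, centered = FALSE))                  # estimated Lambda_0(t)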
5.5 Testing the Global/Local Null Hypothesis: Wald, Score, and Likelihood Ratio Tests
• Likelihood ratio test (LRT): χ_LR = 2[l(β̂) − l(β(0))], where β(0) = 0 is the initial value of β̂. For an individual variable, the model needs to be fit twice, with and without the variable of interest.
• Wald test: χ_W = (β̂ − β(0))′ Î (β̂ − β(0)), where Î = I(β̂) is the estimated information matrix at the solution of I(β). This is a direct use of the MLE β̂. For an individual variable, both SAS and R/S-plus outputs provide the Wald test. Confidence intervals for hazard ratios of individual variables are usually calculated from the Wald statistic (e.g., 95% CI: exp(β̂ ± 1.96 s.e.(β̂))).
• Score test: χ_S = U(β(0))′ I(β(0))^{−1} U(β(0)). For an individual variable, this test also requires fitting two models, with and without the variable of interest, since it is a test at β = 0. When there is a single categorical variable, the score test is identical to the log-rank test.
• Under mild assumptions, each test statistic has a χ² distribution with p d.f. under the null hypothesis, where p is the dimension of β.
• They are asymptotically equivalent but may differ in finite samples.
• The direct use of the MLE β̂ (i.e., the Wald test) has the advantage of simple presentation (i.e., it is convenient), but it is not invariant under reparametrization and does not behave well if the likelihood has an unusual shape.
• In finite samples, the LRT is the most reliable and is recommended, although it requires fitting the model twice.
• Missing values in the data set can be an issue when computing the LR test for a variable.
• For testing the global hypothesis, both SAS and R/S-plus output provide all three test statistics, as in the sketch below.
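A minimal R sketch (not from the slides): summary() of a coxph fit reports the global likelihood ratio, Wald, and score tests, and anova() of two nested fits gives the LRT for an individual variable, fitting the model twice as noted above. The lung data set is used only for illustration.

library(survival)
fit0 <- coxph(Surv(time, status) ~ age,       data = lung)
fit1 <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit1)       # LR, Wald, and score tests of the global null
anova(fit0, fit1)   # likelihood ratio test for sex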
Example 4: Cox model

1. SAS: PROC PHREG.

proc phreg data=one;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=age pt_dnr_gender group donor / risklimits ties=efron;
hazardratio pt_dnr_gender / diff=all; ** check diff=ref;
run;

Note: the 'risklimits' option gives the 95% confidence interval for the hazard ratio.

The PHREG Procedure

Model Information
Data Set             WORK.ONE
Dependent Variable   os_t
Censoring Variable   os
Censoring Value(s)   0
Ties Handling        EFRON

Number of Observations Read   181
Number of Observations Used   181

Class Level Information (reference coding)
Class           Value   Design Variables
pt_dnr_gender   FF      1 0 0
                FM      0 1 0
                MF      0 0 0
                MM      0 0 1
group           Mini    0
                STD     1
donor           MRD     0
                MUD     1

Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
181     114     67         37.02

Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    1060.874             1037.388
AIC         1060.874             1049.388
SBC         1060.874             1065.805

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   23.4855      6    0.0006
Score              22.6099      6    0.0009
Wald               22.2285      6    0.0011

Analysis of Maximum Likelihood Estimates
Parameter          DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   95% HR Confidence Limits
age                1    0.04515    0.01053     18.3964      <.0001       1.046          1.025  1.068
pt_dnr_gender FF   1    0.65204    0.28764     5.1388       0.0234       1.919          1.092  3.373
pt_dnr_gender FM   1    0.32404    0.30935     1.0972       0.2949       1.383          0.754  2.535
pt_dnr_gender MM   1    0.30973    0.27429     1.2752       0.2588       1.363          0.796  2.333
group STD          1    0.53453    0.25882     4.2652       0.0389       1.707          1.028  2.834
donor MUD          1    0.30787    0.19615     2.4635       0.1165       1.361          0.926  1.998

Hazard Ratios for pt_dnr_gender
Description               Point Estimate   95% Wald Confidence Limits
pt_dnr_gender FF vs FM    1.388            0.811  2.377
pt_dnr_gender FF vs MF    1.919            1.092  3.373
pt_dnr_gender FF vs MM    1.408            0.872  2.275
pt_dnr_gender FM vs MF    1.383            0.754  2.535
pt_dnr_gender FM vs MM    1.014            0.598  1.721
pt_dnr_gender MF vs MM    0.734            0.429  1.256

2. R code

coxph1 <- coxph(Surv(os.t, os) ~ age+factor(pt.dnr.gender)+factor(group)
                +factor(donor), method="efron", data=aml.data)
print(summary(coxph1))

Call: coxph(formula = Surv(os.t, os) ~ age + factor(pt.dnr.gender) + factor(group) + factor(donor), method = "efron")
n= 181
                           coef      exp(coef)   se(coef)   z        Pr(>|z|)
age                        0.04515   1.04619     0.01053    4.289    1.79e-05 ***
factor(pt.dnr.gender)FM   -0.32800   0.72036     0.27441   -1.195    0.2320
factor(pt.dnr.gender)MF   -0.65204   0.52098     0.28764   -2.267    0.0234 *
factor(pt.dnr.gender)MM   -0.34231   0.71013     0.24473   -1.399    0.1619
factor(group)STD           0.53453   1.70665     0.25882    2.065    0.0389 *
factor(donor)MUD           0.30787   1.36053     0.19615    1.570    0.1165
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                           exp(coef)   exp(-coef)   lower .95   upper .95
age                        1.0462      0.9559       1.0248      1.0680
factor(pt.dnr.gender)FM    0.7204      1.3882       0.4207      1.2335
factor(pt.dnr.gender)MF    0.5210      1.9195       0.2965      0.9155
factor(pt.dnr.gender)MM    0.7101      1.4082       0.4396      1.1472
factor(group)STD           1.7066      0.5859       1.0276      2.8344
factor(donor)MUD           1.3605      0.7350       0.9263      1.9984

Rsquare= 0.122 (max possible= 0.997)
Likelihood ratio test= 23.49 on 6 df, p=0.0006492
Wald test            = 22.23 on 6 df, p=0.001101
Score (logrank) test = 22.61 on 6 df, p=0.0009383

Note: 'method' is an option for handling ties; the choices are "efron", "breslow", and "exact". If there are no tied death times, all the methods are equivalent. Nearly all Cox regression programs use the Breslow method by default, but not R/S-plus, where the Efron approximation is the default.

3. Example of LRT when there are missing data

a.
proc phreg data=one; *** if there are missing data in age;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=age2 pt_dnr_gender group donor / rl ties=efron;

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   95% HR Confidence Limits
age2        1    0.04814    0.01132     18.0949      <.0001       1.049          1.026  1.073

Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
157     96      61         38.85

Criterion   Without Covariates   With Covariates
-2 LOG L    870.425              846.979
---------------------------------------------------------------------
b.
proc phreg data=one;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=pt_dnr_gender group donor / rl ties=efron;

Total 181   Event 114   Censored 67   Percent Censored 37.02
Criterion   Without Covariates   With Covariates
-2 LOG L    1060.874             1056.794

LRT: 1056.794-846.979=209.895 (??)
This difference is not a valid LRT: the two models were fit to different data sets (181 observations here vs. the 157 complete cases in a), so the two log likelihoods are not comparable. The reduced model must be refit on the complete cases, as in c.

c.
proc phreg data=one;
where age2>0;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=pt_dnr_gender group donor / rl ties=efron;
run;

Total 157   Event 96   Censored 61   Percent Censored 38.85
Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    870.425              866.147

LRT: 866.147-846.979=19.168

5.6 Stratified Cox PH model

• The idea behind the stratified Cox PH model is to allow different baseline hazard functions across levels of the stratification factors.
• The stratified Cox model ranks the failure times separately within strata and formulates a separate log likelihood function for each stratum, but with each log likelihood having a common β vector. The likelihood functions are then multiplied together to form a joint likelihood over strata. Computationally,

l(β) = Σ_{k=1}^{K} l_k(β)

where l_k(β) is the log partial likelihood function in stratum k.
• The stratified Cox model is commonly used to adjust for observations involving some kind of clustering or multi-level grouping. Examples include multicenter clinical trials: because of varying patient populations, supportive care, and referral patterns, the different clinical centers in the trial are likely to have different baseline survival functions.
• The advantage of stratification is that it gives the most general adjustment for a confounding variable.
• The disadvantages are: i) no direct inference (thus no p-value) can be made about the stratification factors, because these are merely "adjusted for" or "controlled for" and not regarded as risk factors; ii) the precision of the estimated coefficients and the power of hypothesis tests may be diminished if there are a large number of strata.
• A stratified Cox model also allows a modeled factor to interact with strata.
• Stratification is useful for checking the PH and linearity assumptions. We will discuss this later. See Example 8.

Example 5. Stratified Cox model:

1.
proc phreg data=one;
strata grp;
model os_t*os(0)=age/ ties=efron;

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.03666    0.00966     14.3909      0.0001       1.037

2.
proc phreg data=one;
strata grp;
model os_t*os(0)=age grp_age/ ties=efron;
grp_age=grp*age;

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.01541     0.01595     0.9333       0.3340       1.016
grp_age    1    -0.03075    0.01963     2.4541       0.1172       0.970

3.
proc phreg data=one;
strata grp;
class age50(ref="0");
model os_t*os(0)=age50 grp_age50/ ties=efron;
grp_age50=grp*age50;
run;

Parameter    DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age50        1    0.83554     0.23618     12.5158      0.0004       2.306
grp_age50    1    -0.89158    0.44480     4.0179       0.0450       0.410

*** Since grp is non-PH, a more appropriate model is
proc phreg data=one;
class grp(ref="0");
model os_t*os(0)=age grp grp_age grp_t/ ties=efron;
grp_age=grp*age;
grp_t=grp*os_t;
run;

Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age         1    0.04703     0.01145     16.8788      <.0001       1.048
grp         1    0.45698     1.09489     0.1742       0.6764       1.579
grp_age     1    -0.03006    0.01984     2.2958       0.1297       0.970
grp_t       1    0.12062     0.03524     11.7164      0.0006       1.128

*** what if we use age50 instead of age as a continuous variable
proc phreg data=one;
class age50(ref="0") grp(ref="0");
model os_t*os(0)=age50 grp grp_age50 grp_t/ ties=efron;
grp_age50=grp*age50;
grp_t=grp*os_t;
run;

Parameter    DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age50        1    0.84784     0.23597     12.9101      0.0003       2.335
grp          1    -0.32889    0.42699     0.5933       0.4411       0.720
grp_age50    1    -0.89777    0.44464     4.0767       0.0435       0.407
grp_t        1    0.11659     0.03496     11.1244      0.0009       1.124

5. R code
aml.cox <- coxph(Surv(os_t, os) ~ age + strata(grp), method="efron", data=aml.dat)
aml.cox <- coxph(Surv(os_t, os) ~ age + strata(grp)+age*grp, data=aml.dat)
aml.cox <- coxph(Surv(os_t, os) ~ age + strata(grp)+age*grp+age*os_t, data=aml.dat)

5.7 Tied Event Times

The partial likelihood equation is valid only if there are no tied event times. To handle ties, several modifications to the likelihood have been proposed in the literature: the Cox, Peto-Breslow, Efron, and exact methods.

• Breslow's approximation: approximates the exact likelihood by using the full risk set in the denominator for every death within a set of ties. This is the default in PROC PHREG. It is the least accurate of these methods, but fast, and is reasonable when the number of ties is small.
• Efron's (1977): the default in COXPH in R/S-plus. This is a closer approximation to the exact method than Breslow's and is computationally efficient when dealing with tied event times.
• Discrete exact (Cox's): assumes that tied failure times are truly discrete and happened at the same time (i.e., there is no underlying ordering of events). If there are many ties, this method is computationally intensive. This is called discrete in SAS, but exact in R and S-plus.
• (Continuous) exact: based on the continuous likelihood; assumes that there is a true but unknown ordering for tied event times (i.e., time is continuous), so that ties are merely the result of imprecise measurement of time. This method computes the exact partial likelihood over all possible orderings and can be very computationally intensive if there are a large number of ties; for example, 5 tied event times give 5!=120 possible partial likelihoods. This is called exact in SAS, and it is not implemented in R/S-plus.
Example. Suppose that the first 2 events out of 5 at risk are tied, and let r_j = exp(X_j β) denote the risk score of subject j. If the time data were more accurate, the first two terms in the likelihood would be either

l_1(β) = [ r_1 / (r_1+r_2+r_3+r_4+r_5) ] × [ r_2 / (r_2+r_3+r_4+r_5) ]

or

l_2(β) = [ r_2 / (r_1+r_2+r_3+r_4+r_5) ] × [ r_1 / (r_1+r_3+r_4+r_5) ]

• Breslow approximation: uses the complete sum r_1+r_2+r_3+r_4+r_5 for the denominator of both terms.
• Efron method:
[ r_1 / (r_1+r_2+r_3+r_4+r_5) ] × [ r_2 / (0.5r_1+0.5r_2+r_3+r_4+r_5) ]
• Discrete exact method:
r_1 r_2 / (r_1r_2 + r_1r_3 + r_1r_4 + r_1r_5 + r_2r_3 + r_2r_4 + r_2r_5 + r_3r_4 + r_3r_5 + r_4r_5)
• (Continuous) exact method: sums over all possible orderings of the partial likelihood function, i.e., l(β) = l_1(β) + l_2(β).

Notes:
• When there are no ties, all methods are the same.
• When there are only a few ties, whichever method is used makes little difference.
• When there are many ties, both approximation methods are biased; however, Efron's is better.

Example 6. Tied Event Times

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=breslow;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04378    0.00750     34.0309      <.0001       1.045
log_bili   1    1.01465    0.07796     169.3939     <.0001       2.758

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=efron;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04378    0.00750     34.0327      <.0001       1.045
log_bili   1    1.01497    0.07796     169.4945     <.0001       2.759

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=exact;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04378    0.00750     34.0329      <.0001       1.045
log_bili   1    1.01497    0.07796     169.4943     <.0001       2.759

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=discrete;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04381    0.00751     34.0440      <.0001       1.045
log_bili   1    1.01524    0.07800     169.3922     <.0001       2.760

R code
> coxph(Surv(fu_days, status2) ~ age + log(bili), method="breslow", data=pbc.dat)
> coxph(Surv(fu_days, status2) ~ age + log(bili), method="efron", data=pbc.dat)
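Note the naming clash across packages: R's method="exact" corresponds to SAS ties=discrete, while SAS ties=exact (the continuous exact method) has no coxph counterpart. A one-line sketch with the same assumed PBC names:

> coxph(Surv(fu_days, status2) ~ age + log(bili), method="exact", data=pbc.dat)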
Using the leukemia data to compare 6-MP vs. control:

proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=breslow;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.50919    0.40956     13.5783      0.0002       0.221          grp 6-MP
---------------------------------------------------------------------------------
proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=efron;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.57213    0.41240     14.5326      0.0001       0.208          grp 6-MP
---------------------------------------------------------------------------------
proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=exact;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.59787    0.42162     14.3630      0.0002       0.202          grp 6-MP
---------------------------------------------------------------------------------
proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=discrete;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.62822    0.43313     14.1316      0.0002       0.196          grp 6-MP

5.8 Predicted Survival

Just as in the linear regression model, once β̂ and Ŝ_0(t) are obtained, the predicted survival probabilities and the predicted median survival time can be calculated as

Ŝ(t|X) = Ŝ_0(t)^exp(Xβ̂)
0.5 = Ŝ_0(T_0.5)^exp(Xβ̂)  =⇒  Ŝ_0(T_0.5) = (0.5)^exp(−Xβ̂)

Note 1: Ŝ_0(t) can be obtained using PROC LIFETEST (the KM estimate) or using the BASELINE statement in PHREG, and β̂ from the PHREG output. Estimators for λ_0(t) and S_0(t) were discussed before.
Note 2: Before calculating the predicted survival probabilities, it is critical to assess the model adequacy first.

Example 7: Predicted Survival under PH

data pred;
input age grp; ** STD if grp=0, Mini if grp=1;
cards;
0 0    ** this line generates the estimate of S0(t);
50 0
50 1
60 0
60 1
;
run;

proc phreg data=one;
model os_t*os(0)=age grp/ ties=efron;
baseline covariates=pred out=outpred survival=s lower=S_lower95 upper=S_upper95;
run;

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.03711     0.00973     14.5400      0.0001       1.038
grp        1    -0.36429    0.24632     2.1872       0.1392       0.695

proc print data=outpred;
Obs   age   grp    os_t      s         S_lower95   S_upper95
2     0     STD    0.3614    0.99785   0.99437     1.00000
3     0     STD    0.4599    0.99566   0.99007     1.00000
.     .     .      .         .         .           .
64    0     STD    6.0123    0.89876   0.81897     0.98633
65    0     STD    6.0780    0.89696   0.81597     0.98600
66    0     STD    6.1437    0.89514   0.81294     0.98566

62    50    STD    5.8480    0.51816   0.42688     0.62896
63    50    STD    5.9466    0.51172   0.42030     0.62302
64    50    STD    6.0123    0.50529   0.41377     0.61706   ** (0.89876)^(exp(0.03711*50))=0.50529
65    50    STD    6.0780    0.49886   0.40726     0.61107   ** (0.89696)^(exp(0.03711*50))=0.49886

83    50    STD    11.6961   0.37663   0.28709     0.49410   * predicted 1-year survival for age=50 in STD
84    50    STD    13.5359   0.36958   0.28039     0.48714

186   50    Mini   11.6961   0.50745   0.39601     0.65025   * predicted 1-year survival for age=50 in Mini
187   50    Mini   13.5359   0.50083   0.38900     0.64480
188   50    Mini   14.4230   0.49390   0.38165     0.63916
189   50    Mini   14.5873   0.48695   0.37430     0.63350

252   60    STD    4.43532   0.51864   0.39218     0.68588
253   60    STD    4.46817   0.50306   0.37612     0.67283
254   60    STD    4.50103   0.48760   0.36032     0.65983
255   60    STD    4.53388   0.47995   0.35255     0.65338

289   60    STD    11.6961   0.24286   0.13757     0.42873   * predicted 1-year survival for age=60 in STD
290   60    STD    13.5359   0.23630   0.13243     0.42161

372   60    Mini   5.9466    0.50937   0.39969     0.64916
373   60    Mini   6.0123    0.50294   0.39308     0.64349
374   60    Mini   6.0780    0.49649   0.38648     0.63780   ** (0.89696)^(exp(0.03711*60-0.36429))=0.49649
375   60    Mini   6.1437    0.49004   0.37991     0.63210

392   60    Mini   11.6961   0.37412   0.26606     0.52606   * predicted 1-year survival for age=60 in Mini
393   60    Mini   13.5359   0.36707   0.25932     0.51959

Let us calculate the median predicted survival time for an individual with age=50 and grp=0 (STD) using the formula:

Ŝ_0(T_0.5) = (0.5)^exp(−(0.03711*50 − 0.36429*0)) = (0.5)^0.1563 = 0.89696

The time at which the baseline survival curve reaches this value is 6.078, which is the predicted median survival time.

[Figure: predicted survival for AML/MDS — predicted survival curves for the four subgroups age=50 Mini, age=50 STD, age=60 Mini, and age=60 STD.]
This plot shows the predicted survival probabilities of the 4 subgroups, assuming that the PH assumption holds.
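In R, the same predicted survival probabilities and medians come from survfit applied to a fitted coxph object; a minimal sketch, assuming the AML/MDS names used above:

library(survival)
fit <- coxph(Surv(os.t, os) ~ age + grp, method="efron", data=aml.data)
pred <- data.frame(age=c(50, 50, 60, 60), grp=c(0, 1, 0, 1))
sf <- survfit(fit, newdata=pred)   # analogue of the BASELINE statement
summary(sf, times=12)              # predicted 1-year survival for each row of pred
quantile(sf, probs=0.5)            # predicted median survival times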
5.9 Assessment of PH Model Assumptions

5.9.1 Regression Assumptions

Regression assumptions can be checked by
• plotting martingale or deviance residuals against a covariate or the covariates (i.e., Xβ)
• plotting the log relative hazard against a covariate
• the supremum test in PHREG
• categorizing a continuous variable into intervals (e.g., by quantiles), plotting β̂ by the midpoints of the intervals, and then checking the shape.

5.9.1.1 Martingale Residuals

• Recall the predicted survival in the previous section. At each time point, the predicted survival for fixed values of the covariates is Ŝ(t|X) = Ŝ_0(t)^exp(Xβ̂), thus Λ̂(t|X) = −log[Ŝ(t|X)]. The martingale residual for the i-th individual is

r_i = δ_i − Λ̂_i = (observed − expected)

where δ_i is the event indicator. The r_i have mean 0 and range between −∞ and 1.
• This residual is derived from the difference between the counting process and the integrated intensity function, i.e., M_i(t) = N_i(t) − ∫_0^t Y_i(s) exp(X_iβ) dΛ_0(s), i = 1, ···, n.
• Martingale residuals can be used to assess either very short or very long survival (e.g., a large negative value of the residual) and to evaluate an appropriate functional form for a covariate.

5.9.1.2 Deviance Residuals

D̂_i = sign(r̂_i) √( −2[ r̂_i + δ_i log(δ_i − r̂_i) ] )

• The martingale residual is highly skewed. The deviance residual is a further transformation of the martingale residual, and is much like a residual from ordinary least squares regression.
• The deviance residual is asymptotically normally distributed with mean 0 and standard deviation 1.
• The deviance residual is negative for observations that have longer survival times than expected (i.e., censored observations) and positive for observations with survival times smaller than expected (i.e., uncensored). Extreme values suggest that the observation may be an outlier and requires special attention.

5.9.1.3 Functional Form of X

λ(t|x) = λ_0(t) exp(f(x)β)    (2)

If a covariate appears to be non-linear, there are many ways to find an appropriate functional form. We discuss here a few ways to assess the functional form of a covariate.
• Plot the martingale residuals from a null model against each covariate separately and superimpose a smoothing curve. This was proposed by Therneau et al. (1990, Biometrika). If (2) is correct for some smooth function f, then the smoother for the j-th covariate will display the form of f, under certain conditions. That is, E(r_i | X_ij = x_j) ≈ c f(x_j), where c is roughly independent of x_j and depends on the amount of censoring. See Therneau et al. (1990, Biometrika) for the detailed derivation.
• Use smoothing splines or regression splines in a model.
– An alternative to residual manipulations is to model the functional form directly in the Cox regression model.
– A naive way is to include polynomial terms (x, x², x³, ···) in a model. However, with this approach the fits are not local, and the model can be statistically and computationally unstable.
– A better approach is to include splines in the model directly to fit curves locally. These are "regression splines", "natural splines", "smoothing splines", or "restricted cubic splines (rcs)". An example of an rcs with k=3 knots at a, b, c is

f(X) = β_0 + β_1X + β_2(X−a)³₊ + β_3(X−b)³₊ + β_4(X−c)³₊

A test of linearity in X can be obtained by testing H_0: β_2 = β_3 = β_4 = 0.
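As a quick sketch of such a linearity test in R (using a natural spline from the splines package rather than an rcs, and the PBC names from Example 8 below):

library(survival); library(splines)
f1 <- coxph(Surv(fu.days, status2) ~ age + bili, data=pbc.dat)
f2 <- coxph(Surv(fu.days, status2) ~ age + ns(bili, df=3), data=pbc.dat)
anova(f1, f2)   # LRT for the nonlinear spline terms; a small p-value suggests nonlinearity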
Example 8: Testing regression assumptions in the Cox model

1. AML data: SAS code

proc phreg data=one;
model os_t*os(0)=age grp /ties=efron; *** age>=50 only;
output out=d.aml_res survival=s logsurv=ls loglogs=lls resmart=mart resdev=dev;
run;

note: covariates are included in the model. If the linearity assumption is met, the smoothing curve on the residuals is a straight line around zero.

[Figure: "checking linearity" — log-log survival, martingale residuals, and deviance residuals plotted against age (50-75).]

2. PBC (Primary Biliary Cirrhosis) data: R code

Data from the Mayo Clinic trial in primary biliary cirrhosis of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo-controlled trial of the drug D-penicillamine. The first 312 cases in the data set participated in the randomized trial and contain largely complete data. The additional 112 cases did not participate in the clinical trial, but consented to have basic measurements recorded and to be followed for survival. Six of those cases were lost to follow-up shortly after diagnosis, so the data here are on an additional 106 cases as well as the 312 randomized participants. Missing data items are denoted by NA.

a. Testing Linearity

mart <- coxph(Surv(fu.days, status2) ~ age+bili, data=pbc.dat)
mr <- resid(mart)
plot(pbc.dat$age, mr, xlab="Age", ylab="Martingale Residual")
lines(lowess(pbc.dat$age, mr, iter=0), lwd=2, col=6)
plot(pbc.dat$bili, mr, xlab="Bilirubin", ylab="Martingale Residual")
lines(lowess(pbc.dat$bili, mr, iter=0), lwd=2, col=6)

mart2 <- coxph(Surv(fu.days, status2) ~ age+log.bili, data=pbc.dat)
mr2 <- resid(mart2)
plot(pbc.dat$log.bili, mr2, xlab="log(Bilirubin)", ylab="Martingale Residual")
lines(lowess(pbc.dat$log.bili, mr2, iter=0), lwd=2, col=6)

[Figure: martingale residuals plotted against Age, Bilirubin, and log(Bilirubin).]

b. Functional Forms

To assess a functional form for each covariate, fit a model without covariates first:

> coxph1 <- coxph(Surv(fu.days, status2) ~ 1, data=pbc.dat) # to create null residual plots
> mart <- resid(coxph1)
> plot(pbc.dat$age, mart, xlab="Age", ylab="Martingale Residual")
> lines(lowess(pbc.dat$age, mart, iter=0), lwd=2, col=2)       # Age looks pretty linear
> plot(pbc.dat$bili, mart, xlab="Bilirubin", ylab="Martingale Residual")
> lines(lowess(pbc.dat$bili, mart, iter=0), lwd=2, col=2)      # Bilirubin doesn't look linear
> plot(pbc.dat$log.bili, mart, xlab="log(Bilirubin)", ylab="Martingale Residual")
> lines(lowess(pbc.dat$log.bili, mart, iter=0), lwd=2, col=2)  # log(Bilirubin) looks linear

Note: 'residuals' (resid) in R calculates martingale, deviance, score, or Schoenfeld residuals for a Cox proportional hazards model.
coxph1 <- coxph(Surv(fu.days, status2) ~ age+log.bili, data=pbc.dat)
print(resid(coxph1, type="martingale"))
print(resid(coxph1, type="deviance"))
print(resid(coxph1, type="score"))
print(resid(coxph1, type="schoenfeld"))

[Figure: null-model martingale residuals plotted against Age, Bilirubin, and log(Bilirubin).]

> coxph1 <- coxph(Surv(fu.days, status2) ~ age+edema+bili+protime+albumin, data=pbc.dat)
> print(coxph1)
           coef      exp(coef)   se(coef)   z       p
age        0.0383    1.039       0.00806    4.76    2.0e-06
edema      0.9368    2.552       0.28162    3.33    8.8e-04
bili       0.1159    1.123       0.01301    8.91    0.0e+00
protime    0.2008    1.222       0.05659    3.55    3.9e-04
albumin   -0.9710    0.379       0.20538   -4.73    2.3e-06
Likelihood ratio test=183 on 5 df, p=0
n=416 (2 observations deleted due to missingness)

LRT for bili: chi-square 58.9 with 1 df
AIC with 5 covariates: 1561.298 (thus 1733.915-1561.298=172.617)
AIC for null model   : 1733.915

> coxph2 <- coxph(Surv(fu.days, status2) ~ age+edema+log.bili+protime+albumin, data=pbc.dat)
> print(coxph2)
           coef      exp(coef)   se(coef)   z        p
age        0.0402    1.04        0.00767    5.24     1.6e-07
edema      0.9382    2.56        0.27082    3.46     5.3e-04
log.bili   0.8686    2.38        0.08289    10.48    0.0e+00
protime    0.1746    1.19        0.06109    2.86     4.3e-03
albumin   -0.7542    0.47        0.20951   -3.60     3.2e-04
Likelihood ratio test=229 on 5 df, p=0
n=416 (2 observations deleted due to missingness)

LRT for log.bili: chi-square 104.9 with 1 df
AIC with 5 covariates: 1515.333 (1733.915-1515.333=218.582. Compare this number to 172.617)

3. SAS code: PBC data

proc phreg data=pbc;
model fu_days*status2(0)= / ties=efron;
output out=res_m1 resmart=mart resdev=dev;
title "PBC data";

data pbc2;
merge pbc res_m1;
run;

proc print data=pbc2;
var id fu_days status2 age bili mart dev;

Obs   id   fu_days   status2   age       bili   mart       dev
1     1    400       1         58.7668   14.5   0.92047    1.79506
2     2    4500      0         56.4478   1.1    -1.03072   -1.43577
3     3    1012      1         70.0745   1.4    0.79455    1.25540
4     4    1925      1         54.7421   1.8    0.63248    0.85848
5     5    1504      0         38.1065   3.4    -0.30159   -0.77665
6     6    2503      1         66.2605   0.8    0.51937    0.65312
7     7    1832      0         55.5361   1.0    -0.35705   -0.84504
8     8    2466      1         53.0583   0.3    0.52750    0.66666
9     9    2400      1         42.5090   3.2    0.54307    0.69304
10    10   51        1         70.5618   12.6   0.99040    2.70399
11    11   3762      1         53.7154   1.4    0.15763    0.16677
12    12   304       1         59.1392   3.6    0.93842    1.92299

ods graphics on;
ods rtf file="assess.rtf";
proc phreg data=pbc;
model fu_days*status2(0)= age log_bili/ ties=efron;
assess var=(log_bili) / npaths=50;
run;
ods rtf close;
ods graphics off;

note: The RESAMPLE option of ASSESS in PROC PHREG gives a test of the functional form and a test of PH. Tests are based on a Kolmogorov-type supremum test using 1000 simulated patterns. A significant p-value indicates poor fit.
proc phreg data=pbc;
model fu_days*status2(0) = age bili/tie=efron;
assess var=(bili) / resample seed=;

Supremum Test for Functional Form
Variable   Maximum Absolute Value   Replications   Seed        Pr > MaxAbsVal
bili       36.8864                  1000           901907139   <.0001

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=efron;
assess var=(log_bili) / resample;

Supremum Test for Functional Form
Variable   Maximum Absolute Value   Replications   Seed        Pr > MaxAbsVal
log_bili   9.3530                   1000           906300144   0.1500

4. Use Frank Harrell's programs

rcspline.plot(bili, fu.days, event=status2, nk=3, xlab="Bilirubin")
rcspline.plot(log.bili, fu.days, event=status2, nk=3, xlab="log(Bilirubin)")

[Figure: estimated spline transformations — log relative hazard vs. Bilirubin and vs. log(Bilirubin).]
Bilirubin: Cox Regression Model, n=418, events=161 — Model L.R. 152.98 (2 df), AIC=148.98; Association Wald 151.93 (2 df), p=0; Linearity Wald 53.76 (1 df), p=0.
log(Bilirubin): Cox Regression Model, n=418, events=161 — Model L.R. 155.42 (2 df), AIC=151.42; Association Wald 144.88 (2 df), p=0.0000; Linearity Wald 2.23 (1 df), p=0.1355.

5. R code for spline functions: PBC data

> library(survival)
> library(stats)   ** this is for termplot
> coxph1 <- coxph(Surv(fu.days, status==2) ~ age+pspline(bili, df=4)+pspline(protime, df=4)+pspline(albumin, df=4), data=pbc.dat)
> print(coxph1)
> termplot(coxph1, rug=T, terms=1, ylab="Spline fit")  ** rug=T shows the actual data points on the X axis
> termplot(coxph1, rug=T, terms=2, ylab="Spline fit")
> termplot(coxph1, rug=T, terms=3, ylab="Spline fit")
> termplot(coxph1, rug=T, terms=4, ylab="Spline fit")

                                    coef      se(coef)   se2       Chisq   DF     p
age                                 0.0358    0.00787    0.00782   20.73   1.00   5.3e-06
pspline(bili, df = 4), linear       0.1167    0.01573    0.01541   55.03   1.00   1.2e-13
pspline(bili, df = 4), nonlin                                      42.10   3.04   4.1e-09
pspline(protime, df = 4), linear    0.3725    0.08012    0.07962   21.61   1.00   3.3e-06
pspline(protime, df = 4), nonlin                                   6.61    3.00   8.6e-02
pspline(albumin, df = 4), linear   -0.9757    0.18930    0.18805   26.57   1.00   2.5e-07
pspline(albumin, df = 4), nonlin                                   1.72    3.06   6.4e-01

Iterations: 4 outer, 12 Newton-Raphson
Theta= 0.741  ** tuning parameter for bili
Theta= 0.559  ** tuning parameter for protime
Theta= 0.77   ** tuning parameter for albumin
Degrees of freedom for terms= 1.0 4.0 4.0 4.1
Likelihood ratio test=239 on 13.1 df, p=0
n=416 (2 observations deleted due to missingness)

If you are not sure about the d.f. in pspline, just fit pspline(x). The pspline function uses Akaike's information criterion, AIC = LR test − 2*df, to select a "best" degrees of freedom for the term.
coxph1 <- coxph(Surv(fu.days, status==2) ~ age+pspline(bili, df=4)+pspline(protime)
                +pspline(albumin), data=pbc.dat)

                             coef      se(coef)   se2       Chisq   DF     p
age                          0.0358    0.00787    0.00782   20.73   1.00   5.3e-06
pspline(bili, df = 4), lin   0.1167    0.01573    0.01541   55.03   1.00   1.2e-13
pspline(bili, df = 4), non                                  42.10   3.04   4.1e-09
pspline(protime), linear     0.3725    0.08012    0.07962   21.61   1.00   3.3e-06
pspline(protime), nonlin                                    6.61    3.00   8.6e-02
pspline(albumin), linear    -0.9757    0.18930    0.18805   26.57   1.00   2.5e-07
pspline(albumin), nonlin                                    1.72    3.06   6.4e-01
Iterations: 4 outer, 12 Newton-Raphson
Degrees of freedom for terms= 1.0 4.0 4.0 4.1
Likelihood ratio test=239 on 13.1 df, p=0
n=416 (2 observations deleted due to missingness)

The AIC criterion has chosen about 4 df for protime and albumin. The albumin effect is linear (its nonlinear term is not significant), thus fit a linear term only.

[Figure: penalized spline fits for age, bili, protime, and albumin.]

5.9.2 PH Assumption

The primary assumption of the Cox model is proportional hazards, thus it is important to assess whether this assumption holds for each variable. The relative hazard for any two subjects i and j is

λ_0(t) exp(X_i β) / λ_0(t) exp(X_j β) = exp((X_i − X_j)β)

which is independent of time for time-fixed covariates. This hazard ratio is exp(β) if X is a binary variable coded 1 or 0, and it is the hazard ratio for a one-unit change if X is continuous. For example, edema in the PBC data is coded as 1 or 0 and β̂(edema)=0.94 in Example 8-2. This means that the log-log survival curves for edema=0 and 1 should be parallel, spaced 0.94 units apart, since

log Λ(t|X) = log{−log S(t|X)} = log Λ_0(t) + Xβ = log Λ_0(t) + 0.94

There are many graphical and analytical methods of verifying the PH assumption, and we present here a few of those.

1. For categorical variables
(a) Plot Kaplan-Meier curves or log(−log S(t)). If the PH assumption holds, the KM curves should not cross, and log(−log S(t)) against log(time) should be approximately parallel.
(b) Plots from a stratified Cox model.
(c) Calculate the log hazard ratio for a predictor as a function of time by fitting interval-specific stratified Cox models.
2. For continuous variables
(a) For fixed values of X, plots of log(−log S(t)) against log(time) should be approximately parallel.
3. For any type of variable
(a) Test a time by predictor interaction term in the Cox model.
(b) Plot Schoenfeld partial residuals (weighted or unweighted) against t for each predictor.
(c) A formal PH test using cox.zph in R/S-plus.
(d) A formal PH test in PROC PHREG using the supremum test.
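For option 1-(a), R can draw the log(−log S(t)) curves directly; a minimal sketch with the assumed AML/MDS names:

library(survival)
km <- survfit(Surv(os.t, os) ~ grp, data=aml.data)
plot(km, fun="cloglog", col=1:2, xlab="log(time)", ylab="log(-log S(t))")
## approximately parallel curves support the PH assumption for grp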
Let us discuss option 1-(c) in detail with an example from the Veterans Administration (VA) Lung Cancer data. Log Λ plots indicated that the four cell types did not satisfy the PH assumption. For the purpose of illustration, omit patients with large cell type and let the binary predictor be 1 if the cell type is squamous and 0 if it is small or adeno. We are assessing whether the survival patterns for the two groups, squamous vs. small/adeno, have PH. Interval-specific log hazard ratios are shown below.

time interval   total obs.   events   log hazard ratio   standard error
[  0,  21)      26           26        0.75              0.50
[ 21,  52)      25           24       -0.006             0.52
[ 52, 118)      31           26       -0.63              0.51
118+            28           26       -1.04              0.45

There is evidence of a trend of decreasing hazard ratio over time, which is consistent with the observation that patients with squamous cell carcinoma had a poorer prognosis in the early period, but a better prognosis in the late period.

5.9.2.1 Schoenfeld Residual

• Proposed by Schoenfeld (1982, Biometrika). Suppose that subject i dies at time t_i with p covariates X_i = (X_i1, ···, X_ip)′. The Schoenfeld residual for subject i is r_i = (r_i1, ···, r_ip)′, where

r_ik = X_ik − Ê(X_ik | R(t_i)) = X_ik − Σ_{j∈R(t_i)} X_jk p_j = O − E

E: the expected value of the covariate over the risk set at t_i, i.e., the average over the risk set at t_i; p_j: the probability that subject j has the event at that time, estimated from the Cox model.
• The Schoenfeld residual is not defined for censored individuals.
• Instead of a single residual for each individual, there is a separate residual for each individual for each covariate.
• Under PH, the Schoenfeld residuals do not depend on time, so a residual plot against time (which should show no trend) is used to test the PH assumption.
• The residuals sum to zero and have expected value zero.
• Later, Grambsch and Therneau (1999) proposed scaled Schoenfeld residuals.

5.9.2.2 PH test

A more general form of (1) is

λ(t|x) = λ_0(t) exp(X′β(t))

If β(t)=β, the model is PH. Thus, if PH holds, a plot of β(t) against time should be a horizontal line. Grambsch and Therneau (1999) argued that if β̂ is the coefficient from an ordinary fit of the Cox model, then

E(r*_ij) + β̂_j ≈ β_j(t_i)

where r*_ij is the scaled Schoenfeld residual (i.e., r*_i = V(β̂, t_i)⁻¹ r_i) for r_ij from r_i = (r_i1, ···, r_ip)′. Thus, if PH holds, plotting r*_ij + β̂_j versus time should give a horizontal line with zero slope. cox.zph tests for a nonzero slope as evidence against PH.

Example 9: Testing PH assumption

1. SAS code: AML/MDS study

proc phreg data=one;
model os_t*os(0)=age/ ties=efron;
strata grp; ** to check the PH assumption of grp;
baseline out=srvpred2 survival=s loglogs=lls;
run;

[Figure: log-log survival plots for AML/MDS, stratified by group (Mini vs. STD), against log(time).]
This plot shows the baseline survivor function for the 2 groups evaluated at the mean of age.

proc phreg data=one;
model os_t*os(0)=age grp/ ties=efron;
output out=res_s xbeta=xb ressch=schage schgrp;
title "AML/MDS : overall survival";

[Figure: Schoenfeld residuals for age and for group, plotted against time.]

2.
R code: AML/MDS study

>aml.cox <- coxph(Surv(os.t, os) ~ age + grp, data=aml.data)
>summary(aml.cox)
coxph(formula = Surv(os.t, os) ~ age + grp)
n= 181
      coef     exp(coef)   se(coef)   z      p
age   0.0371   1.04        0.00973    3.81   0.00014
grp   0.3643   1.44        0.24632    1.48   0.14000

      exp(coef)   exp(-coef)   lower .95   upper .95
age   1.04        0.964        1.018       1.06
grp   1.44        0.695        0.888       2.33

Rsquare= 0.083 (max possible= 0.997)
Likelihood ratio test= 15.7 on 2 df, p=0.000397
Wald test            = 15   on 2 df, p=0.000558
Efficient score test = 15.1 on 2 df, p=0.000516

>ph.test <- cox.zph(aml.cox)
>print(ph.test)  # display the results
>plot(ph.test)   # plot scaled Schoenfeld residuals against time for each covariate

         rho      chisq   p
age     -0.137    2.3     1.29e-01
grp     -0.339    15.5    8.38e-05   *** Non PH
GLOBAL   NA       16.0    3.44e-04

Note: In the cox.zph output
• rho: Pearson product-moment correlation between the scaled Schoenfeld residuals and log(t) for each covariate
• chisq: test statistic
• GLOBAL: global test for PH
• the test is not always sensitive

[Figure: Beta(t) for age and Beta(t) for grp plotted against time.]

3. SAS code: model with a time by covariate interaction

proc phreg data=one;
where age50=1;
model os_t*os(0)=grp age grp_age grp_t diff_sex dnr_type agvhd second_tx high_risk cell_srce sex_grp sec_grp/ selection=b;
grp_t=os_t*grp;
grp_age=age*grp;
sex_grp=diff_sex*grp;
sec_grp=second_tx*grp;
title "OS, all";

After a backward selection, the final model is

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
grp        1    1.46186     0.46896     9.7170       0.0018       4.314
grp_t      1    -0.15271    0.06367     5.7523       0.0165       0.858

Interpretation: the hazard ratio for grp is exp(1.46186 − 0.15271*time).

4. Plotting −Log SDF (ls) against time and Log(−Log SDF) (lls) against log(time).

5. SAS code to check the PH assumption for Cell Type: VA Lung Cancer data

proc lifetest plots=(s,ls,lls) outtest=Test maxtime=600;
time SurvTime*Censor(1);
strata Cell; *** include adenocarcinoma and large cell only;
run;

Note: If the proportional hazards assumption is appropriate, then the lls curves should be approximately parallel across strata.

proc phreg data=pbc;
model fu_days*status2(0) = age bili/tie=efron;
assess var=(bili) PH / resample;

Supremum Test for Proportional Hazards Assumption
Variable   Maximum Absolute Value   Replications   Seed        Pr > MaxAbsVal
age        0.9154                   1000           412213316   0.2640
bili       1.4470                   1000           412213316   <.0001

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=efron;
assess var=(log_bili) PH / resample;

Supremum Test for Proportional Hazards Assumption
Variable   Maximum Absolute Value   Replications   Seed       Pr > MaxAbsVal
age        0.7989                   1000           45062701   0.3720
log_bili   0.7137                   1000           45062701   0.4880
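The R analogue of the SAS grp_t (time by covariate) models above is the tt argument of coxph, which evaluates the interaction at each event time; a minimal sketch with the assumed AML/MDS names:

library(survival)
fit <- coxph(Surv(os.t, os) ~ age + grp + tt(grp), data=aml.data,
             tt=function(x, t, ...) x * t)   # models beta(t) = b1 + b2*t for grp
summary(fit)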
5.9.2.3 What to do when PH fails

• Check whether all important covariates are included.
• Check whether appropriate functional forms of X are used. A wrong functional form or a missing covariate can look like non-PH.
• Use a stratified Cox model as discussed above; for a continuous variable, create quantile groups. Caution: stratification only makes sense for nuisance variables, since no estimates are obtained for the stratifying variables and thus there is no test for their main effects. If the variable is the variable of interest, stratification is not a solution.
• Include a time by predictor or predictor by predictor interaction in the model.
• Partition the time axis: PH may hold over short time periods.
• Model a time-dependent covariate: β(t)X ≈ βX*(t).
• Explore other models.

5.9.2.4 Summary of Residuals

Residual              Purpose
Martingale/Deviance   assess adequacy of the regression assumptions and functional forms of predictors
Score                 detect influential observations
DFBETA                detect influential observations
Schoenfeld            test the PH assumption
Weighted Schoenfeld   test the PH assumption

5.11 Time-Dependent Covariates and Time-Dependent Coefficients

λ(t|X) = λ_0(t) exp(X′(t)β)    (3)
λ(t|X) = λ_0(t) exp(X′β(t))    (4)

(3) is commonly known as a Cox model with time-dependent covariates. That is, the hazard at time t depends on the value of X at time t, and thus the hazard may not be constant over time. This model is still PH. A time-dependent covariate can be measured once (e.g., the Stanford Heart Transplant study) or repeatedly over time. (4) is a time-dependent coefficient model. This is a non-proportional hazards model in which the regression effect of X changes over time.

5.11 Counting Process form of a Cox model

The basic concept is like a slow Poisson process — censoring is not "incomplete data", rather "the event hasn't occurred yet" (Laird and Oliver, JASA, 1981). Computationally this requires a simple reconstruction of the data set: create start and stop variables instead of time, i.e.,

(start, stop] status strata x1 x2 ...
instead of
time status x1 x2 ...

where
• (start, stop]: an interval of risk
• status=1 if the subject had an event at time stop; 0 otherwise

This simple reconstruction makes it possible to analyze data that contain time-dependent covariates, time-dependent strata, multiple events per subject, and so on.
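In R, survSplit performs this reconstruction; a minimal sketch (cut points and names assumed), which also gives the time-axis partition suggested in 5.9.2.3:

library(survival)
aml.cp <- survSplit(Surv(os.t, os) ~ ., data=aml.data,
                    cut=c(6, 12, 24), episode="period")   # (start, stop] episodes
coxph(Surv(tstart, os.t, os) ~ age + grp, data=aml.cp)    # same fit as before
coxph(Surv(tstart, os.t, os) ~ age + grp:strata(period),  # separate grp effect per period
      data=aml.cp)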
Example 10: Time-dependent Covariate

a. Stanford Heart Transplant Study. Taken from SAS. Patients are accepted if physicians judge them suitable for heart transplant. Then, when a donor becomes available, physicians choose transplant recipients according to various medical criteria. A patient's status can change during the study from waiting for a transplant to transplant recipient. Transplant status can be defined by the time-dependent covariate function X=X(t) as

X(t) = 0 if the patient has not received the transplant at time t
       1 if the patient has received the transplant at time t.

The Stanford heart transplant data that appear in Crowley and Hu (1977) consist of 103 patients, 69 of whom received transplants.

data Stanford_Heart;
input ID @5 Bir_Date mmddyy8. @14 Acc_Date mmddyy8.
      @23 Xpl_Date mmddyy8. @32 Ter_Date mmddyy8.
      @41 Status 1. @43 PrevSurg 1. @45 NMismatch 1.
      @47 Antigen 1. @49 Mismatch 4. @54 Reject 1. @56 NotTyped $1.;
label Bir_Date ='Date of birth'
      Acc_Date ='Date of acceptance'
      Xpl_Date ='Date of transplant'
      Ter_Date ='Date last seen'
      Status   ='Dead=1 Alive=0'
      PrevSurg ='Previous surgery'
      NMismatch='No of mismatches'
      Antigen  ='HLA-A2 antigen'
      Mismatch ='Mismatch score'
      NotTyped ='y=not tissue-typed';
Time= Ter_Date - Acc_Date;
Acc_Age=int( (Acc_Date - Bir_Date)/365 );
if ( Xpl_Date ne .) then do;
  WaitTime= Xpl_Date - Acc_Date;
  Xpl_Age= int( (Xpl_Date - Bir_Date)/365 );
end;
if ( Xpl_Date ne .) then transplant=1;
else transplant=0;
datalines;
1   01 10 37 11 15 67          01 03 68 1 0
2   03 02 16 01 02 68          01 07 68 1 0
3   09 19 13 01 06 68 01 06 68 01 21 68 1 0 2 0 1.11 0
4   12 23 27 03 28 68 05 02 68 05 05 68 1 0 3 0 1.66 0
5   07 28 47 05 10 68          05 27 68 1 0
6   11 18 13 06 13 68          06 15 68 1 0
7   08 29 17 07 12 68 08 31 68 05 17 70 1 0 4 0 1.32 1
8   03 27 23 08 01 68          09 09 68 1 0
9   06 11 21 08 09 68          11 01 68 1 0
10  02 09 26 08 11 68 08 22 68 10 07 68 1 0 2 0 0.61 1
11  08 22 20 08 15 68 09 09 68 01 14 69 1 0 1 0 0.36 0
12  07 09 15 09 17 68          09 24 68 1 0
13  02 22 14 09 19 68 10 05 68 12 08 68 1 0 3 0 1.89 1
14  09 16 14 09 20 68 10 26 68 07 07 72 1 0 1 0 0.87 1
15  12 04 14 09 27 68          09 27 68 1 1
16  05 16 19 10 26 68 11 22 68 08 29 69 1 0 2 0 1.12 1
17  06 29 48 10 28 68          12 02 68 1 0
18  12 27 11 11 01 68 11 20 68 12 13 68 1 0 3 0 2.05 0
19  10 04 09 11 18 68          12 24 68 1 0
20  10 19 13 01 29 69 02 15 69 02 25 69 1 0 3 1 2.76 1
21  09 29 25 02 01 69 02 08 69 11 29 71 1 0 2 0 1.13 1
22  06 05 26 03 18 69 03 29 69 05 07 69 1 0 3 0 1.38 1
23  12 02 10 04 11 69 04 13 69 04 13 71 1 0 3 0 0.96 1
24  07 07 17 04 25 69 07 16 69 11 29 69 1 0 3 1 1.62 1
25  02 06 36 04 28 69 05 22 69 04 01 74 0 0 2 0 1.06 0
26  10 18 38 05 01 69          03 01 73 0 0
27  07 21 60 05 04 69          01 21 70 1 0
28  05 30 15 06 07 69 08 16 69 08 17 69 1 0 2 0 0.47 0
29  02 06 19 07 14 69          08 17 69 1 0
................................

To illustrate the calculation of a time-dependent covariate in the PL, suppose that we have one time-dependent covariate, WaitTime, in the model.

Of 103 patients, 22 died or were censored before Time 27, and thus 81 are at risk.

Obs   Time   Status   WaitTime   transplant   X(27)
23    27     1        17         1            1
24    29     1        4          1            1
25    30     0        .          0            0
26    31     1        .          0            0
27    34     1        .          0            0
28    35     1        .          0            0
29    36     1        .          0            0
30    38     1        35         1            0   <==
31    38     0        37         1            0   <==
...   ...    ..       ..         ..           ..
103   1799   0        24         1            1

At the failure time 27, the PL contribution is

exp(βx_23(27)) / [ exp(βx_23(27)) + exp(βx_24(27)) + exp(βx_25(27)) + ··· + exp(βx_29(27)) + exp(βx_30(27)) + exp(βx_31(27)) + ··· + exp(βx_103(27)) ]
= exp(β·1) / [ exp(β·1) + exp(β·1) + exp(β·0) + ··· + exp(β·0) + exp(β·0) + exp(β·0) + ··· + exp(β·1) ]

But if WaitTime is ignored and a fixed covariate 'transplant' is used instead:

exp(β·1) / [ exp(β·1) + exp(β·1) + exp(β·0) + ··· + exp(β·0) + exp(β·1) + exp(β·1) + ··· + exp(β·1) ]

proc phreg;
model Time*Status(0)= Acc_Age PrevSurg XStatus;
if (WaitTime = . or Time < WaitTime) then XStatus=0;
else XStatus= 1;
run;

*note1: no pt had Time < WaitTime. 1 pt died on the operating table, thus Time=WaitTime.
*note2: Unlike an IF statement in the DATA step, the IF statement in PHREG compares waiting times for patients who are at risk of death with survival times for patients who experienced events.
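As an aside, the survival package ships these same Stanford data as heart, already in (start, stop] counting-process form (its age variable is centered, so coefficients will not match the SAS run exactly):

library(survival)
head(heart)   # start, stop, event, age, year, surgery, transplant, id
coxph(Surv(start, stop, event) ~ age + surgery + transplant, data=heart)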
Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
103     75      28         27.18

Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    596.649              591.312
AIC         596.649              595.312
SBC         596.649              599.947

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   5.3370       2    0.0694
Score              4.7900       2    0.0912
Wald               4.7812       2    0.0916

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
Acc_Age     1    0.03130     0.01386     5.0975       0.0240       1.032          Age at Acceptance
PrevSurg    1    -0.77052    0.35961     4.5911       0.0321       0.463          Previous surgery
XStatus     1    -0.04086    0.30225     0.0183       0.8925       0.960

What if we ignore the wait time and treat 'transplant status' as a fixed covariate measured at baseline?

proc phreg;
model Time*Status(0)= Acc_Age PrevSurg transplant; *transplant=1 if ever had a transplant, 0 otherwise;
run;

Analysis of Maximum Likelihood Estimates
Parameter    DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
Acc_Age      1    0.05837     0.01496     15.2270      <.0001       1.060          Age at Acceptance
PrevSurg     1    -0.42241    0.37098     1.2964       0.2549       0.655          Previous surgery
transplant   1    -1.70341    0.27812     37.5125      <.0001       0.182

note: selection bias, because patients who died quickly were less likely to get transplants.
--------------------------------------------------------------------------------------------
proc phreg data=one;
model Time*Status(0)= XStatus / ties=discrete;
if (WaitTime = . or Time < WaitTime) then XStatus=0.;
else XStatus= 1.0;
run;

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   0.0608       1    0.8052
Score              0.0606       1    0.8056   <=== MB (Mantel-Byar) test
Wald               0.0597       1    0.8069

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
XStatus     1    0.07232    0.29585     0.0597       0.8069       1.075

b. Example 49.5 from SAS PHREG: Time-Dependent Repeated Measurements at regular time intervals

Consider an experiment to study the dosing effect of a tumor-promoting agent. Forty-five rodents initially exposed to a carcinogen were randomly assigned to three dose groups. After the first death of an animal, the rodents were examined every week for the number of papillomas. Investigators were interested in determining the effects of dose on the carcinoma incidence after adjusting for the number of papillomas.

The input data set TUMOR consists of the following 19 variables:
* ID (subject identification)
* Time (survival time of the subject)
* Dead (censoring status, where 1=dead and 0=censored)
* Dose (dose of the tumor-promoting agent)
* P1-P15 (number of papillomas at the 15 times that animals died. These 15 death times are weeks 27, 34, 37, 41, 43, 45, 46, 47, 49, 50, 51, 53, 65, 67, and 71. For instance, subject 1 died at week 47; it had no papilloma at week 27, five papillomas at week 34, six at week 37, eight at week 41, and 10 at weeks 43, 45, 46, and 47. For an animal that died before week 71, the number of papillomas is missing for those times beyond its death.)
The following SAS statements create the data set TUMOR: data Tumor; infile datalines missover; input ID Time Dead Dose P1-P15; label ID=’Subject ID’; datalines; 1 2 3 5 8 9 10 11 47 71 81 81 69 67 81 37 1 1 0 0 0 1 0 1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0 0 0 0 0 0 0 9 5 0 1 0 0 0 0 9 6 0 1 0 0 1 0 9 8 10 10 10 10 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 2 2 2 2 0 0 0 0 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 ; 53 38 54 51 47 27 41 49 53 50 37 49 46 48 54 37 53 45 53 49 39 27 49 43 28 34 45 37 43 0 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1 1 0 1 1 1 1 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 0 5 2 15 13 22 6 0 0 0 3 2 4 15 12 12 3 4 6 0 7 17 0 14 8 11 10 0 9 0 0 0 0 0 0 0 0 0 0 13 14 6 6 6 6 6 6 6 6 6 6 15 15 16 16 17 17 17 17 17 17 20 20 20 20 20 20 20 13 3 0 0 15 3 6 26 14 16 6 12 10 2 8 13 3 1 2 15 3 7 26 15 17 6 15 13 2 8 13 3 1 3 3 1 4 3 1 6 3 1 6 3 1 6 3 1 6 1 6 1 120 0 6 1 3 3 4 4 4 4 9 9 9 9 26 26 26 26 26 15 15 15 15 15 15 15 15 15 6 6 6 6 6 6 6 6 6 20 20 20 13 13 15 15 15 15 15 15 20 2 2 2 2 2 2 6 9 14 14 14 14 14 14 18 20 20 20 18 12 16 16 16 16 1 1 19 19 19 19 The number of papillomas (NPap) for each animal in the study was measured repeatedly over time. One way of handling time-dependent repeated measurements in the PHREG procedure is to use programming statements to capture the appropriate covariate values of the subjects in each risk set. In this example, NPap is a time-dependent explanatory variable with values that are calculated by means of the programming statements shown in the following SAS statements: DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 121 proc phreg data=Tumor; model Time*Dead(0)=Dose NPap; array pp{*} P1-P14; array tt{*} t1-t15; t1 = 27; t2 = 34; t3 = 37; t4 = 41; t5 = 43; t6 = 45; t7 = 46; t8 = 47; t9 = 49; t10= 50; t11= 51; t12= 53; t13= 65; t14= 67; t15= 71; if Time < tt[1] then NPap=0; else if time >= tt[15] then NPap=P15; else do i=1 to dim(pp); if tt[i] <= Time < tt[i+1] then NPap= pp[i]; end; run; At each death time, the NPap value of each subject in the risk set is recalculated to reflect the actual number of papillomas at the given death time. For instance, subject one in the data set Tumor was in the risk sets at weeks 27 and 34; at week 27, the animal had no papilloma, while at week 34, it had five papillomas. DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim Model Information Data Set WORK.TUMOR Dependent Variable Time Censoring Variable Dead Censoring Value(s) 0 Ties Handling BRESLOW Summary of the Number of Event and Censored Values Percent Total Event Censored Censored 45 25 20 44.44 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. 
Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    166.793              143.269
AIC         166.793              147.269
SBC         166.793              149.707

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   23.5243      2    <.0001
Score              28.0498      2    <.0001
Wald               21.1646      2    <.0001

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
Dose       1    0.06885    0.05620     1.5010       0.2205       1.071
NPap       1    0.11714    0.02998     15.2705      <.0001       1.124

NOTE: After the number of papillomas is adjusted for, the dose effect of the tumor-promoting agent is not statistically significant.

What if we construct the data set as (start, stop] status x1 x2, i.e.,

Obs   ID   Time   Dead   Dose   Npap   start   stop   status
1     1    47     1      1      0      0       27     0
2     1    47     1      1      5      27      34     0
3     1    47     1      1      6      34      37     0
4     1    47     1      1      8      37      41     0
5     1    47     1      1      10     41      43     0
6     1    47     1      1      10     43      45     0
7     1    47     1      1      10     45      46     0
8     1    47     1      1      10     46      47     1
9     2    71     1      1      0      0       27     0
10    2    71     1      1      0      27      34     0
11    2    71     1      1      0      34      37     0
12    2    71     1      1      0      37      41     0
13    2    71     1      1      0      41      43     0
14    2    71     1      1      0      43      45     0
15    2    71     1      1      0      45      46     0
16    2    71     1      1      0      46      47     0
17    2    71     1      1      1      47      49     0
18    2    71     1      1      1      49      50     0
19    2    71     1      1      1      50      51     0
20    2    71     1      1      1      51      53     0
21    2    71     1      1      1      53      65     0
22    2    71     1      1      1      65      67     0
23    2    71     1      1      1      67      71     1

proc phreg data=three;
model (start stop)*status(0)=Dose NPap;
run;

Model Information
Data Set             WORK.THREE
Dependent Variable   start
Dependent Variable   stop
Censoring Variable   status
Censoring Value(s)   0
Ties Handling        BRESLOW

Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
412     25      387        93.93

Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    166.793              143.269
AIC         166.793              147.269
SBC         166.793              149.707

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   23.5243      2    <.0001
Score              28.0498      2    <.0001
Wald               21.1646      2    <.0001

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
Dose       1    0.06885    0.05620     1.5010       0.2205       1.071
Npap       1    0.11714    0.02998     15.2705      <.0001       1.124

R code

coxph1 <- coxph(Surv(start, stop, status) ~ dose+npap, method='breslow', data=pap.data)
n= 412
       coef     exp(coef)   se(coef)   z      p
dose   0.0689   1.07        0.0562     1.23   2.2e-01
npap   0.1172   1.12        0.0300     3.91   9.3e-05

       exp(coef)   exp(-coef)   lower .95   upper .95
dose   1.07        0.933        0.96        1.20
npap   1.12        0.889        1.06        1.19

Rsquare= 0.055 (max possible= 0.333)
Likelihood ratio test= 23.5 on 2 df, p=7.8e-06
Wald test            = 21.2 on 2 df, p=2.53e-05
Score (logrank) test = 28.1 on 2 df, p=8.11e-07

c. DIEP study

To investigate whether early pregnancy losses increase with marked hyperglycemia in diabetic pregnancy, the DIEP (Diabetes In Early Pregnancy) study was conducted in five academic centers, and the association between the glycated protein level and fetal loss during the first trimester was examined. The diagnosis of pregnancy was made during the first week of missed menses by plasma HCG determination. Diabetic subjects were then admitted to a metabolic ward for monitoring, and educated in home glucose monitoring, diary keeping, and insulin adjustment.
Non-diabetic control subjects were screened for gestational diabetes at 26 weeks gestation, and were excluded from the control cohort if positive. Glycated protein measurements were performed in 429 control and 389 diabetic pregnancies. The methods of early pregnancy diagnosis, pregnancy dating and assessment of pregnancy loss have been described in detail elsewhere. To assess the relationship between the protein level and early pregnancy loss, a Cox model with a time-dependent covariate, glycated protein, was employed on the rationale that pregnancy loss in a given time interval might be affected by the values of the glycated protein in that interval. Obs IDNO week loss GROUP gp1 gp2 gp3 1 2 3 5 6 9 10 11 12 13 14 15 ... 431 433 434 435 437 440 443 446 12024 12026 12033 12036 12037 12045 12047 12052 12059 12071 12072 12078 14 13 13 2 2 13 13 13 13 3 11 11 0 0 0 1 1 0 0 0 0 1 0 0 CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL -0.32547 -0.74759 -0.69945 -0.05110 -0.16817 -0.16336 0.70871 -1.13056 -1.23302 . -0.55763 -0.66517 -0.16690 -0.56695 0.64185 -0.11780 1.74174 -0.03750 -0.13455 -0.70450 -1.65499 -0.89960 -0.81389 1.54764 0.73009 . -0.50187 . . -1.16834 -0.48167 -0.50187 -0.11166 . -1.45108 0.79067 11004 11006 11007 11009 11012 11017 11021 11031 14 2 11 2 12 13 12 11 0 1 0 1 0 0 0 0 DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC 1.17303 -0.18828 . 2.88259 -0.18530 0.75729 -0.81090 1.21524 -0.13306 -0.81389 -1.00084 1.38556 1.83879 . -0.19091 -0.36447 0.20499 . 0.20499 . -0.88559 -0.72403 0.54631 -0.83542 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 447 448 459 461 462 463 510 11048 11056 11107 11110 11111 11114 21067 10 12 1 2 12 3 3 0 0 1 1 0 1 1 DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC 0.58591 -0.44238 1.49422 2.27084 3.07254 . 1.97591 -0.84624 1.35355 . . 2.25934 3.48863 1.55105 0.85126 -0.76442 . . 0.91916 2.76988 0.49886 proc phreg; model week*loss(0)=grp zgp1 zgp1_s / ties=efron; array gp(*) gp1-gp3; zgp1=gp[week]; zgp1_s=zgp1*zgp1; run; Analysis of Maximum Likelihood Estimates Variable grp zgp1 zgp1_s DF Parameter Estimate Standard Error Chi-Square Pr > ChiSq Hazard Ratio 1 1 1 -0.14965 -0.18885 0.06494 0.29132 0.10914 0.02243 0.2639 2.9943 8.3847 0.6074 0.0836 0.0038 0.861 0.828 1.067 127 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 128 d. Time-dependent covariates at irregular intervals E2491 is a Phase III clinical trial to test the effect of all-trans-retinoic acid (ATRA) on acute promyelocytic leukemia (APL). 350 patients with previously untreated APL were randomly assigned to receive ATRA or daunorubicin plus cytarabine (Chemo) as induction treatment. Patients who had a complete remission received consolidation therapy consisting of one cycle of treatment identical to the induction chemotherapy, then high-dose cytarabine plus daunorubicin. Patients still in complete remission after two cycles of consolidation therapy were then randomly assigned to maintenance treatment with ATRA or to observation (Obs). INDUCTION CROSSOVER MAINTENANCE REREGISTRATION A: D: F: I: Chemo (> 3 yo), B: Chemo (<= 3 yo), C: ATRA Chemo (> 3 yo), E: Chemo (<= 3 yo) ATRA, G: Obs (direct), H: Obs ATRA extension E2491 Schema Chemo ATRA CR 1st Rando ATRA Consol. 2nd Rando Obs Some years after the study closure, a question arose whether there is a difference in survival between two types of morphology (M3 vs. 
M3v), and whether the treatment effect is different between two types of morphology in APL. To address this question, the data are constructed in two ways: i) treat the treatment as a time-dependent variable in the array statement in PHREG, ii) fit a model with the time-dependent variable using the counting process style of input. DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 1. Time-dependent at irregular intervals pt_id t1 t2 t3 t4 atra1 atra2 atra3 atra4 1 0 . 4.10678 . 1 . 1 . 2 0 1.54415 . . 1 0 . . 3 0 1.60986 7.81930 . 1 0 0 . 4 0 . . . 0 . . . 5 0 . 4.79671 . 0 . 0 . 6 0 . 5.65092 . 1 . 0 . 7 0 . . . 0 . . . 8 0 . 4.56674 . 1 . 1 . 9 0 . 4.23819 . 0 . 0 . 10 0 . 4.10678 . 0 . 1 . 11 0 . 5.88090 . 1 . 0 . 12 0 . . . 1 . . . 13 0 . . . 0 . . . 14 0 . . . 0 . . . 15 0 . . . 0 . . . 16 0 . 5.51951 . 1 . 0 . 17 0 . 6.76797 . 1 . 1 . 18 0 . 6.47228 . 0 . 0 . 19 0 . 6.17659 . 1 . 0 . 20 0 . . . 1 . . . ..... proc phreg; class m3v(ref=’0’) sex(ref=’2’) wbc_cat(ref=’0’) plt40k(ref=’0’); model os_t*os(0)=age60 sex wbc_cat plt40k hgb m3v atra; array tt{*} t1-t4; array a{*} atra1-atra4; do i=1 to 4; if os_t ge tt[i] and tt[i] ne . then atra=a[i]; end; run; os 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 0 0 1 0 1 129 os_t 122.513 1.347 26.119 172.025 13.602 166.604 0.526 165.815 5.290 11.170 173.897 0.657 0.296 18.333 10.448 132.435 171.696 26.448 162.793 0.263 m3v 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim Analysis of Maximum Likelihood Estimates Parameter Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq age60(>=60 vs <60) 0.60284 0.19821 9.2504 0.0024 sex (M vs F) 0.42957 0.16295 6.9494 0.0084 wbc_cat (>10 vs 20-50) 0.42479 0.22268 3.6389 0.0564 wbc_cat (>10 vs >=50K) 0.50615 0.28911 3.0650 0.0800 plt40k (>=40K vs <40K) -0.35150 0.17253 4.1509 0.0416 Hgb 0.03076 0.01239 6.1641 0.0130 m3v (M3V vs M3) 0.25435 0.22338 1.2966 0.2548 atra (ATRA vs no ATRA) -0.42493 0.16354 6.7516 0.0094 2. Counting process style input pt death trt atra 1 0 C 1 1 0 F 1 2 1 C 1 3 0 C 1 3 0 D 0 3 1 G 0 4 0 A 0 5 0 A 0 5 1 G 0 6 0 C 1 6 0 G 0 7 1 A 0 8 0 C 1 8 0 F 1 9 0 A 0 9 1 G 0 11 0 A 0 11 1 F 1 start 0.0 4.1 0.0 0.0 1.6 7.8 0.0 0.0 4.8 0.0 5.7 0.0 0.0 4.6 0.0 4.2 0.0 4.1 stop 4.1 122.5 1.3 1.6 7.8 26.1 172.0 4.8 13.6 5.7 166.6 0.5 4.6 165.8 4.2 5.3 4.1 11.2 130 Hazard Ratio 1.827 1.537 1.529 1.659 0.704 1.031 1.290 0.654 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 12 12 13 14 15 16 17 17 18 .... 0 0 1 1 1 1 0 0 0 C G C A A A C G C 1 0 1 0 0 0 1 0 1 0.0 5.9 0.0 0.0 0.0 0.0 0.0 5.5 0.0 5.9 173.9 0.7 0.3 18.3 10.4 5.5 132.4 6.8 proc phreg multipass; class atra(ref=’0’) m3v(ref=’0’) sex(ref=’2’) wbc_cat(ref=’0’) plt40k(ref=’0’); model (start stop)*os(0)=age60 sex wbc_cat plt40k hgb m3v atra / rl; run; ** You get the same result as above. 131 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 132 proc phreg multipass; ** equal treatment effect to both M3V and M3? class atra(ref=’0’) m3v(ref=’0’) sex(ref=’2’) wbc_cat(ref=’0’); model (start stop)*os(0)=age60 sex wbc_cat hgb m3v atra m3v|atra / rl; * if just ’rl’ then CL from Wald test is default; hazardratio ’H1’ m3v / diff=ref cl=both; * ’both’ gives CL from both Wald and PL; hazardratio ’H2’ atra / diff=ref cl=both; contrast ’C1’ atra 0 m3v 1 m3v*atra 0, atra 0 m3v 1 m3v*atra 1 / estimate=exp; ** exp specifies the linear predictors be estimated in the exponential scale. 
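* Added note: CONTRAST 'C1' row 1 estimates exp(beta_m3v), the M3V vs M3 HR when atra=0;
* row 2 estimates exp(beta_m3v + beta_m3v*atra), the M3V vs M3 HR when atra=1;
* with estimate=exp these reproduce the 'H1' hazardratio results shown below;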
run;

**Note: the result of 'H1' is the same as 'C1'.

H1: Hazard Ratios for m3v
                          Point      95% Wald             95% Profile Likelihood
Description              Estimate   Confidence Limits     Confidence Limits
m3v 1 vs 0 At atra=0       1.069    0.605    1.888        0.583    1.834
m3v 1 vs 0 At atra=1       1.970    1.055    3.680        1.019    3.586

H2: Hazard Ratios for atra
                          Point      95% Wald             95% Profile Likelihood
Description              Estimate   Confidence Limits     Confidence Limits
atra 1 vs 0 At m3v=0       0.587    0.410    0.842        0.407    0.837
atra 1 vs 0 At m3v=1       1.083    0.521    2.249        0.515    2.260

Contrast Test Results
                    Wald
Contrast   DF   Chi-Square   Pr > ChiSq
C1          2       4.5307       0.1038

Contrast Rows Estimation and Testing Results
                               Standard                                    Wald
Contrast  Type  Row  Estimate     Error  Alpha  Confidence Limits    Chi-Square  Pr > ChiSq
C1        EXP     1    1.0688    0.3103   0.05  0.6049    1.8882        0.0525      0.8188
C1        EXP     2    1.9699    0.6281   0.05  1.0545    3.6799        4.5222      0.0335

5.12 Model Selection and Assessment
• It is challenging to fit models to data without knowing what the true model is or might be.
• Choosing an appropriate model is a fundamental difficulty in statistical analysis.
• Doing it properly requires knowledge of the disease, an in-depth understanding of the statistics, and data analysis experience.

5.12.1 Automatic Variable Selection
• Forward: variables are added to the model one at a time. At each step, the variable added is the one that gives the largest decrease in −2Log L̂.
• Backward: first fit the largest model, then eliminate variables one by one using the −2Log L̂ values.
• Stepwise: same as forward except that a variable that has been included in the model can be removed at a later time.
• Best subset: available in SAS using the score test.
• SAS uses the Wald test for F/B/S selection; STATA uses the score test.
• An automatic variable selection process leads to the identification of one particular subset, but there may be a number of equally good subsets.
• The result also depends on the particular selection process (B/F/S) chosen and on the stopping rule for the inclusion/exclusion criteria.

5.12.2 −2Log(L)
• For a given set of data, the larger the value of the maximized likelihood (L̂), the better the agreement between the model and the observed data.
• Because L̂ is the product of a series of conditional probabilities, L̂ is bounded by 1; i.e., −2Log(L̂) is positive.
• For a given data set, the smaller the value of −2Log(L̂), the better the model. However, −2Log(L̂) cannot be used on its own as a measure of model adequacy because it depends on the number of observations in the data set and it decreases as the number of parameters increases (conversely, L̂ increases as the number of parameters increases).

5.12.3 Akaike Information Criterion (AIC)
• Accepting the partial likelihood as the method for measuring how well a model fits the data, i.e., Accuracy Measure = E[log likelihood of the fitted model], AIC is an unbiased estimator of −2Log L:
    AIC = −2Log L̂ + 2p
where −2Log L̂ is the measure of inaccuracy and 2p is the penalty; L̂ is the maximized likelihood function and p is the number of parameters in the model.
• Since Log(L̂) increases as the number of parameters increases, 2p serves as a penalty term. For the mathematical derivation of AIC, check Bozdogan (Psychometrika, 1987).
• AIC is designed to approximate the true model so as to minimize the average estimation error; thus the smaller the value, the better. A minimal R sketch comparing AIC across candidate Cox models is shown below.
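The following sketch is an added illustration, not part of the original slides: it fits three nested Cox models to the lung data shipped with the survival package (placeholder variables, not the PBC variables of Example 10B) and compares their AIC values on the same complete-case data set.

library(survival)

# AIC comparisons require the SAME observations in every candidate model,
# so build a complete-case data set first
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

fit1 <- coxph(Surv(time, status) ~ age, data = dat)
fit2 <- coxph(Surv(time, status) ~ age + sex, data = dat)
fit3 <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = dat)

# AIC = -2 log(partial likelihood) + 2p; the smallest value is preferred
AIC(fit1, fit2, fit3)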
• Used when comparing different models for the SAME data set (make sure any missing data are excluded prior to model selection). The model with the smallest AIC value is chosen as the best model for the data. See Example 10B.
• For a finite sample size, use AICc:
    AICc = AIC + 2p(p + 1)/(n − p − 1)
where the second term is a bias correction.
• STATA has a function swaic that sequentially adds or deletes predictors depending on whether they improve AIC.
• AIC can also be used to select a best transformation (e.g., bilirubin vs log(bilirubin) in the PBC data).
• Also, check Cp, BIC, and a penalized model with the L1 lasso penalty (Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, 2nd edition, Springer, 2009).
• See Steyerberg et al (Epidemiology 2010). They assessed the performance of traditional and novel prediction models.

Example 10B.
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=f slstay=0.1 details;
  title "Forward";
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=b slstay=0.1 details;
  title "Backward";
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=s slentry=0.1 slstay=0.1 details;
  title "Stepwise";
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=score best=3 details;
  title "Best";
-----------------------------------------------------------------------
Forward

Summary of Forward Selection
Step  Effect Entered  DF  Number In  Score Chi-Square  Pr > ChiSq
1     log_bili         1      1          180.9505        <.0001
2     edema            1      2           37.1781        <.0001
3     age              1      3           31.1436        <.0001
4     albumin          1      4           13.5624        0.0002
5     protime          1      5            8.1288        0.0044
-----------------------------------------------------------------------
Backward

Analysis of Maximum Likelihood Estimates
               Parameter   Standard
Parameter  DF   Estimate      Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age         1    0.04020    0.00767      27.4635       <.0001        1.041
edema       1    0.93643    0.27103      11.9379       0.0006        2.551
log_bili    1    0.86811    0.08290     109.6515       <.0001        2.382
protime     1    0.17448    0.06111       8.1529       0.0043        1.191
albumin     1   -0.75159    0.20946      12.8756       0.0003        0.472

Summary of Backward Elimination
Step  Effect Removed  DF  Number In  Wald Chi-Square  Pr > ChiSq
1     sex              1      5           0.9398         0.3323
-----------------------------------------------------------------------
Stepwise

Summary of Stepwise Selection
Step  Effect Entered  Removed  DF  Number In  Score Chi-Square  Pr > ChiSq
1     log_bili                  1      1          180.9505        <.0001
2     edema                     1      2           37.1781        <.0001
3     age                       1      3           31.1436        <.0001
4     albumin                   1      4           13.5624        0.0002
5     protime                   1      5            8.1288        0.0044
-----------------------------------------------------------------------
Best

Best Subset Regression Models Selected by Score Criterion
Number of       Score
Variables    Chi-Square   Variables Included in Model               -2LogL       AIC
1             180.9505    log_bili                                 1584.379   1586.379
1             103.6249    edema                                    1672.175   1674.175
1              70.1695    albumin                                      .          .
2             249.9111    edema log_bili                           1555.940   1559.940
2             218.8702    age log_bili                             1550.672   1554.672
2             211.2429    log_bili albumin                         1555.098   1559.098
3             277.0052    age edema log_bili                       1525.644   1531.644
3             265.8361    edema log_bili albumin                   1538.148   1544.148
3             258.8539    edema log_bili protime                   1548.633   1554.633
4             289.2636    age edema log_bili albumin               1512.478   1520.478
4             285.2958    age edema log_bili protime               1518.087   1526.087
4             277.0054    age sex edema log_bili                   1525.154   1533.154
5a            296.8619    age edema log_bili protime albumin       1505.610   1515.610 (**)
5             289.3640    age sex edema log_bili albumin           1511.606   1521.606
5             285.3139    age sex edema log_bili protime           1517.526   1527.526
6a            296.8902    age sex edema log_bili protime albumin   1504.711   1516.711
-----------------------------------------------------------------------------------

5.12.4 Harrell's C-Index
• The performance of a mathematical model predicting a dichotomous outcome is characterized by two types of measures: discrimination and calibration.
• Discrimination quantifies the ability of the model to correctly classify subjects into one of two categories.
• Calibration describes how closely the predicted probabilities agree numerically with the actual outcomes.
• A measure of discrimination used for a dichotomous outcome is the area under the receiver operating characteristic (ROC) curve, c.
• Measuring discrimination in survival analysis is more difficult and ambiguous than in logistic regression. In survival, we expect our model to correctly distinguish between those with shorter survival times and those with longer survival times.
• Harrell's C-Index is an extension of the area under the ROC curve to survival data to measure this discriminatory ability.
• Harrell's C-Index is the proportion of usable pairs in which the predictions and outcomes are concordant.
• Usable pairs:
  – Both event times are observed
  – Ti < Cj, where i ≠ j
• Unusable pairs:
  – Both events are censored
  – Ti > Cj, where i ≠ j
• Probability of concordant and discordant pairs:
    πc = P[(Ti < Tj & Zi < Zj) or (Ti > Tj & Zi > Zj)] = P(Ti < Tj & Zi < Zj) + P(Ti > Tj & Zi > Zj)
    πd = P[(Ti < Tj & Zi > Zj) or (Ti > Tj & Zi < Zj)] = P(Ti < Tj & Zi > Zj) + P(Ti > Tj & Zi < Zj)
where Ti and Tj denote the survival times for subjects i and j, and Zi and Zj denote the predicted probabilities of survival for subjects i and j.
• Proportion of unusable pairs: πu = 1 − (πc + πd)
• C-index:
    C = πc/(πc + πd) = πc/(1 − πu)
(a small pair-counting sketch appears after the General Comments below)

5.12.5 Integrated Brier Score (IBS)
• The expected Brier score is a mean squared error of prediction.
• It is a measure of prediction error; a better predictor is indicated by a lower score.
• Also check NRI (Net Reclassification Improvement) and IDI (Integrated Discrimination Improvement).

5.12.6 General Comments
• Be mindful of the purpose of regression analysis:
  – To identify potential prognostic factors for a particular failure
  – To assess a prognostic factor of interest in the presence of other potential prognostic factors
• Regression analysis depends on which of these purposes you are aiming at.
• Model selection is not an exact science. In reality, there may not be one best model.
• The decision on the most appropriate model should be based on both statistical and non-statistical considerations.
• Consider the trade-off between variance and bias.
• PH modelling is also a regression analysis. Thus, in addition to checking assumptions and assessing the model fit, one also has to consider issues of collinearity, overfitting, which interaction terms need to be included, and so on.
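To make the pair-counting definition of Harrell's C concrete, here is a minimal added R sketch (not from the original slides) that computes C by brute force over all usable pairs of a toy data set. A risk score plays the role of the prediction, so higher risk should go with shorter survival; with a recent survival package the result can be cross-checked against concordance().

library(survival)

# Toy data: observed time, event indicator (1=event, 0=censored), and a risk score
time   <- c(2, 5, 7, 9, 12, 15)
status <- c(1, 1, 0, 1,  0,  1)
risk   <- c(2.1, 0.8, 0.9, 1.2, 0.4, 0.3)

conc <- disc <- tied <- 0
n <- length(time)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    a <- if (time[i] < time[j]) i else j          # member of the pair with the shorter time
    b <- if (time[i] < time[j]) j else i
    if (time[a] < time[b] && status[a] == 1) {    # usable pair: the shorter time is an observed event
      if (risk[a] > risk[b])      conc <- conc + 1  # higher risk failed first: concordant
      else if (risk[a] < risk[b]) disc <- disc + 1  # discordant
      else                        tied <- tied + 1  # tied predictions count 1/2
    }
  }
}
(C <- (conc + 0.5 * tied) / (conc + disc + tied))   # 9/11 = 0.818 for these data

# Cross-check; reverse=TRUE because a larger risk score should mean shorter survival
concordance(Surv(time, status) ~ risk, reverse = TRUE)$concordance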
DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 142 NOTE: For the Cox model, the default calculation correlates the linear predictor with survival time. A large linear predictor (i.e., large log hazard) means shorter survival time. To obtain the larger value for a longer survival time, negate ’pred’ before computing C library(Hmisc) library(survival) set.seed(333) x1 <- rnorm(200) x2 <- x1 + rnorm(200) d.time <- rexp(200) + (x1 - min(x1)) cens <- runif(200, 0.5, 4) death <- d.time <= cens o.time <- pmin(d.time, cens) cox1 <- coxph(Surv(o.time, death) ~ x1 + x2) print(cox1) pred <- predict(cox1) r1 <- rcorr.cens(pred, Surv(o.time, death)) print(r1) r2 <- rcorr.cens(-pred, Surv(o.time, death)) print(r2) print(cbind(x1, x2, d.time,cens,death,o.time, pred, -pred)) > print(cox1) Call: coxph(formula = Surv(o.time, death) ~ x1 + x2) coef exp(coef) se(coef) z p x1 -1.963 0.14 0.313 -6.27 3.6e-10 x2 0.285 1.33 0.167 1.70 8.8e-02 Likelihood ratio test=66 on 2 df, p=4.77e-15 n= 200, number of events= 38 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim > pred <- predict(cox1) > r1 <- rcorr.cens(pred, Surv(o.time, death)) > print(r1) C Index Dxy S.D. n missing uncensored Relevant Pairs 8.761416e-02 -8.247717e-01 3.955424e-02 2.000000e+02 0.000000e+00 3.800000e+01 7.008000e+03 Uncertain 3.279200e+04 > r2 <- rcorr.cens(-pred, Surv(o.time, death)) > print(r2) C Index Dxy S.D. n missing uncensored Relevant Pairs 9.123858e-01 8.247717e-01 3.955424e-02 2.000000e+02 0.000000e+00 3.800000e+01 7.008000e+03 Uncertain 3.279200e+04 > print(cbind(x1, x2, d.time,cens,death,o.time, pred, -pred)) x1 x2 d.time cens death o.time pred -pred [1,] -0.08281164 -0.9941912071 2.39364016 3.3066471 1 2.39364016 -0.030526402 0.030526402 [2,] 1.93468099 1.9228580779 5.15371891 1.5965431 0 1.59654312 -3.159974636 3.159974636 [3,] -2.05128979 -2.6469355391 2.32330035 2.9687605 1 2.32330035 3.362809753 -3.362809753 [4,] 0.27773897 1.4638776176 5.70229234 1.5630664 0 1.56306638 -0.038165802 0.038165802 143 Concordant 6.140000e+02 Concordant 6.394000e+03 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim library(Hmisc) library(survival) attach(subset(data_102210, alln==0)) # Oct 28, 2010 -----------------------------------------------------------------------------------cox1 <- coxph(Surv(tos, os)~age50+femdon+minifg+tcd+wbc50+cr6+cr1+kps90+pbsc +cmvpos+mudgp+mm+secfg+pbsc.t+factor(mrc.cat)) pred1 <- predict(cox1) c1 <- rcorr.cens(-pred1, Surv(tos, os)) print(c1) C Index Dxy S.D. n missing 7.641054e-01 5.282108e-01 2.345114e-02 7.490000e+02 0.000000e+00 uncensored Relevant Pairs Concordant Uncertain 3.830000e+02 4.056600e+05 3.099670e+05 1.543600e+05 One way to calculate 95% CI: C hat +/- 1.96*SD/sqrt(n) 0.7641054 +/- 1.96*0.02345114/sqrt(749) NOTE: For another confidence interval, you can check Hajime’s R package "SurvC" -----------------------------------------------------------------------------------cox2 <- coxph(Surv(tos, os)~age50+femdon+minifg+tcd+wbc50+cr6+cr1+kps90+pbsc +cmvpos+mudgp+mm+secfg+pbsc.t+factor(dfci.cat)) pred2 <- predict(cox2) c2 <- rcorr.cens(-pred2, Surv(tos, os)) print(c2) C Index Dxy S.D. 
           n      missing
7.64029e-01 5.28058e-01 2.33881e-02 7.49000e+02 0.00000e+00
  uncensored Relevant Pairs   Concordant    Uncertain
3.83000e+02   4.05660e+05  3.09936e+05  1.54360e+05
------------------------------------------------------------------------------------
cox3 <- coxph(Surv(tos, os)~age50+femdon+minifg+tcd+wbc50+cr6+cr1+kps90+pbsc
              +cmvpos+mudgp+mm+secfg+pbsc.t+factor(eortc.cat2))
pred3 <- predict(cox3)
c3 <- rcorr.cens(-pred3, Surv(tos, os))
print(c3)
     C Index          Dxy         S.D.            n      missing
7.616378e-01 5.232756e-01 2.344496e-02 7.490000e+02 0.000000e+00
  uncensored Relevant Pairs   Concordant    Uncertain
3.830000e+02   4.056600e+05 3.089660e+05 1.543600e+05

Example of Brier Score
library(survcomp)
attach(subset(data_102210, alln2 == 0))
pbsc.t <- pbsc * tos
cox1 <- coxph(Surv(tos, os) ~ age50 + femdon + minifg + tcd + wbc50 + cr6 + cr1 +
              kps90 + pbsc + cmvpos + mudgp + mm + secfg + pbsc.t + factor(mrc.cat))
pred <- predict(cox1)
dd <- data.frame(time = tos, event = os, score = pred)
sb <- sbrier.score2proba(data.tr = dd, data.ts = dd, method = "cox")
print(sb)
----------------------------------
1. MRC     $bsc.integrated   [1] 0.1541683
2. DFCI    $bsc.integrated   [1] 0.1531032
3. EORTC   $bsc.integrated   [1] 0.1538132

5.13 Multiple Events per Subject
• Multiple events of the same type or of different types do occur in clinical trials. Examples include both death and recurrence in cancer clinical trials, multiple myocardial infarctions (MI) in cardiovascular disease, and repeated episodes of infectious diarrhea caused by E. coli among young children in developing countries (a water intervention study).
• A major issue in this situation is intrasubject correlation.
• The ordinary Cox model assumes independence, i.e., V(β̂) = {E[I(β)]}⁻¹. However, when multiple events occur within a subject, this assumption does not hold, and an appropriate correction is needed.
• One way to correct this is to use a robust variance estimate (D = D̃′D̃) for correlated data using the jackknife method.
• The jackknife method leaves out one subject at a time rather than one observation at a time. This can be obtained in coxph by specifying cluster(id) or robust = T, or by specifying covsandwich(aggregate) in PHREG.
• In this section, we focus on multiple events of the same type. Multiple events of different types will be discussed in Section 6.
• There are several models in the literature. Here we discuss three marginal models for ordered events: the Andersen-Gill (AG) model, the marginal model by Wei, Lin, Weissfeld (WLW), and the conditional model by Prentice, Williams, Peterson (PWP).
• The AG model is
    Yi(t) λ0(t) exp[Xi(t)β]
where Yi(t) is an at-risk indicator. In the Cox model, subject i is no longer at risk once the subject fails at time t. In the AG model for recurrent events, Yi(t) remains in the model, and Yi(t) = 1 for each new event. This model assumes independence of the multiple events within a subject, i.e., the numbers of events in nonoverlapping time intervals are independent given the covariates ("independent increments"). The time scale is "time since entry".
• The WLW model is
    Yij(t) λ0j(t) exp[Xi(t)βj]
for the jth event of the ith subject. This model treats the ordered outcome data set as though it were unordered competing risks data.
Unlike the AG model, this model allows a separate underlying hazard for each event. The analysis is on the “time from study entry” scale and all the time intervals start at zero. i.e., if there is a maximum 3 events, then there will be 3 strata in the data set and in each stratum, time starts from zero. • The PWP conditional model assumes that a subject cannot be at risk for event k + 1 until event k occurs. The hazard function for the ith event is identical to the hazard function in the WLW model, except for the definition of the at risk indicator Yij (t). Yij (t) is zero until the j − 1st event, then becomes 1. DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim Schematic Illustration of the three models Cox model PWP model AG model WLW model 148 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 149 Example 11. Multiple Events per Subject a. Bladder Cancer Data The bladder cancer data listed in Wei, Lin, and Weissfeld (1989, JASA 84, 1065-71). The data consist of 86 patients with superficial bladder tumors, which were removed when they entered the study. Forty eight of these patients were randomized into the placebo group, and 38 into the thiotepa group. Many patients have multiple recurrences of tumors in the study, and new tumors were removed at each visit. The data set contains the first four recurrences of the tumor for each patient, and each recurrence time was measured from the patient’s entry time into the study. The input data consist of eight variables: • ID (patient’s identification) • TRT (treatment group, where 1=placebo and 2=thiotepa) • NUMBER (number of initial tumors) • SIZE (initial tumor size) • VISIT (event number, where 1=first recurrence, 2= second recurrence, and so on) • TIME (followup time) • T1, T2, T3, and T4 : times of the four possible recurrences of the bladder tumor. • TSTART (time of the (k − 1)st recurrence if VISIT=k, or the the entry time if VISIT=1) • TSTOP (time of the kth recurrence if VISIT=k) • STATUS (event status, where 1=recurrence, 0=censored) data bladder; input trt time number size @27 t1 @31 t2 @35 t3 @39 t4; id + 1; cards; 1 0 1 1 1 1 1 3 1 4 2 1 1 7 1 1 1 10 5 1 1 10 4 1 6 1 14 1 1 1 18 1 1 1 18 1 3 5 1 18 1 1 12 16 1 23 3 3 1 23(*) 1 3 10 15 1 23 1 1 3 16 23 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 1 1 1 1 1 ... 26 26 26 28 29 1 8 1 1 1 2 1 4 2 4 1 2 25 26 Let us consider ID=12 AG Interval Stratum (0, 3] 1 (3, 10] 1 (10, 15] 1 WLW Interval Stratum (0, 3] 1 (0, 10] 2 (0, 15] 3 PWP: interval Interval Stratum (0, 3] 1 (3, 10] 2 (10, 15] 3 PWP: gap time Interval Stratum (0, 3] 1 (0, 7] 2 (0, 5] 3 1. Data layout for the WLW marginal model: id=1 is deleted due to zero futime id 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 time 1 1 1 1 4 4 4 4 7 7 7 7 10 10 10 10 6 10 10 status 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 trt 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 size 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 number 1 1 1 1 2 2 2 2 1 1 1 1 5 5 5 5 4 4 4 visit 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 150 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 2. 
Data layout for the AG and PWP conditional models Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 id 2 3 4 5 6 6 7 8 9 9 10 10 10 11 12 12 12 13 13 13 14 14 tstart 0 0 0 0 0 6 0 0 0 5 0 12 16 0 0 10 15 0 3 16 0 3 tstop 1 4 7 10 6 10 14 18 5 18 12 16 18 23 10 15 23 3 16 23 3 9 status 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 1 1 1 1 trt 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 size 3 1 1 1 1 1 1 1 3 3 1 1 1 3 3 3 3 1 1 1 1 1 number 1 2 1 5 4 4 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3 3 visit 1 1 1 1 1 2 1 1 1 2 1 2 3 1 1 2 3 1 2 3 1 2 151 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 3. R code and coxph outputs for the three models a. time to first event (Cox model) Cox <- coxph(Surv(time, status) ~ trt+size+number, data=cox.dat) ***time: time to the first event n= 85 coef exp(coef) se(coef) z p trt -0.5260 0.591 0.3158 -1.665 0.0960 size 0.0696 1.072 0.1016 0.685 0.4900 number 0.2382 1.269 0.0759 3.139 0.0017 exp(coef) exp(-coef) lower .95 upper .95 trt 0.591 1.692 0.318 1.10 size 1.072 0.933 0.879 1.31 number 1.269 0.788 1.094 1.47 Rsquare= 0.11 (max possible= 0.987 ) Likelihood ratio test= 9.92 on 3 df, Wald test = 10.5 on 3 df, Score (logrank) test = 11.1 on 3 df, p=0.0193 p=0.0145 p=0.0111 b. WLW model WLW <- coxph(Surv(time, status) ~ trt+size+number+strata(visit)+cluster(id), data=wlw.dat) n= 340 coef exp(coef) se(coef) robust se z Pr(>|z|) trt -0.58479 0.55722 0.20105 0.30795 -1.899 0.05756 . size -0.05162 0.94969 0.06973 0.09459 -0.546 0.58526 number 0.21029 1.23404 0.04675 0.06664 3.156 0.00160 ** trt size number exp(coef) exp(-coef) lower .95 upper .95 0.5572 1.7946 0.3047 1.019 0.9497 1.0530 0.7890 1.143 1.2340 0.8103 1.0829 1.406 Rsquare= 0.072 (max possible= 0.924 ) 152 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 153 c. AG model AG <- coxph(Surv(tstart, tstop, status) ~ trt+size+number+cluster(id), data=bladder_AG.dat) n= 190 coef exp(coef) se(coef) robust se z p trt -0.4116 0.663 0.1999 0.2488 -1.655 0.0980 size -0.0411 0.960 0.0703 0.0742 -0.554 0.5800 number 0.1637 1.178 0.0478 0.0584 2.801 0.0051 exp(coef) exp(-coef) lower .95 upper .95 trt 0.663 1.509 0.407 1.08 size 0.960 1.042 0.830 1.11 number 1.178 0.849 1.050 1.32 Rsquare= 0.074 (max possible= Likelihood ratio test= 14.7 on Wald test = 11.2 on Score (logrank) test = 16.2 on 0.992 ) 3 df, p=0.00213 3 df, p=0.0107 3 df, p=0.00104, Robust = 10.8 p=0.0126 proc phreg covs(aggregate); model (tstart, tstop)*status(0)=trt size number / ties=efron; id id; * ** to calculate the robust sandwich variance estimate for each subject. where tstart < tstop; Analysis of Maximum Likelihood Estimates with Sandwich Variance Estimate Parameter trt size number DF 1 1 1 Parameter Estimate -0.41164 -0.04108 0.16367 Standard Error 0.24152 0.07228 0.05691 StdErr Ratio 1.208 1.028 1.191 Chi-Square 2.9048 0.3230 8.2722 Pr > ChiSq 0.0883 0.5698 0.0040 Hazard Ratio 0.663 0.960 1.178 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 154 d. 
conditional model PWP <- coxph(Surv(tstart, tstop, status) ~ trt+size+number+strata(visit)+cluster(id), data=bladder_AG.dat) n= 190 coef exp(coef) se(coef) robust se z p trt -0.3335 0.716 0.2162 0.2048 -1.628 0.10 size -0.0085 0.992 0.0728 0.0616 -0.138 0.89 number 0.1196 1.127 0.0533 0.0514 2.328 0.02 exp(coef) exp(-coef) lower .95 upper .95 trt 0.716 1.396 0.480 1.07 size 0.992 1.009 0.879 1.12 number 1.127 0.887 1.019 1.25 Rsquare= 0.034 (max possible= Likelihood ratio test= 6.51 on Wald test = 7.26 on Score (logrank) test = 6.91 on 0.965 ) 3 df, p=0.0893 3 df, p=0.064 3 df, p=0.0747, Robust = 8.83 proc phreg covs(aggregate); ** PWP model; strata visit; model (tstart, tstop)*status(0)=trt size number / ties=efron; id id; where tstart < tstop; Analysis of Maximum Likelihood Estimates with Sandwich Variance Estimate Parameter Standard StdErr Parameter DF Estimate Error Ratio Chi-Square trt 1 -0.33349 0.19727 0.913 2.8579 size 1 -0.00849 0.06018 0.827 0.0199 number 1 0.11962 0.04971 0.932 5.7894 p=0.0317 Pr > ChiSq 0.0909 0.8878 0.0161 Hazard Ratio 0.716 0.992 1.127 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 155 Example 12. Weighted Cox regression Analysis: E4494 E4494 is a phase III trial of CHOP versus R-CHOP in older patients with diffuse large B-cell lymphoma (DLBCL). The study was a two stage randomized design with the first randomization to either CHOP(cyclophosphamide, doxorubicin, vincristine, prednisone) or R-CHOP for the induction treatment and the second randomization to maintenance rituximab (MR) or observation (Obs) for remitters. The primary endpoint was failure-free survival (FFS), defined as time from randomization to relapse, non-protocol treatment or death. There were two study questions: i) which induction treatment is better; ii) whether rituximab should be given in maintenance. The results were reported by Habermann et al.(JCO, 2006). The schema is shown below. R−CHOP MR CR/PR 1st Rando CHOP • Results in the published paper (Habermann et al. JCO, 2006) Induction no. of pts Maintenance N RR rate 3yr FFS not in main N 2yr FFS R-CHOP 267 77% 53% 90 MR 177 76% CHOP 279 76% 46% 97 Obs 182 61% 0.009 p-value 0.04 note: 2yr FFS from the second randomization. N 2yr FFS R-CHOP → MR 82 79% R-CHOP → Obs 95 77% CHOP → MR 95 74% CHOP → Obs 87 45% p-value 0.0004 FFS: after the second randomization. 2nd Rando Obs DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 156 • However, the interaction between induction and maintenance therapy was significant because MR improved the outcome after CHOP but not after R-CHOP. i.e., the data suggest that rituximab as part of induction therapy or as maintenance in responding patients result in a significant prolongation of FFS (P=0.0004). • Because of the observed difference in effect of MR according to the type of induction, a secondary analysis was performed to address the induction question without MR. In the secondary analysis, the HR of R-CHOP relative to CHOP in FFS was 0.64 with p=0.003. • Common practice of data analysis in two-stage randomization studies are: i)estimating survival distribution under different induction therapies using all data while ignoring maintenance therapy; ii) estimating postremission survival distribution using only data for individuals receiving maintenance therapy. 
But neither of these approaches addresses the induction and maintenance question properly since a subsequent randomization to maintenance therapy was conducted contingent on their remission status and consent. To remedy this problem, Lunceford et al (2002) and Wahed et al (2004) proposed a weighted approach. • Motivated by Lunceford et al (2002) and Wahed et al (2004), a weighted Cox regression analysis was conducted to compare induction treatments without the confounding effect of maintenance therapy by i) excluding MR patients, ii) roughly doubling the information for patients randomly assigned to observation, iii) using the robust variance estimator. • If patients with MR are simply excluded in the analysis, patients with second randomization would be underrepresented in the comparison of induction therapy. Thus, in the weighted Cox model for the induction comparison, patients with MR were excluded, but patients in the observation arm were weighted by 1.97. • i.e., 1.97=no of Obs / total no of pts in the maintenance therapy−1 = (182/359)−1=1.97 • Robust variance estimator was used by specifying cluster(case). DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim R code: the results shown below are slightly different from the publsihsed numbers due to the update of the data a. naive Cox model applied to 546 patients who were randomized to RCHOP (n=267) or to CHOP(n=279). naive.cox <- coxph(Surv(ttfst1,failst1)~trt1, data=e4494.dat) ** trt1=1 if RCHOP, 0 for CHOP n= 546 coef exp(coef) se(coef) z p trt1 -0.236 0.79 0.111 -2.12 0.034 trt1 exp(coef) exp(-coef) lower .95 upper .95 0.79 1.27 0.635 0.983 Rsquare= 0.008 (max possible= Likelihood ratio test= 4.51 on Wald test = 4.49 on Score (logrank) test = 4.51 on 0.999 ) 1 df, p=0.0337 1 df, p=0.0342 1 df, p=0.0338 b. naive Cox model with exclusion of MR no.MR.cox <- coxph(Surv(ttfst1,failst1)~trt1, data=no.MR.dat) n= 369 (182+187) coef exp(coef) se(coef) z p trt11 -0.299 0.742 0.128 -2.34 0.019 trt11 exp(coef) exp(-coef) lower .95 upper .95 0.742 1.35 0.577 0.953 Rsquare= 0.015 (max possible= Likelihood ratio test= 5.51 on Wald test = 5.48 on Score (logrank) test = 5.52 on 0.999 ) 1 df, p=0.0189 1 df, p=0.0192 1 df, p=0.0188 157 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim c. weighted Cox model with exclusion of MR wt.cox <- coxph(Surv(ttfst1,failst1)~trt1+cluster(case), weight=wt, data=no.MR.dat) n= 369 coef exp(coef) se(coef) robust se z p trt1 -0.362 0.696 0.108 0.136 -2.66 0.0077 trt1 exp(coef) exp(-coef) lower .95 upper .95 0.696 1.44 0.533 0.909 Rsquare= 0.03 (max possible= 1 ) Likelihood ratio test= 11.2 on 1 df, Wald test = 7.1 on 1 df, Score (logrank) test = 11.3 on 1 df, p=0.000809 p=0.0077 p=0.000782, Robust = 7.06 p=0.00788 Note: The p-value in the JCO paper was 0.003, but in this re-analysis, p=0.0077. Since ECOG data change very often after study closure due to the late response or change in pathology or change in eligibility and so on, it is not surprisong to see a slightly different p-value. 158 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 159 6.0 Analysis of Competing Risks Data 6.1. Introduction • Competing risks (CR) occur commonly in medical research, although the presence is not always recognized. • Competing risks data are inherent to cancer clinical trials in which failure can be classified by the types. e.g., death from the treatment related toxicity (TRM) and recurrence of disease (relapse). 
• Competing risks arise when individuals can experience any one of J distinct event types, and the occurrence of one type of event prevents the occurrence of the other types or alters the probability of occurrence of the other events.
• Types of CR: 'classic', 'semi-competing risks'
• For the analysis of competing risks data, standard survival analysis should not be applied.
• Parallel to standard survival analysis, competing risks data analysis includes estimation of the cumulative incidence of an event of interest in the presence of competing risks, comparison of cumulative incidence curves in the presence of competing risks, and competing risks regression analysis.

[Figure: Treatment failure after transplantation - left panel: overall treatment failure, NRM+Relapse (1−EFS); right panel: types of treatment failure, NRM and Relapse separately; probability vs. years from transplantation.]

6.2. Mathematical Definitions and Terminologies
6.2.1. Approaches
Failure time in the competing risks setting can be described univariately or multivariately.
• Traditional (latent failure times) approach
  – (T1, · · · , Tk): k latent failure times, where Ti is the time to failure of cause i, i = 1, 2, · · · , k
  – T = min(T1, · · · , Tk), since only one of the failures can occur.
  – Accounting for censoring, the observable quantities are (Y, I), where Y = C if I = 0, and Y = T and I = i if an event of failure type i occurs (i = 1, 2, · · · , k).
• Focused on the cause-specific hazard.
• Because the latent approach is based on multivariate failure times, the cause-specific hazard for an event of interest is derived from joint and marginal survivor functions.
• The joint distribution of competing risks failure times is unidentifiable unless the failure times are independent (Tsiatis 1975, PNAS).
• Even though competing risks are observable, observations of (Y, I) give no information on whether the failure times are independent or not.
• The assumption of independence is untestable and unjustifiable in the competing risks setting, in which the biologic mechanisms among the risks of events may be either unknown or likely interdependent.

[Figure: Cumulative incidence of TRM and Relapse for myeloablative vs. non-myeloablative transplant; probability vs. years from transplantation.]

• Modern approach, based on the subdistribution function
  – T: time to an event
  – C: censoring time
  – Y = min(T, C): observed failure time
  – I = i (i = 1, 2, · · · , k) for failure type i
  – (Y, I): observable quantities
• Focus on the cumulative incidence function of cause i directly.
• No independence assumption.

6.2.2. Definitions
Suppose there are k distinct types of failure.
• Overall hazard function at time t:
    λ(t) = lim_{u→0} (1/u) Prob(t ≤ T < t + u | T ≥ t)
• Cause-specific (CS) hazard:
    λi(t) = lim_{u→0} (1/u) Prob(t ≤ T < t + u, I = i | T ≥ t), i = 1, · · · , k
λi(t) represents the instantaneous rate of failure of type i at time t in the presence of the other failure types.
• CS cumulative hazard function:
    Λi(t) = ∫0^t λi(u) du
• CS survival function:
    Si(t) = exp[−Λi(t)]
• If only one of the failure types can occur for each individual, then
    λ(t) = Σ_{i=1}^k λi(t)   and   S(t) = P(T > t) = exp[−Σ_{i=1}^k Λi(t)]
• Subdensity function for failure i:
    fi(t) = lim_{u→0} (1/u) Prob(t ≤ T < t + u, I = i) = λi(t) S(t), i = 1, · · · , k
Thus
    λi(t) = fi(t)/S(t)    (5)
• Cumulative incidence function (CIF) of cause i:
    Fi(t) = Prob(T ≤ t, I = i) = ∫0^t fi(u) du = ∫0^t λi(u) S(u) du    (6)
for i = 1, · · · , k. This is also called the subdistribution function. As t → ∞, Fi(∞) = Prob(I = i) = pi < 1, where Σ_{i=1}^k pi = 1.
• CIF for cause i ignoring other causes:
    F*i(t) = ∫0^t λi(u) S*i(u) du
where S*i(t) is the cause-specific survival function for cause i obtained by censoring the competing risks; F*i(t) + S*i(t) = 1.
• Because events from causes other than i are treated as censored in S*i(t), S(t) ≤ S*i(t), and thus Fi(t) ≤ F*i(t).
• S*i(t) is what is used in standard survival analysis, and it is biased if there are competing risks.
• Since no one-to-one relationship exists between the cause-specific hazard and the CIF for failure i, the comparison of cause-specific hazards of failure i between different groups can be quite different from the comparison of the cumulative incidence of failure i.
• To be able to directly compare subdistribution functions, Gray (1988, Ann Statist) further defined a hazard function that corresponds to the subdistribution.
• Subdistribution hazard for failure i:
    γi(t) = lim_{u→0} (1/u) Pr{t ≤ T < t + u, I = i | T ≥ t ∪ (T ≤ t ∩ I ≠ i)} = fi(t)/(1 − Fi(t))
• The subdistribution hazard is the probability of observing the event of interest in the next time interval, given that either the event did not occur until that time or the competing-risks event occurred.

6.3. Estimation of Cumulative Incidence Function
• Let 0 < t1 < · · · < tl represent the l ordered distinct failure times for any cause of failure. If t is discrete, the hazard of failing from cause i is
    λi(tj) = Prob(T = tj, I = i)/Prob(T > tj−1), j = 1, · · · , l
and the estimate is
    λ̂i(tj) = dij/nj
where dij is the number of failures of cause i at time tj and nj is the number of subjects at risk just prior to tj.
• Let dj = Σ_{i=1}^k dij and λ̂(tj) = Σ_{i=1}^k λ̂i(tj). Then the KM estimate of the overall survival function (5) is
    Ŝ(t) = Π_{j: tj<t} (1 − λ̂(tj)) = Π_{j: tj<t} (1 − dj/nj)    (7)
• Thus, the estimate of the CIF (6) is
    F̂i(t) = Σ_{j: tj<t} (dij/nj) Ŝ(tj−1)    (8)
note: see Marubini and Valsecchi (1995) for the derivation of the variance of (8).
• But if we use the naive KM method, which ignores competing risks, the estimate of the CIF for failure i is
    F̂*i(t) = Σ_{j: tj<t} (dij/nj) Ŝi(tj−1),    (9)
where
    Ŝi(t) = Π_{j: tj<t} (1 − dij/nj).    (10)
Because Ŝ(t) ≤ Ŝi(t), F̂i(t) ≤ F̂*i(t). Therefore, when there are competing risks, the KM method of standard survival analysis overestimates the cumulative incidence function, and the magnitude of the overestimation depends on the level of the incidence rates of the competing events; the sketch below implements (7)-(10) directly.
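As an added illustration (not in the original slides), this minimal R sketch computes the estimators (7)-(10) on the ten observations of the numeric example that follows (cause 1 = relapse R, cause 2 = TRM T) and reproduces both columns of estimates.

# Observed times and causes: 1 = relapse (R), 2 = TRM (T), 0 = censored
time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
cause <- c( 1,  0,  1,  2,  0,  1,  2,  2,  1,  0)

tj  <- sort(unique(time[cause > 0]))                         # distinct failure times, any cause
nj  <- sapply(tj, function(t) sum(time >= t))                # at risk just prior to tj
d1j <- sapply(tj, function(t) sum(time == t & cause == 1))   # relapse events at tj
dj  <- sapply(tj, function(t) sum(time == t & cause > 0))    # events of any cause at tj

S.all  <- cumprod(1 - dj / nj)        # overall KM, eq (7)
S.prev <- c(1, head(S.all, -1))       # S-hat(t_{j-1})
F1     <- cumsum(d1j / nj * S.prev)   # CIF of relapse, eq (8): ends at 0.482

S1      <- cumprod(1 - d1j / nj)      # naive KM censoring TRM, eq (10)
S1.prev <- c(1, head(S1, -1))
F1.km   <- cumsum(d1j / nj * S1.prev) # naive estimate, eq (9): ends at 0.685

cbind(tj, F1, F1.km)                  # F1 <= F1.km at every failure time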
• In summary:
                        CR method                                    KM method
CIF                     Fi(t) = ∫0^t λi(u) S(u) du                   F*i(t) = ∫0^t λi(u) Si(u) du
survival function       S(t) = exp[−Σ_{i=1}^k Λi(t)]                 Si(t) = exp[−Λi(t)]
CIF estimate            F̂i(t) = Σ_{j:tj<t} (dij/nj) Ŝ(tj−1)          F̂*i(t) = Σ_{j:tj<t} (dij/nj) Ŝi(tj−1)
survival estimate       Ŝ(t) = Π_{j:tj<t} (1 − dj/nj)                Ŝi(t) = Π_{j:tj<t} (1 − dij/nj)

Numeric Example
a. naive (KM) method
time  cod  at risk  rel  cens.  Ŝ*R(t)               F̂*R(t), KM
10    R      10      1     0    1*(9/10)=0.9         0+1*(1/10)=0.1
20+          9       0     1    0.9*(9/9)=0.9        0.1+0.9*(0/9)=0.1
35    R      8       1     0    0.9*(7/8)=0.787      0.1+0.9*(1/8)=0.212
40    T      7       0     1    0.787*(7/7)=0.787    0.212+0.787*(0/7)=0.212
50+          6       0     1    0.787*(6/6)=0.787    0.212+0.787*(0/6)=0.212
55    R      5       1     0    0.7875*(4/5)=0.63    0.212+0.787*(1/5)=0.37
70    T      4       0     1    0.63*(4/4)=0.63      0.37+0.63*(0/4)=0.37
71    T      3       0     1    0.63*(3/3)=0.63      0.37+0.63*(0/3)=0.37
80    R      2       1     0    0.63*(1/2)=0.315     0.37+0.63*(1/2)=0.685
90+          1       0     1    0.315*(1/1)=0.315    0.685+0.315*(0/1)=0.685

b. CR method
time  cod  at risk  r/t  cens.  Ŝ(t)                 F̂R(t), CR
10    R      10      1     0    1*(9/10)=0.9         0+1*(1/10)=0.1
20+          9       0     1    0.9*(9/9)=0.9        0.1+0.9*(0/9)=0.1
35    R      8       1     0    0.9*(7/8)=0.787      0.1+0.9*(1/8)=0.212
40    T      7       1     0    0.787*(6/7)=0.675    0.212+0.787*(0/7)=0.212
50+          6       0     1    0.675*(6/6)=0.675    0.212+0.675*(0/6)=0.212
55    R      5       1     0    0.675*(4/5)=0.54     0.212+0.675*(1/5)=0.347
70    T      4       1     0    0.54*(3/4)=0.405     0.347+0.54*(0/4)=0.347
71    T      3       1     0    0.405*(2/3)=0.27     0.347+0.405*(0/3)=0.347
80    R      2       1     0    0.27*(1/2)=0.135     0.347+0.27*(1/2)=0.482
90+          1       0     1    0.135*(0/1)=0.135    0.482+0.135*(0/1)=0.482

cod: cause of death, r/t: relapse or TRM

[Figure: estimated cumulative incidence of relapse vs. time - the naive KM curve lies above the CR curve.]

Example: Myeloablative (MT) vs. Nonmyeloablative (NT) Allogeneic hematopoietic stem cell transplantation (HSCT) for patients > 50 years of age with hematologic malignancies - a real data example. (Alyea et al, 2005)
• Allogeneic HSCT refers to the transplantation of allogeneic stem cells derived from donor bone marrow or blood. HSCT is a treatment modality that can provide curative therapy for many hematologic malignancies.
• A typical competing risks data set - both relapse and TRM are important outcomes.
• The therapeutic benefit and potential cure achieved by allogeneic HSCT derive from the donor immune system (the 'graft-vs-tumor' effect).
• However, the therapeutic potential of allogeneic HSCT has not been fully realized due to both disease relapse and transplant-related mortality and morbidity (TRM).
• The objective of this study was thus to examine the impact of conditioning regimens (MT vs NT) on the two important endpoints in HSCT: relapse and TRM.
• 152 patients over age 50 who underwent HLA-matched allogeneic transplantation from 1997 through 2002 at our institution were included.
• Of these 152, 81 patients underwent MT and 71 underwent NT.
Analyzing the MT cohort first,
          3-yr F̂R    3-yr F̂T    Sum
KM         50%         58%       108%
CR         30%         50%        80%
(note that the KM estimates sum to more than 100%).

[Figure: KM vs CR estimates in the MT cohort - left panel: cumulative incidence of relapse (KM above CR); right panel: cumulative incidence of TRM (KM above CR); months post HSCT.]

[Figure: KM method vs CR method - curves for Relapse, TRM, and TRM+Relapse (1−EFS); months post HSCT.]

R code:
library(cmprsk)
attach(subset(age50.dat, mini==0))
# c.risk=1 if TRM, 2 if relapse, 0 if censored
cr1 <- cuminc(c.time, c.risk, cencode=0)
print(cr1)
timepoints(cr1, c(24, 36))   # point estimates at 2 yr and 3 yr

Estimates and Variances:
$est
            10         20         30         40
1 1  0.4320988  0.4718793  0.4983996  0.4983996
1 2  0.2098765  0.2496571  0.2629172  0.2996377
$var
             10          20          30          40
1 1 0.003092805 0.003183996 0.003219427 0.003219427
1 2 0.002090070 0.002403033 0.002497967 0.002917386

$est
            24         36
1 1  0.4851395  0.4983996
1 2  0.2629172  0.2996377
$var
             24          36
1 1 0.003204113 0.003219427
1 2 0.002497967 0.002917386

plot(cr1, curvlab=c("TRM","Relapse "), ylim=c(0,1.0), xlim=c(0,48), lty=c(1,1),
     main=" ", ylab=" ", xlab=" ", col=c(2,4), lwd=c(2,2), yaxt="n")
title("Cumulative Incidence of TRM and Relapse")
box(which="plot", lty = "solid")   # no box if this statement is omitted
par(xaxt="s")
axis(1, at=c(0,12,24,36,48), label=c("0","1","2","3","4"), cex.axis=1.3)
axis(2, at=c(0.0,0.2,0.4,0.6,0.8,1.0), cex.axis=1.3)

6.4 Comparison of Multiple Cumulative Incidence Functions
We discuss here the Gray test. For other tests, check Pepe and Mori, Stat Med 12:737-751, 1993, and Lin DY, Stat Med 16:901-910, 1997.

Gray test
• Gray (1988, Annals of Statistics) proposed a class of tests for comparing the cumulative incidence functions of a particular type of failure among different groups in the presence of competing risks.
• Suppose there are two types of failure (i=1, 2) and two treatment groups (k=A, B). Then testing the group difference in failure 1 is
    Ho : F1A = F1B = F1o
where F1A is the subdistribution of failure type 1 in treatment group A.
• Gray argued that testing F1A = F1B is not the same as testing λ1A = λ1B.
• To test F1A = F1B, the Gray test compares weighted averages of the subdistribution hazards, γ1k = f1k/(1 − F1k), k = A, B:
    ∫0^τ w(t) (γ̂1A − γ̂1B) = ∫0^τ w(t) { dF̂1A(t)/(1 − F̂1A(t−)) − dF̂1B(t)/(1 − F̂1B(t−)) }
where F̂1(t) is an estimate of F1(t) and w(t) is a weight function.
• Under Ho, the subdistribution hazard ratio of the two treatments is equal to 1 and constant over time (PH).
• Suppose there are two types of failures (i=1, 2), relapse and TRM, and two treatment groups (k=A, B).
• Let T = min(T1, T2), where (T1, T2) represent the failure times of relapse and TRM in group k and have a bivariate exponential distribution.
T = min(T1, T2), and for group k:
joint distribution:
    Fk(t1, t2) = 1 − exp(−λ1k t1 − λ2k t2)
subdistribution functions:
    F1k(t) = [λ1k/(λ1k + λ2k)] (1 − exp[−(λ1k + λ2k)t])
    F2k(t) = [λ2k/(λ1k + λ2k)] (1 − exp[−(λ1k + λ2k)t])
cause-specific hazards: λ1k and λ2k
subdensities:
    f1k(t) = λ1k exp[−(λ1k + λ2k)t],   f2k(t) = λ2k exp[−(λ1k + λ2k)t]

               Group 1                               Group 2
Failure 1      λ11 = 0.3                             λ12 = 0.2
               F11(t) = (3/6)(1 − exp[−0.6t])        F12(t) = (2/3)(1 − exp[−0.3t])
Failure 2      λ21 = 0.3                             λ22 = 0.1
               F21(t) = (3/6)(1 − exp[−0.6t])        F22(t) = (1/3)(1 − exp[−0.3t])

At t=2, F11(2) = 0.35 and F12(2) = 0.30; at t=5, F11(5) = 0.48 and F12(5) = 0.52.
Thus, λ11 > λ12 does NOT imply F11 > F12.
• This is caused by the dependency of the CIFs not only on the λ1k, but also on the λ2k of the competing risks.

[Figure 2: Cumulative incidence functions over time for Group 1 and Group 2, with and without competing risks; the no-CR curves lie above the corresponding CR curves.]

6.4.1 Estimation of Gray Statistic
• Let njA and njB be the numbers of subjects at risk at tj who are free from failures of any type in treatment groups A and B, respectively.
• The number of subjects at risk of failing from a type 1 event is expected to be greater than njA and njB, since what is needed is the number of subjects free from failure of type 1. Thus, the Gray test proposes a correction factor for njA and njB, namely
    [1 − F̂1(t−)]/Ŝ(t−) ≥ 1.
• The estimates of the risk sets for type 1 failure at tj for treatment groups A and B are
    R̂1A(tj) = njA [1 − F̂1A(t−)]/ŜA(t−)   and   R̂1B(tj) = njB [1 − F̂1B(t−)]/ŜB(t−),
    R̂1(tj) = R̂1A(tj) + R̂1B(tj).
R̂1(tj) is the total number of subjects from the two treatment groups combined at risk at tj for failure type 1.
• Then, the score (numerator of the Gray statistic) for failure 1 in group A is
    zA = Σ_{t∈(0,tl)} w(t) (d1A/R1A − d1/R1)
and, if w(t) = R1A,
    zA = Σ_{t∈(0,tl)} (d1A − d1 R1A/R1)
where R1 = R1A + R1B.
• The quadratic form of this score divided by its variance V is z′V⁻¹z ∼ χ²₁ (note: χ²_{k−1} for k groups).

Let us consider the example presented in Table 10.1 in Marubini and Valsecchi (1995).
Trt A   Failure type 1: 1 13 17 30 34 41 78 100 119 169
        Failure type 2: 1 6 8 13 13 15 33 37 44 45 63 80 89 89 91 132 144 171 183 240
        censored data:  34 60 63 149 207
Trt B   Failure type 1: 7 16 16 20 39 49 56 73 93 113
        Failure type 2: 1 2 4 6 8 9 10 13 17 17 17 18 18 27 29 39 50 69 76 110
        censored data:  34 60 63 78 149

At time 16 for failure type 1,
Ordinary 2x2 table                    Using Gray's risk set
Trt  no. of failure 1   R16           Trt  no. of failure 1   R16
A           0            27           A           0           34.5
B           2            26           B           2           34

Thus, the score (numerator) of the Gray test statistic at t=16 is
    0 − 2*(34.5/68.5) = −1.007

-------------------------------------------------------------------------------
trt  time   ni  e(t)    S(t-)    ci(t)    ci(t-)   1-ci(t-)  [1-ci(t-)]/S(t-)  R1k(t)
           type 1
-------------------------------------------------------------------------------
A      0    35   1     1.0000   0.00000
A      1    35   1     1.0000   0.01429  0.00000   1.00000       1.00000       35.00
A     13    31   1     0.8857   0.04286  0.01429   0.98571       1.11292       34.50  <==
A     17    27   1     0.7714   0.08571  0.04286   0.95714       1.24079       33.50
A     30    26   1     0.7429   0.11429  0.08571   0.91429       1.23070       32.00
A     34    24   1     0.6857   0.12857  0.11429   0.88571       1.29169       31.00
A     41    21   1     0.6273   0.15869  0.12857   0.87143       1.38917       29.17
A     78    16   1     0.5060   0.22406  0.15869   0.84131       1.66267       26.60
A    100    10   1     0.3374   0.26127  0.22406   0.77594       2.29978       23.00
A    119     9   1     0.3036   0.29848  0.26127   0.73873       2.43324       21.90
A    169     6   1     0.2024   0.32453  0.29848   0.70152       3.46602       20.80
-------------------------------------------------------------------------------
B      0    35   1     0.0000   0.00000
B      7    31   1     0.8857   0.02857  0.00000   1.00000       1.12905       35.00
B     16    26   2     0.7429   0.07143  0.02857   0.97143       1.30762       34.00  <==
B     20    19   1     0.5429   0.14363  0.10000   0.90000       1.65776       31.50
B     39    16   1     0.4571   0.17375  0.14363   0.85637       1.87349       29.98
B     49    13   1     0.3962   0.18880  0.17375   0.82625       2.08545       27.11
B     56    11   1     0.3352   0.20643  0.18880   0.81120       2.42004       26.62
B     73     7   1     0.2667   0.24266  0.20643   0.79357       2.97552       20.83
B     93     5   1     0.1905   0.27987  0.24266   0.75734       3.97553       19.88
B    113     2   1     0.0952      .     0.27987   0.72013       7.56437       15.13
-------------------------------------------------------------------------------
ci: cumulative incidence of failure 1. e(t): no. of events of interest at t.

Example: Returning to the previous example.
                    KM method                       CR method
              NT    MT    p-value (log-rank)    NT    MT    p-value (Gray test)
2.5-yr F̂R    61%   50%         0.35            46%   30%         0.052
2.5-yr F̂T    38%   58%         0.008           32%   50%         0.01

>library(cmprsk)
>attach(age50.dat)   # c.risk=1 if TRM, 2 if Relapse, 0 if censored
>cuminc1 <- cuminc(c.time, c.risk, mini, cencode=0)
>timepoints(cuminc1, c(24, 30))
Tests:
       stat         pv  df
1  6.401663 0.01140135   1
2  3.788293 0.05161226   1
Estimates and Variances:
$est
            24         30
0 1  0.4851395  0.4983996
1 1  0.3155839  0.3155839
0 2  0.2629172  0.2629172
1 2  0.4099636  0.4648541
$var
             24          30
0 1 0.003204113 0.003219427
1 1 0.003444021 0.003444021
0 2 0.002497967 0.002497967
1 2 0.004217095 0.006467915

[Figure: CR-method cumulative incidence curves of TRM and relapse for MT vs. NT (MT-TRM, NT-TRM, MT-Relapse, NT-Relapse); months post HSCT.]

[Figure: schematic - Relapse vs. TRM as competing events.]

6.5. Competing Risks Regression Analysis
• Why do we do regression analysis?
  – to identify potential prognostic factors for a particular failure
  – to assess a prognostic factor of interest in the presence of other potential prognostic factors
• Fitting a Cox model for an event of interest when competing risks are present won't address these two questions properly, because the cause-specific Cox model treats competing risks as censored observations, and the cause-specific hazard function does not have a direct interpretation.
• Fine and Gray (JASA, 1999) and Klein and Andersen (Biometrics, 2005) proposed direct regression modeling of the effect of covariates on the cumulative incidence function for competing risks data.
• These models distinguish between patients who are still alive and those who have already failed from competing causes, and they allow direct inference about the effects of covariates on the cumulative incidence function.

Fine and Gray model:
• A Cox-PH-like model for the subdistribution hazard.
• The model uses the partial likelihood principle and weighted estimating equations to obtain consistent estimators of the covariate effects.
• Let γ1(t; X) be the subdistribution hazard for failure 1, conditional on the covariates X:
    γ1(t; X) = lim_{u→0} (1/u) Pr{t ≤ T < t + u, I = 1 | T ≥ t ∪ (T ≤ t ∩ I ≠ 1), X}
             = f1(t; X)/(1 − F1(t; X)) = γ0(t) exp(X′β)
where γ0(t) is the baseline hazard of the subdistribution F1, X is the vector of covariates, and β is the vector of coefficients.
• The risk set is
    Ri = {j : min(Cj, Tj) ≥ Ti   (those who have not failed from any cause)
            ∪ (Tj ≤ Ti ∩ Ij ≠ 1 ∩ Cj ≥ Ti)   (those who have failed from another cause)}
• The risk set is improper and unnatural, since in reality individuals who failed from causes other than failure 1 prior to time ti cannot be "at risk" at ti.
• Although the risk set is unnatural, it leads to a proper PL for the improper F1(t; X).
• The partial likelihood function is
    PL(β) = Π_j [ exp(Xjβ) / Σ_{i∈Rj} wij(t) exp(Xiβ) ]    (11)
• A choice of weight is
    wij(t) = Ĝ(ti)/Ĝ(ti ∧ tj)
where Ĝ is the Kaplan-Meier estimate of the survivor function of the censoring distribution. The weight is 1 for those who did not experience any event by time ti, and ≤ 1 for those who experienced a competing-risks event before time ti; i.e., individuals experiencing a competing-risks event are not fully counted in the PL.
• As in the Cox partial likelihood, taking derivatives of the log partial likelihood with respect to β gives the score statistic
    U(β) = Σ_{j=1}^l { Xj − [Σ_{r∈Rj} wrj(t) Xr exp(Xrβ)] / [Σ_{r∈Rj} wrj(t) exp(Xrβ)] }    (12)
β̂ is then the value that solves U(β) = 0, i.e., maximizes the log partial likelihood.
• If there is only one type of failure, the Fine and Gray model reduces to the Cox model.
• Limitations: no stratification is allowed. It handles β(t)X, but not βX(t).

Example of weight calculation for the CIF of relapse
subject  time  cod   Ĝ(time)   w(10)  w(35)       w(55)               w(80)
s1       10    R     1           1      -            -                   -
s2       20+   C     0.89        1      -            -                   -
s3       35    R     0.89        1      1            -                   -
s4       40    T     0.89        1      1    Ĝ(55)/Ĝ(40)=0.83    Ĝ(80)/Ĝ(40)=0.83
s5       50+   C     0.74        1      1            -                   -
s6       55    R     0.74        1      1            1                   -
s7       70    T     0.74        1      1            1           Ĝ(80)/Ĝ(70)=1
s8       71    T     0.74        1      1            1           Ĝ(80)/Ĝ(71)=1
s9       80    R     0.74        1      1            1                   1
s10      90+   C     0           1      1            1                   1
cod: cause of death, R: relapse, T: TRM. The columns w(10), w(35), w(55), w(80) are the weights at the relapse times 10, 35, 55, 80; '-' denotes observations that are not in the risk set at that time point.

Example: Returning to the previous example.
• The cumulative incidence curves of TRM indicate that MT is associated with an increased risk of TRM (p=0.01). However, this is confounded by bone marrow (BM) progenitor cells: 35 of the 40 patients who died of TRM in MT received BM.
Example: Returning to the previous example.

• The cumulative incidence curves of TRM indicate that MT is associated with an increased risk of TRM (p=0.01). However, this is confounded by bone marrow (BM) progenitor cells: 35 of the 40 patients who died of TRM in MT received BM.
• The cumulative incidence curves of relapse indicate that NT is associated with an increased risk of relapse (p=0.052). However, this is also confounded by unfavorable risk status at the time of transplantation: of 51 relapsed patients (24 in MT and 28 in NT), 41 (17 in MT, 24 in NT) had unfavorable risk characteristics at the time of HSCT.

Table 6.1: Results of the HSCT example

                    Relapse & TRM: Cox        Relapse: CRR              TRM: CRR
Variable            β       HR    p-value     β       HR    p-value     β      HR    p-value
Bone marrow          0.118  1.13  0.71        -0.781  0.46  0.11        0.808  2.24  0.057
Non-myeloablative   -0.428  0.65  0.25        -0.556  0.57  0.33        0.057  1.06  0.90
Poor prognosis       0.426  1.53  0.06         0.891  2.44  0.02        0.255  1.29  0.38
Sex mismatch        -0.385  0.68  0.053       -0.659  0.52  0.04        0.179  1.20  0.50

HR: hazard ratio. Cox: Cox proportional hazards model for relapse and TRM combined as a single event. CRR: competing risks regression model using the Fine and Gray method.

R code

library(cmprsk)
attach(age50.dat)
# failcode = 1 if TRM, 2 if Relapse, 0 if censored
crr.rel <- crr(c.time, c.risk,
               cov1 = cbind(age, urd, bm, mini, good, sec.tx, fk506, sex.mm),
               failcode = 2, cencode = 0)
summary(crr.rel)

crr(ftime = c.time, fstatus = c.risk, cov1 = cbind(age, urd, bm, mini,
    good, sec.tx, fk506, sex.mm), failcode = 2, cencode = 0)

           coef exp(coef) se(coef)      z p-value
age      0.0487     1.050   0.0443  1.099   0.270
urd     -0.1644     0.848   0.3393 -0.485   0.630
bm      -0.7811     0.458   0.4849 -1.611   0.110
mini    -0.5561     0.573   0.5715 -0.973   0.330
good    -0.8906     0.410   0.3861 -2.307   0.021
sec.tx   0.0889     1.093   0.4211  0.211   0.830
fk506    0.0805     1.084   0.4996  0.161   0.870
sex.mm  -0.6589     0.517   0.3192 -2.065   0.039

        exp(coef) exp(-coef)  2.5% 97.5%
age         1.050      0.952 0.963 1.145
urd         0.848      1.179 0.436 1.650
bm          0.458      2.184 0.177 1.184
mini        0.573      1.744 0.187 1.758
good        0.410      2.437 0.193 0.875
sec.tx      1.093      0.915 0.479 2.495
fk506       1.084      0.923 0.407 2.885
sex.mm      0.517      1.933 0.277 0.967

Num. cases = 152
Pseudo Log-likelihood = -231
Pseudo likelihood ratio test = 22.1 on 8 df

Stratified Fine and Gray model

• Zhou et al. (Biometrics, 2011) developed a stratified Fine and Gray model.
• It allows the baseline subdistribution hazard to vary across levels of the stratification covariate.
• Two types of stratification factors: (i) strata with a small number of levels; (ii) strata with a large number of levels.
• We consider here the first case; for the second case, see Zhou et al.
• The stratified partial likelihood function is

  PL(β) = ∏_{k=1}^{s} ∏_{j} [ exp(Xkj β) / Σ_{i∈Rkj} wkij(t) exp(Xki β) ],

  where k indexes the s strata and j the failures within stratum k.

> install.packages("crrSC")
> library(crrSC)

attach(wbc_data_061813)
new_grp2 <- 1*(new_grp==2)
new_grp3 <- 1*(new_grp==3)
new_grp4 <- 1*(new_grp==4)
dri1   <- 1*(DRI==1)
dri2   <- 1*(DRI==2 | DRI==3)
ecog_m <- 1*(ecog_ps<0)
ecog_1 <- 1*(ecog_ps==1)
ecog_2 <- 1*(ecog_ps==2 | ecog_ps==3)
ric    <- 1*(REGINTEN==2)

c_trm <- crrs(pfs_t, rrisk, strata=REGINTEN,
              cov1=cbind(age60, mf, dnr_type0, prophn, cmv_pos, dri1, dri2,
                         yr_bmt, cd34_cat2, ecog_m, ecog_1, ecog_2,
                         new_grp2, new_grp3, new_grp4),
              failcode=1, cencode=3, ctype=1)   # TRM

> print.crrs(c_trm)
convergence: TRUE
coefficients:
[1]  0.27670  0.17970 -0.10570 -0.06656  0.16060 -0.02863 -0.25700 -0.04353
    -0.17490 -0.02699  0.19790  0.69530  0.64950  0.83480  0.49500
standard errors:
[1] 0.22940 0.15260 0.11900 0.14440 0.14090 0.19240 0.21570 0.04189 0.13130
    0.22420 0.16580 0.24210 0.23180 0.22730 0.21800
two-sided p-values:
[1] 0.23000 0.24000 0.37000 0.64000 0.25000 0.88000 0.23000 0.30000 0.18000
    0.90000 0.23000 0.00410 0.00510 0.00024 0.02300

Klein and Andersen model (2005):

• Models the cumulative incidence function for subject j at time t, Cjt, through a link function g(x):

  g(Cjt) = αt + β Zjt,

  where Z is a covariate vector.
• The regression estimator of β is based on pseudovalues of the cumulative incidence function,

  Ĉjt = n Ĉt − (n − 1) Ĉt(−j)

  for individual j at time t, where Ĉt is the cumulative incidence function at time t for the complete data set, and Ĉt(−j) is the cumulative incidence function computed from the data set with subject j deleted.
• A link function for CRR is g(x) = log[−log(1 − x)].
• Parameter estimates and standard errors are obtained using generalized estimating equations, e.g., PROC GENMOD in SAS; an R sketch of the pseudovalue computation is given below.
• SAS and R programs for this model are available on the CIBMTR website
  http://www.cibmtr.org/ReferenceCenter/Statistical/Education or
  http://www.mcw.edu/biostatistics/statisticalresources/CollaborativeSoftware.htm
• Time-dependent covariates are allowed.
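Before the SAS example, a minimal sketch of the pseudovalue computation in R using only cmprsk (the function name pseudo_ci, its arguments, and the curve-label convention are mine; the CIBMTR macros below are the validated implementation):

library(cmprsk)

# leave-one-out pseudovalues for the CIF of `cause` at the given time points:
#   Chat_jt = n * Chat_t - (n - 1) * Chat_t^(-j)
# (quadratic in n: one cuminc fit per deleted subject; fine for illustration)
pseudo_ci <- function(ftime, fstatus, times, cause = 1) {
  n   <- length(ftime)
  lab <- paste(1, cause)        # cuminc's curve label when no group is supplied
  ci  <- timepoints(cuminc(ftime, fstatus), times)$est[lab, ]
  loo <- t(sapply(seq_len(n), function(j)
    timepoints(cuminc(ftime[-j], fstatus[-j]), times)$est[lab, ]))
  sweep(-(n - 1) * loo, 2, n * ci, "+")   # n rows (subjects) x length(times) cols
}

The pseudovalues are then stacked in long format (one row per subject and time point) and fed to a GEE fit with the cloglog link, which is what the SAS example below does with PROC GENMOD.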
Example of Klein and Andersen model

options mprint mlogic merror source2 spool symbolgen;
libname in "/usr/stats/kim/ecog/Training/Comp_Risk/Pseudo";
%include "/usr/stats/kim/ecog/Training/Comp_Risk/Pseudo/pseudoci2.txt"; ** pseudoci.txt does not work;
%include "/usr/stats/kim/ecog/Training/Comp_Risk/Pseudo/cuminc.txt";

data one; set in.bmt;
  if dfs=1 and rel=0 then nrm=1; else nrm=0;
  keep id dfs rel dfs_t nrm fab disease pt_age;
run;

data times;
  input time @@; ** calculate pseudo values at 5 data points roughly equally spaced on the event scale;
cards;
50 105 170 280 530
;
run;

*** %macro pseudoci(datain,x,r,d,howmany,datatau,dataout);
%pseudoci(one,dfs_t,rel,nrm,137,times,in.dataoutcr);

data two; set in.dataoutcr;
  dis2=0; if disease=2 then dis2=1;
  dis3=0; if disease=3 then dis3=1;
run;

proc print data=two round;
proc genmod;
  class oid otime;
  FWDLINK LINK=LOG(-LOG(1-_MEAN_));
  INVLINK ILINK=1-EXP(-EXP(_XBETA_));
  model rpseudo = otime dis2 dis3 fab pt_age / dist=normal noscale noint;
  repeated subject=oid / corr=ind;
run;

The GENMOD Procedure

Model Information
  Data Set            WORK.TWO
  Distribution        Normal
  Link Function       User
  Dependent Variable  rpseudo

  Number of Observations Read   686
  Number of Observations Used   680
  Missing Values                  6

Class Level Information
  Class   Levels  Values
  oid        136  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
                  25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
                  45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
                  65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
                  85 86 87 88 ...
  otime        5  50 105 170 280 530

Parameter Information
  Parameter  Effect
  Prm1       Intercept
  Prm2       otime 50
  Prm3       otime 105
  Prm4       otime 170
  Prm5       otime 280
  Prm6       otime 530
  Prm7       dis2
  Prm8       dis3
  Prm9       fab
  Prm10      pt_age

Algorithm converged.

GEE Model Information
  Correlation Structure         Independent
  Subject Effect                oid (138 levels)
  Number of Clusters            138
  Clusters With Missing Values    2
  Correlation Matrix Dimension    5
  Maximum Cluster Size            5
  Minimum Cluster Size            0

The GENMOD Procedure
Algorithm converged.

GEE Fit Criteria
  QIC   77.0100
  QICu  89.1155

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter      Estimate  Std Error  95% Confidence Limits      Z  Pr > |Z|
Intercept        0.0000     0.0000   0.0000   0.0000           .  .
otime 50        -3.6459     0.8515  -5.3148  -1.9769       -4.28  <.0001
otime 105       -2.6063     0.6671  -3.9137  -1.2989       -3.91  <.0001
otime 170       -2.1331     0.6499  -3.4069  -0.8594       -3.28  0.0010
otime 280       -1.7867     0.6329  -3.0272  -0.5463       -2.82  0.0048
otime 530       -1.4960     0.6207  -2.7126  -0.2794       -2.41  0.0159
dis2            -1.7393     0.6493  -3.0119  -0.4667       -2.68  0.0074
dis3            -0.1820     0.5686  -1.2964   0.9324       -0.32  0.7489
fab              1.0574     0.5007   0.0761   2.0387        2.11  0.0347
pt_age           0.0169     0.0220  -0.0261   0.0600        0.77  0.4405
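Under the link g(x) = log[−log(1−x)], exponentiated coefficients can be read like subdistribution hazard ratios (cf. the Fine and Gray model). For example, for fab in the output above (a quick check in R):

exp(1.0574)                              # 2.88, the estimated effect of fab
exp(1.0574 + c(-1, 1) * 1.96 * 0.5007)   # 1.08 to 7.68, exponentiating the CI limits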
%include "/usr/stats/htkimc/core/BMT/Cutler/prt05377/March13/pseudoci2.txt";
%include "/usr/stats/htkimc/core/BMT/Cutler/prt05377/March13/cuminc.txt";

data one; set g.both_dri_050213;
  keep bmtid age case mrd mf myeloid mini dri2 agvhd_t agvhd24 ext_cgvhd2 dth_rel ext_cgvhd_t;
run;

data times;
  input time @@; ** calculate pseudo values at 5 data points roughly equally spaced on the ext cgvhd event scale;
cards;
5.3 6.35 7.2 8.6 13.3
;
run;

%pseudoci(one,ext_cgvhd_t,ext_cgvhd2,dth_rel,133,times,g.dataoutcr);

data two; set g.dataoutcr;
  if .Z<agvhd_t<otime then agvhd_tv=1; else agvhd_tv=0;   *** time dependent variable;
run;

proc print data=two round;
proc genmod;
  class oid otime;
  FWDLINK LINK=LOG(-LOG(1-_MEAN_));
  INVLINK ILINK=1-EXP(-EXP(_XBETA_));
  model rpseudo = otime age case mrd mf myeloid mini dri2 agvhd_tv / dist=normal noscale noint;
  repeated subject=oid / corr=ind;
run;

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter      Estimate  Std Error  95% Confidence Limits      Z  Pr > |Z|
Intercept        0.0000     0.0000   0.0000   0.0000           .  .
otime 5.3       -1.2647     0.8994  -3.0276   0.4981       -1.41  0.1597
otime 6.35      -0.7386     0.9203  -2.5422   1.0651       -0.80  0.4222
otime 7.2       -0.3391     0.9186  -2.1395   1.4613       -0.37  0.7120
otime 8.6       -0.0693     0.9136  -1.8600   1.7213       -0.08  0.9395
otime 13.3       0.2784     0.9055  -1.4964   2.0532        0.31  0.7585
age             -0.0102     0.0182  -0.0460   0.0255       -0.56  0.5753
case            -0.7505     0.3748  -1.4851  -0.0159       -2.00  0.0452
mrd             -0.2340     0.3584  -0.9364   0.4684       -0.65  0.5138
mf              -0.0225     0.4013  -0.8090   0.7639       -0.06  0.9553
myeloid         -1.0040     0.3908  -1.7700  -0.2381       -2.57  0.0102
mini            -0.2932     0.4110  -1.0988   0.5125       -0.71  0.4757
dri2             0.4674     0.3967  -0.3101   1.2449        1.18  0.2387
agvhd_tv         0.6582     0.3553  -0.0382   1.3545        1.85  0.0640  ** time dependent variable

proc genmod;
  class oid otime;
  FWDLINK LINK=LOG(-LOG(1-_MEAN_));
  INVLINK ILINK=1-EXP(-EXP(_XBETA_));
  model rpseudo = otime age case mrd mf myeloid mini dri2 agvhd24 / dist=normal noscale noint;
  repeated subject=oid / corr=ind;
run;

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter      Estimate  Std Error  95% Confidence Limits      Z  Pr > |Z|
Intercept        0.0000     0.0000   0.0000   0.0000           .  .
otime 5.3       -1.0707     0.8694  -2.7747   0.6334       -1.23  0.2182
otime 6.35      -0.4850     0.8808  -2.2113   1.2414       -0.55  0.5819
otime 7.2       -0.0386     0.8661  -1.7362   1.6589       -0.04  0.9644
otime 8.6        0.2571     0.8567  -1.4221   1.9363        0.30  0.7641
otime 13.3       0.5961     0.8446  -1.0594   2.2516        0.71  0.4804
age             -0.0144     0.0175  -0.0487   0.0200       -0.82  0.4130
case            -0.7560     0.3567  -1.4550  -0.0569       -2.12  0.0341
mrd             -0.3363     0.3632  -1.0481   0.3755       -0.93  0.3545
mf               0.1067     0.3800  -0.6381   0.8515        0.28  0.7789
myeloid         -0.7675     0.3706  -1.4939  -0.0412       -2.07  0.0384
mini            -0.2773     0.4087  -1.0784   0.5238       -0.68  0.4975
dri2             0.3994     0.3867  -0.3584   1.1573        1.03  0.3016
agvhd24          0.4177     0.3921  -0.3507   1.1861        1.07  0.2867  ** time fixed variable
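For completeness, a hedged geepack analogue of the PROC GENMOD fits above, reusing pseudo_ci from the earlier sketch (dat, z1, z2, and the time grid are placeholders, not variables from this session's data sets):

library(geepack)

tps  <- c(12, 24, 36)                                   # placeholder time grid
pv   <- pseudo_ci(dat$ftime, dat$fstatus, times = tps)  # n x 3 pseudovalue matrix
long <- data.frame(id = rep(seq_len(nrow(pv)), length(tps)),
                   y  = as.vector(pv),                  # stacked column by column
                   tp = factor(rep(tps, each = nrow(pv))),  # plays the role of otime
                   z1 = rep(dat$z1, length(tps)),
                   z2 = rep(dat$z2, length(tps)))
long <- long[order(long$id), ]            # geese expects clusters on contiguous rows

fit <- geese(y ~ tp + z1 + z2, id = id, data = long, scale.fix = TRUE,
             family = gaussian, mean.link = "cloglog", corstr = "independence")
summary(fit)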
6.6. Power Calculation: Pintilie (2002, Stat in Med)

• Suppose a randomized clinical trial is planned to compare two treatments with respect to failure 1 in the presence of competing risks. Under the proportional hazards assumption, the number of events (n1) necessary to detect a specified subdistribution hazard ratio (HRsub) with Type I and Type II error rates α and β is

  n1 = (z_{1−α/2} + z_{1−β})² / ( σ² [log(HRsub)]² ),                     (13)

  where z_{1−γ} is the (1−γ)th quantile of the standard normal distribution, σ² = p(1−p) is the variance of the covariate of interest, and p is the proportion of patients in the experimental arm.
• The total sample size required is then

  N = n1 / P1,                                                            (14)

  where P1 is the probability of observing failure 1 by a specific time point. Analogous to the sample size formula in the absence of competing risks, P1 can be calculated as

  P1 = 1 − (1/a) ∫_f^{a+f} (1 − F1(u)) du,

  where a is the accrual time and f is the additional follow-up time after the completion of accrual.
• If exponentiality is assumed, P1 for treatment group A is

  P1A = [ λ1A / (λ1A + λ2A) ] { 1 − [ e^{−(λ1A+λ2A)f} − e^{−(λ1A+λ2A)(a+f)} ] / [ (λ1A + λ2A) a ] }.

• P1B for treatment group B can be calculated in a similar manner. Then P1 = π P1A + (1 − π) P1B, where π is the proportion of patients assigned to treatment group A.
• Typically λ1A and the cumulative incidence of failure 1 at a specified time point are known from a previous study, and HRsub is a hypothesized value at the time of study design.
• Mimicking the HSCT study presented in Example 1, Table 6.2 below shows power for various scenarios of cumulative incidence in the presence and absence of competing risks.

Table 6.2. Power in the presence and absence of a competing risk (N=400).

Group 1            Group 2            power
CIFe     CIFc      CIFe     CIFc      a=2     a=3
30%      50%       45%      30%       50%     52%
30%       0%       45%       0%       88%     91%
50%      30%       30%      45%       96%     97%
50%       0%       30%       0%       99%     99%

a: accrual time in years. CIFe: cumulative incidence rate of the event of interest. CIFc: cumulative incidence rate of a competing event. The power calculation assumes a two-sided significance level of 0.05, a sample size of 400, and f=2. As the table indicates, power can be very different in the presence versus absence (CIFc = 0%) of a competing risk, and it depends on the magnitude of the competing risk.

> power(N=400, a=2, f=2, pi=0.5, t0=3, CIFev0=0.3, CIFcr0=0.5, CIFev1=0.45, CIFcr1=0.3)
[1] 0.4969512
> power(N=400, a=2, f=2, pi=0.5, t0=3, CIFev0=0.3, CIFcr0=0, CIFev1=0.45, CIFcr1=0)
[1] 0.8833121

A base-R sketch of this sample size calculation follows.
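A minimal base-R sketch of equations (13) and (14) under the exponential assumption (the function names and parameterization are mine; Pintilie's power program, referenced in the computing tools below, is the validated implementation):

# P1 for one arm under exponential cause-specific hazards lambda1 (failure 1)
# and lambda2 (competing failure), accrual time a, further follow-up f
p1_exp <- function(lambda1, lambda2, a, f) {
  lam <- lambda1 + lambda2
  (lambda1 / lam) * (1 - (exp(-lam * f) - exp(-lam * (a + f))) / (lam * a))
}

cr_samplesize <- function(HRsub, lambda1A, lambda2A, lambda1B, lambda2B,
                          a, f, p = 0.5, alpha = 0.05, beta = 0.20) {
  sigma2 <- p * (1 - p)                        # variance of the binary trt covariate
  n1 <- (qnorm(1 - alpha/2) + qnorm(1 - beta))^2 /
        (sigma2 * log(HRsub)^2)                # required number of events, eq. (13)
  P1 <- p * p1_exp(lambda1A, lambda2A, a, f) +
        (1 - p) * p1_exp(lambda1B, lambda2B, a, f)
  ceiling(n1 / P1)                             # total sample size, eq. (14)
}

# illustrative call with arbitrary hazards:
# cr_samplesize(HRsub = 0.5, lambda1A = 0.2, lambda2A = 0.3,
#               lambda1B = 0.1, lambda2B = 0.3, a = 2, f = 2)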
6.7. General Comments on CR data analysis

• Recognition
• Choice of competing event
• Presentation

6.8. Computing Tools in R

• R package cmprsk: cuminc and crr. crr outputs a matrix of Schoenfeld residuals; plots of these residuals against failure times can be used to check the proportional hazards assumption.
• In addition, two add-on functions for crr are available at http://www.stat.unipg.it/luca/R/: CumIncidence, an R program to calculate confidence intervals for cumulative incidence functions, and modsel.crr, a model selection tool among candidate competing risks models.
• R package 'tworeg' for confidence intervals.
• For the stratified Fine and Gray model, R package 'crrSC'.
• For the Klein and Andersen model, see the CIBMTR website http://www.cibmtr.org/ReferenceCenter/Statistical/Education or http://www.mcw.edu/biostatistics/statisticalresources/CollaborativeSoftware.htm
• For power calculation, power, an R program, is available at http://www.uhnres.utoronto.ca/labs/hill/People Pintilie.htm.

References

[1] Akaike H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716-723.
[2] Allison P (2003). Survival Analysis Using SAS. SAS Institute Inc., Cary, NC.
[3] Alyea EP, Kim HT, Ho V, Cutler C, Gribben J, DeAngelo DJ, Lee SJ, Windawi S, Ritz J, Stone RM, Antin JH, Soiffer RJ (2005). Comparative outcome of nonmyeloablative and myeloablative allogeneic hematopoietic cell transplantation for patients older than 50 years of age. Blood 105:1810-1814.
[4] Anderson JR, Cain KC, Gelber RD (1983). Analysis of survival by tumor response and other comparisons of time-to-event by outcome variables. J Clin Oncol 1(11):710-719.
[5] Basu AP, Ghosh JK (1978). Identifiability of the multinormal distribution under competing risks model. J Multivariate Analysis 8:413-429.
[6] Basu AP, Ghosh JK (1980). Identifiability of distributions under competing risks and complementary risks model. Communications in Statistics A: Theory and Methods 9:1515-1525.
[7] Basu AP, Klein JP (1982). Some recent results in competing risks theory. In: Crowley J, Johnson RA (eds), Survival Analysis, 216-229. Institute of Mathematical Statistics, Hayward.
[8] Brookmeyer R, Crowley J (1982). A confidence interval for the median survival time. Biometrics 38:29-41.
[9] Cortese G, Andersen PK (2009). Competing risks and time-dependent covariates. Biometrical Journal 51:138-158.
[10] Cox DR, Oakes D (1984). Analysis of Survival Data. Chapman and Hall, p. 91-110.
[11] Crowder M (1994). Identifiability crises in competing risks. International Statistical Review 62:379-391.
[12] Crowder M (2001). Classical Competing Risks. Chapman & Hall/CRC.
[13] David HA, Moeschberger ML (1978). The Theory of Competing Risks. Griffin, London.
[14] Dafni U (2011). Landmark analysis at the 25-year landmark point. Circ Cardiovasc Qual Outcomes 4(3):363-371.
[15] Fine JP, Gray RJ (1999). A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 94:496-509.
[16] Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine 18:2529-2545.
[17] Gray RJ (1988). A class of K-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics 16:1140-1154.
[18] Grambsch PM, Therneau TM, Fleming TR (1995). Diagnostic plots to reveal functional form for covariates in multiplicative intensity models. Biometrics 51:1469-1482.
[19] Grambsch PM, Therneau TM (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81:515-526.
[20] Harrell FE (2001). Regression Modeling Strategies. Springer.
[25] Kim HT (2007). Cumulative incidence in a competing risks setting and competing risks regression analysis. Clinical Cancer Research 13(2):559-565.
[26] Klein JP, Andersen PK (2005). Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics 61:223-229.
[27] Klein JP, Moeschberger ML (1997). Survival Analysis: Techniques for Censored and Truncated Data. Springer-Verlag, New York.
[28] Latouche A, Porcher R (2007). Sample size calculations in the presence of competing risks. Statistics in Medicine 26(30):5370-5380.
[29] Lunceford JK, Davidian M, Tsiatis AA (2002). Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics 58:48-57.
[30] Mantel N, Byar DP (1974). Evaluation of response-time data involving transient states: an illustration using heart-transplant data. J Am Stat Assoc 69:81-86.
[31] Maki E (2006). Power and sample size considerations in clinical trials with competing risk endpoints. Pharmaceutical Statistics 5(3):159-171.
[32] Latouche A, Porcher R, Chevret S (2004). Sample size formula for proportional hazards modelling of competing risks. Statistics in Medicine 23(21):3263-3274.
[33] Pencina MJ, D'Agostino RB (2004). Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in Medicine 23(13):2109-2123.
[34] Pintilie M (2002). Dealing with competing risks: testing covariates and calculating sample size. Statistics in Medicine 21:3317-3324.
[35] Pintilie M (2006). Competing Risks: A Practical Perspective. Wiley, New York.
[36] Putter H, Fiocco M, Geskus RB (2007). Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine 26(11):2389-2430.
[37] Schoenfeld D (1982). Residuals for the proportional hazards regression model. Biometrika 69(1):239-241.
[38] Simon R, Makuch RW (1984). A non-parametric graphical representation of the relationship between survival and the occurrence of an event: application to responder versus non-responder bias. Statistics in Medicine 3:35-44.
[39] Scrucca L, Santucci A, Aversa F (2007). Competing risk analysis using R: an easy guide for clinicians. Bone Marrow Transplantation 40(4):381-387.
[40] Scrucca L, Santucci A, Aversa F (2010). Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation. [Epub ahead of print, Jan 11.]
[41] Schwarz G (1978). Estimating the dimension of a model. The Annals of Statistics 6:461-464.
[42] Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010). Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 21(1):128-138.
[43] Therneau TM, Grambsch PM (2000). Modeling Survival Data: Extending the Cox Model. Springer-Verlag, New York.
[44] Therneau TM, Grambsch PM, Fleming TR (1990). Martingale-based residuals for survival models. Biometrika 77:147-160.
[45] Tsiatis A (1975). A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci USA 72(1):20-22.