Survival Analysis and Competing Risks Data Analysis
Survival Analysis
DFCI/BCB Training Session, January-February 2016
Haesook T Kim
Department of Biostatistics and Computational Biology
Dana-Farber Cancer Institute

Contents
1. Introduction
2. Terminologies and Notations
3. Nonparametric Estimation
4. Comparison of Survival Curves: Nonparametric Approach
5. Semiparametric Survival Methods: PH Model
6. Analysis of Competing Risks Data

1. Introduction
• Survival analysis is used to analyze data in which the time until an event is of interest.
• Survival analysis utilizes two measurements simultaneously: a binary response (i.e., occurrence of an event vs. no event) and a continuous response (i.e., time to the event). It allows incomplete measurement of the time to the event (i.e., censoring).
• The event should be clearly and precisely defined before the analysis. Examples of an 'event' in cancer clinical trials include death, disease recurrence after treatment, disease progression, treatment-related death, and incidence of new disease. Note: 'event' and 'failure' are often used interchangeably.
• The time origin should be unambiguously defined before the analysis. All individuals should be as comparable as possible at their time origin, e.g., date of randomization, date of documented CR, date of transplantation, date of study enrollment in prospective studies. The time origin (or time zero) is usually well defined in prospective studies, but not always well defined in retrospective studies (e.g., onset of thrombotic microangiopathy (TMA) after allogeneic transplant, readmission studies, the ES study, the Stanford Heart Study, and so on).
"While it might be more biologically meaningful to measure time from the first instant at which the patient's symptoms met certain criteria of severity, the difficulty of determining and the possibility of bias in such values would normally exclude their use as time origin. Such information might, however, be useful as an explanatory variable" (Cox and Oakes, Analysis of Survival Data).
• The time origin does not need to be, and usually is not, at the same calendar time for each individual. Most clinical trials have staggered entry, i.e., patients enter over a certain time period.
[Figure: staggered entry of patients between calendar years 2001 and 2010, with each patient's follow-up aligned to a common time origin (t=0 to t=10).]
• The timescale (the scale for measuring time) should be identical for all individuals, e.g., days, months, or years.
• Characteristics of failure time:
– Failure time is always non-negative.
– It has a skewed distribution and will never be normally distributed; thus reporting the mean survival time is not very meaningful.
– The probability of surviving past a certain time is often more relevant than the expected survival time. The expected survival time may be difficult to estimate if the amount of censoring is large.

1.1 Censoring
• Right-censoring: a failure time is censored if the failure (or event) has not occurred by the end of follow-up, i.e., the true unobserved event lies to the right of the censoring time.
δ = 1 if T ≤ C (uncensored or observed)
δ = 0 if T > C (censored)
where δ is the failure indicator, C is the censoring time, and T is the failure time. A small simulation illustrating this setup follows.
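A minimal R sketch (not from the slides) illustrating Y = min(T, C) and δ = I(T ≤ C); the exponential rates 0.10 and 0.05 are arbitrary values chosen for illustration only.

set.seed(1)
n  <- 10
ft <- rexp(n, rate = 0.10)        # latent failure times T (illustrative rate)
ct <- rexp(n, rate = 0.05)        # independent censoring times C (illustrative rate)
y     <- pmin(ft, ct)             # observed time Y = min(T, C)
delta <- as.numeric(ft <= ct)     # failure indicator delta = I(T <= C)
data.frame(y = round(y, 1), delta)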
• There are three types of right-censoring: Type I, Type II, and Type III.
– Type I censoring occurs when a study is planned to end after a predetermined time, so all Ci are the same (e.g., sacrificing all animals after one month).
– Type II censoring occurs when a study terminates after a predetermined number of events has been observed (e.g., terminate when X deaths have occurred in an animal study). Type II censoring is generally not a feasible design if there is staggered entry to the study. (See Kalbfleisch and Prentice for further details.)
– Type III censoring: C is a random variable, and δ is an indicator.
– We usually deal with random Type III right-censoring, in which each subject's event time is observed only if the event occurs before a certain time, but the censoring time can vary between subjects.
• Left-censoring: a failure time is censored if the failure is known to have occurred before a certain time.
δ = 1 if C ≤ T (observed)
δ = 0 if C > T (censored)
where δ is the failure indicator, C is the censoring time, and T is the failure time.
– In an HIV study, HIV infection may have occurred before an individual enters a cohort study on AIDS. If the time origin is unknown, it is left-censored.
– A study of the age at which African children learn a task: some already knew it (left-censored), some learned it during the study period, and some had not learned it by the end of the study (right-censored).
– A survey of smoking. Q: "When did you first smoke?" Answers are an exact age, never smoked, or smoked but cannot remember the exact time (left-censored).
• Interval-censoring: a combination of right- and left-censoring. E.g., in the nephropathy study for diabetics (Andersen et al., pages 30-31, Statistical Models Based on Counting Processes), for some individuals the exact time of onset of diabetic nephropathy (DN) was not observed; only the time last seen without DN and the time first seen with DN are available.
• Independent vs. informative censoring:
– T and C are independent (T ⊥ C, non-informative censoring) if the censoring distribution contains no information about the distribution of T. Note: we usually assume independence.
– Censoring is considered informative if the distribution of C contains any information about the parameters characterizing the distribution of T.

2. Terminologies and Notations
• T: time to an event.
• C: censoring time.
• Y = min(T, C): right-censored observed time.
• δ = I(T ≤ C): failure indicator, i.e., δ = 1 if Y = T; δ = 0 otherwise.
• X: a random vector of covariates. Covariates can be discrete or continuous; they can also be time-constant or time-varying.
• λ(t): hazard function
• Λ(t): cumulative hazard function
• S(t): (unconditional) survival function, S(t) = Prob(T ≥ t)
If there are covariates, then the conditional functions are
• λ(t|X): conditional hazard function
• Λ(t|X): conditional cumulative hazard function
• S(t|X): conditional survival function
The relationship of these conditional functions will be discussed later.

If T is a continuous random variable, then S(t) = 1 − F(t), where F(t) is the cumulative distribution function of T. The hazard function is
λ(t) = lim_{u→0} Prob(t ≤ T < t+u | T ≥ t) / u
     = lim_{u→0} Prob(t ≤ T < t+u and T ≥ t) / [Prob(T ≥ t) · u]
     = lim_{u→0} Prob(t ≤ T < t+u) / [Prob(T ≥ t) · u]
     = lim_{u→0} [F(t+u) − F(t)] / [S(t) · u] = [∂F(t)/∂t] / S(t) = f(t)/S(t),
where f(t) is the probability density function of T evaluated at t.
Furthermore,
f(t)/S(t) = −[∂S(t)/∂t] / S(t) = −∂log S(t)/∂t,
so that
λ(t) = −∂log S(t)/∂t.
Thus,
Λ(t) = ∫_0^t λ(v) dv = −log S(t), and S(t) = exp[−Λ(t)].

If T is a discrete random variable, then
λ(tk) = P(T = tk) / P(T ≥ tk) = f(tk)/S(tk), where
S(tk) = P(T ≥ tk) = P(T ≥ t1, T ≥ t2, ..., T ≥ tk)
      = P(T ≥ t1) P(T ≥ t2 | T ≥ t1) ... P(T ≥ tk | T ≥ tk−1)
      = P(T ≥ t1) ∏_{j=2}^{k} P(T ≥ tj | T ≥ tj−1)
      = ∏_{j=1}^{k} [1 − P(T = tj | T ≥ tj)] = ∏_{j=1}^{k} (1 − λj).

The survival probability at a certain time is a conditional probability of surviving beyond that time, given that an individual has survived just prior to that time. This conditional probability can be estimated in a study as the number of patients who are alive or event-free without loss to follow-up at that time divided by the number of patients who are alive just prior to that time. The Kaplan-Meier (KM) estimate of the survival probability is then the product of these conditional probabilities up until that time. We will discuss this further later.

2.1 Properties of the distribution of T
Tq is the time by which a fraction q of the subjects will fail (the q-th quantile), i.e., the value t such that S(t) = 1 − q:
Tq = S^{−1}(1 − q) = Λ^{−1}[−log(1 − q)].
The median life length, the time by which 50% of subjects will fail, is obtained by setting S(t) = 0.5:
T0.5 = S^{−1}(0.5) = Λ^{−1}[log(2)].
If the survival distribution is exponential, then λ(t) = λ (i.e., constant hazard over time), and thus
Λ(t) = λt, S(t) = exp[−Λ(t)] = exp(−λt), and T0.5 = log(2)/λ.
If the survival distribution is Weibull, then λ(t) = αγt^{γ−1}, and thus
Λ(t) = αt^γ, S(t) = exp(−αt^γ), and T0.5 = [log(2)/α]^{1/γ}.

3. Nonparametric Estimation
3.1 Kaplan-Meier Method
Since the true survival distribution is seldom known, it is useful to estimate the distribution without making any assumptions. Let Fn(t) be the usual empirical cumulative distribution function in the absence of censoring. Then a nonparametric estimator of S(t) is Sn(t) = 1 − Fn(t) based on the observed failure times T1, ..., Tn:
Sn(t) = [number of Tj > t]/n.
That is, Sn(t) is the fraction of observed failure times that exceed t. In the presence of censoring, however, S(t) can be estimated up until the end of follow-up time by the Kaplan-Meier product-limit estimator (1958, JASA). The product-limit estimator is a nonparametric maximum likelihood estimator, and the formula is as follows:
Ŝ(t) = ∏_{j: tj < t} (1 − λ̂j) = ∏_{j: tj < t} (1 − dj/nj),
where t1, t2, ..., tk are the unique event times, dj is the number of failures at tj, and nj is the number of subjects at risk just prior to tj. The KM estimator of Λ(t) is Λ̂(t) = −log Ŝ(t), and an estimator of the q-th quantile failure time is Ŝ^{−1}(1 − q).

Example: suppose a set of failure times is
10 30 30 50+ 70+ 90 100+
where + denotes a censored time. Then
---------------------------------------------------------------
ti     ni  di  ci  di/ni  (ni-di)/ni  est. of S(t)
---------------------------------------------------------------
0      7   0   0   0/7    7/7         1
10     7   1   0   1/7    6/7         1*(6/7)=0.85
30     6   2   0   2/6    4/6         0.85*(4/6)=0.57
50+    4   0   1   0/4    4/4         0.57*(4/4)=0.57
70+    3   0   1   0/3    3/3         0.57*(3/3)=0.57
90     2   1   0   1/2    1/2         0.57*(1/2)=0.29
100+   1   0   1   0/1    1/1         0.29*(1/1)=0.29
---------------------------------------------------------------

There are a few variance estimators for Ŝ(t). The Greenwood formula for the asymptotic variance of Ŝ(t) is
Var̂(Ŝ(t)) = Ŝ²(t) Σ_{j: tj ≤ t} dj / [nj(nj − dj)].
The Greenwood formula may be unstable in the tail of the distribution, and some authors have suggested alternatives. E.g., Tsiatis's estimator is
Var̂(Ŝ(t)) = Ŝ²(t) Σ_{j: tj ≤ t} dj / nj².
Once we estimate the survival function, we can construct pointwise (1−α)% confidence intervals:
Ŝ(t) ± z_{1−α/2} s.e.[Ŝ(t)].
However, this approach can yield bounds outside of [0, 1]. If this happens, replace values <0 with 0 and values >1 with 1. Another approach is taking a log-log transformation:
• L(t) = log(−log(S(t)))
• 95% CI: [L̂(t) − A, L̂(t) + A]
• Since S(t) = exp(−exp(L(t))), the confidence bounds for the 95% CI on S(t) are
[exp(−e^{L̂(t)+A}), exp(−e^{L̂(t)−A})]
• Substituting L̂(t) = log(−log(Ŝ(t))) back into the above bounds, we get
([Ŝ(t)]^{exp(A)}, [Ŝ(t)]^{exp(−A)})
• Replacing A with 1.96 s.e.(L̂(t)) gives [Ŝ(t)]^{exp(±1.96 s.e.(L̂(t)))}.

Example 1a: product-limit estimator of S(t).
I. SAS code:
proc lifetest data=one outsurv=outs timelist=(12, 24) plots=(s) graphics;
time time*status(0);
run;

The LIFETEST Procedure
Product-Limit Survival Estimates
Timelist  os_t     Survival  Failure  Std Error  Number Failed  Number Left
12.0000   11.6961  0.4742    0.5258   0.0375     94             77
24.0000   22.8994  0.3674    0.6326   0.0387     108            35

Summary Statistics for Time Variable os_t
Quartile Estimates
Percent  Point Estimate  95% CI [Lower   Upper)
75       53.3881         37.5524   .
50       9.1335          6.5380    16.0329
25       3.6468          2.4312    4.5010

Mean 22.1518   Standard Error 1.7612

Summary of the Number of Censored and Uncensored Values
Total  Failed  Censored  Percent Censored
181    114     67        37.02

proc print data=outs;   ** _CENSOR_=1 for censored observations;

Obs  time     _CENSOR_  SURVIVAL  SDF_LCL  SDF_UCL
91   11.4333  1         0.48027   .        .
92   11.6961  0         0.47419   0.40073  0.54765   (i.e., 0.47419 +/- 1.96*0.0375)
93   11.7290  1         0.47419   .        .
94   12.1889  1         0.47419   .        .
130  22.1109  1         0.37681   .        .
131  22.8994  0         0.36739   0.29164  0.44314
132  22.9651  1         0.36739   .        .
133  22.9651  1         0.36739   .        .
134  23.3593  1         0.36739   .        .
135  23.6879  1         0.36739   .        .
136  24.3778  1         0.36739   .        .
137  24.7721  0         0.35658   0.28015  0.43301

Note1: SAS PROC LIFETEST prints a pointwise 95% confidence interval of the KM estimate at each observed failure time. Use ALPHA=0.1 for 90% confidence intervals. The Greenwood formula is the default. These are pointwise confidence intervals for particular time points and should not be interpreted as a global confidence band.
Note2: survfit in R offers other options to calculate the confidence intervals (e.g., conf.type="log-log", "log", ...), as in the sketch below.
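A minimal R sketch (not from the slides) reproducing the hand-computed table above with survfit; conf.type = "plain" gives the untransformed Greenwood interval, and conf.type = "log-log" the transformed interval just described.

library(survival)
time   <- c(10, 30, 30, 50, 70, 90, 100)
status <- c( 1,  1,  1,  0,  0,  1,   0)   # 0 = censored (the "+" times)
fit <- survfit(Surv(time, status) ~ 1, conf.type = "plain")
summary(fit)   # estimates 0.857, 0.571, 0.286 match the table (to rounding)
# log-log transformed intervals, which always stay within [0, 1]:
summary(survfit(Surv(time, status) ~ 1, conf.type = "log-log"))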
ods graphics on;
ods rtf file="lifetest.rtf";
ods noptitle;
ods select SurvivalPlot;   * to select the survival curve only;
*proc lifetest data=one plots=survival(cl cb=hw strata=panel);   * check this out;
*proc lifetest data=one plots=survival(cl cb=hw);   * this produces output tables;
proc lifetest data=one plots=survival(cl cb=hw) notable;
time survtime*censor(1);
strata cell;
run;
ods rtf close;
ods graphics off;

[Figure: Kaplan-Meier curves of probability vs. months (0-36), shown with and without pointwise 95% confidence intervals.]

II. R code:
library(survival)
attach(subset(aml.data, grp==1))
par(mfrow=c(1,2))
fit <- survfit(Surv(os.t, os) ~ grp, type="kaplan-meier")
plot(fit, xlab="months", ylab="probability", mark.time=T, xlim=c(0,36), xaxt="n", col=4)
title("KM curve with 95% CI", cex = 0.7)
# axis(1, at = c(0, 6, 12, 18, 24, 30, 36), lwd=0.5)
axis(2, at = c(0, 0.2, 0.4, 0.6, 0.8, 1), lwd=0.5)
fit2 <- survfit(Surv(os.t, os) ~ grp, conf.type=c("none"), data=temp)
plot(fit2, xlab="months", ylab="probability", mark.time=T, xlim=c(0,36), xaxt="n", col=6)
title("KM curve without 95% CI", cex = 0.7)
# axis(1, at = c(0, 6, 12, 18, 24, 30, 36), lwd=2)
axis(2, at = c(0, 0.2, 0.4, 0.6, 0.8, 1), lwd=2)

3.2 Estimation of Tq: PROC LIFETEST, SAS v9.2
• Tq = min{tj | Ŝ(tj) < 1 − q}. If Ŝ(t) = 1 − q from tj to tj+1, then Tq is taken to be (tj + tj+1)/2 in SAS and R. Note: by definition Ŝ(t) = 1 − q, but in practice Ŝ(t) ≤ 1 − q, thus Tq = tj.
• Median survival time: T0.5 = min{tj | Ŝ(tj) < 0.5}.
• A confidence interval for the median survival time based on Brookmeyer and Crowley (1982) is the set of times t satisfying
Y = (Ŝ(t) − 0.5)² / Var̂(Ŝ(t)) ≤ cα,
where cα is the upper α-th percentile of a central chi-square distribution with 1 d.f., i.e., Prob(Y > cα) = α where Y ~ χ²₁ (e.g., Prob(Y > 3.84146) = 0.05).
• This methodology was further generalized to construct the confidence interval for Tq based on a g-transformed confidence interval for S(t) (Klein and Moeschberger, 1997):
|g(Ŝ(t)) − g(1 − q)| / [g′(Ŝ(t)) σ̂(Ŝ(t))] ≤ z_{1−α/2},
where g′(x) is the first derivative of g(x) and z_{1−α/2} is the 100(1 − α/2)th percentile of the standard normal distribution.
• Options for g(·): Linear (x), Log-Log (log(−log(x))), Arcsine-Square Root (sin^{−1}(√x)), Logit (log(x/(1−x))), Log (log(x)).
• Note: the default method may differ among software packages, so it is important to check the default setting in each package. The R side is sketched below.
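A minimal R sketch (not from the slides) of quantile estimation: quantile() on a survfit object returns Tq with confidence limits, and the limits (not the point estimates) change with the transformation chosen via conf.type. The lung data set shipped with the survival package is used only as a convenient example.

library(survival)
fit.ll <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log-log")
fit.lg <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log")
quantile(fit.ll, probs = c(0.25, 0.50, 0.75))   # T_.25, median, T_.75
quantile(fit.lg, probs = c(0.25, 0.50, 0.75))   # same estimates, different CIs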
proc lifetest data=one outsurv=out1 stderr timelist=(12,24);
time SurvTime*Censor(1);
data temp;
set out1;
ci_med=((SURVIVAL-0.5)/sdf_stderr)**2;
run;
proc print data=temp;
-------------------------------------------------------------------------
The LIFETEST Procedure
Product-Limit Survival Estimates
Timelist  SurvTime  Survival  Failure  Std Error  Number Failed  Number Left
12.0000   12.000    0.8759    0.1241   0.0282     17             120
24.0000   24.000    0.7518    0.2482   0.0369     34             103

Summary Statistics for Time Variable SurvTime
Quartile Estimates
Percent  Point Estimate  Transform  95% CI [Lower  Upper)   ** note [, ): excludes the next event
75       162.000         LOGLOG     132.000  231.000
50       80.000          LOGLOG     52.000   100.000
25       25.000          LOGLOG     18.000   33.000

Mean 132.777   Standard Error 15.368

Summary of the Number of Censored and Uncensored Values
Total  Failed  Censored  Percent Censored
137    128     9         6.57

Obs  SurvTime   _CENSOR_  SURVIVAL  SDF_STDERR  SDF_LCL  SDF_UCL  ci_med
32   45         0         0.63408   0.041228    0.54737  0.70863  10.58
33   48         0         0.62671   0.041403    0.53984  0.70174  9.37
34   49         0         0.61933   0.041567    0.53233  0.69483  8.24
35   51         0         0.59721   0.041998    0.50992  0.67399  5.36
36   52 (LL)    0         0.57509   0.042340    0.48769  0.65298  3.15
37   53         0         0.56772   0.042434    0.48031  0.64594  2.55
38   54         0         0.55298   0.042594    0.46562  0.63181  1.55
39   56         0         0.54560   0.042659    0.45830  0.62471  1.14
40   59         0         0.53823   0.042715    0.45100  0.61760  0.80
41   61         0         0.53086   0.042761    0.44372  0.61047  0.52
42   63         0         0.52348   0.042798    0.43645  0.60332  0.30
43   72         0         0.51611   0.042826    0.42921  0.59616  0.14
44   73         0         0.50874   0.042844    0.42198  0.58897  0.04
45   80         0         0.49399   0.042852    0.40759  0.57456  0.02
46   82         0         0.48662   0.042842    0.40041  0.56732  0.10
47   83         1         0.48662   .           .        .        .
48   84         0         0.47913   0.042832    0.39313  0.55997  0.24
49   87         0         0.47165   0.042812    0.38587  0.55261  0.44
50   87         1         0.47165   .           .        .        .
51   90         0         0.46404   0.042792    0.37849  0.54512  0.71
52   92         0         0.45643   0.042762    0.37113  0.53762  1.04
53   95         0         0.44122   0.042668    0.35647  0.52255  1.90
54   97         1         0.44122   .           .        .        .
55   99         0         0.42574   0.042552    0.34160  0.50718  3.05
56   100 (UL)   0         0.41799   0.042477    0.33419  0.49946  3.73
57   100        1         0.41799   .           .        .        .
58   103        0         0.41011   0.042401    0.32665  0.49161  4.49
59   103        1         0.41011   .           .        .        .
60   105        0         0.40207   0.042325    0.31896  0.48360  5.35

Note: when censored and failure times are tied, the failure time is assumed to occur before the censored time.

data one;   * Example when the estimate of S(t)=0.5;
input time event;
cards;
10 1
30 1
30 1
33 1
50 1
70 0
90 1
100 0
;
proc lifetest;
time time*event(0);
run;

Product-Limit Survival Estimates
time      Survival  Failure  Std Error  Number Failed  Number Left
0.000     1.0000    0        0          0              8
10.000    0.8750    0.1250   0.1169     1              7
30.000    .         .        .          2              6
30.000    0.6250    0.3750   0.1712     3              5
33.000    0.5000    0.5000   0.1768     4              4
50.000    0.3750    0.6250   0.1712     5              3
70.000*   .         .        .          5              2
90.000    0.1875    0.8125   0.1578     6              1
100.000*  .         .        .          6              0
NOTE: The marked survival times are censored observations.

Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  Transform  95% CI [Lower  Upper)
75       90.000          LOGLOG     30.000   .
50       41.500          LOGLOG     10.000   .
25       30.000          LOGLOG     10.000   50.000
*** median = (33+50)/2 = 41.5

> library(survival)
> attach(median_data)
> my.surv <- Surv(time, event)
> s1 <- survfit(Surv(time, event)~dummy)
> print(s1)
Call: survfit(formula = Surv(time, event) ~ dummy)
records  n.max  n.start  events  median  0.95LCL  0.95UCL
8.0      8.0    8.0      6.0     41.5    30.0     NA

NOTE: the median time is the same but the 95% CI is different between SAS and R. This is because SAS uses the Brookmeyer and Crowley method as the default and R uses a log transformation as the default.

proc lifetest conftype=log;
time time*event(0);
run;
Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  Transform  95% CI [Lower  Upper)
75       90.000          LOG        50.000   .
50       41.500          LOG        30.000   .
25       30.000          LOG        10.000   .

[Figure: four panels of the KM curve for this example: the median survival time where S(t)=0.5, and pointwise 95% CIs using the log-log transformation, the log transformation (the default in R), and 'plain' (no transformation, +/- 1.96*s.e.).]

3.3 Life table
• This is a precursor of the KM method.
• A life table is a summary of the survival data grouped into convenient time intervals. In some applications, data are collected in such a grouped form. In other cases, the data might be grouped to get a simpler and more easily understood presentation.
• The life table method is designed primarily for situations in which actual failure and censoring times are unavailable and only the number of failures and the number of censored cases are known in a given interval.
• A good example is the SEER (Surveillance Epidemiology and End Results) data from NCI. SEER provides information on cancer statistics to help reduce the burden of this disease on the U.S. population. They show the survival curve by type of cancer yearly.
• We will not discuss the details of this method further; see Section 3.6 in Lawless, "Statistical Models and Methods for Lifetime Data", for descriptions and examples. A small sketch of the computation follows the example below.

Example 1b: Life Table from Grouped Data. Stanford Heart Transplant Data. See also page 41 in Allison's Survival Analysis Using SAS.
data;
input time status number @@;
cards;
25 1 16     25 0 3
75 1 11     75 0 0
150 1 4     150 0 2
300 1 5     300 0 4
550 1 2     550 0 6
850 1 4     850 0 3
1150 1 1    1150 0 2
...
;
proc lifetest method=life intervals=50 100 200 400 700 1000 1300 1600 plots=(s, h) graphics;
time time*status(0);
freq number;
run;
Note: at time=50, there are 16 deaths and 3 censored. Thus Ŝ(t = 50) = (63−16)/63 = 0.75.
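A minimal base-R sketch (not from the slides) of the actuarial life-table computation for the grouped data above: within each interval the effective number at risk is n − c/2, and the survival estimate is the running product of the interval survival probabilities. Note that the quick check above, (63 − 16)/63, omits the c/2 adjustment that METHOD=LIFE applies.

deaths   <- c(16, 11, 4, 5, 2, 4, 1)      # events per interval (from the data step)
censored <- c( 3,  0, 2, 4, 6, 3, 2)      # censored per interval
at.risk  <- 63 - cumsum(c(0, head(deaths + censored, -1)))  # entering each interval
eff.n    <- at.risk - censored / 2        # actuarial effective sample size
surv     <- cumprod(1 - deaths / eff.n)   # life-table estimate of S(t)
round(surv, 3)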
4. Comparison of Survival Curves: Nonparametric Approach
If there are two (or more) treatment groups in a clinical trial, a natural question to ask is whether one treatment prolongs survival compared to the other, i.e., testing H0: S1(t) = S2(t) for all t. There are a few nonparametric tests to answer this question. These are:
• Log-rank test: Mantel-Haenszel test (default in PROC LIFETEST), Peto & Peto test
• Wilcoxon test: Gehan test (default in PROC LIFETEST), Peto and Prentice test
• Gray and Tsiatis log-rank test for cure-rate survival curves.
Note:
• The weighted log-rank test was suggested by many authors, but is not widely used.

Table. Weights available in PROC LIFETEST
Test                 W(ti)
Log-rank             1
Wilcoxon             ni
Tarone and Ware      √ni
Peto-Peto            S̄(ti)
Modified Peto-Peto   S̄(ti) ni/(ni+1)
Fleming-Harrington   [Ŝ(ti)]^p [1 − Ŝ(ti)]^q
where Ŝ(t) is the product-limit estimate at t and S̄(t) is a survivor function estimate.

• There is also a likelihood ratio test in PROC LIFETEST. This test assumes an exponential survival distribution, i.e., a constant hazard function.

4.1 Log-rank test
Let t1 < ... < tk represent the k ordered distinct failure times for the sample formed by pooling the two (or p) samples. At the j-th failure time tj, we have the following table:

Treatment   no. of deaths   no. alive      Total
1           d1j             n1j − d1j      n1j
2           d2j             n2j − d2j      n2j
Total       dj              nj − dj        nj

where d1j and d2j are the numbers of failures in treatments 1 and 2, respectively, at the j-th failure time, and n1j and n2j are the numbers of patients at risk in treatments 1 and 2, respectively, at the j-th failure time. Then the log-rank statistic over the k failure times is
w = Σ_{j=1}^{k} (d1j − n1j dj / nj) = Σ_{j=1}^{k} (o1j − e1j),
i.e., a sum of deviations of the observed number of events from the expected number of events over time.
Note: by symmetry, the absolute value of w for treatment 1 is the same as the absolute value for treatment 2. Thus, whether summing over treatment 1 or 2, we will have the same overall test statistic, since the numerator of the overall test statistic is the square of w.

If the k contingency tables were independent, the variance of the log-rank statistic w would be V = V1 + ... + Vk, where
V = Σ_{j=1}^{k} n1j n2j dj (nj − dj) / [nj² (nj − 1)],
and an approximate test of the equality of the two (or p) survival distributions is based on an asymptotic χ²₁ (or χ²_{p−1} for p samples) distribution for
w′ V^{−1} w.
Thus, the log-rank test is obtained by constructing a 2x2 table at each distinct failure time and comparing the failure rates between the two groups, conditional on the numbers at risk in the groups. The tables are then combined using the Cochran-Mantel-Haenszel test; this computation is sketched in R after this section.
The log-rank statistic is most powerful if the hazard ratios (or odds ratios) among the samples are constant over time (called 'proportional hazards'). Departure from proportional hazards can be checked by examining the estimated survival curves.
Note: Mantel and Haenszel (1959) proposed the summary statistic for K strata of 2x2 tables
M² = (|Σ n11k − Σ m11k| − 0.5)² / Σ V(n11k),
assuming the null hypothesis of conditional independence. This statistic has approximately a chi-squared distribution with df=1. Cochran (1954) proposed a similar statistic without the continuity correction (0.5), conditioned only on the group totals in each stratum, treating each 2x2 table as two independent binomials. Because of the similarity, the statistic is called the "Cochran-Mantel-Haenszel (CMH)" test. Later, the CMH test was generalized to IxJxK tables by Birch (1965), Landis et al. (1978), and Mantel and Byar (1978), and is called "the generalized CMH test".
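A minimal R sketch (not from the slides) of the computation just described: build the 2x2 table at each distinct failure time, accumulate w = Σ(o − e) and the hypergeometric variance V, and form w²/V. It assumes two groups coded 1 and 2 and should agree with survdiff().

library(survival)
logrank.by.hand <- function(time, status, group) {
  ft <- sort(unique(time[status == 1]))            # distinct failure times
  w <- V <- 0
  for (t in ft) {
    n1 <- sum(time >= t & group == 1)              # at risk in group 1
    n2 <- sum(time >= t & group == 2)              # at risk in group 2
    d  <- sum(time == t & status == 1)             # total deaths at t
    d1 <- sum(time == t & status == 1 & group == 1)
    n  <- n1 + n2
    w  <- w + (d1 - n1 * d / n)                    # observed minus expected
    if (n > 1) V <- V + n1 * n2 * d * (n - d) / (n^2 * (n - 1))
  }
  c(chisq = w^2 / V)
}
# For the leukemia data of Example 2A below, this should return 16.79,
# matching survdiff(Surv(time, dth) ~ grp)$chisq.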
4.2 Wilcoxon Test
The Wilcoxon statistic is
w = Σ_{j=1}^{k} rj (d1j − n1j dj / nj),
and the test statistic is
χ² = w′ V^{−1} w, where V = Σ_{j=1}^{k} rj² n1j n2j dj (nj − dj) / [nj² (nj − 1)].
The Wilcoxon statistic is a weighted log-rank statistic, i.e., the Wilcoxon test gives more weight to early times than to late times, since the weight rj, j = 1, ..., k, reflects the number of subjects at risk at each time and always decreases. Thus, it is less sensitive than the log-rank test to differences between groups that occur at later points in time.
Note: neither test is particularly good at detecting differences when survival curves cross.

4.3 Gray and Tsiatis Log-rank test for a cure rate model
Gray and Tsiatis (1989, Biometrics) proposed a linear rank test that tests the equality of survival distributions, giving more weight to later survival differences than does the log-rank test. Their proposed statistic is
Z_{−1} = Σ [KM(ti−)]^{−1} [ΔN1(ti) − p(ti)] / √( Σ [KM(ti−)]^{−2} p(ti)[1 − p(ti)] ),
where the weight [KM(t−)]^{−1} is the inverse of the left-continuous version of the Kaplan-Meier estimate of survival, N1(t) is the number of observed failures in group 1 by time t, and p(t) = Y1(t)/[Y1(t) + Y2(t)] is the expected number of deaths on treatment 1 at time t, given that one death occurs at t.
There is an in-house R program for this test (the latest version):
> library(mysurv)
> logrank(time, status, group, strata, rho=-1)
rho: specifies the value of rho in the G-rho test (Harrington and Fleming, 1982). rho=0 gives the log-rank test, rho=1 the Peto-Peto Wilcoxon test, and rho=-1 the test discussed by Gray and Tsiatis (1989). (A standard-R sketch of the G-rho family appears after Section 4.4.)

4.4 Stratified Log-rank Test
An overall test statistic is obtained by summing the log-rank statistics wh and the corresponding variances Vh obtained within each of s independent strata:
(Σ_{h=1}^{s} wh)′ (Σ_{h=1}^{s} Vh)^{−1} (Σ_{h=1}^{s} wh),
where w and V are defined above.
Note: in SAS PROC LIFETEST, use the TEST (not STRATA) statement for a stratified log-rank test.
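The G-rho family mentioned in Section 4.3 is also available in standard R through the rho argument of survdiff; a minimal sketch (not from the slides) follows. The aml data set shipped with the survival package is used only for illustration; the rho = -1 Gray-Tsiatis weighting is shown above with the in-house logrank().

library(survival)
survdiff(Surv(time, status) ~ x, data = aml, rho = 0)   # rho = 0: log-rank test
survdiff(Surv(time, status) ~ x, data = aml, rho = 1)   # rho = 1: Peto-Peto Wilcoxon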
Example 2A: Log-rank test
a. Leukemia data from Table 1.1 in Cox and Oakes (1984)
control: 1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
6-MP: 6* 6 6 6 7 9* 10* 10 11* 13 16 17* 19* 20* 22 23 25* 32* 32* 34* 35*
*: censored
Let us reconstruct the data:

Failure   Control           6-MP
Time      d0   c0   n0      d1   c1   n1
1         2    0    21      0    0    21
2         2    0    19      0    0    21
3         1    0    17      0    0    21
4         2    0    16      0    0    21
5         2    0    14      0    0    21
6         0    0    12      3    1    21
7         0    0    12      1    0    17
8         4    0    12      0    0    16
9         0    0    8       0    1    16
10        0    0    8       1    1    15
11        2    0    8       0    1    13
12        2    0    6       0    0    12
13        0    0    4       1    0    12
15        1    0    4       0    0    11
16        0    0    3       1    0    11
17        1    0    3       0    1    10
19        0    0    2       0    1    9
20        0    0    2       0    1    8
22        1    0    2       1    0    7
23        1    0    1       1    0    6
25        0    0    0       0    1    5

Calculating the log-rank statistic by hand:

Failure   Control        Combined
Time      d0    n0       d0+1   n0+1    e      o − e   var
1         2     21       2      42      1.00   1.00    0.488
2         2     19       2      40      0.95   1.05
3         1     17       1      38      0.45   0.55
4         2     16       2      37      0.86   1.14
5         2     14       2      35
6         0     12       3      33
7         0     12       1      29
8         4     12       4      28
9         0     8        0      24
10        0     8        1      23
11        2     8        2      21
12        2     6        2      18
13        0     4        1      16
15        1     4        1      15
16        0     3        1      14
17        1     3        1      13
19        0     2        0      11
20        0     2        0      10
22        1     2        2      9
23        1     1        2      7
25        0     0        0      5
Sum                                            10.251  6.257

where o = d0, e = d0+1 · n0/n0+1, and var = d0+1 (n0 n1/n0+1²)(n0+1 − d0+1)/(n0+1 − 1).
χ²₁ = (10.251)²/6.257 = 16.793

SAS codes
proc freq;
tables time*grp*status / cmh nopercent norow nocol;
weight cnt;

Table 1 of grp by status, controlling for time=1
Frequency | Dead | Alive | Total
Control   | 2    | 19    | 21
6-MP      | 0    | 21    | 21
Total     | 2    | 40    | 42

Table 2 of grp by status, controlling for time=2
Frequency | Dead | Alive | Total
Control   | 2    | 17    | 19
6-MP      | 0    | 21    | 21
Total     | 2    | 38    | 40
.......

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic  Alternative Hypothesis   DF  Value    Prob
1          Nonzero Correlation      1   16.7929  <.0001
2          Row Mean Scores Differ   1   16.7929  <.0001
3          General Association      1   16.7929  <.0001   ** Log-rank test
--------------------------------------------------------------------
proc lifetest;
time time*dth(0);
strata grp;
Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   16.7929     1   <.0001
Wilcoxon   13.4579     1   0.0002
-2Log(LR)  16.4852     1   <.0001

R codes
> library(survival)
> attach(leukemia_data)
> logrank <- survdiff(Surv(time, dth) ~ grp)
> print(logrank)
Call: survdiff(formula = Surv(time, dth) ~ grp)
          N   Observed  Expected  (O-E)^2/E  (O-E)^2/V
grp=6-MP  21  9         19.3      5.46       16.8
grp=cnt   21  21        10.7      9.77       16.8
Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05

> cox1 <- coxph(Surv(time, dth)~factor(grp), method="exact")
> print(summary(cox1))
Call: coxph(formula = Surv(time, dth) ~ factor(grp), method = "exact")
n= 42
               coef     exp(coef)  se(coef)  z       Pr(>|z|)
factor(grp2)1  -1.6282  0.1963     0.4331    -3.759  0.000170 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
               exp(coef)  exp(-coef)  lower .95  upper .95
factor(grp2)1  0.1963     5.095       0.08398    0.4587
Rsquare= 0.321 (max possible= 0.98 )
Likelihood ratio test = 16.25 on 1 df,  p=5.544e-05
Wald test             = 14.13 on 1 df,  p=0.0001704
Score (logrank) test  = 16.79 on 1 df,  p=4.169e-05
Note: the score test from coxph is the log-rank test.

more examples
b.
proc lifetest data=one timelist=(12, 24);
time time*status(0);
strata group;
title "Overall Survival";
** Also, use: strata group treatment;  ** for a comparison of 4 subsets;
** or: strata age (30, 40, 50);  ** for a comparison of age intervals (0,30), [30,40), [40,50), [50+);

The LIFETEST Procedure
Stratum 1: group = Mini
Product-Limit Survival Estimates
Timelist  time     Survival  Failure  Std Error  Number Failed  Number Left
12.0000   9.2649   0.4694    0.5306   0.0713     26             21
24.0000   16.5257  0.2701    0.7299   0.0707     33             4
*** 95% CI for 2-year survival: (0.27-1.96*0.0707, 0.27+1.96*0.0707) = (0.13, 0.409).
Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  95% CI [Lower  Upper)
75       24.7721         14.6201  .
50       9.1335          6.0123   15.3758
25       4.6653          3.9754   6.1437

Stratum 2: group = STD
Product-Limit Survival Estimates
Timelist  time     Survival  Failure  Std Error  Number Failed  Number Left
12.0000   11.6961  0.4771    0.5229   0.0440     68             56
24.0000   22.8994  0.4036    0.5964   0.0453     75             31

Summary Statistics for Time Variable time
Quartile Estimates
Percent  Point Estimate  95% CI [Lower  Upper)
75       .               37.6181  .
50       9.9220          5.8152   22.8994
25       2.6283          1.5113   4.4025

Summary of the Number of Censored and Uncensored Values
Stratum  group  Total  Failed  Censored  Percent Censored
1        Mini   49     34      15        30.61
2        STD    132    80      52        39.39
------------------------------------------------------------
Total           181    114     67        37.02

Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   0.4374      1   0.5084
Wilcoxon   0.2844      1   0.5939
-2Log(LR)  4.6428      1   0.0312

II. R code
> library(survival)
> logrank <- survdiff(Surv(time, status) ~ group, data=aml.data)
> print(logrank)
      N    Observed  Expected  (O-E)^2/E
Mini  49   34        30.9      0.3100
STD   132  80        83.1      0.1153
Chisq= 0.4 on 1 degrees of freedom, p= 0.5084

fit1 <- survfit(Surv(time, status) ~ group, conf.type=c("none"), data=aml.data)
plot(fit1, xlab=c("Months"), ylab=c("Probability"), mark.time=T, xlim=c(0,36), xaxt="n", col=c(3,6), lwd=2)
title("Overall Survival: AML/MDS, STD vs. NST", cex = 1.2)
axis(1, at = c(0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72))
axis(2, at = c(0, 0.2, 0.4, 0.6, 0.8, 1), lwd = 2)
leg.txt <- c("NST", "STD")   *** Note: NST=mini
legend(60, 0.9, leg.txt, lty = c(1:2), lwd = 2, col = c(3, 6))

[Figure: "Overall Survival: AML/MDS, STD vs. NST" — KM curves of probability vs. months (0-72) for the NST and STD groups.]

b. Veterans Administration (VA) Lung Cancer Data
This was a clinical trial with 137 male patients with advanced inoperable lung cancer. The endpoint was time to death. There were 6 covariates measured at the time of randomization: cell type (squamous cell, large cell, small cell, and adenocarcinoma), Karnofsky performance status, time in months from diagnosis to the start of therapy, age in years, prior therapy (yes/no), and treatment (experimental vs. standard therapy).
proc lifetest data=one notable;
time SurvTime*Censor(1);
strata Cell / test=logrank adjust=sidak;
run;
Test of Equality over Strata
Test      Chi-Square  DF  Pr > Chi-Square
Log-Rank  25.4037     3   <.0001
** other test= options: Fleming, none, LR, MODPETO, PETO, Wilcoxon, Tarone;

Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison           p-Values
Cell    Cell       Chi-Square  Raw     Sidak
adeno   large      7.8476      0.0051  0.0301
adeno   small      0.4843      0.4865  0.9817
adeno   squamous   15.0560     0.0001  0.0006
large   small      8.9284      0.0028  0.0167
large   squamous   0.8739      0.3499  0.9245
small   squamous   14.8237     0.0001  0.0007
-------------------------------------------------------------------------------
proc lifetest data=one notable;
time SurvTime*Censor(1);
strata Cell / test=logrank adjust=sidak diff=control('adeno');
run;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison          p-Values
Cell      Cell    Chi-Square  Raw     Sidak
large     adeno   7.8476      0.0051  0.0152
small     adeno   0.4843      0.4865  0.8646
squamous  adeno   15.0560     0.0001  0.0003

c. CLL data
[Figure: KM curves of probability vs. years from diagnosis (0-25) for Groups 1, 2, and 3.]

1. multiple comparisons:
proc lifetest data=one plot=s timelist=(10,20);
time os_t_dx*os(0);
*strata grp / test=logrank adjust=sidak diff=control("2");
strata grp / test=logrank adjust=sidak;
run;

Group  Total  Failed  Censored  Percent Censored
1      64     17      47        73.44
2      99     5       94        94.95
3      15     2       13        86.67
-------------------------------------------------------------------------------
Total  178    24      154       86.52

Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
1      2      21.8311     <.0001  <.0001
3      2      6.4525      0.0111  0.0220
*******************************************************************;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
1      2      21.8311     <.0001  <.0001
1      3      15.4962     <.0001  0.0002
2      3      6.4525      0.0111  0.0329

2. two group comparison I
proc lifetest data=one plot=s timelist=(10,20);
where grp in ("2", "3");
time os_t_dx*os(0);
strata grp / test=logrank adjust=sidak diff=control("2");
*strata grp;   * same statement;
run;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
3      2      0.3624      0.5472  0.5472

3. two group comparison II
proc lifetest data=one plot=s timelist=(10,20);
where grp in ("1", "3");
time os_t_dx*os(0);
strata grp / test=logrank adjust=sidak diff=control("3");
*strata grp;   * same statement;
run;
Adjustment for Multiple Comparisons for the Logrank Test
Strata Comparison      p-Values
Group  Group  Chi-Square  Raw     Sidak
3      1      4.2057      0.0403  0.0403

Note: options for adjust= are bonferroni, dunnett, scheffe, sidak, SMM, GTE, Tukey.

Example 3: Stratified Log-Rank test.
a.
proc lifetest data=one timelist=(12, 24);
time time*status(0);
strata age50;   *** 1 if age>=50, 0 else;
test group2;
Note: the log-rank and Wilcoxon statistics produced by the TEST statement are first calculated within each stratum and then averaged across strata. These are stratified statistics that control for age50, i.e., the rank test for group was calculated for age<50 and age≥50 separately and then averaged.
Rank Tests for the Association of os_t with Covariates Pooled over Strata
Univariate Chi-Squares for the Wilcoxon Test
Variable  Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square
group2    -5.3943         3.0784              3.0706      0.0797
Forward Stepwise Sequence of Chi-Squares for the Wilcoxon Test
Variable  DF  Chi-Square  Pr > Chi-Square  Chi-Square Increment  Pr > Increment
group2    1   3.0706      0.0797           3.0706                0.0797

Univariate Chi-Squares for the Log-Rank Test
Variable  Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square
group     -4.2870         4.2820              1.0024      0.3167
Covariance Matrix for the Log-Rank Statistics
Variable  group2
group2    18.3357
Forward Stepwise Sequence of Chi-Squares for the Log-Rank Test
Variable  DF  Chi-Square  Pr > Chi-Square  Chi-Square Increment  Pr > Increment
group     1   1.0024      0.3167           1.0024                0.3167

> library(survival)
> fit1 <- survdiff(Surv(time, status) ~ group+strata(age50), data=aml.data)   # stratified log-rank test
               N    Observed  Expected  (O-E)^2/E  (O-E)^2/V
factor(grp)=0  132  80        75.7      0.243      1.00
factor(grp)=1  49   34        38.3      0.480      1.00
Chisq= 1 on 1 degrees of freedom, p= 0.316

b.
proc lifetest data=one timelist=(12, 24);
time time*status(0);
strata group age50;

Legend for Strata
Stratum  group  age50
1        Mini   <50
2        Mini   >=50
3        STD    <50
4        STD    >=50

Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   14.6157     3   0.0022
Wilcoxon   17.5549     3   0.0005
-2Log(LR)  17.4462     3   0.0006

[Figure: "Progression-free Survival: AML/MDS, STD vs. NST" — KM curves of probability vs. months (0-72) for the four strata NST age<50, NST age>=50, STD age<50, STD age>=50.]

d. Veterans Administration (VA) Lung Cancer Data
title 'VA Lung Cancer Data';
symbol1 c=blue; symbol2 c=orange; symbol3 c=green; symbol4 c=red; symbol5 c=cyan; symbol6 c=black;
proc lifetest plots=(s,ls,lls) outtest=Test maxtime=600;
time SurvTime*Censor(1);
id Therapy;   *** to identify the type of therapy for each observation in the PL estimates;
strata Cell;
test Age Prior DiagTime Kps Treatment;   ** testing multiple variables stratified by Cell;
run;

** Output of the TEST statement
Rank Tests for the Association of SurvTime with Covariates Pooled over Strata
Univariate Chi-Squares for the Wilcoxon Test
Variable   Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square  Label
Age        14.4158         66.7598             0.0466      0.8290           age in years
Prior      -26.3997        28.9150             0.8336      0.3612           prior treatment?
DiagTime   -82.5069        72.0117             1.3127      0.2519           months till randomization
Kps        856.0           118.8               51.9159     <.0001           karnofsky index
Treatment  -3.1952         3.1910              1.0027      0.3167           treatment indicator

Univariate Chi-Squares for the Log-Rank Test
Variable   Test Statistic  Standard Deviation  Chi-Square  Pr > Chi-Square  Label
Age        -40.7383        105.7               0.1485      0.7000           age in years
Prior      -19.9435        46.9836             0.1802      0.6712           prior treatment?
DiagTime   -115.9          97.8708             1.4013      0.2365           months till randomization
Kps        1123.1          170.3               43.4747     <.0001           karnofsky index
Treatment  -4.2076         5.0407              0.6967      0.4039           treatment indicator
Plot options:
• s: estimated survivor function against time
• h: estimated hazard function against time
• ls: estimated −log S(t) function against time   ** this is the cumulative hazard **
• lls: estimated log(−log S(t)) function against log time
• Note1: if exponential, −log S(t) = λt and log(−log S(t)) = log(λ) + log(t). Thus, log(−log S(t)) is a linear function of log(t).
• Note2: the ls and lls plots provide an empirical check of the appropriateness of the exponential or Weibull model. If the exponential model is appropriate, the ls curve should be approximately linear through the origin. If the Weibull model is appropriate, the lls curve should be approximately linear, since the Weibull survival function is S(t) = exp(−a·t^r).

4.5 Mantel-Byar Test (JASA, 1974)
• What if group membership is determined during the study follow-up (i.e., is time-dependent)?
• In the Stanford Heart Transplant data, the group membership for 'transplant' vs. 'no transplant' is determined during the study follow-up, and patients receiving a heart transplant must have at least survived from the time of diagnosis to the time of transplant (thus 'time-to-transplant bias'), whereas no such requirement is necessary for the control subjects.
• In a comparison of survival time between responders and non-responders, responders must live long enough to achieve a response.
• Thus, a 'naive' analysis from the start of the treatment (or transplant) ignores this 'lead-time' bias. The bias comes from the increasing probability of observing a response or undergoing transplant with longer follow-up or wait time, i.e., the response/transplant group membership is dependent on the length of follow-up.
• The Mantel-Byar test is a simple modification of the log-rank test to test for a group difference when group membership changes over time (i.e., a time-dependent variable) - a transient-state analysis. See Example 3B.
• KM curves do not work here, since the number of subjects at risk is not necessarily decreasing.
• Simon and Makuch (Stat. Med. 1984) proposed a graphical presentation using a multi-state survival model: the cumulative hazard is calculated within each state and then transformed into a KM-type survival estimator. A minimal time-dependent Cox sketch follows.
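A minimal R sketch (not from the slides) of the time-dependent analysis behind the Mantel-Byar test, using the counting-process form of coxph on the Stanford heart data shipped with the survival package (already in (start, stop] format with the time-dependent covariate transplant). As noted in Example 3B below, the score test from such a fit is the Mantel-Byar test.

library(survival)
fit <- coxph(Surv(start, stop, event) ~ transplant, data = heart)
summary(fit)   # the "Score (logrank) test" line is the Mantel-Byar chi-square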
4.6 Landmark Analysis
• In the initial paper by Anderson et al (JCO, 1983), survival time between "responders" and "non-responders" was compared from the start of the study, although response status was not determined at the start of the study. To correct this 'time-to-response' bias (also called "lead-time bias" or "guarantee-time bias"), the landmark analysis was introduced.
• The goal is to estimate, in an unbiased way, the time-to-event probabilities in each group conditional on the group membership of patients at a specific time point (the landmark time).
• Landmark analysis:
– Select a fixed time point.
– Include only patients alive at the landmark time in the analysis.
– Conduct a usual survival analysis (see the sketch after this section).
• Advantages:
– Simple execution and use of standard survival analysis.
– Unbiased estimation.
– Correct conditional statistical tests.
• Limitations:
– Arbitrary selection of the landmark time (select the landmark a priori).
– Exclusion of early events that occurred prior to the landmark time, and possibly data-driven results.
– Lack of generalizability due to the conditional estimates.
– Misclassification at longer follow-up times for an early landmark time; exclusion of a high proportion of events for a late landmark time.
– Lack of randomization property: the group membership can be confounded with patient characteristics, e.g., responders are good-prognosis patients.
• An alternative method is to use Cox models with a time-dependent covariate.
• See Dafni (Circ Cardiovasc Qual Outcomes 2011) for a review.
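A minimal R sketch (not from the slides) of the landmark procedure above, assuming a hypothetical data frame dat with follow-up time os.t (months), event indicator os, and a response indicator resp whose value is known by the landmark time.

library(survival)
landmark <- 6                                    # landmark time, chosen a priori
lm.dat <- subset(dat, os.t >= landmark)          # keep only patients still at risk
lm.dat$os.t <- lm.dat$os.t - landmark            # restart the clock at the landmark
survdiff(Surv(os.t, os) ~ resp, data = lm.dat)   # conditional log-rank test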
Example 3B: Mantel-Byar test. Stanford Heart Transplant data taken from Mantel and Byar (1974).

Failure   Transplant   No Transplant   Loss to
time      n1    d1     n0    d0        Transplant
0         0     0      68    1         2
1         2     0      65    2         2
2         4     0      61    3         4
5         8     1      54    2         0
7         6     0      52    1         1
8         7     0      50    1         1
11        8     0      48    1         3
15        11    1      44    1         3
17        13    0      40    1         9
27        22    1      28    0         3
34        24    0      25    1         0
35        24    0      24    1         2
36        26    0      21    1         1
38        27    1      19    0         0
39        25    0      19    2         2
43        27    1      15    0         0
44        26    2      15    0         1
49        25    0      14    1         0
50        25    1      13    0         2
57        26    1      11    0         1
60        26    1      10    0         0
65        25    1      10    0         1
67        25    1      9     0         0
68        24    0      9     1         1
71        25    2      7     0         0
76        22    1      7     0         0
77        21    1      7     0         2
84        22    0      5     1         0
99        22    1      4     0         0
101       21    0      4     1         0
148       20    0      3     1         0
152       20    1      2     0         0
187       19    1      2     0         0
218       17    1      2     0         1
284       16    1      1     0         0
339       15    1      1     0         0
674       9     1      1     0         0
732       7     1      1     0         0
851       6     1      1     0         0
1032      3     1      1     0         0
Total           26           23        42

Σo = 26, Σe = 26.575, Σvar = 7.349
χ²₁ = (26 − 26.575)²/7.349 = 0.045

proc freq;
tables time*status*grp / nopercent norow nocol cmh;
weight cnt;

Table 1 of status by grp, controlling for time=0
Frequency | NT  | T  | Total
0         | 67  | 0  | 67
1         | 1   | 0  | 1
Total     | 68  | 0  | 68

Table 2 of status by grp, controlling for time=1
Frequency | NT  | T  | Total
0         | 63  | 2  | 65
1         | 2   | 0  | 2
Total     | 65  | 2  | 67

Table 3 of status by grp, controlling for time=2
Frequency | NT  | T  | Total
0         | 58  | 4  | 62
1         | 3   | 0  | 3
Total     | 61  | 4  | 65

Table 5 of status by grp, controlling for time=5
Frequency | NT  | T  | Total
0         | 52  | 7  | 59
1         | 2   | 1  | 3
Total     | 54  | 8  | 62
....

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic  Alternative Hypothesis   DF  Value   Prob
1          Nonzero Correlation      1   0.0450  0.8320
2          Row Mean Scores Differ   1   0.0450  0.8320
3          General Association      1   0.0450  0.8320   ** MB test

NOTE: the MB test can also be obtained by fitting a Cox model with a time-dependent covariate; the MB test is the score test (see the sketch in Section 4.5).

5. Survival Models
5.1 Proportional Hazards Model - a semiparametric model
• Let λ(t|X) represent the hazard function at time t for an individual with basic covariates X. The relative risk model by Cox (1972) is
λ(t|X) = λ0(t) exp(X′β)   (1)
where λ0(·) is an arbitrary, unspecified baseline hazard function that depends on time t (also called an underlying hazard function, or the hazard function for a standard subject), and X′β = β1X1 + β2X2 + ... + βkXk is the regression effect of the covariates X1, X2, ..., Xk, which is independent of time t.
• This is a PH structure with a log-linear model.
• Any parametric function (e.g., Weibull) can be used for λ0(t). In the PH model (1), λ0(t) doesn't need to be specified to estimate β.
• (1) can also be written in terms of the cumulative hazard and survival functions:
Λ(t|X) = Λ0(t) exp(X′β)
S(t|X) = exp[−Λ0(t) exp(X′β)] = exp[−Λ0(t)]^{exp(X′β)} = S0(t)^{exp(X′β)}
where Λ0(t) is the underlying cumulative hazard function, S0(t) the underlying survival function, and S(t|X) the probability of surviving past time t given the values of the predictors X.

5.2 PH Model Assumptions
The PH model can be linearized with respect to Xβ:
log λ(t|X) = log λ0(t) + Xβ
log Λ(t|X) = log Λ0(t) + Xβ
1. Linearity and additivity of the predictors with respect to the log hazard or log cumulative hazard.
2. Proportional hazards, i.e., no time-by-predictor interaction: the predictors have the same effect on the hazard function at all values of t. The relative hazard function exp(Xβ) is constant over time. For example, the hazard ratio between age 30 and age 40 is the same as the hazard ratio between age 70 and age 80.
3. Exponential link function.

5.3 Interpretation of Parameters
The regression coefficient for Xj, βj, is the increase in the log hazard or log cumulative hazard at any fixed point in time if Xj is increased by one unit and all other predictors are held constant, i.e.,
βj = log λ(t|X1, ..., Xj + 1, Xj+1, ..., Xk) − log λ(t|X1, ..., Xj, Xj+1, ..., Xk),
which is the log of the ratio of the hazards at time t. The ratio of hazards for an individual with predictor values X* compared to an individual with predictor values X is
X* : X hazard ratio = [λ0(t) exp(X*β)] / [λ0(t) exp(Xβ)] = exp(X*β)/exp(Xβ) = exp[(X* − X)β].
If there is only one binary predictor X1, the PH model can be written
λ(t|X1 = 0) = λ0(t)
λ(t|X1 = 1) = λ0(t) exp(β1).
Here exp(β1) is the hazard ratio of X1 = 1 vs. X1 = 0. Note that the PH assumption between X1 = 1 and X1 = 0 needs to be examined, but there is no need to check the linearity assumption. If the single predictor X1 is continuous, the model becomes
λ(t|X1) = λ0(t) exp(β1X1).
A small numeric sketch of this interpretation follows.
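A minimal R sketch (not from the slides), assuming the AML/MDS data frame aml.data used in the examples below: exp(β) for age is the hazard ratio for a one-unit (one-year) increase, and a ten-year increase has hazard ratio exp(10β).

library(survival)
fit <- coxph(Surv(os.t, os) ~ age, data = aml.data)
exp(coef(fit))        # hazard ratio per one year of age
exp(10 * coef(fit))   # hazard ratio per ten years of age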
5.4 Estimation of Parameters
Assuming the PH model (1) is correct, Cox (1972, 1975) proposed a partial likelihood approach to estimate β, since the unknown baseline hazard function λ0(t) prevents constructing a full likelihood function. Cox argued that if the PH model holds, information about λ0(t) is not useful for estimating the parameter of primary interest, β.

Let t1 < t2 < ... < tk (k < n) be the distinct ordered uncensored observations of the n subjects in a sample, assuming no tied uncensored observations, and let ni be the set of subjects at risk at time ti. The conditional probability that the i-th subject is the one that fails at time ti, given that only one subject fails among those at risk at ti, is
Prob(subject i fails at ti | only one subject fails among ni at ti)
= Prob(subject i fails at ti | ni) / Prob(only one fails at ti | ni)
= λ0(ti) exp(Xiβ) / Σ_{j∈ni} λ0(ti) exp(Xjβ)
= exp(Xiβ) / Σ_{j∈ni} exp(Xjβ)
= exp(Xiβ) / Σ_{Yj ≥ ti} exp(Xjβ).
Thus, the partial likelihood is
PL(β) = ∏_{i: Yi uncensored} exp(Xiβ) / Σ_{Yj ≥ Yi} exp(Xjβ) = ∏_i exp(Xiβ) / Σ_{j∈ni} exp(Xjβ).

• Note that λ0(t) cancels out of the numerator and denominator; thus the ratio of hazards is constant over time.
• Since the likelihood function does not make direct use of the censored and uncensored survival times and does not require the specification of λ0(t), it is referred to as a partial likelihood function.
• Since it is computationally more convenient to maximize the log-likelihood, and approximations to the variance of the MLE can be obtained from the second derivatives, the log partial likelihood function can be written as
l(β) = log PL(β) = Σ_{i: Yi uncensored} [Xiβ − log( Σ_{Yj ≥ Yi} exp(Xjβ) )].
• Differentiating this function with respect to β gives the p×1 score vector
U(β) = ∂l(β)/∂β,
and the MLE of β can be obtained by solving U(β) = 0. The negative second derivative of this function is the p×p information matrix
I(β) = −∂²l(β)/∂β²,
and the variance of β̂ is the inverse of the expected information matrix.
• β̂ is consistent and asymptotically normally distributed with mean β and variance {E[I(β)]}^{−1}. Because the expected information matrix is a function of β, which is unknown, the observed information is used to calculate the estimate of β and its variance.
• Both SAS and S-plus/R use the Newton-Raphson algorithm to maximize the log partial likelihood equation.
• Furthermore, the estimates depend only on the ranks of the event times, and are therefore robust and scale-invariant.
• Estimators for λ0(t):
– Nelson-Aalen estimator: Λ̂0(t) = Σ_{j: tj < t} dj/nj
– Breslow (1972) estimator: Λ̂0(t) = Σ_{j: tj < t} dj / Σ_{i∈nj} exp(Xiβ̂)
– Kalbfleisch-Prentice (1973) estimator: reduces to KM without covariates; the default in PROC PHREG.
• If we have the MLE of λ0(t), the MLEs for λ(t|X), Λ(t|X), and S(t|X) are (see the sketch after this list)
λ̂(t|X) = λ̂0(t) exp(Xβ̂)
Λ̂(t|X) = Λ̂0(t) exp(Xβ̂)
Ŝ(t|X) = exp[−Λ̂0(t) exp(Xβ̂)] = Ŝ0(t)^{exp(Xβ̂)}.
• Advantages of the Cox model:
– β can be estimated without any distributional assumption for λ0(t); hence the model is semiparametric.
– Easy to incorporate time-dependent covariates.
– Permits a stratified analysis that controls for nuisance variables.
– Can accommodate both discrete and continuous measurements of event times.
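A minimal R sketch (not from the slides) of recovering these quantities from a fitted model: survfit() on a coxph object evaluates Ŝ(t|X) = Ŝ0(t)^{exp(Xβ̂)} at the covariate values given in newdata, and basehaz() returns the estimated cumulative baseline hazard. The lung data set is used only for illustration.

library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
survfit(fit, newdata = data.frame(age = c(50, 70)))   # predicted survival curves
head(basehaz(fit, centered = FALSE))                  # estimated Lambda_0(t)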
5.5 Testing the Global/Local Null Hypothesis: Wald, Score, and Likelihood Ratio Tests
• Likelihood ratio test (LRT): χ_LR = 2[l(β̂) − l(β(0))], where β(0) = 0 is the initial value of β̂. For an individual variable, the model needs to be fit twice, with and without the variable of interest.
• Wald test: χ_W = (β̂ − β(0))′ Î (β̂ − β(0)), where Î = I(β̂) is the estimated information matrix at the solution of I(β). This is a direct use of the MLE β̂. For an individual variable, both SAS and R/S-plus outputs provide the Wald test. Confidence intervals for hazard ratios of individual variables are usually calculated from the Wald statistic (e.g., 95% CI: exp(β̂ ± 1.96 s.e.(β̂))).
• Score test: χ_S = U(β(0))′ I(β(0))^{−1} U(β(0)). For an individual variable, this test also requires fitting two models, with and without the variable of interest, since it is a test at β = 0. When there is a single categorical variable, the score test is identical to the log-rank test.
• Under mild assumptions, each test statistic has a χ² distribution with p d.f. under the null hypothesis, where p is the dimension of β.
• They are asymptotically equivalent but may differ in finite samples.
• The direct use of the MLE β̂ (i.e., the Wald test) has the advantage of simple presentation (i.e., it is convenient), but it is not invariant under reparametrization and does not behave well if the likelihood has an unusual shape.
• In finite samples, the LRT is the most reliable and is recommended, although it requires fitting the model twice.
• Missing values in the data set can be an issue when computing the LR test for a variable.
• For testing the global hypothesis, both SAS and R/S-plus output provide all three test statistics, as in the sketch below.
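A minimal R sketch (not from the slides): summary() of a coxph fit reports the global likelihood ratio, Wald, and score tests, and anova() of two nested fits gives the LRT for an individual variable, fitting the model twice as noted above. The lung data set is used only for illustration.

library(survival)
fit0 <- coxph(Surv(time, status) ~ age,       data = lung)
fit1 <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit1)       # LR, Wald, and score tests of the global null
anova(fit0, fit1)   # likelihood ratio test for sex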
Example 4: Cox model

1. SAS: PROC PHREG.

proc phreg data=one;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=age pt_dnr_gender group donor / risklimits ties=efron;
hazardratio pt_dnr_gender / diff=all; ** check diff=ref;
run;

Note: the 'risklimits' option gives the 95% confidence interval for the hazard ratio.

The PHREG Procedure

Model Information
Data Set             WORK.ONE
Dependent Variable   os_t
Censoring Variable   os
Censoring Value(s)   0
Ties Handling        EFRON

Number of Observations Read   181
Number of Observations Used   181

Class Level Information (reference coding)
Class           Value   Design Variables
pt_dnr_gender   FF      1 0 0
                FM      0 1 0
                MF      0 0 0
                MM      0 0 1
group           Mini    0
                STD     1
donor           MRD     0
                MUD     1

Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
181     114     67         37.02

Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    1060.874             1037.388
AIC         1060.874             1049.388
SBC         1060.874             1065.805

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   23.4855      6    0.0006
Score              22.6099      6    0.0009
Wald               22.2285      6    0.0011

Analysis of Maximum Likelihood Estimates
Parameter          DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   95% HR Confidence Limits
age                1    0.04515    0.01053     18.3964      <.0001       1.046          1.025  1.068
pt_dnr_gender FF   1    0.65204    0.28764     5.1388       0.0234       1.919          1.092  3.373
pt_dnr_gender FM   1    0.32404    0.30935     1.0972       0.2949       1.383          0.754  2.535
pt_dnr_gender MM   1    0.30973    0.27429     1.2752       0.2588       1.363          0.796  2.333
group STD          1    0.53453    0.25882     4.2652       0.0389       1.707          1.028  2.834
donor MUD          1    0.30787    0.19615     2.4635       0.1165       1.361          0.926  1.998

Hazard Ratios for pt_dnr_gender
Description               Point Estimate   95% Wald Confidence Limits
pt_dnr_gender FF vs FM    1.388            0.811  2.377
pt_dnr_gender FF vs MF    1.919            1.092  3.373
pt_dnr_gender FF vs MM    1.408            0.872  2.275
pt_dnr_gender FM vs MF    1.383            0.754  2.535
pt_dnr_gender FM vs MM    1.014            0.598  1.721
pt_dnr_gender MF vs MM    0.734            0.429  1.256

2. R code

coxph1 <- coxph(Surv(os.t, os) ~ age+factor(pt.dnr.gender)+factor(group)
                +factor(donor), method="efron", data=aml.data)
print(summary(coxph1))

Call: coxph(formula = Surv(os.t, os) ~ age + factor(pt.dnr.gender) + factor(group) + factor(donor), method = "efron")
n= 181
                           coef      exp(coef)   se(coef)   z        Pr(>|z|)
age                        0.04515   1.04619     0.01053    4.289    1.79e-05 ***
factor(pt.dnr.gender)FM   -0.32800   0.72036     0.27441   -1.195    0.2320
factor(pt.dnr.gender)MF   -0.65204   0.52098     0.28764   -2.267    0.0234 *
factor(pt.dnr.gender)MM   -0.34231   0.71013     0.24473   -1.399    0.1619
factor(group)STD           0.53453   1.70665     0.25882    2.065    0.0389 *
factor(donor)MUD           0.30787   1.36053     0.19615    1.570    0.1165
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                           exp(coef)   exp(-coef)   lower .95   upper .95
age                        1.0462      0.9559       1.0248      1.0680
factor(pt.dnr.gender)FM    0.7204      1.3882       0.4207      1.2335
factor(pt.dnr.gender)MF    0.5210      1.9195       0.2965      0.9155
factor(pt.dnr.gender)MM    0.7101      1.4082       0.4396      1.1472
factor(group)STD           1.7066      0.5859       1.0276      2.8344
factor(donor)MUD           1.3605      0.7350       0.9263      1.9984

Rsquare= 0.122 (max possible= 0.997)
Likelihood ratio test= 23.49 on 6 df, p=0.0006492
Wald test            = 22.23 on 6 df, p=0.001101
Score (logrank) test = 22.61 on 6 df, p=0.0009383

Note: 'method' is an option for handling ties; the choices are "efron", "breslow", and "exact". If there are no tied death times, all the methods are equivalent. Nearly all Cox regression programs use the Breslow method by default, but not R/S-plus, where the Efron approximation is the default.

3. Example of LRT when there are missing data

a.
proc phreg data=one; *** if there are missing data in age;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=age2 pt_dnr_gender group donor / rl ties=efron;

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   95% HR Confidence Limits
age2        1    0.04814    0.01132     18.0949      <.0001       1.049          1.026  1.073

Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
157     96      61         38.85

Criterion   Without Covariates   With Covariates
-2 LOG L    870.425              846.979
---------------------------------------------------------------------
b.
proc phreg data=one;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=pt_dnr_gender group donor / rl ties=efron;

Total 181   Event 114   Censored 67   Percent Censored 37.02
Criterion   Without Covariates   With Covariates
-2 LOG L    1060.874             1056.794

LRT: 1056.794-846.979=209.895 (??)
This difference is not a valid LRT: the two models were fit to different data sets (181 observations here vs. the 157 complete cases in a), so the two log likelihoods are not comparable. The reduced model must be refit on the complete cases, as in c.

c.
proc phreg data=one;
where age2>0;
class pt_dnr_gender(ref='MF') group(ref='Mini') donor(ref='MRD');
model os_t*os(0)=pt_dnr_gender group donor / rl ties=efron;
run;

Total 157   Event 96   Censored 61   Percent Censored 38.85
Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    870.425              866.147

LRT: 866.147-846.979=19.168

5.6 Stratified Cox PH model

• The idea behind the stratified Cox PH model is to allow different baseline hazard functions across levels of the stratification factors.
• The stratified Cox model ranks the failure times separately within strata and formulates a separate log likelihood function for each stratum, but with each log likelihood having a common β vector. The likelihood functions are then multiplied together to form a joint likelihood over strata. Computationally,

l(β) = Σ_{k=1}^{K} l_k(β)

where l_k(β) is the log partial likelihood function in stratum k.
• The stratified Cox model is commonly used to adjust for observations involving some kind of clustering or multi-level grouping. Examples include multicenter clinical trials: because of varying patient populations, supportive care, and referral patterns, the different clinical centers in the trial are likely to have different baseline survival functions.
• The advantage of stratification is that it gives the most general adjustment for a confounding variable.
• The disadvantages are: i) no direct inference (thus no p-value) can be made about the stratification factors, because these are merely "adjusted for" or "controlled for" and not regarded as risk factors; ii) the precision of the estimated coefficients and the power of hypothesis tests may be diminished if there are a large number of strata.
• A stratified Cox model also allows a modeled factor to interact with strata.
• Stratification is useful for checking the PH and linearity assumptions. We will discuss this later. See Example 8.

Example 5. Stratified Cox model:

1.
proc phreg data=one;
strata grp;
model os_t*os(0)=age/ ties=efron;

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.03666    0.00966     14.3909      0.0001       1.037

2.
proc phreg data=one;
strata grp;
model os_t*os(0)=age grp_age/ ties=efron;
grp_age=grp*age;

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.01541     0.01595     0.9333       0.3340       1.016
grp_age    1    -0.03075    0.01963     2.4541       0.1172       0.970

3.
proc phreg data=one;
strata grp;
class age50(ref="0");
model os_t*os(0)=age50 grp_age50/ ties=efron;
grp_age50=grp*age50;
run;

Parameter    DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age50        1    0.83554     0.23618     12.5158      0.0004       2.306
grp_age50    1    -0.89158    0.44480     4.0179       0.0450       0.410

*** Since grp is non-PH, a more appropriate model is
proc phreg data=one;
class grp(ref="0");
model os_t*os(0)=age grp grp_age grp_t/ ties=efron;
grp_age=grp*age;
grp_t=grp*os_t;
run;

Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age         1    0.04703     0.01145     16.8788      <.0001       1.048
grp         1    0.45698     1.09489     0.1742       0.6764       1.579
grp_age     1    -0.03006    0.01984     2.2958       0.1297       0.970
grp_t       1    0.12062     0.03524     11.7164      0.0006       1.128

*** what if we use age50 instead of age as a continuous variable
proc phreg data=one;
class age50(ref="0") grp(ref="0");
model os_t*os(0)=age50 grp grp_age50 grp_t/ ties=efron;
grp_age50=grp*age50;
grp_t=grp*os_t;
run;

Parameter    DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age50        1    0.84784     0.23597     12.9101      0.0003       2.335
grp          1    -0.32889    0.42699     0.5933       0.4411       0.720
grp_age50    1    -0.89777    0.44464     4.0767       0.0435       0.407
grp_t        1    0.11659     0.03496     11.1244      0.0009       1.124

5. R code
aml.cox <- coxph(Surv(os_t, os) ~ age + strata(grp), method="efron", data=aml.dat)
aml.cox <- coxph(Surv(os_t, os) ~ age + strata(grp)+age*grp, data=aml.dat)
aml.cox <- coxph(Surv(os_t, os) ~ age + strata(grp)+age*grp+age*os_t, data=aml.dat)

5.7 Tied Event Times

The partial likelihood equation is valid only if there are no tied event times. To handle ties, several modifications to the likelihood have been proposed in the literature: the Cox, Peto-Breslow, Efron, and exact methods.

• Breslow's approximation: approximates the exact likelihood by using the full risk set in the denominator for every death within a set of ties. This is the default in PROC PHREG. It is the least accurate of these methods, but fast, and is reasonable when the number of ties is small.
• Efron's (1977): the default in COXPH in R/S-plus. This is a closer approximation to the exact method than Breslow's and is computationally efficient when dealing with tied event times.
• Discrete exact (Cox's): assumes that tied failure times are truly discrete and happened at the same time (i.e., there is no underlying ordering of events). If there are many ties, this method is computationally intensive. This is called discrete in SAS, but exact in R and S-plus.
• (Continuous) exact: based on the continuous likelihood; assumes that there is a true but unknown ordering for tied event times (i.e., time is continuous), so that ties are merely the result of imprecise measurement of time. This method computes the exact partial likelihood over all possible orderings and can be very computationally intensive if there are a large number of ties; for example, 5 tied event times give 5!=120 possible partial likelihoods. This is called exact in SAS, and it is not implemented in R/S-plus.
Example. Suppose that the first 2 events out of 5 at risk are tied, and let r_j = exp(X_j β) denote the risk score of subject j. If the time data were more accurate, the first two terms in the likelihood would be either

l_1(β) = [ r_1 / (r_1+r_2+r_3+r_4+r_5) ] × [ r_2 / (r_2+r_3+r_4+r_5) ]

or

l_2(β) = [ r_2 / (r_1+r_2+r_3+r_4+r_5) ] × [ r_1 / (r_1+r_3+r_4+r_5) ]

• Breslow approximation: uses the complete sum r_1+r_2+r_3+r_4+r_5 for the denominator of both terms.
• Efron method:
[ r_1 / (r_1+r_2+r_3+r_4+r_5) ] × [ r_2 / (0.5r_1+0.5r_2+r_3+r_4+r_5) ]
• Discrete exact method:
r_1 r_2 / (r_1r_2 + r_1r_3 + r_1r_4 + r_1r_5 + r_2r_3 + r_2r_4 + r_2r_5 + r_3r_4 + r_3r_5 + r_4r_5)
• (Continuous) exact method: sums over all possible orderings of the partial likelihood function, i.e., l(β) = l_1(β) + l_2(β).

Notes:
• When there are no ties, all methods are the same.
• When there are only a few ties, whichever method is used makes little difference.
• When there are many ties, both approximation methods are biased; however, Efron's is better.

Example 6. Tied Event Times

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=breslow;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04378    0.00750     34.0309      <.0001       1.045
log_bili   1    1.01465    0.07796     169.3939     <.0001       2.758

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=efron;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04378    0.00750     34.0327      <.0001       1.045
log_bili   1    1.01497    0.07796     169.4945     <.0001       2.759

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=exact;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04378    0.00750     34.0329      <.0001       1.045
log_bili   1    1.01497    0.07796     169.4943     <.0001       2.759

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=discrete;

Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.04381    0.00751     34.0440      <.0001       1.045
log_bili   1    1.01524    0.07800     169.3922     <.0001       2.760

R code
> coxph(Surv(fu_days, status2) ~ age + log(bili), method="breslow", data=pbc.dat)
> coxph(Surv(fu_days, status2) ~ age + log(bili), method="efron", data=pbc.dat)
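Note the naming clash across packages: R's method="exact" corresponds to SAS ties=discrete, while SAS ties=exact (the continuous exact method) has no coxph counterpart. A one-line sketch with the same assumed PBC names:

> coxph(Surv(fu_days, status2) ~ age + log(bili), method="exact", data=pbc.dat)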
Using the leukemia data to compare 6-MP vs. control:

proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=breslow;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.50919    0.40956     13.5783      0.0002       0.221          grp 6-MP
---------------------------------------------------------------------------------
proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=efron;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.57213    0.41240     14.5326      0.0001       0.208          grp 6-MP
---------------------------------------------------------------------------------
proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=exact;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.59787    0.42162     14.3630      0.0002       0.202          grp 6-MP
---------------------------------------------------------------------------------
proc phreg;
class grp(ref='cnt')/param=ref;
model time*dth(0)= grp/ ties=discrete;
run;
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
grp 6-MP    1    -1.62822    0.43313     14.1316      0.0002       0.196          grp 6-MP

5.8 Predicted Survival

Just as in the linear regression model, once β̂ and Ŝ_0(t) are obtained, the predicted survival probabilities and the predicted median survival time can be calculated as

Ŝ(t|X) = Ŝ_0(t)^exp(Xβ̂)
0.5 = Ŝ_0(T_0.5)^exp(Xβ̂)  =⇒  Ŝ_0(T_0.5) = (0.5)^exp(−Xβ̂)

Note 1: Ŝ_0(t) can be obtained using PROC LIFETEST (the KM estimate) or using the BASELINE statement in PHREG, and β̂ from the PHREG output. Estimators for λ_0(t) and S_0(t) were discussed before.
Note 2: Before calculating the predicted survival probabilities, it is critical to assess the model adequacy first.

Example 7: Predicted Survival under PH

data pred;
input age grp; ** STD if grp=0, Mini if grp=1;
cards;
0 0    ** this line generates the estimate of S0(t);
50 0
50 1
60 0
60 1
;
run;

proc phreg data=one;
model os_t*os(0)=age grp/ ties=efron;
baseline covariates=pred out=outpred survival=s lower=S_lower95 upper=S_upper95;
run;

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age        1    0.03711     0.00973     14.5400      0.0001       1.038
grp        1    -0.36429    0.24632     2.1872       0.1392       0.695

proc print data=outpred;
Obs   age   grp    os_t      s         S_lower95   S_upper95
2     0     STD    0.3614    0.99785   0.99437     1.00000
3     0     STD    0.4599    0.99566   0.99007     1.00000
.     .     .      .         .         .           .
64    0     STD    6.0123    0.89876   0.81897     0.98633
65    0     STD    6.0780    0.89696   0.81597     0.98600
66    0     STD    6.1437    0.89514   0.81294     0.98566

62    50    STD    5.8480    0.51816   0.42688     0.62896
63    50    STD    5.9466    0.51172   0.42030     0.62302
64    50    STD    6.0123    0.50529   0.41377     0.61706   ** (0.89876)^(exp(0.03711*50))=0.50529
65    50    STD    6.0780    0.49886   0.40726     0.61107   ** (0.89696)^(exp(0.03711*50))=0.49886

83    50    STD    11.6961   0.37663   0.28709     0.49410   * predicted 1-year survival for age=50 in STD
84    50    STD    13.5359   0.36958   0.28039     0.48714

186   50    Mini   11.6961   0.50745   0.39601     0.65025   * predicted 1-year survival for age=50 in Mini
187   50    Mini   13.5359   0.50083   0.38900     0.64480
188   50    Mini   14.4230   0.49390   0.38165     0.63916
189   50    Mini   14.5873   0.48695   0.37430     0.63350

252   60    STD    4.43532   0.51864   0.39218     0.68588
253   60    STD    4.46817   0.50306   0.37612     0.67283
254   60    STD    4.50103   0.48760   0.36032     0.65983
255   60    STD    4.53388   0.47995   0.35255     0.65338

289   60    STD    11.6961   0.24286   0.13757     0.42873   * predicted 1-year survival for age=60 in STD
290   60    STD    13.5359   0.23630   0.13243     0.42161

372   60    Mini   5.9466    0.50937   0.39969     0.64916
373   60    Mini   6.0123    0.50294   0.39308     0.64349
374   60    Mini   6.0780    0.49649   0.38648     0.63780   ** (0.89696)^(exp(0.03711*60-0.36429))=0.49649
375   60    Mini   6.1437    0.49004   0.37991     0.63210

392   60    Mini   11.6961   0.37412   0.26606     0.52606   * predicted 1-year survival for age=60 in Mini
393   60    Mini   13.5359   0.36707   0.25932     0.51959

Let us calculate the median predicted survival time for an individual with age=50 and grp=0 (STD) using the formula:

Ŝ_0(T_0.5) = (0.5)^exp(−(0.03711*50 − 0.36429*0)) = (0.5)^0.1563 = 0.89696

The time at which the baseline survival curve reaches this value is 6.078, which is the predicted median survival time.

[Figure: predicted survival for AML/MDS — predicted survival curves for the four subgroups age=50 Mini, age=50 STD, age=60 Mini, and age=60 STD.]
This plot shows the predicted survival probabilities of the 4 subgroups, assuming that the PH assumption holds.
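In R, the same predicted survival probabilities and medians come from survfit applied to a fitted coxph object; a minimal sketch, assuming the AML/MDS names used above:

library(survival)
fit <- coxph(Surv(os.t, os) ~ age + grp, method="efron", data=aml.data)
pred <- data.frame(age=c(50, 50, 60, 60), grp=c(0, 1, 0, 1))
sf <- survfit(fit, newdata=pred)   # analogue of the BASELINE statement
summary(sf, times=12)              # predicted 1-year survival for each row of pred
quantile(sf, probs=0.5)            # predicted median survival times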
5.9 Assessment of PH Model Assumptions

5.9.1 Regression Assumptions

Regression assumptions can be checked by
• plotting martingale or deviance residuals against a covariate or the covariates (i.e., Xβ)
• plotting the log relative hazard against a covariate
• the supremum test in PHREG
• categorizing a continuous variable into intervals (e.g., by quantiles), plotting β̂ by the midpoints of the intervals, and then checking the shape.

5.9.1.1 Martingale Residuals

• Recall the predicted survival in the previous section. At each time point, the predicted survival for fixed values of the covariates is Ŝ(t|X) = Ŝ_0(t)^exp(Xβ̂), thus Λ̂(t|X) = −log[Ŝ(t|X)]. The martingale residual for the i-th individual is

r_i = δ_i − Λ̂_i = (observed − expected)

where δ_i is the event indicator. The r_i have mean 0 and range between −∞ and 1.
• This residual is derived from the difference between the counting process and the integrated intensity function, i.e., M_i(t) = N_i(t) − ∫_0^t Y_i(s) exp(X_iβ) dΛ_0(s), i = 1, ···, n.
• Martingale residuals can be used to assess either very short or very long survival (e.g., a large negative value of the residual) and to evaluate an appropriate functional form for a covariate.

5.9.1.2 Deviance Residuals

D̂_i = sign(r̂_i) √( −2[ r̂_i + δ_i log(δ_i − r̂_i) ] )

• The martingale residual is highly skewed. The deviance residual is a further transformation of the martingale residual, and is much like a residual from ordinary least squares regression.
• The deviance residual is asymptotically normally distributed with mean 0 and standard deviation 1.
• The deviance residual is negative for observations that have longer survival times than expected (i.e., censored observations) and positive for observations with survival times smaller than expected (i.e., uncensored). Extreme values suggest that the observation may be an outlier and requires special attention.

5.9.1.3 Functional Form of X

λ(t|x) = λ_0(t) exp(f(x)β)    (2)

If a covariate appears to be non-linear, there are many ways to find an appropriate functional form. We discuss here a few ways to assess the functional form of a covariate.
• Plot the martingale residuals from a null model against each covariate separately and superimpose a smoothing curve. This was proposed by Therneau et al. (1990, Biometrika). If (2) is correct for some smooth function f, then the smoother for the j-th covariate will display the form of f, under certain conditions. That is, E(r_i | X_ij = x_j) ≈ c f(x_j), where c is roughly independent of x_j and depends on the amount of censoring. See Therneau et al. (1990, Biometrika) for the detailed derivation.
• Use smoothing splines or regression splines in a model.
– An alternative to residual manipulations is to model the functional form directly in the Cox regression model.
– A naive way is to include polynomial terms (x, x², x³, ···) in a model. However, with this approach the fits are not local, and the model can be statistically and computationally unstable.
– A better approach is to include splines in the model directly to fit curves locally. These are "regression splines", "natural splines", "smoothing splines", or "restricted cubic splines (rcs)". An example of an rcs with k=3 knots at a, b, c is

f(X) = β_0 + β_1X + β_2(X−a)³₊ + β_3(X−b)³₊ + β_4(X−c)³₊

A test of linearity in X can be obtained by testing H_0: β_2 = β_3 = β_4 = 0.
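As a quick sketch of such a linearity test in R (using a natural spline from the splines package rather than an rcs, and the PBC names from Example 8 below):

library(survival); library(splines)
f1 <- coxph(Surv(fu.days, status2) ~ age + bili, data=pbc.dat)
f2 <- coxph(Surv(fu.days, status2) ~ age + ns(bili, df=3), data=pbc.dat)
anova(f1, f2)   # LRT for the nonlinear spline terms; a small p-value suggests nonlinearity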
Example 8: Testing regression assumptions in the Cox model

1. AML data: SAS code

proc phreg data=one;
model os_t*os(0)=age grp /ties=efron; *** age>=50 only;
output out=d.aml_res survival=s logsurv=ls loglogs=lls resmart=mart resdev=dev;
run;

note: covariates are included in the model. If the linearity assumption is met, the smoothing curve on the residuals is a straight line around zero.

[Figure: "checking linearity" — log-log survival, martingale residuals, and deviance residuals plotted against age (50-75).]

2. PBC (Primary Biliary Cirrhosis) data: R code

Data from the Mayo Clinic trial in primary biliary cirrhosis of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo-controlled trial of the drug D-penicillamine. The first 312 cases in the data set participated in the randomized trial and contain largely complete data. The additional 112 cases did not participate in the clinical trial, but consented to have basic measurements recorded and to be followed for survival. Six of those cases were lost to follow-up shortly after diagnosis, so the data here are on an additional 106 cases as well as the 312 randomized participants. Missing data items are denoted by NA.

a. Testing Linearity

mart <- coxph(Surv(fu.days, status2) ~ age+bili, data=pbc.dat)
mr <- resid(mart)
plot(pbc.dat$age, mr, xlab="Age", ylab="Martingale Residual")
lines(lowess(pbc.dat$age, mr, iter=0), lwd=2, col=6)
plot(pbc.dat$bili, mr, xlab="Bilirubin", ylab="Martingale Residual")
lines(lowess(pbc.dat$bili, mr, iter=0), lwd=2, col=6)

mart2 <- coxph(Surv(fu.days, status2) ~ age+log.bili, data=pbc.dat)
mr2 <- resid(mart2)
plot(pbc.dat$log.bili, mr2, xlab="log(Bilirubin)", ylab="Martingale Residual")
lines(lowess(pbc.dat$log.bili, mr2, iter=0), lwd=2, col=6)

[Figure: martingale residuals plotted against Age, Bilirubin, and log(Bilirubin).]

b. Functional Forms

To assess a functional form for each covariate, fit a model without covariates first:

> coxph1 <- coxph(Surv(fu.days, status2) ~ 1, data=pbc.dat) # to create null residual plots
> mart <- resid(coxph1)
> plot(pbc.dat$age, mart, xlab="Age", ylab="Martingale Residual")
> lines(lowess(pbc.dat$age, mart, iter=0), lwd=2, col=2)       # Age looks pretty linear
> plot(pbc.dat$bili, mart, xlab="Bilirubin", ylab="Martingale Residual")
> lines(lowess(pbc.dat$bili, mart, iter=0), lwd=2, col=2)      # Bilirubin doesn't look linear
> plot(pbc.dat$log.bili, mart, xlab="log(Bilirubin)", ylab="Martingale Residual")
> lines(lowess(pbc.dat$log.bili, mart, iter=0), lwd=2, col=2)  # log(Bilirubin) looks linear

Note: 'residuals' (resid) in R calculates martingale, deviance, score, or Schoenfeld residuals for a Cox proportional hazards model.
coxph1 <- coxph(Surv(fu.days, status2) ~ age+log.bili, data=pbc.dat)
print(resid(coxph1, type="martingale"))
print(resid(coxph1, type="deviance"))
print(resid(coxph1, type="score"))
print(resid(coxph1, type="schoenfeld"))

[Figure: null-model martingale residuals plotted against Age, Bilirubin, and log(Bilirubin).]

> coxph1 <- coxph(Surv(fu.days, status2) ~ age+edema+bili+protime+albumin, data=pbc.dat)
> print(coxph1)
           coef      exp(coef)   se(coef)   z       p
age        0.0383    1.039       0.00806    4.76    2.0e-06
edema      0.9368    2.552       0.28162    3.33    8.8e-04
bili       0.1159    1.123       0.01301    8.91    0.0e+00
protime    0.2008    1.222       0.05659    3.55    3.9e-04
albumin   -0.9710    0.379       0.20538   -4.73    2.3e-06
Likelihood ratio test=183 on 5 df, p=0
n=416 (2 observations deleted due to missingness)

LRT for bili: chi-square 58.9 with 1 df
AIC with 5 covariates: 1561.298 (thus 1733.915-1561.298=172.617)
AIC for null model   : 1733.915

> coxph2 <- coxph(Surv(fu.days, status2) ~ age+edema+log.bili+protime+albumin, data=pbc.dat)
> print(coxph2)
           coef      exp(coef)   se(coef)   z        p
age        0.0402    1.04        0.00767    5.24     1.6e-07
edema      0.9382    2.56        0.27082    3.46     5.3e-04
log.bili   0.8686    2.38        0.08289    10.48    0.0e+00
protime    0.1746    1.19        0.06109    2.86     4.3e-03
albumin   -0.7542    0.47        0.20951   -3.60     3.2e-04
Likelihood ratio test=229 on 5 df, p=0
n=416 (2 observations deleted due to missingness)

LRT for log.bili: chi-square 104.9 with 1 df
AIC with 5 covariates: 1515.333 (1733.915-1515.333=218.582. Compare this number to 172.617)

3. SAS code: PBC data

proc phreg data=pbc;
model fu_days*status2(0)= / ties=efron;
output out=res_m1 resmart=mart resdev=dev;
title "PBC data";

data pbc2;
merge pbc res_m1;
run;

proc print data=pbc2;
var id fu_days status2 age bili mart dev;

Obs   id   fu_days   status2   age       bili   mart       dev
1     1    400       1         58.7668   14.5   0.92047    1.79506
2     2    4500      0         56.4478   1.1    -1.03072   -1.43577
3     3    1012      1         70.0745   1.4    0.79455    1.25540
4     4    1925      1         54.7421   1.8    0.63248    0.85848
5     5    1504      0         38.1065   3.4    -0.30159   -0.77665
6     6    2503      1         66.2605   0.8    0.51937    0.65312
7     7    1832      0         55.5361   1.0    -0.35705   -0.84504
8     8    2466      1         53.0583   0.3    0.52750    0.66666
9     9    2400      1         42.5090   3.2    0.54307    0.69304
10    10   51        1         70.5618   12.6   0.99040    2.70399
11    11   3762      1         53.7154   1.4    0.15763    0.16677
12    12   304       1         59.1392   3.6    0.93842    1.92299

ods graphics on;
ods rtf file="assess.rtf";
proc phreg data=pbc;
model fu_days*status2(0)= age log_bili/ ties=efron;
assess var=(log_bili) / npaths=50;
run;
ods rtf close;
ods graphics off;

note: The RESAMPLE option of ASSESS in PROC PHREG gives a test of the functional form and a test of PH. Tests are based on a Kolmogorov-type supremum test using 1000 simulated patterns. A significant p-value indicates poor fit.
proc phreg data=pbc;
model fu_days*status2(0) = age bili/tie=efron;
assess var=(bili) / resample seed=;

Supremum Test for Functional Form
Variable   Maximum Absolute Value   Replications   Seed        Pr > MaxAbsVal
bili       36.8864                  1000           901907139   <.0001

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=efron;
assess var=(log_bili) / resample;

Supremum Test for Functional Form
Variable   Maximum Absolute Value   Replications   Seed        Pr > MaxAbsVal
log_bili   9.3530                   1000           906300144   0.1500

4. Use Frank Harrell's programs

rcspline.plot(bili, fu.days, event=status2, nk=3, xlab="Bilirubin")
rcspline.plot(log.bili, fu.days, event=status2, nk=3, xlab="log(Bilirubin)")

[Figure: estimated spline transformations — log relative hazard vs. Bilirubin and vs. log(Bilirubin).]
Bilirubin: Cox Regression Model, n=418, events=161 — Model L.R. 152.98 (2 df), AIC=148.98; Association Wald 151.93 (2 df), p=0; Linearity Wald 53.76 (1 df), p=0.
log(Bilirubin): Cox Regression Model, n=418, events=161 — Model L.R. 155.42 (2 df), AIC=151.42; Association Wald 144.88 (2 df), p=0.0000; Linearity Wald 2.23 (1 df), p=0.1355.

5. R code for spline functions: PBC data

> library(survival)
> library(stats)   ** this is for termplot
> coxph1 <- coxph(Surv(fu.days, status==2) ~ age+pspline(bili, df=4)+pspline(protime, df=4)+pspline(albumin, df=4), data=pbc.dat)
> print(coxph1)
> termplot(coxph1, rug=T, terms=1, ylab="Spline fit")  ** rug=T shows the actual data points on the X axis
> termplot(coxph1, rug=T, terms=2, ylab="Spline fit")
> termplot(coxph1, rug=T, terms=3, ylab="Spline fit")
> termplot(coxph1, rug=T, terms=4, ylab="Spline fit")

                                    coef      se(coef)   se2       Chisq   DF     p
age                                 0.0358    0.00787    0.00782   20.73   1.00   5.3e-06
pspline(bili, df = 4), linear       0.1167    0.01573    0.01541   55.03   1.00   1.2e-13
pspline(bili, df = 4), nonlin                                      42.10   3.04   4.1e-09
pspline(protime, df = 4), linear    0.3725    0.08012    0.07962   21.61   1.00   3.3e-06
pspline(protime, df = 4), nonlin                                   6.61    3.00   8.6e-02
pspline(albumin, df = 4), linear   -0.9757    0.18930    0.18805   26.57   1.00   2.5e-07
pspline(albumin, df = 4), nonlin                                   1.72    3.06   6.4e-01

Iterations: 4 outer, 12 Newton-Raphson
Theta= 0.741  ** tuning parameter for bili
Theta= 0.559  ** tuning parameter for protime
Theta= 0.77   ** tuning parameter for albumin
Degrees of freedom for terms= 1.0 4.0 4.0 4.1
Likelihood ratio test=239 on 13.1 df, p=0
n=416 (2 observations deleted due to missingness)

If you are not sure about the d.f. in pspline, just fit pspline(x). The pspline function uses Akaike's information criterion, AIC = LR test − 2*df, to select a "best" degrees of freedom for the term.
coxph1 <- coxph(Surv(fu.days, status==2) ~ age+pspline(bili, df=4)+pspline(protime)
                +pspline(albumin), data=pbc.dat)

                             coef      se(coef)   se2       Chisq   DF     p
age                          0.0358    0.00787    0.00782   20.73   1.00   5.3e-06
pspline(bili, df = 4), lin   0.1167    0.01573    0.01541   55.03   1.00   1.2e-13
pspline(bili, df = 4), non                                  42.10   3.04   4.1e-09
pspline(protime), linear     0.3725    0.08012    0.07962   21.61   1.00   3.3e-06
pspline(protime), nonlin                                    6.61    3.00   8.6e-02
pspline(albumin), linear    -0.9757    0.18930    0.18805   26.57   1.00   2.5e-07
pspline(albumin), nonlin                                    1.72    3.06   6.4e-01
Iterations: 4 outer, 12 Newton-Raphson
Degrees of freedom for terms= 1.0 4.0 4.0 4.1
Likelihood ratio test=239 on 13.1 df, p=0
n=416 (2 observations deleted due to missingness)

The AIC criterion has chosen about 4 df for protime and albumin. The albumin effect is linear (its nonlinear term is not significant), thus fit a linear term only.

[Figure: penalized spline fits for age, bili, protime, and albumin.]

5.9.2 PH Assumption

The primary assumption of the Cox model is proportional hazards, thus it is important to assess whether this assumption holds for each variable. The relative hazard for any two subjects i and j is

λ_0(t) exp(X_i β) / λ_0(t) exp(X_j β) = exp((X_i − X_j)β)

which is independent of time for time-fixed covariates. This hazard ratio is exp(β) if X is a binary variable coded 1 or 0, and it is the hazard ratio for a one-unit change if X is continuous. For example, edema in the PBC data is coded as 1 or 0 and β̂(edema)=0.94 in Example 8-2. This means that the log-log survival curves for edema=0 and 1 should be parallel, spaced 0.94 units apart, since

log Λ(t|X) = log{−log S(t|X)} = log Λ_0(t) + Xβ = log Λ_0(t) + 0.94

There are many graphical and analytical methods of verifying the PH assumption, and we present here a few of those.

1. For categorical variables
(a) Plot Kaplan-Meier curves or log(−log S(t)). If the PH assumption holds, the KM curves should not cross, and log(−log S(t)) against log(time) should be approximately parallel.
(b) Plots from a stratified Cox model.
(c) Calculate the log hazard ratio for a predictor as a function of time by fitting interval-specific stratified Cox models.
2. For continuous variables
(a) For fixed values of X, plots of log(−log S(t)) against log(time) should be approximately parallel.
3. For any type of variable
(a) Test a time by predictor interaction term in the Cox model.
(b) Plot Schoenfeld partial residuals (weighted or unweighted) against t for each predictor.
(c) A formal PH test using cox.zph in R/S-plus.
(d) A formal PH test in PROC PHREG using the supremum test.
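For option 1-(a), R can draw the log(−log S(t)) curves directly; a minimal sketch with the assumed AML/MDS names:

library(survival)
km <- survfit(Surv(os.t, os) ~ grp, data=aml.data)
plot(km, fun="cloglog", col=1:2, xlab="log(time)", ylab="log(-log S(t))")
## approximately parallel curves support the PH assumption for grp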
Let us discuss option 1-(c) in detail with an example from the Veterans Administration (VA) Lung Cancer data. Log Λ plots indicated that the four cell types did not satisfy the PH assumption. For the purpose of illustration, omit patients with large cell type and let the binary predictor be 1 if the cell type is squamous and 0 if it is small or adeno. We are assessing whether the survival patterns for the two groups, squamous vs. small/adeno, have PH. Interval-specific log hazard ratios are shown below.

time interval   total obs.   events   log hazard ratio   standard error
[  0,  21)      26           26        0.75              0.50
[ 21,  52)      25           24       -0.006             0.52
[ 52, 118)      31           26       -0.63              0.51
118+            28           26       -1.04              0.45

There is evidence of a trend of decreasing hazard ratio over time, which is consistent with the observation that patients with squamous cell carcinoma had a poorer prognosis in the early period, but a better prognosis in the late period.

5.9.2.1 Schoenfeld Residual

• Proposed by Schoenfeld (1982, Biometrika). Suppose that subject i dies at time t_i with p covariates X_i = (X_i1, ···, X_ip)′. The Schoenfeld residual for subject i is r_i = (r_i1, ···, r_ip)′, where

r_ik = X_ik − Ê(X_ik | R(t_i)) = X_ik − Σ_{j∈R(t_i)} X_jk p_j = O − E

E: the expected value of the covariate over the risk set at t_i, i.e., the average over the risk set at t_i; p_j: the probability that subject j has the event at that time, estimated from the Cox model.
• The Schoenfeld residual is not defined for censored individuals.
• Instead of a single residual for each individual, there is a separate residual for each individual for each covariate.
• Under PH, the Schoenfeld residuals do not depend on time, so a residual plot against time (which should show no trend) is used to test the PH assumption.
• The residuals sum to zero and have expected value zero.
• Later, Grambsch and Therneau (1999) proposed scaled Schoenfeld residuals.

5.9.2.2 PH test

A more general form of (1) is

λ(t|x) = λ_0(t) exp(X′β(t))

If β(t)=β, the model is PH. Thus, if PH holds, a plot of β(t) against time should be a horizontal line. Grambsch and Therneau (1999) argued that if β̂ is the coefficient from an ordinary fit of the Cox model, then

E(r*_ij) + β̂_j ≈ β_j(t_i)

where r*_ij is the scaled Schoenfeld residual (i.e., r*_i = V(β̂, t_i)⁻¹ r_i) for r_ij from r_i = (r_i1, ···, r_ip)′. Thus, if PH holds, plotting r*_ij + β̂_j versus time should give a horizontal line with zero slope. cox.zph tests for a nonzero slope as evidence against PH.

Example 9: Testing PH assumption

1. SAS code: AML/MDS study

proc phreg data=one;
model os_t*os(0)=age/ ties=efron;
strata grp; ** to check the PH assumption of grp;
baseline out=srvpred2 survival=s loglogs=lls;
run;

[Figure: log-log survival plots for AML/MDS, stratified by group (Mini vs. STD), against log(time).]
This plot shows the baseline survivor function for the 2 groups evaluated at the mean of age.

proc phreg data=one;
model os_t*os(0)=age grp/ ties=efron;
output out=res_s xbeta=xb ressch=schage schgrp;
title "AML/MDS : overall survival";

[Figure: Schoenfeld residuals for age and for group, plotted against time.]

2.
R code: AML/MDS study

>aml.cox <- coxph(Surv(os.t, os) ~ age + grp, data=aml.data)
>summary(aml.cox)
coxph(formula = Surv(os.t, os) ~ age + grp)
n= 181
      coef     exp(coef)   se(coef)   z      p
age   0.0371   1.04        0.00973    3.81   0.00014
grp   0.3643   1.44        0.24632    1.48   0.14000

      exp(coef)   exp(-coef)   lower .95   upper .95
age   1.04        0.964        1.018       1.06
grp   1.44        0.695        0.888       2.33

Rsquare= 0.083 (max possible= 0.997)
Likelihood ratio test= 15.7 on 2 df, p=0.000397
Wald test            = 15   on 2 df, p=0.000558
Efficient score test = 15.1 on 2 df, p=0.000516

>ph.test <- cox.zph(aml.cox)
>print(ph.test)  # display the results
>plot(ph.test)   # plot scaled Schoenfeld residuals against time for each covariate

         rho      chisq   p
age     -0.137    2.3     1.29e-01
grp     -0.339    15.5    8.38e-05   *** Non PH
GLOBAL   NA       16.0    3.44e-04

Note: In the cox.zph output
• rho: Pearson product-moment correlation between the scaled Schoenfeld residuals and log(t) for each covariate
• chisq: test statistic
• GLOBAL: global test for PH
• the test is not always sensitive

[Figure: Beta(t) for age and Beta(t) for grp plotted against time.]

3. SAS code: model with a time by covariate interaction

proc phreg data=one;
where age50=1;
model os_t*os(0)=grp age grp_age grp_t diff_sex dnr_type agvhd second_tx high_risk cell_srce sex_grp sec_grp/ selection=b;
grp_t=os_t*grp;
grp_age=age*grp;
sex_grp=diff_sex*grp;
sec_grp=second_tx*grp;
title "OS, all";

After a backward selection, the final model is

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
grp        1    1.46186     0.46896     9.7170       0.0018       4.314
grp_t      1    -0.15271    0.06367     5.7523       0.0165       0.858

Interpretation: the hazard ratio for grp is exp(1.46186 − 0.15271*time).

4. Plotting −Log SDF (ls) against time and Log(−Log SDF) (lls) against log(time).

5. SAS code to check the PH assumption for Cell Type: VA Lung Cancer data

proc lifetest plots=(s,ls,lls) outtest=Test maxtime=600;
time SurvTime*Censor(1);
strata Cell; *** include adenocarcinoma and large cell only;
run;

Note: If the proportional hazards assumption is appropriate, then the lls curves should be approximately parallel across strata.

proc phreg data=pbc;
model fu_days*status2(0) = age bili/tie=efron;
assess var=(bili) PH / resample;

Supremum Test for Proportional Hazards Assumption
Variable   Maximum Absolute Value   Replications   Seed        Pr > MaxAbsVal
age        0.9154                   1000           412213316   0.2640
bili       1.4470                   1000           412213316   <.0001

proc phreg data=pbc;
model fu_days*status2(0) = age log_bili/tie=efron;
assess var=(log_bili) PH / resample;

Supremum Test for Proportional Hazards Assumption
Variable   Maximum Absolute Value   Replications   Seed       Pr > MaxAbsVal
age        0.7989                   1000           45062701   0.3720
log_bili   0.7137                   1000           45062701   0.4880
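The R analogue of the SAS grp_t (time by covariate) models above is the tt argument of coxph, which evaluates the interaction at each event time; a minimal sketch with the assumed AML/MDS names:

library(survival)
fit <- coxph(Surv(os.t, os) ~ age + grp + tt(grp), data=aml.data,
             tt=function(x, t, ...) x * t)   # models beta(t) = b1 + b2*t for grp
summary(fit)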
5.9.2.3 What to do when PH fails

• Check whether all important covariates are included.
• Check whether appropriate functional forms of X are used. A wrong functional form or a missing covariate can look like non-PH.
• Use a stratified Cox model as discussed above; for a continuous variable, create quantile groups. Caution: stratification only makes sense for nuisance variables, since no estimates are obtained for the stratifying variables and thus there is no test for their main effects. If the variable is the variable of interest, stratification is not a solution.
• Include a time by predictor or predictor by predictor interaction in the model.
• Partition the time axis: PH may hold over short time periods.
• Model a time-dependent covariate: β(t)X ≈ βX*(t).
• Explore other models.

5.9.2.4 Summary of Residuals

Residual              Purpose
Martingale/Deviance   assess adequacy of the regression assumptions and functional forms of predictors
Score                 detect influential observations
DFBETA                detect influential observations
Schoenfeld            test the PH assumption
Weighted Schoenfeld   test the PH assumption

5.11 Time-Dependent Covariates and Time-Dependent Coefficients

λ(t|X) = λ_0(t) exp(X′(t)β)    (3)
λ(t|X) = λ_0(t) exp(X′β(t))    (4)

(3) is commonly known as a Cox model with time-dependent covariates. That is, the hazard at time t depends on the value of X at time t, and thus the hazard may not be constant over time. This model is still PH. A time-dependent covariate can be measured once (e.g., the Stanford Heart Transplant study) or repeatedly over time. (4) is a time-dependent coefficient model. This is a non-proportional hazards model in which the regression effect of X changes over time.

5.11 Counting Process form of a Cox model

The basic concept is like a slow Poisson process — censoring is not "incomplete data", rather "the event hasn't occurred yet" (Laird and Oliver, JASA, 1981). Computationally this requires a simple reconstruction of the data set: create start and stop variables instead of time, i.e.,

(start, stop] status strata x1 x2 ...
instead of
time status x1 x2 ...

where
• (start, stop]: an interval of risk
• status=1 if the subject had an event at time stop; 0 otherwise

This simple reconstruction makes it possible to analyze data that contain time-dependent covariates, time-dependent strata, multiple events per subject, and so on.
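In R, survSplit performs this reconstruction; a minimal sketch (cut points and names assumed), which also gives the time-axis partition suggested in 5.9.2.3:

library(survival)
aml.cp <- survSplit(Surv(os.t, os) ~ ., data=aml.data,
                    cut=c(6, 12, 24), episode="period")   # (start, stop] episodes
coxph(Surv(tstart, os.t, os) ~ age + grp, data=aml.cp)    # same fit as before
coxph(Surv(tstart, os.t, os) ~ age + grp:strata(period),  # separate grp effect per period
      data=aml.cp)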
Example 10: Time-dependent Covariate

a. Stanford Heart Transplant Study. Taken from SAS. Patients are accepted if physicians judge them suitable for heart transplant. Then, when a donor becomes available, physicians choose transplant recipients according to various medical criteria. A patient's status can change during the study from waiting for a transplant to transplant recipient. Transplant status can be defined by the time-dependent covariate function X=X(t) as

X(t) = 0 if the patient has not received the transplant at time t
       1 if the patient has received the transplant at time t.

The Stanford heart transplant data that appear in Crowley and Hu (1977) consist of 103 patients, 69 of whom received transplants.

data Stanford_Heart;
input ID @5 Bir_Date mmddyy8. @14 Acc_Date mmddyy8.
      @23 Xpl_Date mmddyy8. @32 Ter_Date mmddyy8.
      @41 Status 1. @43 PrevSurg 1. @45 NMismatch 1.
      @47 Antigen 1. @49 Mismatch 4. @54 Reject 1. @56 NotTyped $1.;
label Bir_Date ='Date of birth'
      Acc_Date ='Date of acceptance'
      Xpl_Date ='Date of transplant'
      Ter_Date ='Date last seen'
      Status   ='Dead=1 Alive=0'
      PrevSurg ='Previous surgery'
      NMismatch='No of mismatches'
      Antigen  ='HLA-A2 antigen'
      Mismatch ='Mismatch score'
      NotTyped ='y=not tissue-typed';
Time= Ter_Date - Acc_Date;
Acc_Age=int( (Acc_Date - Bir_Date)/365 );
if ( Xpl_Date ne .) then do;
  WaitTime= Xpl_Date - Acc_Date;
  Xpl_Age= int( (Xpl_Date - Bir_Date)/365 );
end;
if ( Xpl_Date ne .) then transplant=1;
else transplant=0;
datalines;
1   01 10 37 11 15 67          01 03 68 1 0
2   03 02 16 01 02 68          01 07 68 1 0
3   09 19 13 01 06 68 01 06 68 01 21 68 1 0 2 0 1.11 0
4   12 23 27 03 28 68 05 02 68 05 05 68 1 0 3 0 1.66 0
5   07 28 47 05 10 68          05 27 68 1 0
6   11 18 13 06 13 68          06 15 68 1 0
7   08 29 17 07 12 68 08 31 68 05 17 70 1 0 4 0 1.32 1
8   03 27 23 08 01 68          09 09 68 1 0
9   06 11 21 08 09 68          11 01 68 1 0
10  02 09 26 08 11 68 08 22 68 10 07 68 1 0 2 0 0.61 1
11  08 22 20 08 15 68 09 09 68 01 14 69 1 0 1 0 0.36 0
12  07 09 15 09 17 68          09 24 68 1 0
13  02 22 14 09 19 68 10 05 68 12 08 68 1 0 3 0 1.89 1
14  09 16 14 09 20 68 10 26 68 07 07 72 1 0 1 0 0.87 1
15  12 04 14 09 27 68          09 27 68 1 1
16  05 16 19 10 26 68 11 22 68 08 29 69 1 0 2 0 1.12 1
17  06 29 48 10 28 68          12 02 68 1 0
18  12 27 11 11 01 68 11 20 68 12 13 68 1 0 3 0 2.05 0
19  10 04 09 11 18 68          12 24 68 1 0
20  10 19 13 01 29 69 02 15 69 02 25 69 1 0 3 1 2.76 1
21  09 29 25 02 01 69 02 08 69 11 29 71 1 0 2 0 1.13 1
22  06 05 26 03 18 69 03 29 69 05 07 69 1 0 3 0 1.38 1
23  12 02 10 04 11 69 04 13 69 04 13 71 1 0 3 0 0.96 1
24  07 07 17 04 25 69 07 16 69 11 29 69 1 0 3 1 1.62 1
25  02 06 36 04 28 69 05 22 69 04 01 74 0 0 2 0 1.06 0
26  10 18 38 05 01 69          03 01 73 0 0
27  07 21 60 05 04 69          01 21 70 1 0
28  05 30 15 06 07 69 08 16 69 08 17 69 1 0 2 0 0.47 0
29  02 06 19 07 14 69          08 17 69 1 0
................................

To illustrate the calculation of a time-dependent covariate in the PL, suppose that we have one time-dependent covariate, WaitTime, in the model.

Of 103 patients, 22 died or were censored before Time 27, and thus 81 are at risk.

Obs   Time   Status   WaitTime   transplant   X(27)
23    27     1        17         1            1
24    29     1        4          1            1
25    30     0        .          0            0
26    31     1        .          0            0
27    34     1        .          0            0
28    35     1        .          0            0
29    36     1        .          0            0
30    38     1        35         1            0   <==
31    38     0        37         1            0   <==
...   ...    ..       ..         ..           ..
103   1799   0        24         1            1

At the failure time 27, the PL contribution is

exp(βx_23(27)) / [ exp(βx_23(27)) + exp(βx_24(27)) + exp(βx_25(27)) + ··· + exp(βx_29(27)) + exp(βx_30(27)) + exp(βx_31(27)) + ··· + exp(βx_103(27)) ]
= exp(β·1) / [ exp(β·1) + exp(β·1) + exp(β·0) + ··· + exp(β·0) + exp(β·0) + exp(β·0) + ··· + exp(β·1) ]

But if WaitTime is ignored and a fixed covariate 'transplant' is used instead:

exp(β·1) / [ exp(β·1) + exp(β·1) + exp(β·0) + ··· + exp(β·0) + exp(β·1) + exp(β·1) + ··· + exp(β·1) ]

proc phreg;
model Time*Status(0)= Acc_Age PrevSurg XStatus;
if (WaitTime = . or Time < WaitTime) then XStatus=0;
else XStatus= 1;
run;

*note1: no pt had Time < WaitTime. 1 pt died on the operating table, thus Time=WaitTime.
*note2: Unlike an IF statement in the DATA step, the IF statement in PHREG compares waiting times for patients who are at risk of death with survival times for patients who experienced events.
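As an aside, the survival package ships these same Stanford data as heart, already in (start, stop] counting-process form (its age variable is centered, so coefficients will not match the SAS run exactly):

library(survival)
head(heart)   # start, stop, event, age, year, surgery, transplant, id
coxph(Surv(start, stop, event) ~ age + surgery + transplant, data=heart)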
Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
103     75      28         27.18

Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    596.649              591.312
AIC         596.649              595.312
SBC         596.649              599.947

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   5.3370       2    0.0694
Score              4.7900       2    0.0912
Wald               4.7812       2    0.0916

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
Acc_Age     1    0.03130     0.01386     5.0975       0.0240       1.032          Age at Acceptance
PrevSurg    1    -0.77052    0.35961     4.5911       0.0321       0.463          Previous surgery
XStatus     1    -0.04086    0.30225     0.0183       0.8925       0.960

What if we ignore the wait time and treat 'transplant status' as a fixed covariate measured at baseline?

proc phreg;
model Time*Status(0)= Acc_Age PrevSurg transplant; *transplant=1 if ever had a transplant, 0 otherwise;
run;

Analysis of Maximum Likelihood Estimates
Parameter    DF   Estimate    Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Label
Acc_Age      1    0.05837     0.01496     15.2270      <.0001       1.060          Age at Acceptance
PrevSurg     1    -0.42241    0.37098     1.2964       0.2549       0.655          Previous surgery
transplant   1    -1.70341    0.27812     37.5125      <.0001       0.182

note: selection bias, because patients who died quickly were less likely to get transplants.
--------------------------------------------------------------------------------------------
proc phreg data=one;
model Time*Status(0)= XStatus / ties=discrete;
if (WaitTime = . or Time < WaitTime) then XStatus=0.;
else XStatus= 1.0;
run;

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   0.0608       1    0.8052
Score              0.0606       1    0.8056   <=== MB (Mantel-Byar) test
Wald               0.0597       1    0.8069

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
XStatus     1    0.07232    0.29585     0.0597       0.8069       1.075

b. Example 49.5 from SAS PHREG: Time-Dependent Repeated Measurements at regular time intervals

Consider an experiment to study the dosing effect of a tumor-promoting agent. Forty-five rodents initially exposed to a carcinogen were randomly assigned to three dose groups. After the first death of an animal, the rodents were examined every week for the number of papillomas. Investigators were interested in determining the effects of dose on the carcinoma incidence after adjusting for the number of papillomas.

The input data set TUMOR consists of the following 19 variables:
* ID (subject identification)
* Time (survival time of the subject)
* Dead (censoring status, where 1=dead and 0=censored)
* Dose (dose of the tumor-promoting agent)
* P1-P15 (number of papillomas at the 15 times that animals died. These 15 death times are weeks 27, 34, 37, 41, 43, 45, 46, 47, 49, 50, 51, 53, 65, 67, and 71. For instance, subject 1 died at week 47; it had no papilloma at week 27, five papillomas at week 34, six at week 37, eight at week 41, and 10 at weeks 43, 45, 46, and 47. For an animal that died before week 71, the number of papillomas is missing for those times beyond its death.)
The following SAS statements create the data set TUMOR: data Tumor; infile datalines missover; input ID Time Dead Dose P1-P15; label ID=’Subject ID’; datalines; 1 2 3 5 8 9 10 11 47 71 81 81 69 67 81 37 1 1 0 0 0 1 0 1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0 0 0 0 0 0 0 9 5 0 1 0 0 0 0 9 6 0 1 0 0 1 0 9 8 10 10 10 10 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 2 2 2 2 0 0 0 0 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 3 0 1 1 0 0 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 ; 53 38 54 51 47 27 41 49 53 50 37 49 46 48 54 37 53 45 53 49 39 27 49 43 28 34 45 37 43 0 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1 1 0 1 1 1 1 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0 0 5 2 15 13 22 6 0 0 0 3 2 4 15 12 12 3 4 6 0 7 17 0 14 8 11 10 0 9 0 0 0 0 0 0 0 0 0 0 13 14 6 6 6 6 6 6 6 6 6 6 15 15 16 16 17 17 17 17 17 17 20 20 20 20 20 20 20 13 3 0 0 15 3 6 26 14 16 6 12 10 2 8 13 3 1 2 15 3 7 26 15 17 6 15 13 2 8 13 3 1 3 3 1 4 3 1 6 3 1 6 3 1 6 3 1 6 1 6 1 120 0 6 1 3 3 4 4 4 4 9 9 9 9 26 26 26 26 26 15 15 15 15 15 15 15 15 15 6 6 6 6 6 6 6 6 6 20 20 20 13 13 15 15 15 15 15 15 20 2 2 2 2 2 2 6 9 14 14 14 14 14 14 18 20 20 20 18 12 16 16 16 16 1 1 19 19 19 19 The number of papillomas (NPap) for each animal in the study was measured repeatedly over time. One way of handling time-dependent repeated measurements in the PHREG procedure is to use programming statements to capture the appropriate covariate values of the subjects in each risk set. In this example, NPap is a time-dependent explanatory variable with values that are calculated by means of the programming statements shown in the following SAS statements: DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 121 proc phreg data=Tumor; model Time*Dead(0)=Dose NPap; array pp{*} P1-P14; array tt{*} t1-t15; t1 = 27; t2 = 34; t3 = 37; t4 = 41; t5 = 43; t6 = 45; t7 = 46; t8 = 47; t9 = 49; t10= 50; t11= 51; t12= 53; t13= 65; t14= 67; t15= 71; if Time < tt[1] then NPap=0; else if time >= tt[15] then NPap=P15; else do i=1 to dim(pp); if tt[i] <= Time < tt[i+1] then NPap= pp[i]; end; run; At each death time, the NPap value of each subject in the risk set is recalculated to reflect the actual number of papillomas at the given death time. For instance, subject one in the data set Tumor was in the risk sets at weeks 27 and 34; at week 27, the animal had no papilloma, while at week 34, it had five papillomas. DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim Model Information Data Set WORK.TUMOR Dependent Variable Time Censoring Variable Dead Censoring Value(s) 0 Ties Handling BRESLOW Summary of the Number of Event and Censored Values Percent Total Event Censored Censored 45 25 20 44.44 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. 
Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    166.793              143.269
AIC         166.793              147.269
SBC         166.793              149.707

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   23.5243      2    <.0001
Score              28.0498      2    <.0001
Wald               21.1646      2    <.0001

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
Dose       1    0.06885    0.05620     1.5010       0.2205       1.071
NPap       1    0.11714    0.02998     15.2705      <.0001       1.124

NOTE: After the number of papillomas is adjusted for, the dose effect of the tumor-promoting agent is not statistically significant.

What if we construct the data set as (start, stop] status x1 x2, i.e.,

Obs   ID   Time   Dead   Dose   Npap   start   stop   status
1     1    47     1      1      0      0       27     0
2     1    47     1      1      5      27      34     0
3     1    47     1      1      6      34      37     0
4     1    47     1      1      8      37      41     0
5     1    47     1      1      10     41      43     0
6     1    47     1      1      10     43      45     0
7     1    47     1      1      10     45      46     0
8     1    47     1      1      10     46      47     1
9     2    71     1      1      0      0       27     0
10    2    71     1      1      0      27      34     0
11    2    71     1      1      0      34      37     0
12    2    71     1      1      0      37      41     0
13    2    71     1      1      0      41      43     0
14    2    71     1      1      0      43      45     0
15    2    71     1      1      0      45      46     0
16    2    71     1      1      0      46      47     0
17    2    71     1      1      1      47      49     0
18    2    71     1      1      1      49      50     0
19    2    71     1      1      1      50      51     0
20    2    71     1      1      1      51      53     0
21    2    71     1      1      1      53      65     0
22    2    71     1      1      1      65      67     0
23    2    71     1      1      1      67      71     1

proc phreg data=three;
model (start stop)*status(0)=Dose NPap;
run;

Model Information
Data Set             WORK.THREE
Dependent Variable   start
Dependent Variable   stop
Censoring Variable   status
Censoring Value(s)   0
Ties Handling        BRESLOW

Summary of the Number of Event and Censored Values
Total   Event   Censored   Percent Censored
412     25      387        93.93

Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Without Covariates   With Covariates
-2 LOG L    166.793              143.269
AIC         166.793              147.269
SBC         166.793              149.707

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   23.5243      2    <.0001
Score              28.0498      2    <.0001
Wald               21.1646      2    <.0001

Analysis of Maximum Likelihood Estimates
Variable   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq   Hazard Ratio
Dose       1    0.06885    0.05620     1.5010       0.2205       1.071
Npap       1    0.11714    0.02998     15.2705      <.0001       1.124

R code

coxph1 <- coxph(Surv(start, stop, status) ~ dose+npap, method='breslow', data=pap.data)
n= 412
       coef     exp(coef)   se(coef)   z      p
dose   0.0689   1.07        0.0562     1.23   2.2e-01
npap   0.1172   1.12        0.0300     3.91   9.3e-05

       exp(coef)   exp(-coef)   lower .95   upper .95
dose   1.07        0.933        0.96        1.20
npap   1.12        0.889        1.06        1.19

Rsquare= 0.055 (max possible= 0.333)
Likelihood ratio test= 23.5 on 2 df, p=7.8e-06
Wald test            = 21.2 on 2 df, p=2.53e-05
Score (logrank) test = 28.1 on 2 df, p=8.11e-07

c. DIEP study

To investigate whether early pregnancy losses increase with marked hyperglycemia in diabetic pregnancy, the DIEP (Diabetes In Early Pregnancy) study was conducted in five academic centers, and the association between the glycated protein level and fetal loss during the first trimester was examined. The diagnosis of pregnancy was made during the first week of missed menses by plasma HCG determination. Diabetic subjects were then admitted to a metabolic ward for monitoring, and educated in home glucose monitoring, diary keeping, and insulin adjustment.
Non-diabetic control subjects were screened for gestational diabetes at 26 weeks gestation, and were excluded from the control cohort if positive. Glycated protein measurements were performed in 429 control and 389 diabetic pregnancies. The methods of early pregnancy diagnosis, pregnancy dating and assessment of pregnancy loss have been described in detail elsewhere. To assess the relationship between the protein level and early pregnancy loss, a Cox model with a time-dependent covariate, glycated protein, was employed on the rationale that pregnancy loss in a given time interval might be affected by the values of the glycated protein in that interval. Obs IDNO week loss GROUP gp1 gp2 gp3 1 2 3 5 6 9 10 11 12 13 14 15 ... 431 433 434 435 437 440 443 446 12024 12026 12033 12036 12037 12045 12047 12052 12059 12071 12072 12078 14 13 13 2 2 13 13 13 13 3 11 11 0 0 0 1 1 0 0 0 0 1 0 0 CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL CONTROL -0.32547 -0.74759 -0.69945 -0.05110 -0.16817 -0.16336 0.70871 -1.13056 -1.23302 . -0.55763 -0.66517 -0.16690 -0.56695 0.64185 -0.11780 1.74174 -0.03750 -0.13455 -0.70450 -1.65499 -0.89960 -0.81389 1.54764 0.73009 . -0.50187 . . -1.16834 -0.48167 -0.50187 -0.11166 . -1.45108 0.79067 11004 11006 11007 11009 11012 11017 11021 11031 14 2 11 2 12 13 12 11 0 1 0 1 0 0 0 0 DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC 1.17303 -0.18828 . 2.88259 -0.18530 0.75729 -0.81090 1.21524 -0.13306 -0.81389 -1.00084 1.38556 1.83879 . -0.19091 -0.36447 0.20499 . 0.20499 . -0.88559 -0.72403 0.54631 -0.83542 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 447 448 459 461 462 463 510 11048 11056 11107 11110 11111 11114 21067 10 12 1 2 12 3 3 0 0 1 1 0 1 1 DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC DIABETIC 0.58591 -0.44238 1.49422 2.27084 3.07254 . 1.97591 -0.84624 1.35355 . . 2.25934 3.48863 1.55105 0.85126 -0.76442 . . 0.91916 2.76988 0.49886 proc phreg; model week*loss(0)=grp zgp1 zgp1_s / ties=efron; array gp(*) gp1-gp3; zgp1=gp[week]; zgp1_s=zgp1*zgp1; run; Analysis of Maximum Likelihood Estimates Variable grp zgp1 zgp1_s DF Parameter Estimate Standard Error Chi-Square Pr > ChiSq Hazard Ratio 1 1 1 -0.14965 -0.18885 0.06494 0.29132 0.10914 0.02243 0.2639 2.9943 8.3847 0.6074 0.0836 0.0038 0.861 0.828 1.067 127 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 128 d. Time-dependent covariates at irregular intervals E2491 is a Phase III clinical trial to test the effect of all-trans-retinoic acid (ATRA) on acute promyelocytic leukemia (APL). 350 patients with previously untreated APL were randomly assigned to receive ATRA or daunorubicin plus cytarabine (Chemo) as induction treatment. Patients who had a complete remission received consolidation therapy consisting of one cycle of treatment identical to the induction chemotherapy, then high-dose cytarabine plus daunorubicin. Patients still in complete remission after two cycles of consolidation therapy were then randomly assigned to maintenance treatment with ATRA or to observation (Obs). INDUCTION CROSSOVER MAINTENANCE REREGISTRATION A: D: F: I: Chemo (> 3 yo), B: Chemo (<= 3 yo), C: ATRA Chemo (> 3 yo), E: Chemo (<= 3 yo) ATRA, G: Obs (direct), H: Obs ATRA extension E2491 Schema Chemo ATRA CR 1st Rando ATRA Consol. 2nd Rando Obs Some years after the study closure, a question arose whether there is a difference in survival between two types of morphology (M3 vs. 
M3v), and whether the treatment effect is different between two types of morphology in APL. To address this question, the data are constructed in two ways: i) treat the treatment as a time-dependent variable in the array statement in PHREG, ii) fit a model with the time-dependent variable using the counting process style of input. DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 1. Time-dependent at irregular intervals pt_id t1 t2 t3 t4 atra1 atra2 atra3 atra4 1 0 . 4.10678 . 1 . 1 . 2 0 1.54415 . . 1 0 . . 3 0 1.60986 7.81930 . 1 0 0 . 4 0 . . . 0 . . . 5 0 . 4.79671 . 0 . 0 . 6 0 . 5.65092 . 1 . 0 . 7 0 . . . 0 . . . 8 0 . 4.56674 . 1 . 1 . 9 0 . 4.23819 . 0 . 0 . 10 0 . 4.10678 . 0 . 1 . 11 0 . 5.88090 . 1 . 0 . 12 0 . . . 1 . . . 13 0 . . . 0 . . . 14 0 . . . 0 . . . 15 0 . . . 0 . . . 16 0 . 5.51951 . 1 . 0 . 17 0 . 6.76797 . 1 . 1 . 18 0 . 6.47228 . 0 . 0 . 19 0 . 6.17659 . 1 . 0 . 20 0 . . . 1 . . . ..... proc phreg; class m3v(ref=’0’) sex(ref=’2’) wbc_cat(ref=’0’) plt40k(ref=’0’); model os_t*os(0)=age60 sex wbc_cat plt40k hgb m3v atra; array tt{*} t1-t4; array a{*} atra1-atra4; do i=1 to 4; if os_t ge tt[i] and tt[i] ne . then atra=a[i]; end; run; os 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 0 0 1 0 1 129 os_t 122.513 1.347 26.119 172.025 13.602 166.604 0.526 165.815 5.290 11.170 173.897 0.657 0.296 18.333 10.448 132.435 171.696 26.448 162.793 0.263 m3v 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim Analysis of Maximum Likelihood Estimates Parameter Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq age60(>=60 vs <60) 0.60284 0.19821 9.2504 0.0024 sex (M vs F) 0.42957 0.16295 6.9494 0.0084 wbc_cat (>10 vs 20-50) 0.42479 0.22268 3.6389 0.0564 wbc_cat (>10 vs >=50K) 0.50615 0.28911 3.0650 0.0800 plt40k (>=40K vs <40K) -0.35150 0.17253 4.1509 0.0416 Hgb 0.03076 0.01239 6.1641 0.0130 m3v (M3V vs M3) 0.25435 0.22338 1.2966 0.2548 atra (ATRA vs no ATRA) -0.42493 0.16354 6.7516 0.0094 2. Counting process style input pt death trt atra 1 0 C 1 1 0 F 1 2 1 C 1 3 0 C 1 3 0 D 0 3 1 G 0 4 0 A 0 5 0 A 0 5 1 G 0 6 0 C 1 6 0 G 0 7 1 A 0 8 0 C 1 8 0 F 1 9 0 A 0 9 1 G 0 11 0 A 0 11 1 F 1 start 0.0 4.1 0.0 0.0 1.6 7.8 0.0 0.0 4.8 0.0 5.7 0.0 0.0 4.6 0.0 4.2 0.0 4.1 stop 4.1 122.5 1.3 1.6 7.8 26.1 172.0 4.8 13.6 5.7 166.6 0.5 4.6 165.8 4.2 5.3 4.1 11.2 130 Hazard Ratio 1.827 1.537 1.529 1.659 0.704 1.031 1.290 0.654 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 12 12 13 14 15 16 17 17 18 .... 0 0 1 1 1 1 0 0 0 C G C A A A C G C 1 0 1 0 0 0 1 0 1 0.0 5.9 0.0 0.0 0.0 0.0 0.0 5.5 0.0 5.9 173.9 0.7 0.3 18.3 10.4 5.5 132.4 6.8 proc phreg multipass; class atra(ref=’0’) m3v(ref=’0’) sex(ref=’2’) wbc_cat(ref=’0’) plt40k(ref=’0’); model (start stop)*os(0)=age60 sex wbc_cat plt40k hgb m3v atra / rl; run; ** You get the same result as above. 131 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 132 proc phreg multipass; ** equal treatment effect to both M3V and M3? class atra(ref=’0’) m3v(ref=’0’) sex(ref=’2’) wbc_cat(ref=’0’); model (start stop)*os(0)=age60 sex wbc_cat hgb m3v atra m3v|atra / rl; * if just ’rl’ then CL from Wald test is default; hazardratio ’H1’ m3v / diff=ref cl=both; * ’both’ gives CL from both Wald and PL; hazardratio ’H2’ atra / diff=ref cl=both; contrast ’C1’ atra 0 m3v 1 m3v*atra 0, atra 0 m3v 1 m3v*atra 1 / estimate=exp; ** exp specifies the linear predictors be estimated in the exponential scale. 
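* Added note: CONTRAST 'C1' row 1 estimates exp(beta_m3v), the M3V vs M3 HR when atra=0;
* row 2 estimates exp(beta_m3v + beta_m3v*atra), the M3V vs M3 HR when atra=1;
* with estimate=exp these reproduce the 'H1' hazardratio results shown below;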
run;

**Note: the result of 'H1' is the same as 'C1'.

H1: Hazard Ratios for m3v
                          Point      95% Wald             95% Profile Likelihood
Description              Estimate   Confidence Limits     Confidence Limits
m3v 1 vs 0 At atra=0       1.069    0.605    1.888        0.583    1.834
m3v 1 vs 0 At atra=1       1.970    1.055    3.680        1.019    3.586

H2: Hazard Ratios for atra
                          Point      95% Wald             95% Profile Likelihood
Description              Estimate   Confidence Limits     Confidence Limits
atra 1 vs 0 At m3v=0       0.587    0.410    0.842        0.407    0.837
atra 1 vs 0 At m3v=1       1.083    0.521    2.249        0.515    2.260

Contrast Test Results
                    Wald
Contrast   DF   Chi-Square   Pr > ChiSq
C1          2       4.5307       0.1038

Contrast Rows Estimation and Testing Results
                               Standard                                    Wald
Contrast  Type  Row  Estimate     Error  Alpha  Confidence Limits    Chi-Square  Pr > ChiSq
C1        EXP     1    1.0688    0.3103   0.05  0.6049    1.8882        0.0525      0.8188
C1        EXP     2    1.9699    0.6281   0.05  1.0545    3.6799        4.5222      0.0335

5.12 Model Selection and Assessment
• It is challenging to fit models to data without knowing what the true model is or might be.
• Choosing an appropriate model is a fundamental difficulty in statistical analysis.
• Doing it properly requires knowledge of the disease, an in-depth understanding of the statistics, and data analysis experience.

5.12.1 Automatic Variable Selection
• Forward: variables are added to the model one at a time. At each step, the variable added is the one that gives the largest decrease in −2Log L̂.
• Backward: first fit the largest model, then eliminate variables one by one using the −2Log L̂ values.
• Stepwise: same as forward except that a variable that has been included in the model can be removed at a later time.
• Best subset: available in SAS using the score test.
• SAS uses the Wald test for F/B/S selection; STATA uses the score test.
• An automatic variable selection process leads to the identification of one particular subset, but there may be a number of equally good subsets.
• The result also depends on the particular selection process (B/F/S) chosen and on the stopping rule for the inclusion/exclusion criteria.

5.12.2 −2Log(L)
• For a given set of data, the larger the value of the maximized likelihood (L̂), the better the agreement between the model and the observed data.
• Because L̂ is the product of a series of conditional probabilities, L̂ is bounded by 1; i.e., −2Log(L̂) is positive.
• For a given data set, the smaller the value of −2Log(L̂), the better the model. However, −2Log(L̂) cannot be used on its own as a measure of model adequacy because it depends on the number of observations in the data set and it decreases as the number of parameters increases (conversely, L̂ increases as the number of parameters increases).

5.12.3 Akaike Information Criterion (AIC)
• Accepting the partial likelihood as the method for measuring how well a model fits the data, i.e., Accuracy Measure = E[log likelihood of the fitted model], AIC is an unbiased estimator of −2Log L:
    AIC = −2Log L̂ + 2p
where −2Log L̂ is the measure of inaccuracy and 2p is the penalty; L̂ is the maximized likelihood function and p is the number of parameters in the model.
• Since Log(L̂) increases as the number of parameters increases, 2p serves as a penalty term. For the mathematical derivation of AIC, check Bozdogan (Psychometrika, 1987).
• AIC is designed to approximate the true model so as to minimize the average estimation error; thus the smaller the value, the better. A minimal R sketch comparing AIC across candidate Cox models is shown below.
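The following sketch is an added illustration, not part of the original slides: it fits three nested Cox models to the lung data shipped with the survival package (placeholder variables, not the PBC variables of Example 10B) and compares their AIC values on the same complete-case data set.

library(survival)

# AIC comparisons require the SAME observations in every candidate model,
# so build a complete-case data set first
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

fit1 <- coxph(Surv(time, status) ~ age, data = dat)
fit2 <- coxph(Surv(time, status) ~ age + sex, data = dat)
fit3 <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = dat)

# AIC = -2 log(partial likelihood) + 2p; the smallest value is preferred
AIC(fit1, fit2, fit3)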
• Used when comparing different models for the SAME data set (make sure any missing data are excluded prior to model selection). The model with the smallest AIC value is chosen as the best model for the data. See Example 10B.
• For a finite sample size, use AICc:
    AICc = AIC + 2p(p + 1)/(n − p − 1)
where the second term is a bias correction.
• STATA has a function swaic that sequentially adds or deletes predictors depending on whether they improve AIC.
• AIC can also be used to select a best transformation (e.g., bilirubin vs log(bilirubin) in the PBC data).
• Also, check Cp, BIC, and a penalized model with the L1 lasso penalty (Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, 2nd edition, Springer, 2009).
• See Steyerberg et al (Epidemiology 2010). They assessed the performance of traditional and novel prediction models.

Example 10B.
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=f slstay=0.1 details;
  title "Forward";
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=b slstay=0.1 details;
  title "Backward";
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=s slentry=0.1 slstay=0.1 details;
  title "Stepwise";
proc phreg data=pbc;
  model fu_days*status2(0) = age sex edema log_bili protime albumin /
        selection=score best=3 details;
  title "Best";
-----------------------------------------------------------------------
Forward

Summary of Forward Selection
Step  Effect Entered  DF  Number In  Score Chi-Square  Pr > ChiSq
1     log_bili         1      1          180.9505        <.0001
2     edema            1      2           37.1781        <.0001
3     age              1      3           31.1436        <.0001
4     albumin          1      4           13.5624        0.0002
5     protime          1      5            8.1288        0.0044
-----------------------------------------------------------------------
Backward

Analysis of Maximum Likelihood Estimates
               Parameter   Standard
Parameter  DF   Estimate      Error   Chi-Square   Pr > ChiSq   Hazard Ratio
age         1    0.04020    0.00767      27.4635       <.0001        1.041
edema       1    0.93643    0.27103      11.9379       0.0006        2.551
log_bili    1    0.86811    0.08290     109.6515       <.0001        2.382
protime     1    0.17448    0.06111       8.1529       0.0043        1.191
albumin     1   -0.75159    0.20946      12.8756       0.0003        0.472

Summary of Backward Elimination
Step  Effect Removed  DF  Number In  Wald Chi-Square  Pr > ChiSq
1     sex              1      5           0.9398         0.3323
-----------------------------------------------------------------------
Stepwise

Summary of Stepwise Selection
Step  Effect Entered  Removed  DF  Number In  Score Chi-Square  Pr > ChiSq
1     log_bili                  1      1          180.9505        <.0001
2     edema                     1      2           37.1781        <.0001
3     age                       1      3           31.1436        <.0001
4     albumin                   1      4           13.5624        0.0002
5     protime                   1      5            8.1288        0.0044
-----------------------------------------------------------------------
Best

Best Subset Regression Models Selected by Score Criterion
Number of       Score
Variables    Chi-Square   Variables Included in Model               -2LogL       AIC
1             180.9505    log_bili                                 1584.379   1586.379
1             103.6249    edema                                    1672.175   1674.175
1              70.1695    albumin                                      .          .
2             249.9111    edema log_bili                           1555.940   1559.940
2             218.8702    age log_bili                             1550.672   1554.672
2             211.2429    log_bili albumin                         1555.098   1559.098
3             277.0052    age edema log_bili                       1525.644   1531.644
3             265.8361    edema log_bili albumin                   1538.148   1544.148
3             258.8539    edema log_bili protime                   1548.633   1554.633
4             289.2636    age edema log_bili albumin               1512.478   1520.478
4             285.2958    age edema log_bili protime               1518.087   1526.087
4             277.0054    age sex edema log_bili                   1525.154   1533.154
5a            296.8619    age edema log_bili protime albumin       1505.610   1515.610 (**)
5             289.3640    age sex edema log_bili albumin           1511.606   1521.606
5             285.3139    age sex edema log_bili protime           1517.526   1527.526
6a            296.8902    age sex edema log_bili protime albumin   1504.711   1516.711
-----------------------------------------------------------------------------------

5.12.4 Harrell's C-Index
• The performance of a mathematical model predicting a dichotomous outcome is characterized by two types of measures: discrimination and calibration.
• Discrimination quantifies the ability of the model to correctly classify subjects into one of two categories.
• Calibration describes how closely the predicted probabilities agree numerically with the actual outcomes.
• A measure of discrimination used for a dichotomous outcome is the area under the receiver operating characteristic (ROC) curve, c.
• Measuring discrimination in survival analysis is more difficult and ambiguous than in logistic regression. In survival, we expect our model to correctly distinguish between those with shorter survival times and those with longer survival times.
• Harrell's C-Index is an extension of the area under the ROC curve to survival data to measure this discriminatory ability.
• Harrell's C-Index is the proportion of usable pairs in which the predictions and outcomes are concordant.
• Usable pairs:
  – Both event times are observed
  – Ti < Cj, where i ≠ j
• Unusable pairs:
  – Both events are censored
  – Ti > Cj, where i ≠ j
• Probability of concordant and discordant pairs:
    πc = P[(Ti < Tj & Zi < Zj) or (Ti > Tj & Zi > Zj)] = P(Ti < Tj & Zi < Zj) + P(Ti > Tj & Zi > Zj)
    πd = P[(Ti < Tj & Zi > Zj) or (Ti > Tj & Zi < Zj)] = P(Ti < Tj & Zi > Zj) + P(Ti > Tj & Zi < Zj)
where Ti and Tj denote the survival times for subjects i and j, and Zi and Zj denote the predicted probabilities of survival for subjects i and j.
• Proportion of unusable pairs: πu = 1 − (πc + πd)
• C-index:
    C = πc/(πc + πd) = πc/(1 − πu)
(a small pair-counting sketch appears after the General Comments below)

5.12.5 Integrated Brier Score (IBS)
• The expected Brier score is a mean squared error of prediction.
• It is a measure of prediction error; a better predictor is indicated by a lower score.
• Also check NRI (Net Reclassification Improvement) and IDI (Integrated Discrimination Improvement).

5.12.6 General Comments
• Be mindful of the purpose of regression analysis:
  – To identify potential prognostic factors for a particular failure
  – To assess a prognostic factor of interest in the presence of other potential prognostic factors
• Regression analysis depends on which of these purposes you are aiming at.
• Model selection is not an exact science. In reality, there may not be one best model.
• The decision on the most appropriate model should be based on both statistical and non-statistical considerations.
• Consider the trade-off between variance and bias.
• PH modelling is also a regression analysis. Thus, in addition to checking assumptions and assessing the model fit, one also has to consider issues of collinearity, overfitting, which interaction terms need to be included, and so on.
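To make the pair-counting definition of Harrell's C concrete, here is a minimal added R sketch (not from the original slides) that computes C by brute force over all usable pairs of a toy data set. A risk score plays the role of the prediction, so higher risk should go with shorter survival; with a recent survival package the result can be cross-checked against concordance().

library(survival)

# Toy data: observed time, event indicator (1=event, 0=censored), and a risk score
time   <- c(2, 5, 7, 9, 12, 15)
status <- c(1, 1, 0, 1,  0,  1)
risk   <- c(2.1, 0.8, 0.9, 1.2, 0.4, 0.3)

conc <- disc <- tied <- 0
n <- length(time)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    a <- if (time[i] < time[j]) i else j          # member of the pair with the shorter time
    b <- if (time[i] < time[j]) j else i
    if (time[a] < time[b] && status[a] == 1) {    # usable pair: the shorter time is an observed event
      if (risk[a] > risk[b])      conc <- conc + 1  # higher risk failed first: concordant
      else if (risk[a] < risk[b]) disc <- disc + 1  # discordant
      else                        tied <- tied + 1  # tied predictions count 1/2
    }
  }
}
(C <- (conc + 0.5 * tied) / (conc + disc + tied))   # 9/11 = 0.818 for these data

# Cross-check; reverse=TRUE because a larger risk score should mean shorter survival
concordance(Surv(time, status) ~ risk, reverse = TRUE)$concordance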
DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 142 NOTE: For the Cox model, the default calculation correlates the linear predictor with survival time. A large linear predictor (i.e., large log hazard) means shorter survival time. To obtain the larger value for a longer survival time, negate ’pred’ before computing C library(Hmisc) library(survival) set.seed(333) x1 <- rnorm(200) x2 <- x1 + rnorm(200) d.time <- rexp(200) + (x1 - min(x1)) cens <- runif(200, 0.5, 4) death <- d.time <= cens o.time <- pmin(d.time, cens) cox1 <- coxph(Surv(o.time, death) ~ x1 + x2) print(cox1) pred <- predict(cox1) r1 <- rcorr.cens(pred, Surv(o.time, death)) print(r1) r2 <- rcorr.cens(-pred, Surv(o.time, death)) print(r2) print(cbind(x1, x2, d.time,cens,death,o.time, pred, -pred)) > print(cox1) Call: coxph(formula = Surv(o.time, death) ~ x1 + x2) coef exp(coef) se(coef) z p x1 -1.963 0.14 0.313 -6.27 3.6e-10 x2 0.285 1.33 0.167 1.70 8.8e-02 Likelihood ratio test=66 on 2 df, p=4.77e-15 n= 200, number of events= 38 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim > pred <- predict(cox1) > r1 <- rcorr.cens(pred, Surv(o.time, death)) > print(r1) C Index Dxy S.D. n missing uncensored Relevant Pairs 8.761416e-02 -8.247717e-01 3.955424e-02 2.000000e+02 0.000000e+00 3.800000e+01 7.008000e+03 Uncertain 3.279200e+04 > r2 <- rcorr.cens(-pred, Surv(o.time, death)) > print(r2) C Index Dxy S.D. n missing uncensored Relevant Pairs 9.123858e-01 8.247717e-01 3.955424e-02 2.000000e+02 0.000000e+00 3.800000e+01 7.008000e+03 Uncertain 3.279200e+04 > print(cbind(x1, x2, d.time,cens,death,o.time, pred, -pred)) x1 x2 d.time cens death o.time pred -pred [1,] -0.08281164 -0.9941912071 2.39364016 3.3066471 1 2.39364016 -0.030526402 0.030526402 [2,] 1.93468099 1.9228580779 5.15371891 1.5965431 0 1.59654312 -3.159974636 3.159974636 [3,] -2.05128979 -2.6469355391 2.32330035 2.9687605 1 2.32330035 3.362809753 -3.362809753 [4,] 0.27773897 1.4638776176 5.70229234 1.5630664 0 1.56306638 -0.038165802 0.038165802 143 Concordant 6.140000e+02 Concordant 6.394000e+03 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim library(Hmisc) library(survival) attach(subset(data_102210, alln==0)) # Oct 28, 2010 -----------------------------------------------------------------------------------cox1 <- coxph(Surv(tos, os)~age50+femdon+minifg+tcd+wbc50+cr6+cr1+kps90+pbsc +cmvpos+mudgp+mm+secfg+pbsc.t+factor(mrc.cat)) pred1 <- predict(cox1) c1 <- rcorr.cens(-pred1, Surv(tos, os)) print(c1) C Index Dxy S.D. n missing 7.641054e-01 5.282108e-01 2.345114e-02 7.490000e+02 0.000000e+00 uncensored Relevant Pairs Concordant Uncertain 3.830000e+02 4.056600e+05 3.099670e+05 1.543600e+05 One way to calculate 95% CI: C hat +/- 1.96*SD/sqrt(n) 0.7641054 +/- 1.96*0.02345114/sqrt(749) NOTE: For another confidence interval, you can check Hajime’s R package "SurvC" -----------------------------------------------------------------------------------cox2 <- coxph(Surv(tos, os)~age50+femdon+minifg+tcd+wbc50+cr6+cr1+kps90+pbsc +cmvpos+mudgp+mm+secfg+pbsc.t+factor(dfci.cat)) pred2 <- predict(cox2) c2 <- rcorr.cens(-pred2, Surv(tos, os)) print(c2) C Index Dxy S.D. 
           n      missing
7.64029e-01 5.28058e-01 2.33881e-02 7.49000e+02 0.00000e+00
  uncensored Relevant Pairs   Concordant    Uncertain
3.83000e+02   4.05660e+05  3.09936e+05  1.54360e+05
------------------------------------------------------------------------------------
cox3 <- coxph(Surv(tos, os)~age50+femdon+minifg+tcd+wbc50+cr6+cr1+kps90+pbsc
              +cmvpos+mudgp+mm+secfg+pbsc.t+factor(eortc.cat2))
pred3 <- predict(cox3)
c3 <- rcorr.cens(-pred3, Surv(tos, os))
print(c3)
     C Index          Dxy         S.D.            n      missing
7.616378e-01 5.232756e-01 2.344496e-02 7.490000e+02 0.000000e+00
  uncensored Relevant Pairs   Concordant    Uncertain
3.830000e+02   4.056600e+05 3.089660e+05 1.543600e+05

Example of Brier Score
library(survcomp)
attach(subset(data_102210, alln2 == 0))
pbsc.t <- pbsc * tos
cox1 <- coxph(Surv(tos, os) ~ age50 + femdon + minifg + tcd + wbc50 + cr6 + cr1 +
              kps90 + pbsc + cmvpos + mudgp + mm + secfg + pbsc.t + factor(mrc.cat))
pred <- predict(cox1)
dd <- data.frame(time = tos, event = os, score = pred)
sb <- sbrier.score2proba(data.tr = dd, data.ts = dd, method = "cox")
print(sb)
----------------------------------
1. MRC     $bsc.integrated   [1] 0.1541683
2. DFCI    $bsc.integrated   [1] 0.1531032
3. EORTC   $bsc.integrated   [1] 0.1538132

5.13 Multiple Events per Subject
• Multiple events of the same type or of different types do occur in clinical trials. Examples include both death and recurrence in cancer clinical trials, multiple myocardial infarctions (MI) in cardiovascular disease, and repeated episodes of infectious diarrhea caused by E. coli among young children in developing countries (a water intervention study).
• A major issue in this situation is intrasubject correlation.
• The ordinary Cox model assumes independence, i.e., V(β̂) = {E[I(β)]}⁻¹. However, when multiple events occur within a subject, this assumption does not hold, and an appropriate correction is needed.
• One way to correct this is to use a robust variance estimate (D = D̃′D̃) for correlated data using the jackknife method.
• The jackknife method leaves out one subject at a time rather than one observation at a time. This can be obtained in coxph by specifying cluster(id) or robust = T, or by specifying covsandwich(aggregate) in PHREG.
• In this section, we focus on multiple events of the same type. Multiple events of different types will be discussed in Section 6.
• There are several models in the literature. Here we discuss three marginal models for ordered events: the Andersen-Gill (AG) model, the marginal model by Wei, Lin, Weissfeld (WLW), and the conditional model by Prentice, Williams, Peterson (PWP).
• The AG model is
    Yi(t) λ0(t) exp[Xi(t)β]
where Yi(t) is an at-risk indicator. In the Cox model, subject i is no longer at risk once the subject fails at time t. In the AG model for recurrent events, Yi(t) remains in the model, and Yi(t) = 1 for each new event. This model assumes independence of the multiple events within a subject, i.e., the numbers of events in nonoverlapping time intervals are independent given the covariates ("independent increments"). The time scale is "time since entry".
• The WLW model is
    Yij(t) λ0j(t) exp[Xi(t)βj]
for the jth event of the ith subject. This model treats the ordered outcome data set as though it were unordered competing risks data.
Unlike the AG model, this model allows a separate underlying hazard for each event. The analysis is on the “time from study entry” scale and all the time intervals start at zero. i.e., if there is a maximum 3 events, then there will be 3 strata in the data set and in each stratum, time starts from zero. • The PWP conditional model assumes that a subject cannot be at risk for event k + 1 until event k occurs. The hazard function for the ith event is identical to the hazard function in the WLW model, except for the definition of the at risk indicator Yij (t). Yij (t) is zero until the j − 1st event, then becomes 1. DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim Schematic Illustration of the three models Cox model PWP model AG model WLW model 148 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 149 Example 11. Multiple Events per Subject a. Bladder Cancer Data The bladder cancer data listed in Wei, Lin, and Weissfeld (1989, JASA 84, 1065-71). The data consist of 86 patients with superficial bladder tumors, which were removed when they entered the study. Forty eight of these patients were randomized into the placebo group, and 38 into the thiotepa group. Many patients have multiple recurrences of tumors in the study, and new tumors were removed at each visit. The data set contains the first four recurrences of the tumor for each patient, and each recurrence time was measured from the patient’s entry time into the study. The input data consist of eight variables: • ID (patient’s identification) • TRT (treatment group, where 1=placebo and 2=thiotepa) • NUMBER (number of initial tumors) • SIZE (initial tumor size) • VISIT (event number, where 1=first recurrence, 2= second recurrence, and so on) • TIME (followup time) • T1, T2, T3, and T4 : times of the four possible recurrences of the bladder tumor. • TSTART (time of the (k − 1)st recurrence if VISIT=k, or the the entry time if VISIT=1) • TSTOP (time of the kth recurrence if VISIT=k) • STATUS (event status, where 1=recurrence, 0=censored) data bladder; input trt time number size @27 t1 @31 t2 @35 t3 @39 t4; id + 1; cards; 1 0 1 1 1 1 1 3 1 4 2 1 1 7 1 1 1 10 5 1 1 10 4 1 6 1 14 1 1 1 18 1 1 1 18 1 3 5 1 18 1 1 12 16 1 23 3 3 1 23(*) 1 3 10 15 1 23 1 1 3 16 23 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 1 1 1 1 1 ... 26 26 26 28 29 1 8 1 1 1 2 1 4 2 4 1 2 25 26 Let us consider ID=12 AG Interval Stratum (0, 3] 1 (3, 10] 1 (10, 15] 1 WLW Interval Stratum (0, 3] 1 (0, 10] 2 (0, 15] 3 PWP: interval Interval Stratum (0, 3] 1 (3, 10] 2 (10, 15] 3 PWP: gap time Interval Stratum (0, 3] 1 (0, 7] 2 (0, 5] 3 1. Data layout for the WLW marginal model: id=1 is deleted due to zero futime id 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 time 1 1 1 1 4 4 4 4 7 7 7 7 10 10 10 10 6 10 10 status 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 trt 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 size 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 number 1 1 1 1 2 2 2 2 1 1 1 1 5 5 5 5 4 4 4 visit 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 150 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 2. 
Data layout for the AG and PWP conditional models Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 id 2 3 4 5 6 6 7 8 9 9 10 10 10 11 12 12 12 13 13 13 14 14 tstart 0 0 0 0 0 6 0 0 0 5 0 12 16 0 0 10 15 0 3 16 0 3 tstop 1 4 7 10 6 10 14 18 5 18 12 16 18 23 10 15 23 3 16 23 3 9 status 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 1 1 1 1 trt 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 size 3 1 1 1 1 1 1 1 3 3 1 1 1 3 3 3 3 1 1 1 1 1 number 1 2 1 5 4 4 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3 3 visit 1 1 1 1 1 2 1 1 1 2 1 2 3 1 1 2 3 1 2 3 1 2 151 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 3. R code and coxph outputs for the three models a. time to first event (Cox model) Cox <- coxph(Surv(time, status) ~ trt+size+number, data=cox.dat) ***time: time to the first event n= 85 coef exp(coef) se(coef) z p trt -0.5260 0.591 0.3158 -1.665 0.0960 size 0.0696 1.072 0.1016 0.685 0.4900 number 0.2382 1.269 0.0759 3.139 0.0017 exp(coef) exp(-coef) lower .95 upper .95 trt 0.591 1.692 0.318 1.10 size 1.072 0.933 0.879 1.31 number 1.269 0.788 1.094 1.47 Rsquare= 0.11 (max possible= 0.987 ) Likelihood ratio test= 9.92 on 3 df, Wald test = 10.5 on 3 df, Score (logrank) test = 11.1 on 3 df, p=0.0193 p=0.0145 p=0.0111 b. WLW model WLW <- coxph(Surv(time, status) ~ trt+size+number+strata(visit)+cluster(id), data=wlw.dat) n= 340 coef exp(coef) se(coef) robust se z Pr(>|z|) trt -0.58479 0.55722 0.20105 0.30795 -1.899 0.05756 . size -0.05162 0.94969 0.06973 0.09459 -0.546 0.58526 number 0.21029 1.23404 0.04675 0.06664 3.156 0.00160 ** trt size number exp(coef) exp(-coef) lower .95 upper .95 0.5572 1.7946 0.3047 1.019 0.9497 1.0530 0.7890 1.143 1.2340 0.8103 1.0829 1.406 Rsquare= 0.072 (max possible= 0.924 ) 152 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 153 c. AG model AG <- coxph(Surv(tstart, tstop, status) ~ trt+size+number+cluster(id), data=bladder_AG.dat) n= 190 coef exp(coef) se(coef) robust se z p trt -0.4116 0.663 0.1999 0.2488 -1.655 0.0980 size -0.0411 0.960 0.0703 0.0742 -0.554 0.5800 number 0.1637 1.178 0.0478 0.0584 2.801 0.0051 exp(coef) exp(-coef) lower .95 upper .95 trt 0.663 1.509 0.407 1.08 size 0.960 1.042 0.830 1.11 number 1.178 0.849 1.050 1.32 Rsquare= 0.074 (max possible= Likelihood ratio test= 14.7 on Wald test = 11.2 on Score (logrank) test = 16.2 on 0.992 ) 3 df, p=0.00213 3 df, p=0.0107 3 df, p=0.00104, Robust = 10.8 p=0.0126 proc phreg covs(aggregate); model (tstart, tstop)*status(0)=trt size number / ties=efron; id id; * ** to calculate the robust sandwich variance estimate for each subject. where tstart < tstop; Analysis of Maximum Likelihood Estimates with Sandwich Variance Estimate Parameter trt size number DF 1 1 1 Parameter Estimate -0.41164 -0.04108 0.16367 Standard Error 0.24152 0.07228 0.05691 StdErr Ratio 1.208 1.028 1.191 Chi-Square 2.9048 0.3230 8.2722 Pr > ChiSq 0.0883 0.5698 0.0040 Hazard Ratio 0.663 0.960 1.178 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 154 d. 
conditional model PWP <- coxph(Surv(tstart, tstop, status) ~ trt+size+number+strata(visit)+cluster(id), data=bladder_AG.dat) n= 190 coef exp(coef) se(coef) robust se z p trt -0.3335 0.716 0.2162 0.2048 -1.628 0.10 size -0.0085 0.992 0.0728 0.0616 -0.138 0.89 number 0.1196 1.127 0.0533 0.0514 2.328 0.02 exp(coef) exp(-coef) lower .95 upper .95 trt 0.716 1.396 0.480 1.07 size 0.992 1.009 0.879 1.12 number 1.127 0.887 1.019 1.25 Rsquare= 0.034 (max possible= Likelihood ratio test= 6.51 on Wald test = 7.26 on Score (logrank) test = 6.91 on 0.965 ) 3 df, p=0.0893 3 df, p=0.064 3 df, p=0.0747, Robust = 8.83 proc phreg covs(aggregate); ** PWP model; strata visit; model (tstart, tstop)*status(0)=trt size number / ties=efron; id id; where tstart < tstop; Analysis of Maximum Likelihood Estimates with Sandwich Variance Estimate Parameter Standard StdErr Parameter DF Estimate Error Ratio Chi-Square trt 1 -0.33349 0.19727 0.913 2.8579 size 1 -0.00849 0.06018 0.827 0.0199 number 1 0.11962 0.04971 0.932 5.7894 p=0.0317 Pr > ChiSq 0.0909 0.8878 0.0161 Hazard Ratio 0.716 0.992 1.127 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 155 Example 12. Weighted Cox regression Analysis: E4494 E4494 is a phase III trial of CHOP versus R-CHOP in older patients with diffuse large B-cell lymphoma (DLBCL). The study was a two stage randomized design with the first randomization to either CHOP(cyclophosphamide, doxorubicin, vincristine, prednisone) or R-CHOP for the induction treatment and the second randomization to maintenance rituximab (MR) or observation (Obs) for remitters. The primary endpoint was failure-free survival (FFS), defined as time from randomization to relapse, non-protocol treatment or death. There were two study questions: i) which induction treatment is better; ii) whether rituximab should be given in maintenance. The results were reported by Habermann et al.(JCO, 2006). The schema is shown below. R−CHOP MR CR/PR 1st Rando CHOP • Results in the published paper (Habermann et al. JCO, 2006) Induction no. of pts Maintenance N RR rate 3yr FFS not in main N 2yr FFS R-CHOP 267 77% 53% 90 MR 177 76% CHOP 279 76% 46% 97 Obs 182 61% 0.009 p-value 0.04 note: 2yr FFS from the second randomization. N 2yr FFS R-CHOP → MR 82 79% R-CHOP → Obs 95 77% CHOP → MR 95 74% CHOP → Obs 87 45% p-value 0.0004 FFS: after the second randomization. 2nd Rando Obs DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 156 • However, the interaction between induction and maintenance therapy was significant because MR improved the outcome after CHOP but not after R-CHOP. i.e., the data suggest that rituximab as part of induction therapy or as maintenance in responding patients result in a significant prolongation of FFS (P=0.0004). • Because of the observed difference in effect of MR according to the type of induction, a secondary analysis was performed to address the induction question without MR. In the secondary analysis, the HR of R-CHOP relative to CHOP in FFS was 0.64 with p=0.003. • Common practice of data analysis in two-stage randomization studies are: i)estimating survival distribution under different induction therapies using all data while ignoring maintenance therapy; ii) estimating postremission survival distribution using only data for individuals receiving maintenance therapy. 
But neither of these approaches addresses the induction and maintenance question properly since a subsequent randomization to maintenance therapy was conducted contingent on their remission status and consent. To remedy this problem, Lunceford et al (2002) and Wahed et al (2004) proposed a weighted approach. • Motivated by Lunceford et al (2002) and Wahed et al (2004), a weighted Cox regression analysis was conducted to compare induction treatments without the confounding effect of maintenance therapy by i) excluding MR patients, ii) roughly doubling the information for patients randomly assigned to observation, iii) using the robust variance estimator. • If patients with MR are simply excluded in the analysis, patients with second randomization would be underrepresented in the comparison of induction therapy. Thus, in the weighted Cox model for the induction comparison, patients with MR were excluded, but patients in the observation arm were weighted by 1.97. • i.e., 1.97=no of Obs / total no of pts in the maintenance therapy−1 = (182/359)−1=1.97 • Robust variance estimator was used by specifying cluster(case). DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim R code: the results shown below are slightly different from the publsihsed numbers due to the update of the data a. naive Cox model applied to 546 patients who were randomized to RCHOP (n=267) or to CHOP(n=279). naive.cox <- coxph(Surv(ttfst1,failst1)~trt1, data=e4494.dat) ** trt1=1 if RCHOP, 0 for CHOP n= 546 coef exp(coef) se(coef) z p trt1 -0.236 0.79 0.111 -2.12 0.034 trt1 exp(coef) exp(-coef) lower .95 upper .95 0.79 1.27 0.635 0.983 Rsquare= 0.008 (max possible= Likelihood ratio test= 4.51 on Wald test = 4.49 on Score (logrank) test = 4.51 on 0.999 ) 1 df, p=0.0337 1 df, p=0.0342 1 df, p=0.0338 b. naive Cox model with exclusion of MR no.MR.cox <- coxph(Surv(ttfst1,failst1)~trt1, data=no.MR.dat) n= 369 (182+187) coef exp(coef) se(coef) z p trt11 -0.299 0.742 0.128 -2.34 0.019 trt11 exp(coef) exp(-coef) lower .95 upper .95 0.742 1.35 0.577 0.953 Rsquare= 0.015 (max possible= Likelihood ratio test= 5.51 on Wald test = 5.48 on Score (logrank) test = 5.52 on 0.999 ) 1 df, p=0.0189 1 df, p=0.0192 1 df, p=0.0188 157 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim c. weighted Cox model with exclusion of MR wt.cox <- coxph(Surv(ttfst1,failst1)~trt1+cluster(case), weight=wt, data=no.MR.dat) n= 369 coef exp(coef) se(coef) robust se z p trt1 -0.362 0.696 0.108 0.136 -2.66 0.0077 trt1 exp(coef) exp(-coef) lower .95 upper .95 0.696 1.44 0.533 0.909 Rsquare= 0.03 (max possible= 1 ) Likelihood ratio test= 11.2 on 1 df, Wald test = 7.1 on 1 df, Score (logrank) test = 11.3 on 1 df, p=0.000809 p=0.0077 p=0.000782, Robust = 7.06 p=0.00788 Note: The p-value in the JCO paper was 0.003, but in this re-analysis, p=0.0077. Since ECOG data change very often after study closure due to the late response or change in pathology or change in eligibility and so on, it is not surprisong to see a slightly different p-value. 158 DFCI/BCB Training Session, Survival Analysis, January-February 2016, by H Kim 159 6.0 Analysis of Competing Risks Data 6.1. Introduction • Competing risks (CR) occur commonly in medical research, although the presence is not always recognized. • Competing risks data are inherent to cancer clinical trials in which failure can be classified by the types. e.g., death from the treatment related toxicity (TRM) and recurrence of disease (relapse). 
• Competing risks arise when individuals can experience any one of J distinct event types, and the occurrence of one type of event prevents the occurrence of the other types or alters the probability of occurrence of the other events.
• Types of CR: 'classic', 'semi-competing risks'
• For the analysis of competing risks data, standard survival analysis should not be applied.
• Parallel to standard survival analysis, competing risks data analysis includes estimation of the cumulative incidence of an event of interest in the presence of competing risks, comparison of cumulative incidence curves in the presence of competing risks, and competing risks regression analysis.

[Figure: Treatment failure after transplantation - left panel: overall treatment failure, NRM+Relapse (1−EFS); right panel: types of treatment failure, NRM and Relapse separately; probability vs. years from transplantation.]

6.2. Mathematical Definitions and Terminologies
6.2.1. Approaches
Failure time in the competing risks setting can be described univariately or multivariately.
• Traditional (latent failure times) approach
  – (T1, · · · , Tk): k latent failure times, where Ti is the time to failure of cause i, i = 1, 2, · · · , k
  – T = min(T1, · · · , Tk), since only one of the failures can occur.
  – Accounting for censoring, the observable quantities are (Y, I), where Y = C if I = 0, and Y = T and I = i if an event of failure type i occurs (i = 1, 2, · · · , k).
• Focused on the cause-specific hazard.
• Because the latent approach is based on multivariate failure times, the cause-specific hazard for an event of interest is derived from joint and marginal survivor functions.
• The joint distribution of competing risks failure times is unidentifiable unless the failure times are independent (Tsiatis 1975, PNAS).
• Even though competing risks are observable, observations of (Y, I) give no information on whether the failure times are independent or not.
• The assumption of independence is untestable and unjustifiable in the competing risks setting, in which the biologic mechanisms among the risks of events may be either unknown or likely interdependent.

[Figure: Cumulative incidence of TRM and Relapse for myeloablative vs. non-myeloablative transplant; probability vs. years from transplantation.]

• Modern approach, based on the subdistribution function
  – T: time to an event
  – C: censoring time
  – Y = min(T, C): observed failure time
  – I = i (i = 1, 2, · · · , k) for failure type i
  – (Y, I): observable quantities
• Focus on the cumulative incidence function of cause i directly.
• No independence assumption.

6.2.2. Definitions
Suppose there are k distinct types of failure.
• Overall hazard function at time t:
    λ(t) = lim_{u→0} (1/u) Prob(t ≤ T < t + u | T ≥ t)
• Cause-specific (CS) hazard:
    λi(t) = lim_{u→0} (1/u) Prob(t ≤ T < t + u, I = i | T ≥ t), i = 1, · · · , k
λi(t) represents the instantaneous rate of failure of type i at time t in the presence of the other failure types.
• CS cumulative hazard function:
    Λi(t) = ∫0^t λi(u) du
• CS survival function:
    Si(t) = exp[−Λi(t)]
• If only one of the failure types can occur for each individual, then
    λ(t) = Σ_{i=1}^k λi(t)   and   S(t) = P(T > t) = exp[−Σ_{i=1}^k Λi(t)]
• Subdensity function for failure i:
    fi(t) = lim_{u→0} (1/u) Prob(t ≤ T < t + u, I = i) = λi(t) S(t), i = 1, · · · , k
Thus
    λi(t) = fi(t)/S(t)    (5)
• Cumulative incidence function (CIF) of cause i:
    Fi(t) = Prob(T ≤ t, I = i) = ∫0^t fi(u) du = ∫0^t λi(u) S(u) du    (6)
for i = 1, · · · , k. This is also called the subdistribution function. As t → ∞, Fi(∞) = Prob(I = i) = pi < 1, where Σ_{i=1}^k pi = 1.
• CIF for cause i ignoring other causes:
    F*i(t) = ∫0^t λi(u) S*i(u) du
where S*i(t) is the cause-specific survival function for cause i obtained by censoring the competing risks; F*i(t) + S*i(t) = 1.
• Because events from causes other than i are treated as censored in S*i(t), S(t) ≤ S*i(t), and thus Fi(t) ≤ F*i(t).
• S*i(t) is what is used in standard survival analysis, and it is biased if there are competing risks.
• Since no one-to-one relationship exists between the cause-specific hazard and the CIF for failure i, the comparison of cause-specific hazards of failure i between different groups can be quite different from the comparison of the cumulative incidence of failure i.
• To be able to directly compare subdistribution functions, Gray (1988, Ann Statist) further defined a hazard function that corresponds to the subdistribution.
• Subdistribution hazard for failure i:
    γi(t) = lim_{u→0} (1/u) Pr{t ≤ T < t + u, I = i | T ≥ t ∪ (T ≤ t ∩ I ≠ i)} = fi(t)/(1 − Fi(t))
• The subdistribution hazard is the probability of observing the event of interest in the next time interval, given that either the event did not occur until that time or the competing-risks event occurred.

6.3. Estimation of Cumulative Incidence Function
• Let 0 < t1 < · · · < tl represent the l ordered distinct failure times for any cause of failure. If t is discrete, the hazard of failing from cause i is
    λi(tj) = Prob(T = tj, I = i)/Prob(T > tj−1), j = 1, · · · , l
and the estimate is
    λ̂i(tj) = dij/nj
where dij is the number of failures of cause i at time tj and nj is the number of subjects at risk just prior to tj.
• Let dj = Σ_{i=1}^k dij and λ̂(tj) = Σ_{i=1}^k λ̂i(tj). Then the KM estimate of the overall survival function (5) is
    Ŝ(t) = Π_{j: tj<t} (1 − λ̂(tj)) = Π_{j: tj<t} (1 − dj/nj)    (7)
• Thus, the estimate of the CIF (6) is
    F̂i(t) = Σ_{j: tj<t} (dij/nj) Ŝ(tj−1)    (8)
note: see Marubini and Valsecchi (1995) for the derivation of the variance of (8).
• But if we use the naive KM method, which ignores competing risks, the estimate of the CIF for failure i is
    F̂*i(t) = Σ_{j: tj<t} (dij/nj) Ŝi(tj−1),    (9)
where
    Ŝi(t) = Π_{j: tj<t} (1 − dij/nj).    (10)
Because Ŝ(t) ≤ Ŝi(t), F̂i(t) ≤ F̂*i(t). Therefore, when there are competing risks, the KM method of standard survival analysis overestimates the cumulative incidence function, and the magnitude of the overestimation depends on the level of the incidence rates of the competing events; the sketch below implements (7)-(10) directly.
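As an added illustration (not in the original slides), this minimal R sketch computes the estimators (7)-(10) on the ten observations of the numeric example that follows (cause 1 = relapse R, cause 2 = TRM T) and reproduces both columns of estimates.

# Observed times and causes: 1 = relapse (R), 2 = TRM (T), 0 = censored
time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
cause <- c( 1,  0,  1,  2,  0,  1,  2,  2,  1,  0)

tj  <- sort(unique(time[cause > 0]))                         # distinct failure times, any cause
nj  <- sapply(tj, function(t) sum(time >= t))                # at risk just prior to tj
d1j <- sapply(tj, function(t) sum(time == t & cause == 1))   # relapse events at tj
dj  <- sapply(tj, function(t) sum(time == t & cause > 0))    # events of any cause at tj

S.all  <- cumprod(1 - dj / nj)        # overall KM, eq (7)
S.prev <- c(1, head(S.all, -1))       # S-hat(t_{j-1})
F1     <- cumsum(d1j / nj * S.prev)   # CIF of relapse, eq (8): ends at 0.482

S1      <- cumprod(1 - d1j / nj)      # naive KM censoring TRM, eq (10)
S1.prev <- c(1, head(S1, -1))
F1.km   <- cumsum(d1j / nj * S1.prev) # naive estimate, eq (9): ends at 0.685

cbind(tj, F1, F1.km)                  # F1 <= F1.km at every failure time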
• In summary:
                        CR method                                    KM method
CIF                     Fi(t) = ∫0^t λi(u) S(u) du                   F*i(t) = ∫0^t λi(u) Si(u) du
survival function       S(t) = exp[−Σ_{i=1}^k Λi(t)]                 Si(t) = exp[−Λi(t)]
CIF estimate            F̂i(t) = Σ_{j:tj<t} (dij/nj) Ŝ(tj−1)          F̂*i(t) = Σ_{j:tj<t} (dij/nj) Ŝi(tj−1)
survival estimate       Ŝ(t) = Π_{j:tj<t} (1 − dj/nj)                Ŝi(t) = Π_{j:tj<t} (1 − dij/nj)

Numeric Example
a. naive (KM) method
time  cod  at risk  rel  cens.  Ŝ*R(t)               F̂*R(t), KM
10    R      10      1     0    1*(9/10)=0.9         0+1*(1/10)=0.1
20+          9       0     1    0.9*(9/9)=0.9        0.1+0.9*(0/9)=0.1
35    R      8       1     0    0.9*(7/8)=0.787      0.1+0.9*(1/8)=0.212
40    T      7       0     1    0.787*(7/7)=0.787    0.212+0.787*(0/7)=0.212
50+          6       0     1    0.787*(6/6)=0.787    0.212+0.787*(0/6)=0.212
55    R      5       1     0    0.7875*(4/5)=0.63    0.212+0.787*(1/5)=0.37
70    T      4       0     1    0.63*(4/4)=0.63      0.37+0.63*(0/4)=0.37
71    T      3       0     1    0.63*(3/3)=0.63      0.37+0.63*(0/3)=0.37
80    R      2       1     0    0.63*(1/2)=0.315     0.37+0.63*(1/2)=0.685
90+          1       0     1    0.315*(1/1)=0.315    0.685+0.315*(0/1)=0.685

b. CR method
time  cod  at risk  r/t  cens.  Ŝ(t)                 F̂R(t), CR
10    R      10      1     0    1*(9/10)=0.9         0+1*(1/10)=0.1
20+          9       0     1    0.9*(9/9)=0.9        0.1+0.9*(0/9)=0.1
35    R      8       1     0    0.9*(7/8)=0.787      0.1+0.9*(1/8)=0.212
40    T      7       1     0    0.787*(6/7)=0.675    0.212+0.787*(0/7)=0.212
50+          6       0     1    0.675*(6/6)=0.675    0.212+0.675*(0/6)=0.212
55    R      5       1     0    0.675*(4/5)=0.54     0.212+0.675*(1/5)=0.347
70    T      4       1     0    0.54*(3/4)=0.405     0.347+0.54*(0/4)=0.347
71    T      3       1     0    0.405*(2/3)=0.27     0.347+0.405*(0/3)=0.347
80    R      2       1     0    0.27*(1/2)=0.135     0.347+0.27*(1/2)=0.482
90+          1       0     1    0.135*(0/1)=0.135    0.482+0.135*(0/1)=0.482

cod: cause of death, r/t: relapse or TRM

[Figure: estimated cumulative incidence of relapse vs. time - the naive KM curve lies above the CR curve.]

Example: Myeloablative (MT) vs. Nonmyeloablative (NT) Allogeneic hematopoietic stem cell transplantation (HSCT) for patients > 50 years of age with hematologic malignancies - a real data example. (Alyea et al, 2005)
• Allogeneic HSCT refers to the transplantation of allogeneic stem cells derived from donor bone marrow or blood. HSCT is a treatment modality that can provide curative therapy for many hematologic malignancies.
• A typical competing risks data set - both relapse and TRM are important outcomes.
• The therapeutic benefit and potential cure achieved by allogeneic HSCT derive from the donor immune system (the 'graft-vs-tumor' effect).
• However, the therapeutic potential of allogeneic HSCT has not been fully realized due to both disease relapse and transplant-related mortality and morbidity (TRM).
• The objective of this study was thus to examine the impact of conditioning regimens (MT vs NT) on the two important endpoints in HSCT: relapse and TRM.
• 152 patients over age 50 who underwent HLA-matched allogeneic transplantation from 1997 through 2002 at our institution were included.
• Of these 152, 81 patients underwent MT and 71 underwent NT.
Analyzing the MT cohort first,
          3-yr F̂R    3-yr F̂T    Sum
KM         50%         58%       108%
CR         30%         50%        80%
(note that the KM estimates sum to more than 100%).

[Figure: KM vs CR estimates in the MT cohort - left panel: cumulative incidence of relapse (KM above CR); right panel: cumulative incidence of TRM (KM above CR); months post HSCT.]

[Figure: KM method vs CR method - curves for Relapse, TRM, and TRM+Relapse (1−EFS); months post HSCT.]

R code:
library(cmprsk)
attach(subset(age50.dat, mini==0))
# c.risk=1 if TRM, 2 if relapse, 0 if censored
cr1 <- cuminc(c.time, c.risk, cencode=0)
print(cr1)
timepoints(cr1, c(24, 36))   # point estimates at 2 yr and 3 yr

Estimates and Variances:
$est
            10         20         30         40
1 1  0.4320988  0.4718793  0.4983996  0.4983996
1 2  0.2098765  0.2496571  0.2629172  0.2996377
$var
             10          20          30          40
1 1 0.003092805 0.003183996 0.003219427 0.003219427
1 2 0.002090070 0.002403033 0.002497967 0.002917386

$est
            24         36
1 1  0.4851395  0.4983996
1 2  0.2629172  0.2996377
$var
             24          36
1 1 0.003204113 0.003219427
1 2 0.002497967 0.002917386

plot(cr1, curvlab=c("TRM","Relapse "), ylim=c(0,1.0), xlim=c(0,48), lty=c(1,1),
     main=" ", ylab=" ", xlab=" ", col=c(2,4), lwd=c(2,2), yaxt="n")
title("Cumulative Incidence of TRM and Relapse")
box(which="plot", lty = "solid")   # no box if this statement is omitted
par(xaxt="s")
axis(1, at=c(0,12,24,36,48), label=c("0","1","2","3","4"), cex.axis=1.3)
axis(2, at=c(0.0,0.2,0.4,0.6,0.8,1.0), cex.axis=1.3)

6.4 Comparison of Multiple Cumulative Incidence Functions
We discuss here the Gray test. For other tests, check Pepe and Mori, Stat Med 12:737-751, 1993, and Lin DY, Stat Med 16:901-910, 1997.

Gray test
• Gray (1988, Annals of Statistics) proposed a class of tests for comparing the cumulative incidence functions of a particular type of failure among different groups in the presence of competing risks.
• Suppose there are two types of failure (i=1, 2) and two treatment groups (k=A, B). Then testing the group difference in failure 1 is
    Ho : F1A = F1B = F1o
where F1A is the subdistribution of failure type 1 in treatment group A.
• Gray argued that testing F1A = F1B is not the same as testing λ1A = λ1B.
• To test F1A = F1B, the Gray test compares weighted averages of the subdistribution hazards, γ1k = f1k/(1 − F1k), k = A, B:
    ∫0^τ w(t) (γ̂1A − γ̂1B) = ∫0^τ w(t) { dF̂1A(t)/(1 − F̂1A(t−)) − dF̂1B(t)/(1 − F̂1B(t−)) }
where F̂1(t) is an estimate of F1(t) and w(t) is a weight function.
• Under Ho, the subdistribution hazard ratio of the two treatments is equal to 1 and constant over time (PH).
• Suppose there are two types of failures (i=1, 2), relapse and TRM, and two treatment groups (k=A, B).
• Let T = min(T1, T2), where (T1, T2) represent the failure times of relapse and TRM in group k and have a bivariate exponential distribution.
T = min(T1, T2), and for group k:
joint distribution:
    Fk(t1, t2) = 1 − exp(−λ1k t1 − λ2k t2)
subdistribution functions:
    F1k(t) = [λ1k/(λ1k + λ2k)] (1 − exp[−(λ1k + λ2k)t])
    F2k(t) = [λ2k/(λ1k + λ2k)] (1 − exp[−(λ1k + λ2k)t])
cause-specific hazards: λ1k and λ2k
subdensities:
    f1k(t) = λ1k exp[−(λ1k + λ2k)t],   f2k(t) = λ2k exp[−(λ1k + λ2k)t]

               Group 1                               Group 2
Failure 1      λ11 = 0.3                             λ12 = 0.2
               F11(t) = (3/6)(1 − exp[−0.6t])        F12(t) = (2/3)(1 − exp[−0.3t])
Failure 2      λ21 = 0.3                             λ22 = 0.1
               F21(t) = (3/6)(1 − exp[−0.6t])        F22(t) = (1/3)(1 − exp[−0.3t])

At t=2, F11(2) = 0.35 and F12(2) = 0.30; at t=5, F11(5) = 0.48 and F12(5) = 0.52.
Thus, λ11 > λ12 does NOT imply F11 > F12.
• This is caused by the dependency of the CIFs not only on the λ1k, but also on the λ2k of the competing risks.

[Figure 2: Cumulative incidence functions over time for Group 1 and Group 2, with and without competing risks; the no-CR curves lie above the corresponding CR curves.]

6.4.1 Estimation of Gray Statistic
• Let njA and njB be the numbers of subjects at risk at tj who are free from failures of any type in treatment groups A and B, respectively.
• The number of subjects at risk of failing from a type 1 event is expected to be greater than njA and njB, since what is needed is the number of subjects free from failure of type 1. Thus, the Gray test proposes a correction factor for njA and njB, namely
    [1 − F̂1(t−)]/Ŝ(t−) ≥ 1.
• The estimates of the risk sets for type 1 failure at tj for treatment groups A and B are
    R̂1A(tj) = njA [1 − F̂1A(t−)]/ŜA(t−)   and   R̂1B(tj) = njB [1 − F̂1B(t−)]/ŜB(t−),
    R̂1(tj) = R̂1A(tj) + R̂1B(tj).
R̂1(tj) is the total number of subjects from the two treatment groups combined at risk at tj for failure type 1.
• Then, the score (numerator of the Gray statistic) for failure 1 in group A is
    zA = Σ_{t∈(0,tl)} w(t) (d1A/R1A − d1/R1)
and, if w(t) = R1A,
    zA = Σ_{t∈(0,tl)} (d1A − d1 R1A/R1)
where R1 = R1A + R1B.
• The quadratic form of this score divided by its variance V is z′V⁻¹z ∼ χ²₁ (note: χ²_{k−1} for k groups).

Let us consider the example presented in Table 10.1 in Marubini and Valsecchi (1995).
Trt A   Failure type 1: 1 13 17 30 34 41 78 100 119 169
        Failure type 2: 1 6 8 13 13 15 33 37 44 45 63 80 89 89 91 132 144 171 183 240
        censored data:  34 60 63 149 207
Trt B   Failure type 1: 7 16 16 20 39 49 56 73 93 113
        Failure type 2: 1 2 4 6 8 9 10 13 17 17 17 18 18 27 29 39 50 69 76 110
        censored data:  34 60 63 78 149

At time 16 for failure type 1,
Ordinary 2x2 table                    Using Gray's risk set
Trt  no. of failure 1   R16           Trt  no. of failure 1   R16
A           0            27           A           0           34.5
B           2            26           B           2           34

Thus, the score (numerator) of the Gray test statistic at t=16 is
    0 − 2*(34.5/68.5) = −1.007

-------------------------------------------------------------------------------
trt  time   ni  e(t)    S(t-)    ci(t)    ci(t-)   1-ci(t-)  [1-ci(t-)]/S(t-)  R1k(t)
           type 1
-------------------------------------------------------------------------------
A      0    35   1     1.0000   0.00000
A      1    35   1     1.0000   0.01429  0.00000   1.00000       1.00000       35.00
A     13    31   1     0.8857   0.04286  0.01429   0.98571       1.11292       34.50  <==
A     17    27   1     0.7714   0.08571  0.04286   0.95714       1.24079       33.50
A     30    26   1     0.7429   0.11429  0.08571   0.91429       1.23070       32.00
A     34    24   1     0.6857   0.12857  0.11429   0.88571       1.29169       31.00
A     41    21   1     0.6273   0.15869  0.12857   0.87143       1.38917       29.17
A     78    16   1     0.5060   0.22406  0.15869   0.84131       1.66267       26.60
A    100    10   1     0.3374   0.26127  0.22406   0.77594       2.29978       23.00
A    119     9   1     0.3036   0.29848  0.26127   0.73873       2.43324       21.90
A    169     6   1     0.2024   0.32453  0.29848   0.70152       3.46602       20.80
-------------------------------------------------------------------------------
B      0    35   1     0.0000   0.00000
B      7    31   1     0.8857   0.02857  0.00000   1.00000       1.12905       35.00
B     16    26   2     0.7429   0.07143  0.02857   0.97143       1.30762       34.00  <==
B     20    19   1     0.5429   0.14363  0.10000   0.90000       1.65776       31.50
B     39    16   1     0.4571   0.17375  0.14363   0.85637       1.87349       29.98
B     49    13   1     0.3962   0.18880  0.17375   0.82625       2.08545       27.11
B     56    11   1     0.3352   0.20643  0.18880   0.81120       2.42004       26.62
B     73     7   1     0.2667   0.24266  0.20643   0.79357       2.97552       20.83
B     93     5   1     0.1905   0.27987  0.24266   0.75734       3.97553       19.88
B    113     2   1     0.0952      .     0.27987   0.72013       7.56437       15.13
-------------------------------------------------------------------------------
ci: cumulative incidence of failure 1. e(t): no. of events of interest at t.

Example: Returning to the previous example.
                    KM method                       CR method
              NT    MT    p-value (log-rank)    NT    MT    p-value (Gray test)
2.5-yr F̂R    61%   50%         0.35            46%   30%         0.052
2.5-yr F̂T    38%   58%         0.008           32%   50%         0.01

>library(cmprsk)
>attach(age50.dat)   # c.risk=1 if TRM, 2 if Relapse, 0 if censored
>cuminc1 <- cuminc(c.time, c.risk, mini, cencode=0)
>timepoints(cuminc1, c(24, 30))
Tests:
       stat         pv  df
1  6.401663 0.01140135   1
2  3.788293 0.05161226   1
Estimates and Variances:
$est
            24         30
0 1  0.4851395  0.4983996
1 1  0.3155839  0.3155839
0 2  0.2629172  0.2629172
1 2  0.4099636  0.4648541
$var
             24          30
0 1 0.003204113 0.003219427
1 1 0.003444021 0.003444021
0 2 0.002497967 0.002497967
1 2 0.004217095 0.006467915

[Figure: CR-method cumulative incidence curves of TRM and relapse for MT vs. NT (MT-TRM, NT-TRM, MT-Relapse, NT-Relapse); months post HSCT.]

[Figure: schematic - Relapse vs. TRM as competing events.]

6.5. Competing Risks Regression Analysis
• Why do we do regression analysis?
  – to identify potential prognostic factors for a particular failure
  – to assess a prognostic factor of interest in the presence of other potential prognostic factors
• Fitting a Cox model for an event of interest when competing risks are present won't address these two questions properly, because the cause-specific Cox model treats competing risks as censored observations, and the cause-specific hazard function does not have a direct interpretation.
• Fine and Gray (JASA, 1999) and Klein and Andersen (Biometrics, 2005) proposed direct regression modeling of the effect of covariates on the cumulative incidence function for competing risks data.
• These models distinguish between patients who are still alive and those who have already failed from competing causes, and they allow direct inference about the effects of covariates on the cumulative incidence function.

Fine and Gray model:
• A Cox-PH-like model for the subdistribution hazard.
• The model uses the partial likelihood principle and weighted estimating equations to obtain consistent estimators of the covariate effects.
• Let γ1(t; X) be the subdistribution hazard for failure 1, conditional on the covariates X:
    γ1(t; X) = lim_{u→0} (1/u) Pr{t ≤ T < t + u, I = 1 | T ≥ t ∪ (T ≤ t ∩ I ≠ 1), X}
             = f1(t; X)/(1 − F1(t; X)) = γ0(t) exp(X′β)
where γ0(t) is the baseline hazard of the subdistribution F1, X is the vector of covariates, and β is the vector of coefficients.
• The risk set is
    Ri = {j : min(Cj, Tj) ≥ Ti   (those who have not failed from any cause)
            ∪ (Tj ≤ Ti ∩ Ij ≠ 1 ∩ Cj ≥ Ti)   (those who have failed from another cause)}
• The risk set is improper and unnatural, since in reality individuals who failed from causes other than failure 1 prior to time ti cannot be "at risk" at ti.
• Although the risk set is unnatural, it leads to a proper PL for the improper F1(t; X).
• The partial likelihood function is
    PL(β) = Π_j [ exp(Xjβ) / Σ_{i∈Rj} wij(t) exp(Xiβ) ]    (11)
• A choice of weight is
    wij(t) = Ĝ(ti)/Ĝ(ti ∧ tj)
where Ĝ is the Kaplan-Meier estimate of the survivor function of the censoring distribution. The weight is 1 for those who did not experience any event by time ti, and ≤ 1 for those who experienced a competing-risks event before time ti; i.e., individuals experiencing a competing-risks event are not fully counted in the PL.
• As in the Cox partial likelihood, taking derivatives of the log partial likelihood with respect to β gives the score statistic
    U(β) = Σ_{j=1}^l { Xj − [Σ_{r∈Rj} wrj(t) Xr exp(Xrβ)] / [Σ_{r∈Rj} wrj(t) exp(Xrβ)] }    (12)
β̂ is then the value that solves U(β) = 0, i.e., maximizes the log partial likelihood.
• If there is only one type of failure, the Fine and Gray model reduces to the Cox model.
• Limitations: no stratification is allowed. It handles β(t)X, but not βX(t).

Example of weight calculation for the CIF of relapse
subject  time  cod   Ĝ(time)   w(10)  w(35)       w(55)               w(80)
s1       10    R     1           1      -            -                   -
s2       20+   C     0.89        1      -            -                   -
s3       35    R     0.89        1      1            -                   -
s4       40    T     0.89        1      1    Ĝ(55)/Ĝ(40)=0.83    Ĝ(80)/Ĝ(40)=0.83
s5       50+   C     0.74        1      1            -                   -
s6       55    R     0.74        1      1            1                   -
s7       70    T     0.74        1      1            1           Ĝ(80)/Ĝ(70)=1
s8       71    T     0.74        1      1            1           Ĝ(80)/Ĝ(71)=1
s9       80    R     0.74        1      1            1                   1
s10      90+   C     0           1      1            1                   1
cod: cause of death, R: relapse, T: TRM. The columns w(10), w(35), w(55), w(80) are the weights at the relapse times 10, 35, 55, 80; '-' denotes observations that are not in the risk set at that time point.

Example: Returning to the previous example.
• The cumulative incidence curves of TRM indicate that MT is associated with an increased risk of TRM (p=0.01). However, this is confounded by bone marrow (BM) progenitor cells: 35 of the 40 patients who died of TRM in MT received BM.
Example: Returning to the previous example.

• The cumulative incidence curves of TRM indicate that MT is associated with an increased risk of TRM (p=0.01). However, this is confounded by bone marrow (BM) progenitor cells: 35 of the 40 patients who died of TRM in MT received BM.
• The cumulative incidence curves of relapse indicate that NT is associated with an increased risk of relapse (p=0.052). However, this is also confounded by unfavorable risk status at the time of transplantation: of 51 relapsed patients (24 in MT and 28 in NT), 41 (17 in MT, 24 in NT) had unfavorable risk characteristics at the time of HSCT.

Table 6.1: Results of the HSCT example

                    Relapse & TRM: Cox        Relapse: CRR              TRM: CRR
Variable            β       HR    p-value     β       HR    p-value     β      HR    p-value
Bone marrow          0.118  1.13  0.71        -0.781  0.46  0.11        0.808  2.24  0.057
Non-myeloablative   -0.428  0.65  0.25        -0.556  0.57  0.33        0.057  1.06  0.90
Poor prognosis       0.426  1.53  0.06         0.891  2.44  0.02        0.255  1.29  0.38
Sex mismatch        -0.385  0.68  0.053       -0.659  0.52  0.04        0.179  1.20  0.50

HR: hazard ratio. Cox: Cox proportional hazards model for relapse and TRM combined as a single event. CRR: competing risks regression model using the Fine and Gray method.

R code

library(cmprsk)
attach(age50.dat)
# failcode = 1 if TRM, 2 if Relapse, 0 if censored
crr.rel <- crr(c.time, c.risk,
               cov1 = cbind(age, urd, bm, mini, good, sec.tx, fk506, sex.mm),
               failcode = 2, cencode = 0)
summary(crr.rel)

crr(ftime = c.time, fstatus = c.risk, cov1 = cbind(age, urd, bm, mini,
    good, sec.tx, fk506, sex.mm), failcode = 2, cencode = 0)

           coef exp(coef) se(coef)      z p-value
age      0.0487     1.050   0.0443  1.099   0.270
urd     -0.1644     0.848   0.3393 -0.485   0.630
bm      -0.7811     0.458   0.4849 -1.611   0.110
mini    -0.5561     0.573   0.5715 -0.973   0.330
good    -0.8906     0.410   0.3861 -2.307   0.021
sec.tx   0.0889     1.093   0.4211  0.211   0.830
fk506    0.0805     1.084   0.4996  0.161   0.870
sex.mm  -0.6589     0.517   0.3192 -2.065   0.039

        exp(coef) exp(-coef)  2.5% 97.5%
age         1.050      0.952 0.963 1.145
urd         0.848      1.179 0.436 1.650
bm          0.458      2.184 0.177 1.184
mini        0.573      1.744 0.187 1.758
good        0.410      2.437 0.193 0.875
sec.tx      1.093      0.915 0.479 2.495
fk506       1.084      0.923 0.407 2.885
sex.mm      0.517      1.933 0.277 0.967

Num. cases = 152
Pseudo Log-likelihood = -231
Pseudo likelihood ratio test = 22.1 on 8 df

Stratified Fine and Gray model

• Zhou et al. (Biometrics, 2011) developed a stratified Fine and Gray model.
• It allows the baseline subdistribution hazard to vary across levels of the stratification covariate.
• Two types of stratification factors: (i) strata with a small number of levels; (ii) strata with a large number of levels.
• We consider here the first case; for the second case, see Zhou et al.
• The stratified partial likelihood function is

  PL(β) = ∏_{k=1}^{s} ∏_{j} [ exp(Xkj β) / Σ_{i∈Rkj} wkij(t) exp(Xki β) ],

  where k indexes the s strata and j the failures within stratum k.

> install.packages("crrSC")
> library(crrSC)

attach(wbc_data_061813)
new_grp2 <- 1*(new_grp==2)
new_grp3 <- 1*(new_grp==3)
new_grp4 <- 1*(new_grp==4)
dri1   <- 1*(DRI==1)
dri2   <- 1*(DRI==2 | DRI==3)
ecog_m <- 1*(ecog_ps<0)
ecog_1 <- 1*(ecog_ps==1)
ecog_2 <- 1*(ecog_ps==2 | ecog_ps==3)
ric    <- 1*(REGINTEN==2)

c_trm <- crrs(pfs_t, rrisk, strata=REGINTEN,
              cov1=cbind(age60, mf, dnr_type0, prophn, cmv_pos, dri1, dri2,
                         yr_bmt, cd34_cat2, ecog_m, ecog_1, ecog_2,
                         new_grp2, new_grp3, new_grp4),
              failcode=1, cencode=3, ctype=1)   # TRM

> print.crrs(c_trm)
convergence: TRUE
coefficients:
[1]  0.27670  0.17970 -0.10570 -0.06656  0.16060 -0.02863 -0.25700 -0.04353
    -0.17490 -0.02699  0.19790  0.69530  0.64950  0.83480  0.49500
standard errors:
[1] 0.22940 0.15260 0.11900 0.14440 0.14090 0.19240 0.21570 0.04189 0.13130
    0.22420 0.16580 0.24210 0.23180 0.22730 0.21800
two-sided p-values:
[1] 0.23000 0.24000 0.37000 0.64000 0.25000 0.88000 0.23000 0.30000 0.18000
    0.90000 0.23000 0.00410 0.00510 0.00024 0.02300

Klein and Andersen model (2005):

• Models the cumulative incidence function for subject j at time t, Cjt, through a link function g(x):

  g(Cjt) = αt + β Zjt,

  where Z is a covariate vector.
• The regression estimator of β is based on pseudovalues of the cumulative incidence function,

  Ĉjt = n Ĉt − (n − 1) Ĉt(−j)

  for individual j at time t, where Ĉt is the cumulative incidence function at time t for the complete data set, and Ĉt(−j) is the cumulative incidence function computed from the data set with subject j deleted.
• A link function for CRR is g(x) = log[−log(1 − x)].
• Parameter estimates and standard errors are obtained using generalized estimating equations, e.g., PROC GENMOD in SAS; an R sketch of the pseudovalue computation is given below.
• SAS and R programs for this model are available on the CIBMTR website
  http://www.cibmtr.org/ReferenceCenter/Statistical/Education or
  http://www.mcw.edu/biostatistics/statisticalresources/CollaborativeSoftware.htm
• Time-dependent covariates are allowed.
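Before the SAS example, a minimal sketch of the pseudovalue computation in R using only cmprsk (the function name pseudo_ci, its arguments, and the curve-label convention are mine; the CIBMTR macros below are the validated implementation):

library(cmprsk)

# leave-one-out pseudovalues for the CIF of `cause` at the given time points:
#   Chat_jt = n * Chat_t - (n - 1) * Chat_t^(-j)
# (quadratic in n: one cuminc fit per deleted subject; fine for illustration)
pseudo_ci <- function(ftime, fstatus, times, cause = 1) {
  n   <- length(ftime)
  lab <- paste(1, cause)        # cuminc's curve label when no group is supplied
  ci  <- timepoints(cuminc(ftime, fstatus), times)$est[lab, ]
  loo <- t(sapply(seq_len(n), function(j)
    timepoints(cuminc(ftime[-j], fstatus[-j]), times)$est[lab, ]))
  sweep(-(n - 1) * loo, 2, n * ci, "+")   # n rows (subjects) x length(times) cols
}

The pseudovalues are then stacked in long format (one row per subject and time point) and fed to a GEE fit with the cloglog link, which is what the SAS example below does with PROC GENMOD.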
Example of Klein and Andersen model

options mprint mlogic merror source2 spool symbolgen;
libname in "/usr/stats/kim/ecog/Training/Comp_Risk/Pseudo";
%include "/usr/stats/kim/ecog/Training/Comp_Risk/Pseudo/pseudoci2.txt"; ** pseudoci.txt does not work;
%include "/usr/stats/kim/ecog/Training/Comp_Risk/Pseudo/cuminc.txt";

data one; set in.bmt;
  if dfs=1 and rel=0 then nrm=1; else nrm=0;
  keep id dfs rel dfs_t nrm fab disease pt_age;
run;

data times;
  input time @@; ** calculate pseudo values at 5 data points roughly equally spaced on the event scale;
cards;
50 105 170 280 530
;
run;

*** %macro pseudoci(datain,x,r,d,howmany,datatau,dataout);
%pseudoci(one,dfs_t,rel,nrm,137,times,in.dataoutcr);

data two; set in.dataoutcr;
  dis2=0; if disease=2 then dis2=1;
  dis3=0; if disease=3 then dis3=1;
run;

proc print data=two round;
proc genmod;
  class oid otime;
  FWDLINK LINK=LOG(-LOG(1-_MEAN_));
  INVLINK ILINK=1-EXP(-EXP(_XBETA_));
  model rpseudo = otime dis2 dis3 fab pt_age / dist=normal noscale noint;
  repeated subject=oid / corr=ind;
run;

The GENMOD Procedure

Model Information
  Data Set            WORK.TWO
  Distribution        Normal
  Link Function       User
  Dependent Variable  rpseudo

  Number of Observations Read   686
  Number of Observations Used   680
  Missing Values                  6

Class Level Information
  Class   Levels  Values
  oid        136  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
                  25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
                  45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
                  65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
                  85 86 87 88 ...
  otime        5  50 105 170 280 530

Parameter Information
  Parameter  Effect
  Prm1       Intercept
  Prm2       otime 50
  Prm3       otime 105
  Prm4       otime 170
  Prm5       otime 280
  Prm6       otime 530
  Prm7       dis2
  Prm8       dis3
  Prm9       fab
  Prm10      pt_age

Algorithm converged.

GEE Model Information
  Correlation Structure         Independent
  Subject Effect                oid (138 levels)
  Number of Clusters            138
  Clusters With Missing Values    2
  Correlation Matrix Dimension    5
  Maximum Cluster Size            5
  Minimum Cluster Size            0

The GENMOD Procedure
Algorithm converged.

GEE Fit Criteria
  QIC   77.0100
  QICu  89.1155

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter      Estimate  Std Error  95% Confidence Limits      Z  Pr > |Z|
Intercept        0.0000     0.0000   0.0000   0.0000           .  .
otime 50        -3.6459     0.8515  -5.3148  -1.9769       -4.28  <.0001
otime 105       -2.6063     0.6671  -3.9137  -1.2989       -3.91  <.0001
otime 170       -2.1331     0.6499  -3.4069  -0.8594       -3.28  0.0010
otime 280       -1.7867     0.6329  -3.0272  -0.5463       -2.82  0.0048
otime 530       -1.4960     0.6207  -2.7126  -0.2794       -2.41  0.0159
dis2            -1.7393     0.6493  -3.0119  -0.4667       -2.68  0.0074
dis3            -0.1820     0.5686  -1.2964   0.9324       -0.32  0.7489
fab              1.0574     0.5007   0.0761   2.0387        2.11  0.0347
pt_age           0.0169     0.0220  -0.0261   0.0600        0.77  0.4405
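Under the link g(x) = log[−log(1−x)], exponentiated coefficients can be read like subdistribution hazard ratios (cf. the Fine and Gray model). For example, for fab in the output above (a quick check in R):

exp(1.0574)                              # 2.88, the estimated effect of fab
exp(1.0574 + c(-1, 1) * 1.96 * 0.5007)   # 1.08 to 7.68, exponentiating the CI limits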
%include "/usr/stats/htkimc/core/BMT/Cutler/prt05377/March13/pseudoci2.txt";
%include "/usr/stats/htkimc/core/BMT/Cutler/prt05377/March13/cuminc.txt";

data one; set g.both_dri_050213;
  keep bmtid age case mrd mf myeloid mini dri2 agvhd_t agvhd24 ext_cgvhd2 dth_rel ext_cgvhd_t;
run;

data times;
  input time @@; ** calculate pseudo values at 5 data points roughly equally spaced on the ext cgvhd event scale;
cards;
5.3 6.35 7.2 8.6 13.3
;
run;

%pseudoci(one,ext_cgvhd_t,ext_cgvhd2,dth_rel,133,times,g.dataoutcr);

data two; set g.dataoutcr;
  if .Z<agvhd_t<otime then agvhd_tv=1; else agvhd_tv=0;   *** time dependent variable;
run;

proc print data=two round;
proc genmod;
  class oid otime;
  FWDLINK LINK=LOG(-LOG(1-_MEAN_));
  INVLINK ILINK=1-EXP(-EXP(_XBETA_));
  model rpseudo = otime age case mrd mf myeloid mini dri2 agvhd_tv / dist=normal noscale noint;
  repeated subject=oid / corr=ind;
run;

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter      Estimate  Std Error  95% Confidence Limits      Z  Pr > |Z|
Intercept        0.0000     0.0000   0.0000   0.0000           .  .
otime 5.3       -1.2647     0.8994  -3.0276   0.4981       -1.41  0.1597
otime 6.35      -0.7386     0.9203  -2.5422   1.0651       -0.80  0.4222
otime 7.2       -0.3391     0.9186  -2.1395   1.4613       -0.37  0.7120
otime 8.6       -0.0693     0.9136  -1.8600   1.7213       -0.08  0.9395
otime 13.3       0.2784     0.9055  -1.4964   2.0532        0.31  0.7585
age             -0.0102     0.0182  -0.0460   0.0255       -0.56  0.5753
case            -0.7505     0.3748  -1.4851  -0.0159       -2.00  0.0452
mrd             -0.2340     0.3584  -0.9364   0.4684       -0.65  0.5138
mf              -0.0225     0.4013  -0.8090   0.7639       -0.06  0.9553
myeloid         -1.0040     0.3908  -1.7700  -0.2381       -2.57  0.0102
mini            -0.2932     0.4110  -1.0988   0.5125       -0.71  0.4757
dri2             0.4674     0.3967  -0.3101   1.2449        1.18  0.2387
agvhd_tv         0.6582     0.3553  -0.0382   1.3545        1.85  0.0640  ** time dependent variable

proc genmod;
  class oid otime;
  FWDLINK LINK=LOG(-LOG(1-_MEAN_));
  INVLINK ILINK=1-EXP(-EXP(_XBETA_));
  model rpseudo = otime age case mrd mf myeloid mini dri2 agvhd24 / dist=normal noscale noint;
  repeated subject=oid / corr=ind;
run;

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter      Estimate  Std Error  95% Confidence Limits      Z  Pr > |Z|
Intercept        0.0000     0.0000   0.0000   0.0000           .  .
otime 5.3       -1.0707     0.8694  -2.7747   0.6334       -1.23  0.2182
otime 6.35      -0.4850     0.8808  -2.2113   1.2414       -0.55  0.5819
otime 7.2       -0.0386     0.8661  -1.7362   1.6589       -0.04  0.9644
otime 8.6        0.2571     0.8567  -1.4221   1.9363        0.30  0.7641
otime 13.3       0.5961     0.8446  -1.0594   2.2516        0.71  0.4804
age             -0.0144     0.0175  -0.0487   0.0200       -0.82  0.4130
case            -0.7560     0.3567  -1.4550  -0.0569       -2.12  0.0341
mrd             -0.3363     0.3632  -1.0481   0.3755       -0.93  0.3545
mf               0.1067     0.3800  -0.6381   0.8515        0.28  0.7789
myeloid         -0.7675     0.3706  -1.4939  -0.0412       -2.07  0.0384
mini            -0.2773     0.4087  -1.0784   0.5238       -0.68  0.4975
dri2             0.3994     0.3867  -0.3584   1.1573        1.03  0.3016
agvhd24          0.4177     0.3921  -0.3507   1.1861        1.07  0.2867  ** time fixed variable
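For completeness, a hedged geepack analogue of the PROC GENMOD fits above, reusing pseudo_ci from the earlier sketch (dat, z1, z2, and the time grid are placeholders, not variables from this session's data sets):

library(geepack)

tps  <- c(12, 24, 36)                                   # placeholder time grid
pv   <- pseudo_ci(dat$ftime, dat$fstatus, times = tps)  # n x 3 pseudovalue matrix
long <- data.frame(id = rep(seq_len(nrow(pv)), length(tps)),
                   y  = as.vector(pv),                  # stacked column by column
                   tp = factor(rep(tps, each = nrow(pv))),  # plays the role of otime
                   z1 = rep(dat$z1, length(tps)),
                   z2 = rep(dat$z2, length(tps)))
long <- long[order(long$id), ]            # geese expects clusters on contiguous rows

fit <- geese(y ~ tp + z1 + z2, id = id, data = long, scale.fix = TRUE,
             family = gaussian, mean.link = "cloglog", corstr = "independence")
summary(fit)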
6.6. Power Calculation: Pintilie (2002, Stat in Med)

• Suppose a randomized clinical trial is planned to compare two treatments with respect to failure 1 in the presence of competing risks. Under the proportional hazards assumption, the number of events (n1) necessary to detect a specified subdistribution hazard ratio (HRsub) with Type I and Type II error rates α and β is

  n1 = (z_{1−α/2} + z_{1−β})² / ( σ² [log(HRsub)]² ),                     (13)

  where z_{1−γ} is the (1−γ)th quantile of the standard normal distribution, σ² = p(1−p) is the variance of the covariate of interest, and p is the proportion of patients in the experimental arm.
• The total sample size required is then

  N = n1 / P1,                                                            (14)

  where P1 is the probability of observing failure 1 by a specific time point. Analogous to the sample size formula in the absence of competing risks, P1 can be calculated as

  P1 = 1 − (1/a) ∫_f^{a+f} (1 − F1(u)) du,

  where a is the accrual time and f is the additional follow-up time after the completion of accrual.
• If exponentiality is assumed, P1 for treatment group A is

  P1A = [ λ1A / (λ1A + λ2A) ] { 1 − [ e^{−(λ1A+λ2A)f} − e^{−(λ1A+λ2A)(a+f)} ] / [ (λ1A + λ2A) a ] }.

• P1B for treatment group B can be calculated in a similar manner. Then P1 = π P1A + (1 − π) P1B, where π is the proportion of patients assigned to treatment group A.
• Typically λ1A and the cumulative incidence of failure 1 at a specified time point are known from a previous study, and HRsub is a hypothesized value at the time of study design.
• Mimicking the HSCT study presented in Example 1, Table 6.2 below shows power for various scenarios of cumulative incidence in the presence and absence of competing risks.

Table 6.2. Power in the presence and absence of a competing risk (N=400).

Group 1            Group 2            power
CIFe     CIFc      CIFe     CIFc      a=2     a=3
30%      50%       45%      30%       50%     52%
30%       0%       45%       0%       88%     91%
50%      30%       30%      45%       96%     97%
50%       0%       30%       0%       99%     99%

a: accrual time in years. CIFe: cumulative incidence rate of the event of interest. CIFc: cumulative incidence rate of a competing event. The power calculation assumes a two-sided significance level of 0.05, a sample size of 400, and f=2. As the table indicates, power can be very different in the presence versus absence (CIFc = 0%) of a competing risk, and it depends on the magnitude of the competing risk.

> power(N=400, a=2, f=2, pi=0.5, t0=3, CIFev0=0.3, CIFcr0=0.5, CIFev1=0.45, CIFcr1=0.3)
[1] 0.4969512
> power(N=400, a=2, f=2, pi=0.5, t0=3, CIFev0=0.3, CIFcr0=0, CIFev1=0.45, CIFcr1=0)
[1] 0.8833121

A base-R sketch of this sample size calculation follows.
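A minimal base-R sketch of equations (13) and (14) under the exponential assumption (the function names and parameterization are mine; Pintilie's power program, referenced in the computing tools below, is the validated implementation):

# P1 for one arm under exponential cause-specific hazards lambda1 (failure 1)
# and lambda2 (competing failure), accrual time a, further follow-up f
p1_exp <- function(lambda1, lambda2, a, f) {
  lam <- lambda1 + lambda2
  (lambda1 / lam) * (1 - (exp(-lam * f) - exp(-lam * (a + f))) / (lam * a))
}

cr_samplesize <- function(HRsub, lambda1A, lambda2A, lambda1B, lambda2B,
                          a, f, p = 0.5, alpha = 0.05, beta = 0.20) {
  sigma2 <- p * (1 - p)                        # variance of the binary trt covariate
  n1 <- (qnorm(1 - alpha/2) + qnorm(1 - beta))^2 /
        (sigma2 * log(HRsub)^2)                # required number of events, eq. (13)
  P1 <- p * p1_exp(lambda1A, lambda2A, a, f) +
        (1 - p) * p1_exp(lambda1B, lambda2B, a, f)
  ceiling(n1 / P1)                             # total sample size, eq. (14)
}

# illustrative call with arbitrary hazards:
# cr_samplesize(HRsub = 0.5, lambda1A = 0.2, lambda2A = 0.3,
#               lambda1B = 0.1, lambda2B = 0.3, a = 2, f = 2)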
6.7. General Comments on CR data analysis

• Recognition
• Choice of competing event
• Presentation

6.8. Computing Tools in R

• R package cmprsk: cuminc and crr. crr outputs a matrix of Schoenfeld residuals; plots of these residuals against failure times can be used to check the proportional hazards assumption.
• In addition, two add-on functions for crr are available at http://www.stat.unipg.it/luca/R/: CumIncidence, an R program to calculate confidence intervals for cumulative incidence functions, and modsel.crr, a model selection tool among candidate competing risks models.
• R package 'tworeg' for confidence intervals.
• For the stratified Fine and Gray model, R package 'crrSC'.
• For the Klein and Andersen model, see the CIBMTR website http://www.cibmtr.org/ReferenceCenter/Statistical/Education or http://www.mcw.edu/biostatistics/statisticalresources/CollaborativeSoftware.htm
• For power calculation, power, an R program, is available at http://www.uhnres.utoronto.ca/labs/hill/People Pintilie.htm.

References

[1] Akaike H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716-723.
[2] Allison P (2003). Survival Analysis Using SAS. SAS Institute Inc., Cary, NC.
[3] Alyea EP, Kim HT, Ho V, Cutler C, Gribben J, DeAngelo DJ, Lee SJ, Windawi S, Ritz J, Stone RM, Antin JH, Soiffer RJ (2005). Comparative outcome of nonmyeloablative and myeloablative allogeneic hematopoietic cell transplantation for patients older than 50 years of age. Blood 105:1810-1814.
[4] Anderson JR, Cain KC, Gelber RD (1983). Analysis of survival by tumor response and other comparisons of time-to-event by outcome variables. J Clin Oncol 1(11):710-719.
[5] Basu AP, Ghosh JK (1978). Identifiability of the multinormal distribution under competing risks model. J Multivariate Analysis 8:413-429.
[6] Basu AP, Ghosh JK (1980). Identifiability of distributions under competing risks and complementary risks model. Communications in Statistics A: Theory and Methods 9:1515-1525.
[7] Basu AP, Klein JP (1982). Some recent results in competing risks theory. In: Crowley J, Johnson RA (eds), Survival Analysis, 216-229. Institute of Mathematical Statistics, Hayward.
[8] Brookmeyer R, Crowley J (1982). A confidence interval for the median survival time. Biometrics 38:29-41.
[9] Cortese G, Andersen PK (2009). Competing risks and time-dependent covariates. Biometrical Journal 51:138-158.
[10] Cox DR, Oakes D (1984). Analysis of Survival Data. Chapman and Hall, p. 91-110.
[11] Crowder M (1994). Identifiability crises in competing risks. International Statistical Review 62:379-391.
[12] Crowder M (2001). Classical Competing Risks. Chapman & Hall/CRC.
[13] David HA, Moeschberger ML (1978). The Theory of Competing Risks. Griffin, London.
[14] Dafni U (2011). Landmark analysis at the 25-year landmark point. Circ Cardiovasc Qual Outcomes 4(3):363-371.
[15] Fine JP, Gray RJ (1999). A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 94:496-509.
[16] Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine 18:2529-2545.
[17] Gray RJ (1988). A class of K-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics 16:1140-1154.
[18] Grambsch PM, Therneau TM, Fleming TR (1995). Diagnostic plots to reveal functional form for covariates in multiplicative intensity models. Biometrics 51:1469-1482.
[19] Grambsch PM, Therneau TM (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81:515-526.
[20] Harrell FE (2001). Regression Modeling Strategies. Springer.
[25] Kim HT (2007). Cumulative incidence in a competing risks setting and competing risks regression analysis. Clinical Cancer Research 13(2):559-565.
[26] Klein JP, Andersen PK (2005). Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics 61:223-229.
[27] Klein JP, Moeschberger ML (1997). Survival Analysis: Techniques for Censored and Truncated Data. Springer-Verlag, New York.
[28] Latouche A, Porcher R (2007). Sample size calculations in the presence of competing risks. Statistics in Medicine 26(30):5370-5380.
[29] Lunceford JK, Davidian M, Tsiatis AA (2002). Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics 58:48-57.
[30] Mantel N, Byar DP (1974). Evaluation of response-time data involving transient states: an illustration using heart-transplant data. J Am Stat Assoc 69:81-86.
[31] Maki E (2006). Power and sample size considerations in clinical trials with competing risk endpoints. Pharmaceutical Statistics 5(3):159-171.
[32] Latouche A, Porcher R, Chevret S (2004). Sample size formula for proportional hazards modelling of competing risks. Statistics in Medicine 23(21):3263-3274.
[33] Pencina MJ, D'Agostino RB (2004). Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in Medicine 23(13):2109-2123.
[34] Pintilie M (2002). Dealing with competing risks: testing covariates and calculating sample size. Statistics in Medicine 21:3317-3324.
[35] Pintilie M (2006). Competing Risks: A Practical Perspective. Wiley, New York.
[36] Putter H, Fiocco M, Geskus RB (2007). Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine 26(11):2389-2430.
[37] Schoenfeld D (1982). Residuals for the proportional hazards regression model. Biometrika 69(1):239-241.
[38] Simon R, Makuch RW (1984). A non-parametric graphical representation of the relationship between survival and the occurrence of an event: application to responder versus non-responder bias. Statistics in Medicine 3:35-44.
[39] Scrucca L, Santucci A, Aversa F (2007). Competing risk analysis using R: an easy guide for clinicians. Bone Marrow Transplantation 40(4):381-387.
[40] Scrucca L, Santucci A, Aversa F (2010). Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation. [Epub ahead of print, Jan 11.]
[41] Schwarz G (1978). Estimating the dimension of a model. The Annals of Statistics 6:461-464.
[42] Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010). Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 21(1):128-138.
[43] Therneau TM, Grambsch PM (2000). Modeling Survival Data: Extending the Cox Model. Springer-Verlag, New York.
[44] Therneau TM, Grambsch PM, Fleming TR (1990). Martingale-based residuals for survival models. Biometrika 77:147-160.
[45] Tsiatis A (1975). A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci USA 72(1):20-22.