Sample Size Determination for Clinical Trials with Two Correlated Time-to-Event Co-primary Endpoints

The 7th IASC-ARS Joint Taipei Symposium 2011
Academia Sinica, Taipei, Taiwan, December 16-20, 2011

Toshimitsu Hamasaki, PhD, Osaka University Graduate School of Medicine
Scott Evans, PhD, Harvard University School of Public Health
Tomoyuki Sugimoto, PhD, Hirosaki University Graduate School of Mathematical Science
Takashi Sozu, PhD, Kyoto University School of Public Health

This research is financially supported by the following research grants: the MEXT Grant-in-Aid for Scientific Research (C) (No. 23500348); the Pfizer Health Research Foundation, Japan; and the Statistical and Data Management Center of the Adult AIDS Clinical Trials Group grant 1 U01 068634.

1. Introduction
Background and Objectives

Clinical Trials with Multiple Endpoints: Background

- Historically, a single outcome has been selected as the primary endpoint of a clinical trial and used as the basis for the trial design, including sample size determination, as well as for interim monitoring and final analyses.
- Many recent clinical trials have become more complex, using more than one primary endpoint:
  - Oncology. E1: time until clinical progression; E2: time to death
  - Prevention of Mother-to-Child HIV/Hepatitis B Transmission. E1: time to infant HIV infection; E2: time to Hepatitis B infection
  - Cardiovascular Disease Therapy. E1: time until the first of MI, stroke, or death; E2: time until hospitalization or death
- The rationale is that assessing an intervention with a single endpoint may not provide a comprehensive picture of the intervention's effects.
Strategies for Multiple Endpoints: Background

T1) Significance on all endpoints is required for proof of effect
- Each hypothesis should be rejected at the same significance level
- No adjustment is needed to control the type I error
- The type II error increases as the number of outcomes to be tested increases
- "Multiple Co-Primary Endpoints" (Hung, Wang, 2009)

T2) Significance on at least one endpoint is sufficient for proof of effect, with a prespecified ordering or nonordering of outcomes
- The type I error increases as the number of outcomes to be tested increases
- An adjustment to control the type I error is required

Hung HMJ, Wang SJ (2009). J Biopharm Statist 19, 1-11.

Natural Questions Arising: Background

- How large should the sample be for T1 and T2?
- Is the sample size considerably overestimated or underestimated when the correlation is ignored?
- Is the sample size considerably reduced or increased when the correlation is taken into account in the sample size calculation?

Our Research Focus: Objectives

- To discuss power and sample size determination for superiority comparative clinical trials with two possibly correlated time-to-event endpoints evaluated as primary variables for the design and analysis, with particular attention to T1
- To consider a simpler approach that assumes the time-to-event outcomes are exponentially distributed
  - Sugimoto et al (2011) discuss an approach to sizing clinical trials with two correlated time-to-event outcomes based on log-rank statistics
  - Implementing that method requires technical knowledge, sophisticated programming skill, and expensive computation
- We focus on the hazard ratio; results for the difference in hazard rates are very similar to those for hazard ratios

Sugimoto T, Hamasaki T, Sozu T (2011). In Abstract of the 7th International Conference on Multiple Comparison Procedure, 121, Washington DC, USA, August 29-September 1, 2011.
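A short worked example may help show why the type II error grows under strategy T1. It assumes, purely for illustration, that the two test statistics are independent and that each endpoint is individually powered at $1-\beta_k = 0.8$; neither assumption comes from the slides.

$$
1-\beta \;=\; \Pr[\text{reject } H_{01} \text{ and } H_{02}] \;=\; (1-\beta_1)(1-\beta_2) \;=\; 0.8 \times 0.8 \;=\; 0.64
$$

Under the same independence assumption, restoring a joint power of 0.8 would require each endpoint to be powered at $\sqrt{0.8} \approx 0.894$. At the other extreme, if the two statistics were perfectly correlated with identical effect sizes, the joint power would equal the single-endpoint power, which is why the correlation matters for sizing.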
Co-Primary Endpoints Sample Sizing: Related Research

- All Continuous Normal Endpoints: Xiong et al (2005, Controlled Clinical Trials); Sozu et al (2006, Japanese Journal of Biometric Society); Eaton, Muirhead (2007, Journal of Statistical Planning and Inference); Senn, Bretz (2007, Pharmaceutical Statistics); Hung, Wang (2009, Journal of Biopharmaceutical Statistics); Sozu, Sugimoto, Hamasaki (2010, Statistics in Medicine; 2011, Journal of Biopharmaceutical Statistics); Sugimoto, Sozu, Hamasaki (2011, Pharmaceutical Statistics); Kordzakhia, Siddiqui, Huque (2010, Statistics in Medicine)
- All Binary Endpoints: Song (2009, Computational Statistics and Data Analysis); Sozu, Sugimoto, Hamasaki (2010, 2011); Hamasaki, Evans (2011, presented at 2011 Symposium on Applied Statistics)
- All Time-to-Event Endpoints: Sugimoto, Hamasaki, Sozu (2011, presented at MCP2011)
- Mixed Endpoints: Sozu, Sugimoto, Hamasaki (2010, presented at IBC2010, mixed continuous and binary endpoints); Sugimoto, Sozu, Hamasaki (2011, presented at MCP2011, mixed binary and time-to-event endpoints)

Outline

1. Background and Objectives
2. Comparing log-transformed Hazard Ratios (HR) from Two Correlated Exponential Time-to-Event Endpoints
   - Statistical Settings
   - Conjunctive Power and Sample Size Calculation: Without Censoring / Limited Recruitment and Censoring
3. Behaviors of Sample Size and Empirical Power
   - Bivariate Exponential Distributions: Clayton Copula / Positive Stable Copula / Fatal-Shock Model
4. Further Developments
5. Summary

* Results for the difference in hazard rates are available.

2. Required Sample Size to Compare Hazard Ratios from Two Correlated Exponential Time-to-Event Endpoints
Statistical Setting
Conjunctive Power and Sample Size Calculation

Statistical Settings: Trial Design, Endpoints Distribution

Total sample size: $N = n_T + n_C$, with allocation $n_T : n_C = r : 1-r$

Test treatment ($n_T = rN$)
- Time-to-event endpoint 1: $T_{T1i} \sim \mathrm{Exp}(\lambda_{T1})$
- Time-to-event endpoint 2: $T_{T2i} \sim \mathrm{Exp}(\lambda_{T2})$
- $\mathrm{corr}[T_{T1i}, T_{T2i}] = \rho_T > 0$

Control treatment ($n_C = (1-r)N$)
- Time-to-event endpoint 1: $T_{C1j} \sim \mathrm{Exp}(\lambda_{C1})$
- Time-to-event endpoint 2: $T_{C2j} \sim \mathrm{Exp}(\lambda_{C2})$
- $\mathrm{corr}[T_{C1j}, T_{C2j}] = \rho_C > 0$

- Randomized, controlled, superiority clinical trial comparing two treatments on two time-to-event endpoints
- $T_{Tki}$ and $T_{Ckj}$ follow exponential distributions with constant hazard rates $\lambda_{Tk}$ and $\lambda_{Ck}$ ($k = 1, 2$; $i = 1, \ldots, n_T$; $j = 1, \ldots, n_C$)

Statistical Settings: Distribution of log Hazard Ratio (HR)

Assumptions
- Participants are followed until the event of interest
- No participant is lost to follow-up

Large-sample distributions
- The log-transformed hazard rates are approximately normally distributed:

$$
\log\hat\lambda_{Tk} \overset{\text{approx}}{\sim} N\!\left(\log\lambda_{Tk},\, n_T^{-1}\right), \qquad
\log\hat\lambda_{Ck} \overset{\text{approx}}{\sim} N\!\left(\log\lambda_{Ck},\, n_C^{-1}\right)
$$

- The log-transformed hazard ratios are approximately normally distributed. With $\log\hat\psi_k = \log\hat\lambda_{Tk} - \log\hat\lambda_{Ck}$ and $\log\psi_k = \log\lambda_{Tk} - \log\lambda_{Ck}$,

$$
\log\hat\psi_k \overset{\text{approx}}{\sim} N\!\left(\log\psi_k,\, n_T^{-1} + n_C^{-1}\right), \qquad k = 1, 2
$$

Collett D (2003). Modelling Survival Data in Medical Research. 2nd Edition. Chapman & Hall.
Gross AJ, Clark VA (1975). Survival Distributions. John Wiley & Sons.
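The large-sample variance $n_T^{-1} + n_C^{-1}$ of the log hazard ratio can be checked with a small simulation. The sketch below is not from the slides: the function names, the arm sizes of 100, the replicate count and the seed are all illustrative assumptions, and it uses the standard maximum likelihood estimate of an exponential hazard under complete follow-up (number of events divided by total time at risk).

```python
import random
from math import log
from statistics import mean, variance


def log_hr_hat(n_t, n_c, lam_t, lam_c, rng):
    """Simulate one trial without censoring and return log(lambda_hat_T / lambda_hat_C)."""
    # MLE of an exponential hazard with complete follow-up: events / total time at risk.
    lam_hat_t = n_t / sum(rng.expovariate(lam_t) for _ in range(n_t))
    lam_hat_c = n_c / sum(rng.expovariate(lam_c) for _ in range(n_c))
    return log(lam_hat_t) - log(lam_hat_c)


def simulate(n_t=100, n_c=100, lam_t=0.3335, lam_c=0.5, reps=4000, seed=1):
    """Return the empirical mean and variance of the estimated log HR (hypothetical settings)."""
    rng = random.Random(seed)
    draws = [log_hr_hat(n_t, n_c, lam_t, lam_c, rng) for _ in range(reps)]
    return mean(draws), variance(draws)


if __name__ == "__main__":
    m, v = simulate()
    print(f"mean log HR: {m:.4f} (true value {log(0.3335 / 0.5):.4f})")
    print(f"variance:    {v:.5f} (approximation {1 / 100 + 1 / 100:.5f})")
```

With equal arm sizes the small-sample bias of each log hazard estimate cancels in the difference, so the empirical mean should sit close to $\log\psi$; the empirical variance is expected to be slightly above $n_T^{-1} + n_C^{-1}$ because the approximation drops higher-order terms.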
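Statistical Setting: Joint Distribution of log HRs

Joint distribution of the two log-transformed HRs for large samples:

$$
(\log\hat\psi_1, \log\hat\psi_2)^{\mathsf T} \overset{\text{approx}}{\sim} N_2(\mu, \Sigma), \qquad
\mu = \begin{pmatrix} \log\psi_1 \\ \log\psi_2 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}
$$

$$
\sigma_k^2 = \frac{1}{N}\left(\frac{1}{r} + \frac{1}{1-r}\right) \quad (k = k'), \qquad
\sigma_{kk'} = \frac{1}{N}\left(\frac{\rho_T}{r} + \frac{\rho_C}{1-r}\right) \quad (k \ne k')
$$

Correlation between the two log-transformed HRs for large samples, obtained as $\sigma_{12} / (\sigma_1 \sigma_2)$:

$$
\rho_{HR} = \mathrm{corr}\!\left[\log\hat\psi_1, \log\hat\psi_2\right] \approx (1-r)\,\rho_T + r\,\rho_C
$$

With a common correlation $\rho = \rho_T = \rho_C$, this gives $\rho_{HR} = \rho$.

For comparison, the corresponding correlations for other endpoint types are $\rho_D$ (continuous endpoints, mean difference) and $\rho_{RD}$, $\rho_{RR}$ (binary endpoints, risk difference and relative risk). [The ordering relation among these correlations shown on the slide is not recoverable from the transcript.]

Statistical Setting: Hypothesis, Statistics and Rejection Region

Hypotheses for joint significance:

$$
H_0: \log\psi_1 \ge 0 \ \text{ or } \ \log\psi_2 \ge 0, \qquad
H_1: \log\psi_1 < 0 \ \text{ and } \ \log\psi_2 < 0
$$

Test statistics:

$$
Z_k = \frac{\log\hat\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}
$$

Rejection region of $H_0$: $\{Z_1 < -z_\alpha\} \cap \{Z_2 < -z_\alpha\}$

- $\alpha$: significance level for hypothesis testing
- $z_\alpha$: upper $100\alpha$-th percentile of the standard normal distribution

[Figure: rejection region in the $(Z_1, Z_2)$ plane, bounded by $-z_\alpha$ on both axes.]

Overall Power and Sample Size: Without Censoring

Overall power for showing joint statistical significance:

$$
1-\beta = \Pr\!\left[\bigcap_{k=1}^{2} \{Z_k < -z_\alpha\}\right]
\approx \Pr\!\left[\bigcap_{k=1}^{2} \{Z_k^* > c_k\}\right]
= \Phi_2(-c_1, -c_2 \mid \rho_{HR})
$$

$$
Z_k^* = \frac{-\log\hat\psi_k + \log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}, \qquad
c_k = z_\alpha + \frac{\log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}
$$

- $\Phi_2(\cdot, \cdot \mid \rho)$ is the distribution function of the standard bivariate normal distribution with correlation $\rho$
- This is the "Conjunctive Power" or "Complete Power" (Senn, Bretz, 2007)

Sample size: let $N$ be the smallest value satisfying the overall power $1-\beta$. Then

$$
N_{NC} = \begin{cases} N & \text{if } N \text{ is an integer} \\ [N] + 1 & \text{otherwise} \end{cases}
$$

where $[N]$ is the greatest integer less than $N$.

Senn S, Bretz F (2007). Pharm Statist 6, 161-170.

Asymptotic Variance for HR: Limited Recruitment and Censoring

[Timeline: recruitment period from 0 to $T_0$, follow-up period from $T_0$ to $T$ (length $T - T_0$).]

- Participants are recruited into the study over the interval from zero to $T_0$
- All recruited participants are followed to the time of the terminal event or to time $T$ ($T > T_0$)

Asymptotic variance of the log-transformed HR for large samples:

$$
\mathrm{var}\!\left[\log\hat\psi_k\right] \approx
\begin{cases}
\dfrac{1}{N\phi(\lambda_k)}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right) & \text{homogeneous variance (null hypothesis)} \\[2ex]
\dfrac{1}{N}\left(\dfrac{1}{r\,\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\,\phi(\lambda_{Ck})}\right) & \text{heterogeneous variance (alternative hypothesis)}
\end{cases}
$$

$$
\lambda_k = r\lambda_{Tk} + (1-r)\lambda_{Ck}, \qquad
\phi(\lambda) = 1 - \frac{\exp(-\lambda T + \lambda T_0) - \exp(-\lambda T)}{\lambda T_0}
$$

Conjunctive Power and Sample Size: Limited Recruitment and Censoring

Overall power for showing joint statistical significance:

$$
1-\beta = \Phi_2(-c_1, -c_2 \mid \rho_{HR}), \qquad
c_k = \frac{z_\alpha \sqrt{\dfrac{1}{N\phi(\lambda_k)}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)} + \log\psi_k}
{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r\,\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\,\phi(\lambda_{Ck})}\right)}}
$$

Sample size $N_{CN}$: with $N$ the smallest value giving power $1-\beta$ under $c_k$,

$$
N_{CN} = \begin{cases} N & \text{if } N \text{ is an integer} \\ [N] + 1 & \text{otherwise} \end{cases}
$$

Simplified sample size $N_{CN}^*$: the same rounding rule applied with $c_k$ replaced by

$$
c_k' = z_\alpha + \frac{\log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r\,\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\,\phi(\lambda_{Ck})}\right)}}
$$

→ Improving the approximation

[An inequality on this slide involving $\lambda_{Tk}^2$, $\lambda_{Ck}^2$ and $\lambda_k^2\left(\frac{1}{r} + \frac{1}{1-r}\right)$ is too garbled to reconstruct.]

Conjunctive Power: Limited Recruitment and Censoring

- The overall power increases as the correlation approaches one.
- The lowest overall power occurs when the correlation is zero and the two hazard ratios are equal, with equal hazard rates in the control groups.

[Figure: conjunctive power (vertical axis, 0.60 to 0.80) against correlation (0.0 to 1.0), with curves for $\psi_2$ = 0.50, 0.556, 0.625 and 0.667. Settings: $\psi_1 = 0.667$, $\lambda_{C1} = \lambda_{C2} = 0.5$, $T_0 = 2.0$, $T = 5.0$, $\alpha = 0.025$, $1-\beta = 0.8$, $r = 0.5$.]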
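The conjunctive power and sample size under limited recruitment and censoring can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bivariate normal distribution function is evaluated by a simple Simpson's-rule integration chosen here for self-containment, the correlation $\rho_{HR}$ is taken as $\sigma_{12}/(\sigma_1\sigma_2) = (1-r)\rho_T + r\rho_C$ from the covariance matrix of the log HRs, and the sample size is found by a plain linear search.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a, b, rho, steps=400):
    """Pr[X <= a, Y <= b] for a standard bivariate normal pair with |rho| < 1.

    Integrates pdf(x) * cdf((b - rho * x) / sqrt(1 - rho^2)) over [-8, a]
    with Simpson's rule (steps must be even).
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    h = (a - lower) / steps
    total = integrand(lower) + integrand(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def event_prob(lam, t0, t):
    """phi(lambda): 1 - {exp(-lam*T + lam*T0) - exp(-lam*T)} / (lam*T0)."""
    return 1.0 - (exp(-lam * (t - t0)) - exp(-lam * t)) / (lam * t0)


def conjunctive_power(n, psi, lam_c, rho_t, rho_c,
                      r=0.5, alpha=0.025, t0=2.0, t=5.0):
    """Phi_2(-c_1, -c_2 | rho_HR) for total sample size n."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    c = []
    for psi_k, lam_ck in zip(psi, lam_c):
        lam_tk = psi_k * lam_ck                      # psi_k = lam_Tk / lam_Ck
        lam_k = r * lam_tk + (1.0 - r) * lam_ck      # pooled hazard under the null
        se_null = sqrt((1.0 / r + 1.0 / (1.0 - r)) / (n * event_prob(lam_k, t0, t)))
        se_alt = sqrt((1.0 / (r * event_prob(lam_tk, t0, t))
                       + 1.0 / ((1.0 - r) * event_prob(lam_ck, t0, t))) / n)
        c.append((z_alpha * se_null + log(psi_k)) / se_alt)
    rho_hr = (1.0 - r) * rho_t + r * rho_c
    return bvn_cdf(-c[0], -c[1], rho_hr)


def sample_size(psi, lam_c, rho, target=0.8, **kwargs):
    """Smallest integer n whose conjunctive power reaches the target (common rho)."""
    n = 2
    while conjunctive_power(n, psi, lam_c, rho, rho, **kwargs) < target:
        n += 1
    return n


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8):
        print(rho, sample_size((0.667, 0.667), (0.5, 0.5), rho))
```

Replacing `se_null` by `se_alt` in the numerator of `c` would give the simplified statistic $c_k'$. The integration helper does not handle $|\rho| = 1$, so a correlation of exactly one would need a separate univariate calculation.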
3. Behaviors of Sample Size and Empirical Power
Bivariate Exponential Distributions
Sample Size Behavior
Empirical Power for Log-Rank Test

Models for Correlation: Bivariate Exponential Distributions

1. Clayton Copula Model (Clayton, 1978)

$$
S_0(u, v; \theta) = \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}, \qquad 0 \le \theta
$$

- Times are positively associated: $0 \le \rho < 1$
- Late dependency
- $\theta$: association parameter

2. Positive Stable Copula Model (Hougaard, 1984)

$$
S_0(u, v; \theta) = \exp\!\left[-\left\{(-\log u)^{1/\theta} + (-\log v)^{1/\theta}\right\}^{\theta}\right], \qquad 0 \le \theta \le 1
$$

- Times are positively associated: $0 \le \rho < 1$
- Early dependency
- $\theta$: association parameter

3. Fatal-Shock Model / Marshall-Olkin's Model (Marshall, Olkin, 1967)

$$
S_0(u, v; \theta_{12}) =
\begin{cases}
\exp\{-\theta_1 u - (\theta_2 + \theta_{12}) v\} & 0 \le u \le v \\
\exp\{-(\theta_1 + \theta_{12}) u - \theta_2 v\} & 0 \le v \le u
\end{cases}
$$

- The range is restricted: $0 \le \rho < \min(\lambda_1 / \lambda_2,\ \lambda_2 / \lambda_1)$
- Linear dependency
- $\theta_1, \theta_2, \theta_{12}$: hazard parameters

Clayton DG (1978). Biometrika 65, 141-151.
Hougaard P (1984). Biometrika 71, 75-83.
Marshall AW, Olkin I (1967). J Amer Statist Assoc 62, 30-44.

Relationship between Two Endpoints: Bivariate Exponential Distributions

[Figure: scatter plots of TIME 2 against TIME 1 (both axes 0.0 to 8.0) for the Clayton, Positive Stable and Fatal-Shock models (rows) at $\rho$ = 0.0, 0.3, 0.5, 0.8 and 0.95 (columns). Hazard-rate setting as printed: λT1 λC1 = λT2 λC2.]

Sample Size Behavior: Limited Recruitment and Censoring

[Figure: total sample size required (vertical axis, 350 to 550) against correlation (0.0 to 1.0), with curves for $N_{CN}$ and $N_{CN}^*$ in three panels: $\psi_1 = 0.667, \psi_2 = 0.667$; $\psi_1 = 0.667, \psi_2 = 0.625$; $\psi_1 = 0.667, \psi_2 = 0.50$. Settings: $T_0 = 2.0$, $T = 5.0$, $\lambda_{C1} = \lambda_{C2} = 0.5$, $\alpha = 0.025$, $1-\beta = 0.8$, $r = 0.5$.]

- All of the sample sizes decrease as the correlation approaches one.
  However, the degree of decrease is smaller when the difference between the hazard ratios is larger.
- The largest sample sizes are observed when the hazard ratios are equal and the correlation is zero.
- The value of $N_{CN}^*$ is always larger than that of $N_{CN}$.

Empirical Power for Log-Rank Test: Clayton Copula Model

[Figure: empirical conjunctive power (vertical axis, 0.70 to 0.90) against correlation (0.0 to 1.0), with curves for $N_{CN}$ and $N_{CN}^*$ in three panels: $\psi_1 = 0.667, \psi_2 = 0.667$; $\psi_1 = 0.667, \psi_2 = 0.625$; $\psi_1 = 0.667, \psi_2 = 0.50$. Settings: $T_0 = 2.0$, $T = 5.0$, $\lambda_{C1} = \lambda_{C2} = 0.5$, $\alpha = 0.025$, $1-\beta = 0.8$, $r = 0.5$.]

- All of the empirical powers decrease as the correlation approaches one.
- In particular, the empirical powers fall below the desired power of 0.8 when the correlation is greater than approximately 0.4, and exceed it when the correlation is less than about 0.4.
- The empirical power of $N_{CN}^*$ is always better than that of $N_{CN}$.

* 100,000 Monte-Carlo trials

Empirical Power for Log-Rank Test: Positive Stable Copula Model

[Figure: empirical conjunctive power (vertical axis, 0.70 to 0.90) against correlation (0.0 to 1.0), with curves for $N_{CN}$ and $N_{CN}^*$ in the same three panels and settings as for the Clayton copula model.]

- The empirical powers change little with the correlation and attain the desired power of 0.8.
- The empirical power of $N_{CN}^*$ is always slightly larger than that of $N_{CN}$.

* 100,000 Monte-Carlo trials

Empirical Power for Log-Rank Test: Fatal-Shock Model

[Figure: empirical conjunctive power (vertical axis, 0.70 to 0.90) against correlation (0.0 to 1.0), with curves for $N_{CN}$ and $N_{CN}^*$ in the same three panels and settings as for the Clayton copula model.]

- The empirical powers change little with the correlation and attain the desired power of 0.8.
- The empirical power of $N_{CN}^*$ is always slightly larger than that of $N_{CN}$.

* 100,000 Monte-Carlo trials

4. Further Developments
At Least One Statistical Significance
Non-Inferiority Hypothesis
Mixed Binary and Time-to-Event Endpoints

At Least One Statistical Significance: Power for Bonferroni Adjustment

Overall power for showing statistical significance on at least one endpoint with Bonferroni adjustment:

$$
1-\beta = 1 - \Pr\!\left[\bigcap_{k=1}^{2} \{Z_k > -z_{\alpha/2}\}\right]
$$

This is the "Disjunctive power" or "Minimal power" (Senn, Bretz, 2007).

[Figure, left panel: disjunctive power (vertical axis, 0.0 to 1.0) against correlation (0.0 to 1.0), with curves for $\psi_2$ = 0.667, 0.625, 0.556 and 0.50. Right panel: ratio of total sample size required (vertical axis, 1.0 to 1.7) against correlation, with curves for the same values of $\psi_2$. Settings: $\psi_1 = 0.667$, $\lambda_{C1} = \lambda_{C2} = 0.5$, $T_0 = 2.0$, $T = 5.0$, $\alpha = 0.025$, $1-\beta = 0.8$, $r = 0.5$.]

Non-Inferiority Hypothesis: Power and Sample Size

NI hypotheses, with non-inferiority margins $M_1$ and $M_2$:

$$
H_0: \log\psi_1 \ge \log M_1 \ \text{ or } \ \log\psi_2 \ge \log M_2, \qquad
H_1: \log\psi_1 < \log M_1 \ \text{ and } \ \log\psi_2 < \log M_2
$$

Test statistics:

$$
Z_k = \frac{\log\hat\psi_k - \log M_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}
$$

Overall power for showing joint statistical significance (heterogeneous variance):

$$
1-\beta = \Phi_2(-c_1, -c_2 \mid \rho_{HR}), \qquad
c_k = z_\alpha + \frac{\log\psi_k - \log M_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r\,\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\,\phi(\lambda_{Ck})}\right)}}
$$

Binary and Time-to-Event Outcomes: Correlation

Correlation between hazard ratio and relative risk:

$$
\mathrm{corr}\!\left[\log\frac{\hat p_T}{\hat p_C},\ \log\frac{\hat\lambda_T}{\hat\lambda_C}\right]
\approx -\frac{(1-r)\,\rho_T \lambda_T \sqrt{p_T q_T} + r\,\rho_C \lambda_C \sqrt{p_C q_C}}
{\sqrt{\left\{(1-r)\lambda_T^2 + r\lambda_C^2\right\}\left\{(1-r)\,p_T q_T + r\,p_C q_C\right\}}}
$$

Binary endpoint
- $Y_{Ti} \sim \mathrm{Bin}(n_T, p_T)$, $E[Y_{Ti}] = p_T$, $\mathrm{var}[Y_{Ti}] = p_T q_T$
- $Y_{Cj} \sim \mathrm{Bin}(n_C, p_C)$, $E[Y_{Cj}] = p_C$, $\mathrm{var}[Y_{Cj}] = p_C q_C$

Time-to-event endpoint
- $S_{Ti} \sim \mathrm{Exp}(\lambda_T)$, $E[S_{Ti}] = \lambda_T^{-1}$, $\mathrm{var}[S_{Ti}] = \lambda_T^{-2}$
- $S_{Cj} \sim \mathrm{Exp}(\lambda_C)$, $E[S_{Cj}] = \lambda_C^{-1}$, $\mathrm{var}[S_{Cj}] = \lambda_C^{-2}$

- One issue is how to define the correlation: one option is to use the correlation from the joint distribution obtained as a limiting distribution of copulas.

5. Summary

Summary

- We described power and sample size determination for comparative clinical trials with two correlated time-to-event endpoints evaluated as primary variables.
  - A simpler approach that assumes the time-to-event endpoints are exponentially distributed
  - Significance must be shown on both endpoints as proof of an acceptable efficacy profile
- The method may work when the dependency structure is early or linear. Careful use of the method is recommended when late, high dependency is observed.
- Our research is restricted to two-treatment comparisons with two time-to-event endpoints.
  - The results for two endpoints give insight into the case of more than two endpoints
  - Extending the results to more than two hazard ratios is not difficult, although other issues will arise

Thank you for your kind attention.
If you have any questions, please e-mail [email protected]