A How-To Guide: Evaluating Surrogate Endpoints in Oncology Trials
Ming Poi, PharmD, PhD
Specialty Practice Pharmacist, Phase 1 Clinical Treatment Unit and Investigational Drug Service
The Arthur G. James Cancer Hospital and Richard J. Solove Research Institute at The Ohio State University
2013 Annual HOPA Conference in Los Angeles, California
March 21, 2013

Faculty Disclosures
• Nothing to disclose pertinent to this presentation
Heimberg J, et al. N Engl J Med. 1999;102:365-78

Objectives
• Identify appropriate statistical methods for analysis of common oncology trial designs
• Review available statistical methods for reporting surrogate endpoints in oncology clinical trials
• Describe statistical issues in the evaluation of prognostic and predictive biomarkers in oncology

Common Endpoints/Outcomes
• Overall Survival (OS)
• Progression Free Survival (PFS)
• Disease Free Survival (DFS)
• Time to Progression (TTP)
• Time to Treatment Failure (TTF)
• Disease Specific Survival (DSS)
• Complete Response (CR)
• Durable Complete Response (DCR)
• Partial Response (PR)
• Objective Response Rate (ORR) = (CR + PR)
• Stable Disease (SD)
• Progressive Disease (PD)

Overall Survival (OS)
• Viewed as the gold standard (“cure”)
• “Failure” = Death
• Required for randomized phase 3 trials, however…
- May result in inability to complete the study, a high dropout rate, loss to follow-up, etc. (long study)
- Requires a large sample size for a definitive (“statistically significant”) result
- Takes longer to complete and to reach the required number of events

Pitfalls of OS
• Difficult to determine OS in “chronic disease” (non-curative) for evaluation of efficacy
• Potential influence of a variety of possible 2nd-, 3rd-, and 4th-line therapies on OS
• Particularly for agents hypothesized to produce their clinical impact via cytostatic rather than cytotoxic effects

Surrogate Endpoint - Definition
• A marker (a laboratory measurement or physical sign) that is used in clinical trials as an indirect or substitute measurement that represents a clinically meaningful outcome, such as survival or symptom improvement
• “…intended to substitute for a clinical endpoint” (Temple R.J. JAMA 1999; 282(8):790-5; Lesko and Atkinson. Ann Rev Pharmacol Toxicol 2001; 41:347-366)

Surrogate Endpoint
• “intended to substitute for a clinical endpoint”
- Projects effects on a clinical benefit endpoint
- Can be used to reasonably predict clinical benefit
• The pathophysiology of the disease or the mechanistic pathway must be reasonably well understood
- Is the biomarker in the causal pathway of the disease process? (Woodcock, CDER-FDA, 2011)
Example: Blood pressure ↔ stroke prevention and CVD
• Requires validation from many randomized trials

Surrogate Endpoint – Regulatory Definitions
Source | Definition
57 FR 13234–13242 (1992) | A surrogate end point, or “marker,” is a laboratory measurement or physical sign that is used in therapeutic trials as a substitute for a clinically meaningful endpoint that is a direct measure of how a patient feels, functions or survives and is expected to predict the effect of the therapy.
FDAMA 1997, USC Section 504(b)(1) | …a “surrogate” endpoint that is reasonably likely to predict clinical benefit.
Title 21 – Food and Drugs, 21 C.F.R. 314, Section 314.510 | …a “surrogate” endpoint that is reasonably likely, based on epidemiologic, therapeutic, pathophysiologic, or other evidence, to predict clinical benefit.
Guidance for Industry: Evidence-based review system for the scientific evaluation of health claims | Surrogate endpoints are risk biomarkers that have been shown to be valid predictors of disease risk and therefore may be used in place of clinical measurements of the onset of the disease in a clinical trial.
FR = Federal Register; FDAMA = Food and Drug Administration Modernization Act; CFR = Code of Federal Regulations

Prentice’s Criteria for Surrogate Endpoint Validation
• Correlate - the surrogate endpoint should be statistically correlated with the clinical endpoint
• Capture - the surrogate endpoint should account for all of an intervention’s effects
(Prentice, R. L. Statistics in Medicine 1989; 8: 431–440)

Ideal Surrogate
Guide to Clinical Trials:
• “The ideal surrogate endpoint is a disease marker that reflects what is happening with the underlying disease. The relationship between the marker and the true endpoint is important to establish…. the validity of data based on how the marker is affected by a medicine/treatment can be translated into a valid statement about the disease and true endpoint” (Spilker, B. (1991). Guide to clinical trials. New York: Raven Press)
[Figure: Intervention, Disease, Surrogate Endpoint, and True Clinical Outcome shown along a Time axis] (Fleming TR and DeMets DL. Ann Intern Med. 1996; 125(7):605-13)

Advantages of Surrogate Endpoints
• Expedite the approval of new drugs/indications that treat serious/life-threatening illnesses and are expected to provide a meaningful therapeutic benefit over existing therapy
• Smaller and shorter trials = ↓ $$, ↓ time

Potential Surrogate Endpoints in Oncology
• Response rate
• Molecular endpoint assessment
• Functional imaging
• Tumor marker
• Biomarker

Surrogate Endpoint – Accelerated Approval
• From 1992 to 2008, FDA approved 90 applications for drugs based on surrogate endpoints through its accelerated approval process
• 79 of the 90 were for drugs to treat cancer, HIV/AIDS, and inhalational anthrax
• Approval is given on the condition that postmarketing trials be performed to verify clinical benefit (aka phase IV confirmatory trials)
• If the confirmatory trial(s) fail, the drug or indication could be removed from the market

Examples of Cancer Drugs Approved by the FDA under Accelerated Approval Process from 1992 – 2008 Using Surrogate Endpoint(s)
Drug Name | Approval Date | Approval Indication | Surrogate Endpoint(s) Used
Bicalutamide | Oct. 4, 1995 | Combination therapy for the treatment of advanced prostate cancer | Time to treatment failure
Docetaxel | May 14, 1996 | Treatment of locally advanced or metastatic breast cancer in specific patients | Response rate
Irinotecan | June 14, 1996 | Treatment of metastatic carcinoma of the colon or rectum in certain circumstances | Response rate
Capecitabine | April 30, 1998 | Treatment of a specific type of metastatic breast cancer in certain patients | Response rate
Temozolomide | Aug. 11, 1999 | Treatment of refractory anaplastic astrocytoma in specific adult patients | Progression free survival at 6 months and objective response
Imatinib | May 10, 2001 | Treatment of chronic myeloid leukemia in certain circumstances | Hematologic/cytogenetic response
Letrozole | Oct. 29, 2004 | Extended adjuvant treatment of early breast cancer in specific postmenopausal women | Disease free survival
Nilotinib | Oct. 29, 2007 | Treatment of chronic phase and accelerated phase Philadelphia chromosome positive chronic myelogenous leukemia in specific adult patients | Major cytogenetic response and hematologic response
Bevacizumab | Feb. 22, 2008 | Treatment of breast cancer in specific patients | Progression-free survival

Examples of Cancer Drugs Approved by the FDA under Traditional Approval Process from 1992 – 2008 Using Surrogate Endpoint(s)
Drug Name | Approval Date | Approval Indication | Surrogate Endpoint(s) Used
Exemestane | Oct. 21, 1999 | Treatment of advanced breast cancer in postmenopausal women | Objective response rate (partial and complete)
Sorafenib | Dec. 20, 2005 | Treatment of advanced renal cell carcinoma | Progression free survival
Sunitinib | Jan. 26, 2006 | Treatment of gastrointestinal stromal tumor and advanced renal cell carcinoma | Time to progression
Lapatinib | Mar. 13, 2007 | Treatment of advanced metastatic breast cancer | Time to progression
Ixabepilone | Oct. 16, 2007 | Treatment of metastatic or locally advanced breast cancer | Progression free survival
Bendamustine | Mar. 20, 2008 | Treatment of chronic lymphocytic leukemia | Objective response and progression-free survival
(Source for both tables: GAO-09-866 New Drug Approval: FDA Needs to Enhance Its Oversight of Drugs Approved on the Basis of Surrogate Endpoints (2009). Accessed from: www.gao.gov. 3 December, 2012)

Which of the following is an acceptable surrogate endpoint for a phase III trial to investigate if Drug XYZ improves the outcomes in colon cancer patients in an adjuvant setting?
a) Complete response
b) Disease free survival
c) Stable disease
d) Time to progression

Which of the following is the least preferred regulatory endpoint for drug approval in oncology?
a) Time to treatment failure (TTF)
b) Disease free survival (DFS)
c) Progression free survival (PFS)
d) Time to progression (TTP)

Statistical Tests – Which One?
Key questions to ask:
1. What type of data (nominal, ordinal, or continuous)?
2. Are the samples we are comparing independent or related (e.g., a cross-over study)?
3. How many samples/groups are we comparing (2 or more)?
4. Do the data have a normal distribution?
Others: Equal variance? Confounders?

Common Statistical Tests
Type of Data | 1 sample | 2 samples (Independent) | 2 related samples | > 2 Independent Samples
Nominal (“categorical” data) | - | Pearson Chi-squared (χ2) Test; Fisher’s Exact Test | McNemar Test | Chi-squared (χ2) for k independent samples
Ordinal (ranked data) | - | Mann-Whitney U Test (non-parametric); Wilcoxon Rank-Sum Test (non-parametric); Kolmogorov-Smirnov Test (non-parametric) | Sign Test; Wilcoxon Signed-Rank Test | Kruskal-Wallis One-Way Analysis of Variance (ANOVA)
Continuous (interval data) | Z-Test (population SD known OR n is large (> 30)); t-Test (population SD unknown AND n is small (< 30)) | Student t-Test (parametric); Welch Test (same as Student t-Test, unequal variance); Mann-Whitney U Test (non-parametric) | Paired t-Test | Analysis of Variance (ANOVA)
n = sample size; SD = standard deviation

Types of Error
• α (alpha) = Type I
- “False positive”, “an innocent person goes to jail”
- Occurs when there really is no difference but random sampling error caused the data to show statistical significance
- H0 was rejected but should not have been
• β (beta) = Type II [*power = (1-β)]
- “False negative”, “set the guilty free”
- Occurs when there really is a difference but random sampling error caused the data to fail to show statistical significance
- Failed to reject H0 but should have

Type I and II Errors: Analogy
Justice System | Defendant Innocent | Defendant Guilty
Guilty Verdict | Type I | -
Not Guilty Verdict | - | Type II

Statistical Testing | H0 True | H0 False
Reject H0 | Type I | -
Fail to reject H0 | - | Type II
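The relationship between α, β, and power described above can be illustrated numerically. The following is a minimal sketch and not part of the original slides: it assumes a 2-sided, two-sample z-test for a difference in means with a known common SD, and the function name and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a 2-sided, two-sample z-test.

    Assumptions (illustrative only): equal group sizes, known common SD
    `sigma`, and a true difference in means of `delta`.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection threshold
    se = sigma * sqrt(2 / n_per_group)           # SE of the difference in means
    shift = abs(delta) / se                      # true effect in SE units
    upper = 1 - std_normal.cdf(z_crit - shift)   # reject in the direction of the effect
    lower = std_normal.cdf(-z_crit - shift)      # reject in the opposite direction
    return upper + lower


if __name__ == "__main__":
    # Hypothetical scenarios: vary alpha, n, and the true difference.
    scenarios = [
        ("alpha=0.05, n=64, delta=0.5", 0.5, 64, 0.05),
        ("alpha=0.01, n=64, delta=0.5", 0.5, 64, 0.01),
        ("alpha=0.05, n=32, delta=0.5", 0.5, 32, 0.05),
        ("alpha=0.05, n=32, delta=1.0", 1.0, 32, 0.05),
    ]
    for label, delta, n, alpha in scenarios:
        power = power_two_sample_z(delta, sigma=1.0, n_per_group=n, alpha=alpha)
        print(f"{label}: power={power:.3f}, type II error={1 - power:.3f}")
```

Under these assumptions, tightening α or shrinking the sample lowers power (raises the Type II error), while a larger true difference raises power.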
If the significance level, α, is decreased (i.e., from 0.05 to 0.01), then the chance of a Type II error will be
a) Increased
b) Decreased
c) Unchanged

So What…
[Figure: sampling distributions when H0 is true and when Ha is true, with the Type I error, Type II error, and Power (1 - Type II) regions marked]
• Set α = 0.05, N = N1, difference between means = D
• Decrease N, same difference of means: N = N2 (where N2 < N1). Effects = increase in Type II error = decrease in Power
• Reverse in effects: 2-fold increase in difference in means, N = N2. Effects = decrease in Type II error = increase in Power
(Applet on http://www.intuitor.com/statistics/T1T2Errors.html)

Sample Size Calculation
Comparing 2 means of 2 independent samples
Sample size “n” for testing two means:
$n = \dfrac{2\,(Z_{\alpha} + Z_{\beta})^{2}\,\sigma^{2}}{\delta^{2}}$
α = type I error; β = type II error; δ = critical difference; σ² = variance
** The population SD, σ, is needed in the sample size formula. This typically unknown value may be estimated (i) from historical data, or (ii) from a previous study (i.e., phase II)

Sample Size Calculation
Comparing 2 proportions of 2 independent samples
• For 1-sided test: [formula not preserved in the transcript]
• For 2-sided test: [formula not preserved in the transcript]
α = type I error; β = type II error; p0 = hypothesized population proportion or rate; p1 = proportion under the alternative hypothesis

Sample Size Calculation
A randomized trial is proposed to assess the effectiveness of HOPAtinib compared to Standard-of-Care (SOC) for the treatment of patients with locally advanced or metastatic non-small cell lung cancer (NSCLC) that is HOPA-positive. A previous study showed that the proportion of subjects cured by HOPAtinib is 50%, and a clinically important difference of 15% compared to SOC is targeted.
α = 5%, Power = 80%, 2-sided test. What is the n needed?
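The sample size formulas can be sketched in code. This is a minimal illustration, not the presenter's own calculation: the two-means function follows the formula shown above (with Zα/2 substituted for a 2-sided test), while the two-proportions function uses a common unpooled-variance formula, n = (Zα/2 + Zβ)² [p1(1-p1) + p2(1-p2)] / (p1-p2)², which is an assumption because the slide's own formula did not survive transcription. It does reproduce the per-group figure of 167 reported for the HOPAtinib example. All function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def _z(p):
    """Standard normal quantile (what a z table would give)."""
    return NormalDist().inv_cdf(p)


def n_two_means(sigma, delta, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for comparing two means: 2 (Za + Zb)^2 sigma^2 / delta^2."""
    z_a = _z(1 - alpha / 2) if two_sided else _z(1 - alpha)
    z_b = _z(power)
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)


def n_two_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for comparing two proportions (assumed unpooled-variance formula)."""
    z_a = _z(1 - alpha / 2) if two_sided else _z(1 - alpha)
    z_b = _z(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance_sum / (p1 - p2) ** 2)


if __name__ == "__main__":
    # HOPAtinib example: 50% vs. 35% cured, alpha = 5%, power = 80%, 2-sided.
    per_group = n_two_proportions(0.50, 0.35)
    print("per group:", per_group, "total:", 2 * per_group)
    # Hypothetical two-means example: SD = 1, critical difference = 0.5.
    print("two means, per group:", n_two_means(sigma=1.0, delta=0.5))
```

Rounding up with `ceil` is a conservative design choice; trials usually inflate the total further to allow for dropout.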
With α = 5%, power = 80%, and a 2-sided test:
• p1 = proportion of subjects cured by HOPAtinib = 0.50
• p2 = proportion of subjects cured by SOC = 0.35
• p1 - p2 = clinically significant difference = 0.15
• Zα/2 = 1.96 (refer to the z table in any stats text)
• Zβ = 0.84 (refer to the z table in any stats text)
Based on the above formula, the sample size required per group is 167. Hence the total sample size required is 334 (~340).

Non-inferiority (NI) Trials
Rationale for NI Trial:
• To determine whether a new treatment is no worse than a reference treatment by more than a pre-defined margin
• NI ≠ the 2 treatments/drugs are equivalent
• NI ≠ the new drug is not inferior to standard therapy
• Take into consideration: the new treatment may not be better than the standard but may have other advantages, e.g., cost, toxicity profile, invasiveness, etc.

Non-inferiority (NI) Trials
E = experimental therapy
S = standard therapy
∆ = the margin (difference between E and S)
µ = mean
H0 for NI in layman terms:
• E is NOT non-inferior to S (a double negative statement)
Ha: E IS non-inferior to S
• Therefore, if p < 0.05, we reject the null hypothesis and conclude that E is non-inferior to S by the pre-defined margin
• For p > 0.05, we fail to reject the null hypothesis = cannot conclude that E is non-inferior to S

NI Checklists
• In NI trials, ITT analysis will often increase the risk of type I error (falsely rejecting H0). Look for a “per-protocol” analysis, or both analyses
• Is the defined margin reasonable?
• Are the 2 arms set up “fairly” (equipotent dosing, stopping rule, etc.)?

Biomarker
• Defined as a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or biological responses to a therapeutic intervention
(FDA. Drug Development Tools Qualification Program. Accessed from: http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm284395.htm. 30 November, 2012)
Examples: Cholesterol, serum creatinine, blood sugar, tumor size from magnetic resonance imaging (MRI) or computed tomography (CT)

Biomarker
• May be a physiologic, pathologic, or anatomic characteristic/measurement associated with some aspect of normal or abnormal biologic function/process
• Change in biomarkers post treatment may predict/identify safety issues due to a drug
• Change in biomarkers post treatment may reveal a pharmacological activity expected to predict an eventual benefit from treatment

Prognostic Biomarkers
• A prognostic factor is any patient or disease characteristic that has a significant impact on a clinical endpoint. Example: performance status (PS)
• Usually expressed in terms of relative hazard of failure
• Used to determine treatment vs. no treatment following surgery
• Used to consider aggressiveness of treatment

Predictive Biomarkers
• Associated with response (benefit) or lack of response (benefit) to a particular therapy relative to other available therapy
• In statistics ~ “interaction effect”
• Used to select one treatment vs. another
• Example: Her2/neu overexpression/response to trastuzumab

Types of Validation for Prognostic and Predictive Biomarkers
• Analytical validation
- Accuracy compared to gold-standard assay
- Robust and reproducible
• Clinical validation
- Able to predict independent data
• Clinical/medical utility
- Result in patient benefit
- Actionable
- Improving treatment decisions

Need for Prognostic & Predictive Biomarkers
• Available treatments are:
- Not effective
- Unable to predict which patients are likely to benefit
• Control medical costs
• Improve the success rate of clinical drug development

Cheang, et al., 2009 (JNCI 2009; 101:736-75)
Background:
• New technology (gene expression profiling) has enabled a new molecular classification of breast cancer subtypes
• The ER(+) subtypes, Luminal A and Luminal B, have different characteristics and prognosis
• The Luminal-B subtype confers an increased risk of early relapse compared with Luminal-A

Cheang, et al., 2009 – Methods (cont’)
• Tumors from 357 patients with invasive breast carcinomas were subtyped by gene expression profile
- ER and PR status, HER2 status, and the Ki67 index (% of Ki67(+) cancer nuclei) were determined by immunohistochemistry (IHC)
- A pre-specified Ki67 cut point was used to distinguish luminal B from luminal A tumors
- The prognostic value of the IHC assignment for breast cancer recurrence-free and disease-specific survival was investigated with an independent tissue microarray of 4046 breast cancers by use of Kaplan-Meier curves and multivariable Cox regression

Cheang, et al., 2009 – Results
Subtype | Relapsed | No Relapse | Total
Luminal A | 151 | 474 | 625
Luminal B | 86 | 177 | 263
Luminal/HER2+ | 20 | 35 | 55
Total | 257 | 686 | 943
What is the Relative Risk (RR) for relapse between Luminal B vs. A?
RR = (86/263) / (151/625) = 1.35

Common Point Estimates
• Point estimates commonly seen (and misunderstood) in clinical oncology
- Odds ratio
- Risk difference
- Relative risk (aka risk ratio)
- Hazard ratio
Which one should be used??? What is the difference???

• Relative risk (RR) is meaningless for case-control studies! Use odds ratios (OR) instead
• RR cannot generally be calculated in a case-control study because the entire population has not been studied, so incidences are unknown, UNLESS the incidence is low:
RR = [a/(a+b)] / [c/(c+d)]
RR ≅ (a/b) / (c/d) ≅ OR
a, c = “event”

Hazard Ratio
• ~ “RR-averaged-over-time”
• The relative risk is assumed not to depend on time; it is constant over time
• Example: HR = 0.7 comparing patients in the temsirolimus group vs. interferon-α => patients in the temsirolimus group are at 0.7 times the risk of death as those in the interferon-α arm, at any given point in time

Understanding Diagnostic Test
Predicted Value \ Actual Value | Positive | Negative | Total number
Positive | TP | FP | PP
Negative | FN | TN | PN
Total number | AP | AN | N
Sensitivity = True Positive Rate = TP/AP
Specificity = True Negative Rate = TN/AN
Accuracy = (TP + TN) / N
Positive Predictive Value (PPV) = Precision = TP / PP
Negative Predictive Value (NPV) = TN / PN
F1 score (aka F-score) = a measure of a test's accuracy. It considers both the sensitivity and the precision (“the true positive factor”) = 2 x TP / (PP + AP)

Comparing ROC curves
• Comparing tests
[Figure: ROC curves labeled Best, Better, Good, and Worthless]
Shows the tradeoff between sensitivity and specificity (↑ sensitivity = ↓ specificity)

Example: T4 cut-off
Test Results [Goldstein and Mushlin (J Gen Intern Med 1987;2:20-24.)]
Determine Sensitivity and Specificity ….. ROC plot

Prognostic vs. Predictive Biomarkers Phase III Study Design
• Treatment-by-biomarker interactions
• Enrichment design
• Completely randomized design
• Randomized block design
• Biomarker-strategy design
• Prospective vs. retrospective

Targeted (Enrichment) Design
• Predictive Marker Study Design
Enrichment Design
• Only marker(+) patients are randomized and/or treated
• Example from NSCLC: EGFR mutation as a predictive biomarker (Mok et al. N Engl J Med; 361: 947-57)

Targeted (Enrichment) Design
[Diagram: Develop Predictor of Response to New Drug. Patient Predicted Responsive: randomized to New Drug or Control. Patient Predicted Non-Responsive: Off Study]

Biomarker Stratified Design
• Do not use the biomarker to restrict eligibility
• Purpose: To evaluate the new treatment overall and for the pre-defined population
[Diagram: Predictive biomarker for new Tx. Predicted Responders: New Tx or Control. Predicted Non-responders: New Tx or Control]

Examples of successful stories…
Drug | Disease | Target | Response Rate in Phase Ib/II studies (%) | Reference
Crizotinib | NSCLC | EML4-ALK | 57 | Kwak, et al., 2010
Imatinib | CML | BCR-ABL | 95 | Kantarjian, et al., 2002
Imatinib | GI stromal | KIT | 54 | Demetri, et al., 2002

A relative risk should not be computed for the following design because the prevalence of the disease is artificially constrained.
a) Randomized, double-blinded prospective study
b) Case-control study
c) Prospective cohort study
d) Cross-sectional study
e) Poll the audience…

Take Home Messages
• The only way to definitively determine treatment effectiveness is an RCT that has
- Intent-to-treat procedures and analysis
- Very little loss of follow-up data
- No other threats (randomization, blinding)
- Non-adherence is bad, but loss to follow-up is much worse
- Loss before randomization is OK, loss after randomization is not

Surrogate is a biomarker that
a) is intended to substitute for a clinical endpoint
b) can be used to reasonably predict clinical benefit
c) is in the causal pathway of disease process
d) requires validation from many randomized trials
e) All of the above

Thank you!
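Returning to the relapse counts from Cheang, et al. and the RR and OR formulas from the point-estimate slides, the calculations can be sketched as follows. This is an illustrative sketch only, not part of the original presentation; the function names are hypothetical, and a, b, c, d follow the slide's convention (a, c = events; b, d = non-events).

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)], with a and c the event counts."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d)."""
    return (a / b) / (c / d)


def risk_difference(a, b, c, d):
    """Absolute difference in event risk between the two groups."""
    return a / (a + b) - c / (c + d)


if __name__ == "__main__":
    # Luminal B: 86 relapsed, 177 did not. Luminal A: 151 relapsed, 474 did not.
    counts = (86, 177, 151, 474)
    print(f"RR = {relative_risk(*counts):.2f}")
    print(f"OR = {odds_ratio(*counts):.2f}")
    print(f"Risk difference = {risk_difference(*counts):.3f}")

    # Hypothetical rare-event table: when incidence is low, OR approximates RR.
    rare = (1, 999, 2, 998)
    print(f"rare events: RR = {relative_risk(*rare):.4f}, OR = {odds_ratio(*rare):.4f}")
```

Because relapse is common in this cohort, the OR sits noticeably further from 1 than the RR; the two converge only in the rare-event case.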