Sample Size Determination (樣本數的計算)

謝宗成, Assistant Professor
Institute of Medical Sciences, Tzu Chi University (慈濟大學醫學研究所)
[email protected]
TEL: 03-8565301 ext 2015
Office: 勤耕樓 712
Topics

Part I: Basic concepts
- Why calculate the sample size?
- Should the sample be large or small?
- How large a sample is enough?
- Conditions to consider when calculating the sample size

Part II: Software operation
- Sample size formula vs. effect size
- Software for sample size calculation
- Sample size for comparison of means
- Sample size for comparison of proportions
- Sample size for linear regression
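Why calculate the sample size?

- When the target population is very large (effectively infinite), we draw a finite random sample and use statistical methods to infer the characteristics of the population.
- E.g., studying the treatment effect of a new antihypertensive drug on patients with hypertension in Taiwan.
- The finite sample must be large enough to represent the population and reflect its characteristics.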
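Should the sample be large or small?

Drawbacks of too small a sample
- The sample is not representative enough to support inference about the population.
- A submitted proposal will be challenged.
- A submitted paper will not be accepted.

Drawbacks of too large a sample
- Resources and money are wasted unnecessarily.
- Spurious significance: a result can be statistically significant yet clinically insignificant. E.g., a test of a correlation coefficient is statistically significant although the sample correlation is only 0.1.
- When the sample makes up a large fraction of the population, the statistical assumption of an infinite population no longer holds.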
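The correlation example can be checked numerically. The sketch below is an illustration, not part of the original slides: it assumes the Fisher z-transform approximation for testing a correlation, and the function name `correlation_p_value` is made up for this example. It shows the same sample correlation of 0.1 moving from non-significant to significant purely because n grows.

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same weak correlation, increasing sample size.
for n in (100, 400, 1000):
    print(n, round(correlation_p_value(0.1, n), 4))
```

With r fixed at 0.1, the p-value is above 0.05 at n = 100 and below 0.05 at n = 400, although the strength of the association has not changed.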
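Should the sample be large or small? (continued)

When the study population is finite, is it still necessary to sample or to calculate a sample size?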
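How large a sample is enough?

Statistics: use the information in a finite sample to estimate the characteristics of the population and reach an appropriate decision.
=> information from a finite sample vs. a right or wrong decision

How to improve decision quality: use a sample large enough that
- the probability of a wrong decision is held within an acceptable range => Type I error;
- the probability of a correct decision is at least a specified level => power.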

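Statistical hypothesis testing

- Statistics: state two different and mutually opposed hypotheses about the population characteristic of interest, then use the sample information to test which hypothesis is more plausible.
- The hypothesis judged plausible on the basis of the sample is the product of the statistical analysis: the decision.
- E.g., evaluating whether a new antihypertensive drug is effective
  - Clinical trial of the new antihypertensive drug vs. placebo
  - Hypothesis 1: the new drug and the placebo have the same effect
  - Hypothesis 2: the new drug and the placebo have different effects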
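True state vs. decision

Decision                     H0 true                               H1 true
Accept H0                    Correct inference                     Incorrect inference
Reject H0 (≈ accept H1)      Incorrect inference (Type I error)    Correct inference (power)

Type I error (α)
- The probability of the wrong decision of rejecting H0 (accepting H1) when H0 is true.
- E.g., the new antihypertensive drug and the placebo in fact have the same effect, but the statistical analysis concludes that their effects differ; the Type I error is the probability of this mistake.

Power
- The probability of the correct decision of accepting H1 (rejecting H0) when H1 is true.
- E.g., the new antihypertensive drug and the placebo have different effects, and the statistical analysis also concludes that their effects differ; the power is the probability of this correct conclusion.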

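When should the null hypothesis be rejected (the alternative hypothesis accepted)?

- When the P-value < the maximum tolerable Type I error fixed in advance (i.e., α). E.g., P-value < 0.05.
- Statistics learns about a population through a finite sample, so any decision based on a sample carries a risk (i.e., a probability) of being wrong.
- P-value: computed assuming the null hypothesis is in fact true, the probability of sample evidence at least as extreme as that observed. Rejecting only when the P-value is below α keeps the probability of wrongly rejecting a true null hypothesis (the Type I error) at or below α.
- E.g., if the new antihypertensive drug and the placebo in fact have the same effect, the Type I error is the probability that, after analyzing the trial sample, we wrongly conclude that their effects differ.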
Conditions to consider when calculating the sample size

The sample size is computed from the sample size formula derived for the corresponding statistical analysis method. Factors to settle first:
- Prior information (empirical data)
- Study objective
- Study design
- Number of treatment groups
- Outcome measure
- Statistical method
- Statistical hypothesis
- Detectable treatment effect => clinically meaningful effect
- Type I error (α) and power
- Allocation ratio (ratio of sample sizes between groups)
- Anticipated dropout rate (proportion of subjects who leave the study before an outcome can be assessed)

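Prior information (empirical data)
- What is known about the population before sampling (before running the trial)
  - From a pilot study, e.g., a phase II study
  - From references
  - Based on a guess
- E.g., a phase III study is planned. From the phase II results:
  - Mean blood pressure reduction, new antihypertensive drug vs. placebo: 10.5 mmHg vs. 1.2 mmHg
  - SD of the blood pressure reduction, new antihypertensive drug vs. placebo: 5.2 mmHg vs. 0.4 mmHg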
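Study objective and the matching analysis
- Compare the mean effect between treatment groups: t-test, ANOVA
- Build a prediction model: regression models
- Explore the association between variables: correlation analysis
- Establish a diagnostic criterion: ROC curve analysis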
Study design and the matching analysis
- Parallel design: unpaired t-test, ANOVA
- Crossover design: mixed effect model
- Group sequential design: group sequential analysis
Parallel design
[Figure: after randomization, subjects are assigned to the test arm (A medication) or the control arm (B medication).]

Crossover design
[Figure: after randomization, Sequence I receives A medication in Period I and B medication in Period II; Sequence II receives B medication in Period I and A medication in Period II. A washout period separates the two periods.]

Number of treatment groups
- Single-arm (without control group): 1 group; paired t-test
- Two-arm (with control group): 2 groups; unpaired t-test
- Dose-response study: possibly more than 2 groups; ANOVA

Outcome measure
- Quantitative variable: blood pressure, blood glucose, etc.
- Qualitative variable: good / bad, response / no response, etc.
- Time to event
Statistical method
- Quantitative variable
  - Comparison of means: paired t-test, unpaired t-test, ANOVA
  - Prediction model: regression analysis
- Qualitative variable
  - Comparison of population proportions: Chi-square test
  - Prediction model: logistic regression
- Time to event
  - Survival analysis

Allocation ratio (ratio of sample sizes between groups)
- Active drug vs. placebo: 2 vs. 1 or more

Anticipated dropout rate (proportion of subjects who leave the study before an outcome can be assessed)
- Dropout rate = P
- Total sample size = (no. of evaluable subjects) / (1 - P)
- E.g., the sample size formula requires 40 evaluable subjects and the dropout rate is 0.2. The required sample size is then 40 / (1 - 0.2) = 50.
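The dropout adjustment is a one-line calculation. A minimal sketch follows; the function name `inflate_for_dropout` is hypothetical, and rounding up to a whole subject is an assumption consistent with the worked examples in these slides.

```python
import math


def inflate_for_dropout(n_evaluable: int, dropout_rate: float) -> int:
    """Total enrollment needed so that n_evaluable subjects remain."""
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(n_evaluable / (1.0 - dropout_rate), 9))


print(inflate_for_dropout(40, 0.2))  # 50, as in the slide's example
```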


Detectable treatment effect
- Clinically meaningful effect vs. statistically significant effect

Statistical hypothesis
- Test for equality (difference = 0)
- Test for superiority
- Test for noninferiority

Type I error (α) and power

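Types of statistical hypothesis: example

- A clinical trial of a new drug, DRUGN, is run to evaluate its effect in treating hypertension.
- Control group: DRUGC
- Outcome measure: reduction in blood pressure after 6 months of treatment
- μN: mean blood pressure reduction with DRUGN after 6 months of treatment
- μC: mean blood pressure reduction with DRUGC after 6 months of treatment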

Test for equality (difference)
- Purpose: do the effects of DRUGN and DRUGC differ?
- H0: μN = μC vs. H1: μN ≠ μC, or equivalently H0: μN - μC = 0 vs. H1: μN - μC ≠ 0

Test for superiority
- Purpose: is DRUGN more effective than DRUGC?
- If the clinically meaningful difference is δ (a positive value), then H0: μN - μC ≤ δ vs. H1: μN - μC > δ

Test for noninferiority
- Purpose: is DRUGN no worse than DRUGC?
- If the clinically meaningful difference is δ (a negative value), then H0: μN - μC ≤ δ vs. H1: μN - μC > δ

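Type I error (α)
- The probability of a wrong decision: H0 is true, but the decision treats H0 as false (rejects H0).
- Usually set to 0.05.

Power
- The probability of a correct decision: H0 is false (i.e., H1 is true), and the decision also treats H0 as false (rejects H0, i.e., accepts H1).
- Usually set to 0.8.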
Sample size formula vs. effect size

- Effect size: a measure of the strength of the relationship between two variables in a statistical population, or a sample-based estimate of that quantity.
- Some rules for choosing an effect size are useful for sample size calculation.
- Example: comparison of two independent means can be set up through an effect size or through a sample size formula:

$$ n_c = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2 \, (\sigma_c^2 + \sigma_t^2 / k)}{(\mu_t - \mu_c - \delta)^2}, \qquad n_t = k \, n_c $$
Software for sample size calculation

G*Power 3.1
- Graphical user interface for Windows
- Covers many statistical tests and designs
- Online user manual available
- Free!
- Website for download: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Sample size for comparison of means between 2 groups

Test for equality
- H0: µt - µc = 0 vs. H1: µt - µc ≠ 0
- Two-sided unpaired test
- k = nt / nc (allocation ratio)
- Desired significance level α, power 1 - β

Formula 1 (hand calculation)

$$ n_c = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \, (\sigma_c^2 + \sigma_t^2 / k)}{(\mu_t - \mu_c)^2}, \qquad n_t = k \, n_c $$

The values of μt, μc, σt², σc² in the formula must be filled in from prior information.

Test for equality

Formula 2 (used by G*Power)

$$ \text{Effect size } d = \frac{\mu_t - \mu_c}{\sqrt{(\sigma_c^2 + \sigma_t^2)/2}} $$

$$ \text{noncentrality parameter } \delta = d \sqrt{\frac{n_t n_c}{n_t + n_c}}, \qquad df = n_t + n_c - 2, \qquad n_t = k \, n_c $$

nt and nc are obtained by solving the equation power = 1 - β.
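To make Formula 2 concrete, the sketch below evaluates the effect size d and the noncentrality parameter for the phase II values used in the worked example in these slides (means 23 and 10, SDs 20 and 25, group sizes 78 and 39). The function names are illustrative only; solving power = 1 - β for n additionally requires the noncentral t distribution, which this sketch does not attempt.

```python
import math


def cohen_d(mu_t: float, mu_c: float, sd_t: float, sd_c: float) -> float:
    """Effect size d: mean difference over the root of the average variance."""
    return (mu_t - mu_c) / math.sqrt((sd_c ** 2 + sd_t ** 2) / 2)


def noncentrality(d: float, n_t: int, n_c: int) -> float:
    """Noncentrality parameter: d * sqrt(n_t * n_c / (n_t + n_c))."""
    return d * math.sqrt(n_t * n_c / (n_t + n_c))


d = cohen_d(23, 10, sd_t=20, sd_c=25)
print(round(d, 3))                         # 0.574
print(round(noncentrality(d, 78, 39), 2))  # 2.93
```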

Test for equality (by hand calculation)

Example: clinical trial of a new antihypertensive drug (phase III study)
- Study design: two-arm, randomized, parallel, controlled study
- Efficacy endpoint: the decrease in SBP from baseline after 6 months of treatment
- Treatment group: DRUGN; control group: placebo
- Allocation ratio: 2
- From the phase II study results, μt, μc, σt, σc are approximately 23, 10, 20, 25
- Desired significance level α = 5%, power 1 - β = 80%
- Dropout rate: 20%

$$ n_c = \frac{(1.96 + 0.84)^2 \, (25^2 + 20^2 / 2)}{(23 - 10)^2} = 38.3 $$

Rounding up gives nc = 39 and nt = 2 × nc = 78.

Total sample size = (39 + 78) / (1 - 0.2) ≈ 147
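The hand calculation above can be reproduced in a few lines. This is a sketch under the slide's own assumptions: it uses the rounded Z values 1.96 and 0.84 that appear in the slide (exact normal quantiles would give slightly different intermediate values), and the function name `n_control_equality` is made up for illustration.

```python
import math


def n_control_equality(mu_t, mu_c, sd_t, sd_c, k, z_alpha=1.96, z_beta=0.84):
    """Control-group size for a two-sided test of H0: mu_t - mu_c = 0."""
    n = (z_alpha + z_beta) ** 2 * (sd_c ** 2 + sd_t ** 2 / k) / (mu_t - mu_c) ** 2
    return math.ceil(round(n, 9))  # round() guards against float noise


n_c = n_control_equality(23, 10, sd_t=20, sd_c=25, k=2)
n_t = 2 * n_c
total = math.ceil(round((n_c + n_t) / (1 - 0.2), 9))  # 20% dropout
print(n_c, n_t, total)  # 39 78 147
```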

Test for equality (by G*Power)
- Test family: t tests
- Statistical test: Difference between two independent means (two groups)
- Type of power analysis: A priori: compute required sample size, given α, power, and effect size
- Input parameters:
  - Tails: Two
  - Effect size d: click "Determine =>", select the "n1 = n2" panel, enter the values below, then click "Calculate and transfer to main window"
    - Mean group 1: 23
    - Mean group 2: 10
    - SD group 1: 20
    - SD group 2: 25
  - α error prob: 0.05
  - Power: 0.8
  - Allocation ratio N2/N1: 0.5
- Click "Calculate"
- Click "X-Y plot for a range of values", then click "Draw plot"

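Test for equality (by G*Power): calculation of sample size
[Screenshot: G*Power window showing the calculated sample size. Annotation on the effect size panel: if the 2 groups have equal SDs, select this option.]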

Test for equality (by G*Power): power analysis for power vs. sample size
[Figure: plot of power against sample size.]

Test for superiority
- H0: µt - µc ≤ δ vs. H1: µt - µc > δ
- One-sided unpaired test
- Desired significance level α, power 1 - β
- Superiority margin (clinically meaningful difference) δ (a positive value)

Formula 1 (hand calculation)

$$ n_c = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2 \, (\sigma_c^2 + \sigma_t^2 / k)}{(\mu_t - \mu_c - \delta)^2}, \qquad n_t = k \, n_c $$


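Test for superiority: Formula 2 (used by G*Power)

$$ \text{Effect size } d = \frac{\mu_t - \mu_c - \delta}{\sqrt{(\sigma_c^2 + \sigma_t^2)/2}}, \qquad df = n_t + n_c - 2, \qquad n_t = k \, n_c $$

The test is carried out at the α significance level.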

Test for superiority (by hand calculation)
- Efficacy endpoint: SBP after 6 months of treatment
- Allocation ratio: 2
- From the phase II study results, μt, μc, σt, σc are approximately 23, 10, 20, 25
- Desired significance level α = 5%, power 1 - β = 80%
- Superiority margin (clinically meaningful difference) δ = 10
- Dropout rate: 20%

$$ n_c = \frac{(1.645 + 0.84)^2 \, (25^2 + 20^2 / 2)}{(23 - 10 - 10)^2} = \frac{(1.645 + 0.84)^2 \, (25^2 + 20^2 / 2)}{3^2} = 566.1 $$

Rounding up gives nc = 567 and nt = 2 × nc = 1134.

Test for superiority (by G*Power)
- Test family: t tests
- Statistical test: Difference between two independent means (two groups)
- Input parameters:
  - Tails: One (one-sided test)
  - Effect size d: because μt - μc - δ = 3, set the two group means so that their difference is 3
    - Mean group 1: 3
    - Mean group 2: 0
    - SD group 1: 20
    - SD group 2: 25
  - α error prob: 0.05
  - Power: 0.8
- Click "Calculate"

Test for superiority (by G*Power): calculation of sample size
[Screenshot: G*Power window showing the calculated sample size.]
The hand-calculated sample size is the more conservative of the two.

Test for noninferiority
- H0: µt - µc ≤ δ vs. H1: µt - µc > δ
- One-sided unpaired test
- Noninferiority margin (clinically meaningful difference) δ (a negative value)

Formula 1 (hand calculation)

$$ n_c = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2 \, (\sigma_c^2 + \sigma_t^2 / k)}{(\mu_t - \mu_c - \delta)^2}, \qquad n_t = k \, n_c $$


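Test for noninferiority: Formula 2 (used by G*Power)

$$ \text{Effect size } d = \frac{\mu_t - \mu_c - \delta}{\sqrt{(\sigma_c^2 + \sigma_t^2)/2}}, \qquad df = n_t + n_c - 2, \qquad n_t = k \, n_c $$

The test is carried out at the α significance level.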


Test for noninferiority (by hand calculation)
- Allocation ratio: 2
- From the phase II study results, μt, μc, σt, σc are approximately 18, 21, 20, 25
- Noninferiority margin (clinically meaningful difference) δ = -10
- Dropout rate: 20%

$$ n_c = \frac{(1.645 + 0.84)^2 \, (25^2 + 20^2 / 2)}{(18 - 21 - (-10))^2} = \frac{(1.645 + 0.84)^2 \, (25^2 + 20^2 / 2)}{7^2} = 103.9 $$

Rounding up gives nc = 104 and nt = 2 × nc = 208.

Total sample size = (104 + 208) / (1 - 0.2) ≈ 390
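The superiority and noninferiority hand calculations share one formula and differ only in the sign of the margin δ. The sketch below reproduces both examples from these slides; it assumes the slide's rounded one-sided Z values (1.645 and 0.84), and `n_control_margin` is a hypothetical name.

```python
import math


def n_control_margin(mu_t, mu_c, sd_t, sd_c, k, delta,
                     z_alpha=1.645, z_beta=0.84):
    """Control-group size for the one-sided test H0: mu_t - mu_c <= delta."""
    n = ((z_alpha + z_beta) ** 2 * (sd_c ** 2 + sd_t ** 2 / k)
         / (mu_t - mu_c - delta) ** 2)
    return math.ceil(round(n, 9))  # round() guards against float noise


# Superiority example: delta = +10 (positive margin)
n_sup = n_control_margin(23, 10, sd_t=20, sd_c=25, k=2, delta=10)
# Noninferiority example: delta = -10 (negative margin)
n_ni = n_control_margin(18, 21, sd_t=20, sd_c=25, k=2, delta=-10)

total_ni = math.ceil(round((n_ni + 2 * n_ni) / (1 - 0.2), 9))
print(n_sup, n_ni, total_ni)  # 567 104 390
```

The superiority design needs far more subjects because its denominator, (μt - μc - δ)², is only 3² compared with 7² in the noninferiority example.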

Test for noninferiority (by G*Power)
- Test family: t tests
- Statistical test: Difference between two independent means (two groups)
- Input parameters:
  - Tails: One (one-sided test)
  - Effect size d: because μt - μc - δ = 7, set the two group means so that their difference is 7
    - Mean group 1: 7
    - Mean group 2: 0
    - SD group 1: 20
    - SD group 2: 25
  - α error prob: 0.05
  - Power: 0.8
- Click "Calculate"

Test for noninferiority (by G*Power): calculation of sample size
[Screenshot: G*Power window showing the calculated sample size.]
The hand-calculated sample size is the more conservative of the two.
Sample size for comparison of means between 2 groups: nonparametric method

- A sample size based on a nonparametric method (the Mann-Whitney test) can also be obtained.
- Steps in G*Power
  - Test family: t tests
  - Statistical test: Means: Wilcoxon-Mann-Whitney test (two groups)
  - Type of power analysis: A priori: compute required sample size, given α, power, and effect size
  - The remaining steps are similar to those described in the previous slides.
Sample size for comparison of means between 2 groups: conventional effect sizes

If nothing is known about the values of μt, μc, σt², σc², the following effect sizes d proposed by Cohen J. (1969) can be considered:
- Small effect size: d = 0.2
- Medium effect size: d = 0.5
- Large effect size: d = 0.8
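To see what these conventions imply, the hand-calculation formula for the test of equality can be simplified. Assuming equal allocation (k = 1) and a common SD, it reduces to n per group = 2 (Z + Z)² / d². The sketch below applies that reduction with the slide's rounded Z values for α = 0.05 (two-sided) and power = 0.8; it is a normal approximation, so G*Power, which uses the t distribution, reports slightly larger numbers.

```python
import math

Z_ALPHA_TWO_SIDED = 1.96  # rounded value used in the slides (alpha = 0.05)
Z_BETA = 0.84             # rounded value used in the slides (power = 0.80)


def n_per_group(d: float) -> int:
    """Per-group size for a two-sided test, k = 1, equal SDs (normal approx.)."""
    n = 2 * (Z_ALPHA_TWO_SIDED + Z_BETA) ** 2 / d ** 2
    return math.ceil(round(n, 9))  # round() guards against float noise


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(label, d, n_per_group(d))  # 392, 63 and 25 per group
```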
Sample size for comparison of means among 3 or more groups

Test for equality
- H0: μ1 = μ2 = ... = μk for k ≥ 3 vs. H1: μi ≠ μj for some i ≠ j
- Analysis of variance (ANOVA)
- n: required sample size in each group

Formula 1 (hand calculation)

$$ \bar{\mu} = \frac{1}{k} \sum_{i=1}^{k} \mu_i, \qquad \Delta = \frac{1}{\sigma^2} \sum_{i=1}^{k} (\mu_i - \bar{\mu})^2, \qquad n = \frac{\lambda}{\Delta} $$

where σ is the SD within each group and μi is the mean in group i. The values of μi and σ must be filled in from prior information; λ is read from a table.

[Table of λ values: image not included in this transcript.]
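Sample size for comparison of means among 3 or more groups

Test for equality: Formula 2 (used by G*Power)

$$ \text{Effect size } f = \frac{\sigma_m}{\sigma}, \qquad \sigma_T^2 = \sigma_m^2 + \sigma^2 $$

$$ \bar{\mu} = \frac{1}{k} \sum_{i=1}^{k} \mu_i, \qquad \sigma_m^2 = \frac{\sum_{i=1}^{k} a_i (\mu_i - \bar{\mu})^2}{\sum_{i=1}^{k} a_i} $$

- σT²: total variability of the samples
- σm²: variability explained by treatment
- σ²: variance within group (MSE), i.e., variability due to random error
- μi: mean in group i
- ai: sample size in group i

The values of μi, σ, ai in the formula must be filled in from prior information.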
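A short sketch of the effect size f follows, evaluated for the three-arm example used in these slides (means 9.25, 11.75, 12; within-group SD 6; equal group sizes). The function name `cohen_f` is made up for illustration, and the code follows the slide's formula with the unweighted mean, which coincides with the size-weighted mean when the group sizes are equal.

```python
import math


def cohen_f(means, sizes, sd_within):
    """Effect size f = sigma_m / sigma for a one-way layout."""
    k = len(means)
    grand_mean = sum(means) / k  # unweighted mean, as written on the slide
    var_m = (sum(a * (m - grand_mean) ** 2 for a, m in zip(sizes, means))
             / sum(sizes))
    return math.sqrt(var_m) / sd_within


f = cohen_f([9.25, 11.75, 12], sizes=[5, 5, 5], sd_within=6)
print(round(f, 3))  # 0.207
```

Any common value can be used for the group sizes here; with equal sizes they cancel out of σm².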
Sample size for comparison of means among 3 or more groups

Test for equality (by hand calculation)
- Study design: three-arm, randomized, parallel, controlled study
- Treatment groups: DRUGA, DRUGB, DRUGC
- From the phase II study results, μ1, μ2, μ3, σ are approximately 9.25, 11.75, 12, 6
- Dropout rate: 20%

$$ \bar{\mu} = \frac{1}{k} \sum_{i=1}^{k} \mu_i = 11, \qquad \Delta = \frac{1}{\sigma^2} \sum_{i=1}^{k} (\mu_i - \bar{\mu})^2 = 0.12847 $$

$$ n = \frac{\lambda}{\Delta} = \frac{9.64}{0.12847} \approx 76 $$

where σ is the SD in each group and μi is the mean in group i.

Total sample size = (76 × 3) / (1 - 0.2) = 285
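The three-arm hand calculation can be reproduced as follows. This is a sketch: λ = 9.64 is taken as given from the slide rather than computed, and `anova_n_per_group` is a hypothetical name.

```python
import math


def anova_n_per_group(means, sd, lam):
    """Per-group n = lambda / Delta, with Delta = sum((mu_i - mean)^2) / sd^2."""
    k = len(means)
    grand_mean = sum(means) / k
    delta = sum((m - grand_mean) ** 2 for m in means) / sd ** 2
    return math.ceil(round(lam / delta, 9))  # round() guards against float noise


n = anova_n_per_group([9.25, 11.75, 12], sd=6, lam=9.64)
total = math.ceil(round(3 * n / (1 - 0.2), 9))  # 3 groups, 20% dropout
print(n, total)  # 76 285
```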
Sample size for comparison of means among 3 or more groups (by G*Power)
- Test family: F tests
- Statistical test: ANOVA: Fixed effects, omnibus, one-way
- Effect size f:
  - Select procedure: Effect size from means
  - Number of groups: 3
  - SD within each group: 6
  - Group 1: Mean = 9.25, size = 5
  - Group 2: Mean = 11.75, size = 5
  - Group 3: Mean = 12, size = 5
  - Note: "size" here is the sample size of the previous study; it is enough to enter any identical value for every group.
- Input parameters:
  - α error prob: 0.05
  - Power: 0.8
  - Number of groups: 3
- Click "Calculate"
Sample size for comparison of means among 3 or more groups: conventional effect sizes

If nothing is known about the values of μi and σ², the following effect sizes f proposed by Cohen J. (1969) can be considered:
- Small effect size: f = 0.1
- Medium effect size: f = 0.25
- Large effect size: f = 0.4

Suggested minimum sample size
- More than 20 per cell is preferred.
Sample size for comparison of proportions between 2 groups

Test for equality
- H0: Pt - Pc = 0 vs. H1: Pt - Pc ≠ 0
- Two-sided Chi-square test (or Z-test)

Formula

$$ n_c = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{(P_t - P_c)^2} \left[ P_c (1 - P_c) + P_t (1 - P_t) / k \right], \qquad n_t = k \, n_c $$

The values of Pt and Pc in the formula must be filled in from prior information.
Sample size for comparison of proportions between 2 groups

Test for equality (by hand calculation)
- Efficacy endpoint: response to treatment after 6 months of treatment
- Allocation ratio: 2
- From the phase II study results, Pt, Pc are approximately 85%, 65%
- Desired significance level α = 5%, power 1 - β = 80%
- Dropout rate: 20%

$$ n_c = \frac{(1.96 + 0.84)^2}{(0.85 - 0.65)^2} \left[ 0.65 (1 - 0.65) + 0.85 (1 - 0.85) / 2 \right] \approx 58, \qquad n_t = 2 \times 58 = 116 $$
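The same calculation in code, as a sketch using the slide's rounded Z values (1.96 and 0.84); the function name `n_control_prop_equality` is illustrative only.

```python
import math


def n_control_prop_equality(p_t, p_c, k, z_alpha=1.96, z_beta=0.84):
    """Control-group size for a two-sided test of H0: P_t - P_c = 0."""
    variance_term = p_c * (1 - p_c) + p_t * (1 - p_t) / k
    n = (z_alpha + z_beta) ** 2 * variance_term / (p_t - p_c) ** 2
    return math.ceil(round(n, 9))  # round() guards against float noise


n_c = n_control_prop_equality(0.85, 0.65, k=2)
n_t = 2 * n_c
print(n_c, n_t)  # 58 116
```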
Sample size for comparison of proportions between 2 groups (by G*Power)
- Test family: z tests
- Statistical test: Difference between two independent proportions
- Input parameters:
  - Tails: Two
  - Proportion p2: 0.65
  - Proportion p1: 0.85
  - α error prob: 0.05
  - Power: 0.8
- Click "Calculate"
Sample size for comparison of proportions between 2 groups

Test for superiority
- H0: Pt - Pc ≤ δ vs. H1: Pt - Pc > δ
- One-sided Chi-square test (or Z-test)
- Desired significance level α, power 1 - β
- Superiority margin δ (a positive value)

Formula

$$ n_c = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2}{(P_t - P_c - \delta)^2} \left[ P_c (1 - P_c) + P_t (1 - P_t) / k \right], \qquad n_t = k \, n_c $$
Sample size for comparison of proportions between 2 groups

Test for superiority (by hand calculation)
- Allocation ratio: 1
- From the phase II study results, Pt, Pc are approximately 85%, 65%
- Superiority margin δ = 5%
- Dropout rate: 20%

$$ n_c = \frac{(1.645 + 0.84)^2}{(0.85 - 0.65 - 0.05)^2} \left[ 0.65 (1 - 0.65) + 0.85 (1 - 0.85) / 1 \right] \approx 98, \qquad n_t = 1 \times 98 = 98 $$

This test is not available in G*Power.
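Sample size for comparison of proportions between 2 groups

Test for noninferiority
- H0: Pt - Pc ≤ δ vs. H1: Pt - Pc > δ
- One-sided Chi-square test (or Z-test)
- Noninferiority margin δ (a negative value)

Formula

$$ n_c = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2}{(P_t - P_c - \delta)^2} \left[ P_c (1 - P_c) + P_t (1 - P_t) / k \right], \qquad n_t = k \, n_c $$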
Sample size for comparison of proportions between 2 groups

Test for noninferiority (by hand calculation)
- Allocation ratio: 2
- From the phase II study results, Pt, Pc are approximately 70%, 75%
- Noninferiority margin δ = -10%
- Dropout rate: 20%

$$ n_c = \frac{(1.645 + 0.84)^2}{(0.70 - 0.75 + 0.1)^2} \left[ 0.75 (1 - 0.75) + 0.70 (1 - 0.70) / 2 \right] \approx 723, \qquad n_t = 2 \times 723 = 1446 $$

This test is not available in G*Power.
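Since the slides note that G*Power does not offer the margin-based tests for proportions, a small script is a practical substitute. The sketch below applies the hand-calculation formula to both the superiority and the noninferiority examples, assuming the slide's rounded one-sided Z values (1.645 and 0.84); `n_control_prop_margin` is a hypothetical name.

```python
import math


def n_control_prop_margin(p_t, p_c, k, delta, z_alpha=1.645, z_beta=0.84):
    """Control-group size for the one-sided test H0: P_t - P_c <= delta."""
    variance_term = p_c * (1 - p_c) + p_t * (1 - p_t) / k
    n = (z_alpha + z_beta) ** 2 * variance_term / (p_t - p_c - delta) ** 2
    return math.ceil(round(n, 9))  # round() guards against float noise


# Superiority example: Pt = 85%, Pc = 65%, k = 1, delta = +5%
n_sup = n_control_prop_margin(0.85, 0.65, k=1, delta=0.05)
# Noninferiority example: Pt = 70%, Pc = 75%, k = 2, delta = -10%
n_ni = n_control_prop_margin(0.70, 0.75, k=2, delta=-0.10)
print(n_sup, n_ni, 2 * n_ni)  # 98 723 1446
```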
Sample size for building a linear regression model

Test for all regression coefficients = 0
- H0: R² = 0 vs. H1: R² ≠ 0
- F-test
- Formula

$$ \text{Power} = P\left( \frac{n - 1 - p}{p} \cdot \frac{R^2}{1 - R^2} > F_{1-\alpha;\, p,\, n-1-p} \;\middle|\; R^2 = R_a^2 \right) $$

- p: number of predictors (independent variables)
- Ra²: the R² value obtained from previous study results
- n is obtained by solving the above equation.
- In G*Power, the effect size is

$$ f^2 = \frac{R^2}{1 - R^2} $$
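The conversion between R² and the effect size f² is simple enough to check by hand or in code. A minimal sketch, with illustrative function names, evaluated at the R² = 0.3 used in the regression example in these slides:

```python
def f2_from_r2(r2: float) -> float:
    """Effect size f^2 = R^2 / (1 - R^2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    return r2 / (1.0 - r2)


def r2_from_f2(f2: float) -> float:
    """Inverse conversion: R^2 = f^2 / (1 + f^2)."""
    return f2 / (1.0 + f2)


print(round(f2_from_r2(0.3), 4))   # 0.4286
print(round(r2_from_f2(0.15), 4))  # 0.1304
```

The inverse is handy for reading a conventional f² value as a proportion of explained variance.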

Test for all regression coefficients = 0 (by G*Power)

Example: multiple linear regression model for predicting blood pressure from LDL, HDL, gender, age, and TG
- Number of predictors: 5 (p = 5)
- From previous study results, R² is approximately 0.3

Steps
- Test family: F tests
- Statistical test: Linear multiple regression: Fixed model, R² deviation from zero
- Type of power analysis: A priori: compute required sample size, given α, power, and effect size
- Effect size: click "Determine =>"
  - Click "From correlation coefficient"
  - Squared multiple correlation ρ²: 0.3
  - Click "Calculate and transfer to main window"
- Input parameters:
  - α error prob: 0.05
  - Power: 0.95
  - Number of predictors: 5
- Click "Calculate"

Test for all regression coefficients = 0 (by G*Power): calculation of sample size
[Screenshot: G*Power window showing the calculated sample size.]

Sample size for building a linear regression model: conventional effect sizes

If nothing is known about the value of R², the following effect sizes f² proposed by Cohen J. (1969) can be considered:
- Small effect size: f² = 0.02
- Medium effect size: f² = 0.15
- Large effect size: f² = 0.35

Suggested minimum sample size
- At least 5 cases per predictor (5:1)
- Ideally 20 cases per predictor (20:1), with an overall N of at least 100
- N should ideally be 50 + 8k for testing a full regression model, or 104 + k when testing individual predictors (where k is the number of predictors)
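The rules of thumb above are easy to tabulate for a given number of predictors. The helper below is only a convenience sketch; the function name and dictionary keys are made up, and the rules themselves are taken directly from the slide.

```python
def minimum_n_rules(k: int) -> dict:
    """Rule-of-thumb minimum N for a regression with k predictors."""
    return {
        "5 cases per predictor": 5 * k,
        "20 cases per predictor, N >= 100": max(20 * k, 100),
        "50 + 8k (full model)": 50 + 8 * k,
        "104 + k (individual predictors)": 104 + k,
    }


# Five predictors, as in the blood pressure example
for rule, n in minimum_n_rules(5).items():
    print(f"{rule}: {n}")
```

For k = 5 the rules give 25, 100, 90 and 109, so the choice of rule matters more than the arithmetic.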
Key summary

Clarify the following factors before calculating the sample size:
- Study objective
- Study design
- Number of treatment groups
- Allocation ratio (ratio of sample sizes between groups)
- Anticipated dropout rate (proportion of subjects who leave the study before an outcome can be assessed)
- Statistical method
- Outcome measure
- Statistical hypothesis
- Detectable treatment effect => clinically meaningful effect
- Type I error (α) and power
Key summary (continued)

- Hand calculation can be used when no software is available.
- The effect size conventions proposed by Cohen J. (1969) can be considered when nothing is known about the parameter values.
- The suggested minimum sample size should still be met.
