Bootstrapping the triangles

Transcription

Bootstrapping the triangles
Bootstrapping the triangles
Prečo, kedy a ako (NE)bootstrapovat’ v trojuholníkoch?
Michal Pešta
Charles University in Prague
Faculty of Mathematics and Physics
Actuarial Seminar, Prague
19 October 2012
Overview
- Idea and goal of bootstrapping
- Bootstrap in reserving triangles
- Residuals
- Diagnostics
- Real data example
• Support:
Czech Science Foundation project “DYME Dynamic
Models in Economics” No. P402/12/G097
Prologue for
Bootstrap
- computationally intensive method popularized in
1980s due to the introduction of computers in
statistical practice
bootstrap
does not replace or add to the original data
- a strong mathematical background
- unfortunately, the name “bootstrap” conveys the
impression of “something for nothing”
resampling from their samples
idly
Reserving Issue
- consider traditional actuarial approach to reserving
-
?
E
!
risk . . . the uncertainty in the outcomes over the
lifetime of the liabilities
bootstrap can be also applied under Solvency II
. . . outstanding liabilities after 1 year
distribution-free methods (e.g., chain ladder) only
provide a standard deviation of the
ultimates/reserves (or claims development
result/run-off result)
another risk measure (e.g., V aR @ 99.5%)
moreover, distributions of ultimate cost of claims
and the associated cash flows (not just a standard
deviation)?
claims reserving technique
applied mechanically and without judgement
Bootstrap
- simple (distribution-independent) resampling
method
- estimate properties (distribution) of an estimator
by sampling from an approximating (e.g., empirical)
distribution
- useful when the theoretical distribution of
a statistic of interest is complicated or unknown
Resampling with
replacement
- random sampling with replacement from the
original dataset
for b = 1, . . . , B resample from
∗ , . . . , X∗
X1 , . . . , Xn with replacement and obtain X1,b
n,b
Bootstrap example
- input data (# of catastrophic claims per year in 10y
history): 35, 34, 13, 33, 27, 30, 19, 31, 10, 33
mean = 26.5, sd = 9.168182
Case sampling
I
..
.
I
bootstrap sample 1 (1st draw with replacement):
30, 27, 35, 35, 13, 35, 33, 34, 35, 33
mean∗1 = 31.0, sd∗1 = 6.847546
bootstrap sample 1000 (1000th draw with
replacement): 19, 19, 31, 19, 33, 34, 31, 34, 34, 10
mean∗1000 = 26.4, sd∗1000 = 8.771165
- mean∗1 , . . . , mean∗1000 provide bootstrap empirical
distribution for mean and sd∗1 , . . . , sd∗1000 provide
bootstrap empirical distribution for sd (REALLY!?)
Terminology
- Xi,j . . . claim amounts in development year j with
accident year i
- Xi,j stands for the incremental claims in accident
year i made in accounting year i + j
- n . . . current year – corresponds to the most
recent accident year and development period
- Our data history consists of
right-angled isosceles triangles Xi,j , where
i = 1, . . . , n and j = 1, . . . , n + 1 − i
Run-off (incremental)
triangle
Accident
year i
1
2
1
X1,1
X2,1
..
.
..
.
n−1
n
Xn−1,1
Xn,1
Development year j
2
···
n−1
X1,2
···
X1,n−1
X2,2
···
X2,n−1
..
.
..
.
Xi,n+1−i
Xn−1,2
n
X1,n
Notation
- Ci,j . . . cumulative payments in origin year i after j
development periods
Ci,j =
j
X
Xi,k
k=1
- Ci,j . . . a random variable of which we have an
observation if i + j ≤ n + 1
- Aim is to estimate the ultimate claims amount Ci,n
and the outstanding claims reserve
Ri = Ci,n − Ci,n+1−i ,
i = 2, . . . , n
by completing the triangle into a square
Run-off (cumulative)
triangle
Accident
year i
1
2
1
C1,1
C2,1
..
.
..
.
n−1
n
Cn−1,1
Cn,1
Development year j
2
···
n−1
C1,2
···
C1,n−1
C2,2
···
C2,n−1
..
.
..
.
Ci,n+1−i
Cn−1,2
n
C1,n
Theory behind the
bootstrap
- validity of bootstrap procedure
- asymptotically distributionally coincide
b∗ − R
bi {Xi,j : i + j ≤ n + 1}
R
i
bi − Ri
and R
- approaching (each other) in distribution
(n)
in probability along Dn = {Xi,j : i + j ≤ n + 1}
D b
∗
b
b
→R
n→∞
Ri − Ri Dn(n) ←
i − Ri in probability,
I
∀ real-valued bounded continuous function f
h i
h i
P
bi∗ − R
bi Dn(n) − E f R
bi − Ri −−−
E f R
−→ 0
n→∞
Residuals
- measure the discrepancy of fit in a model
- can be used to explore the adequacy of fit of
a model
- may also indicate the presence of anomalous values
requiring further investigation
- in regression type problems, it is common to
bootstrap the residuals, rather than bootstrap the
data themselves
- should mimic iid r.v. having zero mean, common
variance, symmetric distribution
Raw residuals
(R) ri,j
bi,j
= Xi,j − X
or
(R) ri,j
bi,j
= Ci,j − C
- in homoscedastic linear regression, CL with α = 0,
GLM with normal distribution and identity link
Pearson residuals
(P ) ri,j
bi,j
Xi,j − X
= q
bi,j )
V (X
- in GLM or GEE
- idea is to standardize raw residuals
- ODP:
(P ) ri,j
=
bi,j
Xi,j − X
q
bi,j
X
(P ) ri,j
=
bi,j
Xi,j − X
bi,j
X
- Gamma GLM:
Anscombe residuals I
(A) ri,j
=
bi,j )
A(Xi,j ) − A(X
q
bi,j ) V (X
bi,j )
A0 ( X
- in GLM or GEE
- disadvantage of Pearson residuals: often markedly
skewed
- idea is to obtain residuals, which are not skewed
(to "normalize" residuals)
Anscombe residuals II
Z
dµ
A(·) =
V
- ODP:
(A) ri,j =
1/3 (µ)
2/3
b 2/3
3 Xi,j − X
i,j
1/6
2
b
X
i,j
- Gamma GLM:
(A) ri,j = 3
1/3
b 1/3
Xi,j − X
i,j
b 1/3
X
i,j
- inverse Gaussian:
(A) ri,j
=
bi,j
log Xi,j − log X
1/2
b
X
i,j
Deviance residuals
(D) ri,j
bi,j )
= sign(Xi,j − X
p
di ,
where di is the deviance for one unit, i.e.,
P
di = D
- in GLM or GEE
- similar advantageous properties like Anscombe
residuals (see Taylor expansion)
- ODP:
(D) ri,j
r b
bi,j ) − Xi,j + X
bi,j
= sign(Xi,j −Xi,j ) 2 Xi,j log(Xi,j /X
Scaled residuals
- scaling parameter φ
(sc)
ri,j
ri,j = q
φb
- to obtain the bootstrap prediction error, it is
necessary to add an estimate of the process
variance . . . bias correction in the bootstrap
estimation variance
- Pearson scaled residuals in ODP for the
bootstrap prediction error:
(sc) 0
(P ) ri,j
2
i,j≤n+1 (P ) ri,j
P
= (P ) ri,j ×
n(n + 1)/2 − (2n + 1)
!−1/2
Adjusted residuals
- alternative to scaled residuals . . . bias correction in
the bootstrap estimation variance
r
n−p
(adj)
ri,j = ri,j ×
n
Diagnostics for
residuals
- residuals and squared residuals
- type of plots:
I accident years
I accounting years
I development years
- no pattern visible
- iid with zero mean, symmetrically distributed,
common variance
Bootstrap procedure I
- Ex: bootstrap in ODP
(1) fit a model
estimates α
bi , βbj , γ
b, φb
(2) fitted (expected) values for triangle
bi,j = exp{b
X
γ+α
bi + βbj }
(3) scaled Pearson residuals
(sc)
(P ) ri,j
=
bi,j
Xi,j − X
q
bi,j
φbX
Bootstrap procedure II
n
o
(sc)
(4) resample residuals (P ) ri,j B-times with
replacement
Bo triangles of bootstrapped
n
(sc) ∗
residuals (P,b) ri,j , 1 ≤ b ≤ B
(5) construct B bootstrap triangles
q
(sc) ∗
∗
bb
b
(b) Xi,j = (P,b) ri,j φXi,j + Xi,j
(6) perform ODP on each bootstrap triangle
bootstrapped estimates (b) α
bi , (b) βbj , (b) γ
b, (b) φb
Bootstrap procedure III
(7) calculate bootstrap reserves
b∗
(b) Ri =
n
X
j=n+2−i
b∗
b + (b) α
bi }
(b) Xi,j = exp{(b) γ
n
X
exp{(b) βbj }
j=n+2−i
I (4)–(7) is a bootstrap loop (repeated B-times)
(8) empirical distribution of size B for the reserves
empirical (estimated) mean, s.e., quantiles, . . .
Taylor and Ashe (1983)
data
- incremental triangle
357 848
352 118
290 507
310 608
443 160
396 132
440 832
359 480
376 686
344 014
766 940
884 021
1 001 799
1 108 250
693 190
937 085
847 631
1 061 648
986 608
227 229
425 046
67 948
610 542
933 894
926 219
776 189
991 983
847 498
1 131 398
1 443 370
482 940
1 183 289
1 016 654
1 562 400
769 488
805 037
1 063 269
527 326
445 745
750 816
272 482
504 851
705 960
- R software, ChainLadder package
574 398
320 996
146 923
352 053
470 639
146 342
527 804
495 992
206 286
139 950
266 172
280 405
Development of claims
3e+06
8
7
3
4
6
2
5
1
4
7
2
3
6
5
4
3
2
6
5
4
3
2
5
1
2
3
4
1
1
1
1
1
1
8
4
9
6
3
7
2
5
1
1e+06
Claims
5e+06
2
2
3
5
7
6
9
8
1
2
0
4
3
2
4
6
Development period
8
10
CL with Mack’s s.e.
Chain ladder developments by origin period
Chain ladder dev.
2
8e+06
4
6
8
Mack's S.E.
10
2
4
6
8
10
1
2
3
4
5
6
7
8
9
10
6e+06
4e+06
Amount
2e+06
0e+00
8e+06
6e+06
4e+06
2e+06
0e+00
2
4
6
8
10
2
4
6
8
10
Development period
2
4
6
8
10
Chain ladder diagnostics
6e+06
1e+06
0e+00
4
5
6
7
8
1
9 10
4
6
8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
6
8
2
1
3e+06
4e+06
5e+06
●
●
●
●
●
●
●
●
●
●
4
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Origin period
●
●
●
●
●
●
●
●
4
2e+06
1
1
●
●
●
●
2
●
●
0
●
●
●
●
●
●
0
1
0
●
Standardised residuals
●
●
●
● ●
●
●
Fitted
●
−1
●
●
●●
●
●
●
●
●●
●
●
1e+06
●
●
●
●
●
●
●
●
●
●
●
10
●
●
●
●
●
●
Development period
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
2
●
●
●
●
●
5
7
6
9
8
1
2
0
4
3
2
2
1
Standardised residuals
3
1
1
1
1
−1
2
8
4
9
6
3
7
2
5
1
4
7
2
3
6
5
8
7
3
4
6
2
5
1
1
0
Amount
4e+06
2e+06
●
2
2
3
2
3
4
4
3
2
5
4
3
2
6
5
Standardised residuals
●
−1
●
●
2
●
5e+06
●
●
3e+06
●
Origin period
Standardised residuals
●
●
●
1
−1
●
●
●
●
Value
Chain ladder developments by origin period
2
Forecast
Latest
7e+06
Mack Chain Ladder Results
6
Calendar period
●
8
1
2
3
4
5
6
Development period
7
8
Bootstrap results
ecdf(Total.IBNR)
0.8
Fn(x)
0.0
2.0e+07
3.0e+07
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
2
3
4
5
6
origin period
7
8
9
10
●
●
Latest actual
●
●
1000000
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
●
●
●
●
●
●
●
●
●
2000000
Latest actual incremental claims
against simulated values
Mean ultimate claim
1
3.0e+07
Simulated ultimate claims cost
●
●
2.0e+07
Total IBNR
●
5.0e+06
1.0e+07
Total IBNR
latest incremental claims
1.5e+07
1.0e+07
ultimate claims costs
0.4
150
0 50
Frequency
250
Histogram of Total.IBNR
●
●
●
1
2
3
4
5
6
origin period
7
8
9
10
Mack CL vs bootstrap
GLM (ODP)
Accident
year
1
2
3
4
5
6
7
8
9
10
Total
Ultimate
3 901 463
5 433 719
5 378 826
5 297 906
4 858 200
5 111 171
5 660 771
6 784 799
5 642 266
4 969 825
53 038 946
Chain Ladder
IBNR
0
94 634
469 511
709 638
984 889
1 419 459
2 177 641
3 920 301
4 278 972
4 625 811
18 680 856
S.E.
0
75 535
121 699
133 549
261 406
411 010
558 317
875 328
971 258
1 363 155
2 447 095
Ultimate
3 901 463
5 434 680
5 396 815
5 315 089
4 875 837
5 113 745
5 686 423
6 790 462
5 675 167
5 148 456
53 338 139
Bootstrap
IBNR
0
95 595
487 500
726 821
1 002 526
1 422 033
2 203 293
3 925 964
4 311 873
4 804 442
18 980 049
S.E.
0
106 313
222 001
265 696
313 015
377 703
487 891
789 329
1 034 465
2 091 629
3 096 767
Comparison of
distributional properties
- why to bootstrap ?
- moment characteristics (mean, s.e., . . . ) does not
provide full information about the reserves’
distribution
- additional assumption required in the classical
approach
- 99.5% quantile necessary for VaR
I
I
I
assuming normally distributed reserves
. . . 24 984 154
assuming log-normally distributed reserves
. . . 25 919 050
bootstrap . . . 28 201 572
Conclusions
- mean and variance do not contain full information
about the distribution
quantities like VaR
cannot provide
- assumption of log-normally distributed claims <
log-normally distributed reserves (far more
restrictive)
- various types of residuals
- bootstrap (simulated) distribution
mimics the unknown distribution of reserves
(a mathematical proof necessary)
- R software provides a free sufficient actuarial
environment for reserving
References
England, P. D. and Verrall, R. J. (1999)
Analytic and bootstrap estimates of prediction
errors in claims reserving.
Insurance: Mathematics and Economics, 25.
England, P. D. and Verrall, R. J. (2006)
Predictive distributions of outstanding claims in
general insurance.
Annals of Actuarial Science, 1 (2).
Pesta, M. (2012)
Total least squares and bootstrapping with
application in calibration.
Statistics: A Journal of Theoretical and Applied
Statistics.
Thank you !
[email protected]