Bootstrapping the triangles
Transcription
Bootstrapping the triangles
Bootstrapping the triangles Prečo, kedy a ako (NE)bootstrapovat’ v trojuholníkoch? Michal Pešta Charles University in Prague Faculty of Mathematics and Physics Actuarial Seminar, Prague 19 October 2012 Overview - Idea and goal of bootstrapping - Bootstrap in reserving triangles - Residuals - Diagnostics - Real data example • Support: Czech Science Foundation project “DYME Dynamic Models in Economics” No. P402/12/G097 Prologue for Bootstrap - computationally intensive method popularized in 1980s due to the introduction of computers in statistical practice bootstrap does not replace or add to the original data - a strong mathematical background - unfortunately, the name “bootstrap” conveys the impression of “something for nothing” resampling from their samples idly Reserving Issue - consider traditional actuarial approach to reserving - ? E ! risk . . . the uncertainty in the outcomes over the lifetime of the liabilities bootstrap can be also applied under Solvency II . . . outstanding liabilities after 1 year distribution-free methods (e.g., chain ladder) only provide a standard deviation of the ultimates/reserves (or claims development result/run-off result) another risk measure (e.g., V aR @ 99.5%) moreover, distributions of ultimate cost of claims and the associated cash flows (not just a standard deviation)? claims reserving technique applied mechanically and without judgement Bootstrap - simple (distribution-independent) resampling method - estimate properties (distribution) of an estimator by sampling from an approximating (e.g., empirical) distribution - useful when the theoretical distribution of a statistic of interest is complicated or unknown Resampling with replacement - random sampling with replacement from the original dataset for b = 1, . . . , B resample from ∗ , . . . , X∗ X1 , . . . , Xn with replacement and obtain X1,b n,b Bootstrap example - input data (# of catastrophic claims per year in 10y history): 35, 34, 13, 33, 27, 30, 19, 31, 10, 33 mean = 26.5, sd = 9.168182 Case sampling I .. . I bootstrap sample 1 (1st draw with replacement): 30, 27, 35, 35, 13, 35, 33, 34, 35, 33 mean∗1 = 31.0, sd∗1 = 6.847546 bootstrap sample 1000 (1000th draw with replacement): 19, 19, 31, 19, 33, 34, 31, 34, 34, 10 mean∗1000 = 26.4, sd∗1000 = 8.771165 - mean∗1 , . . . , mean∗1000 provide bootstrap empirical distribution for mean and sd∗1 , . . . , sd∗1000 provide bootstrap empirical distribution for sd (REALLY!?) Terminology - Xi,j . . . claim amounts in development year j with accident year i - Xi,j stands for the incremental claims in accident year i made in accounting year i + j - n . . . current year – corresponds to the most recent accident year and development period - Our data history consists of right-angled isosceles triangles Xi,j , where i = 1, . . . , n and j = 1, . . . , n + 1 − i Run-off (incremental) triangle Accident year i 1 2 1 X1,1 X2,1 .. . .. . n−1 n Xn−1,1 Xn,1 Development year j 2 ··· n−1 X1,2 ··· X1,n−1 X2,2 ··· X2,n−1 .. . .. . Xi,n+1−i Xn−1,2 n X1,n Notation - Ci,j . . . cumulative payments in origin year i after j development periods Ci,j = j X Xi,k k=1 - Ci,j . . . a random variable of which we have an observation if i + j ≤ n + 1 - Aim is to estimate the ultimate claims amount Ci,n and the outstanding claims reserve Ri = Ci,n − Ci,n+1−i , i = 2, . . . , n by completing the triangle into a square Run-off (cumulative) triangle Accident year i 1 2 1 C1,1 C2,1 .. . .. . n−1 n Cn−1,1 Cn,1 Development year j 2 ··· n−1 C1,2 ··· C1,n−1 C2,2 ··· C2,n−1 .. . .. . Ci,n+1−i Cn−1,2 n C1,n Theory behind the bootstrap - validity of bootstrap procedure - asymptotically distributionally coincide b∗ − R bi {Xi,j : i + j ≤ n + 1} R i bi − Ri and R - approaching (each other) in distribution (n) in probability along Dn = {Xi,j : i + j ≤ n + 1} D b ∗ b b →R n→∞ Ri − Ri Dn(n) ← i − Ri in probability, I ∀ real-valued bounded continuous function f h i h i P bi∗ − R bi Dn(n) − E f R bi − Ri −−− E f R −→ 0 n→∞ Residuals - measure the discrepancy of fit in a model - can be used to explore the adequacy of fit of a model - may also indicate the presence of anomalous values requiring further investigation - in regression type problems, it is common to bootstrap the residuals, rather than bootstrap the data themselves - should mimic iid r.v. having zero mean, common variance, symmetric distribution Raw residuals (R) ri,j bi,j = Xi,j − X or (R) ri,j bi,j = Ci,j − C - in homoscedastic linear regression, CL with α = 0, GLM with normal distribution and identity link Pearson residuals (P ) ri,j bi,j Xi,j − X = q bi,j ) V (X - in GLM or GEE - idea is to standardize raw residuals - ODP: (P ) ri,j = bi,j Xi,j − X q bi,j X (P ) ri,j = bi,j Xi,j − X bi,j X - Gamma GLM: Anscombe residuals I (A) ri,j = bi,j ) A(Xi,j ) − A(X q bi,j ) V (X bi,j ) A0 ( X - in GLM or GEE - disadvantage of Pearson residuals: often markedly skewed - idea is to obtain residuals, which are not skewed (to "normalize" residuals) Anscombe residuals II Z dµ A(·) = V - ODP: (A) ri,j = 1/3 (µ) 2/3 b 2/3 3 Xi,j − X i,j 1/6 2 b X i,j - Gamma GLM: (A) ri,j = 3 1/3 b 1/3 Xi,j − X i,j b 1/3 X i,j - inverse Gaussian: (A) ri,j = bi,j log Xi,j − log X 1/2 b X i,j Deviance residuals (D) ri,j bi,j ) = sign(Xi,j − X p di , where di is the deviance for one unit, i.e., P di = D - in GLM or GEE - similar advantageous properties like Anscombe residuals (see Taylor expansion) - ODP: (D) ri,j r b bi,j ) − Xi,j + X bi,j = sign(Xi,j −Xi,j ) 2 Xi,j log(Xi,j /X Scaled residuals - scaling parameter φ (sc) ri,j ri,j = q φb - to obtain the bootstrap prediction error, it is necessary to add an estimate of the process variance . . . bias correction in the bootstrap estimation variance - Pearson scaled residuals in ODP for the bootstrap prediction error: (sc) 0 (P ) ri,j 2 i,j≤n+1 (P ) ri,j P = (P ) ri,j × n(n + 1)/2 − (2n + 1) !−1/2 Adjusted residuals - alternative to scaled residuals . . . bias correction in the bootstrap estimation variance r n−p (adj) ri,j = ri,j × n Diagnostics for residuals - residuals and squared residuals - type of plots: I accident years I accounting years I development years - no pattern visible - iid with zero mean, symmetrically distributed, common variance Bootstrap procedure I - Ex: bootstrap in ODP (1) fit a model estimates α bi , βbj , γ b, φb (2) fitted (expected) values for triangle bi,j = exp{b X γ+α bi + βbj } (3) scaled Pearson residuals (sc) (P ) ri,j = bi,j Xi,j − X q bi,j φbX Bootstrap procedure II n o (sc) (4) resample residuals (P ) ri,j B-times with replacement Bo triangles of bootstrapped n (sc) ∗ residuals (P,b) ri,j , 1 ≤ b ≤ B (5) construct B bootstrap triangles q (sc) ∗ ∗ bb b (b) Xi,j = (P,b) ri,j φXi,j + Xi,j (6) perform ODP on each bootstrap triangle bootstrapped estimates (b) α bi , (b) βbj , (b) γ b, (b) φb Bootstrap procedure III (7) calculate bootstrap reserves b∗ (b) Ri = n X j=n+2−i b∗ b + (b) α bi } (b) Xi,j = exp{(b) γ n X exp{(b) βbj } j=n+2−i I (4)–(7) is a bootstrap loop (repeated B-times) (8) empirical distribution of size B for the reserves empirical (estimated) mean, s.e., quantiles, . . . Taylor and Ashe (1983) data - incremental triangle 357 848 352 118 290 507 310 608 443 160 396 132 440 832 359 480 376 686 344 014 766 940 884 021 1 001 799 1 108 250 693 190 937 085 847 631 1 061 648 986 608 227 229 425 046 67 948 610 542 933 894 926 219 776 189 991 983 847 498 1 131 398 1 443 370 482 940 1 183 289 1 016 654 1 562 400 769 488 805 037 1 063 269 527 326 445 745 750 816 272 482 504 851 705 960 - R software, ChainLadder package 574 398 320 996 146 923 352 053 470 639 146 342 527 804 495 992 206 286 139 950 266 172 280 405 Development of claims 3e+06 8 7 3 4 6 2 5 1 4 7 2 3 6 5 4 3 2 6 5 4 3 2 5 1 2 3 4 1 1 1 1 1 1 8 4 9 6 3 7 2 5 1 1e+06 Claims 5e+06 2 2 3 5 7 6 9 8 1 2 0 4 3 2 4 6 Development period 8 10 CL with Mack’s s.e. Chain ladder developments by origin period Chain ladder dev. 2 8e+06 4 6 8 Mack's S.E. 10 2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 6e+06 4e+06 Amount 2e+06 0e+00 8e+06 6e+06 4e+06 2e+06 0e+00 2 4 6 8 10 2 4 6 8 10 Development period 2 4 6 8 10 Chain ladder diagnostics 6e+06 1e+06 0e+00 4 5 6 7 8 1 9 10 4 6 8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 6 8 2 1 3e+06 4e+06 5e+06 ● ● ● ● ● ● ● ● ● ● 4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Origin period ● ● ● ● ● ● ● ● 4 2e+06 1 1 ● ● ● ● 2 ● ● 0 ● ● ● ● ● ● 0 1 0 ● Standardised residuals ● ● ● ● ● ● ● Fitted ● −1 ● ● ●● ● ● ● ● ●● ● ● 1e+06 ● ● ● ● ● ● ● ● ● ● ● 10 ● ● ● ● ● ● Development period ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 ● ● ● ● ● 5 7 6 9 8 1 2 0 4 3 2 2 1 Standardised residuals 3 1 1 1 1 −1 2 8 4 9 6 3 7 2 5 1 4 7 2 3 6 5 8 7 3 4 6 2 5 1 1 0 Amount 4e+06 2e+06 ● 2 2 3 2 3 4 4 3 2 5 4 3 2 6 5 Standardised residuals ● −1 ● ● 2 ● 5e+06 ● ● 3e+06 ● Origin period Standardised residuals ● ● ● 1 −1 ● ● ● ● Value Chain ladder developments by origin period 2 Forecast Latest 7e+06 Mack Chain Ladder Results 6 Calendar period ● 8 1 2 3 4 5 6 Development period 7 8 Bootstrap results ecdf(Total.IBNR) 0.8 Fn(x) 0.0 2.0e+07 3.0e+07 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 3 4 5 6 origin period 7 8 9 10 ● ● Latest actual ● ● 1000000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● 2000000 Latest actual incremental claims against simulated values Mean ultimate claim 1 3.0e+07 Simulated ultimate claims cost ● ● 2.0e+07 Total IBNR ● 5.0e+06 1.0e+07 Total IBNR latest incremental claims 1.5e+07 1.0e+07 ultimate claims costs 0.4 150 0 50 Frequency 250 Histogram of Total.IBNR ● ● ● 1 2 3 4 5 6 origin period 7 8 9 10 Mack CL vs bootstrap GLM (ODP) Accident year 1 2 3 4 5 6 7 8 9 10 Total Ultimate 3 901 463 5 433 719 5 378 826 5 297 906 4 858 200 5 111 171 5 660 771 6 784 799 5 642 266 4 969 825 53 038 946 Chain Ladder IBNR 0 94 634 469 511 709 638 984 889 1 419 459 2 177 641 3 920 301 4 278 972 4 625 811 18 680 856 S.E. 0 75 535 121 699 133 549 261 406 411 010 558 317 875 328 971 258 1 363 155 2 447 095 Ultimate 3 901 463 5 434 680 5 396 815 5 315 089 4 875 837 5 113 745 5 686 423 6 790 462 5 675 167 5 148 456 53 338 139 Bootstrap IBNR 0 95 595 487 500 726 821 1 002 526 1 422 033 2 203 293 3 925 964 4 311 873 4 804 442 18 980 049 S.E. 0 106 313 222 001 265 696 313 015 377 703 487 891 789 329 1 034 465 2 091 629 3 096 767 Comparison of distributional properties - why to bootstrap ? - moment characteristics (mean, s.e., . . . ) does not provide full information about the reserves’ distribution - additional assumption required in the classical approach - 99.5% quantile necessary for VaR I I I assuming normally distributed reserves . . . 24 984 154 assuming log-normally distributed reserves . . . 25 919 050 bootstrap . . . 28 201 572 Conclusions - mean and variance do not contain full information about the distribution quantities like VaR cannot provide - assumption of log-normally distributed claims < log-normally distributed reserves (far more restrictive) - various types of residuals - bootstrap (simulated) distribution mimics the unknown distribution of reserves (a mathematical proof necessary) - R software provides a free sufficient actuarial environment for reserving References England, P. D. and Verrall, R. J. (1999) Analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics, 25. England, P. D. and Verrall, R. J. (2006) Predictive distributions of outstanding claims in general insurance. Annals of Actuarial Science, 1 (2). Pesta, M. (2012) Total least squares and bootstrapping with application in calibration. Statistics: A Journal of Theoretical and Applied Statistics. Thank you ! [email protected]