Adaptive numerical designs for the calibration of

Transcription

Adaptive numerical designs for the calibration of
Adaptive numerical designs for the calibration of
computer codes
Guillaume Damblin1,2
1 EDF
2 AgroParisTech/INRA
R&D, 6 quai Watier 78401 Chatou
UMR MIA 518, 16 rue Claude Bernard 75005 Paris
MASCOT NUM 2015, April 8 2015
1 / 27
Outline
Calibration of costly computer codes
Adaptive designs based on the EI criterion
2 / 27
Calibration of costly computer codes
Outline
Calibration of costly computer codes
Adaptive designs based on the EI criterion
3 / 27
Calibration of costly computer codes
Notations
Let r (x) ∈ R be a physical quantity of interest:
I
x ∈ X ⊂ Rd is a vector of control variables,
I
z(x) = r (x) + (x) is the physical measurement.
Let yt (x) ∈ R be a computer code:
I
x is aligned on the r input,
I
t ∈ T ∈ Rp is a vector of code parameters (may have no
counterpart in r ).
What is the value of t making the best agreement between r (x)
and yt (x) ?
4 / 27
Calibration of costly computer codes
Illustration
0
yt(x)
5
10
15
The function yt (x) = (6x − 2)2 × sin (tx − 4) on [0, 1] for several values
of t ∈ [5, 15]. Red dots are the physical measurements z(xi ).
●
−5
●
−10
●
●
0.0
t=11
t=6
t=8
t=13
field measures
0.2
0.4
0.6
0.8
1.0
x
5 / 27
Calibration of costly computer codes
The statistical modelling
I
n physical experiments:
I
I
x = {x1 , · · · , xn },
z = {z(x1 ), · · · , z(xn )}.
I
∃θ ∈ T r (xi ) = yθ (xi ) (negligible model error),
I
Recall z(xi ) = r (xi ) + (xi ),
I
Hence, z(xi ) = yθ (xi ) + where ∼ N (0, λ2 ).
i.i.d
Statistical calibration consists in estimating θ in this
regression model!
6 / 27
Calibration of costly computer codes
Bayesian inference of θ
Bayesian inference : Π(θ|z) ∝ L(z|θ)Π(θ)
I
I
Π(θ) is the prior distribution,
1
exp − 2λ1 2 SS(θ) ,
L(z|θ) = √2πλ
where SS(θ) = ||z − yθ (x)||2 .
7 / 27
Calibration of costly computer codes
Bayesian inference of θ
The code yθ (x) is non-linear:
=⇒ no closed form for Π(θ|z),
=⇒ need for MCMC methods,
=⇒ need for hundreds of simulations yθi (xi ).
Issue : the code is costly =⇒ M << ∞ simulations are allocated!
A possible solution : replacing the code by a Gaussian process
emulator!
8 / 27
Calibration of costly computer codes
The Gaussian process emulator (GPE)
Prior hypothesis:
ytj (xj ) = y (xj , tj ) ∼ Y = PG(mβ (.), ΣΨ (.)).
Design of numerical experiments:
DM := {(x1 , t1 ), · · · , (xM , tM )} ⊂ X × T
=⇒
y(DM ) := {y (x1 , t1 ), · · · , y (xM , tM )}
GPE emulator:
M
Y M := Y |y(DM ) ∼ PG(µM
β (.), VΨ ),
which gives a stochastic prediction of yt (x) over X × T .
9 / 27
Calibration of costly computer codes
The approximated likelihood based on a GPE
ˆ Ψ)
ˆ
It is given by the conditional likelihood LC (z|y(DM ), θ, β,
ˆ Ψ)
ˆ ∝ |V ˆ +λ2 In |−1/2 exp −
LC (z|θ, y (DM ), β,
Ψ
1h
T
z − µM
ˆ (x, θ) )
β
2
i
2
−1
M
(VΨ
ˆ + λ In ) (z − µβ
ˆ (x, θ)) .
ˆ = argmax LM (y(DN )|β, Ψ)
ˆ Ψ)
where (β,
(β,Ψ)
10 / 27
Calibration of costly computer codes
Approximated Bayesian calibration of θ
I
ΠC (θ|z, DM ) ∝ LC (z|θ, DM )Π(θ),
11 / 27
Calibration of costly computer codes
Approximated Bayesian calibration of θ
I
ΠC (θ|z, DM ) ∝ LC (z|θ, DM )Π(θ),
I
ΠC (θ|z, DM ) is cheap to evaluate =⇒ MCMC methods OK!
11 / 27
Calibration of costly computer codes
Approximated Bayesian calibration of θ
I
ΠC (θ|z, DM ) ∝ LC (z|θ, DM )Π(θ),
I
ΠC (θ|z, DM ) is cheap to evaluate =⇒ MCMC methods OK!
I
The larger DM , the closer LC (θ|z, DM ) to L(θ|z)
11 / 27
Calibration of costly computer codes
Approximated Bayesian calibration of θ
I
ΠC (θ|z, DM ) ∝ LC (z|θ, DM )Π(θ),
I
ΠC (θ|z, DM ) is cheap to evaluate =⇒ MCMC methods OK!
I
The larger DM , the closer LC (θ|z, DM ) to L(θ|z)
I
KL(ΠC (θ|z, DM )||Π(θ|z)) −→ 0
M→∞
When M is small, KL(ΠC (θ|z, DM )||Π(θ|z)) may be high !
11 / 27
Calibration of costly computer codes
Artificial example
0.0
0.5
1.0
1.5
2.0
Left: The target posterior distribution Π(θ|z)
6
8
10
12
14
θ
12 / 27
Calibration of costly computer codes
Toy example
Two maximin Latin Hypercube Design DM
●
●
●
14
14
●
●
●
●
●
●
●
●
●
●
●
●
12
●
●
12
●
●
●
●
●
●
●
●
●
θ
●
●
10
●
●
●
●
●
●
●
●
●
●
8
8
●
●
●
●
●
●
●
●
0.0
0.2
6
●
●
●
●
●
●
6
θ
10
●
●
●
●
●
0.4
●
0.6
x
●
0.8
●
1.0
0.0
0.2
0.4
0.6
0.8
1.0
x
13 / 27
Calibration of costly computer codes
Toy example
0
0
50
50
100
100
150
150
200
200
The corresponding ΠC (θ|z, DM ) according to DM
9
10
11
12
θ
13
14
15
6
8
10
12
14
θ
14 / 27
Adaptive designs based on the EI criterion
Outline
Calibration of costly computer codes
Adaptive designs based on the EI criterion
15 / 27
Adaptive designs based on the EI criterion
Adaptive calibration
I
Question: How choosing DM to reduce
KL(ΠC (θ|z, DM )||Π(θ|z)) ?
16 / 27
Adaptive designs based on the EI criterion
Adaptive calibration
I
Question: How choosing DM to reduce
KL(ΠC (θ|z, DM )||Π(θ|z)) ?
I
An idea: reduce the difference |LC (θ|z, DM ) − L(θ|z)| where
L(θ|z) is high,
16 / 27
Adaptive designs based on the EI criterion
Adaptive calibration
I
Question: How choosing DM to reduce
KL(ΠC (θ|z, DM )||Π(θ|z)) ?
I
An idea: reduce the difference |LC (θ|z, DM ) − L(θ|z)| where
L(θ|z) is high,
I
An equivalent idea : reduce the uncertainty of the GPE at
locations {(xi , θ)} where SS(θ) is low.
16 / 27
Adaptive designs based on the EI criterion
Adaptive calibration
I
Question: How choosing DM to reduce
KL(ΠC (θ|z, DM )||Π(θ|z)) ?
I
An idea: reduce the difference |LC (θ|z, DM ) − L(θ|z)| where
L(θ|z) is high,
I
An equivalent idea : reduce the uncertainty of the GPE at
locations {(xi , θ)} where SS(θ) is low.
I
A solution: DM is sequentially built thanks to the EI criterion
applied to SS(θ).
EI
D1 −→ · · · −→ Dk −→ Dk+1 −→ · · · −→ DM
16 / 27
Adaptive designs based on the EI criterion
The step k
I
Y k := Y |y(Dk ) constructed from Dk ,
I
mk := min {SS(θ 1 ), · · · , SS(θ k−1 ), SS(θ k )},
I
Dk = {(xi , θj )}1≤i≤n,1≤j≤k is a grid.
How to choose the next input locations {(xi , θ k+1 )}1≤i≤n where
the code is run ?
17 / 27
Adaptive designs based on the EI criterion
The EI criterion: from Dk to Dk+1
EI k (θ) = E
h
i
mk − SSk (θ) 1SSk (θ)≤mk |Y k ∈ [0, mk ],
Then,
I
θ k+1 = argmax EI k (θ),
I
Dk+1 = Dk ∪ {(xi , θ k+1 )}1≤i≤n .
θ
To construct DM , repeat the EI criterion for 1 ≤ k ≤ M !
18 / 27
Adaptive designs based on the EI criterion
14
●
●
12
●
●
●
10
●
●
●
8
θ
●
●
●
●
6
Design Dk
●
●
●
0.0
0.2
0.4
0.6
0.8
1.0
x
19 / 27
Adaptive designs based on the EI criterion
1.4
Optimization of the EI criterion
0.0
0.2
0.4
0.6
EI
0.8
1.0
1.2
●
●
●
●
●
●
6
8
10
12
14
θ
20 / 27
Adaptive designs based on the EI criterion
14
●
●
12
●
●
●
10
●
●
●
●
●
●
8
θ
●
●
●
●
6
Design Dk+1
●
●
●
0.0
0.2
0.4
0.6
0.8
1.0
x
21 / 27
Adaptive designs based on the EI criterion
0
0.0
200
0.5
400
1.0
600
1.5
800
2.0
1000
Approximated calibration using DM
6
8
10
θ
12
14
9.5
10.0
10.5
11.0
11.5
12.0
θ
=⇒ low KL value !
22 / 27
Adaptive designs based on the EI criterion
Comments
I
no closed-form for EI k (θ),
23 / 27
Adaptive designs based on the EI criterion
Comments
I
no closed-form for EI k (θ),
I
Dk is a grid design,
23 / 27
Adaptive designs based on the EI criterion
Comments
I
no closed-form for EI k (θ),
I
Dk is a grid design,
I
unsuitable when n is large,
23 / 27
Adaptive designs based on the EI criterion
Comments
I
no closed-form for EI k (θ),
I
Dk is a grid design,
I
unsuitable when n is large,
I
need of one at a time strategies:
I
maximize the EI criterion =⇒ θ k+1 ,
I
pick up a single pair (x? , θ k+1 ) where x? ∈ {x1 , · · · , xn }.
23 / 27
Adaptive designs based on the EI criterion
Two criteria for one at a time strategies
I
First criterion to reduce the uncertainty of the GPE:
x? = max V(Y k (xi , θ k+1 ))
xi
I
Second criterion to compromise with the calibration goal:

x? = max 
xi

V(µkβ (xi , T ))
V Y k (xi , θ k+1 )

×
max V Y k (xi , θ k+1 )
max V(µkβ (xi , T ))
i=1,··· ,n
i=1,··· ,n
24 / 27
Adaptive designs based on the EI criterion
Design comparison
●
●
14
14
Black dots are the initial design. Red stars are the new experiments selected
from the EI criterion.
●
●
●
●
12
●
12
●
θ
10
●
●
●
●
●
●
●
●
●
6
●
●
8
8
●
6
θ
●
10
●
●
0.2
0.4
0.6
x
0.8
●
0.2
0.4
0.6
0.8
x
25 / 27
Adaptive designs based on the EI criterion
2
1
●
●
●
●
0
KL divergence
3
Robustness in terms of the KL divergence
maximin LHD
version 1
●
●
version 2
version 3
26 / 27
Adaptive designs based on the EI criterion
Main references
G. Damblin, P. Barbillon, M. Keller, A. Pasanisi, and
E. Parent.
Adaptive numerical designs for the calibration of computer
models.
Submitted and http://arxiv.org/abs/1502.07252.
D.R. Jones, M. Schonlau, and W.J. Welch.
Efficient global optimization of expensive black-box functions.
Journal of Global Optimization, 13:455–492, 1998.
M. Kennedy and A. O’Hagan.
Bayesian calibration of computer models.
Journal of the Royal Statistical Society, Series B,
Methodological, 63:425–464, 2001(a).
27 / 27