Safe and Effective Importance Sampling
Author(s): Art Owen and Yi Zhou
Source: Journal of the American Statistical Association, Vol. 95, No. 449 (Mar., 2000), pp. 135-143
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2669533
Accessed: 18/08/2010 17:45
We present two improvements on the technique of importance sampling. First, we show that importance sampling from a mixture of densities, using those densities as control variates, results in a useful upper bound on the asymptotic variance. That bound is a small multiple of the asymptotic variance of importance sampling from the best single component density. This allows one to benefit from the great variance reductions obtainable by importance sampling, while protecting against the equally great variance increases that might take the practitioner by surprise. The second improvement is to show how importance sampling from two or more densities can be used to approach a zero sampling variance even for integrands that take both positive and negative values.

KEY WORDS: Control variates; Monte Carlo; Rare events; Reliability; Value at risk; Variance reduction.
Art Owen is Professor, Department of Statistics, Stanford University, Stanford, CA 94025 (E-mail: [email protected]). Yi Zhou is Associate, Goldman-Sachs, New York, NY 10005. This work was supported by the National Science Foundation. The authors thank Bennett Fox, Paul Glasserman, Philip Heidelberger, Tim Hesterberg, and Eric Veach for discussions and the reviewers for comments that greatly improved the presentation.

© 2000 American Statistical Association. Journal of the American Statistical Association, March 2000, Vol. 95, No. 449, Theory and Methods.

1. INTRODUCTION

We consider here the problem of computing

$$I = \int f(x)\,q(x)\,dx, \tag{1}$$

for an integrand $f(x)$ and probability density $q(x)$, both defined on the domain $D \subseteq \mathbb{R}^d$. We are especially interested in problems where $f$ is a spiky function, by which we mean that an appreciable fraction of the variance of $f$ may be attributed to a subset of $D$ having relatively small probability under sampling from $q$. Spiky integrands of this sort arise in high-energy physics, Bayesian statistics, computational finance, computer graphics, and in the computation of rare event probabilities, as, for example, in reliability problems.

It is natural to consider Monte Carlo sampling for these problems, especially when $d$ is not small. Furthermore, for spiky integrands it is common to use some form of importance sampling, replacing the nominal density $q$ by a sampling density $p$ that provides more data from the important region containing the spikes.

This article considers two improvements in importance sampling. The first is importance sampling from a mixture density, while using the mixture components as control variates. We show that this method is asymptotically not much worse than importance sampling from the best of the mixture components, even if all but one of those components would have given an infinite variance. The practical benefit is that one can use several densities in the hope that one of them is particularly good, without losing too much if one or more of them is particularly bad. This method is also useful when several integrals, to be estimated from the same sample, have different important regions.

For the second improvement, we introduce a technique of positivisation. The integrand is split into nonnegative and nonpositive parts, and importance sampling is applied to each of them. This allows one to approach zero variance even for integrands taking both signs. One can also positivise the difference between the integrand and a control variate. The method combines naturally with the mixture sampling described earlier.

For definiteness, our discussion is couched in terms of $X$ having density $q$ over $D \subseteq \mathbb{R}^d$. Our main results extend to discrete random variables $X$, on replacing densities by probability mass functions.

1.1 Outline

In Section 2 we review importance sampling and show how it can succeed spectacularly when $p$ is nearly proportional to $qf$. But in Example 1 given there, importance sampling fails spectacularly, even though $p$ is nearly proportional to $qf$ in the important region. The cause is lack of proportionality outside of the important region. The method of defensive importance sampling (Hesterberg 1995) addresses this problem, by sampling from a mixture of $p$ and $q$. But defensive importance sampling can cause a great deterioration when ordinary importance sampling would have worked; see Example 2.

It is possible to get the best of both worlds, benefiting from importance sampling when it works well, while protecting against its failings. This safe and effective importance sampling can be obtained by using $p$ and $q$ as control variates while sampling from a mixture of $p$ and $q$. Section 3 presents the method of control variates as it is used in combination with importance sampling. Theorem 1 there gives conditions under which estimated control variate coefficients are essentially equivalent to the optimal ones.

Section 4 presents a method of importance sampling from a finite mixture of densities. We show that using the individual mixture components as control variates yields an asymptotic variance not much, if any, larger than that of importance sampling from the single best mixture component. Theorem 2 proves this assuming optimal control variate coefficients, and Section 4.1 gives mild conditions under which we can expect our sample coefficients to behave like the optimal ones. Although most of the results in Section 4 are presented for simple sampling from a mixture distribution, a form of stratified sampling with deterministic
mixture component sizes presented there is preferred in practice.

Section 5 presents the method of multiple importance sampling due to Veach and Guibas (1995). Like defensive importance sampling, multiple importance sampling is motivated by the desire to pool importance sampling methods and get nearly the best performance.

Section 6 is a simulation of Examples 1 and 2. The proposed hybrid methods are nearly best on both examples.

Section 7 introduces a version of multiple importance sampling that can approach zero variance on integrands taking both signs. We show there how to exploit a control variate in conjunction with multiple importance sampling.

Section 8 gives an example in which positivisation and multiple importance sampling are combined. In this example we replace the integrand by a spiky one and obtain a large variance reduction by mixture sampling.

1.2 Background

An early use of mixture sampling was by Torrie and Valleau (1977). Arsham, Fuerverger, McLeish, Kreimer, and Rubinstein (1989) advocated using the likelihood ratio as a control variate in importance sampling. Hesterberg (1995) considered using this ratio as a control variate in combination with defensive mixture sampling, and obtained an asymptotic bound on the ratio of the resulting variance to that of sampling from the nominal density. This appears to be the first published result of this kind, although it appeared earlier in a dissertation (Hesterberg 1988).

Our contribution is to extend the theoretical result to more general mixture samples, to describe conditions under which estimated coefficients approach the true ones, and to combine it with positivisation. Multiple importance sampling was proposed by Veach and Guibas (1995) for problems in computer graphics. Our positivisation technique is based on this.

2. IMPORTANCE SAMPLING

This section reviews importance sampling and introduces our notation. Integrals without an explicit domain are assumed to be over $D$. For brevity, we sometimes omit the arguments of functions, for example, in writing $q$ for $q(x)$, when the argument $x$ is clear from context.

2.1 Basic Importance Sampling

In importance sampling, we sample $X_i$ independently from a density $p$, with $p(x) > 0$ whenever $f(x)q(x) \ne 0$, then estimate $I$ by

$$\hat I_p = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)\,q(X_i)}{p(X_i)}. \tag{2}$$

It is easy to see that $E(\hat I_p) = I$. There are two practical constraints on importance sampling: It must be feasible to sample from $p$, and we must be able to compute $fq/p$. For the second constraint, assuming that we can compute $f$, it suffices to be able to compute the ratio $q/p$, which can be simpler than computing $p$ and $q$ separately.

By elementary manipulations, we find $\mathrm{var}(\hat I_p) = \sigma_p^2/n$, where

$$\sigma_p^2 = \int \left(\frac{f(x)q(x)}{p(x)} - I\right)^2 p(x)\,dx = \int \frac{f^2(x)\,q^2(x)}{p(x)}\,dx - I^2 \tag{3}$$

is referred to as the asymptotic variance. The density $p^*$ that minimizes this asymptotic variance is known to be proportional to $|f(x)|\,q(x)$ (Kahn and Marshall 1953).

In the rest of this section, we consider the common special case with $f(x) \ge 0$ and $I > 0$. Then we may write $p^*(x) = f(x)q(x)/I$. This density gives $\sigma^2_{p^*} = 0$ but does not satisfy the practical constraint, because $fq/p^* = I$, and $I$ is unknown.

The value in knowing that $p^* = fq/I$ is nearly optimal is that it suggests that a good importance sampling density should then be roughly proportional to $fq$. Let $r(x) = f(x)q(x) - I\,p(x)$. This $r(x)$ is the residual from proportionality, which satisfies $\int r(x)\,dx = 0$. We can rewrite (3) as

$$\sigma_p^2 = \int \frac{r(x)^2}{p(x)}\,dx. \tag{4}$$

Note that (4) holds without assuming $f \ge 0$.

2.2 Failure of Importance Sampling

Importance sampling can fail dramatically, even when $p$ matches $fq/I$ well near the mode(s) of $fq/I$. The cause is the appearance of $p(x)$ in the denominator of (3) and (4). If $p(x)$ decreases toward 0 faster than $f^2(x)q^2(x)$ as $x$ moves away from its mode(s), we can find that $\mathrm{var}(\hat I_p) = \infty$. The irony is that this infinite variance may be due to a region of $D$ that is unimportant in ordinary Monte Carlo sampling.

To illustrate this point, we present Example 1. We use the beta density function

$$B(x, a, b) = \frac{x^{a-1}(1-x)^{b-1}}{\Gamma(a)\Gamma(b)/\Gamma(a+b)}, \qquad 0 < x < 1,$$

where $a > 0$ and $b > 0$ are parameters and $\Gamma(z)$ is the gamma function. For vectors in the unit cube, we use superscripts to denote their components.

Example 1. Let $D = (0,1)^5$, with nominal density $q(x) = U(0,1)^5$. The integrand is

$$f_1(x) = .9 \times \prod_{j=1}^{5} B(x^j, 20, 20) + .1 \times \prod_{j=1}^{5} B(x^j, 2, 2), \tag{5}$$

and the importance sampling density is

$$p(x) = \prod_{j=1}^{5} B(x^j, 20, 20). \tag{6}$$

It is easy to show that $I = 1$ and $\sigma_p^2 = \infty$ for Example 1. As Figure 1 shows, this density is very nearly proportional to $f_1 q$, so we might have expected a good result. In similar
plots using $f_1$ and $.9p$, the curves are visually indistinguishable.

Figure 1. Plots of the Integrand $f_1(x)$ (Solid) and the Density $p(x)$ (Dashed) Along Two Transects Through the Unit Cube $(0,1)^5$. In (a) $x = (z, .5, .5, .5, .5)$, and in (b) $x = (z, z, z, z, z)$, both for $0 < z < 1$. The density is nearly proportional to the integrand near the mode, but gives an infinite variance if used in importance sampling.

2.3 Defensive Importance Sampling

The failure of importance sampling can be countered by defensive importance sampling (Hesterberg 1995). Let $p(x)$ be a density that is thought to be a good approximation to $fq/I$, at least in the important part of $D$. Pick $\alpha_1$ with $0 < \alpha_1 < 1$ and define the mixture density

$$p_\alpha(x) = \alpha_1 q(x) + \alpha_2 p(x), \tag{7}$$

where $\alpha_2 = 1 - \alpha_1$. Here $\alpha$ is the vector $(\alpha_1, \alpha_2)$.

By mixing in some of $q$, we prevent $p_\alpha$ from being much smaller than $q$ anywhere. We find that

$$\sigma^2_{p_\alpha} = \int \frac{f^2(x)\,q^2(x)}{p_\alpha(x)}\,dx - I^2 \le \frac{1}{\alpha_1}\int f^2(x)\,q(x)\,dx - I^2 = \frac{1}{\alpha_1}\left(\sigma_q^2 + \alpha_2 I^2\right). \tag{8}$$

Equation (8) provides a kind of insurance against the worst effects of importance sampling. If the nominal density provides a finite variance, then defensive importance sampling will as well.

Hesterberg (1995) recommended using $\alpha_1$ between .1 and .5. Spiky nonnegative integrands can have $\sigma_q \gg I > 0$, in which case this advice will approximately bound the sampling variance by between 2 and 10 times what it would be under the nominal density.

2.4 Failure of Defensive Importance Sampling

Defensive importance sampling can greatly increase the variance over what it would have been with ordinary importance sampling. In other words, the premium that we pay for the insurance (8) can be very high. The root of the problem is that if $p$ is nearly proportional to $fq$, then $\alpha_1 q + \alpha_2 p$ will not be, apart from trivial cases with $q$ nearly proportional to $fq$.

The following is an example where defensive importance sampling fails, in that it destroys the near proportionality of the original nondefensive method.

Example 2. Let $D = (0,1)^5$, with nominal density $q(x) = U(0,1)^5$. The integrand is

$$f_2(x) = \prod_{j=1}^{5} B(x^j, 20, 20) + 0.1 \prod_{j=1}^{5} \sin\!\left(60\pi\left(x^j - 1/3\right)\right) 1_{1/3 < x^j < 2/3}, \tag{9}$$

and the importance sampling density is (6), from Example 1.

The integral here is $I = 1$, and the functions $f_2$ and $p$ are visually indistinguishable. Example 2 is considered further in Section 6, where defensive importance sampling, as expected, greatly increases the variance.

In some settings, we can show that as $p$ differs from proportionality by $O(\epsilon)$, defensive sampling causes a loss of efficiency of $O(\epsilon^{-2})$ as $\epsilon \to 0$. Example 3 provides a simple illustration of this effect, with $\epsilon = a - b$.

Example 3. Let $D = (0,1)$, with nominal density $q = U(0,1)$. The integrand is $f(x) = x^a$, for some $a > 1$ and $p(x) = (b+1)x^b$, for $b > 1$.

3. CONTROL VARIATES WITH IMPORTANCE SAMPLING

The method of control variates uses knowledge of one or more integrals to reduce variance in the estimate of $I$. The basic method is described in texts such as those by Bratley, Fox, and Schrage (1987), Hammersley and Handscomb (1964), and Ripley (1987). Here we present control variates in combination with importance sampling. The integrand need not be nonnegative.
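As a point of reference for the basic control variate method mentioned in this section, the following is a minimal sketch in plain Monte Carlo, without importance sampling. The integrand $e^x$ on $(0,1)$, the control variate $h(x) = x$ with known integral $1/2$, and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

def control_variate_estimate(f, h, mu_h, n, seed=0):
    """Estimate the integral of f over (0,1), using h (known integral mu_h) as a control variate."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    fx = [f(x) for x in xs]
    hx = [h(x) for x in xs]
    f_bar = sum(fx) / n
    h_bar = sum(hx) / n
    # Least-squares slope of f on h (a regression that includes an intercept).
    sxy = sum((a - f_bar) * (b - h_bar) for a, b in zip(fx, hx))
    sxx = sum((b - h_bar) ** 2 for b in hx)
    beta = sxy / sxx
    plain = f_bar                                  # ordinary Monte Carlo estimate
    adjusted = f_bar - beta * (h_bar - mu_h)       # control variate estimate
    # Per-observation variances before and after the adjustment.
    var_plain = sum((a - f_bar) ** 2 for a in fx) / (n - 1)
    var_cv = sum((a - f_bar - beta * (b - h_bar)) ** 2
                 for a, b in zip(fx, hx)) / (n - 2)
    return plain, adjusted, var_plain, var_cv

# Hypothetical example: f(x) = exp(x), h(x) = x with known integral 1/2.
plain, adjusted, var_plain, var_cv = control_variate_estimate(
    math.exp, lambda x: x, 0.5, 10_000)
print(plain, adjusted, var_plain, var_cv)
```

The adjusted estimate equals the fitted intercept plus the slope times the known integral, which is the form the regression view of control variates leads to. In this toy case the control variate is highly correlated with the integrand, so the residual variance should come out far smaller than the plain variance.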
Suppose that we know $\int h_j(x)\,dx = \mu_j$, $j = 1, \ldots, m$. We assume that $p(x) > 0$ if any $h_j(x) > 0$, or if $f(x)q(x) > 0$. Let $\beta = (\beta_1, \ldots, \beta_m)$ be a vector of real values. Under independent sampling of $X_i$ from $p(x)$,

$$\hat I_{p,\beta} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)q(X_i) - \sum_{j=1}^{m}\beta_j h_j(X_i)}{p(X_i)} + \sum_{j=1}^{m}\beta_j\mu_j \tag{10}$$

is an unbiased estimate of $I$.

The variance of $\hat I_{p,\beta}$ is $\sigma^2_{p,\beta}/n$, where

$$\sigma^2_{p,\beta} = \int \left(\frac{f(x)q(x) - \sum_j \beta_j h_j(x)}{p(x)} - I + \sum_j \beta_j\mu_j\right)^2 p(x)\,dx. \tag{11}$$

Let $\beta^*$ minimize the integral in (11), over $\beta$, for the given functions $f$, $p$, and $q$. Equation (11) suggests that an estimate $\hat\beta$ of $\beta^*$ can be found by a multiple regression (including an intercept term) of $f(X_i)q(X_i)/p(X_i)$ on predictors $h_j(X_i)/p(X_i)$.

Because the regression includes an intercept coefficient $\hat\beta_0$, the residuals will sum to 0. As a result, (10) with $\hat\beta$ simplifies to

$$\hat I_{p,\hat\beta} = \hat\beta_0 + \sum_{j=1}^{m}\hat\beta_j\mu_j.$$

Theorem 1. Suppose that there is a unique vector $\beta^*$ that minimizes $\sigma^2_{p,\beta}$, and let $\hat\beta$ be determined by least squares as described earlier. Suppose further that the expectations under sampling from $p$ of $h_j^4/p^4$ and $h_j^2 f^2 q^2/p^4$ exist and are finite, for $j = 1, \ldots, m$. Then

$$\hat\beta_j = \beta_j^* + O_p(n^{-1/2}) \tag{12}$$

for $j = 1, \ldots, m$, and

$$\hat I_{p,\hat\beta} = \hat I_{p,\beta^*} + O_p(n^{-1}). \tag{13}$$

Proof. The unique optimal $\beta^*$ is a function of some expected values of cross products under sampling from $p$. These are the cross products arising in a regression of $fq/p$ on predictors $h_j/p$. The uniqueness of $\beta^*$ implies that each $\hat\beta_j$ is a differentiable function of sample means of the cross products used in the regression. The assumed finite moments ensure that sample versions of these means converge to population values at the $O_p(n^{-1/2})$ rate, establishing (12).

To establish (13), write $\hat I_{p,\hat\beta} - \hat I_{p,\beta^*} = \sum_j (\hat\beta_j - \beta_j^*)(\mu_j - \hat\mu_j)$, where $\hat\mu_j = n^{-1}\sum_i h_j(X_i)/p(X_i) = \mu_j + O_p(n^{-1/2})$.

Theorem 1 shows that although $\hat\beta$ approaches $\beta^*$ at the standard Monte Carlo rate, the effect of substituting $\hat\beta_j$ for unknown optimal $\beta_j^*$ is asymptotically negligible. For this reason, it is customary to analyze control variate methods as if the unknown optimal values were being used, whereas in practice one uses the estimated coefficients. This practice is usually reasonable in Monte Carlo sampling where $n$ is large. Because $\hat I_{p,\hat\beta} - \hat I_{p,\beta^*}$ is a sum of $m$ terms, $n$ should be large compared to $m$, as it typically is.

4. MIXTURE SAMPLING

Suppose that we have a list of density functions $p_j$, $j = 1, \ldots, m$. In defensive importance sampling, the $p_j$ include the nominal density and another thought to be nearly proportional to $fq$. In other settings we may have a list of densities, of which we hope that one or more is roughly proportional to $fq$. Finally, we may have more than one integrand to consider in our simulation, and each $p_j$ may be customized for a subset of these integrands. These densities may have been suggested by subject matter knowledge, or may have been found by numerical search.

We sample from the mixture density $p_\alpha(x) = \sum_{j=1}^{m}\alpha_j p_j(x)$, where $\alpha_j > 0$ and $\sum_{j=1}^{m}\alpha_j = 1$. Because $\int p_j(x)\,dx = 1$ and $p_\alpha(x) > 0$ whenever $p_j(x) > 0$, we can use the $p_j$ as control variates as described in Section 3. We write the resulting estimator as

$$\hat I_{\alpha,\beta} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)q(X_i) - \sum_{j=1}^{m}\beta_j p_j(X_i)}{\sum_{j=1}^{m}\alpha_j p_j(X_i)} + \sum_{j=1}^{m}\beta_j, \tag{14}$$

reserving the notation $\tilde I_{\alpha,\beta}$ for deterministic mixture sampling introduced in Section 4.3.

The asymptotic variance $\sigma^2_{\alpha,\beta}$ of $\hat I_{\alpha,\beta}$ is given by (11) with $h_j = p_j$ and $\mu_j = 1$. This is a positive semidefinite quadratic function in $\beta$. The minimum is not unique, as required by Theorem 1. For any scalar $c$, we have $\hat I_{\alpha,\beta + c\alpha} = \hat I_{\alpha,\beta}$, and so if $\beta^*$ is a minimizer of $\sigma^2_{\alpha,\beta}$, then so is $\beta^* + c\alpha$. Section 4.1 describes how to apply Theorem 1 to this setting.

Theorem 2 shows that $\hat I_{\alpha,\beta}$ is unbiased, and that for an optimal $\beta$, the variance is never larger than what one gets from an importance sample of size $n\alpha_j$ from $p_j$.

Theorem 2. Let $p_j$, $\alpha_j$, and $\hat I_{\alpha,\beta}$ be as above. If at least one of $p_j(x) > 0$ whenever $f(x)q(x) > 0$, then for any $\beta$, we have $E(\hat I_{\alpha,\beta}) = I$. Let $\sigma^2_{\alpha,\beta}$ be the asymptotic variance of $\hat I_{\alpha,\beta}$ and let $\sigma^2_{p_j}$ be the asymptotic variance (3), under importance sampling from $p_j$. If $\beta^*$ is any minimizer of $\sigma^2_{\alpha,\beta}$, then

$$\sigma^2_{\alpha,\beta^*} \le \min_{1 \le j \le m} \alpha_j^{-1}\sigma^2_{p_j}. \tag{15}$$

Proof. To establish unbiasedness, write

$$E(\hat I_{\alpha,\beta}) = \int \frac{f(x)q(x) - \sum_{j=1}^{m}\beta_j p_j(x)}{p_\alpha(x)}\,p_\alpha(x)\,dx + \sum_{j=1}^{m}\beta_j = I.$$
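The estimator (14) can be tried out on a small case. The sketch below is a hedged illustration, not the authors' code: it uses a one-dimensional analogue of Example 1 (an assumption made here so the example runs quickly), takes $p_1 = q = U(0,1)$ and $p_2 = B(\cdot, 20, 20)$, samples from the random mixture $p_\alpha$, and estimates the coefficient by a regression that drops $p_1/p_\alpha$ to sidestep the singularity among the predictors. The names `beta_pdf`, `f`, and `mixture_cv` are hypothetical.

```python
import math
import random

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), evaluated on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def f(x):
    # Hypothetical 1-D analogue of Example 1; its integral against q = U(0,1) is 1.
    return 0.9 * beta_pdf(x, 20, 20) + 0.1 * beta_pdf(x, 2, 2)

def mixture_cv(n, alpha1=0.5, seed=0):
    """Random mixture sampling from alpha1*q + alpha2*p, with p/p_alpha as control variate."""
    rng = random.Random(seed)
    alpha2 = 1.0 - alpha1
    ys, zs = [], []
    for _ in range(n):
        # Pick a mixture component, then draw from it.
        x = rng.random() if rng.random() < alpha1 else rng.betavariate(20, 20)
        x = min(max(x, 1e-12), 1.0 - 1e-12)    # keep the logs in beta_pdf finite
        p = beta_pdf(x, 20, 20)
        p_alpha = alpha1 * 1.0 + alpha2 * p     # q(x) = 1 on (0, 1)
        ys.append(f(x) / p_alpha)               # f q / p_alpha
        zs.append(p / p_alpha)                  # control variate with known mean 1
    y_bar = sum(ys) / n
    z_bar = sum(zs) / n
    # Regress y on z with an intercept; q / p_alpha is dropped from the regression.
    sxy = sum((y - y_bar) * (z - z_bar) for y, z in zip(ys, zs))
    sxx = sum((z - z_bar) ** 2 for z in zs)
    beta = sxy / sxx
    defensive = y_bar                           # mixture sampling, no control variate
    with_cv = y_bar - beta * (z_bar - 1.0)      # intercept + beta * 1
    var_def = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    var_cv = sum((y - y_bar - beta * (z - z_bar)) ** 2
                 for y, z in zip(ys, zs)) / (n - 2)
    return defensive, with_cv, var_def, var_cv

defensive, with_cv, var_def, var_cv = mixture_cv(20_000)
print(defensive, with_cv, var_def, var_cv)
```

Both estimates should land near the true value 1, with the control variate version showing the smaller per-observation variance. Dropping one predictor is only one way to handle the linear dependence among the $p_j/p_\alpha$; it is used here because it needs nothing beyond the standard library.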
Next, we prove that $\sigma^2_{\alpha,\beta^*} \le \sigma^2_{p_1}/\alpha_1$. Consider the particular vector $\beta$ having $\beta_1 = 0$ and $\beta_j = -I\alpha_j/\alpha_1$ for $j > 1$. Let $r_1(x) = f(x)q(x) - I\,p_1(x)$, so $\int r_1(x)\,dx = 0$. Substituting these values, we find that for this $\beta$,

$$\sigma^2_{\alpha,\beta} = \int \left(\frac{f(x)q(x) + I\alpha_1^{-1}\sum_{j=2}^{m}\alpha_j p_j(x)}{p_\alpha(x)} - I\Bigl(1 + \alpha_1^{-1}\sum_{j=2}^{m}\alpha_j\Bigr)\right)^2 p_\alpha(x)\,dx$$

$$= \int \left(\frac{I\,p_1(x) + r_1(x) + I\alpha_1^{-1}\bigl(p_\alpha(x) - \alpha_1 p_1(x)\bigr)}{p_\alpha(x)} - I\alpha_1^{-1}\right)^2 p_\alpha(x)\,dx$$

$$= \int \frac{r_1^2(x)}{p_\alpha(x)}\,dx \le \frac{1}{\alpha_1}\int \frac{r_1^2(x)}{p_1(x)}\,dx = \frac{\sigma^2_{p_1}}{\alpha_1},$$

using (4). Now $\sigma^2_{\alpha,\beta^*} \le \sigma^2_{\alpha,\beta} \le \alpha_1^{-1}\sigma^2_{p_1}$. By making similar arguments for $j = 2, \ldots, m$, (15) is established.

4.1 Estimating the Coefficients

We now turn to the problem of finding good control variate coefficients $\beta_j$ to use in $\hat I_{\alpha,\beta}$. Theorem 1 provides sufficient conditions under which estimated control variate coefficients are essentially as good as the optimal ones: uniqueness of the minimizer and finite expectations for $p_j^4/p_\alpha^4$ and $p_j^2 f^2 q^2/p_\alpha^4$.

Because $\sum_j \alpha_j p_j(x)/p_\alpha(x) = 1$ for all $x$, the regression described in Section 3 is singular, and special care must be taken. Let us assume affine independence of the $p_j$; that is, if $\gamma_0 + \sum_{j=1}^{m}\gamma_j p_j(x) = 0$ for all $x$, then all $\gamma_j = 0$. Under this condition, dropping one control variate $p_{j'}/p_\alpha$ from the regression is equivalent to selecting the unique minimizer of (11) over $\beta_j$, $j \ne j'$, with $\beta_{j'} = 0$. We could also drop the intercept term.

Because $p_j^4(x)/p_\alpha^4(x) \le \alpha_j^{-4}$ for all $x$, the first moment condition follows easily. Now suppose that for at least one of the component densities, say $p_k$, the asymptotic variance $\sigma^2_{p_k}$ from (3) is finite. Then

$$\int \frac{p_j^2 f^2 q^2}{p_\alpha^4}\,p_\alpha\,dx \le \alpha_j^{-2}\int \frac{f^2 q^2}{p_\alpha}\,dx \le \alpha_j^{-2}\alpha_k^{-1}\int \frac{f^2 q^2}{p_k}\,dx = \alpha_j^{-2}\alpha_k^{-1}\left(\sigma^2_{p_k} + I^2\right) < \infty.$$

Thus if the densities $p_j$ are affinely independent, and at least one of them gives rise to a finite importance sampling variance, then we can expect estimated control variate coefficients to behave like the optimal ones in the way described by Theorem 1.

We prefer to use a singular value decomposition (SVD) to compute the regression coefficients $\hat\beta_j$ (see Golub and Van Loan 1983). The SVD effectively drops a linear combination of predictors from the regression. The SVD will still work if there are additional dependencies among the regressors $p_j/p_\alpha$, which would require dropping two or more predictors. Such additional singularities might arise by accident or design, as, for example, if one of the $p_j$ is a mixture of some others.

4.2 Interpretation of Theorem 2

Theorem 2 shows that with an optimal coefficient vector $\beta$, we get an asymptotic variance that is at least as good as we would have had with $n\alpha_j$ observations from $p_j$. It is hard to imagine that we could do better in general, because we get on average $n\alpha_j$ observations from $p_j$, and $p_j$ may be the only one of our component densities that would have given a finite asymptotic variance in importance sampling.

For defensive importance sampling, we find that $\sigma^2_{\alpha,\beta^*} \le \sigma^2_q/\alpha_1$ as insurance, removing the influence of $I$ in (8). The premium that we pay is bounded because $\sigma^2_{\alpha,\beta^*} \le \sigma^2_p/\alpha_2$. We can also find this by reversing the roles of $p$ and $q$ in defensive importance sampling with a control variate.

There is still the worry that one or more bad sampling densities $p_j$ could distort the sample value $\hat\beta$ enough to destroy the asymptotic equivalence of $\hat I_{\alpha,\hat\beta}$ and $\hat I_{\alpha,\beta^*}$. The discussion in Section 4.1 shows that this will not happen as long as at least one of the component densities has $\sigma^2_{p_j} < \infty$.

4.3 Deterministic Mixture Sampling

In a deterministic mixture sample, one takes $n_j = n\alpha_j$ observations (or an integer close to $n\alpha_j$) from the density $p_j$. Let $X_{ji} \sim p_j$ be independent, for $j = 1, \ldots, m$ and $i = 1, \ldots, n_j$. Incorporating control variates $p_j$, the resulting estimate is

$$\tilde I_{\alpha,\beta} = \frac{1}{n}\sum_{j=1}^{m}\sum_{i=1}^{n_j} \frac{f(X_{ji})q(X_{ji}) - \sum_{k=1}^{m}\beta_k p_k(X_{ji})}{p_\alpha(X_{ji})} + \sum_{j=1}^{m}\beta_j, \tag{16}$$

where $X_{ji}$ are independent and drawn from $p_j(x)$.

The estimate (16) is unbiased. When $n\alpha_j$ is an integer, then under deterministic sampling with $n_j = n\alpha_j$, we have $\mathrm{var}(\tilde I_{\alpha,\beta}) \le \mathrm{var}(\hat I_{\alpha,\beta})$. The difference arises from eliminating randomness in the mixture component sample sizes. Hesterberg (1995) called this stratification.

The unbiasedness of estimates from deterministic mixture sampling extends to the sample moments and cross-moments of the control variates $p_k/p_\alpha$, and of $fq/p_\alpha$. It follows that the results of Theorem 1 also apply to deterministic mixture sampling. Deterministic mixture sampling obtains superior estimates of the regression coefficients that
would have been optimal for random mixture sampling. There remains the possibility of defining and estimating still better coefficients, optimal under deterministic sampling.

5. MULTIPLE IMPORTANCE SAMPLING

Multiple importance sampling was introduced by Veach and Guibas (1995). They considered path sampling algorithms for rendering images in computer graphics. One of their goals was to combine several importance sampling strategies to get performance nearly equal to that of the best strategy. In their examples, the optimal method for rendering an image can differ from pixel to pixel in an unpredictable way.

Let $w_j(x)$, $j = 1, \ldots, m$ be a partition of unity; that is, for every $x \in D$, $0 \le w_j(x)$ and $\sum_{j=1}^{m} w_j(x) = 1$. Define

$$\hat I_{n,w} = \sum_{j=1}^{m}\frac{1}{n_j}\sum_{i=1}^{n_j} w_j(X_{ji})\,\frac{f(X_{ji})\,q(X_{ji})}{p_j(X_{ji})}, \tag{17}$$

where $X_{ji}$ are independent draws from $p_j$ and the subscripts on $\hat I$ denote the partition of unity and the sample sizes used. The estimate $\hat I_{n,w}$ is unbiased under mild conditions on the supports of the functions $p_j$ and $w_j$.

Veach and Guibas (1995) considered several ways of selecting $w_j$. Their motivation is to make $f w_j$ "locally proportional" to $p_j$. Their balance heuristic takes weights

$$w_j(x) = \frac{n_j p_j(x)}{\sum_k n_k p_k(x)}. \tag{18}$$

The result matches (16) with $n_j = n\alpha_j$ and $\beta_j = 0$.

Their cutoff heuristic takes

$$w_j(x) \propto n_j p_j(x)\,1_{n_j p_j(x) \ge \gamma \max_k n_k p_k(x)} \tag{19}$$

for some $0 \le \gamma \le 1$, and their power heuristic takes

$$w_j(x) \propto \bigl(n_j p_j(x)\bigr)^{\rho} \tag{20}$$

for $\rho \ge 0$. In both heuristics, the weights are normalized to sum to unity. Sending $\rho \to \infty$ in the power heuristic or taking $\gamma = 1$ in the cutoff heuristic gives rise to the maximum heuristic

$$w_j(x) \propto 1_{n_j p_j(x) = \max_k n_k p_k(x)}. \tag{21}$$

Unless there are ties among the $n_j p_j$, (21) puts all of the weight on one of the $j$'s.

6. EXAMPLES

This section presents simulation results for Examples 1 and 2 described in Section 2. Both examples have an importance sampling density that, when scaled properly, is visually indistinguishable from the integrand. Despite this, Example 1 has infinite variance under importance sampling, whereas Example 2 has very small variance under importance sampling. Defensive importance sampling cures the problem of Example 1 while losing the accuracy in Example 2.

We compare 11 methods. For each method, we compute an estimate of $I$ using $n = 10^5$ observations. Then, using 20 independent replicates, we compute

$$Q = \frac{n}{20}\sum_{k=1}^{20}\bigl(\hat I_k - I\bigr)^2, \tag{22}$$

where $I$ is the true integral value and $\hat I_k$ is the estimate in the $k$th replicate. For methods with negligible bias and finite variance, (22) estimates the asymptotic variance. Different methods were based on independent samples.

We compared the following methods:

IID: $X_1, \ldots, X_n$ are iid from $q = U(0,1)^5$, and $\hat I = n^{-1}\sum_{i=1}^{n} f(X_i)$
IS: $X_1, \ldots, X_n$ are iid from $p(x)$, and $\hat I = n^{-1}\sum_{i=1}^{n} f(X_i)/p(X_i)$
DIS: $X_1, \ldots, X_n$ are iid from $p_\alpha = \alpha_1 q + \alpha_2 p$, and $\hat I = \hat I_{p_\alpha}$
MCV: Use $\tilde I_{\alpha,\hat\beta}$ from (16), with $p_1 = q$, $p_2 = p$, and $n_j = n\alpha_j$
BAL: Use $\hat I_{n,w}$ from (17), with $p_1 = q$, $p_2 = p$, $n_j = n\alpha_j$, and $w$ from (18)
CUT: Like BAL, except that $w$ is from (19), with $\gamma = .1$ and $n_j = n/2$
POW: Like BAL, except that $w$ is from (20), with $\rho = 2$ and $n_j = n/2$
MAX: Like BAL, except that $w$ is from (21), and $n_j = n/2$.

Of these eight methods, we investigated DIS, MCV, and BAL at both $\alpha_1 = .1$ and at $\alpha_1 = .5$, bringing the total to 11 methods. These methods are closely related: MCV without control variates is BAL, and BAL with random mixing is DIS.

The results for these examples are plotted in Figure 2. A reference rectangle there, based on an approximate $F_{20,20}$ distribution for $Q$ ratios, can be used to gauge statistical significance.

As expected, IID sampling does badly on both of these spiky integrands. Also as expected, importance sampling does very badly on Example 1, despite having matched the mode well, but does very well on Example 2.

DIS with either value of $\alpha_1$ does very well on Example 1, providing the protection that it was designed to provide. But DIS does badly on Example 2 compared to nondefensive IS.

As predicted by Theorem 2, MCV does nearly as well as the better of IS and DIS on both examples. The choice of $\alpha_1$ is of secondary importance compared to the choice of method. We believe that as long as no $\alpha_j$ is too small, there is little to gain from optimizing over $\alpha$ that cannot be gained by estimating $\beta$. The various heuristics from Veach and Guibas (1995) are roughly equivalent on these examples.

7. POSITIVISATION

Here we show that (multiple) importance sampling can achieve zero variance for integrands of mixed sign. Then we combine this positivisation method with mixture sampling and control variates.
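The zero variance claim can be seen in a toy case. The sketch below is an illustration under strong assumptions: the integrand $f(x) = x - 0.25$ on $(0,1)$ with $q = U(0,1)$ is chosen here so that the positive and negative parts have known normalising constants and can be sampled by inverse CDF. Such exact sampling is rarely available in practice, so the sketch shows the limiting behaviour rather than a practical recipe. All names are hypothetical.

```python
import math
import random

SPLIT = 0.25                          # f(x) = x - SPLIT changes sign at SPLIT
C_PLUS = (1.0 - SPLIT) ** 2 / 2.0     # integral of the positive part over (0, 1)
C_MINUS = SPLIT ** 2 / 2.0            # integral of the negative part over (0, 1)

def f_plus(x):
    return max(x - SPLIT, 0.0)

def f_minus(x):
    return max(SPLIT - x, 0.0)

def positivised_estimate(n_plus, n_minus, seed=0):
    """Sample each part from a density proportional to it and difference the two means."""
    rng = random.Random(seed)
    plus_terms, minus_terms = [], []
    for _ in range(n_plus):
        # Inverse-CDF draw from p_plus(x) = f_plus(x) / C_PLUS on (SPLIT, 1].
        x = SPLIT + (1.0 - SPLIT) * math.sqrt(1.0 - rng.random())
        plus_terms.append(f_plus(x) / (f_plus(x) / C_PLUS))
    for _ in range(n_minus):
        # Inverse-CDF draw from p_minus(x) = f_minus(x) / C_MINUS on [0, SPLIT).
        x = SPLIT - SPLIT * math.sqrt(1.0 - rng.random())
        minus_terms.append(f_minus(x) / (f_minus(x) / C_MINUS))
    estimate = sum(plus_terms) / n_plus - sum(minus_terms) / n_minus
    # Largest spread among the summands; zero spread means zero sampling variance.
    spread = max(max(plus_terms) - min(plus_terms),
                 max(minus_terms) - min(minus_terms))
    return estimate, spread

estimate, spread = positivised_estimate(1_000, 1_000)
print(estimate, spread)
```

Every summand in each sample is the same constant up to rounding, so the estimate reproduces the true integral $0.25$ regardless of the draws. With sampling densities that are only roughly proportional to the two parts, the summands vary and the variance is small rather than zero.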
Figure 2. Normalized Mean Squared Errors $Q$ From (22), for Examples 1 and 2 for the 11 Sampling Methods Described in the Text. The horizontal axis is for Example 1, and the vertical axis is for Example 2. Statistical significance may be assessed by the plotted rectangle. Points that differ vertically by more than its height differ significantly at about the 1% level for Example 2. Its width has a similar interpretation for Example 1.

7.1 Simple Positivisation

For $f$ having mixed sign, write $f = f_+ - f_-$, where $f_+(x) = \max(f(x), 0)$ and $f_-(x) = \max(-f(x), 0)$. Then $I = \int f_+(x)q(x)\,dx - \int f_-(x)q(x)\,dx$. By taking a sample of size $n_+$ from $p_+ \propto f_+ q$ and a sample of size $n_-$ from $p_- \propto f_- q$, it is possible to attain a zero variance estimate,

$$\hat I_{\pm} = \frac{1}{n_+}\sum_{i=1}^{n_+}\frac{f_+(X_{i,+})\,q(X_{i,+})}{p_+(X_{i,+})} - \frac{1}{n_-}\sum_{i=1}^{n_-}\frac{f_-(X_{i,-})\,q(X_{i,-})}{p_-(X_{i,-})}. \tag{23}$$

There is a practical difficulty in finding $p_\pm$ to attain or approach the optima, but there is no difficulty in evaluating $f_\pm$ at a given value of $x$.

The foregoing decomposition "splits the integrand at 0." We could just as easily write $f(x) = c + (f(x) - c)_+ - (f(x) - c)_-$ for some well-chosen constant $c$. In a personal communication, Bennett Fox described a strategy using a value $c \le \inf_x f(x)$ so that $(f(x) - c)_- = 0$. One applies importance sampling to $(f(x) - c)_+ = f(x) - c$ and adds $c$ to the result.

More generally, suppose that $g(x)$ is a function for which $\int g(x)q(x)\,dx = \mu$ is known. Let $X_{i,\pm}$ be independent samples from $p_\pm$, $i = 1, \ldots, n_\pm$, and

$$\hat I_{g\pm} = \mu + \frac{1}{n_+}\sum_{i=1}^{n_+}\frac{[(f-g)_+ q](X_{i,+})}{p_+(X_{i,+})} - \frac{1}{n_-}\sum_{i=1}^{n_-}\frac{[(f-g)_- q](X_{i,-})}{p_-(X_{i,-})}. \tag{24}$$

Here and elsewhere, an expression like $[(f-g)_+ q](X)$ replaces $(f(X) - g(X))_+\,q(X)$ to shorten formulas. The estimate (24) is unbiased if $p_\pm > 0$ when $(f-g)_\pm q > 0$. The ideal densities $p_\pm$ are now proportional to $(f-g)_\pm q$.

The function $g$ is like a control variate, although it has a known nominal expectation, not a known integral, and is used in a nonstandard way. A good candidate for $g$ would be a function that was close to $f$ over most of $D$ and for which one can guess where the greatest differences are likely to be, to target those regions with densities $p_\pm$.

Another way to split the integrand, using a control variate $h$ with known integral $\int h(x)\,dx = \mu$, is to write

$$\hat I_{h\pm} = \mu + \frac{1}{n_+}\sum_{i=1}^{n_+}\frac{(fq - h)_+(X_{i,+})}{p_+(X_{i,+})} - \frac{1}{n_-}\sum_{i=1}^{n_-}\frac{(fq - h)_-(X_{i,-})}{p_-(X_{i,-})}. \tag{25}$$

For a uniform nominal distribution, with $q(x) = 1$, we find that $\hat I_{h\pm} = \hat I_{g\pm}$, but in general they are different. Effective use of $\hat I_{h\pm}$ can be made using a function $h$ with known integral for which $fq - h$ is small in most places, assuming that one can guess where the greatest differences are likely to be.

We prefer (24) to (25), because we think it will be easier to select $g$ to approximate $f$ than to select $h$ to approximate $fq$. For this reason, the generalizations of positivisation that follow are generalizations of (24), not (25).

7.2 Partition of Identity

Importance sampling can be applied with quasi-Monte Carlo sampling (Niederreiter 1992) or other methods that benefit from smooth integrands. But the functions $(f-g)_\pm$ are not necessarily smooth even if $f(x)$ and $g(x)$ are both smooth. It is thus of interest to positivise $f - g$ without losing smoothness. (This discussion does not apply to discrete $X$ where differentiability of $f$ is not advantageous.)

Define a partition of the identity by a set of functions $v_j$, $j = 1, \ldots, r$ satisfying

$$z = \sum_{j=1}^{r} v_j(z), \qquad -\infty < z < \infty. \tag{26}$$

If, moreover, for each $j$ we have either $v_j(z) \ge 0$ for all $z$ or $v_j(z) \le 0$ for all $z$, we call these functions a semidefinite partition of the identity function. A smooth semidefinite partition of identity can be achieved by

$$z = \frac{z + \sqrt{\eta^2 + z^2}}{2} + \frac{z - \sqrt{\eta^2 + z^2}}{2},$$

where $\eta > 0$.

Our estimate now becomes

$$\hat I_{g,v} = \mu + \sum_{j=1}^{r}\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{v_j\bigl((f-g)(X_{ji})\bigr)\,q(X_{ji})}{p_j(X_{ji})}, \tag{27}$$

where $X_{ji}$ are independently sampled from the density $p_j$. The ideal $p_j$ are proportional to $|v_j(f - g)|\,q$.
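A small numerical sketch may help fix ideas about a semidefinite partition of the identity and the estimate (27). The two functions below are one smooth choice satisfying (26) with $r = 2$, assumed here for illustration. The toy integrand, the proxy $g(x) = x$ with $\mu = 1/2$, and the use of $p_j = q = U(0,1)$ for both terms are also assumptions, made so the code stays short; with $p_j = q$ the sketch shows only the structure and unbiasedness of (27), not any variance reduction.

```python
import math
import random

ETA = 0.1  # smoothing parameter eta > 0 (value chosen only for illustration)

def v_plus(z, eta=ETA):
    """Smooth nonnegative piece: (z + sqrt(eta^2 + z^2)) / 2."""
    return 0.5 * (z + math.sqrt(eta * eta + z * z))

def v_minus(z, eta=ETA):
    """Smooth nonpositive piece: (z - sqrt(eta^2 + z^2)) / 2."""
    return 0.5 * (z - math.sqrt(eta * eta + z * z))

def f(x):
    # Hypothetical mixed-sign integrand on (0, 1); its integral is 0.5.
    return x + math.sin(2.0 * math.pi * x)

def g(x):
    # Hypothetical proxy close to f, with known nominal expectation MU.
    return x

MU = 0.5

def partition_estimate(n_per_term, seed=0):
    """Form (27) with r = 2 terms and p_j = q = U(0,1), so that q / p_j = 1."""
    rng = random.Random(seed)
    total = MU
    for v in (v_plus, v_minus):
        # Each term gets its own independent sample, as in (27).
        draws = [rng.random() for _ in range(n_per_term)]
        total += sum(v(f(x) - g(x)) for x in draws) / n_per_term
    return total

estimate = partition_estimate(20_000)
print(estimate)
```

In an application one would replace the uniform draws by samples from densities roughly proportional to $|v_j(f-g)|\,q$, dividing each summand by the corresponding $p_j$.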
7.3 Mixture Sampling and Positivisation

Mixture sampling can be combined with positivisation by applying mixture sampling with control variates to each of the $r$ terms in (27). In principle, a different mixture sampling method with control variates could be applied to each of the $r$ terms.

A great simplification occurs if we use the same mixture density and control variates for all $r$ terms, and use a common set of data points $X_{ji}$ for all $r$ integrals. The $v_j$ recombine to form the identity, and instead of $mr$ control variate coefficients, we need only $m$ of them.

With independent $X_i \sim p_\alpha$, the resulting estimator is

$$\hat I_{g,\alpha,\beta} = \frac{1}{n}\sum_{i=1}^{n}\frac{[(f-g)q](X_i) - \sum_{k=1}^{m}\beta_k p_k(X_i)}{p_\alpha(X_i)} + \mu + \sum_{j=1}^{m}\beta_j, \tag{28}$$

and using deterministic mixtures, it is

$$\tilde I_{g,\alpha,\beta} = \frac{1}{n}\sum_{j=1}^{m}\sum_{i=1}^{n_j}\frac{[(f-g)q](X_{ji}) - \sum_{k=1}^{m}\beta_k p_k(X_{ji})}{p_\alpha(X_{ji})} + \mu + \sum_{j=1}^{m}\beta_j, \tag{29}$$

where $X_{ji} \sim p_j$ are independent for $i = 1, \ldots, n_j$, and $j = 1, \ldots, m$. The coefficients $\beta_j$ are estimated by regression of $(f-g)q/p_\alpha$ on predictors $p_k/p_\alpha$, replacing $f$ by $f - g$ in the discussions of Sections 3 and 4.1.

We suggest the following strategy in applications, where subject matter knowledge allows it. First, find a suitable proxy function $g$ that is close to $f$ and has a known integral. Then, find densities $p_1$ and $p_2$ that are large where $(f-g)_+$ and $(f-g)_-$ are large. Finally, take a third density $p_3$ as the nominal or some other defensive density.

This procedure is illustrated by Example 4 in Section 8. Where $(f-g)_\pm$ has a small number of modes with known locations, a mixture density can be constructed for each mode. If a more general partition of identity having $r$ terms is used, as in (26), then one or more component densities can be designed for each such term.

Example 4. Take $D = (0,1)^5$ and $q = U(0,1)^5$. Let $g(x) = 100\bigl(\sum_{j=1}^{5} x^j - 1\bigr)$ and $f(x) = \max(\min(g(x), 300), -25)$. Under $p_1$, the components $X^j$ are independent $N(0, \sigma_1^2)$ variables conditioned to lie in $(0,1)$. Under $p_2$, the components $X^j$ are independent $N(1, \sigma_2^2)$ variables conditioned to lie in $(0,1)$. The third density is $p_3 = q = U(0,1)^5$. We take $\sigma_2 = .2236$ and $\sigma_1 = .75 \times \sigma_2 = .1677$.

Trivially, $\mu = 150$, and using a result on the volume of a simplex, we find $I = 150 - 100/6! + (.75)^5 \times 75/6! = 149.8858$.

We expect that in practice, subject matter knowledge will often allow one to know roughly where the spikes in $f - g$ are, perhaps because $g$ is known to be monotone in its arguments. But we also expect that it will be hard to find densities that precisely match the spikes. Our densities $p_1$ and $p_2$ are meant to mimic this qualitatively correct, but imperfect, knowledge. Our choices of $\sigma_1$ and $\sigma_2$ give rise to some $X$'s falling outside the spike regions and into the $f - g = 0$ region, while at the same time failing to cover certain corners of the spike regions. This should cause naive positivisation of $f - g$, without defensive sampling, to fail.

For the positivisation methods, let $p_- = p_1$ and $p_+ = p_2$. For the mixture methods, we use $\alpha_1 = \alpha_2 = .45$ and $\alpha_3 = .1$. This makes relatively light use of the defensive density $p_3 = q$. We considered these methods:

IID: $X_1, \ldots, X_n$ are iid from $q = U(0,1)^5$, and $\hat I = \hat I_q = n^{-1}\sum_{i=1}^{n} f(X_i)$
CCV: Classical control variates (10), using $g$ and $X_i \sim q$
PM: The positivisation method (24) with $n_\pm = n/2$
PMDM: The positivisation method (24), replacing $p_\pm$ by defensive mixtures $.1q + .9p_\pm$ and using control variates.
MCV-R: Mixture sampling (28) with density control variates
MCV-Rg: MCV-R with an estimated coefficient for $g$ and $\mu$
MCV-D: MCV-R using deterministic mixture sampling as at (29)
MCV-Dg: MCV-Rg with deterministic mixture sampling.

For each method, we conducted 20 independent runs, each with sample size $n = 10^5$, and computed $Q$ from (22). The results are reported in Table 1.
In (28) and (29), g and p appear with a coefficient
of
Using g as a controlvariate reduces the variance by
1. We could also estimatea coefficient
for them,or rea factorof 650 comparedto plain IID sampling.
roughly
place g by a linearcombinationof severalcontrolvariates
The
MCV
methodsreduced the variance still furtherby
while replacing,uby the same linearcombinationof their
integrals.
Table 1. NormalizedMean Squared Errors(22)
j=1
8.
POSITIVISATION EXAMPLE
This exampleis modeledaftersomeintegrands
in computationalfinance,where f f (x)q(x) dx representsthe value
of a financialinstrument.
Suppose thatf h(x)q(x) dx = ,
and thatf is equal to g subjectto a floorof A and a ceiling
of B. The value g(x)q(x) dx mightbe knownin closed
formor be obtainableby some fastalgorithm.
Example 4 illustratesthe essence of this setting,with
functionsmuchsimplerthanthosearisingin finance.
f
IID:
forthe Simulationof Example 4
Method
1ID
CCV
PM
PMDM
MCV-R
MCV-Rg
MCV-D
MCV-Dg
Normalizedmean squared error
4.00
6.07
1.90
5.49
4.47
5.22
2.66
8.17
x
103
x 10
x 103
x 10-2
x 10-2
x 10-2
x 10-2
x 10-2
Owen and Zhou: ImportanceSampling
amountsrangingfromnearly75 (MCV-Dg) to nearly230
(MCV-D). For statisticalsignificance
at the1% level,a variance ratiomustbe largerthan3.32.
The naive PM methodfailed, as expected.The rough
qualitativeaccuracyof P1 and P2, whichwas so well exploitedbytheMCV methods,is notsufficient
withoutsome
defensivemixing.The PMDM methodwas much better
thanthe PM methodand was competitivewiththe MCV
methods.
143
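The setting of Example 4 is simple enough to sketch in a few lines. The code below is a minimal illustration, not the simulation reported in Table 1. It assumes $g(x) = 100(\sum_i x_i - 1)$, $f = \max(\min(g, 300), -25)$, truncated normal components with $\sigma_1 = .1677$ and $\sigma_2 = .2236$, and mixture weights (.45, .45, .1). It implements the random-mixture estimator with every density coefficient $\beta_j$ set to zero, a simplification made here to avoid the regression step. All function names are hypothetical.

```python
import math
import random

D = 5
SIG1, SIG2 = 0.1677, 0.2236   # sigma_1, sigma_2 (values assumed from Example 4)
ALPHA = (0.45, 0.45, 0.10)    # mixture weights for p1, p2 and p3 = q
MU = 150.0                    # integral of g over the unit cube

def g(x):
    return 100.0 * (sum(x) - 1.0)

def f(x):
    # g with a floor of -25 and a ceiling of 300
    return max(min(g(x), 300.0), -25.0)

def exact_value():
    # 150 - 100/6! + (.75)^5 * 75/6!
    six_fact = math.factorial(6)
    return MU - 100.0 / six_fact + 0.75 ** 5 * 75.0 / six_fact

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def trunc_pdf(t, mean, sd):
    # Density at t of N(mean, sd^2) conditioned to lie in (0, 1)
    mass = norm_cdf((1.0 - mean) / sd) - norm_cdf((0.0 - mean) / sd)
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi) * mass)

def trunc_draw(rng, mean, sd):
    # Rejection sampling: redraw until the normal variate lands in (0, 1)
    while True:
        t = rng.gauss(mean, sd)
        if 0.0 < t < 1.0:
            return t

def mixture_density(x):
    p1 = math.prod(trunc_pdf(t, 0.0, SIG1) for t in x)
    p2 = math.prod(trunc_pdf(t, 1.0, SIG2) for t in x)
    return ALPHA[0] * p1 + ALPHA[1] * p2 + ALPHA[2] * 1.0  # q = 1 on the cube

def draw_mixture(rng):
    u = rng.random()
    if u < ALPHA[0]:
        return [trunc_draw(rng, 0.0, SIG1) for _ in range(D)]
    if u < ALPHA[0] + ALPHA[1]:
        return [trunc_draw(rng, 1.0, SIG2) for _ in range(D)]
    return [rng.random() for _ in range(D)]

def mixture_estimate(n, seed=0):
    # mu + average of (f - g) q / p_alpha over a random mixture sample
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = draw_mixture(rng)
        total += (f(x) - g(x)) / mixture_density(x)
    return MU + total / n
```

Because the defensive component has weight .1, the ratio $q/p_\alpha$ never exceeds 10, so the summands stay bounded even where $p_1$ and $p_2$ miss corners of the spike regions. Comparing `mixture_estimate(n)` with `exact_value()` for moderate n gives a rough check of the setup.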
[Received July 1998. Revised July 1999.]

REFERENCES

Arsham, H., Feuerverger, A., McLeish, D., Kreimer, J., and Rubinstein, R. (1989), "Sensitivity Analysis and the 'What if' Problem in Simulation Analysis," Mathematical and Computer Modelling, 12, 193-219.
Bratley, P., Fox, B. J., and Schrage, L. E. (1987), A Guide to Simulation (2nd ed.), New York: Springer-Verlag.
Golub, G. H., and Van Loan, C. F. (1983), Matrix Computations, Baltimore: Johns Hopkins University Press.
Hammersley, J., and Handscomb, D. (1964), Monte Carlo Methods, London: Methuen.
Hesterberg, T. (1988), "Advances in Importance Sampling," unpublished doctoral thesis, Stanford University.
Hesterberg, T. (1995), "Weighted Average Importance Sampling and Defensive Mixture Distributions," Technometrics, 37(2), 185-194.
Kahn, H., and Marshall, A. (1953), "Methods of Reducing Sample Size in Monte Carlo Computations," Journal of the Operations Research Society of America, 1, 263-278.
Niederreiter, H. (1992), Random Number Generation and Quasi-Monte Carlo Methods, Philadelphia: SIAM.
Ripley, B. D. (1987), Stochastic Simulation, New York: Wiley.
Torrie, G., and Valleau, J. (1977), "Nonphysical Sampling Distributions in Monte Carlo Free Energy Calculations: Umbrella Sampling," Journal of Computational Physics, 23, 187-199.
Veach, E., and Guibas, L. (1995), "Optimally Combining Sampling Techniques for Monte Carlo Rendering," in SIGGRAPH '95 Conference Proceedings, Reading, MA: Addison-Wesley, pp. 419-428.