Journal of the American Statistical Association, Vol. 95, No. 449 (Mar., 2000), pp. 135-143. Published by the American Statistical Association. Stable URL: http://www.jstor.org/stable/2669533
Safe and Effective Importance Sampling

Art OWEN and Yi ZHOU

We present two improvements on the technique of importance sampling. First, we show that importance sampling from a mixture of densities, using those densities as control variates, results in a useful upper bound on the asymptotic variance. That bound is a small multiple of the asymptotic variance of importance sampling from the best single component density. This allows one to benefit from the great variance reductions obtainable by importance sampling, while protecting against the equally great variance increases that might take the practitioner by surprise. The second improvement is to show how importance sampling from two or more densities can be used to approach a zero sampling variance even for integrands that take both positive and negative values.

KEY WORDS: Control variates; Monte Carlo; Rare events; Reliability; Value at risk; Variance reduction.

Art Owen is Professor, Department of Statistics, Stanford University, Stanford, CA 94025 (E-mail: [email protected]). Yi Zhou is Associate, Goldman-Sachs, New York, NY 10005. This work was supported by the National Science Foundation. The authors thank Bennett Fox, Paul Glasserman, Philip Heidelberger, Tim Hesterberg, and Eric Veach for discussions and the reviewers for comments that greatly improved the presentation.

© 2000 American Statistical Association. Journal of the American Statistical Association, March 2000, Vol. 95, No. 449, Theory and Methods.

1. INTRODUCTION

We consider here the problem of computing

$$I = \int_D f(x)\,q(x)\,dx \qquad (1)$$

for an integrand $f(x)$ and probability density $q(x)$, both defined on the domain $D \subseteq \mathbb{R}^d$. We are especially interested in problems where $f$ is a spiky function, by which we mean that an appreciable fraction of the variance of $f$ may be attributed to a subset of $D$ having relatively small probability under sampling from $q$. Spiky integrands of this sort arise in high-energy physics, Bayesian statistics, computational finance, computer graphics, and in the computation of rare event probabilities, as, for example, in reliability problems.

It is natural to consider Monte Carlo sampling for these problems, especially when $d$ is not small. Furthermore, for spiky integrands it is common to use some form of importance sampling, replacing the nominal density $q$ by a sampling density $p$ that provides more data from the important region containing the spikes.

This article considers two improvements in importance sampling. The first is importance sampling from a mixture density, while using the mixture components as control variates. We show that this method is asymptotically not much worse than importance sampling from the best of the mixture components, even if all but one of those components would have given an infinite variance. The practical benefit is that one can use several densities in the hope that one of them is particularly good, without losing too much if one or more of them is particularly bad. This method is also useful when several integrals, to be estimated from the same sample, have different important regions.

For the second improvement, we introduce a technique of positivisation. The integrand is split into nonnegative and nonpositive parts, and importance sampling is applied to each of them. This allows one to approach zero variance even for integrands taking both signs. One can also positivise the difference between the integrand and a control variate. The method combines naturally with the mixture sampling described earlier.

For definiteness, our discussion is couched in terms of $X$ having density $q$ over $D \subseteq \mathbb{R}^d$. Our main results extend to discrete random variables $X$, on replacing densities by probability mass functions.

1.1 Outline

In Section 2 we review importance sampling and show how it can succeed spectacularly when $p$ is nearly proportional to $qf$. But in Example 1 given there, importance sampling fails spectacularly, even though $p$ is nearly proportional to $qf$ in the important region. The cause is lack of proportionality outside of the important region. The method of defensive importance sampling (Hesterberg 1995) addresses this problem by sampling from a mixture of $p$ and $q$. But defensive importance sampling can cause a great deterioration when ordinary importance sampling would have worked; see Example 2.

It is possible to get the best of both worlds, benefiting from importance sampling when it works well while protecting against its failings. This safe and effective importance sampling can be obtained by using $p$ and $q$ as control variates while sampling from a mixture of $p$ and $q$. Section 3 presents the method of control variates as it is used in combination with importance sampling. Theorem 1 there gives conditions under which estimated control variate coefficients are essentially equivalent to the optimal ones.

Section 4 presents a method of importance sampling from a finite mixture of densities. We show that using the individual mixture components as control variates yields an asymptotic variance not much, if any, larger than that of importance sampling from the single best mixture component. Theorem 2 proves this assuming optimal control variate coefficients, and Section 4.1 gives mild conditions under which we can expect our sample coefficients to behave like the optimal ones. Although most of the results in Section 4 are presented for simple sampling from a mixture distribution, a form of stratified sampling with deterministic mixture component sizes presented there is preferred in practice.

Section 5 presents the method of multiple importance sampling due to Veach and Guibas (1995). Like defensive importance sampling, multiple importance sampling is motivated by the desire to pool importance sampling methods and get nearly the best performance.

Section 6 is a simulation of Examples 1 and 2. The proposed hybrid methods are nearly best on both examples.

Section 7 introduces a version of multiple importance sampling that can approach zero variance on integrands taking both signs. We show there how to exploit a control variate in conjunction with multiple importance sampling.

Section 8 gives an example in which positivisation and multiple importance sampling are combined. In this example we replace the integrand by a spiky one and obtain a large variance reduction by mixture sampling.

1.2 Background

An early use of mixture sampling was by Torrie and Valleau (1977). Arsham, Feuerverger, McLeish, Kreimer, and Rubinstein (1989) advocated using the likelihood ratio as a control variate in importance sampling. Hesterberg (1995) considered using this ratio as a control variate in combination with defensive mixture sampling, and obtained an asymptotic bound on the ratio of the resulting variance to that of sampling from the nominal density. This appears to be the first published result of this kind, although it appeared earlier in a dissertation (Hesterberg 1988).

Our contribution is to extend the theoretical result to more general mixture samples, to describe conditions under which estimated coefficients approach the true ones, and to combine it with positivisation. Multiple importance sampling was proposed by Veach and Guibas (1995) for problems in computer graphics. Our positivisation technique is based on this.

2. IMPORTANCE SAMPLING

This section reviews importance sampling and introduces our notation. Integrals without an explicit domain are assumed to be over $D$. For brevity, we sometimes omit the arguments of functions, for example, in writing $q$ for $q(x)$, when the argument $x$ is clear from context.

2.1 Basic Importance Sampling

In importance sampling, we sample $X_i$ independently from a density $p$, with $p(x) > 0$ whenever $f(x)q(x) \ne 0$, and then estimate $I$ by

$$\hat I_p = \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)\,q(X_i)}{p(X_i)}. \qquad (2)$$

It is easy to see that $E(\hat I_p) = I$. There are two practical constraints on importance sampling: It must be feasible to sample from $p$, and we must be able to compute $fq/p$. For the second constraint, assuming that we can compute $f$, it suffices to be able to compute the ratio $q/p$, which can be simpler than computing $p$ and $q$ separately.

By elementary manipulations, we find $\mathrm{var}(\hat I_p) = \sigma_p^2/n$, where

$$\sigma_p^2 = \int \Bigl(\frac{f(x)q(x)}{p(x)} - I\Bigr)^2 p(x)\,dx = \int \frac{f(x)^2 q(x)^2}{p(x)}\,dx - I^2 \qquad (3)$$

is referred to as the asymptotic variance. The density $p^*$ that minimizes this asymptotic variance is known to be proportional to $|f(x)|\,q(x)$ (Kahn and Marshall 1953).

In the rest of this section, we consider the common special case with $f(x) \ge 0$ and $I > 0$. Then we may write $p^*(x) = f(x)q(x)/I$. This density gives $\sigma_{p^*}^2 = 0$ but does not satisfy the practical constraint, because $fq/p^* = I$, and $I$ is unknown.

The value in knowing that $p^* = fq/I$ is optimal is that it suggests that a good importance sampling density should be roughly proportional to $fq$. Let $r(x) = f(x)q(x) - Ip(x)$. This $r(x)$ is the residual from proportionality, which satisfies $\int r(x)\,dx = 0$. We can rewrite (3) as

$$\sigma_p^2 = \int \frac{r(x)^2}{p(x)}\,dx. \qquad (4)$$

Note that (4) holds without assuming $f \ge 0$.

2.2 Failure of Importance Sampling

Importance sampling can fail dramatically, even when $p$ matches $fq/I$ well near the mode(s) of $fq/I$. The cause is the appearance of $p(x)$ in the denominator of (3) and (4). If $p(x)$ decreases toward 0 faster than $f^2(x)q^2(x)$ as $x$ moves away from its mode(s), we can find that $\mathrm{var}(\hat I_p) = \infty$. The irony is that this infinite variance may be due to a region of $D$ that is unimportant in ordinary Monte Carlo sampling.

To illustrate this point, we present Example 1. We use the beta density function

$$B(x, a, b) = \frac{x^{a-1}(1-x)^{b-1}}{\Gamma(a)\Gamma(b)/\Gamma(a+b)}, \qquad 0 < x < 1,$$

where $a > 0$ and $b > 0$ are parameters and $\Gamma(z)$ is the gamma function. For vectors in the unit cube, we use superscripts to denote their components.

Example 1. Let $D = (0,1)^5$, with nominal density $q(x) = U(0,1)^5$. The integrand is

$$f_1(x) = .9 \prod_{j=1}^5 B(x^j, 20, 20) + .1 \prod_{j=1}^5 B(x^j, 2, 2), \qquad (5)$$

and the importance sampling density is

$$p(x) = \prod_{j=1}^5 B(x^j, 20, 20). \qquad (6)$$

It is easy to show that $I = 1$ and $\sigma_p^2 = \infty$ for Example 1. As Figure 1 shows, this density is very nearly proportional to $f_1 q$, so we might have expected a good result. In similar plots using $f_1$ and $.9p$, the curves are visually indistinguishable.
Figure 1. Plots of the Integrand $f_1(x)$ (Solid) and the Density $p(x)$ (Dashed) Along Two Transects Through the Unit Cube $(0,1)^5$. In (a) $x = (z, .5, .5, .5, .5)$, and in (b) $x = (z, z, z, z, z)$, both for $0 < z < 1$. The density is nearly proportional to the integrand near the mode, but gives an infinite variance if used in importance sampling. [Plot not reproduced: panels (a) and (b), horizontal axis $z$ from 0.0 to 1.0, vertical axis from 0 to 3000.]

2.3 Defensive Importance Sampling

The failure of importance sampling can be countered by defensive importance sampling (Hesterberg 1995). Let $p(x)$ be a density that is thought to be a good approximation to $fq/I$, at least in the important part of $D$. Pick $\alpha_1$ with $0 < \alpha_1 < 1$ and define the mixture density

$$p_\alpha(x) = \alpha_1 q(x) + \alpha_2 p(x), \qquad (7)$$

where $\alpha_2 = 1 - \alpha_1$. Here $\alpha$ is the vector $(\alpha_1, \alpha_2)$. By mixing in some of $q$, we prevent $p_\alpha$ from being much smaller than $q$ anywhere. Because $p_\alpha \ge \alpha_1 q$, we find that

$$\sigma_{p_\alpha}^2 = \int \frac{f(x)^2 q(x)^2}{p_\alpha(x)}\,dx - I^2 \le \frac{1}{\alpha_1}\int f(x)^2 q(x)\,dx - I^2 = \frac{1}{\alpha_1}\bigl(\sigma_q^2 + \alpha_2 I^2\bigr). \qquad (8)$$

Equation (8) provides a kind of insurance against the worst effects of importance sampling. If the nominal density provides a finite variance, then defensive importance sampling will as well.

Hesterberg (1995) recommended using $\alpha_1$ between .1 and .5. Spiky nonnegative integrands can have $\sigma_q \gg I > 0$, in which case this advice will approximately bound the sampling variance by between 2 and 10 times what it would be under the nominal density.

2.4 Failure of Defensive Importance Sampling

Defensive importance sampling can greatly increase the variance over what it would have been with ordinary importance sampling. In other words, the premium that we pay for the insurance (8) can be very high. The root of the problem is that if $p$ is nearly proportional to $fq$, then $\alpha_1 q + \alpha_2 p$ will not be, apart from trivial cases with $q$ nearly proportional to $fq$.

The following is an example where defensive importance sampling fails, in that it destroys the near proportionality of the original nondefensive method.

Example 2. Let $D = (0,1)^5$, with nominal density $q(x) = U(0,1)^5$. The integrand is

$$f_2(x) = \prod_{j=1}^5 B(x^j, 20, 20) + 0.1 \prod_{j=1}^5 \sin\bigl(60\pi(x^j - 1/3)\bigr)\,1_{1/3 < x^j < 2/3}, \qquad (9)$$

and the importance sampling density is (6), from Example 1. The integral here is $I = 1$, and the functions $f_2$ and $p$ are visually indistinguishable. Example 2 is considered further in Section 6, where defensive importance sampling, as expected, greatly increases the variance.

In some settings, we can show that as $p$ differs from proportionality by $O(\epsilon)$, defensive sampling causes a loss of efficiency of $O(\epsilon^{-2})$ as $\epsilon \to 0$. Example 3 provides a simple illustration of this effect, with $\epsilon = a - b$.

Example 3. Let $D = (0,1)$, with nominal density $q = U(0,1)$. The integrand is $f(x) = x^a$, for some $a > 1$, and $p(x) = (b+1)x^b$, for $b > 1$. Here (3) gives $\sigma_p^2 = 1/\{(b+1)(2a-b+1)\} - 1/(a+1)^2 = \epsilon^2/[(a+1)^2\{(a+1)^2 - \epsilon^2\}]$, which is $O(\epsilon^2)$, whereas the defensive mixture $p_\alpha$ is not proportional to $fq$ even when $a = b$, so $\sigma_{p_\alpha}^2$ does not tend to 0.

3. CONTROL VARIATES WITH IMPORTANCE SAMPLING

The method of control variates uses knowledge of one or more integrals to reduce variance in the estimate of $I$. The basic method is described in texts such as those by Bratley, Fox, and Schrage (1987), Hammersley and Handscomb (1964), and Ripley (1987). Here we present control variates in combination with importance sampling. The integrand need not be nonnegative.

Suppose that we know $\int h_j(x)\,dx = \theta_j$, $j = 1, \dots, m$. We assume that $p(x) > 0$ if any $h_j(x) \ne 0$, or if $f(x)q(x) \ne 0$. Let $\beta = (\beta_1, \dots, \beta_m)'$ be a vector of real values. Under independent sampling of $X_i$ from $p(x)$,

$$\hat I_{p,\beta} = \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)q(X_i) - \sum_{j=1}^m \beta_j h_j(X_i)}{p(X_i)} + \sum_{j=1}^m \beta_j \theta_j \qquad (10)$$

is an unbiased estimate of $I$. The variance of $\hat I_{p,\beta}$ is $\sigma_{p,\beta}^2/n$, where

$$\sigma_{p,\beta}^2 = \int \Bigl(\frac{f(x)q(x) - \sum_{j=1}^m \beta_j h_j(x)}{p(x)} - I + \sum_{j=1}^m \beta_j \theta_j\Bigr)^2 p(x)\,dx. \qquad (11)$$
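For a concrete instance of (10) with a single control variate, the sketch below uses a toy problem of our own choosing rather than one of the paper's examples: $q = U(0,1)$, $f(x) = e^x$ (so $I = e - 1$), sampling density $p(x) = (1+x)/1.5$, and control variate $h(x) = x$ with $\theta = 1/2$. The coefficient is estimated by the least squares regression of $fq/p$ on $h/p$ that the paper goes on to describe. All function names are hypothetical.

```python
import math
import random

# Hypothetical toy problem (not from the paper): q = U(0,1), f(x) = exp(x),
# so I = e - 1. Sampling density p(x) = (1 + x) / 1.5 on (0, 1); one control
# variate h(x) = x with known integral theta = 1/2.


def p(x):
    return (1.0 + x) / 1.5


def sample_p(rng):
    """Inverse-CDF draw from p: solve (x + x**2 / 2) / 1.5 = u for x in [0, 1]."""
    u = rng.random()
    return -1.0 + math.sqrt(1.0 + 3.0 * u)


def cv_is_estimate(xs, beta, f, q, h, theta):
    """Estimator (10) with m = 1: mean of (f q - beta h) / p, plus beta * theta."""
    return sum((f(x) * q(x) - beta * h(x)) / p(x) for x in xs) / len(xs) + beta * theta


def regression_beta(xs, f, q, h):
    """Least-squares slope of f q / p on h / p, with an intercept in the fit."""
    ys = [f(x) * q(x) / p(x) for x in xs]
    zs = [h(x) / p(x) for x in xs]
    ybar, zbar = sum(ys) / len(ys), sum(zs) / len(zs)
    sxy = sum((z - zbar) * (y - ybar) for y, z in zip(ys, zs))
    szz = sum((z - zbar) ** 2 for z in zs)
    return sxy / szz


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [sample_p(rng) for _ in range(20_000)]
    f, q, h = math.exp, (lambda x: 1.0), (lambda x: x)
    beta = regression_beta(xs, f, q, h)
    print("plain IS:            ", cv_is_estimate(xs, 0.0, f, q, h, 0.5))
    print("with control variate:", cv_is_estimate(xs, beta, f, q, h, 0.5))
    print("exact:               ", math.e - 1)
```

Setting `beta = 0.0` recovers plain importance sampling (2); the least squares slope minimizes the sample version of (11), so the summands of (10) can only become less variable.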
Let $\beta^*$ minimize the integral in (11) over $\beta$, for the given functions $f$, $p$, and $q$. Equation (11) suggests that an estimate $\hat\beta$ of $\beta^*$ can be found by a multiple regression (including an intercept term) of $f(X_i)q(X_i)/p(X_i)$ on predictors $h_j(X_i)/p(X_i)$. Because the regression includes an intercept coefficient $\hat\beta_0$, the residuals will sum to 0. As a result, (10) with $\hat\beta$ simplifies to

$$\hat I_{p,\hat\beta} = \hat\beta_0 + \sum_{j=1}^m \hat\beta_j \theta_j.$$

Theorem 1. Suppose that there is a unique vector $\beta^*$ that minimizes $\sigma_{p,\beta}^2$, and let $\hat\beta$ be determined by least squares as described earlier. Suppose further that the expectations under sampling from $p$ of $h_j^4/p^4$ and $h_j^2 f^2 q^2/p^4$ exist and are finite, for $j = 1, \dots, m$. Then

$$\hat\beta_j = \beta_j^* + O_p(n^{-1/2}), \qquad j = 1, \dots, m, \qquad (12)$$

and

$$\hat I_{p,\hat\beta} = \hat I_{p,\beta^*} + O_p(n^{-1}). \qquad (13)$$

Proof. The unique optimal $\beta^*$ is a function of some expected values of cross-products under sampling from $p$. These are the cross-products arising in a regression of $fq/p$ on predictors $h_j/p$. The uniqueness of $\beta^*$ implies that each $\hat\beta_j$ is a differentiable function of sample means of the cross-products used in the regression. The assumed finite moments ensure that sample versions of these means converge to population values at the $O_p(n^{-1/2})$ rate, establishing (12).

To establish (13), write $\hat\theta_j = n^{-1}\sum_{i=1}^n h_j(X_i)/p(X_i) = \theta_j + O_p(n^{-1/2})$. Then

$$\hat I_{p,\hat\beta} - \hat I_{p,\beta^*} = \sum_{j=1}^m (\hat\beta_j - \beta_j^*)(\theta_j - \hat\theta_j),$$

and each term is a product of two $O_p(n^{-1/2})$ quantities, so the difference is $O_p(n^{-1})$.

Theorem 1 shows that although $\hat\beta$ approaches $\beta^*$ only at the standard Monte Carlo rate, the effect of substituting $\hat\beta_j$ for the unknown optimal $\beta_j^*$ is asymptotically negligible. For this reason, it is customary to analyze control variate methods as if the unknown optimal values were being used, whereas in practice one uses the estimated coefficients. This practice is usually reasonable in Monte Carlo sampling, where $n$ is large. Because $\hat I_{p,\hat\beta} - \hat I_{p,\beta^*}$ is a sum of $m$ terms, $n$ should be large compared to $m$, as it typically is.

4. MIXTURE SAMPLING

Suppose that we have a list of density functions $p_j$, $j = 1, \dots, m$. In defensive importance sampling, the $p_j$ include the nominal density and another thought to be nearly proportional to $fq$. In other settings we may have a list of densities, of which we hope that one or more is roughly proportional to $fq$. Finally, we may have more than one integrand to consider in our simulation, and each $p_j$ may be customized for a subset of these integrands. These densities may have been suggested by subject matter knowledge, or may have been found by numerical search.

We sample from the mixture density $p_\alpha(x) = \sum_{j=1}^m \alpha_j p_j(x)$, where $\alpha_j > 0$ and $\sum_{j=1}^m \alpha_j = 1$. Because $\int p_j(x)\,dx = 1$ and $p_\alpha(x) > 0$ whenever $p_j(x) > 0$, we can use the $p_j$ as control variates as described in Section 3. We write the resulting estimator as

$$\hat I_{\alpha,\beta} = \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)q(X_i) - \sum_{j=1}^m \beta_j p_j(X_i)}{p_\alpha(X_i)} + \sum_{j=1}^m \beta_j, \qquad (14)$$

reserving the notation $\tilde I_{\alpha,\beta}$ for deterministic mixture sampling, introduced in Section 4.3. The asymptotic variance $\sigma_{\alpha,\beta}^2$ of $\hat I_{\alpha,\beta}$ is given by (11) with $h_j = p_j$ and $\theta_j = 1$. This is a positive semidefinite quadratic function in $\beta$. Its minimizer is not unique, so the uniqueness condition of Theorem 1 fails: for any scalar $c$, we have $\hat I_{\alpha,\beta + c\alpha} = \hat I_{\alpha,\beta}$, and so if $\beta^*$ is a minimizer of $\sigma_{\alpha,\beta}^2$, then so is $\beta^* + c\alpha$. Section 4.1 describes how to apply Theorem 1 to this setting.

Theorem 2 shows that $\hat I_{\alpha,\beta}$ is unbiased, and that for an optimal $\beta$, the variance is never larger than what one gets from an importance sample of size $n\alpha_j$ from $p_j$.

Theorem 2. Let $p_j$, $\alpha_j$, and $\hat I_{\alpha,\beta}$ be as above. If, for every $x$ with $f(x)q(x) \ne 0$, at least one $p_j(x) > 0$, then for any $\beta$ we have $E(\hat I_{\alpha,\beta}) = I$. Let $\sigma_{\alpha,\beta}^2$ be the asymptotic variance of $\hat I_{\alpha,\beta}$, and let $\sigma_{p_j}^2$ be the asymptotic variance (3) under importance sampling from $p_j$. If $\beta^*$ is any minimizer of $\sigma_{\alpha,\beta}^2$, then

$$\sigma_{\alpha,\beta^*}^2 \le \min_{1 \le j \le m} \alpha_j^{-1} \sigma_{p_j}^2. \qquad (15)$$

Proof. To establish unbiasedness, write

$$E(\hat I_{\alpha,\beta}) = \int \frac{f(x)q(x) - \sum_{j=1}^m \beta_j p_j(x)}{p_\alpha(x)}\,p_\alpha(x)\,dx + \sum_{j=1}^m \beta_j = I.$$

Next, we prove that $\sigma_{\alpha,\beta^*}^2 \le \sigma_{p_1}^2/\alpha_1$. Consider the particular vector $\beta$ having $\beta_1 = 0$ and $\beta_j = -I\alpha_j/\alpha_1$ for $j > 1$. Let $r_1(x) = f(x)q(x) - Ip_1(x)$, so $\int r_1(x)\,dx = 0$. Substituting these values, we find that for this $\beta$,

$$\sigma_{\alpha,\beta}^2 = \int \Bigl(\frac{f(x)q(x) + I\alpha_1^{-1}\bigl(p_\alpha(x) - \alpha_1 p_1(x)\bigr)}{p_\alpha(x)} - I\alpha_1^{-1}\Bigr)^2 p_\alpha(x)\,dx = \int \frac{r_1(x)^2}{p_\alpha(x)}\,dx \le \frac{1}{\alpha_1}\int \frac{r_1(x)^2}{p_1(x)}\,dx = \frac{\sigma_{p_1}^2}{\alpha_1},$$

using (4). Now $\sigma_{\alpha,\beta^*}^2 \le \sigma_{\alpha,\beta}^2 \le \alpha_1^{-1}\sigma_{p_1}^2$. By making similar arguments for $j = 2, \dots, m$, (15) is established.

4.1 Estimating the Coefficients

We now turn to the problem of finding good control variate coefficients $\beta_j$ to use in $\hat I_{\alpha,\beta}$. Theorem 1 provides sufficient conditions under which estimated control variate coefficients are essentially as good as the optimal ones: uniqueness of the minimizer and finite expectations for $p_j^4/p_\alpha^4$ and $p_j^2 f^2 q^2/p_\alpha^4$.

Because $\sum_j \alpha_j p_j(x)/p_\alpha(x) = 1$ for all $x$, the regression described in Section 3 is singular, and special care must be taken. Let us assume affine independence of the $p_j$; that is, if $\gamma_0 + \sum_{j=1}^m \gamma_j p_j(x) = 0$ for all $x$, then all $\gamma_j = 0$. Under this condition, dropping one control variate $p_{j'}/p_\alpha$ from the regression is equivalent to selecting the unique minimizer of (11) over $\beta_j$, $j \ne j'$, with $\beta_{j'} = 0$. We could also drop the intercept term.

Because $p_j^4(x)/p_\alpha^4(x) \le \alpha_j^{-4}$ for all $x$, the first moment condition follows easily. Now suppose that for at least one of the $p_j$, the asymptotic variance $\sigma_{p_j}^2$ from (3) is finite. Then for every $k$,

$$E\Bigl(\frac{p_k^2 f^2 q^2}{p_\alpha^4}\Bigr) = \int \frac{p_k^2 f^2 q^2}{p_\alpha^3}\,dx \le \alpha_k^{-2}\int \frac{f^2 q^2}{p_\alpha}\,dx \le \alpha_k^{-2}\alpha_j^{-1}\int \frac{f^2 q^2}{p_j}\,dx = \alpha_k^{-2}\alpha_j^{-1}\bigl(\sigma_{p_j}^2 + I^2\bigr) < \infty.$$

Thus if the densities $p_j$ are affinely independent, and at least one of them gives rise to a finite importance sampling variance, then we can expect estimated control variate coefficients to behave like the optimal ones in the way described by Theorem 1.

We prefer to use a singular value decomposition (SVD) to compute the regression coefficients $\hat\beta_j$ (see Golub and Van Loan 1983). The SVD effectively drops a linear combination of predictors from the regression. The SVD will still work if there are additional dependencies among the regressors $p_j/p_\alpha$, which would require dropping two or more predictors. Such additional singularities might arise by accident or design, as, for example, if one of the $p_j$ is a mixture of some others.

4.2 Interpretation of Theorem 2

Theorem 2 shows that with an optimal coefficient vector $\beta$, we get an asymptotic variance that is at least as good as we would have had with $n\alpha_j$ observations from $p_j$. It is hard to imagine that we could do better in general, because we get on average $n\alpha_j$ observations from $p_j$, and $p_j$ may be the only one of our component densities that would have given a finite asymptotic variance in importance sampling.

For defensive importance sampling, we find that $\sigma_{\alpha,\beta^*}^2 \le \sigma_q^2/\alpha_1$ as insurance, removing the influence of $I$ in (8). The premium that we pay is bounded because $\sigma_{\alpha,\beta^*}^2 \le \sigma_p^2/\alpha_2$. We can also find this by reversing the roles of $p$ and $q$ in defensive importance sampling with a control variate.

There is still the worry that one or more bad sampling densities $p_j$ could distort the sample value $\hat\beta$ enough to destroy the asymptotic equivalence of $\hat I_{\alpha,\hat\beta}$ and $\hat I_{\alpha,\beta^*}$. The discussion in Section 4.1 shows that this will not happen as long as at least one of the component densities has $\sigma_{p_j}^2 < \infty$.

4.3 Deterministic Mixture Sampling

In a deterministic mixture sample, one takes $n_j = n\alpha_j$ observations (or an integer close to $n\alpha_j$) from the density $p_j$. Let $X_{ji} \sim p_j$ be independent, for $j = 1, \dots, m$ and $i = 1, \dots, n_j$. Incorporating control variates $p_j$, the resulting estimate is

$$\tilde I_{\alpha,\beta} = \frac{1}{n}\sum_{j=1}^m \sum_{i=1}^{n_j} \frac{f(X_{ji})q(X_{ji}) - \sum_{k=1}^m \beta_k p_k(X_{ji})}{p_\alpha(X_{ji})} + \sum_{j=1}^m \beta_j. \qquad (16)$$

The estimate (16) is unbiased. When $n\alpha_j$ is an integer, then under deterministic sampling with $n_j = n\alpha_j$, we have $\mathrm{var}(\tilde I_{\alpha,\beta}) \le \mathrm{var}(\hat I_{\alpha,\beta})$. The difference arises from eliminating randomness in the mixture component sample sizes. Hesterberg (1995) called this stratification.

The unbiasedness of estimates from deterministic mixture sampling extends to the moments and sample cross-moments of the control variates $p_k/p_\alpha$, and of $fq/p_\alpha$. It follows that the results of Theorem 1 also apply to deterministic mixture sampling. Deterministic mixture sampling obtains superior estimates of the regression coefficients that would have been optimal for random mixture sampling. There remains the possibility of defining and estimating still better coefficients, optimal under deterministic sampling.
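A minimal sketch of deterministic mixture sampling (16) with the mixture components as control variates, applied to Example 1 with $p_1 = q$, $p_2 = p$, and $\alpha_1 = \alpha_2 = .5$. It is an illustration under stated assumptions, not the authors' implementation: because the two control variates $q/p_\alpha$ and $p/p_\alpha$ are affinely dependent, it drops $q/p_\alpha$ as Section 4.1 allows and fits a one-predictor least squares regression with an intercept, instead of the SVD the paper prefers. The function names are hypothetical.

```python
import math
import random


def beta_pdf(x, a, b):
    """Beta density B(x, a, b) on (0, 1); zero outside."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def prod_beta(x, a, b):
    return math.prod(beta_pdf(xj, a, b) for xj in x)


def f1(x):
    """Integrand (5) of Example 1; q = 1 on the unit cube, so f q = f."""
    return 0.9 * prod_beta(x, 20, 20) + 0.1 * prod_beta(x, 2, 2)


def mcv_deterministic(n=10_000, alpha1=0.5, seed=1, d=5):
    """Deterministic mixture sampling (16) with p_1 = q, p_2 = p and estimated beta.

    Returns (control variate estimate, estimate with beta = 0).
    """
    rng = random.Random(seed)
    alpha2 = 1.0 - alpha1
    n1 = round(n * alpha1)                       # n_1 = n * alpha_1 draws from q
    xs = [[rng.random() for _ in range(d)] for _ in range(n1)]
    xs += [[rng.betavariate(20, 20) for _ in range(d)] for _ in range(n - n1)]  # from p

    ys, zs = [], []
    for x in xs:
        p2 = prod_beta(x, 20, 20)
        p_alpha = alpha1 + alpha2 * p2           # alpha_1 q + alpha_2 p, with q = 1
        ys.append(f1(x) / p_alpha)               # response f q / p_alpha
        zs.append(p2 / p_alpha)                  # control variate p / p_alpha, integral 1

    # q / p_alpha = (1 - alpha2 * z) / alpha1 is affine in z, so it is dropped
    # (beta_1 = 0) and the intercept takes its place, as in Section 4.1.
    ybar, zbar = sum(ys) / n, sum(zs) / n
    sxy = sum((z - zbar) * (y - ybar) for y, z in zip(ys, zs))
    szz = sum((z - zbar) ** 2 for z in zs)
    beta2 = sxy / szz
    beta0 = ybar - beta2 * zbar
    return beta0 + beta2, ybar                   # beta0 + beta2 * theta_2, theta_2 = 1


if __name__ == "__main__":
    est, plain = mcv_deterministic()
    print("with control variates:", est, " beta = 0:", plain, " exact: 1")
```

Every quantity in the regression is bounded here ($p/p_\alpha \le 1/\alpha_2$), which is the situation Section 4.1 describes: one component ($q$) has finite importance sampling variance, so the estimated coefficient behaves like the optimal one.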
5. MULTIPLE IMPORTANCE SAMPLING

Multiple importance sampling was introduced by Veach and Guibas (1995). They considered path sampling algorithms for rendering images in computer graphics. One of their goals was to combine several importance sampling strategies to get performance nearly equal to that of the best strategy. In their examples, the optimal method for rendering an image can differ from pixel to pixel in an unpredictable way.

Let $w_j(x)$, $j = 1, \dots, m$, be a partition of unity; that is, for every $x \in D$, $0 \le w_j(x) \le 1$ and $\sum_{j=1}^m w_j(x) = 1$. Define

$$\hat I_{n,w} = \sum_{j=1}^m \frac{1}{n_j}\sum_{i=1}^{n_j} w_j(X_{ji})\,\frac{f(X_{ji})\,q(X_{ji})}{p_j(X_{ji})}, \qquad (17)$$

where $X_{ji}$ are independent draws from $p_j$ and the subscripts on $\hat I$ denote the sample sizes and the partition of unity used. The estimate $\hat I_{n,w}$ is unbiased under mild conditions on the supports of the functions $p_j$ and $w_j$.

Veach and Guibas (1995) considered several ways of selecting $w_j$. Their motivation is to make $fw_j$ "locally proportional" to $p_j$. Their balance heuristic takes weights

$$w_j(x) = \frac{n_j p_j(x)}{\sum_k n_k p_k(x)}. \qquad (18)$$

The result matches (16) with $n_j = n\alpha_j$ and $\beta_j = 0$. Their cutoff heuristic takes

$$w_j(x) \propto n_j p_j(x)\,1\bigl\{n_j p_j(x) \ge \gamma \max_k n_k p_k(x)\bigr\} \qquad (19)$$

for some $0 < \gamma < 1$, and their power heuristic takes

$$w_j(x) \propto \bigl(n_j p_j(x)\bigr)^\rho \qquad (20)$$

for $\rho > 0$. In both heuristics, the weights are normalized to sum to unity. Sending $\rho \to \infty$ in the power heuristic or taking $\gamma = 1$ in the cutoff heuristic gives rise to the maximum heuristic

$$w_j(x) \propto 1\bigl\{n_j p_j(x) = \max_k n_k p_k(x)\bigr\}. \qquad (21)$$

Unless there are ties among the $n_j p_j$, (21) puts all of the weight on one of the $j$'s.

6. EXAMPLES

This section presents simulation results for Examples 1 and 2 described in Section 2. Both examples have an importance sampling density that, when scaled properly, is visually indistinguishable from the integrand. Example 1 has infinite variance under importance sampling, whereas Example 2 has very small variance under importance sampling. Defensive importance sampling cures the problem of Example 1 while losing the accuracy in Example 2.

We compare 11 methods. For each method, we compute an estimate of $I$ using $n = 10^5$ observations. Then, using 20 independent replicates, we compute

$$Q = \frac{n}{20}\sum_{k=1}^{20} (\hat I_k - I)^2, \qquad (22)$$

where $I$ is the true integral value and $\hat I_k$ is the estimate in the $k$th replicate. For methods with negligible bias and finite variance, (22) estimates the asymptotic variance. Different methods were based on independent samples.

We compared the following methods:

IID: $X_1, \dots, X_n$ are iid from $q = U(0,1)^5$, and $\hat I = n^{-1}\sum_{i=1}^n f(X_i)$.
IS: $X_1, \dots, X_n$ are iid from $p(x)$, and $\hat I = n^{-1}\sum_{i=1}^n f(X_i)/p(X_i)$.
DIS: $X_1, \dots, X_n$ are iid from $p_\alpha = \alpha_1 q + \alpha_2 p$, and $\hat I_{p_\alpha} = n^{-1}\sum_{i=1}^n f(X_i)/p_\alpha(X_i)$.
MCV: Use $\tilde I_{\alpha,\hat\beta}$ from (16), with $p_1 = q$, $p_2 = p$, and $n_j = n\alpha_j$.
BAL: Use $\hat I_{n,w}$ from (17), with $p_1 = q$, $p_2 = p$, $n_j = n\alpha_j$, and $w$ from (18).
CUT: Like BAL, except that $w$ is from (19), with $\gamma = .1$ and $n_j = n/2$.
POW: Like BAL, except that $w$ is from (20), with $\rho = 2$ and $n_j = n/2$.
MAX: Like BAL, except that $w$ is from (21), and $n_j = n/2$.

Of these eight methods, we investigated DIS, MCV, and BAL at both $\alpha_1 = .1$ and $\alpha_1 = .5$, bringing the total to 11 methods. These methods are closely related: MCV without control variates is BAL, and BAL with random mixing is DIS.

The results for these examples are plotted in Figure 2. A reference rectangle there, based on an approximate $F_{20,20}$ distribution for $Q$ ratios, can be used to gauge statistical significance.

As expected, IID sampling does badly on both of these spiky integrands. Also as expected, importance sampling does very badly on Example 1, despite having matched the mode well, but does very well on Example 2. DIS with either value of $\alpha_1$ does very well on Example 1, providing the protection that it was designed to provide. But DIS does badly on Example 2 compared to nondefensive IS. As predicted by Theorem 2, MCV does nearly as well as the better of IS and DIS on both examples. The choice of $\alpha_1$ is of secondary importance compared to the choice of method. We believe that as long as no $\alpha_j$ is too small, there is little to gain from optimizing over $\alpha$ that cannot be gained by estimating $\beta$. The various heuristics from Veach and Guibas (1995) are roughly equivalent on these examples.
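The four weighting heuristics (18) to (21), the estimator (17), and the normalized error $Q$ of (22) are short enough to write out directly. This is a sketch with hypothetical function names; each heuristic takes the list of values $n_k p_k(x)$ at a single point and returns normalized weights. The usage example is a one-dimensional toy problem of our own ($q = U(0,1)$, $f(x) = 3x^2$ so $I = 1$, with $p_1 = q$ and $p_2(x) = 2x$), not one of the paper's examples.

```python
import math
import random


def balance_weights(np_vals):
    """Balance heuristic (18): w_j proportional to n_j p_j(x)."""
    total = sum(np_vals)
    return [v / total for v in np_vals]


def cutoff_weights(np_vals, gamma=0.1):
    """Cutoff heuristic (19): drop terms below gamma * max_k n_k p_k(x), then normalize."""
    cut = gamma * max(np_vals)
    kept = [v if v >= cut else 0.0 for v in np_vals]
    total = sum(kept)
    return [v / total for v in kept]


def power_weights(np_vals, rho=2.0):
    """Power heuristic (20): w_j proportional to (n_j p_j(x)) ** rho."""
    powered = [v ** rho for v in np_vals]
    total = sum(powered)
    return [v / total for v in powered]


def max_weights(np_vals):
    """Maximum heuristic (21): all weight on the largest n_j p_j(x), shared over ties."""
    top = max(np_vals)
    hits = [1.0 if v == top else 0.0 for v in np_vals]
    total = sum(hits)
    return [h / total for h in hits]


def mis_estimate(samples, densities, f, q, heuristic):
    """Estimator (17); samples[j] holds the n_j draws from densities[j]."""
    ns = [len(s) for s in samples]
    estimate = 0.0
    for j, xs in enumerate(samples):
        acc = 0.0
        for x in xs:
            np_vals = [n_k * p_k(x) for n_k, p_k in zip(ns, densities)]
            w = heuristic(np_vals)[j]
            if w > 0.0:
                acc += w * f(x) * q(x) / densities[j](x)
        estimate += acc / ns[j]
    return estimate


def normalized_mse(estimates, true_value, n):
    """Q of (22): n times the average squared error over independent replicates."""
    return n * sum((e - true_value) ** 2 for e in estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(3)
    xs1 = [1.0 - rng.random() for _ in range(2000)]             # from p_1 = U(0, 1)
    xs2 = [math.sqrt(1.0 - rng.random()) for _ in range(2000)]  # from p_2(x) = 2x
    dens = [lambda x: 1.0, lambda x: 2.0 * x]
    for rule in (balance_weights, cutoff_weights, power_weights, max_weights):
        est = mis_estimate([xs1, xs2], dens, lambda x: 3 * x * x, lambda x: 1.0, rule)
        print(rule.__name__, est)
```

With equal sample sizes the balance heuristic reproduces the $\beta = 0$ version of (16), as the text notes, which is why BAL and MCV differ only by the control variates.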
Figure 2. Normalized Mean Squared Errors $Q$ From (22), for Examples 1 and 2, for the 11 Sampling Methods Described in the Text. The horizontal axis is for Example 1, and the vertical axis is for Example 2. Statistical significance may be assessed by the plotted rectangle. Points that differ vertically by more than its height differ significantly at about the 1% level for Example 2. Its width has a similar interpretation for Example 1. [Log-scale scatter plot not reproduced; horizontal axis from 0.01 to 100, vertical axis from $10^{-7}$ to $10^{3}$; points labeled IID, IS, DIS.1, DIS.5, BAL.1, BAL.5, MCV.1, MCV.5, CUT, POW, and MAX.]

7. POSITIVISATION

Here we show that (multiple) importance sampling can achieve zero variance for integrands of mixed sign. Then we combine this positivisation method with mixture sampling and control variates.

7.1 Simple Positivisation

For $f$ having mixed sign, write $f = f_+ - f_-$, where $f_+(x) = \max(f(x), 0)$ and $f_-(x) = \max(-f(x), 0)$. Then $I = \int f_+(x)q(x)\,dx - \int f_-(x)q(x)\,dx$. By taking a sample of size $n_+$ from $p_+ \propto f_+ q$ and a sample of size $n_-$ from $p_- \propto f_- q$, it is possible to attain a zero variance estimate,

$$\hat I = \frac{1}{n_+}\sum_{i=1}^{n_+} \frac{f_+(X_{i,+})\,q(X_{i,+})}{p_+(X_{i,+})} - \frac{1}{n_-}\sum_{i=1}^{n_-} \frac{f_-(X_{i,-})\,q(X_{i,-})}{p_-(X_{i,-})}. \qquad (23)$$

There is a practical difficulty in finding $p_\pm$ to attain or approach the optima, but there is no difficulty in evaluating $f_\pm$ at a given value of $x$.

The foregoing decomposition "splits the integrand at 0." We could just as easily write $f(x) = c + (f(x) - c)_+ - (f(x) - c)_-$ for some well-chosen constant $c$. In a personal communication, Bennett Fox described a strategy using a value $c \le \inf_x f(x)$, so that $(f(x) - c)_- = 0$. One applies importance sampling to $(f(x) - c)_+ = f(x) - c$ and adds $c$ to the result.

More generally, suppose that $g(x)$ is a function for which $\int g(x)q(x)\,dx = \mu$ is known. Let $X_{i,\pm}$ be independent samples from $p_\pm$, $i = 1, \dots, n_\pm$, and

$$\hat I_g = \mu + \frac{1}{n_+}\sum_{i=1}^{n_+} \frac{[(f - g)_+ q](X_{i,+})}{p_+(X_{i,+})} - \frac{1}{n_-}\sum_{i=1}^{n_-} \frac{[(f - g)_- q](X_{i,-})}{p_-(X_{i,-})}. \qquad (24)$$

Here and elsewhere, an expression like $[(f - g)_+ q](X)$ replaces $(f(X) - g(X))_+ q(X)$ to shorten formulas. The estimate (24) is unbiased if $p_\pm > 0$ when $(f - g)_\pm q > 0$. The ideal densities $p_\pm$ are now proportional to $(f - g)_\pm q$.

The function $g$ is like a control variate, although it has a known nominal expectation, not a known integral, and is used in a nonstandard way. A good candidate for $g$ would be a function that was close to $f$ over most of $D$ and for which one can guess where the greatest differences are likely to be, to target those regions with densities $p_\pm$.

Another way to split the integrand, using a control variate $h$ with known integral $\int h(x)\,dx = \theta$, is to write

$$\hat I_h = \theta + \frac{1}{n_+}\sum_{i=1}^{n_+} \frac{(fq - h)_+(X_{i,+})}{p_+(X_{i,+})} - \frac{1}{n_-}\sum_{i=1}^{n_-} \frac{(fq - h)_-(X_{i,-})}{p_-(X_{i,-})}. \qquad (25)$$

For a uniform nominal distribution, with $q(x) = 1$, we find that $\hat I_h = \hat I_g$ when $h = g$, but in general they are different. Effective use of $\hat I_h$ can be made using a function $h$ with known integral for which $fq - h$ is small in most places, assuming that one can guess where the greatest differences are likely to be. We prefer (24) to (25), because we think it will be easier to select $g$ to approximate $f$ than to select $h$ to approximate $fq$. For this reason, the generalizations of positivisation that follow are generalizations of (24), not (25).

7.2 Partition of Identity

Importance sampling can be applied with quasi-Monte Carlo sampling (Niederreiter 1992) or other methods that benefit from smooth integrands. But the functions $(f - g)_\pm$ are not necessarily smooth even if $f(x)$ and $g(x)$ are both smooth. It is thus of interest to positivise $f - g$ without losing smoothness. (This discussion does not apply to discrete $X$, where differentiability of $f$ is not advantageous.)

Define a partition of the identity by a set of functions $v_j$, $j = 1, \dots, r$, satisfying

$$z = \sum_{j=1}^r v_j(z), \qquad -\infty < z < \infty. \qquad (26)$$

If, moreover, for each $j$ we have either $v_j(z) \ge 0$ for all $z$ or $v_j(z) \le 0$ for all $z$, we call these functions a semidefinite partition of the identity function. A smooth semidefinite partition of identity can be achieved by

$$v_\pm(z) = \frac{z}{2} \pm \frac{\sqrt{\eta^2 + z^2}}{2},$$

where $\eta > 0$. Our estimate now becomes

$$\hat I = \mu + \sum_{j=1}^r \frac{1}{n_j}\sum_{i=1}^{n_j} \frac{[v_j(f - g)\,q](X_{ji})}{p_j(X_{ji})}, \qquad (27)$$

where $X_{ji}$ are independently sampled from the density $p_j$. The ideal $p_j$ are proportional to $|v_j(f - g)|\,q$.
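Estimators (23) and (24) can be illustrated on one-dimensional toy problems where the ideal densities are available in closed form. Both problems below are our own assumptions, not examples from the paper: for (23), $q = U(0,1)$ and $f(x) = x - 1/2$, so $p_\pm(x) = 8|x - 1/2|$ on the appropriate half interval; for (24), $f(x) = x^2$ and $g(x) = x$ with $\mu = 1/2$, so $(f - g)_+ = 0$ and $p_-$ is the Beta(2, 2) density $6x(1-x)$. With the ideal densities every sampled term is the same constant, so the estimates are exact up to rounding, which is the zero-variance property claimed in the text. The function names are hypothetical.

```python
import math
import random


def positivised_estimate(mu, diff, q, plus, minus, n_plus, n_minus, rng):
    """Estimator (24): mu + mean[(f-g)_+ q / p_+] - mean[(f-g)_- q / p_-].

    diff(x) returns f(x) - g(x); plus and minus are (sampler, density) pairs.
    A half with n = 0 is skipped (pass (None, None) for it), which is valid
    only when the corresponding part (f - g)_+ or (f - g)_- is identically zero.
    """
    total = mu
    for (sampler, density), n, sign in ((plus, n_plus, 1.0), (minus, n_minus, -1.0)):
        if n == 0:
            continue
        acc = 0.0
        for _ in range(n):
            x = sampler(rng)
            part = max(sign * diff(x), 0.0)      # (f-g)_+ if sign = 1, (f-g)_- if sign = -1
            acc += part * q(x) / density(x)
        total += sign * acc / n
    return total


def demo_23(n=1000, seed=0):
    """(23) with g = 0: f(x) = x - 1/2 on U(0,1), so I = 0."""
    rng = random.Random(seed)
    plus = (lambda r: 0.5 + math.sqrt(1.0 - r.random()) / 2.0,   # inverse CDF of p_+
            lambda x: 8.0 * (x - 0.5))
    minus = (lambda r: 0.5 - math.sqrt(1.0 - r.random()) / 2.0,  # inverse CDF of p_-
             lambda x: 8.0 * (0.5 - x))
    return positivised_estimate(0.0, lambda x: x - 0.5, lambda x: 1.0,
                                plus, minus, n, n, rng)


def demo_24(n=1000, seed=0):
    """(24) with f(x) = x**2, g(x) = x, mu = 1/2, so I = 1/3."""
    rng = random.Random(seed)
    minus = (lambda r: r.betavariate(2, 2), lambda x: 6.0 * x * (1.0 - x))
    return positivised_estimate(0.5, lambda x: x * x - x, lambda x: 1.0,
                                (None, None), minus, 0, n, rng)


if __name__ == "__main__":
    print("(23):", demo_23(), "exact: 0")
    print("(24):", demo_24(), "exact:", 1 / 3)
```

Replacing the ideal `p_+`, `p_-` by approximate densities turns these exact answers into ordinary, nonzero-variance estimates; the mixture and control variate machinery of Section 7.3 is aimed at that realistic case.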
7.3 Mixture Sampling and Positivisation

Mixture sampling can be combined with positivisation by applying mixture sampling with control variates to each of the $r$ terms in (27). In principle, a different mixture sampling method with control variates could be applied to each of the $r$ terms. A great simplification occurs if we use the same mixture density and control variates for all $r$ terms, and use a common set of data points $X_{ji}$ for all $r$ integrals. The $v_j$ recombine to form the identity, and instead of $mr$ control variate coefficients, we need only $m$ of them. With independent $X_i \sim p_\alpha$, the resulting estimator is

$$\hat I = \frac{1}{n}\sum_{i=1}^n \frac{[(f - g)q](X_i) - \sum_{k=1}^m \beta_k p_k(X_i)}{p_\alpha(X_i)} + \mu + \sum_{j=1}^m \beta_j, \qquad (28)$$

and using deterministic mixtures, it is

$$\tilde I = \frac{1}{n}\sum_{j=1}^m \sum_{i=1}^{n_j} \frac{[(f - g)q](X_{ji}) - \sum_{k=1}^m \beta_k p_k(X_{ji})}{p_\alpha(X_{ji})} + \mu + \sum_{j=1}^m \beta_j, \qquad (29)$$

where $X_{ji} \sim p_j$ are independent for $i = 1, \dots, n_j$ and $j = 1, \dots, m$. The coefficients $\beta_j$ are estimated by regression of $(f - g)q/p_\alpha$ on predictors $p_k/p_\alpha$, replacing $f$ by $f - g$ in the discussions of Sections 3 and 4.1.

We suggest the following strategy in applications where subject matter knowledge allows it. First, find a suitable proxy function $g$ that is close to $f$ and has a known integral. Then, find densities $p_1$ and $p_2$ that are large where $(f - g)_+$ and $(f - g)_-$ are large. Finally, take a third density $p_3$ as the nominal or some other defensive density. This procedure is illustrated by Example 4 in Section 8. Where $(f - g)_\pm$ has a small number of modes with known locations, a mixture density can be constructed for each mode. If a more general partition of identity having $r$ terms is used, as in (26), then one or more component densities can be designed for each such term.

In (28) and (29), $g$ and $\mu$ appear with a coefficient of 1. We could also estimate a coefficient for them, or replace $g$ by a linear combination of several control variates while replacing $\mu$ by the same linear combination of their integrals.

8. POSITIVISATION EXAMPLE

This example is modeled after some integrands in computational finance, where $\int f(x)q(x)\,dx$ represents the value of a financial instrument. Suppose that $\int g(x)q(x)\,dx = \mu$, and that $f$ is equal to $g$ subject to a floor of $A$ and a ceiling of $B$. The value $\int g(x)q(x)\,dx$ might be known in closed form or be obtainable by some fast algorithm. Example 4 illustrates the essence of this setting, with functions much simpler than those arising in finance.

Example 4. Take $D = (0,1)^5$ and $q = U(0,1)^5$. Let $g(x) = 100\bigl(\sum_{j=1}^5 x^j - 1\bigr)$ and $f(x) = \max\bigl(\min(g(x), 300), -25\bigr)$. Under $p_1$, the components $x^j$ are independent $N(0, \sigma_1^2)$ variables conditioned to lie in $(0,1)$. Under $p_2$, the components $x^j$ are independent $N(1, \sigma_2^2)$ variables conditioned to lie in $(0,1)$. The third density is $p_3 = q = U(0,1)^5$. We take $\sigma_2 \approx .2236$ and $\sigma_1 = .75\,\sigma_2 \approx .1677$. Trivially, $\mu = 150$, and using a result on the volume of a simplex, we find $I = 150 - 100/6! + (.75)^5 \times 75/6! \approx 149.8858$.

We expect that in practice, subject matter knowledge will often allow one to know roughly where the spikes in $f - g$ are, perhaps because $g$ is known to be monotone in its arguments. But we also expect that it will be hard to find densities that precisely match the spikes. Our densities $p_1$ and $p_2$ are meant to mimic this qualitatively correct, but imperfect, knowledge. Our choices of $\sigma_1$ and $\sigma_2$ give rise to some $X$'s falling outside the spike regions and into the $f - g = 0$ region, while at the same time failing to cover certain corners of the spike regions. This should cause naive positivisation of $f - g$, without defensive sampling, to fail.

For the positivisation methods, let $p_+ = p_1$ and $p_- = p_2$, since $f - g > 0$ near the origin (where $f = -25 > g$) and $f - g < 0$ near $(1, \dots, 1)$ (where $f = 300 < g$). For the mixture methods, we use $\alpha_1 = \alpha_2 = .45$ and $\alpha_3 = .1$. This makes relatively light use of the defensive density $p_3 = q$. We considered these methods:

IID: $X_1, \dots, X_n$ are iid from $q = U(0,1)^5$, and $\hat I = n^{-1}\sum_{i=1}^n f(X_i)$.
CCV: Classical control variates (10), using $g$ and $\mu$.
PM: The positivisation method (24) with $n_\pm = n/2$.
PMDM: The positivisation method (24), replacing $p_\pm$ by defensive mixtures $.1q + .9p_\pm$ and using control variates.
MCV-R: Mixture sampling (28) with density control variates.
MCV-Rg: MCV-R with an estimated coefficient for $g$ and $\mu$.
MCV-D: MCV-R using deterministic mixture sampling, as at (29).
MCV-Dg: MCV-Rg with deterministic mixture sampling.

For each method, we conducted 20 independent runs, each with sample size $n = 10^5$, and computed $Q$ from (22). The results are reported in Table 1.

Table 1. Normalized Mean Squared Errors (22) for the Simulation of Example 4

Method     Normalized mean squared error
IID        $4.00 \times 10^{3}$
CCV        $6.07 \times 10^{0}$
PM         $1.90 \times 10^{3}$
PMDM       $5.49 \times 10^{-2}$
MCV-R      $4.47 \times 10^{-2}$
MCV-Rg     $5.22 \times 10^{-2}$
MCV-D      $2.66 \times 10^{-2}$
MCV-Dg     $8.17 \times 10^{-2}$

Using $g$ as a control variate reduces the variance by roughly a factor of 650 compared to plain IID sampling. The MCV methods reduced the variance still further, by factors ranging from nearly 75 (MCV-Dg) to nearly 230 (MCV-D). For statistical significance at the 1% level, a variance ratio must be larger than 3.32.

The naive PM method failed, as expected. The rough qualitative accuracy of $p_1$ and $p_2$, which was so well exploited by the MCV methods, is not sufficient without some defensive mixing. The PMDM method was much better than the PM method and was competitive with the MCV methods.

REFERENCES

(2nd ed.), New York: Springer-Verlag.
Golub, G. H., and Van Loan, C. F. (1983), Matrix Computations, Baltimore: Johns Hopkins University Press.
Hammersley, J., and Handscomb, D. (1964), Monte Carlo Methods, London: Methuen.
Hesterberg, T. (1988), "Advances in Importance Sampling," unpublished doctoral thesis, Stanford University.
Hesterberg, T. (1995), "Weighted Average Importance Sampling and Defensive Mixture Distributions," Technometrics, 37(2), 185-194.
Kahn, H., and Marshall, A. (1953), "Methods of Reducing Sample Size in Monte Carlo Computations," Journal of the Operations Research Society of America, 1, 263-278.
Niederreiter, H. (1992), Random Number Generation and Quasi-Monte

[Received July 1998. Revised July 1999.]
Carlo Methods,Philadelphia:SIAM. Ripley,B. D. (1987), StochasticSimulation,New York:Wiley. in Torrie, G., and Valleau,J.(1977), "NonphysicalSamplingDistributions REFERENCES MonteCarlo Free EnergyCalculations:UmbrellaSampling,"Journalof Arsham,H., Feuerverger, A., McLeish,D., Kreimer,J.,and Rubinstein, R. ComputationalPhysics,23, 187-199. (1989), "Sensitivity Analysisand the 'What if' Problemin Simulation Veach,E., and Guibas,L. (1995), "OptimallyCombiningSamplingTechAnalysis,"Mathematicaland ComputerModelling,12, 193-219. in SIGGRAPH '95 ConferenceProniquesforMonteCarlo Rendering," Bratley,P., Fox, B. J.,and Schrage,L. E. (1987), A Guide to Simulation ceedings,Reading,MA: Addison-Wesley, pp. 419-428.
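As a numerical check on Example 4 (Section 8), this Python/NumPy sketch does three things. It evaluates the closed-form value $I = 150 - 100/6! + (.75)^5 \times 75/6!$. It estimates $I$ by plain IID sampling from $q = U(0,1)^5$. It also forms the simple control variate estimate $\mu + n^{-1}\sum_i (f-g)(X_i)$, that is, $g$ with its coefficient fixed at 1, as in the $\mu$ term of (28) and (29). This is an illustrative assumption, not the CCV estimator (10) used for Table 1, so its error need not match the tabulated values. The sample size $n = 200{,}000$ and the seed are arbitrary.

```python
import math

import numpy as np

# Closed-form value quoted in Example 4: I = 150 - 100/6! + (.75)^5 * 75/6!.
I_closed = 150 - 100 / math.factorial(6) + 0.75**5 * 75 / math.factorial(6)

# Monte Carlo under q = U(0,1)^5 (sample size and seed are arbitrary choices).
rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(size=(n, 5))
g = 100.0 * (x.sum(axis=1) - 1.0)  # proxy g with known integral mu = 150
f = np.clip(g, -25.0, 300.0)       # f = max(min(g, 300), -25)
mu = 150.0

# Plain IID estimate of I and its standard error.
iid = f.mean()
iid_se = f.std(ddof=1) / math.sqrt(n)

# g as a control variate with coefficient fixed at 1 (a simplification):
# only the rare spikes of f - g contribute to the Monte Carlo error.
cv = mu + (f - g).mean()
cv_se = (f - g).std(ddof=1) / math.sqrt(n)

print(f"closed form  {I_closed:.4f}")
print(f"IID          {iid:.4f} (se {iid_se:.4f})")
print(f"mu + (f - g) {cv:.4f} (se {cv_se:.4f})")
```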