Swift: Compiled Inference for Probabilistic Programming Languages

Transcription

Yi Wu (UC Berkeley, [email protected])
Lei Li (Toutiao.com, [email protected])
Stuart Russell (UC Berkeley, [email protected])
Rastislav Bodik (University of Washington, [email protected])
Abstract

A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines—even the compiled ones—incur significant runtime interpretation overhead, especially for contingent and open-universe models. This paper describes Swift, a compiler for the BLOG PPL. Swift-generated code incorporates optimizations that eliminate interpretation overhead, maintain dynamic dependencies efficiently, and handle memory management for possible worlds of varying sizes. Experiments comparing Swift with other PPL engines on a variety of inference problems demonstrate speedups ranging from 12x to 326x.
1 Introduction
Probabilistic programming languages (PPLs) aim to combine sufficient expressive power for writing real-world probability models with efficient, general-purpose inference algorithms that can answer arbitrary queries with respect to those models. One underlying motive is to relieve the user of the obligation to carry out machine learning research and implement new algorithms for each problem that comes along. Another is to support a wide range of cognitive functions in AI systems and to model those functions in humans.

General-purpose inference for PPLs is very challenging; they may include unbounded numbers of discrete and continuous variables, a rich library of distributions, and the ability to describe uncertainty over functions, relations, and the existence and identity of objects (so-called open-universe models). Existing PPL inference algorithms include likelihood weighting (LW) [Milch et al., 2005b], parental Metropolis–Hastings (PMH) [Milch and Russell, 2006; Goodman et al., 2008], generalized Gibbs sampling (Gibbs) [Arora et al., 2010], generalized sequential Monte Carlo [Wood et al., 2014], Hamiltonian Monte Carlo (HMC) [Stan Development Team, 2014], variational methods [Minka et al., 2014; Kucukelbir et al., 2015] and a form of approximate Bayesian computation [Mansinghka et al., 2013]. While better algorithms are certainly possible, our focus in this paper is on achieving orders-of-magnitude improvement in the execution efficiency of a given algorithmic process.
A PPL system takes a probabilistic program (PP) specifying a probabilistic model as its input and performs inference to compute the posterior distribution of a query given
some observed evidence. The inference process does not (in
general) execute the PP, but instead executes the steps of an
inference algorithm (e.g., Gibbs) guided by the dependency
structures implicit in the PP. In many PPL systems the PP exists as an internal data structure consulted by the inference
algorithm at each step [Pfeffer, 2001; Lunn et al., 2000;
Plummer, 2003; Milch et al., 2005a; Pfeffer, 2009]. This process is in essence an interpreter for the PP, similar to early
Prolog systems that interpreted the logic program. Particularly when running sampling algorithms that involve millions of repetitive steps, the overhead can be enormous. A
natural solution is to produce model-specific compiled inference code, but, as we show in Sec. 2, existing compilers for general open-universe models [Wingate et al., 2011;
Yang et al., 2014; Hur et al., 2014; Chaganty et al., 2013;
Nori et al., 2014] miss out on optimization opportunities and
often produce inefficient inference code.
The paper analyzes the optimization opportunities for PPL
compilers and describes the Swift compiler, which takes as
input a BLOG program [Milch et al., 2005a] and one of three
inference algorithms (LW, PMH, Gibbs) and generates target
code for answering queries. Swift includes three main contributions: (1) elimination of interpretive overhead by joint
analysis of the model structure and inference algorithm; (2) a
dynamic slice maintenance method (FIDS) for incremental
computation of the current dependency structure as sampling
proceeds; and (3) efficient runtime memory management for
maintaining the current-possible-world data structure as the
number of objects changes. Comparisons between Swift and
other PPLs on a variety of models demonstrate speedups
ranging from 12x to 326x, leading in some cases to performance comparable to that of hand-built model-specific code.
To the extent possible, we also analyze the contributions of
each optimization technique to the overall speedup.
Although Swift is developed for the BLOG language, the overall design and the choices of optimizations can be applied to other PPLs and may bring useful insights to similar AI systems for real-world applications.
[Figure 1: PPL compilers and optimization opportunities. The diagram shows the compilation pipeline: a probabilistic program (model) P_M is combined with an inference algorithm to produce inference code P_I (F_I : P_M → P_I), which is then compiled to target C++ code P_T (F_cg : P_I → P_T). Classical probabilistic program optimizations are insensitive to the inference algorithm; classical algorithm optimizations are insensitive to the model. Our optimization FIDS combines model and algorithm at the level of P_I, and our data structures for FIDS apply at the level of P_T.]

1  type Ball; type Draw; type Color;  // 3 types
2  distinct Color Blue, Green;        // two colors
3  distinct Draw D[2];                // two draws: D[0], D[1]
4  #Ball ~ UniformInt(1,20);          // unknown # of balls
5  random Color color(Ball b)         // color of a ball
6    ~ Categorical({Blue -> 0.9, Green -> 0.1});
7  random Ball drawn(Draw d)
8    ~ UniformChoice({b for Ball b});
9  obs color(drawn(D[0])) = Green;    // drawn ball is green
10 query color(drawn(D[1]));          // query

Figure 2: The urn-ball model
2 Existing PPL Compilation Approaches
In a general-purpose programming language (e.g., C++), the compiler compiles exactly what the user writes (the program). By contrast, a PPL compiler essentially compiles the inference algorithm, which is written by the PPL developer, as applied to a PP. This means that different implementations of the same inference algorithm for the same PP result in completely different target code.

As shown in Fig. 1, a PPL compiler first produces an intermediate representation combining the inference algorithm (I) and the input model (P_M) as the inference code (P_I), and then compiles P_I to the target code (P_T). Swift focuses on optimizing the inference code P_I and the target code P_T given a fixed input model P_M.

Although there have been successful PPL compilers, these compilers are all designed for a restricted class of models with fixed dependency structures: see work by Tristan [2014] and the Stan Development Team [2014] for closed-world Bayes nets, Minka [2014] for factor graphs, and Kazemi and Poole [2016] for Markov logic networks.

Church [Goodman et al., 2008] and its variants [Wingate et al., 2011; Yang et al., 2014; Ritchie et al., 2016] provide a lightweight compilation framework for performing the Metropolis–Hastings algorithm (MH) over general open-universe probability models (OUPMs). However, these approaches (1) are based on an inefficient implementation of MH, which results in overhead in P_I, and (2) rely on inefficient data structures in the target code P_T.
For the first point, consider an MH iteration in which we propose a new possible world w′ from w according to the proposal g(·) by resampling random variable X from v to v′. The acceptance ratio α is computed as

    α = min(1, (g(v′ → v) Pr[w′]) / (g(v → v′) Pr[w]))        (1)

Since only X is resampled, it is sufficient to compute α using merely the Markov blanket of X, which leads to a much simplified formula for α from Eq. (1) by cancelling terms in Pr[w] and Pr[w′].
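To make the cancellation explicit: for a model whose dependency structure is fixed, the world-probability ratio collapses to terms involving only X and its children (a standard expansion we spell out here; it is not printed in the original text),

    Pr[w′] / Pr[w] = ( Pr[X = v′ | Par(X)] / Pr[X = v | Par(X)] )
                     × ∏_{U ∈ Ch(X)} Pr[U | Par_{w′}(U)] / Pr[U | Par_w(U)]

so each MH step touches only the Markov blanket of X.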
However, when applying MH to a contingent model, the Markov blanket of a random variable cannot be determined at compile time, since the model dependencies vary during inference. This introduces tremendous runtime overhead for interpretive systems (e.g., BLOG, Figaro), which track all the dependencies at runtime even though typically only a tiny portion of the dependency structure may change per iteration. Church simply avoids keeping track of dependencies and uses Eq. (1), incurring a potentially huge overhead for redundant probability computations.

1 type Cluster; type Data;
2 distinct Data D[20];    // 20 data points
3 #Cluster ~ Poisson(4);  // number of clusters
4 random Real mu(Cluster c) ~ Gaussian(0,10); // cluster mean
5 random Cluster z(Data d)
6   ~ UniformChoice({c for Cluster c});
7 random Real x(Data d) ~ Gaussian(mu(z(d)), 1.0); // data
8 obs x(D[0]) = 0.1;      // omit other data points
9 query size({c for Cluster c});

Figure 3: The infinite Gaussian mixture model

For the second point, in order to track the existence of the variables in an open-universe model, similar to BLOG, Church also maintains a complicated dynamic string-based hashing scheme in the target code, which again causes interpretation overhead.
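To see the cost of string-keyed addressing, compare an interpreter-style lookup with the direct addressing that compiled code can use. This is an illustration we add here; neither snippet is taken from Church or Swift:

#include <string>
#include <unordered_map>
#include <vector>

// Interpreter-style addressing: every access rebuilds a string key such as
// "color(5)" and hashes it into a dynamic table.
std::unordered_map<std::string, int> interp_store;
int interp_get_color(int b) {
  return interp_store["color(" + std::to_string(b) + ")"];
}

// Compiled addressing: the variable lives at a fixed offset in a flat array,
// so an access is a single indexed load at a known machine address.
std::vector<int> color_val;
int compiled_get_color(int b) { return color_val[b]; }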
Lastly, generic techniques such as those proposed by Hur et al. [2014], Chaganty et al. [2013] and Nori et al. [2014] primarily focus on optimizing P_M by analyzing the static properties of the input PP; these are complementary to Swift.

3 Background

This paper focuses on the BLOG language, although our approach also applies to other PPLs with equivalent semantics [McAllester et al., 2008; Wu et al., 2014]. Other PPLs can also be converted to BLOG via a static single assignment form (SSA form) transformation [Cytron et al., 1989; Hur et al., 2014].

3.1 The BLOG Language

The BLOG language [Milch et al., 2005a] defines probability measures over first-order (relational) possible worlds; in this sense it is a probabilistic analogue of first-order logic. A BLOG program declares types for objects and defines distributions over their numbers, as well as defining distributions for the values of random functions applied to objects. Observation statements supply evidence, and a query statement specifies the posterior probability of interest. A random variable in BLOG corresponds to the application of a random function to specific objects in a possible world. BLOG naturally supports open-universe probability models (OUPMs) and context-specific dependencies.

Fig. 2 demonstrates the open-universe urn-ball model. In this version the query asks for the color of the next random pick from an urn given the colors of balls drawn previously. Line 4 is a number statement stating that the number variable #Ball, corresponding to the total number of balls, is uniformly distributed between 1 and 20. Lines 5–6 declare a random function color(·), applied to balls, picking Blue or Green with a biased probability. Lines 7–8 state that each draw chooses a ball at random from the urn, with replacement.

Fig. 3 describes another OUPM, the infinite Gaussian mixture model (∞-GMM). The model includes an unknown number of clusters, as stated in line 3, which is to be inferred from the data.

3.2 Generic Inference Algorithms

All PPLs need to infer the posterior distribution of a query given observed evidence. The most generic approach is Monte Carlo sampling, such as likelihood weighting (LW) [Milch et al., 2005a] and the Metropolis–Hastings algorithm (MH). For LW, the main idea is to sample every unobserved random variable from its prior, sequentially in topological order and conditioned on the evidence, and to collect weighted samples for the queries, where the weight is the likelihood of the evidence. For MH, the high-level procedure is summarized in Alg. 1: M denotes an input BLOG program (or model), E its evidence, Q a query, and N the number of samples; W denotes a possible world, W(x) is the value of x in possible world W, Pr[x|W] denotes the conditional probability of x in W, and Pr[W] denotes the likelihood of possible world W. In particular, when the proposal distribution g in Alg. 1 is the prior of every variable, the algorithm becomes the parental MH algorithm (PMH). Our discussion in this paper focuses on LW and PMH, but our proposed solution applies to other Monte Carlo methods as well.

Algorithm 1: Metropolis–Hastings algorithm (MH)

Input: M, E, Q, N; Output: samples H
1  initialize a possible world W_0 with E satisfied;
2  for i ← 1 to N do
3    randomly pick a variable x from W_{i-1};
4    W_i ← W_{i-1} and v ← W_{i-1}(x);
5    propose a value v′ for x via proposal g;
6    W_i(x) ← v′ and ensure W_i is supported;
7    α ← min(1, (g(v′ → v) Pr[W_i]) / (g(v → v′) Pr[W_{i-1}]));
8    if rand(0, 1) ≥ α then W_i ← W_{i-1};
9    H ← H + (W_i(Q));
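Before moving on, a small hand-written sketch of LW for the urn-ball model (Fig. 2) may help fix ideas; this is our illustration of the algorithm, not code emitted by Swift. Note the lazy sampling of colors, which previews the dynamic-slice idea of Sec. 4:

#include <random>
#include <vector>

// Likelihood weighting for the urn-ball model (Fig. 2), written by hand
// for illustration. The evidence color(drawn(D[0])) = Green is fixed and
// contributes its prior probability (0.1) as the sample weight; the query
// is the posterior probability that color(drawn(D[1])) is Green.
int main() {
  std::mt19937 rng(0);
  std::uniform_int_distribution<int> num_balls(1, 20);
  std::bernoulli_distribution green(0.1);          // prior on Green
  const int N = 1000000;
  double w_green = 0, w_total = 0;
  for (int i = 0; i < N; ++i) {
    int n = num_balls(rng);                        // #Ball ~ UniformInt(1,20)
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<int> color(n, -1);                 // -1 = not yet sampled
    int b0 = pick(rng);                            // drawn(D[0])
    color[b0] = 1;                                 // fix evidence: Green
    double weight = 0.1;                           // Pr[color(b0) = Green]
    int b1 = pick(rng);                            // drawn(D[1])
    if (color[b1] < 0) color[b1] = green(rng) ? 1 : 0;  // lazy sampling
    if (color[b1] == 1) w_green += weight;
    w_total += weight;
  }
  // Posterior estimate: w_green / w_total.
  return 0;
}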
4 The Swift Compiler

Swift has the following design goals for handling open-universe probabilistic models:

• Provide a general framework to automatically and efficiently (1) track the exact Markov blanket for each random variable and (2) maintain the minimum set of random variables necessary to evaluate the query and the evidence (the dynamic slice [Agrawal and Horgan, 1990], or equivalently the minimum self-supporting partial world [Milch et al., 2005a]).

• Provide efficient memory management and fast data structures for Monte Carlo sampling algorithms to avoid interpretive overhead at runtime.

For the first goal, we propose a novel framework, FIDS (Framework of Incrementally updating Dynamic Slices), for generating the inference code (P_I in Fig. 1). For the second goal, we carefully choose data structures in the target C++ code (P_T).

For convenience, r.v. is short for random variable. We demonstrate examples of compiled code in the following discussion: the inference code (P_I) generated by FIDS in Sec. 4.1 is all in pseudo-code, while the target code (P_T) produced by Swift, shown in Sec. 4.2, is in C++. Due to limited space, we only show the detailed transformation rules for the adaptive contingency updating (ACU) technique and omit the others.

4.1 Optimizations in FIDS

Our discussion in this section focuses on LW and PMH, though FIDS can also be applied to other algorithms. FIDS includes three optimizations: dynamic backchaining (DB), adaptive contingency updating (ACU) and reference counting (RC). DB is applied to both LW and PMH, while ACU and RC are specific to PMH.

Dynamic backchaining

Dynamic backchaining (DB) aims to incrementally construct a dynamic slice in each LW iteration and to sample only those variables necessary at runtime. DB is adapted from the idea of compiling lazy evaluation (also similar to the Prolog compiler): in each iteration, DB backtracks from the evidence and query and samples an r.v. only when its value is required during inference.

For every r.v. declaration random T X ~ C_X; in the input PP, FIDS generates a getter function get_X(), which is used to (1) sample a value for X from its declaration C_X and (2) memoize the sampled value for later references in the current iteration. Whenever an r.v. is referred to during inference, its getter function will be evoked.

DB is the fundamental technique for Swift. The key insight is to replace the dependency look-up in some interpretive data structure by direct machine address/instruction seeking in the target executable file. For example, the inference code for the getter function of x(d) in the ∞-GMM model (Fig. 3) is shown below.

double get_x(Data d) {
  // some code memoizing the sampled value
  memoization;
  // if not sampled, sample a new value
  val = sample_gaussian(get_mu(get_z(d)), 1);
  return val;
}

Since get_z(·) is called before get_mu(·), only those mu(c) corresponding to non-empty clusters will be sampled. Notably, with the inference code above, no explicit dependency look-up is ever required at runtime to discover those non-empty clusters.
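One concrete way to fill in the "memoization" placeholder above is to stamp each cached value with the iteration that produced it, so no per-iteration cache clearing is needed. This is our sketch; Swift's generated code may realize memoization differently, and get_z/get_mu are stubs standing in for generated getters:

#include <random>
#include <unordered_map>

std::mt19937 rng(0);
int cur_iter = 0;                        // incremented once per iteration

// Stubs for the generated getters of the ∞-GMM model; real generated code
// would memoize these the same way.
int get_z(int d) { return d % 3; }
double get_mu(int c) { return 10.0 * c; }

struct Memo { int iter = -1; double val = 0.0; };
std::unordered_map<int, Memo> x_memo;    // one entry per Data object d

double get_x(int d) {
  Memo &m = x_memo[d];
  if (m.iter == cur_iter) return m.val;  // already sampled this iteration
  std::normal_distribution<double> gauss(get_mu(get_z(d)), 1.0);
  m.val = gauss(rng);                    // x(d) ~ Gaussian(mu(z(d)), 1.0)
  m.iter = cur_iter;                     // timestamp instead of clearing
  return m.val;
}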
When sampling a number variable, we need to allocate memory for the associated variables. For example, in the urn-ball model (Fig. 2), we need to allocate memory for color(b) after sampling #Ball. The corresponding generated inference code is shown below.

int get_num_Ball() {
  // some code for memoization
  memoization;
  // sample a value
  val = sample_uniformInt(1, 20);
  // some code allocating memory for color
  allocate_memory_color(val);
  return val;
}

allocate_memory_color(val) denotes some pseudo-code segment for allocating val chunks of memory for the values of color(b).
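In the target C++ code, allocate_memory_color could be realized by backing color(b) with a flat resizable array. This is a minimal sketch under our own layout assumptions, not Swift's actual memory-management scheme:

#include <cstdint>
#include <vector>

// Storage for color(b), one slot per ball; -1 marks "not yet sampled" so
// laziness survives the resize. Reusing one array across iterations avoids
// repeated heap allocation as the number of balls changes between worlds.
std::vector<int8_t> color_val;

void allocate_memory_color(int num_balls) {
  color_val.assign(num_balls, -1);
}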
Reference Counting

Reference counting (RC) generalizes the idea of DB to incrementally maintain the dynamic slice in PMH. RC is an efficient compiled counterpart of the interpretive BLOG strategy of dynamically maintaining references to variables and excluding those without any references from the current possible world, with minimal runtime overhead. RC is also similar to dynamic garbage collection in the programming-languages community.

For an r.v. being tracked, say X, RC maintains a reference counter cnt(X) defined by cnt(X) = |Ch(X|W)|, where Ch(X|W) denotes the children of X in the possible world W. When cnt(X) becomes zero, X is removed from W; when cnt(X) becomes positive, X is instantiated and added back to W. This procedure is performed recursively.

Tracking references to every r.v. might cause unnecessary overhead. For example, for classical Bayes nets, RC never excludes any variables. Hence, as a trade-off, Swift only counts references in open-universe models, particularly to those variables associated with number variables (e.g., color(b) in the urn-ball model).

Take the urn-ball model as an example. When resampling drawn(d) and accepting the proposed value v, the generated code for accepting the proposal will be

void inc_cnt(X) {
  if (cnt(X) == 0) W.add(X);
  cnt(X) = cnt(X) + 1; }
void dec_cnt(X) {
  cnt(X) = cnt(X) - 1;
  if (cnt(X) == 0) W.remove(X); }
void accept_value_drawn(Draw d, Ball v) {
  // code for updating dependencies omitted
  dec_cnt(color(val_drawn(d)));
  val_drawn(d) = v;
  inc_cnt(color(v)); }

The functions inc_cnt(X) and dec_cnt(X) update the references to X. The function accept_value_drawn() is specialized to the r.v. drawn(d): Swift analyzes the input program and generates specialized code for different variables.
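The recursive add/remove behavior can be pictured with a small self-contained sketch; this is ours, using an explicit dependency table rather than Swift's generated accessors:

#include <unordered_map>
#include <vector>

// Toy reference-counting core. vars[id].deps lists the variables that id
// refers to; a variable is in the possible world iff its counter is positive.
struct Var { std::vector<int> deps; int cnt = 0; bool in_world = false; };
std::unordered_map<int, Var> vars;

void inc_cnt(int id) {
  Var &x = vars[id];
  if (x.cnt++ == 0) {                  // first reference: instantiate
    x.in_world = true;
    for (int d : x.deps) inc_cnt(d);   // its own references become live too
  }
}

void dec_cnt(int id) {
  Var &x = vars[id];
  if (--x.cnt == 0) {                  // last reference gone: remove
    x.in_world = false;
    for (int d : x.deps) dec_cnt(d);   // release references recursively
  }
}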
Adaptive contingency updating

The goal of ACU is to incrementally maintain the Markov blanket for every r.v. with minimal effort. C_X denotes the declaration of r.v. X in the input PP; Par(X|W) denotes the parents of X in the possible world W; W(X ← v) denotes the new possible world derived from W by only changing the value of X to v. Note that deriving W(X ← v) may require instantiating new variables not existing in W due to dependency changes.

Computing Par(X|W) for X is straightforward, by executing its generation process C_X within W, which is often inexpensive. Hence, the principal challenge for maintaining the Markov blanket is to efficiently maintain a children set Ch(X) for each r.v. X, with the aim of keeping it identical to the ground truth Ch(X|W) under the current possible world (PW) W.

With a proposed possible world W(X ← v), it is easy to add all missing dependencies from a particular r.v. U: for U and every V ∈ Par(U|W(X ← v)), we can verify whether U ∈ Ch(V|W) and add U to Ch(V) if U ∉ Ch(V). Therefore, the key step becomes tracking the set of variables ∆(W(X ← v)) that have a different set of parents in W(X ← v):

    ∆(W(X ← v)) = {Y : Par(Y|W) ≠ Par(Y|W(X ← v))}

Precisely computing ∆(W(X ← v)) is very expensive at runtime, but computing an upper bound for ∆(W(X ← v)) does not influence the correctness of the inference code. Therefore, we proceed to find an upper bound that holds for any value v. We call this over-approximation the contingent set Cont(X|W).

Note that for every r.v. V whose dependency changes in W(X ← v), X must be reached in the condition of a control statement on the execution path of C_V within W. This implies ∆(W(X ← v)) ⊆ Ch(X|W). One straightforward idea is to set Cont(X|W) = Ch(X|W). However, this leads to too much overhead: for example, ∆(·) should always be empty for models with fixed dependencies.

Another approach is to first find the set of switching variables S_X for every r.v. X, defined by

    S_X = {Y : ∃W, v  Par(X|W) ≠ Par(X|W(Y ← v))}

This brings an immediate advantage: S_X can be statically computed at compile time. However, computing S_X is NP-complete.¹

¹ Since the declaration C_X may contain arbitrary boolean formulas, one can reduce the 3-SAT problem to computing S_X.

Our solution is to derive the approximation Ŝ_X by taking the union of the free variables of if/case conditions, function arguments, and set expression conditions in C_X, and then to set

    Cont(X|W) = {Y : Y ∈ Ch(X|W) ∧ X ∈ Ŝ_Y}

For example, in the ∞-GMM model the declaration of x(d) is Gaussian(mu(z(d)), 1.0), whose only function-argument free variable is z(d), so Ŝ_{x(d)} = {z(d)}.

Cont(X|W) is a function of the sampling variable X and the current PW W. Likewise, we maintain a runtime contingent set Cont(X) for every r.v. X, kept identical to the true contingent set Cont(X|W) under the current PW W; Cont(X) can be incrementally updated as well.

Back to the original focus of ACU, suppose we are adding the new dependencies in the new PW W(X ← v). There are three steps to accomplish this goal: (1) enumerate all the variables V ∈ Cont(X); (2) for all U ∈ Par(V|W(X ← v)), add V to Ch(U); and (3) for all U ∈ Par(V|W(X ← v)) ∩ Ŝ_V, add V to Cont(U). These steps can also be repeated in a similar way to remove the vanished dependencies. Take the ∞-GMM model (Fig. 3) as an example: when resampling z(d), we need to change the dependency of r.v. x(d), since the parent mu(z(d)) of x(d) refers to a different cluster mean when z(d) changes. The transformation rules that emit this bookkeeping code are given in Fig. 4.

F_c(M = random T Id([T Id,]*) ~ C) =
  void Id::add_to_Ch() { F_cc(C, Id, {}) }
  void Id::accept_value(T v) {
    for (u in Cont(Id)) u.del_from_Ch();
    Id = v;
    for (u in Cont(Id)) u.add_to_Ch(); }

F_cc(C = Exp, X, seen) = F_cc(Exp, X, seen)

F_cc(C = Dist(Exp), X, seen) = F_cc(Exp, X, seen)

F_cc(C = if (Exp) then C1 else C2, X, seen) =
  F_cc(Exp, X, seen)
  if (Exp) { F_cc(C1, X, seen ∪ FV(Exp)) }
  else { F_cc(C2, X, seen ∪ FV(Exp)) }

F_cc(Exp, X, seen) =
  /* emit a line of code for every r.v. U in the corresponding set */
  ∀U ∈ ((FV(Exp) ∩ Ŝ_X)\seen) : Cont(U) += X;
  ∀U ∈ (FV(Exp)\seen) : Ch(U) += X;

Figure 4: Transformation rules for outputting inference code with ACU. FV(Expr) denotes the free variables in Expr. We assume the switching variables Ŝ_X have already been computed. The rules for number statements and case statements are not shown for conciseness; they are analogous to the rules for random variables and if statements respectively.
U
∈
P
ar(V
|W
the
rules
for
random
variables
and
if
statements
respectively.
Take the ∞-GMM model (Fig. 3) as an example: when resampling z(d), we need to change the dependency of r.v. x(d), since Cont(z(d)|W) = {x(d)} for any W. The generated code for this process is shown below.

void x(d)::add_to_Ch() {
    Cont(z(d)) += x(d);
    Ch(z(d)) += x(d);
    Ch(mu(z(d))) += x(d);
}
void z(d)::accept_value(Cluster v) {
    // accept the proposed value v for r.v. z(d)
    // code for updating references omitted
    for (u in Cont(z(d))) u.del_from_Ch();
    val_z(d) = v;
    for (u in Cont(z(d))) u.add_to_Ch();
}
We omit the code for del_from_Ch() for conciseness; it is essentially the same as add_to_Ch().

The formal transformation rules for ACU are demonstrated in Fig. 4. Fc takes in a random variable declaration statement in the BLOG program and outputs the inference code containing the methods accept_value() and add_to_Ch(). Fcc takes in a BLOG expression and generates the inference code inside the method add_to_Ch(). It keeps track of already seen variables in seen, which makes sure that a variable will be added to the contingent set or the children set at most once. Code for removing variables can be generated similarly.
Since we maintain the exact children for each r.v. with ACU, the computation of the acceptance ratio α (line 7 in Alg. 1) can be simplified to

    α = min(1, (|W_{i-1}| Pr[x = v | W_i] ∏_{u ∈ Ch(x|W_i)} Pr[u | W_i]) / (|W_i| Pr[x = v' | W_{i-1}] ∏_{u ∈ Ch(x|W_{i-1})} Pr[u | W_{i-1}]))    (2)

Here |W| denotes the total number of random variables existing in W, which can be maintained via RC. Finally, the computation time of ACU is strictly shorter than that of the acceptance ratio α (Eq. 2).
4.2 Implementation of Swift

Lightweight memoization
Here we introduce the implementation details of the memoization code in the getter function mentioned in Section 4.1. Objects will be converted to integers in the target code. Swift analyzes the input PP and allocates static memory for memoization in the target code. For open-universe models, where the number of random variables is unknown, a dynamic table data structure (i.e., vector<int> in C++) is used to reduce the amount of dynamic memory allocations: we only increase the length of the array when the number becomes larger than the capacity.

One potential weakness of directly applying a dynamic table is that, for models with multiple nested number statements (e.g., the aircraft tracking model with multiple aircraft and blips for each one), the memory consumption can be large. In Swift, the user can force the target code to clear all the unused memory every fixed number of iterations via a compilation option, which is turned off by default.

The following code demonstrates the target code for the number variable #Ball and its associated color(b) in the urn-ball model (Fig. 2), where iter is the current iteration number.

vector<int> val_color, mark_color;
int get_color(int b) {
    if (mark_color[b] == iter)
        return val_color[b]; // memoization
    mark_color[b] = iter; // mark the flag
    val_color[b] = ... // sampling code omitted
    return val_color[b]; }
int val_num_B, mark_num_B;
int get_num_Ball() {
    if (mark_num_B == iter) return val_num_B;
    mark_num_B = iter; // mark the flag
    val_num_B = ... // sampling code omitted
    if (val_num_B > val_color.size()) { // allocate
        val_color.resize(val_num_B);    // memory
        mark_color.resize(val_num_B); }
    return val_num_B; }
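To see in isolation why bumping iter is enough to start a fresh possible world, consider the following minimal, self-contained sketch of the same pattern (our own illustration, not Swift's generated output; the coin-flip sampler is a stand-in for the real sampling code):

#include <cstdio>
#include <cstdlib>
#include <vector>

static int iter = 0;
static std::vector<int> val_color, mark_color;

int get_color(int b) {
    if (mark_color[b] == iter) return val_color[b]; // memo hit this iteration
    mark_color[b] = iter;                           // mark the flag
    val_color[b] = std::rand() % 2;                 // stand-in sampler
    return val_color[b];
}

int main() {
    val_color.assign(4, 0);
    mark_color.assign(4, -1);                       // -1: never sampled
    for (iter = 1; iter <= 3; ++iter)               // bumping iter == new PW
        std::printf("iter %d: color(0) = %d\n", iter, get_color(0));
    return 0;
}

Repeated calls within one iteration return the cached value; the first call of the next iteration resamples, with no per-variable reset work.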
Note that when generating a new PW, by simply increasing the counter iter, all the memoization flags (i.e., mark_color for color(·)) will be automatically cleared.

Lastly, in PMH, we need to randomly select an r.v. to sample per iteration. In the target C++ code, this process is accomplished via polymorphism, by declaring an abstract class with a resample() method and a derived class for every r.v. in the PP. An example code snippet for the ∞-GMM model is shown below.

class MH_OBJ {public: //abstract class
    virtual void resample()=0; //resample step
    virtual void add_to_Ch()=0; // for ACU
    virtual void accept_value()=0; //for ACU
    virtual double get_likeli()=0;
    ... }; // some methods omitted
// maintain all the r.v.s in the current PW
std::vector<MH_OBJ*> active_vars;
// derived classes for r.v.s in the PP
class Var_mu:public MH_OBJ{public: //for mu(c)
    double val; // value for the var
    double cached_val; //cache for proposal
    int mark, cache_mark; // flags
    std::set<MH_OBJ*> Ch, Cont; //sets for ACU
    double getval(){...}; //sample & memoize
    double getcache(){...}; //generate proposal
    ... }; // some methods omitted
std::vector<Var_mu*> mu; // dynamic table
class Var_z:public MH_OBJ{ ... }; //for z(d)
Var_z* z[20]; // fixed-size array
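For concreteness, the per-iteration dispatch can be sketched as follows (our own self-contained illustration with a pared-down interface, not Swift's generated code):

#include <cstddef>
#include <random>
#include <vector>

// Pared-down stand-in for the MH_OBJ interface above: one PMH iteration
// picks an instantiated r.v. uniformly at random and lets virtual
// dispatch run that variable's own resampling logic.
struct MHObj { virtual void resample() = 0; virtual ~MHObj() {} };

void pmh_step(std::vector<MHObj*>& active_vars, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, active_vars.size() - 1);
    active_vars[pick(rng)]->resample();
}

Keeping every instantiated variable in one flat vector makes the uniform selection an O(1) operation per iteration.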
Efficient proposal manipulation
Although one PMH iteration only samples a single variable, generating a new PW may still involve (de-)instantiating an arbitrary number of random variables due to RC or sampling a number variable. The general approach commonly adopted by many PPL systems is to construct the proposed possible world in dynamically allocated memory and then copy it to the current world [Milch et al., 2005a; Wingate et al., 2011], which suffers from significant overhead.

In order to generate and accept new PWs in PMH with negligible memory-management overhead, we extend the lightweight memoization to manipulate proposals: Swift statically allocates an extra memoized cache for each random variable, which is dedicated to storing the proposed value for that variable. During resampling, all the intermediate results are stored in the cache. When accepting the proposal, the proposed value is directly loaded from the cache; upon rejection, no action is needed due to our memoization mechanism.

Here is an example target code fragment for mu(c) in the ∞-GMM model. proposed_vars is an array storing all the variables to be updated if the proposal gets accepted.
std::vector<MH_OBJ*> proposed_vars;
class Var_mu:public MH_OBJ{public: //for mu(c)
    double val; // value
    double cached_val; //cache for proposal
    int mark, cache_mark; // flags
    double getcache(){ // propose new value
        if(mark == 1) return val;
        if(cache_mark == iter) return cached_val;
        cache_mark = iter; // memoization
        proposed_vars.push_back(this);
        cached_val = ... // sample new value
        return cached_val; }
    void accept_value() {
        val = cached_val; //accept proposed value
        if(mark == 0) { // not instantiated yet
            mark = 1; //instantiate variable
            active_vars.push_back(this);
        } }
    void resample() { // resample
        proposed_vars.clear();
        mark = 0; // generate proposal
        new_val = getcache();
        mark = 1;
        alpha = ... //compute acceptance ratio
        if (sample_unif(0,1) <= alpha) {
            for(auto v: proposed_vars)
                v->accept_value(); // accept
            // code for ACU and RC omitted
        } } }; // some methods omitted
Supporting new algorithms
We demonstrate that FIDS can be applied to new algorithms by implementing a translator for Gibbs sampling [Arora et al., 2010] (Gibbs) in Swift. Gibbs is a variant of MH with the proposal distribution g(·) set to be the posterior distribution. The acceptance ratio α is always 1, while the disadvantage is that it is only possible to explicitly construct the posterior distribution with conjugate priors. In Gibbs, the proposal distribution is still constructed from the Markov blanket, which again requires maintaining the children set. Hence, FIDS can be fully utilized.

However, different conjugate priors yield different forms of posterior distributions. In order to support a variety of proposal distributions, we need to (1) implement a conjugacy checker to check the form of the posterior distribution for each r.v. in the PP at compile time (the checker has nothing to do with FIDS), and (2) implement a posterior sampler for every conjugate prior in the runtime library of Swift.
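As an illustration of what one such runtime posterior sampler could look like (our own sketch of the standard Beta-Bernoulli update, not Swift's actual library code):

#include <random>

// Sketch of a posterior sampler for a Beta(a, b) prior with a Bernoulli
// likelihood: after observing `heads` successes out of `n` trials, the
// posterior is Beta(a + heads, b + n - heads). Illustrative only.
double sample_beta_bernoulli_posterior(double a, double b,
                                       int heads, int n,
                                       std::mt19937& rng) {
    // Draw Beta(x, y) as the ratio of two Gamma draws.
    std::gamma_distribution<double> ga(a + heads, 1.0);
    std::gamma_distribution<double> gb(b + (n - heads), 1.0);
    double u = ga(rng), v = gb(rng);
    return u / (u + v);
}

The conjugacy checker's job at compile time is to decide, per random variable, whether a closed-form sampler of this kind exists for its prior/likelihood pair.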
5 Experiments
In the experiments, Swift generates target code in C++, using the C++ standard <random> library for random number generation and the armadillo package [Sanderson, 2010] for matrix computation. The baseline systems include BLOG (version 0.9.1), Figaro (version 3.3.0), Church (webchurch), Infer.NET (version 2.6), BUGS (winBUGS 1.4.3) and Stan (cmdStan 2.6.0).

5.1 Benchmark models
We collect a set of benchmark models² which exhibit various capabilities of a PPL (Tab. 1), including the burglary model (Bur), the hurricane model (Hur), the urn-ball model (U-B(x,y) denotes the urn-ball model with at most x balls and y draws), the TrueSkill model [Herbrich et al., 2007] (T-K), a 1-dimensional Gaussian mixture model (GMM, 100 points with 4 clusters) and the infinite GMM model (∞-GMM). We also include a real-world dataset: handwritten digits [LeCun et al., 1998] using the PPCA model [Tipping and Bishop, 1999]. All the models can be downloaded from the GitHub repository of BLOG.
5.2 Speedup by FIDS within Swift
We first evaluate the speedup from each of the three optimizations in FIDS individually.

Dynamic Backchaining for LW
We compare the running time of the following versions of code: (1) the code generated by Swift (“Swift”); (2) the modified compiled code with DB manually turned off (“No DB”), which sequentially samples all the variables in the PP; and (3) the hand-optimized code without unnecessary memoization or recursive function calls (“Hand-Opt”).
² Most models are simplified to ensure that all the benchmark systems can produce an output in reasonably short running time. All of these models can be enlarged to handle more data without adding extra lines of BLOG code.
[Table 1: feature matrix over the benchmark models (columns Bur, Hur, U-B, T-K, GMM, ∞-GMM, PPCA; rows R, CT, CC, OU, CG).]
Table 1: Features of the benchmark models. R: continuous
scalar or vector variables. CT: context-specific dependencies
(contingency). CC: cyclic dependencies in the PP (while in
any particular PW the dependency structure remains acyclic).
OU: open-universe. CG: conjugate priors.
Alg.           Bur       Hur       U-B(20,2)   U-B(20,8)
Running Time (s): Swift v.s. No DB
No DB          0.768     1.288     2.952       5.040
Swift          0.782     1.115     1.755       4.601
Speedup        0.982     1.155     1.682       1.10
Running Time (s): Swift v.s. Hand-Optimized
Hand-Opt       0.768     1.099     1.723       4.492
Swift          0.782     1.115     1.755       4.601
Overhead       1.8%      1.4%      1.8%        2.4%
Calls to Random Number Generator
No DB          3 ∗ 10⁷   3 ∗ 10⁷   1.35 ∗ 10⁸  1.95 ∗ 10⁸
Swift          3 ∗ 10⁷   2.0 ∗ 10⁷ 4.82 ∗ 10⁷  1.42 ∗ 10⁸
Calls Saved    0%        33.3%     64.3%       27.1%

Table 2: Performance on LW with 10⁷ samples for Swift, the version without DB and the hand-optimized version.
We measure the running time for all the 3 versions and the
number of calls to the random number generator for “Swift”
and “No DB”. The results are summarized in Tab. 2.
The overhead due to memoization compared against the
hand-optimized code is less than 2.4%. We can further notice
that the speedup is proportional to the number of calls saved.
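The saved calls come from the laziness of DB: a getter samples an ancestor only when some demanded value actually needs it. A minimal sketch of the pattern (our own illustration with a made-up two-variable model, not one of the benchmarks):

#include <cstdlib>

// Hypothetical two-variable model: rain -> wet. With DB, asking for
// wet() pulls in rain() on demand; variables outside the demanded
// slice are never sampled at all.
static bool rain_set = false, rain_val;

bool rain() {
    if (!rain_set) { rain_set = true; rain_val = std::rand() % 5 == 0; }
    return rain_val;
}

bool wet() {
    // contingent call: rain() is sampled only because wet() needs it
    return rain() ? (std::rand() % 10 < 9) : (std::rand() % 10 < 1);
}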
Reference Counting for PMH
RC only applies to open-universe models in Swift. Hence we
focus on the urn-ball model with various model parameters.
The urn-ball model represents a common model structure appearing in many applications with open-universe uncertainty.
This experiment reveals the potential speedup by RC for real-world problems with similar structures. RC achieves greater
speedup when the number of balls and the number of observations become larger.
We compare the code produced by Swift with RC (“Swift”)
and RC manually turned off (“No RC”). For “No RC”, we
traverse the whole dependency structure and reconstruct the
dynamic slice every iteration. Fig. 5 shows the running time
of both versions. RC is indeed effective – leading to up to
2.3x speedup.
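For intuition, the bookkeeping behind RC can be sketched as follows (our simplified illustration, not the code Swift emits):

// Simplified sketch of RC for open-universe models: each instantiated
// r.v. tracks how many other variables currently refer to it; when the
// count reaches zero it leaves the possible world immediately, with no
// full traversal of the dependency structure.
struct RCVar {
    int refs = 0;
    bool live = false;
};

void add_ref(RCVar& v) {
    if (!v.live) v.live = true;        // (re-)instantiate on first use
    ++v.refs;
}

void drop_ref(RCVar& v) {
    if (--v.refs == 0) v.live = false; // de-instantiate: drop from the PW
}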
Adaptive Contingency Updating for PMH
We compare the code generated by Swift with ACU (“Swift”)
and the manually modified version without ACU (“Static
Dep”) and measure the number of calls to the likelihood function in Tab. 3.
The version without ACU (“Static Dep”) demonstrates the
Figure 5: Running time (s) of PMH in Swift with and without
RC on urn-ball models for 20 million samples.
Alg.           Bur        Hur        U-B(20,10)  U-B(40,20)
Running Time (s): Swift v.s. Static Dep
Static Dep     0.986      1.642      4.433       7.722
Swift          0.988      1.492      3.891       4.514
Speedup        0.998      1.100      1.139       1.711
Calls to Likelihood Functions
Static Dep     1.8 ∗ 10⁷  2.7 ∗ 10⁷  1.5 ∗ 10⁸   3.0 ∗ 10⁸
Swift          1.8 ∗ 10⁷  1.7 ∗ 10⁷  4.1 ∗ 10⁷   5.5 ∗ 10⁷
Calls Saved    0%         37.6%      72.8%       81.7%
Table 3: Performance of PMH in Swift and the version utilizing static dependencies on benchmark models.
best efficiency that can be achieved via compile-time analysis without maintaining dependencies at runtime. This version statically computes an upper bound of the children set by Ĉh(X) = {Y : X appears in C_Y} and again uses Eq. (2)
to compute the acceptance ratio. Thus, for models with fixed
dependencies, “Static Dep” should be faster than “Swift”.
In Tab. 3, “Swift” is up to 1.7x faster than “Static Dep”,
thanks to an up-to-81% reduction in likelihood function evaluations. Note that for the model with fixed dependencies (Bur), the overhead of ACU is almost negligible: 0.2%, due to traversing the hashmap to access child variables.
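To make the saving concrete, the acceptance test of Eq. (2) only visits the sampled variable and its maintained children, as in the following simplified sketch (our illustration for the fixed-dependency case, not the generated code; Var and its log-likelihood fields are stand-ins):

#include <algorithm>
#include <cmath>
#include <set>

// Stand-in variable type: each r.v. can report its log-likelihood in the
// current and in the proposed possible world.
struct Var {
    std::set<Var*> Ch;                 // children maintained by ACU
    double log_likeli_cur, log_likeli_prop;
};

// Log-domain acceptance ratio in the spirit of Eq. (2): only x and its
// current children are visited, instead of every variable in the PW.
double log_accept_ratio(Var* x, double n_old, double n_new) {
    double la = std::log(n_old) - std::log(n_new)
              + x->log_likeli_prop - x->log_likeli_cur;
    for (Var* u : x->Ch)
        la += u->log_likeli_prop - u->log_likeli_cur;
    return std::min(0.0, la);          // log alpha
}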
5.3
Swift against other PPL systems
We compare Swift with other systems using LW, PMH and
Gibbs respectively on the benchmark models. The running
time is presented in Tab. 4. The speedup is measured between
Swift and the fastest one among the rest. An empty entry
means that the corresponding PPL fails to perform inference
on that model. Though these PPLs have different host languages (C++, Java, Scala, etc.), the performance difference
resulting from host languages is within a factor of 2³.
5.4
Experiment on real-world dataset
We use Swift with Gibbs to handle the probabilistic principal component analysis (PPCA) [Tipping and Bishop, 1999],
with real-world data: 5958 images of digit “2” from the handwritten digits dataset (MNIST) [LeCun et al., 1998]. We
compute 10 principal components for the digits. The training
and testing sets include 5958 and 1032 images respectively,
each with 28x28 pixels and pixel value rescaled to [0, 1].
Since most benchmark PPL systems are too slow to handle this amount of data, we compare Swift against two other
³ http://benchmarksgame.alioth.debian.org/
PPL         Bur     Hur     U-B(20,10)  T-K     GMM     ∞-GMM
LW with 10⁶ samples
Church      9.6     22.9    179         57.4    1627    1038
Figaro      15.8    24.7    176         48.6    997     235
BLOG        8.4     11.9    189         49.5    998     261
Swift       0.04    0.08    0.54        0.24    6.8     1.1
Speedup     196     145     326         202     147     214
PMH with 10⁶ samples
Church      12.7    25.2    246         173     3703    1057
Figaro      10.6    59.6    -           17.4    151     62.2
BLOG        6.7     18.5    30.4        38.7    72.5    68.3
Swift       0.11    0.15    0.4         0.32    0.76    0.60
Speedup     61      121     75          54      95      103
Gibbs with 10⁶ samples
BUGS        87.7    -       -           -       84.4    -
Infer.NET   1.5     -       -           -       77.8    -
Swift       0.12    0.19    0.34        -       0.42    0.86
Speedup     12      ∞       ∞           -       185     ∞

Table 4: Running time (s) of Swift and other PPLs on the benchmark models using LW, PMH and Gibbs.
scalable PPLs, Infer.NET and Stan, on this model. Both PPLs
have compiled inference and are widely used for real-world
applications. Stan uses HMC as its inference algorithm. For
Infer.NET, we select variational message passing algorithm
(VMP), which is Infer.NET’s primary focus. Note that HMC
and VMP are usually favored for fast convergence.
Stan requires a tuning process before it can produce samples. We ran Stan with 0, 5 and 9 tuning steps respectively⁴.
We measure the perplexity of the generated samples over test
images w.r.t. the running time in Fig. 6, where we also visualize the produced principal components. For Infer.NET, we
consider the mean of the approximation distribution.
Swift quickly generates visually meaningful outputs with
around 10⁵ Gibbs iterations in 5 seconds. Infer.NET takes
13.4 seconds to finish the first iteration⁵ and converges to a
result with the same perplexity after 25 iterations and 150
seconds. The overall convergence of Gibbs w.r.t. the running
time significantly benefits from the speedup by Swift.
For Stan, its no-U-turn sampler [Homan and Gelman,
2014] suffers from significant parameter tuning issues. We
also tried to manually tune the parameters, which does not
help much. Nevertheless, Stan does work for the simplified
PPCA model with 1-dimensional data. Although further investigation is still needed, we conjecture that Stan is very sensitive to its parameters in high-dimensional cases. Lastly, this
experiment also suggests that those parameter-robust inference engines, such as Swift with Gibbs and Infer.NET with
VMP, would be preferred in practice when possible.
⁴ We also ran Stan with 50 and 100 tuning steps, which took more than 3 days to finish 130 iterations (including tuning). However, the results are almost the same as with 9 tuning steps.
⁵ Gibbs by Swift samples a single variable while Infer.NET processes all the variables per iteration.
Figure 6: Log-perplexity w.r.t. running time (s) on the PPCA model, with visualized principal components. Swift converges faster.
6
Conclusion and Future Work
We have developed Swift, a PPL compiler that generates
model-specific target code for performing inference. Swift
uses a dynamic slicing framework for open-universe models
to incrementally maintain the dependency structure at runtime. In addition, we carefully design data structures in the
target code to avoid dynamic memory allocations when possible. Our experiments show that Swift achieves orders of
magnitude speedup against other PPL engines.
The next step for Swift is to support more algorithms, such
as SMC [Wood et al., 2014], as well as more samplers, such
as the block sampler for (nearly) deterministic dependencies [Li et al., 2013]. Although FIDS is a general framework
that can be naturally extended to these cases, there are still
implementation details to be studied.
Another direction is partial evaluation, which allows the
compiler to reason about the input program to simplify it.
Shah et al. [2016] propose a partial evaluation framework
for the inference code (PI in Fig. 1). It is very interesting to
extend this work to the whole Swift pipeline.
Acknowledgement
We would like to thank our anonymous reviewers, as well
as Rohin Shah for valuable discussions. This work is supported by the DARPA PPAML program, contract FA8750-14-C-0011.
References
[Agrawal and Horgan, 1990] Hiralal Agrawal and Joseph R Horgan. Dynamic program slicing. In ACM SIGPLAN Notices, volume 25, pages 246–256. ACM, 1990.
[Arora et al., 2010] Nimar S. Arora, Rodrigo de Salvo Braz, Erik B.
Sudderth, and Stuart J. Russell. Gibbs sampling in open-universe
stochastic languages. In UAI, pages 30–39. AUAI Press, 2010.
[Chaganty et al., 2013] Arun Chaganty, Aditya Nori, and Sriram
Rajamani. Efficiently sampling probabilistic programs via program analysis. In AISTATS, pages 153–160, 2013.
[Cytron et al., 1989] Ron Cytron, Jeanne Ferrante, Barry K Rosen,
Mark N Wegman, and F Kenneth Zadeck. An efficient method
of computing static single assignment form. In Proceedings of
the 16th ACM SIGPLAN-SIGACT symposium on Principles of
programming languages, pages 25–35. ACM, 1989.
[Goodman et al., 2008] Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: A language for generative models. In UAI, pages
220–229, 2008.
[Herbrich et al., 2007] Ralf Herbrich, Tom Minka, and Thore Graepel. Trueskill(tm): A bayesian skill rating system. In Advances in
Neural Information Processing Systems 20, pages 569–576. MIT
Press, January 2007.
[Homan and Gelman, 2014] Matthew D Homan and Andrew Gelman. The no-u-turn sampler: Adaptively setting path lengths in
hamiltonian monte carlo. The Journal of Machine Learning Research, 15(1):1593–1623, 2014.
[Hur et al., 2014] Chung-Kil Hur, Aditya V. Nori, Sriram K. Rajamani, and Selva Samuel. Slicing probabilistic programs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages
133–144, New York, NY, USA, 2014. ACM.
[Kazemi and Poole, 2016] Seyed Mehran Kazemi and David Poole.
Knowledge compilation for lifted probabilistic inference: Compiling to a low-level language. 2016.
[Kucukelbir et al., 2015] Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, and David Blei. Automatic variational inference
in Stan. In NIPS, pages 568–576, 2015.
[LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio,
and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324,
1998.
[Li et al., 2013] Lei Li, Bharath Ramsundar, and Stuart Russell.
Dynamic scaled sampling for deterministic constraints. In AISTATS, pages 397–405, 2013.
[Lunn et al., 2000] David J. Lunn, Andrew Thomas, Nicky Best,
and David Spiegelhalter. WinBUGS - a Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and
Computing, 10(4):325–337, October 2000.
[Mansinghka et al., 2013] Vikash K. Mansinghka, Tejas D. Kulkarni, Yura N. Perov, and Joshua B. Tenenbaum. Approximate Bayesian image interpretation using generative probabilistic
graphics programs. In NIPS, pages 1520–1528, 2013.
[McAllester et al., 2008] David McAllester, Brian Milch, and
Noah D Goodman. Random-world semantics and syntactic independence for expressive languages. Technical report, 2008.
[Milch and Russell, 2006] Brian Milch and Stuart J. Russell.
General-purpose MCMC inference over relational structures. In
UAI. AUAI Press, 2006.
[Milch et al., 2005a] Brian Milch, Bhaskara Marthi, Stuart Russell,
David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG:
Probabilistic models with unknown objects. In IJCAI, pages
1352–1359, 2005.
[Milch et al., 2005b] Brian Milch, Bhaskara Marthi, David Sontag, Stuart Russell, Daniel L. Ong, and Andrey Kolobov. Approximate inference for infinite contingent Bayesian networks.
In Tenth International Workshop on Artificial Intelligence and
Statistics, Barbados, 2005.
[Minka et al., 2014] T. Minka, J.M. Winn, J.P. Guiver, S. Webster,
Y. Zaykov, B. Yangel, A. Spengler, and J. Bronskill. Infer.NET
2.6, 2014.
[Nori et al., 2014] Aditya V Nori, Chung-Kil Hur, Sriram K Rajamani, and Selva Samuel. R2: An efficient MCMC sampler for
probabilistic programs. In AAAI Conference on Artificial Intelligence, 2014.
[Pfeffer, 2001] Avi Pfeffer. IBAL: A probabilistic rational programming language. In In Proc. 17th IJCAI, pages 733–740. Morgan
Kaufmann Publishers, 2001.
[Pfeffer, 2009] Avi Pfeffer. Figaro: An object-oriented probabilistic programming language. Charles River Analytics Technical
Report, 2009.
[Plummer, 2003] Martyn Plummer. JAGS: A program for analysis
of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical
Computing, volume 124, page 125. Vienna, 2003.
[Ritchie et al., 2016] Daniel Ritchie, Andreas Stuhlmüller, and
Noah D. Goodman. C3: Lightweight incrementalized MCMC for
probabilistic programs using continuations and callsite caching.
In AISTATS 2016, 2016.
[Sanderson, 2010] Conrad Sanderson. Armadillo: An open source
c++ linear algebra library for fast prototyping and computationally intensive experiments. 2010.
[Shah et al., 2016] Rohin Shah, Emina Torlak, and Rastislav Bodik.
SIMPL: A DSL for automatic specialization of inference algorithms. arXiv preprint arXiv:1604.04729, 2016.
[Stan Development Team, 2014] Stan Development Team. Stan
Modeling Language Users Guide and Reference Manual, Version
2.5.0, 2014.
[Tipping and Bishop, 1999] Michael E. Tipping and Chris M.
Bishop. Probabilistic principal component analysis. Journal of
the Royal Statistical Society, Series B, 61:611–622, 1999.
[Tristan et al., 2014] Jean-Baptiste Tristan, Daniel Huang, Joseph
Tassarotti, Adam C Pocock, Stephen Green, and Guy L Steele.
Augur: Data-parallel probabilistic modeling. In NIPS, pages
2600–2608. 2014.
[Wingate et al., 2011] David Wingate, Andreas Stuhlmueller, and
Noah D Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In
International Conference on Artificial Intelligence and Statistics,
pages 770–778, 2011.
[Wood et al., 2014] Frank Wood, Jan Willem van de Meent, and
Vikash Mansinghka. A new approach to probabilistic programming inference. In Proceedings of the 17th International conference on Artificial Intelligence and Statistics, pages 2–46, 2014.
[Wu et al., 2014] Yi Wu, Lei Li, and Stuart J. Russell. BFiT:
From possible-world semantics to random-evaluation semantics
in open universe. In Neural Information Processing Systems,
Probabilistic Programming workshop, 2014.
[Yang et al., 2014] Lingfeng Yang, Pat Hanrahan, and Noah D.
Goodman. Generating efficient MCMC kernels from probabilistic programs. In AISTATS, pages 1068–1076, 2014.