Introduction to Temporal Bayesian Networks†

Joel D. Young and Eugene Santos Jr.
Department of Electrical and Computer Engineering
Air Force Institute of Technology
Wright-Patterson AFB, OH 45433-7765
jdyoung@afit.af.mil, esantos@afit.af.mil
Presented At
The Seventh Midwest AI and Cognitive Science Conference
April 26-28, 1996
Abstract
A Bayesian network is a directed acyclic graph in which nodes are random variables
and the edges indicate that the source exerts direct causal influence on the destination. A problem with the Bayesian network is that there is no natural mechanism for
representing temporal relations between and within the random variables. This paper
introduces a new technique for representing when a random variable holds a particular state, e.g. when an event happens, as well as techniques for applying temporal
constraints (precedes, during, etc.) to the edges. Allen's interval structure is used to
provide the formal basis. A restricted model is presented along with a corresponding
inferencing algorithm based on a linear constraint formulation.
1 Introduction
Complex systems consist of collections of interacting processes. These processes change over
time in response to both internal and external stimuli as well as to the passage of time
itself. There is great variety in the behavior of processes. Some processes are simple events
such as going to lunch or flipping a switch. Others are complex, for example a
communication channel in which errors may occur due to lightning strikes and in which
errors are more likely following previous errors. Processes can also be recurrent or periodic,
such as the passing of day into night or shifts in a work schedule.
Prior temporal modeling techniques have often had difficulty modeling uncertainty as
to when and if an event occurs. Techniques able to model uncertainty often do not have
strong semantics. One such method, the Temporal Abduction Problem (TAP) of [4], uses a
cost-based approach to model the uncertainty in an event's occurrence. However, the costs are
ad hoc, and TAP does not model the uncertainty as to when the event happened or for how
long.
† This research was supported in part by AFOSR Project #940006.
Bayesian networks [2] provide a robust, probabilistic method of reasoning with uncertainty which has become popular with artificial intelligence researchers. Bayesian networks,
however, do not provide a direct mechanism for representing temporal dependencies. For
example, it is difficult to represent a situation such as the variability of when an employee
arrives at work and the causal relationships between the time of arrival and later events.
This paper presents a new system, the Temporal Bayesian Network (TBN), for representing temporal and atemporal information while maintaining probabilistic semantics. The
technique allows representation of time-constrained causality, of when and if events occur,
and of the periodic and recurrent nature of processes. Bayesian networks lie at the foundation of the system and provide the probabilistic basis. Allen's interval system [1] and his 13
relations provide the temporal basis.
2 Theoretical Structures
In probabilistic reasoning, random variables (RVs) are used to represent events and objects.
By making various assignments to these RVs, we can model the current state of the world
and weight the states according to the joint probabilities.
A Bayesian network is a directed acyclic graph. Directed arcs between RVs represent
conditional dependencies. When all the parents of a given RV are instantiated, that RV is
said to be conditionally independent of its remaining non-descendant RVs given its parents.
Allen's interval algebra is governed by 13 relations on the intervals. Basically, there is
a time interval in which each event occurs, denoted by $[a, b]$ where $a$ is the starting time
point and $b$ is the termination point. Temporal relationships between events are expressed
as relations between their intervals. The relations between intervals, denoted $\mathcal{A}$, are
$\{=, <, >, m, mi, d, di, s, si, f, fi, o, oi\}$ (Figure 1). For example, event $A = [a, b]$ preceding event
$B = [c, d]$ is denoted $A < B$, indicating that $a < b < c < d$. The set of 13 relations is
mutually exclusive and exhaustive.
Uncertainty in the exact relationship between intervals is expressed as disjunctions. For
example, "interval A precedes or meets interval B" is written as $A \{<, m\} B$. Some commonly
used disjunctions are disjoint, written $\{<, >, m, mi\}$, and contains, written $\{di, si, fi\}$ [1].
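As an illustration of the thirteen relations and of disjunctions over them, the following minimal Python sketch classifies a pair of intervals. The function names `allen_relation` and `holds`, and the representation of an interval as a `(start, end)` tuple with `start < end`, are assumptions made for this example and are not part of the formalism above.

```python
def allen_relation(x, y):
    """Return the symbol r of the single Allen relation for which x r y holds.

    x and y are (start, end) tuples with start < end (an assumed encoding).
    """
    (a, b), (c, d) = x, y
    if a >= b or c >= d:
        raise ValueError("intervals must satisfy start < end")
    if b < c:
        return "<"            # x precedes y
    if d < a:
        return ">"            # x is preceded by y
    if b == c:
        return "m"            # x meets y
    if d == a:
        return "mi"           # x is met by y
    if a == c and b == d:
        return "="
    if a == c:
        return "s" if b < d else "si"
    if b == d:
        return "f" if a > c else "fi"
    if c < a and b < d:
        return "d"            # x during y
    if a < c and d < b:
        return "di"           # x contains y
    return "o" if a < c else "oi"


# Disjunctions are sets of relation symbols.
DISJOINT = {"<", ">", "m", "mi"}
CONTAINS = {"di", "si", "fi"}


def holds(x, relations, y):
    """True when the relation between x and y is one of the allowed symbols."""
    return allen_relation(x, y) in relations
```

Because exactly one branch fires for any valid pair of intervals, the sketch reflects the mutually exclusive and exhaustive nature of the relation set; for instance, `holds((1, 3), {"<", "m"}, (3, 5))` evaluates the disjunction "precedes or meets".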
A temporal random variable is a set of states, e.g. $\{\text{true}, \text{false}\}$, $\{1, 2, 3\}$, or $\{\text{false}\} \cup \{\text{Red}, \text{Blue}\}$, and a set of temporal intervals each having an associated random variable
(RV). Each RV has a defined density function giving the probability of each state.
Definition 1 (Temporal Random Variable (TRV)) An ordered pair $(T, \Sigma)$ where $\Sigma$ is
a set of states and $T$ is a set of ordered pairs $(i, r)$ where $i$ is a temporal interval and $r$ is an
RV over $\Sigma$. No two pairs $(i_1, r_1)$ and $(i_2, r_2)$ can exist such that $i_1 = i_2$.
There is no requirement that the $(i, r)$ pairs have any semantic relationship to each other.
The TRV, however, represents the state of an event or process in time and should have a
clear semantic meaning. This is especially important in more restricted models where only
one $(i, r)$ pair can be true and the state of that one pair provides the visible state of the
entire TRV.
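An assignment to a TRV consists of an assignment to each RV in the TRV:
Definition 2 (Temporal Assignment (TA)) $A = \{(\tau, \sigma)\}$ is a temporal assignment iff $\tau \in T$
and $\sigma \in \Sigma$ such that $\forall \tau \in T\ \exists! \sigma \in \Sigma$ such that $(\tau, \sigma) \in A$.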
Sometimes the state of all of the RVs in a TRV is not available. A partial temporal
assignment is a subset of a temporal assignment.
Figure 1: Allen's thirteen possible relations: equals (=), precedes (<, >), meets (m, mi), during (d, di), starts (s, si), finishes (f, fi), and overlaps (o, oi).
Definition 3 (Partial TA (PTA)) $A_p$ is a partial temporal assignment iff $A_p \subseteq A$ where
$A$ is a temporal assignment for some TRV.
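Definitions 1 through 3 can be made concrete with a small data structure. The sketch below is only one possible encoding and rests on several assumptions: the class name `TRV`, the helper names, the use of `(start, end)` tuples for intervals, RVs identified by a name string, and an assignment stored as a dictionary from interval to state (which enforces "exactly one state per interval" by construction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TRV:
    """Temporal random variable: a state set plus (interval, rv_name) pairs."""
    states: frozenset
    pairs: tuple  # tuple of ((start, end), rv_name)

    def __post_init__(self):
        # Definition 1: no two pairs may share the same interval.
        intervals = [interval for interval, _ in self.pairs]
        if len(set(intervals)) != len(intervals):
            raise ValueError("no two pairs may share the same interval")

    def intervals(self):
        return {interval for interval, _ in self.pairs}


def is_partial_assignment(trv, assignment):
    """Definition 3: every assigned interval and state must belong to the TRV."""
    return all(interval in trv.intervals() and state in trv.states
               for interval, state in assignment.items())


def is_temporal_assignment(trv, assignment):
    """Definition 2: a legal assignment that covers every interval of the TRV."""
    return (is_partial_assignment(trv, assignment)
            and set(assignment) == trv.intervals())
```

A TRV with three intervals can then be built as `TRV(frozenset({True, False}), (((715, 745), "a"), ((746, 815), "b"), ((816, 845), "c")))`, and a dictionary that assigns a state to only one of those intervals is a partial but not a complete temporal assignment.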
A TBN is a directed acyclic graph in which the nodes are TRVs and the edges indicate
that the source exerts direct causal influence on the destination. Furthermore, the causal
influence is tempered with the thirteen temporal relations described above.
Definition 4 (Temporal Bayesian Network) An ordered pair $(R, E)$ where $R$ is the set
of all variables and $E$ is the set of directed edges between them. Each element of $E$ is of the form
$(r_{cause} \in R, r_{effect} \in R, C \subseteq L)$. $L$ is the set of all possible temporal causal relations (see Def. 5).
While a TBN can only hold TRVs, a special class of TRV is used to represent ordinary random
variables. These TRVs have only one interval, which spans the entire model. For convenience,
these TRVs are referred to as RVs.
What does it mean for one TRV, $r_1$, to exert temporal causal influence on another TRV,
$r_2$? The probability that $r_2$ takes on some particular state depends on $r_1$ taking on
some state on some interval fitting the temporal relation, e.g. "no interval in $r_2$ can have
state true unless that interval is before an interval in $r_1$ having state true." This is written
$r_1(\{>\}, \text{MAP})r_2$ with every $(i, r) \in T(r_2)$ having $P(r \mid \ldots, \neg r_1) = 0.0$. The MAP is an OR
function mapping from a set of RVs in $r_1$ to a single state in $\Sigma$ of $r_1$. The OR function is
defined below.
Definition 5 (Temporal Causal Relation (TCR)) A TCR is a relation between two TRVs
$(r_1, r_2)$. The TCR specifies a function mapping a set of RVs in $r_1$ to a single element of $\Sigma$,
the states of $r_1$, for use in the pdfs of $r_2$, as well as providing the temporal conditions which
intervals in $r_1$ must meet in order for their RVs to be eligible for membership in the set to be
mapped. The relation is written $r_1(R, \text{MAP})r_2$ where $r_1$ is the causal TRV, $r_2$ is the affected
TRV, $R \subseteq \mathcal{A}$, and MAP is a function mapping from the set of RVs in the TRV meeting the
temporal constraints in $R$ to a single element of $\Sigma(r_1)$. In the graphical representation of
the TBN, the relation is drawn as an edge from $r_1$ to $r_2$ labeled with the pair $(R, \text{MAP})$.
In order to preserve Bayesian syntax/semantics, the value of a TRV can only appear as
a singleton to other variables in the TBN. This is the role of the map function. A mapping
must be done from the set of values held by the RVs in the TRV to a single element of
$\Sigma$. Just as care must be taken to avoid cycles in Bayesian networks, care must be taken to
ensure that the mapping functions are total. For many models, these maps will be extremely
simple, e.g.
XOR = { ({(r1, true), (r2, true)}, false), ({(r1, false), (r2, true)}, true),
        ({(r1, true), (r2, false)}, true), ({(r1, false), (r2, false)}, false) }
which performs an exclusive or on the set. The map PASSTHROUGH maps from a
singleton RV set to an element of $\Sigma(r_1)$. The following example is for $\Sigma = \{\text{true}, \text{false}\}$:
PASSTHROUGH = { ({(r, true)}, true), ({(r, false)}, false) }
Thus the relationship $r_1(\mathcal{A}, \text{PASSTHROUGH})r_2$, read "$r_1$ exerts direct causal influence
on $r_2$ under all temporal relationships," is equivalent to the causal relation in Bayesian
networks. This relationship does not need to be explicitly stated for relations between RVs
and from RVs to TRVs.
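The two maps can also be written as ordinary functions, which makes the totality requirement easy to test. The sketch below is illustrative only: the function names, the dictionary encoding of "a set of (RV, state) pairs", and the helper `is_total` are assumptions of this example. The XOR map is written as "exactly one RV is true", which agrees with the two-RV table above.

```python
from itertools import product


def xor_map(assignment):
    """Map {rv_name: bool} to True when exactly one RV in the set is true."""
    return sum(bool(value) for value in assignment.values()) == 1


def passthrough(assignment):
    """Map a singleton {rv_name: state} to that state."""
    if len(assignment) != 1:
        raise ValueError("PASSTHROUGH is defined only on a singleton RV set")
    (value,) = assignment.values()
    return value


def is_total(map_fn, rv_names, states=(True, False)):
    """Check that map_fn yields a legal state for every joint assignment."""
    for combo in product(states, repeat=len(rv_names)):
        try:
            result = map_fn(dict(zip(rv_names, combo)))
        except ValueError:
            return False
        if result not in states:
            return False
    return True
```

Under this encoding, `is_total(xor_map, ["r1", "r2"])` holds, while PASSTHROUGH is total only over a single RV, which matches its use for TRVs that have exactly one interval.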
If each random variable and temporal random variable is assigned, then the TBN is said
to be completely assigned. The set of all of these assignments and their associated random
variables forms a complete assignment to the TBN.
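Definition 6 (Complete Assignment) The set $C$ is a complete assignment iff
1. $\forall (r, \alpha) \in C$, $r \in R$ and $\alpha$ is a TA.
2. $\forall (r_1, \alpha_1), (r_2, \alpha_2) \in C$, $r_1 = r_2 \Rightarrow \alpha_1 = \alpha_2$.
3. $\forall r \in R\ \exists (r_c, \alpha) \in C$ such that $r = r_c$.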
A partial assignment is a partial specification of the state of the TBN consisting of a
subset of the variables of the TBN and the associated temporal assignments. More formally:
Definition 7 (Partial Assignment) The set $C_p$ is a partial assignment of the TBN iff
1. $\forall (r, \alpha) \in C_p$, $r \in R$ and $\alpha$ is a PTA.
2. $\forall (r_1, \alpha_1), (r_2, \alpha_2) \in C_p$, $r_1 = r_2 \Rightarrow \alpha_1 = \alpha_2$.
A partial assignment, $A_{p1}$, is said to be a subset of another partial assignment, $A_{p2}$ (denoted $A_{p1} \sqsubseteq A_{p2}$), if every $(r_{p1}, \alpha_{p1})$ in $A_{p1}$ (except those having $\alpha_{p1} = \emptyset$) has a corresponding
$(r_{p2}, \alpha_{p2})$ in $A_{p2}$ such that $r_{p1} = r_{p2}$ and $\alpha_{p1} \subseteq \alpha_{p2}$.
3 First Model Temporal Bayesian Network (FMTBN)
We now present a more restricted TBN formulation:
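1. Definition 8 (First Model TRV (FMTRV)) A FMTRV is a TRV $A = (T, \Sigma)$
where $\Sigma = \{\text{true}, \text{false}\}$, $0 < |T| < \infty$, and
(a) $\forall (i_1, r_1), (i_2, r_2) \in T$
i. $\neg(i_1\ \text{disjoint}\ i_2) \Rightarrow (i_1, r_1) = (i_2, r_2)$
ii. $(i_1 < i_2) \Rightarrow (r_2\ \text{dependent}\ r_1) \wedge \neg(r_1\ \text{dependent}\ r_2)$
(b) $(\exists (i_1, r_1),\ r_1 = \text{true}) \Rightarrow (\forall (i_2, r_2),\ r_2 = \text{false} \vee (i_1, r_1) = (i_2, r_2))$
2. All relations between RVs are $(\mathcal{A}, \text{PASSTHROUGH})$, there are no relations between TRVs,
and relations from TRVs to RVs are $(\mathcal{A}, \text{XOR})$.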
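The structural parts of Definition 8 can be checked mechanically. The following sketch covers only conditions (a)i and (b) plus the non-empty bound on $|T|$; the dependence ordering of condition (a)ii concerns the probability tables and is not checked here. The function names and the `(start, end)` interval encoding are assumptions of this example, and "disjoint" is taken in the Allen sense given earlier, so intervals that merely meet count as disjoint.

```python
from itertools import combinations


def intervals_disjoint(x, y):
    """Allen-style disjointness {<, >, m, mi}: the intervals share no interior."""
    return x[1] <= y[0] or y[1] <= x[0]


def satisfies_fmtrv(intervals, assignment):
    """Check conditions (a)i and (b) of the FMTRV definition.

    intervals:  list of (start, end) tuples, one per (i, r) pair in T
    assignment: dict mapping an interval to True/False (may be partial)
    """
    if not intervals:
        return False  # 0 < |T| is required
    # (a)i: any two distinct pairs must have disjoint intervals.
    if any(not intervals_disjoint(x, y)
           for x, y in combinations(intervals, 2)):
        return False
    # (b): at most one RV of the TRV may be true.
    return sum(1 for value in assignment.values() if value) <= 1
```

For example, three non-overlapping arrival windows with at most one of them marked true pass the check, whereas marking two windows true violates condition (b).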
This model can represent anything that can be modeled with a Bayesian network as it
provides a superset of the Bayesian network syntax. Also, we can explicitly model an event
that occurs during an interval with the limitation that after the event has occurred, there
is persistence. For example, if a support technician arrived at work between 7:00am and
7:30am, the technician will still be at work at 9:00am, and if the tech leaves work between
4:30pm and 5:00pm, she will still be gone from work at 10:00pm. We model a tech-support
situation (see below) with a three-node TBN (Figure 2).
Tech support is only available if the phones are working and the support technician has arrived at work. The
probability that the phones are working is 0.95. The support tech has a fifty percent chance of starting work
between 7:15am and 7:45am, a 25 percent chance between 7:46am and 8:15am, and a 12.5 percent chance between
8:16am and 8:45am. If the tech is not in by 8:45am, she is not coming in at all.
Figure 2: Poor Support Inc. tech-support TBN with two RVs and one TRV. The probabilities used in the figure are the
dependent probabilities rather than the break-out used in the text description, e.g. (0.5, 0.25, 0.125) becomes (0.5, 0.5, 0.5).
As all probabilities for the RVs in the TRV are dependent on all previous RVs not
happening, the notation P(c|~b->) is used as shorthand for P(c|~b, ~a).
Nodes in Figure 2:
phones-working (pw): P(pw) = 0.95
tech-arrived (ta): ta = {([0715,0745],a), ([0746,0815],b), ([0816,0845],c)}, with P(a) = 0.5, P(b|~a) = 0.5, P(c|~b->) = 0.5
support-available (sa): P(sa|pw, ta) = 0.95, P(sa|~pw, ta) = 0.0, P(sa|pw, ~ta) = 0.0, P(sa|~pw, ~ta) = 0.0
For reasoning over the model, we focus on belief revision (i.e. finding the most probable
explanation). For our example model, there are two normal RVs and one TRV. The two
RVs each have two states, true and false. The TRV also has two states; however, the TRV
is true only for some interval, a, b, or c. This gives a total of $2^2 \times 4 = 16$ possible complete
assignments to our TBN. The cases where the TRV is true for more than one interval do not
need to be considered as they have zero probability.
Now we need to choose which of these sixteen complete assignments is the most probable
explanation given some evidence. A set of evidence is presented in the form of a partial
assignment. For our example, $E = \{(PW, \{([-\infty, \infty], \text{true})\}), (TA, \{([0746, 0815], \text{false})\})\}$
is the PA for "Phones working and technician did not arrive between 07:46 and 08:15."
A complete assignment, $M$, is said to be compatible with $E$ iff $E \sqsubseteq M$. If $E \not\sqsubseteq M$,
then $M$ is incompatible with $E$. A complete assignment $M_c$ in our sample model compatible with $E$ is
$\{(PW, \{([-\infty, \infty], \text{true})\}), (SA, \{([-\infty, \infty], \text{true})\}), (TA, \{([0715, 0745], \text{false}), ([0746, 0815], \text{false}), ([0816, 0845], \text{true})\})\}$.
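The compatibility test reduces to the subset relation between assignments, which is straightforward to code. In this sketch an assignment is assumed to be a dictionary from variable name to a dictionary of interval to state; the names `is_subset`, `ALWAYS`, `E`, and `M_c` are chosen for the example, and clock times are written as plain integers.

```python
INF = float("inf")
ALWAYS = (-INF, INF)  # the single interval spanning the whole model


def is_subset(partial, other):
    """Return True when `partial` is a subset of `other` in the sense above.

    Variables whose temporal assignment in `partial` is empty are skipped.
    """
    for name, pta in partial.items():
        if not pta:
            continue
        ta = other.get(name)
        if ta is None:
            return False
        if any(interval not in ta or ta[interval] != state
               for interval, state in pta.items()):
            return False
    return True


# Evidence: phones working, technician did not arrive between 07:46 and 08:15.
E = {"PW": {ALWAYS: True}, "TA": {(746, 815): False}}

# The complete assignment M_c from the text.
M_c = {
    "PW": {ALWAYS: True},
    "SA": {ALWAYS: True},
    "TA": {(715, 745): False, (746, 815): False, (816, 845): True},
}
```

With these definitions `is_subset(E, M_c)` holds, so $M_c$ is compatible with $E$; flipping the state of the 07:46 to 08:15 interval in `M_c` would make it incompatible.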
In a Bayesian network, the joint conditional probability of a complete assignment is found
using the chain rule [2, pg. 227]. In the FMTBN, the chain rule is also used. We perform a
topological ordering on the nodes of the TBN to find the order of computation. As the TRVs
are not dependent on anything, they are placed at the end of the order. The remainder of
the network is a Bayesian network and the ordering is constructed accordingly.
For our example network, the set of TRVs is $\{TA\}$, the RVs are $\{SA, PW\}$, and our
ordering is $(SA, PW, TA)$. We then write the chain rule for some complete assignment $Z$:
$$P(Z) = P(SA \mid PW, TA)\, P(PW)\, P(TA) \qquad (1)$$
$$P(TA) = P(T \mid A, B, C)\, P(C \mid B, A)\, P(B \mid A)\, P(A) \qquad (2)$$
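where $T$ is an XOR RV which has $P(T = \text{true} \mid A, B, C) = 1.0$ when only one of $A$, $B$, or
$C$ is true, and zero in all other cases. Applying this to the complete assignment $M_c$ above, we
compute
$$P(TA = \{(C, \text{true})\}) = P(T = \text{true} \mid C = \text{true}, B = \text{false}, A = \text{false})\, P(C = \text{true} \mid B = \text{false}, A = \text{false})\, P(B = \text{false} \mid A = \text{false})\, P(A = \text{false})$$
$$= 1.0 \times 0.5 \times 0.5 \times 0.5 = 0.125$$
$$P(M_c) = P(SA = \text{true} \mid PW = \text{true}, TA = \text{true})\, P(PW)\, P(TA = \{(A, \text{false}), (B, \text{false}), (C, \text{true})\})$$
$$= 0.95 \times 0.95 \times 0.125 = 0.1128125$$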
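The arithmetic of this worked example can be reproduced in a few lines. The sketch below assumes the dependent probabilities of Figure 2; the constant and function names are invented for the illustration.

```python
# Dependent probabilities from Figure 2 (assumed encoding).
P_PW = 0.95                    # P(pw)
P_ARRIVE = [0.5, 0.5, 0.5]     # P(a), P(b | not a), P(c | not b, not a)
P_SA_GIVEN_PW_TA = 0.95        # P(sa | pw, ta)


def p_first_arrival(k, dependent=P_ARRIVE):
    """Probability that interval k (0-based) is the one in which the TRV is true.

    All earlier intervals must be false, which contributes a factor (1 - q) each.
    """
    p = dependent[k]
    for q in dependent[:k]:
        p *= 1.0 - q
    return p


p_ta_c = p_first_arrival(2)                 # technician arrives in interval c
p_mc = P_SA_GIVEN_PW_TA * P_PW * p_ta_c     # chain rule, equation (1)
```

Evaluating `p_first_arrival` for the three intervals gives 0.5, 0.25, and 0.125, which are the break-out probabilities of the text description, and `p_mc` evaluates to approximately 0.1128125.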
Definition 9 (Most Probable Explanation (MPE)) Let $B = (R, D)$ be a TBN, let $E$
be an evidential PA, and let $M$ be some complete assignment. $M$ is an MPE iff, for all $A$
where $A$ is a complete assignment and $E \sqsubseteq A$, $P(M \mid E) \ge P(A \mid E)$.
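Since $P(A \mid E) = \frac{P(A \wedge E)}{P(E)}$ and an incompatible complete assignment cannot be an MPE, we
only need to consider those complete assignments for which $E \sqsubseteq M$ as candidates. Thus,
since $E \sqsubseteq A$, we derive $P(A \mid E) = \frac{P(A)}{P(E)}$. Because $P(E)$ is the same for every candidate, the MPE is the compatible complete assignment with the largest $P(A)$.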
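For a model as small as the tech-support example, the MPE can be found by brute-force enumeration of the sixteen complete assignments. The following sketch is not the inference method of this paper; it is a hedged illustration that assumes the Figure 2 probabilities, encodes the TRV state as one of `"a"`, `"b"`, `"c"`, or `None` (never arrived), and passes the evidence as a hypothetical predicate `compatible`.

```python
from itertools import product

P_PW = 0.95
P_ARRIVE = [0.5, 0.5, 0.5]          # dependent probabilities for a, b, c
TA_STATES = ["a", "b", "c", None]   # None: the technician never arrives


def p_ta(state):
    """P(TA = state) using the dependent probabilities of the TRV."""
    p = 1.0
    for name, q in zip("abc", P_ARRIVE):
        if name == state:
            return p * q
        p *= 1.0 - q
    return p  # no interval was true


def p_sa(sa, pw, ta):
    """P(SA = sa | PW = pw, TA): support needs phones and an arrived tech."""
    p_true = 0.95 if (pw and ta is not None) else 0.0
    return p_true if sa else 1.0 - p_true


def joint(pw, sa, ta):
    """Chain rule for one complete assignment (pw, sa, ta)."""
    return p_sa(sa, pw, ta) * (P_PW if pw else 1.0 - P_PW) * p_ta(ta)


def mpe(compatible):
    """Most probable complete assignment among those compatible with evidence."""
    candidates = [z for z in product([True, False], [True, False], TA_STATES)
                  if compatible(*z)]
    return max(candidates, key=lambda z: joint(*z))
```

With the evidence "phones working and technician did not arrive in interval b", written as `mpe(lambda pw, sa, ta: pw and ta != "b")`, the result is `(True, True, "a")`: phones working, support available, and the technician arriving in the first interval, with joint probability 0.95 × 0.95 × 0.5 = 0.45125.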
Following the above method, the FMTBN can be transformed into an equivalent Bayesian
network that can then be used for belief revision and updating [2]. Details of the transform
are available in [5]. The transform generates many conditional probabilities, requiring $2^i$
conditional probabilities for the $i$th interval. As the TRV requires only one conditional
probability per interval, the large number of probabilities introduced by the transform is
wasteful.
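The gap between the two representations grows quickly with the number of intervals. The short sketch below simply totals the counts stated above, assuming intervals are indexed from 1 to $n$; the function names are illustrative.

```python
def transform_probability_count(n):
    """Total conditional probabilities after the transform: sum of 2**i, i = 1..n."""
    return sum(2 ** i for i in range(1, n + 1))


def trv_probability_count(n):
    """The TRV needs one conditional probability per interval."""
    return n
```

For the three-interval tech-arrived TRV this gives 14 probabilities for the transformed network against 3 for the TRV, and in general the first count equals $2^{n+1} - 2$.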
The conciseness of the TRV notation, in terms of probabilities needed, suggests that
using Bayesian networks for computation may not be the best approach. An alternate approach, using a linear constraint system, was developed. This extends an approach developed
in [3] for coding Bayesian networks as a system of linear constraints.
The following steps produce the constraints, variables, and costs.
1. For each TRV $A$
(a) Introduce variables $A_T$ and $A_F$ with cost function $c$ given by
$$c(q[A_T], \text{true}) = -\log\left(\sum_{i=1}^{n} \prod_{j=1}^{i} P(R_j = \text{true})\right), \qquad c(q[A_T], \text{false}) = 0.0$$
$$c(q[A_F], \text{true}) = -\log\left(1 - \sum_{i=1}^{n} \prod_{j=1}^{i} P(R_j = \text{true})\right), \qquad c(q[A_F], \text{false}) = 0.0$$
(b) and the constraint
$$A_T + A_F = 1 \qquad (3)$$
(c) Now let $T = \{(i_1, R_1), (i_2, R_2), \ldots, (i_n, R_n)\}$ such that $i_j < i_{j+1}$ for all $1 \le j < n$.
(d) For each RV $R_j$
i. Introduce variables $R_{jT}$ and $R_{jF}$ with cost functions
$$c(R_{jT}, \text{true}) = -\log\left(P(R_j = \text{true} \mid R_1 = \text{false}, \ldots, R_{j-1} = \text{false})\right)$$
$$c(R_{jF}, \text{true}) = -\log\left(1 - P(R_j = \text{true} \mid R_1 = \text{false}, \ldots, R_{j-1} = \text{false})\right)$$
$$c(R_{jT}, \text{false}) = 0.0$$
$$c(R_{jF}, \text{false}) = 0.0$$
ii. and the constraints $R_{jT} + R_{jF} = 1$ and $R_{jT} = \sum_{k=1}^{j-1} R_{kF} - (j - 1) + 1$.
(e) And finally, to tie $A$ to its component RVs, we add the constraint
$$A_T = \sum_{j=1}^{n} R_{jT}.$$
Constraint 3 ensures that, to the rest of the TBN, TRV $A$ can only take on one value.
Constraint 1(e) forces $A$ to be true iff one of the RVs for its intervals is true. Constraint
set 1(d)ii combined with the cost functions 1(d)i ensures that the cost of an instantiation set
is consistent with the conditional probabilities in the TRV.
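The cost functions are negative logarithms of probabilities, so a product of probabilities becomes a sum of costs, and minimizing total cost is the same as maximizing joint probability. The sketch below illustrates only this correspondence; it does not build the constraint system or call a solver. The helper name `cost` and the treatment of zero probability as infinite cost are assumptions of the example, and the factors are those of the worked assignment $M_c$.

```python
import math


def cost(p):
    """Negative log cost of a probability; zero probability gets infinite cost."""
    return math.inf if p == 0.0 else -math.log(p)


# Factors of P(M_c): P(sa | pw, ta), P(pw), and the three dependent TRV terms.
factors = [0.95, 0.95, 0.5, 0.5, 0.5]

total_cost = sum(cost(p) for p in factors)   # additive, as a linear objective needs
recovered = math.exp(-total_cost)            # back to the joint probability
```

Here `recovered` agrees with the product 0.95 × 0.95 × 0.125 up to floating-point rounding, and any assignment containing a zero-probability factor receives infinite cost and so can never be selected as a minimum.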
Note that the construction of the linear constraint system can be done in time linear in the size
of $T$. Contrast this with the conversion to a Bayesian network, which was exponential in the
size of $T$.
After the linear constraint system has been fleshed out for the TRVs, the method from [3]
is used to provide the rules for the rest of the random variables. An algorithm from the same work,
using mixed Boolean linear programming, allows determination of the most probable explanation.
4 Conclusion
We have presented a new technique for representing probabilistic knowledge in the temporal
domain. The technique, temporal Bayesian networks, is firmly based on the probabilistic
theory of Bayesian networks. A simple model is presented as well as a transformation into
a linear constraint system formulation. The concept of the temporal random variable and
the temporal causal relation hold great promise for modeling complex temporal situations
such as recurring or periodic events.
References
[1] James F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM,
26(11):832-843, 1983.
[2] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan
Kaufmann, San Mateo, CA, 1988.
[3] Eugene Santos, Jr. On the generation of alternative explanations with implications for belief revision.
In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 337-347, 1991.
[4] Eugene Santos, Jr. Unifying time and uncertainty for diagnosis. To appear in the Journal of Experimental
and Theoretical Artificial Intelligence, 1996.
[5] Joel D. Young and Eugene Santos, Jr. Temporal Bayesian networks. Technical report, Department of
Electrical and Computer Engineering, Air Force Institute of Technology, 1996.