Information Integration via Hierarchical and Hybrid Bayesian Networks

Haiying Tu, Jeffrey Allanach, Satnam Singh, Krishna R. Pattipati, Fellow, IEEE, Peter Willett, Fellow, IEEE
Abstract— A collaboration scheme for information integration
among multiple agencies (and/or various divisions within a single
agency) is designed using hierarchical and hybrid Bayesian networks (HHBNs). In this scheme, raw information is represented
by transactions (such as communication, travel, financing), and
information entities to be integrated are modeled as random
variables (such as: an event occurs, an effect exists, or an action
is undertaken). Each random variable has certain states with
probabilities assigned to them. “Hierarchical” refers to the
model structure, and “hybrid” stems from our use of both general
Bayesian networks (BNs) and hidden Markov models (HMMs, a
special form of dynamic BNs). The general Bayesian networks are
adopted in the top (decision) layer to address global assessment
for a specific question (e.g., “Is target A under terrorist threat?”
in the context of counter-terrorism). HMMs function in the
bottom (observation) layer to report processed evidence to the
upper layer BN based on the local information available to
a particular agency or a division. A software tool, termed
the adaptive safety analysis and monitoring (ASAM) system,
is developed to implement HHBNs for information integration
either in a centralized or in a distributed fashion. As an example,
a terrorist attack scenario gleaned from open sources is modeled
and analyzed to illustrate the functionality of the proposed
framework.
Index Terms— Information integration, decision making, hidden Markov models, Bayesian networks, counter-terrorism.
I. INTRODUCTION
A. Motivation
Decision making in the modern information era is a
complex process. Not only are the sources of information diverse, distributed and possibly conflicting, but the acquired
information is also very likely to be noisy, dynamic, incomplete and uncertain. In complex decision making scenarios, a strategic decision is supported by collaboration among multiple agencies
(or multiple divisions in a single agency), wherein each agency
has access to a portion of the total information, and may only
be responsible for part of the problem under consideration. The
key issues involve not only identifying valuable information
in a timely fashion and sharing this information across agencies
in an efficient manner, but also integrating large volumes of
disparate information to support strategic decision making.
Information technologies including information integration
are vital to the national security and world-wide counter-terrorism operations [1]. Terrorist organizations are typically
elusive, geographically distributed across many countries,
highly dynamic and adaptive. Consequently, raw information
from various intelligence agencies is noisy, scattered and
evolving over time. However, analysis of prior terrorist attacks
suggests that a high-magnitude terrorist attack requires certain
enabling events to take place [2]. We term the raw information
entities about the terrorist events “transactions”; this raw
information is filtered, processed into a summary form, and
reported to a higher level agency. The higher level agency can
be viewed as a “fusion center” that integrates the summarized
information, thus possibly providing early warning to facilitate
preemption and/or support strategic decision making.
This work was supported by Aptima Inc., Woburn, MA 01801, USA.
The authors are with the Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-2157, USA. Jeffrey
Allanach now works at Applied Physical Science Corp., New London,
CT 06320, USA.
Information integration covers numerous research areas,
such as data mining, information extraction, machine learning,
constraint reasoning, databases, view integration, web services,
and other related areas. This paper utilizes Bayesian networks
(BNs) and hidden Markov models (HMMs, a special form of
dynamic BNs) as basic modelling techniques to address the
information integration problem, with the identification and
analysis of the terrorist threats as the application background.
HMMs are hosted in lower level (sensing) agencies that
serve as information filters; that is, they take transactions
as inputs and provide local assessments as outputs that are
transformed into soft evidence (i.e., local decisions and the
concomitant confidence levels). BNs are maintained by higher
level (decision making) agencies functioning as fusion centers,
and pool the summarized information (in the form of soft
evidence) to support global decisions. BNs and HMMs are
therefore graphically constructed in a hierarchical fashion in
our modeling framework, resulting in hierarchical and hybrid
Bayesian networks (HHBNs).
Why do HHBNs make sense for information integration?
First, a HMM is natural in situations where there is no direct
access to true states of the environment having an underlying
Markovian structure. Analogous to the target tracking problem
where states (location, velocity, etc.) are observed through
noisy measurements, the true states of terrorist activities are
detected against a background of “noise” transactions (observations). Thus, HMMs provide a realistic representation for
information processing in intelligence agencies tasked to identify terrorist activities. Intelligence agencies may only obtain
partial information of the complete pattern of transactions
associated with a particular terrorist activity. Consequently,
the hidden states of the terrorist activity are observed through
another set of stochastic processes that produce the sequence
of observable transactions. The key problem here is to detect
a suspicious pattern (i.e., a HMM corresponding to a terrorist
activity) and assess its likelihood, given a sequence of partial
and noisy observed transactions. Secondly, a BN incorporates
uncertainty in a cause-effect modeling framework. BNs are
useful for both inferential exploration of previously undetermined relationships among variables as well as descriptions
of these relationships upon discovery [3]. Different pieces of
information from various agencies may be related to each
other, if they potentially belong to the same attack plan. More
information can be inferred, if not directly collected from
agencies, according to prior knowledge of causal relationships.
With additional choices of counter-terrorism actions, a BN can
even be used to suggest optimal action strategies (i.e., the best
courses of action). Finally, the hierarchical structure of HHBN
is naturally suited to information integration across multiple
collaborating agencies monitoring the information space.
A software tool, termed the adaptive safety analysis and
monitoring (ASAM) system, is developed to implement the
HHBNs for information integration in either a centralized or
a distributed fashion. Although the ASAM system itself is
developed for counter-terrorism applications, the underlying
HHBN theory and the prototype software tool have broad
applications in command and control, strategic framework
for business information, information pooling in economic
organizations, to name a few.
B. Related Work
HMMs are powerful statistical techniques for modeling
sequential data. Although they are best known for their
successful application in speech recognition, HMMs have also
been used in many other areas, such as DNA sequence
analysis, robot control, fault diagnosis [4], and signal detection
[5] [6]. Tutorials on HMMs can be found in [7]
[8].
BNs, also known as probabilistic networks, causal networks
or belief networks, are a formalism for representing uncertainty in a way that is consistent with the axioms of probability
theory [9]. With the assumption of conditional independence,
BNs can model complex systems in well-structured and easily
interpretable ways. A fairly large set of theoretical concepts
and results can be found in [10]–[12].
Research efforts also cover combinations of different models
or extensions of general HMMs and BNs. Fine et al. [13]
generalized the HMM inference and learning processes to
hierarchical hidden Markov models (HHMMs), and demonstrated the usability of HHMMs in hand-written English text
recognition. A combination of HMM and decision tree, termed
the hidden Markov decision tree (HMDT), can be found in
[14]. Hierarchical Bayesian networks (HBN) are introduced in
[15] to represent additional information about the structure of
the domains of variables. Although represented hierarchically,
the HBN inference algorithm is the same as that for a general
BN when applied on a fully “flattened” HBN. In [16], a hybrid
HMM/BN model is proposed to supplement acoustic spectrum
features in speech recognition. The HMM is used for modeling
temporal speech characteristics and a state probability model
is represented by the BN. As far as we are aware, this is the
only work in the literature which combines HMMs and BNs
in a single model, and our work is quite different in both the
application background and the representation details.
BN or HMM-related research has been brought into the
national security community as well. Coffman and Marcus
[17] employ HMMs to identify groups with suspicious behaviors based on communication patterns among the group
members. An anti-terrorist risk management tool, termed
“Site Profiler®”, was introduced in [18]. This tool applies
knowledge-based BN construction and allows one to combine
evidence from analytical models, simulations, historical data,
and user judgments. Paté-Cornell and Guikema [19] present
a model in the form of an influence diagram (a variant
of the BN) for setting priorities among threats and among
countermeasures, based on probabilistic risk analysis, decision
analysis, and elements of game theory. While the methods in
[18] and [19] are consistent with our approach to fuse the
data from many sources at the BN layer, we introduce HMMs
at a lower (observation or sensing) layer to automate the
detection of terrorist activities, and to report the concomitant
soft evidence to the BN layer.
C. Organization of the Paper
The remainder of the paper is organized as follows. The
proposed HHBN model is described in Section II. The theoretical background on information processing using HHBN
is discussed in Section III. In Section IV, we describe the
ASAM system designed to implement the HHBN scheme. The
Indian Airlines Flight IC-814 hijacking scenario is modeled
and analyzed in Section V using the HHBN scheme. Finally,
we conclude the paper with a summary and an outline of our
current track of research in Section VI.
II. THE HHBN MODEL
As mentioned earlier, a HHBN model is a hierarchical
combination of BNs and HMMs, which can be arranged in
multiple layers. We demonstrate our key ideas of information
integration with a two-layer model in this paper, and discuss
other possibilities in Section VI.
A typical HHBN model is shown in Fig. 1. It consists of a
BN at the top layer serving as a fusion center, and several
HMMs (two HMMs belonging to two agencies are
shown in this figure) at the bottom layer serving as information
filters. The HMMs process the raw information and provide soft
evidence to the corresponding BN nodes; the BN itself is
maintained by another agency.
Fig. 1. HHBN model structure. [Figure omitted. Labels: BN level (nodes V1 to V5, evidence node EV4, Agency 3); HMM level (HMM1 (Λ1) at Agency 1 with states S1, S2; HMM2 (Λ2) at Agency 2 with states S1′, S2′, S3′); observations X1 to X3 in the observation space (Ω). Information flow: transactions (raw information), confidence measurement (local information), soft evidence (updated information), global belief (integrated information).]
Formally, a HHBN is a triplet $\langle M_{BN}, M_{HMMs}, R \rangle$, where
1) $M_{BN}$ denotes the top layer BN model. It contains $N$
random variables (BN nodes) $\{V_i \mid i = 1, 2, \cdots, N\}$,
and random variable $V_i$ has $Q_i$ discrete states ($i = 1, 2, \cdots, N$).
We will use upper case
letters for the random variables and lower case letters
for the instances of the random variables hereafter. The
relationships among random variables are constrained by
the model structure; viz., a directed acyclic graph (DAG)
obeying the usual conditional independence assumptions.
Specifically, the arcs (links) between the BN nodes represent probabilistic causal dependencies. The function of
$M_{BN}$ can therefore be characterized by the joint
probability distribution function
$$P(v_1, v_2, \cdots, v_N) = \prod_{i=1}^{N} P(v_i \mid pa(v_i)),$$
where $pa(v_i)$ is a possible instantiation of the parent nodes of $V_i$; the factorization is derived using the chain
rule of probability and conditional independence [20].
To be precise, given the state of a node’s parents, all
the ancestors are conditionally independent of the node.
Here, we use “parents” to depict the direct fan-in nodes,
and “ancestors” to represent the parents’ parents, and so
on [21]. The conditional probabilities $\{P(v_i \mid pa(v_i)) \mid i = 1, 2, \cdots, N\}$
constitute the numerical parameters of the
model; they correspond to the conditional probability
tables (CPTs) in the discrete case. The size of a CPT
grows exponentially with the number of parent nodes. In many
cases, the parent nodes can be assumed to be marginally
independent and linked to the effect node via Noisy-OR logic, which will limit the conditional probabilities
to a reasonable size. When the Noisy-OR assumption is
not valid and the scale of the CPTs is a concern, one
can introduce intermediate causal nodes to reduce the
fan-in of a single node. The nodes are either partially
observable or probabilistically inferable based on the
network structure.
2) $M_{HMMs}$ is a set of discrete time, finite state HMMs
$\{HMM_i \mid i = 1, 2, \cdots, M\}$ functioning at the bottom layer. A discrete HMM itself is a five-tuple
$\langle S, X, A, B, \Pi \rangle$, where $\Lambda = (A, B, \Pi)$ represents the
set of model parameters, i.e., the state transition matrix,
the emission matrix, and the prior probabilities of the
states. In the rest of this paper, we may use the
HMM parameter set $\{\Lambda_i \mid i = 1, 2, \cdots, M\}$ to represent the corresponding HMMs, with $\lambda_i$ or $\bar{\lambda}_i$ denoting
$HMM_i$ being active or inactive, respectively. Here, $S = \{S_1, S_2, \cdots, S_{N_S}\}$ denotes the set of finite states, and
$X = \{X_1, X_2, \cdots, X_{N_X}\}$ is the set of possible observations. $M_{HMMs}$ represents multiple hypotheses of the
environment (e.g., diverse patterns of terrorist activities);
the objective is to detect which HMM is active (or, which
of several HMMs are active) at a certain time index $k$
based on the available observations up to time $k$.
Unlike traditional HMMs, the states as well as the possible observations in our HMMs are themselves
networks in this paper. Such a network contains nodes and
arcs, where the nodes are the keywords (person, target,
etc.) in the terrorist activity modeling; an arc between
two nodes creates a transaction. We will further clarify
this representation in Section V in the context of the Indian
Airlines hijacking example.
3) The relation $R$, a key concept of the HHBN model,
provides the bridge between the top layer BN and the
bottom layer HMMs. $R$ is a set of associations $\{R_{ijk} \mid i \in (1, 2, \cdots, M);\ j \in (1, 2, \cdots, N);\ k \in (1, 2, \cdots, Q_j)\}$
with $R_{ijk} = 1$ implying that $HMM_i$ is assigned to
state $k$ of BN node $V_j$. In Fig. 1, $HMM_1$ (or $\Lambda_1$) is
assigned to the BN node $V_1$ and $HMM_2$ (or $\Lambda_2$) is
assigned to the BN node $V_4$. The node states are not
specified in this figure. However, as we will explain in the
next section, a binary BN node (i.e., a node having two
mutually exclusive states) has a one-to-one relationship
with one HMM along with its alternative hypothesis ($\lambda_1$
vs. $\bar{\lambda}_1$, where $\lambda_1$ means $HMM_1$ is active and $\bar{\lambda}_1$ means
$HMM_1$ is inactive).
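The factorization of the top layer BN and the Noisy-OR simplification described in item 1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the ASAM implementation: the three-node network, its link probabilities, and the helper names `noisy_or_cpt` and `joint_probability` are hypothetical, and all nodes are assumed to be binary.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Build P(child = 1 | parents) for binary parents under Noisy-OR.

    link_probs[i] is the (assumed) probability that parent i alone
    activates the child; only len(link_probs) numbers are needed
    instead of a full table with 2**len(link_probs) free entries.
    """
    cpt = {}
    for states in product((0, 1), repeat=len(link_probs)):
        p_child_off = 1.0 - leak
        for is_on, p in zip(states, link_probs):
            if is_on:
                p_child_off *= 1.0 - p
        cpt[states] = 1.0 - p_child_off
    return cpt


def joint_probability(assignment, parents, cpts):
    """Chain-rule factorization: P(v_1..v_N) = prod_i P(v_i | pa(v_i))."""
    prob = 1.0
    for node, value in assignment.items():
        pa_states = tuple(assignment[p] for p in parents[node])
        p_one = cpts[node][pa_states]
        prob *= p_one if value == 1 else 1.0 - p_one
    return prob


# Hypothetical network: V1 -> V3 <- V2 (numbers are illustrative only).
parents = {"V1": (), "V2": (), "V3": ("V1", "V2")}
cpts = {
    "V1": {(): 0.2},
    "V2": {(): 0.1},
    "V3": noisy_or_cpt([0.8, 0.6]),
}

# The joint distribution must sum to one over all instantiations.
total = sum(
    joint_probability(dict(zip(parents, values)), parents, cpts)
    for values in product((0, 1), repeat=len(parents))
)
```

With Noisy-OR, a node with n binary parents is specified by n link probabilities (plus an optional leak term), which is the size reduction referred to in the text.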
In order to complete our model description, the definitions
of several notions are provided here. They are used in Fig. 1
and/or frequently appear in the rest of the paper.
1) Transactions: A transaction is a link between key nodes.
For example, a simple event “an unknown person purchased chemicals” can be modeled as a transaction,
where “unknown person” and “chemicals” are the key
nodes and there is a link (arc) between them denoting
a transaction called “collecting resources”. Typically, the
transactions occurring in the real world could be classified
into two groups: “signal” (“harmful”) transactions and
“noise” (“benign”) transactions. The former, i.e., signal
transactions, are represented in a HMM state; and the
latter, i.e., the noise transactions, are not. The noise
transactions are treated as clutter.
2) Observations: Observations are the inputs to HMMs; they
are a series of transactions among suspicious people,
places, and things with a time stamp associated with each
transaction denoting the event occurrence time.
3) Patterns: A pattern is the time evolution of different
transactions. Each HMM state sequence can be viewed as
a hypothetical pattern. The patterns are typically gleaned
from past statistics or subject matter experts.
4) Confidence measurement: The output of a HMM is a
confidence measurement, which is the likelihood ratio of
the observation sequence up to the current time
under the alternative hypotheses (e.g., $\lambda_1$ vs. $\bar{\lambda}_1$).
5) Evidence: Evidence is a BN term. The evidence for a particular BN node can be observed as one
of its states, called hard evidence; or, the evidence may
be observed with uncertainty, i.e., soft evidence.
6) Soft evidence: Soft or virtual evidence is the most general
type of evidence introduced to reflect uncertainty [22].
For a node without parents, soft evidence is equivalent to
modifying the prior probability of that node; otherwise,
soft evidence on a variable $V$ (i.e., a node in a BN) is represented by a reported state $v$ together with its conditional
probability vector $P(V = v \mid H_i)$, $i = 1, 2, \cdots, Q$, for
all the $Q$ states, where $H_i$ denotes the hypothesis that the
true state is $i$.
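As an illustration of definition 6), the following Python sketch applies soft evidence to a single parentless node by weighting its prior with the reported likelihood vector and renormalizing. It is a minimal sketch under stated assumptions: the function name `apply_soft_evidence` and the numbers are hypothetical, and evidence on a node with parents would additionally have to be propagated by the BN inference engine.

```python
def apply_soft_evidence(prior, likelihood):
    """Update a parentless node with soft (virtual) evidence.

    prior[i]      : P(V = i) before the report arrives
    likelihood[i] : P(reported state | H_i), i.e. the probability of
                    the report if the true state were i
    Returns the posterior P(V = i | report), proportional to
    prior[i] * likelihood[i].
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    normalizer = sum(unnormalized)
    if normalizer == 0.0:
        raise ValueError("the report is impossible under the prior")
    return [u / normalizer for u in unnormalized]


# Hypothetical binary node: state 0 = inactive, state 1 = active.
prior = [0.9, 0.1]
# Report "V = 1" that is right 80% of the time and a false alarm 20%.
likelihood = [0.2, 0.8]
posterior = apply_soft_evidence(prior, likelihood)

# Hard evidence is the special case of a 0/1 likelihood vector.
hard = apply_soft_evidence(prior, [0.0, 1.0])
```

The hard-evidence case at the end shows why soft evidence is described as the more general type: a likelihood vector of zeros and ones reproduces a direct observation of the state.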
The right side of Fig. 1 shows the information flow associated with the HHBN model. Raw information arrives as
sequences of transactions, which constitute the inputs to the
HMMs. HMMs, based on the partition of the observation
space, detect the “signal” transactions (if any), and report the
local decisions and the corresponding confidence to higher
layer BN nodes. Since only the active HMMs will report
their findings and trigger the BN inference, the HMMs are
essentially running on a faster time scale than the
BN. The confidence measurement from the active HMMs is
then transformed into soft evidence, and is used to update the
evidential nodes (BN nodes assigned by R). Newly arriving
evidence is thus propagated through the BN structure using
the inference scheme of the BN. The details of how to process
and propagate the information via the HHBN model will be
discussed in the next section.
How does one specify the model parameters (viz., conditional probability tables, transition probabilities)? While these
parameters could be estimated using a learning algorithm such
as EM (or the Baum-Welch algorithm [23] in the classical HMM
phraseology), in a data-scarce environment such as counter-terrorism it is doubtful whether one can obtain enough training
data to learn the model (including the structure and model
parameters)¹. Our approach has been to develop an initial
model based on our understanding of the domain, and seek
review and feedback from the subject matter experts, as in
[18] and [19].
III. INFORMATION PROCESSING WITH HHBN
In this section, we will discuss the theoretical foundation
of the HHBN model. We address how the HMMs filter noisy
transaction data and produce local information, how the local
information becomes soft evidence, and how the BN handles
the soft evidence from the HMMs and integrates it for a
global assessment. The information transformation between
the HMMs and BN is our primary focus.
The three basic problems solvable using HMMs are [24]:
1) Evaluation: Evaluating the probability of a sequence of
observations given a particular HMM.
2) Decoding: Finding the most likely sequence of state
transitions (i.e., the most likely path) associated with an
observed sequence.
3) Training: Adjusting the parameter set Λ to maximize the
probability of generating an observed sequence via the
Baum-Welch algorithm.
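The first two problems can be sketched with the textbook forward and Viterbi recursions. The Python code below is an illustrative sketch only: the two-state model and its numbers are hypothetical, observations are assumed to be integer symbol indices, and no scaling is applied, so long sequences would underflow in practice.

```python
def forward_likelihood(A, B, pi, obs):
    """Evaluation: P(x_1..x_k | Lambda) as the sum of forward variables."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for x in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][x]
            for j in range(n)
        ]
    return sum(alpha)


def viterbi_path(A, B, pi, obs):
    """Decoding: most likely state sequence for the observations."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for x in obs[1:]:
        prev = delta
        # Best predecessor of each state j at this step.
        ptr = [max(range(n), key=lambda i: prev[i] * A[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] * A[ptr[j]][j] * B[j][x] for j in range(n)]
        backpointers.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical left-to-right model with two states and two symbols.
A = [[0.7, 0.3], [0.0, 1.0]]   # state transition matrix
B = [[0.9, 0.1], [0.2, 0.8]]   # emission matrix
pi = [1.0, 0.0]                # prior state probabilities

likelihood = forward_likelihood(A, B, pi, [0, 1])
best_path = viterbi_path(A, B, pi, [0, 1])
```

A production implementation would normally rescale the forward variables at each step or work in the log domain.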
In this paper, we will assume that the model parameters
are known and fixed. That is, we focus on the evaluation and
decoding problems associated with HMMs. In the context of
counter-terrorism, the states and observations of the HMMs
are snapshots of the transactions associated with the modeled
terrorist activities; graphically, they are terrorist networks with
the instantiated nodes and links. The HMM parameter set
$\Lambda = (A, B, \Pi)$ represents the probability of moving from the
current state of terrorist activity to another (usually denoting an
increase in terrorist threat), the probability of observing a new
set of suspicious transactions given the current state, and the
probability of initial threat, respectively [25]. The HMMs accept a series of transactions among suspicious people, places,
and things as inputs. The goal of a HMM algorithm is to detect
the “signal” transactions, which are embedded in many noise
transactions, in a timely fashion.
¹ It may be possible to learn HMMs that model benign behavior. Better
inference of these frequent occurrences translates to easier removal of such
“clutter”.
Fig. 2. Three cases of the observation space for two HMMs ($\Omega_1$ and $\Omega_2$): (a) independent, (b) overlapping, (c) intersecting.
Given new observed evidence (e), the probabilistic inference in a BN has four tasks [26]:
1) Belief updating, P (V = v|e);
2) Finding most probable explanation (MPE);
3) Finding maximum a posteriori probability estimate of
network state (MAP);
4) Finding maximum expected utility (MEU) decision.
The current realization of the HHBN model (with the
ASAM system) considers the first task, viz., belief updating
only; the other three tasks will be included later. For example,
the MEU task is of interest when suggesting preemptive
counter-terrorism actions to a threat.
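Belief updating, the first task above, can be illustrated by brute-force enumeration over a small network. The sketch below is not how a BN engine computes beliefs in practice (enumeration is exponential in the number of nodes); the function `belief`, the three binary nodes and their probabilities are all hypothetical.

```python
from itertools import product


def belief(query, evidence, parents, cpts):
    """P(query = 1 | evidence) by summing the joint over unobserved nodes.

    parents[n] : tuple of parent names of node n
    cpts[n]    : {parent_states: P(n = 1 | parent_states)} for binary nodes
    evidence   : {node: observed_state} (hard evidence)
    """
    nodes = list(parents)
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[n] != v for n, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = 1.0
        for n in nodes:
            p_one = cpts[n][tuple(assignment[q] for q in parents[n])]
            p *= p_one if assignment[n] == 1 else 1.0 - p_one
        denominator += p
        if assignment[query] == 1:
            numerator += p
    return numerator / denominator


# Hypothetical network: V1 -> V2 and V1 -> V3.
parents = {"V1": (), "V2": ("V1",), "V3": ("V1",)}
cpts = {
    "V1": {(): 0.1},
    "V2": {(0,): 0.05, (1,): 0.9},
    "V3": {(0,): 0.2, (1,): 0.7},
}

prior_belief = belief("V1", {}, parents, cpts)
after_one = belief("V1", {"V2": 1}, parents, cpts)
after_two = belief("V1", {"V2": 1, "V3": 1}, parents, cpts)
```

Each additional piece of supporting evidence raises the belief in V1, which is the qualitative behavior expected of the fusion center.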
The BN evolution is triggered by the evidence from HMMs
and/or directly observed evidence (viz., hard evidence on the
BN nodes). Since the belief updating with hard evidence is
the basic function of BN inference algorithms, we do not
address this issue in the paper. With uncertainties in the raw
information as well as the nature of HMM detection statistics,
the HHBN model considers the soft evidence gleaned from
the likelihoods transmitted from HMMs to the BN; the soft
evidence measures the confidence of the corresponding HMM
in detecting the monitored terrorist activities.
Fig. 2 illustrates three cases of how the observation space
may be clustered for multiple HMMs (two HMMs are used as an example): independent, overlapping and intersecting (e.g., the model
in Fig. 1 illustrates the third case). When the HMMs are
based on independent observation spaces, the collaboration or
information integration is among agencies that are monitoring
different aspects of the problem: they access different
databases to obtain their raw information. In contrast, both
the overlapping and intersecting spaces correspond to
cases where the collaborating agencies share, entirely or
partially, the same data source, such as a transaction database.
Consequently, the independent case only requires a decoupled
tracking scheme, while the overlapping and intersecting cases
require a multiple hypothesis tracking scheme. In the latter
cases, the soft evidence requires additional processing to make
it statistically meaningful.
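The choice of tracking scheme can be sketched as a simple set comparison. The code below rests on an assumption that is not stated explicitly in the text: "overlapping" is taken to mean that one observation space is entirely contained in the other, and "intersecting" that the two spaces share only part of their transactions. The function names and transaction labels are hypothetical.

```python
def observation_space_case(omega1, omega2):
    """Classify two observation spaces (collections of transaction types)."""
    a, b = set(omega1), set(omega2)
    if a.isdisjoint(b):
        return "independent"
    if a <= b or b <= a:
        # Assumption: entirely shared data source.
        return "overlapping"
    # Assumption: partially shared data source.
    return "intersecting"


def tracking_scheme(case):
    """Decoupled tracking only when no observations are shared."""
    if case == "independent":
        return "decoupled tracking"
    return "multiple hypothesis tracking"


# Hypothetical transaction types monitored by two agencies.
travel = {"visa", "flight"}
finance = {"wire", "cash"}
mixed = {"flight", "wire"}

case_a = observation_space_case(travel, finance)
case_b = observation_space_case(travel, travel | finance)
case_c = observation_space_case(travel, mixed)
```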
5
A. Independent Tracking
In the independent case, a binary hypothesis test can be constructed for each HMM. Specifically, instead of evaluating the
probability of a sequence of observations up to a specified
discrete time index $k$ given a particular HMM as in the usual
evaluation problem, we are interested in a hypothesis testing
problem with the null hypothesis $H_0$ as pure noise (“benign”
or random transactions) and $H_1$ as a HMM ($HMM_1$ parameterized by $\Lambda_1$, for example) of interest (viz., “terrorist activity”)
being detected at a specified discrete time index $n_0$. The details
of a single HMM detection scheme based on Page’s test [27]
are given in Appendix A. With the forward variables in Page’s
test, we have:
$$P(x_1, x_2, \cdots, x_k \mid \lambda_1) = \sum_{i=1}^{N_S} \alpha_k(i) \qquad (1)$$
where $N_S$ is the total number of states in $HMM_1$ and the $\alpha_k$’s
are the forward variables (defined in equation (14) of Appendix
A). Using the likelihood ratio, or the so-called confidence
measurement (with $x_1^k$ denoting the sequence of observations
$\{x_1, x_2, \cdots, x_k\}$), we have:
$$L(x_1^k) = \frac{P(x_1^k \mid \lambda_1)}{P(x_1^k \mid \bar{\lambda}_1)}. \qquad (2)$$
We can calculate the posterior probability of the HMM via:
$$\begin{aligned}
P(\lambda_1 \mid x_1^k) &= \frac{P(x_1^k \mid \lambda_1) P(\lambda_1)}{P(x_1^k)} \\
&= \frac{P(x_1^k \mid \lambda_1) P(\lambda_1)}{P(x_1^k \mid \lambda_1) P(\lambda_1) + P(x_1^k \mid \bar{\lambda}_1) P(\bar{\lambda}_1)} \\
&= \frac{L(x_1^k) L_0}{L(x_1^k) L_0 + 1}.
\end{aligned} \qquad (3)$$
Here, $P(\lambda_1)$ is the prior belief on the existence of $HMM_1$,
and $L_0 = P(\lambda_1)/P(\bar{\lambda}_1)$ is the prior odds ratio. The posterior
probability is the agency’s belief on the existence of $HMM_1$
based on the observations up to time index $k$. It is the
probability of detection in the BN layer, thus forming the soft
evidence to update the BN inference as discussed in Appendix
B. Briefly, when $HMM_1$ is associated with a binary BN
node “V” (with state “1” associated with $\lambda_1$ and state “0”
associated with $\bar{\lambda}_1$) and $HMM_1$ is detected, we will augment the initial BN with a
dummy node “EV”, which has the same set of states as “V”
and an incoming link from node “V”. The BN
belief updating is triggered by the hard evidence “EV = 1”
(since the local agency reports that the HMM is active) with a
conditional probability table (CPT) constructed from $P(\lambda_1 \mid x_1^k)$
and $1 - P(\lambda_1 \mid x_1^k)$ to represent the uncertainties in the evidence.
Actually, only the column corresponding to “EV = 1” in the
CPT will be of interest for belief updating, as we can see
from the two equations in Appendix B. It is also feasible that
multiple HMMs report to different states of the same non-binary BN node. However, the states of a BN node are
assumed to be mutually exclusive, thus creating a
conflict if more than one HMM reports as being active to
the same node at the same time. We assume that this issue
is resolved in the modeling process, where we design binary
BN nodes to collect information from individual HMMs, while
adding intermediate nodes to specify the possible relationships
and semantics among active HMMs.
Fig. 3. Detection of a single HMM in the presence of “noise” background.
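The conversion from a likelihood ratio to a posterior probability in (3) amounts to multiplying by the prior odds and mapping odds back to a probability. The Python sketch below illustrates this; the function names and the numerical values are hypothetical, and the log-domain variant is included only because likelihood ratios of long sequences are usually handled as logarithms.

```python
import math


def posterior_from_likelihood_ratio(likelihood_ratio, prior):
    """Posterior P(lambda_1 | x) = L * L0 / (L * L0 + 1), as in (3)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)          # L0
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (posterior_odds + 1.0)


def posterior_from_log_likelihood_ratio(log_lr, prior):
    """Same quantity computed from log L, avoiding overflow."""
    log_odds = log_lr + math.log(prior / (1.0 - prior))
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical numbers: likelihood ratio 20, prior belief 1%.
p = posterior_from_likelihood_ratio(20.0, 0.01)
p_log = posterior_from_log_likelihood_ratio(math.log(20.0), 0.01)
```

With a small prior, even a likelihood ratio of 20 yields a modest posterior (about 0.17 here), which is why the confidence measurement is reported together with the decision rather than treated as hard evidence.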
An example of the detection of a terrorist network via Page’s
test is illustrated in Fig. 3. $HMM_1$ is detected at time unit
$n_0 = 60$ with the threshold of detection “h” set at 20. Again, a
HMM only reports its confidence measurement to the BN node
when it is detected (viz., the CuSum test statistic associated
with Page’s test is above the threshold).
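The thresholding behavior of the CuSum statistic can be sketched in its generic textbook form. The code below is not the HMM-specific scheme of Appendix A: it assumes that a log-likelihood-ratio increment is already available for each new observation, and the function name `page_test` and the sample increments are hypothetical.

```python
def page_test(log_lr_increments, threshold):
    """Generic Page (CuSum) test.

    The statistic accumulates log-likelihood-ratio increments and is
    reset to zero whenever it would become negative, so evidence for
    the null hypothesis does not accumulate indefinitely.
    Returns (detection_index, history); the index is 1-based, or None
    if the threshold is never exceeded.
    """
    statistic = 0.0
    history = []
    for k, increment in enumerate(log_lr_increments, start=1):
        statistic = max(0.0, statistic + increment)
        history.append(statistic)
        if statistic > threshold:
            return k, history
    return None, history


# Hypothetical increments: noise first, then signal transactions.
increments = [-1.0, -1.0, 2.0, 2.0, 2.0]
detected_at, cusum = page_test(increments, threshold=5.0)
never, _ = page_test([-1.0, -0.5, -2.0], threshold=5.0)
```

The reset to zero is what lets the detector respond quickly once signal transactions begin, regardless of how long the preceding noise-only period lasted.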
B. Multiple Hypothesis Tracking
When multiple HMMs share a data source, the inference
becomes essentially a multiple hypothesis tracking (MHT)
or multiple target tracking problem [28] because the HMMs
now compete for the observations (i.e., the association of
transactions to HMMs is uncertain). The HMMs with such
a superimposed observation space are similar to the so-called
factorial HMMs [29]. This paper follows a technique developed in [6], which is an extension of the single HMM detection
case, to solve this multiple HMMs detection problem.
For illustrative purposes, consider two HMMs: $HMM_1$ and
$HMM_2$. Without loss of generality, we assume that at most
one HMM can be activated or deactivated at a time. The
valid multiple hypothesis tests are constructed as shown in
Fig. 4. In each test, $H_0$ represents the null hypothesis and
$H_1$ or $H_2$ is the alternative hypothesis. The term “NULL”
implies that none of the HMMs is active, i.e., $\bar{\lambda}_1 \bar{\lambda}_2$. MHT
starts with the independence assumption (test #1) that is
identical to independent tracking. Once one of the HMMs
is detected (i.e., significant transactions show that the
underlying terrorist activity is active), a new hypothesis test
is formed (either test #2 or #3, based on which of the two
HMMs is detected first). A simulation result in the form of the
CuSum test statistic is shown in Fig. 5. The ground truth [2]
for the simulation is superimposed in the figure, where $HMM_1$
is actually active from $k = 1$ to $k = 150$ and $HMM_2$
is truly active from $k = 50$ to $k = 92$. The decision can
be made based on either $L(x_1^k) = P(x_1^k \mid \lambda_1 \bar{\lambda}_2)/P(x_1^k \mid \bar{\lambda}_1 \bar{\lambda}_2)$
or $L(x_1^k) = P(x_1^k \mid \bar{\lambda}_1 \lambda_2)/P(x_1^k \mid \bar{\lambda}_1 \bar{\lambda}_2)$ exceeding some predefined threshold $h$ ($h = 20$ in this simulation). It is evident
that $HMM_1$ is detected first at time $k = 25$ as shown in Fig.
5(a), which causes a transition from test #1 to test #2 in Fig.
4. Starting from $k = 25$, a new test is generated to track if
both $HMM_1$ and $HMM_2$ are active, given that $HMM_1$ is
already detected and still valid. Fig. 5(b) shows that the new
test statistic exceeds the threshold at time $k = 60$. In this case,
$L(x_1^k) = P(x_1^k \mid \lambda_1 \lambda_2)/P(x_1^k \mid \lambda_1 \bar{\lambda}_2) \geq h$. The new test result
causes a transition from test #2 to test #4 as shown in Fig. 4.
Extension to more than two HMMs is straightforward.
Fig. 4. Illustration of tests for two HMMs. The arrows represent test
outcomes: for example, in test #2, $HMM_1$ can disappear, or be joined by
$HMM_2$.
Test #1: H0: NULL; H1: HMM1 only; H2: HMM2 only
Test #2: H0: HMM1 only; H1: HMM1 and HMM2; H2: NULL
Test #3: H0: HMM2 only; H1: HMM1 and HMM2; H2: NULL
Test #4: H0: HMM1 and HMM2; H1: HMM1 only; H2: HMM2 only
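The four tests of Fig. 4 can be viewed as a small state machine. The Python sketch below encodes the hypotheses listed in the figure and assumes, as the walk-through above suggests, that the hypothesis accepted in one test becomes the null hypothesis of the next test. The names `TESTS`, `TEST_WITH_NULL` and `next_test` are hypothetical.

```python
# Each hypothesis is the set of HMM indices assumed active.
TESTS = {
    1: {"H0": frozenset(), "H1": frozenset({1}), "H2": frozenset({2})},
    2: {"H0": frozenset({1}), "H1": frozenset({1, 2}), "H2": frozenset()},
    3: {"H0": frozenset({2}), "H1": frozenset({1, 2}), "H2": frozenset()},
    4: {"H0": frozenset({1, 2}), "H1": frozenset({1}), "H2": frozenset({2})},
}

# Look up the test whose null hypothesis is a given set of active HMMs.
TEST_WITH_NULL = {hyps["H0"]: test_id for test_id, hyps in TESTS.items()}


def next_test(current_test, accepted):
    """Return the test to run after `accepted` ('H1' or 'H2') is accepted.

    Assumption: the accepted hypothesis becomes the new null hypothesis.
    """
    active_hmms = TESTS[current_test][accepted]
    return TEST_WITH_NULL[active_hmms]


def single_change_only():
    """Check that every alternative differs from H0 by exactly one HMM."""
    return all(
        len(hyps["H0"] ^ hyps[alt]) == 1
        for hyps in TESTS.values()
        for alt in ("H1", "H2")
    )


# Walk-through from the text: HMM1 is detected, then HMM2 joins it.
after_first = next_test(1, "H1")
after_second = next_test(after_first, "H1")
```

The helper `single_change_only` verifies that the table respects the stated assumption that at most one HMM is activated or deactivated at a time.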
A major output of the MHT is the likelihood function of the
observation sequence given multiple HMMs to be detected [6],
e.g., $P(x_1^k \mid \lambda_1, \lambda_2)$. However, we require the marginal posterior
probabilities of individual HMMs to be reported to BN, i.e.,
$P(\lambda_i \mid x_1^k)\ \forall i$. Suppose we are currently dealing with hypothesis
test #2 in the previous example, and that both HMMs are
detected (viz., accept “$H_1$”). The marginal probabilities can
then be approximated by:
$$P(\lambda_1 \mid x_1^k) \doteq P(\lambda_1 \bar{\lambda}_2 \mid x_1^k) + P(\lambda_1 \lambda_2 \mid x_1^k) \qquad (4)$$
$$P(\lambda_2 \mid x_1^k) \doteq P(\lambda_1 \lambda_2 \mid x_1^k) \qquad (5)$$
The first and the second posterior probabilities in (4) come
from the hypotheses $H_0$ and $H_1$ in test #2, respectively.
Generally, this marginal posterior probability is approximated
via:
$$P(\lambda_i \mid x_1^k) \doteq \sum_{\lambda_i \in H_j} P(H_j \mid x_1^k) \quad \forall i \qquad (6)$$
i.e., sum over all the posterior probabilities where the current
hypothesis covers the HMM of interest ($HMM_i$ is active in
this hypothesis). The joint posterior probabilities are determined in a way similar to the independent tracking case. For
example,
$$P(\lambda_1 \lambda_2 \mid x_1^k) = \frac{P(x_1^k \mid \lambda_1 \lambda_2) P(\lambda_1 \lambda_2)}{P(x_1^k)} = \frac{L(x_1^k) L_0}{L(x_1^k) L_0 + 1} \qquad (7)$$
where
$$\begin{aligned}
P(x_1^k) &= P(x_1^k \mid \lambda_1 \lambda_2) P(\lambda_1) P(\lambda_2) + P(x_1^k \mid \lambda_1 \bar{\lambda}_2) P(\lambda_1) P(\bar{\lambda}_2) + P(x_1^k \mid \bar{\lambda}_1 \bar{\lambda}_2) P(\bar{\lambda}_1) P(\bar{\lambda}_2) \\
&\approx P(x_1^k \mid \lambda_1 \lambda_2) P(\lambda_1) P(\lambda_2) + P(x_1^k \mid \lambda_1 \bar{\lambda}_2) P(\lambda_1) P(\bar{\lambda}_2)
\end{aligned} \qquad (8)$$
with
$$L(x_1^k) = P(x_1^k \mid \lambda_1 \lambda_2)/P(x_1^k \mid \lambda_1 \bar{\lambda}_2) \qquad (9)$$
and
$$L_0 = P(\lambda_2)/(1 - P(\lambda_2)). \qquad (10)$$
HMMs are assumed to be marginally independent (independent in the absence of observations) in (8).
Fig. 5. Detection of multiple HMMs. (a) Detection of modeled HMM1 at k = 25. (b) Detection of HMM1 and HMM2 in the presence of HMM1.
IV. SOFTWARE IMPLEMENTATION
The adaptive safety analysis and monitoring (ASAM) system is developed based on the HHBN architecture and aims
to support the collaborative analysis of intelligence information. As shown in Fig. 6, the ASAM system consists of five
functional modules: a graphical modeling tool, a knowledge
repository, HMM engines, a BN engine, and a web browser.
The modules of the ASAM system can be either locally hosted
or distributed via a network connection. While the former
case can be used for demonstration purposes or for prototype
testing, the latter is the deployment structure. For example,
the knowledge repository and the web service can be hosted
on a secure server, while authorized users can still access the
ASAM web site from anywhere via the internet; the modeling
tool can be installed where the modeling expertise resides;
the location of the BN engine and HMM engines should be
consistent with the geographical distribution of the agencies
(or divisions) involved.
Fig. 6. The ASAM architecture. [Figure omitted: Web GUI, BN Engine, HMM Engine, Knowledge Repository, and Graphical Modeling in TEAMS®, either on a local host or distributed via network connection.]
The HMM-related algorithms are implemented in the HMM
engine, and the BN engine is implemented with the support
of the BN API (Application Programming Interface) “SMILE”
[30]. The graphical modeling tool is developed using TEAMS®
(Testability Engineering and Maintenance Systems [31]). We
utilize the hierarchical modeling capability of TEAMS® , which
was expanded to include inputs related to BNs (e.g., states
and conditional probabilities), HMMs (e.g., Markov chains
and transition probabilities), and the relationship between them
(who reports to whom). The complete model information is
then exported into the knowledge repository, which is currently
hosted in a MySQL database. The modeling process is offline
and requires subject matter experts (i.e., intelligence analysts)
to enter the scenarios (in the form of BNs) and terrorist
activity templates (in the form of HMMs). A snapshot of the
TEAMS® interface for the BN layer modeling is illustrated in
Fig. 7. This model will be used in the example in Section V.
The conditional probability table is for the highlighted node:
“Planning And Strategy”.
Once the models are entered, the ASAM system can be deployed for online monitoring or for offline “what-if” analysis.
A typical online monitoring scenario is as follows: various
agencies run their local HMMs to detect patterns of terrorist
events in the transaction space, and transmit their beliefs
of the events, as well as the observed transactions into the
repository. The BN, which also runs in real-time, obtains the
new information of related HMMs from the repository in the
form of soft evidence, updates the overall network beliefs, and
saves the inference results back into the repository. The analyst
can query any model, the HMM results, BN inference results
or transactions of interest via a web browser. The graphical
results such as the HMM confidence estimates or the state
probabilities of user-specified BN nodes are displayed on the
web browser in real-time.
An offline usage scenario of the ASAM system is similar to
the online usage scenario, except that an analyst can perform
“what-if” studies by editing existing transactions or adding
others into the transaction sequences, and can re-execute the
BN and HMM engines to generate a new set of results to inject
their subjective assessments into the analysis process. A simple
“what-if” analysis example is as follows: an analyst, examining
a detected transaction sequence from the web page, may
realize that an unobserved transaction must have happened and
thus one more transaction should be added at a certain time
index². He can test this subjective assumption by adding a new
transaction, and re-executing the HMM and BN engines in an
offline mode. The analyst can compare the new results with
the original ones and assess whether the results are sensitive
to the newly added transaction.
V. EXAMPLE: INDIAN AIRLINES HIJACKING MODEL
As discussed in the previous sections, the HHBN models are
transaction-based. A pattern of transactions is a potential realization of a possible event such as a hijack, suicide bombing,
or attacking an infrastructure target as in a counter-terrorism
application. A specific event scenario can be decomposed into
groups of transactions, and each group is assigned to the state
of a Hidden Markov chain. A BN model represents the overall
threat from diverse scenarios, with each scenario modeled as a
HMM. In this section, a hijacking scenario gleaned from open
sources is modeled and analyzed. A HHBN model related to
the Athens Olympics, as well as the general modeling process, can
be found in [32].
On December 24, 1999, Indian Airlines (IA) flight IC-814, flying from Kathmandu to New Delhi with 180 persons
on board, was hijacked by a group of terrorists. The stand-off
ended on December 31st when the Indian government released
three high profile terrorists from a Kashmir jail. Our Indian
Airline Hijacking model abstracts the IA flight IC-814 hijacking event, and is created based on open source information
from the Embassy of India [33] and the Frontline Magazine
[34]. The model contains patterns of terrorist activities that
are present in the actual hijacking. The people, places and
things involved in the IA hijacking events are encapsulated
in non-specific nodes in an attempt to develop a canonical
representation of any airline hijacking.
Fig. 8 shows the BN model with representative prior probabilities and conditional probability tables. The Bayesian node
labeled “PU” depicts the level of political unrest between India
and Pakistan over the issue of Kashmir. Another Bayesian
node labeled “Activity” represents the activity level of terrorist
organizations in Kashmir. In the following simulations, the
prior probabilities associated with the BN nodes are held
constant, while the statistical inferences calculated by the
underlying HMMs (“Planning and Strategy”, “Collect Resources” and “Preparations for Hijacking”) update the soft
evidence of the corresponding BN nodes. The final, or global,
effect of these individual terrorist activities causes the BN
node, “Hijack”, to change with respect to the current belief –
the state of which (in the form of a probability mass function)
shows the likelihood of a hijacking taking place.
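As a rough illustration of how HMM confidences could move the “Hijack” belief, the following Python sketch marginalizes the “Hijack” conditional probability table over its three parent nodes. It rests on two assumptions that are not statements from the text: the table values are read from the flattened Fig. 8 with Planning as the slowest-varying parent and Resources as the fastest, and the three parent beliefs are treated as independent, which ignores the links among the parent nodes that a full BN inference would respect. The function name `hijack_belief` is hypothetical.

```python
from itertools import product

# P(Hijack = Yes | Planning, Prepare, Resources), keyed by
# (planning, prepare, resources) with 1 = Yes and 0 = No.
# ASSUMPTION: column order inferred from the flattened Fig. 8 table.
HIJACK_YES = {
    (1, 1, 1): 0.99, (1, 1, 0): 0.8, (1, 0, 1): 0.7, (1, 0, 0): 0.5,
    (0, 1, 1): 0.8,  (0, 1, 0): 0.6, (0, 0, 1): 0.5, (0, 0, 0): 0.02,
}


def hijack_belief(p_planning, p_prepare, p_resources):
    """Return P(Hijack = Yes) for given P(parent = Yes) values.

    SIMPLIFICATION: the three parents are treated as independent, so this
    is only a back-of-the-envelope check, not full BN inference.
    """
    beliefs = (p_planning, p_prepare, p_resources)
    total = 0.0
    for states in product((1, 0), repeat=3):
        weight = 1.0
        for state, p_yes in zip(states, beliefs):
            weight *= p_yes if state == 1 else 1.0 - p_yes
        total += HIJACK_YES[states] * weight
    return total


# Example: planning strongly detected, the other two activities less so.
example = hijack_belief(0.9, 0.3, 0.6)
```

A full treatment would instead enter the HMM outputs as soft evidence and rerun inference over the whole network, as described in Appendix B.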
In this model, there are three HMMs (assumed to be originally from independent observation spaces) which symbolize: planning and strategy, resource collection, and preparations for hijacking. The likelihood of these events is associated with the Boolean BN node state: “Yes”. The Markov chains of these three HMMs are shown in Figs. 9, 10 and 11, respectively.
² In fact, a proper HMM will allow for such a “missed detection”, but it may be helpful to see how important it is to the inference.
Fig. 7. Graphical modeling in TEAMS®.
Fig. 8. BN model for Indian airline hijacking.
Prior probabilities:
  PU:         High 0.02    Medium 0.18    Low 0.8
  Activity:   High 0.05    Medium 0.1     Low 0.85
P(Planning | PU, Activity):
  PU        Activity   Planning (Yes)   Planning (No)
  High      High       0.99             0.01
  High      Medium     0.8              0.2
  High      Low        0.3              0.7
  Medium    High       0.8              0.2
  Medium    Medium     0.6              0.4
  Medium    Low        0.1              0.9
  Low       High       0.7              0.3
  Low       Medium     0.6              0.4
  Low       Low        0.02             0.98
P(Resources | Planning):
  Planning   Resources (Yes)   Resources (No)
  Yes        0.98              0.02
  No         0.02              0.98
P(Prepare | Resources, Planning):
  Resources   Planning   Prepare (Yes)   Prepare (No)
  Yes         Yes        0.98            0.02
  Yes         No         0.4             0.6
  No          Yes        0.6             0.4
  No          No         0.02            0.98
P(Hijack | Planning, Prepare, Resources):
  Planning   Prepare   Resources   Hijack (Yes)   Hijack (No)
  Yes        Yes       Yes         0.99           0.01
  Yes        Yes       No          0.8            0.2
  Yes        No        Yes         0.7            0.3
  Yes        No        No          0.5            0.5
  No         Yes       Yes         0.8            0.2
  No         Yes       No          0.6            0.4
  No         No        Yes         0.5            0.5
  No         No        No          0.02           0.98
The evolution of planning activities, political ideology and
general goals of the terrorist organization are depicted in
the HMM: “Planning and Strategy”. Political instability associated with a terrorist organization induces them to set up
bases/cells in the country X. Parallel to this, fundamentalists
and separatists also announce Holy War against the country
X. Headquarters personnel of terrorist organizations recruit
and train new members with particular talents that can be
employed in the attack. Planners analyze the targets and, in
selecting a target, give attention to seizing installations that
are highly visible and, consequently, would warrant extensive
media coverage. A HMM representation of planning and
strategy for the Indian airlines hijacking problem is illustrated
in Fig. 9. This model has nine states (N = 9) with state
transition probabilities (which form matrix A) labeled next
to the feasible transitions. The transaction network snapshots
corresponding to S1 , S2 and S9 are shown in Fig. 12(a)-(c).
The other states have the same set of nodes, but different
links. The transactions drawn with solid lines in S9 represent the signal
transactions of this state, while the transactions drawn with dashed lines
superimpose the possible signal transactions accumulated from
the state transitions (i.e., the transactions that occurred
before reaching the absorbing state). A transaction links two
nodes of the network, but each state may introduce more than
one new signal transaction. For instance, the assertion that this HMM is in state S1 denotes the network state that “there is a political intent from certain terrorist organizations”; the assertion that this HMM is in state S2 corresponds to the event “enroll fundamentalists from the target country into the terrorist organizations”. A possible state sequence of a HMM is essentially a concatenation of all the transactions in its previous state(s) with the current set of transactions, i.e., a snapshot of a pattern. The prior probability Π for this model is set as: [0.5, 0.5, 0, 0, 0, 0, 0, 0, 0]. This implies that, at the time this HMM is detected, it will be in state S1 or S2, each with a probability of 0.5. The state evolution with the structure in Fig. 9 implies that these two steps (S1 and S2) of the terrorist planning and strategy process can be performed simultaneously. The emission probabilities are assigned by comparing the observation to the state model via the specified probabilities of false alarm and missed detection associated with the model [25].
Fig. 9. Markov chain for HMM: Planning and Strategy. [State diagram; state labels: Political instability associated with a terrorist organization; Set up bases/cells in country X; Fundamentalists announce holy war against country X; Recruits training new members; Planners are assigned; Planners embedded in country X; Planners establish relationship with local smugglers; Planners analyze potential targets; Target is identified.]
Fig. 10. Markov chain for HMM: Collect Resources. [State diagram with states S1 to S8; state labels: Planners meet; Planners get money; Planners establish relationship with local smugglers; Planners collect tools; Planners collect weapons; Planners arrange forged document for Hijackers; Assign tasks to Hijackers; All resources collected.]
Fig. 11. Markov chain for HMM: Preparations for Hijacking. [State diagram with states S1 to S9; state labels: Target airport and flight reconnaissance by Planners; Meeting between Planners and Hijackers; Target airport and flight reconnaissance by Hijackers; Planners go to hidden location; Hijackers assemble at target airport; Arrival of Hijack leader at target airport; Communications with weapons installment team; Weapons embedded on flight; Hijacking.]
Fig. 12. Transaction network snapshots for HMM: Planning and Strategy. (a) Network of S1. (b) Network of S2. (c) Network of S9. [Nodes in each panel: Political intent, Terrorist bases/cells, Terrorist organization, Fundamentalists, Target country, Target, New terrorists, Planners, Local smugglers, Potential targets.]
Fig. 13. Transaction network snapshots for the last states. (a) Last state (S8) of HMM: Collect Resources. [Nodes: Hijackers, Planners, Weapons, Money, Misc. tools, Forged documents, Local smugglers, Target country.] (b) Last state (S9) of HMM: Preparations for Hijacking. [Nodes: Hijackers, Weapons, Planners, Hijack leader, Target airline, Weapon team, Target airport, Hijack flight, Hidden location.]
Once a target is identified, a detailed plan of attack is
developed. Such a plan includes the kinds of demands that will
be made and the means by which they will be communicated
to authorities and the media. The HMM corresponding to
“Collect Resources”, as shown in Fig. 10, tracks the transactions that involve collecting resources to carry out a terrorist
attack. Terrorists begin to function as a group, once their organizational identity is established. The tactical and logistical
requirements of the operation, such as the types of weapons
that will be employed, the means by which the target (an
airplane in this case) will be held, the requirements of satellite
phones and other miscellaneous equipment, are established.
Planners acquire and transport the arms, ammunition, forged
documents and related equipment through interconnections
with local organized crime cells.
The HMM, denoting “Preparations for Hijacking”, as shown
in Fig. 11, demonstrates all the exercises for the hijacking. Planners and hijackers check the target airport and the
target airline. They repeatedly reconnoitre the target airline
to estimate the actions and measures they need to take in
order to neutralize or penetrate whatever security measures
had been established to protect the target. Each hijacker has
an organizational affiliation and identity. The organizational
identities of the hijackers enable them to get more quickly
into the personal roles that they will play throughout the
preparation and duration of the attack. Sometime before the
hijacking, planners hide in secret locations so that security
personnel cannot capture them after the hijacking. The hijack
leader communicates with the weapons team sometime before
the flight departure. When the weapons team informs the hijack
leader that weapons are installed on the plane, the hijack leader
executes the hijacking of the plane with his team. Due to space
limitations, only the last states for the latter two HMMs are
shown in Figs. 13 (a) and (b).
Detection of these modeled HMMs is shown in Fig. 14 in the form of the CuSum test statistic. The evolution of the corresponding Bayesian belief that the airline hijacking occurs is shown in Fig. 15. We speed up the flow of the new transactions (e.g., every two seconds in the figures) for simulation purposes. The real times associated with the IA hijacking events are labeled for reference. The starting point of each HMM detection curve is associated with the first time this HMM is detected; thus, we believe (with certain probability) that the modeled terrorist activity is in progress. A peak probability usually results when this pattern evolves into the absorbing state of the HMM. Once the peak is attained, the numerous unrelated transactions will reduce the confidence in the detection. Thus, the probability in Fig. 14 can decrease for two reasons: because of noise transactions, or simply because the terrorist activities have already reached their goal and do not warrant any further transactions. The BN updates its belief only when HMMs detect significant new evidence. Typically, it merges all available information from diverse sources and generates a global alarm.
VI. SUMMARY AND FUTURE WORK
An information integration scheme using hierarchical and
hybrid Bayesian networks is introduced with counter-terrorism
as an application context. A HHBN model is constructed
from one BN and several HMMs. HMMs function in lower
layer transaction spaces in a fast time-scale, while the BN
is operating in top layer strategy space in a relatively slow
time-scale. An analytical software tool, the ASAM system, is
developed in accordance with the HHBN scheme. The ASAM
system uses HMMs to model the stochastic and dynamic
evolution of terrorist activities, which pertain to a particular
node state in the BN. The HMMs transmit soft evidence to
BN nodes, and the BN inference algorithms integrate the soft
evidence from multiple HMMs into an overall assessment
of terrorist threat. An example terrorist scenario, related to
the Indian Airlines hijacking, was adopted to illustrate the
proposed scheme and test the functionality of the software.
In designing and implementing strategies of response to potential terrorist attacks, it is essential to think beyond the re-occurrence of the last event [19]. Although our methodology of using HHBN and the ASAM system to analyze the information is based on the knowledge of past events, a large spectrum of possible scenarios and hypothetical patterns can be generated using the ASAM modeling tool, supporting the analyst in exploring a range of possible countermeasures as well as in conducting “what-if” analyses.
Fig. 14. Detection of three modeled HMMs in the presence of “noise” background. [Plot “HMM Detection Scheme”: CUSUM statistic versus samples, with curves HMM1, HMM2 and HMM3.]
Fig. 15. The belief of the Indian airline hijacking occurrence. [Plot “Bayesian Network Inference”: probability versus event time, 08/1999 to 12/1999.]
[Event annotations in the plots: “Terrorists Are Planning Attack”, “Terrorists Have Collected All Necessary Resources”, “Attack”.]
Our current work provided a distributed processing structure
for gathering, sharing, understanding, and using information
to assess the evolution of the terrorist activities. In combination with counter-terrorist network models, feasible actions
can be suggested to inhibit potential terrorist threats. More
sophisticated BN models, such as influence diagrams, may be
incorporated in the top level for strategic decision support by
adding action nodes and utility nodes into the BN model.
The HHBN is illustrated with a two-layer model in this
paper. Theoretically, hierarchical HMMs or BNs are also
possible, but dramatically increase the modeling and analysis
complexity. Fig. 16 shows another reasonable model where
the submodels (including HMMs and BNs) are tree-structured.
While HMMs always reside in the bottom layer for information filtering, the local agency can host a local BN for
further analysis. The local analysis results are then propagated
upwards to a higher level agency, and finally in a threat
integration center to arrive at a final decision.
The information flow in Fig. 1 shows upwards propagation.
More research can be done on the alternative direction, i.e.,
propagate backwards to suggest the future possible information that needs to be gathered by the local agencies. In other
words, global assessment can give direction to the information-collection process and thus reduce the probability of future
missed detections. When the evidence from a particular HMM
is always inconsistent with the rest of the BN inference
(evidence conflict), it is very likely that this HMM is malfunctioning and it is preferable to prune it from the network.
This function requires structural adaptation of the network and
will be one of our future research efforts.
In this paper, we assumed the model parameters are derived from interviews of subject matter experts. We are currently expanding the proposed HHBN mechanism to other applications where the data are available (e.g., fault diagnosis, command and control architecture), so that the parameters can be learned from the data. Further, online adaptation of model parameters is feasible with evolving data. We are also continuing to refine the model, and performance/sensitivity tests are underway.
Fig. 16. Tree-structured HHBN model. [Diagram labels: Agency 1, Agency 2, Agency 3; BN1, BN2, BN3; HMM1, HMM2, HMM3; top level BN, second level BNs, second level HMMs, third level HMM.]
ACKNOWLEDGMENT
This work was supported by Aptima Inc. as part of the
NEMESIS (NEtwork Modeling Environment for Structural
Intervention Strategies) project. A preliminary version of this
paper was presented at SPIE 2004 [35].
The authors would like to thank Qualtech Systems Inc. for
the TEAMS® software, and the Decision Systems Laboratory
at the University of Pittsburgh for the “SMILE” Bayesian
Networks API in C++. We thank anonymous reviewers for
valuable comments.
APPENDIX
A. The HMM Detection Scheme
The state transition matrix of the underlying Markov chain associated with a HMM $\langle S, X, A, B, \Pi \rangle$ parameterized by $\Lambda = \langle A, B, \Pi \rangle$ is given by:
$$A = [a_{ij}] = \big[\, p\big(s(k+1) = S_j \mid s(k) = S_i\big) \big], \quad i, j \in \{1, 2, \cdots, N_S\} \tag{11}$$
where $s(k)$ is the state at time $k$. The observation process is represented via the emission matrix:
$$B = [b_{il}] = \big[\, p\big(x(k) = X_l \mid s(k) = S_i\big) \big], \quad i \in \{1, 2, \cdots, N_S\},\ l \in \{1, 2, \cdots, N_X\} \tag{12}$$
The prior probabilities of the Markov states at time $k = 1$ are given by
$$\Pi = [\pi_i] = \big[\, p\big(s(1) = S_i\big) \big], \quad i \in \{1, 2, \cdots, N_S\} \tag{13}$$
An efficient detection scheme based on forward variables and the log likelihood ratio was developed in [25]. The forward variable $\alpha_k(i)$ [36] is defined as the joint probability of the observation sequence and the state at time $k$ given $\lambda$ (meaning that the HMM parameterized by $\Lambda$ is active) as follows:
$$\alpha_k(i) = p\big(x(1), x(2), \cdots, x(k), s(k) = S_i \mid \lambda\big) \tag{14}$$
This variable can be updated recursively via:
$$\alpha_{k+1}(j) = \left[ \sum_{i=1}^{N_S} \alpha_k(i)\, a_{ij} \right] b_{j x(k+1)} \tag{15}$$
with the initial condition
$$\alpha_1(j) = \pi_j\, b_{j x(1)} \tag{16}$$
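The recursion in (15) with the initial condition (16) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the ASAM HMM engine: the function name `forward_variables` and the two-state, two-symbol toy model are assumptions made up for the example, and no scaling is applied, so it is only suitable for short sequences.

```python
import numpy as np


def forward_variables(A, B, pi, obs):
    """Return alpha where alpha[k, i] = p(x(1..k+1), s(k+1) = S_i | lambda).

    A   : (N_S, N_S) transition matrix, A[i, j] = a_ij
    B   : (N_S, N_X) emission matrix,  B[i, l] = b_il
    pi  : (N_S,) prior state probabilities
    obs : sequence of observation symbol indices (0-based)
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    pi = np.asarray(pi, dtype=float)
    alpha = np.zeros((len(obs), A.shape[0]))
    alpha[0] = pi * B[:, obs[0]]                        # initial condition (16)
    for k in range(1, len(obs)):
        alpha[k] = (alpha[k - 1] @ A) * B[:, obs[k]]    # recursion (15)
    return alpha


# Hypothetical toy model: state 1 is absorbing, symbol 1 is "signal-like".
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([1.0, 0.0])

alpha = forward_variables(A, B, pi, [0, 1, 1])
likelihood = alpha[-1].sum()   # sequence likelihood, sum of final forward variables
```

For long transaction sequences the forward variables underflow, so a practical implementation would normalize `alpha[k]` at each step and accumulate the logarithms of the normalizing constants.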
The detection time $n_0$, based on Page’s test [27], can be found via:
$$n_0 = \arg\min_{n} \left\{ \left( \max_{1 \le k \le n} L_k^n \right) \ge h \right\} \tag{17}$$
Here, $h$ is a predefined threshold and $L_k^n$ is the log likelihood ratio of observations $\{x(k), \cdots, x(n)\}$ given by:
$$L_k^n = \sum_{i=k}^{n} \ln \left( \frac{P_{H_1}\big(x(i) \mid x(i-1), \cdots, x(k)\big)}{P_{H_0}\big(x(i)\big)} \right) \tag{18}$$
where H1 and H0 are the alternative hypotheses discussed in Section III, and are consistent for both the independent tracking case and the multiple hypothesis tracking case. The unconditioned denominator comes from the assumption that the “benign” transaction-based observations are independent. The HMM detection scheme, also known as Page’s test or the Cumulative Sum (CuSum) method, is optimal in this case [27].
We use Page’s test to detect a switch from ordinary noise
(“benign”) transactions to those modeled “signal” (“terrorist
activity”) transactions. This is a change detection problem,
wherein the distribution of transactions is different before and
after an unknown time n0 ; and our objective is to detect the
change, if it exists, as soon as possible. Extending Page’s test
to fit the theoretical framework of HMMs is straightforward,
given the forward variables. Recall that at time index $k$,
$$P\big(x(1), x(2), \cdots, x(k) \mid \lambda\big) = \sum_{i=1}^{N_S} \alpha_k(i) \tag{19}$$
where $N_S$ is the total number of states of the HMM. Given this, the conditional probability in (18) is readily solved via:
$$P_{H_1}\big(x(k) \mid x(k-1), \cdots, x(1)\big) = P\big(x(k) \mid x(k-1), \cdots, x(1), \lambda\big) = \frac{\sum_{i=1}^{N_S} \alpha_k(i)}{\sum_{i=1}^{N_S} \alpha_{k-1}(i)} \tag{20}$$
More details on the detection algorithm as well as the use
of HMMs for prediction can be found in [25].
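To make (17), (18) and (20) concrete, the following Python sketch evaluates the test statistic by brute force: for every candidate start index k it runs a normalized forward recursion from the prior and accumulates the log likelihood ratio against an i.i.d. “benign” symbol distribution. It is an illustration under stated assumptions, not the efficient scheme of [25]: the names `log_lr_from_start` and `pages_test`, the benign distribution `p0`, the threshold and the toy model are all hypothetical, and the cost is quadratic in the sequence length.

```python
import numpy as np


def log_lr_from_start(A, B, pi, p0, obs, k):
    """Yield (n, L_k^n) of eq. (18) for n = k, k+1, ..., HMM started at index k."""
    alpha = pi * B[:, obs[k]]
    llr = 0.0
    for n in range(k, len(obs)):
        if n > k:
            alpha = (alpha @ A) * B[:, obs[n]]
        c = alpha.sum()      # P_H1(x(n) | x(n-1), ..., x(k)), cf. eq. (20)
        alpha = alpha / c    # normalize so the next sum is again a conditional
        llr += np.log(c) - np.log(p0[obs[n]])
        yield n, llr


def pages_test(A, B, pi, p0, obs, h):
    """Return the first index n (0-based) with max_k L_k^n >= h, else None (eq. 17)."""
    n_obs = len(obs)
    L = np.full((n_obs, n_obs), -np.inf)   # L[k, n] holds L_k^n for k <= n
    for k in range(n_obs):
        for n, llr in log_lr_from_start(A, B, pi, p0, obs, k):
            L[k, n] = llr
    for n in range(n_obs):
        if L[: n + 1, n].max() >= h:
            return n
    return None


# Hypothetical toy setup: symbol 0 is typical of benign traffic, symbol 1 of the pattern.
A = np.array([[0.5, 0.5],
              [0.0, 1.0]])
B = np.array([[0.3, 0.7],
              [0.1, 0.9]])
pi = np.array([1.0, 0.0])
p0 = np.array([0.9, 0.1])   # i.i.d. benign symbol probabilities (H0)

detected_at = pages_test(A, B, pi, p0, [0, 0, 0, 1, 1, 1, 1], h=3.0)
no_alarm = pages_test(A, B, pi, p0, [0, 0, 0, 0, 0, 0], h=3.0)
```

On this toy sequence the threshold is first crossed at index 4, one sample after the signal-like symbols begin, while the all-benign sequence never raises an alarm. The sketch assumes strictly positive emission probabilities for every observed symbol; a zero would need explicit handling before taking logarithms.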
In our development we have assumed the observations x(k)
to be observed or missed. However, for the (realistic) case
that transactions are imperfectly observed — that is, there is
vagueness — the feature-aided tracking approach of [37] can
be directly applied.
B. BN Belief Updating While Observing Soft Evidence
To simplify the presentation, consider a binary node $V$ with states (0, 1). Define H1 and H0 as the binary hypotheses that node $V$ is in state “1” or “0”, with prior probabilities $P(H_1)$ and $P(H_0) = 1 - P(H_1)$. The two likelihoods $P(V = 1|H_1)$ and $P(V = 1|H_0)$, which form the soft evidence vector, are the probability of detection ($P_D$) and the probability of false alarm ($P_F$), respectively.
In order to illustrate how we update the belief with soft evidence, consider the BN in Fig. 17 with three binary (with state “1” and “0”) nodes A, B and C. Before we receive any evidence, the belief of node C is $P(C = 1) = \sum_{A,B} P(C = 1|A, B)P(A)P(B) = 0.624$. This value is easy to obtain for small networks; however, for larger and more practical networks, efficient algorithms such as junction tree are required to reduce the computation time. A survey on the BN inference algorithms can be found in [38].
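The prior belief of node C can be checked by direct enumeration over the parent states. The short Python sketch below uses the parameter values listed in Fig. 17; the variable and function names are chosen for this illustration only.

```python
from itertools import product

# Parameters from Fig. 17 (1 and 0 are the two node states).
P_A = {1: 0.2, 0: 0.8}
P_B = {1: 0.9, 0: 0.1}
P_C1_GIVEN_AB = {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.6, (0, 0): 0.2}


def belief_c(p_a, p_b):
    """P(C = 1) = sum over A, B of P(C = 1 | A, B) P(A) P(B)."""
    return sum(
        P_C1_GIVEN_AB[a, b] * p_a[a] * p_b[b]
        for a, b in product((1, 0), repeat=2)
    )


prior_c = belief_c(P_A, P_B)   # 0.162 + 0.014 + 0.432 + 0.016 = 0.624
```

Enumeration grows exponentially with the number of parents, which is why junction-tree style algorithms are preferred for larger networks.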
If soft evidence is observed on a node such as node A, and since this evidence is the only source of information, we can directly update the prior probability. As an example, suppose that the HMM corresponding to node A is detected to be active with confidence 0.7; we will then have $P(A = 1) = 0.7$ and $P(A = 0) = 0.3$ and use these priors to update the inference. For a node such as node C, the soft evidence can be modeled as a noisy sensor. Whenever soft evidence is reported to a BN node, a dummy node ($E_C$, for example) is added to represent the output of the sensor, and the link between the physical node and the dummy node characterizes the confidence of the sensor measurement. The soft evidence is represented as a contingency matrix, with elements that are functions of the probability of detection and the probability of false alarm. Without loss of generality, we assume that the sensors have symmetric performance, that is, $P_D + P_F = 1$. This assumption is identical to the idea of normalizing the conditional probabilities in the parlance of BNs. Thus, given the parameters listed in Fig. 17, we update the belief of node C as follows:
$$Q(C = 1) = P(C = 1|E_C = 1) = \frac{P(E_C = 1|C = 1)P(C = 1)}{P(E_C = 1|C = 1)P(C = 1) + P(E_C = 1|C = 0)P(C = 0)} \approx 0.937 \tag{21}$$
The prior probability distribution of node C is the probabilistic belief before the new evidence arrives, viz., P (C =
1) = 0.624 and P (C = 0) = 0.376. We can see that the belief
updating trades off the prior knowledge and the new “dummy”
observation from the soft evidence.
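Both kinds of update described above can be reproduced numerically. The Python sketch below first applies the dummy-node update of (21) to node C, and then shows the simpler case where soft evidence on node A replaces its prior. The function names are hypothetical; the numbers are those of Fig. 17 and the 0.7 confidence used in the text.

```python
from itertools import product


def soft_evidence_update(prior, p_d, p_f):
    """P(C = 1 | E_C = 1) with P(E_C = 1 | C = 1) = p_d and P(E_C = 1 | C = 0) = p_f."""
    numerator = p_d * prior
    return numerator / (numerator + p_f * (1.0 - prior))


# Dummy-node update of eq. (21): prior 0.624, P_D = 0.9, P_F = 0.1.
q_c = soft_evidence_update(0.624, p_d=0.9, p_f=0.1)

# Soft evidence on node A: the HMM confidence 0.7 replaces the prior of A.
P_C1_GIVEN_AB = {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.6, (0, 0): 0.2}
p_a = {1: 0.7, 0: 0.3}
p_b = {1: 0.9, 0: 0.1}
belief_c_after_a = sum(
    P_C1_GIVEN_AB[a, b] * p_a[a] * p_b[b] for a, b in product((1, 0), repeat=2)
)
```

With these inputs `q_c` evaluates to about 0.937, matching (21), and raising the belief in A from 0.2 to 0.7 moves the belief of C from 0.624 to 0.784.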
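Fig. 17. Example for belief updating with soft evidence. [BN with nodes A and B as parents of node C, and a dummy evidence node EC attached to C.]
Soft evidence vector:
        observe 1   observe 0
  H1    PD          1-PD
  H0    PF          1-PF
Prior probabilities:
  A=1 0.2    A=0 0.8
  B=1 0.9    B=0 0.1
P(C | A, B):
  A   B   C=1   C=0
  1   1   0.9   0.1
  1   0   0.7   0.3
  0   1   0.6   0.4
  0   0   0.2   0.8
P(EC | C):
  C   EC=1   EC=0
  1   0.9    0.1
  0   0.1    0.9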
REFERENCES
[1] R. Popp, T. Armour, T. Senator, and K. Numrych, “Countering terrorism
through information technology,” Communications of the ACM, vol. 47,
no. 3, pp. 36–43, March 2004.
[2] S. Singh, J. Allanach, H. Tu, K. R. Pattipati, and P. Willett, “Stochastic
modeling of a terrorist event via the ASAM system,” in IEEE International Conference on SMC, The Hague, The Netherlands, October 10-13
2004.
[3] D. Niedermayer. An introduction to Bayesian networks and their contemporary applications. [Online]. Available:
http://www.niedermayer.ca/papers/bayesian/bayes.html
[4] J. Ying, T. Kirubarajan, K. R. Pattipati, and A. Patterson-Hine, “A hidden
Markov model-based algorithm for fault diagnosis with partial and
imperfect tests,” IEEE Transactions on Systems, Man, and Cybernetics
- Part C: Applications and Reviews, vol. 30, no. 4, pp. 463–473,
November 2000.
[5] B. Chen and P. Willett, “Detection of hidden Markov model transient
signals,” IEEE Transactions on Aerospace and Electronic systems,
vol. 36, no. 4, pp. 1253–1268, December 2000.
[6] ——, “Superimposed HMM transient detection via target tracking
ideas,” IEEE Transactions on Aerospace and Electronic systems, vol. 37,
no. 3, pp. 946–956, July 2001.
[7] L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov
models,” IEEE ASSP Magazine, pp. 4–16, January 1986.
[8] L. R. Rabiner, “A tutorial on hidden Markov models and selected
applications in speech recognition,” Proceedings of the IEEE, vol. 77,
no. 2, pp. 257–286, February 1989.
[9] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.
[10] M. I. Jordan, Learning in Graphical Models. MIT Press, 1999.
[11] F. V. Jensen, An Introduction to Bayesian Networks. UCL Press London,
1996.
[12] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter,
Probabilistic Networks and Expert Systems. Springer-Verlag, 1999.
[13] S. Fine, Y. Singer, and N. Tishby, “The hierarchical hidden Markov
model: Analysis and applications,” Machine Learning, vol. 32, no. 1,
pp. 41–62, July 1998.
[14] M. I. Jordan, Z. Ghahramani, and L. K. Saul, “Hidden Markov decision
trees,” in Advances in Neural Information Processing Systems, M. C.
Mozer, M. I. Jordan, and T. Petsche, Eds. The MIT Press, 1997.
[15] E. Gyftodimos and P. Flach, “Hierarchical Bayesian networks: a probabilistic reasoning model for structured domains,” in Proceedings of the
ICML-2002 Workshop on Development of Representations, E. D. Jong
and T. Oates, Eds. The university of New South Wales, July 2002, pp.
23–30.
[16] S. Nakamura and K. Markov, “A hybrid HMM/Bayesian network
approach to robust speech recognition,” in Proceedings of the Special
Workshop in MAUI (SWIM), Maui, Hawaii, January 12-14 2004.
[17] T. Coffman and S. Marcus, “Dynamic classification of groups through
social network analysis and HMMs,” in IEEE Aerospace Conference,
Big Sky, Montana, March 2004.
[18] L. Hudson, B. Ware, K. Laskey, and S. Mahoney, “An application
of Bayesian networks to antiterrorism risk management for military
planners,” Department of Systems Engineering and Operations Research,
George Mason University, Tech. Rep., 2001.
[19] E. Paté-Cornell and S. Guikema, “Probabilistic modeling of terrorist
threats: a systems analysis approach to setting priorities among countermeasures,” Military Operations Research, vol. 7, no. 4, pp. 5–20,
December 2002.
[20] D. Heckerman and J. S. Breese, “Causal independence for probability
assessment and inference using Bayesian networks,” IEEE Transactions
on Systems, Man & Cybernetics - Part A: Systems and Humans, vol. 26,
no. 6, pp. 826–831, November 1996.
[21] H. Tu, J. Levchuk, and K. R. Pattipati, “Robust action strategies
to induce desired effects,” IEEE Transactions on Systems, Man &
Cybernetics - Part A: Systems and Humans, vol. 34, no. 5, pp. 664–
680, September 2004.
[22] C. Huang and A. Darwiche, “Inference in belief networks: a procedural
guide,” International Journal of Approximate Reasoning, vol. 15, no. 3,
pp. 225–263, 1996.
[23] L. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization technique
occurring in the statistical analysis of probabilistic functions of Markov
chains,” Annals of Mathematical Statistics, vol. 41, no. 1, pp. 164–171,
1970.
[24] C. Zhai. (2003) A brief note on the hidden Markov models. [Online].
Available: http://sifaka.cs.uiuc.edu/course/397cxz03f/hmm.pdf
[25] J. Allanach, H. Tu, S. Singh, P. Willett, and K. R. Pattipati, “Detecting,
tracking, and counteracting terrorist networks via hidden Markov models,” in IEEE Aerospace Conference, Big Sky, Montana, March 2004.
[26] I. Rish and M. Singh. A tutorial on inference and learning in Bayesian networks. [Online]. Available:
http://www.research.ibm.com/people/r/rish/talks/BN-tutorial.ppt
[27] E. Page, “Continuous inspection schemes,” Biometrika, vol. 41, pp. 100–
115, 1954.
[28] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking
Systems. Artech House, 1999.
[29] Z. Ghahramani and M. I. Jordan, “Factorial hidden Markov model,”
Machine Learning, vol. 29, no. 2-3, pp. 245–273, November/December
1997.
[30] Genie/SMILE, Decision Systems Laboratory, University of Pittsburgh.
[Online]. Available: http://www.sis.pitt.edu/~genie
[31] TEAMS® . [Online]. Available: http://www.teamqsi.com
[32] S. Singh, H. Tu, J. Allanach, J. Areta, P. Willett, and K. R. Pattipati,
“Modeling threats,” IEEE Potentials, pp. 18–21, August/September
2004.
[33] Hijacking of Indian Airlines Flight IC-814. [Online]. Available:
http://www.indianembassy.org/archive/IC 814.htm
[34] Frontline Magazine, India, vol. 17, no. 2, January-February 2000. [Online]. Available: http://www.frontlineonnet.com/fl1702/17020040.htm
[35] H. Tu, J. Allanach, S. Singh, K. R. Pattipati, and P. Willett, “The adaptive
safety analysis and monitoring system,” in SPIE Defense and Security
Symposium, Orlando, April 12-16 2004, pp. 153–165.
[36] S. Levinson, L. Rabiner, and M. Sondhi, “An introduction to the application of the theory of probabilistic functions of a Markov process to
automatic speech recognition,” Bell Systems Technical Journal, vol. 62,
pp. 1035–1074, 1983.
[37] Y. Bar-Shalom, T. Kirubarajan, and C. Gokberk, “Tracking with
classification-aided multiframe data association,” IEEE Transactions on
Aerospace and Electronic systems, vol. 41, no. 4, October 2005.
[38] H. Guo and W. Hsu, “A survey on algorithms for real-time
Bayesian network inference,” in the joint AAAI-02/KDD-02/UAI-02
workshop on Real-Time Decision Support and Diagnosis
Systems, Edmonton, Alberta, Canada, 2002. [Online]. Available:
citeseer.ist.psu.edu/guo02survey.html
Haiying Tu received the BS degree in automatic
control from Shanghai Institute of Railway Technology in 1993 and MS in transportation information engineering and control from Shanghai Tiedao
University in 1996. She is currently a Ph.D. student
in Electrical and Computer Engineering at the University of Connecticut (UCONN). Prior to joining
UCONN, she was a lecturer at Tongji University in
Shanghai, China, and also worked at the
Computer Interlocking System Testing Center
of the Ministry of Railways of China.
Her current research interests include organizational design, Bayesian analysis,
fault diagnosis and decision making.
Jeffrey Allanach is a systems engineer for Applied
Physical Sciences (APS) in New London, CT. Prior
to joining APS, he was a graduate student in Electrical and Computer Engineering at the University
of Connecticut (UConn). He received his MS in
May 2005 and BS in December 2003, both from
UConn. Currently, his research interests include signal processing and target tracking.
Satnam Singh is a PhD student at the Systems Optimization Laboratory, University of Connecticut. He
received his MS degree in Electrical Engineering
from the University of Wyoming. His interests are in
signal processing, communication and optimization.
Krishna Pattipati is a Professor of Electrical and
Computer Engineering at the University of Connecticut, Storrs, CT, USA. His research has been
primarily in the application of systems theory and
optimization techniques to complex systems. Prof.
Pattipati received the Centennial Key to the Future
award in 1984 from the IEEE Systems, Man and Cybernetics (SMC) Society, and was elected a Fellow
of the IEEE in 1995. He received the Andrew P.
Sage award for the Best SMC Transactions Paper
for 1999, Barry Carlton award for the Best AES
Transactions Paper for 2000, the 2002 NASA Space Act Award, and the
2003 AAUP Research Excellence Award at the University of Connecticut. He
also won the best technical paper awards at the 1985, 1990, 1994, 2002 and
2004 IEEE AUTOTEST Conferences, and at the 1997 and 2004 Command
and Control Conferences. Prof. Pattipati served as Editor-in-Chief of the IEEE
Transactions on SMC-Cybernetics (Part B) during 1998-2001.
Peter Willett is a Professor of Electrical and Computer Engineering at the University of Connecticut.
Previously he was at the University of Toronto,
from which he received his BS in 1982, and at
Princeton University from which he received his
PhD in 1986. He has written, among other topics,
about the processing of signals from volumetric
arrays, decentralized detection, information theory,
CDMA, learning from data, target tracking, and
transient detection. He is a Fellow of the IEEE, is a
member of the Board of Governors of IEEE’s AES
society, and is a member of the IEEE Signal Processing Society’s SAM
technical committee. He is an associate editor both for IEEE Transactions
on Aerospace and Electronic Systems and for IEEE Transactions on Systems,
Man, and Cybernetics. He was a track organizer for Remote Sensing at the IEEE
Aerospace Conference (2001-2003), and was co-chair of the Diagnostics,
Prognosis, and System Health Management SPIE Conference in Orlando.
He also served as Program Co-Chair for the 2003 IEEE Systems, Man and
Cybernetics Conference in Washington, DC.
