How to Decide What to Do?
A Comparison between Classical, Qualitative and Cognitive Theories
Mehdi Dastani
Utrecht University
The Netherlands
[email protected]
Joris Hulstijn
Utrecht University
The Netherlands
[email protected]
Leendert van der Torre
CWI Amsterdam
The Netherlands
[email protected]
January 10, 2003
Abstract
In this paper Classical Decision Theory (CDT) is compared with three alternative
theories developed in artificial intelligence and agent theory, called Qualitative Decision Theory (QDT), Knowledge Based Systems (KBS) and Belief-Desire-Intention
(BDI) models. All theories are based on an informational attitude (probabilities,
qualitative abstractions of probabilities, knowledge, beliefs) and a motivational attitude (utilities, qualitative abstractions, goals, desires). All theories represent a set
of alternatives (a small set, logical formulas, decision variables, branches of trees).
Some theories focus on the decision rule (maximize expected utility, Wald criterion)
and others on the way to reach decisions (deliberation, agent types). We also consider the relation between decision processes and intentions, and the relation between
game theory and norms and commitments. We show that QDT, KBS and BDI are
sometimes alternative solutions to the same problem, but in general they are complementary. For example, QDT offers qualitative decision rules that can make trade-offs
between normal and exceptional circumstances, QDT and BDI offer logical specification languages, KBS offers goal-based decision making, and BDI offers stability in
decision processes.
1 Introduction
In recent years there has been new interest in the foundations of decision making, due
to the automation of decision making in the context of tasks like planning, learning and
communication. In artificial intelligence and agent theory various alternatives to classical decision theory have been developed, rooted in different research traditions and with
different objectives. The relations between the different theories that aim to explain the
decision-making behavior of rational agents are not well studied.
We are interested in relations between the various approaches to decision making. In
this paper we consider four theories: Classical Decision Theory (CDT) [42, 28] developed
within economics and used within operations research, a Qualitative variant of Decision
Theory (QDT) developed in artificial intelligence [36, 6], Knowledge Based Systems (KBS)
[34], also developed in artificial intelligence, and Belief-Desire-Intention (BDI) models,
developed in philosophy and agent theory [7, 39, 13, 15, 29]. The main distinction between
CDT and QDT on the one hand, and KBS and BDI on the other hand, is that the latter
two describe decision making in terms of deliberation on cognitive attitudes such as beliefs,
desires, intentions, goals, etc. They are therefore usually called theories of deliberation
instead of decision theories.
The alternative theories criticize classical decision theory, which is our starting point
in the comparison. This criticism goes back to Simon’s notion of limited (or bounded)
rationality, and his introduction of utility aspiration levels [46]. This has led to the notion
of a goal in KBS. The research area of QDT developed much more recently in artificial
intelligence out of research on models for reasoning under uncertainty. It focusses on
theoretical models of decision making with potential applications in planning. The research
area of BDI developed out of philosophical arguments that – besides the knowledge and
goals used within KBS – also intentions should be first class citizens of a cognitive theory
of deliberation.
The following example of Doyle and Thomason [22] illustrates the criticism of classical
decision theory. It is based on the automation of human financial advice dialogues, which
have been studied in linguistics [52]. Consider a user who seeks advice about financial
planning. The user may for example want to retire early, secure a good pension and
maximize the inheritance for her children. We can model the decision of such a user in terms
of the usual decision theoretic parameters. The user can choose between a limited number
of actions: retire at a certain age, invest her savings and give certain sums of money to her
children per year. The user does not know all factors that might influence the decision. She
does not know if she will get a pay raise next year, the outcome of her financial actions is
uncertain, and not even the user’s own preferences are clear. For example, securing her own
pension conflicts with her children’s inheritance. An experienced decision theoretic analyst
interactively guides the user through the decision process, indicating possible choices and
desirable consequences. As a result the user may drop her initial preferences. For example,
she may prefer to continue working for another five years before retiring. According to
Doyle and Thomason, this interactive process of preference elicitation cannot be automated
in decision theory itself, although there are various approaches and methodologies available
in decision theoretic practice. The problem is that it is difficult to describe the alternatives
in this example. Another problem is that CDT is not suitable to model generic preferences.
This example can also be used to illustrate the differences between the objectives and
concepts of the three alternative theories. Most importantly, the different theories provide different conceptualizations of decision-making behavior, all of which may be helpful.
They focus on automating and modeling different aspects of the preference elicitation phase.
QDT uses the same conceptualization as CDT. Preferences are typically uncertain, formulated in general terms, dependent on uncertain assumptions and subject to change. Often
a preference is expressed in the form of a trade-off: is this really what the user wants,
and how much is she prepared to sacrifice for it? KBS model and formalize a particular
task and domain in terms of the knowledge and goals of a user [34]. Knowledge-based
systems use high-level conceptual models of particular domains such as medical, legal or
engineering domains, and aim to re-use inference schemes applicable for particular tasks,
such as classification or configuration. In many applications, reasoning with knowledge
and goals is implemented by production rules with either forward or backward chaining.
There have been many successful applications of KBS. Development methods for KBS
have matured. Schreiber et al. [43] for example, provide an elaborate methodology for
modeling, developing and testing knowledge-based applications. BDI is a cognitive theory
of decision making behavior of rational agents and aims at describing the decision making
behavior in terms of cognitive concepts such as beliefs, desires, and intentions. In the BDI
theory, an agent can behave rationally in different ways depending on its type. The belief
and desire concepts correspond closely to the concepts used in alternative decision theories, and the intention concept, which can be considered as a previous decision, is proposed
to constrain the alternatives from which agents can choose. The concept of intention is
claimed to stabilize the decision making behavior through time.
In this paper we do not only discuss the four basic theories, but we also consider some
extensions of them. In particular, we also consider the relation between decision processes
and intentions, and the relation between game theory and norms and commitments. A
summary of all related theories is given in Table 1. Some concepts can be mapped easily
on concepts of other theories. For example, all theories use some notion of an informational
attitude (probabilities, qualitative abstractions of probabilities, knowledge or beliefs) and a
notion of motivational attitude (utilities, qualitative abstractions, goals or desires). Other
concepts are more ambiguous, such as intentions. For example, in goal-based planning
the goals have a desirability aspect as well as an intentional aspect [20]. KBS have been
developed to formalize goal-based reasoning, and some qualitative decision theories like [6]
have been developed as a criticism of the inflexibility of the notion of a goal in goal-based
planning.
                CDT                           QDT                           KBS / BDI
theory          classical decision theory     qualitative decision theory   knowledge-based systems
+ time          (Markov) decision processes   decision-theoretic planning   belief-desire-intention logics & systems
+ multiagent    classical game theory         qualitative game theory       normative systems (BOID)

Table 1: Theories discussed in this paper
Clearly, the discussions in this paper have to be superficial and cannot do justice to the
subtleties defined in each approach. We therefore urge the reader to read the original papers
we discuss. To restrict the scope of our comparison, we make some simplifications. First,
we discuss QDT in more detail than KBS and BDI, because it is closer to CDT and has been
positioned as an intermediary between CDT and KBS/BDI [22]. Second, we do not try to
be complete, but we restrict our comparison to some illustrative examples. In particular,
for the relation between CDT and QDT we discuss Doyle and Thomason [22] and Pearl
[36]. For the relation between QDT and KBS/BDI we focus on the different interpretations
of goals, and our discussion is based on Boutilier [6] and Rao and Georgeff [39]. For the
direct relation between CDT and KBS/BDI we discuss Rao and Georgeff’s translation of
decision trees to beliefs and desires [38]. The extension with time in decision processes
focusses on the role of intentions in Rao and Georgeff’s work [39] and the extension with
multiple agents in game theory focusses on the role of norms in a logic of commitments
[9]. We restrict ourselves to formal theories and logics, and do not go into architectures or
into the philosophical motivations of the underlying cognitive or social concepts.
The layout of this paper is as follows. In Section 2 we discuss informational and
motivational attitudes, alternatives and decision rules in classical and qualitative decision
theory. In Section 3 we discuss goals in qualitative decision theory, knowledge based
systems and BDI systems. In Section 4 we compare classical decision theory and Rao
and Georgeff’s BDI approach. Finally, in Section 5 we discuss intentions and norms in
extensions of classical decision theory that deal with time in processes, and that deal with
multiple agents in game theory.
2 Classical vs qualitative decision theory
In this section we give a comparison between classical and qualitative decision theory,
following Doyle and Thomason’s introduction to qualitative decision theory [22].
2.1 Classical decision theory
In classical decision theory, a decision is a choice made by some entity of an action from
a set of alternatives. It has nothing to say about actions – either about their nature or
about how a set of them becomes available to the decision maker. A decision is good if
it is an alternative that the decision maker believes will prove at least as good as other
alternative actions. Good decisions are formally characterized as actions that maximize
expected utility, a notion involving both belief and goodness. See [22] or [42, 28] for further
explanations.
Definition 1 Let A stand for the set of actions or alternatives. With each action, a set of
outcomes is associated. Let W stand for the set of all possible worlds or outcomes. (Outcomes
are usually represented by Ω; here we use W to facilitate the comparison with other theories.)
Let U be a measure of outcome value that assigns a utility U(w) to each outcome w ∈ W, and let
P be a measure of the probability of outcomes conditional on actions, with P(w|a) denoting
the probability that outcome w comes about after taking action a ∈ A in the situation under
consideration.
The expected utility EU(a) of an action a is the average utility of the outcomes associated with the alternative, weighing the utility of each outcome by the probability that
the outcome results from the alternative, that is, EU(a) = Σ_{w∈W} U(w)P(w|a). A rational
decision maker always maximizes expected utility, i.e. he makes decisions according to the
MEU decision rule.
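As a concrete illustration, the following minimal Python sketch computes expected utilities and applies the MEU rule. The actions, outcomes and numbers are illustrative assumptions (loosely inspired by the financial-planning example above), not taken from [42, 28].

    # Minimal sketch of the MEU decision rule of Definition 1; names and numbers are illustrative.
    def expected_utility(a, W, U, P):
        """EU(a) = sum over w in W of U(w) * P(w|a)."""
        return sum(U[w] * P[a][w] for w in W)

    def meu(A, W, U, P):
        """Return the action that maximizes expected utility."""
        return max(A, key=lambda a: expected_utility(a, W, U, P))

    W = ["good_pension", "poor_pension"]                 # outcomes
    A = ["retire_now", "work_five_more_years"]           # alternatives
    U = {"good_pension": 100, "poor_pension": 20}        # utilities U(w)
    P = {"retire_now":           {"good_pension": 0.4, "poor_pension": 0.6},
         "work_five_more_years": {"good_pension": 0.8, "poor_pension": 0.2}}

    print(meu(A, W, U, P))   # -> 'work_five_more_years' (EU 84 versus EU 52)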
Decision theory is an active research area within economics, and the number of extensions and subtleties is too large to address here. In some presentations of CDT, not only
uncertainty about the effect of actions is considered, but also uncertainty about the actual
world. A classic result is that uncertainty about the effects of actions can be expressed in
terms of uncertainty about the present state. Moreover, several other decision rules have
been investigated, including qualitative ones, such as maximin, minimax, minregret etc.
Finally, classical decision theory has been extended in several ways to deal, for example, with
multiple objectives, sequential decisions, multiple agents, distinct notions of risk, etcetera.
The extensions with sequential decisions and multiple agents are discussed in Section 5.1
and 5.2.
Decision theory has become one of the main foundations of economic theory due to
so-called representation theorems. Representation theorems, such as the most famous one
of Savage [42], typically prove that each decision maker obeying certain innocent looking
postulates (about weighted choices) acts as if he applies the MEU decision rule with some
probability distribution and utility function. Thus, he does not have to be aware of it, and
his utility function does not have to represent selfishness. In fact, exactly the same is true
for altruistic decision makers. They also act as if they maximize expected utility; they just
have another utility function.
2.2 Qualitative decision theory
According to Doyle and Thomason [22, p.58], quantitative representations of probability
and utility and procedures for computing with these representations, do provide an adequate framework for manual treatment of very simple decision problems, but they are less
successful in more realistic cases. For example, they argue that classical decision theory
does not address decision making in unforeseen circumstances, it offers no means for capturing generic preferences, it provides little help in modeling decision makers who exhibit
discomfort with numeric tradeoffs, and it provides little help in effectively representing and
reasoning about decisions involving broad knowledge of the world.
Doyle and Thomason argue for various formalization tasks. They distinguish the following new tasks: formalisms to express generic probabilities and preferences, properties
of decision formulations, reasons and explanations, revision of preferences, practical qualitative decision-making procedures and agent modeling. Moreover, they argue that hybrid representation and reasoning with quantitative and qualitative techniques, as well as
reasoning within context, deserve special attention. Many of these issues are related to
subjects studied in artificial intelligence. It appears that researchers now realize the need
to reconnect the methods of AI with the qualitative foundations and quantitative methods
of economics.
Some first results have been obtained in the area of reasoning about uncertainty, a subdomain of artificial intelligence which mainly attracts researchers with a background in
reasoning about defaults and beliefs. Often the formalisms of reasoning about uncertainty
are re-applied in the area of decision making. Thus, typically uncertainty is not represented
by a probability function, but by a plausibility function, a possibilistic function, Spohn-type rankings, etc. Another consequence of this historic development is that the area is
much more mathematically oriented than the planning community or the BDI community.
As a typical example, we mention the work of Pearl [36]. First, he introduces so-called
semi-qualitative rankings. A ranking κ(w) can be considered as an order-of-magnitude
approximation of a probability function P (w) by writing P (w) as a polynomial of some
small quantity ǫ and taking the most significant term of that polynomial. Similarly, a ranking
µ(w) can be considered as an approximation of a utility function U(w). There is one more
subtlety here, which has to do with the fact that whereas κ rankings are positive, the µ
rankings can be either positive or negative. This represents that outcomes can be either
very desirable or very undesirable.
Definition 2 A belief ranking function κ(w) is an assignment of non-negative integers to
outcomes or possible worlds w ∈ W such that κ(w) = 0 for at least one world. Intuitively,
κ(w) represents the degree of surprise associated with finding a world w realized, and worlds
assigned κ(w) = 0 are considered serious possibilities. Likewise, µ(w) is an integer-valued
utility ranking of worlds. Moreover, both probabilities and utilities are defined as functions
of the same ǫ, which is treated as an infinitesimal quantity (smaller than any positive real number).
C is a constant and O denotes the order of magnitude.

P(w) ∼ C ǫ^κ(w),    U(w) = O(1/ǫ^µ(w)) if µ(w) ≥ 0,   U(w) = −O(1/ǫ^(−µ(w))) otherwise
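To make the order-of-magnitude reading of Definition 2 concrete, the following Python sketch derives κ and µ ranks from numeric probabilities and utilities for a fixed small ǫ. The choice ǫ = 0.01 and the example values are illustrative assumptions, not taken from [36].

    import math

    EPS = 0.01   # illustrative stand-in for the infinitesimal quantity

    def kappa(p, eps=EPS):
        """Belief rank: the order of magnitude of P(w) in powers of eps, P(w) ~ C * eps**kappa."""
        return round(math.log(p, eps))

    def mu(u, eps=EPS):
        """Utility rank: positive for very desirable and negative for very undesirable
        outcomes, with |U(w)| of the order (1/eps)**|mu|."""
        if u == 0:
            return 0
        sign = 1 if u > 0 else -1
        return sign * round(math.log(abs(u), 1 / eps))

    print(kappa(0.9), mu(2))          # -> 0 0   (a normal, moderately interesting world)
    print(kappa(0.01), mu(-10000))    # -> 1 -2  (a surprising, very undesirable world)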
This definition illustrates the use of abstractions of probabilities and utilities. However,
as a consequence of Pearl’s qualitative perspective, more has to change. There are also
several issues with respect to the set of alternatives, and the use of the MEU decision rule
in this qualitative setting. These issues are discussed in the comparison in the following
section.
2.3 Relation CDT and QDT
2.3.1 Alternatives
A difference between the two approaches is that in decision theory actions are pre-designated
to a few distinguished atomic variables, whereas Pearl presumes that actions 'Do(ϕ)' are applicable to any proposition ϕ. He associates a valuation function with the set
of worlds W, which assigns a truth value to every proposition at each world. A
proposition ϕ is now identified with the set of worlds that satisfy ϕ.
In this setting, he explains that the expected utility of a proposition ϕ clearly depends
on how we came to know ϕ. For example, if we find out that the ground is wet, it matters
whether we happened to find the ground wet (observation) or we watered the ground
(action). In the first case, finding ϕ true may provide information about the natural process
that led to the observation ϕ, and we should change the current probability from P (w)
to P (w|ϕ). In the second case, our actions may perturb the natural flow of events, and
P (w) will change without shedding light on the typical causes of ϕ. This is represented
differently, by Pϕ (w). According to Pearl, this distinction between P (w|ϕ) and Pϕ (w)
nicely corresponds to various other distinctions found in a wide variety of theories, such as
the distinction between Lewis’ conditioning and imaging, between belief revision and belief
update, and between indicative and subjunctive conditionals. One of the tools Pearl uses
for the formalization of this distinction is causal networks (a kind of Bayesian network
with actions).
A similarity between the two theories is that both suppress explicit references to time.
Pearl suggests that his approach [36] differs in this respect from other theories of action in
artificial intelligence, since they are normally formulated as theories of temporal change.
Pearl also observes that in this respect he is inspired by deontic logic, the logic of obligations
and permissions. See also Section 5.2 on norms.
2.3.2 Decision rule
A second similarity between CDT as presented in Definition 1 and Pearl’s QDT as presented
in Definition 2 is that both consider trade-offs between normal situations and exceptional
situations. QDT differs in this respect from qualitative decision rules that can be found in
decision theory itself, such as ‘minimize the worst outcome’ (pessimistic Wald criterion),
because these qualitative decision rules cannot make such trade-offs.
The problem with a purely qualitative approach is that it is unclear how, besides the
most likely situations, also less likely situations can be taken into account. In particular
we are interested in situations which are unlikely, but which have a high impact, i.e., an
extremely high or low utility. For example, the probability that your house will burn down
is not very high, but the consequences would be severe. Some people therefore decide to take out
insurance. In a purely qualitative setting it is unknown how much expected impact such
an unlikely situation has. There is no way to compare a likely but mildly important effect
to an unlikely but important effect. Going from quantitative to qualitative we may have
gained computational efficiency, but we have lost one of the main useful properties of decision
theory.
The solution proposed by Pearl as discussed above is based on two ideas. First, the
initial probabilities and utilities are neither represented by quantitative probability distributions and utility functions, nor by purely qualitative orders, but by something semi-qualitative
in between. Second, Pearl introduces an assumption to make the two semi-qualitative functions
comparable. This is called the commensurability assumption, see e.g. [24].
Technically, alternatives or actions are represented by propositional formulas, as discussed
in the previous subsection. Taking P(w|ϕ) as the probability function that would
prevail after obtaining ϕ, which may be exchanged for Pϕ(w) in the case of actions, the expected
utility criterion U(ϕ) = Σ_{w∈W} U(w)P(w|ϕ) shows that we can have, for example, likely and
moderately interesting worlds (κ(w) = 0, µ(w) = 0) or unlikely but very important worlds
(κ(w) = 1, µ(w) = 1), which have become comparable since in the second case we have
ǫ/ǫ.
Though Pearl’s approach is capable of dealing with trade-offs between normal and
exceptional circumstances, it is less clear how it can deal with trade-offs between two
effects under normal circumstances.
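The way such trade-offs work out can be sketched as follows; the exponent arithmetic mirrors the ǫ/ǫ observation above, while the function name and example ranks are illustrative assumptions rather than Pearl's own formulation.

    # Sketch of the order-of-magnitude impact of a world on expected utility: its
    # contribution U(w)*P(w) has magnitude of order eps**(kappa(w) - |mu(w)|), so
    # lower exponents dominate. Example ranks are illustrative.
    def impact_order(kappa, mu):
        return kappa - abs(mu)

    print(impact_order(0, 0))    # -> 0: a likely, moderately interesting world
    print(impact_order(1, -1))   # -> 0: an unlikely but disastrous world is comparable
    print(impact_order(2, -1))   # -> 1: a very unlikely world is dominated by the others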
3 Qualitative decision theory vs KBS/BDI
In this section we give a comparison between QDT and KBS/BDI, based on their interpretation of beliefs and goals. We focus our attention on two theories that both propose
qualitative versions of probability and preferences defined on possible worlds or states,
namely Boutilier’s QDT [6] and Rao and Georgeff’s BDI [38, 40, 41].
3.1 Qualitative decision theory (continued)
Another class of qualitative decision theories is developed in the context of planning. Goals
serve a dual role in most planning systems, capturing aspects of both desire towards a state
and commitment to pursuing that state [20].
In goal-based planning, adopting a proposition as a goal commits the agent to find some
way to accomplish the goal, even if this requires adopting some subgoals that may not
correspond to desirable propositions themselves [17]. In realistic planning situations goals
can be achieved to varying degrees, and frequently goals cannot be achieved completely.
Context-sensitive goals are formalized with basic concepts from decision theory [17, 23, 6].
In general, goal-based planning must be extended with a mechanism to choose which goals
must be adopted.
Boutilier [6] proposes a logic and possible worlds semantics for representing and reasoning with qualitative probabilities and utilities, and suggests several strategies for qualitative
decision making based on this logic. His semantics is not quantitative (like CDT) or semi-qualitative (like Pearl's QDT), but purely qualitative. Consequently, the MEU decision
rule is replaced by qualitative rules like Wald's criterion. The conditional preference is
captured by a preference ordering (an ordinal value function) that is defined on possible
worlds. The preference ordering represents the desirability of worlds. Similarly, probabilities are captured by an ordering on possible worlds, called the normality ordering, representing
their likelihood.
Definition 3 The possible worlds semantics for this logic is based on models of the form
M = ⟨W, ≤_P, ≤_N, V⟩

where W is a set of worlds (outcomes), ≤_P is a transitive and connected preference ordering
relation on W, ≤_N is a transitive and connected normality ordering relation on W, and V
is a valuation function.
A difference with the two previous theories is that Boutilier also proposes a logic
to reason about this semantics. In the proposed logic, conditional preferences can be
represented by means of the ideal operator I. A model M satisfies the formula
I(ϕ|ψ) if the preferred, i.e. best or minimal, ψ-worlds are ϕ-worlds. For example, let u
be the proposition 'agent has umbrella' and r the proposition 'it rains'; then I(u|r)
expresses that in the most preferred rain-worlds the agent has an umbrella. The formal
definition of the ideal operator I is explained in more detail in Section 3.3.6. Similarly,
probabilities are represented in this logic by means of a normative conditional connective ⇒.
For example, let w be the proposition 'the agent is wet' and r the proposition 'it rains';
then r ⇒ w expresses that the agent is wet in the most normal rain-worlds. Its semantics is
derived from Hansson's deontic logic with modal operator O for obligation [25]. (In default
logic, an exception is a digression from a default rule; similarly, in deontic logic an offense
is a digression from the ideal.) A similar type of semantics is used by Lang [31] for a modality
D to model desire. An alternative approach represents conditional modalities by so-called
'ceteris paribus' preferences, using additional formal machinery to formalize the notion of
'similar circumstances', see e.g. [23, 21, 47, 48].
A rational agent is then assumed to attempt to achieve the best possible world consistent
with its (default) knowledge. Based on this idea, the notion of a goal is introduced as a
proposition that the agent desires to make true.
Definition 4 Given a set of facts KB, a goal is thought to be any proposition α such that
M |= I(α | Cl(KB))
where Cl(KB) is the default closure of the facts KB defined as follows:
Cl(KB) = {α | KB ⇒ α}
The expressions of this logic represent qualitative constraints, and the proposed logic
enables agents to reason directly with such constraints.
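Read semantically, the default closure of Definition 4 can be sketched as follows. The worlds, normality ranks and atoms are illustrative assumptions, with lower ranks standing for more normal worlds.

    # Sketch of the default closure Cl(KB) of Definition 4: KB => alpha holds iff alpha
    # is true in all most-normal KB-worlds, and Cl(KB) collects all such alpha.
    worlds = {                      # world -> (normality rank, true atoms)
        "w1": (0, {"rain", "wet"}),
        "w2": (1, {"rain"}),        # exceptional: rain without getting wet
        "w3": (0, set()),
    }
    atoms = {"rain", "wet"}

    def most_normal_worlds(kb):
        kb_worlds = [w for w, (_, v) in worlds.items() if kb <= v]
        best = min(worlds[w][0] for w in kb_worlds)
        return [w for w in kb_worlds if worlds[w][0] == best]

    def closure(kb):
        """Cl(KB): the atoms true in every most-normal KB-world."""
        return {a for a in atoms
                if all(a in worlds[w][1] for w in most_normal_worlds(kb))}

    print(closure({"rain"}))   # contains both 'rain' and 'wet': normally, rain makes the agent wet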
3.2 KBS/BDI
According to Dennett [18], attitudes like beliefs and desires are folk psychology concepts
that can be fruitfully used in explanations of rational behavior. If you were asked to explain
why someone is carrying an umbrella, you may reply that he believes it is going to rain
and that he does not want to get wet. For the explanation it does not matter whether
he actually possesses these mental attitudes. Similarly, we describe the behavior of an
affectionate cat or an unwilling screw in terms of mental attitudes. Dennett calls treating
a person or artifact as a rational agent the ‘intentional stance’.
“Here is how it works: first you decide to treat the object whose behavior is
to be predicted as a rational agent; then you figure out what beliefs that agent
ought to have, given its place in the world and its purpose. Then you figure
out what desires it ought to have, on the same considerations, and finally you
predict that this rational agent will act to further its goals in the light of its
beliefs. A little practical reasoning from the chosen set of beliefs and desires
will in most instances yield a decision about what the agent ought to do; that
is what you predict the agent will do.” [18, p. 17]
In this tradition, knowledge bases (KB) and beliefs (B) represent the information of
an agent about the state of the world. Belief is like knowledge, except that it does not
have to be true. Goals (G) or desires (D) represent the preferred states of affairs for an
agent. The terms goals and desires are sometimes used interchangeably. In other cases,
desires are like goals, except that they do not have to be mutually consistent. Desires
are long term preferences that motivate the decision process. Intentions (I) correspond to
previously made commitments of the agent, either to itself or to others.
As argued by Bratman [7], intentions are meant to stabilize decision making. To motivate intentions we consider the application of one of the first BDI systems, namely a
lunar robot. The robot is supposed to reach some destination on the surface of the moon.
Its path is obstructed by a rock. Suppose that based on its cameras and other sensors,
the robot decides that it will go around the rock on the left. At every step the robot
will receive new information through its sensors. Because of shadows, rocks may suddenly
appear much larger. If the robot were to reconsider its decision with each new piece of
information, it would never reach its destination. Therefore, the agent will adopt a plan
until some really strong reason forces it to change it. The intentions of an agent correspond
to the set of adopted plans at some point in time.
BDI theories have been successfully applied in natural language processing and the design
of interactive systems. The theory of speech acts [3, 44] and subsequent applications in
artificial intelligence [14, 1] analyze the meaning of an utterance in terms of its applicability
and sincerity conditions and the intended effect. These conditions are best expressed using
belief (or knowledge), desire (or goal) and intention. For example, a question is applicable
when the speaker does not yet know the answer and the hearer is expected to know the
answer. A question is sincere if the speaker actually desires to know the answer. By the
conventions encoded in language, the effect of a question is that it signals the intention of
the speaker to let the hearer know that the speaker desires to know the answer. Now if
we assume that the hearer is cooperative, which is not a bad assumption for interactive
systems, the hearer will adopt the goal to let the speaker know the answer to the question
and will consider plans to find and formulate such answers. In this way, traditional planning
systems and natural language communication can be combined. For example, Sadek [8]
describes the architecture of a spoken dialogue system that assists the user in selecting
automated telephone services like the weather forecast, directory services or collect calls.
According to its developers the advantage of the BDI specification is its flexibility. In case
of a misunderstanding, the system can retry and reach its goal to assist the user by some
other means. This specification in terms of BDI later developed into the standard for agent
communication languages endorsed by FIPA.
As a typical example of a BDI model, we discuss Rao and Georgeff’s initial BDI logic
[39]. The (partial) information on the state of the environment, which is represented by
quantitative probabilities in classical decision theory [42, 28] and by qualitative ordering
in qualitative decision theory [6], is now reduced to binary values (0-1). This abstraction
of the (partial) information on the state of the environment is called the beliefs of the
decision making agent. Similarly, the (partial) information about the objectives of the
decision making agent, which is represented by quantitative utilities in classical decision
theory and by qualitative preference ordering in qualitative decision theory, is reduced
to binary values (0-1) as well. This abstraction of the (partial) information about the
objectives of the decision making agent, is called the desires of the decision making agent.
In the following rather complicated semantics of the BDI logic, this binary representation
is reflected in the fact that each of the relations B, D, and I maps each world (at a given time
point) to an unstructured set of worlds. In the spirit of Boutilier’s QDT, each world (at a
time point) would have to be mapped to an ordering on worlds.
Definition 5 (Semantics of BDI logic [39]) An interpretation M is defined to be a
tuple M = ⟨W, E, T, <, U, B, D, I, Φ⟩, where W is the set of worlds, E is the set of primitive
event types, T is a set of time points, < is a binary relation on time points, U is the universe
of discourse, and Φ is a mapping from first-order entities to elements in U for any given
world and time point. A situation is a world, say w, at a particular time point, say t, and is
denoted by w_t. The relations B, D, and I map the agent's current situation to her belief-,
desire-, and intention-accessible worlds, respectively. More formally, B ⊆ W × T × W and
similarly for D and I. Sometimes we shall use R to refer to any one of these relations and
shall use R_t^w to denote the set of worlds R-accessible from world w at time t.
Again there is a logic to reason about this semantics. The consequence of the fact
that we no longer have pre-orders, like in Boutilier’s logic, but only an unstructured set, is
that we can now only represent monadic expressions like B(ϕ) and D(ϕ), no longer dyadic
expressions like I(ϕ|ψ). We have that a world (at a certain time point) of the model
satisfies B(ϕ) if ϕ is true in all accessible worlds (at the same time point). However, Rao
and Georgeff have introduced something else, namely a temporal component. The BDI
logic is an extension of the so-called computation tree logic (CTL*), which is often used
to model branching time structures, with modal epistemic operators for beliefs B, desires
D, and intentions I. The modal epistemic operators are used to model the cognitive
state (belief, desire, and intention) of a decision making agent, while the branching time
structure is used to model possible events that could take place at a certain time point and
determines the alternative worlds at that time point.
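The monadic reading of the B and D operators can be sketched as follows, leaving out the temporal dimension for brevity; the worlds, accessibility relations and valuation are illustrative assumptions.

    # Sketch of monadic modal evaluation in a BDI-style model (cf. Definition 5):
    # B(phi) holds in a situation iff phi is true in every belief-accessible world.
    V = {"w0": {"poll"}, "w1": {"poll", "win"}, "w2": {"poll", "loss"}}   # valuation
    B = {"w0": {"w1", "w2"}}   # belief-accessible worlds
    D = {"w0": {"w1"}}         # desire-accessible worlds

    def box(R, world, atom):
        """True iff atom holds in every R-accessible world (semantics of B, D and I)."""
        return all(atom in V[v] for v in R[world])

    print(box(B, "w0", "poll"))   # -> True:  the agent believes a poll will be held
    print(box(B, "w0", "win"))    # -> False: winning is not believed
    print(box(D, "w0", "win"))    # -> True:  winning is desired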
3.3 Comparison QDT and KBS/BDI
As in the previous comparison, in order to compare QDT and BDI we consider two issues:
alternatives and decision rules. Subsequently, we consider some distinctions particular to
these approaches.
3.3.1 Alternatives
The two approaches have chosen different ways to represent alternatives. Boutilier [6]
introduces a very simple but elegant distinction between consequences of actions and
consequences of observations, by distinguishing between controllable and uncontrollable
propositional atoms. This seems to make his logic less expressive but also simpler than
Pearl’s (but beware, there are still a lot of complications around). One of the issues he discusses is the distinction between goals under complete knowledge and goals under partial
knowledge.
On the other hand, BDI does not involve an explicit notion of actions, but instead
models possible events that can take place through time by means of branching time. In fact, the
branching time structure models possible (sequences of) events and determines the alternative (cognitive) worlds that an agent can bring about through time. Thus, each branch in
the temporal structure represents an alternative the agent can choose. Uncertainty about
the effects of actions is not modelled by branching time, but by distinguishing between
different belief worlds. In effect, they model all uncertainty about the effects of actions as
uncertainty about the present state, a well-known trick from decision theory we already
mentioned in Section 2.1. This is discussed at length in [38], a paper we discuss in more
detail in Section 4.
3.3.2 Decision rules
As observed above, the qualitative normality and desirability orderings on
possible worlds that are used in QDT are reduced to dichotomous values in BDI. Based
on the normality and desirability orderings, Boutilier uses a qualitative decision rule
like the Wald criterion. Since there is no such ordering on the possible worlds in BDI, each
desire world can in principle be chosen in BDI as the goal world that needs to be achieved.
Clearly, it is not intuitive to select a desire world as the goal world at random, since desire
worlds need not be belief worlds. Selecting a desire world that is not believed results in
wishful thinking and thus suggests an unrealistic decision.
Therefore, BDI proposes a number of constraints under which each desire world can be
chosen as a goal world. These constraints are usually characterized by axioms called
realism, strong realism or weak realism [41, 11]. In the BDI logic, one can thus model the
belief, desire, and intention worlds of an agent through time and relate these states to each
other by means of axioms. In this way, one can state that the set of desire worlds should
be a subset of the belief worlds and that the alternative worlds at a certain time point, i.e. all time
branches from a world at that time point, should be identical to the alternative desire
worlds at that time point. This constraint is the so-called realism constraint.
In particular, the realism constraint states that the agent's desires should be consistent with
its beliefs. Note that this constraint is the same as in QDT, where the goal worlds should
be consistent with the belief worlds. Formally, the realism axiom states that the set of
desire-accessible worlds should be a subset of the set of belief-accessible worlds, i.e.
B(ϕ) → D(ϕ)
and, moreover, the belief and desire worlds should have identical branching time structure
(the same alternatives), i.e.
∀w ∀t ∀w′: if w′ ∈ D_t^w then w′ ∈ B_t^w
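A minimal sketch of what the realism constraint amounts to on the semantic side is given below; the accessibility relations are illustrative assumptions.

    # Sketch of checking the static realism constraint: at every situation (w, t) the
    # desire-accessible worlds must form a subset of the belief-accessible worlds.
    B = {("w0", "t0"): {"w1", "w2", "w3"}}   # B_t^w
    D = {("w0", "t0"): {"w1", "w2"}}         # D_t^w

    def satisfies_realism(B, D):
        return all(D[s] <= B.get(s, set()) for s in D)

    print(satisfies_realism(B, D))   # -> True: desires are consistent with beliefs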
The axioms which constrain the relation between beliefs, desires, and alternatives determine the type of agent, i.e. the type of decision making behavior. Thus, in contrast to
QDT, where the decision rule is based on the ordering on the possible worlds, in BDI any
desire world that satisfies a set of assumed axioms, such as realism, can be chosen as the
goal world; i.e. BDI systems do not consider decision rules but agent types.
Although there is no such notion as agent type in classical or qualitative decision theory,
there are discussions which can be related to agent types. For example, in discussions
about risk attitude, a distinction is often made between risk-neutral and risk-averse agents.
In the full version of BDI, where intentions are considered, additional axioms are
introduced to reduce the set of desire worlds that can be chosen as the goal world. These
axioms guarantee that the chosen desire world is consistent with the worlds that have already
been chosen as goal worlds. For example, in addition to the above axiom defined on beliefs
and desires, the definition of realism also includes the following axiom, which states that the
intention-accessible worlds should be a subset of the desire-accessible worlds, i.e.

D(ϕ) → I(ϕ)

and, moreover, the desire and intention worlds should have identical branching time structure (the same alternatives), i.e.

∀w ∀t ∀w′: if w′ ∈ I_t^w then w′ ∈ D_t^w
In addition to these constraints, which are classified as static constraints, there are different
types of constraints introduced in BDI resulting in additional agent types. These axioms
determine when intentions or previously decided goals should be reconsidered or dropped.
In BDI, choosing to drop a goal is thus considered to be as important as choosing a
new goal. These constraints, called commitment strategies, involve time and intentions
and express the dynamics of decision making. The well-known commitment strategies
are ‘blindly committed decision making’, ‘single-minded committed decision making’, and
'open-minded committed decision making'. For example, the single-minded commitment
strategy states that an agent remains committed to its intentions until either it achieves
its corresponding objective or it does not believe that it can achieve it anymore.
The notion of agent type has been refined and extended to include obligations in Broersen et al.'s BOID system [10]. For example, they formally distinguish
between selfish agents, which give priority to their own desires, and social agents, which give
priority to their obligations.
3.3.3 Two steps
A similarity between the two approaches is that we can distinguish two steps. In Boutilier's
approach, decision-making with flexible goals splits the decision-making process into two
steps: first a decision is made about which goals to adopt, and second a decision is made how
to reach these goals. These two steps have been further studied by Thomason [49] and
Broersen et al. [10] in the context of default logic.
1. First the agent has to combine desires and resolve conflicts between them. For
example, assume that the agent desires to be on the beach; if he is on the beach
then he desires to eat an ice-cream; he desires to be in the cinema; if he is in the
cinema then he desires to eat popcorn; and he cannot be at the beach as well as in
the cinema. Now he has to choose one of the two combined desires, or optional goals:
being at the beach with an ice-cream or being in the cinema with popcorn.

2. Second, the agent has to find out which actions or plans can be executed to reach
the goal, and he has to take all side-effects of the actions into account. For example,
assume that he desires to be on the beach; if he quits his job and drives to the
beach, he will be on the beach; if he does not have a job he will be poor; and if he is poor
then he desires to work. The only desire, and thus the goal, is to be on the beach; the
only way to reach this goal is to quit his job; but the side effect of this action is that
he will be poor, and in that case he does not want to be on the beach but wants
to work.
Now, crucially, desires come into the picture twice: first they are used to determine
the goals, and second they are used to evaluate the side-effects of the actions taken to reach
these goals. In extreme cases, like the example above, what seemed like a goal may not
be desirable, because the only actions to reach the goal have negative effects with much
more impact than the goal itself. This shows that the balancing philosophy of BDI could be
very helpful.
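The two-step view and the role of side-effects can be sketched as follows; the desires, strengths, plan and scores are illustrative assumptions in the spirit of the beach example.

    # Sketch of the two-step deliberation described above: first adopt a goal from the
    # (conflicting) desires, then evaluate the plans that reach it, side-effects included.
    desires = {"beach": 10, "cinema": 8}   # conflicting desires with strengths

    # Step 1: resolve the beach/cinema conflict by adopting the strongest desire as the goal.
    def select_goal(desires):
        return max(desires, key=desires.get)

    # Step 2: evaluate the plans for the goal, scoring achieved desires minus side-effects.
    plans = {"quit_job_and_drive": {"achieves": {"beach"}, "penalty_poor": -20}}

    def plan_value(plan):
        gain = sum(desires.get(d, 0) for d in plans[plan]["achieves"])
        return gain + plans[plan]["penalty_poor"]

    goal = select_goal(desires)
    print(goal, {p: plan_value(p) for p in plans})
    # -> beach {'quit_job_and_drive': -10}: the only plan's side-effect outweighs the goal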
At first sight, it seems that we can apply classical decision theory to each of these two
sub-decisions. However, there is a caveat. The two sub-decisions are not independent, but
closely related! For example, to decide which goals to adopt we must know which goals
are feasible, and we thus have to take the possible actions into account. Moreover, the
intended actions constrain the candidate goals which can be adopted. Other complications
arise due to uncertainty, changing environments, etc., and we conclude here that the role of
decision theory in planning is complex, and that decision-theoretic planning is much more
complex than classical decision theory.
3.3.4 Goals vs desires
A distinction between the two approaches is that Boutilier distinguishes between desires
and goals, whereas Rao and Georgeff do not. In Boutilier's logic, there is a formal distinction between the preference ordering and goals expressed by ideality statements. Rao and
Georgeff have unified these two notions, which has been criticized by [16]. In decision
systems such as [10], desires are considered to be more primitive than goals, because goals
have to be adopted or generated based on desires. Moreover, goals can be based on desires, but also on other sources. For example, a social agent may adopt as a goal the
desires of another agent or an obligation. Desires and candidate goals may conflict, but
notions of goals which do not conflict have also been considered. There are three main traditions.
In the Newell/Simon tradition of the knowledge level and knowledge based systems, goals are
related to utility aspiration levels and to limited (bounded) rationality. In this tradition
goals have a desirability aspect as well as an intentionality aspect. In the more recent BDI
tradition, knowledge and goals have been replaced by beliefs, desires and intentions due
to Bratman's work on the role of intentions in deliberation [7]. The third tradition relates desires and goals to utilities as used in classical decision theory. The problem here
is that decision theory abstracts away from the deliberation cycle. Typically, Savage-like
constructions only consider the input (state of the world) and output (action) of the agent.
Consequently, utilities can be related to either desires or goals.
3.3.5 Conflict resolution
A similarity between the two logics is that neither is capable of representing conflicts,
either conflicting beliefs or conflicting desires.
Although the constraints imposed by the I operator are rather weak, they are still too
strong to represent certain types of conflicts. Consider conflicts among desires. Typically
desires are allowed to be inconsistent, but once they are adopted and have become intentions, they should be consistent. Moreover, in the logic proposed in [48] a specificity
set like D(p), D(¬p|q) is inconsistent (this problem is solved in [47], but the approach is
criticized as ad hoc in [5]). A large set of potential conflicts between desires, including
a classification and ways to resolve them, is given in [32]. A radically different approach to
solving conflicts is to apply Reiter's default logic to create extensions. This was recently
proposed by Thomason [49], and thereafter also used in the BOID architecture [10].
Finally, an important branch of decision theory has to do with reasoning about multiple,
possibly conflicting, objectives using multi-attribute utility theory [30]. This is also the
basis of the 'ceteris paribus' preferences mentioned above, and it can be used to formalize
conflicting desires. Note that all the modal approaches above would make conflicting
desires inconsistent.
3.3.6 Proof theory
A similarity of Boutilier’s logic and Rao and Georgeff’s logic is that both are based on
a normal monadic modal logic. However, Boutilier interprets the accessibility relation as
a preference relation, and defines in this monadic modal logic dyadic (i.e. conditional)
operators. This works as follows.
The truth conditions for the modal connectives are defined as follows (we write ←□ and ↔□ for the boxes that carry a backward and a two-headed arrow, respectively, in Boutilier's notation):

M, w |= □_P ϕ ⇔ ∀v ∈ W such that v ≤_P w: M, v |= ϕ
M, w |= ←□_P ϕ ⇔ ∀v ∈ W such that w <_P v: M, v |= ϕ

Thus, □_P ϕ is true in a world w iff ϕ is true at all worlds that are at least as preferred as w,
and ←□_P ϕ is true in a world w iff ϕ is true at all less preferred worlds. The dual 'possibility'
connectives are defined as usual (♦_P ϕ = ¬□_P ¬ϕ), meaning that ϕ is true at some equally
or more preferred world; furthermore, ↔□_P ϕ = □_P ϕ ∧ ←□_P ϕ and ↔♦_P ϕ = ♦_P ϕ ∨ ←♦_P ϕ. Moreover, a second
variant of the modal operators, based on the normality ordering, is defined in a
similar way. These modal operators have N as their index, i.e. □_N, ←□_N, ♦_N, etc.

This logic formalizes the notions of ideality and knowledge using the preference and normality orderings on worlds. The semantics of the ideal connective and of the normative connective for
knowledge are defined as follows:

I(B|A) ≡_def ↔□_P ¬A ∨ ↔♦_P (A ∧ □_P (A → B))
A ⇒ B ≡_def ↔□_N ¬A ∨ ↔♦_N (A ∧ □_N (A → B))
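Semantically, the ideal connective can be read as in the following sketch, which checks I(B|A) on a small finite model; the worlds, preference ranks and valuation are illustrative assumptions (lower rank means more preferred).

    # Sketch of I(B|A): it holds iff there are no A-worlds, or B is true at all most
    # preferred A-worlds of the model.
    worlds = {                      # world -> (preference rank, true atoms)
        "w1": (0, {"rain", "umbrella"}),
        "w2": (1, {"rain"}),
        "w3": (0, set()),
    }

    def ideal(B, A):
        a_worlds = [w for w, (_, v) in worlds.items() if A <= v]
        if not a_worlds:
            return True
        best = min(worlds[w][0] for w in a_worlds)
        return all(B <= worlds[w][1] for w in a_worlds if worlds[w][0] == best)

    print(ideal({"umbrella"}, {"rain"}))   # -> True: in the best rain-worlds the agent has an umbrella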
Unlike in QDT, in the BDI logic different modalities can be nested arbitrarily to represent
beliefs about beliefs, beliefs about desires, etc. In the BDI model, there are no preference
and normality orderings on the worlds, so there is no qualitative notion of a decision
rule: unlike QDT, where agents attempt to achieve the best worlds consistent with
their knowledge, in BDI all desire worlds are equally good, so the agent can aim to
achieve any desire world.
In Rao and Georgeff's logic, a world at a certain time is also called a situation. From
a certain situation, each time branch represents an event and determines one alternative
situation, i.e. one alternative world at the next time point. The modal epistemic operators
have specific properties such as closure under implication and consistency (KD axioms).
As in CTL*, the BDI logic has two types of formulae. The first type is called
state formulae, which are evaluated at a situation, i.e. in a given world at a given time
point. The second type is called path formulae, which are evaluated along a given
path in a given world. Path formulae therefore express properties of alternative courses of
events through time.
To give examples of how state formulae are evaluated, let B and D be modal
epistemic operators and ϕ a state formula; then the state formulae Bϕ and Dϕ are
evaluated relative to the model M and situation w_t as follows:

M, w_t |= Bϕ ⇔ ∀w′ ∈ B_t^w: M, w′_t |= ϕ
M, w_t |= Dϕ ⇔ ∀w′ ∈ D_t^w: M, w′_t |= ϕ
3.3.7 Non-monotonic closure rules
A distinction between the logics is that Rao and Georgeff only present a monotonic logic,
whereas Boutilier also presents a non-monotonic extension of his logic.
The constraints imposed by I formulas of Boutilier (ideal) are relatively weak. Since the
semantics of the I operator is analogous to the semantics of many default logics, Boutilier
[6] proposes to use non-monotonic closure rules for the I operator too. In particular he uses
the well-known system Z [35]. Its workings can be summarized as ‘gravitation towards the
ideal’, in this case. An advantage of this system is that it always gives exactly one preferred
model, and that the same logic can be used for both desires and defaults. A variant of
this idea was developed by Lang [31], who directly associates penalties with the desires
(based on penalty logic [37]) and who does not use rankings of utility functions but utility
functions themselves. More complex constructions have been discussed in [48, 47, 51, 33].
4 Classical decision theory vs BDI logic
In this section we compare classical decision theory to BDI theory. Thus far, we have
seen a quantitative ordering in CDT, a semi-qualitative and a qualitative ordering in QDT,
and binary values in BDI. Classical decision theory and BDI thus seem far apart, and the
question can be raised how they can be related. This question has been ignored in the
literature, except for Rao and Georgeff's little-known translation of decision trees to beliefs
and desires in [38]. Rao and Georgeff show that probabilistic constructions like probability
and pay-off can be recreated in the setting of their BDI logic. This results in a somewhat
artificial system, in which probabilities and pay-offs appear both in the object language
and the model. However, it does show that the two approaches are compatible. In this
section we sketch their approach.
4.1 BDI, continued
In the semantics, Rao and Georgeff add semantic structures to represent probabilities and
utilities.
Definition 6 (Extended BDI models [38]) The semantics of the BDI logic is based on
models M of the following form:

M = ⟨W, E, T, <, R_B, R_D, R_I, P, U, V⟩

where W, E, T, <, R_B, R_D, R_I and V are as before. The function P is a probability
assignment function that assigns to each situation w_t a probability distribution P_t^w that in
turn assigns probabilities to propositions (i.e. sets of possible worlds). The function U is
a pay-off assignment function that assigns to each situation a pay-off function U_t^w that in
turn assigns real-valued numbers to paths.

In the formal logic, Rao and Georgeff add expressions to talk about the two new functions. Moreover, they also add expressions for various decision rules such as MEU and
Wald's criterion. We do not give the latter since they are rather involved.

M, w_t |= PROB(ϕ) ≥ n ⇔ P_t^w({w′ ∈ B(w_t) | M, w′_t |= ϕ}) ≥ n
M, w_t |= PAYOFF(ψ) ≥ m ⇔ for all w′_t ∈ G(w_t) and all paths x_i = ⟨w′_t, w′_{t+1}, ...⟩
such that M, x_i |= ψ, it holds that U_t^w(x_i) ≥ m
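The intended reading of the PROB and PAYOFF expressions can be sketched as follows; the belief- and goal-accessible worlds, probabilities and pay-offs are illustrative assumptions, not the figures of the example below.

    # Sketch of evaluating PROB(phi) >= n and PAYOFF(psi) >= m at a situation:
    # PROB sums the probabilities of the belief-accessible phi-worlds, PAYOFF checks the
    # pay-off of every goal-accessible path satisfying psi.
    belief_worlds = {"b1": 0.6, "b2": 0.4}            # P_t^w over belief-accessible worlds
    satisfies_phi = {"b1": True, "b2": False}
    goal_paths = {"g1": 300, "g2": 100}               # U_t^w over goal-accessible paths
    satisfies_psi = {"g1": True, "g2": False}

    def prob_at_least(n):
        return sum(p for w, p in belief_worlds.items() if satisfies_phi[w]) >= n

    def payoff_at_least(m):
        return all(u >= m for x, u in goal_paths.items() if satisfies_psi[x])

    print(prob_at_least(0.5), payoff_at_least(200))   # -> True True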
We do not give any more formal details (they can be found in the cited paper), but we
illustrate the logic by an example. Consider the example in figure 1. There is an American
politician, a member of the house of representatives, who must make a decision about his
political career. He may stay in the house of representatives (Rep), try to be elected in
the senate (Sen) or retire altogether (Ret). He does not consider the option of retiring
seriously, and is certain to keep his seat in the house. He must make his decision based on
a poll, which is either a majority approval of his move to the senate (yes) or a majority
disapproval (no). There are four belief-accessible worlds, each with a specific likelihood
attached. The general propositions win, loss, yes and no are true at the appropriate points.
The goal worlds are also shown, with the individual payoffs attached. For example, the
payoff of trying to go to the senate and losing is 100, whereas trying for the senate and winning
is valued at 300. Note that retiring is an option in the belief worlds, but is not considered
a goal. Finally, if we apply the maximal expected value decision rule, we end up with four
remaining intention worlds, which indicate the commitments the agent should rationally
make.
[Figure omitted in this transcription: it depicts the belief-accessible worlds with their likelihoods, the goal-accessible worlds with pay-offs attached to the paths, and the resulting intention worlds for the politician example.]

Figure 1: Belief, Goal and Intention worlds, using maxexpval as decision rule.
4.2 Relation between decision theory and BDI
Rao and Georgeff relate decision trees to these kinds of structures on possible worlds. The
proposed transformation is not between decision theory and BDI logic, but only between
a decision tree and the goal accessible worlds of some agent.
A decision tree consists of two types of nodes: one type expresses the agent's
choices and the other type expresses the uncertainties about the effects of actions. These
two types of nodes are indicated by a square and a circle, respectively, in the decision trees, as illustrated in figure 2.
[Figure omitted in this transcription: it depicts the original decision tree, with square choice nodes and circular chance nodes annotated with P(win) = 0.4, P(loss) = 0.6, P(yes) = 0.42, P(no) = 0.58, P(win|yes) = 0.571, P(loss|yes) = 0.429, P(win|no) = 0.275 and P(loss|no) = 0.724, together with the possible worlds, weighted by α, that result from removing the chance nodes.]

Figure 2: Transformation of a decision tree into a possible worlds structure
In order to generate relevant plans (goals), the uncertainties about the effects of
actions are removed from the given decision tree (the circular nodes in figure 2), resulting in a number
of new decision trees. The uncertainties about the effects of actions are now assigned to the
newly generated decision trees.

For example, consider the decision tree in figure 2. A possible plan is to perform Poll
followed by Sen if the effect of the poll is yes, or Rep if the effect of the poll is no. Suppose
that the probability of yes as the effect of a poll is 0.42 and that the probability of no is 0.58.
Now the transformation will generate two new decision trees: one in which event yes takes
place after choosing Poll and one in which event no takes place after choosing Poll. The
uncertainties 0.42 and 0.58 are then assigned to the resulting trees, respectively. The new
decision trees provide two scenarios, Poll; if yes, then Sen and Poll; if no, then Rep, with
probabilities 0.42 and 0.58, respectively. In these scenarios the effects of events are known.
The same mechanism can be repeated for the remaining chance nodes. The probability
of a scenario that occurs in more than one goal world is the sum of the probabilities of
the different goal worlds in which the scenario occurs. This results in the goal accessible
worlds from figure 1. The agent can decide on a scenario by means of a decision rule such
as maximum expected utility.
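The transformation can be sketched as follows; the probabilities are taken from the example in figure 2, while the encoding itself is an illustration rather than Rao and Georgeff's formal construction.

    # Sketch of removing the chance nodes of the decision tree: every combination of
    # chance outcomes yields one deterministic scenario, whose probability is the product
    # of the probabilities along the fixed outcomes.
    P_poll = {"yes": 0.42, "no": 0.58}                     # P(yes), P(no)
    P_election = {"yes": {"win": 0.571, "loss": 0.429},    # P(win|yes), P(loss|yes)
                  "no":  {"win": 0.275, "loss": 0.724}}    # P(win|no),  P(loss|no)

    def scenarios():
        return {(poll, outcome): p_poll * p_out
                for poll, p_poll in P_poll.items()
                for outcome, p_out in P_election[poll].items()}

    for (poll, outcome), prob in scenarios().items():
        print(poll, outcome, round(prob, 2))
    # -> yes win 0.24, yes loss 0.18, no win 0.16, no loss 0.42, matching the likelihoods
    #    attached to the resulting possible worlds.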
5 Extensions
In this section we discuss extensions of classical decision theory with time in processes and
with multiple agents in game theory.
5.1 Time: processes, planning and intentions
Decision processes are sequences of decision problems. The field of decision processes
contains some particular problems, such as the relation between these sequential decisions.
One popular assumption is that sequential decisions are unrelated. This has been called
the Markov assumption. One of the main open problems of BDI theory is how intentions
are related to decision processes.
As we mentioned above, the important contribution of BDI logic is the introduction of the notion of intentions (or previous decisions) and of commitments to these intentions. The idea of intentions is to make decision-making behavior more stable. The argument of BDI research [40, 41] is that classical and qualitative decision theories may produce unstable decision behavior when the environment is dynamic. Every change in the environment requires the decision problem to be reformulated, which in turn may result in contradictory decisions. Recall the example of the lunar robot, which may make wildly diverging decisions based on relatively arbitrary differences in the readings of its sensors.
Committing to previous decisions can therefore influence the decisions that an agent makes at each point in time. An agent can adopt different commitment strategies to deal with its intentions. In BDI, several strategies are introduced to keep or drop the commitments to previous decisions. The well-known commitment strategies are blindly committed, single-minded committed, and open-minded committed [40]. The first denies any change in its beliefs and desires that conflicts with its previous decisions, the second allows belief changes and drops previous decisions that conflict with the new beliefs, and the last allows both its beliefs and desires to change and drops previous decisions that conflict with the new beliefs and desires. The process of intention creation and reconsideration is often
called the deliberation process.
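The difference between the three commitment strategies can be sketched as follows (a minimal, hypothetical encoding of our own; the strategy names follow [40], but the conflict test and the example are illustrative assumptions).

    from enum import Enum, auto

    class Strategy(Enum):
        BLIND = auto()          # keeps the intention regardless of any changes
        SINGLE_MINDED = auto()  # drops it only when the new beliefs conflict with it
        OPEN_MINDED = auto()    # drops it when new beliefs or new desires conflict with it

    def keep_intention(strategy, intention, beliefs, desires):
        # Beliefs and desires are sets of literals; "-x" stands for the negation of x.
        conflicts_beliefs = ("-" + intention) in beliefs
        conflicts_desires = ("-" + intention) in desires
        if strategy is Strategy.BLIND:
            return True
        if strategy is Strategy.SINGLE_MINDED:
            return not conflicts_beliefs
        return not (conflicts_beliefs or conflicts_desires)

    # Example: the robot intended to reach a rock, but new sensor readings say the
    # rock is unreachable and the operators no longer want the sample.
    beliefs, desires = {"-reach_rock"}, {"-reach_rock"}
    for strategy in Strategy:
        print(strategy.name, keep_intention(strategy, "reach_rock", beliefs, desires))
    # BLIND True, SINGLE_MINDED False, OPEN_MINDED False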
5.2
Multiple agents: games, norms and commitments
Norms and social commitments are of interest when there is more than one agent making
decisions. This situation has been studied in classical game theory, with its equilibrium analysis. Recently, computational BDI has been extended with norms (or obligations); see
e.g. [19], though it is still debated whether artificial agents need norms, and if they are
needed then for what purposes. It is also debated whether they should be represented
explicitly. Arguments for their use have been given in the cognitive approach to BDI, in
evolutionary game theory and in the philosophical areas of practical reasoning and deontic
logic. Many different notions of norms and commitments have been discussed, for example:
Norms as goal generators. The cognitive science approach to BDI [15, 12] argues that
norms are needed to model social agents. Norms are important concepts for social
agents, because they are a mechanism by which society can influence the behavior of
individual agents. This happens through the creation of normative goals, a process
which consists of four steps. First the agent has to believe that there is a norm, then
it has to believe that this norm is applicable, then it has to decide that it accepts
this norm – the norm now leads to a normative goal – and finally it has to decide
whether it will fulfill this normative goal (a sketch of these four steps is given after this list).
Reciprocal norms. The argument of evolutionary game theory [4] is that reciprocal
norms are needed to establish cooperation in repeated prisoner’s dilemmas.
Norms influencing decisions. In practical reasoning, in legal philosophy and in deontic
logic (in philosophy as well as in computer science) it has been studied how norms
influence behavior.
Norms stabilizing multiagent systems. It has been argued that obligations play the
same role in multiagent systems as intentions do in single-agent systems, namely, they stabilize the system's behavior [50].
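The four-step creation of normative goals mentioned in the first item can be sketched as follows (a hypothetical encoding of our own; the data structures and names are illustrative assumptions, not taken from [15, 12]).

    def normative_goals(norms, beliefs, accepted_norms, will_fulfill):
        # Turn society's norms into goals the agent actually pursues.
        goals = []
        for norm in norms:
            if norm["id"] not in beliefs["known_norms"]:   # 1. believe the norm exists
                continue
            if not norm["applicable"](beliefs):            # 2. believe the norm applies
                continue
            if norm["id"] not in accepted_norms:           # 3. accept the norm -> normative goal
                continue
            if will_fulfill(norm["goal"], beliefs):        # 4. decide to fulfill the goal
                goals.append(norm["goal"])
        return goals

    # Example: a 'return favors' norm that applies once a favor has been received.
    norms = [{"id": "return_favors",
              "applicable": lambda b: "favor_received" in b["facts"],
              "goal": "return_favor"}]
    beliefs = {"known_norms": {"return_favors"}, "facts": {"favor_received"}}
    print(normative_goals(norms, beliefs, accepted_norms={"return_favors"},
                          will_fulfill=lambda goal, b: True))   # ['return_favor']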
Here we discuss an example which is closely related to game theory: the pennies pinching example. This is a problem discussed in philosophy that is also relevant
for advanced agent-based computer applications. It is related to trust, but it has been
discussed in the context of game theory, where it is known as a non-zero sum game. Hollis
[26, 27] discusses the example and the related problem of backward induction as follows.
A and B play a game where ten pennies are put on the table and each in turn
takes one penny or two. If one is taken, then the turn passes. As soon as
two are taken the game stops and any remaining pennies vanish. What will
happen, if both players are rational? Offhand one might suppose that they
emerge with five pennies each or with a six-four split – when the player with
the odd-numbered turns take two at the end. But game theory seems to say
not. Its apparent answer is that the opening player will take two pennies, thus
killing the golden goose at the start and leaving both worse off. The immediate
trouble is caused by what has become known as backward induction. The
resulting pennies gained by each player are given by the bracketed numbers,
with A’s put first in each case. Looking ahead, B realizes that they will not
reach (5,5), because A would settle for (6,4). A realizes that B would therefore
settle for (4,5), which makes it rational for A to stop at (5,3). In that case, B
would settle for (3,4); so A would therefore settle for (4,2), leading B to prefer
(2,3); and so on. A thus takes two pennies at his first move and reason has
obstructed the benefit of mankind.
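The backward-induction argument can be checked mechanically. The following is a minimal sketch (our own encoding of the game, not taken from Hollis) that computes the subgame-perfect outcome: with ten pennies and A moving first, A takes two immediately and the players end up with (2, 0).

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def play(pennies, mover):
        # Final payoffs (payoff_A, payoff_B) under backward induction.
        # mover is 0 for A and 1 for B; taking one penny passes the turn,
        # taking two ends the game and the remaining pennies vanish.
        if pennies == 0:
            return (0, 0)
        rest = play(pennies - 1, 1 - mover)          # option 1: take one penny
        take_one = (rest[0] + (1 - mover), rest[1] + mover)
        if pennies < 2:
            return take_one
        take_two = (2, 0) if mover == 0 else (0, 2)  # option 2: take two, the game stops
        # Each player maximizes its own final payoff.
        return max(take_one, take_two, key=lambda payoff: payoff[mover])

    print(play(10, 0))   # (2, 0)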
Game theory and backward-induction reasoning do not offer the intuitive solution to the problem, because agents are assumed to be rational in the economic sense, and consequently game-theoretic solutions do not consider an implicit mutual understanding of a cooperation strategy [2]. Cooperation results in an increased personal benefit by seducing the other party into cooperating. The open question is how such ‘super-rational’ behavior can be explained.
In his book ‘Trust within Reason’ [27], Hollis considers several possible explanations of why an agent should take one penny instead of two. For example, taking one penny in the first
move ‘signals’ to the other agent that the agent wants to cooperate (and it signals that
the agent is not rational in the economic sense). Two concepts that play a major role in
his book are trust and commitment (together with norm and obligation). One possible
explanation is that taking one penny induces a commitment that the agent will take one
penny again in his next move. If the other agent believes this commitment, then it has
become rational for him to take one penny too. Another explanation is that taking one
penny leads to a commitment of the other agent to take one penny too, maybe as a result
of a social law. Still other explanations are based not only on commitments but also on trust in the other party.
In [9] Broersen et al. introduce a language in which some aspects of these analyses can be represented. It is a modal language, like the ones we have seen before, extended with two new modalities. The formula Ci,j (α > β) means that agent i
is more committed towards agent j to do α than β, and Tj,i (α > β) means that agent j
is more trusted by agent i after executing α than after executing β. In all examples they
accept the following relation between trust and commitment, which denotes that violations
of stronger commitments result in a higher loss of trustworthiness than violations of weaker
ones.
Ci,j (α > β) → Tj,i (α > β)
We first consider the example without communication. The set of agents is G = {1, 2}
and the set of atomic actions A = {takei (1), takei (2) | i ∈ G}, where takei (n) denotes that
agent i takes n pennies. The following formula denotes that taking one penny induces
a commitment to take one penny later on.
[take1 (1); take2 (1)]C1,2 (take1 (1) > take1 (2))
The formula expresses that taking one penny is interpreted as a signal that agent 1 will
take one penny again on his next turn. When this formula holds, it is rational for agent 2
to take one penny.
The following formula denotes that taking one penny induces a commitment for the
other agent to take one penny on the next move.
[take1 (1)]C2,1 (take2 (1) > take2 (2))
The formula denotes the implications of a social law, which states that you have to return favors. It is like giving a present on someone’s birthday, thereby giving that person the obligation to return a present on your birthday.
Besides the commitment operator, more complex examples also involve the trust operator. For example, the following formula denotes that taking one penny increases the
amount of trust.
Ti,j ((α; takej (1)) > α).
The following formulas illustrate how commitment and trust may interact. The first formula expresses that each agent intends to increase the amount of trust (=long term benefit).
The second formula expresses that any commitment to itself is also a commitment to the
other agent (a very strong cooperation rule).
Ti,j (β > α) → Ij (β > α).
Cj,j (β > α) ↔ Cj,i (β > α).
From these two rules, together with the definitions and the general rule, we can deduce:
Ci,j (takei (1) > takei (2)) ↔ Tj,i (takei (1) > takei (2))
In this scenario, each agent is assumed to act to increase its long-term benefit, i.e. to act to increase the trust of other agents. Note that the commitment of i to j to take one penny increases the trust of j in i, and vice versa. Therefore, each agent would not want to take two pennies, since this would decrease its long-term benefit.
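At a purely syntactic level, the role of the general rule in this deduction can be sketched as follows (a toy representation of our own; it is not the modal semantics of [9]). Commitment and trust statements are stored as tuples and closed under the rule Ci,j(α > β) → Tj,i(α > β).

    # ("C", i, j, a, b) stands for Ci,j(a > b); ("T", j, i, a, b) stands for Tj,i(a > b).
    facts = {
        ("C", 1, 2, "take1(1)", "take1(2)"),   # induced by agent 1 taking one penny
        ("C", 2, 1, "take2(1)", "take2(2)"),   # induced by the social law
    }

    def close_under_rule(facts):
        # Apply Ci,j(a > b) -> Tj,i(a > b) until no new statements are produced.
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for kind, i, j, a, b in list(facts):
                if kind == "C" and ("T", j, i, a, b) not in facts:
                    facts.add(("T", j, i, a, b))
                    changed = True
        return facts

    for fact in sorted(close_under_rule(facts)):
        print(fact)
    # Derives ("T", 2, 1, "take1(1)", "take1(2)") and ("T", 1, 2, "take2(1)", "take2(2)")
    # from the two commitment statements.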
Broersen et al. also discuss scenarios of pennies pinching with communication.
6
Concluding remarks
In this paper we consider some examples of how the research areas CDT, QDT, KBS and
BDI are related. These approaches have been developed largely in isolation, and not much is known about their relationships. Consequently, our study is exploratory.
We start with the observation that QDT, KBS and BDI have been developed as criticisms of classical decision theory. QDT introduces qualitative abstractions to deal with
situations in which not all numbers are available. KBS introduces goals to formalize utility
aspiration levels and satisficing instead of maximizing, and BDI introduces intentions to
make the decision-making process more stable.
We discuss various similarities and differences. For example, all approaches have a
representation of information (probabilities, qualitative abstractions, knowledge, beliefs)
and a representation of motivation (utilities, qualitative abstractions, goals, desires). All
approaches have a way to represent alternatives (a small set, formulas do(ϕ) for any ϕ,
a set of decision variables, branches in a tree). Some of the approaches focus on decision
rules (the output of the decision process), others on deliberation constraints or agent types
(the way the output is produced).
We also discuss several extensions. If we consider how decisions are calculated in
theories of deliberation, then there are various notions of goals, which can for example be
adopted or generated. In decision processes the role of intentions becomes an object of study,
        area                      information                 motivation              alternatives         focus
CDT     operations research       probabilities               utilities               small set            decision rule
QDT     artificial intelligence   qualitative probabilities   qualitative utilities   Do(ϕ)                decision rule
KBS     artificial intelligence   knowledge                   goals                   decision variables   deliberation
BDI     agent theory              beliefs                     desires                 branches             deliberation

Table 2: Comparison
and in decision games, the role of obligations and commitment is studied. In KBS and BDI
norms are usually understood as obligations from the society, inspired by work on social
agents, social norms and social commitments [12], whereas in decision theory and game
theory norms are understood as reciprocal norms in evolutionary game theory [4, 45] that
lead to cooperation in iterated prisoner’s dilemmas and in general lead to a decrease in
uncertainty and an increase in stability of a society.
We draw three preliminary conclusions, which indicate directions for further research.
First, we have not found any assumptions or techniques which are incompatible between
the approaches. This suggests that there is no need to choose one of the approaches, but
that they can be combined.
Second, several techniques can be interchanged. For example, whether one uses decision
variables or branches in a decision tree to model the choice between alternatives seems to
be irrelevant.
Third, there are various issues which have been studied in one area but ignored in
another one, which suggests that the approaches can benefit from each other. For example,
several ideas developed in QDT can be used to enrich BDI: the use of different
decision rules, the role of knowledge in decision making and the granularity of actions, and
the construction process of goals from explicit (conditional) desires. One way to integrate
QDT and BDI is to start with QDT and try to incorporate ingredients of BDI. An example
of such an approach is Broersen et al.’s BOID architecture, which takes the process-oriented
approach from Thomason’s BDP and extends it with intentions and norms. However,
most elements of QDT we discuss in this paper have not yet been incorporated in this
approach. Another approach is to extend BDI with the elements of QDT we discuss here.
We hope that our study will lead to more research into hybrid approaches.
Acknowledgments
Thanks to Jan Broersen and Zhisheng Huang for many discussions on related subjects in
the context of the BOID project.
References
[1] J. Allen and G. Perrault. Analyzing intention in dialogues. Artificial Intelligence,
15(3):143–178, 1980.
[2] R. Aumann. Rationality and bounded rationality. Games and Economic Behavior,
21:2–14, 1986.
[3] J. Austin. How to do things with words. Harvard University Press, Cambridge Mass.,
1962.
[4] R. Axelrod. The evolution of cooperation. Basic Books, New York, 1984.
[5] F. Bacchus and A.J. Grove. Utility independence in a qualitative decision theory.
In Proceedings of Fifth International Conference on Knowledge Representation and
Reasoning (KR’96), pages 542–552. Morgan Kaufmann, 1996.
[6] C. Boutilier. Towards a logic for qualitative decision theory. In Proceedings of
the Fourth International Conference on Knowledge Representation and Reasoning
(KR’94), pages 75–86. Morgan Kaufmann, 1994.
[7] M. Bratman. Intention, plans, and practical reason. Harvard University Press, Cambridge Mass, 1987.
[8] P. Bretier and D. Sadek. A rational agent as the kernel of a cooperative spoken
dialogue system: Implementing a logical theory of interaction. In Intelligent Agents
III: ECAI’96 Workshop on Agent Theories, Architectures and Languages (ATAL),
volume 1193 of LNCS, pages 189–204. Springer, 1997.
[9] J. Broersen, M. Dastani, Z. Huang, and L. van der Torre. Trust and commitment
in dynamic logic. In Proceedings of The First Eurasian Conference on Advances in
Information and Communication Technology (EurAsia ICT 2002), volume 2510 of
LNCS, pages 677–684. Springer, 2002.
[10] J. Broersen, M. Dastani, J. Hulstijn, and L. van der Torre. Goal generation in the
BOID architecture. Cognitive Science Quarterly, 2(3-4):428–447, 2002.
[11] J. Broersen, M. Dastani, and L. van der Torre. Realistic desires. Journal of Applied
Non-Classical Logics, 12(2):287–308, 2002.
[12] C. Castelfranchi. Modeling social agents. Artificial Intelligence, 1998.
[13] P. Cohen and H. Levesque. Intention is choice with commitment. Artificial Intelligence,
42 (2-3):213–261, 1990.
[14] P. Cohen and C. Perrault. Elements of a plan-based theory of speech acts. Cognitive
Science, 3:177–212, 1979.
[15] R. Conte and C Castelfranchi. Understanding the effects of norms in social groups
through simulation. In G.N. Gilbert and R. Conte, editors, Artificial societies: the
computer simulation of social life. London, UCL Press, 1995.
[16] M. Dastani and L. van der Torre. Specifying the merging of desires into goals in the
context of beliefs. In Proceedings of The First Eurasian Conference on Advances in
Information and Communication Technology (EurAsia ICT 2002), LNCS 2510, pages
824–831. Springer, 2002.
[17] T. Dean and M. Wellman. Planning and Control. Morgan Kaufmann, San Mateo,
1991.
[18] D. Dennett. The Intentional Stance. MIT Press, Cambridge, MA, 1987.
[19] F. Dignum. Autonomous agents with norms. Artificial Intelligence and Law, pages
69–79, 1999.
[20] J. Doyle. A model for deliberation, action and introspection. Technical Report AITR-581, MIT AI Laboratory, 1980.
[21] J. Doyle, Y. Shoham, and M.P. Wellman. The logic of relative desires. In Sixth
International Symposium on Methodologies for Intelligent Systems, Charlotte, North
Carolina, 1991.
[22] J. Doyle and R. Thomason. Background to qualitative decision theory. AI Magazine, 20(2):55–68, 1999.
[23] J. Doyle and M. Wellman. Preferential semantics for goals. In Proceedings of the
Tenth National Conference on Artificial Intelligence (AAAI’91), pages 698–703, 1991.
[24] D. Dubois and H. Prade. Possibility theory as a basis for qualitative decision theory. In
Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence
(IJCAI’95), pages 1924–1930. Morgan Kaufmann, 1995.
[25] B. Hansson. An analysis of some deontic logics. Nous, 1969.
[26] M. Hollis. Penny pinching and backward induction. Journal of Philosophy, 88:473–488,
1991.
[27] M. Hollis. Trust within Reason. Cambridge University Press, 1998.
[28] R. C. Jeffrey. The Logic of Decision. McGraw-Hill, New York., 1965.
[29] N. Jennings. On agent-based software engineering. Artificial Intelligence, 117(2):277–
296, 2000.
[30] R.L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value
Trade-offs. John Wiley and Sons, New York, 1976.
[31] J. Lang. Conditional desires and utilities - an alternative approach to qualitative
decision theory. In Proceedings of the Twelfth European Conference on Artificial Intelligence (ECAI’96), pages 318–322. John Wiley and Sons, 1996.
[32] J. Lang, E. Weydert, and L. van der Torre. Conflicting desires. In Proceedings
GTDT01, 2001.
[33] J. Lang, E. Weydert, and L. van der Torre. Utilitarian desires. Autonomous Agents
and Multi-Agent Systems, 5(3):329–363, 2002.
[34] Allen Newell. The knowledge level. Artificial Intelligence, 18(1):87–127, 1982.
[35] J. Pearl. System Z. In Proceedings of TARK’90, pages 121–135. Morgan Kaufmann,
1990.
[36] J. Pearl. From conditional ought to qualitative decision theory. In Proceedings of
the Ninth Conference on Uncertainty in Artificial Intelligence (UAI’93), pages 12–20.
John Wiley and Sons, 1993.
[37] G. Pinkas. Reasoning, nonmonotonicity and learning in connectionist network that
capture propositional knowledge. Artificial Intelligence, 77:203–247, 1995.
[38] A. Rao and M. Georgeff. Deliberation and its role in the formation of intentions. In
Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI’91), pages 300–307, San Mateo, CA, 1991. Morgan Kaufmann Publishers.
[39] A. Rao and M. Georgeff. Modeling rational agents within a BDI architecture. In Proceedings of the Second International Conference on Knowledge Representation and Reasoning (KR’91), pages 473–484. Morgan Kaufmann, 1991.
[40] A. Rao and M. Georgeff. BDI agents: From theory to practice. In Proceedings of the
First International Conference on Multi-Agent Systems (ICMAS’95), 1995.
[41] A. Rao and M. Georgeff. Decision procedures for BDI logics. Journal of Logic and
Computation, 8:293–342, 1998.
[42] L. Savage. The foundations of statistics. Wiley, New York, 1954.
[43] G. Schreiber, H. Akkermans, A. Anjewierden, R. de Hoog, N. Shadbolt, W. Van
de Velde, and B. Wielinga. Knowledge Engineering and Management — The CommonKADS Methodology. The MIT Press, Cambridge, Massachusetts; London, England, 1999.
[44] J. Searle. Speech acts: an Essay in the Philosophy of Language. Cambridge University
Press, Cambridge, 1969.
[45] Y. Shoham and M. Tennenholtz. On the emergence of social conventions: modeling,
analysis, and simulations. Artificial Intelligence, 94(1-2):139–166, 1997.
[46] H. A. Simon. A behavioral model of rational choice. Quarterly Journal of Economics,
pages 99–118, 1955.
[47] S.-W. Tan and J. Pearl. Qualitative decision theory. In Proceedings of the Thirteenth
National Conference on Artificial Intelligence (AAAI’94), 1994.
[48] S.-W. Tan and J. Pearl. Specification and evaluation of preferences under uncertainty.
In Proceedings of the Fourth International Conference on Knowledge Representation
and Reasoning (KR’94), pages 530–539. Morgan Kaufmann, 1994.
[49] R. Thomason. Desires and defaults: a framework for planning with inferred goals.
In Proceedings of Seventh International Conference on Knowledge Representation and
Reasoning (KR’2000), pages 702–713. Morgan Kaufmann, 2000.
[50] L. van der Torre. Contextual deontic logic: Normative agents, violations and independence. Annals of Mathematics and Artificial Intelligence, 37 (1-2):33–63, 2003.
[51] L. van der Torre and E. Weydert. Parameters for utilitarian desires in a qualitative
decision theory. Applied Intelligence, 14:285–301, 2001.
[52] M. Walker. Inferring acceptance and rejection in dialog by default rules of inference.
Language and Speech, 39(2-3):265–304, 1996.