How to Decide What to Do? A Comparison between Classical, Qualitative and Cognitive Theories

Mehdi Dastani, Utrecht University, The Netherlands, [email protected]
Joris Hulstijn, Utrecht University, The Netherlands, [email protected]
Leendert van der Torre, CWI Amsterdam, The Netherlands, [email protected]

January 10, 2003

Abstract

In this paper Classical Decision Theory (CDT) is compared with three alternative theories developed in artificial intelligence and agent theory, called Qualitative Decision Theory (QDT), Knowledge Based Systems (KBS) and Belief-Desire-Intention (BDI) models. All theories are based on an informational attitude (probabilities, qualitative abstractions of probabilities, knowledge, beliefs) and a motivational attitude (utilities, qualitative abstractions, goals, desires). All theories represent a set of alternatives (a small set, logical formulas, decision variables, branches of trees). Some theories focus on the decision rule (maximize expected utility, Wald criterion) and others on the way to reach decisions (deliberation, agent types). We also consider the relation between decision processes and intentions, and the relation between game theory and norms and commitments. We show that QDT, KBS and BDI are sometimes alternative solutions to the same problem, but in general they are complementary. For example, QDT offers qualitative decision rules that can make trade-offs between normal and exceptional circumstances, QDT and BDI offer logical specification languages, KBS offers goal-based decision making, and BDI offers stability in decision processes.

1 Introduction

In recent years there has been new interest in the foundations of decision making, due to the automation of decision making in the context of tasks like planning, learning and communication. In artificial intelligence and agent theory various alternatives to classical decision theory have been developed, rooted in different research traditions and with different objectives. The relations between the different theories that aim to explain the decision-making behavior of rational agents are not well studied. We are interested in relations between the various approaches to decision making. In this paper we consider four theories: Classical Decision Theory (CDT) [42, 28], developed within economics and used within operations research, a Qualitative variant of Decision Theory (QDT) developed in artificial intelligence [36, 6], Knowledge Based Systems (KBS) [34], also developed in artificial intelligence, and Belief-Desire-Intention (BDI) models, developed in philosophy and agent theory [7, 39, 13, 15, 29]. The main distinction between CDT and QDT on the one hand, and KBS and BDI on the other hand, is that the latter two describe decision making in terms of deliberation on cognitive attitudes such as beliefs, desires, intentions, goals, etc. They are therefore usually called theories of deliberation instead of decision theories. The alternative theories criticize classical decision theory, which is our starting point in the comparison. This criticism goes back to Simon's notion of limited (or bounded) rationality, and his introduction of utility aspiration levels [46]. This has led to the notion of a goal in KBS. The research area of QDT developed much more recently in artificial intelligence out of research on models for reasoning under uncertainty. It focusses on theoretical models of decision making with potential applications in planning.
The research area of BDI developed out of philosophical arguments that – besides the knowledge and goals used within KBS – also intentions should be first-class citizens of a cognitive theory of deliberation.

The following example of Doyle and Thomason [22] illustrates the criticism of classical decision theory. It is based on the automation of human financial advice dialogues, which have been studied in linguistics [52]. Consider a user who seeks advice about financial planning. The user may for example want to retire early, secure a good pension and maximize the inheritance for her children. We can model the decision of such a user in terms of the usual decision theoretic parameters. The user can choose between a limited number of actions: retire at a certain age, invest her savings and give certain sums of money to her children per year. The user does not know all factors that might influence the decision. She does not know if she will get a pay raise next year, the outcome of her financial actions is uncertain, and not even the user's own preferences are clear. For example, securing her own pension conflicts with her children's inheritance. An experienced decision theoretic analyst interactively guides the user through the decision process, indicating possible choices and desirable consequences. As a result the user may drop her initial preferences. For example, she may prefer to continue working for another five years before retiring. According to Doyle and Thomason, this interactive process of preference elicitation cannot be automated in decision theory itself, although there are various approaches and methodologies available in decision theoretic practice. The problem is that it is difficult to describe the alternatives in this example. Another problem is that CDT is not suitable to model generic preferences.

This example can also be used to illustrate the differences between objectives and concepts of the three alternative theories. Most importantly, the different theories provide different conceptualizations of decision making behavior, all of which may be helpful. They focus on automating and modeling different aspects of the preference elicitation phase. QDT uses the same conceptualization as CDT. Preferences are typically uncertain, formulated in general terms, dependent on uncertain assumptions and subject to change. Often a preference is expressed in the shape of a trade-off. Is this really what the user wants and how much is she prepared to sacrifice for that? KBS model and formalize a particular task and domain in terms of the knowledge and goals of a user [34]. Knowledge-based systems use high-level conceptual models of particular domains such as medical, legal or engineering domains, and aim to re-use inference schemes applicable for particular tasks, such as classification or configuration. In many applications, reasoning with knowledge and goals is implemented by production rules with either forward or backward chaining. There have been many successful applications of KBS. Development methods for KBS have matured. Schreiber et al. [43], for example, provide an elaborate methodology for modeling, developing and testing knowledge-based applications. BDI is a cognitive theory of decision making behavior of rational agents and aims at describing the decision making behavior in terms of cognitive concepts such as beliefs, desires, and intentions. In the BDI theory, an agent can behave rationally in different ways depending on its type.
The belief and desire concepts correspond closely to the concepts used in alternative decision theories, and the intention concept, which can be considered as a previous decision, is proposed to constrain the alternatives from which agents can choose. The concept of intention is claimed to stabilize the decision making behavior through time.

In this paper we do not only discuss the four basic theories, but we also consider some extensions of them. In particular, we also consider the relation between decision processes and intentions, and the relation between game theory and norms and commitments. A summary of all related theories is given in Table 1. Some concepts can be mapped easily onto concepts of other theories. For example, all theories use some notion of an informational attitude (probabilities, qualitative abstractions of probabilities, knowledge or beliefs) and a notion of motivational attitude (utilities, qualitative abstractions, goals or desires). Other concepts are more ambiguous, such as intentions. For example, in goal-based planning the goals have a desirability aspect as well as an intentional aspect [20]. KBS have been developed to formalize goal-based reasoning, and some qualitative decision theories like [6] have been developed as a criticism of the inflexibility of the notion of goal in goal-based planning.

                 CDT                   QDT                      KBS / BDI
  theory         classical             qualitative              knowledge-based
                 decision theory       decision theory          systems
  + time         (Markov) decision     decision-theoretic       belief-desire-intention
                 processes             planning                 logics & systems
  + multiagent   classical             qualitative              normative systems
                 game theory           game theory              (BOID)

  Table 1: Theories discussed in this paper

Clearly, the discussions in this paper have to be superficial and cannot do justice to the subtleties defined in each approach. We therefore urge the reader to read the original papers we discuss. To restrict the scope of our comparison, we make some simplifications. First, we discuss QDT in more detail than KBS and BDI, because it is closer to CDT and has been positioned as an intermediary between CDT and KBS/BDI [22]. Second, we do not try to be complete, but we restrict our comparison to some illustrative examples. In particular, for the relation between CDT and QDT we discuss Doyle and Thomason [22] and Pearl [36]. For the relation between QDT and KBS/BDI we focus on the different interpretations of goals, and our discussion is based on Boutilier [6] and Rao and Georgeff [39]. For the direct relation between CDT and KBS/BDI we discuss Rao and Georgeff's translation of decision trees to beliefs and desires [38]. The extension with time in decision processes focusses on the role of intentions in Rao and Georgeff's work [39] and the extension with multiple agents in game theory focusses on the role of norms in a logic of commitments [9]. We restrict ourselves to formal theories and logics, and do not go into architectures or into the philosophical motivations of the underlying cognitive or social concepts.

The layout of this paper is as follows. In Section 2 we discuss informational and motivational attitudes, alternatives and decision rules in classical and qualitative decision theory. In Section 3 we discuss goals in qualitative decision theory, knowledge based systems and BDI systems. In Section 4 we compare classical decision theory and Rao and Georgeff's BDI approach.
Finally, in Section 5 we discuss intentions and norms in extensions of classical decision theory that deal with time in processes, and that deal with multiple agents in game theory.

2 Classical vs qualitative decision theory

In this section we give a comparison between classical and qualitative decision theory, following Doyle and Thomason's introduction to qualitative decision theory [22].

2.1 Classical decision theory

In classical decision theory, a decision is a choice made by some entity of an action from a set of alternatives. It has nothing to say about actions – either about their nature or about how a set of them becomes available to the decision maker. A decision is good if it is an alternative that the decision maker believes will prove at least as good as other alternative actions. Good decisions are formally characterized as actions that maximize expected utility, a notion involving both belief and goodness. See [22] or [42, 28] for further explanations.

Definition 1 Let A stand for the set of actions or alternatives. With each action, a set of outcomes is associated. Let W stand for the set of all possible worlds or outcomes (outcomes are usually represented by Ω; here we use W to facilitate the comparison with other theories). Let U be a measure of outcome value that assigns a utility U(w) to each outcome w ∈ W, and let P be a measure of the probability of outcomes conditional on actions, with P(w|a) denoting the probability that outcome w comes about after taking action a ∈ A in the situation under consideration. The expected utility EU(a) of an action a is the average utility of the outcomes associated with the alternative, weighing the utility of each outcome by the probability that the outcome results from the alternative, that is, EU(a) = Σ_{w∈W} U(w)P(w|a). A rational decision maker always maximizes expected utility, i.e. he makes decisions according to the MEU decision rule.

Decision theory is an active research area within economics, and the number of extensions and subtleties is too large to address here. In some presentations of CDT, not only uncertainty about the effect of actions is considered, but also uncertainty about the actual world. A classic result is that uncertainty about the effects of actions can be expressed in terms of uncertainty about the present state. Moreover, several other decision rules have been investigated, including qualitative ones, such as maximin, minimax, minregret, etc. Finally, classical decision theory has been extended in several ways to deal for example with multiple objectives, sequential decisions, multiple agents, distinct notions of risk, etcetera. The extensions with sequential decisions and multiple agents are discussed in Sections 5.1 and 5.2. Decision theory has become one of the main foundations of economic theory due to so-called representation theorems. Representation theorems, such as the most famous one of Savage [42], typically prove that each decision maker obeying certain innocent-looking postulates (about weighted choices) acts as if he applies the MEU decision rule with some probability distribution and utility function. Thus, he does not have to be aware of it, and his utility function does not have to represent selfishness. In fact, exactly the same is true for altruistic decision makers. They also act as if they maximize expected utility; they just have another utility function.
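To make Definition 1 concrete, the following is a minimal sketch of the MEU decision rule for a finite set of actions and outcomes. The actions, outcomes and numbers are invented for illustration; they are not taken from the paper.

```python
# Minimal sketch of the MEU decision rule of Definition 1 (illustrative numbers).
# EU(a) = sum over outcomes w of U(w) * P(w|a); a rational decision maker picks
# an action with maximal expected utility.

utility = {"dry": 10.0, "wet": 0.0}                     # U(w) for each outcome w in W
prob = {                                                # P(w|a) for each action a in A
    "take_umbrella":  {"dry": 1.0, "wet": 0.0},
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},
}

def expected_utility(action):
    return sum(utility[w] * p for w, p in prob[action].items())

def meu_decision(actions):
    return max(actions, key=expected_utility)

print({a: expected_utility(a) for a in prob})   # {'take_umbrella': 10.0, 'leave_umbrella': 7.0}
print(meu_decision(prob))                       # take_umbrella
```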
2.2 Qualitative decision theory

According to Doyle and Thomason [22, p.58], quantitative representations of probability and utility, and procedures for computing with these representations, do provide an adequate framework for the manual treatment of very simple decision problems, but they are less successful in more realistic cases. For example, they argue that classical decision theory does not address decision making in unforeseen circumstances, it offers no means for capturing generic preferences, it provides little help in modeling decision makers who exhibit discomfort with numeric tradeoffs, and it provides little help in effectively representing and reasoning about decisions involving broad knowledge of the world. Doyle and Thomason argue for various formalization tasks. They distinguish the following new tasks: formalisms to express generic probabilities and preferences, properties of decision formulations, reasons and explanations, revision of preferences, practical qualitative decision-making procedures and agent modeling. Moreover, they argue that hybrid representation and reasoning with quantitative and qualitative techniques, as well as reasoning within context, deserve special attention. Many of these issues are related to subjects studied in artificial intelligence. It appears that researchers now realize the need to reconnect the methods of AI with the qualitative foundations and quantitative methods of economics.

Some first results have been obtained in the area of reasoning about uncertainty, a subdomain of artificial intelligence which mainly attracts researchers with a background in reasoning about defaults and beliefs. Often the formalisms of reasoning about uncertainty are re-applied in the area of decision making. Thus, typically uncertainty is not represented by a probability function, but by a plausibility function, a possibilistic function, Spohn-type rankings, etc. Another consequence of this historic development is that the area is much more mathematically oriented than the planning community or the BDI community. As a typical example, we mention the work of Pearl [36]. First, he introduces so-called semi-qualitative rankings. A ranking κ(w) can be considered as an order-of-magnitude approximation of a probability function P(w) by writing P(w) as a polynomial of some small quantity ε and taking the most significant term of that polynomial. Similarly, a ranking µ(w) can be considered as an approximation of a utility function U(w). There is one more subtlety here, which has to do with the fact that whereas κ rankings are positive, the µ rankings can be either positive or negative. This represents that outcomes can be either very desirable or very undesirable.

Definition 2 A belief ranking function κ(w) is an assignment of non-negative integers to outcomes or possible worlds w ∈ W such that κ(w) = 0 for at least one world. Intuitively, κ(w) represents the degree of surprise associated with finding a world w realized, and worlds assigned κ(w) = 0 are considered serious possibilities. Likewise, µ(w) is an integer-valued utility ranking of worlds. Moreover, both probabilities and utilities are defined as a function of the same ε, which is treated as an infinitesimal quantity (smaller than any real number). C is a constant and O is the order of magnitude.

P(w) ∼ Cε^{κ(w)},   U(w) = O(1/ε^{µ(w)}) if µ(w) ≥ 0, and U(w) = −O(1/ε^{−µ(w)}) otherwise.

This definition illustrates the use of abstractions of probabilities and utilities.
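The following small sketch shows, with invented numbers, how such an order-of-magnitude abstraction can be read off a probability function for a fixed small ε, and why it makes unlikely-but-important worlds comparable to likely, moderately important ones. The rounding scheme is our own assumption, not Pearl's exact construction.

```python
import math

# Sketch of the order-of-magnitude abstraction of Definition 2 (illustrative data;
# the rounding used here is an assumption, not Pearl's exact construction).
# With P(w) ~ C * eps**kappa(w), kappa(w) is roughly the exponent of P(w) in powers
# of eps: likely worlds get kappa = 0, surprising ones kappa = 1, 2, ...
EPS = 0.01

def kappa(p):
    """Degree of surprise: order of magnitude of P(w) in powers of EPS."""
    return round(math.log(p, EPS)) if p > 0 else float("inf")

probs = {"sunny": 0.99, "rain": 0.01, "storm": 0.0001}
print({w: kappa(p) for w, p in probs.items()})   # {'sunny': 0, 'rain': 1, 'storm': 2}

# A utility ranking mu(w) abstracts U(w) in the same way (U(w) of order 1/eps**mu(w)),
# so a world's contribution to expected utility has order eps**(kappa(w) - mu(w)):
# an unlikely but very important world (kappa = 1, mu = 1) thereby becomes comparable
# to a likely, moderately important one (kappa = 0, mu = 0), as discussed in Section 2.3.2.
```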
However, as a consequence of Pearl's qualitative perspective, more has to change. There are also several issues with respect to the set of alternatives, and the use of the MEU decision rule in this qualitative setting. These issues are discussed in the comparison in the following section.

2.3 Relation CDT and QDT

2.3.1 Alternatives

A difference between the two approaches is that actions in decision theory are pre-designated to a few distinguished atomic variables, but Pearl assumes that actions 'Do(ϕ)' are applicable to any proposition ϕ. He associates a valuation function with the set of worlds W, which assigns a truth value to every proposition at each world. A proposition ϕ is now identified with the set of worlds that satisfy ϕ. In this setting, he explains that the expected utility of a proposition ϕ clearly depends on how we came to know ϕ. For example, if we find out that the ground is wet, it matters whether we happened to find the ground wet (observation) or we watered the ground (action). In the first case, finding ϕ true may provide information about the natural process that led to the observation ϕ, and we should change the current probability from P(w) to P(w|ϕ). In the second case, our actions may perturb the natural flow of events, and P(w) will change without shedding light on the typical causes of ϕ. This is represented differently, by P_ϕ(w). According to Pearl, this distinction between P(w|ϕ) and P_ϕ(w) nicely corresponds to various other distinctions found in a wide variety of theories, such as the distinction between Lewis' conditioning and imaging, between belief revision and belief update, and between indicative and subjunctive conditionals. One of the tools Pearl uses for the formalization of this distinction is causal networks (a kind of Bayesian networks with actions).

A similarity between the two theories is that both suppress explicit references to time. Pearl suggests that his approach [36] differs in this respect from other theories of action in artificial intelligence, since they are normally formulated as theories of temporal change. Pearl also observes that in this respect he is inspired by deontic logic, the logic of obligations and permissions. See also Section 5.2 on norms.

2.3.2 Decision rule

A second similarity between CDT as presented in Definition 1 and Pearl's QDT as presented in Definition 2 is that both consider trade-offs between normal situations and exceptional situations. QDT differs in this respect from qualitative decision rules that can be found in decision theory itself, such as 'maximize the worst outcome' (the pessimistic Wald criterion), because these qualitative decision rules cannot make such trade-offs. The problem with a purely qualitative approach is that it is unclear how, besides the most likely situations, also less likely situations can be taken into account. In particular we are interested in situations which are unlikely, but which have a high impact, i.e., an extremely high or low utility. For example, the probability that your house will burn down is not very high, but such an outcome is very undesirable. Some people therefore decide to take out insurance. In a purely qualitative setting it is unknown how much expected impact such an unlikely situation has. There is no way to compare a likely but mildly important effect to an unlikely but important effect. Going from quantitative to qualitative we may have gained computational efficiency, but we have lost one of the main useful properties of decision theory.
The solution proposed by Pearl as discussed above is based on two ideas. First, the initial probabilities and utilities are neither represented by quantitative probability distributions and utility functions, nor by purely qualitative orders, but by something semi-qualitative in between. Second, Pearl introduces an assumption to make the two semi-qualitative functions comparable. This is called the commensurability assumption, see e.g. [24]. Technically, alternatives or actions are represented by propositional formulas, as discussed above. Taking P(w|ϕ) as the probability function that would prevail after obtaining ϕ, which may be exchanged for P_ϕ(w) in case of actions, the expected utility criterion U(ϕ) = Σ_{w∈W} U(w)P(w|ϕ) shows that we can have for example likely and moderately interesting worlds (κ(w) = 0, µ(w) = 0) or unlikely but very important worlds (κ(w) = 1, µ(w) = 1), which have become comparable since in the second case we have ε/ε. Though Pearl's approach is capable of dealing with trade-offs between normal and exceptional circumstances, it is less clear how it can deal with trade-offs between two effects under normal circumstances.

3 Qualitative decision theory vs KBS/BDI

In this section we give a comparison between QDT and KBS/BDI, based on their interpretation of beliefs and goals. We focus our attention on two theories that both propose qualitative versions of probability and preferences defined on possible worlds or states, namely Boutilier's QDT [6] and Rao and Georgeff's BDI [38, 40, 41].

3.1 Qualitative decision theory (continued)

Another class of qualitative decision theories is developed in the context of planning. Goals serve a dual role in most planning systems, capturing aspects of both desire towards a state and commitment to pursuing that state [20]. In goal-based planning, adopting a proposition as a goal commits the agent to find some way to accomplish the goal, even if this requires adopting some subgoals that may not correspond to desirable propositions themselves [17]. In realistic planning situations goals can be achieved to varying degrees, and frequently goals cannot be achieved completely. Context-sensitive goals are formalized with basic concepts from decision theory [17, 23, 6]. In general, goal-based planning must be extended with a mechanism to choose which goals must be adopted.

Boutilier [6] proposes a logic and possible worlds semantics for representing and reasoning with qualitative probabilities and utilities, and suggests several strategies for qualitative decision making based on this logic. His semantics is neither quantitative (like CDT) nor semi-qualitative (like Pearl's QDT), but purely qualitative. Consequently, the MEU decision rule is replaced by qualitative rules like Wald's criterion. The conditional preference is captured by a preference ordering (an ordinal value function) that is defined on possible worlds. The preference ordering represents the desirability of worlds. Similarly, probabilities are captured by an ordering on possible worlds, called the normality ordering, representing their likelihood.

Definition 3 The possible worlds semantics for this logic is based on models of the form M = ⟨W, ≤_P, ≤_N, V⟩, where W is a set of worlds (outcomes), ≤_P is a transitive and connected preference ordering relation on W, ≤_N is a transitive and connected normality ordering relation on W, and V is a valuation function.
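The following is a minimal sketch, with invented worlds and orderings, of the kind of structure Definition 3 describes: a finite set of worlds carrying a normality ranking and a preference ranking, together with one simple pessimistic decision rule in the spirit of the Wald-style qualitative rules mentioned above. It is an illustration of the semantics, not Boutilier's own procedure.

```python
# Sketch of a purely qualitative model in the spirit of Definition 3 (invented data).
# Lower rank means more normal (<=_N) or more preferred (<=_P).
worlds = {
    #                 normality  preference
    "rain-umbrella":     (0,        1),
    "rain-no-umbrella":  (0,        3),
    "sun-umbrella":      (1,        2),
    "sun-no-umbrella":   (1,        0),
}
# Identify each action with the set of worlds it can bring about.
actions = {
    "take_umbrella":  {"rain-umbrella", "sun-umbrella"},
    "leave_umbrella": {"rain-no-umbrella", "sun-no-umbrella"},
}

def most_normal(ws):
    best = min(worlds[w][0] for w in ws)
    return {w for w in ws if worlds[w][0] == best}

def pessimistic_value(action):
    """Worst (largest) preference rank among the most normal outcomes of the action."""
    return max(worlds[w][1] for w in most_normal(actions[action]))

print(min(actions, key=pessimistic_value))   # take_umbrella: its worst normal outcome is best
```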
The difference between the two previous theories is that Boutilier also proposes a logic to reason about this semantics. In the proposed logic, conditional preferences can be represented by means of the ideal operator I. We have that a model M satisfies the formula I(ϕ|ψ) if the preferred or best or minimal ψ-worlds are ϕ-worlds. For example, let u be the proposition 'agent has umbrella' and r be the proposition 'it rains'; then I(u|r) expresses that in the most preferred rain-worlds the agent has an umbrella. The formal definition of the ideal operator I will be explained in more detail in Section 3.3.6. Similarly, probabilities are represented in this logic by means of a normative conditional connective ⇒. For example, let w be the proposition 'the agent is wet' and r be the proposition 'it rains'; then r ⇒ w expresses that the agent is wet in the most normal rain-worlds. Its semantics is derived from Hansson's deontic logic with modal operator O for obligation [25]. (In default logic, an exception is a digression from a default rule; similarly, in deontic logic an offense is a digression from the ideal.) A similar type of semantics is used by Lang [31] for a modality D to model desire. An alternative approach represents conditional modalities by so-called 'ceteris paribus' preferences, using additional formal machinery to formalize the notion of 'similar circumstances', see e.g. [23, 21, 47, 48].

A rational agent is then assumed to attempt to achieve the best possible world consistent with its (default) knowledge. Based on this idea, the notion of a goal is introduced as some proposition that the agent desires to make true.

Definition 4 Given a set of facts KB, a goal is thought to be any proposition α such that M |= I(α | Cl(KB)), where Cl(KB) is the default closure of the facts KB defined as follows: Cl(KB) = {α | KB ⇒ α}.

The expressions of this logic represent qualitative constraints, and the proposed logic enables agents to reason directly with such constraints.

3.2 KBS/BDI

According to Dennett [18], attitudes like beliefs and desires are folk psychology concepts that can be fruitfully used in explanations of rational behavior. If you were asked to explain why someone is carrying an umbrella, you may reply that he believes it is going to rain and that he does not want to get wet. For the explanation it does not matter whether he actually possesses these mental attitudes. Similarly, we describe the behavior of an affectionate cat or an unwilling screw in terms of mental attitudes. Dennett calls treating a person or artifact as a rational agent the 'intentional stance'.

"Here is how it works: first you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose. Then you figure out what desires it ought to have, on the same considerations, and finally you predict that this rational agent will act to further its goals in the light of its beliefs. A little practical reasoning from the chosen set of beliefs and desires will in most instances yield a decision about what the agent ought to do; that is what you predict the agent will do." [18, p. 17]

In this tradition, knowledge bases (KB) and beliefs (B) represent the information of an agent about the state of the world. Belief is like knowledge, except that it does not have to be true. Goals (G) or desires (D) represent the preferred states of affairs for an agent.
The terms goals and desires are sometimes used interchangeably. In other cases, desires are like goals, except that they do not have to be mutually consistent. Desires are long-term preferences that motivate the decision process. Intentions (I) correspond to previously made commitments of the agent, either to itself or to others. As argued by Bratman [7], intentions are meant to stabilize decision making.

To motivate intentions we consider the application of one of the first BDI systems, namely a lunar robot. The robot is supposed to reach some destination on the surface of the moon. Its path is obstructed by a rock. Suppose that based on its cameras and other sensors, the robot decides that it will go around the rock on the left. At every step the robot will receive new information through its sensors. Because of shadows, rocks may suddenly appear much larger. If the robot were to reconsider its decision with each new piece of information, it would never reach its destination. Therefore, the agent will adopt a plan until some really strong reason forces it to change it. The intentions of an agent correspond to the set of adopted plans at some point in time.

BDI theories are successfully applied in natural language processing and the design of interactive systems. The theory of speech acts [3, 44] and subsequent applications in artificial intelligence [14, 1] analyze the meaning of an utterance in terms of its applicability and sincerity conditions and the intended effect. These conditions are best expressed using belief (or knowledge), desire (or goal) and intention. For example, a question is applicable when the speaker does not yet know the answer and the hearer is expected to know the answer. A question is sincere if the speaker actually desires to know the answer. By the conventions encoded in language, the effect of a question is that it signals the intention of the speaker to let the hearer know that the speaker desires to know the answer. Now if we assume that the hearer is cooperative, which is not a bad assumption for interactive systems, the hearer will adopt the goal to let the speaker know the answer to the question and will consider plans to find and formulate such answers. In this way, traditional planning systems and natural language communication can be combined. For example, Sadek [8] describes the architecture of a spoken dialogue system that assists the user in selecting automated telephone services like the weather forecast, directory services or collect calls. According to its developers the advantage of the BDI specification is its flexibility. In case of a misunderstanding, the system can retry and reach its goal to assist the user by some other means. This specification in terms of BDI later developed into the standard for agent communication languages endorsed by FIPA.

As a typical example of a BDI model, we discuss Rao and Georgeff's initial BDI logic [39]. The (partial) information on the state of the environment, which is represented by quantitative probabilities in classical decision theory [42, 28] and by a qualitative ordering in qualitative decision theory [6], is now reduced to binary values (0-1). This abstraction of the (partial) information on the state of the environment is called the beliefs of the decision making agent.
Similarly, the (partial) information about the objectives of the decision making agent, which is represented by quantitative utilities in classical decision theory and by a qualitative preference ordering in qualitative decision theory, is reduced to binary values (0-1) as well. This abstraction of the (partial) information about the objectives of the decision making agent is called the desires of the decision making agent. In the following rather complicated semantics of the BDI logic, this binary representation is represented by the fact that each relation B, D, and I maps each world (at a given time point) to an unstructured set of worlds. In the spirit of Boutilier's QDT, each world (at a time point) would have to be mapped to an ordering on worlds.

Definition 5 (Semantics of BDI logic [39]) An interpretation M is defined to be a tuple M = ⟨W, E, T, <, U, B, D, I, Φ⟩, where W is the set of worlds, E is the set of primitive event types, T is a set of time points, < is a binary relation on time points, U is the universe of discourse, and Φ is a mapping from first-order entities to elements in U for any given world and time point. A situation is a world, say w, at a particular time point, say t, and is denoted by w_t. The relations B, D, and I map the agent's current situation to her belief-, desire-, and intention-accessible worlds, respectively. More formally, B ⊆ W × T × W, and similarly for D and I. Sometimes we shall use R to refer to any one of these relations and shall use R_t^w to denote the set of worlds R-accessible from world w at time t.

Again there is a logic to reason about this semantics. The consequence of the fact that we no longer have pre-orders, as in Boutilier's logic, but only an unstructured set, is that we can now only represent monadic expressions like B(ϕ) and D(ϕ), no longer dyadic expressions like I(ϕ|ψ). We have that a world (at a certain time point) of the model satisfies B(ϕ) if ϕ is true in all accessible worlds (at the same time point). However, Rao and Georgeff have introduced something else, namely a temporal component. The BDI logic is an extension of the so-called Computation Tree Logic (CTL∗), which is often used to model branching time structures, with modal operators for beliefs B, desires D, and intentions I. The modal operators are used to model the cognitive state (belief, desire, and intention) of a decision making agent, while the branching time structure is used to model possible events that could take place at a certain time point and determines the alternative worlds at that time point.

3.3 Comparison of QDT and KBS/BDI

As in the previous comparison, in order to compare QDT and BDI we consider two issues: alternatives and decision rules. Subsequently, we consider some distinctions particular to these approaches.

3.3.1 Alternatives

The two approaches have chosen different ways to represent alternatives. Boutilier [6] introduces a very simple but elegant distinction between consequences of actions and consequences of observations, by distinguishing between controllable and uncontrollable propositional atoms. This seems to make his logic less expressive but also simpler than Pearl's (but beware, there are still a lot of complications around). One of the issues he discusses is the distinction between goals under complete knowledge and goals under partial knowledge.
On the other hand, BDI does not involve an explicit notion of actions, but instead models possible events that can take place through time by means of the branching time structure. In fact, the branching time structure models possible (sequences of) events and determines the alternative (cognitive) worlds that an agent can bring about through time. Thus, each branch in the temporal structure represents an alternative the agent can choose. Uncertainty about the effects of actions is not modelled by branching time, but by distinguishing between different belief worlds. In effect, they model all uncertainty about the effects of actions as uncertainty about the present state, a well-known trick from decision theory we already mentioned in Section 2.1. This is discussed at length in [38], a paper we discuss in more detail in Section 4.

3.3.2 Decision rules

As observed above, the qualitative normality and desirability orderings on possible worlds that are used in QDT are reduced to dichotomous values in BDI. Based on the normality and the desirability orderings, Boutilier uses a qualitative decision rule like the Wald criterion. Since there is no such ordering on the possible worlds in BDI, each desire world can in principle be chosen in BDI as the goal world that needs to be achieved. It is clearly not intuitive to select a desire world as the goal world at random, since desire worlds need not be belief worlds. Selecting a desire world which is not believed results in wishful thinking and thus suggests an unrealistic decision. Therefore, BDI proposes a number of constraints under which each desire world can be chosen as a goal world. These constraints are usually characterized by some axioms called realism, strong realism or weak realism [41, 11].

In the BDI logic, one can thus model the belief, desire, and intention worlds of an agent through time and relate these states to each other by means of axioms. In this way, one can state that the set of desire worlds should be a subset of the belief worlds and that the alternative worlds at a certain time point, i.e. all time branches from a world at that time point, should be identical to the alternative desire worlds at that time point. This constraint is the so-called realism. In particular, the realism constraint states that an agent's desires should be consistent with its beliefs. Note that this constraint is the same as in QDT, where the goal worlds should be consistent with the belief worlds. Formally, the realism axiom states that the set of desire-accessible worlds should be a subset of the set of belief-accessible worlds, i.e.

B(ϕ) → D(ϕ)

and, moreover, the belief and desire worlds should have identical branching time structures (the same alternatives), i.e.

∀w ∀t ∀w′: if w′ ∈ D_t^w then w′ ∈ B_t^w

The axioms that constrain the relation between beliefs, desires, and alternatives determine the type of agent, i.e. the type of decision making behavior. Thus, in contrast to QDT, where the decision rule is based on the ordering on the possible worlds, in BDI any desire world that satisfies a set of assumed axioms such as realism can be chosen as the goal world, i.e. BDI systems do not consider decision rules but agent types. Although there is no such notion as agent type in classical or qualitative decision theory, there are discussions that can be related to agent types. For example, in discussions about risk attitude, often a distinction is made between risk-neutral and risk-averse agents.
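A minimal sketch of checking such static constraints on finite accessibility sets is given below; the situations and world names are invented, and the intention-related subset condition anticipates the axiom on intentions introduced in the next paragraph.

```python
# Sketch of checking static realism-style constraints on accessibility sets
# (situation and world names are invented). For every situation (w, t):
#   desire-accessible worlds must be a subset of belief-accessible worlds  (D ⊆ B),
#   intention-accessible worlds must be a subset of desire-accessible ones (I ⊆ D).
B = {("w0", 0): {"w1", "w2", "w3"}}   # B_t^w per situation
D = {("w0", 0): {"w1", "w2"}}         # D_t^w per situation
I = {("w0", 0): {"w1"}}               # I_t^w per situation

def realism_ok(B, D, I):
    return all(D[s] <= B[s] and I[s] <= D[s] for s in B)

print(realism_ok(B, D, I))   # True: these accessibility sets satisfy the subset constraints
```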
In the full version of BDI, where intentions are considered, additional axioms are introduced to reduce the set of desire worlds that can be chosen as the goal world. These axioms guarantee that the chosen desire world is consistent with the worlds that are already chosen as goal worlds. For example, in addition to the above axiom defined on beliefs and desires, the definition of realism also includes the following axiom, which states that the intention-accessible worlds should be a subset of the desire-accessible worlds, i.e.

D(ϕ) → I(ϕ)

and, moreover, the desire and intention worlds should have identical branching time structures (the same alternatives), i.e.

∀w ∀t ∀w′: if w′ ∈ I_t^w then w′ ∈ D_t^w

In addition to these constraints, which are classified as static constraints, there are different types of constraints introduced in BDI resulting in additional agent types. These axioms determine when intentions or previously decided goals should be reconsidered or dropped. In BDI, choosing to drop a goal is thus considered to be as important as choosing a new goal. These constraints, called commitment strategies, involve time and intentions and express the dynamics of decision making. The well-known commitment strategies are 'blindly committed decision making', 'single-minded committed decision making', and 'open-minded committed decision making'. For example, the single-minded commitment strategy states that an agent remains committed to its intentions until either it achieves the corresponding objective or it no longer believes that it can achieve it. The notion of agent type has been refined and extended to include obligations in Broersen et al.'s BOID system [10]. For example, they also formally distinguish between selfish agents, that give priority to their own desires, and social agents, that give priority to their obligations.

3.3.3 Two Steps

A similarity between the two approaches is that we can distinguish two steps. In Boutilier's approach, decision-making with flexible goals splits the decision-making process into two steps. First a decision is made which goals to adopt, and second a decision is made how to reach these goals. These two steps have been further studied by Thomason [49] and Broersen et al. [10] in the context of default logic.

1. First the agent has to combine desires and resolve conflicts between them. For example, assume that the agent desires to be on the beach, if he is on the beach then he desires to eat an ice-cream, he desires to be in the cinema, if he is in the cinema then he desires to eat popcorn, and he cannot be at the beach as well as in the cinema. Now he has to choose one of the two combined desires, or optional goals: being at the beach with ice-cream or being in the cinema with popcorn.

2. Second, the agent has to find out which actions or plans can be executed to reach the goal, and he has to take all side-effects of the actions into account. For example, assume that he desires to be on the beach, if he quits his job and drives to the beach, he will be on the beach, if he does not have a job he will be poor, and if he is poor then he desires to work. The only desire and thus the goal is to be on the beach, the only way to reach this goal is to quit his job, but the side effect of this action is that he will be poor, and in that case he does not want to be on the beach but he wants to work.

Now crucially, desires come into the picture two times!
First they are used to determine the goals, and second they are used to evaluate the side-effects of the actions to reach these goals. In extreme cases, like the example above, what seemed like a goal may not be desirable, because the only actions to reach these goals have negative effects with much more impact than these goals. This shows that the balancing philosophy of BDI could be very helpful. At first sight, it seems that we can apply classical decision theory to each of these two sub-decisions. However, there is a caveat. The two sub-decisions are not independent, but closely related! For example, to decide which goals to adopt we must know which goals are feasible, and we thus have to take the possible actions into account. Moreover, the intended actions constrain the candidate goals which can be adopted. Other complications arise due to uncertainty, changing environments, etc., and we conclude here that the role of decision theory in planning is complex, and that decision-theoretic planning is much more complex than classical decision theory.

3.3.4 Goals vs desires

A distinction between the two approaches is that Boutilier distinguishes between desires and goals, whereas Rao and Georgeff do not. In Boutilier's logic, there is a formal distinction between the preference ordering and goals expressed by ideality statements. Rao and Georgeff have unified these two notions, which has been criticized by [16]. In decision systems such as [10], desires are considered to be more primitive than goals, because goals have to be adopted or generated based on desires. Moreover, goals can be based on desires, but also on other sources. For example, a social agent may adopt as a goal the desires of another agent or an obligation. Desires and candidate goals may conflict, but notions of goals that do not conflict have also been considered.

There are three main traditions. In the Newell/Simon tradition of the knowledge level and knowledge based systems, goals are related to the utility aspiration level and to limited (bounded) rationality. In this tradition goals have a desirability aspect as well as an intentionality aspect. In the more recent BDI tradition knowledge and goals have been replaced by beliefs, desires and intentions due to Bratman's work on the role of intentions in deliberation [7]. The third tradition relates desires and goals to utilities as used in classical decision theory. The problem here is that decision theory abstracts away from the deliberation cycle. Typically, Savage-like constructions only consider the input (state of the world) and output (action) of the agent. Consequently, utilities can be related to either desires or goals.

3.3.5 Conflict resolution

A similarity between the two logics is that neither is capable of representing conflicts, whether conflicting beliefs or conflicting desires. Although the constraints imposed by the I operator are rather weak, they are still too strong to represent certain types of conflicts. Consider conflicts among desires. Typically desires are allowed to be inconsistent, but once they are adopted and have become intentions, they should be consistent. Moreover, in the logic proposed in [48] a specificity set like D(p), D(¬p|q) is inconsistent (this problem is solved in [47], but the approach is criticized as ad hoc in [5]). A large set of potential conflicts between desires, including a classification and ways to resolve them, is given in [32]. A radically different approach to solving conflicts is to apply Reiter's default logic to create extensions.
This has recently been proposed by Thomason [49], and thereafter also used in the BOID architecture [10]. Finally, an important branch of decision theory has to do with reasoning about multiple objectives, which may conflict, using multiple attribute utility theory [30]. This is also the basis of the 'ceteris paribus' preferences mentioned above. This can be used to formalize conflicting desires. Note that all the modal approaches above would make conflicting desires inconsistent.

3.3.6 Proof theory

A similarity of Boutilier's logic and Rao and Georgeff's logic is that both are based on a normal monadic modal logic. However, Boutilier interprets the accessibility relation as a preference relation, and defines in this monadic modal logic dyadic (i.e. conditional) operators. This works as follows. The truth conditions for the modal connectives are defined as follows:

M, w |= $\Box_P\,\varphi$  ⇔  for all v ∈ W with v ≤_P w: M, v |= ϕ

M, w |= $\overleftarrow{\Box}_P\,\varphi$  ⇔  for all v ∈ W with w <_P v: M, v |= ϕ

Thus, $\Box_P\varphi$ is true in a world w iff ϕ is true at all worlds that are at least as preferred as w, and $\overleftarrow{\Box}_P\varphi$ is true in a world w iff ϕ is true at all less preferred worlds. The dual 'possibility' connectives are defined as usual ($\Diamond_P\varphi = \neg\Box_P\neg\varphi$, meaning that ϕ is true at some equally or more preferred world, and analogously $\overleftarrow{\Diamond}_P$), and we abbreviate $\overleftrightarrow{\Box}_P\varphi = \Box_P\varphi \wedge \overleftarrow{\Box}_P\varphi$ and $\overleftrightarrow{\Diamond}_P\varphi = \Diamond_P\varphi \vee \overleftarrow{\Diamond}_P\varphi$. Moreover, a second variant of the modal operators, based on the normality ordering, is defined in a similar way; these operators carry N as index, i.e. $\Box_N$, $\overleftarrow{\Box}_N$, $\Diamond_N$, etc. This logic formalizes the notions of ideality and (default) knowledge using the preference and normality orderings on worlds. The semantics of the ideal and the normative conditional connective are defined as follows:

$I(B|A) \equiv_{def} \overleftrightarrow{\Box}_P \neg A \vee \overleftrightarrow{\Diamond}_P (A \wedge \Box_P(A \to B))$

$A \Rightarrow B \equiv_{def} \overleftrightarrow{\Box}_N \neg A \vee \overleftrightarrow{\Diamond}_N (A \wedge \Box_N(A \to B))$

Unlike QDT, in the BDI logic different modalities can be nested arbitrarily to represent beliefs about beliefs, beliefs about desires, etc. In the BDI model, there are no preference and normality orderings on the worlds, so there is no qualitative notion of a decision rule: unlike QDT, where agents attempt to achieve the best worlds consistent with their knowledge, in BDI all desire worlds are equally good, so the agent can aim to achieve any desire world. In Rao and Georgeff's logic, a world at a certain time is also called a situation. From a certain situation, each time branch represents an event and determines one alternative situation, i.e. one alternative world at the next time point. The modal operators have specific properties such as closure under implication and consistency (the KD axioms). As in CTL∗, the BDI logic has two types of formulas. Formulas of the first type are called state formulas and are evaluated at a situation, i.e. in a given world at a given time point. Formulas of the second type are called path formulas and are evaluated along a given path in a given world. Path formulas therefore express alternative worlds through time. In order to give examples of how state formulas are evaluated, let B and D be modal operators and ϕ a state formula; then the state formulas Bϕ and Dϕ are evaluated relative to the model M and situation w_t as follows:

M, w_t |= Bϕ  ⇔  ∀w′ ∈ B_t^w: M, w′_t |= ϕ

M, w_t |= Dϕ  ⇔  ∀w′ ∈ D_t^w: M, w′_t |= ϕ
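To illustrate the reconstructed truth conditions, the following sketch evaluates the ideal conditional I(B|A) on a small finite model with a total preference pre-order. The worlds, ranks and valuation are invented, and propositions are passed as predicates on worlds.

```python
# Sketch of the truth condition of I(B|A) over a finite model (invented data).
# rank 0 = most preferred, so v is at least as preferred as w iff rank[v] <= rank[w].
# I(B|A) holds iff no world satisfies A, or there is an A-world w such that A -> B
# holds at every world at least as preferred as w; on a total pre-order over a
# finite set this amounts to: the most preferred A-worlds are B-worlds.
rank = {"w1": 0, "w2": 1, "w3": 2}                     # preference ranks (<=_P)
val = {"w1": {"rain": False, "umbrella": False},       # valuation V
       "w2": {"rain": True,  "umbrella": True},
       "w3": {"rain": True,  "umbrella": False}}

def ideal(B, A):
    ws = rank.keys()
    if not any(A(w) for w in ws):
        return True
    return any(A(w) and all((not A(v)) or B(v) for v in ws if rank[v] <= rank[w])
               for w in ws)

# I(umbrella | rain): in the most preferred rain-worlds the agent has an umbrella.
print(ideal(lambda w: val[w]["umbrella"], lambda w: val[w]["rain"]))   # True
```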
3.3.7 Non-monotonic closure rules

A distinction between the logics is that Rao and Georgeff only present a monotonic logic, whereas Boutilier also presents a non-monotonic extension of his logic. The constraints imposed by the I (ideal) formulas of Boutilier are relatively weak. Since the semantics of the I operator is analogous to the semantics of many default logics, Boutilier [6] proposes to use non-monotonic closure rules for the I operator too. In particular he uses the well-known system Z [35]. Its workings can be summarized, in this case, as 'gravitation towards the ideal'. An advantage of this system is that it always gives exactly one preferred model, and that the same logic can be used for both desires and defaults. A variant of this idea was developed by Lang [31], who directly associates penalties with the desires (based on penalty logic [37]) and who does not use rankings of utility functions but utility functions themselves. More complex constructions have been discussed in [48, 47, 51, 33].

4 Classical decision theory vs BDI logic

In this section we compare classical decision theory to BDI theory. Thus far, we have seen a quantitative ordering in CDT, a semi-qualitative and a qualitative ordering in QDT, and binary values in BDI. Classical decision theory and BDI thus seem far apart, and the question can be raised how they can be related. This question has been ignored in the literature, except by Rao and Georgeff's little-known translation of decision trees to beliefs and desires in [38]. Rao and Georgeff show that constructions like probabilities and pay-offs can be recreated in the setting of their BDI logic. This results in a somewhat artificial system, in which probabilities and pay-offs appear both in the object language and the model. However, it does show that the two approaches are compatible. In this section we sketch their approach.

4.1 BDI, continued

In the semantics, Rao and Georgeff add semantic structures to represent probabilities and utilities.

Definition 6 (Extended BDI models [38]) The semantics of the BDI logic is based on models M of the following form: M = ⟨W, E, T, <, R_B, R_D, R_I, P, U, V⟩, where W, E, T, <, R_B, R_D, R_I and V are as before. The function P is a probability assignment function that assigns to each situation w_t a probability distribution P_t^w, which in turn assigns probabilities to propositions (i.e. sets of possible worlds). The function U is a pay-off assignment function that assigns to each situation a pay-off function U_t^w, which in turn assigns real-valued numbers to paths.

In the formal logic, Rao and Georgeff add expressions to talk about the two new functions. Moreover, they also add expressions for various decision rules such as MEU and Wald's criterion. We do not give the latter since they are rather involved.

M, w_t |= PROB(ϕ) ≥ n  ⇔  P_t^w({w′ ∈ B(w_t) | M, w′_t |= ϕ}) ≥ n

M, w_t |= PAYOFF(ψ) ≥ m  ⇔  for all w′_t ∈ G(w_t) and all paths x_i = ⟨w′_t, w′_{t+1}, ...⟩ such that M, x_i |= ψ, it holds that U_t^w(x_i) ≥ m

We do not give any more formal details (they can be found in the cited paper), but we illustrate the logic by an example. Consider the example in figure 1. There is an American politician, a member of the House of Representatives, who must make a decision about his political career. He may stay in the House of Representatives (Rep), try to be elected to the Senate (Sen) or retire altogether (Ret). He does not consider the option of retiring seriously, and is certain to keep his seat in the House. He must make his decision based on a poll, which is either a majority approval of his move to the Senate (yes) or a majority disapproval (no). There are four belief-accessible worlds, each with a specific likelihood attached.
The general propositions win, loss, yes and no are true at the appropriate points. The goal worlds are also shown, with the individual payoffs attached. For example, the payoff of trying to go to the Senate and losing is 100, whereas trying for the Senate and winning is valued as 300. Note that retiring is an option in the belief worlds, but is not considered a goal. Finally, if we apply the maximal expected value decision rule, we end up with four remaining intention worlds, which indicate the commitments the agent should rationally make.

[Figure 1: Belief, Goal and Intention worlds, using maxexpval as decision rule.]

4.2 Relation between decision theory and BDI

Rao and Georgeff relate decision trees to these kinds of structures on possible worlds. The proposed transformation is not between decision theory and BDI logic, but only between a decision tree and the goal-accessible worlds of some agent. A decision tree consists of two types of nodes: one type of node expresses the agent's choices and the other type expresses the uncertainty about the effects of actions. These two types of nodes are indicated by a square and a circle in the decision trees, as illustrated in figure 2.

[Figure 2: Transformation of a decision tree into a possible worlds structure. The tree is annotated with P(win) = 0.4, P(loss) = 0.6, P(yes) = 0.42, P(no) = 0.58, P(win|yes) = 0.571, P(loss|yes) = 0.429, P(win|no) = 0.275, P(loss|no) = 0.724, and with pay-offs of 300 for winning the Senate seat, 100 for losing it, and 200 for staying in the House.]

In order to generate relevant plans (goals), the uncertainties about the effects of actions are removed from the given decision tree (the circles in figure 2), resulting in a number of new decision trees. The uncertainties about the effects of actions are now assigned to the newly generated decision trees. For example, consider the decision tree in figure 2. A possible plan is to perform Poll followed by Sen if the effect of the poll is yes, or Rep if the effect of the poll is no. Suppose that the probability of yes as the effect of a poll is 0.42 and that the probability of no is 0.58. Now the transformation will generate two new decision trees: one in which event yes takes place after choosing Poll and one in which event no takes place after choosing Poll. The uncertainties 0.42 and 0.58 are then assigned to the resulting trees, respectively.
The new decision trees provide two scenarios, "Poll; if yes, then Sen" and "Poll; if no, then Rep", with probabilities 0.42 and 0.58, respectively. In these scenarios the effects of events are known. The same mechanism can be repeated for the remaining chance nodes. The probability of a scenario that occurs in more than one goal world is the sum of the probabilities of the different goal worlds in which the scenario occurs. This results in the goal-accessible worlds from figure 1. The agent can decide on a scenario by means of a decision rule such as maximum expected utility.
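To connect the transformation back to a decision rule, the sketch below computes the expected pay-offs of the resulting plans, using the probabilities and pay-offs annotated in figure 2 (300/100 for winning/losing the Senate race, 200 for staying in the House). The variable names are ours, and the printed values match the annotations in figure 1 up to rounding.

```python
# Expected pay-offs of the plans derived from the decision tree of figure 2,
# using the probabilities and pay-offs annotated in the figures (variable names ours).
p_yes, p_no = 0.42, 0.58
p_win_yes, p_loss_yes = 0.571, 0.429
p_win_no, p_loss_no = 0.275, 0.724      # rounded in the source; they sum to roughly 1
PAY_WIN, PAY_LOSS, PAY_REP = 300, 100, 200

def ev_sen(p_win, p_loss):
    """Expected pay-off of running for the Senate, given poll-conditional chances."""
    return p_win * PAY_WIN + p_loss * PAY_LOSS

print(round(ev_sen(p_win_yes, p_loss_yes), 1))   # 214.2: Sen after a 'yes' poll
print(round(ev_sen(p_win_no, p_loss_no), 1))     # about 155: Sen after a 'no' poll

# The conditional plan "Poll; if yes then Sen, if no then Rep" weighs the two
# scenarios by the poll probabilities 0.42 and 0.58:
ev_plan = p_yes * ev_sen(p_win_yes, p_loss_yes) + p_no * PAY_REP
print(round(ev_plan, 1))                         # about 206, the 205.9 annotation up to rounding
```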
The cognitive science approach to BDI [15, 12] argues that norms are needed to model social agents. Norms are important concepts for social agents, because they are a mechanism by which society can influence the behavior of individual agents. This happens through the creation of normative goals, a process which consists of four steps. First the agent has to believe that there is a norm, then it has to believe that this norm is applicable, then it has to decide that it accepts this norm – the norm now leads to a normative goal – and finally it has to decide whether it will fulfill this normative goal.

Reciprocal norms. The argument of evolutionary game theory [4] is that reciprocal norms are needed to establish cooperation in repeated prisoner's dilemmas.

Norms influencing decisions. In practical reasoning, in legal philosophy and in deontic logic (in philosophy as well as in computer science) it has been studied how norms influence behavior.

Norms stabilizing multiagent systems. It has been argued that obligations play the same role in multiagent systems as intentions do in single agent systems, namely that they stabilize behavior [50].

Here we discuss an example which is closely related to game theory: the pennies pinching example. This is a problem discussed in philosophy that is also relevant for advanced agent-based computer applications. It is related to trust, but it has been discussed in the context of game theory, where it is known as a non-zero-sum game. Hollis [26, 27] discusses the example and the related problem of backward induction as follows. A and B play a game where ten pennies are put on the table and each in turn takes one penny or two. If one is taken, then the turn passes. As soon as two are taken, the game stops and any remaining pennies vanish. What will happen if both players are rational? Offhand one might suppose that they emerge with five pennies each, or with a six-four split when the player with the odd-numbered turns takes two at the end. But game theory seems to say not. Its apparent answer is that the opening player will take two pennies, thus killing the golden goose at the start and leaving both worse off. The immediate trouble is caused by what has become known as backward induction. The resulting pennies gained by each player are given by the bracketed numbers, with A's put first in each case. Looking ahead, B realizes that they will not reach (5,5), because A would settle for (6,4). A realizes that B would therefore settle for (4,5), which makes it rational for A to stop at (5,3). In that case, B would settle for (3,4); so A would settle for (4,2), leading B to prefer (2,3); and so on. A thus takes two pennies at his first move, and reason has obstructed the benefit of mankind. Game theory and backward induction do not offer the intuitive solution to the problem, because agents are assumed to be rational in the sense of economics, and consequently game-theoretic solutions do not consider an implicit mutual understanding of a cooperation strategy [2]. Cooperation results in an increased personal benefit, because it seduces the other party into cooperating as well. The open question is how such 'super-rational' behavior can be explained. In his book 'Trust within Reason' [27], Hollis considers several possible explanations of why an agent should take one penny instead of two.
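The backward-induction argument can be checked mechanically. The following Python sketch is our own illustration (the paper and Hollis describe the game only informally): it computes, for every number of remaining pennies, what a purely self-interested player would do.

# Backward induction for the pennies pinching game: ten pennies on the table,
# players A and B alternate, taking one penny passes the turn, taking two
# ends the game and the remaining pennies vanish.
from functools import lru_cache

@lru_cache(maxsize=None)
def solve(pennies, player):
    """Return ((payoff_A, payoff_B), move) for the player to move (0 = A, 1 = B),
    assuming both players simply maximize their own number of pennies."""
    if pennies == 0:
        return (0, 0), None
    if pennies == 1:                       # only one penny left: take it
        payoff = [0, 0]
        payoff[player] = 1
        return tuple(payoff), 1
    # Option 1: take one penny and pass the turn to the other player.
    rest, _ = solve(pennies - 1, 1 - player)
    take_one = list(rest)
    take_one[player] += 1
    # Option 2: take two pennies; the game stops and the rest vanish.
    take_two = [0, 0]
    take_two[player] = 2
    if take_two[player] >= take_one[player]:
        return tuple(take_two), 2
    return tuple(take_one), 1

payoffs, move = solve(10, 0)
print(move, payoffs)   # prints: 2 (2, 0), i.e. A takes two pennies immediately

The recursion reproduces the outcome sketched above: at every node with two or more pennies left, taking two yields 2 while taking one yields only 1, so A stops the game at once and the players end with (2, 0) instead of (5, 5). Hollis's explanations of why an agent might nevertheless take one penny appeal to concepts beyond this payoff calculation, in particular trust and commitment.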
For example, taking one penny in the first move 'signals' to the other agent that the agent wants to cooperate (and it signals that the agent is not rational in the economic sense). Two concepts that play a major role in his book are trust and commitment (together with norm and obligation). One possible explanation is that taking one penny induces a commitment that the agent will take one penny again on his next move. If the other agent believes this commitment, then it becomes rational for him to take one penny too. Another explanation is that taking one penny leads to a commitment of the other agent to take one penny too, perhaps as a result of a social law. Moreover, other explanations are based not only on commitments, but also on trust in the other party.

In [9] Broersen et al. introduce a language in which some aspects of these analyses can be represented: a modal language, like the ones we have seen before, extended with two new modalities. The formula C_{i,j}(α > β) means that agent i is more committed towards agent j to do α than β, and T_{j,i}(α > β) means that agent j trusts agent i more after executing α than after executing β. In all examples they accept the following relation between trust and commitment, which expresses that violations of stronger commitments result in a higher loss of trustworthiness than violations of weaker ones.

C_{i,j}(α > β) → T_{j,i}(α > β)

We first consider the example without communication. The set of agents is G = {1, 2} and the set of atomic actions is A = {take_i(1), take_i(2) | i ∈ G}, where take_i(n) denotes that agent i takes n pennies. The following formula denotes that taking one penny induces a commitment to take one penny later on.

[take_1(1); take_2(1)] C_{1,2}(take_1(1) > take_1(2))

The formula expresses that taking one penny is interpreted as a signal that agent 1 will take one penny again on his next turn. When this formula holds, it is rational for agent 2 to take one penny. The following formula denotes that taking one penny induces a commitment for the other agent to take one penny on the next move.

[take_1(1)] C_{2,1}(take_2(1) > take_2(2))

The formula captures the implications of a social law which states that you have to return favors. It is like giving a present on someone's birthday, thereby giving that person the obligation to return a present on your birthday. Besides the commitment operator, more complex examples also involve the trust operator. For example, the following formula denotes that taking one penny increases the amount of trust.

T_{i,j}((α; take_j(1)) > α)

The following formulas illustrate how commitment and trust may interact. The first formula expresses that each agent intends to increase the amount of trust (i.e., its long-term benefit). The second formula expresses that any commitment to itself is also a commitment to the other agent (a very strong cooperation rule).

T_{i,j}(β > α) → I_j(β > α)
C_{j,j}(β > α) ↔ C_{j,i}(β > α)

From these two rules, together with the definitions and the general rule, we can deduce:

C_{i,j}(take_i(1) > take_i(2)) ↔ T_{j,i}(take_i(1) > take_i(2))

In this scenario, each agent is assumed to act so as to increase its long-term benefit, that is, to increase the trust of other agents. Note that the commitment of i to j to take one penny increases the trust of j in i, and vice versa. Therefore, neither agent wants to take two pennies, since this would decrease its long-term benefit. Broersen et al. also discuss scenarios of pennies pinching with communication.
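As a very rough operational reading of the bridging rule C_{i,j}(α > β) → T_{j,i}(α > β), the following toy Python sketch (our own illustration, not Broersen et al.'s dynamic-logic semantics) keeps commitment and trust orderings as sets of action pairs and propagates every recorded commitment of agent 1 towards agent 2 into agent 2's trust ordering over agent 1.

# Toy bookkeeping for the bridging rule: whenever i records a commitment towards j
# that alpha is preferred over beta, j's trust ordering over i is extended with the
# same pair. This only illustrates the intended reading, not the logic's semantics.

class Agent:
    def __init__(self, name):
        self.name = name
        self.commitments = {}   # other agent's name -> set of (stronger, weaker) action pairs
        self.trust = {}         # other agent's name -> set of (more trusted after, less) pairs

def commit(i, j, stronger, weaker):
    """Record C_{i,j}(stronger > weaker) and apply the bridging rule,
    adding T_{j,i}(stronger > weaker)."""
    i.commitments.setdefault(j.name, set()).add((stronger, weaker))
    j.trust.setdefault(i.name, set()).add((stronger, weaker))

one, two = Agent("1"), Agent("2")

# Agent 1's move take_1(1) is read as the commitment C_{1,2}(take_1(1) > take_1(2)).
commit(one, two, "take_1(1)", "take_1(2)")

print(two.trust["1"])   # {('take_1(1)', 'take_1(2)')}: agent 2 now trusts agent 1 more after take_1(1)

Under the reading in the text, this increased trust is what makes it rational for agent 2 to take one penny as well.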
6 Concluding remarks

In this paper we consider some examples of how the research areas CDT, QDT, KBS and BDI are related. These approaches have been developed separately, and not much is known about their relationships. Consequently, our study is exploratory. We start with the observation that QDT, KBS and BDI have been developed as criticisms of classical decision theory. QDT introduces qualitative abstractions to deal with situations in which not all numbers are available. KBS introduces goals to formalize utility aspiration levels and satisficing instead of maximizing, and BDI introduces intentions to make the decision-making process more stable.

We discuss various similarities and differences. For example, all approaches have a representation of information (probabilities, qualitative abstractions, knowledge, beliefs) and a representation of motivation (utilities, qualitative abstractions, goals, desires). All approaches have a way to represent alternatives (a small set, formulas do(ϕ) for any ϕ, a set of decision variables, branches in a tree). Some of the approaches focus on decision rules (the output of the decision process), others on deliberation constraints or agent types (the way the output is produced). Table 2 summarizes this comparison. We also discuss several extensions. If we consider how decisions are calculated in theories of deliberation, then there are various notions of goals, which can for example be adopted or generated. In decision processes the role of intentions becomes an object of study, and in decision games the role of obligations and commitments is studied.

     | area                    | information               | motivation             | alternatives       | focus
CDT  | operations research     | probabilities             | utilities              | small set          | decision rule
QDT  | artificial intelligence | qualitative probabilities | qualitative utilities  | Do(ϕ)              | decision rule
KBS  | artificial intelligence | knowledge                 | goals                  | decision variables | deliberation
BDI  | agent theory            | beliefs                   | desires                | branches           | deliberation

Table 2: Comparison

In KBS and BDI, norms are usually understood as obligations from society, inspired by work on social agents, social norms and social commitments [12], whereas in decision theory and game theory norms are understood as reciprocal norms in evolutionary game theory [4, 45], which lead to cooperation in iterated prisoner's dilemmas and in general lead to a decrease in uncertainty and an increase in the stability of a society.

We draw three preliminary conclusions, which indicate directions for further research. First, we have not found any assumptions or techniques which are incompatible between the approaches. This suggests that there is no need to choose one of the approaches, but that they can be combined. Second, several techniques can be interchanged. For example, whether one uses decision variables or branches in a decision tree to model the choice between alternatives seems to be irrelevant. Third, there are various issues which have been studied in one area but ignored in another, which suggests that the approaches can benefit from each other. For example, several ideas developed in QDT can be used to enrich BDI: the use of different decision rules, the role of knowledge in decision making and the granularity of actions, and the construction process of goals from explicit (conditional) desires. One way to integrate QDT and BDI is to start with QDT and try to incorporate ingredients of BDI. An example of such an approach is Broersen et al.'s BOID architecture, which takes the process-oriented approach from Thomason's BDP and extends it with intentions and norms.
However, most elements of QDT discussed in this paper have not yet been incorporated in this approach. Another approach is to extend BDI with the elements of QDT discussed here. We hope that our study will lead to more research into hybrid approaches.

Acknowledgments

Thanks to Jan Broersen and Zhisheng Huang for many discussions on related subjects in the context of the BOID project.

References

[1] J. Allen and C. Perrault. Analyzing intention in utterances. Artificial Intelligence, 15(3):143–178, 1980.
[2] R. Aumann. Rationality and bounded rationality. Games and Economic Behavior, 21:2–14, 1997.
[3] J. Austin. How to do things with words. Harvard University Press, Cambridge, Mass., 1962.
[4] R. Axelrod. The evolution of cooperation. Basic Books, New York, 1984.
[5] F. Bacchus and A.J. Grove. Utility independence in a qualitative decision theory. In Proceedings of the Fifth International Conference on Knowledge Representation and Reasoning (KR'96), pages 542–552. Morgan Kaufmann, 1996.
[6] C. Boutilier. Towards a logic for qualitative decision theory. In Proceedings of the Fourth International Conference on Knowledge Representation and Reasoning (KR'94), pages 75–86. Morgan Kaufmann, 1994.
[7] M. Bratman. Intention, plans, and practical reason. Harvard University Press, Cambridge, Mass., 1987.
[8] P. Bretier and D. Sadek. A rational agent as the kernel of a cooperative spoken dialogue system: Implementing a logical theory of interaction. In Intelligent Agents III: ECAI'96 Workshop on Agent Theories, Architectures and Languages (ATAL), volume 1193 of LNCS, pages 189–204. Springer, 1997.
[9] J. Broersen, M. Dastani, Z. Huang, and L. van der Torre. Trust and commitment in dynamic logic. In Proceedings of The First Eurasian Conference on Advances in Information and Communication Technology (EurAsia ICT 2002), volume 2510 of LNCS, pages 677–684. Springer, 2002.
[10] J. Broersen, M. Dastani, J. Hulstijn, and L. van der Torre. Goal generation in the BOID architecture. Cognitive Science Quarterly, 2(3-4):428–447, 2002.
[11] J. Broersen, M. Dastani, and L. van der Torre. Realistic desires. Journal of Applied Non-Classical Logics, 12(2):287–308, 2002.
[12] C. Castelfranchi. Modeling social agents. Artificial Intelligence, 1998.
[13] P. Cohen and H. Levesque. Intention is choice with commitment. Artificial Intelligence, 42(2-3):213–261, 1990.
[14] P. Cohen and C. Perrault. Elements of a plan-based theory of speech acts. Cognitive Science, 3:177–212, 1979.
[15] R. Conte and C. Castelfranchi. Understanding the effects of norms in social groups through simulation. In G.N. Gilbert and R. Conte, editors, Artificial societies: the computer simulation of social life. UCL Press, London, 1995.
[16] M. Dastani and L. van der Torre. Specifying the merging of desires into goals in the context of beliefs. In Proceedings of The First Eurasian Conference on Advances in Information and Communication Technology (EurAsia ICT 2002), volume 2510 of LNCS, pages 824–831. Springer, 2002.
[17] T. Dean and M. Wellman. Planning and Control. Morgan Kaufmann, San Mateo, 1991.
[18] D. Dennett. The Intentional Stance. MIT Press, Cambridge, MA, 1987.
[19] F. Dignum. Autonomous agents with norms. Artificial Intelligence and Law, pages 69–79, 1999.
[20] J. Doyle. A model for deliberation, action and introspection. Technical Report AI-TR-581, MIT AI Laboratory, 1980.
[21] J. Doyle, Y. Shoham, and M.P. Wellman. The logic of relative desires. In Sixth International Symposium on Methodologies for Intelligent Systems, Charlotte, North Carolina, 1991.
[22] J. Doyle and R. Thomason. Background to qualitative decision theory. AI Magazine, 20(2):55–68, 1999.
[23] J. Doyle and M. Wellman. Preferential semantics for goals. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI'91), pages 698–703, 1991.
[24] D. Dubois and H. Prade. Possibility theory as a basis for qualitative decision theory. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI'95), pages 1924–1930. Morgan Kaufmann, 1995.
[25] B. Hansson. An analysis of some deontic logics. Noûs, 1969.
[26] M. Hollis. Penny pinching and backward induction. Journal of Philosophy, 88:473–488, 1991.
[27] M. Hollis. Trust within Reason. Cambridge University Press, 1998.
[28] R.C. Jeffrey. The Logic of Decision. McGraw-Hill, New York, 1965.
[29] N. Jennings. On agent-based software engineering. Artificial Intelligence, 117(2):277–296, 2000.
[30] R.L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Trade-offs. John Wiley and Sons, New York, 1976.
[31] J. Lang. Conditional desires and utilities - an alternative approach to qualitative decision theory. In Proceedings of the Twelfth European Conference on Artificial Intelligence (ECAI'96), pages 318–322. John Wiley and Sons, 1996.
[32] J. Lang, E. Weydert, and L. van der Torre. Conflicting desires. In Proceedings of GTDT'01, 2001.
[33] J. Lang, E. Weydert, and L. van der Torre. Utilitarian desires. Autonomous Agents and Multi-Agent Systems, 5(3):329–363, 2002.
[34] A. Newell. The knowledge level. Artificial Intelligence, 18(1):87–127, 1982.
[35] J. Pearl. System Z. In Proceedings of TARK'90, pages 121–135. Morgan Kaufmann, 1990.
[36] J. Pearl. From conditional oughts to qualitative decision theory. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI'93), pages 12–20. John Wiley and Sons, 1993.
[37] G. Pinkas. Reasoning, nonmonotonicity and learning in connectionist networks that capture propositional knowledge. Artificial Intelligence, 77:203–247, 1995.
[38] A. Rao and M. Georgeff. Deliberation and its role in the formation of intentions. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI'91), pages 300–307. Morgan Kaufmann, San Mateo, CA, 1991.
[39] A. Rao and M. Georgeff. Modeling rational agents within a BDI architecture. In Proceedings of the Second International Conference on Knowledge Representation and Reasoning (KR'91), pages 473–484. Morgan Kaufmann, 1991.
[40] A. Rao and M. Georgeff. BDI agents: From theory to practice. In Proceedings of the First International Conference on Multi-Agent Systems (ICMAS'95), 1995.
[41] A. Rao and M. Georgeff. Decision procedures for BDI logics. Journal of Logic and Computation, 8:293–342, 1998.
[42] L. Savage. The foundations of statistics. Wiley, New York, 1954.
[43] G. Schreiber, H. Akkermans, A. Anjewierden, R. de Hoog, N. Shadbolt, W. Van de Velde, and B. Wielinga. Knowledge Engineering and Management — The CommonKADS Methodology. The MIT Press, Cambridge, Massachusetts; London, England, 1999.
[44] J. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, 1969.
[45] Y. Shoham and M. Tennenholtz. On the emergence of social conventions: modeling, analysis, and simulations. Artificial Intelligence, 94(1-2):139–166, 1997.
[46] H.A. Simon. A behavioral model of rational choice. Quarterly Journal of Economics, pages 99–118, 1955.
[47] S.-W. Tan and J. Pearl. Qualitative decision theory. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI'94), 1994.
[48] S.-W. Tan and J. Pearl. Specification and evaluation of preferences under uncertainty. In Proceedings of the Fourth International Conference on Knowledge Representation and Reasoning (KR'94), pages 530–539. Morgan Kaufmann, 1994.
[49] R. Thomason. Desires and defaults: a framework for planning with inferred goals. In Proceedings of the Seventh International Conference on Knowledge Representation and Reasoning (KR'2000), pages 702–713. Morgan Kaufmann, 2000.
[50] L. van der Torre. Contextual deontic logic: Normative agents, violations and independence. Annals of Mathematics and Artificial Intelligence, 37(1-2):33–63, 2003.
[51] L. van der Torre and E. Weydert. Parameters for utilitarian desires in a qualitative decision theory. Applied Intelligence, 14:285–301, 2001.
[52] M. Walker. Inferring acceptance and rejection in dialog by default rules of inference. Language and Speech, 39(2-3):265–304, 1996.