DR 5.1.2: Methods and paradigms for skill learning based on affordances and action-reaction observation

Mario Gianni, Panagiotis Papadakis, Fiora Pirri and Matia Pizzoli
Dipartimento di Informatica e Sistemistica - Sapienza Università di Roma, via Ariosto 25, 00185 Rome, Italy
[email protected]

Project, project Id: EU FP7 NIFTi / ICT-247870
Project start date: Jan 1 2010 (48 months)
Due date of deliverable: December 2010
Actual submission date: February 2011
Lead partner: ROMA
Revision: FINAL
Dissemination level: PU

This document describes the status of progress for the research on learning skills for functioning processes and task execution performed by the NIFTi Consortium. In particular, according to the Description of Work (DOW), research is focused on the development of novel methods and paradigms for skill learning based on affordances and action-reaction observation. Planned work, as per the DOW, is introduced and the actual work is discussed, highlighting the relevant achievements and how these contribute to the current state of the art and to the aims of the project.

Contents

1 Tasks, objectives, results
1.1 Planned work
1.2 Actual work performed
1.2.1 Task T5.1: Planning activities specification with end user (completed)
1.2.2 Task T5.2: Learning skills for functioning processes and task execution (in progress)
1.2.3 T5.3: Task-driven attention for coordination and communication (in progress)
1.2.4 T5.5: Situated exploration history (in progress)
1.3 Relation to user-centric design
1.4 Relation to the state-of-the-art

2 Annexes
2.1 Rudi et al. "Linear Solvability in the Viewing Graph" (ACCV2010)
2.2 Fanello et al. "Arm-Hand behaviours modelling: from attention to imitation" (ISVC2010)
2.3 H. Khambhaita et al. "Help Me to Help You: how to Learn Intentions, Actions and Plans" (AAAI-SSS11)
2.4 Carbone and Pirri. "Learning Saliency. An ICA based model using Bernoulli mixtures." (BICS 2010)
2.5 Gianni et al. "Learning cross-modal translatability: grounding speech act on visual perception." (RSS 2010)
2.6 Carrano et al. "An approach to projective reconstruction from multiple views." (IASTED 2010)
2.7 Krieger and Kruijff. "Combining Uncertainty and Description Logic Rule-Based Reasoning in Situation-Aware Robots." (AAAI-SSS 2011b)
2.8 Stachowicz and Kruijff. "Episodic-Like Memory for Cognitive Robots." (IEEE-TAMD 2011)
2.9 Pirri et al. "A general method for the Point of Regard estimation in 3D space." (CVPR 2011)
2.10 Finzi and Pirri. "Switching tasks and flexible reasoning in the Situation Calculus." (TR 2010)
3 Arm-Hand behaviours modelling: from attention to imitation
4 Learning Saliency. An ICA based model using Bernoulli mixtures
5 Learning cross-modal translatability: grounding speech act on visual perception
6 An approach to projective reconstruction from multiple views
7 Switching tasks and flexible reasoning in the Situation Calculus

Executive Summary

This document describes the status of progress for the research on learning skills for functioning processes and task execution performed by the NIFTi Consortium. In particular, according to the Description of Work (DOW), research is focused on the development of novel methods and paradigms for skill learning based on affordances and action-reaction observation. Planned work, as per the DOW, is introduced and the actual work is discussed, highlighting the relevant achievements and how these contribute to the current state of the art and to the aims of the project.

The exploration of an unknown and dynamic environment and the need to readily address the requests produced by the mixed initiative require the robot control to be flexible and to adapt to the continuous changes of context. Switching between sensing modalities, changing the focus of attention or asking for the operator's intervention are examples of skills that are desirable in order to develop the user-centric, interaction-oriented architecture that is the primary goal of the NIFTi research. Task 5.2 (T5.2), whose status of advancement is reported in this document, addresses the problem of providing methods and paradigms for learning the required skills. The training set is made of observations of demonstrations performed by humans. Central in this learning paradigm is the role of data collection in real scenarios from demonstrations by experts. Thus, together with the development of the flexible planning architecture, a novel approach to data collection from online demonstrations is introduced. This approach relies on the Gaze Machine, a system for the acquisition of demonstration data from the demonstrator's point of view.

The aim of this document is to report on the research carried out by WP5, mainly concerned with T5.2. Nonetheless, the performed work relies on the analysis resulting from T5.1 and provides input to all the other WP5 tasks. Thus, although the main focus of this deliverable is on T5.2, DR 5.1.2 also reports progress on the other WP5 tasks, in particular on the work related to flexible planning, which has been started in parallel.

Role of Skill Learning in NIFTi

During the exploration of an unknown area, the NIFTi human-robot team continuously acts and interacts. Thus, for an artificial agent, execution has to be adapted in order to address sudden, asynchronous needs generated by the intervention of the operators and by the dynamic nature of the unknown environment. Task 5.2, Learning skills for functioning processes and task execution, aims at developing paradigms and methods to acquire the skills necessary for such control, from human-robot interaction and live demonstrations in real intervention scenarios. WP5 investigates skills involved in selecting and coordinating multiple tasks when operating.
Skills are learned by continuous interaction with humans, via demonstration and through an action-reaction paradigm, to acquire the effects of actions and processes.

Contribution to the NIFTi scenarios and prototypes

The problem addressed in T5.2 is to develop methods and paradigms to learn the skills that are needed to plan and coordinate the tasks the NIFTi architecture has to accomplish to reach the goal in the USAR scenario. The training set is made of observations of demonstrations performed by humans. This means that where humans look, how they move, what actions they perform and how they report what they see and do through speech constitute the training data for skill learning. The use of human visual attention as a form of demonstration is novel in the definition of a model for flexible planning.

Learning to detect salient features, gestures and actions requires an instrument for the study of visual attention. The most salient regions in a scene can be determined by analysing the performed sequence of fixations, i.e. by tracking the gaze. The need for a special, custom device, thus not relying on commercially available solutions, originated from a number of considerations. First, in the USAR scenario, the demonstrator moves in a 3-dimensional environment and his actions, in the form of fixations, motion or manipulation, are inherently 3-dimensional. Second, the experiments require extracting scan-paths from extremely varying datasets, implying completely different experimental setups in terms of distance and camera field of view, so a high level of flexibility is desirable. Finally, our experiments involve a number of subjects and complex, dynamic scenarios: calibration and acquisition procedures must require as little intervention from the operator as possible and the whole device should be highly automatic and non-invasive. Along with accuracy and robustness, the need for a device suitable for on-field data acquisition led to the introduction of a novel calibration procedure that is almost completely automatic and can be carried out with very little intervention by the operator. The device we call the Gaze Machine allows such data acquisition, as described in the following.

Figure 1: Contribution of the learning paradigm based on the Gaze Machine data collection to the NIFTi scenario.

1 Tasks, objectives, results

1.1 Planned work

WP5 contributes to the NIFTi project via the realization of a flexible time planner with time interval compatibilities, and the management of resources and components related to different robot processes.
It also contributes to the execution and monitoring of the planned actions, and to all those activities that make planning adaptable to the user needs, to the task requirements, to the joint provisos of team-work and to the processes and resources to be allocated for a designated mission. These activities include skill learning, accommodation to end-user planning procedures, and modelling of end-users' behaviours and, most significantly, of end-users' instructed attentive behaviours while performing risky operations.

T5.2 addresses the problem of how a robot can control task execution in such a dynamic setting. The objective of the task is to develop methods for acquiring the skills necessary for such control. Task T5.2 is in progress and most of the activities have been completed. In particular, the present document is due as deliverable DR 5.1.2 on Methods and paradigms for skill learning based on affordances and action-reaction observation (as in the Description of Work) and is being delivered at the end of the first year (12th month, December 2010).

1.2 Actual work performed

In the following, a brief summary of the activities related to the started WP5 tasks is presented and their status of progress is reported.

1.2.1 Task T5.1: Planning activities specification with end user (completed)

1. A learning scenario has been specified with the Italian Fire Fighters (VVFF) and has been set up in Montelibretti during the NIFTi meeting in September 2010.

2. Standard procedures in the "smoke hall", simulating the experience of exploring an unknown path in a real fire event, have been acquired from the VVFF and synthesized via a graphical representation (see DR 5.1).

3. Primitive processes for planning and execution monitoring have been defined with the end users (VVFF) according to the above procedures. These have been compiled into basic knowledge structures (action preconditions and action effects) for action-reaction.

4. Preliminary compatibilities have been specified according to the already active components of the NIFTi robot (mapping-vision-navigation).

A comprehensive report on the above activities is available as Deliverable DR 5.1.1.

1.2.2 Task T5.2: Learning skills for functioning processes and task execution (in progress)

1. Methods for acquiring human skills have been provided via the Gaze Machine, which has been upgraded to operate in different outdoor scenarios and under abrupt light changes, such as between strong light and twilight [55].

2. Methodologies to identify human motion in complex scenarios, in the presence of different motions, have been developed; these are based on motion segmentation via attention and have been published in the context of action recognition and classification (see [18] and [39]).

3. Data have been gathered by an experienced VVFF instructor wearing the Gaze Machine inside the disaster area.

Task 5.1 on Planning activities specification with end user, which was accomplished by the end of the tenth month (October 2010), made available the specifications of the context scenario and the skill primitives. On that basis, ETHZ contributed to the objectives of Task T5.2 by specifying an ontology of actions that the NIFTi platform should be able to execute.
Corresponding low-level execution has been implemented using the open source ROS Navigation Stack [1], according to the NIFTi architecture design concepts.

In the first year of the NIFTi research, Task 5.2 focused on providing methods and paradigms for skill learning. The main effort has been dedicated to building and improving the Gaze Machine device in order to make it suitable for on-field data collection. Indeed, the Gaze Machine is crucial in all the complementary learning activities involving attention, that is, all the tasks in WP5: T5.2, T5.3, T5.4, T5.5, T5.6. Its development will now be described more thoroughly.

Figure 2: Gaze Machine camera configuration. Scene cameras are calibrated for stereo depth estimation. Image from [40], reproduced with permission.

The Gaze Machine [6, 40, 55] is a system for the acquisition of training data from demonstrations based on a head-mounted, three-dimensional gaze tracker. It constitutes the core of our approach, being the instrument we use to collect data and build our models. The Gaze Machine relies on an innovative model for 3D gaze estimation using multiple cameras. Firmly grounded on the geometry of the multiple views, it introduces a calibration procedure that is efficient, accurate, highly innovative but also practical and easy. Thus, it can run online with little intervention from the user. The overall gaze estimation model is general, as no particular complex model of the human eye is assumed in this work. The resultant system has been effectively used to collect gaze data from subjects freely moving in a dynamic environment and under varying light conditions. The implementation of the Gaze Machine platform has required important algebraic and geometric modelling that has produced a number of publications (see [60], [10], [11], [14]). Visual localization and mapping of the Gaze Machine, hence of the subject wearing it, in order to provide clear human uses of vantage points in task execution and in the exploration of any kind of environment, is being attained and an initial framework has been reported in [32].

Figure 3: The fire fighter trainer is wearing the Gaze Machine (labelled components: IMU on the back, scene cameras, eye cameras, microphone).

Since the demonstrator moves and acts in the 3D world, the POR is estimated in 3D and, if needed, the structure of the object of fixation can be recovered from the Gaze Machine stereo rig. Inertial data are useful to understand head pose and movements. Vision actions are individuated by changes in head pose; head accelerations are also used to discard the corresponding frames, which are very likely to contain strong blur. Running commentaries contain a spoken description of the activity the operator is currently involved in. In the skill learning task, this information is used to label activities making use of referring expressions. The device is wearable, allowing data to be collected while performing natural tasks in unknown, unstructured environments instead of experimental lab settings.

In order to collect data that is meaningful for skill learning, the scenario for the experiments must be suitably designed. As previously stated, a first experimental data collection has been carried out during the NIFTi meeting at SFO, in Montelibretti.

Figure 4: An example of the Gaze Machine on-field calibration.
The fire fighter instructor is calibrating the device before starting a session of data acquisition in Montelibretti, in September 2010. Calibration is carried out by fixating a point (in this case the center of the marker) while moving the head.

The Scuola di Formazione Operativa (Operational Training School, SFO) provided an ideal test ground for the consortium to address the real evaluation scenario. As described in the DOW document, which states the NIFTi USAR scenario, rescuers need to make an assessment of the real situation at a disaster site. During the September 2010 meeting at SFO, the VVFF set up the tunnel car accident scenario, as described in DR 5.1.1. The NIFTi systems were deployed in order to collect data. Contextually, 10 Gaze Machine acquisitions were scheduled. The experiments involved 5 people: one expert fire fighter instructor and four non-expert participants, namely researchers and students involved in the NIFTi project. The intervention procedure was summarised by the expert firefighter during a pre-briefing. The content of the briefing discussion is reported in the following.

According to the firefighter instructor's description, a scenario depicting a car accident in a tunnel represents an extremely critical situation. The NIFTi tunnel simulation scenario does not involve the presence of fire, which makes things easier. Rescue is divided into a first, quick, preliminary survey, with the aim of reporting the status of victims and the most evident dangers related to the structure, and the actual extraction. The description provided gave rise to two possible intervention scenarios:

1. only one leading fireman equips himself with a self-contained breathing apparatus and accesses the disaster area; afterwards, he reports to the team waiting outside the tunnel. From that brief report the rescue is organized;

2. the rescue team clears and decontaminates the disaster area to allow the extraction of the victims.

Figure 5: Briefing before the experiments. On the left, a detail of the visual acquisition facility composing the Gaze Machine. On the right, fire fighter instructor Salvatore Candela is summarizing the procedure for the tunnel car-accident scenario.

The task that was assigned to the subjects was thus to access the tunnel area as if they had to make a rapid assessment about the victims, their number and positions, and about the dangers resulting from the possible presence of toxic or flammable substances and the chance of structural collapses.

Processed data are available on the NIFTi repository at the address http://dav.nifti.eu/share/media/ROMA/SFO ROMA data. For every experiment, a video sequence is available for each of the cameras building up the system. Videos are MPEG-4 files, encoded with the XVID codec. The four video streams are synchronized. The inertial measures are acquired at the same rate and consist of the roll-pitch-yaw angles of the head. Running commentaries are encoded in MP3 files, with the same time duration as the video sequences, and thus can be played synchronously. The subtitle file (.srt) contains the transcription of the running commentary in the SubRip file format (http://en.wikipedia.org/wiki/Subrip), which contains the formatted text with the starting/stopping time of every comment.
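Since the running commentaries are distributed as standard SubRip files and the remaining streams are synchronized with the videos, the comments can be aligned by timestamp with the per-frame fixation estimates (the gpr output described further below). The following is a minimal sketch of such an alignment; the frame rate, the layout of the fixation log (one whitespace-separated row per frame) and the helper names are assumptions made here for illustration only, not a description of the released tools.

```python
import re
from dataclasses import dataclass

@dataclass
class Comment:
    start: float  # seconds from the beginning of the acquisition
    end: float
    text: str

# Time line of a SubRip block, e.g. "00:00:43,555 --> 00:00:45,288".
_TIME = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*[-–>]+\s*(\d+):(\d{2}):(\d{2}),(\d{3})")

def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(path):
    """Parse a running commentary (.srt) into a list of timed comments."""
    comments, times, text = [], None, []
    with open(path, encoding="utf-8") as f:
        for raw in list(f) + [""]:           # sentinel flushes the last block
            line = raw.strip()
            m = _TIME.match(line)
            if m:
                g = m.groups()
                times = (_seconds(*g[:4]), _seconds(*g[4:]))
            elif line and not line.isdigit():
                text.append(line)            # commentary text (index lines skipped)
            elif not line and times is not None:
                comments.append(Comment(times[0], times[1], " ".join(text)))
                times, text = None, []
    return comments

def parse_fixations(path, frame_rate=25.0):
    """Read one fixation row per frame; the column order (mean and variance
    of the x and y image coordinates) and the frame rate are assumptions."""
    rows = []
    with open(path) as f:
        for i, row in enumerate(f):
            values = [float(v) for v in row.split()]
            if values:
                rows.append((i / frame_rate, values))
    return rows

def label_fixations(comments, fixations):
    """Attach to every comment the fixation rows falling in its interval."""
    return [(c, [v for t, v in fixations if c.start <= t <= c.end])
            for c in comments]
```

Such an alignment is only meant to illustrate how the referring expressions of the commentary can be used to label the corresponding stretches of the scan-path.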
The following running commentary has been collected during the first run of fire fighter instructor Candela into the Montelibretti tunnel scenario and can be found in the NIFTi data repository, in the GM data/salvatore-1 path.

Figure 6: Operator observing the state of a victim and the fixation measured by the GM.

COMMENTARY BEGINS
00:00:42,800 – 00:00:43,117 Ok.
11 00:00:43,555 – 00:00:45,288 I'm approaching the tunnel.
12 00:00:45,289 – 00:00:47,894 I see... I'm collecting an attentive, quick survey.
13 00:00:47,894 – 00:00:50,381 The area is apparently safe.
14 00:00:50,782 – 00:00:53,100 Also air is quite breathable.
15 00:00:53,150 – 00:00:58,830 I don't smell flammable liquids or other dangerous substances.
16 00:00:59,000 – 00:01:00,701 I found the first car.
17 00:01:01,000 – 00:01:05,467 It is flipped by 90 degs w.r.t. the surface.
18 00:01:05,500 – 00:01:08,278 I'm trying to look inside.
19 00:01:08,278 – 00:01:09,678 Glasses are still intact.
20 00:01:09,900 – 00:01:12,262 They are dark so I cannot see well.
21 00:01:12,290 – 00:01:15,829 The person inside looks unconscious.
22 00:01:15,961 – 00:01:18,361 One person is certainly inside.
23 00:01:19,000 – 00:01:20,521 I'm doing one more survey.
24 00:01:21,364 – 00:01:24,364 Now, in front of me, there's a barrel.
25 00:01:24,400 – 00:01:27,035 Apparently it is an empty barrel.
26 00:01:27,123 – 00:01:32,123 No signs of toxic or dangerous substances.
27 00:01:32,785 – 00:01:34,785 It must be a building yard barrel.
28 00:01:34,900 – 00:01:37,648 Indeed I see a truck ahead.
29 00:01:37,840 – 00:01:39,900 On the right, ahead.
30 00:01:40,082 – 00:01:42,082 Certainly there are people inside.
31 00:01:41,993 – 00:01:44,993 A car... a civil car... with some people inside.
32 00:01:45,000 – 00:01:45,780 A family.
33 00:01:46,012 – 00:01:47,012 People...
34 00:01:47,710 – 00:01:48,710 A woman drives.
35 00:01:48,914 – 00:01:51,000 A person in the front seat.
36 00:01:51,069 – 00:01:52,000 A child.
37 00:01:52,349 – 00:01:53,800 Another child in the rear seat.
38 00:02:00,000 – 00:02:01,702 Another child... a baby.
39 00:02:02,000 – 00:02:03,000 We could easily take them out
40 00:02:03,100 – 00:02:06,900 because the car doesn't cause any difficulties,
41 00:02:07,000 – 00:02:08,426 it is not seriously damaged.
42 00:02:08,946 – 00:02:09,946 Another car.
43 00:02:11,019 – 00:02:13,019 Also in this: an unconscious person.
44 00:02:13,100 – 00:02:14,043 A child.
45 00:02:14,100 – 00:02:15,800 A child in the rear seat.
46 00:02:15,900 – 00:02:16,849 A wife by the side.
47 00:02:18,654 – 00:02:20,654 Another barrel on the ground.
48 00:02:20,655 – 00:02:22,000 On the ground objects from a building yard.
49 00:02:22,203 – 00:02:24,183 Another barrel.
50 00:02:24,206 – 00:02:26,748 (counting) one... two... three... four barrels.
51 00:02:26,748 – 00:02:28,900 We don't know the contained substances.
52 00:02:29,152 – 00:02:30,152 We will address this.
53 00:02:30,300 – 00:02:32,010 Ok, two more people.
54 00:02:32,041 – 00:02:34,041 They're all unconscious.
55 00:02:34,644 – 00:02:36,644 Air is still quite breathable for me.
56 00:02:37,901 – 00:02:41,901 I'm not wearing protection devices but I can breathe normally here.
57 00:02:42,889 – 00:02:44,889 Ok. I'm quitting.
COMMENTARY ENDS

Another example of collected running commentary is available in the GM data/salvatore-2 path:

COMMENTARY BEGINS
1 00:00:15,000 – 00:00:17,000 Ready?
2 00:00:18,388 – 00:00:19,388 You can go.
3 00:00:41,177 – 00:00:53,177 (Instructions for calibration)
4 00:00:56,976 – 00:00:57,976 Ready.
5 00:00:59,340 – 00:01:00,340 Above me everything looks alright.
6 00:01:00,916 – 00:01:02,916 I'm entering.
7 00:01:02,670 – 00:01:03,670 Let's see...
8 00:01:04,228 – 00:01:05,428 A car is flipped
9 00:01:05,483 – 00:01:06,483 People inside.
10 00:01:06,490 – 00:01:07,500 I can't see very well
11 00:01:07,764 – 00:01:09,764 but for sure there's a person inside
12 00:01:10,720 – 00:01:13,500 Building material on the road
13 00:01:13,722 – 00:01:14,722 I'm experiencing some difficulties in moving
14 00:01:15,047 – 00:01:17,047 A barrel with toxic substances
15 00:01:17,607 – 00:01:18,607 It's oxygen, inflammable
16 00:01:19,077 – 00:01:21,077 Danger, caution
17 00:01:21,613 – 00:01:24,623 Ok, (counting): one, two, three unconscious people... four unconscious people
18 00:01:25,099 – 00:01:28,099 Another car... (counting) one, two, three cars
19 00:01:28,530 – 00:01:29,530 A van.
20 00:01:29,913 – 00:01:32,450 Poison... toxic substance...
21 00:01:32,923 – 00:01:33,923 Radioactive?
22 00:01:34,155 – 00:01:35,955 Thus, the van was carrying materials...
23 00:01:36,387 – 00:01:40,387 Ok, unconscious driver
24 00:01:40,620 – 00:01:43,620 Probably intoxicated, I don't know, we'll check...
25 00:01:43,400 – 00:01:44,400 Ok.
26 00:01:44,824 – 00:01:45,824 Caution, another (barrel)
27 00:01:46,401 – 00:01:49,401 one, two, three... three, four barrels.
28 00:01:49,800 – 00:01:54,580 One, two, three... two, four cars... one van.
29 00:01:55,052 – 00:01:57,052 12-13 people at least.
30 00:01:57,861 – 00:02:00,861 Ok, I go on... no fire ignition.
COMMENTARY ENDS

Finally, the gpr output files contain the fixations re-projected onto the right scene camera for every time step, in the form of mean and variance of the x and y image coordinates, in this order.

Besides being used for skill learning, the gathered data have provided an important input to the work in WP1 on functional mapping (T1.4) and to the work in WP3 on referencing in situated dialogue processing (T1.1/T1.3).

Figure 7: Gaze Machine calibration. The calibration pattern is detected in the 3D world, allowing the model parameters to be recovered and the 3D Point of Regard to be estimated from the optical axes of both the eyes.

1.2.3 T5.3: Task-driven attention for coordination and communication (in progress)

1. We have set up a number of experiments in the tunnel use case for a multi-car accident, with the Gaze Machine worn by a VVFF instructor (described in Section 1.2.2).

2. From the sequence of fixations and the running commentaries a scan-path has been produced; the localization of the scan-path is being optimized, as the training area induced a non-trivial drift error. Early results are described in [32]. We have elaborated these experiments and labelled the sequence of actions with referring expressions, according to the recorded running commentaries.

1.2.4 T5.5: Situated exploration history (in progress)

1. Implementation of planning and interface of the planner with ROS.

2. Plan recognition and temporal specifications [33].

3. Integration of the planner with ROS, namely with both the Navigation component and the Mapping components as provided in ROS.
4. Execution monitoring for preliminary exploration has been defined in Eclipse Prolog and interfaced with ROS, namely with both the Navigation component and the Mapping components as provided in ROS.

Figure 8: Left: the GM worn by an experienced VVFF instructor in Montelibretti. Right: robot exploration of the tunnel disaster area.

How a theory of actions emerges from skill learning is described in [54], whereas a DL extension to reasoning is described in [34] and, in general, in [67] (see Annexes). The research related to flexible planning is reported in [19], also contained in the Annexes.

1.3 Relation to user-centric design

As previously remarked, the construction of a knowledge base for planning will mention compatibilities, specifying time relations, preconditions and effects of processes, actions and behaviours. All these aspects need to be adapted to the particular rescue situation and to the specific needs of each component. In order to be effectively flexible in time and adaptive in behaviours, ideally most of the specifications should be learned. We aim at learning them using data gathered by observing complex actions being demonstrated by a tutor. The Gaze Machine offers an extraordinary vantage point, as it enables us to observe what the tutor is effectively doing and saying, along with how he adapts his behaviours by instantiating with common sense the intervention procedures that regulate his conduct in similar circumstances.

The motivation behind the study of human gaze resides in the fact that attention plays a fundamental role in NIFTi. It influences all the aspects concerning skill learning and cooperative human-robot interaction on which planning and execution monitoring are based. Visual attention is closely related to gazing: eye tracking experiments demonstrate that simple cues (motion, contrast, color, luminance, orientation, flicker) can predict saccadic behavior and, thus, the focus of attention.

Our effort has focused on collecting data during the on-field test scenarios. At SFO a 3D dynamic gaze tracker, the Gaze Machine, has been used to collect the attention foci of a firefighter instructor in the tunnel scenario. Our gaze estimation device has been improved in different directions in order to be suitable for collecting data in the use case scenario. The gaze tracking prototype in use allows the POR to be estimated in 3D space (a minimal sketch of the underlying triangulation is given after the following list). The position of the fixated object in the world can thus be recovered with respect to the subject, allowing the determination of the actual field of view along with the foveated area and thus the analysis of peripheral vision. For every performed demonstration, the main objective is to segment the sequence into basic actions, such as motion, vision and manipulation actions. Data for learning comprise

• scene recording
• 3D Point of Regard (POR) estimation
• inertial measurements
• running commentaries
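Once the optical axes of both eyes are expressed in a common reference frame (see Figure 7), the 3D POR can be recovered as the point closest to the two gaze rays, which in practice do not intersect exactly because of noise. The sketch below illustrates only this triangulation step, under the assumption that each ray is available as an origin and a direction; it is not the Gaze Machine calibration and estimation pipeline, which is described in [53].

```python
import numpy as np

def point_of_regard(origins, directions):
    """Least-squares 3D point closest to a set of gaze rays.

    origins:    (N, 3) array, one 3D origin per ray (e.g. each eye centre).
    directions: (N, 3) array of ray directions (need not be unit length).
    Returns the point minimising the summed squared distance to the rays.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(np.asarray(origins, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Toy usage: two slightly perturbed rays converging near (0.1, 0.0, 1.0).
left_eye, right_eye = np.array([-0.03, 0.0, 0.0]), np.array([0.03, 0.0, 0.0])
target = np.array([0.1, 0.0, 1.0])
por = point_of_regard(
    [left_eye, right_eye],
    [target - left_eye + 1e-3, target - right_eye - 1e-3])
print(por)  # close to the fixated 3D point
```

With only two rays this reduces to the midpoint of the common perpendicular segment; the least-squares form is kept because it extends directly to more than two viewing rays.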
For the sake of the experimental data collection, the acquisition device has been improved in both its hardware and software. New cameras allow for higher image quality and acquisition rate. A new tablet-convertible toughbook is used to collect the data of the experiment; it is carried as a backpack and allows direct control of the experiment through a touch GUI. Thus, both the calibration and the acquisition phases can be easily operated on field (as in Figure 4). Sensors and storage facilities are worn by the subject and the experimental setup is completely self-contained. Improvements have also been achieved in the accuracy of the POR estimation. A novel calibration phase has been proposed and a paper describing the new architecture has been submitted to a computer vision conference [53]. Inertial data are collected by an Inertial Measurement Unit (IMU) placed on the firefighter's helmet, while a microphone is used to record the spoken running commentaries (see Figure 3).

1.4 Relation to the state-of-the-art

Cognitive execution is meant to comply with real adaptation to changing objectives, following the operators' instructions (e.g. dialogue, commands) that require switching between tasks and flexibly revising plans and processes. The main novelty introduced by the WP5-related work is the ability to learn several skills using attention and the generated gaze scan-paths, which show the affordances of processes, including the communication steps, at several levels of detail (from activities to HRI). Learning skills provides classes of parameters and features for choosing strategies of actions according to time and compatibility constraints. So far no attempt has been made to create new primitives and parameters online (online adaptation) based on the robot's experience and interaction.

Plan recognition is about inferring a plan from the observation of actions [64, 30, 29, 4, 22]. The analogous concepts of acting based on observations have been specified in the computer vision community as action recognition, imitation learning or affordance learning, mainly motivated by the neurophysiological studies of Rizzolatti and colleagues [50, 21] and by Gibson [25, 24]. Reviews on action recognition are given in [44, 56, 2] and on learning by imitation in [3, 63]. The two approaches have, however, evolved in completely different directions. Plan recognition assumed actions to be already given and represented, thus being concerned only with the technical problems of generating a plan, taking into account specific preferences and user choices, and possibly interpreting plan recognition in terms of a theory of explanations [12]. On the other hand, action recognition and imitation learning have been more and more concerned with the robot's ability to capture the real and effective sequence and to adapt it to changing contexts. As noted by Krüger and colleagues in [35], the terms action and intent recognition, in plan recognition, often obscure the real task achieved by these approaches. In fact, as far as plan recognition assumes an already defined set of actions, the observation process is purely indexical. On the other hand, the difficulty with the learning by imitation and action recognition approaches is that they lack important concepts such as execution monitoring, intention recognition and plan generation. The problem of learning a basic theory of actions from observations has been addressed in [53], where it is shown how it is possible to automatically derive a model of the Situation Calculus from early vision, thus providing an example of bridging from perception to logical modeling.
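To make the idea concrete, the sketch below shows, in a deliberately reduced propositional setting, how candidate preconditions and effects of an action can be distilled from observed state transitions; the set-based fluent representation and the example fluents are assumptions made for illustration and do not reproduce the Situation Calculus derivation of [53].

```python
def learn_action_model(transitions):
    """Extract candidate preconditions and effects for each action label.

    transitions: iterable of (state_before, action, state_after), where each
    state is the set of propositional fluents observed to hold (an
    illustrative assumption; the learned model in the project is richer).
    Preconditions: fluents holding in every observed pre-state.
    Positive/negative effects: fluents consistently added/deleted.
    """
    model = {}
    for before, action, after in transitions:
        pre, add, dele = model.setdefault(
            action, [set(before), set(after - before), set(before - after)])
        pre &= before              # keep only fluents seen in every pre-state
        add &= (after - before)    # keep only consistently added fluents
        dele &= (before - after)   # keep only consistently deleted fluents
    return model

# Toy usage on two observed executions of the same (hypothetical) action.
observations = [
    ({"at_entrance", "path_clear"}, "enter_tunnel",
     {"in_tunnel", "path_clear"}),
    ({"at_entrance", "path_clear", "smoke"}, "enter_tunnel",
     {"in_tunnel", "path_clear", "smoke"}),
]
print(learn_action_model(observations))
# preconditions {at_entrance, path_clear}, adds {in_tunnel}, deletes {at_entrance}
```

Such intersections over repeated demonstrations are of course only a first approximation; ordering and timing information, as discussed below, is what the Gaze Machine data add on top of this.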
The introduction of the Gaze Machine as a device to implement the skill learning paradigm constitutes a novelty, providing an extremely rich source of information an agent can use to learn a well temporised sequence of actions and thus to generate a suitable plan [32]. Besides, the device itself is novel, and the design of the video-oculography subsystem for the tracking of gaze and the localization in space of human scan-paths is producing contributions to the research in computer vision and visual attention [53, 61, 39]. Within the class of non-invasive eye trackers, head-mounted ones offer the advantage of being more accurate than remotely located ones [46], and enable gaze estimation for a person moving in unspecified environments. Eye gaze estimation systems are, usually, a combination of a pupil centre detection algorithm and a homography or polynomial mapping between pupil positions and points on a screen observed by the user [27]. Recently this method was applied on a head-mounted tracker [37], projecting the gaze direction on the image of a camera facing the scene. Other methods include pupil and corneal reflection tracking [70], dual Purkinje image tracking [13] or scleral search coil tracking [8]. Three-dimensional gaze direction estimation, instead, requires determining the eye position with respect to the camera framing it. This process usually needs a preliminary calibration step where the eye geometry is computed using a simplified eye model [66]. Most of these systems are made up of one or more cameras pointing at a subject looking at a screen, where fixation points are projected [49]. Therefore other calibration tasks are often necessary in order to recover the screen and LED positions in the camera reference frame. The Gaze Machine platform, on the other hand, implements the geometry of the eyes' motion manifold in order to compute the gaze fixations in a dynamic 3D space [53], with abrupt light changes, as was the case in the tunnel scenario at SFO.

The problem of skill learning is related to the problem of managing task switching at the appropriate time and context, and thus it involves the control of many sources of information incoming from the environment, as well as the arbitration of resource allocation for perceptual-motor and selection processes. The new experimental setting we have provided encourages a better understanding of the cognitive functioning of executive processes. The ability to establish the proper mappings between inputs, internal states and outputs needed to perform a given task [43] is called cognitive control or executive function in neuroscience studies, and it is often analysed with the aid of the concept of inhibition (see e.g. [43, 5]), explaining how a subject in the presence of several stimuli responds selectively and is able to resist inappropriate urges (see [69]). Cognitive control, as a general function, explains flexible switching between tasks, when reconfiguration of memory and perception is required, by disengaging from previous goals or task sets (see [42, 52]). The role of task switching in robot cognitive control is highlighted in many biologically inspired architectures, such as the ISAC architecture [31], the ALEC architecture based on state changes induced by homeostatic variables [20], Hammer [15] and the GWT (Global Workspace Theory) [65].
Studies on cognitive control, and mainly on human adaptive behaviours, investigated within the task-switching paradigm, have strongly influenced cognitive robotics architectures since the eighties, as for example the Norman and Shallice [48] ATA schema, the FLE model of Duncan [17] and the principles of goal-directed behaviours in Newell [47] (for a review of these architectures in the framework of the task-switching paradigm see [59]). Also the approaches to model-based executive robot control, such as Williams [7] and earlier [28, 71], devise runtime systems managing backward inhibition via real-time selection, execution and action guidance, by hacking behaviours. This model-based view postulates the existence of a declarative (symbolic) model of the executive which can be used by the cognitive control to switch between processes within a reactive control loop. The flexible temporal planning approach (see also the Constraint-based Interval Planning framework [28]), proposed by the planning community, has shown a strong practical impact in real-world applications based on the integration of deliberation and execution (see e.g. RAX [28], IxTeT [23], INOVA [68] and RMPL [71]). Our method is an extension of these approaches, as compatibilities are directly learned on the basis of our Gaze Machine online acquisition of behaviours, and our formalization integrates control and reasoning well.
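As a purely illustrative complement, the sketch below shows the kind of check a flexible temporal planner performs when two processes are tied by a learned compatibility, here "the second process must start within a bounded delay after the first one ends". The interval representation and the numeric bounds are assumptions made for the example; they do not reproduce the TFSC formalization of [19].

```python
from dataclasses import dataclass

@dataclass
class FlexibleInterval:
    """A process occurrence with flexible bounds [earliest, latest], in seconds."""
    start: tuple   # (earliest_start, latest_start)
    end: tuple     # (earliest_end, latest_end)

def after_compatibility(a, b, d_min, d_max):
    """Check the compatibility 'b starts between d_min and d_max seconds
    after a ends'; return b with tightened start bounds, or None if the
    two timelines cannot satisfy the constraint."""
    earliest = max(b.start[0], a.end[0] + d_min)
    latest = min(b.start[1], a.end[1] + d_max)
    if earliest > latest:
        return None                      # incompatible timelines
    return FlexibleInterval((earliest, latest), b.end)

# Toy usage: 'survey the area' must be followed by 'report to the team'
# within 5 to 30 seconds (bounds of the kind learned from demonstrations).
survey = FlexibleInterval(start=(0.0, 2.0), end=(40.0, 60.0))
report = FlexibleInterval(start=(30.0, 120.0), end=(50.0, 150.0))
print(after_compatibility(survey, report, d_min=5.0, d_max=30.0))
# the start window of 'report' is tightened to (45.0, 90.0)
```

Constraint propagation of this kind, repeated over all the compatibilities of a plan, is what keeps the flexible timelines consistent while leaving room for adaptation at execution time.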
2 Annexes

2.1 Rudi et al. "Linear Solvability in the Viewing Graph" (ACCV2010)

Bibliography A. Rudi, M. Pizzoli and F. Pirri. "Linear Solvability in the Viewing Graph". In: Proceedings of the 10th Asian Conference on Computer Vision (ACCV 2010). Queenstown, New Zealand. November 2010. Lecture Notes in Computer Science 6494, Part III.

Abstract The Viewing Graph [36] represents several views linked by the corresponding fundamental matrices, estimated pairwise. Given a Viewing Graph, the tuples of consistent camera matrices form a family that we call the Solution Set. This paper provides a theoretical framework that formalizes different properties of the topology, linear solvability and number of solutions of multi-camera systems. We systematically characterize the topology of the Viewing Graph in terms of its solution set by means of the associated algebraic bilinear system. Based on this characterization, we provide conditions about the linearity and the number of solutions and define an inductively constructible set of topologies which admit a unique linear solution. Camera matrices can thus be retrieved efficiently and large viewing graphs can be handled in a recursive fashion. The results apply to problems such as the projective reconstruction from multiple views or the calibration of camera networks.

Relation to work performed In this paper we extend the notion of solvability for a Viewing Graph, introducing a new taxonomy that takes into account both linear solvability and the number of solutions. An inductively constructible set of topologies admitting a unique linear solution allows for a building-block design that can be used to inductively construct more complex topologies, very useful to combine global and incremental methods for camera matrix estimation. Such a formalisation contributes to the hierarchical and recursive approaches for solving n-view camera pose estimation, and its use has been investigated in the context of reconstructing the motion of a moving camera pair that is part of the 3D gaze estimation device in unknown and unstructured environments.

2.2 Fanello et al. "Arm-Hand behaviours modelling: from attention to imitation" (ISVC2010)

Bibliography S. R. F. Fanello, I. Gori and Fiora Pirri. "Arm-Hand behaviours modelling: from attention to imitation". In: Proceedings of the 7th International Symposium on Visual Computing (ISVC 2010). Las Vegas, Nevada, USA. September 2010. Lecture Notes in Computer Science 6454.

Abstract We present a new and original method for modelling arm-hand actions, learning and recognition. We use an incremental approach to separate the arm-hand action recognition problem into three levels. The lower level exploits bottom-up attention to select the region of interest, and attention is specifically tuned towards human motion. The middle level serves to classify action primitives exploiting motion features as descriptors. Each of the primitives is modelled by a Mixture of Gaussians, and it is recognised by a complete, real-time and robust recognition system. The higher level system combines sequences of primitives using deterministic finite automata. The contribution of the paper is a composition-based model for arm-hand behaviours allowing a robot to learn new actions from a one-shot demonstration of the action execution.

Relation to work performed The paper describes an approach to attentively recognize human gestures in the context of learning by imitation. The notion of saliency to motion is at the basis of the non-verbal interaction in NIFTi, as it provides segmentation of the human motion and, in the USAR scenario, allows the robot to focus on visually issued requests by the rescuers or victims. A novel model of visual attention taking motion into account is introduced and its effectiveness in segmenting human gestures is demonstrated. Based on early motion features like orientation and velocities, complex gestures are segmented into action primitives, the basic components that constitute gesture sequences. The proposed gesture analysis paradigm is designed for skill learning via demonstration and attentive exploration.

2.3 H. Khambhaita et al. "Help Me to Help You: how to Learn Intentions, Actions and Plans" (AAAI-SSS11)

Bibliography H. Khambhaita, G-J. Kruijff, M. Mancas, M. Gianni, P. Papadakis, F. Pirri and M. Pizzoli. "Help Me to Help You: how to Learn Intentions, Actions and Plans". In: Proceedings of the AAAI 2011 Spring Symposium on Help Me Help You: Bridging the Gaps in Human-Agent Collaboration (AAAI-SSS11). Stanford, California, USA. March 2011.

Abstract The collaboration between a human and a robot is here understood as a learning process mediated by the instructor's prompt behaviours and the apprentice collecting information from them to learn a plan. The instructor wears the Gaze Machine, a wearable device gathering and conveying visual and audio input from the instructor while executing a task. The robot, on the other hand, is eager to learn the best sequence of actions, their timing and how they interlace.
The cross relation among actions is specified both in terms of time intervals for their execution, and in terms of location in space, to cope with the instructor's interaction with people and objects in the scene. We outline this process: how to transform the rich information delivered by the Gaze Machine into a plan. Specifically, how to obtain a map of the instructor's positions and of his gaze positions, via visual SLAM and gaze fixations; further, how to obtain an action map from the running commentaries and the topological maps; and, finally, how to obtain a temporal net of the relevant actions that have been extracted. The learned structure is then managed by the flexible time paradigm of flexible planning in the Situation Calculus for execution monitoring and plan generation.

Relation to work performed The paper outlines a model of human-robot collaboration in which the final goal is to learn the best actions needed to achieve the required goals, in this case reporting hazards due to a crash accident in a tunnel, identifying the status of victims and, possibly, rescuing them. The collaboration is here viewed as a learning process involving the extraction of information from the instructor's behaviours, thus providing data for skill and affordance learning. The instructor communicates his actions both visually (using the GM) and with the aid of the comments delivered while executing the actions.

2.4 Carbone and Pirri. "Learning Saliency. An ICA based model using Bernoulli mixtures." (BICS 2010)

Bibliography A. Carbone and F. Pirri. "Learning Saliency. An ICA based model using Bernoulli mixtures." In: Proceedings of BICS, Brain Inspired Cognitive Systems (BICS 2010). Madrid, Spain. July 2010.

Abstract In this work we present a model of both the visual input selection and the gaze orienting behaviour of a human observer undertaking a visual exploration task in a specified scenario. Our method builds on a real set of gaze-tracked points of fixation, acquired from a custom designed wearable device [41]. By comparing these sets of fovea-centred patches with a randomly chosen set of image patches, extracted from the whole visual context, we aim at characterising the statistical properties and regularities of the selected visual input. While the structure of the visual context is specified as a linear combination of basis functions, which are independent hence uncorrelated, we show how low-level features affecting a scan-path of fixations can be obtained by hidden correlations to the context. Samples from human observers are collected both in free-viewing and surveillance-like tasks, in the specified visual scene. These scan-paths show important and interesting dependencies on the context. We show that a scan-path, given a database of a visual context, can be suitably induced by a system of filters that can be learned by a two-stage model: independent component analysis (ICA) to gather low-level features and a mixture of Bernoulli distributions identifying the hidden dependencies. Finally these two stages are used to build the cascade of filters.

Relation to work performed In this paper a model of saliency for images is introduced. A mixture of multivariate Bernoulli distributions models the fixations of human scan-paths, acquired by the GM, according to a set of linear filters generated by the ICA decomposition of the visual environment.
The resultant image saliency map elicits those regions which are likely to have a statistical similarity to the ones occurring in the reference scan-path. The trained system is thus able to predict the gaze behaviour, according to the training set of acquired fixations, implementing a selection paradigm on early sensory acquisition and allowing control over functioning processes for vision-related tasks.

2.5 Gianni et al. "Learning cross-modal translatability: grounding speech act on visual perception." (RSS 2010)

Bibliography M. Gianni, G. M. Kruijff and F. Pirri. "Learning cross-modal translatability: grounding speech act on visual perception." In: Proceedings of the RSS Workshop on Learning for Human-Robot Interaction Modeling (RSS 2010). Zaragoza, Spain. June 2010.

Abstract The problem of grounding language on visual perception has nowadays been investigated under different approaches; we refer the reader in particular to the works of [51, 58, 26, 72, 16, 57, 62, 45, 38, 9]. Less investigated is the inverse problem, that is, the problem of building the semantics/interpretation of visual perception via speech acts. In this work we face the two problems simultaneously, by learning both the language and its semantics through human-robot interaction. We describe the progress of current research facing the problem of simultaneously grounding parts of speech and learning the signature of a language for describing both actions and the state space, while actions are executed and shown in a video. Indeed, having both a language and a suitable semantics/interpretation of objects, actions and state properties, we will be able to build descriptions and representations of real-world activities under several interaction modalities. Given two inputs, a video and a narrative, the task is to associate a signature and an interpretation to each significant action and the afforded objects in the sequence, and to infer the preconditions and effects of the actions so as to interpret the chronicle, explaining the beliefs of the agent about the observed task.

Relation to work performed The work described in this paper contributes to the research on skill learning as it aims at providing a paradigm to associate speech acts, and thus language, to actions acquired by visual perception.

2.6 Carrano et al. "An approach to projective reconstruction from multiple views." (IASTED 2010)

Bibliography A. Carrano, V. D'Angelo, S. R. F. Fanello, I. Gori, F. Pirri, A. Rudi. "An approach to projective reconstruction from multiple views." In: Proceedings of the IASTED Conference on Signal Processing, Pattern Recognition and Applications (IASTED 2010). Innsbruck, Austria. February 2010.

Abstract We present an original method to perform a robust and detailed 3D reconstruction of a static scene from several images taken by one or more uncalibrated cameras. Making use only of fundamental matrices, we are able to combine even heterogeneous video and/or photo sequences. In particular we give a characterization of the space of camera matrices consistent with a given fundamental matrix and provide a straightforward bottom-up method, linear in most practical uses, to fulfil the 3D reconstruction. We also describe shortly how to integrate this procedure into a standard vision system following an incremental approach.
Relation to work performed The work describes a method to perform projective reconstruction from multiple uncalibrated cameras that is based on a graphical representation of the fundamental matrices constraining the images of an object. In this work the Viewing Graph (ACCV 2010) is exploited in order to perform the reconstruction. The use of the resultant method has been investigated in the context of estimating the visual odometry of the Gaze Machine, making use of techniques for Structure and Motion recovery.

2.7 Krieger and Kruijff. "Combining Uncertainty and Description Logic Rule-Based Reasoning in Situation-Aware Robots." (AAAI-SSS 2011b)

Bibliography H. U. Krieger and G. J. M. Kruijff. "Combining Uncertainty and Description Logic Rule-Based Reasoning in Situation-Aware Robots." In: Proceedings of the AAAI 2011 Spring Symposium "Logical Formalizations of Commonsense Reasoning" (AAAI-SSS 2011). Stanford, California, USA. March 2011.

Abstract The paper addresses how a robot can maintain a state representation of all that it knows about the environment over time and space, given its observations and its domain knowledge. The advantage in combining domain knowledge and observations is that the robot can in this way project from the past into the future, and reason from observations to more general statements to help guide how it plans to act and interact. The difficulty lies in the fact that observations are typically uncertain and logical inference for completion against a knowledge base is computationally hard.

Relation to work performed The paper discusses how we can perform inference over such a long-term memory model, in a multi-agent belief model that deals with uncertainty in knowledge states.

2.8 Stachowicz and Kruijff. "Episodic-Like Memory for Cognitive Robots." (IEEE-TAMD 2011)

Bibliography D. Stachowicz and G. J. M. Kruijff. "Episodic-Like Memory for Cognitive Robots." IEEE Transactions on Autonomous Mental Development (IEEE-TAMD 2011), 2011.

Abstract The article presents an approach to providing a cognitive robot with a long-term memory of experiences, a memory inspired by the concept of episodic memory (in humans) or episodic-like memory (in animals), respectively. The memory provides means to store experiences, integrate them into more abstract constructs, and recall such content. The article presents an analysis of key characteristics of natural episodic memory systems. Based on this analysis, conceptual and technical requirements for an episodic-like memory for cognitive robots are specified. The article provides a formal design that meets these requirements, and discusses its full implementation in a cognitive architecture for mobile robots. It reports results of simulation experiments which show that the approach can run efficiently in robot applications involving several hours of experience.

Relation to work performed The article discusses the basis for the long-term memory model we are deploying in NIFTi to model the situated exploration history.

2.9 Pirri et al. "A general method for the Point of Regard estimation in 3D space." (CVPR 2011)

Bibliography F. Pirri, M. Pizzoli, A. Rudi. "A general method for the Point of Regard estimation in 3D space."
”Accepted to the IEEE Conference on Computer VIsion and Pattern Recognition (CVPR 2011), 2011. Abstract A novel approach to 3D gaze estimation for wearable multicamera devices is proposed and its effectiveness is demonstrated both theoretically and empirically. The proposed approach, firmly grounded on the geometry of the multiple views, introduces a calibration procedure that is efficient, accurate, highly innovative but also practical and easy. Thus, it can run online with little intervention from the user. The overall gaze estimation model is general, as no particular complex model of the human eye is assumed in this work. This is made possible by a novel approach, that can be sketched as follows: each eye is imaged by a camera; two conics are fitted to the imaged pupils and a calibration sequence, consisting in the subject gazing a known 3D point, while moving his/her head, provides information to 1) estimate the optical axis in 3D world; 2) compute the geometry of the multi-camera system; 3) estimate the Point of Regard in 3D world. The resultant model is being used effectively to study visual attention by means of gaze estimation experiments, involving people performing natural tasks in wide-field, unstructured scenarios. Relation to work performed The paper describes the novel contributions introduced by the video-oculography subsystem of the Gaze Machine, the device that has been described in Section 1.2 as at the core of the skill learning paradigm and of the attention-related tasks. EU FP7 NIFTi (ICT-247870) 28 Methods and paradigms for skill learning 2.10 Gianni, Papadakis, Pirri, Pizzoli Finzi and Pirri. “Switching tasks and flexible reasoning in the Situation Calculus.”(TR 2010) Bibliography A. Finzi, F. Pirri. “Switching tasks and flexible reasoning in the Situation Calculus. ”DIS Techincal Report, n. 7, 2010 Abstract In this paper we present a new framework for modelling switching tasks and adaptive, flexible behaviours for cognitive robots. The framework is constructed on a suitable extension of the Situation Calculus, the Tem- poral Flexible Situation Calculus (TFSC), accommodating Allen temporal intervals, multiple timelines and concurrent situations. We introduce a constructive method to define pattern rules for temporal constraint, in a language of macros. The language of macros intermediates between Situation Calculus formulae and temporal constraint Networks. The programming language for the TFSC is TFGolog, a new Golog interpreter in the Golog family languages, that models concurrent plans with flexible and adaptive behaviours with switching modes. Finally, we show an implementation of a cognitive robot performing different tasks while attentively exploring a rescue environment. Relation to work performed The aim of this work is to report the research on flexible planning and constitutes the theoretical foundation for the current implementation of the NIFTi model based planner. EU FP7 NIFTi (ICT-247870) 29 Methods and paradigms for skill learning Gianni, Papadakis, Pirri, Pizzoli References [1] http://www.ros.org/wiki/navigation. [2] J. K. Aggarwal and Q. Cai. Human motion analysis: A review. Computer Vision and Image Understanding, 73:428–440, 1999. [3] Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robot. Auton. Syst., 57(5):469–483, 2009. [4] Marcelo Gabriel Armentano and Analı́a Amandi. Plan recognition for interface agents. Artif. Intell. Rev., 28(2):131–162, 2007. [5] A. R. Aron. 
The neural basis of inhibition in cognitive control. The Neuroscientist, 13:214 – 228, 2007. [6] Anna Belardinelli, Fiora Pirri, and Andrea Carbone. Bottom-up gaze shifts and fixations learning by imitation. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2007. [7] Stephen A. Block, Andreas F. Wehowsky, and Brian C. Williams. Robust execution on contingent, temporally flexible plans. In AAAI, 2006. [8] L. Bour. Dmi-search scleral coil. H2 214, Department of Neurology, Clinical Neurophysiology, Academic Medical Center, Amsterdam, 1997. [9] S. R. K. Branavan, Harr Chen, Jacob Eisenstein, and Regina Barzilay. Learning document-level semantic properties from free-text annotations. J. Artif. Intell. Res. (JAIR), 34:569–603, 2009. [10] A. Carbone and F. Pirri. Analysis of the local statistics at the centre of fixation during visual scene exploration. In IARP International Workshop on Robotics for risky interventions and Environmental Surveillance(RISE 2010), 2010. [11] A. Carbone and F. Pirri. Learning saliency. an ica based model using bernoulli mixtures. In In Proceedings of BICS, Brain Ispired Cognitive Systems, 2010. [12] Eugene Charniak and Robert P. Goldman. A bayesian model of plan recognition. Artif. Intell., 64(1):53–79, 1993. [13] T. N. Cornsweet and H. D. Crane. Accurate two-dimensional eye tracker using first and fourth purkinje images. Journal of The Optical Society of America, 68(8):921–928, 1973. EU FP7 NIFTi (ICT-247870) 30 Methods and paradigms for skill learning Gianni, Papadakis, Pirri, Pizzoli [14] V. D’Angelo, S. R. F. Fanello, I. Gori, F. Pirri, and A. Rudi. An approach to projective reconstruction from multiple views. In In Proceedings of IASTED Conference on Signal Processing, Pattern Recognition and Applications, 2010. [15] Yiannis Demiris and Bassam Khadhouri. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems, 54(5):361–369, 2006. [16] Peter Ford Dominey and Jean-David Boucher. Learning to talk about events from narrated video in a construction grammar framework. Artif. Intell., 167(1-2):31–61, 2005. [17] J. Duncan. Disorganization of behaviour after frontal-lobe damage. Cognitive Neuropsychology, 3:271–290, 1986. [18] I. Fanello, S. R. F. Gori and F. Pirri. Arm-hand behaviours modelling: from attention to imitation. In ISVC International Symposium on Visual Computing(ISVC2010), 2010. Best Paper Award. [19] A. Finzi and F. Pirri. Switching tasks and flexible reasoning in the situation calculus. Technical Report 7, Dipartimento di informatica e Sistemistica Sapienza Università di Roma, 2010. [20] Sandra Clara Gadanho. Learning behavior-selection by emotions and cognition in a multi-goal robot task. J. Mach. Learn. Res., 4:385–412, 2003. [21] V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti. Action recognition in the premotor cortex. Brain, 119:593–609, 1996. [22] C. Geib. Delaying commitment in plan recognition using combinatory categorial grammars. In Proc. of the IJCAI 2009, pages 1702–1707, 2009. [23] M. Ghallab and H. Laruelle. Representation and control in ixtet, a temporal planner. In Proceedings of AIPS-1994, pages 61–67, 1994. [24] J.J. Gibson. Perceptual learning: differentiation or enrichment? Psyc. Rev., 62:32–41, 1955. [25] J.J. Gibson. The theory of affordances. In R. Shaw and J. Bransford, editors, Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pages 67–82. Hillsdale, NJ: Lawrence Erlbaum, 1977. [26] Peter Gorniak and Deb Roy. 
Grounded semantic composition for visual scenes. J. Artif. Intell. Res. (JAIR), 21:429–470, 2004. EU FP7 NIFTi (ICT-247870) 31 Methods and paradigms for skill learning Gianni, Papadakis, Pirri, Pizzoli [27] Dan Witzner Hansen and Arthur E. C. Pece. Eye typing off the shelf. In 2004 Conference on Computer Vision and Pattern Recognition (CVPR 2004), pages 159–164, June 2004. [28] Ari K. Jonsson, Paul H. Morris, Nicola Muscettola, Kanna Rajan, and Benjamin D. Smith. Planning in interplanetary space: Theory and practice. In Artificial Intelligence Planning Systems, pages 177–186, 2000. [29] Henry A. Kautz. A formal theory of plan recognition. PhD thesis, Department ofComputer Science, University of Rochester, 1987. [30] Henry A. Kautz and James F. Allen. Generalized plan recognition. In AAAI, pages 32–37, 1986. [31] Kazuhiko Kawamura, Tamara E. Rogers, and Xinyu Ao. Development of a cognitive model of humans in a multi-agent framework for humanrobot interaction. In AAMAS ’02: Proceedings of the first international joint conference on Autonomous agents and multiagent systems, pages 1379–1386, New York, NY, USA, 2002. ACM. [32] H. Khambhaita, G.J.M. Kruijff, M. Mancas, M. Gianni, P. Papadakis, F. Pirri, and M. Pizzoli. Help me to help you: how to learn intentions, actions and plans. In AAAI 2011 Spring Symposium “Help Me Help You: Bridging the Gaps in Human-Agent Collaboration”, 2011. [33] Hans-Ulrich Krieger. A temporal extension of the Hayes and ter Horst entailment rules for RDFS and OWL. In AAAI 2011 Spring Symposium “Logical Formalizations of Commonsense Reasoning”, 2011. [34] Hans-Ulrich Krieger and Geert-Jan M. Kruijff. Combining uncertainty and description logic rule-based reasoning in situation-aware robots. In AAAI 2011 Spring Symposium “Logical Formalizations of Commonsense Reasoning”, 2011. [35] Volker Krüger, Danica Kragic, and Christopher Geib. The meaning of action a review on action recognition and mapping. Advanced Robotics, 21:1473–1501, 2007. [36] N. Levi and M. Werman. The viewing graph. CVPR, 2003. [37] Dongheng Li, Jason Babcock, and Derrick J. Parkhurst. Openeyes: a low-cost head-mounted eye-tracking solution. In ETRA ’06: Proceedings of the 2006 symposium on Eye tracking research & applications, pages 95–100, New York, NY, USA, 2006. ACM. EU FP7 NIFTi (ICT-247870) 32 Methods and paradigms for skill learning Gianni, Papadakis, Pirri, Pizzoli [38] Ingo Lütkebohle, Julia Peltason, Lars Schillingmann, Britta Wrede, Sven Wachsmuth, Christof Elbrechter, and Robert Haschke. The curious robot - structuring interactive robot learning. In ICRA, pages 4156–4162, 2009. [39] M. Mancas, F. Pirri, and M. Pizzoli. Human-motion saliency in multimotion scenes and in close interaction. submitted Gesture Recognition Workshop 2011. [40] S. Marra and F. Pirri. Eyes and cameras calibration for 3d world gaze detection. In Proceedings of the International Conference on Computer Vision Systems, 2008. [41] Stefano Marra and Fiora Pirri. Eyes and cameras calibration for 3d world gaze detection. In Proceedings of the International Conference on Computer Vision Systems, pages 216–227, 2008. [42] U. Mayr and SW. Keele. Changing internal constraints on action: the role of backward inhibition. Journal of Experimental Psychology, 129(1):4–26, 2000. [43] E.K. Miller and J.D. Cohen. An integrative theory of prefrontal cortex function. Annual Rev. Neuroscience, 24:167 – 202, 2007. [44] Thomas B. Moeslund, Adrian Hilton, and Volker Krüger. 
A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2-3):90–126, 2006. [45] Raymond J. Mooney. Learning to connect language and perception. In AAAI, pages 1598–1601, 2008. [46] Carlos H. Morimoto and Marcio R. M. Mimica. Eye gaze tracking techniques for interactive applications. Computer Vision Image Understing, 98(1):4–24, 2005. [47] A. Newell. Unified theories of cognition. Harvard University Press, 1990. [48] D. A. Norman and T. Shallice. Consciousness and Self-Regulation: Advances in Research and Theory, volume 4, chapter Attention to action: Willed and automatic control of behaviour. Plenum Press, 1986. [49] Takehiko Ohno, Naoki Mukawa, and Atsushi Yoshikawa. Freegaze: a gaze tracking system for everyday gaze interaction. In ETRA ’02: Proceedings of the 2002 symposium on Eye tracking research & applications, pages 125–132, New York, NY, USA, 2002. ACM. EU FP7 NIFTi (ICT-247870) 33 Methods and paradigms for skill learning Gianni, Papadakis, Pirri, Pizzoli [50] G. Di Pellegrino, V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti. Understanding motor events: a neurophysiological study. Exp. Brain Research, 91:176–180, 1992. [51] Alex Pentland, Deb Roy, and Christopher Richard Wren. Perceptual intelligence: learning gestures and words for individualized, adaptive interfaces. In HCI (1), pages 286–290, 1999. [52] Andrea Philipp and Iring Koch. Task inhibition and task repetition in task switching. The European Journal of Cognitive Psychology, 18(4):624–639, 2006. [53] Fiora Pirri. The well-designed logical robot: learning and experience from observations to the situation calculus. Artificial Intelligence, pages 1–44, Apr 2010. [54] Fiora Pirri. The well-designed logical robot: Learning and experience from observations to the situation calculus. Artificial Intelligence, 175(1):378 – 415, 2011. [55] Fiora Pirri, Matia Pizzoli, and Alessandro Rudi. A general method for the point of regard estimation in 3d space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011. [56] Ronald Poppe. A survey on vision-based human action recognition. Image and Vision Computing, 28:976–990, 2010. [57] Deb Roy. Semiotic schemas: A framework for grounding language in action and perception. Artif. Intell., 167(1-2):170–205, 2005. [58] Deb Roy and Alex Pentland. Learning words from sights and sounds: a computational model. Cognitive Science, 26(1):113–146, 2002. [59] J.S. Rubinstein, E.D. Meyer, and J. E. Evans. Executive control of cognitive processes in task switching. Journal of Experimental Psychology: Human Perception and Performance, 27(4):763–797, 2001. [60] A. Rudi, M. Pizzoli, and F. Pirri. Linear solvability in the viewing graph. In Springer, editor, Proc. of the ACCV Asian Conference on Computer Vision(ACCV2010), 2010. [61] A. Rudi, M. Pizzoli, and F. Pirri. Linear solvability in the viewing graph. In ACCV Asian Conference on Computer Vision(ACCV2010), 2010. [62] Paul E. Rybski, Jeremy Stolarz, Kevin Yoon, and Manuela Veloso. Using dialog and human observations to dictate tasks to a learning robot assistant. Intel Serv Robotics, 1:159–167, 2008. EU FP7 NIFTi (ICT-247870) 34 Methods and paradigms for skill learning Gianni, Papadakis, Pirri, Pizzoli [63] Stefan Schaal, Auke Ijspeert, and Aude Billard. Computational approaches to motor learning by imitation. Philosophical Trans. of the Royal Soc. B: Biological Sciences, 358(1431):537–547, 2009. [64] Charles F. Schmidt, N. S. Sridharan, and John L. Goodson. 
The plan recognition problem: An intersection of psychology and artificial intelligence. Artif. Intell., 11(1-2):45–83, 1978. [65] M.P. Shanahan. A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15:433–449, 2006. [66] Sheng-Wen Shih and Jin Liu. A novel approach to 3-d gaze tracking using stereo cameras. Systems, Man and Cybernetics, Part B, IEEE Transactions on, 34(1):234–245, 2004. [67] D. Stachowicz and G.J.M. Kruijff. Episodic-like memory for cognitive robots. Journal of Autonomous Mental Development, 2011. accepted for publication. [68] A. Tate. ”I-N-OVA” and ”I-N-CA”, Representing Plans and other Synthesised Artifacts as a Set of Constraints, pages 300–304. 2000. [69] S. P. Tipper. Does negative priming reflect inhibitory mechanisms? a review and integration of conflicting views. Quarterly Journal of Experimental Psychology, 54:321 – 343, 2001. [70] K.P. White, T.E. Hutchinson Jr., and J.M. Carley. Spatially dynamic calibration of an eye tracking system. In IEEE Transaction on Systems, Man, and Cybernetics, volume 23, pages 1162–1168, 1993. [71] B. Williams, M. Ingham, S. Chung, P. Elliott, M. Hofbaur, and G. Sullivan. Model-based programming of fault-aware systems. AI Magazine, Winter 2003. [72] Chen Yu and Dana H. Ballard. On the integration of grounding language and learning objects. In AAAI, pages 488–494, 2004. EU FP7 NIFTi (ICT-247870) 35 Arm-Hand behaviours modelling: from attention to imitation Sean R. F. Fanello1 , Ilaria Gori1 , and Fiora Pirri1 Sapienza Università di Roma, Dipartimento di Informatica e Sistemistica, Roma, RM, Italy [email protected],[email protected],[email protected] Abstract. We present a new and original method for modelling arm-hand actions, learning and recognition. We use an incremental approach to separate the arm-hand action recognition problem into three levels. The lower level exploits bottom-up attention to select the region of interest, and attention is specifically tuned towards human motion. The middle level serves to classify action primitives exploiting motion features as descriptors. Each of the primitives is modelled by a Mixture of Gaussian, and it is recognised by a complete, real time and robust recognition system. The higher level system combines sequences of primitives using deterministic finite automata. The contribution of the paper is a compositional based model for arm-hand behaviours allowing a robot to learn new actions in a one time shot demonstration of the action execution. Keywords: gesture recognition, action segmentation, human motion analysis. 1 Introduction We face the problem of modelling behaviours from a robot perspective. We provide an analysis of the role played by the primitive constituents of actions and show, for a number of simple primitives, how to make legal combinations of them in so enabling the robot to build and replicate the observed behaviour by its own. Here we shall focus only on actions performed by hands and arms, although we extend the action class beyond the concept of gestures (as specified, e.g. in Mitra et al. survey [1]). In fact, potentially any general action, such as drinking or moving objects around, performable by hand and arm, can be included in our approach. First of all we consider attention to and focus towards the human motion, as distinct from non-human motion, either natural or mechanical. 
This aspect, in particular, resorts to the theory of motion coherence and structured motion (see for example Wildes and Bergen [2]), for which oriented filters have been proven to be appropriate [3]. Indeed, we show how a bank of 3D Gabor filters can be tuned to respond selectively to some specific human motion. Thus, focus on regions of the scene interested by human motion provides the robot with a natural segmentation of where to look at for learning behaviours. In particular, attention to distinct human motion seems to be explicitly dependent on scale, frequency, direction, but not on shape. This fact has suggested us to define descriptors based only on these features, from which we extract principally the directions of the arm-hand movements (see also [4]). 2 Sean R. F. Fanello, Ilaria Gori, and Fiora Pirri The very simple structure of the descriptors enables a straightforward classification that includes all direction dependent primitives, such as up, tilt, release, grasp and so on. The basic classification can be easily extended to any legal sequence of actions for which a deterministic accepting automata exists. In this sense, according to the classification on human motion analysis, as provided by Moeslund et al. [5], our approach seems to fall into the category of action primitives and grammars, as no explicit reference to human model is used in the behaviour modelling. For other taxonomies concerning action recognition we refer the reader to [6–8]. The purpose of our work encompasses the classification and regression problem (see [6, 9]). The purpose is to enable robot action learning, by learning the primitives and their structured progression. This can be considered as a form of imitation learning (we refer the reader to the review of [10]) although an important generalisation inference is done in the construction of the accepting automaton. Thus, at each step of the behaviour learning, the robot finds itself either modelling the behaviour via a new automaton, from the observed sequence, or accepting the observed behaviour via an already memorised one. This can be further extended by revising the learned automaton. Finally we have experimented the above model with the robot iCub (see Fig. 6), a humanoid robot designed by the RobotCub Consortium. 2 Focusing on human motion Humans reveal a specific sensitivity to actions. It has been shown that action recognition is predominantly located in the left frontal lobe (see [11]) and that low level motion perception is biased towards stimuli complying with kinematics laws of human motion. Indeed, human visual sensitivity is greatest at roughly 5 cycle/degree and at 5 Hz. We have used 3D Gabor filters to record responses to human motion and in particular to arm-hand motion in so as to learn attention towards these specific movements in a scene, as opposed to other kind of movements (e.g. a fan). We show that 3D Gabor filters can discriminate different motions by suitably selecting scale and frequency. The selected regions are used by the descriptors to identify primitives of actions. The earliest studies on the Gabor transform [12] are due to Daugman [13] and to the experiments of Jones and Palmer [14], who tested the Daugman’s idea that simple receptive fields belong to a class of linear filters analogous to Gabor filters. Since then a wealth of literature has been produced on Gabor filters, to model several meaningful aspects of the visual process. Most of the works are, however, focused on the 2D analysis. 
A 3D Gabor, as the product of a 3D Gaussian and a complex 3D harmonic function, can be defined as follows:

$$G(\mathbf{x}) = |\Sigma|^{-1/2}(2\pi)^{-3/2}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\mathbf{x}_0)^{\top}\Sigma^{-1}(\mathbf{x}-\mathbf{x}_0)\right)\exp\!\left(2\pi\imath\,\mathbf{u}_0^{\top}(\mathbf{x}-\mathbf{x}_0)\right) \qquad (1)$$

Here $\mathbf{x} = (x, y, t)$, $\mathbf{x}_0 = (x_0, y_0, t_0)$ is the origin in the time-space domain, $\mathbf{u}_0 = (u_0, v_0, w_0)$ denotes the central spatio-temporal frequency and, finally, $\imath = \sqrt{-1}$. Using Euler's formula and simplifying, the harmonic term can be written as $\cos(-2\pi\mathbf{u}_0^{\top}\mathbf{x}_0 + \psi)$, with $\psi$ the phase parameter in Cartesian coordinates. From this, according to the value of $\psi$, it is possible to obtain two terms in quadrature, the even and the odd Gabor filters (see, for example, [12], [15]), which we denote $G_O$ and $G_E$. Clearly, with respect to Gabor's representation of the information area [12] (see also [15]), these filters should be represented in a 6-dimensional space with coordinates $x, y, t, u, v, w$. However, following Daugman [15], we consider two representations, one in the space-time domain and one in the frequency domain. Here we shall mention only the space-time domain.

Fig. 1: On the left, a bank of 3D Gabor filters with the same scale and frequency and varying direction. On the right, a slice along $x$-time.

A Gabor can be specified by the parameters of scale (one per axis of the Gaussian support, which is an ellipsoid), of direction, given by the angles $\theta$ and $\varphi$ of the principal axes of the Gaussian support, and of central frequency. In fact, knowing the axes directions (eigenvectors) and the axes scales (eigenvalues), the Gaussian covariance $\Sigma$ is determined. By varying these parameters according to the spatial frequency contrast sensitivity and the speed sensitivity in humans, which provide limiting values, we have defined a bank of Gabor filters in which the parameter ranges are specified as follows. Both the spatial and temporal frequencies are given as multiples of the Nyquist critical frequencies $f_s = 1/2$ cycle/pixel and $f_t = 12.5$ Hz, given that the video sampling rate was 25 Hz. In particular, the frequency bandwidth is related to the Gaussian axes length as follows:

$$\Delta F_i = \frac{1}{2\sqrt{\lambda_i}}, \quad i = 1, \ldots, 3 \qquad (2)$$

with $\lambda_i$ the $i$-th eigenvalue of $\Sigma$, and $\Delta F_i = F_i^{max} - F_i^{min}$, where $F_i^{max}$ (resp. $F_i^{min}$) is the maximal (resp. minimal) frequency of the chosen channel in the $i$-th direction. We have chosen the central frequency to vary along 4 channels from 1/2 to 1/8 and, accordingly, the scale to vary from 0.1 to 0.8 for each axis of the Gaussian support. This amounts to 48 parameters. On the other hand, the orientation is given for 6 directions, namely {0, 30, 60, 90, 120, 150} degrees, for both the angles $\theta$ and $\varphi$. This amounts to 36 parameters. We have thus obtained a bank of 48 × 36 filters. Figure 1 shows, on the left, a bank of 3D Gabor filters with only the direction varying and the origin $\mathbf{x}_0$ varying on a circle, just for visualisation purposes. To learn the human motion bias, we are given training videos $V$ of about 800 frames taken at a sampling rate of about 25 Hz. The video resolution is reduced to 144 × 192 and a period of $\Delta T = 0.64$ s is considered to accumulate information, at the end of which the energy is computed. This amounts to volumes $V^{\Delta T}$ of 16 frames and, analogously, the Gabor filter is defined by a volume of dimension 16 × 16 × 16.
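The quadrature pair of Eq. (1) can be sampled directly on a discrete 16 × 16 × 16 grid. The following Python/NumPy sketch is only illustrative: the function name, the parameterisation through a covariance matrix and the example values are assumptions, not the authors' implementation.

```python
import numpy as np

def gabor_3d(shape, sigma, u0, x0=None):
    """Even/odd 3D Gabor pair (Eq. 1): a 3D Gaussian envelope with covariance `sigma`
    modulated by a 3D harmonic at central spatio-temporal frequency `u0`."""
    if x0 is None:
        x0 = (np.array(shape) - 1) / 2.0
    grids = np.meshgrid(*[np.arange(s) for s in shape], indexing='ij')
    X = np.stack([g - c for g, c in zip(grids, x0)], axis=-1)          # (x - x0) offsets
    quad = np.einsum('...i,ij,...j->...', X, np.linalg.inv(sigma), X)  # Mahalanobis term
    envelope = (2 * np.pi) ** -1.5 / np.sqrt(np.linalg.det(sigma)) * np.exp(-0.5 * quad)
    phase = 2 * np.pi * np.tensordot(X, np.asarray(u0), axes=([-1], [0]))
    return envelope * np.cos(phase), envelope * np.sin(phase)          # G_E and G_O

# Example: a filter tuned to roughly 1/6 cycle/pixel along x (assumed values)
G_E, G_O = gabor_3d((16, 16, 16), sigma=np.diag([6.0, 6.0, 6.0]), u0=(1/6, 0.0, 0.0))
```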
The square of the motion energy, for the given interval, is defined as:

$$En_i^{\Delta T}(\mathbf{x}) = \left(\int_{\mathbb{R}^3} G_E^i(\mathbf{x}')\,V^{\Delta T}(\mathbf{x}-\mathbf{x}')\,d\mathbf{x}'\right)^2 + \left(\int_{\mathbb{R}^3} G_O^i(\mathbf{x}')\,V^{\Delta T}(\mathbf{x}-\mathbf{x}')\,d\mathbf{x}'\right)^2 \qquad (3)$$

Here the pair $(G_E^i, G_O^i)$ varies over the space of the filter bank, $\mathbf{x} = (x, y, t)$, $\mathbf{x}' = (x', y', t')$, and the integration is triple, over $x'$, $y'$ and $t'$. The energy is computed for each 3D Gabor in the filter bank, after smoothing the volume $V^{\Delta T}$, with $\Delta T \sim 0.64$ s, with a binomial filter of size 3. Although the coverage of the bank is not complete, we look for the response that maximises the energy around a foveated region of at most 1 to 3 degrees and is minimal elsewhere. Intuitively, this means that the response of the receptive fields is sharp in the interesting regions. This amounts to both maximising the energy and minimising the entropy of the information carried by the energy of the response. This is achieved by considering the energy voxels as i.i.d. observations of a non-parametric kernel density. The non-parametric kernel density of the energy is estimated using a uniform kernel (see the next section, equation (7)), with bandwidth $H = 0.08 \cdot I$, $I$ the identity matrix, along the 3 dimensions of the foveated region (for non-parametric densities and the estimation of the bandwidth, we refer the reader to [16]).

Fig. 2: The upper sequence illustrates the optical flow, detecting evenly the fan and the arm-hand motion. The lower sequence illustrates the energy of the quadrature-pair Gabor filters, along a path of $\Delta T$ minimising the entropy. This path is constant over 1/6 cycle/pixel and varying direction. The scale is fixed with $a = 0.38$, $b = 0.46$, $c = 0.6$. Here all images have been resized to 1/2.

On the other hand, the response is discriminative if the energy peaks are minimal in number, and hence the correlation is high on the closest spatio-temporal regions. Therefore the optimisation criterion amounts to maximising the energy subject to both the minimisation of the entropy $E(p)$ of the non-parametric density $p$ and the minimisation of an error function defined as the sum of the squared distances between any two energy peaks. Here a peak is any energy value $x$ such that

$$x \geq \frac{4}{3N}\sum En \qquad (4)$$

where $N$ is the dimension of $En$, obtained by vectorisation of $En_i^{\Delta T}$. It is interesting to note that, under this optimisation criterion, we have the following results:
1. Given ∼800 frames at 25 Hz, with $T$ about half a minute, the maximisation of the energy, subject to the minimisation of the entropy and to the minimisation of the energy peak distances, at each $\Delta T$, ensures that the motion of a congruous source is tracked.
2. For attention towards human motion, only scale and central frequency influence the energy response, while direction can be kept varying.
3. Human motion is located at the medium-low frequencies of the filter bank.
It follows that, choosing a Gabor filter of any direction, with a space-time central frequency of about 1/6 cycle/pixel, cycle/frame, and with minimal scale, if the optimisation criteria are satisfied along the whole path $T$ then arm-hand motion is very likely to be included in the peaked regions. Some experiments are shown in Figure 2 and compared with optical flow: 3D Gabor filtering can discriminate between hand-arm motion and the fan motion, even though the fan had two different velocities.
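A sketch of the energy computation of Eqs. (3)–(4), assuming a 16-frame volume and a bank built as above; the function names and the use of scipy.ndimage are illustrative choices, not the original code.

```python
import numpy as np
from scipy.ndimage import convolve

def motion_energy(volume, gabor_pairs):
    """Quadrature motion energy of Eq. (3): square and sum the even/odd filtered
    volume for each (G_E, G_O) pair of the bank. `volume` is one block V^{Delta T}."""
    return np.stack([convolve(volume, G_E, mode='constant') ** 2 +
                     convolve(volume, G_O, mode='constant') ** 2
                     for G_E, G_O in gabor_pairs])

def energy_peaks(En):
    """Peak voxels as in Eq. (4): values exceeding 4/(3N) of the summed energy."""
    return En >= 4.0 * En.sum() / (3.0 * En.size)
```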
3 Online classification of action primitives

Fig. 3: The figure illustrates the grabbed directions of some of the gesture primitives in pairs: Right (0°)-Left (180°), Up (90°)-Down (270°), Tilt 45°-Tilt 225°, Tilt 135°-Tilt 315°, Rotate (Right and Left), Grab and Release.

In the previous section we have specified how to obtain the region of interest (ROI), where arm-hand motion is identified, for each frame of a video. In this section we show how action primitives can be classified online. As gathered in the previous section, the optimisation criteria for identifying human motion were not tuned to direction: as long as a movement of the hand or arm is displayed, the motion direction varies continuously, and it is therefore less relevant than scale and frequency. However, once the motion energy, as the squared sum of the responses of the two filters in quadrature, has been obtained, it is possible, according to Heeger [17] (see also [18]), to recover the optical flow. Once scale and frequency have been selected for attention, any direction works well with them for online bottom-up attention. Therefore, to ease performance in gesture tracking and online classification, we have chosen a simple and well-performing optical flow algorithm, namely Horn and Schunck's algorithm [19]. Other methods, such as the Lucas-Kanade algorithm [20], variational optical flow [21] and Brox's optical flow [22], are either too demanding in terms of feature requirements or too computationally expensive. For example, Brox's algorithm slows computation down to 6 Hz, whereas real-time tracking of human motion requires 25 Hz.

Fig. 4: The figure on the left illustrates the likelihood computed in real time at each time step $t$ over a sequence of $T = 500$ frames. On the right, the likelihood trend for the action “Grab-Right-Release”.

Let $\langle V(x, y, t), U(x, y, t), t\rangle_{t=1,\ldots,T}$ be the optical flow vector, for each pixel in the ROI. The principal directions of the velocity vectors are defined as follows:

$$dir(x, y, t) = \frac{\pi}{2k}\left(\left\lceil \frac{k}{\pi}\arctan\!\left(\frac{V(x, y, t)}{U(x, y, t)}\right)\right\rceil + \left\lfloor \frac{k}{\pi}\arctan\!\left(\frac{V(x, y, t)}{U(x, y, t)}\right)\right\rfloor\right) \qquad (5)$$

hence, at each pixel $(x, y)$ in the ROI, at time $t$, the principal direction $\theta_j$ takes the following discrete values:

$$\theta_j = \pm\frac{(2j-1)\pi}{2k}, \quad j = 1, \ldots, 2k \qquad (6)$$

Here $k$ is half the number of required principal directions. We can note that the size of $dir(t)$ depends on the dimension of the ROI. In order to obtain a normalised feature vector $X(t) \in \mathbb{R}^{2k}$ we use a uniform kernel which, essentially, transforms $dir(t)$ into its histogram via a non-parametric kernel density. More specifically, let $n = 2k$, let $J$ be the indicator function, let $Y(t)$ be the vectorisation of $dir(t)$, with $m$ its size, and let $x$ be an element of a vector of size $n$, scaled between $\min(Y(t))$ and $\max(Y(t))$:

$$K(u) = \tfrac{1}{2}J(|u| \leq 1) \quad\text{hence, for } u = \frac{x - Y_s(t)}{h},\quad X(x, t) = \frac{1}{nh}\sum_{s=1}^{m}K\!\left(\frac{x - Y_s(t)}{h}\right) \qquad (7)$$

Here we have chosen $h = 1/2^{8}$. The obtained feature vector $X(t) \in \mathbb{R}^{2k}$ is then used for any further classification of primitive actions.

Table 1: Confusion matrix for 11 of the arm-hand primitives (Right, Left, Up, Down, Tilt 45°, Tilt 135°, Tilt 225°, Tilt 315°, Grab, Release, Rotate). Here false denotes a false positive gesture in the sequence.
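The descriptor of Eqs. (5)–(7) can be approximated by a normalised direction histogram over the flow field in the ROI. The NumPy sketch below treats the histogram as the uniform-kernel estimate; the function name and the binning details are assumptions.

```python
import numpy as np

def direction_descriptor(U, V, k=6):
    """2k-bin descriptor X(t) of Eqs. (5)-(7): quantise the per-pixel flow direction
    in the ROI into 2k principal directions and normalise the counts (the uniform
    kernel is approximated by histogram binning)."""
    theta = np.arctan2(V, U).ravel()                  # direction of each flow vector
    edges = np.linspace(-np.pi, np.pi, 2 * k + 1)     # 2k angular bins
    hist, _ = np.histogram(theta, bins=edges)
    return hist / max(hist.sum(), 1)                  # normalised feature vector X(t)
```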
Given a source of sequences of 100 arm-hand actions (gestures), we have defined 11 principal primitives. Figure 3 illustrates 5 pairs of primitives, together with grab and release. Once each frame is encoded into the above-defined descriptor, we can obtain parametric descriptors by estimating, for each primitive action, a mixture of Gaussians. For the estimation, in order to suitably assess the number of components of each mixture, we have used the spectral clustering algorithm [23]. Therefore, for each primitive action $A_s$, $s = 1, \ldots, M$, with $M = 11$ the number of primitive actions considered, a mixture of Gaussians $g_{A_s}$ is estimated, with parameters $(\mu_1, \ldots, \mu_m, \Sigma_1, \ldots, \Sigma_m, \pi_1, \ldots, \pi_m)$. The mixtures are directly used for classification. Given a video sequence of length $T$, we need to attribute a class $A_s$ to each feature descriptor $X_i$, $i = 1, \ldots, T$, $X_i = X(t_i)$, to establish the primitive actions appearing in the sequence. We note that, because of the low frequency of arm-hand motion, the same primitive holds for several frames; therefore it is possible to monitor the likelihood of a feature vector and search the Gaussian space only at specific break points, indicating a change of direction. Indeed, consider a buffer $B_T = (X_1, \ldots, X_T)$ of feature vectors, as defined in equation (7), obtained by coding $T$ frames, where $X(t_i) = X_i$ is the $i$-th descriptor, at time $t_i$, in the buffer. The posterior distribution of each primitive action, given each feature in the buffer, is estimated via the softmax function. Namely,

$$P(A_s\mid X_i) = \frac{\exp(\lambda_s)}{\sum_j \exp(\lambda_j)} \quad\text{with}\quad \lambda_s = \log\big(g_{A_s}(X_i\mid A_s)\,P(A_s)\big) \qquad (8)$$

Hence the observed primitive is classified as action $A_s$ if $P(A_s\mid X_i) > P(A_q\mid X_i)$ for any primitive $A_q$, $A_q \neq A_s$, with $g_{A_s} > \tau$, $\tau$ a threshold estimated in training according to the likelihood trend of each primitive. Now, given that, at time $t_0$, $A_s$ is chosen according to (8), the gradient of the likelihood is:

$$\nabla g_{A_s}(X_i) = \sum_{h=1}^{K} p_h(X_i)\,\pi_h\,\Sigma_h^{-1}(\mu_h - X_i) \qquad (9)$$

Here $K$ is the number of components of $g_{A_s}$, $\pi_h$ is the mixing parameter and $\Sigma_h^{-1}$ the precision matrix. As long as the likelihood increases along the direction of the gradient, the action shown must be $A_s$; as soon as the likelihood decreases, a change of direction is occurring. Therefore the next class has to be identified via (8), and again the gradient is monitored. At the end of the computation a sequence $\langle A_{s_1}: p_{s_1}, \ldots, A_{s_k}: p_{s_k}\rangle$ of primitive actions is returned. Each primitive in the sequence is labelled by the class posterior (according to (8)), computed at the maximum likelihood reached by the primitive action in the computation window of the gradient. In Figure 4 the likelihood trend of ten primitive gestures, over 500 frames, is shown. Table 1 illustrates the confusion matrix of the above-defined primitives, for an online sequence of 100 gestures. From the confusion matrix it emerges that it is quite unlikely that the system mismatches a direction. However, a weakness of the described online recognition algorithm is that it can recognise false directions even if they are not in the performed sequence. In any case, the accuracy of the whole system is around 80%, regardless of whether gestures are performed with varying speeds and by different actors.
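A minimal sketch of the classification rule (8) and of the likelihood gradient (9), assuming each primitive is stored as a list of Gaussian components; scipy.stats is used here in place of the original implementation, and the data layout is an assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify_primitive(x, mixtures, priors):
    """Softmax posterior of Eq. (8). `mixtures[a]` is a list of (pi_h, mu_h, Sigma_h)
    components for primitive a; `priors[a]` is P(A_s). Returns {a: P(a | x)}."""
    lam = {a: np.log(sum(p * multivariate_normal.pdf(x, mean=mu, cov=S)
                         for p, mu, S in comps) * priors[a] + 1e-300)
           for a, comps in mixtures.items()}
    vals = np.array(list(lam.values()))
    post = np.exp(vals - vals.max())
    return dict(zip(lam.keys(), post / post.sum()))

def likelihood_gradient(x, comps):
    """Gradient of the mixture likelihood, Eq. (9), monitored to detect a change of primitive."""
    g = np.zeros_like(x, dtype=float)
    for p, mu, S in comps:
        g += p * multivariate_normal.pdf(x, mean=mu, cov=S) * np.linalg.solve(S, mu - x)
    return g
```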
4 Actions: Learning and Imitation

According to the steps described in the previous sections, an action can be specified by a sequence of primitive gestures (arm-hand primitive actions). For example, the action manipulation can be specified by the sequence $\langle$Grasp Rot Rel$\rangle$. This sequence is recognised using the online estimation of the likelihood trend of each primitive action in the sequence, as gathered in the previous section. However, the same action manipulation can be described by $\langle$Grasp Rel$\rangle$ and by $\langle$Grasp Up Rot Down Rel$\rangle$ as well. Indeed, we consider a sequence of primitive gestures as a sample from an unknown regular language specifying an action. We make the hypothesis that for each arm-hand action there is a regular language $L(A)$ generating all the sequences (or words, or strings) that specify the action. It follows, by the properties of regular languages, that further complex actions can be obtained by composition; likewise, partial actions can be matched within more complex actions. We face the problem of learning a deterministic finite automaton (DFA) that recognises such a regular language. A DFA is defined by a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, with $q_0$ the initial state; $\Sigma$ is a finite input alphabet; $\delta : Q \times \Sigma \to Q$ is the transition function; and $F \subseteq Q$ is the set of final states (see e.g. [24]). The problem of inferring a regular language, described by a canonical DFA consistent with the given sample, has been widely studied; in particular, [25] have proposed the Regular Positive and Negative Inference (RPNI) algorithm to infer deterministic finite automata in polynomial time. Here a canonical representation of an automaton $A$ is a minimal representation $A'$ such that $L(A) = L(A')$, where $L(A)$ is the language accepted by the DFA $A$.

Fig. 5: On the left, the DFA of the action manipulation; on the right, its extended PDFA. Note that, because of the structure of $S^+$, $I(q_0) = 1$ and $P_F(q_2) = 1$. The probabilities inside the states indicate the probability of the state belonging to $Q_F$, the set of final states.

In this section we briefly show how, using the classification steps of the previous section, it is possible to build a positive and negative sample $(S^+, S^-)$ of an unknown regular language, such that the sample is structurally complete, that is, the words in the sample make use of all edges, states and final states of the DFA. We also provide a probabilistic extension of the finite automaton, using the annotation of the sequences. For each action $A$, to be learned using the 11 primitives, we define an ordering on the sequences starting with the minimal sequence (e.g. for the manipulation action, $\langle$Grasp Rel$\rangle$), and increase the dimension with repeated primitives. Whenever a sequence fails to be recognised, the sequence with the mismatched primitive is added to the negative sample. It follows that, according to the recognition performance of the system, we should have 80 positive and 20 negative instances over 100 words of a specific action. Since the positive sample is provided by a benign advisor, it must be structurally complete. Given $(S^+, S^-)$, the RPNI algorithm starts by constructing an automaton hypothesis $PT(S^+)/\pi_r$, where $PT(S^+)$ is the prefix tree acceptor of $S^+$. Here $\pi_r$ is a partition of the prefixes $Pr(S^+)$ of $S^+$, defined as $Pr(S^+) = \{u \in \Sigma^* \mid \exists v \in \Sigma^*,\, uv \in S^+\}$, where $\Sigma$ is the alphabet of $S^+$.
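Before turning to the probabilistic extension, a toy illustration of what such a DFA looks like for the manipulation action and of how acceptance of a primitive sequence is tested. The state names and edge layout below are assumptions loosely based on Fig. 5 and on the sample in Eq. (10); they are not the automaton actually inferred by RPNI.

```python
# Hypothetical 3-state DFA for the manipulation action (sketch, not the learned model).
DFA = {
    'start': 'q0',
    'final': {'q2'},
    'delta': {('q0', 'Grasp'): 'q1',
              ('q1', 'Up'): 'q1', ('q1', 'Down'): 'q1', ('q1', 'Rot'): 'q1',
              ('q1', 'Rel'): 'q2'},
}

def accepts(dfa, word):
    """Standard DFA acceptance: follow delta and test membership in the final states."""
    q = dfa['start']
    for a in word:
        if (q, a) not in dfa['delta']:
            return False
        q = dfa['delta'][(q, a)]
    return q in dfa['final']

print(accepts(DFA, ['Grasp', 'Down', 'Rot', 'Rel']))   # True  (a word of S+)
print(accepts(DFA, ['Grasp', 'Rel', 'Rot']))           # False (a word of S-)
```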
An example of a prefix tree, together with a merging transformation leading to the canonical automaton, is given in [25]. Figure 5 illustrates an automaton generated by a sample including the following sequences:

$$\begin{aligned} S^+ = \{&\langle\text{Grasp Rel}\rangle, \langle\text{Grasp Down Rel}\rangle, \langle\text{Grasp Down Rot Rel}\rangle, \langle\text{Grasp Down Up Rot Rel}\rangle,\\ &\langle\text{Grasp Rot Up Rel}\rangle, \langle\text{Grasp Rot Up Down Rel}\rangle, \langle\text{Grasp Up Down Rot Rel}\rangle\}\\ S^- = \{&\langle\text{Rel Rel}\rangle, \langle\text{Grasp Grasp}\rangle, \langle\text{Grasp Rel Rot}\rangle, \langle\text{Grasp Rel Up}\rangle\} \end{aligned} \qquad (10)$$

Probabilistic extensions of DFAs have been treated both from the point of view of probabilistic acceptors [26] and from the point of view of automata as generative models of stochastic languages [27, 28]. Here, instead, we have assumed that the negative sample comes from the distribution of failures on single elements of the alphabet, and we use the distribution on the primitive actions to compute the distribution induced by the identified automaton. Following [27, 28], we define the extension $PA$ of a DFA $A$ as $A$ together with the functions $I_A : Q_{init} \to [0, 1]$, $P_A : \delta \to [0, 1]$ and $F_A : Q_F \to [0, 1]$, where $Q_{init} \subseteq Q$ is the set of initial states and $Q_F \subset Q$ is the set of final states. If $w$ is a word accepted by $A$ then there exists at least one path $\theta \in \Theta_A$, with $\Theta_A$ the set of paths to final states, from some $s_0 \in Q_{init}$ to some $s_k \in Q_F$, and the probability of $\theta$ is:

$$Pr_A(\theta) = I_A(s_0)\prod_{i=1}^{k}P_A(\delta(s_{i-1}, A_i))\,F_A(s_k) \qquad (s_i \text{ is a current state in the path}) \qquad (11)$$

Thus, given a normalisation constant $\alpha$, the probability of generating a word $w$ is

$$Pr_A(w) = \alpha\sum_{\theta\in\Theta_A}Pr_A(\theta) \qquad (12)$$

It follows that if an action $w$ is parsed by $A$ then there exists, among the valid paths, the most probable one, according to the probabilities estimated in classification, as in HMMs.

Fig. 6: iCub repeats actions performed by the demonstrator.

In order to add probabilities to states and transitions we proceed as follows. Let $S = (S^+, S^-)$ and let $PT(S^+)$ be the prefix tree acceptor of $S^+$. Let $h^F_j = \#\{A_j \in \Sigma \mid A_j \text{ occurs as last symbol of } w,\ w \in S^+\}$ and $h^I_j = \#\{A_j \in \Sigma \mid A_j \text{ occurs as first symbol of } w,\ w \in S^+\}$. Now, let $f_{jk} = \#\{q_k \mid \delta(q, A_j) = q_k,\ A_j \in \Sigma\}$ and $l_{jk} = \#\{A_j \mid (q_k, A_j) \in \delta,\ q_k \in Q_{init}\}$. Then we define:

$$F_A^{\star}(q_k) = \frac{\sum_j f_{jk}}{\sum_j h^F_j} \quad\text{and}\quad I_A^{\star}(q_k) = \frac{\sum_j l_{jk}}{\sum_j h^I_j} \qquad (13)$$

Here $F_A^{\star}$ and $I_A^{\star}$ are intermediate estimations. We recall that each word in $S^+$ is a labelled sequence according to the classification step; namely, if $w \in S^+$ then $w = A_{j_1}: p_{j_1}, \ldots, A_{j_n}: p_{j_n}$. Now, for each branch in $PT(S^+)$ we construct a transition matrix $U_k$ such that the dimension of $U_k$ is $|Q^{\star}_A| \times |\Sigma|$, with $|Q^{\star}_A| = m$ the number of states in $PT(S^+)$. Here an element $u_{ij}$ of $U_k$ indicates the transition to state $q_i$, in the $k$-th branch, of the symbol $A_j$ (i.e. of primitive action $A_j$); in other words, it indicates the position of $A_j$ in the sequence accepted by the $k$-th branch, since by construction $q_i$ is labelled by the prefix of the sequence up to $A_j$. Thus $u_{ij} = 0$ if there is no transition of $A_j$ to $q_i$, and $u_{ij} = p_{ij}$ if $A_j$ labels the transition to $q_i$ in $PT(S^+)$. Then all these transition matrices are added and normalised. For the normalisation we build a matrix $Z$ formed by repetition of a vector $V$, that is, $Z = V \otimes \mathbf{1}^{\top}_n$, $n = |\Sigma|$. Thus, let $H = \sum_k U_k$, with $U_k$ the matrix of the $k$-th branch of the $PT(S^+)$ tree; we define $V_U = \sum_j H_j$, that is, the sum of $H$ over the columns.
Then $U = H./Z$, with $Z$ the normalisation matrix defined above, and $./$ the element-wise division between elements of $H$ and elements of $Z$. Now, in order to obtain the transition probability matrix for $A$, at each merge step of state $i$ and state $j$ of the RPNI, assuming $i < j$, we have to eliminate a row $u_j$. To this end we first obtain the new row $u_i^{new} = (u_i + u_j)/2$ and then we can cross out $u_j$. It follows that the new matrix $U^{new}$ is $(m-1) \times n$, with $n$ the cardinality of $\Sigma$, and it is still stochastic. As this process is repeated for all merging operations of the RPNI algorithm, in the end the last $U^{new}$ obtained will be a stochastic matrix with the right number of transitions. At this point we are left with the two diagonal matrices $F_A^{\star}$, $I_A^{\star}$ and the matrix $U^{new}$. We define three new vectors, which all have the same dimension:

$$V_F = \mathrm{diag}(F_A^{\star}), \qquad V_I = \mathrm{diag}(I_A^{\star}), \qquad V_{\delta} = \sum_j U^{new}_{j} \qquad (14)$$

Let $Z = V_F + V_{\delta}$. We can finally define $F_A = F_A^{\star}./Z$, $P_A = U^{new}./Z$ and $I_A = I_A^{\star}/\sum V_I$. It follows that the requirement for the DFA $A$ to be a PFA, namely

$$\sum_{q\in Q_A} I_A(q) = 1, \qquad F_A(q) + \sum_{A\in\Sigma} P_A(\delta(q, A)) = 1 \quad \forall q \in Q_A \qquad (15)$$

is satisfied; see Figure 5. We can note that, by the above construction, each transition $\delta$ is labelled according to the sample mean of each primitive action in the sequences mentioned in $S^+$. Figure 6 shows sequences of learning and imitation.

5 Conclusions and acknowledgements

We have described an original method, as far as we know, within the process of real-time recognition and learning of direction-based arm-hand actions. Our main contribution is an incremental method that, from attention to human motion to the inference of a DFA, develops a model of some specific human behaviour that can be used to learn and recognise more complex actions of the same kind. For this early system we have chosen simple primitive gestures, easily learnable and recognisable with low computational cost. In order to enable the robot to repeat the action, we have extended the DFA to a probabilistic DFA, which generates, together with a language, a distribution on it. Following the properties of regular languages, it is possible to provide a real-time learning system that can infer more complex actions. We have finally implemented the system, about which we have shown some specific performance results, on the iCub, and verified that the set of demonstrated actions has been learned and replicated efficiently. The research is supported by the EU project NIFTI, n. 247870.

References 1. Mitra, S., Acharya, T.: Gesture recognition: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C 37(3) (2007) 311–324 2. Wildes, R.P., Bergen, J.R.: Qualitative spatiotemporal analysis using an oriented energy representation. In: ECCV ’00. (2000) 768–784 3. Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of motion. J. of the Optical Society of America A 2(2) (1985) 284–299 4. Braddick, O., O'Brien, J., Wattam-Bell, J., Atkinson, J., Turner, R.: Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Current Biology 10 (2000) 731–734 5. Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding 104(2-3) (2006) 90–126 6. Poppe, R.: A survey on vision-based human action recognition. Image and Vision Computing 28 (2010) 976–990 7.
Aggarwal, J.K., Cai, Q.: Human motion analysis: A review. Computer Vision and Image Understanding 73 (1999) 428–440 8. Bobick, A.F.: Movement, activity, and action: the role of knowledge in the perception of motion. Philosophical Transactions of the Royal Society of London 352 (1997) 12571265 9. Forsyth, D.A., Arikan, O., Ikemoto, L., O’Brien, J.F., Ramanan, D.: Computational studies of human motion: Part 1, tracking and motion synthesis. Foundations and Trends in Computer Graphics and Vision 1(2/3) (2005) 10. Kürger, V., Kragic, D., Geib, C.: The meaning of action a review on action recognition and mapping. Advanced Robotics 21 (2007) 1473–1501 11. Casile, A., Dayan, E., Caggiano, V., Hendler, T., Flash, T., Giese, M.A.: Neuronal enc. of human kinematic invariants during action obs. Cereb Cortex 20(7) (2010) 1647–55 12. Gabor, D.: Theory of communication. J. IEE 93(26, Part III) (1946) 429–460 13. Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America 2(7) (1985) 1160–1169 14. Jones, J.P., Palmer, L.A.: An evaluation of the two-dimensional gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology 58 (1987) 1233–1258 15. Daugman, J.G.: Complete discrete 2-d Gabor tansforms by neural networks for image analysis and compression. IEEE Trans. on ASSP 36(7) (1988) 1169–1179 16. Wasserman, L.: All of Nonparametric Statistics. Springer (2005) 17. Heeger, D.J.: Optical flow using spatiotemporal filters. International Journal of Computer Vision 1(4) (1988) 279–302 18. Watson, A.B., Ahumada, A.J.J.: Model of human visual-motion sensing. Journal of the Optical Society of America A: Optics, Image Science, and Vision 2(2) (1985) 322–342 19. Horn, B.K.P., Shunk, B.G.: Determining optical flow. Art. Intel. 17 (1981) 185–203 20. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. Proc. of DARPA Imaging Understanding Work. (1981) 121–130 21. Bruhn, A., Bruhn, J., Feddern, C., Kohlberger, T., Schnrr, C. In: Lecture Notes in Computer Science. Volume 2756. Springer, Berlin (2003) 222–229 22. Brox, T., Bruhn, A., Papenberg, N., Weickert, J. In: Lecture Notes in Computer Science. Volume 3024. Springer, Berlin (2004) 25–36 23. Luxburg, U.V.: A tutorial on spectral clustering. Statistics and Comp. 14 (2007) 395–416 24. Hopcroft, J., Ullman, J.: Introduction to Automata Theory Languages and Computation. Addison Wesley (1979) 25. Oncina, J., Garca, P.: -. In: Identifying regular languages in polynomial time. World Scientific Publishing (1992) 26. Rabin, M.O.: Probabilistic automata. Information and Control 6(3) (1963) 230–245 27. Vidal, E., Thollard, F., de la Higuera, C., Casacuberta, F., Carrasco, R.C.: Probabilistic finitestate machines-part i-ii. IEEE Trans. Pattern Anal. Mach. Intell. 27(7) (2005) 1013–1039 28. Dupont, P., Denis, F., Esposito, Y.: Links between probabilistic automata and hidden markov models: probability distributions, learning models and induction algorithms. Pattern Recognition 38(9) (2005) 1349–1371 Learning Saliency. An ICA based model using Bernoulli mixtures. Andrea Carbone and Fiora Pirri Abstract In this work we present a model of both the visual input selection and the gaze orienting behaviour of a human observer undertaking a visual exploration task in a specified scenario. 
Our method built on a real set of gaze tracked points of fixation, acquired from a custom designed wearable device [18]. By comparing these sets of fovea-centred patches with a randomly chosen set of image patches, extracted from the whole visual context, we aim at characterising the statistical properties and regularities of the selected visual input. While the structure of the visual context is specified as a linear combination of basis functions, which are independent hence uncorrelated, we show how low level features affecting a scan-path of fixations can be obtained by hidden correlations to the context. Samples from human observers are collected both in free-viewing and surveillance-like tasks, in the specified visual scene. These scan-paths show important and interesting dependencies from the context. We show that a scan-path, given a database of a visual context, can be suitably induced by a system of filters that can be learned by a two stages model: the independent component analysis (ICA) to gather low level features and a mixtures of Bernoulli distributions identifying the hidden dependencies. Finally these two stages are used to build the cascade of filters. 1 Introduction Perceptual (biological) systems are designed through natural selection; evolved optimally in response to the distribution of natural visual cues perceived from the environment. The knowledge of the statistical properties and regularities of the visual environment is a pivotal step in the understanding of the nature of visual processing [11, 7, 12]. Most natural visual tasks involve selecting a certain amount of locations in the visual environment to fixate. This visual scanning of the world - the Andrea Carbone Dept. of Computer and System Sciences, Sapienza, e-mail: [email protected] Fiora Pirri Dept. of Computer and System Sciences, Sapienza, e-mail: [email protected] 1 2 Andrea Carbone and Fiora Pirri scan-path - is performed by human beings in a very efficient way by programming a sequence of saccades on the visual array in order to project the selected spatial focus of attention onto the higher resolution area of the retina (fovea). The strategy followed by humans in deploying the mechanism of visual attention has been subject of research in neuroscience, cognitive science and lately computer vision. It has inspired novel biologically based methods for image compression, visual search, vision based navigation and all the areas of research in artificial systems where a preliminary selection of the area of interest, in a restricted portion of the input, helps in reducing the complexity of a generic further processing. The principle, underlying this approach, relies on a generic notion of visual saliency. This notion presupposes that the visual interestingness of a scene is a quantifiable entity encoding the task-relevant, context-based information embedded in the visual world. In general, saliency has been modelled as a function on some feature space computed on the image. Several approaches have considered the problem of quantifying, in a biological justified framework, a measure of visual salience. These include the approaches inspired by the Feature Integration Theory [31], and engineered to model the natural competition between bottom-up cues, such as local measure of centre-surround contrast on feature channels (i.e. orientation, colour opponency, luminance) [16][8]. Likewise a measure of salience is obtained by modelling features tuned to specific visual search tasks [33]. 
We recall also the approaches accounting for a top-down bias towards the current task, like spatio-temporal locations or high-level cues [32]. In Section 4 we further compare with other approaches. Our approach exploits the sparseness of the distribution of the selected fixations of a human scan-path according to a set of linear filters generated by ICA decomposition of the visual environment. By purposely choosing a threshold over the sparse feature vector characterising the scan-path, we map the originally continuous responses to their corresponding binary (discrete) representations, from which a mixture of multivariate Bernoulli distributions is estimated. The goal of the mixture is to capture the residual dependencies existing between active responses elicited by the visual selection of the observer. The model that we derive from the mixture computes a saliency map of the image, where high saliency values represent regions which are likely to have a statistical similarity to the ones that occurred in the reference scan-path.

2 A new approach to visual gaze modelling

The above considerations motivate our approach to the problem of characterising the visual behaviour of an observer. The goal of this work is to investigate the nature of visual selection, in terms of its statistical properties, when it is projected onto an ICA-estimated feature space. The steps of our approach are:
a. Sampling and modelling the visual context: build a set of images carrying the information of the visual content of a specific scenario context. The context model is then derived by computing basic linear ICA on a set of randomly selected samples from the database.
b. Recording and projecting the scan-path on the context bases: the actual gaze behaviour of a freely viewing subject is represented as a collection of gaze-centred image patches. The scan-path is then projected on the ICA feature set computed in the first step.
c. Prototyping the fixation sequence: estimate the hidden dependencies via a mixture of Bernoulli distributions on a thresholded version of the ICA-projected scan-path.
d. Synthesis of a computational system: build the filter cascade by transforming the mixture parameters into a computational model that can be used to predict the learned gaze behaviour.

Fig. 1: A small collection of images sampled from the internals of the building hosting our department.

Sampling the visual context. Our work is closely related to the natural image statistics domain [14]. In the literature, natural images are defined as “those that are likely to have similar statistical structure to that which the visual system is adapted to, during its evolution” [9, 20, 10]. The term natural in our context may sound misleading, as it generally refers to collections of images of natural landscapes. In our scope we consider natural images as those characterising the visual context of the observer, for example a collection of pictures of the internals of a building, or the visual landscape of a surveillance inspector. See Figure 1 for an example.

Modelling the visual context. ICA and sparse coding. A large amount of literature deals with the concepts of sparseness, efficient coding and blind source separation. These three aspects are intimately related to each other [23]. Sparseness is a statistical property meaning that a random variable takes both small and large values more often than a normal density with the same variance.

Fig. 2: A subset of the 256 linear ICA features computed from a set of 25,000 random patches sampled from the global visual context database.
A sparse code, then, represents data with a minimum number of active units. The typical shape of a sparse probability density shows a peaked profile around zero and long, heavy tails. The sparseness of the response of cortical cells to the visual input [6] suggests the adoption of a computational framework suitable to discover the latent factors that form the basis of an alternative space for properly encoding the visual data. The generative model we use in this work is linear independent component analysis, or ICA [15]. Linear independent component analysis models linear relations between pixels. In this model, any (grey-level) image patch $I(x, y)$ can be expressed as a linear combination of basis vectors $B_i$ (sometimes called the mixing matrix):

$$I(x, y) = \sum_{i=1}^{n} B_i(x, y)\,s_i \qquad (1)$$

where the $s_i$ are the stochastic coefficients, different for each image. The $s_i$ can be computed from the image by inverting the mixing matrix:

$$s_i = \sum_{x,y} W_i(x, y)\,I(x, y) \qquad (2)$$

The $W_i$ are called features or coefficients (because of the simple linear operation between coefficients $s_i$ and features $W_i$). An example of computed ICA features can be seen in Fig. 2. The $s_i$ are scalar values, sparsely distributed (non-Gaussian). The coefficients $W_i$ resemble the organisation of the simple cells in the primary visual cortex V1 [19, 21] (i.e. a set of oriented, localised, bandpass filters). The linear ICA computations presented in this work were realised with the FastICA package [13].

Sampling the gaze. The scan-path. A gaze-tracked sample $g^{(i)}$ is acquired at each frame. The $i$-th gaze sample is defined as the triple:

$$g^{(i)} = \langle p^{(i)}, t^{(i)}, f^{(i)}\rangle \qquad (3)$$

Here $p^{(i)}$ denotes the $(x^{(i)}, y^{(i)})$ image-plane coordinates of the gaze point, $t^{(i)}$ is the time-stamp (in milliseconds) and $f^{(i)}$ the frame index. The full set of gaze samples is defined as:

$$G = \{g^{(1)}, g^{(2)}, \ldots, g^{(k)}\} \qquad (4)$$

Here $k$ is the number of samples taken. As we are interested in analysing the information sampled at the centre of gaze during a fixation, we proceed to filter out from the set $G$ all those samples that are likely to occur during a saccade (the rapid eye movement between two consecutive fixations), via a non-parametric clustering problem. We borrow from Duchowski the definition of fixation as a sustained persistence of the line of sight in time and space [5]. In practice, a fixation is the centre of a spatial and temporal aggregation of samples in a given neighbourhood. We use the mean shift algorithm on the feature space spanned by the vectors in $G$ (except the frame-index information, which is not useful to cluster together samples belonging to the same fixation). A similar approach has been presented in [28]. The output of the mean shift is a set of samples described by $H = \langle c, V\rangle$, where $c$ denotes a centre $(x, y, t)$ resulting from the mean shift and $V$ is the patch centred in $c$. Therefore the full scan-path sequence

$$F = \{H^{(1)}, H^{(2)}, \ldots, H^{(l)}\} \qquad (5)$$

mentions only samples classified as fixation points. Results are shown in Fig. 4. Figure 3 shows a subset of gaze-centred patches from a scan-path.

Fig. 3: A sample set out of the 470 filtered patches taken around fixations.
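A sketch of the fixation-filtering step just described, using scikit-learn's MeanShift in place of the authors' own mean-shift implementation; the bandwidth value and the assumed layout of the gaze samples are illustrative.

```python
import numpy as np
from sklearn.cluster import MeanShift

def fixations_from_gaze(G, bandwidth=30.0):
    """Cluster gaze samples (x, y, t) with mean shift and keep the cluster centres
    as fixation points, as in the construction of F in Eq. (5). The frame index is
    dropped before clustering, as in the paper. G is assumed to be a list of
    ((x, y), t, f) triples."""
    X = np.asarray([(x, y, t) for (x, y), t, _ in G])     # drop the frame index f
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    return ms.cluster_centers_                            # one centre c per fixation
```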
Scan-path projection on the context bases. Our model, defined over two stages, that is, the ICA model and the filter structure, is motivated by the recent works [22, 17]. In these approaches, however, only the correlation is identified, whereas here we show the strong relation between context and free gaze motion. The visual context model is defined by the mixing matrix inverse $W \in \mathbb{R}^{256\times1024}$, obtained from a specified context. A patch is a $32 \times 32$ grey-level image $I$. A scan-path, such as the one illustrated in Figure 3, is a filtered (with respect to motion blurring) sequence of $N > 400$ patches $I_i$, $i = 1, \ldots, N$. From each scan-path the stochastic coefficients $s_{ij}$ are obtained as in Eq. (2); in particular:

$$s_{ij} = W_j\,\hat{I}_i \qquad (6)$$

where $\hat{I}_i$ is the $i$-th patch transformed into a $1024 \times 1$ vector and $W_j$ is the $j$-th row, of dimension $1 \times 1024$, of the matrix $W$. Let $(s_{i1}, \ldots, s_{i256})^{\top}$ be the stochastic coefficients obtained for each $i$-th patch $I_i$. Thus a matrix $S$, of dimension $N \times 256$, of these stochastic coefficients is obtained; this is illustrated in Figure 5. The interest of these coefficients is to induce, via the second level of our model, a system of filters that can reproduce human fixations, given a context. That is, the system of filters generated by our model, given a context, will highlight the regions that have been fixated by some scan-path.

Fig. 4: Left: the gaze-machine used to acquire the scan-path. Right: the mean-shift clustered fixation points (in red) superimposed on the plot of the full gaze track (continuous line). The X, Y axes refer to the spatial image coordinates. The T axis represents the time-stamp, in milliseconds, of the gaze sample.

Fig. 5: Stochastic coefficients obtained from a scan-path using the mixing matrix inverse from the context.

Fig. 6: A short sequence of generated saliency maps, superimposed on the corresponding images.

Mixtures of Bernoulli. Given the scan-path generated coefficients, these are thresholded to obtain a mixture of Bernoulli distributions. A threshold is defined as follows. Let $S$ be the $N \times 256$ coefficient matrix, each element of which is specified by equation (6), and $s_i = (s_{i1}, \ldots, s_{i256})$ its $i$-th row, which we consider as a multivariate random variable. Then $\tau$ is an optimal threshold, such that $s_{ij} = 1$ or $s_{ij} = 0$, if for each row $i$ the entropy of the multivariate Bernoulli $\hat{s}_i$ is minimised. That is, if $f(s_i, \tau) = \hat{s}_i$ then

$$\arg\min_{\tau}(f, \tau) = -\sum_j p(f(s_{ij}, \tau)\mid\mu_i)\,\log p(f(s_{ij}, \tau)\mid\mu_i) \qquad (7)$$

Here $p(x_i\mid\mu_i)$, with $\mu_i = (\mu_{i1}, \ldots, \mu_{i256})^{\top}$, is a Bernoulli distribution with parameter $\mu_i$, with $x_i$ a multivariate of dimension $D$ whose values are 0 or 1:

$$p(x_i\mid\mu_i) = \prod_{j=1}^{D}\mu_{ij}^{x_{ij}}(1-\mu_{ij})^{(1-x_{ij})} \qquad (8)$$

Here $D = 256$, indeed. In other words, entropy minimisation ensures that the information carried by the stochastic variable $s_i$ is not wasted by the transformation into a Bernoulli variable $\hat{s}_i$. The computation of the threshold $\tau$ is achieved by an iterative procedure that initialises $\tau$ to the sample mean of the multivariate $s_i$, for each row $i$, and further uses a classical gradient ascent method to find the $\tau$ that minimises the entropy of each obtained Bernoulli multivariate, given the local minimal thresholds $\tau_i$. A mixture of Bernoulli multivariates $\hat{s}_i$ is defined as

$$b(\hat{s}_i\mid\mu, \pi) = \sum_{k=1}^{K}\pi_k\,p(\hat{s}_i\mid\mu_k) \qquad (9)$$

Here $\pi_k$ is the weight of the $k$-th mixture component, with $\sum_k \pi_k = 1$, and $p(\hat{s}_i\mid\mu_k)$ is the Bernoulli distribution as specified in equation (8).
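The mixture of Eq. (9) is fitted by EM on the binarised coefficient matrix; the paper cites [2] for the algorithm. The sketch below is the standard EM for Bernoulli mixtures, written in Python/NumPy rather than the authors' Matlab code; the number of components and the initialisation are illustrative.

```python
import numpy as np

def bernoulli_mixture_em(X, K=6, iters=50, seed=0):
    """EM for a mixture of multivariate Bernoullis (Eqs. 8-9).
    X is the binarised N x 256 coefficient matrix; returns mixing weights and means."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = rng.uniform(0.25, 0.75, size=(K, D))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to pi_k * p(x_n | mu_k)
        log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update component means and mixing weights
        Nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-4, 1 - 1e-4)
        pi = Nk / N
    return pi, mu
```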
The number of components 8 Andrea Carbone and Fiora Pirri of the mixture have been estimated analogously, using the entropy minimisation criterion, but on the mixture b. In fact, as the number of components of the mixture increases, the distribution tends to a uniform distribution, in so maximising the entropy. The optimal size for the samples considered was given by either K = 6 or K = 7 components. The parameters of the mixture have been estimated implementing, in Matlab, the estimation-maximisation (EM) algorithm for Bernoulli mixtures as reported in [2]. Building the model. The parameters interesting for our models are, indeed, both the weights and the mean vectors. Thus, for our samples case, these are the expectation vectors µk = (µk1 , . . . , µk256 ), k = 1, . . . , K, and the priors πk , both estimated by the EM. These parameters tell, in fact, the location of the dependency of the si from the context, which is coded in the mixing matrix inverse W . In fact, it is rational to expect that whenever a µi j , related to the si via the ŝi , has a high value, then the information of the associated channel in W has a specific importance, as a context, on the scan-path coefficient. Therefore, we choose from each of the K (number of mixture components) means µk = (µk1 , . . . , µk256 ) those values which are greater than a specified threshold according to the fact that for each component k = 1, . . . , K an approximately uniform number of µk j , k = 1, . . . K, j = 1, . . . , 256, is chosen. More specifically, let σi be a threshold and nk = #{µik |µik > σk }/D, with D = 256. We aim at choosing for each component a similar number of channels. The maximum entropy principle ensures to obtain a uniform distribution on these numbers. Thus, let h(nk , σk ) be a function that, given the K, µk = (µk1 , . . . , µk256 ), returns for each µk the number of µk j chosen in each vector, according to a given threshold σk , k = 1, . . . , K. Then we want to choose a threshold that makes the choice unbiased, that is, it does rely on the principle of maximum entropy or Laplace’s principle of indifference: K arg max(h, σ ) = − ∑ h(nk , σk ) log h(nk , σk ) σ (10) k=1 Now, let the chosen mean elements of each mean vector µk , of the k-th component, be denoted by µ̂k . The chosen means, as gathered above, indicate the important context dependency, according to the channel. In fact, each chosen µk j is related to a specific channel of the matrix W , the mixing matrix inverse, that specifies the linear dependency between the images of the context and the basis functions. Let (k j1 , . . . , k jm ), k = 1, . . . , K, be the indices of the channels of W corresponding to the selected µˆk , then for each component k we can establish the following correspondence: W k = (Wk j1 , . . . ,Wk jm ) iff µ̂k = (µk j1 , . . . , µk jm ) (11) Here W k is a sub-matrix of W , in which only the channels (k j1 , . . . , k jm ), k = 1, . . . , K have been chosen and each Wk ji , i = 1, . . . , m, is a column vector of the matrix W k . Hence, for each component k, the matrix W k is formed by those channels corresponding to the chosen µ̂k . Learning Saliency. An ICA based model using Bernoulli mixtures. 9 We are now ready to build the filtering system that shall lead to the construction of the scan-path saliency map. Fig. 7 The computational model induced by the learned mixture of Bernoulli distribution. 
Each of the K channels is characterised by a combination of a subset of the original ICA features. Q_k is the output of the k-th channel. The saliency map is the weighted sum of the channels' outputs via the mixture mixing coefficients π.

Let (I_1, \ldots, I_N) be a sequence of images taken from the context. The saliency map induced by the context is defined as follows, for each image I_r, r = 1, \ldots, N:

Q_{rk} = \prod_i (I_r \star W_{k j_i}), \qquad \Delta_r = \sum_k \pi_k \| Q_{rk} \|    (12)

Here \|Q_{rk}\| is the normalised version of Q_{rk}, r indicates the r-th image in the context sequence, and i = 1, \ldots, m ranges over the channels of W chosen for the k-th component by \hat{\mu}_k; thus Q_{rk} is the input image linearly filtered by correlation with the features selected for the k-th channel. Finally, \Delta_r is the saliency map of the image I_r. To obtain the results illustrated in Figure 6, a Gaussian filter is applied to the saliency map, to eliminate the noise induced by the product and sum in Eq. (12), and the saliency map is summed to the current frame H^{(i)}.

3 Experiments

The first step we followed in order to model the experimental visual environment was to collect a set of views taken from the interior of a building. In this work we have chosen to take pictures of the Department of Computer and System Sciences building in Rome. We collected a set of 126 images representing the global content of visual information that people visiting or working in the building are likely to sense. The views contain sample images of different sub-contexts: corridors, rooms, laboratories, closets, doors. Pictures depicting the same sub-context were taken at different scales (i.e. from closer or farther viewpoints) and at different angles. Fig. 1 shows a subset of pictures selected from the database and Fig. 2 shows the corresponding ICA decomposition. Subjective scan-paths were recorded from three human subjects instructed to perform a two-stage task: initially a free-viewing behaviour inside a room, and subsequently walking out of the room and following a path in a corridor-like environment.

We tested the quality of the saliency map computed by the model by comparing the cumulative saliency score of the human scan-paths with the one computed from a randomly generated scan-path. The saliency score Sal_f related to a specific fixation and a saliency map \Delta is defined as a Gaussian-weighted sum of the local saliency values on a neighbourhood (of the same size as the original patches) of the fixation coordinates. The cumulative saliency score induced by a scan-path is the sum of the individual scores over all of its fixations. The results clearly show that our model rewards human-performed scan-paths, whereas random sequences of fixations obtain much lower scores. This result is in line with those obtained, for example, in [25], where the statistics computed over a random set of fixations are compared to those of a natural one, yielding different results. The second evaluation measure takes into account the average Euclidean distance of the fixation point from the location of maximal saliency.
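As a rough illustration of how the map of Eq. (12) and the fixation scores used in this comparison could be computed, the following Python sketch uses scipy's correlate2d and gaussian_filter. The max-abs normalisation standing in for \|\cdot\|, the Gaussian widths and the window size are assumptions made for illustration; `filters_per_channel` is a hypothetical list of the ICA filters selected for each of the K channels.

```python
import numpy as np
from scipy.signal import correlate2d
from scipy.ndimage import gaussian_filter

def saliency_map(image, filters_per_channel, pi, blur_sigma=3.0):
    """Eq. (12): each channel multiplies the correlation responses to its selected
    ICA filters; the map is the pi-weighted sum of the normalised channel outputs,
    smoothed with a Gaussian filter as described above."""
    delta = np.zeros_like(image, dtype=float)
    for pi_k, filters in zip(pi, filters_per_channel):
        q_k = np.ones_like(image, dtype=float)
        for w in filters:                      # w is a 32x32 ICA feature W_{k j_i}
            q_k *= correlate2d(image, w, mode="same", boundary="symm")
        q_k /= np.abs(q_k).max() + 1e-12       # normalised channel output ||Q_rk||
        delta += pi_k * q_k
    return gaussian_filter(delta, blur_sigma)

def fixation_score(delta, x, y, patch_size=32, sigma=8.0):
    """Gaussian-weighted sum of the saliency values around a fixation (x, y)."""
    h = patch_size // 2
    y0, y1 = max(int(y) - h, 0), min(int(y) + h, delta.shape[0])
    x0, x1 = max(int(x) - h, 0), min(int(x) + h, delta.shape[1])
    ys, xs = np.mgrid[y0:y1, x0:x1]
    g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return float((g * delta[y0:y1, x0:x1]).sum())

def cumulative_score(delta_maps, fixations):
    """fixations: list of (frame_index, x, y); a random scan-path for comparison
    can be built by drawing (x, y) uniformly over the image plane."""
    return sum(fixation_score(delta_maps[f], x, y) for f, x, y in fixations)
```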
Fig. 8 Human vs random results: saliency score and distance to the maximum saliency value.

4 Related Works and Conclusions

The role that central vision¹ plays in visual processes is intimately linked to the understanding of the relationships between action, visual-environment statistics, previous knowledge and the actual scan-path performed (i.e. the sequence of spatio-temporal fixations).

¹ The highly detailed visual information projected on the neighbourhood of the centre of gaze during a fixation.

To the best of our knowledge, the methodology closest to our approach can be found in [27] and [26], where fixations are collected from a moving observer immersed in a virtual environment whose second-order statistics resemble those measured in real natural environments². Bruce and Tsotsos [3, 4] propose a bottom-up strategy relying on a definition of saliency aimed at maximising Shannon's information measure after ICA decomposition. They use a database of patches randomly sampled from a set of natural images. The saliency model is then validated against eye-tracking data captured in laboratory experiments (recorded video and still images). In [24] the root-mean-square contrast is evaluated on a set of fixation points produced by an observer looking at static natural images (in this case natural landscapes); a saliency model is then derived based on the minimisation of the total contrast entropy. Reinagel and Zador [25] study the effect of visual sampling by analysing contrast and grey-level correlations in the foveal and parafoveal³ regions. In [34], the authors model the distribution of contrast and edges on gaze-centred image patches with a Weibull probability density function, under the assumption that in a free-viewing context our gaze is drawn toward image regions whose local statistics differ from the rest of the image. Tatler and Baddeley [1] discuss in depth which characteristics are most likely to influence the choice of the regions to fixate; they focus on local statistics of luminance, contrast and edges, and the derived model highlights a preference for high-frequency edges. In [29] the authors examine the characteristics of the point of fixation conditioned on the magnitude of the saccade performed. Second-order statistical regularities emerging in categories of natural images can be exploited as descriptors for classifying the kind of environment depicted [30].

We have shown an approach aimed at modelling the visual selection of a generic observer from a real scan-path performed in a given environment. The rich information content of the visual environment is encoded in a set of feature bases which capture the linear correlations between the images. We then project the actual scan-path onto the feature space representing the context. The research is supported by the EU project NIFTI, n. 247870.

References 1. Baddeley, R., Tatler, B.: High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research 46, 2824–2833 (2006) 2. Bishop, C.M.: Pattern recognition and machine learning (information science and statistics) (2006) 3. Bruce, N., Tsotsos, J.K.: An information theoretic model of saliency and visual search. Lecture Notes in Computer Science 4840, 171 (2007) 4. Bruce, N.D.B., Tsotsos, J.K.: Saliency, attention, and visual search: An information theoretic approach. J. Vis.
9(3), 1–24 (2009) 5. Duchowski, A.: Eye Tracking Methodology: Theory and Practice. (2007) 6. Field, D.: What is the goal of sensory coding? Neural Computation (1994) 7. Field, D.J.: Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am A 4(12), 2379–2394 (1987) 2 3 The so called 1/ f noise. Parafovea is the area sensed at minor resolution surrounding the fovea 12 Andrea Carbone and Fiora Pirri 8. Frintrop, S., Klodt, M., Rome, E.: A real-time visual attention system using integral images. Proc. of ICVS (2007) 9. Geisler, W.S.: Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192 (2008) 10. Geisler, W.S., Ringach, D.: Natural systems analysis. Visual neuroscience 26(1), 1–3 (2009) 11. Gibson, J.J.: Perception of Visual World. (1966) 12. Gibson, J.J.: The Senses Considered as Perceptual Systems. (1983) 13. Hyvarinen, A.: Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 10(3), 626–634 (1999) 14. Hyvarinen, A., Hurri, J., Hoyer, P.O.: Natural Image Statistics A Probabilistic Approach to Early Computational Vision., vol. 39 (2009) 15. Hyvarinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural networks 13(4-5), 411–430 (2000) 16. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254–1259 (1998) 17. Karklin, Y., Lewicki, M.: Learning higher-order structures in natural images. Network: Computation in Neural Systems 14(3), 483–499 (2003) 18. Marra, S., Pirri, F.: Eyes and cameras calibration for 3d world gaze detection. pp. 216–227 (2008) 19. Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583), 607–609 (1996). DOI 10.1038/381607a0 20. Olshausen, B.A., Field, D.J.: Natural image statistics and efficient coding. Network: Computation in Neural Systems 7(2), 333–339 (1996) 21. Olshausen, B.A., Fteld, D.: Sparse coding with an overcomplete basis set: A strategy employed by v1? Vision Research (1997) 22. Park, H.J., Lee, T.W.: Modeling nonlinear dependencies in natural images using mixture of laplacian distribution. (2004) 23. Pece, A.: The problem of sparse image coding. Journal of Mathematical Imaging and Vision (2002) 24. Raj, R., Geisler, W., Frazor, R., Bovik, A.: Natural contrast statistics and the selection of visual fixations. Image Processing, 2005. ICIP 2005. IEEE International Conference on 3, III – 1152–5 (2005). DOI 10.1109/ICIP.2005.1530601 25. Reinagel, P., Zador, A.: Natural scene statistics at the centre of gaze. Network: Computation in Neural Systems 10(4), 341–350 (1999) 26. Rothkopf, C.A., Ballard, D.H.: Image statistics at the point of gaze during human navigation. Visual neuroscience 26(01), 81–92 (2009) 27. Rothkopf, C.A., Ballard, D.H., Hayhoe, M.: Task and context determine where you look. Journal of Vision 7(14), 12 (2007) 28. Santella, A., DeCarlo, D.: Robust clustering of eye movement recordings for quantification of visual interest. ETRA ’04: Proceedings of the 2004 symposium on Eye tracking research & applications (2004) 29. Tatler, B., Baddeley, R., Vincent, B.: The long and the short of it: Spatial statistics at fixation vary with saccade amplitude and task. Vision Research 46(12), 1857–1862 (2006) 30. 
Torralba, A., Oliva, A.: Statistics of natural image categories. Network: Computation in Neural Systems 14(3), 391–412 (2003) 31. Treisman, A.M., Gelade, G.: A feature-integration theory of attention. Cognitive Psychology 12, 97–136 (1980) 32. Tsotsos, J.K., Culhane, S., Wai, W.K., Lai, Y., Davis, N., Nuflo, F.: Modeling visual attention via selective tuning. Artificial intelligence 78(1-2), 507–545 (1995) 33. Wolfe, J.M., Cave, K.R., Franzel, S.L.: Guided search: an alternative to the feature integration model for visual search. Journal of experimental psychology. Human perception and performance 15(3), 419–433 (1989) 34. Yanulevskaya, V., Geusebroek, J.M., Marsman, J.B.C., Cornelissen, F.W.: Natural image statistics differ for fixated vs. non-fixated regions. (2008) Learning cross-modal translatability: grounding speach act on visual perception Mario Gianni1 , Geert-Jan M. Krujiff2 , and Fiora Pirri1 1 2 Dipartimento di Informatica e Sistemistica, Sapienza, Universita’ di Roma Language Technology Lab, German Res. Center for Artificial Intell.(DFKI GmbH) January 21, 2011 The problem of grounding language on visual perception has been nowadays investigated under different approaches, we refer the reader in particular to the works of [7, 11, 3, 13, 2, 10, 12, 6, 5, 1]. It is less investigated the inverse problem, that is, the problem of building the semantics/interpretation of visual perception, via speechact. In this abstract we face the two problems simultaneously, via learning both the language and its semantics by human-robot interaction. We describe the progress of a current research facing the problem of simultaneously grounding parts of speech and learning the signature of a language for describing both actions and states space, while actions are executed and shown in a video. Indeed, having both a language and a suitable semantics/interpretation of objects, actions and states properties, we will be able to build descriptions and representations of real world activities under several interaction modalities. Given two inputs, a video and a narrative, the task is to associate a signature and an interpretation to each significant action and the afforded objects, in the sequence, and to infer the preconditions and effects of the actions so as to interpret the chronicle, explaining the beliefs of the agent about the observed task. K We start, thus, with two sets of observations the set {Y}N of speech-acts and the set {D}h=1 of descriptors of the n=1 action and objects space, both suitably extracted from the audio and video sequence (there are several methods to do that, for the visual sequence here we mention [9]). There are two sets of hidden data, namely the speech-act N labels {X}i=1 , and the properties {P}Hj=1 induced by actions, specifying how actions dynamically change both what is visible and what can be reported. The hidden variables P are indexed by time, and the hidden speech-act labels X are indexed by time and contextual links. We call these indices the states, thus, for all visual states j ∈ S there exists a cluster of contextual links { j1 , . . . , jk } formed by S k specifying a neighbour system for the speech act labels. The (simplified) dependency relation among the random variables is as follows. Speech-acts are independent of any other visual state, given the state at which the commented action is uttered, induced by the visual stimuli. 
The action descriptors are independent of both speech-act and the other visual states, given the state at which the action is expected. The interpretation of each phrase (the labels) is independent of any visual state given the time at which the action is seen and it depends on the interpretation of other speech-acts only via the neighbouring system. The specific dependencies of these variables is represented in Figure 1, right. The variables interplay is accounted for by the interaction of these two different processes, during an experiment. Here an experiment is specified by a task, like pouring some water from a jug to a glass, visually represented by a video, which is described by a narrative. The narrative refers to both the simple actions and objects space and to the beliefs about what is going on in the scene. The narrative, however, goes beyond the direct denotation as it describes the beliefs concerning preconditions and effects of actions on the afforded objects, in terms of temporal and spatial relations, and eventually other deictic expressions. In this initial formalisation, learning the crossmodal translatability is achieved via the two mentioned processes. As gathered above, each process is defined by an observable and an hidden part. The first process serves the linguistic part and it is defined by a hidden 1 Figure 1: Left: structure of the two double processes, where observations for both the processes are modulated by descriptors obtained by early interpretation of speech, in the narrative, and of motion and salient objects in the scene. The hidden states of the HMRF are formed by parts of speech describing possible (multi-modal) states of the world. The observation parts are denotations. The hidden states of the HMM are unobserved properties (such as changing relations) induced by actions. Observations are actions modelled as mixtures of Gaussian. Right: details. Markov random field (HMRF), to capture the qualitative-spatial structure of the multi-modal contexts of parts of speech. As said above speech-acts are the observed random variables, while labels provide an interpretation, or word contexts, and thus are the hidden part (see also [4]). The second process is an hidden Markov model with mixture of Gaussian observations (GHMM). Here the observations are the visual features dynamics; these are represented by descriptors of the images sequence which, in turn, are obtained by attention based interpretation of light and motion features. From these processes we obtain a speech-act space, a configuration space, an action space and a state space. The structure and connections of the two processes are schematically illustrated in Figure 1, left. The HMRF defines a joint probability distribution of sequences of observations, i.e. the speech-parts, and sequences of labels, i.e. the language and interpretation of speech-parts. Assume a collection of states S K has been learned, so that we have a double indexed graph. Observations are formed by a finite set of phrases {y1 , . . . , yn } j (we do not consider here the speech analysis which is assumed to yield the correct association with a predefined and recorded language corpus) having a d-dimensional term space Y indexed by S K , and the random field p(y) QK is defined by a n-dimensional space Y = n=1 Y jn , where Y jn = {y jn |y jn ∈ Σn }, with Σ the signature. The hidden field is determined by labels defined by a finite space of terms X jm = {x jm |x jm ∈ L}, with L the language, i.e. 
L = (Σ, D, I), that is, the language is defined by a signature, a domain and an interpretation. For example the predicate Q(t1 , t2 ) is a 2-dimensional term, its interpretation is defined according to the HMRF by any suitable set of pairs of objects whose denotation is specified by a term t ∈ Σn . Note that we are simply referring to a language and an interpretation in terms of elementary structures, not models (in the logical sense). Models of the language can be induced (see [9]) by the probabilistic relational structure, but are not treated here. The product space X is the space of configurations of the labels. A probability distribution p(x) on the space of configurations of the hidden labels is another random field; on this random field we define a neighbouring system δ that specifies how labels form subgraphs affine to the time of the visual stimuli. These subgraphs are, then, incrementally extended by the learning algorithm that we cannot describe here for lack of space. Labels specify via the neighbours a set of available interpretations. The random field equipped with the neighbour system δ is a Markov random field iff p(xi |xS \i ) = p(xi |δ(xi )) and the joint probability of the two processes is p(x, y) = p(y|x)p(x), this implies that any P function f : X 7→ R is supported by precisely the cliques of the graph, and p(xi |δ(xi )) = (1/Z) exp( C VC (xi )), P with VC = 1≤i≤nC λCi fiC (x) = λC f C (x), with λCi ∈ R the parameters of the model and f C (x) ∈ {0, 1} the features of the field. However, as gathered above the joint probability involves also the HMM. In fact, speech-acts (observations of the HMRF) are given in sequences and thus these are synchronised together with the action descriptors, therefore they turn out to be also observations of the HMM (see the Figure 1 on the left), if their 2 lengths satisfy specific conditions (that is, they are terms not phrases). For example, suppose that the task is to pour water into a glass. Then the video sequence is interpreted to generate descriptors for the action space. The parameters of the HMM can be learned as usual as far as we assume that observations are multivariate in R and states are in N. Suppose, instead, that by a suitable action space construction, from the video analysis, it is possible to build an action space and that states can be given an interpretation. Thus two more variables are involved, the denotation of variables and the inference of the M state properties, as extracted from the speech-acts. Thus, let {α}i=1 be the generated action space and let S be the state space of the visually interpreted actions. At time t a phrase and sparse denotations will be uttered, in the context of the observed scene. Thus the realization of the variables is p(y, D|st , kt , x)P(x|δ(x)). However the graph topology is locally induced by the visual stimulus and the utterance. Learning the dependency between the HMM states S and the HMRF states S K is achieved by an incremental learning algorithm that follows closely [8]. The difference is mostly on the initial steps. Here, instead of a normal distribution, the random field is built as a set of cliques induced by the simultaneous association of descriptors and phrases. In conclusion, labels are ground by the narrative which, on the other hand, describes both pointwise actions and state changes, by speech act explaining the action course and specific modalities concerning time and space features of the action effects and preconditions. 
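As a purely illustrative sketch of the Gibbs local conditional p(x_i | δ(x_i)) used above, the following Python fragment computes the normalised distribution over candidate labels for a node given its neighbours, with binary clique features f_C and real weights λ_C. The feature function shown is hypothetical; it is not one of the compatibility features of the actual model.

```python
import numpy as np

def local_conditional(i, labels, neighbours, features, weights, label_set):
    """p(x_i | delta(x_i)) of a log-linear Markov random field: for each candidate
    label of node i, sum the weighted binary features over the cliques containing i,
    exponentiate and normalise (the partition constant Z is the denominator)."""
    scores = []
    for candidate in label_set:
        v = sum(lam * f(i, candidate, labels, neighbours[i])
                for lam, f in zip(weights, features))
        scores.append(v)
    scores = np.asarray(scores, dtype=float)
    scores -= scores.max()                 # stabilise the exponentials
    p = np.exp(scores)
    return p / p.sum()

# Hypothetical clique feature: the candidate label agrees with some neighbour.
def agrees_with_neighbour(i, candidate, labels, nbrs):
    return int(any(labels[j] == candidate for j in nbrs))
```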
The connection of the two learning processes ensures both grounding and signature learning. For example, after the action is executed a specific change of spatial relations is the action effect, a speech act shall serve to designate it. For this task the following objects require a relation to be established: the hand, the jug, the glass, the table and the water. The actions are: approaching the jug, grasping the jug handle, rising the jug, inclining the jug so that the water can spill out, putting down the jug. On the other hand there is an infinite set of possible world states associated with these actions and objects. However we are interested only in a finite state space, in which states are just those that can be specified in a finite time lag. That is, those states that can be uttered by the narrator. For example “now the hand is grasping the glass”, or “the glass now is on the table while before it was on the hand”, or “the glass is on the table and it is full of water but before you filled it it was empty”. Similarly: “pouring the water into the glass has been successful, because now the glass is full of water” and “you want to pour water in the glass because someone wants to drink it”. The research is supported by the EU project NIFTI, n. 247870. References [1] S. R. K. Branavan, Harr Chen, Jacob Eisenstein, and Regina Barzilay. Learning document-level semantic properties from free-text annotations. J. Artif. Intell. Res. (JAIR), 34:569–603, 2009. 1 [2] Peter Ford Dominey and Jean-David Boucher. Learning to talk about events from narrated video in a construction grammar framework. Artif. Intell., 167(1-2):31–61, 2005. 1 [3] Peter Gorniak and Deb Roy. Grounded semantic composition for visual scenes. J. Artif. Intell. Res. (JAIR), 21:429–470, 2004. 1 [4] Pierre Lison, Carsten Ehrler, and Geert-Jan M. Kruijff. Belief modelling for situation awareness in human-robot interaction, 2010. (submitted). 1 [5] Ingo Lütkebohle, Julia Peltason, Lars Schillingmann, Britta Wrede, Sven Wachsmuth, Christof Elbrechter, and Robert Haschke. The curious robot - structuring interactive robot learning. In ICRA, pages 4156–4162, 2009. 1 [6] Raymond J. Mooney. Learning to connect language and perception. In AAAI, pages 1598–1601, 2008. 1 [7] Alex Pentland, Deb Roy, and Christopher Richard Wren. Perceptual intelligence: learning gestures and words for individualized, adaptive interfaces. In HCI (1), pages 286–290, 1999. 1 [8] Stephen Della Pietra, Vincent J. Della Pietra, and John D. Lafferty. Inducing features of random fields. IEEE Trans. Pattern Anal. Mach. Intell., 19(4):380–393, 1997. 3 [9] Fiora Pirri. The well-designed logical robot: learning and experience from observations to the situation calculus. Artif. Intell., to appear, 2010. 1, 2 [10] Deb Roy. Semiotic schemas: A framework for grounding language in action and perception. Artif. Intell., 167(1-2):170–205, 2005. 1 [11] Deb Roy and Alex Pentland. Learning words from sights and sounds: a computational model. Cognitive Science, 26(1):113–146, 2002. 1 [12] Paul E. Rybski, Jeremy Stolarz, Kevin Yoon, and Manuela Veloso. Using dialog and human observations to dictate tasks to a learning robot assistant. Intel Serv Robotics, 1:159–167, 2008. 1 [13] Chen Yu and Dana H. Ballard. On the integration of grounding language and learning objects. In AAAI, pages 488–494, 2004. 1 3 AN APPROACH TO PROJECTIVE RECONSTRUCTION FROM MULTIPLE VIEWS A. Carrano, V. D’Angelo, S. R. F. Fanello, I. Gori, F. Pirri, A. 
Rudi email: [email protected], [email protected] Dipartimento di Informatica e Sistemistica Sapienza Universitı̈¿ 12 di Roma Rome, RM, Italy ABSTRACT We present an original multiple views method to perform a robust and detailed 3D reconstruction of a static scene from several images taken by one or more uncalibrated cameras. Making use only of fundamental matrices we are able to combine even heterogeneous video and/or photo sequences. In particular we give a characterization of camera matrices space consistent with a given fundamental matrix and provide a straightforward bottom-up method, linear in most practical uses, to fulfil the 3D reconstruction. We also describe shortly how to integrate this procedure in a standard vision system following an incremental approach. Figure 1. Comparison of the trifocal tensor and our approach: in the first case is necessary the correspondence between every view, in the second case are needed only couple of correspondences. [Hey98]. The trifocal tensor can be estimated by at least 7 corresponding points in three images, while the quadrifocal tensor can be estimated from at least 6 corresponding points in 4 images. Thus to use tensors it is necessary to have a certain number of points visible from each view. This can be achieved with quite good video sequences. For example, with at least 25 frames per second (fps), it is easy to find a set of images with a large percentage of common points. Nevertheless, in the general case only a certain number of views are available and the distance between different vantage points highly varies. It is, thus, difficult in most situations to force so many correct 3-correspondences or 4correspondences to be able to apply trifocal or quadrifocal tensors. Our method is less redundant than the tensor one, as it is not required to take unnecessary views of an object to obtain the right 3 and 4-correspondences. Indeed, being based on the fundamental matrix it is surely less constrained than tensors in fact while a fundamental matrix is always constructible when tensors are constructible, the contrary does not hold. For these reasons our method is closer to the human ability of choosing vantage points to mentally reconstruct a scene for the purpose of recognition. Human visual system, in fact, is able to generalize across viewpoints, and in some situations recognition is even view invariant. This human ability deeply studied in neurophysiology and psychology [BG93, Ede95, FG02] shows that humans need few vantage points to image an object, in other words, in just few views there is already enough information to perform the reconstruction. This means that the mental image of a familiar object, across views, exploits few correspondences KEY WORDS Stereo Vision, 3D-Reconstruction, camera matrices space, projective reconstruction, structure from motion 1 Introduction Modelling visual scenes is a research issue in several fields: finding the three-dimensional structure of an object, by analyzing its motion over time, recognizing an object in space, or just rendering a scene for visualization has many interesting applications from industry to security, from TV to media entertainment. In fact, efficiently computing 3D structure and camera parameters from multiple views has attracted the interest of many researchers in the last two decades. Since the early work of Faugeras, Zhang et Al. 
[DZLF94, ZDFL95], introducing the fundamental matrix, to deal with the problem of multiple images of a three-dimensional object, taken with an uncalibrated camera, several approaches have been considered. These approaches range from multilinear forms to multiple view tensors to take into account all the constraints induced by multiple views. However none of these approaches can be considered the final answer to the problem of reconstructing a scene from a number of its projective images due to the intrinsic complexity and constraints of the problem. Thus, easy and computational feasible methods are in great demand to obtain a good reconstruction of a scene. The trifocal tensor was introduced in [Har97] and the quadrifocal tensor by [Har98], connecting image measurements respectively along 3 and 4 views. A common framework for multiple view tensors has been proposed in 1 Figure 2. Inferred dense cloud of points obtained, applying the described method to the topology shown in Figure 6. among them; this suggests that it must be possible, in general, to obtain a reconstruction avoiding further constraints required by the methods used. In this sense our method, being quite flexible, allows to define a topology of views just from the similarities between the available ones, so that two images are connected if there is a good estimation of a fundamental matrix binding them. Using any reasonable topology we can always solve a 3D-reconstruction problem in a non linear way; furthermore, a linear solution for reconstruction is always possible using a compositional topology derived from an incremental approach. The paper is organized as follows. In the next Section 2 we introduce some preliminary camera matrix concept. In Section 3 we essentially explain the projective reconstruction method and show some example from the underlying topology. In Section 4 we describe the implementation of the method and illustrate an example of the complete metric reconstruction of Morpheus from the Matrix series see Figure 2. 2 Preliminaries Here we briefly recall some geometric concepts related to camera matrices, we refer the reader to [HZ00] for an in depth description. A perspective camera is modelled by the projection: xi ∼ P Xi , where ∼ is equality modulo a scale factor, Xi is a 4-vector denoting a 3D point in homogeneous coordinates, xi is a 3-vector denoting the corresponding 2D point and P is a 3 × 4 projection matrix (from 3D to 2D space). P is factorized, in a metric space as: P = K[R |t] (1) Here K is the intrinsic parameters matrix, R is the orientation matrix and t is the camera position 3-vector. The fundamental matrix for two views, capturing the correspondence between point x and x0 , is a rank 2, 3×3 matrix such that x0> F x = 0 and, given P1 and P2 , F = [P1 c]× P2 P1+ (2) Here P1+ is the pseudo-inverse of P1 , c is the center of the first camera, and [·]× indicates an anti-symmetric matrix of the vector product. The cameras projective matrices P1 = [I |∅] and P2 = [[e0 ]× F |e0 ], where e0 is the epipole and e0> F = 0, are the canonical cameras. The task of projective reconstruction is to find the camera matrices and the 3D points mapping to the points in each image. The estimation carries an intrinsic ambiguity in representation since any set of camera matrices corresponds to the set obtained by right multiplying both canonical cameras for an arbitrary non singular 4 × 4 matrix. 
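A short numerical sketch of these preliminaries is given below: the left epipole e' is recovered as the null vector of F^T, and the canonical pair P1 = [I | 0], P2 = [[e']_x F | e'] is assembled from it. F itself could be estimated from matched points with, e.g., OpenCV's findFundamentalMat; the helper names are our own.

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def canonical_cameras(F):
    """Canonical camera pair compatible with a fundamental matrix F:
    P1 = [I | 0] and P2 = [[e']_x F | e'], where e' satisfies e'^T F = 0."""
    _, _, vt = np.linalg.svd(F.T)
    e_prime = vt[-1]                       # right null vector of F^T, i.e. the left epipole
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e_prime) @ F, e_prime.reshape(3, 1)])
    return P1, P2

# F can be obtained from point correspondences pts1, pts2 ((n, 2) arrays) with
# OpenCV, e.g.:  F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
```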
If we consider three views (say P, P 0 and P 00 ) we can estimate fundamental matrices F12 , between the first and second view and F23 , between the second and third view, using the above results. However, a 3D point in a single image has 2 degrees of freedom, but in n images it has 3 degrees of freedom, thus there are 2n − 3 independent constraints between n views of a point and 2n − 4 for a line. Thus bilinear, trilinear and quadrilinear constraints are different. Hence, using multiple views carries in specific constraints also due to the error propagation of a moving camera, and the fact that, inevitably, points become occluded as the camera view changes. Therefore, a certain point is only visible in a certain set of views. Using multiple views allows inference of hidden dimension. For three views and four views, as specified in the introduction, the trifocal and quadrifocal tensor solve the trilinear and quadrlinear constraints. The tensors stop at four views (see [HZ00]). 3 Projective Reconstruction In this section we describe an original approach, as far as we know, to projective reconstruction simpler than trifocal tensor and anyhow quite powerful. This method uses fundamental matrices only. First of all we need a necessary and sufficient condition for pair of camera matrices P1 and P2 to be compatible to a given fundamental matrix F which is almost linear and does not explicitly involve 3D projective transformations. Then we will use this condition to build-up a linear system to solve the projective reconstruction problem. Two View Equation Let F be a fundamental matrix, the space of all pairs of camera matrices P1 , P2 compatible with F can be expressed as λP1 = [I | 0] Z (3) 0 0 µP2 = [e ]× F | e Z (4) where e0 is the left epipole, Z is any full-rank projective transform 4 × 4-matrix, and λ and µ are scale parameters of P1 and P2 respectively, both free and not null. Letting λP1 0 [I | 0] 0 H= and Y = (5) [e ]× F | e µP2 we can restate (3) and (4) as follows: (6) Y = HZ Now H is a full-rank 6 × 4-matrix as long as the epipole e0 is not null, which is always true for nondegenerate F , and in terms of its column space it can be represented as: H = h1 h2 h3 h4 here hi , 1 ≤ i ≤ 4 are linearly independent vectors. The space of the hs has dimension 6, hence there exist two vectors h5 , h6 orthogonal to h1 , ..., h4 which belong to null(H > ), let > z1 z2> > N = h5 h6 = null(H ) and Z = z3> z4> (7) Hence Y can be expressed as a linear combination of h1 , ..., h4 : > z1 z2> Y = HZ = h1 h2 h3 h4 z3> = (8) z4> = h1 z1> + h2 z2> + h3 z3> + h4 z4> . This equation is equivalent to (6) and, indeed, the solution of (9) is exactly (6). Now we obtain P1 and P2 from Y as follows. Let N > = N1> N2> we can write (9) as N1> N2> λP1 µP2 =0 thus we get the following that we shall call the two view equation: N1> P1 + γN2> P2 = 02×4 (10) with γ = µλ−1 free. This is the equation we are searching for. The equation (10) has 8 constraint on P1 , P2 and one free parameter γ thus it holds the same information of the fundamental matrix F , in fact, it has 7 constraints, as many as F ’s d.o.fs. Projective Reconstruction System Now we use the two view equation (see (10) above), for any couple of views equipped with a fundamental matrix with the aim to intersect the space of camera matrices in order to select only the satisfactory chain of views. We first analyze a particular, but prominent, case and then we give a general solution, which is nonlinear but can be handled through multiple linear refinements. 
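The two view equation can be checked numerically. The sketch below builds H from the canonical pair of the previous sketch, extracts N = null(H^T) from the SVD of H, and verifies that N1^T P1 + N2^T P2 vanishes (γ = 1 for the canonical cameras); the helper names are again ours.

```python
import numpy as np

def two_view_nullspace(F):
    """Slices N1, N2 (each 3 x 2) of N = null(H^T), with H = [P1; P2] built from
    the canonical cameras of F (Eqs. 3-5). For any compatible pair the two view
    equation N1^T P1 + gamma * N2^T P2 = 0 (Eq. 10) holds for some scale gamma."""
    P1, P2 = canonical_cameras(F)          # from the previous sketch
    H = np.vstack([P1, P2])                # 6 x 4, full rank for nondegenerate F
    U, _, _ = np.linalg.svd(H)             # last two left singular vectors span null(H^T)
    N = U[:, 4:]
    return N[:3, :], N[3:, :], P1, P2

# Sanity check on the canonical pair (gamma = 1):
# N1, N2, P1, P2 = two_view_nullspace(F)
# assert np.allclose(N1.T @ P1 + N2.T @ P2, 0.0, atol=1e-10)
```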
Four views reconstruction Consider the case in which we have four views P1 , P2 , P3 and P4 , as stated above. Let Fij , be the fundamental matrix relating views Pi and Pj , with (i, j) ∈ {(1, 2), (2, 3), (3, 4), (3, 1), (4, 1)}. Let > > Λ> ij = (Λij 1 , Λij 2 ) denote any of the following pairs > > > {(A1 , A2 ), (B1 , B2> ), (C1> , C2> ), (D1> , D2> ), (E1> , E2> )}. Then: " > #! I Fij> e0ij × > Λij = null (11) > 0> e0 ij The graph illustrating the above described fundamental matrix connections Fij , is shown below, on the left. Now, in order to avoid the explicit use of Z we are going to express Y in terms of the null space of H > in fact multiplying both sides of (8) by N > we obtain N > Y = N > HZ = = = h> 5 h> 6 h1 z1> + h2 z2> + h3 z3> + h4 z4> = > > > > > > > h> 5 h1 z1 + h5 h2 z2 + h5 h3 z3 + h5 h4 z4 > > > > > > > h> h z + h h z + h h z + h h z 6 1 1 6 2 2 6 3 3 6 4 4 Finally N > Y = 02×4 = 02×4 . Figure 3. Base topologies. (9) Now, chosing the initial view P1 as constant we obtain the following system: > > 1 P1 + λA2 P2 = 0 A> > B1>P2 + µB2> P3 = 0 C1 P3 + νC2 P4 = 0 (12) D1> P3 + ρD2> P1 = 0 > > E P + ηE2 P1 = 0 1 4 P1 = [I | 0] We can note in (12) above that scale parameters, but two, are abitrary, since each Pi is defined modulo the scale parameter. We can, thus, further simplify (12) setting λ to be 1 in the first equation, and similarly we can set µ and ν. We obtain the system > A> 1 P1 + A 2 P2 = 0 > B1 P2 + B2> P3 = 0 C1> P3 + C2> P4 = 0 (13) D1> P3 + ρD2> P1 = 0 E > P + ηE2> P1 = 0 1 4 P1 = [I | 0] Figure 4. Adding a new view, possible links are shown. Multiple View Equation D E Let P̃1 ...P̃n be the solution of the Projective Reconstruction System S with P1 fixed, as seen in the previous sections. We can generalize the two view equation in the following way. We know that the space of the P chain which solve S with P1 free is: λ1 P1 = P̃1 Z .. . which is a straightforward linear system. λn Pn = P̃n Z Analysis of the degrees of freedom In general let n be the number of views and m the number of relations found between views. The system will have • Constraints: 8m due to the m equations, one for every relation found; 12 due to the fixed P1 ; overall 8m + 12. • Unknowns: 12n due to the n P s; m due to the scale factors; −(n−1) due to the scale choice, one for every P except the first; overall 12n + m − n + 1 = 11n + m + 1. To find a unique solution we should have 7m ≥ 11 (n − 1) . (15) (14) This is consistent with the number of dof of the P s and F , in fact, every F contributes with 7 constraints, that is, all information it has, and every P , except possibly the first, with 11 unknowns, that is, all information it needs to be instantiated. General Case For the general case views can be seen as the nodes of a graph. The nodes are connected if the Fij , related to the two views, have already been estimated. Broadly speaking, if we do not pay attention to the topology of the graph we can build a system as follows > Aij1 Pi + λij A> ij2 Pj = 0 ∀ij where exist Fij P1 = [I | 0] which in general is a nonlinear system. In the following sections we sketch an incremental approach taking into account the topology of the views graph, in order to find a linear solution of the system. Here the λi are the scale factors and Z is the free 3D projective transformation. As in the two view equation let λ1 P1 P̃1 . .. Y = (16) and H = .. . λ n Pn P̃n Now we can restate equations (15) as Y = HZ (17) from which we can obtain, as in the two view equation, the following result. 
Let N = null H > , with N1 Pn N = ... then N1> P1 + i=2 γi Ni> Pi = 0 Nn with γi free scale parameters, this is the Multiple Views Equation. Considering that H has dimension 3n × 4 then N has dimension 3n × (3n − 4) and so Ni , which are the n slices of N , have dimension 3 × (3n − 4). Thus equation on the right of (17) has (3n − 4) × 4 constraints and n − 1 free parameters. We show, now, how to use this equation to develop an incremental bottom-up linear approach. In fact, starting from the graph, we first solve the system for all 4-elements subgraphs, which form the base case discussed above and then, using the following two methods, we add new nodes and glue subgraphs. Adding a new double-connected view to an already D E solved graph. Let S = P̃1 ...P̃n be the solution of a projective reconstruction system with n views. Let Pnew be a new view to be added to S and let: M1> P1 + λM2> Pnew = 0 and > L> 1 Pnew + µL2 P2 = 0 be the two view equation of the F between P1 and Pnew and the two view equation of the F between Pnew and P2 . We can built the system M > P1 + M2> Pnew = 0 L>1 P > 1 new + µL2 P2 = 0 P1 = P̃1 P2 = P̃2 where λ has been set to 1 to fix the Pnew scale. The topology of this system, linear and simply solvable, is illustrated in Figure 4. Connecting two graph already solved. be the solution of the first system and N1> Q1 + γ2 N2> Q2 + γ3 N3> Q3 + n X D E Let P̃1 ...P̃n Figure 6. Topology used. let γi Ni> Qi = 0 D E Let P̃1 ...P̃n be the solution of the first system and N1> Q1 + γ2 N2> Q2 + γ3 N3> Q3 + i=4 the multiple view equation related to the second system, then the two graphs can be connected in a linear way trough 3 links as follows Pn > N1 Q1 + γ2 N2> Q2 + γ3 N3> Q3 + i=4 γi Ni> Qi = 0 > > A1 Q1 + λA2 P̃1 = 0 > > B 1 Q2 + µB2 P̃2 = 0 > > C1 Q3 + νC2 P̃3 = 0 we set all γs to 1 in order to set the scale of every Qs except the first, and λ to 1 to set the scale of Q1 . Then the system is linear. n X γi Ni> Qi = 0 i=4 M1> R1 + η2 M2> R2 + η3 M3> R3 + n X ηi Mi> Ri = 0 i=4 be the multiple view equation related to the second and the third graphs. Considering the system with the five links, arranged as shown on the right of Figure 5, we have Pn N1> Q1 + γ2 N2> Q2 + γ3 N3> Q3 + Pi=4 γi Ni> Qi = 0 n > > > M1 R1 + η2 M2 R2 + η3 M3 R3 + i=4 ηi Mi> Ri = 0 > > A1 Q2 + λA2 P̃1 = 0 B1> Q3 + µB2> P̃2 = 0 C1> R1 + αC2> P̃3 = 0 D1> R2 + βD2> P̃4 = 0 E1> R3 + ρE2> Q1 = 0 in which we set all γs and ηs to 1 in order to set the scale of every Qs except the first, and every Rs except the first. Further we set ρ to 1 for the scale of Q1 , with respect to R3 , and α to 1 to set the scale of R1 with respect to P3 . Then the system is linear. 4 Figure 5. The figure illustrates the possible links between two or three solved graphs. Connecting three graphs already solved with few links In this case, the topology is developed as follow: the first graph has two links to the second and two to the third, then there is a link between the second and the third. Implementation In this section we show via an example how with just few views, considered salient for a good reconstruction, our method proves its strength. The example we use here is obtained from a number of images of Morpheus from the Matrix series, illustrated in Figure 6, showing also the chosen topology for the path connecting four views. The complete metric reconstruction follows the steps listed below. 1. For features extraction we use the scale invariant feature transform [Low04]. 
Repolishment step is needed after camera matrix estimation in order to reduce errors due to fundamental matrix measurements. Nevertheless we observe that this step is generally very fast due to the proximity of linear solution to the best one, as we see in the next section. 5 Figure 7. Dense disparity map. 2. For feature matching we base correspondence between pairs of features in two adjacent view (according to the topology illustrated in Figure 6) on shortest Euclidean distance, in the invariant feature space. 3. The fundamental matrices are estimated iteratively with Random Sample Consensus (RANSAC) [FB81, RFP08] and a simple optimization, as follows: • The Fij , for the five pairs indicated in Figure 6, are estimated with the 8-points algorithm. • Inliers are computed, using the estimated Fij , iteratively up to convergence. • The Fij are re-estimated using inliers, minimizing a cost function based on Sampson error. • New correspondences are identified using the Fij found at the previous step. 4. Given the Fij , the camera matrices P1 , . . . , P4 are obtained according to the method described in Section 3 following the specified topology. Moreover P1 , . . . , P4 undergo a repolishment step via nonlinear least squares in order to minimize the reprojection error 5. Given the camera matrices (the views) P1 , . . . P4 , errors between the measured point xij in view Pi and the reprojection Pi Xj (see Section 2) are minimized to produce a jointly optimal 3D structure and calibration estimate, by bundle adjustment [TMHF99]. 6. The metric reconstruction is obtained, thus, with a rectifying homography H from auto-calibration constraints as (Pi H, H −1 Xj ) as described in [HZ00]. 7. With rectification new camera matrices are determined to obtain coplanar focal planes. 8. Finally, the dense disparity map, illustrated in Figure 7, is obtained from the correspondences between each pixel of image pairs, see [HZ00]. While the dense 3D reconstruction shown in Figure 2 generates a dense cloud of points using optimal triangulation algorithm. Experiments We tested our reconstruction method on a collection of randomly generated point sets and cameras in order to estimate its reliability under different working conditions. We have analyzed especially the robustness to errors of the four views graph (see Figure 3) which is the base structure of reconstruction graphs. In fact the goal of a real camera matrix estimator is to compute a set of camera matrices which minimizes the reprojection error of the estimated 3D points. Accordingly, in each trial we have simulated the real estimation process of camera matrices and have measured the reprojection error via the Sampson error. We recall that Sampson error is the first order approximation of the reprojection error to which is actually very close but absolutely computationally cheaper [HZ00]. Every trial consists of the following steps: 1. Setting up the environment. This amounts to random generation of four camera matrices and a point cloud (from 50 up to 1000 points). Projection of every point in the cloud on the image plane of each camera. Any point projection is perturbed by a gaussian error with standard deviation 0.2 ≤ σ ≤ 2, fixed by the trial. 2. Estimation of fundamental matrices. These are obtained from projected matching points on every couple of camera’s image planes (see point 3, Section 4) 3. Estimation of the four camera matrices. 
Our method is applied on five of the six fundamental matrices computed in the previous step and arranged following the four views graph topology (see point 4, Section 4). 4. Computation of fundamental matrices, those are obtained from every pair of camera matrices. 5. Measurement of the Sampson error of the projected points on those fundamental matrices. At the end of every trial we collected the mean and variance of the absolute Sampson error measured in pixels on the fundamental matrices estimated at step 2 and at step 4. Let us denote M SE1 , V SE1 , M SE2 and V SE2 the means and variances of the two errors. Analyzing those data we can observe a strong linear dependence between M SE2 and M SE1 . In fact the model which best fit among polynomials is M SE2 = 0.35 + 1.53M SE1 (18) with variance 1.62. Note that in 56.3% of the trial we have M SE2 < 0.35 + 1.53M SE1, and in 88.1% of the trial we have M SE2 < 0.35 + (1.53 + 1.62)M SE1. Tuan Luong, and Olivier D. Faugeras. Robust recovery of the epipolar geometry for an uncalibrated stereo rig. In ECCV (1), pages 567– 576, 1994. [Ede95] Shimon Edelman. Class similarity and viewpoint invariance in the recognition of 3d objects. Byological Cybernetics, 72:207–220, 1995. [FB81] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981. [FG02] D. H. Foster and S. J. Gilson. Recognizing novel threedimensional objects by summing signals from parts and views. In Proceedings of the Royal Society of London: Series B, volume 269, pages 1939–1947, 2002. [Har97] Richard I. Hartley. Lines and points in three views and the trifocal tensor. International Journal of Computer Vision, 22(2):125–140, 1997. [Har98] Richard I. Hartley. Computation of the quadrifocal tensor. In ECCV (1), pages 20–35, 1998. [Hey98] Anders Heyden. A common framework for multiple view tensors. In ECCV (1), pages 3– 19, 1998. [HZ00] R.I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049, 2000. [Low04] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. [RFP08] Rahul Raguram, Jan-Michael Frahm, and Marc Pollefeys. A comparative analysis of ransac techniques leading to adaptive realtime random sample consensus. Computer Vision,ECCV 2008, pages 500–513, 2008. Figure 8. Linear dependance between M SE2 on y-axis and M SE1 on x-axis (solid line) and security lines (dashed lines) 6 Conclusion We have described an original method, as far as we know, within the process of 3D reconstruction from a set of views, to obtain the camera matrices from a set of fundamental matrices, that is fast and flexible. A main feature of our method is that it uses just the information given by fundamental matrices, without further assumptions requiring more views. It is clear that our approach can be useful in many practical applications, in fact, relying on a straightforward linear solution, it proves to be more efficent than classical approaches. Furthermore, the technique should be easily integrable into complex vision systems. We exploit, on the other hand, a topology of n-view, recursively based on four views, to build a system which is, in general, non linear but in the specified topological arrangements. 
Thus, our approach, is close to the human behaviour determining where to look at, to discover the important views (the hidden views) necessary to mentally reconstruct an object. Indeed, it exploits a specific structure of two views simply based on pairs of images correspondences, and thus it uses all the information given by the estimation of the fundamental matrix. The research has been partially supported by the EU project NIFTI, n. 247870. References [BG93] Irving Biederman and Peter C Gehardstein. Recognizing depth-rotated objects: Evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance, 19(6):1162–1182, 1993. [DZLF94] Rachid Deriche, Zhengyou Zhang, Quang- [TMHF99] Bill Triggs, Philip F. McLauchlan, Richard I. Hartley, and Andrew W. Fitzgibbon. Bundle adjustment - a modern synthesis. In Workshop on Vision Algorithms, pages 298–372, 1999. [ZDFL95] Zhengyou Zhang, Rachid Deriche, Olivier D. Faugeras, and Quang-Tuan Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artif. Intell., 78(1-2):87– 119, 1995. Switching tasks and flexible reasoning in the Situation Calculus Alberto Finzi and Fiora Pirri {finzi}@na.infn.it {pirri}@dis.uniroma1.it March 25, 2010 Abstract In this paper we present a new framework for modelling switching tasks and adaptive, flexible behaviours for cognitive robots. The framework is constructed on a suitable extension of the Situation Calculus, the Temporal Flexible Situation Calculus (TFSC), accommodating Allen temporal intervals, multiple timelines and concurrent situations. We introduce a constructive method to define pattern rules for temporal constraint, in a language of macros. The language of macros intermediates between Situation Calculus formulae and temporal constraint Networks. The programming language for the TFSC is TFGolog, a new Golog interpreter in the Golog family languages, that models concurrent plans with flexible and adaptive behaviours with switching modes. Finally, we show an implementation of a cognitive robot performing different tasks while attentively exploring a rescue environment. Keywords: Cognitive robotics, executive control, cognitive control, switching tasks, adaptive and flexible behaviours, Situation Calculus, action perception and change, temporal planning. 1 1 1 INTRODUCTION 2 Introduction Several approaches have been recently taken for the advances of cognitive robotics. These different viewpoints are foraged by new breakthroughs in different research areas correlated to cognitive control and, mainly, by new experimental settings that have encouraged a better understanding of the cognitive functioning of executive processes. In real-world domains robots have to perform several activities requiring a suitably designed cognitive control, to select and coordinate the operation of multiple tasks. The ability to establish the proper mappings between inputs, internal states, and outputs needed to perform a given tasks [48] is called ”cognitive control” or ”executive function” in neuroscience studies and it is often analysed with the aid of the concept of inhibition (see e.g. [48, 3]), explaining how a subject in the presence of several stimuli responds selectively and is able to resist inappropriate urges (see [77]). 
Cognitive control, as a general function, explains flexibly switching between tasks, when reconfiguration of memory and perception is required, by disengaging from previous goals or task sets (see [45][55]). The role of task switching in robot cognitive control is highlighted in many biologically inspired architectures, such as the ISAC architecture [34], ALEC architecture based on state changes induced by homeostatic variables [25], Hammer [15] and the GWT (Global Workspace Theory) [71]. Studies on cognitive control, and mainly on human adaptive behaviours, investigated within the task-switching paradigm, have strongly influenced cognitive robotics architectures since the eighties, as for example the Norman and Shallice [54] ATA schema, the FLE model of Duncan [16] and the principles of goal directed behaviours in Newell [53] (for a review on these architectures in the framework of the task switching paradigm see [67]). Also the approaches to model based executive robot control, such as Williams [8] and earlier [33, 80], devise runtime systems managing backward inhibition via real-time selection, execution and actions guiding, by hacking behaviours. This model-based view postulates the existence of a declarative (symbolic) model of the executive which can be used by the cognitive control to switch between processes within a reactive control loop. Here, the executive model provides a local and detailed representation of the system and monitors the processes engagement and disengagement. In this context, the flexible temporal planning approach (e.g. Constraint-based Interval Planning framework [33]), proposed by the planning community, has shown a strong practical impact in real world applications based on deliberation and execution integration (see e.g. RAX [33], IxTeT [27], INOVA [74], and RMPL [80]). These approaches amalgamate planning, scheduling and resource optimisation for managing all the competing activities involved in many robot tasks. Important examples are the flexible concurrent plan concepts of Jonsson and colleagues [33, 12] and Ghallab and colleagues [27]. The flexible temporal planning approach, underpinning temporal constraint networks, provides a good model for behaviours interaction and temporal switching between different events and processes. However, the extremely complex structure required by the executive robot control has strongly affected the coherence of the whole framework, especially because implementation issues have prevailed in the flexible temporal planning approach over the semantic modelling of the different components integration. On the other hand, from a different perspective, high level executive control has been introduced in the qualitative Cognitive Robotics1 community, within the realm of theories of actions and change, such as the Situation Calculus [46, 61, 41, 65], Fluent Calculus [68, 17, 76], Event Calculus [72, 73, 22], the Action language [26] and their built-in agents programming languages such as the Golog family (ConGolog, INDIGolog, Readylog, etc. see [38, 64, 11, 30]), FLUX [75], and similarly APL [1]. In the theory of action and change framework the problem of executive control has been regarded mainly in terms of action properties, their effects on the world (e.g. the frame problem) and the agent’s ability to decide on a successful action sequence basing on its desire, intentions and knowledge [40, 4, 29, 42]. These both for off-line and online action execution. 
In this sense, high-level executive control is intended as the reasoning process underlying the choice of actions. Nonetheless, reactive behaviours have been considered from the viewpoint of the interleaving of the agent's actions and external exogenous actions induced by nature [63]. Reiter grasped the concept of inhibition through that of "bad situations" [65]. [Footnote 1: The term Cognitive Robotics was first introduced by Reiter (IJCAI 93); see also [39].] Bad situations, however, were proposed from the perspective of achieving action effects, although his considerations were more deeply immersed in human behaviour and also concerned with task switching. Analogously, in Decision Theoretic Golog the stochastic structure of actions served to achieve the most successful plan in uncertain domains [9]. Real-world robot applications are increasingly concerned not just with properties of actions but also with the system's reaction to a huge amount of stimuli, which requires handling response timing. Therefore, the need to negotiate the multiplicity of reactions in task switching (for vision, localisation, manipulation, exploration, etc.) is bringing a different perspective to action theories. An example is the increasing emphasis on agent programming languages, or on multiple forms of interaction leading to the extraordinary growth of multi-agent systems. Indeed, the control of many sources of information incoming from the environment, as well as the arbitration of resource allocation for perceptual-motor and selection processes, has become the core challenge in modelling actions and behaviours. The complexity of executive control under the view of adaptive, flexible and switching behaviours requires, in our opinion, the design of a grounded and interpretative framework that can be accomplished only within a coherent and strong qualitative model of action, perception and interaction, with the proviso that the underlying constructs admit sound transformations into structures that can be treated quantitatively (e.g. temporal networks, Bayes networks, graphical models, etc.).
The main contributions of this paper can be briefly summarised as follows:
• We extend the framework of the Situation Calculus to represent heterogeneous, concurrent, and interleaving flexible behaviours, subject to switching-time criteria. This leads to a new integration paradigm in which multiple parallel timelines assimilate temporal constraints among the activities.
• Temporal constraints and the rules for their definition (the compatibilities) implement adaptation and inhibition of behaviours. This is made possible via a specific term that we call a bag of timelines (also bag of situations), actually a set of concurrent, temporal situations formalising processes on multiple timelines. On the basis of this term we are able to introduce a constructive method for declaring temporal compatibilities, based on a meta-language.
• The compatibilities are rules with a double facet: they are formulae of the Situation Calculus but also the logical counterpart of a temporal network. We show, indeed, that compatibilities can be transformed into temporal constraint networks. We show, therefore, that under specific circumstances logic-based reasoning and constraint propagation can be treated independently while remaining in the same logical framework.
• As usual within the Situation Calculus, the extended framework provides the semantics for specifying a Golog interpreter.
We introduce the Temporal Flexible Golog (TFGolog) programming language suitable for representing high-level agent programs for concurrent and temporal switching processes. We show how the TFGolog interpreter transforms high-level programs into temporally flexible plans. • We prove consistency results about the TFSC and prove several properties of the system. • We provide several examples that illustrate our approach and show its usefulness. In particular, we show that the framework can foster attention driven exploration. The example has also been used for testing this framework, as reported in [10]. The rest of the paper is organised as follows. In the next section we give an intuition of the proposed work with an example about cognitive robot control. In Section 3 we recall some preliminaries properties of the Situation Calculus and Golog and we introduce the Temporal Flexible Situation Calculus (TFSC). The TFSC as an extension of the Situation Calculus, including timelines and bag of timelines, is used also to define processes and constraints between processes, these issues are discussed in Section 4 and in Section 5. The language for the 2 WHY FLEXIBLE PLANNING AND WHY MODELLING MULTIPLE BEHAVIOURS 4 construction of constraints and flexible behaviours is presented in Section 5 and it is shown, in Section 6, how it maps to a Constraint Temporal Network. In Section 7 we introduce the Temporal Flexible Golog interpreter showing several results and examples. In Section 8 we illustrate the example of an attentive robot controller based on the Temporal Flexible Golog interpreter. Finally we dedicate Section 9 to the related works and to hint future works. All the proofs are collected in the Appendices A and B at the end of the paper. 2 Why Flexible Planning and why modelling multiple behaviours Robotic systems, whether they are mobile robots, camera networks or sensor networks, are composed of several heterogeneous hardware and software components operating concurrently and smoothly interacting with the external world. In these systems, complexity increases exponentially with the number of possible interactions among components. Each component can perform a set of activities whose duration might be controllable or not, as exogenous events can disturb allocated time lags. The range and variety of components possible interactions is quite extended, although limited by both structural temporal constraints (such as timeouts, time priorities, time windows), and resource constraints, such as terrain features for locomotion, light features for vision, and speed time for sensor networks. A control plan suitable for these systems should be flexible, in order to be robust and avoid deadlocks. In other words it should be able to make available, at any time, a set of possible behaviours, so that the actual behaviour can be decided on-line. Figure 1: Planned activities for the rescue mobile robot (see the robot looking at a victim hand waving from a hole, in the wall, on the right). Each component is represented by a timeline where the planned activities are sequenced. Starting and ending time of these activities are bound just at the execution time. Example 1 (Cognitive Robot) A mobile robotic system performs some basic tasks such as exploring the environment (possible a rescue environment). 
The robot control system is composed of several functional components, some typical are: Mapping and Localisation, Navigation (for path-planning), Pan-tilt unit (for head and gaze control), Camera (for vergence, zooming, etc.), Locomotion (low level engine controllers), Sound processing, Visual processing, Attentional system, Exploration (taking care of search strategies), and possible other components related to other sensors and other adaptation needs. These concurrent activities may have several causal, temporal, and resource constraints. For example, the Pan-tilt should look ahead while the robot is moving. The Camera should be continuously pointed in the correct direction that might be earlier detected by sound and, at the same time, the robot engine vibrations should be compensated by some stabilisation process, likewise ego-motion for suitable tracking. Camera and Pan-tilt components might start a tracking activity during 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 5 a task requiring to explore and search for something, or to follow someone. However while the starting time of the ptuScan process is controllable, the ending time of this process is nondeterministic, as it depends on the response of the scanned object/person. In turn, the ending time of the tracking process affects the starting and ending times of other activities, for example to pinpoint where exactly to go, operating the necessary strategies to achieve that. Figure 1 illustrates on the left a flexible temporal plan for a rescue robot (on the right looking at a hand waving from a hole in a wall). The plan stipulates that the explore process, given that it ends within an interval of [10, 20], should commit the the end-time of the stop process to be greater than 8, but less than 25 seconds, while ptuReset should end between 15 and 22 seconds. Now, ptuReset can be active just during the locomotion component process stop; the stop process, in turn, can end only after the end of ptuReset. On the other hand explore is not directly affected by ptuReset and stop, hence it can end before, after or during these activities, and its ending time can switch w.r.t. the ending times of ptuReset and stop. Whenever a set of planned activities is executed, the associated activation times are actually bound; hence, the enforced constraints can be suitably propagated. In the next sections we show how these problems can be addressed and solved in the Situation Calculus and Golog providing a clear and sound framework for designing a complex system. 3 Basics for the Temporal Flexible Situation Calculus In this section, we present the basic ideas and formal structure of the Temporal Flexible Situation Calculus (TFSC). The TFSC is conceived for describing a complex dynamic system with a finite number of components, to which a certain amount of resources and processes are assigned. The system should be able to execute interleaving processes, allowing switching between processes threads of different components, by inhibiting active tasks of less demanding components. 3.1 Preliminaries The Situation Calculus [46, 65] (SC) is a sorted first order language with equality, augmented with a second order induction axiom. The underlying signature of the sorted language is specified by three sorts: Act for actions, S it for situation and Ob j for objects. To simplify reading we usually refer to these sorts as actions, situations and objects. 
The terms of sort actions are either constants or functions mapping elements of sort objects and possibly of sort actions into elements of sort actions, e.g. move(x, y). Terms of sort situation are either the constant symbol S 0 or terms of the form do(a, s), where a is a term of sort action and s is a term of sort situation. The term S 0 denotes the initial situation, where no action has yet occurred, while do(a, s) encodes the sequence of actions obtained from executing the action a after the sequence of actions encoded in s. Properties of objects and their dynamics are described by fluents. Thus fluents denote properties that may change when executing an action, and are specified by either predicates or function symbols whose last argument is a situation. A basic action theory is defined by the following set of axioms BAT=(Σ, DS0 , Dssa , Duna , Dap ). (1) Here • Σ is the set of domain independent foundational axioms for the domain of situations, see Table 1. Situations are kept countably infinite by a second order axiom (see Table 1) assessing that there are no unintended models of the language, in which situations can be strange objects. 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 6 • Duna is the set of unique name axioms for actions, which expresses that different action terms, namely different names, stand for different actions: A(x1 , . . . , xn ) � B(y1 , . . . , yn ), and identical action terms have the same arguments A(x1 , . . . , xn ) = A(y1 , . . . , yn ) → x1 = y1 ∧ · · · ∧ xn = yn . • DS0 is a set of first-order formulas describing the initial state of the domain (represented by S 0 ). • Dssa is the set of successor state axioms [62, 65], one for each fluent symbol F(�x, s), in the language. A successor state axiom is an explicit definition of a fluent in a successor state do(a, s) as follows: F(�x, do(a, s)) ≡ ΦF (�x, a, s). A successor state axiom provides both a definition of action effects and a solution to the frame problem (assuming deterministic actions). Given a basic action theory, it is possible to infer the properties of the theory just appealing to the initial theory DS 0 . This is done by regressing any formula, taking as argument a situation of the form do(am , . . . , do(a1 , S 0 )), into an equivalent formula taking as argument the initial situation S 0 and not mentioning other situations different from S 0 . The regression of a formula φ(do(am , . . . , do(a1 , S 0 ))) is defined via a regression operator R by induction using the definitional structure of the successor state axioms and the properties of R as follows: R(F(�x, do(a, s))) = ΦF (�x, a, s) R(¬φ) = ¬R(φ) R(φ1 ∧ φ2 ) = R(φ1 ) ∧ R(φ2 ) R(∃x.φ) = ∃x.R(φ). (2) The simplicity and elegance of making inference and prove properties of situations is, indeed, due to the structure of the axioms, based on explicit definitions of the successor state. This structure would be prejudiced if state constraints are added to the theory, i.e. formulas mentioning situations neither uniform in S 0 2 nor in the form of successor state axioms. For example the followings state constraints: ∀s.Raise(sun, s). ∀sOn(x, y, s) ∧ On(y, z, s)→On(x, z, s). (3) lacking a definitional structure, would compromise the inference based on regressing sentences to the initial database DS 0 . Golog, was earlier introduced in [38], is an agent programming language formally based on the SC and usually implemented in Eclipse Prolog. 
Golog uses Algol-like control constructs to define complex actions from the primitive actions, which are those of a basic action theory BAT, see (1):
1. Action sequences: p1 ; p2.
2. Tests: φ?.
3. Nondeterministic action choice: p1 | p2.
4. Nondeterministic choice of action arguments: (πx).p(x).
5. Conditionals: if φ then p1 else p2.
6. While loops: while φ do p.
7. Nondeterministic iteration: p∗.
8. Procedure calls: {proc P1(x⃗1) p1 end; . . . ; proc Pn(x⃗n) pn end; p}.
[Footnote 2: Formulas uniform in σ = do(a1, . . . , do(am, S0)), m ≥ 0, are formulas either not mentioning situation terms, or mentioning neither Poss, nor ⊏, nor any situation term other than σ [61].]
An example of a Golog program is while ¬At(1, 2) do (πx, y) moveTo(x, y). Intuitively, the nondeterministic choice (πx, y) moveTo(x, y) is iterated until the atom At(1, 2) is verified. The Golog declarative semantics is defined in the language of the SC. Given a complex action δ (a Golog program), the abbreviation Do(δ, s, s′) says that situation s′ can be reached from situation s by executing some complex action specified by the program δ. The construct definitions are the following:
1. Primitive actions: Do(a, s, s′) def= Poss(a, s) ∧ s′ = do(a, s).
2. Test actions: Do(φ?, s, s′) def= φ[s] ∧ s = s′.
3. Sequence: Do(p1 ; p2, s, s′) def= ∃s″. Do(p1, s, s″) ∧ Do(p2, s″, s′).
4. Nondeterministic choice of two actions: Do(p1 | p2, s, s′) def= Do(p1, s, s′) ∨ Do(p2, s, s′).
5. Nondeterministic choice of arguments: Do(π(x, p(x)), s, s′) def= ∃x. Do(p(x), s, s′).
6. Nondeterministic iteration: Do(p∗, s, s′) def= ∀P.{∀s1. P(s1, s1) ∧ ∀s1, s2, s3. [P(s1, s2) ∧ Do(p, s2, s3) → P(s1, s3)]} → P(s, s′).
For procedure call expansion and other important constructs we refer the reader to [38, 29, 28, 65, 31]. (A small operational sketch of these definitions is given at the end of this subsection.)
3.2 Extensions of the SC
Among the several languages for action theories (such as the Fluent Calculus [68, 17, 76], the Event Calculus [72, 73, 22] and the Action language [26]), the SC is particularly simple to extend and adapt to specific domains, such as, for example, cognitive robotics domains. In fact, being an axiomatic theory, an extension of the SC requires two simple steps: 1. extend the set of foundational axioms Σ to account for richer domains; 2. show that the new extension respects the fundamental constraints required to do inference within the system. The flexibility of both the successor state axioms Dss and the action precondition axioms Dap allows the user to define any domain. This is the reason why there have been many contributions to extensions of the Situation Calculus, such as [42, 58, 40, 66, 61, 4, 59, 21, 9, 65]. All these contributions, further, have coped with the constraints required by regression-based inference, including the specification of the Golog programming language (see Section 7) as in [38, 11, 64, 30], requiring axioms to be based on the construction of explicit definitions. In particular, macro definitions (see also [65] for a paragraph on "Why Macros?") are explicit definitions of predicates that are not added to the language; therefore they also stand as abbreviations of the formulas defining them (the definiens). We refer the reader to [65, 61] for a complete introduction to the inference mechanisms in the Situation Calculus.
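The declarative semantics above has a direct operational reading: a program denotes a relation between situations. The following minimal Python sketch enumerates, for a toy domain, the situations s′ reachable from a situation s; it is an illustration only (not the TFGolog interpreter introduced in Section 7, nor the Prolog implementation mentioned above), and all domain names (moveTo, At, the grid of positions) are assumptions of the sketch.

from itertools import product

# A situation is a tuple of ground actions, e.g. (("moveTo", 1, 2),).
S0 = ()

def do_act(action, s):
    return s + (action,)

# --- toy domain: one fluent At(x, y, s) and one action moveTo(x, y) --------
def At(x, y, s):
    """At(x, y) holds in s iff the last moveTo brought the robot to (x, y)."""
    pos = (0, 0)
    for (name, *args) in s:
        if name == "moveTo":
            pos = tuple(args)
    return pos == (x, y)

def Poss(action, s):
    return action[0] == "moveTo"          # moveTo is always executable here

# --- Do(delta, s, s'): enumerate the s' reachable from s -------------------
def Do(delta, s, depth=4):
    kind = delta[0]
    if kind == "act":                      # primitive action
        a = delta[1]
        if Poss(a, s):
            yield do_act(a, s)
    elif kind == "test":                   # phi?
        if delta[1](s):
            yield s
    elif kind == "seq":                    # p1 ; p2
        for s1 in Do(delta[1], s, depth):
            yield from Do(delta[2], s1, depth)
    elif kind == "choice":                 # p1 | p2
        yield from Do(delta[1], s, depth)
        yield from Do(delta[2], s, depth)
    elif kind == "pick":                   # (pi x). p(x), over a finite domain
        for x in delta[1]:
            yield from Do(delta[2](x), s, depth)
    elif kind == "star":                   # p*, unrolled up to a finite depth
        yield s
        if depth > 0:
            for s1 in Do(delta[1], s, depth):
                yield from Do(("star", delta[1]), s1, depth - 1)

# while not At(1,2) do (pi x,y) moveTo(x,y), encoded as
# ( [not At(1,2)]? ; (pi x,y) moveTo(x,y) )* ; At(1,2)?
grid = [0, 1, 2]
body = ("seq", ("test", lambda s: not At(1, 2, s)),
               ("pick", list(product(grid, grid)),
                        lambda xy: ("act", ("moveTo", *xy))))
prog = ("seq", ("star", body), ("test", lambda s: At(1, 2, s)))

for s_final in list(Do(prog, S0))[:3]:
    print(s_final)

Nondeterministic iteration is unrolled only to a finite depth here, which is enough to show how ;, |, π and ∗ compose; the logical definition above places no such bound.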
In this paper we extend the Situation Calculus by adding a new set of axioms to the set of its foundational axioms, and by introducing macro definitions. In order to ensure that all the constraints are satisfied we need to go into details that are rather boring, although often straightforward, therefore many details are postponed and described in the Appendix. In particular, all proofs, likewise lemmas, for this section are given in Appendix A and Appendix B. 3.3 Time, types and bag of timelines The set of foundational axioms of the Situation Calculus, together with the set of new axioms are reported in Table 1. We introduce three kind of extensions. The first extension, concerning time, is essentially the same as the one introduced by Reiter in [65] and [63, 56], but making explicit the definition of start [65]. With the second kind of extension we introduce name types as objects, i.e. specific elements of sort object, denoted by constants: these are used in the perspective of describing a system with several components that can be each named by a specific constant. Note that because a robot system is composed of a finite set of parts we assume that the set of components is bound to be finite, although what a component can do might be described by an infinite set of actions. Each name type is extensively defined by a collection of actions that the component can execute. Name types are used to classify actions according to the actuating agent, for example often this has been implicitly assumed in the presence of different agents, by naming each agent specifically, here we do the same but in a systematic way. The third extension deals with set of situations, with some specific properties which are obtained by types and time. These kind of situations are called timelines and a set of timelines is here specified by what we call bag of timelines. Notation: in the following sections LX denotes the language specified by the axioms X. More precisely we assume that the signature includes all the symbols (functions and predicates) mentioned in X, equality, the quantifiers and connectives of FOL. Thus, for example, LΣ is the language of the axioms Σ. Also, we shall use the ordering relation s � s� between situations s and s� , that abbreviates s � s� ∨ s = s� (see foundational 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 9 Foundational Axioms of SC, Σ s � do(a, s� ) ≡ s � s� ¬(s � S 0 ) do(a, s) = do(a� , s� ) ≡ s = s� ∧ a = a� ∀P.P(S 0 ) ∧ ∀as.P(s)→P(do(a, s))→∀sP(s) Time axioms, Ax0 Σtime = Σ ∪ Ax0 Flexible Time SC extension axioms A+ = Σ ∪ Ax0 ∪ Ax1 ∪ Ax2 ∪ Ax3 Type axioms, Ax1 -Ax2 ΣH = Σtime ∪ Ax1 , Σ=ν = ΣH ∪ Ax2 Timelines axioms Ax3 and Bag of timelines axioms, Ax4 Σ=ν = ΣH ∪ Σtime = Σ ∪ Ax0 ∪ Ax1 ∪ Ax2 T1. time(S 0 ) = t0 ∀a. H1 (H(i, a)∧ i=1 n n � � (H(i, a)→ ¬H( j, a))) i=1 T2 ∀�x, t.time(A(�x, t)) = t→t > t0 T3. ∀�x, t, s. time(do(A(�x, t), s)) = time(A(�x, t)) n � j=1 j�i H2 ∀a, a� .a=ν a� ≡ ∃i. H(i, a) ∧ H(i, a� )) E1. ∀s, s� (s = s� →s=ν s� )∧ (s � S 0 →¬(s=ν S 0 ∨ S 0 =ν s)) E2. ∀a¬(a=ν S 0 ) E3. ∀a, a� , s� (a =ν do(a� , s� )) ≡ (a =ν a� ) ∧ (s� � S 0 →(s� =ν a)) E4. ∀a� , s, s� (s=ν do(a� , s� )) ≡ (s=ν a� ) ∧ (s� � S 0 →s� =ν s) E5. ∀a, s(s=ν a) ≡ (a=ν s) A+ = Σ=ν ∪ Ax3 ∪ Ax4 � G1. ∀s, s j1 , . . . s jk . s ∈S B(�s j1 , . . . , s jk � j ) ≡ � 1≤p≤k �� � � s = s jp ∧ s jp = S 0 ∨ T (i, s jp ) k∈N i ∀s∀ , � . (s ∈S G2. ≡ s ∈S � ) ≡ ( =S � ) G3. ∃ . = B0 G4. ∀s ∀ ∀i.s = S 0 ∨ T (i, s)→ ∃ � ( � =S ∪S B(s)) G5. 
for all sentences φ ϕ(B 0 ) ∧ (∀ ∀s.ϕ( ) ∧ ϕ(B(s))→ ϕ( ∪S B(s))) →∀ ϕ( ) W1. n � ∀a, s. T (i, do(a, s)) ≡ (s = S 0 ∧ H(i, a))∨ i=1 (s � S 0 ∧ a=ν s ∧ T (i, s)) Table 1: The foundational axioms Σ, of the basic Situation Calculus, the axioms A+ = Σ∪Ax0 ∪Ax1 ∪Ax2 ∪Ax3 ∪ Ax4 of the Flexible time Situation Calculus, extending Σ with time, types and bag o f timelines. The extension of the foundational axioms Σ is incremental. Four sets are built: first the set Σtime , extending Σ with time; then the set ΣH , extending Σtime with types; then the set Σ=ν extending ΣH with equivalence relations over actions and situations; finally the set A+ extending Σ=ν with timelines and bag of timelines. 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 10 axioms Σ in Table 1). We shall use �x to denote a tuple of variables, a to denote variables of sort action, A(�x) to denote action functions with arguments �x. When a situation mentions only actions in the form A(�x) then its only variables are variables of sort object, thus we use the symbol α to denote actions which are either ground or in the form A(�x). We use the symbol s to denote variables of sort situations, σ to denote histories of actions, such as σ = do(am , . . . , do(a1 , S 0 )), that is, a sequence of actions of length m, m >= 1 and S 0 to denote the initial situation, S 0 is a constant. As we shall extend the signature the new symbols will be contextually introduced. � 3.4 Representing Time in TFSC Time has been extensively introduced in the Situation Calculus in [58, 63, 60], where actions are instantaneous, and their time is selected by the function time(.). Durative actions are considered as processes [58, 65], represented by fluents, and durationless actions start and terminate these processes. For example, going(hill, s) is started by the action startGo(hill, t) and it is ended by endGo(hill, t� ). Analogously as in [63, 56], primitive actions are instantaneous and are represented by the term A(�x, t) where t is a special argument representing the execution time. For example, moveT o(room4, 0.5) means that moveT o was executed at time 0.5. We use time selection functions to extract the time of both actions and action sequences. In particular, we introduce a function time : S it ∪ Act → R+ , mapping both situation and actions into the positive real line, thus we implicitly assume that the reals are axiomatised (see the Appendix page 45). We also introduce the relation < and ≤ defined as < ∨ = ranging over the reals R+ . This is a common assumption in the Situation Calculus (see [65]), thus we leave it like this. We denote with Ax0 the set of axioms T 1-T 3 and Σtime = Σ ∪ Ax0 , see Table 1. Axiom T 1 says that the time of the initial situation S 0 is the initial time t0 , axiom T 2 says that the time of an action is the time of its time argument which has to be a positive real number successive to the initial time. Finally the third axiom T 3 says that the time of a situation do(a, s) is the time of the action a. The set Σtime is a conservative extension of the axioms Σ, of the basic Situation Calculus (see Lemma 1 in the Appendix). Here by conservative extension we mean that Σtime is obtained by extending the original language LΣ to the new language LΣtime without changing the initial theory Σ and its deductive closure, when only the original language is considered. The set Ax0 , however, does not ensure that the ordering � on situations is coherent with time. 
In other words s � s� →time(s) ≤ time(s� ) does not hold, in general, in a model M in which Σtime ∪ Duna is verified. Nevertheless it is always possible to build a model of Σtime ∪ Duna in which the above condition is verified (see Lemma 2 in the Appendix). Thus, to add coherence between situations and time we need to add a further axiom T 4. s � s� →time(s) ≤ time(s� ) (4) This new axiom will restrict the set of models to time coherent situations. In this models, although s � s� →time(s) ≤ time(s� ) will be verified, the inverse implication (time(s) ≤ time(s� ))→s � s� in general will not hold. 3.5 Typed Actions and Situations The second column of Table 1 illustrates the axiomatisations for name types. The distinction between sorts and name types is that sorts induce a partition on the domain, while name types are defined in the language via constant symbols involving only the sort Ob j, still inducing a partition on actions hence on situations. In particular, axioms Ax1 = {H1, H2} regulate name types, and the set Ax2 = {E1, E2, E3, E4, E5} extend types to situations via the relation =ν that is defined by Axiom (H2). Axiom (H1) settles the required specifications for name types to be coherent with respect to actions. The disjunction of components mentioned in the first conjunct of (H1) states that each action is ascribed to some component i. On the other hand the second conjunct of (H1) 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 11 states that, whenever an action is ascribed to a component with name type i, it cannot be ascribed to any other component. Note that (H1) does not affect the set Duna of inequalities for actions and, clearly, in a basic action theory, with a single component, (H1) is always satisfied. The axioms (H1) and (H2) can be safely added to Σtime , forming the theory ΣH = Σtime ∪ Ax1 , and the theory ΣH maintains satisfiability, see Lemma 3, in the Appendix page 46. The partition of actions, according to name types, is equipped with the relation =ν , defined by (H2), see Table 1. We show that =ν is an equivalence relation on the set of actions in Lemma 4, see the Appendix, page 47. Axioms E1-E5 (see Table 1, from row nine, second column) are, thus, needed to extend the relation =ν , defined by axiom (H2) for actions, to the set of situations. Axiom (E1) states that if two situations are equal then they must be also of the same type, but no situation is of the same type as S 0 . Axiom (E2) states that no action can be of the same type of S 0 . Axiom (E3) states recursively that an action is of the same type of a situation do(a� , s� ) if it is of the same type of the action a� and it is of the same type as s� , whenever s� is not S 0 . Finally, axiom (E4) says that two situations are of the same type if they mention actions and situations of the same type, and (E5) states symmetry between actions and situations. Also these axioms can be safely added to the theory built so far. Lemma 5, in the Appendix B, page 47, shows that adding axioms Ax2 = E1-E5 to the theory ΣH , in so obtaining the new theory Σ=ν = ΣH ∪ Ax2 , can be done consistently, also if the axioms set Duna is included. Furthermore we show in Lemma 6 (see the Appendix page 48), that =ν is an equivalence relation both on actions and on situations. 
This fact will be used to form timelines (see Section 3.6) Theorem 1 Let Σ be the set of foundational axioms of the Situation Calculus, and let Duna be the set of unique name for actions, then the set of axioms Σ=ν = Σ ∪ Ax0 ∪ Ax1 ∪ Ax2 is a sound axiomatisation of the temporal flexible Situation Calculus, that is, the set of axioms and Duna together form a satisfiable theory. � Now, with H1-H2 a new predicate H is introduced in the language, and it is defined for each component i. Let i1 , i2 , . . . , in be a finite set of constants denoting the components of a system, each ik is a name type. Each H(i, a) can be introduced by an extensional definition as follows: ∀a.H(i, a) ≡ φ(i, a) (5) When the action names for a specific component is a finite set then the extensional description of the component might be done as follows: ∀a.H(i, a) ≡ ∃�x1 , . . . , �xn . n � Ai (�xi ) = a (6) i=1 Example 2 If we want to define the actions for the robot component Pan-tilt we would introduce the constant pan-tilt and define it by its actions as follows: ∀a.H(pan-tilt, a) ≡ ∃ t θ.a = pan(θ, t) ∨ ∃ t γ.a = tilt(γ, t) ∨ ∃ t x y. a = scan(x, y, t). (7) � This set of definitions for H is added to DS 0 , as they are all uniform in S 0 . The easiest way to ensure consistency is to ascribe each action to a single type; in Lemma 7 (see the Appendix page 50) we show the conditions to ensure consistency of the type definitions with the axiom H1. As usual with typed languages there are drawbacks, if we consider a generic action name, such as run, that could be ascribed to more than one component for all its arguments, then we need either to specialise it to each type or to create a single component gathering all those subscribing the action run. 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 12 Figure 2: Timelines on a tree of situations. For this representation H(i, a1 ) ∧ H(i, a2 ) and H( j, a3 ) are possible types, T (i, do(a1 , S 0 )) ∧ . . . T (i, do(a2 , S 0 )), and T ( j, do(a3 , S 0 ) are timelines. By the SSA for timeline, these extend along the situations as indicated by the thick black lines. 3.6 Timelines and bag of timelines We introduce in this section the concept of a timeline. This concept is particularly useful for flexible planning because it makes possible to describe the interaction between processes performed by different components of the system. This concept makes it also possible to deal with the time at which a process starts and ends in a flexible way, according to the way processes interact. A timeline is denoted by a fluent T (i, s) and it is defined by an improper successor state axiom as follows, see Table 1 axiom (W1): (W1) n � i=1 T (i, do(a, s)) ≡ (s � S 0 ∧ a=ν s ∧ T (i, s)) ∨ (s = S 0 ∧ H(i, a)). (8) Note that (8) is not uniform in s3 as it mentions S 0 . Nevertheless the disjunction is obviously exclusive and thus the right hand side never diverges, indeed (8) is regressable as it is shown in Lemma 11, in the Appendix B. Example 3 The timeline for the pan-tilt unit can be defined as follows: ∀a s.T (pan-tilt, do(a, s)) ≡ s � S 0 ∧ a=ν s ∧ T (pan-tilt, s) ∨ s = S 0 ∧ H(pan-tilt, a). Which, by the previous Example 2, are all the histories built up by the actions pan(θ, t), tilt(γ, t) and scan(x, y, t). 
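To make the typing machinery concrete, the small Python sketch below encodes H(i, a) as an extensional table (the pan-tilt entry follows Examples 2 and 3; the nav component and every other name are assumptions of the sketch) and checks the right-hand side of (8) on a situation represented as a sequence of actions.

# A situation is a tuple of actions; each action is (name, args, t).
S0 = ()

# H(i, a): extensional typing of actions by component (cf. Example 2).
H_TABLE = {
    "pan-tilt": {"pan", "tilt", "scan"},
    "nav":      {"startGo", "endGo"},
}

def H(i, action):
    return action[0] in H_TABLE[i]

def same_type(a1, a2):
    """a =_nu a': the two actions are ascribed to the same component (H2)."""
    return any(H(i, a1) and H(i, a2) for i in H_TABLE)

def timeline(i, s):
    """T(i, s) for s = do(a, s'), following the successor state axiom (8):
    either s' = S0 and H(i, a), or s' != S0, a =_nu s' and T(i, s')."""
    if s == S0:
        return False                      # S0 itself is not a timeline
    *rest, a = s
    prev = tuple(rest)
    if prev == S0:
        return H(i, a)
    return same_type(a, prev[-1]) and timeline(i, prev)

sigma = (("pan", (0.3,), 1.0), ("tilt", (0.1,), 2.0), ("scan", (1, 2), 3.0))
print(timeline("pan-tilt", sigma))        # True: all actions typed pan-tilt
print(timeline("nav", sigma))             # False
mixed = (("pan", (0.3,), 1.0), ("startGo", ((0, 0),), 2.0))
print(timeline("pan-tilt", mixed))        # False: history mixes components

The check mirrors the recursion of the axiom: a history is a timeline for component i exactly when every action in it is ascribed to i.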
Note that, given the extended set Σ=ν the introduction of timelines does not affect the set of successor state axioms and action precondition axioms, thus: 3 The typical form of a successor state axiom requires the right hand side to mention only the situation s named in the left hand side, being thus uniform in s, to ensure that the regression via the right hand side ends in S 0 , in so not diverging into two, or more, different situations. 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 13 Corollary 1 Let D ss ∪Dap be the set of successor state axioms mentioning also timelines and action precondition axioms. Let DS 0 be the set of formulas, uniform in S 0 4 . Σ=ν ∪ Duna ∪ DS 0 is satisfiable iff Σ=ν ∪ Duna ∪ DS 0 ∪ Dap ∪ D ss is satisfiable. The characteristics properties of timelines are stated below. Theorem 2 A timeline represents the =ν -equivalence class of situations of the same type. The above theorem (see B.4 for the proof) states that all actions in a timeline are of the same type, and whenever a set of actions are of the same type they form a timeline, thus not all situations form timelines. In Section 4 we show that, under precise conditions, the set of timelines form the set of situations executable by the system components. Example 4 Timelines are sequences (histories) of actions indicated in thick black in Figure 2. The histories of actions not belonging to the set of timelines are represented in light gray, that is, histories of actions leading to situations, that do not belong to timelines, are depicted by thin gray lines. So a timeline represents the equivalence class of histories of the same type, yet how to ensure a meaningful interaction between timelines that can support switching tasks, is not proven. Suppose that we need to say that while the robot is exploring a given environment to correctly scan the surrounding it should stop or decelerate. The system component controlling the exploration actions should suitably synchronise with the component controlling the pan-tilt and the camera. To treat the interleaving between these two processes we have to ensure that at each time step of the operation loop all timelines are available for choice and switching decisions. To this end we introduce a new concept that can support set of timelines. This new concept comes with a new sort, we call this new sort the sort of bag of timelines. Intuitively, a bag of timelines is interpreted as a set of lists of actions, where each list of actions is a timeline. We require a bag of timelines to be a finite set and to mention only situations which are timelines, and possibly S 0. First we have to introduce the sort S standing for bag of timelines. This is defined as the codomain of a countable set of function symbols B : (S n �→ S n ) �→ S mapping a permutation of a tuple of situations into an element of the sorted domain, whose intended interpretation is a bag of timelines. Equality on these terms should account for idempotence and commutativity. Therefore we shall extend equality to account for permutations and repetitions of equal arguments. An S -term is defined as B(�s j1 , . . . , s jm � j ). Here, if m = 0 we obtain the empty bag of timelines, and we denote the empty bag with the constant B 0 . With �s j1 , . . . , s jm � j we denote a permutation of {1, . . . , m}. 
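Before the formal axiomatisation that follows, one deliberately naive reading of these terms may help: a bag of timelines behaves like a finite set of situations, so representing it as a frozenset in Python gives idempotence and commutativity of =S for free, with membership restricted to S0 and timelines. The component table and action names below are assumptions of the sketch, carried over from the previous sketch.

# A situation is a tuple of actions; S0 is the empty history.
S0 = ()

# Extensional typing of actions by component, as in the previous sketch.
TYPES = {"pan-tilt": {"pan", "tilt", "scan"}, "nav": {"startGo", "endGo"}}

def is_timeline(s):
    """T(i, s) holds for some component i: all actions in s share one type."""
    return s != S0 and any(all(a[0] in acts for a in s) for acts in TYPES.values())

def bag(*situations):
    """B(<s_1, ..., s_m>): order and repetition are irrelevant (idempotence
    and commutativity of =_S), and every member must be S0 or a timeline."""
    assert all(s == S0 or is_timeline(s) for s in situations)
    return frozenset(situations)

B0 = bag()                                     # the empty bag

s1 = (("pan", 0.3, 1.0), ("tilt", 0.1, 2.0))   # a pan-tilt timeline
s2 = (("startGo", (0, 0), 1.5),)               # a nav timeline

print(bag(s1, s2) == bag(s2, s1, s1))          # True: permutation, repetition
print(s1 in bag(s1, s2))                       # membership, in_S
print(bag(s1) | bag(s2) == bag(s1, s2))        # union of bags, union_S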
To formalise these ideas we adapt the finite set axiomatisation, from the system F of Brown and Wang [79], to add bags of timelines to the language together with the specific symbols ∈S , =S and empty bag, here identified with the constant term B 0 . The axioms, listed in Table 1 are here reported again, where all variables appearing without quantifiers are implicitly universally quantified outside. � � �� � � (G1) ∀s.s ∈S B(�s j1 , . . . , s jk � j ) ≡ 1≤p≤k s = s jp ∧ s jp = S 0 ∨ i T (i, s jp ) k∈N Here i ranges over the finite set of name types. (G2) ∀s. (s ∈S ≡ s ∈S � ) ≡ ( =S � ) (9) (G3) ∃ . = B 0 (G4) ∀s∀i. s = S 0 ∨ T (i, s)→∃ � . ( � =S ∪S B(s)) (G5) For every sentence ϕ : ϕ(B 0 ) ∧ (∀ ∀s.ϕ( ) ∧ ϕ(B(s))→ϕ( ∪S B(s)))→∀ ϕ( ) 4 See for the definition of uniform formulas Section 3.1 and [61]. Here DS 0 mentions also all the definitions H(i, a) for each name type i. 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS 14 The above axiom set (G1) defines the symbol ∈S . Note that axiom (G1) could be bound by a n ∈ N and transformed into a single axiom, otherwise there is an axiom for each k ∈ N. The set of axioms (G1) says that a situation s can belong to a bag of timelines, if s is equal to some of the timelines specified in the bag and each situation in the bag of timelines is either S 0 or it is, indeed, a timeline. Note that, following Theorem 2, although S 0 is not a timeline it can belong to a bag of timelines. Axiom (G2) is the extensionality axiom limited to bags of timelines. Axiom (G3) is the unconditional existence axiom, provided that the empty bag is the constant term B 0 . Axiom (G4) is the conditional existence for finite bags of timelines, provided that and B(s) are finite bags. Axiom (G4) too tells us that a bag of timelines can include S 0 . This axiom would allow bags unbound in size. To get bags bound in size whenever axiom (G1) requires so for some n then (G4) is changed accordingly. Note that, in (G4), ∪S is derivable from (G1), see the set operations as obtained in Example 6. Finally the last axiom (G5) is the inductive characterisation of finite sets, it tells that whenever sentences specify terms denoting bags of timelines these terms will denote finite bags. Example 5 Let us consider the two timelines T (pan tilt, do(pan(θ), do(tilt(γ), S 0 ))) and T (laser, do(acquire, S 0 )), then: a. b. c. B(�do(pan(θ), do(tilt(γ), S 0 )) p1 , do(acquire, S 0 ) p2 � p ) B(�do(pan(θ), do(tilt(γ), S 0 )) p1 , do(acquire, S 0 ) p2 � p ) =S B(�do(acquire, S 0 )q1 , do(pan(θ), do(tilt(γ), S 0 ))q2 �q ) B(�do(tilt(γ), S 0 ) p1 , do(tilt(γ), S 0 ) p2 , S 0 p3 , S 0 p,4 � p ) =S B(�do(tilt(γ), S 0 ) p1 , S 0 p3 , � p ). is a bag of timelines, by (G1) by (G1,G2) by (G1,G2) � The other usual symbols for sets can be extended to bag of timelines, using definitions or, more generally, the induction axiom. Example 6 The operators ⊆S , ∪S and ∩S can be defined as follows: x. ( ⊆S ) xx. ( ∪S xxx. ( ∩S � � de f ∀s.s ∈S →s ∈S = =S ) =S ) de f ≡ (s ∈S ∀s.s ∈S = de f ≡ (s ∈S ∀s.s ∈S = ∨ s ∈S ∧ s ∈S � ) � ) (10) On the other hand it is possible to use induction to prove properties of bag of timelines. For example the following simple property: ∀ , �, �� . ⊆S � ∧ � ⊆S �� → ⊆S �� (11) . can be proved by induction as follows: 1. 2. 3. 4. Let: ϕ(B 0 ) ϕ( ) ◦ ϕ( ∪ ◦ ) = ∀ � , �� .B 0 ⊆S � ∧ � ⊆S �� →B 0 ⊆S �� . = ∀ � , �� . ⊆S � ∧ � ⊆S �� → ⊆S �� . = B(s) = ∀ � , �� .( ∪ ◦ ) ⊆S � ∧ � ⊆S �� →( ∪ ◦ ) ⊆S (12) �� . 3 BASICS FOR THE TEMPORAL FLEXIBLE SITUATION CALCULUS a. b. c. d. e. f. g. h. i. j. 
Then : φ(B 0 ) ≡ � ( ∪S ◦ ) ⊆S � ∧ � ⊆S �� → ⊆S � ∧ � ⊆S �� ( ∪S ◦ ) ⊆S � ∧ � ⊆S �� → ◦ ⊆S � ∧ � ⊆S �� ( ⊆S � ) ∧ � ⊆S �� → ⊆S �� ( ◦ ⊆S � ) ∧ � ⊆S �� → ◦ ⊆S �� ( ⊆S � ) ∧ ( � ⊆S �� ) ∧ ( ◦ ⊆S � )→ ◦ ⊆S �� ∧ ⊆S ⊆S �� ∧ ◦ ⊆S �� →( ∪S ◦ ) ⊆S �� ( ∪S ◦ ) ⊆S � ∧ � ⊆S �� → ∪S ◦ ⊆S �� ϕ(B 0 ) ∧ (∀ � ∀s.ϕ( ) ∧ ϕ(B(s))→ϕ( ∪S ◦ )) ∀ ϕ( ) �� 15 By (G1) and (G2) by (G2), (x) and (xx) of Ex. 6 and Taut. by (G2), (x) and (xx) of Ex. 6 and Taut. by Ind. Hyp. by Ind. Hyp. (13) by d,e and Taut. by f, (G2), (x) and (xx) of Ex. 6 by b, g and Taut. by a, d, e, and g by i and (G5) A precedence relation �S between two bags of timelines can be defined as follows: �S ≡ ∀s ∃s� .s ∈S →s� ∈S ∧ s � s� ∧ ∀s� ∃s.s� ∈S →s ∈S ∧ s� � s (14) � The axiomatisation of bags of timelines is sound. Let Ax3 = G1-G5 and A+ = Σ=ν ∪ Ax3 then: Theorem 3 A+ ∪ Duna is satisfiable. � So far we have extended the language of a basic theory of actions in the Situation Calculus to include time, types, a new equality symbol =ν and bag of timelines, the final language is thus LT FS C . The extended language in particular includes all the formulas inductively defined using also the following set of atoms which, in turn, can be defined using the symbols =S and ∈S : Definition 1 If and are terms of sort bag of timelines and s is a term of sort situation, then s ∈S , =S ∪S � , =S ∩S � , =S \S � , ⊆S , �S are atoms of the extended language. =S , In [65] sets are often implicitly assumed, for example to define sets of actions with concurrent processes. Here the definition of sets of situations through bag of timelines is more involved. Indeed, we shall use them in Section 5 to build macro definitions of temporal compatibilities, from which we shall obtain the temporal network specifying time constraints and temporal relations. Macro definitions will then be reduced to sentences of the TFSC, therefore to prove properties about these sentence we shall often use regression, a central computational mechanism in the Situation Calculus. We introduce here the theorem ensuring that sentences mentioning bag of timelines are regressable, under analogous restriction conditions given in [61] and we refer the reader to the Appendix, page 54, for the details. Here by a regressable sentence and a k-uniform term we mean, respectively, a sentence and a term that satisfy the conditions specified in [61] and suitably extended to bag of timelines (see the Appendix, Definition 4 and Definition 5, page 55). Let D+ be a basic action theory with Σ extended to A+ (see the above Theorem 3), then: Theorem 4 Let φ( 1 , . . . , k ) be a regressable sentence mentioning terms of sort bag of timelines. There exists a formula R(φ( 1 , . . . , k )) uniform in S 0 such that; D+ |= R(φ( 1 , . . . , k )) ≡ φ( 1 , . . . , k) (15) 4 4 THE SYSTEM AT WORK: PROCESSES IN TFSC 16 The system at work: processes in TFSC For each type Hi , encoding a system component, we assume that there exists a set of processes and a set of fluents describing the behaviours of the component. It follows that also these actions need to be specified for the type H(i, a) of each component i. Processes span the subtree of situations, over a single interval between a start and end action: for each process there are two actions, starting and ending the process, abbreviated by startπ , meaning starts process π and endπ , meaning ends process π. To simplify the presentation we shall add to the start and end actions the type which, in general, given the H and the =ν is not needed. 
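The successor state axioms below make this precise; as a rough operational picture first, the following Python sketch reads a component's timeline as a list of timestamped start/end events, recovers the interval of each process, and checks that the component never runs two processes at once (the consistency requirement discussed below). The event encoding and all process names are assumptions of the sketch.

# A timeline is a list of timestamped start/end events for one component,
# e.g. ("start", "move", ("room4",), 0.5) or ("end", "move", ("room4",), 3.0).
# The sketch recovers, for each process, the interval in which it was active,
# and checks that the component never runs two processes at the same time.

def process_intervals(timeline_events):
    active = {}          # process name -> (args, start time)
    intervals = []       # (process, args, t_minus, t_plus)
    for kind, proc, args, t in sorted(timeline_events, key=lambda e: e[3]):
        if kind == "start":
            assert not active, f"{proc} started while {list(active)} is active"
            active[proc] = (args, t)
        elif kind == "end":
            args0, t_minus = active.pop(proc)
            intervals.append((proc, args0, t_minus, t))
    return intervals, active   # 'active' holds processes still running

nav_events = [
    ("start", "go",   ("p1", "p2"), 1.0),
    ("end",   "go",   ("p1", "p2"), 6.0),
    ("start", "stop", (),           6.0),
]
done, still_active = process_intervals(nav_events)
print(done)          # [('go', ('p1', 'p2'), 1.0, 6.0)]
print(still_active)  # {'stop': ((), 6.0)}  -> stop is active, no end yet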
A process is denoted by a fluent π(i, �x, t− , s), where i is for the type and t− for its start time. Successor state axioms for processes (Dπ ) extend the set D ss of successor state axioms for fluents and are defined as follows: π(i, �x, t− , do(a, s)) ≡ a = startπ (i, �x, t− ) ∨ π(i, �x, t− , s) ∧ ∀t.a � endπ (i, �x, t). (16) For example the process for the component nav moving towards θ, can be defined as: move(nav, θ, t− , do(a, s)) ≡ a=startmove (nav, θ, t− ) ∨ move(nav, θ, t− , s) ∧ ¬∃t� .a = endmove (nav, θ, t� ). As usual (see [65]) a situation is defined to be executable as follows: de f executable(s) = ∀a, s� .s = d(a, s� ) � s→Poss(a, s� ) (17) On the other hand, given a set of processes related to a timeline T (i, s), their distribution on the timeline is controlled by the fluent Idle(i, s) telling whether a process, of type i, is being executed at the situation s. The successor state axiom for Idle is defined as follows: Idle(i, do(a, s)) ≡ (s = S 0 ∧ H(i, a) ∨ T (i, s) ∧ a=ν s) ∧ �� � �� � x t.a = endπ (i, �x, t) ∨ π∈Π ¬∃�x t.a = startπ (i, �x, t) ∧ Idle(i, s) . π∈Π ∃� (18) That is, Idle(i, s) lasts up to the process starting and after its end. We can then break down the processes along a timeline using the preconditions axioms (Dap ) as follows: Poss(startπ (i, �x, t), s) ≡ (s = S 0 ∨ s=ν startπ (i, �x, t)) ∧ Idle(i, s) ∧ time(s) ≤ t ∧ Φ start (i, �x, s); Poss(endπ (i, �x, t), s) ≡ (s = S 0 ∨ s=ν endπ (i, �x, t)) ∧ ∃t− π(i, �x, t− , s) ∧ time(s) < t ∧ Φend (i, �x, s). (19) Here Φ start (i, �x, s) and Φend (i, �x, t, s) are the precondition formulas for the execution of startπ and endπ respectively. These can possibly refer to other timelines, hence to other components. We do not further investigate here this possibility, instead we follow the approach in [43], where global constraints like e.g. Poss(a, s) → time(s) ≤ time(a) are specified by action preconditions of the form Poss(A(�x, t), s) ≡ Φ(�x, t, s) ∧ time(s) ≤ t. Indeed, here, time(s) < t is required for the endπ (i, �x, t), to filter out durationless processes. If a process of a component i is already active in S 0 all other processes of the same components cannot be active. This proviso is intuitive, each component of the complex system can execute a process at a time and the component is idling only if none of its processes are active. This requirement is expressed by the following property, for all types i: � Idle(i, S 0 ) ∨ ∃�x.π(i, �x, t0 , S 0 ) → ¬Idle(i, S 0 ) ∧ ∀�y ¬π� (i, �y, t0 , S 0 ) (processes consistency). (20) � π ∈Π π�π� If π(i, �x, t0 , S 0 ) holds for some �x in S 0 , then this is the only active process of type i, hence ¬Idle(i, S 0 ) holds too, because the i-component has an active process and so it is not idling. Note that there is no need to have a complete description of the initial situation DS 0 . Let us define Dπ to be the set of successor state axioms 5 TEMPORAL INTERVALS AND CONSTRAINTS 17 for processes, the set of action precondition axioms for processes and the successor state axiom for Idle, let D ss ∪ Dap be the set of successor state axioms and action precondition for fluents, and let DS 0 be the set of formulas uniform in S 0 , such that the above requirement (20) is satisfied in DS 0 . Let DT be the theory formed by D ss ∪ Dap ∪ Dπ ∪ DS 0 ∪ A+ ∪ Duna , then Theorem 5 DT is satisfiable iff DS 0 ∪ A+ ∪ Duna is. � Given the successor state axioms (16) and (18) along with the preconditions (19) the processes consistency property (20) holds for each executable timeline (see 17). 
We first show that any executable situation is a timeline. For this we may assume that the action preconditions for fluents (not processes) must be of the form: Poss(A(�x, t), s) ≡ (s = S 0 ∨ s=ν A(�x, t)) ∧ Φ(A(�x, t), s) (21) with A any action, possibly different from startπ and endπ . Proposition 1 Let DT = D ss ∪ Dap ∪ Dπ ∪ DS 0 ∪ A+ ∪ Duna , then, any executable situation σ is a timeline. � Using the above result we can state: Proposition 2 Let DT be as in Theorem 5 such that (20) holds in DS 0 , and let σ be an executable situation, then for any process π and type i: � � � � � DT |= Idle(i, σ) ∨ ∃�x t.π(i, �x, t, σ) → ¬Idle(i, σ) ∧ ∀�y π�π y, t, σ) . (22) π� ∈Π ¬π (i, � � The precondition axioms in (19) compel executability only on timelines. This requirement does not impose that preconditions of actions are unaffected by other timelines as, in fact, they might be specified in the ΦQ , but simply that there exists a component able to execute an action sequence. This notion is useful for the generation of executable flexible plans. On the other hand, hybrid executability both for processes and fluents, would require to introduce a distinction between: executability within the component (based on the preconditions (19)) and executability within the system. With two notions of executability at hands one could exploit non-timeline situations to reason about the system. For example a situation like σ = do([startgo (nav, pos1 , 1), start scan (pan, 2), end scan (pan, 3), endgo (nav, pos1 , 5)], S 0 ) could be used as a system log and exploited to infer properties about the overall system behaviour. Here we derive only the first notion of executability and we do not develop the latter. 5 Temporal Intervals and Constraints So far we have given the basic formalism to model parallel processes that can be executed on timelines specified by different components. The way these processes interact in terms of time can be expressed by time constraints taken from the classical relations [2, 47, 6] between time intervals, see Figure 3. Notation: In this section we shall denote process and fluent symbols with uppercase letters as P, Q, . . . to treat them uniformly, while in the example names of processes are all indicated by lowercase letters. All the defined predicates are macros, hence they are not added to the language and all the fluents appearing on the right hand side of the definition, that is, in the definiens, are defined by a successor state axiom. This fact ensures that macros cannot be reduced to state constraints. A temporal interval is denoted by [t− , t+ ]. Temporal interval relations before, meets, overlaps, during, starts, finishes, equals are denoted by b, m, o, d, s, f, e respectively. 5 TEMPORAL INTERVALS AND CONSTRAINTS 18 We represent the free temporal variables using the notation t to indicate that a temporal variable t occurs free; when necessary, we use τ to represent either a temporal variable that occurs free or a ground term instantiating a temporal variable. Further, we introduce the notation σ[ω] ( [ω] for bags of timelines) to explicitly denote a tuple ω = �t1 , t2 . . . , tn � of free temporal variables mentioned in a situation σ(bag of timelines ). Example 7 Consider the usual temporal interval relations: b, m, o, d, s, f, e, as defined in the temporal intervals literature, started in [2]. Let pan-tilt and nav (for navigation) be two components of the system, with the processes scan belonging to the pan-tilt component and stop belonging to the nav component. 
To express that the process scan can be performed while nav is stopped we would like to say: scan d stop, this constraint should be encoded in a suitable TFSC formula mentioning the fluents scan(pan−tilt, t, s) and stop(nav, t, s). � We, thus, begin by defining two predicates S tarted and Ended taking as arguments the processes/fluents arguments together with the starting time and the ending time, respectively. For each process/fluent P, these predicates are defined as follows: S tartedP (i, �x, t− , a, s) EndedP (i, �x, t , a, s) + de f = P(i, �x, do(a, s)) ∧ ¬P(i, �x, s) ∧ time(a) = t− de f = P(i, �x, s) ∧ ¬P(i, �x, do(a, s)) ∧ time(a) = t+ (23) The meaning of S tartedP is the following: a process P which does not hold in s is started by action a at time t− in so becoming active. On the other hand, the meaning of EndedP is: a process P which is currently holding in s is ended by action a at time t+ in so becoming elapsed. We can now define explicitly the temporal characterisation of a process in a time lapse. de f ActiveP (i, �x, t− , do(a, s)) = T (i, do(a, s)) ∧ S tartedP (i, �x, t− , a, s) ∨ ActiveP (i, �x, t− , s) ∧ ¬∃t+ Ended(i, �x, t+ , a, s) de f Elapsed(i, �x, t− , t+ , do(a, s)) = T (i, do(a, s)) ∧ Elapsed(i, �x, t− , t+ , s) ∨ Ended(i, �x, t+ , a, s) ∧ ActiveP (i, �x, t− , s) (24) The meaning of Active and Elapsed is intuitive: a process is active if it started at some time t− before the current time and it is still holding, while it is elapsed if it was active at some time before but is no more active. We assume that at time t0 , the time of S 0 , there is no record of past processes but there might be active processes just started at time t0 . This is expressed in the following definitions: de f ActiveP (i, �x, t− , S 0 ) = P(i, �x, S 0 ) ∧ time(S 0 ) = t− de f ElapsedP (i, �x, t− , t+ , S 0 ) = ⊥ (25) Example 8 For example, the interval during which the fluent predicate at(nav, o, x, s) lasts (nav is for the navigation component) can be described by Elapsedat (nav, o, x, t− , t+ , s) and Activeat (nav, o, x, t− , s) described as follows. Elapsedat (nav, o, x, t− , t+ , do(a, s)) de f Activeat (nav, o, x, t− , do(a, s)) de f = = T (i, do(a, s)) ∧ (Elapsedat (nav, o, x, t− , t+ , s)∨ Endedat (nav, t+ , o, x, a, s) ∧ Activeat (nav, o, x, t− , s)) ; T (i, do(a, s)) ∧ (S tartedat (nav, t− , o, x, a, s)∨ Activeat (nav, o, x, t− , s) ∧ ¬(∃t+ Endedat (nav, t+ , o, x, a, s))) . � 5 TEMPORAL INTERVALS AND CONSTRAINTS 19 With the aid of ElapsedP and ActiveP we can represent the above interval relations between processes, and fluents, specified in DT according to a TFSC formula Fop suitably built up from a combination of ElapsedX , ActiveX (where X denotes the fluent or process they refer to). Let op denote an interval relation: P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ] def = Fop (i, j, �x, �y, ti− , ti+ , t−j , t+j , s, s� ). (26) In particular, we focus on the interval relations op ∈ {b, m, o, d, s, f, e}. Here, P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j ) is a situation-suppressed expression that represents the interval relation between P and Q independently from the situations’ instances, while the expression [s, s� ] restores the situations in the formula. In the following example, we show how some of these relations can be represented in TFSC using the form (26). Example 9 The interval relations m, f, s, and d can be macro-defined as follows. i. 
Relation P(�x) m Q(�y): P(i, �x, ti− , ti+ ) m Q( j, �y, t−j , t+j )[s, s� ] def = ElapsedP (i, �x, ti− , ti+ , s) → � � (ActiveQ ( j, �y, t−j , s� ) ∨ ElapsedQ ( j, �y, ti− , t+j , s� )) ∧ (ti+ = t−j ) . P(i, �x, ti− , ti+ ) m Q( j, �y, t−j , t+j ) holds over the timelines s and s� , with T (i, s) and T ( j, s� ) if, whenever P ends at ti+ , Q starts at t−j with ti+ = t−j . ii. Relation P(�x) f Q(�y): P(i, �x, ti− , ti+ ) f Q( j, �y, t−j , t+j )[s, s� ] def = ElapsedP (i, �x, ti− , ti+ , s) → � � ElapsedQ ( j, �y, ti+ , t+j , s� ) ∧ (ti+ = t+j ) . P(i, �x, ti− , ti+ ) f Q( j, �y, t−j , t+j ) holds over the timelines T (i, s) and T ( j, s� ) if, whenever P ends at ti+ , Q ends at t+j with ti+ = t+j . iii. Relation P(�x) s Q(�y): P(i, �x, ti− , ti+ ) s Q( j, �y, t−j , t+j )[s, s� ] def = (ElapsedP (i, �x, ti− , ti+ , s) → (ActiveQ ( j, �y, t−j , s� ) ∨ ElapsedQ ( j, �y, ti− , t+j , s� )) ∧ (ti− = t−j )) ∨ (ActiveP (i, �x, ti− , s) → (ActiveQ ( j, �y, t−j , s� ) ∨ ElapsedQ ( j, �y, ti− , t+j , s� )) ∧ (ti− = t−j )). P(i, �x, ti− , ti+ ) s Q( j, �y, t−j , t+j ) holds over the timelines T (i, s) and T ( j, s� ) if, whenever P starts at ti− with argument �x along s, then Q starts at t−j = ti− , with argument �y, along s� . iv. Relation P(�x) d Q(�y): P(i, �x, ti− , ti+ ) d Q( j, �y, t−j , t+j )[s, s� ] def = (ElapsedP (i, �x, ti− , ti+ , s) → (ActiveQ ( j, �y, t−j , s� ) ∨ ElapsedQ ( j, �y, ti− , t+j , s� ))∧ (t−j ≤ ti− ∧ ti+ ≤ t+j )) ∨ (ActiveP (i, �x, ti− , s) → (ActiveQ ( j, �y, t−j , s� ) ∨ ElapsedQ ( j, �y, ti− , t+j , s� )) ∧ (t−j ≤ ti− )). 5 TEMPORAL INTERVALS AND CONSTRAINTS x 20 x before y y after x y x meets y y met by x x x overlaps y y overlapped by x x x x equals y y x starts y y started by x x x during y y contains x x x x finishes y y finished by x Figure 3: Allen Interval Relations P(i, �x, ti− , ti+ ) d Q( j, �y, t−j , t+j ) holds over the timelines T (i, s) and T ( j, s� ) if, whenever P starts at ti− (and ends at ti+ ) with argument �x along s, then Q started at t−j ≤ ti− (and ends at ti+ ≤ ti− ), with argument �y, along s� . � 5.1 Temporal Compatibilities: Syntax Given the abbreviations described in the previous section, as P(.) op Q(.), we can construct what we call compatibilities that regulate how each process (or fluent) Pi behaves along the timelines. We denote compatibilities by comp(Pi , LLists), where Pi is, indeed, either a process or a fluent symbol, and LList is a list of lists, named List, of pairs (op, P j ) composed of an interval relation op and a process or fluent symbol P j . The set of temporal compatibilities for a given action theory DT in TFSC is denoted T c , and the syntax for their construction is given below: Tc LLists List ::= ::= ::= [ ] | [comp(Pi , LLists) | T c ]; [ ] | [List | LLists]; [ ] | [(op, P j ) | List]. Example 10 A set of two compatibilities mentioning the interval relations m, d, b, e binding the interaction between the processes P1 , P2 , P3 and P4 is defined as follows: Tc = [ comp(P1 , [[(m, P2 ), (d, P3 )], [(b, P4 ), (e, P3 )]]), comp(P2 , [[(s, P1 ), (m, P4 )], [(d, P3 )]]) ]. Here the compatibilities state that either (1) the process P1 meets P2 and is during P3 , or (2) P1 is before P4 and ends P3 ; moreover, either (3) P2 starts P1 and meets P4 or (4) it is during P3 . 5.2 Temporal Compatibilities: Semantics Temporal compatibilities T c , similarly as in Golog, are not first class citizens of the language, thus their semantics is defined through TFSC macros. 
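Before unfolding the macros, it may help to note that the syntax of Section 5.1 is plain nested lists; the Python sketch below mirrors Example 10 directly (the process names P1-P4 are the placeholders used there; the tuple representation is an assumption of the sketch, not part of the formal language).

# comp(P, LLists) is a pair (P, LLists); LLists is a list of Lists; each List
# is a list of (op, Q) pairs, exactly as in the grammar of Section 5.1.
Tc = [
    ("P1", [[("m", "P2"), ("d", "P3")],        # P1 meets P2 and is during P3,
            [("b", "P4"), ("e", "P3")]]),      # ... or P1 b P4 and P1 e P3
    ("P2", [[("s", "P1"), ("m", "P4")],        # P2 starts P1 and meets P4,
            [("d", "P3")]]),                   # ... or P2 during P3
]

def mentioned_relations(tc):
    """Collect every (P, op, Q) triple mentioned by a set of compatibilities."""
    return [(p, op, q) for p, llists in tc for lst in llists for op, q in lst]

print(mentioned_relations(Tc))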
The time variables mentioned in the compatibilities (see previous paragraph) 5 TEMPORAL INTERVALS AND CONSTRAINTS 21 play an important role in their construction because tasks switching might not be defined in advance, that is, constraints might be qualitatively but not metrically fixed. For example, we know that event A has to occur before B, without knowing the precise duration or timing. We show in Section 6 that this flexibility can be managed by temporal variables whose values are constrained by the temporal network associated with the described compatibilities. We recall that we indicate the free temporal variable using the notation t and the notation s[ω] to denote the occurrence of free temporal variables ω = �t1 , t2 . . . , tn � in a situation s. For example, the situation σ1 = do(endπ (i, t1 ), do(startπ (i, 1.5), S 0 )) represents a process started at time 1.5 with the ending time denoted by the free variable t1 . The semantics of the above defined temporal compatibilities is specified by a predicate I(T c , [ω]) denoting the constraints associated with a bag of situations given the T c compatibilities. The predicate I(T c , [ω]) is obtained by eliciting the time constraints of the variables ω, according to the tail recursive construction illustrated below, using two further predicates I1 and I2 : def I([ ], ) = � . def I([comp(P(i, �x), LLists) | T c ], ) = I1 (comp(P(i, �x), LLists), ) ∧ I(T c , ) . (27) Here the induction is defined with respect to LLists: if LLists is empty then I is true; otherwise I is the conjunction of the predicate I1 taking as argument, to be expanded, the compatibility comp(P(i, �x), LLists) and the predicate I taking as argument the remaining compatibilities T c . The macro expansion construction proceeds as follows. def I1 (comp(P(i, �x), [ ]), ) = ⊥ . def I1 (comp(P(i, �x), [List | LLists]), ) = I2 (P(i, �x), List, ) ∨ I1 (comp(P(i, �x), LLists), ) . (28) The I1 macro denotes the compatibilities of P(i, �x) and it is defined by a disjunction with the I2 macros described below. Each I2 (P(i, �x), List, ) macro collects the set of temporal constraints specified by the compatibility comp(P(i, �x), [List]) over the timelines mentioned in the bag of situations . def I2 (P(i, �x), List, ) = ∃s( s ∈ ∧ T (i, s)∧ � (29) ( {(op,Q( j,�y))∈List} ∃s� (s� ∈ ∧ T ( j, s� )∧ ∀�x, ti− , ti+ ∃�y, t−j , t+j (P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ])))). So the predicate I2 is the bottom of the expansion as it reduces to a conjunction of statements P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ]) that we already know how to transform into a formula of TFSC, see Section 5. Note, in particular, that the bag of timelines mentioned in I2 , serves to pick up a pair of situations for each temporal constraint, according to its type i. That is, the expansion of I2 (P(i, �x), List, ) says that each element (op, Q( j, �y)) ∈ List specifying a temporal relation op with the processes P(i, �x), holding over the timeline s ∈ , holds over the timeline s� ∈ compatibly with the constraint op. Example 11 Consider the timelines for the two components nav(for navigation) and eng (for engine) depicted in Figure 4. The involved compatibilities are represented by the following T c term Tc = [ comp(at(nav, x), [[(d stop(eng)) ] ]), comp(go(nav, x, x� ), [[(e run(eng)) ] ]) ], Here T c states that: at(nav, x) d stop(eng), that is, the agent navigation can be at a specified position x only while the engines are stopped. 
On the other hand the temporal constraint go(nav, x, x� ) e run(eng) tells us that the processes go(nav, x, x� ) and running(eng) start and stop at the same time. 5 TEMPORAL INTERVALS AND CONSTRAINTS S0 nav do(startgo,S0) go at do([startgo,endgo],S0) at Elapsedat(nav,p1,t0,t-1) Elapsedgo(nav,p1,p2,t-1,t+1) t-1 <= t-2 t-1 = t-2, t+1 = t+2 Elapsedstop(eng,t0,t-2) eng 22 stop S0 Elapsedrun(eng,t-2,t+2) run do(startrun,S0) Activeat(nav,p2, t+1) t+2 <= t+1 Activestop(eng,t+2) stop do([startrun,endrun],S0) Figure 4: Timelines represented by [t−1 , t+1 , t−2 , t+2 ] = B({ do([startgo (nav, p1 , p2 , t−1 ), endgo (nav, p1 , p2 , t+1 )], S 0 ), do([startrun (eng, t−2 ), endrun (eng, t+2 )], S 0 )}), and temporal constraints defined by the macro I(T c , [ω]), with ω = �t−1 , t+1 , t−2 , t+2 �. Note that each relation Elapsed and Active labelling the timeline nav implies the temporal constraint labelling the arrow. The timelines in Figure 4 are designated by the following bag of situations: [ω] = B({ do([startgo (nav, p1 , p2 , t−1 ), endgo (nav, p1 , p2 , t+1 )], S 0 ), do([startrun (eng, t−2 ), endrun (eng, t+2 )], S 0 )}), where ω = �t−1 , t+1 , t−2 , t+2 � are time variables. To establish the temporal constraints that hold over the timelines in [ω] using the compatibilities T c (see Figure 5), we can built the predicate I(T c , [ω]) on , with time variables ω, according to definition (28), as follows: I(T c , ) def = I1 (comp(at(nav, x), [[(d stop(eng)) ] ]), )∧ I1 (comp(go(nav, x, x� ), [[(e run(eng)) ] ]), ), that is, macro expanding I1 in terms of I2 (by (28)), I(T c , ) def = I2 (at(nav, x), [(d stop(eng))], ) ∧ I2 (go(nav, x, x� ), [(e run(eng))], ), According to the above I2 expansion, we obtain: def I2 (at(nav, x), [(d stop(eng))], ) = ∃s(s ∈ ∧ T (nav, s)∧ ∃s� (s� ∈ ∧ T (eng, s� ) ∧ ∀x, t1 , t2 ∃t3 , t4 (at(nav, x, t1 , t2 ) d stop(eng, t3 , t4 )[s, s� ]))) ; def I2 (go(nav, x, x� ), [(e run(nav))], τ2 , ) = ∃s(s ∈ ∧ T (nav, s)∧ ∃s� (s� ∈ ∧ T (nav, s� ) ∧ ∀x, y, t1 , t2 ∃t3 , t4 (go(nav, x, y, t1 , t2 ) e run(eng, t3 , t4 )[s, s� ]))). Collecting everything together we obtain that I(T c , ) reduces to the following TFSC formula denoting the tem- 5 TEMPORAL INTERVALS AND CONSTRAINTS {(0,0)} at 23 go {(0,0)} at {(0,0)} S0 {d} {e} {d} {(0,0)} stop {(0,0)} run {(0,0)} stop Figure 5: Temporal constraint network associated with the compatibilities T c = [comp(at(nav, x), [[(d stop(eng)) ] ]), comp(go(nav, x, x� ), [[(e run(eng))]])] and the timelines represented by [t−1 , t+1 , t−2 , t+2 ] = B({ do([startgo (nav, p1 , p2 , t−1 ), endgo (nav, p1 , p2 , t+1 )], S 0 ), do([startrun (eng, t−2 ), endrun (eng, t+2 )], S 0 )}). poral constraints relative to and T c (see Figure 4). I(T c , ) def = ∃s(s ∈ ∧ T (nav, s)∧ ∃s� (s� ∈ ∧ T (eng, s� ) ∧ ∀x, t1 , t2 ∃t3 , t4 ( Elapsedat (nav, x, t1 , t2 , s) → (Active stop (eng, y, t3 , s� ) ∨ Elapsed stop (eng, y, t3 , t4 , s� )) ∧ (t1 = t3 ∧ t2 = t4 )∨ Activeat (nav, x, t1 , s) → (Active stop (eng, y, t3 , s� ) ∨ Elapsed stop (eng, y, t3 , t4 , s� )) ∧ (t1 = t3 )))∧ � � ∃s (s ∈ ∧ T (eng, s� ) ∧ ∀x, t1 , t2 ∃t3 , t4 ( Elapsedgo (nav, x, t1 , t2 , s) → (Activerun (eng, y, t3 , s� ) ∨ Elapsedrun (eng, y, t3 , t4 , s� )) ∧ (t3 ≤ t1 ∧ t2 ≤ t4 )∨ Activego (nav, x, t1 , s) → (Activerun (eng, y, t3 , s� ) ∨ Elapsedrun (eng, y, t3 , t4 , s� )) ∧ (t3 ≤ t1 )))). 
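The recursive structure of the macro expansion (27)-(29) can also be read procedurally. The sketch below, with hypothetical names and a purely symbolic output, walks over a compatibility term exactly in the order the macros do: one conjunct per compatibility, one disjunct per alternative List, and one interval relation per (op, Q) pair; the quantification over timelines and over the process arguments is left implicit in this sketch.

```python
# A procedural reading of the macros I / I_1 / I_2 (definitions (27)-(29)).
# The expansion is returned as a nested ("and" / "or" / "rel") structure;
# timeline and argument quantifiers are omitted.

def expand_I(T_c):
    # (27): one I_1 per compatibility, conjoined; an empty T_c expands to true
    return ("and", [expand_I1(head, llists) for (head, llists) in T_c])

def expand_I1(head, llists):
    # (28): one I_2 per alternative List, disjoined; an empty LLists expands to false
    return ("or", [expand_I2(head, lst) for lst in llists])

def expand_I2(head, lst):
    # (29): one interval relation P op Q per element of the List, conjoined
    return ("and", [("rel", head, op, q) for (op, q) in lst])

# Example 11: 'at' must hold during 'stop', 'go' must equal 'run'
T_c = [("at(nav,x)",    [[("d", "stop(eng)")]]),
       ("go(nav,x,x')", [[("e", "run(eng)")]])]
print(expand_I(T_c))
```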
Discussion

In the TFSC framework, parallel timelines are associated with their own sets of processes and fluents; processes and fluents belonging to different timelines therefore influence each other mainly through temporal constraints. In this way components remain loosely coupled, and it is precisely the temporal constraints that allow flexible temporal behaviours to be captured. Indeed, in the TFSC framework the starting and ending points of processes are not fixed, and events associated with different components are not sequenced; hence only temporal constraints can be enforced. This approach allows us to (1) represent temporally flexible behaviours in their generality and (2) keep the simple structure of the basic theory of actions for each component. Notice also that fluents belonging to two separate components can easily be related through temporal compatibilities: e.g. specifying $P(i,\vec{x},t_1^-,t_1^+)\ \mathbf{e}\ Q(j,\vec{y},t_2^-,t_2^+)$ implies that whenever $P(\cdot)$ holds on timeline $i$, $Q(\cdot)$ must hold on timeline $j$. Furthermore, since the temporal compatibilities are expressed by a TFSC formula, it is possible to infer properties of parallel timelines and their constraints. E.g. it is possible to ask whether
$$
D_T \models \exists t, p\,[at(nav, p, t, \sigma_2) \land stop(eng, t, \sigma_1) \land \sigma_1 \in \Sigma \land \sigma_2 \in \Sigma] \land I(T_c, \Sigma),
$$
where $\Sigma$ denotes the bag of timelines defined as specified in Example 11. This formula combines parallel processes and temporal constraints. In the next sections, we shall show how it is possible to decouple logical and temporal reasoning in TFSC.

6 Mapping TFSC to Temporal Constraint Networks

In this section, we introduce the construction of a transformation from the compatibility formula $I(T_c, \Sigma)$, which has a macro definition (see (26)) within a background theory $D_T$ of TFSC, into a general temporal constraint network (TCN) [47]. More specifically, we show that, given a domain theory $D_T$, a set of temporal compatibilities $T_c$, and a bag of timelines $\Sigma[\omega]$ mentioning the free temporal variables $\omega$, it is possible to build a temporal network as a disjunction of conjunctions of algebraic relations $\mu_{op}$ over time.

6.1 Temporal Constraint Network

A Temporal Constraint Network (TCN) is a formal structure for handling metric information; the concept was first introduced by Dechter, Meiri and Pearl in [13] and then extended in [14] and in [47] to handle both quantitative and qualitative information. Temporal knowledge, represented by propositions, can be associated with intervals, and relationships between the timing of events can be represented by constraints. For example, the statement "the robot was close to the door before it could see it, but it was still there after it had processed the images" can be represented as:
(a) $closeTo(r, d, t_1^-, t_1^+)\ \mathbf{b}\ see(r, d, t_2^-, t_2^+)$
(b) $closeTo(r, d, t_1^-, t_1^+)\ \mathbf{a}\ scan(r, d, t_2^-, t_2^+)$
A TCN offers a simple representation schema for temporal statements, exploiting a temporal algebra of relations that can be expressed by a directed constraint graph, where each node is labelled by an event associated with a temporal interval and directed edges between nodes denote the temporal constraints. Essentially, a TCN involves a set of variables $\{t_1^-, t_1^+, \ldots, t_n^-, t_n^+\}$, with time intervals $[t^-, t^+]$ representing the duration of specific events (e.g. $closeTo$), and a set $R$ of binary constraints drawn from the 13 possible relationships that can be stated between any pair of intervals [2]; these are illustrated in Figure 3.
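As a concrete illustration, statements (a) and (b) above can be encoded as a small set of constraints over interval end-points and checked against a candidate schedule. The sketch below is merely illustrative and is not the TCN machinery of [13, 14, 47]; the function names are ours, and only the two relations used above are encoded.

```python
# Intervals as (t-, t+) pairs and a toy check of the two example constraints:
#   (a) closeTo b see   -- closeTo ends before see starts
#   (b) closeTo a scan  -- closeTo starts after scan ends ('a' is the inverse of 'b')
# A TCN is consistent if at least one assignment satisfies every constraint.

def before(i, j):   # Allen 'b'
    return i[1] <= j[0]

def after(i, j):    # Allen 'a', the inverse of 'b'
    return j[1] <= i[0]

constraints = [("closeTo", before, "see"),
               ("closeTo", after,  "scan")]

def satisfied(assignment):
    return all(rel(assignment[x], assignment[y]) for (x, rel, y) in constraints)

# one candidate solution: scan, then closeTo, then see
print(satisfied({"scan": (0.0, 2.0), "closeTo": (3.0, 5.0), "see": (6.0, 8.0)}))   # True
```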
Note that constraints can be expressed disjunctively, for example if we consider the events at(r, P1 ) and moveT o(r, P2 ), then in the TCN we could express the statement at(r, P1 ) {m, s} moveT o(r, P2 ), saying that the event at(r, P1 ) can either start or meet the event moveT o(r, P2 ). According to the underlying temporal algebra, TCN s can express different forms of reasoning; among the most well known are the Point Algebra [78], and the metric point algebra [14], an extensive overview can be found in [6]. Let {t1− , t1+ , . . . , tn− , tn+ }, n ∈ N be a set of time variables, where each couple of variables ti− , ti+ denotes an interval − + [ti , ti ] possibly associated with some event; let TCN involve a set of binary constraints R = {op1 , . . . , opm }, m ∈ M. The temporal constraint network TCN represented with a labelled direct graph can be described using conjunctions and disjunctions of constraints as follows: � � − + − + (30) z∈Z i, j∈Jz [ti,z , ti,z ] opi,z [t j,z , t j,z ]. The assignment V = {�v−i , v+i � | v+i (ti+ ) = si and v−i (ti− ) = ei with si , ei ∈ R+ , si < ei } to the variables is called a solution if it satisfies all the constraints in R, defining the TCN. The network is consistent if at least one solution exists (see [14]). A classification of complexity for satisfiability problems (specifically for the Allen’s interval algebra), has been given in [35], following previous results of [52]. 6.2 Mapping compatibilities to temporal constraints Consider the macro definition (26) and the definitions of the temporal operators {m, b, f, d, s, e} as given in Example 9. We have that: de f P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ] = Fop (i, j, �x, �y, ti− , ti+ , t−j , t+j , s, s� ) 6 MAPPING TFSC TO TEMPORAL CONSTRAINT NETWORKS 25 where Fop (·) is a formula of LT FS C (definiens) while P(·) op Q(·) is a macro of the pseudo language (definiendum). Clearly, by M, v |= P(·) op Q(·) we mean M, v |= Fop (·). Furthermore, we can note that each op can be given an algebraic interpretation γop of a temporal constraint á la Allen as follows. Let op ∈ {b, m, o, d, s, f, e}, there is an algebraic relation interpreting op, say γop , defined as follows: def γb (ti− , ti+ , t−j , t+j ) = (ti+ ≤ t−j ) def def γm (ti− , ti+ , t−j , t+j ) = (ti+ = t−j ) def γo (ti− , ti+ , t−j , t+j ) = (ti− ≤ t−j ∧ ti+ ≤ t+j ) γd (ti− , ti+ , t−j , t+j ) = (t−j ≤ ti− ∧ ti+ ≤ t+j ) def γs (ti− , ti+ , t−j , t+j ) = (ti− = t−j ) def def γf (ti− , ti+ , t−j , t+j ) = (ti+ = t+j ) (31) γe (ti− , ti+ , t−j , t+j ) = (ti− = t−j ∧ ti+ = t+j ) Within the TFSC approach, the definiens Fop (·) is interpreted into structures of the TFSC, letting the assignments to temporal variables ω to freely vary on these structures. However we shall show that the TCN, that we obtain from the predicate I(T c , [ω]), will make it possible to suitably specify these variables values. The following theorem states that the predicate I(T c , [ω]) can be transformed into a normal form, given a suitable indexing of the time variables with respect to the processes and the interval relations op. Theorem 6 Let [ω] be a bag of timelines mentioning a set of timelines {σ1 , . . . , σn }, where each σi is a timeline term and where ω collects all the free variables in . Then the predicate I(T c , [ω]) can be reduced to the following form: � � x∃�y.Pz,q1 (iz,q1 , �x, τ−z,q2 , τ+z,q2 ) opz,q3 Qz,q3 ( jz,q3 , �y, τ−z,q4 , τ+z,q4 )[σiz,q1 , σ jz,q3 ]. 
(32) z∈Z �q1 ,q2 ,q3 ,q4 �∈Jz ∀� Here, Z and Jz are finite sets of indexes and the τi, j s are either free variables or ground terms mentioned in ω. � Theorem 6 allows to eliminate temporal quantifiers in I(T c , [ω]) obtaining a normal form where temporal terms τi, j can be are either temporal variables or ground terms. Here, the index z ranges on the disjunctions, while the other indexes q1 , . . . , q4 range on the possible conjuncts Pz,q1 , on the associated temporal variables τz,q2 , on the relations opz,q3 Qz,q3 , and on their variables τz,q4 . To put in evidence the interval relations, when i, j, σi , σ j can be extrapolated from the context, we use the abbreviation ϕop (τ−k , τ+k , τ−g , τ+g ) to denote the interval relation: ∀�x∃�yP(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ]. (33) Now, given the disjunction of conjunctions of interval relations as defined in (32), we need to make explicit the underlying algebraic relations. The algebraic interval relations associated with ϕop (τ−k , τ+k , τ−g , τ+g ) depends also on the associated domain theory DT . In fact, the left hand side P(·) of the interval relation P(·) op Q(·) works as the enabler of the interval constraint: if no process (or fluent) P(·) is either active or elapsed along the timelines, then the associated interval relation op is not applicable; otherwise, if P(·) is active or elapsed, the algebraic relation for op depends whether P(·) is still active or is elapsed. More precisely, we distinguish the following three cases: E P : DT |= ∃�x, t− , t+ ElapsedP (i, �x, t− , t+ , σi ) ∧ σi ∈ , AP : DT |= ∃�x, t− ActiveP (�x, t− , σi ) ∧ σi ∈ , NP : neither E P nor AP hold. (34) Given these cases, we can introduce the algebric relation µop (tk− , tk+ , tg− , tg+ ) associated with ϕop (tk− , tk+ , tg− , tg+ ) as follows: m1 m2 m3 If NP holds then, for the given op, no temporal constraint is imposed; if E P holds then, for the given op, µop = γop ; (35) if AP holds then we have that, for op ∈ {m, f} no temporal constraint is imposed, as for the remaining cases: 6 MAPPING TFSC TO TEMPORAL CONSTRAINT NETWORKS µb = γb , µe = (tk− = tg− ), µo = (tk− ≤ tg− ), 26 µd = (tg− ≤ tk− ). The following theorem shows the relation between µop and ϕop . Theorem 7 Let µop be any of the algebraic interval relations defined above, and let M = (D, I) be a structure of LT FS C such that M is a model of DT and suppose that for some assignment v to the free temporal variables the following holds: (i) M, v |= ∀�x∃�y.P(i, �x, τ−p , τ+p ) op Q( j, �y, τ−q , τ+q )[σi , σ j ], (ii) M, v |= ∃�x ElapsedP (i, �x, τ−p , τ+p , σi ) or M, v |= ∃�x ActiveP (�x, τ−p , σi ). Here σi , σ j are ground timelines of type i and j, with τ+p (I,v) = d+p , τ−p (I,v) = d−p , τ+q (I,v) = dq+ , τ−q (I,v) = dq− with d+p , d−p , dq− , dq+ elements of the domain D canonically interpreted in R+ . Then, the algebraic relation µop holds on �d−p , d+p , dq− , dq+ �. � The theorem says that, given the conditions (i) and (ii) for a temporal relation ∀�x∃�yP(·) op Q(·), the algebraic relation µop holds on the same time values. Note that, the condition (i) of Theorem 7 states that the temporal relation ∀�x∃�yP(·) op Q(·) is consistent with respect to the theory DT and the bag of timelines . Instead, the conditions (ii) play the role of the conditions m2 and m3 . 6.3 Compatibilities without logical constraints We now want to introduce a notion of consistency which depends only on the logical structure independently of the temporal constraints. 
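Before doing so, the case analysis just given can be summarised operationally. In the following sketch (Python, invented names), `status` stands for the outcome of the logical tests $E_P$, $A_P$, $N_P$ of (34) against the domain theory and the timelines, the `GAMMA` entries transcribe (31), and `mu` applies the rules (m1)-(m3) of (35); it is only an illustration of the selection, not the formal construction.

```python
# gamma_op from (31) as predicates over the end-points (ti-, ti+, tj-, tj+),
# and the selection rules (m1)-(m3): which constraint mu_op is imposed for
# P op Q depending on whether P is elapsed ('E'), active ('A') or neither ('N').

GAMMA = {
    "b": lambda ti0, ti1, tj0, tj1: ti1 <= tj0,                     # before
    "m": lambda ti0, ti1, tj0, tj1: ti1 == tj0,                     # meets
    "o": lambda ti0, ti1, tj0, tj1: ti0 <= tj0 and ti1 <= tj1,      # overlaps
    "d": lambda ti0, ti1, tj0, tj1: tj0 <= ti0 and ti1 <= tj1,      # during
    "s": lambda ti0, ti1, tj0, tj1: ti0 == tj0,                     # starts
    "f": lambda ti0, ti1, tj0, tj1: ti1 == tj1,                     # finishes
    "e": lambda ti0, ti1, tj0, tj1: ti0 == tj0 and ti1 == tj1,      # equals
}

# Constraints used when P is still active and its end point is undetermined,
# as listed in (35) for rule (m3).
ACTIVE_CASE = {
    "b": GAMMA["b"],
    "e": lambda ti0, ti1, tj0, tj1: ti0 == tj0,
    "o": lambda ti0, ti1, tj0, tj1: ti0 <= tj0,
    "d": lambda ti0, ti1, tj0, tj1: tj0 <= ti0,
}

def mu(op, status):
    """Return the constraint to impose, or None when no constraint applies."""
    if status == "N":                 # (m1): P never holds on its timeline
        return None
    if status == "E":                 # (m2): P has elapsed, gamma_op applies in full
        return GAMMA[op]
    # (m3): op in {m, f} imposes no constraint; cases not listed in (35)
    # are also left unconstrained in this sketch
    return ACTIVE_CASE.get(op)
```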
Considering again the macro definition (26) and the definitions of the temporal operators as given in Example 9, by {P(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ]}Λ we indicate the formula obtained from P(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ] excluding its temporal constraints. For example, for op = m {P(i, �x, ti− , ti+ ) m Q( j, �y, t−j , t+j )[s, s� ]}Λ = ElapsedP (i, �x, ti− , ti+ , s) → ((ActiveQ ( j, �y, t−j , s� ) ∨ ElapsedQ ( j, �y, ti− , t+j , s� )). We can now introduce the notion of partially consistent interval relation. Definition 2 Given a TFSC theory DT and a bag of timelines , the interval relation P(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ] is partially consistent with respect to DT if there exists a model M of DT and an assignment v to the free temporal variables such that M, v |= {P(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ]}Λ � This notion allows us to separate the temporal and logical structure associated to the compatibilities and will be exploited in the construction of the network illustrated in the next subsection. We are now ready, reversing the process and using the results of Theorem 6 and Theorem 7, to show how from the compatibility constraint predicate I(T c , [ω]) the temporal constraint network can be built. 6.4 Network Construction The network construction proceeds as follows. 6 MAPPING TFSC TO TEMPORAL CONSTRAINT NETWORKS 27 Temporal Network Temporal relations µ (τ− , τ+ , τ− , τ+ ) � �op p −p +q −q + Temporal constraint network µop (τ p , τ p , τq , τq ) Assignment V solution of the network V = {�v−i , v+i � | v+i (ti+ ) = si , v−i (ti− ) = ei with si , ei ∈ R+ , si < ei } Compatibilities and constraints in TFSC Temporal variables ω=�t−1 , t+1 , . . . t−n , t+n �, n ∈ N Compatibility term Tc Constraint formula for with free variables in ω I(T c , [ω]) Between TFSC and TCN mapping Mapping Temporal Constraints ζ : (DT , T c , [ω]) → TCN Mapping Ordering Constraints Ord : [ω] → TCN Table 2: Notation for TFSC and TCN mapping. (a) Transform the predicate I(T c , [ω]) into a disjunctive normal form, as indicated in Theorem 6, obtaining the form (32): � z∈Z � �q1 ,q2 ,q3 ,q4 �∈Jz ∀�x∃�y.Pz,q1 (iz,q1 , �x, τ−z,q2 , τ+z,q2 ) opz,q3 Qz,q3 ( jz,q3 , �y, τ−z,q4 , τ+z,q4 )[σiz,q1 , σ jz,q3 ]. (b) Let τ−1,1 , τ+1,1 , . . . , τ−n,m , τ+n,m be time variables or instances thereof mentioned in each of the conjuncts ∀�x∃�y.P(·) op Q(·) as in (32). (c) For each conjunct ∀�x∃�y.Pz,q1 (iz,q1 , �x, τ−z,q2 , τ+z,q2 ) opz,q3 Qz,q3 ( jz,q3 , �y, τ−z,q4 , τ+z,q4 )[σiz,q1 , σ jz,q3 ], if this is partially consistent with DT then, given the E P , AP , and NP as specified in (34), we can define the corresponding µqop1 z,q3 as specified by the rules (m1 )-(m3 ) in (35). Otherwise, if not partially consistent in DT , the associated index z is collected in a set of indexes Z ∗ , that is, z ∈ Z ∗ . (d) From the above disjunctive normal form (32), we can obtain the algebraic relations: � � µqop1 z,q (τ−z,q2 , τ+z,q2 , τ−z,q4 , τ+z,q4 ). z∈Z � (q1 ,q2 ,q3 ,q4 )∈Jz 3 Here, Z � is for Z \ Z ∗ and collects only the indexes not excluded at the step (c) (i.e. Z � collects only consistent disjuncts). Here, µqop1 z,q is the algebraic temporal relation indexed by z, q3 - as the opz,q3 in the form (32) - and q1 be3 cause depending on Pz,q1 according to the rules (m1 )-(m3 ). 
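Schematically, steps (a)-(d) amount to the following filtering-and-translation pass over the normal form (32). In this sketch `partially_consistent` and `status_of` stand in for the logical tests against $D_T$ and the timelines, and `mu` is the selection summarised in the previous subsection; the names are ours and the code is only meant to make the construction tangible.

```python
# Steps (a)-(d) over the normal form (32): each disjunct is a list of tuples
# (P, op, Q, (tk_minus, tk_plus, tg_minus, tg_plus)).  Disjuncts containing a
# conjunct that is not partially consistent with D_T are dropped (index set Z*);
# the remaining conjuncts are translated into algebraic constraints via mu.

def build_network(disjuncts, partially_consistent, status_of):
    network = []                                    # disjunction of conjunctions
    for conjuncts in disjuncts:                     # one disjunct per z in Z
        if not all(partially_consistent(c) for c in conjuncts):
            continue                                # z in Z*, excluded at step (c)
        clause = []
        for (p, op, q, variables) in conjuncts:
            constraint = mu(op, status_of(p))       # rules (m1)-(m3)
            if constraint is not None:
                clause.append((constraint, variables))
        network.append(clause)                      # step (d): z in Z'
    return network                                  # the network zeta(D_T, T_c, .)
```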
That is, given the domain theory DT and the bag of timelines , the above algebraic relation can be obtained from the form (32) once we substitute each conjunct � � q1 ∀�x∃�y.P(·) op Qz, j (·) as specified in the step (c). µopz,q3 (·, ·, ·, ·) is, indeed, the temporal constraint network implicitly represented by I(T c , [ω]) given DT . In other words, this network is a labelled direct graph which can be described using conjunctions and disjunctions of temporal constraints. Therefore, following the notation introduced in Section 6.1, we can denote this temporal network as ζ(DT , T c , [ω]) because it depends on DT , T c , and [ω]. Notice that in the temporal network ζ(DT , T c , [ω]) obtained by the (a)-(d) transformation, the time ordering constraints, enforced by the situations mentioned in the bag of timelines [ω], are not made explicit, although 7 FLEXIBLE HIGH LEVEL PROGRAMMING IN TFGOLOG 28 they hold in force of Lemma 2, (see Section 3). Therefore, to obtain the complete set of time constraints implicit in [ω], given T c and DT , we have to consider also the temporal ordering defined by the structure of the bag of timelines [ω]. That is, for any situation σ[ω� ] ∈ [ω] (where ω� are the variables mentioned in σ), if σ1 [ω1 ] � σ2 [ω2 ] � σ[ω� ], then time(σ1 [ω1 ]) ≤ time(σ2 [ω2 ]) ≤ time(σ[ω� ]). Given the time variables t−1 , t+1 , . . . , t−m , t+m used for the time variables ω, we call this set of ordering constraints Ord( [ω]). For example, considering again the bag of timelines depicted in Figure 4, Ord( [ω]) = (t0 ≤ t+1 ∧ t−1 ≤ t+1 ) ∧ (t0 ≤ t−2 ∧ t−2 ≤ t+2 ) collecting the ordering constraints associated with the two timelines in [ω]. Corollary 2 Let ζ(DT , T c , [ω]) be a temporal constraint network obtained by I(T c , [ω]), according to steps (a-d) of the network construction and let M be a structure of TSFC which is a model for DT . Let V = {�v−i , v+i � | v+i (ti+ ) = si and v−i (ti− ) = ei with si , ei ∈ R+ , si < ei } be an assignment to the time variables. Then V is a solution for the network iff for any assignment v to the free temporal variables of [ω], which is like V, M, v |= I(T c , [ω]). � The corollary says that we can use the temporal constraint network as a service for the theory of actions to fix the temporal constraints between processes and fluents. Given the compatibility constraints and the Ord ordering constraints introduced above, we can express with network(DT , T c , [ω]) the conjunction of the temporal constraint network and the ordering constraints as follows: network(DT , T c , [ω]) = ζ(DT , T c , [ω]) ∧ Ord( [ω]). (36) We shall not discuss in this paper methods of simplification of the constraints nor for computing a satisfiable set of time values for a temporal network. 7 Flexible High Level Programming in TFGolog In this section, we introduce the syntax and the semantics of a TFGolog interpreter that can be used to generate a temporal constraint network and the related flexible temporal plan. 7.1 TFGolog Syntax Given the extended action theory presented above, the following constructs inductively build Golog programs: 1. Primitive action: α. 2. Nondeterministic choice: α|β. Do α or β. 3. Test action: φ?. Test if φ is true in the current bag of timelines. 4. Nondeterministic argument choice: choose �x for p(�x). 5. Action sequence: p1 ; p2 . Do p1 followed by p2 . 6. Partial order action choice: p1 ≺ p2 . Do p1 before p2 . 7. Parallel execution: p1 �p2 . Do p1 concurrently with p2 . 8. Conditionals: if φ then p1 else p2 . 9. 
Nondeterministic iteration: p∗ . Do p n times, with n ≥ 0. 7 FLEXIBLE HIGH LEVEL PROGRAMMING IN TFGOLOG 29 10. While loops: while φ do p1 . 11. Procedures, including recursion. Hence, compared to Golog, here we also have the parallel execution and partial order operator that can be defined over parallel timelines. Example 12 Considering again the two components nav (for navigation) and eng (for engine), depicted in Figure 4, a possible TFGolog program encoding the robot task approaching position pos, within the time interval d, can be written as follows: proc(approach(d, pos), π(t1 , π(t2 , π(t3 , [π(x, startgo (nav, x, pos, t1 )) ≺ (at(nav, pos)?) � startrun (eng, t3 ) ; endrun (eng, t2 ) ; (t2 − t1 <d)?]))) ). Here, we are stating that, the robot starts to go to pos at time t1 , meanwhile the engine starts to work at time t3 and it is switched off, at the arrival to pos, at time t2 . Notice that endgo is not explicitly specified, but should be inferred by the interpreter because needed to satisfy at(nav, pos). 7.2 TFGolog Semantics The semantics of a TFGolog program p with respect to DT can be defined in the TFSC. Given an initial bag of timelines , an interval (h s , he ) specifying the time horizon over which the program is to be instantiated from h s to he , the execution trace � of a program p is recursively defined through the macro DoT F(p, , � , (h s , he )). First we shall introduce some further notation that extends Golog abbreviations to bag of timelines: � def = ddo(a, s, ) = (s ∈ ∧ a=ν s) ∧ ( � = ( \S B({s})) ∪S B({do(a, s)})). (37) Here, the two operations of difference and union are the one already defined for bag of timelines, as shown in Example 6 and Definition 1. Notice that for a = A(�x, t) with t free variable, equals to ddo(A(�x, t), σ, ) with mentioning the free variable t. Furthermore, we extend the function time to bags of timelines as follows: def ttime( ) = max{time(σ)|σ ∈ } (38) Here max{.} is defined by a first order formula, for example if it were defined for two elements it would be as follows: def max{a, b} = x = (x = a ∧ (a > b) ∨ x = b ∧ (b > a)). Further, we define the executability of a bag of timelines, over a specified horizon (h s , he ) as exec( , � , (h s , he )) def = ( = � ∧ ttime( ) ≥ h s ∧ ttime( ) ≤ he ) ∨ ∃ �� , s, a( � = ddo(a, s, �� )∧ (39) executable(do(a, s)) ∧ exec( , �� , (h s , he )) ∧ time(a) ≥ h s ∧ time(a) ≤ he ), exec( , � , (h s , he )) states that � is an executable extension of . The definition is inductive with respect to � , where the base case is � = and the inductive step is given for � = ddo(a, s, �� ), assuming exec( , � , (h s , he )), with do(a, s) ∈ � executable timeline such that s ∈ �� . We can now specify the DoT F(p, , � , (h s , he )) as follows. 7 FLEXIBLE HIGH LEVEL PROGRAMMING IN TFGOLOG 30 1. Primitive action with horizon: def DoT F(a, , � , (h s , he )) = ∃s(s ∈ ∧ a=ν s ∧ Poss(a, s) ∧ time(s) ≥ h s ∧ time(s) ≤ he ∧ time(s) ≥ time(a)∧ (time(a) ≤ he ∧ � = ddo(a, s, ) ∨ time(a) > he ∧ = � ))). Here, if the primitive action is applicable to s ∈ , and a can be scheduled after the horizon then it is neglected along with the rest of the program (i.e. each action, which can start after the horizon could be neglected; this temporal planning strategy is employed in several timeline-based planners, e.g. [51, 33]). Notice that Poss(a, s) require a and s to be of the same type. Notice also that, for a = A(�x, t) with t free variable, the free variable t is mentioned in � . 
We recall that [ω] denotes a bag of situations with free variables ω. 2. Program sequence: def DoT F(prog1 ; prog2 , , � , (h s , he )) = ∃ �� (DoT F(prog1 , , �� , (h s , he )) ∧ DoT F(prog2 , Here, the second program prog2 is executed starting from the execution �� �� , � , (h s , he )) of the first program prog1 . 3. Partial-order action choice: def DoT F(prog1 ≺ prog2 , , � , (h s , he )) = ∃ �� , ��� (DoT F(prog1 , , �� , (h s , he ))∧ DoT F(prog1 , ��� , � , (h s , he )) ∧ �� �S ��� ) ∧ exec( �� , ��� , (h s , he )) Here, given the execution �� of the first program prog1 , the second program prog2 can be executed starting from an executable extension ��� of �� . If �� = ��� then we have the sequence case. 4. Parallel execution: def DoT F(prog1 � prog2 , , � , (h s , he )) = ∃ �� DoT F(prog1 , , � , (h s , he )) ∧ DoT F(prog2 , , �� , (h s , he )) ∧ ( � = �� ). The parallel execution of two programs from , under the horizon (h s , he ), can be specified by the conjunction of the execution of the two programs over the timelines � and �� . The execution is correct iff the obtained timelines are equal. 5. Test action: def DoT F(φ?, , � , (h s , he )) = φ[ ] ∧ = � . Here φ[ ] stands for a generalisation of the standard φ[s] (in the TFSC) extended to bag of timelines, e.g. P1 [ ] ∧ P2 [ ] stands for P1 (s1 ) ∧ P2 (s2 ) with s1 , s2 ∈ , i.e. each fluent is evaluated with respect to its specific timeline. 6. Nondeterministic action choice: def DoT F(prog1 |prog2 , , � , (h s , he )) = DoT F(prog1 , , � , (h s , he )) ∨ DoT F(prog2 , , � , (h s , he )). Here, analogously to standard Golog, the execution of the action choice is represented as the disjunction of the two possible executions. 7 FLEXIBLE HIGH LEVEL PROGRAMMING IN TFGOLOG 31 7. Nondeterministic argument selection: def DoT F(π(x, prog(x)), , � , (h s , he )) = ∃xDoT F(prog(x), , � , (h s , he )). The execution of the nondeterministic argument selection is represented as in standard Golog. 8. Conditionals: def DoT F(if φ then prog1 else prog2 , , � , (h s , he )) = φ[ ] ∧ DoT F(prog1 , , � , (h s , he )) ∨ ¬φ[ ] ∧ DoT F(prog2 , , � , (h s , he )). 9. Nondeterministic iteration: def DoT F(prog∗ , , � , (h s , he )) = ∀P{∀ 1 P( 1 , 1 ) ∧ ∀ 1 , 2 , 3 [P( 1 , 2) ∧ DoT F(prog, 2 , 3 , (h s , he )) → P( 1 , 3 )]} → P( , � ). 10. The semantics of conditionals, while loops, and procedures is defined in the usual way. We show, now, that given two fully ground bags of timelines init and such that DoT F(prog, init , , (h s , he )) then the timelines in the bag of timelines complete the timelines in init . Furthermore, we show that, if the initial bag of timelines init mentions only executable timelines, then mentions only executable timelines too. Proposition 3 Let DT |= DoT F(prog, init , , (h s , he )) with ttime( ) ≤ he and h s ≤ ttime( init ), then init �S . � Proposition 4 Let DT |= DoT F(prog, init , , (h s , he )) with ttime( ) ≤ he and h s ≤ ttime( executable, then any σ ∈ is executable. init ), if any σ� ∈ init is � 7.3 Generating Flexible Plans in TFGolog In this section, we describe how TFGolog programs characterise temporally flexible execution traces represented by bags of timelines. The DoT F(prog, init , , (h s , he )) macro defined above defines the bag of timelines that are executable extensions of init within the horizon (h s , he ), but temporal constraints are not considered. 
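Purely to make the recursive structure of the DoTF cases concrete, the following sketch mirrors a few of them as a nondeterministic dispatcher over program terms. It is not the interpreter of [10]: programs are nested tuples with invented tags, `do_action` and `holds` abstract the primitive-action case and the test evaluation, and bags of timelines are treated as opaque values.

```python
# A toy dispatcher mirroring some DoTF cases: sequence, parallel composition,
# test, nondeterministic choice and conditionals.  Each call yields the bags of
# timelines reachable by executing 'prog' from 'bag' within 'horizon'.

def DoTF(prog, bag, horizon, do_action, holds):
    tag = prog[0]
    if tag == "act":                                           # case 1: primitive action
        yield from do_action(prog[1], bag, horizon)
    elif tag == "seq":                                         # case 2: prog1 ; prog2
        for mid in DoTF(prog[1], bag, horizon, do_action, holds):
            yield from DoTF(prog[2], mid, horizon, do_action, holds)
    elif tag == "par":                                         # case 4: prog1 || prog2
        for b1 in DoTF(prog[1], bag, horizon, do_action, holds):
            for b2 in DoTF(prog[2], bag, horizon, do_action, holds):
                if b1 == b2:                                   # both executions must agree
                    yield b1
    elif tag == "test":                                        # case 5: phi?
        if holds(prog[1], bag):
            yield bag
    elif tag == "choice":                                      # case 6: prog1 | prog2
        yield from DoTF(prog[1], bag, horizon, do_action, holds)
        yield from DoTF(prog[2], bag, horizon, do_action, holds)
    elif tag == "if":                                          # case 8: conditionals
        branch = prog[2] if holds(prog[1], bag) else prog[3]
        yield from DoTF(branch, bag, horizon, do_action, holds)
```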
However, a correct extension for init should also satisfy the temporal constraint network induced by the compatibilities T c (see Section D) and represented by I(T c , [ω]) in TFSC (Corollary 2). Therefore, we introduce the following notion of temporally flexible execution of a TFGolog program. Definition 3 Let DT be a domain theory, T c a set of compatibilities, prog a TFGolog program, (h s , he ) a horizon, and init an initial fully ground bag of situations. Let [ω] be a bag of timelines with free temporal variables ω, [ω] is a temporally flexible execution of prog if the following holds. DT |= ∃t1 , . . . , tn .DoT F(prog, init , [t1 , . . . , tn ], (h s , he )) ∧ I(T c , [t1 , . . . , tn ]). (40) � The following proposition shows that, given a temporally flexible execution of prog, the possible assignments of ω can be characterised by the solution assignments of the temporal constraint network network(DT , T c , [ω]). 7 FLEXIBLE HIGH LEVEL PROGRAMMING IN TFGOLOG 32 Proposition 5 Let DT , T c , prog, and [ω] be, respectively, a TFSC domain theory, a set of compatibilities, a TFGolog program, and a bag of timelines with ω free temporal variables and let M a model of DT . Furthermore, let V be a set of assignments which are solutions for network(DT , T c , [ω]) and let A be the set of assignments to the temporal variables for M. Given a ground bag of timelines init : iff v ∈ A and M, v |= DoT F(prog, init , [ω], (h s , he )) ∧ I(T c , [ω]) v ∈ V and M, v |= DoT F(prog, init , [ω], (h s , he )). � As a consequence of the definition of temporally flexible execution and of the above statement, we have the following corollary which directly follows from Proposition 5. Corollary 3 Let DT , T c , and prog be, respectively, a domain theory, a set of compatibilities, and a TFGolog program. Let (h s , he ) be a horizon, init an initial fully ground bag of situations, and [ω] a temporally flexible execution of prog. Given any model M of DT and any v s.t. M, v |= DoT F(prog, init , [ω], (h s , he )) ∧ I(T c , [ω]), we have that v ∈ V, hence it is a solution of network(DT , T c , [ω]). � In other words, this corollary states that, given a temporally flexible execution [ω], the possible assignments to ω are the solutions V of network(DT , T c , [ω]). Once we have established a relation between the temporally flexible execution and the network, we may want to explicitly represent the solutions in the signature of LT FS C . Suppose we obtain, as a solution of the network, a tuple of real numbers m = �m1 , . . . , mn � then there are two possibilities. If m is a tuple of rational numbers, they are representable in LT FS C , hence we can explicitly refer to [m] to represent a ground [ω], e.g., ensuring that DT |= DoT F(prog, init , [m], (h s , he )) ∧ I(T c , [m])) to check (40) for the instance m. Otherwise, m = �m1 , . . . , mn � might be numbers not in the signature of LT FS C . If we want to represent also these cases, a possible solution is to extend the language to Lm T FS C adding for each number its corresponding symbol name as a constant. Then, in the extended language Lm T FS C , we can obtain a suitable interpretation for the new symbols, by associating the interpretation of each new symbol mi to the correspondent variable assignment, that is, ensuring that v(ti ) = miI , according to Proposition 5. Notice that the [ω], analogously to standard Golog, can be obtained from the program prog as constructive proof of (40). 
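Read operationally, Definition 3 and Corollary 3 suggest a two-stage use of the machinery: generate candidate traces with free temporal variables, then restrict their instantiations to the solutions of the associated network. The following is only a hypothetical sketch of that reading; all function names are placeholders and do not correspond to the actual implementation.

```python
# Definition 3 / Corollary 3, read operationally: every candidate trace
# produced by the interpreter carries free temporal variables; the admissible
# groundings are exactly the solutions of network(D_T, T_c, .).

def flexible_executions(prog, init_bag, horizon,
                        interpret, build_constraints, solve, substitute):
    for trace in interpret(prog, init_bag, horizon):      # candidate traces, see (40)
        constraints = build_constraints(trace)            # the macro I(T_c, .)
        for assignment in solve(constraints):             # solutions V of the network
            yield substitute(trace, assignment)           # ground bag of timelines
```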
The main difference with respect to the standard Golog approach relies in the presence of the free variables ω, these are the temporal variables �t1 , . . . , tm � associated with the actions Ai (d�i , ti ) extending the bag of timelines init to [ω]. The description of an algorithm implementing the interpreter is beyond the scope of this paper, here we just notice that, similarly to the standard Golog approach, the interpreter has to instantiate the nondeterministic choices searching for the possible alternatives. However, in this case, the temporal variables for the actions Ai (d�i , ti ), instead of been instantiated, are left free (because constrained by the temporal network represented by I(T c , [ω])). An interpreter of this kind can be implemented in the Constraint Logic Programming language (CLP) [32] which combines logic programming and constraint management. For example, in [10] we exploit an implementation of the TFGolog interpreter developed in C++ and the Eclipse 5.7 engine for CLP. Example 13 Considering again the two components nav (for navigation) and eng (for engine), depicted in Figure 4, a possible TFGolog program encoding the robot task approaching position pos, within the time interval d , can be written as follows: proc(approach(d, pos), π(t1 , π(t2 , π(t3 , [π(x, startgo (nav, x, pos, t1 )) ≺ at(nav, pos)? � startrun (eng, t3 ) ; endrun (eng, t2 ) ; (t2 − t1 <d)?])))). 7 FLEXIBLE HIGH LEVEL PROGRAMMING IN TFGOLOG 33 Here, we are stating that, the robot starts to go to pos at time t1 , meanwhile the engine starts to work at time t3 and it is switched off, at the arrival to pos, at time t2 . Given an initial, fully ground, bag of timelines: init = B({do(startgo (nav, p1 , p2 , 2), S 0 ), do(startrun (eng, 1), S 0 )}), stating that, at the beginning, the agent starts going from p1 to p2 at time 2 and starts the engine at time 1, a temporally flexible execution σ[ω] for the program approach(d, pos) is such that DT |= ∃t1 , t2 , t3 .DoT F(approach(5, p2 ), where d = 5, pos = p2 , and (h s , he ) = (0, 10). 7.4 init , [t1 , t2 , t3 ], (0, 10)) ∧ I(T c , [t1 , t2 , t3 ]), TFGolog and Sequential Temporal Golog To understand how TFGolog relates to other Golog versions in the literature we now show that TFGolog extends Sequential Temporal Golog. Given the axioms of sequential temporal SC in [64], it is possible to accommodate time in the Golog semantics. The Sequential Temporal Golog [64] can be directly obtained from the classical Golog, the only change needed is the Do macro for primitive actions: def Do(a, s, s� ) = Poss(a, s) ∧ start(s) ≤ time(a) ∧ s� = do(a, s), where start(s) represents the activation time for the situation s. Everything else about Do remains the same. It is possible to show that TFGolog extends STGolog. Intuitively, any STGolog program can be expressed as TFGolog program working with a single timeline, grounded time, infinite horizon, without time constraints. More formally, we can state the following proposition. Proposition 6 Given a Sequential Temporal SC theory DS T S C it is possible to define a TFSC theory DT such that for any STGolog program prog st there exists a TFGolog program progt f such that where DS T S C |= Do(prog st , σ, σ� ) = B(σ) and � = B(σ� ). if f DT |= DoT F(progt f , , � , (0, ∞)), � Notice that STGolog concurrent temporal processes can be expressed by interleaving start and end actions along a unique timeline represented by a single situation. STGolog, indeed, induces a complete order on activities. 
Therefore, this model of concurrency cannot represent partially ordered activities, that might have parallel runs, since in this case the sequence of activations has to be decided at the execution time. Reiter [65] proposes a concurrent version of STGolog, that permits the execution of sets actions, with a set c = {a1 , . . . , an } in place of primitive actions. This notwithstanding the order of the activations is already fixed in the generated sequence of actions. For example [{starta , startb }, enda , endb ] (41) [starta , enda ]�[startb , endb ]. (42) is a concurrent execution of two processes a and b where a and b start is synchronised, but end of a has to occur before the end of b. On the other hand, in TFGolog we can express a more general (flexible) execution plan as follows: Here process a can end either before or after or even during the end of b. The point here is that strict sequentialization, as illustrated above for concurrent STGolog, is due to the concurrency model based on interleaving actions. This model hampers the possibility to generate flexible sequences in which switching is made possible. These limiting aspects are inherited by all the Golog-family, based on the interleaving model, including ConGolog [11] and IndiGolog [29]. 8 8 EXAMPLE: ATTENTIVE ROBOT EXPLORATION OF THE ENVIRONMENT 34 Example: Attentive robot exploration of the environment Consider an autonomous robot exploring an environment and performing observations (e.g. a rescue rover [10]), robot stimuli might guide the robot according to compatible constraints between components and tasks. Let us model the executive control of the robot via a set of interacting components enabling switching between tasks. For example, some typical components of an autonomous mobile robot are: the Head controller, that is the PTU or pan-tilt unit, the Locomotion controller, the Navigation module, including simultaneous localisation and mapping S LAM, the Attention system providing a saliency map for focusing on regions to be processed by the Visual Processing component etc. We refer the reader to [10] for a detailed description of this domain. In particular, in [10] we present a mobile robot whose executive control system is designed deploying TFSC and TFGolog (see Figure 6). idle idle point detect a reset focus scan Moving Head d Attention d idle idle stop at wander d goto Moving run d Navigation idle Locomotion explore d stop map SLAM observe approach Exploration Figure 6: Robot Control Architecture (left) and some control modules (right) with their processes/states (round boxes), transitions (arrows), and temporal relations (dotted arrows) TFSC Representation. Each component of the control system can be represented by a type in TCSF and it is associated with a set of processes. For example, we can consider as names of types the constants: ptu, slm, att,exp, nav, lcm denoting, respectively, the Head component (ptu), the SLAM module (slm), the Attention module (att), the Explore module (exp), the Navigation module (nav), and the Locomotion component (lcm). Each of these types is associated with a set of primitive actions, e.g. for the Locomotion component we have the primitive actions start stop (lcm, t), end stop (lcm, t), startrun (lcm, t), and endrun (lcm, t), here the lcm type is defined as follows: H(lcm, a) ≡ ∃t a = start stop (lcm, t) ∨ ∃t a = end stop (lcm, t) ∨ ∃t a = startrun (lcm, t) ∨ ∃t a = endrun (lcm, t). 
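The type axiom simply enumerates the start and end actions owned by the component. As a small illustration (an invented representation, not the system's code), an action can be tagged with its component and tested for membership in the lcm type:

```python
# The type axiom H(lcm, a): an action belongs to the Locomotion component
# exactly when it is one of its start/end actions, for some time t.
# Actions are represented here as (name, component, time) tuples.

LCM_ACTION_NAMES = {"start_stop", "end_stop", "start_run", "end_run"}

def H_lcm(action):
    name, component, _time = action
    return component == "lcm" and name in LCM_ACTION_NAMES

print(H_lcm(("start_run", "lcm", 3.0)))   # True
print(H_lcm(("start_scan", "ptu", 3.0)))  # False
```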
As for the related processes, we can introduce specific fluents for each component, for example, given the component depicted in Figure 6, possible fluents in this domain are: 8 EXAMPLE: ATTENTIVE ROBOT EXPLORATION OF THE ENVIRONMENT 35 Head: {idle(ptu, s), point(ptu, x, t, s), scan(ptu, z, x, t, s), reset(ptu, z, x, t, s)}; SLAM: {idle(slm, s) map(slm, s, t), stop(slm, s, t)} }; Attention: {idle(att, s) detect(att, s, t), f ocus(att, s, t)} }; Explore: {idle(exp, s), explore(exp, t, s), observe(exp, s, t), approach(exp, s, t)} }; Navigation: {idle(nav, s), goto(nav, x, y, t, s), wander(nav, t, s), at(nav, x, s)}; Locomotion: {idle(lcm, s), run(lcm, t, s), stop(lcm, t, s)}. Each process is explicitly represented in the TFSC model as described in Section 3. For example, in our case the process scan is modelled by the fluent scan(ptu, z, x, t, s) and the actions start scan (ptu, z, x, t) and end scan (ptu, z, x, t). The preconditions and the effects are encoded in the DT as specified in (16). For example, the successor state axiom for scan(ptu, z, x, t, s) is the following: scan(ptu, z, x, t, do(a, s)) a = start scan (ptu, z, x, t)∨ scan(ptu, z, x, t, s) ∧ ¬∃t� (a = end scan (ptu, z, x, t� )), ≡ while the preconditions for start scan (ptu, z, x, t) and end scan (ptu, z, x, t) are: Poss(start scan (ptu, z, x, t), s) Poss(end scan (ptu, z, x, t), s) Temporal Compatibilities Tc ≡ ≡ (s = S 0 ∨ s=ν start scan (ptu, z, x, t)) ∧ idle(ptu, s) ∧ time(s) ≤ t ; (s = S 0 ∨ s=ν end scan (ptu, �x, t)) ∧ ∃t� scan(ptu, z, x, t� , s) ∧ time(s) ≤ t . Some temporal compatibilities T c among the activities can be defined as follows: =[ comp(point(ptu, x), comp( f ocus(att, x), comp(scan(ptu, z, x), comp(goto(nav, x, y), [[(m scan(ptu, x))]]), [[(a point(ptu, x))]]), [[(d stop(exp)), (a map(slm))]]), [[(d idle(ptu)), (d map(slm))]])]. These compatibilities state the following temporal constraints. point(ptu, x) m scan(ptu, x), i.e. upon the PTU is pointed toward a location x the head is expected to scan the region around that point x, and the temporal relation f ocus(att, x) a point(ptu, x) tells that attention focus is preceded by a PTU pointing towards a region of the environment specified by x. After focusing the head can direct the cameras towards the region. Also, scan(ptu, z, x) d stop(lcm) prescribes that while the Head is scanning the environment the robot must be in stop mode to avoid invoking stabilisation processes. The constraints goto(nav, x, y) d idle(ptu) and goto(nav, x, y) d map(slm) state that, while the robot is moving, the pant-tilt unit is to be idle and attention is to be active. Figure 7 illustrates a possible evolution of the timelines up to a planning horizon considering the overall system. TFGolog programs Once we have defined the TFSC domain, we can introduce a partial specification of the robot behaviours using TFGolog programs. For example, we can say that: at the very beginning, i.e. time 0, the pant-tilt is idling with attention enabled; from time 0 to 3 the robot should remain where it is (e.g. posinit ), perform overall scanning with attention on, gathering information from the environment. Furthermore, given a direction θ it should focus attention f ocus towards it, before 30 and after 20, and move towards it before 50. This partially specified plan of actions can be encoded by the following TFGolog program: proc(partialPlan(p, p� , θ), π(t1 , π(t2 , π(t3 , π(t4 , π(t6 , π(t5 , π(t7 , π(t8 , π(t9 , π(t10 , π(t11 , π(t12 , π(t13 , π(t14 , Activemap (slm, 0, t1 ) ∧ t1 > 0)? 
� Elapsed(att, detect, t2 , t3 ) ≺ Elapsed f ocus (att, θ, t4 , t5 )? � Elapsedidle (ptu, 0, t6 )? ≺ (Elapsed scan (ptu, θ, t7 , t8 ) ∧ 20 ≤ t6 ∧ t8 ≤ 30)? � Elapsed(lcm, stop, t9 , t10 )? � (Elapsedat (nav, p, 0, t11 ) ∧ t12 > 3)? ≺ (Elapsedat (nav, p� , t13 , t14 ) ∧ t15 < 50)? ))))))))))) ). 8 EXAMPLE: ATTENTIVE ROBOT EXPLORATION OF THE ENVIRONMENT Execution History 36 Planning Horizon 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Exploration observe Navigation at Locomotion PTU approach goto stop idle SLAM point idle Camera Attention run scan reset idle stop passive active detect focus map passive detect observe explore at wander stop run point scan reset stop idle map active focus passive active Current time Figure 7: The history of states over a period of time (timelines) illustrating the evolution of several system components up to a planning horizon. If we fix the horizon equal to (he , h s ) = (0, 50) and the initial bag of situations 0 = B(S 0 ), a complete plan for partialPlan can be obtained as a bag of timelines mentioning timelines for each components such that DT |= ∃t1 , . . . , tn .DoT F(partialPlan(p1 , p3 , θ), 0, [t1 , . . . , tn ], (0, 50)) ∧ I(T c , [t1 , . . . , tn ]). For example, let: σ slm σatt = do([ = do([ σexp = do([ σ ptu = do([ σnav σlcm = do([ = do([ startmap (slm, t1 )], startdet (att, t2 ) start f ocus (att, t4 ) startexp (exp, t7 ) startobs (exp, t9 ), start point (ptu, θ, t12 ), end scan (ptu, t15 ), startgo (nav, p3 , t18 ), start start (lcm, t20 ), S 0 ); enddet (att, t3 ), end f ocus (att, t5 ), startdet (att, t6 )], S 0 ); endexp (exp, t8 ), endobs (exp, t10 ), startexp (exp, t11 )], S 0 ); end point (ptu, θ, t13 ), start scan (t14 ), startreset (ptu, t16 ), endreset (ptu, t17 )], S 0 ); endgo (nav, p3 , t19 )], S 0 ); end start (lcm, t21 )], S 0 ). and [t1 , . . . , t21 ]=B(�σ slm [t1 ], σatt [t2 , t3 , t4 ], σexp [t7 , . . . , t11 ], σ ptu [t12 , . . . , t17 ], σnav [t18 , t19 ], σlcm [t20 , t21 ]�), the bag of situations [ω] defined above is obtained by the macro I(T c , [ω]) which represents a temporal network, denoted by network(DT , T c , [ω]), with the following temporal constraints: {0 ≤ t1 ≤ 50; 0 ≤ t2 ≤ t3 = t4 ≤ t5 = t6 ≤ 50; 0 ≤ t7 ≤ t6 = t7 ≤ t8 = t9 ≤ t10 = t11 ≤ 50; 0 ≤ t12 ≤ t12 = t13 ≤ t14 = t15 ≤ t16 = t17 ≤ 50; 0 ≤ t18 ≤ t19 ≤ 50; 0 ≤ t20 ≤ t21 ≤ 50; t1 ≤ t18 ; t1 ≤ t20 ; t13 ≤ t4 ; t9 ≤ t4 ; t5 ≤ t10 ; 0 ≤ t4 ≤ t5 ≤ t18 ; t21 ≤ t12 }. 9 RELATED WORKS 37 Beyond partial plans, TFGolog can encode more general and intuitive behaviour fragments for tasks that can be selected and compiled if they are compatible with the execution context. For example, the following behaviour fragment can induce the interpreter to produce a plan to find a location within the deadline d and reach it: proc( f indLocation(d), π(t1 , π(t2 , π(t3 , π(t4 , π(t5 , startexp (exp, t1 ) ≺ endappr (exp, t2 ) � startwand (nav, t3 ) ; endwand (nav, t4 ) ; π(x, startgoto (nav, x, t5 )) ; (t4 − t3 <d)?)))))) ). This TFGolog script starts both the exploration and wandering activities; the wandering phase has a timeout d, after which the robot has to go somewhere. The timeout d is provided by the calling process that can be either another TFGolog procedure or a decision taken by the operator. 
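Returning for a moment to the constraint set listed above for partialPlan, a small numeric check illustrates the kind of assignment the network admits. The values below are invented and serve only as a hand-made satisfiability witness within the (0, 50) horizon; they are not the planner's output.

```python
# An invented assignment t1..t21 checked against the constraint set listed
# for partialPlan; all times lie within the (0, 50) horizon.

t = {1: 0, 2: 0, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5, 9: 5, 10: 12, 11: 12,
     12: 4, 13: 4, 14: 6, 15: 6, 16: 7, 17: 7, 18: 13, 19: 20, 20: 1, 21: 3}

assert 0 <= t[1] <= 50 and t[1] <= t[18] and t[1] <= t[20]
assert 0 <= t[2] <= t[3] == t[4] <= t[5] == t[6] <= 50
assert 0 <= t[7] and t[6] == t[7] <= t[8] == t[9] <= t[10] == t[11] <= 50
assert 0 <= t[12] == t[13] <= t[14] == t[15] <= t[16] == t[17] <= 50
assert 0 <= t[18] <= t[19] <= 50 and 0 <= t[20] <= t[21] <= 50
assert t[13] <= t[4] and t[9] <= t[4] and t[5] <= t[10]
assert 0 <= t[4] <= t[5] <= t[18] and t[21] <= t[12]
print("the candidate assignment satisfies the listed constraints")
```

A witness of the same kind could be produced for the findLocation fragment above, whose only metric requirement is the deadline d on the wandering phase.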
Note that the procedure here is partially specified, indeed, we only mention processes belonging to the Exploration and Navigation timelines, but all the other timelines are to be managed by the TFGolog interpreter. Another example is the following script that manages the switch to the explore mode during an approach phase in the case of a stop: proc(switchApproachExplore(x, d), π(t1 , π(t2 , π(t3 , ((((Activeappr (exp, x, t1 ) ∧ Active stop (lcm, t1 ) ∧ idle(nav, t1 ))? ≺ (startmap (slm, t2 ) ≺ startexp (exp, t3 ))) | (Activeappr (exp, x, t1 ) ∧ Active stop (lcm, t1 ) ∧ ∃y(at(nav, y, t1 ) ∧ y � x))? ; ((Activemap (slm, t1 ))? ≺ startrun (loc, t3 ) | ¬Activemap (slm, t1 )? ≺ startmap (slm, t2 ) ≺ startexp (exp, t3 ))) ∧ (t3 − t1 ) ≤ d))) ). Here, the script manages a switch between the approach and explore tasks caused by a stop during an approach to a target location x. The overall switch should occur within a deadline d. It considers two cases: (1) the stop occurred while the navigation is in idle, hence the robot is not localised; (2) the stop occurred while navigation is at position y, therefore the robot is localised. In the first case, the system should switch to the exploration mode (i.e. startexp ) and restart the SLAM mapping (i.e. startmap ) to re-localise the robot. Notice that the generation of the restart sequence is left to the interpreter because it depends on the context. For instance, if map is running, the interpreter is to switch off map and restart it. In the second case, the stop occurred while the robot is localised at position y (that is not the target position), the system can just restart the engine to continue to approach the target, otherwise, since the system behaviour is not the expected one, to keep the robot safety, the activities are to be reconfigured in the exploration mode restarting the map. 9 Related Works A very first temporal extension of the Situation Calculus is proposed by Pinto and Reiter in [57, 56, 65]. In these works, the authors provide an explicit representation of time and event occurrences assimilating a single timeline to a timed situation. They specify durative actions by instantaneous starting and ending actions actuating processes. Concurrent executions of instantaneous actions are also enabled as reported in [65]. Pinto and Reiter in [57] show that a modal logic for concurrency can be embedded in a suitable Situation Calculus extension. These topics are addressed also by Miller and Shannahan in [49], where they propose a method to represent incomplete narratives in the Situation Calculus. In this case, differently from our approach, the problem of an unknown ordering amongst events is enabled by non monotonic reasoning on temporal events. 9 RELATED WORKS 38 Indeed, while Pinto and Reiter in [57, 56, 65] propose a situation-based timeline representation, where time is scanned by actions, from which is recovered, Miller and Shannahan suggest in [49] a non monotonic timebased framework where each time point is connected with a situation and the frame problem is addressed via minimisation. These approaches are substantially different from our; in fact, as we already stated in Section 4, our framework assumes the durative actions representation proposed in [65, 56], but considering multiple timelines and flexible intervals. 
Our approach, furthermore, contributes substantially to the formalisation of tasks switching and components interaction that has been treated and faced with methodologies, distinct from ours, such as the flexible temporal planning framework. Pinto in [57, 56] has considered the interaction within processes too, but under the perspective of exogenous actions as natural processes. In [66], Reiter and Yuhua exploit the temporal extension of the Situation Calculus already presented in [57, 56, 65] for modelling complex scheduling tasks by axiomatising a deadline driven scheduler. In this case, the tasks are to be scheduled for a single CPU, and a schedule of length n is a sequence of n ground actions represented by a single grounded situation term, therefore constraints and flexible plans are not taken into account. Temporal properties in the Situation Calculus are also investigated by Gabaldon in [24] and by Bienvenu, Fritz and McIlraith in [7, 23], mainly focusing on search control in forward planning. Gabaldon, in fact, in [24] proposes to formalising control knowledge for a forward chaining planner using Linear Temporal Logic (LTL) expressions, represented in the Situation Calculus, and shows how a progression algorithm can be deployed in this setting. In the context of preference based planning [5], Bienvenu et al. [7] propose a logical language for specifying qualitative temporal preferences in the Situation Calculus. In this framework, temporal preferences can be expressed in LTL and the temporal connectives are interpreted in the Situation Calculus following the approach proposed by Gabaldon in [24]. Kvarnström and Doherty in [36] present a forward-chaining planner based on domain-dependent search control knowledge represented as formulas in the Temporal Action Logic (TAL); a narrative based linear metric time logic is used for reasoning about action and change. The authors disregard temporal constraint networks and flexible planning although in [44], following an approach similar to the one taken in [20, 19], the authors propose a first step towards the integration of constraint-based representations within the TAL framework. A procedural approach to model-based executive control through temporally flexible programs is provided by the model-based programming paradigm of Williams and colleagues [80]. In this approach, the reactive system’s controller is specified by programs and models of the system components. In particular, the authors develop the Reactive Model-based programming (RMPL) language and its executive (Titan). Titan control executes RMPL programs using extensive component-based declarative models of the embedded system to track states, analyse anomalous situations, and generate control sequences. RMPL programs are complete procedural specification of the system behaviour. In contrast, we deploy the TFGolog framework where partially specified programs can be encoded. The system we propose in this paper copes with high-level agent programming and can be seen as a trade off between the model-based programming approach (e.g. RMPL-Titan) and the model-based reactive planning (e.g. IDEA [50, 18]), but based on a logical framework and inspired by neuroscience principle on task switching. Indeed, similarly to RMPL, we use high-level programs to design the controller, but the constructs are defined in FOL; further, to enable run-time switching, our programs are partially specified scripts to be completed on-line by the program interpreter that works as a planner. 
In the literature, we can find several works investigating the combination of logic-based framework and temporal constraint reasoning. For example, Dechter and Schwalb in [69] present a logic-based framework combining qualitative and quantitative temporal constraints. This framework integrates reasoning in a propositional and narrative-based representation of a dynamic domain - in the style of the Event Calculus - with inference techniques proper of the temporal constraint networks formalism of Dechter, Meiri and Pearl [14]. The integration is based on the notion of conditional temporal network (CTN) which allows decoupling propositional and temporal constraints and treating them in isolation. Analogously to our approach, the logical machinery determines a 10 SUMMARY AND OUTLOOK 39 temporal network that can be solved with constraint propagation techniques. The combination of logic-based and constraint-based temporal reasoning is also investigated within the Constraint Logic Programming (CLP) paradigm. For example, the TCLP framework proposed by Schwalb and Vilain [70] augments logic programs with temporal constraints. Indeed Schwalb and Vilain investigate a decidable fragment called Simple TCLP accommodating intervals of event occurrences and temporal constraints between them. Lamma and Milano in [37] extend the Constraint Logic Programming framework to temporal reasoning, elaborating on the extensions of Vilain and Kautzs Point Algebra, on Allen’s Interval Algebra and on the STP framework proposed by Dechter, Meiri and Pearl. Lamma and Milano show how it is possible to cope with disjunctive constraints even in an interval based framework. 10 Summary and Outlook Cognitive control has to deal with several components, with flexible behaviours that can be adapted to different contexts and with the ability to switch between tasks, on stimuli requests. In this paper, we have presented a methodology to incorporate these attitudes in the Situation Calculus. We have introduced the Temporally Flexible Situation Calculus (TFSC) that combines temporal constraint reasoning and reasoning about actions. In this framework, we have shown how to incorporate multiple parallel timelines and temporal constraints among the activities. For this purpose, we have introduced sets of concurrent, temporal, situations describing a constructive method to associate a temporal constraint network to each set of concurrent timelines represented by a collection of situations. In this way, causal logic-based reasoning and temporal constraint propagation methods can be integrated. We have described an approach for modelling complex dynamic domains in TFSC illustrating how temporally flexible behaviours can be represented. We have shown how this framework can be exploited to design and develop a model-based control system for an autonomous mobile robot capable of balancing high-level deliberative activities and reactive behaviours, more details on the application can be found in [10]. References [1] Natasha Alechina, Mehdi Dastani, Brian Logan, and John-Jules Ch. Meyer. A logic of agent programs. In AAAI, pages 795–800, 2007. [2] James F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983. [3] A. R. Aron. The neural basis of inhibition in cognitive control. The Neuroscientist, 13:214 – 228, 2007. [4] Fahiem Bacchus, Joseph Y. Halpern, and Hector J. Levesque. Reasoning about noisy sensors and effectors in the situation calculus. Artif. Intell., 111(1-2):171–208, 1999. [5] J. Baier, F. 
Bacchus, and S. McIlraith. A heuristic search approach to planning with temporally extended preferences. In Proceedings of IJCAI-2007), pages 1808–1815, 2007. [6] Federico Barber. Reasoning on interval and point-based disjunctive metric constraints in temporal contexts. J. Artif. Intell. Res. (JAIR), 12:35–86, 2000. [7] M. Bienvenu, C. Fritz, and S. McIlraith. Planning with qualitative temporal preferences. In Proceedings of KR-06, pages 134–144, 2006. [8] Stephen A. Block, Andreas F. Wehowsky, and Brian C. Williams. Robust execution on contingent, temporally flexible plans. In AAAI, 2006. REFERENCES 40 [9] Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of AAAI-2000, pages 355–362, 2000. [10] A. Carbone, A. Finzi, A. Orlandini, and F. Pirri. Model-based control architecture for attentive robots in rescue scenarios. Autonomous Robots, 24(1):87–120, 2008. [11] G. de Giacomo, Y. Lespérance, and H. J. Levesque. Congolog, a concurrent programming language based on the situation calculus. Artif. Intell., 121(1-2):109–169, 2000. [12] A.K. Jonsson D.E. Smith, J. Frank. Bridging the gap between planning and scheduling. Knowledge Engineering Review, 15(1), 2000. [13] Rina Dechter, Itay Meiri, and Judea Pearl. Temporal constraint networks. In KR, pages 83–93, 1989. [14] Rina Dechter, Itay Meiri, and Judea Pearl. Temporal constraint networks. Artif. Intell., 49(1-3):61–95, 1991. [15] Yiannis Demiris and Bassam Khadhouri. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems, 54(5):361–369, 2006. [16] J. Duncan. Disorganization of behaviour after frontal-lobe damage. Cognitive Neuropsychology, 3:271– 290, 1986. [17] Matthias Fichtner, Axel Großmann, and Michael Thielscher. Intelligent execution monitoring in dynamic environments. Fundamenta Informaticae, 57(2–4):371–392, 2003. [18] A. Finzi, F. Ingrand, and N. Muscettola. Model-based executive control through reactive planning for autonomous rovers. In Proceedings of IROS-2004, pages 879–884, 2004. [19] A. Finzi and F. Pirri. Flexible interval planning in concurrent temporal golog. In Proceedings of Cognitive Robotics 2004, 2004. [20] A. Finzi and F. Pirri. Representing flexible temporal behaviors in the situation calculus. In Proceedings of IJCAI-2005, pages 436–441, 2005. [21] A. Finzi, F. Pirri, and R. Reiter. Open world planning in the situation calculus. In Proceedings of AAAI/IAAI2000, pages 754–760, 2000. [22] Jeremy Forth and Murray Shanahan. Indirect and conditional sensing in the event calculus. In ECAI, pages 900–904, 2004. [23] C. Fritz and S. McIlraith. Decision-theoretic golog with qualitative preferences. In Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning (KR06), pages 153–163, Lake District, UK, June 2006. [24] A. Gabaldon. Precondition control and the progression algorithm. In Shlomo Zilberstein, Jana Koehler, and Sven Koenig, editors, ICAPS-2004, pages 23–32. AAAI, 2004. [25] Sandra Clara Gadanho. Learning behavior-selection by emotions and cognition in a multi-goal robot task. J. Mach. Learn. Res., 4:385–412, 2003. [26] Michael Gelfond and Vladimir Lifschitz. Action languages. Electron. Trans. Artif. Intell., 2:193–210, 1998. [27] M. Ghallab and H. Laruelle. Representation and control in ixtet, a temporal planner. In Proceedings of AIPS-1994, pages 61–67, 1994. REFERENCES 41 [28] G. De Giacomo, Y. 
[28] G. De Giacomo, Y. Lespérance, and H. Levesque. ConGolog, a concurrent programming language based on the situation calculus. Artif. Intell., 121(1-2):109–169, 2000.
[29] G. De Giacomo, Y. Lespérance, and H. J. Levesque. Reasoning about concurrent execution, prioritized interrupts, and exogenous actions in the situation calculus. In IJCAI-1997, pages 1221–1226, 1997.
[30] H. Grosskreutz and G. Lakemeyer. cc-Golog – a logical language dealing with continuous change. Logic Journal of the IGPL, 11(2):179–221, 2003.
[31] H. Grosskreutz and G. Lakemeyer. Probabilistic complex actions in Golog. Fundam. Inf., 57(2-4):167–192, 2003.
[32] J. Jaffar and M. J. Maher. Constraint logic programming: A survey. Journal of Logic Programming, 19/20:503–581, 1994.
[33] Ari K. Jonsson, Paul H. Morris, Nicola Muscettola, Kanna Rajan, and Benjamin D. Smith. Planning in interplanetary space: Theory and practice. In Artificial Intelligence Planning Systems, pages 177–186, 2000.
[34] Kazuhiko Kawamura, Tamara E. Rogers, and Xinyu Ao. Development of a cognitive model of humans in a multi-agent framework for human-robot interaction. In AAMAS '02: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1379–1386, New York, NY, USA, 2002. ACM.
[35] Andrei Krokhin, Peter Jeavons, and Peter Jonsson. Reasoning about temporal relations: The tractable subalgebras of Allen's interval algebra. J. ACM, 50(5):591–640, 2003.
[36] J. Kvarnström and P. Doherty. TALplanner: A temporal logic based forward chaining planner. Annals of Mathematics and Artificial Intelligence, 30(1-4):119–169, 2000.
[37] E. Lamma, M. Milano, and P. Mello. Extending constraint logic programming for temporal reasoning. Annals of Mathematics and Artificial Intelligence, 22(1-2):139–158, 1998.
[38] H. J. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. B. Scherl. GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming, 31:59–84, 1997.
[39] Hector Levesque and Gerhard Lakemeyer. Handbook of Knowledge Representation, chapter Cognitive Robotics. Elsevier, 2007.
[40] Hector J. Levesque. What is planning in the presence of sensing? In AAAI/IAAI, Vol. 2, pages 1139–1146, 1996.
[41] Hector J. Levesque, Fiora Pirri, and Raymond Reiter. Foundations for the situation calculus. Electron. Trans. Artif. Intell., 2:159–178, 1998.
[42] H. J. Levesque. Knowledge, action, and ability in the situation calculus. In Proceedings of TARK-94, pages 1–4. Morgan Kaufmann, 1994.
[43] F. Lin and R. Reiter. State constraints revisited. Journal of Logic and Computation, 5(4):655–677, 1994.
[44] M. Magnusson and P. Doherty. Deductive planning with temporal constraints. In Proceedings of Commonsense-2007, 2007.
[45] U. Mayr and S. W. Keele. Changing internal constraints on action: the role of backward inhibition. Journal of Experimental Psychology, 129(1):4–26, 2000.
[46] J. McCarthy. Situations, actions and causal laws. Technical report, Stanford University, 1963. Reprinted in Semantic Information Processing (M. Minsky, ed.), MIT Press, Cambridge, Mass., 1968, pp. 410–417.
[47] Itay Meiri. Combining qualitative and quantitative constraints in temporal reasoning. Artif. Intell., 87(1-2):343–385, 1996.
[48] E. K. Miller and J. D. Cohen. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24:167–202, 2001.
[49] R. Miller and M. Shanahan. Narratives in the situation calculus. Journal of Logic and Computation, 4(5):513–530, 1994.
[50] N. Muscettola, G. A. Dorais, C. Fry, R. Levinson, and C. Plaunt. IDEA: Planning at the core of autonomous reactive agents. In Proc. of the NASA Workshop on Planning and Scheduling for Space, 2002.
[51] Nicola Muscettola. HSTS: Integrating planning and scheduling. In Intelligent Scheduling, pages 451–461, 1994.
[52] Bernhard Nebel and Hans-Jürgen Bürckert. Reasoning about temporal relations: a maximal tractable subclass of Allen's interval algebra. J. ACM, 42(1):43–66, 1995.
[53] A. Newell. Unified Theories of Cognition. Harvard University Press, 1990.
[54] D. A. Norman and T. Shallice. Attention to action: Willed and automatic control of behaviour. In Consciousness and Self-Regulation: Advances in Research and Theory, volume 4. Plenum Press, 1986.
[55] Andrea Philipp and Iring Koch. Task inhibition and task repetition in task switching. The European Journal of Cognitive Psychology, 18(4):624–639, 2006.
[56] J. Pinto. Occurrences and narratives as constraints in the branching structure of the situation calculus. Journal of Logic and Computation, 8(6):777–808, 1998.
[57] J. Pinto and R. Reiter. Reasoning about time in the situation calculus. Annals of Mathematics and Artificial Intelligence, 14(2-4):251–268, 1995.
[58] J. A. Pinto and R. Reiter. Reasoning about time in the situation calculus. Annals of Mathematics and Artificial Intelligence, 14(2-4):251–268, September 1995.
[59] F. Pirri and A. Finzi. An approach to perception in theory of actions: Part I. Electron. Trans. Artif. Intell., 3(C):19–61, 1999.
[60] F. Pirri and R. Reiter. Planning with natural actions in the situation calculus. In Logic-Based Artificial Intelligence, pages 213–231, 2000.
[61] Fiora Pirri and Ray Reiter. Some contributions to the metatheory of the situation calculus. Journal of the ACM, 46(3):325–361, 1999.
[62] R. Reiter. The frame problem in the situation calculus: A simple solution (sometimes) and a completeness result for goal regression. In Vladimir Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359–380. Academic Press, San Diego, CA, 1991.
[63] R. Reiter. Natural actions, concurrency and continuous time in the situation calculus. In Proceedings of KR'96, pages 2–13, 1996.
[64] R. Reiter. Sequential, temporal GOLOG. In Proceedings of KR'98, pages 547–556, 1998.
[65] R. Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, 2001.
[66] R. Reiter and Z. Yuhua. Scheduling in the situation calculus: A case study. Annals of Mathematics and Artificial Intelligence, 21(2-4):397–421, 1997.
[67] J. S. Rubinstein, D. E. Meyer, and J. E. Evans. Executive control of cognitive processes in task switching. Journal of Experimental Psychology: Human Perception and Performance, 27(4):763–797, 2001.
[68] Erik Sandewall. Features and Fluents (Vol. 1): The Representation of Knowledge about Dynamical Systems. Oxford University Press, Inc., 1994.
[69] E. Schwalb, K. Kask, and R. Dechter. Temporal reasoning with constraints on fluents and events. In Proceedings of AAAI-1994, pages 1067–1072, Menlo Park, CA, USA, 1994. American Association for Artificial Intelligence.
[70] E. Schwalb and L. Vila. Logic programming with temporal constraints. In TIME '96: Proceedings of the 3rd Workshop on Temporal Representation and Reasoning, Washington, DC, USA, 1996. IEEE Computer Society.
[71] M. P. Shanahan. A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15:433–449, 2006.
[72] Murray Shanahan. Solving the Frame Problem: A Mathematical Investigation of the Common Sense Law of Inertia. MIT Press, 1997.
[73] Murray Shanahan. The event calculus explained. In Artificial Intelligence Today, pages 409–430, 1999.
[74] A. Tate. "I-N-OVA" and "I-N-CA": Representing plans and other synthesised artifacts as a set of constraints, pages 300–304, 2000.
[75] Michael Thielscher. FLUX: A logic programming method for reasoning agents. Theory and Practice of Logic Programming, 5(4-5):533–565, 2005.
[76] Michael Thielscher and Thomas Witkowski. The features-and-fluents semantics for the fluent calculus. In KR, pages 362–370, 2006.
[77] S. P. Tipper. Does negative priming reflect inhibitory mechanisms? A review and integration of conflicting views. Quarterly Journal of Experimental Psychology, 54:321–343, 2001.
[78] Marc B. Vilain and Henry A. Kautz. Constraint propagation algorithms for temporal reasoning. In AAAI, pages 377–382, 1986.
[79] H. Wang and K. J. Brown. Finite set theory, number theory and axioms of limitation. Mathematische Annalen, 164:26–29, 1966.
[80] B. Williams, M. Ingham, S. Chung, P. Elliott, M. Hofbaur, and G. Sullivan. Model-based programming of fault-aware systems. AI Magazine, Winter 2003.

Appendix A

A Notational conventions and preliminaries

We recall that the set of axioms Σ (see also Table 1) is:

1. ¬(s ⊏ S_0),
2. s ⊏ do(a, s') ≡ s ⊑ s',
3. do(a, s) = do(a', s') ≡ a = a' ∧ s = s',
4. ∀P. P(S_0) ∧ ∀a, s.(P(s) → P(do(a, s))) → ∀s. P(s).    (43)

D_una is the set of axioms of the form A_i(·) ≠ A_j(·), with A_i and A_j distinct names of actions, together with the axioms specifying that identical action terms must have the same arguments (see Section 3.1). The set Σ ∪ D_una is satisfiable in some model M_0 = (D, I) in which the real line is interpreted as usual. In fact, in order to introduce time (see [58]), we may assume that the signature of the initial language L_sitcalc includes: (i) all rational constants p/q and the special symbols 0 and 1; (ii) the usual operators +, − and ·; (iii) the relation <; since = is in the language, ≤ is defined as < ∨ =, and > stands for ¬(≤). Thus Σ can be suitably extended to include all the axioms of the theory of the reals (additive, multiplicative, order, least upper bound) and the axiom ∀t. 0 ≤ t, with t ranging over the reals. We may also assume that there always exists a structure M for the classical basic Situation Calculus in which the real numbers have their intended interpretation.

For the next theorems we stipulate the following. Let S be the signature of the standard Situation Calculus with equality, including symbols for actions, situations, ⊏, indexed symbols t_i for the rational numbers, and the symbols indicated in items (i)-(iii) above; then:

1. L_1 is L_sitcalc, the language defined on S.
2. L_2 is L_1 extended with the symbols for the terms mentioning time.
3. L_3 is L_2 extended with the symbol H, a finite number of new constants of sort object denoting types, and =_ν.
4. L_4 is L_3 extended with the symbols for the terms mentioning timelines and the terms of sort bag of timelines, i.e. the symbols T, ∈_S, =_S, ∪_S, ∩_S, ⊆_S, and the symbol B. L_4 is L_TFSC, the language of the Temporally Flexible Situation Calculus.    (44)

A formula ϕ of a language L is said to be restricted to the language L_Q, over the signature Q, and denoted ϕ\L_Q, if ϕ mentions only the symbols of Q.
Finally, we recall two theorems of [61] that will be used in the next proofs.

Theorem 8 (Relative Satisfiability) A basic action theory D is satisfiable iff D_una ∪ D_{S_0} is.

Theorem 9 (Regression) Suppose W is a regressable formula of L_sitcalc and D is a basic theory of actions. Then R[W] is a formula uniform in S_0. Moreover, D |= (∀)(W ≡ R[W]), where (∀)φ denotes the universal closure of the formula φ with respect to its free variables.

For the definition of the regression operator R we refer the reader to [61]; see also equation (85).

Appendix B

B Proofs of Section 3

In the following we assume that the reals are axiomatised, as noted in Appendix A above, and that the language includes countably many terms taking values in the reals and countably many constant symbols denoting the rational numbers, plus 0 and 1. We also assume that actions take their arguments in the domains Obj and R+.

B.1 Lemmas 1-7

Lemma 1 Σ_time is a conservative extension of Σ, and any model of Σ ∪ D_una can be extended to a model of Σ_time ∪ D_una.

Proof of Lemma 1. Let Σ_time = Σ ∪ Ax_0, that is, the set formed by the foundational axioms of the Situation Calculus, given above, and the axioms T1-T3 (see Table 1, page 9). We have to prove that Σ_time ∪ D_una is a conservative extension of Σ ∪ D_una, that is, for any formula ϕ in the language L_1 of Σ ∪ D_una:

Σ_time ∪ D_una |= ϕ   iff   Σ ∪ D_una |= ϕ

Let M_0 = ⟨D, I⟩ be a model of Σ ∪ D_una with D including the positive real line (see Appendix A above). We define a structure M_1 = (D, I') for L_2 having the same domain as M_0 and with I' interpreting all the symbols of L_1 like I; in particular, we may assume that there is a specific term t_0 denoting 0, such that for all positive t, (t_0 ≤ t)^{(I,v)}, and this is replicated in M_1, v. Now, for the terms mentioning time, the interpretation I' is specified as follows, for any assignment v to the free variables:

(1) (time(S_0))^{(I',v)} is mapped to t_0^{(I,v)}.
(2) For each action A(x⃗, t) we set (time(A(x⃗, t)))^{(I',v)} = t^{(I,v)} iff t > 0.
(3) For all situations s and actions A: (time(do(A(x⃗, t), s)))^{(I',v)} = time(A(x⃗, t))^{(I',v)}.

It follows that T1-T3 are satisfied in M_1. This concludes the interpretation of time. We have shown that any structure for L_1 which is a model of Σ ∪ D_una can be suitably extended to the language L_2 so as to be a model of Σ_time ∪ D_una. Now, by monotonicity, if Σ ∪ D_una |= ϕ\L_1 then Σ_time ∪ D_una |= ϕ\L_1. For the other direction, suppose that Σ_time ∪ D_una |= ϕ\L_1 and Σ ∪ D_una ⊭ ϕ\L_1; then there is a model M of Σ ∪ D_una not satisfying ϕ\L_1. Now, M can be extended to satisfy Σ_time ∪ D_una, hence we have a contradiction.

Lemma 2 There exists a model M of Σ_time ∪ D_una such that, for all s and s', M models:

s ⊏ s' → time(s) ≤ time(s')    (45)

Proof of Lemma 2. Let M_0 = (D, I) be a model of Σ_time ∪ D_una; using the Löwenheim-Skolem theorem, let M_1 be a model elementarily equivalent to M_0 but with a countable domain. We build a new model M_2 = (D', I'), having the same domains for Act and Obj as M_1, and interpreting everything in these domains like I.
However, the domain of situations in D� is DS 0 = {S 0I,v } = {[]}, that is, the domain of situations includes only the interpretation of the constant S 0 , which is the usual one and it is like in I: (1) (2) (3) M1 , v |= time(S 0 ) = t0 M1 , v |= A(x, t) = t M1 , v |= A(x, t) = A� (x, t) iff M2 , v |= time(S 0 ) = t0 iff M2 , v |= A(x, t) = t iff M2 , v |= A(x, t) = A� (x, t) (46) B PROOFS OF SECTION 3 46 The set of terms for the sort Act is countable, thus we can enumerate the terms of sort action, order them according to time a1 , a2 , . . . , and consider the interpretation in M2 , according to v, with respect to time along a chain as follows: C = t0(M2 ,v) <(M2 ) time(a1 )(M2 ,v) ≤(M2 ) time(a2 )(M2 ,v) ≤(M2 ) · · · ≤(M2 ) time(am )(M2 ,v) ≤(M2 ) · · · (47) Here ≤ has the usual interpretation. Since C is countable because there are countable many terms in the language of sort Act, we can assume that each term time(ai ) is suitably interpreted. Furthermore, being both M1 and M2 models of Σtime , by axiom (T2), for all actions A, t0 < time(A(x, t)). Now, given that the domain of sort situation is DS 0 = {S 0(M2 ) } = {[]}, we shall build the following two sets: T ti = {a(M2 ,v) ∈ Act | time(a)(M2 ,v) = ti(M2 ,v) , ti(M2 ,v) the i-th element in C} 2 ,v) 2 ,v) 2 ,v) 2 ,v 2 ,v 2 ,v 2 ,v D si = {[aM , . . . , aM ] | a(M , . . . , a(M ∈ Act, a(M ∈ T ti and [aM , . . . , aM i i i 1 1 1 i−1 ] ∈ D si−1 } (48) Each D si is countable. Taking the union of the D si : S it = ∞ � D si (49) i=0 We still get a countable set such that D si ⊆ S it for all D si . The interpretation of do can now be defined as usual on the sequences in S it, the interpretation of � can be set also to be the usual one, given that the interpretation of each element in S it is a finite sequence of elements of the domain Act. Thus we extend the interpretation I � of M2 to J accordingly. Indeed, let M = (D� ∪ S it, J): (J,v) (J,v) [a(J,v) ] = do(J,v) (a(J,v) , [a(J,v) i 1 , . . . , ai 1 , . . . , ai−1 ]) � (J,v) J � (J,v) (J,v) s � s iff the sequence s is a proper initial subsequence of s (J,v) (50) (J,v) (J,v) (J,v) (J,v) It follows, by the definition of the D si , that if [a(J,v) ∈ D s p and [a(J,v) ∈ D sq , 1 , . . . , ap ] = s 1 , . . . , aq ] = s (J,v) (J,v) (J,v) J � (J,v) I (J,v) I (J,v) with p < q then s � s , hence time (a p ) < time (aq ) since aP ∈ T t p and aq ∈ T tq . Now, in the new 2 ,v) 2 ,v) structure M we can define, for each [a(M , . . . , a(M ] ∈ D si ⊆ S it: i 1 � time([a1 , . . . , ai ])(M,v) = time(do(ai , s))(M,v) = time(ai )(M2 ,v) Hence time(s) < time(s� ) and, clearly, M = (D� ∪ S it, J) is a model of Σtime ∪ Duna . Thus the claim holds. (51) � Lemma 3 Let Ax1 denote the axioms H1-H2 and ΣH = Σtime ∪ Ax1 : ΣH is a conservative extension of Σtime and any model of Σtime ∪ Duna can be extended to a model of ΣH ∪ Duna . Proof of Lemma 3. Let ΣH = Σtime ∪ Ax1 (see the axioms H1-H2 Table 1, page 9). We have to prove that ΣH ∪Duna is a conservative extension of Σtime ∪Duna , that is, for any formula ϕ in the language L2 of Σtime ∪Duna : ΣH ∪ Duna |= ϕ iff Σtime ∪ Duna |= ϕ Let M0 = �D, I� be a model of Σtime ∪ Duna . We extend I to I � to interpret the type predicates H(i, a) and the relation =ν , between actions. Let M1 = (D, I � ) be a structure having the same domain as M0 and I � will interpret all predicate symbols, and function symbols and constant of L2 as I, and for the extended language L3 we shall proceed with the following interpretation. 
• We first consider the interpretation of the predicate H(i, a). Here we shall only provide a partition of name types, as follows. We order the constants denoting types, namely i1 , i2 , . . . , im and the action names B PROOFS OF SECTION 3 47 A1 , A2 , . . . , An , . . . and we define a mapping f : A p �→ (mod(p − 1, m) + 1), with p ≥ 1 and m the number of constants denoting types, so that each action name is assigned precisely to a single type. Now, for any assignment v: M0 , v |= a = A p (�x) iff M1 , v |= a = A p (�x) We thus set � � �ik , A p (�x)�(I ,v) ∈ H I iff f (A p ) = k = (mod(p − 1, m) + 1) It follows that if M0 , v |= A p (�y) = A p (�x) then M1 , v |= H(ik , A p (�x)) ∧ H(ik , A p (�y)) for f (A p ) = k = (mod(p − 1, m) + 1) • Next we consider the interpretation of =ν , for any assignment v, as follows: � � �A p (�y), Aq (�x)�(I ,v) ∈ =ν I iff M1 , v |= H(ik , A p (�y)) ∧ H(ik , Aq (�x)) By the construction it follows that: 1. 2. 3. if M0 , v |= A(�x) = A(�y) then M1 , v |= A(�x) = A(�y) and M1 , v |= A(�x)=ν A(�y) M0 , v |= A(�x) � B(�y) iff M1 , v |= A(�x) � B(�y) � � � � � � � � H I (iI , AIp (�xv )) ∩ H I ( jI , AqI (�xv )) = ∅ iff iI � jI . Hence H1-H2 ∪ Duna are verified in M1 . Now, by an analogous argument as in Lemma 1 we obtain that ΣH ∪ Duna is a conservative extension of Σtime ∪ Duna . � Lemma 4 The relation =ν is an equivalence relation on the set of actions. Proof of Lemma 4. That the relation =ν is reflexive and symmetric follows from (H2) and the property of ∧. And the same for transitivity: 1. 2. 3. a=ν a� ∧ a� =ν a�� →H(i, a) ∧ H(i, a� ) ∧ H(i, a�� ) H(i, a) ∧ H(i, a�� )→a=ν a�� a=ν a� ∧ a� =ν a�� →a=ν a�� (by (H2)) (by (H2)) (by 1, 2 and Taut.) (52) Hence =ν is a reflexive, transitive and symmetric relation on the set of actions partitioned by (H1), i.e. it is an equivalence relation. � Lemma 5 Let Ax2 denote the set of four axioms E1-E4 and Σ=ν = ΣH ∪ Ax2 . Any model of ΣH ∪ Duna can be extended to a model of Σ=ν ∪ Duna . Proof of Lemma 5. Let Σ=ν = ΣH ∪ Ax2 , we have to prove that Σ=ν ∪ Duna is satisfiable iff ΣH ∪ Duna is. Let M1 = (D, I) be a model of ΣH ∪ Duna , M1 exists according to Lemma 3. Let M = (D, I � ), where I � interprets all symbols like I and =ν , on all actions, like I. We thus manage to extend I � to interpret =ν also on situations as follows: B PROOFS OF SECTION 3 48 � if M1 , v |= s = s� then �s, s� �(I ,v) ∈ =ν I (a) � � � if M1 , v |= ¬(s = S 0 ) then (�s, S 0 �(I ,v) , �S 0 , s�(I ,v) ) � =ν I (b) � � � � if M1 , v |= (a=ν a� ) then both �a, do(a� , S 0 )�(I ,v) ∈ =ν I and �a� , do(a, S 0 )�(I ,v) ∈ =ν I (c) � � � � if M1 , v |= ¬(a=ν a� ) then both �a, do(a� , S 0 )�(I ,v) � =ν I and �a� , do(a, S 0 )�(I ,v) � =ν I (d) � The interpretation I � can thus be extended for any assignment v to all situations as follows: � � � � � � � � if M1 , v |= (a=ν a� ) and �s, a� �(I ,v) ∈ =ν I then �a, do(a� , s)�(I ,v) ∈ =ν I (e) if M1 , v |= ¬(a=ν a� ) or �s, a� �(I ,v) � =ν I then �a, do(a� , s)�(I ,v) � =ν I (f) � � � � � � � � if M1 , v |= (s=ν s� ) and �s, a� �(I ,v) ∈ =ν I then �s, do(a� , s� )�(I ,v) ∈ =ν I (g) if M1 , v |= ¬(s=ν s� ) or �a� , s�(I ,v) � =ν I then �s, do(a� , s� )�(I ,v) � =ν I (h) � � � if (a=ν s)�(I ,v) ∈ =ν I then �s, a�(I ,v) ∈ =ν I (i) � This concludes the extension of I to I � . The construction implies that M = (D, I � ) is a model of (H1-H2, E1-E5). � Lemma 6 Let =ν be a relation on the terms of sort actions and situations. Then =ν is an equivalence relation both on situations and on actions and situations. 
Proof of Lemma 6. First note that the relation =ν is reflexive both on situations and on actions. By axiom E5 it is symmetric on action and situations. We show that it is symmetric and transitive on situations, likewise that it is transitive over actions and situations: Basic case: symmetry s s=ν S 0 ≡ S 0 =ν s (By E1.) (53) Let, now, s, s� � S 0 , we shall first show: (a). do(a, s)=ν do(a� , s� ) ≡ a=ν a� ∧ s=ν s� ∧ s� =ν a ∧ a� =ν s (54) Indeed: 3.1 3.2 3.3 3.4 3.5 do(a, s)=ν do(a� , s� ) do(a, s)=ν a� a� =ν do(a, s) s� =ν do(a, s) do(a, s)=ν do(a� , s� ) ≡ ≡ ≡ ≡ ≡ do(a, s)=ν a� ∧ s� =ν do(a, s) (By E4) a� =ν do(a, s) (By E5) a=ν a� ∧ (s=ν a� ) (By E4) (55) s� =ν a ∧ s� =ν s (By E4) (a=ν a� ) ∧ (s=ν s� ) ∧ (s� =ν a) ∧ (a� =ν s) (By 3.2, 3.3, 3.4, E5 and Ind. Hyp.) We can thus show symmetry for situations: symmetry-s : do(a, s)=ν do(a� , s� ) ≡ do(a, s)=ν do(a� , s� ) ≡ ≡ ≡ s: do(a� , s� )=ν do(a, s) (a=ν a� ) ∧ (s=ν s� ) ∧ (s� =ν a) ∧ (a� =ν s) (a� =ν a) ∧ (s� =ν s) ∧ (s=ν a� ) ∧ (a=ν s� ) do(a� , s� )=ν do(a, s) (By 3) (By Ind. Hyp.) (56) We shall, now, show transitivity for action and situations, here (symm) shall refer to both (E5) and symmetryT 1. a=ν s ∧ s=ν s� →a=ν s� . (57) B PROOFS OF SECTION 3 49 For either s = S 0 or s� = S 0 or both, it is trivially true, by (E2). Let s, s� � S 0 1. a=ν s ∧ s=ν s� ∧ a = a→a=ν s� ∧ s� =ν s ∧ a=ν a ( By (E1) and (E5)) 2. a=ν s ∧ s� =ν s ∧ a=ν a→do(a, s� )=ν do(a, s) ( by (a)) 3. do(a, s� )=ν do(a, s)→a = do(a, s� ) ∧ s=ν do(a, s� ) ( by (E4)) 4. a = do(a, s� )→a=ν a ∧ a=ν s� ( by (E3) and (symm.)) 5. a=ν s ∧ s=ν s� →a=ν s� ( by 1,4 and Taut) (58) T 2. a� =ν s ∧ s=ν s� ∧ s� =ν a→a� =ν a. (59) 1. 2. 3. 4. 5. a� =ν s ∧ s� =ν s→s=ν do(a� , s� ) ( By (E4) and (symm) ) a=ν s� ∧ s� =ν s→a=ν s ( by T1. and (symm.)) a=ν s ∧ s=ν do(a� , s� )→a=ν do(a� , s� ) ( by 1, 2 and T1.) a=ν do(a� , s� )→a=ν a� ∧ a=ν s� ( by (E4)) a� =ν s ∧ s=ν s� ∧ s� =ν a→a� =ν a ( by 1,2, 4, (symm.) and Taut) (60) Similarly, from (a), (E3), (E4) and (E5) it is possible to prove that T 3. a=ν a� ∧ a� =ν s→a=ν s. (61) Finally transitivity for situations can be shown by induction on s. For s = S 0 and s�� = S 0 : 1. 2. 3. 4. do(a, S 0 )=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , S 0 )→a=ν a�� ( By (a) above and Lemma 4) a=ν a�� →a�� =ν do(a, S 0 ) ( by (E3)) (62) a�� =ν do(a, S 0 )→do(a, S 0 )=ν do(a�� , S 0 ) ( by (E4)) do(a, S 0 )=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , S 0 )→do(a, S 0 )=ν do(a�� , S 0 ) ( by 1, 3 and Taut.) For s � S 0 : 1. 2. 3. 4. 5. do(a, s)=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , s�� )→a=ν s� ∧ a� =ν s ∧ a� =ν s�� ∧ a�� =ν s� s=ν s� ∧ s� =ν s�� →s=ν s�� a=ν s� ∧ s� =ν s�� →a=ν s�� a=ν a�� ∧ s=ν s�� ∧ a=ν s�� →do(a, s)=ν do(a�� , s�� ) do(a, s)=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , s�� )→do(a, s)=ν do(a�� , s�� ) ( By (a) ) ( by Ind. Hyp.) ( by (a) ) (63) ( by (d) ) ( by 5 and Taut.) We have thus shown that =ν is an equivalence relation on the set of actions and situations. � B.2 Proof of Theorem 1 We have to show that Σ ∪ Duna together with the set of axioms Ax0 − Ax2 , that is, (T 1-T 3, H1-H2, E1-E5) forms a satisfiable set. We have shown, incrementally that Σtime = Σ ∪ Ax0 is a conservative extension, Lemma 1, that ΣH = Σtime ∪ Ax1 conservatively extends Σtime , Lemma 3, and that Σ=ν = ΣH ∪ Ax2 is a conservative extension of ΣH , Lemma 5. And, in particular, that all are conservative extensions of Σ ∪ Duna . Hence any model of Σ ∪ Duna can be extended to a model of Σ ∪ Ax0 ∪ Ax1 ∪ Ax2 . 
� B PROOFS OF SECTION 3 50 B.3 Proof of Corollary 1 By Theorem 1 we know that Σ=ν ∪Duna is satisfiable in some model M of L3 . On the other hand the satisfiability of DS 0 and hence of Σ=ν ∪ Duna ∪ DS 0 depends on the design of DS 0 and, in particular, on the definition of H(i, a), for each component i, which are in DS 0 . If DS 0 ∪ Duna ∪ Σ=ν is satisfiable, then following the same arguments of the relative satisfiability theorem (see Theorem 8) a model M of DS 0 ∪ Duna ∪ Σ=ν can be easily extended to a model of Σ=ν ∪ Duna ∪ DS 0 ∪ D ss ∪ Dap . The other direction follows from the fact that a model of Σ=ν ∪ Duna ∪ DS 0 ∪ D ss ∪ Dap is also a model of Σ=ν ∪ Duna ∪ DS 0 . � The only concern is given by the specifications of the H(i, a) for each type i and each action A, mentioned in Duna , in DS 0 , whether there exists a model for DS 0 ∪ Duna ∪ Ax1 . If this model exists then using the previous theorem and lemmas this can be extended to a model of DS 0 ∪ Duna ∪ Σ=ν . So we may make some assumption on the definition of the H(i, a) to show that, under these conditions, a model of DS 0 ∪ Duna ∪ Ax1 exists. Lemma 7 Let D−S 0 ∪ Duna be satisfiable in some model M of L2 , let it be uniform in S 0 and not mentioning the predicate H(·). Then the definitions of H(·), for each type i, and action A referred to, in Duna , can be safely added to D−S 0 ∪ Duna in the form: ∀a.H(i, a) ≡ ϕ(i, a), with ϕ not mentioning H(·) (64) If there are formulas ϕ(i, a) specifying actions and components, such that, for each i. i. D−S 0 |� = ∀a.ϕ(i, a), � ii. D−S 0 |= ∃a.ϕi (i, a) ∧ ni� j ¬ϕ j ( j, a). (65) j=1 then the extended DS 0 will satisfy, for all types i: a. b. H(i, a), in DS 0 occurs only in formulas of the form ∀a.H(i, a) ≡ φ(i, a), with φ not mentioning H(·), � � DS 0 ∪ Duna ∪ {∀a. ni=1 H(i, a)} ∪ {∀a.H(i, a)→ ni� j ¬H( j, a)} (66) is satisfiable in some modelM� of L3 j=1 Proof. Assume that D−S 0 is satisfiable in some structure M of L2 and, thus, it does not mention H(·). Then we extend the theory DS 0 to L3 according to the following construction. First note that here we mention i as an element of the domain object, no axioms for types are assumed so far, although we can assume that there are n elements of the domain object specifying components (despite we use natural numbers to denote them). Define, similarly as in Lemma 3, an indexing function i = mod( j − 1, n) + 1 for action names A j so that actions are grouped in such a way that M �|= ∀aϕ(i, a), with ϕ(i, a) a suitable formula mentioning the A j specified in Duna , and satisfying the conditions of the Lemma. For example, if there is a finite � set of action names ascribed to a component i, then ϕi (i, a) is ∃ x�1 , . . . , x�k kj=1 a = A j (�x j ), as in (6), with the A j suitably chosen with i = mod( j − 1, n) + 1, and it satisfies all the conditions of the lemma. Given M, by the Lowenheim-Skolem theorem, there is a model M∼ of DS 0 ∪ Duna which is elementary equivalent to M and has a countable domain. As usual we define a new structure M1 = (D, I) for L3 , with the same domain and the same interpretation as M∼ on all symbols of L2 . Furthermore M1 interprets all the new constants symbols that are added in the construction, as illustrated below in (67). The construction with new constants is given according to the above specified enumeration of formulas ϕ(i, a), B PROOFS OF SECTION 3 51 and the above specified conditions as follows: ∆0 ... 
∆i = = {ψ | M∼ |= ψ} � {H(i, a) ≡ ϕ(i, a) | M∼ , v |= ϕ(i, a) iff M1 , v |= H(i, a) and ∃a.ϕ(i, a) not used in ∆ j , 0 < j < i} � (67) {H(i, c) | M1 , v |= H(i, a) ∧ ϕ(i, a), av = d = cI , c a fresh constant symbol} � v I {¬H(i, c) | M1 , v |= ¬ϕ(i, a) ∧ ¬H(i, a), a = d = c , c a fresh constant symbol} � � {H(i, c)→ nj�i ¬H( j, c) | M1 , v |= nj�i ¬ϕ j ( j, a) ∧ ϕi (i, a), , av = d = cI , c a fresh constant symbol} j=1 j=1 Each ∆i , 0 ≤ i ≤ n is satisfiable in M1 , by construction, furthermore D= n � (68) ∆i i=0 is satisfiable in M1 and H is complete in D, which is the diagram of H, in M1 , that is, H(i, c) ∈ D iff M1 , v |= H(i, a) and ¬H(i, c) ∈ D iff M1 , v |= ¬H(i, a), with av = d = cI . By the constraints on each ϕ(i, a) in the � � enumeration, M∼ �|= ∀a.ϕ(i, a), and M∼ , v |= ϕi (i, a) ∧ nj�i ¬ϕ j ( j, a) hence H(i, c)→ nj�i ¬H( j, c) ∈ ∆i . It j=1 j=1 � � remains to show that ni=1 ¬H(i, c) � ∆i . But that ni=1 ¬H(i, c) ∈ ∆i is impossible, since for each ∆i all the added �n constants are fresh hence if ¬H(i, c) ∈ ∆i , then ¬H( j, c) � D, i � j. Also because if j�i ¬H( j, c) ∈ ∆i , then by j=1 � the condition on the subset, it must be that H(i, c) hence ni=1 ¬H(i, c) � ∆i . �n � It follows that M1 �|= ∃a. i=1 ¬H(i, a) and since M1 is also a model of DS 0 it follows that DS 0 �|= ∃a. ni=1 ¬H(i, a). �n �n On the other hand H(i, c)→ j�i ¬H( j, c) ∈ ∆i , for each i, hence D |= H(i, c)→ j�i ¬H( j, c) for each i with c j=1 j=1 � � � � � � new constants, hence M1 |= ∀a.H(i, a)→ nj�i ¬H( j, a). Thus DS 0 Duna {∀a.H(i, a)→ nj�i ¬H( j, a)} {∀a. ni=1 H(i, a)} j=1 j=1 is satisfiable in M1 . Therefore, under the conditions (65), M1 |= H1. Following again Lemma 3 the construction can lead to a model for DS 0 ∪ Duna ∪ Ax1 . � B.4 Proof of Theorem 2 Theorem 2 A timeline represents the =ν -equivalence class of situations of the same type. Proof of the theorem. Recall that a timeline is defined by an (improper) successor state axiom as follows: T (i, do(a, s)) ≡ (s � S 0 ∧ a=ν s ∧ T (i, s)) ∨ (s = S 0 ∧ H(i, a)). (69) We show that a timeline corresponds to an =ν -equivalence class. Define: [ do(a, s) ] = {do(a� , s� ) |do(a, s)=ν do(a� , s� )} (70) [ do(a, s)] is an =ν -equivalence class because =ν is an equivalence relation on actions and situations (see Lemma 6). We show, by induction on s� that: do(a� , s� ) ∈ [do(a, s)] iff ∃i.T (i, do(a� , s� )) ∧ T (i, do(a, s)). By definition of the equivalence class, it implies that we show do(a, s)=ν do(a� , s� ) ≡ ∃i.T (i, do(a� , s� )) ∧ T (i, do(a, s)). Basic case s� = S 0 (71) B PROOFS OF SECTION 3 52 ⇒ A. s� = S 0 and s = S 0 1. do(a� , S 0 )=ν do(a, S 0 )→a=ν a� (By (a), Lemma 6 ) 2. a=ν a� →∃i.H(i, a) ∧ H(i, a� ) (By (H2)) 3. H(i, a) ∧ H(i, a� ) ∧ s� = S 0 ∧ s = S 0 → (72) ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 )) (By (W1)) 4. do(a� , S 0 )=ν do(a, S 0 )→∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 )) (By A1, A3 and Taut.) � B. s = S 0 and s � S 0 and the Ind. Hyp. is s=ν do(a� , S 0 )→∃i.T (i, do(a� , S 0 )) ∧ T (i, s), for s ∈ [do(a, s)]. 1. do(a� , S 0 )=ν do(a, s)→a=ν do(a� , S 0 ) ∧ s=ν do(a� , S 0 ) (By (E4)) 2. s=ν do(a� , S 0 )→∃i.T (i, do(a� , S 0 )) ∧ T (i, s) (By Ind. Hyp.) 3. a=ν do(a� , S 0 )→a=ν a� (By (E3).) 4. s=ν do(a� , S 0 )→a� =ν s (By (E4)) (73) 5. a� =ν a� ∧ a� =ν s→a=ν s (By (T3) Lemma 6) 6. ∃i.T (i, do(a� , S 0 )) ∧ T (i, s) ∧ a=ν s→ ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s)) (By 2 and (W1)) 7. do(a� , S 0 )=ν do(a, s)→∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s)) (By 1, 6 and Taut.) ⇐ C. s� = S 0 and s = S 0 1. 
∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 ))→∃i.H(i, a� ) ∧ H(i, a) (By (W1)) 2. ∃i.H(i, a) ∧ H(i, a� )→a=ν a� (By (H2)) 3. a� =ν a ∧ s = S 0 →a� =ν do(a, S 0 ) (By (E3) ) (74) 4. a� =ν do(a, S 0 )→do(a, S 0 )=ν do(a� , S 0 ) (By (E4)) 5. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 ))→do(a, S 0 )=ν do(a� , S 0 ) (By 1,4 and Taut.) � D. s = S 0 and s � S 0 , the Ind. Hyp. is ∃i.T (i, do(a� , S 0 )) ∧ T (i, s)→s=ν do(a� , S 0 ) 1. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s))→ ∃i.T (i, do(a� , S 0 )) ∧ a=ν s ∧ T (i, s) (By (W1)) 2. ∃i.T (i, do(a� , S 0 )) ∧ a=ν s ∧ T (i, s)→s=ν do(a� , S 0 ) (By Ind. Hyp.) (75) 3. a=ν s ∧ s=ν do(a� , S 0 )→a=ν do(a� , S 0 ) (By (E3) and symm.) � � � 4. a=ν do(a , S 0 ) ∧ s=ν do(a , S 0 )→do(a, s)=ν do(a , S 0 ) (By (E4)) 5. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s))→do(a� , S 0 )=ν do(a, s) (By 1,4 and Taut.) Induction s� � S 0 (we may also assume w.l.g s � S 0 , otherwise it can be reduced to the above basic cases.) ⇒ The Ind. Hyp. is s=ν s� →∃i.T (i, s) ∧ T (i, s� ) with s� ∈ [s]. 1. do(a� , s� )=ν do(a, s)→a=ν a� ∧ s=ν s� ∧ a=ν s� ∧ a� =ν s (By (b,c), Lemma 6 ) 2. a� =ν s ∧ s=ν s� →a� =ν s� (By (T1), Lemma 6 ) 3. a=ν s� ∧ s� =ν s→a=ν s (By (T1), Lemma 6 ) 4. s=ν s� →∃i.T (i, s� ) ∧ T (i, s) (By Ind. Hyp. and 1) 5. ∃i.T (i, s� ) ∧ T (i, s) ∧ a=ν s ∧ a� =ν s� → ∃i.T (i, do(a, s)) ∧ T (i, do(a� , s� )) (By (W1)) 6. do(a� , s� )=ν do(a, s)→∃i.T (i, do(a, s)) ∧ T (i, do(a� , s� )) (By 1, 5 and Taut.) ⇐ The induction hypothesis is ∃i.T (i, s) ∧ T (i, s� )→s=ν s� with s� ∈ [s]. (76) 1. ∃i.T (i, do(a� , s� )) ∧ T (i, do(a� , s� ))→ a� =ν s� ∧ a=ν s ∧ T (i, s) ∧ T (i, s� ) (By (W1)) 2. ∃i.T (i, s) ∧ T (i, s� )→s=ν s� (By Ind. Hyp.) (77) 3. a� =ν s� ∧ s� =ν s ∧ a=ν s→a� =ν s ∧ a=ν s� ∧ a=ν a� (By (T1,T2), Lemma 6 ) � � � � � 4. a=ν s ∧ s=ν s ∧ a=ν a →→do(a, s)=ν do(a , s ) (By (d), Lemma 6 ) 5. ∃i.T (i, do(a� , s� )) ∧ T (i, do(a� , s� ))→do(a, s)=ν do(a� , s� ) (By 1,4 and Taut.) B PROOFS OF SECTION 3 53 We have, thus, shown that a timeline represents an equivalence class indexed by a type (that is by a component of the system). Furthermore it follows from the definition that S 0 does not belong to any equivalence class hence to no timeline. Thus any induction on timelines needs that the basic case is do(a, S 0 ). � B.5 Proof of Theorem 3 We have to show that the axioms G1-G5 have a model which is also a model of Σ=ν ∪ Duna . A structure for L3 which is a model for Σ=ν ∪ Duna has been provided in Theorem 1, furthermore, if we consider a satisfiable initial database DS 0 as shown in Corollary 1, there exists a model of Σ=ν ∪ Duna ∪ DS 0 that can be extended to a model of D = Σ=ν ∪ Duna ∪ DS 0 ∪ D ss ∪ Dap , with successor state axioms mentioning also timelines. Therefore let M1 = (D, I0 ) be such a model. We shall show that this model M1 can be extended to a model M2 in which we give an appropriate interpretation for the sort S of bags of timelines, and such that the axioms G1-G5 are satisfied. Let M2 = (D ∪ {S }, I) i.e. M2 has the same domain as M1 for sorts objectincluding reals (time)- actions and situations, I is like I0 for all symbols of the language L3 and it is extended to interpret the above mentioned elements in S according to the following steps. Let n ∈ N and �s1 , . . . , sn � a n tuple of situations in S itn , S it ⊂ D and v any valuation on the free variables. 1. 2. If n = 0 If n > 0 B 0I ∈ S . (I,v) �s(I,v) 1 , . . . 
, sn � ∈ S iff for each sk , k = 1, .., n, M1 , v |= (sk = S 0 ∨ ∃iT (i, sk )) (78) Now, each term (I,v) ∈ S is invariant to both permutations of the order of the tuple and compaction of repeated situations. That is, whenever p is a permutation of {1, . . . , n} and �s1 , . . . , sn � is a tuple of situations in S itn , S it ⊂ D, then (I,v) (I,v) (I,v) (I,v) (I,v) (I,v) �s(I,v) 1 , . . . , sn � = �κ(s p1 , . . . , s pi , s pi , . . . , s pn )� p (79) Here κ : S �→ S indicates a compaction function on a tuple of the repeated k elements of a n + k tuple and (I,v) �· � p : S �→ S the permutation function on {1, . . . , n}. More precisely, �s(I,v) p1 , . . . s pn � p has been obtained (I,v) (I,v) (I,v) (I,v) from κ(�s(I,v) 1 , . . . , s pi , s pi , . . . , s pn � p ) by compacting to a single representative the repeated arguments s pi , (I,v) (I,v) (I,v) (I,v) (I,v) and �s(I,v) by a permutation p of {1, . . . , n}. In the 1 , . . . , sn � has been obtained from �s p1 , . . . , s pi , . . . s pn � p language the permutation �·� p incorporates also the compaction κ. [G1] Given the construction of terms of sort bag of timelines we can state the following, for n ∈ N and v any assignment to the free variables: � � � M2 , v |= s ∈S B(�s j1 , . . . , s jn � j ) iff M1 , v |= (s = s jk ∧ ∃i.T (i, s jk )) ∨ S 0 = s jk (80) 1≤k≤n And since M2 is like M1 for L3 it follows that (G1) is satisfied in M2 . We can now generalise the membership relation over bags of situations as follows: 1. 2 For all sI,v ∈ S it ⊂ D, I f (I,v) = B 0I Otherwise M2 , v |= s ∈S then M2 , v |= s �S (I,v) iff there is a n ∈ N and a n tuple �s(I,v) 1 , . . . sn �, of elements of S it ⊂ D s.t. (I,v) (I,v) (I,v) (I,v) = �κ(s p1 , . . . s pn )� p , for some k = 1, . . . , n, s(I,v) = s(I,v) pk , and M2 , v |= ∃i.T (i, s) iff sI,v � S 0I Now we have given an interpretation in M2 to ∈S and built the terms of sort bag of timelines, thus we can define the interpretation for =S : M2 , v |= =S � � iff for any sI ,v ∈ S it (M2 , v |= s ∈S iff M2 , v |= s ∈S � ) (82) (81) B PROOFS OF SECTION 3 54 [G2] Follows from (82) above. [G3] This is a consequence of item (1) of (78), defining the interpretation of the constant B 0 . [G4] By definition of timeline and of bag of timelines, if M2 , v |= (s = S 0 ∨ ∃iT (i, s)) then B(s) is a bag of timelines. Consider, now, the definition of ∪S given in Example 10 then M2 , v |= =S � ∪ B(s) iff M2 , v |= ∀s.s ∈ ≡s∈ � ∪ B(s) Hence by simple induction on the structure of � , G4 is satisfied in M2 . [G5] Let S be the terms of sort bag of timelines, and consider the definition of ⊆S given in Example 10: ⊆S is an ordering relation on S , then every subset of S has a set of minimal elements and, in particular, B 0 is a minimum. Now suppose that M2 , v |= ϕ(B 0 ) ∧ (∀s .ϕ( ) ∧ ϕ(B(s))→ϕ( ∪S B(s))) (83) and for some M2 , v �|= ϕ( ) Hence M2 , v |= ¬ϕ( ) Let W = { � |M2 , v |= ¬ϕ( � )} then W has a set of minimal elements Min = { ∈ W |¬∃ ∈ W, ⊂S }. Let x ∈ Min then, by hypothesis M2 , v |= ¬ϕ(x), with x � B 0 . Now, being x minimal in W, there must exist a � , � ∈ S , with � ⊂ x and � � W. Then M2 , v |= ϕ( � ), since � � W. We can find an s such that x = � ∪S B(s), by the conditional existence. By equation (83)), since by hypothesis M2 , v |= ϕ(B 0 ), and from M2 , v |= ϕ( � ) and ϕ(B(s)) it follows that M2 , v |= ϕ( � ∪ B(s)). Hence M2 , v |= ϕ(x), a contradiction. We have shown that M2 is a model of Duna ∪ Σ=ν ∪ (G1-G5). 
� B.6 Proof of Theorem 4 To prove the theorem we shall first extend the definition of uniform terms and regressable sentence (see [61]) to include terms and formulas mentioning terms of sort bag of situations. Let LT FS C be the language of SC extended to include A+ and let D+ be a basic action theory extended with A+ . Let σ denote a term of sort situation mentioning only S 0 and terms of sort actions αi = A(t1 , . . . , tm ), m ≥ 0, with ti , 1 ≤ i ≤ m, not mentioning terms of sort situation, that is, appealing to the notational convention used in [61] σ = do([α1 , . . . , αn ], S 0 ), for some n ≥ 0, and for terms α1 , . . . , αn of sort action. Definition 4 The set of terms of the language LT FS C uniform in σ1 , . . . , σk , k ≥ 1, is the smallest set defined as follows: 1. Any term not mentioning term of sort situation is uniform in σ1 , . . . , σk , k ≥ 1. 2. σi is uniform in σ1 , . . . , σk , i = 1, . . . , k. 3. If g is an n-ary function symbol other than do and B, and t1 , . . . , tn are terms uniform in σ1 , . . . , σk whose sorts are appropriate for g, then g(t1 , . . . , tn ) is a term uniform in σ1 , . . . , σk . 4. B(σi ) is a term uniform in σ1 , . . . , σk , i = 1, . . . , k. 5. B(�σ j1 , . . . , σ jk � j ) is a term uniform in σ1 , . . . , σk , for any permutation j of {1, . . . , k}. B PROOFS OF SECTION 3 55 Finally: Items (4) and (5), of Definition 4 above, are correct because bags of timelines are flat sets, that is, they are formed only by situations, which are their individual elements. This follows from the first axiom (G1) defining membership only for elements of sort situation. To see this we prove the following lemmas. Lemma 8 For all bags of timelines and : B.6.1 �S . Proof of Lemma 8 We prove the claim by induction on , using (G5). First note that by (G1) � B 0 and by (G2) � B(s), because �S s. Let: φ( ) = ∀ . � , then we have: φ( ) φ(B 0 ) φ(B(s)) = = = ∀ . � (Ind. Hyp) ∀ . � B0 (by G1) ∀ . � B(s) (by G2) Then ∀ . � ∧ � B(s)→ � ∪S B(s) (by Def. of ∪S Ex. 10) ∀ . � (by Ind. Hyp. and G5) (84) Hence the claim. � The above lemma implies, in particular, that bags of timelines are not ordinal numbers. Lemma 9 Let P( ) be the power set of the bag . Then for any bag term , ∩ P( ) = ∅. B.6.2 Proof of Lemma 9 Let P( ) be the power set of , that is ∀x.x ⊆S →x ∈S P( ). We have to show that for all bags of timelines , ∩S P( ) = ∅. Suppose that there is some x, such that x ∈S ∩S P( ), then x ∈S and x ∈S P( ). Since x ∈S P( ) then x is a bag term � hence � ∈S contradicting the previous Lemma 8. � Lemma 10 If B.6.3 is a set of bag terms then �S for all bag terms . Proof of Lemma 10 Follows from the previous Lemma 9. � We define now the set of regressable formulas extending the definition of [61] to include formulas mentioning bag of timelines. Definition 5 A formula W of LT FS C is regressable iff 1. W is first order. 2. W does not mention variables of sort situation nor of sort bag of timelines. 3. Every term of sort situation mentioned by W is uniform in σ1 , . . . , σn , n ≥ 1. 4. For every atom of the form Poss(α, σ) mentioned by W, α has the form A(t1 , . . . , tn ) for some n-ary action function symbol A of LT FS C . B PROOFS OF SECTION 3 56 5. Every term of sort bag of timelines appearing in W is uniform in σ1 , . . . , σk , for some k ≥ 0. 6. W does not quantify over situations nor bag of situations. 
First note that, by definition of the regression operator, if then D |= W ≡ W � D |= R(W) ≡ R(W � ) (85) Let W be a regressable formula mentioning terms of sort bag of timelines we show that D+ |= (∀)W ≡ R(W) with R(W) a formula uniform in S 0 . We show the claim by induction on the structure of the regressable formula W, mentioning terms of sort bag of timelines. We first show, however, that T (i, σ) is regressable if σ is a uniform term. Lemma 11 Consider a uniform term of sort situation, of the form σm+1 = do(Am+1 (�xm+1 ), do(. . . , do(A1 (�x1 ), S 0 ) . . .)), for some m ∈ N and actions A1 , . . . Am+1 , that is a timeline. Then: R(T (i, do(Am+1 , σm ))) ≡ m+1 � H(i, A j (�x j )) (86) j=1 Proof of the Lemma First note that given σm+1 , as specified in the Lemma, the following holds, by m applications of Axioms (E1) and (E3) and of theorem (T1) of Lemma 6: Am+1 =ν σm ∧ σm = do(Am (�xm ), do(. . . , do(A1 (�x1 ), S 0 ) . . .)) ≡ ≡ Am+1 (�xm+1 )=ν Am (�xm ) ∧ . . . ∧ A2 (�x2 )=ν A1 (�x1 ) (87) H(i, Am+1 (�xm+1 )) ∧ . . . ∧ H(i, A1 (�x1 )) Further, by induction on the number of actions mentioned in the uniform situation term σm+1 , we can see that: T (i, do(Am+1 (�xm+1 ), σm )) ≡ m+1 �� j=1 � A j (�x j )=ν σ j−1 ∨ σ j−1 = S 0 ∧ H(i, A(�x j )) (88) Indeed, the basic case, for σm = S 0 follows from the definition of T (i, do(a, s)). For the induction, we have: T (i, do(Am+1 (�xm+1 ), σm )) ≡ ≡ Thus �m+1 j=1 ≡ T (i, σm ) ∧ Am+1 (�xm=1 )=ν σm ∨ σm = S 0 ∧ H(i, Am+1 (�xm=1 )) � �m+1 � x j )=ν σ j−1 ∨ σ j−1 = S 0 ∧ H(i, A(�x j )) j=1 A j (� ∧Am+1 (�xm=1 )=ν σm ∨ σm = S 0 ∧ H(i, Am+1 (�xm=1 )) �m+1 x j )) j=1 H(i, A j (� (89) (By Ind. Hyp.) (By (87), above and Taut.) H(i, A j (�x j )) ≡ R(T (i, do(Am+1 , σm ))), a formula uniform in S 0 . � Theorem continue.. For the basic step we consider the following atoms (see Definition 1): 1. if W has the form σ ∈ B(�σ j1 , . . . , σ jk �), with each σ ji = do([α1 , . . . , αn ], S 0 ) for some n ≥ 0, then by axiom (G1), � � D+ |= W ≡ σ = σ jp ∧ σ jp = S 0 ∨ T (i, σ jp ) 1≤p≤k i B PROOFS OF SECTION 3 57 So let W � be the RHS of the equivalence in the above formula, by Lemmas 8,9 and (G1) W � does not mention terms of sort bag of timelines and moreover is a regressable formula (Lemma 11) hence the regression theorem, as stated in [61] applies. Therefore R(W � ) is a formula uniform in S 0 and, by (85) above: R(W) ≡ R(W � ) (90) and the claim is verified by monotonicity since D ⊆ D+ . 2. If W has the form =S then, because W is a regressable sentence, it has the following form: B(�σ j1 , . . . , σ jk � j ) =S B(�σ�p1 , . . . , σ�pm � p ) then by (G2), W is equivalent to the formula W � : � � � � ∀s. s ∈S B(�σ j1 , . . . , σ jk � j ) ≡ s ∈S B(�σ p1 , . . . , σ pm � p ) (91) Finally W �� , by first order tautologies and equality, is equivalent to the following sentence W ��� : �� � � � 1≤h≤k 1≤q≤m (σ jh = σ pq ∧ (σ jh = S 0 ∨ i T (i, σ jh ))) (93) And, by (G1), W � is equivalent to the following formula W �� : �� � � � � ∀s. 1≤h≤k s = σ jh ∧ (σ jh = S 0 ∨ i T (i, σ jh )) ≡ 1≤q≤m s = σ pq ∧ (σ pq = S 0 ∨ i T (i, σ pq )) (92) Now, by Lemma 11, W ��� is a regressable sentence not mentioning terms of sort bag of timelines and hence the regression theorem can be applied and the claim holds. 3. If W has the form: =S B 0 Then W reduces to either � or ⊥, which are regressable sentences in L, and R(�) (as R(⊥)) are uniform in S 0 , hence the claim holds. 4. If W has the form: B(�σr1 , . . . , σrm �r ) =S B(�σ�j1 , . . . , σ�jk � j ) ∪S B(�σ��p1 , . . . 
, σ��pn � p ) then we can use the definition of ∪S given in Example 10: � ( ∪S =S ≡ ∀s.s ∈S ≡ (s ∈S ∨ s ∈S � )) Then W is equivalent to the following formula W � : ∀s.s ∈S B(�σr1 , . . . , σrm �r ) ≡ s ∈S B(�σ�j1 , . . . , σ�jk � j ) ∨ s ∈S B(�σ��p1 , . . . , σ��pn � p ) (94) Again using (G1), we get W �� : � s = σu ∧ (σu = S 0 ∨ i T (i, σu )) ≡ � � � � � � � �� �� �� 1≤h≤k s = σh ∧ (σh = S 0 ∨ i T (i, σh )) ∨ 1≤q≤n s = σq ∧ (σq = S 0 ∨ i T (i, σq )) ∀s. � �� 1≤u≤m (95) C PROOFS FOR SECTION 4 58 By first order tautologies and equality, W �� reduces to the equivalent formula W ��� : � 1≤u≤m �� 1≤h≤k (σu = σ�h ) ∨ � 1≤q≤n (σu � = σ��q ) ∧ (σu = S 0 ∨ T (i, σu )) (96) W ��� is a regressable sentence of L3 and does not mention terms of sort bags of timelines, therefore the regression theorem can be applied and also in this case the claim holds. 5. Any other regressable atom (see Definition 1) mentioning the classical set-operators, that can be defined using (G1-G5), can be easily reduced to the previous cases. Now, by induction hypothesis, regression can be extended to any regressable sentence mentioning terms of sort bag of timelines simply using the inductive definition of R: R[¬W] = R[W1 ∧ W2 ] = R[(∃v)W] = ¬R[W], R[W1 ] ∧ R[W2 ], (∃v)R[W] (97) This concludes the proof that regression can be extended to regressable formulas mentioning terms of sort bag of timelines � Appendix C C Proofs for Section 4 C.1 Proof of Theorem 5 Let DT be the theory formed by D ss ∪ Dap ∪ Dπ ∪ DS 0 ∪ A+ ∪ Duna , we have to prove that DT is satisfiable iff DS 0 ∪ A+ ∪ Duna is. If DT is satisfiable, despite the second order axiom, using compactness it follows that also DS 0 ∪ A+ ∪ Duna is. For the other direction assume that DS 0 ∪ A+ ∪ Duna is satisfiable. Let M1 be any of such structures. We know from the proof of Theorem 3 that M1 , a structure of L4 , is also a model of DS 0 ∪ Dap ∪ D ssa ∪ Duna ∪ A+ where D ssa ∪ Duna mentions successor state axioms for timelines, but no process is yet defined. We also assume that axiom (4), (see also (45 )) is verified in M1 . Here we have to show that M1 can be transformed into a model of the condition (20) and of the successor state axioms and action precondition axioms for processes. To this end we define a new structure M2 = (D, I � ) having the same domain as M1 , but the interpretation I � extends I, fixing an interpretation also for processes and the fluent Idle, as specified in the sequel. Let Π be the set of processes in the language L4 , we enumerate Π π1 (i1 , �x, t0 , S 0 ), π2 (i1 , �x, t0 , S 0 ), . . . , πm (ik , �x, t0 , S 0 ), πm+1 (ik , �x, t0 , S 0 ), . . . and define subsets of this ordering as follows: Πim = {π j (im , �x, t0 , S 0 ) | im is the corresponding name type of the process π j , for which H(im , a) is defined} In other words, following Corollary B.3 all the actions startπ and end pi might have been already suitable assigned to the H(i, a). We can now order the Πik according to the name type and choose one process for each set and state: M2 |= ∃�xπ j (im , �x, t0 , S 0 ) iff for all πk (im , �x, t0 , S 0 ) ∈ Πim , (k � j), M2 |= ∀�x¬πk (im , �x, t0 , S 0 ) Further we set for all name types ¬Idle(i, S 0 ). C PROOFS FOR SECTION 4 59 This construction implies that M1 is a model of the condition (20). 
Thus we have: (1) (2) (3) (4) M2 , v |= ¬Idle(i, S 0 ) for all name types i M2 , v |= ¬Poss(startπ (i, �x, t), S 0 ) for all name types i, because of (1) above and definition (19) M2 , v |= Poss(endπ (i, �x, t), S 0 ) iff M2 , v |= Ψend (i, �x, S 0 ) and (98) M2 , v |= π(i, �x, t0 , S 0 ) ∧ time(endπ ) > t0 M2 , v |= time(S 0 ) = t0 by the construction of M1 Having fixed the definition of processes, Idle and Poss in DS 0 we can proceed inductively on all situation s similarly as in the relative satisfiability theorem in [61]. In fact, the inductive step of the proof relies on the fact that the right hand side of the successor state axioms Ψ(�x, y, a, t, s) is uniform in s and hence has already been assigned a truth value in s by M2 , and being fixed for DS 0 the induction is straightforward. � C.2 Proof of Proposition 1 We proceed by induction on σ, using the definition of executable given in (17) and the definitions of the action preconditions. Consider do(α, S 0 ), in this case, if H(i, α) then DT |= T (i, do(α, S 0 )). For the inductive step, consider do(α, σ). By the hypothesis α is executable in σ, hence DT |= Poss(α, σ), therefore, by (19) and (21) , α=ν σ. By the induction hypothesis DT |= T (i, σ) hence DT |= T (i, do(α, σ)). � C.3 Proof of Proposition 2 We have to show that, for σ an executable situation, then π�π �� � DT |= Idle(i, σ) ∨ ∃�x t.π(i, �x, t, σ) → ¬Idle(i, σ) ∧ ∀�y ¬π (i, �y, t, σ) (99) π� ∈Π First note that for σ = S 0 the statement holds because of (20). For any other σ, by the previous Proposition 1 (C.2) it follows that σ must also be a timeline. Suppose that for the given timeline T (i, σ) the statement holds for �� any σ � σ, and let do(a, σ� ) be the first situation in the equivalence class of the timeline for which the statement does not hold, for some a. We show that this leads to a contradiction. Assume, therefore, there is some model M of DT and some processes π� (i, �y, t� , do(a, σ� )) and π�� (i, �x, t, do(a, σ� )), with π� � π�� , which do not satisfy (99). Then: M |= ¬Idle(i, do(a, σ)) ∧ (∃�x, t.π� (i, �x, t, do(a, σ� )) ∧ ∃�z, t� .π�� (i,�z, t� , do(a, σ� )) ∨ Idle(i, do(a, σ))) (100) M |= ¬Idle(i, do(a, σ)) ∧ (∃�x t.π� (i, �x, t, do(a, σ� )) ∧ ∃z, t� .π�� (i,�z, t� , do(a, σ� ))) Then: 1. Since M |= ∃�x, t.π� (i, �x, t, do(a, σ� )), then by the successor state axiom for processes (16): M |= M |= ∃�x, t.a=startπ� (i, �x, t) ∨ π� (i, �x, t, σ� ) ∧ ∀t.a � endπ� (i, �x, t) ∃�x, t.a=startπ� (i, �x, t) ∨ ∃�z, t� .π� (i,�z, t� , σ� ) ∧ ∀t�� .a � endπ� (i,�z, t�� ) (101) 2. Since M |= ∃�x, t.π�� (i, �x, t, do(a, σ� )), then: M |= ∃�x, t.a=startπ�� (i, �x, t) ∨ ∃�z, t� .π�� (i,�z, t� , σ� ) ∧ ∀t� .a � endπ�� (i,�z, t� ) (102) D PROOFS FOR SECTION 6 60 Now, since do(a, σ� ) is the first situation in which the statement fails, it follows that it must be true in σ hence it � t� , σ� ) and π�� (i, d�� , t� , σ� ) hold. W.l.o.g cannot be that for some d� and some d�� in the domain of M, both π� (i, d, � � � � t , σ ), for some d� ∈ D, the domain of M, and we may assume any of the two and establish that M |= π (i, d, M |= ∀z, t� ¬π�� (i,�z, t� , σ� ). But now, since M satisfies π�� with argument do(a, σ), it follows, from the successor state axiom for processes, that it must be that M |= ∃�y, t� .a = startπ�� (i, �y, t� ). By the statement assumptions do(a, σ) must be executable, hence M |= ∃�y, t� .a = startπ�� (i, �y, t� )∧Poss(startπ�� (i, �y, t� ), σ� ). 
This fact, in turn, implies, by the definition of Poss for the action startπ�� (see 19) that M |= Idle(i, σ� ), hence � t� , σ� ), for some d� ∈ D, given that (99) holds with σ� , therefore also for π� it it cannot be true that M |= π� (i, d, � � � must be that M |= ∀z, t ¬π (i, d, t� , σ� ). We are thus left with M |= ∃�x, t.a=startπ� (i, �x, t) ∧ Idle(i, σ� ) ∧ ∃�y, t� .a=startπ�� (i, �y, t) ∧ Idle(i, σ� ) (103) But this is not possible for both, by the equality, hence it follows that M |= ∀�x, t.¬π� (i, �x, t, do(a, σ� )), and we have a contradiction. � Appendix D D Proofs for Section 6 D.1 Proof of Theorem 6 We first introduce three lemmas, then we prove the theorem. Lemma 12 Let K and G be finite sets of indexes, the following are tautologies: � � � � i. { i∈K Di } → { g∈G Wg } ≡ {D → { g∈G Wg }}, � � � �i∈K i ii. ∀� w∃�z i∈K {Di (� w) → { g∈G Wg (� w,�z)}} ≡ w{Di (� w) → { g∈G ∃�z Wg (� w,�z)}}. i∈K ∀� Proof. By FOL (104) � Lemma 13 Given the predicates Elapsed X (i, �x, t− , t+ , σ) and ActiveX (i, �x, t− , σ), as defined in (24,25), with σ a ground situation of type i, for any M of TFSC and assignment v, the following holds: � M, v |= Elapsed X (i, �x, t− , t+ , σ) iff M, v |= i∈K [Mk,X (�x, t− , t+ , S 0 ) ∧ (t− = τ−k,X ∧ t+ = τ+k,X )] ; � (105) M, v |= ActiveX (i, �x, t− , σ) iff M, v |= i∈K [Nk,X (�x, t− , S 0 ) ∧ (t− = τ−k,X )]. Here τ±k,X is a time variable (or instance) mentioned in σ, Mk,X (�x, t− , t+ , S 0 ) and Nk,X (�x, t− , S 0 ) are TFSC formulas in S 0 with k in K finite set of indexes. Proof. We proceed by induction on σ. Basic case: in this case we state σ = S 0 , hence, by (24) we have that ActiveX (i, �x, t− , S 0 ) = X(i, �x, S 0 ) ∧ time(S 0 ) = t− ElapsedX (i, �x, t− , t+ , S 0 ) = ⊥, and we obtain (105) once we state, for instance, K = {1}, M1,X (i, �x, t− , t+ , S 0 ) = ⊥, N1,X (i, �x, t− , S 0 ) = X(i, �x, S 0 ) and τ−1,X = t0 , where t0 is for time(S 0 ). Inductive step: now we assume that (105) holds for σ and we prove that it holds for do(A, σ). D PROOFS FOR SECTION 6 61 By (24) we have that: ActiveX (i, �x, t− , do(A, σ)) = T (i, do(A, σ)) ∧ S tartedX (i, �x, t− , A, σ)∨ ActiveX (i, �x, t− , σ) ∧ ¬∃t+ EndedX (i, �x, t+ , A, σ) ElapsedX (i, �x, t− , t+ , do(A, σ)) = T (i, do(A, s)) ∧ ElapsedX (i, �x, t− , t+ , σ)∨ EndedX (i, �x, t+ , A, σ) ∧ ActiveX (i, �x, t− , σ). Applying the definition of S tartedX and EndedX the previous one can be rewritten as follows: ActiveX (i, �x, t− , do(A, σ)) = T (i, do(A, σ)) ∧ X(�x, do(A, σ)) ∧ ¬X(�x, σ) ∧ time(A) = t− ∨ ActiveX (i, �x, t− , σ) ∧ ¬∃t+ (X(�x, σ) ∧ ¬X(�x, do(A, σ)) ∧ time(A) = t+ ); ElapsedX (i, �x, t− , t+ , do(A, σ)) = T (i, do(A, s)) ∧ ElapsedX (i, �x, t− , t+ , σ)∨ X(�x, σ) ∧ ¬X(�x, do(A, σ)) ∧ time(A) = t+ ∧ ActiveX (i, �x, t− , σ). (106) Consider the regression of the following formulas: R(T (i, do(A, σ)) = R0i (S 0 ); R(X(�x, σ) ∧ ¬X(�x, do(A, σ))) = R1X (�x, S 0 ); R(T (i, do(A, σ)) ∧ X(�x, do(A, σ)) ∧ ¬X(�x, σ)) = R2X (�x, S 0 ); R(¬∃t+ (X(�x, σ) ∧ ¬X(�x, do(A, σ)) ∧ time(A) = t+ )) = R3X (�x, S 0 ). Than we can rewrite the previous equivalence (106) by substituting the regressed formulas R2X (�x, S 0 ) ∧ time(A) = t− ∨ ActiveX (i, �x, t− , σ) ∧ R3X (�x, S 0 ); ElapsedX (i, �x, t− , t+ , do(A, σ)) = R0i (S 0 ) ∧ ElapsedX (i, �x, t− , t+ , σ)∨ R1X (�x, S 0 ) ∧ ActiveX (i, �x, t− , σ) ∧ time(A) = t+ . 
ActiveX (i, �x, t− , do(A, σ)) = If we now apply the inductive hypothesis we get: M, v |= M, v |= M, v |= M, v |= Elapsed X (i, �x, t− , t+ , do(A, σ)) � [M (�x, t− , t+ , S 0 ) ∧ R0i (S 0 ) ∧ (t− = time(a−k,X ) ∧ t+ = time(a+k,X ))]∨ �k∈K k,X − x, t , S 0 ) ∧ R1X (�x, S 0 ) ∧ time(A) = t+ ; k∈K [Nk,X (� ActiveX (i, �x, t− , do(A, σ)) � x, t− , S 0 ) ∧ R3X (�x, S 0 ) ∧ (t− = time(a−k,X ))]∨ k∈K [Nk,X (� 2 RX (�x, S 0 ) ∧ time(A) = t− iff iff (107) where a±k,X is an action mentioned in the ground situation σ. Since time(ak,X ) equals to a time variable (or instance) τ±k,X mentioned in σ, the property (23) holds for do(A, σ). This concludes the prove. � Lemma 14 Given a bag of timelines [ω] mentioning the set of timelines {σ1 , . . . , σn }, where ω is a tuple of the time variables ti, j , the predicate I(T c , [ω]) can be transformed into the following form: � � � � � x∃�y(Pd (id , �x, τ−k , τ+k ) opd,r Qr ( jr , �y, τ−g , τ+g )[σid , σ jr ]). (108) d∈D w∈Wd r∈Rd,w k∈Kd,w,r g∈Gd,w,r ∀� Here D, Wd , Rd,w Kd,w,r , and Gd,w,r are finite set of indexes, where τ−k , τ+k (τ−g , τ+g ) are either temporal variables or ground temporal instances mentioned in the ground situation σid (respectively, σ jr ) with name types id ( jr ) . � Proof. From (27) we have that I(T c , ) is a TFSC formula of the form � � � ∃s(s ∈ ∧ T (i, s) ∧ ( {(op,Q( j,�y))∈L} ∃s� (s� ∈ ∧ T ( j, s� )∧ (comp(Pi (�x),LL) ∈ T c ) (L ∈ LL) ∀�x, ti− , ti+ ∃�y, t−j , t+j (P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ])))). D PROOFS FOR SECTION 6 62 We want to prove that, for each model M of TFSC and assignment v M, v |= M, v |= ∃s(s ∈ ∧ T (i, s) ∧ ∃s� (s� ∈ ∧ T ( j, s� )∧ ∀�x, ti− , ti+ ∃�y, t−j , t+j (P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ]))) (109) iff there exist two ground situation σi and σ j in of type i and j respectively, such that: � � x∃�y(P(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ]), k∈K g∈G ∀� with K and G finite sets of indexes and τ−k , τ+k (τ−g , τ+g ) time variables or instances in σi (σ j ). First of all, observe that, since M is a TFSC model, the following holds: M, v |= M, v |= ∃s(s ∈ ∧ T (i, s) ∧ ∃s� (s� ∈ ∧ T ( j, s� )∧ ∀�x, ti− , ti+ ∃�y, t−j , t+j (P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[s, s� ]))) (110) iff there exists two ground situations, σi and σ j in , of type i and j respectively, such that: ∀�x, ti− , ti+ ∃�y, t−j , t+j (P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[σi , σ j ]). Therefore, the theorem is proved once we show that M, v |= ∀�x, ti− , ti+ ∃�y, t−j , t+j (P(i, �x, ti− , ti+ ) op Q( j, �y, t−j , t+j )[σi , σ j ]), � � M, v |= k∈K g∈G ∀�x∃�y(P(i, �x, τ−k , τ+k ) op Q( j, �y, τ−g , τ+g )[σi , σ j ]), with σi , σ j ∈ , iff (111) with τ−k , τ+k , (τ−g , τ+g ) time variables or instances thereof mentioned in σi (σ j ). Indeed, from (111) and (27) we obtain that M, v |= I(T c , ) iff � � � � � M, v |= d∈D w∈Wd r∈Rd,w k∈Kd,w,r g∈Gd,w,r ∀�x∃�y(Pd (id , �x, τ−k , τ+k ) opr Qr ( jr , �y, τ−g , τ+g )[σid , σ jr ]), where d ∈ D is for (comp(P(�x), LL) in T c ), w ∈ Wh is for L in LL, and r ∈ Rd,w is for (op, Q( j, �y)) ∈ L. Once we � � � represent r∈Rd,w k∈Kd,w,r directly as �r,k�∈RKd,w , we obtain the equation (108) and the Lemma is proved. Now, it remains to show that (27) holds. In order to prove this, we proceed with a proof by cases, restricting our attention to {m, b, s, f, d}. 
Lemma 14 Given a bag of timelines Σ[ω] mentioning the set of timelines {σ_1, . . . , σ_n}, where ω is a tuple of the time variables t_{i,j}, the predicate I(T_c, Σ[ω]) can be transformed into the following form:

⋀_{d∈D} ⋀_{w∈W_d} ⋁_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} ∀x̄ ∃ȳ (P_d(i_d, x̄, τ−_k, τ+_k) op_{d,r} Q_r(j_r, ȳ, τ−_g, τ+_g)[σ_{i_d}, σ_{j_r}]).   (108)

Here D, W_d, R_{d,w}, K_{d,w,r}, and G_{d,w,r} are finite sets of indexes, and τ−_k, τ+_k (τ−_g, τ+_g) are either temporal variables or ground temporal instances mentioned in the ground situation σ_{i_d} (respectively, σ_{j_r}) with name type i_d (j_r).

Proof. From (27) we have that I(T_c, Σ) is a TFSC formula of the form

⋀_{(comp(P_i(x̄),LL) ∈ T_c)} ∃s (s ∈ Σ ∧ T(i, s) ∧ ⋀_{(L ∈ LL)} ⋁_{((op,Q(j,ȳ)) ∈ L)} ∃s′ (s′ ∈ Σ ∧ T(j, s′) ∧ ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ (P(i, x̄, ti−, ti+) op Q(j, ȳ, tj−, tj+)[s, s′]))).

We want to prove that, for each model M of TFSC and assignment v,

M, v |= ∃s (s ∈ Σ ∧ T(i, s) ∧ ∃s′ (s′ ∈ Σ ∧ T(j, s′) ∧ ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ (P(i, x̄, ti−, ti+) op Q(j, ȳ, tj−, tj+)[s, s′])))   (109)

iff there exist two ground situations σ_i and σ_j in Σ, of type i and j respectively, such that

M, v |= ⋀_{k∈K} ⋁_{g∈G} ∀x̄ ∃ȳ (P(i, x̄, τ−_k, τ+_k) op Q(j, ȳ, τ−_g, τ+_g)[σ_i, σ_j]),

with K and G finite sets of indexes and τ−_k, τ+_k (τ−_g, τ+_g) time variables or instances in σ_i (σ_j). First of all, observe that, since M is a TFSC model, the following holds:

M, v |= ∃s (s ∈ Σ ∧ T(i, s) ∧ ∃s′ (s′ ∈ Σ ∧ T(j, s′) ∧ ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ (P(i, x̄, ti−, ti+) op Q(j, ȳ, tj−, tj+)[s, s′])))   (110)

iff there exist two ground situations σ_i and σ_j in Σ, of type i and j respectively, such that

M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ (P(i, x̄, ti−, ti+) op Q(j, ȳ, tj−, tj+)[σ_i, σ_j]).

Therefore, the Lemma is proved once we show that

M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ (P(i, x̄, ti−, ti+) op Q(j, ȳ, tj−, tj+)[σ_i, σ_j])   iff
M, v |= ⋀_{k∈K} ⋁_{g∈G} ∀x̄ ∃ȳ (P(i, x̄, τ−_k, τ+_k) op Q(j, ȳ, τ−_g, τ+_g)[σ_i, σ_j]),   with σ_i, σ_j ∈ Σ,   (111)

where τ−_k, τ+_k (τ−_g, τ+_g) are time variables or instances thereof mentioned in σ_i (σ_j). Indeed, from (111) and (27) we obtain that M, v |= I(T_c, Σ[ω]) iff

M, v |= ⋀_{d∈D} ⋀_{w∈W_d} ⋁_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} ∀x̄ ∃ȳ (P_d(i_d, x̄, τ−_k, τ+_k) op_r Q_r(j_r, ȳ, τ−_g, τ+_g)[σ_{i_d}, σ_{j_r}]),

where d ∈ D is for (comp(P(x̄), LL)) in T_c, w ∈ W_d is for L in LL, and r ∈ R_{d,w} is for (op, Q(j, ȳ)) ∈ L. Once we represent ⋁_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} directly as ⋁_{⟨r,k⟩∈RK_{d,w}}, we obtain equation (108). It remains, then, to show that the equivalence (111) holds; to prove this we proceed by cases, restricting our attention to {m, b, s, f, d}.

Case meets: We consider the following form:

P(i, x̄, ti−, ti+) m Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def= Elapsed_P(i, x̄, ti−, ti+, σ_i) → ((Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti+ = tj−)).

Given (107), by FOL, M, v |= (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) iff M, v |= ⋁_{g∈G} [W_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q})], with τ_{g,Q} time variables (or instances) mentioned in σ_j. Therefore, we have that

M, v |= P(i, x̄, ti−, ti+) m Q(j, ȳ, tj−, tj+)[σ_i, σ_j]   iff
M, v |= {⋁_{k∈K} [M_{k,P}(x̄, ti−, ti+, S0) ∧ (ti− = τ−_{k,P} ∧ ti+ = τ+_{k,P})]} → {⋁_{g∈G} [W_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q}) ∧ (ti+ = tj−)]}.   (112)

We can now consider the formula ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ P(i, x̄, ti−, ti+) m Q(j, ȳ, tj−, tj+)[σ_i, σ_j]. We can see that:

M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ P(i, x̄, ti−, ti+) m Q(j, ȳ, tj−, tj+)[σ_i, σ_j]   iff (by (112))
M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ ({⋁_{k∈K} [M_{k,P}(x̄, ti−, ti+, S0) ∧ (ti− = τ−_{k,P} ∧ ti+ = τ+_{k,P})]} → {⋁_{g∈G} [W_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q}) ∧ (ti+ = tj−)]})   iff (by (104) i.)
M, v |= ⋀_{k∈K} {∀x̄, ti−, ti+ [M_{k,P}(x̄, ti−, ti+, S0) ∧ (ti− = τ−_{k,P} ∧ ti+ = τ+_{k,P})] → {⋁_{g∈G} ∃ȳ, tj−, tj+ [W_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q}) ∧ (ti+ = tj−)]}}   iff (by (104) ii.)
M, v |= ⋀_{k∈K} {∀x̄ [M_{k,P}(x̄, τ−_{k,P}, τ+_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [W_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ+_{k,P} = τ−_{g,Q})]}]}   iff
M, v |= ⋀_{k∈K} ⋁_{g∈G} ∀x̄ ∃ȳ P(i, x̄, τ−_{k,P}, τ+_{k,P}) m Q(j, ȳ, τ−_{g,Q}, τ+_{g,Q})[σ_i, σ_j],

the last formula mentioning only time variables in σ_i and σ_j. This concludes the proof for m.

Case before: We consider the following form:

P(i, x̄, ti−, ti+) b Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def= Elapsed_P(i, x̄, ti−, ti+, σ_i) → ((Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti+ < tj−)).

Analogously to the previous case, ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ P(i, x̄, ti−, ti+) b Q(j, ȳ, tj−, tj+)[σ_i, σ_j] can be transformed into

⋀_{k∈K} {∀x̄ M_{k,P}(x̄, τ−_{k,P}, τ+_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [W_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ+_{k,P} < τ−_{g,Q})]}},

mentioning only time variables in σ_i and σ_j. This concludes the proof for b.

Case finishes: We consider the following form:

P(i, x̄, ti−, ti+) f Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def= Elapsed_P(i, x̄, ti−, ti+, σ_i) → (Elapsed_Q(j, ȳ, ti+, tj+, σ_j) ∧ (ti+ = tj+)).

Given equation (107), by regression, we have that M, v |= P(i, x̄, ti−, ti+) f Q(j, ȳ, tj−, tj+)[σ_i, σ_j] iff

M, v |= {⋁_{k∈K} [M_{k,P}(x̄, ti−, ti+, S0) ∧ (ti− = τ−_{k,P} ∧ ti+ = τ+_{k,P})]} → {⋁_{g∈G} [M_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q}) ∧ (ti+ = tj+)]}.

Analogously to the previous cases, M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ P(i, x̄, ti−, ti+) f Q(j, ȳ, tj−, tj+)[σ_i, σ_j] iff

M, v |= ⋀_{k∈K} {∀x̄ M_{k,P}(x̄, τ−_{k,P}, τ+_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [M_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ+_{k,P} = τ+_{g,Q})]}},

mentioning only time variables in σ_i and σ_j. This concludes the proof for f.

Case starts: We consider the following form:

P(i, x̄, ti−, ti+) s Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def=
(Elapsed_P(i, x̄, ti−, ti+, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti− = tj−)) ∧
(Active_P(i, x̄, ti−, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti− = tj−)).

Given equation (107), we have that M, v |= P(i, x̄, ti−, ti+) s Q(j, ȳ, tj−, tj+)[σ_i, σ_j] iff

M, v |= ({⋁_{k∈K} [M_{k,P}(x̄, ti−, ti+, S0) ∧ (ti− = τ−_{k,P} ∧ ti+ = τ+_{k,P})]} → {⋁_{g∈G} [W_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q}) ∧ (ti− = tj−)]}) ∧
({⋁_{k∈K} [N_{k,P}(x̄, ti−, S0) ∧ (ti− = τ−_{k,P})]} → {⋁_{g∈G} [W_{g,Q}(ȳ, tj−, tj+, S0) ∧ (tj− = τ−_{g,Q} ∧ tj+ = τ+_{g,Q}) ∧ (ti− = tj−)]}).

Given this form, we have that M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ P(i, x̄, ti−, ti+) s Q(j, ȳ, tj−, tj+)[σ_i, σ_j] iff

M, v |= ⋀_{k∈K} {∀x̄ M_{k,P}(x̄, τ−_{k,P}, τ+_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [W_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ−_{k,P} = τ−_{g,Q})]}} ∧
⋀_{k∈K} {∀x̄ N_{k,P}(x̄, τ−_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [W_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ−_{k,P} = τ−_{g,Q})]}},

mentioning only time instances or variables in σ_i and σ_j. This concludes the proof for s.

Case during: We consider the following form:

P(i, x̄, ti−, ti+) d Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def=
(Elapsed_P(i, x̄, ti−, ti+, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (tj− ≤ ti− ∧ ti+ ≤ tj+)) ∧
(Active_P(i, x̄, ti−, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (tj− ≤ ti−)).

Analogously to the previous case, M, v |= ∀x̄, ti−, ti+ ∃ȳ, tj−, tj+ P(i, x̄, ti−, ti+) d Q(j, ȳ, tj−, tj+)[σ_i, σ_j] iff

M, v |= ⋀_{k∈K} {∀x̄ M_{k,P}(x̄, τ−_{k,P}, τ+_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [W_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ−_{g,Q} ≤ τ−_{k,P} ∧ τ+_{k,P} ≤ τ+_{g,Q})]}} ∧
⋀_{k∈K} {∀x̄ N_{k,P}(x̄, τ−_{k,P}, S0) → {⋁_{g∈G} ∃ȳ [W_{g,Q}(ȳ, τ−_{g,Q}, τ+_{g,Q}, S0) ∧ (τ−_{g,Q} ≤ τ−_{k,P})]}},

mentioning only time variables or instances in σ_i and σ_j. This concludes the proof for d. □
Concluding the proof of Theorem 6: To conclude the proof we consider the standard transformation from CNF to DNF formulas: the CNF form ⋀_{d∈D} ⋀_{w∈W_d} B_{d,w} obtained from the Lemma can be expressed as an equivalent DNF form ⋁_{n∈N} ⋀_{m∈M_n} B_{n,m}, for suitable sets of indexes D, N, W_d, and M_n. We now consider the form (108), i.e.

⋀_{d∈D} ⋀_{w∈W_d} ⋁_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} ∀x̄ ∃ȳ (P_d(i_d, x̄, τ−_k, τ+_k) op_r Q_r(j_r, ȳ, τ−_g, τ+_g)[σ_{i_d}, σ_{j_r}]),

where D, W, R, K, and G are finite sets of indexes and τ−_k, τ+_k (τ−_g, τ+_g) are the temporal variables mentioned in the ground situation σ_{i_d} (σ_{j_r}). We consider this formula in the following form:

⋀_{d∈D} ⋀_{w∈W_d} ⋁_{⟨r,k⟩∈RK_{d,w}} ⋁_{g∈G_{d,w,r}} B^{d,w,r}_{k,g},   (113)

with B^{d,w,r}_{k,g} representing ∀x̄ ∃ȳ P_d(i_d, x̄, τ−_k, τ+_k) op_r Q_r(j_r, ȳ, τ−_g, τ+_g)[σ_{i_d}, σ_{j_r}]. By applying the CNF to DNF transformation we can pass through the following equivalent forms. From

(1)  ⋀_{d∈D} ⋀_{⟨w,r⟩∈WR_d} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} B^{d,w,r}_{k,g}

we get

(2)  ⋀_{d∈D} ⋀_{⟨w,r⟩∈WR_d} ⋁_{n=⟨n1,n2⟩∈N_{d,w}} ⋀_{m=⟨m1,m2⟩∈M_{d,w,n}} B^{d,w}_{n,m},

with B^{d,w}_{n,m} representing ∀x̄ ∃ȳ P_d(i_d, x̄, τ−_{n1,m1}, τ+_{n1,m1}) op_{n1,m1} Q_{n1,m1}(j_{n1,m1}, ȳ, τ−_{n2,m2}, τ+_{n2,m2})[σ_{i_d}, σ_{j_{n1,m1}}], which is equivalent to

(3)  ⋀_{d∈D} ⋀_{⟨w,r⟩∈WR_d, n∈N_{d,w}} ⋀_{m∈M_{d,w,n}} B^{d,w,r}_{n,m},

and the previous form can be expressed as

(4)  ⋀_{d∈D} ⋀_{⟨w,r,n⟩∈WRN_d} ⋀_{m∈M_{d,w,n}} B^{d,w,r}_{n,m},

where ⟨w, r, n⟩ ∈ WRN_d abbreviates ⟨w, r⟩ ∈ WR_d, n ∈ N_{d,w}. By applying again the CNF to DNF transformation we get

(5)  ⋁_{z∈Z} ⋀_{s=⟨s1,s2,s3⟩∈S_z} ⋀_{m∈M_{z,s,n}} B^{z,s}_m,

with B^{z,s}_m representing ∀x̄ ∃ȳ P_{z,s1}(i_{z,s1}, x̄, τ−_{s2,m1}, τ+_{s2,m1}) op_{s3,m1} Q_{s3,m1}(j_{s3,m1}, ȳ, τ−_{s2,m2}, τ+_{s2,m2})[σ_{i_{z,s1}}, σ_{j_{s1,m1}}]; hence, if we write ⋀_{s∈S_z} ⋀_{m∈M_{z,s,n}} as ⋀_{q=⟨q1,q2,q3,q4⟩∈Q_z}, we get

(6)  ⋁_{z∈Z} ⋀_{q∈Q_z} B^z_q.

Now, from (6), with B^z_q representing ∀x̄ ∃ȳ P_{z,q1}(i_{z,q1}, x̄, τ−_{z,q2}, τ+_{z,q2}) op_{z,q3} Q_{z,q3}(j_{z,q3}, ȳ, τ−_{z,q4}, τ+_{z,q4})[σ_{i_{z,q1}}, σ_{j_{z,q3}}], we obtain the required form (32). This concludes the proof of the theorem. □
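The concluding steps (1)–(6) only rely on the propositional distribution of conjunction over disjunction. A minimal sketch, not from the deliverable, with B11, B12, B21, B22 standing in for the blocks B_{d,w}:

# Minimal sketch, not from the deliverable: CNF -> DNF by distributing
# conjunction over disjunction, as used in steps (1)-(6) above.
from sympy import symbols
from sympy.logic.boolalg import Or, And, to_dnf

B11, B12, B21, B22 = symbols('B11 B12 B21 B22')
cnf = And(Or(B11, B12), Or(B21, B22))    # /\_d \/_w B_{d,w}
print(to_dnf(cnf))
# e.g. (B11 & B21) | (B11 & B22) | (B12 & B21) | (B12 & B22)  -- \/_n /\_m B_{n,m}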
D.2 Proof of Theorem 7

First note that each symbol op is associated with the temporal relation γ_op(t1−, t1+, t2−, t2+), obtained as a combination of the relations = and < as specified in equation (31). Let M be a structure of L_TFSC such that M is a model of D_T, and suppose that for some assignment v the following hold:

(i) M, v |= ∀x̄ ∃ȳ P(i, x̄, τ−_p, τ+_p) op Q(j, ȳ, τ−_q, τ+_q)[σ_i, σ_j];
(ii) M, v |= ∃x̄ Elapsed_P(i, x̄, τ−_p, τ+_p, σ_i) or M, v |= ∃x̄ Active_P(i, x̄, τ−_p, σ_i).

We prove the theorem by cases for each op in {m, s, f, b, d}. First we consider the case of m. Since

P(i, x̄, ti−, ti+) m Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def= Elapsed_P(i, x̄, ti−, ti+, σ_i) → ((Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti+ = tj−)),

we have that, for any model M |= D_T and assignment v such that items (i) and (ii) hold,

M, v |= Elapsed_P(i, x̄, τ−_p, τ+_p, σ_i) → ((Active_Q(j, ȳ, τ−_q, σ_j) ∨ Elapsed_Q(j, ȳ, τ−_q, τ+_q, σ_j)) ∧ (τ+_p = τ−_q)),

and

M, v |= (Active_Q(j, ȳ, τ−_q, σ_j) ∨ Elapsed_Q(j, ȳ, τ−_q, τ+_q, σ_j)) ∧ (τ+_p = τ−_q).   (114)

Thus, by (i) and (ii) we get M, v |= (τ+_p = τ−_q). Let [d−_p, d+_p, d−_q, d+_q] be an assignment to the time variables according to v (or an interpretation of the ground terms); then, since M, v |= (τ+_p = τ−_q), by equation (114) it follows that the algebraic relation γ_m(d−_p, d+_p, d−_q, d+_q) holds in M as well. The proof for b and f is analogous.

For s we have the following macro expansion:

P(i, x̄, ti−, ti+) s Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def=
(Elapsed_P(i, x̄, ti−, ti+, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti− = tj−)) ∧
(Active_P(i, x̄, ti−, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (ti− = tj−)),

and we have that, for any model M |= D_T and assignment v such that (i) holds,

M, v |= Active_P(i, x̄, τ−_p, σ_i) → ((Active_Q(j, ȳ, τ−_q, σ_j) ∨ Elapsed_Q(j, ȳ, τ−_q, τ+_q, σ_j)) ∧ (τ−_p = τ−_q)),

and

M, v |= Elapsed_P(i, x̄, τ−_p, τ+_p, σ_i) → ((Active_Q(j, ȳ, τ−_q, σ_j) ∨ Elapsed_Q(j, ȳ, τ−_q, τ+_q, σ_j)) ∧ (τ−_p = τ−_q)).

By (ii), either M, v |= Active_P(i, x̄, τ−_p, σ_i) or M, v |= Elapsed_P(i, x̄, τ−_p, τ+_p, σ_i); in both cases M, v |= (τ−_p = τ−_q). Hence, given the assignment [d−_p, d+_p, d−_q, d+_q] to the time variables, the relation γ_s(d−_p, d+_p, d−_q, d+_q) holds in M.

For d we are given the following form:

P(i, x̄, ti−, ti+) d Q(j, ȳ, tj−, tj+)[σ_i, σ_j] def=
(Elapsed_P(i, x̄, ti−, ti+, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (tj− ≤ ti− ∧ ti+ ≤ tj+)) ∧
(Active_P(i, x̄, ti−, σ_i) → (Active_Q(j, ȳ, tj−, σ_j) ∨ Elapsed_Q(j, ȳ, ti−, tj+, σ_j)) ∧ (tj− ≤ ti−)),

and we have that, by (i) and (ii), either M, v |= Elapsed_P(i, x̄, τ−_p, τ+_p, σ_i), and then M, v |= (τ−_q ≤ τ−_p ∧ τ+_p ≤ τ+_q), or M, v |= Active_P(i, x̄, τ−_p, σ_i), and then M, v |= (τ−_q ≤ τ−_p). Also in this case, given the assignment v mapping the time variables to [d−_p, d+_p, d−_q, d+_q], either d−_q ≤ d−_p holds, or d−_q ≤ d−_p ∧ d+_p ≤ d+_q holds; the case of ground terms is analogous. Thus, either the relation γ_d(d−_p, d+_p, d−_q, d+_q) holds or its relaxed version γ′_d holds. □
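Theorem 7 reduces each interval operator to an algebraic relation γ_op over the interval endpoints. Equation (31), which fixes the exact combinations of = and <, is not reproduced in this appendix, so the sketch below encodes only one plausible reading of the five relations used in the proof and is offered purely as an illustration.

# Minimal sketch, not from the deliverable: the endpoint relations gamma_op
# of Theorem 7, under one plausible reading of equation (31).
# Each check takes the endpoints (p_minus, p_plus) of P and (q_minus, q_plus) of Q.
GAMMA = {
    'm': lambda pm, pp, qm, qp: pp == qm,              # meets:    P ends where Q starts
    'b': lambda pm, pp, qm, qp: pp < qm,                # before:   P ends strictly before Q starts
    's': lambda pm, pp, qm, qp: pm == qm,               # starts:   P and Q start together
    'f': lambda pm, pp, qm, qp: pp == qp,               # finishes: P and Q end together
    'd': lambda pm, pp, qm, qp: qm <= pm and pp <= qp,  # during:   P is contained in Q
}

def gamma(op: str, p_interval, q_interval) -> bool:
    """Check gamma_op on concrete endpoint values, as in the proof of Theorem 7."""
    pm, pp = p_interval
    qm, qp = q_interval
    return GAMMA[op](pm, pp, qm, qp)

print(gamma('m', (1, 3), (3, 6)))   # True: [1,3] meets [3,6]
print(gamma('d', (4, 5), (3, 6)))   # True: [4,5] is during [3,6]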
D.3 Proof of Corollary 2

It is a consequence of Theorem 6, Theorem 7, and of the network construction. Indeed, by Theorem 6, we have that I(T_c, Σ[ω]) can be reduced to the form (32):

⋁_{z∈Z} ⋀_{⟨q1,q2,q3,q4⟩∈J_z} ∀x̄ ∃ȳ. P_{z,q1}(i_{z,q1}, x̄, τ−_{z,q2}, τ+_{z,q2}) op_{z,q3} Q_{z,q3}(j_{z,q3}, ȳ, τ−_{z,q4}, τ+_{z,q4})[σ_{i_{z,q1}}, σ_{j_{z,q3}}].

From this, following the network construction steps (a)–(d), we can build the temporal constraint network ζ:

⋁_{z∈Z} ⋀_{⟨q1,q2,q3,q4⟩∈J_z} µ^{q1}_{op_{z,q3}}(τ−_{z,q2}, τ+_{z,q2}, τ−_{z,q4}, τ+_{z,q4}).

We have to show that, given a model M for D_T: (1) if M, v |= I(T_c, Σ[ω]), then the assignment v represents a solution for ζ(D_T, T_c, Σ[ω]); (2) given an assignment v which is a solution for ζ(D_T, T_c, Σ[ω]), then M, v |= I(T_c, Σ[ω]).

(1) If M, v |= I(T_c, Σ[ω]) then, given the form (32), there exists at least one z ∈ Z such that M, v |= ⋀_{⟨q1,q2,q3,q4⟩∈J_z} ∀x̄ ∃ȳ. P_{z,q1}(·) op_{z,q3} Q_{z,q3}(·). Therefore, for each associated conjunct, indexed by ⟨q1, q2, q3, q4⟩ ∈ J_z, M, v |= ∀x̄ ∃ȳ. P_{z,q1}(·) op_{z,q3} Q_{z,q3}(·). If (i) and (ii) of Theorem 7 are satisfied, then we can apply Theorem 7, which ensures that µ^{q1}_{op_{z,q3}} (obtained from (m2) if E_{P_{z,q1}} holds, or from (m3) if A_{P_{z,q1}} holds) is satisfied by the assignment v. In the remaining case, since (i) and (ii) are not satisfied, N_P holds, and thus µ^{q1}_{op_{z,q3}} is trivially satisfied because it does not impose any constraint on the associated variables. This concludes the proof of (1).

(2) As for the other direction, we have to show that, given a model M of D_T and an assignment solution V for the temporal network ζ(D_T, T_c, Σ[ω]), the assignment v that agrees with V on the time variables ω is such that M, v |= I(T_c, Σ[ω]). Analogously to the previous case, since ζ is a disjunction of conjunctions, if ζ is satisfiable then there exists at least one disjunct ⋀_{⟨q1,q2,q3,q4⟩∈J_z} µ^{q1}_{op_{z,q3}}(·), indexed by z ∈ Z, for which V is an assignment solution. Now, given one of the conjuncts µ^{q1}_{op_{z,q3}}(·), by step (c) of the ζ construction there exists an associated conjunct ∀x̄ ∃ȳ. P_{z,q1}(·) op_{z,q3} Q_{z,q3}(·) in (32) that is also partially consistent. We show that, given the assignment v restricted to the time variables ω that agrees with V on these, M, v |= ∀x̄ ∃ȳ. P_{z,q1}(·) op_{z,q3} Q_{z,q3}(·) for each op_{z,q3} ∈ {m, f, s, d, . . .}.

For each of these cases, P_{z,q1}(·) op_{z,q3} Q_{z,q3}(·) can be reduced to the form

(A)  ⋁ (E_P(x̄, τ̄, ·) → (Q_{E_P}(ȳ, τ̄, ·) ∧ µ_op(τ̄))),

where the E_P(x̄, τ̄, ·) are mutually exclusive in the disjunction (i.e. given an assignment for x̄, τ̄, and ·, at least one µ_op is enabled). We may assume, by contradiction, that v is such that M, v ⊭ ∀x̄ ∃ȳ. P_{z,q1}(x̄, τ̄) op_{z,q3} Q_{z,q3}(ȳ, τ̄), hence M, v ⊭ ∀x̄ ∃ȳ. ⋁ (E_P(x̄, τ̄, ·) → (Q_{E_P}(ȳ, τ̄, ·) ∧ µ_op(τ̄))). However, by FOL, from this we get that M, v ⊭ ∀x̄. ⋁ (E_P(x̄, τ̄, ·) → (∃ȳ Q_{E_P}(ȳ, τ̄, ·) ∧ µ_op(τ̄))). From this it follows that there exists v∗ extending v with an assignment for x̄ such that M, v∗ ⊭ ⋁ (E_P(x̄, τ̄, ·) → (∃ȳ Q_{E_P}(ȳ, τ̄, ·) ∧ µ_op(τ̄))), hence, for each disjunct, M, v∗ ⊭ E_P(x̄, τ̄, ·) → (∃ȳ Q_{E_P}(ȳ, τ̄, ·) ∧ µ_op(τ̄)). At this point, for each disjunct, we have two possible cases: (B) M, v∗ ⊭ E_P(x̄, τ̄, ·) → ∃ȳ Q_{E_P}(ȳ, τ̄, ·), or (C) M, v∗ ⊭ E_P(x̄, τ̄, ·) → µ_op(τ̄). However, by the partial consistency assumption, (B) is contradicted in at least one disjunct. On the other hand, (C) requires that M, v∗ |= E_P(x̄, τ̄, ·) and (D) M, v∗ ⊭ µ_op(τ̄). But (D) contradicts the assumption: by assumption, v∗ restricted to the temporal variables agrees with V, which solves the algebraic relation represented by µ_op(τ̄). This concludes the proof of (2). □
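Corollary 2 turns the satisfaction of I(T_c, Σ[ω]) into checking an assignment of the time variables in ω against a disjunction of conjunctions of endpoint constraints. A minimal sketch of that check, with an illustrative constraint encoding that is not the deliverable's implementation:

# Minimal sketch, not from the deliverable: checking an assignment against a
# disjunctive temporal constraint network zeta = \/_z /\_q mu_q, where every
# mu_q is an endpoint constraint between named time variables.
from typing import Callable, Dict, List

Constraint = Callable[[Dict[str, float]], bool]     # one mu_op(...) over named time variables

def solves(network: List[List[Constraint]], v: Dict[str, float]) -> bool:
    """v solves the network iff it satisfies every constraint of at least one disjunct."""
    return any(all(mu(v) for mu in conjunct) for conjunct in network)

# Example: one disjunct requiring "P meets Q", another requiring "P before Q".
zeta = [
    [lambda v: v['tp_plus'] == v['tq_minus']],   # gamma_m
    [lambda v: v['tp_plus'] < v['tq_minus']],    # gamma_b
]
print(solves(zeta, {'tp_plus': 3.0, 'tq_minus': 3.0}))   # True (first disjunct)
print(solves(zeta, {'tp_plus': 4.0, 'tq_minus': 3.0}))   # False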
E Proofs for Section 7

E.1 Proof of Proposition 3

We have to show that, given a model M of D_T, if M |= DoTF(prog, Σ_init, Σ, (hs, he)), and assuming that ttime(Σ) ≤ he and hs ≤ ttime(Σ_init), then Σ_init ⊑S Σ. We show the statement by induction on the structure of prog. The basic case is given by the primitive action prog = a. In this case we have to show that if M |= DoTF(a, Σ_init, Σ, (hs, he)), then Σ_init ⊑S Σ. However, by definition, either M |= Σ = ddo(a, s, Σ_init) or M |= Σ_init = Σ; in both cases the statement holds. Now assume, by induction, that the statement holds for DoTF(prog, Σ_init, Σ, (hs, he)); we consider the following constructs.

1. Program sequence: DoTF(prog1 ; prog2, Σ_init, Σ, (hs, he)). By definition and the inductive hypothesis there exists Σ′′ such that Σ_init ⊑S Σ′′ and Σ′′ ⊑S Σ; hence, by transitivity, Σ_init ⊑S Σ.

2. Partial-order action choice: DoTF(prog1 ≺ prog2, Σ_init, Σ, (hs, he)). By definition and the assumptions,

M |= ∃Σ′′, Σ′′′ (DoTF(prog1, Σ_init, Σ′′, (hs, he)) ∧ DoTF(prog2, Σ′′′, Σ, (hs, he)) ∧ Σ′′ ⊑S Σ′′′).   (115)

Thus, there are two bags of timelines Σ′′ and Σ′′′ such that Σ_init ⊑S Σ′′ and Σ′′′ ⊑S Σ by the inductive hypothesis, and Σ′′ ⊑S Σ′′′ by definition; therefore, by the transitivity of ⊑S, we obtain Σ_init ⊑S Σ.

3. Nondeterministic action choice: M |= DoTF(prog1 | prog2, Σ_init, Σ, (hs, he)). By definition, M |= DoTF(prog1, Σ_init, Σ, (hs, he)) ∨ DoTF(prog2, Σ_init, Σ, (hs, he)); hence, by the inductive hypothesis, Σ_init ⊑S Σ.

4. Nondeterministic iteration: M |= DoTF(prog∗, Σ_init, Σ, (hs, he)). By the inductive hypothesis we have that, if M |= DoTF(prog, Σ′, Σ′′, (hs, he)), then Σ′ ⊑S Σ′′; hence, by transitive closure, Σ_init ⊑S Σ.

The other cases follow by analogous reasoning. □
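The induction in E.1 rests on the observation that every DoTF construct can only extend the bag of timelines it starts from, and that the extension relation ⊑S is transitive. The sketch below is a deliberately simplified stand-in, not the deliverable's semantics: timelines are plain lists, bags are ignored, and every construct either appends to the history it receives or passes it through, so the result always extends the input, mirroring the structure of the inductive argument.

# Minimal sketch, not from the deliverable: a simplified interpreter over
# program terms that mirrors the induction of Proposition 3. A "timeline" is
# just a list of (action, time) pairs; every construct only appends to it.
from typing import List, Tuple

Timeline = List[Tuple[str, float]]

def do_tf(prog, timeline: Timeline, clock: float) -> Timeline:
    kind = prog[0]
    if kind == 'act':                       # primitive action: append (or leave unchanged)
        _, action = prog
        return timeline + [(action, clock)]
    if kind == 'seq':                       # prog1 ; prog2
        _, p1, p2 = prog
        mid = do_tf(p1, timeline, clock)
        return do_tf(p2, mid, clock + 1)
    if kind == 'choice':                    # prog1 | prog2 (pick the first branch here)
        _, p1, _p2 = prog
        return do_tf(p1, timeline, clock)
    if kind == 'star':                      # prog* (unfold a bounded number of times here)
        _, p, n = prog
        for _ in range(n):
            timeline = do_tf(p, timeline, clock)
            clock += 1
        return timeline
    raise ValueError(kind)

init: Timeline = [('boot', 0.0)]
out = do_tf(('seq', ('act', 'goto'), ('star', ('act', 'scan'), 2)), init, 1.0)
assert out[:len(init)] == init              # the result extends the initial timeline
print(out)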
E.2 Proof of Proposition 4

Analogously to the previous proof, we show the statement by induction on the structure of prog. The base of the induction uses the primitive action case, and the other statements are proved using the inductive hypothesis.

1. Primitive action. If prog = a, executability is a direct consequence of the executability of a in Σ_init. Indeed, in any model M of D_T, if DoTF(a, Σ_init, Σ′, (hs, he)), ttime(Σ_init) ≥ hs, and ttime(Σ′) ≤ he hold, then, by the definition of DoTF and FOL, we have that ∃s (s ∈ Σ_init ∧ a =ν s ∧ Poss(a, s) ∧ time(s) ≥ hs ∧ time(s) ≤ he ∧ time(s) ≥ time(a) ∧ time(a) ≤ he ∧ Σ′ = ddo(a, s, Σ_init)). Hence, if σ ∈ Σ′ holds, either σ ∈ Σ_init or σ = do(a, σ′) with σ′ ∈ Σ_init. In both cases, by the assumption and the definition of DoTF, executable(σ) obtains.

2. Program sequence. If DoTF(prog1 ; prog2, Σ_init, Σ, (hs, he)) holds, we can assume by induction that there exists Σ′′ such that the property holds for: (1) DoTF(prog1, Σ_init, Σ′′, (hs, he)); (2) DoTF(prog2, Σ′′, Σ, (hs, he)). Therefore, by the assumption and (1), we can conclude that any σ ∈ Σ′′ is executable. Now, since any σ ∈ Σ′′ is executable, we can apply the inductive hypothesis to (2) and conclude that if σ ∈ Σ then σ is executable.

3. Partial-order action choice. Analogously to the previous case, if DoTF(prog1 ≺ prog2, Σ_init, Σ, (hs, he)) holds, there exist Σ′′ and Σ′ such that (1) DoTF(prog1, Σ_init, Σ′′, (hs, he)) and (2) DoTF(prog2, Σ′, Σ, (hs, he)), with (3) exec(Σ′′, Σ′, (hs, he)). Now, from (1) and (3), by the inductive hypothesis and the definition of exec, any σ ∈ Σ′ is executable. Therefore, by (2) and the inductive hypothesis, we can also conclude that if σ ∈ Σ then σ is executable.

4. Nondeterministic iteration. By the inductive hypothesis, if DoTF(prog, Σ′, Σ′′, (hs, he)) holds and every σ ∈ Σ′ is executable, then every σ′ ∈ Σ′′ is executable. Analogously to Proof E.1, the statement can be proved by transitive closure.

5. The cases of test, nondeterministic choice of actions, and nondeterministic choice of argument are straightforward. □

E.3 Proof of Proposition 5

The proof is a direct consequence of Corollary 2. By FOL we have that, for any model M of D_T and for any assignment v to ω,

M, v |= DoTF(prog, Σ_init, Σ[ω], (hs, he)) ∧ I(T_c, Σ[ω]) iff M, v |= DoTF(prog, Σ_init, Σ[ω], (hs, he)) and M, v |= I(T_c, Σ[ω]).

However, by Corollary 2 we have that, given a model M of D_T, for any assignment v to the free variables ω of Σ[ω], M, v |= I(T_c, Σ[ω]) iff v is an assignment solution for network(D_T, T_c, Σ[ω]). Hence, by FOL and Corollary 2, for any model M of D_T and for any assignment v to ω,

M, v |= DoTF(prog, Σ_init, Σ[ω], (hs, he)) ∧ I(T_c, Σ[ω]) iff M, v |= DoTF(prog, Σ_init, Σ[ω], (hs, he)) and v is a solution for network(D_T, T_c, Σ[ω]). □

E.4 Proof of Proposition 6

Given D_STSC, we can build an associated D_T denoting a single component. We introduce a constant v representing a unique state variable. Each action a_st(·) and fluent P_st(·) used in D_STSC is also introduced in D_T. For each start_p^st(·) and end_p^st(·) in D_STSC we introduce an action start_st(v, ·) and end_st(v, ·) in D_T. The action preconditions associated with start_st(v, ·) and end_st(v, ·) are those of D_STSC. In this case the processes are not needed: the preconditions D_ap and the successor state axioms D_ss coincide with D_ap^st and D_ss^st, respectively, and D_S0 coincides with D_S0^st. At this point D_T is defined by D_ss^st ∪ D_ap^st ∪ D_S0^st ∪ A+ ∪ D_una. Note that, since the durative actions in D_STSC are not processes in D_T, D_π is left empty.

We show the statement by induction on the structure of prog_st. For the base step we consider the primitive action. We can show that (1) D_STSC |= Do(a, s, s′) iff (2) D_T |= DoTF(a, Σ, Σ′, (0, ∞)). If we consider the horizon (0, ∞), (2) can be reduced to

D_T |= ∃s (s ∈ Σ ∧ a =ν s ∧ Poss(a, s) ∧ time(s) ≥ time(a) ∧ Σ′ = ddo(a, s, Σ)).

On the other hand, (1) holds iff D_STSC |= Poss(a, s) ∧ start(s) ≤ time(a) ∧ s′ = do(a, s). Since D_T extends D_STSC, it is easy to see that (2) entails (1). As for the other direction, Σ is composed of a single situation; therefore, for any situation that satisfies (1) it is possible to build a singleton bag that satisfies (2). By induction on the structure of the program, it is easy to show that the property holds for the other standard Golog constructs. □
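The construction in E.4 replaces every durative action of D_STSC by a start/end pair over the single state variable v. A small sketch of that purely syntactic mapping (the data layout and names are illustrative, not the deliverable's code):

# Minimal sketch, not from the deliverable: the E.4 construction that maps each
# durative action p of D_STSC to a pair of instantaneous actions start_st(v, .)
# and end_st(v, .) over a single state variable v.
def lift_durative_actions(durative_actions):
    """Return the action symbols introduced in D_T for a single component v."""
    v = 'v'                                   # the unique state variable
    lifted = []
    for p in durative_actions:
        lifted.append(f'start_st({v}, {p})')  # inherits the preconditions of start_p^st
        lifted.append(f'end_st({v}, {p})')    # inherits the preconditions of end_p^st
    return lifted

print(lift_durative_actions(['navigate', 'grasp']))
# ['start_st(v, navigate)', 'end_st(v, navigate)', 'start_st(v, grasp)', 'end_st(v, grasp)']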