DR 5.1.2: Methods and paradigms for skill learning based on affordances and action-reaction observation

Mario Gianni, Panagiotis Papadakis, Fiora Pirri and Matia Pizzoli
Dipartimento di Informatica e Sistemistica - Sapienza Università di Roma, via Ariosto 25, 00185 Rome, Italy
[email protected]

Project, project Id: EU FP7 NIFTi / ICT-247870
Project start date: Jan 1 2010 (48 months)
Due date of deliverable: December 2010
Actual submission date: February 2011
Lead partner: ROMA
Revision: FINAL
Dissemination level: PU
This document describes the status of progress for the research on learning
skills for functioning processes and task execution performed by the NIFTi
Consortium. In particular, according to the Description of Work (DOW),
research is focused on the development of novel methods and paradigms for
skill learning based on affordances and action-reaction observation. Planned
work, as per the DOW, is introduced and the actual work is discussed,
highlighting the relevant achievements, how these contribute to the current
state of the art and to the aims of the project.
Contents

1 Tasks, objectives, results
1.1 Planned work
1.2 Actual work performed
1.2.1 Task T5.1 Planning activities specification with end user (completed)
1.2.2 Task T5.2: Learning Skills for functioning processes and task execution (in progress)
1.2.3 T5.3: Task-driven attention for coordination and communication (in progress)
1.2.4 T5.5: Situated exploration history (in progress)
1.3 Relation to user-centric design
1.4 Relation to the state-of-the-art
2 Annexes
2.1 Rudi et al. “Linear Solvability in the Viewing Graph” (ACCV 2010)
2.2 Fanello et al. “Arm-Hand behaviours modelling: from attention to imitation” (ISVC 2010)
2.3 H. Khambhaita et al. “Help Me to Help You: how to Learn Intentions, Actions and Plans” (AAAI-SSS 2011)
2.4 Carbone and Pirri. “Learning Saliency. An ICA based model using Bernoulli mixtures.” (BICS 2010)
2.5 Gianni et al. “Learning cross-modal translatability: grounding speech act on visual perception.” (RSS 2010)
2.6 Carrano et al. “An approach to projective reconstruction from multiple views.” (IASTED 2010)
2.7 Krieger and Kruijff. “Combining Uncertainty and Description Logic Rule-Based Reasoning in Situation-Aware Robots.” (AAAI-SSS 2011b)
2.8 Stachowicz and Kruijff. “Episodic-Like Memory for Cognitive Robots.” (IEEE-TAMD 2011)
2.9 Pirri et al. “A general method for the Point of Regard estimation in 3D space.” (CVPR 2011)
2.10 Finzi and Pirri. “Switching tasks and flexible reasoning in the Situation Calculus.” (TR 2010)
3 Arm-Hand behaviours modelling: from attention to imitation
4 Learning Saliency. An ICA based model using Bernoulli mixtures
5 Learning cross-modal translatability: grounding speech act on visual perception
6 An approach to projective reconstruction from multiple views
7 Switching tasks and flexible reasoning in the Situation Calculus
Executive Summary
This document describes the status of progress for the research on learning
skills for functioning processes and task execution performed by the NIFTi
Consortium. In particular, according to the Description of Work (DOW),
research is focused on the development of novel methods and paradigms for
skill learning based on affordances and action-reaction observation. Planned
work, as per the DOW, is introduced and the actual work is discussed,
highlighting the relevant achievements, how these contribute to the current
state of the art and to the aims of the project.
The exploration of an unknown and dynamic environment, and the need to readily address the requests produced by the mixed initiative, require the robot control to be flexible and to adapt to the continuous changes of context. Switching between sensing modalities, changing the focus of attention or asking for the operator’s intervention are examples of skills that are desirable in order to develop the user-centric, interaction-oriented architecture that is the primary goal of the NIFTi research.
Task 5.2 (T5.2), whose status of advancement is reported in this document, addresses the problem of providing methods and paradigms for learning the required skills. The training set consists of observations of demonstrations performed by humans. Central to this learning paradigm is the
role of data collection in real scenarios from demonstrations by experts.
Thus, together with the development of the flexible planning architecture, a
novel approach to data collection from online demonstrations is introduced.
This approach relies on the Gaze Machine, a system for the acquisition of
demonstration data from the demonstrator’s point of view.
The aim of this document is to report on the research carried out in WP5, mainly concerned with T5.2. Nonetheless, the performed work relies on the analysis resulting from T5.1 and provides input to all the other WP5 tasks. Thus, although the main focus of this deliverable is on T5.2, DR 5.1.2 also reports progress on other WP5 tasks, in particular on the work related to flexible planning, which has been started in parallel.
Role of Skill Learning in NIFTi
During the exploration of an unknown area, the NIFTi human-robot team continuously acts and interacts. Thus, for an artificial agent, execution has to be adapted in order to address sudden, asynchronous needs arising from the intervention of the operators and from the dynamic nature of the unknown environment. Task 5.2, Learning skills for functioning processes and task execution, aims at developing paradigms and methods to acquire the skills necessary for such control, from human-robot interaction and live
demonstrations in real intervention scenarios. WP5 investigates skills involved in selecting and coordinating multiple tasks when operating. Skills
are learned by continuous interaction with humans, via demonstration and
through an action-reaction paradigm, to acquire the effects of actions and
processes.
Contribution to the NIFTi scenarios and prototypes
The problem that is addressed in T5.2 is to develop methods and paradigms
to learn skills that are needed to plan and coordinate the tasks the NIFTi
architecture has to accomplish to reach the goal in the USAR scenario.
The training set consists of observations of demonstrations performed by
humans. This means that where humans look, how they move, what actions
they perform and how they report what they see and do through speech
constitute the training data for skill learning. The use of human visual
attention as a form of demonstration is novel in the definition of a model
for flexible planning.
Learning to detect salient features, gestures and actions requires an instrument for the study of visual attention. The most salient regions in a scene can be determined by analysing the performed sequence of fixations, i.e. by tracking the gaze. The need for a special, custom device, rather than a commercially available solution, originated from a number of considerations. First, in the USAR scenario, the demonstrator moves in a 3-dimensional environment and his actions, in the form of fixations, motion or manipulation, are inherently 3-dimensional. Second, the experiments require extracting scan-paths from extremely varying datasets, implying completely different experimental setups in terms of distance and camera field of view; hence, a high level of flexibility is desirable. Finally, our experiments involve a number of subjects and complex, dynamic scenarios: calibration and acquisition procedures must require as little intervention from the operator as possible, and the whole device should be highly automatic and non-invasive.
Along with accuracy and robustness, the need for a device suitable for on-field data acquisition led to the introduction of a novel calibration procedure that is almost completely automatic and can be carried out with very little intervention by the operator. The device we call the Gaze Machine allows such data acquisition, as described in the following.
Figure 1: Contribution of the learning paradigm based on the Gaze Machine data collection to the NIFTi scenario. (Block diagram relating the September 2010 Gaze Machine data acquisition — gaze data collected in 3D gaze tracking experiments, 3D saliency map, inertial measurements for the subject’s head, running commentaries containing referring expressions, domain material with action sequences specifying protocols and procedures, and the domain analysis and specifications of DR 5.1.1 — to the WP5 tasks on skill learning, task-driven attention for coordination and communication, attentive joint exploration, cognitive execution monitoring and adaptation, and to the WP4 components on cognitive task load and multimodal HRI.)
1 Tasks, objectives, results

1.1 Planned work
WP5 contributes to the NIFTi project via the realization of a flexible time planner with time interval compatibilities, resources and components management related to different robot processes. It also contributes to the execution and monitoring of the planned actions, and to all those activities that make planning adaptable to the user needs, to the task requirements, to the joint provisos of team-work and to the processes and resources to be allocated for a designated mission. These activities include skill learning, accommodation to end-user planning procedures, and the modeling of end-users’ behaviors and, most significantly, of end-users’ instructed attentive behaviors while performing risky operations.
T5.2 addresses the problem of how a robot can control task execution
in such a dynamic setting. The objective of the task is to develop methods
for acquiring the skills necessary for such control. Task T5.2 is in progress
and most of the activities have been completed. In particular, the present document is due as deliverable DR 5.1.2 on Methods and paradigms for skill learning based on affordances and action-reaction observation (as in the Description of Work) and is being delivered at the end of the first year (12th month, December 2010).
1.2 Actual work performed
In the following, a brief summary of the activities related to the WP5 tasks started so far is presented, together with their status of progress.
1.2.1 Task T5.1 Planning activities specification with end user (completed)
1. A learning scenario has been specified with the Italian Fire Fighters (VVFF) and has been set up in Montelibretti during the NIFTi meeting in September 2010.
2. Standard procedures in the “smoke hall”, simulating the experience
of exploring an unknown path in a real fire event, have been acquired
from the VVFF and synthesized via a graphical representation (see
DR5.1).
3. Primitive processes for planning and execution monitoring have been
defined with the end users (VVFF) according to the above procedures.
These have been compiled into basic knowledge structures (action preconditions and action effects) for action-reaction.
4. Preliminary compatibilities have been specified according to the already active components of the NIFTi robot (mapping-vision-navigation).
A comprehensive report on the above activities is available as the Deliverable
DR 5.1.1.
1.2.2 Task T5.2: Learning Skills for functioning processes and task execution (in progress)
1. Methods for acquiring human skills have been provided via the Gaze Machine, which has been upgraded to operate in different outdoor scenarios and also under abrupt light changes, such as between strong light and twilight [55].
2. Methodologies to identify human motion in complex scenarios, in the presence of different motions, have been developed; these are based on motion segmentation via attention and have been published in the context of action recognition and classification (see [18] and [39]).
3. Data have been gathered by an experienced VVFF instructor wearing
the Gaze Machine inside the disaster area.
Task 5.1 on Planning activities specification with end user, which was accomplished by the end of the tenth month (October 2010), made available the specifications of the context scenario and of the skill primitives. On that basis,
ETHZ contributed to the objectives of Task T5.2 by specifying an ontology
of actions that the NIFTi platform should be able to execute. Corresponding low-level execution has been implemented using the open source ROS
Navigation Stack [1], according to the NIFTi architecture design concepts.
In the first year of the NIFTi research, Task 5.2 focused on providing methods and paradigms for skill learning. The main effort has been dedicated to building and improving the Gaze Machine device in order to make it suitable for on-field data collection. Indeed, the Gaze Machine is crucial in all the complementary learning activities involving attention, that is, all the tasks in WP5 (T5.2, T5.3, T5.4, T5.5, T5.6), and its development will now be described more thoroughly.
Figure 2: Gaze Machine camera configuration. Scene cameras are calibrated for stereo depth estimation. Image from [40], reproduced with permission.
The Gaze Machine [6, 40, 55] is a system for the acquisition of training
data from demonstrations based on a head-mounted, three-dimensional gaze
tracker. It constitutes the core of our approach, being the instrument we
use to collect data and build our models.
The Gaze Machine relies on an innovative model for 3D gaze estimation
using multiple cameras. Firmly grounded on the geometry of the multiple
views, it introduces a calibration procedure that is efficient, accurate, highly
innovative but also practical and easy. Thus, it can run online with little
intervention from the user. The overall gaze estimation model is general, as
no particular complex model of the human eye is assumed in this work. The
resultant system has been effectively used to collect gaze data from subjects
freely moving in a dynamic environment and under varying light conditions.
The implementation of the Gaze Machine platform has required important algebraic and geometric modeling that has produced a number of publications (see [60], [10], [11], [14]). Visual localization and mapping of the Gaze Machine, hence of the subject wearing it, is being attained in order to clearly capture how humans use vantage points in task execution and in the exploration of any kind of environment; an initial framework has been reported in [32].
Figure 3: The fire fighter trainer is wearing the Gaze Machine (IMU on the back, scene cameras, eye cameras, microphone).
Since the demonstrator moves and acts in the 3D world, the POR is
estimated in 3D and, if needed, the structure of the object of fixation can
be recovered from the Gaze Machine stereo rig.
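To illustrate the geometry involved, the sketch below computes a 3D POR as the point of closest approach of the two eyes' optical axes, each given as an origin and a direction in a common reference frame. This is a deliberately simplified illustration; the eye centres, directions and reference frame are hypothetical inputs, and the actual Gaze Machine estimation model is the one described in [53].

```python
import numpy as np

def por_from_optical_axes(o_l, d_l, o_r, d_r):
    """Midpoint of the shortest segment joining two 3D lines.

    o_l, o_r: origins of the left/right optical axes (eye centres).
    d_l, d_r: direction vectors of the two axes.
    Returns an estimate of the 3D Point of Regard as the midpoint of the
    closest points on the two (generally skew) lines.
    """
    d_l = d_l / np.linalg.norm(d_l)
    d_r = d_r / np.linalg.norm(d_r)
    w0 = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w0, d_r @ w0
    denom = a * c - b * b              # ~0 when the axes are parallel
    if abs(denom) < 1e-9:
        raise ValueError("optical axes are (nearly) parallel")
    s = (b * e - c * d) / denom        # parameter along the left axis
    t = (a * e - b * d) / denom        # parameter along the right axis
    return 0.5 * ((o_l + s * d_l) + (o_r + t * d_r))

# Toy example: eyes 6.5 cm apart, both axes converging near (0, 0, 1) m.
por = por_from_optical_axes(
    np.array([-0.0325, 0.0, 0.0]), np.array([0.0325, 0.0, 1.0]),
    np.array([ 0.0325, 0.0, 0.0]), np.array([-0.0325, 0.0, 1.0]))
print(por)  # ~ [0, 0, 1]
```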
Inertial data are useful to understand head pose and movements. Vision actions are identified by changes in head pose; head accelerations are also used to discard the corresponding frames, which are very likely to be highly blurred.
Running commentaries contain a spoken description of the activity the
operator is currently involved in. In the skill learning task, this information
is used to label activities making use of referring expressions.
The device is wearable, allowing data to be collected while performing natural tasks in unknown, unstructured environments rather than in experimental lab settings.

In order to collect data that is meaningful for skill learning, the scenario for the experiments must be suitably designed.
As previously stated, a first experimental data collection has been carried
out during the NIFTi meeting at SFO, in Montelibretti.
Figure 4: An example of the Gaze Machine on-field calibration. The fire fighter instructor is calibrating the device before starting a session of data acquisition in Montelibretti, in September 2010. Calibration is carried out by fixating a point (in this case the center of the marker) while moving the head.
The Scuola di Formazione Operativa (Operational Training School, SFO) provided an ideal
test ground for the consortium to address the real evaluation scenario. As
described in the DOW document, which defines the NIFTi USAR scenario, rescuers need to make an assessment of the real situation at a disaster site. During the September 2010 meeting at SFO, the VVFF set up the tunnel car-accident scenario, as described in DR 5.1.1. The NIFTi systems were deployed in order to collect data. In this context, 10 Gaze Machine acquisitions were scheduled. The experiments involved 5 people: one expert fire fighter instructor and four non-expert subjects, namely researchers and students involved in the NIFTi project.
The intervention procedure was summarised by the expert firefighter
during a pre-briefing. The content of the briefing discussion is reported in
the following. According to the firefighter instructor’s description, a scenario
depicting a car accident in a tunnel represents an extremely critical situation.
The NIFTi tunnel simulation scenario does not involve the presence of fire, which makes things easier. Rescue is divided into a first, quick preliminary survey, aimed at reporting the status of the victims and the most evident dangers related to the structure, and the actual extraction. The description provided gave rise to two possible intervention scenarios:
1. only one leading fireman, equipped with a self-contained breathing apparatus, accesses the disaster area; afterwards, he reports to the team waiting outside the tunnel, and the rescue is organized on the basis of that brief description;
Figure 5: Briefing before the experiments. On the left, a detail of the visual acquisition equipment composing the Gaze Machine. On the right, fire fighter instructor Salvatore Candela is summarizing the procedure for the tunnel car-accident scenario.
2. the rescue team clears and decontaminates the disaster area to allow the extraction of the victims.
The task assigned to the subjects was thus to access the tunnel area as if they had to make a rapid assessment of the victims, their number and positions, and of the dangers resulting from the possible presence of toxic or flammable substances and the risk of structural collapse.
Processed data are available on the NIFTi repository at the address http://dav.nifti.eu/share/media/ROMA/SFO ROMA data. For every experiment, a video sequence is available for each of the cameras building up the system. Videos are MPEG-4 files, encoded with the XviD codec. The four video streams are synchronized. The inertial measures are acquired at the same rate and consist of the roll-pitch-yaw angles of the head. Running commentaries are encoded as MP3 files with the same duration as the video sequences, and can thus be played synchronously. The subtitle file (.srt) contains the transcription of the running commentary in the SubRip file format (http://en.wikipedia.org/wiki/Subrip), which provides formatted text with the start/stop time of every comment. The following running commentary has been collected during the first run of fire fighter instructor Candela into the Montelibretti tunnel scenario and can be found in the NIFTi data repository, in the GM data/salvatore-1 path.
COMMENTARY BEGINS
00:00:42,800 – 00:00:43,117 Ok.
11 00:00:43,555 – 00:00:45,288 I’m approaching the tunnel.
12 00:00:45,289 – 00:00:47,894 I see... I’m collecting an attentive, quick
survey.
13 00:00:47,894 – 00:00:50,381 The area is apparently safe.
Figure 6: Operator observing the state of a victim and the fixation measured
by the GM.
14 00:00:50,782 – 00:00:53,100 Also air is quite breathable.
15 00:00:53,150 – 00:00:58,830 I don’t smell flammable liquids or other
dangerous substances.
16 00:00:59,000 – 00:01:00,701 I found the first car.
17 00:01:01,000 – 00:01:05,467 It is flipped by 90 degs w.r.t. the surface.
18 00:01:05,500 – 00:01:08,278 I’m trying to look inside.
19 00:01:08,278 – 00:01:09,678 Glasses are still intact.
20 00:01:09,900 – 00:01:12,262 They are dark so I cannot see well.
21 00:01:12,290 – 00:01:15,829 The person inside looks like unconscious.
22 00:01:15,961 – 00:01:18,361 One person is certainly inside.
23 00:01:19,000 – 00:01:20,521 I’m doing one more survey.
24 00:01:21,364 – 00:01:24,364 Now, in front of me, there’s a barrel.
25 00:01:24,400 – 00:01:27,035 Apparently it is an empty barrel.
26 00:01:27,123 – 00:01:32,123 No signs of toxic or dangerous substances.
27 00:01:32,785 – 00:01:34,785 It must be a building yard barrel.
28 00:01:34,900 – 00:01:37,648 Indeed I see a truck ahead.
29 00:01:37,840 – 00:01:39,900 On the right, ahead.
30 00:01:40,082 – 00:01:42,082 Certainly there are people inside.
31 00:01:41,993 – 00:01:44,993 A car... a civil car... with some people
inside.
32 00:01:45,000 – 00:01:45,780 A family.
33 00:01:46,012 – 00:01:47,012 People...
34 00:01:47,710 – 00:01:48,710 A woman drives.
35 00:01:48,914 – 00:01:51,000 A person in the front seat.
36 00:01:51,069 – 00:01:52,000 A child.
37 00:01:52,349 – 00:01:53,800 Another child in the rear seat.
38 00:02:00,000 – 00:02:01,702 Another child... a baby.
39 00:02:02,000 – 00:02:03,000 We could easily take them out
40 00:02:03,100 – 00:02:06,900 because the car doesn’t cause any difficulties,
41 00:02:07,000 – 00:02:08,426 it is not seriously damaged.
42 00:02:08,946 – 00:02:09,946 Another car.
43 00:02:11,019 – 00:02:13,019 Also in this: an unconscious person.
44 00:02:13,100 – 00:02:14,043 A child.
45 00:02:14,100 – 00:02:15,800 A child in the rear seat.
46 00:02:15,900 – 00:02:16,849 A wife by the side.
47 00:02:18,654 – 00:02:20,654 Another barrel on the ground.
48 00:02:20,655 – 00:02:22,000 On the ground objects from a building
yard.
49 00:02:22,203 – 00:02:24,183 Another barrel.
50 00:02:24,206 – 00:02:26,748 (counting) one... two... three... four
barrels.
51 00:02:26,748 – 00:02:28,900 We don’t know the contained substances.
52 00:02:29,152 – 00:02:30,152 We will address this.
53 00:02:30,300 – 00:02:32,010 Ok, two more people.
54 00:02:32,041 – 00:02:34,041 They’re all unconscious.
55 00:02:34,644 – 00:02:36,644 Air is still quite breathable for me.
56 00:02:37,901 – 00:02:41,901 I’m not wearing protection devices but I
can breath normally here.
57 00:02:42,889 – 00:02:44,889 Ok. I’m quitting.
COMMENTARY ENDS
Another example of collected running commentary is available in the
GM data/salvatore-2 path:
COMMENTARY BEGINS
1 00:00:15,000 – 00:00:17,000 Ready?
2 00:00:18,388 – 00:00:19,388 You can go.
3 00:00:41,177 – 00:00:53,177 (Instructions for calibration)
4 00:00:56,976 – 00:00:57,976 Ready.
5 00:00:59,340 – 00:01:00,340 Above me everything looks alright.
6 00:01:00,916 – 00:01:02,916 I’m entering.
7 00:01:02,670 – 00:01:03,670 Let’s see...
8 00:01:04,228 – 00:01:05,428 A car is flipped
9 00:01:05,483 – 00:01:06,483 People inside.
10 00:01:06,490 – 00:01:07,500 I can’t see very well
11 00:01:07,764 – 00:01:09,764 but for sure there’s a person inside
12 00:01:10,720 – 00:01:13,500 Building material on the road
13 00:01:13,722 – 00:01:14,722 I’m experiencing some difficulties in moving
14 00:01:15,047 – 00:01:17,047 A barrel with toxic substances
15 00:01:17,607 – 00:01:18,607 It’s oxygen, inflammable
16 00:01:19,077 – 00:01:21,077 Danger, caution
17 00:01:21,613 – 00:01:24,623 Ok, (counting): one, two, three unconscious people... four unconscious people
18 00:01:25,099 – 00:01:28,099 Another car... (counting) one, two, three
cars
19 00:01:28,530 – 00:01:29,530 A van.
20 00:01:29,913 – 00:01:32,450 Poison... toxic substance...
21 00:01:32,923 – 00:01:33,923 Radioactive ?
22 00:01:34,155 – 00:01:35,955 Thus, the van was carrying materials...
23 00:01:36,387 – 00:01:40,387 Ok, unconscious driver
24 00:01:40,620 – 00:01:43,620 Probably intoxicated, I don’t know, we’ll
check...
25 00:01:43,400 – 00:01:44,400 Ok.
26 00:01:44,824 – 00:01:45,824 Caution, another (barrel)
27 00:01:46,401 – 00:01:49,401 one, two, three... three, four barrels.
28 00:01:49,800 – 00:01:54,580 One, two, three... two, four cars... one
van.
29 00:01:55,052 – 00:01:57,052 12-13 people at least.
30 00:01:57,861 – 00:02:00,861 Ok, I go on... no fire ignition.
COMMENTARY ENDS
Finally, the gpr output files contain the fixations re-projected to the right
scene camera for every time step, in the form of mean and variance for the
x and y image coordinates, in this order.
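As an illustration of how these recordings could be combined offline, the sketch below pairs per-frame fixation records with the commentary segments from the SubRip file, so that each fixation can be labelled with the referring expression spoken at that time. The column layout assumed for the gpr files (mean x, mean y, variance x, variance y per line), the frame rate and the standard "-->" SubRip separator are assumptions made for the example, not a specification of the repository files.

```python
import re
from datetime import timedelta

SRT_TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

def parse_srt_time(s):
    h, m, sec, ms = map(int, SRT_TIME.match(s.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=sec, milliseconds=ms)

def parse_srt(path):
    """Return (start, end, text) segments from a SubRip running commentary."""
    segments = []
    for block in open(path, encoding="utf-8").read().strip().split("\n\n"):
        lines = block.strip().splitlines()
        time_lines = [l for l in lines if "-->" in l]
        if not time_lines:
            continue
        start_s, end_s = time_lines[0].split("-->")
        text = " ".join(lines[lines.index(time_lines[0]) + 1:])
        segments.append((parse_srt_time(start_s), parse_srt_time(end_s), text))
    return segments

def label_fixations(gpr_path, srt_path, fps=30.0):
    """Attach the concurrent commentary text to each fixation record.

    Assumed gpr layout (one line per frame): mean_x mean_y var_x var_y.
    """
    segments = parse_srt(srt_path)
    labelled = []
    for frame, line in enumerate(open(gpr_path)):
        mean_x, mean_y, var_x, var_y = map(float, line.split()[:4])
        t = timedelta(seconds=frame / fps)
        text = next((txt for s, e, txt in segments if s <= t <= e), None)
        labelled.append((t, (mean_x, mean_y), (var_x, var_y), text))
    return labelled
```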
Besides being used for skill learning, the gathered data have provided an important input to the work in WP1 on functional mapping (T1.4) and to the work in WP3 on referencing in situated dialogue processing (T1.1/T1.3).
Figure 7: Gaze Machine calibration. The calibration pattern is detected in
the 3D world, allowing the model parameters to be recovered and the 3D
Point of Regard to be estimated from the optical axes of both the eyes.
1.2.3 T5.3: Task-driven attention for coordination and communication (in progress)
1. We have set up a number of experiments in the tunnel use case for a
multi-car accident, with the Gaze Machine worn by a VVFF instructor
(described in Section 1.2.2).
2. From the sequence of fixations and the running commentaries a scan-path has been produced; the localization of the scan-path is being optimized, as the training area induced a non-trivial drift error. Early results are described in [32]. We have elaborated these experiments and labeled the sequence of actions with referring expressions, according to the recorded running commentaries.
1.2.4 T5.5: Situated exploration history (in progress)
1. Implementation of planning and interface of the planner with ROS.
2. Plan recognition and temporal specifications [33].
3. Integration of the planner with ROS, namely with both the Navigation
component and the Mapping components as provided in ROS.
4. Execution monitoring for preliminary exploration has been defined
in Eclipse Prolog and interfaced with ROS, namely with both the
Navigation component and the Mapping components as provided in ROS (a minimal sketch of this ROS-side interface is given below).

Figure 8: Left: the GM worn by an experienced VVFF instructor in Montelibretti. Right: robot exploration of the tunnel disaster area.
How a theory of actions emerges from skill learning is described in [54], whereas a DL extension to reasoning is described in [34] and, more generally, in [67] (see Annexes). The research related to flexible planning is reported in [19], also contained in the Annexes.
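The executive itself is written in Eclipse Prolog, so the following is only a hypothetical Python/rospy sketch of the ROS side of the interface, assuming the standard move_base action interface of the ROS Navigation Stack [1]: a monitor that watches the navigation goal status and reports aborted goals so that the executive can re-plan or ask for the operator's intervention.

```python
import rospy
from actionlib_msgs.msg import GoalStatus, GoalStatusArray

def status_cb(msg):
    # Report any navigation goal aborted by the Navigation component, so the
    # executive can trigger re-planning or request the operator's help.
    for status in msg.status_list:
        if status.status == GoalStatus.ABORTED:
            rospy.logwarn("navigation goal %s aborted: %s",
                          status.goal_id.id, status.text)

if __name__ == "__main__":
    rospy.init_node("execution_monitor_sketch")
    rospy.Subscriber("/move_base/status", GoalStatusArray, status_cb)
    rospy.spin()
```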
1.3 Relation to user-centric design
As previously remarked, the construction of a knowledge base for planning will include compatibilities, specifying time relations, preconditions and effects of processes, actions and behaviours. All these aspects need to be adapted to the particular rescue situation and to the specific needs of each component. In order to be effectively flexible in time and adaptive in behaviours, ideally most of the specifications should be learned. We aim at learning them using data gathered by observing complex actions being demonstrated by a tutor.
The Gaze Machine offers an extraordinary vantage point, as it enables us to observe what the tutor is effectively doing and saying, along with how he adapts his behaviours by instantiating, with common sense, the intervention procedures that regulate his conduct in similar circumstances.
The motivations behind the study of human gaze reside in the fact that
attention plays a fundamental role in NIFTi. It influences all the aspects
concerning skill learning and cooperative human-robot interaction on which
planning and execution monitoring are based. Visual attention is closely
related to gazing: eye tracking experiments demonstrate that simple cues
(motion, contrast, color, luminance, orientation, flicker) can predict saccadic
behavior and, thus, the focus of attention.
Our effort has focused on collecting data during the on-field test scenarios. At SFO a 3D dynamic gaze tracker, the Gaze Machine, has been used
to collect the attention foci relative to a firefighter instructor in the tunnel
scenario.
Our gaze estimation device has been improved in different directions in order to be suitable for collecting data in the use-case scenario. The gaze tracking prototype in use allows the POR to be estimated in 3D space. The position of the fixated object in the world can thus be recovered with respect to the subject, allowing the determination of the actual Field of View along with the foveated area, and thus the analysis of peripheral vision.
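To give a rough sense of the quantities involved, assuming a foveal field of about 2 degrees of visual angle (a textbook approximation, not a figure taken from this deliverable), the radius of the foveated patch on a surface fixated at distance d is roughly d·tan(1°):

```python
import math

def foveated_radius(distance_m, foveal_diameter_deg=2.0):
    """Approximate radius (m) of the foveated patch on a surface at the given
    fixation distance, assuming a small circular foveal field."""
    return distance_m * math.tan(math.radians(foveal_diameter_deg / 2.0))

# A fixation on a car door 3 m away covers a patch of roughly 5 cm radius.
print(round(foveated_radius(3.0), 3))  # ~0.052
```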
For every performed demonstration, the main objective is to segment the sequence into basic actions, such as motion, vision and manipulation actions. Data for learning comprise
• scene recording
• 3D Point of Regard (POR) estimation
• inertial measurements
• running commentaries
For the sake of the experimental data collection, the acquisition device has been improved in both hardware and software. New cameras allow for higher image quality and acquisition rate. A new tablet-convertible toughbook is used to collect data from the experiment; it is carried as a backpack and allows direct control of the experiment through a touch GUI. Thus, both the calibration and acquisition phases can be easily operated on field (as in Figure 4). Sensors and storage facilities are worn by the subject and the experimental setup is completely self-contained. Improvements have also been achieved in the accuracy of the POR estimation. A novel calibration phase has been proposed and a paper describing the new architecture has been submitted to a computer vision conference [53].
Inertial data is collected by an Inertial Measurement Unit (IMU) placed
on the firefighter helmet while a microphone is used to record the spoken
running commentaries (see Figure 3).
1.4 Relation to the state-of-the-art
Cognitive execution is meant to comply with real adaptation to changing objectives, following operators’ instructions (e.g. dialogue, commands) that require switching between tasks and flexibly revising plans and processes. The main novelty introduced by the WP5-related work is the ability to learn several skills using attention and generated gaze scan-paths that show the affordances of processes, including the communication steps, at several levels of detail (from activities to HRI). Learning skills provides classes of parameters and features for choosing strategies of actions according to time and compatibility constraints. So far, no attempt has been made to create new
primitives and parameters online (online adaptation) based on the robot’s experience and interaction.
Plan recognition is about inferring a plan from the observation of actions [64, 30, 29, 4, 22]. The analogous concepts of acting based on observations have been specified in the computer vision community as action recognition, imitation learning or affordance learning, as mainly motivated by the neurophysiological studies of Rizzolatti and colleagues [50, 21] and by Gibson [25, 24]. Reviews on action recognition are given in [44, 56, 2] and on learning by imitation in [3, 63].
The two approaches have, however, evolved in completely different directions. Plan recognition assumed actions to be already given and represented, thus being concerned only with the technical problems of generating a plan, taking into account specific preferences and user choices, and possibly interpreting plan recognition in terms of a theory of explanations [12]. On the other hand, action recognition and imitation learning have been increasingly concerned with the robot’s ability to capture the real and effective sequence and to adapt it to changing contexts. As noted by Krüger and colleagues in [35], the terms action and intent recognition, in plan recognition, often obscure the real task achieved by these approaches. In fact, insofar as plan recognition assumes an already defined set of actions, the observation process is purely indexical.
On the other hand, the difficulty with the learning-by-imitation and action recognition approaches is that they lack important concepts such as execution monitoring, intention recognition and plan generation.
The problem of learning a basic theory of actions from observations has
been addressed in [53], where it is shown how it is possible to automatically
derive a model of the Situation Calculus from early vision, thus providing
an example of bridging from perception to logical modeling.
The introduction of the Gaze Machine, as a device to implement the skill learning paradigm, constitutes a novelty, providing an extremely rich source of information an agent can use to learn a well-temporized sequence of actions and, thus, to generate a suitable plan [32].
Besides, the device itself is novel and the design of the video-oculography
subsystem for the tracking of gaze and the localization in space of human
scan-paths are producing contributions to the research in computer vision
and visual attention [53, 61, 39].
Within the class of non-invasive eye trackers, head-mounted ones offer the advantage of being more accurate than remotely located ones ([46]), and enable gaze estimation of a person moving in unspecified environments. Eye gaze estimation systems are usually a combination of a pupil centre detection algorithm and a homography or polynomial mapping between pupil positions and points on a screen observed by the user ([27]). Recently this method was applied to a head-mounted tracker ([37]), projecting the gaze direction onto the image of a camera facing the scene. Other methods
include pupil and corneal reflection tracking ([70]), dual Purkinje image
tracking ([13]) or scleral coil searching ([8]).
Three-dimensional gaze direction estimation, instead, requires determining the eye position with respect to the camera framing it. This process usually needs a preliminary calibration step where the eye geometry is computed using a simplified eye model ([66]). Most of the systems are made up of one or more cameras pointing at a subject looking at a screen onto which fixation points are projected ([49]). Therefore, other calibration tasks are often necessary in order to recover the screen and LED positions in the camera reference frame.
The Gaze Machine platform, on the other hand, implements the geometry of the eyes’ motion manifold in order to compute the gaze fixations in a dynamic 3D space [53], with abrupt light changes, as was the case in the tunnel scenario at SFO.
The problem of skill learning is related to the problem of managing
task switching at the appropriate time and context, and thus it involves
the control of many sources of information, incoming from the environment,
as well as the arbitration of resource allocation for perceptual-motor and selection
processes. The new experimental setting we have provided encourages a
better understanding of the cognitive functioning of executive processes.
The ability to establish the proper mappings between inputs, internal states,
and outputs needed to perform a given task [43] is called cognitive control
or executive function in neuroscience studies and it is often analysed with
the aid of the concept of inhibition (see e.g. [43, 5]), explaining how a
subject in the presence of several stimuli responds selectively and is able to
resist inappropriate urges (see [69]). Cognitive control, as a general function,
explains flexibly switching between tasks, when reconfiguration of memory
and perception is required, by disengaging from previous goals or task sets
(see [42, 52]).
The role of task switching in robot cognitive control is highlighted in
many biologically inspired architectures, such as the ISAC architecture [31],
the ALEC architecture, based on state changes induced by homeostatic variables
[20], Hammer [15] and the GWT (Global Workspace Theory) [65].
Studies on cognitive control, and mainly on human adaptive behaviours,
investigated within the task-switching paradigm, have strongly influenced
cognitive robotics architectures since the eighties, as for example the Norman and Shallice [48] ATA schema, the FLE model of Duncan [17] and the
principles of goal-directed behaviours in Newell [47] (for a review on these
architectures in the framework of the task switching paradigm see [59]).
Also the approaches to model-based executive robot control, such as Williams [7] and earlier [28, 71], devise runtime systems managing backward inhibition via real-time selection, execution and guidance of actions over behaviours. This model-based view postulates the existence of a declarative
(symbolic) model of the executive which can be used by the cognitive control to switch between processes within a reactive control loop. The flexible
temporal planning approach (see also the Constraint-based Interval Planning framework [28]), proposed by the planning community, has shown a strong practical impact in real-world applications based on the integration of deliberation and execution (see e.g. RAX [28], IxTeT [23], INOVA [68], and RMPL [71]). Our method extends these approaches, as compatibilities are directly learned on the basis of the Gaze Machine online acquisition of behaviours, and our formalization tightly integrates control and reasoning.
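To make the notion of a compatibility concrete, the toy sketch below (an illustration in the spirit of flexible temporal planning, not the TFSC formalization of [19]) checks whether two processes with flexible start times and durations can satisfy a learned "ends before, with a bounded gap" compatibility; the process names and bounds are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class FlexibleInterval:
    """A process whose start time and duration are only bounded, not fixed."""
    earliest_start: float
    latest_start: float
    min_duration: float
    max_duration: float

    @property
    def earliest_end(self):
        return self.earliest_start + self.min_duration

    @property
    def latest_end(self):
        return self.latest_start + self.max_duration

def before_compatible(a, b, gap_min=0.0, gap_max=float("inf")):
    """True if some choice of times lets `a` finish between gap_min and
    gap_max seconds before `b` starts (assuming the bounds of the two
    processes can be chosen independently, a deliberate simplification)."""
    smallest_gap = b.earliest_start - a.latest_end    # must not exceed gap_max
    largest_gap = b.latest_start - a.earliest_end     # must reach gap_min
    return largest_gap >= gap_min and smallest_gap <= gap_max

# Example: "survey the car" must end 2-10 s before "report to the operator".
survey = FlexibleInterval(earliest_start=0, latest_start=5,
                          min_duration=20, max_duration=40)
report = FlexibleInterval(earliest_start=30, latest_start=60,
                          min_duration=5, max_duration=15)
print(before_compatible(survey, report, gap_min=2, gap_max=10))  # True
```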
2 Annexes

2.1 Rudi et al. “Linear Solvability in the Viewing Graph” (ACCV 2010)
Bibliography A. Rudi, M. Pizzoli and F. Pirri. “Linear Solvability in the Viewing Graph”. In: Proceedings of the 10th Asian Conference on Computer Vision (ACCV 2010). Queenstown, New Zealand. November 2010. Lecture Notes in Computer Science 6494, Part III.
Abstract The Viewing Graph [36] represents several views linked by the
corresponding fundamental matrices, estimated pairwise. Given a Viewing
Graph, the tuples of consistent camera matrices form a family that we call
the Solution Set. This paper provides a theoretical framework that formalizes different properties of the topology, linear solvability and number
of solutions of multi-camera systems. We systematically characterize the
topology of the Viewing Graph in terms of its solution set by means of the
associated algebraic bilinear system. Based on this characterization, we provide conditions about the linearity and the number of solutions and define
an inductively constructible set of topologies which admit a unique linear
solution. Camera matrices can thus be retrieved efficiently and large viewing
graphs can be handled in a recursive fashion. The results apply to problems
such as the projective reconstruction from multiple views or the calibration
of camera networks.
Relation to work performed In this paper we extend the notion of
solvability for a Viewing Graph introducing a new taxonomy, taking into
account both linear solvability and the number of solutions. An inductively
constructible set of topologies admitting a unique linear solution allows for a
building blocks design that can be used to inductively construct more complex topologies, very useful to combine global and incremental methods for
camera matrix estimation. Such a formalisation contributes to the hierarchical and recursive approaches for solving n-view camera pose estimations
and its use has been investigated in the context of reconstructing the motion
of a moving camera pair that is part of the 3D gaze estimation device in
unknown and unstructured environments.
2.2 Fanello et al. “Arm-Hand behaviours modelling: from attention to imitation” (ISVC 2010)
Bibliography S. R. F. Fanello, I. Gori and F. Pirri. “Arm-Hand behaviours modelling: from attention to imitation”. In: Proceedings of the 7th International Symposium on Visual Computing (ISVC 2010). Las Vegas, Nevada, USA. September 2010. Lecture Notes in Computer Science 6454.
Abstract We present a new and original method for modelling arm-hand
actions, learning and recognition. We use an incremental approach to separate the arm-hand action recognition problem into three levels. The lower
level exploits bottom-up attention to select the region of interest, and attention is specifically tuned towards human motion. The middle level serves
to classify action primitives exploiting motion features as descriptors. Each
of the primitives is modelled by a Mixture of Gaussians, and it is recognised by a complete, real-time and robust recognition system. The higher level system combines sequences of primitives using deterministic finite automata. The contribution of the paper is a composition-based model for arm-hand behaviours allowing a robot to learn new actions from a one-shot demonstration of the action execution.
Relation to work performed The paper describes an approach to attentively recognize human gestures in the context of learning by imitation. The notion of saliency with respect to motion is at the basis of the non-verbal interaction in NIFTi, as it provides segmentation of the human motion and, in the USAR scenario, allows the robot to focus on visually issued requests by the rescuers or victims. A novel model of visual attention taking motion into account is introduced and its effectiveness in segmenting human gestures is demonstrated. Based on early motion features such as orientation and velocity, complex gestures are segmented into action primitives, the basic components that constitute gesture sequences. The proposed gesture analysis paradigm is designed for skill learning via demonstration and attentive exploration.
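A minimal sketch of the compositional idea, using scikit-learn's GaussianMixture and a hand-written finite automaton as stand-ins for the models actually trained in the paper; the feature vectors, primitive labels and automaton are invented for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# One Gaussian-mixture model per action primitive, trained on motion
# descriptors (here: random placeholders standing in for real features).
rng = np.random.default_rng(0)
primitives = {
    "reach": GaussianMixture(n_components=2, random_state=0).fit(
        rng.normal(0.0, 1.0, size=(200, 4))),
    "grasp": GaussianMixture(n_components=2, random_state=0).fit(
        rng.normal(3.0, 1.0, size=(200, 4))),
}

def classify(descriptor):
    """Pick the primitive whose mixture gives the highest log-likelihood."""
    scores = {name: gmm.score(descriptor.reshape(1, -1))
              for name, gmm in primitives.items()}
    return max(scores, key=scores.get)

# A deterministic finite automaton accepting the composite gesture
# "reach then grasp" (states and transitions are illustrative only).
DFA = {("start", "reach"): "reached", ("reached", "grasp"): "done"}

def recognise(sequence, accepting=("done",)):
    state = "start"
    for descriptor in sequence:
        state = DFA.get((state, classify(descriptor)))
        if state is None:
            return False
    return state in accepting

print(recognise([rng.normal(0.0, 1.0, 4), rng.normal(3.0, 1.0, 4)]))  # True
```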
2.3 H. Khambhaita et al. “Help Me to Help You: how to Learn Intentions, Actions and Plans” (AAAI-SSS 2011)
Bibliography H. Khambhaita, G-J. Kruijff, M. Mancas, M. Gianni, P. Papadakis, F. Pirri and M. Pizzoli. “Help Me to Help You: how to Learn Intentions, Actions and Plans”. In: Proceedings of the AAAI 2011 Spring Symposium on Help Me Help You: Bridging the Gaps in Human-Agent Collaboration (AAAI-SSS 2011). Stanford, California, USA. March 2011.
Abstract The collaboration between a human and a robot is here understood as a learning process mediated by the instructor’s prompt behaviours
and the apprentice collecting information from them to learn a plan. The
instructor wears the Gaze Machine, a wearable device gathering and conveying visual and audio input from the instructor while executing a task.
The robot, on the other hand, is eager to learn both the best sequence of actions, their timing and how they interlace. The cross relation among actions
is specified both in terms of time intervals for their execution, and in terms
of location in space to cope with the instructor’s interaction with people and
objects in the scene. We outline this process: how to transform the rich
information delivered by the Gaze Machine into a plan. Specifically, how
to obtain a map of the instructor positions and his gaze position, via visual
slam and gaze fixations; further, how to obtain an action map from the running commentaries and the topological maps and, finally, how to obtain a
temporal net of the relevant actions that have been extracted. The learned
structure is then managed by the flexible time paradigm of flexible planning
in the Situation Calculus for execution monitoring and plan generation.
Relation to work performed The paper outlines a model of human
robot collaboration in which the final goal is to learn the best actions needed
to achieve the required goals, in this case, reporting hazards due to a crash
accident in a tunnel, identifying the status of victims and, possibly, rescuing
them. The collaboration is here viewed as a learning process involving the
extraction of information from the instructor behaviours, thus providing
data for skill and affordance learning. The instructor communicates his actions both visually (using the GM) and with the aid of the comments delivered while executing the actions.
2.4 Carbone and Pirri. “Learning Saliency. An ICA based model using Bernoulli mixtures.” (BICS 2010)
Bibliography A. Carbone and F. Pirri. “Learning Saliency. An ICA based model using Bernoulli mixtures.” In: Proceedings of BICS, Brain Inspired Cognitive Systems (BICS 2010). Madrid, Spain. July 2010.
Abstract In this work we present a model of both the visual input selection and the gaze orienting behaviour of a human observer undertaking
a visual exploration task in a specified scenario. Our method builds on a
real set of gaze tracked points of fixation, acquired from a custom designed
wearable device [41]. By comparing these sets of fovea-centred patches with
a randomly chosen set of image patches, extracted from the whole visual
context, we aim at characterising the statistical properties and regularities
of the selected visual input. While the structure of the visual context is
specified as a linear combination of basis functions, which are independent
hence uncorrelated, we show how low level features affecting a scan-path of
fixations can be obtained by hidden correlations to the context. Samples
from human observers are collected both in free-viewing and surveillance-like tasks, in the specified visual scene. These scan-paths show important and interesting dependencies on the context. We show that a scan-path,
given a database of a visual context, can be suitably induced by a system of
filters that can be learned by a two-stage model: independent component analysis (ICA) to gather low-level features and a mixture of Bernoulli
distributions identifying the hidden dependencies. Finally these two stages
are used to build the cascade of filters.
Relation to work performed In this paper a model of saliency for images is introduced. A mixture of multivariate Bernoulli distributions models
fixations of human scanpaths, acquired by the GM, according to a set of linear filters generated by ICA decomposition of the visual environment. The
resultant image saliency map elicits those regions which are likely to have
a statistical similarity to the ones occurring in the reference scan-path. The
trained system is thus able to predict the gaze behaviour, according to the
training set of the acquired fixations, implementing a selection paradigm on
early sensory acquisition and allowing control over functioning processes for
vision-related tasks.
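A highly simplified sketch of the two-stage idea, using scikit-learn's FastICA for the filters and a single hand-rolled multivariate Bernoulli component for the scoring; the random patches, patch size, binarization threshold and single-component "mixture" are placeholders and not the model of the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
patch_dim = 8 * 8

# Stage 1: learn ICA filters from (placeholder) image patches of the context.
context_patches = rng.normal(size=(1000, patch_dim))      # random stand-ins
ica = FastICA(n_components=16, random_state=1, max_iter=500)
ica.fit(context_patches)

def binary_code(patch, threshold=1.0):
    """Binarize the ICA filter responses of a patch."""
    responses = ica.transform(patch.reshape(1, -1))[0]
    return (np.abs(responses) > threshold).astype(float)

# Stage 2: a single multivariate Bernoulli component estimated from the codes
# of fixated (fovea-centred) patches; a proper mixture would use several.
fixated_patches = rng.normal(size=(200, patch_dim))
codes = np.array([binary_code(p) for p in fixated_patches])
p_on = np.clip(codes.mean(axis=0), 1e-3, 1 - 1e-3)

def saliency_score(patch):
    """Log-likelihood of the patch code under the fixation Bernoulli model;
    higher means 'more similar to what the observer fixated'."""
    c = binary_code(patch)
    return float(np.sum(c * np.log(p_on) + (1 - c) * np.log(1 - p_on)))

print(saliency_score(rng.normal(size=patch_dim)))
```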
2.5 Gianni et al. “Learning cross-modal translatability: grounding speech act on visual perception.” (RSS 2010)
Bibliography M. Gianni, G. J. M. Kruijff and F. Pirri. “Learning cross-modal translatability: grounding speech act on visual perception.” In: Proceedings of the RSS Workshop on Learning for Human-Robot Interaction Modeling (RSS 2010). Zaragoza, Spain. June 2010.
Abstract The problem of grounding language on visual perception has nowadays been investigated under different approaches; we refer the reader in particular to the works of [51, 58, 26, 72, 16, 57, 62, 45, 38, 9]. The inverse problem, that is, the problem of building the semantics/interpretation of visual perception via speech acts, is less investigated.
In this work we face the two problems simultaneously, via learning both
the language and its semantics by human-robot interaction. We describe
the progress of ongoing research facing the problem of simultaneously
grounding parts of speech and learning the signature of a language for
describing both actions and states space, while actions are executed and
shown in a video. Indeed, having both a language and a suitable semantics/interpretation of objects, actions and states properties, we will be able
to build descriptions and representations of real world activities under several interaction modalities.
Given two inputs, a video and a narrative, the task is to associate a
signature and an interpretation to each significant action and the afforded
objects, in the sequence, and to infer the preconditions and effects of the
actions so as to interpret the chronicle, explaining the beliefs of the agent
about the observed task.
Relation to work performed The work described in this paper contributes to the research on skill learning as it aims at providing a paradigm
to associate speech acts, and thus language, to actions acquired by visual
perception.
2.6 Carrano et al. “An approach to projective reconstruction from multiple views.” (IASTED 2010)
Bibliography A. Carrano, V. D’Angelo, S. R. F. Fanello, I. Gori, F. Pirri and A. Rudi. “An approach to projective reconstruction from multiple views.” In: Proceedings of the IASTED Conference on Signal Processing, Pattern Recognition and Applications (IASTED 2010). Innsbruck, Austria. February 2010.
Abstract We present an original method to perform a robust and detailed
3D reconstruction of a static scene from several images taken by one or
more uncalibrated cameras. Making use only of fundamental matrices we
are able to combine even heterogeneous video and/or photo sequences. In
particular we give a characterization of camera matrices space consistent
with a given fundamental matrix and provide a straightforward bottom-up
method, linear in most practical uses, to fulfill the 3D reconstruction. We
also briefly describe how to integrate this procedure into a standard vision
system following an incremental approach.
Relation to work performed The work describes a method to perform
projective reconstruction from multiple uncalibrated cameras that is based
on a graphical representation of the fundamental matrices constraining the
images of an object. In this work the Viewing Graph (ACCV 2010) is exploited in order to perform reconstruction. The use of the resultant method
has been investigated in the context of the estimation of the visual odometry
for the Gaze Machine making use of techniques for Structure and Motion
recovery.
2.7 Krieger and Kruijff. “Combining Uncertainty and Description Logic Rule-Based Reasoning in Situation-Aware Robots.” (AAAI-SSS 2011b)
Bibliography H. U. Krieger and G. J. M. Kruijff. “Combining Uncertainty and Description Logic Rule-Based Reasoning in Situation-Aware Robots.” In: Proceedings of the AAAI 2011 Spring Symposium “Logical Formalizations of Commonsense Reasoning” (AAAI-SSS 2011). Stanford, California, USA. March 2011.
Abstract The paper addresses how a robot can maintain a state representation of all that it knows about the environment over time and space, given
its observations and its domain knowledge. The advantage in combining domain knowledge and observations is that the robot can in this way project
from the past into the future, and reason from observations to more general
statements to help guide how it plans to act and interact. The difficulty lies
in the fact that observations are typically uncertain and logical inference for
completion against a knowledge base is computationally hard.
Relation to work performed The paper discusses how we can perform
inference over such a long-term memory model, in a multi-agent belief model
that deals with uncertainty in knowledge states.
2.8 Stachowicz and Kruijff. “Episodic-Like Memory for Cognitive Robots.” (IEEE-TAMD 2011)
Bibliography D. Stachowicz and G. J. M. Kruijff. “Episodic-Like Memory for Cognitive Robots.” IEEE Transactions on Autonomous Mental Development (IEEE-TAMD), 2011.
Abstract The article presents an approach to providing a cognitive robot with a long-term memory of experiences, a memory inspired by the concept of episodic memory (in humans) or episodic-like memory (in animals),
respectively. The memory provides means to store experiences, integrate
them into more abstract constructs, and recall such content. The article
presents an analysis of key characteristics of natural episodic memory systems. Based on this analysis, conceptual and technical requirements for an
episodic-like memory for cognitive robots are specified. The article provides
a formal design that meets these requirements, and discusses its full implementation in a cognitive architecture for mobile robots. It reports results of
simulation experiments which show that the approach can run efficiently in
robot applications involving several hours of experience.
Relation to work performed The article discusses the basis for the long-term memory model we are deploying in NIFTi to model the situated exploration history.
2.9 Pirri et al. “A general method for the Point of Regard estimation in 3D space.” (CVPR 2011)
Bibliography F. Pirri, M. Pizzoli and A. Rudi. “A general method for the Point of Regard estimation in 3D space.” Accepted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 2011.
Abstract A novel approach to 3D gaze estimation for wearable multicamera devices is proposed and its effectiveness is demonstrated both theoretically and empirically. The proposed approach, firmly grounded on the
geometry of the multiple views, introduces a calibration procedure that is
efficient, accurate, highly innovative but also practical and easy. Thus, it
can run online with little intervention from the user. The overall gaze estimation model is general, as no particular complex model of the human eye
is assumed in this work. This is made possible by a novel approach, that can
be sketched as follows: each eye is imaged by a camera; two conics are fitted
to the imaged pupils, and a calibration sequence, consisting in the subject gazing at a known 3D point while moving his/her head, provides information to 1) estimate the optical axis in the 3D world; 2) compute the geometry of the multi-camera system; 3) estimate the Point of Regard in the 3D world. The
resultant model is being used effectively to study visual attention by means
of gaze estimation experiments, involving people performing natural tasks
in wide-field, unstructured scenarios.
Relation to work performed The paper describes the novel contributions introduced by the video-oculography subsystem of the Gaze Machine, the device described in Section 1.2 as being at the core of the skill learning paradigm and of the attention-related tasks.
2.10 Finzi and Pirri. "Switching tasks and flexible reasoning in the Situation Calculus." (TR 2010)
Bibliography A. Finzi, F. Pirri. "Switching tasks and flexible reasoning in the Situation Calculus." DIS Technical Report, n. 7, 2010.
Abstract In this paper we present a new framework for modelling switching tasks and adaptive, flexible behaviours for cognitive robots. The framework is constructed on a suitable extension of the Situation Calculus, the Temporal Flexible Situation Calculus (TFSC), accommodating Allen temporal intervals, multiple timelines and concurrent situations. We introduce a constructive method to define pattern rules for temporal constraints, in a language of macros. The language of macros mediates between Situation Calculus formulae and temporal constraint networks. The programming language for the TFSC is TFGolog, a new Golog interpreter in the Golog family of languages, that models concurrent plans with flexible and adaptive behaviours with switching modes. Finally, we show an implementation of
a cognitive robot performing different tasks while attentively exploring a
rescue environment.
Relation to work performed This work reports the research on flexible planning and constitutes the theoretical foundation for the current implementation of the NIFTi model-based planner.
References
[1] http://www.ros.org/wiki/navigation.
[2] J. K. Aggarwal and Q. Cai. Human motion analysis: A review. Computer Vision and Image Understanding, 73:428–440, 1999.
[3] Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robot. Auton.
Syst., 57(5):469–483, 2009.
[4] Marcelo Gabriel Armentano and Analía Amandi. Plan recognition for interface agents. Artif. Intell. Rev., 28(2):131–162, 2007.
[5] A. R. Aron. The neural basis of inhibition in cognitive control. The
Neuroscientist, 13:214 – 228, 2007.
[6] Anna Belardinelli, Fiora Pirri, and Andrea Carbone. Bottom-up gaze
shifts and fixations learning by imitation. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2007.
[7] Stephen A. Block, Andreas F. Wehowsky, and Brian C. Williams. Robust execution on contingent, temporally flexible plans. In AAAI, 2006.
[8] L. Bour. Dmi-search scleral coil. H2 214, Department of Neurology,
Clinical Neurophysiology, Academic Medical Center, Amsterdam, 1997.
[9] S. R. K. Branavan, Harr Chen, Jacob Eisenstein, and Regina Barzilay. Learning document-level semantic properties from free-text annotations. J. Artif. Intell. Res. (JAIR), 34:569–603, 2009.
[10] A. Carbone and F. Pirri. Analysis of the local statistics at the centre of
fixation during visual scene exploration. In IARP International Workshop on Robotics for risky interventions and Environmental Surveillance(RISE 2010), 2010.
[11] A. Carbone and F. Pirri. Learning saliency. An ICA based model using Bernoulli mixtures. In Proceedings of BICS, Brain Inspired Cognitive Systems, 2010.
[12] Eugene Charniak and Robert P. Goldman. A bayesian model of plan
recognition. Artif. Intell., 64(1):53–79, 1993.
[13] T. N. Cornsweet and H. D. Crane. Accurate two-dimensional eye tracker
using first and fourth purkinje images. Journal of The Optical Society
of America, 68(8):921–928, 1973.
[14] V. D'Angelo, S. R. F. Fanello, I. Gori, F. Pirri, and A. Rudi. An approach to projective reconstruction from multiple views. In Proceedings of IASTED Conference on Signal Processing, Pattern Recognition and Applications, 2010.
[15] Yiannis Demiris and Bassam Khadhouri. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and
Autonomous Systems, 54(5):361–369, 2006.
[16] Peter Ford Dominey and Jean-David Boucher. Learning to talk about
events from narrated video in a construction grammar framework. Artif.
Intell., 167(1-2):31–61, 2005.
[17] J. Duncan. Disorganization of behaviour after frontal-lobe damage.
Cognitive Neuropsychology, 3:271–290, 1986.
[18] S. R. F. Fanello, I. Gori, and F. Pirri. Arm-hand behaviours modelling: from attention to imitation. In ISVC International Symposium on Visual Computing (ISVC 2010), 2010. Best Paper Award.
[19] A. Finzi and F. Pirri. Switching tasks and flexible reasoning in the
situation calculus. Technical Report 7, Dipartimento di informatica e
Sistemistica Sapienza Università di Roma, 2010.
[20] Sandra Clara Gadanho. Learning behavior-selection by emotions and
cognition in a multi-goal robot task. J. Mach. Learn. Res., 4:385–412,
2003.
[21] V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti. Action recognition
in the premotor cortex. Brain, 119:593–609, 1996.
[22] C. Geib. Delaying commitment in plan recognition using combinatory
categorial grammars. In Proc. of the IJCAI 2009, pages 1702–1707,
2009.
[23] M. Ghallab and H. Laruelle. Representation and control in ixtet, a
temporal planner. In Proceedings of AIPS-1994, pages 61–67, 1994.
[24] J.J. Gibson. Perceptual learning: differentiation or enrichment? Psyc.
Rev., 62:32–41, 1955.
[25] J.J. Gibson. The theory of affordances. In R. Shaw and J. Bransford, editors, Perceiving, Acting, and Knowing: Toward an Ecological
Psychology, pages 67–82. Hillsdale, NJ: Lawrence Erlbaum, 1977.
[26] Peter Gorniak and Deb Roy. Grounded semantic composition for visual
scenes. J. Artif. Intell. Res. (JAIR), 21:429–470, 2004.
[27] Dan Witzner Hansen and Arthur E. C. Pece. Eye typing off the shelf. In
2004 Conference on Computer Vision and Pattern Recognition (CVPR
2004), pages 159–164, June 2004.
[28] Ari K. Jonsson, Paul H. Morris, Nicola Muscettola, Kanna Rajan, and
Benjamin D. Smith. Planning in interplanetary space: Theory and
practice. In Artificial Intelligence Planning Systems, pages 177–186,
2000.
[29] Henry A. Kautz. A formal theory of plan recognition. PhD thesis, Department of Computer Science, University of Rochester, 1987.
[30] Henry A. Kautz and James F. Allen. Generalized plan recognition. In
AAAI, pages 32–37, 1986.
[31] Kazuhiko Kawamura, Tamara E. Rogers, and Xinyu Ao. Development
of a cognitive model of humans in a multi-agent framework for humanrobot interaction. In AAMAS ’02: Proceedings of the first international
joint conference on Autonomous agents and multiagent systems, pages
1379–1386, New York, NY, USA, 2002. ACM.
[32] H. Khambhaita, G.J.M. Kruijff, M. Mancas, M. Gianni, P. Papadakis,
F. Pirri, and M. Pizzoli. Help me to help you: how to learn intentions,
actions and plans. In AAAI 2011 Spring Symposium “Help Me Help
You: Bridging the Gaps in Human-Agent Collaboration”, 2011.
[33] Hans-Ulrich Krieger. A temporal extension of the Hayes and ter Horst
entailment rules for RDFS and OWL. In AAAI 2011 Spring Symposium
“Logical Formalizations of Commonsense Reasoning”, 2011.
[34] Hans-Ulrich Krieger and Geert-Jan M. Kruijff. Combining uncertainty
and description logic rule-based reasoning in situation-aware robots. In
AAAI 2011 Spring Symposium “Logical Formalizations of Commonsense Reasoning”, 2011.
[35] Volker Krüger, Danica Kragic, and Christopher Geib. The meaning of action: a review on action recognition and mapping. Advanced Robotics, 21:1473–1501, 2007.
[36] N. Levi and M. Werman. The viewing graph. CVPR, 2003.
[37] Dongheng Li, Jason Babcock, and Derrick J. Parkhurst. Openeyes: a
low-cost head-mounted eye-tracking solution. In ETRA ’06: Proceedings of the 2006 symposium on Eye tracking research & applications,
pages 95–100, New York, NY, USA, 2006. ACM.
[38] Ingo Lütkebohle, Julia Peltason, Lars Schillingmann, Britta Wrede,
Sven Wachsmuth, Christof Elbrechter, and Robert Haschke. The curious robot - structuring interactive robot learning. In ICRA, pages
4156–4162, 2009.
[39] M. Mancas, F. Pirri, and M. Pizzoli. Human-motion saliency in multimotion scenes and in close interaction. submitted Gesture Recognition
Workshop 2011.
[40] S. Marra and F. Pirri. Eyes and cameras calibration for 3d world gaze
detection. In Proceedings of the International Conference on Computer
Vision Systems, 2008.
[41] Stefano Marra and Fiora Pirri. Eyes and cameras calibration for 3d
world gaze detection. In Proceedings of the International Conference
on Computer Vision Systems, pages 216–227, 2008.
[42] U. Mayr and SW. Keele. Changing internal constraints on action:
the role of backward inhibition. Journal of Experimental Psychology,
129(1):4–26, 2000.
[43] E.K. Miller and J.D. Cohen. An integrative theory of prefrontal cortex
function. Annual Rev. Neuroscience, 24:167 – 202, 2007.
[44] Thomas B. Moeslund, Adrian Hilton, and Volker Krüger. A survey of
advances in vision-based human motion capture and analysis. Computer
Vision and Image Understanding, 104(2-3):90–126, 2006.
[45] Raymond J. Mooney. Learning to connect language and perception. In
AAAI, pages 1598–1601, 2008.
[46] Carlos H. Morimoto and Marcio R. M. Mimica. Eye gaze tracking techniques for interactive applications. Computer Vision and Image Understanding,
98(1):4–24, 2005.
[47] A. Newell. Unified theories of cognition. Harvard University Press,
1990.
[48] D. A. Norman and T. Shallice. Consciousness and Self-Regulation: Advances in Research and Theory, volume 4, chapter Attention to action:
Willed and automatic control of behaviour. Plenum Press, 1986.
[49] Takehiko Ohno, Naoki Mukawa, and Atsushi Yoshikawa. Freegaze: a
gaze tracking system for everyday gaze interaction. In ETRA ’02: Proceedings of the 2002 symposium on Eye tracking research & applications,
pages 125–132, New York, NY, USA, 2002. ACM.
[50] G. Di Pellegrino, V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti.
Understanding motor events: a neurophysiological study. Exp. Brain
Research, 91:176–180, 1992.
[51] Alex Pentland, Deb Roy, and Christopher Richard Wren. Perceptual
intelligence: learning gestures and words for individualized, adaptive
interfaces. In HCI (1), pages 286–290, 1999.
[52] Andrea Philipp and Iring Koch. Task inhibition and task repetition
in task switching. The European Journal of Cognitive Psychology,
18(4):624–639, 2006.
[53] Fiora Pirri. The well-designed logical robot: learning and experience
from observations to the situation calculus. Artificial Intelligence, pages
1–44, Apr 2010.
[54] Fiora Pirri. The well-designed logical robot: Learning and experience from observations to the situation calculus. Artificial Intelligence,
175(1):378 – 415, 2011.
[55] Fiora Pirri, Matia Pizzoli, and Alessandro Rudi. A general method for
the point of regard estimation in 3d space. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2011.
[56] Ronald Poppe. A survey on vision-based human action recognition.
Image and Vision Computing, 28:976–990, 2010.
[57] Deb Roy. Semiotic schemas: A framework for grounding language in
action and perception. Artif. Intell., 167(1-2):170–205, 2005.
[58] Deb Roy and Alex Pentland. Learning words from sights and sounds:
a computational model. Cognitive Science, 26(1):113–146, 2002.
[59] J.S. Rubinstein, E.D. Meyer, and J. E. Evans. Executive control of cognitive processes in task switching. Journal of Experimental Psychology:
Human Perception and Performance, 27(4):763–797, 2001.
[60] A. Rudi, M. Pizzoli, and F. Pirri. Linear solvability in the viewing
graph. In Springer, editor, Proc. of the ACCV Asian Conference on
Computer Vision(ACCV2010), 2010.
[61] A. Rudi, M. Pizzoli, and F. Pirri. Linear solvability in the viewing
graph. In ACCV Asian Conference on Computer Vision(ACCV2010),
2010.
[62] Paul E. Rybski, Jeremy Stolarz, Kevin Yoon, and Manuela Veloso.
Using dialog and human observations to dictate tasks to a learning
robot assistant. Intel Serv Robotics, 1:159–167, 2008.
[63] Stefan Schaal, Auke Ijspeert, and Aude Billard. Computational approaches to motor learning by imitation. Philosophical Trans. of the
Royal Soc. B: Biological Sciences, 358(1431):537–547, 2009.
[64] Charles F. Schmidt, N. S. Sridharan, and John L. Goodson. The plan
recognition problem: An intersection of psychology and artificial intelligence. Artif. Intell., 11(1-2):45–83, 1978.
[65] M.P. Shanahan. A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15:433–449,
2006.
[66] Sheng-Wen Shih and Jin Liu. A novel approach to 3-d gaze tracking
using stereo cameras. Systems, Man and Cybernetics, Part B, IEEE
Transactions on, 34(1):234–245, 2004.
[67] D. Stachowicz and G.J.M. Kruijff. Episodic-like memory for cognitive
robots. Journal of Autonomous Mental Development, 2011. accepted
for publication.
[68] A. Tate. ”I-N-OVA” and ”I-N-CA”, Representing Plans and other
Synthesised Artifacts as a Set of Constraints, pages 300–304. 2000.
[69] S. P. Tipper. Does negative priming reflect inhibitory mechanisms?
a review and integration of conflicting views. Quarterly Journal of
Experimental Psychology, 54:321 – 343, 2001.
[70] K.P. White, T.E. Hutchinson Jr., and J.M. Carley. Spatially dynamic
calibration of an eye tracking system. In IEEE Transaction on Systems,
Man, and Cybernetics, volume 23, pages 1162–1168, 1993.
[71] B. Williams, M. Ingham, S. Chung, P. Elliott, M. Hofbaur, and G. Sullivan. Model-based programming of fault-aware systems. AI Magazine,
Winter 2003.
[72] Chen Yu and Dana H. Ballard. On the integration of grounding language and learning objects. In AAAI, pages 488–494, 2004.
Arm-Hand behaviours modelling: from attention to
imitation
Sean R. F. Fanello1 , Ilaria Gori1 , and Fiora Pirri1
Sapienza Università di Roma, Dipartimento di Informatica e Sistemistica, Roma, RM, Italy
[email protected],[email protected],[email protected]
Abstract. We present a new and original method for modelling, learning and recognising arm-hand actions. We use an incremental approach to separate the arm-hand action recognition problem into three levels. The lower level exploits bottom-up attention to select the region of interest, and attention is specifically tuned towards human motion. The middle level serves to classify action primitives exploiting motion features as descriptors. Each of the primitives is modelled by a mixture of Gaussians, and it is recognised by a complete, real-time and robust recognition system. The higher level combines sequences of primitives using deterministic finite automata. The contribution of the paper is a composition-based model for arm-hand behaviours allowing a robot to learn new actions from a one-shot demonstration of the action execution.
Keywords: gesture recognition, action segmentation, human motion analysis.
1 Introduction
We face the problem of modelling behaviours from a robot perspective. We provide an analysis of the role played by the primitive constituents of actions and show, for a number of simple primitives, how to make legal combinations of them, so enabling the robot to build and replicate the observed behaviour on its own.
Here we shall focus only on actions performed by hands and arms, although we
extend the action class beyond the concept of gestures (as specified, e.g. in Mitra et al.
survey [1]). In fact, potentially any general action, such as drinking or moving objects
around, performable by hand and arm, can be included in our approach.
First of all we consider attention to and focus towards the human motion, as distinct
from non-human motion, either natural or mechanical. This aspect, in particular, resorts
to the theory of motion coherence and structured motion (see for example Wildes and
Bergen [2]), for which oriented filters have been proven to be appropriate [3]. Indeed,
we show how a bank of 3D Gabor filters can be tuned to respond selectively to some
specific human motion. Thus, focusing on the regions of the scene affected by human motion provides the robot with a natural segmentation of where to look for learning behaviours. In particular, attention to distinct human motion seems to be explicitly dependent on scale, frequency and direction, but not on shape. This fact has suggested defining descriptors based only on these features, from which we extract principally the
directions of the arm-hand movements (see also [4]).
The very simple structure of the descriptors enables a straightforward classification
that includes all direction dependent primitives, such as up, tilt, release, grasp and so
on. The basic classification can be easily extended to any legal sequence of actions
for which a deterministic accepting automaton exists. In this sense, according to the classification of human motion analysis provided by Moeslund et al. [5], our approach falls into the category of action primitives and grammars, as no explicit reference to a human model is used in the behaviour modelling. For other taxonomies
concerning action recognition we refer the reader to [6–8].
Our work encompasses both the classification and the regression problem (see [6, 9]). The aim is to enable robot action learning, by learning the primitives and
their structured progression. This can be considered as a form of imitation learning (we
refer the reader to the review of [10]) although an important generalisation inference is
done in the construction of the accepting automaton.
Thus, at each step of the behaviour learning, the robot finds itself either modelling
the behaviour via a new automaton, from the observed sequence, or accepting the observed behaviour via an already memorised one. This can be further extended by revising the learned automaton. Finally, we have tested the above model on the iCub (see Fig. 6), a humanoid robot designed by the RobotCub Consortium.
2 Focusing on human motion
Humans reveal a specific sensitivity to actions. It has been shown that action recognition
is predominantly located in the left frontal lobe (see [11]) and that low level motion
perception is biased towards stimuli complying with kinematics laws of human motion.
Indeed, human visual sensitivity is greatest at roughly 5 cycle/degree and at 5 Hz.
We have used 3D Gabor filters to record responses to human motion, and in particular to arm-hand motion, so as to learn attention towards these specific movements in a scene, as opposed to other kinds of movements (e.g. a fan). We show that 3D Gabor
filters can discriminate different motions by suitably selecting scale and frequency. The
selected regions are used by the descriptors to identify primitives of actions.
The earliest studies on the Gabor transform [12] are due to Daugman [13] and to
the experiments of Jones and Palmer [14], who tested Daugman's idea that simple
receptive fields belong to a class of linear filters analogous to Gabor filters. Since then
a wealth of literature has been produced on Gabor filters, to model several meaningful
aspects of the visual process. Most of the works are, however, focused on the 2D analysis. A 3D Gabor, as the product of a 3D Gaussian and a complex 3D harmonic function,
can be defined as follows:
G(\mathbf{x}) = |\Sigma|^{-1/2}\,(2\pi)^{-3/2}\,\exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\mathbf{x}_0)^{\top}\Sigma^{-1}(\mathbf{x}-\mathbf{x}_0)\Big)\,\exp\!\Big(2\pi\imath\,\mathbf{u}_0^{\top}(\mathbf{x}-\mathbf{x}_0)\Big) \qquad (1)
Here x = (x, y, t), x_0 = (x_0, y_0, t_0) is the origin in the space-time domain, u_0 = (u_0, v_0, w_0) denotes the central spatio-temporal frequency and, finally, ı = √−1. Using Euler's formula and simplifying, the harmonic term can be written as cos(−2π u_0^⊤ x_0 + ψ), with ψ the phase parameter in Cartesian coordinates. From this, according to the ψ value, it is possible to obtain two terms in quadrature, the even and the odd Gabor filters (see,
for example, [12] [15]), which we denote GO and GE . Clearly, with respect to Gabor’s
representation of the information area [12] (see also [15]) these filters should be represented in a 6-dimensional space with coordinates x, y, t, u, v, w. However, following
Daugman [15], we consider two representations, one in the space-time domain and one
in the frequency domain. Here we shall mention only the space-time domain.
Fig. 1: On the left a bank of 3D Gabor filters with same scale and frequency and varying direction.
On the right, a slice along the x–time plane.
A Gabor can be specified by the parameters of scale (one per axis of the Gaussian support, which is an ellipsoid), of direction, by the angles θ and ϕ of the principal
axes of the Gaussian support, and of central frequency. In fact, knowing the axes direction (eigenvectors) and the axes scale (eigenvalues) the Gaussian covariance Σ is
determined. By varying these parameters, according to the spatial frequency contrast
sensitivity and speed sensitivity in humans, providing limiting values, we have defined
a bank of Gabor filters, in which the parameter ranges are specified as follows. Both the spatial and temporal frequencies are given as multiples of the Nyquist critical frequencies f_s = (1/2) cycle/pixel and f_t = 12.5 Hz, given that the video sampling rate was 25 Hz.
In particular, the frequency bandwidth is related to the Gaussian axes length as follows:
\Delta F_i = \frac{1}{2\sqrt{\lambda_i}}, \qquad i = 1, \ldots, 3 \qquad (2)
With λ_i the i-th eigenvalue of Σ, and ∆F_i = F_i^{max} − F_i^{min}, F_i^{max} (resp. F_i^{min}) being the maximal (resp. minimal) frequency of the chosen channel in the i-th direction. We have chosen the central frequency to vary along 4 channels from 1/2 to 1/8 and, accordingly, the scale
to vary from 0.1 to 0.8 for each of the axis of the Gaussian support. This amounts
to 48 parameters. On the other hand the orientation is given for 6 directions, namely
{0, 30, 60, 90, 120, 150}, for both the angles θ and ϕ. This amounts to 36 parameters. We
have thus obtained a bank of 48 × 36 filters. Figure 1 shows, on the left, a bank of 3D
Gabor filters, with only the direction varying and the origin x0 varying on a circle, just
for visualisation purposes.
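As an illustration, the following is a minimal NumPy sketch of one even/odd (quadrature) 3D Gabor pair, assuming for simplicity an axis-aligned (diagonal-covariance) Gaussian support; the bank described above additionally rotates the support by the angles θ and ϕ and sweeps the 4 frequency channels, and all names and parameter values here are illustrative rather than the authors' implementation.

```python
import numpy as np

def gabor3d_pair(size=16, sigma=(0.38, 0.46, 0.60), freq=(1/6.0, 0.0, 1/6.0)):
    """Even/odd 3D Gabor pair on a size^3 grid (x, y, t).

    sigma: relative scales of the Gaussian support along x, y, t
           (diagonal covariance only, for brevity).
    freq:  central spatio-temporal frequency u0 = (u0, v0, w0),
           in cycles/pixel and cycles/frame.
    """
    half = size // 2
    x, y, t = np.meshgrid(np.arange(-half, half),
                          np.arange(-half, half),
                          np.arange(-half, half), indexing="ij")
    env = np.exp(-0.5 * ((x / (sigma[0] * size)) ** 2 +
                         (y / (sigma[1] * size)) ** 2 +
                         (t / (sigma[2] * size)) ** 2))
    phase = 2.0 * np.pi * (freq[0] * x + freq[1] * y + freq[2] * t)
    return env * np.cos(phase), env * np.sin(phase)   # (G_E, G_O)

# A small directional slice of the bank: same scale and frequency magnitude,
# with the direction of u0 rotating in the x-t plane.
bank = [gabor3d_pair(freq=((1/6.0) * np.cos(a), 0.0, (1/6.0) * np.sin(a)))
        for a in np.deg2rad([0, 30, 60, 90, 120, 150])]
```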
To learn the human motion bias, we are given training videos V of about 800 frames
taken at a sampling rate of about 25 Hz. The video resolution is reduced to 144 × 192
and a period of ∆T = 0.64s is considered to accumulate information, at the end of which
the energy is computed. This amounts to volumes V ∆T of 16 frames and, analogously
the Gabor filter is defined by a volume of dimension 16 × 16 × 16. The square of the
motion energy, for the given interval, is defined as:
\big(En_i^{\Delta T}(\mathbf{x})\big)^2 = \Big(\int_{\mathbb{R}^3} G_E^{i}(\mathbf{x}_0)\,V^{\Delta T}(\mathbf{x}-\mathbf{x}_0)\,d\mathbf{x}_0\Big)^2 + \Big(\int_{\mathbb{R}^3} G_O^{i}(\mathbf{x}_0)\,V^{\Delta T}(\mathbf{x}-\mathbf{x}_0)\,d\mathbf{x}_0\Big)^2 \qquad (3)
Here the pair (G_E^i, G_O^i) varies over the space of the filter bank, x = (x, y, t), x_0 = (x_0, y_0, t_0), and the integration is triple, over x_0, y_0 and t_0. The energy is computed for each 3D Gabor in
the filter bank, after smoothing the volume V ∆T , with ∆T ∼ 0.64s, with a binomial
filter of size 3. Although the coverage of the bank is not complete, we look for the
response that maximises the energy around a foveated region of at most 1 to 3 degrees,
and it is minimal elsewhere. Intuitively, this means that the response of the receptive
fields is sharp in the interesting regions. This fact, indeed, amounts to both maximising the energy and minimising the entropy of the information carried by the energy of the
response. This is achieved by considering the energy voxels as i.i.d observations of
a non-parametric kernel density. The non parametric kernel density of the energy is
estimated using a uniform kernel (see next section, equation (7)), with bandwidth H =
0.08 · I, I the identity matrix, along the 3 dimensions of the foveated region (for non
parametric densities and the estimation of the bandwidth, we refer the reader to [16]).
Fig. 2: The higher sequence illustrates the optical flow, detecting evenly the fan and the arm-hand
motion. The lower sequence illustrates the energy of the quadrature pair Gabor filters, along a path
of ∆T minimising the entropy. This path is constant over 1/6 cycle/pixel and varying direction.
The scale is fixed with a = 0.38, b = 0.46, c = 0.6. Here all images have been resized to 1/2.
On the other hand, the response is discriminative if the energy peaks are minimal in
number, and hence the correlation is high on closest spatio-temporal regions. Therefore
the optimisation criterion amounts to maximising the energy subject to both the minimisation of the entropy E(p) of the non-parametric density p and the minimisation of
an error function defined as the sum of the squared distances between any two energy
peaks. Here a peak is any energy value x such that
x \geq \frac{4}{3N} \sum En \qquad (4)
Here N is the dimension of En, obtained by vectorisation of En_i^{\Delta T}. It is interesting to
note that, under this optimisation criterion we have the following results:
1. Given ∼ 800 frames at 25Hz, with T about half minute, the maximisation of the
energy, subject to the minimisation of the entropy, and subject to the minimisation
of energy peaks distance, at each ∆T , ensures that the motion of a congruous source
is tracked.
2. For attention towards human motion, only scale and central frequency influence the energy response, while direction can be kept varying.
3. Human motion is located at the medium low frequencies of the filter bank.
It follows that, choosing a Gabor filter of any direction, with a central frequency in space-time of about (1/6) cycle/pixel and cycle/frame and with minimal scale, if the optimisation criteria are satisfied along the whole path T then arm-hand motion is very likely to be included in the peaked regions. Some experiments are shown in Figure 2 and compared with optical flow; note that 3D Gabor filtering can discriminate between hand-arm motion and the fan motion, even though the fan had two different velocities.
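A possible reading of Eqs. (3)–(4) in code is sketched below, assuming the quadrature pair of the previous snippet and a 16-frame volume V^{ΔT}; the function names and the use of scipy.ndimage.convolve are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def motion_energy(volume, g_even, g_odd):
    """Squared motion energy of Eq. (3): the sum of the squared responses
    of the even and odd Gabor filters applied to the volume V^{dT}."""
    r_e = convolve(volume, g_even, mode="constant")
    r_o = convolve(volume, g_odd, mode="constant")
    return r_e ** 2 + r_o ** 2

def energy_peaks(energy):
    """Peak mask of Eq. (4): voxels whose energy exceeds 4/(3N) times the
    total energy, N being the number of voxels of the vectorised energy."""
    en = energy.ravel()
    threshold = (4.0 / (3.0 * en.size)) * en.sum()
    return energy >= threshold
```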
3 Online classification of action primitives
Fig. 3: The figure illustrates the captured directions of some of the gesture primitives in pairs: Right (0°)–Left (180°), Up (90°)–Down (270°), Tilt 45°–Tilt 225°, Tilt 135°–Tilt 315°, Rotate (Right and Left), Grab and Release.
In the previous section we have specified how to obtain the region of interest (ROI),
where arm-hand motion is identified, for each frame of a video. In this section we show
how action primitives can be classified online. As established in the previous section, the optimisation criteria for identifying human motion were not tuned to direction: as long as a movement of the hand or arm is displayed, the motion direction varies continuously, and is therefore less relevant than scale and frequency. However, once the motion energy,
as the squared sum of the responses of the two filters in quadrature, has been obtained,
according to Heeger [17] (see also [18]), it is possible to recover the optical flow. However, once the scale and frequency have been selected for attention, any direction works
well with the selected scale and frequency for online bottom-up attention. Therefore to
ease performance in gesture tracking and online classification we have chosen to use
a simple and well performing optical flow algorithm such as Horn and Schunck's algorithm [19]. Other methods such as Lucas-Kanade's algorithm [20], Variational Optical Flow [21] and Brox's Optical Flow [22] are either too demanding, in terms of feature requirements, or too computationally expensive. For example, Brox's algorithm slows computation down to 6 Hz, whereas for real-time tracking of human motion 25 Hz is needed.
Fig. 4: The figure on the left illustrates the likelihood computed in real time at each time step t over a sequence of T = 500 frames. On the right, the likelihood trend for the action "Grab-Right-Release".
Let \langle V(x, y, t), U(x, y, t), t\rangle_{t=1,\ldots,T} be the optical flow vector, for each pixel in the ROI. The principal directions of the velocity vectors are defined as follows:
\mathrm{dir}(x, y, t) = \frac{\pi}{2k}\left(\left\lceil \frac{k}{\pi}\arctan\frac{V(x, y, t)}{U(x, y, t)} \right\rceil + \left\lfloor \frac{k}{\pi}\arctan\frac{V(x, y, t)}{U(x, y, t)} \right\rfloor\right) \qquad (5)
hence, at each (x, y) pixel in the ROI, at time t, the principal direction θ j takes the
following discrete values:
\theta_j = \pm\frac{(2j-1)\pi}{2k}, \qquad j = 1, \ldots, 2k \qquad (6)
Here k is half the number of required principal directions. We can note that the size of
dir(t) depends on the dimension of the ROI. In order to obtain a normalised features
vector X(t) ∈ R2k we use a uniform kernel which, essentially, transforms dir(t) into its
histogram via a non parametric kernel density. More specifically, let n = 2k, let J be
the indicator function, let Y(t) be the vectorisation of dir(t), with m its size, let x be an
element of a vector of size n, scaled between min(Y(t)) and max(Y(t)):
K(u) = \tfrac{1}{2}\,J(|u| \le 1); \ \text{hence, for } u = \frac{x - Y_s(t)}{h}, \qquad X(x, t) = \frac{1}{nh}\sum_{s=1}^{m} K\!\left(\frac{x - Y_s(t)}{h}\right) \qquad (7)
here we have chosen h = 1/2^8. The obtained feature vector X(t) ∈ R^{2k} is then used for any further classification of primitive actions.
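The descriptor of Eqs. (5)–(7) can be sketched as follows, taking the flow fields U, V of a ROI as given; np.arctan2 is used as a numerically robust stand-in for arctan(V/U), and the bandwidth value mirrors the h = 1/2^8 mentioned above. The code is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np

def direction_descriptor(U, V, k=4, h=1.0 / 2 ** 8):
    """2k-bin feature vector X(t) of the principal flow directions in a ROI."""
    ang = np.arctan2(V, U)                     # stand-in for arctan(V/U)
    # Eq. (5): quantisation to the principal directions of Eq. (6)
    q = (np.pi / (2 * k)) * (np.ceil(k * ang / np.pi) + np.floor(k * ang / np.pi))
    y = q.ravel()                              # Y(t), the vectorised dir(t)
    n = 2 * k
    grid = np.linspace(y.min(), y.max(), n)    # n evaluation points
    # Eq. (7): uniform kernel K(u) = 0.5 * 1(|u| <= 1)
    u = (grid[:, None] - y[None, :]) / h
    return (0.5 * (np.abs(u) <= 1.0)).sum(axis=1) / (n * h)
```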
Table 1: Confusion matrix for 11 of the arm-hand primitives (Right, Left, Up, Down, Tilt 45°, Tilt 135°, Tilt 225°, Tilt 315°, Grab, Release, Rotate), plus a False class; here False denotes a false positive gesture in the sequence.
Given a source of sequences of 100 arm-hand actions (gestures) we have defined 11
principal primitives. Figure 3 illustrates 5 pairs of primitives, plus grab and release. Once each frame is encoded into the above defined descriptor, we can obtain parametric descriptors by estimating, for each primitive action, a mixture of Gaussians. For the estimation, to suitably assess the number of components of each mixture, we have been using the Spectral Clustering algorithm ([23]). Therefore for each primitive action A_s, s = 1, . . . , M, with M = 11 the number of primitive actions considered, a mixture of Gaussians g_{A_s} is estimated, with parameters (µ_1, . . . , µ_m, Σ_1, . . . , Σ_m, π_1, . . . , π_m). The
mixtures are directly used for classification.
Given a video sequence of length T , we need to attribute a class A s to each feature
descriptor Xi , i = 1, . . . , T , Xi = X(ti ), to establish the primitive actions appearing in the
sequence. We note that because of the low frequency of arm-hand motion, it turns out
that the same primitive holds for several frames; therefore it is possible to monitor the likelihood of a feature vector and to search the Gaussian space only at specific break points indicating a change of direction.
Indeed, consider a buffer BT = (X1 , . . . XT ) of features vectors, as defined in equation (7), obtained by coding T frames, where X(ti ) = Xi the i-th descriptor, at time ti , in
the buffer. The posterior distribution of each primitive action, given each feature in the
buffer is estimated via the softmax function. Namely,
P(A_s \mid X_i) = \frac{\exp(\lambda_s)}{\sum_j \exp(\lambda_j)}, \qquad \text{with } \lambda_s = \log\big(g_{A_s}(X_i \mid A_s)\,P(A_s)\big) \qquad (8)
Hence the observed primitive is classified as action A_s if P(A_s|X_i) > P(A_q|X_i) for any primitive A_q, A_q ≠ A_s, with g_{A_s} > τ, τ a threshold estimated in training, according to the
likelihood trend of each primitive. Now, given that, at time t0 , A s is chosen according to
(8), the gradient of the likelihood is:
\Delta g_{A_s}(X_i) = \sum_{h=1}^{K} p_h(X_i)\,\pi_h\,\Sigma_h^{-1}(\mu_h - X_i) \qquad (9)
Here K is the number of components of g_{A_s}, π_h is the mixing parameter and Σ_h^{-1} the precision matrix. Now, as long as the likelihood grows in the direction of the gradient, it follows that the action shown must be A_s; as soon as the likelihood decreases, it follows that a change in direction is occurring. Therefore the next class has to be identified via (8) and again the gradient is monitored. At the end of the computation a sequence ⟨A_{s_1}: p_{s_1}, . . . , A_{s_k}: p_{s_k}⟩ of primitive actions is returned. Each primitive in the sequence is labelled by the class posterior (according to (8)), computed at the maximum likelihood reached by the primitive action in the computation window of the gradient.
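The classification step of Eqs. (8)–(9) can be sketched with scikit-learn's GaussianMixture, fitting one mixture per primitive and scoring a descriptor with the softmax posterior; the number of components is fixed here, whereas the paper selects it by spectral clustering, and all names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_primitives(train, n_components=2):
    """One mixture of Gaussians g_{A_s} per primitive; `train` maps a
    primitive label to an (N_s, 2k) array of training descriptors."""
    return {label: GaussianMixture(n_components=n_components).fit(X)
            for label, X in train.items()}

def posterior(models, x, priors=None):
    """Softmax posterior P(A_s | X_i) of Eq. (8) for one descriptor x."""
    labels = list(models)
    lam = np.array([models[l].score_samples(x[None, :])[0] +            # log g_{A_s}
                    np.log(priors[l] if priors else 1.0 / len(labels))  # log P(A_s)
                    for l in labels])
    p = np.exp(lam - lam.max())
    return dict(zip(labels, p / p.sum()))
```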
In Figure 4 the likelihood trend of ten primitive gestures, over 500 frames, is shown.
Table 1 illustrates the confusion matrix of the above defined primitives, for an online
sequence of 100 gestures. From the confusion matrix it emerges that it is quite unlikely
that the system mismatches a direction. However a weakness of the described online
recognition algorithm is that it is possible to recognise false directions even if they are
not in the performed sequence. In any case, the accuracy of the whole system is around 80%, no matter whether gestures are performed at varying speeds and by different actors.
4 Actions: Learning and Imitation
According to the steps described in the previous sections, an action can be specified by
a sequence of primitive gestures (arm-hand primitive actions). For example the action
manipulation can be specified by the sequence ⟨Grasp Rot Rel⟩. This sequence is recognised using the online estimation of the likelihood trend of each primitive action in the sequence, as described in the previous section. However, the same action manipulation can be described by ⟨Grasp Rel⟩ and by ⟨Grasp Up Rot Down Rel⟩, as well. Indeed,
we consider a sequence of primitive gestures as a sample from an unknown regular
language, specifying an action. We make the hypothesis that for each arm-hand action
there is a regular language L(A) generating all the sequences (or words or strings) that
specify the action. It follows, by the properties of regular languages, that further complex actions can be obtained by composition, likewise partial actions can be matched
within more complex actions.
We face the problem of learning a deterministic finite automaton (DFA) that recognises such a regular language. A DFA is defined by a 5-tuple (Q, Σ, δ, q0 , F), where Q is
a finite set of states, with q0 the initial state; Σ is a finite input alphabet; δ : Q × Σ → Q
is the transition function, and F ⊆ Q is the set of final states (see e.g. [24]).
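As a concrete illustration of the 5-tuple above, a DFA over primitive-action symbols can be represented with a transition dictionary; the automaton encoded here is only a hypothetical sketch consistent with the manipulation example discussed below, not the authors' implementation.

```python
def make_dfa(delta, q0, finals):
    """DFA (Q, Sigma, delta, q0, F); Q and Sigma are implicit in delta."""
    return {"delta": delta, "q0": q0, "F": set(finals)}

def accepts(dfa, word):
    """Run a sequence of primitive symbols through delta."""
    q = dfa["q0"]
    for sym in word:
        q = dfa["delta"].get((q, sym))
        if q is None:                 # missing transition: reject
            return False
    return q in dfa["F"]

# A manipulation-like automaton (hypothetical transitions):
dfa = make_dfa({(0, "Grasp"): 1, (1, "Rot"): 1, (1, "Up"): 1,
                (1, "Down"): 1, (1, "Rel"): 2}, q0=0, finals=[2])
assert accepts(dfa, ["Grasp", "Rot", "Rel"]) and not accepts(dfa, ["Rel", "Rel"])
```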
The problem of regular language inference described by a canonical DFA, consistent with the given sample, has been widely studied and, in particular, [25] have
proposed the Regular Positive and Negative Inference (RPNI) algorithm to infer deterministic finite automata, in polynomial time. Here a canonical representation of an
Fig. 5: On the left the DFA of the action manipulation, on the right its extended PDFA. Note
that, because of the structure of S + , I(q0 ) = 1 and PF (q2 ) = 1. The probabilities inside the states
indicate the probability of the state to belong to QF , the set of final states.
automaton A is a minimal representation A0 such that L(A) = L(A0 ), where L(A) is
the language accepted by the DFA A.
In this section we briefly show how, using the classification steps of the previous
section, it is possible to build a positive and negative sample (S + , S − ) of an unknown
regular language, such that the sample is structurally complete, that is, the words in
the sample make use of all edges, states and final states of the DFA. We also provide a
probabilistic extension of the finite automaton, using the annotation of the sequences.
For each action A, to be learned using the 11 primitives, we define an ordering
on the sequences, starting with the minimal sequence (e.g., for the manipulation action, ⟨Grasp Rel⟩), and increase the dimension with repeated primitives. Whenever a
sequence fails to be recognised then the sequence, with the mismatched primitive, is
added to the negative sample. It follows that, according to the recognition performance
of the system, we should have 80 positive and 20 negative instances over 100 words of
a specific action. Since the positive sample is provided by a benign advisor, it must be
structurally complete.
Given (S + , S − ) the RPNI algorithm starts by constructing an automaton hypothesis
PT (S + )/πr , where PT (S + ) is the prefix tree acceptor of S + . Here πr is a partition of
the prefixes Pr(S+) of S+, defined as Pr(S+) = {u ∈ Σ* | ∃v ∈ Σ*, uv ∈ S+}, where Σ is
the alphabet of S + . An example of a prefix tree, together with a merging transformation
leading to the canonical automaton is given in [25]. Figure 5 illustrates an automaton
generated by a sample including the following sequences:
S+ = {⟨Grasp Rel⟩, ⟨Grasp Down Rel⟩, ⟨Grasp Down Rot Rel⟩, ⟨Grasp Down Up Rot Rel⟩, ⟨Grasp Rot Up Rel⟩, ⟨Grasp Rot Up Down Rel⟩, ⟨Grasp Up Down Rot Rel⟩}
S− = {⟨Rel Rel⟩, ⟨Grasp Grasp⟩, ⟨Grasp Rel Rot⟩, ⟨Grasp Rel Up⟩}     (10)
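For illustration, the prefix tree acceptor PT(S+) that initialises RPNI can be built as sketched below from a positive sample like the one in Eq. (10); the state-merging phase of RPNI is omitted, and the representation follows the hypothetical dictionary-based DFA used earlier.

```python
def prefix_tree(positive):
    """Prefix tree acceptor PT(S+): one state per prefix of a positive word;
    the state reached by a whole word is final."""
    states = {(): 0}                  # prefix tuple -> state id
    delta, finals = {}, set()
    for word in positive:
        prefix = ()
        for sym in word:
            nxt = prefix + (sym,)
            if nxt not in states:
                states[nxt] = len(states)
            delta[(states[prefix], sym)] = states[nxt]
            prefix = nxt
        finals.add(states[prefix])
    return {"delta": delta, "q0": 0, "F": finals}

S_plus = [("Grasp", "Rel"), ("Grasp", "Down", "Rel"),
          ("Grasp", "Down", "Rot", "Rel")]
pt = prefix_tree(S_plus)              # starting hypothesis PT(S+) for RPNI
```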
Probabilistic extensions of DFA have been treated both from the point of view of
probabilistic acceptors [26] and the point of view of automata as generative models of
stochastic languages [27, 28].
Here, instead, we have assumed that the negative sample comes from the distribution of failures on single elements of the alphabet and we use the distribution on
the primitive actions to compute the distribution induced by the identified automaton.
Following [27, 28], we define the extension PA of a DFA A as A together with the functions I_A : Q_init → [0, 1], P_A : δ → [0, 1] and F_A : Q_F → [0, 1], where Q_init ⊆ Q
is the set of initial states and Q_F ⊂ Q is the set of final states.
Fig. 6: iCub repeats actions performed by the demonstrator.
If w is a word accepted by A then there exists at least one path θ ∈ Θ_A, Θ_A being the set of paths to final states, from some s_0 ∈ Q_init to some s_k ∈ Q_F, and the probability of θ is:
Pr_A(\theta) = I_A(s_0)\left[\prod_{i=1}^{k} P_A(\delta(s_{i-1}, A_i))\right] F_A(s_k) \qquad (s_i \text{ being the current state in the path}) \qquad (11)
Thus, given a normalization constant α, the probability of generating a word w is
Pr_A(w) = \alpha \sum_{\theta \in \Theta_A} Pr_A(\theta) \qquad (12)
It follows that if an action w is parsed by A then there exists among the valid paths
the most probable one, according to the probabilities estimated in classification, as in
HMMs.
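For a deterministic PFA the sum of Eq. (12) reduces to the single accepting path of Eq. (11), which can be computed as in the sketch below; the structure and the numerical values are hypothetical, chosen only so that the PFA conditions discussed next hold.

```python
def word_probability(pdfa, word):
    """Pr_A(w) of Eqs. (11)-(12) for a deterministic PFA (alpha omitted)."""
    q = pdfa["q0"]
    p = pdfa["I"][q]                        # I_A(s_0)
    for sym in word:
        nxt = pdfa["delta"].get((q, sym))
        if nxt is None:
            return 0.0
        p *= pdfa["P"][(q, sym)]            # P_A(delta(s_{i-1}, A_i))
        q = nxt
    return p * pdfa["F"].get(q, 0.0)        # F_A(s_k)

# Hypothetical manipulation PFA: F(q) + sum_A P(delta(q, A)) = 1 at every state.
pdfa = {"q0": 0, "I": {0: 1.0},
        "delta": {(0, "Grasp"): 1, (1, "Rot"): 1, (1, "Rel"): 2},
        "P": {(0, "Grasp"): 1.0, (1, "Rot"): 0.3, (1, "Rel"): 0.7},
        "F": {2: 1.0}}
print(word_probability(pdfa, ["Grasp", "Rot", "Rel"]))   # 1.0 * 0.3 * 0.7 * 1.0
```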
In order to add probabilities to states and transitions we proceed as follows. Let
S = (S+, S−) and let PT(S+) be the prefix tree acceptor of S+. Let h_j^F = #{A_j ∈ Σ | A_j occurs as the last symbol of w, w ∈ S+} and h_j^I = #{A_j ∈ Σ | A_j occurs as the first symbol of w, w ∈ S+}. Now, let f_{jk} = #{q_k | δ(q, A_j) = q_k, A_j ∈ Σ} and l_{jk} = #{A_j | (q_k, A_j) ∈ δ, q_k ∈ Q_init}. Then we define:
F_A^{\star}(q_k) = \frac{\sum_j f_{jk}}{\sum_j h_j^F} \qquad \text{and} \qquad I_A^{\star}(q_k) = \frac{\sum_j l_{jk}}{\sum_j h_j^I} \qquad (13)
Here F_A^{\star} and I_A^{\star} are intermediate estimations. We recall that each word in S+ is a labelled sequence according to the classification step. Namely, if w ∈ S+ then w = A_{j_1} : p_{j_1}, . . . , A_{j_n} : p_{j_n}. Now, for each branch in the PT(S+) we construct a transition matrix U_k such that the dimension of U_k is |Q_A^{\star}| × |Σ|, with |Q_A^{\star}| = m the number of states in
PT (S + ). Here an element ui j of Uk indicates the transition to state qi in the k-th branch,
of the symbol A j (i.e. of primitive action A j ), in other words it indicates the position of
A j in the sequence accepted by the k-th branch, since by construction qi is labelled by
the prefix of the sequence up to A j . Thus ui j = 0 if there is no transition of A j to qi and
ui j = pi j if A j labels the transition to qi in PT (S + ). Then all these transition matrices
are added and normalised. For the normalisation we build a matrix Z which is formed
by repetition of a vector V, that is, Z = V ⊗ 1_n^{\top}, n = |Σ|. Thus, let H = \sum_k U_k, with U_k the matrix of the k-th branch of the PT(S+) tree. We define V_U = \sum_j H_j, that is, the sum of H over the columns. Then U = H ./ Z, with Z the normalisation matrix defined
above, and ./ the element-wise division between elements of H and elements of Z. Now, in order to obtain the transition probability matrix for A, at each merge step of state i and state j of the RPNI, assuming i < j, we have to eliminate a row u_j. To this end we first obtain the new row u_i^{new} = (u_i + u_j)/2 and then we can cross out u_j. It follows that the new matrix U^{new} is (m − 1) × n, n the cardinality of Σ, and it is still stochastic. As this process is repeated for all merging operations of the RPNI algorithm, in the end the last U^{new} obtained will be a stochastic matrix with the right number of transitions.
At this point we are left with the two diagonal matrices F_A^{\star}, I_A^{\star} and the matrix U^{new}. We define three new vectors, which all have the same dimension:
V_F = \mathrm{diag}(F_A^{\star}), \qquad V_I = \mathrm{diag}(I_A^{\star}), \qquad V_{\delta} = \sum_j U^{new}_{(\cdot,\, j)}
Let Z = V_F + V_{\delta}. We can finally define F_A = F_A^{\star} ./ Z, P_A = U^{new} ./ Z and I_A = I_A^{\star} / \sum V_I. It follows that the requirement for the DFA A to be a PFA, namely:
\sum_{q \in Q_A} I_A(q) = 1 \qquad (14)
F_A(q) + \sum_{A \in \Sigma} P_A(\delta(q, A)) = 1 \quad \forall q \in Q_A \qquad (15)
is satisfied, see Figure 5. We can note that, by the above construction, each transition
δ is labelled according to the sample mean of each primitive action in the sequences
mentioned in S + . Figure 6 shows sequences of learning and imitation.
5 Conclusions and acknowledgements
We have described what is, as far as we know, an original method for the real-time recognition and learning of direction-based arm-hand actions. Our main contribution is an incremental method that, from attention to human motion up to the inference of a DFA, develops a model of some specific human behaviour that can be used to learn and recognise more complex actions of the same kind. For this early system we have chosen simple primitive gestures, easily learnable and recognisable at low computational cost. In order to enable the robot to repeat the action, we have extended the DFA to a probabilistic DFA, which generates, together with a language, a distribution over it. Following the properties of regular languages, it is possible to provide a real-time learning system that can infer more complex actions. Finally, we have implemented the system on the iCub, reported some specific performance results, and verified that the demonstrated set of actions has been learned and replicated efficiently.
The research is supported by the EU project NIFTI, n. 247870.
References
1. Mitra, S., Acharya, T.: Gesture recognition: A survey. IEEE Transactions on Systems, Man,
and Cybernetics, Part C 37(3) (2007) 311–324
2. Wildes, R.P., Bergen, J.R.: Qualitative spatiotemporal analysis using an oriented energy
representation. In: ECCV ’00. (2000) 768–784
3. Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of motion. J.
of the Optical Society of America A 2(2) (1985) 284–299
4. Braddick, O., OBrien, J., Wattam-Bell, J., Atkinson, J., Turner, R.: Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain.
Current Biology 10 (2000) 731–734
5. Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion
capture and analysis. Computer Vision and Image Understanding 104(2-3) (2006) 90–126
6. Poppe, R.: A survey on vision-based human action recognition. Image and Vision Computing
28 (2010) 976–990
7. Aggarwal, J.K., Cai, Q.: Human motion analysis: A review. Computer Vision and Image
Understanding 73 (1999) 428–440
8. Bobick, A.F.: Movement, activity, and action: the role of knowledge in the perception of
motion. Philosophical Transactions of the Royal Society of London 352 (1997) 1257–1265
9. Forsyth, D.A., Arikan, O., Ikemoto, L., O’Brien, J.F., Ramanan, D.: Computational studies of
human motion: Part 1, tracking and motion synthesis. Foundations and Trends in Computer
Graphics and Vision 1(2/3) (2005)
10. Krüger, V., Kragic, D., Geib, C.: The meaning of action: a review on action recognition and
mapping. Advanced Robotics 21 (2007) 1473–1501
11. Casile, A., Dayan, E., Caggiano, V., Hendler, T., Flash, T., Giese, M.A.: Neuronal enc. of
human kinematic invariants during action obs. Cereb Cortex 20(7) (2010) 1647–55
12. Gabor, D.: Theory of communication. J. IEE 93(26, Part III) (1946) 429–460
13. Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of
America 2(7) (1985) 1160–1169
14. Jones, J.P., Palmer, L.A.: An evaluation of the two-dimensional gabor filter model of simple
receptive fields in cat striate cortex. Journal of Neurophysiology 58 (1987) 1233–1258
15. Daugman, J.G.: Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans. on ASSP 36(7) (1988) 1169–1179
16. Wasserman, L.: All of Nonparametric Statistics. Springer (2005)
17. Heeger, D.J.: Optical flow using spatiotemporal filters. International Journal of Computer
Vision 1(4) (1988) 279–302
18. Watson, A.B., Ahumada, A.J.J.: Model of human visual-motion sensing. Journal of the
Optical Society of America A: Optics, Image Science, and Vision 2(2) (1985) 322–342
19. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17 (1981) 185–203
20. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to
stereo vision. Proc. of DARPA Imaging Understanding Work. (1981) 121–130
21. Bruhn, A., Bruhn, J., Feddern, C., Kohlberger, T., Schnörr, C. In: Lecture Notes in Computer
Science. Volume 2756. Springer, Berlin (2003) 222–229
22. Brox, T., Bruhn, A., Papenberg, N., Weickert, J. In: Lecture Notes in Computer Science.
Volume 3024. Springer, Berlin (2004) 25–36
23. Luxburg, U.V.: A tutorial on spectral clustering. Statistics and Comp. 14 (2007) 395–416
24. Hopcroft, J., Ullman, J.: Introduction to Automata Theory Languages and Computation.
Addison Wesley (1979)
25. Oncina, J., García, P.: Identifying regular languages in polynomial time. World Scientific Publishing (1992)
26. Rabin, M.O.: Probabilistic automata. Information and Control 6(3) (1963) 230–245
27. Vidal, E., Thollard, F., de la Higuera, C., Casacuberta, F., Carrasco, R.C.: Probabilistic finite-state machines – Part I and II. IEEE Trans. Pattern Anal. Mach. Intell. 27(7) (2005) 1013–1039
28. Dupont, P., Denis, F., Esposito, Y.: Links between probabilistic automata and hidden markov
models: probability distributions, learning models and induction algorithms. Pattern Recognition 38(9) (2005) 1349–1371
Learning Saliency. An ICA based model using
Bernoulli mixtures.
Andrea Carbone and Fiora Pirri
Dept. of Computer and System Sciences, Sapienza, e-mail: [email protected]
Abstract In this work we present a model of both the visual input selection and the
gaze orienting behaviour of a human observer undertaking a visual exploration task
in a specified scenario. Our method builds on a real set of gaze-tracked points of fixation, acquired from a custom designed wearable device [18]. By comparing these
sets of fovea-centred patches with a randomly chosen set of image patches, extracted
from the whole visual context, we aim at characterising the statistical properties and
regularities of the selected visual input. While the structure of the visual context is
specified as a linear combination of basis functions, which are independent hence
uncorrelated, we show how low level features affecting a scan-path of fixations can
be obtained by hidden correlations to the context. Samples from human observers
are collected both in free-viewing and surveillance-like tasks, in the specified visual scene. These scan-paths show important and interesting dependencies on the
context. We show that a scan-path, given a database of a visual context, can be suitably induced by a system of filters that can be learned by a two stages model: the
independent component analysis (ICA) to gather low level features and a mixtures
of Bernoulli distributions identifying the hidden dependencies. Finally these two
stages are used to build the cascade of filters.
1 Introduction
Perceptual (biological) systems are designed through natural selection, evolving optimally in response to the distribution of natural visual cues perceived from the environment. The knowledge of the statistical properties and regularities of the visual
environment is a pivotal step in the understanding of the nature of visual processing
[11, 7, 12]. Most natural visual tasks involve selecting a certain amount of locations in the visual environment to fixate. This visual scanning of the world - the
scan-path - is performed by human beings in a very efficient way by programming
a sequence of saccades on the visual array in order to project the selected spatial
focus of attention onto the higher resolution area of the retina (fovea).
The strategy followed by humans in deploying the mechanism of visual attention
has been subject of research in neuroscience, cognitive science and lately computer
vision. It has inspired novel biologically based methods for image compression, visual search, vision based navigation and all the areas of research in artificial systems
where a preliminary selection of the area of interest, in a restricted portion of the input, helps in reducing the complexity of further processing. The principle,
underlying this approach, relies on a generic notion of visual saliency. This notion
presupposes that the visual interestingness of a scene is a quantifiable entity encoding the task-relevant, context-based information embedded in the visual world.
In general, saliency has been modelled as a function on some feature space computed on the image. Several approaches have considered the problem of quantifying, in a biological justified framework, a measure of visual salience. These include
the approaches inspired by the Feature Integration Theory [31], and engineered to
model the natural competition between bottom-up cues, such as local measure of
centre-surround contrast on feature channels (i.e. orientation, colour opponency, luminance) [16][8]. Likewise a measure of salience is obtained by modelling features
tuned to specific visual search tasks [33]. We recall also the approaches accounting
for a top-down bias towards current task, like spatio-temporal locations or high level
cues [32]. In Section 4 we further compare with other approaches.
Our approach exploits the sparseness of the distribution of the selected fixations
of a human scanpath according to a set of linear filters generated by ICA decomposition of the visual environment. By purposely choosing a threshold over the sparse
feature vector characterising the scanpath, we map the originally continuous responses to their corresponding binary (discrete) representations from which a mixture of multivariate Bernoulli distributions is estimated. The goal of the mixture is
to capture the residual dependencies existing between active responses elicited by the visual selection of the observer.
The model, that we derive from the mixture, computes a saliency map of the image, where high saliency values represent regions which are likely to have a statistical similarity to the ones occurring in the reference scan-path.
2 A new approach to visual gaze modelling.
The above considerations motivate our approach to the problem of characterising
the visual behaviour of an observer. The goal of this work is to investigate into the
nature of visual selection, in terms of its statistical properties, when it is projected
onto an ICA estimated feature space. The steps of our approach are:
a. Sampling and modelling the visual context: to build a set of images carrying
the information of the visual content of a specific scenario context. The context
model is then derived by computing basic linear ICA, on a set of randomly selected samples from the database.
b. Recording and Projecting the scan-path on the context bases: the actual gaze behaviour of a freely viewing subject is represented as a collection of gaze-centred image patches. The scan-path is then projected on the ICA feature set computed in the first step.
Fig. 1 A small collection of images sampled from the internals of the building hosting our department.
c. Prototyping the fixation sequence: to estimate the hidden dependencies via a
Mixture of Bernoulli distributions on a thresholded version of the ICA-projected
scan-path.
d. Synthesis of a computational system: to build the filters cascade by transforming
the mixtures parameters into a computational model that can be used to predict
the learned gaze behaviour.
Sampling the visual context.
Our work is closely related to the natural image statistics domain [14]. In literature,
natural images are defined as: “those that are likely to have similar statistical structure to that which the visual system is adapted to, during its evolution” [9, 20, 10].
The term natural in our context may sound misleading as it generally refers to collection of images of natural landscapes. In our scope we consider natural images
as those characterising the visual context of the observer. For example a collection
of pictures of the internals of a building, or the visual landscape of a surveillance
inspector. See Figure 1 for an example.
Modelling the visual context. ICA and sparse coding.
A large amount of literature deals with the concepts of sparseness, efficient coding
and blind source separation. These three aspects are intimately related to each other
[23]. Sparseness is a statistical property meaning that a random variable takes both
small and large values more often than a normal density with the same variance.
Fig. 2 A subset of the 256 linear ICA features computed from a set of 25000 random patches sampled from the global visual context database.
A sparse code, then represents data with a minimum number of active units. The
typical shape of a sparse probability density distribution shows a peaked profile
around zero and long heavy tails. The sparseness of the response of the cortical
cells to the visual input [6] suggests the adoption of a computational framework suitable to discover the latent factors that represent the basis of an alternative space for properly encoding the visual data. The generative model we use in this work
is the linear independent component analysis or ICA [15]. The linear independent
component analysis models linear relations between pixels. In this model, any (greylevel) image patch I (x, y) can be expressed as a linear combination of a basis vector
Bi (sometimes called the mixing matrix):
n
I (x, y) = ∑ Bi (x, y)si
(1)
i=1
where the si are the stochastic coefficients different for each image. The si can be
computed from the image by inverting the mixing matrix:
si = ∑ Wi (x, y) I (x, y)
(2)
x,y
The Wi are called features or coefficients (because of the simple linear operation
between coefficients si and features Wi ). An example of computed ICA features can
be seen in Fig.2. The si are scalar values sparsely distributed (non Gaussian).
The coefficients Wi resemble the organisation of the simple cells in the primary
visual cortex V1 [19, 21] (i.e. a set of oriented, localised, bandpass filters). The
linear ICA computations presented in this work were realised with the FastICA
package [13].
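As a rough sketch of this pipeline, scikit-learn's FastICA can stand in for the FastICA package used by the authors: 256 components are learned from 32 × 32 context patches, and the rows of the unmixing matrix play the role of the features W_i of Eq. (2). The data below is a random placeholder for the context patch database, and the code is an assumption-laden illustration rather than the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import FastICA

patches = np.random.rand(5000, 32 * 32)       # placeholder for sampled context patches
ica = FastICA(n_components=256, max_iter=500)
ica.fit(patches)
W = ica.components_                           # 256 x 1024 unmixing (feature) matrix

def coefficients(patch, W):
    """Sparse stochastic coefficients s_i of a single patch (Eq. 2)."""
    return W @ patch.reshape(-1)
```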
Sampling the gaze. The scan-path.
A gaze tracked sample g(i) is acquired at each frame. The i-th gaze sample is defined
as the triple:
g^{(i)} = \langle p^{(i)},\, t^{(i)},\, f^{(i)} \rangle \qquad (3)
Fig. 3 A sample set out of the 470 filtered patches taken around fixations.
Here p(i) denotes the (x(i) , y(i) ) image plane coordinates of the gaze point, t(i) is the
time-stamp (in milliseconds) and f (i) the frame index. The full set of gaze samples
is defined as:
G = \{g^{(1)}, g^{(2)}, \ldots, g^{(k)}\} \qquad (4)
Here k is the number of samples taken. As we are interested in analysing the information sampled at the centre of gaze during a fixation we proceed to filter out from
the set G all those samples that are likely to occur at a saccade (the rapid eye movement between two consecutive fixations), via a non-parametric clustering problem.
We borrow from Duchowski the definition of fixation as a sustained persistence of
the line of sight in time and space [5]. In practice, a fixation is the centre of a spatial
and temporal aggregation of samples in a given neighbourhood. We use the mean
shift algorithm on the feature space spanned by the vectors in G (except the frame
index information which is not useful to cluster together samples belonging to the
same fixation). A similar approach has been presented in [28]. The output of the mean shift is a set of samples described by H = ⟨c, V⟩, where c denotes a centre (x, y, t) resulting from the mean shift and V is the patch centred in c. Therefore the full scan-path sequence

F = {H(1), H(2), . . . , H(l)}    (5)

contains only samples classified as fixation points. Results are shown in Fig. 4. Figure 3 shows a subset of gaze-centred patches from a scan-path.

Fig. 3 A sample set out of the 470 filtered patches taken around fixations.
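A minimal sketch of this fixation-detection step, assuming scikit-learn's MeanShift as the mean shift implementation; the bandwidth value is illustrative and would need tuning to the gaze data.

```python
import numpy as np
from sklearn.cluster import MeanShift

def detect_fixations(gaze_samples, bandwidth=30.0):
    """Cluster the (x, y, t) gaze samples with mean shift; each cluster centre is a fixation.

    gaze_samples: array of shape (k, 3) with image coordinates and time-stamps (ms);
    the frame index is deliberately left out, as it does not help grouping samples
    belonging to the same fixation.
    """
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels = ms.fit_predict(np.asarray(gaze_samples, dtype=float))
    return ms.cluster_centers_, labels      # centres c = (x, y, t) and a label per sample
```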
Scan path projection on the context bases.
Our model, defined over two stages, that is, the ICA model and the filter structure, is motivated by the recent works [22, 17]. In those approaches, however, only the correlation is identified, whereas here we show the strong relation between context and free gaze motion.
Fig. 4 Left: the gaze-machine used to acquire the scan-path. Right: the mean-shift clustered fixation points (in red) superimposed on the plot of the full gaze track (continuous line). The X, Y axes refer to the spatial image coordinates; the T axis represents the time-stamp of the gaze sample in milliseconds.

Fig. 5 Stochastic coefficients obtained from a scan-path using the mixing matrix inverse from the context.

The visual context model is defined by the mixing matrix inverse W ∈ R^{256×1024}, obtained from a specified context. A patch is a 32 × 32 grey-level image I. A scan-path, such as the one illustrated in Figure 3, is a filtered (with respect to motion blurring) sequence of N > 400 patches Ii, i = 1, . . . , N. From each scan-path the
stochastic coefficients sij are obtained as in Eq. (2); in particular:

sij = Wj Îi,    (6)

where Îi is the i-th patch reshaped into a 1024 × 1 vector and Wj is the j-th row (of dimension 1 × 1024) of the matrix W.
Let (si1, . . . , si256)⊤ be the stochastic coefficients obtained for the i-th patch Ii. A matrix S of dimension N × 256 collecting these stochastic coefficients is thus obtained; this is illustrated in Figure 5. The purpose of these coefficients is to induce, at the second level of our model, a system of filters that can reproduce human fixations, given a context. That is, the system of filters generated by our model, given a context, will highlight the regions that have been fixated by some scan-path.
Fig. 6 A short sequence of generated saliency maps, superimposed on the corresponding images.
Mixtures of Bernoulli.
Given the scan-path generated coefficients, these are thresholded to obtain a mixture
of Bernoulli distributions.
A threshold is defined as follows. Let S be the N × 256 coefficient matrix, each element of which is specified by equation (6), and si = (si1, . . . , si256) its i-th row, which we consider as a multivariate random variable. Then τ is an optimal threshold, setting sij = 1 or sij = 0, if for each row i the entropy of the multivariate Bernoulli ŝi is minimised. That is, if f(si, τ) = ŝi then

arg minτ (f, τ) = − ∑_j p(f(sij, τ)|µi) log p(f(sij, τ)|µi)    (7)
Here p(xi|µi), with µi = (µi1, . . . , µi256)⊤, is a Bernoulli distribution with parameter µi, and xi is a multivariate of dimension D whose values are 0 or 1:

p(xi|µi) = ∏_{j=1}^{D} µij^{xij} (1 − µij)^{(1−xij)}    (8)
Here D = 256. In other words, entropy minimisation ensures that the information carried by the stochastic variable si is not wasted by the transformation into a Bernoulli variable ŝi. The computation of the threshold τ is achieved by an iterative procedure that initialises τ to the sample mean of the multivariate si, for each row i, and then uses a classical gradient method to find the τ that minimises the entropy of each obtained Bernoulli multivariate, given the local minimal thresholds τi.
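A minimal sketch of the binarisation: each row of S is thresholded into a multivariate Bernoulli sample; here the threshold is simply initialised at the sample mean of the absolute coefficients of the row, which is the starting point of the iterative procedure described above (the entropy-driven refinement of τ is omitted).

```python
import numpy as np

def binarise_coefficients(S):
    """Threshold the N x 256 coefficient matrix into binary Bernoulli samples.

    tau_i is initialised at the sample mean of |s_i|, the starting point of the
    entropy-minimising procedure of Eq. (7); the iterative refinement is omitted.
    """
    A = np.abs(S)
    tau = A.mean(axis=1, keepdims=True)   # one initial threshold per row
    return (A > tau).astype(np.uint8)     # the binary variables \hat{s}_i
```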
A mixture of Bernoulli multi-variates ŝi is defined as
b(ŝi|µ, π) = ∑_{k=1}^{K} πk p(ŝi|µk)    (9)

Here πk is the weight of the k-th mixture component, with ∑k πk = 1, and p(ŝi|µk) is
the Bernoulli distribution as specified in equation (8). The number of components of the mixture has been estimated analogously, using the entropy minimisation criterion, but on the mixture b. In fact, as the number of components of the mixture increases, the distribution tends to a uniform distribution, thus maximising the entropy. The optimal size for the samples considered was given by either K = 6 or K = 7 components. The parameters of the mixture have been estimated by implementing, in Matlab, the expectation-maximisation (EM) algorithm for Bernoulli mixtures as reported in [2].
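A sketch of the EM updates for a Bernoulli mixture, following the standard formulation in Bishop [2], written here in Python rather than the Matlab implementation mentioned above; K and the iteration count are illustrative.

```python
import numpy as np

def bernoulli_mixture_em(X, K=6, n_iter=100, seed=0):
    """EM for a mixture of multivariate Bernoulli distributions (cf. Eq. 8-9 and [2]).

    X: (N, D) binary matrix of thresholded coefficients.
    Returns the mixing weights pi (K,) and the mean vectors mu (K, D).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, D))
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for numerical stability
        log_r = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and means
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
    return pi, mu
```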
Building the model.
The parameters of interest for our model are both the weights and the mean vectors. Thus, for our sample case, these are the expectation vectors µk = (µk1, . . . , µk256), k = 1, . . . , K, and the priors πk, both estimated by the EM. These parameters indicate where the si depend on the context, which is coded in the mixing matrix inverse W. Indeed, it is reasonable to expect that whenever a µij, related to si via ŝi, has a high value, the information of the associated channel in W has a specific importance, as a context, for the scan-path coefficient.
Therefore, we choose from each of the K (the number of mixture components) means µk = (µk1, . . . , µk256) those values which are greater than a specified threshold, requiring that for each component k = 1, . . . , K an approximately uniform number of µkj, j = 1, . . . , 256, is chosen. More specifically, let σk be a threshold and nk = #{µkj | µkj > σk}/D, with D = 256. We aim at choosing a similar number of channels for each component; the maximum entropy principle ensures an approximately uniform distribution over these numbers. Thus, let h(nk, σk) be a function that, given K and µk = (µk1, . . . , µk256), returns for each µk the number of µkj chosen in each vector, according to a given threshold σk, k = 1, . . . , K.
Then we want to choose a threshold that makes the choice unbiased, that is, one that relies on the principle of maximum entropy, or Laplace's principle of indifference:

arg maxσ (h, σ) = − ∑_{k=1}^{K} h(nk, σk) log h(nk, σk)    (10)
Now, let the chosen mean elements of each mean vector µk , of the k-th component,
be denoted by µ̂k .
The chosen means, as gathered above, indicate the important context dependency,
according to the channel. In fact, each chosen µk j is related to a specific channel
of the matrix W , the mixing matrix inverse, that specifies the linear dependency
between the images of the context and the basis functions. Let (kj1, . . . , kjm), k = 1, . . . , K, be the indices of the channels of W corresponding to the selected µ̂k; then for each component k we can establish the following correspondence:

Wk = (Wkj1, . . . , Wkjm)  iff  µ̂k = (µkj1, . . . , µkjm)    (11)
Here Wk is a sub-matrix of W, in which only the channels (kj1, . . . , kjm), k = 1, . . . , K, have been chosen, and each Wkji, i = 1, . . . , m, is a column vector of the matrix Wk. Hence, for each component k, the matrix Wk is formed by those channels corresponding to the chosen µ̂k.
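A sketch of the channel selection of Eq. (11): for each component k the channels whose means are largest are retained and the corresponding rows of W form the sub-matrix Wk; a fixed number of channels per component stands in here for the maximum-entropy choice of the thresholds σk.

```python
import numpy as np

def select_channels(mu, W, n_channels=20):
    """Return, for each mixture component, the sub-matrix W_k of Eq. (11).

    mu: (K, 256) mean vectors estimated by EM;
    W:  (256, 1024) mixing matrix inverse, one channel (feature) per row.
    """
    sub_matrices = []
    for k in range(mu.shape[0]):
        idx = np.argsort(mu[k])[::-1][:n_channels]   # channel indices (k_j1, ..., k_jm)
        sub_matrices.append(W[idx])
    return sub_matrices
```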
We are now ready to build the filtering system that shall lead to the construction
of the scan-path saliency map.
Fig. 7 The computational model induced by the learned mixture of Bernoulli distributions. Each of the K channels is characterised by a combination of a subset of the original ICA features. Qk is the output of the k-th channel. The saliency map is the weighted sum of the channels' outputs via the mixture mixing coefficients π.
Let (I1 , . . . , IN ) be a sequence of images taken from the context. The saliency
map, induced by the context, is defined as follows, for each image Ir , r = 1, . . . , N.
Qrk = ∏i (Ir ⋆ Wkji)
∆r = ∑k πk ‖Qrk‖    (12)

Here ‖Qrk‖ is the normalised version of Qrk, r indicates the r-th image in the sequence from a context, and i = 1, . . . , m indexes the channels of W chosen for the k-th component by µ̂k; thus Qrk is the linearly filtered version of the input image, obtained by correlation with the features selected for the k-th channel. Finally, ∆r is the saliency map of the image Ir. To obtain the results illustrated in Figure 6, a Gaussian filter is applied to the saliency map, to eliminate the noise induced by the product and sum in eq. (12), and the saliency map is superimposed on the current frame H(i).
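A sketch of the saliency computation of Eq. (12), assuming SciPy for the correlation and the final Gaussian smoothing; the smoothing width and the normalisation by the maximum are illustrative choices.

```python
import numpy as np
from scipy.signal import correlate2d
from scipy.ndimage import gaussian_filter

def saliency_map(image, sub_matrices, pi, patch_size=32, sigma=5.0):
    """Context-induced saliency map of one grey-level image (cf. Eq. 12).

    sub_matrices: list of K arrays (m_k, patch_size**2), the selected channels W_k;
    pi: (K,) mixture weights estimated by the EM.
    """
    delta = np.zeros_like(image, dtype=np.float64)
    for k, Wk in enumerate(sub_matrices):
        Qk = np.ones_like(delta)
        for w in Wk:                                       # product over the selected channels
            kernel = w.reshape(patch_size, patch_size)
            Qk *= correlate2d(image, kernel, mode="same", boundary="symm")
        Qk = np.abs(Qk)
        if Qk.max() > 0:
            Qk /= Qk.max()                                 # a simple stand-in for ||Q_rk||
        delta += pi[k] * Qk
    return gaussian_filter(delta, sigma=sigma)             # remove the noise induced by Eq. (12)
```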
3 Experiments
The first step that we followed in order to model the experimental visual environment was to collect a set of views taken from the interior of a building. In this work we have chosen to take pictures of the Department of Computer and System Sciences building in Rome. We collected a set of 126 images representing the global content of visual information that people visiting or working in the building are likely to sense. The views contain sample images of different sub-contexts: corridors, rooms, laboratories, closets, doors. Pictures depicting the same sub-context were taken at different scales (i.e. from closer or farther viewpoints) and different angles. Fig. 1 shows a subset of pictures selected from the database and Fig. 2 shows its ICA decomposition.
Subjective scan-paths have been recorded from three human subjects instructed to perform a two-stage task: initially free-viewing inside a room, and then walking out of the room and following a path in a corridor-like environment.
We tested the quality of the saliency map computed by the model by comparing the cumulative saliency score of the human scan-paths with the one computed from a randomly generated scan-path. The saliency score Sal_f related to a specific fixation and a saliency map ∆ is defined as a Gaussian-weighted sum of the local saliency values on the neighbourhood (of the same size as the original patches) of the fixation coordinates. The cumulative saliency score induced by a scan-path is the sum of the individual scores over all of the fixations.
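A sketch of this evaluation measure: the Gaussian-weighted local saliency around each fixation is accumulated along a scan-path, so that a human scan-path can be compared with a randomly generated one; the neighbourhood size matches the patch size, while the Gaussian width is illustrative.

```python
import numpy as np

def fixation_score(saliency, x, y, size=32, sigma=8.0):
    """Gaussian-weighted sum of the saliency values in a patch-sized neighbourhood of (x, y)."""
    h, w = saliency.shape
    half = size // 2
    ys = np.arange(max(0, y - half), min(h, y + half))
    xs = np.arange(max(0, x - half), min(w, x + half))
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    weights = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
    return float((saliency[yy, xx] * weights).sum())

def cumulative_score(saliency_maps, fixations):
    """Cumulative saliency score of a scan-path: sum of the scores of its fixations.

    fixations: iterable of (frame_index, x, y); saliency_maps: list of 2D arrays Delta_r.
    """
    return sum(fixation_score(saliency_maps[f], x, y) for f, x, y in fixations)
```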
The results clearly show that our model rewards human-performed scan-paths, whereas random sequences of fixations gain much lower scores. This result is in line with those obtained, for example, in [25], where the statistics computed over a random set of fixations are compared with natural ones, yielding different results. The second evaluation measure, instead, takes into account the average Euclidean distance of the fixation point from the centroid of the maximal saliency region.
Fig. 8 Human vs random results: Saliency Score and Distance to maximum saliency value.
4 Related Works and Conclusions
The role that central vision (i.e. the highly detailed visual information projected on the neighbourhood of the centre of gaze during a fixation) plays in visual processes is intimately linked to the understanding of the relationships between action, visual environment statistics, previous knowledge and the actual scan-path performed (i.e. the sequence of spatio-temporal fixations). To our best knowledge, the closest methodology to our approach
can be found in [27] and [26], where the fixations are collected from a moving observer, albeit immersed in a virtual environment whose second-order statistics (the so-called 1/f noise) resemble those that can be measured in real natural environments. Bruce and Tsotsos [3, 4]
propose a bottom-up strategy relying on a definition of saliency aimed at maximising Shannon's information measure after ICA decomposition. They use a database of patches randomly sampled from a set of natural images. The saliency model is then validated against eye-tracked data captured from laboratory experiments (recorded
video and still images). In [24] the root mean square contrast is evaluated on a set of
fixation points performed by an observer looking at static natural (in this case natural
landscapes) images. They derive a saliency model based on the minimisation of the
total contrast entropy. Reinagel and Zador in [25] study the effect of visual sampling
by analysing contrast and grey-level correlations in the fovea and para-fovea (the area surrounding the fovea, sensed at lower resolution) regions. In [34], the authors model the distribution of contrast and edges
on gaze-centred image patches with a Weibull probability density function under the
assumption that in a free-viewing context our gaze is drawn toward image regions whose local statistics differ from the rest of the image. Tatler and Baddeley in [1] go through a deep discussion on determining which characteristics are most likely to influence the choice of the regions to fixate. They focus on local statistics of luminance, contrast and edges. The derived model highlights a preference for high-frequency edges. In [29] the authors observe the generic characteristics of the point of fixation conditioned on the magnitude of the saccade performed. Second-order statistical regularities emerging in categories of natural images can be exploited as descriptors for classifying the kind of environment depicted [30].
We showed an approach aimed at modelling the visual selection of a generic observer from a real scan-path performed in a given environment. The rich information content of the visual environment is encoded in a set of feature bases which capture the linear correlations between the images; the actual scan-path is then projected onto the feature space representing the context.
The research is supported by the EU project NIFTI, n. 247870.
References
1. Baddeley, R., Tatler, B.: High frequency edges (but not contrast) predict where we fixate: A
bayesian system identification analysis. Vision Research 46, 2824–2833 (2006)
2. Bishop, C.M.: Pattern recognition and machine learning (information science and statistics)
(2006)
3. Bruce, N., Tsotsos, J.K.: An information theoretic model of saliency and visual search. Lecture
Notes in Computer Science 4840, 171 (2007)
4. Bruce, N.D.B., Tsotsos, J.K.: Saliency, attention, and visual search: An information theoretic
approach. J. Vis. 9(3), 1–24 (2009)
5. Duchowski, A.: Eye Tracking Methodology: Theory and Practice. (2007)
6. Field, D.: What is the goal of sensory coding? Neural Computation (1994)
7. Field, D.J.: Relations between the statistics of natural images and the response properties of
cortical cells. J Opt Soc Am A 4(12), 2379–2394 (1987)
8. Frintrop, S., Klodt, M., Rome, E.: A real-time visual attention system using integral images.
Proc. of ICVS (2007)
9. Geisler, W.S.: Visual perception and the statistical properties of natural scenes. Annu. Rev.
Psychol. 59, 167–192 (2008)
10. Geisler, W.S., Ringach, D.: Natural systems analysis. Visual neuroscience 26(1), 1–3 (2009)
11. Gibson, J.J.: Perception of Visual World. (1966)
12. Gibson, J.J.: The Senses Considered as Perceptual Systems. (1983)
13. Hyvarinen, A.: Fast and robust fixed-point algorithms for independent component analysis.
IEEE Transactions on Neural Networks 10(3), 626–634 (1999)
14. Hyvarinen, A., Hurri, J., Hoyer, P.O.: Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, vol. 39 (2009)
15. Hyvarinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural
networks 13(4-5), 411–430 (2000)
16. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254–1259
(1998)
17. Karklin, Y., Lewicki, M.: Learning higher-order structures in natural images. Network: Computation in Neural Systems 14(3), 483–499 (2003)
18. Marra, S., Pirri, F.: Eyes and cameras calibration for 3d world gaze detection. pp. 216–227
(2008)
19. Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning
a sparse code for natural images. Nature 381(6583), 607–609 (1996). DOI 10.1038/381607a0
20. Olshausen, B.A., Field, D.J.: Natural image statistics and efficient coding. Network: Computation in Neural Systems 7(2), 333–339 (1996)
21. Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research (1997)
22. Park, H.J., Lee, T.W.: Modeling nonlinear dependencies in natural images using mixture of
laplacian distribution. (2004)
23. Pece, A.: The problem of sparse image coding. Journal of Mathematical Imaging and Vision
(2002)
24. Raj, R., Geisler, W., Frazor, R., Bovik, A.: Natural contrast statistics and the selection of
visual fixations. Image Processing, 2005. ICIP 2005. IEEE International Conference on 3, III
– 1152–5 (2005). DOI 10.1109/ICIP.2005.1530601
25. Reinagel, P., Zador, A.: Natural scene statistics at the centre of gaze. Network: Computation
in Neural Systems 10(4), 341–350 (1999)
26. Rothkopf, C.A., Ballard, D.H.: Image statistics at the point of gaze during human navigation.
Visual neuroscience 26(01), 81–92 (2009)
27. Rothkopf, C.A., Ballard, D.H., Hayhoe, M.: Task and context determine where you look.
Journal of Vision 7(14), 12 (2007)
28. Santella, A., DeCarlo, D.: Robust clustering of eye movement recordings for quantification of
visual interest. ETRA ’04: Proceedings of the 2004 symposium on Eye tracking research &
applications (2004)
29. Tatler, B., Baddeley, R., Vincent, B.: The long and the short of it: Spatial statistics at fixation
vary with saccade amplitude and task. Vision Research 46(12), 1857–1862 (2006)
30. Torralba, A., Oliva, A.: Statistics of natural image categories. Network: Computation in Neural
Systems 14(3), 391–412 (2003)
31. Treisman, A.M., Gelade, G.: A feature-integration theory of attention. Cognitive Psychology
12, 97–136 (1980)
32. Tsotsos, J.K., Culhane, S., Wai, W.K., Lai, Y., Davis, N., Nuflo, F.: Modeling visual attention
via selective tuning. Artificial intelligence 78(1-2), 507–545 (1995)
33. Wolfe, J.M., Cave, K.R., Franzel, S.L.: Guided search: an alternative to the feature integration
model for visual search. Journal of experimental psychology. Human perception and performance 15(3), 419–433 (1989)
34. Yanulevskaya, V., Geusebroek, J.M., Marsman, J.B.C., Cornelissen, F.W.: Natural image
statistics differ for fixated vs. non-fixated regions. (2008)
Learning cross-modal translatability: grounding speech act on visual perception

Mario Gianni1, Geert-Jan M. Kruijff2, and Fiora Pirri1
1 Dipartimento di Informatica e Sistemistica, Sapienza Università di Roma
2 Language Technology Lab, German Res. Center for Artificial Intell. (DFKI GmbH)
January 21, 2011
The problem of grounding language on visual perception has nowadays been investigated under different approaches; we refer the reader in particular to the works of [7, 11, 3, 13, 2, 10, 12, 6, 5, 1]. The inverse problem, that is, building the semantics/interpretation of visual perception via speech acts, is less investigated. In this abstract we face the two problems simultaneously, by learning both the language and its semantics through human-robot interaction. We describe the progress of current research facing the problem of simultaneously grounding parts of speech and learning the signature of a language for describing both the action and state space, while actions are executed and shown in a video. Indeed, having both a language and a suitable semantics/interpretation of objects, actions and state properties, we will be able to build descriptions and representations of real-world activities under several interaction modalities.
Given two inputs, a video and a narrative, the task is to associate a signature and an interpretation with each significant action and the afforded objects in the sequence, and to infer the preconditions and effects of the actions so as to interpret the chronicle, explaining the beliefs of the agent about the observed task.
We start, thus, with two sets of observations: the set {Y}_{n=1}^{N} of speech-acts and the set {D}_{h=1}^{K} of descriptors of the action and object space, both suitably extracted from the audio and video sequence (there are several methods to do that; for the visual sequence here we mention [9]). There are two sets of hidden data, namely the speech-act labels {X}_{i=1}^{N} and the properties {P}_{j=1}^{H} induced by actions, specifying how actions dynamically change both what is visible and what can be reported. The hidden variables P are indexed by time, and the hidden speech-act labels X are indexed by time and contextual links. We call these indices the states; thus, for all visual states j ∈ S there exists a cluster of contextual links {j1, . . . , jk} formed by S_k specifying a neighbour system for the speech-act labels.
The (simplified) dependency relation among the random variables is as follows. Speech-acts are independent of
any other visual state, given the state at which the commented action is uttered, induced by the visual stimuli.
The action descriptors are independent of both the speech-acts and the other visual states, given the state at which the action is expected. The interpretation of each phrase (the labels) is independent of any visual state given the time at which the action is seen, and it depends on the interpretation of other speech-acts only via the neighbouring system. The specific dependencies of these variables are represented in Figure 1, right.
The variables interplay is accounted for by the interaction of these two different processes, during an experiment.
Here an experiment is specified by a task, like pouring some water from a jug to a glass, visually represented by
a video, which is described by a narrative. The narrative refers to both the simple actions and objects space and
to the beliefs about what is going on in the scene. The narrative, however, goes beyond the direct denotation as it
describes the beliefs concerning preconditions and effects of actions on the afforded objects, in terms of temporal
and spatial relations, and eventually other deictic expressions. In this initial formalisation, learning the cross-modal translatability is achieved via the two mentioned processes. As gathered above, each process is defined by an observable and a hidden part.

Figure 1: Left: structure of the two double processes, where observations for both processes are modulated by descriptors obtained by early interpretation of speech, in the narrative, and of motion and salient objects in the scene. The hidden states of the HMRF are formed by parts of speech describing possible (multi-modal) states of the world. The observation parts are denotations. The hidden states of the HMM are unobserved properties (such as changing relations) induced by actions. Observations are actions modelled as mixtures of Gaussians. Right: details.

The first process serves the linguistic part and is defined by a hidden Markov random field (HMRF), capturing the qualitative-spatial structure of the multi-modal contexts of parts of
speech. As said above speech-acts are the observed random variables, while labels provide an interpretation, or
word contexts, and thus are the hidden part (see also [4]). The second process is a hidden Markov model with mixture-of-Gaussian observations (GHMM). Here the observations are the visual feature dynamics; these are represented by descriptors of the image sequence which, in turn, are obtained by attention-based interpretation
of light and motion features. From these processes we obtain a speech-act space, a configuration space, an action
space and a state space. The structure and connections of the two processes are schematically illustrated in Figure
1, left.
The HMRF defines a joint probability distribution of sequences of observations, i.e. the speech-parts, and sequences of labels, i.e. the language and interpretation of speech-parts. Assume a collection of states S K has been
learned, so that we have a double indexed graph. Observations are formed by a finite set of phrases {y1 , . . . , yn } j
(we do not consider here the speech analysis which is assumed to yield the correct association with a predefined
and recorded language corpus) having a d-dimensional term space Y indexed by S_K, and the random field p(y) is defined by an n-dimensional space Y = ∏_{n=1}^{K} Y_{jn}, where Y_{jn} = {y_{jn} | y_{jn} ∈ Σ_n}, with Σ the signature. The hidden
field is determined by labels defined by a finite space of terms X jm = {x jm |x jm ∈ L}, with L the language, i.e.
L = (Σ, D, I), that is, the language is defined by a signature, a domain and an interpretation. For example the
predicate Q(t1 , t2 ) is a 2-dimensional term, its interpretation is defined according to the HMRF by any suitable set
of pairs of objects whose denotation is specified by a term t ∈ Σn . Note that we are simply referring to a language
and an interpretation in terms of elementary structures, not models (in the logical sense). Models of the language
can be induced (see [9]) by the probabilistic relational structure, but are not treated here. The product space X
is the space of configurations of the labels. A probability distribution p(x) on the space of configurations of the
hidden labels is another random field; on this random field we define a neighbouring system δ that specifies how
labels form subgraphs affine to the time of the visual stimuli. These subgraphs are, then, incrementally extended
by the learning algorithm that we cannot describe here for lack of space. Labels specify via the neighbours a set
of available interpretations. The random field equipped with the neighbour system δ is a Markov random field iff
p(xi|x_{S\i}) = p(xi|δ(xi)), and the joint probability of the two processes is p(x, y) = p(y|x)p(x); this implies that any function f : X → R is supported by precisely the cliques of the graph, and p(xi|δ(xi)) = (1/Z) exp(∑_C V_C(xi)), with V_C = ∑_{1≤i≤n_C} λ_{Ci} f_{iC}(x) = λ_C f_C(x), where the λ_{Ci} ∈ R are the parameters of the model and f_C(x) ∈ {0, 1} are the features of the field. However, as gathered above, the joint probability also involves the HMM. In fact, speech-acts
(observations of the HMRF) are given in sequences and thus these are synchronised together with the action
descriptors, therefore they turn out to be also observations of the HMM (see the Figure 1 on the left), if their
lengths satisfy specific conditions (that is, they are terms not phrases).
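As a toy illustration of the log-linear form used for the hidden field, the sketch below computes the local conditional p(x_i | δ(x_i)) as a normalised exponential of clique potentials V_C = λ_C · f_C(x) with binary features; the feature function and the agreement features are invented for the example and are not part of the model described above.

```python
import numpy as np

def local_conditional(candidate_labels, neighbour_labels, features, weights):
    """p(x_i | delta(x_i)) for a log-linear MRF: exp of summed clique potentials, normalised.

    features(x, nbrs) must return a binary vector f_C(x) in {0,1}^d (illustrative signature);
    weights is the corresponding parameter vector lambda_C.
    """
    scores = np.array([weights @ features(x, neighbour_labels) for x in candidate_labels])
    scores -= scores.max()          # numerical stability before exponentiation
    p = np.exp(scores)
    return p / p.sum()              # the 1/Z normalisation

def agreement_features(x, nbrs):
    """Toy binary features: agreement of a candidate label with each neighbour label."""
    return np.array([float(x == n) for n in nbrs])
```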
For example, suppose that the task is to pour water into a glass. Then the video sequence is interpreted to
generate descriptors for the action space. The parameters of the HMM can be learned as usual, as long as we assume that observations are multivariate in R and states are in N. Suppose, instead, that by a suitable action space construction, from the video analysis, it is possible to build an action space and that states can be given an interpretation. Thus two more variables are involved, the denotation of variables and the inference of the state properties, as extracted from the speech-acts. Thus, let {α}_{i=1}^{M} be the generated action space and let S be the state space of the visually interpreted actions. At time t a phrase and sparse denotations will be uttered, in
the context of the observed scene. Thus the realization of the variables is p(y, D|st , kt , x)P(x|δ(x)). However the
graph topology is locally induced by the visual stimulus and the utterance. Learning the dependency between the
HMM states S and the HMRF states S_K is achieved by an incremental learning algorithm that closely follows
[8]. The difference is mostly on the initial steps. Here, instead of a normal distribution, the random field is built
as a set of cliques induced by the simultaneous association of descriptors and phrases.
In conclusion, labels are grounded by the narrative which, on the other hand, describes both pointwise actions and state changes, by speech acts explaining the action course and specific modalities concerning the time and space features of the action effects and preconditions. The connection of the two learning processes ensures both grounding and signature learning. For example, after the action is executed a specific change of spatial relations is the action effect, and a speech act shall serve to designate it. For this task the following objects require a relation to be established: the hand, the jug, the glass, the table and the water. The actions are: approaching the jug, grasping the jug handle, raising the jug, tilting the jug so that the water can pour out, putting down the jug. On
the other hand there is an infinite set of possible world states associated with these actions and objects. However
we are interested only in a finite state space, in which states are just those that can be specified in a finite time
lag. That is, those states that can be uttered by the narrator. For example “now the hand is grasping the glass”, or
“the glass now is on the table while before it was on the hand”, or “the glass is on the table and it is full of water
but before you filled it it was empty”. Similarly: “pouring the water into the glass has been successful, because
now the glass is full of water” and “you want to pour water in the glass because someone wants to drink it”.
The research is supported by the EU project NIFTI, n. 247870.
References
[1] S. R. K. Branavan, Harr Chen, Jacob Eisenstein, and Regina Barzilay. Learning document-level semantic properties from free-text
annotations. J. Artif. Intell. Res. (JAIR), 34:569–603, 2009. 1
[2] Peter Ford Dominey and Jean-David Boucher. Learning to talk about events from narrated video in a construction grammar framework.
Artif. Intell., 167(1-2):31–61, 2005. 1
[3] Peter Gorniak and Deb Roy. Grounded semantic composition for visual scenes. J. Artif. Intell. Res. (JAIR), 21:429–470, 2004. 1
[4] Pierre Lison, Carsten Ehrler, and Geert-Jan M. Kruijff. Belief modelling for situation awareness in human-robot interaction, 2010.
(submitted). 1
[5] Ingo Lütkebohle, Julia Peltason, Lars Schillingmann, Britta Wrede, Sven Wachsmuth, Christof Elbrechter, and Robert Haschke. The
curious robot - structuring interactive robot learning. In ICRA, pages 4156–4162, 2009. 1
[6] Raymond J. Mooney. Learning to connect language and perception. In AAAI, pages 1598–1601, 2008. 1
[7] Alex Pentland, Deb Roy, and Christopher Richard Wren. Perceptual intelligence: learning gestures and words for individualized,
adaptive interfaces. In HCI (1), pages 286–290, 1999. 1
[8] Stephen Della Pietra, Vincent J. Della Pietra, and John D. Lafferty. Inducing features of random fields. IEEE Trans. Pattern Anal.
Mach. Intell., 19(4):380–393, 1997. 3
[9] Fiora Pirri. The well-designed logical robot: learning and experience from observations to the situation calculus. Artif. Intell., to appear,
2010. 1, 2
[10] Deb Roy. Semiotic schemas: A framework for grounding language in action and perception. Artif. Intell., 167(1-2):170–205, 2005. 1
[11] Deb Roy and Alex Pentland. Learning words from sights and sounds: a computational model. Cognitive Science, 26(1):113–146, 2002.
1
[12] Paul E. Rybski, Jeremy Stolarz, Kevin Yoon, and Manuela Veloso. Using dialog and human observations to dictate tasks to a learning
robot assistant. Intel Serv Robotics, 1:159–167, 2008. 1
[13] Chen Yu and Dana H. Ballard. On the integration of grounding language and learning objects. In AAAI, pages 488–494, 2004. 1
AN APPROACH TO PROJECTIVE RECONSTRUCTION FROM MULTIPLE
VIEWS
A. Carrano, V. D’Angelo, S. R. F. Fanello, I. Gori, F. Pirri, A. Rudi
email: [email protected], [email protected]
Dipartimento di Informatica e Sistemistica
Sapienza Università di Roma
Rome, RM, Italy
ABSTRACT
We present an original multiple views method to perform
a robust and detailed 3D reconstruction of a static scene
from several images taken by one or more uncalibrated
cameras. Making use only of fundamental matrices we are
able to combine even heterogeneous video and/or photo sequences. In particular we give a characterization of camera
matrices space consistent with a given fundamental matrix
and provide a straightforward bottom-up method, linear in
most practical uses, to fulfil the 3D reconstruction. We also
describe shortly how to integrate this procedure in a standard vision system following an incremental approach.
Figure 1. Comparison of the trifocal tensor and our approach: in the first case correspondences across every view are necessary, in the second case only pairwise correspondences are needed.
KEY WORDS
Stereo Vision, 3D-Reconstruction, camera matrices space,
projective reconstruction, structure from motion
1 Introduction
Modelling visual scenes is a research issue in several fields:
finding the three-dimensional structure of an object, by
analyzing its motion over time, recognizing an object in
space, or just rendering a scene for visualization has many
interesting applications from industry to security, from TV
to media entertainment. In fact, efficiently computing
3D structure and camera parameters from multiple views
has attracted the interest of many researchers in the last
two decades. Since the early work of Faugeras, Zhang et al. [DZLF94, ZDFL95], which introduced the fundamental matrix to deal with the problem of multiple images of a three-dimensional object taken with an uncalibrated camera, several approaches have been considered. These approaches range from multilinear forms to multiple view tensors, so as to take into account all the constraints induced by multiple views. However, none of these approaches can be considered the final answer to the problem of reconstructing a scene from a number of its projective images, due to the intrinsic complexity and constraints of the problem. Thus, easy and computationally feasible methods are in great demand to obtain a good reconstruction of a scene.
The trifocal tensor was introduced in [Har97] and
the quadrifocal tensor by [Har98], connecting image measurements respectively along 3 and 4 views. A common
framework for multiple view tensors has been proposed in [Hey98].

Figure 2. Inferred dense cloud of points obtained by applying the described method to the topology shown in Figure 6.

The trifocal tensor can be estimated from at least 7 corresponding points in three images, while the quadrifocal tensor can be estimated from at least 6 corresponding points in 4 images. Thus, to use tensors it is necessary to have a certain number of points visible from each view. This can be achieved with quite good video sequences: for example, with at least 25 frames per second (fps), it is easy to find a set of images with a large percentage of common points. Nevertheless, in the general case only a certain number of views are available and the distance between different vantage points varies widely. It is thus difficult, in most situations, to obtain enough correct 3-correspondences or 4-correspondences to be able to apply trifocal or quadrifocal tensors.

Our method is less redundant than the tensor one, as it does not require taking extra views of an object to obtain the right 3- and 4-correspondences. Indeed, being based on the fundamental matrix, it is surely less constrained than tensors: while a fundamental matrix is always constructible when tensors are constructible, the converse does not hold.

For these reasons our method is closer to the human ability of choosing vantage points to mentally reconstruct a scene for the purpose of recognition. The human visual system, in fact, is able to generalize across viewpoints, and in some situations recognition is even view invariant. This human ability, deeply studied in neurophysiology and psychology [BG93, Ede95, FG02], shows that humans need few vantage points to image an object; in other words, in just a few views there is already enough information to perform the reconstruction. This means that the mental image of a familiar object, across views, exploits few correspondences among them; this suggests that it must be possible, in general, to obtain a reconstruction avoiding further constraints
required by the methods used. In this sense our method, being quite flexible, allows us to define a topology of views just from the similarities between the available ones, so that two images are connected if there is a good estimation of a fundamental matrix binding them. Using any reasonable topology we can always solve a 3D-reconstruction problem in a nonlinear way; furthermore, a linear solution for reconstruction is always possible using a compositional topology derived from an incremental approach.
The paper is organized as follows. In the next Section 2 we introduce some preliminary camera matrix concepts. In Section 3 we explain the projective reconstruction method and show some examples of the underlying topology. In Section 4 we describe the implementation of the method and illustrate an example of the complete metric reconstruction of Morpheus from the Matrix series (see Figure 2).
2 Preliminaries
Here we briefly recall some geometric concepts related to
camera matrices, we refer the reader to [HZ00] for an in
depth description. A perspective camera is modelled by
the projection: xi ∼ P Xi , where ∼ is equality modulo a
scale factor, Xi is a 4-vector denoting a 3D point in homogeneous coordinates, xi is a 3-vector denoting the corresponding 2D point and P is a 3 × 4 projection matrix
(from 3D to 2D space). P is factorized, in a metric space
as:
P = K[R | t]    (1)
Here K is the intrinsic parameters matrix, R is the orientation matrix and t is the camera position 3-vector. The
fundamental matrix for two views, capturing the correspondence between points x and x′, is a rank-2, 3 × 3 matrix such that

x′⊤ F x = 0   and, given P1 and P2,   F = [P2 c]× P2 P1⁺    (2)
Here P1⁺ is the pseudo-inverse of P1, c is the centre of the first camera, and [·]× indicates the anti-symmetric matrix of the vector product. The camera projective matrices P1 = [I | 0] and P2 = [[e′]× F | e′], where e′ is the epipole and e′⊤ F = 0, are the canonical cameras. The task of projective reconstruction is to find the camera matrices and the 3D points mapping to the points in each image. The estimation carries an intrinsic ambiguity in representation, since any set of camera matrices corresponds to the set obtained by right-multiplying both canonical cameras by an arbitrary non-singular 4 × 4 matrix.
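A small numerical sketch of these relations: F obtained from two projection matrices through the pseudo-inverse and the camera centre (the standard form of Eq. (2)), and the canonical pair recovered back from F; NumPy only, helper names are ours.

```python
import numpy as np

def skew(v):
    """Anti-symmetric matrix [v]_x such that [v]_x w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [P2 c]_x P2 P1^+, with c the centre of the first camera (P1 c = 0), cf. Eq. (2)."""
    _, _, Vt = np.linalg.svd(P1)
    c = Vt[-1]                                  # null vector of P1 = camera centre
    return skew(P2 @ c) @ P2 @ np.linalg.pinv(P1)

def canonical_cameras(F):
    """Canonical pair P1 = [I | 0], P2 = [[e']_x F | e'], where e'^T F = 0."""
    _, _, Vt = np.linalg.svd(F.T)
    e2 = Vt[-1]                                 # left epipole of F
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2
```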
If we consider three views (say P, P′ and P″) we can estimate the fundamental matrices F12, between the first and second view, and F23, between the second and third view, using the above results. However, a 3D point in a single image has 2 degrees of freedom, but in n images it has 3 degrees of freedom; thus there are 2n − 3 independent constraints between n views of a point and 2n − 4 for a line. Thus bilinear, trilinear and quadrilinear constraints are different. Hence, using multiple views brings in specific constraints, also due to the error propagation of a moving camera and to the fact that, inevitably, points become occluded as the camera view changes. Therefore, a certain point is only visible in a certain set of views. Using multiple views allows inference of the hidden dimension. For three and four views, as specified in the introduction, the trifocal and quadrifocal tensors solve the trilinear and quadrilinear constraints. The tensors stop at four views (see [HZ00]).
3 Projective Reconstruction
In this section we describe an approach to projective reconstruction that is, as far as we know, original, simpler than the trifocal tensor and nonetheless quite powerful. This method uses fundamental matrices only. First of all we need a necessary and sufficient condition for a pair of camera matrices P1 and P2 to be compatible with a given fundamental matrix F, which is almost linear and does not explicitly involve 3D projective transformations. Then we use this condition to build up a linear system to solve the projective reconstruction problem.
Two View Equation
Let F be a fundamental matrix; the space of all pairs of camera matrices P1, P2 compatible with F can be expressed as

λP1 = [I | 0] Z    (3)
µP2 = [[e′]× F | e′] Z    (4)

where e′ is the left epipole, Z is any full-rank 4 × 4 projective transformation matrix, and λ and µ are the scale parameters of P1 and P2 respectively, both free and non-null.
Letting (blocks stacked by rows)

H = [ [I | 0] ; [e′]× F | e′ ]   and   Y = [ λP1 ; µP2 ]    (5)

we can restate (3) and (4) as follows:

Y = HZ    (6)
Now H is a full-rank 6 × 4 matrix as long as the epipole e′ is not null, which is always true for non-degenerate F, and in terms of its column space it can be represented as

H = [h1 h2 h3 h4]

where the hi, 1 ≤ i ≤ 4, are linearly independent vectors. The space of the h's has dimension 6, hence there exist two vectors h5, h6, orthogonal to h1, . . . , h4, which belong to null(H⊤). Let

N = [h5 h6] = null(H⊤)   and   Z = [z1⊤ ; z2⊤ ; z3⊤ ; z4⊤]    (7)
Hence Y can be expressed as a linear combination of h1, . . . , h4:

Y = HZ = [h1 h2 h3 h4] [z1⊤ ; z2⊤ ; z3⊤ ; z4⊤] = h1 z1⊤ + h2 z2⊤ + h3 z3⊤ + h4 z4⊤.    (8)

Now, in order to avoid the explicit use of Z we express Y in terms of the null space of H⊤; in fact, multiplying both sides of (8) by N⊤ we obtain

N⊤ Y = N⊤ H Z = [h5⊤ ; h6⊤] (h1 z1⊤ + h2 z2⊤ + h3 z3⊤ + h4 z4⊤)
     = [h5⊤h1 z1⊤ + h5⊤h2 z2⊤ + h5⊤h3 z3⊤ + h5⊤h4 z4⊤ ; h6⊤h1 z1⊤ + h6⊤h2 z2⊤ + h6⊤h3 z3⊤ + h6⊤h4 z4⊤].

Since h5 and h6 are orthogonal to h1, . . . , h4, every term vanishes and, finally,

N⊤ Y = 0_{2×4}    (9)
This equation is equivalent to (6) and, indeed, the solution of (9) is exactly (6). Now we obtain P1 and P2 from Y as follows. Let N⊤ = [N1⊤ N2⊤]; we can write (9) as

[N1⊤ N2⊤] [λP1 ; µP2] = 0

thus we get the following, which we shall call the two view equation:

N1⊤ P1 + γ N2⊤ P2 = 0_{2×4}    (10)

with γ = µλ⁻¹ free. This is the equation we are searching for. Equation (10) imposes 8 constraints on P1, P2 with one free parameter γ, thus it holds the same information as the fundamental matrix F: in effect it has 7 constraints, as many as F's degrees of freedom.
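The two view equation can be checked numerically: stacking a compatible pair into the matrix H of (5), taking the two null-space vectors of H⊤ and verifying that the residual of (10) vanishes (with γ = 1 for the very pair used to build H). A self-contained sketch:

```python
import numpy as np

def two_view_equation_slices(P1, P2):
    """Return the slices N1^T, N2^T (each 2 x 3) of Eq. (10) for a compatible camera pair."""
    H = np.vstack([P1, P2])              # the 6 x 4 matrix of Eq. (5), with lambda = mu = 1
    _, _, Vt = np.linalg.svd(H.T)
    N = Vt[-2:].T                        # 6 x 2 basis of null(H^T)
    return N[:3].T, N[3:].T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = rng.standard_normal((3, 4))     # any second camera gives a compatible pair
    N1, N2 = two_view_equation_slices(P1, P2)
    print(np.abs(N1 @ P1 + N2 @ P2).max())   # ~1e-15: Eq. (10) with gamma = 1
```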
Projective Reconstruction System
Now we use the two view equation (see (10) above) for any couple of views equipped with a fundamental matrix, with the aim of intersecting the spaces of camera matrices in order to select only the satisfactory chain of views. We first analyze a particular, but prominent, case and then we give a general solution, which is nonlinear but can be handled through multiple linear refinements.
Four views reconstruction
Consider the case in which we have four views
P1 , P2 , P3 and P4 , as stated above.
Let Fij be the fundamental matrix relating views Pi and Pj, with (i, j) ∈ {(1, 2), (2, 3), (3, 4), (3, 1), (4, 1)}. Let Λij⊤ = (Λij1⊤, Λij2⊤) denote any of the following pairs: {(A1⊤, A2⊤), (B1⊤, B2⊤), (C1⊤, C2⊤), (D1⊤, D2⊤), (E1⊤, E2⊤)}. Then:

Λij⊤ = null([ I   ([e′ij]× Fij)⊤ ; 0⊤   e′ij⊤ ])    (11)
The graph illustrating the above described fundamental matrix connections Fij , is shown below, on the left.
Figure 3. Base topologies.
Now, choosing the initial view P1 as constant we obtain the following system:

A1⊤ P1 + λ A2⊤ P2 = 0
B1⊤ P2 + µ B2⊤ P3 = 0
C1⊤ P3 + ν C2⊤ P4 = 0
D1⊤ P3 + ρ D2⊤ P1 = 0        (12)
E1⊤ P4 + η E2⊤ P1 = 0
P1 = [I | 0]
We can note in (12) above that all scale parameters but two are arbitrary, since each Pi is defined modulo a scale parameter. We can, thus, further simplify (12) by setting λ to 1 in the first equation, and similarly we can set µ and ν. We obtain the system

A1⊤ P1 + A2⊤ P2 = 0
B1⊤ P2 + B2⊤ P3 = 0
C1⊤ P3 + C2⊤ P4 = 0
D1⊤ P3 + ρ D2⊤ P1 = 0        (13)
E1⊤ P4 + η E2⊤ P1 = 0
P1 = [I | 0]

which is a straightforward linear system.
Figure 4. Adding a new view, possible links are shown.
Analysis of the degrees of freedom
In general let n be the number of views and m the number
of relations found between views. The system will have
• Constraints: 8m due to the m equations, one for every
relation found; 12 due to the fixed P1 ; overall 8m +
12.
• Unknowns: 12n due to the n P s; m due to the scale
factors; −(n−1) due to the scale choice, one for every
P except the first; overall 12n + m − n + 1 = 11n +
m + 1.
To find a unique solution we should have

7m ≥ 11 (n − 1).    (14)
This is consistent with the number of dof of the P s
and F , in fact, every F contributes with 7 constraints, that
is, all information it has, and every P , except possibly the
first, with 11 unknowns, that is, all information it needs to
be instantiated.
General Case
For the general case views can be seen as the nodes of a
graph. The nodes are connected if the Fij , related to the
two views, have already been estimated. Broadly speaking,
if we do not pay attention to the topology of the graph we
can build a system as follows:

Aij1⊤ Pi + λij Aij2⊤ Pj = 0   for all i, j such that Fij exists
P1 = [I | 0]
which in general is a nonlinear system.
In the following sections we sketch an incremental approach taking into account the topology of the views graph,
in order to find a linear solution of the system.
Multiple View Equation

Let ⟨P̃1 . . . P̃n⟩ be the solution of the Projective Reconstruction System S with P1 fixed, as seen in the previous sections. We can generalize the two view equation in the following way. We know that the space of the P chains which solve S with P1 free is:

λ1 P1 = P̃1 Z
   ⋮                 (15)
λn Pn = P̃n Z

Here the λi are the scale factors and Z is the free 3D projective transformation. As in the two view equation let

Y = [λ1 P1 ; . . . ; λn Pn]   and   H = [P̃1 ; . . . ; P̃n]    (16)
Now we can restate equations (15) as

Y = HZ    (17)

from which we can obtain, as in the two view equation, the following result. Let N = null(H⊤), with N = [N1 ; . . . ; Nn]; then

N1⊤ P1 + ∑_{i=2}^{n} γi Ni⊤ Pi = 0

with γi free scale parameters; this is the Multiple Views Equation. Considering that H has dimension 3n × 4, N has dimension 3n × (3n − 4) and so the Ni, which are the n slices of N, have dimension 3 × (3n − 4). Thus the multiple views equation has (3n − 4) × 4 constraints and n − 1 free parameters.
We show, now, how to use this equation to develop
an incremental bottom-up linear approach. In fact, starting
from the graph, we first solve the system for all 4-elements
subgraphs, which form the base case discussed above and
then, using the following two methods, we add new nodes
and glue subgraphs.
Adding a new double-connected view to an already solved graph. Let S = ⟨P̃1 . . . P̃n⟩ be the solution of a projective reconstruction system with n views. Let Pnew be a new view to be added to S and let

M1⊤ P1 + λ M2⊤ Pnew = 0   and   L1⊤ Pnew + µ L2⊤ P2 = 0

be the two view equation of the F between P1 and Pnew and the two view equation of the F between Pnew and P2. We can build the system

M > P1 + M2> Pnew = 0


 L>1 P
>
1 new + µL2 P2 = 0
P1 = P̃1



P2 = P̃2
where λ has been set to 1 to fix the Pnew scale. The topology of this system, linear and simply solvable, is illustrated
in Figure 4.
Connecting two graph already solved.
be the solution of the first system and
N1> Q1 + γ2 N2> Q2 + γ3 N3> Q3 +
n
X
D
E
Let P̃1 ...P̃n
Figure 6. Topology used.
let
γi Ni> Qi = 0
D
E
Let P̃1 ...P̃n be the solution of the first system and
N1> Q1 + γ2 N2> Q2 + γ3 N3> Q3 +
i=4
the multiple view equation related to the second system,
then the two graphs can be connected in a linear way trough
3 links as follows
Pn
 >
N1 Q1 + γ2 N2> Q2 + γ3 N3> Q3 + i=4 γi Ni> Qi = 0



>
>
A1 Q1 + λA2 P̃1 = 0
>
>
B

1 Q2 + µB2 P̃2 = 0


>
>
C1 Q3 + νC2 P̃3 = 0
we set all γs to 1 in order to set the scale of every Qs except
the first, and λ to 1 to set the scale of Q1 . Then the system
is linear.
n
X
γi Ni> Qi = 0
i=4
M1> R1 + η2 M2> R2 + η3 M3> R3 +
n
X
ηi Mi> Ri = 0
i=4
be the multiple view equation related to the second and the
third graphs. Considering the system with the five links,
arranged as shown on the right of Figure 5, we have
Pn

N1> Q1 + γ2 N2> Q2 + γ3 N3> Q3 + Pi=4 γi Ni> Qi = 0


n

>
>
>

M1 R1 + η2 M2 R2 + η3 M3 R3 + i=4 ηi Mi> Ri = 0



>
>

A1 Q2 + λA2 P̃1 = 0

B1> Q3 + µB2> P̃2 = 0



C1> R1 + αC2> P̃3 = 0




D1> R2 + βD2> P̃4 = 0


E1> R3 + ρE2> Q1 = 0
in which we set all γs and ηs to 1 in order to set the scale
of every Qs except the first, and every Rs except the first.
Further we set ρ to 1 for the scale of Q1 , with respect to
R3 , and α to 1 to set the scale of R1 with respect to P3 .
Then the system is linear.
4
Figure 5. The figure illustrates the possible links between two or
three solved graphs.
Connecting three graphs already solved with few links
In this case, the topology is developed as follow: the first
graph has two links to the second and two to the third, then
there is a link between the second and the third.
Implementation
In this section we show via an example how with just few
views, considered salient for a good reconstruction, our
method proves its strength. The example we use here is
obtained from a number of images of Morpheus from the
Matrix series, illustrated in Figure 6, showing also the chosen topology for the path connecting four views. The complete metric reconstruction follows the steps listed below.
1. For feature extraction we use the scale-invariant feature transform (SIFT) [Low04].
2. For feature matching we base the correspondence between pairs of features in two adjacent views (according to the topology illustrated in Figure 6) on the shortest Euclidean distance in the invariant feature space.
3. The fundamental matrices are estimated iteratively
with Random Sample Consensus (RANSAC) [FB81,
RFP08] and a simple optimization, as follows:
• The Fij , for the five pairs indicated in Figure 6,
are estimated with the 8-points algorithm.
• Inliers are computed, using the estimated Fij , iteratively up to convergence.
• The Fij are re-estimated using inliers, minimizing a cost function based on Sampson error.
• New correspondences are identified using the
Fij found at the previous step.
4. Given the Fij , the camera matrices P1 , . . . , P4 are
obtained according to the method described in Section 3 following the specified topology. Moreover
P1 , . . . , P4 undergo a repolishment step via nonlinear
least squares in order to minimize the reprojection error
5. Given the camera matrices (the views) P1 , . . . P4 , errors between the measured point xij in view Pi and
the reprojection Pi Xj (see Section 2) are minimized
to produce a jointly optimal 3D structure and calibration estimate, by bundle adjustment [TMHF99].
6. The metric reconstruction is obtained, thus, with a
rectifying homography H from auto-calibration constraints as (Pi H, H −1 Xj ) as described in [HZ00].
7. With rectification new camera matrices are determined to obtain coplanar focal planes.
8. Finally, the dense disparity map, illustrated in Figure 7, is obtained from the correspondences between each pixel of image pairs, see [HZ00], while the dense 3D reconstruction shown in Figure 2 generates a dense cloud of points using an optimal triangulation algorithm.

Figure 7. Dense disparity map.

A repolishment step is needed after camera matrix estimation in order to reduce errors due to fundamental matrix measurements. Nevertheless we observe that this step is generally very fast due to the proximity of the linear solution to the best one, as we see in the next section.
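As an illustration of steps 1–3 of this pipeline, the sketch below uses OpenCV's SIFT, ratio-test matching and RANSAC estimation of the fundamental matrix; it follows the spirit of the procedure above rather than its exact implementation, and assumes the images are already loaded as grey-level arrays.

```python
import cv2
import numpy as np

def fundamental_between(img1, img2, ratio=0.75):
    """Steps 1-3 in miniature: SIFT features, ratio-test matching, RANSAC F estimation."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]   # Lowe's ratio test
    pts1 = np.float32([k1[m.queryIdx].pt for m in good])
    pts2 = np.float32([k2[m.trainIdx].pt for m in good])
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    inl = mask.ravel() == 1
    return F, pts1[inl], pts2[inl]
```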
5 Experiments
We tested our reconstruction method on a collection of randomly generated point sets and cameras in order to estimate
its reliability under different working conditions.
We have analyzed especially the robustness to errors
of the four views graph (see Figure 3) which is the base
structure of reconstruction graphs.
In fact the goal of a real camera matrix estimator is
to compute a set of camera matrices which minimizes the
reprojection error of the estimated 3D points.
Accordingly, in each trial we have simulated the real
estimation process of camera matrices and have measured
the reprojection error via the Sampson error. We recall that the Sampson error is the first-order approximation of the reprojection error, to which it is actually very close, while being computationally much cheaper [HZ00].
Every trial consists of the following steps:
1. Setting up the environment. This amounts to random
generation of four camera matrices and a point cloud
(from 50 up to 1000 points). Projection of every point
in the cloud on the image plane of each camera. Any
point projection is perturbed by a Gaussian error with standard deviation 0.2 ≤ σ ≤ 2, fixed by the trial.
2. Estimation of fundamental matrices. These are obtained from projected matching points on every couple of cameras' image planes (see point 3, Section 4).
3. Estimation of the four camera matrices. Our method is applied on five of the six fundamental matrices computed in the previous step and arranged following the four views graph topology (see point 4, Section 4).
4. Computation of fundamental matrices; these are obtained from every pair of camera matrices.
5. Measurement of the Sampson error of the projected
points on those fundamental matrices.
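For reference, a sketch of the Sampson measure used in steps 4–5, the first-order approximation of the reprojection error for point correspondences under a fundamental matrix:

```python
import numpy as np

def sampson_error(F, x1, x2):
    """Sampson (first-order) approximation of the reprojection error per correspondence.

    F: 3x3 fundamental matrix; x1, x2: (n, 2) matching points in the two views.
    """
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    Fx1 = h1 @ F.T                                   # rows are F x1
    Ftx2 = h2 @ F                                    # rows are F^T x2
    num = np.einsum("ij,ij->i", h2, Fx1) ** 2        # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den
```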
At the end of every trial we collected the mean and
variance of the absolute Sampson error measured in pixels
on the fundamental matrices estimated at step 2 and at step
4. Let us denote MSE1, VSE1, MSE2 and VSE2 the means and variances of the two errors. Analyzing those data we can observe a strong linear dependence between MSE2 and MSE1. In fact the model which best fits among polynomials is

MSE2 = 0.35 + 1.53 MSE1    (18)

with variance 1.62. Note that in 56.3% of the trials we have MSE2 < 0.35 + 1.53 MSE1, and in 88.1% of the trials we have MSE2 < 0.35 + (1.53 + 1.62) MSE1.
Figure 8. Linear dependence between MSE2 (on the y-axis) and MSE1 (on the x-axis) (solid line) and security lines (dashed lines).
6 Conclusion
We have described an original method, as far as we know, within the process of 3D reconstruction from a set of views, to obtain the camera matrices from a set of fundamental matrices, that is fast and flexible. A main feature of our method is that it uses just the information given by fundamental matrices, without further assumptions requiring more views. It is clear that our approach can be useful in many practical applications; in fact, relying on a straightforward linear solution, it proves to be more efficient than classical approaches. Furthermore, the technique should be easily integrable into complex vision systems. We exploit, on the other hand, a topology of n views, recursively based on four views, to build a system which is, in general, nonlinear but becomes linear in the specified topological arrangements. Thus, our approach is close to the human behaviour of determining where to look, so as to discover the important views (the hidden views) necessary to mentally reconstruct an object. Indeed, it exploits a specific structure of two views simply based on pairs of image correspondences, and thus it uses all the information given by the estimation of the fundamental matrix.
The research has been partially supported by the EU
project NIFTI, n. 247870.
Switching tasks and flexible reasoning in the Situation Calculus
Alberto Finzi and Fiora Pirri
{finzi}@na.infn.it {pirri}@dis.uniroma1.it
March 25, 2010
Abstract
In this paper we present a new framework for modelling switching tasks and adaptive, flexible behaviours for cognitive robots. The framework is constructed on a suitable extension of the Situation Calculus, the Temporal Flexible Situation Calculus (TFSC), accommodating Allen temporal intervals, multiple timelines and concurrent situations. We introduce a constructive method to define pattern rules for temporal constraints, in a language of macros. The language of macros mediates between Situation Calculus formulae and temporal constraint networks. The programming language for the TFSC is TFGolog, a new interpreter in the Golog family of languages, which models concurrent plans with flexible, adaptive behaviours and switching modes. Finally, we show an implementation of a cognitive robot performing different tasks while attentively exploring a rescue environment.
Keywords: Cognitive robotics, executive control, cognitive control, switching
tasks, adaptive and flexible behaviours, Situation Calculus, action perception and
change, temporal planning.
1 Introduction
Several approaches have recently been taken to advance cognitive robotics. These different viewpoints are fostered by new breakthroughs in different research areas related to cognitive control and, mainly, by new experimental settings that have encouraged a better understanding of the cognitive functioning of executive processes. In real-world domains robots have to perform several activities requiring a suitably designed cognitive control, to select and coordinate the operation of multiple tasks.
The ability to establish the proper mappings between inputs, internal states, and outputs needed to perform a given task [48] is called "cognitive control" or "executive function" in neuroscience studies, and it is often analysed with the aid of the concept of inhibition (see e.g. [48, 3]), explaining how a subject in the presence of several stimuli responds selectively and is able to resist inappropriate urges (see [77]). Cognitive control, as a general function, explains flexible switching between tasks, when reconfiguration of memory and perception is required, by disengaging from previous goals or task sets (see [45, 55]).
The role of task switching in robot cognitive control is highlighted in many biologically inspired architectures, such as the ISAC architecture [34], the ALEC architecture, based on state changes induced by homeostatic variables [25], Hammer [15] and the GWT (Global Workspace Theory) [71].
Studies on cognitive control, and mainly on human adaptive behaviours, investigated within the task-switching paradigm, have strongly influenced cognitive robotics architectures since the eighties, for example the Norman and Shallice [54] ATA schema, the FLE model of Duncan [16] and the principles of goal-directed behaviours in Newell [53] (for a review of these architectures in the framework of the task-switching paradigm see [67]).
The approaches to model-based executive robot control, such as Williams [8] and earlier [33, 80], also devise runtime systems managing backward inhibition via real-time selection, execution and guidance of actions and behaviours. This model-based view postulates the existence of a declarative (symbolic) model of the executive which can be used by the cognitive control to switch between processes within a reactive control loop. Here, the executive model provides a local and detailed representation of the system and monitors the engagement and disengagement of processes. In this context, the flexible temporal planning approach (e.g. the Constraint-based Interval Planning framework [33]), proposed by the planning community, has shown a strong practical impact in real-world applications based on the integration of deliberation and execution (see e.g. RAX [33], IxTeT [27], INOVA [74], and RMPL [80]). These approaches amalgamate planning, scheduling and resource optimisation for managing all the competing activities involved in many robot tasks. Important examples are the flexible concurrent plan concepts of Jonsson and colleagues [33, 12] and Ghallab and colleagues [27]. The flexible temporal planning approach, underpinned by temporal constraint networks, provides a good model for behaviour interaction and temporal switching between different events and processes. However, the extremely complex structure required by executive robot control has strongly affected the coherence of the whole framework, especially because, in the flexible temporal planning approach, implementation issues have prevailed over the semantic modelling of the integration of the different components.
On the other hand, from a different perspective, high-level executive control has been introduced in the qualitative Cognitive Robotics¹ community, within the realm of theories of action and change, such as the Situation Calculus [46, 61, 41, 65], the Fluent Calculus [68, 17, 76], the Event Calculus [72, 73, 22], the Action language [26] and their built-in agent programming languages such as the Golog family (ConGolog, INDIGolog, Readylog, etc., see [38, 64, 11, 30]), FLUX [75], and similarly APL [1]. In the theory of action and change framework the problem of executive control has been regarded mainly in terms of action properties, their effects on the world (e.g. the frame problem) and the agent's ability to decide on a successful action sequence based on its desires, intentions and knowledge [40, 4, 29, 42], both for off-line and online action execution. In this sense high-level executive control is intended as the reasoning process underlying the choice of actions.
¹ The term was first introduced by Reiter at IJCAI 93; see also [39].
Nonetheless, reactive behaviours have been considered from the viewpoint of the interleaving properties of the agent's actions and external exogenous actions, induced by nature [63]. Reiter grasped the concept of inhibition through that of "bad situations" [65]. Bad situations, however, were proposed in the perspective of achieving action effects, although his considerations were more deeply immersed in human behaviour and also concerned with task switching. Analogously, in Decision Theoretic Golog the stochastic structure of actions served to achieve the most successful plan in uncertain domains [9].
Real-world robot applications are increasingly concerned not only with properties of actions but also with the system's reaction to a huge amount of stimuli, which requires handling response timing. Therefore, the need to negotiate the multiplicity of reactions in task switching (for vision, localisation, manipulation, exploration, etc.) is bringing a different perspective to action theories. An example is the increasing emphasis on agent programming languages or on multiple forms of interaction, leading to the extraordinary explosion of multi-agent systems.
Indeed, the control of the many sources of information incoming from the environment, likewise the arbitration of resource allocation for perceptual-motor and selection processes, has become the core challenge in modelling actions and behaviours.
The complexity of executive control under the view of adaptive, flexible and switching behaviours, in our opinion, requires the design of a grounded and interpretative framework that can be accomplished only within a coherent and strong qualitative model of action, perception and interaction, with the proviso of offering sound transformations of the underlying constructs into structures that can be treated quantitatively (e.g. temporal networks, Bayes networks, graphical models, etc.).
The main contributions of this paper can be briefly summarised as follows:
• we extend the framework of the Situation Calculus to represent heterogeneous, concurrent, and interleaving
flexible behaviours, subject to switching-time criteria. This leads to a new integration paradigm in which
multiple parallel timelines assimilate temporal constraints among the activities.
• Temporal constraints and rules for their definition (the compatibilities) implement adaptation and inhibition
of behaviours. This is made possible via a specific term that we call bag of timelines (also bag of situations),
actually a set of concurrent, temporal situations formalising processes on multiple timelines. On the basis
of this term we are able to introduce a constructive method for declaring temporal compatibilities, based
on a meta-language.
• The compatibilities are rules with a double facet: they are formulae of the Situation Calculus but also the logical counterpart of a temporal network. We show, indeed, that compatibilities can be transformed into temporal constraint networks. We show, therefore, that under specific circumstances logic-based reasoning and constraint propagation can be treated independently while remaining in the same logical framework.
• As usual within the Situation Calculus, the extended framework provides the semantics for specifying a
Golog interpreter. We introduce the Temporal Flexible Golog (TFGolog) programming language suitable
for representing high-level agent programs for concurrent and temporal switching processes. We show
how the TFGolog interpreter transforms high-level programs into temporally flexible plans.
• We prove consistency results about the TFSC and prove several properties of the system.
• We provide several examples that illustrate our approach and show its usefulness. In particular, we show
that the framework can foster attention driven exploration. The example has also been used for testing this
framework, as reported in [10].
The rest of the paper is organised as follows. In the next section we give an intuition of the proposed work with an example about cognitive robot control. In Section 3 we recall some preliminary properties of the Situation Calculus and Golog and we introduce the Temporal Flexible Situation Calculus (TFSC). The TFSC, as an extension of the Situation Calculus including timelines and bags of timelines, is also used to define processes and constraints between processes; these issues are discussed in Section 4 and in Section 5. The language for the construction of constraints and flexible behaviours is presented in Section 5, and it is shown, in Section 6, how it maps to a constraint temporal network. In Section 7 we introduce the Temporal Flexible Golog interpreter, showing several results and examples. In Section 8 we illustrate the example of an attentive robot controller based on the Temporal Flexible Golog interpreter. Finally, we dedicate Section 9 to related work and to hints at future work. All the proofs are collected in Appendices A and B at the end of the paper.
2 Why Flexible Planning and why modelling multiple behaviours
Robotic systems, whether they are mobile robots, camera networks or sensor networks, are composed of several heterogeneous hardware and software components operating concurrently and smoothly interacting with the external world. In these systems, complexity increases exponentially with the number of possible interactions among components. Each component can perform a set of activities whose duration might or might not be controllable, as exogenous events can disturb allocated time lags. The range and variety of possible component interactions is quite broad, although limited by both structural temporal constraints (such as timeouts, time priorities, time windows) and resource constraints, such as terrain features for locomotion, light features for vision, and speed and time for sensor networks.
A control plan suitable for these systems should be flexible, in order to be robust and avoid deadlocks. In other words, it should be able to make available, at any time, a set of possible behaviours, so that the actual behaviour can be decided on-line.
Figure 1: Planned activities for the rescue mobile robot (see the robot looking at a victim's hand waving from a hole in the wall, on the right). Each component is represented by a timeline where the planned activities are sequenced. Starting and ending times of these activities are bound only at execution time.
Example 1 (Cognitive Robot) A mobile robotic system performs some basic tasks such as exploring the environment (possibly a rescue environment). The robot control system is composed of several functional components; some typical ones are: Mapping and Localisation, Navigation (for path-planning), Pan-tilt unit (for head and gaze control), Camera (for vergence, zooming, etc.), Locomotion (low-level engine controllers), Sound processing, Visual processing, Attentional system, Exploration (taking care of search strategies), and possibly other components related to other sensors and other adaptation needs. These concurrent activities may have several causal, temporal, and resource constraints. For example, the Pan-tilt should look ahead while the robot is moving. The Camera should be continuously pointed in the correct direction, which might be detected earlier by sound; at the same time, the robot engine vibrations should be compensated by some stabilisation process, likewise ego-motion, for suitable tracking. The Camera and Pan-tilt components might start a tracking activity during a task requiring to explore and search for something, or to follow someone. However, while the starting time of the ptuScan process is controllable, the ending time of this process is nondeterministic, as it depends on the response of the scanned object/person. In turn, the ending time of the tracking process affects the starting and ending times of other activities, for example to pinpoint where exactly to go, operating the necessary strategies to achieve that. Figure 1 illustrates on the left a flexible temporal plan for a rescue robot (on the right, looking at a hand waving from a hole in a wall). The plan stipulates that the explore process, given that it ends within an interval of [10, 20], should commit the end-time of the stop process to be greater than 8 but less than 25 seconds, while ptuReset should end between 15 and 22 seconds. Now, ptuReset can be active only during the locomotion component process stop; the stop process, in turn, can end only after the end of ptuReset. On the other hand, explore is not directly affected by ptuReset and stop, hence it can end before, after or during these activities, and its ending time can switch w.r.t. the ending times of ptuReset and stop. Whenever a set of planned activities is executed, the associated activation times are actually bound; hence, the enforced constraints can be suitably propagated.
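The end-time constraints of Example 1 can be checked for consistency with an ordinary simple temporal network. The sketch below is only an illustration of this standard technique (it is not the planner used in this work): time points become nodes of a distance graph, an edge (u, v, w) encodes t_v − t_u ≤ w, and Floyd–Warshall detects negative cycles; strict bounds are approximated by non-strict ones.

    from itertools import product

    INF = float("inf")

    def stp_consistent(n, constraints):
        # constraints: list of (u, v, w) meaning t_v - t_u <= w.
        # The simple temporal problem is consistent iff the distance graph
        # has no negative cycle.
        d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
        for u, v, w in constraints:
            d[u][v] = min(d[u][v], w)
        for k, i, j in product(range(n), repeat=3):   # k is the outermost index
            if d[i][k] + d[k][j] < d[i][j]:
                d[i][j] = d[i][k] + d[k][j]
        return all(d[i][i] >= 0 for i in range(n))

    # Time points: 0 = plan origin, 1 = end(explore), 2 = end(stop), 3 = end(ptuReset).
    constraints = [
        (0, 1, 20), (1, 0, -10),   # explore ends within [10, 20]
        (0, 2, 25), (2, 0, -8),    # end of stop greater than 8 and less than 25
        (0, 3, 22), (3, 0, -15),   # ptuReset ends between 15 and 22
        (2, 3, 0),                 # stop can end only after the end of ptuReset
    ]
    print(stp_consistent(4, constraints))   # True: the flexible plan admits a schedule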
In the next sections we show how these problems can be addressed and solved in the Situation Calculus and
Golog providing a clear and sound framework for designing a complex system.
3 Basics for the Temporal Flexible Situation Calculus
In this section, we present the basic ideas and formal structure of the Temporal Flexible Situation Calculus (TFSC). The TFSC is conceived for describing a complex dynamic system with a finite number of components, to which a certain amount of resources and processes are assigned. The system should be able to execute interleaving processes, allowing switching between process threads of different components by inhibiting active tasks of less demanding components.
3.1 Preliminaries
The Situation Calculus [46, 65] (SC) is a sorted first-order language with equality, augmented with a second-order induction axiom. The underlying signature of the sorted language is specified by three sorts: Act for actions, Sit for situations and Obj for objects. To simplify reading we usually refer to these sorts as actions, situations and objects.
The terms of sort action are either constants or functions mapping elements of sort object, and possibly of sort action, into elements of sort action, e.g. move(x, y).
Terms of sort situation are either the constant symbol S0 or terms of the form do(a, s), where a is a term of sort action and s is a term of sort situation. The term S0 denotes the initial situation, where no action has yet occurred, while do(a, s) encodes the sequence of actions obtained by executing the action a after the sequence of actions encoded in s.
Properties of objects and their dynamics are described by fluents. Thus fluents denote properties that may
change when executing an action, and are specified by either predicates or function symbols whose last argument
is a situation. A basic action theory is defined by the following set of axioms
BAT=(Σ, DS0 , Dssa , Duna , Dap ).
(1)
Here:
• Σ is the set of domain-independent foundational axioms for the domain of situations, see Table 1. Situations are kept countably infinite by a second-order axiom (see Table 1) asserting that there are no unintended models of the language in which situations would be strange objects.
• Duna is the set of unique name axioms for actions, which expresses that different action terms, namely different names, stand for different actions:
A(x1, . . . , xn) ≠ B(y1, . . . , yn),
and that identical action terms have the same arguments:
A(x1, . . . , xn) = A(y1, . . . , yn) → x1 = y1 ∧ · · · ∧ xn = yn.
• DS0 is a set of first-order formulas describing the initial state of the domain (represented by S 0 ).
• Dssa is the set of successor state axioms [62, 65], one for each fluent symbol F(x⃗, s) in the language. A successor state axiom is an explicit definition of a fluent in a successor state do(a, s) as follows:
F(x⃗, do(a, s)) ≡ ΦF(x⃗, a, s).
A successor state axiom provides both a definition of action effects and a solution to the frame problem (assuming deterministic actions).
Given a basic action theory, it is possible to infer the properties of the theory by appealing only to the initial theory DS0. This is done by regressing any formula taking as argument a situation of the form do(am, . . . , do(a1, S0)) into an equivalent formula taking as argument the initial situation S0 and not mentioning any situation different from S0.
The regression of a formula φ(do(am, . . . , do(a1, S0))) is defined via a regression operator R by induction, using the definitional structure of the successor state axioms and the properties of R, as follows:
R(F(x⃗, do(a, s))) = ΦF(x⃗, a, s)
R(¬φ) = ¬R(φ)
R(φ1 ∧ φ2) = R(φ1) ∧ R(φ2)
R(∃x.φ) = ∃x.R(φ).    (2)
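As a small worked instance of (2) (ours, with a hypothetical fluent not taken from the paper's domain), assume the successor state axiom At(x, do(a, s)) ≡ a = moveTo(x) ∨ At(x, s) ∧ ¬∃y. a = moveTo(y). Then
R(At(r, do(moveTo(r), do(moveTo(q), S0)))) = [moveTo(r) = moveTo(r)] ∨ [At(r, do(moveTo(q), S0)) ∧ ¬∃y. moveTo(r) = moveTo(y)],
and applying R once more to the inner fluent atom yields a formula uniform in S0; the unique name axioms for actions then settle the equalities between action terms, so the query can be answered against DS0 alone.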
The simplicity and elegance of making inferences and proving properties of situations is, indeed, due to the structure of the axioms, based on explicit definitions of the successor state. This structure would be prejudiced if state constraints were added to the theory, i.e. formulas mentioning situations that are neither uniform in S0 (see footnote 2) nor in the form of successor state axioms. For example the following state constraints:
∀s. Raise(sun, s).
∀s. On(x, y, s) ∧ On(y, z, s) → On(x, z, s).    (3)
lacking a definitional structure, would compromise the inference based on regressing sentences to the initial database DS0.
Golog, introduced in [38], is an agent programming language formally based on the SC and usually implemented in Eclipse Prolog. Golog uses Algol-like control constructs to define complex actions from the primitive actions, which are those of a basic action theory BAT, see (1):
1. Action sequences: p1 ; p2 .
2. Tests: φ?.
3. Nondeterministic action choices: p1 |p2 .
2. Formulas uniform in σ = do(a1, . . . , do(am, S0)), m ≥ 0, are formulas either not mentioning situation terms or formulas mentioning neither Poss nor ⊏ nor any other situation term than σ [61].
4. Nondeterministic choices of action argument: (πx).p(x).
5. Conditionals: if φ then p1 else p2 .
6. While loops: while φ do p.
7. Nondeterministic iteration: p∗.
8. Procedure calls: {proc P1(x⃗1) p1 end; . . . proc Pn(x⃗n) pn end; p}
An example of a Golog program is
while ¬At(1, 2) do (πx, y) moveTo(x, y).
Intuitively, the nondeterministic choice (πx, y) moveTo(x, y) is iterated until the atom At(1, 2) is verified.
The Golog declarative semantics is defined in the language of the SC. Given a complex action δ (a Golog program), the abbreviation Do(δ, s, s′) says that situation s′ can be reached from situation s by executing some complex action specified by the program δ.
The construct definitions are the following:
1. Primitive actions:
Do(a, s, s′) =def Poss(a, s) ∧ s′ = do(a, s).
2. Test actions:
Do(φ?, s, s′) =def φ[s] ∧ s = s′.
3. Sequence:
Do(p1; p2, s, s′) =def ∃s″. Do(p1, s, s″) ∧ Do(p2, s″, s′).
4. Non-deterministic choice of two actions:
Do(p1 | p2, s, s′) =def Do(p1, s, s′) ∨ Do(p2, s, s′).
5. Non-deterministic choice of arguments:
Do(π(x, p(x)), s, s′) =def ∃x. Do(p(x), s, s′).
6. Non-deterministic iteration:
Do(p∗, s, s′) =def ∀P.{[∀s1. P(s1, s1)] ∧ [∀s1, s2, s3. P(s1, s2) ∧ Do(p, s2, s3) → P(s1, s3)]} → P(s, s′).
For procedure call expansion and other important constructs we refer the reader to [38, 29, 28, 65, 31].
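To make the above abbreviations concrete, here is a minimal sketch of an evaluator for Do over a toy domain; it is our own illustration in Python (the Golog interpreters cited above are Prolog programs), assumes ground actions and finitely instantiated π-choices, and omits procedures and iteration.

    def do_program(prog, s, poss, holds):
        # Yield every situation s' such that Do(prog, s, s') holds.
        # prog is a nested tuple: ('act', a), ('test', phi), ('seq', p1, p2),
        # ('choice', p1, p2) or ('pick', [p_x1, p_x2, ...]) for a finite (pi x) choice.
        # A situation is a tuple of actions; poss(a, s) and holds(phi, s) are
        # supplied by the domain.
        kind = prog[0]
        if kind == 'act':          # Do(a, s, s') = Poss(a, s) and s' = do(a, s)
            if poss(prog[1], s):
                yield s + (prog[1],)
        elif kind == 'test':       # Do(phi?, s, s') = phi[s] and s = s'
            if holds(prog[1], s):
                yield s
        elif kind == 'seq':        # exists s''. Do(p1, s, s'') and Do(p2, s'', s')
            for s_mid in do_program(prog[1], s, poss, holds):
                yield from do_program(prog[2], s_mid, poss, holds)
        elif kind == 'choice':     # Do(p1, s, s') or Do(p2, s, s')
            yield from do_program(prog[1], s, poss, holds)
            yield from do_program(prog[2], s, poss, holds)
        elif kind == 'pick':       # (pi x) p(x), restricted to finitely many instances
            for instantiated in prog[1]:
                yield from do_program(instantiated, s, poss, holds)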
3.2 Extensions of the SC
Among several languages for action theories (like the Fluent Calculus [68, 17, 76], the Event Calculus [72, 73, 22] and the Action language [26]), the SC is particularly simple to extend and adapt to specific domains, such as, for example, cognitive robotics domains.
In fact, being an axiomatic theory, an extension of the SC requires two simple steps:
1. extend the set of foundational axioms Σ to account for richer domains;
2. show that the new extension respects the fundamental constraints required to do inference within the system.
The flexibility of both the successor state axioms Dss and the action precondition axioms Dap allows the user to define any domain.
This is the reason why there have been many contributions to extensions of the Situation Calculus such as
([42, 58, 40, 66, 61, 4, 59, 21, 9, 65]). All these contributions, further, have coped with the constraints required
by the regression inference, including the specification of the Golog programming language (see Section 7) such
as ([38, 11, 64, 30]), requiring axioms to be based on the construction of explicit definition. In particular, macro
definitions (see also [65] for a paragraph on “Why Macros?”) are explicit definitions of predicates that are not
added to the language, therefore they stand also for abbreviations of the formulas defining them (the definiens).
We refer the reader to [65, 61] for a complete introduction to the inference mechanisms in the Situation
Calculus.
In this paper we extend the Situation Calculus by adding a new set of axioms to the set of its foundational axioms, and by introducing macro definitions. In order to ensure that all the constraints are satisfied we need to go into details that are rather tedious, although often straightforward; therefore many details are postponed to the Appendix. In particular, all proofs, as well as the lemmas, for this section are given in Appendix A and Appendix B.
3.3 Time, types and bag of timelines
The set of foundational axioms of the Situation Calculus, together with the set of new axioms, are reported in Table 1. We introduce three kinds of extension. The first extension, concerning time, is essentially the same as the one introduced by Reiter in [65] and [63, 56], but making explicit the definition of start [65]. With the second kind of extension we introduce name types as objects, i.e. specific elements of sort object, denoted by constants: these are used in the perspective of describing a system with several components, each of which can be named by a specific constant. Note that, because a robot system is composed of a finite set of parts, we assume that the set of components is bound to be finite, although what a component can do might be described by an infinite set of actions. Each name type is extensionally defined by a collection of actions that the component can execute.
Name types are used to classify actions according to the actuating agent. This has often been implicitly assumed in the presence of different agents, by naming each agent specifically; here we do the same, but in a systematic way. The third extension deals with sets of situations with some specific properties, which are obtained through types and time. These kinds of situations are called timelines, and a set of timelines is here specified by what we call a bag of timelines.
Notation: in the following sections LX denotes the language specified by the axioms X. More precisely, we assume that the signature includes all the symbols (functions and predicates) mentioned in X, equality, and the quantifiers and connectives of FOL. Thus, for example, LΣ is the language of the axioms Σ. Also, we shall use the ordering relation s ⊑ s′ between situations s and s′, which abbreviates s ⊏ s′ ∨ s = s′ (see the foundational axioms Σ in Table 1).
Foundational axioms of the SC, Σ:
s ⊏ do(a, s′) ≡ s ⊑ s′
¬(s ⊏ S0)
do(a, s) = do(a′, s′) ≡ s = s′ ∧ a = a′
∀P. P(S0) ∧ ∀a, s.[P(s) → P(do(a, s))] → ∀s. P(s)

Time axioms Ax0 (Σtime = Σ ∪ Ax0):
T1. time(S0) = t0
T2. ∀x⃗, t. time(A(x⃗, t)) = t → t > t0
T3. ∀x⃗, t, s. time(do(A(x⃗, t), s)) = time(A(x⃗, t))

Type axioms Ax1–Ax2 (ΣH = Σtime ∪ Ax1, Σ=ν = ΣH ∪ Ax2):
H1. ∀a. ⋁_{i=1..n} H(i, a) ∧ ⋀_{i=1..n} (H(i, a) → ⋀_{j=1..n, j≠i} ¬H(j, a))
H2. ∀a, a′. a =ν a′ ≡ ∃i. H(i, a) ∧ H(i, a′)
E1. ∀s, s′. (s = s′ → s =ν s′) ∧ (s ≠ S0 → ¬(s =ν S0 ∨ S0 =ν s))
E2. ∀a. ¬(a =ν S0)
E3. ∀a, a′, s′. (a =ν do(a′, s′)) ≡ (a =ν a′) ∧ (s′ ≠ S0 → (s′ =ν a))
E4. ∀a′, s, s′. (s =ν do(a′, s′)) ≡ (s =ν a′) ∧ (s′ ≠ S0 → s′ =ν s)
E5. ∀a, s. (s =ν a) ≡ (a =ν s)

Timelines axiom Ax3 and bag of timelines axioms Ax4 (A+ = Σ=ν ∪ Ax3 ∪ Ax4):
W1. ∀a, s. ⋀_{i=1..n} [T(i, do(a, s)) ≡ (s = S0 ∧ H(i, a)) ∨ (s ≠ S0 ∧ a =ν s ∧ T(i, s))]
G1. ∀s, s_j1, . . . , s_jk. s ∈S B(⟨s_j1, . . . , s_jk⟩_j) ≡ ⋁_{1≤p≤k} [s = s_jp ∧ (s_jp = S0 ∨ ⋁_i T(i, s_jp))], k ∈ N
G2. ∀s, β, β′. (s ∈S β ≡ s ∈S β′) ≡ (β =S β′)
G3. ∃β. β = B0
G4. ∀s ∀β ∀i. s = S0 ∨ T(i, s) → ∃β′. (β′ =S β ∪S B(s))
G5. For all sentences ϕ: ϕ(B0) ∧ (∀β ∀s. ϕ(β) ∧ ϕ(B(s)) → ϕ(β ∪S B(s))) → ∀β. ϕ(β)
(here β, β′ denote variables of sort bag of timelines)

Table 1: The foundational axioms Σ of the basic Situation Calculus and the axioms A+ = Σ ∪ Ax0 ∪ Ax1 ∪ Ax2 ∪ Ax3 ∪ Ax4 of the Flexible time Situation Calculus, extending Σ with time, types and bags of timelines. The extension of the foundational axioms Σ is incremental. Four sets are built: first the set Σtime, extending Σ with time; then the set ΣH, extending Σtime with types; then the set Σ=ν, extending ΣH with equivalence relations over actions and situations; finally the set A+, extending Σ=ν with timelines and bags of timelines.
We shall use x⃗ to denote a tuple of variables, a to denote variables of sort action, and A(x⃗) to denote action functions with arguments x⃗. When a situation mentions only actions of the form A(x⃗), then its only variables are variables of sort object; thus we use the symbol α to denote actions which are either ground or of the form A(x⃗). We use the symbol s to denote variables of sort situation, σ to denote histories of actions, such as σ = do(am, . . . , do(a1, S0)), that is, a sequence of actions of length m, m ≥ 1, and S0 to denote the initial situation; S0 is a constant. As we shall extend the signature, the new symbols will be contextually introduced. □
3.4 Representing Time in TFSC
Time has been extensively introduced in the Situation Calculus in [58, 63, 60], where actions are instantaneous and their time is selected by the function time(·). Durative actions are considered as processes [58, 65], represented by fluents, with durationless actions starting and terminating these processes. For example, going(hill, s) is started by the action startGo(hill, t) and ended by endGo(hill, t′).
As in [63, 56], primitive actions are instantaneous and are represented by the term A(x⃗, t), where t is a special argument representing the execution time. For example, moveTo(room4, 0.5) means that moveTo was executed at time 0.5.
We use time selection functions to extract the time of both actions and action sequences. In particular, we introduce a function time : Sit ∪ Act → R+, mapping both situations and actions into the positive real line; thus we implicitly assume that the reals are axiomatised (see the Appendix, page 45). We also introduce the relations < and ≤, the latter defined as < ∨ =, ranging over the reals R+. This is a common assumption in the Situation Calculus (see [65]), thus we adopt it as such.
We denote by Ax0 the set of axioms T1–T3 and Σtime = Σ ∪ Ax0, see Table 1. Axiom T1 says that the time of the initial situation S0 is the initial time t0; axiom T2 says that the time of an action is the value of its time argument, which has to be a positive real number greater than the initial time. Finally, the third axiom T3 says that the time of a situation do(a, s) is the time of the action a. The set Σtime is a conservative extension of the axioms Σ of the basic Situation Calculus (see Lemma 1 in the Appendix). Here by conservative extension we mean that Σtime is obtained by extending the original language LΣ to the new language LΣtime without changing the initial theory Σ and its deductive closure, when only the original language is considered.
The set Ax0, however, does not ensure that the ordering ⊏ on situations is coherent with time. In other words, s ⊏ s′ → time(s) ≤ time(s′) does not hold, in general, in a model M in which Σtime ∪ Duna is verified. Nevertheless it is always possible to build a model of Σtime ∪ Duna in which the above condition is verified (see Lemma 2 in the Appendix). Thus, to add coherence between situations and time we need to add a further axiom:
T4. s ⊏ s′ → time(s) ≤ time(s′)    (4)
This new axiom restricts the set of models to time-coherent situations. In these models, although s ⊏ s′ → time(s) ≤ time(s′) is verified, the converse implication time(s) ≤ time(s′) → s ⊏ s′ will in general not hold.
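As a small worked instance of T1–T4 (ours, reusing action names that appear in the examples of this paper), take time(S0) = t0 = 0 and σ = do(moveTo(room4, 0.5), do(pan(θ, 0.2), S0)). By T2 and T3, time(do(pan(θ, 0.2), S0)) = 0.2 and time(σ) = 0.5; since do(pan(θ, 0.2), S0) ⊏ σ, axiom T4 requires 0.2 ≤ 0.5, which holds, so σ survives in the time-coherent models, whereas swapping the two time arguments would violate T4.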
3.5 Typed Actions and Situations
The second column of Table 1 illustrates the axiomatisation of name types. The distinction between sorts and name types is that sorts induce a partition on the domain, while name types are defined in the language via constant symbols involving only the sort Obj, still inducing a partition on actions and hence on situations. In particular, axioms Ax1 = {H1, H2} regulate name types, and the set Ax2 = {E1, E2, E3, E4, E5} extends types to situations via the relation =ν that is defined by Axiom (H2). Axiom (H1) sets the specifications required for name types to be coherent with respect to actions. The disjunction over components mentioned in the first conjunct of (H1) states that each action is ascribed to some component i. On the other hand, the second conjunct of (H1) states that, whenever an action is ascribed to a component with name type i, it cannot be ascribed to any other component. Note that (H1) does not affect the set Duna of inequalities for actions and, clearly, in a basic action theory with a single component, (H1) is always satisfied.
The axioms (H1) and (H2) can be safely added to Σtime , forming the theory ΣH = Σtime ∪ Ax1 , and the theory
ΣH maintains satisfiability, see Lemma 3, in the Appendix page 46.
The partition of actions, according to name types, is equipped with the relation =ν , defined by (H2), see Table
1. We show that =ν is an equivalence relation on the set of actions in Lemma 4, see the Appendix, page 47.
Axioms E1–E5 (see Table 1, from row nine, second column) are, thus, needed to extend the relation =ν, defined by axiom (H2) for actions, to the set of situations. Axiom (E1) states that if two situations are equal then they must also be of the same type, but no situation is of the same type as S0. Axiom (E2) states that no action can be of the same type as S0. Axiom (E3) states recursively that an action is of the same type as a situation do(a′, s′) if it is of the same type as the action a′ and of the same type as s′, whenever s′ is not S0. Finally, axiom (E4) says that two situations are of the same type if they mention actions and situations of the same type, and (E5) states symmetry between actions and situations.
Also these axioms can be safely added to the theory built so far. Lemma 5, in Appendix B, page 47, shows that adding the axioms Ax2 = E1–E5 to the theory ΣH, thus obtaining the new theory Σ=ν = ΣH ∪ Ax2, can be done consistently, also when the axiom set Duna is included. Furthermore we show in Lemma 6 (see the Appendix, page 48) that =ν is an equivalence relation both on actions and on situations. This fact will be used to form timelines (see Section 3.6).
Theorem 1 Let Σ be the set of foundational axioms of the Situation Calculus, and let Duna be the set of unique name axioms for actions; then the set of axioms Σ=ν = Σ ∪ Ax0 ∪ Ax1 ∪ Ax2 is a sound axiomatisation of the temporal flexible Situation Calculus, that is, the set of axioms and Duna together form a satisfiable theory. □
Now, with H1–H2 a new predicate H is introduced in the language, and it is defined for each component i. Let i1, i2, . . . , in be a finite set of constants denoting the components of a system, where each ik is a name type. Each H(i, a) can be introduced by an extensional definition as follows:
∀a. H(i, a) ≡ φ(i, a)    (5)
When the action names for a specific component form a finite set, the extensional description of the component can be given as follows:
∀a. H(i, a) ≡ ∃x⃗1, . . . , x⃗n. ⋁_{i=1..n} Ai(x⃗i) = a    (6)
Example 2 If we want to define the actions for the robot component Pan-tilt, we would introduce the constant pan-tilt and define it by its actions as follows:
∀a. H(pan-tilt, a) ≡ ∃t, θ. a = pan(θ, t) ∨ ∃t, γ. a = tilt(γ, t) ∨ ∃t, x, y. a = scan(x, y, t).    (7)
□
This set of definitions for H is added to DS0, as they are all uniform in S0. The easiest way to ensure consistency is to ascribe each action to a single type; in Lemma 7 (see the Appendix, page 50) we show the conditions ensuring consistency of the type definitions with the axiom H1. As usual with typed languages there are drawbacks: if we consider a generic action name, such as run, that could be ascribed to more than one component for all its arguments, then we need either to specialise it to each type or to create a single component gathering all those components subscribing to the action run.
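The intended use of name types can be mirrored by a small lookup table; the sketch below is our own illustration (the component and action names are hypothetical except for those of Example 2) and simply checks the two conjuncts of (H1): every action name is ascribed to some component, and to no more than one.

    # Hypothetical assignment of action names to components (name types).
    COMPONENTS = {
        "pan-tilt": {"pan", "tilt", "scan"},
        "nav":      {"moveTo", "stop", "explore"},
    }

    def H(i, action_name):
        # H(i, a): action a is ascribed to component i.
        return action_name in COMPONENTS.get(i, set())

    def check_H1(action_names):
        # Axiom (H1): each action name belongs to exactly one component.
        for a in action_names:
            owners = [i for i in COMPONENTS if H(i, a)]
            if len(owners) != 1:
                return False, a, owners
        return True, None, None

    ok, offending, owners = check_H1({"pan", "tilt", "scan", "moveTo", "stop", "explore"})
    assert ok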
Figure 2: Timelines on a tree of situations. For this representation H(i, a1) ∧ H(i, a2) and H(j, a3) are possible types, and T(i, do(a1, S0)) ∧ . . . ∧ T(i, do(a2, S0)), and T(j, do(a3, S0)), are timelines. By the SSA for timelines, these extend along the situations as indicated by the thick black lines.
3.6 Timelines and bag of timelines
We introduce in this section the concept of a timeline. This concept is particularly useful for flexible planning because it makes it possible to describe the interaction between processes performed by different components of the system. It also makes it possible to deal, in a flexible way, with the time at which a process starts and ends, according to the way processes interact.
A timeline is denoted by a fluent T(i, s) and it is defined by an improper successor state axiom as follows, see Table 1, axiom (W1):
(W1)  ⋀_{i=1..n} [T(i, do(a, s)) ≡ (s ≠ S0 ∧ a =ν s ∧ T(i, s)) ∨ (s = S0 ∧ H(i, a))].    (8)
Note that (8) is not uniform in s (see footnote 3 below), as it mentions S0. Nevertheless the disjunction is obviously exclusive and thus the right-hand side never diverges; indeed (8) is regressable, as is shown in Lemma 11 in Appendix B.
Example 3 The timeline for the pan-tilt unit can be defined as follows:
∀a, s. T(pan-tilt, do(a, s)) ≡ s ≠ S0 ∧ a =ν s ∧ T(pan-tilt, s) ∨ s = S0 ∧ H(pan-tilt, a).
These, by the previous Example 2, are all the histories built up by the actions pan(θ, t), tilt(γ, t) and scan(x, y, t).
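Reading (8) operationally, a history is a timeline of type i exactly when all of its actions are ascribed to component i. A minimal check of this reading, reusing the hypothetical COMPONENTS table sketched above, could look as follows.

    def action_component(action_name):
        # The unique component subscribing to action_name, by (H1); None if undefined.
        owners = [i for i in COMPONENTS if action_name in COMPONENTS[i]]
        return owners[0] if len(owners) == 1 else None

    def is_timeline(i, history):
        # history is a list of action names executed from S0, oldest first;
        # T(i, do(a_n, ..., do(a_1, S0))) holds iff every action is of type i.
        return len(history) > 0 and all(action_component(a) == i for a in history)

    assert is_timeline("pan-tilt", ["tilt", "pan", "scan"])
    assert not is_timeline("pan-tilt", ["tilt", "moveTo"])   # mixes two components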
3. The typical form of a successor state axiom requires the right-hand side to mention only the situation s named in the left-hand side, being thus uniform in s, to ensure that the regression via the right-hand side ends in S0, thus not diverging into two or more different situations.
Note that, given the extended set Σ=ν, the introduction of timelines does not affect the set of successor state axioms and action precondition axioms, thus:
Corollary 1 Let Dss ∪ Dap be the set of successor state axioms, mentioning also timelines, and action precondition axioms. Let DS0 be the set of formulas uniform in S0 (see footnote 4). Σ=ν ∪ Duna ∪ DS0 is satisfiable iff Σ=ν ∪ Duna ∪ DS0 ∪ Dap ∪ Dss is satisfiable.
The characteristic properties of timelines are stated below.
Theorem 2 A timeline represents the =ν-equivalence class of situations of the same type.
The above theorem (see B.4 for the proof) states that all actions in a timeline are of the same type and that, whenever a set of actions are of the same type, they form a timeline; thus not all situations form timelines.
In Section 4 we show that, under precise conditions, the set of timelines forms the set of situations executable by the system components.
Example 4 Timelines are sequences (histories) of actions indicated in thick black in Figure 2. The histories of
actions not belonging to the set of timelines are represented in light gray, that is, histories of actions leading to
situations, that do not belong to timelines, are depicted by thin gray lines.
So a timeline represents the equivalence class of histories of the same type; yet how to ensure a meaningful interaction between timelines, one that can support switching tasks, has not yet been established. Suppose that we need to say that, while the robot is exploring a given environment, it should stop or decelerate in order to correctly scan the surroundings. The system component controlling the exploration actions should suitably synchronise with the component controlling the pan-tilt and the camera. To treat the interleaving between these two processes we have to ensure that at each time step of the operation loop all timelines are available for choice and switching decisions.
To this end we introduce a new concept that can support sets of timelines. This new concept comes with a new sort, which we call the sort of bags of timelines.
Intuitively, a bag of timelines is interpreted as a set of lists of actions, where each list of actions is a timeline. We require a bag of timelines to be a finite set and to mention only situations which are timelines, and possibly S0.
First we have to introduce the sort S, standing for bags of timelines. This is defined as the codomain of a countable set of function symbols B : (Sit^n ↦ Sit^n) ↦ S, mapping a permutation of a tuple of situations into an element of the sorted domain, whose intended interpretation is a bag of timelines.
Equality on these terms should account for idempotence and commutativity. Therefore we shall extend equality to account for permutations and repetitions of equal arguments.
An S-term is defined as B(⟨s_j1, . . . , s_jm⟩_j). Here, if m = 0 we obtain the empty bag of timelines, and we denote the empty bag with the constant B0. With ⟨s_j1, . . . , s_jm⟩_j we denote a permutation of {1, . . . , m}.
To formalise these ideas we adapt the finite set axiomatisation from the system F of Brown and Wang [79] to add bags of timelines to the language, together with the specific symbols ∈S, =S and the empty bag, here identified with the constant term B0. The axioms, listed in Table 1, are reported here again; all variables appearing without quantifiers are implicitly universally quantified outside, and β, β′ denote variables of sort bag of timelines.
(G1) ∀s. s ∈S B(⟨s_j1, . . . , s_jk⟩_j) ≡ ⋁_{1≤p≤k} [s = s_jp ∧ (s_jp = S0 ∨ ⋁_i T(i, s_jp))], k ∈ N
Here i ranges over the finite set of name types.
(G2) ∀s. (s ∈S β ≡ s ∈S β′) ≡ (β =S β′)
(G3) ∃β. β = B0
(G4) ∀s ∀β ∀i. s = S0 ∨ T(i, s) → ∃β′. (β′ =S β ∪S B(s))
(G5) For every sentence ϕ:
ϕ(B0) ∧ (∀β ∀s. ϕ(β) ∧ ϕ(B(s)) → ϕ(β ∪S B(s))) → ∀β. ϕ(β)
(9)
4. See Section 3.1 and [61] for the definition of uniform formulas. Here DS0 also mentions all the definitions H(i, a) for each name type i.
The above axiom set (G1) defines the symbol ∈S. Note that axiom (G1) could be bounded by an n ∈ N and transformed into a single axiom; otherwise there is an axiom for each k ∈ N. The set of axioms (G1) says that a situation s can belong to a bag of timelines if s is equal to some of the timelines specified in the bag, and each situation in the bag of timelines is either S0 or is, indeed, a timeline. Note that, following Theorem 2, although S0 is not a timeline it can belong to a bag of timelines. Axiom (G2) is the extensionality axiom limited to bags of timelines. Axiom (G3) is the unconditional existence axiom, provided that the empty bag is the constant term B0. Axiom (G4) is the conditional existence axiom for finite bags of timelines, provided that β and B(s) are finite bags. Axiom (G4), too, tells us that a bag of timelines can include S0. This axiom would allow bags unbounded in size; to get bags bounded in size whenever axiom (G1) requires so for some n, (G4) is changed accordingly. Note that, in (G4), ∪S is derivable from (G1); see the set operations as obtained in Example 6. Finally, the last axiom (G5) is the inductive characterisation of finite sets: it tells us that whenever sentences specify terms denoting bags of timelines, these terms will denote finite bags.
Example 5 Let us consider the two timelines T(pan-tilt, do(pan(θ), do(tilt(γ), S0))) and T(laser, do(acquire, S0)); then:
a. B(⟨do(pan(θ), do(tilt(γ), S0))_p1, do(acquire, S0)_p2⟩_p) is a bag of timelines, by (G1);
b. B(⟨do(pan(θ), do(tilt(γ), S0))_p1, do(acquire, S0)_p2⟩_p) =S B(⟨do(acquire, S0)_q1, do(pan(θ), do(tilt(γ), S0))_q2⟩_q), by (G1, G2);
c. B(⟨do(tilt(γ), S0)_p1, do(tilt(γ), S0)_p2, S0_p3, S0_p4⟩_p) =S B(⟨do(tilt(γ), S0)_p1, S0_p3⟩_p), by (G1, G2). □
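Since =S abstracts from the order and the repetition of the listed timelines, a bag of timelines behaves like a finite set of histories. The sketch below is our own rendering of this reading, with histories encoded as tuples of action names; it is not part of the formal development.

    def bag(*timelines):
        # B(<s_1, ..., s_m>): idempotent and commutative, hence modelled as a frozenset.
        return frozenset(tuple(t) for t in timelines)

    pan_tilt_history = ("tilt", "pan")      # do(pan, do(tilt, S0))
    laser_history = ("acquire",)            # do(acquire, S0)

    b1 = bag(pan_tilt_history, laser_history)
    b2 = bag(laser_history, pan_tilt_history, laser_history)   # permuted and repeated
    assert b1 == b2                                            # the analogue of =S in Example 5

    # The operations defined in Example 6 below are then ordinary set operations:
    assert bag(pan_tilt_history) <= b1                         # like subset
    assert bag(pan_tilt_history) | bag(laser_history) == b1    # like union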
The other usual symbols for sets can be extended to bag of timelines, using definitions or, more generally,
the induction axiom.
Example 6 The operators ⊆S, ∪S and ∩S can be defined as follows:
(x)   (β ⊆S β′) =def ∀s. s ∈S β → s ∈S β′
(xx)  (β″ =S β ∪S β′) =def ∀s. s ∈S β″ ≡ (s ∈S β ∨ s ∈S β′)
(xxx) (β″ =S β ∩S β′) =def ∀s. s ∈S β″ ≡ (s ∈S β ∧ s ∈S β′)
(10)
On the other hand it is possible to use induction to prove properties of bags of timelines. For example, the following simple property:
∀β, β′, β″. β ⊆S β′ ∧ β′ ⊆S β″ → β ⊆S β″    (11)
can be proved by induction as follows:
Let:
ϕ(B0) = ∀β′, β″. B0 ⊆S β′ ∧ β′ ⊆S β″ → B0 ⊆S β″
ϕ(β) = ∀β′, β″. β ⊆S β′ ∧ β′ ⊆S β″ → β ⊆S β″
β° = B(s)
ϕ(β ∪S β°) = ∀β′, β″. (β ∪S β°) ⊆S β′ ∧ β′ ⊆S β″ → (β ∪S β°) ⊆S β″
(12)
Then:
a. ϕ(B0) ≡ ⊤   [by (G1) and (G2)]
b. (β ∪S β°) ⊆S β′ ∧ β′ ⊆S β″ → β ⊆S β′ ∧ β′ ⊆S β″   [by (G2), (x) and (xx) of Ex. 6 and Taut.]
c. (β ∪S β°) ⊆S β′ ∧ β′ ⊆S β″ → β° ⊆S β′ ∧ β′ ⊆S β″   [by (G2), (x) and (xx) of Ex. 6 and Taut.]
d. (β ⊆S β′) ∧ β′ ⊆S β″ → β ⊆S β″   [by Ind. Hyp.]
e. (β° ⊆S β′) ∧ β′ ⊆S β″ → β° ⊆S β″   [by Ind. Hyp.]
f. (β ⊆S β′) ∧ (β′ ⊆S β″) ∧ (β° ⊆S β′) → β° ⊆S β″ ∧ β ⊆S β″   [by d, e and Taut.]
g. β ⊆S β″ ∧ β° ⊆S β″ → (β ∪S β°) ⊆S β″   [by f, (G2), (x) and (xx) of Ex. 6]
h. (β ∪S β°) ⊆S β′ ∧ β′ ⊆S β″ → (β ∪S β°) ⊆S β″   [by b, g and Taut.]
i. ϕ(B0) ∧ (∀β ∀s. ϕ(β) ∧ ϕ(B(s)) → ϕ(β ∪S β°))   [by a, d, e, and g]
j. ∀β. ϕ(β)   [by i and (G5)]
(13)
A precedence relation ⊑S between two bags of timelines can be defined as follows:
β ⊑S β′ ≡ ∀s ∃s′. (s ∈S β → s′ ∈S β′ ∧ s ⊑ s′) ∧ ∀s′ ∃s. (s′ ∈S β′ → s ∈S β ∧ s ⊑ s′)    (14)
□
The axiomatisation of bags of timelines is sound. Let Ax3 = G1–G5 and A+ = Σ=ν ∪ Ax3; then:
Theorem 3 A+ ∪ Duna is satisfiable. □
So far we have extended the language of a basic theory of actions in the Situation Calculus to include time, types, a new equality symbol =ν and bags of timelines; the final language is thus LTFSC. The extended language in particular includes all the formulas inductively defined using also the following set of atoms, which, in turn, can be defined using the symbols =S and ∈S:
Definition 1 If β, β′ and β″ are terms of sort bag of timelines and s is a term of sort situation, then s ∈S β, β =S β′, β =S β′ ∪S β″, β =S β′ ∩S β″, β =S β′ \S β″, β ⊆S β′ and β ⊑S β′ are atoms of the extended language.
In [65] sets are often implicitly assumed, for example to define sets of actions with concurrent processes. Here the definition of sets of situations through bags of timelines is more involved. Indeed, we shall use them in Section 5 to build macro definitions of temporal compatibilities, from which we shall obtain the temporal network specifying time constraints and temporal relations. Macro definitions will then be reduced to sentences of the TFSC; therefore, to prove properties about these sentences we shall often use regression, a central computational mechanism in the Situation Calculus.
We introduce here the theorem ensuring that sentences mentioning bags of timelines are regressable, under restriction conditions analogous to those given in [61], and we refer the reader to the Appendix, page 54, for the details. Here by a regressable sentence and a k-uniform term we mean, respectively, a sentence and a term that satisfy the conditions specified in [61], suitably extended to bags of timelines (see the Appendix, Definition 4 and Definition 5, page 55). Let D+ be a basic action theory with Σ extended to A+ (see the above Theorem 3); then:
Theorem 4 Let φ(β1, . . . , βk) be a regressable sentence mentioning terms of sort bag of timelines. There exists a formula R(φ(β1, . . . , βk)) uniform in S0 such that
D+ |= R(φ(β1, . . . , βk)) ≡ φ(β1, . . . , βk)    (15)
4 The system at work: processes in TFSC
For each type Hi, encoding a system component, we assume that there exists a set of processes and a set of fluents describing the behaviours of the component. It follows that also these actions need to be specified for the type H(i, a) of each component i. Processes span the subtree of situations over a single interval between a start and an end action: for each process there are two actions, starting and ending the process, abbreviated by startπ, meaning "starts process π", and endπ, meaning "ends process π". To simplify the presentation we shall add to the start and end actions the type, which, in general, given H and =ν, is not needed.
A process is denoted by a fluent π(i, x⃗, t−, s), where i is for the type and t− for its start time. Successor state axioms for processes (Dπ) extend the set Dss of successor state axioms for fluents and are defined as follows:
π(i, x⃗, t−, do(a, s)) ≡ a = startπ(i, x⃗, t−) ∨ π(i, x⃗, t−, s) ∧ ∀t. a ≠ endπ(i, x⃗, t).    (16)
For example, the process for the component nav moving towards θ can be defined as:
move(nav, θ, t−, do(a, s)) ≡ a = startmove(nav, θ, t−) ∨ move(nav, θ, t−, s) ∧ ¬∃t′. a = endmove(nav, θ, t′).
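The successor state axiom (16), together with the Idle fluent and the preconditions introduced below, describes a simple per-component state machine: a start action activates a process while the component is idle, and the matching end action makes it idle again. The following sketch (ours, over timed start/end events of a single component) mirrors that progression.

    def progress(events):
        # events: list of (kind, process, time) with kind in {'start', 'end'},
        # for one component, ordered by time. Returns the process still active,
        # if any, and the (process, t_start, t_end) intervals of elapsed processes.
        active = None                 # the component is idling exactly when active is None
        elapsed = []
        for kind, proc, t in events:
            if kind == 'start':
                assert active is None, "a start is only possible while idling (cf. (19) below)"
                active = (proc, t)
            else:
                assert active is not None and active[0] == proc
                elapsed.append((proc, active[1], t))
                active = None
        return active, elapsed

    # do([start_move(nav, theta, 1), end_move(nav, theta, 4)], S0) on the nav timeline:
    print(progress([('start', 'move', 1), ('end', 'move', 4)]))
    # -> (None, [('move', 1, 4)]): the component is idling again and move is elapsed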
As usual (see [65]), a situation is defined to be executable as follows:
executable(s) =def ∀a, s′. do(a, s′) ⊑ s → Poss(a, s′)    (17)
On the other hand, given a set of processes related to a timeline T(i, s), their distribution on the timeline is controlled by the fluent Idle(i, s), telling whether no process of type i is being executed at the situation s. The successor state axiom for Idle is defined as follows:
Idle(i, do(a, s)) ≡ (s = S0 ∧ H(i, a) ∨ T(i, s) ∧ a =ν s) ∧ [⋁_{π∈Π} ∃x⃗, t. a = endπ(i, x⃗, t) ∨ (⋀_{π∈Π} ¬∃x⃗, t. a = startπ(i, x⃗, t)) ∧ Idle(i, s)].    (18)
That is, Idle(i, s) holds up to the start of a process and again after its end. We can then break down the processes along a timeline using the precondition axioms (Dap) as follows:
Poss(startπ(i, x⃗, t), s) ≡ (s = S0 ∨ s =ν startπ(i, x⃗, t)) ∧ Idle(i, s) ∧ time(s) ≤ t ∧ Φstart(i, x⃗, s);
Poss(endπ(i, x⃗, t), s) ≡ (s = S0 ∨ s =ν endπ(i, x⃗, t)) ∧ ∃t−. π(i, x⃗, t−, s) ∧ time(s) < t ∧ Φend(i, x⃗, s).    (19)
Here Φstart(i, x⃗, s) and Φend(i, x⃗, s) are the precondition formulas for the execution of startπ and endπ, respectively. These can possibly refer to other timelines, hence to other components. We do not investigate this possibility further here; instead we follow the approach in [43], where global constraints like e.g. Poss(a, s) → time(s) ≤ time(a) are specified by action preconditions of the form Poss(A(x⃗, t), s) ≡ Φ(x⃗, t, s) ∧ time(s) ≤ t. Indeed, here, time(s) < t is required for endπ(i, x⃗, t) in order to filter out durationless processes.
If a process of a component i is already active in S0, no other process of the same component can be active. This proviso is intuitive: each component of the complex system can execute one process at a time, and the component is idling only if none of its processes is active. This requirement is expressed by the following property, for all types i:
Idle(i, S0) ∨ ∃x⃗. π(i, x⃗, t0, S0) → ¬Idle(i, S0) ∧ ∀y⃗. ⋀_{π′∈Π, π′≠π} ¬π′(i, y⃗, t0, S0)    (processes consistency) (20)
If π(i, x⃗, t0, S0) holds for some x⃗ in S0, then this is the only active process of type i, hence ¬Idle(i, S0) holds too, because the i-component has an active process and so it is not idling. Note that there is no need to have a complete description of the initial situation DS0. Let us define Dπ to be the set of successor state axioms
for processes, the set of action precondition axioms for processes and the successor state axiom for Idle; let Dss ∪ Dap be the set of successor state axioms and action precondition axioms for fluents, and let DS0 be the set of formulas uniform in S0 such that the above requirement (20) is satisfied in DS0. Let DT be the theory formed by Dss ∪ Dap ∪ Dπ ∪ DS0 ∪ A+ ∪ Duna; then
Theorem 5 DT is satisfiable iff DS0 ∪ A+ ∪ Duna is. □
Given the successor state axioms (16) and (18), along with the preconditions (19), the processes consistency property (20) holds for each executable timeline (see (17)). We first show that any executable situation is a timeline. For this we may assume that the action preconditions for fluents (not processes) must be of the form:
Poss(A(x⃗, t), s) ≡ (s = S0 ∨ s =ν A(x⃗, t)) ∧ Φ(A(x⃗, t), s)    (21)
with A any action, possibly different from startπ and endπ.
Proposition 1 Let DT = Dss ∪ Dap ∪ Dπ ∪ DS0 ∪ A+ ∪ Duna; then any executable situation σ is a timeline. □
Using the above result we can state:
Proposition 2 Let DT be as in Theorem 5, such that (20) holds in DS0, and let σ be an executable situation; then for any process π and type i:
DT |= Idle(i, σ) ∨ ∃x⃗, t. π(i, x⃗, t, σ) → ¬Idle(i, σ) ∧ ∀y⃗. ⋀_{π′∈Π, π′≠π} ¬π′(i, y⃗, t, σ).    (22)
□
The precondition axioms in (19) compel executability only on timelines. This requirement does not impose that the preconditions of actions be unaffected by other timelines, as in fact these might be specified in the ΦQ, but simply that there exists a component able to execute an action sequence. This notion is useful for the generation of executable flexible plans. On the other hand, hybrid executability, both for processes and fluents, would require introducing a distinction between executability within the component (based on the preconditions (19)) and executability within the system. With two notions of executability at hand one could exploit non-timeline situations to reason about the system. For example, a situation like σ = do([startgo(nav, pos1, 1), startscan(pan, 2), endscan(pan, 3), endgo(nav, pos1, 5)], S0) could be used as a system log and exploited to infer properties about the overall system behaviour. Here we derive only the first notion of executability and we do not develop the latter.
5 Temporal Intervals and Constraints
So far we have given the basic formalism to model parallel processes that can be executed on timelines specified by different components. The way these processes interact in terms of time can be expressed by time constraints taken from the classical relations [2, 47, 6] between time intervals, see Figure 3.
Notation: In this section we shall denote process and fluent symbols with uppercase letters such as P, Q, . . . to treat them uniformly, while in the examples the names of processes are all indicated by lowercase letters. All the defined predicates are macros, hence they are not added to the language, and all the fluents appearing on the right-hand side of a definition, that is, in the definiens, are defined by a successor state axiom. This fact ensures that macros cannot be reduced to state constraints. A temporal interval is denoted by [t−, t+]. The temporal interval relations before, meets, overlaps, during, starts, finishes, equals are denoted by b, m, o, d, s, f, e respectively.
We represent the free temporal variables using the notation t to indicate that a temporal variable t occurs free; when necessary, we use τ to represent either a temporal variable that occurs free or a ground term instantiating a temporal variable. Further, we introduce the notation σ[ω] (β[ω] for bags of timelines) to explicitly denote a tuple ω = ⟨t1, t2, . . . , tn⟩ of free temporal variables mentioned in a situation σ (bag of timelines β).
Example 7 Consider the usual temporal interval relations b, m, o, d, s, f, e, as defined in the temporal interval literature initiated in [2]. Let pan-tilt and nav (for navigation) be two components of the system, with the process scan belonging to the pan-tilt component and the process stop belonging to the nav component. To express that the process scan can be performed while nav is stopped we would like to say scan d stop; this constraint should be encoded in a suitable TFSC formula mentioning the fluents scan(pan-tilt, t, s) and stop(nav, t, s). □
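Before encoding such constraints as macros, it is useful to recall that the basic relations can be recovered from interval endpoints alone; the classifier below is our own utility sketch (exact real endpoints, closed intervals), convenient for checking a constraint such as scan d stop against logged start and end times.

    def allen_relation(i1, i2):
        # Allen relation of i1 = (s1, e1) with respect to i2 = (s2, e2):
        # one of 'b', 'm', 'o', 's', 'd', 'f', 'e' or an inverse 'bi', 'mi', 'oi', 'si', 'di', 'fi'.
        (s1, e1), (s2, e2) = i1, i2
        if (s1, e1) == (s2, e2):
            return 'e'
        if e1 < s2:
            return 'b'
        if e2 < s1:
            return 'bi'
        if e1 == s2:
            return 'm'
        if e2 == s1:
            return 'mi'
        if s1 == s2:
            return 's' if e1 < e2 else 'si'
        if e1 == e2:
            return 'f' if s1 > s2 else 'fi'
        if s2 < s1 and e1 < e2:
            return 'd'
        if s1 < s2 and e2 < e1:
            return 'di'
        return 'o' if s1 < s2 else 'oi'

    assert allen_relation((3, 7), (2, 9)) == 'd'    # scan during stop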
We thus begin by defining two predicates Started and Ended, taking as arguments the process/fluent arguments together with the starting time and the ending time, respectively. For each process/fluent P, these predicates are defined as follows:
StartedP(i, x⃗, t−, a, s) =def P(i, x⃗, do(a, s)) ∧ ¬P(i, x⃗, s) ∧ time(a) = t−
EndedP(i, x⃗, t+, a, s) =def P(i, x⃗, s) ∧ ¬P(i, x⃗, do(a, s)) ∧ time(a) = t+    (23)
The meaning of StartedP is the following: a process P which does not hold in s is started by action a at time t−, thus becoming active. On the other hand, the meaning of EndedP is: a process P which currently holds in s is ended by action a at time t+, thus becoming elapsed.
We can now define explicitly the temporal characterisation of a process over a time lapse.
ActiveP(i, x⃗, t−, do(a, s)) =def T(i, do(a, s)) ∧ [StartedP(i, x⃗, t−, a, s) ∨ ActiveP(i, x⃗, t−, s) ∧ ¬∃t+. EndedP(i, x⃗, t+, a, s)]
ElapsedP(i, x⃗, t−, t+, do(a, s)) =def T(i, do(a, s)) ∧ [ElapsedP(i, x⃗, t−, t+, s) ∨ EndedP(i, x⃗, t+, a, s) ∧ ActiveP(i, x⃗, t−, s)]    (24)
The meaning of Active and Elapsed is intuitive: a process is active if it started at some time t− before the current time and it still holds, while it is elapsed if it was active at some earlier time but is no longer active.
We assume that at time t0, the time of S0, there is no record of past processes, but there might be active processes just started at time t0. This is expressed by the following definitions:
ActiveP(i, x⃗, t−, S0) =def P(i, x⃗, S0) ∧ time(S0) = t−
ElapsedP(i, x⃗, t−, t+, S0) =def ⊥    (25)
Example 8 For example, the interval during which the fluent at(nav, o, x, s) lasts (nav stands for the navigation component) can be described by Elapsed_at(nav, o, x, t^-, t^+, s) and Active_at(nav, o, x, t^-, s), defined as follows:

Elapsed_at(nav, o, x, t^-, t^+, do(a, s)) =def T(nav, do(a, s)) ∧ (Elapsed_at(nav, o, x, t^-, t^+, s) ∨ Ended_at(nav, o, x, t^+, a, s) ∧ Active_at(nav, o, x, t^-, s));
Active_at(nav, o, x, t^-, do(a, s)) =def T(nav, do(a, s)) ∧ (Started_at(nav, o, x, t^-, a, s) ∨ Active_at(nav, o, x, t^-, s) ∧ ¬∃t^+ Ended_at(nav, o, x, t^+, a, s)). □
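For readers who prefer an operational view, the following minimal Python sketch (our own illustration, not part of the TFSC formalisation) tracks which processes of one component are active and which have elapsed, given a stream of timestamped start/end events, mirroring the roles of Started_P, Ended_P, Active_P and Elapsed_P.

from dataclasses import dataclass, field

@dataclass
class Timeline:
    # process name -> start time t^- (the Active_P records)
    active: dict = field(default_factory=dict)
    # (process name, t^-, t^+) triples (the Elapsed_P records)
    elapsed: list = field(default_factory=list)

    def start(self, process, t):
        # Started_P: P did not hold before the action and holds afterwards.
        if process not in self.active:
            self.active[process] = t

    def end(self, process, t):
        # Ended_P: P held before the action and no longer holds; it becomes elapsed.
        if process in self.active:
            self.elapsed.append((process, self.active.pop(process), t))

nav = Timeline()
nav.start("at(nav,p1)", 0.0)
nav.end("at(nav,p1)", 1.5)          # records Elapsed_at(nav, p1, 0.0, 1.5)
nav.start("go(nav,p1,p2)", 1.5)     # Active_go(nav, p1, p2, 1.5) still open
print(nav.elapsed, nav.active)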
With the aid of Elapsed_P and Active_P we can represent the above interval relations between processes and fluents, specified in D_T by a TFSC formula F_op suitably built up from a combination of Elapsed_X and Active_X (where X denotes the fluent or process they refer to). Let op denote an interval relation:

P(i, \vec{x}, t_i^-, t_i^+) op Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] =def F_op(i, j, \vec{x}, \vec{y}, t_i^-, t_i^+, t_j^-, t_j^+, s, s').          (26)
In particular, we focus on the interval relations op ∈ {b, m, o, d, s, f, e}. Here, P(i, \vec{x}, t_i^-, t_i^+) op Q(j, \vec{y}, t_j^-, t_j^+) is a situation-suppressed expression that represents the interval relation between P and Q independently of the situation instances, while the expression [s, s'] restores the situations in the formula.
In the following example, we show how some of these relations can be represented in TFSC using the form (26).
Example 9 The interval relations m, f, s, and d can be macro-defined as follows.

i. Relation P(\vec{x}) m Q(\vec{y}):

P(i, \vec{x}, t_i^-, t_i^+) m Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] =def Elapsed_P(i, \vec{x}, t_i^-, t_i^+, s) → ((Active_Q(j, \vec{y}, t_j^-, s') ∨ Elapsed_Q(j, \vec{y}, t_i^-, t_j^+, s')) ∧ (t_i^+ = t_j^-)).

P(i, \vec{x}, t_i^-, t_i^+) m Q(j, \vec{y}, t_j^-, t_j^+) holds over the timelines s and s', with T(i, s) and T(j, s'), if, whenever P ends at t_i^+, Q starts at t_j^- with t_i^+ = t_j^-.
ii. Relation P(\vec{x}) f Q(\vec{y}):

P(i, \vec{x}, t_i^-, t_i^+) f Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] =def Elapsed_P(i, \vec{x}, t_i^-, t_i^+, s) → (Elapsed_Q(j, \vec{y}, t_i^+, t_j^+, s') ∧ (t_i^+ = t_j^+)).

P(i, \vec{x}, t_i^-, t_i^+) f Q(j, \vec{y}, t_j^-, t_j^+) holds over the timelines T(i, s) and T(j, s') if, whenever P ends at t_i^+, Q ends at t_j^+ with t_i^+ = t_j^+.
iii. Relation P(\vec{x}) s Q(\vec{y}):

P(i, \vec{x}, t_i^-, t_i^+) s Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] =def
  (Elapsed_P(i, \vec{x}, t_i^-, t_i^+, s) → (Active_Q(j, \vec{y}, t_j^-, s') ∨ Elapsed_Q(j, \vec{y}, t_i^-, t_j^+, s')) ∧ (t_i^- = t_j^-))
  ∨
  (Active_P(i, \vec{x}, t_i^-, s) → (Active_Q(j, \vec{y}, t_j^-, s') ∨ Elapsed_Q(j, \vec{y}, t_i^-, t_j^+, s')) ∧ (t_i^- = t_j^-)).

P(i, \vec{x}, t_i^-, t_i^+) s Q(j, \vec{y}, t_j^-, t_j^+) holds over the timelines T(i, s) and T(j, s') if, whenever P starts at t_i^- with argument \vec{x} along s, then Q starts at t_j^- = t_i^-, with argument \vec{y}, along s'.
iv. Relation P(\vec{x}) d Q(\vec{y}):

P(i, \vec{x}, t_i^-, t_i^+) d Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] =def
  (Elapsed_P(i, \vec{x}, t_i^-, t_i^+, s) → (Active_Q(j, \vec{y}, t_j^-, s') ∨ Elapsed_Q(j, \vec{y}, t_i^-, t_j^+, s')) ∧ (t_j^- ≤ t_i^- ∧ t_i^+ ≤ t_j^+))
  ∨
  (Active_P(i, \vec{x}, t_i^-, s) → (Active_Q(j, \vec{y}, t_j^-, s') ∨ Elapsed_Q(j, \vec{y}, t_i^-, t_j^+, s')) ∧ (t_j^- ≤ t_i^-)).
[Figure 3 about here: the Allen interval relations between two intervals x and y, namely x before y (y after x); x meets y (y met by x); x overlaps y (y overlapped by x); x starts y (y started by x); x during y (y contains x); x finishes y (y finished by x); x equals y.]
Figure 3: Allen Interval Relations
P(i, \vec{x}, t_i^-, t_i^+) d Q(j, \vec{y}, t_j^-, t_j^+) holds over the timelines T(i, s) and T(j, s') if, whenever P starts at t_i^- (and ends at t_i^+) with argument \vec{x} along s, then Q started at t_j^- ≤ t_i^- (and ends at t_j^+ ≥ t_i^+), with argument \vec{y}, along s'. □
5.1 Temporal Compatibilities: Syntax
Given the abbreviations described in the previous section, such as P(·) op Q(·), we can construct what we call compatibilities, which regulate how each process (or fluent) P_i behaves along the timelines. We denote compatibilities by comp(P_i, LLists), where P_i is either a process or a fluent symbol, and LLists is a list of lists, each named List, of pairs (op, P_j) composed of an interval relation op and a process or fluent symbol P_j. The set of temporal compatibilities for a given action theory D_T in TFSC is denoted T_c, and the syntax for their construction is given below:

T_c    ::= [ ] | [comp(P_i, LLists) | T_c];
LLists ::= [ ] | [List | LLists];
List   ::= [ ] | [(op, P_j) | List].
Example 10 A set of two compatibilities mentioning the interval relations m, d, b, e, s, binding the interaction between the processes P_1, P_2, P_3 and P_4, is defined as follows:

T_c = [ comp(P_1, [[(m, P_2), (d, P_3)], [(b, P_4), (e, P_3)]]),
        comp(P_2, [[(s, P_1), (m, P_4)], [(d, P_3)]]) ].

Here the compatibilities state that either (1) the process P_1 meets P_2 and is during P_3, or (2) P_1 is before P_4 and equals P_3; moreover, either (3) P_2 starts P_1 and meets P_4, or (4) it is during P_3.
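A compatibility term of this shape can be held in an ordinary nested data structure. The following Python sketch (an illustration under our own encoding, not the paper's implementation) stores the T_c of Example 10 following the grammar above, with one inner list per alternative.

# Each entry is (process symbol, LLists); each inner list is one alternative
# conjunction of (op, process) pairs, exactly as in the T_c grammar.
T_c = [
    ("P1", [[("m", "P2"), ("d", "P3")],      # P1 meets P2 and is during P3, or
            [("b", "P4"), ("e", "P3")]]),    # P1 is before P4 and equals P3
    ("P2", [[("s", "P1"), ("m", "P4")],      # P2 starts P1 and meets P4, or
            [("d", "P3")]]),                 # P2 is during P3
]

for process, llists in T_c:
    for alternative in llists:
        print(process, "constrained by one of:", alternative)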
5.2 Temporal Compatibilities: Semantics
Temporal compatibilities T_c, similarly to Golog constructs, are not first-class citizens of the language; their semantics is thus defined through TFSC macros. The time variables mentioned in the compatibilities (see the previous paragraph) play an important role in their construction, because task switching might not be fixed in advance, that is, constraints might be qualitatively but not metrically determined. For example, we may know that event A has to occur before B, without knowing the precise duration or timing. We show in Section 6 that this flexibility can be managed by temporal variables whose values are constrained by the temporal network associated with the described compatibilities. We recall that we indicate a free temporal variable by the notation t, and use the notation s[ω] to denote the occurrence of the free temporal variables ω = ⟨t_1, t_2, ..., t_n⟩ in a situation s. For example, the situation σ_1 = do(end_π(i, t_1), do(start_π(i, 1.5), S_0)) represents a process started at time 1.5 whose ending time is denoted by the free variable t_1.
The semantics of the temporal compatibilities defined above is specified by a predicate I(T_c, Σ[ω]) denoting the constraints associated with a bag of situations Σ, given the compatibilities T_c. The predicate I(T_c, Σ[ω]) is obtained by eliciting the time constraints on the variables ω, according to the tail-recursive construction illustrated below, which uses two further predicates I_1 and I_2:

I([ ], Σ) =def ⊤.
I([comp(P(i, \vec{x}), LLists) | T_c], Σ) =def I_1(comp(P(i, \vec{x}), LLists), Σ) ∧ I(T_c, Σ).          (27)
Here the induction is defined with respect to the list T_c of compatibilities: if T_c is empty then I is true; otherwise I is the conjunction of the predicate I_1, taking as argument the compatibility comp(P(i, \vec{x}), LLists) to be expanded, and the predicate I, taking as argument the remaining compatibilities T_c. The macro expansion proceeds as follows:

I_1(comp(P(i, \vec{x}), [ ]), Σ) =def ⊥.
I_1(comp(P(i, \vec{x}), [List | LLists]), Σ) =def I_2(P(i, \vec{x}), List, Σ) ∨ I_1(comp(P(i, \vec{x}), LLists), Σ).          (28)
The I_1 macro denotes the compatibilities of P(i, \vec{x}) and is defined as a disjunction of the I_2 macros described below. Each I_2(P(i, \vec{x}), List, Σ) macro collects the set of temporal constraints specified by the compatibility comp(P(i, \vec{x}), [List]) over the timelines mentioned in the bag of situations Σ:

I_2(P(i, \vec{x}), List, Σ) =def ∃s( s ∈ Σ ∧ T(i, s) ∧
    ∧_{(op, Q(j, \vec{y})) ∈ List} ∃s'( s' ∈ Σ ∧ T(j, s') ∧
        ∀\vec{x}, t_i^-, t_i^+ ∃\vec{y}, t_j^-, t_j^+ (P(i, \vec{x}, t_i^-, t_i^+) op Q(j, \vec{y}, t_j^-, t_j^+)[s, s']))).          (29)
The predicate I_2 is thus the bottom of the expansion, as it reduces to a conjunction of statements P(i, \vec{x}, t_i^-, t_i^+) op Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] that we already know how to transform into TFSC formulae (see Section 5). Note, in particular, that the bag of timelines Σ mentioned in I_2 serves to pick a pair of situations for each temporal constraint, according to its type i. That is, the expansion of I_2(P(i, \vec{x}), List, Σ) says that each element (op, Q(j, \vec{y})) ∈ List, specifying a temporal relation op with the process P(i, \vec{x}) holding over the timeline s ∈ Σ, holds over the timeline s' ∈ Σ compatibly with the constraint op.
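The expansion I → I_1 → I_2 can be mimicked by a simple recursive collection of pairwise constraints. The Python sketch below is an illustration under simplified assumptions (one timeline per component type, and compatibilities annotated with the types of both processes); it is not the macro expansion itself, which lives inside the logic.

def expand_I(compatibilities, bag):
    """compatibilities: list of (P, type_of_P, LLists), where each (op, (Q, type_of_Q))
    pair carries the type of Q; bag: dict mapping a component type to its timeline."""
    disjuncts_per_comp = []
    for P, i, llists in compatibilities:              # I: conjunction over compatibilities
        alternatives = []
        for alist in llists:                          # I_1: disjunction over the Lists
            conj = [(P, bag[i], op, Q, bag[j])        # I_2: one constraint per (op, Q) pair
                    for op, (Q, j) in alist]
            alternatives.append(conj)
        disjuncts_per_comp.append(alternatives)
    return disjuncts_per_comp

bag = {"nav": "sigma_nav", "eng": "sigma_eng"}
T_c = [("at", "nav", [[("d", ("stop", "eng"))]]),
       ("go", "nav", [[("e", ("run", "eng"))]])]
print(expand_I(T_c, bag))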
Example 11 Consider the timelines for the two components nav (for navigation) and eng (for engine) depicted in Figure 4. The involved compatibilities are represented by the following T_c term:

T_c = [ comp(at(nav, x), [[(d, stop(eng))]]),
        comp(go(nav, x, x'), [[(e, run(eng))]]) ].

Here T_c states that at(nav, x) d stop(eng), that is, the navigation component can be at a specified position x only while the engine is stopped. The temporal constraint go(nav, x, x') e run(eng), on the other hand, tells us that the processes go(nav, x, x') and run(eng) start and stop at the same time.
[Figure 4 about here: the nav timeline (at, go, at) built from S_0 by start_go and end_go, labelled by Elapsed_at(nav, p_1, t_0, t_1^-), Elapsed_go(nav, p_1, p_2, t_1^-, t_1^+) and Active_at(nav, p_2, t_1^+); the eng timeline (stop, run, stop) built by start_run and end_run, labelled by Elapsed_stop(eng, t_0, t_2^-), Elapsed_run(eng, t_2^-, t_2^+) and Active_stop(eng, t_2^+); the arrows carry the constraints t_1^- ≤ t_2^-, t_1^- = t_2^-, t_1^+ = t_2^+, and t_2^+ ≤ t_1^+.]
Figure 4: Timelines represented by Σ[t_1^-, t_1^+, t_2^-, t_2^+] = B({do([start_go(nav, p_1, p_2, t_1^-), end_go(nav, p_1, p_2, t_1^+)], S_0), do([start_run(eng, t_2^-), end_run(eng, t_2^+)], S_0)}), and temporal constraints defined by the macro I(T_c, Σ[ω]), with ω = ⟨t_1^-, t_1^+, t_2^-, t_2^+⟩. Each relation Elapsed and Active labelling the nav timeline implies the temporal constraint labelling the arrow.
The timelines in Figure 4 are designated by the following bag of situations:

Σ[ω] = B({ do([start_go(nav, p_1, p_2, t_1^-), end_go(nav, p_1, p_2, t_1^+)], S_0),
           do([start_run(eng, t_2^-), end_run(eng, t_2^+)], S_0) }),

where ω = ⟨t_1^-, t_1^+, t_2^-, t_2^+⟩ are time variables.
To establish the temporal constraints that hold over the timelines in Σ[ω] given the compatibilities T_c (see Figure 5), we can build the predicate I(T_c, Σ[ω]) on Σ, with time variables ω, according to definition (27), as follows:

I(T_c, Σ) =def I_1(comp(at(nav, x), [[(d, stop(eng))]]), Σ) ∧ I_1(comp(go(nav, x, x'), [[(e, run(eng))]]), Σ),

that is, macro-expanding I_1 in terms of I_2 (by (28)),

I(T_c, Σ) =def I_2(at(nav, x), [(d, stop(eng))], Σ) ∧ I_2(go(nav, x, x'), [(e, run(eng))], Σ).
According to the I_2 expansion above, we obtain:

I_2(at(nav, x), [(d, stop(eng))], Σ) =def ∃s(s ∈ Σ ∧ T(nav, s) ∧
    ∃s'(s' ∈ Σ ∧ T(eng, s') ∧ ∀x, t_1, t_2 ∃t_3, t_4 (at(nav, x, t_1, t_2) d stop(eng, t_3, t_4)[s, s'])));
I_2(go(nav, x, x'), [(e, run(eng))], Σ) =def ∃s(s ∈ Σ ∧ T(nav, s) ∧
    ∃s'(s' ∈ Σ ∧ T(eng, s') ∧ ∀x, y, t_1, t_2 ∃t_3, t_4 (go(nav, x, y, t_1, t_2) e run(eng, t_3, t_4)[s, s']))).
[Figure 5 about here: the temporal constraint network with nodes for the intervals at, go, at (nav timeline) and stop, run, stop (eng timeline); edges between corresponding intervals are labelled {d}, {e}, {d}, and each node carries the label {(0,0)}.]
Figure 5: Temporal constraint network associated with the compatibilities T_c = [comp(at(nav, x), [[(d, stop(eng))]]), comp(go(nav, x, x'), [[(e, run(eng))]])] and the timelines represented by Σ[t_1^-, t_1^+, t_2^-, t_2^+] = B({do([start_go(nav, p_1, p_2, t_1^-), end_go(nav, p_1, p_2, t_1^+)], S_0), do([start_run(eng, t_2^-), end_run(eng, t_2^+)], S_0)}).

Collecting everything together, we obtain that I(T_c, Σ) reduces to the following TFSC formula, denoting the temporal constraints relative to Σ and T_c (see Figure 4):
I(T_c, Σ) =def
  ∃s(s ∈ Σ ∧ T(nav, s) ∧
    ∃s'(s' ∈ Σ ∧ T(eng, s') ∧ ∀x, t_1, t_2 ∃t_3, t_4 (
      Elapsed_at(nav, x, t_1, t_2, s) →
        (Active_stop(eng, t_3, s') ∨ Elapsed_stop(eng, t_3, t_4, s')) ∧ (t_3 ≤ t_1 ∧ t_2 ≤ t_4) ∨
      Active_at(nav, x, t_1, s) →
        (Active_stop(eng, t_3, s') ∨ Elapsed_stop(eng, t_3, t_4, s')) ∧ (t_3 ≤ t_1))) ∧
    ∃s'(s' ∈ Σ ∧ T(eng, s') ∧ ∀x, t_1, t_2 ∃t_3, t_4 (
      Elapsed_go(nav, x, t_1, t_2, s) →
        (Active_run(eng, t_3, s') ∨ Elapsed_run(eng, t_3, t_4, s')) ∧ (t_1 = t_3 ∧ t_2 = t_4) ∨
      Active_go(nav, x, t_1, s) →
        (Active_run(eng, t_3, s') ∨ Elapsed_run(eng, t_3, t_4, s')) ∧ (t_1 = t_3)))).
Discussion. In the TFSC framework, parallel timelines are associated with their own sets of processes and fluents; therefore, processes and fluents belonging to different timelines influence each other mainly through temporal constraints. This shows that loosely coupled components and temporal constraints are what is needed to allow and capture flexible temporal behaviours. Indeed, in the TFSC framework, the starting and ending points of processes are not fixed and events associated with different components are not sequenced, hence only temporal constraints can be enforced. This approach allows us to (1) represent temporally flexible behaviours in their generality and (2) keep the simple structure of the basic theory of actions for each component. Notice also that fluents belonging to two separate components can easily be related through temporal compatibilities; e.g. specifying P(i, \vec{x}, t_1^-, t_1^+) e Q(j, \vec{y}, t_2^-, t_2^+) implies that whenever P(·) is on timeline i, Q(·) must be on timeline j. Furthermore, since the temporal compatibilities are expressed by a TFSC formula, it is possible to infer properties associated with parallel timelines and their constraints. E.g. it is possible to ask whether D_T |= ∃t, p [at(nav, p, t, σ_2) ∧ stop(eng, t, σ_1) ∧ σ_1 ∈ Σ ∧ σ_2 ∈ Σ] ∧ I(T_c, Σ), with Σ defined as specified in Example 11. This formula combines parallel processes and temporal constraints. In the next sections, we show how logical and temporal reasoning can be decoupled in TFSC.
6 Mapping TFSC to Temporal Constraint Networks
In this section, we introduce a transformation from the compatibility formula I(T_c, Σ), which has a macro definition within a background theory D_T of TFSC (see (26)), into a general temporal constraint network (TCN) [47]. More specifically, we show that, given a domain theory D_T, a set of temporal compatibilities T_c, and a bag of timelines Σ[ω], it is possible to build a temporal network as a disjunction of conjunctions of algebraic relations μ_op over time.
6.1 Temporal Constraint Network
A Temporal Constraint Network (TCN) is a formal structure for handling metric information. The general concept was first introduced by Dechter, Meiri and Pearl in [13] and then further extended in [14] and [47] to handle both quantitative and qualitative information. Temporal knowledge, represented by propositions, can be associated with intervals, and relationships between event timings can be represented by constraints. For example, the statement the robot was close to the door before it could see it, but it was still there after it had processed the images can be represented as:

(a) closeTo(r, d, t_1^-, t_1^+) b see(r, d, t_2^-, t_2^+)
(b) closeTo(r, d, t_1^-, t_1^+) a scan(r, d, t_2^-, t_2^+)
A TCN offers a simple representation schema for temporal statements, exploiting a temporal algebra of relations that can be expressed by a directed constraint graph, where each node is labelled by an event associated with a temporal interval, and directed edges between nodes denote the temporal constraints.
Essentially, a TCN involves a set of variables {t_1^-, t_1^+, ..., t_n^-, t_n^+}, with time intervals [t^-, t^+] representing the duration of specific events (e.g. closeTo), and a set R of binary constraints drawn from the 13 possible relationships that can hold between any pair of intervals [2]; these are illustrated in Figure 3. Note that constraints can be expressed disjunctively: for example, considering the events at(r, P_1) and moveTo(r, P_2), in the TCN we could express the statement at(r, P_1) {m, s} moveTo(r, P_2), saying that the event at(r, P_1) either starts or meets the event moveTo(r, P_2).
According to the underlying temporal algebra, TCNs can express different forms of reasoning; among the best known are the Point Algebra [78] and the metric point algebra [14]; an extensive overview can be found in [6].
Let {t_1^-, t_1^+, ..., t_n^-, t_n^+}, n ∈ N, be a set of time variables, where each pair of variables t_i^-, t_i^+ denotes an interval [t_i^-, t_i^+] possibly associated with some event, and let the TCN involve a set of binary constraints R = {op_1, ..., op_m}, m ∈ N. The temporal constraint network TCN, represented as a labelled directed graph, can be described using conjunctions and disjunctions of constraints as follows:

∨_{z ∈ Z} ∧_{i, j ∈ J_z} [t_{i,z}^-, t_{i,z}^+] op_{i,z} [t_{j,z}^-, t_{j,z}^+].          (30)
The assignment V = {⟨v_i^-, v_i^+⟩ | v_i^-(t_i^-) = s_i and v_i^+(t_i^+) = e_i, with s_i, e_i ∈ R^+, s_i < e_i} to the variables is called a solution if it satisfies all the constraints in R defining the TCN. The network is consistent if at least one solution exists (see [14]). A classification of the complexity of the satisfiability problem (specifically for Allen's interval algebra) has been given in [35], following earlier results of [52].
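As an illustration of what a solution is, the Python sketch below checks a candidate assignment of intervals against a small network whose edges are (possibly disjunctive) sets of interval relations; it uses the standard strict Allen semantics and is our own example, not an excerpt of [13, 14, 47].

ALLEN = {
    "b": lambda a, b: a[1] < b[0],                       # before
    "m": lambda a, b: a[1] == b[0],                      # meets
    "o": lambda a, b: a[0] < b[0] < a[1] < b[1],         # overlaps
    "d": lambda a, b: b[0] < a[0] and a[1] < b[1],       # during
    "s": lambda a, b: a[0] == b[0] and a[1] < b[1],      # starts
    "f": lambda a, b: a[1] == b[1] and b[0] < a[0],      # finishes
    "e": lambda a, b: a[0] == b[0] and a[1] == b[1],     # equals
}

def is_solution(assignment, constraints):
    """assignment: event -> (t-, t+); constraints: list of (x, set_of_ops, y)."""
    return all(iv[0] < iv[1] for iv in assignment.values()) and \
           all(any(ALLEN[op](assignment[x], assignment[y]) for op in ops)
               for x, ops, y in constraints)

V = {"at(r,P1)": (0.0, 2.0), "moveTo(r,P2)": (2.0, 5.0)}
print(is_solution(V, [("at(r,P1)", {"m", "s"}, "moveTo(r,P2)")]))   # True: they meet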
6.2 Mapping compatibilities to temporal constraints
Consider the macro definition (26) and the definitions of the temporal operators as given in Example 9. We have that

P(i, \vec{x}, t_i^-, t_i^+) op Q(j, \vec{y}, t_j^-, t_j^+)[s, s'] =def F_op(i, j, \vec{x}, \vec{y}, t_i^-, t_i^+, t_j^-, t_j^+, s, s'),

where F_op(·) is a formula of L_TFSC (the definiens) while P(·) op Q(·) is a macro of the pseudo-language (the definiendum). Clearly, by M, v |= P(·) op Q(·) we mean M, v |= F_op(·). Furthermore, each op can be given an algebraic interpretation γ_op as a temporal constraint à la Allen. Let op ∈ {b, m, o, d, s, f, e}; the algebraic relation γ_op interpreting op is defined as follows:
γ_b(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_i^+ ≤ t_j^-)          γ_m(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_i^+ = t_j^-)
γ_o(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_i^- ≤ t_j^- ∧ t_i^+ ≤ t_j^+)          γ_d(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_j^- ≤ t_i^- ∧ t_i^+ ≤ t_j^+)
γ_s(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_i^- = t_j^-)          γ_f(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_i^+ = t_j^+)
γ_e(t_i^-, t_i^+, t_j^-, t_j^+) =def (t_i^- = t_j^- ∧ t_i^+ = t_j^+)          (31)
Within the TFSC approach, the definiens F_op(·) is interpreted in the structures of the TFSC, letting the assignments to the temporal variables ω vary freely over these structures. However, we shall show that the TCN obtained from the predicate I(T_c, Σ[ω]) makes it possible to suitably constrain the values of these variables.
The following theorem states that the predicate I(T_c, Σ[ω]) can be transformed into a normal form, given a suitable indexing of the time variables with respect to the processes and the interval relations op.
Theorem 6 Let Σ[ω] be a bag of timelines mentioning a set of timelines {σ_1, ..., σ_n}, where each σ_i is a timeline term and where ω collects all the free variables in Σ. Then the predicate I(T_c, Σ[ω]) can be reduced to the following form:

∨_{z ∈ Z} ∧_{⟨q_1, q_2, q_3, q_4⟩ ∈ J_z} ∀\vec{x} ∃\vec{y}. P_{z,q_1}(i_{z,q_1}, \vec{x}, τ_{z,q_2}^-, τ_{z,q_2}^+) op_{z,q_3} Q_{z,q_3}(j_{z,q_3}, \vec{y}, τ_{z,q_4}^-, τ_{z,q_4}^+)[σ_{i_{z,q_1}}, σ_{j_{z,q_3}}].          (32)

Here, Z and J_z are finite sets of indexes and the τ_{i,j} are either free variables or ground terms mentioned in ω. □
Theorem 6 allows us to eliminate the temporal quantifiers in I(T_c, Σ[ω]), obtaining a normal form where the temporal terms τ_{i,j} are either temporal variables or ground terms. Here, the index z ranges over the disjuncts, while the other indexes q_1, ..., q_4 range over the possible conjuncts P_{z,q_1}, their associated temporal variables τ_{z,q_2}, the relations op_{z,q_3} with processes Q_{z,q_3}, and their variables τ_{z,q_4}.
To make the interval relations explicit, when i, j, σ_i, σ_j can be inferred from the context, we use the abbreviation φ_op(τ_k^-, τ_k^+, τ_g^-, τ_g^+) to denote the interval relation

∀\vec{x} ∃\vec{y}. P(i, \vec{x}, τ_k^-, τ_k^+) op Q(j, \vec{y}, τ_g^-, τ_g^+)[σ_i, σ_j].          (33)
Now, given the disjunction of conjunctions of interval relations as defined in (32), we need to make the underlying algebraic relations explicit. The algebraic interval relation associated with φ_op(τ_k^-, τ_k^+, τ_g^-, τ_g^+) also depends on the associated domain theory D_T. In fact, the left hand side P(·) of the interval relation P(·) op Q(·) works as the enabler of the interval constraint: if no process (or fluent) P(·) is either active or elapsed along the timelines, then the associated interval relation op is not applicable; otherwise, if P(·) is active or elapsed, the algebraic relation for op depends on whether P(·) is still active or has elapsed. More precisely, we distinguish the following three cases:

E_P : D_T |= ∃\vec{x}, t^-, t^+ Elapsed_P(i, \vec{x}, t^-, t^+, σ_i) ∧ σ_i ∈ Σ,
A_P : D_T |= ∃\vec{x}, t^- Active_P(\vec{x}, t^-, σ_i) ∧ σ_i ∈ Σ,
N_P : neither E_P nor A_P holds.          (34)
Given these cases, we can introduce the algebraic relation μ_op(t_k^-, t_k^+, t_g^-, t_g^+) associated with φ_op(t_k^-, t_k^+, t_g^-, t_g^+) as follows:

m_1: if N_P holds then, for the given op, no temporal constraint is imposed;
m_2: if E_P holds then, for the given op, μ_op = γ_op;          (35)
m_3: if A_P holds then, for op ∈ {m, f}, no temporal constraint is imposed; for the remaining cases,
     μ_b = γ_b,   μ_e = (t_k^- = t_g^-),   μ_o = (t_k^- ≤ t_g^-),   μ_d = (t_g^- ≤ t_k^-).
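The case analysis (m_1)-(m_3) is easy to phrase as a table lookup. The following Python sketch (our own illustration, with γ_op taken from (31)) returns the algebraic check μ_op to be imposed for an interval relation, given whether the enabling process P has elapsed (E_P), is still active (A_P), or neither (N_P).

GAMMA = {   # the algebraic interpretations gamma_op of (31)
    "b": lambda km, kp, gm, gp: kp <= gm,
    "m": lambda km, kp, gm, gp: kp == gm,
    "o": lambda km, kp, gm, gp: km <= gm and kp <= gp,
    "d": lambda km, kp, gm, gp: gm <= km and kp <= gp,
    "s": lambda km, kp, gm, gp: km == gm,
    "f": lambda km, kp, gm, gp: kp == gp,
    "e": lambda km, kp, gm, gp: km == gm and kp == gp,
}

RELAXED = { # the weakened forms used in rule (m_3), where t_k^+ is still unknown
    "b": GAMMA["b"],
    "e": lambda km, kp, gm, gp: km == gm,
    "o": lambda km, kp, gm, gp: km <= gm,
    "d": lambda km, kp, gm, gp: gm <= km,
}

def mu(op, status):
    """status: 'E' (Elapsed_P), 'A' (Active_P) or 'N' (neither), as in (34)."""
    if status == "N":
        return None                      # m_1: no temporal constraint imposed
    if status == "E":
        return GAMMA[op]                 # m_2: mu_op = gamma_op
    if op in ("m", "f"):
        return None                      # m_3: no constraint for m, f while P is active
    return RELAXED.get(op)               # m_3: remaining cases listed in (35)

check = mu("d", "A")                     # e.g. at(nav,x) d stop(eng) with at(...) still active
print(check(1.0, None, 0.0, None))       # only t_g^- <= t_k^- is enforced: True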
The following theorem shows the relation between µop and ϕop .
Theorem 7 Let μ_op be any of the algebraic interval relations defined above, let M = (D, I) be a structure of L_TFSC such that M is a model of D_T, and suppose that for some assignment v to the free temporal variables the following hold:

(i) M, v |= ∀\vec{x} ∃\vec{y}. P(i, \vec{x}, τ_p^-, τ_p^+) op Q(j, \vec{y}, τ_q^-, τ_q^+)[σ_i, σ_j],
(ii) M, v |= ∃\vec{x} Elapsed_P(i, \vec{x}, τ_p^-, τ_p^+, σ_i)  or  M, v |= ∃\vec{x} Active_P(\vec{x}, τ_p^-, σ_i).

Here σ_i, σ_j are ground timelines of type i and j, with τ_p^{+(I,v)} = d_p^+, τ_p^{-(I,v)} = d_p^-, τ_q^{+(I,v)} = d_q^+, τ_q^{-(I,v)} = d_q^-, and d_p^+, d_p^-, d_q^-, d_q^+ are elements of the domain D canonically interpreted in R^+.
Then the algebraic relation μ_op holds on ⟨d_p^-, d_p^+, d_q^-, d_q^+⟩. □
The theorem says that, given conditions (i) and (ii) for a temporal relation ∀\vec{x}∃\vec{y} P(·) op Q(·), the algebraic relation μ_op holds on the same time values. Note that condition (i) of Theorem 7 states that the temporal relation ∀\vec{x}∃\vec{y} P(·) op Q(·) is consistent with respect to the theory D_T and the bag of timelines Σ. Condition (ii), instead, plays the role of conditions m_2 and m_3.
6.3 Compatibilities without logical constraints
We now introduce a notion of consistency which depends only on the logical structure, independently of the temporal constraints. Considering again the macro definition (26) and the definitions of the temporal operators as given in Example 9, by {P(i, \vec{x}, τ_k^-, τ_k^+) op Q(j, \vec{y}, τ_g^-, τ_g^+)[σ_i, σ_j]}^Λ we indicate the formula obtained from P(i, \vec{x}, τ_k^-, τ_k^+) op Q(j, \vec{y}, τ_g^-, τ_g^+)[σ_i, σ_j] by excluding its temporal constraints. For example, for op = m,

{P(i, \vec{x}, t_i^-, t_i^+) m Q(j, \vec{y}, t_j^-, t_j^+)[s, s']}^Λ = Elapsed_P(i, \vec{x}, t_i^-, t_i^+, s) → (Active_Q(j, \vec{y}, t_j^-, s') ∨ Elapsed_Q(j, \vec{y}, t_i^-, t_j^+, s')).
We can now introduce the notion of a partially consistent interval relation.

Definition 2 Given a TFSC theory D_T and a bag of timelines Σ, the interval relation P(i, \vec{x}, τ_k^-, τ_k^+) op Q(j, \vec{y}, τ_g^-, τ_g^+)[σ_i, σ_j] is partially consistent with respect to D_T if there exist a model M of D_T and an assignment v to the free temporal variables such that

M, v |= {P(i, \vec{x}, τ_k^-, τ_k^+) op Q(j, \vec{y}, τ_g^-, τ_g^+)[σ_i, σ_j]}^Λ. □
This notion allows us to separate the temporal and the logical structure associated with the compatibilities, and it will be exploited in the construction of the network illustrated in the next subsection. We are now ready, reversing the process and using the results of Theorem 6 and Theorem 7, to show how the temporal constraint network can be built from the compatibility constraint predicate I(T_c, Σ[ω]).
6.4 Network Construction
The network construction proceeds as follows.
Temporal Network
  Temporal relations:  μ_op(τ_p^-, τ_p^+, τ_q^-, τ_q^+)
  Temporal constraint network:  ∨ ∧ μ_op(τ_p^-, τ_p^+, τ_q^-, τ_q^+)
  Assignment V, solution of the network:  V = {⟨v_i^-, v_i^+⟩ | v_i^-(t_i^-) = s_i, v_i^+(t_i^+) = e_i, with s_i, e_i ∈ R^+, s_i < e_i}

Compatibilities and constraints in TFSC
  Temporal variables:  ω = ⟨t_1^-, t_1^+, ..., t_n^-, t_n^+⟩, n ∈ N
  Compatibility term:  T_c
  Constraint formula for Σ with free variables in ω:  I(T_c, Σ[ω])

Mapping between TFSC and TCN
  Mapping of temporal constraints:  ζ : (D_T, T_c, Σ[ω]) → TCN
  Mapping of ordering constraints:  Ord : Σ[ω] → TCN

Table 2: Notation for the TFSC and TCN mapping.
(a) Transform the predicate I(T_c, Σ[ω]) into disjunctive normal form, as indicated in Theorem 6, obtaining the form (32):

∨_{z ∈ Z} ∧_{⟨q_1, q_2, q_3, q_4⟩ ∈ J_z} ∀\vec{x} ∃\vec{y}. P_{z,q_1}(i_{z,q_1}, \vec{x}, τ_{z,q_2}^-, τ_{z,q_2}^+) op_{z,q_3} Q_{z,q_3}(j_{z,q_3}, \vec{y}, τ_{z,q_4}^-, τ_{z,q_4}^+)[σ_{i_{z,q_1}}, σ_{j_{z,q_3}}].

(b) Let τ_{1,1}^-, τ_{1,1}^+, ..., τ_{n,m}^-, τ_{n,m}^+ be the time variables, or instances thereof, mentioned in the conjuncts ∀\vec{x}∃\vec{y}. P(·) op Q(·) of (32).

(c) For each conjunct

∀\vec{x} ∃\vec{y}. P_{z,q_1}(i_{z,q_1}, \vec{x}, τ_{z,q_2}^-, τ_{z,q_2}^+) op_{z,q_3} Q_{z,q_3}(j_{z,q_3}, \vec{y}, τ_{z,q_4}^-, τ_{z,q_4}^+)[σ_{i_{z,q_1}}, σ_{j_{z,q_3}}],

if it is partially consistent with D_T then, given the cases E_P, A_P, and N_P specified in (34), we can define the corresponding μ_{op_{z,q_3}}^{q_1} as specified by rules (m_1)-(m_3) in (35). Otherwise, if it is not partially consistent in D_T, the associated index z is collected in a set of indexes Z*, that is, z ∈ Z*.

(d) From the disjunctive normal form (32) we can thus obtain the algebraic relations

∨_{z ∈ Z'} ∧_{(q_1, q_2, q_3, q_4) ∈ J_z} μ_{op_{z,q_3}}^{q_1}(τ_{z,q_2}^-, τ_{z,q_2}^+, τ_{z,q_4}^-, τ_{z,q_4}^+).

Here Z' stands for Z \ Z* and collects only the indexes not excluded at step (c) (i.e. Z' collects only the consistent disjuncts).

Here, μ_{op_{z,q_3}}^{q_1} is the algebraic temporal relation indexed by z and q_3 (as op_{z,q_3} in the form (32)) and by q_1, because it depends on P_{z,q_1} according to rules (m_1)-(m_3). That is, given the domain theory D_T and the bag of timelines Σ, the above algebraic relation is obtained from the form (32) once each conjunct ∀\vec{x}∃\vec{y}. P(·) op Q(·) is substituted as specified at step (c). The formula ∨ ∧ μ_{op_{z,q_3}}^{q_1}(·,·,·,·) is, indeed, the temporal constraint network implicitly represented by I(T_c, Σ[ω]) given D_T. In other words, this network is a labelled directed graph which can be described using conjunctions and disjunctions of temporal constraints. Therefore, following the notation introduced in Section 6.1, we denote this temporal network by ζ(D_T, T_c, Σ[ω]), since it depends on D_T, T_c, and Σ[ω].
Notice that in the temporal network ζ(D_T, T_c, Σ[ω]) obtained by the (a)-(d) transformation, the time ordering constraints enforced by the situations mentioned in the bag of timelines Σ[ω] are not made explicit, although they hold in force of Lemma 2 (see Section 3). Therefore, to obtain the complete set of time constraints implicit in Σ[ω], given T_c and D_T, we have to consider also the temporal ordering defined by the structure of the bag of timelines Σ[ω]. That is, for any situation σ[ω'] ∈ Σ[ω] (where ω' are the variables mentioned in σ), if σ_1[ω_1] ⊑ σ_2[ω_2] ⊑ σ[ω'], then time(σ_1[ω_1]) ≤ time(σ_2[ω_2]) ≤ time(σ[ω']). Given the time variables t_1^-, t_1^+, ..., t_m^-, t_m^+ used for the time variables ω, we call this set of ordering constraints Ord(Σ[ω]). For example, considering again the bag of timelines depicted in Figure 4, Ord(Σ[ω]) = (t_0 ≤ t_1^- ∧ t_1^- ≤ t_1^+) ∧ (t_0 ≤ t_2^- ∧ t_2^- ≤ t_2^+) collects the ordering constraints associated with the two timelines in Σ[ω].
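Putting the two pieces together, the small Python sketch below illustrates network(D_T, T_c, Σ[ω]) = ζ ∧ Ord, under an obviously simplified, purely conjunctive encoding of ζ chosen for illustration only: the μ-constraints produced by steps (a)-(d) are conjoined with the ordering constraints read off the structure of each timeline.

def ord_constraints(timelines):
    """timelines: dict component -> list of time variables in timeline order."""
    cons = []
    for ts in timelines.values():
        cons += [(ts[i], "<=", ts[i + 1]) for i in range(len(ts) - 1)]
    return cons

def network(zeta, timelines):
    # network(D_T, T_c, bag) = zeta(D_T, T_c, bag) AND Ord(bag), as in (36)
    return list(zeta) + ord_constraints(timelines)

# The Figure 4 scenario: go(nav) e run(eng) yields the two equalities of zeta,
# and the structure of the two timelines yields Ord.
timelines = {"nav": ["t0", "t1-", "t1+"], "eng": ["t0", "t2-", "t2+"]}
zeta = [("t1-", "==", "t2-"), ("t1+", "==", "t2+")]
print(network(zeta, timelines))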
Corollary 2 Let ζ(D_T, T_c, Σ[ω]) be the temporal constraint network obtained from I(T_c, Σ[ω]) according to steps (a)-(d) of the network construction, and let M be a structure of TFSC which is a model of D_T. Let V = {⟨v_i^-, v_i^+⟩ | v_i^-(t_i^-) = s_i and v_i^+(t_i^+) = e_i, with s_i, e_i ∈ R^+, s_i < e_i} be an assignment to the time variables. Then V is a solution for the network iff, for any assignment v to the free temporal variables of Σ[ω] which agrees with V, M, v |= I(T_c, Σ[ω]). □
The corollary says that we can use the temporal constraint network as a service for the theory of actions, fixing the temporal constraints between processes and fluents. Given the compatibility constraints and the ordering constraints Ord introduced above, we denote by network(D_T, T_c, Σ[ω]) the conjunction of the temporal constraint network and the ordering constraints:

network(D_T, T_c, Σ[ω]) = ζ(D_T, T_c, Σ[ω]) ∧ Ord(Σ[ω]).          (36)

We shall not discuss in this paper methods for simplifying the constraints, nor for computing a satisfiable set of time values for a temporal network.
7 Flexible High Level Programming in TFGolog
In this section, we introduce the syntax and the semantics of a TFGolog interpreter that can be used to generate
a temporal constraint network and the related flexible temporal plan.
7.1 TFGolog Syntax
Given the extended action theory presented above, the following constructs inductively build TFGolog programs:

1. Primitive action: α.
2. Nondeterministic choice: α|β. Do α or β.
3. Test action: φ?. Test whether φ is true in the current bag of timelines.
4. Nondeterministic argument choice: π(\vec{x}, p(\vec{x})). Choose an \vec{x} for p(\vec{x}).
5. Action sequence: p_1; p_2. Do p_1 followed by p_2.
6. Partial-order action choice: p_1 ≺ p_2. Do p_1 before p_2.
7. Parallel execution: p_1 ∥ p_2. Do p_1 concurrently with p_2.
8. Conditionals: if φ then p_1 else p_2.
9. Nondeterministic iteration: p*. Do p n times, with n ≥ 0.
10. While loops: while φ do p_1.
11. Procedures, including recursion.
Hence, compared to Golog, here we also have the parallel execution and partial-order operators, which can be defined over parallel timelines.
Example 12 Considering again the two components nav (for navigation) and eng (for engine) depicted in Figure 4, a possible TFGolog program encoding the robot task of approaching position pos within the time interval d can be written as follows:

proc(approach(d, pos), π(t_1, π(t_2, π(t_3,
  [π(x, start_go(nav, x, pos, t_1)) ≺ (at(nav, pos)?) ∥ start_run(eng, t_3) ; end_run(eng, t_2) ; (t_2 - t_1 < d)?])))
).

Here we are stating that the robot starts going to pos at time t_1, meanwhile the engine starts working at time t_3, and the engine is switched off at time t_2, upon arrival at pos. Notice that end_go is not explicitly specified, but it should be inferred by the interpreter because it is needed to satisfy at(nav, pos).
7.2 TFGolog Semantics
The semantics of a TFGolog program p with respect to D_T can be defined in the TFSC. Given an initial bag of timelines Σ and an interval (h_s, h_e) specifying the time horizon over which the program is to be instantiated, the execution trace Σ' of a program p is recursively defined through the macro DoTF(p, Σ, Σ', (h_s, h_e)).
First, we introduce some further notation extending the Golog abbreviations to bags of timelines:

Σ' = ddo(a, s, Σ) =def (s ∈ Σ ∧ a =_ν s) ∧ (Σ' = (Σ \_S B({s})) ∪_S B({do(a, s)})).          (37)

Here, the difference and union operations are the ones already defined for bags of timelines, as shown in Example 6 and Definition 1. Notice that, for a = A(\vec{x}, t) with t a free variable, Σ' equals ddo(A(\vec{x}, t), σ, Σ) with Σ' mentioning the free variable t.
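Concretely, ddo and ttime amount to a timeline replacement and a maximum over the bag. The Python sketch below (with situations simplified to tuples of timestamped actions, and type checks omitted) illustrates both; it is only an illustration of the bag operations, not of the full macro.

def do(action, s):
    return s + (action,)                       # do(a, s)

def ddo(action, s, bag):
    assert s in bag                            # the extended timeline must belong to the bag
    return (bag - {s}) | {do(action, s)}       # (bag \_S B({s})) U_S B({do(a, s)})

def time_of(s):
    return s[-1][1] if s else 0.0              # time of the last action, 0 at S_0

def ttime(bag):
    return max(time_of(s) for s in bag)        # (38): the latest time over the bag

bag = frozenset({(), (("start_run(eng)", 1.0),)})   # two timelines: S_0 and one with an action
bag2 = ddo(("start_go(nav)", 2.0), (), bag)
print(ttime(bag2))                             # -> 2.0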
Furthermore, we extend the function time to bags of timelines as follows:

ttime(Σ) =def max{time(σ) | σ ∈ Σ}.          (38)

Here max{·} is definable by a first order formula; for example, for two elements it would be defined as follows:

max{a, b} = x =def (x = a ∧ a ≥ b) ∨ (x = b ∧ b > a).
Further, we define the executability of a bag of timelines over a specified horizon (h_s, h_e) as

exec(Σ, Σ', (h_s, h_e)) =def (Σ = Σ' ∧ ttime(Σ) ≥ h_s ∧ ttime(Σ) ≤ h_e) ∨
  ∃Σ'', s, a (Σ' = ddo(a, s, Σ'') ∧ executable(do(a, s)) ∧ exec(Σ, Σ'', (h_s, h_e)) ∧ time(a) ≥ h_s ∧ time(a) ≤ h_e).          (39)

exec(Σ, Σ', (h_s, h_e)) states that Σ' is an executable extension of Σ. The definition is inductive with respect to Σ': the base case is Σ' = Σ, and the inductive step is given for Σ' = ddo(a, s, Σ''), assuming exec(Σ, Σ'', (h_s, h_e)), with do(a, s) ∈ Σ' an executable timeline such that s ∈ Σ''.
We can now specify DoTF(p, Σ, Σ', (h_s, h_e)) as follows.
1. Primitive action with horizon:

DoTF(a, Σ, Σ', (h_s, h_e)) =def ∃s(s ∈ Σ ∧ a =_ν s ∧ Poss(a, s) ∧ time(s) ≥ h_s ∧ time(s) ≤ h_e ∧ time(a) ≥ time(s) ∧
  (time(a) ≤ h_e ∧ Σ' = ddo(a, s, Σ) ∨ time(a) > h_e ∧ Σ = Σ')).

Here, if the primitive action is applicable to s ∈ Σ but a can only be scheduled after the horizon, then it is neglected along with the rest of the program (i.e. each action which can start only after the horizon may be neglected; this temporal planning strategy is employed in several timeline-based planners, e.g. [51, 33]); a code sketch of this case is given after this list. Notice that Poss(a, s) requires a and s to be of the same type. Notice also that, for a = A(\vec{x}, t) with t a free variable, the free variable t is mentioned in Σ'. We recall that Σ[ω] denotes a bag of situations with free variables ω.
2. Program sequence:

DoTF(prog_1; prog_2, Σ, Σ', (h_s, h_e)) =def ∃Σ''(DoTF(prog_1, Σ, Σ'', (h_s, h_e)) ∧ DoTF(prog_2, Σ'', Σ', (h_s, h_e))).

Here, the second program prog_2 is executed starting from the execution Σ'' of the first program prog_1.
3. Partial-order action choice:

DoTF(prog_1 ≺ prog_2, Σ, Σ', (h_s, h_e)) =def ∃Σ'', Σ'''(DoTF(prog_1, Σ, Σ'', (h_s, h_e)) ∧ DoTF(prog_2, Σ''', Σ', (h_s, h_e)) ∧ Σ'' ⊑_S Σ''' ∧ exec(Σ'', Σ''', (h_s, h_e))).

Here, given the execution Σ'' of the first program prog_1, the second program prog_2 is executed starting from an executable extension Σ''' of Σ''. If Σ'' = Σ''' then we obtain the sequence case.
4. Parallel execution:

DoTF(prog_1 ∥ prog_2, Σ, Σ', (h_s, h_e)) =def ∃Σ''(DoTF(prog_1, Σ, Σ', (h_s, h_e)) ∧ DoTF(prog_2, Σ, Σ'', (h_s, h_e)) ∧ Σ' = Σ'').

The parallel execution of two programs from Σ, under the horizon (h_s, h_e), is specified by the conjunction of the executions of the two programs over the timelines Σ' and Σ''. The execution is correct iff the obtained bags of timelines are equal.
5. Test action:

DoTF(φ?, Σ, Σ', (h_s, h_e)) =def φ[Σ] ∧ Σ = Σ'.

Here φ[Σ] stands for a generalisation of the standard φ[s] (in the TFSC) to bags of timelines, e.g. P_1[Σ] ∧ P_2[Σ] stands for P_1(s_1) ∧ P_2(s_2) with s_1, s_2 ∈ Σ, i.e. each fluent is evaluated with respect to its specific timeline.
6. Nondeterministic action choice:

DoTF(prog_1 | prog_2, Σ, Σ', (h_s, h_e)) =def DoTF(prog_1, Σ, Σ', (h_s, h_e)) ∨ DoTF(prog_2, Σ, Σ', (h_s, h_e)).

Here, analogously to standard Golog, the execution of the action choice is represented as the disjunction of the two possible executions.
7. Nondeterministic argument selection:

DoTF(π(x, prog(x)), Σ, Σ', (h_s, h_e)) =def ∃x DoTF(prog(x), Σ, Σ', (h_s, h_e)).

The execution of the nondeterministic argument selection is represented as in standard Golog.
8. Conditionals:

DoTF(if φ then prog_1 else prog_2, Σ, Σ', (h_s, h_e)) =def φ[Σ] ∧ DoTF(prog_1, Σ, Σ', (h_s, h_e)) ∨ ¬φ[Σ] ∧ DoTF(prog_2, Σ, Σ', (h_s, h_e)).
9. Nondeterministic iteration:

DoTF(prog*, Σ, Σ', (h_s, h_e)) =def ∀P{∀Σ_1 P(Σ_1, Σ_1) ∧ ∀Σ_1, Σ_2, Σ_3 [P(Σ_1, Σ_2) ∧ DoTF(prog, Σ_2, Σ_3, (h_s, h_e)) → P(Σ_1, Σ_3)]} → P(Σ, Σ').
10. The semantics of while loops and procedures is defined in the usual way.
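As announced under case 1, the following Python sketch illustrates how a primitive action extends a bag of timelines under a horizon, neglecting actions that can only be scheduled after h_e. It is an illustration of that single case, not of the full interpreter; the bag is kept as a dict from component to a list of timestamped actions, and poss and component_of are caller-supplied stand-ins for Poss and the type check a =_ν s.

def dotf_primitive(action, t, bag, hs, he, poss, component_of):
    comp = component_of(action)            # a =_nu s: the action extends its own component
    s = bag.get(comp)
    if s is None or not poss(action, s) or t < hs:
        return None                        # not executable within the horizon
    if t > he:
        return dict(bag)                   # schedulable only after h_e: neglected
    if s and s[-1][1] > t:
        return None                        # time may not decrease along a timeline
    extended = dict(bag)
    extended[comp] = s + [(action, t)]     # ddo(a, s, bag)
    return extended

bag0 = {"nav": [], "eng": []}
bag1 = dotf_primitive("start_run(eng)", 1.0, bag0, 0.0, 10.0,
                      poss=lambda a, s: True,
                      component_of=lambda a: a[a.index("(") + 1:-1])
print(bag1)    # {'nav': [], 'eng': [('start_run(eng)', 1.0)]}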
We now show that, given two fully ground bags of timelines Σ_init and Σ such that DoTF(prog, Σ_init, Σ, (h_s, h_e)), the timelines in the bag Σ complete the timelines in Σ_init. Furthermore, we show that if the initial bag of timelines Σ_init mentions only executable timelines, then Σ mentions only executable timelines too.

Proposition 3 Let D_T |= DoTF(prog, Σ_init, Σ, (h_s, h_e)) with ttime(Σ) ≤ h_e and h_s ≤ ttime(Σ_init); then Σ_init ⊑_S Σ. □

Proposition 4 Let D_T |= DoTF(prog, Σ_init, Σ, (h_s, h_e)) with ttime(Σ) ≤ h_e and h_s ≤ ttime(Σ_init). If every σ' ∈ Σ_init is executable, then every σ ∈ Σ is executable. □
7.3 Generating Flexible Plans in TFGolog
In this section, we describe how TFGolog programs characterise temporally flexible execution traces represented by bags of timelines.
The macro DoTF(prog, Σ_init, Σ, (h_s, h_e)) defined above characterises the bags of timelines that are executable extensions of Σ_init within the horizon (h_s, h_e), but temporal constraints are not considered. However, a correct extension of Σ_init should also satisfy the temporal constraint network induced by the compatibilities T_c (see Section 5) and represented by I(T_c, Σ[ω]) in TFSC (Corollary 2). Therefore, we introduce the following notion of temporally flexible execution of a TFGolog program.
Definition 3 Let D_T be a domain theory, T_c a set of compatibilities, prog a TFGolog program, (h_s, h_e) a horizon, and Σ_init an initial fully ground bag of situations. Let Σ[ω] be a bag of timelines with free temporal variables ω. Σ[ω] is a temporally flexible execution of prog if the following holds:

D_T |= ∃t_1, ..., t_n. DoTF(prog, Σ_init, Σ[t_1, ..., t_n], (h_s, h_e)) ∧ I(T_c, Σ[t_1, ..., t_n]).          (40) □
The following proposition shows that, given a temporally flexible execution of prog, the possible assignments to ω can be characterised by the solutions of the temporal constraint network network(D_T, T_c, Σ[ω]).
Proposition 5 Let D_T, T_c, prog, and Σ[ω] be, respectively, a TFSC domain theory, a set of compatibilities, a TFGolog program, and a bag of timelines with free temporal variables ω, and let M be a model of D_T. Furthermore, let V be the set of assignments which are solutions of network(D_T, T_c, Σ[ω]) and let A be the set of assignments to the temporal variables for M. Given a ground bag of timelines Σ_init:

v ∈ A and M, v |= DoTF(prog, Σ_init, Σ[ω], (h_s, h_e)) ∧ I(T_c, Σ[ω])
iff
v ∈ V and M, v |= DoTF(prog, Σ_init, Σ[ω], (h_s, h_e)). □
As a consequence of the definition of temporally flexible execution and of the above statement, we have the following corollary, which directly follows from Proposition 5.

Corollary 3 Let D_T, T_c, and prog be, respectively, a domain theory, a set of compatibilities, and a TFGolog program. Let (h_s, h_e) be a horizon, Σ_init an initial fully ground bag of situations, and Σ[ω] a temporally flexible execution of prog. Given any model M of D_T and any v such that M, v |= DoTF(prog, Σ_init, Σ[ω], (h_s, h_e)) ∧ I(T_c, Σ[ω]), we have that v ∈ V, hence it is a solution of network(D_T, T_c, Σ[ω]). □

In other words, this corollary states that, given a temporally flexible execution Σ[ω], the possible assignments to ω are the solutions V of network(D_T, T_c, Σ[ω]).
Once we have established the relation between a temporally flexible execution and the network, we may want to explicitly represent the solutions in the signature of L_TFSC. Suppose we obtain, as a solution of the network, a tuple of real numbers m = ⟨m_1, ..., m_n⟩; then there are two possibilities. If m is a tuple of rational numbers, they are representable in L_TFSC, hence we can explicitly refer to Σ[m] to represent a ground Σ[ω], e.g. ensuring that D_T |= DoTF(prog, Σ_init, Σ[m], (h_s, h_e)) ∧ I(T_c, Σ[m]) to check (40) for the instance m. Otherwise, m = ⟨m_1, ..., m_n⟩ might contain numbers not in the signature of L_TFSC. If we want to represent these cases as well, a possible solution is to extend the language to L^m_TFSC, adding for each number a corresponding constant symbol. Then, in the extended language L^m_TFSC, we can obtain a suitable interpretation for the new symbols by associating the interpretation of each new symbol m_i with the corresponding variable assignment, that is, ensuring that v(t_i) = m_i^I, according to Proposition 5.
Notice that Σ[ω], analogously to standard Golog, can be obtained from the program prog as a constructive proof of (40). The main difference with respect to the standard Golog approach lies in the presence of the free variables ω: these are the temporal variables ⟨t_1, ..., t_m⟩ associated with the actions A_i(\vec{d}_i, t_i) extending the bag of timelines Σ_init to Σ[ω]. The description of an algorithm implementing the interpreter is beyond the scope of this paper; here we just note that, similarly to the standard Golog approach, the interpreter has to instantiate the nondeterministic choices by searching over the possible alternatives. However, in this case, the temporal variables of the actions A_i(\vec{d}_i, t_i), instead of being instantiated, are left free (because they are constrained by the temporal network represented by I(T_c, Σ[ω])). An interpreter of this kind can be implemented in a Constraint Logic Programming (CLP) language [32], which combines logic programming and constraint management. For example, in [10] we exploit an implementation of the TFGolog interpreter developed in C++ and the Eclipse 5.7 engine for CLP.
Example 13 Considering again the two components nav (for navigation) and eng (for engine) depicted in Figure 4, a possible TFGolog program encoding the robot task of approaching position pos within the time interval d can be written as follows:

proc(approach(d, pos), π(t_1, π(t_2, π(t_3,
  [π(x, start_go(nav, x, pos, t_1)) ≺ at(nav, pos)? ∥ start_run(eng, t_3) ; end_run(eng, t_2) ; (t_2 - t_1 < d)?])))).

Here we are stating that the robot starts going to pos at time t_1, meanwhile the engine starts working at time t_3, and the engine is switched off at time t_2, upon arrival at pos. Given an initial, fully ground, bag of timelines

Σ_init = B({do(start_go(nav, p_1, p_2, 2), S_0), do(start_run(eng, 1), S_0)}),

stating that, at the beginning, the agent starts going from p_1 to p_2 at time 2 and starts the engine at time 1, a temporally flexible execution Σ[ω] for the program approach(d, pos) is such that

D_T |= ∃t_1, t_2, t_3. DoTF(approach(5, p_2), Σ_init, Σ[t_1, t_2, t_3], (0, 10)) ∧ I(T_c, Σ[t_1, t_2, t_3]),

where d = 5, pos = p_2, and (h_s, h_e) = (0, 10).

7.4 TFGolog and Sequential Temporal Golog
To understand how TFGolog relates to other Golog versions in the literature, we now show that TFGolog extends Sequential Temporal Golog (STGolog). Given the axioms of the sequential temporal SC in [64], it is possible to accommodate time in the Golog semantics. STGolog [64] can be obtained directly from classical Golog; the only change needed is the Do macro for primitive actions:

Do(a, s, s') =def Poss(a, s) ∧ start(s) ≤ time(a) ∧ s' = do(a, s),

where start(s) represents the activation time of the situation s. Everything else about Do remains the same.
It is possible to show that TFGolog extends STGolog. Intuitively, any STGolog program can be expressed as a TFGolog program working with a single timeline, grounded time, an infinite horizon, and no time constraints. More formally, we can state the following proposition.

Proposition 6 Given a Sequential Temporal SC theory D_STSC, it is possible to define a TFSC theory D_T such that for any STGolog program prog_st there exists a TFGolog program prog_tf such that

D_STSC |= Do(prog_st, σ, σ')   iff   D_T |= DoTF(prog_tf, Σ, Σ', (0, ∞)),

where Σ = B(σ) and Σ' = B(σ'). □
Notice that STGolog concurrent temporal processes can be expressed by interleaving start and end actions along a unique timeline represented by a single situation. STGolog, indeed, induces a complete order on activities. Therefore, this model of concurrency cannot represent partially ordered activities that might have parallel runs, since in this case the sequence of activations has to be decided at execution time.
Reiter [65] proposes a concurrent version of STGolog that permits the execution of sets of actions, with a set c = {a_1, ..., a_n} in place of a primitive action. Nevertheless, the order of the activations is already fixed in the generated sequence of actions. For example,

[{start_a, start_b}, end_a, end_b]          (41)

is a concurrent execution of two processes a and b where the start of a and b is synchronised, but the end of a has to occur before the end of b. On the other hand, in TFGolog we can express a more general (flexible) execution plan as follows:

[start_a, end_a] ∥ [start_b, end_b].          (42)

Here process a can end before, after, or even during the end of b.
The point here is that the strict sequentialisation illustrated above for concurrent STGolog is due to the concurrency model based on interleaving actions. This model hampers the possibility of generating flexible sequences in which switching is made possible. These limiting aspects are inherited by the whole Golog family based on the interleaving model, including ConGolog [11] and IndiGolog [29].
8 Example: Attentive robot exploration of the environment
Consider an autonomous robot exploring an environment and performing observations (e.g. a rescue rover [10]): stimuli might guide the robot according to compatibility constraints between components and tasks. Let us model the executive control of the robot via a set of interacting components enabling switching between tasks. Typical components of an autonomous mobile robot are, for example: the Head controller, that is, the PTU or pan-tilt unit; the Locomotion controller; the Navigation module, including simultaneous localisation and mapping (SLAM); and the Attention system, providing a saliency map for focusing on regions to be processed by the Visual Processing component. We refer the reader to [10] for a detailed description of this domain. In particular, in [10] we present a mobile robot whose executive control system is designed by deploying TFSC and TFGolog (see Figure 6).
[Figure 6 about here.]
Figure 6: Robot Control Architecture (left) and some control modules (right) with their processes/states (round boxes), transitions (arrows), and temporal relations (dotted arrows).
TFSC Representation. Each component of the control system can be represented by a type in TFSC and is associated with a set of processes. For example, we can consider as names of types the constants ptu, slm, att, exp, nav, lcm denoting, respectively, the Head component (ptu), the SLAM module (slm), the Attention module (att), the Explore module (exp), the Navigation module (nav), and the Locomotion component (lcm). Each of these types is associated with a set of primitive actions; e.g. for the Locomotion component we have the primitive actions start_stop(lcm, t), end_stop(lcm, t), start_run(lcm, t), and end_run(lcm, t), and the lcm type is defined as follows:

H(lcm, a) ≡ ∃t a = start_stop(lcm, t) ∨ ∃t a = end_stop(lcm, t) ∨ ∃t a = start_run(lcm, t) ∨ ∃t a = end_run(lcm, t).
As for the related processes, we can introduce specific fluents for each component; for example, given the components depicted in Figure 6, possible fluents in this domain are:

Head: {idle(ptu, s), point(ptu, x, t, s), scan(ptu, z, x, t, s), reset(ptu, z, x, t, s)};
SLAM: {idle(slm, s), map(slm, t, s), stop(slm, t, s)};
Attention: {idle(att, s), detect(att, t, s), focus(att, t, s)};
Explore: {idle(exp, s), explore(exp, t, s), observe(exp, t, s), approach(exp, t, s)};
Navigation: {idle(nav, s), goto(nav, x, y, t, s), wander(nav, t, s), at(nav, x, s)};
Locomotion: {idle(lcm, s), run(lcm, t, s), stop(lcm, t, s)}.
Each process is explicitly represented in the TFSC model as described in Section 3. For example, in our case the process scan is modelled by the fluent scan(ptu, z, x, t, s) and the actions start_scan(ptu, z, x, t) and end_scan(ptu, z, x, t). The preconditions and the effects are encoded in D_T as specified in (16). For example, the successor state axiom for scan(ptu, z, x, t, s) is the following:

scan(ptu, z, x, t, do(a, s)) ≡ a = start_scan(ptu, z, x, t) ∨ scan(ptu, z, x, t, s) ∧ ¬∃t'(a = end_scan(ptu, z, x, t')),
while the preconditions for start_scan(ptu, z, x, t) and end_scan(ptu, z, x, t) are:

Poss(start_scan(ptu, z, x, t), s) ≡ (s = S_0 ∨ s =_ν start_scan(ptu, z, x, t)) ∧ idle(ptu, s) ∧ time(s) ≤ t;
Poss(end_scan(ptu, z, x, t), s) ≡ (s = S_0 ∨ s =_ν end_scan(ptu, z, x, t)) ∧ ∃t' scan(ptu, z, x, t', s) ∧ time(s) ≤ t.
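For illustration, the successor state axiom and the two preconditions for scan can be read operationally as in the Python sketch below (a simplified timeline encoding of our own, with idle_ptu and time_of supplied by the caller; this is not the D_T axiomatisation itself).

def scan_holds(z, x, timeline):
    """Successor state axiom for scan(ptu, z, x, t, s): started by start_scan,
    terminated by a matching end_scan. timeline: list of (name, args, time)."""
    holds = False
    for name, args, _t in timeline:
        if name == "start_scan" and args == ("ptu", z, x):
            holds = True
        elif name == "end_scan" and args == ("ptu", z, x):
            holds = False
    return holds

def poss_start_scan(z, x, t, timeline, idle_ptu, time_of):
    # Poss(start_scan(ptu, z, x, t), s): the ptu is idle and time does not go back.
    return idle_ptu(timeline) and time_of(timeline) <= t

def poss_end_scan(z, x, t, timeline, time_of):
    # Poss(end_scan(ptu, z, x, t), s): some scan(ptu, z, x, t', s) currently holds.
    return scan_holds(z, x, timeline) and time_of(timeline) <= t

tl = [("start_scan", ("ptu", 0, "region1"), 2.0)]
print(scan_holds(0, "region1", tl))        # True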
Temporal Compatibilities. Some temporal compatibilities T_c among the activities can be defined as follows:

T_c = [ comp(point(ptu, x),   [[(m, scan(ptu, x))]]),
        comp(focus(att, x),   [[(a, point(ptu, x))]]),
        comp(scan(ptu, z, x), [[(d, stop(lcm)), (a, map(slm))]]),
        comp(goto(nav, x, y), [[(d, idle(ptu)), (d, map(slm))]]) ].
These compatibilities state the following temporal constraints. point(ptu, x) m scan(ptu, x): once the PTU is pointed towards a location x, the head is expected to scan the region around that point x. The temporal relation focus(att, x) a point(ptu, x) says that an attention focus is preceded by a PTU pointing towards the region of the environment specified by x; after focusing, the head can direct the cameras towards the region. Also, scan(ptu, z, x) d stop(lcm) prescribes that while the Head is scanning the environment the robot must be in stop mode, to avoid invoking stabilisation processes. The constraints goto(nav, x, y) d idle(ptu) and goto(nav, x, y) d map(slm) state that, while the robot is moving, the pan-tilt unit is to be idle and mapping is to be active. Figure 7 illustrates a possible evolution of the timelines, up to a planning horizon, for the overall system.
TFGolog programs. Once we have defined the TFSC domain, we can introduce a partial specification of the robot behaviours using TFGolog programs. For example, we can say that: at the very beginning, i.e. time 0, the pan-tilt unit is idling with attention enabled; from time 0 to 3 the robot should remain where it is (e.g. pos_init), performing an overall scan with attention on and gathering information from the environment. Furthermore, given a direction θ, it should focus attention towards it, after 20 and before 30, and move towards it before 50. This partially specified plan of actions can be encoded by the following TFGolog program:

proc(partialPlan(p, p', θ),
  π(t_1, π(t_2, π(t_3, π(t_4, π(t_5, π(t_6, π(t_7, π(t_8, π(t_9, π(t_10, π(t_11, π(t_12, π(t_13, π(t_14,
    (Active_map(slm, 0, t_1) ∧ t_1 > 0)? ∥
    Elapsed_detect(att, t_2, t_3)? ≺ Elapsed_focus(att, θ, t_4, t_5)? ∥
    Elapsed_idle(ptu, 0, t_6)? ≺ (Elapsed_scan(ptu, θ, t_7, t_8) ∧ 20 ≤ t_6 ∧ t_8 ≤ 30)? ∥
    Elapsed_stop(lcm, t_9, t_10)? ∥
    (Elapsed_at(nav, p, 0, t_11) ∧ t_12 > 3)? ≺ (Elapsed_at(nav, p', t_13, t_14) ∧ t_14 < 50)?
  ))))))))))))))
).
[Figure 7 about here: execution history and planning horizon (time 0 to 50) with timelines for Exploration, Navigation, Locomotion, PTU, SLAM, Camera, and Attention.]
Figure 7: The history of states over a period of time (timelines) illustrating the evolution of several system
components up to a planning horizon.
If we fix the horizon to (h_s, h_e) = (0, 50) and the initial bag of situations Σ_0 = B({S_0}), a complete plan for partialPlan can be obtained as a bag of timelines Σ, mentioning a timeline for each component, such that

D_T |= ∃t_1, ..., t_n. DoTF(partialPlan(p_1, p_3, θ), Σ_0, Σ[t_1, ..., t_n], (0, 50)) ∧ I(T_c, Σ[t_1, ..., t_n]).
For example, let:

σ_slm = do([start_map(slm, t_1)], S_0);
σ_att = do([start_det(att, t_2), end_det(att, t_3), start_focus(att, t_4), end_focus(att, t_5), start_det(att, t_6)], S_0);
σ_exp = do([start_exp(exp, t_7), end_exp(exp, t_8), start_obs(exp, t_9), end_obs(exp, t_10), start_exp(exp, t_11)], S_0);
σ_ptu = do([start_point(ptu, θ, t_12), end_point(ptu, θ, t_13), start_scan(ptu, t_14), end_scan(ptu, t_15), start_reset(ptu, t_16), end_reset(ptu, t_17)], S_0);
σ_nav = do([start_go(nav, p_3, t_18), end_go(nav, p_3, t_19)], S_0);
σ_lcm = do([start_start(lcm, t_20), end_start(lcm, t_21)], S_0);

and Σ[t_1, ..., t_21] = B(⟨σ_slm[t_1], σ_att[t_2, ..., t_6], σ_exp[t_7, ..., t_11], σ_ptu[t_12, ..., t_17], σ_nav[t_18, t_19], σ_lcm[t_20, t_21]⟩). The bag of situations Σ[ω] defined above is constrained by the macro I(T_c, Σ[ω]), which represents a temporal network, denoted network(D_T, T_c, Σ[ω]), with the following temporal constraints:
{ 0 ≤ t_1 ≤ 50;  0 ≤ t_2 ≤ t_3 = t_4 ≤ t_5 = t_6 ≤ 50;
  0 ≤ t_7 ≤ t_6 = t_7 ≤ t_8 = t_9 ≤ t_10 = t_11 ≤ 50;
  0 ≤ t_12 ≤ t_12 = t_13 ≤ t_14 = t_15 ≤ t_16 = t_17 ≤ 50;
  0 ≤ t_18 ≤ t_19 ≤ 50;  0 ≤ t_20 ≤ t_21 ≤ 50;
  t_1 ≤ t_18;  t_1 ≤ t_20;  t_13 ≤ t_4;  t_9 ≤ t_4;
  t_5 ≤ t_10;  0 ≤ t_4 ≤ t_5 ≤ t_18;  t_21 ≤ t_12 }.
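The constraint set above contains only orderings and equalities between time points, so checking a candidate grounding is straightforward. The Python sketch below verifies an illustrative assignment against a few of the constraints; the chosen values are ours, not a solution reported in [10].

import operator
OPS = {"<=": operator.le, "==": operator.eq}

def consistent(assignment, constraints):
    return all(OPS[op](assignment[a], assignment[b]) for a, op, b in constraints)

v = {"t1": 1, "t4": 10, "t5": 12, "t9": 8, "t10": 14,
     "t12": 5, "t13": 5, "t18": 40, "t20": 2, "t21": 3}
constraints = [("t1", "<=", "t18"), ("t1", "<=", "t20"), ("t13", "<=", "t4"),
               ("t9", "<=", "t4"), ("t5", "<=", "t10"), ("t4", "<=", "t5"),
               ("t5", "<=", "t18"), ("t21", "<=", "t12"), ("t12", "==", "t13")]
print(consistent(v, constraints))          # True for these values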
Beyond partial plans, TFGolog can encode more general and intuitive behaviour fragments for tasks, which can be selected and compiled if they are compatible with the execution context. For example, the following behaviour fragment can induce the interpreter to produce a plan to find a location within the deadline d and reach it:

proc(findLocation(d),
  π(t_1, π(t_2, π(t_3, π(t_4, π(t_5,
    start_exp(exp, t_1) ≺ end_appr(exp, t_2) ∥
    start_wand(nav, t_3) ; end_wand(nav, t_4) ; π(x, start_goto(nav, x, t_5)) ; (t_4 - t_3 < d)?))))))
).

This TFGolog script starts both the exploration and wandering activities; the wandering phase has a timeout d, after which the robot has to go somewhere. The timeout d is provided by the calling process, which can be either another TFGolog procedure or a decision taken by the operator. Note that the procedure is only partially specified: we only mention processes belonging to the Exploration and Navigation timelines, and all the other timelines are to be managed by the TFGolog interpreter.
Another example is the following script, which manages the switch to the explore mode during an approach phase in the case of a stop:

proc(switchApproachExplore(x, d),
  π(t_1, π(t_2, π(t_3,
    ((((Active_appr(exp, x, t_1) ∧ Active_stop(lcm, t_1) ∧ idle(nav, t_1))? ≺
       (start_map(slm, t_2) ≺ start_exp(exp, t_3))) |
      (Active_appr(exp, x, t_1) ∧ Active_stop(lcm, t_1) ∧ ∃y(at(nav, y, t_1) ∧ y ≠ x))? ;
      ((Active_map(slm, t_1))? ≺ start_run(lcm, t_3) |
       (¬Active_map(slm, t_1))? ≺ start_map(slm, t_2) ≺ start_exp(exp, t_3))) ∧ (t_3 - t_1) ≤ d)))
).
Here, the script manages a switch between the approach and explore tasks caused by a stop during an approach to a target location x. The overall switch should occur within a deadline d. It considers two cases: (1) the stop occurred while the navigation is idle, hence the robot is not localised; (2) the stop occurred while the navigation is at position y, therefore the robot is localised. In the first case, the system should switch to the exploration mode (i.e. start_exp) and restart the SLAM mapping (i.e. start_map) to re-localise the robot. Notice that the generation of the restart sequence is left to the interpreter because it depends on the context: for instance, if map is running, the interpreter has to switch map off and restart it. In the second case, since the stop occurred while the robot is localised at position y (which is not the target position), the system can just restart the engine and continue to approach the target; otherwise, since the system behaviour is not the expected one, to keep the robot safe the activities are to be reconfigured into the exploration mode, restarting the map.
9 Related Works
A first temporal extension of the Situation Calculus was proposed by Pinto and Reiter in [57, 56, 65]. In these works, the authors provide an explicit representation of time and event occurrences, assimilating a single timeline to a timed situation. They specify durative actions by instantaneous starting and ending actions actuating processes. Concurrent executions of instantaneous actions are also enabled, as reported in [65]. Pinto and Reiter show in [57] that a modal logic for concurrency can be embedded in a suitable Situation Calculus extension. These topics are also addressed by Miller and Shannahan in [49], where they propose a method to represent incomplete narratives in the Situation Calculus. In this case, differently from our approach, the problem of an unknown ordering amongst events is handled by non-monotonic reasoning on temporal events.
Indeed, while Pinto and Reiter in [57, 56, 65] propose a situation-based timeline representation, where time is scanned by the actions from which it is recovered, Miller and Shannahan suggest in [49] a non-monotonic time-based framework where each time point is connected with a situation and the frame problem is addressed via minimisation.
These approaches are substantially different from ours; in fact, as we already stated in Section 4, our framework adopts the durative action representation proposed in [65, 56], but considers multiple timelines and flexible intervals. Our approach, furthermore, contributes substantially to the formalisation of task switching and component interaction, which has previously been treated with methodologies distinct from ours, such as the flexible temporal planning framework.
Pinto in [57, 56] has also considered the interaction among processes, but from the perspective of exogenous actions as natural processes.
In [66], Reiter and Yuhua exploit the temporal extension of the Situation Calculus already presented in
[57, 56, 65] for modelling complex scheduling tasks by axiomatising a deadline-driven scheduler. In this case,
the tasks are to be scheduled for a single CPU, and a schedule of length n is a sequence of n ground actions
represented by a single ground situation term; therefore constraints and flexible plans are not taken into account.
Temporal properties in the Situation Calculus are also investigated by Gabaldon in [24] and by Bienvenu,
Fritz and McIlraith in [7, 23], mainly focusing on search control in forward planning. Gabaldon, in fact, in [24]
proposes to formalise control knowledge for a forward-chaining planner using Linear Temporal Logic (LTL)
expressions, represented in the Situation Calculus, and shows how a progression algorithm can be deployed in
this setting. In the context of preference based planning [5], Bienvenu et al. [7] propose a logical language for
specifying qualitative temporal preferences in the Situation Calculus. In this framework, temporal preferences
can be expressed in LTL and the temporal connectives are interpreted in the Situation Calculus following the
approach proposed by Gabaldon in [24].
Kvarnström and Doherty in [36] present a forward-chaining planner based on domain-dependent search control
knowledge represented as formulas in the Temporal Action Logic (TAL); a narrative-based linear metric time
logic is used for reasoning about action and change. The authors disregard temporal constraint networks and
flexible planning, although in [44], following an approach similar to the one taken in [20, 19], they propose
a first step towards the integration of constraint-based representations within the TAL framework.
A procedural approach to model-based executive control through temporally flexible programs is provided by
the model-based programming paradigm of Williams and colleagues [80]. In this approach, the reactive system’s
controller is specified by programs and models of the system components. In particular, the authors develop the
Reactive Model-based Programming Language (RMPL) and its executive (Titan). Titan executes RMPL
programs using extensive component-based declarative models of the embedded system to track states, analyse
anomalous situations, and generate control sequences. RMPL programs are complete procedural specifications
of the system behaviour. In contrast, we deploy the TFGolog framework, where partially specified programs can
be encoded. The system we propose in this paper copes with high-level agent programming and can be seen as
a trade-off between the model-based programming approach (e.g. RMPL-Titan) and model-based reactive
planning (e.g. IDEA [50, 18]), but is based on a logical framework and inspired by neuroscience principles on task
switching. Indeed, similarly to RMPL, we use high-level programs to design the controller, but the constructs are
defined in FOL; further, to enable run-time switching, our programs are partially specified scripts to be completed
on-line by the program interpreter, which works as a planner.
In the literature, we can find several works investigating the combination of logic-based frameworks and
temporal constraint reasoning. For example, Schwalb, Kask and Dechter in [69] present a logic-based framework
combining qualitative and quantitative temporal constraints. This framework integrates reasoning in a propositional,
narrative-based representation of a dynamic domain - in the style of the Event Calculus - with inference techniques
from the temporal constraint network formalism of Dechter, Meiri and Pearl [14]. The integration is
based on the notion of conditional temporal network (CTN), which allows decoupling propositional and temporal
constraints and treating them in isolation. Analogously to our approach, the logical machinery determines a
temporal network that can be solved with constraint propagation techniques.
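As a concrete illustration of the constraint-propagation side, the following Python sketch solves a Simple Temporal Problem in the style of the temporal constraint networks of Dechter, Meiri and Pearl [14] by Floyd-Warshall shortest-path propagation. The timepoint indices and bounds are invented for the example; the code is not part of the TFSC implementation.

# Minimal sketch of the quantitative case only: a Simple Temporal Problem
# solved by shortest-path (Floyd-Warshall) propagation.  Node 0 is the
# temporal reference point; an edge (i, j, w) encodes t_j - t_i <= w.
INF = float("inf")

def stp_closure(n, edges):
    """Return the minimal network, or None if the STP is inconsistent."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        d[i][j] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(n)):   # negative cycle = inconsistency
        return None
    return d

# t1 = stop time, t3 = restart time; require 1 <= t3 - t1 <= 4 (deadline d = 4).
edges = [(1, 2, 4),    # t3 - t1 <= 4
         (2, 1, -1)]   # t1 - t3 <= -1, i.e. t3 - t1 >= 1
print(stp_closure(3, edges) is not None)   # True: the network is consistent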
The combination of logic-based and constraint-based temporal reasoning is also investigated within the Constraint
Logic Programming (CLP) paradigm. For example, the TCLP framework proposed by Schwalb and
Vila [70] augments logic programs with temporal constraints. Indeed, Schwalb and Vila investigate a decidable
fragment called Simple TCLP, accommodating intervals of event occurrences and temporal constraints
between them. Lamma and Milano in [37] extend the Constraint Logic Programming framework to temporal
reasoning, elaborating on the extensions of Vilain and Kautz's Point Algebra, on Allen's Interval Algebra and on
the STP framework proposed by Dechter, Meiri and Pearl. Lamma and Milano show how it is possible to cope
with disjunctive constraints even in an interval-based framework.
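For the qualitative side, a minimal sketch of composition in the Vilain and Kautz Point Algebra [78] is given below; the composition table is the standard one over the basic relations {<, =, >}, and the example values are invented. Composing through an intermediate point may yield a disjunction, which is what makes the general disjunctive case harder than the simple metric case sketched above.

# Hedged illustration: composition of disjunctive point relations in the
# Vilain-Kautz Point Algebra.  COMP is the standard composition table of
# the basic relations {<, =, >}.
BASIC = {"<", "=", ">"}
COMP = {("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): BASIC,
        ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
        (">", "<"): BASIC, (">", "="): {">"}, (">", ">"): {">"}}

def compose(r_xy, r_yz):
    """Compose two disjunctive point relations x R y and y R z."""
    out = set()
    for a in r_xy:
        for b in r_yz:
            out |= COMP[(a, b)]
    return out

print(compose({"<"}, {"<", "="}))   # {'<'}
print(compose({"<"}, {">"}))        # the universal relation: no information gained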
10 Summary and Outlook
Cognitive control has to deal with several components, with flexible behaviours that can be adapted to different
contexts, and with the ability to switch between tasks in response to stimuli.
In this paper, we have presented a methodology to incorporate these capabilities in the Situation Calculus. We
have introduced the Temporally Flexible Situation Calculus (TFSC), which combines temporal constraint reasoning
and reasoning about actions. In this framework, we have shown how to incorporate multiple parallel timelines
and temporal constraints among the activities. For this purpose, we have introduced sets of concurrent temporal
situations, describing a constructive method to associate a temporal constraint network to each set of concurrent
timelines represented by a collection of situations. In this way, causal logic-based reasoning and temporal constraint
propagation methods can be integrated. We have described an approach for modelling complex dynamic
domains in TFSC, illustrating how temporally flexible behaviours can be represented. We have shown how this
framework can be exploited to design and develop a model-based control system for an autonomous mobile robot
capable of balancing high-level deliberative activities and reactive behaviours; more details on the application
can be found in [10].
References
[1] Natasha Alechina, Mehdi Dastani, Brian Logan, and John-Jules Ch. Meyer. A logic of agent programs. In
AAAI, pages 795–800, 2007.
[2] James F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983.
[3] A. R. Aron. The neural basis of inhibition in cognitive control. The Neuroscientist, 13:214 – 228, 2007.
[4] Fahiem Bacchus, Joseph Y. Halpern, and Hector J. Levesque. Reasoning about noisy sensors and effectors
in the situation calculus. Artif. Intell., 111(1-2):171–208, 1999.
[5] J. Baier, F. Bacchus, and S. McIlraith. A heuristic search approach to planning with temporally extended
preferences. In Proceedings of IJCAI-2007), pages 1808–1815, 2007.
[6] Federico Barber. Reasoning on interval and point-based disjunctive metric constraints in temporal contexts.
J. Artif. Intell. Res. (JAIR), 12:35–86, 2000.
[7] M. Bienvenu, C. Fritz, and S. McIlraith. Planning with qualitative temporal preferences. In Proceedings of
KR-06, pages 134–144, 2006.
[8] Stephen A. Block, Andreas F. Wehowsky, and Brian C. Williams. Robust execution on contingent, temporally flexible plans. In AAAI, 2006.
[9] Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun. Decision-theoretic, high-level
agent programming in the situation calculus. In Proceedings of AAAI-2000, pages 355–362, 2000.
[10] A. Carbone, A. Finzi, A. Orlandini, and F. Pirri. Model-based control architecture for attentive robots in
rescue scenarios. Autonomous Robots, 24(1):87–120, 2008.
[11] G. de Giacomo, Y. Lespérance, and H. J. Levesque. Congolog, a concurrent programming language based
on the situation calculus. Artif. Intell., 121(1-2):109–169, 2000.
[12] D.E. Smith, J. Frank, and A.K. Jonsson. Bridging the gap between planning and scheduling. Knowledge Engineering Review, 15(1), 2000.
[13] Rina Dechter, Itay Meiri, and Judea Pearl. Temporal constraint networks. In KR, pages 83–93, 1989.
[14] Rina Dechter, Itay Meiri, and Judea Pearl. Temporal constraint networks. Artif. Intell., 49(1-3):61–95,
1991.
[15] Yiannis Demiris and Bassam Khadhouri. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems, 54(5):361–369, 2006.
[16] J. Duncan. Disorganization of behaviour after frontal-lobe damage. Cognitive Neuropsychology, 3:271–
290, 1986.
[17] Matthias Fichtner, Axel Großmann, and Michael Thielscher. Intelligent execution monitoring in dynamic
environments. Fundamenta Informaticae, 57(2–4):371–392, 2003.
[18] A. Finzi, F. Ingrand, and N. Muscettola. Model-based executive control through reactive planning for
autonomous rovers. In Proceedings of IROS-2004, pages 879–884, 2004.
[19] A. Finzi and F. Pirri. Flexible interval planning in concurrent temporal golog. In Proceedings of Cognitive
Robotics 2004, 2004.
[20] A. Finzi and F. Pirri. Representing flexible temporal behaviors in the situation calculus. In Proceedings of
IJCAI-2005, pages 436–441, 2005.
[21] A. Finzi, F. Pirri, and R. Reiter. Open world planning in the situation calculus. In Proceedings of AAAI/IAAI-2000, pages 754–760, 2000.
[22] Jeremy Forth and Murray Shanahan. Indirect and conditional sensing in the event calculus. In ECAI, pages
900–904, 2004.
[23] C. Fritz and S. McIlraith. Decision-theoretic golog with qualitative preferences. In Proceedings of the
10th International Conference on Principles of Knowledge Representation and Reasoning (KR06), pages
153–163, Lake District, UK, June 2006.
[24] A. Gabaldon. Precondition control and the progression algorithm. In Shlomo Zilberstein, Jana Koehler,
and Sven Koenig, editors, ICAPS-2004, pages 23–32. AAAI, 2004.
[25] Sandra Clara Gadanho. Learning behavior-selection by emotions and cognition in a multi-goal robot task.
J. Mach. Learn. Res., 4:385–412, 2003.
[26] Michael Gelfond and Vladimir Lifschitz. Action languages. Electron. Trans. Artif. Intell., 2:193–210, 1998.
[27] M. Ghallab and H. Laruelle. Representation and control in ixtet, a temporal planner. In Proceedings of
AIPS-1994, pages 61–67, 1994.
[28] G. De Giacomo, Y. Lespérance, and H. Levesque. ConGolog, a concurrent programming language based
on the situation calculus. Artif. Intell., 121(1–2):109–169, 2000.
[29] G. De Giacomo, Y. Lespérance, and H. J. Levesque. Reasoning about concurrent execution, prioritized
interrupts, and exogenous actions in the situation calculus. In IJCAI-1997, pages 1221–1226, 1997.
[30] H. Grosskreutz and G. Lakemeyer. ccgolog – a logical language dealing with continuous change. Logic
Journal of the IGPL, 11(2):179–221, 2003.
[31] H. Grosskreutz and G. Lakemeyer. Probabilistic complex actions in golog. Fundam. Inf., 57(2-4):167–192,
2003.
[32] J. Jaffar and M.J. Maher. Constraint logic programming: A survey. Journal of Logic Programming,
19/20:503–581, 1994.
[33] Ari K. Jonsson, Paul H. Morris, Nicola Muscettola, Kanna Rajan, and Benjamin D. Smith. Planning in
interplanetary space: Theory and practice. In Artificial Intelligence Planning Systems, pages 177–186,
2000.
[34] Kazuhiko Kawamura, Tamara E. Rogers, and Xinyu Ao. Development of a cognitive model of humans in a
multi-agent framework for human-robot interaction. In AAMAS ’02: Proceedings of the first international
joint conference on Autonomous agents and multiagent systems, pages 1379–1386, New York, NY, USA,
2002. ACM.
[35] Andrei Krokhin, Peter Jeavons, and Peter Jonsson. Reasoning about temporal relations: The tractable
subalgebras of allen’s interval algebra. J. ACM, 50(5):591–640, 2003.
[36] J. Kvarnström and P. Doherty. Talplanner: A temporal logic based forward chaining planner. Annals of
Mathematics and Artificial Intelligence, 30(1-4):119–169, 2000.
[37] E. Lamma, M. Milano, and P. Mello. Extending constraint logic programming for temporal reasoning.
Annals of Mathematics and Artificial Intelligence, 22(1-2):139–158, 1998.
[38] H. J. Levesque, R. Reiter, Y. Lesperance, F. Lin, and R. B. Scherl. GOLOG: A logic programming langauge
for dynamic domains. Journal of Logic Programming, 31:59–84, 1997.
[39] Hector Levesque and Gerhard Lakemeyer. Handbook of Knowledge Representation, chapter Cognitive
Robotics. Elsevier, 2007.
[40] Hector J. Levesque. What is planning in the presence of sensing? In AAAI/IAAI, Vol. 2, pages 1139–1146,
1996.
[41] Hector J. Levesque, Fiora Pirri, and Raymond Reiter. Foundations for the situation calculus. Electron.
Trans. Artif. Intell., 2:159–178, 1998.
[42] H.J. Levesque. Knowledge, action, and ability in the situation calculus. In Proceedings of TARK-94, pages
1–4. Morgan Kaufmann, 1994.
[43] F. Lin and R. Reiter. State constraints revisited. Journal of Logic and Computation, 5(4):655–677, 1994.
[44] M. Magnusson and P. Doherty. Deductive planning with temporal constraints. In Proceedings of Commonsense-2007, 2007.
[45] U. Mayr and SW. Keele. Changing internal constraints on action: the role of backward inhibition. Journal
of Experimental Psychology, 129(1):4–26, 2000.
[46] J. McCarthy. Situations, actions and causal laws. Technical report, Stanford University, 1963. Reprinted in
Semantic Information Processing (M. Minsky ed.), MIT Press, Cambridge, Mass., 1968, pp. 410-417.
[47] Itay Meiri. Combining qualitative and quantitative constraints in temporal reasoning. Artif. Intell., 87(1-2):343–385, 1996.
[48] E.K. Miller and J.D. Cohen. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience,
24:167–202, 2001.
[49] R. Miller and M. Shanahan. Narratives in the situation calculus. Journal of Logic and Computation,
4(5):513–530, 1994.
[50] N. Muscettola, G. A. Dorais, C. Fry, R. Levinson, and C. Plaunt. Idea: Planning at the core of autonomous
reactive agents. In Proc. of NASA Workshop on Planning and Scheduling for Space, 2002.
[51] Nicola Muscettola. Hsts: Integrating planning and scheduling. Intelligent Scheduling, pages 451–461,
1994.
[52] Bernhard Nebel and Hans-Jürgen Bürckert. Reasoning about temporal relations: a maximal
tractable subclass of Allen's interval algebra. J. ACM, 42(1):43–66, 1995.
[53] A. Newell. Unified theories of cognition. Harvard University Press, 1990.
[54] D. A. Norman and T. Shallice. Consciousness and Self-Regulation: Advances in Research and Theory,
volume 4, chapter Attention to action: Willed and automatic control of behaviour. Plenum Press, 1986.
[55] Andrea Philipp and Iring Koch. Task inhibition and task repetition in task switching. The European Journal
of Cognitive Psychology, 18(4):624–639, 2006.
[56] J. Pinto. Occurrences and narratives as constraints in the branching structure of the situation calculus.
Journal of Logic and Computation, 8(6):777–808, 1998.
[57] J. Pinto and R. Reiter. Reasoning about time in the situation calculus. Annals of Mathematics and Artificial
Intelligence, 14(2-4):251–268, 1995.
[58] J.A. Pinto and R. Reiter. Reasoning about time in the situation calculus. Annals of Mathematics and
Artificial Intelligence, 14(2-4):251–268, September 1995.
[59] F. Pirri and A. Finzi. An approach to perception in theory of actions: Part i. Electron. Trans. Artif. Intell.,
3(C):19–61, 1999.
[60] F. Pirri and R. Reiter. Planning with natural actions in the situation calculus. Logic-based artificial intelligence, pages 213–231, 2000.
[61] Fiora Pirri and Ray Reiter. Some contributions to the metatheory of the situation calculus. Journal of ACM,
46(3):325–361, 1999.
[62] R. Reiter. The frame problem in the situation calculus: A simple solution (sometimes) and a completeness
result for goal regression. In Vladimir Lifschitz, editor, Artificial Intelligence and Mathematical Theory of
Computation: Papers in Honor of John McCarthy, pages 359–380. Academic Press, San Diego, CA, 1991.
[63] R. Reiter. Natural actions, concurrency and continuous time in the situation calculus. In Proceedings of
KR’96, pages 2–13, 1996.
[64] R. Reiter. Sequential, temporal GOLOG. In Proceedings of KR’98, pages 547–556, 1998.
[65] R. Reiter. Knowledge in action : logical foundations for specifying and implementing dynamical systems.
MIT Press, 2001.
[66] R. Reiter and Z. Yuhua. Scheduling in the situation calculus: A case study. Annals of Mathematics and
Artificial Intelligence, 21(2-4):397–421, 1997.
[67] J.S. Rubinstein, D.E. Meyer, and J.E. Evans. Executive control of cognitive processes in task switching.
Journal of Experimental Psychology: Human Perception and Performance, 27(4):763–797, 2001.
[68] Erik Sandewall. Features and fluents (vol. 1): the representation of knowledge about dynamical systems.
Oxford University Press, Inc., 1994.
[69] E. Schwalb, K. Kask, and R. Dechter. Temporal reasoning with constraints on fluents and events. In
Proceedings of AAAI-1994, pages 1067–1072, Menlo Park, CA, USA, 1994. American Association for
Artificial Intelligence.
[70] E. Schwalb and L. Vila. Logic programming with temporal constraints. In TIME ’96: Proceedings of the
3rd Workshop on Temporal Representation and Reasoning (TIME’96), Washington, DC, USA, 1996. IEEE
Computer Society.
[71] M.P. Shanahan. A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15:433–449, 2006.
[72] Murray Shanahan. Solving the frame problem: a mathematical investigation of the common sense law of
inertia. MIT Press, 1997.
[73] Murray Shanahan. The event calculus explained. In Artificial Intelligence Today, pages 409–430. 1999.
[74] A. Tate. "I-N-OVA" and "I-N-CA", Representing Plans and other Synthesised Artifacts as a Set of Constraints, pages 300–304. 2000.
[75] Michael Thielscher. FLUX: A logic programming method for reasoning agents. Theory and Practice of
Logic Programming, 5(4–5):533–565, 2005.
[76] Michael Thielscher and Thomas Witkowski. The features-and-fluents semantics for the fluent calculus. In
KR, pages 362–370, 2006.
[77] S. P. Tipper. Does negative priming reflect inhibitory mechanisms? a review and integration of conflicting
views. Quarterly Journal of Experimental Psychology, 54:321 – 343, 2001.
[78] Marc B. Vilain and Henry A. Kautz. Constraint propagation algorithms for temporal reasoning. In AAAI,
pages 377–382, 1986.
[79] H. Wang and K. J. Brown. Finite set theory, number theory and axioms of limitation. Mathematische
Annalen, 164:26–29, 1966.
[80] B. Williams, M. Ingham, S. Chung, P. Elliott, M. Hofbaur, and G. Sullivan. Model-based programming of
fault-aware systems. AI Magazine, Winter 2003.
Appendix A
A Notational conventions and preliminaries
We recall that the set of axioms Σ (see also Table 1) is:
1. ¬(s ⊏ S0),
2. s ⊏ do(a, s′) ≡ s ⊑ s′,
3. do(a, s) = do(a′, s′) ≡ a = a′ ∧ s = s′,
4. ∀P. [P(S0) ∧ ∀a, s (P(s)→P(do(a, s)))] → ∀s P(s).    (43)
Duna is the set of axioms of the form Ai(·) ≠ Aj(·), with Ai and Aj names of actions, and the set of axioms
specifying that identical action terms must have the same arguments (see Section 3.1). The set Σ ∪ Duna is
satisfiable in some model M0 = (D, I) in which the real line is interpreted as usual. In fact, in order to introduce
time (see [58]), we may assume that the signature of the initial language Lsitcalc includes:
(i) all rational constants p/q, and the special symbols 0 and 1;
(ii) the usual operators, such as +, − and ·;
(iii) the relation <, and, = being in the language, the defined relation ≤ (that is, < ∨ =) and the defined relation >,
standing for ¬(≤).
Thus Σ can be suitably extended to include all the axioms for the theory of reals (additive, multiplicative, order,
least upper bound) and the axiom ∀t.0 ≤ t with t ranging over the reals. We also may assume that there always
exists a structure M for the classical basic Situation Calculus, in which the real numbers have an intended
interpretation.
For the next theorems we stipulate the following. Let S be the signature of the standard Situation Calculus
with equality, including symbols for actions, situations, ⊑, indexed symbols ti for the rational numbers, and the
symbols indicated in the above items (i-iii), then:

1. L1 is Lsitcalc, the language defined on S.
2. L2 is L1 extended to the signature including the symbols for the terms mentioning time.
3. L3 is like L2 extended to the signature including the symbol H, a finite amount of new constants,
   of sort object, for types, and =ν.
4. L4 is like L3 extended to the signature including the symbols for the terms mentioning timelines and
   the terms of sort bag of timelines, i.e. the symbols T, ∈S, =S, ∪S, ∩S, ⊆S and the symbol B.
   L4 is LTFSC, the language of the Temporally Flexible Situation Calculus.    (44)
A formula ϕ of a language L is said to be restricted to the language LQ , over the signature Q, and denoted ϕ\LQ
if ϕ mentions only the symbols of Q.
Finally we recall two theorems of [61] that will be used in the next proofs.
Theorem 8 (Relative Satisfiability) A basic action theory D is satisfiable iff Duna ∪ DS 0 is.
Theorem 9 (Regression) Suppose W is a regressable formula of L sitcalc and D is a basic theory of actions. Then
R[W] is a formula uniform in S 0 . Moreover,
D |= (∀)(W ≡ R[W]),
where (∀)φ denotes the universal closure of the formula φ with respect to its free variables.
For the definition of the regression operator R we refer the reader for details to [61], see also equation (85).
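As a purely illustrative aside, the following Python toy shows the flavour of one regression step: an atom about do(a, s) is rewritten through a successor state axiom until only S0 remains, which is the property Theorem 9 guarantees for the operator R of [61]. The fluent holding and the actions pickup and drop are invented for this sketch and do not belong to the theory developed here.

# Toy, hedged illustration of regression (not the operator R of [61]):
# formulas are nested tuples, and a fluent atom about do(a, s) is rewritten
# through its successor state axiom until its situation argument is S0.
S0 = ("S0",)

def do(a, s):
    return ("do", a, s)

def ssa_holding(x, a, s):
    """holding(x, do(a, s)) == a = pickup(x) or (holding(x, s) and a != drop(x))."""
    return ("or", ("eq", a, ("pickup", x)),
                  ("and", ("holding", x, s), ("neq", a, ("drop", x))))

def regress_holding(x, situation):
    """Regress holding(x, sigma) until the situation argument is S0."""
    if situation == S0:
        return ("holding", x, S0)
    _, a, s = situation
    formula = ssa_holding(x, a, s)
    # Recursively regress the residual fluent atom that still mentions s.
    op, eq_part, (and_op, _, neq_part) = formula
    return (op, eq_part, (and_op, regress_holding(x, s), neq_part))

print(regress_holding("box", do(("pickup", "box"), S0)))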
Appendix B
B Proofs of Section 3
In the following we assume that the reals are axiomatised, as noted in A above, that the language includes
countably many terms taking values in the reals and countably many constant symbols denoting the rational
numbers, plus 0 and 1. We also assume that actions take their arguments in the domains Obj and R+.
B.1 Lemmas 1-7
Lemma 1 Σtime is a conservative extension of Σ and any model of Σ ∪ Duna can be extended to a model of
Σtime ∪ Duna .
Proof of Lemma 1.
Let Σtime = Σ ∪ Ax0, that is, the set formed by the foundational axioms of the Situation Calculus, given above,
and the axioms T1-T3 (see Table 1, page 9). We have to prove that Σtime ∪ Duna is a conservative extension of
Σ ∪ Duna, that is, for any formula ϕ in the language L1 of Σ ∪ Duna:
Σtime ∪ Duna |= ϕ iff Σ ∪ Duna |= ϕ
Let M0 = (D, I) be a model of Σ ∪ Duna with D including the positive real line (see paragraph A above). We
define a structure M1 = (D, I′) for L2 having the same domain as M0 and with I′ interpreting all the symbols of
L1 like I; thus, in particular, we may assume that there is a specific term t0 denoting 0 and such that, for
all positive t, (t0 ≤ t)^(I,v), and this is replicated in M1, v.
Now for the terms mentioning time the interpretation I′ is specified as follows, for any assignment v to the
free variables:

(1) (time(S0))^(I′,v) is mapped to t0^(I,v).
(2) For each action A(x⃗, t) we set (time(A(x⃗, t)))^(I′,v) = t^(I′,v) iff t > 0.
(3) For all situations s and actions A: (time(do(A(x⃗, t), s)))^(I′,v) = time(A(x⃗, t))^(I′,v).
It follows that T1-T3 are satisfied in M1. This concludes the interpretation of time. We have shown that any
structure for L1 which is a model of Σ ∪ Duna can be suitably extended to the language L2, thus becoming a model
of Σtime ∪ Duna. Now, by monotonicity, if Σ ∪ Duna |= ϕ\L1 then Σtime ∪ Duna |= ϕ\L1. For the other direction,
suppose that Σtime ∪ Duna |= ϕ\L1 and Σ ∪ Duna ⊭ ϕ\L1; then there is a model M of Σ ∪ Duna not satisfying ϕ\L1.
Now, M can be extended to satisfy Σtime ∪ Duna, hence we have a contradiction.
Lemma 2 There exists a model M of Σtime ∪ Duna such that, for all s and s′, M models:

s ⊏ s′ → time(s) ≤ time(s′)    (45)
Proof of Lemma 2. Let M0 = (D, I) be a model of Σtime ∪ Duna; using the Löwenheim-Skolem theorem, let M1 be a
model elementarily equivalent to M0 but with a countable domain. We build a new model M2 = (D′, I′), having
the same domain for Act and Obj as M1, and interpreting everything in these domains like I. However, the
domain of situations in D′ is DS0 = {S0^(I,v)} = {[]}, that is, the domain of situations includes only the interpretation
of the constant S0, which is the usual one and is like in I:
(1) M1, v |= time(S0) = t0      iff M2, v |= time(S0) = t0
(2) M1, v |= A(x, t) = t        iff M2, v |= A(x, t) = t
(3) M1, v |= A(x, t) = A′(x, t) iff M2, v |= A(x, t) = A′(x, t)    (46)
The set of terms for the sort Act is countable, thus we can enumerate the terms of sort action, order them according
to time a1 , a2 , . . . , and consider the interpretation in M2 , according to v, with respect to time along a chain as
follows:
C = t0^(M2,v) <^(M2) time(a1)^(M2,v) ≤^(M2) time(a2)^(M2,v) ≤^(M2) · · · ≤^(M2) time(am)^(M2,v) ≤^(M2) · · ·    (47)

Here ≤ has the usual interpretation. Since C is countable, because there are countably many terms in the language
of sort Act, we can assume that each term time(ai) is suitably interpreted. Furthermore, being both M1 and M2
models of Σtime, by axiom (T2), for all actions A, t0 < time(A(x, t)). Now, given that the domain of sort situation
is DS0 = {S0^(M2)} = {[]}, we shall build the following two sets:

T_ti = {a^(M2,v) ∈ Act | time(a)^(M2,v) = ti^(M2,v), ti^(M2,v) the i-th element in C}
D_si = {[a1^(M2,v), . . . , ai^(M2,v)] | a1^(M2,v), . . . , ai^(M2,v) ∈ Act, ai^(M2,v) ∈ T_ti and [a1^(M2,v), . . . , a_{i−1}^(M2,v)] ∈ D_{s_{i−1}}}    (48)

Each D_si is countable. Taking the union of the D_si:

Sit = ⋃_{i=0}^{∞} D_si    (49)

We still get a countable set such that D_si ⊆ Sit for all D_si. The interpretation of do can now be defined as usual
on the sequences in Sit; the interpretation of ⊏ can also be set to be the usual one, given that the interpretation
of each element in Sit is a finite sequence of elements of the domain Act. Thus we extend the interpretation I′ of
M2 to J accordingly. Indeed, let M = (D′ ∪ Sit, J):

[a1^(J,v), . . . , ai^(J,v)] = do^(J,v)(ai^(J,v), [a1^(J,v), . . . , a_{i−1}^(J,v)])
s′^(J,v) ⊏^J s^(J,v) iff the sequence s′^(J,v) is a proper initial subsequence of s^(J,v)    (50)

It follows, by the definition of the D_si, that if [a1^(J,v), . . . , ap^(J,v)] = s^(J,v) ∈ D_sp and [a1^(J,v), . . . , aq^(J,v)] = s′^(J,v) ∈ D_sq,
with p < q, then s^(J,v) ⊏^J s′^(J,v), hence time^I(ap) < time^I(aq), since ap ∈ T_tp and aq ∈ T_tq. Now, in the new
structure M we can define, for each [a1^(M2,v), . . . , ai^(M2,v)] ∈ D_si ⊆ Sit:

time([a1, . . . , ai])^(M,v) = time(do(ai, s))^(M,v) = time(ai)^(M2,v)    (51)

Hence time(s) < time(s′) and, clearly, M = (D′ ∪ Sit, J) is a model of Σtime ∪ Duna.
Thus the claim holds. □
Lemma 3 Let Ax1 denote the axioms H1-H2 and ΣH = Σtime ∪ Ax1 : ΣH is a conservative extension of Σtime and
any model of Σtime ∪ Duna can be extended to a model of ΣH ∪ Duna .
Proof of Lemma 3. Let ΣH = Σtime ∪ Ax1 (see the axioms H1-H2, Table 1, page 9). We have to prove that
ΣH ∪ Duna is a conservative extension of Σtime ∪ Duna, that is, for any formula ϕ in the language L2 of Σtime ∪ Duna:
ΣH ∪ Duna |= ϕ iff Σtime ∪ Duna |= ϕ
Let M0 = (D, I) be a model of Σtime ∪ Duna.
We extend I to I′ to interpret the type predicate H(i, a) and the relation =ν between actions. Let M1 = (D, I′)
be a structure having the same domain as M0, where I′ interprets all predicate symbols, function symbols
and constants of L2 as I, and for the extended language L3 we shall proceed with the following interpretation.
• We first consider the interpretation of the predicate H(i, a). Here we shall only provide a partition of
name types, as follows. We order the constants denoting types, namely i1, i2, . . . , im, and the action names
A1, A2, . . . , An, . . . and we define a mapping f : Ap ↦ (mod(p − 1, m) + 1), with p ≥ 1 and m the number
of constants denoting types, so that each action name is assigned precisely to a single type.
Now, for any assignment v:

M0, v |= a = Ap(x⃗) iff M1, v |= a = Ap(x⃗)

We thus set

⟨ik, Ap(x⃗)⟩^(I′,v) ∈ H^(I′) iff f(Ap) = k = (mod(p − 1, m) + 1)

It follows that

if M0, v |= Ap(y⃗) = Ap(x⃗) then M1, v |= H(ik, Ap(x⃗)) ∧ H(ik, Ap(y⃗)) for f(Ap) = k = (mod(p − 1, m) + 1)

• Next we consider the interpretation of =ν, for any assignment v, as follows:

⟨Ap(y⃗), Aq(x⃗)⟩^(I′,v) ∈ =ν^(I′) iff M1, v |= H(ik, Ap(y⃗)) ∧ H(ik, Aq(x⃗))

By the construction it follows that:

1. if M0, v |= A(x⃗) = A(y⃗) then M1, v |= A(x⃗) = A(y⃗) and M1, v |= A(x⃗) =ν A(y⃗)
2. M0, v |= A(x⃗) ≠ B(y⃗) iff M1, v |= A(x⃗) ≠ B(y⃗)
3. H^(I′)(i^(I′), Ap^(I′)(x⃗^v)) ∩ H^(I′)(j^(I′), Aq^(I′)(x⃗^v)) = ∅ iff i^(I′) ≠ j^(I′).

Hence H1-H2 ∪ Duna are verified in M1.
Now, by an analogous argument as in Lemma 1, we obtain that ΣH ∪ Duna is a conservative extension of
Σtime ∪ Duna. □
Lemma 4 The relation =ν is an equivalence relation on the set of actions.
Proof of Lemma 4. That the relation =ν is reflexive and symmetric follows from (H2) and the property of ∧. And
the same for transitivity:
1. a =ν a′ ∧ a′ =ν a′′ → H(i, a) ∧ H(i, a′) ∧ H(i, a′′)    (by (H2))
2. H(i, a) ∧ H(i, a′′) → a =ν a′′    (by (H2))
3. a =ν a′ ∧ a′ =ν a′′ → a =ν a′′    (by 1, 2 and Taut.)    (52)
Hence =ν is a reflexive, transitive and symmetric relation on the set of actions partitioned by (H1), i.e. it is an
equivalence relation.
�
Lemma 5 Let Ax2 denote the set of four axioms E1-E4 and Σ=ν = ΣH ∪ Ax2 . Any model of ΣH ∪ Duna can be
extended to a model of Σ=ν ∪ Duna .
Proof of Lemma 5. Let Σ=ν = ΣH ∪ Ax2; we have to prove that Σ=ν ∪ Duna is satisfiable iff ΣH ∪ Duna is.
Let M1 = (D, I) be a model of ΣH ∪ Duna; M1 exists according to Lemma 3. Let M = (D, I′), where I′
interprets all symbols like I and interprets =ν, on all actions, like I. We then extend I′ to interpret =ν also on
situations as follows:
(a) if M1, v |= s = s′ then ⟨s, s′⟩^(I′,v) ∈ =ν^(I′)
(b) if M1, v |= ¬(s = S0) then ⟨s, S0⟩^(I′,v) ∉ =ν^(I′) and ⟨S0, s⟩^(I′,v) ∉ =ν^(I′)
(c) if M1, v |= (a =ν a′) then both ⟨a, do(a′, S0)⟩^(I′,v) ∈ =ν^(I′) and ⟨a′, do(a, S0)⟩^(I′,v) ∈ =ν^(I′)
(d) if M1, v |= ¬(a =ν a′) then both ⟨a, do(a′, S0)⟩^(I′,v) ∉ =ν^(I′) and ⟨a′, do(a, S0)⟩^(I′,v) ∉ =ν^(I′)

The interpretation I′ can thus be extended, for any assignment v, to all situations as follows:

(e) if M1, v |= (a =ν a′) and ⟨s, a′⟩^(I′,v) ∈ =ν^(I′) then ⟨a, do(a′, s)⟩^(I′,v) ∈ =ν^(I′)
(f) if M1, v |= ¬(a =ν a′) or ⟨s, a′⟩^(I′,v) ∉ =ν^(I′) then ⟨a, do(a′, s)⟩^(I′,v) ∉ =ν^(I′)
(g) if M1, v |= (s =ν s′) and ⟨s, a′⟩^(I′,v) ∈ =ν^(I′) then ⟨s, do(a′, s′)⟩^(I′,v) ∈ =ν^(I′)
(h) if M1, v |= ¬(s =ν s′) or ⟨a′, s⟩^(I′,v) ∉ =ν^(I′) then ⟨s, do(a′, s′)⟩^(I′,v) ∉ =ν^(I′)
(i) if ⟨a, s⟩^(I′,v) ∈ =ν^(I′) then ⟨s, a⟩^(I′,v) ∈ =ν^(I′)

This concludes the extension of I to I′. The construction implies that M = (D, I′) is a model of (H1-H2, E1-E5). □
Lemma 6 Let =ν be a relation on the terms of sort actions and situations. Then =ν is an equivalence relation
both on situations and on actions and situations.
Proof of Lemma 6.
First note that the relation =ν is reflexive both on situations and on actions. By axiom E5 it is symmetric on
action and situations. We show that it is symmetric and transitive on situations, likewise that it is transitive over
actions and situations:
Basic case, symmetry on s:

s =ν S0 ≡ S0 =ν s    (By E1.)    (53)

Let now s, s′ ≠ S0; we shall first show:

(a). do(a, s) =ν do(a′, s′) ≡ a =ν a′ ∧ s =ν s′ ∧ s′ =ν a ∧ a′ =ν s    (54)
Indeed:
3.1
3.2
3.3
3.4
3.5
do(a, s)=ν do(a� , s� )
do(a, s)=ν a�
a� =ν do(a, s)
s� =ν do(a, s)
do(a, s)=ν do(a� , s� )
≡
≡
≡
≡
≡
do(a, s)=ν a� ∧ s� =ν do(a, s)
(By E4)
a� =ν do(a, s)
(By E5)
a=ν a� ∧ (s=ν a� )
(By E4)
(55)
s� =ν a ∧ s� =ν s
(By E4)
(a=ν a� ) ∧ (s=ν s� ) ∧ (s� =ν a) ∧ (a� =ν s) (By 3.2, 3.3, 3.4, E5 and Ind. Hyp.)
We can thus show symmetry for situations:
symmetry-s : do(a, s)=ν do(a� , s� ) ≡
do(a, s)=ν do(a� , s� ) ≡
≡
≡
s:
do(a� , s� )=ν do(a, s)
(a=ν a� ) ∧ (s=ν s� ) ∧ (s� =ν a) ∧ (a� =ν s)
(a� =ν a) ∧ (s� =ν s) ∧ (s=ν a� ) ∧ (a=ν s� )
do(a� , s� )=ν do(a, s)
(By 3)
(By Ind. Hyp.)
(56)
We shall now show transitivity for actions and situations; here (symm) shall refer to both (E5) and symmetry-s:

T1. a =ν s ∧ s =ν s′ → a =ν s′.    (57)
For either s = S 0 or s� = S 0 or both, it is trivially true, by (E2). Let s, s� � S 0
1. a=ν s ∧ s=ν s� ∧ a = a→a=ν s� ∧ s� =ν s ∧ a=ν a
( By (E1) and (E5))
2. a=ν s ∧ s� =ν s ∧ a=ν a→do(a, s� )=ν do(a, s)
( by (a))
3. do(a, s� )=ν do(a, s)→a = do(a, s� ) ∧ s=ν do(a, s� ) ( by (E4))
4. a = do(a, s� )→a=ν a ∧ a=ν s�
( by (E3) and (symm.))
5. a=ν s ∧ s=ν s� →a=ν s�
( by 1,4 and Taut)
(58)
T 2. a� =ν s ∧ s=ν s� ∧ s� =ν a→a� =ν a.
(59)
1.
2.
3.
4.
5.
a� =ν s ∧ s� =ν s→s=ν do(a� , s� )
( By (E4) and (symm) )
a=ν s� ∧ s� =ν s→a=ν s
( by T1. and (symm.))
a=ν s ∧ s=ν do(a� , s� )→a=ν do(a� , s� ) ( by 1, 2 and T1.)
a=ν do(a� , s� )→a=ν a� ∧ a=ν s�
( by (E4))
a� =ν s ∧ s=ν s� ∧ s� =ν a→a� =ν a
( by 1,2, 4, (symm.) and Taut)
(60)
Similarly, from (a), (E3), (E4) and (E5) it is possible to prove that
T 3. a=ν a� ∧ a� =ν s→a=ν s.
(61)
Finally transitivity for situations can be shown by induction on s. For s = S 0 and s�� = S 0 :
1.
2.
3.
4.
do(a, S 0 )=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , S 0 )→a=ν a��
( By (a) above and Lemma 4)
a=ν a�� →a�� =ν do(a, S 0 )
( by (E3))
(62)
a�� =ν do(a, S 0 )→do(a, S 0 )=ν do(a�� , S 0 )
( by (E4))
do(a, S 0 )=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , S 0 )→do(a, S 0 )=ν do(a�� , S 0 ) ( by 1, 3 and Taut.)
For s � S 0 :
1.
2.
3.
4.
5.
do(a, s)=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , s�� )→a=ν s� ∧ a� =ν s ∧ a� =ν s�� ∧ a�� =ν s�
s=ν s� ∧ s� =ν s�� →s=ν s��
a=ν s� ∧ s� =ν s�� →a=ν s��
a=ν a�� ∧ s=ν s�� ∧ a=ν s�� →do(a, s)=ν do(a�� , s�� )
do(a, s)=ν do(a� , s� ) ∧ do(a� , s� )=ν do(a�� , s�� )→do(a, s)=ν do(a�� , s�� )
( By (a) )
( by Ind. Hyp.)
( by (a) )
(63)
( by (d) )
( by 5 and Taut.)
We have thus shown that =ν is an equivalence relation on the set of actions and situations.
�
B.2 Proof of Theorem 1
We have to show that Σ ∪ Duna together with the set of axioms Ax0-Ax2, that is, (T1-T3, H1-H2, E1-E5), forms
a satisfiable set. We have shown incrementally that Σtime = Σ ∪ Ax0 is a conservative extension of Σ (Lemma 1), that
ΣH = Σtime ∪ Ax1 conservatively extends Σtime (Lemma 3), and that Σ=ν = ΣH ∪ Ax2 is a conservative extension of
ΣH (Lemma 5); in particular, all are conservative extensions of Σ ∪ Duna. Hence any model of Σ ∪ Duna
can be extended to a model of Σ ∪ Ax0 ∪ Ax1 ∪ Ax2.
�
B.3 Proof of Corollary 1
By Theorem 1 we know that Σ=ν ∪Duna is satisfiable in some model M of L3 . On the other hand the satisfiability
of DS 0 and hence of Σ=ν ∪ Duna ∪ DS 0 depends on the design of DS 0 and, in particular, on the definition of
H(i, a), for each component i, which are in DS 0 . If DS 0 ∪ Duna ∪ Σ=ν is satisfiable, then following the same
arguments of the relative satisfiability theorem (see Theorem 8) a model M of DS 0 ∪ Duna ∪ Σ=ν can be easily
extended to a model of Σ=ν ∪ Duna ∪ DS 0 ∪ D ss ∪ Dap . The other direction follows from the fact that a model
of Σ=ν ∪ Duna ∪ DS 0 ∪ D ss ∪ Dap is also a model of Σ=ν ∪ Duna ∪ DS 0 .
�
The only remaining concern is the specification of the H(i, a), for each type i and each action A mentioned in
Duna and in DS0, namely whether there exists a model of DS0 ∪ Duna ∪ Ax1. If this model exists, then using the previous
theorem and lemmas it can be extended to a model of DS0 ∪ Duna ∪ Σ=ν.
So we may make some assumptions on the definition of the H(i, a) to show that, under these conditions, a
model of DS0 ∪ Duna ∪ Ax1 exists.
Lemma 7 Let D−S 0 ∪ Duna be satisfiable in some model M of L2 , let it be uniform in S 0 and not mentioning the
predicate H(·). Then the definitions of H(·), for each type i, and action A referred to, in Duna , can be safely added
to D−S 0 ∪ Duna in the form:
∀a.H(i, a) ≡ ϕ(i, a), with ϕ not mentioning H(·)
(64)
If there are formulas ϕ(i, a) specifying actions and components such that, for each i,

i. D−S0 ⊭ ∀a.ϕ(i, a),
ii. D−S0 |= ∃a.ϕi(i, a) ∧ ⋀_{j=1, j≠i}^{n} ¬ϕj(j, a),    (65)

then the extended DS0 will satisfy, for all types i:

a. H(i, a) in DS0 occurs only in formulas of the form ∀a.H(i, a) ≡ φ(i, a), with φ not mentioning H(·),
b. DS0 ∪ Duna ∪ {∀a. ⋁_{i=1}^{n} H(i, a)} ∪ {∀a.H(i, a) → ⋀_{j=1, j≠i}^{n} ¬H(j, a)} is satisfiable in some model M′ of L3.    (66)
Proof. Assume that D−S 0 is satisfiable in some structure M of L2 and, thus, it does not mention H(·). Then we
extend the theory DS 0 to L3 according to the following construction.
First note that here we mention i as an element of the domain object, no axioms for types are assumed so far,
although we can assume that there are n elements of the domain object specifying components (despite we use
natural numbers to denote them). Define, similarly as in Lemma 3, an indexing function i = mod( j − 1, n) + 1
for action names A j so that actions are grouped in such a way that M �|= ∀aϕ(i, a), with ϕ(i, a) a suitable formula
mentioning the A j specified in Duna , and satisfying the conditions of the Lemma. For example, if there is a finite
�
set of action names ascribed to a component i, then ϕi (i, a) is ∃ x�1 , . . . , x�k kj=1 a = A j (�x j ), as in (6), with the
A j suitably chosen with i = mod( j − 1, n) + 1, and it satisfies all the conditions of the lemma. Given M, by the
Lowenheim-Skolem theorem, there is a model M∼ of DS 0 ∪ Duna which is elementary equivalent to M and has
a countable domain. As usual we define a new structure M1 = (D, I) for L3 , with the same domain and the same
interpretation as M∼ on all symbols of L2 . Furthermore M1 interprets all the new constants symbols that are
added in the construction, as illustrated below in (67).
The construction with new constants is given according to the above specified enumeration of formulas ϕ(i, a),
and the above specified conditions as follows:
∆0
...
∆i
=
=
{ψ | M∼ |= ψ}
�
{H(i, a) ≡ ϕ(i, a) | M∼ , v |= ϕ(i, a) iff M1 , v |= H(i, a) and ∃a.ϕ(i, a) not used in ∆ j , 0 < j < i}
�
(67)
{H(i, c) | M1 , v |= H(i, a) ∧ ϕ(i, a), av = d = cI , c a fresh constant symbol}
�
v
I
{¬H(i, c) | M1 , v |= ¬ϕ(i, a) ∧ ¬H(i, a), a = d = c , c a fresh constant symbol}
�
�
{H(i, c)→ nj�i ¬H( j, c) | M1 , v |= nj�i ¬ϕ j ( j, a) ∧ ϕi (i, a), , av = d = cI , c a fresh constant symbol}
j=1
j=1
Each ∆i , 0 ≤ i ≤ n is satisfiable in M1 , by construction, furthermore
D=
n
�
(68)
∆i
i=0
is satisfiable in M1 and H is complete in D, which is the diagram of H, in M1 , that is, H(i, c) ∈ D iff M1 , v |=
H(i, a) and ¬H(i, c) ∈ D iff M1 , v |= ¬H(i, a), with av = d = cI . By the constraints on each ϕ(i, a) in the
�
�
enumeration, M∼ �|= ∀a.ϕ(i, a), and M∼ , v |= ϕi (i, a) ∧ nj�i ¬ϕ j ( j, a) hence H(i, c)→ nj�i ¬H( j, c) ∈ ∆i . It
j=1
j=1
�
�
remains to show that ni=1 ¬H(i, c) � ∆i . But that ni=1 ¬H(i, c) ∈ ∆i is impossible, since for each ∆i all the added
�n
constants are fresh hence if ¬H(i, c) ∈ ∆i , then ¬H( j, c) � D, i � j. Also because if j�i ¬H( j, c) ∈ ∆i , then by
j=1
�
the condition on the subset, it must be that H(i, c) hence ni=1 ¬H(i, c) � ∆i .
�n
�
It follows that M1 �|= ∃a. i=1 ¬H(i, a) and since M1 is also a model of DS 0 it follows that DS 0 �|= ∃a. ni=1 ¬H(i, a).
�n
�n
On the other hand H(i, c)→ j�i ¬H( j, c) ∈ ∆i , for each i, hence D |= H(i, c)→ j�i ¬H( j, c) for each i with c
j=1
j=1
�
�
�
�
�
�
new constants, hence M1 |= ∀a.H(i, a)→ nj�i ¬H( j, a). Thus DS 0 Duna {∀a.H(i, a)→ nj�i ¬H( j, a)} {∀a. ni=1 H(i, a)}
j=1
j=1
is satisfiable in M1 .
Therefore, under the conditions (65), M1 |= H1. Following again Lemma 3 the construction can lead to a
model for DS 0 ∪ Duna ∪ Ax1 .
�
B.4 Proof of Theorem 2
Theorem 2 A timeline represents the =ν -equivalence class of situations of the same type.
Proof of the theorem. Recall that a timeline is defined by an (improper) successor state axiom as follows:
T(i, do(a, s)) ≡ (s ≠ S0 ∧ a =ν s ∧ T(i, s)) ∨ (s = S0 ∧ H(i, a)).    (69)
We show that a timeline corresponds to an =ν -equivalence class. Define:
[do(a, s)] = {do(a′, s′) | do(a, s) =ν do(a′, s′)}    (70)
[ do(a, s)] is an =ν -equivalence class because =ν is an equivalence relation on actions and situations (see Lemma
6). We show, by induction on s′, that:

do(a′, s′) ∈ [do(a, s)] iff ∃i.T(i, do(a′, s′)) ∧ T(i, do(a, s)).

By the definition of the equivalence class, this amounts to showing that

do(a, s) =ν do(a′, s′) ≡ ∃i.T(i, do(a′, s′)) ∧ T(i, do(a, s)).
Basic case s� = S 0
(71)
⇒ A. s� = S 0 and s = S 0
1. do(a� , S 0 )=ν do(a, S 0 )→a=ν a�
(By (a), Lemma 6 )
2. a=ν a� →∃i.H(i, a) ∧ H(i, a� )
(By (H2))
3. H(i, a) ∧ H(i, a� ) ∧ s� = S 0 ∧ s = S 0 →
(72)
∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 ))
(By (W1))
4. do(a� , S 0 )=ν do(a, S 0 )→∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 )) (By A1, A3 and Taut.)
�
B. s = S 0 and s � S 0 and the Ind. Hyp. is s=ν do(a� , S 0 )→∃i.T (i, do(a� , S 0 )) ∧ T (i, s), for s ∈
[do(a, s)].
1. do(a� , S 0 )=ν do(a, s)→a=ν do(a� , S 0 ) ∧ s=ν do(a� , S 0 )
(By (E4))
2. s=ν do(a� , S 0 )→∃i.T (i, do(a� , S 0 )) ∧ T (i, s)
(By Ind. Hyp.)
3. a=ν do(a� , S 0 )→a=ν a�
(By (E3).)
4. s=ν do(a� , S 0 )→a� =ν s
(By (E4))
(73)
5. a� =ν a� ∧ a� =ν s→a=ν s
(By (T3) Lemma 6)
6. ∃i.T (i, do(a� , S 0 )) ∧ T (i, s) ∧ a=ν s→
∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s))
(By 2 and (W1))
7. do(a� , S 0 )=ν do(a, s)→∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s)) (By 1, 6 and Taut.)
⇐ C. s� = S 0 and s = S 0
1. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 ))→∃i.H(i, a� ) ∧ H(i, a)
(By (W1))
2. ∃i.H(i, a) ∧ H(i, a� )→a=ν a�
(By (H2))
3. a� =ν a ∧ s = S 0 →a� =ν do(a, S 0 )
(By (E3) )
(74)
4. a� =ν do(a, S 0 )→do(a, S 0 )=ν do(a� , S 0 )
(By (E4))
5. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, S 0 ))→do(a, S 0 )=ν do(a� , S 0 ) (By 1,4 and Taut.)
�
D. s = S 0 and s � S 0 , the Ind. Hyp. is ∃i.T (i, do(a� , S 0 )) ∧ T (i, s)→s=ν do(a� , S 0 )
1. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s))→
∃i.T (i, do(a� , S 0 )) ∧ a=ν s ∧ T (i, s)
(By (W1))
2. ∃i.T (i, do(a� , S 0 )) ∧ a=ν s ∧ T (i, s)→s=ν do(a� , S 0 )
(By Ind. Hyp.)
(75)
3. a=ν s ∧ s=ν do(a� , S 0 )→a=ν do(a� , S 0 )
(By (E3) and symm.)
�
�
�
4. a=ν do(a , S 0 ) ∧ s=ν do(a , S 0 )→do(a, s)=ν do(a , S 0 )
(By (E4))
5. ∃i.T (i, do(a� , S 0 )) ∧ T (i, do(a, s))→do(a� , S 0 )=ν do(a, s) (By 1,4 and Taut.)
Induction s� � S 0 (we may also assume w.l.g s � S 0 , otherwise it can be reduced to the above basic cases.)
⇒ The Ind. Hyp. is s=ν s� →∃i.T (i, s) ∧ T (i, s� ) with s� ∈ [s].
1. do(a� , s� )=ν do(a, s)→a=ν a� ∧ s=ν s� ∧ a=ν s� ∧ a� =ν s
(By (b,c), Lemma 6 )
2. a� =ν s ∧ s=ν s� →a� =ν s�
(By (T1), Lemma 6 )
3. a=ν s� ∧ s� =ν s→a=ν s
(By (T1), Lemma 6 )
4. s=ν s� →∃i.T (i, s� ) ∧ T (i, s)
(By Ind. Hyp. and 1)
5. ∃i.T (i, s� ) ∧ T (i, s) ∧ a=ν s ∧ a� =ν s� →
∃i.T (i, do(a, s)) ∧ T (i, do(a� , s� ))
(By (W1))
6. do(a� , s� )=ν do(a, s)→∃i.T (i, do(a, s)) ∧ T (i, do(a� , s� )) (By 1, 5 and Taut.)
⇐ The induction hypothesis is ∃i.T (i, s) ∧ T (i, s� )→s=ν s� with s� ∈ [s].
(76)
1. ∃i.T (i, do(a� , s� )) ∧ T (i, do(a� , s� ))→
a� =ν s� ∧ a=ν s ∧ T (i, s) ∧ T (i, s� )
(By (W1))
2. ∃i.T (i, s) ∧ T (i, s� )→s=ν s�
(By Ind. Hyp.)
(77)
3. a� =ν s� ∧ s� =ν s ∧ a=ν s→a� =ν s ∧ a=ν s� ∧ a=ν a�
(By (T1,T2), Lemma 6 )
�
�
�
� �
4. a=ν s ∧ s=ν s ∧ a=ν a →→do(a, s)=ν do(a , s )
(By (d), Lemma 6 )
5. ∃i.T (i, do(a� , s� )) ∧ T (i, do(a� , s� ))→do(a, s)=ν do(a� , s� ) (By 1,4 and Taut.)
We have thus shown that a timeline represents an equivalence class indexed by a type (that is, by a component
of the system). Furthermore, it follows from the definition that S0 does not belong to any equivalence class, hence
to no timeline. Thus any induction on timelines requires the basic case to be do(a, S0). □
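The content of Theorem 2 can be illustrated on toy data: if H assigns every action name to exactly one component type, then every =ν-connected action sequence lies on exactly one timeline, while a mixed sequence lies on none. The following Python sketch uses invented action and type names and is only meant to make the partition visible.

# Hedged sketch of what Theorem 2 states, on invented data: the situations
# (action sequences other than S0) split into =v-equivalence classes, one
# per component type, i.e. one per timeline.  Names are made up.
H = {"start_map": "slam", "end_map": "slam",
     "start_exp": "exploration", "end_exp": "exploration"}

def timeline_of(situation):
    """A situation [a1, ..., an] lies on timeline i iff every action has type i."""
    types = {H[a] for a in situation}
    return types.pop() if len(types) == 1 else None

situations = [["start_map"], ["start_map", "end_map"],
              ["start_exp"], ["start_map", "start_exp"]]
for s in situations:
    print(s, "->", timeline_of(s))
# The mixed sequence maps to None: it is not =v-connected, so it lies on no timeline.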
B.5 Proof of Theorem 3
We have to show that the axioms G1-G5 have a model which is also a model of Σ=ν ∪ Duna .
A structure for L3 which is a model for Σ=ν ∪ Duna has been provided in Theorem 1, furthermore, if we
consider a satisfiable initial database DS 0 as shown in Corollary 1, there exists a model of Σ=ν ∪ Duna ∪ DS 0 that
can be extended to a model of D = Σ=ν ∪ Duna ∪ DS 0 ∪ D ss ∪ Dap , with successor state axioms mentioning
also timelines. Therefore let M1 = (D, I0 ) be such a model. We shall show that this model M1 can be extended
to a model M2 in which we give an appropriate interpretation for the sort S of bags of timelines, and such that
the axioms G1-G5 are satisfied. Let M2 = (D ∪ {S}, I), i.e. M2 has the same domain as M1 for the sorts object
(including the reals, for time), actions and situations; I is like I0 for all symbols of the language L3 and it is extended to
interpret the above mentioned elements in S according to the following steps.
Let n ∈ N and �s1 , . . . , sn � a n tuple of situations in S itn , S it ⊂ D and v any valuation on the free variables.
1.
2.
If n = 0
If n > 0
B 0I ∈ S .
(I,v)
�s(I,v)
1 , . . . , sn � ∈ S iff for each sk , k = 1, .., n, M1 , v |= (sk = S 0 ∨ ∃iT (i, sk ))
(78)
Now, each term (I,v) ∈ S is invariant to both permutations of the order of the tuple and compaction of
repeated situations. That is, whenever p is a permutation of {1, . . . , n} and �s1 , . . . , sn � is a tuple of situations in
S itn , S it ⊂ D, then
(I,v)
(I,v)
(I,v) (I,v)
(I,v) (I,v)
�s(I,v)
1 , . . . , sn � = �κ(s p1 , . . . , s pi , s pi , . . . , s pn )� p
(79)
Here κ : S �→ S indicates a compaction function on a tuple of the repeated k elements of a n + k tuple and
(I,v)
�· � p : S �→ S the permutation function on {1, . . . , n}. More precisely, �s(I,v)
p1 , . . . s pn � p has been obtained
(I,v) (I,v)
(I,v) (I,v)
from κ(�s(I,v)
1 , . . . , s pi , s pi , . . . , s pn � p ) by compacting to a single representative the repeated arguments s pi ,
(I,v)
(I,v)
(I,v)
(I,v) (I,v)
and �s(I,v)
by a permutation p of {1, . . . , n}. In the
1 , . . . , sn � has been obtained from �s p1 , . . . , s pi , . . . s pn � p
language the permutation �·� p incorporates also the compaction κ.
[G1] Given the construction of terms of sort bag of timelines we can state the following, for n ∈ N and v any
assignment to the free variables:
� �
�
M2 , v |= s ∈S B(�s j1 , . . . , s jn � j ) iff M1 , v |=
(s = s jk ∧ ∃i.T (i, s jk )) ∨ S 0 = s jk
(80)
1≤k≤n
And since M2 is like M1 for L3 it follows that (G1) is satisfied in M2 .
We can now generalise the membership relation over bags of situations as follows:
1.
2
For all sI,v ∈ S it ⊂ D,
I f (I,v) = B 0I
Otherwise
M2 , v |= s ∈S
then M2 , v |= s �S
(I,v)
iff there is a n ∈ N and a n tuple �s(I,v)
1 , . . . sn �, of elements of S it ⊂ D s.t.
(I,v)
(I,v) (I,v)
(I,v)
= �κ(s p1 , . . . s pn )� p , for some k = 1, . . . , n, s(I,v) = s(I,v)
pk , and
M2 , v |= ∃i.T (i, s) iff sI,v � S 0I
Now we have given an interpretation in M2 to ∈S and built the terms of sort bag of timelines, thus we can
define the interpretation for =S :
M2 , v |=
=S
�
�
iff for any sI ,v ∈ S it (M2 , v |= s ∈S
iff M2 , v |= s ∈S
�
)
(82)
(81)
[G2] Follows from (82) above.
[G3] This is a consequence of item (1) of (78), defining the interpretation of the constant B 0 .
[G4] By definition of timeline and of bag of timelines, if M2 , v |= (s = S 0 ∨ ∃iT (i, s)) then B(s) is a bag of
timelines. Consider, now, the definition of ∪S given in Example 10 then
M2 , v |=
=S
�
∪ B(s) iff M2 , v |= ∀s.s ∈
≡s∈
�
∪ B(s)
Hence by simple induction on the structure of � , G4 is satisfied in M2 .
[G5] Let S be the terms of sort bag of timelines, and consider the definition of ⊆S given in Example 10:
⊆S is an ordering relation on S , then every subset of S has a set of minimal elements and, in particular, B 0 is
a minimum. Now suppose that
M2 , v |= ϕ(B 0 ) ∧ (∀s .ϕ( ) ∧ ϕ(B(s))→ϕ( ∪S B(s)))
(83)
and for some
M2 , v �|= ϕ( )
Hence
M2 , v |= ¬ϕ( )
Let W = { � |M2 , v |= ¬ϕ( � )} then W has a set of minimal elements Min = { ∈ W |¬∃ ∈ W, ⊂S }. Let
x ∈ Min then, by hypothesis M2 , v |= ¬ϕ(x), with x � B 0 . Now, being x minimal in W, there must exist a � ,
�
∈ S , with � ⊂ x and � � W. Then M2 , v |= ϕ( � ), since � � W. We can find an s such that x = � ∪S B(s),
by the conditional existence. By equation (83)), since by hypothesis M2 , v |= ϕ(B 0 ), and from M2 , v |= ϕ( � )
and ϕ(B(s)) it follows that M2 , v |= ϕ( � ∪ B(s)). Hence M2 , v |= ϕ(x), a contradiction.
We have shown that M2 is a model of Duna ∪ Σ=ν ∪ (G1-G5).
�
B.6 Proof of Theorem 4
To prove the theorem we shall first extend the definition of uniform terms and regressable sentence (see [61])
to include terms and formulas mentioning terms of sort bag of situations. Let LT FS C be the language of SC
extended to include A+ and let D+ be a basic action theory extended with A+ .
Let σ denote a term of sort situation mentioning only S 0 and terms of sort actions αi = A(t1 , . . . , tm ), m ≥ 0,
with ti , 1 ≤ i ≤ m, not mentioning terms of sort situation, that is, appealing to the notational convention used in
[61] σ = do([α1 , . . . , αn ], S 0 ), for some n ≥ 0, and for terms α1 , . . . , αn of sort action.
Definition 4 The set of terms of the language LT FS C uniform in σ1 , . . . , σk , k ≥ 1, is the smallest set defined as
follows:
1. Any term not mentioning term of sort situation is uniform in σ1 , . . . , σk , k ≥ 1.
2. σi is uniform in σ1 , . . . , σk , i = 1, . . . , k.
3. If g is an n-ary function symbol other than do and B, and t1 , . . . , tn are terms uniform in σ1 , . . . , σk whose
sorts are appropriate for g, then g(t1 , . . . , tn ) is a term uniform in σ1 , . . . , σk .
4. B(σi ) is a term uniform in σ1 , . . . , σk , i = 1, . . . , k.
5. B(�σ j1 , . . . , σ jk � j ) is a term uniform in σ1 , . . . , σk , for any permutation j of {1, . . . , k}.
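A possible concrete reading of the bag-of-timelines terms handled in Definition 4, assuming the flat-set semantics of axioms G1-G5, is sketched below in Python: situations are tuples of actions, B is a frozenset constructor, and membership, permutation/compaction invariance and union behave as ordinary set operations. All names are illustrative.

# Hedged sketch: bags of timelines read as flat frozen sets of situations.
S0 = ()

def B(*situations):
    """Bag constructor: order and repetition of the arguments do not matter."""
    return frozenset(situations)

sigma1 = S0 + (("start_map", 1.0),)
sigma2 = sigma1 + (("start_exp", 2.0),)

bag = B(sigma1, sigma2)
print(sigma1 in bag)                         # membership holds for situations only (cf. G1)
print(B(sigma2, sigma1, sigma1) == bag)      # permutation/compaction invariance
print(bag | B(S0) == B(sigma1, sigma2, S0))  # union of bags (cf. Example 10)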
Finally:
Items (4) and (5), of Definition 4 above, are correct because bags of timelines are flat sets, that is, they are
formed only by situations, which are their individual elements. This follows from the first axiom (G1) defining
membership only for elements of sort situation. To see this we prove the following lemmas.
Lemma 8 For all bags of timelines and :
B.6.1
�S .
Proof of Lemma 8
We prove the claim by induction on , using (G5). First note that by (G1) � B 0 and by (G2) � B(s), because
�S s. Let: φ( ) = ∀ . � , then we have:
φ( )
φ(B 0 )
φ(B(s))
=
=
=
∀ . �
(Ind. Hyp)
∀ . � B0
(by G1)
∀ . � B(s)
(by G2)
Then
∀ . � ∧ � B(s)→ � ∪S B(s) (by Def. of ∪S Ex. 10)
∀ . � (by Ind. Hyp. and G5)
(84)
Hence the claim.
�
The above lemma implies, in particular, that bags of timelines are not ordinal numbers.
Lemma 9 Let P( ) be the power set of the bag . Then for any bag term , ∩ P( ) = ∅.
B.6.2
Proof of Lemma 9
Let P( ) be the power set of , that is ∀x.x ⊆S →x ∈S P( ). We have to show that for all bags of timelines
, ∩S P( ) = ∅. Suppose that there is some x, such that x ∈S ∩S P( ), then x ∈S and x ∈S P( ).
Since x ∈S P( ) then x is a bag term � hence � ∈S contradicting the previous Lemma 8.
�
Lemma 10 If
B.6.3
is a set of bag terms then
�S
for all bag terms .
Proof of Lemma 10
Follows from the previous Lemma 9.
�
We define now the set of regressable formulas extending the definition of [61] to include formulas mentioning
bag of timelines.
Definition 5 A formula W of LT FS C is regressable iff
1. W is first order.
2. W does not mention variables of sort situation nor of sort bag of timelines.
3. Every term of sort situation mentioned by W is uniform in σ1 , . . . , σn , n ≥ 1.
4. For every atom of the form Poss(α, σ) mentioned by W, α has the form A(t1 , . . . , tn ) for some n-ary action
function symbol A of LT FS C .
5. Every term of sort bag of timelines appearing in W is uniform in σ1 , . . . , σk , for some k ≥ 0.
6. W does not quantify over situations nor bag of situations.
First note that, by definition of the regression operator,
if
then
D |= W ≡ W �
D |= R(W) ≡ R(W � )
(85)
Let W be a regressable formula mentioning terms of sort bag of timelines we show that
D+ |= (∀)W ≡ R(W)
with R(W) a formula uniform in S 0 .
We show the claim by induction on the structure of the regressable formula W, mentioning terms of sort bag
of timelines. We first show, however, that T (i, σ) is regressable if σ is a uniform term.
Lemma 11 Consider a uniform term of sort situation, of the form σm+1 = do(Am+1 (�xm+1 ), do(. . . , do(A1 (�x1 ), S 0 ) . . .)),
for some m ∈ N and actions A1 , . . . Am+1 , that is a timeline. Then:
R(T (i, do(Am+1 , σm ))) ≡
m+1
�
H(i, A j (�x j ))
(86)
j=1
Proof of the Lemma First note that given σm+1 , as specified in the Lemma, the following holds, by m applications
of Axioms (E1) and (E3) and of theorem (T1) of Lemma 6:
Am+1 =ν σm ∧ σm = do(Am (�xm ), do(. . . , do(A1 (�x1 ), S 0 ) . . .))
≡
≡
Am+1 (�xm+1 )=ν Am (�xm ) ∧ . . . ∧ A2 (�x2 )=ν A1 (�x1 )
(87)
H(i, Am+1 (�xm+1 )) ∧ . . . ∧ H(i, A1 (�x1 ))
Further, by induction on the number of actions mentioned in the uniform situation term σm+1 , we can see
that:
T (i, do(Am+1 (�xm+1 ), σm )) ≡
m+1
��
j=1
�
A j (�x j )=ν σ j−1 ∨ σ j−1 = S 0 ∧ H(i, A(�x j ))
(88)
Indeed, the basic case, for σm = S 0 follows from the definition of T (i, do(a, s)). For the induction, we have:
T (i, do(Am+1 (�xm+1 ), σm )) ≡
≡
Thus
�m+1
j=1
≡
T (i, σm ) ∧ Am+1 (�xm=1 )=ν σm ∨
σm = S 0 ∧ H(i, Am+1 (�xm=1 ))
�
�m+1 �
x j )=ν σ j−1 ∨ σ j−1 = S 0 ∧ H(i, A(�x j ))
j=1 A j (�
∧Am+1 (�xm=1 )=ν σm ∨ σm = S 0 ∧ H(i, Am+1 (�xm=1 ))
�m+1
x j ))
j=1 H(i, A j (�
(89)
(By Ind. Hyp.)
(By (87), above and Taut.)
H(i, A j (�x j )) ≡ R(T (i, do(Am+1 , σm ))), a formula uniform in S 0 .
�
Theorem continue.. For the basic step we consider the following atoms (see Definition 1):
1. if W has the form σ ∈ B(�σ j1 , . . . , σ jk �), with each σ ji = do([α1 , . . . , αn ], S 0 ) for some n ≥ 0, then by
axiom (G1),


�
�


D+ |= W ≡
σ = σ jp ∧ σ jp = S 0 ∨
T (i, σ jp )
1≤p≤k
i
So let W � be the RHS of the equivalence in the above formula, by Lemmas 8,9 and (G1) W � does not
mention terms of sort bag of timelines and moreover is a regressable formula (Lemma 11) hence the
regression theorem, as stated in [61] applies. Therefore R(W � ) is a formula uniform in S 0 and, by (85)
above:
R(W) ≡ R(W � )
(90)
and the claim is verified by monotonicity since D ⊆ D+ .
2. If W has the form
=S
then, because W is a regressable sentence, it has the following form:
B(�σ j1 , . . . , σ jk � j ) =S B(�σ�p1 , . . . , σ�pm � p )
then by (G2), W is equivalent to the formula W � :
�
� �
�
∀s. s ∈S B(�σ j1 , . . . , σ jk � j ) ≡ s ∈S B(�σ p1 , . . . , σ pm � p )
(91)
Finally W �� , by first order tautologies and equality, is equivalent to the following sentence W ��� :
��
�
�
�
1≤h≤k
1≤q≤m (σ jh = σ pq ∧ (σ jh = S 0 ∨
i T (i, σ jh )))
(93)
And, by (G1), W � is equivalent to the following formula W �� :
��
�
�
�
�
∀s. 1≤h≤k s = σ jh ∧ (σ jh = S 0 ∨ i T (i, σ jh )) ≡ 1≤q≤m s = σ pq ∧ (σ pq = S 0 ∨ i T (i, σ pq )) (92)
Now, by Lemma 11, W ��� is a regressable sentence not mentioning terms of sort bag of timelines and hence
the regression theorem can be applied and the claim holds.
3. If W has the form:
=S B 0
Then W reduces to either � or ⊥, which are regressable sentences in L, and R(�) (as R(⊥)) are uniform
in S 0 , hence the claim holds.
4. If W has the form:
B(�σr1 , . . . , σrm �r ) =S B(�σ�j1 , . . . , σ�jk � j ) ∪S B(�σ��p1 , . . . , σ��pn � p )
then we can use the definition of ∪S given in Example 10:
�
( ∪S
=S
≡ ∀s.s ∈S
≡ (s ∈S
∨ s ∈S
�
))
Then W is equivalent to the following formula W � :
∀s.s ∈S B(�σr1 , . . . , σrm �r ) ≡ s ∈S B(�σ�j1 , . . . , σ�jk � j ) ∨ s ∈S B(�σ��p1 , . . . , σ��pn � p )
(94)
Again using (G1), we get W �� :
�
s = σu ∧ (σu = S 0 ∨ i T (i, σu )) ≡
�
�
�
�
�
�
�
��
��
��
1≤h≤k s = σh ∧ (σh = S 0 ∨
i T (i, σh )) ∨
1≤q≤n s = σq ∧ (σq = S 0 ∨
i T (i, σq ))
∀s.
�
��
1≤u≤m
(95)
By first order tautologies and equality, W �� reduces to the equivalent formula W ��� :
�
1≤u≤m
��
1≤h≤k (σu
= σ�h ) ∨
�
1≤q≤n (σu
�
= σ��q ) ∧ (σu = S 0 ∨ T (i, σu ))
(96)
W ��� is a regressable sentence of L3 and does not mention terms of sort bags of timelines, therefore the
regression theorem can be applied and also in this case the claim holds.
5. Any other regressable atom (see Definition 1) mentioning the classical set-operators, that can be defined
using (G1-G5), can be easily reduced to the previous cases.
Now, by induction hypothesis, regression can be extended to any regressable sentence mentioning terms of sort
bag of timelines simply using the inductive definition of R:
R[¬W]
=
R[W1 ∧ W2 ] =
R[(∃v)W]
=
¬R[W],
R[W1 ] ∧ R[W2 ],
(∃v)R[W]
(97)
This concludes the proof that regression can be extended to regressable formulas mentioning terms of sort bag of
timelines
�
Appendix C
C Proofs for Section 4
C.1 Proof of Theorem 5
Let D_T be the theory formed by D_{ss} ∪ D_{ap} ∪ D_π ∪ D_{S_0} ∪ A⁺ ∪ D_{una}; we have to prove that D_T is satisfiable iff D_{S_0} ∪ A⁺ ∪ D_{una} is. If D_T is satisfiable then, despite the second-order axiom, it follows by compactness that D_{S_0} ∪ A⁺ ∪ D_{una} is satisfiable too. For the other direction, assume that D_{S_0} ∪ A⁺ ∪ D_{una} is satisfiable, and let M_1 be any such structure. We know from the proof of Theorem 3 that M_1, a structure of L_4, is also a model of D_{S_0} ∪ D_{ap} ∪ D_{ssa} ∪ D_{una} ∪ A⁺, where D_{ssa} ∪ D_{una} mentions successor state axioms for timelines, but no process is yet defined. We also assume that axiom (4) (see also (45)) is verified in M_1.
Here we have to show that M_1 can be transformed into a model of condition (20) and of the successor state axioms and action precondition axioms for processes. To this end we define a new structure M_2 = (D, I′) having the same domain as M_1, whose interpretation I′ extends I by fixing an interpretation also for the processes and for the fluent Idle, as specified in the sequel.
Let Π be the set of processes in the language L_4; we enumerate Π as
π_1(i_1, x⃗, t_0, S_0), π_2(i_1, x⃗, t_0, S_0), …, π_m(i_k, x⃗, t_0, S_0), π_{m+1}(i_k, x⃗, t_0, S_0), …
and define subsets of this ordering as follows:
Π_{i_m} = { π_j(i_m, x⃗, t_0, S_0) | i_m is the corresponding name type of the process π_j, for which H(i_m, a) is defined }.
In other words, following Corollary B.3, all the actions start_π and end_π might already have been suitably assigned to H(i, a).
We can now order the Π_{i_k} according to the name type and choose one process for each set and state:
M_2 |= ∃x⃗ π_j(i_m, x⃗, t_0, S_0) iff for all π_k(i_m, x⃗, t_0, S_0) ∈ Π_{i_m}, with k ≠ j, M_2 |= ∀x⃗ ¬π_k(i_m, x⃗, t_0, S_0).
Further, we set ¬Idle(i, S_0) for all name types i.
This construction implies that M_2 is a model of condition (20). Thus we have:
(1) M_2, v |= ¬Idle(i, S_0)      for all name types i;
(2) M_2, v |= ¬Poss(start_π(i, x⃗, t), S_0)      for all name types i, because of (1) above and definition (19);
(3) M_2, v |= Poss(end_π(i, x⃗, t), S_0) iff M_2, v |= Ψ_end(i, x⃗, S_0) and M_2, v |= π(i, x⃗, t_0, S_0) ∧ time(end_π) > t_0;
(4) M_2, v |= time(S_0) = t_0      by the construction of M_1.      (98)
Having fixed the definition of the processes, of Idle and of Poss in D_{S_0}, we can proceed inductively on all situations s, similarly to the relative satisfiability theorem in [61]. In fact, the inductive step of the proof relies on the fact that the right-hand side Ψ(x⃗, y, a, t, s) of the successor state axioms is uniform in s and hence has already been assigned a truth value in s by M_2; being fixed for D_{S_0}, the induction is straightforward. □
C.2 Proof of Proposition 1
We proceed by induction on σ, using the definition of executable given in (17) and the definitions of the action preconditions. Consider do(α, S_0): in this case, if H(i, α), then D_T |= T(i, do(α, S_0)). For the inductive step, consider do(α, σ). By the hypothesis α is executable in σ, hence D_T |= Poss(α, σ); therefore, by (19) and (21), α =_ν σ. By the induction hypothesis D_T |= T(i, σ), hence D_T |= T(i, do(α, σ)). □
C.3 Proof of Proposition 2
We have to show that, for σ an executable situation,
D_T |= Idle(i, σ) ∨ [ ∃x⃗, t. π(i, x⃗, t, σ) → ( ¬Idle(i, σ) ∧ ⋀_{π′∈Π, π′≠π} ∀y⃗. ¬π′(i, y⃗, t, σ) ) ]      (99)
First note that for σ = S_0 the statement holds because of (20). For any other σ, by the previous Proposition 1 (C.2), it follows that σ must also be a timeline. Suppose that for the given timeline T(i, σ) the statement holds for any σ″ ⊑ σ, and let do(a, σ′) be the first situation in the equivalence class of the timeline for which the statement does not hold, for some a. We show that this leads to a contradiction. Assume, therefore, that there is some model M of D_T and some processes π′(i, y⃗, t′, do(a, σ′)) and π″(i, x⃗, t, do(a, σ′)), with π′ ≠ π″, which do not satisfy (99). Then:
M |= ¬Idle(i, do(a, σ′)) ∧ ( ∃x⃗, t. π′(i, x⃗, t, do(a, σ′)) ∧ ∃z⃗, t′. π″(i, z⃗, t′, do(a, σ′)) ∨ Idle(i, do(a, σ′)) )      (100)
and, since the first conjunct rules out the disjunct Idle(i, do(a, σ′)),
M |= ¬Idle(i, do(a, σ′)) ∧ ( ∃x⃗, t. π′(i, x⃗, t, do(a, σ′)) ∧ ∃z⃗, t′. π″(i, z⃗, t′, do(a, σ′)) ).
Then:
1. Since M |= ∃x⃗, t. π′(i, x⃗, t, do(a, σ′)), by the successor state axiom for processes (16):
M |= ∃x⃗, t. a = start_{π′}(i, x⃗, t) ∨ π′(i, x⃗, t, σ′) ∧ ∀t. a ≠ end_{π′}(i, x⃗, t),
M |= ∃x⃗, t. a = start_{π′}(i, x⃗, t) ∨ ∃z⃗, t′. π′(i, z⃗, t′, σ′) ∧ ∀t″. a ≠ end_{π′}(i, z⃗, t″).      (101)
2. Since M |= ∃x⃗, t. π″(i, x⃗, t, do(a, σ′)), then:
M |= ∃x⃗, t. a = start_{π″}(i, x⃗, t) ∨ ∃z⃗, t′. π″(i, z⃗, t′, σ′) ∧ ∀t′. a ≠ end_{π″}(i, z⃗, t′).      (102)
Now, since do(a, σ′) is the first situation in which the statement fails, the statement must be true in σ′; hence it cannot be that, for some d⃗′ and some d⃗″ in the domain of M, both π′(i, d⃗′, t′, σ′) and π″(i, d⃗″, t′, σ′) hold. W.l.o.g. we may assume either of the two and establish that M |= π′(i, d⃗′, t′, σ′), for some d⃗′ ∈ D, the domain of M, and M |= ∀z⃗, t′. ¬π″(i, z⃗, t′, σ′). But now, since M satisfies π″ with argument do(a, σ′), it follows, from the successor state axiom for processes, that it must be that M |= ∃y⃗, t′. a = start_{π″}(i, y⃗, t′).
By the statement assumptions do(a, σ′) must be executable, hence M |= ∃y⃗, t′. a = start_{π″}(i, y⃗, t′) ∧ Poss(start_{π″}(i, y⃗, t′), σ′). This fact, in turn, implies, by the definition of Poss for the action start_{π″} (see (19)), that M |= Idle(i, σ′); hence it cannot be true that M |= π′(i, d⃗′, t′, σ′), for some d⃗′ ∈ D, given that (99) holds for σ′. Therefore also for π′ it must be that M |= ∀z⃗, t′. ¬π′(i, z⃗, t′, σ′). We are thus left with
M |= ∃x⃗, t. a = start_{π′}(i, x⃗, t) ∧ Idle(i, σ′) ∧ ∃y⃗, t′. a = start_{π″}(i, y⃗, t′) ∧ Idle(i, σ′).      (103)
But a cannot be equal to both start_{π′} and start_{π″}; hence it follows that
M |= ∀x⃗, t. ¬π′(i, x⃗, t, do(a, σ′)),
and we have a contradiction. □
Appendix D
D Proofs for Section 6
D.1 Proof of Theorem 6
We first introduce three lemmas, then we prove the theorem.
Lemma 12 Let K and G be finite sets of indexes; the following are tautologies:
i.  { ⋁_{i∈K} D_i } → { ⋁_{g∈G} W_g } ≡ ⋀_{i∈K} { D_i → { ⋁_{g∈G} W_g } },
ii. ∀w⃗ ∃z⃗ ⋀_{i∈K} { D_i(w⃗) → { ⋁_{g∈G} W_g(w⃗, z⃗) } } ≡ ⋀_{i∈K} ∀w⃗ { D_i(w⃗) → { ⋁_{g∈G} ∃z⃗ W_g(w⃗, z⃗) } }.      (104)
Proof. By FOL. □
Lemma 13 Given the predicates Elapsed_X(i, x⃗, t⁻, t⁺, σ) and Active_X(i, x⃗, t⁻, σ), as defined in (24, 25), with σ a ground situation of type i, for any M of TFSC and assignment v the following holds:
M, v |= Elapsed_X(i, x⃗, t⁻, t⁺, σ)  iff  M, v |= ⋁_{k∈K} [ M_{k,X}(x⃗, t⁻, t⁺, S_0) ∧ (t⁻ = τ⁻_{k,X} ∧ t⁺ = τ⁺_{k,X}) ];
M, v |= Active_X(i, x⃗, t⁻, σ)       iff  M, v |= ⋁_{k∈K} [ N_{k,X}(x⃗, t⁻, S_0) ∧ (t⁻ = τ⁻_{k,X}) ].      (105)
Here τ±_{k,X} is a time variable (or instance) mentioned in σ, and M_{k,X}(x⃗, t⁻, t⁺, S_0) and N_{k,X}(x⃗, t⁻, S_0) are TFSC formulas in S_0, with k ranging over a finite set of indexes K.
Proof. We proceed by induction on σ.
Basic case: σ = S_0. By (24) we have that
Active_X(i, x⃗, t⁻, S_0) = X(i, x⃗, S_0) ∧ time(S_0) = t⁻,
Elapsed_X(i, x⃗, t⁻, t⁺, S_0) = ⊥,
and we obtain (105) once we set, for instance, K = {1}, M_{1,X}(i, x⃗, t⁻, t⁺, S_0) = ⊥, N_{1,X}(i, x⃗, t⁻, S_0) = X(i, x⃗, S_0) and τ⁻_{1,X} = t_0, where t_0 stands for time(S_0).
Inductive step: now we assume that (105) holds for σ and we prove that it holds for do(A, σ).
By (24) we have that:
Active_X(i, x⃗, t⁻, do(A, σ)) = T(i, do(A, σ)) ∧ Started_X(i, x⃗, t⁻, A, σ) ∨ Active_X(i, x⃗, t⁻, σ) ∧ ¬∃t⁺ Ended_X(i, x⃗, t⁺, A, σ);
Elapsed_X(i, x⃗, t⁻, t⁺, do(A, σ)) = T(i, do(A, σ)) ∧ Elapsed_X(i, x⃗, t⁻, t⁺, σ) ∨ Ended_X(i, x⃗, t⁺, A, σ) ∧ Active_X(i, x⃗, t⁻, σ).
Applying the definitions of Started_X and Ended_X, the previous equations can be rewritten as follows:
Active_X(i, x⃗, t⁻, do(A, σ)) = T(i, do(A, σ)) ∧ X(x⃗, do(A, σ)) ∧ ¬X(x⃗, σ) ∧ time(A) = t⁻ ∨ Active_X(i, x⃗, t⁻, σ) ∧ ¬∃t⁺ ( X(x⃗, σ) ∧ ¬X(x⃗, do(A, σ)) ∧ time(A) = t⁺ );
Elapsed_X(i, x⃗, t⁻, t⁺, do(A, σ)) = T(i, do(A, σ)) ∧ Elapsed_X(i, x⃗, t⁻, t⁺, σ) ∨ X(x⃗, σ) ∧ ¬X(x⃗, do(A, σ)) ∧ time(A) = t⁺ ∧ Active_X(i, x⃗, t⁻, σ).      (106)
Consider the regression of the following formulas:
R(T(i, do(A, σ))) = R⁰_i(S_0);
R(X(x⃗, σ) ∧ ¬X(x⃗, do(A, σ))) = R¹_X(x⃗, S_0);
R(T(i, do(A, σ)) ∧ X(x⃗, do(A, σ)) ∧ ¬X(x⃗, σ)) = R²_X(x⃗, S_0);
R(¬∃t⁺ ( X(x⃗, σ) ∧ ¬X(x⃗, do(A, σ)) ∧ time(A) = t⁺ )) = R³_X(x⃗, S_0).
Then we can rewrite the previous equivalence (106) by substituting the regressed formulas:
Active_X(i, x⃗, t⁻, do(A, σ)) = R²_X(x⃗, S_0) ∧ time(A) = t⁻ ∨ Active_X(i, x⃗, t⁻, σ) ∧ R³_X(x⃗, S_0);
Elapsed_X(i, x⃗, t⁻, t⁺, do(A, σ)) = R⁰_i(S_0) ∧ Elapsed_X(i, x⃗, t⁻, t⁺, σ) ∨ R¹_X(x⃗, S_0) ∧ Active_X(i, x⃗, t⁻, σ) ∧ time(A) = t⁺.
If we now apply the inductive hypothesis we get:
M, v |= Elapsed_X(i, x⃗, t⁻, t⁺, do(A, σ))  iff
M, v |= ⋁_{k∈K} [ M_{k,X}(x⃗, t⁻, t⁺, S_0) ∧ R⁰_i(S_0) ∧ (t⁻ = time(a⁻_{k,X}) ∧ t⁺ = time(a⁺_{k,X})) ] ∨ ⋁_{k∈K} [ N_{k,X}(x⃗, t⁻, S_0) ∧ R¹_X(x⃗, S_0) ∧ time(A) = t⁺ ];
M, v |= Active_X(i, x⃗, t⁻, do(A, σ))  iff
M, v |= ⋁_{k∈K} [ N_{k,X}(x⃗, t⁻, S_0) ∧ R³_X(x⃗, S_0) ∧ (t⁻ = time(a⁻_{k,X})) ] ∨ R²_X(x⃗, S_0) ∧ time(A) = t⁻,      (107)
where a±_{k,X} is an action mentioned in the ground situation σ. Since time(a_{k,X}) equals a time variable (or instance) τ±_{k,X} mentioned in σ, the property (23) holds for do(A, σ). This concludes the proof. □
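To see what Lemma 13 computes, it may help to trace Active_X and Elapsed_X along a ground action sequence: Active_X records the start time of the interval in which the fluent X currently holds, and Elapsed_X records the closed intervals in which X held and then ceased. The sketch below is only an illustration of this bookkeeping; the fluent is abstracted as a boolean effect attached to each timestamped action, and the function name is hypothetical.

from typing import List, Optional, Tuple

def fluent_intervals(x_holds_initially: bool,
                     initial_time: float,
                     actions: List[Tuple[float, Optional[bool]]]):
    # actions: ordered list of (time(A), effect), where effect is True if A makes X
    # hold, False if A makes X cease to hold, and None if A does not affect X.
    active_since: Optional[float] = initial_time if x_holds_initially else None
    elapsed: List[Tuple[float, float]] = []          # Elapsed_X intervals (t-, t+)
    for t, effect in actions:
        if effect is True and active_since is None:        # Started_X
            active_since = t
        elif effect is False and active_since is not None:  # Ended_X
            elapsed.append((active_since, t))
            active_since = None
    return elapsed, active_since        # Elapsed_X pairs and Active_X start time (if any)

# Example: X becomes true at time 2, false at time 5, true again at time 9.
# fluent_intervals(False, 0.0, [(2.0, True), (5.0, False), (9.0, True)])
# -> ([(2.0, 5.0)], 9.0)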
Lemma 14 Given a bag of timelines B[ω] mentioning the set of timelines {σ_1, …, σ_n}, where ω is the tuple of the time variables t_{i,j}, the predicate I(T_c, B[ω]) can be transformed into the following form:
⋀_{d∈D} ⋀_{w∈W_d} ⋀_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} ∀x⃗ ∃y⃗ ( P_d(i_d, x⃗, τ⁻_k, τ⁺_k) op_{d,r} Q_r(j_r, y⃗, τ⁻_g, τ⁺_g)[σ_{i_d}, σ_{j_r}] ).      (108)
Here D, W_d, R_{d,w}, K_{d,w,r}, and G_{d,w,r} are finite sets of indexes, and τ⁻_k, τ⁺_k (τ⁻_g, τ⁺_g) are either temporal variables or ground temporal instances mentioned in the ground situation σ_{i_d} (respectively, σ_{j_r}) with name type i_d (j_r).
Proof. From (27) we have that I(T_c, B[ω]) is a TFSC formula of the form
⋀_{(comp(P_i(x⃗), LL)) ∈ T_c} ∃s ( s ∈_S B[ω] ∧ T(i, s) ∧ ⋀_{L ∈ LL} ⋀_{(op, Q(j,y⃗)) ∈ L} ∃s′ ( s′ ∈_S B[ω] ∧ T(j, s′) ∧ ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ ( P(i, x⃗, t_i⁻, t_i⁺) op Q(j, y⃗, t_j⁻, t_j⁺)[s, s′] ) ) ).
We want to prove that, for each model M of TFSC and assignment v,
M, v |= ∃s ( s ∈_S B[ω] ∧ T(i, s) ∧ ∃s′ ( s′ ∈_S B[ω] ∧ T(j, s′) ∧ ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ ( P(i, x⃗, t_i⁻, t_i⁺) op Q(j, y⃗, t_j⁻, t_j⁺)[s, s′] ) ) )      (109)
iff there exist two ground situations σ_i and σ_j in B[ω], of type i and j respectively, such that
M, v |= ⋀_{k∈K} ⋁_{g∈G} ∀x⃗ ∃y⃗ ( P(i, x⃗, τ⁻_k, τ⁺_k) op Q(j, y⃗, τ⁻_g, τ⁺_g)[σ_i, σ_j] ),
with K and G finite sets of indexes and τ⁻_k, τ⁺_k (τ⁻_g, τ⁺_g) time variables or instances in σ_i (σ_j).
First of all, observe that, since M is a TFSC model, the following holds:
M, v |= ∃s ( s ∈_S B[ω] ∧ T(i, s) ∧ ∃s′ ( s′ ∈_S B[ω] ∧ T(j, s′) ∧ ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ ( P(i, x⃗, t_i⁻, t_i⁺) op Q(j, y⃗, t_j⁻, t_j⁺)[s, s′] ) ) )      (110)
iff there exist two ground situations σ_i and σ_j in B[ω], of type i and j respectively, such that
M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ ( P(i, x⃗, t_i⁻, t_i⁺) op Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ).
Therefore, the claim is proved once we show that
M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ ( P(i, x⃗, t_i⁻, t_i⁺) op Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ), with σ_i, σ_j ∈_S B[ω], iff
M, v |= ⋀_{k∈K} ⋁_{g∈G} ∀x⃗ ∃y⃗ ( P(i, x⃗, τ⁻_k, τ⁺_k) op Q(j, y⃗, τ⁻_g, τ⁺_g)[σ_i, σ_j] ),      (111)
with τ⁻_k, τ⁺_k (τ⁻_g, τ⁺_g) time variables or instances thereof mentioned in σ_i (σ_j). Indeed, from (111) and (27) we obtain that M, v |= I(T_c, B[ω]) iff
M, v |= ⋀_{d∈D} ⋀_{w∈W_d} ⋀_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} ∀x⃗ ∃y⃗ ( P_d(i_d, x⃗, τ⁻_k, τ⁺_k) op_r Q_r(j_r, y⃗, τ⁻_g, τ⁺_g)[σ_{i_d}, σ_{j_r}] ),
where d ∈ D is for (comp(P(x⃗), LL)) ∈ T_c, w ∈ W_d is for L ∈ LL, and r ∈ R_{d,w} is for (op, Q(j, y⃗)) ∈ L. Once we represent ⋀_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} directly as ⋀_{⟨r,k⟩∈RK_{d,w}}, we obtain equation (108) and the Lemma is proved.
It remains to show that (111) holds. In order to prove this, we proceed by cases, restricting our attention to {m, b, s, f, d}.
Case meets: We consider the following form:
P(i, x⃗, t_i⁻, t_i⁺) m Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝ Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → [ ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁺ = t_j⁻) ].
Given (107), by FOL, M, v |= ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) iff M, v |= ⋁_{g∈G} [ W_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ], with τ_{g,Q} time variables (instances) mentioned in σ_j. Therefore, we have that
M, v |= P(i, x⃗, t_i⁻, t_i⁺) m Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j]  iff
M, v |= { ⋁_{k∈K} [ M_{k,P}(x⃗, t_i⁻, t_i⁺, S_0) ∧ (t_i⁻ = τ⁻_{k,P} ∧ t_i⁺ = τ⁺_{k,P}) ] } → { ⋁_{g∈G} [ W_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ∧ (t_i⁺ = t_j⁻) ] }.      (112)
We can now consider the formula ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ P(i, x⃗, t_i⁻, t_i⁺) m Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j]. We can see that
M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ P(i, x⃗, t_i⁻, t_i⁺) m Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j]      (by (112)) iff
M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ { ⋁_{k∈K} [ M_{k,P}(x⃗, t_i⁻, t_i⁺, S_0) ∧ (t_i⁻ = τ⁻_{k,P} ∧ t_i⁺ = τ⁺_{k,P}) ] } → { ⋁_{g∈G} [ W_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ∧ (t_i⁺ = t_j⁻) ] }      (by (104) i.) iff
M, v |= ⋀_{k∈K} { ∀x⃗, t_i⁻, t_i⁺ [ M_{k,P}(x⃗, t_i⁻, t_i⁺, S_0) ∧ (t_i⁻ = τ⁻_{k,P} ∧ t_i⁺ = τ⁺_{k,P}) ] → { ⋁_{g∈G} ∃y⃗, t_j⁻, t_j⁺ [ W_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ∧ (t_i⁺ = t_j⁻) ] } }      (by (104) ii.) iff
M, v |= ⋀_{k∈K} { ∀x⃗ [ M_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ W_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁺_{k,P} = τ⁻_{g,Q}) ] } ] }      iff
M, v |= ⋀_{k∈K} ⋁_{g∈G} ∀x⃗ ∃y⃗ P(i, x⃗, τ⁻_{k,P}, τ⁺_{k,P}) m Q(j, y⃗, τ⁻_{g,Q}, τ⁺_{g,Q})[σ_i, σ_j].
The last formula mentions only time variables in σ_i and σ_j. This concludes the proof for m.
Case before: We consider the following form:
P(i, x⃗, t_i⁻, t_i⁺) b Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝ Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → [ ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁺ < t_j⁻) ].
Analogously to the previous case, ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ P(i, x⃗, t_i⁻, t_i⁺) b Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] can be transformed into
⋀_{k∈K} { ∀x⃗ M_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ W_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁺_{k,P} < τ⁻_{g,Q}) ] } },
mentioning only time variables in σ_i and σ_j. This concludes the proof for b.
Case finishes: We consider the following form:
P(i, x⃗, t_i⁻, t_i⁺) f Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝ Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → [ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ∧ (t_i⁺ = t_j⁺) ].
Given equation (107), by regression, we have that M, v |= P(i, x⃗, t_i⁻, t_i⁺) f Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] iff
M, v |= { ⋁_{k∈K} [ M_{k,P}(x⃗, t_i⁻, t_i⁺, S_0) ∧ (t_i⁻ = τ⁻_{k,P} ∧ t_i⁺ = τ⁺_{k,P}) ] } → { ⋁_{g∈G} [ M_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ∧ (t_i⁺ = t_j⁺) ] }.
Analogously to the previous cases, M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ P(i, x⃗, t_i⁻, t_i⁺) f Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] iff
M, v |= ⋀_{k∈K} { ∀x⃗ M_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ M_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁺_{k,P} = τ⁺_{g,Q}) ] } },
mentioning only time variables in σ_i and σ_j. This concludes the proof for f.
Case starts: We consider the following form:
P(i, x⃗, t_i⁻, t_i⁺) s Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝
( Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁻ = t_j⁻) ) ∧
( Active_P(i, x⃗, t_i⁻, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁻ = t_j⁻) ).
Given equation (107), we have that M, v |= P(i, x⃗, t_i⁻, t_i⁺) s Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] iff
M, v |= ( { ⋁_{k∈K} [ M_{k,P}(x⃗, t_i⁻, t_i⁺, S_0) ∧ (t_i⁻ = τ⁻_{k,P} ∧ t_i⁺ = τ⁺_{k,P}) ] } → { ⋁_{g∈G} [ W_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ∧ (t_i⁻ = t_j⁻) ] } ) ∧
( { ⋁_{k∈K} [ N_{k,P}(x⃗, t_i⁻, t_i⁺, S_0) ∧ (t_i⁻ = τ⁻_{k,P}) ] } → { ⋁_{g∈G} [ W_{g,Q}(y⃗, t_j⁻, t_j⁺, S_0) ∧ (t_j⁻ = τ⁻_{g,Q} ∧ t_j⁺ = τ⁺_{g,Q}) ∧ (t_i⁻ = t_j⁻) ] } ).
Given this form, we have that M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ P(i, x⃗, t_i⁻, t_i⁺) s Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] iff
M, v |= ⋀_{k∈K} { ∀x⃗ M_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ W_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁻_{k,P} = τ⁻_{g,Q}) ] } } ∧
⋀_{k∈K} { ∀x⃗ N_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ W_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁻_{k,P} = τ⁻_{g,Q}) ] } },
mentioning only time instances or variables in σ_i and σ_j. This concludes the proof for s.
Case during: We consider the following form:
P(i, x⃗, t_i⁻, t_i⁺) d Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝
( Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_j⁻ ≤ t_i⁻ ∧ t_i⁺ ≤ t_j⁺) ) ∧
( Active_P(i, x⃗, t_i⁻, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_j⁻ ≤ t_i⁻) ).
Analogously to the previous case, M, v |= ∀x⃗, t_i⁻, t_i⁺ ∃y⃗, t_j⁻, t_j⁺ P(i, x⃗, t_i⁻, t_i⁺) d Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] iff
M, v |= ⋀_{k∈K} { ∀x⃗ M_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ W_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁻_{g,Q} ≤ τ⁻_{k,P} ∧ τ⁺_{k,P} ≤ τ⁺_{g,Q}) ] } } ∧
⋀_{k∈K} { ∀x⃗ N_{k,P}(x⃗, τ⁻_{k,P}, τ⁺_{k,P}, S_0) → { ⋁_{g∈G} ∃y⃗ [ W_{g,Q}(y⃗, τ⁻_{g,Q}, τ⁺_{g,Q}, S_0) ∧ (τ⁻_{g,Q} ≤ τ⁻_{k,P}) ] } },
mentioning only time variables or instances in σ_i and σ_j. This concludes the proof for d. □
Concluding the proof of Theorem 6:
To conclude the proof we consider the trivial transformation from CNF to DNF formulas; i.e., the CNF form ⋀_{d∈D} ⋁_{w∈W_d} B_{d,w} from the Lemma can be expressed as an equivalent DNF form ⋁_{n∈N} ⋀_{m∈M_n} B_{n,m}, for suitable sets of indexes D, N, W_d and M_n.
We consider now the form (108), i.e.
⋀_{d∈D} ⋀_{w∈W_d} ⋀_{r∈R_{d,w}} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} ∀x⃗ ∃y⃗ ( P_d(i_d, x⃗, τ⁻_k, τ⁺_k) op_r Q_r(j_r, y⃗, τ⁻_g, τ⁺_g)[σ_{i_d}, σ_{j_r}] ),
where D, W, R, K, and G are finite sets of indexes and τ⁻_k, τ⁺_k (τ⁻_g, τ⁺_g) are the temporal variables mentioned in the ground situation σ_{i_d} (σ_{j_r}). We consider this formula in the following form:
⋀_{d∈D} ⋀_{w∈W_d} ⋀_{⟨r,k⟩∈RK_{d,w}} ⋁_{g∈G_{d,w,r}} B^{d,w,r}_{k,g},      (113)
with B^{d,w,r}_{k,g} representing ∀x⃗ ∃y⃗ P_d(i_d, x⃗, τ⁻_k, τ⁺_k) op_r Q_r(j_r, y⃗, τ⁻_g, τ⁺_g)[σ_{i_d}, σ_{j_r}]. By applying the CNF to DNF transformation we can pass through the following equivalent forms.
From (1) ⋀_{d∈D} ⋀_{⟨w,r⟩∈WR_d} ⋀_{k∈K_{d,w,r}} ⋁_{g∈G_{d,w,r}} B^{d,w,r}_{k,g} we get (2) ⋀_{d∈D} ⋀_{⟨w,r⟩∈WR_d} ⋁_{n=⟨n_1,n_2⟩∈N_{d,w}} ⋀_{m=⟨m_1,m_2⟩∈M_{d,w,n}} B^{d,w}_{n,m}, with B^{d,w}_{n,m} representing
∀x⃗ ∃y⃗ P_d(i_d, x⃗, τ⁻_{n_1,m_1}, τ⁺_{n_1,m_1}) op_{n_1,m_1} Q_{n_1,m_1}(j_{n_1,m_1}, y⃗, τ⁻_{n_2,m_2}, τ⁺_{n_2,m_2})[σ_{i_d}, σ_{j_{n_1,m_1}}],
which is equivalent to (3) ⋀_{d∈D} ⋁_{⟨w,r⟩∈WR_d, n∈N_{d,w}} ⋀_{m∈M_{d,w,n}} B^{d,w,r}_{n,m}, and the previous form can be expressed as (4) ⋀_{d∈D} ⋁_{⟨w,r,n⟩∈WRN_d} ⋀_{m∈M_{d,w,n}} B^{d,w,r}_{n,m}, where ⟨w, r, n⟩ ∈ WRN_d abbreviates ⟨w, r⟩ ∈ WR_d, n ∈ N_{d,w}.
By applying again the CNF to DNF transformation we get (5) ⋁_{z∈Z} ⋀_{s=⟨s_1,s_2,s_3⟩∈S_z} ⋀_{m∈M_{z,s,n}} B^{z,s}_m, with B^{z,s}_m representing
∀x⃗ ∃y⃗ P_{z,s_1}(i_{z,s_1}, x⃗, τ⁻_{s_2,m_1}, τ⁺_{s_2,m_1}) op_{s_3,m_1} Q_{s_3,m_1}(j_{s_3,m_1}, y⃗, τ⁻_{s_2,m_2}, τ⁺_{s_2,m_2})[σ_{i_{z,s_1}}, σ_{j_{s_1,m_1}}];
hence, if we denote ⋀_{s∈S_z} ⋀_{m∈M_{z,s,n}} by ⋀_{q=⟨q_1,q_2,q_3,q_4⟩∈Q_z}, we get (6) ⋁_{z∈Z} ⋀_{q∈Q_z} B^z_q.
Now, from (6), with B^z_q representing
∀x⃗ ∃y⃗ P_{z,q_1}(i_{z,q_1}, x⃗, τ⁻_{z,q_2}, τ⁺_{z,q_2}) op_{z,q_3} Q_{z,q_3}(j_{z,q_3}, y⃗, τ⁻_{z,q_4}, τ⁺_{z,q_4})[σ_{i_{z,q_1}}, σ_{j_{z,q_3}}],
we obtain the required form (32). This concludes the proof of the theorem. □
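The CNF-to-DNF step used above is the usual distribution of conjunctions over disjunctions: from a conjunction of clauses one obtains a disjunction of conjunctions, one conjunct for each way of choosing a disjunct per clause. A minimal illustrative sketch (literals are opaque objects):

from itertools import product
from typing import Iterable, List, Tuple

def cnf_to_dnf(clauses: Iterable[Iterable[object]]) -> List[Tuple[object, ...]]:
    # clauses is a conjunction of disjunctions; the result is a disjunction of
    # conjunctions, one conjunct per choice of a literal from each clause.
    return [choice for choice in product(*clauses)]

# Example: (a or b) and (c or d)  ->  (a and c) or (a and d) or (b and c) or (b and d)
# cnf_to_dnf([["a", "b"], ["c", "d"]])

The number of resulting disjuncts is the product of the clause sizes, which is why the index set Z in (32) is finite but possibly large.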
D.2 Proof of Theorem 7
First note that each symbol op is associated with the temporal relation γ_op(t_1⁻, t_1⁺, t_2⁻, t_2⁺), obtained as a combination of the relations = and < as specified in equation (31). Let M be a structure of L_TFSC such that M is a model of D_T and suppose that for some assignment v the following hold:
(i) M, v |= ∀x⃗ ∃y⃗ P(i, x⃗, τ⁻_p, τ⁺_p) op Q(j, y⃗, τ⁻_q, τ⁺_q)[σ_i, σ_j];
(ii) M, v |= ∃x⃗ Elapsed_P(i, x⃗, τ⁻_p, τ⁺_p, σ_i) or M, v |= ∃x⃗ Active_P(i, x⃗, τ⁻_p, σ_i).
We can prove the theorem by cases, for each op in {m, s, f, b, d}.
First of all we consider the case of m. Since
P(i, x⃗, t_i⁻, t_i⁺) m Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝ Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → [ ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁺ = t_j⁻) ],
we have that, for any model M |= D_T and assignment v such that items (i) and (ii) hold,
M, v |= Elapsed_P(i, x⃗, τ⁻_p, τ⁺_p, σ_i) → [ ( Active_Q(j, y⃗, τ⁻_q, σ_j) ∨ Elapsed_Q(j, y⃗, τ⁻_q, τ⁺_q, σ_j) ) ∧ (τ⁺_p = τ⁻_q) ],      (114)
and
M, v |= ( Active_Q(j, y⃗, τ⁻_q, σ_j) ∨ Elapsed_Q(j, y⃗, τ⁻_q, τ⁺_q, σ_j) ) ∧ (τ⁺_p = τ⁻_q).
Thus, by (i) and (ii), we get M, v |= (τ⁺_p = τ⁻_q). Let [d_p⁻, d_p⁺, d_q⁻, d_q⁺] be an assignment to the variables according to v (or an interpretation of the ground terms); then, since M, v |= (τ⁺_p = τ⁻_q), by equation (114) above it follows that the algebraic relation γ_m(d_p⁻, d_p⁺, d_q⁻, d_q⁺) holds in M as well.
The proof for b and f is analogous.
For s, we have the following macro expansion:
P(i, x⃗, t_i⁻, t_i⁺) s Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝
( Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁻ = t_j⁻) ) ∧
( Active_P(i, x⃗, t_i⁻, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_i⁻ = t_j⁻) ),
and for any model M |= D_T and assignment v such that (i) holds,
M, v |= Active_P(i, x⃗, τ⁻_p, σ_i) → [ ( Active_Q(j, y⃗, τ⁻_q, σ_j) ∨ Elapsed_Q(j, y⃗, τ⁻_q, τ⁺_q, σ_j) ) ∧ (τ⁻_p = τ⁻_q) ],
and
M, v |= Elapsed_P(i, x⃗, τ⁻_p, τ⁺_p, σ_i) → [ ( Active_Q(j, y⃗, τ⁻_q, σ_j) ∨ Elapsed_Q(j, y⃗, τ⁻_q, τ⁺_q, σ_j) ) ∧ (τ⁻_p = τ⁻_q) ].
By (ii), either M, v |= Active_P(i, x⃗, τ⁻_p, σ_i) or M, v |= Elapsed_P(i, x⃗, τ⁻_p, τ⁺_p, σ_i); in both cases M, v |= (τ⁻_p = τ⁻_q). Hence, given the assignment [d_p⁻, d_p⁺, d_q⁻, d_q⁺] to the time variables, the relation γ_s(d_p⁻, d_p⁺, d_q⁻, d_q⁺) holds in M.
For d, we are given the following form:
P(i, x⃗, t_i⁻, t_i⁺) d Q(j, y⃗, t_j⁻, t_j⁺)[σ_i, σ_j] ≝
( Elapsed_P(i, x⃗, t_i⁻, t_i⁺, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_j⁻ ≤ t_i⁻ ∧ t_i⁺ ≤ t_j⁺) ) ∧
( Active_P(i, x⃗, t_i⁻, σ_i) → ( Active_Q(j, y⃗, t_j⁻, σ_j) ∨ Elapsed_Q(j, y⃗, t_j⁻, t_j⁺, σ_j) ) ∧ (t_j⁻ ≤ t_i⁻) ).
By (i) and (ii), either M, v |= Elapsed_P(i, x⃗, τ⁻_p, τ⁺_p, σ_i), and then M, v |= (τ⁻_q ≤ τ⁻_p ∧ τ⁺_p ≤ τ⁺_q), or M, v |= Active_P(i, x⃗, τ⁻_p, σ_i), and then M, v |= (τ⁻_q ≤ τ⁻_p). Also in this case, given the assignment v mapping the time variables to [d_p⁻, d_p⁺, d_q⁻, d_q⁺], either d_q⁻ ≤ d_p⁻ ∧ d_p⁺ ≤ d_q⁺ or d_q⁻ ≤ d_p⁻ holds; the case for ground terms is analogous. Thus, either the relation γ_d(d_p⁻, d_p⁺, d_q⁻, d_q⁺) holds or its relaxed version γ′_d(d_p⁻, d_q⁻) holds. □
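Theorem 7 shows that, once the ground time points are fixed, each qualitative operator reduces to an algebraic relation γ_op over the interval endpoints, as in (31). The following sketch is illustrative only and covers just the operators {m, b, s, f, d} treated above, stating the endpoint relations that the macro expansions impose; γ′_d is the relaxed constraint used when only Active_P is available.

def gamma(op: str, p_minus: float, p_plus: float, q_minus: float, q_plus: float) -> bool:
    # Endpoint relations for P op Q, as combinations of '=' and '<' (cf. (31)).
    if op == "m":   # meets: P ends exactly when Q starts
        return p_plus == q_minus
    if op == "b":   # before: P ends strictly before Q starts
        return p_plus < q_minus
    if op == "s":   # starts: P and Q start together
        return p_minus == q_minus
    if op == "f":   # finishes: P and Q end together
        return p_plus == q_plus
    if op == "d":   # during: Q covers P
        return q_minus <= p_minus and p_plus <= q_plus
    raise ValueError("unknown operator: " + op)

def gamma_d_relaxed(p_minus: float, q_minus: float) -> bool:
    # Relaxed 'during' constraint when P is still active (its end point is not yet known).
    return q_minus <= p_minus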
D.3 Proof of Corollary 2
It is a consequence of Theorem 6, Theorem 7, and of the network construction. Indeed, by Theorem 6, we have that I(T_c, B[ω]) can be reduced to the form (32):
⋁_{z∈Z} ⋀_{⟨q_1,q_2,q_3,q_4⟩∈J_z} ∀x⃗ ∃y⃗. P_{z,q_1}(i_{z,q_1}, x⃗, τ⁻_{z,q_2}, τ⁺_{z,q_2}) op_{z,q_3} Q_{z,q_3}(j_{z,q_3}, y⃗, τ⁻_{z,q_4}, τ⁺_{z,q_4})[σ_{i_{z,q_1}}, σ_{j_{z,q_3}}].
From this, following the network construction steps (a)-(d), we can build the temporal constraint network ζ:
⋁_{z∈Z} ⋀_{(q_1,q_2,q_3,q_4)∈J_z} μ^{q_1}_{op_{z,q_3}}(τ⁻_{z,q_2}, τ⁺_{z,q_2}, τ⁻_{z,q_4}, τ⁺_{z,q_4}).
We have to show that, given a model M for D_T: (1) if M, v |= I(T_c, B[ω]), then the assignment v represents a solution for ζ(D_T, T_c, B[ω]); (2) given an assignment v which is a solution for ζ(D_T, T_c, B[ω]), then M, v |= I(T_c, B[ω]).
(1) If M, v |= I(T_c, B[ω]), then, given the form (32), there exists at least one z ∈ Z such that M, v |= ⋀_{⟨q_1,q_2,q_3,q_4⟩∈J_z} ∀x⃗ ∃y⃗. P_{z,q_1}(·) op_{z,q_3} Q_{z,q_3}(·). Therefore, for each associated conjunct, indexed by ⟨q_1, q_2, q_3, q_4⟩ ∈ J_z, M, v |= ∀x⃗ ∃y⃗. P_{z,q_1}(·) op_{z,q_3} Q_{z,q_3}(·).
If (i) and (ii) of Theorem 7 are satisfied, then we can apply Theorem 7, which ensures that μ^{q_1}_{op_{z,q_3}} (obtained from m2 if E_{P_{z,q_1}} holds or from m3 if A_{P_{z,q_1}} holds) is satisfied by the assignment v. In the remaining case, since (i) and (ii) are not satisfied, NP holds; thus μ^{q_1}_{op_{z,q_3}} is trivially satisfied because it does not impose any constraint on the associated variables. This concludes the proof for (1).
(2) As for the other direction, we have to show that, given a model M of D_T and an assignment solution V for the temporal network ζ(D_T, T_c, B[ω]), the assignment v that agrees with V on the time variables ω is such that M, v |= I(T_c, B[ω]). Analogously to the previous case, since ζ is a disjunction of conjunctions, if ζ is satisfiable then there exists at least one disjunct ⋀_{⟨q_1,q_2,q_3,q_4⟩∈J_z} μ^{q_1}_{op_{z,q_3}}(·), indexed by z ∈ Z, for which V is an assignment solution.
Now, given one of the conjuncts μ^{q_1}_{op_{z,q_3}}(·), by step (c) of the ζ construction there exists an associated conjunct ∀x⃗ ∃y⃗. P_{z,q_1}(·) op_{z,q_3} Q_{z,q_3}(·) in (32) that is also partially consistent. We show that, given the assignment v restricted to the time variables ω that agrees with V on these, M, v |= ∀x⃗ ∃y⃗. P_{z,q_1}(·) op_{z,q_3} Q_{z,q_3}(·) for each op_{z,q_3} ∈ {m, f, s, d, …}.
For each of these cases, P_{z,q_1}(·) op_{z,q_3} Q_{z,q_3}(·) can be reduced to the following form (A): ⋁ ( E_P(x⃗, τ⃗, ·) → ( Q_{E_P}(y⃗, τ⃗, ·) ∧ μ_op(τ⃗) ) ), where the E_P(x⃗, τ⃗, ·) are mutually exclusive in the disjunction (i.e. given an assignment for x⃗, τ⃗ and ·, at least one μ_op is enabled). We may assume, by contradiction, that v is such that M, v ⊭ ∀x⃗ ∃y⃗. P_{z,q_1}(x⃗, τ⃗) op_{z,q_3} Q_{z,q_3}(y⃗, τ⃗), hence M, v ⊭ ∀x⃗ ∃y⃗. ⋁ ( E_P(x⃗, τ⃗, ·) → ( Q_{E_P}(y⃗, τ⃗, ·) ∧ μ_op(τ⃗) ) ). However, by FOL, from this we get that M, v ⊭ ∀x⃗. ⋁ ( E_P(x⃗, τ⃗, ·) → ( ∃y⃗ Q_{E_P}(y⃗, τ⃗, ·) ∧ μ_op(τ⃗) ) ). From this it follows that there exists v* extending v with an assignment for x⃗ such that M, v* ⊭ ⋁ ( E_P(x⃗, τ⃗, ·) → ( ∃y⃗ Q_{E_P}(y⃗, τ⃗, ·) ∧ μ_op(τ⃗) ) ), hence, for each disjunct, M, v* ⊭ E_P(x⃗, τ⃗, ·) → ( ∃y⃗ Q_{E_P}(y⃗, τ⃗, ·) ∧ μ_op(τ⃗) ). At this point, for each disjunct, we have two possible cases: (B) M, v* ⊭ E_P(x⃗, τ⃗, ·) → ∃y⃗ Q_{E_P}(y⃗, τ⃗, ·), or (C) M, v* ⊭ E_P(x⃗, τ⃗, ·) → μ_op(τ⃗). However, by the partial consistency assumption, (B) is contradicted in at least one disjunct. On the other hand, (C) requires that M, v* |= E_P(x⃗, τ⃗, ·) and (D) M, v* ⊭ μ_op(τ⃗). But (D) contradicts the assumption: by assumption, v* restricted to the temporal variables agrees with V, which solves the algebraic relation represented by μ_op(τ⃗).
This concludes the proof for (2). □
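Operationally, Corollary 2 licenses replacing the entailment check on I(T_c, B[ω]) by a check on the disjunctive temporal network ζ: an assignment to the time variables ω is a solution iff it satisfies every endpoint constraint of at least one disjunct. The sketch below is an illustration of that check only; the flat encoding of the network is a hypothetical choice, not the construction steps (a)-(d) themselves, and gamma is the endpoint relation sketched earlier.

from typing import Callable, Dict, List, Tuple

# A constraint is an operator name plus the four time variables it mentions,
# e.g. ("m", "p-", "p+", "q-", "q+").
Constraint = Tuple[str, str, str, str, str]

def satisfies(network: List[List[Constraint]],
              assignment: Dict[str, float],
              gamma: Callable[..., bool]) -> bool:
    # network: disjunction (outer list, one entry per z) of conjunctions
    # (inner lists) of endpoint constraints.
    for disjunct in network:
        if all(gamma(op, assignment[a], assignment[b], assignment[c], assignment[d])
               for (op, a, b, c, d) in disjunct):
            return True
    return False

# Example: one disjunct requiring P meets Q and Q during R.
# net = [[("m", "p-", "p+", "q-", "q+"), ("d", "q-", "q+", "r-", "r+")]]
# v = {"p-": 0.0, "p+": 2.0, "q-": 2.0, "q+": 4.0, "r-": 1.0, "r+": 5.0}
# satisfies(net, v, gamma)   # -> True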
E Proofs for Section 7
E.1 Proof of Proposition 3
We have to show that, given a model M of D_T, if
M |= DoTF(prog, B_init, B, (h_s, h_e)),
and assuming that ttime(B) ≤ h_e and h_s ≤ ttime(B_init), then
M |= B_init ⊑_S B.
We shall show the statement by induction on the structure of prog.
The base case is given by the primitive action prog = a. In this case, we have to show that if
M |= DoTF(a, B_init, B, (h_s, h_e)),
then B_init ⊑_S B. However, by definition, either M |= B = ddo(a, s, B_init) or M |= B_init = B; in both cases the statement holds for the base case.
Now assume, by induction, that the statement holds for DoTF(prog, B_init, B, (h_s, h_e)); we show the following constructs.
1. Consider the program sequence DoTF(prog1; prog2, B_init, B, (h_s, h_e)). By definition and the inductive hypothesis we have that there exists B″ such that B_init ⊑_S B″ and B″ ⊑_S B, hence, by transitivity, B_init ⊑_S B.
2. Consider the partial-order action choice DoTF(prog1 ≺ prog2, B_init, B, (h_s, h_e)). By definition and the assumptions,
M |= ∃B″, B‴ ( DoTF(prog1, B_init, B″, (h_s, h_e)) ∧ DoTF(prog2, B‴, B, (h_s, h_e)) ∧ B″ ⊑_S B‴ ).      (115)
Thus, there are two bags of timelines B″, B‴ such that B_init ⊑_S B″ and B‴ ⊑_S B by the inductive hypothesis, and B″ ⊑_S B‴ by definition; therefore, by transitivity of ⊑_S, we obtain that B_init ⊑_S B.
3. Consider the nondeterministic action choice M |= DoTF(prog1 | prog2, B_init, B, (h_s, h_e)). By definition, M |= DoTF(prog1, B_init, B, (h_s, h_e)) ∨ DoTF(prog2, B_init, B, (h_s, h_e)); hence, by the inductive hypothesis, B_init ⊑_S B.
4. Consider the nondeterministic iteration M |= DoTF(prog*, B_init, B, (h_s, h_e)). By the inductive hypothesis we have that, if M |= DoTF(prog, B′, B″, (h_s, h_e)), then B′ ⊑_S B″; hence, by transitive closure, B_init ⊑_S B.
The other cases follow by analogous reasoning. □
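The induction in E.1 rests on a single observation: every DoTF construct either leaves the bag of timelines unchanged or extends some timeline in it with further actions, so the initial bag is always ⊑_S-below the resulting one. The sketch below illustrates only that structural shape on a drastically simplified interpreter, in which a timeline is a tuple of action names and a bag is a set of timelines; the temporal side conditions of DoTF (Poss, time, the horizon (h_s, h_e)) are omitted, so this is not the semantics of Section 7.

from typing import FrozenSet, Tuple, Union

Timeline = Tuple[str, ...]
Bag = FrozenSet[Timeline]

def do_primitive(a: str, bag: Bag) -> Bag:
    # Extend every timeline with action a; the initial timelines remain
    # covered by their extensions, mimicking B_init below the result.
    return frozenset(tl + (a,) for tl in bag)

def do_tf(prog: Union[str, tuple], bag: Bag) -> Bag:
    kind = prog[0] if isinstance(prog, tuple) else "act"
    if kind == "act":                                   # primitive action
        return do_primitive(prog, bag)
    if kind == "seq":                                   # prog1 ; prog2
        return do_tf(prog[2], do_tf(prog[1], bag))
    if kind == "choice":                                # prog1 | prog2 (collect both branches)
        return do_tf(prog[1], bag) | do_tf(prog[2], bag)
    if kind == "star":                                  # prog* (zero or one unfolding, for illustration)
        return bag | do_tf(prog[1], bag)
    raise ValueError("unknown construct")

# Example: sequencing two actions extends every timeline by both of them.
# b0: Bag = frozenset({("a0",)})
# do_tf(("seq", "a1", "a2"), b0)   # -> {("a0", "a1", "a2")}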
E.2 Proof of Proposition 4
Analogously to the previous proof, we show the statement by induction on the structure of prog. The base of the induction is the primitive-action case; the other constructs are proved using the inductive hypothesis.
1. Primitive action. If prog = a, the executability is a direct consequence of the executability of a in B_init. Indeed, in any model M of D_T, if DoTF(a, B_init, B′, (h_s, h_e)), ttime(B_init) ≥ h_s and ttime(B′) ≤ h_e hold, then, by the definition of DoTF and FOL, we have that ∃s ( s ∈_S B_init ∧ a =_ν s ∧ Poss(a, s) ∧ time(s) ≥ h_s ∧ time(s) ≤ h_e ∧ time(s) ≥ time(a) ∧ time(a) ≤ h_e ∧ B′ = ddo(a, s, B_init) ). Hence, if σ ∈_S B′ holds, either σ ∈_S B_init or σ = do(a, σ′) with σ′ ∈_S B_init. In both cases, by assumption and the definition of DoTF, executable(σ) obtains.
2. Program sequence. If DoTF(prog1; prog2, B_init, B, (h_s, h_e)) holds, we can assume by induction that there exists B″ such that the property holds for (1) DoTF(prog1, B_init, B″, (h_s, h_e)) and (2) DoTF(prog2, B″, B, (h_s, h_e)). By the assumption and (1) we can conclude that any σ ∈_S B″ is executable. Now, since any σ ∈_S B″ is executable, we can apply the inductive hypothesis to (2) and conclude that if σ ∈_S B then σ is executable.
3. Partial-order action choice. Analogously to the previous case, if DoTF(prog1 ≺ prog2, B_init, B, (h_s, h_e)) holds, there exist B″ and B′ such that (1) DoTF(prog1, B_init, B″, (h_s, h_e)) and (2) DoTF(prog2, B′, B, (h_s, h_e)), with (3) exec(B″, B′, (h_s, h_e)). Now, from (1) and (3), by the inductive hypothesis and the definition of exec, any σ ∈_S B′ is executable. Therefore, by (2) and the inductive hypothesis, we can also conclude that if σ ∈_S B then σ is executable.
4. Nondeterministic iteration. By the inductive hypothesis, if DoTF(prog, B′, B″, (h_s, h_e)) holds and every σ ∈_S B′ is executable, then every σ′ ∈_S B″ is executable. Analogously to Proof E.1, the statement can be proved by transitive closure.
5. The cases of test, nondeterministic choice of actions, and nondeterministic choice of argument are straightforward. □
E.3 Proof of Proposition 5
The proof is a direct consequence of Corollary 2.
By FOL we have that, for any model M of D_T and for any assignment v to ω, M, v |= DoTF(prog, B_init, B[ω], (h_s, h_e)) ∧ I(T_c, B[ω]) iff M, v |= DoTF(prog, B_init, B[ω], (h_s, h_e)) and M, v |= I(T_c, B[ω]).
However, by Corollary 2, given a model M of D_T, for any assignment v to the free variables ω of B[ω], M, v |= I(T_c, B[ω]) iff v is an assignment solution for network(D_T, T_c, B[ω]).
Hence, by FOL and Corollary 2, for any model M of D_T and for any assignment v to ω, M, v |= DoTF(prog, B_init, B[ω], (h_s, h_e)) ∧ I(T_c, B[ω]) iff M, v |= DoTF(prog, B_init, B[ω], (h_s, h_e)) and v is a solution for network(D_T, T_c, B[ω]). □
E.4 Proof of Proposition 6
Given D_{STSC}, we can build an associated D_T denoting a single component. We introduce a constant v representing a unique state variable. Each action a_{st}(·) and fluent P_{st}(·) used in D_{STSC} is also introduced in D_T. For each start_{p_{st}}(·) and end_{p_{st}}(·) in D_{STSC} we introduce an action start_{st}(v, ·) and end_{st}(v, ·) in D_T. The action preconditions associated with start_{st}(v, ·) and end_{st}(v, ·) are the ones in D_{STSC}. In this case the processes are not needed: the preconditions D_{ap} and the successor state axioms D_{ss} coincide with D^{st}_{ap} and D^{st}_{ss} respectively, and D_{S_0} coincides with D^{st}_{S_0}. At this point, D_T is defined by D_{ss} ∪ D_{ap} ∪ D_{S_0} ∪ A⁺ ∪ D_{una}. Note that, since the durative actions in D_{STSC} are not processes in D_T, D_π is left empty.
We shall show the statement by induction on the structure of prog_{st}.
For the base step we consider the primitive action. We can show that (1) D_{STSC} |= Do(a, s, s′) iff (2) D_T |= DoTF(a, B, B′, (0, ∞)). If we consider the horizon (0, ∞), (2) can be reduced to
D_T |= ∃s ( s ∈_S B ∧ a =_ν s ∧ Poss(a, s) ∧ time(s) ≥ time(a) ∧ B′ = ddo(a, s, B) ).
On the other hand, (1) holds iff
D_{STSC} |= Poss(a, s) ∧ start(s) ≤ time(a) ∧ s′ = do(a, s).
Since D_T extends D_{STSC}, it is easy to see that (2) entails (1). As for the other direction, B is composed of a single situation; therefore, for any situation that satisfies (1) it is possible to build a bag of situations that satisfies (2).
By induction on the structure of the program, it is easy to show that the property holds for the other standard Golog constructs. □