Euclidean Path Modeling from Ground and Aerial Views*
Imran N. Junejo and Hassan Foroosh
{ijunejo, foroosh}@cs.ucf.edu
University of Central Florida, Orlando, FL 32816
Abstract
We address the issue of Euclidean path modeling in a single camera for activity monitoring in a multi-camera video surveillance system. The paper proposes a novel linear solution to auto-calibrate any camera observing pedestrians and uses these calibrated cameras to detect unusual object behavior. The input trajectories are metric rectified, the input sequences are registered to the satellite imagery, and prototype path models are constructed. During the testing phase, using our simple yet efficient similarity measures, we seek a relation between the input trajectories derived from a sequence and the prototype path models. Real-world pedestrian sequences are used to demonstrate the practicality of the proposed method.
1. Introduction
In path modeling and surveillance, we aim to build a system that, once given an acceptable set of trajectories of objects in a scene, is able to build a path model. Our goal is to learn the routes or paths most commonly taken by objects as they traverse a scene. Once we have a model for the scene, the method should be able to classify incoming trajectories as conforming to our model or not. Moreover, as common pathways are detected by clustering the trajectories, we can efficiently assign each detected trajectory its associated path model, thereby storing only the path label and the object labels instead of the whole trajectory set, resulting in a significant compression of stored surveillance data. One very important motivation for this system is building a topology of the scene.
After successfully detecting and tracking the observed pedestrians using Javed et al. [3], the first step is the removal of projective distortion from the object trajectories, thus upgrading them to Euclidean trajectories. This is only possible once the camera is calibrated, and we do this by observing the moving pedestrians. Original work on camera calibration using vanishing points started from the seminal paper by Caprile and Torre [1]. Liebowitz et al. [6] developed a method to compute the camera intrinsics by using the Cholesky decomposition [2]. Recently, Lv et al. [7] and Mendonça [5] proposed methods for auto-calibration by observing pedestrians.

*The support of the National Science Foundation (NSF) through Grant #IIS-0644280 is gratefully acknowledged.
Once the object trajectories are metric rectified, a path model for the scene is constructed. The most related work is that of Makris and Ellis [8], who develop a spatial model to represent the routes in an image. Once a trajectory of a moving object is obtained, it is matched with routes already existing in a database using a simple distance measure. If a match is found, the existing route is updated by a weight update function; otherwise a new route is created for this new trajectory with some initial weight. The system cannot distinguish between a person walking and a person lingering around, or between a running and a walking person. There exists no stopping criterion for the merging of routes.
In this paper, we present a novel linear method to metric rectify object trajectories by observing pedestrians in a scene (Section 2). A novel application of Normalized cuts is used to cluster the rectified trajectories into distinct paths (Section 3). Once the trajectories are clustered, we extract meaningful features to build our model and check the conformity of a test trajectory to our built path model (Section 4). We rigorously test the proposed method on real and synthetic data and the results are encouraging (Section 6). We also demonstrate an application of the proposed method for registration to the satellite imagery (Section 7).
2. Training Phase
During the training phase, our goal is to first calibrate the camera. Once the camera is calibrated, the extracted object trajectories are metric rectified. Second, we cluster the input trajectories and build a model based on our features (Section 3), which are then used to test the incoming trajectories.
1-4244-1180-7/07/$25.00 ©2007 IEEE
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 30, 2009 at 10:50 from IEEE Xplore. Restrictions apply.
Figure 1. (Left): A pedestrian detected at two different time instances provides the vertical vanishing point ($v_z$) and another vanishing point ($v'$) lying on the horizon line of the ground plane. (Right): Tracking pedestrians over any two frames provides two harmonic homologies.
Figure 2. Rectified trajectories: (b) represents the reconstructed trajectories for Seq #2, shown in (a), while (d) represents Seq #3, shown in (c), rectified.
2.1. Camera Calibration from Pedestrians
As an object or a pedestrian of height h traverses the
ground plane, each location on this plane corresponds to
exactly one location on the head plane. Without loss of
generality, for a simple case of two frames, as shown in
Fig. 1, this correspondence can be mapped by a homography. In fact, this special type of homography is referred to
as a homology [2]. We refer to this homology as the vertical homology: $H_v = I - \mu_v \frac{v_z\, l^T}{v_z^T l}$, where $v_z$ and $l$ are, respectively, the apex and axis of the homology. Another important
geometric relation, so far ignored in the existing literature on camera calibration from pedestrians, is the homology existing between different locations of a pedestrian (cf. Fig. 1).
The line joining the head locations ($t_1$ and $t_2$) intersects the line joining the feet locations ($b_1$ and $b_2$) at a point $v'$ on the line at infinity ($l_\infty$), forming another homology, which we refer to as the horizontal homology: $H_h = I - \mu_h \frac{v'\, l^T}{v'^T l}$, where $l_1$, $l_2$, $v'$ and $v_z$ are as depicted in Fig. 1. A homology has five degrees of freedom, but this homology can be replaced by harmonic homologies. A harmonic homology is a special type of homology such that $\mu_v = \mu_h = -1$ [2]. This reduces the degrees of freedom to four, i.e., only two point correspondences are needed to determine this harmonic homology. Also, a harmonic cross-ratio exists, given as:

$\mathrm{Cross}(v_z, t_1, q_1, b_1) = -1$

Similarly, $\mathrm{Cross}(v', t_2, q_2, t_1) = -1$ for $H_h$. Thus, in the absence of noise, only two frames are required to calibrate the camera using these cross-ratios.
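The geometric construction above can be sketched with homogeneous cross products: the body axes of one pedestrian in two frames meet at the vertical vanishing point, and the head line meets the feet line on the horizon. The head/feet coordinates below are hypothetical:

```python
import numpy as np

def line(p, q):
    """Homogeneous line through two 2-D points (cross product)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines, normalized to (x, y, 1)."""
    p = np.cross(l1, l2)
    return p / p[2]  # assumes a finite intersection point

# Hypothetical head (t) and feet (b) locations of one pedestrian in two frames.
t1, b1 = (100.0, 50.0), (105.0, 200.0)
t2, b2 = (180.0, 60.0), (170.0, 210.0)

# Vertical vanishing point v_z: intersection of the two body axes.
v_z = intersect(line(t1, b1), line(t2, b2))

# v': the head line (t1-t2) meets the feet line (b1-b2) on the horizon.
v_prime = intersect(line(t1, t2), line(b1, b2))
```

With more pedestrian sightings, each pair of frames contributes such a constraint, yielding the over-determined system used later for the synthetic experiments.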
Under this decomposition, $C^{*\prime}_\infty = (H_P H_A)\, C^*_\infty\, (H_P H_A)^T$, where $H_P$ and $H_A$ are, respectively, the projective and affine components of the projective transformation. Once $C^{*\prime}_\infty$ is identified, a suitable rectifying homography is obtained by using the SVD:

$C^{*\prime}_\infty = U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} U^T$

where $U$ is the rectifying projectivity. Fig. 2 depicts some results obtained by rectifying the training trajectories from one of our test sequences.
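The SVD-based recovery of the rectifying projectivity can be sketched as follows. Here $C^{*\prime}_\infty$ is synthesized from a made-up projectivity `H` purely for illustration, rather than estimated from pedestrian observations:

```python
import numpy as np

# A made-up projectivity H, used only to synthesize a plausible C*_inf'.
H = np.array([[1.2,  0.1,   5.0],
              [0.05, 0.9,  -3.0],
              [1e-4, 2e-4,  1.0]])
C_inf = np.diag([1.0, 1.0, 0.0])   # dual conic of the circular points (metric frame)
C_img = H @ C_inf @ H.T            # its image under the projectivity (symmetric, rank 2)

# SVD of the symmetric rank-2 conic: C_img = U diag(s1, s2, 0) U^T.
# Absorbing sqrt(s) into U gives a rectifying projectivity up to a similarity.
U, S, _ = np.linalg.svd(C_img)
rect = U @ np.diag([np.sqrt(S[0]), np.sqrt(S[1]), 1.0])

# Applying rect^{-1} restores the canonical dual conic diag(1, 1, 0),
# i.e. removes the projective and affine distortion.
check = np.linalg.inv(rect) @ C_img @ np.linalg.inv(rect).T
```

Transforming image points by `inv(rect)` then yields coordinates that differ from the Euclidean ones only by a similarity.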
From here on, all references to 2-D image coordinates and trajectories imply rectified 2-D image coordinates and rectified trajectories, respectively.
3. Path Model Construction
A typical scene consists of a single camera mounted on a wall or on a tripod looking at a certain location. For any object $i$ tracked through $n$ frames, the 2-D image coordinates for the obtained trajectory can be given as $T_i = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. We track the feet of the objects for more precise measurement. The trajectories obtained through the tracker are sometimes very noisy; therefore, trajectory smoothing is performed.
Trajectory Clustering: Clustering has to be based on some kind of similarity criterion. We choose the Hausdorff distance as our similarity measure. For two trajectories $T_i$ and $T_j$, the Hausdorff distance $D(T_i, T_j)$ is defined as $D(T_i, T_j) = \max\{d(T_i, T_j),\, d(T_j, T_i)\}$, where $d(T_i, T_j) = \max_{a \in T_i} \min_{b \in T_j} \|a - b\|$ is the directed Hausdorff distance.
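A minimal sketch of this measure, assuming trajectories are stored as $n \times 2$ arrays of rectified image points (the two example trajectories are made up):

```python
import numpy as np

def directed_hausdorff(A, B):
    """max over points of A of the distance to the nearest point of B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance between trajectories A (n x 2) and B (m x 2)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Two short, hypothetical trajectories of different lengths.
T1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
T2 = np.array([[0.0, 1.0], [2.0, 1.0]])
print(hausdorff(T1, T2))  # prints 1.4142135623730951
```

Note that the two trajectories need not have the same number of points, which is the property exploited below.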
2.2. Rectifying Trajectories
The line at infinity $l_\infty$ intersects $\omega$ at two complex conjugate ideal points $I$ and $J$, called the circular points [2]. The conic dual to the circular points is given as $C^*_\infty = I J^T + J I^T$, where $C^*_\infty$ is a degenerate conic consisting of the two circular points. $C^*_\infty$ is invariant under similarity transformations and, under a general point transformation $H$, transforms as $C^{*\prime}_\infty = H C^*_\infty H^T$.
One advantage of using the Hausdorff distance is that it allows us to compare two trajectories of different lengths. In order to cluster trajectories into different paths, we formulate a complete graph where the weight of each edge is determined by the Hausdorff distance between the two trajectories. Spatially proximal trajectories will have small weights because of the lesser Hausdorff distance, and vice
Figure 3. A typical scene where an object is traversing an existing path. An average trajectory and an envelope boundary are calculated for each set of clustered trajectories.
versa. The constructed complete graph needs to be partitioned; each partition, having one or more trajectories, corresponds to a unique path. To perform such a partition accurately and automatically, Normalized cuts [9] are used recursively to partition the graph. One advantage of this is that Normalized cuts avoid the bias toward partitioning out small sets of points. This technique makes it possible to perform recursive cuts by using special properties of the eigenvectors. Fig. 8(a)-(d) shows the results obtained by clustering one of our data sets.
3.1. Envelope & Mean Path Construction
We now create, for all trajectories in each clustered path, an envelope and a mean path representation. An envelope can be defined as the spatial extent of a path (cf. Fig. 3(a)). Applying the Dynamic Time Warping (DTW) algorithm [4], where the columns represent trajectory A and the rows represent trajectory B, the pair-wise correspondences between the two trajectories are determined. Using DTW, the distance at each instance is given by a measure combining spatial and spatio-temporal curvature similarity.
model. All points on the candidate trajectory are compared to the envelope of the path model. The result of this process is a binary vector with 1 when a trajectory point is inside the envelope and 0 (zero) when the point is outside the envelope. This information is used to make a final decision for a candidate trajectory, along with the spatio-temporal curvature measure. If all candidate trajectory points are outside the envelope, then this is an outright rejection.
Motion Characteristics: The second step is essential to discriminate between trajectories of varying motion characteristics. A trajectory whose velocity is similar to the velocity characteristics of an existing route is considered similar. The velocity for a trajectory $T_i(x_i, y_i, t_i)$, $i = 0, 1, \ldots, N-1$, is calculated as:

$v_i = \frac{\sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}}{t_{i+1} - t_i}$
The mean and the standard deviation of the motion characteristics of the training trajectories are computed. A Gaussian distribution is fitted to model the velocities of the trajectories in the path model. The Mahalanobis distance measure is used to decide if the test trajectory is anomalous:

$(v_i - m_i)^T \Sigma^{-1} (v_i - m_i) < c_p$

where $v_i$ is the velocity from the test trajectory, $m_i$ is the mean, $c_p$ a distance threshold, and $\Sigma$ is the covariance matrix of our path velocity distribution.
Spatio-Temporal Curvature Similarity: The third step allows us to capture discontinuities in the velocity, acceleration and position of our trajectory. Thus we are able to discriminate between a person walking in a straight line and a person walking in an errant path. The velocity $v'$ and the acceleration $v''$, the first derivative of the velocity, are used to calculate the curvature of the trajectory. Curvature is defined as:

$\kappa = \frac{|x' y'' - y' x''|}{(x'^2 + y'^2)^{3/2}}$
where the distance measure is $q(i,j) = e^{-d_{ij}/\sigma_s} + e^{-|c_i - c_j|/\sigma_c}$, $d_{ij}$ represents the Euclidean distance, $\sigma_c$ represents the standard deviation in spatio-temporal curvature, and $\sigma_s$ represents a suitable standard deviation parameter for the trajectory (in pixels). This distance measure finds correspondences between trajectories based on the spatial as well as the spatio-temporal curvature similarity.
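Since the per-instance measure is only partially legible in this transcription, the sketch below uses the classic DTW recursion with a plain Euclidean per-point cost; swapping in a curvature-weighted cost function would follow the same recursion:

```python
import numpy as np

def dtw(A, B, cost):
    """Dynamic time warping between trajectories A (n pts) and B (m pts)
    under a per-pair cost function; returns the accumulated alignment cost."""
    n, m = len(A), len(B)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three admissible moves.
            acc[i, j] = cost(A[i - 1], B[j - 1]) + min(acc[i - 1, j],
                                                       acc[i, j - 1],
                                                       acc[i - 1, j - 1])
    return acc[n, m]

euclid = lambda a, b: np.linalg.norm(a - b)   # simple spatial cost, for illustration
A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
B = np.array([[0.0, 0.1], [2.0, 0.1]])
```

The backtrace through `acc` yields the pair-wise correspondences used to build the envelope and mean path.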
4. Feature Extraction and Test Phase
A path model is developed that distinguishes between trajectories that are (a) spatially unlike, (b) spatially proximal but of different speeds, or (c) spatially proximal but crooked. Once the path models are learned as described above, we extract more features from the trajectories in each path in order to verify the conformity of a candidate test trajectory.
Spatial Proximity: To verify spatial similarity, membership of the test trajectory is verified with the developed path
where $x'$ and $y'$ are the $x$ and $y$ components of the velocity. The mean and standard deviation of $\kappa$'s are determined to fit a Gaussian distribution for the spatio-temporal characteristic. We compare the curvature of the test trajectory with our distribution using the Mahalanobis distance, bounded by a threshold. By using this measure we are able to detect irregular motion, for example, a drunkard walking in a zig-zag path, or a person slowing down and making a U-turn.
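The curvature computation can be sketched with finite differences; the straight and zig-zag sample trajectories are made up to illustrate the discrimination the text describes:

```python
import numpy as np

def curvature(x, y):
    """Curvature k = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2), with the
    derivatives approximated by finite differences along the trajectory."""
    xp, yp = np.gradient(x), np.gradient(y)
    xpp, ypp = np.gradient(xp), np.gradient(yp)
    return np.abs(xp * ypp - yp * xpp) / (xp ** 2 + yp ** 2) ** 1.5

# A straight walk has (near-)zero curvature everywhere; a zig-zag does not.
t = np.linspace(0.0, 10.0, 200)
straight = curvature(t, 2.0 * t)
zigzag = curvature(t, np.sin(2.0 * t))
```

Fitting a Gaussian to the per-point curvatures of a path's training trajectories then supports the same Mahalanobis-style test as the velocity feature.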
In summary, we initially detect non-conforming trajectories on the basis of spatial dissimilarity. In case the given trajectory is spatially similar to one of the path models, the similarity in the velocity feature between the trajectories in that path and the given trajectory is computed. If the motion features are also similar, then a final check on the spatio-temporal curvature is made. The trajectory is deemed to be anomalous if it fails to satisfy any one of the spatial, velocity or spatio-temporal curvature constraints.
The length of the trajectories varies from 250 points to almost 800 points. The trajectories clustered into paths are
shown in Fig. 8.
6.1. Evaluating auto-calibration
Figure 4. Topological Map: This figure demonstrates the topological maps associated with two test sequences. Entry/exit points are denoted by a circle, overlapping regions of the trajectories are represented by a square, and junctions of paths are denoted by a triangle.
5. Scene Topology
One of the desired tasks is to draw a topological map of an entire scene, i.e., a map giving an overview of the scene. During the training phase, after determining paths by clustering, we interpolate a mean/average trajectory for each entire path, which is then used for generating the map. The entry and exit points of each average trajectory are represented by a node (a circle). The overlapping regions of the average trajectories are combined and represented by a square (cf. Fig. 4), and regions where more than one trajectory intersects form a junction, represented by a triangle. The entry/exit points (circles), overlapping regions (squares) and junctions (triangles) close to each other are merged together. We set this region distance to be 30 pixels (depending on the noise in the data) for our test sequences. Based on the constructed topological map, entry and exit probabilities associated with each node are calculated as:
$P^n_{entry} = \frac{\#E_n}{\#E}, \qquad P^m_{exit} = \frac{\#X_m}{\#X}, \qquad P_{n,m} = \frac{\#(E_n \cap X_m)}{\#E_n}$

where $\#E$ is the total number of people entering a scene and $\#X$ is the total number of people exiting a scene (equal to $\#E$ in most cases). $P^n_{entry}$ is the probability of an entry from a node $n$, while $P^m_{exit}$ is the probability of exit. $P_{n,m}$ is defined as the probability of an exit from node $m$ while entering from node $n$.
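These frequencies can be sketched directly from counts, assuming each trajectory has already been assigned an entry and an exit node; the node labels and transits below are made up:

```python
from collections import Counter

# Hypothetical (entry_node, exit_node) pairs observed from tracked trajectories.
transits = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("B", "C"), ("B", "A")]

entries = Counter(e for e, _ in transits)   # #E_n per node
exits = Counter(x for _, x in transits)     # #X_m per node
pairs = Counter(transits)                   # #(E_n and X_m) per node pair
total = len(transits)                       # #E (= #X here)

p_entry = {n: c / total for n, c in entries.items()}           # P_entry^n
p_exit = {m: c / total for m, c in exits.items()}              # P_exit^m
p_cond = {(n, m): c / entries[n] for (n, m), c in pairs.items()}  # P_{n,m}
```

Attaching these probabilities to the circle nodes of the topological map summarizes where objects typically appear, disappear, and transit.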
6. Results
The proposed system has been tested on two 320 x 240 pixel resolution sequences containing a variety of motion trajectories:
Seq #1: This is a short sequence of 3730 frames with 15
different trajectories forming two unique paths. The clustered trajectories are shown in Fig. 7. Trajectories obtained
for the training sequence are depicted in Fig. 7(a)(b)(c),
representing different behavior of the pedestrians.
Seq #2: A real sequence of 9284 frames with 27 different trajectories forming 3 different paths after clustering.
Synthetic data: Ten vertical lines of the same height but random locations are generated to represent pedestrians in our synthetic data. We gradually add Gaussian noise with $\mu = 0$ and $\sigma \le 1.5$ pixels to the data-points making up the vertical lines. $H_h$ and $H_v$ are estimated by taking two vertical lines at a time. The over-determined system of equations is constructed after the vanishing points are estimated. We perform 1000 independent trials for each noise level; the results are shown in Fig. 5. The relative error in $f$ increases almost linearly with respect to the noise level. For a maximum noise of 1.5 pixels, we found that the error was under 7%. The absolute error in the rotation angles also increases linearly and is well under 1 degree, as shown in Fig. 5(b).
Real Data: As reported by [10], the mean of the estimated focal length is taken as the ground truth and the standard deviation as a measure of uncertainty in the results. As shown in Fig. 6, different pedestrians are chosen for auto-calibration. Using the method described above, the focal length is determined using the robust TLS method. The results for this sequence are given in Table 1 (left column). The standard deviation is low and the estimated focal length is $f = 2307.93 \pm 44.12$. Seq #2 is another sequence used for testing; a couple of instances are shown in Fig. 6(f)-(g). The estimated focal lengths are very close to each other, as shown in Table 1 (right column, top).
The error in the results can be attributed to many factors. One of the main reasons is that only a few frames are used per sequence. The standard deviation for $f$ in all our experiments is found to be less than that reported by [5].
6.2. Evaluating path modeling
One test case from Seq #1 is shown in Fig. 7(d). The training sequence only contained people walking in the scene, but the bicyclist shown in (d) has motion characteristics different from the training cases (containing faster movement), and is hence detected as abnormal behavior (displayed in red).
Figure 5. Auto-calibration method vs. noise level in pixels.
Figure 6. (a)-(i), from left to right: The figure depicts instances of the data sets used for testing the proposed auto-calibration method. The estimated head and foot locations are marked with circles. Different frames are super-imposed on the background image to better visualize the test data.
Figure 9. Image rectification and registration results for Seq #2 and Seq #1. See text for more details.

Table 1. The recovered focal length for (starting from the left column, going in clockwise direction) Seq #1 and Seq #2.
Seq #1 — Fig. 6a: f = 2362.48; Fig. 6c: f = 2287.68; Fig. 6d: f = 2295.54; Fig. 6e: f = 2252.24
Seq #2 — Fig. 6f: f = 2040.06

7. Registration to Aerial Imagery
Figure 7. (b)(c) show the three clustered paths for Seq #1, while (a) shows all the trajectories in the training phase. (d) demonstrates a test case where a bicyclist crosses the scene at a velocity greater than that of the pedestrians observed during the training phase.
Figure 8. All the trajectories in the training set and the subsequent clusters are shown in (a)-(d). (e)-(g) show instances of a drunkard walking, a person running, and a person walking, respectively. Red trajectories denote unusual behavior while the black trajectories denote usual behavior.
Three test cases from Seq #2 are depicted in Fig. 8(e)-(g). A person walking in a zig-zag fashion (Fig. 8(e)) and a person running (Fig. 8(f)) are flagged for an activity that is considered unusual behavior. Fig. 8(g) demonstrates a case where a person walks at a normal pace, in conforming behavior.
The system gives satisfactory results for our experiments. Although some existing methods do incorporate model update, we believe this is what leads to a model drift. That is, after a number of updates the model can become general enough to accommodate any behavior, considering it as acceptable behavior. But certainly, the applicability of our proposed system lies in spheres where there is a defined behavior, differentiable from certain other unacceptable behavior for, let us say, security reasons.
Registration to the satellite imagery gives a global view of the scene under observation. It is known that under projective imaging, a plane is mapped to the image plane by a perspective transformation. Since we do not have knowledge of the Euclidean world coordinates of (at least) four points and we want to make the process automatic, the estimated affine and perspective transforms can be combined to efficiently metric rectify the video sequence such that the only unknown transformation is the similarity transformation.
The results obtained by rectifying the test sequences are shown in Fig. 9. A frame from the test sequence Seq #2 is rectified by using the line at infinity, which is obtained as $l_\infty = \omega v_z$. The obtained circular points are used to construct the conic $C^{*\prime}_\infty$ in order to obtain the rectifying projectivity, as described in Section 2.2. The rectified image is shown in Fig. 9(a), and the registered image is shown in Fig. 9(b). Similar computations for Seq #1 produce the rectified images shown in 9(c) and 9(d), respectively. Due to some newly constructed structures, the satellite imagery for Seq #2 and Seq #1 is considerably different from the test sequences, hence the images are not registered.
The proposed method is automated and provides satisfactory results. Thus, for any test sequence, the obtained path model can be mapped to the corresponding satellite image in order to obtain a global view representing the behavior of pedestrians in that particular area.
8. Conclusion
A unified method for path modeling, detection and surveillance is proposed in this paper. We propose a novel linear method for auto-calibrating any camera that may be involved. After calibration, the trajectory data is metric rectified to represent a truer picture of the data. The metric rectified observed scene is registered to the aerial view to extract metric
information from the video sequence, for example, the actual speed of an object. We extract spatial, velocity and spatio-temporal curvature based features from the clustered paths and use them for unusual behavior detection. The calibration method and the path modeling method have been extensively tested on a number of sequences and have demonstrated satisfactory results.
References
[1] B. Caprile and V. Torre. Using vanishing points for camera calibration. Int. J. Comput. Vision, 4(2):127-140, 1990.
[2] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition, 2004.
[3] O. Javed and M. Shah. Tracking and object classification for automated surveillance. In the Seventh European Conference on Computer Vision, 2002.
[4] E. Keogh. Exact indexing of dynamic time warping. In 28th International Conference on Very Large Data Bases, Hong Kong, pages 406-417, 2002.
[5] N. Krahnstoever and P. R. S. Mendonca. Bayesian autocalibration for surveillance. In Tenth IEEE International Conference on Computer Vision, 2005.
[6] D. Liebowitz and A. Zisserman. Combining scene and auto-calibration constraints. In Proc. IEEE ICCV, pages 293-300, 1999.
[7] F. Lv, T. Zhao, and R. Nevatia. Self-calibration of a camera from video of a walking human. In IEEE International Conference of Pattern Recognition, 2002.
[8] D. Makris and T. Ellis. Path detection in video surveillance. Image and Vision Computing Journal (IVC), 20(12):895-903, 2002.
[9] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2000.
[10] Z. Zhang. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1330-1334, 2000.