Euclidean Path Modeling from Ground and Aerial Views
Euclidean Path Modeling from Ground and Aerial Views*

Imran N. Junejo and Hassan Foroosh
{ijunejo | foroosh}@cs.ucf.edu
University of Central Florida, Orlando, FL 32816

Abstract

We address the issue of Euclidean path modeling in a single camera for activity monitoring in a multi-camera video surveillance system. The paper proposes a novel linear solution to auto-calibrate any camera observing pedestrians, and uses these calibrated cameras to detect unusual object behavior. The input trajectories are metric rectified, the input sequences are registered to the satellite imagery, and prototype path models are constructed. During the testing phase, using our simple yet efficient similarity measures, we seek a relation between the input trajectories derived from a sequence and the prototype path models. Real-world pedestrian sequences are used to demonstrate the practicality of the proposed method.

1. Introduction

In path modeling and surveillance, we aim to build a system that, once given an acceptable set of trajectories of objects in a scene, is able to build a path model. Our goal is to learn the routes or paths most commonly taken by objects as they traverse a scene. Once we have a model for the scene, the method should be able to classify incoming trajectories as conforming to our model or not. Moreover, as common pathways are detected by clustering the trajectories, we can efficiently assign each detected trajectory its associated path model, thereby storing only the path label and the object labels instead of the whole trajectory set, resulting in a significant compression of stored surveillance data. One very important motivation for this system is building a topology of the scene.

After successfully detecting and tracking the observed pedestrians using Javed et al. [3], the first step is the removal of projective distortion from the object trajectories, thus upgrading them to Euclidean trajectories.
*The support of the National Science Foundation (NSF) through the Grant # IIS-0644280 is gratefully acknowledged.

This is only possible once the camera is calibrated, and we do this by observing the moving pedestrians. Original work on camera calibration using vanishing points started from the seminal paper by Caprile and Torre [1]. Liebowitz et al. [6] developed a method to compute the camera intrinsics by using the Cholesky decomposition [2]. Recently, Lv et al. [7] and Krahnstoever and Mendonca [5] proposed methods for auto-calibration by observing pedestrians.

Once the object trajectories are metric rectified, a path model for the scene is constructed. The most related work is that of Makris and Ellis [8], where they develop a spatial model to represent the routes in an image. Once a trajectory of a moving object is obtained, it is matched with routes already existing in a database using a simple distance measure. If a match is found, the existing route is updated by a weight update function; otherwise a new route is created for this new trajectory with some initial weight. The system cannot distinguish between a person walking and a person lingering around, or between a running and a walking person. There exists no stopping criterion for the merging of routes.

In this paper, we present a novel linear method to metric rectify object trajectories by observing pedestrians in a scene (Section 2). A novel application of Normalized Cuts is used to cluster the rectified trajectories into distinct paths (Section 3). Once the trajectories are clustered, we extract meaningful features to build our model and check the conformity of a test trajectory to our built path model (Section 4). We rigorously test the proposed method on real and synthetic data and the results are encouraging (Section 6). We also demonstrate an application of the proposed method for registration to the satellite imagery (Section 7).

2. Training Phase

During the training phase, our goal is to first calibrate the camera.
Once the camera is calibrated, the extracted object trajectories are metric rectified. Second, we cluster the input trajectories and build a model based on our features (Section 3), which is then used to test the incoming trajectories.

1-4244-1180-7/07/$25.00 ©2007 IEEE
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 30, 2009 at 10:50 from IEEE Xplore. Restrictions apply.

Figure 1. (Left): A pedestrian detected at two different time instances provides the vertical vanishing point (v_z) and another vanishing point (v') lying on the horizon line of the ground plane. (Right): Tracking pedestrians over any two frames provides two harmonic homologies.

Figure 2. Rectified Trajectories: (b) represents the reconstructed trajectories for Seq #2, shown in (a), while (d) represents Seq #3, shown in (c), rectified.

2.1. Camera Calibration from Pedestrians

As an object or a pedestrian of height h traverses the ground plane, each location on this plane corresponds to exactly one location on the head plane. Without loss of generality, for a simple case of two frames, as shown in Fig. 1, this correspondence can be mapped by a homography. In fact, this special type of homography is referred to as a homology [2]. We refer to this homology as the vertical homology: H_v = I - mu_v (v_z l^T)/(v_z^T l), where v_z and l are, respectively, the apex and the axis of the homology. Another important geometric relation, so far ignored in the existing literature on camera calibration from pedestrians, is the homology existing between different locations of a pedestrian (cf. Fig. 1). The line joining the head locations (t_1 and t_2) intersects the line joining the feet locations (b_1 and b_2) at a point v' on the line at infinity (l_inf), forming another homology, which we refer to as the horizontal homology: H_h = I - mu_h (v' a^T)/(v'^T a), where the apex v' and the axis a, together with l_1, l_2 and v_z, are as depicted in Fig. 1. A homology has five degrees of freedom, but these homologies can be replaced by harmonic homologies.
Harmonic homology is a special type of homology such that mu_v = mu_h = -1 [2]. This reduces the degrees of freedom to four, i.e., only two point correspondences are needed to determine this harmonic homology. Also, a harmonic cross-ratio exists, given as Cross(v_z, t_1, p_1, b_1) = (v_z p_1 / t_1 p_1)/(v_z b_1 / t_1 b_1) = -1. Similarly, Cross(v', t_2, q_2, t_1) = -1 for H_h. Thus, in the absence of noise, only two frames are required to calibrate the camera using these cross-ratios.

C*_inf' = (H_P H_A) C*_inf (H_P H_A)^T, where H_P and H_A are, respectively, the projective and affine components of the projective transformation. Once C*_inf' is identified, a suitable rectifying homography is obtained by using the SVD decomposition: C*_inf' = U diag(1, 1, 0) U^T, where U is the rectifying projectivity. Fig. 2 depicts some results obtained by rectifying the training trajectories from one of our test sequences. From here on, all references to 2-D image coordinates and trajectories imply rectified 2-D image coordinates and rectified trajectories, respectively.

3. Path Model Construction

A typical scene consists of a single camera mounted on a wall or on a tripod looking at a certain location. For any object i tracked through n frames, the 2-D image coordinates for the trajectory obtained can be given as T_i = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. We track the feet of the objects for more precise measurement. The trajectories obtained through the tracker are sometimes very noisy; therefore, trajectory smoothing is performed.

Trajectory Clustering: Clustering has to be based on some kind of similarity criterion. We choose the Hausdorff distance as our similarity measure. For two trajectories T_i and T_j, the Hausdorff distance D(T_i, T_j) is defined as D(T_i, T_j) = max{d(T_i, T_j), d(T_j, T_i)}, where d(T_i, T_j) is the directed Hausdorff distance, i.e., the maximum over points of T_i of the distance to the nearest point of T_j.

2.2. Rectifying Trajectories

The line at infinity l_inf intersects w at two complex conjugate ideal points I and J, called the circular points [2].
The conic dual to the circular points is given as C*_inf = I J^T + J I^T, where C*_inf is a degenerate conic consisting of the two circular points. Under a point transformation, C*_inf, which is invariant under similarity transformations, transforms as C*_inf' = H C*_inf H^T.

One advantage of using the Hausdorff distance is that it allows us to compare two trajectories of different lengths. In order to cluster trajectories into different paths, we formulate a complete graph where the weight of each edge is determined by the Hausdorff distance between the two trajectories. Spatially proximal trajectories will have small weights because of the lesser Hausdorff distance, and vice versa. The constructed complete graph needs to be partitioned; each partition, having one or more trajectories, corresponds to a unique path. To perform such a partition accurately and automatically, Normalized Cuts [9] are used recursively to partition the graph. One of the advantages of this is that Normalized Cuts avoid a bias toward partitioning out small sets of points. This technique makes it possible to perform recursive cuts by using special properties of the eigenvectors. Fig. 8(a)-(d) shows the results obtained by clustering one of our data sets.

Figure 3. A typical scene where an object is traversing an existing path. An average trajectory and an envelope boundary are calculated for each set of clustered trajectories.

3.1. Envelope & Mean Path Construction

Now we create, for all trajectories in each clustered path, an envelope and a mean path representation. An envelope can be defined as the spatial extent of a path (cf. Fig. 3(a)). Applying the Dynamic Time Warping (DTW) algorithm [4], where the columns represent trajectory A and the rows represent trajectory B, pair-wise correspondences between the two trajectories are determined. Using DTW, the distance at each instance is given by a measure combining spatial proximity and spatio-temporal curvature (described below).
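As an illustration of the clustering step of Section 3, the following is a minimal sketch: the symmetric Hausdorff distance between two trajectories, and a single Normalized-Cuts-style bipartition read off the second eigenvector of the normalized graph Laplacian. The Gaussian similarity, the bandwidth `sigma`, and the eigenvector-sign split are illustrative assumptions; the paper specifies only that Normalized Cuts [9] is applied recursively to the Hausdorff-weighted complete graph.

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between trajectories A (n x 2) and B (m x 2)."""
    # all pairwise Euclidean distances between points of A and points of B
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    dAB = D.min(axis=1).max()  # directed distance A -> B
    dBA = D.min(axis=0).max()  # directed distance B -> A
    return max(dAB, dBA)

def ncut_bipartition(trajs, sigma=20.0):
    """One Ncut-style bipartition of a trajectory set.

    Edge weights are Gaussian similarities of Hausdorff distances;
    sigma (pixels) is an assumed bandwidth, not a value from the paper.
    """
    n = len(trajs)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = np.exp(-hausdorff(trajs[i], trajs[j]) ** 2 / (2.0 * sigma ** 2))
            W[i, j] = W[j, i] = w
    d = W.sum(axis=1)
    # second-smallest eigenvector of the normalized Laplacian approximates the cut
    L = np.diag(d) - W
    Dinv = np.diag(1.0 / np.sqrt(d))
    vals, vecs = np.linalg.eigh(Dinv @ L @ Dinv)
    fiedler = vecs[:, 1]
    return fiedler >= 0.0  # boolean cluster labels
```

Recursive application of `ncut_bipartition` to each resulting group, with a stopping threshold on the cut cost, would mirror the recursive partitioning described above.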
All points on the candidate trajectory are compared to the envelope of the path model. The result of this process is a binary vector with 1 when a trajectory point is inside the envelope and 0 (zero) when the point is outside the envelope. This information is used to make a final decision for a candidate trajectory, along with the spatio-temporal curvature measure. If all candidate trajectory points are outside the envelope, then this is an outright rejection.

Motion Characteristics: The second step is essential to discriminate between trajectories of varying motion characteristics. A trajectory whose velocity is similar to the velocity characteristics of an existing route is considered similar. The velocity for a trajectory T_i(x_i, y_i, t_i), i = 0, 1, ..., N-1, is calculated as v_i = sqrt((x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2) / (t_{i+1} - t_i). The mean and the standard deviation of the motion characteristics of the training trajectories are computed. A Gaussian distribution is fitted to model the velocities of the trajectories in the path model. The Mahalanobis distance measure is used to decide if the test trajectory is anomalous: (v_i - m_i)^T S^{-1} (v_i - m_i) < c_p, where v_i is the velocity from the test trajectory, m_i is the mean, c_p a distance threshold, and S is the covariance matrix of our path velocity distribution.

Spatio-Temporal Curvature Similarity: The third step allows us to capture discontinuities in the velocity, acceleration, and position of a trajectory. Thus we are able to discriminate between a person walking in a straight line and a person walking in an errant path. The velocity v' and the acceleration v'', the first derivative of the velocity, are used to calculate the curvature of the trajectory. Curvature is defined as k = |v' x v''| / |v'|^3, treating the trajectory as the space-time curve (x(t), y(t), t), where x' and y' are the x and y components of the velocity. The DTW distance measure d(i, j) combines the Euclidean distance between corresponding trajectory points with their difference in spatio-temporal curvature, where sigma_k represents the standard deviation in spatio-temporal curvature, and sigma_s represents a suitable standard deviation parameter for the trajectory (in pixels).
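The spatio-temporal curvature above can be sketched numerically. Treating the trajectory as the space-time curve r(t) = (x(t), y(t), t) with unit time steps, k = |r' x r''| / |r'|^3; the finite-difference scheme (np.gradient) is an illustrative choice, not the paper's.

```python
import numpy as np

def spatiotemporal_curvature(x, y):
    """Curvature of the space-time curve (x(t), y(t), t), assuming unit time steps.

    k = |r' x r''| / |r'|^3 with r = (x, y, t), so r' = (x', y', 1), r'' = (x'', y'', 0).
    """
    x1, y1 = np.gradient(x), np.gradient(y)    # velocity components x', y'
    x2, y2 = np.gradient(x1), np.gradient(y1)  # acceleration components x'', y''
    # |r' x r''| with t' = 1 and t'' = 0
    num = np.sqrt(x2 ** 2 + y2 ** 2 + (x1 * y2 - y1 * x2) ** 2)
    den = (x1 ** 2 + y1 ** 2 + 1.0) ** 1.5     # |r'|^3
    return num / den
```

A person walking a straight line at constant speed yields near-zero curvature everywhere, while a zig-zag walk produces pronounced curvature peaks, which is exactly the discontinuity signal the third step relies on.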
This distance measure finds correspondences between trajectories based on spatial as well as spatio-temporal curvature similarity.

4. Feature Extraction and Test Phase

A path model is developed that distinguishes between trajectories that are (a) spatially unlike, (b) spatially proximal but of different speeds, or (c) spatially proximal but crooked. Once the path models are learned as described above, we extract more features from the trajectories in each path in order to verify the conformity of a candidate test trajectory.

Spatial Proximity: To verify spatial similarity, membership of the test trajectory is verified against the developed path model. The mean and standard deviation of the curvatures k are determined to fit a Gaussian distribution for the spatio-temporal characteristic. We compare the curvature of the test trajectory with this distribution using the Mahalanobis distance, bounded by a threshold. By using this measure we are able to detect irregular motion, for example a drunkard walking in a zigzag path, or a person slowing down and making a U-turn.

In summary, we initially detect non-conforming trajectories on the basis of spatial dissimilarity. In case the given trajectory is spatially similar to one of the path models, the similarity between the velocity feature of the trajectories in that path and that of the given trajectory is computed. If the motion features are also similar, then a final check on spatio-temporal curvature is made. The trajectory is deemed anomalous if it fails to satisfy any one of the spatial, velocity, or spatio-temporal curvature constraints.

The length of the trajectories varies from 250 points to almost 800 points. The trajectories clustered into paths are shown in Fig. 8.

6.1. Evaluating Auto-Calibration
Figure 4. Topological Map: This figure demonstrates the topological map associated with two test sequences. Entry/exit points are denoted by a circle, overlapping regions of the trajectories are represented by a square, and junctions of paths are denoted by a triangle.

5. Scene Topology

One of the desired tasks is to build a topological map of an entire scene, i.e., a map giving an overview of the scene. During the training phase, after determining paths by clustering, we interpolate a mean/average trajectory for each path, which is then used for generating the map. The entry and exit points of each average trajectory are represented by a node (a circle). The overlapping regions of the average trajectories are combined and represented by a square (cf. Fig. 4), and regions where more than one trajectory intersects form a junction, represented by a triangle. The entry/exit points (circles), overlapping regions (squares), and junctions (triangles) close to each other are merged together. We set this merging distance to 30 pixels (depending on the noise in the data) for our test sequences. Based on the constructed topological map, entry and exit probabilities associated with each node are calculated as:

P_entry(n) = #E_n / #E,  P_exit(m) = #X_m / #X,  P_entry,exit(n, m) = #(E_n and X_m) / #E_n,

where #E is the total number of people entering the scene and #X is the total number of people exiting the scene (equal to #E in most cases). P_entry(n) is the probability of an entry from node n, P_exit(m) is the probability of an exit from node m, and P_entry,exit(n, m) is the probability of an exit from node m when entering from node n.

6. Results

The proposed system has been tested on two sequences of 320 x 240 pixel resolution containing a variety of motion trajectories:

Seq #1: This is a short sequence of 3730 frames with 15 different trajectories forming two unique paths. The clustered trajectories are shown in Fig. 7. Trajectories obtained for the training sequence are depicted in Fig. 7(a)(b)(c), representing different behaviors of the pedestrians.

Seq #2: A real sequence of 9284 frames with 27 different trajectories forming 3 different paths after clustering.

Synthetic data: Ten vertical lines of the same height but random location are generated to represent pedestrians in our synthetic data. We gradually add Gaussian noise with mu = 0 and sigma <= 1.5 pixels to the data points making up the vertical lines. H_h and H_v are estimated by taking two vertical lines at a time. The over-determined system of equations is constructed after the vanishing points are estimated. We perform 1000 independent trials for each noise level; the results are shown in Fig. 5. The relative error in f increases almost linearly with respect to the noise level. For a maximum noise of 1.5 pixels, we found that the error was under 7%. The absolute errors in the rotation angles also increase linearly and remain well under 1 degree, as shown in Fig. 5(b).

Real Data: As reported by [10], the mean of the estimated focal length is taken as the ground truth and the standard deviation as a measure of uncertainty in the results. As shown in Fig. 6, different pedestrians are chosen for auto-calibration. Using the method described above, the focal length is determined using the robust TLS method. The results for this sequence are given in Table 1 (left column). The standard deviation is low and the estimated focal length is f = 2307.93 +/- 44.12. Seq #2 is another sequence used for testing; a couple of instances are shown in Fig. 6(f)(g). The estimated focal lengths are very close to each other, as shown in Table 1 (right column, top). The error in the results can be attributed to many factors; one of the main reasons is that only a few frames are used per sequence. The standard deviation for f in all our experiments is found to be less than that reported by [5].

6.2. Evaluating Path Modeling

One test case from Seq #1 is shown in Fig. 7(d). The training sequence only contained people walking in the scene.
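The entry/exit probabilities of Section 5 reduce to simple counting over observed (entry node, exit node) pairs. A minimal sketch follows; the node labels and the `transits` input structure are hypothetical, introduced only for illustration.

```python
from collections import Counter

def topology_probabilities(transits):
    """Entry/exit statistics from observed (entry_node, exit_node) pairs.

    P_entry(n)       = #E_n / #E   (fraction of entries at node n)
    P_exit(m)        = #X_m / #X   (fraction of exits at node m)
    P_entry,exit(n,m) = #(E_n and X_m) / #E_n  (exit at m given entry at n)
    """
    total = len(transits)
    entries = Counter(n for n, _ in transits)   # #E_n per entry node
    exits = Counter(m for _, m in transits)     # #X_m per exit node
    pairs = Counter(transits)                   # joint (n, m) counts
    p_entry = {n: c / total for n, c in entries.items()}
    p_exit = {m: c / total for m, c in exits.items()}
    p_cond = {(n, m): c / entries[n] for (n, m), c in pairs.items()}
    return p_entry, p_exit, p_cond
```

Note that #E = #X = len(transits) here, matching the paper's remark that the number of entries and exits are equal in most cases.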
But the bicyclist shown in (d) has motion characteristics different from the training cases (containing faster movement), and hence is detected as abnormal behavior (displayed in red).

Figure 5. Auto-calibration method vs. noise level in pixels.

Figure 6. (a)-(i), from left to right: The figure depicts instances of the data sets used for testing the proposed auto-calibration method. The estimated head and foot locations are marked with circles. Different frames are super-imposed on the background image to better visualize the test data.

Figure 9. Image rectification and registration results for Seq #2 and Seq #1. See text for more details.

Table 1. The recovered focal lengths for (starting from the left column, going in clockwise direction) Seq #1 and Seq #2.
  Seq #1: Fig. 6a, f = 2362.48; Fig. 6c, f = 2295.54; Fig. 6d, f = 2252.24; Fig. 6e.
  Seq #2: Fig. 6f, f = 2287.68; f = 2040.06.

Figure 7. (a) shows all the trajectories in the training phase, while (b)(c) show the clustered paths for Seq #1. (d) demonstrates a test case where a bicyclist crosses the scene at a velocity greater than that of the pedestrians observed during the training phase.

Figure 8. All the trajectories in the training set and the subsequent clusters are shown in (a)-(d). (e)-(g) show instances of a drunkard walking, a person running, and a person walking, respectively. Red trajectories denote unusual behavior while the black trajectories denote usual behavior.

Three test cases from Seq #2 are depicted in Fig. 8(e)-(g). A person walking in a zig-zag fashion (Fig. 8(e)) and a person running (Fig. 8(f)) are flagged for an activity that is considered unusual behavior. Fig. 8(g) demonstrates a case where a person walks at a normal pace, in conforming behavior. The system gives satisfactory results for our experiments. Although some existing methods do incorporate model updates, we believe this is what leads to model drift; that is, after a number of updates the model can become general enough to accommodate any behavior, considering it as acceptable behavior. But certainly, the applicability of our proposed system lies in the spheres where there is a defined behavior, differentiable from certain other unacceptable behavior for, let's say, security reasons.

7. Registration to Aerial Imagery

Registration to the satellite imagery gives a global view of the scene under observation. It is known that under projective imaging, a plane is mapped to the image plane by a perspective transformation. Since we do not have knowledge of the Euclidean world coordinates of (at least) four points, and we want to make the process automatic, the estimated affine and perspective transforms can be combined to efficiently metric rectify the video sequence such that the only unknown transformation is a similarity transformation. The results obtained by rectifying the test sequences are shown in Fig. 9. A frame from the test sequence Seq #2 is rectified by using the line at infinity, which is obtained as l_inf = w v_z. The obtained circular points are used to construct the conic C*_inf in order to obtain the rectifying projectivity, as described in Section 2.2. The rectified image is shown in Fig. 9(a), and the registered image is shown in Fig. 9(b). Similar computations for Seq #1 produce the rectified and registered images shown in Fig. 9(c) and 9(d), respectively. Due to some newly constructed structures, the satellite imagery for Seq #2 and Seq #1 is considerably different from the test sequences; hence the images are not registered exactly. The proposed method is automated and provides satisfactory results.
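The rectification machinery of Sections 2.2 and 7 can be sketched as follows: build C*_inf' = I J^T + J I^T from an imaged circular point, factor it as U diag(1,1,0) U^T, and apply inv(U) to image points. This is a sketch under two stated assumptions: an eigendecomposition replaces the SVD (the two coincide for this symmetric positive-semidefinite conic), and the convention that inv(U) performs the rectification follows Hartley and Zisserman [2].

```python
import numpy as np

def rectifying_projectivity(ci):
    """From an imaged circular point ci (complex 3-vector; its conjugate is the
    other one), build C*_inf' = I J^T + J I^T and factor it as U diag(1,1,0) U^T.
    Points are metric-rectified (up to a similarity) by applying inv(U)."""
    I, J = ci, np.conj(ci)
    C = np.real(np.outer(I, J) + np.outer(J, I))  # real symmetric rank-2 dual conic
    # eigendecomposition of the symmetric PSD conic plays the role of the SVD
    w, V = np.linalg.eigh(C)
    order = np.argsort(-np.abs(w))                # two nonzero eigenvalues first
    w, V = w[order], V[:, order]
    U = V @ np.diag([np.sqrt(abs(w[0])), np.sqrt(abs(w[1])), 1.0])
    return U

def rectify_points(U, pts):
    """Apply inv(U) to homogeneous 2-D points (3 x n) and dehomogenize."""
    q = np.linalg.inv(U) @ pts
    return q[:2] / q[2]
```

A quick sanity check is to distort the canonical circular point (1, i, 0) by a known projectivity and verify that inv(U) maps it back to a point satisfying the circular-point conditions x^2 + y^2 = 0 and z = 0.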
Thus for any test sequence, the obtained path model can be mapped to the corresponding satellite image in order to obtain a global view representing the behavior of pedestrians in that particular area.

8. Conclusion

A unified method for path modeling, detection, and surveillance is proposed in this paper. We propose a novel linear method for auto-calibrating any camera that may be involved. After calibration, the trajectory data is metric rectified to represent a truer picture of the data. The metric rectified observed scene is registered to the aerial view to extract metric information from the video sequence, for example, the actual speed of an object. We extract spatial, velocity, and spatio-temporal curvature based features from the clustered paths and use them for unusual behavior detection. The calibration method and the path modeling method have been extensively tested on a number of sequences and have demonstrated satisfactory results.

References

[1] B. Caprile and V. Torre. Using vanishing points for camera calibration. Int. J. Comput. Vision, 4(2):127-140, 1990.
[2] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.
[3] O. Javed and M. Shah. Tracking and object classification for automated surveillance. In Proc. Seventh European Conference on Computer Vision, 2002.
[4] E. Keogh. Exact indexing of dynamic time warping. In Proc. 28th International Conference on Very Large Data Bases, Hong Kong, pages 406-417, 2002.
[5] N. Krahnstoever and P. R. S. Mendonca. Bayesian autocalibration for surveillance. In Proc. Tenth IEEE International Conference on Computer Vision, 2005.
[6] D. Liebowitz and A. Zisserman. Combining scene and auto-calibration constraints. In Proc. IEEE ICCV, pages 293-300, 1999.
[7] F. Lv, T. Zhao, and R. Nevatia. Self-calibration of a camera from video of a walking human. In Proc. IEEE International Conference on Pattern Recognition, 2002.
[8] D. Makris and T. Ellis. Path detection in video surveillance. Image and Vision Computing Journal (IVC), 20(12):895-903, 2002.
[9] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2000.
[10] Z. Zhang. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1330-1334, 2000.