Multi-View People Surveillance Using 3D Information
Davide Baltieri, Roberto Vezzani, Rita Cucchiara
Ákos Utasi, Csaba Benedek, Tamás Szirányi
{davide.baltieri,roberto.vezzani,rita.cucchiara}@unimore.it
{utasi,bcsaba,sziranyi}@sztaki.hu
Motivation

Goal: localize and track people in the scene
Assumptions:
• Scene monitored by multiple cameras
• Cameras: calibrated + overlapping FOV

Work-flow of the System

People Detection

Extended version of [1]:
1. Multi-plane projection: to the ground plane P0 and to parallel planes Pz (sketched below)
2. Pixel-level feature extraction: from the projected foreground masks
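
The projection of a camera's foreground mask onto a reference plane can be expressed as a plane-induced homography. Below is a minimal sketch, assuming a calibrated 3×4 projection matrix P, a binary foreground mask, and an affine map from plane coordinates to an output grid; the helper names are illustrative and not taken from [1].

```python
import numpy as np
import cv2

def plane_homography(P, z0):
    """Homography mapping plane coordinates (X, Y, 1) at world height z = z0
    to image pixels, for a 3x4 projection matrix P (illustrative helper)."""
    return np.column_stack((P[:, 0], P[:, 1], z0 * P[:, 2] + P[:, 3]))

def project_mask_to_plane(fg_mask, P, z0, world_to_grid, grid_size):
    """Warp a binary foreground mask onto the plane z = z0, sampled on a
    metric grid; world_to_grid is a 3x3 matrix taking plane coordinates
    (in metres) to grid pixels."""
    H = plane_homography(P, z0)             # plane -> image
    M = world_to_grid @ np.linalg.inv(H)    # image -> grid
    return cv2.warpPerspective(fg_mask, M, grid_size, flags=cv2.INTER_NEAREST)
```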

Pixel-Level Features

Head feature:
f_h^i(p) = [ Area(A_h^i ∩ S_h^+(p)) − Area(A_h^i ∩ S_h^−(p)) ] / Area(S_h^+(p))
Closed leg feature:
f_cl^i(p) = [ Area(A_0^i ∩ S_cl^+(p)) − Area(A_0^i ∩ S_cl^−(p)) ] / Area(S_cl^+(p))
Open leg feature:
f_ol^i(p) = [ Area(A_0^i ∩ S_ol^+(p)) − Area(A_0^i ∩ S_ol^−(p)) ] / Area(S_ol^+(p))
Joint leg feature:
f_l^i(p) = max{ f_cl^i(p), f_ol^i(p) }
Dynamic range: truncate f_l^i(p) and f_h^i(p) to [0, f̂], then normalize by f̂
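
The features above are ratios of overlap areas between the projected foreground mask and positive/negative silhouette templates. A minimal sketch for one camera, assuming the mask and the two templates around position p are available as binary arrays of equal shape (names and the truncation constant are illustrative):

```python
import numpy as np

def head_feature(A_h, S_pos, S_neg, f_hat=1.0):
    """Sketch of f_h^i(p): overlap of the projected foreground mask A_h with the
    positive head template minus overlap with the negative one, normalized by
    the positive template area, then truncated to [0, f_hat] and rescaled."""
    pos = np.logical_and(A_h, S_pos).sum()
    neg = np.logical_and(A_h, S_neg).sum()
    f = (pos - neg) / max(int(S_pos.sum()), 1)
    return float(np.clip(f, 0.0, f_hat)) / f_hat
```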

Feature Fusion

Original method [1]: f(p, 168cm), using a fixed reference height of 168 cm
Extended method: f(p, h), using the person height h estimated at position p

f(p, h) = √( (1/N) Σ_{i=1..N} f_l^i(p) × (1/N) Σ_{i=1..N} f_h^i(p) ), where N is the number of cameras

[Figures on the poster: feature maps on planes P0 and P168cm; features of [1] vs. the new feature]
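
A sketch of this fusion, assuming the per-camera leg and head feature maps have already been computed on a common ground grid (the array layout is an assumption):

```python
import numpy as np

def fuse_features(f_leg, f_head):
    """f_leg, f_head: arrays of shape (N, H, W) with the per-camera leg and head
    feature maps on the ground grid.  Returns the fused map f(p, h): the square
    root of the product of the two camera-averaged maps, as in the formula above."""
    return np.sqrt(f_leg.mean(axis=0) * f_head.mean(axis=0))
```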

3-D Marked Point Process Model

Person object: cylinder u, with constant radius R
[Figure: cases z < h, z = h, z > h]
Optimal object configuration ω: minimize the energy

Φ_D(ω) = Σ_{u∈ω} J_D(u) + γ · Σ_{u,v∈ω, u∼v} I(u, v)

Data term: J_D(u) ∈ [−1, 1]
Prior term: I(u, v) ∈ [0, 1]

Optimization:
• Multiple Birth-and-Death Dynamics [2] (see the sketch below)
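
A heavily simplified sketch of a Multiple Birth-and-Death loop in the spirit of [2]: objects are repeatedly born where the data term supports them and killed with a probability driven by their energy, while the birth rate δ is cooled and the inverse temperature β is increased. The interaction (prior) term and the birth map are left out, and every name and constant below is illustrative, not the authors' implementation.

```python
import math
import random

def multiple_birth_and_death(J_D, candidates, n_iter=100,
                             delta0=1.0, beta0=1.0, cooling=0.96):
    """J_D(u) returns the data energy in [-1, 1] (negative = well supported);
    candidates is a list of possible cylinder positions/heights u."""
    config, delta, beta = [], delta0, beta0
    for _ in range(n_iter):
        # Birth: candidates with strong (negative) data energy are likely to spawn.
        for u in candidates:
            if random.random() < delta * max(0.0, -J_D(u)):
                config.append(u)
        # Death: objects with weak support (positive data energy) are likely to die.
        survivors = []
        for u in config:
            a = delta * math.exp(beta * J_D(u))
            if random.random() >= a / (1.0 + a):
                survivors.append(u)
        config = survivors
        delta *= cooling    # cool the birth rate
        beta /= cooling     # raise the inverse temperature
    return config
```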

Short-Term Tracking

Constant Velocity Kalman Filter (sketched below):
• Unmatched detection ⇒ create new object
• Detection assigned to object ⇒ update object state
• Unmatched track ⇒ Kalman prediction or delete
• Output: segmented/broken trajectories
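
A minimal constant-velocity Kalman filter on the ground plane, as a sketch of the prediction/update used above; the state layout, noise covariances, and time step are assumptions, and the detection-to-track assignment itself is not shown.

```python
import numpy as np

DT = 1.0                                   # time step (assumed)
F = np.array([[1, 0, DT, 0],               # constant-velocity transition,
              [0, 1, 0, DT],               # state = [x, y, vx, vy] on the ground plane
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # only the (x, y) detection is observed
Q = np.eye(4) * 1e-2                       # process noise (assumed)
R = np.eye(2) * 1e-1                       # measurement noise (assumed)

def predict(x, P):
    """Used both for matched tracks and to coast unmatched tracks."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the predicted state with an assigned detection z = [x, y]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P
```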

Long-Term Tracking

Extended version of [3]:
• Connects broken tracks by people re-identification
• 3D body model is placed and oriented:
  – Height: from the people detection
  – Orientation: from the last K positions (sketched below)
• Appearance features are extracted for matching body models
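
A tiny sketch of the orientation estimate, assuming the track stores ground-plane positions; the value of K and the displacement-based heading are assumptions.

```python
import numpy as np

def orientation_from_track(positions, K=5):
    """Heading angle (radians) of a track from the displacement over its
    last K ground-plane positions."""
    pts = np.asarray(positions[-K:], dtype=float)
    dx, dy = pts[-1] - pts[0]
    return float(np.arctan2(dy, dx))
```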
Feature Extraction
1. Project each vertex to the camera image
2. Initialize the vertex features (see the sketch after this list):
• Normal vector n_i: static, pre-computed
• Mean color c_i
• Local HSV histogram H_i
• Optical reliability: θ_i = n_i · p, i.e. front-viewed vertices are favoured
• Saliency s_i: uniqueness (e.g. logo)
3. Vertices outside the person silhouette:
• Copy features from the nearest vertex
• Use θ_i = 0
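
A sketch of two of the per-vertex features above (optical reliability and the local HSV histogram), assuming the vertex has already been projected to pixel uv and that a foreground mask is available; the window size, bin counts, and direction convention are illustrative choices.

```python
import numpy as np
import cv2

def vertex_features(img_bgr, fg_mask, uv, normal, view_dir, win=7, bins=(8, 8, 4)):
    """Optical reliability theta_i = n_i · p (view_dir: assumed unit vector from
    the vertex toward the camera) and a local HSV histogram around the projected
    vertex; returns (0.0, None) for vertices outside the silhouette so the caller
    can copy the nearest vertex, as described above."""
    u, v = uv
    if not fg_mask[v, u]:
        return 0.0, None
    theta = float(np.dot(normal, view_dir))
    v0, v1 = max(v - win, 0), v + win + 1
    u0, u1 = max(u - win, 0), u + win + 1
    hsv = cv2.cvtColor(img_bgr[v0:v1, u0:u1], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins), [0, 180, 0, 256, 0, 256])
    return theta, cv2.normalize(hist, None, norm_type=cv2.NORM_L1)
```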

People Re-Identification

Find correspondences between models (sketched below):
• Vertex-to-vertex Hellinger distance:
  d(v^p, v^t) = d_H(H^p, H^t) = √( 1 − Σ_{h,s,v} √( H^p(h,s,v) · H^t(h,s,v) ) )
• Model-to-model distance:
  D(Γ^p, Γ^t) = [ Σ_{i=1..M} w_i · d(v_i^p, v_i^t) ] / [ Σ_{i=1..M} w_i ]
• Weights are computed from saliency:
  w_i = f(θ_i^p) · f(θ_i^t) · s_i^p
  s_i^p ∝ min_t d_H(H_i^p, H_i^t) + s_0
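
A compact sketch of these two distances, assuming the per-vertex HSV histograms are already L1-normalized and the weights w_i are precomputed as above (all names illustrative):

```python
import numpy as np

def hellinger(h_p, h_t):
    """Vertex-to-vertex Hellinger distance between two normalized histograms."""
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(h_p * h_t)))))

def model_distance(hists_p, hists_t, weights):
    """Weighted model-to-model distance D(Gamma^p, Gamma^t) over M vertices."""
    d = np.array([hellinger(hp, ht) for hp, ht in zip(hists_p, hists_t)])
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * d) / np.sum(w))
```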

Experiments

Improved localization accuracy:
• 5% improvement over [1]
Long-term tracking performance:
• Recall: 88.8%
• Precision: 72.73%

References
[1] Á. Utasi, C. Benedek. A 3-D Marked Point Process Model
for Multi-View People Detection, CVPR, 2011
[2] X. Descombes, R. Minlos, E. Zhizhina. Object Extraction
Using a Stochastic Birth-and-Death Dynamics in Continuum, J. of Math. Imaging and Vision, 2009
[3] D. Baltieri, R. Vezzani, R. Cucchiara. 3D Body Model
Construction and Matching for Real Time People Re-Identification, EG-IT, 2010

Acknowledgement

This work has been done within the THIS project with the support of the Prevention, Preparedness and Consequence Management of Terrorism and other Security-related Risks Programme, European Commission Directorate-General Justice, Freedom and Security.