3D Object Representations for
Fine-Grained Categorization
Jonathan Krause, Michael Stark,
Jia Deng, Li Fei-Fei
What is this?
Car
What is this?
Sedan
What is this?
BMW Sedan
What is this?
BMW 3-Series Sedan
What is this?
2013 BMW 3-Series Sedan
What is this?
2013 BMW 3-Series Sedan 328i
Difficulty
How many classes are there?
Why 3D?
Related Work
• Many works on fine-grained recognition and
3D recognition
• Birdlets
– 3D volumetric bird model
– Pose normalization
– Extensive training annotations
Birdlets: Subordinate Categorization Using Volumetric Primitives and Pose-Normalized Appearance.
R. Farrell, O. Oza, N. Zhang, V. I. Morariu, T. Darrell, L. S. Davis. ICCV 2011
Method Overview
1. Estimate 3D geometry
2. Calculate appearance w.r.t. geometry
3. Use appearance in 3D representation
Getting 3D Geometry
• Train geometry classifier from synthetic data
– Generate synthetic data from CAD models
– Group synthetic data by azimuth, elevation, and
coarse type
• sedan, coupe, convertible, SUV, pickup, hatchback,
station wagon
– SVM
• At test time use multiple hypotheses
Base HOG features
Learned classifier
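A minimal sketch of the "multiple hypotheses" idea, assuming one linear classifier per (azimuth, elevation, coarse type) group and a plain top-k selection. The slides do not say how many groups or hypotheses are used; `geometry_classes`, `linear_score`, `top_k_hypotheses`, and the default `k=3` are hypothetical names and values for illustration only.

```python
import heapq
from itertools import product

COARSE_TYPES = ["sedan", "coupe", "convertible", "SUV",
                "pickup", "hatchback", "station wagon"]

def geometry_classes(num_azimuths=36, num_elevations=4):
    """Enumerate (azimuth bin, elevation bin, coarse type) labels.
    Assumption: one class per combination; the bin counts are borrowed
    from the synthetic data grid."""
    return list(product(range(num_azimuths), range(num_elevations), COARSE_TYPES))

def linear_score(weights, bias, features):
    """Score of one linear classifier (e.g. a trained linear SVM) on a feature vector."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def top_k_hypotheses(classifiers, features, k=3):
    """classifiers: dict mapping label -> (weights, bias).
    Returns the k highest-scoring (score, label) pairs, best first."""
    scored = [(linear_score(w, b, features), label)
              for label, (w, b) in classifiers.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

Keeping several geometry hypotheses instead of only the best one lets later stages recover when the top-scoring viewpoint or coarse type is wrong.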
Synthetic Data
• 41 CAD models
• 36 azimuths
• 4 elevations
• 10 backgrounds
• 59,040 synthetic images w/full 3D annotations
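The image count is the product of the rendering grid: 41 × 36 × 4 × 10 = 59,040. A small sketch that enumerates that grid, assuming evenly spaced azimuths (10-degree steps) and index-only elevations; the slides give the counts but not the actual angles, so those values are placeholders.

```python
from itertools import product

NUM_CAD_MODELS = 41
NUM_AZIMUTHS = 36
NUM_ELEVATIONS = 4
NUM_BACKGROUNDS = 10

# Assumption: azimuths evenly spaced over 360 degrees.
azimuths = [i * 360 // NUM_AZIMUTHS for i in range(NUM_AZIMUTHS)]
# Assumption: elevations are only indexed here; the angles are not given.
elevations = list(range(NUM_ELEVATIONS))

# One render job per (model, azimuth, elevation, background) combination.
render_jobs = list(product(range(NUM_CAD_MODELS), azimuths,
                           elevations, range(NUM_BACKGROUNDS)))
print(len(render_jobs))  # 59040
```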
Appearance
• Sample patches directly from 3D surface
• Rectify patches for viewpoint invariance
3D Representation 1: SPM-3D
• Extension of Spatial Pyramid Matching to 3D
1. Compute features for each patch
2. Pool over regions on object surface
We use 1x1, 2x2, and 4x4 pooling regions
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.
S. Lazebnik, C. Schmid, J. Ponce. CVPR 2006
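A rough sketch of pyramid pooling over the object surface, assuming each patch carries a 2D surface coordinate (u, v) in [0, 1) and a non-negative feature vector, and assuming max pooling per cell. The surface parameterization and the pooling operator are not stated on the slides; `spm_pool` is a hypothetical helper.

```python
def spm_pool(patches, feature_dim, levels=(1, 2, 4)):
    """patches: list of ((u, v), feature) with u, v in [0, 1).
    Returns max-pooled features concatenated over every cell of the
    1x1, 2x2, and 4x4 grids (21 cells in total)."""
    pooled = []
    for n in levels:
        # Zero initialization is valid because features are assumed non-negative.
        cells = [[0.0] * feature_dim for _ in range(n * n)]
        for (u, v), feat in patches:
            col = min(int(u * n), n - 1)
            row = min(int(v * n), n - 1)
            cell = cells[row * n + col]
            for d in range(feature_dim):
                cell[d] = max(cell[d], feat[d])
        for cell in cells:
            pooled.extend(cell)
    return pooled
```

With the three levels above, the output has (1 + 4 + 16) × `feature_dim` entries.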
3D Representation 2: BB-3D
• 3D version of randomized BubbleBank [Deng et al. CVPR 2013]
• BB-2D: random templates + local pooling regions
Fine-Grained Crowdsourcing for Fine-Grained Recognition. J. Deng, J. Krause, L. Fei-Fei. CVPR 2013
BubbleBank-3D
1. Randomly sample templates
2. Pool over local 3D region
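The two steps can be sketched as follows, assuming each patch is a (3D position, descriptor) pair, a dot product as the template response, and max pooling inside a sphere around the template's 3D location. These choices, the `radius` parameter, and the function names are illustrative assumptions, not details taken from the slides.

```python
import math
import random

def sample_templates(training_patches, num_templates, seed=0):
    """Step 1: randomly sample templates from training patches."""
    rng = random.Random(seed)
    return rng.sample(training_patches, num_templates)

def response(template_desc, patch_desc):
    """Assumed similarity: dot product between descriptors."""
    return sum(a * b for a, b in zip(template_desc, patch_desc))

def bubblebank_features(templates, patches, radius=None):
    """Step 2: for each template, max-pool its response over patches whose
    3D position lies within `radius` of the template position.
    radius=None pools over all patches on the object."""
    feats = []
    for t_pos, t_desc in templates:
        responses = [response(t_desc, p_desc)
                     for p_pos, p_desc in patches
                     if radius is None or math.dist(t_pos, p_pos) <= radius]
        feats.append(max(responses, default=0.0))
    return feats
```

Passing a finite `radius` gives local pooling, while `radius=None` pools each template over the whole object.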
Fine-Grained Car Datasets
• Existing datasets are small and not very fine-grained
– car-types: 14 classes, variety of coarse categories
• Two new datasets:
– BMW-10: Ten classes, ultra-fine-grained
– car-197: 197 classes, much bigger
• In terms of # images:
car-types
car-197
Fine-Grained Categorization for 3D Scene Understanding.
M. Stark, J. Krause, B. Pepik, D. Meger, J. J. Little, B. Schiele, D. Koller. BMVC 2012
BMW-10
• 10 types of BMWs, 512 images, many
viewpoints, bounding boxes, hand-curated
Car-197
• 197 car models, 16,185 images
• Collected very carefully on AMT
• Slightly modified version in FGComp
• Standalone dataset out soon
Fine-Grained Challenge 2013. http://sites.google.com/site/fgcomp2013
Experiments: BMW-10
[Bar chart: accuracy (%) on BMW-10, y-axis from 0 to 70]
3D works!
BB-3D: Local vs. Global
• BB-3D-L: 64.7%, BB-3D-G: 66.1%
• Why global pooling can work:
– More robust w.r.t. difficult viewpoints
– Left-right symmetry
Experiments: car-types
[Bar chart: accuracy (%) on car-types, y-axis from 70 to 100]
Still works!
Experiments: car-197
[Bar chart: accuracy (%), y-axis from 56 to 78, for LLC+SPM, SPM-3D, BB, BB-3D-G, and Stacked]
• The problems:
– Underrepresentation of some types of CAD models
– Template vs. codebook approaches with many classes
• The silver lining: Stacking helps a lot :)
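The slides do not say how the stacked model is built. One common form of stacking, sketched here as an assumption, concatenates the per-class scores of the base methods and feeds them to a second-level linear classifier whose weights would be learned on held-out data. `stack_features`, `predict_stacked`, and `meta_weights` are hypothetical names.

```python
def stack_features(score_vectors):
    """Concatenate per-class score vectors from several base models."""
    return [s for scores in score_vectors for s in scores]

def predict_stacked(meta_weights, score_vectors):
    """meta_weights: one weight row per class, applied to the stacked
    features (assumed to be learned on held-out base-model outputs).
    Returns the index of the highest-scoring class."""
    x = stack_features(score_vectors)
    class_scores = [sum(w * f for w, f in zip(row, x)) for row in meta_weights]
    return max(range(len(class_scores)), key=class_scores.__getitem__)
```

For example, with two base models and two classes, `predict_stacked([[1, 0, 1, 0], [0, 1, 0, 1]], [[0.6, 0.4], [0.1, 0.9]])` sums each class's scores across models and picks the larger total.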
Discriminative Bubbles
Discriminative power of templates in BB-3D (BMW-10):
Size/color proportional to
Discriminative features at front/back!
Bonus: Ultra-Wide Baseline Matching
• Measures ability to localize 3D points across viewpoints
• Use BB-3D-L + RANSAC for correspondences
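A generic RANSAC loop over putative correspondences, for illustration only. The geometric model fitted in the actual matching experiments is not stated on the slides, so a deliberately simple 2D translation stands in for it here; `ransac_translation`, the threshold, and the iteration count are assumptions.

```python
import random

def ransac_translation(matches, threshold=1.0, iterations=100, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) putative correspondences.
    Repeatedly fits a 2D translation to one random match and keeps the
    model with the most inliers. Returns (translation, inlier_indices)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.choice(matches)  # minimal sample: one match
        dx, dy = x2 - x1, y2 - y1
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if abs(bx - ax - dx) <= threshold and abs(by - ay - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

The same sample, fit, count-inliers structure carries over when the translation is swapped for a richer model that needs a larger minimal sample.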
Experiments: Ultra-Wide Baseline Matching
• On 3D Object Classes
BB-3D-S: Single geometry hypothesis
BB-3D-M: Multiple geometry hypotheses
• Works well, state of the art for some baselines
3D Generic Object Categorization, Localization, and Pose Estimation. S. Savarese, L. Fei-Fei. ICCV 2007
[24] 3D2PM – 3D Deformable Part Models. B. Pepik, P. Gehler, M. Stark, B. Schiele. ECCV 2012
[37] Revisiting 3D Geometric Models for Accurate Object Shape and Pose. M. Z. Zia, M. Stark, B. Schiele, K. Schindler. 3DRR 2011
But Wait, There’s More:
Reconstruction of Category
• Same fine-grained category, different instances,
backgrounds, lighting, etc.
• Pipeline: BB-3D-L for point correspondences → VisualSFM for bundle adjustment
Conclusion
• Lifted two representations to 3D (SPM-3D, BB-3D) which are state of the art on two fine-grained datasets
• Two new fine-grained datasets of cars
• Promising initial results on ultra-wide baseline
matching and reconstruction of a fine-grained
category
Thank You!