Fully Convolutional Neural Networks for Classification, Detection
Fully Convolutional Neural Networks for Classification, Detection & Segmentation, or: all your computer wanted to know about horses
Iasonas Kokkinos (Ecole Centrale Paris / INRIA Saclay) & G. Papandreou, P.-A. Savalle, S. Tsogkas, L.-C. Chen, K. Murphy, A. Yuille, A. Vedaldi

Fully convolutional neural networks
[Figure: convolutional layers followed by fully connected layers]

Fully convolutional neural networks
Fully connected layers become 1x1 spatial convolution kernels, which allows the network to process images of arbitrary size.
P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus and Y. LeCun, OverFeat, ICLR 2014.
M. Oquab, L. Bottou, I. Laptev and J. Sivic, Weakly Supervised Object Recognition with CNNs, TR 2014.
J. Long, E. Shelhamer and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.

Fully convolutional neural networks (FCNN)
[Figure slides: FCNN]
FCNNs are fast (shared convolutions) and simple (dense evaluation).

Part 1: FCNNs for classification & detection (CVPR'15): G. Papandreou (TTI / Google), P.-A. Savalle (ECP / CISCO)
Part 2: FCNNs for semantic segmentation (ICLR'15?): G. Papandreou (TTI / Google), L.-C. Chen (UCLA), K. Murphy (Google), A. Yuille (UCLA)
Part 2.5: FCNNs for part segmentation (ongoing): G. Papandreou (TTI / Google), S. Tsogkas (ECP), A. Vedaldi (Oxford)

Part 1: FCNNs for classification & detection
G. Papandreou, P.-A. Savalle
G. Papandreou, I. Kokkinos and P.-A. Savalle, Untangling Local and Global Deformations in Deep Convolutional Networks for Image Classification and Sliding Window Detection, arXiv:1412.0296, 2014 & CVPR 2015.

Scale-invariant classification
[Figure labels: category-dependent, scale-dependent]
The image is replaced by a set of rescaled copies, $x \to \{x_{s_1}, \dots, x_{s_K}\}$, and the classifier output by the corresponding set of responses, $F(x) \to \{F(x_{s_1}), \dots, F(x_{s_K})\}$.

Scale-invariant classification
Options for handling the object's scale:
● Invariant classifier: poor discriminative power
● Classifier mixture: smaller training sets
● Scale-tuned classifier: requires normalized data
MIL: rescale $x \to \{x_{s_1}, \dots, x_{s_K}\}$ and treat $F(x) \to \{F(x_{s_1}), \dots, F(x_{s_K})\}$ as a 'bag' of features, pooled into a single score:
● Averaging: $F'(x) = \frac{1}{K} \sum_{k=1}^{K} F(x_{s_k})$
● This work: $F'(x) = \max_{k} F(x_{s_k})$
A. Howard, Some improvements on deep convolutional neural network based image classification, 2013.
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

Multiple instance learning via max-pooling
Pipeline: input image I(x,y) → pyramid I(x,y,s) → levels stitched into a patchwork(x,y) → FCNN (220x220x3 input window, 1x1x20 output) → score map F(x,y) → max-pooling → class score $F'(x) = \max_{k} F(x_{s_k})$. End-to-end training!
Top-5 error (all DCNNs have 6 convolutional and 2 fully connected layers):
● Baseline (max-pooled net): 13.0%
● Epitomic DCNN: 11.9% (~1% gain)
● Epitomic DCNN + MIL: 10.0% (~2% gain)

Towards object detection
Same pipeline up to the score map: I(x,y) (220x220x3) → pyramid I(x,y,s) → stitched patchwork(x,y) → FCNN → F(x,y) (1x1x20).
Search over position and scale: done! Missing: aspect ratio.

'Squeeze-invariant' classification
[Figure: hyperbolic mapping]
Options for handling the aspect ratio:
● Invariant classifier: poor discriminative power
● Classifier mixture: smaller training sets
● Ratio-tuned classifier: requires normalized data

The Greeks did it first: Procrustes
F.L. Bookstein, Morphometric tools for landmark data, Cambridge University Press, 1991.
T.F. Cootes, C.J. Taylor, D.H. Cooper and J. Graham (1995). "Active shape models: their training and application".
Computer Vision and Image Understanding 61: 38–59.

Detection on Procrustes' bed
[Figure: car, window]

Explicit search over aspect ratio, scale & position
[Figure slides]

Pascal VOC: best sliding-window detector
● Ours + VGG network: 56.4 mAP (6-10 seconds)
● RCNN + VGG network: 62.2 mAP (60 seconds)
● RCNN + AlexNet: 54.2 mAP (10 seconds)
● End-to-end DPM: 46.9 mAP (Li Wan, David Eigen and Rob Fergus, End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression, arXiv 2014, CVPR 2015)
56.4 mAP is a first shot: no hinge loss, no hard negative mining, smaller (100x100) inputs and a smaller network. Lots of room for improvement!

Part 2: FCNNs for semantic segmentation
G. Papandreou, L.-C. Chen (UCLA), K. Murphy (Google), A. Yuille (UCLA)
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, http://arxiv.org/abs/1412.7062

Semantic segmentation task
[Figure]

System outline
J. Long, E. Shelhamer and T. Darrell, FCNNs for Semantic Segmentation, CVPR 2015.
P. Krähenbühl and V. Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011.

Repurposing DCNNs for semantic segmentation
● Accelerate CNN evaluation by 'hard dropout' and fine-tuning
● In VGG: subsample the first FC layer from 7x7 to 3x3
● Decrease the score map stride from 32 to 8 with the 'atrous' (with holes) algorithm
● Speed: 8 FPS
M. Holschneider et al., A real-time algorithm for signal analysis with the help of the wavelet transform, Wavelets, Time-Frequency Methods and Phase Space, 1989.

FCNN-DCRF: FCNN-based labelling refined by a densely connected CRF
● Large CNN receptive field: + good accuracy, - worse performance near boundaries
● Dense CRF: sharpens boundaries using image-based information
P. Krähenbühl and V. Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011.

Indicative results
[Figure slides: raw score maps vs. after dense CRF]

Improvements due to the fully connected (dense) CRF
● Krähenbühl et al. (TextonBoost unaries): 27.6 → 29.1 (+1.5)
● Our work (FCNN unaries): 61.3 → 65.21 (+3.9)

Comparisons to the Fully Convolutional Net
[Figure: ground truth, FCN-8, our work]
J. Long, E. Shelhamer and T. Darrell, Fully convolutional networks for semantic segmentation, arXiv:1411.4038, 2014.

Comparisons to the TTI-Zoomout system
[Figure: ground truth, TTI-Zoomout, our work]
M. Mostajabi, P. Yadollahpour and G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, arXiv:1412.0774, 2014.

Comparison to the state of the art (Pascal VOC test)
● Pre-CNN: up to 50%
● CNN: 60-64%
● CNN + CRF: >67%
● Trained on Pascal: 67%; trained on COCO + Pascal: 71%
G. Papandreou et al., Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation, arXiv 2015.

Part 2.5: FCNNs for part segmentation (ongoing)
S. Tsogkas, G. Papandreou, A. Vedaldi

Part segmentation data
● AeroplanOID
● PASCAL-Part
A. Vedaldi, S. Mahendran, S. Tsogkas, S. Maji, B. Girshick, J. Kannala, E. Rahtu, I. Kokkinos, M. B. Blaschko, D. Weiss, B. Taskar, K. Simonyan, N. Saphra and S. Mohamed, Understanding Objects in Detail with Fine-grained Attributes, CVPR 2014.
X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun and A.L. Yuille, Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts, CVPR 2014.

Part segmentation pipeline
Input image → DCNN → CRF → full segmentation into parts (head, neck, torso, tail, legs, hooves).

Preliminary results
[Figure: input, ground truth, our result; part labels include arms and legs]

Thanks!
P.-A. Savalle, L.-C. Chen, S. Tsogkas, G. Papandreou, K. Murphy, A. Yuille, A. Vedaldi
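The point made on the FCNN slides, that fully connected layers become 1x1 convolutions, can be checked numerically. The NumPy sketch below is illustrative only: the function names, the 16-channel feature size and the random weights are assumptions, not taken from the talk. It applies one weight matrix both as a fully connected layer on a single feature vector and as a 1x1 convolution over a larger feature map, and checks that the two agree at every position.

```python
import numpy as np

def fully_connected(weights, bias, vec):
    """Fully connected layer on one feature vector of shape (C_in,)."""
    return weights @ vec + bias

def conv_1x1(weights, bias, fmap):
    """The same weights used as a 1x1 convolution over a feature map of
    shape (H, W, C_in); returns a score map of shape (H, W, C_out)."""
    return np.einsum("oc,hwc->hwo", weights, fmap) + bias

rng = np.random.default_rng(0)
c_in, c_out = 16, 20              # illustrative sizes (hypothetical)
weights = rng.standard_normal((c_out, c_in))
bias = rng.standard_normal(c_out)

# A larger input image yields a spatial feature map instead of a single vector.
fmap = rng.standard_normal((5, 7, c_in))
scores = conv_1x1(weights, bias, fmap)   # one classifier response per position

# At every position, the 1x1 convolution reproduces the fully connected layer.
for i in range(fmap.shape[0]):
    for j in range(fmap.shape[1]):
        assert np.allclose(scores[i, j], fully_connected(weights, bias, fmap[i, j]))
```

Strictly, only the later fully connected layers map to 1x1 kernels; a first fully connected layer that reads a k x k feature map (7x7 in VGG) becomes a k x k convolution, which is why subsampling it to 3x3 changes its cost.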
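The two ways of pooling the 'bag' $\{F(x_{s_1}), \dots, F(x_{s_K})\}$ compared on the scale-invariance slides can be sketched as follows, assuming the K per-scale class scores have already been computed and stacked into a (K, C) array. The function names and toy numbers are hypothetical. The backward function illustrates why max-pooling still permits end-to-end training: the subgradient of the max routes each class's gradient only to the scale that won.

```python
import numpy as np

def pool_scales(scores, mode="max"):
    """scores: (K, C) array with F(x_{s_k}) for K rescaled copies of an image.
    Returns F'(x) with shape (C,)."""
    if mode == "max":
        return scores.max(axis=0)        # F'(x) = max_k F(x_{s_k})
    if mode == "mean":
        return scores.mean(axis=0)       # F'(x) = (1/K) sum_k F(x_{s_k})
    raise ValueError(f"unknown mode: {mode}")

def pool_scales_max_backward(scores, grad_out):
    """Subgradient of the max over scales: each class's upstream gradient
    goes only to the scale that attained the maximum for that class."""
    winners = scores.argmax(axis=0)                  # (C,) winning scale per class
    grad = np.zeros_like(scores)
    grad[winners, np.arange(scores.shape[1])] = grad_out
    return grad

# Toy example: K = 3 scales, C = 2 classes (made-up numbers).
scores = np.array([[0.2, 0.9],
                   [0.8, 0.1],
                   [0.5, 0.2]])
print(pool_scales(scores, "max"))     # per-class maximum over scales
print(pool_scales(scores, "mean"))    # per-class average over scales
print(pool_scales_max_backward(scores, np.ones(2)))
```

Averaging spreads the gradient evenly over all scales, whereas the max lets each class pick the scale where the object fits the network's receptive field best.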
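The 'atrous' trick from the repurposing slide can be illustrated in 1-D. In this sketch, plain subsampling by 2 stands in for a stride-2 pooling layer (an assumption made for clarity), and `conv1d_valid` is a hypothetical helper, not a library function. Removing the subsampling and inserting holes into the next filter yields an output at twice the resolution whose even samples coincide with the original coarse output, using the same filter weights.

```python
import numpy as np

def conv1d_valid(x, w, rate=1):
    """'Valid' 1-D correlation with a dilated ('atrous') kernel:
    y[i] = sum_k w[k] * x[i + rate * k]."""
    span = rate * (len(w) - 1) + 1
    n_out = len(x) - span + 1
    return np.array([sum(w[k] * x[i + rate * k] for k in range(len(w)))
                     for i in range(n_out)])

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
w = rng.standard_normal(3)

# Standard path: subsample by 2 (stride 2), then a 3-tap filter.
coarse = conv1d_valid(x[::2], w)         # length 16 - 3 + 1 = 14

# Atrous path: skip the subsampling and dilate the filter by 2 ('holes').
dense = conv1d_valid(x, w, rate=2)       # length 32 - 5 + 1 = 28

# Every other dense sample equals the coarse output; the samples in between
# are new responses, so the output stride halves with unchanged weights.
assert np.allclose(dense[::2], coarse)
```

In a deeper network the step can be repeated: dropping two stride-2 subsamplings turns an output stride of 32 into 8, with the filters after them dilated by the accumulated factors (2, then 4).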
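The dense CRF step of FCNN-DCRF can be sketched as naive mean-field inference in a fully connected CRF with a Potts compatibility and two Gaussian kernels (appearance and smoothness), following the model of Krähenbühl and Koltun. This brute-force version builds the full N x N kernel matrix, so it only suits tiny images; the NIPS 2011 paper gets its speed from fast high-dimensional Gaussian filtering instead. The kernel weights and bandwidths are illustrative assumptions, not values from the talk, and the toy image is made up.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense_crf_meanfield(unary, image, n_iters=5, w_app=10.0, w_smooth=3.0,
                        theta_alpha=80.0, theta_beta=13.0, theta_gamma=3.0):
    """unary: (H, W, L) negative log-probabilities (e.g. from FCNN score maps).
    image: (H, W, 3) colours. Returns (H, W, L) approximate marginals Q.
    Kernel parameters are illustrative defaults (assumption)."""
    h, w, n_labels = unary.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys, xs], axis=-1).reshape(-1, 2).astype(float)
    rgb = image.reshape(-1, 3).astype(float)
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    d_rgb = ((rgb[:, None, :] - rgb[None, :, :]) ** 2).sum(axis=-1)
    # Appearance kernel (position + colour) plus smoothness kernel (position only).
    kernel = (w_app * np.exp(-d_pos / (2 * theta_alpha**2) - d_rgb / (2 * theta_beta**2))
              + w_smooth * np.exp(-d_pos / (2 * theta_gamma**2)))
    np.fill_diagonal(kernel, 0.0)           # no message from a pixel to itself
    u = unary.reshape(-1, n_labels)
    q = softmax(-u)
    for _ in range(n_iters):
        msg = kernel @ q                    # msg[i, l] = sum_j k(i, j) Q_j(l)
        # Potts penalty for label l is sum_j k(i, j) (1 - Q_j(l)); the constant
        # sum_j k(i, j) cancels in the softmax, leaving -u + msg.
        q = softmax(-u + msg)
    return q.reshape(h, w, n_labels)

# Toy example: 6x6 image, black left half (label 0), white right half (label 1).
image = np.zeros((6, 6, 3))
image[:, 3:] = 255.0
truth = (np.arange(6)[None, :] >= 3).astype(int).repeat(6, axis=0)
probs = np.where(truth[..., None] == np.arange(2), 0.7, 0.3)
probs[2, 1] = [0.4, 0.6]                    # one wrongly labelled pixel in the left half
q = dense_crf_meanfield(-np.log(probs), image)
print(q.argmax(axis=-1))                    # labels after the dense CRF
```

In the toy example the colour-aware appearance kernel lets the many same-coloured neighbours outvote the single wrong unary, while the colour edge keeps the two halves from bleeding into each other, which is the boundary-sharpening effect described on the FCNN-DCRF slide.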