M. Volino, D. Casas, J. Collomosse, A. Hilton
Optimal Representation of Multi-View Video
M. Volino, D. Casas, J. Collomosse, A. Hilton
British Machine Vision Conference 2014

Problem Statement
Optimal resampling of the captured views to obtain a compact representation without loss of view-dependent dynamic surface appearance information.
INPUT: Multi-View Video and Geometry
REPRESENTATION: Multi-Layer Texture Representation
RESULT: Final Render

Multi-View Video
• Character Dan: storage 1.6 GB
• Character 1: storage 1.8 GB
• Cloth: storage 11.4 GB
• Face: storage 13.1 GB

Reconstructing Geometry

Method: Model-Based (Vlasic 2008, Carranza 2003)
Advantages:
• Highly detailed geometry
• Temporally consistent geometry
Disadvantages:
• Requires prior knowledge of subject
• Loss of dynamic surface detail
• Expensive equipment

Method: Model-Free (Starck and Hilton 2007)
Advantages:
• Flexible, no prior knowledge of scene required
• No specialist equipment, only cameras
Disadvantages:
• Each frame is reconstructed independently, requires temporal alignment
• Model detail is limited by camera resolution

Texturing 4D Models

Representation           Storage
Multi-View Video         ~700 MB
Single-Texture Map       ~1 MB
Texture Map per Frame    ~30 MB

Criteria compared for each representation: View Dependent, Dynamic, Practical Storage.

Multi-Layer Texture Map
INPUT: Multi-View Video, Aligned Geometry
PROCESS: α, β, γ, where β < α < γ
ASSIGNMENT: Layer 1, Layer 2, Layer 3 (UV Coordinates)

Multi-Layer Texture Map
Layers 1 to 3, each shown as Texture Colour and Camera Assignment.

Optimisation
• Construct an undirected graph
  Nodes -> Mesh Polygons
  Edges -> One Neighbour connection
• Using Markov Random Fields, find a labelling of cameras to mesh polygons that minimises an energy function
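The camera-to-polygon labelling can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the authors' implementation: the unary cost (one minus the cosine of the angle between the polygon normal and the direction to the camera, infinite when the camera cannot see the polygon), the Potts-style penalty on neighbouring polygons with different cameras, the weight `smooth_weight`, and the iterated conditional modes (ICM) solver are all illustrative choices. The slides do not state which cost functions or solver were used.

```python
import math

INF = float("inf")


def unary_cost(normal, to_camera, visible):
    """Assumed unary cost: small when the camera views the polygon head-on."""
    if not visible:
        return INF  # visibility constraint: an occluded camera is never chosen
    dot = sum(n * c for n, c in zip(normal, to_camera))
    norm = math.sqrt(sum(n * n for n in normal)) * math.sqrt(sum(c * c for c in to_camera))
    return 1.0 - dot / norm


def energy(labels, unary, edges, smooth_weight):
    """Total energy: unary costs plus a penalty per edge whose ends differ."""
    data = sum(unary[p][labels[p]] for p in range(len(labels)))
    smooth = sum(labels[p] != labels[q] for p, q in edges)
    return data + smooth_weight * smooth


def icm(unary, edges, smooth_weight, max_sweeps=20):
    """Greedy local minimisation (ICM) of the energy above.

    unary[p][c] is the cost of assigning camera c to polygon p;
    edges is a list of (p, q) pairs of neighbouring polygons.
    """
    num_polys, num_cams = len(unary), len(unary[0])
    neighbours = [[] for _ in range(num_polys)]
    for p, q in edges:
        neighbours[p].append(q)
        neighbours[q].append(p)

    # Start from the best camera per polygon, ignoring smoothness.
    labels = [min(range(num_cams), key=lambda c: unary[p][c]) for p in range(num_polys)]

    for _ in range(max_sweeps):
        changed = False
        for p in range(num_polys):
            def local_cost(c):
                disagree = sum(labels[q] != c for q in neighbours[p])
                return unary[p][c] + smooth_weight * disagree

            best = min(range(num_cams), key=local_cost)
            if local_cost(best) < local_cost(labels[p]):
                labels[p] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: 4 polygons in a chain, 2 cameras.
    unary = [[0.1, 0.9], [0.1, 0.9], [0.5, 0.4], [0.1, 0.9]]
    edges = [(0, 1), (1, 2), (2, 3)]
    print(icm(unary, edges, smooth_weight=0.0))
    print(icm(unary, edges, smooth_weight=0.5))
```

With `smooth_weight=0.0` each polygon simply takes its cheapest camera, so the third polygon picks a different camera from its neighbours. With a positive weight the isolated switch is removed, which is the kind of behaviour a spatial smoothness term is meant to encourage. A temporal term could be added in the same way by linking each polygon to itself in adjacent frames. ICM only finds a local minimum, so it stands in here for whatever MRF solver is actually used.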
Optimisation
• Unary Term: ensures the most direct camera is preferred and enforces visibility constraints
• Spatial Smoothness Term: reduces changes in camera-to-polygon assignment across the mesh surface
• Temporal Smoothness Term: reduces changes in camera-to-polygon assignment over time

Optimisation
Layers 1 to 3 compared under:
• No Optimisation (NO)
• Spatial Optimisation (SO)
• Spatio + Temporal Optimisation (TO)

Rendering
[figure]

Texture Artefacts
• Approximate geometry and imprecise camera calibration lead to artefacts when blending textures, as in view-dependent rendering
• Highlights the need for spatial alignment

Multi-View Texture Alignment
• From the viewpoint of each camera, projectively texture from all other cameras and compute optical flow
• Optical flow breaks down in the presence of occlusions
• Discard flow in areas of occlusions and depth discontinuities (black areas)

Evaluation: Datasets

Dataset       Cameras   Frames   Raw (MB)   Video Compression (MB)
Character 1   8         31       1800       61
Cloth         5         310      11400      906
Dan           8         28       1600       57
Face          5         355      13100      386

• Compare storage of the layer sequence to the video-compressed captured data, both encoded using the same codec
• Evaluate quality by rendering arbitrary views using the multi-layer texture representation and using a free-viewpoint video renderer [Starck et al. 2009], and compare using the Structural Similarity Index Measure [Wang et al. 2009]

Evaluation: Quality
• Optimisation has no effect on rendering quality
• Same appearance information reordered for better compression
• Results generated using 512 texture size
• Increasing the texture map size beyond 1024 has no further effect on quality
• Results generated using TO

Evaluation: Storage
• Effect of optimisation on storage size
• With the Face dataset this difference is approximately 10-20 MB
• Results generated using 512 texture size
• The higher the texture size, the lower the storage reduction
• For the Dan and Character 1 datasets, the reduction stops after 4 layers
• Results generated using TO

Conclusions
• Novel texture representation maintains dynamic, view-dependent appearance (View Dependent, Dynamic, Practical Storage)
• Demonstrated on a variety of subjects including full body, face and cloth capture
• Significant reduction in storage requirements: > 90%
• Optical-flow-based multi-camera alignment which significantly reduces artefacts

Future Work
• Can we ensure temporal coherence over layer sequences?
• Can the multi-layer texture representation be extended to allow appearance editing and relighting through extraction of material properties?

Datasets
cvssp.org/data/cvssp3d

Acknowledgements
University of Surrey: Adrian Hilton, John Collomosse
BBC R&D: Graham Thomas
Project: Dan Casas*, Martin Klaudiny, Peng Huang
* Now at USC: ICT

Thanks for your attention! Questions?
cvssp.org/data/cvssp3d
• TO, 1024, 3 Layers: 72 MB, 82% Reduction
• TO, 512, 3 Layers: 3.8 MB, 93% Reduction
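The reduction percentages quoted above compare the size of the encoded layer sequence with the size of the video-compressed captured data. A minimal sketch of that arithmetic follows. The function name `reduction_percent` is hypothetical, and pairing the 3.8 MB figure with the Dan baseline of 57 MB is an assumption made only for illustration, since the closing slide does not name the dataset.

```python
def reduction_percent(baseline_mb, layers_mb):
    """Storage reduction of the layer sequence relative to a baseline, in percent."""
    if baseline_mb <= 0:
        raise ValueError("baseline size must be positive")
    return 100.0 * (1.0 - layers_mb / baseline_mb)


if __name__ == "__main__":
    # Assumed pairing: Dan video-compressed baseline (57 MB) vs. a 3.8 MB layer sequence.
    print(round(reduction_percent(57.0, 3.8), 1))
```

Under that assumed pairing the result is about 93%, in line with the figure quoted for TO, 512, 3 Layers.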