M. Volino, D. Casas, J. Collomosse, A. Hilton

Transcription

M. Volino, D. Casas, J. Collomosse, A. Hilton
Optimal Representation
of Multi-View Video
M. Volino, D. Casas, J. Collomosse, A. Hilton
BRITISH MACHINE VISION CONFERENCE 2014
Problem Statement
Optimal resampling of the captured views to obtain a
compact representation without loss of view-dependent
dynamic surface appearance information
INPUT
REPRESENTATION
RESULT
Multi-View Video
and Geometry
Multi-Layer
Texture Representation
Final Render
2
Problem Statement
Optimal resampling of the captured views to obtain a
compact representation without loss of view-dependent
dynamic surface appearance information
INPUT
REPRESENTATION
RESULT
Multi-View Video
and Geometry
Multi-Layer
Texture Representation
Final Render
2
Multi-View Video
Character Dan
Storage: 1.6 GB
Character 1
Storage: 1.8 GB
3
Multi-View Video
Character Dan
Storage: 1.6 GB
Character 1
Storage: 1.8 GB
3
Multi-View Video
Cloth
Face
Storage: 11.4 GB
Storage: 13.1 GB
4
Multi-View Video
Cloth
Face
Storage: 11.4 GB
Storage: 13.1 GB
4
Reconstructing Geometry
Vlasic 2008
5
Reconstructing Geometry
Method
ModelBased
Advantage
•
•
Highly detailed geometry
Temporal consistent geometry
Vlasic 2008
Disadvantage
•
•
•
Requires prior knowledge of subject
Loss of dynamic surface detail
Expensive equipment
Vlasic 2008
Carranza 2003
Model-Based - Vlasic 2008
5
Reconstructing Geometry
Method
ModelBased
ModelFree
Advantage
•
•
•
•
Highly detailed geometry
Temporal consistent geometry
Flexible, no prior knowledge of scene
required
no specialist equipment only cameras
Vlasic 2008
Disadvantage
•
•
•
•
•
Requires prior knowledge of subject
Loss of dynamic surface detail
Expensive equipment
Each frame is reconstructed independently,
requires temporal alignment
Model detail is limited by camera resolution
Model-Based - Vlasic 2008
Vlasic 2008
Carranza 2003
Starck 2007
Model-Free - Stark and Hilton 2007
5
Texturing 4D Models
Multi-View Video
~700MB
View Dependent
Dynamic
Practical Storage
6
Texturing 4D Models
Multi-View Video
~700MB
View Dependent
Dynamic
Practical Storage
6
Texturing 4D Models
Multi-View Video
~700MB
View Dependent
Dynamic
Practical Storage
6
Texturing 4D Models
Multi-View Video
Single-Texture Map
~700MB
~1MB
View Dependent
View Dependent
Dynamic
Dynamic
Practical Storage
Practical Storage
6
Texturing 4D Models
Multi-View Video
Single-Texture Map
~700MB
~1MB
View Dependent
View Dependent
Dynamic
Dynamic
Practical Storage
Practical Storage
6
Texturing 4D Models
Multi-View Video
Single-Texture Map
Texture map per Frame
~700MB
~1MB
~30MB
View Dependent
View Dependent
View Dependent
Dynamic
Dynamic
Dynamic
Practical Storage
Practical Storage
Practical Storage
6
Texturing 4D Models
Multi-View Video
Single-Texture Map
Texture map per Frame
~700MB
~1MB
~30MB
View Dependent
View Dependent
View Dependent
Dynamic
Dynamic
Dynamic
Practical Storage
Practical Storage
Practical Storage
6
Multi-Layer Texture Map
INPUT
PROCESS
ASSIGNMENT
Multi-View Video
Aligned Geometry
UV Coordinates
7
Multi-Layer Texture Map
INPUT
PROCESS
ASSIGNMENT
Multi-View Video
Aligned Geometry
UV Coordinates
7
Multi-Layer Texture Map
INPUT
PROCESS
ASSIGNMENT
Multi-View Video
Aligned Geometry
UV Coordinates
7
Multi-Layer Texture Map
INPUT
PROCESS
ASSIGNMENT
Multi-View Video
Aligned Geometry
UV Coordinates
7
Multi-Layer Texture Map
INPUT
Multi-View Video
Aligned Geometry
PROCESS
α
ASSIGNMENT
β
γ
UV Coordinates
7
Multi-Layer Texture Map
INPUT
Multi-View Video
Aligned Geometry
PROCESS
α
ASSIGNMENT
β
γ
β<α<γ
UV Coordinates
7
Multi-Layer Texture Map
INPUT
PROCESS
ASSIGNMENT
Layer 1
Multi-View Video
Aligned Geometry
α
β
γ
Layer 2
Layer 3
β<α<γ
UV Coordinates
7
Multi-Layer Texture Map
Layer 1
Layer 2
Layer 3
Texture
Colour
Camera
Assignment
8
Multi-Layer Texture Map
Layer 1
Layer 2
Layer 3
Texture
Colour
Camera
Assignment
8
Optimisation
•
Construct an undirected graph
Nodes -> Mesh Polygons
Edges -> One Neighbour connection
•
Using Markov Random Fields
find a labelling of cameras to
mesh polygons that minimises
an energy function
9
Optimisation
10
Optimisation
Unary Term
•
Unary Term
Ensures most direct camera is preferred and enforces visibility
constraints
10
Optimisation
Unary Term
•
Spatial Term
Unary Term
Ensures most direct camera is preferred and enforces visibility
constraints
•
Spatial Smoothness Term
Reduces changes in camera-to-polygon assignment across
mesh surface
10
Optimisation
Unary Term
•
Spatial Term
Temporal Term
Unary Term
Ensures most direct camera is preferred and enforces visibility
constraints
•
Spatial Smoothness Term
Reduces changes in camera-to-polygon assignment across
mesh surface
•
Temporal Smoothness Term
Reduces changes in camera-to-polygon assignment over time
10
Optimisation
Layer 1
Layer 2
Layer 3
No
Optimisation
NO
Spatial
Optimisation
SO
Spatio + Temporal
Optimisation
TO
11
Optimisation
Layer 1
Layer 2
Layer 3
No
Optimisation
NO
Spatial
Optimisation
SO
Spatio + Temporal
Optimisation
TO
11
Rendering
12
Rendering
12
Texture Artefacts
•
Approximate geometry and imprecise camera
calibration leads to artefacts when blending textures
as in view-dependent rendering
•
Highlights need for spatial alignment
13
Multi-View Texture Alignment
•
From viewpoint of each
camera, projectively
texture from all other
cameras and compute
optical flow
•
Optical flow breaks down
in presence of occlusions
•
Discard flow in areas of
occlusions and depth
discontinuities (black
areas)
1
2
3
4
5
14
Multi-View Texture Alignment
TODO
VIDEO
15
Multi-View Texture Alignment
TODO
VIDEO
15
Multi-View Texture Alignment
16
Multi-View Texture Alignment
16
Evaluation: Datasets
Dataset
Character 1
Cloth
Dan
Face
Cameras Frames
8
5
8
5
31
310
28
355
Captured Data (MB)
Raw
Video Compression
1800
11400
1600
13100
61
906
57
386
17
Evaluation: Datasets
Dataset
Character 1
Cloth
Dan
Face
Cameras Frames
8
5
8
5
31
310
28
355
Captured Data (MB)
Raw
Video Compression
1800
11400
1600
13100
61
906
57
386
17
Evaluation: Datasets
Dataset
Character 1
Cloth
Dan
Face
•
Cameras Frames
8
5
8
5
31
310
28
355
Captured Data (MB)
Raw
Video Compression
1800
11400
1600
13100
61
906
57
386
Compare storage of layer sequence to the video compressed captured
data both encoded using same codec
17
Evaluation: Datasets
Dataset
Character 1
Cloth
Dan
Face
Cameras Frames
8
5
8
5
31
310
28
355
Captured Data (MB)
Raw
Video Compression
1800
11400
1600
13100
61
906
57
386
•
Compare storage of layer sequence to the video compressed captured
data both encoded using same codec
•
Evaluate quality by rendering arbitrary views using multi-layer texture
representation and using free viewpoint video renderer [Starck et al. 2009]
and compare using Structural Similarity Index Measure [Wang et al. 2009].
17
Evaluation: Quality
•
Optimisation has no effect on rendering
quality
•
Same appearance information reordered for better compression
•
Result generated using 512 texture size
18
Evaluation: Quality
•
Optimisation has no effect on rendering
quality
•
Same appearance information reordered for better compression
•
Result generated using 512 texture size
•
Increasing the size of the texture
map has no effect on the quality
above 1024
•
Results generated using TO
18
Evaluation: Storage
•
Effect of optimisation on storage size
•
With Face dataset this difference is
approximately 10-20MB
•
Results generated using 512 texture
size
19
Evaluation: Storage
•
Effect of optimisation on storage size
•
With Face dataset this difference is
approximately 10-20MB
•
Results generated using 512 texture
size
•
Higher the texture size, the lower the
storage reduction
•
In the case of datasets dan and
character 1 reduction stop after 4 layers
•
Results generated using TO
19
Conclusions
20
Conclusions
•
Novel texture representation maintains
dynamic, view-dependent appearance
View Dependent
Dynamic
Practical Storage
•
Demonstrated on a variety of subjects
including full body, face and cloth
capture
20
Conclusions
•
Novel texture representation maintains
dynamic, view-dependent appearance
View Dependent
Dynamic
Practical Storage
•
Demonstrated on a variety of subjects
including full body, face and cloth
capture
•
Significant reduction in storage
requirements > 90%
20
Conclusions
•
Novel texture representation maintains
dynamic, view-dependent appearance
View Dependent
Dynamic
Practical Storage
•
Demonstrated on a variety of subjects
including full body, face and cloth
capture
•
Significant reduction in storage
requirements > 90%
•
Optical flow based multi camera
alignment which significantly reduces
artefacts
20
Future Work
•
Can we ensure temporal coherence over layer
sequences?
•
Can the multi-layer texture representation be extended
to allow appearance editing and relighting through
extraction of material properties?
21
Datasets
cvssp.org/data/cvssp3d
22
Acknowledgements
University of Surrey
Adrian Hilton
John Collomosse
BBC R&D
Graham Thomas
Project
Dan Casas*
Martin Klaudiny
Peng Huang
* Now at USC: ICT
23
cvssp.org/data/cvssp3d
Thanks for your attention!
Questions ?
TO, 1024, 3 Layers
72 MB, 82% Reduction
TO, 512, 3 Layers
3.8 MB, 93% Reduction
cvssp.org/data/cvssp3d
Thanks for your attention!
Questions ?
TO, 1024, 3 Layers
72 MB, 82% Reduction
TO, 512, 3 Layers
3.8 MB, 93% Reduction