COMPUTER GRAPHICS forum
Volume 0 (1981), Number 0 pp. 1–11
Virtual Video Camera:
Image-Based Viewpoint Navigation Through Space and Time
C. Lipski, C. Linz, K. Berger, A. Sellent and M. Magnor
Computer Graphics Lab, TU Braunschweig, Germany
Abstract
We present an image-based rendering system to viewpoint-navigate through space and time of complex real-world,
dynamic scenes. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. Inexpensive,
consumer-grade camcorders suffice to acquire arbitrary scenes, e.g., in the outdoors, without elaborate recording
setup procedures, allowing also for hand-held recordings. Instead of scene depth estimation, layer segmentation,
or 3D reconstruction, our approach is based on dense image correspondences, treating view interpolation uniformly in space and time: spatial viewpoint navigation, slow motion, or freeze-and-rotate effects can all be created
in the same way. Acquisition simplification, integration of moving cameras, generalization to difficult scenes, and
space-time symmetric interpolation amount to a widely applicable Virtual Video Camera system.
Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation—I.3.8 Applications
1. Introduction
The objective common to all free-viewpoint navigation systems is to render photo-realistic vistas of real-world, dynamic scenes from arbitrary perspective (within some specified range), given a number of simultaneously recorded
video streams. Most systems exploit epipolar geometry based on either dense depth/disparity maps [NTH02,
GMW02, ZKU∗ 04] or complete 3D geometry models [MBR∗ 00, CTMS03, VBK05, SH07, dAST∗ 08]. In order to estimate depth or reconstruct 3D geometry of dynamic scenes, the input multi-video data must be precisely
calibrated as well as captured synchronously. This dependence on synchronized footage can limit practical applicability: high-end or even custom-built acquisition hardware
must be employed, and the recording setup indispensably includes some sort of camera interconnections (cables, WiFi).
The cost, time, and effort involved in recording synchronized
multi-video data prevent widespread use of free-viewpoint
navigation methods.
In this work, we address these limitations and propose a
purely image-based approach to free-viewpoint navigation
through space as well as time. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. It is
motivated by the pioneering work on view interpolation by
Chen and Williams [CW93]. We pick up on the idea of interpolating different image acquisition attributes in higher-dimensional space and suitably extend it to be applicable
to view interpolation in the spatial as well as temporal domain. By putting the temporal dimension on a par with the
spatial dimensions, a uniform framework is available to continuously interpolate virtual video camera positions across
space and time.
In our approach, we make use of dense image correspondences [SLW∗ 08, SLW∗ 10], which take the place of
depth/disparity or 3D geometry. We show how dense correspondences allow extending applicability to scenes whose
object surfaces are highly variable in appearance, e.g. due
to specular highlights, or hard to reconstruct for other reasons. Perceptually plausible image correspondence fields
can often still be established where ground-truth geometry
(or geometry-based correspondences) cannot [BN92]. Dense
image correspondences can also be established along the
temporal dimension to enable interpolation in time.
The main contribution of our paper is the presentation of a complete Virtual Video Camera system that features continuous viewpoint navigation in space and time for general, real-world scenes from a manageable number of easy-to-acquire, unsynchronized, uncalibrated, and potentially hand-held camcorder recordings. To facilitate intuitive navigation in space and time, we describe the construction of a 3-dimensional navigation space.
Figure 1: Viewpoint navigation space: time and camera viewing directions span our navigation space. Each cube represents one
video frame. Edges denote correspondence fields between two video frames. The navigation space is partitioned into tetrahedra.
Our virtual video camera view is interpolated by warping and compositing the four video frames of the enclosing tetrahedron
(blue) in real-time.
We show that special care has to be taken to create the notion of a static virtual camera if the temporal dimension is involved.
In the following section, we highlight related work before we describe our navigation space in Sect. 3. We discuss
implications and potential pitfalls in Sect. 4. Practical implementation issues of our processing pipeline (Fig. 2) are
addressed in Sect. 5, outlining existing methods employed
and explaining important necessary adaptations. To demonstrate the versatility of our approach and to document rendering quality, we present results of our Virtual Video Camera
system for a number of real-world scenes featuring different
challenges in Sect. 6. We discuss remaining limitations of
our system before we conclude in Sect. 7 with a summary
and outlook on intriguing follow-up research.
2. Related Work
Free-viewpoint navigation around real-world, dynamic
scenes has recently received considerable attention. In general, any image-based rendering method originally developed for static scenes can be applied to dynamic scene
content by considering each time frame individually. For
example, light field rendering [LH96] can be directly extended to dynamic scenes [FT02,MP04]. To cover some useful view angle range at acceptable image quality, however,
large numbers of densely packed video cameras are necessary [MB95, YEBM02, WJV∗ 05, VGvT∗ 05].
Wider camera spacings become possible only if more
elaborate view interpolation approaches are employed. Frequent use is made of the epipolar constraint, reducing the 2D
correspondence problem between synchronized images from
different cameras to a one-dimensional line search. Based
on epipolar geometry, either dense depth/disparity maps or
complete 3D geometry models are estimated from the input
image data prior to rendering. Seitz and Dyer [SD96] determine the fundamental matrix to estimate dense disparity and warp-interpolate between two views of static scenes. Manning and Dyer [MD99] extend the approach to dynamic scenes and unsynchronized images for the case of linearly moving scene objects that can be segmented into layers. For rigid-body objects, Xiao and Shah [XS04] describe an interpolation method for three images. All these approaches, however, neglect occlusion/disocclusion. In their Video View Interpolation system, Zitnick et al. [ZKU∗ 04] take occlusion
effects into account. Based on dense depth maps, the system
requires synchronized and calibrated multi-video footage
which is acquired using a custom-built multi-camera system.
Hornung and Kobbelt [HK09] reconstruct a particle point
cloud from an unordered image collection and use this representation for their rendering. Ballan et al. use a combination
of approximate scene geometry reconstruction and billboard
representation of a single moving actor [BBPP10]. They rely
on the observation that the actor’s silhouette looks identical
from various viewpoints; otherwise, cross-fading artifacts become visible. Fitzgibbon et al. [FWZ03] circumvent the necessity of depth estimation by reconstructing color instead of
geometry. Space-Time Light Field Rendering overcomes the
dependence on synchronized recordings [WY05, WSY07].
The system operates on unsynchronized video streams and is
capable of interpolating in space and time. The input images
are warped twice, first to a common virtual time before Unstructured Lumigraph Rendering [BBM∗ 01] is applied in a
second step. Unfortunately, the approach requires considerable computation time and takes several minutes per output
frame.
Most depth/disparity-based systems perform viewpoint
interpolation between recording camera positions only.
For unrestricted viewpoint navigation through 3D space,
complete 3D models of dynamic scene geometry are
needed [MBR∗ 00, CTMS03, WWC∗ 05, SH07, dAST∗ 08].
While most systems perform only spatial viewpoint interpolation, Vedula et al. [VBK05] describe a volume-based
approach that is capable of interpolating in space and time
by estimating 3D geometry and motion. In general, rendering quality is highly dependent on 3D model accuracy; for
dynamic scene content whose geometry cannot be reconstructed or modeled well, rendering results are prone to artifacts.
Figure 2: Virtual Video Camera processing pipeline: multi-video data (I) is color-corrected first (Ic). To embed the video frames into navigation space {vj}, the user specifies the master camera cm and a common ground plane G. Extrinsic camera parameters (R, p) and the time offsets (t) are automatically estimated. Adjacency of video frames induces a tetrahedralization ({λk}) of the navigation space, and dense correspondences ({Wij}) are estimated along the edges of the tetrahedralization. We allow for manual correction of spurious correspondence fields. After these off-line processing steps, the navigation space can be interactively explored (viewing directions ϕ, θ, time t) by rendering the virtual view Iv in real time.
A generally applicable, feature-based method for interpolating between two different images is presented by Beier and Neely [BN92]. Chen and Williams show how
general image interpolation can be used for view interpolation [CW93]. For improved rendering performance,
McMillan and Bishop propose a planar-to-planar, forward
mapped image warping algorithm [MB95]. Mark et al.
adapt the method to achieve high frame rates for post-rendering warping [MMB97], while Zhang et al. apply feature-based morphing to light fields [ZWGS02]. Lee et al. extend the feature-based method presented by Beier and Neely [BN92] to more than two images [LWS98]. Based on visually plausible, dense image correspondence fields, warping/morphing-based image interpolation is considerably more flexible than
epipolar-constrained view interpolation; calibration imprecision, unsynchronized multi-video footage, or geometry
inaccuracies do not hamper correspondence-based interpolation. In addition, plausible image correspondence fields
can often still be established for scenes for which depth
or 3D geometry is hard to come by [SLW∗ 08]. Recently,
Mahajan et al. presented a path-based interpolation for image pairs that operates in the gradient domain and prevents
ghosting/blurring and many occlusion artifacts visible in
morphing-based methods [MHM∗ 09]. Other approaches to
combat occlusion artifacts have been proposed by [HSB09]
and [IK08].
In their Siggraph’93 paper, Chen and Williams mention
image interpolation in a higher-dimensional space of different acquisition attributes [CW93]. Our Virtual Video Camera
system picks up on this idea and suitably extends it to view
interpolation in the spatial and temporal domain.
3. Navigation space embedding
Our goal is to explore the captured scene in an intuitive way
and render a (virtual) view Iv of it, corresponding to a combination of viewing direction and time. To this end, we choose
to define a 3-dimensional navigation space N that represents spatial camera coordinates as well as the temporal dimension. In their seminal paper, Chen and Williams [CW93]
propose to interpolate the camera rotation R and position p
directly in 6-dimensional hyperspace. While this is perfectly
feasible in theory, it has several major drawbacks in practice: it neither allows for intuitive exploration of the scene by
a user, nor is it practical to handle the amount of
data needed for interpolation in this high-dimensional space.
Additionally, cameras would have to be arranged in a way
that they span an actual volume in Euclidean space.
Figure 3: Navigation space: we define a sphere S around the scene. For the center pS of S, we least-squares-determine the point closest to the optical axes of all cameras (green lines). The user selects three points g1, g2, g3 to define the ground plane (yellow circle). We take the normal of the plane as the up vector of the scene, and thus as the rotation axis of the sphere. The embedding is uniquely defined by labeling one of the cameras as the master camera cm.
Figure 4: Two possibilities to partition the camera setup of the Breakdancer sequence. Both images depict the camera setup seen from above. When partitioning the Euclidean space directly (left), several problems arise. Although an actual volume is spanned by the cameras, many tetrahedra are degenerate. If the virtual camera vc is reconstructed, captured images from cameras cd, ce, cf and ch as well as the wide-baseline correspondence fields between them are required. Using our navigation space embedding (right), interpolation only takes place between neighboring cameras. Less data has to be processed and correspondence estimation is much easier. Additionally, spatial navigation is intuitively simplified: instead of a 3D position, only the position on the one-dimensional arc (light blue) has to be specified. The small spatial error (distance between vc and the line segment between cf and cg) is negligible even for ad-hoc setups.
With this requirement, it would be hard to devise an arrangement
of cameras where they do not occlude each other’s view of
the scene. The crucial design decision in our system is hence
to map the extrinsic camera parameters to a lower dimensional space that allows intuitive navigation. The temporal
dimension already defines one axis of the navigation space
N , leaving two dimensions for parameterizing the camera
orientation and position. Practical parametrizations that allow for realistic view interpolation are only possible if the
cameras’ optical axes cross at some point (possibly at infinity).
A natural choice for such an embedding is a spherical
parametrization of the camera setup. While for example a
cylindrical embedding or an embedding in a plane is also
feasible, a spherical embedding allows for all reasonable
physical camera setups, ranging from cameras arranged in a
1-dimensional arc, over cameras placed in a spherical setup
to linear camera arrays with parallel optical axes in the limit.
Regarding existing data sets, it is obvious that spherical/arc-shaped setups are the predominant multi-view capture scenarios (e.g. [ZKU∗ 04, dAST∗ 08]). Even in unordered image collections it can be observed that a majority of camera
positions is distributed in arcs around certain points of interest [SGSS08]. Other placements, such as panoramic views,
are also possible, but would suffer from the small overlap
of the image regions of cameras. For all results presented
in this paper, we hence employ a spherical model with optical axes centered at a common point. Camera setups such as
Bézier splines or patches are also feasible and our approach
may adapt to any setup as long as a sensible parametrization
can be obtained. As the extension of our approach to these
settings is straightforward, we will not discuss it in detail.
Cameras are placed on the surface of a virtual sphere; their orientations are defined by azimuth ϕ and elevation θ. Together with the temporal dimension t, ϕ and θ span our 3-dimensional navigation space N. If cameras are arranged
in an arc or curve around the scene, θ is fixed, reducing N
to two dimensions. As this simplification is trivial, we will
only cover the three dimensional case in our discussion. In
contrast to the conventional partition of space suggested by
Chen and Williams [CW93], we thus restrict the movement
of the virtual camera to a subspace of lesser dimensionality
(2D approximate spherical surface or 1D approximated arc).
Although this might appear as a drawback at first sight, several advantages arise from this crucial design decision, see
Fig. 4:
1. The amount of correspondence fields needed for image
interpolation is reduced significantly, making both preprocessing and rendering faster.
2. An unrestricted partition of Euclidean space leads to degenerated tetrahedra. Especially when cameras are arranged along a line or arc, adjacencies between remote
cameras are established for which no reliable correspondence information can be obtained.
3. Our parametrization of the camera arrangement provides
an intuitive navigation around the scene.
To define our navigation space N , we assume that we know
ground-truth extrinsic camera parameters R and p for every
camera, as well as a few point correspondences with their 3D
world coordinates (in Sect. 5.2 we describe how we obtain
sufficiently accurate calibration parameters). For a specific
virtual image Iv , we want to interpolate the image at a given
point in navigation space defined by the two spatial parameters ϕ and θ as well as recording time t . To serve as sampling
points, the camera configuration of our recorded multi-video
input in Euclidean world space is embedded into navigation
space N:

Ψ : (R, p, t) ↦ (ϕ, θ, t).
In our system, Ψ is simply a transformation from Cartesian
coordinates to spherical coordinates, where the sphere center pS and the radius of the sphere rS are computed from
the cameras’ extrinsic parameters R and p in a least-squares
sense. The embedding is uniquely defined by specifying a
ground plane in the scene and by labeling one of the cameras as the master camera cm , cf. Fig. 3.
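The embedding Ψ can be illustrated with a short sketch (this is not the paper's implementation; all function and variable names are illustrative). It assumes unit-length optical-axis directions and a unit ground-plane normal. The sphere center pS is the least-squares point closest to all optical axes (cf. Fig. 3), and each camera position and time stamp is then mapped to (ϕ, θ, t) relative to a reference direction derived from the master camera cm.

```python
import numpy as np

def closest_point_to_lines(origins, directions):
    """Least-squares point closest to a set of lines (the cameras' optical axes).

    origins:    (N, 3) camera centers
    directions: (N, 3) unit viewing directions
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to this axis
        A += P
        b += P @ o
    return np.linalg.solve(A, b)         # sphere center p_S

def embed(p, t, p_S, up, ref):
    """Psi: (R, p, t) -> (phi, theta, t) for one video frame (illustrative).

    p:   camera position,  t: recording time
    p_S: sphere center,    up: unit ground-plane normal (rotation axis)
    ref: unit direction towards the master camera c_m, projected into the
         ground plane, which fixes phi = 0
    """
    v = p - p_S
    r = np.linalg.norm(v)
    theta = np.arcsin(np.clip(np.dot(v, up) / r, -1.0, 1.0))              # elevation
    v_pl = v - np.dot(v, up) * up                                         # project into ground plane
    phi = np.arctan2(np.dot(np.cross(ref, v_pl), up), np.dot(ref, v_pl))  # signed azimuth
    return phi, theta, t
```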
It is evident that the recording hull spanned by all camera positions is, in general, only a crude approximation to a spherical surface. Fortunately, we found that non-spherical camera arrangements do not cause any visually noticeable effect during rendering.
Figure 5: Surface tessellation: for the spherical coordinates of four cameras ca, ..., cd, two possible tessellations {∆1, ∆2} and {∆3, ∆4} exist. During space-time interpolation, the configuration of the initial tessellation may not change to avoid interpolation discontinuities.
4. Spacetime tetrahedralization
In order to interpolate viewpoints, Chen and Williams propose to generate some arbitrary graph to partition the space
such that every possible viewpoint lies in a d-simplex and
can be expressed as a linear combination of the d + 1 vertices of the enclosing simplex [CW93]. When including
the temporal dimension, however, an arbitrary graph leads
to rendering artifacts due to inconsistent mappings between
Euclidean space and navigation space.
4.1. Naïve tetrahedralization

Our navigation space consists of two view-directional dimensions and the temporal dimension. We subdivide the navigation space into tetrahedra λ such that each embedded video frame represents one vertex [LASM08]. We interpolate any virtual camera view from the enclosing tetrahedron λ = {vi}, i = 1 . . . 4. As already mentioned, arbitrary partitions are feasible in static scenes. However, our spherical approximation and/or calibration imprecision will lead to interpolation errors in practice if the temporal dimension is included.

To prove this point, let us assume that we have captured a dynamic scene with four static, unsynchronized cameras ca, cb, cc and cd. As we stated in Sect. 3, our view-directional subspace relates to the spherical camera placement. To be more exact, it represents an approximation of a spherical surface; examples of possible approximations are given in Fig. 5, upper right and lower right. We will show that although the chosen spherical approximation is arbitrary, it is crucial to enforce consistency of the approximation when the temporal dimension is introduced, even if ground-truth camera parameters R and p are available.

Imagine now that the user intends to re-render the scene from a static virtual camera. The position of this camera is somewhere between the four original cameras, so that the new view has to be interpolated. Because the camera is static in navigation space, ϕ and θ stay fixed while t is variable, such that the virtual camera moves along a line l in navigation space, Fig. 6. Assume that the navigation space has been partitioned by applying a standard Delaunay tetrahedralization [ES96] to the set of navigation space vertices via, vib, vic, vid, where i indexes the temporal dimension. Now consider the camera path l. In line section l1 (green), images associated with cameras ca, cb and cc are used for interpolation. The virtual camera thus represents a point that lies on plane ∆3 in the original Euclidean space, Fig. 5 (lower right). In line section l2 (blue), however, the virtual camera is interpolated from all four cameras, i.e., it moves within the volume spanned by the four different cameras in Euclidean world space, Fig. 6(a). This violates the notion of a static virtual camera. In line section l3 (red), the virtual camera is represented by a point in Euclidean world space situated on plane ∆1, Fig. 5 (upper right). Since ∆1 and ∆3 are not coplanar, the position of the virtual camera in Euclidean space is different in both line sections, again violating the static camera assumption.

In navigation-space interpolation, this error can be avoided if any two points vt = (ϕ, θ, t) and vt+δ = (ϕ, θ, t + δ) re-project to the same point in Euclidean space, i.e., if the virtual camera is static in Euclidean space. Instead of building an arbitrary graph structure on the navigation space vertices, we apply a combination of constrained and unconstrained Delaunay tessellations to the navigation space vertices.

4.2. A constrained tetrahedralization algorithm

To obtain a temporally consistent tetrahedralization, we start with an unconstrained Delaunay tessellation on the sphere, i.e., we neglect the temporal dimension for now. This initial tessellation must be preserved, i.e., the spatial ordering of cameras must stay the same for all recorded frames of the video streams. Referring back to the example given above, we arrive at the tessellation shown in Fig. 7 (left). The edges of this tessellation serve as boundaries that help maintain the initial tessellation. We then add the temporal dimension. The initial tessellation of the sphere is duplicated and displaced along the temporal axis. In order to maintain the structure of the tessellation computed so far, we add boundary faces for each edge of the initial tessellation, e.g., we add a face (v1b, v2b, v2d) in the example above, Fig. 7 (right).

A constrained Delaunay tessellation can then be computed on this three-dimensional point cloud in navigation space [SG05].
Figure 6: Unconstrained vs. constrained tessellation of the navigation space: with an unconstrained tetrahedralization (a), tetrahedra λ1, λ2 and λ3 are created. A static virtual camera moves along line l and passes through all three tetrahedra. When comparing the line sections l1 (green), l2 (blue) and l3 (red), the point in navigation space relates to different points in Euclidean space; thus, the virtual camera seems to move. By introducing boundary faces, such errors can be avoided (b). The virtual camera passes solely through tetrahedra (λ4, λ6) which are spanned by the same three cameras and is not influenced by v2c (tetrahedron λ5 is never used). Therefore, the virtual camera remains static in Euclidean space, as intended.
In our example, the added boundary faces separate the data in such a way that every tetrahedron corresponds to one of the two planes in Euclidean space, i.e., either to ∆1 or ∆2 in Fig. 5, but never to ∆3 or ∆4. All tetrahedra also comprise four images that were captured by exactly three cameras. If each image vertex in a tetrahedron was captured by a different camera, the situation shown in line section l2 (Fig. 6(a)) would arise.
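As a concrete illustration of this boundary-respecting structure, the following sketch builds a navigation-space tetrahedralization under a simplifying assumption that the paper does not make: every camera contributes frames at common indices 0, ..., n−1 (the paper instead runs a constrained Delaunay tetrahedralization [SG05] on the actual, unsynchronized time stamps). The 2D Delaunay triangulation of the (ϕ, θ) camera positions is kept fixed, each camera-triangle prism between consecutive time steps is split into three tetrahedra, and consequently every tetrahedron mixes frames of exactly three cameras, as argued above. All names are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def navigation_space_tetrahedra(cam_phi_theta, n_frames):
    """Boundary-respecting tetrahedralization of the navigation space.

    cam_phi_theta: (C, 2) embedded camera coordinates (phi, theta)
    n_frames:      number of common frame indices per camera (illustrative
                   simplification of the unsynchronized case)
    Returns a list of tetrahedra, each given as four (camera, frame) vertices.
    """
    tri2d = Delaunay(cam_phi_theta)                # fixed spatial ordering of the cameras
    tets = []
    for simplex in tri2d.simplices:
        a, b, c = sorted(int(i) for i in simplex)  # consistent ordering keeps shared
                                                   # prism faces conforming
        for k in range(n_frames - 1):
            A0, B0, C0 = (a, k), (b, k), (c, k)            # triangle at time k
            A1, B1, C1 = (a, k + 1), (b, k + 1), (c, k + 1)
            # Split the triangular prism into three tetrahedra. Each one only
            # mixes frames of the same three cameras, so a fixed (phi, theta)
            # always stays on one spatial triangle while t varies.
            tets.append([A0, B0, C0, A1])
            tets.append([B0, C0, A1, B1])
            tets.append([C0, A1, B1, C1])
    return tets

def barycentric(p, verts):
    """Barycentric weights mu_i of a navigation-space point p = (phi, theta, t)
    inside a tetrahedron with vertex coordinates verts (4 x 3)."""
    verts = np.asarray(verts, dtype=float)
    T = (verts[:3] - verts[3]).T
    mu = np.linalg.solve(T, np.asarray(p, dtype=float) - verts[3])
    return np.append(mu, 1.0 - mu.sum())
```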
The undesired flipping along the temporal dimension also
occurs when working with dynamic cameras. Here we assume that the cameras move, but their relative positions do not change, i.e., we do not allow cameras to switch positions.
In this case, the same reasoning as presented above for static
cameras applies. However, now the notion of a static virtual camera is defined with respect to the moving input cameras. By doing so, sudden changes of the spherical surface
approximation can be prevented, just as in the case of stationary recordings. The only limitation is that when cameras
move in Euclidean space, the approximated spherical surface also changes. In these cases our approximation error,
i.e., the distance of the rendered virtual camera to our idealized spherical camera arrangement, does not remain fixed. In general, this is only noticeable with fast-moving cameras and can usually be neglected for small camera movements.
The same applies to imprecise camera calibration. Small errors in R and p are inevitable, but usually do not manifest
in visible artifacts. For assessment, we refer to Sect. 6 and
to our accompanying video. In cases where the scene center
is actually moving (e.g., at outdoor sports events) and cameramen are following it, we suggest using a time-dependent camera embedding.
5. Processing Pipeline
Based on the above-described navigation space tetrahedralization, we propose a fully functional processing pipeline for
free-viewpoint navigation from unsynchronized multi-video
footage. Our processing pipeline makes use of known techniques and suitably adapts them to solve the encountered
problems.
Figure 7: Boundary faces (green): A two-dimensional unconstrained Delaunay tessellation reveals the boundary
edges between cameras (left). In three-dimensional navigation space (right), boundary faces (green triangles) are inserted for each boundary edge. The virtual static camera pl
that moves on a line l in navigation space always lies in
tetrahedra constructed from ca , cb and cd , as postulated in
Fig. 5.
5.1. Acquisition
To acquire multi-video data, we use up to 16 HDV Canon
XHA1 camcorders (1440 × 1080 pixels, 25 fps). The captured sequences are internally MPEG-compressed, stored on DV tape, and later transferred to a standard PC. This setup is very flexible, easy to set up, runs completely on batteries, and is suitable for indoor and outdoor use. Static setups using tripods as well as setups with dynamic hand-held cameras are possible. Adjacent cameras need only have sufficient view overlap to facilitate correspondence estimation. For convincing interpolation results, the angle between neighboring cameras should not exceed roughly 10 degrees in either the vertical or horizontal direction. Fig. 8 shows some typical camera
configurations.
5.2. Pre-processing
Color correction. To correct for color balance differences among cameras, we use the master camera cm defined in the embedding stage and apply the color correction approach presented by Snavely et al. to all video frames [SSS06].
Figure 8: Camera setups: our approach works with off-the-shelf consumer-grade camcorders, mounted either on tripods or hand-held.
Camera calibration. We need to determine R and p of
the camera setup to define the mapping Ψ from world
space coordinates to navigation space N , Sect. 3. Recent
structure-from-motion algorithms for unordered image collections [GSC∗ 07, SK09, LBE∗ 10] solve this problem robustly and can also provide a set of sparse world space points
needed for constructing the common ground plane for our
navigation space. We found that these algorithms yield robust
results also for dynamic scenes. For dynamic camera scenarios (i.e., handheld, moving cameras), R and p have to be
computed for every frame of each camera.
Temporal registration. The mapping Ψ additionally needs
the exact recording time t of each camera. We estimate
the sub-frame temporal offset by recording a dual-tone sequence during acquisition and analyzing the audio tracks afterwards [HRT∗ 09]. If recording a separate audio track is not
feasible, pure post-processing approaches [MSMP08] can be
employed instead. Please note that after temporal registration, the recorded images are still not synchronized: since the camcorder recordings were triggered manually, a sub-frame offset remains, which prevents exploiting the epipolar constraint in the presence of moving objects.
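For intuition only, the following sketch shows the basic idea behind audio-based registration: cross-correlating two audio tracks yields their relative offset, and because audio is sampled far more densely than video, the offset is known to sub-frame precision. This illustrates the principle, not the dual-tone method of [HRT∗ 09]; it assumes both tracks share the sample rate and overlap in content, and the names are illustrative.

```python
import numpy as np

def audio_offset_seconds(audio_a, audio_b, sample_rate):
    """Estimate by how many seconds recording b started later than recording a
    by locating the peak of the normalized cross-correlation of their audio."""
    a = (audio_a - audio_a.mean()) / (audio_a.std() + 1e-12)
    b = (audio_b - audio_b.mean()) / (audio_b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full")     # lags -(len(b)-1) .. len(a)-1
    lag = int(np.argmax(corr)) - (len(b) - 1)  # offset in audio samples
    return lag / float(sample_rate)

# usage (illustrative): t_offset = audio_offset_seconds(track_cam1, track_cam2, 48000)
# the estimated offset shifts that camera's frames along the time axis t
```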
Figure 9: Manual correction of correspondence fields:
Matching ambiguities in difficult cases (e.g., fast motion in
Breakdancer scene) are resolved manually. The user draws
corresponding lines onto both images (purple). They serve
as a correspondence prior for the matching process, which
would otherwise only include automatic correspondence estimation of edge pixels (red). Where necessary, interaction
time per image pair is typically less than 1 minute.
Dense correspondence field estimation. In order to interpolate between two images Ii, Ij, we need bidirectional dense correspondence maps Wij for each tetrahedral edge in navigation space N, Sect. 3. In our system, we employ the algorithm proposed by Stich et al. for dense correspondence estimation [SLW∗ 08, SLW∗ 10]. For most image pairs, the results are perceptually convincing, and if not, the algorithm accepts manual corrections, see Fig. 9. Alternatively, our system would work just as well with any other correspondence estimation algorithm [ZJK05, ST06, SC08, XCJ08, LLNM10].

5.3. Rendering

Having subdivided navigation space N into tetrahedra, each point v is defined by the vertices of the enclosing tetrahedron λ = {vi}, i = 1 . . . 4. Its position can be uniquely expressed as v = ∑_{i=1}^{4} µi vi, where the µi are the barycentric coordinates of v. Each of the 4 vertices vi of the tetrahedron corresponds to a recorded image Ii. Each of the 12 edges eij corresponds to a correspondence map Wij that defines a translation of a pixel location x on the image plane. We are now able to synthesize a novel image Iv for every point v inside the recording hull of the navigation space N by multi-image interpolation:

Iv = ∑_{i=1}^{4} µi Ĩi,

where

Ĩi( Πi(x) + ∑_{j=1,...,4, j≠i} µj Πj(Wij(x)) ) = Ii(x)

are the forward-warped images [MMB97]. {Πi} is a set of re-projection matrices that map each image Ii onto the image plane of Iv, as proposed by Seitz and Dyer [SD96]. These matrices can easily be derived from the camera calibration. Since the virtual image Iv is always oriented towards the center of the scene, this re-projection corrects the skew of optical axes potentially introduced by our loose camera setup and also accounts for jittering introduced by dynamic cameras. Image re-projection is done on the GPU without image data resampling. We handle occlusion/disocclusion on-the-fly based on correspondence field heuristics as proposed by Stich et al. [SLAM08].
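A minimal CPU sketch of this interpolation step is given below. It ignores the re-projection matrices Πi (assumed to be the identity), uses nearest-pixel splatting instead of the GPU mesh warp, and assumes color images of shape (h, w, 3) with displacement fields flows[i][j] of shape (h, w, 2) holding Wij; it illustrates the formula above and is not the paper's renderer.

```python
import numpy as np

def forward_warp(I, disp):
    """Nearest-pixel forward splat of image I (h, w, 3) along a per-pixel
    displacement field disp (h, w, 2) given as (dx, dy)."""
    h, w = I.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + disp[..., 0]).astype(int)
    yt = np.rint(ys + disp[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    out = np.zeros_like(I, dtype=float)
    hit = np.zeros((h, w), dtype=bool)
    out[yt[valid], xt[valid]] = I[valid]       # last write wins (no occlusion ordering)
    hit[yt[valid], xt[valid]] = True
    return out, hit

def interpolate_view(images, flows, mu):
    """I_v = sum_i mu_i * warped(I_i), where pixel x of image i is pushed to
    x + sum_{j != i} mu_j * W_ij(x)   (Pi_i omitted for brevity)."""
    h, w = images[0].shape[:2]
    acc = np.zeros_like(images[0], dtype=float)
    wgt = np.zeros((h, w, 1), dtype=float)
    for i in range(4):
        disp = np.zeros((h, w, 2), dtype=float)
        for j in range(4):
            if j != i:
                disp += mu[j] * flows[i][j]    # W_ij displacement field
        warped, hit = forward_warp(images[i], disp)
        acc += mu[i] * warped * hit[..., None]
        wgt += mu[i] * hit[..., None]
    return acc / np.maximum(wgt, 1e-6)         # renormalize where some splats missed
```

The "last write wins" splat above is exactly the ambiguity that the occlusion heuristic described next resolves.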
Figure 10: Virtual video editing: to interactively create arbitrary virtual video camera sequences, the user moves the
camera by click-and-drag movements in the rendering window (top). The spline curve representing the space-time
camera path is automatically updated and visualized in the
navigation space window (bottom).
Given the images recorded by two neighboring cameras at roughly the
same point in time, we evaluate the x-component of the correspondence fields leading from the left camera image to the
right camera image. Assuming that inter-camera motion is
generally larger than inter-frame motion, we use a simple
visibility ordering heuristic. An object located at the center
of the scene will exhibit a horizontal motion vector close to
zero; objects in the foreground move to the right and objects
in the back move to the left. When two regions of a warped
image overlap, i.e., two pixel fragments are warped onto the
same location, a simple comparison of their respective motion vectors thus resolves this ambiguity.
Disocclusions are detected by calculating local divergence
in the correspondence fields. If any two neighboring pixels
exhibit a difference of more than 4 pixels in their motion, triangles connecting them are discarded in a geometry shader.
Hole filling is done during the blending stage where image
information from the other three warped images is used. As
a last step, the borders of the rendered images are cropped
(10% of the image in each dimension), since no reliable correspondence information is available for these regions.
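For clarity, the two heuristics can be written down compactly; the sketch below is a CPU illustration (the system performs these tests per fragment and in a geometry shader), and the 4-pixel threshold is the one quoted above. Names are illustrative.

```python
import numpy as np

def depth_proxy(flow_left_to_right):
    """Visibility-ordering heuristic: for two spatially neighboring cameras, the
    horizontal component of the left-to-right correspondence field acts as a
    depth proxy (foreground moves right, background moves left)."""
    return flow_left_to_right[..., 0]

def resolve_overlap(fragments):
    """When several pixel fragments are splatted onto the same target location,
    keep the one with the largest depth proxy, i.e. the foreground fragment.
    fragments: list of (color, proxy) pairs."""
    return max(fragments, key=lambda f: f[1])[0]

def disocclusion_mask(disp, max_stretch=4.0):
    """Mark pixels whose displacement differs from a direct neighbor's by more
    than max_stretch pixels; triangles across such pixels are discarded and the
    resulting holes are filled from the other three warped images."""
    dx = np.linalg.norm(np.diff(disp, axis=1), axis=-1)   # horizontal neighbor differences
    dy = np.linalg.norm(np.diff(disp, axis=0), axis=-1)   # vertical neighbor differences
    mask = np.zeros(disp.shape[:2], dtype=bool)
    mask[:, :-1] |= dx > max_stretch
    mask[:, 1:] |= dx > max_stretch
    mask[:-1, :] |= dy > max_stretch
    mask[1:, :] |= dy > max_stretch
    return mask
```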
At 940 × 560 pixels output resolution, rendering frame
rates exceed 25 fps on an NVIDIA GeForce 8800 GTX. We
store images and correspondence fields in local memory and
use a pre-fetching scheme for image data.
5.4. Applications
One application is the creation of virtual camera movements
during post-production. Arbitrary panning, time-freeze, and
slow motion shots can be defined and rendered. We implemented a graphical user interface that allows intuitive exploration of our navigation space and allows the user to compose arbitrary spacetime camera movements interactively.
Figure 11: Interactive Spacetime Navigator: while watching
the scene, the user can control playback speed and view perspective by clicking and dragging within the video window.
The user interface makes use of common 3D animation concepts. A visualization of the navigation space is presented,
and the current virtual camera view is rendered in real-time,
Fig. 10. The user can continuously change viewing direction and time by simple click-and-drag movements within
the rendering window. Camera paths are defined by placing,
editing, or deleting control points in navigation space at the
bottom of the screen. The virtual video camera path through
spacetime is interpolated by Catmull-Rom splines [CR74].
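A uniform Catmull-Rom segment between two control points can be evaluated with the standard closed form; the sketch below samples a user-defined path of (ϕ, θ, t) control points. Function and parameter names are illustrative and not taken from our implementation.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, s):
    """Uniform Catmull-Rom interpolation between p1 and p2 for s in [0, 1];
    the points are navigation-space coordinates (phi, theta, t)."""
    s2, s3 = s * s, s * s * s
    return 0.5 * ((2.0 * p1) +
                  (-p0 + p2) * s +
                  (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * s2 +
                  (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * s3)

def sample_path(control_points, samples_per_segment=30):
    """Densely sample a virtual camera path through navigation space."""
    P = [np.asarray(p, dtype=float) for p in control_points]
    path = []
    for i in range(1, len(P) - 2):                       # interior segments only
        for s in np.linspace(0.0, 1.0, samples_per_segment, endpoint=False):
            path.append(catmull_rom(P[i - 1], P[i], P[i + 1], P[i + 2], s))
    path.append(P[-2])                                   # end at the last interior control point
    return np.array(path)
```

Each sampled (ϕ, θ, t) is then handed to the renderer described in Sect. 5.3.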
Another application is our interactive Spacetime Navigator, Fig. 11. While the interface bears much resemblance
to classic media players, the user can move the camera by
clicking and dragging the mouse cursor within the rendering
window and observe the scene from arbitrary viewpoints.
In addition, playback speed can be varied arbitrarily. The
Spacetime Navigator and the Firebreather test scene are
available online [Com10].
6. Results
We evaluated our system on a variety of real-world scenes,
each one posing different challenges to the processing
pipeline, Table 1. For each test scene, Fig. 12 depicts an
interpolated view at the barycenter of one navigation-space
tetrahedron. The barycenter represents the point where all
four video images are warped and weighted equally, thus representing the case most likely to show rendering artifacts.
Because still images are not able to convey the full impression of interactive viewpoint navigation, we refer the reader
to the accompanying video and the interactive Spacetime
Navigator [Com10]. Besides high-quality image interpolation in space and time, the design of our navigation space
allows intuitive exploration of the recorded spacetime. As
such, our system lends itself as a valuable tool for intuitive
design of visual effects such as time-freeze and slow motion in a post-process. Besides scene-specific challenges, the
system must also cope with lens distortion, camera noise,
and compression artifacts inherent to consumer-grade camcorders. The Juggler sequence was recorded using hand-held
camcorders and the cameramen were moving about 0.5 meters to the left during acquisition. Nevertheless, we are able to synthesize a static virtual view. The small geometrical error introduced by the non-stationary approximation of the camera arc manifests in a slight drift of the virtual camera, yet it is hardly noticeable. In order to remedy this undesired effect, further stabilization techniques could be integrated (e.g., [LGJA09]). The Breakdancer sequence is provided for visual comparison to the approach of Zitnick et al. [ZKU∗ 04]. As their data set is closely sampled in the spatial domain (the angle between adjacent cameras is approximately 3 degrees), no user interaction was required for inter-camera correspondence field estimation. Please note that our
video is not perfectly synchronized to the original video,
since the original spacetime trajectory is unknown. We also
decided not to imitate the original cross-blending effect,
since it distracts from potential rendering artifacts. While
obtaining comparable visual quality, our method can also interpolate along the temporal dimension. Unfortunately, the
data set was captured at only 15 fps and exhibits very fast
rotational movement, which violates the assumption of linear motion. Still, spatiotemporal image interpolation can be
achieved. However, most of the inter-frame foreground correspondences had to be edited by hand, since the automatic
matching could not cope with the fast motion of the breakdancer.
Limitations. Similar to depth/disparity-based free-viewpoint navigation systems [ZKU∗ 04], our virtual
viewpoint is spatially restricted: we can viewpoint-navigate
on the hull spanned by all camera recording positions,
looking at the scene from different directions, but we
cannot, for example, move into the scene or fly through the
scene.
Output rendering quality obviously depends on the visual plausibility of the correspondence fields. While we
found the automatic, pair-wise correspondence estimation
algorithm by Stich et al. [SLW∗ 08, SLW∗ 10] to yield
convincing and robust results overall, we explicitly allow for additional human interaction to correct remaining
spurious correspondences. While for special scenarios other
specifically tailored matching algorithms might yield better
results, we incorporated a matching algorithm that provides
robust results for most scenes. For the computation of a
correspondence field, our non-optimized C++ code took
about 2 to 5 minutes per image pair, depending on image
resolution and scene content (i.e., amount of visible edges).
Correspondence correction typically takes about one minute
per video frame pair. In some cases, manual correction was
not feasible. E.g., the Beer scene features single streaks of
foam that cannot be matched manually. In the case of fluids,
i.e., in the Water sequence, we observed that automatic
feature matching works quite well at freeze-and-rotate
shots. In amorphous image regions, no edges are matched and a
simple blending of color values also yields plausible results.
Bearing in mind that the short Firebreather sequence
contains more than 3000 pairwise correspondence fields, the
vast majority of correspondences in the presented results is
computed without user interaction. The same is true for the
rendered videos, e.g., the trajectory of the virtual camera in
the Dancer scene requires several hundred correspondence
fields. We experienced that editing more than one or two
dozen correspondence fields per scene is not feasible. Small
uncorrected inaccuracies in automatically computed fields
become manifest in cross-fading/ghosting artifacts visible
in some scenes of the accompanying video.
In our system, occlusion is handled by heuristics as
proposed by Stich et al. [SLW∗ 08], disocclusion is handled by mesh cutting based on the connectedness of the
correspondence fields [MMB97]. We hence rely on the
correctness of the computed correspondence fields. In
cases where our occlusion heuristic fails, visible rendering
artifacts occur at occlusion borders, e.g., the small ball in
the Juggler scene and the silhouette of the skateboarder
are not preserved faithfully. More sophisticated occlusion
handling techniques, such as presented by [HSB09], [IK08]
or [MHM∗ 09] might improve rendering quality in these
cases.
For high-quality rendering results, we observed that the
angle between adjacent cameras should not exceed 10
degrees, independent of scene content, Table 1. For greater
distances, missing scene information appears to become too
large to still achieve convincing interpolation results. The
same is true for too fast scene motion. As a rule of thumb,
we found that across space or time, scene correspondences
should not be farther apart than approximately 10% of linear
image size.
7. Conclusion
We have presented an image-based rendering system to
interactively viewpoint-navigate through space and time
of general real-world, dynamic scenes. Our system accepts unsynchronized multi-video data as input. A modest
number of cameras suffices to achieve convincing results.
Since we do not need special acquisition hardware or timeconsuming setup procedures, we can record with hand-held
consumer camcorders in arbitrary environments, e.g., outdoors. Our approach allows for smooth interpolation of view perspective and time, i.e., for simultaneous free-viewpoint and slow motion rendering. Based on
pre-computed image correspondences, our system can handle complex scenes for which depth or geometry might be
hard to reconstruct.
Our system opens up a number of interesting future research directions. For example, ensuring correspondence consistency in a network of images awaits in-depth investigation.
Scene | Camera Setup | Challenges
Dancer | top: 4 cameras, middle: 8 cameras, bottom: 4 cameras | wide-angle lens distortion, fast motion
Water | 5 cameras in half-circular arc | refraction, reflection, specular highlights
Beer | top: 5 cameras, middle: 5 cameras, bottom: 5 cameras | high speed motion, changing topology
Firebreather | top: 4 cameras, middle: 8 cameras, bottom: 4 cameras | high dynamic contrast, over-exposure, volumetric effects
Skateboarder | 6 cameras in half-circular arc | outdoor capture, varying lighting conditions
Breakdancer | 8 cameras in half-circular arc | very fast motion, low temporal sampling, noisy acquisition
Juggler | 5 cameras in half-circular arc | outdoor capture, hand-held and moving cameras

Table 1: Camera recording arrangements. Cameras are spaced approximately 10 degrees apart in horizontal as well as vertical direction (3 degrees in the Breakdancer scene). We have deliberately chosen non-trivial test scenes to assess performance for a range of different scenarios.
Furthermore, given the number of correspondence fields, automatically judging their quality by detecting artifacts in the interpolated images is an interesting avenue
for future research. Also, integrating a path-based image
interpolation method as recently presented by Mahajan et
al. [MHM∗ 09] could turn out to be an interesting alternative to our current rendering model, given their method can
be extended to more than two images while maintaining our
real-time constraint. Recently, a 3D image stabilization technique for handheld video has been proposed that is based on
sparse geometry reconstruction [LGJA09]. In order to correct the small geometric error when using handheld cameras, this approach could be incorporated into our system in
a post-processing step. Given the amount of data our system relies on, suitable compression techniques are an important pre-requisite for distributing multi-video and correspondence field data. The image interpolation approach
lends itself to generating a wide range of image-based special effects. Besides time-freeze and slow motion already
presented here, an interpolation-based F/X editor can be conceived that allows for creating advanced effects such as motion streaks, motion distortion, space blur, and space-time
ramps [LLR∗ 10].
References

[BBM∗01] Buehler C., Bosse M., McMillan L., Gortler S., Cohen M.: Unstructured Lumigraph Rendering. In Proc. of ACM SIGGRAPH'01 (New York, 2001), ACM Press/ACM SIGGRAPH, pp. 425–432.
[BBPP10] Ballan L., Brostow G. J., Puwein J., Pollefeys M.: Unstructured video-based rendering: Interactive exploration of casually captured videos, 1–11.
[BN92] Beier T., Neely S.: Feature-based image metamorphosis. Computer Graphics (Proc. of SIGGRAPH'92) 26, 2 (1992), 35–42.
[Com10] Computer Graphics Lab, TU Braunschweig: Virtual Video Camera project website. http://graphics.tu-bs.de/projects/vvc/, Aug. 2010.
[CR74] Catmull E., Rom R.: A class of local interpolating splines. In Computer Aided Geometric Design, Barnhill R., Riesenfeld R., (Eds.). Academic Press, 1974, pp. 317–326.
[CTMS03] Carranza J., Theobalt C., Magnor M., Seidel H.-P.: Free-Viewpoint Video of Human Actors. ACM Trans. on Graphics 22, 3 (2003), 569–577.
[CW93] Chen S. E., Williams L.: View interpolation for image synthesis. In Proc. of ACM SIGGRAPH'93 (New York, 1993), ACM Press/ACM SIGGRAPH, pp. 279–288.
[dAST∗08] de Aguiar E., Stoll C., Theobalt C., Ahmed N., Seidel H.-P., Thrun S.: Performance Capture from Sparse Multi-View Video. ACM Trans. on Graphics 27, 3 (2008), 1–10.
[ES96] Edelsbrunner H., Shah N.: Incremental topological flipping works for regular triangulations. Algorithmica 15, 3 (1996), 223–241.
[FT02] Fujii T., Tanimoto M.: Free viewpoint TV system based on ray-space representation. In Proceedings of SPIE (2002), vol. 4864, SPIE, p. 175.
[FWZ03] Fitzgibbon A., Wexler Y., Zisserman A.: Image-based rendering using image-based priors. In ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision (Washington, DC, USA, 2003), IEEE Computer Society, p. 1176.
[GMW02] Goldlücke B., Magnor M., Wilburn B.: Hardware-accelerated dynamic light field rendering. Proc. of VMV'02 (Nov. 2002), 455–462.
[GSC∗07] Goesele M., Snavely N., Curless B., Hoppe H., Seitz S.: Multi-view stereo for community photo collections. ICCV'07 (2007), 1–8.
[HK09] Hornung A., Kobbelt L.: Interactive pixel-accurate free viewpoint rendering from images with silhouette aware sampling. Computer Graphics Forum 28, 8 (2009), 2090–2103.
[HRT∗09] Hasler N., Rosenhahn B., Thormählen T., Wand M., Gall J., Seidel H.-P.: Markerless Motion Capture with Unsynchronized Moving Cameras. In Proc. of CVPR'09 (Washington, June 2009), IEEE Computer Society, to appear.
[HSB09] Herbst E., Seitz S., Baker S.: Occlusion Reasoning for Temporal Interpolation using Optical Flow. Tech. rep. MSR-TR-2009-2014, Microsoft Research, 2009.
[IK08] Ince S., Konrad J.: Occlusion-aware view interpolation. EURASIP Journal on Image and Video Processing (2008).
[LASM08] Lipski C., Albuquerque G., Stich T., Magnor M.: Spacetime Tetrahedra: Image-Based Viewpoint Navigation through Space and Time. Tech. rep. 2008-12-9, Computer Graphics Lab, TU Braunschweig, 2008.
[LBE∗10] Lipski C., Bose D., Eisemann M., Berger K., Magnor M.: Sparse Bundle Adjustment Speedup Strategies. In WSCG Short Papers Post-Conference Proceedings (2010), Skala V., (Ed.), WSCG.
[LGJA09] Liu F., Gleicher M., Jin H., Agarwala A.: Content-preserving warps for 3D video stabilization. ACM Trans. Graph. 28, 3 (2009), 1–9.
[LH96] Levoy M., Hanrahan P.: Light Field Rendering. In Proc. of ACM SIGGRAPH'96 (New York, 1996), ACM Press/ACM SIGGRAPH, pp. 31–42.
[LLNM10] Lipski C., Linz C., Neumann T., Magnor M.: High resolution image correspondences for video post-production. In CVMP 2010 (London, 2010).
[LLR∗10] Linz C., Lipski C., Rogge L., Theobalt C., Magnor M.: Space-time visual effects as a post-production process. In ACM Multimedia 2010 Workshop - 3DVP'10 (Firenze, Oct. 2010), to appear.
[LWS98] Lee S., Wolberg G., Shin S.: Polymorph: morphing among multiple images. IEEE Computer Graphics and Applications (1998), 58–71.
[MB95] McMillan L., Bishop G.: Plenoptic Modeling. In Proc. of ACM SIGGRAPH'95 (New York, 1995), ACM Press/ACM SIGGRAPH, pp. 39–46.
[MBR∗00] Matusik W., Buehler C., Raskar R., Gortler S., McMillan L.: Image-Based Visual Hulls. In Proc. of ACM SIGGRAPH'00 (New York, 2000), ACM Press/ACM SIGGRAPH, pp. 369–374.
[MD99] Manning R., Dyer C.: Interpolating View and Scene Motion by Dynamic View Morphing. In Proc. of CVPR'99 (Washington, 1999), IEEE Computer Society, pp. 388–394.
[MHM∗09] Mahajan D., Huang F., Matusik W., Ramamoorthi R., Belhumeur P.: Moving Gradients: A Path-Based Method for Plausible Image Interpolation. ACM Trans. on Graphics (2009), to appear.
[MMB97] Mark W., McMillan L., Bishop G.: Post-Rendering 3D Warping. In Proc. of Symposium on Interactive 3D Graphics (1997), pp. 7–16.
[MP04] Matusik W., Pfister H.: 3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes. ACM Trans. on Graphics 23, 3 (2004), 814–824.
[MSMP08] Meyer B., Stich T., Magnor M., Pollefeys M.: Subframe Temporal Alignment of Non-Stationary Cameras. In Proc. British Machine Vision Conference BMVC '08 (2008).
[NTH02] Naemura T., Tago J., Harashima H.: Real-Time Video-Based Modeling and Rendering of 3D Scenes. IEEE Computer Graphics and Applications (2002), 66–73.
[SC08] Schoenemann T., Cremers D.: High Resolution Motion Layer Decomposition using Dual-space Graph Cuts. In Proc. of CVPR (Washington, 2008), IEEE Computer Society, pp. 1–7.
[SD96] Seitz S. M., Dyer C. R.: View Morphing. In Proc. of ACM SIGGRAPH'96 (New York, 1996), ACM Press/ACM SIGGRAPH, pp. 21–30.
[SG05] Si H., Gaertner K.: Meshing Piecewise Linear Complexes by Constrained Delaunay Tetrahedralizations. In Proc. of International Meshing Roundtable (Berlin, September 2005), Springer, pp. 147–163.
[SGSS08] Snavely N., Garg R., Seitz S. M., Szeliski R.: Finding Paths through the World's Photos. ACM Trans. on Graphics 27, 3 (2008), 11–21.
[SH07] Starck J., Hilton A.: Surface Capture for Performance Based Animation. IEEE Computer Graphics and Applications 27, 3 (2007), 21–31.
[SK09] Schwartz C., Klein R.: Improving initial estimations for structure from motion methods. In The 13th Central European Seminar on Computer Graphics (CESCG 2009) (Apr. 2009).
[SLAM08] Stich T., Linz C., Albuquerque G., Magnor M.: View and Time Interpolation in Image Space. Computer Graphics Forum (Proc. of PG'08) 27, 7 (2008).
[SLW∗08] Stich T., Linz C., Wallraven C., Cunningham D., Magnor M.: Perception-motivated Interpolation of Image Sequences. In Proc. of ACM APGV'08 (Los Angeles, USA, 2008), ACM Press, pp. 97–106.
[SLW∗10] Stich T., Linz C., Wallraven C., Cunningham D., Magnor M.: Perception-motivated Interpolation of Image Sequences. ACM Transactions on Applied Perception (2010), to appear.
[SSS06] Snavely N., Seitz S. M., Szeliski R.: Photo tourism: Exploring photo collections in 3D. ACM Trans. on Graphics 25, 3 (2006), 835–846.
[ST06] Sand P., Teller S.: Particle Video: Long-Range Motion Estimation using Point Trajectories. In Proc. of CVPR'06 (Washington, 2006), IEEE Computer Society, pp. 2195–2202.
[VBK05] Vedula S., Baker S., Kanade T.: Image Based Spatio-Temporal Modeling and View Interpolation of Dynamic Events. ACM Trans. on Graphics 24, 2 (2005), 240–261.
[VGvT∗05] Vaish V., Garg G., Talvala E.-V., Antunez E., Wilburn B., Horowitz M., Levoy M.: Synthetic aperture focusing using a shear-warp factorization of the viewing transform. In Proc. Workshop on Advanced 3D Imaging for Safety and Security (2005), p. 129.
[WJV∗05] Wilburn B., Joshi N., Vaish V., Talvala E.-V., Antunez E., Barth A., Adams A., Horowitz M., Levoy M.: High Performance Imaging using Large Camera Arrays. ACM Trans. on Graphics 24, 3 (2005), 765–776.
[WSY07] Wang H., Sun M., Yang R.: Space-Time Light Field Rendering. IEEE Trans. Visualization and Computer Graphics (2007), 697–710.
[WWC∗05] Waschbüsch M., Würmlin S., Cotting D., Sadlo F., Gross M.: Scalable 3D video of dynamic scenes. Vis. Comp. 21, 8 (2005), 629–638.
[WY05] Wang H., Yang R.: Towards space-time light field rendering. In Proc. of I3D'05 (New York, 2005), ACM Press, pp. 125–132.
[XCJ08] Xu L., Chen J., Jia J.: A Segmentation Based Variational Model for Accurate Optical Flow Estimation. In Proc. of ECCV'08 (Berlin, 2008), Springer, pp. 671–684.
[XS04] Xiao J., Shah M.: Tri-View Morphing. Computer Vision and Image Understanding 96 (2004), 345–366.
[YEBM02] Yang J. C., Everett M., Buehler C., McMillan L.: A Real-Time Distributed Light Field Camera. In Proc. of EGRW'02 (2002), Eurographics Association, pp. 77–86.
[ZJK05] Zitnick C., Jojic N., Kang S. B.: Consistent Segmentation for Optical Flow Estimation. ICCV'05 2 (2005), 1308–1315.
[ZKU∗04] Zitnick C., Kang S., Uyttendaele M., Winder S., Szeliski R.: High-Quality Video View Interpolation Using a Layered Representation. ACM Trans. on Graphics 23, 3 (2004), 600–608.
[ZWGS02] Zhang Z., Wang L., Guo B., Shum H.-Y.: Feature-based light field morphing. ACM Trans. on Graphics 21, 3 (2002), 457–464.
Figure 12: Virtual Video Camera views: for each test scene, the virtual viewpoint at the barycenter of the enclosing navigation-space tetrahedron is rendered, representing the 'worst case interpolation'. As test scenes, we have deliberately chosen complex
scenes that pose a variety of challenges: outdoor capture (Skateboarder), volumetric effect and high dynamic contrast (Firebreather), fast, non-rigid motion (Beer), reflection and refraction (Water), wide-angle lens distortion (Dancer), very fast scene
motion (Breakdancer), and hand-held acquisition (Juggler).