COMPUTER GRAPHICS forum

Virtual Video Camera: Image-Based Viewpoint Navigation Through Space and Time

C. Lipski, C. Linz, K. Berger, A. Sellent and M. Magnor
Computer Graphics Lab, TU Braunschweig, Germany

Abstract
We present an image-based rendering system to viewpoint-navigate through space and time of complex real-world, dynamic scenes. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. Inexpensive, consumer-grade camcorders suffice to acquire arbitrary scenes, e.g., in the outdoors, without elaborate recording setup procedures, also allowing for hand-held recordings. Instead of scene depth estimation, layer segmentation, or 3D reconstruction, our approach is based on dense image correspondences, treating view interpolation uniformly in space and time: spatial viewpoint navigation, slow motion, or freeze-and-rotate effects can all be created in the same way. Acquisition simplification, integration of moving cameras, generalization to difficult scenes, and space-time symmetric interpolation amount to a widely applicable Virtual Video Camera system.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.8 [Computer Graphics]: Applications

1. Introduction

The objective common to all free-viewpoint navigation systems is to render photo-realistic vistas of real-world, dynamic scenes from arbitrary perspective (within some specified range), given a number of simultaneously recorded video streams. Most systems exploit epipolar geometry based on either dense depth/disparity maps [NTH02, GMW02, ZKU∗04] or complete 3D geometry models [MBR∗00, CTMS03, VBK05, SH07, dAST∗08]. In order to estimate depth or reconstruct 3D geometry of dynamic scenes, the input multi-video data must be precisely calibrated as well as captured synchronously. This dependence on synchronized footage can limit practical applicability: high-end or even custom-built acquisition hardware must be employed, and the recording setup indispensably includes some sort of camera interconnection (cables, WiFi). The cost, time, and effort involved in recording synchronized multi-video data prevents widespread use of free-viewpoint navigation methods.

In this work, we address these limitations and propose a purely image-based approach to free-viewpoint navigation through space as well as time. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. It is motivated by the pioneering work on view interpolation by Chen and Williams [CW93]. We pick up on the idea of interpolating different image acquisition attributes in a higher-dimensional space and suitably extend it to be applicable to view interpolation in the spatial as well as the temporal domain. By putting the temporal dimension on a par with the spatial dimensions, a uniform framework is available to continuously interpolate virtual video camera positions across space and time. In our approach, we make use of dense image correspondences [SLW∗08, SLW∗10], which take the place of depth/disparity or 3D geometry. We show how dense correspondences allow extending applicability to scenes whose object surfaces are highly variable in appearance, e.g., due to specular highlights, or hard to reconstruct for other reasons.
Perceptually plausible image correspondence fields can often still be established where ground-truth geometry (or geometry-based correspondences) cannot [BN92]. Dense image correspondences can also be established along the temporal dimension to enable interpolation in time. The main contribution of our paper is the presentation of a complete Virtual Video Camera system that features continuous viewpoint navigation in space and time for general, real-world scenes from a manageable number of easy-to-acquire, unsynchronized, uncalibrated and potentially hand-held camcorder recordings. To facilitate intuitive navigation in space and time, we describe the construction of a 3-dimensional navigation space. We show that special care has to be taken to preserve the notion of a static virtual camera when the temporal dimension is involved.

Figure 1: Viewpoint navigation space: time and camera viewing directions span our navigation space. Each cube represents one video frame. Edges denote correspondence fields between two video frames. The navigation space is partitioned into tetrahedra. Our virtual video camera view is interpolated by warping and compositing the four video frames of the enclosing tetrahedron (blue) in real time.

In the following section, we highlight related work before we describe our navigation space in Sect. 3. We discuss implications and potential pitfalls in Sect. 4. Practical implementation issues of our processing pipeline (Fig. 2) are addressed in Sect. 5, outlining the existing methods employed and explaining important necessary adaptations. To demonstrate the versatility of our approach and to document rendering quality, we present results of our Virtual Video Camera system for a number of real-world scenes featuring different challenges in Sect. 6. We discuss remaining limitations of our system before we conclude in Sect. 7 with a summary and an outlook on intriguing follow-up research.

2. Related Work

Free-viewpoint navigation around real-world, dynamic scenes has recently received considerable attention. In general, any image-based rendering method originally developed for static scenes can be applied to dynamic scene content by considering each time frame individually. For example, light field rendering [LH96] can be directly extended to dynamic scenes [FT02, MP04]. To cover some useful view angle range at acceptable image quality, however, large numbers of densely packed video cameras are necessary [MB95, YEBM02, WJV∗05, VGvT∗05]. Wider camera spacings become possible only if more elaborate view interpolation approaches are employed. Frequent use is made of the epipolar constraint, reducing the 2D correspondence problem between synchronized images from different cameras to a one-dimensional line search. Based on epipolar geometry, either dense depth/disparity maps or complete 3D geometry models are estimated from the input image data prior to rendering. Seitz and Dyer [SD96] determine the fundamental matrix to estimate dense disparity and warp-interpolate between two views of static scenes. Manning and Dyer [MD99] extend the approach to dynamic scenes and unsynchronized images for the case of linearly moving scene objects that can be segmented into layers. For rigid-body objects, Xiao and Shah [XS04] describe an interpolation method for three images. All these approaches, however, neglect occlusion/disocclusion. In their Video View Interpolation system, Zitnick et al. [ZKU∗04] take occlusion effects into account.
Based on dense depth maps, their system requires synchronized and calibrated multi-video footage, which is acquired using a custom-built multi-camera system. Hornung and Kobbelt [HK09] reconstruct a particle point cloud from an unordered image collection and use this representation for their rendering. Ballan et al. use a combination of approximate scene geometry reconstruction and a billboard representation of a single moving actor [BBPP10]. They rely on the observation that the actor's silhouette looks identical from various viewpoints; otherwise, cross-fading artifacts become visible. Fitzgibbon et al. [FWZ03] circumvent the necessity of depth estimation by reconstructing color instead of geometry. Space-Time Light Field Rendering overcomes the dependence on synchronized recordings [WY05, WSY07]. The system operates on unsynchronized video streams and is capable of interpolating in space and time. The input images are warped twice: first to a common virtual time, before Unstructured Lumigraph Rendering [BBM∗01] is applied in a second step. Unfortunately, the approach requires considerable computation time and takes several minutes per output frame.

Most depth/disparity-based systems perform viewpoint interpolation between recording camera positions only. For unrestricted viewpoint navigation through 3D space, complete 3D models of dynamic scene geometry are needed [MBR∗00, CTMS03, WWC∗05, SH07, dAST∗08]. While most systems perform only spatial viewpoint interpolation, Vedula et al. [VBK05] describe a volume-based approach that is capable of interpolating in space and time by estimating 3D geometry and motion. In general, rendering quality is highly dependent on 3D model accuracy; for dynamic scene content whose geometry cannot be reconstructed or modeled well, rendering results are prone to artifacts.

Figure 2: Virtual Video Camera processing pipeline: multi-video data (I) is color-corrected first (Ic). To embed the video frames into navigation space {vj}, the user specifies the master camera cm and a common ground plane G. Extrinsic camera parameters (R, p) and the time offsets (t) are automatically estimated. Adjacency of video frames induces a tetrahedralization ({λk}) of the navigation space, and dense correspondences ({Wij}) are estimated along the edges of the tetrahedralization. We allow for manual correction of spurious correspondence fields. After these off-line processing steps, the navigation space can be interactively explored (viewing directions ϕ, θ, time t) by rendering the virtual view Iv in real time.

A generally applicable, feature-based method for interpolating between two different images is presented by Beier and Neely [BN92]. Chen and Williams show how general image interpolation can be used for view interpolation [CW93]. For improved rendering performance, McMillan and Bishop propose a planar-to-planar, forward-mapped image warping algorithm [MB95]. Mark et al. adapt the method to achieve high frame rates for post-rendering warping [MMB97], while Zhang et al. apply feature-based morphing to light fields [ZWGS02]. Lee et al.
extended the feature-based method presented by Beier and Neely [BN92] to more than two images [LWS98]. Based on visually plausible, dense image correspondence fields, warping/morphing-based image interpolation is considerably more flexible than epipolar-constrained view interpolation; calibration imprecision, unsynchronized multi-video footage, or geometry inaccuracies do not hamper correspondence-based interpolation. In addition, plausible image correspondence fields can often still be established for scenes for which depth or 3D geometry is hard to come by [SLW∗08]. Recently, Mahajan et al. presented a path-based interpolation for image pairs that operates in the gradient domain and prevents the ghosting/blurring and many of the occlusion artifacts visible in morphing-based methods [MHM∗09]. Other approaches to combat occlusion artifacts have been proposed by [HSB09] and [IK08]. In their SIGGRAPH'93 paper, Chen and Williams mention image interpolation in a higher-dimensional space of different acquisition attributes [CW93]. Our Virtual Video Camera system picks up on this idea and suitably extends it to view interpolation in the spatial and temporal domain.

3. Navigation space embedding

Our goal is to explore the captured scene in an intuitive way and render a (virtual) view Iv of it, corresponding to a combination of viewing direction and time.

Figure 3: Navigation space: we define a sphere S around the scene. For the center pS of S, we determine in a least-squares sense the point closest to the optical axes of all cameras (green lines). The user selects three points g1, g2, g3 to define the ground plane (yellow circle). We take the normal of the plane as the up vector of the scene, and thus as the rotation axis of the sphere. The embedding is uniquely defined by labeling one of the cameras as the master camera cm.

To this end, we choose to define a 3-dimensional navigation space N that represents spatial camera coordinates as well as the temporal dimension. In their seminal paper, Chen and Williams [CW93] propose to interpolate the camera rotation R and position p directly in 6-dimensional hyperspace. While this is perfectly feasible in theory, it has several major drawbacks in practice: it neither allows for intuitive exploration of the scene by a user, nor is it practical to handle the amount of data needed for interpolation in this high-dimensional space. Additionally, cameras would have to be arranged in a way that they span an actual volume in Euclidean space.

Figure 4: Two possibilities to partition the camera setup of the Breakdancer sequence. Both images depict the camera setup seen from above. When partitioning the Euclidean space directly (left), several problems arise. Although an actual volume is spanned by the cameras, many tetrahedra are degenerate. If the virtual camera vc is reconstructed, captured images from cameras cd, ce, cf and ch as well as the wide-baseline correspondence fields between them are required. Using our navigation space embedding (right), interpolation only takes place between neighboring cameras. Less data has to be processed and correspondence estimation is much easier. Additionally, spatial navigation is intuitively simplified.
Instead of a 3D position, only the position on the one-dimensional arc (light blue) has to be specified. The small spatial error (the distance between vc and the line segment between cf and cg) is negligible even for ad-hoc setups.

With this requirement, it would be hard to devise an arrangement of cameras where they do not occlude each other's view of the scene. The crucial design decision in our system is hence to map the extrinsic camera parameters to a lower-dimensional space that allows intuitive navigation. The temporal dimension already defines one axis of the navigation space N, leaving two dimensions for parameterizing the camera orientation and position. Practical parametrizations that allow for realistic view interpolation are only possible if the cameras' optical axes cross at some point (possibly at infinity). A natural choice for such an embedding is a spherical parametrization of the camera setup. While, for example, a cylindrical embedding or an embedding in a plane is also feasible, a spherical embedding allows for all reasonable physical camera setups, ranging from cameras arranged in a 1-dimensional arc, to cameras placed in a spherical setup, to linear camera arrays with parallel optical axes in the limit. Regarding existing data sets, it is obvious that spherical/arc-shaped setups are the predominant multi-view capture scenarios (e.g., [ZKU∗04, dAST∗08]). Even in unordered image collections it can be observed that a majority of camera positions is distributed in arcs around certain points of interest [SGSS08]. Other placements, such as panoramic views, are also possible, but would suffer from the small overlap of the cameras' image regions. For all results presented in this paper, we hence employ a spherical model with optical axes centered at a common point. Camera setups parametrized by Bézier splines or patches are also feasible, and our approach may adapt to any setup as long as a sensible parametrization can be obtained. As the extension of our approach to these settings is straightforward, we will not discuss it in detail.

Cameras are placed on the surface of a virtual sphere; their orientations are defined by azimuth ϕ and elevation θ. Together with the temporal dimension t, ϕ and θ span our 3-dimensional navigation space N. If cameras are arranged in an arc or curve around the scene, θ is fixed, reducing N to two dimensions. As this simplification is trivial, we will only cover the three-dimensional case in our discussion. In contrast to the conventional partition of space suggested by Chen and Williams [CW93], we thus restrict the movement of the virtual camera to a subspace of lower dimensionality (a 2D approximate spherical surface or a 1D approximated arc). Although this might appear as a drawback at first sight, several advantages arise from this crucial design decision, see Fig. 4:

1. The number of correspondence fields needed for image interpolation is reduced significantly, making both preprocessing and rendering faster.
2. An unrestricted partition of Euclidean space leads to degenerate tetrahedra. Especially when cameras are arranged along a line or arc, adjacencies between remote cameras are established for which no reliable correspondence information can be obtained.
3. Our parametrization of the camera arrangement provides an intuitive navigation around the scene.

To define our navigation space N, we assume that we know ground-truth extrinsic camera parameters R and p for every camera, as well as a few point correspondences with their 3D world coordinates (in Sect. 5.2 we describe how we obtain sufficiently accurate calibration parameters). For a specific virtual image Iv, we want to interpolate the image at a given point in navigation space defined by the two spatial parameters ϕ and θ as well as recording time t. To serve as sampling points, the camera configuration of our recorded multi-video input in Euclidean world space is embedded into navigation space N via the mapping Ψ : (R, p, t) ↦ (ϕ, θ, t). In our system, Ψ is simply a transformation from Cartesian to spherical coordinates, where the sphere center pS and the sphere radius rS are computed from the cameras' extrinsic parameters R and p in a least-squares sense. The embedding is uniquely defined by specifying a ground plane in the scene and by labeling one of the cameras as the master camera cm, cf. Fig. 3. It is evident that the recording hull spanned by all camera positions is, in general, only a crude approximation to a spherical surface. Fortunately, we found that non-spherical camera arrangements do not cause any visually noticeable effect during rendering.
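To make the embedding concrete, the following Python sketch illustrates one way to realize Ψ; it is our own illustration under stated assumptions, not the authors' code. The helper name embed_cameras, the array layout, and the convention that the optical axes are given as unit direction vectors derived from R are all hypothetical; the up axis is assumed to be the +z axis, i.e., the normal of the user-selected ground plane.

```python
import numpy as np

def embed_cameras(centers, directions):
    """Map camera positions to navigation-space angles (phi, theta).

    centers:    (N, 3) camera positions p in world space
    directions: (N, 3) unit viewing directions (optical axes), e.g. derived from R
    Returns the sphere center pS, radius rS, and per-camera (phi, theta).
    """
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # Least-squares point closest to all optical axes:
    # minimize sum_i || (I - d_i d_i^T)(p - c_i) ||^2
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c_i, d_i in zip(centers, d):
        P = np.eye(3) - np.outer(d_i, d_i)   # projector orthogonal to the axis
        A += P
        b += P @ c_i
    p_s = np.linalg.solve(A, b)                          # sphere center pS
    r_s = np.linalg.norm(centers - p_s, axis=1).mean()   # sphere radius rS

    # Spherical coordinates of each camera w.r.t. pS (up axis assumed +z,
    # i.e. the normal of the user-selected ground plane).
    v = centers - p_s
    phi = np.arctan2(v[:, 1], v[:, 0])                        # azimuth
    theta = np.arcsin(v[:, 2] / np.linalg.norm(v, axis=1))    # elevation
    return p_s, r_s, phi, theta
```

In this sketch, fixing the azimuth origin at the master camera cm would simply amount to subtracting its ϕ from all other azimuth values.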
4. Spacetime tetrahedralization

In order to interpolate viewpoints, Chen and Williams propose to generate some arbitrary graph to partition the space such that every possible viewpoint lies in a d-simplex and can be expressed as a linear combination of the d + 1 vertices of the enclosing simplex [CW93]. When including the temporal dimension, however, an arbitrary graph leads to rendering artifacts due to inconsistent mappings between Euclidean space and navigation space.

4.1. Naïve tetrahedralization

Our navigation space consists of two view-directional dimensions and the temporal dimension. We subdivide the navigation space into tetrahedra λ such that each embedded video frame represents one vertex [LASM08]. We interpolate any virtual camera view from the enclosing tetrahedron λ = {vi}, i = 1 ... 4. As already mentioned, arbitrary partitions are feasible in static scenes. However, our spherical approximation and/or calibration imprecision will lead to interpolation errors in practice if the temporal dimension is included.

To prove this point, let us assume that we have captured a dynamic scene with four static, unsynchronized cameras ca, cb, cc and cd. As we stated in Sect. 3, our view-directional subspace relates to the spherical camera placement. To be more exact, it represents an approximation of a spherical surface; examples of possible approximations are given in Fig. 5, upper right and lower right. We will show that although the chosen spherical approximation is arbitrary, it is crucial to enforce consistency of the approximation when the temporal dimension is introduced, even if ground-truth camera parameters R and p are available.

Figure 5: Surface tessellation: for the spherical coordinates of four cameras ca, ..., cd, two possible tessellations {∆1, ∆2} and {∆3, ∆4} exist. During space-time interpolation, the configuration of the initial tessellation may not change, to avoid interpolation discontinuities.

Imagine now that the user intends to re-render the scene from a static virtual camera. The position of this camera is somewhere between the four original cameras, so that the new view has to be interpolated. Because the camera is static in navigation space, ϕ and θ stay fixed while t is variable, such that the virtual camera moves along a line l in navigation space, Fig. 6. Assume that the navigation space has been partitioned by applying a standard Delaunay tetrahedralization [ES96] to the set of navigation space vertices via, vib, vic, vid, where i indexes the temporal dimension. Now consider the camera path l. In line section l1 (green), images associated with cameras ca, cb and cc are used for interpolation. The virtual camera thus represents a point that lies on plane ∆3 in the original Euclidean space, Fig. 5 (lower right). In line section l2 (blue), however, the virtual camera is interpolated from all four cameras, i.e., it moves within the volume spanned by the four different cameras in Euclidean world space, Fig. 6(a). This violates the notion of a static virtual camera. In line section l3 (red), the virtual camera is represented by a point in Euclidean world space situated on plane ∆1, Fig. 5 (upper right). Since ∆1 and ∆3 are not coplanar, the position of the virtual camera in Euclidean space is different in these line sections, again violating the static camera assumption.

Figure 6: Unconstrained vs. constrained tessellation of the navigation space: with an unconstrained tetrahedralization (a), tetrahedra λ1, λ2 and λ3 are created. A static virtual camera moves along line l and passes through all three tetrahedra. When comparing the line sections l1 (green), l2 (blue) and l3 (red), the same point in navigation space relates to different points in Euclidean space; the virtual camera thus seems to move. By introducing boundary faces (b), such errors can be avoided. The virtual camera passes solely through tetrahedra (λ4, λ6) which are spanned by the same three cameras and is not influenced by v2c (tetrahedron λ5 remains unused). Therefore, the virtual camera remains static in Euclidean space, as intended.

In navigation-space interpolation, this error can be avoided if any two points vt = (ϕ, θ, t) and vt+δ = (ϕ, θ, t + δ) re-project to the same point in Euclidean space, i.e., if the virtual camera is static in Euclidean space. Instead of building an arbitrary graph structure on the navigation space vertices, we apply a combination of constrained and unconstrained Delaunay tessellations to the navigation space vertices.
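The flipping problem described above can also be checked numerically. The sketch below is our own illustration (the vertex layout and helper name are assumptions, not part of the paper): it builds an unconstrained Delaunay tetrahedralization of the navigation-space vertices with scipy and reports, for a fixed (ϕ, θ), which cameras contribute at each sampled time.

```python
import numpy as np
from scipy.spatial import Delaunay

def camera_sets_along_time(verts, cam_ids, phi, theta, times):
    """verts: (N, 3) navigation-space vertices (phi, theta, t) of all video frames;
    cam_ids: (N,) index of the recording camera of each vertex.
    Returns, per sampled time, the set of cameras whose frames would be blended
    for a virtual camera held static at (phi, theta)."""
    tet = Delaunay(verts)                      # unconstrained tetrahedralization
    cam_ids = np.asarray(cam_ids)
    sets = []
    for t in times:
        s = int(tet.find_simplex(np.array([[phi, theta, t]]))[0])
        if s == -1:                            # outside the recording hull
            sets.append(None)
        else:
            corners = tet.simplices[s]         # 4 vertices of the enclosing tetrahedron
            sets.append(frozenset(cam_ids[corners]))
    return sets

# If consecutive entries differ, the enclosing tetrahedra are spanned by different
# cameras and the supposedly static virtual camera 'flips' as in Fig. 6(a).
```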
4.2. A constrained tetrahedralization algorithm

To obtain a temporally consistent tetrahedralization, we start with an unconstrained Delaunay tessellation on the sphere, i.e., we neglect the temporal dimension for now. This initial tessellation must be preserved, i.e., the spatial ordering of cameras must stay the same for all recorded frames of the video streams. Referring back to the example given above, we arrive at the tessellation shown in Fig. 7 (left). The edges of this tessellation serve as boundaries that help maintain the initial tessellation. We then add the temporal dimension. The initial tessellation of the sphere is duplicated and displaced along the temporal axis. In order to maintain the structure of the tessellation computed so far, we add boundary faces for each edge of the initial tessellation, e.g., we add a face (v1b, v2b, v2d) in the example above, Fig. 7 (right). A constrained Delaunay tessellation can then be computed on this three-dimensional point cloud in navigation space [SG05]. In our example, the added boundary faces separate the data in such a way that every tetrahedron corresponds to one of the two planes in Euclidean space, i.e., either to ∆1 or ∆2 in Fig. 5, but never to ∆3 or ∆4. All tetrahedra also comprise four images that were captured by exactly three cameras. If each image vertex in a tetrahedron were captured by a different camera, the situation shown in line section l2 (Fig. 6(a)) would arise.

Figure 7: Boundary faces (green): a two-dimensional unconstrained Delaunay tessellation reveals the boundary edges between cameras (left). In three-dimensional navigation space (right), boundary faces (green triangles) are inserted for each boundary edge. The virtual static camera pl that moves on a line l in navigation space always lies in tetrahedra constructed from ca, cb and cd, as postulated in Fig. 5.
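The following sketch shows one possible reading of the boundary-face construction; it is our own illustration, not the authors' implementation. The data layout, the strip triangulation between two cameras' frame columns, and the hand-off of the resulting piecewise-linear complex to an external constrained Delaunay tetrahedralization code (e.g., TetGen, cf. [SG05]) are assumptions. It also assumes the full two-angle (spherical) setup, so that the initial 2D Delaunay tessellation exists.

```python
import numpy as np
from scipy.spatial import Delaunay

def strip_faces(col_a, col_b):
    """Triangulate the boundary strip between two time-sorted vertex columns.
    col_a, col_b: lists of (vertex_id, t) for all frames of two adjacent cameras."""
    faces, ia, ib = [], 0, 0
    while ia + 1 < len(col_a) or ib + 1 < len(col_b):
        advance_a = ib + 1 >= len(col_b) or (
            ia + 1 < len(col_a) and col_a[ia + 1][1] <= col_b[ib + 1][1])
        if advance_a:
            faces.append((col_a[ia][0], col_a[ia + 1][0], col_b[ib][0]))
            ia += 1
        else:
            faces.append((col_a[ia][0], col_b[ib][0], col_b[ib + 1][0]))
            ib += 1
    return faces

def boundary_faces(cam_angles, frame_verts):
    """cam_angles: dict camera -> (phi, theta); frame_verts: dict camera -> time-sorted
    list of (vertex_id, t). Returns boundary triangles for every spatial Delaunay edge."""
    cams = sorted(cam_angles)
    tri = Delaunay(np.array([cam_angles[c] for c in cams]))   # initial tessellation
    edges = {tuple(sorted((cams[s[i]], cams[s[j]])))
             for s in tri.simplices for i, j in ((0, 1), (1, 2), (0, 2))}
    return [f for a, b in sorted(edges)
            for f in strip_faces(frame_verts[a], frame_verts[b])]

# Together with the navigation-space vertices of all video frames, these faces form a
# piecewise-linear complex that a constrained Delaunay tetrahedralization library can
# mesh without altering the initial spatial ordering of the cameras.
```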
The undesired flipping along the temporal dimension also occurs when working with dynamic cameras. Here we assume that the cameras move, but that their relative positions do not change, i.e., we do not allow cameras to switch position. In this case, the same reasoning as presented above for static cameras applies. However, the notion of a static virtual camera is now defined with respect to the moving input cameras. By doing so, sudden changes of the spherical surface approximation can be prevented, just as in the case of stationary recordings. The only limitation is that when cameras move in Euclidean space, the approximated spherical surface also changes. In these cases, our approximation error, i.e., the distance of the rendered virtual camera to our idealized spherical camera arrangement, does not remain fixed. In general, this is only noticeable with fast-moving cameras and can usually be neglected for small camera movements. The same applies to imprecise camera calibration: small errors in R and p are inevitable, but usually do not manifest in visible artifacts. For assessment, we refer to Sect. 6 and to our accompanying video. In cases where the scene center is actually moving (e.g., at outdoor sports events) and cameramen are following it, we suggest using a time-dependent camera embedding.

5. Processing Pipeline

Based on the above-described navigation space tetrahedralization, we propose a fully functional processing pipeline for free-viewpoint navigation from unsynchronized multi-video footage. Our processing pipeline makes use of known techniques and suitably adapts them to solve the problems encountered.

5.1. Acquisition

To acquire multi-video data, we use up to 16 HDV Canon XHA1 camcorders (1440 × 1080 pixels, 25 fps). The captured sequences are internally MPEG-compressed, stored on DV tape, and later transferred to a standard PC. This setup is very flexible, easy to set up, runs completely on batteries, and is suitable for indoor and outdoor use. Static setups using tripods as well as setups with dynamic hand-held cameras are possible. Adjacent cameras need only have sufficient view overlap to facilitate correspondence estimation. To enhance interpolation quality, the angle between neighboring cameras should not exceed roughly 10 degrees in either the vertical or the horizontal direction.
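A candidate setup can be checked against this rule of thumb before recording by measuring the angle that adjacent cameras subtend at the scene center. The short helper below is our own addition (the function name and the assumption that the cameras are ordered along the arc are not from the paper).

```python
import numpy as np

def adjacent_angles_deg(centers, scene_center):
    """Angle (degrees) subtended at the scene center by each pair of adjacent cameras.
    centers: (N, 3) camera positions ordered along the arc; scene_center: (3,)."""
    v = centers - scene_center
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.clip(np.sum(v[:-1] * v[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# For convincing interpolation, each returned value should stay below roughly 10 degrees.
```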
Fig. 8 shows some typical camera configurations.

Figure 8: Camera setups: our approach works with off-the-shelf consumer-grade camcorders, mounted either on tripods or hand-held.

5.2. Pre-processing

Color correction. To correct for color balance differences among cameras, we use the master camera cm defined in the embedding stage and apply the color correction approach presented by Snavely et al. to all video frames [SSS06].

Camera calibration. We need to determine R and p of the camera setup to define the mapping Ψ from world space coordinates to navigation space N, Sect. 3. Recent structure-from-motion algorithms for unordered image collections [GSC∗07, SK09, LBE∗10] solve this problem robustly and can also provide the set of sparse world space points needed for constructing the common ground plane of our navigation space. We found that this approach yields robust results also for dynamic scenes. For dynamic camera scenarios (i.e., hand-held, moving cameras), R and p have to be computed for every frame of each camera.

Temporal registration. The mapping Ψ additionally needs the exact recording time t of each camera. We estimate the sub-frame temporal offset by recording a dual-tone sequence during acquisition and analyzing the audio tracks afterwards [HRT∗09]. If recording a separate audio track is not feasible, pure post-processing approaches [MSMP08] can be employed instead. Please note that even after temporal registration, the recorded images are still not synchronized: since the camcorder recordings were triggered manually, a sub-frame offset remains and prevents exploiting the epipolar constraint in the presence of moving objects.

Figure 9: Manual correction of correspondence fields: matching ambiguities in difficult cases (e.g., fast motion in the Breakdancer scene) are resolved manually. The user draws corresponding lines onto both images (purple). They serve as a correspondence prior for the matching process, which would otherwise only include automatic correspondence estimation of edge pixels (red). Where necessary, interaction time per image pair is typically less than 1 minute.

Dense correspondence field estimation. In order to interpolate between two images Ii, Ij, we need bidirectional dense correspondence maps Wij for each tetrahedral edge in navigation space N, Sect. 3. In our system, we employ the algorithm proposed by Stich et al. for dense correspondence estimation [SLW∗08, SLW∗10]. For most image pairs, the results are perceptually convincing, and if not, the algorithm accepts manual corrections, see Fig. 9. Alternatively, our system would work just as well with any other correspondence estimation algorithm [ZJK05, ST06, SC08, XCJ08, LLNM10].
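Since any dense correspondence estimator can be substituted for the matcher of Stich et al., a self-contained stand-in can be sketched with off-the-shelf optical flow. The snippet below is such a stand-in, not the algorithm used in the paper; the helper name and the Farnebäck parameter choices are our own assumptions.

```python
import cv2
import numpy as np

def dense_correspondences(img_i, img_j):
    """Bidirectional dense correspondence maps W_ij and W_ji between two frames.
    img_i, img_j: uint8 BGR images of identical size. Each returned map has shape
    (H, W, 2) and stores, per pixel x, the translation to the matching pixel in the
    other image."""
    gray_i = cv2.cvtColor(img_i, cv2.COLOR_BGR2GRAY)
    gray_j = cv2.cvtColor(img_j, cv2.COLOR_BGR2GRAY)
    params = dict(pyr_scale=0.5, levels=5, winsize=21,
                  iterations=3, poly_n=7, poly_sigma=1.5, flags=0)
    w_ij = cv2.calcOpticalFlowFarneback(gray_i, gray_j, None, **params)
    w_ji = cv2.calcOpticalFlowFarneback(gray_j, gray_i, None, **params)
    return w_ij, w_ji
```

Such a stand-in will generally not match the perceptual quality of the matcher used in the paper on wide-baseline pairs, but it produces fields in the same (H, W, 2) translation format assumed in the rendering step below.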
5.3. Rendering

Having subdivided navigation space N into tetrahedra, each point v is defined by the vertices of the enclosing tetrahedron λ = {vi}, i = 1 ... 4. Its position can be uniquely expressed as

    v = ∑_{i=1..4} µi vi,

where µi are the barycentric coordinates of v. Each of the 4 vertices vi of the tetrahedron corresponds to a recorded image Ii. Each of the 12 edges eij corresponds to a correspondence map Wij that defines a translation of a pixel location x on the image plane. We are now able to synthesize a novel image Iv for every point v inside the recording hull of the navigation space N by multi-image interpolation:

    Iv = ∑_{i=1..4} µi Ĩi,

where the forward-warped images Ĩi [MMB97] are defined by

    Ĩi( Πi(x) + ∑_{j=1..4, j≠i} µj Πj(Wij(x)) ) = Ii(x).

{Πi} is a set of re-projection matrices that map each image Ii onto the image plane of Iv, as proposed by Seitz and Dyer [SD96]. These matrices can easily be derived from the camera calibration. Since the virtual image Iv is always oriented towards the center of the scene, this re-projection corrects the skew of the optical axes potentially introduced by our loose camera setup and also accounts for jittering introduced by dynamic cameras. Image re-projection is done on the GPU without resampling the image data.

We handle occlusion/disocclusion on-the-fly based on correspondence field heuristics as proposed by Stich et al. [SLAM08]. Given the images recorded by two neighboring cameras at roughly the same point in time, we evaluate the x-component of the correspondence fields leading from the left camera image to the right camera image. Assuming that inter-camera motion is generally larger than inter-frame motion, we use a simple visibility ordering heuristic: an object located at the center of the scene will exhibit a horizontal motion vector close to zero, objects in the foreground move to the right, and objects in the background move to the left. When two regions of a warped image overlap, i.e., two pixel fragments are warped onto the same location, a simple comparison of their respective motion vectors thus resolves this ambiguity. Disocclusions are detected by calculating the local divergence of the correspondence fields: if any two neighboring pixels exhibit a difference of more than 4 pixels in their motion, the triangles connecting them are discarded in a geometry shader. Hole filling is done during the blending stage, where image information from the other three warped images is used. As a last step, the borders of the rendered images are cropped (10% of the image in each dimension), since no reliable correspondence information is available for these regions. At 940 × 560 pixels output resolution, rendering frame rates exceed 25 fps on an NVIDIA GeForce 8800 GTX. We store images and correspondence fields in local memory and use a pre-fetching scheme for image data.
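The interpolation above can be illustrated with a compact CPU sketch. This is a deliberately simplified illustration under our own assumptions, not the GPU implementation from the paper: the re-projections Πi are omitted (treated as the identity), splatting is nearest-pixel rather than triangle-based, and the visibility ordering uses the x-component of the mixed displacement as a stand-in for the paper's pairwise motion-vector comparison.

```python
import numpy as np

def forward_warp(img, flows, mu, i):
    """Forward-warp source image i toward the virtual view.
    img: (H, W, 3) float image I_i; mu: the four barycentric weights of the enclosing
    tetrahedron; flows: dict (i, j) -> (H, W, 2) correspondence field W_ij."""
    H, W, _ = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    disp = np.zeros((H, W, 2), np.float32)
    for j in range(4):
        if j != i:
            disp += mu[j] * flows[(i, j)]      # barycentric mix of outgoing fields
    xt = np.clip(np.round(xs + disp[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + disp[..., 1]).astype(int), 0, H - 1)

    warped = np.zeros_like(img)
    mask = np.zeros((H, W), bool)
    # visibility heuristic: foreground fragments show the larger horizontal motion,
    # so splat in increasing x-displacement order and let them overwrite the rest
    for k in np.argsort(disp[..., 0], axis=None):
        y, x = np.unravel_index(k, (H, W))
        warped[yt[y, x], xt[y, x]] = img[y, x]
        mask[yt[y, x], xt[y, x]] = True
    return warped, mask

def interpolate_view(images, flows, mu):
    """I_v as the barycentric blend of the four forward-warped images; pixels missed
    by one warped image are filled by the remaining valid contributions."""
    H, W, _ = images[0].shape
    acc = np.zeros((H, W, 3))
    wsum = np.zeros((H, W, 1))
    for i in range(4):
        warped, mask = forward_warp(images[i], flows, mu, i)
        acc += mu[i] * mask[..., None] * warped
        wsum += mu[i] * mask[..., None]
    return acc / np.maximum(wsum, 1e-6)
```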
5.4. Applications

One application is the creation of virtual camera movements during post-production. Arbitrary panning, time-freeze, and slow motion shots can be defined and rendered. We implemented a graphical user interface that allows intuitive exploration of our navigation space and allows the user to compose arbitrary spacetime camera movements interactively. The user interface makes use of common 3D animation concepts. A visualization of the navigation space is presented, and the current virtual camera view is rendered in real time, Fig. 10. The user can continuously change viewing direction and time by simple click-and-drag movements within the rendering window. Camera paths are defined by placing, editing, or deleting control points in navigation space at the bottom of the screen. The virtual video camera path through spacetime is interpolated by Catmull-Rom splines [CR74].

Figure 10: Virtual video editing: to interactively create arbitrary virtual video camera sequences, the user moves the camera by click-and-drag movements in the rendering window (top). The spline curve representing the space-time camera path is automatically updated and visualized in the navigation space window (bottom).

Figure 11: Interactive Spacetime Navigator: while watching the scene, the user can control playback speed and view perspective by clicking and dragging within the video window.

Another application is our interactive Spacetime Navigator, Fig. 11. While the interface bears much resemblance to classic media players, the user can move the camera by clicking and dragging the mouse cursor within the rendering window and observe the scene from arbitrary viewpoints. In addition, playback speed can be varied arbitrarily. The Spacetime Navigator and the Firebreather test scene are available online [Com10].
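A minimal sketch of the camera-path interpolation mentioned above, using the standard uniform Catmull-Rom formula [CR74]; the control-point layout as (ϕ, θ, t) triples and the end-point clamping are our own assumptions, not details from the paper.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, s):
    """Uniform Catmull-Rom interpolation between p1 and p2, with s in [0, 1]."""
    s2, s3 = s * s, s * s * s
    return 0.5 * ((2 * p1) + (-p0 + p2) * s +
                  (2 * p0 - 5 * p1 + 4 * p2 - p3) * s2 +
                  (-p0 + 3 * p1 - 3 * p2 + p3) * s3)

def camera_path(control_points, samples_per_segment=30):
    """control_points: (K, 3) array of (phi, theta, t) control points in navigation space.
    Returns densely sampled (phi, theta, t) positions for the virtual video camera."""
    cp = np.asarray(control_points, dtype=float)
    cp = np.vstack([cp[0], cp, cp[-1]])          # clamp the ends by duplicating them
    path = []
    for k in range(1, len(cp) - 2):
        for s in np.linspace(0.0, 1.0, samples_per_segment, endpoint=False):
            path.append(catmull_rom(cp[k - 1], cp[k], cp[k + 1], cp[k + 2], s))
    path.append(cp[-2])                          # include the final control point
    return np.array(path)
```

Each sampled triple can then be handed to the renderer, which locates the enclosing tetrahedron and blends the four associated video frames as described in Sect. 5.3.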
6. Results

We evaluated our system on a variety of real-world scenes, each one posing different challenges to the processing pipeline, Table 1. For each test scene, Fig. 12 depicts an interpolated view at the barycenter of one navigation-space tetrahedron. The barycenter represents the point where all four video images are warped and weighted equally, thus representing the case most likely to show rendering artifacts. Because still images are not able to convey the full impression of interactive viewpoint navigation, we refer the reader to the accompanying video and the interactive Spacetime Navigator [Com10]. Besides high-quality image interpolation in space and time, the design of our navigation space allows intuitive exploration of the recorded spacetime. As such, our system lends itself as a valuable tool for the intuitive design of visual effects such as time-freeze and slow motion in a post-process. Besides scene-specific challenges, the system must also cope with lens distortion, camera noise, and compression artifacts inherent to consumer-grade camcorders.

The Juggler sequence was recorded using hand-held camcorders, and the cameramen moved about 0.5 meters to the left during acquisition. Nevertheless, we are able to synthesize a static virtual view. The small geometrical error introduced by the non-stationary approximation of the camera arc manifests in a slight drift of the virtual camera, yet it is hardly noticeable. In order to remedy this undesired effect, further stabilization techniques could be integrated (e.g., [LGJA09]).

The Breakdancer sequence is provided for visual comparison to the approach of Zitnick et al. [ZKU∗04]. As their data set is closely sampled in the spatial domain (the angle between adjacent cameras is approximately 3 degrees), no user interaction was required for inter-camera correspondence field estimation. Please note that our video is not perfectly synchronized to the original video, since the original spacetime trajectory is unknown. We also decided not to imitate the original cross-blending effect, since it distracts from potential rendering artifacts. While obtaining comparable visual quality, our method can also interpolate along the temporal dimension. Unfortunately, the data set was captured at only 15 fps and exhibits very fast rotational movement, which violates the assumption of linear motion. Still, spatiotemporal image interpolation can be achieved. However, most of the inter-frame foreground correspondences had to be edited by hand, since the automatic matching could not cope with the fast motion of the breakdancer.

Limitations. Similar to depth/disparity-based free-viewpoint navigation systems [ZKU∗04], our virtual viewpoint is spatially restricted: we can viewpoint-navigate on the hull spanned by all camera recording positions, looking at the scene from different directions, but we cannot, for example, move into the scene or fly through the scene. Output rendering quality obviously depends on the visual plausibility of the correspondence fields. While we found the automatic, pair-wise correspondence estimation algorithm by Stich et al. [SLW∗08, SLW∗10] to yield convincing and robust results overall, we additionally allow for human interaction to correct remaining spurious correspondences. While for special scenarios other specifically tailored matching algorithms might yield better results, we incorporated a matching algorithm that provides robust results for most scenes. For the computation of a correspondence field, our non-optimized C++ code took about 2 to 5 minutes per image pair, depending on image resolution and scene content (i.e., the amount of visible edges). Correspondence correction typically takes about one minute per video frame pair. In some cases, manual correction was not feasible; e.g., the Beer scene features single streaks of foam that cannot be matched manually. In the case of fluids, i.e., in the Water sequence, we observed that automatic feature matching works quite well for freeze-and-rotate shots. In amorphous image regions, no edges are matched, and a simple blending of color values also yields plausible results. Bearing in mind that the short Firebreather sequence contains more than 3000 pairwise correspondence fields, the vast majority of correspondences in the presented results is computed without user interaction. The same is true for the rendered videos; e.g., the trajectory of the virtual camera in the Dancer scene requires several hundred correspondence fields. We experienced that editing more than one or two dozen correspondence fields per scene is not feasible. Small uncorrected inaccuracies in automatically computed fields manifest in the cross-fading/ghosting artifacts visible in some scenes of the accompanying video.

In our system, occlusion is handled by heuristics as proposed by Stich et al. [SLW∗08]; disocclusion is handled by mesh cutting based on the connectedness of the correspondence fields [MMB97]. We hence rely on the correctness of the computed correspondence fields. In cases where our occlusion heuristic fails, visible rendering artifacts occur at occlusion borders; e.g., the small ball in the Juggler scene and the silhouette of the skateboarder are not preserved faithfully. More sophisticated occlusion handling techniques, such as those presented by [HSB09], [IK08] or [MHM∗09], might improve rendering quality in these cases. For high-quality rendering results, we observed that the angle between adjacent cameras should not exceed 10 degrees, independent of scene content, Table 1. For greater distances, the amount of missing scene information appears to become too large to still achieve convincing interpolation results. The same is true for overly fast scene motion.
As a rule of thumb, we found that across space or time, scene correspondences should not be farther apart than approximately 10% of the linear image size.

Table 1: Camera recording arrangements. Cameras are spaced approximately 10 degrees apart in the horizontal as well as the vertical direction (3 degrees in the Breakdancer scene). We have deliberately chosen non-trivial test scenes to assess performance for a range of different scenarios.

Scene         Camera setup                               Challenges
Dancer        top: 4, middle: 8, bottom: 4 cameras       wide-angle lens distortion, fast motion
Water         5 cameras in half-circular arc             refraction, reflection, specular highlights
Beer          top: 5, middle: 5, bottom: 5 cameras       high-speed motion, changing topology
Firebreather  top: 4, middle: 8, bottom: 4 cameras       high dynamic contrast, over-exposure, volumetric effects
Skateboarder  6 cameras in half-circular arc             outdoor capture, varying lighting conditions
Breakdancer   8 cameras in half-circular arc             very fast motion, low temporal sampling, noisy acquisition
Juggler       5 cameras in half-circular arc             outdoor capture, hand-held and moving cameras

7. Conclusion

We have presented an image-based rendering system to interactively viewpoint-navigate through space and time of general real-world, dynamic scenes. Our system accepts unsynchronized multi-video data as input. A modest number of cameras suffices to achieve convincing results. Since we do not need special acquisition hardware or time-consuming setup procedures, we can record with hand-held consumer camcorders in arbitrary environments, e.g., outdoors. Our approach allows for smooth interpolation of view perspective and time, i.e., for simultaneous free-viewpoint and slow motion rendering. Based on pre-computed image correspondences, our system can handle complex scenes for which depth or geometry might be hard to reconstruct.

Our system opens up a number of interesting future research directions. For example, ensuring correspondence consistency in a network of images awaits in-depth investigation. Furthermore, given the number of correspondence fields, automatically judging their quality by detecting artifacts in the interpolated images is an interesting avenue for future research. Also, integrating a path-based image interpolation method as recently presented by Mahajan et al. [MHM∗09] could turn out to be an interesting alternative to our current rendering model, given that their method can be extended to more than two images while maintaining our real-time constraint. Recently, a 3D image stabilization technique for hand-held video has been proposed that is based on sparse geometry reconstruction [LGJA09]. In order to correct the small geometric error when using hand-held cameras, this approach could be incorporated into our system in a post-processing step. Given the amount of data our system relies on, suitable compression techniques are an important prerequisite for distributing multi-video and correspondence field data. The image interpolation approach lends itself to generating a wide range of image-based special effects. Besides the time-freeze and slow motion effects already presented here, an interpolation-based F/X editor can be conceived that allows for creating advanced effects such as motion streaks, motion distortion, space blur, and space-time ramps [LLR∗10].

References
[BBPP10] BALLAN L., BROSTOW G. J., PUWEIN J., POLLEFEYS M.: Unstructured video-based rendering: Interactive exploration of casually captured videos. 1–11.
[BN92] BEIER T., NEELY S.: Feature-based image metamorphosis. Computer Graphics (Proc. of SIGGRAPH'92) 26, 2 (1992), 35–42.
[Com10] COMPUTER GRAPHICS LAB, TU BRAUNSCHWEIG: Virtual Video Camera project website. http://graphics.tu-bs.de/projects/vvc/, Aug. 2010.
[CR74] CATMULL E., ROM R.: A class of local interpolating splines. In Computer Aided Geometric Design, Barnhill R., Riesenfeld R. (Eds.). Academic Press, 1974, pp. 317–326.
[CTMS03] CARRANZA J., THEOBALT C., MAGNOR M., SEIDEL H.-P.: Free-Viewpoint Video of Human Actors. ACM Trans. on Graphics 22, 3 (2003), 569–577.
[CW93] CHEN S. E., WILLIAMS L.: View interpolation for image synthesis. In Proc. of ACM SIGGRAPH'93 (New York, 1993), ACM Press/ACM SIGGRAPH, pp. 279–288.
[dAST∗08] DE AGUIAR E., STOLL C., THEOBALT C., AHMED N., SEIDEL H.-P., THRUN S.: Performance Capture from Sparse Multi-View Video. ACM Trans. on Graphics 27, 3 (2008), 1–10.
[ES96] EDELSBRUNNER H., SHAH N.: Incremental topological flipping works for regular triangulations. Algorithmica 15, 3 (1996), 223–241.
[FT02] FUJII T., TANIMOTO M.: Free viewpoint TV system based on ray-space representation. In Proceedings of SPIE (2002), vol. 4864, SPIE, p. 175.
[FWZ03] FITZGIBBON A., WEXLER Y., ZISSERMAN A.: Image-based rendering using image-based priors. In ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision (Washington, DC, USA, 2003), IEEE Computer Society, p. 1176.
[GMW02] GOLDLÜCKE B., MAGNOR M., WILBURN B.: Hardware-accelerated dynamic light field rendering. Proc. of VMV'02 (Nov. 2002), 455–462.
[GSC∗07] GOESELE M., SNAVELY N., CURLESS B., HOPPE H., SEITZ S.: Multi-view stereo for community photo collections. ICCV'07 (2007), 1–8.
[HK09] HORNUNG A., KOBBELT L.: Interactive pixel-accurate free viewpoint rendering from images with silhouette aware sampling. Computer Graphics Forum 28, 8 (2009), 2090–2103.
[HRT∗09] HASLER N., ROSENHAHN B., THORMÄHLEN T., WAND M., GALL J., SEIDEL H.-P.: Markerless Motion Capture with Unsynchronized Moving Cameras. In Proc. of CVPR'09 (Washington, June 2009), IEEE Computer Society, to appear.
[HSB09] HERBST E., SEITZ S., BAKER S.: Occlusion Reasoning for Temporal Interpolation using Optical Flow. Tech. Rep. MSR-TR-2009-2014, Microsoft Research, 2009.
[IK08] INCE S., KONRAD J.: Occlusion-aware view interpolation. EURASIP Journal on Image and Video Processing (2008).
[LASM08] LIPSKI C., ALBUQUERQUE G., STICH T., MAGNOR M.: Spacetime Tetrahedra: Image-Based Viewpoint Navigation through Space and Time. Tech. Rep. 2008-12-9, Computer Graphics Lab, TU Braunschweig, 2008.
[BBM∗01] BUEHLER C., BOSSE M., MCMILLAN L., GORTLER S., COHEN M.: Unstructured Lumigraph Rendering. In Proc. of ACM SIGGRAPH'01 (New York, 2001), ACM Press/ACM SIGGRAPH, pp. 425–432.
[LBE∗10] LIPSKI C., BOSE D., EISEMANN M., BERGER K., MAGNOR M.: Sparse Bundle Adjustment Speedup Strategies. In WSCG Short Papers Post-Conference Proceedings (Feb. 2010), Skala V. (Ed.), WSCG.
[LGJA09] LIU F., GLEICHER M., JIN H., AGARWALA A.: Content-preserving warps for 3D video stabilization. ACM Trans. Graph. 28, 3 (2009), 1–9.
[LH96] LEVOY M., HANRAHAN P.: Light Field Rendering. In Proc. of ACM SIGGRAPH'96 (New York, 1996), ACM Press/ACM SIGGRAPH, pp. 31–42.
[SLAM08] STICH T., LINZ C., ALBUQUERQUE G., MAGNOR M.: View and Time Interpolation in Image Space. Computer Graphics Forum (Proc. of PG'08) 27, 7 (2008).
[LLNM10] LIPSKI C., LINZ C., NEUMANN T., MAGNOR M.: High resolution image correspondences for video post-production. In CVMP 2010 (London, 2010).
[SLW∗08] STICH T., LINZ C., WALLRAVEN C., CUNNINGHAM D., MAGNOR M.: Perception-motivated Interpolation of Image Sequences. In Proc. of ACM APGV'08 (Los Angeles, USA, 2008), ACM Press, pp. 97–106.
[LLR∗10] LINZ C., LIPSKI C., ROGGE L., THEOBALT C., MAGNOR M.: Space-Time Visual Effects as a Post-Production Process. In ACM Multimedia 2010 Workshop 3DVP'10 (Firenze, Oct. 2010), to appear.
[LWS98] LEE S., WOLBERG G., SHIN S.: Polymorph: morphing among multiple images. IEEE Computer Graphics and Applications (1998), 58–71.
[MB95] MCMILLAN L., BISHOP G.: Plenoptic Modeling. In Proc. of ACM SIGGRAPH'95 (New York, 1995), ACM Press/ACM SIGGRAPH, pp. 39–46.
[MBR∗00] MATUSIK W., BUEHLER C., RASKAR R., GORTLER S., MCMILLAN L.: Image-Based Visual Hulls. In Proc. of ACM SIGGRAPH'00 (New York, 2000), ACM Press/ACM SIGGRAPH, pp. 369–374.
[MD99] MANNING R., DYER C.: Interpolating View and Scene Motion by Dynamic View Morphing. In Proc. of CVPR'99 (Washington, 1999), IEEE Computer Society, pp. 388–394.
[MHM∗09] MAHAJAN D., HUANG F., MATUSIK W., RAMAMOORTHI R., BELHUMEUR P.: Moving Gradients: A Path-Based Method for Plausible Image Interpolation. ACM Trans. on Graphics (2009), to appear.
[MMB97] MARK W., MCMILLAN L., BISHOP G.: Post-Rendering 3D Warping. In Proc. of Symposium on Interactive 3D Graphics (1997), pp. 7–16.
[MP04] MATUSIK W., PFISTER H.: 3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes. ACM Trans. on Graphics 23, 3 (2004), 814–824.
[MSMP08] MEYER B., STICH T., MAGNOR M., POLLEFEYS M.: Subframe Temporal Alignment of Non-Stationary Cameras. In Proc. British Machine Vision Conference BMVC'08 (2008).
[NTH02] NAEMURA T., TAGO J., HARASHIMA H.: Real-Time Video-Based Modeling and Rendering of 3D Scenes. IEEE Computer Graphics and Applications (2002), 66–73.
[SC08] SCHOENEMANN T., CREMERS D.: High Resolution Motion Layer Decomposition using Dual-space Graph Cuts. In Proc. of CVPR (Washington, 2008), IEEE Computer Society, pp. 1–7.
[SLW∗10] STICH T., LINZ C., WALLRAVEN C., CUNNINGHAM D., MAGNOR M.: Perception-motivated Interpolation of Image Sequences. ACM Transactions on Applied Perception (TAP) (2010), to appear.
[SSS06] SNAVELY N., SEITZ S. M., SZELISKI R.: Photo tourism: Exploring photo collections in 3D. ACM Trans. on Graphics 25, 3 (2006), 835–846.
[ST06] SAND P., TELLER S.: Particle Video: Long-Range Motion Estimation using Point Trajectories. In Proc. of CVPR'06 (Washington, 2006), IEEE Computer Society, pp. 2195–2202.
[VBK05] VEDULA S., BAKER S., KANADE T.: Image Based Spatio-Temporal Modeling and View Interpolation of Dynamic Events. ACM Trans. on Graphics 24, 2 (2005), 240–261.
[VGvT∗05] VAISH V., GARG G., TALVALA E.-V., ANTUNEZ E., WILBURN B., HOROWITZ M., LEVOY M.: Synthetic aperture focusing using a shear-warp factorization of the viewing transform. In Proc. Workshop on Advanced 3D Imaging for Safety and Security (2005), p. 129.
[WJV∗05] WILBURN B., JOSHI N., VAISH V., TALVALA E.-V., ANTUNEZ E., BARTH A., ADAMS A., HOROWITZ M., LEVOY M.: High Performance Imaging Using Large Camera Arrays. ACM Trans. on Graphics 24, 3 (2005), 765–776.
[WSY07] WANG H., SUN M., YANG R.: Space-Time Light Field Rendering. IEEE Trans. Visualization and Computer Graphics (2007), 697–710.
[WWC∗05] WASCHBÜSCH M., WÜRMLIN S., COTTING D., SADLO F., GROSS M.: Scalable 3D video of dynamic scenes. Vis. Comp. 21, 8 (2005), 629–638.
[WY05] WANG H., YANG R.: Towards space-time light field rendering. In Proc. of I3D'05 (New York, 2005), ACM Press, pp. 125–132.
[XCJ08] XU L., CHEN J., JIA J.: A Segmentation Based Variational Model for Accurate Optical Flow Estimation. In Proc. of ECCV'08 (Berlin, 2008), Springer, pp. 671–684.
[XS04] XIAO J., SHAH M.: Tri-View Morphing. Computer Vision and Image Understanding 96 (2004), 345–366.
[SD96] SEITZ S. M., DYER C. R.: View Morphing. In Proc. of ACM SIGGRAPH'96 (New York, 1996), ACM Press/ACM SIGGRAPH, pp. 21–30.
[YEBM02] YANG J. C., EVERETT M., BUEHLER C., MCMILLAN L.: A Real-Time Distributed Light Field Camera. In Proc. of EGRW'02 (2002), Eurographics Association, pp. 77–86.
[SG05] SI H., GAERTNER K.: Meshing Piecewise Linear Complexes by Constrained Delaunay Tetrahedralizations. In Proc. of International Meshing Roundtable (Berlin, September 2005), Springer, pp. 147–163.
[ZJK05] ZITNICK C., JOJIC N., KANG S. B.: Consistent Segmentation for Optical Flow Estimation. ICCV'05 2 (2005), 1308–1315.
[SGSS08] SNAVELY N., GARG R., SEITZ S. M., SZELISKI R.: Finding Paths through the World's Photos. ACM Trans. on Graphics 27, 3 (2008), 11–21.
[ZKU∗04] ZITNICK C., KANG S., UYTTENDAELE M., WINDER S., SZELISKI R.: High-Quality Video View Interpolation Using a Layered Representation. ACM Trans. on Graphics 23, 3 (2004), 600–608.
[SH07] STARCK J., HILTON A.: Surface Capture for Performance-Based Animation. IEEE Computer Graphics and Applications 27, 3 (2007), 21–31.
[ZWGS02] ZHANG Z., WANG L., GUO B., SHUM H.-Y.: Feature-based light field morphing. ACM Trans. on Graphics 21, 3 (2002), 457–464.
[SK09] SCHWARTZ C., KLEIN R.: Improving initial estimations for structure from motion methods. In The 13th Central European Seminar on Computer Graphics (CESCG 2009) (Apr. 2009).

Figure 12: Virtual Video Camera views: for each test scene, the virtual viewpoint at the barycenter of the enclosing navigation-space tetrahedron is rendered, representing the 'worst case interpolation'. As test scenes, we have deliberately chosen complex scenes that pose a variety of challenges: outdoor capture (Skateboarder), volumetric effects and high dynamic contrast (Firebreather), fast, non-rigid motion (Beer), reflection and refraction (Water), wide-angle lens distortion (Dancer), very fast scene motion (Breakdancer), and hand-held acquisition (Juggler).