The roots of image based rendering
Wolfgang Illmeyer, 532, 0326382
June 1, 2006
Abstract
When talking about rendering, especially in conjunction with widely available consumer 3D graphics accelerators and computer games, we almost always talk about geometry based approaches. There is, however, also the image based approach, which is based on light rays. This paper describes the roots of image based rendering up to the papers which uncovered the full potential of image based rendering techniques: Levoy's and Hanrahan's "Light Field Rendering" [Levoy '96] and Gortler et al.'s "The Lumigraph" [Gortler '96].
Contents

1 Image based rendering
  1.1 About image based rendering
  1.2 Traditional / image based rendering tradeoff
2 Sprites
  2.1 About sprites
  2.2 Sprites in hardware
    2.2.1 The C64 and the VIC-II Chip
    2.2.2 The Amiga and the Agnus Chip
    2.2.3 Sprites in OpenGL based accelerators
  2.3 Sprites and image based rendering
3 QuickTime VR
  3.1 About QuickTime VR
  3.2 Panoramic images in QuickTime VR ("Panorama movies")
    3.2.1 Creation
    3.2.2 Storage
    3.2.3 Viewing
  3.3 3D object viewing in QuickTime VR ("Object movies")
    3.3.1 Creation
    3.3.2 Storage
    3.3.3 Viewing
  3.4 QuickTime VR and image based rendering
4 The plenoptic function
5 Light field rendering
  5.1 The light slab
  5.2 Light field creation
  5.3 Light field storage
  5.4 Light field display
6 Lumigraph
  6.1 Free handed camera
  6.2 Surface approximation
  6.3 Depth correction

1 Image based rendering

1.1 About image based rendering

Traditional geometry based rendering approaches rely on a model of the geometry to be drawn in some way. This is often a problem when an object in the real world needs to be modeled. There are multiple approaches, like coordinate measuring machines (CMM) or laser scanners, but they only deliver surface vertices, so it is still a long way until there is a complete model of the object. The surface has to be recreated, either by adding edges for a polygonal model or by creating some kind of spline patches to describe it. And there is still no texture at that point.

Image based techniques allow for easier modeling of real world objects. To create a model for image based rendering, we just need a set of photos of the object, captured at specific positions around it. The resulting data still has to be processed for reasons of storage space and rendering speed, but the model is already complete at that point.

Back in the days when polygonal 3D accelerators were not widely available, "scaled down" image based rendering approaches were used a lot in consumer hardware. Virtually every computer game uses sprites in some way. Aside from games, there is also Apple's QuickTime VR, which was designed to deliver a "virtual reality" experience on consumer devices with limited computing power and without any special acceleration or input hardware. Both are image based rendering approaches, because no geometry data is used for rendering in either case.

1.2 Traditional / image based rendering tradeoff

Whereas geometry based rendering relies heavily on computing power for better image quality or more complex models, this is not the case with image based rendering approaches. If we want better image quality, we have to improve the quality of the source image data somehow. The computing power needed for one image of a given size is always the same, no matter how complex the scene is. The only exception is the choice of interpolation technique: to prevent aliasing, we can interpolate in 2D or 4D, which of course slows down the rendering process. However, image based rendering techniques do rely heavily on storage space. If we want to render larger images, we need source material in higher resolutions, which is limited mostly by the available RAM.

2 Sprites

2.1 About sprites

Sprites are small rectangular bitmaps, which can be partially transparent. They contain no geometry data. Sprites saw heavy use in computer games, but they are also used for special effects in movies. Sprites have the advantage that they can be accelerated very easily, and therefore many home computers and gaming consoles supported accelerated sprite rendering. Dedicated sprite support has become less important in recent years, because nearly every device has polygonal rendering acceleration, which covers sprites as a special case. Also, computing power is no longer an issue these days, so acceleration would not even be needed for sprite rendering.

Sprites are used when classic polygonal rendering delivers bad results. In terms of image quality, the best example are flames: flames look quite strange when rendered as polygons, while they have a quite natural look when rendered as a sprite. When there are a lot of "unimportant" geometry details, as is the case with grass in outdoor scenes, polygonal rendering would be too slow. The viewer does not pay attention to every single blade of grass, so it can be rendered more efficiently using sprites, with a fraction of the polygons.

2.2 Sprites in hardware

2.2.1 The C64 and the VIC-II Chip

The C64 was a popular 8-bit home computer by Commodore. For rendering sprites, it used the "VIC-II" chip by MOS Technology [Bauer '96]. Up to eight sprites could be defined in the 16 KB bank of system RAM that the VIC-II can address. The chip only needed to be told which of the sprites should be drawn where, and all of that could be changed on every scanline. The VIC-II had no framebuffer; it rendered directly to the PAL/NTSC output.

2.2.2 The Amiga and the Agnus Chip

The Amiga [Wikipedia], also sold by Commodore, was another popular home computer. It was heavily used for video editing. The Amiga's graphics chip, called "Denise", used a simple kind of framebuffer, which could be accessed by the "Agnus" chip, which controlled the RAM. "Agnus" contained a special blitting unit, which enabled it to copy sprite data into the framebuffer while the CPU could do something else.

2.2.3 Sprites in OpenGL based accelerators

Today's 3D accelerators can draw sprites as fast as any polygon. For OpenGL based accelerators, sprites are just quadrilaterals with normals facing the viewer. This technique is called "billboarding" in the context of 3D scenes.
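
To make this concrete, here is a minimal sketch of how such a billboarded quad could be built on the CPU from the camera's right and up vectors taken from the view matrix. The small Vec3 type and the helper names are assumptions for illustration, not part of OpenGL itself.

    #include <array>

    struct Vec3 { float x, y, z; };

    static Vec3 add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

    // Build the four corners of a screen-aligned quad ("billboard") centered at
    // `center`. `camRight` and `camUp` are the camera's right/up vectors taken
    // from the view matrix, so the quad always faces the viewer.
    std::array<Vec3, 4> billboardQuad(Vec3 center, Vec3 camRight, Vec3 camUp,
                                      float width, float height)
    {
        Vec3 r = scale(camRight, width * 0.5f);
        Vec3 u = scale(camUp,    height * 0.5f);
        return {
            add(add(center, scale(r, -1.0f)), scale(u, -1.0f)),  // bottom left
            add(add(center, r),               scale(u, -1.0f)),  // bottom right
            add(add(center, r),               u),                // top right
            add(add(center, scale(r, -1.0f)), u)                 // top left
        };
    }

The four corners would then be drawn as a textured quad (two triangles) with the sprite image as texture and alpha blending enabled for the transparent parts.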

2.3 Sprites and image based rendering

A sprite is exactly as powerful as any captured image: there is only one viewpoint for the scene, and only one viewing direction. This is actually the most basic form of image based rendering. We can add more viewpoints by making multiple images/sprites of the same object. This was actually used in "Doom" to simulate different views of the monsters. Besides that, we can also perform simple transforms on sprites, which are not supported by all of the acceleration methods mentioned above. Sprites can be rotated around the viewing axis, and they can be zoomed. Other transforms (e.g. shear) are possible too, but in the context of image based rendering they are not significant.
Figure 1: Example sprites of Doom by id Software for different viewpoints

3 QuickTime VR

3.1 About QuickTime VR

QuickTime VR [Chen '95] was created by Apple Computer Inc. as a way to provide a cheap virtual reality extension to their "QuickTime" multimedia framework. "Virtual reality" in this context means being able either to rotate the camera at a fixed viewpoint, so that one can almost freely look around in a pre-captured environment, or to rotate the camera around an object. QuickTime VR provides complete means to create, distribute and view such scenes. It includes software for authoring, streaming and displaying virtual reality content. The most important design goal of QuickTime VR was to produce an affordable system, so that the rendering could take place on normal personal computers, without the need for too much storage space or an especially fast CPU. Because 3D input and output devices are also quite expensive, there is no support for any 3D helmets or data gloves. These requirements impose restrictions on what can be done in QuickTime VR.

3.2 Panoramic images in QuickTime VR ("Panorama movies")

Figure 2: Example cylinder environment map stitched together from multiple photos

3.2.1 Creation

There are several ways to create panoramic pictures. We can stitch together multiple images captured with a rotatable camera whose rear nodal point of the lens lies on the rotation axis. For this purpose, a so-called "fish eye" lens can be used. Fish eye lenses have a field of view of about 180 degrees, so only two images must be captured for a full panorama. Another possibility is to use special panoramic cameras.

To create a model of the panorama, the image material needs to be projected onto a surface which can be easily parameterized, like a sphere, a cube or a cylinder. QuickTime VR uses a cylinder without caps. This limits the vertical viewing angle, because there are no caps, but it makes projection, reprojection and storage easier and faster.

3.2.2 Storage

What essentially needs to be stored for a panoramic image is the cylinder, which can be cut and unrolled into a plane, so it is actually just an image of a cylindric environment map. In QuickTime, this image is split into tiles, and each tile is saved as a separate, compressed image in a QuickTime video track.

3.2.3 Viewing

To view a panoramic image, the required tiles are extracted from the video track. Then they are reprojected onto the screen. Tiles adjacent to the viewing direction are preloaded and cached in order to provide realtime viewing.
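
The reprojection itself can be sketched as follows. This is not Apple's implementation, just an illustration of the idea: for every screen pixel, compute the viewing ray for the current pan angle, intersect it with the unit cylinder and sample the unrolled environment map there. A single grayscale channel and a vertical cylinder extent of -1..+1 are simplifying assumptions.

    #include <cmath>
    #include <cstdint>

    // Sketch of cylindrical reprojection: for every screen pixel, find the
    // viewing ray for the current pan angle and field of view, intersect it
    // with the unit cylinder and sample the unrolled cylinder image.
    void renderPanoramaView(const uint8_t* envMap, int envW, int envH,   // unrolled cylinder map
                            uint8_t* screen, int scrW, int scrH,         // output image
                            float pan, float fovX)                       // view direction / horizontal FOV
    {
        const float pi = 3.14159265358979f;
        // Distance of the virtual image plane so that scrW pixels span fovX.
        float planeDist = (scrW * 0.5f) / std::tan(fovX * 0.5f);

        for (int y = 0; y < scrH; ++y) {
            for (int x = 0; x < scrW; ++x) {
                // Ray through this pixel in camera space (camera looks along +z).
                float dx = x - scrW * 0.5f;
                float dy = y - scrH * 0.5f;
                float dz = planeDist;

                // Horizontal angle on the cylinder = pan + angle of the ray.
                float theta = pan + std::atan2(dx, dz);
                // Height where the ray hits the unit-radius cylinder
                // (assumed to cover heights -1..+1 in the environment map).
                float h = dy / std::sqrt(dx * dx + dz * dz);

                // Map (theta, h) to environment map coordinates.
                float frac = theta / (2.0f * pi) - std::floor(theta / (2.0f * pi));
                int u = (int)(frac * envW);
                int v = (int)((h * 0.5f + 0.5f) * (envH - 1));
                if (v < 0)      v = 0;
                if (v >= envH)  v = envH - 1;

                screen[y * scrW + x] = envMap[v * envW + u];
            }
        }
    }

In the real viewer, only the tiles actually intersected by the view frustum would be decompressed and looked up in this way.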
Figure 3: Reprojection using a cube and a sphere

3.3 3D object viewing in QuickTime VR ("Object movies")

3.3.1 Creation

Object movies are created by taking photos of an object from different angles at a fixed distance from its center. We could, for example, take a photo every 10 degrees vertically and horizontally. This is achieved with a special apparatus consisting of a rotatable platform and a meridian on which the camera can be mounted at different positions. For object movies, lighting is especially important. Lighting can either be fixed relative to the camera's position, or relative to the object's rotation. With camera-attached lighting, the viewer gets the impression of standing at a fixed position while rotating the object, whereas with object-attached lighting, the viewer seems to walk around the object while it stays in a fixed place.

3.3.2 Storage

QuickTime VR uses the same storage pattern for object movies as for panorama movies, but this time there already are multiple pictures, so nothing needs to be split into tiles.

3.3.3 Viewing
For viewing, the corresponding frame is extracted from the video and shown.
Adjacent frames can be preloaded and cached to allow for realtime viewing.
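
Assuming the capture pattern described above (one photo every 10 degrees), the mapping from viewing angles to a frame index could be sketched like this; the angular step and the row-by-row frame layout are assumptions for illustration, not QuickTime VR's actual file format.

    #include <cmath>

    // Map a desired viewing direction (pan, tilt in degrees) to the index of the
    // nearest captured frame, assuming one photo every `step` degrees and frames
    // stored row by row (all pans for the first tilt, then the next tilt, ...).
    int objectMovieFrame(float panDeg, float tiltDeg, float step = 10.0f)
    {
        int pansPerRow = (int)std::round(360.0f / step);      // e.g. 36 pan positions
        int tiltRows   = (int)std::round(180.0f / step) + 1;  // e.g. 19 tilt positions

        int pan = (int)std::round(panDeg / step) % pansPerRow;
        if (pan < 0) pan += pansPerRow;

        int tilt = (int)std::round((tiltDeg + 90.0f) / step); // tilt in [-90, +90]
        if (tilt < 0)          tilt = 0;
        if (tilt >= tiltRows)  tilt = tiltRows - 1;

        return tilt * pansPerRow + pan;
    }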

3.4 QuickTime VR and image based rendering

Panorama movies go one step beyond sprite rendering. There is still only one possible viewpoint, but the viewing direction can be chosen freely (with the exception of the vertical viewing angle, because the caps of the cylinder map are not saved). There can be multiple viewing points, each with its own panorama. In QuickTime VR, panoramas can be linked together, so that the viewer can click on the next viewing point in the panorama to advance. The images could also be rotated around the viewing axis, but this is not supported by QuickTime VR.

The object movie approach is essentially equal to sprite rendering. Rotation around the viewing axis would be possible here too, but is not implemented in QuickTime VR.

4 The plenoptic function

The plenoptic function is the basis of any image based rendering approach. The name comes from Latin, with "plenus" meaning "complete" or "full" and "optic" meaning "pertaining to vision". It describes the radiance of light from any given direction (θ, φ) at any given viewpoint (Vx, Vy, Vz) at any time t and at any wavelength λ, independent of the point from which the radiance originates.

R = P(θ, φ, λ, Vx, Vy, Vz, t)    (1)

For affordable image based rendering approaches, the plenoptic function needs to be sampled in an efficient way. Current approaches reduce the plenoptic function so that it includes neither time nor the individual wavelengths. The reduced plenoptic function looks like this:

RGB = P(θ, φ, Vx, Vy, Vz)    (2)
Figure 4: Illustration of the plenoptic function

Of the original 7 dimensions, only 5 are left, and the function returns an RGB value rather than the radiance of a single wavelength. However, for practical applications this parameterization is still not feasible, because 5D sampling generates far too much data. Current applications therefore place an additional restriction on the scene: when all light rays originate from a convex space and the viewpoint is outside of that space (in "occluder free space"), a 4D parametrization of the plenoptic function is sufficient.
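
Written as (hypothetical) function signatures, the reduction from the full to the practical form looks like this:

    struct RGB { float r, g, b; };

    // Full plenoptic function: radiance of wavelength `lambda` arriving at the
    // viewpoint (vx, vy, vz) from direction (theta, phi) at time t.
    using PlenopticFunction = float (*)(float theta, float phi, float lambda,
                                        float vx, float vy, float vz, float t);

    // Reduced 5D form used in practice: time and wavelength dropped, an RGB
    // triple returned instead of spectral radiance.
    using ReducedPlenoptic = RGB (*)(float theta, float phi,
                                     float vx, float vy, float vz);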

5 Light field rendering

The light slab structure provides means for creating, storing and rendering pre-captured objects in an efficient way with image based rendering methods. It offers a freely choosable point and direction of view. However, the lighting of the model is static and there is no concept of time, so the model can neither be lit dynamically nor animated in any way.

The Light Field paper not only describes the light slab structure, but also suggests methods for creating, storing and rendering light fields.

5.1 The light slab

The light slab is a structure to efficiently store a subset of the plenoptic function. It consists of two planes which are parameterized by (u, v) and (s, t). Every ray of light that passes through both planes can be stored in the light slab and can be referenced by (u, v, s, t), where (u, v) and (s, t) are the intersection points with the two planes.
Figure 5: The light slab representation

Using light slabs, we only have a 4D parameterization of the "ray space", in contrast to the 5D parameterization of the reduced plenoptic function. This requires unobstructed, free space, though. To create a light field model of a given object, we have to enclose it in light slabs, so that it can be viewed from any point outside the model.
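
A minimal sketch of such a structure, assuming a regular discretization of both planes; the class name and memory layout are illustrative, not taken from the paper.

    #include <vector>

    struct RGB8 { unsigned char r, g, b; };

    // A light slab: every ray is addressed by its intersection with the (u,v)
    // plane and the (s,t) plane, sampled on a regular grid.
    class LightSlab {
    public:
        LightSlab(int nu, int nv, int ns, int nt)
            : nu_(nu), nv_(nv), ns_(ns), nt_(nt),
              samples_(static_cast<std::size_t>(nu) * nv * ns * nt) {}

        // Radiance of the ray through grid point (u,v) on the first plane and
        // (s,t) on the second plane.
        RGB8& at(int u, int v, int s, int t) {
            return samples_[((static_cast<std::size_t>(u) * nv_ + v) * ns_ + s) * nt_ + t];
        }

    private:
        int nu_, nv_, ns_, nt_;
        std::vector<RGB8> samples_;
    };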

5.2 Light field creation

Light fields can be created from virtual scenes as well as from photographs of real objects. For synthetic scenes, the raytracer software needs to be modified slightly to output the light slabs directly, by tracing every ray that crosses any pair of discrete sampling points of the (u, v) and (s, t) planes. If the synthetic scene is not rendered using raytracing, we can produce a set of 2D images instead. The virtual camera is placed at every grid point on the (u, v) plane and looks at the (s, t) plane. However, the pixels of the rendered image must correspond exactly to the points on the (s, t) plane. This can be achieved with a sheared perspective projection. The resulting 4D light slab can be visualized either as an array of (u, v) images of size s · t, or the other way round as an array of (s, t) images of size u · v, as seen in Figure 6.
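
Such a sheared (off-axis) perspective projection can be sketched as a glFrustum-style matrix whose frustum is shifted so that the rendered image covers exactly the (s, t) rectangle; the parameter names are illustrative, not from the paper.

    #include <array>

    using Mat4 = std::array<float, 16>;  // column-major, as in OpenGL

    // Off-axis ("sheared") perspective projection in the spirit of glFrustum.
    // The camera sits at (u, v) on the (u,v) plane; the (s,t) plane lies at
    // distance `dist` in front of it and spans [sMin,sMax] x [tMin,tMax].
    Mat4 shearedProjection(float u, float v,
                           float sMin, float sMax, float tMin, float tMax,
                           float dist, float zNear, float zFar)
    {
        // Project the (s,t) rectangle onto the near plane, relative to the camera.
        float l = (sMin - u) * zNear / dist;
        float r = (sMax - u) * zNear / dist;
        float b = (tMin - v) * zNear / dist;
        float t = (tMax - v) * zNear / dist;

        Mat4 m{};  // zero-initialized
        m[0]  = 2.0f * zNear / (r - l);
        m[5]  = 2.0f * zNear / (t - b);
        m[8]  = (r + l) / (r - l);          // the shear terms
        m[9]  = (t + b) / (t - b);
        m[10] = -(zFar + zNear) / (zFar - zNear);
        m[11] = -1.0f;
        m[14] = -2.0f * zFar * zNear / (zFar - zNear);
        return m;
    }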

To create a light field of a real world object, it needs to be photographed from different known positions in order to create light slabs. The Light Field paper suggests using a special gantry which can move the camera horizontally and vertically on a plane and additionally adjust the pitch and yaw of the camera so that it always points at the center of the object. This ensures full coverage of the whole (u, v) and (s, t) planes. After the whole plane of the gantry has been scanned, the object and its lighting are rotated by 90 degrees and the whole process is repeated until four complete light slabs have been captured.

Figure 6: Visual representations of a light slab.

Figure 7: Gantry for creating a light field

5.3 Light field storage

As a light slab is actually a 4D image, it naturally uses a lot of storage space. The largest example of a light field in the paper takes 1.6 GB in uncompressed form. The data in light slabs is also highly redundant, so a compression scheme is needed. The authors set up a compression pipeline for light fields consisting of lossy and lossless compression schemes, which allows for a compression ratio of about 120:1.

First, the light field array is compressed with vector quantization. For this, the light slabs are split into either 2D or 4D tiles, yielding 12- or 48-dimensional vectors. A representative subset of the tiles is then used for training, which means generating a small set of vectors which matches the training set with the least mean squared error. This representative set is called the "codebook". Once the codebook is generated, compression can start: each tile of the light slab is replaced by the index of the codebook entry which it matches best. The paper uses 14 bit indices which are padded up to 16 bit to simplify the decoding process. In this first compression stage, they achieve a compression ratio of 24:1.
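
The encoding half of such a vector quantizer could be sketched as follows; codebook training itself is omitted and the tile layout is illustrative.

    #include <cstddef>
    #include <cstdint>
    #include <limits>
    #include <vector>

    // One tile of the light slab flattened into a vector, e.g. a 2x2 tile of
    // RGB samples = 12 floats (a 2x2x2x2 4D tile of RGB would give 48 floats).
    using Tile = std::vector<float>;

    // Replace each tile by the index of the codebook entry with the smallest
    // squared distance to it. The codebook is assumed to fit into 16-bit
    // indices, matching the paper's 14-bit indices padded to 16 bit.
    std::vector<uint16_t> vqEncode(const std::vector<Tile>& tiles,
                                   const std::vector<Tile>& codebook)
    {
        std::vector<uint16_t> indices;
        indices.reserve(tiles.size());

        for (const Tile& tile : tiles) {
            float bestDist = std::numeric_limits<float>::max();
            uint16_t bestIdx = 0;

            for (std::size_t c = 0; c < codebook.size(); ++c) {
                float d = 0.0f;
                for (std::size_t k = 0; k < tile.size(); ++k) {
                    float diff = tile[k] - codebook[c][k];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; bestIdx = static_cast<uint16_t>(c); }
            }
            indices.push_back(bestIdx);
        }
        return indices;
    }

Decoding is then just an index lookup into the codebook, which is what makes per-ray access during rendering cheap (see section 5.4).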

The result of the vector quantization still has a lot of redundancy, for example in regions of constant background color. The second step eliminates that redundancy by entropy coding: the gzip implementation of Lempel-Ziv coding is used. gzip can typically reduce the size of the light slab indices by another 5:1.

5.4 Light field display
Figure 8: Rendering an image from a light slab

To view a light field, it has to be loaded into memory. First, the gzip compressed indices and the codebook are decompressed and stored in RAM. The renderer then traces a ray from the eyepoint through the view plane and through the (u, v) and (s, t) planes. Then it looks up the codewords it needs for the current ray in the codeword array and gets the corresponding vectors from the codebook.

Due to the discrete nature of the (u, v) and (s, t) sampling points, adjacent rays have to be interpolated when rendering an image to prevent aliasing effects. This can be done in the (u, v) plane only, but quadrilinear interpolation in both planes (u, v, s, t) delivers the best results.
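
A sketch of the quadrilinear lookup, assuming a decoded light slab that can be sampled at integer grid coordinates; the sample callback is a hypothetical interface, not the paper's data layout.

    #include <cmath>
    #include <functional>

    struct RGBf { float r, g, b; };

    // Quadrilinear interpolation of a light slab at a fractional (u, v, s, t)
    // position: linear interpolation in all four ray-space dimensions, i.e. a
    // weighted sum over the 16 surrounding grid samples.
    RGBf sampleLightField(const std::function<RGBf(int, int, int, int)>& sample,
                          float u, float v, float s, float t)
    {
        int u0 = (int)std::floor(u), v0 = (int)std::floor(v);
        int s0 = (int)std::floor(s), t0 = (int)std::floor(t);
        float fu = u - u0, fv = v - v0, fs = s - s0, ft = t - t0;

        RGBf result{0.0f, 0.0f, 0.0f};
        for (int i = 0; i < 16; ++i) {
            int du = (i >> 0) & 1, dv = (i >> 1) & 1;
            int ds = (i >> 2) & 1, dt = (i >> 3) & 1;
            float w = (du ? fu : 1.0f - fu) * (dv ? fv : 1.0f - fv)
                    * (ds ? fs : 1.0f - fs) * (dt ? ft : 1.0f - ft);
            RGBf c = sample(u0 + du, v0 + dv, s0 + ds, t0 + dt);
            result.r += w * c.r;
            result.g += w * c.g;
            result.b += w * c.b;
        }
        return result;
    }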

6 Lumigraph

The Lumigraph paper by Gortler et al. [Gortler '96] discusses ideas similar to the Light Field paper and introduces a system for capturing real world objects more easily for the creation of a light field, or Lumigraph, as they call it. They eliminate the need for computer-controlled or computer-assisted camera posing in favor of a hand-held camera, so that bigger objects can also be captured. The Lumigraph also introduces depth correction using surface approximation, and a method to fill gaps in the model where too little data has been captured.

6.1 Free handed camera

To allow the use of a free handed camera, it must be calibrated. This is a two step process. First there is the intrinsic parameter calibration, which means positioning all rays that the camera captures relative to the viewpoint. For the Lumigraph system, a camera with a fixed lens is used, so the intrinsic parameter calibration has to be performed only once.

The extrinsic parameter calibration extracts the pose of the camera (its position and orientation) from the image. Because the pose of the camera changes on every frame, this step is performed repeatedly. To assist this process, images are taken on a special stage. It consists of three fixed, orthogonal walls. The walls are all colored in cyan to assist the surface approximation stage, and for parameter calibration the walls bear 30 markers, each consisting of up to three concentric circles of different sizes. The markers are used for both intrinsic and extrinsic parameter calibration; for extrinsic calibration, only 8 markers need to be visible at any time. Using this stage, the whole upper hemisphere around the object can be captured.

Figure 9: Markers for camera calibration on a stage for data acquisition

6.2 Surface approximation

The Lumigraph system uses surface approximation for the following depth correction step. The surface approximation is achieved by constructing an octree. For this purpose, a subset of the captured images is segmented into object and background. The segmented images are then used together with their camera pose information to cut the shape of the object out of the octree. After about 4 subdivisions of the octree, the resulting 3D model can be hollowed and passed on to depth correction.
Figure 10: Segmented image and surface approximation
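
This carving of the volume can be sketched with a regular voxel grid instead of an explicit octree; the projectToImage interface is a hypothetical stand-in for the calibrated camera projection, not the paper's actual code.

    #include <vector>

    struct Vec3 { float x, y, z; };

    // A segmented input image: true = object, false = background, plus a
    // function that projects a world-space point into that image using the
    // camera pose recovered during calibration. Returns false if the point
    // falls outside the image.
    struct SegmentedView {
        std::vector<bool> mask;
        int width = 0, height = 0;
        bool (*projectToImage)(Vec3 worldPoint, int& px, int& py) = nullptr;
    };

    // Carve a voxel grid: a cell survives only if every view that sees it
    // classifies its center as "object". An octree refines the same idea by
    // subdividing only the boundary cells.
    void carveVoxels(std::vector<bool>& occupied,               // in/out, size n^3
                     int n, float cellSize, Vec3 origin,
                     const std::vector<SegmentedView>& views)
    {
        for (int z = 0; z < n; ++z)
            for (int y = 0; y < n; ++y)
                for (int x = 0; x < n; ++x) {
                    Vec3 center{origin.x + (x + 0.5f) * cellSize,
                                origin.y + (y + 0.5f) * cellSize,
                                origin.z + (z + 0.5f) * cellSize};
                    bool inside = true;
                    for (const SegmentedView& view : views) {
                        int px, py;
                        if (view.projectToImage(center, px, py) &&
                            !view.mask[py * view.width + px])
                        {
                            inside = false;  // some view sees background here
                            break;
                        }
                    }
                    occupied[(z * n + y) * n + x] = inside;
                }
    }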

6.3 Depth correction

If we wanted to approximate the ray (s, u) in a 2D Lumigraph (see Figure 11), we would most likely choose the ray (s_{i+1}, u_p) for looking up the color of the corresponding pixel, because it is closest to (s, u). However, given the depth information z extracted from the previous step, we can see that, for example, the ray (s_i, u') is much nearer to the point on the surface where (s, u) intersects the object than (s_{i+1}, u_p) is.

We can calculate a u' for any s_i so that (s_i, u') intersects the surface at the same place as the original ray. If z is normalized so that z = 0 means the object surface lies on the (u, v) plane and z = 1 means it lies on the (s, t) plane, then u' can be calculated as follows:

u' = u + (s − s_i) · z / (1 − z)
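
As a tiny helper, the same correction in code (assuming z < 1, i.e. the surface does not lie exactly on the (s, t) plane):

    // Depth-corrected lookup coordinate on the (u,v) plane: given a desired
    // ray (s, u), the nearest sampled grid line s_i and the normalized depth z
    // of the surface point hit by the ray (z = 0 on the (u,v) plane, z = 1 on
    // the (s,t) plane), return the u' with which (s_i, u') hits the same
    // surface point. Assumes z < 1.
    float depthCorrectedU(float s, float u, float si, float z)
    {
        return u + (s - si) * z / (1.0f - z);
    }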
Figure 11: Depth correction

References

[Levoy '96] Marc Levoy, Pat Hanrahan. Light Field Rendering. Computer Graphics Proceedings, Annual Conference Series (Proc. SIGGRAPH '96), pages 31-42, 1996.

[Gortler '96] Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, Michael F. Cohen. The Lumigraph. Computer Graphics Proceedings, Annual Conference Series (Proc. SIGGRAPH '96), pages 43-54, 1996.

[McMillan & Bishop '95] Leonard McMillan, Gary Bishop. Plenoptic Modeling. Computer Graphics Proceedings, Annual Conference Series (Proc. SIGGRAPH '95), pages 39-46, 1995.

[Chen '95] Shenchang Eric Chen, Apple Computer Inc. QuickTime VR - an image-based approach to virtual environment navigation. Computer Graphics Proceedings, Annual Conference Series (Proc. SIGGRAPH '95), pages 29-38, 1995.

[Bauer '96] Christian Bauer. The MOS 6567/6569 video controller (VIC-II) and its application in the Commodore 64, 1996. http://www.minet.uni-jena.de/~andreasg/c64/vic_artikel/vic_article_1.htm

[Wikipedia] Wikipedia: Amiga. http://de.wikipedia.org/w/index.php?title=Amiga&id=17159946