Streaming of photo-realistic texture mapped on 3D surface
S. Horbelt, F. Jordan, T. Ebrahimi
Signal Processing Laboratory, Swiss Federal Institute of Technology,
CH 1015, Lausanne,
Switzerland
Email: [email protected], [email protected], [email protected]
ABSTRACT
We present a novel technique for efficient coding of texture to be mapped on 3D landscapes. The technique enables streaming of the data across the network using a back-channel.
1. INTRODUCTION
A new method to obtain an optimal coding of texture and 3D data is presented. It consists of taking the viewing position in the 3D virtual world into account in order to transmit only the most visible information. This approach greatly reduces the information transmitted between a remote database and a user, given that a back-channel is available. At a given time, only the most important information is sent, depending on the object geometry and viewpoint displacement.
This basic principle is applied to the texture data to
be mapped on a 3D grid mesh.
Section 2 explains which criteria can be used to
reduce the texture information. Section 3 reports
how these criteria can be applied in a DCT-based
(Discrete Cosine Transform) encoder. Sections 4 and 5 provide results with the proposed technique applied to landscape and spherical meshes.
Figure 1: Depth effect
2. CRITERIA
2.1. Principle
The 3D relative position between the viewpoint and the mesh can be used to reduce the texture information. Two criteria are used: depth and tilting.
2.2. Depth criterion
If the viewpoint is sufficiently far away from the mesh, a down-scaled version of the original texture can be transmitted without any visible alteration of the rendered image (Figure 1).
2.3. Tilting effect
Let us consider the case of a planar surface tilted around the y-axis and projected onto the x-y plane. In orthogonal projection, the texture can be down-scaled in the x-direction depending on the tilting angle. For instance, with a 60° tilt, only half of the texture information really needs to be transmitted (Figure 2).
Figure 2: Tilting effect, cos α = 0.5
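Combining the two criteria, the fraction of the texture resolution needed per cell can be estimated. A minimal sketch, with hypothetical names and a hypothetical full_res_depth tuning constant (the paper does not specify this exact formula):

```python
import math

def texture_scale(depth, tilt_deg, full_res_depth=5.0):
    """Fraction of the full-resolution texture needed (0..1].

    depth          -- distance from viewpoint to the surface
    tilt_deg       -- tilt angle alpha between surface and projection plane
    full_res_depth -- distance at which the full resolution is needed
                      (hypothetical tuning constant)
    """
    # Depth criterion: a surface twice as far needs half the resolution.
    depth_factor = min(1.0, full_res_depth / depth)
    # Tilting criterion: a tilt of alpha shrinks the projected width by
    # cos(alpha); e.g. at 60 degrees only half the texels are needed.
    tilt_factor = math.cos(math.radians(tilt_deg))
    return depth_factor * tilt_factor

# At full-resolution distance but with a 60-degree tilt, only half
# of the texture information really needs to be transmitted:
print(texture_scale(5.0, 60.0))   # ~0.5
```

The two factors are independent, so they simply multiply; a distant and tilted cell needs correspondingly less data.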
2.4. Rotation
Let us call α the tilting angle. The rotation angle β of the plane determines in which direction the tilting criterion should be applied (Figure 3).
Figure 3: Orientation of the plane
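For a mesh cell, α and β can be derived from the cell normal and the viewing direction. A hypothetical sketch (the exact angle conventions are our assumption, not given in the paper):

```python
import math

def tilt_and_rotation(normal, view_dir):
    """Return (alpha, beta) in degrees for one mesh cell.

    normal   -- unit surface normal of the cell
    view_dir -- unit vector from the cell towards the viewpoint
    Assumed convention: alpha is the angle between normal and viewing
    direction; beta is the orientation of the normal's projection in
    the screen x-y plane.
    """
    dot = sum(n * v for n, v in zip(normal, view_dir))
    # Clamp against floating-point drift before acos.
    alpha = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    beta = math.degrees(math.atan2(normal[1], normal[0]))
    return alpha, beta

# A cell seen head-on is not tilted at all:
print(tilt_and_rotation((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))[0])  # 0.0
```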
3. TEXTURE DATA REDUCTION PRINCIPLE
3.1. Generalization to a regular grid mesh
In the case of a grid mesh on which a large texture is mapped, the same principle can be applied to each of the cells which define the mesh. In the following, it is assumed that the number of cells and the texture size are such that a texture block of 8×8 texels is mapped on each cell.
3.2. Coding
Each 8×8 block is DCT transformed. It is then possible to remove, or "shrink", some of the DCT coefficients according to the α and β values, as shown in Figure 4. The DCT coefficients are then quantized and coded using the MPEG-4 Video VM algorithm.
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0
1 1 1 1 1 1 0 0
1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Figure 4: Shrinking DCT coefficients (example 8×8 mask: 1 = coefficient kept, 0 = removed; the cut-off is scaled by cos(α) and oriented according to β)
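Such a keep/remove mask can be generated programmatically. The following is an illustrative sketch only, not the normative MPEG-4 procedure: coefficient (u, v) is kept when its frequency component along the tilt direction (given by β) stays below a cut-off shrunk by cos α.

```python
import math

N = 8  # block size: one 8x8 texel block is mapped on each mesh cell

def shrink_mask(alpha_deg, beta_deg):
    """Binary keep/remove mask for the 8x8 DCT coefficients of a block.

    Sketch only: the frequency band along the tilt direction is cut at
    cos(alpha) of the full band; the perpendicular direction keeps the
    full band.
    """
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    sb = math.sin(math.radians(beta_deg))
    mask = [[0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            f_tilt = abs(u * cb + v * sb)    # frequency along tilt direction
            f_perp = abs(-u * sb + v * cb)   # frequency perpendicular to it
            if f_tilt <= ca * (N - 1) and f_perp <= (N - 1):
                mask[v][u] = 1
    return mask

# With a 60-degree tilt along the u-axis, only half of the horizontal
# frequencies survive:
for row in shrink_mask(60.0, 0.0):
    print("".join(str(b) for b in row))   # eight rows of 11110000
```

Rotating β by 90° moves the cut-off to the vertical frequencies instead, which is the role of the rotation criterion of section 2.4.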
3.3. Streaming texture data
Once the texture data needed for the first frame (intra data) has been loaded using the proposed method, it is straightforward to send new DCT coefficients (inter data) as the observer moves in the scene. It is therefore possible to update the texture information, according to the new viewer position, over a very low bit-rate channel.
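The inter update can be sketched as a per-block difference between the old and the new coefficient masks; only coefficients that the new viewpoint unmasks are sent (hypothetical helper, entropy coding of the values omitted):

```python
def coefficients_to_send(dct_block, old_mask, new_mask):
    """Inter update for one 8x8 block.

    dct_block -- 8x8 quantized DCT coefficients of the block
    old_mask, new_mask -- 8x8 binary masks (1 = coefficient visible)
    Returns the (u, v, value) triplets for coefficients that became
    visible since the last update; already-sent ones are skipped.
    """
    update = []
    for v in range(8):
        for u in range(8):
            if new_mask[v][u] and not old_mask[v][u]:
                update.append((u, v, dct_block[v][u]))
    return update

# As the observer zooms in, the mask widens and only the newly
# visible higher frequencies are streamed:
old = [[1 if u < 2 else 0 for u in range(8)] for v in range(8)]
new = [[1 if u < 3 else 0 for u in range(8)] for v in range(8)]
block = [[u + v for u in range(8)] for v in range(8)]
print(len(coefficients_to_send(block, old, new)))  # 8
```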
4. RESULTS
4.1. Landscape and elevation data used
We used three sets of digital maps and elevation
data provided by three different companies:
SA: The texture, provided by Swissair, covers a 10 km by 10 km area of the valleys and mountains next to our institute at 1 m resolution. We use a 1024×1024 pixel version in YUV format (12 Mbits) and a grid with 128×128 elevation points for experiments Z2/1 and Z2/3.
E&S: Evans & Sutherland provided a 10 km by 10 km area with canyons and mountains near San Luis Obispo, California, USA at 5 m resolution. We use this data at the same resolution as above for experiments Z2/2, Z2/4 and Z2/5.
LE: The Living Earth, Inc. gave us their map of the world at 4 km resolution (900 Mbits). We use a 1024×512 pixel version in YUV format (6 Mbits) together with a spherical 128×64 grid for experiment Z2/6.
4.2. Experiments
The following extended discussion of the core experiments Z2/1-6 [5, 6] and their results shows the universality of the view-dependent technique at different resolutions. Figures 8-11 and Tables 2-3 show the bitrates obtained to generate each of the frames of the sequences. The bitrates do not include the transmission of the elevation data, which represents 32768 bytes of raw data at 16 bits per elevation point (1 % of the original texture data). Figures 5-7 show the patterns of transmitted DCT coefficients for each frame.
4.3. Flat zoom sequences
This experiment zooms towards a flat square plane on which the texture is mapped. At the beginning the plane is tilted and far away (~100 km); at the end it is viewed closely (~5 km) from the top. The square never leaves the field of view. The visualization of the DCT coefficients that are updated for each frame of the sequence reveals interesting patterns (see Figure 5); it shows how the view-dependent filter evolves over time in its frequency selectivity. Figure 8 contains the bitrates for the two different textures: SA for Z2/1 and E&S for Z2/3. Again one can see that the low-frequency parts are transmitted first and, as the sequence zooms closer, the higher-frequency parts follow. The comparison of the bitrates shows that the E&S texture is more detailed (see Table 2).
4.4. Flyover sequences
In these experiments, we fly in an airplane at low altitude (~300 m) over the texture-mapped terrain. We observe that the data is streamed at a fairly constant and low bitrate (see Figures 6, 9, 10 and Table 2). Depending on the flight path and the relief, more or less new visual information has to be transmitted. In Figure 9, the bitrate of Z2/2 (SA) drops at the end, as the sequence discovers less new territory, which is not the case for sequence Z2/4 (E&S). The long sequence Z2/5 simulates a take-off from an airport (see Figure 10).
4.5. Globe zoom sequences
The zoom on the earth from a distance of 100,000 km down to 5,000 km streams the texture of the earth globe (LE). Figure 7 shows the circular pattern of the transmitted frequencies, which at close distance becomes a square, similar to the flat sequence. Figure 11 and Table 3 show how screen size and texture quality influence the bitrates.
4.6. Comparison with still image coding
One could simply compress the textures with JPEG and transmit them (see Table 1). The clear disadvantage is the huge delay before the first image can be rendered at the receiver. The proposed view-dependent method is simple and achieves lower bitrates, higher quality and lower delay.
Texture   RGB coded   JPEG 75% coded   Flat zoom sequence    Fly over sequence
                                       (YUV data: 180 Mbits) (YUV data: 200 Mbits)
SA        24 Mbits    2.4 Mbits        1.6 Mbits (Z2/1)      0.9 Mbits (Z2/2)
E&S       24 Mbits    2.4 Mbits        1.1 Mbits (Z2/3)      1.5 Mbits (Z2/4)
LE        12 Mbits    0.7 Mbits        0.5 Mbits (Globe)     ---
Table 1: Number of bits needed to transmit a texture during the whole sequence, compared to the number of bits achieved after coding the texture with JPEG at 75% quality factor.
5. CONCLUSION
This technique has been proposed to the ISO/MPEG-4 organization in the framework of the Synthetic/Natural Hybrid Coding (SNHC) activities. Several core experiments have been performed [3, 4, 5, 6] which show that this technique performs better than the current Video VM. The view-dependency criteria have been integrated into the SNHC Verification Model.
Future work will merge and extend our experiments in order to stream bigger landscape areas and zoom over several magnitudes of resolution, while keeping very low bitrates.
Figure 5: DCT coefficients sent in the flat zoom sequences Z2/1 and Z2/3. The first frame (intra) is at the upper left corner, the last one at the bottom right. Each frame is split into 8×8 DCT blocks (not visible at this scale). In each block, the black dots represent the transmitted DCT coefficients. They form an interesting pattern: each stripe moving across the texture, as time increases, contains DCT coefficients of one frequency.
We would like to thank Swissair, LivingEarth and Evans & Sutherland for providing us with photo-realistic landscape textures and elevation data.
6. REFERENCES
[1] T.A. Funkhouser and C.H. Sequin, Proceedings of SIGGRAPH 93, pp. 247-254.
[2] Peter Lindstrom, David Koller, Larry F. Hodges, William Ribarsky, Nick Faust, "Level-of-detail management for real-time rendering of photo-textured terrain", internal report of Georgia Tech, http://www.cc.gatech.edu/gvu/research/techreports95.html
[3] Fred Jordan, Touradj Ebrahimi, "View-dependent texture for transmission of virtual environment", MPEG96/M1501, Maceio, 1996.
[4] Touradj Ebrahimi, "Description of Core-experiment in SNHC texture coding", MPEG96/M1723, Sevilla, 1996.
[5] Fred Jordan, Stefan Horbelt, Touradj Ebrahimi, "Results of Core Experiment X3 for Texture Coding", MPEG97/M1737, Sevilla, 1997.
[6] Stefan Horbelt, Fred Jordan, Touradj Ebrahimi, "Results of Core Experiment Z2", MPEG97/M1904, Bristol, 1997.
Figure 6: DCT coefficients sent for each frame in the fly-over sequence Z2/4. One can see how the viewer position moves. The pattern looks like the "light cone" of a torch moving across the landscape; the texture parts that receive "more light" are updated.
Figure 7: DCT coefficients sent in the globe zoom sequence Z2/6. Note the circular frequency patterns.
Experiment             Z2/1       Z2/3       Z2/2       Z2/4       Z2/5
Sequence description   Flat zoom  Flat zoom  Fly over   Fly over   Take off
Figure for bitrates    8          8          9          9          10
Figure DCT masks       5          5          ---        6          ---
Texture                SA         E&S        SA         E&S        E&S
Kbits orig. frame      4866       4866       3932       3932       3932
Kbits first frame      155        170        333        307        66
Intra comp. ratio      1:31       1:28       1:12       1:13       1:60
Kbits/s @ 10 frames/s  189        294        75         160        72
Inter comp. ratio      1:256      1:165      1:524      1:246      1:544
Kbits per frame        32.2       21.7       11.8       19.8       7.6
Mean comp. ratio       1:151      1:224      1:332      1:198      1:520
Mbits transmitted      1.6        1.1        0.9        1.5        1.1
Table 2: Bitrates and compression ratios of the core experiments Z2. In Z2/4 the viewer is closer to the ground, the terrain is flatter and the texture has higher frequencies; all this requires a higher bitrate.
Experiment             6.1         6.2         6.3
Sequence description   Globe zoom  Globe zoom  Globe zoom
Figure for bitrates    11          11          11
Figure DCT masks       7           7           7
Texture                LE*         LE          LE
Kbits orig. frame      3932        4866        3932
Kbits first frame      68          88          71
Intra comp. ratio      1:58        1:55        1:56
Kbits/s @ 10 frames/s  50          87          77
Inter comp. ratio      1:791       1:556       1:512
Kbits per frame        64          105         91
Mean comp. ratio       1:617       1:463       1:435
Mbits transmitted      ---         ---         ---
Table 3: Bitrates and compression ratios of the globe zoom sequences.
Figure 9: Bitrates of the two fly-over sequences Z2/2 (SA) and Z2/4 (E&S). In the latter, the viewer flies over more parts of the texture.
Figure 8: Bitrates of the flat zoom sequences. One can see that the E&S texture (Z2/3) has higher-frequency parts. The compression ratios are the ratio between the number of transmitted bits and the size of the original sequences. The bitrates are calculated for 10 frames per second.
Figure 10: Z2/5 starts from an airport and flies around in a circle. This flight is longer and the data is streamed at lower bitrates than in Z2/4.
Figure 11: Bitrates for the zoom on the earth globe and the influence of different screen sizes and texture qualities. In case 2 the screen size is increased, so the data is transmitted faster. In case 1 the bitrate drops faster at higher frequencies; the reason is that the texture quality was reduced before coding by compressing it with JPEG at low quality (LE*).