Streaming of photo-realistic texture mapped on 3D surfaces

S. Horbelt, F. Jordan, T. Ebrahimi
Signal Processing Laboratory, Swiss Federal Institute of Technology, CH-1015 Lausanne, Switzerland
Email: [email protected], [email protected], [email protected]

ABSTRACT

We present a novel technique for the efficient coding of texture to be mapped onto 3D landscapes. The technique makes it possible to stream the data across the network using a back-channel.

1. INTRODUCTION

A new method for the optimal coding of texture and 3D data is presented. It takes the viewing position in the 3D virtual world into account in order to transmit only the most visible information. This approach greatly reduces the amount of information transmitted between a remote database and a user, provided that a back-channel is available. At any given time, only the most important information is sent, depending on object geometry and viewpoint displacement. This basic principle is applied to the texture data to be mapped onto a 3D grid mesh.

Section 2 explains which criteria can be used to reduce the texture information. Section 3 describes how these criteria can be applied in a DCT-based (Discrete Cosine Transform) encoder. Sections 4 and 5 provide results obtained with the proposed technique applied to a landscape and to a spherical mesh.

2. CRITERIA

2.1. Principle

The relative 3D position of viewpoint and mesh can be used to reduce the texture information. Two criteria have been used: depth and tilting.

2.2. Depth criterion

If the viewpoint is sufficiently far away from the mesh, a down-scaled version of the original texture can be transmitted without any visible alteration of the rendered image (Figure 1).

Figure 1: Depth effect (viewpoint, projection plane, x-y-z axes).

2.3. Tilting effect

Consider a planar surface tilted around the y-axis and projected onto the x-y plane. In orthogonal projection, the texture can be down-scaled in the x-direction, depending on the tilting angle. For instance, with a 60° tilt, only half of the texture information really needs to be transmitted (Figure 2).

Figure 2: Tilting effect, cos α = 0.5 (projection plane, x-y-z axes).

2.4. Rotation

Let α denote the tilting angle. The rotation angle β of the plane determines the direction in which the tilting criterion should be applied (Figure 3).

Figure 3: Orientation of the plane (normal, view point, angles α and β).

3. TEXTURE DATA REDUCTION PRINCIPLE

3.1. Generalization to a regular grid mesh

In the case of a grid mesh onto which a large texture is mapped, the same principle can be applied to each of the cells that define the mesh. In the following, it is assumed that the number of cells and the texture size are such that a texture block of 8×8 texels is mapped onto each cell.

3.2. Coding

Each 8×8 block is DCT transformed. It is then possible to remove, or "shrink", some of the DCT coefficients according to the values of α and β, as shown in Figure 4. The remaining DCT coefficients are quantized and coded using the MPEG-4 Video-VM algorithm.

Figure 4: Shrinking DCT coefficients. The mask depends on cos(α) and β; an example pattern of retained (1) and removed (0) coefficients:
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0
1 1 1 1 1 1 0 0
1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

3.3. Streaming texture data

Once the texture data needed for the first frame (intra data) has been loaded using the proposed method, it is straightforward to send new DCT coefficients (inter data) as the observer moves through the scene. The texture information can thus be updated according to the new viewer position over a very low bit-rate channel.

4. RESULTS

4.1. Landscape and elevation data used

We used three sets of digital maps and elevation data provided by three different companies:

SA: The texture by Swissair contains a 10 km by 10 km area of the valleys and mountains next to our institute at 1 m resolution.
We use a 1024×1024 pixel version in YUV format (12 Mbits) and a grid with 128×128 elevation points for experiments Z2/1 and Z2/3.

E&S: Evans & Sutherland provided a 10 km by 10 km area with canyons and mountains near San Luis Obispo, California, USA at 5 m resolution. We use this data at the same resolution as above for experiments Z2/2, Z2/4 and Z2/5.

LE: The Living Earth, Inc. gave us their map of the world at 4 km resolution (900 Mbits). We use a 1024×512 pixel version in YUV format (6 Mbits) together with a spherical 128×64 grid for experiment Z2/6.

4.2. Experiments

The following extended discussion of the core experiments Z2/1-6 [5, 6] and their results shows the universality of the view-dependent technique at different resolutions. Figures 8-11 and Tables 2-3 show the bitrates obtained to generate each frame of the sequences. The bitrates do not include the transmission of the elevation data, which amounts to 32768 bytes of raw data at 16 bits per elevation point (1% of the original texture data). Figures 5-7 show the patterns of transmitted DCT coefficients for each frame.

4.3. Flat zoom sequences

This experiment zooms onto a flat square plane on which the texture is mapped. At the beginning the plane is tilted and far away (~100 km); at the end it is viewed closely (~5 km) from the top. The square never leaves the field of view. The visualization of the DCT coefficients updated for each frame of the sequence reveals interesting patterns (see Figure 5): it shows how the frequency selectivity of the view-dependent filter evolves over time. Figure 8 contains the bitrates for the two different textures: SA for Z2/1 and E&S for Z2/3. Again, one can see that the low-frequency parts are transmitted first and, as the sequence zooms closer, the higher-frequency parts follow. The comparison of the bitrates shows that the E&S texture is more detailed (see Table 2).

4.4.
Flyover sequences

In these experiments, we fly in an airplane at low altitude (~300 m) over the texture-mapped terrain. We observe that the data is streamed at a fairly constant, low bitrate (see Figures 6, 9, 10 and Table 2). Depending on the flight path and the relief, more or less new visual information has to be transmitted. In Figure 9, the bitrate of Z2/2 (SA) drops at the end, as the sequence discovers less new territory; this is not the case for sequence Z2/4 (E&S). The long sequence Z2/5 simulates a take-off from an airport (see Figure 10).

4.5. Globe zoom sequences

The zoom onto the Earth from a distance of 100,000 km down to 5,000 km streams the texture of the Earth globe (LE). Figure 7 shows the circular pattern of the transmitted frequencies, which at close distance becomes a square, similar to the flat sequence. Figure 11 and Table 3 show how screen size and texture quality influence the bitrates.

4.6. Comparison with still image coding

One could simply compress the textures with JPEG and transmit them (see Table 1). The clear disadvantage is the huge delay before the first image can be rendered at the receiver. The proposed view-dependent method is simple and achieves lower bitrates, higher quality and lower delay.

Texture | RGB coded | JPEG 75% coded | Flat zoom sequence (YUV data: 180 Mbits) | Fly-over sequence (YUV data: 200 Mbits)
SA      | 24 Mbits  | 2.4 Mbits      | 1.6 Mbits (Z2/1)                         | 0.9 Mbits (Z2/2)
E&S     | 24 Mbits  | 2.4 Mbits      | 1.1 Mbits (Z2/3)                         | 1.5 Mbits (Z2/4)
LE      | 12 Mbits  | 0.7 Mbits      | 0.5 Mbits (Globe)                        | ---

Table 1: Number of bits needed to transmit a texture during the whole sequence, compared to the number of bits obtained after coding the texture with JPEG at a 75% quality factor.

5. CONCLUSION

This technique has been proposed to the ISO/MPEG-4 organization in the framework of the Synthetic/Natural Hybrid Coding (SNHC) activities. Several core experiments have been performed [3, 4, 5, 6] which show that this technique performs better than the current Video-VM.
The view-dependency criteria have been integrated into the SNHC Verification Model. Future work will merge and extend our experiments in order to stream larger landscape areas and zoom over several orders of magnitude in resolution, while keeping very low bitrates.

We would like to thank Swissair, Living Earth and Evans & Sutherland for providing us with photo-realistic landscape textures and elevation data.

6. REFERENCES

[1] T. A. Funkhouser and C. H. Séquin, Proceedings of SIGGRAPH 93, pp. 247-254.
[2] P. Lindstrom, D. Koller, L. F. Hodges, W. Ribarsky, N. Faust, "Level-of-detail management for real-time rendering of photo-textured terrain", internal report, Georgia Tech, http://www.cc.gatech.edu/gvu/research/techreports95.html
[3] F. Jordan, T. Ebrahimi, "View-dependent texture for transmission of virtual environment", MPEG96/M1501, Maceio, 1996.
[4] T. Ebrahimi, "Description of core experiment in SNHC texture coding", MPEG96/M1723, Sevilla, 1996.
[5] F. Jordan, S. Horbelt, T. Ebrahimi, "Results of Core Experiment X3 for Texture Coding", MPEG97/M1737, Sevilla, 1997.
[6] S. Horbelt, F. Jordan, T. Ebrahimi, "Results of Core Experiment Z2", MPEG97/M1904, Bristol, 1997.

Figure 5: DCT coefficients sent in the flat zoom sequences Z2/1 and Z2/3. The first frame (intra) is in the upper left corner, the last in the bottom right. Each frame is split into 8×8 DCT blocks (not visible at this scale). In each block, the black dots represent the transmitted DCT coefficients. They form an interesting pattern: each stripe moving across the texture as time increases contains DCT coefficients of one frequency.

Figure 6: DCT coefficients sent for each frame in the fly-over sequence Z2/4. One can see how the viewer position moves. The pattern looks like the "light cone" of a torch moving across the landscape; the texture parts that receive "more light" are updated.

Figure 7: DCT coefficients sent in the globe zoom sequence Z2/6.
Note the circular frequency patterns.

Experiment                Z2/1       Z2/3       Z2/2       Z2/4       Z2/5
Sequence description      Flat zoom  Flat zoom  Fly-over   Fly-over   Take-off
Figure for bitrates       8          8          9          9          10
Figure for DCT masks      5          5          ---        6          ---
Texture                   SA         E&S        SA         E&S        E&S
Kbits original frame      4866       4866       3932       3932       3932
Kbits first frame         155        170        333        307        66
Intra compression ratio   1:31       1:28       1:12       1:13       1:60
Kbits/s @ 10 frames/s     189        294        75         160        72
Inter compression ratio   1:256      1:165      1:524      1:246      1:544
Kbits per frame           32.2       21.7       11.8       19.8       7.6
Mean compression ratio    1:151      1:224      1:332      1:198      1:520
Mbits transmitted         1.6        1.1        0.9        1.5        1.1

Table 2: Bitrates and compression ratios of the core experiments Z2. Z2/4 is closer to the ground, the terrain is flatter and the texture has higher frequencies; all this requires a higher bitrate.

Experiment                6.1         6.2         6.3
Sequence description      Globe zoom  Globe zoom  Globe zoom
Figure for bitrates       11          11          11
Figure for DCT masks      7           7           7
Texture                   LE*         LE          LE
Kbits original frame      3932        4866        3932
Kbits first frame         68          88          71
Intra compression ratio   1:58        1:55        1:56
Kbits/s @ 10 frames/s     50          87          77
Inter compression ratio   1:791       1:556      1:512
Kbits per frame           6.4         10.5        9.1
Mean compression ratio    1:617       1:463       1:435

Table 3: Bitrates and compression ratios of the globe zoom sequences.

Figure 8: Bitrates of the flat zoom sequences. One can see that the E&S texture (Z2/3) has higher-frequency parts. The compression ratios are the ratio between the number of transmitted bits and the size of the original sequences. The bitrates are calculated for 10 frames per second.

Figure 9: Bitrates of the two fly-over sequences Z2/2 (SA) and Z2/4 (E&S). In the latter, the viewer flies over more parts of the texture.

Figure 10: Z2/5 starts from an airport and flies around in a circle. This flight is longer and the data is streamed at a lower bitrate than in Z2/4.
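The coefficient patterns visualized in the DCT-mask figures follow the shrinking rule of Section 3.2 (Figure 4). As a rough illustration, the mask of retained coefficients for one 8×8 cell can be sketched as follows; the function name and the linear frequency cut are our own assumptions for this sketch, not the encoder's actual rule:

```python
import math

def shrink_mask(cos_alpha, beta, n=8):
    """Sketch of a view-dependent DCT mask for one n x n cell.

    cos_alpha scales the highest frequency that survives projection
    (cf. Section 2.3: a 60-degree tilt, cos_alpha = 0.5, halves the
    information); beta (radians) picks the direction in the (u, v)
    frequency plane along which the cut is applied (Section 2.4).
    """
    du, dv = abs(math.cos(beta)), abs(math.sin(beta))
    mask = [[0] * n for _ in range(n)]
    for v in range(n):        # vertical frequency index
        for u in range(n):    # horizontal frequency index
            if u * du + v * dv < n * cos_alpha:
                mask[v][u] = 1
    return mask
```

With cos_alpha = 0.5 and beta = 0, exactly half of the 64 coefficients are kept, matching the 60° tilt example of Section 2.3; intermediate beta values produce staircase cuts like the one shown in Figure 4.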
Figure 11: Bitrates for the zoom onto the Earth globe and the influence of different screen sizes and texture qualities. In case 2 the screen size is increased, so the data is transmitted faster. In case 1 the bitrate drops faster at the higher frequencies; the reason is that the texture quality was reduced before coding by compressing it with JPEG at low quality (LE*).
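The incremental streaming of Section 3.3 can likewise be sketched: as the viewer moves, only coefficients that are visible under the new view-dependent mask and have never been sent before are transmitted as inter data over the back-channel. This is a minimal reconstruction under our own bookkeeping assumptions, not the Verification Model implementation:

```python
def inter_update(new_mask, sent):
    """Return the (u, v) indices of DCT coefficients to stream for
    this frame: visible under new_mask but never transmitted.

    Both arguments are 8x8 lists of 0/1; sent is updated in place
    so that each coefficient is transmitted only once.
    """
    to_send = []
    for v in range(8):
        for u in range(8):
            if new_mask[v][u] and not sent[v][u]:
                to_send.append((u, v))
                sent[v][u] = 1
    return to_send
```

As the viewpoint approaches, the masks admit higher frequencies, so each frame streams only the newly uncovered coefficients; this is what produces the low, steady inter bitrates reported in Tables 2 and 3.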