HDlogix, Inc. 26 Mayfield Ave. Edison, NJ 08837 (732) 623-2067 www.hdlogix.com

ImageIQ3D: 3D Video from 2D Video in Real-Time

Utilizing many of the same patented and patent-pending technologies as HDlogix's ImageIQ®, ImageIQ3D literally creates a new dimension for video. Drawing on HDlogix's expertise with superresolution, which converts SD video to full-bandwidth HD video, and on ImageIQ's GPU-based, real-time optical flow and image structure analysis, it is now possible to convert 2D video into 3D video in real time with no manual intervention. Unlike previous "pseudo-3D" gimmicks, ImageIQ3D reconstructs the geometry of the video scene from regular video to create true 3D stereo video for any 3D display, even "autostereoscopic" 3D displays that do not require any glasses.

What is ImageIQ3D?
ImageIQ3D constructs a geometric representation and model of the objects in a video scene, in real time, with no user or operator intervention. This information is used to convert regular video into left/right stereo views and/or color-plus-depthmap views suitable for any existing 3D display. ImageIQ3D can also generate full-color 3D stereo content from anaglyph (e.g., red/cyan) content, likewise in real time.
- Any live 2D broadcast can be generated as 3D on the fly, without shooting with stereoscopic cameras and equipment.
- Any and all user-generated content can be converted to 3D.
- Anaglyph (colored-glasses) programs can be converted to full-color 3D stereo without the original full-color version.
- All 10 different flavors of existing 3D content can be converted to and from each other in real time, a first in the industry.
- Any 2D or 3D content can be converted to play back on any 3D stereoscopic display, another industry first.
- Uncalibrated, unaligned camera pairs can be used for stereo video and image capture.
- Telepresence solutions can be inexpensively transformed from HD to 3D-HD.
- A webcam video chat can be transformed into 3D telepresence without a calibrated stereo camera.
- Problems that cause disorientation, such as rapid depth changes, are automatically modeled and eliminated.

3D Has Come a Long Way From the 1980s, or Has It?
Almost everyone has had some experience with color anaglyph 3D: examples include the gimmicky horror films of the '80s and, more recently, the blue/yellow glasses many people used for the 2009 Super Bowl. As we all know, this method of 3D is prone to inducing headaches, and the experience is less than ideal because neither eye receives a full-color image. There are, of course, more elegant full-color 3D solutions. One is to use shutter glasses with a "3D-Ready" display; another is to use polarized glasses with a polarized projection or active display. Other recent developments include autostereoscopic video displays that use clever optics to allow 3D viewing without any glasses at all. The hardware required has either been cheap and clunky (and headache-inducing) or, if executed well, relegated to expensive niche markets such as signage, CAD/CAM visualization for mechanical modeling, and medical imaging, or to venue-based cinema such as IMAX® 3D. (Despite the best intentions, some of these expensive, well-executed systems have also induced headaches.) Recently, full-color stereo-3D-capable displays have reached the mass market. In fact, if you have bought an HDTV recently, it may well be capable of displaying 3D video with an inexpensive add-on and shutter glasses, without headaches and without compromising color.

Welcome to the 3D Zoo
Millions of customers worldwide already have "3D-Ready" displays, but they don't even know it.
There are thousands of hours of high-quality 3D content in movie libraries, and almost every major movie studio has committed billions of dollars to 3D movie production for 2009 and '10, and not just for animated productions. So why doesn't everyone know about 3D? The answer is that there is no content ready for these displays. If there is so much 3D display technology in end users' homes and so much 3D content, how is this possible? The reason is that there are many animals in the 3D Zoo, and they don't communicate. There are no fewer than four different technologies for shooting and producing 3D films and videos, yet more ways to store and transmit them, and literally dozens of technologies and products to display them as 3D video, and none of these will "talk" to each other. If you have a red/cyan anaglyph source, you can't display it in full color on a shutter-glasses stereoscopic display. Likewise, if you have a full-color stereo source, you can't display it on a multichannel autostereoscopic display. If the content has been converted for a multichannel autostereoscopic display (at $25,000 per minute of footage), it can no longer be displayed with red/cyan glasses or on a shutter-glasses display without a new transmission medium, and this barely scratches the surface of the problem.

Figure 1. Lots of existing 3D video technologies are talking, but not many of them are listening to each other.

Until ImageIQ3D, there was no way to make all of these formats, technologies, and displays compatible with each other without spending tens of thousands of dollars per minute of footage, offline, and with a slow turnaround. Most importantly, there has been no way to make 3D actually work for the end user without a lot of excuses, apologies, and headaches, until now.

Figure 2. ImageIQ3D is the Universal Translator for 3D and 2D video.
With ImageIQ3D, everyone is talking 3D video to each other, even if the original video is only 2D, and regardless of origination methods, archive formats, and distribution standards or transmission methods. Further, for 2D content created with only one camera, the only way to convert to 3D used to be manually intensive, requiring expensive intervention by a veritable army of 3D modelers, stereoscopic specialists, 3D rendering artists, and video engineers to identify precise scene cuts, painstakingly edit geometry and depth maps, and correct their errors and problems, until now.

Today's Video Architecture: The GPU/3D Accelerator
Some of the algorithms that ImageIQ3D uses have existed for years, but only recently has it become possible to run them in real time, thanks to programmable graphics hardware: GPUs. GPUs allow highly parallel, memory-intensive processing, far more so than CPUs that cost ten times as much. Additionally, ImageIQ3D's approach is very similar to the 2D superresolution technology in ImageIQ®, which is particularly well suited to GPU implementation. As a result, ImageIQ3D can run in real time on very modest GPU hardware. Top-of-the-line video cards are not required; in fact, laptops with several-year-old video chipsets can run ImageIQ3D without breaking a sweat.

Overview of the ImageIQ3D Process
Like ImageIQ®, ImageIQ3D performs a sophisticated motion analysis called optical flow. Much information about the 3D scene geometry can be gleaned from the relative motion of objects in the video and from how they occlude (hide and reveal) pixels of other objects as they move, as long as the motion estimation is precise and accurate. Second, the straight lines of buildings, the horizon, and other objects give clues about the vanishing points, which also help to solve the puzzle.
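The vanishing-point cue can be illustrated with standard projective geometry. This is a minimal sketch, not ImageIQ3D's implementation: in homogeneous coordinates, the line through two image points is their cross product, and two lines meet at the cross product of their line vectors. Images of parallel scene edges (building lines, road edges) meet at a common vanishing point. The function names and the `eps` threshold are hypothetical, and the line segments are assumed to come from an earlier line-detection stage.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p: Point, q: Point) -> Vec3:
    """Homogeneous line l = p x q through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a: Tuple[Point, Point], seg_b: Tuple[Point, Point],
                    eps: float = 1e-9) -> Optional[Point]:
    """Intersect the lines through two segments (hypothetical helper).

    Returns None when the lines are parallel in the image, i.e. the
    intersection is a point at infinity (third coordinate ~ 0).
    """
    v = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(v[2]) < eps:
        return None
    return (v[0] / v[2], v[1] / v[2])
```

For example, the segments ((0, 0), (1, 1)) and ((0, 10), (10, 0)) lie on y = x and x + y = 10, so `vanishing_point` returns (5.0, 5.0). With many detected segments, a real system would vote for or robustly fit a common intersection rather than intersect a single pair.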
A generalized Hough/Radon transform helps identify these lines and other useful features. Finally, in most photography, objects that are very near to or very far from the camera's focal plane tend to be blurred by an amount that grows with their distance from it. A blind point-spread-function (PSF) estimator is used to estimate the out-of-focus character of each pixel, completing the information needed to estimate the depth of the video. Some of this information is always available, but not all of it always is (for example, when nothing is moving in the video). ImageIQ3D uses a superresolution-based statistical approach to achieve robust, consistent results even when very little or only partial information is available.

Figure 3. The ImageIQ3D process: a simplified view.

Ultimately, the goal is to produce an accurate depth map for each video frame: a representation of the distance of each pixel in the video from the camera. Once an accurate depth map is calculated, it is easy to convert to and from any 2D or 3D format!

ImageIQ3D: Depth-from-Motion via Optical Flow
Critical information about the objects and background making up a scene, and their relative distances to the camera, can be calculated if these objects ever move and if there is an accurate and precise estimate of the true motion. ImageIQ3D uses the same optical flow engine as ImageIQ®'s superresolution process. This engine achieves real-time, per-pixel dense motion estimation with a wide and precise spatial dynamic range of 0.01 to 500.00 pixels. A motion vector is calculated for every pixel in every image; it tells how far, and in which direction, the pixel has moved. One way to view a motion-vector field is to let hue represent direction and brightness represent magnitude, as shown in Figure 4.

Figure 4.
Original frame (left); hue-saturation-value representation of the optical flow field (right).

Not just the motion itself is important: how objects hide and reveal pixels of other objects and of the background behind them also gives significant depth information. ImageIQ3D therefore computes occlusions in addition to optical flow. Figure 5 demonstrates the ImageIQ3D depth-from-motion process in action.

Figure 5. Using occlusion and motion to generate a depth map. Top left: reveal and hide occlusions marked in red and yellow. Top right: optical flow. Bottom: depth map generated from motion and occlusions.

Figure 6. The generated depth map has been used to generate a synthetic left/right image pair. (The 3D effect can be seen by crossing one's eyes to fuse the left and right sides.)

Figure 7. The depth map was used to generate a synthetic red/cyan anaglyph (viewable with an inexpensive pair of red/cyan glasses).

ImageIQ3D: Scene Change Detection
What happens if the camera is panning and then suddenly stops? Previous algorithms would lose all depth information, and the 3D video would "flatten out". The solution to this problem is conceptually simple: accumulate depth for each pixel over time. Of course, pixels move, so doing this properly requires very accurate motion compensation. Another problem is that accumulating a history of depth values can significantly corrupt the depth map for the current frame unless the system knows when the shot has cut to a new scene, or when all of the relevant pixels have "panned off the screen". Carrying over depth from a previous shot can result in serious distortions and, in some cases, a violent motion-sickness response in some viewers.
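The accumulate-as-you-go idea, including the need to drop history, can be sketched per pixel. This is a minimal illustration under stated assumptions, not ImageIQ3D's algorithm: it assumes a backward flow field with integer displacements, an occlusion mask, and a frame-level cut flag are already available from other stages. The function name, the grid layout (lists of rows), and the exponential blending weight `alpha` are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def accumulate_depth(prev_depth: Grid,
                     backward_flow: List[List[Tuple[int, int]]],
                     observed: Grid,
                     occluded: List[List[bool]],
                     scene_cut: bool,
                     alpha: float = 0.3) -> Grid:
    """Motion-compensated per-pixel depth accumulation (hypothetical sketch).

    backward_flow[y][x] = (dy, dx) means the pixel at (y, x) in the current
    frame was at (y + dy, x + dx) in the previous frame. observed holds this
    frame's new depth evidence, or None where there is none (e.g. no motion).
    """
    h, w = len(observed), len(observed[0])
    out: Grid = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            history: Optional[float] = None
            # History is only trusted if the shot continues, the pixel was not
            # newly revealed, and its source lies inside the previous frame.
            if not scene_cut and not occluded[y][x]:
                dy, dx = backward_flow[y][x]
                sy, sx = y + dy, x + dx
                if 0 <= sy < h and 0 <= sx < w:
                    history = prev_depth[sy][sx]
            obs = observed[y][x]
            if history is None:
                out[y][x] = obs          # reset: start fresh (may stay None)
            elif obs is None:
                out[y][x] = history      # camera stopped: carry depth forward
            else:
                out[y][x] = (1.0 - alpha) * history + alpha * obs
    return out
```

For a one-row frame where the camera has panned by one pixel and then stopped, `accumulate_depth([[1.0, 2.0, 3.0]], [[(0, -1)] * 3], [[None] * 3], [[False] * 3], False)` yields `[[None, 1.0, 2.0]]`: depth survives the stop, while the pixel that just entered the frame has no history. Setting `scene_cut=True` discards all history at once.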
Clearly, a "shot change detection" method is required. This is a well-traversed area of study and practice, but for the 2D-to-3D case it is not enough to know whether the editor cut away to an entirely different scene. One must know, reliably, when each individual pixel has moved offscreen and no longer has any history, and likewise when new pixels appear. Of course, if the current shot cuts away, all depth assumptions have to be reset as well. ImageIQ3D has a very robust "scene-change detection" engine that provides exactly this capability, for every pixel individually. When everything has panned or zoomed offscreen, or the current scene has cut or faded away, ImageIQ3D knows how to reset its assumptions. This is a very important part of making the 2D-to-3D process seamless and free of user intervention. It is also very important for ensuring that these changes don't cause viewers' eyes to cross (or make them throw up) when errors due to scene changes produce left/right disparity problems.

ImageIQ3D: Depth-from-Vanishing-Points via Radon Transform
Video does not always include motion, so other cues are sometimes needed to obtain depth. One solution is to use geometric clues in the images themselves: if one knows where the predominant straight edges are, and has some information about the faces of objects in a scene, some information about the depth of foreground objects and the background can be obtained. Like medical CT scanners, whose image reconstruction is based on the Radon transform, ImageIQ3D performs a Hough/Radon transform to correlate image edges and structure, except that ImageIQ3D does it in real time.

Figure 8. Not all images have good geometric depth cues. Original frame marked up with predominant straight edges in red (left); generalized Hough/Radon transform (right). Crossings of the curves and bright yellow/white dots indicate the position and slope of significant straight lines. This frame is ambiguous, so other information (such as motion and occlusion) is needed to infer depth.
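The classic straight-line Hough transform shows where the bright dots in a (ρ, θ) plot like Figure 8 come from. This is a minimal sketch; the generalized Hough/Radon transform in ImageIQ3D and its GPU implementation are not described in this document. Each edge pixel (x, y) votes for every line ρ = x cos θ + y sin θ passing through it, so collinear pixels pile their votes into one accumulator cell. The edge-pixel list is assumed to come from an edge detector, and the function names and 1° angular resolution are hypothetical.

```python
import math
from typing import List, Tuple


def hough_lines(edges: List[Tuple[int, int]], width: int, height: int,
                n_theta: int = 180) -> List[List[int]]:
    """Vote in a (rho, theta) accumulator for every edge pixel (x, y).

    Rows index rho (offset by the image diagonal so negative rho fits);
    columns index theta in [0, pi) with n_theta steps.
    """
    diag = int(math.ceil(math.hypot(width, height)))
    acc = [[0] * n_theta for _ in range(2 * diag + 1)]
    for x, y in edges:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round(rho)) + diag][t] += 1
    return acc


def strongest_line(acc: List[List[int]], n_theta: int = 180) -> Tuple[int, float, int]:
    """Return (rho, theta in degrees, votes) of the accumulator peak."""
    diag = (len(acc) - 1) // 2
    votes, r, t = max(((v, r, t) for r, row in enumerate(acc)
                       for t, v in enumerate(row)), key=lambda c: c[0])
    return r - diag, 180.0 * t / n_theta, votes
```

For 100 edge pixels on the vertical line x = 5 in a 100×100 image, `strongest_line(hough_lines([(5, y) for y in range(100)], 100, 100))` returns `(5, 0.0, 100)`: every pixel votes for ρ = 5 at θ = 0°. In a real frame, several such peaks whose lines converge suggest a vanishing point.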
Depth from 2D is a specialized case of superresolution: using multiple pieces of information to fill in an incomplete (or sometimes overcomplete) estimate. In the classic superresolution case, one is trying to enlarge an image and fill in missing pixels with information from previous frames, with motion giving the critical "clues". Here, ImageIQ3D fills in depth information from previous-frame motion plus multiple other sources, such as geometric cues.

Figure 9. Other images have excellent geometric depth cues. Original frame marked up with straight edges in red (left); generalized Hough/Radon transform (right). This frame has several clearly distinguishable straight edges, indicated by the convergence of crossing curves in the transform on the right. Vanishing points can be clearly detected and are used to constrain the depth map estimation.

Figure 10. Depth map obtained from geometric depth cues plus motion.

ImageIQ3D: Depth-from-Focus via PSF Estimation
Another way to increase the robustness of depth estimation is to include information about how much different objects in the scene are blurred relative to each other. In combination with motion, occlusions, and geometric cues, a robust depth estimate can be obtained by performing point spread function (PSF) estimation, which estimates the focus and motion blur at each pixel in a scene. The information from the Hough/Radon transform is used not only to estimate geometric features but also to find relevant edges from which the blur of objects in the scene can be estimated. Combined with the structure analysis performed by ImageIQ®'s optical flow analysis, this yields the blur of each pixel that is near an edge feature. A robust regularization function is then used to propagate these values to adjacent non-edge pixels.
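Spreading blur estimates from edge pixels to nearby non-edge pixels is a sparse-to-dense interpolation problem. For simplicity this sketch swaps in plain harmonic (Laplacian) smoothing solved by Jacobi iteration: known edge values stay fixed and every other pixel relaxes toward the average of its four neighbors. That is a quadratic regularizer, not the robust regularization function the text mentions, and the function name, grid layout, and iteration count are hypothetical.

```python
from typing import List, Optional


def propagate_blur(sparse: List[List[Optional[float]]],
                   iterations: int = 500) -> List[List[float]]:
    """Fill None cells by harmonic interpolation of the known (edge) values.

    Known cells are held fixed; unknown cells are repeatedly replaced by the
    mean of their in-bounds 4-neighbors (Jacobi iteration of Laplace's equation).
    """
    h, w = len(sparse), len(sparse[0])
    known = [[v is not None for v in row] for row in sparse]
    values = [v for row in sparse for v in row if v is not None]
    if not values:
        raise ValueError("need at least one edge-pixel blur estimate")
    start = sum(values) / len(values)          # neutral initial guess
    cur = [[v if v is not None else start for v in row] for row in sparse]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                nbrs = [cur[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w]
                nxt[y][x] = sum(nbrs) / len(nbrs)
        cur = nxt
    return cur
```

On a single row `[[1.0, None, None, None, 5.0]]`, the result converges to a linear ramp from 1 to 5. A robust regularizer would additionally stop smoothing across strong image edges, so blur from a sharp foreground object does not leak into a defocused background.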
These per-pixel focus cues, like the geometric cues, are incorporated into the overall model that builds the final depth map.

ImageIQ3D: Putting It All Together
A great deal of the magic of ImageIQ3D lies not just in performing optical flow, Hough/Radon transforms, and intelligent, adaptive operations, but in intelligently applying brute force. Like ImageIQ®, ImageIQ3D treats 2D-to-3D conversion as a superresolution problem: instead of creating new pixels in the X and Y directions from X, Y, and time information, ImageIQ3D creates new pixels in the Z direction from X, Y, and time information. Most of the intelligence lies in how all of this information is combined and how it constrains the final "solution": a consistent, reliable depth map that can be used to translate any 2D or 3D video into any other suitable 3D video format.

ImageIQ3D: There's One More Animal in the "3D Zoo" to Tame
Converting anaglyph (colored-glasses) video to full-color stereo presents a different set of problems, but the ImageIQ3D toolset lends itself extremely well to this case too. Consider a green/magenta anaglyph video:

Figure 11. A frame from a green/magenta anaglyph 3D film.

Here, the left eye is coded into the green channel of the RGB image, and the right eye is coded into the red and blue channels. The full-color stereo version can be reconstructed as long as there is a robust optical flow method that knows about occlusions, which sounds familiar! Conceptually, it is simple: estimate the optical flow between the green and the red/blue channels, motion-compensate the green toward the red/blue, and combine them; this becomes the full-color right-eye image. Next, estimate the optical flow between the red/blue and the green channels, motion-compensate the red/blue toward the green, and combine them; this becomes the full-color left-eye image.
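The channel bookkeeping in this recombination can be sketched as follows. This is a minimal illustration, assuming a green/magenta anaglyph stored as rows of (R, G, B) tuples and per-pixel integer horizontal disparities that are already known in both directions. Estimating those disparities across differently colored channels is the hard part and is not shown; the function name, disparity convention, nearest-pixel warping, and border clamping are all hypothetical simplifications.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]
Disparity = List[List[int]]


def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))


def reconstruct_stereo(anaglyph: Image, d_right_to_left: Disparity,
                       d_left_to_right: Disparity) -> Tuple[Image, Image]:
    """Rebuild full-color (left, right) views from a green/magenta anaglyph.

    Left eye = green channel, right eye = red + blue channels.
    d_right_to_left[y][x]: right-eye pixel (y, x) matches left-eye (y, x + d).
    d_left_to_right[y][x]: left-eye pixel (y, x) matches right-eye (y, x + d).
    """
    h, w = len(anaglyph), len(anaglyph[0])
    left: Image = []
    right: Image = []
    for y in range(h):
        lrow: List[RGB] = []
        rrow: List[RGB] = []
        for x in range(w):
            r, g, b = anaglyph[y][x]
            # Right eye: keep this pixel's red/blue, fetch the matching green.
            xl = _clamp(x + d_right_to_left[y][x], 0, w - 1)
            rrow.append((r, anaglyph[y][xl][1], b))
            # Left eye: keep this pixel's green, fetch the matching red/blue.
            xr = _clamp(x + d_left_to_right[y][x], 0, w - 1)
            lrow.append((anaglyph[y][xr][0], g, anaglyph[y][xr][2]))
        left.append(lrow)
        right.append(rrow)
    return left, right
```

In a one-row test where the right view is the left view shifted by one pixel, the right view is recovered exactly and the left view is recovered everywhere except its first pixel. That pixel's red/blue content was never seen by the right eye, so clamping merely borrows a neighbor. This is the occlusion case the text says the flow method must handle.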
Figure 12. ImageIQ3D color-anaglyph to full-color stereo conversion process.

More precisely, this is a problem of disparity estimation (not optical flow) between the right and left images, but either way, the right and left images use different colors. This makes the problem very difficult, because the green of the left eye and the magenta of the right eye cannot easily be compared: their colors, brightness, and pixel values all differ. However, the optical flow engine in ImageIQ3D uses neither block matching nor color; it uses actual per-pixel object structure to determine motion and optical flow, so it is perfectly suited to the problem at hand.

Figure 13. The same green/magenta anaglyph 3D film, reconstructed as a full-color 3D stereo pair. The original full-color movie was NOT used to construct this stereo pair.

In short, a player incorporating ImageIQ3D can not only convert 2D to 3D, but also take any legacy 3D format (including color anaglyph) and convert it to full-color, full-stereo 3D, in real time, with no operator or user intervention or tuning, on inexpensive, commodity, off-the-shelf GPU hardware.

ImageIQ3D: Many Deployment Options
ImageIQ3D has been developed as a consumer DVD player application for Windows and Mac OS and as a batch-mode processor running on Linux, and it is ready for low-BOM, parts-count-sensitive consumer electronics applications. To find out how you can leverage ImageIQ3D in your network, workflow, or consumer electronics solutions, contact HDlogix.

ImageIQ® is a registered trademark of HDlogix, Inc. ImageIQ3D is a trademark of HDlogix, Inc. ©2009 HDlogix, Inc.