MPEG I-Frame Encoding
Computer and Machine Vision
Deeper Dive into MPEG Digital Video Encoding
January 22, 2014
Sam Siewert

Reminders
CV and MV Use UNCOMPRESSED FRAMES
Remote Cameras (E.g. Security) May Need to Transport Captured Frames Over Network to CV/MV Processor
We NEED to Understand Both!
BEWARE of LOSSY COMPRESSION
I-Frame ONLY or MJPEG is a Decent Compromise of Both

MPEG: Order of Operations
[Figure: stages labeled #1, #2A, #2B, #2C, #3]
#1: POINT (Pixel) Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures

Step #1 – RGB to YCrCb 4:4:4 24-bit (Lossless)
For every Y sample in a scan-line, there is also one CrCb sample
  – Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
  – No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)
Typically a Post Production, CEDIA or DCI format
[Figure: sample grid, pixels 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Y, Cr, and Cb sample / Y sample only]

Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, one CrCb sample
  – Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
  – Two RGB Pixels = 48 bits, Whereas Two YCrCb Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)
[Figure: sample grid, pixels 0 to 76,799, 48 bit to 32 bit; legend: Y, Cr, and Cb sample / Y sample only]

Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
  – Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
  – Four RGB Pixels = 96 bits, Whereas Four YCrCb Pixels = 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)
[Figure: sample grid, pixels 0 to 76,799; legend: Cr, Cb sample / Y sample only]

Step #2 – Convert to 8x8 Macroblocks and Transform
Aspect Ratios Designed to Fit 8x8 Macroblock, E.g.
640 x 480 => 80 x 60 Macroblocks
Discrete Cosine Transform Applied to Each 8x8
  – Spatial Intensity to Frequency Transform
  – Applied on X Axis (Row)
  – Applied on Y Axis (Column)
Set up for Intra-frame (I-frame) Compression

Convolution Concepts
Math operation on 2 functions that produces a 3rd
Point Spread Function
“Sharpen” meets this Definition
So do Many Mask Operations applied to Pixel Neighborhoods
[Figure: 2 impulses, f(t) and g(X – t); area inside intersection is f convolved with g over t]

DCT – Discrete Cosine Transform
Convolution of Image with Discrete Cosine
See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
De-convolved to restore image from Convolved Image
[Figure: DCT, Inverse DCT]

DCT Concepts
F(x) is a sum of sinusoids (each with frequency, amplitude)
DCT operates on a discrete number of samples
Can derive DC sum at any x, even where F(x) not known
N x N Macro-block has Zero Frequency DC at 0,0
  – Increasing Horizontal Frequency
  – Increasing Vertical Frequency
Can De-convolve (inverse DCT, or iDCT)
Can Eliminate High Frequency Horizontal and Vertical Terms
  – Minimal Losses from Truncation (otherwise lossless)
  – Loss of High Frequency Image Features (What are These?)

Basic Concept of Waveforms
Complex Waveform is Sum of Simple Fundamentals
Simple Fundamentals Can Be Derived from Complex

Scanline DCT Example
Small Losses Due to DCT, iDCT Numerical Truncation
Larger Losses Due to Higher-Order Term (H.O.T.) Quantization and Truncation
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-NFundamentals.xlsx

What Is Lost with DCT Quantization?
Noise More Than Anything Else
Complex XY Variable Patterns (Real Science Data?)
Complex Tiling: Higher Frequency X, Higher Frequency Y, Terms Can Still be Ignored
Complex Wood Texture: Most Detail in X, Far Less in Y
Randomized Texture Image: High X Detail, High Y Detail, Most Loss of Detail, But Noisy

Step #2A: Macro-block Discrete Cosine Transform
8x8 Pixel Block – Macro-block
  – SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
  – HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
  – HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR

Step #2B: Macro-block Quantization (Lossy)
Apply 8x8 Weighting and Scaling to DCT
Produces Lots of Repeated Values (and Zeros) Compared to Original

Decode Process for #2A-B

How Lossy is the Decoded Macro-block?

OpenCV Macroblock DCT Example
Same Cactus 320x240 with 80x80 DCT Macroblocks: DCT, iDCT
Same Cactus 320x240 Again with 8x8 DCT Macroblocks: DCT, iDCT

Mathematics for 2D DCT
Frequency Variation on X and Y axes from top left to bottom right
Straightforward Algorithm Based on 2D Equation is O(n^2) per dimension
Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; Numerical Recipes: The Art of Scientific Computing (3rd ed.))
http://en.wikipedia.org/wiki/File:Dctjpeg.png
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c

Step #2C: Macro-block Run-Length and Huffman Encoding
Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in H.O.T.
of Quantized DCT
  – 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0, …
  – Becomes: [run-length encoded result shown on slide, not captured in transcript]

Huffman Applied to RLE Data
Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2 (Lossless)
Compression Based on Probability of Occurrence
Shannon’s Source Coding Theorem: Ideal Code Length is -log2(P) Bits, P = Probability of Occurrence, Binary Encoding of Symbols

Step #3: Group of Pictures Concept – Transmit Change-Only Data
I-Frame Compressed Only Intra-Frame By Applying Methods #2A-2C to Macro-Blocks
I-Frame Can Be Decoded Alone
P-Frame is Differences Only Over the GoP
B-Frame is Differences Only Between Both I-Frame and Closest P-Frame
Difference Data Can be Further Encoded with Lossless Methods
Without Steps 2A-C, Specifically Quantization, and With High Motion Video, Difference Data Could Blow Up

Group of Pictures: High Level View

Overall MPEG YCrCb Compression Performance
Standard Definition 720x480x2 (675KB/frame) @ 30fps
  – Requires 20MB/sec (200 Mbps) Uncompressed
  – Typical MPEG-2 @ 3.75 Mbps, > 50x Compression
  – Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
  – 10 to 20 Programs on QAM 256 (48 Mbps, 6 MHz/Ch)
  – ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)
HD 720p (1280x720x2, 1800KB/frame) @ 30fps
  – Requires 53MB/sec (530 Mbps) Uncompressed
  – Typical MPEG-2 @ 20 Mbps, > 25x Compression
  – Typical MPEG-4 @ 10 Mbps, > 50x Compression
HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
  – Requires 120MB/sec (1200 Mbps) Uncompressed
  – Typical MPEG-2, VC-1 @ 45 Mbps, > 30x Compression
  – Typical MPEG-4 @ 20 Mbps, > 60x Compression

Parsing an Elementary Video Stream
Many 188-Byte Packet Types and Header
Allows for Multiplexing of many Video and Audio Streams on a Carrier

MPEG-4 vs.
MPEG-2

MPEG-2
  – Defined by ISO 13818-1, 13818-2
  – Leverages MPEG-1 (Moving Picture Experts Group – 1988)
  – Widely Used for Digital Video – Digital Cable TV, DVD
  – Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
    ATSC – Advanced Television Systems Committee (HDTV Broadcast)
      – 8VSB Modulation – 8 level Vestigial Sideband Modulation, 6 MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
      – Up to 1080p (1920x1080) Video Resolution
      – AC-3 (Dolby) Audio
    DVB – Digital Video Broadcasting (Europe, Satellite)
  – Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)

MPEG-4
  – Defined by ISO 14496 (1998)
  – Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
  – Better Compression Rates (improved motion prediction for P, B frames), MPEG-4 Part-10 (H.264), e.g. Blu-Ray
  – Extensions for Digital Rights Management
  – Advanced Audio Coding
  – Becoming More Widely Deployed for HD and Because of Lower Bit-Rate Transport Streams
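
Worked Example: Step #1 Chroma Subsampling Arithmetic
The bits-per-pixel figures quoted for YCrCb 4:4:4, 4:2:2 and 4:2:0 can be checked with a few lines of Python. This is a minimal sketch, assuming 8-bit samples and that each group of Y samples shares exactly one (Cr, Cb) pair; the names bits_per_pixel, FORMATS and frame_bytes are illustrative and do not come from the slides.

```python
# Sketch only: names and the FORMATS table are illustrative assumptions.

def bits_per_pixel(y_samples, chroma_pairs, bits=8):
    """Average bits per pixel when y_samples pixels share chroma_pairs (Cr, Cb) pairs."""
    total_bits = y_samples * bits + chroma_pairs * 2 * bits
    return total_bits / y_samples

# (Y samples per group, CrCb pairs per group) for each sampling format
FORMATS = {
    "4:4:4": (1, 1),  # one CrCb pair for every Y sample
    "4:2:2": (2, 1),  # one CrCb pair for every 2 Y samples
    "4:2:0": (4, 1),  # one CrCb pair for every 2x2 block of Y samples
}

def frame_bytes(width, height, fmt):
    """Uncompressed frame size in bytes for the given sampling format."""
    y_samples, chroma_pairs = FORMATS[fmt]
    bpp = bits_per_pixel(y_samples, chroma_pairs)
    return int(width * height * bpp) // 8

if __name__ == "__main__":
    for fmt in FORMATS:
        print(fmt, bits_per_pixel(*FORMATS[fmt]), frame_bytes(320, 240, fmt))
```

For the 320x240 frame used in the sample-grid figures (76,800 pixels), this gives 24, 16 and 12 bits per pixel respectively.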
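
Worked Example: Steps #2A and #2B on One 8x8 Block
Steps #2A and #2B (DCT applied on rows then columns, followed by quantization) can be sketched in plain Python. This is not the code from the course links and not the MPEG-2 quantization matrix: it assumes an orthonormal DCT-II and a single uniform quantizer step (step=16), both chosen only for illustration. It uses the straightforward O(n^2)-per-dimension form rather than a fast algorithm.

```python
import math

N = 8  # block edge length used throughout the slides

def _alpha(k):
    # Orthonormal scaling so the inverse transform is just the transpose
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# BASIS[k][n]: k-th cosine basis function sampled at position n
BASIS = [[_alpha(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]

def dct_1d(v):
    return [sum(BASIS[k][n] * v[n] for n in range(N)) for k in range(N)]

def idct_1d(c):
    return [sum(BASIS[k][n] * c[k] for k in range(N)) for n in range(N)]

def _transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(block):
    rows = [dct_1d(r) for r in block]                          # X axis (rows)
    return _transpose([dct_1d(c) for c in _transpose(rows)])   # Y axis (columns)

def idct_2d(coeffs):
    rows = [idct_1d(r) for r in coeffs]
    return _transpose([idct_1d(c) for c in _transpose(rows)])

def quantize(coeffs, step=16):
    # Uniform step is an assumption; MPEG-2 uses per-coefficient weights
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(q, step=16):
    return [[v * step for v in row] for row in q]

if __name__ == "__main__":
    ramp = [[8 * x + y for x in range(N)] for y in range(N)]  # smooth test block
    q = quantize(dct_2d(ramp))
    print(sum(v == 0 for row in q for v in row), "of 64 coefficients are zero")
```

Without quantization the DCT/iDCT round trip recovers the block to within floating-point error; with quantization most coefficients of a smooth block become zero, which is what Step #2C exploits.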
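
Worked Example: Step #2C Zig-Zag Scan and Run-Length Encoding
The zig-zag run-length idea in Step #2C can be illustrated with the coefficient sequence quoted on the slide. This is a simplified sketch: it encodes the AC coefficients as (zero run, value) pairs and treats all trailing zeros as an implied end-of-block, which is an assumption for illustration and not the exact MPEG-2 variable-length code. The slide's sequence ends in "…", so padding it with zeros to 64 entries is also an assumption.

```python
def zigzag_order(n=8):
    """(row, col) positions in zig-zag order, starting at DC (0, 0)."""
    def key(rc):
        r, c = rc
        s = r + c  # anti-diagonal index
        # Odd diagonals run top-right to bottom-left, even ones the reverse
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def rle_encode(scan):
    """Return (dc, [(zero_run, value), ...]); trailing zeros are dropped (implied end-of-block)."""
    dc, pairs, run = scan[0], [], 0
    for v in scan[1:]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return dc, pairs

def rle_decode(dc, pairs, length=64):
    out = [dc]
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * (length - len(out)))  # restore the implied trailing zeros
    return out

# Sequence from the slide, zero-padded to 64 entries (padding is an assumption)
SLIDE_SCAN = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0]
SLIDE_SCAN += [0] * (64 - len(SLIDE_SCAN))

if __name__ == "__main__":
    print(rle_encode(SLIDE_SCAN))
```

Under these assumptions the 64 coefficients reduce to one DC value plus ten (run, value) pairs.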
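
Worked Example: Huffman Code Lengths vs. -log2(P)
The link between symbol probability and code length in the Huffman slide can be shown with a small code-length calculation. This is a generic Huffman construction using Python's heapq, not the fixed tables defined in 13818-2; the four-symbol alphabet and its probabilities are made up for illustration.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a dict of symbol probabilities."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to each symbol beneath it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def ideal_length(p):
    """Ideal code length in bits for a symbol of probability p."""
    return -math.log2(p)

if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical alphabet
    lengths = huffman_code_lengths(probs)
    for s, p in probs.items():
        print(s, lengths[s], ideal_length(p))
```

When every probability is a power of 1/2, as in this made-up alphabet, the Huffman lengths equal -log2(P) exactly; for other distributions they are the nearest achievable whole-bit lengths.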
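
Worked Example: Uncompressed Frame Size and Bit Rate
The frame sizes in the compression performance slide follow from width x height x 2 bytes per pixel (YCrCb 4:2:2). The sketch below assumes 1 KB = 1024 bytes and a strict 8 bits per byte; under that assumption the uncompressed bit rates come out somewhat lower than the slide's rounded Mbps figures, so the compression ratios it prints are correspondingly lower. The function names are illustrative.

```python
def frame_kib(width, height, bytes_per_pixel=2):
    """Uncompressed frame size in KB (1 KB = 1024 bytes)."""
    return width * height * bytes_per_pixel / 1024

def uncompressed_mbps(width, height, bytes_per_pixel=2, fps=30):
    """Uncompressed video bit rate in Mbps at 8 bits per byte."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6

def compression_ratio(width, height, compressed_mbps, bytes_per_pixel=2, fps=30):
    """Ratio of uncompressed to compressed bit rate."""
    return uncompressed_mbps(width, height, bytes_per_pixel, fps) / compressed_mbps

if __name__ == "__main__":
    # (label, width, height, typical MPEG-2 Mbps from the slide)
    for label, w, h, mpeg2 in [("SD", 720, 480, 3.75),
                               ("720p", 1280, 720, 20),
                               ("1080p", 1920, 1080, 45)]:
        print(label, frame_kib(w, h), "KB/frame,",
              round(uncompressed_mbps(w, h), 1), "Mbps,",
              round(compression_ratio(w, h, mpeg2), 1), "x")
```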