MPEG I-Frame Encoding

Computer and Machine Vision
Deeper Dive into MPEG
Digital Video Encoding
January 22, 2014
 Sam Siewert
Reminders
CV and MV Use UNCOMPRESSED FRAMES
Remote Cameras (E.g. Security) May Need to Transport Captured Frames Over a Network to the CV/MV Processor
We NEED to Understand Both!
BEWARE of LOSSY COMPRESSION
I-Frame-ONLY MPEG or MJPEG Is a Decent Compromise Between the Two
MPEG: Order Of Operators
[Figure: MPEG encoder block diagram with stages labeled #1, #2A, #2B, #2C, #3]
#1: POINT (Pixel) Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures
Step #1 – RGB to YCrCb 4:4:4 24-bit
(Lossless)
For every Y sample in a scan-line, there is also one CrCb
sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)
Typically a Post Production, CEDIA or DCI format
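The per-pixel conversion in Step #1 can be sketched as below. The slides do not give the conversion matrix, so this sketch assumes the full-range BT.601 coefficients used by JPEG/JFIF; the function name rgb_to_ycrcb is hypothetical.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit (Y, Cr, Cb).

    Assumption: full-range BT.601 weights (as in JPEG/JFIF), not taken
    from the slides.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cr), clamp(cb)


print(rgb_to_ycrcb(255, 255, 255))  # white: full luma, neutral chroma
print(rgb_to_ycrcb(255, 0, 0))      # pure red: high Cr
```

Input and output are both three 8-bit samples per pixel, which is why 4:4:4 gives no size reduction.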
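[Figure: 320x240 frame, pixels numbered 0 to 76,799 (first scan-line 0 to 319, last scan-line 76,480 to 76,799); legend distinguishes pixels with Y, Cr, and Cb samples from pixels with a Y sample only]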
Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb is 32 bits, or 16
bits per pixel vs. 24 bits per pixel (33% smaller frame size)
[Figure: 320x240 frame, pixels numbered 0 to 76,799; each pair of pixels goes from 48 bits to 32 bits; legend distinguishes pixels with Y, Cr, and Cb samples from pixels with a Y sample only]
Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Four RGB Pixels = 96 bits, Whereas Four YCrCb is 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)
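The frame-size arithmetic for the three sampling schemes can be checked with a short sketch. It assumes 8-bit samples and the 320x240 frame used in the figures; frame_bytes is a hypothetical helper name.

```python
def frame_bytes(width, height, scheme):
    """Bytes per frame for 8-bit samples under a chroma subsampling scheme."""
    luma = width * height                            # one Y sample per pixel
    if scheme == "4:4:4":
        chroma_pairs = width * height                # one Cr,Cb pair per pixel
    elif scheme == "4:2:2":
        chroma_pairs = (width // 2) * height         # one pair per 2 pixels in a line
    elif scheme == "4:2:0":
        chroma_pairs = (width // 2) * (height // 2)  # one pair per 2x2 block
    else:
        raise ValueError("unknown scheme: " + scheme)
    return luma + 2 * chroma_pairs


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    total = frame_bytes(320, 240, scheme)
    print(scheme, total, "bytes,", 8 * total / (320 * 240), "bits/pixel")
```

For 320x240 this gives 24, 16, and 12 bits per pixel, matching the per-slide figures.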
[Figure: 320x240 frame, pixels numbered 0 to 76,799; legend distinguishes Cr, Cb sample positions from pixels with a Y sample only]
Step #2 – Convert to 8x8 Macroblocks
and Transform
Frame Dimensions Designed to Fit 8x8 Macroblocks (Multiples of 8)
E.g. 640 x 480 => 80 x 60 Macroblocks
Discrete Cosine Transform Applied to Each 8x8
– Spatial Intensity to Frequency Transform
– Applied on X Axis (Row)
– Applied on Y Axis (Column)
Set up for Intra-frame (I-frame) Compression
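The row-then-column application of the transform can be sketched as follows. This is a plain, unoptimized sketch using the orthonormal DCT-II; the scaling convention and all function names are assumptions, not taken from the slides.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factor (an assumed convention).
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_1d(v):
    n = len(v)
    return [_scale(k, n) * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct_1d(c):
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(r) for r in block]                       # X axis (rows)
    return transpose([dct_1d(c) for c in transpose(rows)])  # Y axis (columns)


def idct_2d(coeffs):
    cols = transpose([idct_1d(c) for c in transpose(coeffs)])
    return [idct_1d(r) for r in cols]


# A 640x480 frame splits into 80 x 60 blocks of 8x8 pixels.
blocks_across, blocks_down = 640 // 8, 480 // 8

flat = [[128] * 8 for _ in range(8)]
print(round(dct_2d(flat)[0][0], 6))  # all energy of a flat block lands in the DC term
```

A flat block produces a single non-zero coefficient at position 0,0, which is the property the later quantization and run-length steps exploit.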
Convolution Concepts
Math operation on 2 functions, that produces a 3rd
Point Spread Function “Sharpen” meets this Definition
So do Many Mask Operations applied to Pixel Neighborhoods
[Figure: 2 impulses, f(t) and g(X – t); the area inside their intersection is f convolved with g over t]
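A discrete version of the same idea, on one scan-line, can be sketched as below. The three-tap sharpening mask is a hypothetical example chosen so its weights sum to 1; it is not the mask from the slides.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over t of f[t] * g[n - t]."""
    out = [0] * (len(f) + len(g) - 1)
    for t, ft in enumerate(f):
        for j, gj in enumerate(g):
            out[t + j] += ft * gj
    return out


signal = [10, 10, 10, 50, 50, 50]   # a step edge along one scan-line
sharpen = [-1, 3, -1]               # hypothetical 1D sharpening mask
print(convolve(signal, sharpen))
```

Away from the ends, the flat regions pass through unchanged (10 and 50) while the samples on either side of the edge overshoot, which is the visual effect of sharpening.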
DCT – Discrete Cosine Transform
Convolution of Image with Discrete Cosine (more precisely, projection of the image onto discrete cosine basis functions)
See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
De-convolved (Inverse DCT) to restore image from Convolved Image
[Figure: image after DCT and after Inverse DCT]
DCT Concepts
F(x) is a sum of sinusoids (with frequency, amplitude)
DCT operates on a discrete number of samples
Can derive DC sum at any x, even where F(x) not known
N x N Macro-block has Zero Frequency DC at 0,0
– Increasing Horizontal Frequency (Left to Right)
– Increasing Vertical Frequency (Top to Bottom)
Can De-convolve (inverse DCT, or iDCT)
Can Eliminate High Frequency Horizontal and Vertical Terms
– Minimal Losses from Truncation (otherwise lossless)
– Loss of High Frequency Image Features (What are These?)
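The effect of eliminating high-frequency terms can be sketched on a single 8-sample scan-line. The sample values and the orthonormal DCT-II scaling are assumptions chosen for illustration.

```python
import math


def dct(v):
    n = len(v)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) *
            sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]


def idct(c):
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k] *
                math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]


def truncate(coeffs, keep):
    """Keep the `keep` lowest-frequency terms and zero the rest."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


def max_error(scanline, keep):
    rebuilt = idct(truncate(dct(scanline), keep))
    return max(abs(a - b) for a, b in zip(scanline, rebuilt))


smooth = [100, 102, 104, 106, 108, 110, 112, 114]   # gentle ramp
edge = [100, 100, 100, 100, 200, 200, 200, 200]     # sharp edge

print("smooth, 4 terms kept:", max_error(smooth, 4))
print("edge,   4 terms kept:", max_error(edge, 4))
```

The ramp survives truncation almost untouched, while the sharp edge does not: edges, fine texture, and noise are the high-frequency features that are lost.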
Basic Concept of Waveforms
Complex Waveform is Sum of Simple Fundamentals
Simple Fundamentals Can Be Derived from Complex
Scanline DCT Example
Small Losses Due to DCT, iDCT Numerical Truncation
Larger Losses Due to Higher-Order Term (H.O.T.) Quantization and Truncation
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-NFundamentals.xlsx
What Is Lost with DCT Quantization?
Noise More Than Anything Else
Complex XY Variable Patterns (Real Science Data?)
– Complex Tiling: Higher Frequency X and Higher Frequency Y Terms Can Still be Ignored
– Complex Wood Texture: Most Detail in X, Far Less in Y
– Randomized Texture Image: High X Detail, High Y Detail; Most Loss of Detail, But Noisy
Step #2A: Macro-block Discrete Cosine
Transform
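8x8 Pixel Block – Macro-block (strictly, the MPEG-2 standard calls the 8x8 DCT unit a block, and a macroblock is 16x16 luma pixels made of four such blocks)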
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR
Step #2B: Macro-block Quantization (Lossy)
Apply an 8x8 Weighting and Scaling Matrix to the DCT Coefficients
Produces Lots of Repeated Values (and Zeros) Compared to Original
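The quantization step can be sketched as an element-wise divide and round. Both the weighting matrix and the coefficient block below are hypothetical, made up for illustration; they are not the MPEG-2 default matrices or real image data.

```python
def make_weights(n=8, base=8, step=6):
    """Hypothetical weighting matrix: coarser steps at higher frequencies."""
    return [[base + step * (u + v) for u in range(n)] for v in range(n)]


def quantize(coeffs, weights):
    # Divide each coefficient by its weight and round: this is the lossy step.
    return [[int(round(c / w)) for c, w in zip(crow, wrow)]
            for crow, wrow in zip(coeffs, weights)]


def dequantize(levels, weights):
    # The decoder can only multiply back; the rounding error is not recoverable.
    return [[lv * w for lv, w in zip(lrow, wrow)]
            for lrow, wrow in zip(levels, weights)]


# Hypothetical DCT output with energy concentrated near the DC term at [0][0].
coeffs = [[800.0 / (1 + u + v) ** 3 for u in range(8)] for v in range(8)]
weights = make_weights()
levels = quantize(coeffs, weights)
zeros = sum(row.count(0) for row in levels)

print(levels[0])                 # first row of quantized levels
print(zeros, "of 64 levels are zero")
```

Most of the 64 levels round to zero, and the few that remain are small integers, which sets up the run-length and Huffman stages.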
Decode Process for #2A-B
How Lossy is the Decoded Macro-block?
OpenCV Macroblock DCT Example
Same Cactus 320x240 with 80x80 DCT Macroblocks (DCT, then iDCT)
Same Cactus 320x240 Again with 8x8 DCT Macroblocks (DCT, then iDCT)
Mathematics for 2D DCT
Frequency Variation on X and Y axes from top
left to bottom right
Straight-forward Algorithm Based on 2D Equation is O(n^2) per dimension
Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing (3rd ed.))
http://en.wikipedia.org/wiki/File:Dctjpeg.png
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c
Step #2C: Macro-block Run-Length and
Huffman Encoding
Zig-Zag Run-Length Encoding to Exploit Repeated Data
and Zeros found in H.O.T. of Quantized DCT
– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0, …
Becomes:
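One common way to write the result is as (zero-run, value) pairs followed by an end-of-block marker. The sketch below assumes that pairing, which may differ from the exact format shown on the original slide. It uses only the values listed above and treats everything past the "…" as zeros, which is also an assumption. Function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) index pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col of each diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order


def run_length_encode(seq):
    """Encode as (zeros_skipped, value) pairs, ending with an 'EOB' marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")                           # trailing zeros are implied
    return pairs


# Values as listed in the zig-zag sequence above (trailing "…" assumed zero).
scanned = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0]

print(zigzag_order()[:6])
print(run_length_encode(scanned))
```

The zig-zag scan visits low-frequency coefficients first, so the zeros in the higher-order terms cluster into long runs at the end of the sequence.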
Huffman Applied to RLE Data
Huffman Tables for MPEG-2 Macro-Blocks Defined in ISO/IEC 13818-2 (Lossless)
Compression Based on Probability of Occurrence
Shannon's Source Coding Theorem: Ideal Code Length is -log2(P) bits, P=probability of occurrence, Binary encoding of Symbols
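The link between probability and code length can be sketched by building a Huffman code for a small symbol set. The symbol counts are hypothetical, and MPEG-2 itself uses fixed tables from the standard rather than building a tree per block.

```python
import heapq
import math


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Each heap entry: (total count, tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)            # two least probable subtrees
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical counts of quantized levels: zeros dominate.
freqs = {0: 8, 1: 4, -1: 2, 2: 2}
total = sum(freqs.values())
lengths = huffman_code_lengths(freqs)

for sym, f in freqs.items():
    print(sym, "P =", f / total, "ideal bits =", -math.log2(f / total),
          "Huffman bits =", lengths[sym])
```

With these counts every probability is a power of 1/2, so the Huffman lengths equal -log2(P) exactly; in general Huffman lengths are whole-bit approximations of that ideal.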
Step #3: Group of Pictures
Concept – Transmit Change-Only Data
I-Frame Compressed Only Intra-Frame By Methods #2A-2C Applied to Macro-Blocks
I-Frame Can Be Decoded Alone
P-Frame Holds Only Differences from the Preceding I-Frame or P-Frame in the GoP
B-Frame Holds Only Differences from Both the Preceding and the Following Reference Frame (I-Frame or P-Frame)
Difference Data Can be Further Encoded with Lossless Methods
Without Steps 2A-C, Specifically Quantization, Difference Data for High-Motion Video Could Blow Up
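The change-only idea can be sketched with plain frame differencing on tiny one-line "frames". This is a simplified stand-in: real P-frames and B-frames use motion-compensated prediction per macro-block, which is not modeled here, and the function names are hypothetical.

```python
def encode_gop(frames):
    """Keep the first frame whole (I); store each later frame only as its
    difference from the previous frame (a simplified stand-in for P-frames)."""
    encoded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(("P", [c - p for c, p in zip(cur, prev)]))
    return encoded


def decode_gop(encoded):
    frames = [list(encoded[0][1])]                # the I-frame decodes alone
    for _, diff in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames


frames = [
    [10, 10, 10, 10, 200, 10, 10, 10],
    [10, 10, 10, 10, 10, 200, 10, 10],   # bright pixel moved one step right
    [10, 10, 10, 10, 10, 10, 200, 10],
]

encoded = encode_gop(frames)
for kind, data in encoded:
    print(kind, data)
```

With little motion the difference frames are mostly zeros, which the run-length and Huffman stages compress well; with high motion most entries are non-zero, which is the blow-up case noted above.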
Group of Pictures: High Level View
Overall MPEG YCrCb Compression
Performance
Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires 20MB/sec (200 Mbps) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, > 50x Compression
– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
– 10 to 20 Programs on QAM 256 (48 Mbps, 6 MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)
HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires 53MB/sec (530Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, > 25x Compression
– Typical MPEG-4 @ 10 Mbps, > 50x Compression
HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires 120MB/sec (1200Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, > 30x Compression
– Typical MPEG-4 @ 20 Mbps, > 60x Compression
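The raw-rate arithmetic behind these figures can be reproduced with a short sketch. It assumes exactly 8 bits per byte and 1 Mbps = 10^6 bits per second. The rounded Mbps values on the slide are somewhat higher and look closer to 10 bits per byte; that reading is an assumption, not something the slides state.

```python
def uncompressed_mbps(width, height, bytes_per_pixel=2, fps=30):
    """Raw video rate in megabits per second (8 bits/byte, 1 Mb = 10**6 bits)."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


def compression_ratio(width, height, coded_mbps):
    return uncompressed_mbps(width, height) / coded_mbps


formats = {
    "SD 720x480": (720, 480, 3.75),      # typical MPEG-2 rate from the slide
    "HD 720p": (1280, 720, 20.0),
    "HD 1080p": (1920, 1080, 45.0),
}

for name, (w, h, mpeg2_mbps) in formats.items():
    print(name,
          "frame KB:", w * h * 2 / 1024,
          "raw Mbps:", uncompressed_mbps(w, h),
          "MPEG-2 ratio:", round(compression_ratio(w, h, mpeg2_mbps), 1))
```

The per-frame sizes (675, 1800, and 4050 KB) match the slide exactly; the ratios computed this way come out lower than the slide's rounded values but of the same order.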
Parsing an Elementary Video Stream
Many 188-Byte Transport Stream Packet Types, Each with a Header
Allows for Multi-plexing of many Video and Audio
Streams on a Carrier
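Reading the 4-byte header of one 188-byte packet can be sketched as below. The field layout follows the MPEG-2 Transport Stream packet header; the function name, dictionary keys, and the sample packet bytes are hypothetical.

```python
def parse_ts_header(packet):
    """Decode the 4-byte header of one 188-byte MPEG-2 Transport Stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a 188-byte packet starting with sync byte 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],      # 13-bit packet ID
        "scrambling": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,             # 4-bit counter per PID
    }


# Hypothetical packet: sync byte, payload-unit-start set, PID 0x100, counter 7.
packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(packet))
```

The PID is what lets a demultiplexer pull one video or audio stream out of the many carried on the same channel.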
MPEG-4 vs. MPEG-2
MPEG-2 – Defined by ISO/IEC 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group – 1988)
– Widely Used for Digital Video – Digital Cable TV, DVD
– Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)
ATSC – Advanced Television Systems Committee (HDTV Broadcast)
– 8VSB Modulation – 8-level Vestigial Sideband Modulation, 6 MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
– Up to 1080p (1920x1080) Video Resolution
– AC-3 (Dolby) Audio
DVB – Digital Video Broadcasting (Europe, Satellite)
MPEG-4 – Defined by ISO/IEC 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (improved motion prediction for P,B frames), MPEG-4 Part 10 (H.264), e.g. Blu-ray
– Extensions for Digital Rights Management
– Advanced Audio Coding (AAC)
– Becoming More Widely Deployed for HD Because of Lower Bit-Rate Transport Streams