Video Compression Theory

Image Sampling
The subsampling scheme is commonly expressed as a three-part ratio J:a:b (e.g. 4:2:2), although sometimes expressed as four parts (e.g. 4:2:2:4), that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high. The parts are (in their respective order):
J: horizontal sampling reference (width of the conceptual region). Usually, 4.
a: number of chrominance samples (Cr, Cb) in the first row of J pixels.
b: number of (additional) chrominance samples (Cr, Cb) in the second row of J pixels.
Alpha: horizontal factor (relative to the first digit). May be omitted if the alpha component is not present, and is equal to J when present.
YCbCr 4:2:0 (often with interlaced scanning in broadcast applications) is common practice in portable video devices (MPEG-4), video conferencing (H.263), DVD, digital TV and HDTV, as well as consumer video devices, while YCbCr 4:2:2 is prevalent in professional and studio video equipment.
The percentage of bits saved in storing the image, relative to 4:4:4, is 1 − (J + a + b) / (4 + 4 + 4).
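As a quick sanity check of that formula, here is a minimal Python sketch (the function name is just illustrative):

    def subsampling_savings(j, a, b):
        # Fraction of bits saved versus full 4:4:4 sampling: 1 - (J + a + b) / 12
        return 1 - (j + a + b) / (4 + 4 + 4)

    for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
        print(scheme, "->", f"{subsampling_savings(*scheme):.0%} saved")
    # (4, 4, 4) -> 0% saved, (4, 2, 2) -> 33% saved, (4, 2, 0) and (4, 1, 1) -> 50% saved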

Reference URL: Sampling System and Ratios–Wiki page

Processing Unit for Image Compression
The macroblock is the basic unit for sampling and encoding a video frame or image, and it normally comprises two or more blocks. A block is a matrix of pixels whose size depends on the codec but is usually a multiple of 4. In MPEG-2 and other early codecs, for example, a block has a fixed size of 8×8 pixels.
A macroblock comprises four Y blocks, one Cb block and one Cr block, defining a 16×16 pixel square of YCbCr 4:2:0. In more modern codecs such as H.263 and H.264 the overarching macroblock size is fixed at 16×16 pixels, but this is broken down into smaller blocks or partitions that are either 4, 8 or 16 pixels by 4, 8 or 16 pixels. (These smaller partitions must combine to form 16×16 macroblocks.)
Color information is usually encoded at a lower resolution than the luminance information. For example, the color information of an 8×8 block sampled as 4:1:1 is encoded in YCbCr format with the luminance at the full 8×8 size and the difference-red and difference-blue information each at 2×8 (one chroma sample for every four luma samples horizontally); in the decoding process these are stretched back out to cover the 8×8 area.
Each macroblock contains four Y (luminance) blocks, one Cb (blue color-difference) block and one Cr (red color-difference) block (4:2:0); it could also be represented in 4:2:2 or 4:4:4 YCbCr format. Macroblocks can be subdivided further into smaller blocks, called partitions; H.264, for example, supports block sizes as small as 4×4.
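To make the layout concrete, the following sketch (illustrative only, not from the original article) counts the samples carried by one 16×16 macroblock in the three common YCbCr formats:

    import numpy as np

    MB = 16  # a macroblock covers 16x16 luma samples (four 8x8 Y blocks)

    def macroblock_samples(chroma_shape):
        # One Y plane plus one Cb and one Cr plane of the given shape.
        y = np.zeros((MB, MB), dtype=np.uint8)
        cb = np.zeros(chroma_shape, dtype=np.uint8)
        cr = np.zeros(chroma_shape, dtype=np.uint8)
        return y.size + cb.size + cr.size

    print("4:2:0:", macroblock_samples((8, 8)))    # 256 + 64 + 64   = 384
    print("4:2:2:", macroblock_samples((16, 8)))   # 256 + 128 + 128 = 512
    print("4:4:4:", macroblock_samples((16, 16)))  # 256 + 256 + 256 = 768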
Reference URL: Macroblock-Wiki Page

Image Codec Flow
(Figure: image codec flow — DCT → quantization and zig-zag scanning → run-level encoding → variable-length coding)
Discrete Cosine Transform (DCT)
The DCT is widely used in signal and image (still and motion) compression, at the cost of a certain degree of quality degradation. It transforms the signal into the frequency domain, where its correlations and redundancy become much more apparent, which makes de-correlating and compressing the signal easier.
The DCT transforms each block of 8×8 samples (pixels) into a block of 8×8 spatial-frequency coefficients, i.e. 64 weights for the 64 basis patterns of an 8×8 pixel block. Because the energy of most images tends to be concentrated in the low-frequency area, after the DCT there are usually only a few significant coefficients, located near the low-frequency corner, while the many insignificant coefficients can be neglected. In other words, keeping only those few significant coefficients is enough to recover the original image without significant quality degradation.
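A small sketch of this idea, assuming SciPy's dctn/idctn are available; the smooth gradient block and the 4×4 low-frequency cut-off are purely illustrative:

    import numpy as np
    from scipy.fft import dctn, idctn

    # A smooth gradient block, typical of natural-image content.
    block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 4 + 100

    coeffs = dctn(block, norm='ortho')   # 64 spatial-frequency weights

    kept = np.zeros_like(coeffs)
    kept[:4, :4] = coeffs[:4, :4]        # keep only the low-frequency corner

    reconstructed = idctn(kept, norm='ortho')
    print("max reconstruction error:", np.abs(block - reconstructed).max())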
Zig-Zag Scanning
After the significant spatial-frequency coefficients have been picked out, they are quantized and scanned in a zig-zag order, producing a long sequence of values, the majority of which are zero.
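A minimal sketch of the scan order (real codecs use fixed look-up tables; the diagonal directions here follow the common JPEG/MPEG pattern):

    import numpy as np

    def zigzag(block):
        # Order the 8x8 positions by anti-diagonal (i + j), alternating direction,
        # so the scan starts at the DC coefficient and ends at the
        # highest-frequency corner.
        h, w = block.shape
        order = sorted(((i, j) for i in range(h) for j in range(w)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return [block[i, j] for i, j in order]

    quantized = np.zeros((8, 8), dtype=int)
    quantized[0, 0], quantized[0, 1], quantized[1, 0], quantized[2, 0] = 12, -3, 5, 1
    print(zigzag(quantized)[:8])   # -> [12, -3, 5, 1, 0, 0, 0, 0]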
Run-Level Encoding
The long sequence of values output by zig-zag scanning is encoded as (run, level) pairs, so each block (8×8 pixels) is reduced to a short sequence of (run, level) pairs.
run = number of zeros preceding the level (non-zero value)
level = non-zero value
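A minimal (run, level) encoder for the zig-zag output described above (a real codec would also emit an end-of-block marker, omitted here):

    def run_level_encode(seq):
        pairs, run = [], 0
        for value in seq:
            if value == 0:
                run += 1          # count zeros until the next non-zero value
            else:
                pairs.append((run, value))
                run = 0
        return pairs

    print(run_level_encode([12, 0, 0, -3, 0, 5, 0, 0, 0, 0]))
    # -> [(0, 12), (2, -3), (1, 5)]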
Variable-Length Coding
Encoding each (run, level) pair with a variable-length code, so that frequent pairs get short codewords and rare pairs get longer ones, produces the compressed version of the image.
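As a toy illustration only: the codeword table below is made up, and the real tables (e.g. the Huffman-style tables of MPEG-2 or CAVLC in H.264) are fixed by each standard. The point is simply that frequent pairs get short codes and rare pairs fall back to a longer escape code.

    # Hypothetical codeword table; real codecs define these in the standard.
    TOY_TABLE = {(0, 1): '11', (0, -1): '011', (1, 1): '0100', (0, 2): '0101'}

    def vlc_encode(pairs):
        bits = []
        for pair in pairs:
            code = TOY_TABLE.get(pair)
            if code is None:
                # Escape: fixed-length fallback for (run, level) pairs not in the table.
                run, level = pair
                code = '000001' + format(run, '06b') + format(level & 0xFFF, '012b')
            bits.append(code)
        return ''.join(bits)

    print(vlc_encode([(0, 1), (1, 1), (0, -5)]))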
What Causes Quality Loss
According to the image codec flow, precision loss is mainly introduced by sampling and by DCT quantization. If an image is over-compressed, then
- block edges start to show (“blockiness”),
- high-frequency patterns start to appear (“mosquito noise”)

Video Compression
Video is fundamentally a sequence of frames (still images) presented in order at very short time intervals, so successive frames tend to be very similar. In short, video compression is a two-step process: intra-frame compression and inter-frame compression. Intra-frame compression is exactly the image compression described above, while inter-frame compression aims to reduce the redundancy between frames. The following sections go into more detail about inter-frame compression.
Inter-Frame Compression
Inter-frame compression is carried out on a GOP (group of pictures: typically 6 to 15 successive frames) basis. Within each GOP, motion estimation and motion compensation are performed for every member frame in units of 16×16 samples (macroblocks).
Motion Estimation: compare the frames in the GOP against the reference frame (the I-frame) and, for each macroblock, find the closest matching area in that reference; the offset between the macroblock and its best match is referred to as the motion vector.
Motion Compensation: subtract the matched area of the I-frame from each GOP member frame, macroblock by macroblock, so that only the difference remains in that member frame (the subtraction frame, or residual) before it is sent to the image-compression system, which can compress this kind of residual frame very efficiently.
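A rough sketch of full-search block matching for one 16×16 macroblock, assuming grayscale numpy frames; the SAD cost and the ±8 search window are illustrative choices, and real encoders use much faster search strategies:

    import numpy as np

    def estimate_motion(reference, current, mb_row, mb_col, search=8, mb=16):
        # Find the (dy, dx) offset in the reference frame that best matches the
        # macroblock at (mb_row, mb_col) of the current frame, by minimising the
        # sum of absolute differences (SAD).
        target = current[mb_row:mb_row + mb, mb_col:mb_col + mb].astype(int)
        best, best_sad = (0, 0), None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                r, c = mb_row + dy, mb_col + dx
                if r < 0 or c < 0 or r + mb > reference.shape[0] or c + mb > reference.shape[1]:
                    continue
                candidate = reference[r:r + mb, c:c + mb].astype(int)
                sad = np.abs(target - candidate).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best = sad, (dy, dx)
        return best, best_sad   # motion vector and matching cost

    def compensate(reference, current, mb_row, mb_col, motion_vector, mb=16):
        # Motion compensation: subtract the matched reference block so that only
        # the (mostly near-zero) residual goes on to the image-compression stage.
        dy, dx = motion_vector
        predicted = reference[mb_row + dy:mb_row + dy + mb, mb_col + dx:mb_col + dx + mb]
        return current[mb_row:mb_row + mb, mb_col:mb_col + mb].astype(int) - predicted.astype(int)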
Inter-Frame Decoding
The received video data stream is first passed through the image-decoding system, which outputs a sequence of I-frames, B-frames (bi-directional frames), and P-frames (predicted frames). These frames are then subjected to inter-frame decoding on a GOP basis. During that decoding, the I-frame feeds information to the P-frames, which in turn feed information to the B-frames.
To be continued.