• Huffman: encode symbols as paths down a binary tree. Build the tree by repeatedly merging the two lowest-probability nodes under a new parent.

    • Letter-oriented Huffman compresses well when letter frequencies are skewed, but it cannot exploit correlations between consecutive letters (common words and syllables).

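The tree build can be sketched with a min-heap (Python; illustrative only, the function name and symbol/frequency example are my own):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} map."""
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two lowest-probability nodes...
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))  # ...grouped under a parent
    return heap[0][2]

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
```

The most frequent symbol gets the shortest path, and the codes are prefix-free because every symbol sits at a leaf.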

  • Lempel Ziv


    • both terms LZ77 and LZSS tend to be used very loosely, so they do not really imply very specific algorithms
      • in LZ77 the dictionary reference could actually be longer than the string it was replacing. In LZSS, such references are omitted if the length is less than the "break even" point.
      • LZSS is a family of algorithms that generalizes LZ77 (source)
    • LZ77 is good at detecting and compressing these kinds of common words and syllables that occur far more often than you might guess from the letter frequencies alone.
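A minimal greedy LZSS-style pass can be sketched as follows (Python; the window size, `min_match` break-even threshold, and token format are illustrative choices, not any specific wire format):

```python
def lz77_compress(data, window=4096, min_match=3):
    """Greedy LZSS-style pass: emit literals, or ("copy", distance, length)
    items when a match of at least min_match is found in the window."""
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Matches may overlap the current position (valid in LZ77).
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:  # below this, a reference costs more than literals
            out.append(("copy", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out
```

On `"abcabcabc"` this emits three literals followed by a single overlapping copy item, which is exactly the kind of repeated-substring structure letter-frequency coding cannot see.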
  • Deflate compression: LZSS → Huffman

    • The LZ stage compresses the original file into an intermediate sequence of literal letters and "copy items" (references to earlier text). Huffman coding then further compresses that intermediate sequence.
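This pipeline is what Python's stdlib `zlib` module implements directly, so the round trip can be demonstrated without writing either stage:

```python
import zlib

# Deflate = LZ-style matching followed by Huffman coding of the tokens.
data = b"the quick brown fox jumps over the lazy dog. " * 100
packed = zlib.compress(data, level=9)

assert zlib.decompress(packed) == data  # lossless round trip
assert len(packed) < len(data) // 10    # repetitive input compresses heavily
```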
  • Zstd

  • JPEG compression

    • Algo

      • Convert from RGB to the YCbCr color space, which separates luminance (Y) from chrominance (Cb, Cr)

      • Downsample the chrominance channels; quite a lot of downsampling is imperceptible

      • Divide into 8x8 blocks

      • Center values around 0 by subtracting 128 from each 8-bit sample

      • Perform the DCT, so the block becomes a linear combination of 64 cosine basis patterns, with coefficients between -1024 and 1024. Most large coefficients sit in the low frequencies ("energy compaction")


      • Divide each integer coefficient by the corresponding integer in a quantization table and round; a quality level of 50 uses a standard table. The result is that many values go to 0, especially the higher-frequency entries


      • Run-length encode the result, reading coefficients in a zigzag pattern, since this keeps the high-frequency 0s together


      • Huffman compress
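The zigzag-scan and zero-run steps can be sketched as follows (Python; the token layout is simplified relative to real JPEG symbols, which cap zero runs at 15 with ZRL codes):

```python
def zigzag_order(n=8):
    """Index pairs for the zigzag scan of an n x n block: walk the
    anti-diagonals, alternating direction so the path snakes."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def rle_zeros(seq):
    """(zero_run, value) pairs, with an end-of-block marker when the
    sequence ends in zeros (simplified relative to real JPEG)."""
    out, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    if run:
        out.append("EOB")
    return out
```

Because quantization zeroes out most high-frequency entries, the zigzag ordering turns the tail of each block into one long zero run that collapses to a single EOB token.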

    • DCT: can be computed as a plain matmul, B = C A Cᵀ. Because the DCT matrix C is orthonormal, its transpose is its inverse, so the inverse transform is A = Cᵀ B C.

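A sketch of that matrix form (Python, pure stdlib; `C` here is the standard orthonormal DCT-II matrix):

```python
import math

N = 8
# Orthonormal DCT-II matrix: C[k][n] = a(k) * cos((2n + 1) k pi / (2N)),
# with a(0) = sqrt(1/N) and a(k) = sqrt(2/N) for k > 0.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def dct2(block):
    """2D DCT of an 8x8 block: B = C A C^T (invert with A = C^T B C)."""
    return matmul(matmul(C, block), transpose(C))
```

Orthonormality means C Cᵀ = I, which is why the decoder can undo the transform with a transpose instead of a matrix inversion. A flat block of 128s lands entirely in the DC coefficient (value 1024), illustrating both energy compaction and the ±1024 coefficient range.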

    • Weakness: images with lots of high frequencies, like text, where quantization produces ringing artifacts around the sharp edges


  • PNG

    • Focus on lossless compression

    • Filters (like UP, AVG, PAETH, etc.) → deflate

    • Filter arithmetic is unsigned, modulo 256 (byte values wrap around)

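The UP filter and the Paeth predictor are small enough to sketch (Python; the modulo-256 wraparound matches the PNG spec's unsigned byte arithmetic):

```python
def paeth(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left.
    Pick the neighbour closest to the linear estimate a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def filter_up(row, prev_row):
    """UP filter: each byte minus the byte directly above it, mod 256."""
    return bytes((x - y) & 0xFF for x, y in zip(row, prev_row))
```

The filters don't compress anything themselves; they just turn smooth gradients into rows of small, repetitive residuals that deflate handles much better than the raw pixels.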


  • MPEG

    • I-frames: independent frames
      • DCT. Basically JPEGs. 8x8 blocks.
    • P-frames: predicted frames, depending only on earlier I and P frames and not B frames
      • Encode just the differences from the prior decoded frame, then DCT; the DCT is more effective on the smaller magnitudes
    • B-frames: bidirectionally predicted frames
      • Because they reference both earlier and later frames, B-frames are decoded out of display order, and may be stored out of order by the container format
    • Literature on MPEG-1 suggests a typical I-frame uses about 1 bit per pixel, a typical P-frame about 0.1 bits per pixel, and a typical B-frame about 0.015 bits per pixel
    • Motion compensation
      • Operate on macroblocks (16x16)

      • Want to find blocks that result in smaller encoding

      • Could just search a local area, which is good for capturing real movement

      • But to find the best match anywhere in the frame (purely to optimize the encoding), search a larger area: an iterative, coarse-to-fine 2D search, with a fixed-depth beam search keeping the top 25% of candidates


      • Rank candidate blocks by an error metric (e.g., sum of absolute differences or mean squared error)

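Exhaustive local block matching with SAD can be sketched as follows (Python; the frame, block size, and search radius are toy-sized for illustration rather than real 16x16 macroblocks):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_match(ref, block, top, left, radius):
    """Search a (2*radius+1)^2 window of the reference frame around
    (top, left) for the candidate position minimizing SAD."""
    n = len(block)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + n <= len(ref) and 0 <= x and x + n <= len(ref[0]):
                cand = [row[x:x + n] for row in ref[y:y + n]]
                cost = sad(cand, block)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best  # (cost, motion vector dy, dx)
```

The returned (dy, dx) is the motion vector; the encoder then transmits that vector plus the DCT of the (hopefully near-zero) residual instead of the raw block.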
