How not to design... containers

This post is part 1 of the "How not to design..." series:

How not to design... containers

A simple codec container

Do not rely on a magic word. Take a look at the BBC Dirac container syntax:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'B'      |      'B'      |      'C'      |      'D'      | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Parse code    |  Next parse offset                            | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |  Last parse offset                            | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |  Unit data...                                 | 12-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Each packet starts with a 32-bit sequence BBDC, then a parse code which indicates the type of the unit, then two 32-bit DWORDS which are supposed to be the global offset of the current/last unit within the file, and finally that's followed by the unit data, which may be colorspace info or actual codec packets containing pixels.

On the surface its a simple container and you could write a bad parser in just a few lines and as many minutes. And here's a short list of what's wrong with it:

No size field. The only way to know how big a unit is would be to get the difference between the next and current unit's offset.
Fast and accurate seeking not possible in files over 4GB. Both offsets may overflow so seeking using fseek and a known frame numer means you have to parse every single unit.
Unsuitable for streaming. Both offsets make no sense for a stream as there's no start nor end known on the client-side.
A single-bit error in all but 8 bits of the 108-bit header will break a simple demuxer, while an error in the 8-bit leftover (parse code) will break decoding. A badly written demuxer will go into a loop with a near-0 chance of recovery, while a better one will look at each incoming byte for BBCD.
For a stream, there's barely 32 bits of usable entropy to find the next unit. While some sanity checks could be done to the offsets and parse code, a demuxer which accepts arbitrary input can't do them.
BBCD is not an uncommon sequence to see in compressed data, or in fact any data with a high amount of entropy. Which means you have a non-insignificant chance of parsing a wrong unit when there wasn't meant to be one.

Combined, all those issues make parsing not very robust and not very quick. A reliable parser can only be written for a very limited subset of the container.

A more robust simple codec container

"Let's just put a CRC on the whole data!". Hence, let's take a look at OGG:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'O'      |      'g'      |      'g'      |      'S'      | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version       | header_type   | granule_position              | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | bitstream_serial_number       | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | page_sequence_number          | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | CRC_checksum                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | page_segments | segment_table | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...                                                           | 28-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Here, the 32-bit standard CRC_checksum covers the entire (possibly segmented) packet. When computing it, its just taken as 0x0 and replaced later.

A single bit error anywhere will desync demuxing. Which means you have to look for the OggS sequence at every byte position in a compressed stream.
You could ignore the CRC and let the packet through to the decoder, but there's no guarantee the codec won't output garbage, or fail to decode (or produce valid garbage, since any sequence of random bytes is a valid Opus packet).
If you don't ignore the CRC, you'll have to look at every byte position for the magic OggS and then at every match, perform a CRC which may be as long as 65k. This isn't fast nor good for battery consumption or memory.
The way chaining works in internet radio is to literally append the next file onto the stream, which causes a desync. So you have a mandatory desync every few minutes.
Seeking will also result in a desync and require you to search.
There's still only 32-bits of entropy in the magic sequence, and worse than Dirac, not many ways to check whether a packet is valid based on the header, since new streams may be created or simply dropped at any point.

There's a ton more I'm not covering here on what Ogg did disasterously wrong. Its a story for another time.

My point is that while magic sequences and CRCs are convenient, they're not robust, especially done on a per-packet level.

misc · containers ·

Lynne's compiled musings

20-09-30