With the standardization of the Vulkan video decoding extensions less than a month ago, two codecs were
defined: H.264 and H.265. While those have cemented their position in multimedia, a newer codec
called AV1 has appeared - one whose standardization I was involved in. Not entirely satisfied
with the pace of Khronos, nor with VAAPI's lack of synchronization, Dave Airlie and I decided to
make our own extension to support AV1 decoding.
Khronos granted us an official dedicated stable extension number,
510, to avoid incompatibilities.
The extension follows the same style as the other two decoder extensions, but with some differences.
Unlike the MPEG codecs, AV1 lacks the overcomplicated multiple-NALU structure. A single sequence
header unit is all that's needed to begin decoding frames, and each frame is prefixed by a (quite large)
frame header. Thus, the video session parameters were delegated to the only piece of header that
may (or may not) be common amongst multiple frames - the sequence header. The frame header is supplied
separately, via each frame's decoding parameters.
AV1 supports film grain insertion, which creates a challenge for hardware decoders: the film grain must be applied to the decode output only, and must be absent from the reconstructed output used for references. In my previous article about Vulkan decoding, I mentioned the three possible modes for decoding reference buffers:
- in-place (output frames are also used as references)
- out-of-place (output frames are separate from references)
- out-of-place layered (output frames are separate from references, which are in a single multi-layered image)
The first option cannot be combined with film grain, as the output images would have film grain applied. But we
still want to use it if the hardware supports it and no film grain is enabled. So, we require
that if the user signals that a frame has film grain enabled, the output image view MUST be different
from the reference image view. To accomplish that, we use a pool of decoding buffers only for frames with
film grain, which requires that at least
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR is set for the video profile.
Devices that support in-place decoding also support out-of-place decoding, which makes this method compatible
with future hardware.
This allows us to handle cases where film grain is switched on for a frame, and then switched off,
without wasting memory on separate reference buffers. This also allows for external film grain application
during presentation, such as with libplacebo.
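The output-selection logic described above can be sketched in a few lines of C. This is an illustrative model, not code from the extension: the `FramePool` and `pick_output` names are invented for the example, and the real implementation works with Vulkan image views rather than plain structs.

```c
#include <stdbool.h>

/* Hypothetical sketch: picking the decode output image for a frame. */
typedef struct Image { int id; } Image;

typedef struct FramePool {
    Image dpb[8];   /* reference (DPB) images                 */
    Image grain[8]; /* separate output pool, film grain cases */
    bool  in_place; /* hardware can output directly to the DPB */
} FramePool;

/* If film grain is enabled, the output view MUST differ from the
 * reference view, so we always draw from the dedicated pool.
 * Otherwise, reuse the DPB image whenever the hardware allows it. */
static Image *pick_output(FramePool *p, int slot, bool film_grain)
{
    if (film_grain)
        return &p->grain[slot];
    return p->in_place ? &p->dpb[slot] : &p->grain[slot];
}
```

The key property is that the dedicated pool is only touched for film-grain frames, so streams that never enable film grain pay no extra memory cost on in-place hardware.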
Another difference in how references are handled between AV1 and the MPEG codecs is that a single frame can overwrite multiple reference slots. Rather than naively doing copies, we instead let the higher-level decoder (not the hardware accelerator, which is what the extension is) handle this by reference-counting the frame. This does require that the hardware supports reusing the same frame in multiple slots, which, to our knowledge, all AV1 hardware accelerators do.
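A minimal sketch of that reference-counting scheme, assuming AV1's eight reference slots (the `Frame`, `DPB`, and `assign_slots` names are mine, not FFmpeg's):

```c
#include <stddef.h>

/* One decoded frame can occupy several of AV1's 8 reference slots,
 * so the decoder reference-counts it instead of copying it. */
typedef struct Frame {
    int refcount;
} Frame;

static Frame *ref_frame(Frame *f) { f->refcount++; return f; }

static void unref_frame(Frame **f)
{
    if (--(*f)->refcount == 0) {
        /* the image and its memory would be released here */
    }
    *f = NULL;
}

typedef struct DPB {
    Frame *slot[8];
} DPB;

/* A single frame header may overwrite any subset of the 8 slots,
 * signalled here as a bitmask. */
static void assign_slots(DPB *dpb, Frame *f, unsigned slot_mask)
{
    for (int i = 0; i < 8; i++) {
        if (!(slot_mask & (1u << i)))
            continue;
        if (dpb->slot[i])
            unref_frame(&dpb->slot[i]);
        dpb->slot[i] = ref_frame(f);
    }
}
```

The frame's backing image is only freed once every slot referencing it has been overwritten and the decoder itself has dropped its own reference.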
Finally, the biggest issue was with the hardware itself. AMD's hardware decoder expects a unique 8-bit ID to be assigned to each frame. This was no problem for index-based APIs such as VAAPI, VDPAU, DXVA2, NVDEC, and practically all other decoding APIs.
Vulkan, however, is not index-based. Frames do not have indices - the API is much lower level, working with bare device addresses. Users are free to alias an address and use it as another frame, which immediately breaks the uniqueness of any index.
To work around this hardware limitation, we had no choice but to create a made-up frame ID. Writing the code was difficult, as it was a huge hack in what was otherwise a straightforward implementation.
AV1's frame header does feature frame IDs; however, those are completely optional, and most encoders skip them (with good reason - the frame header is already needlessly large).
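One way such made-up IDs can be managed is with a small occupancy bitmap; this is only a sketch of the general idea, not the actual code from either driver or FFmpeg:

```c
#include <stdint.h>

/* Since Vulkan has no frame indices, the decoder invents unique
 * 8-bit IDs itself. A 256-bit bitmap tracks which are in flight. */
typedef struct IDPool {
    uint64_t used[4]; /* one bit per possible 8-bit ID */
} IDPool;

/* Returns 0-255 on success, -1 if all 256 IDs are in flight. */
static int alloc_frame_id(IDPool *p)
{
    for (int i = 0; i < 256; i++) {
        if (!(p->used[i / 64] & (1ull << (i % 64)))) {
            p->used[i / 64] |= 1ull << (i % 64);
            return i;
        }
    }
    return -1;
}

/* Called once the frame is no longer referenced anywhere. */
static void free_frame_id(IDPool *p, int id)
{
    p->used[id / 64] &= ~(1ull << (id % 64));
}
```

The awkward part is not the allocation itself but deciding when an ID can be released, since Vulkan lets users alias memory freely and never tells the driver when a "frame" ceases to exist.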
While it's possible for the extension to become official, that requires maintaining it and sitting through meetings, which neither of us has the time for. Instead, we hope this extension becomes a starting point for an official version, with all the discussion points highlighted. The official extension probably wouldn't look very different. It's possible to build it in other ways, but doing so would be inefficient, and probably unimplementable - we were able to fit our extension into the same model as all other AV1 hardware accelerators available in FFmpeg.
The extension is very likely to change as we receive feedback from hardware vendors (if we get any at all - we'd certainly like to know why AMD designed theirs the way they did).
The code can be found here:
- My FFmpeg `vulkan` branch - https://github.com/cyanreg/FFmpeg/tree/vulkan
- Dave Airlie's `radv-vulkan-video-decode-mesa-av1` branch - https://gitlab.freedesktop.org/airlied/mesa/-/commits/radv-vulkan-video-decode-mesa-av1
Additionally, you can read Dave's blog post here - https://airlied.blogspot.com/2023/01/vulkan-video-decoding-av1-yes-av1.html.
It was nice seeing all the areas I worked on in AV1 actually get used in practice. As for where this goes, well, saying that we might be getting access to AMD's 7000-series soon would be more than enough :)
Update: while looking through Intel's drivers and documentation for AV1 support, Dave Airlie discovered that the hardware only supports decoding a single tile group per command buffer. This is incompatible with our extension, and with the way Vulkan video decoding currently operates on a whole-frame basis. Hopefully a solution exists that does not involve extensive driver-side bitstream modifications. Also, in AMD's case, it may be possible to hide the frame index in the driver-side VkImage structure; however, the hardware seems to expect an ID based on the frame structure, which may make this impossible.