Khronos Finalizes Vulkan Video Extensions for Accelerated H.264 and H.265 Decode

December 19, 2022 by Tony Zlatinski, NVIDIA

In April 2021, the Vulkan® Working Group at Khronos® released a set of provisional extensions, collectively called "Vulkan Video", for seamlessly integrating hardware-accelerated video compression and decompression into the Vulkan API. Today, Khronos is releasing finalized extensions that incorporate industry feedback and expose core and decode Vulkan Video functionality to provide fully accelerated H.264 and H.265 decode.

Khronos will release an ongoing series of Vulkan Video extensions to enable additional codecs and accelerated encode as well as decode. This blog is a general overview of the Vulkan Video architecture and also provides details about the finalized extensions and links to important resources to help you create your first Vulkan Video applications.

Introduction

Vulkan Video adheres to the Vulkan philosophy of giving applications flexible, fine-grained control over scheduling, synchronization, and memory allocation. By leveraging the existing Vulkan API framework, the Vulkan Video extensions enable efficient, low-latency, low-overhead use of processing resources for accelerated video processing, including distributing stream processing tasks across multiple CPU cores and video codec hardware. Applications remain portable across platforms and devices ranging from small embedded devices to high-performance servers, on both Windows and Linux.

Vulkan Video Extensions Overview

GPUs typically contain dedicated video decode and encode acceleration engine(s) that are independent from other graphics and compute engines. In fact, some physical devices may support only video decode and/or video encode acceleration. Consequently, Vulkan Video adds video decode and encode queues to support these video operations.

Also, the field of video codecs is always changing. More advanced and domain-specific video coding tools make it easier to compress and decompress video, which leads to new codecs and codec extensions. So, Vulkan Video was designed to be flexible enough to support a wide range of existing and future codecs. It does this by including both universal "core" extensions that should be useful for all codecs and codec-specific extensions that are only useful for one codec type. Figure 2 depicts Vulkan Video extensions along with their status and relations.

This release of Vulkan Video finalizes the following extensions:

VK_KHR_video_queue: common APIs for all video coding operations.
VK_KHR_video_decode_queue: common APIs for all video decode operations.
VK_KHR_video_decode_h264: H.264 decode-specific capabilities and parameters (promoted from EXT to KHR in this final release).
VK_KHR_video_decode_h265: H.265 decode-specific capabilities and parameters (promoted from EXT to KHR in this final release).

Collectively, the above extensions allow exposing H.264 and H.265 video decode acceleration through Vulkan.

The following encode-related extensions remain provisional and are expected to be finalized in 2023 along with new extensions to be introduced for VP9 decode and AV1 decode/encode operations:

VK_KHR_video_encode_queue: common APIs for all video encode operations.
VK_EXT_video_encode_h264: H.264 encode-specific capabilities and parameters.
VK_EXT_video_encode_h265: H.265 encode-specific capabilities and parameters.

As an example, a Vulkan Video implementation that only supports H.264 decoding would only expose support for the VK_KHR_video_queue, VK_KHR_video_decode_queue, and VK_KHR_video_decode_h264 extensions, and an application would use all three extensions together to perform H.264 decode operations on that target device.
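
The check described above can be sketched in a few lines of C. This is a minimal illustration, not production code: the helper names are hypothetical, and the list of available extension names is assumed to have been gathered already (for instance from the device's reported extension properties).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The three extension names an H.264 decode application needs,
 * taken from the list in this post. */
static const char *const kH264DecodeExtensions[] = {
    "VK_KHR_video_queue",
    "VK_KHR_video_decode_queue",
    "VK_KHR_video_decode_h264",
};

/* Returns true if `name` appears in the `available` list. */
static bool has_extension(const char *const *available, size_t count,
                          const char *name)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(available[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: true only if all three extensions are present. */
bool supports_h264_decode_extensions(const char *const *available,
                                     size_t count)
{
    assert(available != NULL || count == 0);
    size_t required =
        sizeof kH264DecodeExtensions / sizeof kH264DecodeExtensions[0];
    for (size_t i = 0; i < required; ++i) {
        if (!has_extension(available, count, kH264DecodeExtensions[i])) {
            return false;
        }
    }
    return true;
}
```

An application would typically run a check like this once per physical device before enabling the extensions at device creation.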

The standard vkGetPhysicalDeviceQueueFamilyProperties2 API may be used to determine support for codec operations, such as H.264 decode, by chaining VkQueueFamilyVideoPropertiesKHR to retrieve VkVideoCodecOperationFlagsKHR.
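
Once the per-family codec operation flags have been retrieved, picking a queue family is a simple scan. The sketch below deliberately avoids the Vulkan headers so that it stays self-contained: the SketchQueueFamily struct and the SKETCH_* constants are hypothetical stand-ins for the queue family properties and flag bits a real application would read from the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag bits (NOT the real Vulkan values); a real application
 * would use the enumerants from the Vulkan headers instead. */
enum {
    SKETCH_QUEUE_GRAPHICS = 1,
    SKETCH_QUEUE_VIDEO_DECODE = 2
};
enum {
    SKETCH_CODEC_OP_DECODE_H264 = 1,
    SKETCH_CODEC_OP_DECODE_H265 = 2
};

/* Stand-in for what the queue family query reports per family. */
typedef struct {
    uint32_t queue_flags;     /* queue capability bits */
    uint32_t video_codec_ops; /* supported codec operation bits */
} SketchQueueFamily;

/* Returns the index of the first decode-capable family that supports
 * every operation in `wanted_ops`, or -1 if there is none. */
int find_decode_queue_family(const SketchQueueFamily *families, size_t count,
                             uint32_t wanted_ops)
{
    assert(families != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int is_decode =
            (families[i].queue_flags & SKETCH_QUEUE_VIDEO_DECODE) != 0;
        int has_ops = (families[i].video_codec_ops & wanted_ops) == wanted_ops;
        if (is_decode && has_ops) {
            return (int)i;
        }
    }
    return -1;
}
```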

Vulkan Video Codec-Specific Headers

Video coding experts often analyze video bitstreams to investigate coding artifacts and improve video quality using codec-specific syntax elements in the bitstream. Such syntax elements are documented in the codec specification, which defines behavioral descriptions of syntax and tools for a given codec. Vulkan Video makes it easy to recognize API fields corresponding to codec syntax elements or codec-defined terms, without bloating the Vulkan specification with descriptions already well documented in the codec standard specifications.

Codec-specific standard ("Std") header files define structures with explicit and derived codec syntax fields in the naming and style conventions of the corresponding codec standard specification. These Std structures are used as fields in Vulkan Video codec-specific extension structures. This final release provides the following codec Std headers:

vulkan_video_codec_h264std.h: defines structures and types shared by H.264 decode and encode operations.
vulkan_video_codec_h264std_decode.h: defines structures used only by H.264 decode operations.
vulkan_video_codec_h265std.h: defines structures and types shared by H.265 decode and encode operations.
vulkan_video_codec_h265std_decode.h: defines structures used only by H.265 decode operations.
vulkan_video_codecs_common.h: defines a versioning macro used by other Std headers for version maintenance.

The following encode-specific Std headers remain provisional:

vulkan_video_codec_h264std_encode.h: defines structures used only by H.264 encode operations.
vulkan_video_codec_h265std_encode.h: defines structures used only by H.265 encode operations.

Video Transcoding Example

Video transcoding is often used to transform video from an older codec to a newer one so that the compression efficiency is improved. It may also be used to convert content to a codec more appropriate for efficient consumption at the target environment. Figure 3 depicts a basic block diagram for video transcoding.

The first phase of video transcoding is decoding an input video bitstream (sequence of bytes) to generate the images that make up the video sequence. Decoding individual images in the bitstream often requires referencing one or more previously decoded images, which must be retained for this purpose in the Decoded Picture Buffer (DPB). Note that some implementations may support using the same image resources for output images and DPB images, while others may require or prefer decoupling decode output images from DPB images, for example by using proprietary layouts and storing metadata along with DPB images while keeping output images in standard layouts for external consumption. Finally, to arrive at the original video sequence, it may be necessary to reorder the output images as instructed by the bitstream.
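
To make the reordering step concrete, here is a minimal sketch that sorts a batch of decoded pictures from decode order into display order. The SketchPicture type and the display_order field are hypothetical; in H.264 and H.265 the ordering value would typically be derived from the picture order count. A real decoder outputs pictures incrementally rather than sorting a complete batch, so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one decoded picture. */
typedef struct {
    int decode_index;  /* position in the bitstream (decode order) */
    int display_order; /* position in the presented sequence */
} SketchPicture;

/* qsort comparator: ascending display order. */
static int by_display_order(const void *a, const void *b)
{
    const SketchPicture *pa = a;
    const SketchPicture *pb = b;
    return (pa->display_order > pb->display_order) -
           (pa->display_order < pb->display_order);
}

/* Reorders `pictures` in place from decode order to display order. */
void reorder_for_display(SketchPicture *pictures, size_t count)
{
    assert(pictures != NULL || count == 0);
    if (count > 1) {
        qsort(pictures, count, sizeof *pictures, by_display_order);
    }
}
```

For example, a stream decoded as I0, P3, B1, B2 (the two B-pictures reference both I0 and P3, so they are decoded last) is presented as I0, B1, B2, P3.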

The second phase of transcoding involves encoding the decoded images with a new codec (or perhaps the same codec with a different set of tools). The encoding process is essentially the reverse of the decoding process: the input is a sequence of images, which may be re-ordered before encoding. It may be necessary to retain "reconstructed" or decoded versions of the images for reference while encoding the following images. Note that in general, input images are not used for reference in the encoding process to avoid drift when decoding the bitstream at the consumer end since encoding is usually a lossy operation. Transcoding applications pipeline decode and encode operations to reduce the number of decode output / encode input images needed while transcoding.

So, how would we implement this transcoding example using Vulkan Video?

Video Resources & Profiles

The first step of a transcoding application is to allocate the necessary resources. The basic resources for video decode and encode operations use standard Vulkan objects:

Video decode input and encode output bitstreams: VkBuffer
Video decode output, encode input, and decode/encode DPB images: VkImageView backed by VkImage

Vulkan Video extends VkBufferUsage, VkImageUsage and VkImageLayout with bits relevant to video decode/encode usage and layouts, which are used by applications to optimally manage video decode and encode resources.

Video codecs typically define "profiles" that advertise the feature set used by a coded bitstream. Codec-compliant hardware decoders often support the full set of profile features, so they can process all compliant content. In contrast, a hardware encoder may support only selected profile features and still generate a compliant bitstream; vendors make this choice based on area and cost considerations while prioritizing key encoding APIs and use cases. The VkVideoProfileInfoKHR structure defines the target video profile:

The video codec operation (e.g. H.265-decode or H.264-encode)
The YCbCr chroma-subsampling and luma/chroma component bit-depths (e.g. 4:2:0, 8-bit luma/chroma), as video codecs operate on YUV images for coding efficiency
The codec-specific video profile (e.g. H.264 Main profile), via a chained structure specific to the codec-operation in use (e.g. for H.264-decode, VkVideoDecodeH264ProfileInfoKHR would be chained)
Optional use case hint information, via a chained VkVideoDecodeUsageInfoKHR structure

Resources intended for video operations may have implementation-specific properties and requirements based on the target video profile. Therefore, applications should specify the target video profile when querying properties or creating various resources (images, buffers, etc.).

The following API call enumerates supported video image formats and properties for a given video codec operation, video profile and intended image usage:

vkGetPhysicalDeviceVideoFormatPropertiesKHR

Video Session

Once resources are allocated, the transcoding application creates a video session. The VkVideoSessionKHR video session object provides a context for storing persistent state while working on a specific video stream. Separate instances of VkVideoSessionKHR may be created to concurrently operate on multiple video streams. The following APIs create and destroy a video session object, query its memory requirements, and bind memory to it:

vkCreateVideoSessionKHR
vkDestroyVideoSessionKHR
vkGetVideoSessionMemoryRequirementsKHR
vkBindVideoSessionMemoryKHR

If the application is to support decoding a video bitstream that dynamically changes resolution (to deal with varying network conditions, for example), the video session should be created with maximum video stream parameters so that sufficient resources are allocated.

An API is provided for the application to query the capabilities of the implementation, including minimum and maximum limits for certain settings:

vkGetPhysicalDeviceVideoCapabilitiesKHR
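
As a rough sketch of how an application might size a session for a stream that switches resolution, the code below takes the largest width and height across the expected resolutions and checks the result against implementation limits. The SketchExtent and SketchVideoCaps types are hypothetical stand-ins, and we assume here that the queried capabilities include minimum and maximum coded picture dimensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 2D extent and the queried limits. */
typedef struct {
    uint32_t width;
    uint32_t height;
} SketchExtent;

typedef struct {
    SketchExtent min_coded_extent;
    SketchExtent max_coded_extent;
} SketchVideoCaps;

/* Largest width and largest height over all expected resolutions. */
SketchExtent session_max_extent(const SketchExtent *resolutions, size_t count)
{
    assert(resolutions != NULL || count == 0);
    SketchExtent needed = {0, 0};
    for (size_t i = 0; i < count; ++i) {
        if (resolutions[i].width > needed.width) {
            needed.width = resolutions[i].width;
        }
        if (resolutions[i].height > needed.height) {
            needed.height = resolutions[i].height;
        }
    }
    return needed;
}

/* True if `e` lies within the implementation's reported limits. */
bool extent_within_caps(const SketchVideoCaps *caps, SketchExtent e)
{
    assert(caps != NULL);
    return e.width >= caps->min_coded_extent.width &&
           e.height >= caps->min_coded_extent.height &&
           e.width <= caps->max_coded_extent.width &&
           e.height <= caps->max_coded_extent.height;
}
```

Note that width and height are maximized independently, so a stream that alternates between landscape and portrait pictures still fits the session.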

Video Session Parameters

Vulkan Video uses VkVideoSessionParametersKHR objects, created against a given VkVideoSessionKHR instance, to store video parameter sets to control stream processing, e.g., to describe settings that apply to one or more pictures within a stream—such as H.264 sequence and picture parameter sets.

The application may create multiple session parameters objects for a given video session. At creation, each object specifies the maximum number of parameter sets of each kind that it is expected to hold. This lets the user add more parameter sets to the same object later, as long as certain conditions are met (for example, the declared capacity is not exceeded). Alternatively, the user may create another session parameters object with more storage capacity and inherit existing parameter sets, retained from a previously created session parameters object. This avoids re-translation of parameter sets through the Vulkan API and enables reusing their internal representations across objects.

The following APIs are provided to create, destroy, and update video session parameters:

vkCreateVideoSessionParametersKHR
vkDestroyVideoSessionParametersKHR
vkUpdateVideoSessionParametersKHR

Currently, the video session parameters object is used to store H.264 SPS and PPS parameter sets, and H.265 VPS, SPS and PPS parameter sets. For decode operations, the application is expected to parse bitstream segments containing these codec headers to create/update session parameter objects as needed.
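
The capacity and inheritance behavior of session parameters objects can be modeled with a small fixed-capacity store. This is a conceptual sketch only: SketchParamStore and its functions are hypothetical, a single integer ID stands in for a parameter set, and the rule that an existing ID cannot be added again is an assumption of this sketch rather than a statement of the API's exact conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_STORE_LIMIT 32

/* Hypothetical model of a session parameters object. */
typedef struct {
    int ids[SKETCH_STORE_LIMIT]; /* IDs of the stored parameter sets */
    size_t count;                /* number of sets currently stored */
    size_t capacity;             /* maximum declared at creation */
} SketchParamStore;

/* Creates a store with `capacity` slots. If `parent` is non-NULL, its
 * parameter sets are inherited; this fails if they do not fit. */
bool param_store_create(SketchParamStore *store, size_t capacity,
                        const SketchParamStore *parent)
{
    assert(store != NULL);
    if (capacity > SKETCH_STORE_LIMIT) {
        return false;
    }
    if (parent != NULL && parent->count > capacity) {
        return false;
    }
    store->capacity = capacity;
    store->count = 0;
    if (parent != NULL) {
        memcpy(store->ids, parent->ids, parent->count * sizeof parent->ids[0]);
        store->count = parent->count;
    }
    return true;
}

/* Adds one parameter set; fails when full or when the ID already exists. */
bool param_store_add(SketchParamStore *store, int id)
{
    assert(store != NULL);
    if (store->count == store->capacity) {
        return false;
    }
    for (size_t i = 0; i < store->count; ++i) {
        if (store->ids[i] == id) {
            return false;
        }
    }
    store->ids[store->count++] = id;
    return true;
}
```

When the first store fills up, creating a larger one with the first as its parent carries the existing sets over without describing them again.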

Video decode hardware acceleration is typically needed only for the bitstream segments related to images/pictures or their sub-regions, while segments related to parameter sets are designed for simple CPU-based decoding or parsing. Parameter sets are also designed to efficiently communicate resource requirements for decoding the video bitstream ahead of time, and to determine whether the hardware decoder supports decoding the actual bitstreams or not.
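
For H.264, the split between CPU-parsed parameter sets and hardware-decoded picture data can be made by inspecting the NAL unit header. In the H.264 specification, the low five bits of the first header byte carry nal_unit_type, where 7 is a sequence parameter set, 8 is a picture parameter set, and 1 and 5 are coded slices of non-IDR and IDR pictures. The sketch below classifies a NAL unit on that basis; the type and function names are hypothetical, and start-code scanning is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification used by the application's parser. */
typedef enum {
    SKETCH_NAL_PARAMETER_SET, /* parse on the CPU, store in session parameters */
    SKETCH_NAL_SLICE,         /* picture data for the hardware decoder */
    SKETCH_NAL_OTHER          /* anything else (SEI, delimiters, ...) */
} SketchNalKind;

/* Classifies an H.264 NAL unit from its first byte (start code removed). */
SketchNalKind classify_h264_nal(const uint8_t *nal, size_t size)
{
    assert(nal != NULL && size > 0);
    uint8_t nal_unit_type = nal[0] & 0x1F; /* low five bits of the header */
    switch (nal_unit_type) {
    case 7: /* sequence parameter set */
    case 8: /* picture parameter set */
        return SKETCH_NAL_PARAMETER_SET;
    case 1: /* coded slice of a non-IDR picture */
    case 5: /* coded slice of an IDR picture */
        return SKETCH_NAL_SLICE;
    default:
        return SKETCH_NAL_OTHER;
    }
}
```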

In addition to accelerating the decoding of pictures or sub-regions, implementations may use different techniques to get around bitstream errors, such as those caused by corruption during unreliable network transmission. It may also be necessary to store statistics or states related to prior decoding to aid in decoding current/future pictures/sub-pictures in the video sequence. Typically, an application will use Vulkan Video for the heavy lifting of picture-level decoding, while handling parsing, resource management and synchronization internally.

Video Decode Operation Command

Now, it is finally time to record the video decode operation into a Vulkan command buffer using:

vkCmdDecodeVideoKHR

This is the only API call provided in the VK_KHR_video_decode_queue extension. Before being sent to the GPU, command buffers and bitstream data for the video device are processed in memory.

Currently, only picture-level decode commands are supported, as specified by the appropriate codec-specific extension structures for decode operations (for example, VkVideoDecodeH264PictureInfoKHR). We are aware of interest in finer-grained operations, such as H.264/H.265 slice-based decoding, and will do our best to expose such flexibility by working with IHVs, interested Vulkan Video TSG members and ecosystem participants.

Video Encoding Process

Now that we have the decoded images, we are ready to encode. Encoding involves similar detailed tasks to decoding but with considerably more decision points (Figure 5). At the sequence level, the application can configure the target bitrate for the generated bitstream. Implementations use their own algorithms to figure out how complicated a picture is and how many bits should be used across pictures and in different parts of each picture. Commonly known as "rate control", this feature also necessitates storing statistics and state that may be utilized while encoding future pictures of the sequence.
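
To give a feel for the arithmetic behind rate control, the sketch below derives an average per-picture bit budget from a target bitrate and frame rate, and carries unspent or overspent bits forward. This is a deliberately naive illustration with hypothetical function names; as noted above, implementations use their own algorithms, which are considerably more sophisticated.

```c
#include <assert.h>
#include <stdint.h>

/* Average bits available per picture for a target bitrate (bits/second)
 * and a frame rate given as the fraction fps_num / fps_den. */
uint64_t average_bits_per_picture(uint64_t bitrate_bps, uint32_t fps_num,
                                  uint32_t fps_den)
{
    assert(fps_num > 0 && fps_den > 0);
    return bitrate_bps * fps_den / fps_num;
}

/* Budget for the next picture: the cumulative allowance for all pictures
 * up to and including the next one, minus the bits spent so far. */
uint64_t next_picture_budget(uint64_t average_bits, uint64_t pictures_coded,
                             uint64_t bits_spent)
{
    uint64_t allowed = average_bits * (pictures_coded + 1);
    return allowed > bits_spent ? allowed - bits_spent : 0;
}
```

At 6 Mbit/s and 30 frames per second the average is 200,000 bits per picture. If the first two pictures consumed 500,000 bits in total, the third gets 100,000 bits in this simple model.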

As part of the encoding process, decisions must also be made regarding which codec tools to use when encoding each picture or sub-picture region, and which other pictures should be referenced while encoding. Decisions may even be applied at the lowest-level coding units (e.g. 16x16 pixel blocks) for which bitstream syntax may be specified (as defined by the codec). In addition, to generate the complete elementary video bitstream, the correct parameter sets must be coded along with the bitstreams for pictures or sub-picture regions.

Encoder implementations may differ in terms of the codec tools they offer and how much control they give the user. Similarly, user expectations vary significantly for encode. Some applications prefer low-level, precise control, while others prefer high-level settings that make the most of detailed encoding decisions behind the scenes. Some advanced users may want more control over the low-level encoding process so they can make optimizations to the application that are specific to their domain.

The result of balancing these requirements is Vulkan Video. It has a low-level API to give advanced users the flexibility they need and it uses tools and layers to hide complexity from applications that prefer a higher-level API. Vulkan Video lets vendors add extensions that allow for vendor-specific controls. If there is cross-vendor support, these controls could be made standard.

Figure 5 also illustrates some of the additional Vulkan Video commands and queries introduced, which are described next.

Video Encode Operation Commands

Now we are ready to start the encoding process by recording the video encode operation into a Vulkan command buffer:

vkCmdEncodeVideoKHR

This is the only API call provided in the VK_KHR_video_encode_queue extension.

All decisions about picture and reference management are left up to the application. The application also has direct control over the reference management bitstream syntax. In addition, the application may optionally request the generation of H.264 SPS/PPS bitstream segments by the implementation (see VkVideoEncodeH264EmitPictureParametersEXT). This provides a path for implementations to generate a complete elementary bitstream if needed.

The following API is used to record encoder rate control settings into a Vulkan command buffer:

vkCmdControlVideoCodingKHR

Note that these settings take effect in the execution timeline (i.e., at queue submission). This API also lets you reset the video session to its original state, which is required before using a session for the first time and is recommended when using the same session to process a new video stream. This generic API hook enables future extensions for other stream-level control operations.

Video Command Buffer Context

As a number of decode or encode operations may be recorded in the same command buffer, all relying on the same set of resources and settings, Vulkan Video defines a pair of API calls to mark the scope of video command control parameters during a session:

vkCmdBeginVideoCodingKHR
vkCmdEndVideoCodingKHR

vkCmdBeginVideoCodingKHR sets up the context for video operations on a single video stream in the command buffer. The VkVideoSessionKHR object is provided at this point, along with the VkVideoSessionParametersKHR object containing parameter sets for use in all subsequent video decode or encode operations until the end of scope. One or more vkCmd*Video*KHR commands are expected after this, specifying the actual decode/encode operation(s) and/or video control operation(s). Standard Vulkan commands for synchronization, layout transition, etc. may also be present along with the video commands. vkCmdEndVideoCodingKHR ends the scope of video operations.

Multiple sets of video commands can be recorded into the same command buffer, separated by the vkCmdBeginVideoCodingKHR and vkCmdEndVideoCodingKHR commands. Each set can use the same or a different video session and video session parameter object. You can also use a video session parameters object with its matching video session object in multiple command buffer recording calls. This lets you record into multiple command buffers at the same time.
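
The scoping rules above can be summarized as a small state machine. The following sketch validates a recorded sequence for a single video session: coding and control commands must sit inside a begin/end scope, scopes must not nest or be left open, and the session must have been reset before its first decode. The SketchCmd enum and the validator are hypothetical and model only these rules, not the full set of requirements in the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for the commands discussed in this post. */
typedef enum {
    SKETCH_CMD_BEGIN,         /* vkCmdBeginVideoCodingKHR */
    SKETCH_CMD_CONTROL_RESET, /* vkCmdControlVideoCodingKHR requesting a reset */
    SKETCH_CMD_DECODE,        /* vkCmdDecodeVideoKHR */
    SKETCH_CMD_END            /* vkCmdEndVideoCodingKHR */
} SketchCmd;

/* Returns true if the sequence respects the scoping rules for one session.
 * `already_reset` says whether the session was reset by earlier work. */
bool validate_video_commands(const SketchCmd *cmds, size_t count,
                             bool already_reset)
{
    assert(cmds != NULL || count == 0);
    bool in_scope = false;
    bool reset = already_reset;
    for (size_t i = 0; i < count; ++i) {
        switch (cmds[i]) {
        case SKETCH_CMD_BEGIN:
            if (in_scope) return false; /* scopes do not nest */
            in_scope = true;
            break;
        case SKETCH_CMD_CONTROL_RESET:
            if (!in_scope) return false;
            reset = true;
            break;
        case SKETCH_CMD_DECODE:
            if (!in_scope || !reset) return false;
            break;
        case SKETCH_CMD_END:
            if (!in_scope) return false;
            in_scope = false;
            break;
        }
    }
    return !in_scope; /* every scope must be closed */
}
```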

Video Queries

Vulkan Video adds a new mandatory VkQueryType to report the location and size of the encoded bitstream in the output buffer (see VK_QUERY_TYPE_VIDEO_ENCODE_BITSTREAM_BUFFER_RANGE_KHR).

The vkCmdBeginQuery and vkCmdEndQuery commands now have an optional result status query type that can be used to find out the status of a set of operations. This result status may be reported by itself using the VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR query type, or in conjunction with another query type using VK_QUERY_RESULT_WITH_STATUS_BIT_KHR. The result status is not specific to video operations, and may be used to report errors during execution of any Vulkan commands that require additional investigation. For video operations, implementations may report error status when decoding syntax errors are encountered, or when the encode-bitstream buffer overflows.
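
On the host side, an application needs to interpret the result status it reads back. As we understand the specification, the status is a signed 32-bit value where negative values indicate an error, zero means the result is not ready yet, and positive values indicate successful completion. The sketch below encodes that convention with hypothetical names; consult the specification for the exact enumerants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side view of a result status value. */
typedef enum {
    SKETCH_STATUS_ERROR,
    SKETCH_STATUS_NOT_READY,
    SKETCH_STATUS_COMPLETE
} SketchStatus;

/* Maps a raw status to one of three outcomes, assuming the sign
 * convention described above. */
SketchStatus classify_result_status(int32_t raw)
{
    if (raw < 0) {
        return SKETCH_STATUS_ERROR;
    }
    return raw == 0 ? SKETCH_STATUS_NOT_READY : SKETCH_STATUS_COMPLETE;
}

/* True only if every operation in the batch completed successfully. */
bool all_operations_complete(const int32_t *statuses, size_t count)
{
    assert(statuses != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (classify_result_status(statuses[i]) != SKETCH_STATUS_COMPLETE) {
            return false;
        }
    }
    return true;
}
```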

As video queries are generally consumed by the host, video queues only support host translation of query results (vkGetQueryPoolResults), and do not support device translation (vkCmdCopyQueryPoolResults). Please let us know if device translation is important for your use case!

This concludes the transcoding example walkthrough! We hope this has given you an idea of how Vulkan Video could add new features to your own products by combining low-level video acceleration with graphics, compute, and display operations in sophisticated Vulkan pipelines.

Call for Action, Feedback & Support!

The finalization and release of these Vulkan Video core & decode extensions marks a significant milestone in the Vulkan ecosystem roadmap, adding fully accelerated H.264 and H.265 decode to this widely available cross-platform GPU API. We encourage developers to utilize these extensions to bring new levels of performance and functionality to their video applications on Windows and Linux.

We welcome you to join Vulkanised 2023 (February 7-9 in Munich), which will include a presentation & live demo about Vulkan Video and much more!

The proposal documents provide a much more detailed and thorough review of the Vulkan Video Core and Decode extensions:

NVIDIA, Intel & AMD are the first IHVs to implement support for these extensions:

NVIDIA: Windows and Linux BETA drivers
Intel: [Updated: May 2, 2023] The Intel Graphics Driver 31.0.101.4314 for Intel Arc Graphics, 11th-13th Gen Intel Core processor graphics, and related Intel Pentium and Celeron processors introduced support for the Vulkan Video extensions for AVC and HEVC decode.
AMD: Windows BETA driver

The official example application for Vulkan Video Decode extensions is the open-source vk_video_decode sample from NVIDIA.

Ecosystem adoption has already started. The very popular GStreamer and FFmpeg video processing frameworks and the RADV and ANV open-source drivers are early adopters of Vulkan Video.

An upcoming release of the Vulkan SDK will include updated Vulkan headers, and Validation Layer support for the video core and decode extensions is planned near the end of January 2023. In the meantime, please find these resources here:

Vulkan Headers
Validation Layer PR

Vulkan Video Core and Decode Final extension specification links:

We also encourage additional feedback on the following Vulkan Video Encode provisional extensions while they are being finalized:

NVIDIA’s open-source Vulkan Video Encode sample vk_video_encode and Vulkan BETA drivers may be used for early evaluation of Vulkan Video Encode support.

We look forward to your feedback on Vulkan Video as you put it to use! Please share your experience and thoughts through the Khronos Vulkan Video GitHub Issue. This issue will also be updated to provide links to Vulkan Video-related resources as they become available.

We also encourage your participation in extending Vulkan Video to support more codecs and features. See khronos.org/members for information about how to join Khronos and participate in the definition of any of our standards.

Thank you for your interest in and support of Vulkan Video. We hope you find it effective for your use cases and applications, and we look forward to supporting your needs with more codecs and features!