The never-ending story: GStreamer and hardware integration – Sebastian Dröge

Integration of hardware accelerators in GStreamer has always been a difficult topic. In GStreamer 1.x everything should be possible, but not all of it is implemented yet. There has also been no high-level overview documentation – this presentation fixes that. In 1.x, special memory (shared with the accelerator) can be allocated with GstMemory and GstAllocator – no more subclassing of GstBuffer. See also http://coaxion.net/blog
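A minimal sketch (not from the talk) of this allocation path, using the default allocator where a hardware plugin would register and look up its own:

```c
#include <gst/gst.h>

/* Sketch: allocate memory through a GstAllocator and wrap it in a
 * GstBuffer.  A hardware plugin would find its own registered
 * allocator instead of the default one. */
static GstBuffer *
make_buffer (gsize size)
{
  GstAllocator *allocator = gst_allocator_find (NULL); /* default allocator */
  GstMemory *mem = gst_allocator_alloc (allocator, size, NULL);
  GstBuffer *buffer = gst_buffer_new ();

  gst_buffer_append_memory (buffer, mem); /* buffer takes ownership of mem */
  gst_object_unref (allocator);
  return buffer;
}
```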

GstMemory knows its allocator, but several allocators may allocate the same memory type. Some hardware memory needs special access methods for reading and writing; this is handled by subclassing GstMemory. For GL textures, for instance, access is delegated to a GL thread (inconvenient, but it works just fine). If all you need is aligned, padded or physically contiguous memory, the allocation parameters handle those cases, so no subclass is needed.
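A minimal sketch of the no-subclass case; the alignment and padding values are made-up examples:

```c
#include <gst/gst.h>
#include <string.h>

GstAllocationParams params;
GstMapInfo info;
GstMemory *mem;

gst_allocation_params_init (&params);
params.align = 15;   /* bitmask: align + 1 = 16-byte alignment */
params.padding = 64; /* 64 bytes of padding after the data */

mem = gst_allocator_alloc (NULL, 4096, &params); /* NULL = default allocator */

/* CPU access always goes through the map API, which is how special
 * memory types can hook in their own access methods. */
if (gst_memory_map (mem, &info, GST_MAP_WRITE)) {
  memset (info.data, 0, info.size);
  gst_memory_unmap (mem, &info);
}
```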

A buffer is now a list of GstMemory objects holding the data, plus a list of metadata objects. Because it is a list of GstMemory, you can for instance store different video planes at separate addresses. The data is interpreted according to the last seen CAPS event, so caps are no longer part of the buffer. For the common case of managing a pool of buffers, GstBufferPool provides a base class with the standard functionality, which can be overridden. If you only need to pre-allocate buffers or use a specific allocator, configuring a GstBufferPool object is enough and no subclass is needed. For video, GstVideoBufferPool additionally takes care of per-plane padding, allocating according to the caps, and so on.
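A sketch of the configure-only case; the pool sizes are illustrative assumptions:

```c
#include <gst/gst.h>
#include <gst/video/video.h>

/* Sketch: configure (rather than subclass) a video buffer pool. */
static GstBufferPool *
setup_pool (GstCaps *caps, guint size)
{
  GstBufferPool *pool = gst_video_buffer_pool_new ();
  GstStructure *config = gst_buffer_pool_get_config (pool);

  /* keep 4 buffers pre-allocated, allow at most 8 */
  gst_buffer_pool_config_set_params (config, caps, size, 4, 8);
  gst_buffer_pool_set_config (pool, config);
  gst_buffer_pool_set_active (pool, TRUE);
  return pool;
}
```

Acquiring a buffer is then a single gst_buffer_pool_acquire_buffer() call.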

Buffer metadata can be used to delegate processing to the sink, which can often do it much more efficiently; cropping, for example, can be done by the compositor. Standard GstMeta subclasses exist to pass this kind of information. For GL textures there is dedicated metadata to keep track of the mapping of buffer data into GL handles. Which metadata will be passed is negotiated with a query: if the downstream elements don't support cropping, the upstream element does the cropping itself after all and no cropping metadata is sent.
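For illustration, attaching crop metadata could look like this sketch, assuming an existing GstBuffer *buffer and made-up crop coordinates:

```c
#include <gst/video/video.h>

/* Sketch: ask downstream (e.g. the compositor) to crop for us. */
GstVideoCropMeta *crop = gst_buffer_add_video_crop_meta (buffer);
crop->x = 16;       /* made-up values */
crop->y = 16;
crop->width = 1280;
crop->height = 720;
```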

Negotiation: first, the upstream element negotiates caps with a CAPS query and a CAPS event. Then, an ALLOCATION query is sent to allocate buffer space. In 1.2, GstCapsFeatures are added, which put additional constraints on caps, e.g. expressing that the raw video should use EGLImage memory. Metadata can be constrained in the same way. By doing this in caps negotiation, the sink can express preferences: a GL sink prefers EGLImage memory but can also deal with other memory by copying. Elements in the middle of the pipeline can update the caps list to express their own preferences. In 1.x, the query itself can already express some constraints, e.g. the source can exclude grayscale if it knows that its stream is colour.
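A sketch of such caps built with GstCapsFeatures (1.2 API); the format and sizes are illustrative, and "memory:EGLImage" is the feature name used by the EGL plugins:

```c
#include <gst/gst.h>

GstCaps *caps = gst_caps_new_simple ("video/x-raw",
    "format", G_TYPE_STRING, "RGBA",
    "width", G_TYPE_INT, 1920,
    "height", G_TYPE_INT, 1080,
    NULL);

/* Constrain the raw video to EGLImage memory. */
gst_caps_set_features (caps, 0,
    gst_caps_features_new ("memory:EGLImage", NULL));
```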

The ALLOCATION query is a second step that uses the previously negotiated caps, though not exactly: if, for example, the image will be cropped downstream via crop metadata, the stream caps describe the smaller cropped size while the allocation must cover the full image. The sink proposes a list of allocators and features, upstream elements can filter this list according to their requirements, and the element that fills the buffers (e.g. the decoder) calls the resulting allocator and configures its buffer pool.
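A sketch of the sink's side of this step, assuming a pool and buffer size the sink has already set up: it offers its pool and advertises crop-meta support in its answer to the ALLOCATION query:

```c
#include <gst/gst.h>
#include <gst/video/video.h>

/* Sketch of what a sink might do when answering an ALLOCATION query
 * (e.g. in its propose_allocation vfunc). */
static gboolean
answer_allocation_query (GstQuery *query, GstBufferPool *pool, guint size)
{
  /* offer a pool: at least 4 buffers, no maximum */
  gst_query_add_allocation_pool (query, pool, size, 4, 0);
  /* tell upstream that cropping can be delegated to us */
  gst_query_add_allocation_meta (query, GST_VIDEO_CROP_META_API_TYPE, NULL);
  return TRUE;
}
```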

With hardware acceleration, different elements usually need to share some kind of context, for instance the GL context. For this, GstContext queries have been added, and for global contexts there are messages, so individual elements don't need to worry about context handling themselves. For instance, an element that does a GL operation queries the EGLDisplay context from upstream and downstream; the sink can provide this. If no context is found, the element creates one and posts a message so the bin and other elements can get access to it as well.
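A sketch of that flow from a single element's point of view (1.2 API); the context type string is an assumption modelled on the EGL plugins:

```c
#include <gst/gst.h>

/* Sketch: obtain or create a shared display context. */
static void
ensure_display_context (GstElement *element)
{
  GstContext *context;

  /* Ask around first: the bin/application can answer this message
   * (context queries to upstream/downstream peers work similarly). */
  gst_element_post_message (element,
      gst_message_new_need_context (GST_OBJECT_CAST (element),
          "gst.egl.EGLDisplay"));

  /* If nobody provided one, create a persistent context, use it and
   * announce it so other elements can pick it up. */
  context = gst_context_new ("gst.egl.EGLDisplay", TRUE);
  /* ... store the actual EGLDisplay in the context's structure ... */
  gst_element_set_context (element, context);
  gst_element_post_message (element,
      gst_message_new_have_context (GST_OBJECT_CAST (element), context));
}
```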

Things not solved:

Reconfiguring a memory provider. This usually requires collecting all outstanding memories before the new ones can be allocated. The solution is to tell everybody to release their memory, but this may be problematic with third-party libraries.

A device probing API is missing. If an application needs a camera, for example, there is currently no GStreamer way to query the available cameras. The solution will be based on GstPluginFeature objects, which allow enumerating e.g. the cameras visible to v4l2src.

Current status:

gst-vaapi works completely transparently. gst-omx and v4l2 can work zero-copy. On the Raspberry Pi it is possible to decode HD video to EGLImages in real time. gst-plugins-gl has a solution for all threading problems and interoperates with non-GL elements by falling back to copying.
