I attended the first GStreamer conference in Cambridge on October 26, 2010. Below is my report on the sessions I attended. But first, some general observations.
- As usual, the only women were behind the registration desk. In fact, it struck me that there were even fewer women here than at the typical hacker convention.
- There were about 160 attendees at this first GStreamer conference. Of course, many of them came mainly for ELC, but it’s still quite a lot.
- I saw a large number of Macs (let’s say 20%). That surprises me because GStreamer doesn’t run very well on Mac…
History of GStreamer – Wim Taymans, Collabora
Wim has proven once again that good software developers don’t (necessarily) make good presentations :-).
It looks like we’ll finally move to GStreamer 1.0 next year. The ambitions are huge, though: Wim highlighted a few shortcomings that should be fixed in 1.0.
- Caps negotiation is too slow, and caps are too complex.
- Mini objects are not flexible enough to add new things to buffers, such as a DTS (decoding timestamp) next to the timestamp, or strides in the video metadata. The idea is to make buffers really simple, but to provide an area where metadata can be written dynamically.
- Dynamic pipeline modifications are really hard to get right. Wim didn’t mention the difficulty of blocking, only the problem with timestamps. The idea is to attach the segment information to pads and pass it along when pads are linked.
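To make the "segment information stuck to pads" idea a bit more concrete, here is a toy model. The `Pad` and `Segment` classes are hypothetical and are not the GStreamer API; they only show a pad remembering the last segment it saw and replaying it when it gets linked to a new peer, so an element inserted into a running pipeline still learns the timeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: int          # hypothetical: stream time at which the segment starts (ns)
    rate: float = 1.0   # playback rate


class Pad:
    """Toy pad (not GstPad) that keeps the last segment 'stuck' to it."""

    def __init__(self, name: str):
        self.name = name
        self.peer: Optional["Pad"] = None
        self.sticky_segment: Optional[Segment] = None
        self.received: List[Segment] = []  # segments delivered to this pad

    def push_segment(self, segment: Segment) -> None:
        # Remember the segment, and forward it if we are linked.
        self.sticky_segment = segment
        if self.peer is not None:
            self.peer.receive_segment(segment)

    def receive_segment(self, segment: Segment) -> None:
        self.sticky_segment = segment
        self.received.append(segment)

    def link(self, peer: "Pad") -> None:
        self.peer = peer
        # Replay the stored segment so the newly linked element knows the timeline.
        if self.sticky_segment is not None:
            peer.receive_segment(self.sticky_segment)
```

Without the replay in `link()`, a freshly inserted element would only see a segment at the next flush or seek, which is one source of the timestamp problems mentioned above.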
Cross-platform development – Michael Smith, Songbird
This was mainly about the difficulties of porting GStreamer to ‘non-Unix’ systems, i.e., Windows and MacOS.
VLC was not used because it’s GPL. Songbird is GPL, but OEM versions are also distributed under a proprietary license. VLC is also difficult to use for anything other than playback (basically, for anything other than what you can do from the command line).
Songbird doesn’t use GStreamer for DRM playback. Obviously, because it would break DRM…
Most of GStreamer is platform-independent because it uses GLib. The exceptions are sources and sinks. Also, the codec libraries aren’t always portable. And autotools is a major pain. Finally, the way a GStreamer-based app is distributed is Unix-focused: on Windows and Mac, you’d put the libraries and plugins in an application-specific directory instead of a system directory.
The only decent video sink on Windows is dshowvideosink. The only problem is that DirectShow uses a different stride than GStreamer, so the data needs to be copied. d3dvideosink is a good idea, but it is not upstream yet and not really complete anyway. None of them really supports hardware acceleration. As a video source, ksvideosrc looks like a good option.
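The stride mismatch is why a copy is unavoidable: each row has to be moved to a position determined by the other side’s row pitch. A minimal sketch of such a row-by-row copy, assuming packed pixels; the concrete stride values in the usage example are hypothetical, not what DirectShow or GStreamer will necessarily pick for a given format.

```python
def round_up_4(n: int) -> int:
    """Round a row size up to a multiple of 4 bytes."""
    return (n + 3) & ~3


def copy_with_stride(src: bytes, row_bytes: int, height: int,
                     src_stride: int, dst_stride: int) -> bytearray:
    """Copy `height` rows of `row_bytes` payload from a buffer with `src_stride`
    bytes per row into a new buffer with `dst_stride` bytes per row.
    Padding bytes in the destination are left at zero."""
    if row_bytes > min(src_stride, dst_stride):
        raise ValueError("row payload does not fit in the stride")
    dst = bytearray(dst_stride * height)
    for row in range(height):
        s = row * src_stride
        d = row * dst_stride
        dst[d:d + row_bytes] = src[s:s + row_bytes]
    return dst
```

For example, a 3-pixel-wide RGB image has 9 bytes of payload per row; with a (hypothetical) 12-byte source stride and 16-byte destination stride, every row lands at a different offset, so the frame cannot simply be handed over as one block.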
It’s better on MacOS, but GStreamer lacks an OSX expert to maintain osxaudiosink and osxvideosink.
On MacOS and Windows, there are codecs preinstalled on the system: QuickTime and DirectShow. Using those has the advantage that Apple and Microsoft have already paid the patent licenses. Unfortunately, in QuickTime every codec has to be wrapped individually (to put the codec data in a custom structure); also, it doesn’t support all MPEG-4 profiles, and the distinction can’t easily be made with caps. For DirectShow, there are actually three different APIs, so you need different wrapping code. And the copying problem occurs here as well.
Adaptive streaming – Emanuele Quacchio, STM
Although the research is for a set-top box, this talk is about streaming SVC (Scalable Video Coding, the scalable extension of H.264) on a PC.
They created an SVC decoder plugin (based on the reference implementation) and extended other plugins to support adaptation. This will be upstreamed soon.
The decoder plugin adapts to changes in the width/height caps of the sink. h264parse is extended to split the stream not just into NALUs, but also into AUs (access units, which allow separating the different qualities), and to select the quality levels to be forwarded (the rest is dropped). h264parse can also split the stream into layers for the different qualities, which can be sent to mpegtsmux to mux them into a transport stream. mpegtsmux has been extended to support this.
h264parse sends a message to the application, which can then control the quality levels based on window size, battery level, …
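Selecting quality levels boils down to dropping the NAL units of an access unit that lie above the chosen operating point. Below is a simplified sketch, assuming the SVC header fields (dependency_id D, quality_id Q, temporal_id T) have already been parsed into a hypothetical `Nalu` record; the real sub-bitstream extraction rules in the SVC spec have more cases, and this is not the extended h264parse code. The window-size policy in `pick_dependency_id` is likewise just an illustration of the kind of decision the application could make.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Nalu:
    dependency_id: int   # D: spatial layer
    quality_id: int      # Q: quality layer within a spatial layer
    temporal_id: int     # T: temporal layer
    payload: bytes = b""


def extract_operating_point(access_unit: Sequence[Nalu], d_target: int,
                            q_target: int, t_target: int) -> List[Nalu]:
    """Keep the NAL units needed for (d_target, q_target, t_target); drop the rest."""
    kept = []
    for nalu in access_unit:
        if nalu.temporal_id > t_target or nalu.dependency_id > d_target:
            continue
        # Only the target spatial layer is limited in quality.
        if nalu.dependency_id == d_target and nalu.quality_id > q_target:
            continue
        kept.append(nalu)
    return kept


def pick_dependency_id(layer_heights: Sequence[int], window_height: int) -> int:
    """Highest spatial layer that fits the window (heights assumed ascending)."""
    best = 0
    for d, height in enumerate(layer_heights):
        if height <= window_height:
            best = d
    return best
```

With hypothetical layers of 240, 480 and 720 lines and a 500-line window, the application would ask for D = 1, and everything belonging to layer 2 would be dropped before it ever reaches the decoder.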
Integration with GStreamer QoS is still to do.
OMAP4 – Rob Clark, TI
OMAP4 new features:
- a global MMU (TILER), so contiguous memory no longer needs to be allocated;
- TILER can do remapping, e.g. striding and rotation, by accessing the same buffer through a different part of the address space;
- separate address spaces for different bitwidths and strides;
- hardware to build fully accelerated codecs without involving the DSP; these use two additional ARM cores and various basic hardware blocks;
- the display system can also write back to memory, so you can use it to accelerate e.g. scaling;
- a still-image memory-to-memory processor, e.g. for lens distortion correction, with an additional ARM core.
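The point of rotation by remapping is that nobody copies pixels: a reader walks the buffer through a different address mapping. The toy `RotatedView` below only illustrates that idea with an index computation in Python; it is a hypothetical class and says nothing about TILER’s actual tiled memory layout, which does this in hardware.

```python
class RotatedView:
    """Read-only 90-degree clockwise view onto a row-major buffer.
    No pixels are copied; every access is remapped to the original offset."""

    def __init__(self, data: bytes, width: int, height: int, stride: int):
        self.data = data
        self.src_height = height
        self.stride = stride
        # A 90-degree rotation swaps the dimensions.
        self.width, self.height = height, width

    def offset(self, x: int, y: int) -> int:
        # Pixel (x, y) of the rotated view comes from source pixel (src_x, src_y).
        src_x = y
        src_y = self.src_height - 1 - x
        return src_y * self.stride + src_x

    def pixel(self, x: int, y: int) -> int:
        return self.data[self.offset(x, y)]

    def row(self, y: int) -> bytes:
        return bytes(self.pixel(x, y) for x in range(self.width))
```

For a 2x3 image stored as `b"abXcdXefX"` (stride 3, one padding byte per row), the view’s first row reads `eca` and its second row `fdb`, which is the clockwise-rotated image.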
Hardware doesn’t solve everything, though: some extra processing (cropping, memcpy) is still needed in software.
Introduced some hacks in GStreamer to support the hardware, e.g. strided YUV caps. As Wim mentioned, these should be replaced by a better GstBuffer implementation. Other hacks use events, but that has a huge backward compatibility problem; not sure how to fix it.
Camerabin is being enhanced to make more use of hardware acceleration. For example, the source element can generate JPEG and raw versions of the same image simultaneously.
There is an OpenMAX framework that marshals OMX calls to the DSP/hardware. This uses an interface layer. Problem with buffers: OpenMAX assumes a pre-negotiated buffer pool, but this is difficult in GStreamer. An OMX extension can be used to avoid pre-negotiating buffers, but then you have conflicting refcounting in GStreamer and in the hardware.
Video conferencing – Olivier Crête, Collabora
Instead of making a conferencing application, use a conferencing framework: Farsight. Farsight abstracts the protocols: streaming (GStreamer) and connection setup (Telepathy). The Farsight “Conference” is a bin, like decodebin, that abstracts the encoder/decoder, (de)payloader, and network protocols (e.g. ICE). Of course, some new concepts need to be introduced to support videoconferencing.
- Session = one aspect of the call, e.g. audio.
- Stream = part of a session. Every participant in the session has a stream.
The Farsight conference element has sink pads for the audio and video that you generate locally, and src pads for all the remote streams.
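The relation between conference, sessions and streams can be summarized as a small data model. The classes and pad names below are hypothetical and not the Farsight API; they only encode what is described above: one sink pad per session for local media, and one stream (with its own src pad) per participant in each session.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stream:
    participant: str
    src_pad: str    # pad on which this participant's remote media comes out


@dataclass
class Session:
    media: str      # e.g. "audio" or "video"
    sink_pad: str   # pad where locally captured media goes in
    streams: Dict[str, Stream] = field(default_factory=dict)


@dataclass
class Conference:
    sessions: Dict[str, Session] = field(default_factory=dict)

    def add_session(self, media: str) -> Session:
        session = Session(media=media, sink_pad=f"sink_{media}")
        self.sessions[media] = session
        return session

    def add_participant(self, name: str) -> None:
        # Every participant gets one stream in every existing session.
        for media, session in self.sessions.items():
            session.streams[name] = Stream(name, src_pad=f"src_{media}_{name}")

    def sink_pads(self) -> List[str]:
        return sorted(s.sink_pad for s in self.sessions.values())

    def src_pads(self) -> List[str]:
        return sorted(st.src_pad for s in self.sessions.values()
                      for st in s.streams.values())
```

An audio+video call with two remote participants thus gives two sink pads and four src pads.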
Telepathy uses D-Bus to set things up. You call a Telepathy method to find an account and contact, then another method to set up the call. It returns the parameters of the call.
Future: ease of use: autovideosrc element (look for one that actually works and has the requested properties), dynamic switching (e.g. unplug a camera and plug in a new one), source element with an embedded tee that automatically starts/stops playing, sink element that adds audio mixer if necessary, filter that abstracts dynamic insertion of elements into a pipeline.
Fsu = abstraction that completely hides GStreamer. Just deal with conference, session and stream. Basically a call application without the UI, because everybody wants to make their own UI.
Landell, live streaming – Luciana Fujii, Holoscopio
Record and stream video. Developed to support the FISL conference in Brazil. After trying a lot of software, they ended up using gst-launch… Then a PoC was made in Java, after which they restarted with Python and GTK+. Since it has to be open source and not patent-encumbered, it uses Ogg/Theora.
Simple interface to select inputs, outputs, encoder and metadata. The pipeline is set up statically. While running, the output stream layout can be edited, e.g. adding PiP or text overlay, or filtering (brightness, …). Each output can be stopped independently, but it is not possible to start one independently.
cairoimageoverlay (not upstreamed) to paint image overlays (watermarks).
To select a source, it doesn’t use inputselector but videomixer (setting alpha = 0 deselects a source). That causes fewer problems: inputselector does pad blocking and such, which adds latency.
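Selecting through the mixer is just compositing where every unselected input is fully transparent. A minimal sketch, with frames reduced to lists of grey values and alpha as a float; the functions are illustrative and not videomixer’s implementation.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[Sequence[float], float]   # (pixel values, alpha)


def composite(layers: Sequence[Layer]) -> List[float]:
    """Blend layers bottom to top with the 'over' operator."""
    out = [0.0] * len(layers[0][0])
    for pixels, alpha in layers:
        out = [alpha * p + (1.0 - alpha) * o for p, o in zip(pixels, out)]
    return out


def select_source(sources: Sequence[Sequence[float]], selected: int) -> List[float]:
    """Show only `selected` by giving it alpha 1.0 and every other input alpha 0.0."""
    return composite([(pixels, 1.0 if i == selected else 0.0)
                      for i, pixels in enumerate(sources)])
```

Since all inputs keep flowing into the mixer, switching is only a property change with no pad blocking; the price is that every source is processed all the time. The same compositing also covers PiP, by giving a second input partial alpha or a smaller region.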
It took 6 months of effort to implement this.
Flumotion – Zaheer Merali, ex-Flumotion
WebM was released by Google this year, together with GStreamer elements for the codec. These didn’t support live streaming, though; Zaheer added that.
Workers (doing encoding etc.) can be distributed over several servers. A manager controls these workers centrally. So it’s a distributed GStreamer pipeline. You can have e.g. a producer on a BeagleBoard and a transcoder on a PC. Workers communicate using GDP (GStreamer Data Protocol) over file descriptors (fdsrc ! gdpdepay, gdppay ! multifdsink). Synchronization is done with the GStreamer network clock (GstNetClientClock).
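A network clock has to estimate how far the remote clock is from the local one. The sketch below shows the basic round-trip estimate, assuming symmetric network delay: the remote timestamp is paired with the midpoint of the local send and receive times. As far as I know, GstNetClientClock also works with (local midpoint, remote time) observations but smooths many of them; the minimum-round-trip selection here is a common heuristic I chose for illustration, not necessarily what GStreamer does.

```python
from typing import Sequence, Tuple

Sample = Tuple[int, int, int]   # (local_send, remote_time, local_recv)


def estimate_offset(local_send: int, remote_time: int, local_recv: int) -> float:
    """Estimate (remote clock - local clock) from one request/response round trip."""
    local_mid = (local_send + local_recv) / 2
    return remote_time - local_mid


def best_offset(samples: Sequence[Sample]) -> float:
    """Use the sample with the shortest round trip: it leaves the least room
    for asymmetric delays to distort the estimate."""
    send, remote, recv = min(samples, key=lambda s: s[2] - s[0])
    return estimate_offset(send, remote, recv)


def to_remote(local_now: float, offset: float) -> float:
    return local_now + offset
```

With a request sent at local time 100, answered with remote time 1050 and received at 120, the estimated offset is 940, so local time 200 corresponds to remote time 1140.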
Network operations are not done in GStreamer but in Python/Twisted, which already has a complete implementation anyway. The file descriptors are passed to GStreamer using fdsrc/multifdsink. multifdsink can drop buffers and resume at the next keyframe, and it repeats the stream headers for each new connection.
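The two multifdsink behaviours mentioned here can be sketched with a toy fan-out sink. `FanOutSink` is hypothetical and much simpler than multifdsink (which offers several configurable policies); it only shows that a new client gets the stream headers first and starts on a keyframe, and that a client falling too far behind loses its queued media and resumes at the next keyframe.

```python
from collections import deque
from typing import Any, Dict, Iterable, Optional


class FanOutSink:
    """Toy per-client queues with stream headers and keyframe resync."""

    def __init__(self, stream_headers: Iterable[Any], max_queued: int):
        self.stream_headers = list(stream_headers)
        self.max_queued = max_queued
        self.clients: Dict[str, Dict[str, Any]] = {}

    def add_client(self, cid: str) -> None:
        self.clients[cid] = {
            "headers": deque(self.stream_headers),  # always delivered first
            "media": deque(),
            "waiting_for_keyframe": True,           # start cleanly on a keyframe
        }

    def render(self, buf: Any, is_keyframe: bool) -> None:
        for state in self.clients.values():
            if state["waiting_for_keyframe"]:
                if not is_keyframe:
                    continue  # drop delta frames until the next keyframe
                state["waiting_for_keyframe"] = False
            state["media"].append(buf)
            if len(state["media"]) > self.max_queued:
                # Slow client: drop its queued media and resync at the next keyframe.
                state["media"].clear()
                state["waiting_for_keyframe"] = True

    def pop(self, cid: str) -> Optional[Any]:
        """Next buffer to write to this client's socket, or None if nothing is queued."""
        state = self.clients[cid]
        if state["headers"]:
            return state["headers"].popleft()
        if state["media"]:
            return state["media"].popleft()
        return None
```

Headers are kept apart from the media queue so that a resync never throws them away before the client has received them.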
Remotely controlled vehicle – Andrey Nechypurenko and Maksym Parkachov
Hobby project to remotely control a model car. It contains a BeagleBoard with WLAN and a wireless camera, plus IO channels to control the motor and servos. A live video stream goes to the remote driver, and steering actions are sent back. It should work over the Internet, so it has to deal with NAT traversal, latency and packet loss.
- Use TCP. Gives feedback of packet delivery status and traverses firewalls.
- Vehicle monitors output queue to deduce QoS conditions.
- Different quality states, switch depending on output queue size. Adapt frame size and bitrate.
- ICE for NAT traversal. RTP payload goes over ZeroC ICE middleware. Goes through appsrc/appsink (but could be ZeroC ICE elements of course).
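The queue-driven quality switching can be sketched as a small state machine with hysteresis: step down as soon as the output queue grows past a high watermark, step up only after the queue has stayed low for a while. The quality ladder, watermarks and sample counts below are hypothetical values of mine, not the ones used in the project.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Quality:
    width: int
    height: int
    bitrate_kbps: int


# Hypothetical quality ladder, lowest first.
LADDER = [Quality(160, 120, 150), Quality(320, 240, 400), Quality(640, 480, 1000)]


class QualityController:
    def __init__(self, ladder: Sequence[Quality], high_water: int, low_water: int,
                 calm_samples_to_upgrade: int):
        self.ladder = list(ladder)
        self.level = len(self.ladder) - 1   # start at the best quality
        self.high_water = high_water
        self.low_water = low_water
        self.calm_needed = calm_samples_to_upgrade
        self.calm = 0

    def update(self, queued_bytes: int) -> Quality:
        """Feed one sample of the output queue size; return the quality to use."""
        if queued_bytes > self.high_water and self.level > 0:
            self.level -= 1          # congestion: degrade immediately
            self.calm = 0
        elif queued_bytes < self.low_water:
            self.calm += 1           # quiet network: upgrade only after a while
            if self.calm >= self.calm_needed and self.level < len(self.ladder) - 1:
                self.level += 1
                self.calm = 0
        else:
            self.calm = 0
        return self.ladder[self.level]
```

In the project as presented, the frame size was applied by changing the width/height of a capsfilter on the fly; the asymmetric thresholds keep the controller from oscillating between two levels.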
Rendering is done in Blender, so it uses OpenGL.
Frame size is adapted by using a capsfilter and changing its width/height on the fly. For testing, netfilter is used to reduce the bandwidth. The output is compared with the normal GStreamer RTSP server and mplayer rendering. The latter leads to full network saturation, a drop in framerate and increased latency. With the adaptive control, the framerate also drops and latency increases at first, but this is detected and the resolution is reduced instead, so framerate and latency remain constant. This also keeps network usage below saturation.
Source code is on gitorious project VETER.
Intel Streaming Media Driver elements – Josep Torra, Fluendo
Intel builds two x86-based SoCs for use in set-top boxes: CE4100 (Sodaville) and CE3100 (Canmore). Intel provides an SDK which is a Linux system with GStreamer. They have two integrated HD video decoders and audio DSPs, a hardware blender for compositing, and OpenGL ES support.
Intel Streaming Media Driver (ISMD) is a low-level API that accesses this hardware. Fluendo has wrapped this into GStreamer elements. Intel puts another library on top of GStreamer to make C++ multimedia applications.
ISMD elements are autopluggable into decodebin2 and friends. They can be mixed with software-based elements (e.g. muxers): buffers are converted automatically after negotiation.
DVB source can select channels and parse program info.
Audio sink has caps for compressed formats, which are handled directly by hardware. Re-encodes transparently to AC3 or DTS if the selected output channel is e.g. S/PDIF. It is also a clock provider, giving an ISMD clock (which allows hardware to synchronize on it).
Video decoder, video post-processor, video sink, etc.
Interactivity (menus etc.) – Jan Schmidt, Oracle
Use the Navigation interface, which is implemented by sink elements and pushes events into the pipeline when e.g. a mouse button is clicked. Transform elements (e.g. videoscale) must also transform the events to fix up the coordinates. Events are interpreted somewhere along the pipeline and transformed into command events. The DVD source element transforms the events into messages which are sent to the application. The application can make queries, e.g. to find the set of available commands. These can then be presented in the UI. Problem: the intelligence is at the front of the pipeline, which means the pipeline must be flushed when something happens, and there can be a big delay between an event occurring and its handling (due to queues). To fix this, the overlay element that draws the menus sits as close as possible to the sink end of the pipeline.
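Fixing up the coordinates means each transform element applies the inverse of its own geometric mapping while the event travels upstream. A sketch with two hypothetical elements, a crop and a scaler; the helper names are mine and not part of the GstNavigation API.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Inverse = Callable[[Point], Point]


def unscale(in_size: Tuple[int, int], out_size: Tuple[int, int]) -> Inverse:
    """Inverse mapping for a scaler that turns in_size frames into out_size frames."""
    in_w, in_h = in_size
    out_w, out_h = out_size
    return lambda p: (p[0] * in_w / out_w, p[1] * in_h / out_h)


def uncrop(left: int, top: int) -> Inverse:
    """Inverse mapping for an element that crops `left`/`top` pixels off the frame."""
    return lambda p: (p[0] + left, p[1] + top)


def to_source_coords(point: Point, inverses: Sequence[Inverse]) -> Point:
    """`inverses` is ordered source to sink; the event travels upstream,
    so the element nearest the sink applies its inverse first."""
    for inverse in reversed(inverses):
        point = inverse(point)
    return point
```

For a pipeline that crops 8 pixels on the left of a 712x576 frame and then scales 704x576 to 1280x720, a click at (640, 360) in the window ends up at (360, 288) in the source frame, which is where the menu button actually is.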
This implementation is pretty much customized for DVD. As is, it can’t be used for some new non-DVD format (e.g. the graphics have to be VOBSUB).
The presentation itself was made with GstInteract, which is the base class of the DVD virtual machine. It showed working buttons etc. to go to next slide and to launch a pipeline.