Timing and synchronization in GStreamer can be confusing. This talk explains the basic concepts GStreamer uses, and why they are designed that way, illustrated with examples.
Time is basically counting the number of occurrences of a repeating event. You can’t measure absolute time, only time differences, so you need a reference. The counter value itself is not meaningful either; it has to be converted to a standard unit (a second). Such a counter is a clock.
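The idea can be sketched in a few lines of plain Python (this is an illustration of the concept, not GStreamer API; the 90 kHz rate is an assumed example):

```python
# A clock is a tick counter plus a known tick rate; only differences
# between two readings are meaningful, since the absolute counter value
# depends on an arbitrary starting reference.

TICKS_PER_SECOND = 90_000  # assumed rate, e.g. a 90 kHz media clock

def ticks_to_seconds(ticks: int) -> float:
    """Convert a raw counter value to the standard unit (seconds)."""
    return ticks / TICKS_PER_SECOND

def elapsed_seconds(start_ticks: int, end_ticks: int) -> float:
    """Measure a time difference between two counter readings."""
    return ticks_to_seconds(end_ticks - start_ticks)

print(elapsed_seconds(90_000, 270_000))  # 2.0
```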
GstClock reports a monotonic value, as an absolute time against some unspecified reference. It can be used without a pipeline.
A buffer timestamp (pts) specifies how the different buffers in a stream should be ordered and synchronized in time. They are relative to a certain time, specific for that stream. In order to allow for synchronization between different streams, GStreamer defines a Base Time, an absolute GstClock value. After that, we can measure the Running Time, a monotonically increasing global time referenced to the Base Time. To relate buffer timestamps to the Running Time, segments are introduced.
A segment has a start and stop time: the first and last buffer timestamp that are valid for that segment. If the segment itself is issued at the base time, then the running time for a buffer = buffer.pts – segment.start.
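The formula above can be written out as a small sketch (plain Python with assumed example values, not the GstSegment API):

```python
# Running time of a buffer in a simple forward segment issued at the
# base time: running_time = buffer.pts - segment.start.

def running_time(buffer_pts: int, segment_start: int) -> int:
    """Valid for a plain segment playing at normal rate."""
    return buffer_pts - segment_start

# Segment starting at pts 1 s (timestamps in nanoseconds): a buffer with
# pts 1.5 s plays 0.5 s into the segment.
print(running_time(1_500_000_000, 1_000_000_000))  # 500000000
```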
When the pipeline is paused, the Running Time is suspended. The GstClock time is still increasing, however. Therefore, the Base Time is updated when you start playing again.
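The base-time update can be sketched as follows (assumed function name and values, not GStreamer API): on pause, remember the running time reached so far; on resume, shift the base time forward so the running time continues from where it stopped.

```python
# Pick the new base time so that running time = clock_now - base_time
# equals the running time we had when we paused.

def base_time_after_resume(clock_now: int, running_time_at_pause: int) -> int:
    """New base time such that clock_now - base_time == running_time_at_pause."""
    return clock_now - running_time_at_pause

# Paused at running time 3 s; the clock reads 10 s when we resume
# (values in nanoseconds).
new_base = base_time_after_resume(10_000_000_000, 3_000_000_000)
print(new_base)  # 7000000000, so the running time resumes at 3 s
```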
For playing faster/slower, there are several options. A segment can have a rate property, that changes the way that the running time for a buffer is updated: buffer.running_time = (buffer.pts – segment.start) / abs(segment.rate). For playing backward, segment.rate is negative but we also have to start from the other end of the segment, so buffer.running_time = (segment.stop – buffer.pts) / abs(segment.rate).
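The two formulas can be combined into one sketch (plain Python, not `gst_segment_to_running_time`; segment bounds are assumed example values):

```python
# Forward playback divides by the rate; backward playback also measures
# from the other end of the segment.

def running_time_with_rate(pts: int, seg_start: int, seg_stop: int,
                           rate: float) -> float:
    if rate >= 0:
        return (pts - seg_start) / abs(rate)
    return (seg_stop - pts) / abs(rate)

# Segment [0, 10 s] in nanoseconds, buffer at pts 4 s:
pts, start, stop = 4_000_000_000, 0, 10_000_000_000
print(running_time_with_rate(pts, start, stop, 2.0))   # 2x speed: plays at 2 s
print(running_time_with_rate(pts, start, stop, -1.0))  # backward: 6 s from the end
```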
The final concept is Stream Time. It is the position in the pipeline, as presented to the user (e.g. in a progress bar). Most of the time it is the same as buffer timestamps. It is different when watching streams that don’t start from zero, e.g. a live stream that already started long before. To get Stream Time, a segment has segment.time. Stream Time = buffer.pts – segment.time.
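Using the simplified formula from this talk (a sketch with assumed example values, not the full GstSegment stream-time computation):

```python
# Stream time is the position shown to the user, e.g. in a progress bar:
# stream_time = buffer.pts - segment.time.

def stream_time(buffer_pts: int, segment_time: int) -> int:
    return buffer_pts - segment_time

# Ordinary playback from zero: segment.time = 0, so stream time equals
# the buffer timestamps (values in nanoseconds).
print(stream_time(5_000_000_000, 0))              # position 5 s
# A stream whose timestamps don't start at zero: segment.time offsets them.
print(stream_time(5_000_000_000, 2_000_000_000))  # position 3 s
```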
A “live source” is something that you cannot capture earlier or later than when the event happens, e.g. a live webcast, a camera or a microphone input. We want those buffers to have timestamps that correspond to the same moment in the real world, so that the sound recorded by the microphone is synchronized with the video captured by the camera. The problem is that the two sources have different latencies, so the timestamp at the moment we actually get the buffer from the hardware is not necessarily correct. Subtracting a delay from both streams to compute the pts would work, but then the sink would always consider that the buffers arrive too late. That’s where the concept of latency comes in.
Every element reports its latency: the minimum amount of time it will introduce, and the maximum delay it can support before it starts dropping or blocking. For instance, an audio source has a ring buffer: if you wait longer than the ring buffer’s length before reading, you lose data. A queue can be used to increase the maximum latency. The maximum is used for a sanity check: if the minimum latency on one path of the stream is larger than the maximum supported on any other path, it will never work. The pipeline latency is the maximum of all minimum latencies over all paths. The pipeline latency is added to the current time in the sink to evaluate whether a buffer is too late to be displayed.
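This latency logic can be sketched as follows (assumed data layout with plain Python tuples, not the GStreamer latency query; the millisecond values are made-up examples):

```python
# Each path through the pipeline reports a (min_latency, max_latency) pair.

def pipeline_latency(paths: list[tuple[int, int]]) -> int:
    """Return the pipeline latency, or raise if the pipeline cannot work."""
    overall_min = max(min_lat for min_lat, _ in paths)
    overall_max = min(max_lat for _, max_lat in paths)
    if overall_min > overall_max:
        # One path's minimum latency exceeds what another path can buffer:
        # this is the sanity check described above.
        raise ValueError("impossible latency configuration")
    # The pipeline latency is the maximum of all minimum latencies.
    return overall_min

# Audio path: 20-200 ms; video path: 40-500 ms.
print(pipeline_latency([(20, 200), (40, 500)]))  # 40
```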