Linux kernel multimedia experiments

Miguel Freitas
Updated Apr 23, 2003

What is this document about?

This is a set of experiments I ran with several kernels for multimedia applications, most notably xine. My objective is to have a great Linux desktop that plays audio and video with the best possible performance, requiring minimal (if any) intervention from the user.

As a xine user and developer, I coded a simple test program (multimedia_sim.c) that tries to simulate the video playback workload and take real-world measurements (those that are noticed when watching the video), such as the number of dropped frames.

What this is not...

Motivation

I was a happy user of the stock 2.4.17 kernel for several months until I decided to upgrade my system and give the Red Hat 2.4.18-17 kernel a try. The result was quite disappointing: video playback in xine was not as smooth as it used to be. I tried mplayer to see whether I had done something wrong in xine and got the same behaviour.

I finally found the source of the non-smoothness: a small applet in my KDE panel (KSysGuard) which shows a graph of CPU usage and is updated every 2 seconds. Every time an update is made, the player drops frames.

My primary, and most surprising, observation is that kernel 2.4.17 (stock) doesn't have that problem. The newer kernels with the O(1) scheduler that I tested (2.4.18-rh, 2.4.19-ck14, 2.5.51) all dropped frames at KSysGuard updates.

Why are frames dropped even with low CPU utilization?

In order to display a video, the player must be able to send frames to the screen at a constant rate. For NTSC material, for example, one frame should be shown every 33 ms. Unfortunately the player is not the only one responsible for that; it also depends on the kernel and the X server.
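The 33 ms figure follows from the NTSC frame rate of 30000/1001 (about 29.97) frames per second. Here is a minimal sketch of that arithmetic in C. The function names are made up for this illustration and are not taken from multimedia_sim.c.

```c
#include <stdint.h>

/* Frame period in microseconds for a frame rate given as the rational
 * rate_num / rate_den frames per second, rounded to the nearest
 * microsecond. Hypothetical helper, not part of multimedia_sim.c. */
int64_t frame_period_us(int64_t rate_num, int64_t rate_den)
{
    return (1000000 * rate_den + rate_num / 2) / rate_num;
}

/* Number of whole frame periods that fit into a stall of stall_us
 * microseconds, i.e. a rough count of the display deadlines that pass
 * while a thread is not being scheduled. */
int64_t whole_periods_in_stall(int64_t stall_us, int64_t period_us)
{
    return stall_us / period_us;
}
```

With these definitions, frame_period_us(30000, 1001) gives 33367 microseconds, and a stall of 100000 microseconds (100 ms) spans 2 whole frame periods.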

One of the greatest improvements of the XFree86 4.x series for multimedia users was the introduction of the XV extension for image scaling and color conversion. That made the player's job easier, since it no longer has to access the hardware directly in order to get better quality and lower CPU utilization. Now most people are probably using XV without even knowing about it, which is a very good thing.

The average xine user will have three important processes that must be scheduled at every frame:

  1. video_out thread: This receives frames queued for display and ensures they are sent to the video driver at the right time. The common case is using the XV driver from the X server.
  2. XFree86 process: This process receives frames through shared memory and actually shows them, that is, copies them to video card memory or something similar. Since shared memory is used to speed up communication with the player, an xshm completion event is sent back to the application.
  3. xine-ui main thread: This receives general events from the X server, such as user actions, and also the xshm completion events. The completion events are needed to know whether it is safe to manipulate the shared memory again. (I have not included this thread in multimedia_sim.c because it is xine specific and I know there are improvements to be made here - see updates below)

The problem is that if any of these threads isn't scheduled for a long time (let's say, 100 ms?), frames will be dropped, and the user will notice judder or jerky video playback.

The XFree86 issue is outside the scope of the xine project. Quoting Mark Vojkovich from nVidia in a xine-devel thread: "You can never guarantee that (frames will be shown at the right time just because you sent them to XFree86 at the right time). The X-server might not have even been scheduled when you made those request. Even at 1/24 the X-server might wake up to find that it has 3 frames in its queue. Linux and it's coarse scheduling are not well suited for multimedia. The server is definitely in the position where it has more frames to process than the hardware can queue."

This is the same problem experienced by DVD users trying to play their movies with DMA disabled. It's not a matter of transferring data to the CPU faster (DVD playback requires a modest data rate by today's standards), but of not introducing high latencies into the running processes. There is no way to show 30 frames/second if the player isn't scheduled often enough by the kernel.

Newer kernels and O(1) scheduler

The latest Linux kernels (2.5.x and some unofficial 2.4.x trees) include several important improvements for the multimedia desktop: the preemptive patch (recommended: interview with Robert Love) and an increased HZ can lower scheduling latencies, and therefore show frames with better accuracy (i.e., a more constant frame rate).

The O(1) scheduler, by the great kernel hacker Ingo Molnar, is also a notable improvement (recommended: interview). The new scheduler takes nice levels more seriously, and that may be the cause of the problems I've experienced.

Quoting a private conversation I had with Ingo: "that's the main problem with O(1) and video playback - the xine threads use up lots of CPU time, especially with higher quality movies where lots of decompression has to be done - and the kernel rates them as 'CPU hogs'"

"There's no way this could be detected automatically - nothing really differentiates a xine thread from a number cruncher process (from the kernel's point of view) - both use up lots of CPU time. The difference in the xine case is that there's a human eye attached to the screen output - unfortunately it would be hard for the kernel to detect this :-)"

multimedia_sim.c

This is a simple multithreaded program designed to simulate a dummy multimedia application and measure how it would perform on a loaded system. It implements the basic mechanisms of the xine architecture, except that no video is currently decoded or copied around.

  1. decoder thread: simulates the CPU load of decoding an MPEG stream. Every dummy decoded frame is enqueued for display by the video_out thread. 15 frames are decoded "ahead" of the screen, so this thread is able to tolerate much higher latencies.
  2. video_out thread: (1) gets an enqueued frame, (2) sleeps until it's time to display the frame, (3) wakes up the server (sends a frame).
  3. server thread: just waits for frames to arrive and does the performance measurements.

The latter two threads can have their nice level changed by command line parameters. Another parameter configures how long the test runs. For additional information refer to the comments in the latest version of the source code.
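Step (2) of the video_out thread, together with the latency that the server thread measures, can be sketched roughly as follows. This is not the code from multimedia_sim.c: the names now_us and sleep_until_us are hypothetical, and the sketch assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC and nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current time on the monotonic clock, in microseconds. */
int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Sleep until the absolute deadline (microseconds on the monotonic
 * clock) and return how late the caller woke up, in microseconds.
 * If the deadline has already passed, return the lateness without
 * sleeping. */
int64_t sleep_until_us(int64_t deadline_us)
{
    int64_t remaining = deadline_us - now_us();

    while (remaining > 0) {
        struct timespec req;
        req.tv_sec = remaining / 1000000;
        req.tv_nsec = (remaining % 1000000) * 1000;
        nanosleep(&req, NULL);   /* may return early if a signal arrives */
        remaining = deadline_us - now_us();
    }
    return -remaining;           /* >= 0: wake-up latency */
}
```

A video_out-style loop would advance the deadline by one frame period per frame and hand the returned latency to whatever does the measuring. On an idle system the value should stay small. Under load it reflects how long the scheduler kept the thread waiting.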

Preliminary results

I made a first attempt at running multimedia_sim.c using the ConTest script. I just modified it to run my program instead of a kernel compilation, keeping the same load conditions. The results were quite interesting: although I did only a few tests on each kernel, the output was very consistent across multiple runs.

The kernels tested are 2.4.17 (stock), 2.4.18-17.7.x (Red Hat), 2.4.19-ck14 (using the VM from the aa tree, O(1), lolat and preempt) and 2.5.51 (stock, no preempt). The program source (old version) and raw results can be found here.

Notes from the above results:

What should we expect?

Of course, I don't expect any magic from the kernel side. I don't expect, for example, to be able to run several kernel compilations in the background without experiencing any degradation in my player's performance (unless I have root access to boost the xine and X priorities far above everything else).

However, I expect that an average user shouldn't need to care about tweaking the nice levels of xine and the XFree86 initialization scripts in order to get good multimedia performance. Some distros are doing their part and shipping the X server with a nice level of -10 (see Ingo's interview above), which improves the response of the interface and also video playback.

I guess that the missing part would be fixing any kernel corner cases (the io_load test, for example, had better results with the older 2.4.17) and also having a way for a non-root process to hint to the scheduler that it wants a somewhat higher priority. Quoting a private email from Con Kolivas: "I think it is unfair that for smooth audio/video playback the software should be -nice.", "Increasingly many distributions don't let you suid root a lot of applications (namely gtk)".

Ingo suggested that one possible idea would be to allow user applications to set small negative nice values (for example, -5 to -1). That would work like the sort of hint I'm looking for. Besides, it would still leave a reasonably good range (-20 to -6) for root only.

In my experience, I noticed great improvements in xine playback smoothness by simply setting nice(-1) on the video output thread and running the X server with nice(-10). With mplayer, the same result was achieved using nice(-2). This is expected behaviour, since mplayer is a single-threaded application and, as Ingo pointed out, to the kernel it looks like a "CPU hog" process.
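As a rough illustration of what setting such a nice level from inside a player could look like, here is a minimal sketch using the standard setpriority and getpriority calls. The helper names are hypothetical, and this is not how xine or mplayer actually do it. The sketch relies on the Linux-specific behaviour that a call with who = 0 made from a thread affects only that thread. POSIX defines the nice value as a per-process setting.

```c
#include <errno.h>
#include <sys/resource.h>

/* Try to set the nice value of the caller. Returns 0 on success, or -1
 * with errno set (typically EACCES or EPERM when an unprivileged
 * process asks for a lower nice value than it currently has).
 * Assumption: on Linux this changes only the calling thread. */
int set_own_nice(int nice_value)
{
    return setpriority(PRIO_PROCESS, 0, nice_value);
}

/* Read the caller's current nice value into *nice_value. getpriority()
 * can legitimately return -1, so errno is cleared before the call and
 * checked afterwards. Returns 0 on success, -1 on error. */
int get_own_nice(int *nice_value)
{
    int v;

    errno = 0;
    v = getpriority(PRIO_PROCESS, 0);
    if (v == -1 && errno != 0)
        return -1;
    *nice_value = v;
    return 0;
}
```

A video output thread could call set_own_nice(-1) once at startup and simply carry on at its default priority if the call fails. Without root privileges the call is normally refused, which is exactly the limitation discussed above.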

Acknowledgements

I would like to thank Ingo Molnar and Con Kolivas for the discussion and ideas. Please note that I have not yet received Ingo's answer about the "preliminary results" above, so the comments in this document are my responsibility only.

Updates

  • 16 Apr 2003 - libxine doesn't use completion events anymore. That change improves the smoothness of playback and completely detaches the scheduler problem from the GUI thread.
  • 23 Apr 2003 - version 0.3 considers a frame dropped if its latency exceeds FRAME_PERIOD. It used to be FRAME_PERIOD/2, which is obviously wrong and shows very bad results for stock kernels where HZ=100.
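The two dropped-frame rules from the 23 Apr 2003 entry can be compared with a small sketch. The constant FRAME_PERIOD_US and both function names are assumptions made for this illustration (NTSC timing, in microseconds). The real definitions are in the multimedia_sim.c source.

```c
#include <stdint.h>

/* Assumed value: one NTSC frame period, about 1/29.97 s, in microseconds. */
#define FRAME_PERIOD_US 33367

/* Rule as of version 0.3: a frame counts as dropped only when its
 * latency exceeds one full frame period. */
int is_dropped(int64_t latency_us)
{
    return latency_us > FRAME_PERIOD_US;
}

/* Older rule, kept only for comparison: half a frame period. */
int is_dropped_old(int64_t latency_us)
{
    return latency_us > FRAME_PERIOD_US / 2;
}
```

With HZ=100 the timer tick is 10 ms, so a wake-up that overshoots by a tick or two presumably already comes close to the old threshold of roughly 16.7 ms. A latency of 18 ms, for example, counts as a drop under the old rule but not under the new one.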



    powered by
    Center for Telecommunications Studies of PUC-Rio
    PUC-Rio (Brasil)