Message-Id: <1261140546.15591.5.camel@marge.simson.net>
Date: Fri, 18 Dec 2009 13:49:06 +0100
From: Mike Galbraith <efault@....de>
To: Jason Garrett-Glaser <darkshikari@...il.com>
Cc: Ingo Molnar <mingo@...e.hu>, Kasper Sandberg <lkml@...anurb.dk>,
    Peter Zijlstra <a.p.zijlstra@...llo.nl>,
    LKML Mailinglist <linux-kernel@...r.kernel.org>,
    Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: x264 benchmarks BFS vs CFS

On Fri, 2009-12-18 at 02:11 -0800, Jason Garrett-Glaser wrote:
> Two more thoughts here:
>
> 1) We're considering moving to a thread pool soon; we already have a
> working patch for it and if anything it'll save a few clocks spent on
> nice()ing threads and other such things. Will this improve
> START_DEBIT at all?

Yeah, START_DEBIT only affects a thread once.

> I've attached the beta patch if you want to try
> it. Note this also works with 2) as well, so it adds yet another
> dimension to what's mentioned below.
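
On the pool in 1): a rough sketch of why it helps, in Python rather than x264's C. The names `encode_frame`, `NUM_WORKERS` and `NUM_FRAMES` are hypothetical and have nothing to do with the attached patch. The workers are created once and reused for every frame, so a per-thread startup cost such as START_DEBIT is paid once per worker instead of once per frame.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4   # hypothetical pool size
NUM_FRAMES = 50   # hypothetical number of frames to "encode"

def encode_frame(frame_no):
    # Stand-in for real encoding work: just report which thread ran it.
    return frame_no, threading.get_ident()

def run_with_pool(num_frames=NUM_FRAMES, num_workers=NUM_WORKERS):
    # The executor starts at most num_workers threads and reuses them
    # for every submitted frame instead of spawning a thread per frame.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(encode_frame, range(num_frames)))

def distinct_threads(results):
    # Count how many different worker threads handled the frames.
    return len({tid for _, tid in results})

if __name__ == "__main__":
    results = run_with_pool()
    print(f"{len(results)} frames handled by {distinct_threads(results)} threads")
```

However many frames go through, the number of distinct threads never exceeds the pool size, which is the property that matters for a once-per-thread penalty.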
>
> 2) We recently implemented a new threading model which may be
> interesting to test as well. This threading model gives worse
> compression *and* performance, but has one benefit: it adds zero
> latency, whereas normal threading adds a full frame of latency per
> thread. This was paid for by a company interested in
> ultra-low-latency streaming applications, where 1 millisecond is a
> huge deal. I've been thinking this might be interesting to bench from
> a kernel perspective as well, as when you're spawning a half-dozen
> threads and need them all done within 6 milliseconds, you start
> getting down to serious scheduler issues.
>
> The new threading model is much less complex than the regular one and
> works as follows. The frame is split into X slices, and each slice
> encoded with one thread. Specifically, it works via the following
> process:
>
> 1. Preprocess input frame, perform lookahead analysis on input frame
> (all singlethreaded)
> 2. Split up a ton of threads to do the main encode, one per slice.
> 3. Join all the threads.
> 4. Do post-filtering on the output frame, return.
>
> Clearly this is an utter disaster, since it spawns N times as many
> threads as the old threading model *and* they are far shorter-lived, *and*
> only part of the application is multithreaded. But there's not really
> a better way to do low-latency threading, and it's an interesting
> challenge to boot. IIRC, it's also the way ffmpeg's encoder threading
> works. It's widely considered an inferior model, but as mentioned
> before, in this particular use-case there's no choice.
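
The four steps quoted above amount to a fork/join per frame. A minimal sketch follows, in Python rather than x264's C. `preprocess`, `split_slices`, `encode_slice` and `encode_frame_sliced` are hypothetical placeholders, the "frame" is just a list of numbers, and the real encoding work is replaced by trivial arithmetic.

```python
import threading

def preprocess(frame):
    # Step 1: single-threaded preprocessing/lookahead (placeholder: copy).
    return list(frame)

def split_slices(frame, num_slices):
    # Divide the frame into num_slices contiguous, near-equal slices.
    base, extra = divmod(len(frame), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        end = start + base + (1 if i < extra else 0)
        slices.append(frame[start:end])
        start = end
    return slices

def encode_slice(rows, out, idx):
    # Step 2 worker: "encode" one slice (placeholder: sum its values).
    out[idx] = sum(rows)

def encode_frame_sliced(frame, num_slices):
    slices = split_slices(preprocess(frame), num_slices)
    out = [None] * num_slices
    # Step 2: spawn one short-lived thread per slice.
    threads = [threading.Thread(target=encode_slice, args=(s, out, i))
               for i, s in enumerate(slices)]
    for t in threads:
        t.start()
    # Step 3: join them all; the frame waits for the slowest slice.
    for t in threads:
        t.join()
    # Step 4: single-threaded post-filtering (placeholder: combine results).
    return sum(out)

if __name__ == "__main__":
    print(encode_frame_sliced(list(range(100)), 6))
```

Since the join in step 3 cannot complete until the last slice finishes, any delay in getting one of the freshly spawned threads onto a CPU shows up directly as frame latency, which is what makes this case interesting for the scheduler.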
>
> To enable this, use --sliced-threads. I'd recommend using a
> higher-resolution clip for this, as it performs atrociously on
> very low resolution videos for reasons you might be able to guess. If
> you need a higher-res clip, check the SD or HD ones here:
> http://media.xiph.org/video/derf/ .

In another 8 hrs 24 min, I'll have a sunflower to stare at.

> I'm personally curious as to what kind of scheduler issues this
> results in--I haven't done any BFS vs CFS tests with this option
> enabled yet.

I'll look for x264 source, and patch/piddle.

-Mike