linux-kernel - Re: x264 benchmarks BFS vs CFS


Message-Id: <1261140546.15591.5.camel@marge.simson.net>
Date:	Fri, 18 Dec 2009 13:49:06 +0100
From:	Mike Galbraith <efault@....de>
To:	Jason Garrett-Glaser <darkshikari@...il.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Kasper Sandberg <lkml@...anurb.dk>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	LKML Mailinglist <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: x264 benchmarks BFS vs CFS

On Fri, 2009-12-18 at 02:11 -0800, Jason Garrett-Glaser wrote:

> Two more thoughts here:
> 
> 1) We're considering moving to a thread pool soon; we already have a
> working patch for it and if anything it'll save a few clocks spent on
> nice()ing threads and other such things.  Will this improve
> START_DEBIT at all?

Yeah, START_DEBIT only affects a thread once.
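
To make the thread pool point concrete, here is a minimal sketch of
persistent workers using POSIX threads.  It is not the x264 patch; the
names pool, pool_init, pool_run, pool_destroy, slice_fn and count_slice
are all hypothetical.  The idea is only that each worker is created
once and reused for every frame, so a one-time cost charged at thread
start is paid once per worker instead of once per frame.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_MAX 16

/* Hypothetical callback: process slice `slice` of `nslices` for one frame. */
typedef void (*slice_fn)(size_t slice, size_t nslices, void *ctx);
typedef struct pool pool;
typedef struct { pool *p; size_t index; } worker_arg;

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t start, done; /* new frame posted / last worker finished */
    pthread_t tid[POOL_MAX];
    worker_arg arg[POOL_MAX];
    size_t nworkers;
    size_t pending;             /* workers still busy on the current frame */
    unsigned long generation;   /* bumped once per frame */
    int shutdown;
    slice_fn fn;
    void *ctx;
};

/* Each worker is created once and then sleeps between frames. */
static void *worker_main(void *raw)
{
    worker_arg *a = raw;
    pool *p = a->p;
    unsigned long seen = 0;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        while (!p->shutdown && p->generation == seen)
            pthread_cond_wait(&p->start, &p->lock);
        if (p->shutdown)
            break;
        seen = p->generation;
        pthread_mutex_unlock(&p->lock);
        /* fn, ctx and nworkers stay fixed until pool_run() returns. */
        p->fn(a->index, p->nworkers, p->ctx);
        pthread_mutex_lock(&p->lock);
        if (--p->pending == 0)
            pthread_cond_signal(&p->done);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

void pool_destroy(pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->shutdown = 1;
    pthread_cond_broadcast(&p->start);
    pthread_mutex_unlock(&p->lock);
    for (size_t i = 0; i < p->nworkers; i++)
        pthread_join(p->tid[i], NULL);
    pthread_cond_destroy(&p->done);
    pthread_cond_destroy(&p->start);
    pthread_mutex_destroy(&p->lock);
}

int pool_init(pool *p, size_t nworkers)
{
    assert(nworkers > 0 && nworkers <= POOL_MAX);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->start, NULL);
    pthread_cond_init(&p->done, NULL);
    p->nworkers = 0; p->pending = 0; p->generation = 0; p->shutdown = 0;
    for (size_t i = 0; i < nworkers; i++) {
        p->arg[i] = (worker_arg){ p, i };
        if (pthread_create(&p->tid[i], NULL, worker_main, &p->arg[i]) != 0) {
            pool_destroy(p);    /* joins the workers started so far */
            return -1;
        }
        p->nworkers = i + 1;
    }
    return 0;
}

/* Run one frame: wake every worker, then block until all have finished. */
void pool_run(pool *p, slice_fn fn, void *ctx)
{
    pthread_mutex_lock(&p->lock);
    p->fn = fn; p->ctx = ctx;
    p->pending = p->nworkers;
    p->generation++;
    pthread_cond_broadcast(&p->start);
    while (p->pending > 0)
        pthread_cond_wait(&p->done, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Hypothetical stand-in for per-slice work: count calls per slice. */
static void count_slice(size_t slice, size_t nslices, void *ctx)
{
    (void)nslices;
    ((int *)ctx)[slice]++;
}
```

A caller would do pool_init(&p, 4) once, call pool_run(&p, count_slice,
hits) for each frame, and pool_destroy(&p) at exit.  Build with
-pthread.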

>   I've attached the beta patch if you want to try
> it.  Note this also works with 2) as well, so it adds yet another
> dimension to what's mentioned below.
> 
> 2) We recently implemented a new threading model which may be
> interesting to test as well.  This threading model gives worse
> compression *and* performance, but has one benefit: it adds zero
> latency, whereas normal threading adds a full frame of latency per
> thread.  This was paid for by a company interested in
> ultra-low-latency streaming applications, where 1 millisecond is a
> huge deal.  I've been thinking this might be interesting to bench from
> a kernel perspective as well, as when you're spawning a half-dozen
> threads and need them all done within 6 milliseconds, you start
> getting down to serious scheduler issues.
> 
> The new threading model is much less complex than the regular one and
> works as follows.  The frame is split into X slices, and each slice
> encoded with one thread.  Specifically, it works via the following
> process:
> 
> 1.  Preprocess input frame, perform lookahead analysis on input frame
> (all singlethreaded)
> 2.  Split up a ton of threads to do the main encode, one per slice.
> 3.  Join all the threads.
> 4.  Do post-filtering on the output frame, return.
> 
> Clearly this is an utter disaster, since it spawns N times as many
> threads as the old threading model *and* they last far shorter, *and*
> only part of the application is multithreaded.  But there's not really
> a better way to do low-latency threading, and it's an interesting
> challenge to boot.  IIRC, it's also the way ffmpeg's encoder threading
> works.  It's widely considered an inferior model, but as mentioned
> before, in this particular use-case there's no choice.
> 
> To enable this, use --sliced-threads.  I'd recommend using a
> higher-resolution clip for this, as it performs atrociously on
> very low resolution videos for reasons you might be able to guess.  If
> you need a higher-res clip, check the SD or HD ones here:
> http://media.xiph.org/video/derf/ .

In another 8 hrs 24 min, I'll have a sunflower to stare at.
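
For reference while benchmarking, the four steps quoted above
(preprocess, one thread per slice, join, post-filter) can be sketched
with POSIX threads roughly as below.  This is an illustration, not
x264's code: slice_job, encode_slice and encode_frame_sliced are
hypothetical names, and the "encode" is a placeholder that inverts
pixel values.  What matters to the scheduler is the shape: a burst of
short-lived threads per frame, all of which must finish before the
frame can be returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLICES 64

/* Hypothetical per-slice job: a contiguous band of rows of one frame. */
typedef struct {
    const uint8_t *src;   /* input frame (read-only) */
    uint8_t *dst;         /* output frame */
    size_t width;
    size_t row_begin;     /* first row of this slice */
    size_t row_end;       /* one past the last row of this slice */
} slice_job;

/* Placeholder for the real per-slice encode: invert every pixel. */
static void *encode_slice(void *arg)
{
    slice_job *job = arg;
    for (size_t y = job->row_begin; y < job->row_end; y++)
        for (size_t x = 0; x < job->width; x++) {
            size_t i = y * job->width + x;
            job->dst[i] = (uint8_t)(255 - job->src[i]);
        }
    return NULL;
}

/* Encode one frame; returns 0 on success, -1 if a thread could not start. */
int encode_frame_sliced(const uint8_t *src, uint8_t *dst,
                        size_t width, size_t height, size_t nslices)
{
    pthread_t tid[MAX_SLICES];
    slice_job job[MAX_SLICES];
    size_t started = 0;
    int rc = 0;

    assert(nslices > 0 && nslices <= MAX_SLICES);

    /* Step 1: single-threaded preprocessing and lookahead would go here. */

    /* Step 2: spawn one short-lived thread per slice. */
    for (size_t i = 0; i < nslices; i++) {
        job[i] = (slice_job){ src, dst, width,
                              height * i / nslices,
                              height * (i + 1) / nslices };
        if (pthread_create(&tid[i], NULL, encode_slice, &job[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Step 3: join every thread that was started. */
    for (size_t i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    /* Step 4: single-threaded post-filtering would go here. */
    return rc;
}
```

Calling encode_frame_sliced(src, dst, width, height, 4) once per frame
reproduces the create/join churn described in the quoted text.  Build
with -pthread.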

> I'm personally curious as to what kind of scheduler issues this
> results in--I haven't done any BFS vs CFS tests with this option
> enabled yet.

I'll look for x264 source, and patch/piddle.

	-Mike
