Date:	Tue, 12 Feb 2008 10:23:26 +0100
From:	Mike Galbraith <efault@....de>
To:	Olof Johansson <olof@...om.net>
Cc:	Willy Tarreau <w@....eu>, linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived
	threads


On Mon, 2008-02-11 at 14:31 -0600, Olof Johansson wrote:
> On Mon, Feb 11, 2008 at 08:58:46PM +0100, Mike Galbraith wrote:

> > It shouldn't really matter whether you yield or not.  Yielding should
> > reduce the number of non-work spin cycles wasted awaiting preemption
> > while the threads execute in series (the problem), and should improve
> > your performance numbers, but not beyond single-threaded.
> > 
> > If I plugged a yield into the busy wait, I would expect to see a large
> > behavioral difference due to yield implementation changes, but that
> > would only be a symptom in this case, no?  Yield should be a noop.
> 
> Exactly. It made a big impact on the first testcase from Friday, where
> the spun-off thread spent the bulk of its time in the busy-wait loop,
> with only a very small initial workload loop.  The yield thus passed
> the CPU over to the other thread, which got a chance to run the small
> workload, followed by a quick finish by both of them.  The better
> model spends the bulk of its time in the first workload loop, so
> yielding doesn't gain nearly as much.

There is a strong dependency on execution order in this testcase.
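
Roughly the shape I'm talking about, stripped down (only a sketch with
made-up names, not the attached threadtest.c): each side does a workload
loop, flags completion, then busy-waits on the other.  If both threads
get stacked on one CPU and run in series, the waiting side burns a full
timeslice before its peer even gets to start.

#include <pthread.h>
#include <sched.h>

static volatile int parent_done, child_done;

static void workload(long loops)
{
	volatile long sink = 0;

	while (loops--)
		sink += loops;
}

static void *child(void *arg)
{
	workload((long)arg);		/* child-side workload */
	child_done = 1;
	while (!parent_done)
		;			/* optionally sched_yield() here */
	return NULL;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, child, (void *)1000000L);
	workload(1000000L);		/* parent-side workload */
	parent_done = 1;
	while (!child_done)
		;			/* optionally sched_yield() here */
	pthread_join(tid, NULL);
	return 0;
}

Yielding in the busy wait only buys much when that wait dominates, which
is the point above.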

Between CPU affinity and giving the child a little head start to reduce
the chance of the busy wait (100% if the child wakes on the same CPU and
doesn't preempt the parent), the modified testcase behaves.  I don't
think I should need the CPU affinity, but I do.
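
For reference, the affinity bit is nothing fancy, roughly this (again
only a sketch, helper name and CPU numbers made up, not necessarily what
the attached testcase does): parent pins itself to one CPU, the child to
another, so the two can really run in parallel.

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to a single CPU.  The idea: the parent calls
 * pin_to_cpu(0), the child calls pin_to_cpu(1), so they are spread
 * instead of being stacked on one runqueue. */
static void pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	sched_setaffinity(0, sizeof(set), &set);	/* pid 0 == calling thread */
}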

If you plunk a usleep(1) in just prior to calling thread_func(), does
your testcase's performance change radically?  If so, I wonder whether
the real application has the same kind of dependency.
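
I.e. roughly this, wherever your main thread kicks things off (I'm
guessing at the surrounding code; usleep() wants <unistd.h>):

	usleep(1);	/* tiny head start for the freshly created child */
	thread_func();	/* then the parent-side work as before */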

	-Mike

View attachment "threadtest.c" of type "text/x-csrc" (2030 bytes)
