Message-ID: <20070919195633.GA23595@elte.hu>
Date:	Wed, 19 Sep 2007 21:56:33 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Chuck Ebbert <cebbert@...hat.com>,
	Antoine Martin <antoine@...afix.co.uk>,
	Satyam Sharma <satyam.sharma@...il.com>,
	Linux Kernel Development <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: CFS: some bad numbers with Java/database threading [FIXED]


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> > Linus, what do you think? I have no strong feelings, I think the 
> > patch cannot hurt (it does not change anything by default) - but we 
> > should not turn the workaround flag on by default.
> 
> I disagree. I think CFS made "sched_yield()" worse, and what you call 
> "bug workaround" is likely the *better* behaviour.
> 
> The fact is, sched_yield() is not - and should not be - about 
> "recalculating the position in the scheduler queue" like you do now in 
> CFS.
> 
> It very much is about moving the thread *dead last* within its 
> priority group.
[...]
> and quite frankly, the current CFS behaviour simply looks buggy. It 
> should simply not move it to the "right place" in the rbtree. It 
> should move it *last*.

ok, we can do that.

the O(1) implementation of yield() was pretty arbitrary: it did not move 
the yielding task last within its priority level - it only requeued it 
within the active array. So expired tasks (such as CPU hogs that had used 
up their timeslice) would still come _after_ a yield()-ing task.

so the yield() implementation was tied so closely to the data structures 
of the O(1) scheduler that it was impossible to fully emulate in CFS.
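
for reference, a rough user-space model of that O(1) yield path (names 
and structures made up for illustration, not the actual kernel code) - it 
only ever requeues within the active list:

/* sketch only: simplified model of the O(1) yield described above */
#include <stddef.h>

struct task {
	struct task *next;
};

/*
 * one FIFO per priority level; the O(1) scheduler kept an 'active' and
 * an 'expired' array of these.
 */
struct prio_queue {
	struct task *head;
};

/*
 * O(1)-style yield: requeue the task at the tail of its priority list in
 * the *active* array only. Tasks already moved to the expired array run
 * only after the whole active array drains - i.e. after the yielder.
 */
static void o1_style_yield(struct prio_queue *active, struct task *p)
{
	struct task **link = &active->head;

	/* unlink p from wherever it currently sits */
	while (*link && *link != p)
		link = &(*link)->next;
	if (!*link)
		return;
	*link = p->next;

	/* append it at the tail of the same (active) list */
	p->next = NULL;
	while (*link)
		link = &(*link)->next;
	*link = p;
}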

in CFS we don't have a per-nice-level rbtree, so we cannot move the task 
dead last within its own priority group - but we can move it dead last in 
the whole tree. (it would then be queued even after nice +19 tasks.) 
People might complain about _that_.
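
as a sketch (a flat array of keys standing in for the rbtree, names made 
up), 'dead last in the whole tree' just means giving the yielding task a 
sort key beyond the current rightmost one:

/*
 * sketch only: models the 'move it dead last in the whole tree' idea
 * with a plain array of keys instead of the real CFS rbtree.
 */
#include <stdint.h>

struct model_task {
	uint64_t key;		/* stands in for the per-task sort key */
};

/*
 * aggressive yield: give the yielding task a key larger than that of the
 * current rightmost (largest-key) task, so reinserting it puts it behind
 * everything currently queued - nice +19 tasks included.
 */
static void yield_dead_last(struct model_task *tasks, int nr,
			    struct model_task *yielder)
{
	uint64_t max_key = yielder->key;
	int i;

	for (i = 0; i < nr; i++)
		if (tasks[i].key > max_key)
			max_key = tasks[i].key;

	yielder->key = max_key + 1;	/* now sorts last in the whole queue */
}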

another practical problem is that this will break certain desktop apps 
that call yield() [some firefox plugins do that, some 3D apps do that, 
etc.] but don't expect to be moved 'very late' into the queue - they 
expect the O(1) scheduler's behavior of being delayed "a bit". 
(That's why i added the yield-granularity tunable.)
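
the usage pattern i mean is roughly this (illustrative only, not lifted 
from any particular app):

/* spin-then-yield polling loop of the kind some plugins/3D apps use */
#include <sched.h>

static volatile int ready;

static void wait_for_ready(void)
{
	/*
	 * the app expects sched_yield() to mean "let others run for a
	 * moment, then get back to me soon". With an aggressive yield it
	 * ends up behind every runnable task, so under load this loop
	 * makes much less progress than the author expected.
	 */
	while (!ready)
		sched_yield();
}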

we can make yield super-aggressive - that is pretty much the only sane 
(because well-defined) thing to do, besides turning yield into a NOP - 
but there will be lots of regression reports about lost interactivity 
under load. sched_yield() is a mortally broken API. "fix the app" would 
be the answer, but there will still be lots of complaints.
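
(for completeness, flipping such a workaround sysctl from userspace is 
trivial - the knob path below is a placeholder, not necessarily what the 
final patch will call it:)

/* sketch: toggle the yield-behaviour sysctl at runtime; placeholder name */
#include <stdio.h>

static int set_yield_workaround(int on)
{
	FILE *f = fopen("/proc/sys/kernel/sched_compat_yield", "w");

	if (!f)
		return -1;
	fprintf(f, "%d\n", on);
	return fclose(f);
}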

	Ingo
