[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.0.999.0709191306530.16478@woody.linux-foundation.org>
Date: Wed, 19 Sep 2007 13:28:47 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...e.hu>
cc: Chuck Ebbert <cebbert@...hat.com>,
Antoine Martin <antoine@...afix.co.uk>,
Satyam Sharma <satyam.sharma@...il.com>,
Linux Kernel Development <linux-kernel@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: CFS: some bad numbers with Java/database threading [FIXED]
On Wed, 19 Sep 2007, Ingo Molnar wrote:
>
> in CFS we dont have a per-nice-level rbtree, so we cannot move it dead
> last within the same priority group - but we can move it dead last in
> the whole tree. (then they'd be put even after nice +19 tasks.) People
> might complain about _that_.
Yeah, with reasonably good reason.
The sched_yield() behaviour is actually very well-defined for RT tasks
(now, whether it's a good interface to use or not is still open to debate,
but at least it's a _defined_ interface ;), and I think we should at least
try to approximate that behaviour for normal tasks, even if they aren't
RT.
And the closest you can get is basically to say something that is close to
"sched_yield()" on a non-RT process still means that all
other runnable tasks in that same nice-level will be scheduled
before the task is scheduled again.
and I think that would be the optimal approximation of the RT behaviour.
Now, quite understandably we might not be able to actually get that
*exact* behaviour (since we mix up different nice levels), but I think
from a QoI standpoint we should really have that as a "target" behaviour.
So I think it should be moved back in the RB-tree, but really preferably
only back to behind all other processes at the same nice-level.
So I think we have two problems with yield():
- it really doesn't have very nice/good semantics in the first place for
anything but strict round-robin RT tasks.
We can't do much about this problem, apart from trying to make people
use proper locking and other models that do *not* depend on yield().
- Linux has made the problem a bit worse by then having fairly arbitrary
and different semantics over time.
This is where I think it would be a good idea to try to have the above
kind of "this is the ideal behaviour - we don't guarantee it being
precise, but at least we'd have some definition of what people should
be able to expect is the "best" behaviour.
So I don't think a "globally last" option is at all the best thing, but I
think it's likely better than what CFS does now. And if we then have some
agreement on what would be considered a further _improvement_, then that
would be a good thing - maybe we're not perfect, but at least having a
view of what we want to do is a good idea.
Btw, the "enqueue at the end" could easily be a statistical thing, not an
absolute thing. So it's possible that we could perhaps implement the CFS
"yield()" using the same logic as we have now, except *not* calling the
"update_stats()" stuff:
__dequeue_entity(..);
__enqueue_entity(..);
and then just force the "fair_key" of the to something that
*statistically* means that it should be at the end of its nice queue.
I dunno.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists