Message-ID: <47545F88.6070609@nortel.com>
Date: Mon, 03 Dec 2007 13:56:56 -0600
From: "Chris Friesen" <cfriesen@...tel.com>
To: davids@...master.com
CC: Nick Piggin <nickpiggin@...oo.com.au>, Ingo Molnar <mingo@...e.hu>,
"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: sched_yield: delete sysctl_sched_compat_yield
David Schwartz wrote:
> Chris Friesen wrote:
> If CFS really can't support sched_yield's semantics, then it should just
> not, and that's that. Return ENOSYS and admit that the behavior sched_yield
> is documented to have simply can't be supported by the scheduler.
That's just it though... sched_yield() with SCHED_OTHER doesn't have
well-defined semantics, so we can do just about anything we want.
The issue is mostly how to work around existing apps that (invalidly)
expect certain behaviour from sched_yield().
>>The problem is where do we insert the task that is yielding? CFS is
>>based around a tree structure ordered by time.
> We put it exactly where we would have when its timeslice ran out. If we can
> reward it a little bit, that's great. But if not, we can live with that.
> Just imagine that the timer interrupt fired to indicate the end of the
> thread's run time when the thread called 'sched_yield'.
CFS doesn't really do "timeslice". But in essence what you are
describing is the default behaviour currently...it simply removes the
task from the tree and reinserts it based on how much cpu time it used up.
> Then what does he do when the task runs out of run time? It's hard to
> imagine we can't do that when the task calls sched_yield.
It gets reinserted into the tree at a position based on how much cpu
time it used. This is exactly the current sched_yield() behaviour.
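As a rough illustration of that remove-and-reinsert behaviour, here is a toy model in Python. It is only a sketch and not the kernel code: a heap stands in for the time-ordered tree, and the names RunQueue, run_and_yield, the task names and the CPU-time numbers are all made up for the example.

```python
import heapq


class RunQueue:
    """Toy stand-in for a time-ordered tree: least CPU time used runs next."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so tasks with equal keys stay FIFO

    def enqueue(self, name, cpu_used):
        heapq.heappush(self._heap, (cpu_used, self._seq, name))
        self._seq += 1

    def pick_next(self):
        cpu_used, _, name = heapq.heappop(self._heap)
        return name, cpu_used


def run_and_yield(rq, ran_for):
    """Run the leftmost task for ran_for units, then 'yield':
    remove it and reinsert it keyed on the CPU time it has used."""
    name, cpu_used = rq.pick_next()
    rq.enqueue(name, cpu_used + ran_for)
    return name


rq = RunQueue()
rq.enqueue("A", 0)   # hypothetical task that has used no CPU yet
rq.enqueue("B", 5)
rq.enqueue("C", 10)

# Task A yields after only 1 unit each time it runs.
order = [run_and_yield(rq, 1) for _ in range(7)]
print(order)
```

In this model a task that yields early has barely advanced its key, so it is picked again right away until it catches up with the next task in the ordering. That is the kind of result that surprises apps expecting sched_yield() to push them to the back of the line.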
>>It just says to delay the task,
>>without specifying how long or what the task is waiting *for*.
> That is not true. The task is waiting for something that will be done by
> another thread that is ready-to-run and at the same priority level. The task
> does not need to wait until the thing is guaranteed done but wishes to wait
> until it is more likely to be done. This is an often-misused but sometimes
> sensible thing to do.
The scheduler still doesn't know specifically what the task is waiting for.
> Sure, if there is more information. But if all you really want to do is wait
> until other threads at the same static priority level have had a chance to
> run, then sched_yield is the right API.
Technically, all of SCHED_OTHER has static priority level zero. Thus
the "right" thing to do is to allow all SCHED_OTHER tasks to run,
including the ones with the highest possible nice level.
This is the alternate implementation in the current code, but it has
latency implications that may be unexpected by applications written for
the previous 2.6 behaviour.
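To show the difference between the two behaviours, here is a small Python sketch. It is a simplified model under stated assumptions, not the kernel implementation: the queue is a plain dict of task name to key, the keys and the 15-unit slice for non-yielding tasks are invented numbers, and the function names yield_default, yield_compat and simulate are hypothetical.

```python
def yield_default(queue, name, ran_for):
    """Default: only charge the yielder for the CPU time it used."""
    queue[name] += ran_for


def yield_compat(queue, name, ran_for):
    """Alternate: charge the time, then place the yielder behind
    every other runnable task in the ordering."""
    queue[name] += ran_for
    others = [key for task, key in queue.items() if task != name]
    if others and queue[name] <= max(others):
        queue[name] = max(others) + 1


def simulate(yield_fn, steps):
    # Hypothetical starting keys; "nice19" models a low-priority task
    # that sits far to the right of the ordering.
    queue = {"yielder": 0, "peer": 40, "nice19": 50}
    order = []
    for _ in range(steps):
        name = min(queue, key=queue.get)  # leftmost task runs next
        order.append(name)
        if name == "yielder":
            yield_fn(queue, name, 1)      # runs 1 unit, then yields
        else:
            queue[name] += 15             # assumed slice for other tasks
    return order


default_order = simulate(yield_default, 4)
compat_order = simulate(yield_compat, 4)
print(default_order)
print(compat_order)
```

With the default variant the yielder keeps running because its key stays the smallest. With the alternate variant it does not run again until every other task, including the nice 19 one, has had a turn, which is where the extra latency comes from.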
Chris