Message-ID: <1271688225.1488.237.camel@laptop>
Date: Mon, 19 Apr 2010 16:43:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Ingo Molnar <mingo@...e.hu>, Mike Galbraith <efault@....de>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Greg Kroah-Hartman <greg@...ah.com>,
Steven Rostedt <rostedt@...dmis.org>,
Jarkko Nikula <jhnikula@...il.com>,
Tony Lindgren <tony@...mide.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC patch] CFS fix place entity spread issue (v2)
On Sun, 2010-04-18 at 09:13 -0400, Mathieu Desnoyers wrote:
OK, so looking purely at the patch:
> Index: linux-2.6-lttng.git/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6-lttng.git.orig/kernel/sched_fair.c 2010-04-18 01:48:04.000000000 -0400
> +++ linux-2.6-lttng.git/kernel/sched_fair.c 2010-04-18 08:58:30.000000000 -0400
> @@ -738,6 +738,14 @@
> unsigned long thresh = sysctl_sched_latency;
>
> /*
> + * Place the woken up task relative to
> + * min_vruntime + sysctl_sched_latency.
> + * We must _never_ decrement min_vruntime, because the effect is
Nobody I could find decrements min_vruntime, and place_entity()
certainly doesn't change it. So this is a totally misguided comment.
> + * that spread increases progressively under the Xorg workload.
> + */
> + vruntime += sysctl_sched_latency;
So in effect you change:
vruntime = max(vruntime, min_vruntime - thresh/2)
into
vruntime = max(vruntime, min_vruntime + thresh/2)
in a non-obvious way and for an unclear reason.
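A minimal sketch of the two wakeup placements, assuming a nice-0 task (so converting thresh into virtual time is the identity) and GENTLE_FAIR_SLEEPERS enabled (thresh halved). The function names are hypothetical, and the wrap-safe helper is modelled on the max_vruntime() helper in sched_fair.c rather than copied from it.

```c
#include <stdint.h>

/* Wrap-safe max of two vruntimes, modelled on max_vruntime() in sched_fair.c.
 * Relies on two's-complement conversion of the u64 difference, as the kernel does. */
static uint64_t wrap_max(uint64_t a, uint64_t b)
{
	return ((int64_t)(b - a) > 0) ? b : a;
}

/* Wakeup placement before the patch: credit the sleeper by latency/2. */
static uint64_t place_orig(uint64_t se_vruntime, uint64_t min_vruntime,
			   uint64_t latency)
{
	uint64_t thresh = latency >> 1;	/* GENTLE_FAIR_SLEEPERS halving */
	uint64_t vruntime = min_vruntime - thresh;

	return wrap_max(se_vruntime, vruntime);
}

/* Wakeup placement with the RFC patch applied. */
static uint64_t place_patched(uint64_t se_vruntime, uint64_t min_vruntime,
			      uint64_t latency)
{
	uint64_t thresh = latency >> 1;	/* GENTLE_FAIR_SLEEPERS halving */
	uint64_t vruntime = min_vruntime + latency;	/* the line the patch adds */

	vruntime -= thresh;
	/* the patch's max_t(u64, vruntime, min_vruntime): plain unsigned compare */
	if (vruntime < min_vruntime)
		vruntime = min_vruntime;

	return wrap_max(se_vruntime, vruntime);
}
```

With a hypothetical min_vruntime of 1000000000 ns, a latency of 6000000 ns and a long sleeper at 900000000, place_orig() returns 997000000 (min_vruntime - latency/2) while place_patched() returns 1003000000 (min_vruntime + latency/2).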
> + /*
> * Convert the sleeper threshold into virtual time.
> * SCHED_IDLE is a special sub-class. We care about
> * fairness only relative to other SCHED_IDLE tasks,
> @@ -755,6 +763,9 @@
> thresh >>= 1;
>
> vruntime -= thresh;
> +
> + /* ensure min_vruntime never go backwards. */
> + vruntime = max_t(u64, vruntime, cfs_rq->min_vruntime);
So the comment doesn't match the code, nor is it correct.
The code tries to clip vruntime to min_vruntime (it never touches
min_vruntime itself), but then botches even that by not taking
wrap-around into account: max_t(u64, ...) compares the raw unsigned
values.
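The wrap-around problem can be illustrated with a small sketch contrasting a plain unsigned max (what max_t(u64, ...) does) with a signed-delta comparison modelled on the max_vruntime() helper in sched_fair.c. Both function names are hypothetical, and the test values are chosen to straddle the u64 wrap point.

```c
#include <stdint.h>

/* What max_t(u64, a, b) amounts to: compare the raw unsigned values. */
static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/*
 * Wrap-safe variant: look at the signed difference, so a vruntime that has
 * just wrapped past zero still counts as "later" than one just below
 * UINT64_MAX. Relies on two's-complement conversion, as the kernel does.
 */
static uint64_t max_vruntime_sketch(uint64_t max_vruntime, uint64_t vruntime)
{
	int64_t delta = (int64_t)(vruntime - max_vruntime);

	if (delta > 0)
		max_vruntime = vruntime;
	return max_vruntime;
}
```

With min_vruntime just below the wrap point (UINT64_MAX - 1000) and a vruntime that is logically 1501 ns ahead of it (numerically 500), the unsigned max keeps the stale value, while the signed-delta version correctly picks 500.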
Now, I know why your patch helps you (it's in effect similar to what
START_DEBIT does for fork()), but getting wakeup preemption to do
something nice along with it is the hard part.
The whole perfectly-fair-scheduling thing is more or less doable
(dealing with tasks that die with non-zero lag gets interesting; you'd
have to start adjusting global-timeline-like things for that). But the
thing is that it makes for rather poor interactive behaviour.
Letting a task that sleeps long and runs short preempt heavier tasks
generally works well. Also, there are a number of apps that get a nice
boost from getting preempted before they can actually block on a
(read-like) system call. That saves a whole scheduler round-trip on the
wakeup side, so ping-pong-like tasks love this too.
And then there is the whole signal delivery muck..
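The wakeup-preemption side can be sketched as follows, modelled on wakeup_preempt_entity() in sched_fair.c of that era: the woken task preempts curr only if it lags curr by more than a wakeup granularity. The struct, the function name and the 1 ms granularity used below are hypothetical simplifications; in the kernel the granularity is derived from sysctl_sched_wakeup_granularity and converted to virtual time.

```c
#include <stdint.h>

/* Hypothetical minimal entity: only the field the check needs. */
struct entity {
	uint64_t vruntime;
};

/*
 * Returns 1 if woken should preempt curr, 0 if it lags curr by no more
 * than gran, and -1 if it does not lag curr at all. gran is assumed to be
 * already expressed in virtual time.
 */
static int wakeup_preempt_sketch(const struct entity *curr,
				 const struct entity *woken, int64_t gran)
{
	int64_t vdiff = (int64_t)(curr->vruntime - woken->vruntime);

	if (vdiff <= 0)
		return -1;
	if (vdiff > gran)
		return 1;
	return 0;
}
```

With curr at 1000000000 ns and a granularity of 1000000 ns, a sleeper placed at min_vruntime - latency/2 (997000000) preempts, whereas the same sleeper placed at min_vruntime + latency/2 (1003000000), as the patch would do, never does, which illustrates why pairing the patch with useful wakeup preemption is the hard part.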