Message-ID: <1271688225.1488.237.camel@laptop>
Date:	Mon, 19 Apr 2010 16:43:45 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Mike Galbraith <efault@....de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Greg Kroah-Hartman <greg@...ah.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Jarkko Nikula <jhnikula@...il.com>,
	Tony Lindgren <tony@...mide.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC patch] CFS fix place entity spread issue (v2)
On Sun, 2010-04-18 at 09:13 -0400, Mathieu Desnoyers wrote:
OK, so looking purely at the patch:
> Index: linux-2.6-lttng.git/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6-lttng.git.orig/kernel/sched_fair.c        2010-04-18 01:48:04.000000000 -0400
> +++ linux-2.6-lttng.git/kernel/sched_fair.c     2010-04-18 08:58:30.000000000 -0400
> @@ -738,6 +738,14 @@
>                 unsigned long thresh = sysctl_sched_latency;
>  
>                 /*
> +                * Place the woken up task relative to
> +                * min_vruntime + sysctl_sched_latency.
> +                * We must _never_ decrement min_vruntime, because the effect is
Nobody I could find decrements min_vruntime, and certainly
place_entity() doesn't change min_vruntime. So this is a totally
misguided comment.
> +                * that spread increases progressively under the Xorg workload.
> +                */
> +               vruntime += sysctl_sched_latency;
So in effect you change: 
  vruntime = max(vruntime, min_vruntime - thresh/2)
into
  vruntime = max(vruntime, min_vruntime + thresh/2)
in a non-obvious way and for no clear reason.
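For illustration only, a minimal C sketch of the two placement rules above. It is a simplified model of place_entity() for a waking task, assuming the threshold is halved (the `thresh >>= 1` in the quoted context) and ignoring any weight-based scaling of it, so the halved threshold is just latency/2. The helper names (later_vruntime, place_wakeup_old, place_wakeup_patched) and the numbers in the note below are hypothetical.

```c
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Hypothetical helper: the later of two vruntimes, compared via the
 * signed difference so that u64 wrap-around is tolerated. */
static inline u64 later_vruntime(u64 a, u64 b)
{
	return (s64)(b - a) > 0 ? b : a;
}

/*
 * Wakeup placement before the patch (simplified):
 *   vruntime = max(vruntime, min_vruntime - thresh/2)
 */
static inline u64 place_wakeup_old(u64 min_vruntime, u64 se_vruntime, u64 latency)
{
	u64 vruntime = min_vruntime - latency / 2;

	return later_vruntime(se_vruntime, vruntime);
}

/*
 * Same model with the RFC patch applied: bump by the full latency,
 * subtract the halved threshold, then clip to min_vruntime.  Net effect:
 *   vruntime = max(vruntime, min_vruntime + thresh/2)
 */
static inline u64 place_wakeup_patched(u64 min_vruntime, u64 se_vruntime, u64 latency)
{
	u64 vruntime = min_vruntime + latency;

	vruntime -= latency / 2;
	/* the patch's plain unsigned clip, as max_t(u64, ...) would do */
	if (vruntime < min_vruntime)
		vruntime = min_vruntime;

	return later_vruntime(se_vruntime, vruntime);
}
```

With min_vruntime = 100 ms, an illustrative latency of 6 ms and a long sleeper whose stale vruntime is 50 ms (all in ns), the old rule places the task at 97 ms and the patched rule at 103 ms, i.e. behind the queue instead of slightly ahead of it.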
> +               /*
>                  * Convert the sleeper threshold into virtual time.
>                  * SCHED_IDLE is a special sub-class.  We care about
>                  * fairness only relative to other SCHED_IDLE tasks,
> @@ -755,6 +763,9 @@
>                         thresh >>= 1;
>  
>                 vruntime -= thresh;
> +
> +               /* ensure min_vruntime never go backwards. */
> +               vruntime = max_t(u64, vruntime, cfs_rq->min_vruntime);
So the comment doesn't match the code, nor is it correct.
The code tries to clip vruntime to min_vruntime (not min_vruntime
itself), but then botches it by not taking wrap-around into account:
max_t(u64, ...) compares the raw u64 values, so it gets the ordering
wrong once vruntime wraps.
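To make the wrap-around point concrete, here is a small C sketch contrasting a plain unsigned max (what max_t(u64, ...) computes) with a clip that compares via the signed difference, in the style of the scheduler's max_vruntime() helper. The function names are hypothetical and this is an illustration, not the kernel code.

```c
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* What the patch does: a plain unsigned max of the two values. */
static inline u64 clip_unsigned(u64 vruntime, u64 min_vruntime)
{
	return vruntime > min_vruntime ? vruntime : min_vruntime;
}

/*
 * Wrap-safe clip: order the two values by their signed difference,
 * which stays correct across a u64 wrap as long as they are within
 * 2^63 of each other.  Relies on two's-complement conversion to s64,
 * as the kernel does.
 */
static inline u64 clip_wrap_safe(u64 vruntime, u64 min_vruntime)
{
	s64 delta = (s64)(vruntime - min_vruntime);

	return delta > 0 ? vruntime : min_vruntime;
}
```

For example, with min_vruntime = UINT64_MAX - 10 and a vruntime that has wrapped to 5 (logically 16 ns later), clip_unsigned() drags the task back to UINT64_MAX - 10, while clip_wrap_safe() correctly keeps 5.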
Now, I know why your patch helps you (it's in effect similar to what
START_DEBIT does for fork()), but getting the wakeup-preemption to do
something nice along with it is the hard part.
The whole perfectly fair scheduling thing is more-or-less doable
(dealing with tasks that die with non-zero lag gets interesting; you'd
have to start adjusting global-timeline-like things for that). But the
thing is that it makes for rather poor interactive behaviour.
Letting a task that sleeps long and runs short preempt heavier tasks
generally works well. Also, there are a number of apps that get a nice
boost from getting preempted before they can actually block on a
(read-like) system call. That saves a whole scheduler round-trip on the
wakeup side, so ping-pong-like tasks love this too.
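A rough C sketch of why the placement matters for wakeup preemption. It is loosely modeled on wakeup_preempt_entity(): the woken task preempts the current one only if its vruntime is behind by more than a wakeup granularity. The kernel scales that granularity by task weight; here it is a fixed, hypothetical parameter, and placed_old()/placed_patched() are hypothetical helpers for a long sleeper whose stale vruntime is far behind min_vruntime.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Long sleeper placed before the patch: min_vruntime - thresh/2. */
static inline u64 placed_old(u64 min_vruntime, u64 latency)
{
	return min_vruntime - latency / 2;
}

/* Long sleeper placed with the patch: min_vruntime + thresh/2. */
static inline u64 placed_patched(u64 min_vruntime, u64 latency)
{
	return min_vruntime + latency / 2;
}

/*
 * Preempt only if the woken task lags the current one by more than
 * the granularity; the signed difference keeps the test wrap-safe.
 */
static inline bool should_preempt(u64 curr_vruntime, u64 woken_vruntime, u64 gran)
{
	s64 vdiff = (s64)(curr_vruntime - woken_vruntime);

	return vdiff > (s64)gran;
}
```

With illustrative numbers (min_vruntime = 100 ms, latency = 6 ms, current task at 100.5 ms, granularity 1 ms), the old placement at 97 ms preempts the current task, while the patched placement at 103 ms never does, which is exactly the interactive boost being given up.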
And then there is the whole signal delivery muck..
