Message-ID: <1374076741.7412.35.camel@j-VirtualBox>
Date: Wed, 17 Jul 2013 08:59:01 -0700
From: Jason Low <jason.low2@...com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Mike Galbraith <efault@....de>,
Thomas Gleixner <tglx@...utronix.de>,
Paul Turner <pjt@...gle.com>, Alex Shi <alex.shi@...el.com>,
Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Morten Rasmussen <morten.rasmussen@....com>,
Namhyung Kim <namhyung@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Kees Cook <keescook@...omium.org>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
aswin@...com, scott.norton@...com, chegu_vinod@...com
Subject: Re: [RFC] sched: Limit idle_balance() when it is being used too
frequently
Hi Peter,
On Wed, 2013-07-17 at 11:39 +0200, Peter Zijlstra wrote:
> On Wed, Jul 17, 2013 at 01:11:41AM -0700, Jason Low wrote:
> > For the more complex model, are you suggesting that each completion time
> > is the time it takes to complete 1 iteration of the for_each_domain()
> > loop?
>
> Per sd, yes? So higher domains (or lower depending on how you model the thing
> in your head) have bigger CPU spans, and thus take longer to complete. Imagine
> the top domain of a 4096 cpu system, it would go look at all cpus to see if it
> could find a task.
>
> > Based on some of the data I collected, a single iteration of the
> > for_each_domain() loop almost always takes significantly less time than
> > the approximate CPU idle time, even in workloads where idle_balance() is
> > lowering performance. The bigger issue is that it takes so many of these
> > attempts before idle_balance() actually "works" and pulls a task.
>
> I'm confused, so:
>
> schedule()
>   if (!rq->nr_running)
>     idle_balance()
>       for_each_domain(sd)
>         load_balance(sd)
>
> is the entire thing, there's no other loop in there.
So if we have the following:
for_each_domain(sd)
    before = sched_clock_cpu
    load_balance(sd)
    after = sched_clock_cpu
    idle_balance_completion_time = after - before
At this point, "idle_balance_completion_time" is usually very small,
much smaller than the average CPU idle time. However, the vast majority
of the time, load_balance() returns 0, meaning no task was pulled.
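A minimal sketch of the bookkeeping behind these numbers, assuming hypothetical per-domain counters that do not exist in the kernel (struct newidle_stats, record_attempt(), avg_attempt_ns() and failure_pct() are illustrative names): each timed load_balance() call adds its after - before cost, and counts as a failure when it returns 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-domain counters (not kernel fields) for measuring how
 * expensive newidle load_balance() calls are and how often they fail. */
struct newidle_stats {
	uint64_t attempts;  /* timed load_balance() calls */
	uint64_t successes; /* calls that moved at least one task */
	uint64_t total_ns;  /* sum of (after - before) over all calls */
};

/* Record one timed call: before/after are sched_clock_cpu()-style
 * nanosecond timestamps; pulled is load_balance()'s return value. */
void record_attempt(struct newidle_stats *st, uint64_t before,
		    uint64_t after, int pulled)
{
	assert(after >= before); /* both timestamps come from the same CPU clock */
	st->attempts++;
	st->total_ns += after - before;
	if (pulled > 0)
		st->successes++;
}

/* Mean cost of one load_balance() call in ns (0 if nothing recorded). */
uint64_t avg_attempt_ns(const struct newidle_stats *st)
{
	return st->attempts ? st->total_ns / st->attempts : 0;
}

/* Percentage of calls that pulled nothing, i.e. returned 0. */
unsigned int failure_pct(const struct newidle_stats *st)
{
	if (!st->attempts)
		return 0;
	return (unsigned int)(100 * (st->attempts - st->successes) /
			      st->attempts);
}
```

With four calls costing 200, 400, 300 and 100 ns, of which only the third pulls a task, the average cost is 250 ns while 75% of the calls are wasted: cheap per call, but rarely productive.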
> > I was initially thinking of each "completion time" of an idle balance
> > as the total time of all iterations up to the one that successfully
> > pulls a task within each domain.
>
> So you're saying that normally idle_balance() won't find a task to pull? And we
> need many times going newidle before we do get something?
Yes. A while ago I collected some data on the rate at which
idle_balance() fails to pull tasks, and that rate was very high.
> Wouldn't this mean that there simply weren't enough tasks to keep all cpus busy?
If I remember correctly, in many of those load_balance() attempts while
the machine was under a high Java load, there was no "imbalance" between
the groups in each sched_domain.
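A deliberately simplified model of why such attempts come back empty, assuming each sched group is summarised by a single aggregate load value. simple_imbalance() is a hypothetical helper; the kernel's real busiest-group selection also accounts for CPU capacity and other factors. If no group is busier than the local one, there is nothing worth pulling.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: each sched group is summarised by one aggregate load.
 * Returns how much load is worth moving to the local group, or 0 when no
 * group is busier than the local one (no imbalance, so nothing is pulled).
 * Moving half the gap (which roughly equalises two groups of equal
 * capacity) is an illustrative choice, not the kernel's formula. */
unsigned long simple_imbalance(const unsigned long *group_load,
			       size_t nr_groups, size_t local)
{
	size_t busiest = local;

	assert(local < nr_groups);
	for (size_t i = 0; i < nr_groups; i++)
		if (group_load[i] > group_load[busiest])
			busiest = i;

	if (busiest == local)
		return 0; /* local group is already the busiest (or tied) */
	return (group_load[busiest] - group_load[local]) / 2;
}
```

Under an evenly spread load every group looks the same, so the function returns 0 for every domain, which matches a load_balance() that keeps returning 0.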
> If there were tasks we could've pulled, we might need to look at why they
> weren't and maybe fix that. Now it could be that it thinks this cpu, even with
> the (little) idle time it has is sufficiently loaded and we'll get a 'local'
> wakeup soon enough. That's perfectly fine.
>
> What we should avoid is spending more time looking for tasks than we have idle,
> since that reduces the total time we can spend doing useful work. So that is I
> think the critical cut-off point.
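One way to express this cut-off, as a sketch: walk the domains innermost first and stop before the one whose estimated cost, added to what has already been spent, would exceed the expected idle time. avg_idle_ns and est_cost_ns[] are assumed inputs (for example a running average of idle periods and of per-domain balance cost), and affordable_domains() is a hypothetical name. The sketch pessimistically assumes every attempt fails, so costs accumulate across domains.

```c
#include <stddef.h>
#include <stdint.h>

/* Return how many domains (innermost first) a newly idle CPU can afford
 * to try before the cumulative search cost would exceed its expected idle
 * time. Assumes every attempt may fail, so costs add up domain by domain. */
size_t affordable_domains(uint64_t avg_idle_ns, const uint64_t *est_cost_ns,
			  size_t nr_domains)
{
	uint64_t spent = 0;
	size_t i;

	for (i = 0; i < nr_domains; i++) {
		/* Searching this domain would cost more than we expect to be idle. */
		if (spent + est_cost_ns[i] > avg_idle_ns)
			break;
		spent += est_cost_ns[i];
	}
	return i;
}
```

For example, with 10 us of expected idle time and estimated costs of 2, 5 and 20 us, only the first two domains are tried; the wide top-level domain is skipped because searching it would take longer than the CPU is likely to stay idle.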
Do you think it's worth trying to treat each newidle balance attempt as
the total of the load_balance() attempts until one is able to move a task,
and then skip balancing within a domain if a CPU's average idle time is
less than the average time spent doing newidle balance?
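A minimal sketch of that proposal, assuming hypothetical per-domain bookkeeping (struct newidle_cost and its helpers are illustrative, not existing kernel code): failed load_balance() costs accumulate until an attempt moves a task, that sum counts as one complete newidle balance, and the domain is skipped when the CPU's average idle time is below the average complete-balance cost.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain bookkeeping: one "newidle balance" is every
 * load_balance() attempt in this domain up to and including the one that
 * finally moves a task. */
struct newidle_cost {
	uint64_t pending_ns; /* cost of failed attempts since the last success */
	uint64_t episode_ns; /* summed cost of all completed balances */
	uint64_t episodes;   /* number of completed (successful) balances */
};

/* Account one load_balance() attempt that took cost_ns; pulled is the
 * number of tasks it moved. */
void newidle_account(struct newidle_cost *c, uint64_t cost_ns, int pulled)
{
	c->pending_ns += cost_ns;
	if (pulled > 0) {
		/* This attempt completes a balance: close out the episode. */
		c->episode_ns += c->pending_ns;
		c->episodes++;
		c->pending_ns = 0;
	}
}

/* Average cost of a complete newidle balance, or 0 if none has finished. */
uint64_t newidle_avg_cost(const struct newidle_cost *c)
{
	return c->episodes ? c->episode_ns / c->episodes : 0;
}

/* Skip this domain when the CPU is not expected to stay idle for as long
 * as a typical complete balance takes. With no history, never skip. */
bool newidle_should_skip(const struct newidle_cost *c, uint64_t avg_idle_ns)
{
	return c->episodes > 0 && avg_idle_ns < newidle_avg_cost(c);
}
```

The cumulative mean here never forgets old samples; a decaying average would track changes in the workload more quickly, at the cost of being noisier.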
Thanks,
Jason