Date: Wed, 17 Jul 2013 08:59:01 -0700
From: Jason Low <jason.low2@...com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
    Mike Galbraith <efault@....de>, Thomas Gleixner <tglx@...utronix.de>,
    Paul Turner <pjt@...gle.com>, Alex Shi <alex.shi@...el.com>,
    Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
    Vincent Guittot <vincent.guittot@...aro.org>,
    Morten Rasmussen <morten.rasmussen@....com>,
    Namhyung Kim <namhyung@...nel.org>,
    Andrew Morton <akpm@...ux-foundation.org>,
    Kees Cook <keescook@...omium.org>, Mel Gorman <mgorman@...e.de>,
    Rik van Riel <riel@...hat.com>, aswin@...com, scott.norton@...com,
    chegu_vinod@...com
Subject: Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

Hi Peter,

On Wed, 2013-07-17 at 11:39 +0200, Peter Zijlstra wrote:
> On Wed, Jul 17, 2013 at 01:11:41AM -0700, Jason Low wrote:
> > For the more complex model, are you suggesting that each completion time
> > is the time it takes to complete 1 iteration of the for_each_domain()
> > loop?
>
> Per sd, yes? So higher domains (or lower depending on how you model the thing
> in you head) have bigger CPU spans, and thus take longer to complete. Imagine
> the top domain of a 4096 cpu system, it would go look at all cpus to see if it
> could find a task.
>
> > Based on some of the data I collected, a single iteration of the
> > for_each_domain() loop is almost always significantly lower than the
> > approximate CPU idle time, even in workloads where idle_balance is
> > lowering performance. The bigger issue is that it takes so many of these
> > attempts before idle_balance actually "worked" and pulls a tasks.
>
> I'm confused, so:
>
>   schedule()
>     if (!rq->nr_running)
>       idle_balance()
>         for_each_domain(sd)
>           load_balance(sd)
>
> is the entire thing, there's no other loop in there.

So if we have the following:

	for_each_domain(sd)
		before = sched_clock_cpu
		load_balance(sd)
		after = sched_clock_cpu
		idle_balance_completion_time = after - before

At this point, the "idle_balance_completion_time" is usually a very
small value and is usually a lot smaller than the avg CPU idle time.
However, the vast majority of the time, load_balance returns 0.

> > I initially was thinking about each "completion time" of an idle balance
> > as the sum total of the times of all iterations to complete until a task
> > is successfully pulled within each domain.
>
> So you're saying that normally idle_balance() won't find a task to pull? And we
> need many times going newidle before we do get something?

Yes, a while ago, I collected some data on the rate at which
idle_balance() does not pull tasks, and it was a very high number.

> Wouldn't this mean that there simply weren't enough tasks to keep all cpus busy?

If I remember correctly, in a lot of those load_balance attempts when
the machine is under a high Java load, there was no "imbalance" between
the groups in each sched_domain.

> If there were tasks we could've pulled, we might need to look at why they
> weren't and maybe fix that. Now it could be that it things this cpu, even with
> the (little) idle time it has is sufficiently loaded and we'll get a 'local'
> wakeup soon enough. That's perfectly fine.
>
> What we should avoid is spending more time looking for tasks then we have idle,
> since that reduces the total time we can spend doing useful work. So that is I
> think the critical cut-off point.

Do you think it's worth a try to consider each newidle balance attempt as
the total load_balance attempts until it is able to move a task, and
then skip balancing within the domain if a CPU's avg idle time is less
than that avg time doing newidle balance?
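
To make the question concrete, here is a rough user-space sketch of the
bookkeeping I have in mind. This is not kernel code: the struct, the
function names, and the 1/8 weighting of the running average are all
made up for illustration, and it tracks one set of numbers rather than
one per sched_domain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping; none of these names exist in the kernel. */
struct newidle_stats {
	uint64_t avg_idle_ns;      /* running average of the CPU's idle time */
	uint64_t avg_pull_cost_ns; /* running average cost per successful pull */
	uint64_t pending_cost_ns;  /* time spent since the last successful pull */
};

/* Move the average 1/8 of the way toward the new sample (assumed weighting). */
static uint64_t update_avg(uint64_t avg, uint64_t sample)
{
	int64_t diff = (int64_t)sample - (int64_t)avg;

	return (uint64_t)((int64_t)avg + diff / 8);
}

/*
 * Account one load_balance() attempt that took cost_ns. Failed attempts
 * only accumulate; the average is updated once a task is actually moved,
 * so one "completion" covers every attempt leading up to that pull.
 */
static void record_attempt(struct newidle_stats *s, uint64_t cost_ns, int pulled)
{
	s->pending_cost_ns += cost_ns;
	if (pulled > 0) {
		s->avg_pull_cost_ns = update_avg(s->avg_pull_cost_ns,
						 s->pending_cost_ns);
		s->pending_cost_ns = 0;
	}
}

/* Skip newidle balancing when we expect to be idle for less than a pull costs. */
static bool should_skip_newidle(const struct newidle_stats *s)
{
	return s->avg_idle_ns < s->avg_pull_cost_ns;
}
```

With this accounting, a long run of attempts that return 0 raises the
cost of the next completion, so the check starts skipping exactly in
the case where many attempts are needed before anything is pulled.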

Thanks,
Jason