Message-ID: <50EA5D41.4090502@linux.vnet.ibm.com>
Date:	Mon, 07 Jan 2013 10:59:37 +0530
From:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To:	Mike Galbraith <bitbucket@...ine.de>
CC:	LKML <linux-kernel@...r.kernel.org>,
	"svaidy@...ux.vnet.ibm.com" <svaidy@...ux.vnet.ibm.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Amit Kucheria <amit.kucheria@...aro.org>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	Paul McKenney <paul.mckenney@...aro.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
	Venki Pallipadi <venki@...gle.com>,
	Robin Randhawa <robin.randhawa@....com>,
	Lists linaro-dev <linaro-dev@...ts.linaro.org>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	Alex Shi <alex.shi@...el.com>, srikar@...ux.vnet.ibm.com
Subject: Re: sched: Consequences of integrating the Per Entity Load Tracking
 Metric into the Load Balancer

Hi Mike,
Thank you very much for your inputs. Just a few thoughts, so that we are
clear about the problems with scheduler scalability so far, and about
the direction in which we ought to move to correct them.

1. During fork or exec, the scheduler goes through find_idlest_group()
and find_idlest_cpu() in select_task_rq_fair(), iterating through all
domains. Why was a similar approach not followed for wake-up balancing?
What was so different about wake-ups (except that the woken task had to
remain close to the prev/waking cpu) that we had to introduce
select_idle_sibling() in the first place?

2. To the best of my knowledge, the concept of a buddy cpu was
introduced in select_idle_sibling() so as to avoid traversing the entire
package, restricting the search to the buddy cpus alone. But even during
fork or exec we iterate through all the sched domains, as I have
mentioned above. Why did the buddy-cpu solution not come to the rescue
here as well?

3. So the real trade-off stands as: avoid iterating through the entire
package at the cost of being less aggressive in finding an idle cpu, or
iterate through the package with the intention of finding the idlest
cpu. To the best of my understanding, the former is your approach, or
commit 37407ea7; the latter is what I tried to do. But as you have
rightly pointed out, my approach will have scaling issues. In this
light, what does your best_combined patch (below) look like?
Do you introduce a cut-off value on the loads to decide which approach
to take?

Meanwhile, I will also try to run tbench and a few other benchmarks to
find out why the results are as below. I will update you very soon on this.

Thank you

Regards
Preeti U Murthy



On 01/06/2013 10:02 PM, Mike Galbraith wrote:
> On Sat, 2013-01-05 at 09:13 +0100, Mike Galbraith wrote:
> 
>> I still have a 2.6-rt problem I need to find time to squabble with, but
>> maybe I'll soonish see if what you did plus what I did combined works
>> out on that 4x10 core box where current is _so_ unbelievably horrible.
>> Heck, it can't get any worse, and the restricted wake balance alone
>> kinda sorta worked.
> 
> Actually, I flunked copy/paste 101.  Below (preeti) shows the real deal.
> 
> tbench, 3 runs, 30 secs/run
> revert = 37407ea7 reverted
> clients                     1          5         10        20         40         80
> 3.6.0.virgin            27.83     139.50    1488.76   4172.93    6983.71    8301.73
>                         29.23     139.98    1500.22   4162.92    6907.16    8231.13
>                         30.00     141.43    1500.09   3975.50    6847.24    7983.98
> 
> 3.6.0+revert           281.08    1404.76    2802.44   5019.49    7080.97    8592.80
>                        282.38    1375.70    2747.23   4823.95    7052.15    8508.45
>                        270.69    1375.53    2736.29   5243.05    7058.75    8806.72
> 
> 3.6.0+preeti            26.43     126.62    1027.23   3350.06    7004.22    7561.83
>                         26.67     128.66     922.57   3341.73    7045.05    7662.18
>                         25.54     129.20    1015.02   3337.60    6591.32    7634.33
> 
> 3.6.0+best_combined    280.48    1382.07    2730.27   4786.20    6477.28    7980.07
>                        276.88    1392.50    2708.23   4741.25    6590.99    7992.11
>                        278.92    1368.55    2735.49   4614.99    6573.38    7921.75
> 
> 3.0.51-0.7.9-default   286.44    1415.37    2794.41   5284.39    7282.57   13670.80
> 
> Something is either wrong with 3.6 itself, or the config I'm using, as
> max throughput is nowhere near where it should be (see default).  On the
> bright side, integrating the two does show some promise.
> 
> -Mike
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/