linux-kernel - Re: sched: Consequences of integrating the Per Entity Load Tracking Metric into the Load Balancer

Message-ID: <50ECE097.7010609@linux.vnet.ibm.com>
Date:	Wed, 09 Jan 2013 08:44:31 +0530
From:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To:	Vincent Guittot <vincent.guittot@...aro.org>,
	Mike Galbraith <bitbucket@...ine.de>
CC:	Matthew Garrett <mjg59@...f.ucam.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"svaidy@...ux.vnet.ibm.com" <svaidy@...ux.vnet.ibm.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Amit Kucheria <amit.kucheria@...aro.org>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	Paul McKenney <paul.mckenney@...aro.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
	Venki Pallipadi <venki@...gle.com>,
	Robin Randhawa <robin.randhawa@....com>,
	Lists linaro-dev <linaro-dev@...ts.linaro.org>,
	Alex Shi <alex.shi@...el.com>
Subject: Re: sched: Consequences of integrating the Per Entity Load Tracking
 Metric into the Load Balancer

>>>> Here comes the point of making both load balancing and wake up
>>>> balance (select_idle_sibling) cooperative. How about we always schedule
>>>> the woken up task on the prev_cpu? This seems more sensible considering
>>>> load balancing considers blocked load as being a part of the load of cpu2.
>>>
>>> Hi Preeti,
>>>
>>> I'm not sure that we want such steady state at cores level because we
>>> take advantage of migrating wake up tasks between cores that share
>>> their cache as Matthew demonstrated. But I agree that reaching such
>>> steady state at cluster and CPU level is interesting.
>>>
>>> IMHO, you're right that taking the blocked load into consideration
>>> should minimize task migration between clusters but it should not
>>> prevent fast task migration between cores that share their cache
>>
>> True, Vincent. But I think the one disadvantage even at cpu or cluster
>> level is that when we consider blocked load, we might prevent any more
>> tasks from being scheduled on that cpu during periodic load balance if
>> the blocked load is too much. This is very poor cpu utilization.
> 
> The blocked load of a cluster will be high if the blocked tasks have
> run recently. The contribution of a blocked task will be divided by 2
> each 32ms, so it means that a high blocked load will be made of recent
> running tasks and the long sleeping tasks will not influence the load
> balancing.
> The load balance period is between 1 tick (10ms for idle load balance
> on ARM) and up to 256 ms (for busy load balance) so a high blocked
> load should imply some tasks that have run recently otherwise your
> blocked load will be small and will not have a large influence on your
> load balance

Makes a lot of sense.
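
To put rough numbers on the decay described above, here is a minimal
sketch in C. It assumes only what the quoted text states, that a blocked
contribution halves every 32 ms, and it decays in whole 32 ms steps; the
real metric decays more smoothly. The name toy_decayed_load() and the
sample load of 1024 are made up for illustration and are not kernel
symbols.

```c
#include <assert.h>
#include <limits.h>

/*
 * Illustration only: remaining contribution of a blocked task after it
 * has slept for slept_ms, assuming the contribution halves once per
 * half_life_ms (32 ms in the discussion above). Decay is applied in
 * whole half-life steps, so this is a coarse approximation.
 */
static unsigned long toy_decayed_load(unsigned long load,
                                      unsigned int slept_ms,
                                      unsigned int half_life_ms)
{
    unsigned int halvings;

    assert(half_life_ms != 0);
    halvings = slept_ms / half_life_ms;

    /* Shifting by the full width of the type is undefined in C. */
    if (halvings >= sizeof(load) * CHAR_BIT)
        return 0;

    return load >> halvings;
}
```

With these assumptions, toy_decayed_load(1024, 32, 32) gives 512 and
toy_decayed_load(1024, 256, 32) gives 4, so a task that has slept for
one full 256 ms busy balance interval contributes almost nothing.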

>> Also we can consider steady states only if the waking tasks have a
>> specific waking pattern. I am not sure if we can risk hoping that the
>> blocked task would wake up soon or would wake up at time 'x' and
>> utilize that cpu.
> 
> Ok, so you no longer plan to use blocked load in load balancing?

Hmm, this has got me thinking. I thought the existing
select_idle_sibling() problem, bouncing tasks all over the L3 package
and taking time to find an idle buddy, could be solved in isolation with
PJT's metric. But that does not seem to be the case considering the
suggestions by you and Mike.

Currently there are so many approaches proposed to improve the scheduler
that it is confusing how the pieces fit together and which ones fit
well. Let me lay them down. Please do help me put them together.

Jigsaw Piece 1: Use PJT's metric in load balancing, and count blocked
load + runnable load as part of cpu load while load balancing.

Jigsaw Piece 2: select_idle_sibling() choosing the cpu to wake up tasks on.

Jigsaw Piece 3: 'cpu buddy' concept to prevent bouncing of tasks.

Considering both your and Mike's suggestions, what do you guys think of
the following puzzle and solution?

*Puzzle*: Waking tasks should not take too long to find a cpu to run
on, should not keep bouncing across too many cpus all over the package,
and should, as far as possible, not create too much of an imbalance in
the load distribution of the cpus.

*Solution:*

Place Jigsaw Piece 1 first: Use PJT's metric and blocked load + runnable
load as part of cpu load while load balancing.
(As time passes the blocked load becomes less significant on that
cpu, hence load balancing will go on as usual.)

Place Jigsaw Piece 2 next: When tasks wake up, **use
select_idle_sibling() only to see if you can migrate tasks between cores
that share their cache**.
IOW, see if the cpu at the lowest level sched domain is idle. If it is,
then schedule on it, and migrate_task_rq_fair() will remove the load
from the prev_cpu. If it is not idle, then return prev_cpu, which had
already considered the blocked load as part of its overall load. Hence
very little imbalance will be created.
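
The wake-up placement above could be sketched roughly as follows. This
is a toy model in C, not kernel code: struct toy_cpu, toy_cpu_load(),
toy_select_wake_cpu() and toy_wake_task() are hypothetical names, the
siblings array stands in for the cpus of prev_cpu's lowest level sched
domain, and preferring an idle prev_cpu before looking at its siblings
is my own assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a cpu; not the kernel's struct rq. */
struct toy_cpu {
    bool idle;                   /* nothing is running on it */
    unsigned long runnable_load; /* load of runnable tasks */
    unsigned long blocked_load;  /* load of tasks sleeping here */
};

/* Piece 1: cpu load seen by the load balancer includes blocked load. */
static unsigned long toy_cpu_load(const struct toy_cpu *c)
{
    return c->runnable_load + c->blocked_load;
}

/*
 * Piece 2: probe only the cache-sharing siblings of prev_cpu. Return an
 * idle one if it exists, otherwise fall back to prev_cpu.
 */
static int toy_select_wake_cpu(const struct toy_cpu *cpus, int prev_cpu,
                               const int *siblings, size_t nr_siblings)
{
    size_t i;

    assert(cpus != NULL && prev_cpu >= 0);

    if (cpus[prev_cpu].idle)     /* assumption: avoid migrating at all */
        return prev_cpu;

    for (i = 0; i < nr_siblings; i++)
        if (cpus[siblings[i]].idle)
            return siblings[i];

    return prev_cpu;
}

/*
 * Account for the wake-up: the blocked contribution leaves prev_cpu
 * (the role played by migrate_task_rq_fair() on migration) and the
 * task becomes runnable on new_cpu.
 */
static void toy_wake_task(struct toy_cpu *cpus, int prev_cpu, int new_cpu,
                          unsigned long task_load)
{
    unsigned long *blocked = &cpus[prev_cpu].blocked_load;

    *blocked -= (task_load < *blocked) ? task_load : *blocked;
    cpus[new_cpu].runnable_load += task_load;
    cpus[new_cpu].idle = false;
}
```

In this model, when the task stays on prev_cpu its load only moves from
blocked to runnable, so toy_cpu_load() of prev_cpu does not change. That
is the sense in which very little imbalance is created.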


*Possible End Picture*

Waking tasks will not take long to find a cpu since we are probing
the cpus at only one sched domain level. The bouncing of tasks will be
restricted to the core level. An imbalance will not be created as the
blocked load is also considered while load balancing.

*Concerns*

1. Is the wake up load balancing in this solution so much less
aggressive that it harms throughput significantly?
2. Do we need Jigsaw Piece 3 at all?

Please do let me know what you all think. Thank you very much for your
suggestions.
>
> regards,
> Vincent

Regards
Preeti U Murthy


