linux-kernel - Re: sched: Consequences of integrating the Per Entity Load Tracking Metric into the Load Balancer

Message-ID: <CAKfTPtB4ABqkB=x6sUzCmvJCj6p+RaFKpGYneNt+zASyL-oU0w@mail.gmail.com>
Date:	Tue, 8 Jan 2013 15:04:12 +0100
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
Cc:	Mike Galbraith <bitbucket@...ine.de>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"svaidy@...ux.vnet.ibm.com" <svaidy@...ux.vnet.ibm.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Amit Kucheria <amit.kucheria@...aro.org>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	Paul McKenney <paul.mckenney@...aro.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
	Venki Pallipadi <venki@...gle.com>,
	Robin Randhawa <robin.randhawa@....com>,
	Lists linaro-dev <linaro-dev@...ts.linaro.org>
Subject: Re: sched: Consequences of integrating the Per Entity Load Tracking
 Metric into the Load Balancer

On 8 January 2013 07:06, Preeti U Murthy <preeti@...ux.vnet.ibm.com> wrote:
> On 01/07/2013 09:18 PM, Vincent Guittot wrote:
>> On 2 January 2013 05:22, Preeti U Murthy <preeti@...ux.vnet.ibm.com> wrote:
>>> Hi everyone,
>>> I have been looking at how different workloads react when the per entity
>>> load tracking metric is integrated into the load balancer and what are
>>> the possible reasons for it.
>>>
>>> I had posted the integration patch earlier:
>>> https://lkml.org/lkml/2012/11/15/391
>>>
>>> Essentially what I am doing is:
>>> 1. I have disabled CONFIG_FAIR_GROUP_SCHED to make the analysis simple.
>>> 2. I have replaced cfs_rq->load.weight in weighted_cpuload() with
>>> cfs.runnable_load_avg, the active load tracking metric.
>>> 3. I have replaced se.load.weight in task_h_load() with
>>> se.load.avg.contrib, the per entity load tracking metric.
>>> 4. The load balancer will end up using these metrics.
>>>
>>> After conducting experiments on several workloads I found that the
>>> performance of the workloads with the above integration neither
>>> improved nor deteriorated, and this observation was consistent.
>>>
>>> Ideally the performance should have improved, considering that the metric
>>> tracks load better.
>>>
>>> Let me explain with a simple example why we should ideally see a
>>> performance improvement: consider one 80% task and two 40% tasks.
>>>
>>> With integration:
>>> ----------------
>>>
>>>        40%
>>> 80%    40%
>>> cpu1  cpu2
>>>
>>> The above will be the scenario when the tasks fork initially. This is
>>> a perfectly balanced system with the load properly distributed across
>>> the cpus, hence no more load balancing.
>>>
>>> Without integration
>>> -------------------
>>>
>>> 40%                               40%
>>> 80%    40%                 80%    40%
>>> cpu1   cpu2        OR     cpu1   cpu2
>>>
>>> Because the view is that all the tasks have the same load, the load
>>> balancer could ping pong tasks between these two situations.
>>>
>>> When I performed this experiment, I did not see an improvement in
>>> performance in the former case, though. On further observation I found
>>> that the following was actually happening.
>>>
>>> With integration
>>> ----------------
>>>
>>> Initially         40% task sleeps      40% task wakes up
>>>                                        and select_idle_sibling()
>>>                                        decides to wake it up on cpu1
>>>
>>>        40%   ->                   ->   40%
>>> 80%    40%        80%    40%           80%      40%
>>> cpu1  cpu2        cpu1   cpu2          cpu1     cpu2
>>>
>>>
>>> This makes load balance trigger movement of the 40% task from cpu1 back
>>> to cpu2. Hence the stability that the load balancer was trying to achieve
>>> is gone, and the culprit boils down to select_idle_sibling. How is it the
>>> culprit and how is it hindering performance of the workloads?
>>>
>>> *What is the way ahead with the per entity load tracking metric in the
>>> load balancer then?*
>>>
>>> In replies to a post by Paul in https://lkml.org/lkml/2012/12/6/105,
>>> he mentions the following:
>>>
>>> "It is my intuition that the greatest carnage here is actually caused
>>> by wake-up load-balancing getting in the way of periodic in
>>> establishing a steady state. I suspect more mileage would result from
>>> reducing the interference wake-up load-balancing has with steady
>>> state."
>>>
>>> "The whole point of using blocked load is so that you can converge on a
>>> steady state where you don't NEED to move tasks.  What disrupts this is
>>> we naturally prefer idle cpus on wake-up balance to reduce wake-up
>>> latency. I think the better answer is making these two processes load
>>> balancing() and select_idle_sibling() more co-operative."
>>>
>>> I had not realised how this would happen until I saw it happening in the
>>> above experiment.
>>>
>>> Based on what Paul explained above, let us use the runnable load + the
>>> blocked load for calculating the load on a cfs runqueue rather than just
>>> the runnable load (which is what I am doing now) and see its consequence.
>>>
>>> Initially:       40% task sleeps
>>>
>>>        40%
>>> 80%    40%   ->  80%  40%
>>> cpu1   cpu2     cpu1  cpu2
>>>
>>> So initially the load on cpu1 is, say, 80 and on cpu2 it is also
>>> 80. Balanced. Now when the 40% task sleeps, the total load on cpu2 =
>>> runnable load + blocked load, which is still 80.
>>>
>>> As a consequence, firstly, during periodic load balancing the load is not
>>> moved from cpu1 to cpu2 when the 40% task sleeps. (It sees the load on
>>> cpu2 as 80 and not as 40.)
>>> Hence the above scenario remains the same. On wake up, what happens?
>>>
>>> Here comes the point of making both load balancing and wake up
>>> balance (select_idle_sibling) co-operative. How about we always schedule
>>> the woken up task on the prev_cpu? This seems more sensible considering
>>> load balancing considers blocked load as being a part of the load of cpu2.
>>
>> Hi Preeti,
>>
>> I'm not sure that we want such a steady state at core level because we
>> take advantage of migrating waking tasks between cores that share
>> their cache, as Matthew demonstrated. But I agree that reaching such a
>> steady state at cluster and CPU level is interesting.
>>
>> IMHO, you're right that taking the blocked load into consideration
>> should minimize task migration between clusters, but it should not
>> prevent fast task migration between cores that share their cache.
>
> True, Vincent. But I think the one disadvantage even at cpu or cluster
> level is that when we consider blocked load, we might prevent any more
> tasks from being scheduled on that cpu during periodic load balance if
> the blocked load is too high. This is very poor cpu utilization.

The blocked load of a cluster will be high only if the blocked tasks have
run recently. The contribution of a blocked task is halved every 32 ms,
so a high blocked load is made up of recently running tasks, and
long-sleeping tasks will not influence the load balancing.

The load balance period ranges from 1 tick (10 ms for idle load balance
on ARM) up to 256 ms (for busy load balance), so a high blocked load
implies that some tasks have run recently; otherwise the blocked load
will be small and will not have a large influence on the load balance.
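
To make the decay argument concrete, here is a minimal sketch of how a
blocked task's contribution shrinks over time, using the 32 ms halving
quoted above. It is only an illustration under stated assumptions: it
models the decay as continuous, whereas the kernel works in discrete
periods with fixed-point arithmetic, and the names blocked_contribution()
and total_load() are hypothetical helpers, not kernel functions. The
loads (80 and 40) are taken from the example earlier in the thread.

```python
# Illustrative model only: continuous exponential decay with a 32 ms
# half-life, as described in the message. Not kernel code.
HALF_LIFE_MS = 32.0


def blocked_contribution(load_at_sleep, ms_asleep, half_life_ms=HALF_LIFE_MS):
    """Remaining load contribution of a task blocked for ms_asleep ms."""
    if ms_asleep < 0:
        raise ValueError("ms_asleep must be non-negative")
    return load_at_sleep * 0.5 ** (ms_asleep / half_life_ms)


def total_load(runnable_load, blocked_tasks):
    """Runnable load plus the decayed load of every blocked task.

    blocked_tasks is an iterable of (load_at_sleep, ms_asleep) pairs.
    """
    blocked = sum(blocked_contribution(load, ms) for load, ms in blocked_tasks)
    return runnable_load + blocked


if __name__ == "__main__":
    # cpu2 from the example: one 40% task still runnable, one 40% task asleep.
    for ms in (0, 10, 32, 64, 128, 256):
        load = total_load(40, [(40, ms)])
        print(f"asleep {ms:3d} ms -> cpu2 load seen by the balancer: {load:.2f}")
```

In this model a task that went to sleep within the last tick still
counts for most of its load, while one that has slept for a full 256 ms
busy balance interval counts for less than 1% of it, which is why long
sleepers should not distort the balance.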

>
> Also we can consider steady states if the waking tasks have a specific
> waking pattern. I am not sure if we can risk hoping that the blocked task
> would wake up soon or would wake up at time 'x' and utilize that cpu.

OK, so you are no longer considering using blocked load in load balancing?

regards,
Vincent
>
>>
>> Vincent
>
> Regards
> Preeti U Murthy
>