Message-ID: <5ddf061e-26a2-7151-adff-7ae339c848ac@arm.com>
Date: Fri, 28 Jul 2017 13:16:24 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Jeffrey Hugo <jhugo@...eaurora.org>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
Austin Christ <austinwc@...eaurora.org>,
Tyler Baicar <tbaicar@...eaurora.org>,
Timur Tabi <timur@...eaurora.org>
Subject: Re: [PATCH V6] sched/fair: Remove group imbalance from
calculate_imbalance()
On 26/07/17 15:54, Peter Zijlstra wrote:
> On Tue, Jul 18, 2017 at 08:48:53PM +0100, Dietmar Eggemann wrote:
>> Hi Jeffrey,
>>
>> On 13/07/17 20:55, Jeffrey Hugo wrote:
[...]
>>> Since the group imbalance path in calculate_imbalance() is at best a NOP
>>> but otherwise harmful, remove it.
>
> Hurm.. so fix_small_imbalance() itself is a pile of dog poo... it used
> to make sense a long time ago, but smp-nice and then cgroups made a
> complete joke of things.
>
>> IIRC the topology you had in mind was MC + DIE level with n (n > 2) DIE
>> level sched groups.
>
> That'd be a NUMA box?
I don't think it's NUMA. The SD levels are MC and DIE, w/ # DIE sg's >> 2.
[...]
>> but here the prefer_sibling handling (group overloaded) eclipses 'group
>> imbalance' the moment one of the cfs tasks can go to cpu2, so the if
>> condition you got rid of is a NOP.
>>
>> I wonder if it is fair to say that your fix helps multi-cluster
>> (especially with n > 2) systems without SMT, together with your first
>> patch [1], for this specific, cpu-affinity-restricted test case.
>
> I tried on an IVB-EP with all the HT siblings unplugged, could not
> reproduce either. Still at n=2 though. Let me fire up an EX, that'll get
> me n=4.
>
> So this is 4 * 18 * 2 = 144 cpus:
Impressive ;-)
>
> # for ((i=72; i<144; i++)) ; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
> # taskset -pc 0,18 $$
> # while :; do :; done & while :; do :; done &
>
> So I'm taking SMT out, affine to first and second MC group, start 2
> loops.
>
> Using another console I see them both using 100%.
>
> If I then start a 3rd loop, I see 100%, 50%, 50%. I then kill the 100% one.
> Then instantly they rebalance and I get 2x100% back.
Yeah, I could reproduce this on an IVB-EP (2x10x2).
> Anything else I need to reproduce? (other than maybe a slightly less
> insane machine :-)
I guess what Jeff is trying to avoid is that 'busiest->load_per_task',
lowered to 'sds->avg_load' in case of an imbalanced busiest sg:

    if (busiest->group_type == group_imbalanced)
        busiest->load_per_task = min(busiest->load_per_task, sds->avg_load);

ends up so low that fix_small_imbalance() won't be called later, and
'env->imbalance' stays so low that load-balancing one of the 50% tasks
to the now idle cpu won't happen:

    if (env->imbalance < busiest->load_per_task)
        fix_small_imbalance(env, sds);
Having a lot of otherwise idle DIE sg's helps keep 'sds->avg_load' low
in comparison to 'busiest->load_per_task'.
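To make that scaling concrete, here is a rough userspace sketch (the
numbers 1024/512 are made up, all groups are assumed to have equal
capacity with the whole load sitting in the busiest one; this is not
the kernel's actual 'sds->avg_load' computation):

    /* Illustrative only: more idle DIE sg's dilute avg_load. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long load_per_task = 512;  /* one 50% task */
        unsigned long busiest_load = 1024;  /* two 50% tasks in one sg */
        unsigned long n;

        for (n = 2; n <= 16; n *= 2) {
            /* total load spread over n equal-capacity DIE sg's */
            unsigned long avg_load = busiest_load / n;
            /* the group_imbalanced clamp from above */
            unsigned long clamped = avg_load < load_per_task ?
                                    avg_load : load_per_task;

            printf("n=%2lu avg_load=%4lu load_per_task=%4lu\n",
                   n, avg_load, clamped);
        }
        return 0;
    }

With n=2 the clamp leaves 'load_per_task' at 512, but with n=16 it is
down to 64, so 'env->imbalance < busiest->load_per_task' is much less
likely to hold and fix_small_imbalance() gets skipped.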
> Because I have the feeling that while this patch cures things for you,
> you're fighting symptoms.
Unfortunately, I don't have a machine available with n >> 2 (on DIE or
NUMA) ...