Message-ID: <20170728125924.eaw6unxude2qiyym@hirez.programming.kicks-ass.net>
Date:   Fri, 28 Jul 2017 14:59:24 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Jeffrey Hugo <jhugo@...eaurora.org>,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        Austin Christ <austinwc@...eaurora.org>,
        Tyler Baicar <tbaicar@...eaurora.org>,
        Timur Tabi <timur@...eaurora.org>
Subject: Re: [PATCH V6] sched/fair: Remove group imbalance from
 calculate_imbalance()

On Fri, Jul 28, 2017 at 01:16:24PM +0100, Dietmar Eggemann wrote:
> >> IIRC the topology you had in mind was MC + DIE level with n (n > 2) DIE
> >> level sched groups.
> > 
> > That'd be a NUMA box?
> 
> I don't think it's NUMA. The SD levels are MC and DIE, w/ # DIE sg's >> 2.

Ah, I can't read. I thought >2 DIEs.

> > So this is 4 * 18 * 2 = 144 cpus:
> 
> Impressive ;-)

Takes forever to boot though :/

> > If I then start a 3rd loop, I see 100%, 50%, 50%. I then kill the 100%.
> > Then instantly they balance and I get 2x100% back.
> 
> Yeah, could reproduce on IVB-EP (2x10x2).

OK, I have one of those. What should I do to reproduce it? I didn't
actually see anything odd myself.

> > Anything else I need to reproduce? (other than maybe a slightly less
> > insane machine :-)
> 
> I guess what Jeff is trying to avoid is that 'busiest->load_per_task',
> lowered to 'sds->avg_load' in case of an imbalanced busiest sg:
> 
>   if (busiest->group_type == group_imbalanced)
>     busiest->load_per_task = min(busiest->load_per_task, sds->avg_load);
> 
> ends up so low that fix_small_imbalance() is never called later, and
> 'env->imbalance' stays so low that load-balancing one 50% task to the
> now-idle cpu won't happen.
> 
>   if (env->imbalance < busiest->load_per_task)
>     fix_small_imbalance(env, sds);
> 
> Having a really large number of otherwise idle DIE sg's helps keep
> 'sds->avg_load' low in comparison to 'busiest->load_per_task'.

Right, but the whole load_per_task thing is a bit wonky, and since
that's the basis of fix_small_imbalance() I'm very suspicious of it.

