linux-kernel - Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161018090747.GW3142@twins.programming.kicks-ass.net>
Date:   Tue, 18 Oct 2016 11:07:47 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Joseph Salisbury <joseph.salisbury@...onical.com>,
        Ingo Molnar <mingo@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Mike Galbraith <efault@....de>, omer.akram@...onical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes

On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:
> 
> Something looks weird related to the use of for_each_possible_cpu(i) in
> online_fair_sched_group() on my i5-3320M CPU (4 logical cpus).
> 
> In case I print out cpu id and the cpu masks inside the for_each_possible_cpu(i)
> I get:
> 
> [ 5.462368]  cpu=0  cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3

OK, you have a buggy BIOS :-) It enumerates too many CPU slots. There is
no reason to have 4 empty CPU slots on a machine that cannot do physical
hotplug.

This also explains why it doesn't show on many machines, most machines
will not have this and possible_mask == present_mask == online_mask for
most all cases.

x86 folk, can we detect the lack of physical hotplug capability and FW_WARN
on this and lowering possible_mask to present_mask?

> [ 5.462370]  cpu=1  cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462370]  cpu=2  cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462371]  cpu=3  cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462372] *cpu=4* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462373] *cpu=5* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462374] *cpu=6* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462375] *cpu=7* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> 
> T430:/sys/fs/cgroup/cpu,cpuacct/system.slice# ls -l | grep '^d' | wc -l
> 80
> 
> /proc/sched_debug:
> 
> cfs_rq[0]:/system.slice
>   ...
>   .tg_load_avg                   : 323584
>   ...
> 
> 80 * 1024 * 4 (not existent cpu4-cpu7) = 327680 (with a little bit of decay,
> this could be this extra load on the systen.slice tg)
> 
> Using for_each_online_cpu(i) instead of for_each_possible_cpu(i) in
> online_fair_sched_group() works on this machine, i.e. the .tg_load_avg
> of system.slice tg is 0 after startup.

Right, so the reason for using present_mask is that it avoids having to
deal with hotplug, also all the per-cpu memory is allocated and present
for !online CPUs anyway, so might as well set it up properly anyway.

(You might want to start booting your laptop with "possible_cpus=4" to
save some memory FWIW.)

But yes, we have a bug here too... /me ponders

So aside from funny BIOSes, this should also show up when creating
cgroups when you have offlined a few CPUs, which is far more common I'd
think.

On IRC you mentioned that adding list_add_leaf_cfs_rq() to
online_fair_sched_group() cures this, this would actually match with
unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
a few instructions on the enqueue path, so that's all good.

I'm just not immediately seeing how that cures things. The only relevant
user of the leaf_cfs_rq list seems to be update_blocked_averages() which
is called from the balance code (idle_balance() and
rebalance_domains()). But neither should call that for offline (or
!present) CPUs.

Humm..