Message-ID: <CAKfTPtCH4wbzHaDzgAtLLSLUegQfztgE5bO4JV-SYxEGd78KGA@mail.gmail.com>
Date: Tue, 18 Oct 2016 11:45:48 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Joseph Salisbury <joseph.salisbury@...onical.com>,
Ingo Molnar <mingo@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Mike Galbraith <efault@....de>, omer.akram@...onical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
On 18 October 2016 at 11:07, Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:
>>
>> Something looks weird related to the use of for_each_possible_cpu(i) in
>> online_fair_sched_group() on my i5-3320M CPU (4 logical cpus).
>>
>> In case I print out cpu id and the cpu masks inside the for_each_possible_cpu(i)
>> I get:
>>
>> [ 5.462368] cpu=0 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>
> OK, you have a buggy BIOS :-) It enumerates too many CPU slots. There is
> no reason to have 4 empty CPU slots on a machine that cannot do physical
> hotplug.
>
> This also explains why it doesn't show up on many machines: most machines
> will not have this, and possible_mask == present_mask == online_mask in
> almost all cases.
>
> x86 folks, can we detect the lack of physical hotplug capability, FW_WARN
> on it, and lower possible_mask to present_mask?
>
>> [ 5.462370] cpu=1 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>> [ 5.462370] cpu=2 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>> [ 5.462371] cpu=3 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>> [ 5.462372] *cpu=4* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>> [ 5.462373] *cpu=5* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>> [ 5.462374] *cpu=6* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>> [ 5.462375] *cpu=7* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>>
>> T430:/sys/fs/cgroup/cpu,cpuacct/system.slice# ls -l | grep '^d' | wc -l
>> 80
>>
>> /proc/sched_debug:
>>
>> cfs_rq[0]:/system.slice
>> ...
>> .tg_load_avg : 323584
>> ...
>>
>> 80 * 1024 * 4 (non-existent cpu4-cpu7) = 327680 (with a little bit of decay,
>> this could be the extra load on the system.slice tg)
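
For reference, each per-cpu cfs_rq folds its load_avg delta into the
group's tg->load_avg roughly like this; paraphrased from memory of v4.8
kernel/sched/fair.c, so it may not match the tree exactly:

    static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
    {
            /* delta between this cfs_rq's current load_avg and what it
             * last contributed to the task group */
            long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;

            /* only touch the shared atomic when the delta is big enough */
            if (force || abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
                    atomic_long_add(delta, &cfs_rq->tg->load_avg);
                    cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
            }
    }

A contribution made on behalf of a CPU that never comes online
presumably never gets updated again, which would explain where the
leftover ~327680 comes from.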
>>
>> Using for_each_online_cpu(i) instead of for_each_possible_cpu(i) in
>> online_fair_sched_group() works on this machine, i.e. the .tg_load_avg
>> of system.slice tg is 0 after startup.
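
For reference, the experiment is essentially this one-liner (sketched
against v4.8, untested):

    --- a/kernel/sched/fair.c
    +++ b/kernel/sched/fair.c
    @@ void online_fair_sched_group(struct task_group *tg)
    -	for_each_possible_cpu(i) {
    +	for_each_online_cpu(i) {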
>
> Right, so the reason for using the possible mask is that it avoids having to
> deal with hotplug; also, all the per-cpu memory is allocated and present
> for !online CPUs anyway, so we might as well set it up properly.
>
> (You might want to start booting your laptop with "possible_cpus=4" to
> save some memory FWIW.)
>
> But yes, we have a bug here too... /me ponders
>
> So aside from funny BIOSes, this should also show up when creating
> cgroups when you have offlined a few CPUs, which is far more common I'd
> think.
The problem is also that the load of the tg->se[cpu] that represents
the tg->cfs_rq[cpu] is initialized to 1024 in:

alloc_fair_sched_group()
    for_each_possible_cpu(i) {
        ...
        init_entity_runnable_average(se);
            /* which does: */
            sa->load_avg = scale_load_down(se->load.weight);

Initializing sa->load_avg to 1024 for a newly created task makes
sense, as we don't know yet what its real load will be, but I'm not
sure we have to do the same for a se that represents a task group.
That load should be initialized to 0; it will only increase as tasks
are moved/attached to the task group.
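
Something along these lines, as an untested sketch against v4.8 (the
comment wording is mine; entity_is_task() is the existing helper in
kernel/sched/fair.c):

    --- a/kernel/sched/fair.c
    +++ b/kernel/sched/fair.c
    @@ void init_entity_runnable_average(struct sched_entity *se)
    -	sa->load_avg = scale_load_down(se->load.weight);
    +	/*
    +	 * Tasks start with their full weight so they look heavy until
    +	 * their real load stabilizes; group entities start at 0 because
    +	 * nothing has been attached to the task group yet.
    +	 */
    +	if (entity_is_task(se))
    +		sa->load_avg = scale_load_down(se->load.weight);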
>
> On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> online_fair_sched_group() cures this; this would actually match with
> unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> a few instructions on the enqueue path, so that's all good.
>
> I'm just not immediately seeing how that cures things. The only relevant
> user of the leaf_cfs_rq list seems to be update_blocked_averages() which
> is called from the balance code (idle_balance() and
> rebalance_domains()). But neither should call that for offline (or
> !present) CPUs.
>
>
> Humm..