linux-kernel - Re: [PATCH v2] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <11997145-4718-ed17-6085-54be18bf85ba@redhat.com>
Date:   Wed, 20 Feb 2019 18:57:01 +0100
From:   Laurent Vivier <lvivier@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Borislav Petkov <bp@...e.de>,
        David Gibson <david@...son.dropbear.id.au>,
        Michael Ellerman <mpe@...erman.id.au>,
        Nathan Fontenot <nfont@...ux.vnet.ibm.com>,
        Michael Bringmann <mwb@...ux.vnet.ibm.com>,
        linuxppc-dev@...ts.ozlabs.org, Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH v2] sched/topology: fix kernel crash when a CPU is
 hotplugged in a memoryless node

On 20/02/2019 18:08, Peter Zijlstra wrote:
> On Wed, Feb 20, 2019 at 05:55:20PM +0100, Laurent Vivier wrote:
>> index 3f35ba1d8fde..372278605f0d 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1651,6 +1651,7 @@ void sched_init_numa(void)
>>  	 */
>>  	tl[i++] = (struct sched_domain_topology_level){
>>  		.mask = sd_numa_mask,
>> +		.flags = SDTL_OVERLAP,
> 
> This makes no sense what so ever. The numa identify node should not have
> overlap with other domains.
> 
> Are you sure this is not because of the utterly broken powerpc nonsense
> where they move CPUs between nodes?

No, I'm not sure. This why I've Cc: powerpc folks. My conclusion is only
based on the before/after changes.

I've tested some patches from powerpc ML, but they don't fix this problem:
  powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
  powerpc/pseries: Perform full re-add of CPU for topology update
post-migration

So the only reason I can see to have a corrupted sched_group list is the
sched_domain_span() fonction doesn't return a correct cpumask for the
domain once a new CPU is added.

Thanks,
Laurent