linux-kernel - Re: [PATCH v2] pseries: prevent free CPU ids to be reused on another node

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <877dlkbpqn.fsf@linux.ibm.com>
Date:   Fri, 02 Apr 2021 11:44:32 -0500
From:   Nathan Lynch <nathanl@...ux.ibm.com>
To:     Laurent Dufour <ldufour@...ux.ibm.com>
Cc:     cheloha@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, mpe@...erman.id.au,
        benh@...nel.crashing.org, paulus@...ba.org,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [PATCH v2] pseries: prevent free CPU ids to be reused on
 another node

Laurent Dufour <ldufour@...ux.ibm.com> writes:
> Le 02/04/2021 à 15:34, Nathan Lynch a écrit :
>> Laurent Dufour <ldufour@...ux.ibm.com> writes:
>>> When a CPU is hot added, the CPU ids are taken from the available mask from
>>> the lower possible set. If that set of values was previously used for CPU
>>> attached to a different node, this seems to application like if these CPUs
>>> have migrated from a node to another one which is not expected in real
>>> life.
>> 
>> This seems like a problem that could affect other architectures or
>> platforms? I guess as long as arch code is responsible for placing new
>> CPUs in cpu_present_mask, that code will have the responsibility of
>> ensuring CPU IDs' NUMA assignments remain stable.
>
> Actually, x86 is already handling this issue in the arch code specific
> code, see 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for
> cpuid <-> apicid mapping"). I didn't check for other architectures but
> as CPU id allocation is in the arch part, I believe this is up to each
> arch to deal with this issue.
>
> Making the CPU id allocation common to all arch is outside the scope
> of this patch.

Well... we'd better avoid a situation where architectures impose
different policies in this area. (I guess we're already in this
situation, which is the problem.) A more maintainable way to achieve
that would be to put the higher-level policy in arch-independent code,
as much as possible.

I don't insist, though.


>> I don't know, should we not fail the request instead of doing the
>> ABI-breaking thing the code in this change is trying to prevent? I
>> don't think a warning in the kernel log is going to help any
>> application that would be affected by this.
>
> That's a really good question. One should argue that the most
> important is to satisfy the CPU add operation, assuming that only few
> are interested in the CPU numbering, while others would prefer the CPU
> adding to fail (which may prevent adding CPUs on another nodes if the
> whole operation is aborted as soon as a CPU add is failing).
>
> I was conservative here, but if failing the operation is the best
> option, then this will make that code simpler, removing the backward
> jump.
>
> Who is deciding?

I favor failing the request. Apart from the implications for user space,
it's not clear to me that allowing the cpu-node relationship to change
once initialized is benign in terms of internal kernel assumptions
(e.g. sched domains, workqueues?). And as you say, it would make for
more straightforward code.