lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 02 Apr 2021 11:44:32 -0500
From:   Nathan Lynch <nathanl@...ux.ibm.com>
To:     Laurent Dufour <ldufour@...ux.ibm.com>
Cc:     cheloha@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, mpe@...erman.id.au,
        benh@...nel.crashing.org, paulus@...ba.org,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [PATCH v2] pseries: prevent free CPU ids to be reused on
 another node

Laurent Dufour <ldufour@...ux.ibm.com> writes:
> Le 02/04/2021 à 15:34, Nathan Lynch a écrit :
>> Laurent Dufour <ldufour@...ux.ibm.com> writes:
>>> When a CPU is hot added, the CPU ids are taken from the available mask from
>>> the lower possible set. If that set of values was previously used for CPU
>>> attached to a different node, this seems to application like if these CPUs
>>> have migrated from a node to another one which is not expected in real
>>> life.
>> 
>> This seems like a problem that could affect other architectures or
>> platforms? I guess as long as arch code is responsible for placing new
>> CPUs in cpu_present_mask, that code will have the responsibility of
>> ensuring CPU IDs' NUMA assignments remain stable.
>
> Actually, x86 is already handling this issue in the arch code specific
> code, see 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for
> cpuid <-> apicid mapping"). I didn't check for other architectures but
> as CPU id allocation is in the arch part, I believe this is up to each
> arch to deal with this issue.
>
> Making the CPU id allocation common to all arch is outside the scope
> of this patch.

Well... we'd better avoid a situation where architectures impose
different policies in this area. (I guess we're already in this
situation, which is the problem.) A more maintainable way to achieve
that would be to put the higher-level policy in arch-independent code,
as much as possible.

I don't insist, though.


>> I don't know, should we not fail the request instead of doing the
>> ABI-breaking thing the code in this change is trying to prevent? I
>> don't think a warning in the kernel log is going to help any
>> application that would be affected by this.
>
> That's a really good question. One should argue that the most
> important is to satisfy the CPU add operation, assuming that only few
> are interested in the CPU numbering, while others would prefer the CPU
> adding to fail (which may prevent adding CPUs on another nodes if the
> whole operation is aborted as soon as a CPU add is failing).
>
> I was conservative here, but if failing the operation is the best
> option, then this will make that code simpler, removing the backward
> jump.
>
> Who is deciding?

I favor failing the request. Apart from the implications for user space,
it's not clear to me that allowing the cpu-node relationship to change
once initialized is benign in terms of internal kernel assumptions
(e.g. sched domains, workqueues?). And as you say, it would make for
more straightforward code.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ