Message-ID: <1097a1d1-483d-44b3-b473-4350b5a4b04d@arm.com>
Date: Fri, 15 Aug 2025 11:46:35 -0500
From: Jeremy Linton <jeremy.linton@....com>
To: Sudeep Holla <sudeep.holla@....com>,
 "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Huang Shijie <shijie@...amperecomputing.com>, catalin.marinas@....com,
 will@...nel.org, patches@...erecomputing.com,
 Shubhang@...amperecomputing.com, krzysztof.kozlowski@...aro.org,
 bjorn.andersson@....qualcomm.com, geert+renesas@...der.be, arnd@...db.de,
 nm@...com, ebiggers@...nel.org, nfraprado@...labora.com,
 prabhakar.mahadev-lad.rj@...renesas.com,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER

Hi,


On 8/15/25 5:48 AM, Sudeep Holla wrote:
> On Thu, Aug 14, 2025 at 09:30:06AM -0700, Christoph Lameter (Ampere) wrote:
>> On Thu, 14 Aug 2025, Sudeep Holla wrote:
>>
>>>    |  Different architectures use different terminology to denominate logically
>>>    |  associated processors, but terms such as package, cluster, module, and
>>>    |  socket are typical examples.
>>>
>>> So how can one use these across architectures ? Package/Socket is quite
>>> standard. Cluster can be group of processors or it can also be group of
>>> processor clusters. One of the Arm vendors call it super cluster or something.
>>> All these makes it super hard for a generic OS to interpret that information.
>>> Just CONFIG_SCHED_CLUSTER was added with one notion of cluster which was soon
>>> realised doesn't match with some other notion of it.
>>
>> What the cluster actually is used for is up to the hardware. The linux
>> scheduler provides this functionality. How and when this feature is used
>> by firmware is a vendor issue. There was never a clear definition.
>>
> 
> Sure, since it is left to architecture to define what it means, it could
> work. But what happens if we have multiple chiplet inside a socket and
> each chiplet has multiple cluster. Do you envision using this SCHED_CLUSTER
> at chiplet level if that works best on the platform ?
> 
> That could work, but we need to document all these with the best of our
> knowledge now so that it is easy to revisit in the future.
> 
>>> We can enable it and I am sure someone will report a regression on their
>>> platform and we need to disable it again. The benchmark doesn't purely
>>> depend on just the "notion" of cluster but it is often related to the
>>> private resource and how they are shared in the system. So even if you
>>> strictly follow the notion of cluster as supported by CONFIG_SCHED_CLUSTER
>>> it will fail on systems where the private resources are shared across the
>>> "cluster" boundaries or some variant configuration.
>>
>> That is not our problem. If the vendor provides clustering information and
>> the scheduler uses that then the vendor can modify the firmware to not
>> enable clustering.
>>
> 
> That is plain wrong. ACPI describes the hardware. Deciding to put
> clustering information in these tables only if it improves performance, or
> at least does not hinder it, seems like complete nonsense to me. That is
> encoding policy in the ACPI hardware description. Does the ACPI spec mention
> anything about it? I mean removing some hardware description, even if it is
> 100% accurate, because it hinders performance on one of the OSPMs? Doesn't
> sound correct at all.
> 
>> As mentioned before: We could create a blacklist to override the ACPI info
>> from the vendor to ensure that clustering is off.
>>
> 
> Not a bad idea. We can see if allow or blocklist works as we start with one.

From a distro perspective, it makes more sense to me to turn this from a
compile-time option into a runtime kernel command-line option, with the
default on/off set by the SCHED_CLUSTER flag, rather than trying to
maintain a blocklist.
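
A minimal userspace sketch of that idea, to make the intended semantics
concrete. The option name "sched_cluster=", the SCHED_CLUSTER_DEFAULT macro
and the function parse_sched_cluster() are all hypothetical names invented
for this illustration; none of them is an existing kernel interface. The
build-time flag only supplies the default, and the command line overrides it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the build-time default that
 * CONFIG_SCHED_CLUSTER would provide. */
#ifndef SCHED_CLUSTER_DEFAULT
#define SCHED_CLUSTER_DEFAULT 1
#endif

/*
 * Scan a space-separated command line for "sched_cluster=on|off"
 * (hypothetical option name). The last valid occurrence wins; anything
 * malformed is ignored and the build-time default is kept.
 */
bool parse_sched_cluster(const char *cmdline)
{
    static const char key[] = "sched_cluster=";
    const size_t keylen = sizeof(key) - 1;
    bool enabled = SCHED_CLUSTER_DEFAULT;
    const char *p = cmdline;

    while (p != NULL && (p = strstr(p, key)) != NULL) {
        /* Only accept the key at the start of a token. */
        if (p == cmdline || p[-1] == ' ') {
            const char *v = p + keylen;

            if (strncmp(v, "on", 2) == 0 && (v[2] == ' ' || v[2] == '\0'))
                enabled = true;
            else if (strncmp(v, "off", 3) == 0 && (v[3] == ' ' || v[3] == '\0'))
                enabled = false;
        }
        p += keylen;
    }
    return enabled;
}
```

With this shape a distro can ship one kernel image and leave the choice to
the platform owner: parse_sched_cluster("quiet sched_cluster=off") returns
false, while a command line without the option returns the build-time
default.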


I agree the firmware needs a much clearer way to signal that these nodes 
represent something other than just side effects of the way the table is 
built. If the working group is hesitant to declare additional 
topological flags, maybe this idea of deriving additional topological 
information from nodes without caches is a reasonable spec 
clarification. That way some future 
NODE_IS_A_CLUSTER/DSU/CHIPLET/SUPERCLUSTER/RING/SLICE/WHATEVER doesn't 
turn the existing code into technical debt.

But returning to the original point, it's not clear to me that the HW
'cluster' information is really what causes the performance boost. Just
having a medium-size scheduling domain under MC (i.e. picking an arbitrary
size of 4-16 cores), or simply 'slicing' an L3 in the PPTT so that the MC
domains are smaller, might yield the same effect. I've seen a number of
cases where 'lying' about the topology yields a better result in a
benchmark. This is largely what is happening with the firmware toggles
that move/remove the NUMA domains too. Being able to manually reconfigure
some of these scheduling levels at runtime might be useful...
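
To illustrate what "picking an arbitrary size" means, here is a small
standalone sketch that carves the CPUs under one MC domain into fixed-size
groups, ignoring whatever the hardware reports. The function name
slice_domains() and its parameters are hypothetical and exist only for this
example; this is not how the kernel builds its scheduling domains.

```c
#include <stddef.h>

/*
 * Assign each of ncpus CPUs (numbered 0..ncpus-1, all assumed to sit under
 * one MC domain) to a group of at most `slice` consecutive CPUs.
 * domain_of[cpu] receives the group index. Returns the number of groups,
 * or 0 if the arguments are unusable.
 */
size_t slice_domains(size_t ncpus, size_t slice, unsigned *domain_of)
{
    if (slice == 0 || domain_of == NULL)
        return 0;

    for (size_t cpu = 0; cpu < ncpus; cpu++)
        domain_of[cpu] = (unsigned)(cpu / slice);

    /* Round up so a trailing partial group is counted. */
    return (ncpus + slice - 1) / slice;
}
```

For example, 10 CPUs with a slice of 4 end up in three groups: CPUs 0-3,
4-7 and 8-9. Comparing a benchmark run on such an arbitrary split against
one using the firmware-described clusters would help show whether the
cluster description itself matters, or only the size of the domain.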




> 
>> What we should not do is disabling clustering for all.
>>
> 
> Not completely against it but I have concerns on how all these scale with
> multiple chiplets within a socket or any such variants.
> 
>>>> We could add a blacklist for those platforms to avoid regressions but we
>>>> should not allow that to hinder us to enable full support for clustering
>>>> on ARM64.
>>>>
>>>
>>> Sure, but we need to improve the "cluster" definition in the ACPI and Arm
>>> specification, get an agreement on what it means for other architecture
>>> first IMO. We don't want to revisit the same topic again without these, as
>>> IIRC this is the second time we are having a discussion around this topic.
>>
>> The vendors need flexibility to use this feature when it makes sense.
>>
> 
> Sure, but too much flexibility might also hinder future changes when adding
> some other feature(chiplet again is one thing I can think of now)
> 
>> Having a clear definition would limit the use of clustering feature and
>> limits innovation. Vendors can control the clustering via ACPI and the
>> firmware they provide with their system.
>>
> 
> Not sure if that should be right direction TBH, but again not against the
> idea of enabling the feature on some platforms if we are going to enable it
> by default.
> 
>> We could change the definition, but that would be a decade-long
>> process which will encounter resistance from vendors that make use of the
>> clustering feature in ways that do not fall into the stricter definition.
>>
> 
> I understand and get the point, but decadelong is bit of an exaggeration 😉.
> Not discussing these in ACPI or similar forum is not a good idea as we know
> there are new h/w features that are being added and current specification
> may not provide ways to express all of those.
> 

