Message-ID: <aJ20imoeRL_tifky@bogus>
Date: Thu, 14 Aug 2025 11:03:54 +0100
From: Sudeep Holla <sudeep.holla@....com>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Jeremy Linton <jeremy.linton@....com>,
Sudeep Holla <sudeep.holla@....com>,
Huang Shijie <shijie@...amperecomputing.com>,
catalin.marinas@....com, will@...nel.org,
patches@...erecomputing.com, Shubhang@...amperecomputing.com,
krzysztof.kozlowski@...aro.org, bjorn.andersson@....qualcomm.com,
geert+renesas@...der.be, arnd@...db.de, nm@...com,
ebiggers@...nel.org, nfraprado@...labora.com,
prabhakar.mahadev-lad.rj@...renesas.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
On Wed, Aug 13, 2025 at 03:56:47PM -0700, Christoph Lameter (Ampere) wrote:
> On Wed, 13 Aug 2025, Christoph Lameter (Ampere) wrote:
>
> > Can we figure out which platforms benchmarks were affected and why?
> >
> > It seems the notion of a "cluster" on ARM64 is derived (I guess a better
> > word than "invented" hehe) from sibling information instead of PPTT. But
> > using that information should work fine right?
>
> Sorry no that is not correct. The cluster information is correctly read
> from the ACPI tables and the cluster ids are available in
>
> /sys/devices/system/cpu/cpuXX/topology/cluster_id
>
Agreed. The ACPI specification has the notion of a cluster sprinkled
across various chapters (mostly added by Arm in the earlier days, though the
Arm architecture specification itself doesn't have any standard definition
of a cluster). Also note that it nicely adds a disclaimer:
| Different architectures use different terminology to denominate logically
| associated processors, but terms such as package, cluster, module, and
| socket are typical examples.

So how can one use these across architectures? Package/socket is quite
standard. A cluster can be a group of processors, or it can be a group of
processor clusters. One of the Arm vendors calls it a super cluster or
something. All of this makes it super hard for a generic OS to interpret
that information. CONFIG_SCHED_CLUSTER was added with just one notion of a
cluster, which was soon realised not to match some other notions of it.

We can enable it, and I am sure someone will report a regression on their
platform and we will need to disable it again. The benchmark results don't
depend purely on the "notion" of a cluster; they are often related to the
private resources and how they are shared in the system. So even if you
strictly follow the notion of a cluster as supported by CONFIG_SCHED_CLUSTER,
it will fail on systems where the private resources are shared across the
"cluster" boundaries, or on some variant configuration.

> if CONFIG_SCHED_CLUSTER is enabled.
>
> If there is an issue then it is a problem with the vendor firmware
> providing cluster id configurations via ACPI that cause regressions.
>
As mentioned, it is not strictly just the cluster id but other shared
resources that contribute to the issues/regressions.
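
To make that concrete, here is a rough sketch (not something the scheduler
does) of how one might check a given machine. It assumes a Linux sysfs that
exposes topology/cluster_id and cache/indexN/{level,shared_cpu_list}, and
that cluster ids are unique system-wide. The helper names and the
"straddling" test are made up for illustration:

```python
from collections import defaultdict
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_clusters(root=CPU_ROOT):
    """Map cluster_id -> set of CPU numbers (CPUs without a valid id are skipped)."""
    root = Path(root)
    clusters = defaultdict(set)
    if not root.is_dir():
        return {}
    for cpu_dir in root.glob("cpu[0-9]*"):
        id_file = cpu_dir / "topology" / "cluster_id"
        if not id_file.is_file():
            continue
        cluster_id = int(id_file.read_text())
        if cluster_id >= 0:  # a negative id means "not available"
            clusters[cluster_id].add(int(cpu_dir.name[3:]))
    return dict(clusters)


def caches_straddling_clusters(root=CPU_ROOT):
    """List (level, shared CPUs) for caches whose sharing set partially
    overlaps a cluster, i.e. is neither inside it nor a superset of it.
    This is one hypothetical definition of a mismatch, nothing more."""
    root = Path(root)
    found = set()
    for cluster_cpus in read_clusters(root).values():
        for cpu in cluster_cpus:
            cache_dir = root / f"cpu{cpu}" / "cache"
            if not cache_dir.is_dir():
                continue
            for index in cache_dir.glob("index[0-9]*"):
                shared_file = index / "shared_cpu_list"
                level_file = index / "level"
                if not (shared_file.is_file() and level_file.is_file()):
                    continue
                shared = parse_cpu_list(shared_file.read_text())
                if not (shared <= cluster_cpus or cluster_cpus <= shared):
                    found.add((int(level_file.read_text()), tuple(sorted(shared))))
    return sorted(found)


if __name__ == "__main__":
    for cid, cpus in sorted(read_clusters().items()):
        print(f"cluster {cid}: CPUs {sorted(cpus)}")
    for level, shared in caches_straddling_clusters():
        print(f"L{level} cache shared by CPUs {list(shared)} straddles a cluster")
```

An empty result only says the cache sharing nests cleanly with the reported
cluster ids; it says nothing about other shared resources.
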
> We could add a blacklist for those platforms to avoid regressions but we
> should not allow that to hinder us to enable full support for clustering
> on ARM64.
>
Sure, but we need to improve the "cluster" definition in the ACPI and Arm
specifications, and get an agreement on what it means for other architectures
first, IMO. We don't want to revisit the same topic again without these as,
IIRC, this is the second time we are having a discussion around this topic.

--
Regards,
Sudeep