Message-ID: <Ynq6Vt0oNhq7brEf@fedora>
Date: Tue, 10 May 2022 12:17:42 -0700
From: Darren Hart <darren@...amperecomputing.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Jeremy Linton <jeremy.linton@....com>,
Barry Song <21cnbao@...il.com>, bp@...en8.de,
catalin.marinas@....com, dietmar.eggemann@....com,
gregkh@...uxfoundation.org, hpa@...or.com, juri.lelli@...hat.com,
bristot@...hat.com, lenb@...nel.org, mgorman@...e.de,
mingo@...hat.com, peterz@...radead.org, rjw@...ysocki.net,
sudeep.holla@....com, tglx@...utronix.de,
aubrey.li@...ux.intel.com, bsegall@...gle.com,
guodong.xu@...aro.org, liguozhu@...ilicon.com,
linux-acpi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, mark.rutland@....com,
msys.mizuma@...il.com, prime.zeng@...ilicon.com,
rostedt@...dmis.org, tim.c.chen@...ux.intel.com,
valentin.schneider@....com, vincent.guittot@...aro.org,
will@...nel.org, x86@...nel.org, xuwei5@...wei.com,
yangyicong@...wei.com, linuxarm@...wei.com,
Tian Tao <tiantao6@...ilicon.com>,
Barry Song <song.bao.hua@...ilicon.com>
Subject: Re: [BUG] Re: [PATCH 1/3] topology: Represent clusters of CPUs
within a die

On Mon, May 09, 2022 at 11:15:53AM +0100, Jonathan Cameron wrote:
> On Fri, 6 May 2022 15:24:27 -0500
> Jeremy Linton <jeremy.linton@....com> wrote:
>
> > Hi,
> >
> > It seems this set is:
> >
> > "BUG: arch topology borken"
> > ^code
> >
> > on machines that don't actually have clusters, or provide a
> > representation which might be taken for a cluster. The Ampere Altra for
> > one.

Hi All,

The fix for this particular issue is upstream:

db1e59483dfd topology: make core_mask include at least cluster_siblings
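
For reference, the gist of that change in cpu_coregroup_mask()
(drivers/base/arch_topology.c) is roughly the following - paraphrasing from
memory rather than quoting the commit verbatim:

const struct cpumask *cpu_coregroup_mask(int cpu)
{
	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));

	/* Prefer the smaller of the NUMA, package, or LLC sibling masks */
	if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask))
		core_mask = &cpu_topology[cpu].core_sibling;

	if (cpu_topology[cpu].llc_id != -1 &&
	    cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
		core_mask = &cpu_topology[cpu].llc_sibling;

	/*
	 * New in the fix: on systems with no shared cpu-side LLC but with
	 * clusters described (the Altra case), extend core_mask to at least
	 * the cluster siblings so CLS is never wider than MC; the sched
	 * domain builder then collapses the redundant level.
	 */
	if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
	    cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
		core_mask = &cpu_topology[cpu].cluster_sibling;

	return core_mask;
}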

> > So, I guess it's my job to relay what I was informed of when I
> > initially proposed something similar a few years back.
> >
> > Neither the ACPI/PPTT spec nor the Arm architectural spec mandate the
> > concept of a "cluster" particularly in the form of a system with cores
> > sharing the L2, which IIRC is the case for the Kunpeng.
>
> It is not. Kunpeng 920 shares the L3 tag cache, but not the L2 cache (which
> is private to each core).
> As such the existence of a cluster is not distinguished by sharing
> of any cache resources that are in PPTT. There is an argument for potentially
> adding more types of resource to PPTT to give a richer description.
>
> Whilst ACPI doesn't mandate a cluster (there is an example, though that happens
> to have L3 shared across the cluster), it does allow for additional
> hierarchy description. Cluster is just a name for such an extra level.
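
(For anyone following along: as I understand the series, that extra level is
consumed roughly as sketched below. This is an illustrative paraphrase from
memory, not a verbatim quote of the patch - hence the example_ prefixes.)

/*
 * Illustrative sketch only: the PPTT node one level above the core becomes a
 * cluster ID, and sibling masks fall out of comparing IDs across CPUs.
 */
static void example_parse_cluster(unsigned int cpu)
{
	/* Offset of the PPTT node above the core, or a negative error code. */
	cpu_topology[cpu].cluster_id = find_acpi_cpu_topology_cluster(cpu);
}

static void example_update_cluster_siblings(unsigned int a, unsigned int b)
{
	if (cpu_topology[a].cluster_id != -1 &&
	    cpu_topology[a].cluster_id == cpu_topology[b].cluster_id) {
		cpumask_set_cpu(b, &cpu_topology[a].cluster_sibling);
		cpumask_set_cpu(a, &cpu_topology[b].cluster_sibling);
	}
}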
>
> > And it tends to
> > be a shared L2 which gives the most bang for the buck (or was when I was
> > testing/benchmarking all this, on aarch64) from scheduler changes which
> > create cluster level scheduling domains.
> > But OTOH, things like specJBB
> > didn't really like those smaller MC levels (and I suspect, without having
> > run a full benchmark suite, that they are hurt by this change, especially
> > on something like the above Ampere, given what is happening to its
> > scheduling domains).
> >
> > So, the one takeaway I can give is this: the code below, which is
> > attempting to create a cluster level, should be a bit more intelligent
> > about whether there is an actual cluster.
>
> I agree that more intelligence is needed, though I think that belongs
> in the interpretation of the cluster level. This particular patch
> should present that information in a consistent fashion. My understanding
> is it is consistent with how other levels have been presented in that
> it's perfectly acceptable to have multiple levels that can be collapsed
> by the users of the description. (perhaps I'm wrong on that?)
>

Collapsing redundant levels is indeed an intentional part of the design as I
understand it.
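
As a concrete (if simplified) illustration of what "collapsed by the users of
the description" means in practice - this is not the actual
kernel/sched/topology.c code, just the idea:

/*
 * Simplified sketch: a topology level that spans a single CPU, or exactly the
 * same CPUs as the level above it, adds no scheduling information, so the
 * domain builder can simply drop it. An "extra" but redundant level in the
 * firmware description therefore costs nothing.
 */
static bool level_is_redundant(const struct cpumask *span,
			       const struct cpumask *parent_span)
{
	if (cpumask_weight(span) == 1)
		return true;

	if (parent_span && cpumask_equal(span, parent_span))
		return true;

	return false;
}
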
> > A first order approximation
> > might be adding a check to see if the node immediately above the CPU
> > contains an L2 and that it is shared.
>
> That rules out our clusters, so not a great starting point :)
>
> Darren Hart's recent set for Ampere Altra is fixing a different combination
> but is in some sense similar in that it corrects an assumption that turned
> out to be false in the user of the topology description whilst leaving the
> description alone.

I think that concept is important: "correct assumptions in the abstraction while
leaving the description alone" (provided the description follows the relevant
standards and specifications, of course).

>
> > A better fix, and of course the reason this wasn't previously done, is to
> > convince the ACPI committee to standardize a CLUSTER level flag which
> > could be utilized by a firmware/machine manufacturer to decide whether
> > cluster level scheduling provides an advantage, and simply not do it on
> > machines which don't flag CLUSTER levels because it's not advantageous.
>
> While I obviously can't predict discussions in ASWG, my gut feeling
> is that would be a non-starter, with questions along the lines of:
>
> 1) Why is this level special? The spec already defines a hierarchical
> description with caches described at each level, so you can infer
> what is intended. If we define cluster, we'll also need to define
> super cluster (we have designs with super clusters and it's only going
> to get worse as systems continue to get bigger.)

While I share Jeremy's concern about the lack of specificity of the term
Cluster, I suspect Jonathan's point is accurate: that path leads to more and
more categorization (e.g. super cluster) in a space that is rapidly evolving.

Beyond the topology of cores and cpu-side caches, we have other properties
which affect scheduling performance (like the shared snoop filter, memory-side
caches, etc.) and which could/should be factored into the heuristics. Folding
all of these into a fixed set of defined topology constructs seems unlikely to
succeed.

It seems to me we are going to need to respond to a set of properties rather
than attempt to rigidly define what future topologies will be. Even terms like
die and package start to get fuzzy as more and more complex architectures get
pushed down into a single socket.

--
Darren Hart
Ampere Computing / OS and Kernel