Message-ID: <Z9MWUhHmZ5ND0b_e@gourry-fedora-PF4VCD3F>
Date: Thu, 13 Mar 2025 13:30:58 -0400
From: Gregory Price <gourry@...rry.net>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: lsf-pc@...ts.linux-foundation.org, linux-mm@...ck.org,
linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [LSF/MM] CXL Boot to Bash - Section 0: ACPI and Linux Resources
On Thu, Mar 13, 2025 at 04:55:39PM +0000, Jonathan Cameron wrote:
>
> Maybe ignore Generic Initiators for this doc. They are relevant for
> CXL but in the fabric they only matter for type 1 / 2 devices not
> memory and only if the BIOS wants to do HMAT for end to end. Gets
> more fun when they are in the host side of the root bridge.
>
Fair, I wanted to reference the proposals but I personally don't have a
strong understanding of this yet. Dave Jiang mentioned wanting to write
some info on CDAT with some reference to the Generic Port work as well.
Some help understanding this a little better would be very much
appreciated, but I like your summary below. Noted for updated version.
> # Generic Port
>
> In the scenario where CXL memory devices are not present at boot, are
> not configured by the BIOS, or the BIOS has not provided full HMAT
> descriptions for the configured memory, we may still want to
> generate proximity domain configurations for those devices.
> The Generic Port structures are intended to fill this gap, so
> that performance information can still be utilized when the
> devices are available at runtime by combining host information
> with that discovered from devices.
>
> Or just
> # Generic Ports
>
> These are fun ;)
>
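Might be worth a tiny worked example in the doc for that last bit: the
host describes CPU-to-generic-port performance via SRAT/HMAT, the device
describes port-to-media performance via CDAT, and the end-to-end numbers
are just combined. Something like this (numbers and code made up purely
for illustration, not the kernel's actual code):

/* Made-up illustration of "combine host + device performance". */
#include <stdio.h>

struct perf {
        unsigned int read_lat_ns;   /* access latency in nanoseconds */
        unsigned int read_bw_mbps;  /* bandwidth in MB/s */
};

int main(void)
{
        /* Host side: CPU (initiator) to the Generic Port at the host
         * bridge, as described by SRAT Generic Port Affinity + HMAT. */
        struct perf host = { .read_lat_ns = 100, .read_bw_mbps = 64000 };

        /* Device side: port to media, as reported by the device's CDAT. */
        struct perf dev  = { .read_lat_ns = 150, .read_bw_mbps = 32000 };

        /* End to end: latencies add, bandwidth is the bottleneck. */
        struct perf total = {
                .read_lat_ns  = host.read_lat_ns + dev.read_lat_ns,
                .read_bw_mbps = host.read_bw_mbps < dev.read_bw_mbps ?
                                host.read_bw_mbps : dev.read_bw_mbps,
        };

        printf("end-to-end: %u ns, %u MB/s\n",
               total.read_lat_ns, total.read_bw_mbps);
        return 0;
}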
> >
> > ====
> > HMAT
> > ====
> > The Heterogeneous Memory Attributes Table contains information such as
> > cache attributes and bandwidth and latency details for memory proximity
> > domains. For the purpose of this document, we will only discuss the
> > SLLBIS entry.
>
> No fun. You miss Intel's extensions to memory-side caches ;)
> (which is wise!)
>
Yes yes, but I'm trying to be nice. I'm debating whether to write the
Section 4 interleave addendum on Zen5 too :P
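On the SLLBIS itself, the updated version will probably show roughly what
one entry carries. A simplified sketch (not the exact ACPI field layout or
widths - see the spec for that):

/* Rough conceptual shape of an HMAT SLLBIS entry. */
#include <stdint.h>

struct sllbis_sketch {
        uint8_t  data_type;            /* access/read/write latency or bandwidth */
        uint32_t nr_initiator_pds;     /* initiator proximity domains */
        uint32_t nr_target_pds;        /* target proximity domains */
        uint64_t entry_base_unit;      /* scale applied to each matrix entry */
        uint32_t lists_and_entries[];  /* initiator PD list, target PD list,
                                          then an initiator x target matrix */
};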
> > ==================
> > NUMA node creation
> > ==================
> > NUMA nodes are *NOT* hot-pluggable. All *POSSIBLE* NUMA nodes are
> > identified at `__init` time, more specifically during `mm_init`.
> >
> > What this means is that the CEDT and SRAT must contain sufficient
> > `proximity domain` information for linux to identify how many NUMA
> > nodes are required (and what memory regions to associate with them).
>
> Is it worth talking about what is effectively a constraint of the spec
> and what is a Linux current constraint?
>
> SRAT is the only ACPI-defined way of getting proximity domains. Linux
> chooses to map those at most 1:1 with NUMA nodes.
> CEDT adds descriptions of SPA ranges where there might be memory that
> Linux might want to map to 1 or more NUMA nodes.
>
Rather than asking if it's worth talking about, I'll spin that around
and ask what value the distinction adds. The source of the constraint
seems less relevant than "All nodes must be defined during mm_init by
something - be it ACPI or CXL source data".
Maybe if this turns into a book, it's worth breaking it out for
referential purposes (pointing to each point in each spec).
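For the doc, a quick way to sanity check what mm_init actually produced on
a given box is to read back the standard node sysfs masks after boot, e.g.:

/* Dump which NUMA nodes the kernel considers possible vs online. */
#include <stdio.h>

static void dump(const char *path)
{
        char buf[256];
        FILE *f = fopen(path, "r");

        if (f && fgets(buf, sizeof(buf), f))
                printf("%s: %s", path, buf);
        if (f)
                fclose(f);
}

int main(void)
{
        dump("/sys/devices/system/node/possible");
        dump("/sys/devices/system/node/online");
        return 0;
}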
> >
> > Basically, the heuristic is as follows:
> > 1) Add one NUMA node per Proximity Domain described in SRAT
>
> if it contains, memory, CPU or generic initiator.
>
noted
> > 2) If the SRAT describes all memory described by all CFMWS
> > - do not create nodes for CFMWS
> > 3) If SRAT does not describe all memory described by CFMWS
> > - create a node for that CFMWS
> >
> > Generally speaking, you will see one NUMA node per Host bridge, unless
> > inter-host-bridge interleave is in use (see Section 4 - Interleave).
>
> I just love corners: QoS concerns might mean multiple CFMWS and hence
> multiple nodes per host bridge (feel free to ignore this one - has
> anyone seen this in the wild yet?) Similar mess for properties such
> as persistence, sharing etc.
This actually came up as a result of me writing this - it does exist
in the wild and is causing all kinds of fun with the weighted_interleave
functionality.
I plan to come back and add this as an addendum, but probably not until
after LSF.
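(To give a made-up flavor of the problem: if one host bridge ends up split
into two CFMWS-backed nodes, say nodes 2 and 3, and weighted interleave
assigns each of them weight 3 against DRAM node 0 at weight 6, that bridge
now takes 6 of every 12 interleaved pages rather than the 3 of every 9 it
would take as a single node - the per-node weights quietly shift the
per-bridge split. Numbers invented purely for illustration.)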
We'll probably want to expand this into a library of case studies that
cover these different choices - in hopes of getting some set of
*suggested* configurations for platform vendors to help play nice with
linux (especially for things that actually consume these blasted nodes).
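For that updated version I'll probably also fold in a pseudocode summary
of the node creation heuristic above - something like the sketch below
(my reading of it, not the actual drivers/acpi/numa code, all names made
up):

/* Sketch of the SRAT/CFMWS node-creation heuristic described above. */
#include <stdbool.h>
#include <stddef.h>

struct srat_pxm { int pxm; bool has_cpu, has_mem, has_gi; };
struct cfmws    { unsigned long long base, size; };

/* hypothetical helpers, assumed to exist for the sake of the sketch */
extern void add_possible_node(int pxm);
extern bool srat_describes_all_memory_of(const struct cfmws *w);
extern int  fake_pxm_for(const struct cfmws *w);

void build_possible_nodes(const struct srat_pxm *srat, size_t nr_srat,
                          const struct cfmws *win, size_t nr_cfmws)
{
        size_t i;

        /* 1) one node per SRAT proximity domain with CPU, memory, or GI */
        for (i = 0; i < nr_srat; i++)
                if (srat[i].has_cpu || srat[i].has_mem || srat[i].has_gi)
                        add_possible_node(srat[i].pxm);

        /* 2/3) only create a node for a CFMWS window whose memory is not
         * already described by SRAT */
        for (i = 0; i < nr_cfmws; i++)
                if (!srat_describes_all_memory_of(&win[i]))
                        add_possible_node(fake_pxm_for(&win[i]));
}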
~Gregory