Message-ID: <67ec7814ea055_73d8294e0@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 1 Apr 2025 16:34:44 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Gregory Price <gourry@...rry.net>, Dan Williams <dan.j.williams@...el.com>
CC: "Fabio M. De Francesco" <fabio.m.de.francesco@...ux.intel.com>, "Davidlohr
Bueso" <dave@...olabs.net>, Jonathan Cameron <jonathan.cameron@...wei.com>,
Dave Jiang <dave.jiang@...el.com>, Alison Schofield
<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, "Ira
Weiny" <ira.weiny@...el.com>, Robert Richter <rrichter@....com>,
<ming.li@...omail.com>, <linux-kernel@...r.kernel.org>,
<linux-cxl@...r.kernel.org>
Subject: Re: [PATCH 2/4 v2] cxl/core: Add helpers to detect Low memory Holes
on x86
Gregory Price wrote:
> On Tue, Apr 01, 2025 at 01:32:33PM -0700, Dan Williams wrote:
> > Gregory Price wrote:
> > > Is there a reason not to handle more than just LMH's in this set?
> >
> > This discussion was referenced recently on an IM and I wanted to share
> > my response to it:
> >
> > The rules for when to apply this memory hole quirk are explicit and
> > suitable to add to the CXL specification. I want the same standard for
> > any other quirk and ideally some proof-of-work to get that quirk
> > recognized by the specification. Otherwise, I worry that generalizing
> > for all the possible ways that platform BIOS tries to be clever means we
> > end up with something that has no rules.
> >
> > The spec is there to allow software to delineate valid configurations vs
> > mistakes, and this slow drip of "Linux does not understand this platform
> > configuration" is a spec gap.
>
> Note: I've since come around to understand the whole ecosystem a bit
> better since I wrote this response.
Yes, I should have acknowledged shifts in understanding since this
thread went quiet. Fabio was about to spin this set again to add more
"generalization" and I wanted to clarify my current thinking that
generalization is the opposite of what should happen here.
> I don't know that it's needed.
Referring to spec changes? I think they are needed, see below.
> Some of the explanation of this patch series is a bit confusing. It
> justifies itself by saying CFMWS don't intersect memory holes and that
> endpoint decoders have to be 256MB aligned.
>
> /*
> * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges.
> *
> * On x86, CFMWS ranges never intersect memory holes while endpoint decoders
> * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore,
> * the given endpoint decoder HPA range size is always expected aligned and
> * also larger than that of the matching root decoder. If there are LMH's,
> * the root decoder range end is always less than SZ_4G.
> */
>
> But per the spec, CFMWS is also required to be aligned to 256MB.
Right, something has to give, i.e. "spec meets reality". Hardware
endpoint decoders must be aligned, that is a shipping expectation, and
endpoints are not in a position to know or care about host platform
constraints. In contrast, the CFMWS definition runs into a practical
problem meeting the same expectation given competing host physical
memory map constraints.
The platforms with this condition want to support CXL mapped starting at
zero *and* the typical/historical PCI MMIO space in low memory (for
those PCI devices that do not support 64-bit addressing). If the CFMWS
blindly followed the 256MB*NIW constraint the CXL window would overlap
the MMIO space. So the choices are:
1/ Give up on mapping CXL starting at zero when 256MB * NIW does not fit

2/ Give up on maintaining historical availability of and compatibility
   with 32-bit only PCI devices (PCI configuration regression)

3/ Trim CFMWS to match the reality that the platform will always route
   memory cycles in that PCI MMIO range to PCI MMIO and never to CXL.

4/ Define some new protocol for when CFMWS is explicitly countermanded
   by other platform resource descriptors, and not a BIOS bug.

The platform in question chose option 3.
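
To make the arithmetic behind those choices concrete, here is a minimal
sketch. The helper names and the 2GB MMIO hole start are assumptions made
up for illustration; they are not taken from the patch or from any
particular platform.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define SZ_1G   (1ULL << 30)

/* Smallest multiple of (niw * 256MB) that covers 'size' bytes. */
static uint64_t aligned_decoder_size(uint64_t size, unsigned int niw)
{
	uint64_t granule = (uint64_t)niw * SZ_256M;

	return (size + granule - 1) / granule * granule;
}

/*
 * Bytes by which an aligned window starting at zero would run past
 * 'hole_start', i.e. into the low PCI MMIO space.
 */
static uint64_t overlap_with_hole(uint64_t hole_start, unsigned int niw)
{
	return aligned_decoder_size(hole_start, niw) - hole_start;
}

/* Worked example with a hypothetical MMIO hole starting at 2GB. */
void lmh_arithmetic_example(void)
{
	/* NIW = 3: the granule is 768MB, so 2GB rounds up to 2304MB. */
	assert(aligned_decoder_size(2 * SZ_1G, 3) == (2304ULL << 20));
	/* The aligned window would claim 256MB of the MMIO range. */
	assert(overlap_with_hole(2 * SZ_1G, 3) == SZ_256M);
	/* NIW = 4: the granule is 1GB, 2GB is already aligned, no overlap. */
	assert(overlap_with_hole(2 * SZ_1G, 4) == 0);
}
```

With NIW = 3 in this made-up layout, option 3 amounts to publishing a
2GB CFMWS while the endpoint decoders still cover 2304MB.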
> Shouldn't the platform work around memory holes to generate multiple
> CFMWS for the entire capacity, and then use multiple endpoint decoders
> (1 per CFMWS) to map the capacity accordingly?
Per above, the maths do not work out to be able to support that relative
to a CXL region with problematic NIW.
> (Also, I still don't understand the oracle value of <4GB address range.
> It seems like if this is some quirk of SPA vs HPA alignment, then it
> can hold for *all* occurrences, not just stuff below 4GB)
The goal is to get platform vendors to define the rules so that an OS
has a reasonable expectation to know what is a valid vs invalid
configuration. A hole above 4GB has no reason to exist: there is no
resource conflict like PCI MMIO that explains why the typical spec
expectation cannot be met.
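
As a rough sketch of what an explicit rule could look like, the
conditions from the quoted comment can be written as one predicate. This
is not the code from the patch: the struct, the function name, and the
"same start address" condition are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define SZ_4G   (4ULL << 30)

/* Hypothetical address range with an inclusive end. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/*
 * True when a trimmed root decoder (CFMWS) range and a larger endpoint
 * decoder range look like the low-memory-hole case described above.
 */
static bool lmh_quirk_applies(struct addr_range root, struct addr_range ep,
			      unsigned int niw)
{
	uint64_t granule = (uint64_t)niw * SZ_256M;
	uint64_t ep_size = ep.end - ep.start + 1;

	if (niw == 0)
		return false;

	return root.start == ep.start &&   /* assumed: both share a base */
	       root.end < SZ_4G &&         /* window ends in low memory */
	       root.end < ep.end &&        /* decoder extends past the window */
	       ep_size % granule == 0;     /* decoder size is NIW * 256MB aligned */
}
```

A range pair above 4GB fails the second condition, which matches the
point that only the low-memory case has a resource conflict to justify it.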
So I want the subsystem to have an explicit set of platform quirks
ideally backed up by updated spec language. That allows us to validate
that the Linux implementation is correct by some objective source of
truth, encourage platform vendors to update that source of truth when
they create new corner cases, or even better, be more mindful to not
create new corner cases.