lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aOco9dzjzcWJBNYh@aschofie-mobl2.lan>
Date: Wed, 8 Oct 2025 20:16:05 -0700
From: Alison Schofield <alison.schofield@...el.com>
To: "Fabio M. De Francesco" <fabio.m.de.francesco@...ux.intel.com>
CC: <linux-cxl@...r.kernel.org>, Davidlohr Bueso <dave@...olabs.net>,
	"Jonathan Cameron" <jonathan.cameron@...wei.com>, Dave Jiang
	<dave.jiang@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny
	<ira.weiny@...el.com>, "Dan Williams" <dan.j.williams@...el.com>, Jonathan
 Corbet <corbet@....net>, <linux-doc@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Gregory Price <gourry@...rry.net>, Robert
 Richter <rrichter@....com>, Cheatham Benjamin <benjamin.cheatham@....com>
Subject: Re: [PATCH 2/4 v5] cxl/core: Add helpers to detect Low Memory Holes
 on x86

On Mon, Oct 06, 2025 at 05:58:05PM +0200, Fabio M. De Francesco wrote:
> On a x86 platform with a low memory hole (LHM), the BIOS may publish
> CFMWS that describes a system physical address (SPA) range that
> typically is only a subset of the corresponding CXL intermediate switch
> and endpoint decoder's host physical address (HPA) ranges. The CFMWS
> range never intersects the LHM and so the driver instantiates a root
> decoder whose HPA range size doesn't fully contain the matching switch
> and endpoint decoders' HPA ranges.[1]
> 
> To construct regions and attach decoders, the driver needs to match root
> decoders and regions with endpoint decoders. The process fails and
> returns errors because the driver is not designed to deal with SPA
> ranges which are smaller than the corresponding hardware decoders HPA
> ranges.
> 
> Introduce two functions that indirectly detect the presence of x86 LMH
> and allow the matching between a root decoder or an already constructed
> region with a corresponding intermediate switch or endpoint decoder to
> enable the construction of a region and the subsequent attachment of the
> same decoders to that region.
> 
> These functions return true when SPA/HPA misalignments due to LMH's are
> detected under specific conditions:
> 
> - Both the SPA and HPA ranges must start at LMH_CFMWS_RANGE_START (i.e.,
>   0x0 on x86 with LMH's).
> - The SPA range's size is less than HPA's.
> - The SPA range's size is less than 4G.
> - The HPA range's size is aligned to the NIW * 256M rule.
> 
> Also introduce a function that adjusts the range end of a region to be
> constructed and the DPA range's end of the endpoint decoders that will
> be later attached to that region.

Hi Fabio,

Your getting some fresh eyes on some of this with my review.
The adjustment of resources is what caught my eye, and I looked at
platform_res_adjust() in this patch and it's usage in the next patch.


> 
> [1] commit 7a81173f3 ("cxl: Documentation/driver-api/cxl: Describe the x86 Low Memory Hole solution")
> 
> Cc: Alison Schofield <alison.schofield@...el.com>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: Dave Jiang <dave.jiang@...el.com>
> Cc: Ira Weiny <ira.weiny@...el.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@...ux.intel.com>
> ---
>  drivers/cxl/Kconfig                |  4 ++
>  drivers/cxl/core/Makefile          |  1 +
>  drivers/cxl/core/platform_quirks.c | 99 ++++++++++++++++++++++++++++++
>  drivers/cxl/core/platform_quirks.h | 33 ++++++++++
>  4 files changed, 137 insertions(+)
>  create mode 100644 drivers/cxl/core/platform_quirks.c
>  create mode 100644 drivers/cxl/core/platform_quirks.h
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 48b7314afdb8..03c0583bc9a3 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -211,6 +211,10 @@ config CXL_REGION
>  
>  	  If unsure say 'y'
>  
> +config CXL_PLATFORM_QUIRKS
> +	def_bool y
> +	depends on CXL_REGION
> +

Why no help text for the new CONFIG option?
That text will probably answer my next question: why do we have the
option?

snip


I have comments for the callsites of platform_res_adjust() in the next
patch, but I'll pull some of that back into this patch to keep it all
in one, more logical, place.

There are 2 callsites, and one passes in NULL for 'res' because
at that site we know that the regions struct res has been adjusted.
I felt that was subtle, and that it may be better to just pass in
the 'res' all the time and let the function adjust if needed,
ignore if not needed.

The name platform_res_adjust() suggested that the 'res' as in the
region 'res' was getting adjusted. This is adjusting multiple resources
- the region resource and the endpoint decoder dpa resource. If it's
meant to be kind of opaque, that's ok, but by using _res_ it sure sounds
like it's adjusting the the region resource (when viewed from the call site).

I might have done this in 2 helpers for crispness:
res = platform_adjust_region_resource()
cxled = platform_adjust_endpoint_decoder()

Then you could adjust the region resource once when the region
is constructed, and the endpoint regions every time in 
cxl_add_to_region().

If you are settled with one adjust routine, perhaps just a 
rename to platform_adjust_resources() will make it sound as
broad as it is.


> +void platform_res_adjust(struct resource *res,
> +			 struct cxl_endpoint_decoder *cxled,
> +			 const struct cxl_root_decoder *cxlrd)
> +{
> +	if (!platform_cxlrd_matches_cxled(cxlrd, cxled))
> +		return;
> +
> +	guard(rwsem_write)(&cxl_rwsem.dpa);
> +	dev_dbg(cxled_to_memdev(cxled)->dev.parent,
> +		"Low Memory Hole detected. Resources were (%s: %pr, %pr)\n",
> +		dev_name(&cxled->cxld.dev), res, cxled->dpa_res);
> +	if (res) {
> +		/* Trim region resource overlap with LMH */
> +		res->end = cxlrd->res->end;
> +	}

Prefer dev_info so always appears.
Prefer to see the region name.
I'm guessing the dev_dbg() above and the dev_info() below are written
with the idea that we want the before view only in dev_dbg() and the
after view only in dev_info().

Looks like this now:
[] cxl_core:platform_res_adjust:90: cxl_mock_mem cxl_mem.0: Low Memory Hole detected. Resources were (decoder12.0: [mem 0x3ff010000000-0x3ff04fffffff flags 0x200], [mem 0x00000000-0x1fffffff flags 0x80000200])
[] cxl_mock_mem cxl_mem.0: Resources have been adjusted for LMH (decoder12.0: [mem 0x3ff010000000-0x3ff03fffffff flags 0x200], [mem 0x00000000-0x17ffffff flags 0x80000200])
[] cxl_core:platform_res_adjust:90: cxl_mock_mem cxl_mem.4: Low Memory Hole detected. Resources were (decoder13.0: (null), [mem 0x00000000-0x1fffffff flags 0x80000200])
[] cxl_mock_mem cxl_mem.4: Resources have been adjusted for LMH (decoder13.0: (null), [mem 0x00000000-0x17ffffff flags 0x80000200])

I'll suggest this to emit explicitly what is changing:
[] cxl region0: LMH Low memory hole trims region resource [mem 0x3ff010000000-0x3ff04fffffff flags 0x200] to [mem 0x3ff010000000-0x3ff03fffffff flags 0x200])
[] cxl decoder13.0: LMH Low memory hole trims DPA resource [mem 0x00000000-0x1fffffff flags 0x80000200] to [mem 0x00000000-0x17ffffff flags 0x80000200])
[] cxl decoder17.0: LMH Low memory hole trims DPA resource [mem 0x00000000-0x1fffffff flags 0x80000200] to [mem 0x00000000-0x17ffffff flags 0x80000200])


> +	/* Match endpoint decoder's DPA resource to root decoder's */
A 'Match' would be if the the endpoint and root decoder resource were
same. This is more of adjustment or recalculation of the DPA length.

> +	cxled->dpa_res->end =
> +		cxled->dpa_res->start +
> +		resource_size(cxlrd->res) / cxled->cxld.interleave_ways - 1;

I'm cautious about the use of division and suggest this as the more
bullet-proof kernel style:

	slice = div_u64(resource_size(cxlrd->res), cxled->cxld.interleave_ways);
	cxled->dpa_res->end = cxled->dpa_res->start + slice - 1;



> +	dev_info(cxled_to_memdev(cxled)->dev.parent,
> +		 "Resources have been adjusted for LMH (%s: %pr, %pr)\n",
> +		 dev_name(&cxled->cxld.dev), res, cxled->dpa_res);
> +}

Here's the diff showing how I emmited the that messaging above. I really
wanted to have that region name to emit. This was done keeping the
adjust in one function, but maybe you'll choose to split :)


---
 drivers/cxl/core/platform_quirks.c | 32 ++++++++++++++++++------------
 drivers/cxl/core/platform_quirks.h |  6 ++++--
 drivers/cxl/core/region.c          | 15 ++++++++------
 3 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/drivers/cxl/core/platform_quirks.c b/drivers/cxl/core/platform_quirks.c
index aecd376f2766..aa25770c088a 100644
--- a/drivers/cxl/core/platform_quirks.c
+++ b/drivers/cxl/core/platform_quirks.c
@@ -81,24 +81,30 @@ EXPORT_SYMBOL_NS_GPL(__platform_region_matches_cxld, "CXL");
 
 void platform_res_adjust(struct resource *res,
 			 struct cxl_endpoint_decoder *cxled,
-			 const struct cxl_root_decoder *cxlrd)
+			 const struct cxl_root_decoder *cxlrd,
+			 const struct device *region_dev)
 {
+	struct resource dpa_res_orig = *cxled->dpa_res;
+	u64 slice;
+
 	if (!platform_cxlrd_matches_cxled(cxlrd, cxled))
 		return;
 
 	guard(rwsem_write)(&cxl_rwsem.dpa);
-	dev_dbg(cxled_to_memdev(cxled)->dev.parent,
-		"Low Memory Hole detected. Resources were (%s: %pr, %pr)\n",
-		dev_name(&cxled->cxld.dev), res, cxled->dpa_res);
-	if (res) {
-		/* Trim region resource overlap with LMH */
+
+	/* Region resource will need a trim at first endpoint attach only */
+	if ((res) && (res->end != cxlrd->res->end)) {
+		dev_info(region_dev,
+			 "LMH Low memory hole trims region resource %pr to %pr)\n",
+			 res, cxlrd->res);
 		res->end = cxlrd->res->end;
 	}
-	/* Match endpoint decoder's DPA resource to root decoder's */
-	cxled->dpa_res->end =
-		cxled->dpa_res->start +
-		resource_size(cxlrd->res) / cxled->cxld.interleave_ways - 1;
-	dev_info(cxled_to_memdev(cxled)->dev.parent,
-		 "Resources have been adjusted for LMH (%s: %pr, %pr)\n",
-		 dev_name(&cxled->cxld.dev), res, cxled->dpa_res);
+
+	/* Adjust the endpoint decoder DPA resource end */
+	slice = div_u64(resource_size(cxlrd->res), cxled->cxld.interleave_ways);
+	cxled->dpa_res->end = cxled->dpa_res->start + slice - 1;
+
+	dev_info(&cxled->cxld.dev,
+		 "LMH Low memory hole trims DPA resource %pr to %pr)\n",
+		 &dpa_res_orig, cxled->dpa_res);
 }
diff --git a/drivers/cxl/core/platform_quirks.h b/drivers/cxl/core/platform_quirks.h
index bdea00365dad..55647711cdb4 100644
--- a/drivers/cxl/core/platform_quirks.h
+++ b/drivers/cxl/core/platform_quirks.h
@@ -17,7 +17,8 @@ bool __platform_region_matches_cxld(const struct cxl_region_params *p,
 				    const struct cxl_decoder *cxld);
 void platform_res_adjust(struct resource *res,
 			 struct cxl_endpoint_decoder *cxled,
-			 const struct cxl_root_decoder *cxlrd);
+			 const struct cxl_root_decoder *cxlrd,
+			 const struct device *region_dev);
 #else
 static inline bool
 platform_cxlrd_matches_cxled(const struct cxl_root_decoder *cxlrd,
@@ -35,7 +36,8 @@ platform_region_matches_cxld(const struct cxl_region_params *p,
 
 inline void platform_res_adjust(struct resource *res,
 				struct cxl_endpoint_decoder *cxled,
-				const struct cxl_root_decoder *cxlrd)
+				const struct cxl_root_decoder *cxlrd,
+				const struct device *region_dev);
 {
 }
 #endif /* CONFIG_CXL_PLATFORM_QUIRKS */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 9a499bfca23d..d4298a61b912 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3502,7 +3502,7 @@ static int __construct_region(struct cxl_region *cxlr,
 	 * Trim the HPA retrieved from hardware to fit the SPA mapped by the
 	 * platform
 	 */
-	platform_res_adjust(res, cxled, cxlrd);
+	platform_res_adjust(res, cxled, cxlrd, &cxlr->dev);
 
 	rc = cxl_extended_linear_cache_resize(cxlr, res);
 	if (rc && rc != -EOPNOTSUPP) {
@@ -3611,14 +3611,17 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 	mutex_lock(&cxlrd->range_lock);
 	struct cxl_region *cxlr __free(put_cxl_region) =
 		cxl_find_region_by_range(cxlrd, cxled);
-	if (!cxlr)
+	if (!cxlr) {
 		cxlr = construct_region(cxlrd, cxled);
-	else
+	} else {
 		/*
-		 * Adjust the Endpoint Decoder's dpa_res to fit the Region which
-		 * it has to be attached to
+		 * Platform adjustments are done in construct_region()
+		 * for first target, and here for additional targets.
 		 */
-		platform_res_adjust(NULL, cxled, cxlrd);
+		p = &cxlr->params;
+		platform_res_adjust(p->res, cxled, cxlrd, &cxlr->dev);
+	}
+
 	mutex_unlock(&cxlrd->range_lock);
 
 	rc = PTR_ERR_OR_ZERO(cxlr);
-- 
2.37.3

> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ