Message-ID: <aHaBdj0QrEe_gymR@aschofie-mobl2.lan>
Date: Tue, 15 Jul 2025 09:27:34 -0700
From: Alison Schofield <alison.schofield@...el.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<nvdimm@...ts.linux.dev>, <linux-fsdevel@...r.kernel.org>,
	<linux-pm@...r.kernel.org>, Davidlohr Bueso <dave@...olabs.net>, "Jonathan
 Cameron" <jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>,
	Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
	"Dan Williams" <dan.j.williams@...el.com>, Matthew Wilcox
	<willy@...radead.org>, Jan Kara <jack@...e.cz>, "Rafael J . Wysocki"
	<rafael@...nel.org>, Len Brown <len.brown@...el.com>, Pavel Machek
	<pavel@...nel.org>, Li Ming <ming.li@...omail.com>, Jeff Johnson
	<jeff.johnson@....qualcomm.com>, "Ying Huang" <huang.ying.caritas@...il.com>,
	Yao Xingtao <yaoxt.fnst@...itsu.com>, Peter Zijlstra <peterz@...radead.org>,
	Greg KH <gregkh@...uxfoundation.org>, Nathan Fontenot
	<nathan.fontenot@....com>, Terry Bowman <terry.bowman@....com>, Robert
 Richter <rrichter@....com>, Benjamin Cheatham <benjamin.cheatham@....com>,
	PradeepVineshReddy Kodamati <PradeepVineshReddy.Kodamati@....com>, Zhijian Li
	<lizhijian@...itsu.com>
Subject: Re: [PATCH v4 7/7] cxl/dax: Defer DAX consumption of SOFT RESERVED
 resources until after CXL region creation

On Tue, Jun 03, 2025 at 10:19:49PM +0000, Smita Koralahalli wrote:
> From: Nathan Fontenot <nathan.fontenot@....com>
> 
> The DAX HMEM driver currently consumes all SOFT RESERVED iomem resources
> during initialization. This interferes with the CXL driver’s ability to
> create regions and trim overlapping SOFT RESERVED ranges before DAX uses
> them.
> 
> To resolve this, defer the DAX driver's resource consumption if the
> cxl_acpi driver is enabled. The DAX HMEM initialization skips walking the
> iomem resource tree in this case. After CXL region creation completes,
> any remaining SOFT RESERVED resources are explicitly registered with the
> DAX driver by the CXL driver.
> 
> This sequencing ensures proper handling of overlaps and fixes hotplug
> failures.

Hi Smita,

This is about the issue I first mentioned here [1]. The HMEM driver is not
waiting for region probe to finish. By the time region probe attempts
to hand off the memory to DAX, the memory is already marked as System RAM.

See 'case CXL_PARTMODE_RAM:' in cxl_region_probe(). The is_system_ram()
test fails, so devm_cxl_add_dax_region() is not possible.
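
The ordering problem described above can be sketched as a small userspace
model. This is not the kernel code: every name below (struct range,
model_region_probe(), disable_region_unregisters_dax(), the dax_owner enum)
is hypothetical and only mirrors the behaviour reported in this message,
under the assumption that whoever onlines the memory first ends up owning
the dax device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are NOT kernel structures or APIs. */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct resource */
};

enum dax_owner { DAX_OWNER_HMEM, DAX_OWNER_CXL_REGION };

/* True if two inclusive ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * Model of the RAM-mode gate in region probe: if any part of the region
 * is already online as System RAM, the region driver does not create its
 * own dax region, so the earlier consumer (HMEM in this thread) keeps it.
 */
static enum dax_owner model_region_probe(struct range region,
					 const struct range *online_ram,
					 size_t nr_online)
{
	assert(region.start <= region.end);

	for (size_t i = 0; i < nr_online; i++) {
		if (ranges_overlap(region, online_ram[i]))
			return DAX_OWNER_HMEM;
	}
	return DAX_OWNER_CXL_REGION;
}

/*
 * Only a dax device set up by the region driver is torn down when the
 * region is disabled; one set up elsewhere has to be destroyed by hand.
 */
static bool disable_region_unregisters_dax(enum dax_owner owner)
{
	return owner == DAX_OWNER_CXL_REGION;
}
```

In this model, a region whose range is already online before probe runs ends
up with DAX_OWNER_HMEM, and disabling the region then leaves the dax device
in place.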

This means that on the surface, just looking at /proc/iomem, this
seems to have worked. There is no Soft Reserved entry, and the dax and
kmem resources are child resources of the region resource. But they
were not set up by the region driver, hence no unregister callback
is triggered when the region is disabled.

It appears like this:

c080000000-17dbfffffff : CXL Window 0
  c080000000-c47fffffff : region2
    c080000000-c47fffffff : dax0.0
      c080000000-c47fffffff : System RAM (kmem)

Now, to make the memory available for reuse, I need to do:
# daxctl offline-memory dax0.0
# daxctl destroy-device --force dax0.0
# cxl disable-region 2
# cxl destroy-region 2

Whereas previously, I did this:
# daxctl offline-memory dax0.0
# cxl disable-region 2
  (After disabling the region, the dax device was unregistered.)
# cxl destroy-region 2

I do see that __cxl_region_softreserv_update() is not called until
after cxl_region_probe() completes, so that is waiting properly to
pick up the scraps. I'm actually not sure there would be any scraps
though, if the HMEM driver has already done its thing. In my case
the Soft Reserved size is the same as the region, so I cannot tell what
would happen if that Soft Reserved had more capacity than the region.
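
To make the "scraps" case concrete, here is a minimal userspace sketch of
what would be left over if a Soft Reserved range were larger than the region
carved out of it. It assumes trimming removes exactly the bytes covered by
the region; struct span and soft_reserved_leftovers() are hypothetical names,
not functions from the patch or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive address span; NOT a kernel structure. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Store in out[] the pieces of 'soft' that 'region' does not cover and
 * return how many there are (0, 1 or 2). These are the ranges that would
 * remain to be registered with DAX after the region is created.
 */
static int soft_reserved_leftovers(struct span soft, struct span region,
				   struct span out[2])
{
	int n = 0;

	assert(soft.start <= soft.end && region.start <= region.end);

	/* No overlap at all: the whole Soft Reserved range is left over. */
	if (region.end < soft.start || region.start > soft.end) {
		out[n++] = soft;
		return n;
	}

	/* Piece below the region, if any. */
	if (soft.start < region.start)
		out[n++] = (struct span){ soft.start, region.start - 1 };

	/* Piece above the region, if any. */
	if (soft.end > region.end)
		out[n++] = (struct span){ region.end + 1, soft.end };

	return n;
}
```

With the ranges from the /proc/iomem output above (Soft Reserved equal to
region2), this returns 0, which matches the "no scraps" observation. A
larger Soft Reserved range would yield one or two leftover spans.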

If I build with '# CONFIG_DEV_DAX_HMEM is not set', it works the same as
before, which is as expected.

Let me know if I can try anything else out or collect more info.

--Alison


[1] https://lore.kernel.org/nvdimm/20250603221949.53272-1-Smita.KoralahalliChannabasappa@amd.com/T/#m10c0eb7b258af7cd0c84c7ee2c417c055724f921


> 
> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@....com>
> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@....com>
> Co-developed-by: Terry Bowman <terry.bowman@....com>
> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
> ---
>  drivers/cxl/core/region.c | 10 +++++++++
>  drivers/dax/hmem/device.c | 43 ++++++++++++++++++++-------------------
>  drivers/dax/hmem/hmem.c   |  3 ++-
>  include/linux/dax.h       |  6 ++++++
>  4 files changed, 40 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3a5ca44d65f3..c6c0c7ba3b20 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -10,6 +10,7 @@
>  #include <linux/sort.h>
>  #include <linux/idr.h>
>  #include <linux/memory-tiers.h>
> +#include <linux/dax.h>
>  #include <cxlmem.h>
>  #include <cxl.h>
>  #include "core.h"
> @@ -3553,6 +3554,11 @@ static struct resource *normalize_resource(struct resource *res)
>  	return NULL;
>  }
>  
> +static int cxl_softreserv_mem_register(struct resource *res, void *unused)
> +{
> +	return hmem_register_device(phys_to_target_node(res->start), res);
> +}
> +
>  static int __cxl_region_softreserv_update(struct resource *soft,
>  					  void *_cxlr)
>  {
> @@ -3590,6 +3596,10 @@ int cxl_region_softreserv_update(void)
>  				    __cxl_region_softreserv_update);
>  	}
>  
> +	/* Now register any remaining SOFT RESERVES with DAX */
> +	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED, IORESOURCE_MEM,
> +			    0, -1, NULL, cxl_softreserv_mem_register);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_region_softreserv_update, "CXL");
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 59ad44761191..cc1ed7bbdb1a 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -8,7 +8,6 @@
>  static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
>  
> -static bool platform_initialized;
>  static DEFINE_MUTEX(hmem_resource_lock);
>  static struct resource hmem_active = {
>  	.name = "HMEM devices",
> @@ -35,9 +34,7 @@ EXPORT_SYMBOL_GPL(walk_hmem_resources);
>  
>  static void __hmem_register_resource(int target_nid, struct resource *res)
>  {
> -	struct platform_device *pdev;
>  	struct resource *new;
> -	int rc;
>  
>  	new = __request_region(&hmem_active, res->start, resource_size(res), "",
>  			       0);
> @@ -47,21 +44,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>  	}
>  
>  	new->desc = target_nid;
> -
> -	if (platform_initialized)
> -		return;
> -
> -	pdev = platform_device_alloc("hmem_platform", 0);
> -	if (!pdev) {
> -		pr_err_once("failed to register device-dax hmem_platform device\n");
> -		return;
> -	}
> -
> -	rc = platform_device_add(pdev);
> -	if (rc)
> -		platform_device_put(pdev);
> -	else
> -		platform_initialized = true;
>  }
>  
>  void hmem_register_resource(int target_nid, struct resource *res)
> @@ -83,9 +65,28 @@ static __init int hmem_register_one(struct resource *res, void *data)
>  
>  static __init int hmem_init(void)
>  {
> -	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
> -			IORESOURCE_MEM, 0, -1, NULL, hmem_register_one);
> -	return 0;
> +	struct platform_device *pdev;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_CXL_ACPI)) {
> +		walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
> +				    IORESOURCE_MEM, 0, -1, NULL,
> +				    hmem_register_one);
> +	}
> +
> +	pdev = platform_device_alloc("hmem_platform", 0);
> +	if (!pdev) {
> +		pr_err("failed to register device-dax hmem_platform device\n");
> +		return -1;
> +	}
> +
> +	rc = platform_device_add(pdev);
> +	if (rc) {
> +		pr_err("failed to add device-dax hmem_platform device\n");
> +		platform_device_put(pdev);
> +	}
> +
> +	return rc;
>  }
>  
>  /*
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 3aedef5f1be1..a206b9b383e4 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -61,7 +61,7 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> -static int hmem_register_device(int target_nid, const struct resource *res)
> +int hmem_register_device(int target_nid, const struct resource *res)
>  {
>  	struct device *host = &dax_hmem_pdev->dev;
>  	struct platform_device *pdev;
> @@ -124,6 +124,7 @@ static int hmem_register_device(int target_nid, const struct resource *res)
>  	platform_device_put(pdev);
>  	return rc;
>  }
> +EXPORT_SYMBOL_GPL(hmem_register_device);
>  
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index a4ad3708ea35..5052dca8b3bc 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -299,10 +299,16 @@ static inline int dax_mem2blk_err(int err)
>  
>  #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
>  void hmem_register_resource(int target_nid, struct resource *r);
> +int hmem_register_device(int target_nid, const struct resource *res);
>  #else
>  static inline void hmem_register_resource(int target_nid, struct resource *r)
>  {
>  }
> +
> +static inline int hmem_register_device(int target_nid, const struct resource *res)
> +{
> +	return 0;
> +}
>  #endif
>  
>  typedef int (*walk_hmem_fn)(int target_nid, const struct resource *res);
> -- 
> 2.17.1
> 
