Message-ID: <bf47567b-3277-48a3-a500-81b444cd165a@amd.com>
Date: Tue, 29 Jul 2025 08:48:41 -0700
From: "Koralahalli Channabasappa, Smita"
 <Smita.KoralahalliChannabasappa@....com>
To: dan.j.williams@...el.com, linux-cxl@...r.kernel.org,
 linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev,
 linux-fsdevel@...r.kernel.org, linux-pm@...r.kernel.org
Cc: Davidlohr Bueso <dave@...olabs.net>,
 Jonathan Cameron <jonathan.cameron@...wei.com>,
 Dave Jiang <dave.jiang@...el.com>,
 Alison Schofield <alison.schofield@...el.com>,
 Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
 Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
 "Rafael J . Wysocki" <rafael@...nel.org>, Len Brown <len.brown@...el.com>,
 Pavel Machek <pavel@...nel.org>, Li Ming <ming.li@...omail.com>,
 Jeff Johnson <jeff.johnson@....qualcomm.com>,
 Ying Huang <huang.ying.caritas@...il.com>,
 Yao Xingtao <yaoxt.fnst@...itsu.com>, Peter Zijlstra <peterz@...radead.org>,
 Greg KH <gregkh@...uxfoundation.org>,
 Nathan Fontenot <nathan.fontenot@....com>,
 Terry Bowman <terry.bowman@....com>, Robert Richter <rrichter@....com>,
 Benjamin Cheatham <benjamin.cheatham@....com>,
 PradeepVineshReddy Kodamati <PradeepVineshReddy.Kodamati@....com>,
 Zhijian Li <lizhijian@...itsu.com>
Subject: Re: [PATCH v5 3/7] cxl/acpi: Add background worker to coordinate with
 cxl_mem probe completion

Hi Dan,

On 7/23/2025 12:31 AM, dan.j.williams@...el.com wrote:
> Smita Koralahalli wrote:
>> Introduce a background worker in cxl_acpi to delay SOFT RESERVE handling
>> until the cxl_mem driver has probed at least one device. This coordination
>> ensures that DAX registration or fallback handling for soft-reserved
>> regions is not triggered prematurely.
>>
>> The worker waits on cxl_wait_queue, which is signaled via
>> cxl_mem_active_inc() during cxl_mem_probe(). Once at least one memory
>> device probe is confirmed, the worker invokes wait_for_device_probe()
>> to allow the rest of the CXL device hierarchy to complete initialization.
>>
>> Additionally, it handles initialization order issues where
>> cxl_acpi_probe() may complete before other drivers such as cxl_port or
>> cxl_mem have loaded, especially when cxl_acpi and cxl_port are built-in
>> and cxl_mem is a loadable module. In such cases, using only
>> wait_for_device_probe() is insufficient, as it may return before all
>> relevant probes are registered.
> 
> Right, but that problem is not solved by this: it still leaves open the
> decision of when to give up on this mechanism, and the mechanism does
> not tell you when follow-on probe work is complete.
> 
>> While region creation happens in cxl_port_probe(), waiting on
>> cxl_mem_active() would be sufficient as cxl_mem_probe() can only succeed
>> after the port hierarchy is in place. Furthermore, since cxl_mem depends
>> on cxl_pci, this also guarantees that cxl_pci has loaded by the time the
>> wait completes.
>>
>> As cxl_mem_active() infrastructure already exists for tracking probe
>> activity, cxl_acpi can use it without introducing new coordination
>> mechanisms.
> 
> I appreciate the instinct to not add anything new, but the module
> loading problem is solvable.
> 
> If the goal is: "I want to give device-dax a point at which it can make
> a go / no-go decision about whether the CXL subsystem has properly
> assembled all CXL regions implied by Soft Reserved intersecting with
> CXL Windows," then that is something like the below, only lightly
> tested and likely regressing the non-CXL case.
> 
> -- 8< --
>  From 48b25461eca050504cf5678afd7837307b2dd14f Mon Sep 17 00:00:00 2001
> From: Dan Williams <dan.j.williams@...el.com>
> Date: Tue, 22 Jul 2025 16:11:08 -0700
> Subject: [RFC PATCH] dax/cxl: Defer Soft Reserved registration
> 
> CXL and dax_hmem fight over how "Soft Reserved" (EFI Specific Purpose
> Memory) resources are published in the iomem resource tree. The Soft
> Reserved entry blocks some CXL hotplug flows, and CXL blocks dax_hmem
> from publishing the memory in the event that CXL fails to parse the
> platform configuration.
> 
> Toward resolving this conflict (the non-RFC version of this patch
> should split these into separate patches):
> 
> 1/ Defer publishing "Soft Reserved" entries in the iomem resource tree
>     until the consumer, dax_hmem, is ready to use them.
> 
> 2/ Fix detection of "Soft Reserved" vs "CXL Window" resource overlaps by
>     switching from MODULE_SOFTDEP() to request_module() for making sure that
>     cxl_acpi has had a chance to publish "CXL Window" resources.
> 
> 3/ Add cxl_pci to the list of modules that need to have had a chance to
>     scan boot devices such that wait_for_device_probe() flushes initial CXL
>     topology discovery.
> 
> 4/ Add a workqueue that delays consideration of "Soft Reserved" ranges
>     that overlap CXL Windows so that the CXL subsystem can complete all
>     of its region assembly.
> 
> For RFC purposes this only solves the reliability of the
> DAX_CXL_MODE_DROP case. DAX_CXL_MODE_REGISTER support can follow to
> shut down CXL in favor of vanilla DAX devices as an emergency fallback
> for platform configuration quirks and bugs.
> 
> Signed-off-by: Dan Williams <dan.j.williams@...el.com>
> ---
>   arch/x86/kernel/e820.c    |  2 +-
>   drivers/dax/hmem/device.c |  4 +-
>   drivers/dax/hmem/hmem.c   | 94 +++++++++++++++++++++++++++++++++------
>   include/linux/ioport.h    | 25 +++++++++++
>   kernel/resource.c         | 58 +++++++++++++++++++-----
>   5 files changed, 156 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index c3acbd26408b..aef1ff2cabda 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1153,7 +1153,7 @@ void __init e820__reserve_resources_late(void)
>   	res = e820_res;
>   	for (i = 0; i < e820_table->nr_entries; i++) {
>   		if (!res->parent && res->end)
> -			insert_resource_expand_to_fit(&iomem_resource, res);
> +			insert_resource_late(res);
>   		res++;
>   	}
>   
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index f9e1a76a04a9..22732b729017 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -83,8 +83,8 @@ static __init int hmem_register_one(struct resource *res, void *data)
>   
>   static __init int hmem_init(void)
>   {
> -	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
> -			IORESOURCE_MEM, 0, -1, NULL, hmem_register_one);
> +	walk_soft_reserve_res_desc(IORES_DESC_SOFT_RESERVED, IORESOURCE_MEM, 0,
> +				   -1, NULL, hmem_register_one);
>   	return 0;
>   }
>   
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 5e7c53f18491..0916478e3817 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -59,9 +59,45 @@ static void release_hmem(void *pdev)
>   	platform_device_unregister(pdev);
>   }
>   
> +static enum dax_cxl_mode {
> +	DAX_CXL_MODE_DEFER,
> +	DAX_CXL_MODE_REGISTER,
> +	DAX_CXL_MODE_DROP,
> +} dax_cxl_mode;
> +
> +static int handle_deferred_cxl(struct device *host, int target_nid,
> +				const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (dax_cxl_mode == DAX_CXL_MODE_DROP)
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);
> +	}
> +	return 0;
> +}
> +
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
> +static void process_defer_work(struct work_struct *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +	struct platform_device *pdev = work->pdev;
> +
> +	/* relies on cxl_acpi and cxl_pci having had a chance to load */
> +	wait_for_device_probe();
> +
> +	dax_cxl_mode = DAX_CXL_MODE_DROP;
> +
> +	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
> +}
> +
>   static int hmem_register_device(struct device *host, int target_nid,
>   				const struct resource *res)
>   {
> +	struct dax_defer_work *work = dev_get_drvdata(host);
>   	struct platform_device *pdev;
>   	struct memregion_info info;
>   	long id;
> @@ -70,14 +106,21 @@ static int hmem_register_device(struct device *host, int target_nid,
>   	if (IS_ENABLED(CONFIG_CXL_REGION) &&
>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>   			      IORES_DESC_CXL) != REGION_DISJOINT) {

I may be wrong here, but could this check fail? While request_module()
ensures that cxl_acpi and cxl_pci are requested for loading, it does not
guarantee that either has completed initialization, or that region
enumeration (i.e., add_cxl_resources()) has finished by the time we
reach this check.

We also haven't called wait_for_device_probe() at this point, which is 
typically used to block until all pending device probes are complete.
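
Something like the below (an untested sketch, just to spell out the
window I am worried about) would treat the first pass as always-deferred
when CXL is enabled, so that the IORES_DESC_CXL lookup only happens
after the worker has flushed probing:

	static int hmem_register_device(struct device *host, int target_nid,
					const struct resource *res)
	{
		struct dax_defer_work *work = dev_get_drvdata(host);
		...
		/*
		 * Do not trust region_intersects() on the first pass:
		 * cxl_acpi may not have inserted its "CXL Window"
		 * resources yet, even though request_module() has
		 * already returned.
		 */
		if (IS_ENABLED(CONFIG_CXL_REGION) &&
		    dax_cxl_mode == DAX_CXL_MODE_DEFER) {
			dev_dbg(host, "deferring range: %pr\n", res);
			schedule_work(&work->work);
			return 0;
		}
		...
	}

That would make DAX_CXL_MODE_DEFER purely a first-pass state and leave
the intersect test to process_defer_work(), at the cost of briefly
deferring even Soft Reserved ranges with no CXL window overlap.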

Thanks
Smita
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> +		switch (dax_cxl_mode) {
> +		case DAX_CXL_MODE_DEFER:
> +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			schedule_work(&work->work);
> +			return 0;
> +		case DAX_CXL_MODE_REGISTER:
> +			dev_dbg(host, "registering CXL range: %pr\n", res);
> +			break;
> +		case DAX_CXL_MODE_DROP:
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);
> +			return 0;
> +		}
>   	}
>   
> -	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> -			       IORES_DESC_SOFT_RESERVED);
> -	if (rc != REGION_INTERSECTS)
> -		return 0;
> +	/* TODO: insert "Soft Reserved" into iomem here */
>   
>   	id = memregion_alloc(GFP_KERNEL);
>   	if (id < 0) {
> @@ -123,8 +166,30 @@ static int hmem_register_device(struct device *host, int target_nid,
>   	return rc;
>   }
>   
> +static void kill_defer_work(void *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +
> +	cancel_work_sync(&work->work);
> +	kfree(work);
> +}
> +
>   static int dax_hmem_platform_probe(struct platform_device *pdev)
>   {
> +	struct dax_defer_work *work = kzalloc(sizeof(*work), GFP_KERNEL);
> +	int rc;
> +
> +	if (!work)
> +		return -ENOMEM;
> +
> +	work->pdev = pdev;
> +	INIT_WORK(&work->work, process_defer_work);
> +
> +	rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, work);
> +	if (rc)
> +		return rc;
> +
> +	platform_set_drvdata(pdev, work);
>   	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>   }
>   
> @@ -139,6 +204,16 @@ static __init int dax_hmem_init(void)
>   {
>   	int rc;
>   
> +	/*
> +	 * Ensure that cxl_acpi and cxl_pci have a chance to kick off
> +	 * CXL topology discovery at least once before scanning the
> +	 * iomem resource tree for IORES_DESC_CXL resources.
> +	 */
> +	if (IS_ENABLED(CONFIG_CXL_REGION)) {
> +		request_module("cxl_acpi");
> +		request_module("cxl_pci");
> +	}
> +
>   	rc = platform_driver_register(&dax_hmem_platform_driver);
>   	if (rc)
>   		return rc;
> @@ -159,13 +234,6 @@ static __exit void dax_hmem_exit(void)
>   module_init(dax_hmem_init);
>   module_exit(dax_hmem_exit);
>   
> -/* Allow for CXL to define its own dax regions */
> -#if IS_ENABLED(CONFIG_CXL_REGION)
> -#if IS_MODULE(CONFIG_CXL_ACPI)
> -MODULE_SOFTDEP("pre: cxl_acpi");
> -#endif
> -#endif
> -
>   MODULE_ALIAS("platform:hmem*");
>   MODULE_ALIAS("platform:hmem_platform*");
>   MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index e8b2d6aa4013..4fc6ab518c24 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -232,6 +232,9 @@ struct resource_constraint {
>   /* PC/ISA/whatever - the normal PC address spaces: IO and memory */
>   extern struct resource ioport_resource;
>   extern struct resource iomem_resource;
> +#ifdef CONFIG_EFI_SOFT_RESERVE
> +extern struct resource soft_reserve_resource;
> +#endif
>   
>   extern struct resource *request_resource_conflict(struct resource *root, struct resource *new);
>   extern int request_resource(struct resource *root, struct resource *new);
> @@ -255,6 +258,22 @@ int adjust_resource(struct resource *res, resource_size_t start,
>   		    resource_size_t size);
>   resource_size_t resource_alignment(struct resource *res);
>   
> +
> +#ifdef CONFIG_EFI_SOFT_RESERVE
> +static inline void insert_resource_late(struct resource *new)
> +{
> +	if (new->desc == IORES_DESC_SOFT_RESERVED)
> +		insert_resource_expand_to_fit(&soft_reserve_resource, new);
> +	else
> +		insert_resource_expand_to_fit(&iomem_resource, new);
> +}
> +#else
> +static inline void insert_resource_late(struct resource *new)
> +{
> +	insert_resource_expand_to_fit(&iomem_resource, new);
> +}
> +#endif
> +
>   /**
>    * resource_set_size - Calculate resource end address from size and start
>    * @res: Resource descriptor
> @@ -409,6 +428,12 @@ walk_system_ram_res_rev(u64 start, u64 end, void *arg,
>   extern int
>   walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
>   		    void *arg, int (*func)(struct resource *, void *));
> +int walk_soft_reserve_res_desc(unsigned long desc, unsigned long flags,
> +			       u64 start, u64 end, void *arg,
> +			       int (*func)(struct resource *, void *));
> +int region_intersects_soft_reserve(struct resource *root, resource_size_t start,
> +				   size_t size, unsigned long flags,
> +				   unsigned long desc);
>   
>   struct resource *devm_request_free_mem_region(struct device *dev,
>   		struct resource *base, unsigned long size);
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 8d3e6ed0bdc1..fd90990c31c6 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -321,8 +321,8 @@ static bool is_type_match(struct resource *p, unsigned long flags, unsigned long
>   }
>   
>   /**
> - * find_next_iomem_res - Finds the lowest iomem resource that covers part of
> - *			 [@start..@end].
> + * find_next_res - Finds the lowest resource that covers part of
> + *		   [@start..@end].
>    *
>    * If a resource is found, returns 0 and @*res is overwritten with the part
>    * of the resource that's within [@start..@end]; if none is found, returns
> @@ -337,9 +337,9 @@ static bool is_type_match(struct resource *p, unsigned long flags, unsigned long
>    * The caller must specify @start, @end, @flags, and @desc
>    * (which may be IORES_DESC_NONE).
>    */
> -static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> -			       unsigned long flags, unsigned long desc,
> -			       struct resource *res)
> +static int find_next_res(struct resource *parent, resource_size_t start,
> +			 resource_size_t end, unsigned long flags,
> +			 unsigned long desc, struct resource *res)
>   {
>   	struct resource *p;
>   
> @@ -351,7 +351,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>   
>   	read_lock(&resource_lock);
>   
> -	for_each_resource(&iomem_resource, p, false) {
> +	for_each_resource(parent, p, false) {
>   		/* If we passed the resource we are looking for, stop */
>   		if (p->start > end) {
>   			p = NULL;
> @@ -382,16 +382,23 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>   	return p ? 0 : -ENODEV;
>   }
>   
> -static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> -				 unsigned long flags, unsigned long desc,
> -				 void *arg,
> -				 int (*func)(struct resource *, void *))
> +static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> +			       unsigned long flags, unsigned long desc,
> +			       struct resource *res)
> +{
> +	return find_next_res(&iomem_resource, start, end, flags, desc, res);
> +}
> +
> +static int walk_res_desc(struct resource *parent, resource_size_t start,
> +			 resource_size_t end, unsigned long flags,
> +			 unsigned long desc, void *arg,
> +			 int (*func)(struct resource *, void *))
>   {
>   	struct resource res;
>   	int ret = -EINVAL;
>   
>   	while (start < end &&
> -	       !find_next_iomem_res(start, end, flags, desc, &res)) {
> +	       !find_next_res(parent, start, end, flags, desc, &res)) {
>   		ret = (*func)(&res, arg);
>   		if (ret)
>   			break;
> @@ -402,6 +409,15 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>   	return ret;
>   }
>   
> +static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> +				 unsigned long flags, unsigned long desc,
> +				 void *arg,
> +				 int (*func)(struct resource *, void *))
> +{
> +	return walk_res_desc(&iomem_resource, start, end, flags, desc, arg, func);
> +}
> +
> +
>   /**
>    * walk_iomem_res_desc - Walks through iomem resources and calls func()
>    *			 with matching resource ranges.
> @@ -426,6 +442,26 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
>   }
>   EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
>   
> +#ifdef CONFIG_EFI_SOFT_RESERVE
> +struct resource soft_reserve_resource = {
> +	.name	= "Soft Reserved",
> +	.start	= 0,
> +	.end	= -1,
> +	.desc	= IORES_DESC_SOFT_RESERVED,
> +	.flags	= IORESOURCE_MEM,
> +};
> +EXPORT_SYMBOL_GPL(soft_reserve_resource);
> +
> +int walk_soft_reserve_res_desc(unsigned long desc, unsigned long flags,
> +			       u64 start, u64 end, void *arg,
> +			       int (*func)(struct resource *, void *))
> +{
> +	return walk_res_desc(&soft_reserve_resource, start, end, flags, desc,
> +			     arg, func);
> +}
> +EXPORT_SYMBOL_GPL(walk_soft_reserve_res_desc);
> +#endif
> +
>   /*
>    * This function calls the @func callback against all memory ranges of type
>    * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.

