Message-ID: <aORscMprmQyGlohw@aschofie-mobl2.lan>
Date: Mon, 6 Oct 2025 18:27:12 -0700
From: Alison Schofield <alison.schofield@...el.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<nvdimm@...ts.linux.dev>, <linux-fsdevel@...r.kernel.org>,
	<linux-pm@...r.kernel.org>, Davidlohr Bueso <dave@...olabs.net>, "Jonathan
 Cameron" <jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>,
	Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
	"Dan Williams" <dan.j.williams@...el.com>, Matthew Wilcox
	<willy@...radead.org>, Jan Kara <jack@...e.cz>, "Rafael J . Wysocki"
	<rafael@...nel.org>, Len Brown <len.brown@...el.com>, Pavel Machek
	<pavel@...nel.org>, Li Ming <ming.li@...omail.com>, Jeff Johnson
	<jeff.johnson@....qualcomm.com>, "Ying Huang" <huang.ying.caritas@...il.com>,
	Yao Xingtao <yaoxt.fnst@...itsu.com>, Peter Zijlstra <peterz@...radead.org>,
	Greg KH <gregkh@...uxfoundation.org>, Nathan Fontenot
	<nathan.fontenot@....com>, Terry Bowman <terry.bowman@....com>, Robert
 Richter <rrichter@....com>, Benjamin Cheatham <benjamin.cheatham@....com>,
	Zhijian Li <lizhijian@...itsu.com>, "Borislav Petkov" <bp@...en8.de>, Ard
 Biesheuvel <ardb@...nel.org>
Subject: Re: [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling
 until CXL region assembly completes

On Tue, Sep 30, 2025 at 04:47:56AM +0000, Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@...el.com>
> 
> Previously, dax_hmem deferred to CXL only when an immediate resource
> intersection with a CXL window was detected. This left a gap: if cxl_acpi
> or cxl_pci probing or region assembly had not yet started, hmem could
> prematurely claim ranges.
> 
> Fix this by introducing a dax_cxl_mode state machine and a deferred
> work mechanism.
> 
> The new workqueue delays consideration of Soft Reserved overlaps until
> the CXL subsystem has had a chance to complete its discovery and region
> assembly. This avoids premature iomem claims, eliminates race conditions
> with async cxl_pci probe, and provides a cleaner handoff between hmem and
> CXL resource management.

Hi Smita,

I've attached what I did to make this work for handoff to DAX after
region assembly failure. I don't know how it fits into the complete
solution. Please take a look.

Thanks,
Alison


> 
> Signed-off-by: Dan Williams <dan.j.williams@...el.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
> ---
>  drivers/dax/hmem/hmem.c | 72 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index c2c110b194e5..0498cb234c06 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -58,9 +58,45 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> +static enum dax_cxl_mode {
> +	DAX_CXL_MODE_DEFER,
> +	DAX_CXL_MODE_REGISTER,
> +	DAX_CXL_MODE_DROP,
> +} dax_cxl_mode;

DAX_CXL_MODE_REGISTER isn't used (yet).  I used it below.
The state machine now goes directly from DEFER -> DROP.
See my suggestion in process_defer_work() below.

> +
> +static int handle_deferred_cxl(struct device *host, int target_nid,
> +				const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (dax_cxl_mode == DAX_CXL_MODE_DROP)
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);


IORES_DESC_CXL doesn't tell us if a CXL region was successfully assembled.
Even if CXL region assembly fails, I think the window resources will still
be in the iomem tree. So maybe this check always returns true?

Can we check whether the Soft Reserved range conflicts with an existing
iomem resource instead? If the CXL region assembled successfully, it'll
conflict; otherwise there'll be no conflict. No conflict means the range
is available for DAX, so register it.

Here's what worked for me:

	rc = add_soft_reserve_into_iomem(host, res);
	/* The above add probably means patch 5 drops */
	if (rc == -EBUSY) {
		dev_dbg(host, "range already in iomem (CXL owns it): %pr\n", res);
		return 0;
	}
	if (rc) {
		dev_err(host, "failed to add soft reserve to iomem: %d\n", rc);
		return rc;
	}

	dev_dbg(host, "registering released/unclaimed range with DAX: %pr\n", res);

	return hmem_register_device(host, target_nid, res);
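
Something like this minimal sketch should work for
add_soft_reserve_into_iomem(), assuming all it needs to do is claim the
range and report -EBUSY on conflict; devm_request_mem_region() gives us
that directly (the helper name and error mapping here are illustrative,
not from this patch):

static int add_soft_reserve_into_iomem(struct device *host,
				       const struct resource *res)
{
	/*
	 * devm_request_mem_region() returns NULL when any part of the
	 * range is already claimed in the iomem tree, which maps to
	 * the "CXL owns it" -EBUSY case above.
	 */
	if (!devm_request_mem_region(host, res->start, resource_size(res),
				     "Soft Reserved"))
		return -EBUSY;

	return 0;
}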

> +	}
> +	return 0;
> +}
> +
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
> +static void process_defer_work(struct work_struct *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +	struct platform_device *pdev = work->pdev;
> +
> +	/* relies on cxl_acpi and cxl_pci having had a chance to load */
> +	wait_for_device_probe();

The wait_for_device_probe() didn't wait for region probe to complete.
I couldn't figure out why, so I just 'slept' here in my testing.
How is that supposed to work? Could I have something config'd wrong?
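
For reference, the stand-in I used while testing was just an arbitrary
delay; the 30 seconds below is my own number, not a real synchronization
point:

	/*
	 * Testing hack only: give cxl_acpi/cxl_pci probe and region
	 * assembly time to finish.  A real fix needs an explicit
	 * synchronization point, not a sleep.
	 */
	ssleep(30);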

After the long sleep that allowed region assembly to complete (and
fail), this worked for me:

	/*
	 * At this point, CXL has had its chance.  Resources that CXL
	 * successfully claimed are present in iomem; resources where
	 * CXL region assembly failed are still available.
	 */
	dax_cxl_mode = DAX_CXL_MODE_REGISTER;

	/*
	 * Walk all Soft Reserved ranges and register the ones that
	 * CXL didn't claim or that CXL released after failure.
	 */
	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);

	/*
	 * Future attempts should drop CXL overlaps immediately
	 * without deferring again.
	 */
> +	dax_cxl_mode = DAX_CXL_MODE_DROP;
> +
> +	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
> +}
> +
>  static int hmem_register_device(struct device *host, int target_nid,
>  				const struct resource *res)
>  {
> +	struct dax_defer_work *work = dev_get_drvdata(host);
>  	struct platform_device *pdev;
>  	struct memregion_info info;
>  	long id;
> @@ -69,8 +105,18 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>  	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>  			      IORES_DESC_CXL) != REGION_DISJOINT) {
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> +		switch (dax_cxl_mode) {
> +		case DAX_CXL_MODE_DEFER:
> +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			schedule_work(&work->work);
> +			return 0;
> +		case DAX_CXL_MODE_REGISTER:
> +			dev_dbg(host, "registering CXL range: %pr\n", res);
> +			break;
> +		case DAX_CXL_MODE_DROP:
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);
> +			return 0;
> +		}
>  	}
>  
>  	rc = region_intersects_soft_reserve(res->start, resource_size(res),
> @@ -125,8 +171,30 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	return rc;
>  }
>  
> +static void kill_defer_work(void *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +
> +	cancel_work_sync(&work->work);
> +	kfree(work);
> +}
> +
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> +	struct dax_defer_work *work = kzalloc(sizeof(*work), GFP_KERNEL);
> +	int rc;
> +
> +	if (!work)
> +		return -ENOMEM;
> +
> +	work->pdev = pdev;
> +	INIT_WORK(&work->work, process_defer_work);
> +
> +	rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, work);
> +	if (rc)
> +		return rc;
> +
> +	platform_set_drvdata(pdev, work);
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>  }
>  
> -- 
> 2.17.1
> 
