Message-ID: <682e364b9dc25_1626e100f8@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 21 May 2025 13:23:39 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: <alejandro.lucero-palau@....com>, <linux-cxl@...r.kernel.org>,
<netdev@...r.kernel.org>, <dan.j.williams@...el.com>, <edward.cree@....com>,
<davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
<edumazet@...gle.com>, <dave.jiang@...el.com>
CC: Alejandro Lucero <alucerop@....com>, Ben Cheatham
<benjamin.cheatham@....com>, Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA
allocation
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@....com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate. The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace().
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@....com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@....com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> ---
> drivers/cxl/core/hdm.c | 86 ++++++++++++++++++++++++++++++++++++++++++
> include/cxl/cxl.h | 5 +++
> 2 files changed, 91 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 70cae4ebf8a4..500df2deceef 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -3,6 +3,7 @@
> #include <linux/seq_file.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <cxl/cxl.h>
>
> #include "cxlmem.h"
> #include "core.h"
> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
> return base;
> }
>
> +/**
> + * cxl_dpa_free - release DPA (Device Physical Address)
> + *
> + * @cxled: endpoint decoder linked to the DPA
> + *
> + * Returns 0 or error.
> + */
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_port *port = cxled_to_port(cxled);
> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> devm_cxl_dpa_release(cxled);
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>
> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> enum cxl_partition_mode mode)
> @@ -686,6 +695,83 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> }
>
> +static int find_free_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_port *port;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + port = cxled_to_port(cxled);
> +
> + if (cxled->cxld.id != port->hdm_end + 1)
> + return 0;
> +
> + return 1;
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: DPA operation mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity but it could not be available
> + * due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct device *cxled_dev;
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + down_read(&cxl_dpa_rwsem);
> + cxled_dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
> + up_read(&cxl_dpa_rwsem);
In another effort [1] I am trying to get rid of all explicit unlock
management of cxl_dpa_rwsem and cxl_region_rwsem, and ultimately get rid
of all "goto" use in the CXL core.
[1]: http://lore.kernel.org/20250507072145.3614298-1-dan.j.williams@intel.com
So that conversion here would be:
DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&_T->cxld.dev))

struct cxl_endpoint_decoder *cxl_find_free_decoder(struct cxl_memdev *cxlmd)
{
	struct cxl_port *endpoint = cxlmd->endpoint;
	struct device *dev;

	scoped_guard(rwsem_read, &cxl_dpa_rwsem)
		dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
	if (dev)
		return to_cxl_endpoint_decoder(dev);
	return NULL;
}
...and then:
struct cxl_endpoint_decoder *cxled __free(put_cxled) = cxl_find_free_decoder(cxlmd);
> +
> + if (!cxled_dev)
> + return ERR_PTR(-ENXIO);
> +
> + cxled = to_cxl_endpoint_decoder(cxled_dev);
> +
> + if (!cxled) {
> + rc = -ENODEV;
> + goto err;
> + }
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + goto err;
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
The current user of this interface is sysfs. The expectation there is
that if 2 userspace threads are racing to allocate DPA space, the kernel
will protect itself and not get confused, but the result will be that
one thread loses the race and needs to redo its allocation.
That's not an interface that the kernel can support, so there needs to
be some locking to enforce that 2 threads racing cxl_request_dpa() each
end up with independent allocations. That likely needs to be a
synchronization primitive over the entire process due to the way that
CXL requires in-order allocation of DPA and HPA. Effectively you need to
complete the entire HPA allocation, DPA allocation, and decoder
programming in one atomic unit.
I think, to start, since there is only one Type-2 driver in the kernel and
its only use case is single-threaded setup, this is not yet an immediate
problem.