netdev - Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6be8b643-75bd-4f82-ae9e-9c04fbfefe1d@amd.com>
Date: Fri, 6 Jun 2025 14:09:49 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: Dan Williams <dan.j.williams@...el.com>, alejandro.lucero-palau@....com,
 linux-cxl@...r.kernel.org, netdev@...r.kernel.org, edward.cree@....com,
 davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
 edumazet@...gle.com, dave.jiang@...el.com
Cc: Ben Cheatham <benjamin.cheatham@....com>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA
 allocation


On 5/21/25 21:23, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@....com>
>>
>> Region creation involves finding available DPA (device-physical-address)
>> capacity to map into HPA (host-physical-address) space.
>>
>> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
>> that tries to allocate the DPA memory the driver requires to operate.The
>> memory requested should not be bigger than the max available HPA obtained
>> previously with cxl_get_hpa_freespace.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@....com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> ---
>>   drivers/cxl/core/hdm.c | 86 ++++++++++++++++++++++++++++++++++++++++++
>>   include/cxl/cxl.h      |  5 +++
>>   2 files changed, 91 insertions(+)
>>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index 70cae4ebf8a4..500df2deceef 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
>> @@ -3,6 +3,7 @@
>>   #include <linux/seq_file.h>
>>   #include <linux/device.h>
>>   #include <linux/delay.h>
>> +#include <cxl/cxl.h>
>>   
>>   #include "cxlmem.h"
>>   #include "core.h"
>> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
>>   	return base;
>>   }
>>   
>> +/**
>> + * cxl_dpa_free - release DPA (Device Physical Address)
>> + *
>> + * @cxled: endpoint decoder linked to the DPA
>> + *
>> + * Returns 0 or error.
>> + */
>>   int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>>   {
>>   	struct cxl_port *port = cxled_to_port(cxled);
>> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>>   	devm_cxl_dpa_release(cxled);
>>   	return 0;
>>   }
>> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>>   
>>   int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
>>   		     enum cxl_partition_mode mode)
>> @@ -686,6 +695,83 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>>   	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>>   }
>>   
>> +static int find_free_decoder(struct device *dev, const void *data)
>> +{
>> +	struct cxl_endpoint_decoder *cxled;
>> +	struct cxl_port *port;
>> +
>> +	if (!is_endpoint_decoder(dev))
>> +		return 0;
>> +
>> +	cxled = to_cxl_endpoint_decoder(dev);
>> +	port = cxled_to_port(cxled);
>> +
>> +	if (cxled->cxld.id != port->hdm_end + 1)
>> +		return 0;
>> +
>> +	return 1;
>> +}
>> +
>> +/**
>> + * cxl_request_dpa - search and reserve DPA given input constraints
>> + * @cxlmd: memdev with an endpoint port with available decoders
>> + * @mode: DPA operation mode (ram vs pmem)
>> + * @alloc: dpa size required
>> + *
>> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
>> + *
>> + * Given that a region needs to allocate from limited HPA capacity it
>> + * may be the case that a device has more mappable DPA capacity than
>> + * available HPA. The expectation is that @alloc is a driver known
>> + * value based on the device capacity but it could not be available
>> + * due to HPA constraints.
>> + *
>> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
>> + * reserved, or an error pointer. The caller is also expected to own the
>> + * lifetime of the memdev registration associated with the endpoint to
>> + * pin the decoder registered as well.
>> + */
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> +					     enum cxl_partition_mode mode,
>> +					     resource_size_t alloc)
>> +{
>> +	struct cxl_port *endpoint = cxlmd->endpoint;
>> +	struct cxl_endpoint_decoder *cxled;
>> +	struct device *cxled_dev;
>> +	int rc;
>> +
>> +	if (!IS_ALIGNED(alloc, SZ_256M))
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	down_read(&cxl_dpa_rwsem);
>> +	cxled_dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
>> +	up_read(&cxl_dpa_rwsem);
> In another effort [1] I am trying to get rid of all explicit unlock
> management of cxl_dpa_rwsem and cxl_region_rwsem, and ultimately get rid
> of all "goto" use in the CXL core.
>
> [1]: http://lore.kernel.org/20250507072145.3614298-1-dan.j.williams@intel.com
>
> So that conversion here would be:
>
> DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&cxled->cxld.dev))
> struct cxl_endpoint_decoder *cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> {
> 	struct device *dev;
>
> 	scoped_guard(rwsem_read, &cxl_dpa_rwsem)
> 		dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
> 	if (dev)
> 		return to_cxl_endpoint_decoder(dev);
> 	return NULL;
> }
>
> ...and then:
>
> struct cxl_endpoint_decoder *cxled __free(put_cxled) = cxl_find_free_decoder(cxlmd);
>
>> +
>> +	if (!cxled_dev)
>> +		return ERR_PTR(-ENXIO);
>> +
>> +	cxled = to_cxl_endpoint_decoder(cxled_dev);
>> +
>> +	if (!cxled) {
>> +		rc = -ENODEV;
>> +		goto err;
>> +	}
>> +
>> +	rc = cxl_dpa_set_part(cxled, mode);
>> +	if (rc)
>> +		goto err;
>> +
>> +	rc = cxl_dpa_alloc(cxled, alloc);
> The current user of this interface is sysfs. The expecation there is
> that if 2 userspace threads are racing to allocate DPA space, the kernel
> will protect itself and not get confused, but the result will be that
> one thread loses the race and needs to redo its allocation.
>
> That's not an interface that the kernel can support, so there needs to
> be some locking to enforce that 2 threads racing cxl_request_dpa() each
> end up with independent allocations. That likely needs to be a
> syncrhonization primitive over the entire process due to the way that
> CXL requires in-order allocation of DPA and HPA. Effectively you need to
> complete the entire HPA allocatcion, DPA allocation, and decoder
> programming in one atomic unit.


I do not understand this atomic requirement. As I understand this, 
dpa_alloc, with the proper locking, will satisfy just one request if two 
content, with the second one requiring another try. Once the decoders 
resources have been obtained, there is nothing to worry about assuming 
the commit of those decoders will be performed with the proper locking 
as well.


>
> I think to start since there is only 1 Type-2 driver in the kernel and
> it's only use case is single-threaded setup this is not yet an immediate
> problem.