netdev - Re: [PATCH v17 12/22] sfc: get endpoint decoder

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <836d06d6-a36f-4ba3-b7c9-ba8687ba2190@amd.com>
Date: Mon, 11 Aug 2025 15:24:04 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: dan.j.williams@...el.com, alejandro.lucero-palau@....com,
 linux-cxl@...r.kernel.org, netdev@...r.kernel.org, edward.cree@....com,
 davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
 edumazet@...gle.com, dave.jiang@...el.com
Cc: Martin Habets <habetsm.xilinx@...il.com>,
 Edward Cree <ecree.xilinx@...il.com>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v17 12/22] sfc: get endpoint decoder


On 7/28/25 17:30, dan.j.williams@...el.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@....com>
>>
>> Use cxl api for getting DPA (Device Physical Address) to use through an
>> endpoint decoder.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@...il.com>
>> Acked-by: Edward Cree <ecree.xilinx@...il.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> ---
>>   drivers/net/ethernet/sfc/Kconfig   |  1 +
>>   drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
>>   2 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
>> index 979f2801e2a8..e959d9b4f4ce 100644
>> --- a/drivers/net/ethernet/sfc/Kconfig
>> +++ b/drivers/net/ethernet/sfc/Kconfig
>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>>   config SFC_CXL
>>   	bool "Solarflare SFC9100-family CXL support"
>>   	depends on SFC && CXL_BUS >= SFC
>> +	depends on CXL_REGION
>>   	default SFC
>>   	help
>>   	  This enables SFC CXL support if the kernel is configuring CXL for
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index e2d52ed49535..c0adfd99cc78 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>   {
>>   	struct efx_nic *efx = &probe_data->efx;
>>   	struct pci_dev *pci_dev = efx->pci_dev;
>> +	resource_size_t max_size;
>>   	struct efx_cxl *cxl;
>>   	u16 dvsec;
>>   	int rc;
>> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>   		return PTR_ERR(cxl->cxlmd);
>>   	}
>>   
>> +	cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
>> +	if (IS_ERR(cxl->endpoint))
>> +		return PTR_ERR(cxl->endpoint);
> Between Terry's set, the soft reserve set, and now this, it is become
> clearer that the cxl_core needs a centralized solution to the questions
> of:
>
> - Does the platform have CXL and if so might a device ever successfully
>    complete cxl_mem_probe() for a cxl_memdev that it registered?
>
> - When can a driver assume that no cxl_port topology is going to arrive?
>    I.e. when to give up on probe deferral.


Hi Dan,

I think your concern is valid, but I think we are mixing things up, or 
maybe it is just me getting confused, so let me to explain myself.

We have different situations to be aware of:


1) CXL topology not there or nor properly configured yet.

2) accelerator relying on pcie instead of CXL.io

3) potential removal of cxl_mem, cxl_acpi or cxl_port

4) cxl initialization failing due to dynamic modules dependencies

5) CXL errors


I think your patches in the cxl-probe-order branch will hopefully fix 
the last situation.

About 2, and as I have commented in another patch review in this series, 
it is possible to check and to preclude further cxl initialization. This 
is the last concern you have raised, and it is valid but your proposal 
in those patches are not, in my understanding, addressing it, but they 
are still useful for 4.

About 3, the only way to be protected is partially during initialization 
with cxl_acquire, and afer initialization with that callback you do not 
like introduced in patch 18. I think we agreed those modules should not 
be allowed to be removed and it requires work in the cxl core for 
support that as a follow-up work.

Regarding 5, I think Terry's patchset introduces the proper handling for 
this, or at least some initial work which will surely require adjustments.

Then we have the first situation which I admit is the most confusing (at 
least to me). If we can solve the problem of the proper initialization 
based on the probe() calls for those cxl devices to work with, the any 
other explanation for specifically dealing with this situation requires 
further explanation and, I guess, documentation.

AFAIK, the BIOS will perform a good bunch of CXL initialization (BTW, I 
think we should discuss this as well at some point for having same 
expectations about what and how things are done, and also when) then the 
kernel CXL initialization will perform its own bunch based on what the 
BIOS is given. That implies CXL Root ports, downstream/upstream cxl 
ports to be register, switches, ... . If I am not wrong, that depends on 
subsys_initcall level, and therefore earlier than any accelerator driver 
initialization. Am I right assuming once those modules are done the 
kernel cxl topology/infrastructure is ready to deal with an accelerator 
initializing its cxl functionality? If not, what is the problem or 
problems? Is this due to longer than expected hardware initialization by 
the kernel? if so, could not be leave to the BIOS somehow? is this due 
to some asynchronous initialization impossible to avoid or be certain 
of? If so, can we document it?

I understand with CXL could/will come complex topologies where maybe 
initialization by a single host is not possible without synchronizing 
with other hosts or CXL infrastructure. Is this what is all this about?

> It is also clear that a class of CXL accelerator drivers would be
> served by a simple shared routine to autocreate a region.
>
> I am going to take a stab at refactoring the current classmem case into
> a scheme that resolves automatic region assembly at
> devm_cxl_add_memdev() time in a way that can be reused to solve this
> automatic region creation problem.
>

Not sure I follow you here. But in any case, do you consider that is 
necessary for this initial Type2 support?