netdev - Re: [PATCH v17 12/22] sfc: get endpoint decoder

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3887fb34-a844-4769-b0d2-ffb7b6cd4585@amd.com>
Date: Tue, 2 Sep 2025 08:11:55 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: dan.j.williams@...el.com, alejandro.lucero-palau@....com,
 linux-cxl@...r.kernel.org, netdev@...r.kernel.org, edward.cree@....com,
 davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
 edumazet@...gle.com, dave.jiang@...el.com
Cc: Martin Habets <habetsm.xilinx@...il.com>,
 Edward Cree <ecree.xilinx@...il.com>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v17 12/22] sfc: get endpoint decoder

Hi all,


I just want to refresh this discussion below. There were not replies to 
my comment, and I would like to have people's comments since I consider 
this the main obstacle before sending v18.


Thanks


On 8/11/25 15:24, Alejandro Lucero Palau wrote:
>
> On 7/28/25 17:30, dan.j.williams@...el.com wrote:
>> alejandro.lucero-palau@ wrote:
>>> From: Alejandro Lucero <alucerop@....com>
>>>
>>> Use cxl api for getting DPA (Device Physical Address) to use through an
>>> endpoint decoder.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>>> Reviewed-by: Martin Habets <habetsm.xilinx@...il.com>
>>> Acked-by: Edward Cree <ecree.xilinx@...il.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>>> ---
>>>   drivers/net/ethernet/sfc/Kconfig   |  1 +
>>>   drivers/net/ethernet/sfc/efx_cxl.c | 32 
>>> +++++++++++++++++++++++++++++-
>>>   2 files changed, 32 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/sfc/Kconfig 
>>> b/drivers/net/ethernet/sfc/Kconfig
>>> index 979f2801e2a8..e959d9b4f4ce 100644
>>> --- a/drivers/net/ethernet/sfc/Kconfig
>>> +++ b/drivers/net/ethernet/sfc/Kconfig
>>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>>>   config SFC_CXL
>>>       bool "Solarflare SFC9100-family CXL support"
>>>       depends on SFC && CXL_BUS >= SFC
>>> +    depends on CXL_REGION
>>>       default SFC
>>>       help
>>>         This enables SFC CXL support if the kernel is configuring 
>>> CXL for
>>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c 
>>> b/drivers/net/ethernet/sfc/efx_cxl.c
>>> index e2d52ed49535..c0adfd99cc78 100644
>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>>> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>>   {
>>>       struct efx_nic *efx = &probe_data->efx;
>>>       struct pci_dev *pci_dev = efx->pci_dev;
>>> +    resource_size_t max_size;
>>>       struct efx_cxl *cxl;
>>>       u16 dvsec;
>>>       int rc;
>>> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>>           return PTR_ERR(cxl->cxlmd);
>>>       }
>>>   +    cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
>>> +    if (IS_ERR(cxl->endpoint))
>>> +        return PTR_ERR(cxl->endpoint);
>> Between Terry's set, the soft reserve set, and now this, it is become
>> clearer that the cxl_core needs a centralized solution to the questions
>> of:
>>
>> - Does the platform have CXL and if so might a device ever successfully
>>    complete cxl_mem_probe() for a cxl_memdev that it registered?
>>
>> - When can a driver assume that no cxl_port topology is going to arrive?
>>    I.e. when to give up on probe deferral.
>
>
> Hi Dan,
>
> I think your concern is valid, but I think we are mixing things up, or 
> maybe it is just me getting confused, so let me to explain myself.
>
> We have different situations to be aware of:
>
>
> 1) CXL topology not there or nor properly configured yet.
>
> 2) accelerator relying on pcie instead of CXL.io
>
> 3) potential removal of cxl_mem, cxl_acpi or cxl_port
>
> 4) cxl initialization failing due to dynamic modules dependencies
>
> 5) CXL errors
>
>
> I think your patches in the cxl-probe-order branch will hopefully fix 
> the last situation.
>
> About 2, and as I have commented in another patch review in this 
> series, it is possible to check and to preclude further cxl 
> initialization. This is the last concern you have raised, and it is 
> valid but your proposal in those patches are not, in my understanding, 
> addressing it, but they are still useful for 4.
>
> About 3, the only way to be protected is partially during 
> initialization with cxl_acquire, and afer initialization with that 
> callback you do not like introduced in patch 18. I think we agreed 
> those modules should not be allowed to be removed and it requires work 
> in the cxl core for support that as a follow-up work.
>
> Regarding 5, I think Terry's patchset introduces the proper handling 
> for this, or at least some initial work which will surely require 
> adjustments.
>
> Then we have the first situation which I admit is the most confusing 
> (at least to me). If we can solve the problem of the proper 
> initialization based on the probe() calls for those cxl devices to 
> work with, the any other explanation for specifically dealing with 
> this situation requires further explanation and, I guess, documentation.
>
> AFAIK, the BIOS will perform a good bunch of CXL initialization (BTW, 
> I think we should discuss this as well at some point for having same 
> expectations about what and how things are done, and also when) then 
> the kernel CXL initialization will perform its own bunch based on what 
> the BIOS is given. That implies CXL Root ports, downstream/upstream 
> cxl ports to be register, switches, ... . If I am not wrong, that 
> depends on subsys_initcall level, and therefore earlier than any 
> accelerator driver initialization. Am I right assuming once those 
> modules are done the kernel cxl topology/infrastructure is ready to 
> deal with an accelerator initializing its cxl functionality? If not, 
> what is the problem or problems? Is this due to longer than expected 
> hardware initialization by the kernel? if so, could not be leave to 
> the BIOS somehow? is this due to some asynchronous initialization 
> impossible to avoid or be certain of? If so, can we document it?
>
> I understand with CXL could/will come complex topologies where maybe 
> initialization by a single host is not possible without synchronizing 
> with other hosts or CXL infrastructure. Is this what is all this about?
>
>> It is also clear that a class of CXL accelerator drivers would be
>> served by a simple shared routine to autocreate a region.
>>
>> I am going to take a stab at refactoring the current classmem case into
>> a scheme that resolves automatic region assembly at
>> devm_cxl_add_memdev() time in a way that can be reused to solve this
>> automatic region creation problem.
>>
>
> Not sure I follow you here. But in any case, do you consider that is 
> necessary for this initial Type2 support?
>