[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3887fb34-a844-4769-b0d2-ffb7b6cd4585@amd.com>
Date: Tue, 2 Sep 2025 08:11:55 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: dan.j.williams@...el.com, alejandro.lucero-palau@....com,
linux-cxl@...r.kernel.org, netdev@...r.kernel.org, edward.cree@....com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
edumazet@...gle.com, dave.jiang@...el.com
Cc: Martin Habets <habetsm.xilinx@...il.com>,
Edward Cree <ecree.xilinx@...il.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v17 12/22] sfc: get endpoint decoder
Hi all,
I just want to refresh this discussion below. There were not replies to
my comment, and I would like to have people's comments since I consider
this the main obstacle before sending v18.
Thanks
On 8/11/25 15:24, Alejandro Lucero Palau wrote:
>
> On 7/28/25 17:30, dan.j.williams@...el.com wrote:
>> alejandro.lucero-palau@ wrote:
>>> From: Alejandro Lucero <alucerop@....com>
>>>
>>> Use cxl api for getting DPA (Device Physical Address) to use through an
>>> endpoint decoder.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>>> Reviewed-by: Martin Habets <habetsm.xilinx@...il.com>
>>> Acked-by: Edward Cree <ecree.xilinx@...il.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>>> ---
>>> drivers/net/ethernet/sfc/Kconfig | 1 +
>>> drivers/net/ethernet/sfc/efx_cxl.c | 32
>>> +++++++++++++++++++++++++++++-
>>> 2 files changed, 32 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/sfc/Kconfig
>>> b/drivers/net/ethernet/sfc/Kconfig
>>> index 979f2801e2a8..e959d9b4f4ce 100644
>>> --- a/drivers/net/ethernet/sfc/Kconfig
>>> +++ b/drivers/net/ethernet/sfc/Kconfig
>>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>>> config SFC_CXL
>>> bool "Solarflare SFC9100-family CXL support"
>>> depends on SFC && CXL_BUS >= SFC
>>> + depends on CXL_REGION
>>> default SFC
>>> help
>>> This enables SFC CXL support if the kernel is configuring
>>> CXL for
>>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
>>> b/drivers/net/ethernet/sfc/efx_cxl.c
>>> index e2d52ed49535..c0adfd99cc78 100644
>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>>> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>> {
>>> struct efx_nic *efx = &probe_data->efx;
>>> struct pci_dev *pci_dev = efx->pci_dev;
>>> + resource_size_t max_size;
>>> struct efx_cxl *cxl;
>>> u16 dvsec;
>>> int rc;
>>> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>> return PTR_ERR(cxl->cxlmd);
>>> }
>>> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
>>> + if (IS_ERR(cxl->endpoint))
>>> + return PTR_ERR(cxl->endpoint);
>> Between Terry's set, the soft reserve set, and now this, it is become
>> clearer that the cxl_core needs a centralized solution to the questions
>> of:
>>
>> - Does the platform have CXL and if so might a device ever successfully
>> complete cxl_mem_probe() for a cxl_memdev that it registered?
>>
>> - When can a driver assume that no cxl_port topology is going to arrive?
>> I.e. when to give up on probe deferral.
>
>
> Hi Dan,
>
> I think your concern is valid, but I think we are mixing things up, or
> maybe it is just me getting confused, so let me to explain myself.
>
> We have different situations to be aware of:
>
>
> 1) CXL topology not there or nor properly configured yet.
>
> 2) accelerator relying on pcie instead of CXL.io
>
> 3) potential removal of cxl_mem, cxl_acpi or cxl_port
>
> 4) cxl initialization failing due to dynamic modules dependencies
>
> 5) CXL errors
>
>
> I think your patches in the cxl-probe-order branch will hopefully fix
> the last situation.
>
> About 2, and as I have commented in another patch review in this
> series, it is possible to check and to preclude further cxl
> initialization. This is the last concern you have raised, and it is
> valid but your proposal in those patches are not, in my understanding,
> addressing it, but they are still useful for 4.
>
> About 3, the only way to be protected is partially during
> initialization with cxl_acquire, and afer initialization with that
> callback you do not like introduced in patch 18. I think we agreed
> those modules should not be allowed to be removed and it requires work
> in the cxl core for support that as a follow-up work.
>
> Regarding 5, I think Terry's patchset introduces the proper handling
> for this, or at least some initial work which will surely require
> adjustments.
>
> Then we have the first situation which I admit is the most confusing
> (at least to me). If we can solve the problem of the proper
> initialization based on the probe() calls for those cxl devices to
> work with, the any other explanation for specifically dealing with
> this situation requires further explanation and, I guess, documentation.
>
> AFAIK, the BIOS will perform a good bunch of CXL initialization (BTW,
> I think we should discuss this as well at some point for having same
> expectations about what and how things are done, and also when) then
> the kernel CXL initialization will perform its own bunch based on what
> the BIOS is given. That implies CXL Root ports, downstream/upstream
> cxl ports to be register, switches, ... . If I am not wrong, that
> depends on subsys_initcall level, and therefore earlier than any
> accelerator driver initialization. Am I right assuming once those
> modules are done the kernel cxl topology/infrastructure is ready to
> deal with an accelerator initializing its cxl functionality? If not,
> what is the problem or problems? Is this due to longer than expected
> hardware initialization by the kernel? if so, could not be leave to
> the BIOS somehow? is this due to some asynchronous initialization
> impossible to avoid or be certain of? If so, can we document it?
>
> I understand with CXL could/will come complex topologies where maybe
> initialization by a single host is not possible without synchronizing
> with other hosts or CXL infrastructure. Is this what is all this about?
>
>> It is also clear that a class of CXL accelerator drivers would be
>> served by a simple shared routine to autocreate a region.
>>
>> I am going to take a stab at refactoring the current classmem case into
>> a scheme that resolves automatic region assembly at
>> devm_cxl_add_memdev() time in a way that can be reused to solve this
>> automatic region creation problem.
>>
>
> Not sure I follow you here. But in any case, do you consider that is
> necessary for this initial Type2 support?
>
Powered by blists - more mailing lists