[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <593e126b-7942-484b-bdf3-2f8d25273f31@amd.com>
Date: Mon, 30 Jun 2025 16:57:29 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
alejandro.lucero-palau@....com
Cc: linux-cxl@...r.kernel.org, netdev@...r.kernel.org,
dan.j.williams@...el.com, edward.cree@....com, davem@...emloft.net,
kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com, dave.jiang@...el.com
Subject: Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media
ready
On 6/27/25 09:39, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:38 +0100
> alejandro.lucero-palau@....com wrote:
>
>> From: Alejandro Lucero <alucerop@....com>
>>
>> Use cxl code for registers discovery and mapping regarding cxl component
>> regs and validate registers found are as expected.
>>
>> Set media ready explicitly as there is no means for doing so without
>> a mailbox, and without the related cxl register, not mandatory for type2.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@....com>
> Perhaps add a brief note to the description on why you decided on the
> mix of warn vs err messages in the different conditions.
After your comment I think such divergence needs to be addressed. Using
the warn comes from the cxl pci driver, but it is a different situation
here, so dev_err should be used instead.
>
> Superficially there is a call in here that can defer. If it can't
> add a comment on why as if it can you should be failing the main
> driver probe until it doesn't defer (or adding a bunch of descriptive
> comments on why that doesn't make sense!)
Commenting on this below.
>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 34 ++++++++++++++++++++++++++++++
>> 1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index f1db7284dee8..ea02eb82b73c 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -9,6 +9,7 @@
>> * by the Free Software Foundation, incorporated herein by reference.
>> */
>>
>> +#include <cxl/cxl.h>
>> #include <cxl/pci.h>
>> #include <linux/pci.h>
>>
>> @@ -23,6 +24,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> struct pci_dev *pci_dev = efx->pci_dev;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> + int rc;
>>
>> probe_data->cxl_pio_initialised = false;
>>
>> @@ -43,6 +45,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> if (!cxl)
>> return -ENOMEM;
>>
>> + rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
>> + &cxl->cxlds.reg_map);
>> + if (rc) {
>> + dev_warn(&pci_dev->dev, "No component registers (err=%d)\n", rc);
>> + return rc;
> I haven't checked the code paths to see if we might hit them but this might
> defer. In which case
> return dev_err_probe() is appropriate as it stashes away the
> cause of deferral for debugging purposes and doesn't print if that's what
> happened as we'll be back later.
>
> If we can hit the deferral then you should catch that at the caller of efx_cxl_init()
> and fail the probe (we'll be back a bit later and should then succeed).
>
I'm scare of opening this can ... but I think adding probe deferral
support to the sfc driver is not an option, or at least something we
want to avoid because the complexity it would add.
If this call returns EPROBE_DEFER it would be because cxl_mem is not
loaded or some delay with the cxl mem device initialization when probe
inside that module, because the other potential CXL modules should be
loaded at this point (subsys_initcall vs device_initcall). I am aware
there could be some latencies when those modules are initializing,
although I think it is not a big problem now and if it is, I think we
should address it specifically. Then, if the problem is with cxl_mem,
something I have been witnessing, IMO it is preferable to unload the sfc
driver and load it again, once the cxl_mem is installed, than to support
probe deferral. Moreover, if the problem is with something related to
the cxl_mem probe, I doubt probe deferral will help at all since the sfc
cxl unwinding will remove the mem device as well, so the probing will be
needed to happen again ...
Did I say I did not want to open this can?
>> + }
>> +
>> + if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
>> + dev_err(&pci_dev->dev, "Expected HDM component register not found\n");
>> + return -ENODEV;
> Trivial but given this is new code maybe differing from style of existing sfc
> and using
> return dev_err_probe(&pci->dev, "Expected HDM component register not found\n");
> would be a nice to have. Given deferral isn't a thing for this call, it just saves on about
> 2 lines of code for each use.
It makes sense.
Thanks!
> or use pci_err() and pci_warn()?
>
>
>> + }
>> +
>> + if (!cxl->cxlds.reg_map.component_map.ras.valid) {
>> + dev_err(&pci_dev->dev, "Expected RAS component register not found\n");
>> + return -ENODEV;
>> + }
>> +
>> + rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
>> + &cxl->cxlds.regs.component,
>> + BIT(CXL_CM_CAP_CAP_ID_RAS));
>> + if (rc) {
>> + dev_err(&pci_dev->dev, "Failed to map RAS capability.\n");
>> + return rc;
>> + }
>> +
>> + /*
>> + * Set media ready explicitly as there are neither mailbox for checking
>> + * this state nor the CXL register involved, both not mandatory for
>> + * type2.
>> + */
>> + cxl->cxlds.media_ready = true;
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
Powered by blists - more mailing lists