netdev - Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <593e126b-7942-484b-bdf3-2f8d25273f31@amd.com>
Date: Mon, 30 Jun 2025 16:57:29 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 alejandro.lucero-palau@....com
Cc: linux-cxl@...r.kernel.org, netdev@...r.kernel.org,
 dan.j.williams@...el.com, edward.cree@....com, davem@...emloft.net,
 kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com, dave.jiang@...el.com
Subject: Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media
 ready


On 6/27/25 09:39, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:38 +0100
> alejandro.lucero-palau@....com wrote:
>
>> From: Alejandro Lucero <alucerop@....com>
>>
>> Use cxl code for registers discovery and mapping regarding cxl component
>> regs and validate registers found are as expected.
>>
>> Set media ready explicitly as there is no means for doing so without
>> a mailbox, and without the related cxl register, not mandatory for type2.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@....com>
> Perhaps add a brief note to the description on why you decided on the
> mix of warn vs err messages in the different conditions.


After your comment I think such divergence needs to be addressed. Using 
the warn comes from the cxl pci driver, but it is a different situation 
here, so dev_err should be used instead.


>
> Superficially there is a call in here that can defer.  If it can't
> add a comment on why as if it can you should be failing the main
> driver probe until it doesn't defer (or adding a bunch of descriptive
> comments on why that doesn't make sense!)


Commenting on this below.


>
>> ---
>>   drivers/net/ethernet/sfc/efx_cxl.c | 34 ++++++++++++++++++++++++++++++
>>   1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index f1db7284dee8..ea02eb82b73c 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -9,6 +9,7 @@
>>    * by the Free Software Foundation, incorporated herein by reference.
>>    */
>>   
>> +#include <cxl/cxl.h>
>>   #include <cxl/pci.h>
>>   #include <linux/pci.h>
>>   
>> @@ -23,6 +24,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>   	struct pci_dev *pci_dev = efx->pci_dev;
>>   	struct efx_cxl *cxl;
>>   	u16 dvsec;
>> +	int rc;
>>   
>>   	probe_data->cxl_pio_initialised = false;
>>   
>> @@ -43,6 +45,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>   	if (!cxl)
>>   		return -ENOMEM;
>>   
>> +	rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
>> +				&cxl->cxlds.reg_map);
>> +	if (rc) {
>> +		dev_warn(&pci_dev->dev, "No component registers (err=%d)\n", rc);
>> +		return rc;
> I haven't checked the code paths to see if we might hit them but this might
> defer.  In which case
> 		return dev_err_probe() is appropriate as it stashes away the
> cause of deferral for debugging purposes and doesn't print if that's what
> happened as we'll be back later.
>
> If we can hit the deferral then you should catch that at the caller of efx_cxl_init()
> and fail the probe (we'll be back a bit later and should then succeed).
>

I'm scare of opening this can ... but I think adding probe deferral 
support to the sfc driver is not an option, or at least something we 
want to avoid because the complexity it would add.


If this call returns EPROBE_DEFER it would be because cxl_mem is not 
loaded or some delay with the cxl mem device initialization when probe 
inside that module, because the other potential CXL modules should be 
loaded at this point (subsys_initcall vs device_initcall). I am aware 
there could be some latencies when those modules are initializing, 
although I think it is not a big problem now and if it is, I think we 
should address it specifically. Then, if the problem is with cxl_mem, 
something I have been witnessing, IMO it is preferable to unload the sfc 
driver and load it again, once the cxl_mem is installed, than to support 
probe deferral. Moreover, if the problem is with something related to 
the cxl_mem probe, I doubt probe deferral will help at all since the sfc 
cxl unwinding will remove the mem device as well, so the probing will be 
needed to happen again ...


Did I say I did not want to open this can?


>> +	}
>> +
>> +	if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
>> +		dev_err(&pci_dev->dev, "Expected HDM component register not found\n");
>> +		return -ENODEV;
> Trivial but given this is new code maybe differing from style of existing sfc
> and using
> 		return dev_err_probe(&pci->dev, "Expected HDM component register not found\n");
> would be a nice to have.  Given deferral isn't a thing for this call, it just saves on about
> 2 lines of code for each use.


It makes sense.


Thanks!


> or use pci_err() and pci_warn()?
>
>   
>> +	}
>> +
>> +	if (!cxl->cxlds.reg_map.component_map.ras.valid) {
>> +		dev_err(&pci_dev->dev, "Expected RAS component register not found\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
>> +				    &cxl->cxlds.regs.component,
>> +				    BIT(CXL_CM_CAP_CAP_ID_RAS));
>> +	if (rc) {
>> +		dev_err(&pci_dev->dev, "Failed to map RAS capability.\n");
>> +		return rc;
>> +	}
>> +
>> +	/*
>> +	 * Set media ready explicitly as there are neither mailbox for checking
>> +	 * this state nor the CXL register involved, both not mandatory for
>> +	 * type2.
>> +	 */
>> +	cxl->cxlds.media_ready = true;
>> +
>>   	probe_data->cxl = cxl;
>>   
>>   	return 0;