[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <61d4932d-7e8e-c8d0-d782-8b15c9d86714@amd.com>
Date: Wed, 8 Jan 2025 14:53:13 +0000
From: Alejandro Lucero Palau <alucerop@....com>
To: Dan Williams <dan.j.williams@...el.com>, alejandro.lucero-palau@....com,
linux-cxl@...r.kernel.org, netdev@...r.kernel.org, martin.habets@...inx.com,
edward.cree@....com, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, edumazet@...gle.com, dave.jiang@...el.com
Subject: Re: [PATCH v8 02/27] sfc: add cxl support using new CXL API
On 1/8/25 01:56, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@....com>
>>
>> Add CXL initialization based on new CXL API for accel drivers and make
>> it dependent on kernel CXL configuration.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@...il.com>
>> Acked-by: Edward Cree <ecree.xilinx@...il.com>
>> ---
>> drivers/net/ethernet/sfc/Kconfig | 7 +++
>> drivers/net/ethernet/sfc/Makefile | 1 +
>> drivers/net/ethernet/sfc/efx.c | 23 ++++++-
>> drivers/net/ethernet/sfc/efx_cxl.c | 87 +++++++++++++++++++++++++++
>> drivers/net/ethernet/sfc/efx_cxl.h | 28 +++++++++
>> drivers/net/ethernet/sfc/net_driver.h | 10 +++
>> 6 files changed, 155 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
>> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
>>
>> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
>> index 3eb55dcfa8a6..a8bc777baa95 100644
>> --- a/drivers/net/ethernet/sfc/Kconfig
>> +++ b/drivers/net/ethernet/sfc/Kconfig
>> @@ -65,6 +65,13 @@ config SFC_MCDI_LOGGING
>> Driver-Interface) commands and responses, allowing debugging of
>> driver/firmware interaction. The tracing is actually enabled by
>> a sysfs file 'mcdi_logging' under the PCI device.
>> +config SFC_CXL
>> + bool "Solarflare SFC9100-family CXL support"
>> + depends on SFC && CXL_BUS && !(SFC=y && CXL_BUS=m)
>> + default y
>
> This looks a bit messy, how about:
>
> depends on SFC
> depends on CXL_BUS >= SFC
> default SFC
>
> ...where the "depends on SFC" line could be deleted if this other
> "depends on SFC" options in this file are all moved under an "if SFC"
> section.
>
> ...where the "CXL_BUS >= SFC" is the canonical way to represent the
> "only build me if my core library is built-in or I am also a dynamic
> module".
>
> ...and where "default SFC" is a bit clearer that this is a "non-optional
> functionality of the SFC driver", not "non-optional functionality of the
> wider kernel".
>
> Noted that all of the above is inconsistent with the existing style in
> this file, I still think it's a worthwhile cleanup.
>
>
>> + help
>> + This enables CXL support by the driver relying on kernel support
>> + and hardware support.
> That feels like an "information free" help text. Given this
> capability auto-enables shouldn't the help text be giving some direction
> about when someone would want to turn it off? Or maybe reconsider making
> it "default y" if this is really functionality that someone should
> conciously opt-in to?
>
> Otherwise if it auto-enables and you do not expect anyone to turn it
> off, just disable the prompt for this by removing the help text and
> making it purely and automatic parameter.
>
>> source "drivers/net/ethernet/sfc/falcon/Kconfig"
>> source "drivers/net/ethernet/sfc/siena/Kconfig"
>> diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
>> index 8f446b9bd5ee..e909cafd5908 100644
>> --- a/drivers/net/ethernet/sfc/Makefile
>> +++ b/drivers/net/ethernet/sfc/Makefile
>> @@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
>> mae.o tc.o tc_bindings.o tc_counters.o \
>> tc_encap_actions.o tc_conntrack.o
>>
>> +sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
>> obj-$(CONFIG_SFC) += sfc.o
>>
>> obj-$(CONFIG_SFC_FALCON) += falcon/
>> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
>> index 650136dfc642..ef9bae88df6a 100644
>> --- a/drivers/net/ethernet/sfc/efx.c
>> +++ b/drivers/net/ethernet/sfc/efx.c
>> @@ -34,6 +34,9 @@
>> #include "selftest.h"
>> #include "sriov.h"
>> #include "efx_devlink.h"
>> +#ifdef CONFIG_SFC_CXL
>> +#include "efx_cxl.h"
>> +#endif
> Just unconditionally include this...
>
>>
>> #include "mcdi_port_common.h"
>> #include "mcdi_pcol.h"
>> @@ -1004,12 +1007,17 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
>> efx_pci_remove_main(efx);
>>
>> efx_fini_io(efx);
>> +
>> + probe_data = container_of(efx, struct efx_probe_data, efx);
>> +#ifdef CONFIG_SFC_CXL
>> + efx_cxl_exit(probe_data);
>> +#endif
> ...and add a section in efx_cxl.h that does:
>
> #ifdef CONFIG_SFC_CXL
> void efx_cxl_exit(struct efx_probe_data *probe_data);
> #else /* CONFIG_SFC_CXL */
> static inline void efx_cxl_exit(struct efx_probe_data *probe_data) { }
> #endif
>
> ...to meet the "no ifdef in C files" coding style guidance.
>
>
>> +
>> pci_dbg(efx->pci_dev, "shutdown successful\n");
>>
>> efx_fini_devlink_and_unlock(efx);
>> efx_fini_struct(efx);
>> free_netdev(efx->net_dev);
>> - probe_data = container_of(efx, struct efx_probe_data, efx);
>> kfree(probe_data);
>> };
>>
>> @@ -1214,6 +1222,16 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
>> if (rc)
>> goto fail2;
>>
>> +#ifdef CONFIG_SFC_CXL
>> + /* A successful cxl initialization implies a CXL region created to be
>> + * used for PIO buffers. If there is no CXL support, or initialization
>> + * fails, efx_cxl_pio_initialised wll be false and legacy PIO buffers
>> + * defined at specific PCI BAR regions will be used.
>> + */
>> + rc = efx_cxl_init(probe_data);
>> + if (rc)
>> + pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
>> +#endif
>> rc = efx_pci_probe_post_io(efx);
>> if (rc) {
>> /* On failure, retry once immediately.
>> @@ -1485,3 +1503,6 @@ MODULE_AUTHOR("Solarflare Communications and "
>> MODULE_DESCRIPTION("Solarflare network driver");
>> MODULE_LICENSE("GPL");
>> MODULE_DEVICE_TABLE(pci, efx_pci_table);
>> +#ifdef CONFIG_SFC_CXL
>> +MODULE_SOFTDEP("pre: cxl_core cxl_port cxl_acpi cxl-mem");
> No, endpoint drivers should not need softdep for cxl core modules.
> Primarily because this does nothing to ensure platform CXL
> capability-enumeration relative to PCI driver loading, and because half
> of those softdeps in that statement are redundant or broken.
>
> - cxl_core is already a dependency due to link time dependencies
> - cxl_port merely being loaded does nothing to enforce that port probing
> is complete by the time the driver loads. Instead the driver needs to
> use EPROBE_DEFER to wait for CXL enumeration, or it needs to use the
> scheme that cxl_pci uses which is register a memdev and teach userspace
> to wait for that memdev attaching to its driver event as the "CXL memory
> is now available" event.
> - cxl_acpi is a platform specific implementation detail. When / if a
> non-ACPI platform ever adds CXL support it would be broken if every
> endpoint softdep line needed to then be updated
> - cxl-mem is misspelled cxl_mem and likely is not having any effect.
>
> In short, if you delete this line and something breaks then it needs to
> be fixed in code and not module dependencies.
It has been a while since I worked on this part, so I need to put my
mind back, but with the right softdeps the driver initialization does
not fail due to the CXL modules not being loaded yet.
With a softdep for cxl_port, which is the problem here, and with
cxl_port having a dependency for cxl_core, I would say everything the
driver needs should be there at the time the sfc driver is loaded. I
agree cxl_acpi could be a problem because it is architecture dependent,
and I need to see the cxl_mem dependency which is obviously unclear
after you spotting the typo.
In other words, I need to think about all this again and perform some
testing. This was implemented after the problems you solved regarding
the loading dependencies and enumeration delays, and from our discussion
then, I though EPROBE_DEFER was not needed for solving this.
Thanks anyway.
Powered by blists - more mailing lists