Message-ID: <5e3337df-cb85-efbb-ceaf-a9d9808d981c@amd.com>
Date: Mon, 16 Sep 2024 13:03:10 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
alejandro.lucero-palau@....com
Cc: linux-cxl@...r.kernel.org, netdev@...r.kernel.org,
dan.j.williams@...el.com, martin.habets@...inx.com, edward.cree@....com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com
Subject: Re: [PATCH v3 01/20] cxl: add type2 device basic support
On 9/13/24 17:41, Jonathan Cameron wrote:
> On Sat, 7 Sep 2024 09:18:17 +0100
> <alejandro.lucero-palau@....com> wrote:
>
>> From: Alejandro Lucero <alucerop@....com>
> Hi Alejandro,
>
> I'm mainly looking at these to get my head back into this support
> for discussions next week but will probably leave
> lots of trivial review feedback as I go.
>
> And to advertise that:
> https://lpc.events/event/18/contributions/1828/
Looking forward to seeing you there, along with the other CXL kernel guys.
>> Differientiate Type3, aka memory expanders, from Type2, aka device
> Spell check. Differentiate.
Embarrassing ... I did fix that, or I thought I did, since Dan Williams
pointed it out as well.
I'll definitely fix it for v4.
>> accelerators, with a new function for initializing cxl_dev_state.
>>
>> Create accessors to cxl_dev_state to be used by accel drivers.
>>
>> Add SFC ethernet network driver as the client.
> Minor thing (And others may disagree) but I'd split this to be nice
> to others who might want to backport the type2 support but not
> the sfc changes (as they are supporting some other hardware).
Should I then send incremental sfc changes as the API is introduced,
or just a final patch with all of them?
>> Based on https://lore.kernel.org/linux-cxl/168592160379.1948938.12863272903570476312.stgit@dwillia2-xfh.jf.intel.com/
> Maybe make that a link tag Link: .... # [1]
> and have
> Based on [1] here.
OK.
>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>> Co-developed-by: Dan Williams <dan.j.williams@...el.com>
>
>> +int cxl_set_resource(struct cxl_dev_state *cxlds, struct resource res,
>> + enum cxl_resource type)
>> +{
>> + switch (type) {
>> + case CXL_ACCEL_RES_DPA:
>> + cxlds->dpa_res = res;
>> + return 0;
>> + case CXL_ACCEL_RES_RAM:
>> + cxlds->ram_res = res;
>> + return 0;
>> + case CXL_ACCEL_RES_PMEM:
>> + cxlds->pmem_res = res;
>> + return 0;
>> + default:
>> + dev_err(cxlds->dev, "unknown resource type (%u)\n", type);
> It's an enum, do we need the default? Hence do we need the return value?
>
I think it does no harm, and it helps when the enum is extended: it
avoids failing silently if not every place where the enum is used has
been properly updated.
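A minimal userspace sketch of that argument follows. All demo_* and DEMO_*
names are hypothetical stand-ins for the kernel types, and DEMO_RES_NEW
represents an enumerator added later without updating the setter:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for enum cxl_resource. */
enum demo_resource {
	DEMO_RES_DPA,
	DEMO_RES_RAM,
	DEMO_RES_PMEM,
	DEMO_RES_NEW,	/* added later; the setter below was not updated */
};

/* Hypothetical stand-in for the resource fields of cxl_dev_state. */
struct demo_state {
	int dpa, ram, pmem;
};

int demo_set_resource(struct demo_state *st, int val, enum demo_resource type)
{
	switch (type) {
	case DEMO_RES_DPA:
		st->dpa = val;
		return 0;
	case DEMO_RES_RAM:
		st->ram = val;
		return 0;
	case DEMO_RES_PMEM:
		st->pmem = val;
		return 0;
	default:
		/* Without this branch the new value would be dropped silently. */
		fprintf(stderr, "unknown resource type (%d)\n", (int)type);
		return -EINVAL;
	}
}
```

With the default branch, a caller passing DEMO_RES_NEW gets -EINVAL back
instead of a silent no-op, which is the reason for keeping the return value.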
>> + return -EINVAL;
>> + }
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_set_resource, CXL);
>> +
>> static int cxl_memdev_release_file(struct inode *inode, struct file *file)
>> {
>> struct cxl_memdev *cxlmd =
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 4be35dc22202..742a7b2a1be5 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -11,6 +11,8 @@
>> #include <linux/pci.h>
>> #include <linux/aer.h>
>> #include <linux/io.h>
>> +#include <linux/cxl/cxl.h>
>> +#include <linux/cxl/pci.h>
>> #include "cxlmem.h"
>> #include "cxlpci.h"
>> #include "cxl.h"
>> @@ -795,6 +797,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> struct cxl_memdev *cxlmd;
>> int i, rc, pmu_count;
>> bool irq_avail;
>> + u16 dvsec;
>>
>> /*
>> * Double check the anonymous union trickery in struct cxl_regs
>> @@ -815,12 +818,14 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> pci_set_drvdata(pdev, cxlds);
>>
>> cxlds->rcd = is_cxl_restricted(pdev);
>> - cxlds->serial = pci_get_dsn(pdev);
>> - cxlds->cxl_dvsec = pci_find_dvsec_capability(
>> - pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
>> - if (!cxlds->cxl_dvsec)
>> + cxl_set_serial(cxlds, pci_get_dsn(pdev));
>> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
>> + CXL_DVSEC_PCIE_DEVICE);
>> + if (!dvsec)
>> dev_warn(&pdev->dev,
>> "Device DVSEC not present, skip CXL.mem init\n");
>> + else
>> + cxl_set_dvsec(cxlds, dvsec);
> Set it unconditionally perhaps. If it's NULL that's fine and then it corresponds
> directly to the previous
OK. I guess I should keep the dev_warn, right?
>>
>> rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
>> if (rc)
>> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
>> index 6f1a01ded7d4..3a7406aa950c 100644
>> --- a/drivers/net/ethernet/sfc/efx.c
>> +++ b/drivers/net/ethernet/sfc/efx.c
>> @@ -33,6 +33,7 @@
>> #include "selftest.h"
>> #include "sriov.h"
>> #include "efx_devlink.h"
>> +#include "efx_cxl.h"
>>
>> #include "mcdi_port_common.h"
>> #include "mcdi_pcol.h"
>> @@ -899,6 +900,9 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
>> efx_pci_remove_main(efx);
>>
>> efx_fini_io(efx);
>> +
>> + efx_cxl_exit(efx);
>> +
>> pci_dbg(efx->pci_dev, "shutdown successful\n");
>>
>> efx_fini_devlink_and_unlock(efx);
>> @@ -1109,6 +1113,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
>> if (rc)
>> goto fail2;
>>
>> + /* A successful cxl initialization implies a CXL region created to be
>> + * used for PIO buffers. If there is no CXL support, or initialization
>> + * fails, efx_cxl_pio_initialised wll be false and legacy PIO buffers
>> + * defined at specific PCI BAR regions will be used.
>> + */
>> + rc = efx_cxl_init(efx);
>> + if (rc)
>> + pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
> If you are carrying on anyway is pci_info() more appropriate?
> Personally I dislike muddling on in error cases, but understand
> it can be useful on occasion at the cost of more complex flows.
>
>
Not sure. Note this is for the case where something went wrong on a
device that does have CXL support.
It is not fatal, but it is an error.
>> +
>> rc = efx_pci_probe_post_io(efx);
>> if (rc) {
>> /* On failure, retry once immediately.
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> new file mode 100644
>> index 000000000000..bba36cbbab22
>> --- /dev/null
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -0,0 +1,86 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/****************************************************************************
>> + *
>> + * Driver for AMD network controllers and boards
>> + * Copyright (C) 2024, Advanced Micro Devices, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License version 2 as published
>> + * by the Free Software Foundation, incorporated herein by reference.
>> + */
>> +
>> +#include <linux/cxl/cxl.h>
>> +#include <linux/cxl/pci.h>
>> +#include <linux/pci.h>
>> +
>> +#include "net_driver.h"
>> +#include "efx_cxl.h"
>> +
>> +#define EFX_CTPIO_BUFFER_SIZE (1024 * 1024 * 256)
>> +
>> +int efx_cxl_init(struct efx_nic *efx)
>> +{
>> + struct pci_dev *pci_dev = efx->pci_dev;
>> + struct efx_cxl *cxl;
>> + struct resource res;
>> + u16 dvsec;
>> + int rc;
>> +
>> + efx->efx_cxl_pio_initialised = false;
>> +
>> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
>> + CXL_DVSEC_PCIE_DEVICE);
>> +
> Trivial but probably no blank line here. Keeps the error condition tightly
> grouped with the call.
OK
>> + if (!dvsec)
>> + return 0;
>> +
>> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
>> +
>> + efx->cxl = kzalloc(sizeof(*cxl), GFP_KERNEL);
>> + if (!efx->cxl)
>> + return -ENOMEM;
>> +
>> + cxl = efx->cxl;
> Rather than setting it back to zero in some error paths I'd
> suggest keeping it as local only until you know everything
> succeeded.
>
> cxl = kzalloc(...)
>
It makes sense.
> //maybe also cxlds as then you can use __free() to handle the
> //cleanup paths for both allowing early returns instead
> //of gotos.
Maybe, but using __free is discouraged in networking code: see section 1.6.5 at
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
> ...
>
> efx->cxl = cxl;
>
> return 0;
>
>> +
>> + cxl->cxlds = cxl_accel_state_create(&pci_dev->dev);
>> + if (IS_ERR(cxl->cxlds)) {
>> + pci_err(pci_dev, "CXL accel device state failed");
>> + kfree(efx->cxl);
> Use a separate label below. Error blocks in a given function
> should probably do one or the other between going to labels
> or handling locally. Mixture is harder to read.
OK
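A rough userspace sketch of the two points agreed above: keep the pointer
local until everything has succeeded, and route every cleanup through
labels. The demo_* names and the fail_step parameter are hypothetical and
exist only to exercise the error paths; malloc/free stand in for
kzalloc/kfree:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct efx_cxl and struct efx_nic. */
struct demo_cxl {
	void *cxlds;
};

struct demo_nic {
	struct demo_cxl *cxl;
};

/* fail_step: 0 = success, 1 = state creation fails, 2 = resource setup fails */
int demo_cxl_init(struct demo_nic *nic, int fail_step)
{
	struct demo_cxl *cxl;
	int rc;

	cxl = calloc(1, sizeof(*cxl));
	if (!cxl)
		return -ENOMEM;	/* nothing to unwind yet */

	cxl->cxlds = (fail_step == 1) ? NULL : malloc(16);
	if (!cxl->cxlds) {
		rc = -ENOMEM;
		goto err_free_cxl;
	}

	if (fail_step == 2) {
		rc = -EINVAL;
		goto err_free_cxlds;
	}

	/* Publish the pointer only once nothing else can fail. */
	nic->cxl = cxl;
	return 0;

err_free_cxlds:
	free(cxl->cxlds);
err_free_cxl:
	free(cxl);
	return rc;
}
```

Because nic->cxl is assigned last, no error path has to reset it to NULL.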
>
>> + return -ENOMEM;
>> + }
>> +
>> + cxl_set_dvsec(cxl->cxlds, dvsec);
>> + cxl_set_serial(cxl->cxlds, pci_dev->dev.id);
>> +
>> + res = DEFINE_RES_MEM(0, EFX_CTPIO_BUFFER_SIZE);
>> + if (cxl_set_resource(cxl->cxlds, res, CXL_ACCEL_RES_DPA)) {
>> + pci_err(pci_dev, "cxl_set_resource DPA failed\n");
>> + rc = -EINVAL;
>> + goto err;
>> + }
>> +
>> + res = DEFINE_RES_MEM_NAMED(0, EFX_CTPIO_BUFFER_SIZE, "ram");
>> + if (cxl_set_resource(cxl->cxlds, res, CXL_ACCEL_RES_RAM)) {
>> + pci_err(pci_dev, "cxl_set_resource RAM failed\n");
>> + rc = -EINVAL;
>> + goto err;
>> + }
>> +
>> + return 0;
>> +err:
>> + kfree(cxl->cxlds);
>> + kfree(cxl);
>> + efx->cxl = NULL;
>> +
>> + return rc;
>> +}
>> +
>> +void efx_cxl_exit(struct efx_nic *efx)
>> +{
>> + if (efx->cxl) {
>> + kfree(efx->cxl->cxlds);
>> + kfree(efx->cxl);
>> + }
>> +}
>> +
>> +MODULE_IMPORT_NS(CXL);
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
>> new file mode 100644
>> index 000000000000..f57fb2afd124
>> --- /dev/null
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
> ...
>
>
>> +struct efx_cxl {
>> + struct cxl_dev_state *cxlds;
>> + struct cxl_memdev *cxlmd;
>> + struct cxl_root_decoder *cxlrd;
>> + struct cxl_port *endpoint;
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_region *efx_region;
> Why is the region efx_ prefixed but nothing else?
> Feels a little random.
>
>> + void __iomem *ctpio_cxl;
>> +};
>> +
>> +int efx_cxl_init(struct efx_nic *efx);
>> +void efx_cxl_exit(struct efx_nic *efx);
>> +#endif
>> diff --git a/include/linux/cxl/cxl.h b/include/linux/cxl/cxl.h
>> new file mode 100644
>> index 000000000000..e78eefa82123
>> --- /dev/null
>> +++ b/include/linux/cxl/cxl.h
>> @@ -0,0 +1,21 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2024 Advanced Micro Devices, Inc. */
>> +
>> +#ifndef __CXL_H
>> +#define __CXL_H
>> +
>> +#include <linux/device.h>
>> +
>> +enum cxl_resource {
>> + CXL_ACCEL_RES_DPA,
>> + CXL_ACCEL_RES_RAM,
>> + CXL_ACCEL_RES_PMEM,
>> +};
>> +
>> +struct cxl_dev_state *cxl_accel_state_create(struct device *dev);
>> +
>> +void cxl_set_dvsec(struct cxl_dev_state *cxlds, u16 dvsec);
>> +void cxl_set_serial(struct cxl_dev_state *cxlds, u64 serial);
>> +int cxl_set_resource(struct cxl_dev_state *cxlds, struct resource res,
>> + enum cxl_resource);
>> +#endif
>> diff --git a/include/linux/cxl/pci.h b/include/linux/cxl/pci.h
>> new file mode 100644
>> index 000000000000..c337ae8797e6
>> --- /dev/null
>> +++ b/include/linux/cxl/pci.h
>> @@ -0,0 +1,23 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2024 Advanced Micro Devices, Inc. */
> Bit bold to claim sole copyright of a cut and paste blob.
> Fine to add AMD one, but keep the original copyright as well.
>
Sure.
>> +
>> +#ifndef __CXL_ACCEL_PCI_H
>> +#define __CXL_ACCEL_PCI_H
>> +
>> +/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
>> +#define CXL_DVSEC_PCIE_DEVICE 0
>> +#define CXL_DVSEC_CAP_OFFSET 0xA
>> +#define CXL_DVSEC_MEM_CAPABLE BIT(2)
>> +#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
>> +#define CXL_DVSEC_CTRL_OFFSET 0xC
>> +#define CXL_DVSEC_MEM_ENABLE BIT(2)
>> +#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
>> +#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
>> +#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
>> +#define CXL_DVSEC_MEM_ACTIVE BIT(1)
>> +#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
>> +#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
>> +#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> Brackets around (i) to protect against stupid use of the macro.
> This is general kernel convention rather than a real problem here.
> Sure original code didn't do it but if we are touching the code
> might as well fix it ;)
I saw this warning when running checkpatch, but I thought I should not
change it since it came from a previous patch.
But I agree, I should fix it now.
Thanks!
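For illustration, a small sketch of what the brackets protect against. The
DEMO_* macros are hypothetical copies of CXL_DVSEC_RANGE_SIZE_HIGH, before
and after the change, called with an expression rather than a plain index:

```c
/* Hypothetical copies of the macro, without and with brackets around i. */
#define DEMO_RANGE_SIZE_HIGH_OLD(i)	(0x18 + (i * 0x10))
#define DEMO_RANGE_SIZE_HIGH_NEW(i)	(0x18 + ((i) * 0x10))

/* Expands to 0x18 + (n + 1 * 0x10): only the 1 is multiplied. */
int demo_offset_old(int n)
{
	return DEMO_RANGE_SIZE_HIGH_OLD(n + 1);
}

/* Expands to 0x18 + ((n + 1) * 0x10): the whole index is multiplied. */
int demo_offset_new(int n)
{
	return DEMO_RANGE_SIZE_HIGH_NEW(n + 1);
}
```

With n = 1 the old form yields 0x29 and the bracketed form yields the
intended 0x38. None of the current callers pass such an expression, so this
is about convention rather than an existing bug.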
>
>> +#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
>> +
>> +#endif