[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aHE5jmIOD0hOdWec@ff87d1e86a04>
Date: Fri, 11 Jul 2025 16:19:26 +0000
From: Subbaraya Sundeep <sbhatta@...vell.com>
To: Saeed Mahameed <saeed@...nel.org>
CC: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
Saeed
Mahameed <saeedm@...dia.com>, <netdev@...r.kernel.org>,
Tariq Toukan
<tariqt@...dia.com>, Gal Pressman <gal@...dia.com>,
Leon Romanovsky
<leonro@...dia.com>, Jiri Pirko <jiri@...dia.com>,
Simon Horman
<horms@...nel.org>
Subject: Re: [PATCH net-next V6 02/13] net/mlx5: Implement cqe_compress_type
via devlink params
On 2025-07-10 at 05:51:58, Saeed Mahameed (saeed@...nel.org) wrote:
> On 09 Jul 09:06, Subbaraya Sundeep wrote:
> > Hi Saeed,
> >
> > On 2025-07-09 at 03:04:44, Saeed Mahameed (saeed@...nel.org) wrote:
> > > From: Saeed Mahameed <saeedm@...dia.com>
> > >
> > > Selects which algorithm should be used by the NIC in order to decide rate of
> > > CQE compression dependeng on PCIe bus conditions.
> > >
> > > Supported values:
> > >
> > > 1) balanced, merges fewer CQEs, resulting in a moderate compression ratio
> > > but maintaining a balance between bandwidth savings and performance
> > > 2) aggressive, merges more CQEs into a single entry, achieving a higher
> > > compression rate and maximizing performance, particularly under high
> > > traffic loads.
> > >
> > > Signed-off-by: Saeed Mahameed <saeedm@...dia.com>
> > > Reviewed-by: Jiri Pirko <jiri@...dia.com>
> > > ---
> > > Documentation/networking/devlink/mlx5.rst | 9 +
> > > .../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
> > > .../net/ethernet/mellanox/mlx5/core/devlink.c | 8 +
> > > .../net/ethernet/mellanox/mlx5/core/devlink.h | 1 +
> > > .../mellanox/mlx5/core/lib/nv_param.c | 245 ++++++++++++++++++
> > > .../mellanox/mlx5/core/lib/nv_param.h | 14 +
> > > include/linux/mlx5/driver.h | 1 +
> > > 7 files changed, 279 insertions(+), 1 deletion(-)
> > > create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
> > > create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h
> > >
> > > diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
> > > index 7febe0aecd53..417e5cdcd35d 100644
> > > --- a/Documentation/networking/devlink/mlx5.rst
> > > +++ b/Documentation/networking/devlink/mlx5.rst
> > > @@ -117,6 +117,15 @@ parameters.
> > > - driverinit
> > > - Control the size (in packets) of the hairpin queues.
> > >
> > > + * - ``cqe_compress_type``
> > > + - string
> > > + - permanent
> > > + - Configure which algorithm should be used by the NIC in order to decide
> > > + rate of CQE compression dependeng on PCIe bus conditions.
typo -> depending
> > > +
> > > + * ``balanced`` : Merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance
> > > + * ``aggressive`` : Merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads
> > > +
> > I think it is better to have this cqe param from ethtool.
> > I remember Jakub suggested to implement cqe-size in ethtool when I
> > submitted with devlink initially. Do you think this HW feature may become
> > common for other NICs too in future?
> >
>
> I don't think we should be using ehttool to set permanent NV global HCA
> configs :/, as the series suggests, these are permanent device parameters
> that are preserved on power cycles, hence "permanent devlink param".
>
> "balanced and aggressive" currently the two different ASIC implementation
> for cqe compression in ConnectX chips to accommodate different kinds of
> servers/topologies and PCI connections. I don't see how this could be
> common for other vendors and if others would agree on the same naming..
>
Got it.
Thanks,
Sundeep
> > Thanks,
> > Sundeep
> >
> > > The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
> > >
> > > Info versions
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > > index d292e6a9e22c..26c824e13c45 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > > @@ -17,7 +17,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
> > > fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \
> > > lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
> > > diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o diag/reporter_vnic.o \
> > > - fw_reset.o qos.o lib/tout.o lib/aso.o wc.o fs_pool.o
> > > + fw_reset.o qos.o lib/tout.o lib/aso.o wc.o fs_pool.o lib/nv_param.o
> > >
> > > #
> > > # Netdev basic
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> > > index 3ffa3fbacd16..18347b44d611 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> > > @@ -10,6 +10,7 @@
> > > #include "esw/qos.h"
> > > #include "sf/dev/dev.h"
> > > #include "sf/sf.h"
> > > +#include "lib/nv_param.h"
> > >
> > > static int mlx5_devlink_flash_update(struct devlink *devlink,
> > > struct devlink_flash_update_params *params,
> > > @@ -895,8 +896,14 @@ int mlx5_devlink_params_register(struct devlink *devlink)
> > > if (err)
> > > goto max_uc_list_err;
> > >
> > > + err = mlx5_nv_param_register_dl_params(devlink);
> > > + if (err)
> > > + goto nv_param_err;
> > > +
> > > return 0;
> > >
> > > +nv_param_err:
> > > + mlx5_devlink_max_uc_list_params_unregister(devlink);
> > > max_uc_list_err:
> > > mlx5_devlink_auxdev_params_unregister(devlink);
> > > auxdev_reg_err:
> > > @@ -907,6 +914,7 @@ int mlx5_devlink_params_register(struct devlink *devlink)
> > >
> > > void mlx5_devlink_params_unregister(struct devlink *devlink)
> > > {
> > > + mlx5_nv_param_unregister_dl_params(devlink);
> > > mlx5_devlink_max_uc_list_params_unregister(devlink);
> > > mlx5_devlink_auxdev_params_unregister(devlink);
> > > devl_params_unregister(devlink, mlx5_devlink_params,
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
> > > index 961f75da6227..74bcdfa70361 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
> > > @@ -22,6 +22,7 @@ enum mlx5_devlink_param_id {
> > > MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT,
> > > MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES,
> > > MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE,
> > > + MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE
> > > };
> > >
> > > struct mlx5_trap_ctx {
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
> > > new file mode 100644
> > > index 000000000000..20a39483be04
> > > --- /dev/null
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
> > > @@ -0,0 +1,245 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
> > > +
> > > +#include "nv_param.h"
> > > +#include "mlx5_core.h"
> > > +
> > > +enum {
> > > + MLX5_CLASS_0_CTRL_ID_NV_SW_OFFLOAD_CONFIG = 0x10a,
> > > +};
> > > +
> > > +struct mlx5_ifc_configuration_item_type_class_global_bits {
> > > + u8 type_class[0x8];
> > > + u8 parameter_index[0x18];
> > > +};
> > > +
> > > +union mlx5_ifc_config_item_type_auto_bits {
> > > + struct mlx5_ifc_configuration_item_type_class_global_bits
> > > + configuration_item_type_class_global;
> > > + u8 reserved_at_0[0x20];
> > > +};
> > > +
> > > +struct mlx5_ifc_config_item_bits {
> > > + u8 valid[0x2];
> > > + u8 priority[0x2];
> > > + u8 header_type[0x2];
> > > + u8 ovr_en[0x1];
> > > + u8 rd_en[0x1];
> > > + u8 access_mode[0x2];
> > > + u8 reserved_at_a[0x1];
> > > + u8 writer_id[0x5];
> > > + u8 version[0x4];
> > > + u8 reserved_at_14[0x2];
> > > + u8 host_id_valid[0x1];
> > > + u8 length[0x9];
> > > +
> > > + union mlx5_ifc_config_item_type_auto_bits type;
> > > +
> > > + u8 reserved_at_40[0x10];
> > > + u8 crc16[0x10];
> > > +};
> > > +
> > > +struct mlx5_ifc_mnvda_reg_bits {
> > > + struct mlx5_ifc_config_item_bits configuration_item_header;
> > > +
> > > + u8 configuration_item_data[64][0x20];
> > > +};
> > > +
> > > +struct mlx5_ifc_nv_sw_offload_conf_bits {
> > > + u8 ip_over_vxlan_port[0x10];
> > > + u8 tunnel_ecn_copy_offload_disable[0x1];
> > > + u8 pci_atomic_mode[0x3];
> > > + u8 sr_enable[0x1];
> > > + u8 ptp_cyc2realtime[0x1];
> > > + u8 vector_calc_disable[0x1];
> > > + u8 uctx_en[0x1];
> > > + u8 prio_tag_required_en[0x1];
> > > + u8 esw_fdb_ipv4_ttl_modify_enable[0x1];
> > > + u8 mkey_by_name[0x1];
> > > + u8 ip_over_vxlan_en[0x1];
> > > + u8 one_qp_per_recovery[0x1];
> > > + u8 cqe_compression[0x3];
> > > + u8 tunnel_udp_entropy_proto_disable[0x1];
> > > + u8 reserved_at_21[0x1];
> > > + u8 ar_enable[0x1];
> > > + u8 log_max_outstanding_wqe[0x5];
> > > + u8 vf_migration[0x2];
> > > + u8 log_tx_psn_win[0x6];
> > > + u8 lro_log_timeout3[0x4];
> > > + u8 lro_log_timeout2[0x4];
> > > + u8 lro_log_timeout1[0x4];
> > > + u8 lro_log_timeout0[0x4];
> > > +};
> > > +
> > > +#define MNVDA_HDR_SZ \
> > > + (MLX5_ST_SZ_BYTES(mnvda_reg) - \
> > > + MLX5_BYTE_OFF(mnvda_reg, configuration_item_data))
> > > +
> > > +#define MLX5_SET_CFG_ITEM_TYPE(_cls_name, _mnvda_ptr, _field, _val) \
> > > + MLX5_SET(mnvda_reg, _mnvda_ptr, \
> > > + configuration_item_header.type.configuration_item_type_class_##_cls_name._field, \
> > > + _val)
> > > +
> > > +#define MLX5_SET_CFG_HDR_LEN(_mnvda_ptr, _cls_name) \
> > > + MLX5_SET(mnvda_reg, _mnvda_ptr, configuration_item_header.length, \
> > > + MLX5_ST_SZ_BYTES(_cls_name))
> > > +
> > > +#define MLX5_GET_CFG_HDR_LEN(_mnvda_ptr) \
> > > + MLX5_GET(mnvda_reg, _mnvda_ptr, configuration_item_header.length)
> > > +
> > > +static int mlx5_nv_param_read(struct mlx5_core_dev *dev, void *mnvda,
> > > + size_t len)
> > > +{
> > > + u32 param_idx, type_class;
> > > + u32 header_len;
> > > + void *cls_ptr;
> > > + int err;
> > > +
> > > + if (WARN_ON(len > MLX5_ST_SZ_BYTES(mnvda_reg)) || len < MNVDA_HDR_SZ)
> > > + return -EINVAL; /* A caller bug */
> > > +
> > > + err = mlx5_core_access_reg(dev, mnvda, len, mnvda, len, MLX5_REG_MNVDA,
> > > + 0, 0);
> > > + if (!err)
> > > + return 0;
> > > +
> > > + cls_ptr = MLX5_ADDR_OF(mnvda_reg, mnvda,
> > > + configuration_item_header.type.configuration_item_type_class_global);
> > > +
> > > + type_class = MLX5_GET(configuration_item_type_class_global, cls_ptr,
> > > + type_class);
> > > + param_idx = MLX5_GET(configuration_item_type_class_global, cls_ptr,
> > > + parameter_index);
> > > + header_len = MLX5_GET_CFG_HDR_LEN(mnvda);
> > > +
> > > + mlx5_core_warn(dev, "Failed to read mnvda reg: type_class 0x%x, param_idx 0x%x, header_len %u, err %d\n",
> > > + type_class, param_idx, header_len, err);
> > > +
> > > + return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static int mlx5_nv_param_write(struct mlx5_core_dev *dev, void *mnvda,
> > > + size_t len)
> > > +{
> > > + if (WARN_ON(len > MLX5_ST_SZ_BYTES(mnvda_reg)) || len < MNVDA_HDR_SZ)
> > > + return -EINVAL;
> > > +
> > > + if (WARN_ON(MLX5_GET_CFG_HDR_LEN(mnvda) == 0))
> > > + return -EINVAL;
> > > +
> > > + return mlx5_core_access_reg(dev, mnvda, len, mnvda, len, MLX5_REG_MNVDA,
> > > + 0, 1);
> > > +}
> > > +
> > > +static int
> > > +mlx5_nv_param_read_sw_offload_conf(struct mlx5_core_dev *dev, void *mnvda,
> > > + size_t len)
> > > +{
> > > + MLX5_SET_CFG_ITEM_TYPE(global, mnvda, type_class, 0);
> > > + MLX5_SET_CFG_ITEM_TYPE(global, mnvda, parameter_index,
> > > + MLX5_CLASS_0_CTRL_ID_NV_SW_OFFLOAD_CONFIG);
> > > + MLX5_SET_CFG_HDR_LEN(mnvda, nv_sw_offload_conf);
> > > +
> > > + return mlx5_nv_param_read(dev, mnvda, len);
> > > +}
> > > +
> > > +static const char *const
> > > + cqe_compress_str[] = { "balanced", "aggressive" };
> > > +
> > > +static int
> > > +mlx5_nv_param_devlink_cqe_compress_get(struct devlink *devlink, u32 id,
> > > + struct devlink_param_gset_ctx *ctx)
> > > +{
> > > + struct mlx5_core_dev *dev = devlink_priv(devlink);
> > > + u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
> > > + u8 value = U8_MAX;
> > > + void *data;
> > > + int err;
> > > +
> > > + err = mlx5_nv_param_read_sw_offload_conf(dev, mnvda, sizeof(mnvda));
> > > + if (err)
> > > + return err;
> > > +
> > > + data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
> > > + value = MLX5_GET(nv_sw_offload_conf, data, cqe_compression);
> > > +
> > > + if (value >= ARRAY_SIZE(cqe_compress_str))
> > > + return -EOPNOTSUPP;
> > > +
> > > + strscpy(ctx->val.vstr, cqe_compress_str[value], sizeof(ctx->val.vstr));
> > > + return 0;
> > > +}
> > > +
> > > +static int
> > > +mlx5_nv_param_devlink_cqe_compress_validate(struct devlink *devlink, u32 id,
> > > + union devlink_param_value val,
> > > + struct netlink_ext_ack *extack)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(cqe_compress_str); i++) {
> > > + if (!strcmp(val.vstr, cqe_compress_str[i]))
> > > + return 0;
> > > + }
> > > +
> > > + NL_SET_ERR_MSG_MOD(extack,
> > > + "Invalid value, supported values are balanced/aggressive");
> > > + return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static int
> > > +mlx5_nv_param_devlink_cqe_compress_set(struct devlink *devlink, u32 id,
> > > + struct devlink_param_gset_ctx *ctx,
> > > + struct netlink_ext_ack *extack)
> > > +{
> > > + struct mlx5_core_dev *dev = devlink_priv(devlink);
> > > + u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
> > > + int err = 0;
> > > + void *data;
> > > + u8 value;
> > > +
> > > + if (!strcmp(ctx->val.vstr, "aggressive"))
> > > + value = 1;
> > > + else /* balanced: can't be anything else already validated above */
> > > + value = 0;
> > > +
> > > + err = mlx5_nv_param_read_sw_offload_conf(dev, mnvda, sizeof(mnvda));
> > > + if (err) {
> > > + NL_SET_ERR_MSG_MOD(extack,
> > > + "Failed to read sw_offload_conf mnvda reg");
> > > + return err;
> > > + }
> > > +
> > > + data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
> > > + MLX5_SET(nv_sw_offload_conf, data, cqe_compression, value);
> > > +
> > > + return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda));
> > > +}
> > > +
> > > +static const struct devlink_param mlx5_nv_param_devlink_params[] = {
> > > + DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE,
> > > + "cqe_compress_type", DEVLINK_PARAM_TYPE_STRING,
> > > + BIT(DEVLINK_PARAM_CMODE_PERMANENT),
> > > + mlx5_nv_param_devlink_cqe_compress_get,
> > > + mlx5_nv_param_devlink_cqe_compress_set,
> > > + mlx5_nv_param_devlink_cqe_compress_validate),
> > > +};
> > > +
> > > +int mlx5_nv_param_register_dl_params(struct devlink *devlink)
> > > +{
> > > + if (!mlx5_core_is_pf(devlink_priv(devlink)))
> > > + return 0;
> > > +
> > > + return devl_params_register(devlink, mlx5_nv_param_devlink_params,
> > > + ARRAY_SIZE(mlx5_nv_param_devlink_params));
> > > +}
> > > +
> > > +void mlx5_nv_param_unregister_dl_params(struct devlink *devlink)
> > > +{
> > > + if (!mlx5_core_is_pf(devlink_priv(devlink)))
> > > + return;
> > > +
> > > + devl_params_unregister(devlink, mlx5_nv_param_devlink_params,
> > > + ARRAY_SIZE(mlx5_nv_param_devlink_params));
> > > +}
> > > +
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h
> > > new file mode 100644
> > > index 000000000000..9f4922ff7745
> > > --- /dev/null
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h
> > > @@ -0,0 +1,14 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> > > +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
> > > +
> > > +#ifndef __MLX5_NV_PARAM_H
> > > +#define __MLX5_NV_PARAM_H
> > > +
> > > +#include <linux/mlx5/driver.h>
> > > +#include "devlink.h"
> > > +
> > > +int mlx5_nv_param_register_dl_params(struct devlink *devlink);
> > > +void mlx5_nv_param_unregister_dl_params(struct devlink *devlink);
> > > +
> > > +#endif
> > > +
> > > diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> > > index e6ba8f4f4bd1..96ce152e739f 100644
> > > --- a/include/linux/mlx5/driver.h
> > > +++ b/include/linux/mlx5/driver.h
> > > @@ -135,6 +135,7 @@ enum {
> > > MLX5_REG_MTCAP = 0x9009,
> > > MLX5_REG_MTMP = 0x900A,
> > > MLX5_REG_MCIA = 0x9014,
> > > + MLX5_REG_MNVDA = 0x9024,
> > > MLX5_REG_MFRL = 0x9028,
> > > MLX5_REG_MLCR = 0x902b,
> > > MLX5_REG_MRTC = 0x902d,
> > > --
> > > 2.50.0
> > >
Powered by blists - more mailing lists