[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3a5673b7-47d2-4a59-a515-95b396364994@nvidia.com>
Date: Sun, 17 Aug 2025 11:36:43 +0300
From: Carolina Jubran <cjubran@...dia.com>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Tariq Toukan <tariqt@...dia.com>
Cc: Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
Mark Bloch <mbloch@...dia.com>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
Gal Pressman <gal@...dia.com>, Dragos Tatulea <dtatulea@...dia.com>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no
configuration remains
On 14/08/2025 12:55, Przemek Kitszel wrote:
> On 8/13/25 16:31, Tariq Toukan wrote:
>> From: Carolina Jubran <cjubran@...dia.com>
>>
>> If a VF has been configured and the user later clears all QoS settings,
>> the vport element remains in the firmware QoS tree. This leads to
>> inconsistent behavior compared to VFs that were never configured, since
>> the FW assumes that unconfigured VFs are outside the QoS hierarchy.
>> As a result, the bandwidth share across VFs may differ, even though
>> none of them appear to have any configuration.
>>
>> Align the driver behavior with the FW expectation by destroying the
>> vport QoS element when all configurations are removed.
>>
>> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
>> Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement
>> full support for tc-bw")
>> Signed-off-by: Carolina Jubran <cjubran@...dia.com>
>> Reviewed-by: Cosmin Ratiu <cratiu@...dia.com>
>> Signed-off-by: Tariq Toukan <tariqt@...dia.com>
>> ---
>> .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 54 +++++++++++++++++--
>> 1 file changed, 51 insertions(+), 3 deletions(-)
>
>
>> +static bool esw_vport_qos_check_and_disable(struct mlx5_vport *vport,
>> + struct devlink_rate *parent,
>> + u64 tx_max, u64 tx_share,
>> + u32 *tc_bw)
>> +{
>> + struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
>> +
>> + if (parent || tx_max || tx_share || !esw_qos_tc_bw_disabled(tc_bw))
>> + return false;
>> +
>> + esw_qos_lock(esw);
>> + mlx5_esw_qos_vport_disable_locked(vport);
>> + esw_qos_unlock(esw);
>> +
>> + return true;
>> +}
>> +
>> int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
>> {
>> if (esw->qos.domain)
>> @@ -1703,6 +1731,11 @@ int
>> mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate
>> *rate_leaf, void
>> if (!mlx5_esw_allowed(esw))
>> return -EPERM;
>> + if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
>> + rate_leaf->tx_max, tx_share,
>> + rate_leaf->tc_bw))
>> + return 0;
>> +
>
> I would rather keep executing the code that "sets tx_share to 0 and
> propagates the info", and only then prune all-0 nodes.
> Same for other params (tx_max, ...)
>
> That would be less risky and more future-proof, and also would let your
> check&disable function to take less params.
>
> Finally, the name is poor, what about:?
> esw_vport_qos_prune_empty(vport, rate_leaf);
> (after applying my prev suggestion the above line will be at the
> bottom of function).
>
> Also a note, that if you apply the above, it would be also good to
> keep the "esw_qos_lock() just once" (as it is now)
>
Thanks for the review!
My initial thought was to reduce the amount of FW commands, but I agree
with your points.
Also, thanks for the name suggestion.
I’ll fix and send v2.
Carolina
>> err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_share",
>> &tx_share, extack);
>> if (err)
>> return err;
>> @@ -1724,6 +1757,11 @@ int
>> mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf,
>> void *
>> if (!mlx5_esw_allowed(esw))
>> return -EPERM;
>> + if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
>> tx_max,
>> + rate_leaf->tx_share,
>> + rate_leaf->tc_bw))
>> + return 0;
>> +
>> err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_max",
>> &tx_max, extack);
>> if (err)
>> return err;
>> @@ -1749,6 +1787,11 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct
>> devlink_rate *rate_leaf,
>> if (!mlx5_esw_allowed(esw))
>> return -EPERM;
>> + if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
>> + rate_leaf->tx_max,
>> + rate_leaf->tx_share, tc_bw))
>> + return 0;
>> +
>> disable = esw_qos_tc_bw_disabled(tc_bw);
>> esw_qos_lock(esw);
>> @@ -1930,6 +1973,11 @@ int
>> mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate,
>> struct mlx5_esw_sched_node *node;
>> struct mlx5_vport *vport = priv;
>> + if (esw_vport_qos_check_and_disable(vport, parent, devlink_rate-
>> >tx_max,
>> + devlink_rate->tx_share,
>> + devlink_rate->tc_bw))
>> + return 0;
>> +
>> if (!parent)
>> return mlx5_esw_qos_vport_update_parent(vport, NULL, extack);
>
Powered by blists - more mailing lists