[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1765284977-1363052-9-git-send-email-tariqt@nvidia.com>
Date: Tue, 9 Dec 2025 14:56:16 +0200
From: Tariq Toukan <tariqt@...dia.com>
To: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
S. Miller" <davem@...emloft.net>
CC: Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
Tariq Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>,
<netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Gal Pressman <gal@...dia.com>, Moshe Shemesh
<moshe@...dia.com>, Breno Leitao <leitao@...ian.org>, Alexandre Cassen
<acassen@...p.free.fr>, William Tu <witu@...dia.com>
Subject: [PATCH net 8/9] net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
During channel reconfiguration (e.g., ethtool private flags changes),
the driver can trigger a kernel BUG_ON in dql_completed() with the error
"kernel BUG at lib/dynamic_queue_limits.c:99".
The issue occurs in the following sequence:
During mlx5e_safe_switch_params(), old channels are deactivated via
mlx5e_deactivate_txqsq(). New channels are created and activated, taking
ownership of the netdev_queues and their BQL state.
When old channels are closed via mlx5e_close_txqsq(), there may be
pending TX descriptors (sq->cc != sq->pc) that were in-flight during the
deactivation.
mlx5e_free_txqsq_descs() frees these pending descriptors and attempts to
complete them via netdev_tx_completed_queue().
However, the BQL state (dql->num_queued and dql->num_completed) have
been reset in mlx5e_activate_txqsq and belong to the new queue owner,
leading to dql->num_queued - dql->num_completed < nbytes.
This triggers BUG_ON(count > num_queued - num_completed) in
dql_completed().
Fixes: 3b88a535a8e1 ("net/mlx5e: Defer channels closure to reduce interface down time")
Signed-off-by: Tariq Toukan <tariqt@...dia.com>
Signed-off-by: William Tu <witu@...dia.com>
Reviewed-by: Dragos Tatulea <dtatulea@...dia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 14884b9ea7f3..a01ee656a1e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -939,7 +939,11 @@ void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq)
sq->dma_fifo_cc = dma_fifo_cc;
sq->cc = sqcc;
- netdev_tx_completed_queue(sq->txq, npkts, nbytes);
+ /* Do not update BQL for TXQs that got replaced by new active ones, as
+ * netdev_tx_reset_queue() is called for them in mlx5e_activate_txqsq().
+ */
+ if (sq == sq->priv->txq2sq[sq->txq_ix])
+ netdev_tx_completed_queue(sq->txq, npkts, nbytes);
}
#ifdef CONFIG_MLX5_CORE_IPOIB
--
2.31.1
Powered by blists - more mailing lists