Message-ID: <ffb70369-e64e-4e2a-8555-c36c6013b32f@nvidia.com>
Date: Sun, 11 May 2025 15:52:14 +0300
From: Moshe Shemesh <moshe@...dia.com>
To: Shawn.Shao <shawn.shao@...uarmicro.com>, <saeedm@...dia.com>,
<leon@...nel.org>, <tariqt@...dia.com>, <andrew+netdev@...n.ch>,
<davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
<pabeni@...hat.com>, <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: <xiaowu.ding@...uarmicro.com>
Subject: Re: [PATCH] MLX5: Fix semaphore leak on command timeout
On 5/9/2025 9:48 AM, Shawn.Shao wrote:
> From: Shawn Shao <shawn.shao@...uarmicro.com>
>
> Fixes a resource leak in the MLX5 driver when handling command timeouts.
> The command entry reference count (`mlx5_cmd_work_ent`) was not properly
> decremented during timeouts, causing the semaphore to remain unreleased.
>
> In the current flow, the reference count is incremented but not decremented
> in timeout cases. This prevents proper release of the semaphore.
>
> Add a condition to decrement the reference count when a timeout occurs,
> ensuring the semaphore is released and preventing resource leaks:
>
>     if (!forced || mlx5_cmd_is_down(dev) ||
>         !opcode_allowed(cmd, ent->op) ||
>         ent->ret == -ETIMEDOUT)
>             cmd_ent_put(ent);
>
> This ensures the semaphore is released properly on command timeouts.
We can't release it on command timeout. The firmware may still write the
answer to the command slot memory even after the driver has timed out.
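To make the hazard concrete, here is a minimal userspace sketch. It is not mlx5 code: `struct slot`, `slot_alloc()`, `slot_free()`, `fw_write_response()` and `late_write_corrupts()` are hypothetical. It assumes a single command slot whose response buffer the device can write at any time. If the driver frees the slot on timeout and hands it to the next command, a late response to the timed-out command lands in the new command's buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one command slot; not the mlx5 data structure. */
struct slot {
	bool in_use;
	int owner;        /* id of the command currently using the slot */
	char out[16];     /* response buffer the "firmware" writes into */
};

/* Driver side: claim the slot for a new command; fails if it is busy. */
static bool slot_alloc(struct slot *s, int cmd_id)
{
	if (s->in_use)
		return false;
	s->in_use = true;
	s->owner = cmd_id;
	memset(s->out, 0, sizeof(s->out));
	return true;
}

/* Driver side: give the slot back; freeing a free slot is a bug. */
static void slot_free(struct slot *s)
{
	assert(s->in_use);
	s->in_use = false;
}

/* Firmware side: the response lands in slot memory no matter what the
 * driver currently believes about ownership. */
static void fw_write_response(struct slot *s, const char *resp)
{
	strncpy(s->out, resp, sizeof(s->out) - 1);
	s->out[sizeof(s->out) - 1] = '\0';
}

/* Command 1 times out, command 2 is submitted, then the firmware finally
 * answers command 1. Returns true if command 2's buffer ends up holding
 * command 1's answer. */
static bool late_write_corrupts(bool free_on_timeout)
{
	struct slot s = {0};

	slot_alloc(&s, 1);                  /* submit command 1 */
	if (free_on_timeout)
		slot_free(&s);              /* driver gives up on command 1 */

	bool cmd2_got_slot = slot_alloc(&s, 2);
	fw_write_response(&s, "resp-cmd1"); /* late answer to command 1 */

	return cmd2_got_slot && strcmp(s.out, "resp-cmd1") == 0;
}
```

With `free_on_timeout` set, `late_write_corrupts()` returns true. When the slot is instead kept until the real completion, command 2 cannot take it and the function returns false.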
Note: a few lines above in this code there is a comment, "only real
completion can free the cmd slot". That is where it gets released:
    /* only real completion can free the cmd slot */
    if (!forced) {
        mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n",
                      ent->idx);
        cmd_ent_put(ent);
    }
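The reference ownership described above can be modelled roughly as follows. This is a simplified, hypothetical sketch: `struct cmd_ent`, `submit()` and `comp_handler()` only echo the names in the quoted code. It assumes the entry holds one reference for the waiting submitter and one that only a completion may drop, and that a `pending_comp` flag records whether a completion has been processed yet. A timeout-driven (forced) completion keeps the completion reference, so the slot stays allocated until the real firmware completion arrives, even if that happens after the timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified command entry; not the mlx5 structure. */
struct cmd_ent {
	int refcnt;          /* slot is released when this drops to 0 */
	bool pending_comp;   /* no completion has been processed yet */
	bool slot_released;  /* stands in for freeing the cmd slot */
};

static void ent_get(struct cmd_ent *ent)
{
	ent->refcnt++;
}

static void ent_put(struct cmd_ent *ent)
{
	assert(ent->refcnt > 0);  /* an extra put would be a bug */
	if (--ent->refcnt == 0)
		ent->slot_released = true;
}

/* Submission (assumed model): one reference for the submitter and one
 * that only the completion path may drop. */
static void submit(struct cmd_ent *ent)
{
	*ent = (struct cmd_ent){ .refcnt = 0 };
	ent_get(ent);            /* submitter's reference */
	ent_get(ent);            /* completion's reference */
	ent->pending_comp = true;
}

/* Completion handler; forced == true models a timeout-driven completion. */
static void comp_handler(struct cmd_ent *ent, bool forced)
{
	if (!ent->pending_comp) {
		/* already completed (e.g. by a timeout):
		 * only real completion can free the cmd slot */
		if (!forced)
			ent_put(ent);
		return;
	}
	ent->pending_comp = false;
	if (!forced)             /* real FW completion */
		ent_put(ent);
	/* forced: keep the reference, firmware may still write the slot */
}
```

In a timeout scenario the order is `submit()`, a forced `comp_handler()`, the submitter's own `ent_put()` once it returns -ETIMEDOUT, and finally the late real `comp_handler()`. Only that last step releases the slot.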
>
> Signed-off-by: Shawn Shao <shawn.shao@...uarmicro.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index e53dbdc0a7a1..7f1f6345d90c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -1714,7 +1714,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
>
> if (!forced || /* Real FW completion */
> mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
> - !opcode_allowed(cmd, ent->op))
> + !opcode_allowed(cmd, ent->op) ||
> + ent->ret == -ETIMEDOUT)
> cmd_ent_put(ent);
>
> ent->ts2 = ktime_get_ns();
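The hunk's effect can be checked in isolation. The sketch below uses hypothetical helpers, not driver code: it lifts the condition before and after the patch into plain functions, with `cmd_down` and `op_allowed` standing in for `mlx5_cmd_is_down(dev)` and `opcode_allowed(cmd, ent->op)`. It assumes that a timeout calls the handler with `forced` set and `ent->ret == -ETIMEDOUT`, and that a later real completion still goes through the quoted `if (!forced)` branch, which drops the reference once. Under those assumptions the patched condition drops the completion reference twice for a single command.

```c
#include <errno.h>
#include <stdbool.h>

/* Condition from the hunk before the patch (hypothetical stand-alone copy). */
static bool put_before_patch(bool forced, bool cmd_down, bool op_allowed)
{
	return !forced ||     /* real FW completion */
	       cmd_down ||    /* no real FW completion is expected */
	       !op_allowed;
}

/* Same condition with the proposed "ent->ret == -ETIMEDOUT" clause. */
static bool put_after_patch(bool forced, bool cmd_down, bool op_allowed,
			    int ret)
{
	return put_before_patch(forced, cmd_down, op_allowed) ||
	       ret == -ETIMEDOUT;
}

/* Count how often the completion reference is dropped when a command on a
 * healthy device times out and its real completion arrives later. */
static int completion_puts_after_timeout(bool patched)
{
	const bool forced = true;      /* timeout-driven completion */
	const bool cmd_down = false;   /* device is not down */
	const bool op_allowed = true;  /* ordinary opcode */
	int puts = 0;

	if (patched ? put_after_patch(forced, cmd_down, op_allowed, -ETIMEDOUT)
		    : put_before_patch(forced, cmd_down, op_allowed))
		puts++;

	puts++;  /* late real completion: "if (!forced) cmd_ent_put(ent);" */
	return puts;
}
```

`completion_puts_after_timeout(false)` returns 1, matching the single reference held for the completion, while `completion_puts_after_timeout(true)` returns 2.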