Message-ID: <ffb70369-e64e-4e2a-8555-c36c6013b32f@nvidia.com>
Date: Sun, 11 May 2025 15:52:14 +0300
From: Moshe Shemesh <moshe@...dia.com>
To: Shawn.Shao <shawn.shao@...uarmicro.com>, <saeedm@...dia.com>,
	<leon@...nel.org>, <tariqt@...dia.com>, <andrew+netdev@...n.ch>,
	<davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
	<pabeni@...hat.com>, <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
CC: <xiaowu.ding@...uarmicro.com>
Subject: Re: [PATCH] MLX5: Fix semaphore leak on command timeout



On 5/9/2025 9:48 AM, Shawn.Shao wrote:
> From: Shawn Shao <shawn.shao@...uarmicro.com>
> 
> Fixes a resource leak in the MLX5 driver when handling command timeouts.
> The command entry reference count (`mlx5_cmd_work_ent`) was not properly
> decremented during timeouts, causing the semaphore to remain unreleased.
> 
> In the current flow, the reference count is incremented but not decremented
> in timeout cases. This prevents proper release of the semaphore.
> 
> Add a condition to decrement the reference count when a timeout occurs,
> ensuring the semaphore is released and preventing resource leaks:
> 
>     if (!forced || mlx5_cmd_is_down(dev) ||
>         !opcode_allowed(cmd, ent->op) ||
>         ent->ret == -ETIMEDOUT)
>             cmd_ent_put(ent);
> 
> This ensures the semaphore is released properly on command timeouts.

We can't release it on command timeout: the firmware may still write its
answer to the command slot memory even after the driver has timed out.

Note: a few lines above in this code there is a comment, "only real
completion can free the cmd slot". That is where it will be released:

/* only real completion can free the cmd slot */
if (!forced) {
	mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n",
		      ent->idx);
	cmd_ent_put(ent);
}
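To make the argument concrete, here is a hypothetical userspace model (not the mlx5 code) of the reference lifecycle. It assumes an entry holds one reference for the waiting caller and one for the pending firmware completion, and that the command slot (and its semaphore) is released only when the last reference is dropped. The names `struct cmd_ent`, `submit()`, `on_timeout()` and `on_real_completion()` are invented for illustration; the real driver's accounting may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a command entry; not mlx5 code. */
struct cmd_ent {
	int refcnt;     /* references currently held on the entry */
	bool slot_busy; /* slot memory still owned by this command */
};

/* Drop one reference; the slot is released only on the last put. */
void ent_put(struct cmd_ent *ent)
{
	assert(ent->refcnt > 0); /* a double put would trip this */
	if (--ent->refcnt == 0)
		ent->slot_busy = false; /* slot (and semaphore) released */
}

/* Assumed split: one ref for the waiter, one for the real FW completion. */
void submit(struct cmd_ent *ent)
{
	ent->slot_busy = true;
	ent->refcnt = 2;
}

/* The waiter gives up, but FW may still DMA its answer into the slot,
 * so the completion reference is deliberately kept. */
void on_timeout(struct cmd_ent *ent)
{
	ent_put(ent);
}

/* Only a real FW completion, even a late one, drops the last reference. */
void on_real_completion(struct cmd_ent *ent)
{
	ent_put(ent);
}
```

After `submit()` and `on_timeout()` the slot is still busy with one reference left; only `on_real_completion()` frees it. If the timeout path also dropped the completion reference, the slot could be reused by a new command while firmware is still writing to it, and the later real completion would hit the assert in `ent_put()`.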


> 
> Signed-off-by: Shawn Shao <shawn.shao@...uarmicro.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index e53dbdc0a7a1..7f1f6345d90c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -1714,7 +1714,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
>   
>   			if (!forced || /* Real FW completion */
>   			     mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
> -			     !opcode_allowed(cmd, ent->op))
> +			     !opcode_allowed(cmd, ent->op) ||
> +			     ent->ret == -ETIMEDOUT)
>   				cmd_ent_put(ent);
>   
>   			ent->ts2 = ktime_get_ns();
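The diff changes a single boolean condition. A minimal sketch that models it in isolation, with plain booleans standing in for `forced`, `mlx5_cmd_is_down(dev)` and `opcode_allowed(cmd, ent->op)` (the functions `drops_ref_original()` and `drops_ref_patched()` are hypothetical), shows where the two versions diverge. Assuming the timeout path reaches the handler with `forced` set and `ent->ret == -ETIMEDOUT`, they differ only for a forced completion on a live device with an allowed opcode, which is exactly the case where firmware may still answer.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the condition before the patch. */
bool drops_ref_original(bool forced, bool cmd_down, bool op_allowed)
{
	return !forced ||   /* real FW completion */
	       cmd_down ||  /* no real FW completion is expected */
	       !op_allowed;
}

/* Hypothetical model of the condition after the patch. */
bool drops_ref_patched(bool forced, bool cmd_down, bool op_allowed, int ret)
{
	return drops_ref_original(forced, cmd_down, op_allowed) ||
	       ret == -ETIMEDOUT; /* new: also drop on timeout */
}
```

For `forced = true`, `cmd_down = false`, `op_allowed = true` and `ret = -ETIMEDOUT`, the original condition keeps the reference while the patched one drops it, so the slot would be released while the firmware can still write to it.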

