Message-ID: <9515a39b692eeaadbdc0dcf8903ad2ab9b3ca64e.camel@redhat.com>
Date: Tue, 08 Nov 2022 11:19:31 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>, Saeed Mahameed <saeed@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>, netdev@...r.kernel.org,
Tariq Toukan <tariqt@...dia.com>,
Moshe Shemesh <moshe@...dia.com>
Subject: Re: [V2 net 05/11] net/mlx5: Fix possible deadlock on
mlx5e_tx_timeout_work
On Mon, 2022-11-07 at 20:24 -0800, Jakub Kicinski wrote:
> On Sat, 5 Nov 2022 00:10:22 -0700 Saeed Mahameed wrote:
> > + /* Once deactivated, new tx_timeout_work won't be initiated. */
> > + if (current_work() != &priv->tx_timeout_work)
> > + cancel_work_sync(&priv->tx_timeout_work);
>
> The work takes rtnl_lock, are there no callers of
> mlx5e_switch_priv_channels() that are under rtnl_lock()?
>
> This patch is definitely going onto my "expecting Fixes"
> bingo card :S
I think Jakub is right and even mlx5e_close_locked() will deadlock on
cancel_work_sync() if the work is scheduled but it has not yet acquired
the rtnl lock.
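To make the ordering concrete, here is a single-threaded toy model in plain C, not kernel code. The toy_* helpers and close_would_deadlock() are hypothetical names invented for this illustration. The only kernel behaviour assumed is that cancel_work_sync() waits for a work item that has already started executing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a sleeping lock: it only records whether it is held. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Models the first step of the work item: it needs rtnl to proceed. */
static bool work_can_progress(struct toy_lock *rtnl)
{
	if (!toy_trylock(rtnl))
		return false;	/* the real work would sleep here */
	toy_unlock(rtnl);
	return true;
}

/*
 * Models a close path that is entered with rtnl held and then waits
 * for the work. Returns true when that wait could never finish.
 */
static bool close_would_deadlock(bool work_started)
{
	struct toy_lock rtnl = { .held = false };
	bool stuck;

	toy_trylock(&rtnl);	/* the caller holds rtnl */
	/* A started work item must finish before the wait returns. */
	stuck = work_started && !work_can_progress(&rtnl);
	toy_unlock(&rtnl);
	return stuck;
}
```

In this model the wait is only stuck when the work has started but has not yet taken rtnl, which is the window described above.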
IIRC lockdep is not able to catch this kind of situation, so you can
only observe the deadlock when reaching the critical scenario.
I'm wild guessing that a possible solution would be to restrict the
state_lock scope in mlx5e_tx_timeout_work() around the state check,
without additional cancel_work operations.
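A rough sketch of that idea, again as a toy userspace model rather than a patch against the driver. Every name here (toy_priv, toy_recover, toy_tx_timeout_work) is hypothetical. It also assumes, purely for illustration, that the recovery step may need state_lock itself. The rtnl side is not modelled, and the state can change after the check, which this sketch does not address.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: asserts instead of sleeping, so misuse is caught at once. */
struct toy_lock { bool held; };

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* would be a self-deadlock in real code */
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_priv {
	struct toy_lock state_lock;
	bool opened;		/* stands in for the "opened" state bit */
	int recoveries;		/* counts how often recovery ran */
};

/* Assumed for illustration: recovery takes state_lock on its own. */
static void toy_recover(struct toy_priv *priv)
{
	toy_acquire(&priv->state_lock);
	priv->recoveries++;
	toy_release(&priv->state_lock);
}

static void toy_tx_timeout_work(struct toy_priv *priv)
{
	bool opened;

	/* Hold state_lock only around the state check. */
	toy_acquire(&priv->state_lock);
	opened = priv->opened;
	toy_release(&priv->state_lock);

	if (!opened)
		return;

	/* Recovery runs with state_lock dropped. */
	toy_recover(priv);
}
```

With the lock dropped before recovery there is nothing left for the close path to cancel and wait on, which is the point of avoiding the extra cancel_work operations.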
Thanks,
Paolo