Message-ID: <75a38217239d4df76f53cd6c355c5179ffb97546.camel@nvidia.com>
Date: Mon, 2 Feb 2026 14:48:28 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "kuba@...nel.org" <kuba@...nel.org>
CC: "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>, Tariq Toukan
<tariqt@...dia.com>, Gal Pressman <gal@...dia.com>, Mark Bloch
<mbloch@...dia.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Moshe Shemesh <moshe@...dia.com>,
"pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Saeed Mahameed <saeedm@...dia.com>, "leon@...nel.org" <leon@...nel.org>,
"horms@...nel.org" <horms@...nel.org>
Subject: Re: [PATCH net V2 2/4] net/mlx5: Fix deadlock between devlink lock
and esw->wq

On Thu, 2026-01-29 at 15:40 -0800, Jakub Kicinski wrote:
> On Thu, 29 Jan 2026 10:33:40 +0000 Cosmin Ratiu wrote:
> > > This is quite an ugly hack, is there no way to avoid the flush and
> > > let the work discover that what it was supposed to do is no longer
> > > needed?
> >
> > Not possible, unfortunately. I stared at it for quite a while. The wq
> > is flushed because the esw is being unconfigured, which removes data
> > structs the work handler uses. Flushing the work is required, otherwise
> > we'll run into worse issues.
>
> And having a refcount on (I presume) struct mlx5_esw_functions
> so that work can hold a ref is not an option?
> Are you planning to revisit this in -next?
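
The quoted questions propose letting the work hold a reference and
discover on its own that it has nothing left to do, instead of flushing
under the lock. Below is a minimal userspace sketch of that generic
pattern, assuming C11 atomics and a pthread mutex standing in for the
devlink lock. struct esw_ctx, ctx_get(), ctx_put() and the other names
are hypothetical, not mlx5 code. It only shows the lifetime and bail-out
mechanics, not whether they fit the mlx5 eswitch, where the handler
loads/unloads all VFs.

```c
/* Generic sketch of "work holds a reference and bails out if no longer
 * needed". All names are hypothetical; this is not the mlx5 code. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct esw_ctx {
	atomic_int refcnt;
	bool enabled;          /* protected by *lock */
	pthread_mutex_t *lock; /* stands in for the devlink lock */
};

static int ctx_frees; /* counts frees so the lifetime can be checked */

static struct esw_ctx *ctx_get(struct esw_ctx *ctx)
{
	atomic_fetch_add(&ctx->refcnt, 1);
	return ctx;
}

static void ctx_put(struct esw_ctx *ctx)
{
	/* Whoever drops the last reference frees the context. */
	if (atomic_fetch_sub(&ctx->refcnt, 1) == 1) {
		free(ctx);
		ctx_frees++;
	}
}

/* Work handler: owns a reference taken at queue time, so ctx is still
 * valid even after the disable path has returned. */
static bool vfs_changed_work(struct esw_ctx *ctx)
{
	bool did_work = false;

	pthread_mutex_lock(ctx->lock);
	if (ctx->enabled)
		did_work = true; /* placeholder for loading/unloading VFs */
	pthread_mutex_unlock(ctx->lock);
	ctx_put(ctx);
	return did_work;
}

/* Disable path, called with *ctx->lock held: no flush, so it never
 * waits for the work while holding the lock. */
static void disable_locked(struct esw_ctx *ctx)
{
	ctx->enabled = false;
	ctx_put(ctx); /* drop the owner's reference */
}

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* The work runs only after the disable path finished. Returns whether
 * the late work did anything. */
static bool demo_late_work(void)
{
	struct esw_ctx *ctx = calloc(1, sizeof(*ctx));
	struct esw_ctx *work_ref;

	assert(ctx != NULL);
	atomic_init(&ctx->refcnt, 1); /* owner's reference */
	ctx->enabled = true;
	ctx->lock = &demo_lock;

	work_ref = ctx_get(ctx); /* "queue" the work */

	pthread_mutex_lock(&demo_lock);
	disable_locked(ctx);
	pthread_mutex_unlock(&demo_lock);

	return vfs_changed_work(work_ref); /* bails out and frees ctx */
}
```

In demo_late_work() the disable path returns without waiting; the late
work then sees enabled == false, does nothing, and drops the last
reference, so the context is freed exactly once.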

Currently, mlx5_eswitch_disable_locked (with the devlink lock held)
flushes esw->wq and so waits for esw_vfs_changed_event_handler to
finish. The event handler needs to acquire the same lock and
load/unload all VFs, which touches the entire esw. Neither side can
make progress, hence the deadlock.
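
To make the lock cycle concrete, here is a userspace analogy using POSIX
threads; it is a sketch under stated assumptions, not mlx5 code.
devlink_lock stands in for the devlink lock, the worker thread for
esw_vfs_changed_event_handler, and pthread_join() for flushing esw->wq.
The function names are hypothetical. The handler uses
pthread_mutex_timedlock() only so the demo reports the cycle instead of
hanging.

```c
/* Userspace analogy only: not mlx5 code. devlink_lock stands in for the
 * devlink lock, the worker thread for esw_vfs_changed_event_handler and
 * pthread_join() for flushing esw->wq. All names here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t devlink_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the event handler: it must take the lock before touching
 * the eswitch. The timed lock lets the demo report the cycle instead of
 * hanging forever. */
static void *vfs_changed_handler(void *arg)
{
	int *result = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1; /* give up after about one second */

	*result = pthread_mutex_timedlock(&devlink_lock, &deadline);
	if (*result == 0)
		pthread_mutex_unlock(&devlink_lock);
	return NULL;
}

/* Stand-in for the disable path: holds the lock while waiting for the
 * handler to finish. Returns the handler's lock result. */
static int disable_locked_demo(void)
{
	pthread_t worker;
	int handler_result = -1;
	int err;

	pthread_mutex_lock(&devlink_lock);
	err = pthread_create(&worker, NULL, vfs_changed_handler,
			     &handler_result);
	assert(err == 0);
	pthread_join(worker, NULL); /* "flush" while still holding the lock */
	pthread_mutex_unlock(&devlink_lock);
	return handler_result;
}
```

Calling disable_locked_demo() returns ETIMEDOUT: the handler cannot take
the lock that its waiter is holding. With a plain pthread_mutex_lock()
in the handler, both threads would block forever.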
I don't currently see how to use reference counting on the esw to avoid
waiting for the handler.

We can take a deeper look at this as part of an internal task to
improve it. For now, please accept the V3 fix (about to be sent), which
keeps the current approach, because we couldn't find a better way.

Cosmin.