Message-ID: <20230119104427.69d95782@kernel.org>
Date: Thu, 19 Jan 2023 10:44:27 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: "Lucero Palau, Alejandro" <alejandro.lucero-palau@....com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-net-drivers (AMD-Xilinx)" <linux-net-drivers@....com>,
"davem@...emloft.net" <davem@...emloft.net>,
"pabeni@...hat.com" <pabeni@...hat.com>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"ecree.xilinx@...il.com" <ecree.xilinx@...il.com>
Subject: Re: [PATCH net-next 1/7] sfc: add devlink support for ef100
On Thu, 19 Jan 2023 17:52:42 +0000 Lucero Palau, Alejandro wrote:
> On 1/19/23 17:16, Jakub Kicinski wrote:
> > On Thu, 19 Jan 2023 11:31:34 +0000 alejandro.lucero-palau@....com wrote:
> >> + devlink_unregister(efx->devlink);
> >> + devlink_free(efx->devlink);
> > Please use the devl_ APIs and take the devl_lock() explicitly.
> > Once you start adding sub-objects the API with implicit locking
> > gets racy.
>
> I need more help here.
>
> The explicit locking you refer to, is it for this specific code only?
I only had a quick look at the series, but I saw you add ports.
So the locking should be something like:
	devlink = devlink_alloc();
	devl_lock(devlink);
	...
	devl_register(devlink);
	...
	register_netdev(netdev);
	devl_port_register(port_for_the_netdev);
	...
	devl_unlock(devlink);
And the inverse on the .remove path.
Basically you want to hold the devlink instance lock for most of
the .probe and .remove. That way nothing can bother the devlink
instance and the driver while the driver is initializing/finalizing.
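To make the ordering concrete, here is a small userspace model of the
probe sequence and of one literal reading of "the inverse" for .remove.
It is only a sketch: every model_* name is a hypothetical stand-in
invented for illustration, not a kernel function, and the lock is a
plain flag rather than a real mutex. The asserts model the rule that
the sub-object calls expect the instance lock to be held.

```c
/* Userspace model only; the model_* names are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_devlink {
	bool locked;            /* stands in for the devlink instance lock */
	bool registered;        /* devlink instance registered */
	bool netdev_registered; /* netdev registered */
	bool port_registered;   /* devlink port registered */
};

/* Space-separated record of the steps, in the order they ran. */
char model_log[256];

static void model_note(const char *step)
{
	if (model_log[0] != '\0')
		strncat(model_log, " ", sizeof(model_log) - strlen(model_log) - 1);
	strncat(model_log, step, sizeof(model_log) - strlen(model_log) - 1);
}

static void model_lock(struct model_devlink *dl)
{
	assert(!dl->locked);
	dl->locked = true;
	model_note("lock");
}

static void model_unlock(struct model_devlink *dl)
{
	assert(dl->locked);
	dl->locked = false;
	model_note("unlock");
}

/* Change one piece of state; only legal while the instance lock is held. */
static void model_set(struct model_devlink *dl, bool *flag, bool val,
		      const char *step)
{
	assert(dl->locked);
	assert(*flag != val);
	*flag = val;
	model_note(step);
}

/* Probe path, in the order given in the pseudocode above. */
void model_probe(struct model_devlink *dl)
{
	model_lock(dl);
	model_set(dl, &dl->registered, true, "register");
	model_set(dl, &dl->netdev_registered, true, "netdev_register");
	model_set(dl, &dl->port_registered, true, "port_register");
	model_unlock(dl);
}

/* Remove path: the same steps undone in reverse, under the same lock. */
void model_remove(struct model_devlink *dl)
{
	model_lock(dl);
	model_set(dl, &dl->port_registered, false, "port_unregister");
	model_set(dl, &dl->netdev_registered, false, "netdev_unregister");
	model_set(dl, &dl->registered, false, "unregister");
	model_unlock(dl);
}
```

Calling model_probe() and then model_remove() on a zeroed struct leaves
model_log holding the two sequences back to back, each bracketed by a
lock/unlock pair. In this model, freeing the instance would come after
the final unlock.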
Without holding the lock the linking between the devlink port and
the netdev gets a bit iffy. It's a circular dependency of sorts
because both the netdev carries a link to the port and the port
carries info about the netdev.
We've been figuring out workarounds for subtle ordering and locking
problems since devlink ports were created. Recently we just gave up
and started asking drivers to hold the instance lock across
.probe/.remove.
> Also, I can not see all drivers locking/unlocking when doing
> devlink_unregister. Those doing it are calling code which unregisters
> devlink ports, like the NFP and I think mlx5 as well.
Right, only netdevsim was fully converted so far. The syzbot and other
testers use netdevsim mostly. We'll push actual HW drivers towards this
locking slowly.
> In this case, no devlink port remains at this point, and no netdev either.
>
> What is the potential race against?
Right, I don't mean this particular spot, just over-trimmed the quote.