Message-ID: <20250107090641.39d70828@kernel.org>
Date: Tue, 7 Jan 2025 09:06:41 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Antoine Tenart <atenart@...nel.org>
Cc: davem@...emloft.net, pabeni@...hat.com, edumazet@...gle.com,
netdev@...r.kernel.org, gregkh@...uxfoundation.org, mhocko@...e.com,
stephen@...workplumber.org
Subject: Re: [RFC PATCH net-next 1/4] net-sysfs: remove rtnl_trylock from
device attributes
On Tue, 07 Jan 2025 17:30:03 +0100 Antoine Tenart wrote:
> Quoting Jakub Kicinski (2025-01-02 23:36:47)
> > On Wed, 18 Oct 2023 17:47:43 +0200 Antoine Tenart wrote:
> > > We have an ABBA deadlock between net device unregistration and sysfs
> > > files being accessed[1][2]. To prevent this from happening all paths
> > > taking the rtnl lock after the sysfs one (actually kn->active refcount)
> > > use rtnl_trylock and return early (using restart_syscall)[3] which can
> > > make syscalls spin for a long time when there is contention on the
> > > rtnl lock[4].
> >
> > I was looking at the sysfs locking, and ended up going down a very
> > similar path. Luckily lore search for sysfs_break_active_protection()
> > surfaced this thread so I can save myself some duplicated work :)
>
> Seeing that thread in my inbox again is a nice surprise :-)
>
> Did you encounter any specific issue that made you look at the sysfs
> locking?
I started working on broadening the use of the netdev->lock
(per instance lock) to lower the rtnl_lock pressure.
I wanted to make sure I will not end up with the same trylock
hack when it comes to sysfs, so I started digging into the existing
issue...
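
For readers unfamiliar with the pattern under discussion, here is a rough
userspace sketch of the trylock-and-restart approach. It is not the actual
net-sysfs code. It assumes a pthread mutex as a stand-in for rtnl_lock, and
fake_rtnl, fake_mtu, FAKE_RESTART, mtu_show_trylock() and read_mtu() are all
hypothetical names. The kernel backs out via restart_syscall() so the whole
syscall is re-entered; a plain loop plays that role here.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Assumption: a userspace mutex standing in for the global rtnl_lock. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device state protected by fake_rtnl. */
static int fake_mtu = 1500;

/* Hypothetical "please retry" code, loosely mirroring restart_syscall(). */
#define FAKE_RESTART (-1000)

/*
 * Attribute read in the trylock style: never block on the global lock
 * while the attribute is held open, back out and ask the caller to retry.
 */
static int mtu_show_trylock(int *out)
{
    assert(out != NULL);

    if (pthread_mutex_trylock(&fake_rtnl) != 0)
        return FAKE_RESTART;

    *out = fake_mtu;
    pthread_mutex_unlock(&fake_rtnl);
    return 0;
}

/*
 * Caller-side retry loop. Counts how many times the read had to be
 * restarted and gives up with -EBUSY after max_spins attempts.
 */
static int read_mtu(int *out, unsigned long *spins, unsigned long max_spins)
{
    int ret;

    *spins = 0;
    while ((ret = mtu_show_trylock(out)) == FAKE_RESTART) {
        if (++*spins >= max_spins)
            return -EBUSY;
        sched_yield(); /* let the lock holder make progress */
    }
    return ret;
}
```

While another context holds the lock, every pass through the loop is wasted
work, which is the spinning described in the original cover letter.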
> > Is there any particular reason why you haven't pursued this solution
> > further? I think it should work.
>
> I felt there wasn't much interest and feedback at the time and we had
> things in place to ease the initial issue we were working on (~ slow
> boot time w/ lots of netns and containers). With that and given the
> change was a bit tricky I didn't want to be the only one pushing for
> this.
>
> But I still think this could be beneficial for various use cases so if
> you're interested I'll be happy to revive it. I'll have to refresh my
> mind and run some tests again first. (Any additional testing will be
> appreciated too).
TBH my interest is a bit tangential. We keep adding device configuration
APIs to netdev, and they all end up taking rtnl_lock, even though the vast
majority of the time the configuration is completely local to a single
instance. I'm trying to lay enough groundwork for the instance lock
that less experienced developers can use it. Kuniyuki is also
working on making rtnl_lock per netns. I think it's a good time to fix
the sysfs situation.
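
To illustrate the difference between a global lock and a per-instance lock,
here is a minimal userspace sketch. It only shows the general idea and makes
no claim about how netdev->lock is actually used in the kernel. struct
fake_netdev, big_lock, fake_netdev_init(), set_mtu_global() and
set_mtu_instance() are hypothetical names, and pthread mutexes stand in for
the kernel locks.

```c
#include <pthread.h>

/* Hypothetical device with its own lock, a stand-in for netdev->lock. */
struct fake_netdev {
    pthread_mutex_t lock;
    int mtu;
};

/* Assumption: one global mutex standing in for rtnl_lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static void fake_netdev_init(struct fake_netdev *dev, int mtu)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->mtu = mtu;
}

/* Global style: every configuration change serializes on big_lock. */
static void set_mtu_global(struct fake_netdev *dev, int mtu)
{
    pthread_mutex_lock(&big_lock);
    dev->mtu = mtu;
    pthread_mutex_unlock(&big_lock);
}

/* Instance style: only callers touching the same device contend. */
static void set_mtu_instance(struct fake_netdev *dev, int mtu)
{
    pthread_mutex_lock(&dev->lock);
    dev->mtu = mtu;
    pthread_mutex_unlock(&dev->lock);
}
```

With the instance style, a change that is local to one device does not wait
for unrelated work that happens to hold the global lock.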
> > My version, FWIW:
> > https://github.com/kuba-moo/linux/commit/2724bb7275496a254b001fe06fe20ccc5addc9d2
>
> I might take a few of your changes in there, eg. I see you used an
> interruptible lock. With this and the few minor comments this RFC got I
> can prepare a new series.
Perfect.