[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210929063126.4a702dbd@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Wed, 29 Sep 2021 06:31:26 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Antoine Tenart <atenart@...nel.org>
Cc: davem@...emloft.net, pabeni@...hat.com, gregkh@...uxfoundation.org,
ebiederm@...ssion.com, stephen@...workplumber.org,
herbert@...dor.apana.org.au, juri.lelli@...hat.com,
netdev@...r.kernel.org
Subject: Re: [RFC PATCH net-next 8/9] net: delay device_del until run_todo
On Wed, 29 Sep 2021 10:26:35 +0200 Antoine Tenart wrote:
> Quoting Jakub Kicinski (2021-09-29 02:02:29)
> > On Tue, 28 Sep 2021 14:54:59 +0200 Antoine Tenart wrote:
> > > The sysfs removal is done in device_del, and moving it outside of the
> > > rtnl lock does fix the initial deadlock. With that the trylock/restart
> > > logic can be removed in a following-up patch.
> >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index a1eab120bb50..d774fbec5d63 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -10593,6 +10593,8 @@ void netdev_run_todo(void)
> > > continue;
> > > }
> > >
> > > + device_del(&dev->dev);
> > > +
> > > dev->reg_state = NETREG_UNREGISTERED;
> > >
> > > netdev_wait_allrefs(dev);
> > > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> > > index 21c3fdeccf20..e754f00c117b 100644
> > > --- a/net/core/net-sysfs.c
> > > +++ b/net/core/net-sysfs.c
> > > @@ -1955,8 +1955,6 @@ void netdev_unregister_kobject(struct net_device *ndev)
> > > remove_queue_kobjects(ndev);
> > >
> > > pm_runtime_set_memalloc_noio(dev, false);
> > > -
> > > - device_del(dev);
> > > }
> > >
> > > /* Create sysfs entries for network device. */
> >
> > Doesn't this mean there may be sysfs files which are accessible
> > for an unregistered netdevice?
>
> It would mean having accessible sysfs files for a device in the
> NETREG_UNREGISTERING state; NETREG_UNREGISTERED still comes after
> device_del. It's a small difference but still important, I think.
>
> You raise a good point. Yes, that would mean accessing attributes of net
> devices being unregistered, meaning accessing or modifying unused or
> obsolete parameters and data (it shouldn't be garbage data though).
> Unlisting those sysfs files without removing them would be better here,
> to not expose files when the device is being unregistered while still
> allowing pending operations to complete. I don't know if that is doable
> in sysfs.
I wonder. Do we somehow remove the queue objects without waiting or are
those also waited on when we remove the device? 'Cause XPS is the part
that jumps out to me - we reset XPS after netdev_unregister_kobject().
Does it mean user can re-instate XPS settings after we thought we
already reset them?
> (While I did ran stress tests reading/writing attributes while
> unregistering devices, I think I missed an issue with the
> netdev_queue_default attributes; which hopefully can be fixed — if the
> whole idea is deemed acceptable).
Well, it's a little wobbly but I think the direction is sane.
It wouldn't feel super clean to add
if (dev->state != NETREG_REGISTERED)
goto out;
to the sysfs handlers but maybe it's better than leaving potential
traps for people who are not aware of the intricacies later? Not sure.
> > Isn't the point of having device_del() under rtnl_lock() to make sure
> > we sysfs handlers can't run on dead devices?
>
> Hard to say what was the initial point, there is a lot of history here
> :) I'm not sure it was done because of a particular reason; IMHO it just
> made sense to make this simple without having a good reason not to do
> so. And it helped with the naming collision detection.
FWIW the other two pieces of feedback I have is try to avoid the
synchronize_net() in patch 7 and add a new helper for the name
checking, which would return bool. The callers don't have any
business getting the struct.
Powered by blists - more mailing lists