[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <173703522332.6390.7759526922746662664@kwain>
Date: Thu, 16 Jan 2025 14:47:03 +0100
From: Antoine Tenart <atenart@...nel.org>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>
Cc: ecree.xilinx@...il.com, netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com
Subject: Re: [PATCH net v2] net: avoid race between device unregistration and ethnl ops
Quoting Przemek Kitszel (2025-01-16 10:44:40)
> On 1/16/25 10:21, Antoine Tenart wrote:
> > The following trace can be seen if a device is being unregistered while
> > its number of channels are being modified.
> >
> > DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> > WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
> > CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
> > RIP: 0010:__mutex_lock+0xc8a/0x1120
> > Call Trace:
> > <TASK>
> > ethtool_check_max_channel+0x1ea/0x880
> > ethnl_set_channels+0x3c3/0xb10
> > ethnl_default_set_doit+0x306/0x650
> > genl_family_rcv_msg_doit+0x1e3/0x2c0
> > genl_rcv_msg+0x432/0x6f0
> > netlink_rcv_skb+0x13d/0x3b0
> > genl_rcv+0x28/0x40
> > netlink_unicast+0x42e/0x720
> > netlink_sendmsg+0x765/0xc20
> > __sys_sendto+0x3ac/0x420
> > __x64_sys_sendto+0xe0/0x1c0
> > do_syscall_64+0x95/0x180
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > This is because unregister_netdevice_many_notify might run before the
> > rtnl lock section of ethnl operations, eg. set_channels in the above
> > example. In this example the rss lock would be destroyed by the device
> > unregistration path before being used again, but in general running
> > ethnl operations while dismantle has started is not a good idea.
> >
> > Fix this by denying any operation on devices being unregistered. A check
> > was already there in ethnl_ops_begin, but not wide enough.
> >
> > Note that the same issue cannot be seen on the ioctl version
> > (__dev_ethtool) because the device reference is retrieved from within
> > the rtnl lock section there. Once dismantle started, the net device is
> > unlisted and no reference will be found.
> >
> > Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
> > Signed-off-by: Antoine Tenart <atenart@...nel.org>
> > ---
>
> for future submissions, please add a changelog and a link to previous
> revisions
This one was a bit special as v2 is completely different from v1, not
much to describe. But sure, at least a link could help.
> > net/ethtool/netlink.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
> > index e3f0ef6b851b..4d18dc29b304 100644
> > --- a/net/ethtool/netlink.c
> > +++ b/net/ethtool/netlink.c
> > @@ -90,7 +90,7 @@ int ethnl_ops_begin(struct net_device *dev)
> > pm_runtime_get_sync(dev->dev.parent);
> >
> > if (!netif_device_present(dev) ||
> > - dev->reg_state == NETREG_UNREGISTERING) {
> > + dev->reg_state >= NETREG_UNREGISTERING) {
>
> looks good, but I would add a comment above enum netdev_reg_state
> definition, to avoid any new state added "at the end"
>
> what about NETREG_DUMMY? you want to cover it here too?
I'm not super familiar with NETREG_DUMMY but my understanding is those
devices aren't listed and aren't accessible through ethnl.
Having said that I do agree the checks on reg_state could be
consolidated, eg. reusing and improving dev_isalive(). I actually
planned to have a look at if this would make sense later on.
tl;dr; I don't think there's an issue in practice but we could probably
consolidate the code to make things easier to maintain and to read.
Thanks,
Antoine
Powered by blists - more mailing lists