Message-ID: <c4b1397ffa83c73dfdab6bcbce51e564592e18c8.camel@nvidia.com>
Date: Thu, 3 Apr 2025 13:24:01 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "stfomichev@...il.com" <stfomichev@...il.com>
CC: "pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>, "sdf@...ichev.me"
<sdf@...ichev.me>, "kuba@...nel.org" <kuba@...nel.org>
Subject: Re: another netdev instance lock bug in ipv6_add_dev
On Wed, 2025-04-02 at 16:20 -0700, Stanislav Fomichev wrote:
> On 04/02, Cosmin Ratiu wrote:
> > Hi,
> >
> > Not sure if it's reported already, but I encountered a bug while
> > testing with the new locking scheme.
> > This is the call trace:
> >
> > [ 3454.975672] WARNING: CPU: 1 PID: 58237 at ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > [ 3455.008776] ? ipv6_add_dev+0x370/0x620
> > [ 3455.010097] ipv6_find_idev+0x96/0xe0
> > [ 3455.010725] addrconf_add_dev+0x1e/0xa0
> > [ 3455.011382] addrconf_init_auto_addrs+0xb0/0x720
> > [ 3455.013537] addrconf_notify+0x35f/0x8d0
> > [ 3455.014214] notifier_call_chain+0x38/0xf0
> > [ 3455.014903] netdev_state_change+0x65/0x90
> > [ 3455.015586] linkwatch_do_dev+0x5a/0x70
> > [ 3455.016238] rtnl_getlink+0x241/0x3e0
> > [ 3455.019046] rtnetlink_rcv_msg+0x177/0x5e0
> >
> > The call chain is rtnl_getlink -> linkwatch_sync_dev ->
> > linkwatch_do_dev -> netdev_state_change -> ...
> >
> > Nothing on this path acquires the netdev lock, resulting in a
> > warning.
> > Perhaps rtnl_getlink should acquire it, in addition to the RTNL
> > already held by rtnetlink_rcv_msg?
> >
> > The same thing can be seen from the regular linkwatch wq:
> >
> > [ 3456.637014] WARNING: CPU: 16 PID: 83257 at ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > [ 3456.655305] Call Trace:
> > [ 3456.655610] <TASK>
> > [ 3456.655890] ? __warn+0x89/0x1b0
> > [ 3456.656261] ? ipv6_add_dev+0x370/0x620
> > [ 3456.660039] ipv6_find_idev+0x96/0xe0
> > [ 3456.660445] addrconf_add_dev+0x1e/0xa0
> > [ 3456.660861] addrconf_init_auto_addrs+0xb0/0x720
> > [ 3456.661803] addrconf_notify+0x35f/0x8d0
> > [ 3456.662236] notifier_call_chain+0x38/0xf0
> > [ 3456.662676] netdev_state_change+0x65/0x90
> > [ 3456.663112] linkwatch_do_dev+0x5a/0x70
> > [ 3456.663529] __linkwatch_run_queue+0xeb/0x200
> > [ 3456.663990] linkwatch_event+0x21/0x30
> > [ 3456.664399] process_one_work+0x211/0x610
> > [ 3456.664828] worker_thread+0x1cc/0x380
> > [ 3456.665691] kthread+0xf4/0x210
> >
> > In this case, __linkwatch_run_queue seems like a good place to grab
> > a device lock before calling linkwatch_do_dev.
>
> Thanks for the report! What about linkwatch_sync_dev in
> netdev_run_todo and carrier_show? Those probably also need to be
> wrapped?
Done. Here's the patch I'm testing with; it passes all the tests I
could get my hands on. Would you officially propose it (maybe in a
slightly different form), please?
-------------------------
linkwatch can end up calling the IPv6 addrconf notifier, which might
end up calling ipv6_add_dev, which requires holding the netdev lock.
This patch makes sure that the netdev instance lock is held on all call
paths.
Signed-off-by: Cosmin Ratiu <cratiu@...dia.com>
Change-Id: Ief821cf069408cecc82adaa01cafa0462c51908a
---
net/core/dev.c | 2 +-
net/core/link_watch.c | 2 ++
net/core/net-sysfs.c | 2 ++
net/core/rtnetlink.c | 2 ++
4 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 87cba93fa59f..1b9ee2828076 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11343,8 +11343,8 @@ void netdev_run_todo(void)
netdev_lock(dev);
WRITE_ONCE(dev->reg_state, NETREG_UNREGISTERED);
- netdev_unlock(dev);
linkwatch_sync_dev(dev);
+ netdev_unlock(dev);
}
cnt = 0;
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2b9807..002f18b11d85 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -240,7 +240,9 @@ static void __linkwatch_run_queue(int urgent_only)
*/
netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
spin_unlock_irq(&lweventlist_lock);
+ netdev_lock_ops(dev);
linkwatch_do_dev(dev);
+ netdev_unlock_ops(dev);
do_dev--;
spin_lock_irq(&lweventlist_lock);
}
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1ace0cd01adc..92cffb233306 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -325,7 +325,9 @@ static ssize_t carrier_show(struct device *dev,
/* Synchronize carrier state with link watch,
* see also rtnl_getlink().
*/
+ netdev_lock_ops(netdev);
linkwatch_sync_dev(netdev);
+ netdev_unlock_ops(netdev);
ret = sysfs_emit(buf, fmt_dec,
!!netif_carrier_ok(netdev));
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e4c93f87f5d4..2cb28a3d0d20 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4175,7 +4175,9 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
* only TX if link watch work has run, but without this we'd
* already report carrier on, even if it doesn't work yet.
*/
+ netdev_lock_ops(dev);
linkwatch_sync_dev(dev);
+ netdev_unlock_ops(dev);
err = rtnl_fill_ifinfo(nskb, dev, net,
RTM_NEWLINK, NETLINK_CB(skb).portid,
--
2.45.0
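
For readers who want to see the shape of the fix outside the kernel, here is a rough user-space sketch of the pattern the patch applies: a callee that complains when the per-device lock is not held, and a caller that brackets the call with lock/unlock. This is not kernel code. Every name in it (struct fake_netdev, fake_add_dev, fake_linkwatch_do_dev, run_locked, run_unlocked) is hypothetical, the "locked" flag is a simplification that is only meaningful in a single-threaded demo, and a pthread mutex merely stands in for the netdev instance lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for struct net_device. */
struct fake_netdev {
	pthread_mutex_t lock;
	bool locked;	/* true while the lock is held (single-threaded demo only) */
	int warnings;	/* how many times the lock check fired */
};

static void fake_netdev_init(struct fake_netdev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->locked = false;
	dev->warnings = 0;
}

static void fake_netdev_lock(struct fake_netdev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->locked = true;
}

static void fake_netdev_unlock(struct fake_netdev *dev)
{
	assert(dev->locked);
	dev->locked = false;
	pthread_mutex_unlock(&dev->lock);
}

/* Plays the role of the check that produced the WARNING in ipv6_add_dev. */
static void fake_add_dev(struct fake_netdev *dev)
{
	if (!dev->locked) {
		dev->warnings++;
		fprintf(stderr, "WARNING: instance lock not held\n");
	}
}

/* Plays the role of linkwatch_do_dev -> notifier chain -> ipv6_add_dev. */
static void fake_linkwatch_do_dev(struct fake_netdev *dev)
{
	fake_add_dev(dev);
}

/* Caller as it was before the patch: nothing on the path takes the lock. */
static int run_unlocked(struct fake_netdev *dev)
{
	fake_linkwatch_do_dev(dev);
	return dev->warnings;
}

/* Caller as it is after the patch: the lock brackets the call. */
static int run_locked(struct fake_netdev *dev)
{
	fake_netdev_lock(dev);
	fake_linkwatch_do_dev(dev);
	fake_netdev_unlock(dev);
	return dev->warnings;
}
```

Calling run_unlocked() on a freshly initialized device bumps the warning counter, while run_locked() leaves it unchanged; that is the whole effect the patch is after at each of the four call sites.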