Message-ID: <aac073de8beec3e531c86c101b274d434741c28e.camel@nvidia.com>
Date: Wed, 2 Apr 2025 21:41:54 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "sdf@...ichev.me"
<sdf@...ichev.me>
CC: "edumazet@...gle.com" <edumazet@...gle.com>, "davem@...emloft.net"
<davem@...emloft.net>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>
Subject: another netdev instance lock bug in ipv6_add_dev
Hi,

Not sure if this has been reported already, but I hit a bug while
testing with the new netdev instance locking scheme.
This is the call trace:
[ 3454.975672] WARNING: CPU: 1 PID: 58237 at ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
[ 3455.008776] ? ipv6_add_dev+0x370/0x620
[ 3455.010097] ipv6_find_idev+0x96/0xe0
[ 3455.010725] addrconf_add_dev+0x1e/0xa0
[ 3455.011382] addrconf_init_auto_addrs+0xb0/0x720
[ 3455.013537] addrconf_notify+0x35f/0x8d0
[ 3455.014214] notifier_call_chain+0x38/0xf0
[ 3455.014903] netdev_state_change+0x65/0x90
[ 3455.015586] linkwatch_do_dev+0x5a/0x70
[ 3455.016238] rtnl_getlink+0x241/0x3e0
[ 3455.019046] rtnetlink_rcv_msg+0x177/0x5e0
The call chain is rtnl_getlink -> linkwatch_sync_dev ->
linkwatch_do_dev -> netdev_state_change -> ...
Nothing on this path acquires the netdev lock, resulting in a warning.
Perhaps rtnl_getlink should acquire it, in addition to the RTNL already
held by rtnetlink_rcv_msg?
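
For context, the warning appears to come from the ops lock assertion
that ipv6_add_dev() inlines from include/net/netdev_lock.h. A rough
sketch of that check, from my reading of the header (helper names and
the exact line number may differ in your tree):

/* Sketch of the assertion I believe is firing; based on my reading of
 * include/net/netdev_lock.h, exact names/lines may differ.
 */
static inline void netdev_ops_assert_locked(struct net_device *dev)
{
	/* Only devices that opted into instance locking for their ops
	 * are checked; lockdep_assert_held() emits the WARNING seen
	 * above when dev->lock is not held by the caller.
	 */
	if (netdev_need_ops_lock(dev))
		lockdep_assert_held(&dev->lock);
}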
The same thing can be seen from the regular linkwatch wq:
[ 3456.637014] WARNING: CPU: 16 PID: 83257 at ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
[ 3456.655305] Call Trace:
[ 3456.655610] <TASK>
[ 3456.655890] ? __warn+0x89/0x1b0
[ 3456.656261] ? ipv6_add_dev+0x370/0x620
[ 3456.660039] ipv6_find_idev+0x96/0xe0
[ 3456.660445] addrconf_add_dev+0x1e/0xa0
[ 3456.660861] addrconf_init_auto_addrs+0xb0/0x720
[ 3456.661803] addrconf_notify+0x35f/0x8d0
[ 3456.662236] notifier_call_chain+0x38/0xf0
[ 3456.662676] netdev_state_change+0x65/0x90
[ 3456.663112] linkwatch_do_dev+0x5a/0x70
[ 3456.663529] __linkwatch_run_queue+0xeb/0x200
[ 3456.663990] linkwatch_event+0x21/0x30
[ 3456.664399] process_one_work+0x211/0x610
[ 3456.664828] worker_thread+0x1cc/0x380
[ 3456.665691] kthread+0xf4/0x210
In this case, __linkwatch_run_queue seems like a good place to grab
the device instance lock before calling linkwatch_do_dev.
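
Note that netdev_lock_ops() only takes dev->lock for devices that
actually opted into ops locking, so this shouldn't change anything for
the rest. Roughly (again per my reading of include/net/netdev_lock.h,
modulo drift):

/* Sketch of the conditional ops-lock helpers; details may differ. */
static inline bool netdev_need_ops_lock(struct net_device *dev)
{
	/* True only for devices that requested the instance lock for
	 * their ops, e.g. those with queue management ops.
	 */
	return dev->request_ops_lock || !!dev->queue_mgmt_ops;
}

static inline void netdev_lock_ops(struct net_device *dev)
{
	if (netdev_need_ops_lock(dev))
		netdev_lock(dev);	/* mutex_lock(&dev->lock) */
}

static inline void netdev_unlock_ops(struct net_device *dev)
{
	if (netdev_need_ops_lock(dev))
		netdev_unlock(dev);	/* mutex_unlock(&dev->lock) */
}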
The proposed patch is below. I'll let you reason through the
implications of calling NETDEV_CHANGE notifiers from linkwatch with
the instance lock held; you have thought about this much longer than I
have.
---
net/core/link_watch.c | 2 ++
net/core/rtnetlink.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2b9807..002f18b11d85 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -240,7 +240,9 @@ static void __linkwatch_run_queue(int urgent_only)
 		 */
 		netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
 		spin_unlock_irq(&lweventlist_lock);
+		netdev_lock_ops(dev);
 		linkwatch_do_dev(dev);
+		netdev_unlock_ops(dev);
 		do_dev--;
 		spin_lock_irq(&lweventlist_lock);
 	}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a2736e434712..c77b37d897eb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4175,7 +4175,9 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	 * only TX if link watch work has run, but without this we'd
 	 * already report carrier on, even if it doesn't work yet.
 	 */
+	netdev_lock_ops(dev);
 	linkwatch_sync_dev(dev);
+	netdev_unlock_ops(dev);
 
 	err = rtnl_fill_ifinfo(nskb, dev, net,
 			       RTM_NEWLINK, NETLINK_CB(skb).portid,
--
2.45.0