netdev - Re: another netdev instance lock bug in ipv6_add

Message-ID: <c4b1397ffa83c73dfdab6bcbce51e564592e18c8.camel@nvidia.com>
Date: Thu, 3 Apr 2025 13:24:01 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "stfomichev@...il.com" <stfomichev@...il.com>
CC: "pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"davem@...emloft.net" <davem@...emloft.net>, "sdf@...ichev.me"
	<sdf@...ichev.me>, "kuba@...nel.org" <kuba@...nel.org>
Subject: Re: another netdev instance lock bug in ipv6_add_dev

On Wed, 2025-04-02 at 16:20 -0700, Stanislav Fomichev wrote:
> On 04/02, Cosmin Ratiu wrote:
> > Hi,
> > 
> > Not sure if it's reported already, but I encountered a bug while
> > testing with the new locking scheme.
> > This is the call trace:
> > 
> > [ 3454.975672] WARNING: CPU: 1 PID: 58237 at ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > [ 3455.008776]  ? ipv6_add_dev+0x370/0x620
> > [ 3455.010097]  ipv6_find_idev+0x96/0xe0
> > [ 3455.010725]  addrconf_add_dev+0x1e/0xa0
> > [ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
> > [ 3455.013537]  addrconf_notify+0x35f/0x8d0
> > [ 3455.014214]  notifier_call_chain+0x38/0xf0
> > [ 3455.014903]  netdev_state_change+0x65/0x90
> > [ 3455.015586]  linkwatch_do_dev+0x5a/0x70
> > [ 3455.016238]  rtnl_getlink+0x241/0x3e0
> > [ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0
> > 
> > The call chain is rtnl_getlink -> linkwatch_sync_dev ->
> > linkwatch_do_dev -> netdev_state_change -> ...
> > 
> > Nothing on this path acquires the netdev lock, resulting in a warning.
> > Perhaps rtnl_getlink should acquire it, in addition to the RTNL already
> > held by rtnetlink_rcv_msg?
> > 
> > The same thing can be seen from the regular linkwatch wq:
> > 
> > [ 3456.637014] WARNING: CPU: 16 PID: 83257 at ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > [ 3456.655305] Call Trace:
> > [ 3456.655610]  <TASK>
> > [ 3456.655890]  ? __warn+0x89/0x1b0
> > [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
> > [ 3456.660039]  ipv6_find_idev+0x96/0xe0
> > [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
> > [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
> > [ 3456.661803]  addrconf_notify+0x35f/0x8d0
> > [ 3456.662236]  notifier_call_chain+0x38/0xf0
> > [ 3456.662676]  netdev_state_change+0x65/0x90
> > [ 3456.663112]  linkwatch_do_dev+0x5a/0x70
> > [ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
> > [ 3456.663990]  linkwatch_event+0x21/0x30
> > [ 3456.664399]  process_one_work+0x211/0x610
> > [ 3456.664828]  worker_thread+0x1cc/0x380
> > [ 3456.665691]  kthread+0xf4/0x210
> > 
> > In this case, __linkwatch_run_queue seems like a good place to grab a
> > device lock before calling linkwatch_do_dev.
> 
> Thanks for the report! What about linkwatch_sync_dev in netdev_run_todo
> and carrier_show? Those probably also need to be wrapped?

Done. Here's the patch I'm testing with; it passes all the tests I could
get my hands on. Will you officially propose it (maybe in a slightly
different form), please?

-------------------------

linkwatch can end up calling the IPv6 addrconf notifier, which might in
turn call ipv6_add_dev(), which requires the netdev instance lock to be
held.

Make sure the netdev instance lock is held on all linkwatch call paths.

Signed-off-by: Cosmin Ratiu <cratiu@...dia.com>
Change-Id: Ief821cf069408cecc82adaa01cafa0462c51908a
---
 net/core/dev.c        | 2 +-
 net/core/link_watch.c | 2 ++
 net/core/net-sysfs.c  | 2 ++
 net/core/rtnetlink.c  | 2 ++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 87cba93fa59f..1b9ee2828076 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11343,8 +11343,8 @@ void netdev_run_todo(void)
 
 		netdev_lock(dev);
 		WRITE_ONCE(dev->reg_state, NETREG_UNREGISTERED);
-		netdev_unlock(dev);
 		linkwatch_sync_dev(dev);
+		netdev_unlock(dev);
 	}
 
 	cnt = 0;
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2b9807..002f18b11d85 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -240,7 +240,9 @@ static void __linkwatch_run_queue(int urgent_only)
 		 */
 		netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
 		spin_unlock_irq(&lweventlist_lock);
+		netdev_lock_ops(dev);
 		linkwatch_do_dev(dev);
+		netdev_unlock_ops(dev);
 		do_dev--;
 		spin_lock_irq(&lweventlist_lock);
 	}
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1ace0cd01adc..92cffb233306 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -325,7 +325,9 @@ static ssize_t carrier_show(struct device *dev,
 		/* Synchronize carrier state with link watch,
 		 * see also rtnl_getlink().
 		 */
+		netdev_lock_ops(netdev);
 		linkwatch_sync_dev(netdev);
+		netdev_unlock_ops(netdev);
 
 		ret = sysfs_emit(buf, fmt_dec, !!netif_carrier_ok(netdev));
 	}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e4c93f87f5d4..2cb28a3d0d20 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4175,7 +4175,9 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	 * only TX if link watch work has run, but without this we'd
 	 * already report carrier on, even if it doesn't work yet.
 	 */
+	netdev_lock_ops(dev);
 	linkwatch_sync_dev(dev);
+	netdev_unlock_ops(dev);
 
 	err = rtnl_fill_ifinfo(nskb, dev, net,
 			       RTM_NEWLINK, NETLINK_CB(skb).portid,
-- 
2.45.0
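
For readers who want to see the pattern in isolation, here is a minimal
user-space sketch of what the patch does: a callee checks that a
per-device lock is held, and each caller wraps the call in lock/unlock.
This is not kernel code. Every model_* name below is hypothetical, a
pthread mutex stands in for the netdev instance lock, and the lock_held
flag is only a rough analogue of the kernel's lock assertion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a net_device with an instance lock. */
struct model_dev {
	pthread_mutex_t lock;
	bool lock_held;    /* lets the callee check that a caller locked */
	int warnings;      /* number of times the check fired */
	int notifications; /* number of times the notifier ran */
};

void model_dev_init(struct model_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->lock_held = false;
	dev->warnings = 0;
	dev->notifications = 0;
}

void model_lock(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->lock_held = true;
}

void model_unlock(struct model_dev *dev)
{
	assert(dev->lock_held); /* unlocking without holding is a bug */
	dev->lock_held = false;
	pthread_mutex_unlock(&dev->lock);
}

/* Plays the role of the callee that requires the lock
 * (ipv6_add_dev in the traces above). It warns but still proceeds.
 */
void model_notifier(struct model_dev *dev)
{
	if (!dev->lock_held) {
		dev->warnings++;
		fprintf(stderr, "WARNING: instance lock not held\n");
	}
	dev->notifications++;
}

/* Plays the role of linkwatch_do_dev: runs the notifier and takes
 * no lock of its own.
 */
void model_do_dev(struct model_dev *dev)
{
	model_notifier(dev);
}

/* Caller before the fix: reaches the notifier without the lock. */
void model_sync_unlocked(struct model_dev *dev)
{
	model_do_dev(dev);
}

/* Caller after the fix: holds the lock across the call. */
void model_sync_locked(struct model_dev *dev)
{
	model_lock(dev);
	model_do_dev(dev);
	model_unlock(dev);
}
```

Calling model_sync_unlocked() on an initialized device bumps the warning
counter once, while model_sync_locked() leaves it unchanged. The sketch
keeps the locking in the callers rather than in model_do_dev() so that a
caller which already holds the lock (as netdev_run_todo does in the
patch) can extend its critical section instead of locking twice.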