lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEt6LvBMwUMxmUyx@mini-arch>
Date: Thu, 12 Jun 2025 18:09:02 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: syzbot <syzbot+b8c48ea38ca27d150063@...kaller.appspotmail.com>
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
	kuba@...nel.org, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, pabeni@...hat.com,
	syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] WARNING in __linkwatch_sync_dev (2)

On 06/11, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    f09079bd04a9 Merge tag 'powerpc-6.16-2' of git://git.kerne..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16e9260c580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e24211089078d6c6
> dashboard link: https://syzkaller.appspot.com/bug?extid=b8c48ea38ca27d150063
> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f09079bd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/ef68cb3d29a3/vmlinux-f09079bd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/1cc9431b9a15/bzImage-f09079bd.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+b8c48ea38ca27d150063@...kaller.appspotmail.com
> 
> ------------[ cut here ]------------
> RTNL: assertion failed at ./include/net/netdev_lock.h (72)
> WARNING: CPU: -1 PID: 1141 at ./include/net/netdev_lock.h:72 netdev_ops_assert_locked include/net/netdev_lock.h:72 [inline]
> WARNING: CPU: 0 PID: 1141 at ./include/net/netdev_lock.h:72 __linkwatch_sync_dev+0x1ed/0x230 net/core/link_watch.c:279
>  ethtool_op_get_link+0x1d/0x70 net/ethtool/ioctl.c:63
>  bond_check_dev_link+0x3f9/0x710 drivers/net/bonding/bond_main.c:863
>  bond_miimon_inspect drivers/net/bonding/bond_main.c:2745 [inline]
>  bond_mii_monitor+0x3c0/0x2dc0 drivers/net/bonding/bond_main.c:2967
>  process_one_work+0x9cf/0x1b70 kernel/workqueue.c:3238
>  process_scheduled_works kernel/workqueue.c:3321 [inline]
>  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3402
>  kthread+0x3c5/0x780 kernel/kthread.c:464
>  ret_from_fork+0x5d4/0x6f0 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>

netdev_ops_assert_locked is called for non-ops-locked netdev and we
trigger ASSERT_RTNL case. Which is a bit misleading, but I noticed that
bond_miimon_inspect is running under rcu lock, which is not
gonna work for ops-locked devices :-/ (we want to grab instance
lock for the CHANGE notifiers).

I'm contemplating dropping rcu and doing try_lock rtnl. Looking at
commit f0c76d61779b ("bonding: refactor mii monitor"), it doesn't look
like there were issues with rtnl performance, so hopefully should be ok.

Because from my resent patches I remember this trace:

    [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
    [ 3456.660039]  ipv6_find_idev+0x96/0xe0
    [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
    [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
    [ 3456.661803]  addrconf_notify+0x35f/0x8d0
    [ 3456.662236]  notifier_call_chain+0x38/0xf0
    [ 3456.662676]  netdev_state_change+0x65/0x90
    [ 3456.663112]  linkwatch_do_dev+0x5a/0x70

Where linkwatch_do_dev (potentially called from ethtool_op_get_link and
bond_check_dev_link) might trigger ipv6 address assignment so I'm not
sure how this all supposed to work under rcu and without rtnl lock.

Tentatively (untested uncompiled):

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c4d53e8e7c15..e2c4bcdb8b1a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2739,7 +2739,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 			ignore_updelay = true;
 	}
 
-	bond_for_each_slave_rcu(bond, slave, iter) {
+	bond_for_each_slave(bond, slave, iter) {
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
@@ -2962,35 +2962,28 @@ static void bond_mii_monitor(struct work_struct *work)
 	if (!bond_has_slaves(bond))
 		goto re_arm;
 
-	rcu_read_lock();
+	/* Race avoidance with bond_close cancel of workqueue */
+	if (!rtnl_trylock()) {
+		delay = 1;
+		should_notify_peers = false;
+		goto re_arm;
+	}
+
 	should_notify_peers = bond_should_notify_peers(bond);
 	commit = !!bond_miimon_inspect(bond);
 	if (bond->send_peer_notif) {
-		rcu_read_unlock();
-		if (rtnl_trylock()) {
-			bond->send_peer_notif--;
-			rtnl_unlock();
-		}
-	} else {
-		rcu_read_unlock();
+		bond->send_peer_notif--;
 	}
 
 	if (commit) {
-		/* Race avoidance with bond_close cancel of workqueue */
-		if (!rtnl_trylock()) {
-			delay = 1;
-			should_notify_peers = false;
-			goto re_arm;
-		}
-
 		bond_for_each_slave(bond, slave, iter) {
 			bond_commit_link_state(slave, BOND_SLAVE_NOTIFY_LATER);
 		}
 		bond_miimon_commit(bond);
-
-		rtnl_unlock();	/* might sleep, hold no other locks */
 	}
 
+	rtnl_unlock();	/* might sleep, hold no other locks */
+
 re_arm:
 	if (bond->params.miimon)
 		queue_delayed_work(bond->wq, &bond->mii_work, delay);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ