netdev - Re: Sleeping in atomic context with VLAN and netdev instance lock drivers


Message-ID: <aHZ54sAfzIe0rmCd@mini-arch>
Date: Tue, 15 Jul 2025 08:55:14 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Cosmin Ratiu <cratiu@...dia.com>
Cc: "sdf@...ichev.me" <sdf@...ichev.me>,
	"kuba@...nel.org" <kuba@...nel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Sleeping in atomic context with VLAN and netdev instance lock
 drivers

On 07/15, Cosmin Ratiu wrote:
> Hi Stanislav,
> 
> There's a bug that was uncovered recently in a kernel with
> DEBUG_ATOMIC_SLEEP related to the new netdev instance locking.
> 
> I looked a bit into it and I am not sure how to solve it, I'd like your
> help. On a netdevice with instance locking enabled which supports
> macsec (e.g. mlx5) and a kernel with:
> CONFIG_MACSEC=y
> CONFIG_MLX5_MACSEC=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> 
> Run these:
> 
> IF=eth1
> ip link del macsec0
> ip link add link $IF macsec0 type macsec sci 3154 cipher gcm-aes-256 encrypt on encodingsa 0
> ip link set dev macsec0 up
> ip link add link macsec0 name macsec_vlan type vlan id 1
> ip link set dev macsec_vlan address 00:11:22:33:44:88
> ip link set dev macsec_vlan up
> 
> And you get this splat:
> # BUG: sleeping function called from invalid context at kernel/locking/mutex.c:275
> #   dump_stack_lvl+0x4f/0x60
> #   __might_resched+0xeb/0x140
> #   mutex_lock+0x1a/0x40
> #   dev_set_promiscuity+0x26/0x90
> #   __dev_set_promiscuity+0x85/0x170
> #   __dev_set_rx_mode+0x69/0xa0
> #   dev_uc_add+0x6d/0x80
> #   vlan_dev_open+0x5f/0x120 [8021q]
> #  __dev_open+0x10c/0x2a0
> #  __dev_change_flags+0x1a4/0x210
> #  netif_change_flags+0x22/0x60
> #  do_setlink.isra.0+0xdb0/0x10f0
> #  rtnl_newlink+0x797/0xb00
> #  rtnetlink_rcv_msg+0x1cb/0x3f0
> #  netlink_rcv_skb+0x53/0x100
> #  netlink_unicast+0x273/0x3b0
> #  netlink_sendmsg+0x1f2/0x430
> 
> The problem is taking the netdev instance lock while holding the
> dev->addr_list_lock spinlock.
> 
> Any suggestions on how to refactor things to avoid this? Maybe schedule
> a wq task from vlan_dev_change_rx_flags instead of synchronously trying
> to do the change? I'm not sure that would entirely solve the issue
> though.

Thanks for the report. I was looking at a similar issue in [0], and for
macsec I was thinking about the following:

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 7edbe76b5455..4c75d1fea552 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3868,7 +3868,7 @@ static void macsec_setup(struct net_device *dev)
 	ether_setup(dev);
 	dev->min_mtu = 0;
 	dev->max_mtu = ETH_MAX_MTU;
-	dev->priv_flags |= IFF_NO_QUEUE;
+	dev->priv_flags |= IFF_NO_QUEUE | IFF_UNICAST_FLT;
 	dev->netdev_ops = &macsec_netdev_ops;
 	dev->needs_free_netdev = true;
 	dev->priv_destructor = macsec_free_netdev;

macsec has an ndo_set_rx_mode handler that propagates the uc list, so
I'm not sure why it lacks IFF_UNICAST_FLT.
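
To illustrate why the flag matters, here is a simplified user-space model
of the unicast handling in __dev_set_rx_mode(), as I understand it. It is
a sketch, not kernel code: struct toy_dev, the toy_* helpers and
TOY_IFF_UNICAST_FLT are hypothetical names made up for this example.
Without the flag, adding a unicast address falls back to promiscuous mode,
which is the path that ends in mutex_lock() in the splat above. With the
flag, only the ndo_set_rx_mode handler runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this model only; not the kernel's definition. */
#define TOY_IFF_UNICAST_FLT 0x1u

struct toy_dev {
	unsigned int priv_flags;  /* stands in for dev->priv_flags */
	int uc_count;             /* number of secondary unicast addresses */
	bool uc_promisc;          /* stands in for dev->uc_promisc */
	int promisc_changes;      /* times the promiscuity fallback ran */
	int rx_mode_calls;        /* times the ndo_set_rx_mode stand-in ran */
};

/* Stand-in for the promiscuity change that reaches mutex_lock() in the splat. */
static void toy_set_promiscuity(struct toy_dev *dev, int inc)
{
	(void)inc;
	dev->promisc_changes++;
}

/* Stand-in for the driver's ndo_set_rx_mode handler (uc list propagation). */
static void toy_ndo_set_rx_mode(struct toy_dev *dev)
{
	dev->rx_mode_calls++;
}

/* Returns true if the promiscuity fallback was taken. */
static bool toy_dev_set_rx_mode(struct toy_dev *dev)
{
	bool fallback = false;

	assert(dev != NULL);

	/* Devices without unicast filtering emulate it via promiscuous mode. */
	if (!(dev->priv_flags & TOY_IFF_UNICAST_FLT)) {
		if (dev->uc_count > 0 && !dev->uc_promisc) {
			toy_set_promiscuity(dev, 1);
			dev->uc_promisc = true;
			fallback = true;
		} else if (dev->uc_count == 0 && dev->uc_promisc) {
			toy_set_promiscuity(dev, -1);
			dev->uc_promisc = false;
			fallback = true;
		}
	}

	/* The handler runs in both cases. */
	toy_ndo_set_rx_mode(dev);
	return fallback;
}

/* Models dev_uc_add(): add one address, then re-evaluate the RX mode. */
static bool toy_dev_uc_add(struct toy_dev *dev)
{
	dev->uc_count++;
	return toy_dev_set_rx_mode(dev);
}
```

In this model, toy_dev_uc_add() on a zero-initialized device takes the
fallback, while on a device with TOY_IFF_UNICAST_FLT set it skips the
fallback and only the handler counter increases.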

This is not a systemic fix, but I guess with the limited number of
stacking devices, that should do? If that fixes the issue for you,
I can send a patch.

0: https://lore.kernel.org/netdev/686d55b4.050a0220.1ffab7.0014.GAE@google.com/