Message-ID: <2aff4342b0f5b1539c02ffd8df4c7e58dd9746e7.camel@nvidia.com>
Date: Tue, 15 Jul 2025 15:04:48 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "sdf@...ichev.me" <sdf@...ichev.me>
CC: "kuba@...nel.org" <kuba@...nel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>
Subject: Sleeping in atomic context with VLAN and netdev instance lock drivers

Hi Stanislav,

A bug related to the new netdev instance locking was recently uncovered
on a kernel built with CONFIG_DEBUG_ATOMIC_SLEEP.

I looked into it a bit, but I am not sure how to solve it, so I'd like
your help. To reproduce, take a netdevice with instance locking enabled
which supports macsec (e.g. mlx5) and a kernel with:
CONFIG_MACSEC=y
CONFIG_MLX5_MACSEC=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
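
Before trying to reproduce, it may help to confirm that the running
kernel really has the options listed above. The following is only a
sketch: check_kconfig is a hypothetical helper name, and the location of
the config file (for example /boot/config-$(uname -r)) is an assumption
that varies between distributions.

```shell
# Sketch: report which of the given CONFIG_ options are not set to "=y"
# in a kernel config file. check_kconfig is a hypothetical helper name.
check_kconfig() {
    cfg=$1          # path to a plain-text kernel config (assumed location)
    shift
    missing=0
    for opt in "$@"; do
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "missing: ${opt}"
            missing=1
        fi
    done
    # Return 0 only if every requested option was found as "=y".
    return "$missing"
}
```

It could be called as
check_kconfig "/boot/config-$(uname -r)" CONFIG_MACSEC CONFIG_MLX5_MACSEC CONFIG_DEBUG_ATOMIC_SLEEP
and the pattern would need relaxing if options built as modules (=m)
should also count.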

Run these:

IF=eth1
ip link del macsec0
ip link add link $IF macsec0 type macsec sci 3154 cipher gcm-aes-256 \
    encrypt on encodingsa 0
ip link set dev macsec0 up
ip link add link macsec0 name macsec_vlan type vlan id 1
ip link set dev macsec_vlan address 00:11:22:33:44:88
ip link set dev macsec_vlan up

And you get this splat:
# BUG: sleeping function called from invalid context at kernel/locking/mutex.c:275
#   dump_stack_lvl+0x4f/0x60
#   __might_resched+0xeb/0x140
#   mutex_lock+0x1a/0x40
#   dev_set_promiscuity+0x26/0x90
#   __dev_set_promiscuity+0x85/0x170
#   __dev_set_rx_mode+0x69/0xa0
#   dev_uc_add+0x6d/0x80
#   vlan_dev_open+0x5f/0x120 [8021q]
#   __dev_open+0x10c/0x2a0
#   __dev_change_flags+0x1a4/0x210
#   netif_change_flags+0x22/0x60
#   do_setlink.isra.0+0xdb0/0x10f0
#   rtnl_newlink+0x797/0xb00
#   rtnetlink_rcv_msg+0x1cb/0x3f0
#   netlink_rcv_skb+0x53/0x100
#   netlink_unicast+0x273/0x3b0
#   netlink_sendmsg+0x1f2/0x430
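
To check for this warning automatically after running the commands, the
kernel log can be searched for the first line of the splat. This is a
minimal sketch: has_atomic_sleep_splat is a hypothetical helper name, and
it only matches the message text quoted above.

```shell
# Sketch: succeed if the kernel log read from stdin contains the
# DEBUG_ATOMIC_SLEEP warning. has_atomic_sleep_splat is a hypothetical name.
has_atomic_sleep_splat() {
    grep -q 'BUG: sleeping function called from invalid context'
}
```

For example: dmesg | has_atomic_sleep_splat && echo "splat reproduced"
(reading the kernel log may require root on some systems).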

The problem is that the netdev instance lock (a mutex) is taken while
the dev->addr_list_lock spinlock is held.

Any suggestions on how to refactor things to avoid this? Maybe schedule
a wq task from vlan_dev_change_rx_flags instead of synchronously trying
to do the change? I'm not sure that would entirely solve the issue
though.

Cosmin.
