lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6c0a6b462200847d87221dcd7655b6a746012061.camel@nvidia.com>
Date: Wed, 7 May 2025 15:13:36 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "stfomichev@...il.com" <stfomichev@...il.com>
CC: "davem@...emloft.net" <davem@...emloft.net>,
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, Dragos
 Tatulea <dtatulea@...dia.com>, "sdf@...ichev.me" <sdf@...ichev.me>,
	"kuba@...nel.org" <kuba@...nel.org>, "jiri@...nulli.us" <jiri@...nulli.us>,
	"edumazet@...gle.com" <edumazet@...gle.com>, "pabeni@...hat.com"
	<pabeni@...hat.com>, Saeed Mahameed <saeedm@...dia.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net] net: Lock lower level devices when updating features

On Wed, 2025-05-07 at 14:35 +0000, Cosmin Ratiu wrote:
> On Tue, 2025-05-06 at 11:13 -0700, Stanislav Fomichev wrote:
> > On 05/06, Cosmin Ratiu wrote:
> > 
> > 
> > Right, but netdev_sync_lower_features calls lower's
> > __netdev_update_features
> > only for NETIF_F_UPPER_DISABLES. So it doesn't propagate all
> > features,
> > only LRO AFAICT.
> 
> Got it, I didn't look into what netdev_sync_lower_features actually
> does besides noticing it can call __netdev_update_feature...
> 
> In any case, please hold off with picking this patch up, it seems
> there's a possibility of a real deadlock. Here's the scenario:
> 
> ============================================
> WARNING: possible recursive locking detected
> --------------------------------------------
> ethtool/44150 is trying to acquire lock:
> ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> __netdev_update_features+0x31e/0xe20
> 
> but task is already holding lock:
> ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> ethnl_set_features+0xbc/0x4b0
> and the lock comparison function returns 0:
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&dev_instance_lock_key#7);
>   lock(&dev_instance_lock_key#7);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by ethtool/44150:
>  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
>  #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> ethnl_set_features+0x88/0x4b0
>  #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> ethnl_set_features+0xbc/0x4b0
> 
> stack backtrace:
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x69/0xa0
>  print_deadlock_bug.cold+0xbd/0xca
>  __lock_acquire+0x163c/0x2f00
>  lock_acquire+0xd3/0x2e0
>  __mutex_lock+0x98/0xf10
>  __netdev_update_features+0x31e/0xe20
>  netdev_update_features+0x1f/0x60
>  vlan_device_event+0x57d/0x930 [8021q]
>  notifier_call_chain+0x3d/0x100
>  netdev_features_change+0x32/0x50
>  ethnl_set_features+0x17e/0x4b0
>  genl_family_rcv_msg_doit+0xe0/0x130
>  genl_rcv_msg+0x188/0x290
> [...]
> 
> I'm not sure how to solve this yet...
> Cosmin.

If it's not clear, the problem is that:
1. the lower device is already ops locked
2. netdev_feature_change gets called.
3. __netdev_update_features gets called for the vlan (upper) dev.
4. It tries to acquire the same lock instance as 1 (this patch).
5. Deadlock.

One solution I can think of would be to run device notifiers for
changing features outside the lock, it doesn't seem like the netdev
lock has anything to do with what other devices might do with this
information.

This can be triggered from many scenarios, I have another similar stack
involving bonding.

What do you think?

Cosmin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ