lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <97a94886-1252-4004-9a88-13430da1d25d@nvidia.com>
Date: Wed, 26 Mar 2025 15:46:40 +0200
From: Yael Chemla <ychemla@...dia.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>
Cc: Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in
 unregister_netdevice_notifier_dev_net().

On 17/02/2025 21:11, Kuniyuki Iwashima wrote:
> After the cited commit, dev_net(dev) is fetched before holding RTNL
> and passed to __unregister_netdevice_notifier_net().
> 
> However, dev_net(dev) might be different after holding RTNL.
> 
> In the reported case [0], while removing a VF device, its netns was
> being dismantled and the VF was moved to init_net.
> 
> So the following sequence is basically illegal when dev was fetched
> without lookup:
> 
>   net = dev_net(dev);
>   rtnl_net_lock(net);
> 
> Let's use a new helper rtnl_net_dev_lock() to fix the race.
> 
> It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
> dev_net_rcu(dev) is changed after rtnl_net_lock().
> 
> [0]:
> BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl (lib/dump_stack.c:123)
>  print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
>  kasan_report (mm/kasan/report.c:604)
>  notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
>  call_netdevice_notifiers_info (net/core/dev.c:2011)
>  unregister_netdevice_many_notify (net/core/dev.c:11551)
>  unregister_netdevice_queue (net/core/dev.c:11487)
>  unregister_netdev (net/core/dev.c:11635)
>  mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
>  auxiliary_bus_remove (drivers/base/auxiliary.c:230)
>  device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
>  bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
>  device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
>  mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
>  mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
>  mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
>  remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
>  pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
>  device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
>  unbind_store (drivers/base/bus.c:245)
>  kernfs_fop_write_iter (fs/kernfs/file.c:338)
>  vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
>  ksys_write (fs/read_write.c:732)
>  do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
>  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f6a4d5018b7
> 
> Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> Reported-by: Yael Chemla <ychemla@...dia.com>
> Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> ---
> v5:
>   * Use do-while loop
> 
> v4:
>   * Fix build failure when !CONFIG_NET_NS
>   * Use net_passive_dec()
> 
> v3:
>   * Bump net->passive instead of maybe_get_net()
>   * Remove msleep(1) loop
>   * Use rcu_access_pointer() instead of rcu_read_lock().
> 
> v2:
>   * Use dev_net_rcu().
>   * Use msleep(1) instead of cond_resched() after maybe_get_net()
>   * Remove cond_resched() after net_eq() check
> 
> v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> ---
>  net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b91658e8aedb..19e268568282 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
>  	__register_netdevice_notifier_net(dst_net, nb, true);
>  }
>  
> +static void rtnl_net_dev_lock(struct net_device *dev)
> +{
> +	bool again;
> +
> +	do {
> +		struct net *net;
> +
> +		again = false;
> +
> +		/* netns might be being dismantled. */
> +		rcu_read_lock();
> +		net = dev_net_rcu(dev);
> +		net_passive_inc(net);

Hi Kuniyuki,
It seems we are still encountering the previously reorted issue,
even when running with your latest fix. However, the problem has become
less frequent, now requiring multiple test iterations to reproduce.

1) we identified the following warnings (each accompanied by a call
trace; only one is detailed below, though others are similar):

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 6 PID: 1105 at lib/refcount.c:25 refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:25 (discriminator 1))

and also

refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 1105 at lib/refcount.c:28 refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:28 (discriminator 1))


2) test scenario:
sets up a network topology of two VFs on different eSwitch, performs
ping tests between them, verifies traffic rules offloading, and cleans
up the environment afterward.

3) the warning is triggered upon reaching the
unregister_netdevice_notifier_dev_net when both net->ns.count and
net->passive reference counts are zero.

 Call Trace:
  <TASK>
 ? __warn (/usr/work/linux/kernel/panic.c:748)
 ? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
 ? report_bug (/usr/work/linux/lib/bug.c:180 /usr/work/linux/lib/bug.c:219)
 ? handle_bug (/usr/work/linux/arch/x86/kernel/traps.c:285)
 ? exc_invalid_op (/usr/work/linux/arch/x86/kernel/traps.c:309
(discriminator 1))
 ? asm_exc_invalid_op
(/usr/work/linux/./arch/x86/include/asm/idtentry.h:574)
 ? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
 ? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
 rtnl_net_dev_lock (/usr/work/linux/./include/linux/refcount.h:190
/usr/work/linux/./include/linux/refcount.h:241
/usr/work/linux/./include/linux/refcount.h:258
/usr/work/linux/./include/net/net_namespace.h:339
/usr/work/linux/net/core/dev.c:2076)
 unregister_netdevice_notifier_dev_net
(/usr/work/linux/./include/linux/list.h:218
/usr/work/linux/./include/linux/list.h:229
/usr/work/linux/net/core/dev.c:2125)
 mlx5e_tc_nic_cleanup
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:5352)
mlx5_core
 mlx5e_cleanup_nic_rx
(/usr/work/linux/./drivers/net/ethernet/mellanox/mlx5/core/en/fs.h:161
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:5765)
mlx5_core
 mlx5e_detach_netdev
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6231)
mlx5_core
 _mlx5e_suspend
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6399)
mlx5_core
 mlx5e_remove
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6526
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6553)
mlx5_core
 auxiliary_bus_remove (/usr/work/linux/drivers/base/auxiliary.c:230)
 device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275
/usr/work/linux/drivers/base/dd.c:1296)
 bus_remove_device (/usr/work/linux/./include/linux/kobject.h:193
/usr/work/linux/drivers/base/base.h:73
/usr/work/linux/drivers/base/bus.c:586)
 device_del (/usr/work/linux/drivers/base/power/power.h:142
/usr/work/linux/drivers/base/core.c:3856)
 ? devl_param_driverinit_value_get (/usr/work/linux/net/devlink/param.c:778)
 mlx5_rescan_drivers_locked
(/usr/work/linux/./include/linux/auxiliary_bus.h:242
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:333
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:535
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
 mlx5_unregister_device
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:468)
mlx5_core
 mlx5_uninit_one (/usr/work/linux/./arch/x86/include/asm/bitops.h:206
/usr/work/linux/./arch/x86/include/asm/bitops.h:238
/usr/work/linux/./include/asm-generic/bitops/instrumented-non-atomic.h:142
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:1576)
mlx5_core
 remove_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:969
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:2034)
mlx5_core
 pci_device_remove (/usr/work/linux/./arch/x86/include/asm/atomic.h:23
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:457
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2426
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2456
/usr/work/linux/./include/linux/atomic/atomic-instrumented.h:1518
/usr/work/linux/./include/linux/pm_runtime.h:129
/usr/work/linux/drivers/pci/pci-driver.c:475)
 device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275
/usr/work/linux/drivers/base/dd.c:1296)
 pci_stop_bus_device (/usr/work/linux/drivers/pci/remove.c:46
/usr/work/linux/drivers/pci/remove.c:106)
 pci_stop_and_remove_bus_device (/usr/work/linux/drivers/pci/remove.c:141)
 pci_iov_remove_virtfn (/usr/work/linux/drivers/pci/iov.c:377)
 sriov_disable (/usr/work/linux/drivers/pci/iov.c:712 (discriminator 1)
/usr/work/linux/drivers/pci/iov.c:723 (discriminator 1))
 mlx5_sriov_disable
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:208)
mlx5_core
 mlx5_core_sriov_configure
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:229)
mlx5_core
 sriov_numvfs_store (/usr/work/linux/./include/linux/device.h:1045
/usr/work/linux/drivers/pci/iov.c:471)
 kernfs_fop_write_iter (/usr/work/linux/fs/kernfs/file.c:338)
 vfs_write (/usr/work/linux/fs/read_write.c:587 (discriminator 1)
/usr/work/linux/fs/read_write.c:679 (discriminator 1))
 ksys_write (/usr/work/linux/fs/read_write.c:732)
 do_syscall_64 (/usr/work/linux/arch/x86/entry/common.c:52
(discriminator 1) /usr/work/linux/arch/x86/entry/common.c:83
(discriminator 1))
 entry_SYSCALL_64_after_hwframe
(/usr/work/linux/arch/x86/entry/entry_64.S:130)
 RIP: 0033:0x7fbe3e9018b7

let me know if you need more information
Yael

> +		rcu_read_unlock();
> +
> +		rtnl_net_lock(net);
> +
> +#ifdef CONFIG_NET_NS
> +		/* dev might have been moved to another netns. */
> +		if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
> +			rtnl_net_unlock(net);
> +			net_passive_dec(net);
> +			again = true;
> +		}
> +#endif
> +	} while (again);
> +}
> +
> +static void rtnl_net_dev_unlock(struct net_device *dev)
> +{
> +	struct net *net = dev_net(dev);
> +
> +	rtnl_net_unlock(net);
> +	net_passive_dec(net);
> +}
> +
>  int register_netdevice_notifier_dev_net(struct net_device *dev,
>  					struct notifier_block *nb,
>  					struct netdev_net_notifier *nn)
> @@ -2077,6 +2113,11 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
>  	struct net *net = dev_net(dev);
>  	int err;
>  
> +	/* rtnl_net_lock() assumes dev is not yet published by
> +	 * register_netdevice().
> +	 */
> +	DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->dev_list));
> +
>  	rtnl_net_lock(net);
>  	err = __register_netdevice_notifier_net(net, nb, false);
>  	if (!err) {
> @@ -2093,13 +2134,12 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
>  					  struct notifier_block *nb,
>  					  struct netdev_net_notifier *nn)
>  {
> -	struct net *net = dev_net(dev);
>  	int err;
>  
> -	rtnl_net_lock(net);
> +	rtnl_net_dev_lock(dev);
>  	list_del(&nn->list);
> -	err = __unregister_netdevice_notifier_net(net, nb);
> -	rtnl_net_unlock(net);
> +	err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
> +	rtnl_net_dev_unlock(dev);
>  
>  	return err;
>  }


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ