lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z8/t/LYlMWEwa+RL@mev-dev.igk.intel.com>
Date: Tue, 11 Mar 2025 09:02:04 +0100
From: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
To: Tariq Toukan <tariqt@...dia.com>
Cc: "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Andrew Lunn <andrew+netdev@...n.ch>, Gal Pressman <gal@...dia.com>,
	Mark Bloch <mbloch@...dia.com>, Moshe Shemesh <moshe@...dia.com>,
	Saeed Mahameed <saeedm@...dia.com>,
	Leon Romanovsky <leon@...nel.org>, netdev@...r.kernel.org,
	linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
	Jianbo Liu <jianbol@...dia.com>
Subject: Re: [PATCH net 5/6] net/mlx5: Bridge, fix the crash caused by LAG
 state check

On Tue, Mar 11, 2025 at 12:01:43AM +0200, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@...dia.com>
> 
> When removing LAG device from bridge, NETDEV_CHANGEUPPER event is
> triggered. Driver finds the lower devices (PFs) to flush all the
> offloaded entries. And mlx5_lag_is_shared_fdb is checked, it returns
> false if one of PF is unloaded. In such case,
> mlx5_esw_bridge_lag_rep_get() and its caller return NULL, instead of
> the alive PF, and the flush is skipped.
> 
> Besides, the bridge fdb entry's lastuse is updated in mlx5 bridge
> event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be
> ignored in this case because the upper interface for bond is deleted,
> and the entry will never be aged because lastuse is never updated.
> 
> To make things worse, as the entry is alive, mlx5 bridge workqueue
> keeps sending that event, which is then handled by kernel bridge
> notifier. It causes the following crash when accessing the passed bond
> netdev which is already destroyed.
> 
> To fix this issue, remove such checks. LAG state is already checked in
> commit 15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding
> bond to bridge"), driver still need to skip offload if LAG becomes
> invalid state after initialization.
> 
>  Oops: stack segment: 0000 [#1] SMP
>  CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G           OE      6.11.0_mlnx #1
>  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core]
>  RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge]
>  Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7
>  RSP: 0018:ffffc900092cfda0 EFLAGS: 00010297
>  RAX: ffff888123bfe000 RBX: ffffc900092cfe08 RCX: 00000000ffffffff
>  RDX: ffffc900092cfe08 RSI: 0000000000000001 RDI: ffffffffa0c585f0
>  RBP: 6669746f6e690a30 R08: 0000000000000000 R09: ffff888123ae92c8
>  R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888123ae9c60
>  R13: 0000000000000001 R14: ffffc900092cfe08 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f15914c8734 CR3: 0000000002830005 CR4: 0000000000770ef0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  PKRU: 55555554
>  Call Trace:
>   <TASK>
>   ? __die_body+0x1a/0x60
>   ? die+0x38/0x60
>   ? do_trap+0x10b/0x120
>   ? do_error_trap+0x64/0xa0
>   ? exc_stack_segment+0x33/0x50
>   ? asm_exc_stack_segment+0x22/0x30
>   ? br_switchdev_event+0x2c/0x110 [bridge]
>   ? sched_balance_newidle.isra.149+0x248/0x390
>   notifier_call_chain+0x4b/0xa0
>   atomic_notifier_call_chain+0x16/0x20
>   mlx5_esw_bridge_update+0xec/0x170 [mlx5_core]
>   mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core]
>   process_scheduled_works+0x81/0x390
>   worker_thread+0x106/0x250
>   ? bh_worker+0x110/0x110
>   kthread+0xb7/0xe0
>   ? kthread_park+0x80/0x80
>   ret_from_fork+0x2d/0x50
>   ? kthread_park+0x80/0x80
>   ret_from_fork_asm+0x11/0x20
>   </TASK>
> 
> Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
> Signed-off-by: Jianbo Liu <jianbol@...dia.com>
> Reviewed-by: Vlad Buslov <vladbu@...dia.com>
> Signed-off-by: Tariq Toukan <tariqt@...dia.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en/rep/bridge.c  | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
> index 5d128c5b4529..0f5d7ea8956f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
> @@ -48,15 +48,10 @@ mlx5_esw_bridge_lag_rep_get(struct net_device *dev, struct mlx5_eswitch *esw)
>  	struct list_head *iter;
>  
>  	netdev_for_each_lower_dev(dev, lower, iter) {
> -		struct mlx5_core_dev *mdev;
> -		struct mlx5e_priv *priv;
> -
>  		if (!mlx5e_eswitch_rep(lower))
>  			continue;
>  
> -		priv = netdev_priv(lower);
> -		mdev = priv->mdev;
> -		if (mlx5_lag_is_shared_fdb(mdev) && mlx5_esw_bridge_dev_same_esw(lower, esw))
> +		if (mlx5_esw_bridge_dev_same_esw(lower, esw))
>  			return lower;
>  	}
>  
> @@ -125,7 +120,7 @@ static bool mlx5_esw_bridge_is_local(struct net_device *dev, struct net_device *
>  	priv = netdev_priv(rep);
>  	mdev = priv->mdev;
>  	if (netif_is_lag_master(dev))
> -		return mlx5_lag_is_shared_fdb(mdev) && mlx5_lag_is_master(mdev);
> +		return mlx5_lag_is_master(mdev);
>  	return true;
>  }
>  
> @@ -455,6 +450,9 @@ static int mlx5_esw_bridge_switchdev_event(struct notifier_block *nb,
>  	if (!rep)
>  		return NOTIFY_DONE;
>  
> +	if (netif_is_lag_master(dev) && !mlx5_lag_is_shared_fdb(esw->dev))
> +		return NOTIFY_DONE;
> +
>  	switch (event) {
>  	case SWITCHDEV_FDB_ADD_TO_BRIDGE:
>  		fdb_info = container_of(info,

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>

> -- 
> 2.31.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ