[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aJixPn_7gYd1o69V@pidgin.makrotopia.org>
Date: Sun, 10 Aug 2025 15:48:30 +0100
From: Daniel Golle <daniel@...rotopia.org>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Hauke Mehrtens <hauke@...ke-m.de>, Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Arkadi Sharshevsky <arkadis@...lanox.com>,
Florian Fainelli <f.fainelli@...il.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
Andreas Schirm <andreas.schirm@...mens.com>,
Alexander Sverdlin <alexander.sverdlin@...mens.com>,
Lukas Stockmann <lukas.stockmann@...mens.com>,
John Crispin <john@...ozen.org>
Subject: Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to
port_fdb_{add,del}
Hi Vladimir,
thank you for spending parts of your weekend dealing with this problem ;)
On Sun, Aug 10, 2025 at 04:06:37PM +0300, Vladimir Oltean wrote:
> On Sat, Aug 09, 2025 at 11:35:28PM +0100, Daniel Golle wrote:
> > Commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
> > notification") added a dev_close() call "to indicate inconsistent
> > situation" when we could not delete an FDB entry from the port. In case
> > of the lantiq_gswip driver this is problematic on standalone ports for
> > which all calls to either .port_fdb_add() or .port_fdb_del() would just
> > always return -EINVAL as adding or removing FDB entries is currently
> > only supported for ports which are a member of a bridge.
> >
> > As since commit c26933639b54 ("net: dsa: request drivers to perform FDB
> > isolation") the dsa_db is passed to the .port_fdb_add() or
> > .port_fdb_del() calls we can use that to set the FID accordingly,
> > similar to how it was for bridge ports, and to FID 0 for standalone
> > ports. In order for FID 0 to work at all we also need to set bit 1 in
> > val[1], so always set it.
> >
> > This solution was found in a downstream driver provided by MaxLinear
> > (which is the current owner of the former Lantiq switch IP) under
> > GPL-2.0. Import the implementation and the copyright headers from that
> > driver.
> >
> > Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
> > Signed-off-by: Daniel Golle <daniel@...rotopia.org>
> > ---
>
> 1. The dev_close() call was removed in commit 2fd186501b1c ("net: dsa:
> be louder when a non-legacy FDB operation fails"); what kernel are you
> seeing failures on?
I'm working on net-next at this moment, trying historic OpenWrt releases
on that hardware I noticed the problem has been introduced somewhere
between Linxu 5.10 and Linux 5.15, but it is still present up to net-next
as of today. That's why I concluded c9eb3e0f8701 would be the original
cause.
>
> 2. The call paths which set DSA_DB_PORT should be all guarded by
> dsa_switch_supports_uc_filtering(), which the gswip driver doesn't
> fulfill (it's missing ds->fdb_isolation). Can you put a dump_stack()
> in the DSA_DB_PORT handler and let me know where it's called from?
In case you meant the driver implementation of .port_fdb_add() and
.port_fdb_del(), see below. If you meant somewhere else, please let me
know exactly where you want the dump_stack() to be placed.
>
> 3. You haven't actually explained the context that leads to
> gswip_port_fdb() returning -EINVAL. It would be great to have that as
> a starting point, perhaps a dump_stack() in the unmodified code could
> reveal more.
So instead of the fix I have now applied this simple patch to reveal the
stacktrace which leads to the return of -EINVAL:
diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 6eb3140d4044..befca64cce6d 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -1359,7 +1359,7 @@ static int gswip_port_fdb(struct dsa_switch *ds, int port,
int i;
int err;
- if (!bridge)
+ if (WARN_ON(!bridge))
return -EINVAL;
for (i = max_ports; i < ARRAY_SIZE(priv->vlans); i++) {
---
Unfortunately that doesn't tell much:
[ 52.910000] ------------[ cut here ]------------
[ 52.910000] WARNING: CPU: 0 PID: 12 at drivers/net/dsa/lantiq_gswip.c:1362 0xc026c364
[ 52.920000] Modules linked in: tag_gswip lantiq_gswip
[ 52.930000] CPU: 0 UID: 0 PID: 12 Comm: kworker/u8:0 Not tainted 6.16.0+ #0 NONE
[ 52.930000] Hardware name: o2 Box 6431
[ 52.930000] Workqueue: dsa_ordered dsa_user_switchdev_event_work
[ 52.930000] Stack : 8187db3c 8187db10 00000000 000affff 807ec348 81c05200 81bd5e00 6473615f
[ 52.930000] 6f726465 72656400 00000000 00000000 00000000 00000001 8187daf8 8183ddc0
[ 52.930000] 00000000 00000000 808edd50 8187d920 ffffefff 00000000 00000098 0000009a
[ 52.930000] 8187d92c 0000009a 809fd7d8 fffffffb 00000001 00000000 808edd50 00000000
[ 52.930000] 00000000 c026c364 00000000 81cb6154 00000003 fffc2513 00000000 80eb0000
[ 52.930000] ...
[ 52.930000] Call Trace:
[ 52.930000] [<800167cc>] show_stack+0x28/0xf0
[ 52.930000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[ 52.930000] [<8003bb58>] __warn+0x9c/0x114
[ 52.930000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[ 52.930000] [<c026c364>] 0xc026c364
[ 52.930000]
[ 52.930000] ---[ end trace 0000000000000000 ]---
So my next attempt is this
diff --git a/net/dsa/user.c b/net/dsa/user.c
index f59d66f0975d..34d67486ec2f 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -3649,7 +3649,7 @@ static void dsa_user_switchdev_event_work(struct work_struct *work)
err = dsa_port_lag_fdb_add(dp, addr, vid);
else
err = dsa_port_fdb_add(dp, addr, vid);
- if (err) {
+ if (WARN_ON(err)) {
dev_err(ds->dev,
"port %d failed to add %pM vid %d to fdb: %d\n",
dp->index, addr, vid, err);
@@ -3665,7 +3665,7 @@ static void dsa_user_switchdev_event_work(struct work_struct *work)
err = dsa_port_lag_fdb_del(dp, addr, vid);
else
err = dsa_port_fdb_del(dp, addr, vid);
- if (err) {
+ if (WARN_ON(err)) {
dev_err(ds->dev,
"port %d failed to delete %pM vid %d from fdb: %d\n",
dp->index, addr, vid, err);
---
That lead to a little more revealing result:
[ 48.930000] ------------[ cut here ]------------
[ 48.930000] WARNING: CPU: 0 PID: 31 at net/dsa/user.c:3652 dsa_user_switchdev_event_work+0x1e8/0x210
[ 48.940000] Modules linked in: tag_gswip lantiq_gswip
[ 48.940000] CPU: 0 UID: 0 PID: 31 Comm: kworker/u8:2 Tainted: G W 6.16.0+ #0 NONE
[ 48.940000] Tainted: [W]=WARN
[ 48.940000] Hardware name: o2 Box 6431
[ 48.940000] Workqueue: dsa_ordered dsa_user_switchdev_event_work
[ 48.940000] Stack : 81c2dd24 81c2dcf8 00000000 000affff 807ec348 81c6b000 81bcc200 6473615f
[ 48.940000] 6f726465 72656400 00000000 00000000 00000000 00000001 81c2dce0 8183b840
[ 48.940000] 00000000 00000000 808edd50 81c2db08 ffffefff 00000000 00000191 00000193
[ 48.940000] 81c2db14 00000193 809fd7d8 fffffffb 00000001 00000000 808edd50 00000000
[ 48.940000] 00000000 807ec530 00000000 81804420 00000003 fffc5b17 000a3b9f 00000001
[ 48.940000] ...
[ 48.940000] Call Trace:
[ 48.940000] [<800167cc>] show_stack+0x28/0xf0
[ 48.940000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[ 48.940000] [<8003bb58>] __warn+0x9c/0x114
[ 48.940000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[ 48.940000] [<807ec530>] dsa_user_switchdev_event_work+0x1e8/0x210
[ 48.940000] [<8005892c>] process_one_work+0x1c4/0x3e0
[ 48.940000] [<8005a1ac>] worker_thread+0x318/0x488
[ 48.940000] [<80062000>] kthread+0x118/0x258
[ 48.940000] [<80011db8>] ret_from_kernel_thread+0x14/0x1c
[ 48.940000]
[ 48.940000] ---[ end trace 0000000000000000 ]---
So I went ahead and instead put the WARN_ON() in the function which schedules the work.
diff --git a/net/dsa/user.c b/net/dsa/user.c
index f59d66f0975d..1b56c0a5d5d6 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -3748,6 +3748,7 @@ static int dsa_user_fdb_event(struct net_device *dev,
orig_dev->name, fdb_info->addr, fdb_info->vid,
host_addr ? " as host address" : "");
+ WARN_ON(1);
INIT_WORK(&switchdev_work->work, dsa_user_switchdev_event_work);
switchdev_work->event = event;
switchdev_work->dev = dev;
---
It is now a bit difficult to tell which of the calls actually return
-EINVAL, but counting the number of stackdumps and the number of -EINVAL
error messages seem to match:
[ 65.500000] ------------[ cut here ]------------
[ 65.500000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[ 65.510000] Modules linked in: tag_gswip lantiq_gswip
[ 65.510000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Not tainted 6.16.0+ #0 NONE
[ 65.510000] Hardware name: o2 Box 6431
[ 65.510000] Stack : 81f5b8b4 00000096 00000000 00000001 00000000 00000000 00000000 00000000
[ 65.510000] 00000000 00000000 00000000 00000000 00000000 00000001 81f5b870 81839f40
[ 65.510000] 00000000 00000000 808edd50 81f5b708 ffffefff 00000000 00000096 00000098
[ 65.510000] 81f5b714 00000098 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[ 65.510000] 00000000 807ed128 00000000 807eb87c 00000003 fffc249b 000726df 00000001
[ 65.510000] ...
[ 65.510000] Call Trace:
[ 65.510000] [<800167cc>] show_stack+0x28/0xf0
[ 65.510000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[ 65.510000] [<8003bb58>] __warn+0x9c/0x114
[ 65.510000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[ 65.510000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[ 65.510000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[ 65.510000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[ 65.510000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[ 65.510000] [<807c43c0>] br_switchdev_fdb_replay+0xd0/0x138
[ 65.510000] [<807c4de8>] br_switchdev_port_offload+0x240/0x39c
[ 65.510000] [<80799b6c>] br_switchdev_blocking_event+0x80/0xec
[ 65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
[ 65.510000] [<807f83e0>] switchdev_bridge_port_offload+0x5c/0xd0
[ 65.510000] [<807e4c90>] dsa_port_bridge_join+0x170/0x410
[ 65.510000] [<807ed5fc>] dsa_user_changeupper.part.0+0x40/0x180
[ 65.510000] [<807f0ac0>] dsa_user_netdevice_event+0x5b4/0xc34
[ 65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
[ 65.510000] [<805edeec>] __netdev_upper_dev_link+0x1bc/0x450
[ 65.510000] [<805ee1dc>] netdevbmaster_upper_dev_link+0x2c/0x38
[ 65.510000] [<807a055c>] br_add_if+0x494/0x890
[ 65.510000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[ 65.510000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[ 65.510000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[ 65.510000] [<8002204c>] syscall_common+0x34/0x58
[ 65.510000]
[ 65.510000] ---[ end trace 0000000000000000 ]---
[ 65.710000] gswip 1e108000.switch lan1: entered promiscuous mode
[ 65.710000] ------------[ cut here ]------------
[ 65.720000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[ 65.730000] Modules linked in: tag_gswip lantiq_gswip
[ 65.730000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Tainted: G W 6.16.0+ #0 NONE
[ 65.730000] Tainted: [W]=WARN
[ 65.730000] Hardware name: o2 Box 6431
[ 65.730000] Stack : 81f5bb04 8003b4bc 00000000 00000001 00000000 00000000 00000000 00000000
[ 65.730000] 00000000 00000000 00000000 00000000 00000000 00000001 81f5bac0 81839f40
[ 65.730000] 00000000 00000000 808edd50 81f5b958 ffffefff 00000000 000000be 000000c0
[ 65.730000] 81f5b964 000000c0 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[ 65.730000] 00000000 807ed128 00000000 807eb87c 00000003 fffc2d13 00000000 80eb0000
[ 65.730000] ...
[ 65.730000] Call Trace:
[ 65.730000] [<800167cc>] show_stack+0x28/0xf0
[ 65.730000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[ 65.730000] [<8003bb58>] __warn+0x9c/0x114
[ 65.730000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[ 65.730000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[ 65.730000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[ 65.730000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[ 65.730000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[ 65.730000] [<80065ea8>] atomic_notifier_call_chain+0x48/0x88
[ 65.730000] [<807c47f0>] br_switchdev_fdb_notify+0xf0/0x11c
[ 65.730000] [<8079b33c>] fdb_notify+0x110/0x158
[ 65.730000] [<8079c1d0>] fdb_add_local+0x124/0x140
[ 65.730000] [<8079ce64>] br_fdb_add_local+0x4c/0x7c
[ 65.730000] [<807a0670>] br_add_if+0x5a8/0x890
[ 65.730000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[ 65.730000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[ 65.730000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[ 65.730000] [<8002204c>] syscall_common+0x34/0x58
[ 65.730000]
[ 65.730000] ---[ end trace 0000000000000000 ]---
[ 65.890000] ------------[ cut here ]------------
[ 65.900000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[ 65.900000] Modules linked in: tag_gswip lantiq_gswip
[ 65.910000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Tainted: G W 6.16.0+ #0 NONE
[ 65.910000] Tainted: [W]=WARN
[ 65.910000] Hardware name: o2 Box 6431
[ 65.910000] Stack : 81f5ba34 8003b4bc 00000000 00000001 00000000 00000000 00000000 00000000
[ 65.910000] 00000000 00000000 00000000 00000000 00000000 00000001 81f5b9f0 81839f40
[ 65.910000] 00000000 00000000 808edd50 81f5b888 ffffefff 00000000 000000df 000000e1
[ 65.910000] 81f5b894 000000e1 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[ 65.910000] 00000000 807ed128 00000000 807eb87c 00000003 fffc33df 00000000 80eb0000
[ 65.910000] ...
[ 65.910000] Call Trace:
[ 65.910000] [<800167cc>] show_stack+0x28/0xf0
[ 65.910000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[ 65.910000] [<8003bb58>] __warn+0x9c/0x114
[ 65.910000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[ 65.910000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[ 65.910000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[ 65.910000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[ 65.910000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[ 65.910000] [<80065ea8>] atomic_notifier_call_chain+0x48/0x88
[ 65.910000] [<807c47f0>] br_switchdev_fdb_notify+0xf0/0x11c
[ 65.910000] [<8079b33c>] fdb_notify+0x110/0x158
[ 65.910000] [<8079c1d0>] fdb_add_local+0x124/0x140
[ 65.910000] [<8079ce64>] br_fdb_add_local+0x4c/0x7c
[ 65.910000] [<807bee58>] __vlan_add+0xa0/0x93c
[ 65.910000] [<807bfc04>] nbp_vlan_add+0x134/0x1fc
[ 65.910000] [<807c0750>] nbp_vlan_init+0x120/0x178
[ 65.910000] [<807a06ac>] br_add_if+0x5e4/0x890
[ 65.910000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[ 65.910000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[ 65.910000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[ 65.910000] [<8002204c>] syscall_common+0x34/0x58
[ 65.910000]
[ 65.910000] ---[ end trace 0000000000000000 ]---
[ 66.080000] ------------[ cut here ]------------
[ 66.090000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[ 66.100000] Modules linked in: tag_gswip lantiq_gswip
[ 66.100000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Tainted: G W 6.16.0+ #0 NONE
[ 66.100000] Tainted: [W]=WARN
[ 66.100000] Hardware name: o2 Box 6431
[ 66.100000] Stack : 81f5baa4 8003b4bc 00000000 00000001 00000000 00000000 00000000 00000000
[ 66.100000] 00000000 00000000 00000000 00000000 00000000 00000001 81f5ba60 81839f40
[ 66.100000] 00000000 00000000 808edd50 81f5b8f8 ffffefff 00000000 00000103 00000105
[ 66.100000] 81f5b904 00000105 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[ 66.100000] 00000000 807ed128 00000000 807eb87c 00000003 fffc3b2b 00000000 80eb0000
[ 66.100000] ...
[ 66.100000] Call Trace:
[ 66.100000] [<800167cc>] show_stack+0x28/0xf0
[ 66.100000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[ 66.100000] [<8003bb58>] __warn+0x9c/0x114
[ 66.100000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[ 66.100000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[ 66.100000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[ 66.100000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[ 66.100000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[ 66.100000] [<80065ea8>] atomic_notifier_call_chain+0x48/0x88
[ 66.100000] [<807c4810>] br_switchdev_fdb_notify+0x110/0x11c
[ 66.100000] [<8079b33c>] fdb_notify+0x110/0x158
[ 66.100000] [<8079bdb4>] fdb_delete+0x1e4/0x310
[ 66.100000] [<8079c59c>] br_fdb_change_mac_address+0x180/0x188
[ 66.100000] [<807a46f0>] br_stp_change_bridge_id+0x4c/0x1c8
[ 66.100000] [<807a496c>] br_stp_recalculate_bridge_id+0x100/0x148
[ 66.100000] [<807a06c4>] br_add_if+0x5fc/0x890
[ 66.100000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[ 66.100000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[ 66.100000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[ 66.100000] [<8002204c>] syscall_common+0x34/0x58
[ 66.100000]
[ 66.100000] ---[ end trace 0000000000000000 ]---
[ 66.300000] gswip 1e108000.switch: port 3 failed to add 6a:94:c2:xx:xx:xx vid 1 to fdb: -22
[ 66.300000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 0 to fdb: -22
[ 66.320000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 1 to fdb: -22
[ 66.320000] gswip 1e108000.switch: port 3 failed to delete 6a:94:c2:xx:xx:xx vid 1 from fdb: -2
So the problem is apparently that at the point of calling br_add_if() the
port obviously isn't (yet) a member of the bridge and hence
dsa_port_bridge_dev_get() would still return NULL at this point, which
then causes gswip_port_fdb() to return -EINVAL.
Powered by blists - more mailing lists