[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZM1ye+UYKdW6a/m5@euler>
Date: Fri, 4 Aug 2023 14:49:47 -0700
From: Colin Foster <colin.foster@...advantage.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Claudiu Manoil <claudiu.manoil@....com>,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
UNGLinuxDriver@...rochip.com
Subject: Re: [PATCH net] net: dsa: ocelot: call dsa_tag_8021q_unregister()
under rtnl_lock() on driver remove
Hi Vladimir,
On Fri, Aug 04, 2023 at 08:09:58PM +0300, Vladimir Oltean wrote:
> Hi Colin,
>
> The WARN_ON() in dsa_switch_release_ports() is different, and I tried to fix it here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20230411144955.1604591-1-vladimir.oltean@nxp.com/
> but judging by the fact that that was in April and now we're in August,
> obviously I didn't succeed.
>
> What's worse is the other one, the "scheduling while atomic" bug in the
> gpiochip removal path from ocelot-pinctrl.c. I'm not sure, at first glance,
> what causes the calling context to be atomic. Presumably some kind of
> spinlock which should be tracked down.
>
> Unfortunately I'm not very good with kernel debugging the way it should
> be done, so what I would advise you to do is to walk the stack trace
> down, from device_del() or so, and sprinkle a few might_sleep() calls
> until you figure out who's forcing atomic context and why.
I'm not super confident in tracking down these types of bugs either, but
you already knew that ;-) Thanks for these debugging tips. This is
absolutely a problem I want to understand!
When I was _really_ early in my SGPIO LED bring-up, I saw a scheduling
while atomic warning when I set one of those LEDs to trigger on
heartbeat. I have since tested that many times and never saw it again.
> Otherwise, can't you just unbind the driver from the ethernet-switch
> child of the SPI device, rather than the entire SPI device? That should
> avoid the gpiochip/pinctrl bug. And the other one is ignorable for the
> intents and purposes here (that is, unless you want to take care of it,
> of course).
Ok, I might have it recreated. My method:
(updated to 6.5-rc4)
Set dsa up for ocelot-8021q:
echo ocelot-8021q > /sys/class/net/eth0/dsa/tagging
Bring up the ports in my jumpered bridge configuration. (Bridge
swp1-swp7 and have a bunch of short cables to test RSTP.)
**
Side note, in 8021q mode I can ping through the bridge's swp1 to my
desktop, but RSTP forwards on all ports. I'm adding this to my list.
**
Unbind the ocelot-ext driver:
echo ocelot-ext-switch.5.auto > \
/sys/bus/platform/drivers/ocelot-ext-switch/unbind
br0: port 1(swp1) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode
br0: port 1(swp1) entered disabled state
br0: port 2(swp2) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode
br0: port 2(swp2) entered disabled state
br0: port 3(swp3) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode
br0: port 3(swp3) entered disabled state
br0: port 4(swp4) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode
br0: port 4(swp4) entered disabled state
br0: port 5(swp5) entered disabled state
br0: port 5(swp5) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode
br0: port 5(swp5) entered disabled state
br0: port 6(swp6) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode
br0: port 6(swp6) entered disabled state
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode
cpsw-switch 4a100000.switch eth0: left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode
cpsw-switch 4a100000.switch eth0: left promiscuous mode
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x1a8/0x1ac
RTNL: assertion failed at net/dsa/tag_8021q.c (409)
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000009 r6:00000000 r5:c18c0c5c r4:00030113
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000009 r6:c1197930 r5:00000199 r4:c1a06dfc
dump_stack from __warn+0x88/0x160
__warn from warn_slowpath_fmt+0x150/0x1e0
r8:00000009 r7:00000199 r6:c1a06dfc r5:c1d05590 r4:c1c978a4
warn_slowpath_fmt from dsa_tag_8021q_unregister+0x1a8/0x1ac
r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c46ec180 r5:c4284a30
r4:c4298300
dsa_tag_8021q_unregister from felix_tag_8021q_teardown+0x94/0x140
r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c26db808 r5:c4284a30
r4:c4298300
felix_tag_8021q_teardown from felix_teardown+0x30/0xc0
r7:00000000 r6:c26db808 r5:c4298300 r4:c4298300
felix_teardown from dsa_tree_teardown_switches+0xa4/0xd4
r7:00000000 r6:c4284b48 r5:c4282600 r4:c4298300
dsa_tree_teardown_switches from dsa_unregister_switch+0xe4/0x18c
r7:c4255c54 r6:c1ec5414 r5:c4284b40 r4:c4298300
dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
ocelot_ext_remove from platform_remove+0x50/0x6c
r5:c424d800 r4:c4255c10
platform_remove from device_remove+0x50/0x74
r5:c424d800 r4:c4255c10
device_remove from device_release_driver_internal+0x190/0x204
r5:c424d800 r4:c4255c10
device_release_driver_internal from device_driver_detach+0x20/0x24
r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
device_driver_detach from unbind_store+0x64/0xa0
unbind_store from drv_attr_store+0x34/0x40
r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740
drv_attr_store from sysfs_kf_write+0x48/0x54
r5:c471a100 r4:c0a5299c
sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
r5:c471a100 r4:00000019
kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68
r4:c4714f00
vfs_write from ksys_write+0x70/0xf4
r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00
r4:c4714f02
ksys_write from sys_write+0x18/0x1c
r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019
sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0: 00000019 00579a60 00000001 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06
---[ end trace 0000000000000000 ]---
cpsw-switch 4a100000.switch eth0: Link is Down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/8021q/vlan_core.c:376 vlan_vid_del+0x14c/0x16c
RTNL: assertion failed at net/8021q/vlan_core.c (376)
Modules linked in:
CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000009 r6:00000000 r5:c18c0c5c r4:000f0013
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000009 r6:c11a5554 r5:00000178 r4:c1a078b4
dump_stack from __warn+0x88/0x160
__warn from warn_slowpath_fmt+0x150/0x1e0
r8:00000009 r7:00000178 r6:c1a078b4 r5:c1d05590 r4:c1c978a4
warn_slowpath_fmt from vlan_vid_del+0x14c/0x16c
r10:00000c01 r9:c46ec180 r8:c241f000 r7:c241f000 r6:00000c01 r5:0000a888
r4:c4298300
vlan_vid_del from dsa_tag_8021q_unregister+0x164/0x1ac
r9:c46ec180 r8:c241f000 r7:00000000 r6:c46ec180 r5:00000001 r4:c4298300
dsa_tag_8021q_unregister from felix_tag_8021q_teardown+0x94/0x140
r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c26db808 r5:c4284a30
r4:c4298300
felix_tag_8021q_teardown from felix_teardown+0x30/0xc0
r7:00000000 r6:c26db808 r5:c4298300 r4:c4298300
felix_teardown from dsa_tree_teardown_switches+0xa4/0xd4
r7:00000000 r6:c4284b48 r5:c4282600 r4:c4298300
dsa_tree_teardown_switches from dsa_unregister_switch+0xe4/0x18c
r7:c4255c54 r6:c1ec5414 r5:c4284b40 r4:c4298300
dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
ocelot_ext_remove from platform_remove+0x50/0x6c
r5:c424d800 r4:c4255c10
platform_remove from device_remove+0x50/0x74
r5:c424d800 r4:c4255c10
device_remove from device_release_driver_internal+0x190/0x204
r5:c424d800 r4:c4255c10
device_release_driver_internal from device_driver_detach+0x20/0x24
r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
device_driver_detach from unbind_store+0x64/0xa0
unbind_store from drv_attr_store+0x34/0x40
r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740
drv_attr_store from sysfs_kf_write+0x48/0x54
r5:c471a100 r4:c0a5299c
sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
r5:c471a100 r4:00000019
kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68
r4:c4714f00
vfs_write from ksys_write+0x70/0xf4
r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00
r4:c4714f02
ksys_write from sys_write+0x18/0x1c
r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019
sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0: 00000019 00579a60 00000001 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06
---[ end trace 0000000000000000 ]---
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000009 r6:00000000 r5:c18c0c5c r4:000d0013
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000009 r6:c1187600 r5:000005d2 r4:c1a06454
dump_stack from __warn+0x88/0x160
__warn from warn_slowpath_fmt+0xe4/0x1e0
r8:00000009 r7:000005d2 r6:c1a06454 r5:c1d05590 r4:c1c978a4
warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
r10:c2182b90 r9:c4284b68 r8:00000100 r7:c1a06454 r6:c4298300 r5:c4282800
r4:c4282600
dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
r9:c4284b68 r8:c424d844 r7:c4255c54 r6:c4284b70 r5:c4284b40 r4:c4298300
dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
ocelot_ext_remove from platform_remove+0x50/0x6c
r5:c424d800 r4:c4255c10
platform_remove from device_remove+0x50/0x74
r5:c424d800 r4:c4255c10
device_remove from device_release_driver_internal+0x190/0x204
r5:c424d800 r4:c4255c10
device_release_driver_internal from device_driver_detach+0x20/0x24
r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
device_driver_detach from unbind_store+0x64/0xa0
unbind_store from drv_attr_store+0x34/0x40
r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740
drv_attr_store from sysfs_kf_write+0x48/0x54
r5:c471a100 r4:c0a5299c
sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
r5:c471a100 r4:00000019
kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68
r4:c4714f00
vfs_write from ksys_write+0x70/0xf4
r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00
r4:c4714f02
ksys_write from sys_write+0x18/0x1c
r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019
sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0: 00000019 00579a60 00000001 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06
---[ end trace 0000000000000000 ]---
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8ff0>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000002 r6:00000080 r5:c18c0c5c r4:000d0093
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000002 r6:c42ad8c0 r5:c03b8ff0 r4:c42ad8c0
dump_stack from __schedule_bug+0x94/0xa4
__schedule_bug from __schedule+0x8fc/0xc48
r5:00000000 r4:df99a400
__schedule from schedule+0x60/0xf4
r10:5ac3c35a r9:befffd07 r8:fffffe30 r7:00000002 r6:c03002f4 r5:e0c55fb0
r4:c42ad8c0
schedule from do_work_pending+0x84/0x568
r5:e0c55fb0 r4:c42ad8c0
do_work_pending from slow_work_pending+0xc/0x20
Exception stack(0xe0c55fb0 to 0xe0c55ff8)
5fa0: 00000019 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06 40010030 00000001
r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000004 r6:b6f86550 r5:00579a60
r4:00000019
I repeat with your patch:
...
ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
cpsw-switch 4a100000.switch eth0: Link is Down
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc4-00008-gda8b39e58927-dirty #1326
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000009 r6:00000000 r5:c18c0c64 r4:00000013
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000009 r6:c1187600 r5:000005d2 r4:c1a06470
dump_stack from __warn+0x88/0x160
__warn from warn_slowpath_fmt+0xe4/0x1e0
r8:00000009 r7:000005d2 r6:c1a06470 r5:c1d05590 r4:c1c978a4
warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
r10:c4afc890 r9:c4284b68 r8:00000100 r7:c1a06470 r6:c4298300 r5:c4282800
r4:c4282600
dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
r9:c4284b68 r8:c424d844 r7:c4255c54 r6:c4284b70 r5:c4284b40 r4:c4298300
dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
ocelot_ext_remove from platform_remove+0x50/0x6c
r5:c424d800 r4:c4255c10
platform_remove from device_remove+0x50/0x74
r5:c424d800 r4:c4255c10
device_remove from device_release_driver_internal+0x190/0x204
r5:c424d800 r4:c4255c10
device_release_driver_internal from device_driver_detach+0x20/0x24
r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
device_driver_detach from unbind_store+0x64/0xa0
unbind_store from drv_attr_store+0x34/0x40
r7:e0c55f08 r6:c4afc880 r5:c471a600 r4:c0a53740
drv_attr_store from sysfs_kf_write+0x48/0x54
r5:c471a600 r4:c0a5299c
sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
r5:c471a600 r4:00000019
kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:005598a8 r5:e0c55f68
r4:c4a819c0
vfs_write from ksys_write+0x70/0xf4
r10:00000004 r9:c47d0000 r8:c03002f4 r7:00000000 r6:00000000 r5:c4a819c0
r4:c4a819c2
ksys_write from sys_write+0x18/0x1c
r7:00000004 r6:b6f1d550 r5:005598a8 r4:00000019
sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0: 00000019 005598a8 00000001 005598a8 00000019 00000000
5fc0: 00000019 005598a8 b6f1d550 00000004 00000019 00000001 00000000 be9f8a6c
5fe0: 00000004 be9f89c8 b6dc6767 b6d51e06
---[ end trace 0000000000000000 ]---
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8ff0>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927-dirty #1326
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000002 r6:00000080 r5:c18c0c64 r4:00000093
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000002 r6:c47d0000 r5:c03b8ff0 r4:c47d0000
dump_stack from __schedule_bug+0x94/0xa4
__schedule_bug from __schedule+0x8fc/0xc48
r5:00000000 r4:df99a400
__schedule from schedule+0x60/0xf4
r10:5ac3c35a r9:befffd07 r8:fffffe30 r7:00000002 r6:c03002f4 r5:e0c55fb0
r4:c47d0000
schedule from do_work_pending+0x84/0x568
r5:e0c55fb0 r4:c47d0000
do_work_pending from slow_work_pending+0xc/0x20
Exception stack(0xe0c55fb0 to 0xe0c55ff8)
5fa0: 00000019 005598a8 00000019 00000000
5fc0: 00000019 005598a8 b6f1d550 00000004 00000019 00000001 00000000 be9f8a6c
5fe0: 00000004 be9f89c8 b6dc6767 b6d51e06 40010030 00000001
r10:00000004 r9:c47d0000 r8:c03002f4 r7:00000004 r6:b6f1d550 r5:005598a8
So if your patch is confirming the existence and removal of the warnings:
WARNING: CPU: 0 PID: 157 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x1a8/0x1ac
WARNING: CPU: 0 PID: 157 at net/8021q/vlan_core.c:376 vlan_vid_del+0x14c/0x16c
... and you think my testing methods are valid, I can add Acked / Tested
if you like.
("scheduling while atomic" and "dsa_switch_release_ports" I see when
unbinding, regardless of ocelot or ocelot-8021q tagging protocols)
Hope this helps, and let me know if there's anything else you want me to
test.
Powered by blists - more mailing lists