Message-ID: <ZM0hVTA7nHuRCSXa@euler>
Date: Fri, 4 Aug 2023 09:03:33 -0700
From: Colin Foster <colin.foster@...advantage.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Claudiu Manoil <claudiu.manoil@....com>,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
UNGLinuxDriver@...rochip.com
Subject: Re: [PATCH net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove
Hi Vladimir,
On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> When the tagging protocol in current use is "ocelot-8021q" and we unbind
> the driver, we see this splat:
>
> $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
> mscc_felix 0000:00:00.5 swp0: left promiscuous mode
> sja1105 spi2.0: Link is Down
> DSA: tree 1 torn down
> mscc_felix 0000:00:00.5 swp2: left promiscuous mode
> sja1105 spi2.2: Link is Down
> DSA: tree 3 torn down
> fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
> mscc_felix 0000:00:00.5: Link is Down
> ------------[ cut here ]------------
> RTNL: assertion failed at net/dsa/tag_8021q.c (409)
> WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
> Modules linked in:
> CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
> pc : dsa_tag_8021q_unregister+0x12c/0x1a0
> lr : dsa_tag_8021q_unregister+0x12c/0x1a0
> Call trace:
> dsa_tag_8021q_unregister+0x12c/0x1a0
> felix_tag_8021q_teardown+0x130/0x150
> felix_teardown+0x3c/0xd8
> dsa_tree_teardown_switches+0xbc/0xe0
> dsa_unregister_switch+0x168/0x260
> felix_pci_remove+0x30/0x60
> pci_device_remove+0x4c/0x100
> device_release_driver_internal+0x188/0x288
> device_links_unbind_consumers+0xfc/0x138
> device_release_driver_internal+0xe0/0x288
> device_driver_detach+0x24/0x38
> unbind_store+0xd8/0x108
> drv_attr_store+0x30/0x50
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> RTNL: assertion failed at net/8021q/vlan_core.c (376)
> WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
> CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771
> pc : vlan_vid_del+0x1b8/0x1f0
> lr : vlan_vid_del+0x1b8/0x1f0
> dsa_tag_8021q_unregister+0x8c/0x1a0
> felix_tag_8021q_teardown+0x130/0x150
> felix_teardown+0x3c/0xd8
> dsa_tree_teardown_switches+0xbc/0xe0
> dsa_unregister_switch+0x168/0x260
> felix_pci_remove+0x30/0x60
> pci_device_remove+0x4c/0x100
> device_release_driver_internal+0x188/0x288
> device_links_unbind_consumers+0xfc/0x138
> device_release_driver_internal+0xe0/0x288
> device_driver_detach+0x24/0x38
> unbind_store+0xd8/0x108
> drv_attr_store+0x30/0x50
> DSA: tree 0 torn down
>
> This was somewhat not so easy to spot, because "ocelot-8021q" is not the
> default tagging protocol, and thus, not everyone who tests the unbinding
> path may have switched to it beforehand. The default
> felix_tag_npi_teardown() does not require rtnl_lock() to be held.
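To make the locking requirement described above concrete, here is a minimal userspace sketch in Python. It is not kernel code and not the patch itself: `ToyRtnl`, `assert_rtnl()`, `toy_8021q_teardown()`, `toy_npi_teardown()` and `remove_driver()` are hypothetical names made up for illustration. The toy lock tracks its owning thread, which is a simplification, and the sketch assumes nothing about where the real fix takes rtnl_lock().

```python
import threading


class ToyRtnl:
    """Userspace stand-in for the RTNL mutex that remembers its owner thread."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owner = None

    def __enter__(self):
        self._mutex.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._owner = None
        self._mutex.release()
        return False  # never swallow exceptions

    def held_by_caller(self):
        return self._owner == threading.get_ident()


rtnl = ToyRtnl()


def assert_rtnl(where, log):
    # Modeled on the splat above: record a warning and carry on, do not abort.
    if not rtnl.held_by_caller():
        log.append("RTNL: assertion failed at " + where)


def toy_8021q_teardown(log):
    # Stands in for a teardown path that expects the caller to hold the lock.
    assert_rtnl("toy_8021q_teardown", log)


def toy_npi_teardown(log):
    # Stands in for the default path, which performs no such check.
    pass


TEARDOWN = {"ocelot": toy_npi_teardown, "ocelot-8021q": toy_8021q_teardown}


def remove_driver(protocol, hold_rtnl):
    """Run the teardown for `protocol` and return any warnings it produced."""
    log = []
    if hold_rtnl:
        with rtnl:
            TEARDOWN[protocol](log)
    else:
        TEARDOWN[protocol](log)
    return log


if __name__ == "__main__":
    for proto in ("ocelot", "ocelot-8021q"):
        for hold in (False, True):
            print(proto, "hold_rtnl =", hold, "->", remove_driver(proto, hold))
```

In this model only the "ocelot-8021q" teardown run without the lock produces a warning, which mirrors why the problem stays hidden when unbinding is tested with the default tagging protocol.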
I ran this unbind test (with just ocelot tagging) on my currently
running system (6.5.0-rc1 + 8 commits). This doesn't include your patch,
but I suspect what I'm seeing is an entirely different issue, because I'm
not using ocelot-8021q.
# echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind
br0: port 1(swp1) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode
br0: port 1(swp1) entered disabled state
br0: port 2(swp2) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode
br0: port 2(swp2) entered disabled state
br0: port 3(swp3) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode
br0: port 3(swp3) entered disabled state
br0: port 4(swp4) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode
br0: port 4(swp4) entered disabled state
br0: port 5(swp5) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode
br0: port 5(swp5) entered disabled state
br0: port 6(swp6) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode
br0: port 6(swp6) entered disabled state
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode
cpsw-switch 4a100000.switch eth0: left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode
cpsw-switch 4a100000.switch eth0: left promiscuous mode
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:00000009 r6:00000000 r5:c18c0a8c r4:000e0113
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:00000009 r6:c1186e10 r5:000005d2 r4:c1a06270
dump_stack from __warn+0x88/0x160
__warn from warn_slowpath_fmt+0xe4/0x1e0
r8:00000009 r7:000005d2 r6:c1a06270 r5:c1d05590 r4:c1c978a4
warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
r10:c1ea8b7c r9:c4290da8 r8:00000100 r7:c1a06270 r6:c4288380 r5:c427f800
r4:c427f600
dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
r9:c4290da8 r8:00000044 r7:c4255c54 r6:c4290db0 r5:c4290d80 r4:c4288380
dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
r9:c1f6ec1c r8:00000044 r7:c4255c54 r6:c1ec5454 r5:00000000 r4:c26db800
ocelot_ext_remove from platform_remove+0x50/0x6c
r5:00000000 r4:c4255c10
platform_remove from device_remove+0x50/0x74
r5:00000000 r4:c4255c10
device_remove from device_release_driver_internal+0x190/0x204
r5:00000000 r4:c4255c10
device_release_driver_internal from device_release_driver+0x20/0x24
r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c4255c10 r4:c2146930
device_release_driver from bus_remove_device+0xd0/0xf4
bus_remove_device from device_del+0x164/0x454
r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c4255c10 r4:c4255c54
device_del from platform_device_del.part.0+0x20/0x84
r10:c1ea8b7c r9:c4292e80 r8:00000100 r7:00000122 r6:c4255c00 r5:c4255c00
r4:c4255c00
platform_device_del.part.0 from platform_device_unregister+0x28/0x34
r5:c4255c10 r4:c4255c00
platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
r5:c4255c10 r4:c1ea8b7c
mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78
r4:00000000 r3:00000001
device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
r6:e0c55dd4 r5:c4270e00 r4:c4270f00
devm_mfd_dev_release from release_nodes+0x78/0x104
release_nodes from devres_release_all+0x90/0xe0
r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039
r4:c424d800
devres_release_all from device_unbind_cleanup+0x1c/0x70
r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800
device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
r5:c424d400 r4:c424d800
device_release_driver_internal from device_driver_detach+0x20/0x24
r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108
device_driver_detach from unbind_store+0x64/0xa0
unbind_store from drv_attr_store+0x34/0x40
r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410
drv_attr_store from sysfs_kf_write+0x48/0x54
r5:c471d040 r4:c0a5266c
sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
r5:c471d040 r4:00000007
kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68
r4:c4958cc0
vfs_write from ksys_write+0x70/0xf4
r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0
r4:c4958cc0
ksys_write from sys_write+0x18/0x1c
r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007
sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
---[ end trace 0000000000000000 ]---
gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc1-00008-ga5ed09af118a #1324
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:c47b4700 r6:00000000 r5:c18c0a8c r4:000e0113
show_stack from dump_stack_lvl+0x60/0x78
dump_stack_lvl from dump_stack+0x18/0x1c
r7:c47b4700 r6:c47b4700 r5:c03b8f98 r4:c47b4700
dump_stack from __schedule_bug+0x94/0xa4
__schedule_bug from __schedule+0x8fc/0xc48
r5:00000000 r4:df99a400
__schedule from schedule+0x60/0xf4
r10:e0c55ab4 r9:00000002 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac
r4:c47b4700
schedule from schedule_timeout+0xd8/0x190
r5:e0c55aac r4:7fffffff
schedule_timeout from wait_for_completion+0xa0/0x124
r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac r4:7fffffff
wait_for_completion from devtmpfs_submit_req+0x70/0x80
r10:c47b4700 r9:c1f6ec1c r8:c424e810 r7:00000000 r6:e0c55aac r5:e0c55aa8
r4:c1f6ed78
devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4
r7:c47b4700 r6:c4250264 r5:c4250000 r4:00000000
devtmpfs_delete_node from device_del+0x3b8/0x454
r5:c4250000 r4:c4250044
device_del from cdev_device_del+0x24/0x54
r10:c47b4700 r9:c1d04d5c r8:00000040 r7:c4250234 r6:c4250264 r5:c42501e0
r4:c4250000
cdev_device_del from gpiolib_cdev_unregister+0x20/0x24
r5:c4250000 r4:00000000
gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130
gpiochip_remove from devm_gpio_chip_release+0x18/0x1c
r9:c1d04d5c r8:c1f099a8 r7:c424e810 r6:e0c55bf4 r5:c427e700 r4:c427ea80
devm_gpio_chip_release from devm_action_release+0x1c/0x20
devm_action_release from release_nodes+0x78/0x104
release_nodes from devres_release_all+0x90/0xe0
r10:c1ea8b7c r9:c1f6ec1c r8:00000044 r7:c424e9c0 r6:800e0113 r5:00000093
r4:c424e810
devres_release_all from device_unbind_cleanup+0x1c/0x70
r7:c424e854 r6:c1dd9a80 r5:00000000 r4:c424e810
device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
r5:00000000 r4:c424e810
device_release_driver_internal from device_release_driver+0x20/0x24
r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c424e810 r4:c2146930
device_release_driver from bus_remove_device+0xd0/0xf4
bus_remove_device from device_del+0x164/0x454
r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c424e810 r4:c424e854
device_del from platform_device_del.part.0+0x20/0x84
r10:c1ea8b7c r9:c4274f00 r8:00000100 r7:00000122 r6:c424e800 r5:c424e800
r4:c424e800
platform_device_del.part.0 from platform_device_unregister+0x28/0x34
r5:c424e810 r4:c424e800
platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
r5:c424e810 r4:c1ea8b7c
mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78
r4:00000000 r3:00000001
device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
r6:e0c55dd4 r5:c4270e00 r4:c4270f00
devm_mfd_dev_release from release_nodes+0x78/0x104
release_nodes from devres_release_all+0x90/0xe0
r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039
r4:c424d800
devres_release_all from device_unbind_cleanup+0x1c/0x70
r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800
device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
r5:c424d400 r4:c424d800
device_release_driver_internal from device_driver_detach+0x20/0x24
r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108
device_driver_detach from unbind_store+0x64/0xa0
unbind_store from drv_attr_store+0x34/0x40
r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410
drv_attr_store from sysfs_kf_write+0x48/0x54
r5:c471d040 r4:c0a5266c
sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
r5:c471d040 r4:00000007
kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68
r4:c4958cc0
vfs_write from ksys_write+0x70/0xf4
r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0
r4:c4958cc0
ksys_write from sys_write+0x18/0x1c
r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007
sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
cpsw-switch 4a100000.switch eth0: Link is Down
It looks to me like I have some things to fix :)
Is it still worth my trying to reproduce and test this? I haven't really
used ocelot-8021q at all.
Colin