Message-ID: <ZM0hVTA7nHuRCSXa@euler>
Date: Fri, 4 Aug 2023 09:03:33 -0700
From: Colin Foster <colin.foster@...advantage.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Andrew Lunn <andrew@...n.ch>,
	Florian Fainelli <f.fainelli@...il.com>,
	Claudiu Manoil <claudiu.manoil@....com>,
	Alexandre Belloni <alexandre.belloni@...tlin.com>,
	UNGLinuxDriver@...rochip.com
Subject: Re: [PATCH net] net: dsa: ocelot: call dsa_tag_8021q_unregister()
 under rtnl_lock() on driver remove

Hi Vladimir,

On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> When the tagging protocol in current use is "ocelot-8021q" and we unbind
> the driver, we see this splat:
> 
> $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
> mscc_felix 0000:00:00.5 swp0: left promiscuous mode
> sja1105 spi2.0: Link is Down
> DSA: tree 1 torn down
> mscc_felix 0000:00:00.5 swp2: left promiscuous mode
> sja1105 spi2.2: Link is Down
> DSA: tree 3 torn down
> fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
> mscc_felix 0000:00:00.5: Link is Down
> ------------[ cut here ]------------
> RTNL: assertion failed at net/dsa/tag_8021q.c (409)
> WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
> Modules linked in:
> CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
> pc : dsa_tag_8021q_unregister+0x12c/0x1a0
> lr : dsa_tag_8021q_unregister+0x12c/0x1a0
> Call trace:
>  dsa_tag_8021q_unregister+0x12c/0x1a0
>  felix_tag_8021q_teardown+0x130/0x150
>  felix_teardown+0x3c/0xd8
>  dsa_tree_teardown_switches+0xbc/0xe0
>  dsa_unregister_switch+0x168/0x260
>  felix_pci_remove+0x30/0x60
>  pci_device_remove+0x4c/0x100
>  device_release_driver_internal+0x188/0x288
>  device_links_unbind_consumers+0xfc/0x138
>  device_release_driver_internal+0xe0/0x288
>  device_driver_detach+0x24/0x38
>  unbind_store+0xd8/0x108
>  drv_attr_store+0x30/0x50
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> RTNL: assertion failed at net/8021q/vlan_core.c (376)
> WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
> CPU: 1 PID: 329 Comm: bash Tainted: G        W          6.5.0-rc3+ #771
> pc : vlan_vid_del+0x1b8/0x1f0
> lr : vlan_vid_del+0x1b8/0x1f0
>  dsa_tag_8021q_unregister+0x8c/0x1a0
>  felix_tag_8021q_teardown+0x130/0x150
>  felix_teardown+0x3c/0xd8
>  dsa_tree_teardown_switches+0xbc/0xe0
>  dsa_unregister_switch+0x168/0x260
>  felix_pci_remove+0x30/0x60
>  pci_device_remove+0x4c/0x100
>  device_release_driver_internal+0x188/0x288
>  device_links_unbind_consumers+0xfc/0x138
>  device_release_driver_internal+0xe0/0x288
>  device_driver_detach+0x24/0x38
>  unbind_store+0xd8/0x108
>  drv_attr_store+0x30/0x50
> DSA: tree 0 torn down
> 
> This was not easy to spot, because "ocelot-8021q" is not the default
> tagging protocol, and thus not everyone who tests the unbinding path
> may have switched to it beforehand. The default
> felix_tag_npi_teardown() does not require rtnl_lock() to be held.

I ran this unbind test (with the default ocelot tagging only) on my
currently running system (6.5.0-rc1 + 8 commits). It doesn't include your
patch, but I suspect what I'm seeing is an entirely different problem,
because I'm not using ocelot-8021q.

# echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind
br0: port 1(swp1) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode
br0: port 1(swp1) entered disabled state
br0: port 2(swp2) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode
br0: port 2(swp2) entered disabled state
br0: port 3(swp3) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode
br0: port 3(swp3) entered disabled state
br0: port 4(swp4) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode
br0: port 4(swp4) entered disabled state
br0: port 5(swp5) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode
br0: port 5(swp5) entered disabled state
br0: port 6(swp6) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode
br0: port 6(swp6) entered disabled state
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode
cpsw-switch 4a100000.switch eth0: left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode
cpsw-switch 4a100000.switch eth0: left promiscuous mode
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000009 r6:00000000 r5:c18c0a8c r4:000e0113
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000009 r6:c1186e10 r5:000005d2 r4:c1a06270
 dump_stack from __warn+0x88/0x160
 __warn from warn_slowpath_fmt+0xe4/0x1e0
 r8:00000009 r7:000005d2 r6:c1a06270 r5:c1d05590 r4:c1c978a4
 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
 r10:c1ea8b7c r9:c4290da8 r8:00000100 r7:c1a06270 r6:c4288380 r5:c427f800
 r4:c427f600
 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
 r9:c4290da8 r8:00000044 r7:c4255c54 r6:c4290db0 r5:c4290d80 r4:c4288380
 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
 r9:c1f6ec1c r8:00000044 r7:c4255c54 r6:c1ec5454 r5:00000000 r4:c26db800
 ocelot_ext_remove from platform_remove+0x50/0x6c
 r5:00000000 r4:c4255c10
 platform_remove from device_remove+0x50/0x74
 r5:00000000 r4:c4255c10
 device_remove from device_release_driver_internal+0x190/0x204
 r5:00000000 r4:c4255c10
 device_release_driver_internal from device_release_driver+0x20/0x24
 r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c4255c10 r4:c2146930
 device_release_driver from bus_remove_device+0xd0/0xf4
 bus_remove_device from device_del+0x164/0x454
 r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c4255c10 r4:c4255c54
 device_del from platform_device_del.part.0+0x20/0x84
 r10:c1ea8b7c r9:c4292e80 r8:00000100 r7:00000122 r6:c4255c00 r5:c4255c00
 r4:c4255c00
 platform_device_del.part.0 from platform_device_unregister+0x28/0x34
 r5:c4255c10 r4:c4255c00
 platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
 r5:c4255c10 r4:c1ea8b7c
 mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
 r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78
 r4:00000000 r3:00000001
 device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
 r6:e0c55dd4 r5:c4270e00 r4:c4270f00
 devm_mfd_dev_release from release_nodes+0x78/0x104
 release_nodes from devres_release_all+0x90/0xe0
 r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039
 r4:c424d800
 devres_release_all from device_unbind_cleanup+0x1c/0x70
 r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800
 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
 r5:c424d400 r4:c424d800
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471d040 r4:c0a5266c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471d040 r4:00000007
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68
 r4:c4958cc0
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0
 r4:c4958cc0
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
---[ end trace 0000000000000000 ]---
gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc1-00008-ga5ed09af118a #1324
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c47b4700 r6:00000000 r5:c18c0a8c r4:000e0113
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c47b4700 r6:c47b4700 r5:c03b8f98 r4:c47b4700
 dump_stack from __schedule_bug+0x94/0xa4
 __schedule_bug from __schedule+0x8fc/0xc48
 r5:00000000 r4:df99a400
 __schedule from schedule+0x60/0xf4
 r10:e0c55ab4 r9:00000002 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac
 r4:c47b4700
 schedule from schedule_timeout+0xd8/0x190
 r5:e0c55aac r4:7fffffff
 schedule_timeout from wait_for_completion+0xa0/0x124
 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac r4:7fffffff
 wait_for_completion from devtmpfs_submit_req+0x70/0x80
 r10:c47b4700 r9:c1f6ec1c r8:c424e810 r7:00000000 r6:e0c55aac r5:e0c55aa8
 r4:c1f6ed78
 devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4
 r7:c47b4700 r6:c4250264 r5:c4250000 r4:00000000
 devtmpfs_delete_node from device_del+0x3b8/0x454
 r5:c4250000 r4:c4250044
 device_del from cdev_device_del+0x24/0x54
 r10:c47b4700 r9:c1d04d5c r8:00000040 r7:c4250234 r6:c4250264 r5:c42501e0
 r4:c4250000
 cdev_device_del from gpiolib_cdev_unregister+0x20/0x24
 r5:c4250000 r4:00000000
 gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130
 gpiochip_remove from devm_gpio_chip_release+0x18/0x1c
 r9:c1d04d5c r8:c1f099a8 r7:c424e810 r6:e0c55bf4 r5:c427e700 r4:c427ea80
 devm_gpio_chip_release from devm_action_release+0x1c/0x20
 devm_action_release from release_nodes+0x78/0x104
 release_nodes from devres_release_all+0x90/0xe0
 r10:c1ea8b7c r9:c1f6ec1c r8:00000044 r7:c424e9c0 r6:800e0113 r5:00000093
 r4:c424e810
 devres_release_all from device_unbind_cleanup+0x1c/0x70
 r7:c424e854 r6:c1dd9a80 r5:00000000 r4:c424e810
 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
 r5:00000000 r4:c424e810
 device_release_driver_internal from device_release_driver+0x20/0x24
 r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c424e810 r4:c2146930
 device_release_driver from bus_remove_device+0xd0/0xf4
 bus_remove_device from device_del+0x164/0x454
 r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c424e810 r4:c424e854
 device_del from platform_device_del.part.0+0x20/0x84
 r10:c1ea8b7c r9:c4274f00 r8:00000100 r7:00000122 r6:c424e800 r5:c424e800
 r4:c424e800
 platform_device_del.part.0 from platform_device_unregister+0x28/0x34
 r5:c424e810 r4:c424e800
 platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
 r5:c424e810 r4:c1ea8b7c
 mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
 r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78
 r4:00000000 r3:00000001
 device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
 r6:e0c55dd4 r5:c4270e00 r4:c4270f00
 devm_mfd_dev_release from release_nodes+0x78/0x104
 release_nodes from devres_release_all+0x90/0xe0
 r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039
 r4:c424d800
 devres_release_all from device_unbind_cleanup+0x1c/0x70
 r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800
 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
 r5:c424d400 r4:c424d800
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471d040 r4:c0a5266c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471d040 r4:00000007
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68
 r4:c4958cc0
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0
 r4:c4958cc0
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
cpsw-switch 4a100000.switch eth0: Link is Down
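
To pick the splat headlines out of a capture this long, a small filter
helps. This is only a minimal sketch, assuming the console output was saved
to a file. The function name splat_headlines is made up for illustration,
and the patterns cover just the three headline forms that appear in this
thread.

```shell
#!/bin/sh
# Print only the headline of each kernel splat found in a saved console log.
# $1 is the path to the log file. The patterns match the WARNING, BUG and
# RTNL assertion headlines; backtrace and register lines are skipped.
splat_headlines() {
	grep -E 'WARNING: CPU:|BUG: |RTNL: assertion failed' "$1"
}
```

For example, after "dmesg > unbind.log", running "splat_headlines unbind.log"
would list the WARNING and BUG lines without the backtraces.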


It looks to me like I have some things to fix :)


Is it still worth my trying to reproduce this and test your patch? I
haven't really used ocelot-8021q at all.
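
For reference, reproducing would first mean switching the tagging protocol
through the DSA master's sysfs attribute. The following is a rough sketch
under several assumptions: that eth0 is the DSA master here (the
cpsw-switch lines in the log suggest so), that this driver accepts
"ocelot-8021q" at runtime, and that the write is refused unless the master
and the user ports are down. The function name dsa_tagging and the
SYSFS_ROOT override are hypothetical, added only so the function can be
tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: show, and optionally change, the DSA tagging protocol of a master.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Usage: dsa_tagging <master-netdev> [new-protocol]
dsa_tagging() {
	master="$1"
	proto="${2:-}"
	attr="$SYSFS_ROOT/class/net/$master/dsa/tagging"

	# The attribute only exists on an interface that is a DSA master.
	if [ ! -e "$attr" ]; then
		echo "$master: no dsa/tagging attribute" >&2
		return 1
	fi

	# Assumption: the kernel rejects this write while interfaces are up.
	if [ -n "$proto" ]; then
		echo "$proto" > "$attr" || return 1
	fi

	# Print the protocol now in effect.
	cat "$attr"
}
```

With the interfaces taken down beforehand, "dsa_tagging eth0 ocelot-8021q"
followed by the unbind from the log above would be the test to run.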


Colin
