Message-ID: <20230804170958.nru6iafu5jrfxhqh@skbuf>
Date: Fri, 4 Aug 2023 20:09:58 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Colin Foster <colin.foster@...advantage.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Claudiu Manoil <claudiu.manoil@....com>,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
UNGLinuxDriver@...rochip.com
Subject: Re: [PATCH net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove

Hi Colin,

On Fri, Aug 04, 2023 at 09:03:33AM -0700, Colin Foster wrote:
> On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> I ran this unbind test (with just ocelot tagging) on my currently
> running system (6.5.1-rc1 + 8). This doesn't include your patch, but I
> suspect this is entirely different because I'm not using ocelot-8021q.
>
> # echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
> Modules linked in:
> CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324
> Hardware name: Generic AM33XX (Flattened Device Tree)
> Backtrace:
> __warn from warn_slowpath_fmt+0xe4/0x1e0
> warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
> dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
> dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
> ocelot_ext_remove from platform_remove+0x50/0x6c
> platform_remove from device_remove+0x50/0x74
> device_remove from device_release_driver_internal+0x190/0x204
> device_release_driver_internal from device_release_driver+0x20/0x24
> device_release_driver from bus_remove_device+0xd0/0xf4
> bus_remove_device from device_del+0x164/0x454
> device_del from platform_device_del.part.0+0x20/0x84
> platform_device_del.part.0 from platform_device_unregister+0x28/0x34
> platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
> mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
> device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
> devm_mfd_dev_release from release_nodes+0x78/0x104
> release_nodes from devres_release_all+0x90/0xe0
> devres_release_all from device_unbind_cleanup+0x1c/0x70
> device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
> device_release_driver_internal from device_driver_detach+0x20/0x24
> device_driver_detach from unbind_store+0x64/0xa0
> unbind_store from drv_attr_store+0x34/0x40
> drv_attr_store from sysfs_kf_write+0x48/0x54
> sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
> kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
> vfs_write from ksys_write+0x70/0xf4
> ksys_write from sys_write+0x18/0x1c
> sys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xe0c55fa8 to 0xe0c55ff0)
> 5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
> 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
> 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
> ---[ end trace 0000000000000000 ]---
> gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
> BUG: scheduling while atomic: bash/157/0x00000002
> Modules linked in:
> Preemption disabled at:
> [<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4
> CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc1-00008-ga5ed09af118a #1324
> Hardware name: Generic AM33XX (Flattened Device Tree)
> Backtrace:
> __schedule_bug from __schedule+0x8fc/0xc48
> __schedule from schedule+0x60/0xf4
> schedule from schedule_timeout+0xd8/0x190
> schedule_timeout from wait_for_completion+0xa0/0x124
> wait_for_completion from devtmpfs_submit_req+0x70/0x80
> devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4
> devtmpfs_delete_node from device_del+0x3b8/0x454
> device_del from cdev_device_del+0x24/0x54
> cdev_device_del from gpiolib_cdev_unregister+0x20/0x24
> gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130
> gpiochip_remove from devm_gpio_chip_release+0x18/0x1c
> devm_gpio_chip_release from devm_action_release+0x1c/0x20
> devm_action_release from release_nodes+0x78/0x104
> release_nodes from devres_release_all+0x90/0xe0
> devres_release_all from device_unbind_cleanup+0x1c/0x70
> device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
> device_release_driver_internal from device_release_driver+0x20/0x24
> device_release_driver from bus_remove_device+0xd0/0xf4
> bus_remove_device from device_del+0x164/0x454
> device_del from platform_device_del.part.0+0x20/0x84
> platform_device_del.part.0 from platform_device_unregister+0x28/0x34
> platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
> mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
> device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
> devm_mfd_dev_release from release_nodes+0x78/0x104
> release_nodes from devres_release_all+0x90/0xe0
> devres_release_all from device_unbind_cleanup+0x1c/0x70
> device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
> device_release_driver_internal from device_driver_detach+0x20/0x24
> device_driver_detach from unbind_store+0x64/0xa0
> unbind_store from drv_attr_store+0x34/0x40
> drv_attr_store from sysfs_kf_write+0x48/0x54
> sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
> kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
> vfs_write from ksys_write+0x70/0xf4
> ksys_write from sys_write+0x18/0x1c
> sys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xe0c55fa8 to 0xe0c55ff0)
> 5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
> 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
> 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
> cpsw-switch 4a100000.switch eth0: Link is Down
>
>
> It looks to me like I have some things to fix :)
>
>
> Is it worth me still trying to recreate / test? I haven't used
> ocelot-8021q really at all.

The WARN_ON() in dsa_switch_release_ports() is a different issue, and I tried to fix it here:
https://patchwork.kernel.org/project/netdevbpf/patch/20230411144955.1604591-1-vladimir.oltean@nxp.com/
but judging by the fact that that was in April and now we're in August,
obviously I didn't succeed.

What's worse is the other one, the "scheduling while atomic" bug in the
gpiochip removal path from pinctrl-ocelot.c. I'm not sure, at first glance,
what causes the calling context to be atomic. Presumably some kind of
spinlock is being held, and it should be tracked down.

Unfortunately I'm not very good at debugging the kernel the way it should
be done, so what I would advise you to do is to walk down the stack trace,
from device_del() or so, and sprinkle a few might_sleep() calls
until you figure out who's forcing atomic context and why.
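
One practical note on the might_sleep() approach: the annotation only
produces a report when the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP;
without it, might_sleep() reduces to a voluntary reschedule point. Below is
a minimal sketch of a check for that option. The helper name is made up for
illustration, and where a plain-text config can be found (a build tree's
.config, a decompressed /proc/config.gz, ...) depends on how the kernel was
built, so treat those locations as assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel config file enables
# CONFIG_DEBUG_ATOMIC_SLEEP, which might_sleep() needs in order to warn.
# $1 is the path to a plain-text kernel config (for example a .config).
has_atomic_sleep_debug() {
    grep -q '^CONFIG_DEBUG_ATOMIC_SLEEP=y$' "$1"
}
```

For example, "has_atomic_sleep_debug .config && echo enabled" run from the
build tree would tell you whether the added might_sleep() calls can fire.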

Otherwise, can't you just unbind the driver from the ethernet-switch
child of the SPI device, rather than the entire SPI device? That should
avoid the gpiochip/pinctrl bug. And the other one is ignorable for our
purposes here (that is, unless you want to take care of it,
of course).
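
The narrower unbind could look roughly like the sketch below. It relies only
on the generic sysfs layout, where each driver directory holds an "unbind"
file plus one symlink per bound device. The helper name is hypothetical, and
the driver directory to pass in is an assumption: it is probably something
under /sys/bus/platform/drivers/ (ocelot_ext_remove() is reached through
platform_remove() in the trace), but the exact driver and device names are
not given in this thread and should be looked up on the target.

```shell
#!/bin/sh
# Hypothetical helper: unbind every device currently bound to one driver.
# $1 is the driver's sysfs directory; bound devices appear in it as symlinks.
unbind_all() {
    drv_dir=$1
    for link in "$drv_dir"/*; do
        # Skip plain attribute files such as bind, unbind and uevent.
        [ -L "$link" ] || continue
        dev=$(basename "$link")
        # 'module' is a symlink to the owning module, not a device.
        [ "$dev" = module ] && continue
        echo "unbinding $dev"
        echo "$dev" > "$drv_dir/unbind"
    done
}
```

Called with the switch driver's directory, this leaves the SPI parent and
its other children (including the pinctrl/gpiochip one) bound. Note that it
unbinds all devices of that driver, which is fine when there is only one
switch.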