[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4f8d3018-d7e5-47e5-b99d-550f8a4011ee@gmail.com>
Date: Tue, 15 Apr 2025 20:48:53 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Niklas Cassel <cassel@...nel.org>, nic_swsd@...ltek.com,
Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
netdev@...r.kernel.org
Subject: Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
On 15.04.2025 11:53, Niklas Cassel wrote:
> For a PCI controller driver with a .shutdown() callback, we will see the
> following warning:
I saw the related mail thread, IIRC it was about potential new code.
Is this right? Or do we talk about existing code? Then it would have
to be treated as fix.
Existence of a shutdown callback itself is not the problem, the problem is
that all PCI bus devices are removed as part of shutdown handling for this
specific controller driver.
> [ 12.020111] called from state HALTED
> [ 12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
>
> This is because rtl8169_down() (which calls phy_stop()) is called twice
> during shutdown.
>
> First time:
> [ 23.827764] Call trace:
> [ 23.827765] show_stack+0x20/0x40 (C)
> [ 23.827774] dump_stack_lvl+0x60/0x80
> [ 23.827778] dump_stack+0x18/0x24
> [ 23.827782] rtl8169_down+0x30/0x2a0
> [ 23.827788] rtl_shutdown+0xb0/0xc0
> [ 23.827792] pci_device_shutdown+0x3c/0x88
> [ 23.827797] device_shutdown+0x150/0x278
> [ 23.827802] kernel_restart+0x4c/0xb8
>
> Second time:
> [ 23.841468] Call trace:
> [ 23.841470] show_stack+0x20/0x40 (C)
> [ 23.841478] dump_stack_lvl+0x60/0x80
> [ 23.841483] dump_stack+0x18/0x24
> [ 23.841486] rtl8169_down+0x30/0x2a0
> [ 23.841492] rtl8169_close+0x64/0x100
> [ 23.841496] __dev_close_many+0xbc/0x1f0
> [ 23.841502] dev_close_many+0x94/0x160
> [ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0
> [ 23.841510] unregister_netdevice_queue+0xf0/0x100
> [ 23.841515] unregister_netdev+0x2c/0x58
> [ 23.841519] rtl_remove_one+0xa0/0xe0
> [ 23.841524] pci_device_remove+0x4c/0xf8
> [ 23.841528] device_remove+0x54/0x90
> [ 23.841534] device_release_driver_internal+0x1d4/0x238
> [ 23.841539] device_release_driver+0x20/0x38
> [ 23.841544] pci_stop_bus_device+0x84/0xe0
> [ 23.841548] pci_stop_bus_device+0x40/0xe0
> [ 23.841552] pci_stop_root_bus+0x48/0x80
> [ 23.841555] dw_pcie_host_deinit+0x34/0xe0
> [ 23.841559] rockchip_pcie_shutdown+0x20/0x38
> [ 23.841565] platform_shutdown+0x2c/0x48
> [ 23.841571] device_shutdown+0x150/0x278
> [ 23.841575] kernel_restart+0x4c/0xb8
>
> Add a netif_device_present() guard around the rtl8169_down() call in
> rtl8169_close(), to avoid rtl8169_down() from being called twice.
>
> This matches how e.g. e1000e_close() has a netif_device_present() guard
> around the e1000e_down() call.
>
This approach has at least two issues:
1. Likely it breaks WoL, because now phy_detach() is called.
2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
for the remove callback. Now it's left in D0.
I'll also spend a few thoughts on how to solve this best.
> Signed-off-by: Niklas Cassel <cassel@...nel.org>
> ---
> drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 4eebd9cb40a3..0300a06ae260 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
> pm_runtime_get_sync(&pdev->dev);
>
> netif_stop_queue(dev);
> - rtl8169_down(tp);
> + if (netif_device_present(tp->dev))
> + rtl8169_down(tp);
> rtl8169_rx_clear(tp);
>
> free_irq(tp->irq, tp);
Powered by blists - more mailing lists