lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <276986c2-7dbe-33e5-3c11-ba8b2b2083a2@oss.qualcomm.com>
Date: Wed, 16 Apr 2025 21:15:19 +0530
From: Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>
To: Niklas Cassel <cassel@...nel.org>, Heiner Kallweit <hkallweit1@...il.com>
Cc: nic_swsd@...ltek.com, Andrew Lunn <andrew+netdev@...n.ch>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
        netdev@...r.kernel.org
Subject: Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown



On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> On Tue, Apr 15, 2025 at 08:48:53PM +0200, Heiner Kallweit wrote:
>> On 15.04.2025 11:53, Niklas Cassel wrote:
>>> For a PCI controller driver with a .shutdown() callback, we will see the
>>> following warning:
>>
>> I saw the related mail thread, IIRC it was about potential new code.
>> Is this right? Or do we talk about existing code? Then it would have
>> to be treated as fix.
> 
> The qcom PCIe controller driver wants to add a .shutdown callback:
> https://lore.kernel.org/linux-pci/tb6kkyfgpemzacfwbg45otgcrg5ltj323y6ufjy3bw3rgri7uf@vrmmtueyg7su/T/#t
> 
> In that shutdown callback, they want to call dw_pcie_host_deinit(),
> which will call pci_stop_root_bus().
> 
> Looking at other PCIe controller drivers:
> $ git grep -p pci_stop_root_bus
> 
> There seems to be a lot of PCIe controller drivers that call
> pci_stop_root_bus(), but they all do it from the .remove() callback,
> not from the .shutdown() callback.
> 
> Actually, I can't see any existing PCIe controller driver that calls
> pci_stop_root_bus() from the .shutdown() callback.
> 
Not all endpoint have .shutdown callback(), the endpoint drivers still
will try to do data transfers if we don't call pci_stop_stop_bus(), and
if there are ongoing transfers over link and if the controller driver
turns off the PCIe link without calling pci_stop_root_bus we might get
bus errors, NOC errors etc which can hang the system.

This should be handled gracefully in the rtl8169 driver.
> So perhaps we should hold off with this patch.
> 
I disagree on this, it might be causing issue with net driver, but we
might face some other issues as explained above if we don't call
pci_stop_root_bus().

- Krishna Chaitanya.
> Adding the qcom folks to CC.
> 
> 
> Kind regards,
> Niklas
> 
> 
>>
>> Existence of a shutdown callback itself is not the problem, the problem is
>> that all PCI bus devices are removed as part of shutdown handling for this
>> specific controller driver.
>>
>>> [   12.020111] called from state HALTED
>>> [   12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
>>>
>>> This is because rtl8169_down() (which calls phy_stop()) is called twice
>>> during shutdown.
>>>
>>> First time:
>>> [   23.827764] Call trace:
>>> [   23.827765]  show_stack+0x20/0x40 (C)
>>> [   23.827774]  dump_stack_lvl+0x60/0x80
>>> [   23.827778]  dump_stack+0x18/0x24
>>> [   23.827782]  rtl8169_down+0x30/0x2a0
>>> [   23.827788]  rtl_shutdown+0xb0/0xc0
>>> [   23.827792]  pci_device_shutdown+0x3c/0x88
>>> [   23.827797]  device_shutdown+0x150/0x278
>>> [   23.827802]  kernel_restart+0x4c/0xb8
>>>
>>> Second time:
>>> [   23.841468] Call trace:
>>> [   23.841470]  show_stack+0x20/0x40 (C)
>>> [   23.841478]  dump_stack_lvl+0x60/0x80
>>> [   23.841483]  dump_stack+0x18/0x24
>>> [   23.841486]  rtl8169_down+0x30/0x2a0
>>> [   23.841492]  rtl8169_close+0x64/0x100
>>> [   23.841496]  __dev_close_many+0xbc/0x1f0
>>> [   23.841502]  dev_close_many+0x94/0x160
>>> [   23.841505]  unregister_netdevice_many_notify+0x160/0x9d0
>>> [   23.841510]  unregister_netdevice_queue+0xf0/0x100
>>> [   23.841515]  unregister_netdev+0x2c/0x58
>>> [   23.841519]  rtl_remove_one+0xa0/0xe0
>>> [   23.841524]  pci_device_remove+0x4c/0xf8
>>> [   23.841528]  device_remove+0x54/0x90
>>> [   23.841534]  device_release_driver_internal+0x1d4/0x238
>>> [   23.841539]  device_release_driver+0x20/0x38
>>> [   23.841544]  pci_stop_bus_device+0x84/0xe0
>>> [   23.841548]  pci_stop_bus_device+0x40/0xe0
>>> [   23.841552]  pci_stop_root_bus+0x48/0x80
>>> [   23.841555]  dw_pcie_host_deinit+0x34/0xe0
>>> [   23.841559]  rockchip_pcie_shutdown+0x20/0x38
>>> [   23.841565]  platform_shutdown+0x2c/0x48
>>> [   23.841571]  device_shutdown+0x150/0x278
>>> [   23.841575]  kernel_restart+0x4c/0xb8
>>>
>>> Add a netif_device_present() guard around the rtl8169_down() call in
>>> rtl8169_close(), to avoid rtl8169_down() from being called twice.
>>>
>>> This matches how e.g. e1000e_close() has a netif_device_present() guard
>>> around the e1000e_down() call.
>>>
>> This approach has at least two issues:
>>
>> 1. Likely it breaks WoL, because now phy_detach() is called.
>> 2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
>>     for the remove callback. Now it's left in D0.
>>
>> I'll also spend a few thoughts on how to solve this best.
>>
>>> Signed-off-by: Niklas Cassel <cassel@...nel.org>
>>> ---
>>>   drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>> index 4eebd9cb40a3..0300a06ae260 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
>>>   	pm_runtime_get_sync(&pdev->dev);
>>>   
>>>   	netif_stop_queue(dev);
>>> -	rtl8169_down(tp);
>>> +	if (netif_device_present(tp->dev))
>>> +		rtl8169_down(tp);
>>>   	rtl8169_rx_clear(tp);
>>>   
>>>   	free_irq(tp->irq, tp);
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ