lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2bf715fb-509b-4b00-a28d-1cc83c0bb588@amd.com>
Date: Wed, 4 Sep 2024 10:24:26 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, Gary Li <Gary.Li@....com>,
 Mario Limonciello <superm1@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
 Mathias Nyman <mathias.nyman@...el.com>,
 "open list : PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>,
 "open list : USB XHCI DRIVER" <linux-usb@...r.kernel.org>,
 Daniel Drake <drake@...lessos.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in
 pci_dev_wait()

On 9/4/2024 07:05, Mika Westerberg wrote:
> Hi,
> 
> On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
>> On 9/3/2024 13:25, Bjorn Helgaas wrote:
>>> On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
>>>> On 9/3/2024 12:11, Bjorn Helgaas wrote:
>>>> ...
>>>
>>>>>      8) The USB4 stack sees the device and assumes it is in D0, but it
>>>>>      seems to still be in D3cold.  What is this based on?  Is there a
>>>>>      config read that returns ~0 data when it shouldn't?
>>>>
>>>> Yes there is.  From earlier in the thread I have a [log] I shared.
>>>>
>>>> The message emitted is from ring_interrupt_active():
>>>>
>>>> "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
>>>
>>> Right, that's in the cover letter, but I can't tell from this what the
>>> ioread32(ring->nhi->iobase + reg) returned.  It looks like this is an
>>> MMIO read of BAR 0, not a config read.
>>>
>>
>> Yeah.  I suppose another way to approach this problem is to make something
>> else in the call chain poll PCI_PM_CTRL.
>>
>> Polling at the start of nhi_runtime_resume() should also work.  For the
>> "normal" scenario it would just be a single read to PCI_PM_CTRL.
>>
>> Mika, thoughts?

We did this experiment to throw code to poll PCI_PM_CTRL at the start of 
nhi_runtime_resume() but this also fails.  From that I would hypothesize 
the device transitioned to D0uninitialized sometime in the middle of 
pci_pm_runtime_resume() before the call to pm->runtime_resume(dev);

> 
> I'm starting to wonder if we are looking at the correct place ;-) This
> reminds me that our PCIe SV people recently reported a couple of Linux
> related issues which they recommended to fix, and these are on my list
> but I'll share them because maybe they are related?

Thanks for sharing those.  We had a try with them but sorry to say no 
improvements to the issue at hand.

> 
> First problem, and actually a PCI spec violation, is that Linux does not
> clear Bus Master, MMIO and IO space enables when it programs the device
> to D3 on runtime suspend path. It does so on system sleep path though.
> Something like below (untested) should do that:
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f412ef73a6e4..79a566376301 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
>   
>   	if (!pci_dev->state_saved) {
>   		pci_save_state(pci_dev);
> +		pci_pm_default_suspend(pci_dev);
>   		pci_finish_runtime_suspend(pci_dev);
>   	}
>   
> 
> The second thing is that Thunderbolt driver, for historical reasons,
> leaves the MSI enabled when entering D3. This too might be related. I
> think we can unconditionally disable it so below hack should do that
> (untested as well). I wonder if you could try if any of these or both
> can help here? Both of these issues can result unwanted events during D3
> entry as far as I understand.
> 
> diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
> index dc1f456736dc..73b815fbbceb 100644
> --- a/drivers/thunderbolt/ctl.c
> +++ b/drivers/thunderbolt/ctl.c
> @@ -659,12 +659,11 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int index, int timeout_msec,
>   	if (!ctl->frame_pool)
>   		goto err;
>   
> -	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
> +	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, 0);
>   	if (!ctl->tx)
>   		goto err;
>   
> -	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0, 0xffff,
> -				   0xffff, NULL, NULL);
> +	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, 0, 0, 0xffff, 0xffff, NULL, NULL);
>   	if (!ctl->rx)
>   		goto err;
>   


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ