[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2bf715fb-509b-4b00-a28d-1cc83c0bb588@amd.com>
Date: Wed, 4 Sep 2024 10:24:26 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, Gary Li <Gary.Li@....com>,
Mario Limonciello <superm1@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Mathias Nyman <mathias.nyman@...el.com>,
"open list : PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"open list : USB XHCI DRIVER" <linux-usb@...r.kernel.org>,
Daniel Drake <drake@...lessos.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in
pci_dev_wait()
On 9/4/2024 07:05, Mika Westerberg wrote:
> Hi,
>
> On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
>> On 9/3/2024 13:25, Bjorn Helgaas wrote:
>>> On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
>>>> On 9/3/2024 12:11, Bjorn Helgaas wrote:
>>>> ...
>>>
>>>>> 8) The USB4 stack sees the device and assumes it is in D0, but it
>>>>> seems to still be in D3cold. What is this based on? Is there a
>>>>> config read that returns ~0 data when it shouldn't?
>>>>
>>>> Yes there is. From earlier in the thread I have a [log] I shared.
>>>>
>>>> The message emitted is from ring_interrupt_active():
>>>>
>>>> "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
>>>
>>> Right, that's in the cover letter, but I can't tell from this what the
>>> ioread32(ring->nhi->iobase + reg) returned. It looks like this is an
>>> MMIO read of BAR 0, not a config read.
>>>
>>
>> Yeah. I suppose another way to approach this problem is to make something
>> else in the call chain poll PCI_PM_CTRL.
>>
>> Polling at the start of nhi_runtime_resume() should also work. For the
>> "normal" scenario it would just be a single read to PCI_PM_CTRL.
>>
>> Mika, thoughts?
We did this experiment to throw code to poll PCI_PM_CTRL at the start of
nhi_runtime_resume() but this also fails. From that I would hypothesize
the device transitioned to D0uninitialized sometime in the middle of
pci_pm_runtime_resume() before the call to pm->runtime_resume(dev);
>
> I'm starting to wonder if we are looking at the correct place ;-) This
> reminds me that our PCIe SV people recently reported a couple of Linux
> related issues which they recommended to fix, and these are on my list
> but I'll share them because maybe they are related?
Thanks for sharing those. We had a try with them but sorry to say no
improvements to the issue at hand.
>
> First problem, and actually a PCI spec violation, is that Linux does not
> clear Bus Master, MMIO and IO space enables when it programs the device
> to D3 on runtime suspend path. It does so on system sleep path though.
> Something like below (untested) should do that:
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f412ef73a6e4..79a566376301 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
>
> if (!pci_dev->state_saved) {
> pci_save_state(pci_dev);
> + pci_pm_default_suspend(pci_dev);
> pci_finish_runtime_suspend(pci_dev);
> }
>
>
> The second thing is that Thunderbolt driver, for historical reasons,
> leaves the MSI enabled when entering D3. This too might be related. I
> think we can unconditionally disable it so below hack should do that
> (untested as well). I wonder if you could try if any of these or both
> can help here? Both of these issues can result unwanted events during D3
> entry as far as I understand.
>
> diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
> index dc1f456736dc..73b815fbbceb 100644
> --- a/drivers/thunderbolt/ctl.c
> +++ b/drivers/thunderbolt/ctl.c
> @@ -659,12 +659,11 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int index, int timeout_msec,
> if (!ctl->frame_pool)
> goto err;
>
> - ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
> + ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, 0);
> if (!ctl->tx)
> goto err;
>
> - ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0, 0xffff,
> - 0xffff, NULL, NULL);
> + ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, 0, 0, 0xffff, 0xffff, NULL, NULL);
> if (!ctl->rx)
> goto err;
>
Powered by blists - more mailing lists