lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab893fd1-6a90-4a39-963a-111dbdc9f720@amd.com>
Date: Wed, 28 Aug 2024 17:26:32 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, Ilpo Järvinen
 <ilpo.jarvinen@...ux.intel.com>, "Maciej W . Rozycki" <macro@...am.me.uk>,
 Mika Westerberg <mika.westerberg@...ux.intel.com>,
 Lukas Wunner <lukas@...ner.de>,
 "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
 Duc Dang <ducdang@...gle.com>, Alex Williamson <alex.williamson@...hat.com>,
 linux-kernel@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
 "Li, Gary" <Gary.Li@....com>
Subject: Re: [RFC PATCH 0/3] PCI: Use Configuration RRS to wait for device
 ready

On 8/28/2024 16:42, Bjorn Helgaas wrote:
> On Wed, Aug 28, 2024 at 04:24:01PM -0500, Mario Limonciello wrote:
>> On 8/27/2024 18:48, Bjorn Helgaas wrote:
>>> From: Bjorn Helgaas <bhelgaas@...gle.com>
>>>
>>> After a device reset, pci_dev_wait() waits for a device to become
>>> completely ready by polling the PCI_COMMAND register.  The spec envisions
>>> that software would instead poll for the device to stop responding to
>>> config reads with Completions with Request Retry Status (RRS).
>>>
>>> Polling PCI_COMMAND leads to hardware retries that are invisible to
>>> software and the backoff between software retries doesn't work correctly.
>>>
>>> Root Ports are not required to support the Configuration RRS Software
>>> Visibility feature that prevents hardware retries and makes the RRS
>>> Completions visible to software, so this series only uses it when available
>>> and falls back to PCI_COMMAND polling when it's not.
>>>
>>> This is completely untested and posted for comments.
>>>
>>> Bjorn Helgaas (3):
>>>     PCI: Wait for device readiness with Configuration RRS
>>>     PCI: aardvark: Correct Configuration RRS checking
>>>     PCI: Rename CRS Completion Status to RRS
>>>
>>>    drivers/bcma/driver_pci_host.c             | 10 ++--
>>>    drivers/pci/controller/dwc/pcie-tegra194.c | 18 +++---
>>>    drivers/pci/controller/pci-aardvark.c      | 64 +++++++++++-----------
>>>    drivers/pci/controller/pci-xgene.c         |  6 +-
>>>    drivers/pci/controller/pcie-iproc.c        | 18 +++---
>>>    drivers/pci/pci-bridge-emul.c              |  4 +-
>>>    drivers/pci/pci.c                          | 41 +++++++++-----
>>>    drivers/pci/pci.h                          | 11 +++-
>>>    drivers/pci/probe.c                        | 33 +++++------
>>>    include/linux/bcma/bcma_driver_pci.h       |  2 +-
>>>    include/linux/pci.h                        |  1 +
>>>    include/uapi/linux/pci_regs.h              |  6 +-
>>>    12 files changed, 117 insertions(+), 97 deletions(-)
>>
>> Although this looks like a useful series, I'm sorry to say but this doesn't
>> solve the issue that Gary and I raised.  We double checked today and found
>> that reading the vendor ID works just fine at this time.
> 
> Thanks for testing that.

Sure.

> 
>> I think that we're still better off polling PCI_PM_CTRL to "wait" for D0
>> after the state change from D3cold.
> 
> Is there some spec justification for polling PCI_PM_CTRL?  I'm dubious
> about doing that just because "it works" in this situation, unless we
> have some better understanding about *why* it works and whether all
> devices are supposed to work that way.
> 

I mentioned this a little bit in my patch 3/5 in my submission.  The 
issue isn't "normal" D3cold exit that is fully settled down.
That takes ~6ms from measurements.  The issue is how long it takes for 
D3cold *entry* followed by *exit*.

When this issue occurs it's tied with a tight loop of runtime PM entry 
and exit happening in that short window.
That's why it can be tripped by unplugging a dock, waiting until 
~approximately autosuspend delay and plugging it back in.  If you catch 
the right timing then the USB4 router is still on it's way down to D3cold.

In terms of a way to match this problem to the spec, the closest I could 
think is PCI-PM spec.

But I do see in the PCI-PM spec that the delay for D0->D3hot should be 
10ms. In the Linux kernel implementation __pci_set_power_state() when 
called with D3cold calls pci_set_low_power_state() which does wait 10ms 
followed by using the platform to remove power.

I can't find any timing requirements for D3hot->D3cold transition though.

I would hypothesize that if we injected a longer delay on the "other 
end" for the D3cold transition entry it would solve this issue as well 
though.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ