linux-kernel - Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev


Message-ID: <bdc99602-7bb2-4026-8122-e92f894aca09@amd.com>
Date: Tue, 3 Sep 2024 12:31:00 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Bjorn Helgaas <helgaas@...nel.org>, Gary Li <Gary.Li@....com>
Cc: Mario Limonciello <superm1@...nel.org>,
 Bjorn Helgaas <bhelgaas@...gle.com>, Mathias Nyman
 <mathias.nyman@...el.com>, Mika Westerberg
 <mika.westerberg@...ux.intel.com>,
 "open list : PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>,
 "open list : USB XHCI DRIVER" <linux-usb@...r.kernel.org>,
 Daniel Drake <drake@...lessos.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in
 pci_dev_wait()

On 9/3/2024 12:11, Bjorn Helgaas wrote:

<snip>

>>
>> I guess, reading between the lines, you have an assumption that you
>> can't read the vendor ID from D3; which doesn't appear to be the
>> case from our testing.
> 
> A Vendor ID read of a device in D3hot should definitely work.
> Obviously if the device were in D3cold, we'd get no response at all,
> so the requester should log a UR error and fabricate ~0 data.
> 
> But if the device starts out in D3cold and we power it up, it should
> not go through D3hot.  The only legal transition from D3cold is to
> D0uninitialized (PCIe r6.0, sec 5.8).

Right.  The issue is that the device hadn't finished entering D3 by the
time we attempted to go to D0.  So all this extra time is basically
waiting for the D0->D3 transition to finish, followed by D3->D0uninitialized.

The best description I could offer is to call it an "aborted" D3.

> 
> OK, so with [1] and patch 3/5:
> 
>    1) Initially the device is in D0
> 
>    2) We put it in D3cold (using some ACPI method) because the
>    autosuspend delay expired (?)
> 
>    3) Plugging in the dock wakes up the device, so we power up the
>    device (via pci_power_up(), which again uses some ACPI method), and
>    it should transition to D0uninitialized
> 
>    4) With patch 3/5, pci_power_up() calls pci_dev_wait() because
>    dev->current_state == PCI_D3cold
> 
>    5) I *assume* RRS SV is enabled (lspci -vv of Root Port would
>    confirm this; maybe we should add a pci_dbg message about which
>    register we're polling).  If so, patch [1] means we should poll
>    Vendor ID until successful completion.
> 

Yup.
                 RootCap: CRSVisible+
                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
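
For reference, a minimal userspace sketch of what polling Vendor ID
under RRS SV amounts to.  This is not the kernel's pci_dev_wait();
struct fake_dev, fake_read_id() and wait_ready() are hypothetical names,
the device ID in the fake read is made up, and treating an all-ones read
as "not ready" is an assumption.  The 0x0001 Vendor ID is the value the
spec reserves for the root complex to return on an RRS completion when
software visibility is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor ID fabricated for an RRS completion when RRS SV is enabled. */
#define RRS_VENDOR_ID 0x0001

/* Hypothetical stand-in for a device that becomes ready after some time. */
struct fake_dev {
	int elapsed_ms;   /* simulated time since power-up */
	int ready_at_ms;  /* simulated time at which config reads succeed */
};

/* True if the low 16 bits (Vendor ID) carry the RRS marker. */
static bool id_is_rrs(uint32_t id)
{
	return (id & 0xffff) == RRS_VENDOR_ID;
}

/* Simulated read of the dword at config offset 0 (Vendor ID + Device ID). */
static uint32_t fake_read_id(const struct fake_dev *dev)
{
	if (dev->elapsed_ms < dev->ready_at_ms)
		return 0xffff0001;  /* RRS: upper bits all ones, Vendor ID 0x0001 */
	return 0x12341022;          /* made-up device ID, AMD vendor ID */
}

/*
 * Poll with a doubling delay until the read is a real completion.
 * Returns the simulated milliseconds waited, or -1 on timeout.
 */
static int wait_ready(struct fake_dev *dev, int timeout_ms)
{
	int delay = 1;
	int waited = 0;

	for (;;) {
		uint32_t id = fake_read_id(dev);

		/* Assumption: all-ones (no response) also counts as not ready. */
		if (id != 0xffffffff && !id_is_rrs(id))
			return waited;

		if (waited >= timeout_ms)
			return -1;

		dev->elapsed_ms += delay;  /* stands in for msleep(delay) */
		waited += delay;
		delay *= 2;
	}
}
```

With the doubling delay the reads land at 0, 1, 3, 7, ... ms, so a device
that becomes ready at 5ms is only seen as ready at 7ms.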

>    6) pci_dbg log should confirm the device is ready with a "ready %dms
>    after D3cold->D0" message, which would mean we got a successful
>    completion when reading Vendor ID
> 
>    7) For debugging purposes, it would be interesting to read and log
>    the PCI_PM_CTRL value here.  Per sec 2.3.1, the device is not
>    allowed to return RRS at this point since we already got a
>    successful completion.
> 

OK, let me get a debug log with [1] and 3/5 on 6.11-rc6, plus a message
added to print this value, and share it back.
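
As a rough illustration of how such a logged PCI_PM_CTRL value could be
interpreted, here is a small userspace sketch.  decode_pmcsr() and the
enum are hypothetical names, not kernel API; the mask mirrors the
PowerState field in bits 1:0 of the register.  Note that an all-ones
read has to be checked first, since its low two bits would otherwise
decode as D3hot.

```c
#include <stdint.h>

/* PowerState field of the PM Control/Status register, bits 1:0. */
#define PM_CTRL_STATE_MASK 0x0003

/* Hypothetical decode result; values 0..3 match the PowerState encoding. */
enum pm_state {
	PM_NO_RESPONSE = -1,  /* read returned all ones */
	PM_D0 = 0,
	PM_D1 = 1,
	PM_D2 = 2,
	PM_D3HOT = 3,
};

/* Decode a raw 16-bit PCI_PM_CTRL read into a power state. */
static enum pm_state decode_pmcsr(uint16_t pmcsr)
{
	/* All ones means the read got no real data back (fabricated ~0). */
	if (pmcsr == 0xffff)
		return PM_NO_RESPONSE;

	return (enum pm_state)(pmcsr & PM_CTRL_STATE_MASK);
}
```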

>    8) The USB4 stack sees the device and assumes it is in D0, but it
>    seems to still be in D3cold.  What is this based on?  Is there a
>    config read that returns ~0 data when it shouldn't?
> 

Yes, there is.  I shared a [log] earlier in the thread.

The message emitted is from ring_interrupt_active():

"thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"

[log] https://gist.github.com/superm1/cb407766ab15f42f12a6fe9d1196f6fc