lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66019fa3-2f02-4b03-9eb7-7b0bed0fd044@amd.com>
Date: Thu, 12 Sep 2024 23:12:11 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, Gary Li <Gary.Li@....com>,
 Mario Limonciello <superm1@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
 Mathias Nyman <mathias.nyman@...el.com>,
 "open list : PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>,
 "open list : USB XHCI DRIVER" <linux-usb@...r.kernel.org>,
 Daniel Drake <drake@...lessos.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in
 pci_dev_wait()

On 9/10/2024 04:13, Mika Westerberg wrote:
> Hi,
> 
> On Mon, Sep 09, 2024 at 03:40:54PM -0500, Mario Limonciello wrote:
>>> Few additional side paths here, though. This is supposed to work so that
>>> once the host router sleep bit is set the driver is supposed to allow
>>> the domain to enter sleep (e.g it should not be waken up before it is
>>> fully transitioned). That's what we do:
>>>
>>> 1. All tunneled PCIe Root/Downstream ports are in D3.
>>> 2. All tunneled USB 3.x ports are in U3.
>>> 3. No DisplayPort is tunneled.
>>> 4. Thunderbolt driver enables wakes.
>>> 5. Thunderbolt driver writes sleep ready bit of the host router.
>>> 6. Thunderbolt driver runtime suspend is complete.
>>> 7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
>>> Event".
>>>
>>> If between 5 and 7 there is device connected, it should not "abort" the
>>> sequence. Unfortunately this is not explict in the USB4 spec but the
>>> connection manager guide has similar note. Even if the connect happens
>>> there the "Sleep Event" should happen but after that it can trigger
>>> normal wakeup which will then bring everything back.
>>>
>>> Would it be possible to enable tracing around these steps so that we
>>> could see if there is hotplug notification somewhere there that is not
>>> expected? Here are instructions how to get pretty accurate trace:
>>>
>>> https://github.com/intel/tbtools?tab=readme-ov-file#tracing
>>>
>>> Please also take full dmesg.
>>
>> Sure, here is the dmesg with tracing enabled:
>>
>> https://gist.github.com/superm1/5186e0023c8a5d2ecd75c50fd2168308
> 
> Thanks! I meant also enable control channel tracing as explained in the
> above linked page so we could maybe see the hotplug packet coming from
> the lane adapter too. Anyhow,
> 
> [  560.789681] thunderbolt 0-2: device disconnected
> [  560.789771] thunderbolt 0000:e5:00.5: bandwidth consumption changed, re-calculating estimated bandwidth
> [  560.789775] thunderbolt 0000:e5:00.5: bandwidth re-calculation done
> [  560.789778] thunderbolt 0000:e5:00.5: looking for DP IN <-> DP OUT pairs:
> [  560.789800] thunderbolt 0000:e5:00.5: 0:6: DP IN available
> [  560.789803] thunderbolt 0000:e5:00.5: 0:6: no suitable DP OUT adapter available, not tunneling
> [  560.790484] thunderbolt 0000:e5:00.5: 0:7: DP IN available
> [  560.790487] thunderbolt 0000:e5:00.5: 0:7: no suitable DP OUT adapter available, not tunneling
> ...
> [  578.677640] thunderbolt 0000:e5:00.5: nhi_runtime_suspend() - enter, pdev->current_state = 0
> [  578.677648] thunderbolt 0000:e5:00.5: 0: suspending switch
> [  578.677653] thunderbolt 0000:e5:00.5: 0: enabling wakeup: 0x3f
> [  578.678248] thunderbolt 0000:e5:00.5: stopping RX ring 0
> [  578.678256] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 3 (0x9 -> 0x1)
> [  578.678272] thunderbolt 0000:e5:00.5: stopping TX ring 0
> [  578.678277] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
> [  578.678287] thunderbolt 0000:e5:00.5: control channel stopped
> ...
> pci_pm_runtime_resume()
>    pci_pm_default_resume_early(pci_dev);
>      pci_pm_power_up_and_verify_state(pci_dev);
>        pci_power_up(pci_dev);
>          platform_pci_set_power_state(dev, PCI_D0);
> 	// here we should see acpi_pci_set_power_state() printing "xxx: power
> 	// state changed by ACPI to D0" but we don't.
> 	//
> 	// also pdev->current_state is set to PCI_D0
>        pci_update_current_state(pci_dev, PCI_D0);
>          // this one reads it back, it should be 0..
>    pm->runtime_resume(dev);
>      nhi_runtime_resume(dev)
> [  578.773503] thunderbolt 0000:e5:00.5: nhi_runtime_resume() - enter, pdev->current_state = 3
> 
> // .. but it is not. It seems to be powered off, D3cold.
> 
> [  578.773516] thunderbolt 0000:e5:00.5: control channel starting...
> [  578.773518] thunderbolt 0000:e5:00.5: starting TX ring 0
> [  578.773524] thunderbolt 0000:e5:00.5: enabling interrupt at register 0x38200 bit 0 (0xffffffff -> 0xffffffff)
> 
> // As here too.
> 
> It would be interesting to see what ACPI does here, I mean enabling
> dynamic debugging of acpi_device_set_power() and friends. 

Here is a log with dyndbg turned up.
https://gist.github.com/superm1/2f7ca6123431365d11e9bc3cc9329c14

You can see firmware node \_SB_.PCI0.BUSC.NHI0 is physical device 
0000:e5:00.5.

> One suspect is
> that when the runtime suspends happens, the power resource or _PS3()
> gets called to put the device into D3cold and it has some sort of
> Sleep() inside. This allows the OS to relese the CPU to do other things
> too according to ACPI spec (and Linux does this) so now when you plug in
> the device the GPE triggers wake and Linux starts processing that one.
> We see the above runtime resume but then the Sleep() in the _PS3()
> completes and cuts the power from the NHI while we are in the middle of
> resuming it already (or something along those lines).
> 
> We saw similar "context switch" happening when PCIe L2 failed, BIOS
> implemented the timeout using Sleep() that allowed Linux to process
> other things which resulted unexpected wake (not the same as here
> though).
> 
> So one thing to check too is how the ACPI methods get called and
> especially if they somehow end up messing with each other.

At least to me it looks pretty straightforward that each D0 sequence is 
going through and nothing looked out of order:

acpi_pci_set_power_state()
->acpi_device_set_power()
->->acpi_dev_pm_explicit_set()


There is a Sleep() call in M232() which is a SMI used in both _PS0 and 
_PS3, but as far as I can tell it's not actually called as the case 
always has zero as an argument.

Here's some snippets of ASL from the NHI0 device to see.
https://gist.github.com/superm1/32ef1f822a6220a41b19574024717f79

I feel your theory is right about it being an aborted D3cold request, 
but I don't believe it's from concurrency issue of _PS3.  I feel there 
isn't any guard rails either in Linux or the AML for ensuring that
the transition actually finished.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ