Message-ID: <CAD8Lp46_8hVqs2psK4FLRR8EGFssbkWyrQ5OEst0-59OuOpwwQ@mail.gmail.com>
Date: Mon, 19 Feb 2024 10:52:28 +0100
From: Daniel Drake <drake@...lessos.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, bhelgaas@...gle.com, 
	david.e.box@...ux.intel.com, mario.limonciello@....com, rafael@...nel.org, 
	lenb@...nel.org, linux-acpi@...r.kernel.org, linux@...lessos.org
Subject: Re: [PATCH v2 1/2] PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge

On Thu, Feb 8, 2024 at 10:52 AM Daniel Drake <drake@...lessos.org> wrote:
> Just realised my main workstation (Dell XPS) has the same chipset.
>
> The Dell ACPI table has the exact same suspect-buggy function, which
> the affected Asus system calls from PEG0.PXP._OFF:
>
>         Method (DL23, 0, Serialized)
>         {
>             L23E = One
>             Sleep (0x10)
>             Local0 = Zero
>             While (L23E)
>             {
>                 If ((Local0 > 0x04))
>                 {
>                     Break
>                 }
>
>                 Sleep (0x10)
>                 Local0++
>             }
>
>             SCB0 = One
>         }
>
> (the "L23E = One" line is the one that writes a value to config offset
> 0xe2, if you comment out this line then everything works)
>
> However, on the Dell XPS system, nothing calls DL23() i.e. it is dead code.
>
> Comparing side by side:
> Asus root port (PC00.PEG0) has the PXP power resource which gets
> powered down during D3cold transition as it becomes unused. Dell root
> port has no power resources (no _PR0).
> Asus NVM device sitting under that root port (PC00.PEG0.PEGP) has
> no-op _PS3 method, but Dell does not have _PS3. This means that Dell
> doesn't attempt D3cold on NVMe nor the parent root port during suspend
> (both go to D3hot only).

Recap: I am comparing the Asus device (NVMe + parent bridge go into
D3cold in suspend, and cannot wake up) with the Dell device that has the
same chipset (NVMe device + parent bridge go into D3hot).

These suspend power states were confirmed by:
    echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control

In asking "why does the Dell device not go into D3cold" I got some
details mixed up above. I have now clarified:
The NVMe device does not have any _PSx or _PRx methods so
acpi_bus_get_power_flags() does not set the power_manageable flag.
This limits the PCI layer to D3hot at best.
The parent bridge has _PS0 and _PS3 methods, so it is
power_manageable. However, it does not have any power resources
(_PR0/_PR3) and hence ACPI_STATE_D3_COLD is not marked as valid.
Checking the ACPI spec, this is indeed the definition of D3cold
support (_PR3 gives the required power resources for running the
device in D3hot state, so if you turn those ones off you get D3cold).
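
To make the reasoning above easier to follow, here is a rough Python
model of it. It is only a sketch and not the kernel code: the function
names are hypothetical, and the rules (power manageable if _PS0 or _PR0
exists, D3cold valid only if both _PR0 and _PR3 exist) are a
simplification assumed for illustration.

```python
# Simplified model of the D3cold decision described above.
# NOT the kernel implementation; names and rules are assumptions.

def acpi_power_model(methods):
    """Return (power_manageable, d3cold_valid) for a set of ACPI object names."""
    methods = set(methods)
    # Assumption: the device is power manageable if it has _PS0 or _PR0.
    power_manageable = bool(methods & {"_PS0", "_PR0"})
    # Assumption: D3cold is valid only when power resources are listed for
    # D0 (_PR0) and D3hot (_PR3). Turning off the _PR3 resources is what
    # moves the device from D3hot to D3cold.
    d3cold_valid = power_manageable and {"_PR0", "_PR3"} <= methods
    return power_manageable, d3cold_valid


def deepest_state(methods):
    """Deepest suspend state this model would allow for the device."""
    _, d3cold_valid = acpi_power_model(methods)
    return "D3cold" if d3cold_valid else "D3hot"


# Dell NVMe device: no _PSx/_PRx methods at all.
dell_nvme = []
# Dell parent bridge: _PS0 and _PS3, but no power resources.
dell_bridge = ["_PS0", "_PS3"]
# Hypothetical bridge that also lists power resources.
bridge_with_resources = ["_PS0", "_PS3", "_PR0", "_PR3"]

for name, objs in [("dell_nvme", dell_nvme),
                   ("dell_bridge", dell_bridge),
                   ("bridge_with_resources", bridge_with_resources)]:
    print(name, acpi_power_model(objs), deepest_state(objs))
```

In this model both Dell devices stop at D3hot, and only a device that
lists power resources can reach D3cold.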

This does not conclusively answer the question of "is D3cold broken on
this PCI bridge for all devices built on this chipset?". But at a
stretch you could regard it as another data point agreeing with that
theory: the Dell product does not attempt D3cold support at the ACPI
level and there may be a good reason for that.

Daniel
