Message-ID: <CAD8Lp47DjuAAxqwt+yKD22UNMyvqE00x0u+JeM74KO2OC+Otrg@mail.gmail.com>
Date: Thu, 8 Feb 2024 09:37:36 +0100
From: Daniel Drake <drake@...lessos.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, bhelgaas@...gle.com, 
	david.e.box@...ux.intel.com, mario.limonciello@....com, rafael@...nel.org, 
	lenb@...nel.org, linux-acpi@...r.kernel.org, linux@...lessos.org
Subject: Re: [PATCH v2 1/2] PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge

On Wed, Feb 7, 2024 at 9:05 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> Can you run "sudo lspci -vvxxxx -s00:06.0" before putting the Root
> Port in D3hot, and then again after putting it back in D0 (when NVMe
> is inaccessible), and attach both outputs to the bugzilla?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=215742#c21
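To compare the two dumps, a minimal sketch of a helper could work like this. It assumes each dump is saved as plain `lspci -vvxxxx -s00:06.0` text for a single device. It decodes the PowerState field (bits 1:0 of PMCSR in the PCI Power Management capability) and lists every config-space byte that differs between the D0 snapshot taken before D3hot and the one taken after returning to D0. The function names are hypothetical and not part of lspci or the kernel. Some bytes, such as status bits, may legitimately differ, so a diff is only a starting point for spotting config that was not restored. D3cold itself cannot be observed this way, because config space is inaccessible while the device is powered off.

```python
import re

# lspci -x/-xxx/-xxxx hex-dump lines look like "40: 01 48 03 c8 08 00 00 00 ...".
# Device headers such as "00:06.0 PCI bridge: ..." do not match (no space after the colon).
_DUMP_LINE = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)$", re.IGNORECASE)

PCI_STATUS = 0x06            # Status register
PCI_STATUS_CAP_LIST = 0x10   # Status bit 4: capabilities list present
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability
PCI_CAP_ID_PM = 0x01         # PCI Power Management capability ID
PCI_PM_CTRL = 0x04           # PMCSR offset inside the PM capability
D_STATES = ("D0", "D1", "D2", "D3hot")  # PMCSR PowerState, bits 1:0


def parse_dump(text):
    """Collect config-space bytes from lspci output, ignoring non-dump lines."""
    cfg = {}
    for line in text.splitlines():
        m = _DUMP_LINE.match(line.strip())
        if m:
            base = int(m.group(1), 16)
            for i, byte in enumerate(m.group(2).split()):
                cfg[base + i] = int(byte, 16)
    return bytes(cfg.get(i, 0) for i in range(max(cfg, default=-1) + 1))


def pm_cap_offset(cfg):
    """Walk the capability list and return the PM capability offset, or None."""
    if len(cfg) < 0x40 or not cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST:
        return None
    pos, ttl = cfg[PCI_CAPABILITY_LIST] & 0xFC, 48  # ttl guards against looping lists
    while 0x40 <= pos < len(cfg) - 1 and ttl > 0:
        if cfg[pos] == PCI_CAP_ID_PM:
            return pos
        pos, ttl = cfg[pos + 1] & 0xFC, ttl - 1
    return None


def power_state(cfg):
    """Decode PMCSR PowerState (D0..D3hot), or None if the PM capability is not in the dump."""
    pm = pm_cap_offset(cfg)
    if pm is None or pm + PCI_PM_CTRL >= len(cfg):
        return None
    return D_STATES[cfg[pm + PCI_PM_CTRL] & 0x3]


def diff_dumps(before, after):
    """Return (offset, old, new) for each byte that differs over the common length."""
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]
```

A plain `-x` dump only covers the first 64 bytes, which stops short of the capability list. That is why `-xxx` (256 bytes) or `-xxxx` (4096 bytes) is needed here. For example, `diff_dumps(parse_dump(before_text), parse_dump(after_text))` lists the candidate registers to inspect, where `before_text` and `after_text` are hypothetical strings holding the two attachments.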

> Given that D3cold is just "main power off," and obviously the Root
> Port *can* transition from D3cold to D0 (at initial platform power-up
> if nothing else), this seems kind of strange and makes me think we may
> not completely understand the root cause, e.g., maybe some config
> didn't get restored.
>
> But the fact that Windows doesn't use D3cold in this case suggests
> that either (1) Windows has a similar quirk to work around this, or
> (2) Windows decides whether to use D3cold differently than Linux does.
>
> I have no data, but (1) seems sort of unlikely.  In case it turns out
> to be (2) and we figure out how to fix it that way someday, can you
> add the output of "sudo lspci -vvxxxx" of the system to the bugzilla?

https://bugzilla.kernel.org/show_bug.cgi?id=215742#c27

Some other interesting observations from Windows, gathered with socwatch and VTune:

On affected BIOS versions:
The CPU does not reach the deepest package C-state, PC10, during
suspend; it only reaches PC8.
SLP_S0# is not asserted (which follows from PC10 not being reached).
The NVMe device remains in D0, and the HDD LED briefly blinks every
1-2 seconds (I can't recall whether the blink is regular or irregular).

On the latest BIOS version:
PC10 is reached and SLP_S0# is asserted during suspend, but only for
about 25% of the suspend time.
The NVMe device remains in D0, and the HDD LED briefly blinks every
1-2 seconds (I can't recall whether the blink is regular or irregular).

The LED blinking leaves me wondering whether something is "using" the
disk during suspend on Windows, which would explain why Windows doesn't
try to power it down, even on the original BIOS version with
StorageD3Enable=1. This HDD LED blinking during suspend does not happen
on Linux, not even when the NVMe device is left in D0 across suspend and
the regular nvme_suspend() path puts it into a lower power state at the
NVMe protocol level.
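The distinction drawn here is between a device that stays in PCI D0 and one that is in a low-power state at the NVMe protocol level. It comes from NVMe's own power states, which a host selects with the Set Features admin command (Feature Identifier 02h, Power Management). The following is a minimal sketch of how that command's dwords are encoded, following the field layout in the NVMe base specification. The function name is hypothetical, and it does not show which power state Linux's nvme_suspend() path actually chooses. The command leaves the PCI PMCSR untouched, so from the Root Port's point of view the device remains in D0.

```python
NVME_ADMIN_SET_FEATURES = 0x09   # admin command opcode for Set Features
NVME_FEAT_POWER_MGMT = 0x02      # Feature Identifier: Power Management


def power_mgmt_command(ps, workload_hint=0, save=False):
    """Return (opcode, cdw10, cdw11) for Set Features / Power Management.

    CDW10 bits 7:0 = Feature Identifier, bit 31 = Save (SV).
    CDW11 bits 4:0 = Power State (PS), bits 7:5 = Workload Hint (WH).
    """
    if not 0 <= ps <= 31:
        raise ValueError("power state must fit in 5 bits")
    if not 0 <= workload_hint <= 7:
        raise ValueError("workload hint must fit in 3 bits")
    cdw10 = NVME_FEAT_POWER_MGMT | (int(save) << 31)
    cdw11 = ps | (workload_hint << 5)
    return NVME_ADMIN_SET_FEATURES, cdw10, cdw11
```

Which power-state numbers a controller supports, and how much they save, depends on that controller's Identify data, so `ps` here is only an illustrative input.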

> What would be the downside of skipping the DMI table and calling
> pci_d3cold_disable() always?  If this truly is a Root Port defect, it
> should affect all platforms with this device, and what's the benefit
> of relying on BIOS to use StorageD3Enable to avoid the defect?

I had assumed it was more likely a platform-specific DSDT bug, in that
PEG0.PXP._OFF does something that PEG0.PXP._ON is unable to recover
from, and that other platforms might handle suspend/resume of this Root
Port correctly. I'm not sure it is reasonable to assume that all other
platforms with the same chipset have the same bug (if that's what this
is).

Daniel
