Message-ID: <20240207200538.GA912749@bhelgaas>
Date: Wed, 7 Feb 2024 14:05:38 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Daniel Drake <drake@...lessos.org>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
	dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	bhelgaas@...gle.com, david.e.box@...ux.intel.com,
	mario.limonciello@....com, rafael@...nel.org, lenb@...nel.org,
	linux-acpi@...r.kernel.org, linux@...lessos.org
Subject: Re: [PATCH v2 1/2] PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge

On Wed, Feb 07, 2024 at 09:44:51AM +0100, Daniel Drake wrote:
> The Asus B1400 with original shipped firmware versions and VMD disabled
> cannot resume from suspend: the NVMe device becomes unresponsive and
> inaccessible.
> 
> This is because the NVMe device and parent PCI bridge get put into D3cold
> during suspend, and this PCI bridge cannot be recovered from D3cold mode:
> 
>   echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/unbind
>   echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/unbind
>   setpci -s 00:06.0 CAP_PM+4.b=03 # D3hot
>   acpidbg -b "execute \_SB.PC00.PEG0.PXP._OFF"
>   acpidbg -b "execute \_SB.PC00.PEG0.PXP._ON"
>   setpci -s 00:06.0 CAP_PM+4.b=0 # D0
>   echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/bind
>   echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/bind
>   # NVMe probe fails here with -ENODEV

Can you run "sudo lspci -vvxxxx -s00:06.0" before putting the Root
Port in D3hot, and then again after putting it back in D0 (when NVMe
is inaccessible), and attach both outputs to the bugzilla?

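In case it helps with collecting those two dumps, here is a minimal sketch of how they could be captured and compared. The function names and file names are made up for illustration, and the filter assumes the usual "offset: bytes" row layout of the lspci hex dump.

```shell
#!/bin/sh
# Sketch: snapshot a PCI device's config space and list the hex-dump rows
# that differ between two snapshots. Function names are hypothetical.

capture_cfg() {
    # $1 = PCI address (e.g. 00:06.0), $2 = output file
    # -xxxx dumps the extended config space and needs root.
    sudo lspci -vvxxxx -s "$1" > "$2"
}

changed_cfg_rows() {
    # $1 = "before" dump, $2 = "after" dump
    # Keep only hex-dump rows (e.g. "e0: 00 00 ...") that differ.
    diff "$1" "$2" | grep -E '^[<>] [0-9a-f]+: '
}
```

Typical use would be "capture_cfg 00:06.0 before.txt" ahead of the D3hot write, "capture_cfg 00:06.0 after.txt" once the NVMe probe has failed, then "changed_cfg_rows before.txt after.txt". The full, unfiltered dumps are still what should go into the bugzilla.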
> This appears to be an untested D3cold transition by the vendor; Intel
> socwatch shows that Windows leaves the NVMe device and parent bridge in D0
> during suspend, even though these firmware versions have StorageD3Enable=1.
> 
> Experimenting with the DSDT, the _OFF method calls DL23() which sets a L23E
> bit at offset 0xe2 into the PCI configuration space for this root port.
> This is the specific write that the _ON routine is unable to recover from.
> This register is not documented in the public chipset datasheet.
> 
> Disallow D3cold on the PCI bridge to enable successful suspend/resume.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
> Signed-off-by: Daniel Drake <drake@...lessos.org>
> ---
>  arch/x86/pci/fixup.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> v2:
> Match only specific BIOS versions where this quirk is required.
> Add subsequent patch to this series to revert the original S3 workaround
> now that s2idle is usable again.
> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index f347c20247d30..6b0b341178e4f 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -907,6 +907,51 @@ static void chromeos_fixup_apl_pci_l1ss_capability(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_save_apl_pci_l1ss_capability);
>  DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_fixup_apl_pci_l1ss_capability);
>  
> +/*
> + * Disable D3cold on Asus B1400 PCIe bridge at 00:06.0.

I doubt the 00:06.0 PCI address is relevant here.

> + * On this platform with VMD off, the NVMe's parent PCI bridge cannot
> + * successfully power back on from D3cold, resulting in unresponsive NVMe on
> + * resume. This appears to be an untested transition by the vendor: Windows
> + * leaves the NVMe and parent bridge in D0 during suspend.
> + * This is only needed on BIOS versions before 308; the newer versions flip
> + * StorageD3Enable from 1 to 0.

Given that D3cold is just "main power off," and obviously the Root
Port *can* transition from D3cold to D0 (at initial platform power-up
if nothing else), this seems kind of strange and makes me think we may
not completely understand the root cause, e.g., maybe some config
didn't get restored.

But the fact that Windows doesn't use D3cold in this case suggests
that either (1) Windows has a similar quirk to work around this, or
(2) Windows decides whether to use D3cold differently than Linux does.
I have no data, but (1) seems sort of unlikely.  In case it turns out
to be (2) and we figure out how to fix it that way someday, can you
add the output of "sudo lspci -vvxxxx" of the system to the bugzilla?

What would be the downside of skipping the DMI table and calling
pci_d3cold_disable() always?  If this truly is a Root Port defect, it
should affect all platforms with this device, and what's the benefit
of relying on BIOS to use StorageD3Enable to avoid the defect?
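One way to try "D3cold always off" on a given machine without rebuilding the kernel might be the per-device d3cold_allowed sysfs attribute. This is only a sketch: the attribute sets a different flag than pci_d3cold_disable() does, so treat a comparable effect on the bridge as an assumption to verify, not a given. The function name and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Sketch: forbid D3cold for one PCI device through sysfs.
# SYSFS_ROOT is a hypothetical override so the helper can be tried against
# a scratch directory; on a real system it stays at /sys.

disable_d3cold() {
    # $1 = full PCI address, e.g. 0000:00:06.0
    attr="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/d3cold_allowed"
    [ -e "$attr" ] || { echo "no d3cold_allowed for $1" >&2; return 1; }
    echo 0 > "$attr"    # needs root on a real system
    cat "$attr"         # read back the resulting setting
}
```

Running "disable_d3cold 0000:00:06.0" as root before suspending would show whether resume then works on machines that are not in the DMI table. The address is taken as a parameter because the Root Port need not sit at 00:06.0 on other platforms.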

Rewrap the code comment above into a single paragraph, or add a blank
line between its paragraphs.

> + */
> +static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = {
> +	{
> +		.matches = {
> +				DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +				DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.304"),
> +		},
> +	},
> +	{
> +		.matches = {
> +				DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +				DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.305"),
> +		},
> +	},
> +	{
> +		.matches = {
> +				DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +				DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.306"),
> +		},
> +	},
> +	{
> +		.matches = {
> +				DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +				DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.307"),
> +		},
> +	},
> +	{}
> +};
> +
> +static void asus_disable_nvme_d3cold(struct pci_dev *pdev)
> +{
> +	if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0)
> +		pci_d3cold_disable(pdev);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold);
> +
>  #ifdef CONFIG_SUSPEND
>  /*
>   * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but
> -- 
> 2.43.0
> 