[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e0ee3415-4670-4c0c-897a-d5f70e0f65eb@amd.com>
Date: Wed, 11 Dec 2024 15:24:28 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: Werner Sembach <wse@...edocomputers.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: ggo@...edocomputers.com, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] PCI: Avoid putting some root ports into D3 on some
Ryzen chips
On 12/11/2024 06:47, Werner Sembach wrote:
> Hi,
>
> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>> On 12/10/2024 09:24, Werner Sembach wrote:
>>> Hi,
>>>
>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>> From: Mario Limonciello <mario.limonciello@....com>
>>>>>
>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>> the system is suspended if the port is not power manageable by the
>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>> policy for these ports to go into D3hot.
>>>>>
>>>>> This policy generally makes sense from an OSPM perspective but it
>>>>> leads
>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1
>>>>> with
>>>>> an unupdated BIOS.
>>>>>
>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing
>>>>> wakeup
>>>>> interrupt.
>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system
>>>>> hang.
>>>>>
>>>>> On the affected Device + BIOS combination, add a quirk for the PCI
>>>>> device
>>>>> ID used by the problematic root port on both chips to ensure that
>>>>> these
>>>>> root ports are not put into D3hot at suspend.
>>>>>
>>>>> This patch is based on
>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2-
>>>>> mario.limonciello@....com/
>>>>> but with the added condition both in the documentation and in the
>>>>> code to
>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original
>>>>> unpatched BIOS.
>>>>>
>>>>> Co-developed-by: Georg Gottleuber <ggo@...edocomputers.com>
>>>>> Signed-off-by: Georg Gottleuber <ggo@...edocomputers.com>
>>>>> Co-developed-by: Werner Sembach <wse@...edocomputers.com>
>>>>> Signed-off-by: Werner Sembach <wse@...edocomputers.com>
>>>>> Cc: stable@...r.kernel.org # 6.1+
>>>>> Reported-by: Iain Lane <iain@...ngesquash.org.uk>
>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-
>>>>> suspend-with-external-USB-keyboard/m-p/5217121
>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@....com>
>>>>> ---
>>>>> drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>> 1 file changed, 31 insertions(+)
>>>>>
>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>> --- a/drivers/pci/quirks.c
>>>>> +++ b/drivers/pci/quirks.c
>>>>> @@ -3908,6 +3908,37 @@ static void
>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>> DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>> quirk_apple_poweroff_thunderbolt);
>>>>> +
>>>>> +/*
>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers
>>>>> into D3hot
>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>> + *
>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a
>>>>> missing wakeup
>>>>> + * interrupt.
>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system
>>>>> hang.
>>>>> + *
>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with
>>>>> the original
>>>>> + * unupdated BIOS.
>>>>> + */
>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>> + {
>>>>> + .matches = {
>>>>> + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>> + DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>> + DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>> + },
>>>>> + },
>>>>> + {}
>>>>> +};
>>>>> +
>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>> +{
>>>>> + if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>> + !acpi_pci_power_manageable(pdev))
>>>>> + pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>> +}
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9,
>>>>> quirk_ryzen_rp_d3);
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb,
>>>>> quirk_ryzen_rp_d3);
>>>>> #endif
>>>>> /*
>>>>
>>>> Wait, what is wrong with:
>>>>
>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD
>>>> Rembrandt and Phoenix USB4")
>>>>
>>>> Is that not activating on your system for some reason?
>>>
>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had
>>> blackscreen on wakeup.
>>
>> OK, I think we need to dig a layer deeper to see which root port is
>> causing problems to understand this.
>>
>>>
>>> With a newer BIOS for that device suspend and resume however works.
>>>
>>
>> Is there any reason that people would realistically be staying on the
>> old BIOS and instead we need to carry this quirk in the kernel for
>> eternity?
> Fear of device bricking or not knowing an update is available.
The not knowing is solved by publishing firmware to LVFS.
Most "popular" distributions include fwupd, regularly check for updates
and will notify users through the MOTD or a graphical application that
there is something available.
>>
>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's
>> not a very big barrier to entry to do an update like it was 20 years ago.
> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh
> scripts which still is the main form ODMs provide bios updates.
It's tangential to this thread; but generally ODMs will provide you a
capsule if you ask for one.
Anyhow if you are providing scripts and random EFI binaries in order to
get things updated, that explains why this is a large enough chunk of
people to justify a patch like this.
>>
>>> Looking in the patch the device id's are different (0x162e, 0x162f,
>>> 0x1668, and 0x1669).
>>>
>>
>> TUXEDO Sirius 16 Gen1 is Phoenix based, right? So at a minimum you
>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
> Thanks for the hint, I can delete that then.
>>
>> Here is the topology from a Phoenix system on my side:
>>
>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
>
> Sorry for the noob question: How do I get that format? it's not lspci -
> tvnn on my system
No worry.
Having looked at a lot of s2idle bugs I've found it's generally helpful
to know what ACPI companion is assigned to devices and what are parents.
So it's actually created by this function in the s2idle issue triage
script, amd_s2idle.py.
https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1945
And while on the topic, does your "broken" BIOS report this from that
script?
'''
Platform may hang resuming. Upgrade your firmware or add
pcie_port_pm=off to kernel command line if you have problems
'''
>
> But i think this contains the same information:
>
> $ lspci -Pnn
> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Root Complex [1022:14e8]
> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU
> [1022:14e9]
> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Dummy Host Bridge [1022:14ea]
> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> GPP Bridge [1022:14ed]
> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Dummy Host Bridge [1022:14ea]
> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> GPP Bridge [1022:14ee]
> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> GPP Bridge [1022:14ee]
> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> GPP Bridge [1022:14ee]
> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> GPP Bridge [1022:14ee]
> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Dummy Host Bridge [1022:14ea]
> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
> USB4/Thunderbolt PCIe tunnel [1022:14ef]
> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Dummy Host Bridge [1022:14ea]
> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Dummy Host Bridge [1022:14ea]
> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Internal GPP Bridge to Bus [C:A] [1022:14eb]
> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Internal GPP Bridge to Bus [C:A] [1022:14eb]
> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Internal GPP Bridge to Bus [C:A] [1022:14eb]
> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus
> Controller [1022:790b] (rev 71)
> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC
> Bridge [1022:790e] (rev 51)
> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 0 [1022:14f0]
> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 1 [1022:14f1]
> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 2 [1022:14f2]
> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 3 [1022:14f3]
> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 4 [1022:14f4]
> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 5 [1022:14f5]
> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 6 [1022:14f6]
> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
> Data Fabric; Function 7 [1022:14f7]
> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI]
> Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/
> ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro
> Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M
> XT/7600S/7700S / PRO W7600] [1002:7480] (rev c7)
> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc.
> [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller
> [10ec:8168] (rev 15)
> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi
> 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics
> Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices,
> Inc. [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
> Rembrandt Radeon High Definition Audio Controller [1002:1640]
> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc.
> [AMD] Phoenix CCP/PSP 3.0 Device [1022:15c7]
> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> Device [1022:15b9]
> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> Device [1022:15ba]
> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc.
> [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
> Family 17h/19h/1ah HD Audio Controller [1022:15e3]
> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro
> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro
> Devices, Inc. [AMD] AMD IPU Device [1022:1502]
> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro
> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> Device [1022:15c0]
> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> Device [1022:15c1]
> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> Pink Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
>
>>
>> That's why 7d08f21f8c630 intentionally matches the NHI and then
>> changes the root port right above that instead of all the root ports -
>> because that is where the problem was.
> For some reason it doesn't seem to trigger on my system (added debug
> output) I will look further into it why that happens.
You never see this message in your logs at suspend time (even on a
"fixed" BIOS)?
"quirk: disabling D3cold for suspend"
I'm /suspecting/ you do see it, but you're having problems with another
root port.
I mentioned this in my previous iterations of patches that eventually
landed on that quirk, but Windows and Linux handle root ports
differently at suspend time and that could be why it's exposing your
BIOS bug.
If you can please narrow down which root ports actually need the quirk
for your side (feel free to do a similar style to 7d08f21f8c630) I think
we could land on something more narrow and upstreamable.
At a minimum what you're doing today is covering both Rembrandt and
Phoenix and it should only apply to Phoenix.
Powered by blists - more mailing lists