[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <90631333-fda0-42cf-9e32-8289c353549f@tuxedocomputers.com>
Date: Tue, 17 Dec 2024 15:07:07 +0100
From: Werner Sembach <wse@...edocomputers.com>
To: Mario Limonciello <mario.limonciello@....com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: ggo@...edocomputers.com, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] PCI: Avoid putting some root ports into D3 on some
Ryzen chips
Hi,
some more testing below.
Am 11.12.24 um 22:24 schrieb Mario Limonciello:
> On 12/11/2024 06:47, Werner Sembach wrote:
>> Hi,
>>
>> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>>> On 12/10/2024 09:24, Werner Sembach wrote:
>>>> Hi,
>>>>
>>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>>> From: Mario Limonciello <mario.limonciello@....com>
>>>>>>
>>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>>> the system is suspended if the port is not power manageable by the
>>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>>> policy for these ports to go into D3hot.
>>>>>>
>>>>>> This policy generally makes sense from an OSPM perspective but it leads
>>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
>>>>>> an unupdated BIOS.
>>>>>>
>>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>>> interrupt.
>>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>>>
>>>>>> On the affected Device + BIOS combination, add a quirk for the PCI device
>>>>>> ID used by the problematic root port on both chips to ensure that these
>>>>>> root ports are not put into D3hot at suspend.
>>>>>>
>>>>>> This patch is based on
>>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2-
>>>>>> mario.limonciello@....com/
>>>>>> but with the added condition both in the documentation and in the code to
>>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched BIOS.
>>>>>>
>>>>>> Co-developed-by: Georg Gottleuber <ggo@...edocomputers.com>
>>>>>> Signed-off-by: Georg Gottleuber <ggo@...edocomputers.com>
>>>>>> Co-developed-by: Werner Sembach <wse@...edocomputers.com>
>>>>>> Signed-off-by: Werner Sembach <wse@...edocomputers.com>
>>>>>> Cc: stable@...r.kernel.org # 6.1+
>>>>>> Reported-by: Iain Lane <iain@...ngesquash.org.uk>
>>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-
>>>>>> suspend-with-external-USB-keyboard/m-p/5217121
>>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@....com>
>>>>>> ---
>>>>>> drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>>> 1 file changed, 31 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>>> --- a/drivers/pci/quirks.c
>>>>>> +++ b/drivers/pci/quirks.c
>>>>>> @@ -3908,6 +3908,37 @@ static void
>>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>>> DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>>> quirk_apple_poweroff_thunderbolt);
>>>>>> +
>>>>>> +/*
>>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
>>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>>> + *
>>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>>> + * interrupt.
>>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>>> + *
>>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with the
>>>>>> original
>>>>>> + * unupdated BIOS.
>>>>>> + */
>>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>>> + {
>>>>>> + .matches = {
>>>>>> + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>>> + DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>>> + DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>>> + },
>>>>>> + },
>>>>>> + {}
>>>>>> +};
>>>>>> +
>>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>>> +{
>>>>>> + if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>>> + !acpi_pci_power_manageable(pdev))
>>>>>> + pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>>> +}
>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>>>>>> #endif
>>>>>> /*
>>>>>
>>>>> Wait, what is wrong with:
>>>>>
>>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD
>>>>> Rembrandt and Phoenix USB4")
>>>>>
>>>>> Is that not activating on your system for some reason?
>>>>
>>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had blackscreen
>>>> on wakeup.
>>>
>>> OK, I think we need to dig a layer deeper to see which root port is causing
>>> problems to understand this.
>>>
>>>>
>>>> With a newer BIOS for that device suspend and resume however works.
>>>>
>>>
>>> Is there any reason that people would realistically be staying on the old
>>> BIOS and instead we need to carry this quirk in the kernel for eternity?
>> Fear of device bricking or not knowing an update is available.
>
> The not knowing is solved by publishing firmware to LVFS.
>
> Most "popular" distributions include fwupd, regularly check for updates and
> will notify users through the MOTD or a graphical application that there is
> something available.
>
>>>
>>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's not a
>>> very big barrier to entry to do an update like it was 20 years ago.
>> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh
>> scripts which still is the main form ODMs provide bios updates.
>
> It's tangential to this thread; but generally ODMs will provide you a capsule
> if you ask for one.
>
> Anyhow if you are providing scripts and random EFI binaries in order to get
> things updated, that explains why this is a large enough chunk of people to
> justify a patch like this.
>
>>>
>>>> Looking in the patch the device id's are different (0x162e, 0x162f, 0x1668,
>>>> and 0x1669).
>>>>
>>>
>>> TUXEDO Sirius 16 Gen1 is Phoenix based, right? So at a minimum you
>>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
>> Thanks for the hint, I can delete that then.
>>>
>>> Here is the topology from a Phoenix system on my side:
>>>
>>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
>>
>> Sorry for the noob question: How do I get that format? it's not lspci - tvnn
>> on my system
>
> No worry.
>
> Having looked at a lot of s2idle bugs I've found it's generally helpful to
> know what ACPI companion is assigned to devices and what are parents.
>
> So it's actually created by this function in the s2idle issue triage script,
> amd_s2idle.py.
>
> https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1945
>
> And while on the topic, does your "broken" BIOS report this from that script?
>
> '''
> Platform may hang resuming. Upgrade your firmware or add pcie_port_pm=off to
> kernel command line if you have problems
> '''
Yes, full log attached (kernel 6.13-rc3 one time without sudo one time with sudo)
>
>>
>> But i think this contains the same information:
>>
>> $ lspci -Pnn
>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Root
>> Complex [1022:14e8]
>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU
>> [1022:14e9]
>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy
>> Host Bridge [1022:14ea]
>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP
>> Bridge [1022:14ed]
>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy
>> Host Bridge [1022:14ea]
>> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP
>> Bridge [1022:14ee]
>> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP
>> Bridge [1022:14ee]
>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP
>> Bridge [1022:14ee]
>> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP
>> Bridge [1022:14ee]
>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy
>> Host Bridge [1022:14ea]
>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>> USB4/Thunderbolt PCIe tunnel [1022:14ef]
>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy
>> Host Bridge [1022:14ea]
>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy
>> Host Bridge [1022:14ea]
>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
>> [1022:790b] (rev 71)
>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
>> [1022:790e] (rev 51)
>> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 0 [1022:14f0]
>> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 1 [1022:14f1]
>> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 2 [1022:14f2]
>> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 3 [1022:14f3]
>> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 4 [1022:14f4]
>> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 5 [1022:14f5]
>> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 6 [1022:14f6]
>> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data
>> Fabric; Function 7 [1022:14f7]
>> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi
>> 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
>> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ ATI]
>> Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
>> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro
>> Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S
>> / PRO W7600] [1002:7480] (rev c7)
>> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc.
>> [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
>> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>> RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
>> (rev 15)
>> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi 6E(802.11ax)
>> AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
>> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co
>> Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
>> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>> [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
>> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
>> Rembrandt Radeon High Definition Audio Controller [1002:1640]
>> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
>> Phoenix CCP/PSP 3.0 Device [1022:15c7]
>> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:15b9]
>> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:15ba]
>> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD]
>> ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
>> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family
>> 17h/19h/1ah HD Audio Controller [1022:15e3]
>> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices,
>> Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro Devices,
>> Inc. [AMD] AMD IPU Device [1022:1502]
>> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices,
>> Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:15c0]
>> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:15c1]
>> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Pink
>> Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
>>
>>>
>>> That's why 7d08f21f8c630 intentionally matches the NHI and then changes the
>>> root port right above that instead of all the root ports - because that is
>>> where the problem was.
>> For some reason it doesn't seem to trigger on my system (added debug output)
>> I will look further into it why that happens.
>
> You never see this message in your logs at suspend time (even on a "fixed" BIOS)?
>
> "quirk: disabling D3cold for suspend"
On the fixed BIOS I see that line. On the unfixed BIOS it aborts the functions
at "if (pm_suspend_target_state == PM_SUSPEND_ON)". Skipping the check on the
unfixed BIOS it still hangs on resume.
>
> I'm /suspecting/ you do see it, but you're having problems with another root
> port.
>
> I mentioned this in my previous iterations of patches that eventually landed
> on that quirk, but Windows and Linux handle root ports differently at suspend
> time and that could be why it's exposing your BIOS bug.
>
> If you can please narrow down which root ports actually need the quirk for
> your side (feel free to do a similar style to 7d08f21f8c630) I think we could
> land on something more narrow and upstreamable.
>
> At a minimum what you're doing today is covering both Rembrandt and Phoenix
> and it should only apply to Phoenix.
I also try to find out how many devices where actually shipped with this very
first BIOS version.
Kind regards,
Werner Sembach
View attachment "s2idle_report-2024-12-17_sudo.txt" of type "text/plain" (195419 bytes)
View attachment "s2idle_report-2024-12-17.txt" of type "text/plain" (168565 bytes)
Powered by blists - more mailing lists