Message-ID: <312d7781-671c-3765-a7d9-690d032650ec@amd.com>
Date:   Thu, 8 Sep 2022 10:22:15 -0500
From:   "Limonciello, Mario" <mario.limonciello@....com>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     "mika.westerberg@...ux.intel.com" <mika.westerberg@...ux.intel.com>,
        "andreas.noever@...il.com" <andreas.noever@...il.com>,
        "michael.jamet@...el.com" <michael.jamet@...el.com>,
        "YehezkelShB@...il.com" <YehezkelShB@...il.com>,
        "Mehta, Sanju" <Sanju.Mehta@....com>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Tsao, Anson" <anson.tsao@....com>,
        Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found on
 AMD USB4 controller

On 9/8/2022 09:02, Kai-Heng Feng wrote:
> 
> On Thu, Sep 8, 2022 at 12:30 AM Limonciello, Mario
> <Mario.Limonciello@....com> wrote:
>>
>> [Public]
>>
>> Hi,
>>
>>> -----Original Message-----
>>> From: Greg KH <gregkh@...uxfoundation.org>
>>> Sent: Monday, September 5, 2022 02:30
>>> To: Kai-Heng Feng <kai.heng.feng@...onical.com>
>>> Cc: mika.westerberg@...ux.intel.com; andreas.noever@...il.com;
>>> michael.jamet@...el.com; YehezkelShB@...il.com; Mehta, Sanju
>>> <Sanju.Mehta@....com>; Limonciello, Mario
>>> <Mario.Limonciello@....com>; linux-usb@...r.kernel.org;
>>> linux-kernel@...r.kernel.org
>>> Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found
>>> on AMD USB4 controller
>>>
>>> On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote:
>>>> AMD USB4 can not detect external PCIe devices like external NVMe when
>>>> it's hotplugged, because card/link are not up:
>>>>
>>>> pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101
>>>
>>> That sounds like a hardware bug, how does this work in other operating
>>> systems for this hardware?
>>
>> We happen to have this HP system in our lab.  My colleague Anson (now on CC) flashed
>> the same BIOS to it (01.02.01) using dediprog and loaded a 6.0-rc3 mainline kernel built
>> from the Canonical mainline kernel PPA.
>>
>> He then tried to hotplug a TBT3 SSD a number of times but couldn't hit this issue.
>> I attached his log to the kernel Bugzilla.
> 
> Nice to hear. Hopefully this can be fixed at firmware/hardware side.

I guess you and Anson might want to sync up offline and compare whether 
you have the same hardware stepping.

> 
>>
>>>
>>>> Using `lspci` to resume the pciehp bridges makes the external devices show up.
>>>
>>> That's not good :(
>>>
>>>> A long delay before checking card/link presence doesn't help, either.
>>>> The only way to make the hotplug work is to enable pciehp interrupt and
>>>> check card presence after the TB switch is added.
>>>>
>>>> Since the topology of USB4 and its PCIe bridges are siblings, hardcode
>>>> the bridge ID so TBT driver can wake them up to check presence.
>>>
>>> As I mention below, this is not an acceptable solution.
>>>
>>> AMD developers, any ideas on how to get this fixed in the TB controller
>>> firmware instead?
>>
>> Anson also double checked on the AMD reference hardware that the HP system is built
>> against and couldn't reproduce it there either.
>>
>> KH, I've got a few questions/comments to try to better explain why we're here.
>>
>> 1) How did you flash the 01.02.01 firmware?  In Anson's check, he used dediprog.
>> Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the
>> upgrade that didn't get set/cleared properly from an earlier pre-release BIOS?
> 
> We used UEFI capsule to update the firmware, via fwupd.

So that's a difference from how Anson did it.  Could you perhaps dump 
the BIOS SPI image?  Anson can flash the exact same dump via dediprog 
and see if he can repro.

It would let us confirm if it was caused by your upgrade path.

> 
>>
>> 2) Did you change any BIOS settings?  Particularly anything to do with Pre-OS CM?
> 
> No, nothing in BIOS was changed.
>
>>
>> 3) If you explicitly reset to HP's "default BIOS settings" does it resolve?
> 
> Doesn't help. I put the device to ACPI G3 and it doesn't help, either.

OK.

> 
>>
>> 4) Can you double check ADP_CS_5 bit 31?  I attached a patch to the kernel Bugzilla to
>> add dyndbg output for it.  If it was for some reason set by Pre-OS CM in your BIOS/settings
>> combination, we might need to undo it in the Linux CM.
> 
> All ports say "Hotplug disabled: 0".
> 
> dmesg attached to the bugzilla.

OK, that at least rules out DHP from Pre-OS CM.
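
For anyone following along, the check in question boils down to testing one bit of the 32-bit ADP_CS_5 value. Below is a minimal user-space sketch, assuming (as discussed above) that bit 31 is the DHP flag. The macro and function names are made up for illustration and are not taken from the debug patch on the Bugzilla.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 31 of ADP_CS_5 is the DHP (disable hot plug) flag. */
#define ADP_CS_5_DHP (UINT32_C(1) << 31)

/* Hypothetical helper: true when the raw register value has DHP set,
 * i.e. what a "Hotplug disabled: 1" debug line would correspond to. */
bool adapter_hotplug_disabled(uint32_t adp_cs_5)
{
	return (adp_cs_5 & ADP_CS_5_DHP) != 0;
}

/* Hypothetical helper: the value a connection manager would write back
 * to undo a DHP bit left set by someone else; other bits are kept. */
uint32_t adapter_clear_dhp(uint32_t adp_cs_5)
{
	return adp_cs_5 & ~ADP_CS_5_DHP;
}
```

Since every port in your log reports 0, adapter_hotplug_disabled() would return false for all of them and there is nothing to clear.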

> 
>>
>> 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or
>> root ports used for tunneling using software like TLP?
> 
> No. And they should be suspended by default.
> 

Thinking about this being possibly a firmware upgrade path problem, can 
you please check:

# grep SMC /sys/kernel/debug/dri/0/amdgpu_firmware_info

Anson's system was 0x04453200 (program 4, version 69.50.0).
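
To compare against your value, here is a rough sketch of how that word splits up. The one-byte-per-field layout is only inferred from the single example above (0x04 = program 4, 0x45 = 69, 0x32 = 50, 0x00 = 0), and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout, inferred from 0x04453200 -> program 4, version 69.50.0:
 * one byte per field, most significant byte first. */
struct smc_version {
	uint8_t program;
	uint8_t ver_major;
	uint8_t ver_minor;
	uint8_t ver_patch;
};

/* Split the raw 32-bit value into its four byte-sized fields. */
struct smc_version smc_version_decode(uint32_t raw)
{
	struct smc_version v = {
		.program   = (uint8_t)((raw >> 24) & 0xff),
		.ver_major = (uint8_t)((raw >> 16) & 0xff),
		.ver_minor = (uint8_t)((raw >> 8) & 0xff),
		.ver_patch = (uint8_t)(raw & 0xff),
	};
	return v;
}

/* Format the decoded value the same way it is written in this mail.
 * Returns what snprintf() returns. */
int smc_version_format(uint32_t raw, char *buf, size_t len)
{
	struct smc_version v = smc_version_decode(raw);

	return snprintf(buf, len, "program %u, version %u.%u.%u",
			(unsigned)v.program, (unsigned)v.ver_major,
			(unsigned)v.ver_minor, (unsigned)v.ver_patch);
}
```

Feeding it the hex value from your grep output gives a string you can compare directly with Anson's.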

> Kai-Heng
> 
>>
>>>
>>>>
>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216448
>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
>>>> ---
>>>>   drivers/thunderbolt/nhi.c    | 29 +++++++++++++++++++++++++++++
>>>>   drivers/thunderbolt/switch.c |  6 ++++++
>>>>   drivers/thunderbolt/tb.c     |  1 +
>>>>   drivers/thunderbolt/tb.h     |  5 +++++
>>>>   include/linux/thunderbolt.h  |  1 +
>>>>   5 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
>>>> index cb8c9c4ae93a2..75f5ce5e22978 100644
>>>> --- a/drivers/thunderbolt/nhi.c
>>>> +++ b/drivers/thunderbolt/nhi.c
>>>> @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>>   {
>>>>      struct tb_nhi *nhi;
>>>>      struct tb *tb;
>>>> +   struct pci_dev *p = NULL;
>>>> +   struct tb_pci_bridge *pci_bridge, *n;
>>>>      int res;
>>>>
>>>>      if (!nhi_imr_valid(pdev)) {
>>>> @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>>              nhi_shutdown(nhi);
>>>>              return res;
>>>>      }
>>>> +
>>>> +   if (pdev->vendor == PCI_VENDOR_ID_AMD) {
>>>> +           while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, p))) {
>>>> +                   pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), GFP_KERNEL);
>>>> +                   if (!pci_bridge)
>>>> +                           goto cleanup;
>>>> +
>>>> +                   pci_bridge->bridge = p;
>>>> +                   INIT_LIST_HEAD(&pci_bridge->list);
>>>> +                   list_add(&pci_bridge->list, &tb->bridge_list);
>>>> +           }
>>>> +   }
>>>
>>> You can't walk the device tree and create a "shadow" list of devices
>>> like this and expect any lifetime rules to work properly with them at
>>> all.
>>>
>>> Please do not do this.
>>>
>>> greg k-h
