lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <tencent_D1EA17EA4011F145E9B70F3C08E57C37890A@qq.com>
Date: Mon, 13 Jan 2025 20:44:37 +0800
From: Jiwei <jiwei.sun.bj@...com>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
Cc: ilpo.jarvinen@...ux.intel.com, Bjorn Helgaas <bhelgaas@...gle.com>,
 linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
 guojinhui.liam@...edance.com, helgaas@...nel.org, lukas@...ner.de,
 ahuang12@...ovo.com, sunjw10@...ovo.com
Subject: Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during
 hotplug testing



On 1/12/25 00:00, Maciej W. Rozycki wrote:
> On Fri, 10 Jan 2025, Jiwei Sun wrote:
> 
>> In order to fix the issue, don't do the retraining work except ASMedia
>> ASM2824.
> 
>  I yet need to go through all of your submission in detail, but this 
> assumption defeats the purpose of the workaround, as the current 
> understanding of the origin of the training failure and the reason to 
> retrain by hand with the speed limited to 2.5GT/s is the *downstream* 
> device rather than the ASMedia ASM2824 switch.
> 
>  It is also why the quirk has been wired to run everywhere rather than
> having been keyed by VID:DID, and the VID:DID of the switch is only 
> listed, conservatively, because it seems safe with the switch to lift the 
> speed restriction once the link has successfully completed training.
> 
>  Overall I think we need to get your problem sorted differently, because I 
> suppose in principle your hot-plug scenario could also happen with the 
> ASMedia ASM2824 switch as the upstream device and your NVMe storage 
> element as the downstream device.  Perhaps the speed restriction could be 
> always lifted, and then the bandwidth controller infrastructure used for 
> that, so that it doesn't have to happen within `pcie_failed_link_retrain'?

According to our test, the following modification can fix the issue in our
test machine.

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 02d2e16672a8..9ca051b86878 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -97,10 +97,6 @@ static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
  */
 int pcie_failed_link_retrain(struct pci_dev *dev)  {
-       static const struct pci_device_id ids[] = {
-               { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
-               {}
-       };
        u16 lnksta, lnkctl2;
        int ret = -ENOTTY;
 
@@ -128,8 +124,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
        }
 
        if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-           (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-           pci_match_id(ids, dev)) {
+           (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == 
+ PCI_EXP_LNKCTL2_TLS_2_5GT) {
                u32 lnkcap;

                pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");

But I don't know if the above modification will have any other negative effects
on other devices. Could you please share your thoughts?

Thanks,
Regards,
Jiwei

> 
>   Maciej


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ