lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk>
Date:   Wed, 2 Mar 2022 23:48:09 +0000 (GMT)
From:   "Maciej W. Rozycki" <macro@...am.me.uk>
To:     Bjorn Helgaas <helgaas@...nel.org>
cc:     Bjorn Helgaas <bhelgaas@...gle.com>, Stefan Roese <sr@...x.de>,
        Jim Wilson <wilson@...iptree.org>,
        David Abdurachmanov <david.abdurachmanov@...ive.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] pci: Work around ASMedia ASM2824 PCIe link training
 failures

On Tue, 1 Mar 2022, Bjorn Helgaas wrote:

> > Attempt to handle cases with a downstream port of the ASMedia ASM2824 
> > PCIe switch where link training never completes and the link continues 
> > switching between speeds indefinitely with the data link layer never 
> > reaching the active state.
> > 
> > It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> > switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> > switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> > P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> > switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> > falling back to 2.5GT/s.
> 
> Can you collect the complete dmesg and lspci output, please?

 Please find them attached (including output from U-boot), both from an 
unchanged kernel and from one with my patch proposed applied.

 Note that I have the U-boot-side workaround already installed and while 
the firmware is on a uSD card and therefore easy to reflash I have the 
system in a remote lab hundreds of kilometres/miles away so I dare not 
fiddle with it while not there and for the purpose of this experiment I 
have manually removed the sticky 2.5GT/s clamp installed by U-boot with:

=> pci write.w 02.03.0 0xb0 0x0063

This is indicated in the console logs.

> >From hints on the web, I guess the ASM2824 is built-in to the SiFive
> board, right?  So I suspect the topology is something like this:
> 
>   00:00.0 Root Port to [bus 01-??]                  # soldered on to Unmatched
>   01:00.0 ASM2824 Upstream Port to [bus 02-09]      # soldered on
>   02:03.0 ASM2824 Downstream Port to [bus 05-09]    # soldered on, to slot
>   05:00.0 PI7C9X2G304 Upstream Port to [bus 06-09]  # on Delock riser card
>   06:??.? PI7C9X2G304 Downstream Port to [bus ??]   # slot 0 on Delock card
>   06:??.? PI7C9X2G304 Downstream Port to [bus ??]   # slot 1 on Delock card

 Correct.

> Do we have other reports of this bridge being broken?  Or could this
> be some kind of signal integrity problem on Unmatched or the slot?

 This appears to be the same issue on the ASM2824's upstream port side: 
<https://lore.kernel.org/lkml/20220221210347.1335004-2-ben.dooks@codethink.co.uk/>.  
I have replied downthread there already, pointing here.

 On the ASM2824's downstream ports side it happens with the Delock device 
regardless of whether port 02:03.0 (M.2 Key E slot, with an adapter to a 
regular PCIe x1 slot installed) or 02:08.0 (regular PCIe x8 slot) is used, 
so it's clearly not slot-specific.

 Furthermore I highly doubt it's a signal integrity issue.  It is because 
the link is correctly negotiated and continues working at its maximum of 
5GT/s for weeks since the bootstrap as long as it's first manually clamped 
to 2.5GT/s and negotiated at that speed and then manually retrained with 
the clamp removed.  Except for the longer interval between negotiations 
it's not much different AFAIK to automatic link training (where 2.5GT/s is 
first negotiated anyway before a higher speed is negotiated according to 
link capability information exchanged between the endpoints).  Therefore I 
highly suspect it is a protocol issue and either or both devices become 
confused and restart link training in an attempt to recover.

 Sadly ASMedia and Pericom declined to comment, all Delock stated was 
their device is fine and works correctly with millions of systems out 
there, and SiFive guys while sympathetic do not have the expertise 
required to further debug this issue.

> The ASM2824 has been around for a while and seems to be used in other
> systems, e.g., https://linux-hardware.org/?id=pci:1b21-2824, so if
> it's an ASM2824 issue, we should see it elsewhere, too.

 Also e.g.: <https://www.amazon.com/dp/B07PRN2QCV> and I could get it for 
further experiments, but somehow I lack the incentive to spend any money 
on a device and therefore indirectly pay its manufacturer where said 
manufacturer does not bother to respond to a legitimate issue report.

> > However the link continues oscillating between the two speeds, at the 
> > rate of 34-35 times per second, with link training reported repeatedly 
> > active ~84% of the time, e.g.:
> > 
> > 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> > [...]
> > 	Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> > [...]
> > 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> > [...]
> > 		LnkSta:	Speed 5GT/s (downgraded), Width x1 (ok)
> > 			TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> > [...]
> > 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> > 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > 			 Compliance De-emphasis: -6dB
> > [...]
> > 
> > Forcibly limiting the target link speed to 2.5GT/s with the upstream 
> > ASM2824 device makes the two switches communicate correctly however:
> > 
> > 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> > [...]
> > 	Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> > [...]
> > 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> > [...]
> > 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
> > 			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> > [...]
> > 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> > 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > 			 Compliance De-emphasis: -6dB
> > [...]
> > 
> > and then:
> > 
> > 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> > [...]
> > 	Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> > [...]
> > 	Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> > [...]
> > 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (downgraded)
> > 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > [...]
> > 		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> > 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > 			 Compliance De-emphasis: -6dB
> > [...]
> 
> I think the above is supposed to show 02:03.0 operating at 5GT/s or
> 2.5GT/s, and 05:00.0 operating at 2.5GT/s.  But there's a lot of noise
> that makes it hard to pick out what's important.

 With the 2.5GT/s clamp in place for 02:03.0 (Target Link Speed: 2.5GT/s 
in LnkCtl2) it is correctly shown that the device operates at 2.5GT/s 
(Speed 2.5GT/s (downgraded) in LnkSta).  Conversely 05:00.0 has no clamp 
in place (Target Link Speed: 5GT/s in LnkCtl2), so it only operates at 
2.5GT/s (Speed 2.5GT/s (downgraded) in LnkSta) due to the clamp upstream.

> >  NB the corresponding U-boot change has since gone in with commit 
> > a398a51ccc68 ("pci: Work around PCIe link training failures").  See 
> 
> Please include a URL for this commit.  Without knowing where to use
> it, the SHA1 by itself isn't much use.

 Sure: <https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68>.

> > +static void pcie_downstream_link_retrain(struct pci_dev *dev)
> > +{
> > +	u16 lnksta, lnkctl2;
> > +	u8 pos;
> > +
> > +	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
> > +	WARN_ON(!pos);
> > +	if (!pos)
> > +		return;
> > +
> > +	pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
> > +	pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);
> > +	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> > +	    PCI_EXP_LNKSTA_LBMS) {
> > +		unsigned long timeout;
> > +		u16 lnkctl;
> > +
> > +		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s...\n");
> > +
> > +		pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> > +		lnkctl |= PCI_EXP_LNKCTL_RL;
> > +		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> > +		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> > +		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
> > +		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
> > +
> > +		timeout = jiffies + msecs_to_jiffies(200);
> > +		do {
> > +			pci_read_config_word(dev, pos + PCI_EXP_LNKSTA,
> > +					     &lnksta);
> > +			if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> > +				break;
> > +			usleep_range(10000, 20000);
> > +		} while (time_before(jiffies, timeout));
> > +
> > +		pci_info(dev, "retraining %s!\n",
> > +			 lnksta & PCI_EXP_LNKSTA_DLLLA ?
> > +			 "succeeded" : "failed");
> > +	}
> 
> Perhaps a place to use pcie_retrain_link() (would require a little
> rework to make it usable outside aspm.c).

 I'd rather rely on PCI_EXP_LNKSTA_DLLLA and not PCI_EXP_LNKSTA_LT here 
(maybe it's OK for ASPM).  The thing is with the ASM2824 PCI_EXP_LNKSTA_LT 
will oscillate in the case of an unsuccessful link negotiation, as noted 
in the change description and consequently:

		if (!(reg16 & PCI_EXP_LNKSTA_LT))
			break;

may break out of the loop prematurely.  This is prevented in the U-boot 
version of the workaround (which is not ASM2824-specific and therefore 
does not rely on the presence of PCI_EXP_LNKSTA_DLLLA), by taking samples 
in a busy loop, which is something the firmware can do, but we cannot in a 
running OS, and then doing simple statistics.

 NB in an earlier version of the U-boot workaround I actually iterated 
clamping over all speeds supported rather than at 2.5GT/s only, in which 
case ensuring the link has stabilised was even more important.  But then I 
realised it will be better if the firmware only tries the single reduced 
speed and then leaves the clamp in, taking advantage of the stickiness of 
the bit in case the OS booted does not have a workaround and resetting the 
port with the clamp removed as the OS boots would break the link again.

> > Index: linux-macro/include/linux/pci_ids.h
> > ===================================================================
> > --- linux-macro.orig/include/linux/pci_ids.h
> > +++ linux-macro/include/linux/pci_ids.h
> > @@ -2545,6 +2545,7 @@
> >  #define PCI_SUBDEVICE_ID_QEMU            0x1100
> >  
> >  #define PCI_VENDOR_ID_ASMEDIA		0x1b21
> > +#define PCI_DEVICE_ID_ASMEDIA_ASM2824	0x2824
> 
> We only add to pci_ids.h when the ID is used in more than one place.

 I find it weird as working with magic numbers sprinkled throughout code 
is problematic (and I spent a considerable amount of time to clean such 
stuff up in the MIPS port), but won't insist if that's a prerequisite for 
acceptance.

 Please let me know if you need the change updated beyond the removal of
the pci_ids.h entry and I will post a new version.  Also let me know if 
you need any further details.

 Thank you for your review.

  Maciej
View attachment "lspci-xxxx-broken.log" of type "text/plain" (125080 bytes)

View attachment "dmesg-broken.log" of type "text/plain" (19049 bytes)

View attachment "lspci-xxxx-fixed.log" of type "text/plain" (224314 bytes)

View attachment "dmesg-fixed.log" of type "text/plain" (24855 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ