Date: Mon, 17 Jun 2024 18:18:41 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
	Lukas Wunner <lukas@...ner.de>,
	Yazen Ghannam <yazen.ghannam@....com>,
	Ilpo Jarvinen <ilpo.jarvinen@...ux.intel.com>,
	Bowman Terry <terry.bowman@....com>,
	Hagan Billy <billy.hagan@....com>,
	Simon Guinot <simon.guinot@...gate.com>,
	"Maciej W . Rozycki" <macro@...am.me.uk>,
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
Subject: Re: [PATCH v2] PCI: pciehp: Clear LBMS on hot-remove to prevent link
 speed reduction

On Mon, Jun 17, 2024 at 03:51:57PM -0700, Smita Koralahalli wrote:
> Hi Bjorn,
> 
> On 6/17/2024 1:09 PM, Bjorn Helgaas wrote:
> > On Thu, May 16, 2024 at 08:47:48PM +0000, Smita Koralahalli wrote:
> > > Clear Link Bandwidth Management Status (LBMS) if set, on a hot-remove
> > > event.
> > > 
> > > The hot-remove event could result in target link speed reduction if LBMS
> > > is set, due to a delay in Presence Detect State Change (PDSC) happening
> > > after a Data Link Layer State Change event (DLLSC).
> > > 
> > > In reality, PDSC and DLLSC events rarely arrive simultaneously. PDSC
> > > can sometimes be so late that the slot has already been powered down
> > > by the DLLSC event alone, and the delayed PDSC is then falsely
> > > interpreted as an interrupt raised to turn the slot on. This false
> > > attempt to power the slot on without a link forces the kernel, if
> > > LBMS is set, to retrain the link to a lower speed in order to
> > > re-establish it, thereby bringing down the link speed [2].
> > 
> > Not sure we need PDSC and DLLSC details to justify clearing LBMS if it
> > has no meaning for an empty slot?
> 
> I'm also trying to answer your question below here...
> 
> > I guess the slot is empty, so retraining
> > is meaningless and will always fail.  Maybe avoiding it avoids a
> > delay?  Is the benefit that we get rid of the message and a delay?
> 
> The benefit of this patch is to "avoid link speed drops" on a hot-remove
> event if LBMS is set and DLLLA is clear. But I'm not trying to solve
> delay issues here.
> 
> I included the PDSC and DLLSC details as they are the cause for link speed
> drops on a remove event. On an empty slot, DLLLA is cleared and LBMS may or
> may not be set. And, we see cases of link speed drops here, if PDSC happens
> on an empty slot.
> 
> We know for a fact that the slot becomes empty if either event, PDSC or
> DLLSC, occurs. Neither of them waits for the other before bringing down
> the device and marking the slot as "empty". That is why I was also
> considering waiting on both PDSC and DLLSC before bringing down the
> device, as I mentioned in my comments on v1. We could eliminate the link
> speed drops by taking that approach as well, but then we would have to
> handle cases where PDSC is hardwired to zero.
> 
> On our AMD systems, we expect to see both events on a hot-remove event.
> But mostly we see DLLSC happening first, which does the job of marking
> the slot empty. Now, if the PDSC event is delayed too much and occurs
> after the slot becomes empty, the kernel misinterprets PDSC as the signal
> to re-initialize the slot, and this is the sequence of steps it takes:
> 
> pciehp_handle_presence_or_link_change()
>   pciehp_enable_slot()
>     __pciehp_enable_slot()
>       board_added()
>         pciehp_check_link_status()
>           pcie_wait_for_link()
>             pcie_wait_for_link_delay()
>               pcie_failed_link_retrain()
> 
> While doing so, it hits the case of DLLLA clear and LBMS set, and brings
> down the speed.

So I guess LBMS currently remains set after a device has been removed,
so the slot is empty, and later when a device is hot-added, *that*
device sees a lower-than-expected link speed?

> The issue of PDSC and DLLSC not occurring simultaneously was known
> before, and it didn't break anything functionally, as the kernel would
> just exit with the message "No link" at pciehp_check_link_status().
> 
> However, commit a89c82249c37 ("PCI: Work around PCIe link training
> failures") introduced link speed downgrades to re-establish links if
> LBMS is set and DLLLA is clear. This caused devices that previously
> operated at higher speeds to operate at 2.5GT/s after being plugged back
> in following a hot-remove.
> 
> > > According to PCIe r6.2 sec 7.5.3.8 [1], it can be derived that LBMS
> > > cannot be set for an unconnected link; if set, it indicates that
> > > there is actually a device down an inactive link.
> > 
> > I see that r6.2 added an implementation note about DLLSC, but I'm not
> > a hardware person and can't follow the implication about a device
> > present down an inactive link.
> > 
> > I guess it must be related to the fact that LBMS indicates either
> > completion of link retraining or a change in link speed or width
> > (which would imply presence of a downstream device).  But in both
> > cases I assume the link would be active.
> > 
> > But IIUC LBMS is set by hardware but never cleared by hardware, so if
> > we remove a device and power off the slot, it doesn't seem like LBMS
> > could be telling us anything useful (what could we do in response to
> > LBMS when the slot is empty?), so it makes sense to me to clear it.
> > 
> > It seems like pciehp_unconfigure_device() does sort of PCI core and
> > driver-related things and possibly could be something shared by all
> > hotplug drivers, while remove_board() does things more specific to the
> > hotplug model (pciehp, shpchp, etc).
> > 
> >  From that perspective, clearing LBMS might fit better in
> > remove_board().  In that case, I wonder whether it should be done
> > after turning off slot power?  This patch clears it *before* turning
> > off the power, so I wonder if hardware could possibly set it again
> > before the poweroff?
> 
> Yeah, after talking to HW people I realized that HW could possibly set
> LBMS at any time while slot power is on. Will change it to clear LBMS in
> remove_board().
> 
> > > However, hardware could have already set LBMS while the device was
> > > connected to the port, i.e. when the state was DL_Up or DL_Active.
> > > Some hardware may even have attempted a retrain, going into recovery
> > > mode just before transitioning to DL_Down.
> > > 
> > > Thus LBMS, once set, is never cleared and might force software to
> > > drop the link speed when there is no link [2].
> > > 
> > > Dmesg before:
> > > 	pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
> > > 	pcieport 0000:20:01.1: pciehp: Slot(59): Card present
> > > 	pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
> > > 	pcieport 0000:20:01.1: retraining failed
> > > 	pcieport 0000:20:01.1: pciehp: Slot(59): No link
> > > 
> > > Dmesg after:
> > > 	pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
> > > 	pcieport 0000:20:01.1: pciehp: Slot(59): Card present
> > > 	pcieport 0000:20:01.1: pciehp: Slot(59): No link
> > 
> > I'm a little confused about the problem being solved here.  Obviously
> > the message is extraneous.  I guess the slot is empty, so retraining
> > is meaningless and will always fail.  Maybe avoiding it avoids a
> > delay?  Is the benefit that we get rid of the message and a delay?
> > 
> > > [1] PCI Express Base Specification Revision 6.2, Jan 25 2024.
> > >      https://members.pcisig.com/wg/PCI-SIG/document/20590
> > > [2] Commit a89c82249c37 ("PCI: Work around PCIe link training failures")
> > > 
> > > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> > 
> > Lukas asked about this; did you confirm that it is related?  Asking
> > because the Fixes tag may cause this to be backported along with
> > a89c82249c37.
> 
> Yeah, without this patch we won't see link speed drops.
> 
> Thanks,
> Smita
> 
> > 
> > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
> > > ---
> > > Link to v1:
> > > https://lore.kernel.org/all/20240424033339.250385-1-Smita.KoralahalliChannabasappa@amd.com/
> > > 
> > > v2:
> > > 	Cleared LBMS unconditionally. (Ilpo)
> > > 	Added Fixes Tag. (Lukas)
> > > ---
> > >   drivers/pci/hotplug/pciehp_pci.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
> > > index ad12515a4a12..dae73a8932ef 100644
> > > --- a/drivers/pci/hotplug/pciehp_pci.c
> > > +++ b/drivers/pci/hotplug/pciehp_pci.c
> > > @@ -134,4 +134,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
> > >   	}
> > >   	pci_unlock_rescan_remove();
> > > +
> > > +	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
> > > +				   PCI_EXP_LNKSTA_LBMS);
> > >   }
> > > -- 
> > > 2.17.1
> > > 
