Message-ID: <a9ce9d8d-6ebd-3526-f806-5e9812eae762@amd.com>
Date: Thu, 25 Apr 2024 13:49:28 -0700
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Bjorn Helgaas <helgaas@...nel.org>,
Mahesh J Salgaonkar <mahesh@...ux.ibm.com>, Lukas Wunner <lukas@...ner.de>,
Yazen Ghannam <yazen.ghannam@....com>,
Ilpo Jarvinen <ilpo.jarvinen@...ux.intel.com>,
Bowman Terry <terry.bowman@....com>, Hagan Billy <billy.hagan@....com>,
Simon Guinot <simon.guinot@...gate.com>,
"Maciej W . Rozycki" <macro@...am.me.uk>, pradeepvineshreddy.kodamati@....com
Subject: Re: [PATCH] PCI: pciehp: Clear LBMS on hot-remove to prevent link
speed reduction
Hi,
On 4/24/2024 8:10 PM, Kuppuswamy Sathyanarayanan wrote:
>
> On 4/23/24 8:33 PM, Smita Koralahalli wrote:
>> Clear Link Bandwidth Management Status (LBMS), if set, on a hot-remove
>> event.
>>
>> A hot-remove event could result in a target link speed reduction if
>> LBMS is set, due to a delay in the Presence Detect State Change (PDSC)
>> happening after a Data Link Layer State Change (DLLSC) event.
>
> What is the actual impact? Since this happens during hot-remove,
> and there is no device, the link retrain will fail, right?
That's right. Link retraining fails, resulting in reduced link speeds. It
shouldn't attempt link retraining in the first place if there is no device.
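For context, the check in pcie_failed_link_retrain() (the quirk added by
commit a89c82249c37, i.e. [2] below) is roughly the following. This is a
paraphrased sketch, not the verbatim quirks.c code, and
retrain_link_at_2_5GT() is a made-up name standing in for the actual
retrain step:

        u16 lnksta;

        pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
        /*
         * LBMS set while DLLLA is clear is taken to mean "device present
         * behind an inactive link". A stale LBMS after hot-removal looks
         * exactly the same, so the 2.5GT/s retrain fires even though no
         * device is left behind the port.
         */
        if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
            PCI_EXP_LNKSTA_LBMS)
                retrain_link_at_2_5GT(dev);     /* hypothetical helper */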
>
>>
>> In reality, PDSC and DLLSC events rarely arrive simultaneously. The
>> PDSC can sometimes be so late that the slot has already been powered
>> down just by the DLLSC event. The delayed PDSC could then falsely be
>> interpreted as an interrupt raised to turn the slot on. This false
>> process of powering the slot on without a link forces the kernel, if
>> LBMS is set, to retrain the link to a lower speed to re-establish it,
>> thereby bringing down the link speed [2].
>>
>> From PCIe r6.2 sec 7.5.3.8 [1] it can be derived that LBMS cannot be
>> set for an unconnected link, and that if set, it serves the purpose of
>
> I did not find this detail in the spec. Can you copy-paste the lines
> in the spec that mention the case where it cannot be set on an
> unconnected link? All I see are the cases where it can be set.
This is the only line in the Spec: "Link Bandwidth Management Status -
This bit is Set by hardware to indicate that either of the following has
occurred without the Port transitioning through DL_Down status".
I have mostly written down inferences from that statement. Part of those
inferences (probably all of them :)) I picked up from Bjorn's statements
in the link below:
https://lore.kernel.org/all/alpine.DEB.2.21.2306080224280.36323@angie.orcam.me.uk/
Repasting:
"...LBMS cannot be set for an unconnected link, because the bit is only
allowed to be set for an event observed that has happened with a port
reporting no DL_Down status at any time throughout the event, which can
only happen with the physical layer up, which of course cannot happen
for an unconnected link....
...
IOW the LBMS bit serves the purpose of indicating that there is
actually a device down an inactive link ...."
>
>> indicating that there is actually a device down an inactive link.
>> However, hardware could have already set LBMS while the device was
>> connected to the port, i.e. when the state was DL_Up or DL_Active. Some
>
> Isn't LBMS only set during the DL_Down transition? Why would it be
> set during DL_Up?
The statement in the spec is very confusing :/ LBMS can only be set while
the state is not DL_Down, but it could already have been set before the
port enters DL_Down.
I tried to collect some points on this here:
https://lore.kernel.org/linux-pci/4852519a-c941-aa0c-2912-f6383c708ade@amd.com/
Hoping I'm on the right track!
Thanks
Smita
>
>> hardware may even have attempted a retrain, entering recovery mode,
>> just before transitioning to DL_Down.
>>
>> Thus the LBMS bit, once set, is never cleared, and might force software
>> to drop the link speed when there is no link [2].
>>
>> Dmesg before:
>> pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
>> pcieport 0000:20:01.1: pciehp: Slot(59): Card present
>> pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
>> pcieport 0000:20:01.1: retraining failed
>> pcieport 0000:20:01.1: pciehp: Slot(59): No link
>>
>> Dmesg after:
>> pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
>> pcieport 0000:20:01.1: pciehp: Slot(59): Card present
>> pcieport 0000:20:01.1: pciehp: Slot(59): No link
>>
>> [1] PCI Express Base Specification Revision 6.2, Jan 25 2024.
>> https://members.pcisig.com/wg/PCI-SIG/document/20590
>> [2] Commit a89c82249c37 ("PCI: Work around PCIe link training failures")
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
>> ---
>> 1. This should be applied on top of the fixes for link retrain status
>> in pcie_wait_for_link_delay():
>> https://patchwork.kernel.org/project/linux-pci/list/?series=824858
>> https://lore.kernel.org/linux-pci/53b2239b-4a23-a948-a422-4005cbf76148@linux.intel.com/
>>
>> Without those fixes, the patch output would be:
>> pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
>> pcieport 0000:20:01.1: pciehp: Slot(59): Card present
>> pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
>> pcieport 0000:20:01.1: retraining failed
>> pcieport 0000:20:01.1: pciehp: Slot(59): No device found.
>>
>> 2. I initially attempted to wait for both the PDSC and DLLSC events to
>> happen and only then turn on the slot.
>> Similar to: https://lore.kernel.org/lkml/20190205210701.25387-1-mr.nuke.me@gmail.com/
>> but doing the check before turning on the slot.
>>
>> Something like:
>> -        ctrl->state = POWERON_STATE;
>> -        mutex_unlock(&ctrl->state_lock);
>> -        if (present)
>> +        if (present && link_active) {
>> +                ctrl->state = POWERON_STATE;
>> +                mutex_unlock(&ctrl->state_lock);
>>                  ctrl_info(ctrl, "Slot(%s): Card present\n",
>>                            slot_name(ctrl));
>> -        if (link_active)
>>                  ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>                            slot_name(ctrl));
>> -        ctrl->request_result = pciehp_enable_slot(ctrl);
>> -        break;
>> +                ctrl->request_result = pciehp_enable_slot(ctrl);
>> +                break;
>> +        } else {
>> +                mutex_unlock(&ctrl->state_lock);
>> +                break;
>> +        }
>>
>> This would also avoid printing the lines below on a remove event.
>> pcieport 0000:20:01.1: pciehp: Slot(59): Card present
>> pcieport 0000:20:01.1: pciehp: Slot(59): No link
>>
>> I understand this would likely not be applicable where broken devices
>> hardwire PDS to zero and PDSC would never happen. But I'm open to
>> making changes if this approach is more suitable, because SW cannot
>> directly track HW's interference in attempting link retrain and
>> setting LBMS.
>>
>> 3. I tried introducing a delay similar to pcie_wait_for_presence(), but
>> I was not successful in picking the right numbers, and hence hit the
>> same link speed drop. A sketch of what I tried is below.
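>>
>> A minimal sketch of that bounded PDS poll (the 1 s timeout and 10 ms
>> step are assumed placeholder values; picking the right numbers is
>> exactly the part I could not get right):
>>
>>         static void example_wait_for_presence(struct pci_dev *pdev)
>>         {
>>                 int timeout = 1000;     /* ms, assumed */
>>                 u16 slot_status;
>>
>>                 do {
>>                         pcie_capability_read_word(pdev, PCI_EXP_SLTSTA,
>>                                                   &slot_status);
>>                         if (slot_status & PCI_EXP_SLTSTA_PDS)
>>                                 return;
>>                         msleep(10);
>>                         timeout -= 10;
>>                 } while (timeout > 0);
>>         }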
>>
>> 4. For some reason I was unable to clear LBMS with:
>>
>>         pcie_capability_clear_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
>>                                    PCI_EXP_LNKSTA_LBMS);
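>>
>> (My guess after the fact: LNKSTA status bits such as LBMS are RW1C,
>> i.e. cleared by writing 1. pcie_capability_clear_word() does a
>> read-modify-write that ends up writing 0 to the bit, which has no
>> effect on an RW1C bit. Writing the mask directly instead writes 1 to
>> LBMS and 0 everywhere else, which does clear it:
>>
>>         /* RW1C: writing 1 clears LBMS; writing 0 leaves it unchanged */
>>         pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
>>                                    PCI_EXP_LNKSTA_LBMS);
>>
>> hence the explicit write in the patch below.)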
>> ---
>> drivers/pci/hotplug/pciehp_pci.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
>> index ad12515a4a12..9155fdfd1d37 100644
>> --- a/drivers/pci/hotplug/pciehp_pci.c
>> +++ b/drivers/pci/hotplug/pciehp_pci.c
>> @@ -92,7 +92,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>>  {
>>          struct pci_dev *dev, *temp;
>>          struct pci_bus *parent = ctrl->pcie->port->subordinate;
>> -        u16 command;
>> +        u16 command, lnksta;
>>
>>          ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
>>                   __func__, pci_domain_nr(parent), parent->number);
>> @@ -134,4 +134,10 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>>          }
>>
>>          pci_unlock_rescan_remove();
>> +
>> +        /* Clear LBMS on removal */
>> +        pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA, &lnksta);
>> +        if (lnksta & PCI_EXP_LNKSTA_LBMS)
>> +                pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
>> +                                           PCI_EXP_LNKSTA_LBMS);
>>  }
>