[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <73fd7b2d-9256-9eba-70be-d69ea336fd67@amd.com>
Date: Tue, 25 Jun 2024 13:20:10 -0700
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Bjorn Helgaas <helgaas@...nel.org>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
Yazen Ghannam <yazen.ghannam@....com>,
Ilpo Jarvinen <ilpo.jarvinen@...ux.intel.com>,
Bowman Terry <terry.bowman@....com>, Hagan Billy <billy.hagan@....com>,
Simon Guinot <simon.guinot@...gate.com>,
"Maciej W . Rozycki" <macro@...am.me.uk>,
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
Subject: Re: [PATCH v2] PCI: pciehp: Clear LBMS on hot-remove to prevent link
speed reduction
Hi all,
Sorry for the delay here. Took some time to find a system to run
experiments. Comments inline.
On 6/19/2024 12:47 AM, Lukas Wunner wrote:
> On Tue, Jun 18, 2024 at 02:23:21PM -0700, Smita Koralahalli wrote:
>> On 6/18/2024 11:51 AM, Smita Koralahalli wrote:
>>>>>> But IIUC LBMS is set by hardware but never cleared by hardware, so if
>>>>>> we remove a device and power off the slot, it doesn't seem like LBMS
>>>>>> could be telling us anything useful (what could we do in response to
>>>>>> LBMS when the slot is empty?), so it makes sense to me to clear it.
>>>>>>
>>>>>> It seems like pciehp_unconfigure_device() does sort of PCI core and
>>>>>> driver-related things and possibly could be something shared by all
>>>>>> hotplug drivers, while remove_board() does things more specific to the
>>>>>> hotplug model (pciehp, shpchp, etc).
>>>>>>
>>>>>> From that perspective, clearing LBMS might fit better in
>>>>>> remove_board(). In that case, I wonder whether it should be done
>>>>>> after turning off slot power? This patch clears is *before* turning
>>>>>> off the power, so I wonder if hardware could possibly set it again
>>>>>> before the poweroff?
>>
>> While clearing LBMS in remove_board() here:
>>
>> if (POWER_CTRL(ctrl)) {
>> pciehp_power_off_slot(ctrl);
>> + pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
>> PCI_EXP_LNKSTA_LBMS);
>>
>> /*
>> * After turning power off, we must wait for at least 1 second
>> * before taking any action that relies on power having been
>> * removed from the slot/adapter.
>> */
>> msleep(1000);
>>
>> /* Ignore link or presence changes caused by power off */
>> atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
>> &ctrl->pending_events);
>> }
>>
>> This can happen too right? I.e Just after the slot poweroff and before LBMS
>> clearing the PDC/PDSC could be fired. Then
>> pciehp_handle_presence_or_link_change() would hit case "OFF_STATE" and
>> proceed with pciehp_enable_slot() ....pcie_failed_link_retrain() and
>> ultimately link speed drops..
>>
>> So, I added clearing just before turning off the slot.. Let me know if I'm
>> thinking it right.
I guess I should have experimented before putting this comment out.
After talking to the HW/FW teams, I understood that, none of our CRBs
support power controller for NVMe devices, which means the "Power
Controller Present" in Slot_Cap is always false. That's what makes it a
"surprise removal." If the OS was notified beforehand and there was a
power controller attached, the OS would turn off the power with
SLOT_CNTL. That's an "orderly" removal. So essentially, the entire block
from "if (POWER_CTRL(ctrl))" will never be executed for surprise removal
for us.
There could be board designs outside of us, with power controllers for
the NVME devices, which I'm not aware of.
>
> This was added by 3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes
> after powering off a slot"). You can try reproducing it by writing "0"
> to the slot's "power" file in sysfs, but your hardware needs to support
> slot power.
>
> Basically the idea is that after waiting for 1 sec, chances are very low
> that any DLLSC or PDSC events caused by removing slot power may still
> occur.
PDSC events occurring in our case aren't by removing slot power. It
should/will always happen on a surprise removal along with DLLSC for us.
But this PDSC is been delayed and happens after DLLSC is invoked and
ctrl->state = OFF_STATE in pciehp_disable_slot(). So the PDSC is mistook
to enable slot in pciehp_enable_slot() inside
pciehp_handle_presence_or_link_change().
>
> Arguably the same applies to LBMS changes, so I'd recommend to likewise
> clear stale LBMS after the msleep(1000).
>
> pciehp_ctrl.c only contains the state machine and higher-level logic of
> the hotplug controller and all the actual register accesses are in helpers
> in pciehp_hpc.c. So if you want to do it picture-perfectly, add a helper
> in pciehp_hpc.c to clear LBMS and call that from remove_board().
>
> That all being said, I'm wondering how this plays together with Ilpo's
> bandwidth control driver?
>
> https://lore.kernel.org/all/20240516093222.1684-1-ilpo.jarvinen@linux.intel.com/
I need to yet do a thorough reading of Ilpo's bandwidth control driver.
Ilpo please correct me if I misspeak something as I don't have a
thorough understanding.
Ilpo's bandwidth controller also checks for lbms count to be greater
than zero to bring down link speeds if CONFIG_PCIE_BWCTRL is true. If
false, it follows the default path to check LBMS bit in link status
register. So if, CONFIG_PCIE_BWCTRL is disabled by default we continue
to see link speed drops. Even, if BWCTRL is enabled, LBMS count is
incremented to 1 in pcie_bwnotif_enable() so likely pcie_lbms_seen()
might return true thereby bringing down speeds here as well if DLLLA is
clear?
>
> IIUC, the bandwidth control driver will be in charge of handling LBMS
> changes. So clearing LBMS behind the bandwidth control driver's back
> might be problematic. Ilpo?
>
> Also, since you've confirmed that this issue is fallout from
> a89c82249c37 ("PCI: Work around PCIe link training failures"),
> I'm wondering if the logic introduced by that commit can be
> changed so that the quirk is applied more narrowly, i.e. *not*
> applied to unaffected hardware, such as AMD's hotplug ports.
> That would avoid the need to undo the effect of the quirk and
> work around the downtraining you're seeing.
>
> Maciej, any ideas?
Yeah I'm okay to go down to that approach as well. Any ideas would be
helpful here.
Thanks
Smita
>
> Thanks,
>
> Lukas
>
Powered by blists - more mailing lists