Date: Tue, 25 Jun 2024 13:20:10 -0700
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Bjorn Helgaas <helgaas@...nel.org>, linux-pci@...r.kernel.org,
 linux-kernel@...r.kernel.org, Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
 Yazen Ghannam <yazen.ghannam@....com>,
 Ilpo Jarvinen <ilpo.jarvinen@...ux.intel.com>,
 Bowman Terry <terry.bowman@....com>, Hagan Billy <billy.hagan@....com>,
 Simon Guinot <simon.guinot@...gate.com>,
 "Maciej W . Rozycki" <macro@...am.me.uk>,
 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
Subject: Re: [PATCH v2] PCI: pciehp: Clear LBMS on hot-remove to prevent link
 speed reduction

Hi all,

Sorry for the delay here. It took some time to find a system to run 
experiments on. Comments inline.

On 6/19/2024 12:47 AM, Lukas Wunner wrote:
> On Tue, Jun 18, 2024 at 02:23:21PM -0700, Smita Koralahalli wrote:
>> On 6/18/2024 11:51 AM, Smita Koralahalli wrote:
>>>>>> But IIUC LBMS is set by hardware but never cleared by hardware, so if
>>>>>> we remove a device and power off the slot, it doesn't seem like LBMS
>>>>>> could be telling us anything useful (what could we do in response to
>>>>>> LBMS when the slot is empty?), so it makes sense to me to clear it.
>>>>>>
>>>>>> It seems like pciehp_unconfigure_device() does sort of PCI core and
>>>>>> driver-related things and possibly could be something shared by all
>>>>>> hotplug drivers, while remove_board() does things more specific to the
>>>>>> hotplug model (pciehp, shpchp, etc).
>>>>>>
>>>>>>  From that perspective, clearing LBMS might fit better in
>>>>>> remove_board(). In that case, I wonder whether it should be done
>>>>>> after turning off slot power? This patch clears it *before* turning
>>>>>> off the power, so I wonder if hardware could possibly set it again
>>>>>> before the poweroff?
>>
>> While clearing LBMS in remove_board() here:
>>
>> if (POWER_CTRL(ctrl)) {
>> 	pciehp_power_off_slot(ctrl);
>> +	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
>> +				   PCI_EXP_LNKSTA_LBMS);
>>
>> 	/*
>> 	 * After turning power off, we must wait for at least 1 second
>> 	 * before taking any action that relies on power having been
>> 	 * removed from the slot/adapter.
>> 	 */
>> 	msleep(1000);
>>
>> 	/* Ignore link or presence changes caused by power off */
>> 	atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
>> 		   &ctrl->pending_events);
>> }
>>
>> This can happen too, right? I.e., just after the slot power-off and
>> before the LBMS clearing, a PDC/PDSC event could fire. Then
>> pciehp_handle_presence_or_link_change() would hit the "OFF_STATE" case
>> and proceed with pciehp_enable_slot() ... pcie_failed_link_retrain(),
>> and ultimately the link speed drops.
>>
>> So, I added the clearing just before turning off the slot. Let me know
>> if I'm thinking about it right.

I guess I should have experimented before putting this comment out.

After talking to the HW/FW teams, I understood that none of our CRBs 
support a power controller for NVMe devices, which means the "Power 
Controller Present" bit in the Slot Capabilities register is always 
false. That's what makes it a "surprise removal." If the OS were 
notified beforehand and there were a power controller attached, the OS 
would turn off the power via the Slot Control register. That's an 
"orderly" removal. So essentially, the entire "if (POWER_CTRL(ctrl))" 
block will never be executed for a surprise removal in our case.

There could be board designs other than ours, with power controllers 
for the NVMe devices, that I'm not aware of.
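
For reference, my understanding is that POWER_CTRL() just tests the 
Power Controller Present bit cached from Slot Capabilities, roughly 
like this (paraphrased from drivers/pci/hotplug/pciehp.h, so treat it 
as a sketch rather than an exact quote):

/* Sketch: POWER_CTRL() tests the PCP bit in Slot Capabilities */
#define POWER_CTRL(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_PCP)

Since that bit is never set on our boards, the whole power-off block in 
remove_board() is skipped.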
> 
> This was added by 3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes
> after powering off a slot").  You can try reproducing it by writing "0"
> to the slot's "power" file in sysfs, but your hardware needs to support
> slot power.
> 
> Basically the idea is that after waiting for 1 sec, chances are very low
> that any DLLSC or PDSC events caused by removing slot power will still
> occur.

The PDSC events in our case aren't caused by removing slot power. For 
us, a PDSC will always fire on a surprise removal, along with a DLLSC. 
But this PDSC is delayed and arrives after the DLLSC has been handled 
and ctrl->state has been set to OFF_STATE in pciehp_disable_slot(). So 
the PDSC is mistaken for a reason to enable the slot via 
pciehp_enable_slot() inside pciehp_handle_presence_or_link_change().
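
To illustrate the race as I understand it, here is a heavily condensed 
paraphrase of pciehp_handle_presence_or_link_change() in pciehp_ctrl.c 
(simplified by me, so the details may be off):

	/* The DLLSC from the surprise removal is handled first: */
	switch (ctrl->state) {
	case ON_STATE:
		ctrl->state = POWEROFF_STATE;
		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
		break;
	}

	/*
	 * The delayed PDSC then re-enters with ctrl->state == OFF_STATE
	 * and is treated as a newly inserted card:
	 */
	switch (ctrl->state) {
	case OFF_STATE:
		ctrl->state = POWERON_STATE;
		pciehp_enable_slot(ctrl); /* ... pcie_failed_link_retrain() */
		break;
	}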
> 
> Arguably the same applies to LBMS changes, so I'd recommend likewise
> clearing stale LBMS after the msleep(1000).
> 
> pciehp_ctrl.c only contains the state machine and higher-level logic of
> the hotplug controller and all the actual register accesses are in helpers
> in pciehp_hpc.c.  So if you want to do it picture-perfectly, add a helper
> in pciehp_hpc.c to clear LBMS and call that from remove_board().
> 
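Understood. If we go this way, I could add something like the below in 
v3 (untested sketch; the helper name is my invention):

/* pciehp_hpc.c */
void pciehp_clear_lbms(struct controller *ctrl)
{
	struct pci_dev *pdev = ctrl_dev(ctrl);

	/* LBMS is RW1C, so writing the bit clears it */
	pcie_capability_write_word(pdev, PCI_EXP_LNKSTA,
				   PCI_EXP_LNKSTA_LBMS);
}

and call it from remove_board() after the msleep(1000), as you suggest.
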
> That all being said, I'm wondering how this plays together with Ilpo's
> bandwidth control driver?
> 
> https://lore.kernel.org/all/20240516093222.1684-1-ilpo.jarvinen@linux.intel.com/

I have yet to do a thorough reading of Ilpo's bandwidth control driver. 
Ilpo, please correct me if I misstate something, as I don't have a 
thorough understanding yet.

Ilpo's bandwidth controller also checks whether the LBMS count is 
greater than zero before bringing down link speeds when 
CONFIG_PCIE_BWCTRL is enabled. If it is disabled, the code follows the 
default path of checking the LBMS bit in the Link Status register. So 
if CONFIG_PCIE_BWCTRL is disabled by default, we continue to see link 
speed drops. And even if BWCTRL is enabled, the LBMS count is 
incremented to 1 in pcie_bwnotif_enable(), so pcie_lbms_seen() would 
likely return true, bringing down speeds here as well if DLLLA is 
clear?
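
For context, this is roughly how I read that check (my paraphrase of 
the series, with the counter access simplified; it may not match the 
latest revision):

/* Rough paraphrase of pcie_lbms_seen() from the bwctrl series */
static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
{
#ifdef CONFIG_PCIE_BWCTRL
	/*
	 * bwctrl counts LBMS assertions instead of reading the bit;
	 * lbms_count() stands in for however the series exposes that.
	 */
	return lbms_count(dev) > 0;
#else
	return lnksta & PCI_EXP_LNKSTA_LBMS;
#endif
}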

> 
> IIUC, the bandwidth control driver will be in charge of handling LBMS
> changes.  So clearing LBMS behind the bandwidth control driver's back
> might be problematic.  Ilpo?
> 
> Also, since you've confirmed that this issue is fallout from
> a89c82249c37 ("PCI: Work around PCIe link training failures"),
> I'm wondering if the logic introduced by that commit can be
> changed so that the quirk is applied more narrowly, i.e. *not*
> applied to unaffected hardware, such as AMD's hotplug ports.
> That would avoid the need to undo the effect of the quirk and
> work around the downtraining you're seeing.
> 
> Maciej, any ideas?

Yeah, I'm okay with going down that route as well. Any ideas would be 
helpful here.
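
One illustrative direction (untested, just to make the idea concrete): 
pcie_failed_link_retrain() could bail out on hotplug ports whose slot 
currently reports no presence, e.g.:

	/* Untested idea: don't apply the clamp to an empty hotplug slot */
	if (dev->is_hotplug_bridge) {
		u16 sltsta;

		pcie_capability_read_word(dev, PCI_EXP_SLTSTA, &sltsta);
		if (!(sltsta & PCI_EXP_SLTSTA_PDS))
			return false;
	}

But I'd defer to Maciej on whether that would regress the platforms the 
quirk was added for.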

Thanks
Smita
> 
> Thanks,
> 
> Lukas
> 
