lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1f5abbae-7a7d-402d-ac6e-029cdc3b0d63@linux.ibm.com>
Date: Tue, 30 Sep 2025 10:04:25 -0700
From: Farhan Ali <alifm@...ux.ibm.com>
To: Benjamin Block <bblock@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
        alex.williamson@...hat.com, helgaas@...nel.org, clg@...hat.com,
        schnelle@...ux.ibm.com, mjrosato@...ux.ibm.com
Subject: Re: [PATCH v4 02/10] PCI: Add additional checks for flr reset


On 9/30/2025 3:03 AM, Benjamin Block wrote:
> On Wed, Sep 24, 2025 at 10:16:20AM -0700, Farhan Ali wrote:
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index a3d93d1baee7..327fefc6a1eb 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4576,12 +4576,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>>    */
>>   int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>   {
>> +	u32 reg;
>> +
>>   	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>>   		return -ENOTTY;
>>   
>>   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>   		return -ENOTTY;
>>   
>> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
>> +		pci_warn(dev, "Device unable to do an FLR\n");
>> +		return -ENOTTY;
>> +	}
> Just thinking out loud, not sure whether it make sense, but since you already
> read an up-to-date value from the config space, would it make sense to
> pull the check above `dev->devcap & PCI_EXP_DEVCAP_FLR` below this read, and
> check on the just read `reg`?

My thinking was we could exit early if the device never had FLR 
capability (and so was not cached in devcap). This way we avoid an extra 
PCI read.


>
> Also wondering whether it makes sense to stable-tag this? We've recently seen
> "unpleasant" recovery attempts that look like this in the kernel logs:
>
>      [  663.330053] vfio-pci 0007:00:00.1: timed out waiting for pending transaction; performing function level reset anyway
>      [  664.730051] vfio-pci 0007:00:00.1: not ready 1023ms after FLR; waiting
>      [  665.830023] vfio-pci 0007:00:00.1: not ready 2047ms after FLR; waiting
>      [  667.910023] vfio-pci 0007:00:00.1: not ready 4095ms after FLR; waiting
>      [  672.070022] vfio-pci 0007:00:00.1: not ready 8191ms after FLR; waiting
>      [  680.550025] vfio-pci 0007:00:00.1: not ready 16383ms after FLR; waiting
>      [  697.190023] vfio-pci 0007:00:00.1: not ready 32767ms after FLR; waiting
>      [  730.470021] vfio-pci 0007:00:00.1: not ready 65535ms after FLR; giving up
>
> The VF here was already dead in the water at that point, so I suspect
> `pci_read_config_dword()` might have failed, and so this check would have
> failed, and we wouldn't have "wasted" the minute waiting for a FLR that was
> never going to happen anyway.
I think maybe we could? I don't think this patch fixes anything that's 
"broken" but rather improves the behavior to escalate to other reset 
method if the device is already in a bad state. I will cc stable.

Thanks

Farhan


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ