lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87b08879-4f1a-e91d-861a-0a1af4ad46fc@linux.alibaba.com>
Date:   Fri, 13 Jan 2023 11:58:06 +0800
From:   Yang Su <yang.su@...ux.alibaba.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        alex.williamson@...hat.com, matthew@....cx,
        jbarnes@...tuousgeek.org, greg@...ah.com, patchwork-bot@...nel.org,
        andrew.murray@....com, lorenzo.pieralisi@....com, robh@...nel.org,
        kw@...ux.com, linux-kernel@...r.kernel.org,
        Lukas Wunner <lukas@...ner.de>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH 1/1] PCI: Tune secondary bus reset time for PCIe

Sorry to interrupt, this email is reformat as plaint text, I want to 
test this email

whether can send linux-kernel@...r.kernel.org .


Hi Bjorn,

I think my patch is different from Lucas, because I use 
pcie_wait_for_link not

pci_bridge_wait_for_secondary_bus, my patch is similar to the process logic

in pci_bridge_wait_for_secondary_bus which also call pcie_wait_for_link.


pcie_wait_for_link wait fixed 100ms and then wait the data link is 
ready, but

pci_bridge_wait_for_secondary_bus call pcie_wait_for_link wait time depends

on the devices max waiting time in bus, the calculate max time having a bug

as below,


In pci_bridge_wait_for_secondary_bus, pci_bus_max_d3cold_delay will take 
count of wrong time delay,

such as NVIDIA GPU T4 is not pci bridge, so the subordinate is none, 
pci_bus_max_d3cold_delay

set the min_delay is 100, max_delay is 0, here is the bug, after 
list_for_each_entry() in pci_bus_max_d3cold_delay,

the min_delay will be 0, the max_delay also 0, the 
pci_bus_max_d3cold_delay return is surely 0.


Last, I request Ravi Kishore Koppuravuri to test my patch to see Intel 
Ponte Vecchio HPC GPU

whether can work, I think my patch will wait enough time to be ready 
after secondary bus reset.


I have tested NVIDIA GPU T4 and NVIDIA GPU A100 which my patch is ok, 
but I think there is need

more test to validate my patch. But the fact is I do not have enough 
device to validate. If Ravi Kishore Koppuravuri

can help me test, the patch test will be more enough, and I would be 
grateful for test. Thank you very much!


Yang


On 2023/1/13 06:48, Bjorn Helgaas wrote:
> [+cc Lukas, Mika]
>
> Hi Yang Su,
>
> Thank you for your patch!
>
> On Sun, Jan 01, 2023 at 05:22:33PM +0800, Yang Su wrote:
>> On PCI Express, there will be cases where the new code sleeps far less
>> than the 1s being replaced by this patch. This should be okay, because
>> PCI Express Base Specification Revision 5.0 Version 1.0 (May 22, 2019)
>> in Section 6.6.1 "Conventional Reset" only notes 100ms as the minimum
>> waiting time. After this time, the OS is permitted to issue
>> Configuration Requests, but it is possible that the device responds
>> with Configuration Request Retry Status (CRS) Completions, rather than
>> Successful Completion. Returning CRS can go on for up to 1 second after
>> a Conventional Reset (such as SBR) before the OS can consider the device
>> broken. This additional wait is handled by pci_dev_wait. Besides,
>> this patch also cover PCI and PCI-X after device reset waiting Tpvrh 1000ms.
>>
>> Currently, the only callchain that lands in the function modified by
>> this patch which invokes one out of two versions of pcibios_reset_secondary_bus
>> that both end with a call to pci_reset_secondary_bus.
>>
>> Afterwards, pci_reset_secondary_bus always invokes pci_dev_wait
>> which wait for the device to return a non-CRS completion.
>>
>> Signed-off-by: Yang Su <yang.su@...ux.alibaba.com>
>> ---
>>   drivers/pci/pci.c | 36 +++++++++++++++++++++++++++++++++---
>>   1 file changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index fba95486caaf..8e4899755718 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5063,10 +5063,40 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>>   	 * Trhfa for conventional PCI is 2^25 clock cycles.
>>   	 * Assuming a minimum 33MHz clock this results in a 1s
>>   	 * delay before we can consider subordinate devices to
>> -	 * be re-initialized.  PCIe has some ways to shorten this,
>> -	 * but we don't make use of them yet.
>> +	 * be re-initialized.
>> +	 *
>> +	 * For conventional PCI needing 1s delay after bus reset.
>> +	 * Using pci_is_pcie to judge the bus is pci or pcie.
>> +	 * If the bus is pci, sleeping 1s to wait device is ready.
>> +	 *
>> +	 * And if the bus is pcie, PCI Express Base Specification Revision 2.0
>> +	 * (December 20, 2006) in Section 6.6.1 "Conventional Reset" only notes
>> +	 * 100ms as the minimum waiting time, the same as the newer PCIe spec
>> +	 * PCI Express Base Specification Revision 3.0 Version 1.a (December 7, 2015)
>> +	 * and PCI Express Base Specification Revision 5.0 Version 1.0 (May 22, 2019).
>> +	 * With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
>> +	 * software must wait a minimum of 100 ms after Link training completes before
>> +	 * sending a Configuration Request to the device immediately below that Port.
>> +	 * After this time, the OS is permitted to issue Configuration Requests,
>> +	 * but it is possible that the device responds with Configuration Request
>> +	 * Retry Status (CRS) Completions, rather than Successful Completion.
>> +	 * Returning CRS can go on for up to 1 second after a Conventional Reset
>> +	 * (such as SBR) before the OS can consider the device. This additional
>> +	 * wait is handled by pci_dev_wait.
>> +	 *
>> +	 * Currently, the only callchain that lands in the function modified by
>> +	 * this patch starts at pci_bridge_secondary_bus_reset which invokes
>> +	 * one out of two versions of pcibios_reset_secondary_bus that both end
>> +	 * with a call to pci_reset_secondary_bus.
>> +	 * Afterwards, pci_bridge_secondary_bus_reset always invokes pci_dev_wait.
>>   	 */
>> -	ssleep(1);
>> +	if (pci_is_pcie(dev))
>> +		if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT)
>> +			msleep(100);
>> +		else
>> +			pcie_wait_for_link(dev, true);
>> +	else
>> +		ssleep(1);
> This code is also updated by Lukas' patch at
> https://lore.kernel.org/r/bd6ac49d60c1ca6fe5c27c2fa54b78d70a8ba07b.1672511017.git.lukas@wunner.de,
> which is pretty much ready to go.
>
> Can you take a look at that series and see whether it solves the same
> problem you're solving here?  And if not, can you provide feedback on
> what would still be needed?
>
> If you do need something on top of Lukas' series, please CC him if you
> post a revised patch.
>
> Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ