lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <417720e7-c793-4c36-a542-a7e937e5a3cf@163.com>
Date: Fri, 17 Jan 2025 21:53:09 +0800
From: Jiwei Sun <sjiwei@....com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: macro@...am.me.uk, bhelgaas@...gle.com, linux-pci@...r.kernel.org,
 LKML <linux-kernel@...r.kernel.org>, helgaas@...nel.org,
 Lukas Wunner <lukas@...ner.de>, ahuang12@...ovo.com, sunjw10@...ovo.com,
 jiwei.sun.bj@...com, sunjw10@...look.com
Subject: Re: [PATCH v2 2/2] PCI: reread the Link Control 2 Register before
 using



On 1/16/25 22:12, Ilpo Järvinen wrote:
> On Wed, 15 Jan 2025, Jiwei Sun wrote:
> 
>> From: Jiwei Sun <sunjw10@...ovo.com>
>>
>> Since commit de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set
>> PCIe Link Speed"), the local variable "lnkctl2" is not changed after
>> reading from PCI_EXP_LNKCTL2 in the pcie_failed_link_retrain(). It might
>> cause that the removing 2.5GT/s downstream link speed restriction codes
>> are not executed.
>>
>> Reread the Link Control 2 Register before using.
>>
>> Fixes: de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
>> Suggested-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
>> Signed-off-by: Jiwei Sun <sunjw10@...ovo.com>
>> ---
>>  drivers/pci/quirks.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 76f4df75b08a..02d2e16672a8 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -123,6 +123,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
>>  			return ret;
>>  		}
>>  
>> +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
>>  		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
>>  	}
> 
> I started to wonder if there would be a better way to fix this. If I've 
> understood Maciej's intent correctly, there are two cases when the 2nd 
> step (the one lifting the 2.5GT/s restriction) should be used:
> 
> 1) TLS is 2.5GT/s at the entry to the quirk and it's an ASMedia switch.
> 
> 2) If the quirk lowered TLS to 2.5GT/s and the link became up fine 
> because of that. This also currently checks for ASMedia but in the v1 you 
> also proposed to change that. We know it works in some cases with ASMedia 
> but are unsure what happens with other switches.
> 
> In the case 2, we don't need to check TLS since the function itself asked 
> TLS to be lowered which did not return an error, so we know the speed was 
> lowered so why spend time on rereading the register? A simple local 
> variable could convey the same information.

Uh, maybe my commit messages wasn't clear. My understanding is as
following,

  98 int pcie_failed_link_retrain(struct pci_dev *dev)
  99 {  
			 ...
 111         pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
 112         pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
 113         if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
 114                 u16 oldlnkctl2 = lnkctl2;
 115         
 116                 pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
 117         
 118                 ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
 119                 if (ret) {
 120                         pci_info(dev, "retraining failed\n");
 121                         pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
 122                                               true);
 123                         return ret;
 124                 }
 125         
 126                 pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
 127         }
 128         
 129         if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
 130             (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
 131             pci_match_id(ids, dev)) {
 132                 u32 lnkcap;
 133         
 134                 pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
 135                 pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
 136                 ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);


Consider the following scenario:
After the execution of line 111, lnkctl2.tls is 0x3 (Gen 3, such as
for the device ASMedia ASM2824). The function call
pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false) on line 118 returns 0,
and the tls field of the link control 2 register is modified to 0x1 
(corresponding to 2.5GT/s).

However, within this section of code, lnkctl2 is not modified (after 
reading from register on line 111) at all and remains 0x5. This results 
in the condition on line 130 evaluating to 0 (false), which in turn 
prevents the code from line 132 onward from being executed. The removing
2.5GT/s downstream link speed restriction work can not be done.

Before the commit de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed()
to set PCIe Link Speed"), after setting the minimum speed and successfully
completing retraining, an attempt would be made to clear the minimum speed.
IIUC, even for this device ASMedia ASM2824, it is necessary to reread the
link control 2 register after successful retraining on line 118.

> 
> In case 1, I don't think ASMedia check should be removed. It was about a 
> case where FW has a workaround to lower the speed (IIRC). If Link Speed is 
> 2.5GT/s at entry but it's not ASMedia switch, there's no 2.5GT/s 
> restriction to lift.
> 
> 
> As a general comment abouth your test case, I don't think this Target 
> Speed quirk really is compatible with your test case. The quirk makes 
> assumption that the Link Up/Down status changes are because of the changes 
> the quirk made itself but your rapid add/remove test alters the 
> environment, which produces unrelated Link Up/Down status changes that get 
> picked up by the quirk (a false signal).

Yes, it seems that our testing has entered an inappropriate code flow, 
but how to avoid it is a troublesome issue.

Thanks,
Regards,
Jiwei
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ