Date:   Thu, 9 Jun 2022 09:32:02 +0000
From:   "R, Monish Kumar" <monish.kumar.r@...el.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
CC:     "open list:NVM EXPRESS DRIVER" <linux-nvme@...ts.infradead.org>,
        "Sagi Grimberg" <sagi@...mberg.me>,
        "alan.adamson@...cle.com" <alan.adamson@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Yi Zhang <yi.zhang@...hat.com>,
        Keith Busch <kbusch@...nel.org>, "axboe@...com" <axboe@...com>,
        Christoph Hellwig <hch@....de>,
        "Rao, Abhijeet" <abhijeet.rao@...el.com>
Subject: RE: 2 second nvme initialization delay regression in 5.18 [Was: Re:
 [bug report]nvme0: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) MORE DNR
 observed during blktests]

Hi Jason,

I would like to provide the justification for the Samsung X5 SSD fix that was added.
We were facing an SSD enumeration issue: a cold or warm reboot with the device
connected ended in probe failures.

When I debugged this issue, I found that the device was not enumerating
once the system had booted. Moreover, the enumeration issue was
specific to this device.

Based on our analysis, the device fails to enumerate because of its deep power state.
So we added the quirks as a workaround, and they help the device enumerate after a
cold/warm reboot. If new Samsung X5 SSDs work as expected without them, we can remove
the fix.

Regarding the PCI IDs, I have confirmed from the logs that the device reports vendor ID 0x144d
and device ID 0xa808. I am not sure why the Samsung 970 EVO Plus has the same PCI IDs.

Logs for reference, captured after connecting the Samsung X5 SSD:

lspci
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983

dmesg
Line 1478: <6>[  112.838998] pci 0000:04:00.0: [144d:a808] type 00 class 0x010802
Line 1479: <6>[  112.845765] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
Line 1480: <6>[  112.853715] pci 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
Line 1481: <6>[  112.870536] pci 0000:04:00.0: Adding to iommu group 22
Line 1498: <6>[  113.019698] pci 0000:04:00.0: BAR 0: assigned [mem 0x83000000-0x83003fff 64bit]
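
For anyone who wants to check their own logs for the same ID, here is a minimal sketch that pulls the vendor/device pair out of a PCI enumeration line like the one above. The function name `parse_pci_id` and the regular expression are assumptions made for illustration; they only cover the `pci <bus address>: [vvvv:dddd]` shape seen in this dmesg excerpt.

```python
import re

# Matches "pci 0000:04:00.0: [144d:a808]" as printed during PCI enumeration.
# The pattern is an assumption based on the log lines quoted in this mail.
PCI_ID_RE = re.compile(
    r"pci (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def parse_pci_id(line):
    """Return (bus address, vendor ID, device ID) or None if the line has no ID."""
    match = PCI_ID_RE.search(line)
    if match is None:
        return None
    return (
        match.group("bdf"),
        int(match.group("vendor"), 16),
        int(match.group("device"), 16),
    )


line = "<6>[  112.838998] pci 0000:04:00.0: [144d:a808] type 00 class 0x010802"
bdf, vendor, device = parse_pci_id(line)
print(f"{bdf}: vendor={vendor:#06x} device={device:#06x}")
```

Lines that carry no bracketed ID, such as the BAR assignment line, simply return `None`.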

Regards,
Monish Kumar R

-----Original Message-----
From: Jason A. Donenfeld <Jason@...c4.com> 
Sent: 09 June 2022 14:04
To: R, Monish Kumar <monish.kumar.r@...el.com>
Cc: open list:NVM EXPRESS DRIVER <linux-nvme@...ts.infradead.org>; Sagi Grimberg <sagi@...mberg.me>; alan.adamson@...cle.com; LKML <linux-kernel@...r.kernel.org>; Yi Zhang <yi.zhang@...hat.com>; Keith Busch <kbusch@...nel.org>; axboe@...com; Christoph Hellwig <hch@....de>; Rao, Abhijeet <abhijeet.rao@...el.com>
Subject: Re: 2 second nvme initialization delay regression in 5.18 [Was: Re: [bug report]nvme0: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) MORE DNR observed during blktests]

Hey again,

Figured it out. 2.3 seconds to be exact... It looks like this is caused by:

bc360b0b1611 ("nvme-pci: add quirks for Samsung X5 SSDs") https://lore.kernel.org/all/20220316075449.18906-1-monish.kumar.r@intel.com/

This commit doesn't have any justification and got applied without much discussion. Perhaps Monish could supply some more info about why this is needed here? FTR, I have no issues on my system when reverting that. Perhaps it should be reverted. (I can send a revert commit for that if necessary.)

Looking further, however, the PCIe ID is said to be for a "Samsung X5", which Google says is a portable thunderbolt drive. Is the PCIe ID correct? On my system, this is the PCIe ID of a Samsung 970 EVO Plus.
Is it possible that Monish copied and pasted the wrong PCIe ID? Or has Samsung *reused* the same PCIe ID on both devices? In which case, we'd need some additional data for that quirk to avoid the delay.
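
To make the "additional data" idea concrete, here is a rough sketch of a quirk lookup that matches on the PCI ID plus a second discriminator, assuming the ID really is shared. Everything in it is hypothetical: the flag names, the `quirks_for` helper, and the model strings are placeholders for illustration, not kernel identifiers or values taken from real devices, and it assumes a model string is available at the point where quirks are chosen.

```python
from enum import Flag


class Quirk(Flag):
    # Hypothetical flag names, chosen only for this illustration.
    DELAY_BEFORE_READY = 1 << 0
    NO_DEEPEST_POWER_STATE = 1 << 1


# Each entry: (vendor ID, device ID, required model substring or None, quirks).
# A substring of None would apply the quirks to every device with that ID.
QUIRK_TABLE = [
    (0x144D, 0xA808, "X5", Quirk.DELAY_BEFORE_READY | Quirk.NO_DEEPEST_POWER_STATE),
]


def quirks_for(vendor, device, model):
    """Collect the quirks whose ID and (optional) model substring both match."""
    result = Quirk(0)
    for q_vendor, q_device, q_model, q_flags in QUIRK_TABLE:
        if (vendor, device) != (q_vendor, q_device):
            continue
        if q_model is not None and q_model not in model:
            continue
        result |= q_flags
    return result


# Placeholder model strings: same PCI ID, different outcome.
print(quirks_for(0x144D, 0xA808, "Samsung Portable SSD X5"))
print(quirks_for(0x144D, 0xA808, "Samsung SSD 970 EVO Plus 1TB"))
```

With a discriminator like this, only the device that needs the workaround would pay for the delay.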

Also note that this (potentially errant) commit has been backported to stable.

Jason
