Message-ID: <fe363c33-b42a-4613-a633-694edcebb2ee@4net.rs>
Date: Mon, 8 Dec 2025 14:37:01 +0100
From: Sinisa <sinisa@...t.rs>
To: Paul Rolland <rol@...917.net>, Dragan Milivojević
 <galileo@...-inc.com>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-nvme@...ts.infradead.org,
 linux-raid@...r.kernel.org
Subject: Re: WD Red SN700 4000GB, F/W: 11C120WD (Device not ready; aborting
 reset, CSTS=0x1)

Hello Dragan (and others),

Just to add my 2¢: I have also had NVMe drives dropping out of md RAID10. After a reboot, SMART says they are perfectly fine and I am able to re-add them to the
RAID, only for the same thing to happen again a few weeks/months later.

I have seen this on consumer-grade motherboards from ASUS, MSI and Gigabyte, but also on Supermicro servers (actually on only one Supermicro SYS-6029P-TR, but
multiple times, as far as I can remember).

Affected drives are Samsung 980 Pro and Samsung 990 Pro, but I think there were also some Kingston ones (I have since replaced them all).

Now, I try to always run the latest stable kernel on those machines/servers, so all of them are now on 6.17, and I don't think I have seen this problem since
upgrading to it.


Btw.

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

didn't seem to help. I had tried those parameters before, but the problem would still appear after some time, although perhaps less frequently.
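For anyone who wants to try the same parameters anyway, this is roughly how I had them set persistently. This is a sketch assuming a GRUB2 setup with a Debian-style /etc/default/grub; adjust for your distro:

```shell
# /etc/default/grub -- append the NVMe/PCIe power-management workarounds
# to the kernel command line (assumption: GRUB2, Debian-style layout)
GRUB_CMDLINE_LINUX_DEFAULT="quiet nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off"

# Then regenerate the GRUB config and reboot:
#   update-grub                               # Debian/Ubuntu
#   grub2-mkconfig -o /boot/grub2/grub.cfg    # Fedora/RHEL
# Verify after reboot with: cat /proc/cmdline
```

The first parameter disables NVMe APST power-state transitions; the two pcie_* options disable PCIe link/port power management, which is why they are sometimes suggested for drives dropping off the bus.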


Btw2.
I don't know if it is related, but I have also had this happen with rotating SATA disks, most recently yesterday on my home/office "server" (MSI PRO B650-P
WIFI (MS-7D78), 128GB RAM, kernel 6.17.9):
[Sun Dec  7 10:12:18 2025] [    T772] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Sun Dec  7 10:12:18 2025] [    T772] ata6.00: failed command: FLUSH CACHE EXT
[Sun Dec  7 10:12:18 2025] [    T772] ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 3
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[Sun Dec  7 10:12:18 2025] [    T772] ata6.00: status: { DRDY }
[Sun Dec  7 10:12:18 2025] [    T772] ata6: hard resetting link
[Sun Dec  7 10:12:24 2025] [    T772] ata6: link is slow to respond, please be patient (ready=0)
[Sun Dec  7 10:12:28 2025] [    T772] ata6: found unknown device (class 0)
[Sun Dec  7 10:12:28 2025] [    T772] ata6: softreset failed (device not ready)
... (repeat last 4 rows 4 more times)
[Sun Dec  7 10:13:19 2025] [    T772] ata6.00: disable device
[Sun Dec  7 10:13:19 2025] [    T772] ata6: EH complete
[Sun Dec  7 10:13:19 2025] [     C14] sd 5:0:0:0: [sdb] tag#5 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=123s
[Sun Dec  7 10:13:19 2025] [     C14] sd 5:0:0:0: [sdb] tag#5 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Sun Dec  7 10:13:19 2025] [     C14] I/O error, dev sdb, sector 2064 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[Sun Dec  7 10:13:19 2025] [     C14] md: super_written gets error=-5
[Sun Dec  7 10:13:19 2025] [     C14] md/raid10:md3: Disk failure on sdb1, disabling device.
md/raid10:md3: Operation continuing on 1 devices.
[Sun Dec  7 10:13:19 2025] [     C14] sd 5:0:0:0: [sdb] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
.... (many, many I/O errors)

So this morning I just ran (without a reboot):
     for I in /sys/class/scsi_host/host*/scan; do
       echo "- - -" > "$I"
     done
and the drive is back: no errors logged in SMART, re-added to the RAID, currently re-syncing.
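In case it helps anyone, the re-add itself was just the usual mdadm invocation. Device names below are from my box (md3 as the degraded RAID10, sdb1 as the dropped member); substitute your own:

```shell
# After the rescan brought the disk back as /dev/sdb, check it first,
# then re-add the partition to the degraded array and watch the resync.
smartctl -H /dev/sdb                 # overall health self-assessment
mdadm /dev/md3 --re-add /dev/sdb1    # falls back to a full resync if
                                     # the write-intent bitmap is stale
cat /proc/mdstat                     # resync progress
```

With a write-intent bitmap on the array, --re-add only resyncs the blocks written while the member was missing, which is why it usually finishes quickly after this kind of dropout.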


Srdačan pozdrav / Best regards / Freundliche Grüße / Cordialement / よろしくお願いします
Siniša Bandin


On 11/25/25 5:57 PM, Paul Rolland wrote:
> Hello,
>
> On Tue, 25 Nov 2025 16:19:27 +0100
> Dragan Milivojević <galileo@...-inc.com> wrote:
>
>>> Issue/Summary:
>>> 1. Usually once a month, a random WD Red SN700 4TB NVME drive will
>>> drop out of a NAS array, after power cycling the device, it rebuilds
>>> successfully.
>>>   
>> Seen the same, although far less frequent, with Samsung SSD 980 PRO on
>> a Dell PowerEdge R7525.
>> It's the nature of consumer grade drives, I guess.
>>
> Got some issue long time ago, and used :
>
> nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off
>
> to boot the kernel. That fixed issue with SN700 2TB.
>
> Regards,
> Paul
>

