Message-ID: <CAO9zADxCYgQVOD9A1WYoS4JcLgvsNtGGr4xEZm9CMFHXsTV8ww@mail.gmail.com>
Date: Tue, 25 Nov 2025 09:42:11 -0500
From: Justin Piszcz <jpiszcz@...idpixels.com>
To: LKML <linux-kernel@...r.kernel.org>, linux-nvme@...ts.infradead.org, 
	linux-raid@...r.kernel.org, Btrfs BTRFS <linux-btrfs@...r.kernel.org>
Subject: WD Red SN700 4000GB, F/W: 11C120WD (Device not ready; aborting reset, CSTS=0x1)

Hello,

Issue/Summary:
1. Roughly once a month, a random WD Red SN700 4TB NVMe drive drops
out of the NAS array; after power cycling the device, the array
rebuilds successfully.

Details:
0. I use an NVMe NAS (Asustor FS6712X) with WD Red SN700 4TB drives (WDS400T1R0C).
1. Ever since I installed the drives, a random drive has dropped
offline every month or so, almost always when the system is idle.
2. I have troubleshot this with Asustor and WD/SanDisk.
3. Asustor noted that other users with the same configuration have
run into this problem.
4. When troubleshooting with WD/SanDisk, I was told that my main
option is to replace the drive, even though the issue occurs across
nearly all of the drives.
5. The drives' firmware is currently up to date according to the WD
Dashboard (verified by removing the drives and checking them on
another system).
6. As for the device/filesystem layout, the FS6712X is configured as
an MD-RAID6 device with Btrfs on top of it.
7. The "workaround" is to power cycle the FS6712X and when it boots up
the MD-RAID6 re-syncs back to a healthy state.
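
For reference, below is a minimal sketch of how I could inspect the
array and try to recover the dropped controller in place before
resorting to a full power cycle. This assumes nvme-cli and mdadm are
available under ADM; /dev/nvme2 and /dev/md1 are taken from the
August 27 log purely as placeholders, and it is unclear whether the
controller responds to a reset at all once it reports CSTS=0x1:

# Check which array members are failed or missing
cat /proc/mdstat
mdadm --detail /dev/md1

# Attempt a controller-level reset of the dropped device (this may
# fail with the same "Device not ready" state seen in dmesg)
nvme reset /dev/nvme2

# If the device comes back, re-add its partition to the array
mdadm --manage /dev/md1 --re-add /dev/nvme2n1p4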

I am using the latest Asustor ADM OS, which uses the 6.6.x kernel:
1. Linux FS6712X-EB92 6.6.x #1 SMP PREEMPT_DYNAMIC Tue Nov  4 00:53:39
CST 2025 x86_64 GNU/Linux

Questions:
1. Have others experienced this failure scenario?
2. Are there identified workarounds for this issue outside of power
cycling the device when this happens?
3. Are there any debug options that can be enabled that could help
pinpoint the root cause? (See the sketch after this list for the
checks I have come up with so far.)
4. Within the BIOS settings (the video below shows them starting at
2:18), there are some advanced options; could a power-saving feature
or other setting be modified to address this issue?
4a. https://www.youtube.com/watch?v=YytWFtgqVy0
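
Regarding questions 3 and 4, here is a rough sketch of the checks I
would start with, assuming nvme-cli is installed and that APST/ASPM
power management is a plausible factor (an assumption on my part, not
a confirmed cause); the device name is a placeholder:

# Controller power states (listed near the end of the -H output)
nvme id-ctrl -H /dev/nvme2

# APST (Autonomous Power State Transition) table, feature 0x0c
nvme get-feature /dev/nvme2 -f 0x0c -H

# Kernel-wide APST latency cap; 0 disables APST entirely
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us

# Candidate boot parameters to rule out power management:
#   nvme_core.default_ps_max_latency_us=0   (disable APST)
#   pcie_aspm=off                            (disable PCIe ASPM)

# Extra NVMe driver logging, if the kernel has dynamic debug enabled
echo 'module nvme_core +p' > /sys/kernel/debug/dynamic_debug/control
echo 'module nvme +p' > /sys/kernel/debug/dynamic_debug/control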

The last failures have occurred at random times on the following days:
1. August 27, 2025
2. September 19th, 2025
3. September 29th, 2025
4. October 28th, 2025
5. November 24, 2025

Chipset being used:
1. ASMedia Technology Inc. ASM2806 4-Port PCIe x2 Gen3 Packet Switch
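
Since all of the drives sit behind this packet switch, the switch's
link and ASPM state may also be worth checking; a sketch assuming
lspci is available (0x1b21 should be ASMedia's PCI vendor ID; run as
root for the full capability output):

# PCI topology, to see which drives hang off the ASM2806 switch
lspci -tv

# Link capabilities/status and ASPM state on the switch's ports
lspci -d 1b21: -vvv | grep -E 'LnkCap|LnkSta|ASPM'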

Details:

1. August 27, 2025
[1156824.598513] nvme nvme2: I/O 5 QID 0 timeout, reset controller
[1156896.035355] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
[1156906.057936] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
[1158185.737571] md/raid:md1: Disk failure on nvme2n1p4, disabling device.
[1158185.744188] md/raid:md1: Operation continuing on 11 devices.

2. September 19th, 2025
[2001664.727044] nvme nvme9: I/O 26 QID 0 timeout, reset controller
[2001736.159123] nvme nvme9: Device not ready; aborting reset, CSTS=0x1
[2001746.180813] nvme nvme9: Device not ready; aborting reset, CSTS=0x1
[2002368.631788] md/raid:md1: Disk failure on nvme9n1p4, disabling device.
[2002368.638414] md/raid:md1: Operation continuing on 11 devices.
[2003213.517965] md/raid1:md0: Disk failure on nvme9n1p2, disabling device.
[2003213.517965] md/raid1:md0: Operation continuing on 11 devices.

3. September 29th, 2025
[858305.408049] nvme nvme3: I/O 8 QID 0 timeout, reset controller
[858376.843140] nvme nvme3: Device not ready; aborting reset, CSTS=0x1
[858386.865240] nvme nvme3: Device not ready; aborting reset, CSTS=0x1
[858386.883053] md/raid:md1: Disk failure on nvme3n1p4, disabling device.
[858386.889586] md/raid:md1: Operation continuing on 11 devices.

4. October 28th, 2025
[502963.821407] nvme nvme4: I/O 0 QID 0 timeout, reset controller
[503035.257391] nvme nvme4: Device not ready; aborting reset, CSTS=0x1
[503045.282923] nvme nvme4: Device not ready; aborting reset, CSTS=0x1
[503142.226962] md/raid:md1: Disk failure on nvme4n1p4, disabling device.
[503142.233496] md/raid:md1: Operation continuing on 11 devices.

5. November 24th, 2025
[1658454.034633] nvme nvme2: I/O 24 QID 0 timeout, reset controller
[1658525.470287] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
[1658535.491803] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
[1658535.517638] md/raid1:md0: Disk failure on nvme2n1p2, disabling device.
[1658535.517638] md/raid1:md0: Operation continuing on 11 devices.
[1659258.368386] md/raid:md1: Disk failure on nvme2n1p4, disabling device.
[1659258.375012] md/raid:md1: Operation continuing on 11 devices.


Justin
