Message-ID: <CAOBLbT8103fAyoFNF8=YcEM1sM6HodcUe=Ee5NsE2hUkfCYv7g@mail.gmail.com>
Date: Wed, 15 Jan 2025 06:37:42 +0000
From: Bruno Gravato <bgravato@...il.com>
To: Stefan <linux-kernel@...g.de>
Cc: Keith Busch <kbusch@...nel.org>, bugzilla-daemon@...nel.org,
Adrian Huang <ahuang12@...ovo.com>,
Linux kernel regressions list <regressions@...ts.linux.dev>, linux-nvme@...ts.infradead.org,
Jens Axboe <axboe@...com>, "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>,
Thorsten Leemhuis <regressions@...mhuis.info>, Christoph Hellwig <hch@....de>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock
X600M-STX + Ryzen 8700G
I finally got the chance to run some more tests with some interesting
and unexpected results...
I put another disk (WD Black SN750) in the main M.2 slot (the
problematic one), but kept my main disk (Solidigm P44 Pro) in the
secondary M.2 slot (where it doesn't have any issues).
I reran my test: step 1) copy a large number of files to the WD disk
(main slot); step 2) run btrfs scrub on it and expect some checksum
errors.
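For reference, the test boils down to something like the following
sketch (the mount point /mnt/wd and the source directory are just
placeholders for illustration, not the paths I actually used):

```shell
# Step 1: copy a large number of files onto the WD disk (main M.2 slot).
cp -a /path/to/large/file-set /mnt/wd/

# Make sure everything has actually been written out before scrubbing.
sync

# Step 2: scrub the filesystem; -B waits for completion and prints a
# summary, including any checksum errors found.
btrfs scrub start -B /mnt/wd
```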
To my surprise there were no errors!
I tried it twice with different kernels (6.2.6 and 6.11.5) and booting
from either disk (I have linux installations on both).
Still no errors.
I then removed the Solidigm disk from the secondary and kept the WD
disk in the main M.2 slot.
I reran my tests (on kernel 6.11.5) and, bang, btrfs scrub now
detected quite a few checksum errors!
I then tried disabling the volatile write cache with "nvme set-feature
/dev/nvme0 -f 6 -v 0".
"nvme get-feature /dev/nvme0 -f 6" confirmed it was disabled, but
/sys/block/nvme0n1/queue/fua still showed 1... Was that supposed to
turn into 0?
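In full, what I ran and checked was roughly this (feature 0x06 is the
NVMe Volatile Write Cache feature; the write_cache check is an extra
sysfs knob that may also be worth looking at):

```shell
# Disable the volatile write cache (feature 6, value 0).
nvme set-feature /dev/nvme0 -f 6 -v 0

# Confirm the setting: should report a current value of 0.
nvme get-feature /dev/nvme0 -f 6

# What the block layer thinks about FUA and write caching for this device.
cat /sys/block/nvme0n1/queue/fua
cat /sys/block/nvme0n1/queue/write_cache
```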
I reran my test, but I still got checksum errors on btrfs scrub. So
disabling the volatile write cache (assuming I did it correctly)
didn't make a difference in my case.
I put the Solidigm disk back into the secondary slot, booted, and
reran the test on the WD disk (main slot) just to be triple sure:
still no errors.
So it looks like the corruption only happens if only the main M.2 slot
is occupied and the secondary M.2 slot is free.
With two nvme disks (one on each M.2 slot), there were no errors at all.
Stefan, did you ever try running your tests with 2 nvme disks
installed on both slots? Or did you use only one slot at a time?
Bruno