Message-ID: <00b01ab3-ec9d-4a35-a593-c9fc764e0f04@simg.de>
Date: Fri, 17 Jan 2025 22:31:55 +0100
From: Stefan <linux-kernel@...g.de>
To: Christoph Hellwig <hch@....de>,
Thorsten Leemhuis <regressions@...mhuis.info>, bugzilla-daemon@...nel.org
Cc: Bruno Gravato <bgravato@...il.com>, Keith Busch <kbusch@...nel.org>,
Adrian Huang <ahuang12@...ovo.com>,
Linux kernel regressions list <regressions@...ts.linux.dev>,
linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock
X600M-STX + Ryzen 8700G
Hi,
>> What does it mean that disabling the NVMe device's write cache
>> often but apparently not always helps? Is it just reducing the
>> chance of the problem occurring or accidentally working around it?
>
> For consumer NAND devices you basically can't disable the volatile
> write cache. If you do disable it, that just means it gets flushed
> after every write, meaning you have to write the entire NAND
> (super)block for every write, causing a huge slowdown (and a lot of
> media wear). This will change timings a lot obviously. If it
> doesn't change the timing the driver just fakes it, which reputable
> vendors shouldn't be doing, but I would not be entirely surprised
> about for noname devices.
As already mentioned, my SSD has no DRAM and uses HMB (Host Memory
Buffer). (It has a non-volatile SLC cache.) Disabling the volatile write
cache has no significant effect on read/write performance for large
files, because the HMB size is only 40 MB. But things like file
deletions may be slower.
AFAICS the corruptions occur with both kinds of SSDs, the ones that have
their own DRAM and the ones that use HMB.
> --- Comment #49 from Bruno Gravato ---
>> * Not totally sure, but it seems most or everyone affected is
>> using a Ryzen 8000 CPU -- and now one user showed up that mentioned
>> a DeskMini x600 with a Ryzen 7000 CPU is not affected (see ticket
>> for details). But that might be due to other aspects. A former
>> colleague of mine who can reproduce the problem will later test if
>> a different CPU line really is making a difference.
>
> One other different aspect for that user besides the 7000 series CPU
> is that he's using a wifi card as well (that sits in a M.2 wifi slot
> just below the main M.2 disk slot), so I wonder if that may play a
> role? I think most of us have no wifi card installed. I think I have
> a M.2 wifi card on my former NUC, I'll see if it's compatible with
> the deskmini and try it out.
>
> The other reason could be some disk models aren't affected... I think
> Stefan reported no issues on a Firecuda 520.
Correct. To verify that the two other CPU series are not affected,
someone who can reproduce this error and has another CPU lying around
must swap them.
> --- Comment #51 from Ralph Gerstman ---
> A missing network might prevent the failure during install - at least
> in Ubuntu 22.10 - but can happen anyway. Enabling network seems to
> raise the chance.
I had to disable it in BIOS. Just not connecting it has no effect
because drivers and firmware are still loaded.
Just for the record (I already mentioned it): I'm using the latest BIOS
version 4.08 with AGESA PI 1.2.0.2a (according to the AsRock page) and
firmware blobs version 20241210 from
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
and I can confirm that the corruptions also occur with older versions of
BIOS/firmware.
Regards Stefan