linux-kernel - Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX

Message-ID: <20250109082849.GC20724@lst.de>
Date: Thu, 9 Jan 2025 09:28:49 +0100
From: Christoph Hellwig <hch@....de>
To: Keith Busch <kbusch@...nel.org>
Cc: Thorsten Leemhuis <regressions@...mhuis.info>,
	Adrian Huang <ahuang12@...ovo.com>, Christoph Hellwig <hch@....de>,
	Linux kernel regressions list <regressions@...ts.linux.dev>,
	linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Regression] File corruptions on SSD in 1st M.2 socket of
 AsRock X600M-STX

On Wed, Jan 08, 2025 at 08:07:28AM -0700, Keith Busch wrote:
> It should always be okay to do smaller transfers as long as everything
> stays aligned to the logical block size. I'm guessing the dma opt change
> has exposed some other flaw in the nvme controller. For example, two
> consecutive smaller writes are hitting some controller side caching bug
> that a single larger transfer would have handled correctly. The host
> could have sent such a sequence even without the patch reverted, but
> happens to not be doing that in this particular test.

Yes.  This somehow reminds me of the bug with an Intel SSD that got
really upset with quickly following writes to different LBAs inside the
same indirection unit.  But as the new smaller size is nicely aligned
that seems unlikely.  Maybe the higher number of commands simply overloads
the buggy firmware?

Of course the real question is why we're even seeing the limitation.
The value suggests it's the swiotlb one.  Does the system use AMD SEV
(memory encryption)?
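
To illustrate the point about smaller transfers staying aligned to the
logical block size while raising the command count, here is a minimal
sketch.  It is not kernel code and not taken from the thread:
split_aligned() and all sizes used below (1 MiB write, 4096-byte logical
blocks, 256 KiB and 128 KiB caps) are hypothetical values chosen only for
the example.

```python
def split_aligned(start_byte, length, max_bytes, lba_size=512):
    """Split a transfer into chunks of at most max_bytes.

    Hypothetical helper for illustration: every chunk starts and ends
    on a logical block boundary, so a smaller cap only changes how many
    commands are issued, not their alignment.
    """
    if start_byte % lba_size or length % lba_size:
        raise ValueError("transfer must start and end on a logical block boundary")

    # Round the cap down to a whole number of logical blocks.
    chunk = (max_bytes // lba_size) * lba_size
    if chunk == 0:
        raise ValueError("max_bytes is smaller than one logical block")

    chunks = []
    offset = start_byte
    end = start_byte + length
    while offset < end:
        n = min(chunk, end - offset)
        chunks.append((offset, n))
        offset += n
    return chunks


if __name__ == "__main__":
    # Same 1 MiB write, two hypothetical transfer size caps.
    for cap in (256 * 1024, 128 * 1024):
        cmds = split_aligned(0, 1 << 20, cap, lba_size=4096)
        print(cap, len(cmds))
```

Halving the cap doubles the number of commands for the same write, while
each command remains block aligned.  That is consistent with the guess
that the sheer number of commands, rather than their alignment, is what
upsets the firmware.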