linux-kernel - Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX


Message-ID: <210e7b28-de05-44bc-9604-83a79ae131b0@leemhuis.info>
Date: Thu, 9 Jan 2025 09:52:22 +0100
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...nel.org>
Cc: Adrian Huang <ahuang12@...ovo.com>,
 Linux kernel regressions list <regressions@...ts.linux.dev>,
 linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
 "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
 LKML <linux-kernel@...r.kernel.org>, linux-kernel@...g.de, bgravato@...il.com
Subject: Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock
 X600M-STX

[CCing the people from
https://bugzilla.kernel.org/show_bug.cgi?id=219609, as they permitted that.

Stefan, Bruno, reminder: some developers might not follow the ticket or
be unwilling to go to a web-based bug tracker; so any answers to questions
that are raised here via email might not be seen if you only provide
them in the bug tracker; yes, that sucks, but that's how it is for now;
hopefully things on that front will improve soon.]

On 09.01.25 09:28, Christoph Hellwig wrote:
> On Wed, Jan 08, 2025 at 08:07:28AM -0700, Keith Busch wrote:
>> It should always be okay to do smaller transfers as long as everything
>> stays aligned to the logical block size. I'm guessing the dma opt change
>> has exposed some other flaw in the nvme controller. For example, two
>> consecutive smaller writes are hitting some controller side caching bug
>> that a single larger transfer would have handled correctly. The host
>> could have sent such a sequence even without the patch reverted, but
>> happens to not be doing that in this particular test.
> 
> Yes.  This somehow reminds me of the bug with an Intel SSD that got
> really upset with quickly following writes to different LBAs inside the
> same indirection unit.  But as the new smaller size is nicely aligned
> that seems unlikely.  Maybe the higher number of commands simply overloads
> the buggy firmware?

Thx for the assessment. FWIW, I bought such a machine myself recently
and it's still in a state where I could abandon the install. I haven't
checked yet if mine is affected, too.

> Of course the real question is why we're even seeing the limitation.
> The value suggests it's the swiotlb one.  Does the system use AMD SEV
> (memory encryption)?

In case it is helpful to anyone: there are some logs buried deep in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 and I'm attaching
one of the kernel logs I found there (there were multiple ones; hope I
picked an appropriate one) for easier access.

Ciao, Thorsten
Download attachment "kern.log-6.11.5" of type "application/x-troff-man" (118169 bytes)