Message-ID: <210e7b28-de05-44bc-9604-83a79ae131b0@leemhuis.info>
Date: Thu, 9 Jan 2025 09:52:22 +0100
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...nel.org>
Cc: Adrian Huang <ahuang12@...ovo.com>,
Linux kernel regressions list <regressions@...ts.linux.dev>,
linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>, linux-kernel@...g.de, bgravato@...il.com
Subject: Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
[CCing the people from
https://bugzilla.kernel.org/show_bug.cgi?id=219609, as they permitted that.
Stefan, Bruno, reminder: some developers might not follow the ticket or
be unwilling to go to a web-based bug tracker; so any answers to questions
that are raised here via email might not be seen if you only provide
them in the bug tracker; yes, that sucks, but that's how it is for now;
hopefully things on that front will improve soon.]
On 09.01.25 09:28, Christoph Hellwig wrote:
> On Wed, Jan 08, 2025 at 08:07:28AM -0700, Keith Busch wrote:
>> It should always be okay to do smaller transfers as long as everything
>> stays aligned to the logical block size. I'm guessing the dma opt change
>> has exposed some other flaw in the nvme controller. For example, two
>> consecutive smaller writes are hitting some controller side caching bug
>> that a single larger transfer would have handled correctly. The host
>> could have sent such a sequence even without the patch reverted, but
>> happens to not be doing that in this particular test.
>
> Yes. This somehow reminds me of the bug with an Intel SSD that got
> really upset with quickly following writes to different LBAs inside the
> same indirection unit. But as the new smaller size is nicely aligned
> that seems unlikely. Maybe the higher number of commands simply overloads
> the buggy firmware?
Thx for the assessment. FWIW, I bought such a machine myself recently
and it's still in a state where I could abandon the install. I haven't
checked yet if mine is affected, too.
> Of course the real question is why we're even seeing the limitation.
> The value suggests it's the swiotlb one. Does the system use AMD SEV
> (memory encryption)?
In case it is helpful to anyone: there are some logs buried deep in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 I'm attaching
one of the kernel logs I found there (there were multiple ones; hope I
picked an appropriate one) for easier access.
Ciao, Thorsten
Download attachment "kern.log-6.11.5" of type "application/x-troff-man" (118169 bytes)