Message-ID: <Z36UsE5dj6j5HhkX@kbusch-mbp>
Date: Wed, 8 Jan 2025 08:07:28 -0700
From: Keith Busch <kbusch@...nel.org>
To: Thorsten Leemhuis <regressions@...mhuis.info>
Cc: Adrian Huang <ahuang12@...ovo.com>, Christoph Hellwig <hch@....de>,
Linux kernel regressions list <regressions@...ts.linux.dev>,
linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock
X600M-STX
On Wed, Jan 08, 2025 at 03:38:53PM +0100, Thorsten Leemhuis wrote:
> [side note TWIMC: regression tracking is sadly kinda dormant temporarily
> (hopefully this will change again soon), but this was brought to my
> attention and looked kinda important]
>
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> Adrian, Christoph, I noticed a report about a regression in
> bugzilla.kernel.org that appears to be caused by a change you two
> handled a while ago -- or it exposed an earlier problem:
>
> 3710e2b056cb92 ("nvme-pci: clamp max_hw_sectors based on DMA optimized
> limitation") [v6.4-rc3]
...
> > The bug is triggered by the patch "nvme-pci: clamp max_hw_sectors
> > based on DMA optimized limitation" (see https://lore.kernel.org/linux-
> > iommu/20230503161759.GA1614@....de/ ) introduced in 6.3.7
> >
> > To examine the situation, I added this debug info (all files are
> > located in `drivers/nvme/host`):
> >
> >> --- core.c.orig 2025-01-03 14:27:38.220428482 +0100
> >> +++ core.c 2025-01-03 12:56:34.503259774 +0100
> >> @@ -3306,6 +3306,7 @@
> >> max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
> >> else
> >> max_hw_sectors = UINT_MAX;
> >> +	dev_warn(ctrl->device, "id->mdts=%d, max_hw_sectors=%d, ctrl->max_hw_sectors=%d\n", id->mdts, max_hw_sectors, ctrl->max_hw_sectors);
> >> ctrl->max_hw_sectors =
> >> min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
> >
> > 6.3.6 (last version w/o mentioned patch and w/o data corruption) says:
> >
> >> [ 127.196212] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
> >> ctrl->max_hw_sectors=16384
> >> [ 127.203530] nvme nvme0: allocated 40 MiB host memory buffer.
> >
> > 6.3.7 (first version w/ mentioned patch and w/ data corruption) says:
> >
> >> [ 46.436384] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
> >> ctrl->max_hw_sectors=256
> >> [ 46.443562] nvme nvme0: allocated 40 MiB host memory buffer.
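For reference, a rough standalone sketch of where those two numbers come
from; the 4 KiB controller page size and the 128 KiB result from
dma_opt_mapping_size() are assumptions for illustration, not values taken
from the report:

#include <stdio.h>

/*
 * Convert an NVMe MDTS value (a power-of-two multiple of the controller
 * page size) into 512-byte sectors, roughly what nvme_mps_to_sectors()
 * does in drivers/nvme/host/core.c.
 */
static unsigned int mdts_to_sectors(unsigned int mdts, unsigned int page_shift)
{
	return 1u << (mdts + page_shift - 9);
}

int main(void)
{
	unsigned int page_shift = 12;            /* assumed 4 KiB controller page */
	unsigned int mdts = 7;                   /* from the quoted dmesg output */
	unsigned int dma_opt_bytes = 128 * 1024; /* assumed dma_opt_mapping_size() */
	unsigned int max_hw = mdts_to_sectors(mdts, page_shift); /* 1024 sectors */
	unsigned int clamp = dma_opt_bytes >> 9;                 /* 256 sectors */

	printf("mdts limit: %u sectors, dma-opt clamp: %u sectors, effective: %u sectors\n",
	       max_hw, clamp, clamp < max_hw ? clamp : max_hw);
	return 0;
}

That prints 1024 sectors for the MDTS limit and 256 for the clamp, matching
the two dmesg lines above: the patch only changes which of the two limits
wins, not the device's own maximum.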
It should always be okay to do smaller transfers as long as everything
stays aligned to the logical block size. I'm guessing the dma opt change
has exposed some other flaw in the nvme controller. For example, two
consecutive smaller writes are hitting some controller-side caching bug
that a single larger transfer would have handled correctly. The host
could have sent such a sequence even with the patch reverted, but it
happens not to be doing that in this particular test.
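To make that concrete, a minimal sketch of the splitting; the single
512 KiB write is just an assumed example size, and this is plain
userspace C standing in for the block layer's splitting, not kernel code:

#include <stdio.h>

int main(void)
{
	unsigned long long start_lba = 0;  /* assumed starting LBA */
	unsigned int io_sectors = 1024;    /* one 512 KiB write, for illustration */
	unsigned int max_hw_sectors = 256; /* the clamped limit from the dmesg output */

	/*
	 * The block layer never issues more than max_hw_sectors per command,
	 * so this single write goes out as four consecutive 128 KiB commands.
	 */
	for (unsigned int done = 0; done < io_sectors; done += max_hw_sectors) {
		unsigned int chunk = io_sectors - done;

		if (chunk > max_hw_sectors)
			chunk = max_hw_sectors;
		printf("write slba=%llu nlb=%u\n", start_lba + done, chunk);
	}
	return 0;
}

With max_hw_sectors at 1024 the same I/O would have gone out as a single
command; with the clamp at 256 it becomes four back-to-back writes, which
is the kind of sequence that could be tripping up the controller.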