Message-ID: <Z36UsE5dj6j5HhkX@kbusch-mbp>
Date: Wed, 8 Jan 2025 08:07:28 -0700
From: Keith Busch <kbusch@...nel.org>
To: Thorsten Leemhuis <regressions@...mhuis.info>
Cc: Adrian Huang <ahuang12@...ovo.com>, Christoph Hellwig <hch@....de>,
	Linux kernel regressions list <regressions@...ts.linux.dev>,
	linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock
 X600M-STX

On Wed, Jan 08, 2025 at 03:38:53PM +0100, Thorsten Leemhuis wrote:
> [side note TWIMC: regression tracking is sadly kinda dormant temporarily
> (hopefully this will change again soon), but this was brought to my
> attention and looked kinda important]
> 
> Hi, Thorsten here, the Linux kernel's regression tracker.
> 
> Adrian, Christoph, I noticed a report about a regression on
> bugzilla.kernel.org that appears to be caused by a change you two
> handled a while ago -- or it exposed an earlier problem:
> 
> 3710e2b056cb92 ("nvme-pci: clamp max_hw_sectors based on DMA optimized
> limitation") [v6.4-rc3]

...
 
> > The bug is triggered by the patch "nvme-pci: clamp max_hw_sectors
> > based on DMA optimized limitation" (see
> > https://lore.kernel.org/linux-iommu/20230503161759.GA1614@....de/ )
> > introduced in 6.3.7
> > 
> > To examine the situation, I added this debug info (all files are
> > located in `drivers/nvme/host`):
> > 
> >> --- core.c.orig       2025-01-03 14:27:38.220428482 +0100
> >> +++ core.c    2025-01-03 12:56:34.503259774 +0100
> >> @@ -3306,6 +3306,7 @@
> >>               max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
> >>       else
> >>               max_hw_sectors = UINT_MAX;
> >> +     dev_warn(ctrl->device, "id->mdts=%d,  max_hw_sectors=%d, ctrl->max_hw_sectors=%d\n", id->mdts, max_hw_sectors, ctrl->max_hw_sectors);
> >>       ctrl->max_hw_sectors =
> >>               min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
> > 
> > 6.3.6 (last version w/o mentioned patch and w/o data corruption) says:
> > 
> >> [  127.196212] nvme nvme0: id->mdts=7,  max_hw_sectors=1024, ctrl->max_hw_sectors=16384
> >> [  127.203530] nvme nvme0: allocated 40 MiB host memory buffer.
> > 
> > 6.3.7 (first version w/ mentioned patch and w/ data corruption) says:
> > 
> >> [   46.436384] nvme nvme0: id->mdts=7,  max_hw_sectors=1024, ctrl->max_hw_sectors=256
> >> [   46.443562] nvme nvme0: allocated 40 MiB host memory buffer.

It should always be okay to do smaller transfers as long as everything
stays aligned to the logical block size. I'm guessing the dma opt change
has exposed some other flaw in the nvme controller. For example, two
consecutive smaller writes may be hitting some controller-side caching
bug that a single larger transfer would have handled correctly. The host
could have sent such a sequence even with the patch reverted, but it
happens not to be doing that in this particular test.
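
To make that concrete, a small illustration (plain C, mimicking the
block layer's splitting rather than reproducing it): with
max_hw_sectors clamped to 256, a single 512 KiB write that 6.3.6 issued
as one command goes out as four back-to-back 128 KiB commands on 6.3.7,
the kind of consecutive-write sequence suspected above.

#include <stdio.h>

int main(void)
{
	unsigned int io_sectors = 1024;		/* one 512 KiB write, 512-byte sectors */
	unsigned int max_hw_sectors = 256;	/* clamped limit on 6.3.7 */

	for (unsigned int off = 0; off < io_sectors; off += max_hw_sectors) {
		unsigned int len = io_sectors - off;

		if (len > max_hw_sectors)
			len = max_hw_sectors;
		printf("write: sector offset %4u, %3u sectors (%u KiB)\n",
		       off, len, len / 2);
	}
	return 0;
}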
