Message-ID: <401f2c46-0bc3-4e7f-b549-f868dc1834c5@leemhuis.info>
Date: Wed, 8 Jan 2025 15:38:53 +0100
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Adrian Huang <ahuang12@...ovo.com>, Christoph Hellwig <hch@....de>
Cc: Linux kernel regressions list <regressions@...ts.linux.dev>,
 Keith Busch <kbusch@...nel.org>, linux-nvme@...ts.infradead.org,
 Jens Axboe <axboe@...com>, "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
 LKML <linux-kernel@...r.kernel.org>
Subject: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock
 X600M-STX

[side note TWIMC: regression tracking is sadly kinda dormant temporarily
(hopefully this will change again soon), but this was brought to my
attention and looked kinda important]

Hi, Thorsten here, the Linux kernel's regression tracker.

Adrian, Christoph, I noticed a report about a regression on
bugzilla.kernel.org that appears to be caused by a change you two
handled a while ago -- or that change exposed an earlier problem:

3710e2b056cb92 ("nvme-pci: clamp max_hw_sectors based on DMA optimized
limitation") [v6.4-rc3]

As many (most?) kernel developers don't keep an eye on the bug tracker,
I decided to write this mail. To quote from
https://bugzilla.kernel.org/show_bug.cgi?id=219609 :

> Bug 219609 - File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
>
> There are one or two bugs which were originally reported at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 . For details
> (logs, etc.), see there. Here, I will post a summary and try to point
> out the most relevant observations:
> 
> Bug 1: Write errors with Lexar NM790 NVME
> 
> * Occur since Debian kernel 6.5, but reproduced with upstream kernel
> 6.11.5 (the only upstream kernel I tested)
> * Only occur in 1st M.2 socket (not in the 2nd one on rear side)
> * The easiest way to reproduce them is to use f3 (
> https://fight-flash-fraud.readthedocs.io/en/latest/usage.html ); f3
> reports overwritten sectors
> * The errors do not seem to occur in the last files of 500-file
> (= 500 GB) test runs, and I never detected file system corruption
> (just defective files; I probably produced more than a thousand of
> them). The reason for the latter observation may be that file system
> information is written last. (See message 113 in the Debian bug
> report.)
> 
> (Possible) Bug 2: Read errors with Kingston FURY Renegade
> 
> * Only occur in 1st M.2 socket (did not test the rear socket, because
> the warranty seal would have to be broken in order to remove the heat
> sink)
> * Almost impossible to reproduce; only detected it in the Debian
> kernel based on 6.1.112
> * 1st occurrence: I detected it in an SSD-intensive computation (the
> SSD served as a data cache) which produced wrong results after a few
> days (but not in the first days). The error could be reproduced with
> f3: the corruptions were massive and different files were affected in
> subsequent f3read runs (==> read errors). Unfortunately, I did not
> store the f3 logs. (I still have the corrupt computation results, so
> it was real.)
> * 2nd occurrence: A single defective sector (read error) in a
> multi-day attempt to reproduce the error with the same kernel (Debian
> 6.1.112), see message 113 in the Debian bug report
> 
> Consideration / Notes:
> * These serial links (PCIe) need to be calibrated. Calibration issues
> would explain why the errors (dis)appear under certain conditions. But
> errors like this should be detected (nothing could be found in the
> kernel logs). Is the error correction possibly inactive? However, this
> still does not explain why f3 reports overwritten sectors, unless the
> signal errors occur during command / address transmission.
> * Testing is difficult, because the machine is installed remotely and
> in use. At the moment (until about the end of January) I can run tests
> for bug 1.
> * On the AsRock X600M-STX mainboard (without chipset), the CPU (Ryzen
> 8700G) runs in SoC (system on chip) mode. Maybe someone did not test
> this properly ...
> 
[...]

> With the help of TJ from the Debian kernel team (
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 ), at least
> a workaround could be found.
> 
> The bug is triggered by the patch "nvme-pci: clamp max_hw_sectors
> based on DMA optimized limitation" (see
> https://lore.kernel.org/linux-iommu/20230503161759.GA1614@....de/ )
> introduced in 6.3.7
> 
> To examine the situation, I added this debug info (all files are
> located in `drivers/nvme/host`):
> 
>> --- core.c.orig       2025-01-03 14:27:38.220428482 +0100
>> +++ core.c    2025-01-03 12:56:34.503259774 +0100
>> @@ -3306,6 +3306,7 @@
>>               max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
>>       else
>>               max_hw_sectors = UINT_MAX;
>> +     dev_warn(ctrl->device, "id->mdts=%d,  max_hw_sectors=%d, ctrl->max_hw_sectors=%d\n", id->mdts, max_hw_sectors, ctrl->max_hw_sectors);
>>       ctrl->max_hw_sectors =
>>               min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
> 
> 6.3.6 (last version w/o mentioned patch and w/o data corruption) says:
> 
>> [  127.196212] nvme nvme0: id->mdts=7,  max_hw_sectors=1024, ctrl->max_hw_sectors=16384
>> [  127.203530] nvme nvme0: allocated 40 MiB host memory buffer.
> 
> 6.3.7 (first version w/ mentioned patch and w/ data corruption) says:
> 
>> [   46.436384] nvme nvme0: id->mdts=7,  max_hw_sectors=1024, ctrl->max_hw_sectors=256
>> [   46.443562] nvme nvme0: allocated 40 MiB host memory buffer.
> 
> After I reverted the mentioned patch (
> 
>> --- pci.c.orig        2025-01-03 14:28:05.944819822 +0100
>> +++ pci.c     2025-01-03 12:54:37.014579093 +0100
>> @@ -3042,7 +3042,8 @@
>>        * over a single page.
>>        */
>>       dev->ctrl.max_hw_sectors = min_t(u32,
>> -             NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> +//           NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> +             NVME_MAX_KB_SZ << 1, dma_max_mapping_size(&pdev->dev) >> 9);
>>       dev->ctrl.max_segments = NVME_MAX_SEGS;
>>  
>>       /*
> 
> ), 6.11.5 (I used this version because its sources were lying around) works and says:
> 
>> [    1.251370] nvme nvme0: id->mdts=7,  max_hw_sectors=1024, ctrl->max_hw_sectors=16384
>> [    1.261168] nvme nvme0: allocated 40 MiB host memory buffer.
> 
> Thus, the corruption occurs if `ctrl->max_hw_sectors` is set to a smaller value than the one defined by `id->mdts`.
> 
> If this is supposed to be allowed, the mentioned patch is not the (root) cause, but reverting it is at least a workaround.
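
A quick sanity check on those numbers, for readers of this summary: per
the NVMe spec, MDTS expresses the device limit as 2^MDTS units of
CAP.MPSMIN (typically 4 KiB), so id->mdts=7 matches the 512 KiB /
1024-sector value in the logs, while the clamp from the patch drops the
host to 256 sectors (128 KiB) -- which MDTS, being only an upper bound,
permits in principle. A minimal back-of-the-envelope computation,
assuming the usual 4 KiB MPSMIN (an assumption, not from the report):

#include <stdio.h>

int main(void)
{
	unsigned int mdts = 7;            /* id->mdts from the logs above     */
	unsigned int mpsmin_bytes = 4096; /* assumed CAP.MPSMIN page size     */
	unsigned int clamp_sectors = 256; /* ctrl->max_hw_sectors under 6.3.7 */

	unsigned int dev_limit_bytes = (1u << mdts) * mpsmin_bytes;

	printf("device (MDTS) limit: %u KiB = %u sectors\n",
	       dev_limit_bytes / 1024, dev_limit_bytes >> 9);  /* 512 KiB = 1024 */
	printf("DMA-opt clamp:       %u KiB = %u sectors\n",
	       (clamp_sectors << 9) / 1024, clamp_sectors);    /* 128 KiB = 256  */
	return 0;
}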

See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] cannot CC them in mails like this.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"

