linux-kernel - Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G


Message-ID: <4270c0e3-161e-42d5-a6d3-f16b7fbcdc00@simg.de>
Date: Mon, 3 Feb 2025 19:48:11 +0100
From: Stefan <linux-kernel@...g.de>
To: "Dr. David Alan Gilbert" <linux@...blig.org>, bugzilla-daemon@...nel.org
Cc: Christoph Hellwig <hch@....de>, Thorsten Leemhuis <linux@...mhuis.info>,
 Mario Limonciello <mario.limonciello@....com>,
 Bruno Gravato <bgravato@...il.com>, Keith Busch <kbusch@...nel.org>,
 Adrian Huang <ahuang12@...ovo.com>,
 Linux kernel regressions list <regressions@...ts.linux.dev>,
 linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
 "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock
 X600M-STX + Ryzen 8700G

Hi,

just got feedback from ASRock. They asked me to make a video of the
corruptions occurring on my remotely (and headless) running system.
Maybe I should make a video of printing out the logs that can be found on
the Linux and Debian bug trackers ...

Seems that ASRock is unwilling to solve the problem.

Regards Stefan

Am 28.01.25 um 15:24 schrieb Stefan:
> Hi,
>
> Am 28.01.25 um 13:52 schrieb Dr. David Alan Gilbert:
>> Is there any characterisation of the corrupted data; last time I
>> looked at the bz there wasn't.
>
> Yes, there is. (And I already reported it at least on the Debian bug
> tracker, see links in the initial message.)
>
> f3 reports overwritten sectors, i.e. it looks like the pseudo-random
> test pattern is written to the wrong position. These corruptions occur in
> clusters whose size is an integer multiple of 2^15 bytes in all cases
> and of 2^17 bytes in most cases (about 80%).
>
> The frequency of these corruptions is roughly 1 cluster per 50 GB written.
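
For anyone who wants to compare their own results against this
characterisation, here is a minimal sketch of the size check. It assumes
the lengths of the corrupted clusters (in bytes) have already been
extracted by some other means; the function names and the example
lengths are hypothetical and are not measurements from this report.

```python
from collections import Counter

GRANULE_2_15 = 2**15  # 32 KiB
GRANULE_2_17 = 2**17  # 128 KiB


def classify_cluster(length_bytes: int) -> str:
    """Label a corrupted cluster by the largest granule that divides its length."""
    if length_bytes <= 0:
        raise ValueError("cluster length must be positive")
    if length_bytes % GRANULE_2_17 == 0:
        return "multiple of 2^17"
    if length_bytes % GRANULE_2_15 == 0:
        return "multiple of 2^15 only"
    return "other"


def summarize(lengths):
    """Return the fraction of clusters that falls into each class."""
    counts = Counter(classify_cluster(n) for n in lengths)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical cluster lengths in bytes, made up for illustration only.
example_lengths = [131072, 262144, 131072, 393216, 32768]
print(summarize(example_lengths))
```

A result with a non-zero "other" share would point to a different
characteristic than the one described above.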
>
> Can others confirm this or do they observe a different characteristic?
>
> Regards Stefan
>
>
>> I mean, is it reliably any of:
>>     a) What's the size of the corruption?
>>            block, cache line, word, bit???
>>     b) Position?
>>            e.g. last word in a block or something?
>>     c) Data?
>>            pile of zero's/ff's junk/etc?
>>
>>     d) Is it a missed write, old data, or partially written block?
>>
>> Dave
>>
>>>> Puh.  I'm kinda lost on what we could do about this on the Linux
>>>> side.
>>>
>>> Because it also depends on the CPU series, a firmware or hardware issue
>>> seems to be more likely than a Linux bug.
>>>
>>> ATM ASRock is still trying to reproduce the issue. (I'm in contact with
>>> them too. But they have Chinese New Year holidays in Taiwan this week.)
>>>
>>> If they can't reproduce it, they have to provide an explanation why the
>>> issues are seen by so many users.
>>>
>>> Regards Stefan
>>>
>>>
>