Message-ID: <45fe8146-ef86-40dd-919a-eb6c9438dafa@simg.de>
Date: Thu, 6 Feb 2025 16:58:00 +0100
From: Stefan <linux-kernel@...g.de>
To: "Dr. David Alan Gilbert" <linux@...blig.org>, bugzilla-daemon@...nel.org
Cc: Christoph Hellwig <hch@....de>, Thorsten Leemhuis <linux@...mhuis.info>,
Mario Limonciello <mario.limonciello@....com>,
Bruno Gravato <bgravato@...il.com>, Keith Busch <kbusch@...nel.org>,
Adrian Huang <ahuang12@...ovo.com>,
Linux kernel regressions list <regressions@...ts.linux.dev>,
linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock
X600M-STX + Ryzen 8700G
Hi,

after Matthias was kind enough (kinder than me) to make a video (!) for
ASRock support, and after I once again pointed them to this thread and
the many users who have the same problem, ASRock is now able to
reproduce the issues.

Ralph, all tests in comment #40 (including the network issue) were run
twice, because I did not collect logs and lspci outputs the first time.
(The corruptions seem to depend on which PCIe devices / lanes (?) are
used. That's why I also included the lspci outputs.)

(As announced in the initial message, I cannot run tests at the moment,
and will not be able to for a while.)

Regards Stefan
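
For anyone who wants to compare their own results against the
characterisation quoted below (cluster sizes that are multiples of 2^15
bytes in all cases and of 2^17 bytes in about 80% of cases, roughly one
cluster per 50 GB written), here is a minimal Python sketch. All
function names are hypothetical, the cluster sizes have to be extracted
from your own test output by hand, and treating "50 GB" as 50 * 10^9
bytes is an assumption.

```python
# Hypothetical helpers for comparing observed corruption clusters with
# the characterisation in this thread. Not part of f3 or any other tool.

GRANULE_SMALL = 2**15  # 32 KiB: reported to divide every cluster size
GRANULE_LARGE = 2**17  # 128 KiB: reported to divide about 80% of them


def classify_cluster(size_bytes):
    """Return the largest reported granule dividing size_bytes, else None."""
    if size_bytes <= 0:
        raise ValueError("cluster size must be positive")
    if size_bytes % GRANULE_LARGE == 0:
        return GRANULE_LARGE
    if size_bytes % GRANULE_SMALL == 0:
        return GRANULE_SMALL
    return None


def summarize(sizes):
    """Count how many cluster sizes fall into each granule class."""
    counts = {GRANULE_LARGE: 0, GRANULE_SMALL: 0, None: 0}
    for size in sizes:
        counts[classify_cluster(size)] += 1
    return counts


def expected_clusters(bytes_written, bytes_per_cluster=50 * 10**9):
    """Rough expected cluster count; assumes 1 cluster per 50 * 10^9 bytes."""
    return bytes_written / bytes_per_cluster
```

A cluster that lands in the None class would not match the pattern
reported here, which would be worth mentioning in the bug report.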
On 03.02.25 at 19:48, Stefan wrote:
> Hi,
>
> just got feedback from ASRock. They asked me to make a video of the
> corruptions occurring on my remotely (and headless) running system.
> Maybe I should make a video of printing out the logs that can be found
> on the Linux and Debian bug trackers ...
>
> Seems that ASRock is unwilling to solve the problem.
>
> Regards Stefan
>
>
> On 28.01.25 at 15:24, Stefan wrote:
>> Hi,
>>
>> On 28.01.25 at 13:52, Dr. David Alan Gilbert wrote:
>>> Is there any characterisation of the corrupted data; last time I
>>> looked at the bz there wasn't.
>>
>> Yes, there is. (And I already reported it at least on the Debian bug
>> tracker, see links in the initial message.)
>>
>> f3 reports overwritten sectors, i.e. it looks like the pseudo-random
>> test pattern is written to the wrong position. These corruptions occur
>> in clusters whose size is an integer multiple of 2^17 bytes in most
>> cases (about 80%) and of 2^15 bytes in all cases.
>>
>> The frequency of these corruptions is roughly 1 cluster per 50 GB
>> written.
>>
>> Can others confirm this or do they observe a different characteristic?
>>
>> Regards Stefan
>>
>>
>>> I mean, is it reliably any of:
>>> a) What's the size of the corruption?
>>> block, cache line, word, bit???
>>> b) Position?
>>> e.g. last word in a block or something?
>>> c) Data?
>>> pile of zeros/ff's, junk, etc.?
>>>
>>> d) Is it a missed write, old data, or partially written block?
>>>
>>> Dave
>>>
>>>>> Phew. I'm kinda lost on what we could do about this on the Linux
>>>>> side.
>>>>
>>>> Because it also depends on the CPU series, a firmware or hardware issue
>>>> seems to be more likely than a Linux bug.
>>>>
>>>> At the moment ASRock is still trying to reproduce the issue. (I'm in
>>>> contact with them too. But they have Chinese New Year holidays in
>>>> Taiwan this week.)
>>>>
>>>> If they can't reproduce it, they have to provide an explanation why the
>>>> issues are seen by so many users.
>>>>
>>>> Regards Stefan
>>>>
>>>>
>>
>