Message-ID: <dec260e9-8874-4727-9211-de939991a344@gmail.com>
Date: Thu, 22 May 2025 19:49:24 +0200
From: Gabor Juhos <j4g8y7@...il.com>
To: Miquel Raynal <miquel.raynal@...tlin.com>,
 Md Sadre Alam <quic_mdalam@...cinc.com>
Cc: Mark Brown <broonie@...nel.org>,
 Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
 Richard Weinberger <richard@....at>, Vignesh Raghavendra <vigneshr@...com>,
 Varadarajan Narayanan <quic_varada@...cinc.com>,
 Sricharan Ramabadhran <quic_srichara@...cinc.com>,
 linux-spi@...r.kernel.org, linux-mtd@...ts.infradead.org,
 linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH next 2/2] spi: spi-qpic-snand: add support for 8 bits ECC
 strength

On 2025. 05. 21. 9:52, Miquel Raynal wrote:
> On 21/05/2025 at 11:08:02 +0530, Md Sadre Alam <quic_mdalam@...cinc.com> wrote:
> 
>> Hi,
>>
>> On 5/16/2025 7:44 PM, Miquel Raynal wrote:
>>>
>>>>>> Interestingly enough, it reports the correct number of bit errors now.
>>>>>> For me it seems, that the hardware reports the number of the corrected
>>>>>> *bytes* instead of the corrected *bits*.
>>>>> I doubt that, nobody counts bytes of errors.
>>>>> Your results are surprising. I initially leaned towards a software
>>>>> bug, but then it looks even weirder than that. Alam?
>>>> I have checked with the HW team: the QPIC ECC HW engine reports bit
>>>> errors byte-wise, not bit-wise.
>>>>
>>>> e.g.
>>>>      Byte0 --> 2-bitflips --> QPIC ECC counts 1 only
>>>>      Byte1 --> 3-bitflips --> QPIC ECC counts 1 only
>>>>      Byte2 --> 1-bitflips --> QPIC ECC counts 1 only
>>>>      Byte3 --> 4-bitflips --> QPIC ECC counts 1 only (in 8-bit ecc)
>>>>      Byte4 --> 6-bitflips --> QPIC ECC counts 1 only (in 8-bit ecc)
>>>>
>>>> I hope this clarifies things now.
>>> o_O ????
>>> How is that even useful? This basically means UBI will never refresh the
>>> data because we will constantly underestimate the number of bitflips! We
>>> need to know the actual number, this averaging does not make any sense
>>> for Linux. Is there another way to get the raw number of bitflips?
>> I have re-checked with the HW team. Unfortunately, there are currently no
>> register fields available to get the raw number of bitflips, although
>> this has been fixed for newer chipsets. For now, bit 8 of the QPIC
>> QPIC_NANDC_BUFFER_STATUS (0x79B0018) register gets set if
>> uncorrectable bitflips have happened.
>>
>> With 4-bit ECC, if 5 raw bitflips happen, then bit 8 gets set in
>> QPIC_NANDC_BUFFER_STATUS.
>>
>> Similarly, with 8-bit ECC, if 9 raw bitflips happen, then bit 8 gets
>> set in QPIC_NANDC_BUFFER_STATUS.
> 
> I believe the unrecoverable situation is handled correctly. What is not
> handled is that we care about the number of bitflips before having a
> failure, because if it reaches a certain threshold (typically 2/3 of the
> strength) the upper layer is responsible for moving the data around to
> avoid losing it.
> 
> You need to identify the hardware revision that fixed it and provide a
> warning otherwise, or at least a comment in the code...

By itself, neither a comment nor a warning will help as far as the upper layer
is concerned. However, the driver can be changed to overestimate the number of
corrected bitflips.

I just sent a patch [1] which tries to address this. I admit that it is not
ideal, but in my opinion it is a reasonable tradeoff which can be used as a
temporary solution.
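
To illustrate the problem and the general idea, here is a minimal user-space
sketch. It is not the code from [1] and it does not touch any QPIC register.
All function names are hypothetical, the byte-wise counter only models the
behaviour described above, and the "report the full ECC strength as soon as
anything was corrected" policy is just one possible way to overestimate, used
here as an assumption. The 2/3 threshold is the typical value mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Actual number of flipped bits between the written and the read data. */
unsigned int real_bitflips(const uint8_t *written, const uint8_t *read,
			   size_t len)
{
	unsigned int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t diff = written[i] ^ read[i];

		while (diff) {
			diff &= (uint8_t)(diff - 1); /* clear lowest set bit */
			flips++;
		}
	}

	return flips;
}

/*
 * Model of the byte-wise counting described in this thread: every
 * corrupted byte counts as 1, no matter how many of its bits flipped.
 */
unsigned int bytewise_count(const uint8_t *written, const uint8_t *read,
			    size_t len)
{
	unsigned int bytes = 0;

	for (size_t i = 0; i < len; i++)
		if (written[i] != read[i])
			bytes++;

	return bytes;
}

/*
 * Hypothetical overestimation policy (an assumption, not necessarily what
 * the patch does): if anything was corrected, report at least the full
 * ECC strength, which is an upper bound for a successful correction.
 */
unsigned int overestimate_bitflips(unsigned int corrected_bytes,
				   unsigned int strength)
{
	if (!corrected_bytes)
		return 0;

	return corrected_bytes > strength ? corrected_bytes : strength;
}

/* Refresh threshold, assuming the typical 2/3 of the ECC strength. */
unsigned int refresh_threshold(unsigned int strength)
{
	return (2 * strength) / 3;
}

/* True if the upper layer would decide to move the data. */
bool needs_refresh(unsigned int reported, unsigned int strength)
{
	return reported >= refresh_threshold(strength);
}
```

For example, with 8 bits of ECC strength and two corrupted bytes having 3
bitflips each, real_bitflips() gives 6 while bytewise_count() gives only 2.
The threshold is 5, so the byte-wise value never triggers a refresh, whereas
the overestimated value (8) does. The price is that the data gets moved more
often than strictly necessary.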

As a long-term fix, it would probably be possible to change the driver to do
the ECC correction in software, although I have no idea how that would impact
performance.

[1]
https://lore.kernel.org/r/20250522-qpic-snand-overestimate-bitflips-v1-1-35c65c05068e@gmail.com

Regards,
Gabor
