Message-ID: <b22425c3-01e0-4d2e-bf78-5db884d4ec38@gmail.com>
Date: Sat, 18 Oct 2025 13:14:46 +0100
From: Hugh Cole-Baker <sigmaris@...il.com>
To: Dragan Simic <dsimic@...jaro.org>, Jimmy Hon <honyuenkwun@...il.com>
Cc: Tianling Shen <cnsztl@...il.com>, Rob Herring <robh@...nel.org>,
 Krzysztof Kozlowski <krzk+dt@...nel.org>, Conor Dooley
 <conor+dt@...nel.org>, Heiko Stuebner <heiko@...ech.de>,
 Grzegorz Sterniczuk <grzegorz@...rnicz.uk>, Jonas Karlman <jonas@...boo.se>,
 devicetree@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 linux-rockchip@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: dts: rockchip: fix eMMC corruption on NanoPC-T6
 with A3A444 chips

Hi Dragan,

On 18/10/2025 09:30, Dragan Simic wrote:
> Hello Jimmy,
> 
> On Saturday, October 18, 2025 02:42 CEST, Jimmy Hon <honyuenkwun@...il.com> wrote:
>> On Fri, Oct 17, 2025 at 10:15 AM Dragan Simic <dsimic@...jaro.org> wrote:
>>> On Friday, October 17, 2025 14:08 CEST, Tianling Shen <cnsztl@...il.com> wrote:
>>>> On 2025/10/17 18:25, Dragan Simic wrote:
>>>>> On Friday, October 17, 2025 09:39 CEST, Tianling Shen <cnsztl@...il.com> wrote:
>>>>>> From: Grzegorz Sterniczuk <grzegorz@...rnicz.uk>
>>>>>>
>>>>>> Some NanoPC-T6 boards with A3A444 eMMC chips experience I/O errors and
>>>>>> corruption when using HS400 mode. Downgrade to HS200 mode to ensure
>>>>>> stable operation.
>>>>>
>>>>> Could you, please, provide more details about the troublesome eMMC
>>>>> chip that gets identified as A3A444, i.e. what's the actual brand
>>>>> and model?  Maybe you could send a picture of it?  It might also
>>>>> help if you'd send the contents of
>>>>> "/sys/class/block/mmcblkX/device/manfid" from your board (where "X"
>>>>> should equal two).
>>>>
>>>> Unfortunately I don't have this board nor this eMMC chip.
>>>> I got the chip model from my friend, it's FORESEE FEMDNN256G-A3A44,
>>>> manfid is 0x0000d6.
>>>
>>> Thanks for responding and providing the details so quickly!
>>>
>>>>> I'm asking for that because I'd like to research it a bit further,
>>>>> if possible, because some other eMMC chips that are also found on
>>>>> the NanoPC-T6 seem to work fine in HS400 mode. [1]  It may be that
>>>>> the A3A444 chip has some issues with the HS400 mode on its own,
>>>>> i.e. the observed issues may not be caused by the board.
>>>>
>>>> Yes, it should be caused by this eMMC chip.
>>>
>>> I'd suggest that we move forward by "quirking off" the HS400 mode
>>> for the FEMDNN256G-A3A44 eMMC chip in the MMC drivers, instead of
>>> downgrading the speed of the sdhci interface on the NanoPC-T6.
>>>
>>> That way, the other similar Foresee eMMC chip that's also found
>>> on NanoPC-T6 boards, FEMDNN256G-A3A564, will continue to work in
>>> the faster HS400 mode, while the troublesome A3A44 variant will
>>> be downgraded to the HS200 globally for everyone's benefit.  It's
>>> quite unlikely that the A3A44 variant fails to work reliably in
>>> HS400 mode on the NanoPC-T6 only, so quirking it off in the MMC
>>> drivers should be a sane and safe choice.
>>>
>>> If you agree with dropping this patch, I'll be more than happy
>>> to implement this HS200 quirk in the MMC drivers.
>>>
>>> As a note, FEMDNN256G-A3A44 is found in the Rockchip Qualified
>>> eMMC Support List v1.84, [2] but the evidence says the opposite,
>>> so we should react appropriately by adding this quirk.
>>
>> When adding the quirk for the A3A44, can we lower the max frequency
>> and keep the HS400 mode instead?
>> That's what the Fedora folks found works [3]. There are more test
>> results in Armbian [4].
> 
> Are there any I/O performance tests that would prove that lowering
> the HS400 frequency to 150 MHz ends up working significantly faster
> than dropping the eMMC chip to HS200 mode?
> 
> I'm asking that because lowering the frequency looks much more like
> there's some issue with the board, rather than the issue being the
> eMMC chip's support for HS400 mode.  Thus, a quirk that would lower
> the HS400 mode frequency would likely be frowned upon and rejected,
> while a quirk that puts the chip into HS200 mode is much cleaner
> and has much higher chances to be accepted.

I also have the NanoPC-T6 with one of the A3A444 eMMCs which suffers
from I/O errors in the default HS400 mode. These are its details in
/sys/block/mmcblk0/device/:
manfid: 0x0000d6
oemid: 0x0103
name: A3A444
fwrev: 0x1100000000000000
hwrev: 0x0
rev: 0x8
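
For anyone who wants to check whether their own board carries the same part, the attributes listed above can be collected with a short script. This is only a minimal sketch: the function names are hypothetical, and matching on manfid plus name is an assumption made for illustration, not a statement of how a kernel quirk would match the chip.

```python
from pathlib import Path

# Attribute file names as listed in the message above.
CID_FIELDS = ("manfid", "oemid", "name", "fwrev", "hwrev", "rev")


def read_emmc_identity(device_dir):
    """Return the identification attributes found under device_dir as a dict.

    Attributes that do not exist are skipped, so an empty dict is returned
    when the directory is missing.
    """
    base = Path(device_dir)
    identity = {}
    for field in CID_FIELDS:
        attr = base / field
        if attr.is_file():
            identity[field] = attr.read_text().strip()
    return identity


def is_a3a444(identity):
    """Assumed match: manfid 0x0000d6 and name A3A444, as reported above."""
    manfid = int(identity.get("manfid", "0"), 16)
    return manfid == 0xD6 and identity.get("name") == "A3A444"


if __name__ == "__main__":
    ident = read_emmc_identity("/sys/block/mmcblk0/device")
    print(ident)
    print("A3A444 variant:", is_a3a444(ident))
```

The device path may differ on other boards (mmcblk0 is simply what is reported here).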

I wasn't sure if I was just unlucky to get a faulty chip, but seeing
this thread it seems like a wider issue. On my board, limiting it to
HS200 mode gets rid of the I/O errors, and it seems that lowering
the frequency to 150MHz also avoids I/O errors.

I did a quick unscientific test with fio; HS400 Enhanced Strobe mode
with a 150MHz clock gives slightly better performance than HS200:

HS200 mode:
read: IOPS=697, BW=43.6MiB/s
write: IOPS=697, BW=43.6MiB/s

HS400 mode with 150MHz clock:
read: IOPS=805, BW=50.3MiB/s
write: IOPS=799, BW=50.0MiB/s

So, from my perspective, limiting the frequency would be a better fix
than disabling HS400 entirely.
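
As a rough sanity check on the bandwidth figures above, the relative difference can be worked out in a few lines. The helper name is only illustrative, and the numbers are the single-run fio results quoted in this message, so the outcome carries the same "unscientific" caveat.

```python
def relative_gain(baseline, candidate):
    """Percentage by which candidate exceeds baseline."""
    return (candidate - baseline) / baseline * 100.0


# Bandwidth in MiB/s, copied from the fio results above.
hs200 = {"read": 43.6, "write": 43.6}
hs400_150mhz = {"read": 50.3, "write": 50.0}

gains = {op: relative_gain(hs200[op], hs400_150mhz[op]) for op in hs200}
for op, gain in gains.items():
    print(f"{op}: +{gain:.1f}%")
# read: +15.4%
# write: +14.7%
```

That puts the difference at roughly 15% for both reads and writes in this one test.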

It could also be of interest that the clock used apparently can't
provide an exact 200MHz, e.g. in HS200 mode:

root@t6:~# cat /sys/kernel/debug/mmc0/ios
clock:		200000000 Hz
actual clock:	187500000 Hz
vdd:		18 (3.0 ~ 3.1 V)
bus mode:	2 (push-pull)
chip select:	0 (don't care)
power mode:	2 (on)
bus width:	3 (8 bits)
timing spec:	9 (mmc HS200)
signal voltage:	1 (1.80 V)
driver type:	0 (driver type B)
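
The gap between the requested and the actual clock can be read out of that debugfs file programmatically. The sketch below assumes only the two "clock" lines in the format shown above; the function name is hypothetical, and the sample text is a copy of the relevant lines rather than a live read.

```python
import re


def parse_ios_clocks(text):
    """Extract the 'clock' and 'actual clock' values (in Hz) from ios output."""
    clocks = {}
    for line in text.splitlines():
        match = re.match(r"^(clock|actual clock):\s*(\d+)\s*Hz", line)
        if match:
            clocks[match.group(1)] = int(match.group(2))
    return clocks


# The two relevant lines from the HS200 dump above.
sample = """clock:\t\t200000000 Hz
actual clock:\t187500000 Hz
"""

clocks = parse_ios_clocks(sample)
ratio = clocks["actual clock"] / clocks["clock"]
print(f"actual/requested: {ratio:.2%}")
# actual/requested: 93.75%
```

So in HS200 mode the bus here runs at 93.75% of the nominal 200 MHz.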

> With all that in mind, if the resulting I/O performance difference
> between 150 MHz HS400 and HS200 is within 15-20% or so, I'd highly
> recommend that we still go with the HS200 quirk.  It also leaves
> us with a nice safety margin, which is always good to have when
> such hardware instability issues are worked around in software,
> unless detailed eye diagrams, protocol dumps and whatnot can be
> pulled and analyzed, in which case the resulting safety margin
> can be much slimmer.
> 
> Ideally, we'd have a completely different board with the same
> Foresee FEMDNN256G-A3A44 eMMC chip to test how reliably its HS400
> mode works there, to see whether it's really up to this eMMC chip or up
> to the board design, but I'm afraid we don't have that (easily)
> available, so the only remaining option is to work with what's
> actually available, which inevitably leads to a certain amount
> of guesswork and some compromises.
> 
>>> [1] https://github.com/openwrt/openwrt/issues/18844
>>> [2] https://dl.radxa.com/rock5/hw/RKeMMCSupportList%20Ver1.84_20240815.pdf
>> [3] https://lists.fedoraproject.org/archives/list/kernel@lists.fedoraproject.org/thread/MCSDYDQVOXS5AZMKA7LLY4QX7JXBWPCA/
>> [4] https://github.com/armbian/build/pull/8736#issuecomment-3387760536
