Message-ID: <de5e8643-49bb-4e0e-45fd-51b25ecf530d@manjaro.org>
Date: Sat, 18 Oct 2025 15:57:24 +0200
From: "Dragan Simic" <dsimic@...jaro.org>
To: "Hugh Cole-Baker" <sigmaris@...il.com>
Cc: "Jimmy Hon" <honyuenkwun@...il.com>, "Tianling Shen" <cnsztl@...il.com>, "Rob Herring" <robh@...nel.org>, "Krzysztof Kozlowski" <krzk+dt@...nel.org>, "Conor Dooley" <conor+dt@...nel.org>, "Heiko Stuebner" <heiko@...ech.de>, "Grzegorz Sterniczuk" <grzegorz@...rnicz.uk>, "Jonas Karlman" <jonas@...boo.se>, devicetree@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: dts: rockchip: fix eMMC corruption on NanoPC-T6 with A3A444 chips

Hello Hugh,

On Saturday, October 18, 2025 14:14 CEST, Hugh Cole-Baker <sigmaris@...il.com> wrote:
> On 18/10/2025 09:30, Dragan Simic wrote:
> > On Saturday, October 18, 2025 02:42 CEST, Jimmy Hon <honyuenkwun@...il.com> wrote:
> >> On Fri, Oct 17, 2025 at 10:15 AM Dragan Simic <dsimic@...jaro.org> wrote:
> >>> On Friday, October 17, 2025 14:08 CEST, Tianling Shen <cnsztl@...il.com> wrote:
> >>>> On 2025/10/17 18:25, Dragan Simic wrote:
> >>>>> On Friday, October 17, 2025 09:39 CEST, Tianling Shen <cnsztl@...il.com> wrote:
> >>>>>> From: Grzegorz Sterniczuk <grzegorz@...rnicz.uk>
> >>>>>>
> >>>>>> Some NanoPC-T6 boards with A3A444 eMMC chips experience I/O errors and
> >>>>>> corruption when using HS400 mode. Downgrade to HS200 mode to ensure
> >>>>>> stable operation.
> >>>>>
> >>>>> Could you, please, provide more details about the troublesome eMMC
> >>>>> chip that gets identified as A3A444, i.e. what's the actual brand
> >>>>> and model?  Maybe you could send a picture of it?  It might also
> >>>>> help if you'd send the contents of
> >>>>> "/sys/class/block/mmcblkX/device/manfid" from your board (where "X"
> >>>>> should equal two).
> >>>>
> >>>> Unfortunately I don't have this board nor this eMMC chip.
> >>>> I got the chip model from my friend, it's FORESEE FEMDNN256G-A3A44,
> >>>> manfid is 0x0000d6.
> >>>
> >>> Thanks for responding and providing the details so quickly!
> >>>
> >>>>> I'm asking for that because I'd like to research it a bit further,
> >>>>> if possible, because some other eMMC chips that are also found on
> >>>>> the NanoPC-T6 seem to work fine in HS400 mode. [1]  It may be that
> >>>>> the A3A444 chip has some issues with the HS400 mode on its own,
> >>>>> i.e. the observed issues may not be caused by the board.
> >>>>
> >>>> Yes, it should be caused by this eMMC chip.
> >>>
> >>> I'd suggest that we move forward by "quirking off" the HS400 mode
> >>> for the FEMDNN256G-A3A44 eMMC chip in the MMC drivers, instead of
> >>> downgrading the speed of the sdhci interface on the NanoPC-T6.
> >>>
> >>> That way, the other similar Foresee eMMC chip that's also found
> >>> on NanoPC-T6 boards, FEMDNN256G-A3A564, will continue to work in
> >>> the faster HS400 mode, while the troublesome A3A44 variant will
> >>> be downgraded to the HS200 globally for everyone's benefit.  It's
> >>> quite unlikely that the A3A44 variant fails to work reliably in
> >>> HS400 mode on the NanoPC-T6 only, so quirking it off in the MMC
> >>> drivers should be a sane and safe choice.
> >>>
> >>> If you agree with dropping this patch, I'll be more than happy
> >>> to implement this HS200 quirk in the MMC drivers.
> >>>
> >>> As a note, FEMDNN256G-A3A44 is found in the Rockchip Qualified
> >>> eMMC Support List v1.84, [2] but the evidence says the opposite,
> >>> so we should react appropriately by adding this quirk.
> >>
> >> When adding the quirk for the A3A44, can we lower the max frequency
> >> and keep the HS400 mode instead?
> >> That's what the Fedora folks found works [3]. There are more test
> >> results in Armbian [4].
> > 
> > Are there any I/O performance tests that would prove that lowering
> > the HS400 frequency to 150 MHz ends up working significantly faster
> > than dropping the eMMC chip to HS200 mode?
> > 
> > I'm asking that because lowering the frequency looks much more like
> > there's some issue with the board, rather than the issue being the
> > eMMC chip's support for HS400 mode.  Thus, a quirk that would lower
> > the HS400 mode frequency would likely be frowned upon and rejected,
> > while a quirk that puts the chip into HS200 mode is much cleaner
> > and has much higher chances to be accepted.
> 
> I also have the NanoPC-T6 with one of the A3A444 eMMCs which suffers
> from I/O errors in the default HS400 mode. These are its details in
> /sys/block/mmcblk0/device/:
> manfid: 0x0000d6
> oemid: 0x0103
> name: A3A444
> fwrev: 0x1100000000000000
> hwrev: 0x0
> rev: 0x8

Thanks for reporting the same issue with the same board and
increasing our sample size to two. :)
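
For anyone else who wants to check whether their board carries the same chip, here is a minimal sketch that reads the sysfs attributes quoted above and compares them with the values reported in this thread. The helper names (`read_mmc_identity`, `is_affected`) and the `AFFECTED` table are hypothetical, made up for illustration only, and the device path is assumed to be the one from the report above:

```python
from pathlib import Path

# Identification values reported in this thread for the troublesome chip.
AFFECTED = {"manfid": "0x0000d6", "name": "A3A444"}

def read_mmc_identity(device_dir):
    """Read the manfid, oemid and name attributes from an MMC sysfs directory."""
    device = Path(device_dir)
    return {attr: (device / attr).read_text().strip()
            for attr in ("manfid", "oemid", "name")}

def is_affected(identity):
    """Return True if the identity matches the values reported in this thread."""
    return all(identity.get(key) == value for key, value in AFFECTED.items())

# Path taken from the report above; it may differ on other boards.
device_dir = Path("/sys/block/mmcblk0/device")
if device_dir.is_dir():
    identity = read_mmc_identity(device_dir)
    print(identity, "affected" if is_affected(identity) else "not affected")
```

Matching on manfid and name only is an assumption here; whether oemid or the revision fields also need to be checked is not something this thread establishes.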

> I wasn't sure if I was just unlucky to get a faulty chip, but seeing
> this thread it seems like a wider issue. On my board, limiting it to
> HS200 mode gets rid of the I/O errors, and it seems that lowering
> the frequency to 150MHz also avoids I/O errors.
> 
> I did a quick unscientific test with fio; HS400 Enhanced Strobe mode
> with a 150MHz clock gives slightly better performance than HS200:
> 
> HS200 mode:
> read: IOPS=697, BW=43.6MiB/s
> write: IOPS=697, BW=43.6MiB/s
> 
> HS400 mode with 150MHz clock:
> read: IOPS=805, BW=50.3MiB/s
> write: IOPS=799, BW=50.0MiB/s
> 
> so from my perspective, limiting the frequency would be a better fix
> than disabling HS400 entirely.

Thanks for running these tests!  The measured difference in I/O
performance is about 15%, which surely isn't insignificant, but
IMHO it makes the proposed downgrade of the eMMC chip to HS200
mode fall into the "good safety margin" bracket that I described
earlier.  I think it's better to sacrifice that 15% to stay on
the, hopefully, rock-solid side.
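
For reference, the roughly 15% figure follows directly from the fio bandwidth numbers quoted above.  A minimal sketch of the arithmetic, with a hypothetical helper name (`relative_gain`) used only for illustration:

```python
def relative_gain(baseline, candidate):
    """Return how much larger candidate is than baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0

# Bandwidth in MiB/s, copied from the fio results quoted above.
hs200 = {"read": 43.6, "write": 43.6}
hs400_150mhz = {"read": 50.3, "write": 50.0}

gains = {op: relative_gain(hs200[op], hs400_150mhz[op]) for op in hs200}
for op, gain in gains.items():
    print(f"{op}: HS400 at 150 MHz is {gain:.1f}% faster than HS200")
```

These are single runs of a quick test, so the result is an indication rather than a precise measurement.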

I've been thinking more about the 150 MHz HS400 and HS200 quirks,
and I'm afraid I'm even more sure that the 150 MHz HS400 quirk
would be frowned upon and rejected.  It makes the problem look
like a board-level issue that requires a board-level fix, rather
than a chip-level issue for which a quirk would be fine.  The
acceptably small difference in the measured performance levels
only reinforces that viewpoint, I'm afraid.

> It could also be of interest that the clock used apparently can't
> provide an exact 200MHz, e.g. in HS200 mode:
> 
> root@t6:~# cat /sys/kernel/debug/mmc0/ios
> clock:		200000000 Hz
> actual clock:	187500000 Hz
> vdd:		18 (3.0 ~ 3.1 V)
> bus mode:	2 (push-pull)
> chip select:	0 (don't care)
> power mode:	2 (on)
> bus width:	3 (8 bits)
> timing spec:	9 (mmc HS200)
> signal voltage:	1 (1.80 V)
> driver type:	0 (driver type B)

Thanks, that's also something to think about.
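
To put a number on it: the debugfs output quoted above shows a requested clock of 200 MHz and an actual clock of 187.5 MHz.  Below is a small sketch that computes the shortfall, assuming the "ios" text format shown above; the helper name (`parse_ios`) is hypothetical and the sample text is just the first two lines of that output:

```python
import re

# First two lines of the debugfs "ios" output quoted above.
IOS_SAMPLE = """clock:\t\t200000000 Hz
actual clock:\t187500000 Hz
"""

def parse_ios(text):
    """Extract the requested and actual clock rates (in Hz) from ios text."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(clock|actual clock):\s+(\d+) Hz$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

clocks = parse_ios(IOS_SAMPLE)
shortfall = (clocks["clock"] - clocks["actual clock"]) / clocks["clock"] * 100.0
print(f"actual clock is {shortfall:.2f}% below the requested rate")
```

So the HS200 figures above were measured at 187.5 MHz rather than at a full 200 MHz.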

> > With all that in mind, if the resulting I/O performance difference
> > between 150 MHz HS400 and HS200 is within 15-20% or so, I'd highly
> > recommend that we still go with the HS200 quirk.  It also leaves
> > us with a nice safety margin, which is always good to have when
> > such hardware instability issues are worked around in software,
> > unless detailed eye diagrams, protocol dumps and whatnot can be
> > pulled and analyzed, in which case the resulting safety margin
> > can be much slimmer.
> > 
> > Ideally, we'd have a completely different board with the same
> > Foresee FEMDNN256G-A3A44 eMMC chip to test how reliably its HS400
> > mode works there, to see whether it's really up to this eMMC chip or up
> > to the board design, but I'm afraid we don't have that (easily)
> > available, so the only remaining option is to work with what's
> > actually available, which inevitably leads to a certain amount
> > of guesswork and some compromises.
> > 
> >>> [1] https://github.com/openwrt/openwrt/issues/18844
> >>> [2] https://dl.radxa.com/rock5/hw/RKeMMCSupportList%20Ver1.84_20240815.pdf
> >> [3] https://lists.fedoraproject.org/archives/list/kernel@lists.fedoraproject.org/thread/MCSDYDQVOXS5AZMKA7LLY4QX7JXBWPCA/
> >> [4] https://github.com/armbian/build/pull/8736#issuecomment-3387760536

