Message-ID: <3764923.NsmnsBrXv5@archbook>
Date: Wed, 04 May 2022 19:51:45 +0200
From: Nicolas Frattaroli <frattaroli.nicolas@...il.com>
To: Sudeep Holla <sudeep.holla@....com>
Cc: linux-arm-kernel@...ts.infradead.org,
linux-rockchip@...ts.infradead.org,
Cristian Marussi <cristian.marussi@....com>,
Heiko Stuebner <heiko@...ech.de>,
Liang Chen <cl@...k-chips.com>, linux-kernel@...r.kernel.org,
Kever Yang <kever.yang@...k-chips.com>,
Jeffy Chen <jeffy.chen@...k-chips.com>,
Peter Geis <pgwipeout@...il.com>
Subject: Re: [BUG] New arm scmi check in linux-next causing rk3568 not to boot due to firmware bug

On Wednesday, 4 May 2022 15:21:30 CEST Sudeep Holla wrote:
> + Cristian
>
> Hi Nicolas,
>
> Thanks for the formal report.
>
> On Wed, May 04, 2022 at 02:49:07PM +0200, Nicolas Frattaroli wrote:
> > Good day,
> >
> > a user on the #linux-rockchip channel on the Libera.chat IRC network
> > reported that their RK3568 was no longer getting a CPU and GPU clock
> > from scmi and consequently not booting when using linux-next. This
> > was bisected down to the following commit:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/firmware/arm_scmi/base.c?h=next-20220503&id=3b0041f6e10e5bdbb646d98172be43e88734ed62
> >
> > The error message in the log is as follows:
> >
> > arm-scmi firmware:scmi: Malformed reply - real_sz:8 calc_sz:4, t->rx.len is 12, sizeof(u32) is 4, loop_num_ret is 3
> >
> > The rockchip firmware (bl31) being used was v1.32, from here:
> >
> > https://github.com/JeffyCN/rockchip_mirrors/blob/rkbin/bin/rk35/rk3568_bl31_v1.32.elf
> >
>
> So this platform is not supported in upstream TF-A like its predecessors ?

Hello,

it is not yet supported by upstream. If I recall correctly, Rockchip plans
to release the sources for it at some point, but I believe their software
team has been very busy with new hardware releases, so it hasn't happened
yet. I hope we'll eventually see an open source release of the TF-A sources,
so that bugs like this can be fixed without waiting for the vendor to do it
for us.

>
> > This seems like a non-fatal firmware bug, for which a kernel workaround is
> > certainly possible, but it would be good if rockchip could fix this in their
> > firmware.
> >
>
> Indeed, we added this check finding issue in one of our tests. Luckily
> it helped to unearth the same issue on this platform, but due to the
> nature of its f/w release, it is bit unfortunate that it can't be fixed
> easily and quickly. But I really wish this gets fixed in the firmware.
> Are there any other f/w bugs reported so far ? If so how are they fixed
> as I don't expect all such bugs can be worked around in the kernel though
> this might be. I would like to hear details there if possible.

I don't know how Rockchip's bug reporting workflow works. They do seem to
have updated the firmware multiple times, most recently in October 2021.

The official Rockchip repository at [1] hasn't been kept as up to date as
the mirror maintained by a Rockchip employee at [2], so most people seem to
have been using the latter. Speaking of which, I'll add the owner of that
repo to the CC of this thread to make sure this doesn't get lost.

Rockchip lists an e-mail address at [3] for reporting issues, but this seems
to relate to their open-source documentation. The official GitHub repository
"rkbin" in the "rockchip-linux" organisation does not have issues enabled,
so submitting a bug report through that is unfortunately not possible.

>
> > The user going by "amazingfate" reported that commenting out the
> > ret = -EPROTO; break;
> > fixes the issue for them.
> >
>
> Sure, or we could relax the check as calc_sz <= real_sz or something so
> that the reverse is still caught and handled as OS might read junk data in
> the later case.

This seems like a good solution. That way the kernel still rejects replies
that are shorter than expected, where it would otherwise read junk data,
while tolerating replies that are longer than expected. In either case it
should still print a dev_err, since a size mismatch is an error even if we
can tolerate it in some cases.
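
To make the proposal concrete, here is a minimal userspace sketch of the
relaxed check. It is not the code in drivers/firmware/arm_scmi/base.c. The
function names are hypothetical, and the calc_sz formula is an assumption
that I reconstructed so that it reproduces the values in the log above
(t->rx.len 12, loop_num_ret 3, real_sz 8, calc_sz 4). fprintf() stands in
for dev_err().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: payload size expected for num_ret one-byte entries,
 * rounded up to a whole number of 32-bit words. The formula is an assumption
 * chosen to match the logged calc_sz; num_ret must be non-zero.
 */
static size_t expected_payload_size(uint32_t num_ret)
{
	return (1 + (num_ret - 1) / sizeof(uint32_t)) * sizeof(uint32_t);
}

/*
 * Hypothetical relaxed check: returns 0 if the reply is usable and -EPROTO
 * if it is too short to hold the advertised entries. A reply that is longer
 * than expected is reported but tolerated.
 */
static int check_reply_size(size_t rx_len, uint32_t num_ret)
{
	size_t real_sz, calc_sz;

	/* Assumption for this sketch: zero entries means nothing to validate. */
	if (num_ret == 0)
		return 0;

	/* The reply must at least hold the leading 32-bit word. */
	if (rx_len < sizeof(uint32_t))
		return -EPROTO;

	real_sz = rx_len - sizeof(uint32_t);
	calc_sz = expected_payload_size(num_ret);

	if (calc_sz != real_sz) {
		/* Always report the mismatch, even when it is tolerated. */
		fprintf(stderr, "Malformed reply - real_sz:%zu calc_sz:%zu\n",
			real_sz, calc_sz);

		/* Only a short reply is fatal: reading it would yield junk. */
		if (calc_sz > real_sz)
			return -EPROTO;
	}

	return 0;
}
```

With the values from the RK3568 log, check_reply_size(12, 3) prints the
error and returns 0, so the boot would continue. A reply with rx_len 8 that
claims 5 entries still fails with -EPROTO.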
>
> > I'm writing here to get the discussion started on how we can resolve this
> > before the Linux 5.19 release.
> >
>
> Agreed, I just sent by pull request for this literally few hours ago.
>
> > Sudeep Holla has already told me they'll gladly add a workaround before
> > the 5.19 release, but would rather see this fixed in the vendor firmware
> > first. Would rockchip be able and willing to fix it and publish a new
> > bl31 for rk3568?
> >
>
> Indeed and as mentioned above details on how other such f/w bugs are dealt
> in general esp that the firmware is blob release and one can't fix it easily.
> Do we have a bugzilla kind of setup to report and get the bugs fixed ?

It's worth mentioning that even if we get Rockchip to fix the bug in the
firmware, I believe Linux should still add a workaround. Otherwise, people
running older firmware who upgrade their kernels could suddenly end up with
unbootable systems and no idea why that happened.

Regards,
Nicolas Frattaroli

PS: I've also CC'd Peter Geis, as he has done a lot of work on the RK356x
support in mainline and I believe he has been in contact with Rockchip
about releasing the TF-A sources before.

[1]: https://github.com/rockchip-linux/rkbin
[2]: https://github.com/JeffyCN/rockchip_mirrors/tree/rkbin
[3]: http://opensource.rock-chips.com/wiki_Main_Page