linux-kernel - Re: [PATCH 0/2] Fix a black screen on sc7180 Trogdor devices

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAA8EJpqVGHqufKo1kV52RzQCNL5D92mmnCzUwKZn4o+5=wF9pQ@mail.gmail.com>
Date: Wed, 1 May 2024 21:11:14 +0300
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Stephen Boyd <swboyd@...omium.org>
Cc: Doug Anderson <dianders@...omium.org>, Michael Turquette <mturquette@...libre.com>, 
	Stephen Boyd <sboyd@...nel.org>, Bjorn Andersson <andersson@...nel.org>, linux-kernel@...r.kernel.org, 
	linux-clk@...r.kernel.org, patches@...ts.linux.dev, 
	linux-arm-msm@...r.kernel.org, Taniya Das <quic_tdas@...cinc.com>, 
	Laura Nao <laura.nao@...labora.com>
Subject: Re: [PATCH 0/2] Fix a black screen on sc7180 Trogdor devices

On Wed, 1 May 2024 at 03:17, Stephen Boyd <swboyd@...omium.org> wrote:
>
> Quoting Doug Anderson (2024-03-28 09:39:54)
> >
> > I spent a bunch of time discussing this offline with Stephen and I'll
> > try to summarize. Hopefully this isn't too much nonsense...
> >
> > 1. We'll likely land the patches downstream in ChromeOS for now while
> > we're figuring things out since we're seeing actual breakages. Whether
> > to land upstream is a question. The first patch is a bit of a hack but
> > unlikely to cause any real problems. The second patch seems correct
> > but it also feels like it's going to cause stuck clocks for a pile of
> > other SoCs because we're not adding hacks similar to the sc7180 hack
> > for all the other SoCs. I guess we could hope we get lucky or play
> > whack-a-mole? ...or we try to find a more generic solution... Dunno
> > what others think.
>
> I think we should hope to get lucky or play whack-a-mole and merge
> something like this series. If we have to we can similarly turn off RCGs
> or branches during driver probe that are using shared parents like we
> have on sc7180.
>
> Put simply, the shared RCG implementation is broken because it reports
> the wrong parent for clk_ops::get_parent() and doesn't clear the force
> enable bit. With the current code we're switching the parent to XO when
> the clk is enabled the first time. That's an obvious bug that we should
> fix regardless of implementing proper clk handoff. We haven't
> implemented handoff in over a decade. Blocking this bug fix on
> implementing handoff isn't practical.

Yes, that needs to be fixed. My approach was to drop the XO parent and
consider the clock to be off if it is fed by the XO.

> Furthermore, we're relying on clk
> consumers to clear that force enable bit by enabling the clk once. That
> doesn't make any sense, although we could use that force enable bit to
> consider the RCG as enabled for clk_disable_unused.

That patch seems fine to me. Will it work if we take the force-enable
bit into account when considering the clock to be on or off?

>
> An alternative approach to this series would be to force all shared RCGs
> to be parented to XO at clk registration time, and continue to clear
> that RCG force enable bit. That's sort of what Dmitry was going for
> earlier. Doing this would break anything that's relying on the clks
> staying enabled at some frequency through boot, but that isn't supported
> anyway because clk handoff isn't implemented. It avoids the problem that
> the first patch is for too because XO doesn't turn off causing a clk to
> get stuck on. I can certainly craft this patch up if folks think that's
> better.

I think this approach makes sense too (and might be preferable to
boot-time hacks).
On most of the platforms we are already resetting the MDSS as soon as
the mdss (root device) is being probed. Then the display is going to
be broken until DPU collects all the coonectors and outpus and finally
creates the DRM device.

But I think we should fix the get_parent() too irrespectively of this.

> To ease the transition we can make a new clk_ops for the RCG as well so
> that each SoC has to opt-in to use this behavior. Then we can be certain
> that other platforms aren't affected without being tested first. I'd
> prefer to not do that though because I fear we'll be leaving drivers in
> the broken state for some time.

SGTM

--
With best wishes
Dmitry