Message-ID: <20230620155902.clspxncyvpodixft@ripper>
Date: Tue, 20 Jun 2023 08:59:02 -0700
From: Bjorn Andersson <andersson@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>,
Amit Pundir <amit.pundir@...aro.org>,
Mark Brown <broonie@...nel.org>,
Andy Gross <agross@...nel.org>,
Rob Herring <robh+dt@...nel.org>,
Konrad Dybcio <konrad.dybcio@...aro.org>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
Caleb Connolly <caleb.connolly@...aro.org>,
Conor Dooley <conor+dt@...nel.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
dt <devicetree@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] arm64: dts: qcom: sdm845-db845c: Move LVS regulator nodes up

On Wed, Jun 14, 2023 at 12:44:15PM -0700, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski
> <krzysztof.kozlowski@...aro.org> wrote:
> >
> > On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > On 02.06.23 18:12, Amit Pundir wrote:
> > >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
> > >> list to workaround a boot regression uncovered by the upstream
> > >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
> > >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
> > >>
> > >> Without this fix DB845c fail to boot at times because one of the
> > >> lvs1 or lvs2 regulators fail to turn ON in time.
> > >
> > > /me waves friendly
> > >
> > > FWIW, as it's not obvious: this...
> > >
> > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
> > >
> > > ...is a report about a regression. One that we could still solve before
> > > 6.4 is out. One I'll likely point Linus to, unless a fix comes into
> > > sight.
> > >
> > > When I noticed the reluctant replies to this patch I earlier today asked
> > > in the thread with the report what the plan forward was:
> > > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/
> > >
> > > Doug there replied:
> > >
> > > ```
> > > Of the two proposals made (the revert vs. the reordering of the dts),
> > > the reordering of the dts seems better. It only affects the one buggy
> > > board (rather than preventing us to move to async probe for everyone)
> > > and it also has a chance of actually fixing something (changing the
> > > order that regulators probe in rpmh-regulator might legitimately work
> > > around the problem). That being said, just like the revert the dts
> > > reordering is still just papering over the problem and is fragile /
> > > not guaranteed to work forever.
> > > ```
> > >
> > > Papering over obviously is not good, but has anyone a better idea to fix
> > > this? Or is "not fixing" for some reason a viable option here?
> > >
> >
> > I understand there is a regression, although kernel is not mainline
> > (hash df7443a96851 is unknown) and the only solutions were papering the
> > problem. Reverting commit is a temporary workaround. Moving nodes in DTS
> > is not acceptable because it hides actual problem and only solves this
> > one particular observed problem, while actual issue is still there. It
> > would be nice to be able to reproduce it on real mainline with normal
> > operating system (not AOSP) - with ramdisk/without/whatever. So far no
> > one did it, right?
>
> The worry I have about the revert here is that it will never be able
> to be undone and that doesn't seem great long term. I'm all for a
> temporary revert to fix a problem while the root cause is understood,
> but in this case I have a hard time believing that we'll make more
> progress towards a root cause once the revert lands. All the
> investigation we've done so far seems to indicate that the revert only
> fixes the problem by luck...
>
> I completely agree that moving the nodes in the DTS is a hack and just
> hides the problem. However, it also at least limits the workaround to
> the one board showing the problem and doesn't mean we're stuck with
> synchronous probe for rpmh-regulator for all eternity because nobody
> can understand this timing issue on db845c.
>

I agree that we shouldn't hide this by reverting the regulator change.
And as has been stated a few times already, the symptom indicates that
we have a misconfigured system.

Before accepting a patch just shuffling the bricks, I'd like to see some
more analysis of what happens wrt the rpmh right before the timeout.

Perhaps the landing team can assist here?

Regards,
Bjorn