lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3556261.BddDVKsqQX@workhorse>
Date: Thu, 04 Sep 2025 17:43:23 +0200
From: Nicolas Frattaroli <nicolas.frattaroli@...labora.com>
To: Ulf Hansson <ulf.hansson@...aro.org>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel@...labora.com, linux-rockchip@...ts.infradead.org,
 Cristian Ciocaltea <cristian.ciocaltea@...labora.com>,
 Sebastian Reichel <sebastian.reichel@...labora.com>,
 Heiko Stuebner <heiko@...ech.de>
Subject: Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state

On Thursday, 4 September 2025 14:50:13 Central European Summer Time you wrote:
> On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > <nicolas.frattaroli@...labora.com> wrote:
> > >
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > 
> > I can't find this commit hash. What tree are you using when testing this?
> > 
> > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
> 
> Probably, I did the revert on a rebased branch and then rebased the revert
> onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> there is a v2 (it might actually become a different patch, see huge text
> below, sorry!)
> 
> > 
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> > 
> > This sounds a bit vague to me, can you please clarify and elaborate a
> > bit more so I can try to help.
> > 
> > What does "UFS controller enabled but unused" actually mean? Has the
> > UFS controller driver been probed successfully and thus its
> > corresponding device been attached to its PM domain?
> 
> It means the UFS controller driver has probed, but does not find a
> UFS storage chip connected to it, and therefore reports a probe
> failure. This is a possibility on single-board computers like the
> Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> into some headers.
> 
> > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > until sync_state") brings along with its corresponding sync_state
> > series for genpd [1]. Again, more information is needed to understand
> > what goes wrong.
> 
> [snip very long rubber duck debugging session]

Okay so I believe I have found the root cause of the regression. UFS is
innocent, disabling UFS just happens to avoid it due to how the timing of
things works out.

The real issue is that the NPU power domains on the RK3576, which are
currently unused, have an undeclared dependency on vdd_npu_s0.

Declaring this dependency with a `domain-supply` and adding the
necessary flag in the rockchip PD controller to use it does not solve
he problem. This is because the rockchip PD controller cannot acquire
those supplies during probe, as they're not available yet and their
availability depends on the PD controller finishing probe.

That's why it acquires them in the PD enable callback, but the NPU
PDs are never enabled because they're unused.

This worked fine when unused PDs were still turned off quite early, as
this meant they were turned off before regulators. Now the unused
regulators are turned off before turning off the unused PDs happens.

I don't really see an easy way to fix this with a patch that's fit for
an rc cycle. We can't request the regulator early or even just add a
device link, as the regulator is not around yet.

Marking vdd_npu_s0 as always-on would be abusing DT to work around a
Linux kernel shortcoming, which is a no-no.

What we need is either a way to register with pmdomain core that
certain PDs need a late init for additional supplies, which is then
called before any of the unused regulator power off functionality is
invoked by the regulator core.

Any ideas?

Kind regards,
Nicolas Frattaroli


> 
> Kind regards,
> Nicolas Frattaroli
> 
> > 
> > Kind regards
> > Uffe
> > 
> > [1]
> > https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
> > 
> > >
> > > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@...labora.com>
> > > ---
> > >  drivers/pmdomain/core.c | 4 ----
> > >  1 file changed, 4 deletions(-)
> > >
> > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> > > --- a/drivers/pmdomain/core.c
> > > +++ b/drivers/pmdomain/core.c
> > > @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
> > >         return ret;
> > >  }
> > >
> > > -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
> > >  static bool pd_ignore_unused;
> > >  static int __init pd_ignore_unused_setup(char *__unused)
> > >  {
> > > @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
> > >         return 0;
> > >  }
> > >  late_initcall_sync(genpd_power_off_unused);
> > > -#endif
> > >
> > >  #ifdef CONFIG_PM_SLEEP
> > >
> > > @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
> > >         list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> > >                 if (genpd->provider == of_fwnode_handle(np)) {
> > >                         genpd_lock(genpd);
> > > -                       genpd->stay_on = false;
> > >                         genpd_power_off(genpd, false, 0);
> > >                         genpd_unlock(genpd);
> > >                 }
> > > @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
> > >
> > >         case GENPD_SYNC_STATE_SIMPLE:
> > >                 genpd_lock(genpd);
> > > -               genpd->stay_on = false;
> > >                 genpd_power_off(genpd, false, 0);
> > >                 genpd_unlock(genpd);
> > >                 break;
> > >
> > > ---
> > > base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> > > change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
> > >
> > > Best regards,
> > > --
> > > Nicolas Frattaroli <nicolas.frattaroli@...labora.com>
> > >
> > 
> 
> 





Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ