linux-kernel - Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0irZj7ttvUqb-iENQS6BX+KTGuTqyVh0DxgKmsoKrBcbA@mail.gmail.com>
Date: Wed, 9 Apr 2025 19:55:19 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Miquel Raynal <miquel.raynal@...tlin.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>, 
	Michael Turquette <mturquette@...libre.com>, Stephen Boyd <sboyd@...nel.org>, 
	Thomas Petazzoni <thomas.petazzoni@...tlin.com>, linux-pm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-clk@...r.kernel.org, 
	Chen-Yu Tsai <wenst@...omium.org>, Lucas Stach <l.stach@...gutronix.de>, 
	Laurent Pinchart <laurent.pinchart@...asonboard.com>, Marek Vasut <marex@...x.de>, 
	Ulf Hansson <ulf.hansson@...aro.org>, Kevin Hilman <khilman@...nel.org>, 
	Fabio Estevam <festevam@...x.de>, Jacky Bai <ping.bai@....com>, Peng Fan <peng.fan@....com>, 
	Shawn Guo <shawnguo@...nel.org>, Shengjiu Wang <shengjiu.wang@....com>, linux-imx@....com, 
	Ian Ray <ian.ray@...ealthcare.com>, Hervé Codina <herve.codina@...tlin.com>, 
	Luca Ceresoli <luca.ceresoli@...tlin.com>, Saravana Kannan <saravanak@...gle.com>
Subject: Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers

Hi,

On Fri, Mar 28, 2025 at 10:59 AM Miquel Raynal
<miquel.raynal@...tlin.com> wrote:
>
> Hello Rafael,
>
> >> The runtime PM core currently allows to runtime resume/suspend a device,
> >> or its suppliers.
> >>
> >> Let's make it also possible to runtime resume/suspend consumers.
> >>
> >> Consumers and suppliers are seen here through the description made by
> >> device_links.
> >
> > It would be good to explain why all of this is needed.
> >
> > I gather that it is used for resolving some synchronization issues in
> > the clk framework, but neither the cover letter nor this changelog
> > explains how it is used.
>
> The explanation is quite long, there have been already 3 full threads
> from people attempting to fix a problem that resides in the clock
> subsystem (but that may also be probably problematic in others, just
> uncovered so far). I don't know if you took the time to read the cover
> letter:
> https://lore.kernel.org/linux-clk/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/
> It tries to explain the problem and the approach to fix this problem,
> but let me try to give a runtime PM focused view of it here.
>
> [Problem]
>
> We do have an ABBA locking situation between clk and any other subsystem
> that might be in use during runtime_resume() operations, provided that
> these subsystems also make clk calls at some point. The usual suspect
> here are power domains.
>
> There are different approaches that can be taken but the one that felt
> the most promising when we discussed it during last LPC (and also the
> one that was partially implemented in the clk subsystem already for a
> tiny portion of it) is the rule that "subsystem locks should not be kept
> acquired while calling in some other subsystems".
>
> Typically in the clk subsystem the logic is:
>
> func() {
>         mutex_lock(clk);
>         runtime_resume(clk);
>         ...
> }
>
> Whereas what would definitely work without locking issues is the
> opposite:
>
> func() {
>         runtime_resume(clk);
>         mutex_lock(clk);
>         ...
> }
>
> Of course life is not so simple, and the clock core is highly
> recursive, which means inverting the two calls like I hinted above
> simply does not work as we go deeper in the subcalls. As a result, we
> need to runtime resume *all* the relevant clocks in advance, before
> calling functions recursively (the lock itself is allowed to re-enter
> and is not blocking in this case).
>
> I followed all possible paths in the clock subsystem and identified 3
> main categories. The list of clocks we need to runtime resume in advance
> can either be:
> 1- the parent clocks
> 2- the child clocks
> 3- the parent and child clocks
> 4- all the clocks (typically for debugfs/sysfs purposes).
>
> [Solution 1: discarded]
>
> The first approach to do that was do to some guessing based on the clock
> tree topology. Unfortunately this approach does not stand because it is
> virtually unbounded. In order to know the clock topology we must acquire
> the clock main lock. In order to runtime resume we must release it. As a
> result, this logic is virtually unbounded (even though in practice we
> would converge at some point). So this approach was discarded by Steven.
>
> [Solution 2: this proposal]
>
> After the LPC discussion with Steven, I also discussed with Saravana
> about this and he pointed that since we were using fw_devlink=rpm by
> default now, all providers -including clock controllers of course- would
> already be runtime resumed the first time we would make a
> runtime_resume(clk), and thus all the nested calls were no longer
> needed. This native solution was already addressing point #1 above (and
> partially point #3) and all I had to do was to make a similar function
> for point #2.

So this depends on DT being used and fw_devlink=rpm being used, doesn't it?

You cannot really assume in general that there will be device links
between parents and children.