[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aPfNY9Yig2cR-Fua@google.com>
Date: Tue, 21 Oct 2025 11:13:55 -0700
From: Brian Norris <briannorris@...omium.org>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Lukas Wunner <lukas@...ner.de>, Bjorn Helgaas <bhelgaas@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, linux-pm@...r.kernel.org,
"Rafael J . Wysocki" <rafael@...nel.org>, linux-pci@...r.kernel.org
Subject: Re: [PATCH] PCI/PM: Prevent runtime suspend before devices are fully
initialized
Hi Ilpo,
On Tue, Oct 21, 2025 at 02:27:09PM +0300, Ilpo Järvinen wrote:
> On Mon, 20 Oct 2025, Brian Norris wrote:
> > On Mon, Oct 20, 2025 at 06:56:41PM +0300, Ilpo Järvinen wrote:
> > > On Fri, 17 Oct 2025, Brian Norris wrote:
> > > > On Fri, Oct 17, 2025 at 02:49:35PM +0300, Ilpo Järvinen wrote:
> > > > > On Fri, 17 Oct 2025, Lukas Wunner wrote:
> > > > > > On Thu, Oct 16, 2025 at 03:53:35PM -0700, Brian Norris wrote:
> > > > > > > PCI devices are created via pci_scan_slot() and similar, and are
> > > > > > > promptly configured for runtime PM (pci_pm_init()). They are initially
> > > > > > > prevented from suspending by way of pm_runtime_forbid(); however, it's
> > > > > > > expected that user space may override this via sysfs [1].
> > > > >
> > > > > Is this true as pm_runtime_forbid() also increases PM usage count?
> > > >
> > > > Yes it's true. See below.
> > > >
> > > > > "void pm_runtime_forbid(struct device *dev);
> > > > >
> > > > > unset the power.runtime_auto flag for the device and increase its
> > > > > usage counter (used by the /sys/devices/.../power/control interface to
> > > > > effectively prevent the device from being power managed at run time)"
> >
> > I see this doc line confused you, and I can sympathize.
> >
> > IIUC, the parenthetical means that sysfs *uses* pm_runtime_forbid() to
> > "effectively prevent runtime power management"; pm_runtime_forbid() does
> > not block user space from doing anything.
> >
> > > > Right, but sysfs `echo auto > .../power/control` performs the inverse --
> > > > pm_runtime_allow() -- which decrements that count.
> > >
> > > Fair enough, I didn't check what it does.
> > >
> > > IMO, the details about how the usage count behaves should be part of the
> > > changelog as that documentation I quoted sounded like user control is
> > > prevented when forbidden.
> >
> > I tried to elaborate on the API doc confusion above. But frankly, I'm
> > not sure how best to explain runtime PM.
> >
> > > I see you've put this part of the explanation
> > > into the v2 as well so I suggest you explain the usage count in the change
> > > so it is recorded in the commit if somebody has to look at this commit
> > > years from now.
> >
> > Both v1 and v2 mention that the sysfs 'power/control' file can override
> > the kernel calling pm_runtime_forbid(). They don't mention the usage
> > count, since that's an implementation detail IMO. (To me, the mental
> > model works best if "usage count" (usually get()/put()) is considered
> > mostly orthogonal to forbid()/allow()/sysfs, because "forbid()" can be
> > overridden at any time.)
> >
> > This is also covered here:
> >
> > https://docs.kernel.org/power/runtime_pm.html#runtime-pm-initialization-device-probing-and-removal
> >
> > "In principle, this mechanism may also be used by the driver to
> > effectively turn off the runtime power management of the device until
> > the user space turns it on."
>
> The problem is already rooted into the function name, when a function is
> called "forbid", anyone unfamiliar will think it really forbids
> something. The docs just further reinforced the idea and the fact that it
> also increments usage count.
>
> It is quite unexpected and feels quite illogical (for non-PM person like
> me) that user interface then goes to reverse that usage count increase,
> what would be the logical reason why there now are less users for it when
> user wants to turn on PM? (I understand things are done that way, no need
> to explain that further, but there are quite a few misleading things in
> this entire scenario, not just that parenthesis part of the docs.)
Ack. I'm definitely not defending the API names or the docs. I'm
frustrated by the same, by how long it took me to figure out what
everything really means, and by how the same footguns hit everyone I
work with, and we have to write these essays to explain it every time.
> > But admittedly, I find the runtime PM API surface to be enormous, and
> > its documentation ... not very helpful to outsiders. It's pretty much
> > written by and for PM experts. Case in point: you read and quoted the
> > appropriate docs, but it still misled you quite a bit :(
> >
> > I'm more tempted to try to improve the runtime PM docs than to try to
> > make up for them in the commit message, but maybe I can be convinced
> > otherwise.
>
> My personal approach is that if something comes up during review, it
> likely is something also the next person to look at this change (from
> history, maybe years from now) could similarly stumbles on when trying to
> understand the change. Thus, it often is worth to mention such things in
> the changelog too.
Yes, I like that approach too. Unfortunately in this case, it feels like
most of the confusion stems from the existing API and docs, and not from
anything in the description. I never even mentioned the usage count,
because frankly, I don't think it's relevant to the forbid()/allow()
topic. In fact, the whole bug is because forbid() does *not* guarantee
anything from the kernel's point of view -- it doesn't really hold a
"usage count" for functional correctness in the way get()/put() do.
Regardless, I can try to insert some language. Here's an attempt at yet
another footnote for a v3:
"""
[**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
/sys/.../power/control, and the runtime PM usage counter can be subtle.
It appears that the intention of pm_runtime_forbid() /
pm_runtime_allow() is twofold:
1. Allow the user to disable runtime_pm (force device to always be
powered on) through sysfs.
2. Allow the driver to start with runtime_pm disabled (device forced
on) and user space could later enable runtime_pm.
This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
specifically the section starting "The user space can effectively
disallow".
This means that while pm_runtime_forbid() does technically increase the
runtime PM usage counter, this usage counter is not a guarantee of
functional correctness from the kernel's point of view, because user
space can decrease that count again via sysfs.
"""
Let me know what you think. Or feel free to tweak my v2 message and send
a rewrite back to me (seriously! I'd like to see an outsider's view on
what would explain it best).
(And if that footnote is useful: I think part of that could belong in
the actual docs.)
> While I'm definitely not against improvements to docs too, the changelog
> for any patch should help to understand why the change was made. And
> IMO, this unexpected "internal detail" related to usage count which is
> quite significant here, if user interface wouldn't lower it, runtime PM
> would remain forbidden as forbid() promised.
I think improving the docs would reap rewards. Like I said, this is by
far not the first time that developers have had significant problems due
to API confusion.
Brian
Powered by blists - more mailing lists