[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <06cd0121-819d-652d-afa7-eece15bf82a2@linux.intel.com>
Date: Tue, 21 Oct 2025 14:27:09 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Brian Norris <briannorris@...omium.org>
cc: Lukas Wunner <lukas@...ner.de>, Bjorn Helgaas <bhelgaas@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, linux-pm@...r.kernel.org,
"Rafael J . Wysocki" <rafael@...nel.org>, linux-pci@...r.kernel.org
Subject: Re: [PATCH] PCI/PM: Prevent runtime suspend before devices are fully
initialized
On Mon, 20 Oct 2025, Brian Norris wrote:
> Hi Ilpo,
>
> On Mon, Oct 20, 2025 at 06:56:41PM +0300, Ilpo Järvinen wrote:
> > On Fri, 17 Oct 2025, Brian Norris wrote:
> >
> > > On Fri, Oct 17, 2025 at 02:49:35PM +0300, Ilpo Järvinen wrote:
> > > > On Fri, 17 Oct 2025, Lukas Wunner wrote:
> > > >
> > > > > [cc += Ilpo]
> > > > >
> > > > > On Thu, Oct 16, 2025 at 03:53:35PM -0700, Brian Norris wrote:
> > > > > > PCI devices are created via pci_scan_slot() and similar, and are
> > > > > > promptly configured for runtime PM (pci_pm_init()). They are initially
> > > > > > prevented from suspending by way of pm_runtime_forbid(); however, it's
> > > > > > expected that user space may override this via sysfs [1].
> > > >
> > > > Is this true as pm_runtime_forbid() also increases PM usage count?
> > >
> > > Yes it's true. See below.
> > >
> > > > "void pm_runtime_forbid(struct device *dev);
> > > >
> > > > unset the power.runtime_auto flag for the device and increase its
> > > > usage counter (used by the /sys/devices/.../power/control interface to
> > > > effectively prevent the device from being power managed at run time)"
>
> I see this doc line confused you, and I can sympathize.
>
> IIUC, the parenthetical means that sysfs *uses* pm_runtime_forbid() to
> "effectively prevent runtime power management"; pm_runtime_forbid() does
> not block user space from doing anything.
>
> > > Right, but sysfs `echo auto > .../power/control` performs the inverse --
> > > pm_runtime_allow() -- which decrements that count.
> >
> > Fair enough, I didn't check what it does.
> >
> > IMO, the details about how the usage count behaves should be part of the
> > changelog as that documentation I quoted sounded like user control is
> > prevented when forbidden.
>
> I tried to elaborate on the API doc confusion above. But frankly, I'm
> not sure how best to explain runtime PM.
>
> > I see you've put this part of the explanation
> > into the v2 as well so I suggest you explain the usage count in the change
> > so it is recorded in the commit if somebody has to look at this commit
> > years from now.
>
> Both v1 and v2 mention that the sysfs 'power/control' file can override
> the kernel calling pm_runtime_forbid(). They don't mention the usage
> count, since that's an implementation detail IMO. (To me, the mental
> model works best if "usage count" (usually get()/put()) is considered
> mostly orthogonal to forbid()/allow()/sysfs, because "forbid()" can be
> overridden at any time.)
>
> This is also covered here:
>
> https://docs.kernel.org/power/runtime_pm.html#runtime-pm-initialization-device-probing-and-removal
>
> "In principle, this mechanism may also be used by the driver to
> effectively turn off the runtime power management of the device until
> the user space turns it on."
The problem is already rooted into the function name, when a function is
called "forbid", anyone unfamiliar will think it really forbids
something. The docs just further reinforced the idea and the fact that it
also increments usage count.
It is quite unexpected and feels quite illogical (for non-PM person like
me) that user interface then goes to reverse that usage count increase,
what would be the logical reason why there now are less users for it when
user wants to turn on PM? (I understand things are done that way, no need
to explain that further, but there are quite a few misleading things in
this entire scenario, not just that parenthesis part of the docs.)
> But admittedly, I find the runtime PM API surface to be enormous, and
> its documentation ... not very helpful to outsiders. It's pretty much
> written by and for PM experts. Case in point: you read and quoted the
> appropriate docs, but it still misled you quite a bit :(
>
> I'm more tempted to try to improve the runtime PM docs than to try to
> make up for them in the commit message, but maybe I can be convinced
> otherwise.
My personal approach is that if something comes up during review, it
likely is something also the next person to look at this change (from
history, maybe years from now) could similarly stumbles on when trying to
understand the change. Thus, it often is worth to mention such things in
the changelog too.
While I'm definitely not against improvements to docs too, the changelog
for any patch should help to understand why the change was made. And
IMO, this unexpected "internal detail" related to usage count which is
quite significant here, if user interface wouldn't lower it, runtime PM
would remain forbidden as forbid() promised.
> > > > > Patch LGTM in principle, but adding Ilpo to cc who is refactoring PCI
> > > > > resource allocation and may judge whether this can actually happen.
> > > >
> > > > I can see there could be other failure modes besides just saving wrong
> > > > config if devices get suspended underneath the resource assignment
> > > > algorithm.
> > > >
> > > > Besides hotplug, also BAR resize does changes the resources and BARs.
> > > > This case is not helped by this patch.
> > >
> > > Is that the 'resource_N_resize' sysfs attributes? Becuase that holds PM
> > > references (pci_config_pm_runtime_{get,put}()) and therefore should not
> > > generally have the same problem.
> >
> > Okay, seem fine for the PCI core part.
> >
> > Driver's can also trigger BAR resize by calling pci_resize_resource()
> > directly but I've no idea how the usage counts behave (TBH, PM isn't my
> > strongest forte even if Lukas pulled me in to comment).
>
> There are only 3 drivers that call pci_resize_resource(). I looked into
> them, and it looks like they all hold pm_runtime_* references while
> doing this, or are calling it during pci_driver probe(), which docs say
> will hold a reference.
>
> https://docs.kernel.org/power/pci.html#device-runtime-power-management
>
> So they should all be OK too.
Thanks for checking.
--
i.
Powered by blists - more mailing lists