[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0i1O+ox2ewjm+op95=tWK1YK0fV5e_NPrRVknn7qaCKyA@mail.gmail.com>
Date: Thu, 23 Oct 2025 20:49:39 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Brian Norris <briannorris@...omium.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
linux-pm@...r.kernel.org, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, Lukas Wunner <lukas@...ner.de>
Subject: Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are
fully initialized
On Thu, Oct 23, 2025 at 8:42 PM Brian Norris <briannorris@...omium.org> wrote:
>
> Hi again,
>
> On Thu, Oct 23, 2025 at 11:34:57AM -0700, Brian Norris wrote:
> > On Thu, Oct 23, 2025 at 11:41:38AM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@...omium.org> wrote:
> > > >
> > > > Today, it's possible for a PCI device to be created and
> > > > runtime-suspended before it is fully initialized. When that happens, the
> > > > device will remain in D0, but the suspend process may save an
> > > > intermediate version of that device's state -- for example, without
> > > > appropriate BAR configuration. When the device later resumes, we'll
> > > > restore invalid PCI state and the device may not function.
> > > >
> > > > Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > > > until we've fully initialized the device.
> > > >
> > > > More details on how exactly this may occur:
> > > >
> > > > 1. PCI device is created by pci_scan_slot() or similar
> > > > 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
> > > > device starts "active" and we initially prevent (pm_runtime_forbid())
> > > > suspend -- but see [*] footnote
> > > > 3. Underlying 'struct device' is added to the system (device_add());
> > > > runtime PM can now be configured by user space
> > > > 4. PCI device receives BAR configuration
> > > > (pci_assign_unassigned_bus_resources(), etc.)
> > > > 5. PCI device is added to the system in pci_bus_add_device()
> > > >
> > > > The device may potentially suspend between #3 and #4.
> > > >
> > > > [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> > > > design [**], this can be overridden by user space policy via
> > > >
> > > > echo auto > /sys/bus/pci/devices/.../power/control
> > > >
> > > > Thus, the above #3/#4 sequence is racy with user space (udev or
> > > > similar).
> > > >
> > > > Notably, many PCI devices are enumerated at subsys_initcall time and so
> > > > will not race with user space. However, there are several scenarios
> > > > where PCI devices are created later on, such as with hotplug or when
> > > > drivers (pwrctrl or controller drivers) are built as modules.
> > > >
> > > > [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> > > > /sys/.../power/control, and the runtime PM usage counter can be subtle.
> > > > It appears that the intention of pm_runtime_forbid() /
> > > > pm_runtime_allow() is twofold:
> > > >
> > > > 1. Allow the user to disable runtime_pm (force device to always be
> > > > powered on) through sysfs.
> > > > 2. Allow the driver to start with runtime_pm disabled (device forced
> > > > on) and user space could later enable runtime_pm.
> > > >
> > > > This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> > > > specifically the section starting "The user space can effectively
> > > > disallow".
> > > >
> > > > This means that while pm_runtime_forbid() does technically increase the
> > > > runtime PM usage counter, this usage counter is not a guarantee of
> > > > functional correctness, because sysfs can decrease that count again.
> > > >
> > > > Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > > > Signed-off-by: Brian Norris <briannorris@...omium.org>
> > > > Cc: <stable@...r.kernel.org>
> > > > Reviewed-by: Rafael J. Wysocki (Intel) <rafael@...nel.org>
> > > > ---
> > > >
> > > > Changes in v3:
> > > > * Add Link to initial discussion
> > > > * Add Rafael's Reviewed-by
> > > > * Add lengthier footnotes about forbid vs allow vs sysfs
> > > >
> > > > Changes in v2:
> > > > * Update CC list
> > > > * Rework problem description
> > > > * Update solution: defer pm_runtime_enable(), instead of trying to
> > > > get()/put()
> > > >
> > > > drivers/pci/bus.c | 3 +++
> > > > drivers/pci/pci.c | 1 -
> > > > 2 files changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > > index f26aec6ff588..fc66b6cb3a54 100644
> > > > --- a/drivers/pci/bus.c
> > > > +++ b/drivers/pci/bus.c
> > > > @@ -14,6 +14,7 @@
> > > > #include <linux/of.h>
> > > > #include <linux/of_platform.h>
> > > > #include <linux/platform_device.h>
> > > > +#include <linux/pm_runtime.h>
> > > > #include <linux/proc_fs.h>
> > > > #include <linux/slab.h>
> > > >
> > > > @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> > > > put_device(&pdev->dev);
> > > > }
> > > >
> > > > + pm_runtime_enable(&dev->dev);
> > > > +
> > > > if (!dn || of_device_is_available(dn))
> > > > pci_dev_allow_binding(dev);
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index b14dd064006c..f792164fa297 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
> > > > pci_pm_power_up_and_verify_state(dev);
> > > > pm_runtime_forbid(&dev->dev);
> > > > pm_runtime_set_active(&dev->dev);
> > >
> > > Actually, I think that the two statements above can be moved too.
> > >
> > > The pm_runtime_forbid() call doesn't matter until runtime PM is
> > > enabled and it is better to do pm_runtime_set_active() right before
> > > enabling runtime PM.
> >
> > Ehh, sure, I can do that I suppose. That signs me up for fixing some
> > excessive Documentation/ that spells out exactly what goes on in
> > pci_pm_init() for some reason, but I can do that too :)
>
> Hmm, never mind, I think I agreed with you too soon.
>
> It should be ok to move pm_runtime_set_active(), but I believe moving
> pm_runtime_forbid() would introduce a similar race condition problem to
> the bug I'm resolving -- pci_bus_add_device() is run after device_add(),
> and so user space might already see the device and try to configure
> power/control. If we forbid() after that point, user space would be
> confused.
>
> Documentation/power/runtime_pm.rst even calls this out:
>
> "It should be noted, however, that if the user space has already
> intentionally changed the value of /sys/devices/.../power/control to
> "auto" to allow the driver to power manage the device at run time, the
> driver may confuse it by using pm_runtime_forbid() this way."
>
> So I'd rather not make this change.
OK, fair enough, so leave the pm_runtime_forbid() as is, but
pm_runtime_set_active() should really be done right prior to enabling.
Powered by blists - more mailing lists