[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aPp1UWdc2rmafZr-@google.com>
Date: Thu, 23 Oct 2025 11:34:57 -0700
From: Brian Norris <briannorris@...omium.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
linux-pm@...r.kernel.org, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, Lukas Wunner <lukas@...ner.de>
Subject: Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are
fully initialized
On Thu, Oct 23, 2025 at 11:41:38AM +0200, Rafael J. Wysocki wrote:
> On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@...omium.org> wrote:
> >
> > Today, it's possible for a PCI device to be created and
> > runtime-suspended before it is fully initialized. When that happens, the
> > device will remain in D0, but the suspend process may save an
> > intermediate version of that device's state -- for example, without
> > appropriate BAR configuration. When the device later resumes, we'll
> > restore invalid PCI state and the device may not function.
> >
> > Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > until we've fully initialized the device.
> >
> > More details on how exactly this may occur:
> >
> > 1. PCI device is created by pci_scan_slot() or similar
> > 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
> > device starts "active" and we initially prevent (pm_runtime_forbid())
> > suspend -- but see [*] footnote
> > 3. Underlying 'struct device' is added to the system (device_add());
> > runtime PM can now be configured by user space
> > 4. PCI device receives BAR configuration
> > (pci_assign_unassigned_bus_resources(), etc.)
> > 5. PCI device is added to the system in pci_bus_add_device()
> >
> > The device may potentially suspend between #3 and #4.
> >
> > [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> > design [**], this can be overridden by user space policy via
> >
> > echo auto > /sys/bus/pci/devices/.../power/control
> >
> > Thus, the above #3/#4 sequence is racy with user space (udev or
> > similar).
> >
> > Notably, many PCI devices are enumerated at subsys_initcall time and so
> > will not race with user space. However, there are several scenarios
> > where PCI devices are created later on, such as with hotplug or when
> > drivers (pwrctrl or controller drivers) are built as modules.
> >
> > [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> > /sys/.../power/control, and the runtime PM usage counter can be subtle.
> > It appears that the intention of pm_runtime_forbid() /
> > pm_runtime_allow() is twofold:
> >
> > 1. Allow the user to disable runtime_pm (force device to always be
> > powered on) through sysfs.
> > 2. Allow the driver to start with runtime_pm disabled (device forced
> > on) and user space could later enable runtime_pm.
> >
> > This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> > specifically the section starting "The user space can effectively
> > disallow".
> >
> > This means that while pm_runtime_forbid() does technically increase the
> > runtime PM usage counter, this usage counter is not a guarantee of
> > functional correctness, because sysfs can decrease that count again.
> >
> > Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > Signed-off-by: Brian Norris <briannorris@...omium.org>
> > Cc: <stable@...r.kernel.org>
> > Reviewed-by: Rafael J. Wysocki (Intel) <rafael@...nel.org>
> > ---
> >
> > Changes in v3:
> > * Add Link to initial discussion
> > * Add Rafael's Reviewed-by
> > * Add lengthier footnotes about forbid vs allow vs sysfs
> >
> > Changes in v2:
> > * Update CC list
> > * Rework problem description
> > * Update solution: defer pm_runtime_enable(), instead of trying to
> > get()/put()
> >
> > drivers/pci/bus.c | 3 +++
> > drivers/pci/pci.c | 1 -
> > 2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index f26aec6ff588..fc66b6cb3a54 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -14,6 +14,7 @@
> > #include <linux/of.h>
> > #include <linux/of_platform.h>
> > #include <linux/platform_device.h>
> > +#include <linux/pm_runtime.h>
> > #include <linux/proc_fs.h>
> > #include <linux/slab.h>
> >
> > @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> > put_device(&pdev->dev);
> > }
> >
> > + pm_runtime_enable(&dev->dev);
> > +
> > if (!dn || of_device_is_available(dn))
> > pci_dev_allow_binding(dev);
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b14dd064006c..f792164fa297 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
> > pci_pm_power_up_and_verify_state(dev);
> > pm_runtime_forbid(&dev->dev);
> > pm_runtime_set_active(&dev->dev);
>
> Actually, I think that the two statements above can be moved too.
>
> The pm_runtime_forbid() call doesn't matter until runtime PM is
> enabled and it is better to do pm_runtime_set_active() right before
> enabling runtime PM.
Ehh, sure, I can do that I suppose. That signs me up for fixing some
excessive Documentation/ that spells out exactly what goes on in
pci_pm_init() for some reason, but I can do that too :)
Brian
> > - pm_runtime_enable(&dev->dev);
> > }
> >
> > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> > --
Powered by blists - more mailing lists