lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aPp3JLWtqFMjZheY@google.com>
Date: Thu, 23 Oct 2025 11:42:44 -0700
From: Brian Norris <briannorris@...omium.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	linux-pm@...r.kernel.org, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org, Lukas Wunner <lukas@...ner.de>
Subject: Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are
 fully initialized

Hi again,

On Thu, Oct 23, 2025 at 11:34:57AM -0700, Brian Norris wrote:
> On Thu, Oct 23, 2025 at 11:41:38AM +0200, Rafael J. Wysocki wrote:
> > On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@...omium.org> wrote:
> > >
> > > Today, it's possible for a PCI device to be created and
> > > runtime-suspended before it is fully initialized. When that happens, the
> > > device will remain in D0, but the suspend process may save an
> > > intermediate version of that device's state -- for example, without
> > > appropriate BAR configuration. When the device later resumes, we'll
> > > restore invalid PCI state and the device may not function.
> > >
> > > Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > > until we've fully initialized the device.
> > >
> > > More details on how exactly this may occur:
> > >
> > > 1. PCI device is created by pci_scan_slot() or similar
> > > 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
> > >    device starts "active" and we initially prevent (pm_runtime_forbid())
> > >    suspend -- but see [*] footnote
> > > 3. Underlying 'struct device' is added to the system (device_add());
> > >    runtime PM can now be configured by user space
> > > 4. PCI device receives BAR configuration
> > >    (pci_assign_unassigned_bus_resources(), etc.)
> > > 5. PCI device is added to the system in pci_bus_add_device()
> > >
> > > The device may potentially suspend between #3 and #4.
> > >
> > > [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> > > design [**], this can be overridden by user space policy via
> > >
> > >   echo auto > /sys/bus/pci/devices/.../power/control
> > >
> > > Thus, the above #3/#4 sequence is racy with user space (udev or
> > > similar).
> > >
> > > Notably, many PCI devices are enumerated at subsys_initcall time and so
> > > will not race with user space. However, there are several scenarios
> > > where PCI devices are created later on, such as with hotplug or when
> > > drivers (pwrctrl or controller drivers) are built as modules.
> > >
> > > [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> > > /sys/.../power/control, and the runtime PM usage counter can be subtle.
> > > It appears that the intention of pm_runtime_forbid() /
> > > pm_runtime_allow() is twofold:
> > >
> > > 1. Allow the user to disable runtime_pm (force device to always be
> > >    powered on) through sysfs.
> > > 2. Allow the driver to start with runtime_pm disabled (device forced
> > >    on) and user space could later enable runtime_pm.
> > >
> > > This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> > > specifically the section starting "The user space can effectively
> > > disallow".
> > >
> > > This means that while pm_runtime_forbid() does technically increase the
> > > runtime PM usage counter, this usage counter is not a guarantee of
> > > functional correctness, because sysfs can decrease that count again.
> > >
> > > Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > > Signed-off-by: Brian Norris <briannorris@...omium.org>
> > > Cc: <stable@...r.kernel.org>
> > > Reviewed-by: Rafael J. Wysocki (Intel) <rafael@...nel.org>
> > > ---
> > >
> > > Changes in v3:
> > >  * Add Link to initial discussion
> > >  * Add Rafael's Reviewed-by
> > >  * Add lengthier footnotes about forbid vs allow vs sysfs
> > >
> > > Changes in v2:
> > >  * Update CC list
> > >  * Rework problem description
> > >  * Update solution: defer pm_runtime_enable(), instead of trying to
> > >    get()/put()
> > >
> > >  drivers/pci/bus.c | 3 +++
> > >  drivers/pci/pci.c | 1 -
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > index f26aec6ff588..fc66b6cb3a54 100644
> > > --- a/drivers/pci/bus.c
> > > +++ b/drivers/pci/bus.c
> > > @@ -14,6 +14,7 @@
> > >  #include <linux/of.h>
> > >  #include <linux/of_platform.h>
> > >  #include <linux/platform_device.h>
> > > +#include <linux/pm_runtime.h>
> > >  #include <linux/proc_fs.h>
> > >  #include <linux/slab.h>
> > >
> > > @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> > >                 put_device(&pdev->dev);
> > >         }
> > >
> > > +       pm_runtime_enable(&dev->dev);
> > > +
> > >         if (!dn || of_device_is_available(dn))
> > >                 pci_dev_allow_binding(dev);
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index b14dd064006c..f792164fa297 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
> > >         pci_pm_power_up_and_verify_state(dev);
> > >         pm_runtime_forbid(&dev->dev);
> > >         pm_runtime_set_active(&dev->dev);
> > 
> > Actually, I think that the two statements above can be moved too.
> > 
> > The pm_runtime_forbid() call doesn't matter until runtime PM is
> > enabled and it is better to do pm_runtime_set_active() right before
> > enabling runtime PM.
> 
> Ehh, sure, I can do that I suppose. That signs me up for fixing some
> excessive Documentation/ that spells out exactly what goes on in
> pci_pm_init() for some reason, but I can do that too :)

Hmm, never mind, I think I agreed with you too soon.

It should be ok to move pm_runtime_set_active(), but I believe moving
pm_runtime_forbid() would introduce a similar race condition problem to
the bug I'm resolving -- pci_bus_add_device() is run after device_add(),
and so user space might already see the device and try to configure
power/control. If we forbid() after that point, user space would be
confused.

Documentation/power/runtime_pm.rst even calls this out:

  "It should be noted, however, that if the user space has already
  intentionally changed the value of /sys/devices/.../power/control to
  "auto" to allow the driver to power manage the device at run time, the
  driver may confuse it by using pm_runtime_forbid() this way."

So I'd rather not make this change.

Brian

> > > -       pm_runtime_enable(&dev->dev);
> > >  }
> > >
> > >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> > > --
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ