[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0iBNOmMtqfqEbrYyuK2u+2J2+zZ-iQd1FvyCPjdvU2TJg@mail.gmail.com>
Date: Thu, 22 Jan 2026 19:17:21 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Brian Norris <briannorris@...omium.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, Lukas Wunner <lukas@...ner.de>, linux-pm@...r.kernel.org,
"Rafael J . Wysocki" <rafael@...nel.org>, Marek Szyprowski <m.szyprowski@...sung.com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH v5] PCI/PM: Prevent runtime suspend until devices are
fully initialized
On Thu, Jan 22, 2026 at 6:49 PM Brian Norris <briannorris@...omium.org> wrote:
>
> Previously, it was possible for a PCI device to be runtime-suspended before
> it was fully initialized. When that happened, the suspend process could
> save invalid device state, for example, before BAR assignment. Restoring
> the invalid state during resume may leave the device non-functional.
>
> Prevent runtime suspend for PCI devices until they are fully initialized by
> deferring pm_runtime_enable().
>
> More details on how exactly this may occur:
>
> 1. PCI device is created by pci_scan_slot() or similar
>
> 2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
> prevents runtime suspend prevented via pm_runtime_forbid()
>
> 3. pci_device_add() adds the underlying 'struct device' via device_add(),
> which means user space can allow runtime suspend, e.g.,
>
> echo auto > /sys/bus/pci/devices/.../power/control
>
> 4. PCI device receives BAR configuration
> (pci_assign_unassigned_bus_resources(), etc.)
>
> 5. pci_bus_add_device() applies final fixups, saves device state, and
> tries to attach a driver
>
> The device may potentially be suspended between #3 and #5, so this is racy
> with user space (udev or similar).
>
> Many PCI devices are enumerated at subsys_initcall time and so will not
> race with user space, but devices created later by hotplug or modular
> pwrctrl or host controller drivers are susceptible to this race.
>
> More runtime PM details at the first Link: below.
>
> Signed-off-by: Brian Norris <briannorris@...omium.org>
> Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>
> Cc: stable@...r.kernel.org
> Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> Link: https://lore.kernel.org/all/0e35a4e1-894a-47c1-9528-fc5ffbafd9e2@samsung.com/
>
> Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> Cc: <stable@...r.kernel.org>
> ---
>
> Changes in v5:
> * Put pm_runtime_set_active() back where it was, to ensure our parent
> can't suspend before we're really ready. (See bug report in 2nd
> "Link:")
> * Add comments
> * Update commit description with Bjorn's rewrite
> * Add Marek's Tested-by
>
> Changes in v4:
> * Move pm_runtime_set_active() too
>
> Changes in v3:
> * Add Link to initial discussion
> * Add Rafael's Reviewed-by
> * Add lengthier footnotes about forbid vs allow vs sysfs
>
> Changes in v2:
> * Update CC list
> * Rework problem description
> * Update solution: defer pm_runtime_enable(), instead of trying to
> get()/put()
>
> drivers/pci/bus.c | 7 +++++++
> drivers/pci/pci.c | 5 ++++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4383a36fd6ca..90954f81962b 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -15,6 +15,7 @@
> #include <linux/of.h>
> #include <linux/of_platform.h>
> #include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> #include <linux/proc_fs.h>
> #include <linux/slab.h>
>
> @@ -379,6 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
> put_device(&pdev->dev);
> }
>
> + /*
> + * Enable runtime PM (and potentially suspend) only after we've fully
> + * configured the PCI state.
> + */
I would make it a bit more precise, something like "Enable runtime PM,
which potentially allows the device to suspend immediately, only after
the PCI state has been configured completely."
Also, it is not particularly what "we" means in kernel code comments,
so I generally avoid phrasing them this way.
> + pm_runtime_enable(&dev->dev);
> +
> if (!dn || of_device_is_available(dn))
> pci_dev_allow_binding(dev);
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..07b0d029aa51 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3196,8 +3196,11 @@ void pci_pm_init(struct pci_dev *dev)
> poweron:
> pci_pm_power_up_and_verify_state(dev);
> pm_runtime_forbid(&dev->dev);
> + /*
> + * Mark ourselves active now, to prevent our parent from suspending
> + * while we finish configuring the PCI device.
> + */
I would rephrase the comment this way:
"Runtime PM will be enabled for the device when it has been fully
configured, but since its parent and suppliers may suspend in the
meantime, prevent them from doing so by changing the device's runtime
PM status to "active"."
> pm_runtime_set_active(&dev->dev);
> - pm_runtime_enable(&dev->dev);
> }
>
> static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> --
Powered by blists - more mailing lists