[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67381f3b-4aee-a314-b5dd-2b7d987a7794@linux.intel.com>
Date: Fri, 17 Oct 2025 14:49:35 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Lukas Wunner <lukas@...ner.de>
cc: Brian Norris <briannorris@...omium.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, LKML <linux-kernel@...r.kernel.org>,
linux-pm@...r.kernel.org, "Rafael J . Wysocki" <rafael@...nel.org>,
linux-pci@...r.kernel.org
Subject: Re: [PATCH] PCI/PM: Prevent runtime suspend before devices are fully
initialized
On Fri, 17 Oct 2025, Lukas Wunner wrote:
> [cc += Ilpo]
>
> On Thu, Oct 16, 2025 at 03:53:35PM -0700, Brian Norris wrote:
> > PCI devices are created via pci_scan_slot() and similar, and are
> > promptly configured for runtime PM (pci_pm_init()). They are initially
> > prevented from suspending by way of pm_runtime_forbid(); however, it's
> > expected that user space may override this via sysfs [1].
Is this true as pm_runtime_forbid() also increases PM usage count?
"void pm_runtime_forbid(struct device *dev);
unset the power.runtime_auto flag for the device and increase its
usage counter (used by the /sys/devices/.../power/control interface to
effectively prevent the device from being power managed at run time)"
> > Now, sometime after initial scan, a PCI device receives its BAR
> > configuration (pci_assign_unassigned_bus_resources(), etc.).
> >
> > If a PCI device is allowed to suspend between pci_scan_slot() and
> > pci_assign_unassigned_bus_resources(), then pci-driver.c will
> > save/restore incorrect BAR configuration for the device, and the device
> > may cease to function.
> >
> > This behavior races with user space, since user space may enable runtime
> > PM [1] as soon as it sees the device, which may be before BAR
> > configuration.
> >
> > Prevent suspending in this intermediate state by holding a runtime PM
> > reference until the device is fully initialized and ready for probe().
>
> Not sure if that is comprehensible by everybody. The point is that
> unbound devices are left in D0 but are nevertheless allowed to
> (logically) runtime suspend. And pci_pm_runtime_suspend() may call
> pci_save_state() while config space isn't fully initialized yet,
> or pci_pm_runtime_resume() may call pci_restore_state() (via
> pci_pm_default_resume_early()) and overwrite initialized config space
> with uninitialized data.
>
> Have you actually seen this happen in practice? Normally enumeration
> happens during subsys_initcall time, when user space isn't running yet.
> Hotplug may be an exception though.
Adding that pm_runtime_get_noresume() doesn't look useful given
pm_runtime_forbid() already increases PM usage count. If this problem is
actually seen in practice, there could a bug elsewhere where something
decrements usage count too early so this change "helps" by double
incrementing the usage count.
To find more information what's going on, one could try to trace events
for the PM usage count (though last time I looked not all paths that
change PM usage count were covered by the event and adding the
trace_event() calls into the header turned out too much magic for me to
figure out so I couldn't solve the problem).
> Patch LGTM in principle, but adding Ilpo to cc who is refactoring PCI
> resource allocation and may judge whether this can actually happen.
I can see there could be other failure modes besides just saving wrong
config if devices get suspended underneath the resource assignment
algorithm.
Besides hotplug, also BAR resize does changes the resources and BARs.
This case is not helped by this patch.
I also recently learned some DT platforms do the "actual" scan for the bus
later on Link Up event through irq which could perhaps occur late enough,
I dunno for sure.
> I think the code comments you're adding are a little verbose and a simple
> /* acquired in pci_pm_init() */ in pci_bus_add_device() may be sufficient.
I'm also not entirely convinced these always pair or if the pci_dev may
get removed before pci_bus_add_device(), see e.g., enable_slot() in
hotplug/acpiphp_glue.c that does acpiphp_sanitize_bus() before
pci_bus_add_devices() (which could have other bugs too as is).
> Also, I think it is neither necessary nor useful to actually cc the e-mail
> to stable@...r.kernel.org if you include a stable designation in the
> patch. I believe stable maintainers only pick up backports from that list,
> not patches intended for upstream.
Probably some tool will auto-insert the Cc: tags as receipients so it
might be non-trivial to get rid of it.
--
i.
Powered by blists - more mailing lists