lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67381f3b-4aee-a314-b5dd-2b7d987a7794@linux.intel.com>
Date: Fri, 17 Oct 2025 14:49:35 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Lukas Wunner <lukas@...ner.de>
cc: Brian Norris <briannorris@...omium.org>, 
    Bjorn Helgaas <bhelgaas@...gle.com>, LKML <linux-kernel@...r.kernel.org>, 
    linux-pm@...r.kernel.org, "Rafael J . Wysocki" <rafael@...nel.org>, 
    linux-pci@...r.kernel.org
Subject: Re: [PATCH] PCI/PM: Prevent runtime suspend before devices are fully
 initialized

On Fri, 17 Oct 2025, Lukas Wunner wrote:

> [cc += Ilpo]
> 
> On Thu, Oct 16, 2025 at 03:53:35PM -0700, Brian Norris wrote:
> > PCI devices are created via pci_scan_slot() and similar, and are
> > promptly configured for runtime PM (pci_pm_init()). They are initially
> > prevented from suspending by way of pm_runtime_forbid(); however, it's
> > expected that user space may override this via sysfs [1].

Is this true as pm_runtime_forbid() also increases PM usage count?

"void pm_runtime_forbid(struct device *dev);

unset the power.runtime_auto flag for the device and increase its 
usage counter (used by the /sys/devices/.../power/control interface to 
effectively prevent the device from being power managed at run time)"

> > Now, sometime after initial scan, a PCI device receives its BAR
> > configuration (pci_assign_unassigned_bus_resources(), etc.).
> > 
> > If a PCI device is allowed to suspend between pci_scan_slot() and
> > pci_assign_unassigned_bus_resources(), then pci-driver.c will
> > save/restore incorrect BAR configuration for the device, and the device
> > may cease to function.
> > 
> > This behavior races with user space, since user space may enable runtime
> > PM [1] as soon as it sees the device, which may be before BAR
> > configuration.
> > 
> > Prevent suspending in this intermediate state by holding a runtime PM
> > reference until the device is fully initialized and ready for probe().
>
> Not sure if that is comprehensible by everybody.  The point is that
> unbound devices are left in D0 but are nevertheless allowed to
> (logically) runtime suspend.  And pci_pm_runtime_suspend() may call
> pci_save_state() while config space isn't fully initialized yet,
> or pci_pm_runtime_resume() may call pci_restore_state() (via
> pci_pm_default_resume_early()) and overwrite initialized config space
> with uninitialized data.
> 
> Have you actually seen this happen in practice?  Normally enumeration
> happens during subsys_initcall time, when user space isn't running yet.
> Hotplug may be an exception though.

Adding that pm_runtime_get_noresume() doesn't look useful given 
pm_runtime_forbid() already increases PM usage count. If this problem is 
actually seen in practice, there could a bug elsewhere where something 
decrements usage count too early so this change "helps" by double 
incrementing the usage count.

To find more information what's going on, one could try to trace events 
for the PM usage count (though last time I looked not all paths that 
change PM usage count were covered by the event and adding the 
trace_event() calls into the header turned out too much magic for me to 
figure out so I couldn't solve the problem).

> Patch LGTM in principle, but adding Ilpo to cc who is refactoring PCI
> resource allocation and may judge whether this can actually happen.

I can see there could be other failure modes besides just saving wrong 
config if devices get suspended underneath the resource assignment 
algorithm.

Besides hotplug, also BAR resize does changes the resources and BARs.
This case is not helped by this patch.

I also recently learned some DT platforms do the "actual" scan for the bus 
later on Link Up event through irq which could perhaps occur late enough, 
I dunno for sure.

> I think the code comments you're adding are a little verbose and a simple
> /* acquired in pci_pm_init() */ in pci_bus_add_device() may be sufficient.

I'm also not entirely convinced these always pair or if the pci_dev may 
get removed before pci_bus_add_device(), see e.g., enable_slot() in 
hotplug/acpiphp_glue.c that does acpiphp_sanitize_bus() before 
pci_bus_add_devices() (which could have other bugs too as is).

> Also, I think it is neither necessary nor useful to actually cc the e-mail
> to stable@...r.kernel.org if you include a stable designation in the
> patch.  I believe stable maintainers only pick up backports from that list,
> not patches intended for upstream.

Probably some tool will auto-insert the Cc: tags as receipients so it 
might be non-trivial to get rid of it.

-- 
 i.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ