linux-kernel - Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems

Message-ID: <aS9f-K_MN0uYUCYx@google.com>
Date: Tue, 2 Dec 2025 13:54:00 -0800
From: Brian Norris <briannorris@...omium.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: René Rebe <rene@...ctco.de>,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
	Riccardo Mottola <riccardo.mottola@...ero.it>,
	Manivannan Sadhasivam <mani@...nel.org>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Lukas Wunner <lukas@...ner.de>,
	Mario Limonciello <mario.limonciello@....com>
Subject: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC
 systems

On Tue, Dec 02, 2025 at 11:28:37AM -0600, Bjorn Helgaas wrote:
> I think we need some kind of analysis of what is happening to the PCI
> devices here.  I don't know why the CPU architecture per se would be
> related to PCI power management.

Agreed, and I think it will be very hard to ever gain any traction on
modernizing the PM stack here if we can't get any sort of "why?" answer
out of the systems that don't work. The last time this came up, the
answer was essentially:

https://lore.kernel.org/all/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/

  The DMI check at the end of pci_bridge_d3_possible() is really
  something to the effect of "there is no particular reason to prevent
  this bridge from going into D3, but try to avoid platforms where it
  may not work".

i.e., no specific reason, but a vague understanding that there is some
old HW that doesn't work. That's not very helpful for supporting non-DMI
systems that don't have a programmatic notion of "old."

OTOH, I sympathize with René: it's hard to dig into what amounts to
new development on old platforms, and yet they do remain broken.

> pci_bridge_d3_possible() is already a barely maintainable hodge podge
> of random things that work and don't work.  Generally speaking most of
> those cases relate to firmware.  

I wonder if we could take a different approach that helps straddle the
uncertain boundary here a bit:

 1) be more aggressive at *permitting* runtime PM / D3 for bridges
 (i.e., if we think a bridge might be OK to go to D3, then manage its
 get()/put() properly); and

 2) be less aggressive about default-enabling runtime suspend / D3
 (i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in
 limited circumstances).
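
A rough userspace model may help show why #1 and #2 can be separated
safely. This is not kernel code. The model_* names are hypothetical, and
it only loosely follows the kernel's usage counter and runtime_auto flag.
It assumes that the PCI core calls pm_runtime_forbid() at enumeration and
holds a usage reference across probe. Under that assumption, a port driver
that drops its probe reference (#1) still cannot reach D3 until something
"allows" runtime PM (#2).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of a PCI port's runtime PM state.
 * Omits children, autosuspend, error handling, and everything else.
 */
struct model_port {
	int usage_count;   /* references that keep the port active */
	bool runtime_auto; /* false while runtime PM is "forbidden" */
};

/* Initial state in this model: no references, runtime_auto set. */
static void model_init(struct model_port *p)
{
	p->usage_count = 0;
	p->runtime_auto = true;
}

/* Analogue of pm_runtime_forbid(): clear runtime_auto, take a reference. */
static void model_forbid(struct model_port *p)
{
	if (p->runtime_auto) {
		p->runtime_auto = false;
		p->usage_count++;
	}
}

/* Analogue of pm_runtime_allow(), or writing "auto" to power/control. */
static void model_allow(struct model_port *p)
{
	if (!p->runtime_auto) {
		p->runtime_auto = true;
		assert(p->usage_count > 0);
		p->usage_count--;
	}
}

/* Analogues of the get()/put() pair mentioned in #1. */
static void model_get(struct model_port *p)
{
	p->usage_count++;
}

static void model_put(struct model_port *p)
{
	assert(p->usage_count > 0);
	p->usage_count--;
}

/* In this model the port may runtime-suspend only with no references. */
static bool model_may_suspend(const struct model_port *p)
{
	return p->usage_count == 0;
}

/*
 * Enumeration plus probe under the proposal: the core forbids runtime PM
 * and holds a reference across probe; if D3 is permitted, the driver
 * releases its probe reference (#1) but does not call allow (#2).
 */
static void model_enumerate_and_probe(struct model_port *p, bool d3_possible)
{
	model_init(p);
	model_forbid(p);	/* PCI core, at enumeration */
	model_get(p);		/* PCI core, around probe */
	if (d3_possible)
		model_put(p);	/* #1: driver drops its reference */
}
```

With d3_possible set, model_may_suspend() stays false after probe and
becomes true only after model_allow(). With d3_possible clear, the port
stays active even after model_allow(), because the probe reference is
still held.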

For #2, that would actually match the documentation:

  Documentation/power/pci.rst

  The driver itself should not call pm_runtime_allow(), though.  Instead, it
  should let user space or some platform-specific code do that (user space can
  do it via sysfs as stated above), but it must be prepared to handle the
  runtime PM of the device correctly as soon as pm_runtime_allow() is called
  (which may happen at any time, even before the driver is loaded).

So instead of portdrv.c calling pm_runtime_allow(), we'd leave that
decision to user space (i.e., udev or similar). That will help limit the
impact of getting #1 "wrong." And it's possible the bad systems didn't
really want aggressive PM anyway, so leaving it off by default costs
them little.

For #1, that means pci_bridge_d3_possible() would become more like
pci_bridge_d3_impossible(). We could leave it as-is, or at least ensure
it fails toward the "possible" side.
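
To make "fails toward possible" concrete, here is a toy contrast between
two ways a check like this can default. The struct and its fields are
hypothetical stand-ins. The real pci_bridge_d3_possible() looks at many
more inputs, such as the DMI-based check quoted above. The sketch only
shows how an allowlist-style default and a denylist-style
("d3_impossible") default treat a system that nobody has characterized.

```c
#include <stdbool.h>

/* Hypothetical facts a D3 check might know about a bridge. */
struct model_bridge_facts {
	bool on_denylist;      /* explicit quirk: known to break in D3 */
	bool platform_vouches; /* something positively says D3 is fine */
};

/* Allowlist style: D3 only when something vouches for it. */
static bool model_d3_possible_allowlist(const struct model_bridge_facts *f)
{
	return !f->on_denylist && f->platform_vouches;
}

/* Denylist style: D3 permitted unless there is evidence it is broken. */
static bool model_d3_possible_denylist(const struct model_bridge_facts *f)
{
	return !f->on_denylist;
}
```

An old, undocumented system with neither flag set is rejected by the
first function and permitted by the second. The two functions differ only
for such "unknown" systems, which is exactly the case the proposal wants
to tip toward "possible".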

IOW, user space can choose to opt in by way of:

  echo auto > /sys/bus/pci/devices/[port device]/power/control
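
A tool or daemon can do the same opt-in from C without shelling out. This
is a minimal sketch that assumes the sysfs layout shown above.
set_power_control() is a hypothetical helper, and the device directory is
left to the caller, as in the placeholder path.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write a runtime PM policy ("auto" or "on") to <dev_dir>/power/control,
 * where dev_dir is a device directory such as the placeholder
 * /sys/bus/pci/devices/[port device]. Returns 0 on success, -1 on error.
 * Writing to real sysfs normally requires root.
 */
static int set_power_control(const char *dev_dir, const char *mode)
{
	char path[4096];
	FILE *f;
	int n;

	/* "auto" opts in to runtime PM; "on" keeps the device powered. */
	if (strcmp(mode, "auto") != 0 && strcmp(mode, "on") != 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/power/control", dev_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(mode, f) == EOF) {
		fclose(f);
		return -1;
	}
	/*
	 * stdio buffers the data, so the underlying write(2), where a sysfs
	 * error would surface, may only happen here; check fclose().
	 */
	return fclose(f) == 0 ? 0 : -1;
}
```

Writing "on" instead of "auto" is how user space would opt a port back
out.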

That might require some new udev rules if existing x86 systems are
supposed to retain their old behavior.

Personally, I care more about #1 (that the kernel manages pm_runtime_*()
refcounts properly, so that my systems *can* opt into aggressive PM),
and less about #2 (it's a fact of life that PM policy often requires
careful udev / sysfs management, and that the defaults will not
necessarily give the best power savings).

This might leave some old unmaintained systems as "D3 possible", but we
don't actually exercise it if user space doesn't poke
/sys/bus/pci/devices/[port device]/power/control.

Brian