linux-kernel - Re: [PATCH] PCI/PM: Ensure power-up succeeded before restoring MMIO state

Message-ID: <20251023172547.GA1301778@bhelgaas>
Date: Thu, 23 Oct 2025 12:25:47 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Brian Norris <briannorris@...omium.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-kernel@...r.kernel.org,
	linux-pci@...r.kernel.org, Brian Norris <briannorris@...gle.com>,
	stable@...r.kernel.org,
	Mario Limonciello <mario.limonciello@....com>,
	"Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH] PCI/PM: Ensure power-up succeeded before restoring MMIO
 state

[+cc Mario, Rafael]

On Thu, Aug 21, 2025 at 07:58:12AM -0700, Brian Norris wrote:
> From: Brian Norris <briannorris@...gle.com>
> 
> As the comments in pci_pm_thaw_noirq() suggest, pci_restore_state() may
> need to restore MSI-X state in MMIO space. This is only possible if we
> reach D0; if we failed to power up, this might produce a fatal error
> when touching memory space.
> 
> Check for errors (as the "verify" in "pci_pm_power_up_and_verify_state"
> implies), and skip restoring if it fails.
> 
> This mitigates errors seen during resume_noirq, for example, when the
> platform did not resume the link properly.
> 
> Cc: stable@...r.kernel.org
> Signed-off-by: Brian Norris <briannorris@...gle.com>
> Signed-off-by: Brian Norris <briannorris@...omium.org>
> ---
> 
>  drivers/pci/pci-driver.c | 12 +++++++++---
>  drivers/pci/pci.c        | 13 +++++++++++--
>  drivers/pci/pci.h        |  2 +-
>  3 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 302d61783f6c..d66d95bd0ca2 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -557,7 +557,13 @@ static void pci_pm_default_resume(struct pci_dev *pci_dev)
>  
>  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>  {
> -	pci_pm_power_up_and_verify_state(pci_dev);
> +	/*
> +	 * If we failed to reach D0, we'd better not touch MSI-X state in MMIO
> +	 * space.
> +	 */
> +	if (pci_pm_power_up_and_verify_state(pci_dev))
> +		return;

The MSI-X comment here seems oddly specific.

On most platforms, config/mem/io accesses to a device not in D0 result
in an error being logged, writes being dropped, and reads returning ~0
data.
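
As a rough illustration of that behavior, here is a minimal user-space
sketch, not kernel code.  Every name in it (fake_dev, fake_read32(),
fake_write32(), fake_possible_error()) is hypothetical and only models
"writes dropped, reads return ~0" for a single 32-bit register.  It does
not model the error logging, and it assumes the platform tolerates the
access at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a device; none of these are kernel APIs. */
enum fake_power_state { FAKE_D0, FAKE_D3HOT };

struct fake_dev {
	enum fake_power_state state;
	uint32_t reg;	/* one register standing in for device state */
};

/* A read from a device that is not in D0 comes back as all ones. */
static uint32_t fake_read32(const struct fake_dev *dev)
{
	return dev->state == FAKE_D0 ? dev->reg : UINT32_MAX;
}

/* A write to a device that is not in D0 is silently dropped. */
static void fake_write32(struct fake_dev *dev, uint32_t val)
{
	if (dev->state == FAKE_D0)
		dev->reg = val;
}

/*
 * All ones *may* be an error response, but it can also be legitimate
 * register contents, so this is only a hint.
 */
static bool fake_possible_error(uint32_t val)
{
	return val == UINT32_MAX;
}
```

In this model, a caller that writes a register while the device is in
FAKE_D3HOT sees no failure at the call site; the only hint is that a
later read comes back as all ones.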

I don't know the details, but I assume the fatal error is a problem
specific to arm64.

If the device is not in D0, we can avoid the problem here, but it
seems like we're just leaving a landmine for somebody else to hit
later.  The driver will surely access the device after resume, won't
it?  Is it better to wait for a fatal error there?

Even if we avoid errors here, aren't we effectively claiming to have
restored the device state, which is now a lie?

Even on other platforms, if the writes that are supposed to restore
the state are dropped because the device isn't in D0, the result is
also not what we expect, and something is probably broken.
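
To make the "dropped restore" case concrete, here is a small user-space
sketch under the same assumption (writes dropped and reads returning all
ones outside D0).  The names model_dev and model_restore_and_check() are
hypothetical and are not part of the patch or of the PCI core; the point
is only that a restore can report whether the saved value actually
reached the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; not kernel code. */
enum model_power_state { MODEL_D0, MODEL_D3HOT };

struct model_dev {
	enum model_power_state state;
	uint32_t reg;	/* live register contents */
	uint32_t saved;	/* value saved before suspend */
};

/* Reads outside D0 return all ones. */
static uint32_t model_read32(const struct model_dev *dev)
{
	return dev->state == MODEL_D0 ? dev->reg : UINT32_MAX;
}

/* Writes outside D0 are dropped. */
static void model_write32(struct model_dev *dev, uint32_t val)
{
	if (dev->state == MODEL_D0)
		dev->reg = val;
}

/*
 * Write the saved value back and read it again.  Returns true only if
 * the read-back matches.  Caveat: if the saved value is itself all
 * ones, a device outside D0 would pass this check by accident.
 */
static bool model_restore_and_check(struct model_dev *dev)
{
	model_write32(dev, dev->saved);
	return model_read32(dev) == dev->saved;
}
```

With the device left in MODEL_D3HOT, the check returns false and the
live register keeps its stale value, which is the "restored state is a
lie" situation described above.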

Bjorn