linux-kernel - Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly put devices into D0 when initializing"

Message-ID: <CAKAwkKu4bePg_NJ9SORcvwgkKyrr7yhGVjFyDQR+d18MtrbyDA@mail.gmail.com>
Date: Fri, 5 Dec 2025 16:06:47 +1300
From: Matthew Ruffell <matthew.ruffell@...onical.com>
To: Mario Limonciello <mario.limonciello@....com>
Cc: "bhelgaas@...gle.com" <bhelgaas@...gle.com>, linux-pci@...r.kernel.org, 
	lkml <linux-kernel@...r.kernel.org>, Jay Vosburgh <jay.vosburgh@...onical.com>
Subject: Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly
 put devices into D0 when initializing"

Hi Mario,

Again, thank you for your prompt response.

> That's at least what it seems like.  And I guess trying to set D0
> without bus mastering enabling is causing a problem.
>
> Could you try adding a pci_set_master() call to pci_power_up()?  This is
> what I have in mind (only compile tested):
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..68661e333032 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1323,6 +1323,7 @@ int pci_power_up(struct pci_dev *dev)
>                  return -EIO;
>          }
>
> +       pci_set_master(dev);
>          pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>          if (PCI_POSSIBLE_ERROR(pmcsr)) {
>                  pci_err(dev, "Unable to change power state from %s to D0, device inaccessible\n",

I built a test kernel, using the very same git hash as yesterday,
51ab33fc0a8bef9454849371ef897a1241911b37
with this pci_set_master(dev) applied, and yes, kexec succeeds and the
system boots normally.

Would you call pci_set_master(dev) unconditionally like this, or would you
gate it behind a check for whether the system is being kexec'd?

I then patched pci_device_shutdown() with the patch below to log the
state of each PCI device, including the NVMe device, and then halted the system.

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 302d61783f6c..ac5dc8a466d2 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -510,6 +510,8 @@ static void pci_device_shutdown(struct device *dev)
        if (drv && drv->shutdown)
                drv->shutdown(pci_dev);

+       printk(KERN_WARNING "mruffell: vendor: %x, device: %x, state: %x\n", pci_dev->vendor, pci_dev->device, pci_dev->current_state);
+       pci_warn(pci_dev, "mruffell: Current PCI device.");
        /*
         * If this is a kexec reboot, turn off Bus Master bit on the
         * device to tell it to not continue to do DMA. Don't touch

Full log in pastebin: https://paste.ubuntu.com/p/QBGVbNh2Bs/

Everything was in state 0 / PCI_D0.

The NVMe device itself:

[  295.721701] mruffell: vendor: 1d0f, device: 61, state: 0
[  295.740647] nvme 0000:90:00.0: mruffell: Current PCI device.

This would indeed pass the check in pci_device_shutdown() and clear the
bus master bit:

     if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
             pci_clear_master(pci_dev);

I then reverted "907a7a2 PCI/PM: Set up runtime PM even for devices
without PCI PM" and "4d4c10f PCI: Explicitly put devices into D0 when
initializing", and then halted the system again:

Full log in pastebin: https://paste.ubuntu.com/p/VRrJHjmnxN/

The NVMe device was still in state 0 / PCI_D0.

> I have a relatively ignorant question.  Can you reproduce with kdump and
> a crash too?
>
> I don't actually know if you configure kdump and then crash the kernel
> (say magic sys-rq key), does pci_device_shutdown() get called in order
> to do the kexec?  Or because the kernel is already in a crash state is
> there just a jump into the crash kernel image location?

I will