Message-ID: <CAKAwkKu4bePg_NJ9SORcvwgkKyrr7yhGVjFyDQR+d18MtrbyDA@mail.gmail.com>
Date: Fri, 5 Dec 2025 16:06:47 +1300
From: Matthew Ruffell <matthew.ruffell@...onical.com>
To: Mario Limonciello <mario.limonciello@....com>
Cc: "bhelgaas@...gle.com" <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
lkml <linux-kernel@...r.kernel.org>, Jay Vosburgh <jay.vosburgh@...onical.com>
Subject: Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly
put devices into D0 when initializing"
Hi Mario,
Again, thank you for your prompt response.
> That's at least what it seems like. And I guess trying to set D0
> without bus mastering enabling is causing a problem.
>
> Could you try adding a pci_set_master() call to pci_power_up()? This is
> what I have in mind (only compile tested):
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..68661e333032 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1323,6 +1323,7 @@ int pci_power_up(struct pci_dev *dev)
> return -EIO;
> }
>
> + pci_set_master(dev);
> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> if (PCI_POSSIBLE_ERROR(pmcsr)) {
> pci_err(dev, "Unable to change power state from %s to D0, device inaccessible\n",
I built a test kernel from the same git hash as yesterday,
51ab33fc0a8bef9454849371ef897a1241911b37,
with this pci_set_master(dev) call applied, and yes, kexec succeeds and the
system boots normally.
Would you call pci_set_master(dev) unconditionally like this, or would you
gate it behind a check for whether the system is being kexec'd?
I then patched pci_device_shutdown() with the patch below to log the power
state of each device, including the NVMe device, and then halted the system.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 302d61783f6c..ac5dc8a466d2 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -510,6 +510,8 @@ static void pci_device_shutdown(struct device *dev)
if (drv && drv->shutdown)
drv->shutdown(pci_dev);
+ printk(KERN_WARNING "mruffell: vendor: %x, device: %x, state: %x\n", pci_dev->vendor, pci_dev->device, pci_dev->current_state);
+ pci_warn(pci_dev, "mruffell: Current PCI device.");
/*
* If this is a kexec reboot, turn off Bus Master bit on the
* device to tell it to not continue to do DMA. Don't touch
Full log in pastebin: https://paste.ubuntu.com/p/QBGVbNh2Bs/
Everything was in state 0 / PCI_D0.
The nvme device itself:
[ 295.721701] mruffell: vendor: 1d0f, device: 61, state: 0
[ 295.740647] nvme 0000:90:00.0: mruffell: Current PCI device.
This would indeed pass the check in pci_device_shutdown() and clear the
bus master bit:
if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
pci_clear_master(pci_dev);
I then reverted "907a7a2 PCI/PM: Set up runtime PM even for devices
without PCI PM" and
"4d4c10f PCI: Explicitly put devices into D0 when initializing" and
then halted the system again:
Full log in pastebin: https://paste.ubuntu.com/p/VRrJHjmnxN/
The NVMe device was still in state 0 / PCI_D0.
> I have a relatively ignorant question. Can you reproduce with kdump and
> a crash too?
>
> I don't actually know if you configure kdump and then crash the kernel
> (say magic sys-rq key), does pci_device_shutdown() get called in order
> to do the kexec? Or because the kernel is already in a crash state is
> there just a jump into the crash kernel image location?
I will