linux-kernel - Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly put devices into D0 when initializing"

Message-ID: <CAKAwkKvmZUGi+gEhr1nw5MV+rfyVP=Exu4AW1_WOPHDH6tSYug@mail.gmail.com>
Date: Thu, 4 Dec 2025 18:04:15 +1300
From: Matthew Ruffell <matthew.ruffell@...onical.com>
To: Mario Limonciello <mario.limonciello@....com>
Cc: "bhelgaas@...gle.com" <bhelgaas@...gle.com>, linux-pci@...r.kernel.org, 
	lkml <linux-kernel@...r.kernel.org>, Jay Vosburgh <jay.vosburgh@...onical.com>
Subject: Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly
 put devices into D0 when initializing"

Hi Mario,

Thank you for your prompt reply, and apologies for my delayed one.
Answers inline.

> When you say AWS specific patches, can you be more specific?  What is
> missing from a mainline kernel to use this hardware?  I.e., how do I know
> there aren't Ubuntu specific patches *causing* this issue.

I can reproduce the issue with the current HEAD of Linus's tree, with no
additional patches applied. My current HEAD for testing is the 6.19 merge
window, commit 51ab33fc0a8bef9454849371ef897a1241911b37.
To get the mainline build to work on c5.metal on AWS I needed to edit a few
config parameters, and I have attached the config I used.

> Now I've never used AWS - do you have an opportunity to do "regular"
> reboots, or only kexec reboots?
>
> This issue only happens with a kexec reboot, right?

We can do regular and kexec reboots with the c5.metal instance type. The issue
only happens with a kexec reboot.

> The first thing that jumps out at me is the code in
> pci_device_shutdown() that clears bus mastering for a kexec reboot.
> If you comment that out what happens?

I commented out the code that clears bus mastering (diff below), and kexec now
boots correctly: the NVMe drive appears just as it did before
"4d4c10f PCI: Explicitly put devices into D0 when initializing".

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 302d61783f6c..0cb14ff32475 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -517,8 +517,9 @@ static void pci_device_shutdown(struct device *dev)
         * If it is not a kexec reboot, firmware will hit the PCI
         * devices with big hammer and stop their DMA any way.
         */
-       if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
-               pci_clear_master(pci_dev);
+/*     if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
+ *             pci_clear_master(pci_dev);
+ */
 }

 #ifdef CONFIG_PM_SLEEP

Since this works, does that mean that the bus master bit isn't being set on the
NVMe device on the other side of kexec?
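
One way to check this from user space in the kexec'd kernel is to look at the
Bus Master Enable bit (bit 2 of the 16-bit PCI command register at config
offset 0x04). The following is only a minimal sketch: the helper name
pci_cfg_bus_master() is hypothetical, and it assumes the caller has already
read the device's config space into a buffer, for example from
/sys/bus/pci/devices/<BDF>/config.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND        0x04    /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER 0x0004  /* Bus Master Enable bit */

/*
 * Hypothetical helper: inspect a raw copy of PCI config space.
 * Returns 1 if bus mastering is enabled, 0 if it is disabled,
 * and -1 if the buffer is too short to contain the command register.
 */
static int pci_cfg_bus_master(const uint8_t *cfg, size_t len)
{
	uint16_t cmd;

	if (len < PCI_COMMAND + 2)
		return -1;

	/* Config space is little-endian: low byte first. */
	cmd = (uint16_t)(cfg[PCI_COMMAND] | (cfg[PCI_COMMAND + 1] << 8));

	return (cmd & PCI_COMMAND_MASTER) ? 1 : 0;
}
```

Comparing the result for the NVMe device with and without the commit applied
would show whether the bit is actually left clear after kexec.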

> The next thing I would wonder is if you're compiling with
> CONFIG_KEXEC_JUMP and if that has an impact on your issue.  When this is
> defined there is a device suspend sequence in kernel_kexec() that is run
> which will run various suspend related callbacks.  Maybe the issue is
> actually in one of those callbacks.

Yes, Ubuntu kernels set CONFIG_KEXEC_JUMP=y. I did a build with
CONFIG_KEXEC_JUMP=n and it has the same symptoms.

> A possible way to determine this would be to run rtcwake to suspend and
> resume and see if the drive survives.  If it doesn't, it's a hint that
> there is something going on with power management in this drive or the
> bridge it's connected to.  Maybe one of them isn't handling D3 very well.

Unfortunately, the c5.metal instance type doesn't support rtcwake with mode mem
or disk, as hibernation is disabled on these instance types. But since
CONFIG_KEXEC_JUMP=n doesn't help, I'm going to add some debug statements to
pci_device_shutdown() to see what state the NVMe device is in with and without
"4d4c10f PCI: Explicitly put devices into D0 when initializing".
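
As a user-space cross-check of the kernel-side debug output, the power state
the hardware reports can be decoded from the device's config space. This is a
rough sketch only: pci_cfg_power_state() is a hypothetical helper, it assumes
a raw config space dump (reading past the first 64 bytes of the sysfs config
file typically needs root), and it reports the PMCSR field rather than the
kernel's cached pci_dev->current_state, so the two may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS             0x06  /* status register offset */
#define PCI_STATUS_CAP_LIST    0x10  /* device has a capability list */
#define PCI_CAPABILITY_LIST    0x34  /* offset of first capability pointer */
#define PCI_CAP_ID_PM          0x01  /* Power Management capability ID */
#define PCI_PM_CTRL            4     /* PMCSR offset within the PM capability */
#define PCI_PM_CTRL_STATE_MASK 0x03  /* bits 1:0: 0=D0, 1=D1, 2=D2, 3=D3hot */

/*
 * Hypothetical helper: walk the capability list in a raw config space
 * copy, find the Power Management capability, and return the power
 * state field of PMCSR (0..3). Returns -1 if it cannot be found.
 */
static int pci_cfg_power_state(const uint8_t *cfg, size_t len)
{
	uint8_t pos;
	int ttl;

	if (len < 64 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return -1;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;

	/* Bound the walk so a corrupt, looping list cannot hang us. */
	for (ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
		if ((size_t)pos + PCI_PM_CTRL + 2 > len)
			return -1;
		if (cfg[pos] == PCI_CAP_ID_PM)
			return cfg[pos + PCI_PM_CTRL] & PCI_PM_CTRL_STATE_MASK;
		pos = cfg[pos + 1] & 0xFC;
	}

	return -1;
}
```

A return value of 0 corresponds to D0 and 3 to D3hot; D3cold cannot be
observed this way because config space is not accessible in that state.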

Thanks,
Matthew

Download attachment "config-6.19+c5metal1" of type "application/octet-stream" (275157 bytes)