linux-kernel - Re: AMD SNP guest kdump broken since linuxnext-20250908

Message-ID: <aNPxLQBxUau-FWtj@google.com>
Date: Wed, 24 Sep 2025 06:25:01 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Srikanth Aithal <sraithal@....com>
Cc: Linux-Next Mailing List <linux-next@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>, 
	KVM <kvm@...r.kernel.org>, Ashish Kalra <Ashish.Kalra@....com>, 
	Ard Biesheuvel <ardb@...nel.org>, Borislav Petkov <bp@...en8.de>, Tom Lendacky <thomas.lendacky@....com>
Subject: Re: AMD SNP guest kdump broken since linuxnext-20250908

+Ard and Boris (and Tom for good measure)

On Wed, Sep 24, 2025, Srikanth Aithal wrote:
> Hello all,
> 
> kdump on an SNP guest is broken in linux-next, starting with next-20250908 [1].
> 
> kdump on an SNP guest works with the following kernels as the guest kernel:
> 
> 1. https://git.kernel.org/pub/scm/virt/kvm/kvm.git, kvm/next
> 2. git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git next-20250905
> 3. git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git v6.17-rc7
> 
> The crash log during kdump varies each time. I have attached all variants of
> the error console logs to this bug report as files, as they are too large to
> include here.
> 
> kdump with other guest types (normal, SEV, SEV-ES) is working fine.
> 
> I attempted bisecting multiple times, but the error console output varies
> (sometimes a call trace, sometimes just a hang with no error messages, and
> sometimes extensive register dumps including KVM hardware error messages),
> so I have had no success so far. Additionally, a couple of linux-next bisect
> attempts pointed to a merge commit whose parent commits had no issues,
> suggesting a possible merge problem.
> 
> I am also attaching the host kernel config and guest kernel config used for
> these tests.
> 
> Tests were conducted with the following component versions:
> 
>  * Host kernel: next-20250919
>  * QEMU version: v10.1.0
>  * EDK2: edk2-stable202508
>  * Platform: Milan with the latest BIOS v2.20
> 
> 
> Thank you,
> 
> Srikanth Aithal <Srikanth.Aithal@....com>
> 
> root@...ntu:~# echo c > /proc/sysrq-trigger
> [   26.686014] sysrq: Trigger a crash
> [   26.687006] Kernel panic - not syncing: sysrq triggered crash
> [   26.688594] CPU: 0 UID: 0 PID: 4235 Comm: bash Kdump: loaded Not tainted 6.17.0-rc7-next-20250923ce7f1a983b07 #1 PREEMPT(voluntary)
> [   26.691788] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> [   26.693957] Call Trace:
> [   26.694681]  <TASK>
> [   26.695320]  vpanic+0x307/0x360
> [   26.696237]  panic+0x52/0x60
> [   26.697065]  sysrq_handle_crash+0x11/0x20
> [   26.698177]  __handle_sysrq+0xb6/0x170
> [   26.699220]  write_sysrq_trigger+0x50/0x70
> [   26.700358]  proc_reg_write+0x50/0x90
> [   26.701395]  ? preempt_count_add+0x42/0xa0
> [   26.702531]  vfs_write+0xf4/0x430
> [   26.703481]  ? handle_mm_fault+0xd0/0x200
> [   26.704602]  ksys_write+0x5c/0xd0
> [   26.705551]  do_syscall_64+0x4c/0x200
> [   26.706577]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   26.707961] RIP: 0033:0x7f4cb8024574
> [   26.708974] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
> [   26.713912] RSP: 002b:00007ffdad4f3208 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [   26.715976] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4cb8024574
> [   26.717905] RDX: 0000000000000002 RSI: 0000564731e37b80 RDI: 0000000000000001
> [   26.719843] RBP: 00007ffdad4f3230 R08: 0000000000000073 R09: 0000000000000000
> [   26.721797] R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> [   26.723715] R13: 0000564731e37b80 R14: 00007f4cb810c5c0 R15: 00007f4cb8109ee0
> [   26.725658]  </TASK>
> 
> [1373710140.379273] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [2800084354.542901] BUG: unable to handle page fault for address: ffffffff9a91e731
> [15541331571.597940] #PF: supervisor instruction fetch in kernel mode
> [11262208929.107056] #PF: error_code(0x0011) - permissions violation
> [15541331571.597940] PGD 800000e045067 P4D 800000e045067 PUD 800000e046063 PMD 80000021b8063 PTE 800800000e91e163

This is definitely a valid (i.e. not corrupted), NX mapping.
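
For anyone following along, here is a rough sketch of how the PTE value and the
#PF error code quoted above can be decoded by hand. The bit positions are the
architectural x86-64 ones; the C-bit position (51) is an assumption inferred
from the page-table values in this log, not something stated in the report, and
the helper name decode() is made up for illustration.

```python
# Architectural x86-64 flag bits of a 4K page-table entry.
PTE_FLAGS = {
    0: "P",     # present
    1: "RW",    # writable
    2: "US",    # user-accessible
    3: "PWT",   # write-through
    4: "PCD",   # cache disable
    5: "A",     # accessed
    6: "D",     # dirty
    8: "G",     # global
    63: "NX",   # no-execute
}

# Architectural x86 page-fault error code bits.
PF_FLAGS = {
    0: "P",     # fault on a present page (permissions violation)
    1: "W",     # write access
    2: "U",     # user-mode access
    3: "RSVD",  # reserved bit set in a paging entry
    4: "I",     # instruction fetch
}

def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if (value >> bit) & 1]

PTE = 0x800800000E91E163   # PTE value from the oops above
ERROR_CODE = 0x0011        # error_code from the oops above
C_BIT = 51                 # ASSUMPTION: SEV encryption bit position on this host

pte_flags = decode(PTE, PTE_FLAGS)
pf_flags = decode(ERROR_CODE, PF_FLAGS)
c_bit_set = bool((PTE >> C_BIT) & 1)
# Physical frame: drop the C-bit and everything above it, then the low 12 flag bits.
phys_frame = PTE & ((1 << C_BIT) - 1) & ~0xFFF

print("PTE flags:  ", pte_flags)
print("#PF flags:  ", pf_flags)
print("C-bit set:  ", c_bit_set)
print("phys frame: ", hex(phys_frame))
```

Under those assumptions the entry decodes as present, writable, accessed, dirty,
global and NX, and the error code as an instruction fetch that hit a present
page, which matches the "permissions violation" line in the log.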

> [1373710140.379273] Oops: Oops: 0011 [#1] SMP NOPTI
> [11262208929.107056] CPU: 0 UID: 0 PID: 4235 Comm: bash Kdump: loaded Not tainted 6.17.0-rc7-next-20250923ce7f1a983b07 #1 PREEMPT(voluntary)
> [2800084354.542901] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> [12688583143.270684] RIP: 0010:early_set_pages_state+0x0/0x120

Given that a lore search on early_set_pages_state lights up Ard's series[*] to
clean up the boot code for SEV, and that said series is new in next-20250908 (NOT
in next-20250905), that seems like a likely culprit.

[*] https://lore.kernel.org/all/20250828102202.1849035-24-ardb+git@google.com

> [15541331571.597940] Code: 02 02 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 <02> 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 02 02 02
> [12688583143.270684] RSP: 0018:ffffb608807a7be0 EFLAGS: 00010006
> [1373710140.379273] RAX: ffff9ed0bfe53000 RBX: ffffffff9abecbe8 RCX: ffffb608807a7be8
> [2800084354.542901] RDX: 0000000000000001 RSI: 000000007fe53000 RDI: ffff9ed03fe53000
> [1373710140.379273] RBP: 0000000000000001 R08: 0000000000000001 R09: ffff9ed03fe53000
> [12688583143.270684] R10: 000000000f001000 R11: 0000000000000000 R12: ffff9ed03fe53000
> [15541331571.597940] R13: 0000000000000000 R14: ffff9ecfcf00a298 R15: 0000000000001000
> [11262208929.107056] FS:  00007f4cb7f05740(0000) GS:ffff9ed0a282c000(0000) knlGS:0000000000000000
> [2800084354.542901] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [18394079999.925196] CR2: ffffffff9a91e731 CR3: 000800000fb1c000 CR4: 00000000003506f0
> [12688583143.270684] Call Trace:
> [18394079999.925196]  <TASK>
> [2800084354.542901]  set_pages_state.part.0+0x63/0xa0
> [2800084354.542901]  snp_kexec_finish+0x432/0x490
> [12688583143.270684]  native_machine_crash_shutdown+0x65/0x90
> [15541331571.597940]  __crash_kexec+0x56/0x120
> [1373710140.379273]  ? __crash_kexec+0x104/0x120
> [12688583143.270684]  ? vpanic+0x2a2/0x360
> [18394079999.925196]  ? panic+0x52/0x60
> [11262208929.107056]  ? sysrq_handle_crash+0x11/0x20
> [16967705785.761568]  ? __handle_sysrq+0xb6/0x170
> [1373710140.379273]  ? write_sysrq_trigger+0x50/0x70
> [1373710140.379273]  ? proc_reg_write+0x50/0x90
> [18394079999.925196]  ? preempt_count_add+0x42/0xa0
> [2800084354.542901]  ? vfs_write+0xf4/0x430
> [11262208929.107056]  ? handle_mm_fault+0xd0/0x200
> [18394079999.925196]  ? ksys_write+0x5c/0xd0
> [12688583143.270684]  ? do_syscall_64+0x4c/0x200
> [11262208929.107056]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [15541331571.597940]  </TASK>
> [12688583143.270684] Modules linked in: efivarfs
> [2800084354.542901] CR2: ffffffff9a91e731
> [14114957357.434312] ---[ end trace 0000000000000000 ]---
> [11262208929.107056] RIP: 0010:early_set_pages_state+0x0/0x120
> [12688583143.270684] Code: 02 02 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 <02> 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 02 02 02
> [15541331571.597940] RSP: 0018:ffffb608807a7be0 EFLAGS: 00010006
> [14114957357.434312] RAX: ffff9ed0bfe53000 RBX: ffffffff9abecbe8 RCX: ffffb608807a7be8
> [2800084354.542901] RDX: 0000000000000001 RSI: 000000007fe53000 RDI: ffff9ed03fe53000
> [15541331571.597940] RBP: 0000000000000001 R08: 0000000000000001 R09: ffff9ed03fe53000
> [2800084354.542901] R10: 000000000f001000 R11: 0000000000000000 R12: ffff9ed03fe53000
> [2800084354.542901] R13: 0000000000000000 R14: ffff9ecfcf00a298 R15: 0000000000001000
> [2800084354.542901] FS:  00007f4cb7f05740(0000) GS:ffff9ed0a282c000(0000) knlGS:0000000000000000
> [14114957357.434312] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11262208929.107056] CR2: ffffffff9a91e731 CR3: 000800000fb1c000 CR4: 00000000003506f0
> [12688583143.270684] Kernel panic - not syncing: Fatal exception