linux-kernel - [RFC] AMD VM crashing on deferred memory error injection

Message-ID: <48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com>
Date: Mon, 9 Feb 2026 17:36:32 +0100
From: William Roche <william.roche@...cle.com>
To: "Ghannam, Yazen" <Yazen.Ghannam@....com>, Tony Luck
 <tony.luck@...el.com>,
        bp@...en8.de, Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com
Cc: "Allen, John" <John.Allen@....com>, linux-edac@...r.kernel.org,
        linux-kernel@...r.kernel.org, Jane Chu <jane.chu@...cle.com>
Subject: [RFC] AMD VM crashing on deferred memory error injection

Hello,

I'd like to bring to your attention a consequence of this set of
commits, which was integrated early into the 6.19 kernel:

   2025-11-04 14:55 [PATCH v8 0/8] AMD MCA interrupts rework
   https://lore.kernel.org/all/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com/

Yazen Ghannam (7):
       x86/mce: Unify AMD THR handler with MCA Polling
       x86/mce: Unify AMD DFR handler with MCA Polling
       x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
       x86/mce/amd: Support SMCA Corrected Error Interrupt
       x86/mce/amd: Remove redundant reset_block()
       x86/mce/amd: Define threshold restart function for banks
       x86/mce: Save and use APEI corrected threshold limit


An AMD QEMU VM running this kernel can no longer handle the injection
of a deferred memory error, and crashes with:

[  333.420854] mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000) at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
[  333.428105] Call Trace:
[  333.429566]  <IRQ>
[  333.430745]  amd_clear_bank+0x6e/0x70
[  333.432828]  machine_check_poll+0x228/0x2e0
[  333.435068]  ? __pfx_mce_timer_fn+0x10/0x10
[  333.437241]  mce_timer_fn+0xb1/0x130
[  333.438966]  ? __pfx_mce_timer_fn+0x10/0x10
[  333.441380]  call_timer_fn+0x26/0x120
[  333.443518]  __run_timers+0x202/0x290
[  333.445763]  run_timer_softirq+0x49/0x100
[  333.447908]  handle_softirqs+0xeb/0x2c0
[  333.449863]  __irq_exit_rcu+0xda/0x100
[  333.452065]  sysvec_apic_timer_interrupt+0x71/0x90
[  333.454846]  </IRQ>
[  333.456192]  <TASK>
[  333.457520]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  333.460355] RIP: 0010:pv_native_safe_halt+0xf/0x20
[  333.463203] Code: 20 d0 e9 5f 99 e6 fe 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 33 ee 18 00 fb f4 <e9> 37 990
[  333.472816] RSP: 0018:ffffffff83403e78 EFLAGS: 00000246
[  333.475848] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  333.479481] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  333.483492] RBP: ffffffff83412980 R08: 0000000000000000 R09: 0000000000000000
[  333.487503] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  333.491482] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000947d0
[  333.495258]  default_idle+0x9/0x30
[  333.497283]  default_idle_call+0x28/0x100
[  333.499641]  cpuidle_idle_call+0x12e/0x180
[  333.502087]  do_idle+0x77/0xb0
[  333.503914]  cpu_startup_entry+0x29/0x30
[  333.506337]  rest_init+0xcc/0xd0
[  333.508296]  start_kernel+0x4df/0x4e0
[  333.510491]  x86_64_start_reservations+0x32/0x40
[  333.513101]  x86_64_start_kernel+0xce/0xd0
[  333.515433]  common_startup_64+0x13e/0x141
[  333.517920]  </TASK>
[  333.519468] Kernel panic - not syncing: MCA architectural violation!


The problem appeared with the addition of clearing MCA_DESTAT for all
deferred errors in the amd_clear_bank() function by this kernel commit:

     7cb735d7c0cb  x86/mce: Unify AMD DFR handler with MCA Polling

+       /* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
+       if (m->status & MCI_STATUS_DEFERRED)
+               mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);


However, the QEMU AMD implementation of MCE injection for deferred
errors relies on machine_check_poll() picking up these errors, as
indicated in this QEMU change:
     4b77512b2782  i386: Fix MCE support for AMD hosts
https://lore.kernel.org/qemu-devel/20240603193622.47156-2-john.allen@amd.com/


When a QEMU process receives the SIGBUS information from the host, it
generates a virtual MCE to be handled by the VM kernel's
machine_check_poll(). But clearing MCA_DESTAT doesn't seem to be allowed
and triggers an exception, which looks like a mismatch between the
kernel and the AMD SMCA contract (?)

So should QEMU be changed to allow this write, or is the kernel missing
guards around clearing MCA_DESTAT after injected UEs on this platform?


FYI, to reproduce the problem:
. I used a QEMU Standard PC q35:

qemu-system-x86_64 --version
QEMU emulator version 10.2.50 (v10.2.0-1085-gcd5a79dc98)
Copyright (c) 2003-2026 Fabrice Bellard and the QEMU Project developers

qemu-system-x86_64 -smp 4 -m 20G -enable-kvm -cpu host -usb \
	-device usb-tablet -serial mon:stdio -M q35 \
	-nic user,model=e1000,hostfwd=tcp::60022-:22 -nographic \
	-drive file=disk.qcow2,cache=none

. Inject an error into this VM running a 6.19.0-rc1 or more recent kernel.
 From the host:
# modprobe hwpoison-inject
# echo <pfn> > /sys/kernel/debug/hwpoison/corrupt-pfn

Wait 5 minutes until the deferred error is handled by the VM kernel, and
the VM then crashes with the above stack trace...


. But removing the reset of MCA_DESTAT in the kernel amd_clear_bank()
function or adding this simple test makes the system work again as
before:


diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d9f9ee7db5c8..86b3070fbb40 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -860,7 +860,7 @@ void amd_clear_bank(struct mce *m)
         amd_reset_thr_limit(m->bank);

         /* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
-       if (m->status & MCI_STATUS_DEFERRED)
+       if (m->status & MCI_STATUS_DEFERRED && !(m->status & MCI_STATUS_POISON))
                 mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);

         /* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */



In my opinion, this small kernel fix relies too much on a QEMU
AMD-specific implementation detail.

Would you have a more appropriate fix to suggest, please?

Thanks in advance for your feedback.
William.