Message-ID: <6f87d29b-c30a-47f8-a519-0e1fba36f1a7@oracle.com>
Date: Wed, 11 Feb 2026 02:42:07 +0100
From: William Roche <william.roche@...cle.com>
To: Yazen Ghannam <yazen.ghannam@....com>, Tony Luck <tony.luck@...el.com>,
bp@...en8.de
Cc: Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
"Allen, John" <John.Allen@....com>, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org, Jane Chu <jane.chu@...cle.com>
Subject: Re: [RFC] AMD VM crashing on deferred memory error injection
On 2/9/26 22:18, Yazen Ghannam wrote:
> On Mon, Feb 09, 2026 at 04:08:19PM -0500, Yazen Ghannam wrote:
>> On Mon, Feb 09, 2026 at 05:36:32PM +0100, William Roche wrote:
>
> [...]
>
>>> According to me, this small kernel fix relies too much on a Qemu AMD
>>> specific implementation detail.
>>>
>>> Would you have a more appropriate fix to suggest please ?
>>>
>>> Thanks in advance for your feedback.
>>> William.
>>
>> Thanks William for the report and details.
>>
>> Clearing "STATUS" registers is a normal part of MCA handling.
>>
>> We seem to allow clearing the regular "MCi_STATUS" register. I assume
>> this gets trapped/ignored by the hypervisor.
>>
>> I expect we need to do the same behavior for the "MCA_DESTAT" register.
>>
>> I'll do some research here, but please do share any pointers you may
>> have.
Yazen, I'm simply trying to find an answer in the AMD64 Architecture
Programmer's Manual, Volume 2: System Programming (publication 24593).
Section 9.3.3.4, "MCA Deferred Error Status Register", of this document
states:
"When the deferred error has been processed by the deferred error
handler, MCA_DESTAT should be cleared. If MCA_STATUS also contains a
deferred error, MCA_STATUS should be cleared."
So I would expect the platform to accept (or ignore) a write that clears
MCA_DESTAT, the same way it does for MCA_STATUS.
>
> Sorry for the rapid reply, but I think this is where we need an update.
>
> Linux:
> arch/x86/kvm/x86.c : set_msr_mce()
>
> Please note the comment:
> "All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR."
>
> We should include the MCA_DESTAT register range here.
>
> What do you think?
But before trying to update the set_msr_mce() function: I don't think
that KVM keeps track of a set of MSR_AMD64_SMCA_MCx_DESTAT registers.
I can see the mce_banks (for ctl, status, addr and misc) and
mci_ctl2_banks locations in struct kvm_vcpu_arch, but I don't see a
location for SMCA registers like the MCA_DESTAT MSRs.
So if we make KVM ignore this write instead of raising a #GP error,
would that be a valid solution?
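As a rough illustration of the "ignore the write" idea, here is a
stand-alone sketch, not a KVM patch. toy_set_msr(), is_smca_destat() and
TOY_NUM_BANKS are hypothetical names, and nothing below reflects how
set_msr_mce() is actually structured. The MSR numbers (MC0 DESTAT at
0xc0002008, 0x10 stride per bank) are the values I believe msr-index.h
uses. Rejecting non-zero writes is an assumption that simply mirrors the
"writing 0 to MCi_STATUS" comment quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match MSR_AMD64_SMCA_MC0_DESTAT and the per-bank stride. */
#define SMCA_MC0_DESTAT   0xc0002008u
#define SMCA_BANK_STRIDE  0x10u
#define TOY_NUM_BANKS     32u   /* hypothetical bank count */

enum toy_wr_result { TOY_WR_IGNORED, TOY_WR_GP };

/* True if msr is the MCA_DESTAT register of one of the modelled banks. */
static bool is_smca_destat(uint32_t msr)
{
	uint32_t off;

	if (msr < SMCA_MC0_DESTAT)
		return false;

	off = msr - SMCA_MC0_DESTAT;
	return (off % SMCA_BANK_STRIDE) == 0 &&
	       (off / SMCA_BANK_STRIDE) < TOY_NUM_BANKS;
}

/*
 * Toy write filter: no state is kept for MCA_DESTAT, so a write of 0 is
 * accepted and dropped, and anything else is reported as a fault.
 */
static enum toy_wr_result toy_set_msr(uint32_t msr, uint64_t data)
{
	if (is_smca_destat(msr))
		return data == 0 ? TOY_WR_IGNORED : TOY_WR_GP;

	return TOY_WR_GP;   /* every other MSR is out of scope here */
}
```

With this filter, clearing bank 1 via toy_set_msr(0xc0002018, 0) is
accepted without any backing storage, which is the behaviour I am asking
about.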
Thanks,
William.