linux-kernel - Re: [PATCH v4 0/3] VMM can handle guest SEA via KVM_EXIT_ARM

Message-ID: <20251020144646.GT316284@nvidia.com>
Date: Mon, 20 Oct 2025 11:46:46 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Jiaqi Yan <jiaqiyan@...gle.com>
Cc: maz@...nel.org, oliver.upton@...ux.dev, duenwen@...gle.com,
	rananta@...gle.com, jthoughton@...gle.com, vsethi@...dia.com,
	joey.gouly@....com, suzuki.poulose@....com, yuzenghui@...wei.com,
	catalin.marinas@....com, will@...nel.org, pbonzini@...hat.com,
	corbet@....net, shuah@...nel.org, kvm@...r.kernel.org,
	kvmarm@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 0/3] VMM can handle guest SEA via KVM_EXIT_ARM_SEA

On Mon, Oct 13, 2025 at 06:59:00PM +0000, Jiaqi Yan wrote:
> Problem
> =======
> 
> When host APEI is unable to claim a synchronous external abort (SEA)
> during guest abort, today KVM directly injects an asynchronous SError
> into the VCPU and then resumes it. The injected SError usually results in
> an unpleasant guest kernel panic.
> 
> One of the major causes of guest SEA is a VCPU consuming a recoverable
> uncorrected memory error (UER), which is not uncommon at all in modern
> datacenter servers with large amounts of physical memory. Although SError
> and guest panic are sufficient to stop the propagation of corrupted memory,
> there is room to recover from a UER in a more graceful manner.
> 
> Proposed Solution
> =================
> 
> The idea is, we can replay the SEA to the faulting VCPU. If the memory
> error consumption or the fault that causes SEA is not from guest kernel,
> the blast radius can be limited to the poison-consuming guest process,
> while the VM can keep running.
> 
> In addition, instead of doing this under the hood without involving
> userspace, there are benefits to redirecting the SEA to the VMM:
> 
> - VM customers care about the disruptions caused by memory errors, and
>   VMM usually has the responsibility to start the process of notifying
>   the customers of memory error events in their VMs. For example, some
>   cloud providers emit a critical log in their observability UI [1] and
>   provide a playbook for customers on how to mitigate disruptions to
>   their workloads.
> 
> - VMM can protect against future memory error consumption by unmapping
>   the poisoned pages from the stage-2 page table with KVM userfault [2],
>   or by splitting the memslot that contains the poisoned pages.
> 
> - VMM can keep track of SEA events in the VM. When VMM thinks the status
>   on the host or the VM is bad enough, e.g. number of distinct SEAs
>   exceeds a threshold, it can restart the VM on another healthy host.
> 
> - Behavior parity with the x86 architecture. When a machine check exception
>   (MCE) is caused by a VCPU, the kernel or KVM sends SIGBUS to userspace to
>   let the VMM either recover from the MCE or terminate itself with the VM.
>   The prior RFC proposed implementing SIGBUS on arm64 as well, but
>   Marc preferred a KVM exit over a signal [3]. However, implementation
>   aside, returning SEA to VMM is on par with returning MCE to VMM.
> 
> Once SEA is redirected to VMM, among other actions, VMM is encouraged
> to inject external aborts into the faulting VCPU.

I don't know much about the KVM details but this explanation makes
sense to me and we also have use cases for all of what is written
here.

Thanks,
Jason