Message-ID: <CACw3F51QG70YpSfWaj_gQjAwoPcZ6uFa5dfd+Ave5PxQYDt-Ew@mail.gmail.com>
Date: Fri, 3 Oct 2025 14:34:23 -0700
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: maz@...nel.org, oliver.upton@...ux.dev
Cc: joey.gouly@....com, suzuki.poulose@....com, yuzenghui@...wei.com,
catalin.marinas@....com, will@...nel.org, pbonzini@...hat.com, corbet@....net,
shuah@...nel.org, kvm@...r.kernel.org, kvmarm@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org,
duenwen@...gle.com, rananta@...gle.com, jthoughton@...gle.com
Subject: Re: [PATCH v3 0/3] VMM can handle guest SEA via KVM_EXIT_ARM_SEA
Hi Marc, Oliver, and other upstream friends, can you help review this
patch series? I would really appreciate any comments and feedback.
[sorry for resending, as previous msg was sent as HTML]
On Thu, Jul 31, 2025 at 1:58 PM Jiaqi Yan <jiaqiyan@...gle.com> wrote:
>
> Problem
> =======
>
> When host APEI is unable to claim a synchronous external abort (SEA)
> during a guest abort, KVM today directly injects an asynchronous SError
> into the VCPU and then resumes it. The injected SError usually results
> in an unpleasant guest kernel panic.
>
> One major source of guest SEAs is a VCPU consuming a recoverable
> uncorrected memory error (UER), which is not uncommon in modern
> datacenter servers with large amounts of physical memory. Although an
> SError and guest panic are sufficient to stop the propagation of corrupted
> memory, there is room to recover from a UER more gracefully.
>
> Proposed Solution
> =================
>
> The idea is to replay the SEA to the faulting VCPU. If the memory
> error consumption, or the fault that causes the SEA, is not from the guest
> kernel, the blast radius can be limited to the poison-consuming guest
> process while the VM keeps running.
>
> In addition, instead of handling this under the hood without involving
> userspace, there are benefits to redirecting the SEA to the VMM:
>
> - VM customers care about the disruptions caused by memory errors, and
> the VMM is usually responsible for starting the process of notifying
> customers of memory error events in their VMs. For example, one
> cloud provider emits a critical log in its observability UI [1] and
> provides a playbook for customers on how to mitigate disruptions to
> their workloads.
>
> - VMM can prevent future memory error consumption by unmapping the poisoned
> pages from the stage-2 page table with KVM userfault [2], or by splitting
> the memslot that contains the poisoned pages.
>
> - VMM can keep track of SEA events in the VM. When the VMM decides that the
> state of the host or the VM is bad enough, e.g. the number of distinct SEAs
> exceeds a threshold, it can restart the VM on another healthy host.
>
> - Behavior parity with the x86 architecture. When a machine check exception
> (MCE) is caused by a VCPU, the kernel or KVM sends SIGBUS to userspace to
> let the VMM either recover from the MCE or terminate itself along with
> the VM. The prior RFC proposed implementing SIGBUS on arm64 as well, but
> Marc preferred a KVM exit over a signal [3]. Implementation aside,
> returning the SEA to the VMM is on par with returning the MCE to the VMM.
>
> Once SEA is redirected to VMM, among other actions, VMM is encouraged
> to inject external aborts into the faulting VCPU.
>
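
The threshold policy from the third bullet above (restart the VM elsewhere once the number of distinct SEAs exceeds a limit) could be sketched roughly as follows. This is only an illustration of VMM-side bookkeeping: struct sea_tracker, sea_tracker_init(), sea_tracker_record(), the 4 KiB page size, and the fixed capacity are all hypothetical and are not part of this series or of any existing VMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMM-side bookkeeping; none of these names come from the series. */
#define SEA_PAGE_SHIFT  12  /* assumption: 4 KiB guest pages */
#define SEA_MAX_TRACKED 64  /* arbitrary capacity chosen for this sketch */

struct sea_tracker {
    uint64_t pfns[SEA_MAX_TRACKED]; /* distinct poisoned guest page frames */
    size_t   count;                 /* number of valid entries in pfns[] */
    size_t   threshold;             /* policy knob, expected <= SEA_MAX_TRACKED */
};

void sea_tracker_init(struct sea_tracker *t, size_t threshold)
{
    assert(t != NULL);
    t->count = 0;
    t->threshold = threshold;
}

/*
 * Record an SEA reported at guest physical address gpa.
 * Returns true once the number of distinct poisoned pages has reached the
 * threshold, i.e. when the VMM may want to move the VM to another host.
 */
bool sea_tracker_record(struct sea_tracker *t, uint64_t gpa)
{
    uint64_t pfn = gpa >> SEA_PAGE_SHIFT;
    size_t i;

    assert(t != NULL);

    /* A repeated hit on an already known page is not a new distinct SEA. */
    for (i = 0; i < t->count; i++)
        if (t->pfns[i] == pfn)
            return t->count >= t->threshold;

    /* Remember the new page if there is room left in the table. */
    if (t->count < SEA_MAX_TRACKED)
        t->pfns[t->count++] = pfn;

    return t->count >= t->threshold;
}
```

Counting distinct page frames rather than raw events keeps a guest that repeatedly touches the same poisoned page from tripping the threshold on its own.
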
> New UAPIs
> =========
>
> This patchset introduces the following userspace-visible changes to empower
> the VMM to control what happens for an SEA on guest memory:
>
> - KVM_CAP_ARM_SEA_TO_USER. While taking an SEA, if userspace has enabled
> this new capability at VM creation and the SEA is not caused by
> kernel-allocated memory, KVM returns KVM_EXIT_ARM_SEA to userspace
> instead of injecting an SError.
>
> - KVM_EXIT_ARM_SEA. This is the VM exit reason the VMM gets. As many details
> about the SEA as possible are provided in arm_sea, including the
> sanitized ESR value at EL2 and the faulting guest virtual and physical
> addresses, if available.
>
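
To make the intended VMM flow concrete, here is a minimal sketch of how a VMM might decide what to do after receiving such an exit. The struct layout, flag names, and function below are hypothetical stand-ins written only from the description above (sanitized ESR plus optional guest virtual and physical addresses); they are not the actual arm_sea definition from include/uapi/linux/kvm.h, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the exit payload; NOT the real UAPI layout. */
#define SKETCH_SEA_GVA_VALID (1ULL << 0)
#define SKETCH_SEA_GPA_VALID (1ULL << 1)
#define SKETCH_PAGE_MASK     0xfffULL  /* assumption: 4 KiB pages */

struct sketch_arm_sea {
    uint64_t esr;    /* sanitized ESR_EL2 value */
    uint64_t flags;  /* which of gva/gpa are valid */
    uint64_t gva;    /* faulting guest virtual address, if valid */
    uint64_t gpa;    /* faulting guest physical address, if valid */
};

enum sea_action {
    SEA_INJECT_ABORT,      /* only replay the external abort to the VCPU */
    SEA_UNMAP_AND_INJECT,  /* also stop future accesses to the poisoned page */
};

/*
 * Decide how to react to an SEA exit. When the guest physical address is
 * known, report the page-aligned address through poisoned_page so the caller
 * can unmap it (or split the memslot) before injecting the abort.
 */
enum sea_action vmm_handle_sea(const struct sketch_arm_sea *sea,
                               uint64_t *poisoned_page)
{
    assert(sea != NULL && poisoned_page != NULL);

    if (sea->flags & SKETCH_SEA_GPA_VALID) {
        *poisoned_page = sea->gpa & ~SKETCH_PAGE_MASK;
        return SEA_UNMAP_AND_INJECT;
    }

    /* No usable address: the only remaining step is to inject the abort. */
    return SEA_INJECT_ABORT;
}
```

In a real VMM this decision would sit in the KVM_RUN loop under the KVM_EXIT_ARM_SEA case, with the capability enabled beforehand at VM creation; those calls are left out here because they depend on headers that carry this series.
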
> * From v2 [4]:
> - Rebased on "[PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection" [5]
> and kvmarm/next commit 7b8346bd9fce ("KVM: arm64: Don't attempt vLPI
> mappings when vPE allocation is disabled")
> - Took the host_owns_sea implementation from Oliver [6, 7].
> - Excluded the guest SEA injection patches.
> - Updated selftest.
>
> * From v1 [8]:
> - Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid
> dereferencing NULL ITE pointer").
> - Sanitize ESR_EL2 before reporting it to userspace.
> - Do not do KVM_EXIT_ARM_SEA when SEA is caused by memory allocated to
> stage-2 translation table.
>
> [1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
> [2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com
> [3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org
> [4] https://lore.kernel.org/kvm/20250604050902.3944054-1-jiaqiyan@google.com/
> [5] https://lore.kernel.org/kvmarm/20250729182342.3281742-1-oliver.upton@linux.dev/
> [6] https://lore.kernel.org/kvm/aHFohmTb9qR_JG1E@linux.dev/#t
> [7] https://lore.kernel.org/kvm/aHK-DPufhLy5Dtuk@linux.dev/
> [8] https://lore.kernel.org/kvm/20250505161412.1926643-1-jiaqiyan@google.com
>
> Jiaqi Yan (3):
> KVM: arm64: VM exit to userspace to handle SEA
> KVM: selftests: Test for KVM_EXIT_ARM_SEA
> Documentation: kvm: new UAPI for handling SEA
>
> Documentation/virt/kvm/api.rst | 61 ++++
> arch/arm64/include/asm/kvm_host.h | 2 +
> arch/arm64/kvm/arm.c | 5 +
> arch/arm64/kvm/mmu.c | 68 +++-
> include/uapi/linux/kvm.h | 10 +
> tools/arch/arm64/include/asm/esr.h | 2 +
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../testing/selftests/kvm/arm64/sea_to_user.c | 327 ++++++++++++++++++
> tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
> 9 files changed, 476 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
>
> --
> 2.50.1.565.gc32cd1483b-goog
>