Message-ID: <wuuvrqxezybzdnijarlom4wvxlfgzgjoakwt7ixittz2jb4mal@ngjvq2rrt2ps>
Date: Thu, 13 Nov 2025 14:54:33 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jiaqi Yan <jiaqiyan@...gle.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, maz@...nel.org, 
	oliver.upton@...ux.dev, duenwen@...gle.com, rananta@...gle.com, jthoughton@...gle.com, 
	vsethi@...dia.com, joey.gouly@....com, suzuki.poulose@....com, yuzenghui@...wei.com, 
	catalin.marinas@....com, will@...nel.org, pbonzini@...hat.com, corbet@....net, 
	shuah@...nel.org, kvm@...r.kernel.org, kvmarm@...ts.linux.dev, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, 
	linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 0/3] VMM can handle guest SEA via KVM_EXIT_ARM_SEA

Hi,

On Mon, Nov 10, 2025 at 09:41:33AM -0800, Jiaqi Yan wrote:
> On Mon, Oct 20, 2025 at 7:46 AM Jason Gunthorpe <jgg@...dia.com> wrote:
> >
> > On Mon, Oct 13, 2025 at 06:59:00PM +0000, Jiaqi Yan wrote:
> > > Problem
> > > =======
> > >
> > > When host APEI is unable to claim a synchronous external abort (SEA)
> > > during guest abort, today KVM directly injects an asynchronous SError
> > > into the VCPU then resumes it. The injected SError usually results in
> > > unpleasant guest kernel panic.
> > >
> > > One of the major situation of guest SEA is when VCPU consumes recoverable
> > > uncorrected memory error (UER), which is not uncommon at all in modern
> > > datacenter servers with large amounts of physical memory. Although SError
> > > and guest panic is sufficient to stop the propagation of corrupted memory,
> > > there is room to recover from an UER in a more graceful manner.
> > >
> > > Proposed Solution
> > > =================
> > >
> > > The idea is, we can replay the SEA to the faulting VCPU. If the memory
> > > error consumption or the fault that cause SEA is not from guest kernel,
> > > the blast radius can be limited to the poison-consuming guest process,
> > > while the VM can keep running.

I like the idea of having a "guest-first"/"host-first" approach for APEI,
letting userspace (likely rasdaemon) decide whether hardware errors are
handled at the guest or at the host. Yet, it sounds wrong to have a flag
called KVM_EXIT_ARM_SEA, as:

    1. This is not exclusive to ARM;
    2. There are other notification mechanisms that can raise APEI
       errors. For instance, QEMU code defines:

    ACPI_GHES_NOTIFY_POLLED = 0,
    ACPI_GHES_NOTIFY_EXTERNAL = 1,
    ACPI_GHES_NOTIFY_LOCAL = 2,
    ACPI_GHES_NOTIFY_SCI = 3,
    ACPI_GHES_NOTIFY_NMI = 4,
    ACPI_GHES_NOTIFY_CMCI = 5,
    ACPI_GHES_NOTIFY_MCE = 6,
    ACPI_GHES_NOTIFY_GPIO = 7,
    ACPI_GHES_NOTIFY_SEA = 8,
    ACPI_GHES_NOTIFY_SEI = 9,
    ACPI_GHES_NOTIFY_GSIV = 10,
    ACPI_GHES_NOTIFY_SDEI = 11,
    ACPI_GHES_NOTIFY_RESERVED = 12

 - even on arm, QEMU currently implements two mechanisms (SEA and GPIO);
 - once we implement the same feature on Intel, it will likely use
   NMI, MCE and/or SCI.
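
To make the per-architecture point concrete, here is a minimal sketch in C.
It only encodes what is stated above: SEA and GPIO on arm today, and NMI,
MCE and/or SCI as the likely candidates on Intel. The numeric values are
copied from the list above; every sketch_* name is hypothetical and is not
part of QEMU or KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the notification types listed above (same numeric values). */
enum sketch_ghes_notify {
    SKETCH_NOTIFY_SCI  = 3,
    SKETCH_NOTIFY_NMI  = 4,
    SKETCH_NOTIFY_MCE  = 6,
    SKETCH_NOTIFY_GPIO = 7,
    SKETCH_NOTIFY_SEA  = 8,
};

/* Hypothetical architecture tag, for illustration only. */
enum sketch_arch { SKETCH_ARCH_ARM64, SKETCH_ARCH_X86 };

/* One bit per notification type; callers must pass a type below 32. */
static uint32_t sketch_notify_bit(unsigned int type)
{
    assert(type < 32);
    return UINT32_C(1) << type;
}

/* Notification types each architecture uses or is expected to use. */
static uint32_t sketch_arch_notify_mask(enum sketch_arch arch)
{
    switch (arch) {
    case SKETCH_ARCH_ARM64:
        /* Implemented in QEMU today, per the text above. */
        return sketch_notify_bit(SKETCH_NOTIFY_SEA) |
               sketch_notify_bit(SKETCH_NOTIFY_GPIO);
    case SKETCH_ARCH_X86:
        /* Assumption: the "likely" candidates named above. */
        return sketch_notify_bit(SKETCH_NOTIFY_NMI) |
               sketch_notify_bit(SKETCH_NOTIFY_MCE) |
               sketch_notify_bit(SKETCH_NOTIFY_SCI);
    }
    return 0;
}

/* True if the given notification type is in the architecture's set. */
static bool sketch_arch_uses_notify(enum sketch_arch arch, unsigned int type)
{
    return type < 32 &&
           (sketch_arch_notify_mask(arch) & sketch_notify_bit(type)) != 0;
}
```

With this, sketch_arch_uses_notify(SKETCH_ARCH_ARM64, SKETCH_NOTIFY_SEA) is
true while the same query for SKETCH_ARCH_X86 is false, which is the reason
an exit reason named after SEA does not generalize.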

So, IMO, the best would be to use a more generic name like
KVM_EXIT_APEI or KVM_EXIT_GHES - or maybe even name it the way it really
is meant: KVM_EXIT_ACPI_GUEST_FIRST.
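
As a rough illustration of what a generic exit could look like from the VMM
side, here is a sketch in C where the exit carries the notification type as
data instead of encoding it in the exit name. None of this exists in KVM or
QEMU: the struct, the policy enum and the function are all hypothetical, and
the rule that only SEA and GPIO can be forwarded simply mirrors the two
mechanisms QEMU implements on arm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Notification values taken from the list quoted earlier in this mail. */
#define SKETCH_EXIT_NOTIFY_MCE  6u
#define SKETCH_EXIT_NOTIFY_GPIO 7u
#define SKETCH_EXIT_NOTIFY_SEA  8u

/* Hypothetical policy selected by userspace (e.g. by a RAS daemon). */
enum sketch_policy { SKETCH_HOST_FIRST, SKETCH_GUEST_FIRST };

/* What the VMM does with the error after the exit. */
enum sketch_action { SKETCH_HANDLE_IN_HOST, SKETCH_FORWARD_TO_GUEST };

/* Hypothetical payload of a generic KVM_EXIT_APEI-style exit. */
struct sketch_apei_exit {
    uint32_t notify_type; /* one of the notification values above */
};

/*
 * Decide where the error is handled. With a guest-first policy the error
 * is forwarded only if the VMM has a delivery path for that notification
 * type; otherwise it falls back to host handling.
 */
static enum sketch_action
sketch_handle_apei_exit(enum sketch_policy policy,
                        const struct sketch_apei_exit *exit_info)
{
    assert(exit_info != NULL);

    if (policy == SKETCH_HOST_FIRST)
        return SKETCH_HANDLE_IN_HOST;

    switch (exit_info->notify_type) {
    case SKETCH_EXIT_NOTIFY_SEA:
    case SKETCH_EXIT_NOTIFY_GPIO:
        return SKETCH_FORWARD_TO_GUEST;
    default:
        /* No guest delivery path in this sketch. */
        return SKETCH_HANDLE_IN_HOST;
    }
}
```

The point of the sketch is only that supporting a new notification type
would mean adding a case label, not a new exit reason.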

That said, I'd say that we need an implementation in a real userspace
application to be able to test it (rasdaemon being the obvious candidate).

For testing, the best option is to use the new QEMU code (for 10.2) that
allows injecting hardware errors via QMP.

Regards,
Mauro