[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fcqjl5a7m27f2mfpblnhgmozbipdjmvpdyk3m5hhzwcenp4cpg@m2ooa7ykrcvs>
Date: Wed, 3 Dec 2025 18:34:29 +0530
From: Naveen N Rao <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Peter Gonda <pgonda@...gle.com>,
Michael Roth <michael.roth@....com>, Vishal Annapurve <vannapurve@...gle.com>,
Ackerly Tng <ackerleytng@...gle.com>, Nikunj A Dadhania <nikunj@....com>,
Tom Lendacky <thomas.lendacky@....com>
Subject: Re: [PATCH 01/22] KVM: x86: Disallow read-only memslots for SEV-ES
and SEV-SNP (and TDX)
Hi Sean,
On Fri, Aug 09, 2024 at 12:02:58PM -0700, Sean Christopherson wrote:
> Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
> directly emulate instructions for ES/SNP, and instead the guest must
> explicitly request emulation. Unless the guest explicitly requests
> emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
> SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
>
> But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
> because except for ES/SNP, doing so requires setting reserved bits in the
> SPTE, i.e. the SPTE can't be readable while also generating a #VC on
> writes. Because KVM never creates MMIO SPTEs and jumps directly to
> emulation, the guest never gets a #VC. And since KVM simply resumes the
> guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
> into an infinite #NPF loop if the vCPU attempts to write read-only memory.
>
> Disallow read-only memory for all VMs with protected state, i.e. for
> upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible
> to support read-only memory, as TDX uses EPT Violation #VE to reflect the
> fault into the guest, e.g. KVM could configure read-only SPTEs with RX
> protections and SUPPRESS_VE=0. But there is no strong use case for
> supporting read-only memslots on TDX, e.g. the main historical usage is
> to emulate option ROMs, but TDX disallows executing from shared memory.
> And if someone comes along with a legitimate, strong use case, the
> restriction can always be lifted for TDX.
>
> Don't bother trying to retroactively apply the restriction to SEV-ES
> VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't
> possibly work for SEV-ES, i.e. disallowing such memslots is really just
> means reporting an error to userspace instead of silently hanging vCPUs.
> Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
> isn't worth the marginal benefit it would provide userspace.
>
> Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES")
> Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support")
> Cc: Peter Gonda <pgonda@...gle.com>
> Cc: Michael Roth <michael.roth@....com>
> Cc: Vishal Annapurve <vannapurve@...gle.com>
> Cc: Ackerly Tng <ackerleytng@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/include/asm/kvm_host.h | 2 ++
> include/linux/kvm_host.h | 7 +++++++
> virt/kvm/kvm_main.c | 5 ++---
> 3 files changed, 11 insertions(+), 3 deletions(-)
As discussed in one of the previous PUCK calls, this is causing Qemu to
throw an error when trying to enable debug-swap for a SEV-ES guest when
using a pflash drive for OVMF. Sample qemu invocation (*):
qemu-system-x86_64 ... \
-drive if=pflash,format=raw,unit=0,file=/path/to/OVMF_CODE.fd,readonly=on \
-drive if=pflash,format=raw,unit=1,file=/path/to/OVMF_VARS.fd \
-machine q35,confidential-guest-support=sev0 \
-object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1,debug-swap=on
This is expected since enabling debug-swap requires use of
KVM_SEV_INIT2, which implies a VM type of KVM_X86_SEV_ES_VM. However,
SEV-ES VMs that do not enable any VMSA SEV features (and are hence
KVM_X86_DEFAULT_VM type) are allowed to continue to launch though they
are also susceptible to this issue.
One of the suggestions in the call was to consider returning an error to
userspace instead. Is this close to what you had in mind:
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..19e27ed27e17 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -387,8 +387,10 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
* they can fix it by changing memory to shared, or they can
* provide a better error.
*/
- if (r == RET_PF_EMULATE && fault.is_private) {
- pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+ if (r == RET_PF_EMULATE && (fault.is_private ||
+ (!fault.map_writable && fault.write && vcpu->arch.guest_state_protected))) {
+ if (fault.is_private)
+ pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
return -EFAULT;
}
This seems to work though Qemu seems to think we are asking it to
convert the memory to shared (so we probably need to signal this error
some other way?):
qemu-system-x86_64: Convert non guest_memfd backed memory region (0xf0000 ,+ 0x1000) to shared
Thoughts?
Thanks,
Naveen
--
(*) This requires below patches for Qemu, unless using IGVM:
https://lore.kernel.org/qemu-devel/cover.1761648149.git.naveen@kernel.org/
Powered by blists - more mailing lists