lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fcqjl5a7m27f2mfpblnhgmozbipdjmvpdyk3m5hhzwcenp4cpg@m2ooa7ykrcvs>
Date: Wed, 3 Dec 2025 18:34:29 +0530
From: Naveen N Rao <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Peter Gonda <pgonda@...gle.com>, 
	Michael Roth <michael.roth@....com>, Vishal Annapurve <vannapurve@...gle.com>, 
	Ackerly Tng <ackerleytng@...gle.com>, Nikunj A Dadhania <nikunj@....com>, 
	Tom Lendacky <thomas.lendacky@....com>
Subject: Re: [PATCH 01/22] KVM: x86: Disallow read-only memslots for SEV-ES
 and SEV-SNP (and TDX)

Hi Sean,

On Fri, Aug 09, 2024 at 12:02:58PM -0700, Sean Christopherson wrote:
> Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
> directly emulate instructions for ES/SNP, and instead the guest must
> explicitly request emulation.  Unless the guest explicitly requests
> emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
> SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
> 
> But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
> because except for ES/SNP, doing so requires setting reserved bits in the
> SPTE, i.e. the SPTE can't be readable while also generating a #VC on
> writes.  Because KVM never creates MMIO SPTEs and jumps directly to
> emulation, the guest never gets a #VC.  And since KVM simply resumes the
> guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
> into an infinite #NPF loop if the vCPU attempts to write read-only memory.
> 
> Disallow read-only memory for all VMs with protected state, i.e. for
> upcoming TDX VMs as well as ES/SNP VMs.  For TDX, it's actually possible
> to support read-only memory, as TDX uses EPT Violation #VE to reflect the
> fault into the guest, e.g. KVM could configure read-only SPTEs with RX
> protections and SUPPRESS_VE=0.  But there is no strong use case for
> supporting read-only memslots on TDX, e.g. the main historical usage is
> to emulate option ROMs, but TDX disallows executing from shared memory.
> And if someone comes along with a legitimate, strong use case, the
> restriction can always be lifted for TDX.
> 
> Don't bother trying to retroactively apply the restriction to SEV-ES
> VMs that are created as type KVM_X86_DEFAULT_VM.  Read-only memslots can't
> possibly work for SEV-ES, i.e. disallowing such memslots is really just
> means reporting an error to userspace instead of silently hanging vCPUs.
> Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
> isn't worth the marginal benefit it would provide userspace.
> 
> Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES")
> Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support")
> Cc: Peter Gonda <pgonda@...gle.com>
> Cc: Michael Roth <michael.roth@....com>
> Cc: Vishal Annapurve <vannapurve@...gle.com>
> Cc: Ackerly Tng <ackerleytng@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 2 ++
>  include/linux/kvm_host.h        | 7 +++++++
>  virt/kvm/kvm_main.c             | 5 ++---
>  3 files changed, 11 insertions(+), 3 deletions(-)

As discussed in one of the previous PUCK calls, this is causing Qemu to 
throw an error when trying to enable debug-swap for a SEV-ES guest when 
using a pflash drive for OVMF. Sample qemu invocation (*):
  qemu-system-x86_64 ... \
    -drive if=pflash,format=raw,unit=0,file=/path/to/OVMF_CODE.fd,readonly=on \
    -drive if=pflash,format=raw,unit=1,file=/path/to/OVMF_VARS.fd \
    -machine q35,confidential-guest-support=sev0 \
    -object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1,debug-swap=on

This is expected since enabling debug-swap requires use of 
KVM_SEV_INIT2, which implies a VM type of KVM_X86_SEV_ES_VM. However, 
SEV-ES VMs that do not enable any VMSA SEV features (and are hence 
KVM_X86_DEFAULT_VM type) are allowed to continue to launch though they 
are also susceptible to this issue.

One of the suggestions in the call was to consider returning an error to 
userspace instead. Is this close to what you had in mind:

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..19e27ed27e17 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -387,8 +387,10 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
         * they can fix it by changing memory to shared, or they can
         * provide a better error.
         */
-       if (r == RET_PF_EMULATE && fault.is_private) {
-               pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+       if (r == RET_PF_EMULATE && (fault.is_private ||
+           (!fault.map_writable && fault.write && vcpu->arch.guest_state_protected))) {
+               if (fault.is_private)
+                       pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
                kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
                return -EFAULT;
        }

This seems to work though Qemu seems to think we are asking it to 
convert the memory to shared (so we probably need to signal this error 
some other way?):
  qemu-system-x86_64: Convert non guest_memfd backed memory region (0xf0000 ,+ 0x1000) to shared

Thoughts?


Thanks,
Naveen

--
(*) This requires below patches for Qemu, unless using IGVM:
https://lore.kernel.org/qemu-devel/cover.1761648149.git.naveen@kernel.org/


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ