lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <ZpbKqG_ZhCWxl-Fc@google.com>
Date: Tue, 16 Jul 2024 12:31:52 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: isaku.yamahata@...el.com
Cc: kvm@...r.kernel.org, isaku.yamahata@...il.com, 
	Paolo Bonzini <pbonzini@...hat.com>, rick.p.edgecombe@...el.com, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Allow per VM kvm_mmu_max_gfn()

On Mon, Jul 15, 2024, isaku.yamahata@...el.com wrote:
> From: Isaku Yamahata <isaku.yamahata@...el.com>
> 
> Prepare for TDX support by making kvm_mmu_max_gfn() configurable.  This
> preparation is also useful for non-TDX changes, improving correctness
> for the combination of 4-level EPT and MAXPA > 48.  The issue is
> analyzed at [1].
> 
> Like other confidential computing technologies, TDX has the concept of
> private and shared memory.  For TDX, the private and shared mappings of the
> same GFN are mapped at separate GPAs, with a configurable GPA bit selecting
> between private and shared aliases of the same GFN.  This means that
> operations working with the concept of a maximum GFN should only go up to
> this configurable GPA bit instead of the existing behavior based on
> shadow_phys_bits.  Other TDX changes will handle applying the operation
> across both GFN aliases.
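
As a hypothetical sketch of that aliasing (the bit position and helper names
below are made up for illustration, not taken from the series):

```c
#include <stdint.h>

/*
 * Illustrative only: with a VM-configurable GPA bit selecting between the
 * shared and private aliases, the two GPAs of one GFN differ in that bit
 * alone.  The bit position here is an example, not TDX's actual value.
 */
#define SHARED_BIT_POS 47
#define SHARED_BIT     (1ULL << SHARED_BIT_POS)
#define PAGE_SHIFT     12

static inline uint64_t private_gpa(uint64_t gpa) { return gpa & ~SHARED_BIT; }
static inline uint64_t shared_gpa(uint64_t gpa)  { return gpa | SHARED_BIT; }

/* "Max GFN" style operations should stop below the shared bit, since any
 * GFN at or above it names an alias rather than a distinct page. */
static inline uint64_t alias_aware_max_gfn(void)
{
	return (SHARED_BIT >> PAGE_SHIFT) - 1;
}
```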

This is going to be confusing.  A TDX VM will be all but guaranteed to generate
an EPT violation on a gfn > kvm->mmu_max_gfn.

> Using the existing kvm_mmu_max_gfn() based on shadow_phys_bits would cause
> functional problems for TDX, specifically because the function is used to
> restrict the range where memslots can be created.  For TDX, if a memslot is
> created at a GFN that includes the bit that selects between private/shared,
> it would confuse the logic that zaps both aliases.  It would end up zapping
> only the higher alias and missing the lower one.  In this case, freed pages
> could remain mapped in the TDX guest.

So why don't we simply disallow creating aliased memslots?

> Since this GPA bit is configurable per-VM, make kvm_mmu_max_gfn() per-VM by
> having it take a struct kvm, and keep the max GFN as a member on that
> struct.  Future TDX changes will set this member based on the configurable
> position of the private/shared bit.
> 
> Besides functional issues, it is generally more correct and easier to

No, it most definitely is not more correct.  There is absolutely no issue zapping
SPTEs that should never exist.  In fact, restricting the zapping path is far more
likely to *cause* correctness issues, e.g. see 

  524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs")
  86931ff7207b ("KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR")

Creating SPTEs is a different matter, but unless I'm missing something, the only
path that _needs_ to be updated is kvm_arch_prepare_memory_region(), to disallow
aliased memslots.

I assume TDX will strip the shared bit from fault->gfn, and shove it back in when
creating MMIO SPTEs in the shared EPT page tables.
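
Roughly this, as a sketch (hypothetical helper names and an example bit
position, not code from the series):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the strip/re-insert flow described above: the shared bit is
 * masked off the faulting GFN so the MMU works on the canonical GFN, then
 * shoved back in when building SPTEs for the shared EPT page tables.  The
 * bit position and helper names are illustrative. */
static const uint64_t shared_gfn_bit = 1ULL << 35;  /* example GFN-space bit */

static uint64_t strip_shared_bit(uint64_t gfn, bool *was_shared)
{
	*was_shared = !!(gfn & shared_gfn_bit);
	return gfn & ~shared_gfn_bit;
}

static uint64_t mmio_spte_gfn(uint64_t gfn, bool shared)
{
	return shared ? gfn | shared_gfn_bit : gfn;
}
```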

Why can't we simply do:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 842a3a4cdfe9..5ea428dde891 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4395,6 +4395,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
        struct kvm_memory_slot *slot = fault->slot;
        int ret;
 
+       if (WARN_ON_ONCE(kvm_is_gfn_alias(vcpu->kvm, fault->gfn)))
+               return -EFAULT;
+
        /*
         * Note that the mmu_invalidate_seq also serves to detect a concurrent
         * change in attributes.  is_page_fault_stale() will detect an
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 994743266480..091da7607025 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12979,6 +12979,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
                if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
                        return -EINVAL;
 
+               if (kvm_is_gfn_alias(kvm, new->base_gfn + new->npages - 1))
+                       return -EINVAL;
+
                return kvm_alloc_memslot_metadata(kvm, new);
        }
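
The diff above assumes a kvm_is_gfn_alias() helper that does not exist yet;
one plausible shape, using a stand-in struct and field rather than the real
KVM internals:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in sketch of the kvm_is_gfn_alias() helper the diff assumes: a gfn
 * is an alias if any bits at or above the VM's configured shared/private
 * bit are set.  The struct and field are illustrative, not the upstream
 * definitions. */
struct kvm_stub {
	uint64_t gfn_alias_mask;  /* shared bit and everything above it */
};

static bool kvm_is_gfn_alias(struct kvm_stub *kvm, uint64_t gfn)
{
	return !!(gfn & kvm->gfn_alias_mask);
}
```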
