linux-kernel - Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3fa97619ca852854408eec8bb2f7a0ed635b3c1d.camel@intel.com>
Date: Tue, 28 May 2024 18:29:59 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "pbonzini@...hat.com" <pbonzini@...hat.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
	"sagis@...gle.com" <sagis@...gle.com>, "isaku.yamahata@...ux.intel.com"
	<isaku.yamahata@...ux.intel.com>, "Aktas, Erdem" <erdemaktas@...gle.com>,
	"Zhao, Yan Y" <yan.y.zhao@...el.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "dmatlack@...gle.com" <dmatlack@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>
Subject: Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for
 TDP MMU

On Tue, 2024-05-28 at 19:16 +0200, Paolo Bonzini wrote:
> > > After this, gfn_t's never have shared bit. It's a simple rule. The MMU
> > > mostly
> > > thinks it's operating on a shared root that is mapped at the normal GFN.
> > > Only
> > > the iterator knows that the shared PTEs are actually in a different
> > > location.
> > > 
> > > There are some negative side effects:
> > > 1. The struct kvm_mmu_page's gfn doesn't match it's actual mapping
> > > anymore.
> > > 2. As a result of above, the code that flushes TLBs for a specific GFN
> > > will be
> > > confused. It won't functionally matter for TDX, just look buggy to see
> > > flushing
> > > code called with the wrong gfn.
> > 
> > flush_remote_tlbs_range() is only for Hyper-V optimization.  In other cases,
> > x86_op.flush_remote_tlbs_range = NULL or the member isn't defined at compile
> > time.  So the remote tlb flush falls back to flushing whole range.  I don't
> > expect TDX in hyper-V guest.  I have to admit that the code looks
> > superficially
> > broken and confusing.
> 
> You could add an "&& kvm_has_private_root(kvm)" to
> kvm_available_flush_remote_tlbs_range(), since
> kvm_has_private_root(kvm) is sort of equivalent to "there is no 1:1
> correspondence between gfn and PTE to be flushed".
> 
> I am conflicted myself, but the upsides below are pretty substantial.

It looks like kvm_available_flush_remote_tlbs_range() is not checked in many of
the paths that get to x86_ops.flush_remote_tlbs_range().

So maybe something like:
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 65bbda95acbb..e09bb6c50a0b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1959,14 +1959,7 @@ static inline int kvm_arch_flush_remote_tlbs(struct kvm
*kvm)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
-static inline int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn,
-                                                  u64 nr_pages)
-{
-       if (!kvm_x86_ops.flush_remote_tlbs_range)
-               return -EOPNOTSUPP;
-
-       return static_call(kvm_x86_flush_remote_tlbs_range)(kvm, gfn, nr_pages);
-}
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages);
 #endif /* CONFIG_HYPERV */
 
 enum kvm_intr_type {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43d70f4c433d..9dc1b3db286d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -14048,6 +14048,14 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu,
unsigned int size,
 }
 EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages)
+{
+       if (!kvm_x86_ops.flush_remote_tlbs_range || kvm_gfn_direct_mask(kvm))
+               return -EOPNOTSUPP;
+
+       return static_call(kvm_x86_flush_remote_tlbs_range)(kvm, gfn, nr_pages);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);


Regarding the kvm_gfn_direct_mask() usage, in the current WIP code we have
renamed things around the concepts of "mirrored roots" and "direct masks". The
mirrored root, just means "also go off an update something else" (S-EPT). The
direct mask, just means when on the direct root, shift the actual page table
mapping using the mask (shared memory). Kai raised that all TDX special stuff in
the x86 MMU around handling private memory is confusing from the SEV
perspective, so we were trying to rename those things to something related, but
generic instead of "private".

So the TLB flush confusion is more about that the direct GFNs are shifted by
something (i.e. kvm_gfn_direct_mask() returns non-zero).