[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aLINIpCx7uBPtOUa@google.com>
Date: Fri, 29 Aug 2025 13:27:14 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: "pbonzini@...hat.com" <pbonzini@...hat.com>, Kai Huang <kai.huang@...el.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, Vishal Annapurve <vannapurve@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Yan Y Zhao <yan.y.zhao@...el.com>,
Ira Weiny <ira.weiny@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"michael.roth@....com" <michael.roth@....com>
Subject: Re: [RFC PATCH v2 02/18] KVM: x86/mmu: Add dedicated API to map
guest_memfd pfn into TDP MMU
On Fri, Aug 29, 2025, Rick P Edgecombe wrote:
> On Thu, 2025-08-28 at 17:06 -0700, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4994,6 +4994,65 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> > return min(range->size, end - range->gpa);
> > }
> >
> > +int kvm_tdp_mmu_map_private_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> > +{
> > + struct kvm_page_fault fault = {
> > + .addr = gfn_to_gpa(gfn),
> > + .error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS,
> > + .prefetch = true,
> > + .is_tdp = true,
> > + .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),
>
> These fault's don't have fault->exec so nx_huge_page_workaround_enabled
> shouldn't be a factor. Not a functional issue though. Maybe it is more robust?
Whether or not the fault itself is EXEC is irrelevant, nx_huge_page_workaround_enabled
is used to ensure KVM doesn't create hugepage overtop an exiting EXEC 4KiB mapping.
Of course, this fault is irrelevant on that front as well. But I don't see any
reason to get cute and let .nx_huge_page_workaround_enabled be stale.
> > +
> > + .max_level = PG_LEVEL_4K,
> > + .req_level = PG_LEVEL_4K,
> > + .goal_level = PG_LEVEL_4K,
> > + .is_private = true,
> > +
> > + .gfn = gfn,
> > + .slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn),
> > + .pfn = pfn,
> > + .map_writable = true,
> > + };
> > + struct kvm *kvm = vcpu->kvm;
> > + int r;
> > +
> > + lockdep_assert_held(&kvm->slots_lock);
> > +
> > + if (KVM_BUG_ON(!tdp_mmu_enabled, kvm))
> > + return -EIO;
> > +
> > + if (kvm_gfn_is_write_tracked(kvm, fault.slot, fault.gfn))
> > + return -EPERM;
>
> If we care about this, why don't we care about the read only memslot flag?
Because private memory fundamentally can't support read-only memslots. If we
wanted to be paranoid, this code could assert that the memslot can be private
but for me that reaches a pointless level of paranoia.
> TDX doesn't need this or the nx huge page part above. So this function is
> more general.
I don't see anything that makes nx_huge_page_workaround_enabled mutually exclusive
with TDX though.
> What about calling it __kvm_tdp_mmu_map_private_pfn() and making it a powerful
> "map this pfn at this GFN and don't ask questions" function. Otherwise, I'm not
> sure where to draw the line.
Eh, for me, the line is pretty clear. This is obviously specific to private memory,
and so implies a guest_memfd source, a private pfn, and everything that comes
along with private gmem pfns. Everything else should be accounted for.
Powered by blists - more mailing lists