linux-kernel - Re: [RFC PATCH v2 02/18] KVM: x86/mmu: Add dedicated API to map guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aLINIpCx7uBPtOUa@google.com>
Date: Fri, 29 Aug 2025 13:27:14 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: "pbonzini@...hat.com" <pbonzini@...hat.com>, Kai Huang <kai.huang@...el.com>, 
	"ackerleytng@...gle.com" <ackerleytng@...gle.com>, Vishal Annapurve <vannapurve@...gle.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Yan Y Zhao <yan.y.zhao@...el.com>, 
	Ira Weiny <ira.weiny@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"michael.roth@....com" <michael.roth@....com>
Subject: Re: [RFC PATCH v2 02/18] KVM: x86/mmu: Add dedicated API to map
 guest_memfd pfn into TDP MMU

On Fri, Aug 29, 2025, Rick P Edgecombe wrote:
> On Thu, 2025-08-28 at 17:06 -0700, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4994,6 +4994,65 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >  	return min(range->size, end - range->gpa);
> >  }
> >  
> > +int kvm_tdp_mmu_map_private_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> > +{
> > +	struct kvm_page_fault fault = {
> > +		.addr = gfn_to_gpa(gfn),
> > +		.error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS,
> > +		.prefetch = true,
> > +		.is_tdp = true,
> > +		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),
> 
> These fault's don't have fault->exec so nx_huge_page_workaround_enabled
> shouldn't be a factor. Not a functional issue though. Maybe it is more robust?

Whether or not the fault itself is EXEC is irrelevant, nx_huge_page_workaround_enabled
is used to ensure KVM doesn't create hugepage overtop an exiting EXEC 4KiB mapping.
Of course, this fault is irrelevant on that front as well.  But I don't see any
reason to get cute and let .nx_huge_page_workaround_enabled be stale.

> > +
> > +		.max_level = PG_LEVEL_4K,
> > +		.req_level = PG_LEVEL_4K,
> > +		.goal_level = PG_LEVEL_4K,
> > +		.is_private = true,
> > +
> > +		.gfn = gfn,
> > +		.slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn),
> > +		.pfn = pfn,
> > +		.map_writable = true,
> > +	};
> > +	struct kvm *kvm = vcpu->kvm;
> > +	int r;
> > +
> > +	lockdep_assert_held(&kvm->slots_lock);
> > +
> > +	if (KVM_BUG_ON(!tdp_mmu_enabled, kvm))
> > +		return -EIO;
> > +
> > +	if (kvm_gfn_is_write_tracked(kvm, fault.slot, fault.gfn))
> > +		return -EPERM;
> 
> If we care about this, why don't we care about the read only memslot flag?

Because private memory fundamentally can't support read-only memslots.  If we
wanted to be paranoid, this code could assert that the memslot can be private
but for me that reaches a pointless level of paranoia.

> TDX doesn't need this or the nx huge page part above. So this function is
> more general.

I don't see anything that makes nx_huge_page_workaround_enabled mutually exclusive
with TDX though.

> What about calling it __kvm_tdp_mmu_map_private_pfn() and making it a powerful
> "map this pfn at this GFN and don't ask questions" function. Otherwise, I'm not
> sure where to draw the line.

Eh, for me, the line is pretty clear.  This is obviously specific to private memory,
and so implies a guest_memfd source, a private pfn, and everything that comes
along with private gmem pfns.  Everything else should be accounted for.