Message-ID: <CAGtprH8+x5Z=tPz=NcrQM6Dor2AYBu3jiZdo+Lg4NqAk0pUJ3w@mail.gmail.com>
Date: Sat, 12 Jul 2025 10:38:01 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Michael Roth <michael.roth@....com>, Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org, rick.p.edgecombe@...el.com,
kai.huang@...el.com, adrian.hunter@...el.com, reinette.chatre@...el.com,
xiaoyao.li@...el.com, tony.lindgren@...el.com, binbin.wu@...ux.intel.com,
dmatlack@...gle.com, isaku.yamahata@...el.com, ira.weiny@...el.com,
david@...hat.com, ackerleytng@...gle.com, tabba@...gle.com,
chao.p.peng@...el.com
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()
On Fri, Jul 11, 2025 at 11:46 AM Vishal Annapurve <vannapurve@...gle.com> wrote:
>
> On Fri, Jul 11, 2025 at 8:40 AM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Fri, Jul 11, 2025, Michael Roth wrote:
> > > On Fri, Jul 11, 2025 at 12:36:24PM +0800, Yan Zhao wrote:
> > > > Besides, it can't address the 2nd AB-BA lock issue as mentioned in the patch
> > > > log:
> > > >
> > > > Problem
> > > > ===
> > > > ...
> > > > (2)
> > > > Moreover, in step 2, get_user_pages_fast() may acquire mm->mmap_lock,
> > > > resulting in the following lock sequence in tdx_vcpu_init_mem_region():
> > > > - filemap invalidation lock --> mm->mmap_lock
> > > >
> > > > However, in future code, the shared filemap invalidation lock will be held
> > > > in kvm_gmem_fault_shared() (see [6]), leading to the lock sequence:
> > > > - mm->mmap_lock --> filemap invalidation lock
> > >
> > > I wouldn't expect kvm_gmem_fault_shared() to trigger for the
> > > KVM_MEMSLOT_SUPPORTS_GMEM_SHARED case (or whatever we end up naming it).
> >
> > Irrespective of shared faults, I think the API could do with a bit of cleanup
> > now that TDX has landed, i.e. now that we can see a bit more of the picture.
> >
> > As is, I'm pretty sure TDX is broken with respect to hugepage support, because
> > kvm_gmem_populate() marks an entire folio as prepared, but TDX only ever deals
> > with one page at a time. So that needs to be changed. I assume it's already
> > addressed in one of the many upcoming series, but it still shows a flaw in the API.
> >
> > Hoisting the retrieval of the source page outside of filemap_invalidate_lock()
> > seems pretty straightforward, and would provide consistent ABI for all vendor
>
> Will relying on the standard KVM -> guest_memfd interaction, i.e.
> simulating a second-stage fault to get the right target address, work
> for all vendors (CCA/SNP/TDX)? If so, we might not have to maintain
> this out-of-band kvm_gmem_populate() path.
I think the only different scenario is SNP, where the host must write
the initial contents to guest memory.
Would the following work for all of CCA/SNP/TDX during initial memory
population from within KVM:
1) Simulate a stage-2 fault
2) Take the KVM MMU read lock
3) Check that the needed GPA is mapped in the EPT/NPT entries
4) For SNP, if src != NULL, make the target pfn shared, copy the
contents, and then convert the target pfn back to private
5) For TDX, if src != NULL, pass the same address for source and
target (likely this works for CCA too)
6) Invoke the appropriate memory-encryption operations
7) Measure the contents
8) Release the KVM MMU read lock
If this scheme works, then ideally we should also invoke the RMP table
population logic directly from the KVM NPT fault handling path rather
than from guest_memfd (a mostly cosmetic change), so that the only
outgoing interaction from guest_memfd to KVM is via invalidation
notifiers.
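
For clarity, the flow above could be sketched roughly as below. This is
only a hand-wavy sketch: helpers like kvm_tdp_map_page_sim(),
kvm_gpa_is_mapped(), vendor_copy_initial_contents() and
vendor_encrypt_and_measure() are placeholder names for the steps, not
actual KVM interfaces.

/*
 * Rough sketch of the proposed initial-population flow; all helper
 * names are hypothetical placeholders, not real KVM APIs.
 */
static int populate_initial_memory(struct kvm_vcpu *vcpu, gpa_t gpa,
				   void __user *src)
{
	kvm_pfn_t pfn;
	int ret;

	/* 1) Simulate a stage-2 fault so the GPA gets mapped. */
	ret = kvm_tdp_map_page_sim(vcpu, gpa, &pfn);
	if (ret)
		return ret;

	/* 2) Hold the MMU read lock across the population step. */
	read_lock(&vcpu->kvm->mmu_lock);

	/* 3) Recheck that the GPA is still mapped in EPT/NPT. */
	if (!kvm_gpa_is_mapped(vcpu, gpa)) {
		ret = -EAGAIN;
		goto out;
	}

	if (src) {
		/*
		 * 4) SNP: flip the pfn shared, copy, flip it back to
		 *    private.
		 * 5) TDX (and likely CCA): pass the same address as
		 *    both source and target.
		 */
		ret = vendor_copy_initial_contents(vcpu->kvm, pfn, src);
		if (ret)
			goto out;
	}

	/* 6) + 7) Encrypt and extend the launch measurement. */
	ret = vendor_encrypt_and_measure(vcpu->kvm, gpa, pfn);
out:
	/* 8) Drop the MMU read lock. */
	read_unlock(&vcpu->kvm->mmu_lock);
	return ret;
}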
>
> > flavors. E.g. as is, non-struct-page memory will work for SNP, but not TDX. The
> > obvious downside is that struct-page becomes a requirement for SNP, but that
> >
>
> Maybe you had more thought here?