lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGtprH-VcO4VJkgRZJPS8SEpdO6Xsjmw3CeGYCFchjkdROoMLg@mail.gmail.com>
Date: Tue, 5 Aug 2025 07:30:17 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Ira Weiny <ira.weiny@...el.com>, Yan Zhao <yan.y.zhao@...el.com>, 
	Michael Roth <michael.roth@....com>, pbonzini@...hat.com, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, rick.p.edgecombe@...el.com, kai.huang@...el.com, 
	adrian.hunter@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com, 
	tony.lindgren@...el.com, binbin.wu@...ux.intel.com, dmatlack@...gle.com, 
	isaku.yamahata@...el.com, david@...hat.com, ackerleytng@...gle.com, 
	tabba@...gle.com, chao.p.peng@...el.com
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()

On Mon, Aug 4, 2025 at 6:20 PM Vishal Annapurve <vannapurve@...gle.com> wrote:
>
> On Mon, Aug 4, 2025 at 5:22 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > > > IIUC, the suggestion in the link is to abandon kvm_gmem_populate().
> > > > > > For TDX, it means adopting the approach in this RFC patch, right?
> > > > > Yes, IMO this RFC is following the right approach as posted.
> >
> > I don't think we want to abandon kvm_gmem_populate().  Unless I'm missing something,
> > SNP has the same AB-BA problem as TDX.  The copy_from_user() on @src can trigger
> > a page fault, and resolving the page fault may require taking mm->mmap_lock.
> >
> > Fundamentally, TDX and SNP are doing the same thing: copying from source to guest
> > memory.  The only differences are in the mechanics of the copy+encrypt, everything
> > else is the same.  I.e. I don't expect that we'll find a magic solution that works
> > well for one and not the other.
> >
> > I also don't want to end up with wildly different ABI for SNP vs. everything else.
> > E.g. cond_resched() needs to be called if the to-be-initialzied range is large,
> > which means dropping mmu_lock between pages, whereas kvm_gmem_populate() can
> > yield without dropping invalidate_lock, which means that the behavior of populating
> > guest_memfd memory will be quite different with respect to guest_memfd operations.
>
> I would think that TDX/CCA VMs [1] will run into the similar behavior
> of needing to simulate stage2 faults i.e. KVM will end up picking up
> and dropping mmu_lock for each page anyways at least for these two
> platforms.
>
> [1] https://lore.kernel.org/kvm/20250611104844.245235-5-steven.price@arm.com/
> (rmi_rtt_create())
>
> >
> > Pulling in the RFC text:
> >
> > : I think the only different scenario is SNP, where the host must write
> > : initial contents to guest memory.
> > :
> > : Will this work for all cases CCA/SNP/TDX during initial memory
> > : population from within KVM:
> > : 1) Simulate stage2 fault
> > : 2) Take a KVM mmu read lock
> >
> > Doing all of this under mmu_lock is pretty much a non-starter.

Looking closer at CPU <-> PSP communication which is not implemented
to work within an atomic context, I agree now that this wouldn't work
for SNP VMs.


> >
> > : 3) Check that the needed gpa is mapped in EPT/NPT entries
> >
> > No, KVM's page tables are not the source of truth.  S-EPT is a special snowflake,
> > and I'd like to avoid foisting the same requirements on NPT.
>
> I agree this would be a new requirement.
>
> >
> > : 4) For SNP, if src != null, make the target pfn to be shared, copy
> > : contents and then make the target pfn back to private.
> >
> > Copying from userspace under spinlock (rwlock) is illegal, as accessing userspace
> > memory might_fault() and thus might_sleep().
>
> I would think that a combination of get_user_pages() and
> kmap_local_pfn() will prevent this situation of might_fault().
>
> >
> > : 5) For TDX, if src != null, pass the same address for source and
> > : target (likely this works for CCA too)
> > : 6) Invoke appropriate memory encryption operations
> > : 7) measure contents
> > : 8) release the KVM mmu read lock
> > :
> > : If this scheme works, ideally we should also not call RMP table
> > : population logic from guest_memfd, but from KVM NPT fault handling
> > : logic directly (a bit of cosmetic change).
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ