Message-ID: <20260108213811.qhurtlw3gto5yhfy@amd.com>
Date: Thu, 8 Jan 2026 15:38:11 -0600
From: Michael Roth <michael.roth@....com>
To: "Huang, Kai" <kai.huang@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "david@...hat.com"
<david@...hat.com>, "liam.merwick@...cle.com" <liam.merwick@...cle.com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "aik@....com" <aik@....com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "Annapurve, Vishal"
<vannapurve@...gle.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "vbabka@...e.cz" <vbabka@...e.cz>,
"ashish.kalra@....com" <ashish.kalra@....com>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "Weiny, Ira" <ira.weiny@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>
Subject: Re: [PATCH v2 2/5] KVM: guest_memfd: Remove preparation tracking
On Thu, Dec 18, 2025 at 10:53:14PM +0000, Huang, Kai wrote:
>
> > /*
> > * Process @folio, which contains @gfn, so that the guest can use it.
> > * The folio must be locked and the gfn must be contained in @slot.
> > @@ -90,13 +85,7 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
> > static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
> > gfn_t gfn, struct folio *folio)
> > {
> > - unsigned long nr_pages, i;
> > pgoff_t index;
> > - int r;
> > -
> > - nr_pages = folio_nr_pages(folio);
> > - for (i = 0; i < nr_pages; i++)
> > - clear_highpage(folio_page(folio, i));
> >
>
> Here the entire folio is cleared, but ...
>
> [...]
>
> > - folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &is_prepared, max_order);
> > + folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
> > if (IS_ERR(folio))
> > return PTR_ERR(folio);
> >
> > - if (!is_prepared)
> > - r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
> > + if (!folio_test_uptodate(folio)) {
> > + clear_highpage(folio_page(folio, 0));
> > + folio_mark_uptodate(folio);
> > + }
>
> ... here only the first page is cleared.
>
> I understand there's no huge folio coming out of gmem now, but since
> both __kvm_gmem_get_pfn() and kvm_gmem_get_pfn() still have @max_order
> as an output, it's kinda inconsistent here.
>
> I also see kvm_gmem_fault_user_mapping() only clears the first page
> too, but I think that already has the assumption that the folio can
> never be huge currently?
>
> Given this, and the fact that the first patch of this series has
> introduced
>
> WARN_ON_ONCE(folio_order(folio));
>
> in kvm_gmem_get_folio(), I think it's fine to only clear the first page,
> but for the sake of consistency, perhaps we should just remove @max_order
> from __kvm_gmem_get_pfn() and kvm_gmem_get_pfn()?
>
> Then we can handle huge folio logic when that comes to play.
Sean had mentioned during the PUCK call prior to this that he was okay
with stripping out traces of hugepage support from the
kvm_gmem_populate() path, since it brings unnecessary complexity for a
use-case we may never support. We will, however, eventually support
hugepages outside the kvm_gmem_populate() path, and the bits and pieces
of the API that plumb those details into the KVM MMU code are more
useful to keep around, since the existing hugepage support in the KVM
MMU makes it clearer where/when we'll need them. So I'm not sure we
gain much from the churn of stripping it out. However, if those
interfaces prove insufficient as part of wiring up hugepage support, I
wouldn't be opposed to similarly adding pre-patches to strip them out
for a cleaner base implementation. I don't think that needs to be part
of this series though, since it is focused more on the population path
than on fault handling at run-time, so I've left things as-is for v3
for now.
>
> Btw:
>
> I actually looked into the RFC v1 discussion, and the code there does
> a loop to clear all pages in the folio. There were some other
The thinking there was to try not to actively break the
hugepage-related bits that were already in place, but since we decided
to implement hugepage support for kvm_gmem_populate() as a clean
follow-up, there's no need to consider the hugepage case and loop over
the folio's pages here.
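
For reference, a minimal sketch of what that loop would look like if
huge folios ever did show up in this path (essentially the RFC v1 loop
combined with the uptodate check from this patch; illustrative only):

        /*
         * Sketch only: clear every page in the folio before marking it
         * uptodate, rather than just the head page. Not needed while
         * kvm_gmem_get_folio() WARN_ON_ONCE()s on huge folios.
         */
        if (!folio_test_uptodate(folio)) {
                unsigned long nr_pages = folio_nr_pages(folio);
                unsigned long i;

                for (i = 0; i < nr_pages; i++)
                        clear_highpage(folio_page(folio, i));
                folio_mark_uptodate(folio);
        }
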
> discussions, but AFAICT they were more related to issues regarding
> "mark entire folio as uptodate while only one page is processed in
> post_populate()".
>
> Btw2:
>
> There was also discussion that clearing the page isn't required for
> TDX. To that end, maybe we can remove the page clearing from gmem
> common code and move it to SEV code, e.g., as part of "folio
> preparation"?
I think we are considering this approach for TDX, and there was some
discussion of having gmem-internal flags to select this type of
handling, but I think that would make more sense as part of the
TDX-specific enablement of in-place conversion. I'm planning to post
the SNP-specific in-place conversion enablement patches on top of this
series, so maybe we can consider this in response to that series, or
as part of the TDX-specific enablement.
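
As a very rough illustration of the gmem-internal flag idea (the
kvm_arch_gmem_skip_clear() helper below is purely hypothetical and
does not exist upstream; it just stands in for whatever flag or hook
ends up being used):

        /*
         * Hypothetical sketch only: let an arch (e.g. TDX) opt out of
         * the common-code page clearing, leaving it to arch-specific
         * folio preparation instead.
         */
        if (!folio_test_uptodate(folio)) {
                if (!kvm_arch_gmem_skip_clear(kvm))
                        clear_highpage(folio_page(folio, 0));
                folio_mark_uptodate(folio);
        }
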
-Mike