Message-ID: <aXqQFo9S-UVMYfn-@google.com>
Date: Wed, 28 Jan 2026 14:39:18 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
x86@...nel.org, rick.p.edgecombe@...el.com, dave.hansen@...el.com,
kas@...nel.org, tabba@...gle.com, ackerleytng@...gle.com,
michael.roth@....com, david@...nel.org, vannapurve@...gle.com,
sagis@...gle.com, vbabka@...e.cz, thomas.lendacky@....com,
nik.borisov@...e.com, pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com,
francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com,
isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com,
binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 16/24] KVM: guest_memfd: Split for punch hole and
private-to-shared conversion
On Tue, Jan 06, 2026, Yan Zhao wrote:
> virt/kvm/guest_memfd.c | 67 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 67 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 03613b791728..8e7fbed57a20 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -486,6 +486,55 @@ static int merge_truncate_range(struct inode *inode, pgoff_t start,
> return ret;
> }
>
> +static int __kvm_gmem_split_private(struct gmem_file *f, pgoff_t start, pgoff_t end)
> +{
> + enum kvm_gfn_range_filter attr_filter = KVM_FILTER_PRIVATE;
> +
> + bool locked = false;
> + struct kvm_memory_slot *slot;
> + struct kvm *kvm = f->kvm;
> + unsigned long index;
> + int ret = 0;
> +
> + xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
> + pgoff_t pgoff = slot->gmem.pgoff;
> + struct kvm_gfn_range gfn_range = {
> + .start = slot->base_gfn + max(pgoff, start) - pgoff,
> + .end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
> + .slot = slot,
> + .may_block = true,
> + .attr_filter = attr_filter,
> + };
> +
> + if (!locked) {
> + KVM_MMU_LOCK(kvm);
> + locked = true;
> + }
> +
> + ret = kvm_split_cross_boundary_leafs(kvm, &gfn_range, false);
This bleeds TDX details all over guest_memfd. Presumably SNP needs a similar
callback to update the RMP, but SNP most definitely doesn't _need_ to split
hugepages that now have mixed attributes. In fact, SNP can probably do literally
nothing here and let kvm_gmem_zap() do the heavy lifting.

Sadly, an arch hook is "necessary", because otherwise we'll end up in dependency
hell. E.g. I _want_ to just let the TDP MMU do the splits during kvm_gmem_zap(),
but then an -ENOMEM when splitting would result in a partial conversion if more
than one KVM instance was bound to the gmem instance (ignoring that it's actually
"fine" for the TDX case, because only one S-EPT tree can have a valid mapping).
Even if we're willing to live with that assumption baked into the TDP MMU, we'd
still need to allow kvm_gmem_zap() to fail, e.g. because -ENOMEM isn't strictly
fatal. And I really, really don't want to set the precedent that "zap" operations
are allowed to fail.

But those details absolutely do not belong in guest_memfd.c. Provide an arch
hook to give x86 the opportunity to pre-split hugepages, but keep the details
in arch code.
static int __kvm_gmem_convert(struct gmem_file *f, pgoff_t start, pgoff_t end,
			      bool to_private)
{
	struct kvm_memory_slot *slot;
	unsigned long index;
	int r;

	xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
		r = kvm_arch_gmem_convert(f->kvm,
					  kvm_gmem_get_start_gfn(slot, start),
					  kvm_gmem_get_end_gfn(slot, end),
					  to_private);
		if (r)
			return r;
	}

	return 0;
}

static int kvm_gmem_convert(struct inode *inode, pgoff_t start, pgoff_t end,
			    bool to_private)
{
	struct gmem_file *f;
	int r;

	kvm_gmem_for_each_file(f, inode->i_mapping) {
		r = __kvm_gmem_convert(f, start, end, to_private);
		if (r)
			return r;
	}

	return 0;
}