Message-ID: <aXqQFo9S-UVMYfn-@google.com>
Date: Wed, 28 Jan 2026 14:39:18 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
x86@...nel.org, rick.p.edgecombe@...el.com, dave.hansen@...el.com,
kas@...nel.org, tabba@...gle.com, ackerleytng@...gle.com,
michael.roth@....com, david@...nel.org, vannapurve@...gle.com,
sagis@...gle.com, vbabka@...e.cz, thomas.lendacky@....com,
nik.borisov@...e.com, pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com,
francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com,
isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com,
binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 16/24] KVM: guest_memfd: Split for punch hole and
private-to-shared conversion
On Tue, Jan 06, 2026, Yan Zhao wrote:
> virt/kvm/guest_memfd.c | 67 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 67 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 03613b791728..8e7fbed57a20 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -486,6 +486,55 @@ static int merge_truncate_range(struct inode *inode, pgoff_t start,
> return ret;
> }
>
> +static int __kvm_gmem_split_private(struct gmem_file *f, pgoff_t start, pgoff_t end)
> +{
> + enum kvm_gfn_range_filter attr_filter = KVM_FILTER_PRIVATE;
> +
> + bool locked = false;
> + struct kvm_memory_slot *slot;
> + struct kvm *kvm = f->kvm;
> + unsigned long index;
> + int ret = 0;
> +
> + xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
> + pgoff_t pgoff = slot->gmem.pgoff;
> + struct kvm_gfn_range gfn_range = {
> + .start = slot->base_gfn + max(pgoff, start) - pgoff,
> + .end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
> + .slot = slot,
> + .may_block = true,
> + .attr_filter = attr_filter,
> + };
> +
> + if (!locked) {
> + KVM_MMU_LOCK(kvm);
> + locked = true;
> + }
> +
> + ret = kvm_split_cross_boundary_leafs(kvm, &gfn_range, false);
This bleeds TDX details all over guest_memfd. Presumably SNP needs a similar
callback to update the RMP, but SNP most definitely doesn't _need_ to split
hugepages that now have mixed attributes. In fact, SNP can probably do literally
nothing here and let kvm_gmem_zap() do the heavy lifting.

Sadly, an arch hook is "necessary", because otherwise we'll end up in dependency
hell. E.g. I _want_ to just let the TDP MMU do the splits during kvm_gmem_zap(),
but then an -ENOMEM when splitting would result in a partial conversion if more
than one KVM instance was bound to the gmem instance (ignoring that it's actually
"fine" for the TDX case, because only one S-EPT tree can have a valid mapping).
Even if we're willing to live with that assumption baked into the TDP MMU, we'd
still need to allow kvm_gmem_zap() to fail, e.g. because -ENOMEM isn't strictly
fatal. And I really, really don't want to set the precedent that "zap" operations
are allowed to fail.

But those details absolutely do not belong in guest_memfd.c. Provide an arch
hook to give x86 the opportunity to pre-split hugepages, but keep the details
in arch code.
static int __kvm_gmem_convert(struct gmem_file *f, pgoff_t start, pgoff_t end,
			      bool to_private)
{
	struct kvm_memory_slot *slot;
	unsigned long index;
	int r;

	xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
		r = kvm_arch_gmem_convert(f->kvm,
					  kvm_gmem_get_start_gfn(slot, start),
					  kvm_gmem_get_end_gfn(slot, end),
					  to_private);
		if (r)
			return r;
	}

	return 0;
}

static int kvm_gmem_convert(struct inode *inode, pgoff_t start, pgoff_t end,
			    bool to_private)
{
	struct gmem_file *f;
	int r;

	kvm_gmem_for_each_file(f, inode->i_mapping) {
		r = __kvm_gmem_convert(f, start, end, to_private);
		if (r)
			return r;
	}

	return 0;
}