Message-ID: <aHEwT4X0RcfZzHlt@google.com>
Date: Fri, 11 Jul 2025 08:39:59 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Michael Roth <michael.roth@....com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, rick.p.edgecombe@...el.com, kai.huang@...el.com,
adrian.hunter@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com,
tony.lindgren@...el.com, binbin.wu@...ux.intel.com, dmatlack@...gle.com,
isaku.yamahata@...el.com, ira.weiny@...el.com, vannapurve@...gle.com,
david@...hat.com, ackerleytng@...gle.com, tabba@...gle.com,
chao.p.peng@...el.com
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()
On Fri, Jul 11, 2025, Michael Roth wrote:
> On Fri, Jul 11, 2025 at 12:36:24PM +0800, Yan Zhao wrote:
> > Besides, it can't address the 2nd AB-BA lock issue as mentioned in the patch
> > log:
> >
> > Problem
> > ===
> > ...
> > (2)
> > Moreover, in step 2, get_user_pages_fast() may acquire mm->mmap_lock,
> > resulting in the following lock sequence in tdx_vcpu_init_mem_region():
> > - filemap invalidation lock --> mm->mmap_lock
> >
> > However, in future code, the shared filemap invalidation lock will be held
> > in kvm_gmem_fault_shared() (see [6]), leading to the lock sequence:
> > - mm->mmap_lock --> filemap invalidation lock
>
> I wouldn't expect kvm_gmem_fault_shared() to trigger for the
> KVM_MEMSLOT_SUPPORTS_GMEM_SHARED case (or whatever we end up naming it).

Irrespective of shared faults, I think the API could do with a bit of cleanup
now that TDX has landed, i.e. now that we can see a bit more of the picture.

As is, I'm pretty sure TDX is broken with respect to hugepage support, because
kvm_gmem_populate() marks an entire folio as prepared, but TDX only ever deals
with one page at a time. So that needs to be changed. I assume it's already
addressed in one of the many upcoming series, but it still shows a flaw in the API.

Hoisting the retrieval of the source page outside of filemap_invalidate_lock()
seems pretty straightforward, and would provide consistent ABI for all vendor
flavors. E.g. as is, non-struct-page memory will work for SNP, but not TDX. The
obvious downside is that struct-page becomes a requirement for SNP, but that

The below could be tweaked to batch get_user_pages() into an array of pointers,
but given that both SNP and TDX can only operate on one 4KiB page at a time, and
that hugepage support doesn't yet exist, trying to super optimize the hugepage
case straightaway doesn't seem like a pressing concern.

static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
				struct file *file, gfn_t gfn, void __user *src,
				kvm_gmem_populate_cb post_populate, void *opaque)
{
	pgoff_t index = kvm_gmem_get_index(slot, gfn);
	struct page *src_page = NULL;
	bool is_prepared = false;
	struct folio *folio;
	int ret, max_order;
	kvm_pfn_t pfn;

	if (src) {
		ret = get_user_pages((unsigned long)src, 1, 0, &src_page);
		if (ret < 0)
			return ret;
		if (ret != 1)
			return -ENOMEM;
	}

	filemap_invalidate_lock(file->f_mapping);

	if (!kvm_range_has_memory_attributes(kvm, gfn, gfn + 1,
					     KVM_MEMORY_ATTRIBUTE_PRIVATE,
					     KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
		ret = -EINVAL;
		goto out_unlock;
	}

	folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &is_prepared, &max_order);
	if (IS_ERR(folio)) {
		ret = PTR_ERR(folio);
		goto out_unlock;
	}

	folio_unlock(folio);

	if (is_prepared) {
		ret = -EEXIST;
		goto out_put_folio;
	}

	ret = post_populate(kvm, gfn, pfn, src_page, opaque);
	if (!ret)
		kvm_gmem_mark_prepared(folio);

out_put_folio:
	folio_put(folio);
out_unlock:
	filemap_invalidate_unlock(file->f_mapping);

	if (src_page)
		put_page(src_page);
	return ret;
}
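
For anyone who wants to poke at the ordering argument outside the kernel, here is a
rough userspace model of it. It is only a sketch: the two "locks" are plain flags, and
every model_* name is made up for illustration and is not a kernel API. It counts how
often the modeled mmap_lock is taken while the modeled invalidation lock is held, which
is the nesting that can pair with the reverse ordering to form an AB-BA cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two locks; single-threaded model only. */
static bool model_invalidate_held;	/* models filemap_invalidate_lock() */
static int model_nested_mmap_locks;	/* mmap_lock taken under invalidate lock */

static void model_invalidate_lock(void)
{
	assert(!model_invalidate_held);
	model_invalidate_held = true;
}

static void model_invalidate_unlock(void)
{
	assert(model_invalidate_held);
	model_invalidate_held = false;
}

/* Models pinning the source page, which may need to take mmap_lock. */
static void model_pin_source_page(void)
{
	if (model_invalidate_held)
		model_nested_mmap_locks++;
}

/* Ordering described as problematic: pin the page under the invalidate lock. */
int model_populate_pin_inside(void)
{
	model_nested_mmap_locks = 0;
	model_invalidate_lock();
	model_pin_source_page();
	model_invalidate_unlock();
	return model_nested_mmap_locks;
}

/* Hoisted ordering: pin the page first, then take the invalidate lock. */
int model_populate_pin_hoisted(void)
{
	model_nested_mmap_locks = 0;
	model_pin_source_page();
	model_invalidate_lock();
	/* the post_populate() step would consume the pinned page here */
	model_invalidate_unlock();
	return model_nested_mmap_locks;
}
```

In this model the hoisted variant never records a nested acquisition, while the
pin-inside variant records one per page. It says nothing about real lockdep behavior.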

long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
		       kvm_gmem_populate_cb post_populate, void *opaque)
{
	struct file *file;
	struct kvm_memory_slot *slot;
	void __user *p;
	int ret = 0;
	long i;

	lockdep_assert_held(&kvm->slots_lock);
	if (npages < 0)
		return -EINVAL;

	slot = gfn_to_memslot(kvm, start_gfn);
	if (!kvm_slot_can_be_private(slot))
		return -EINVAL;

	file = kvm_gmem_get_file(slot);
	if (!file)
		return -EFAULT;

	npages = min_t(ulong, slot->npages - (start_gfn - slot->base_gfn), npages);
	for (i = 0; i < npages; i++) {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}

		p = src ? src + i * PAGE_SIZE : NULL;

		ret = __kvm_gmem_populate(kvm, slot, file, start_gfn + i, p,
					  post_populate, opaque);
		if (ret)
			break;
	}

	fput(file);
	return ret && !i ? ret : i;
}
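
The return convention of the loop above (report the error only if no page was
processed, otherwise report how many pages were done) can be tried in isolation with
a small userspace sketch. The model_populate() and model_fail_at() names and the
callback type are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-page callback: returns 0 on success, -errno on failure. */
typedef int (*model_page_fn)(long index, void *opaque);

long model_populate(long npages, model_page_fn fn, void *opaque)
{
	int ret = 0;
	long i;

	if (npages < 0)
		return -EINVAL;

	for (i = 0; i < npages; i++) {
		ret = fn(i, opaque);
		if (ret)
			break;
	}

	/* Error only if nothing was processed, else the number of pages done. */
	return ret && !i ? ret : i;
}

/* Example callback: fails with -EEXIST from index *(long *)opaque onward. */
int model_fail_at(long index, void *opaque)
{
	long fail_index = *(long *)opaque;

	return index >= fail_index ? -EEXIST : 0;
}
```

With a callback that fails at index 2, a request for 4 pages returns 2, so the
caller can resume from the first unprocessed page. A failure at index 0 returns the
error code itself.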