Message-ID: <aHEwT4X0RcfZzHlt@google.com>
Date: Fri, 11 Jul 2025 08:39:59 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Michael Roth <michael.roth@....com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, rick.p.edgecombe@...el.com, kai.huang@...el.com, 
	adrian.hunter@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com, 
	tony.lindgren@...el.com, binbin.wu@...ux.intel.com, dmatlack@...gle.com, 
	isaku.yamahata@...el.com, ira.weiny@...el.com, vannapurve@...gle.com, 
	david@...hat.com, ackerleytng@...gle.com, tabba@...gle.com, 
	chao.p.peng@...el.com
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()

On Fri, Jul 11, 2025, Michael Roth wrote:
> On Fri, Jul 11, 2025 at 12:36:24PM +0800, Yan Zhao wrote:
> > Besides, it can't address the 2nd AB-BA lock issue as mentioned in the patch
> > log:
> > 
> > Problem
> > ===
> > ...
> > (2)
> > Moreover, in step 2, get_user_pages_fast() may acquire mm->mmap_lock,
> > resulting in the following lock sequence in tdx_vcpu_init_mem_region():
> > - filemap invalidation lock --> mm->mmap_lock
> > 
> > However, in future code, the shared filemap invalidation lock will be held
> > in kvm_gmem_fault_shared() (see [6]), leading to the lock sequence:
> > - mm->mmap_lock --> filemap invalidation lock
> 
> I wouldn't expect kvm_gmem_fault_shared() to trigger for the
> KVM_MEMSLOT_SUPPORTS_GMEM_SHARED case (or whatever we end up naming it).

Irrespective of shared faults, I think the API could do with a bit of cleanup
now that TDX has landed, i.e. now that we can see a bit more of the picture.

As is, I'm pretty sure TDX is broken with respect to hugepage support, because
kvm_gmem_populate() marks an entire folio as prepared, but TDX only ever deals
with one page at a time.  So that needs to be changed.  I assume it's already
addressed in one of the many upcoming series, but it still shows a flaw in the API.
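
To make the mismatch concrete: a 2MiB folio spans 512 4KiB pages, but the
populate path initializes one page per callback and then marks the whole
folio (simplified sketch of the pattern):

	ret = post_populate(kvm, gfn, pfn, src, opaque);	/* one 4KiB page */
	if (!ret)
		kvm_gmem_mark_prepared(folio);			/* entire folio */

With hugepage-backed guest_memfd, the remaining 511 pages would be flagged
as prepared without ever having been initialized.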

Hoisting the retrieval of the source page outside of filemap_invalidate_lock()
seems pretty straightforward, and would provide consistent ABI for all vendor
flavors.  E.g. as is, non-struct-page memory will work for SNP, but not TDX.  The
obvious downside is that struct-page becomes a requirement for SNP, but that seems
like a reasonable price to pay for a consistent ABI.
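
For reference, the inversion being dodged is the one described in the quoted
text above, roughly (illustrative sketch only, not real code):

	/* tdx_vcpu_init_mem_region() today: */
	filemap_invalidate_lock(mapping);		/* A */
	get_user_pages_fast(...);			/* may take mm->mmap_lock (B) */

	/* future kvm_gmem_fault_shared(): */
	/* mm->mmap_lock (B) already held by the fault path */
	filemap_invalidate_lock_shared(mapping);	/* A => AB-BA */

Doing get_user_pages() before taking the invalidation lock removes the A->B
leg entirely.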

The below could be tweaked to batch get_user_pages() into an array of pointers,
but given that both SNP and TDX can only operate on one 4KiB page at a time, and
that hugepage support doesn't yet exist, trying to super optimize the hugepage
case straightaway doesn't seem like a pressing concern.
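
(If batching ever does become worthwhile, it's roughly a matter of sizing the
GUP call, e.g. with a hypothetical batch size:

	struct page *src_pages[NR_BATCH];

	ret = get_user_pages((unsigned long)src, NR_BATCH, 0, src_pages);

and then feeding the pages to post_populate() one at a time.)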

static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
				struct file *file, gfn_t gfn, void __user *src,
				kvm_gmem_populate_cb post_populate, void *opaque)
{
	pgoff_t index = kvm_gmem_get_index(slot, gfn);
	struct page *src_page = NULL;
	bool is_prepared = false;
	struct folio *folio;
	int ret, max_order;
	kvm_pfn_t pfn;

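	/*
	 * Pin the source page (if any) before taking the filemap invalidation
	 * lock so that mm->mmap_lock is never acquired inside it.
	 */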
	if (src) {
		ret = get_user_pages((unsigned long)src, 1, 0, &src_page);
		if (ret < 0)
			return ret;
		if (ret != 1)
			return -ENOMEM;
	}

	filemap_invalidate_lock(file->f_mapping);

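	/* Only gfns that are currently marked private can be populated. */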
	if (!kvm_range_has_memory_attributes(kvm, gfn, gfn + 1,
					     KVM_MEMORY_ATTRIBUTE_PRIVATE,
					     KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
		ret = -EINVAL;
		goto out_unlock;
	}

	folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &is_prepared, &max_order);
	if (IS_ERR(folio)) {
		ret = PTR_ERR(folio);
		goto out_unlock;
	}

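	/* The folio lock isn't needed past this point; keep only the ref. */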
	folio_unlock(folio);

	if (is_prepared) {
		ret = -EEXIST;
		goto out_put_folio;
	}

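	/* Vendor hook: copy/encrypt the source into the private 4KiB page. */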
	ret = post_populate(kvm, gfn, pfn, src_page, opaque);
	if (!ret)
		kvm_gmem_mark_prepared(folio);

out_put_folio:
	folio_put(folio);
out_unlock:
	filemap_invalidate_unlock(file->f_mapping);

	if (src_page)
		put_page(src_page);
	return ret;
}

long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
		       kvm_gmem_populate_cb post_populate, void *opaque)
{
	struct file *file;
	struct kvm_memory_slot *slot;
	void __user *p;
	int ret = 0;
	long i;

	lockdep_assert_held(&kvm->slots_lock);
	if (npages < 0)
		return -EINVAL;

	slot = gfn_to_memslot(kvm, start_gfn);
	if (!kvm_slot_can_be_private(slot))
		return -EINVAL;

	file = kvm_gmem_get_file(slot);
	if (!file)
		return -EFAULT;

	npages = min_t(ulong, slot->npages - (start_gfn - slot->base_gfn), npages);
	for (i = 0; i < npages; i++) {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}

		p = src ? src + i * PAGE_SIZE : NULL;

		ret = __kvm_gmem_populate(kvm, slot, file, start_gfn + i, p,
					  post_populate, opaque);
		if (ret)
			break;
	}

	fput(file);
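	/* Return the number of pages processed, or the error if none were. */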
	return ret && !i ? ret : i;
}
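
FWIW, under this scheme a vendor callback receives the already-pinned source
page and operates strictly on one 4KiB page at a time.  A minimal sketch (all
names hypothetical, not an actual TDX/SNP implementation):

static int example_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
				 struct page *src_page, void *opaque)
{
	void *src = src_page ? kmap_local_page(src_page) : NULL;
	int ret;

	/* Stand-in for the vendor-specific encrypt+measure operation. */
	ret = example_copy_into_private(pfn, src);

	if (src)
		kunmap_local(src);
	return ret;
}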
