Message-ID: <aN_jGnu2KQnxDniD@google.com>
Date: Fri, 3 Oct 2025 07:52:10 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Yan Zhao <yan.y.zhao@...el.com>, Fuad Tabba <tabba@...gle.com>,
Binbin Wu <binbin.wu@...ux.intel.com>, Michael Roth <michael.roth@....com>,
Ira Weiny <ira.weiny@...el.com>, Rick P Edgecombe <rick.p.edgecombe@...el.com>,
Vishal Annapurve <vannapurve@...gle.com>, David Hildenbrand <david@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v2 32/51] KVM: guest_memfd: Support guestmem_hugetlb
as custom allocator
Trimmed the Cc to KVM/guest_memfd folks.
On Wed, May 14, 2025, Ackerley Tng wrote:
> @@ -22,6 +25,10 @@ struct kvm_gmem_inode_private {
> #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> struct maple_tree shareability;
> #endif
> +#ifdef CONFIG_KVM_GMEM_HUGETLB
> + const struct guestmem_allocator_operations *allocator_ops;
> + void *allocator_private;
> +#endif
This is beyond silly. There is _one_ "custom" allocator, and no evidence that
the "generic" logic written for custom allocators would actually be correct for
anything other than a hugetlb allocator.
And to my point about guestmem_hugetlb.c not actually needing access to "private"
mm/ state, I would much rather add e.g. virt/kvm/guest_memfd_hugetlb.{c,h}, and
in the header define:
struct gmem_hugetlb_inode {
struct hstate *h;
struct hugepage_subpool *spool;
struct hugetlb_cgroup *h_cg_rsvd;
};
and then in guest_memfd.c have
struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;
u64 flags;
struct maple_tree attributes;
#ifdef CONFIG_KVM_GUEST_MEMFD_HUGETLB
struct gmem_hugetlb_inode hugetlb;
#endif
};
or maybe even better, avoid even that "jump" and define "struct gmem_inode" in a
new virt/kvm/guest_memfd.h:
struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;
u64 flags;
struct maple_tree attributes;
#ifdef CONFIG_KVM_GUEST_MEMFD_HUGETLB
struct hstate *h;
struct hugepage_subpool *spool;
struct hugetlb_cgroup *h_cg_rsvd;
#endif
};
The setup code can then be:
#ifdef CONFIG_KVM_GUEST_MEMFD_HUGETLB
if (flags & GUEST_MEMFD_FLAG_HUGETLB) {
		err = kvm_gmem_init_hugetlb(inode, size, huge_page_size_log2);
if (err)
goto out;
}
#endif
Actually, if we're at all clever, the #ifdefs can go away completely so long as
kvm_gmem_init_hugetlb() is unconditionally _declared_, because we rely on dead
code elimination to drop the call before linking.
> };
> +/**
> + * kvm_gmem_truncate_indices() - Truncates all folios beginning @index for
> + * @nr_pages.
> + *
> + * @mapping: filemap to truncate pages from.
> + * @index: the index in the filemap to begin truncation.
> + * @nr_pages: number of PAGE_SIZE pages to truncate.
> + *
> + * Return: the number of PAGE_SIZE pages that were actually truncated.
> + */
Do not add kerneldoc comments for internal helpers. They inevitably become stale
and are a source of friction for developers. The _only_ non-obvious thing here
is the return value.
> +static long kvm_gmem_truncate_indices(struct address_space *mapping,
> + pgoff_t index, size_t nr_pages)
> +{
> + struct folio_batch fbatch;
> + long truncated;
> + pgoff_t last;
> +
> + last = index + nr_pages - 1;
> +
> + truncated = 0;
> + folio_batch_init(&fbatch);
> + while (filemap_get_folios(mapping, &index, last, &fbatch)) {
> + unsigned int i;
> +
> + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> + struct folio *f = fbatch.folios[i];
> +
> + truncated += folio_nr_pages(f);
> + folio_lock(f);
> + truncate_inode_folio(f->mapping, f);
> + folio_unlock(f);
> + }
> +
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +
> + return truncated;
> +}