Message-ID: <aN_0ZMduyGlX0QwU@google.com>
Date: Fri, 3 Oct 2025 09:05:56 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: kvm@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	x86@...nel.org, linux-fsdevel@...r.kernel.org, aik@....com, 
	ajones@...tanamicro.com, akpm@...ux-foundation.org, amoorthy@...gle.com, 
	anthony.yznaga@...cle.com, anup@...infault.org, aou@...s.berkeley.edu, 
	bfoster@...hat.com, binbin.wu@...ux.intel.com, brauner@...nel.org, 
	catalin.marinas@....com, chao.p.peng@...el.com, chenhuacai@...nel.org, 
	dave.hansen@...el.com, david@...hat.com, dmatlack@...gle.com, 
	dwmw@...zon.co.uk, erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com, 
	graf@...zon.com, haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com, 
	ira.weiny@...el.com, isaku.yamahata@...el.com, jack@...e.cz, 
	james.morse@....com, jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com, 
	jhubbard@...dia.com, jroedel@...e.de, jthoughton@...gle.com, 
	jun.miao@...el.com, kai.huang@...el.com, keirf@...gle.com, 
	kent.overstreet@...ux.dev, kirill.shutemov@...el.com, liam.merwick@...cle.com, 
	maciej.wieczor-retman@...el.com, mail@...iej.szmigiero.name, maz@...nel.org, 
	mic@...ikod.net, michael.roth@....com, mpe@...erman.id.au, 
	muchun.song@...ux.dev, nikunj@....com, nsaenz@...zon.es, 
	oliver.upton@...ux.dev, palmer@...belt.com, pankaj.gupta@....com, 
	paul.walmsley@...ive.com, pbonzini@...hat.com, pdurrant@...zon.co.uk, 
	peterx@...hat.com, pgonda@...gle.com, pvorel@...e.cz, qperret@...gle.com, 
	quic_cvanscha@...cinc.com, quic_eberman@...cinc.com, 
	quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com, quic_pheragu@...cinc.com, 
	quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com, richard.weiyang@...il.com, 
	rick.p.edgecombe@...el.com, rientjes@...gle.com, roypat@...zon.co.uk, 
	rppt@...nel.org, shuah@...nel.org, steven.price@....com, 
	steven.sistare@...cle.com, suzuki.poulose@....com, tabba@...gle.com, 
	thomas.lendacky@....com, usama.arif@...edance.com, vannapurve@...gle.com, 
	vbabka@...e.cz, viro@...iv.linux.org.uk, vkuznets@...hat.com, 
	wei.w.wang@...el.com, will@...nel.org, willy@...radead.org, 
	xiaoyao.li@...el.com, yan.y.zhao@...el.com, yilun.xu@...el.com, 
	yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 29/51] mm: guestmem_hugetlb: Wrap HugeTLB as an
 allocator for guest_memfd

On Fri, Oct 03, 2025, Sean Christopherson wrote:
> On Wed, May 14, 2025, Ackerley Tng wrote:
> > guestmem_hugetlb is an allocator for guest_memfd. It wraps HugeTLB to
> > provide huge folios for guest_memfd.
> > 
> > This patch also introduces guestmem_allocator_operations as a set of
> > operations that allocators for guest_memfd can provide. In a later
> > patch, guest_memfd will use these operations to manage pages from an
> > allocator.
> > 
> > The allocator operations are memory-management specific and are placed
> > in mm/ so key mm-specific functions do not have to be exposed
> > unnecessarily.
> 
> This code doesn't have to be put in mm/; all of the #includes are to <linux/xxx.h>.
> Unless I'm missing something, what you actually want to avoid is _exporting_ mm/
> APIs, and for that all that is needed is to ensure the code is built into the kernel
> binary, not into kvm.ko.
> 
> diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
> index d047d4cf58c9..c18c77e8a638 100644
> --- a/virt/kvm/Makefile.kvm
> +++ b/virt/kvm/Makefile.kvm
> @@ -13,3 +13,5 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
>  kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
>  kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
> +
> +obj-$(subst m,y,$(CONFIG_KVM_GUEST_MEMFD)) += $(KVM)/guest_memfd_hugepages.o
> \ No newline at end of file
> 
> People may want the code to live in mm/ for maintenance and ownership reasons
> (or not, I haven't followed the discussions on hugepage support), but that's a
> very different justification than what's described in the changelog.
> 
> And if the _only_ user is guest_memfd, putting this in mm/ feels quite weird.
> And if we anticipate other users, the name guestmem_hugetlb is weird, because
> AFAICT there's nothing in here that is in any way guest specific, it's just a
> few APIs for allocating and accounting hugepages.
> 
> Personally, I don't see much point in trying to make this a "generic" library,
> in quotes because the whole guestmem_xxx namespace makes it anything but generic.
> I don't see anything in mm/guestmem_hugetlb.c that makes me go "ooh, that's nasty,
> I'm glad this is handled by a library".  But if we want to go straight to a
> library, it should be something that is really truly generic, i.e. not "guest"
> specific in any way.

Ah, the complexity and the mm-internal dependencies come along in the splitting
and merging patch.  Putting that code in mm/ makes perfect sense, but I'm still
not convinced that putting _all_ of this code in mm/ is the correct split.

As proposed, this is a weird combination of being an extension of guest_memfd, a
somewhat generic library, _and_ a subsystem (e.g. the global workqueue and stash).

_If_ we need a library, then IMO it should be a truly generic library.  Any pieces
that are guest_memfd specific belong in KVM.  And any subsystem-like things
should probably be implemented as an extension to HugeTLB itself, which is already
its own subsystem.  Emphasis on "if", because it's not clear to me that a
library is warranted.

AFAICT, the novelty here is the splitting and re-merging of hugetlb folios, and
that seems like it should be explicitly an extension of the hugetlb subsystem.
E.g. that behavior needs to take hugetlb_lock, interact with global vmemmap state
like hugetlb_optimize_vmemmap_key, etc.  If that's implemented as something like
hugetlb_splittable.c or whatever, and wired up to be explicitly configured via
hugetlb_init(), then there may not be much left for a library.
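
For anyone unfamiliar with the $(subst m,y,...) idiom in the Makefile.kvm hunk
quoted earlier, here is a minimal stand-alone sketch of what it does.  This is
not kernel code: CONFIG_DEMO and demo.o are made-up names, and the sketch
assumes GNU make.  $(subst m,y,TEXT) rewrites every "m" in TEXT to "y", so a
value of "m" lands the object in obj-y (built in) instead of obj-m (module).

```make
# Hypothetical demo; CONFIG_DEMO and demo.o are illustrative names only.
# Default to "m", as if the option had been configured as a module.
CONFIG_DEMO ?= m

# "m" is rewritten to "y", so demo.o is appended to obj-y, never to obj-m.
# A value of "y" is left unchanged; an empty value expands to "obj-",
# a variable that nothing reads, so the object is simply dropped.
obj-$(subst m,y,$(CONFIG_DEMO)) += demo.o

# Print the resulting lists while the makefile is being parsed.
$(info obj-y = $(obj-y))
$(info obj-m = $(obj-m))

# Empty default target so make has something to "build".
all: ;
```

Saving that as demo.mk and running "make -f demo.mk" should show demo.o in
obj-y and nothing in obj-m; "make -f demo.mk CONFIG_DEMO=" should leave both
lists empty.  Whether the real option can ever be "m" depends on its Kconfig
type; if it can only be "y" or unset, the subst is a harmless no-op.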
