Message-ID: <20250916224133.ysuqlboywxyybsbl@amd.com>
Date: Tue, 16 Sep 2025 17:41:33 -0500
From: Michael Roth <michael.roth@....com>
To: Ackerley Tng <ackerleytng@...gle.com>
CC: <kvm@...r.kernel.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
<linux-fsdevel@...r.kernel.org>, <aik@....com>, <ajones@...tanamicro.com>,
<akpm@...ux-foundation.org>, <amoorthy@...gle.com>,
<anthony.yznaga@...cle.com>, <anup@...infault.org>, <aou@...s.berkeley.edu>,
<bfoster@...hat.com>, <binbin.wu@...ux.intel.com>, <brauner@...nel.org>,
<catalin.marinas@....com>, <chao.p.peng@...el.com>, <chenhuacai@...nel.org>,
<dave.hansen@...el.com>, <david@...hat.com>, <dmatlack@...gle.com>,
<dwmw@...zon.co.uk>, <erdemaktas@...gle.com>, <fan.du@...el.com>,
<fvdl@...gle.com>, <graf@...zon.com>, <haibo1.xu@...el.com>,
<hch@...radead.org>, <hughd@...gle.com>, <ira.weiny@...el.com>,
<isaku.yamahata@...el.com>, <jack@...e.cz>, <james.morse@....com>,
<jarkko@...nel.org>, <jgg@...pe.ca>, <jgowans@...zon.com>,
<jhubbard@...dia.com>, <jroedel@...e.de>, <jthoughton@...gle.com>,
<jun.miao@...el.com>, <kai.huang@...el.com>, <keirf@...gle.com>,
<kent.overstreet@...ux.dev>, <kirill.shutemov@...el.com>,
<liam.merwick@...cle.com>, <maciej.wieczor-retman@...el.com>,
<mail@...iej.szmigiero.name>, <maz@...nel.org>, <mic@...ikod.net>,
<mpe@...erman.id.au>, <muchun.song@...ux.dev>, <nikunj@....com>,
<nsaenz@...zon.es>, <oliver.upton@...ux.dev>, <palmer@...belt.com>,
<pankaj.gupta@....com>, <paul.walmsley@...ive.com>, <pbonzini@...hat.com>,
<pdurrant@...zon.co.uk>, <peterx@...hat.com>, <pgonda@...gle.com>,
<pvorel@...e.cz>, <qperret@...gle.com>, <quic_cvanscha@...cinc.com>,
<quic_eberman@...cinc.com>, <quic_mnalajal@...cinc.com>,
<quic_pderrin@...cinc.com>, <quic_pheragu@...cinc.com>,
<quic_svaddagi@...cinc.com>, <quic_tsoni@...cinc.com>,
<richard.weiyang@...il.com>, <rick.p.edgecombe@...el.com>,
<rientjes@...gle.com>, <roypat@...zon.co.uk>, <rppt@...nel.org>,
<seanjc@...gle.com>, <shuah@...nel.org>, <steven.price@....com>,
<steven.sistare@...cle.com>, <suzuki.poulose@....com>, <tabba@...gle.com>,
<thomas.lendacky@....com>, <usama.arif@...edance.com>,
<vannapurve@...gle.com>, <vbabka@...e.cz>, <viro@...iv.linux.org.uk>,
<vkuznets@...hat.com>, <wei.w.wang@...el.com>, <will@...nel.org>,
<willy@...radead.org>, <xiaoyao.li@...el.com>, <yan.y.zhao@...el.com>,
<yilun.xu@...el.com>, <yuzenghui@...wei.com>, <zhiquan1.li@...el.com>
Subject: Re: [RFC PATCH v2 29/51] mm: guestmem_hugetlb: Wrap HugeTLB as an
allocator for guest_memfd

On Fri, May 16, 2025 at 01:33:39PM -0700, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@...gle.com> writes:
>
> > Ackerley Tng <ackerleytng@...gle.com> writes:
> >
> >> guestmem_hugetlb is an allocator for guest_memfd. It wraps HugeTLB to
> >> provide huge folios for guest_memfd.
> >>
> >> This patch also introduces guestmem_allocator_operations as a set of
> >> operations that allocators for guest_memfd can provide. In a later
> >> patch, guest_memfd will use these operations to manage pages from an
> >> allocator.
> >>
> >> The allocator operations are memory-management specific and are placed
> >> in mm/ so key mm-specific functions do not have to be exposed
> >> unnecessarily.
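
(For anyone skimming: conceptually this is a pluggable ops table that
guest_memfd calls into. Purely as an illustration -- the member names
and signatures below are made up, the real definitions live in the
snipped include/linux/guestmem.h -- such a table could look roughly
like:

/*
 * Rough illustration only; not the series' actual API.  Shown just to
 * convey the idea of guest_memfd calling into a pluggable allocator.
 */
struct folio;

struct guestmem_allocator_operations_example {
	/* Allocate a (possibly huge) folio to back a range of the inode. */
	struct folio *(*alloc_folio)(void *priv, unsigned int order);
	/* Hand a folio back to the allocator's pool. */
	void (*free_folio)(void *priv, struct folio *folio);
	/* Split a huge folio so guest_memfd can manage smaller chunks. */
	int (*split_folio)(struct folio *folio);
	/* Merge previously split pages back into a huge folio. */
	int (*merge_folio)(struct folio *folio);
};
)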
> >>
> >> Signed-off-by: Ackerley Tng <ackerleytng@...gle.com>
> >>
> >> Change-Id: I3cafe111ea7b3c84755d7112ff8f8c541c11136d
> >> ---
> >> include/linux/guestmem.h | 20 +++++
> >> include/uapi/linux/guestmem.h | 29 +++++++
> >> mm/Kconfig | 5 +-
> >> mm/guestmem_hugetlb.c | 159 ++++++++++++++++++++++++++++++++++
> >> 4 files changed, 212 insertions(+), 1 deletion(-)
> >> create mode 100644 include/linux/guestmem.h
> >> create mode 100644 include/uapi/linux/guestmem.h
> >>
> >> <snip>
> >>
> >> diff --git a/mm/Kconfig b/mm/Kconfig
> >> index 131adc49f58d..bb6e39e37245 100644
> >> --- a/mm/Kconfig
> >> +++ b/mm/Kconfig
> >> @@ -1218,7 +1218,10 @@ config SECRETMEM
> >>
> >> config GUESTMEM_HUGETLB
> >> bool "Enable guestmem_hugetlb allocator for guest_memfd"
> >> - depends on HUGETLBFS
> >> + select GUESTMEM
> >> + select HUGETLBFS
> >> + select HUGETLB_PAGE
> >> + select HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> >
> > My bad. I left out CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y in
> > my testing and just found that when it is set, I hit
> >
> > BUG_ON(pte_page(ptep_get(pte)) != walk->reuse_page);
> >
> > with the basic guest_memfd_test when splitting pages on allocation.
> >
> > I'll follow up with the fix soon.
> >
> > Another note about testing: I've been testing in a nested VM for the
> > development process:
> >
> > 1. Host
> > 2. VM for development
> > 3. Nested VM running kernel being developed
> > 4. Nested nested VMs created during selftests
> >
> > This series has not yet been tested on a physical host.
> >
> >> help
> >> Enable this to make HugeTLB folios available to guest_memfd
> >> (KVM virtualization) as backing memory.
> >>
> >> <snip>
> >>
>
> Here's the fix for this issue:
>
> From 998af6404d4e39920ba42764e7f3815cb9bb9e3d Mon Sep 17 00:00:00 2001
> Message-ID: <998af6404d4e39920ba42764e7f3815cb9bb9e3d.1747427489.git.ackerleytng@...gle.com>
> From: Ackerley Tng <ackerleytng@...gle.com>
> Date: Fri, 16 May 2025 13:14:55 -0700
> Subject: [RFC PATCH v2 1/1] KVM: guest_memfd: Reorder undoing vmemmap
> optimization and stashing hugetlb folio metadata
>
> Without this patch, the HugeTLB folio metadata is stashed before the
> vmemmap is restored, so the vmemmap_optimized flag, stored in the
> HugeTLB folio's folio->private, is stashed as set.
>
> The first split works, but on merging, when the folio metadata is
> unstashed, vmemmap_optimized comes back as set, making the call to
> hugetlb_vmemmap_optimize_folio() skip actually re-applying the
> optimization.
>
> On a second split, hugetlb_vmemmap_restore_folio() then attempts to
> undo an optimization that was never re-applied (the vmemmap is already
> in the restored state), hence hitting the BUG().
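
(Just to restate the failing sequence for my own understanding -- the
function names are the ones used above; the unstash step is described
generically since that helper isn't quoted here:

/*
 * split #1:
 *   guestmem_hugetlb_stash_metadata()   - stashes vmemmap_optimized as set
 *   hugetlb_vmemmap_restore_folio()     - un-optimizes the vmemmap
 *
 * merge:
 *   <unstash metadata>                  - vmemmap_optimized comes back as
 *                                         set, i.e. stale
 *   hugetlb_vmemmap_optimize_folio()    - sees the flag set and skips
 *                                         re-optimizing
 *
 * split #2:
 *   hugetlb_vmemmap_restore_folio()     - walks a vmemmap whose state no
 *                                         longer matches the flag and hits
 *                                         the BUG_ON() in the pte walk
 */
)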
>
> Signed-off-by: Ackerley Tng <ackerleytng@...gle.com>
> ---
> mm/guestmem_hugetlb.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/guestmem_hugetlb.c b/mm/guestmem_hugetlb.c
> index 8727598cf18e..2c0192543676 100644
> --- a/mm/guestmem_hugetlb.c
> +++ b/mm/guestmem_hugetlb.c
> @@ -200,16 +200,21 @@ static int guestmem_hugetlb_split_folio(struct folio *folio)
> return 0;
>
> orig_nr_pages = folio_nr_pages(folio);
> - ret = guestmem_hugetlb_stash_metadata(folio);
> +
> + /*
> + * hugetlb_vmemmap_restore_folio() has to be called ahead of the rest
> + * because it checks page type. This doesn't actually split the folio,
> + * so the first few struct pages are still intact.
> + */
> + ret = hugetlb_vmemmap_restore_folio(folio_hstate(folio), folio);
> if (ret)
> return ret;
>
> /*
> - * hugetlb_vmemmap_restore_folio() has to be called ahead of the rest
> - * because it checks and page type. This doesn't actually split the
> - * folio, so the first few struct pages are still intact.
> + * Stash metadata after vmemmap stuff so the outcome of the vmemmap
> + * restoration is stashed.
> */
> - ret = hugetlb_vmemmap_restore_folio(folio_hstate(folio), folio);
> + ret = guestmem_hugetlb_stash_metadata(folio);
> if (ret)
> goto err;

Doh, I missed this before replying earlier. This definitely seems like
the cleaner fix, as it pertains to other flags/state that could
potentially become stale after calling hugetlb_vmemmap_restore_folio().

-Mike