Message-ID: <CA+CK2bDQDsjBkYabH5DVzSr_kuut-XHKb+JFTA=PLa+8gcCVLw@mail.gmail.com>
Date: Tue, 30 Dec 2025 11:37:19 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: Mike Rapoport <rppt@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	David Hildenbrand <david@...nel.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, 
	Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, Jonathan Corbet <corbet@....net>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Muchun Song <muchun.song@...ux.dev>, 
	Oscar Salvador <osalvador@...e.de>, Alexander Graf <graf@...zon.com>, David Matlack <dmatlack@...gle.com>, 
	David Rientjes <rientjes@...gle.com>, Jason Gunthorpe <jgg@...dia.com>, 
	Samiullah Khawaja <skhawaja@...gle.com>, Vipin Sharma <vipinsh@...gle.com>, 
	Zhu Yanjun <yanjun.zhu@...ux.dev>, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	linux-doc@...r.kernel.org, kexec@...ts.infradead.org
Subject: Re: [RFC PATCH 06/10] liveupdate: hugetlb subsystem FLB state preservation

On Mon, Dec 29, 2025 at 4:21 PM Pratyush Yadav <pratyush@...nel.org> wrote:
>
> On Tue, Dec 23 2025, Pasha Tatashin wrote:
>
> > On Sat, Dec 6, 2025 at 6:03 PM Pratyush Yadav <pratyush@...nel.org> wrote:
> >>
> >> HugeTLB manages its own pages. It allocates them on boot and uses those
> >> to fulfill hugepage requests.
> >>
> >> To support live update for a hugetlb-backed memfd, it is necessary to
> >> track how many pages of each hstate are coming from live update. This is
> >> needed to ensure the boot-time allocations don't over-allocate huge
> >> pages and put the rest of the system under unexpected memory pressure.
> >>
> >> For example, say the system has 100G of memory and uses 90 1G huge
> >> pages, with 10G put aside for other processes. Now say 5 of those pages
> >> are preserved via KHO for live updating a hugetlb memfd.
> >>
> >> But during boot, the system will still see that it needs 90 huge pages,
> >> so it will attempt to allocate those. When the file is later retrieved,
> >> those 5 pages also get added to the huge page pool, resulting in 95
> >> total huge pages. This exceeds the original expectation of 90 pages, and
> >> ends up wasting memory.
> >>
> >> LUO has file-lifecycle-bound (FLB) data to keep track of the global state
> >> of a subsystem. Use it to track how many huge pages are preserved for each
> >> hstate. When a file is preserved, the counter is incremented, and when it
> >> is unpreserved, the counter is decremented. During boot-time allocation,
> >> this data can be used to calculate how many hugepages actually need to be
> >> allocated.
> >>
> >> Design note: another way of doing this would be to preserve the entire
> >> set of hugepages using the FLB, skip boot time allocation, and restore
> >> them all on FLB retrieve. The main problem with that approach is that it
> >> would need to freeze all hstates after serializing them. This will need
> >> a lot more invasive changes in hugetlb since there are many ways folios
> >> can be added to or removed from a hstate. Doing it this way is simpler
> >> and less invasive.
> >>
> >> Signed-off-by: Pratyush Yadav <pratyush@...nel.org>
> >> ---
> >>  Documentation/mm/memfd_preservation.rst |   9 ++
> >>  MAINTAINERS                             |   1 +
> >>  include/linux/kho/abi/hugetlb.h         |  66 +++++++++
> >>  kernel/liveupdate/Kconfig               |  12 ++
> >>  mm/Makefile                             |   1 +
> >>  mm/hugetlb.c                            |   1 +
> >>  mm/hugetlb_internal.h                   |  15 ++
> >>  mm/hugetlb_luo.c                        | 179 ++++++++++++++++++++++++
> >>  8 files changed, 284 insertions(+)
> >>  create mode 100644 include/linux/kho/abi/hugetlb.h
> >>  create mode 100644 mm/hugetlb_luo.c
> >>
> [...]
> >> +static int hugetlb_flb_retrieve(struct liveupdate_flb_op_args *args)
> >> +{
> >> +       /*
> >> +        * The FLB is only needed for boot-time calculation of how many
> >> +        * hugepages are needed. This is done by early boot handlers already.
> >> +        * Free the serialized state now.
> >> +        */
> >
> > It should be done in this function.
>
> The calculations can't be done in retrieve. Retrieve happens only once,
> for the whole FLB. The calculations will need to come from
> hugetlb_hstate_alloc_pages().
>
> Maybe you mean getting rid of liveupdate_flb_incoming_early()? Yeah,
> that I can do. It will make this function a no-op once we move the
> kho_restore_free() to finish().

Yeah, this is what I meant.

Thanks,
Pasha

>
> >
> >> +       kho_restore_free(phys_to_virt(args->data));
> >
> > This should be moved to finish() after blackout.
>
> Sure.
>
> >
> >> +
> >> +       /*
> >> +        * HACK: But since LUO FLB still needs an obj, use ZERO_SIZE_PTR to
> >> +        * satisfy it.
> >> +        */
> >> +       args->obj = ZERO_SIZE_PTR;
> >
> > Hopefully this is not needed any more with the updated FLB, please check :-)
>
> Yep. IIRC when I sent this series the older version of FLB was in
> mm-nonmm-unstable.
>
> >
> >> +       return 0;
> >> +}
> >> +
> [...]
>
> --
> Regards,
> Pratyush Yadav
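A minimal userspace sketch of the per-hstate accounting described in the quoted commit message, assuming a flat array indexed by hstate stands in for the FLB data. The names struct model_hstate, model_preserve(), model_unpreserve() and model_pages_to_allocate() are hypothetical and are not kernel APIs. It also reflects the point settled in the thread: the subtraction happens per hstate at boot-time allocation (the role given to hugetlb_hstate_alloc_pages()), not once in the FLB retrieve callback.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the accounting in the patch description.
 * None of these names exist in the kernel; they only illustrate the idea.
 */
#define MODEL_MAX_HSTATES 4

struct model_hstate {
	const char *name;             /* e.g. "1G" */
	unsigned long max_huge_pages; /* pages requested for this hstate at boot */
};

/* Stand-in for the FLB data: number of preserved pages per hstate index. */
static unsigned long model_preserved[MODEL_MAX_HSTATES];

/* A file was preserved: its huge pages will come back after live update. */
static void model_preserve(size_t idx, unsigned long nr_pages)
{
	assert(idx < MODEL_MAX_HSTATES);
	model_preserved[idx] += nr_pages;
}

/* A file was unpreserved: its pages no longer offset the boot-time target. */
static void model_unpreserve(size_t idx, unsigned long nr_pages)
{
	assert(idx < MODEL_MAX_HSTATES);
	assert(model_preserved[idx] >= nr_pages); /* catch unbalanced calls */
	model_preserved[idx] -= nr_pages;
}

/*
 * Boot-time allocation for one hstate: allocate only the pages that the
 * preserved folios will not supply once the file is retrieved.
 */
static unsigned long model_pages_to_allocate(const struct model_hstate *h,
					     size_t idx)
{
	assert(idx < MODEL_MAX_HSTATES);
	if (model_preserved[idx] >= h->max_huge_pages)
		return 0;
	return h->max_huge_pages - model_preserved[idx];
}
```

With the numbers from the commit message (90 x 1G pages requested, 5 preserved), model_pages_to_allocate() returns 85, so once the memfd is retrieved the pool holds 85 + 5 = 90 pages rather than 95. Clamping at zero is a choice made in this sketch for the case where more pages are preserved than requested; the thread does not say how the real code handles it.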